Social Network Analysis Basics (Ticket 173)
Find Proposals Here [MeetingNotes]
The requests from Ann are copied below:
Before I forget or get otherwise occupied, it seemed wise to write this down. I can't get to everything today, but I think I've basically covered the first three items completely. I would be happy to provide any explanation, formulas, or clarification for anything listed.
Just to clarify, I will be referring to "network(s)" as a way of recognizing that the data format allows for multiple types of relationships and/or "edge attributes."
Top Priority:
A. Data Check/Confirmation. Some type of data-loading report that gives (or at least offers the option of finding out) the following:
- Do I have multiple/multiplex relationships or "edge attributes" in my network(s)? (yes/no, and perhaps what they are) -- easy in NAT
- Whether the network(s) are directed or undirected. This should be a check on the data, not purely on the way the data is formatted, I think. -- already done
- Whether the network(s) are weighted or unweighted. While you can technically treat a dichotomous relation value as a weight, I think it would be safe to say that if your data contains relationship values other than 0 or 1, then it's weighted. -- easy, work with A.1 in NAT
- Number of nodes in the network(s) -- done
- Number of edges (undirected) or arcs (directed) in the network(s) -- partially done; need to clarify any further requirements
- Are there self-loops in the data? (yes/no) -- done
- Are there isolates? (yes/no) -- easy in NAT
- Do I have missing data? (yes/no) -- ??? clarify the requirement with Ann; postponed for now
Note: NAT does not handle hybrid networks.
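A minimal Python sketch of such a data-loading report, assuming a single relation stored as an n x n adjacency matrix (a list of lists where adj[i][j] is the tie value from node i to node j and 0 means "no tie"). The function name data_check and its result keys are hypothetical, not part of NAT. "Directed" is decided from the data (any cell that differs from its mirror), and "weighted" uses Ann's rule (any value other than 0 or 1). Multiplex relations and missing data depend on the input file format and are not covered here.

```python
from typing import List

Matrix = List[List[float]]


def data_check(adj: Matrix) -> dict:
    """Summarize one relation stored as an n x n adjacency matrix (hypothetical helper)."""
    n = len(adj)
    # Directed if any off-diagonal cell differs from its mirror image.
    directed = any(adj[i][j] != adj[j][i] for i in range(n) for j in range(i + 1, n))
    # Weighted if any value other than 0 or 1 appears anywhere.
    weighted = any(v not in (0, 1) for row in adj for v in row)
    self_loops = any(adj[i][i] != 0 for i in range(n))
    # Arcs are ordered pairs, edges unordered pairs; the diagonal is ignored.
    if directed:
        ties = sum(1 for i in range(n) for j in range(n) if i != j and adj[i][j] != 0)
    else:
        ties = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0)
    # An isolate has no incoming or outgoing tie to any other node.
    isolates = [i for i in range(n)
                if all(adj[i][j] == 0 and adj[j][i] == 0 for j in range(n) if j != i)]
    return {
        "nodes": n,
        "directed": directed,
        "weighted": weighted,
        "ties": ties,
        "self_loops": self_loops,
        "isolates": isolates,
    }
```

For example, data_check([[0, 1, 0], [0, 0, 0], [0, 0, 0]]) reports a directed, unweighted network with one arc, no self-loops, and node 2 as an isolate.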
B. Transformation Routines for the network(s). I sent the transformation document I pulled together in January under separate email. The "routines" I would suggest are:
- Recoding -- postpone
- Transposing -- must be done (4), Russell
- Reversing -- ??
- Dichotomizing (turning weighted/valued into unweighted/unvalued) -- must be done (3), Russell
- Symmetrizing (turning directed/asymmetrical into undirected/symmetrical) -- must be done (1), Russell
- Dealing with the diagonal (self-loops) -- must be done (2), Russell
- Normalizing -- postpone??
Within this group, dichotomizing, symmetrizing, and dealing with diagonals would be the most critical priorities. Transposing would be next, and the others would be nice to have. In addition, I would now add the following:
- Deleting isolates (I think this exists somewhere already; I am not able to open it right now and look). -- Use K-Core extraction (Russell recommends as must-have), Russell
- Unpacking multiple relationships (either through clearly separate relationships or via edge attributes). -- ?? no decision on it yet (Russell recommends as must-have), Russell
Dealing with isolates and multiple relations in a file format are both critical.
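The must-have routines above could look roughly like this in Python, assuming the same list-of-lists adjacency matrix and an edge list of (sender, receiver, relation, value) tuples for multiplex data. All function names are hypothetical. Symmetrizing takes a combining rule: max keeps a tie present in either direction, min keeps only reciprocated ties; which one NAT should default to is an open choice. For an undirected network, deleting isolates gives the same result as K-core extraction with k = 1, since removing degree-0 nodes does not change anyone else's degree.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def transpose(adj: Matrix) -> Matrix:
    # The tie i -> j becomes j -> i.
    return [list(col) for col in zip(*adj)]


def dichotomize(adj: Matrix, cutoff: float = 0) -> Matrix:
    # 1 where the tie value is strictly greater than the cutoff, else 0.
    return [[1 if v > cutoff else 0 for v in row] for row in adj]


def symmetrize(adj: Matrix, rule: Callable[[float, float], float] = max) -> Matrix:
    # Combine adj[i][j] and adj[j][i]: max = tie in either direction,
    # min = tie must be reciprocated.
    n = len(adj)
    return [[rule(adj[i][j], adj[j][i]) for j in range(n)] for i in range(n)]


def set_diagonal(adj: Matrix, value: float = 0) -> Matrix:
    # Overwrite self-loops with a fixed value (0 removes them).
    return [[value if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(adj)]


def delete_isolates(adj: Matrix) -> Tuple[Matrix, List[int]]:
    # Keep nodes with at least one tie (in or out) to another node;
    # a self-loop alone does not count. Returns the reduced matrix and kept indices.
    n = len(adj)
    keep = [i for i in range(n)
            if any(adj[i][j] != 0 or adj[j][i] != 0 for j in range(n) if j != i)]
    return [[adj[i][j] for j in keep] for i in keep], keep


def unpack_relations(n: int, edges: List[Tuple[int, int, str, float]]) -> Dict[str, Matrix]:
    # Split an edge list of (sender, receiver, relation, value) tuples
    # into one n x n matrix per relation label.
    mats: Dict[str, Matrix] = defaultdict(lambda: [[0] * n for _ in range(n)])
    for i, j, rel, value in edges:
        mats[rel][i][j] = value
    return dict(mats)
```

Each routine returns a new matrix rather than modifying its input, so transformations can be chained (for example, set_diagonal(dichotomize(symmetrize(adj)))) without surprising the caller.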
C. Basic network statistics. Some of this repeats the data check, so however it gets built in is fine, as long as you can find a basic set of network statistics!
It should first have an option to specify whether you use self-loops or not. If allowing such an option is not a possibility from the programming perspective, it should be assumed that self-loops don't count in the calculations. It's fine to put the burden on the user here as long as you've given them (up in the data check/confirmation stage) the option of figuring out what they have!
The most critical:
- Number of nodes in the network(s) -- done
- Number of edges/arcs in the network(s) -- done
- Mean value of edges/arcs in the network(s) -- easy, in NAT
- I have a question to clarify: are we talking about the mean of the weights or the mean number of edges per node? I'd assume the mean edges per node, but I'd like to make sure. Also, when calculating the mean edges/arcs in the network, should we also keep track of the variance? If so, we should calculate it for both in-degree and out-degree in directed networks.
- Density of network(s) -- Tim will work on density of weighted networks.
For an unweighted relationship, this is equal to the mean: the sum of the possible ties (n(n-1) for a directed relationship and 0.5n(n-1) for an undirected one) minus the number of actual edges/arcs present. -- done
- Another question: the density I am using in NAT is the number of present edges/arcs divided by the total possible edges/arcs. That seems to differ from the density proposed here. Can we clarify?
For a weighted network, it is typically the sum of the tie values divided by the number of possible ties. -- easy, not in NAT, separate one (Russell, Tim)
- Reciprocity, arc-based and dyad-based. I can send you more specifics, but here is a decent explanation: http://www.faculty.ucr.edu/~hanneman/nettext/C8_Embedding.html#reciprocity -- postpone or Duygu
- Transitivity. Again, this is a good summary, and we would want to use the "adjacency" definition of transitivity. It would be nice (read: lower priority) to distinguish between strong and weak transitivity for a weighted network: http://faculty.ucr.edu/~hanneman/nettext/C8_Embedding.html#transitivity -- postpone or Duygu
- Dyad and triad censuses are simple counts of how many of each possible type of dyad and triad are present in the network. There are two possible dyad types in an undirected relation and three in a directed one. There are four possible triad types in an undirected relation and 16 in a directed one. I would be happy to give you more information on all of these. Of them, the dyad counts are already available to you elsewhere; counting the triads (particularly in the directed relation) would be both the hardest and the most critical. -- postpone or Duygu
These are all "network"-based statistics, but the first three could also be considered node-based stats. In fact, a lower-priority feature would be to allow the user to generate those first three stats per node.
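One way to answer the mean-edges question, assuming "mean edges per node" means mean degree: a minimal Python sketch that reports in- and out-degree per node, their mean, and their (population) variance. The include_loops flag reflects the self-loop option Ann asks for, with self-loops excluded by default. The function name and the choice of population variance are assumptions. For a symmetric (undirected) matrix, in- and out-degree coincide and the mean equals 2E/n.

```python
from statistics import mean, pvariance
from typing import List

Matrix = List[List[float]]


def degree_stats(adj: Matrix, include_loops: bool = False) -> dict:
    n = len(adj)

    def counts(i: int, j: int) -> bool:
        # A tie counts if it is nonzero and, unless loops are included, off-diagonal.
        return adj[i][j] != 0 and (include_loops or i != j)

    out_deg = [sum(counts(i, j) for j in range(n)) for i in range(n)]
    in_deg = [sum(counts(i, j) for i in range(n)) for j in range(n)]
    return {
        "out_degree": out_deg,
        "in_degree": in_deg,
        # Every arc has one sender and one receiver, so both means are equal.
        "mean_degree": mean(out_deg),
        "var_out": pvariance(out_deg),
        "var_in": pvariance(in_deg),
    }
```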
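A sketch of density in Python, using the definition from the NAT question (present ties divided by possible ties) for the unweighted case and Ann's sum-of-tie-values definition for the weighted case. It assumes the list-of-lists adjacency matrix, ignores the diagonal, and reads an undirected matrix from its upper triangle only; the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def density(adj: Matrix, directed: bool = True, weighted: bool = False) -> float:
    """(Number of ties, or sum of tie values if weighted) / number of possible ties.

    Possible ties: n(n-1) if directed, n(n-1)/2 if undirected; self-loops ignored.
    """
    n = len(adj)
    if n < 2:
        raise ValueError("density needs at least two nodes")
    if directed:
        cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    else:
        # Undirected: each pair is read once from the upper triangle.
        cells = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if weighted:
        total = sum(adj[i][j] for i, j in cells)
    else:
        total = sum(1 for i, j in cells if adj[i][j] != 0)
    return total / len(cells)
```

For the directed matrix [[0, 1, 0], [0, 0, 2], [0, 0, 0]], the unweighted density is 2/6 and the weighted density is 3/6.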
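Arc-based and dyad-based reciprocity, as we read the Hanneman chapter linked above: the dyad measure is the number of reciprocated pairs divided by the number of pairs with any tie, and the arc measure is the number of arcs that belong to reciprocated pairs divided by all arcs. This Python sketch assumes a directed list-of-lists adjacency matrix, ignores tie values beyond present/absent, and returns 0.0 when there are no ties; the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def reciprocity(adj: Matrix) -> dict:
    n = len(adj)
    mutual = asym = 0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = adj[i][j] != 0, adj[j][i] != 0
            if a and b:
                mutual += 1  # tie in both directions
            elif a or b:
                asym += 1    # tie in one direction only
    connected = mutual + asym  # pairs with any tie
    arcs = 2 * mutual + asym   # total arcs between distinct nodes
    return {
        "dyad": mutual / connected if connected else 0.0,
        "arc": 2 * mutual / arcs if arcs else 0.0,
    }
```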
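A minimal sketch of adjacency-based transitivity, under our reading of the "adjacency" definition: among all ordered triples of distinct nodes with i -> j and j -> k (two-paths), the share where i -> k is also present. This interpretation should be checked against the Hanneman page and UCINET output before relying on it. Tie values are treated as present/absent, so the strong/weak distinction for weighted networks is not covered; the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transitivity(adj: Matrix) -> float:
    n = len(adj)
    # Binary adjacency, self-loops removed.
    a = [[1 if adj[i][j] != 0 and i != j else 0 for j in range(n)] for i in range(n)]
    two_paths = closed = 0
    for i in range(n):
        for j in range(n):
            if i == j or not a[i][j]:
                continue
            for k in range(n):
                if k == i or k == j or not a[j][k]:
                    continue
                two_paths += 1    # found i -> j -> k
                closed += a[i][k]  # transitive if i -> k exists too
    return closed / two_paths if two_paths else 0.0
```

A transitive triple (0 -> 1, 1 -> 2, 0 -> 2) scores 1.0, while a 3-cycle (0 -> 1 -> 2 -> 0) scores 0.0 because none of its three two-paths is closed.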
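The simpler censuses can be sketched by brute force in Python: a directed dyad census (mutual, asymmetric, null) and the four-type undirected triad census, indexed by how many edges (0 to 3) the three nodes share. The 16-type directed triad census is not shown; NetworkX's triadic_census function is one existing implementation that could serve as a reference. This sketch assumes a list-of-lists adjacency matrix, and the function names are hypothetical. Enumerating all triples is O(n^3), which is fine for small networks.

```python
from itertools import combinations
from typing import Dict, List

Matrix = List[List[float]]


def dyad_census(adj: Matrix) -> Dict[str, int]:
    mutual = asym = null = 0
    for i, j in combinations(range(len(adj)), 2):
        x, y = adj[i][j] != 0, adj[j][i] != 0
        if x and y:
            mutual += 1
        elif x or y:
            asym += 1
        else:
            null += 1
    # For a symmetric matrix, "asymmetric" is always 0.
    return {"mutual": mutual, "asymmetric": asym, "null": null}


def undirected_triad_census(adj: Matrix) -> List[int]:
    # counts[e] = number of node triples connected by exactly e edges.
    counts = [0, 0, 0, 0]
    for i, j, k in combinations(range(len(adj)), 3):
        edges = (adj[i][j] != 0) + (adj[i][k] != 0) + (adj[j][k] != 0)
        counts[edges] += 1
    return counts
```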
Others that would be really nice, lower priority - perhaps you already have them implemented somewhere:
- SSQ Sum of Squares
- MCSSQ Mean-centered sum of squares
- Variance
- Euclidean Norm
- Maximum value of edge/arc -- easy in NAT; others postponed
- Are we talking about just degree, or should we look for the maximum in-degree and maximum out-degree if the network is directed?
- Minimum Value of edge/arc -- easy in NAT
- See 12.
- Number of observations (this is simply n(n-1) for directed relations and 0.5n(n-1) for undirected ones).
- Can we clarify this? This just seems to be the total number of possible connections in a network. Did we miss something?
The above are all provided in UCINET, are common statistics used to generate other statistics, and can be generated on the network and on the nodes (and edges, too, actually). Again, low priority.
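A Python sketch of these lower-priority statistics computed over the tie values of a network. The definitions are our assumption of what UCINET reports and should be checked against its output: the observations are the off-diagonal cells (n(n-1) if directed, n(n-1)/2 if undirected, which is why the number of observations equals the number of possible connections), SSQ is the sum of squared values, MCSSQ the sum of squared deviations from the mean, variance is MCSSQ divided by the number of observations, and the Euclidean norm is the square root of SSQ. The function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matrix_stats(adj: Matrix, directed: bool = True) -> dict:
    n = len(adj)
    if directed:
        vals = [adj[i][j] for i in range(n) for j in range(n) if i != j]
    else:
        # Undirected: count each pair once (upper triangle).
        vals = [adj[i][j] for i in range(n) for j in range(i + 1, n)]
    n_obs = len(vals)  # n(n-1) or n(n-1)/2
    if n_obs == 0:
        raise ValueError("need at least two nodes")
    avg = sum(vals) / n_obs
    ssq = sum(v * v for v in vals)
    mcssq = sum((v - avg) ** 2 for v in vals)
    return {
        "n_obs": n_obs,
        "mean": avg,
        "ssq": ssq,
        "mcssq": mcssq,
        "variance": mcssq / n_obs,
        "euclidean_norm": math.sqrt(ssq),
        "min": min(vals),
        "max": max(vals),
    }
```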
Stuff I can't get to right now, but things we would like to see as well:
D. More types of centrality (as you requested, I will get more specific and prioritized) -- Duygu
E. An n-clique/n-clan/k-plex package. --Duygu
F. Bridge to R – specifically for implementation of ERGM package. However, the added bonus of letting you get your data into R format would be FANTASTIC.
Within each, I think we can work on prioritizing the specifics. I would say that adding more types of centrality and the R bridge would be the biggest priorities. The cliques work is lower priority, but would be great.
Last December we mentioned the following, but it seems unreasonable now, and not a priority:
G. Blockmodeling
Thanks for your patience with me – and for your feedback. I realize we are at times speaking different languages, and I am at times telling you things in the most basic terms, but when language is an issue, that's all I know to do!
Thanks, Ann
