Getting Things Documented

  • Compile a complete list of documentation needs, particularly for individual algorithms
  • Create tickets for each reasonable chunk of tasks, at the per-algorithm level or smaller
  • Agree on a minimum number of things each person will take care of per week (probably two or three)
  • Each week, briefly go through what things have been taken care of in the previous one.

Implementing New Things

While some of this work will happen in parallel, the following is listed in approximately the order in which I expect us to start working on it.

Formats

  • Divvy up all the to-table converters
  • Meet after the owners have had a chance to look at the formats, and discuss what the resulting tables need to look like in relation to the formats and the necessary final products
  • Come up with reasonable timetables for implementation

Networks

  • Generalize existing algorithms (co-authorship extraction, node merging)
  • determine all additional 'types' of network extraction from tables (at least citation and
  • verify that the network-to-derived-network tools are capable of creating all further needed networks from table-extracted networks
  • implement/modify network algorithms as needed
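
As a point of reference for the generalization work, table-based co-authorship extraction can be sketched roughly as below. This is only an illustration, not the existing algorithm: the function name `extract_coauthorship`, the `authors` column name, and the `|` separator are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def extract_coauthorship(rows, author_column="authors", separator="|"):
    """Count how often each pair of authors appears on the same table row.

    `rows` is a list of dicts, one per table row. The column name and the
    separator are hypothetical defaults, not taken from any real format.
    """
    edge_weights = Counter()
    for row in rows:
        # De-duplicate and sort names so each pair has one canonical order.
        names = {name.strip() for name in row[author_column].split(separator)}
        authors = sorted(name for name in names if name)
        for pair in combinations(authors, 2):
            edge_weights[pair] += 1
    return edge_weights


rows = [{"authors": "A|B|C"}, {"authors": "B|A"}]
network = extract_coauthorship(rows)
```

Generalizing this would mostly mean replacing the fixed column and separator with user-chosen ones, which is why they are parameters here.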

Framework stuff

  • I suspect Bruce will be the lead on most of these, with the other categories mostly being implemented by everyone else
  • Integrate preferences and familiarize ourselves with what is involved, in case it is needed for various algorithms
  • in_data extensions
  • parallel application of single data algorithms
  • scheduler GUI
  • refactor menu manager

Visualization

  • Integrate postscript viewer
  • add the ability to save out graphs from GUESS that then appear in the data manager, with at least the position information
  • add graph modifier stuff to GUESS; also, resizeLinear and any other fixes we need
  • create visualization for time series (or really, any series) sets of graphs

Supporting Algorithms

  • Burst supporting
    • table column normalization and stemming
  • Visualization supporting
    • extract top-n nodes by attribute (ascending or descending)
    • extract nodes above/below a threshold by attribute
    • extract nodes above/below a certain number of standard deviations by attribute
    • extract edges by attribute, like all of above
    • extract edges by normalized attribute (pick attribute on edge, attribute on node to normalize by, normalization formula)
      • e / sqrt(n1 * n2)
      • 2e / (n1 + n2)
  • analysis supporting
    • extract time slices
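
The visualization-supporting extractions above are small enough to sketch. The following is a minimal illustration, assuming nodes and edges are plain dicts; every function name, the `source`/`target` keys, and the use of the population standard deviation are assumptions for the example, not decisions we have made. The two normalization functions are the formulas listed above.

```python
import math
from statistics import mean, pstdev


def top_n_nodes(nodes, attr, n, descending=True):
    """Return the n nodes with the largest (or smallest) value of attr."""
    return sorted(nodes, key=lambda node: node[attr], reverse=descending)[:n]


def nodes_by_threshold(nodes, attr, threshold, above=True):
    """Return nodes whose attr is strictly above (or below) threshold."""
    if above:
        return [node for node in nodes if node[attr] > threshold]
    return [node for node in nodes if node[attr] < threshold]


def nodes_by_std_dev(nodes, attr, k, above=True):
    """Return nodes more than k standard deviations above (or below) the mean."""
    values = [node[attr] for node in nodes]
    mu, sigma = mean(values), pstdev(values)  # population std dev (assumed)
    cutoff = mu + k * sigma if above else mu - k * sigma
    return nodes_by_threshold(nodes, attr, cutoff, above)


def geometric_norm(e, n1, n2):
    """e / sqrt(n1 * n2); undefined when either node value is zero."""
    return e / math.sqrt(n1 * n2)


def arithmetic_norm(e, n1, n2):
    """2e / (n1 + n2); undefined when both node values are zero."""
    return 2 * e / (n1 + n2)


def normalize_edges(edges, nodes, edge_attr, node_attr, formula):
    """Return (source, target, normalized value) for each edge."""
    by_id = {node["id"]: node[node_attr] for node in nodes}
    return [
        (edge["source"], edge["target"],
         formula(edge[edge_attr], by_id[edge["source"]], by_id[edge["target"]]))
        for edge in edges
    ]


nodes = [{"id": "a", "papers": 4}, {"id": "b", "papers": 9}, {"id": "c", "papers": 1}]
edges = [{"source": "a", "target": "b", "coauthored": 3}]
normalized = normalize_edges(edges, nodes, "coauthored", "papers", geometric_norm)
```

Edge extraction by attribute would follow the same pattern as the node functions, applied to the edge list or to the normalized values.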

Tutorials and other High Level Documentation

  • Each tutorial should be started as soon as enough of the functionality it relies on is complete for it to be executed.
  • For the most part, tutorials will be of the form "Take dataset-type, extract network, look at basic properties, and visualize".
  • Tutorials should be particularly sensitive to potential alternate paths, problems, and workarounds.
  • They should be modeled after the tutorials we will be completing in the near future for the initial scientometrics capabilities.

Timeline

I think we primarily need firm commitments on the minimum amount of time each of us will have to work on these things. We will then need to get Katy to sign off on those amounts, with the understanding that other things will fall by the wayside so that we can reach the hours in question, and that she does not get to assign other work as part of those hours; the hours are for the specific tasks on our list.

We should be able to churn out the table-to-graph things in the first week or two after the meeting where we look at the particular vagaries of the formats, and be on our way to the more complicated algorithms. I'll probably take many of the supporting algorithms.

I view preparing the higher-level documentation we currently need (tutorials and such, some of which I am working on) as a high-priority but separate task. The same goes for preparing workshops.