Ticket #154 (closed task: fixed)

Opened 1 year ago

Last modified 3 months ago

Data Converter: file:text/isi --> prefuse.data.Table

Reported by: huangb@IU.EDU Assigned to: mwlinnem@IU.EDU
Priority: high Milestone:
Component: DataConverter Version: 0.8.x
Severity: major Keywords:
Cc:

Description (Last modified by huangb@IU.EDU)

Basic requirements for the upcoming v0.8 release:
1. One-way data conversion first: file:text/isi --> prefuse.data.Table.
2. Normalize SOME multi-item columns, such as 'AU' and 'C1'; make sure the code can easily be extended to normalize other columns later if needed.
3. Store the value of a multi-item column as a single string in prefuse.data.Table, using "|" as the delimiter to separate the items.
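A minimal sketch of requirements 2 and 3, assuming the standard ISI tagged layout (a two-letter tag at the start of a line, with continuation lines indented by three spaces). The class IsiRecordFlattener and its methods are hypothetical. The sketch returns a plain Map rather than writing into a prefuse.data.Table, and its "normalization" only trims and collapses whitespace, which is an assumption and not the converter's actual rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: flattens one ISI record into tag -> cell text. */
final class IsiRecordFlattener {
    /** Delimiter between items of a multi-item column (requirement 3). */
    static final String DELIMITER = "|";

    /** Tags treated as multi-item; add tags here to normalize more columns (requirement 2). */
    private final Set<String> multiItemTags;

    IsiRecordFlattener(Set<String> multiItemTags) {
        this.multiItemTags = Set.copyOf(multiItemTags);
    }

    /** recordLines: one record's lines, e.g. "AU Smith, J" followed by "   Doe, A". */
    Map<String, String> flatten(List<String> recordLines) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        String currentTag = null;
        for (String line : recordLines) {
            // A line that does not start with a space opens a new field (e.g. "AU", "C1", "ER").
            if (line.length() >= 2 && !line.startsWith(" ")) {
                currentTag = line.substring(0, 2);
                values.computeIfAbsent(currentTag, tag -> new ArrayList<>());
            }
            // Skip lines before the first tag, blank lines, and tag-only lines such as "ER".
            if (currentTag == null || line.length() <= 3) {
                continue;
            }
            // Value text starts after "XX " on a tag line, or after three spaces on a continuation line.
            values.get(currentTag).add(normalize(line.substring(3)));
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> field : values.entrySet()) {
            // Multi-item fields hold one item per line, joined with "|";
            // other fields are wrapped text, so their lines are joined with a space.
            String separator = multiItemTags.contains(field.getKey()) ? DELIMITER : " ";
            row.put(field.getKey(), String.join(separator, field.getValue()));
        }
        return row;
    }

    /** Placeholder normalization: trims and collapses internal whitespace. */
    private static String normalize(String item) {
        return item.trim().replaceAll("\\s+", " ");
    }
}
```

Normalizing another column later would then only mean adding its tag to the set passed to the constructor. One caveat of the "|" encoding: an item that itself contains "|" would be split incorrectly when the cell is read back.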

Change History

10/09/07 12:14:39 changed by huangb@IU.EDU

  • component changed from Algorithm to DataConverter.

10/09/07 14:00:17 changed by huangb@IU.EDU

  • description changed.

10/29/07 11:33:51 changed by mwlinnem@IU.EDU

  • owner changed from Micah to mwlinnem@IU.EDU.

10/29/07 11:35:15 changed by mwlinnem@IU.EDU

  • status changed from new to assigned.

10/29/07 11:37:45 changed by mwlinnem@IU.EDU

  • owner changed from mwlinnem@IU.EDU to Micah.
  • status changed from assigned to new.

10/29/07 12:19:53 changed by mwlinnem@IU.EDU

  • owner changed from Micah to mwlinnem@IU.EDU.
  • status changed from new to assigned.

10/29/07 12:20:13 changed by mwlinnem@IU.EDU

  • owner changed from mwlinnem@IU.EDU to Micah.
  • status changed from assigned to new.

10/29/07 12:20:53 changed by mwlinnem@IU.EDU

Implementation completed. Awaiting user feedback.

10/29/07 12:25:00 changed by mwlinnem@IU.EDU

  • owner changed from Micah to mwlinnem@IU.EDU.

10/29/07 12:25:19 changed by mwlinnem@IU.EDU

  • status changed from new to assigned.

10/29/07 14:08:05 changed by mwlinnem@IU.EDU

Stage: Awaiting user feedback

Description: The user requirements are now fully implemented (and more).

Next Step: Awaiting user feedback to determine whether the feature is adequately complete, so we can move on to the documentation phase.

10/29/07 14:08:16 changed by mwlinnem@IU.EDU

  • priority changed from highest to high.
  • severity changed from critical to major.

11/12/07 11:55:50 changed by mwlinnem@IU.EDU

Stage: User feedback received

Description: The following feedback was received on the current implementation:

Please keep the following printout in the console window:

    =======================
    Loaded: /home/mwlinnem/isi/milgrams.isi
    Normalizing author names...
    Removing duplicate publications...
    ========================

After that, please do NOT print the long log that displays information about the removed duplicate publications.

Replace the following summary printout at the end:

    =============================
    191 out of 595 publication records were duplicates
    Saved: /home/mwlinnem/milgrams.csv
    =============================

with the following sequence:

    Loaded # records
    Deleted # duplicate records and saved them in xxx.csv
    Saved # records in xxx.csv

For "Deleted # duplicate records and saved them in xxx.csv", it would be better to add one more column to the output CSV file: at the end of each record, the new column lists the reason for deletion, such as a smaller # of CT, identical, etc.
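A rough sketch of the requested reason column, assuming duplicates share an ID and the record with the larger # of CT is kept. Publication, ctCount, csvLine, DuplicateRemover, and the reason strings are hypothetical names chosen for illustration. The sketch does not parse real ISI fields and is not the remover's actual logic. It uses Java records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: removes duplicate IDs and records why each dropped row was deleted. */
final class DuplicateRemover {
    /** Illustrative stand-in for one publication; ctCount plays the role of "# of CT". */
    record Publication(String id, int ctCount, String csvLine) {}

    /** A deleted row plus the extra trailing "reason" column the feedback asks for. */
    record Deletion(Publication publication, String reason) {
        String toCsvLine() {
            return publication.csvLine() + "," + reason;
        }
    }

    final List<Publication> kept = new ArrayList<>();
    final List<Deletion> deleted = new ArrayList<>();

    void run(List<Publication> input) {
        // Preserves first-seen order of IDs; the value is the record currently kept for that ID.
        Map<String, Publication> keptById = new LinkedHashMap<>();
        for (Publication p : input) {
            Publication prev = keptById.get(p.id());
            if (prev == null) {
                keptById.put(p.id(), p);
            } else if (prev.equals(p)) {
                deleted.add(new Deletion(p, "identical"));
            } else if (p.ctCount() > prev.ctCount()) {
                keptById.put(p.id(), p); // the newcomer wins, so the earlier record is deleted
                deleted.add(new Deletion(prev, "smaller # of CT"));
            } else if (p.ctCount() < prev.ctCount()) {
                deleted.add(new Deletion(p, "smaller # of CT"));
            } else {
                // Tie that is not an exact copy; no comma, so the CSV column count stays fixed.
                deleted.add(new Deletion(p, "same # of CT; kept first seen"));
            }
        }
        kept.addAll(keptById.values());
    }
}
```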

Next Step: Reimplementing based on feedback

11/12/07 14:09:10 changed by mwlinnem@IU.EDU

A subtask of the ISI Co-authorship network pipeline
(this should be at the top but oh well)

11/14/07 13:04:15 changed by mwlinnem@IU.EDU

Stage: Re-implementation

Description:
After looking into this for a while, I realised there had been a misunderstanding. The ISI duplicate remover has in_data type prefuse.data.Table and out_data type prefuse.data.Table; nothing is saved to the file system. It therefore does not make sense to save the duplicate records out as a CSV file, or to talk about anything being "saved". Instead, I will write the log information (for example, that a pair of publications with identical IDs was found and one was deleted, and why) to a file, and keep a simple console message stating how many records were loaded, deleted, and saved.
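A minimal sketch of that approach, assuming the remover has already collected one human-readable line per deleted record. DuplicateReport, write, and the message wording are hypothetical and only illustrate the split between a detailed log file and a short console summary; "Kept" is used instead of "Saved" because the table is never written to disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: detailed duplicate log goes to a file, only a short summary to the console. */
final class DuplicateReport {
    /**
     * Writes one line per deleted duplicate to logFile and returns the short console message.
     * Each log line is assumed to already say which pair was found and why one was deleted.
     */
    static String write(Path logFile, int loadedCount, List<String> deletionLog) throws IOException {
        Files.write(logFile, deletionLog, StandardCharsets.UTF_8);
        int deletedCount = deletionLog.size();
        return String.format(
                "Loaded %d records.\nDeleted %d duplicate records (details in %s).\nKept %d records.",
                loadedCount, deletedCount, logFile, loadedCount - deletedCount);
    }
}
```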

Next Step:
Finish up implementation, taking into account what I have said above.

11/14/07 13:31:07 changed by mwlinnem@IU.EDU

Stage: Implementation Completed

Description:
What was implemented is significantly different from what was requested, but in light of my previous comment I think the current implementation should be adequate.

Next Step:
User feedback

12/03/07 11:24:56 changed by mwlinnem@IU.EDU

Stage:
User feedback received

Description:
Implementation was reviewed by Katy, who approved it after a few minor changes.

Next Step:
Documentation

08/13/08 12:42:46 changed by mwlinnem@IU.EDU

  • status changed from assigned to closed.
  • resolution set to fixed.

Documented, and reviewed by many people in the process of writing a paper based on it. Closing.