Data Set: Cora weight vectors


Proposed by: Peter Christen

Added on: 21 November 2016.

Tags: bibliographic, local, public, similarity


Description

This data set is a single file of weight vectors. Each vector holds the weights (similarities) calculated between the attributes of a pair of records from the Cora data set.

Files

cora-weight-vectors.csv.gz

The header line of this file contains the following attributes/fields:

rec_id1,rec_id2,Str-Exact-ID-ID,Q-Gram-AUTHORS-AUTHORS,Q-Gram-TITLE-TITLE,Q-Gram-VENUE-VENUE,Edit-Dist-YEAR-YEAR

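Each subsequent line holds one compared record pair. It starts with the two record identifiers (rec_id1 and rec_id2), followed by one weight (similarity value) per compared attribute. Each weight column is named after the comparison method and the pair of attributes it compares. For example, Q-Gram-TITLE-TITLE is the q-gram similarity between the TITLE attributes of the two records, and Edit-Dist-YEAR-YEAR is the edit-distance similarity between their YEAR attributes.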
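A minimal sketch for reading this file with the Python standard library follows. It assumes the first line is the header shown above and that every column after the two record identifiers holds a numeric weight. The function names and the two sample rows are hypothetical and made up for illustration; they are not taken from the data set. Record identifiers are kept as strings because their format is not specified here.

```python
import csv
import gzip
import io


def read_weight_vectors(lines):
    """Parse weight-vector CSV lines.

    Returns (weight_names, vectors), where vectors maps
    (rec_id1, rec_id2) -> list of float weights.
    Assumes all columns after the first two are numeric (an assumption).
    """
    reader = csv.reader(lines)
    header = next(reader)
    weight_names = header[2:]  # e.g. "Q-Gram-TITLE-TITLE"
    vectors = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec_id1, rec_id2 = row[0], row[1]
        vectors[(rec_id1, rec_id2)] = [float(w) for w in row[2:]]
    return weight_names, vectors


def load_weight_file(path="cora-weight-vectors.csv.gz"):
    """Read the gzipped file; text mode ("rt") yields decoded lines for csv."""
    with gzip.open(path, "rt", newline="") as f:
        return read_weight_vectors(f)


# Usage with two made-up rows (hypothetical values, not real data):
sample = io.StringIO(
    "rec_id1,rec_id2,Str-Exact-ID-ID,Q-Gram-AUTHORS-AUTHORS,"
    "Q-Gram-TITLE-TITLE,Q-Gram-VENUE-VENUE,Edit-Dist-YEAR-YEAR\n"
    "r1,r2,1.0,0.8,0.9,0.5,1.0\n"
    "r1,r3,0.0,0.1,0.2,0.0,0.75\n"
)
names, vecs = read_weight_vectors(sample)
```

Separating the parser from the file opener keeps the parsing logic testable on in-memory text. Loading the real file is then a single call to `load_weight_file()`.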

This weight vector file was generated using the Febrl (Freely Extensible Biomedical Record Linkage) system. The Python script used is:

febrl-cora-project.py
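To show what the weight columns contain, the sketch below implements common versions of the three comparison methods named in the header: exact string comparison, q-gram similarity and edit-distance similarity. It is not the Febrl code. The q-gram length (q = 2), the Dice-style normalization 2 * common / (len(qa) + len(qb)) and the edit-distance normalization 1 - dist / max(len(a), len(b)) are assumptions, and the settings used in febrl-cora-project.py may differ. The record dictionaries and their keys are hypothetical.

```python
from collections import Counter


def exact_sim(a, b):
    """Exact string comparison: 1.0 if identical, else 0.0."""
    return 1.0 if a == b else 0.0


def qgrams(s, q=2):
    """Overlapping q-grams of s, without padding (assumed setting)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def qgram_sim(a, b, q=2):
    """Dice-style q-gram similarity (assumed normalization)."""
    qa, qb = qgrams(a, q), qgrams(b, q)
    if not qa and not qb:  # both strings shorter than q
        return exact_sim(a, b)
    # Multiset intersection counts each shared q-gram at most min(count) times
    common = sum((Counter(qa) & Counter(qb)).values())
    return 2.0 * common / (len(qa) + len(qb))


def edit_dist(a, b):
    """Levenshtein distance via a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]


def edit_dist_sim(a, b):
    """Edit distance normalized to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_dist(a, b) / max(len(a), len(b))


def weight_vector(rec1, rec2):
    """Weights in the same column order as the file header.

    rec1/rec2 are hypothetical dicts with keys ID, AUTHORS, TITLE, VENUE, YEAR.
    """
    return [
        exact_sim(rec1["ID"], rec2["ID"]),
        qgram_sim(rec1["AUTHORS"], rec2["AUTHORS"]),
        qgram_sim(rec1["TITLE"], rec2["TITLE"]),
        qgram_sim(rec1["VENUE"], rec2["VENUE"]),
        edit_dist_sim(rec1["YEAR"], rec2["YEAR"]),
    ]
```

For example, `qgram_sim("night", "nacht")` shares one of eight bigrams and gives 0.25, and `edit_dist_sim("1994", "1995")` gives 1 - 1/4 = 0.75.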

Citation

@inproceedings{christen2015efficient,
  title={Efficient Entity Resolution with Adaptive and Interactive Training Data Selection},
  author={Christen, Peter and Vatsalan, Dinusha and Wang, Qing},
  booktitle={Data Mining (ICDM), 2015 IEEE International Conference on},
  pages={727--732},
  year={2015},
  organization={IEEE}
}

References

Efficient Entity Resolution with Adaptive and Interactive Training Data Selection, P Christen, D Vatsalan and Q Wang, 2015

Efficient Interactive Training Selection for Large-Scale Entity Resolution, Q Wang, D Vatsalan and P Christen, 2015




License: CC BY 4.0