Data Set: Cora


Proposed by: Andrew McCallum

Added on: 20 November 2016.

Tags: bibliographic, local, public, external


Description

The Cora data contains bibliographic records of machine learning papers that have been manually clustered into groups that refer to the same publication.

Originally, Cora was prepared by Andrew McCallum, and his versions of this data set are available on his Data web page. The data is also hosted here in the DLRep.

Note that various versions of the Cora data set have been used in many publications on record linkage and entity resolution over the years.

Files

cora.csv

The Cora version held locally in the DLRep is a comma-separated values (CSV) file, as downloaded from the SecondString open source package for approximate string matching.

Note that the second column (field/attribute) contains the entity identifiers (publication identifiers).
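
As a minimal sketch of how the entity identifiers could be used, the Python snippet below groups records by their second column, so that each group holds the records referring to the same publication. The function name group_by_entity, its parameters, and the sample rows are hypothetical and for illustration only. Whether cora.csv has a header row is not stated here, so the sketch assumes it does not and leaves this as an option.

```python
import csv
import io
from collections import defaultdict


def group_by_entity(lines, id_column=1, has_header=False):
    """Group CSV records by the entity identifier in `id_column` (0-based).

    `lines` is any iterable of text lines, e.g. an open file object.
    The default id_column=1 is the second column, as described above.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # assumption: skip one header row if present
    clusters = defaultdict(list)
    for row in reader:
        if len(row) <= id_column:
            continue  # skip blank or too-short lines
        clusters[row[id_column]].append(row)
    return dict(clusters)


# Made-up rows for illustration only; these are NOT real Cora records.
sample = io.StringIO(
    "r1,pubA,some title\n"
    "r2,pubB,another title\n"
    "r3,pubA,some title (variant)\n"
)
clusters = group_by_entity(sample)
for entity_id, records in sorted(clusters.items()):
    print(entity_id, len(records))
```

For the actual file, the same function could be called on a file opened with open("cora.csv", newline=""), adjusting has_header after inspecting the first line.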




License: CC BY 4.0