load_data

deduplipy.datasets.load_data(kind: str = 'voters') → pandas.core.frame.DataFrame

Load data for experimentation. kind can be ‘stoxx50’ or ‘voters’.

The Stoxx 50 data were created by the developer of DedupliPy. The voters data are based on the North Carolina voter registry; this dataset is provided by Prof. Erhard Rahm (‘Comparative Evaluation of Distributed Clustering Schemes for Multi-source Entity Resolution’).

Args:

kind: which dataset to load, either ‘stoxx50’ or ‘voters’ (default: ‘voters’)

Returns:

A pandas DataFrame containing the experimentation dataset
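
Example:

A minimal usage sketch, assuming DedupliPy is installed. The wrapper load_example and the constant VALID_KINDS are hypothetical names introduced here for illustration and are not part of DedupliPy. The documentation above does not say what happens when kind has any other value, so the wrapper rejects such values explicitly before calling load_data.

```python
# Hypothetical helper around deduplipy.datasets.load_data (not part of DedupliPy).
VALID_KINDS = ('stoxx50', 'voters')  # the two values documented above


def load_example(kind='voters'):
    """Return the requested experimentation dataset as a pandas DataFrame."""
    if kind not in VALID_KINDS:
        # Assumption: fail early, since the behaviour for other values is undocumented.
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    # Imported lazily so the validation above works even without DedupliPy installed.
    from deduplipy.datasets import load_data
    return load_data(kind=kind)
```

Calling load_example('stoxx50') or load_example() would then return the corresponding DataFrame, which can be inspected with the usual pandas methods such as head().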