clustering¶

deduplipy.clustering.hierarchical_clustering(scored_pairs_table: pandas.core.frame.DataFrame, col_names: List, cluster_threshold: float = 0.5) → pandas.core.frame.DataFrame¶

Apply hierarchical clustering to scored_pairs_table and perform the actual deduplication by adding a cluster id to each record