active_learning
- class deduplipy.active_learning.ActiveStringMatchLearner(col_names: List[str], interaction: bool = False, uncertainty_threshold: float = 0.1, verbose: Union[int, bool] = 0, uncertainty_improvement_threshold: float = 0.01, min_nr_entries: int = 10)
Bases: object
Class to train a string matching model using active learning.
- Parameters
col_names – names of the columns to use for matching
interaction – whether to include interaction features
uncertainty_threshold – threshold on the classifier's uncertainty during active learning; used to determine whether the model has converged
uncertainty_improvement_threshold – threshold on the improvement in the classifier's uncertainty during active learning; used to determine whether the model has converged
verbose – verbosity level
min_nr_entries – minimum number of responses required before classifier convergence is tested
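The parameter descriptions above do not spell out the exact convergence rule. The sketch below shows one plausible way `uncertainty_threshold`, `uncertainty_improvement_threshold` and `min_nr_entries` could combine; the helper `has_converged` and the assumption that one uncertainty value is recorded per user response are hypothetical and not taken from deduplipy's source.

```python
from typing import List


def has_converged(
    uncertainties: List[float],
    uncertainty_threshold: float = 0.1,
    uncertainty_improvement_threshold: float = 0.01,
    min_nr_entries: int = 10,
) -> bool:
    """Hypothetical convergence rule; not deduplipy's actual implementation.

    uncertainties[i] is assumed to be the classifier's uncertainty on the
    pair queried in labelling round i (one entry per user response).
    """
    # Skip the test until enough responses exist (at least two are needed
    # to measure an improvement).
    if len(uncertainties) < max(min_nr_entries, 2):
        return False
    latest = uncertainties[-1]
    # Converged if the classifier is already confident enough ...
    if latest < uncertainty_threshold:
        return True
    # ... or if the latest response barely changed the uncertainty.
    improvement = uncertainties[-2] - latest
    return abs(improvement) < uncertainty_improvement_threshold
```

Comparing only the last two rounds keeps the sketch short; a real implementation might average over several rounds to smooth out noisy labels.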
- fit(X: pandas.core.frame.DataFrame) → deduplipy.active_learning.active_learning.ActiveStringMatchLearner
Fit the ActiveStringMatchLearner instance on pairs of strings.
- Parameters
X – pandas DataFrame containing pairs of strings
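Before a classifier can learn from pairs of strings, each pair has to be turned into numeric features. The sketch below is a minimal illustration under several assumptions: pairs are plain dicts with hypothetical `<col>_1`/`<col>_2` keys, the standard library's `difflib.SequenceMatcher.ratio` stands in for whatever similarity metrics deduplipy actually computes, and `interaction=True` is modelled as adding pairwise products of the per-column scores, which is one common construction rather than a confirmed description of the library.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    # Stand-in string similarity in [0, 1]; 1.0 means identical strings.
    return SequenceMatcher(None, a, b).ratio()


def pair_features(
    pair: Dict[str, str], col_names: List[str], interaction: bool = False
) -> List[float]:
    # One similarity score per matching column. The "<col>_1"/"<col>_2"
    # key convention is hypothetical.
    base = [similarity(pair[f"{col}_1"], pair[f"{col}_2"]) for col in col_names]
    if not interaction:
        return base
    # Interaction features (assumed form): products of every pair of
    # per-column scores.
    return base + [x * y for x, y in combinations(base, 2)]


pair = {"name_1": "john smith", "name_2": "jon smith",
        "city_1": "berlin", "city_2": "berlin"}
print(pair_features(pair, ["name", "city"], interaction=True))
```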
- predict(X: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → numpy.ndarray
Predict whether each pair in new data is a match or not.
- Parameters
X – pandas DataFrame or NumPy array to predict on
- Returns
NumPy array of match predictions
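For a scikit-learn-style binary classifier, the hard prediction for a pair usually follows from its match probability: the pair counts as a match when that probability is strictly above 0.5. The hypothetical helper below illustrates that relationship on plain Python lists; whether deduplipy's `predict` applies exactly this cutoff is an assumption.

```python
from typing import List


def predict_from_proba(match_proba: List[float], threshold: float = 0.5) -> List[int]:
    # 1 = match, 0 = no match. A pair counts as a match only when its
    # probability is strictly above the (assumed) cutoff.
    return [int(p > threshold) for p in match_proba]
```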
- predict_proba(X: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → numpy.ndarray
Predict, for new data, the probability that each pair is a match.
- Parameters
X – pandas DataFrame or NumPy array to predict on
- Returns
NumPy array of match probabilities
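In an uncertainty-sampling active learner, match probabilities like these decide which pair to show the user next: the one the model is least sure about. The sketch below assumes binary uncertainty is measured as 1 - max(p, 1 - p), a common choice in uncertainty-sampling implementations; the helper names are hypothetical and not part of deduplipy.

```python
from typing import List


def uncertainty(p_match: float) -> float:
    # Binary uncertainty in [0, 0.5]: 0.0 when the model is certain,
    # 0.5 when the match probability is exactly 0.5.
    return 1.0 - max(p_match, 1.0 - p_match)


def most_uncertain_index(match_proba: List[float]) -> int:
    # Uncertainty sampling: return the index of the pair whose match
    # probability is closest to 0.5 (ties go to the first such pair).
    return max(range(len(match_proba)), key=lambda i: uncertainty(match_proba[i]))
```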