Class MinimumCardinalityDataset<T extends Output<T>>

java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
org.tribuo.dataset.MinimumCardinalityDataset<T>
Type Parameters:
T - The type of the outputs in this Dataset.
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>

public class MinimumCardinalityDataset<T extends Output<T>> extends ImmutableDataset<T>
This class creates a pruned dataset from which features that occur fewer times than the provided minimum cardinality have been removed. This is useful when a dataset is very large because it contains many low-frequency features; for example, this class can remove rare words from a bag-of-words (BoW) formatted dataset. A new dataset is created so that the feature counts are recalculated and the original, passed-in dataset is not modified. The returned dataset may contain fewer examples than the original: any example left with no features after the minimum cardinality has been applied is not added to the constructed dataset.
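The pruning described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Tribuo's implementation: each example is modelled as a hypothetical `Map<String, Double>` of feature names to values rather than a Tribuo `Example`, a feature's frequency is assumed to be the number of examples it appears in, and the names `MinCardinalitySketch` and `prune` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java sketch of minimum-cardinality pruning.
 * Examples are feature-name -> value maps (an assumption), not Tribuo Examples.
 */
public class MinCardinalitySketch {

    /**
     * Returns a new list in which every feature occurring in fewer than
     * minCardinality examples is removed; examples left empty are dropped.
     * The input list and its maps are never modified.
     */
    public static List<Map<String, Double>> prune(List<Map<String, Double>> examples, int minCardinality) {
        // Pass 1: count the number of examples each feature appears in.
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Double> example : examples) {
            for (String feature : example.keySet()) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        // Pass 2: copy each example, keeping only features that meet the threshold.
        List<Map<String, Double>> pruned = new ArrayList<>();
        for (Map<String, Double> example : examples) {
            Map<String, Double> copy = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : example.entrySet()) {
                if (counts.get(e.getKey()) >= minCardinality) {
                    copy.put(e.getKey(), e.getValue());
                }
            }
            // An example with no surviving features is not added.
            if (!copy.isEmpty()) {
                pruned.add(copy);
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> docs = List.of(
                Map.of("the", 1.0, "cat", 1.0),
                Map.of("the", 2.0, "dog", 1.0),
                Map.of("aardvark", 1.0));
        List<Map<String, Double>> result = prune(docs, 2);
        System.out.println(result.size()); // prints 2
        System.out.println(result);        // prints [{the=1.0}, {the=2.0}]
    }
}
```

With `minCardinality = 2`, only `the` (present in two examples) survives, and the third example is dropped entirely because `aardvark` was its only feature. Building new maps instead of editing the inputs mirrors the documented guarantee that the passed-in dataset is left unmodified.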
  • Constructor Details

    • MinimumCardinalityDataset

      public MinimumCardinalityDataset(Dataset<T> dataset, int minCardinality)
      Parameters:
      dataset - this dataset is left untouched and is used to populate the constructed dataset.
      minCardinality - features with a frequency less than minCardinality will be removed.
  • Method Details