Class SelectedFeatureDataset<T extends Output<T>>

java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
org.tribuo.dataset.SelectedFeatureDataset<T>
Type Parameters:
T - The type of the outputs in this Dataset.
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>, ProtoSerializable<org.tribuo.protos.core.DatasetProto>

public final class SelectedFeatureDataset<T extends Output<T>> extends ImmutableDataset<T>
This class creates a pruned dataset which contains only the selected features. The new dataset may have fewer examples than the source: any example left with no features after the feature selection has been applied is not added to the constructed dataset.
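The pruning behaviour described above can be sketched in plain Java. This is a simplified model, not Tribuo's implementation: examples are represented as maps from feature name to value, and `FeaturePruningSketch`, `Pruned` and `prune` are hypothetical names introduced only for illustration. The counter plays the role that `getNumExamplesRemoved()` reports on the real class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for the pruning step; hypothetical, not Tribuo code. */
final class FeaturePruningSketch {

    /** Holds the surviving examples and the count of dropped ones. */
    static final class Pruned {
        final List<Map<String, Double>> examples = new ArrayList<>();
        int numExamplesRemoved = 0;
    }

    /**
     * Copies each example keeping only the selected features, and drops
     * any example that is left with no features at all. The source list
     * and its maps are never modified.
     */
    static Pruned prune(List<Map<String, Double>> source, Set<String> selected) {
        Pruned result = new Pruned();
        for (Map<String, Double> example : source) {
            Map<String, Double> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : example.entrySet()) {
                if (selected.contains(e.getKey())) {
                    kept.put(e.getKey(), e.getValue());
                }
            }
            if (kept.isEmpty()) {
                result.numExamplesRemoved++; // nothing left, so the example is dropped
            } else {
                result.examples.add(kept);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> source = List.of(
            Map.of("a", 1.0, "b", 2.0),
            Map.of("c", 3.0),            // has no selected feature
            Map.of("b", 4.0, "c", 5.0));
        Pruned pruned = prune(source, Set.of("a", "b"));
        // prints "2 kept, 1 removed"
        System.out.println(pruned.examples.size() + " kept, "
            + pruned.numExamplesRemoved + " removed");
    }
}
```

The second example contains only the unselected feature `c`, so it is the one that disappears from the pruned copy.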
  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
  • Constructor Details

    • SelectedFeatureDataset

      public SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet)
      Constructs a selected feature dataset using all the features in the supplied feature set.
      Parameters:
      dataset - The dataset to copy.
      featureSet - The feature set to use.
    • SelectedFeatureDataset

      public SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet, int k)
      Constructs a selected feature dataset using the top k features in the supplied feature set.
      Parameters:
      dataset - This dataset is left untouched and is used to populate the constructed dataset.
      featureSet - The feature set to use.
      k - The number of features to use: the top k features if the feature set is ordered, or FeatureSelector.SELECT_ALL to select all of them. Throws IllegalArgumentException if the feature set is unordered and k is set to a positive value.
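The rule for k can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `TopKSketch` and `resolve` are hypothetical names, `SELECT_ALL` is assumed to be -1 (based on the `getK()` note that -1 means all features were used), and rejecting other non-positive values and clamping k to the size of the feature set are choices made by this sketch that the documentation does not specify.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of how k picks the used subset; not Tribuo code. */
final class TopKSketch {

    /** Assumed value of FeatureSelector.SELECT_ALL (getK() documents -1 as "all"). */
    static final int SELECT_ALL = -1;

    /**
     * Returns the feature names that would be used for the given k.
     * featureNames is assumed to be ranked best-first when ordered is true.
     */
    static Set<String> resolve(List<String> featureNames, boolean ordered, int k) {
        if (k == SELECT_ALL) {
            return new LinkedHashSet<>(featureNames); // use every feature
        }
        if (k <= 0) {
            // Sketch-only choice: other non-positive values are rejected.
            throw new IllegalArgumentException("k must be positive or SELECT_ALL");
        }
        if (!ordered) {
            // Documented rule: a positive k needs a ranking to pick the top k from.
            throw new IllegalArgumentException("k = " + k + " requires an ordered feature set");
        }
        // Sketch-only choice: clamp k to the number of available features.
        int limit = Math.min(k, featureNames.size());
        return new LinkedHashSet<>(featureNames.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("f3", "f1", "f2");
        System.out.println(resolve(ranked, true, 2));           // prints [f3, f1]
        System.out.println(resolve(ranked, false, SELECT_ALL)); // prints [f3, f1, f2]
    }
}
```

An unordered feature set carries no ranking, which is why only `SELECT_ALL` is meaningful for it.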
  • Method Details

    • deserializeFromProto

      public static SelectedFeatureDataset<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • getNumExamplesRemoved

      public int getNumExamplesRemoved()
      The number of examples removed due to a lack of features.
      Returns:
      The number of removed examples.
    • getK

      public int getK()
      The number of features to use.

      -1 signals that all features in the supplied feature set were used.

      Returns:
      The number of features to use.
    • getFeatureSet

      public SelectedFeatureSet getFeatureSet()
      The feature set.
      Returns:
      The feature set.
    • getSelectedFeatures

      public Set<String> getSelectedFeatures()
      The subset of the selected feature set which was used to construct this dataset.
      Returns:
      The used subset of the selected feature set.
    • getProvenance

      public DatasetProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>
      Overrides:
      getProvenance in class ImmutableDataset<T extends Output<T>>
    • serialize

      public org.tribuo.protos.core.DatasetProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.protos.core.DatasetProto>
      Overrides:
      serialize in class ImmutableDataset<T extends Output<T>>
      Returns:
      The protobuf.