Package org.tribuo.dataset
Class SelectedFeatureDataset<T extends Output<T>>
java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
org.tribuo.dataset.SelectedFeatureDataset<T>
- Type Parameters:
T - The type of the outputs in this Dataset.
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>, ProtoSerializable<org.tribuo.protos.core.DatasetProto>
This class creates a pruned dataset which contains only the selected features.
The new dataset may have fewer examples than the original: any example left
with no features after the feature selection has been applied is not added
to the constructed dataset.
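The pruning behaviour described above can be modelled in a few lines of plain Java. This is a stand-alone sketch, not Tribuo code: `PruningSketch`, `Pruned`, and `prune` are hypothetical names, and examples are represented as simple feature-name-to-value maps rather than Tribuo `Example` objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-alone model of the pruning behaviour; none of these names are Tribuo classes.
final class PruningSketch {
    /** Holds the surviving examples and the number of examples that were dropped. */
    static final class Pruned {
        final List<Map<String, Double>> examples = new ArrayList<>();
        int numExamplesRemoved = 0;
    }

    /** Keeps only the selected features in each example and drops examples left empty. */
    static Pruned prune(List<Map<String, Double>> examples, Set<String> selected) {
        Pruned result = new Pruned();
        for (Map<String, Double> example : examples) {
            Map<String, Double> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Double> feature : example.entrySet()) {
                if (selected.contains(feature.getKey())) {
                    kept.put(feature.getKey(), feature.getValue());
                }
            }
            if (kept.isEmpty()) {
                // No selected feature is present, so the example is not added.
                result.numExamplesRemoved++;
            } else {
                result.examples.add(kept);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> data = List.of(
            Map.of("a", 1.0, "b", 2.0),
            Map.of("c", 3.0),
            Map.of("b", 4.0, "c", 5.0));
        Pruned pruned = prune(data, Set.of("a", "b"));
        System.out.println(pruned.examples.size() + " kept, "
            + pruned.numExamplesRemoved + " removed");
    }
}
```

The removed-example counter in the sketch plays the role that getNumExamplesRemoved() plays on this class.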
Nested Class Summary
-
Field Summary
static final int CURRENT_VERSION - Protobuf serialization version.
Fields inherited from class org.tribuo.ImmutableDataset
dropInvalidExamples, featureIDMap, outputIDInfo
Fields inherited from class org.tribuo.Dataset
data, indices, outputFactory, sourceProvenance, tribuoVersion
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet) - Constructs a selected feature dataset using all the features in the supplied feature set.
SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet, int k) - Constructs a selected feature dataset.
-
Method Summary
static SelectedFeatureDataset<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) - Deserialization factory.
getFeatureSet() - The feature set.
int getK() - The number of features to use.
int getNumExamplesRemoved() - The number of examples removed due to a lack of features.
getSelectedFeatures() - The subset of the selected feature set.
org.tribuo.protos.core.DatasetProto serialize() - Serializes this object to a protobuf.
Methods inherited from class org.tribuo.ImmutableDataset
add, add, copyDataset, copyDataset, copyDataset, getDropInvalidExamples, getFeatureIDMap, getFeatureMap, getOutputIDInfo, getOutputInfo, getOutputs, hashFeatureMap, toString
Methods inherited from class org.tribuo.Dataset
castDataset, createDataCarrier, createDataCarrier, createTransformers, createTransformers, deserialize, deserializeExamples, deserializeFromFile, deserializeFromStream, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, serializeToFile, serializeToStream, shuffle, size, validate
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
Constructor Details
-
SelectedFeatureDataset
Constructs a selected feature dataset using all the features in the supplied feature set.
- Parameters:
dataset - The dataset to copy.
featureSet - The feature set to use.
-
SelectedFeatureDataset
Constructs a selected feature dataset.
- Parameters:
dataset - This dataset is left untouched and is used to populate the constructed dataset.
featureSet - The feature set to use.
k - Use the top k features if the feature set is ordered, or FeatureSelector.SELECT_ALL to select all of them. Throws IllegalArgumentException if the feature set is unordered and k is set to a positive value.
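The documented rules for k can be sketched as a small stand-alone Java method. This is an illustration rather than the Tribuo implementation: `TopKSketch` and `selectFeatures` are hypothetical names, the local `SELECT_ALL` constant is assumed to be -1 (as the getK() description suggests), and the handling of zero, other negative values, and a k larger than the feature set are assumptions, since this page does not describe them.

```java
import java.util.List;

// Stand-alone model of the documented k semantics; not Tribuo code.
final class TopKSketch {
    // Assumed to mirror FeatureSelector.SELECT_ALL (-1, per the getK() description).
    static final int SELECT_ALL = -1;

    /** Returns the feature names that would be used for the given k. */
    static List<String> selectFeatures(List<String> featureNames, boolean ordered, int k) {
        if (k == SELECT_ALL) {
            // All features are used, whether or not the set is ordered.
            return List.copyOf(featureNames);
        }
        if (k <= 0) {
            // Assumption: the page does not cover these values, so the sketch rejects them.
            throw new IllegalArgumentException("k must be positive or SELECT_ALL, found " + k);
        }
        if (!ordered) {
            // Documented: a positive k on an unordered feature set is an error.
            throw new IllegalArgumentException(
                "Cannot take the top " + k + " features of an unordered feature set");
        }
        // Assumption: a k larger than the feature set selects every feature.
        int limit = Math.min(k, featureNames.size());
        return List.copyOf(featureNames.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> names = List.of("f1", "f2", "f3");
        System.out.println(selectFeatures(names, true, 2));
        System.out.println(selectFeatures(names, false, SELECT_ALL));
    }
}
```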
-
-
Method Details
-
deserializeFromProto
public static SelectedFeatureDataset<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
- The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
getNumExamplesRemoved
public int getNumExamplesRemoved()
The number of examples removed due to a lack of features.
- Returns:
- The number of removed examples.
-
getK
public int getK()
The number of features to use. -1 signals that all features in the supplied feature set were used.
- Returns:
- The number of features to use.
-
getFeatureSet
The feature set.
- Returns:
- The feature set.
-
getSelectedFeatures
The subset of the selected feature set.
- Returns:
- The used subset of the selected feature set.
-
getProvenance
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<T extends Output<T>>
- Overrides:
getProvenance in class ImmutableDataset<T extends Output<T>>
-
serialize
public org.tribuo.protos.core.DatasetProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Specified by:
serialize in interface ProtoSerializable<T extends Output<T>>
- Overrides:
serialize in class ImmutableDataset<T extends Output<T>>
- Returns:
- The protobuf.
-