Class MutableSequenceDataset<T extends Output<T>>
java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
org.tribuo.sequence.MutableSequenceDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<SequenceExample<T>>, ProtoSerializable<org.tribuo.protos.core.SequenceDatasetProto>
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap that grows over time: the feature map is updated whenever a SequenceExample is added to the dataset.
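To make "grows over time" concrete, here is a minimal sketch in plain Java. It is a toy model, not Tribuo's implementation: the class GrowingDatasetSketch, its string-based feature representation, and its method names are all hypothetical and chosen only for illustration. It shows a feature map that gains entries each time a sequence is added and is emptied by a clear operation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: GrowingDatasetSketch is a hypothetical class, not part of Tribuo.
final class GrowingDatasetSketch {
    // Feature name -> number of occurrences seen so far.
    private final Map<String, Integer> featureCounts = new LinkedHashMap<>();
    // Each sequence is a list of steps; each step is a list of feature names.
    private final List<List<List<String>>> sequences = new ArrayList<>();

    // Stores the sequence and records every feature name it contains.
    void add(List<List<String>> sequence) {
        for (List<String> step : sequence) {
            for (String name : step) {
                featureCounts.merge(name, 1, Integer::sum);
            }
        }
        sequences.add(sequence);
    }

    // Removes all sequences and forgets all recorded features.
    void clear() {
        sequences.clear();
        featureCounts.clear();
    }

    int size() {
        return sequences.size();
    }

    int numFeatures() {
        return featureCounts.size();
    }

    public static void main(String[] args) {
        GrowingDatasetSketch dataset = new GrowingDatasetSketch();
        dataset.add(List.of(List.of("word=the", "cap=false"), List.of("word=cat", "cap=false")));
        System.out.println(dataset.numFeatures()); // 3 distinct names so far
        dataset.add(List.of(List.of("word=Paris", "cap=true")));
        System.out.println(dataset.numFeatures()); // 5: two new names were added
        dataset.clear();
        System.out.println(dataset.size() + " " + dataset.numFeatures()); // 0 0
    }
}
```

In the real class the feature map also tracks per-feature statistics and the output info is updated alongside it; the sketch models only the growth of the set of feature names.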
-
Field Summary
Fields
Modifier and Type | Field | Description
static final int | CURRENT_VERSION | Protobuf serialization version.
protected boolean | dense | Does this dataset have a dense feature space.
protected final MutableFeatureMap | featureMap | A map from feature names to IDs for the features found in this dataset.
protected final MutableOutputInfo<T> | outputInfo | A map from labels to IDs for the labels found in this dataset.
Fields inherited from class org.tribuo.sequence.SequenceDataset
data, outputFactory, sourceProvenance, tribuoVersion
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
Constructors
Constructor | Description
MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates a dataset from a data source.
MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates an empty sequence dataset.
MutableSequenceDataset(ImmutableSequenceDataset<T> dataset) | Copies the immutable dataset into a mutable dataset.
MutableSequenceDataset(SequenceDataSource<T> dataSource) | Builds a dataset from the supplied data source.
-
Method Summary
Modifier and Type | Method | Description
void | add(SequenceExample<T> ex) | Adds a SequenceExample to this dataset.
void | addAll(Collection<SequenceExample<T>> collection) | Adds all the SequenceExamples in the supplied collection to this dataset.
void | clear() | Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
void | densify() | Iterates through the examples, converting implicit zeros into explicit zeros.
static MutableSequenceDataset<?> | deserializeFromProto(int version, String className, com.google.protobuf.Any message) | Deserialization factory.
ImmutableFeatureMap | getFeatureIDMap() | An immutable view on the feature map.
FeatureMap | getFeatureMap() | The feature map.
ImmutableOutputInfo<T> | getOutputIDInfo() | An immutable view on the output info in this dataset.
OutputInfo<T> | getOutputInfo() | The output info in this dataset.
Set<T> | getOutputs() | Gets the set of labels that occur in the examples in this dataset.
DatasetProvenance | getProvenance() |
boolean | isDense() | Is the dataset dense (i.e., do all features in the domain have a value in each example).
org.tribuo.protos.core.SequenceDatasetProto | serialize() | Serializes this object to a protobuf.
String | toString() |
Methods inherited from class org.tribuo.sequence.SequenceDataset
castDataset, createDataCarrier, deserialize, deserializeExamples, deserializeFromFile, deserializeFromStream, getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, serializeToFile, serializeToStream, size, validate
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
outputInfo
A map from labels to IDs for the labels found in this dataset. -
featureMap
A map from feature names to IDs for the features found in this dataset. -
dense
protected boolean dense
Does this dataset have a dense feature space.
-
-
Constructor Details
-
MutableSequenceDataset
Creates an empty sequence dataset.
- Parameters:
sourceProvenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
public MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates a dataset from a data source. This constructor creates the output and feature ID maps that are needed for training and evaluating classifiers.
- Parameters:
dataSource - The input data.
sourceProvenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
Builds a dataset from the supplied data source.
- Parameters:
dataSource - The data source.
-
MutableSequenceDataset
Copies the immutable dataset into a mutable dataset.
This should be used infrequently and exists mostly for the ViterbiTrainer.
- Parameters:
dataset - The dataset to copy.
-
-
Method Details
-
deserializeFromProto
public static MutableSequenceDataset<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
- The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
clear
public void clear()
Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
-
add
Adds a SequenceExample to this dataset.
It also canonicalises the reference to each feature's name (i.e., replacing the reference to a feature's name with the canonical one stored in this Dataset's VariableInfo). This greatly reduces the memory footprint.
- Parameters:
ex - The example to add.
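The canonicalisation step described for add can be pictured with a small plain-Java sketch. This is an illustration under assumptions, not Tribuo's code: NameCanonicaliserSketch and its canonicalise method are hypothetical names, and a HashMap stands in for the lookup that the dataset performs against its stored VariableInfo objects. The idea is that many equal name strings are replaced by references to a single shared instance, so the duplicates can be garbage collected.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NameCanonicaliserSketch is hypothetical, not a Tribuo class.
final class NameCanonicaliserSketch {
    // Maps each distinct name to the single String instance kept for it.
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the stored instance equal to name, storing name if it is new.
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    public static void main(String[] args) {
        NameCanonicaliserSketch names = new NameCanonicaliserSketch();
        // Two equal names held in two separate String objects.
        String first = new String("word=the");
        String second = new String("word=the");
        System.out.println(first == second); // false: distinct objects
        String a = names.canonicalise(first);
        String b = names.canonicalise(second);
        System.out.println(a == b); // true: both refer to one shared object
    }
}
```

The saving matters most when the same feature name appears in many examples, since each occurrence then costs one reference rather than one full string.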
-
addAll
Adds all the SequenceExamples in the supplied collection to this dataset.
- Parameters:
collection - The collection of SequenceExamples.
-
getOutputs
Description copied from class: SequenceDataset
Gets the set of labels that occur in the examples in this dataset.
- Specified by:
getOutputs in class SequenceDataset<T extends Output<T>>
- Returns:
- the set of labels that occur in the examples in this dataset.
-
getFeatureIDMap
Description copied from class: SequenceDataset
An immutable view on the feature map.
- Specified by:
getFeatureIDMap in class SequenceDataset<T extends Output<T>>
- Returns:
- The feature map.
-
getFeatureMap
Description copied from class: SequenceDataset
The feature map.
- Specified by:
getFeatureMap in class SequenceDataset<T extends Output<T>>
- Returns:
- The feature map.
-
getOutputIDInfo
Description copied from class: SequenceDataset
An immutable view on the output info in this dataset.
- Specified by:
getOutputIDInfo in class SequenceDataset<T extends Output<T>>
- Returns:
- The output info.
-
getOutputInfo
Description copied from class: SequenceDataset
The output info in this dataset.
- Specified by:
getOutputInfo in class SequenceDataset<T extends Output<T>>
- Returns:
- The output info.
-
isDense
public boolean isDense()
Is the dataset dense (i.e., do all features in the domain have a value in each example).
- Returns:
- True if the dataset is dense.
-
densify
public void densify()
Iterates through the examples, converting implicit zeros into explicit zeros.
-
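As a rough illustration of what isDense() and densify() mean, the following plain-Java sketch uses a simplified representation in which each example is a map from feature name to value, and a feature missing from the map is an implicit zero. DensifySketch and its helper methods are hypothetical names, the examples are flat rather than grouped into sequences, and the feature domain is recomputed from the examples rather than read from a feature map, so this is a model of the idea and not Tribuo's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: DensifySketch is hypothetical, not a Tribuo class.
final class DensifySketch {
    // The feature domain: every name that appears in at least one example.
    static Set<String> domain(List<Map<String, Double>> examples) {
        Set<String> names = new TreeSet<>();
        for (Map<String, Double> example : examples) {
            names.addAll(example.keySet());
        }
        return names;
    }

    // Dense means every example stores a value for every name in the domain.
    static boolean isDense(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            if (!example.keySet().containsAll(names)) {
                return false;
            }
        }
        return true;
    }

    // Stores an explicit 0.0 wherever an example omits a domain feature.
    static void densify(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            for (String name : names) {
                example.putIfAbsent(name, 0.0);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Double>> examples = new ArrayList<>();
        examples.add(new TreeMap<String, Double>(Map.of("a", 1.0)));
        examples.add(new TreeMap<String, Double>(Map.of("b", 2.0)));
        System.out.println(isDense(examples)); // false
        densify(examples);
        System.out.println(examples); // [{a=1.0, b=0.0}, {a=0.0, b=2.0}]
        System.out.println(isDense(examples)); // true
    }
}
```

Densifying trades memory for uniformity: every example ends up with the same feature set, which is convenient for code that expects fixed-width inputs.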
toString
-
getProvenance
-
serialize
public org.tribuo.protos.core.SequenceDatasetProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Returns:
- The protobuf.
-