Class MutableSequenceDataset<T extends Output<T>>
java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
org.tribuo.sequence.MutableSequenceDataset<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<SequenceExample<T>>
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap which grows over time: the feature map is updated whenever a SequenceExample is added to the dataset.
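The growth behaviour described above can be illustrated with a small JDK-only sketch. It does not use Tribuo: GrowingDatasetSketch and its methods are hypothetical stand-ins, a sequence is modelled as a list of steps, and each step as a list of feature names. The assumption is only that adding an example records its features, and that clearing the dataset discards the examples and the accumulated feature information together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a mutable sequence dataset; not a Tribuo class. */
final class GrowingDatasetSketch {
    // Each example is a sequence of steps; each step is a list of feature names.
    private final List<List<List<String>>> data = new ArrayList<>();
    // Feature name -> number of times it has been observed so far.
    private final Map<String, Integer> featureCounts = new LinkedHashMap<>();

    /** Stores the sequence and records every feature it contains. */
    void add(List<List<String>> sequence) {
        for (List<String> step : sequence) {
            for (String name : step) {
                featureCounts.merge(name, 1, Integer::sum);
            }
        }
        data.add(sequence);
    }

    /** Drops the examples and the accumulated feature statistics together. */
    void clear() {
        data.clear();
        featureCounts.clear();
    }

    int size() {
        return data.size();
    }

    int numFeatures() {
        return featureCounts.size();
    }

    public static void main(String[] args) {
        GrowingDatasetSketch ds = new GrowingDatasetSketch();
        ds.add(List.of(List.of("word=the", "cap=false"), List.of("word=cat", "cap=false")));
        System.out.println(ds.size() + " example(s), " + ds.numFeatures() + " feature(s)");
        ds.add(List.of(List.of("word=Paris", "cap=true")));
        System.out.println(ds.size() + " example(s), " + ds.numFeatures() + " feature(s)");
        ds.clear();
        System.out.println(ds.size() + " example(s), " + ds.numFeatures() + " feature(s)");
    }
}
```

The feature count only ever increases between calls to clear(), which is the sense in which the map "grows over time".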
-
Field Summary
Fields
Modifier and Type | Field | Description
protected boolean | dense | Does this dataset have a dense feature space.
protected final MutableFeatureMap | featureMap | A map from feature names to IDs for the features found in this dataset.
protected final MutableOutputInfo<T> | outputInfo | A map from labels to IDs for the labels found in this dataset.
Fields inherited from class org.tribuo.sequence.SequenceDataset
data, outputFactory, sourceProvenance
-
Constructor Summary
Constructors
Constructor | Description
MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates a dataset from a data source.
MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates an empty sequence dataset.
MutableSequenceDataset(ImmutableSequenceDataset<T> dataset) | Copies the immutable dataset into a mutable dataset.
MutableSequenceDataset(SequenceDataSource<T> dataSource) | Builds a dataset from the supplied data source.
-
Method Summary
Modifier and Type | Method | Description
void | add(SequenceExample<T> ex) | Adds a SequenceExample to this dataset.
void | addAll(Collection<SequenceExample<T>> collection) | Adds all the SequenceExamples in the supplied collection to this dataset.
void | clear() | Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
void | densify() | Iterates through the examples, converting implicit zeros into explicit zeros.
ImmutableFeatureMap | getFeatureIDMap() | An immutable view on the feature map.
FeatureMap | getFeatureMap() | The feature map.
ImmutableOutputInfo<T> | getOutputIDInfo() | An immutable view on the output info in this dataset.
OutputInfo<T> | getOutputInfo() | The output info in this dataset.
Set<T> | getOutputs() | Gets the set of labels that occur in the examples in this dataset.
boolean | isDense() | Is the dataset dense (i.e., do all features in the domain have a value in each example).
String | toString() |
Methods inherited from class org.tribuo.sequence.SequenceDataset
getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, size
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
outputInfo
protected final MutableOutputInfo<T> outputInfo
A map from labels to IDs for the labels found in this dataset.
-
featureMap
protected final MutableFeatureMap featureMap
A map from feature names to IDs for the features found in this dataset.
-
dense
protected boolean dense
Does this dataset have a dense feature space.
-
-
Constructor Details
-
MutableSequenceDataset
public MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates an empty sequence dataset.
Parameters:
sourceProvenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
public MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates a dataset from a data source. This constructor creates the output and feature ID maps that are needed for training and evaluating classifiers.
Parameters:
dataSource - The input data.
sourceProvenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
public MutableSequenceDataset(SequenceDataSource<T> dataSource)
Builds a dataset from the supplied data source.
Parameters:
dataSource - The data source.
-
MutableSequenceDataset
public MutableSequenceDataset(ImmutableSequenceDataset<T> dataset)
Copies the immutable dataset into a mutable dataset.
This should be used infrequently; it mostly exists for the ViterbiTrainer.
Parameters:
dataset - The dataset to copy.
-
-
Method Details
-
clear
public void clear()
Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
-
add
public void add(SequenceExample<T> ex)
Adds a SequenceExample to this dataset.
It also canonicalises the reference to each feature's name, i.e., it replaces the example's reference to a feature name with the canonical one stored in this dataset's VariableInfo. Equal names then share a single String instance, which greatly reduces the memory footprint.
Parameters:
ex - The example to add.
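The canonicalisation step can be sketched without Tribuo as a map from each feature name to the first String instance seen with that content. NameCanonicaliserSketch and canonicalise are hypothetical names chosen for this illustration, and the sketch makes no claim about how Tribuo stores its canonical names internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of name canonicalisation; not Tribuo code. */
final class NameCanonicaliserSketch {
    // Maps each feature name to the canonical instance stored for it.
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the stored instance equal to name, storing name itself if it is new. */
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    public static void main(String[] args) {
        NameCanonicaliserSketch c = new NameCanonicaliserSketch();
        // new String(...) forces two distinct objects with equal contents,
        // as happens when the same feature name is parsed from two examples.
        String first = new String("word=the");
        String second = new String("word=the");
        // Identity comparison before canonicalisation: two separate objects.
        System.out.println(first == second);
        // Identity comparison after canonicalisation: one shared object.
        System.out.println(c.canonicalise(first) == c.canonicalise(second));
    }
}
```

Once every example holds the shared instance, the duplicate strings become unreachable and can be garbage collected, which is where the memory saving comes from.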
-
addAll
public void addAll(Collection<SequenceExample<T>> collection)
Adds all the SequenceExamples in the supplied collection to this dataset.
Parameters:
collection - The collection of SequenceExamples.
-
getOutputs
public Set<T> getOutputs()
Description copied from class: SequenceDataset
Gets the set of labels that occur in the examples in this dataset.
Specified by:
getOutputs in class SequenceDataset<T extends Output<T>>
Returns:
The set of labels that occur in the examples in this dataset.
-
getFeatureIDMap
public ImmutableFeatureMap getFeatureIDMap()
Description copied from class: SequenceDataset
An immutable view on the feature map.
Specified by:
getFeatureIDMap in class SequenceDataset<T extends Output<T>>
Returns:
The feature map.
-
getFeatureMap
public FeatureMap getFeatureMap()
Description copied from class: SequenceDataset
The feature map.
Specified by:
getFeatureMap in class SequenceDataset<T extends Output<T>>
Returns:
The feature map.
-
getOutputIDInfo
public ImmutableOutputInfo<T> getOutputIDInfo()
Description copied from class: SequenceDataset
An immutable view on the output info in this dataset.
Specified by:
getOutputIDInfo in class SequenceDataset<T extends Output<T>>
Returns:
The output info.
-
getOutputInfo
public OutputInfo<T> getOutputInfo()
Description copied from class: SequenceDataset
The output info in this dataset.
Specified by:
getOutputInfo in class SequenceDataset<T extends Output<T>>
Returns:
The output info.
-
isDense
public boolean isDense()
Is the dataset dense (i.e., do all features in the domain have a value in each example).
Returns:
True if the dataset is dense.
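The density condition above can be written down directly. The following is a minimal JDK-only sketch, assuming each example is modelled as a map from feature name to value and the domain is a set of feature names; DenseCheckSketch is a hypothetical name, and the sketch recomputes the answer on every call rather than reading a stored flag such as the dense field.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the density check; not Tribuo code. */
final class DenseCheckSketch {
    /** True if every example carries a value for every feature in the domain. */
    static boolean isDense(Set<String> domain, List<Map<String, Double>> examples) {
        for (Map<String, Double> example : examples) {
            if (!example.keySet().containsAll(domain)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> domain = Set.of("a", "b");
        // The second example has no entry for "b", so it has an implicit zero.
        List<Map<String, Double>> sparse = List.of(
                Map.of("a", 1.0, "b", 2.0),
                Map.of("a", 3.0));
        List<Map<String, Double>> dense = List.of(
                Map.of("a", 1.0, "b", 2.0),
                Map.of("a", 3.0, "b", 0.0));
        System.out.println(isDense(domain, sparse));
        System.out.println(isDense(domain, dense));
    }
}
```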
-
densify
public void densify()
Iterates through the examples, converting implicit zeros into explicit zeros.
-
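To make the densify operation concrete: an "implicit zero" is a feature that is simply absent from an example, and an "explicit zero" is an entry stored with the value 0. The sketch below is JDK-only and assumes each example is a mutable map from feature name to value; DensifySketch is a hypothetical name and this is not Tribuo's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of densification; not Tribuo code. */
final class DensifySketch {
    /** Adds an explicit 0.0 entry for every domain feature missing from an example. */
    static void densify(Set<String> domain, List<Map<String, Double>> examples) {
        for (Map<String, Double> example : examples) {
            for (String name : domain) {
                // Existing values are left untouched; only absent features are filled.
                example.putIfAbsent(name, 0.0);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> domain = Set.of("a", "b", "c");
        List<Map<String, Double>> examples = new ArrayList<>();
        // TreeMap keeps the features sorted by name, so the printout is stable.
        examples.add(new TreeMap<>(Map.of("a", 1.0)));
        examples.add(new TreeMap<>(Map.of("b", 2.0, "c", 3.0)));
        densify(domain, examples);
        System.out.println(examples);
    }
}
```

After densification every example has one entry per domain feature, so the dataset satisfies the condition that isDense() describes.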
toString
-
getProvenance
-