Class MutableSequenceDataset<T extends Output<T>>
java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
org.tribuo.sequence.MutableSequenceDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>,Serializable,Iterable<SequenceExample<T>>
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap which grows over time, whenever a SequenceExample is added to the dataset.
- See Also:
-
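As a rough illustration of what "grows over time" means, here is a minimal, self-contained sketch of a dataset whose feature map gains entries each time a sequence is added. It is a toy model, not Tribuo's implementation: the class `GrowingFeatureMapSketch` and its methods are hypothetical names, and examples are simplified to plain lists of feature names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Tribuo classes.
final class GrowingFeatureMapSketch {
    // Feature name -> number of times the feature has been observed.
    private final Map<String, Integer> featureCounts = new LinkedHashMap<>();
    private int numSequences = 0;

    // A "sequence" is a list of examples; each example is a list of feature names.
    void add(List<List<String>> sequence) {
        for (List<String> example : sequence) {
            for (String name : example) {
                // Unseen names create a new entry, known names bump their count.
                featureCounts.merge(name, 1, Integer::sum);
            }
        }
        numSequences++;
    }

    int numFeatures() { return featureCounts.size(); }

    int size() { return numSequences; }

    int count(String name) { return featureCounts.getOrDefault(name, 0); }

    public static void main(String[] args) {
        GrowingFeatureMapSketch dataset = new GrowingFeatureMapSketch();
        dataset.add(List.of(List.of("word=the", "cap=false"), List.of("word=Cat", "cap=true")));
        System.out.println("features after first add: " + dataset.numFeatures());
        dataset.add(List.of(List.of("word=the", "cap=false"), List.of("word=dog", "cap=false")));
        System.out.println("features after second add: " + dataset.numFeatures());
    }
}
```

The feature domain in this sketch is only known after every sequence has been added, which is presumably why the class also exposes immutable views of the feature map and output info for use once the data is fixed.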
Field Summary
Fields
Modifier and Type | Field | Description
protected boolean | dense |
protected final MutableFeatureMap | featureMap | A map from feature names to IDs for the features found in this dataset.
protected final MutableOutputInfo<T> | outputInfo | A map from labels to IDs for the labels found in this dataset.
Fields inherited from class org.tribuo.sequence.SequenceDataset
data, outputFactory, sourceProvenance
-
Constructor Summary
Constructors
Constructor | Description
MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates a dataset from a data source.
MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates an empty sequence dataset.
MutableSequenceDataset(ImmutableSequenceDataset<T> dataset) |
MutableSequenceDataset(SequenceDataSource<T> dataSource) |
-
Method Summary
Modifier and Type | Method | Description
void | add(SequenceExample<T> ex) | Adds a SequenceExample to this dataset.
void | addAll(Collection<SequenceExample<T>> collection) | Adds all the SequenceExamples in the supplied collection to this dataset.
void | clear() | Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
void | densify() | Iterates through the examples, converting implicit zeros into explicit zeros.
 | getFeatureIDMap() | An immutable view on the feature map.
 | getFeatureMap() | The feature map.
 | getOutputIDInfo() | An immutable view on the output info in this dataset.
 | getOutputInfo() | The output info in this dataset.
 | getOutputs() | Gets the set of labels that occur in the examples in this dataset.
boolean | isDense() | Is the dataset dense (i.e., do all features in the domain have a value in each example).
 | toString() |
Methods inherited from class org.tribuo.sequence.SequenceDataset
getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, size
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
outputInfo
protected final MutableOutputInfo<T> outputInfo
A map from labels to IDs for the labels found in this dataset.
-
featureMap
protected final MutableFeatureMap featureMap
A map from feature names to IDs for the features found in this dataset.
-
dense
protected boolean dense
-
-
Constructor Details
-
MutableSequenceDataset
Creates an empty sequence dataset.
- Parameters:
sourceProvenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
public MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates a dataset from a data source. This constructor creates the output and feature ID maps that are needed for training and evaluating classifiers.
- Parameters:
dataSource - The input data.
sourceProvenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
-
MutableSequenceDataset
-
-
Method Details
-
clear
public void clear()
Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
-
add
Adds a SequenceExample to this dataset.
It also canonicalises the reference to each feature's name (i.e., it replaces the reference to a feature's name with the canonical one stored in this Dataset's VariableInfo). This greatly reduces the memory footprint.
- Parameters:
ex - The example to add.
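The canonicalisation described above can be sketched as follows. This is an illustration of the general technique under simplifying assumptions, not Tribuo's code: `NameCanonicaliserSketch` and `canonicalise` are hypothetical names, and a plain `HashMap` stands in for the per-feature `VariableInfo` records. The point is that many equal `String` objects collapse to one shared instance, so the duplicates can be garbage collected.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not Tribuo classes.
final class NameCanonicaliserSketch {
    // Maps each distinct name to the first String instance seen with that value.
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the canonical instance for the supplied name,
    // registering the supplied instance if the name is new.
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    public static void main(String[] args) {
        NameCanonicaliserSketch names = new NameCanonicaliserSketch();
        // new String(...) deliberately creates two distinct but equal objects.
        String first = names.canonicalise(new String("word=the"));
        String second = names.canonicalise(new String("word=the"));
        // Reference comparison: both now point at the same object.
        System.out.println("same instance: " + (first == second));
    }
}
```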
-
addAll
Adds all the SequenceExamples in the supplied collection to this dataset.
- Parameters:
collection - The collection of SequenceExamples.
-
getOutputs
Description copied from class: SequenceDataset
Gets the set of labels that occur in the examples in this dataset.
- Specified by:
getOutputs in class SequenceDataset<T extends Output<T>>
- Returns:
the set of labels that occur in the examples in this dataset.
-
getFeatureIDMap
Description copied from class: SequenceDataset
An immutable view on the feature map.
- Specified by:
getFeatureIDMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getFeatureMap
Description copied from class: SequenceDataset
The feature map.
- Specified by:
getFeatureMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getOutputIDInfo
Description copied from class: SequenceDataset
An immutable view on the output info in this dataset.
- Specified by:
getOutputIDInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
getOutputInfo
Description copied from class: SequenceDataset
The output info in this dataset.
- Specified by:
getOutputInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
isDense
public boolean isDense()
Is the dataset dense (i.e., do all features in the domain have a value in each example).
- Returns:
True if the dataset is dense.
-
densify
public void densify()
Iterates through the examples, converting implicit zeros into explicit zeros.
-
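To make the dense/sparse distinction concrete, the sketch below models examples as sparse maps from feature name to value, where a missing key is an implicit zero. It is a toy under stated assumptions rather than Tribuo's implementation: `DensifySketch` and its static methods are hypothetical, and the sequence structure is flattened to a simple list of examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: hypothetical names, not Tribuo classes.
final class DensifySketch {
    // The feature domain is the union of all feature names seen in any example.
    static Set<String> domain(List<Map<String, Double>> examples) {
        Set<String> names = new TreeSet<>();
        for (Map<String, Double> example : examples) {
            names.addAll(example.keySet());
        }
        return names;
    }

    // Dense means every example stores a value for every feature in the domain.
    static boolean isDense(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            if (!example.keySet().containsAll(names)) {
                return false;
            }
        }
        return true;
    }

    // Turns implicit zeros (missing keys) into explicit zero entries.
    static void densify(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            for (String name : names) {
                example.putIfAbsent(name, 0.0);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Double>> examples = new ArrayList<>();
        examples.add(new TreeMap<>(Map.of("a", 1.0)));
        examples.add(new TreeMap<>(Map.of("a", 2.0, "b", 3.0)));
        System.out.println("dense before: " + isDense(examples));
        densify(examples);
        System.out.println("dense after: " + isDense(examples));
        System.out.println("first example: " + examples.get(0));
    }
}
```

Densifying trades memory for uniformity: every example grows to the size of the full feature domain, so it is mainly worthwhile when downstream code expects a value for every feature.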
toString
-
getProvenance
-