Package org.tribuo
Class MutableDataset<T extends Output<T>>
java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.MutableDataset<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>
A MutableDataset is a Dataset with a MutableFeatureMap which grows over time. Whenever an Example is added to the dataset, the dataset observes each feature and the output, keeping the appropriate statistics in its FeatureMap and OutputInfo.
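To make the "grows over time" behaviour concrete, here is a minimal JDK-only sketch. It is not Tribuo's implementation: ToyExample and ToyMutableDataset are hypothetical stand-ins, and the sketch keeps only occurrence counts, whereas the real FeatureMap and OutputInfo keep whatever statistics their types need. Each add call extends the feature and output tables with any names not seen before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an Example: one output plus named feature values.
final class ToyExample {
    final String output;
    final Map<String, Double> features;

    ToyExample(String output, Map<String, Double> features) {
        this.output = output;
        this.features = features;
    }
}

// Hypothetical, simplified model of a dataset whose feature table grows as
// examples arrive; not Tribuo's MutableDataset.
class ToyMutableDataset {
    private final List<ToyExample> data = new ArrayList<>();
    // Feature name -> number of examples in which that feature appeared.
    private final Map<String, Integer> featureCounts = new TreeMap<>();
    // Output value -> number of examples carrying that output.
    private final Map<String, Integer> outputCounts = new TreeMap<>();

    void add(ToyExample ex) {
        data.add(ex);
        // Observe the output, creating a new entry the first time it is seen.
        outputCounts.merge(ex.output, 1, Integer::sum);
        // Observe each feature; unseen names extend the feature table.
        for (String name : ex.features.keySet()) {
            featureCounts.merge(name, 1, Integer::sum);
        }
    }

    int size() { return data.size(); }
    Map<String, Integer> featureCounts() { return featureCounts; }
    Map<String, Integer> outputCounts() { return outputCounts; }
}
```

Nothing has to be declared up front: an example carrying a previously unseen feature name simply adds a new entry to the feature table.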
Field Summary
- protected boolean dense
  Denotes if this dataset contains implicit zeros or not.
- protected final MutableFeatureMap featureMap
  A map from feature names to feature info objects.
- protected final MutableOutputInfo<T> outputMap
  Information about the outputs in this dataset.
- protected final List<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformProvenances
  The provenances of the transformations applied to this dataset.
Fields inherited from class org.tribuo.Dataset:
data, indices, outputFactory, sourceProvenance
Constructor Summary
- MutableDataset(Iterable<Example<T>> dataSource, DataProvenance provenance, OutputFactory<T> outputFactory)
  Creates a dataset from a data source.
- MutableDataset(DataSource<T> dataSource)
  Creates a dataset from a data source.
- MutableDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
  Creates an empty dataset.
Method Summary
- void add(Example<T> ex)
  Adds an example to the dataset, which observes the output and each feature value.
- void addAll(Collection<? extends Example<T>> collection)
  Adds all the Examples in the supplied collection to this dataset.
- void clear()
  Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
- static <T extends Output<T>> MutableDataset<T> createDeepCopy(Dataset<T> other)
  Creates a deep copy of the supplied Dataset which is mutable.
- void densify()
  Iterates through the examples, converting implicit zeros into explicit zeros.
- ImmutableFeatureMap getFeatureIDMap()
  Returns or generates an ImmutableFeatureMap.
- getFeatureMap()
  Returns this dataset's FeatureMap.
- ImmutableOutputInfo<T> getOutputIDInfo()
  Returns or generates an ImmutableOutputInfo.
- getOutputInfo()
  Returns this dataset's OutputInfo.
- Set<T> getOutputs()
  Gets the set of possible outputs in this dataset.
- DatasetProvenance getProvenance()
- boolean isDense()
  Is the dataset dense (i.e., do all features in the domain have a value in each example).
- void regenerateFeatureInfo()
  Rebuilds the feature info by inspecting each example.
- void regenerateOutputInfo()
  Rebuilds the output info by inspecting each example.
- void setWeights(Map<T, Float> weights)
  Sets the weights in each example according to their output.
- String toString()
- void transform(TransformerMap transformerMap)
  Applies all the transformations from the TransformerMap to this dataset.
Methods inherited from class org.tribuo.Dataset:
castDataset, createTransformers, createTransformers, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, shuffle, size, validate
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable:
forEach, spliterator
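Field Details
- outputMap
  protected final MutableOutputInfo<T> outputMap
  Information about the outputs in this dataset.
- featureMap
  protected final MutableFeatureMap featureMap
  A map from feature names to feature info objects.
- transformProvenances
  protected final List<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformProvenances
  The provenances of the transformations applied to this dataset.
- dense
  protected boolean dense
  Denotes if this dataset contains implicit zeros or not.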
Constructor Details
- MutableDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
  Creates an empty dataset.
  Parameters:
    sourceProvenance - A description of the input data, including preprocessing steps.
    outputFactory - The output factory.
- MutableDataset(Iterable<Example<T>> dataSource, DataProvenance provenance, OutputFactory<T> outputFactory)
  public MutableDataset(Iterable<Example<T>> dataSource, DataProvenance provenance, OutputFactory<T> outputFactory)
  Creates a dataset from a data source. This constructor creates the output and feature maps needed for training and evaluating models.
  Parameters:
    dataSource - The examples.
    provenance - A description of the input data, including preprocessing steps.
    outputFactory - The output factory.
- MutableDataset(DataSource<T> dataSource)
  Creates a dataset from a data source. This constructor creates the output and feature maps needed for training and evaluating models.
  Parameters:
    dataSource - The examples.
Method Details
- add
  Adds an example to the dataset, observing the output and each feature value. It also canonicalises the reference to each feature's name, i.e., it replaces the example's reference to a feature's name with the canonical one stored in this dataset's VariableInfo. Because every example then refers to the same name instance, this greatly reduces the memory footprint.
  Parameters:
    ex - The example to add.
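Canonicalisation can be pictured as interning feature names against the dataset's own table. The sketch below is an illustration under stated assumptions, not Tribuo code: NameCanonicaliser is hypothetical, and a plain HashMap stands in for the per-feature VariableInfo that holds the canonical name. After two examples pass through it, both refer to the same String objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of feature-name canonicalisation, not Tribuo code.
// A HashMap stands in for the canonical names held in each VariableInfo.
class NameCanonicaliser {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the first String instance registered with this content, so
    // equal names from different examples end up as one shared object.
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    // Replaces each entry of an example's feature-name array with its canonical instance.
    void canonicaliseAll(String[] featureNames) {
        for (int i = 0; i < featureNames.length; i++) {
            featureNames[i] = canonicalise(featureNames[i]);
        }
    }

    int distinctNames() {
        return canonical.size();
    }
}
```

The duplicate strings that arrived with the second example are no longer referenced once it has been canonicalised, so they can be garbage collected; that is where the memory saving comes from.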
- addAll
  Adds all the Examples in the supplied collection to this dataset.
  Parameters:
    collection - The collection of Examples.
- setWeights
  Sets the weight of each example according to its output.
  Parameters:
    weights - A map from Outputs to float weights.
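A minimal sketch of weighting by output, for instance to up-weight a rare class. WeightedExample and WeightSetter are hypothetical JDK-only stand-ins for Example and MutableDataset. The text above does not say what happens to examples whose output is missing from the map; this sketch assumes they keep their current weight, which may differ from Tribuo's actual behaviour.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an Example with a mutable weight.
class WeightedExample {
    final String output;
    float weight = 1.0f;

    WeightedExample(String output) {
        this.output = output;
    }
}

class WeightSetter {
    // Sets each example's weight to the value mapped to its output.
    // Assumption: examples whose output has no entry keep their current weight.
    static void setWeights(List<WeightedExample> examples, Map<String, Float> weights) {
        for (WeightedExample ex : examples) {
            Float w = weights.get(ex.output);
            if (w != null) {
                ex.weight = w;
            }
        }
    }
}
```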
- getOutputs
  Gets the set of possible outputs in this dataset. For regression, returns a Set containing the dimension names.
  Specified by:
    getOutputs in class Dataset<T extends Output<T>>
  Returns:
    The set of possible outputs.
- getFeatureIDMap
  Description copied from class: Dataset
  Returns or generates an ImmutableFeatureMap.
  Specified by:
    getFeatureIDMap in class Dataset<T extends Output<T>>
  Returns:
    An immutable feature map with id numbers.
- getFeatureMap
  Description copied from class: Dataset
  Returns this dataset's FeatureMap.
  Specified by:
    getFeatureMap in class Dataset<T extends Output<T>>
  Returns:
    The feature map from this dataset.
- getOutputIDInfo
  Description copied from class: Dataset
  Returns or generates an ImmutableOutputInfo.
  Specified by:
    getOutputIDInfo in class Dataset<T extends Output<T>>
  Returns:
    An immutable output info.
- getOutputInfo
  Description copied from class: Dataset
  Returns this dataset's OutputInfo.
  Specified by:
    getOutputInfo in class Dataset<T extends Output<T>>
  Returns:
    The output info.
- toString
- isDense
  public boolean isDense()
  Returns whether the dataset is dense, i.e., whether every feature in the domain has a value in each example.
  Returns:
    True if the dataset is dense.
- transform
  Applies all the transformations from the TransformerMap to this dataset.
  Parameters:
    transformerMap - The transformations to apply.
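The sketch below models what applying a TransformerMap involves: each feature value is passed through the transformation registered for its name, and a record of the transformation is appended, analogous to the transformProvenances field. ToyTransformerMap and ToyTransformableDataset are hypothetical; a DoubleUnaryOperator per feature name and a String description stand in for Tribuo's transformer and provenance objects, and the sketch does not rebuild any feature statistics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a TransformerMap: feature name -> function to apply.
class ToyTransformerMap {
    final Map<String, DoubleUnaryOperator> transforms = new HashMap<>();
    final String description;

    ToyTransformerMap(String description) {
        this.description = description;
    }
}

class ToyTransformableDataset {
    // Each example is a mutable map from feature name to value.
    final List<Map<String, Double>> examples = new ArrayList<>();
    // Analogue of transformProvenances: one record per transform() call.
    final List<String> transformProvenances = new ArrayList<>();

    void transform(ToyTransformerMap tm) {
        for (Map<String, Double> ex : examples) {
            // Rewrite values in place; features without a registered transform are untouched.
            ex.replaceAll((name, value) -> {
                DoubleUnaryOperator f = tm.transforms.get(name);
                return f == null ? value : f.applyAsDouble(value);
            });
        }
        transformProvenances.add(tm.description);
    }
}
```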
- densify
  public void densify()
  Iterates through the examples, converting implicit zeros into explicit zeros.
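In a sparse example, a feature that is absent is an implicit zero. The following JDK-only sketch shows how isDense and densify relate. ToySparseDataset is hypothetical, and it derives the feature domain from the examples themselves rather than from a FeatureMap. densify fills in explicit 0.0 entries until every example has every feature, after which isDense returns true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sparse dataset: absent features are implicit zeros. Not Tribuo code.
class ToySparseDataset {
    final List<Map<String, Double>> examples = new ArrayList<>();

    // Copies the values so the dataset owns a mutable map per example.
    void add(Map<String, Double> values) {
        examples.add(new TreeMap<>(values));
    }

    // Feature domain: every name observed in any example.
    Set<String> domain() {
        Set<String> names = new TreeSet<>();
        for (Map<String, Double> ex : examples) {
            names.addAll(ex.keySet());
        }
        return names;
    }

    // Dense means every example stores a value for every feature in the domain.
    boolean isDense() {
        int d = domain().size();
        for (Map<String, Double> ex : examples) {
            if (ex.size() != d) {
                return false;
            }
        }
        return true;
    }

    // Converts implicit zeros into explicit 0.0 entries.
    void densify() {
        Set<String> names = domain();
        for (Map<String, Double> ex : examples) {
            for (String name : names) {
                ex.putIfAbsent(name, 0.0);
            }
        }
    }
}
```

Densifying trades memory for an explicit representation: afterwards each example stores a value for every feature in the domain.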
- clear
  public void clear()
  Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
- regenerateOutputInfo
  public void regenerateOutputInfo()
  Rebuilds the output info by inspecting each example.
- regenerateFeatureInfo
  public void regenerateFeatureInfo()
  Rebuilds the feature info by inspecting each example.
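One situation where rebuilding may be useful: if an example is modified after it was added, cached statistics no longer match the data. The hypothetical sketch below caches per-output counts, standing in for the OutputInfo, and shows a regenerateOutputInfo that recomputes them from scratch; a feature-side rebuild would follow the same pattern. It is an illustration only, not Tribuo's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an Example whose output can change after it is added.
class LabelledExample {
    String output;

    LabelledExample(String output) {
        this.output = output;
    }
}

// Hypothetical dataset caching per-output counts, standing in for an OutputInfo.
class ToyOutputTrackingDataset {
    final List<LabelledExample> examples = new ArrayList<>();
    final Map<String, Integer> outputCounts = new TreeMap<>();

    void add(LabelledExample ex) {
        examples.add(ex);
        outputCounts.merge(ex.output, 1, Integer::sum);
    }

    // Discards the cached counts and rebuilds them by inspecting each example.
    void regenerateOutputInfo() {
        outputCounts.clear();
        for (LabelledExample ex : examples) {
            outputCounts.merge(ex.output, 1, Integer::sum);
        }
    }
}
```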
- getProvenance
- createDeepCopy
  Creates a mutable deep copy of the supplied Dataset. The individual examples are copied using their copy method.
  Type Parameters:
    T - The output type.
  Parameters:
    other - The dataset to copy.
  Returns:
    A mutable deep copy of the dataset.
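The difference between a deep copy and a shallow one is whether the example objects themselves are duplicated. In the JDK-only sketch below, CopyableExample and DeepCopies are hypothetical stand-ins, with copy() playing the role of the examples' copy method. A weight changed on the deep copy leaves the original untouched, while a change made through the shallow copy is visible in the original.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable example with a copy method, standing in for Example.
class CopyableExample {
    String output;
    float weight;

    CopyableExample(String output, float weight) {
        this.output = output;
        this.weight = weight;
    }

    CopyableExample copy() {
        return new CopyableExample(output, weight);
    }
}

class DeepCopies {
    // Deep copy: a new list holding new example objects, so edits stay local.
    static List<CopyableExample> deepCopy(List<CopyableExample> source) {
        List<CopyableExample> out = new ArrayList<>(source.size());
        for (CopyableExample ex : source) {
            out.add(ex.copy());
        }
        return out;
    }

    // Shallow copy for contrast: a new list that shares the example objects.
    static List<CopyableExample> shallowCopy(List<CopyableExample> source) {
        return new ArrayList<>(source);
    }
}
```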