public class MutableDataset<T extends Output<T>> extends Dataset<T>
A Dataset with a MutableFeatureMap that grows over time. Whenever an Example is added to the dataset, the dataset observes each feature and output, keeping the appropriate statistics in the FeatureMap and OutputInfo.

Modifier and Type | Field and Description |
---|---|
protected boolean | dense: Denotes whether this dataset is dense, i.e., contains no implicit zeros. |
protected MutableFeatureMap | featureMap: A map from feature names to feature info objects. |
protected MutableOutputInfo<T> | outputMap: Information about the outputs in this dataset. |
protected List<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> | transformProvenances: The provenances of the transformations applied to this dataset. |
Fields inherited from class Dataset: data, indices, outputFactory, sourceProvenance
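As a rough illustration of the behaviour in the class description, the following sketch models a dataset whose feature domain grows as examples are added and which keeps simple per-feature statistics (count, min, max) and per-label counts. ToyFeatureStats and ToyGrowingDataset are hypothetical names, not Tribuo's MutableFeatureMap or OutputInfo; the String labels and the choice of statistics are assumptions of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical running statistics for one feature; not Tribuo's VariableInfo. */
final class ToyFeatureStats {
    long count;                               // number of examples with an explicit value
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    void observe(double value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }
}

/** Hypothetical dataset whose feature domain and label counts grow as examples arrive. */
final class ToyGrowingDataset {
    // Feature name -> statistics; an unseen name extends the feature domain.
    final Map<String, ToyFeatureStats> featureMap = new HashMap<>();
    // Output label -> number of examples with that label (String labels are an assumption).
    final Map<String, Long> outputCounts = new HashMap<>();
    int size;

    /** Observes every feature value and the output, then counts the example. */
    void add(String label, Map<String, Double> features) {
        for (Map.Entry<String, Double> e : features.entrySet()) {
            featureMap.computeIfAbsent(e.getKey(), k -> new ToyFeatureStats())
                      .observe(e.getValue());
        }
        outputCounts.merge(label, 1L, Long::sum);
        size++;
    }
}
```

Adding an example with a previously unseen feature name enlarges featureMap, which is the sense in which the map grows over time.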
Constructor and Description |
---|
MutableDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory): Creates an empty dataset. |
MutableDataset(DataSource<T> dataSource): Creates a dataset from a data source. |
MutableDataset(Iterable<Example<T>> dataSource, DataProvenance provenance, OutputFactory<T> outputFactory): Creates a dataset from an iterable of examples. |
Modifier and Type | Method and Description |
---|---|
void | add(Example<T> ex): Adds an example to the dataset, which observes the output and each feature value. |
void | addAll(Collection<? extends Example<T>> collection): Adds all the Examples in the supplied collection to this dataset. |
void | clear(): Clears all the examples out of this dataset and flushes the FeatureMap, OutputInfo, and transform provenances. |
static <T extends Output<T>> MutableDataset<T> | createDeepCopy(Dataset<T> other): Creates a mutable deep copy of the supplied Dataset. |
void | densify(): Iterates through the examples, converting implicit zeros into explicit zeros. |
ImmutableFeatureMap | getFeatureIDMap(): Returns or generates an ImmutableFeatureMap. |
MutableFeatureMap | getFeatureMap(): Returns this dataset's FeatureMap. |
ImmutableOutputInfo<T> | getOutputIDInfo(): Returns or generates an ImmutableOutputInfo. |
OutputInfo<T> | getOutputInfo(): Returns this dataset's OutputInfo. |
Set<T> | getOutputs(): Gets the set of possible outputs in this dataset. |
DatasetProvenance | getProvenance() |
boolean | isDense(): Is the dataset dense (i.e., do all features in the domain have a value in each example)? |
void | setWeights(Map<T,Float> weights): Sets the weight of each example according to its output. |
String | toString() |
void | transform(TransformerMap transformerMap): Applies all the transformations from the TransformerMap to this dataset. |
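Methods inherited from class Dataset: createTransformers, createTransformers, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, shuffle, size
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator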
protected final MutableOutputInfo<T extends Output<T>> outputMap
protected final MutableFeatureMap featureMap
protected final List<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformProvenances
protected boolean dense
public MutableDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates an empty dataset.
Parameters:
sourceProvenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.

public MutableDataset(Iterable<Example<T>> dataSource, DataProvenance provenance, OutputFactory<T> outputFactory)
Creates a dataset from an iterable of examples.
Parameters:
dataSource - The examples.
provenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.

public MutableDataset(DataSource<T> dataSource)
Creates a dataset from a data source.
Parameters:
dataSource - The examples.

public void add(Example<T> ex)
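Adds an example to the dataset, which observes the output and each feature value.
It also canonicalises the reference to each feature's name, i.e., it replaces the example's reference to a feature name with the canonical instance stored in this dataset's VariableInfo. Because equal feature names then share one String instance, this greatly reduces the memory footprint.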
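A minimal sketch of the canonicalisation step described above, assuming feature names are plain Strings: the first instance seen for each name is stored, and later equal names are swapped for that stored instance, so every example shares one String object per feature. ToyNameCanonicaliser is a hypothetical helper; the documentation above says the canonical name lives in the dataset's VariableInfo, not in a separate map like this one.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical canonicaliser: one shared String instance per distinct feature name. */
final class ToyNameCanonicaliser {
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the stored instance equal to name, storing name itself on first sight. */
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    /** Rebuilds an example's features so every key references the canonical name. */
    Map<String, Double> canonicaliseExample(Map<String, Double> features) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            out.put(canonicalise(e.getKey()), e.getValue());
        }
        return out;
    }
}
```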
Parameters:
ex - The example to add.

public void addAll(Collection<? extends Example<T>> collection)
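Adds all the Examples in the supplied collection to this dataset.
Parameters:
collection - The collection of Examples.

public void setWeights(Map<T,Float> weights)
Sets the weight of each example according to its output.
Parameters:
weights - A map of Outputs to float weights.

public Set<T> getOutputs()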
Gets the set of possible outputs in this dataset. In the case of regression, returns a Set containing the dimension names.
Specified by: getOutputs in class Dataset<T extends Output<T>>
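The weighting rule stated for setWeights(Map<T,Float>) above, where each example's weight is looked up by its output, can be sketched with String labels standing in for outputs. ToyWeightedExample and ToyWeighting are hypothetical; the 1.0f starting weight is an assumption, and so is leaving an example unchanged when its label is absent from the map, since the documentation here does not say what happens in that case.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical example: a String label (standing in for an Output) plus a mutable weight. */
final class ToyWeightedExample {
    final String label;
    float weight = 1.0f; // assumed starting weight

    ToyWeightedExample(String label) {
        this.label = label;
    }
}

final class ToyWeighting {
    /** Sets each example's weight from the entry for its label; absent labels are left as-is (assumption). */
    static void setWeights(List<ToyWeightedExample> examples, Map<String, Float> weights) {
        for (ToyWeightedExample ex : examples) {
            Float w = weights.get(ex.label);
            if (w != null) {
                ex.weight = w;
            }
        }
    }
}
```

A typical use is class rebalancing, for example giving a rare label a larger weight than a common one.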
public ImmutableFeatureMap getFeatureIDMap()
Returns or generates an ImmutableFeatureMap.
Specified by: getFeatureIDMap in class Dataset<T extends Output<T>>
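Because the feature map of a MutableDataset keeps growing, an immutable ID map is effectively a snapshot that fixes a dense integer ID for every feature known at that moment. The sketch below shows that idea only; ToyImmutableIdMap is hypothetical, and assigning IDs in lexicographic order of feature name is an assumption of the sketch rather than a documented property of ImmutableFeatureMap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical immutable snapshot mapping feature names to dense integer IDs. */
final class ToyImmutableIdMap {
    private final Map<String, Integer> nameToId;
    private final String[] idToName;

    /** Copies the names, so later growth of the source set is not reflected here. */
    ToyImmutableIdMap(Set<String> featureNames) {
        // Assumption for this sketch: IDs follow the lexicographic order of feature names.
        TreeSet<String> sorted = new TreeSet<>(featureNames);
        Map<String, Integer> ids = new HashMap<>();
        idToName = new String[sorted.size()];
        int next = 0;
        for (String name : sorted) {
            ids.put(name, next);
            idToName[next] = name;
            next++;
        }
        nameToId = Collections.unmodifiableMap(ids);
    }

    /** Returns the ID of a feature, or -1 if it was absent when the snapshot was taken. */
    int getID(String name) {
        return nameToId.getOrDefault(name, -1);
    }

    String getName(int id) {
        return idToName[id];
    }

    int size() {
        return idToName.length;
    }
}
```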
public MutableFeatureMap getFeatureMap()
Returns this dataset's FeatureMap.
Specified by: getFeatureMap in class Dataset<T extends Output<T>>
public ImmutableOutputInfo<T> getOutputIDInfo()
Returns or generates an ImmutableOutputInfo.
Specified by: getOutputIDInfo in class Dataset<T extends Output<T>>
public OutputInfo<T> getOutputInfo()
Returns this dataset's OutputInfo.
Specified by: getOutputInfo in class Dataset<T extends Output<T>>
public boolean isDense()
Is the dataset dense (i.e., do all features in the domain have a value in each example)?
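public void transform(TransformerMap transformerMap)
Applies all the transformations from the TransformerMap to this dataset.
Parameters:
transformerMap - The transformations to apply.

public void densify()
Iterates through the examples, converting implicit zeros into explicit zeros.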
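The density check and the densify() conversion can be sketched on examples stored as sparse name-to-value maps, where a missing key is an implicit zero. ToyDensifier and the Map-based example representation are hypothetical; the sketch only illustrates the contract stated for isDense() and densify().

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helpers over sparse examples: an absent feature is an implicit zero. */
final class ToyDensifier {
    /** True if every example has a value (possibly 0.0) for every feature in the domain. */
    static boolean isDense(List<Map<String, Double>> examples, Set<String> domain) {
        for (Map<String, Double> ex : examples) {
            if (!ex.keySet().containsAll(domain)) {
                return false;
            }
        }
        return true;
    }

    /** Adds an explicit 0.0 for each domain feature an example lacks; existing values are kept. */
    static void densify(List<Map<String, Double>> examples, Set<String> domain) {
        for (Map<String, Double> ex : examples) {
            for (String feature : domain) {
                ex.putIfAbsent(feature, 0.0);
            }
        }
    }
}
```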
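public void clear()
Clears all the examples out of this dataset and flushes the FeatureMap, OutputInfo, and transform provenances.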
public DatasetProvenance getProvenance()
public static <T extends Output<T>> MutableDataset<T> createDeepCopy(Dataset<T> other)
Creates a mutable deep copy of the supplied Dataset. The individual examples are copied using their copy method.
Type Parameters:
T - The output type.
Parameters:
other - The dataset to copy.

Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.