T - the output type of the examples in the data set.

public abstract class Dataset<T extends Output<T>> extends Object implements Iterable<Example<T>>, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable
Subclass MutableDataset rather than this class.
Modifier and Type | Field and Description |
---|---|
protected List<Example<T>> | data: The data in this data set. |
protected int[] | indices: The indices of the shuffled order. |
protected OutputFactory<T> | outputFactory: A factory for making OutputInfo and Output of the appropriate type. |
protected DataProvenance | sourceProvenance: The provenance of the data source, extracted on construction. |
Modifier | Constructor and Description |
---|---|
protected | Dataset(DataProvenance provenance, OutputFactory<T> outputFactory): Creates a dataset. |
protected | Dataset(DataSource<T> dataSource): Creates a dataset. |
Modifier and Type | Method and Description |
---|---|
TransformerMap | createTransformers(TransformationMap transformations): Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset. |
TransformerMap | createTransformers(TransformationMap transformations, boolean includeImplicitZeroFeatures): Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset. |
List<Example<T>> | getData(): Gets the examples as an unmodifiable list. |
Example<T> | getExample(int index): Gets the example at the supplied index. |
abstract ImmutableFeatureMap | getFeatureIDMap(): Returns or generates an ImmutableFeatureMap. |
abstract FeatureMap | getFeatureMap(): Returns this dataset's FeatureMap. |
OutputFactory<T> | getOutputFactory(): Gets the output factory this dataset contains. |
abstract ImmutableOutputInfo<T> | getOutputIDInfo(): Returns or generates an ImmutableOutputInfo. |
abstract OutputInfo<T> | getOutputInfo(): Returns this dataset's OutputInfo. |
abstract Set<T> | getOutputs(): Gets the set of outputs that occur in the examples in this dataset. |
String | getSourceDescription(): A String description of this dataset. |
DataProvenance | getSourceProvenance(): The provenance of the data this Dataset contains. |
Iterator<Example<T>> | iterator() |
void | shuffle(boolean shuffle): Shuffles the indices, or stops shuffling them. |
int | size(): Gets the size of the data set. |
String | toString() |
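Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator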
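
protected final DataProvenance sourceProvenance
The provenance of the data source, extracted on construction.

protected final OutputFactory<T extends Output<T>> outputFactory
A factory for making OutputInfo and Output of the appropriate type.

protected int[] indices
The indices of the shuffled order.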

protected Dataset(DataProvenance provenance, OutputFactory<T> outputFactory)
Creates a dataset.
provenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.

protected Dataset(DataSource<T> dataSource)
Creates a dataset.
dataSource - the DataSource to use.

public String getSourceDescription()
A String description of this dataset.

public DataProvenance getSourceProvenance()
The provenance of the data this Dataset contains.

public List<Example<T>> getData()
Gets the examples as an unmodifiable list.
In other words, using the following to add additional examples to this dataset will throw an exception:
dataset.getData().add(example)
Instead, use MutableDataset.add(Example).
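The read-only behaviour described above can be illustrated with a small standalone sketch. It does not use Tribuo: `UnmodifiableViewSketch` and its methods are hypothetical stand-ins, and it assumes the returned list behaves like a `Collections.unmodifiableList` view, which rejects `add` with an `UnsupportedOperationException`. This page does not state the exception type, so treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in, not Tribuo code: a container that exposes its
// contents only through a read-only view.
final class UnmodifiableViewSketch {
    private final List<String> data = new ArrayList<>();

    // Plays the role of MutableDataset.add(Example): the supported way to add.
    void add(String example) {
        data.add(example);
    }

    // Plays the role of getData(): callers get a view they cannot modify.
    List<String> getData() {
        return Collections.unmodifiableList(data);
    }

    // Returns true when adding through the view is rejected.
    static boolean addThroughViewFails(UnmodifiableViewSketch ds, String example) {
        try {
            ds.getData().add(example);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UnmodifiableViewSketch ds = new UnmodifiableViewSketch();
        ds.add("first");
        System.out.println(addThroughViewFails(ds, "second")); // prints "true"
        System.out.println(ds.getData().size());               // prints "1"
    }
}
```

The view is backed by the underlying list, so examples added through the supported method are visible in lists returned earlier.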
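
public OutputFactory<T> getOutputFactory()
Gets the output factory this dataset contains.

public abstract Set<T> getOutputs()
Gets the set of outputs that occur in the examples in this dataset.

public Example<T> getExample(int index)
Gets the example at the supplied index.
Throws IllegalArgumentException if the index is invalid or outside the bounds.
index - The index of the example.

public int size()
Gets the size of the data set.

public void shuffle(boolean shuffle)
Shuffles the indices, or stops shuffling them.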
The shuffle only affects the iterator; it does not affect getExample(int).
Multiple calls with the argument true will shuffle the dataset multiple times. The RNG is shared across all Dataset instances, so methods which access it are synchronized.
Using this method will prevent the provenance system from tracking the exact state of the dataset, which may be important for trainers which depend on the example order, like those using stochastic gradient descent.
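One way to picture these semantics is a permutation of indices that only the iterator consults. The sketch below is a hypothetical stand-in written against the standard library, not Tribuo's implementation: the class name `ShuffleSketch`, the fixed seed, and the choice to build a fresh permutation on each call are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical stand-in, not Tribuo code.
final class ShuffleSketch implements Iterable<String> {
    // One RNG shared by every instance, so access to it is synchronized.
    private static final Random RNG = new Random(42L); // arbitrary seed

    private final List<String> data;
    private int[] indices; // null means "iterate in insertion order"

    ShuffleSketch(List<String> data) {
        this.data = new ArrayList<>(data);
    }

    // Indexed access ignores the shuffled order entirely.
    String getExample(int index) {
        if (index < 0 || index >= data.size()) {
            throw new IllegalArgumentException("Invalid index " + index);
        }
        return data.get(index);
    }

    // true: draw a new permutation; false: go back to insertion order.
    void shuffle(boolean shuffle) {
        if (!shuffle) {
            indices = null;
            return;
        }
        int[] perm = new int[data.size()];
        for (int i = 0; i < perm.length; i++) {
            perm[i] = i;
        }
        synchronized (RNG) {
            // Fisher-Yates shuffle of the index array.
            for (int i = perm.length - 1; i > 0; i--) {
                int j = RNG.nextInt(i + 1);
                int tmp = perm[i];
                perm[i] = perm[j];
                perm[j] = tmp;
            }
        }
        indices = perm;
    }

    @Override
    public Iterator<String> iterator() {
        final int[] order = indices;
        return new Iterator<String>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < data.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int idx = (order == null) ? pos : order[pos];
                pos++;
                return data.get(idx);
            }
        };
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("a");
        items.add("b");
        items.add("c");
        ShuffleSketch ds = new ShuffleSketch(items);
        ds.shuffle(true);
        System.out.println(ds.getExample(0)); // prints "a" regardless of the shuffle
        for (String s : ds) {
            System.out.println(s); // iteration follows the permuted order
        }
    }
}
```

Because the order lives only in the index array, the stored examples are never moved, which is why indexed access stays stable.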
shuffle - If true, shuffle the data.

public abstract ImmutableOutputInfo<T> getOutputIDInfo()
Returns or generates an ImmutableOutputInfo.

public abstract OutputInfo<T> getOutputInfo()
Returns this dataset's OutputInfo.

public abstract ImmutableFeatureMap getFeatureIDMap()
Returns or generates an ImmutableFeatureMap.

public abstract FeatureMap getFeatureMap()
Returns this dataset's FeatureMap.

public TransformerMap createTransformers(TransformationMap transformations)
Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset.
Does not mutate the dataset. If you wish to apply the TransformerMap, use MutableDataset.transform(org.tribuo.transform.TransformerMap) or TransformerMap.transformDataset(org.tribuo.Dataset<T>).
TransformerMaps operate on feature values which are present; sparse values are ignored and not transformed. If the zeros should be transformed, call MutableDataset.densify() on the dataset before applying a transformer.
This method calls createTransformers(TransformationMap, boolean) with includeImplicitZeroFeatures set to false, thus ignoring implicitly zero features when fitting the transformations. This is the default behaviour in Tribuo 4.0, but it causes erroneous behaviour in IDFTransformation, so it should be avoided with that transformation.
See org.tribuo.transform for a more detailed discussion of densify and includeImplicitZeroFeatures.
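To make the effect of ignoring implicit zeros concrete, the following standalone sketch fits a simple min/max range for one sparse feature, with and without counting the examples where the feature is absent. It is not Tribuo code: `ImplicitZeroSketch`, `fitRange`, and the map-based example representation are hypothetical, and a min/max range is just one convenient statistic chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not Tribuo code: each example is a sparse map from
// feature name to value, and a missing key means an implicit zero.
final class ImplicitZeroSketch {

    // Returns {min, max} of the named feature across the examples.
    static double[] fitRange(List<Map<String, Double>> examples,
                             String feature,
                             boolean includeImplicitZeroFeatures) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        int observed = 0;
        for (Map<String, Double> example : examples) {
            Double value = example.get(feature);
            if (value != null) {
                min = Math.min(min, value);
                max = Math.max(max, value);
                observed++;
            }
        }
        // If requested, treat every example lacking the feature as a 0.0 value.
        if (includeImplicitZeroFeatures && observed < examples.size()) {
            min = Math.min(min, 0.0);
            max = Math.max(max, 0.0);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        List<Map<String, Double>> examples = new ArrayList<>();
        examples.add(Collections.singletonMap("f", 2.0));
        examples.add(Collections.singletonMap("f", 4.0));
        examples.add(Collections.singletonMap("g", 1.0)); // "f" is implicitly zero here

        System.out.println(Arrays.toString(fitRange(examples, "f", false))); // prints "[2.0, 4.0]"
        System.out.println(Arrays.toString(fitRange(examples, "f", true)));  // prints "[0.0, 4.0]"
    }
}
```

Statistics that depend on how many examples contain a feature, such as document frequencies, are sensitive to this choice in the same way.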
Throws IllegalArgumentException if the TransformationMap object has regexes which apply to multiple features.
transformations - The transformations to fit.

public TransformerMap createTransformers(TransformationMap transformations, boolean includeImplicitZeroFeatures)
Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset.
Does not mutate the dataset. If you wish to apply the TransformerMap, use MutableDataset.transform(org.tribuo.transform.TransformerMap) or TransformerMap.transformDataset(org.tribuo.Dataset<T>).
TransformerMaps operate on feature values which are present; sparse values are ignored and not transformed. If the zeros should be transformed, call MutableDataset.densify() on the dataset before applying a transformer.
See org.tribuo.transform for a more detailed discussion of densify and includeImplicitZeroFeatures.
Throws IllegalArgumentException if the TransformationMap object has regexes which apply to multiple features.
transformations - The transformations to fit.
includeImplicitZeroFeatures - Use the implicit zero feature values to construct the transformations.

Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.