Provides infrastructure for applying transformations to a Dataset.
Interface | Description |
---|---|
Transformation | An interface representing a class of transformations which can be applied to a feature. |
TransformationProvenance | A tag interface for provenances in the transformation system. |
Transformer | A fitted Transformation which can apply a transform to the input value. |
TransformStatistics | An interface for the statistics that need to be collected for a specific Transformation on a single feature. |
Class | Description |
---|---|
TransformationMap | A carrier type for a set of transformations to be applied to a Dataset. |
TransformationMap.TransformationList | A carrier type, as OLCUT does not support nested generics. |
TransformedModel<T extends Output<T>> | Wraps a Model with its TransformerMap so all Examples are transformed appropriately before the model makes predictions. |
TransformerMap | |
TransformerMap.TransformerMapProvenance | Provenance for TransformerMap. |
TransformTrainer<T extends Output<T>> | A Trainer which encapsulates another trainer plus a TransformationMap object to apply to each Dataset before training each Model. |
This package provides the infrastructure for transformations. The workflow is first to build a TransformationMap, which represents the Transformations and the order in which they should be applied to the specified Features. This can be applied to a Dataset to produce a TransformerMap, which contains a fitted set of Transformers. These can be used to apply the transformation to any other Dataset (e.g., to apply the same transformation to training and test sets), or at prediction time to stream data through.
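
The fit-then-apply workflow described above can be illustrated with a minimal, library-free sketch. It does not use the Tribuo API: the class `FitThenApplySketch`, its methods, and the choice of min-max rescaling are all hypothetical stand-ins, with a plain `Map<String, Double>` playing the role of a sparse example. The point it shows is that statistics are gathered once from the training data and then reused unchanged on other data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical toy model of "describe, fit, apply"; not the Tribuo API.
final class FitThenApplySketch {

    /** Fits a min-max rescaling to [0, 1] from the observed values of one feature. */
    static DoubleUnaryOperator fitMinMax(List<Double> observed) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : observed) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        final double lo = min;
        final double range = max - min;
        // A constant feature maps to 0.0 to avoid dividing by zero.
        return v -> range == 0.0 ? 0.0 : (v - lo) / range;
    }

    /** Fits one rescaler per named feature, using only values present in the examples. */
    static Map<String, DoubleUnaryOperator> fit(List<Map<String, Double>> train, List<String> featureNames) {
        Map<String, DoubleUnaryOperator> fitted = new HashMap<>();
        for (String name : featureNames) {
            List<Double> observed = new ArrayList<>();
            for (Map<String, Double> example : train) {
                if (example.containsKey(name)) {
                    observed.add(example.get(name));
                }
            }
            if (!observed.isEmpty()) {
                fitted.put(name, fitMinMax(observed));
            }
        }
        return fitted;
    }

    /** Applies the fitted rescalers to a copy of one example; other features pass through. */
    static Map<String, Double> apply(Map<String, DoubleUnaryOperator> fitted, Map<String, Double> example) {
        Map<String, Double> out = new HashMap<>(example);
        fitted.forEach((name, f) -> out.computeIfPresent(name, (k, v) -> f.applyAsDouble(v)));
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> train = List.of(Map.of("height", 150.0), Map.of("height", 190.0));
        Map<String, DoubleUnaryOperator> fitted = fit(train, List.of("height"));
        // The unseen example is rescaled with the training-set minimum and maximum.
        System.out.println(apply(fitted, Map.of("height", 170.0))); // prints {height=0.5}
    }
}
```

Keeping the fitted state separate from the description of the transformation is what allows the same statistics to be applied to a test set or to a single example at prediction time.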
It also provides a TransformTrainer, which accepts a TransformationMap and an inner Trainer and produces a TransformedModel that automatically transforms its input data at prediction time.
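
The wrapping idea can be sketched without the library as follows. This is a hypothetical illustration, not Tribuo's implementation: `WrappedTrainerSketch`, the mean-centring transform, and the use of `DoubleUnaryOperator` as a stand-in for a model are all assumptions made for the example. It shows a trainer that fits a transform, trains an inner trainer on the transformed data, and returns a model that transforms each input before predicting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.function.Function;

// Hypothetical toy model of a trainer that wraps another trainer; not the Tribuo API.
final class WrappedTrainerSketch {

    /**
     * Fits a mean-centring transform on the inputs, trains the inner trainer on the
     * centred inputs, and returns a model that centres its input before predicting.
     */
    static DoubleUnaryOperator train(List<Double> inputs,
                                     Function<List<Double>, DoubleUnaryOperator> innerTrainer) {
        double sum = 0.0;
        for (double v : inputs) {
            sum += v;
        }
        final double mean = inputs.isEmpty() ? 0.0 : sum / inputs.size();
        DoubleUnaryOperator transformer = v -> v - mean;

        List<Double> transformed = new ArrayList<>();
        for (double v : inputs) {
            transformed.add(transformer.applyAsDouble(v));
        }

        DoubleUnaryOperator innerModel = innerTrainer.apply(transformed);
        // The returned model applies the fitted transform first, then the inner model.
        return transformer.andThen(innerModel);
    }

    public static void main(String[] args) {
        // Toy inner trainer: ignores its data and returns a sign classifier.
        DoubleUnaryOperator model = train(List.of(10.0, 20.0, 30.0), data -> v -> v > 0 ? 1.0 : 0.0);
        System.out.println(model.applyAsDouble(25.0)); // prints 1.0
        System.out.println(model.applyAsDouble(12.0)); // prints 0.0
    }
}
```

Callers of the returned model pass raw values; the centring step is hidden inside the model, which is the convenience the wrapper provides.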
Transformations don't produce new Features; they only modify the values of existing ones. When doing so they can be instructed to treat Features that are absent due to sparsity as zero or as not existing at all. Independently, we can explicitly add zero-valued Features by densifying the dataset before the transformation is fit or before it is applied. Once they exist, these Features can be altered by Transformers and are visible to Transformations which are being fit.
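
Densification as described above can be pictured with a small sketch. It is a hypothetical illustration rather than the library's code: `DensifySketch` and its `densify` method are invented names, and a sorted map stands in for a sparse example. It shows implicit zeros becoming explicit entries for every feature the dataset knows about.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of densifying a sparse example; not the Tribuo API.
final class DensifySketch {

    /** Returns a copy of the example with an explicit 0.0 for every known feature it lacks. */
    static Map<String, Double> densify(Map<String, Double> example, List<String> knownFeatures) {
        Map<String, Double> dense = new TreeMap<>(example);
        for (String name : knownFeatures) {
            dense.putIfAbsent(name, 0.0);
        }
        return dense;
    }

    public static void main(String[] args) {
        Map<String, Double> sparse = Map.of("b", 2.0);
        // Features "a" and "c" were implicit zeros; after densifying they are explicit.
        System.out.println(densify(sparse, List.of("a", "b", "c"))); // prints {a=0.0, b=2.0, c=0.0}
    }
}
```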
The transformation fitting methods have two parameters which alter their behaviour: includeImplicitZeroFeatures and densify. includeImplicitZeroFeatures controls whether the transformation incorporates the implicit zero-valued features (i.e., the ones not present in the example but present in the dataset's FeatureMap) when building the transformation statistics. This is important when working with, e.g., IDFTransformation, as it allows correct computation of the inverse document frequency, but it can be detrimental to features which are one-hot encodings of categoricals (as they have many more implicit zeros). densify controls whether the example or dataset should have its implicit zero-valued features converted into explicit zero-valued features (i.e., it makes a sparse example into a dense one which contains an explicit value for every feature known to the dataset) before the transformation is applied; transformations are only applied to feature values which are present.
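
The effect of the two flags can be made concrete with a toy mean-centring transform. This sketch is an assumption-laden illustration, not the library's behaviour in detail: `ImplicitZeroSketch`, `fitMean`, and `center` are hypothetical names, and only the parameter names `includeImplicitZeroFeatures` and `densify` come from the text. One flag changes which values feed the fitted statistic; the other changes which values get transformed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the two flags; not the Tribuo API.
final class ImplicitZeroSketch {

    /** Mean of one feature; absent values count as 0.0 only if includeImplicitZeroFeatures is true. */
    static double fitMean(List<Map<String, Double>> data, String name, boolean includeImplicitZeroFeatures) {
        double sum = 0.0;
        int count = 0;
        for (Map<String, Double> example : data) {
            Double v = example.get(name);
            if (v != null) {
                sum += v;
                count++;
            } else if (includeImplicitZeroFeatures) {
                count++; // an implicit zero adds nothing to the sum but is still counted
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    /** Subtracts the fitted mean; an absent value is centred only if densify is true. */
    static Map<String, Double> center(Map<String, Double> example, String name, double mean, boolean densify) {
        Map<String, Double> out = new TreeMap<>(example);
        if (densify) {
            out.putIfAbsent(name, 0.0); // make the implicit zero explicit
        }
        out.computeIfPresent(name, (k, v) -> v - mean);
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> data = List.of(Map.of("x", 3.0), Map.of("x", 6.0), Map.of("y", 1.0));
        Map<String, Double> sparse = Map.of("y", 1.0); // "x" is an implicit zero here
        for (boolean include : new boolean[] {false, true}) {
            for (boolean densify : new boolean[] {false, true}) {
                double mean = fitMean(data, "x", include);
                System.out.println("include=" + include + " densify=" + densify
                        + " mean=" + mean + " -> " + center(sparse, "x", mean, densify));
            }
        }
    }
}
```

With this data the fitted mean of `x` is 4.5 when implicit zeros are ignored and 3.0 when they are counted, and `x` appears in the transformed example only when `densify` is true.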
These parameters interact to form 4 possibilities:
- includeImplicitZeroFeatures is false, densify is false: the implicit zeros are neither used to fit the transformation nor modified when the transformation is applied.
- includeImplicitZeroFeatures is true, densify is true: the implicit zeros are used to fit the transformation, and are converted to explicit zeros and transformed.
- includeImplicitZeroFeatures is true, densify is false: the implicit zeros are used to fit the transformation, but not modified when the transformation is applied. This is most useful when working with text data where you want to compute IDF-style statistics.
- includeImplicitZeroFeatures is false, densify is true: the implicit zeros are not used to fit the transformation, but are converted to explicit zeros and transformed. This is less useful than the other three combinations, but could be used to move the minimum value, or when zero is not appropriate for a missing value and needs to be transformed.
Alternatively, call MutableDataset.densify() before passing the data to TransformTrainer.train(org.tribuo.Dataset<T>, java.util.Map<java.lang.String, com.oracle.labs.mlrg.olcut.provenance.Provenance>), which is equivalent to setting includeImplicitZeroFeatures to true and densify to true. To sum up, in the context of transformations includeImplicitZeroFeatures determines whether (implicit) zero-valued features are measured and densify determines whether they can be altered.
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.