Class BinningTransformation
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<TransformationProvenance>, Transformation
Three binning types are implemented:
- Equal width bins, based on the observed min and max.
- Equal frequency bins, based on the observed data.
- Standard deviation width bins, based on the observed standard deviation and mean.
The TransformStatistics for equal frequency binning needs to store all the observed feature values, and thus has much higher memory usage than the other two binning types.
The binned values are in the range [1, numBins].
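The memory difference comes from what each statistics object has to keep before the bins can be computed. The following is a hypothetical sketch, not Tribuo's TransformStatistics implementation: the class BinningStatsSketch and its fields are illustrative names, and it assumes the standard deviation is accumulated with Welford's online algorithm using the sample (n - 1) denominator, which Tribuo may not do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tribuo's TransformStatistics: shows the state
// each binning type would need to accumulate from the training data.
final class BinningStatsSketch {
    // Equal width: running min and max only, constant memory.
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    // Standard deviation: count, mean and sum of squared deviations,
    // updated with Welford's online algorithm (an assumption), constant memory.
    long count = 0;
    double mean = 0.0;
    double m2 = 0.0;

    // Equal frequency: every observed value, so memory grows with the data.
    final List<Double> values = new ArrayList<>();

    void observe(double v) {
        min = Math.min(min, v);
        max = Math.max(max, v);
        count++;
        double delta = v - mean;
        mean += delta / count;
        m2 += delta * (v - mean);
        values.add(v);
    }

    // Sample standard deviation (n - 1 denominator); assumed, Tribuo may differ.
    double stdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }

    public static void main(String[] args) {
        BinningStatsSketch stats = new BinningStatsSketch();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.observe(v);
        }
        System.out.printf("min=%.1f max=%.1f mean=%.1f sd=%.3f stored=%d%n",
                stats.min, stats.max, stats.mean, stats.stdDev(), stats.values.size());
    }
}
```

Only the equal frequency state grows with the number of training examples; the other two keep a fixed handful of numbers per feature.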
-
Nested Class Summary
- static final class: Provenance for BinningTransformation.
- static final class: The implementation of a Transformer which splits the input into n bins.
- static enum: The allowed binning types.
-
Method Summary
- createStats(): Creates the statistics object for this Transformation.
- static BinningTransformation equalFrequency(int numBins): Returns a BinningTransformation which generates bins that each contain the same amount of training data; that is, each bin has an equal probability of occurrence in the training data.
- static BinningTransformation equalWidth(int numBins): Returns a BinningTransformation which generates fixed equal width bins between the observed min and max values.
- void postConfig(): Used by the OLCUT configuration system, and should not be called by external code.
- static BinningTransformation stdDevs(int numDeviations): Returns a BinningTransformation which generates bins based on the observed standard deviation of the training data.
- toString()
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
- Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
createStats
Description copied from interface: Transformation
Creates the statistics object for this Transformation.
- Specified by:
createStats in interface Transformation
- Returns:
- The statistics object.
-
getProvenance
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<TransformationProvenance>
-
toString
-
equalWidth
Returns a BinningTransformation which generates fixed equal width bins between the observed min and max values. Values outside the observed range are clamped to either the minimum or maximum bin. Bins are numbered in the range [1, numBins].
- Parameters:
numBins
- The number of bins to generate.
- Returns:
- An equal width binning.
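A minimal sketch of the equal width rule described above, assuming a value's bin is found by dividing its offset from the observed minimum by the bin width and then clamping the result to [1, numBins]. EqualWidthSketch and bin are hypothetical names; sending a value that lies exactly on an interior edge to the upper bin, and putting everything in bin 1 when min equals max, are assumptions rather than documented behaviour.

```java
// Hypothetical sketch of equal width binning, not Tribuo's implementation.
final class EqualWidthSketch {
    // Returns a bin index in [1, numBins] for value, given the observed min and max.
    static int bin(double value, double min, double max, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be positive");
        }
        double width = (max - min) / numBins;
        if (width <= 0) {
            return 1; // degenerate range: everything goes to the first bin (assumption)
        }
        // Offset from min in units of bin width, shifted so bins start at 1.
        int idx = (int) Math.floor((value - min) / width) + 1;
        // Clamp: value == max and out-of-range values land in the edge bins.
        return Math.max(1, Math.min(numBins, idx));
    }

    public static void main(String[] args) {
        // Observed min 0 and max 10 with five bins, so each bin is 2 wide.
        for (double v : new double[] {-3, 0, 1.9, 2, 5, 10, 42}) {
            System.out.println(v + " -> " + bin(v, 0, 10, 5));
        }
    }
}
```

Clamping after the division handles both the observed maximum (which would otherwise fall into a nonexistent bin numBins + 1) and unseen values beyond the training range.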
-
equalFrequency
Returns a BinningTransformation which generates bins that each contain the same amount of training data; that is, each bin has an equal probability of occurrence in the training data. Values outside the observed range are clamped to either the minimum or maximum bin. Bins are numbered in the range [1, numBins].
- Parameters:
numBins
- The number of bins to generate.
- Returns:
- An equal frequency binning.
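Equal frequency bins can be sketched by sorting the training values and placing cut points at evenly spaced ranks. This is an illustration under stated assumptions, not Tribuo's code: the nearest-rank quantile rule, the tie-breaking (a value equal to a cut point goes to the lower bin), and the names EqualFrequencySketch, cutPoints and bin are all hypothetical. The sort over every observed value is also why this binning type must retain all of them.

```java
import java.util.Arrays;

// Hypothetical sketch of equal frequency binning, not Tribuo's implementation.
final class EqualFrequencySketch {
    // Returns numBins - 1 interior cut points taken from the sorted training data.
    static double[] cutPoints(double[] training, int numBins) {
        if (numBins < 1 || training.length == 0) {
            throw new IllegalArgumentException("need numBins >= 1 and some data");
        }
        double[] sorted = training.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[numBins - 1];
        for (int i = 1; i < numBins; i++) {
            // Nearest-rank quantile at fraction i / numBins (an assumption).
            int rank = (int) Math.ceil((double) i * sorted.length / numBins) - 1;
            cuts[i - 1] = sorted[rank];
        }
        return cuts;
    }

    // Bin index in [1, numBins]: one plus the number of cut points strictly below value.
    // Out-of-range values fall into the first or last bin automatically.
    static int bin(double value, double[] cuts) {
        int lo = 0;
        int hi = cuts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cuts[mid] < value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo + 1;
    }

    public static void main(String[] args) {
        double[] train = {5, 1, 8, 3, 2, 7, 4, 6};
        double[] cuts = cutPoints(train, 4);
        System.out.println("cuts = " + Arrays.toString(cuts)); // prints cuts = [2.0, 4.0, 6.0]
        for (double v : train) {
            System.out.println(v + " -> " + bin(v, cuts));
        }
    }
}
```

With eight distinct training values and four bins, each bin receives exactly two of them. When the training data contains many repeated values, cut points can coincide and the bins can only be approximately equal in size.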
-
stdDevs
Returns a BinningTransformation which generates bins based on the observed standard deviation of the training data. Each bin is one standard deviation wide, except for the bins at the edges, which encompass all lower or higher values. Bins are numbered in the range [1, numDeviations*2]. The middle two bins lie on either side of the mean. The lowest bin corresponds to the mean minus numDeviations * observed standard deviation, and the highest bin to the mean plus numDeviations * observed standard deviation.
- Parameters:
numDeviations
- The number of standard deviations to bin.
- Returns:
- A standard deviation based binning.
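One reading of the standard deviation scheme, sketched under assumptions: the 2 * numDeviations bins are each one standard deviation wide and tile the interval from mean - numDeviations * sd to mean + numDeviations * sd, with the two edge bins extended to cover everything beyond it. StdDevSketch and bin are hypothetical names, and the behaviour for a zero standard deviation is an assumption, not documented Tribuo behaviour.

```java
// Hypothetical sketch of standard deviation binning, not Tribuo's implementation.
final class StdDevSketch {
    // Returns a bin index in [1, 2 * numDeviations].
    static int bin(double value, double mean, double stdDev, int numDeviations) {
        if (numDeviations < 1) {
            throw new IllegalArgumentException("numDeviations must be positive");
        }
        int numBins = 2 * numDeviations;
        if (stdDev <= 0) {
            // Constant feature: split at the mean only (assumption).
            return value >= mean ? numDeviations + 1 : numDeviations;
        }
        // Whole standard deviations from the mean, shifted so that
        // [mean, mean + sd) maps to bin numDeviations + 1.
        int idx = (int) Math.floor((value - mean) / stdDev) + numDeviations + 1;
        // Clamp so the edge bins absorb all lower or higher values.
        return Math.max(1, Math.min(numBins, idx));
    }

    public static void main(String[] args) {
        // Mean 0, standard deviation 1, two deviations: four bins.
        for (double v : new double[] {-5, -1.5, -0.5, 0, 0.5, 1.5, 5}) {
            System.out.println(v + " -> " + bin(v, 0, 1, 2));
        }
    }
}
```

Under this reading bins numDeviations and numDeviations + 1 are the two middle bins on either side of the mean, matching the description above.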
-