Class BinningTransformation

java.lang.Object
org.tribuo.transform.transformations.BinningTransformation
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<TransformationProvenance>, Transformation

public final class BinningTransformation extends Object implements Transformation
A Transformation which bins values.

Three binning types are implemented:

  • Equal width bins, based on the observed min and max.
  • Equal frequency bins, based on the observed data.
  • Standard deviation width bins, based on the observed standard deviation and mean.

The equal frequency TransformStatistics must store every observed feature value, so it uses much more memory than the other two binning types.

The binned values are in the range [1, numBins]. For standard deviation binning, numBins is numDeviations*2.
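
The memory note above can be illustrated with a small standalone sketch. It does not use Tribuo and is not the library's implementation; the class names RunningSummary and StoredValues are hypothetical. It assumes that equal width bins need only a running min and max, that standard deviation bins need only a running mean and variance (computed here with Welford's update, using the sample standard deviation as an assumption), and that equal frequency bins need the full list of values to locate quantiles.

```java
import java.util.ArrayList;
import java.util.List;

final class BinningStatsSketch {
    /** Constant-memory summary: enough for equal width and standard deviation bins. */
    static final class RunningSummary {
        long count = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double mean = 0.0;
        double sumSquaredDiffs = 0.0; // running sum of squared deviations from the mean

        void observe(double value) {
            count++;
            min = Math.min(min, value);
            max = Math.max(max, value);
            // Welford's update: numerically stable single-pass mean and variance.
            double delta = value - mean;
            mean += delta / count;
            sumSquaredDiffs += delta * (value - mean);
        }

        // Assumption: sample (n - 1) standard deviation; the library may differ.
        double sampleStdDev() {
            return count > 1 ? Math.sqrt(sumSquaredDiffs / (count - 1)) : 0.0;
        }
    }

    /** Linear-memory store: quantile cut points need every observed value. */
    static final class StoredValues {
        final List<Double> values = new ArrayList<>();

        void observe(double value) {
            values.add(value);
        }
    }

    public static void main(String[] args) {
        RunningSummary summary = new RunningSummary();
        StoredValues stored = new StoredValues();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            summary.observe(v);
            stored.observe(v);
        }
        System.out.println("min=" + summary.min + " max=" + summary.max
                + " mean=" + summary.mean + " stdDev=" + summary.sampleStdDev());
        System.out.println("stored values: " + stored.values.size());
    }
}
```

The summary object stays the same size however many values it sees, while the stored list grows by one entry per observation.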

  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
    • createStats

      public TransformStatistics createStats()
      Description copied from interface: Transformation
      Creates the statistics object for this Transformation.
      Specified by:
      createStats in interface Transformation
      Returns:
      The statistics object.
    • getProvenance

      public TransformationProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<TransformationProvenance>
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equalWidth

      public static BinningTransformation equalWidth(int numBins)
      Returns a BinningTransformation which generates fixed equal width bins between the observed min and max values.

      Values outside the observed range are clamped to either the minimum or maximum bin. Bins are numbered in the range [1,numBins].

      Parameters:
      numBins - The number of bins to generate.
      Returns:
      An equal width binning.
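
As an illustration of the equal width rule described for equalWidth, here is a minimal standalone sketch of the arithmetic. It does not call Tribuo and is not the library's code; the class EqualWidthSketch and its bin method are hypothetical. It assumes that a value sitting exactly on a bin boundary goes to the higher bin and that a degenerate range (max not greater than min) maps everything to bin 1.

```java
final class EqualWidthSketch {
    /** Maps a value to a bin index in [1, numBins], given the observed min and max. */
    static int bin(double value, double min, double max, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be at least 1");
        }
        if (!(max > min)) {
            return 1; // assumption: a degenerate range collapses to a single bin
        }
        if (value <= min) {
            return 1; // clamp values at or below the observed minimum
        }
        if (value >= max) {
            return numBins; // clamp values at or above the observed maximum
        }
        double width = (max - min) / numBins;
        int index = (int) Math.floor((value - min) / width) + 1;
        return Math.min(index, numBins); // guard against floating point rounding
    }

    public static void main(String[] args) {
        // Observed range [0, 10] split into 5 bins of width 2.
        double[] samples = {-3.0, 0.0, 1.9, 2.0, 5.0, 9.99, 10.0, 25.0};
        for (double v : samples) {
            System.out.println(v + " -> bin " + bin(v, 0.0, 10.0, 5));
        }
    }
}
```

Out-of-range inputs such as -3.0 and 25.0 land in the first and last bins respectively, matching the clamping behaviour described above.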
    • equalFrequency

      public static BinningTransformation equalFrequency(int numBins)
      Returns a BinningTransformation which generates bins that each contain the same amount of training data; that is, each bin has an equal probability of occurrence in the training data.

      Values outside the observed range are clamped to either the minimum or maximum bin. Bins are numbered in the range [1,numBins].

      Parameters:
      numBins - The number of bins to generate.
      Returns:
      An equal frequency binning.
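
The equal frequency idea described for equalFrequency can be sketched in standalone Java as follows. This is not Tribuo's implementation; the class EqualFrequencySketch and its methods are hypothetical. It assumes cut points are taken directly from the sorted training values at evenly spaced ranks, and that a value equal to a cut point goes to the higher bin. Ties in the data can make the bins uneven.

```java
import java.util.Arrays;

final class EqualFrequencySketch {
    /** Returns numBins - 1 cut points taken at evenly spaced ranks of the sorted data. */
    static double[] cutPoints(double[] observed, int numBins) {
        if (numBins < 1 || observed.length == 0) {
            throw new IllegalArgumentException("need numBins >= 1 and at least one value");
        }
        double[] sorted = observed.clone(); // all values are kept, hence the memory cost
        Arrays.sort(sorted);
        double[] cuts = new double[numBins - 1];
        for (int i = 0; i < cuts.length; i++) {
            int rank = (int) ((long) (i + 1) * sorted.length / numBins);
            cuts[i] = sorted[rank];
        }
        return cuts;
    }

    /** Maps a value to a bin index in [1, cuts.length + 1]. */
    static int bin(double value, double[] cuts) {
        int index = 1;
        for (double cut : cuts) {
            if (value >= cut) {
                index++; // value is at or past this cut point, move to the next bin
            } else {
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        double[] training = {8, 1, 6, 3, 2, 7, 4, 5};
        double[] cuts = cutPoints(training, 4);
        System.out.println("cut points: " + Arrays.toString(cuts));
        for (double v : new double[] {-100, 1, 4, 6, 8, 100}) {
            System.out.println(v + " -> bin " + bin(v, cuts));
        }
    }
}
```

Values below the smallest cut point or above the largest fall into the first or last bin, so no separate clamping step is needed in this sketch.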
    • stdDevs

      public static BinningTransformation stdDevs(int numDeviations)
      Returns a BinningTransformation which generates bins based on the observed standard deviation of the training data. Each bin is one standard deviation wide, except for the bins at the edges, which encompass all lower or higher values.

      Bins are numbered in the range [1,numDeviations*2]. The middle two bins are on either side of the mean; the lowest bin is the mean minus numDeviations * the observed standard deviation, and the highest bin is the mean plus numDeviations * the observed standard deviation.

      Parameters:
      numDeviations - The number of standard deviations to bin.
      Returns:
      A standard deviation based binning.
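
One way to read the stdDevs description is sketched below in standalone Java. It is an interpretation and not Tribuo's code; the class StdDevBinSketch and its bin method are hypothetical. It assumes bin boundaries sit at mean + k * stdDev for whole numbers k from -(numDeviations - 1) to numDeviations - 1, which gives numDeviations*2 bins, puts the middle two bins on either side of the mean, and leaves the two edge bins open-ended. It also assumes a value exactly on a boundary goes to the higher bin.

```java
final class StdDevBinSketch {
    /** Maps a value to a bin index in [1, 2 * numDeviations]. */
    static int bin(double value, double mean, double stdDev, int numDeviations) {
        if (numDeviations < 1 || !(stdDev > 0.0)) {
            throw new IllegalArgumentException("need numDeviations >= 1 and stdDev > 0");
        }
        // Signed distance from the mean, in units of standard deviations.
        double z = (value - mean) / stdDev;
        // floor(z) is -1 just below the mean and 0 just above it. Clamping it to
        // [-numDeviations, numDeviations - 1] folds all far values into the edge bins.
        double clamped = Math.max(-numDeviations, Math.min(numDeviations - 1, Math.floor(z)));
        return (int) clamped + numDeviations + 1;
    }

    public static void main(String[] args) {
        // mean 10, standard deviation 2, numDeviations 2, so bins are numbered 1 to 4.
        for (double v : new double[] {-100.0, 7.0, 9.5, 10.5, 13.0, 100.0}) {
            System.out.println(v + " -> bin " + bin(v, 10.0, 2.0, 2));
        }
    }
}
```

Clamping the floored z-score before converting to int avoids integer overflow for extreme inputs.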