Package org.tribuo

Class CategoricalInfo

All Implemented Interfaces:
Serializable, Cloneable, ProtoSerializable<org.tribuo.protos.core.VariableInfoProto>, VariableInfo
Direct Known Subclasses:
CategoricalIDInfo

public class CategoricalInfo extends SkeletalVariableInfo
Stores information about Categorical features.

Contains a mapping from each value to its observed count. It includes an initial optimisation for the binary case to reduce memory consumption.
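The binary-case optimisation can be pictured with a minimal sketch. It assumes the info keeps a single value and count (mirroring the observedValue and observedCount fields) until a second distinct value arrives, and only then allocates the map. The class SketchCategoricalCounter and its usesMap() helper are hypothetical stand-ins, not part of Tribuo, and the exact-equality comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not Tribuo's implementation. */
final class SketchCategoricalCounter {
    // Stays null while at most one distinct value has been seen.
    private Map<Double, Long> valueCounts;
    // The single value seen so far, and how often it was seen.
    private double observedValue = Double.NaN;
    private long observedCount = 0;

    void observe(double value) {
        if (valueCounts != null) {
            // General case: count every value in the map.
            valueCounts.merge(value, 1L, Long::sum);
        } else if (observedCount == 0 || Double.compare(value, observedValue) == 0) {
            // Binary case: first observation, or a repeat of the single value.
            // Assumption: values are compared exactly, with no tolerance.
            observedValue = value;
            observedCount++;
        } else {
            // A second distinct value arrived, so switch to the map.
            valueCounts = new HashMap<>();
            valueCounts.put(observedValue, observedCount);
            valueCounts.put(value, 1L);
            observedValue = Double.NaN;
            observedCount = 0;
        }
    }

    long getObservationCount(double value) {
        if (valueCounts != null) {
            return valueCounts.getOrDefault(value, 0L);
        }
        boolean matches = observedCount > 0 && Double.compare(value, observedValue) == 0;
        return matches ? observedCount : 0L;
    }

    int getUniqueObservations() {
        if (valueCounts != null) {
            return valueCounts.size();
        }
        return observedCount > 0 ? 1 : 0;
    }

    /** Reports whether the map has been allocated yet. */
    boolean usesMap() {
        return valueCounts != null;
    }
}
```

In this sketch a feature that only ever takes one value never allocates a map, which is where the memory saving would come from.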

Can be transformed into a RealInfo if there are too many unique observed values.

Does not contain an id number, but can be transformed into CategoricalIDInfo which does contain an id number.

Note that the synchronization in this class only protects the points where the CDF and values arrays are recomputed. Care should be taken if data is read while observe(double) is being called.

  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
    • THRESHOLD

      public static final int THRESHOLD
      The default threshold for converting a categorical info into a RealInfo.
    • valueCounts

      protected Map<Double,com.oracle.labs.mlrg.olcut.util.MutableLong> valueCounts
      The occurrence counts of each value.
    • observedValue

      protected double observedValue
      The observed value, if only a single distinct value has been seen.
    • observedCount

      protected long observedCount
      The count of the observed value, if only a single distinct value has been seen.
    • values

      protected transient double[] values
      The values array.
    • totalObservations

      protected transient long totalObservations
      The total number of observations (including zeros).
    • cdf

      protected transient double[] cdf
      The CDF to sample from.
  • Constructor Details

    • CategoricalInfo

      public CategoricalInfo(String name)
      Constructs a new empty categorical info for the supplied feature name.
      Parameters:
      name - The feature name.
    • CategoricalInfo

      protected CategoricalInfo(CategoricalInfo info)
      Constructs a deep copy of the supplied categorical info.
      Parameters:
      info - The info to copy.
    • CategoricalInfo

      protected CategoricalInfo(CategoricalInfo info, String newName)
      Constructs a deep copy of the supplied categorical info, with the new feature name.
      Parameters:
      info - The info to copy.
      newName - The new feature name.
  • Method Details

    • deserializeFromProto

      public static CategoricalInfo deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • serialize

      public org.tribuo.protos.core.VariableInfoProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Returns:
      The protobuf.
    • observe

      protected void observe(double value)
      Description copied from class: SkeletalVariableInfo
      Records the value.
      Overrides:
      observe in class SkeletalVariableInfo
      Parameters:
      value - The observed value.
    • getObservationCount

      public long getObservationCount(double value)
      Gets the number of times a specific value was observed, or zero if the value is unknown.
      Parameters:
      value - The value to check.
      Returns:
      The number of times this value was observed, or zero if it is unknown.
    • getUniqueObservations

      public int getUniqueObservations()
      Gets the number of unique values this CategoricalInfo has observed.
      Returns:
      An int representing the number of unique values.
    • generateRealInfo

      public RealInfo generateRealInfo()
      Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.
      Returns:
      A RealInfo representing the data in this CategoricalInfo.
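As a rough illustration of the statistics named above, the following sketch derives min, max, mean and variance from a value-to-count map. CountStatsSketch and Summary are hypothetical names, a plain Map<Double, Long> stands in for the valueCounts field, and the use of the sample variance (dividing by n - 1) is an assumption, since this page does not say which form RealInfo reports.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class CountStatsSketch {
    /** Simple holder for the four summary statistics. */
    static final class Summary {
        final double min;
        final double max;
        final double mean;
        final double variance;

        Summary(double min, double max, double mean, double variance) {
            this.min = min;
            this.max = max;
            this.mean = mean;
            this.variance = variance;
        }
    }

    static Summary summarize(Map<Double, Long> counts) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long n = 0;
        // First pass: range, weighted sum and total count.
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            double value = e.getKey();
            long count = e.getValue();
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value * count;
            n += count;
        }
        if (n == 0) {
            throw new IllegalArgumentException("No observations to summarise");
        }
        double mean = sum / n;
        // Second pass: count-weighted sum of squared deviations from the mean.
        double sumSquares = 0.0;
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            double diff = e.getKey() - mean;
            sumSquares += diff * diff * e.getValue();
        }
        // Assumption: sample variance; a single observation has variance 0.
        double variance = n > 1 ? sumSquares / (n - 1) : 0.0;
        return new Summary(min, max, mean, variance);
    }
}
```

Only observed values contribute here. Implicit zeros are left out of this sketch.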
    • copy

      public CategoricalInfo copy()
      Description copied from interface: VariableInfo
      Returns a copy of this variable info.
      Returns:
      A copy.
    • makeIDInfo

      public CategoricalIDInfo makeIDInfo(int id)
      Description copied from interface: VariableInfo
      Generates a VariableIDInfo subclass which represents the same feature.
      Parameters:
      id - The id number.
      Returns:
      A VariableInfo with the same information, plus the id.
    • rename

      public CategoricalInfo rename(String newName)
      Description copied from interface: VariableInfo
      Generates a fresh VariableInfo with the new name. The name forms part of the hash code, so it is immutable within an object.
      Parameters:
      newName - The new name.
      Returns:
      A VariableInfo subclass with the new name.
    • uniformSample

      public double uniformSample(SplittableRandom rng)
      Description copied from interface: VariableInfo
      Sample a value uniformly from the range of this variable.
      Parameters:
      rng - The rng to use.
      Returns:
      A sample from this variable.
    • frequencyBasedSample

      public double frequencyBasedSample(SplittableRandom rng, long totalObservations)
      Samples a value from this feature according to the frequency of observation.
      Parameters:
      rng - The RNG to use.
      totalObservations - The observations including the implicit zeros.
      Returns:
      The sampled value.
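One way to read "according to the frequency of observation" is inverse-CDF sampling in which the implicit zeros receive whatever probability mass the observed counts leave over. The sketch below makes that assumption explicit. FrequencySamplerSketch is a hypothetical class, a Map<Double, Long> stands in for the stored counts, and the linear scan over the CDF is chosen for brevity rather than taken from Tribuo.

```java
import java.util.Map;
import java.util.SplittableRandom;
import java.util.TreeMap;

/** Hypothetical sampler for illustration; not Tribuo's implementation. */
final class FrequencySamplerSketch {
    final double[] values; // distinct values in ascending order
    final double[] cdf;    // cumulative probability for each entry of values

    FrequencySamplerSketch(Map<Double, Long> counts, long totalObservations) {
        if (totalObservations <= 0) {
            throw new IllegalArgumentException("totalObservations must be positive");
        }
        TreeMap<Double, Long> sorted = new TreeMap<>(counts);
        long observed = 0;
        for (long c : sorted.values()) {
            observed += c;
        }
        // Anything not explicitly observed is treated as an implicit zero.
        long implicitZeros = totalObservations - observed;
        if (implicitZeros < 0) {
            throw new IllegalArgumentException("totalObservations is smaller than the observed counts");
        }
        if (implicitZeros > 0) {
            sorted.merge(0.0, implicitZeros, Long::sum);
        }
        values = new double[sorted.size()];
        cdf = new double[sorted.size()];
        long running = 0;
        int i = 0;
        for (Map.Entry<Double, Long> e : sorted.entrySet()) {
            running += e.getValue();
            values[i] = e.getKey();
            cdf[i] = (double) running / totalObservations;
            i++;
        }
    }

    double sample(SplittableRandom rng) {
        double u = rng.nextDouble(); // uniform in [0, 1)
        for (int i = 0; i < cdf.length; i++) {
            if (u < cdf[i]) {
                return values[i];
            }
        }
        // Guards against rounding; the final CDF entry is 1.0.
        return values[values.length - 1];
    }
}
```

For example, with counts {1.0: 3, 2.0: 1} and totalObservations of 8, the four unaccounted observations become zeros, giving the values 0.0, 1.0 and 2.0 with cumulative probabilities 0.5, 0.875 and 1.0.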
    • frequencyBasedSample

      public double frequencyBasedSample(Random rng, long totalObservations)
      Samples a value from this feature according to the frequency of observation.
      Parameters:
      rng - The RNG to use.
      totalObservations - The observations including the implicit zeros.
      Returns:
      The sampled value.
    • getValues

      public double[] getValues()
      Returns an array containing the observed values ordered by Double.compare(double, double).
      Returns:
      The observed values.
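The ordering matters for edge cases: Double.compare places -0.0 before 0.0 and NaN after every other value. The sketch below shows one way to obtain that order, relying on Arrays.sort(double[]), which is documented to use the same total order as Double.compareTo. SortedValuesSketch is a hypothetical helper, and a Set<Double> stands in for the key set of the stored counts.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class SortedValuesSketch {
    static double[] sortedValues(Set<Double> observed) {
        double[] out = new double[observed.size()];
        int i = 0;
        for (double v : observed) {
            out[i++] = v;
        }
        // Sorts with the total order of Double.compareTo: -0.0 < 0.0, NaN last.
        Arrays.sort(out);
        return out;
    }
}
```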
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class SkeletalVariableInfo
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class SkeletalVariableInfo
    • toString

      public String toString()
      Overrides:
      toString in class SkeletalVariableInfo