Package org.tribuo
Class CategoricalInfo
java.lang.Object
org.tribuo.SkeletalVariableInfo
org.tribuo.CategoricalInfo
- All Implemented Interfaces:
Serializable, Cloneable, ProtoSerializable<org.tribuo.protos.core.VariableInfoProto>, VariableInfo
- Direct Known Subclasses:
CategoricalIDInfo
Stores information about categorical features.
Contains a mapping from each observed value to the number of times it was observed, with an initial optimisation for the binary case that reduces memory consumption.
Can be transformed into a RealInfo if there are too many unique observed values.
Does not contain an id number, but can be transformed into a CategoricalIDInfo, which does contain an id number.
Note that the synchronization in this class only protects instantiation, where the CDF and values are recomputed. Care should be taken if data is read while observe(double) is being called.
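The storage scheme described above can be illustrated with a minimal, self-contained sketch. It is not Tribuo's implementation: the class name ValueCountSketch is hypothetical, and it assumes a plain HashMap<Double, Long> for the counts. The idea shown is that a single value/count pair is kept until a second distinct value arrives, and only then is the map allocated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of value counting with a single-value fast path; not Tribuo's code. */
final class ValueCountSketch {
    private Map<Double, Long> valueCounts = null; // allocated only when a second distinct value appears
    private double observedValue = Double.NaN;    // the only value seen so far (fast path)
    private long observedCount = 0;               // how often that single value was seen

    /** Records one observation of the value. */
    void observe(double value) {
        if (valueCounts != null) {
            valueCounts.merge(value, 1L, Long::sum);
        } else if (observedCount == 0 || Double.compare(value, observedValue) == 0) {
            observedValue = value;
            observedCount++;
        } else {
            // A second distinct value: move the fast-path pair into a map.
            valueCounts = new HashMap<>();
            valueCounts.put(observedValue, observedCount);
            valueCounts.put(value, 1L);
            observedValue = Double.NaN;
            observedCount = 0;
        }
    }

    /** Returns the count for the value, or zero if it was never observed. */
    long getObservationCount(double value) {
        if (valueCounts != null) {
            return valueCounts.getOrDefault(value, 0L);
        }
        return (observedCount > 0 && Double.compare(value, observedValue) == 0) ? observedCount : 0L;
    }

    /** Returns the number of distinct values observed. */
    int getUniqueObservations() {
        if (valueCounts != null) {
            return valueCounts.size();
        }
        return observedCount > 0 ? 1 : 0;
    }

    /** True once the sketch has left the single-value fast path. */
    boolean usesMap() {
        return valueCounts != null;
    }

    public static void main(String[] args) {
        ValueCountSketch info = new ValueCountSketch();
        info.observe(1.0);
        info.observe(1.0);
        System.out.println("map allocated: " + info.usesMap());
        info.observe(2.5);
        System.out.println("map allocated: " + info.usesMap());
        System.out.println("count(1.0) = " + info.getObservationCount(1.0));
        System.out.println("unique = " + info.getUniqueObservations());
    }
}
```

The sketch is not thread-safe, which mirrors the caution above about reading data while observations are being recorded.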
-
Field Summary
Modifier and Type, Field: Description
- protected double[] cdf: The CDF to sample from.
- static final int CURRENT_VERSION: Protobuf serialization version.
- protected long observedCount: The count of the observed value, if only a single value has been observed.
- protected double observedValue: The observed value, if only a single value has been observed.
- static final int THRESHOLD: The default threshold for converting a categorical info into a RealInfo.
- protected long totalObservations: The total number of observations (including zeros).
- valueCounts: The occurrence counts of each value.
- protected double[] values: The values array.
Fields inherited from class org.tribuo.SkeletalVariableInfo
count, name
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
Modifier, Constructor: Description
- CategoricalInfo(String name): Constructs a new empty categorical info for the supplied feature name.
- protected CategoricalInfo(CategoricalInfo info): Constructs a deep copy of the supplied categorical info.
- protected CategoricalInfo(CategoricalInfo info, String newName): Constructs a deep copy of the supplied categorical info, with the new feature name.
-
Method Summary
Modifier and Type, Method: Description
- copy(): Returns a copy of this variable info.
- static CategoricalInfo deserializeFromProto(int version, String className, com.google.protobuf.Any message): Deserialization factory.
- boolean equals(Object)
- double frequencyBasedSample(Random rng, long totalObservations): Samples a value from this feature according to the frequency of observation.
- double frequencyBasedSample(SplittableRandom rng, long totalObservations): Samples a value from this feature according to the frequency of observation.
- RealInfo generateRealInfo(): Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.
- long getObservationCount(double value): Gets the number of times a specific value was observed, and zero if this value is unknown.
- int getUniqueObservations(): Gets the number of unique values this CategoricalInfo has observed.
- double[] getValues(): Returns an array containing the observed values ordered by Double.compare(double, double).
- int hashCode()
- makeIDInfo(int id): Generates a VariableIDInfo subclass which represents the same feature.
- protected void observe(double value): Records the value.
- rename: Rename generates a fresh VariableInfo with the new name.
- org.tribuo.protos.core.VariableInfoProto serialize(): Serializes this object to a protobuf.
- toString()
- double uniformSample: Sample a value uniformly from the range of this variable.
Methods inherited from class org.tribuo.SkeletalVariableInfo
getCount, getName
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
THRESHOLD
public static final int THRESHOLD
The default threshold for converting a categorical info into a RealInfo.
-
valueCounts
The occurrence counts of each value.
-
observedValue
protected double observedValue
The observed value, if only a single value has been observed.
-
observedCount
protected long observedCount
The count of the observed value, if only a single value has been observed.
-
values
protected transient double[] values
The values array.
-
totalObservations
protected transient long totalObservations
The total number of observations (including zeros).
-
cdf
protected transient double[] cdf
The CDF to sample from.
-
-
Constructor Details
-
CategoricalInfo
Constructs a new empty categorical info for the supplied feature name.
- Parameters:
name - The feature name.
-
CategoricalInfo
Constructs a deep copy of the supplied categorical info.
- Parameters:
info - The info to copy.
-
CategoricalInfo
Constructs a deep copy of the supplied categorical info, with the new feature name.
- Parameters:
info - The info to copy.
newName - The new feature name.
-
-
Method Details
-
deserializeFromProto
public static CategoricalInfo deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
serialize
public org.tribuo.protos.core.VariableInfoProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Returns:
The protobuf.
-
observe
protected void observe(double value)
Description copied from class: SkeletalVariableInfo
Records the value.
- Overrides:
observe in class SkeletalVariableInfo
- Parameters:
value - The observed value.
-
getObservationCount
public long getObservationCount(double value)
Gets the number of times a specific value was observed, and zero if this value is unknown.
- Parameters:
value - The value to check.
- Returns:
The count of times this value was observed, zero otherwise.
-
getUniqueObservations
public int getUniqueObservations()
Gets the number of unique values this CategoricalInfo has observed.
- Returns:
An int representing the number of unique values.
-
generateRealInfo
Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.
- Returns:
A RealInfo representing the data in this CategoricalInfo.
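As an illustration of how min, max, mean and variance can be derived from value counts alone, here is a minimal sketch that uses only the JDK. The class CountStatsSketch and the method summarise are hypothetical names, not part of Tribuo. The sketch assumes every count is positive and that the sample variance (dividing by n - 1) is wanted. How the real method treats implicit zeros, or which variance convention it uses, is not stated in this page.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: summary statistics from a value-to-count map; not Tribuo's code. */
final class CountStatsSketch {

    /** Returns {min, max, mean, variance} for the multiset described by the counts. */
    static double[] summarise(Map<Double, Long> counts) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long n = 0;
        // First pass: range, weighted sum and total number of observations.
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            double value = e.getKey();
            long count = e.getValue();
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value * count;
            n += count;
        }
        if (n == 0) {
            throw new IllegalArgumentException("No observations to summarise");
        }
        double mean = sum / n;
        // Second pass: weighted sum of squared deviations from the mean.
        double sumSquares = 0.0;
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            double diff = e.getKey() - mean;
            sumSquares += diff * diff * e.getValue();
        }
        // Assumption: sample variance; a single observation yields zero.
        double variance = n > 1 ? sumSquares / (n - 1) : 0.0;
        return new double[] {min, max, mean, variance};
    }

    public static void main(String[] args) {
        Map<Double, Long> counts = new TreeMap<>();
        counts.put(1.0, 2L);
        counts.put(2.0, 1L);
        counts.put(4.0, 1L);
        double[] stats = summarise(counts);
        System.out.println("min=" + stats[0] + " max=" + stats[1]
                + " mean=" + stats[2] + " variance=" + stats[3]);
    }
}
```

The two-pass form is used here because it is easy to verify by hand; a one-pass update would avoid iterating the map twice.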
-
copy
Description copied from interface: VariableInfo
Returns a copy of this variable info.
- Returns:
A copy.
-
makeIDInfo
Description copied from interface: VariableInfo
Generates a VariableIDInfo subclass which represents the same feature.
- Parameters:
id - The id number.
- Returns:
A VariableInfo with the same information, plus the id.
-
rename
Description copied from interface: VariableInfo
Rename generates a fresh VariableInfo with the new name. The name forms part of the hash code, so it is immutable within the object.
- Parameters:
newName - The new name.
- Returns:
A VariableInfo subclass with the new name.
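The reason a rename returns a fresh object can be seen in a small sketch: if the name contributes to hashCode, mutating it in place would strand the object in the wrong bucket of any hash-based collection. The class RenameSketch below is hypothetical and only mirrors the idea; it is not Tribuo's code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of an immutable name that participates in hashCode; not Tribuo's code. */
final class RenameSketch {
    private final String name; // final: the name can never change after construction
    private final long count;

    RenameSketch(String name, long count) {
        this.name = Objects.requireNonNull(name);
        this.count = count;
    }

    /** Returns a new object with the new name; this object is left untouched. */
    RenameSketch rename(String newName) {
        return new RenameSketch(newName, count);
    }

    String getName() {
        return name;
    }

    long getCount() {
        return count;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RenameSketch)) {
            return false;
        }
        RenameSketch other = (RenameSketch) o;
        return count == other.count && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, count);
    }

    public static void main(String[] args) {
        Set<RenameSketch> infos = new HashSet<>();
        RenameSketch original = new RenameSketch("A", 3);
        infos.add(original);
        RenameSketch renamed = original.rename("B");
        // The original is still found because its hash code did not change.
        System.out.println(infos.contains(original));
        System.out.println(original.getName() + " -> " + renamed.getName());
    }
}
```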
-
uniformSample
Description copied from interface: VariableInfo
Sample a value uniformly from the range of this variable.
- Parameters:
rng - The rng to use.
- Returns:
A sample from this variable.
-
frequencyBasedSample
Samples a value from this feature according to the frequency of observation.
- Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.
- Returns:
The sampled value.
-
frequencyBasedSample
Samples a value from this feature according to the frequency of observation.
- Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.
- Returns:
The sampled value.
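Frequency-based sampling of this kind is commonly done by building a cumulative distribution over the values and inverting it with a uniform draw. The sketch below shows that technique under stated assumptions: the class FrequencySampleSketch is hypothetical, the observed values are assumed not to contain an explicit 0.0, and any observations not covered by the explicit counts are treated as implicit zeros. It is not Tribuo's implementation, and it uses a linear scan where a binary search over the CDF would also work.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of inverse-CDF sampling with implicit zeros; not Tribuo's code. */
final class FrequencySampleSketch {
    final double[] values; // values[0] is 0.0 (the implicit zeros), then the observed values
    final double[] cdf;    // cdf[i] = probability of drawing values[0..i]

    FrequencySampleSketch(double[] observedValues, long[] observedCounts, long totalObservations) {
        if (observedValues.length != observedCounts.length) {
            throw new IllegalArgumentException("values and counts must have the same length");
        }
        long explicit = 0;
        for (long c : observedCounts) {
            explicit += c;
        }
        if (totalObservations <= 0 || explicit > totalObservations) {
            throw new IllegalArgumentException("totalObservations must cover the explicit counts");
        }
        int n = observedValues.length + 1;
        values = new double[n];
        cdf = new double[n];
        // Observations without an explicit value are counted as zeros.
        long cumulative = totalObservations - explicit;
        values[0] = 0.0;
        cdf[0] = (double) cumulative / totalObservations;
        for (int i = 0; i < observedValues.length; i++) {
            cumulative += observedCounts[i];
            values[i + 1] = observedValues[i];
            cdf[i + 1] = (double) cumulative / totalObservations;
        }
    }

    /** Maps a uniform draw u in [0, 1) to the first value whose cumulative mass exceeds u. */
    double sample(double u) {
        for (int i = 0; i < cdf.length; i++) {
            if (u < cdf[i]) {
                return values[i];
            }
        }
        return values[values.length - 1];
    }

    /** Draws a value using the supplied RNG. */
    double sample(SplittableRandom rng) {
        return sample(rng.nextDouble());
    }

    public static void main(String[] args) {
        // Value 1.0 seen twice and 3.0 seen once out of 10 observations: 7 implicit zeros.
        FrequencySampleSketch sketch =
                new FrequencySampleSketch(new double[] {1.0, 3.0}, new long[] {2, 1}, 10);
        SplittableRandom rng = new SplittableRandom(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(sketch.sample(rng));
        }
    }
}
```

Separating sample(double) from sample(SplittableRandom) keeps the mapping from uniform draw to value deterministic and easy to check.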
-
getValues
public double[] getValues()
Returns an array containing the observed values ordered by Double.compare(double, double).
- Returns:
The observed values.
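Ordering by Double.compare differs from the < operator in two cases: -0.0 sorts before 0.0, and NaN sorts after every other value. The short sketch below shows the resulting order using only the JDK, relying on the documented fact that Arrays.sort(double[]) uses the same total order. The class name ValueOrderSketch is hypothetical and the snippet does not call Tribuo.

```java
import java.util.Arrays;

/** Hypothetical sketch showing the total order used by Double.compare; not Tribuo's code. */
final class ValueOrderSketch {

    /** Returns a copy of the input sorted by the Double.compare total order. */
    static double[] sortedCopy(double[] observed) {
        double[] copy = Arrays.copyOf(observed, observed.length);
        Arrays.sort(copy); // same total order as Double.compare
        return copy;
    }

    public static void main(String[] args) {
        double[] sorted = sortedCopy(new double[] {3.0, 0.0, -1.5, -0.0});
        System.out.println(Arrays.toString(sorted)); // prints [-1.5, -0.0, 0.0, 3.0]
        System.out.println(Double.compare(-0.0, 0.0) < 0); // prints true
    }
}
```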
-
equals
- Overrides:
equals in class SkeletalVariableInfo
-
hashCode
public int hashCode()
- Overrides:
hashCode in class SkeletalVariableInfo
-
toString
- Overrides:
toString in class SkeletalVariableInfo
-