Package org.tribuo
Class CategoricalInfo
java.lang.Object
org.tribuo.SkeletalVariableInfo
org.tribuo.CategoricalInfo
- All Implemented Interfaces:
Serializable, Cloneable, VariableInfo
- Direct Known Subclasses:
CategoricalIDInfo
Stores information about categorical features.
Contains a mapping from each value to its observed count, with an initial optimisation for the binary case to reduce memory consumption.
Can be transformed into a RealInfo if there are too many unique observed values.
Does not contain an id number, but can be transformed into a CategoricalIDInfo, which does contain an id number.
Note that the synchronization in this class only protects the instantiation step in which the CDF and values are recomputed. Care should be taken if data is read while observe(double) is called.
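The counting scheme described above can be illustrated with a minimal sketch. It assumes that the "binary case" means at most one distinct value has been seen, so a value/count pair is enough and the map is allocated only when a second distinct value arrives. The class name CategoricalCountSketch is hypothetical, and this is not Tribuo's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the counting idea; not Tribuo's implementation. */
final class CategoricalCountSketch {
    // Used while at most one distinct value has been seen.
    private double observedValue = Double.NaN;
    private long observedCount = 0;
    // Allocated only once a second distinct value arrives.
    private Map<Double, Long> valueCounts = null;

    /** Records one observation of the value. */
    void observe(double value) {
        if (valueCounts != null) {
            valueCounts.merge(value, 1L, Long::sum);
        } else if (observedCount == 0 || Double.compare(value, observedValue) == 0) {
            observedValue = value;
            observedCount++;
        } else {
            // Second distinct value: move to the general map representation.
            valueCounts = new HashMap<>();
            valueCounts.put(observedValue, observedCount);
            valueCounts.put(value, 1L);
            observedValue = Double.NaN;
            observedCount = 0;
        }
    }

    /** Returns the count for the value, or zero if it was never observed. */
    long getObservationCount(double value) {
        if (valueCounts != null) {
            return valueCounts.getOrDefault(value, 0L);
        }
        boolean matches = observedCount > 0 && Double.compare(value, observedValue) == 0;
        return matches ? observedCount : 0L;
    }

    /** Returns the number of distinct values observed so far. */
    int getUniqueObservations() {
        if (valueCounts != null) {
            return valueCounts.size();
        }
        return observedCount > 0 ? 1 : 0;
    }
}
```

The sketch is not thread-safe, which mirrors the caution above about reading data while observe(double) is being called.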
-
Field Summary
Modifier and Type / Field / Description
protected double[] cdf - The CDF to sample from.
protected long observedCount - The count of the observed value, if only one distinct value has been seen.
protected double observedValue - The observed value, if only one distinct value has been seen.
static final int THRESHOLD - The default threshold for converting a categorical info into a RealInfo.
protected long totalObservations - The total number of observations (including zeros).
valueCounts - The occurrence counts of each value.
protected double[] values - The values array.
Fields inherited from class org.tribuo.SkeletalVariableInfo
count, name
-
Constructor Summary
Modifier / Constructor / Description
CategoricalInfo(String name) - Constructs a new empty categorical info for the supplied feature name.
protected CategoricalInfo(CategoricalInfo info) - Constructs a deep copy of the supplied categorical info.
protected CategoricalInfo(CategoricalInfo info, String newName) - Constructs a deep copy of the supplied categorical info, with the new feature name.
-
Method Summary
Modifier and Type / Method / Description
copy() - Returns a copy of this variable info.
double frequencyBasedSample(Random rng, long totalObservations) - Samples a value from this feature according to the frequency of observation.
double frequencyBasedSample(SplittableRandom rng, long totalObservations) - Samples a value from this feature according to the frequency of observation.
generateRealInfo - Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.
long getObservationCount(double value) - Gets the number of times a specific value was observed, and zero if this value is unknown.
int getUniqueObservations() - Gets the number of unique values this CategoricalInfo has observed.
makeIDInfo(int id) - Generates a VariableIDInfo subclass which represents the same feature.
protected void observe(double value) - Records the value.
rename - Generates a fresh VariableInfo with the new name.
toString()
double uniformSample - Sample a value uniformly from the range of this variable.
Methods inherited from class org.tribuo.SkeletalVariableInfo
equals, getCount, getName, hashCode
-
Field Details
-
THRESHOLD
public static final int THRESHOLD
The default threshold for converting a categorical info into a RealInfo.
-
valueCounts
The occurrence counts of each value.
-
observedValue
protected double observedValue
The observed value, if only one distinct value has been seen.
-
observedCount
protected long observedCount
The count of the observed value, if only one distinct value has been seen.
-
values
protected transient double[] values
The values array.
-
totalObservations
protected transient long totalObservations
The total number of observations (including zeros).
-
cdf
protected transient double[] cdf
The CDF to sample from.
-
-
Constructor Details
-
CategoricalInfo
Constructs a new empty categorical info for the supplied feature name.
- Parameters:
name - The feature name.
-
CategoricalInfo
Constructs a deep copy of the supplied categorical info.
- Parameters:
info - The info to copy.
-
CategoricalInfo
Constructs a deep copy of the supplied categorical info, with the new feature name.
- Parameters:
info - The info to copy.
newName - The new feature name.
-
-
Method Details
-
observe
protected void observe(double value)
Description copied from class: SkeletalVariableInfo
Records the value.
- Overrides:
observe in class SkeletalVariableInfo
- Parameters:
value - The observed value.
-
getObservationCount
public long getObservationCount(double value)
Gets the number of times a specific value was observed, and zero if this value is unknown.
- Parameters:
value - The value to check.
- Returns:
The count of times this value was observed, zero otherwise.
-
getUniqueObservations
public int getUniqueObservations()
Gets the number of unique values this CategoricalInfo has observed.
- Returns:
An int representing the number of unique values.
-
generateRealInfo
Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.
- Returns:
A RealInfo representing the data in this CategoricalInfo.
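As a rough illustration of how min, max, mean and variance can be derived from value counts alone, the sketch below walks a value-to-count map twice. The helper name CountStatsSketch is hypothetical, the use of the sample variance (dividing by n - 1) is an assumption, and it ignores any implicit zeros; the source does not say which choices generateRealInfo makes.

```java
import java.util.Map;

/** Hypothetical helper; not part of Tribuo. */
final class CountStatsSketch {
    /** Returns {min, max, mean, variance} computed from value -> count pairs. */
    static double[] summarise(Map<Double, Long> valueCounts) {
        if (valueCounts.isEmpty()) {
            throw new IllegalArgumentException("No observations to summarise");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double weightedSum = 0.0;
        long n = 0;
        // First pass: range, total count and weighted sum.
        for (Map.Entry<Double, Long> e : valueCounts.entrySet()) {
            double value = e.getKey();
            long count = e.getValue();
            min = Math.min(min, value);
            max = Math.max(max, value);
            weightedSum += value * count;
            n += count;
        }
        double mean = weightedSum / n;
        // Second pass: count-weighted sum of squared deviations from the mean.
        double sumSquares = 0.0;
        for (Map.Entry<Double, Long> e : valueCounts.entrySet()) {
            double diff = e.getKey() - mean;
            sumSquares += e.getValue() * diff * diff;
        }
        // Assumption: sample variance (n - 1); the source does not specify this.
        double variance = n > 1 ? sumSquares / (n - 1) : 0.0;
        return new double[] {min, max, mean, variance};
    }
}
```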
-
copy
Description copied from interface: VariableInfo
Returns a copy of this variable info.
- Returns:
A copy.
-
makeIDInfo
Description copied from interface: VariableInfo
Generates a VariableIDInfo subclass which represents the same feature.
- Parameters:
id - The id number.
- Returns:
A VariableInfo with the same information, plus the id.
-
rename
Description copied from interface: VariableInfo
Generates a fresh VariableInfo with the new name. The name forms part of the hash code, so it is immutable within the object.
- Parameters:
newName - The new name.
- Returns:
A VariableInfo subclass with the new name.
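The reason renaming produces a new object can be shown with a small sketch: if the name feeds hashCode(), changing it in place would strand the object in any hash-based collection. The class NamedInfoSketch is hypothetical, and hashing on the name alone is an assumption made for brevity rather than a description of the inherited equals and hashCode.

```java
import java.util.Objects;

/** Hypothetical sketch of rename-by-copy; not Tribuo's VariableInfo. */
final class NamedInfoSketch {
    private final String name;  // never reassigned, so hashCode() stays stable
    private final long count;

    NamedInfoSketch(String name, long count) {
        this.name = Objects.requireNonNull(name);
        this.count = count;
    }

    /** Returns a fresh object with the new name; this object is unchanged. */
    NamedInfoSketch rename(String newName) {
        return new NamedInfoSketch(newName, count);
    }

    String getName() { return name; }

    long getCount() { return count; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedInfoSketch && name.equals(((NamedInfoSketch) o).name);
    }

    @Override
    public int hashCode() {
        // Assumption: only the name contributes to the hash in this sketch.
        return name.hashCode();
    }
}
```

An instance stored in a HashSet remains findable after rename is called on it, because the stored instance itself never changes.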
-
uniformSample
Description copied from interface: VariableInfo
Sample a value uniformly from the range of this variable.
- Parameters:
rng - The rng to use.
- Returns:
A sample from this variable.
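For a categorical feature, one plausible reading of "uniformly from the range" is that every distinct observed value is equally likely, regardless of how often it was seen. The sketch below follows that reading, which is an assumption rather than something the documentation states; UniformSampleSketch and its values parameter are hypothetical names.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch; assumes "range" means the set of distinct observed values. */
final class UniformSampleSketch {
    /** Picks one of the distinct values, each with equal probability. */
    static double uniformSample(double[] values, SplittableRandom rng) {
        if (values.length == 0) {
            throw new IllegalStateException("No values have been observed");
        }
        // nextInt(bound) returns an index in [0, bound).
        return values[rng.nextInt(values.length)];
    }
}
```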
-
frequencyBasedSample
Samples a value from this feature according to the frequency of observation.
- Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.
- Returns:
The sampled value.
-
frequencyBasedSample
Samples a value from this feature according to the frequency of observation.
- Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.
- Returns:
The sampled value.
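Frequency-based sampling with implicit zeros can be sketched as inverse-CDF sampling. The sketch assumes the number of implicit zeros is totalObservations minus the sum of the explicit counts, and that zero is given its own slot at the front of the CDF. FrequencySampleSketch is a hypothetical class, not the code behind frequencyBasedSample.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of inverse-CDF sampling; not Tribuo's implementation. */
final class FrequencySampleSketch {
    private final double[] values;  // slot 0 holds the implicit zero
    private final double[] cdf;     // cumulative probability for each slot

    FrequencySampleSketch(double[] observedValues, long[] counts, long totalObservations) {
        if (observedValues.length != counts.length) {
            throw new IllegalArgumentException("values and counts differ in length");
        }
        long observed = 0;
        for (long c : counts) {
            observed += c;
        }
        if (totalObservations <= 0 || observed > totalObservations) {
            throw new IllegalArgumentException("totalObservations must cover all counts");
        }
        values = new double[observedValues.length + 1];
        cdf = new double[values.length];
        // Assumption: every observation without an explicit value is a zero.
        double running = (totalObservations - observed) / (double) totalObservations;
        values[0] = 0.0;
        cdf[0] = running;
        for (int i = 0; i < counts.length; i++) {
            values[i + 1] = observedValues[i];
            running += counts[i] / (double) totalObservations;
            cdf[i + 1] = running;
        }
        cdf[cdf.length - 1] = 1.0;  // guard against floating point drift
    }

    /** Draws a value with probability proportional to its count. */
    double sample(SplittableRandom rng) {
        double u = rng.nextDouble();  // uniform in [0, 1)
        for (int i = 0; i < cdf.length; i++) {
            if (u < cdf[i]) {
                return values[i];
            }
        }
        return values[values.length - 1];  // not reached, since the last cdf entry is 1.0
    }
}
```

A linear scan keeps the sketch short; with many distinct values a binary search over the CDF would be the more usual choice.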
-
toString
- Overrides:
toString in class SkeletalVariableInfo
-