public class CategoricalInfo extends SkeletalVariableInfo
Contains a mapping from values to observed counts for that value, with an initial optimisation for the binary case (only a single value observed) to reduce memory consumption. Can be transformed into a RealInfo if there are too many unique observed values.

Does not contain an id number, but can be transformed into a CategoricalIDInfo, which does contain an id number.

Note that the synchronization in this class only protects the instantiation of the CDF and values arrays when they are recomputed. Care should be taken if data is read while observe(double) is called.
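The counting scheme described above can be sketched as follows. This is a hypothetical, self-contained illustration and not Tribuo's implementation: the class name SketchCategoricalCounts, the use of Map<Double, Long> in place of OLCUT's MutableLong, and the explicit threshold constructor parameter are all assumptions. It keeps one value and its count in two fields until a second distinct value appears, then promotes to a map, and reports when the number of unique values exceeds the threshold at which a caller might switch to a RealInfo-style summary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Tribuo's implementation. Counts observations of one
// categorical feature, keeping a single value in two fields (the "binary case")
// and only allocating a map once a second distinct value appears.
class SketchCategoricalCounts {
    private double observedValue = Double.NaN; // the only value seen so far
    private long observedCount = 0;            // its count while only one value is seen
    private Map<Double, Long> valueCounts;     // null until a second distinct value appears
    private final int threshold;               // assumed cut-off for switching to a real-valued summary

    SketchCategoricalCounts(int threshold) {
        this.threshold = threshold;
    }

    void observe(double value) {
        if (valueCounts != null) {
            valueCounts.merge(value, 1L, Long::sum);
        } else if (observedCount == 0) {
            observedValue = value;
            observedCount = 1;
        } else if (Double.compare(observedValue, value) == 0) {
            observedCount++;
        } else {
            // Second distinct value: promote to the map representation.
            valueCounts = new HashMap<>();
            valueCounts.put(observedValue, observedCount);
            valueCounts.put(value, 1L);
        }
    }

    // Returns zero for values that were never observed.
    long observationCount(double value) {
        if (valueCounts != null) {
            return valueCounts.getOrDefault(value, 0L);
        }
        return (observedCount > 0 && Double.compare(observedValue, value) == 0) ? observedCount : 0L;
    }

    int uniqueObservations() {
        if (valueCounts != null) {
            return valueCounts.size();
        }
        return observedCount > 0 ? 1 : 0;
    }

    // True once there are more unique values than the threshold allows.
    boolean exceedsThreshold() {
        return uniqueObservations() > threshold;
    }

    public static void main(String[] args) {
        SketchCategoricalCounts info = new SketchCategoricalCounts(2);
        info.observe(1.0);
        info.observe(1.0);
        System.out.println(info.uniqueObservations()); // 1
        info.observe(2.0);
        info.observe(3.0);
        System.out.println(info.uniqueObservations() + " " + info.exceedsThreshold()); // 3 true
    }
}
```

The map is only paid for once a feature turns out not to be binary, which is the memory saving the class description refers to.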
Modifier and Type | Field and Description |
---|---|
protected double[] | cdf: The CDF to sample from. |
protected long | observedCount: The count of the observed value, if only a single value has been seen. |
protected double | observedValue: The observed value, if only a single value has been seen. |
static int | THRESHOLD: The default threshold on the number of unique values for converting a categorical info into a RealInfo. |
protected long | totalObservations: The total number of observations (including implicit zeros). |
protected Map<Double,com.oracle.labs.mlrg.olcut.util.MutableLong> | valueCounts: The occurrence counts of each value. |
protected double[] | values: The array of observed values, recomputed together with the CDF. |

Fields inherited from class SkeletalVariableInfo: count, name
Modifier | Constructor and Description |
---|---|
protected | CategoricalInfo(CategoricalInfo info): Constructs a deep copy of the supplied categorical info. |
protected | CategoricalInfo(CategoricalInfo info, String newName): Constructs a deep copy of the supplied categorical info, with the new feature name. |
public | CategoricalInfo(String name): Constructs a new empty categorical info for the supplied feature name. |
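Both copy constructors are described as deep copies. Because the counts are held in mutable counters (MutableLong in the field list), copying only the map would leave the copy and the original sharing counter objects, so an observation recorded in one would show up in the other. The hypothetical sketch below illustrates the deep-copy pattern; SketchCopyableInfo and its Counter class are stand-ins invented for illustration, not OLCUT's MutableLong or Tribuo's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: why copying an info object needs a deep copy when the
// counts live in mutable counter objects.
class SketchCopyableInfo {
    // Stand-in for a mutable counter such as OLCUT's MutableLong (its real API is not used here).
    static final class Counter {
        long value;

        Counter(long value) {
            this.value = value;
        }
    }

    final String name;
    final Map<Double, Counter> valueCounts;

    SketchCopyableInfo(String name) {
        this.name = name;
        this.valueCounts = new HashMap<>();
    }

    // Deep copy under a new name: every Counter is duplicated, so later updates
    // to either object do not affect the other.
    SketchCopyableInfo(SketchCopyableInfo other, String newName) {
        this.name = newName;
        this.valueCounts = new HashMap<>();
        for (Map.Entry<Double, Counter> e : other.valueCounts.entrySet()) {
            this.valueCounts.put(e.getKey(), new Counter(e.getValue().value));
        }
    }

    // Deep copy keeping the original name.
    SketchCopyableInfo(SketchCopyableInfo other) {
        this(other, other.name);
    }

    void observe(double value) {
        valueCounts.computeIfAbsent(value, k -> new Counter(0)).value++;
    }

    long count(double value) {
        Counter c = valueCounts.get(value);
        return c == null ? 0 : c.value;
    }

    public static void main(String[] args) {
        SketchCopyableInfo original = new SketchCopyableInfo("colour");
        original.observe(1.0);
        SketchCopyableInfo renamed = new SketchCopyableInfo(original, "colour-copy");
        renamed.observe(1.0);
        System.out.println(original.count(1.0) + " " + renamed.count(1.0) + " " + renamed.name); // 1 2 colour-copy
    }
}
```

With a shallow copy (new map, shared Counter objects), the second observe would also have raised the original's count to 2.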
Modifier and Type | Method and Description |
---|---|
CategoricalInfo | copy(): Returns a copy of this variable info. |
double | frequencyBasedSample(Random rng, long totalObservations): Samples a value from this feature according to the frequency of observation. |
double | frequencyBasedSample(SplittableRandom rng, long totalObservations): Samples a value from this feature according to the frequency of observation. |
RealInfo | generateRealInfo(): Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance. |
long | getObservationCount(double value): Gets the number of times a specific value was observed, or zero if the value has not been observed. |
int | getUniqueObservations(): Gets the number of unique values this CategoricalInfo has observed. |
CategoricalIDInfo | makeIDInfo(int id): Generates a VariableIDInfo subclass which represents the same feature. |
protected void | observe(double value): Records the value. |
CategoricalInfo | rename(String newName): Generates a fresh VariableInfo with the new name. |
String | toString() |
double | uniformSample(SplittableRandom rng): Samples a value uniformly from the range of this variable. |

Methods inherited from class SkeletalVariableInfo: equals, getCount, getName, hashCode
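uniformSample differs from frequencyBasedSample in that it ignores how often each value was seen. A minimal sketch, assuming that "the range" of a categorical variable means its set of distinct observed values and that each is equally likely; SketchUniformSampler is a hypothetical name, and whether the library also includes the implicit zero is not stated in this page.

```java
import java.util.SplittableRandom;

// Hypothetical sketch: uniform sampling over the distinct observed values,
// ignoring how many times each one was seen.
class SketchUniformSampler {
    private final double[] values; // assumed: the distinct observed values

    SketchUniformSampler(double[] distinctValues) {
        if (distinctValues.length == 0) {
            throw new IllegalArgumentException("need at least one observed value");
        }
        this.values = distinctValues.clone();
    }

    double uniformSample(SplittableRandom rng) {
        // nextInt(bound) returns an index in [0, bound), each equally likely.
        return values[rng.nextInt(values.length)];
    }

    public static void main(String[] args) {
        SketchUniformSampler sampler = new SketchUniformSampler(new double[] {0.0, 1.0, 5.0});
        SplittableRandom rng = new SplittableRandom(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(sampler.uniformSample(rng));
        }
    }
}
```

Under this sketch a value observed once and a value observed a thousand times are drawn equally often.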
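public static final int THRESHOLD
The default threshold on the number of unique values for converting a categorical info into a RealInfo.

protected Map<Double,com.oracle.labs.mlrg.olcut.util.MutableLong> valueCounts
The occurrence counts of each value.

protected double observedValue
The observed value, if only a single value has been seen.

protected long observedCount
The count of the observed value, if only a single value has been seen.

protected transient double[] values
The array of observed values, recomputed together with the CDF.

protected transient long totalObservations
The total number of observations (including implicit zeros).

protected transient double[] cdf
The CDF to sample from.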
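The transient values, cdf and totalObservations fields behave like caches derived from valueCounts. The sketch below shows one way such caches could be rebuilt lazily under a lock whenever a different total is requested, treating the implicit zeros (the total minus the explicit observations) as extra mass on 0.0. It is a hypothetical illustration under those assumptions, not the library's code; SketchCdfCache and its methods are invented names. As in the class note, only the rebuild is synchronized, so calling observe concurrently with reads is unsafe.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Tribuo's code): values/cdf are caches derived from
// the counts and rebuilt lazily when a different total is requested.
class SketchCdfCache {
    private final TreeMap<Double, Long> valueCounts = new TreeMap<>(); // sorted by value
    private long explicitCount = 0;  // number of explicit observations
    private double[] values;         // cached sorted values, zero included if implicit zeros exist
    private double[] cdf;            // cached cumulative probabilities aligned with values
    private long cachedTotal = -1;   // total the caches were built for; -1 means stale

    // Not synchronized: mirrors the caveat that observe() races with readers.
    void observe(double value) {
        valueCounts.merge(value, 1L, Long::sum);
        explicitCount++;
        cachedTotal = -1; // invalidate the caches
    }

    // Only the rebuild is guarded by the lock.
    private synchronized void ensureCdf(long totalObservations) {
        if (cachedTotal == totalObservations) {
            return;
        }
        if (totalObservations < explicitCount) {
            throw new IllegalArgumentException("total must include all explicit observations");
        }
        TreeMap<Double, Long> counts = new TreeMap<>(valueCounts);
        long implicitZeros = totalObservations - explicitCount;
        if (implicitZeros > 0) {
            counts.merge(0.0, implicitZeros, Long::sum); // assumed: an absent feature means zero
        }
        double[] v = new double[counts.size()];
        double[] c = new double[counts.size()];
        long running = 0;
        int i = 0;
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            running += e.getValue();
            v[i] = e.getKey();
            c[i] = (double) running / totalObservations;
            i++;
        }
        values = v;
        cdf = c;
        cachedTotal = totalObservations;
    }

    synchronized double[] values(long totalObservations) {
        ensureCdf(totalObservations);
        return values.clone();
    }

    synchronized double[] cdf(long totalObservations) {
        ensureCdf(totalObservations);
        return cdf.clone();
    }

    public static void main(String[] args) {
        SketchCdfCache cache = new SketchCdfCache();
        cache.observe(1.0);
        cache.observe(1.0);
        cache.observe(2.0);
        // 5 total observations: 2 implicit zeros plus the 3 explicit ones.
        System.out.println(Arrays.toString(cache.values(5))); // [0.0, 1.0, 2.0]
        System.out.println(Arrays.toString(cache.cdf(5)));    // [0.4, 0.8, 1.0]
    }
}
```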
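public CategoricalInfo(String name)
Constructs a new empty categorical info for the supplied feature name.
Parameters:
name - The feature name.

protected CategoricalInfo(CategoricalInfo info)
Constructs a deep copy of the supplied categorical info.
Parameters:
info - The info to copy.

protected CategoricalInfo(CategoricalInfo info, String newName)
Constructs a deep copy of the supplied categorical info, with the new feature name.
Parameters:
info - The info to copy.
newName - The new feature name.

protected void observe(double value)
Description copied from class: SkeletalVariableInfo
Records the value.
Overrides:
observe in class SkeletalVariableInfo
Parameters:
value - The observed value.

public long getObservationCount(double value)
Gets the number of times a specific value was observed, or zero if the value has not been observed.
Parameters:
value - The value to check.

public int getUniqueObservations()
Gets the number of unique values this CategoricalInfo has observed.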
public RealInfo generateRealInfo()
Generates a RealInfo using the currently observed counts to calculate the min, max, mean and variance.

public CategoricalInfo copy()
Description copied from interface: VariableInfo
Returns a copy of this variable info.
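generateRealInfo summarises the counts as a min, max, mean and variance. Weighting each value by its count lets those statistics be computed directly from the value-to-count map, without expanding it into individual observations. The sketch below is hypothetical: SketchRealSummary is not RealInfo, it considers only explicitly observed values (how implicit zeros are treated is not stated here), and it uses the population variance (dividing by n), which is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not RealInfo: min, max, mean and (population) variance
// computed directly from a value-to-count map.
class SketchRealSummary {
    final long count;
    final double min;
    final double max;
    final double mean;
    final double variance;

    private SketchRealSummary(long count, double min, double max, double mean, double variance) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.variance = variance;
    }

    static SketchRealSummary fromCounts(Map<Double, Long> valueCounts) {
        long n = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (Map.Entry<Double, Long> e : valueCounts.entrySet()) {
            double v = e.getKey();
            long c = e.getValue();
            n += c;
            sum += v * c; // each value contributes once per observation
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (n == 0) {
            throw new IllegalArgumentException("no observations");
        }
        double mean = sum / n;
        // Second pass over the (few) distinct values: more stable than sum of squares minus square of sum.
        double squaredDeviations = 0.0;
        for (Map.Entry<Double, Long> e : valueCounts.entrySet()) {
            double d = e.getKey() - mean;
            squaredDeviations += e.getValue() * d * d;
        }
        return new SketchRealSummary(n, min, max, mean, squaredDeviations / n);
    }

    public static void main(String[] args) {
        Map<Double, Long> counts = new TreeMap<>();
        counts.put(1.0, 2L);
        counts.put(3.0, 2L);
        SketchRealSummary s = fromCounts(counts);
        // prints "min=1.0 max=3.0 mean=2.0 var=1.0"
        System.out.println("min=" + s.min + " max=" + s.max + " mean=" + s.mean + " var=" + s.variance);
    }
}
```

Both passes iterate over distinct values only, so the cost scales with the number of unique values rather than the number of observations.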
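public CategoricalIDInfo makeIDInfo(int id)
Description copied from interface: VariableInfo
Generates a VariableIDInfo subclass which represents the same feature.
Parameters:
id - The id number.

public CategoricalInfo rename(String newName)
Description copied from interface: VariableInfo
Generates a fresh VariableInfo with the new name.
Parameters:
newName - The new name.

public double uniformSample(SplittableRandom rng)
Description copied from interface: VariableInfo
Samples a value uniformly from the range of this variable.
Parameters:
rng - The RNG to use.

public double frequencyBasedSample(SplittableRandom rng, long totalObservations)
Samples a value from this feature according to the frequency of observation.
Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.

public double frequencyBasedSample(Random rng, long totalObservations)
Samples a value from this feature according to the frequency of observation.
Parameters:
rng - The RNG to use.
totalObservations - The observations including the implicit zeros.

public String toString()
Overrides:
toString in class SkeletalVariableInfo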
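frequencyBasedSample draws values in proportion to how often they were observed, with the implicit zeros counted in totalObservations. Given sorted values and a matching cumulative distribution (cdf[i] is the probability of drawing values[0] through values[i]), inverse-transform sampling maps a uniform draw u in [0, 1) to the first index whose cumulative probability exceeds u. The sketch below is hypothetical: SketchFrequencySampler and sampleAt are invented names, and it assumes the arrays have already been built for the requested total and that the CDF is strictly increasing. Its two overloads mirror the Random and SplittableRandom variants, both of which provide nextDouble().

```java
import java.util.Arrays;
import java.util.Random;
import java.util.SplittableRandom;

// Hypothetical sketch: inverse-transform sampling from precomputed arrays.
// Assumes values are sorted and cdf is strictly increasing with cdf[last] == 1.0.
class SketchFrequencySampler {
    private final double[] values; // e.g. the implicit zero plus the observed values
    private final double[] cdf;    // cdf[i] = probability of drawing values[0..i]

    SketchFrequencySampler(double[] values, double[] cdf) {
        if (values.length == 0 || values.length != cdf.length) {
            throw new IllegalArgumentException("values and cdf must be non-empty and the same length");
        }
        this.values = values.clone();
        this.cdf = cdf.clone();
    }

    // Returns the value at the first index whose cumulative probability exceeds u.
    double sampleAt(double u) {
        int idx = Arrays.binarySearch(cdf, u);
        // An exact hit means u sits on a bucket boundary and belongs to the next bucket;
        // a miss returns -(insertionPoint) - 1, where insertionPoint is the first cdf[i] > u.
        idx = idx >= 0 ? idx + 1 : -idx - 1;
        return values[Math.min(idx, values.length - 1)];
    }

    double frequencyBasedSample(Random rng) {
        return sampleAt(rng.nextDouble());
    }

    double frequencyBasedSample(SplittableRandom rng) {
        return sampleAt(rng.nextDouble());
    }

    public static void main(String[] args) {
        // 2 implicit zeros, two 1.0s and one 2.0 out of 5 observations.
        SketchFrequencySampler s = new SketchFrequencySampler(
                new double[] {0.0, 1.0, 2.0}, new double[] {0.4, 0.8, 1.0});
        System.out.println(s.sampleAt(0.1) + " " + s.sampleAt(0.5) + " " + s.sampleAt(0.9)); // 0.0 1.0 2.0
        SplittableRandom rng = new SplittableRandom(1);
        int zeros = 0;
        for (int i = 0; i < 10000; i++) {
            if (s.frequencyBasedSample(rng) == 0.0) {
                zeros++;
            }
        }
        System.out.println(zeros / 10000.0); // roughly 0.4
    }
}
```

Binary search keeps each draw at O(log k) for k distinct values, which is why caching the CDF between calls pays off.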
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.