Package org.tribuo.util.infotheory
Class WeightedInformationTheory
java.lang.Object
org.tribuo.util.infotheory.WeightedInformationTheory
A class of (discrete) weighted information theoretic functions. Gives warnings if
there are insufficient samples to estimate the quantities accurately.
Defaults to log_2, so returns values in bits.
All functions expect the element types to have well-defined equals and hashCode methods, with equals consistent with hashCode. The behaviour is undefined if this is not true.
-
Nested Class Summary
Modifier and Type | Class | Description
static enum | WeightedInformationTheory.VariableSelector | Chooses which variable is the one with associated weights.
-
Field Summary
Modifier and Type | Field | Description
static final int | DEFAULT_MAP_SIZE | The initial size of the various maps.
static final double | LOG_2 | Log base 2.
static double | LOG_BASE | Sets the base of the logarithm used in the information theoretic calculations.
static final double | LOG_E | Log base e.
static final double | SAMPLES_RATIO | The ratio of samples to symbols before emitting a warning.
-
Method Summary
Modifier and Type | Method | Description
static <T> Map<T,WeightCountTuple> | calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights) | Generate the counts for a single vector.
static <T1,T2,T3> double | conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights) | Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double | conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs) | Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double | conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV) | Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
static <T1,T2> double | jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights) | Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators.
static <T1,T2,T3> double | jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights) | Calculates the discrete weighted joint mutual information, using histogram probability estimators.
static <T1,T2,T3> double | jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs) | Calculates the discrete weighted joint mutual information, using histogram probability estimators.
static <T1,T2,T3> double | jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV) | Calculates the discrete weighted joint mutual information, using histogram probability estimators.
static <T1,T2> double | mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights) | Calculates the discrete weighted mutual information, using histogram probability estimators.
static <T1,T2> double | mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs) | Calculates the discrete weighted mutual information, using histogram probability estimators.
static <T1,T2> double | mi(WeightedPairDistribution<T1,T2> jointDist) | Calculates the discrete weighted mutual information, using histogram probability estimators.
static <T> void | normaliseWeights(Map<T,WeightCountTuple> map) | Normalizes the weights in the map, i.e., divides each weight by its count.
static <T1,T2> double | weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights) | Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators.
static <T> double | weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights) | Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.
-
Field Details
-
SAMPLES_RATIO
public static final double SAMPLES_RATIO
The ratio of samples to symbols before emitting a warning.
-
DEFAULT_MAP_SIZE
public static final int DEFAULT_MAP_SIZE
The initial size of the various maps.
-
LOG_2
public static final double LOG_2
Log base 2.
-
LOG_E
public static final double LOG_E
Log base e.
-
LOG_BASE
public static double LOG_BASE
Sets the base of the logarithm used in the information theoretic calculations. For LOG_2 the unit is "bit", for LOG_E the unit is "nat".
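As a rough illustration of how the base changes the unit, the sketch below computes the entropy of a fair coin in bits and in nats. It does not call Tribuo: the class LogBaseSketch and its entropy method are hypothetical, and it assumes (this page does not confirm it) that the constants hold the natural logarithm of the base and that results are divided by the chosen constant.

```java
/** Hypothetical helper, not part of Tribuo: shows how a log base maps to a unit. */
final class LogBaseSketch {
    // Assumed to mirror LOG_2 and LOG_E: the natural log of each base.
    static final double LOG_2 = Math.log(2.0);
    static final double LOG_E = Math.log(Math.E);

    /** Shannon entropy of a probability vector, in the unit selected by logBase. */
    static double entropy(double[] probs, double logBase) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0.0) {
                // Accumulate in nats; zero-probability states contribute nothing.
                h -= p * Math.log(p);
            }
        }
        // Dividing by ln(base) converts nats to the requested unit.
        return h / logBase;
    }
}
```

Calling LogBaseSketch.entropy(new double[] {0.5, 0.5}, LogBaseSketch.LOG_2) gives one bit, while passing LOG_E gives ln 2 nats. Because LOG_BASE is a mutable static field, reassigning it changes the unit for every caller in the same JVM.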
-
-
Method Details
-
jointMI
public static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights)
Calculates the discrete weighted joint mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type contained in the first array.
T2 - Type contained in the second array.
T3 - Type contained in the target array.
Parameters:
first - An array of values.
second - Another array of values.
target - Target array of values.
weights - Array of weight values.
Returns:
The weighted mutual information I_w(first,second;target)
-
jointMI
public static <T1,T2,T3> double jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
Calculates the discrete weighted joint mutual information, using histogram probability estimators.
Type Parameters:
T1 - The first element type.
T2 - The second element type.
T3 - The third element type.
Parameters:
tripleRV - The weighted triple distribution.
Returns:
The weighted mutual information I_w(first,second;third)
-
jointMI
public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
Calculates the discrete weighted joint mutual information, using histogram probability estimators.
Type Parameters:
T1 - The first element type.
T2 - The second element type.
T3 - The third element type.
Parameters:
rv - The triple distribution.
weights - The weights for one of the variables.
vs - The weighted variable id.
Returns:
The weighted mutual information I_w(first,second;third)
-
conditionalMI
public static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights)
Calculates the discrete weighted conditional mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type contained in the first array.
T2 - Type contained in the second array.
T3 - Type contained in the condition array.
Parameters:
first - An array of values.
second - Another array of values.
condition - Array to condition upon.
weights - Array of weight values.
Returns:
The weighted conditional mutual information I_w(first;second|condition)
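For orientation, one common way to write a weighted conditional mutual information is given below. This page does not spell out the formula, so treat it as an assumption: w(x,y,z) stands for a weight attached to each joint state, p is the histogram (relative frequency) estimate, and X, Y, Z stand for first, second and condition.

```latex
I_w(X;Y \mid Z) = \sum_{x,y,z} w(x,y,z)\, p(x,y,z)\,
    \log \frac{p(z)\, p(x,y,z)}{p(x,z)\, p(y,z)}
```

With every weight equal to one this reduces to the usual conditional mutual information I(X;Y|Z).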
-
conditionalMI
public static <T1,T2,T3> double conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
Type Parameters:
T1 - The first element type.
T2 - The second element type.
T3 - The condition element type.
Parameters:
tripleRV - The weighted triple distribution.
Returns:
The weighted conditional mutual information I_w(first;second|condition)
-
conditionalMI
public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
Type Parameters:
T1 - The first element type.
T2 - The second element type.
T3 - The condition element type.
Parameters:
rv - The triple distribution.
weights - The element weights.
vs - The variable to apply the weights to.
Returns:
The weighted conditional mutual information I_w(first;second|condition)
-
mi
public static <T1,T2> double mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
Calculates the discrete weighted mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
first - An array of values.
second - Another array of values.
weights - Array of weight values.
Returns:
The weighted mutual information I_w(first;second)
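To make the histogram estimator concrete, here is a minimal self-contained sketch of a weighted mutual information over three parallel lists. It is not Tribuo's implementation. The class WeightedMiSketch is hypothetical, and it assumes the weight of each joint state is the mean weight of the samples in that state, so that I_w(X;Y) = sum over (x,y) of w(x,y) p(x,y) log2(p(x,y) / (p(x) p(y))).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a weighted mutual information; not Tribuo code. */
final class WeightedMiSketch {
    static <T1, T2> double weightedMi(List<T1> first, List<T2> second, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // Joint tallies: index 0 is the count, index 1 is the summed weight.
        Map<List<Object>, double[]> joint = new HashMap<>();
        Map<T1, Integer> firstCounts = new HashMap<>();
        Map<T2, Integer> secondCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            List<Object> key = Arrays.<Object>asList(first.get(i), second.get(i));
            double[] tally = joint.computeIfAbsent(key, k -> new double[2]);
            tally[0] += 1.0;
            tally[1] += weights.get(i);
            firstCounts.merge(first.get(i), 1, Integer::sum);
            secondCounts.merge(second.get(i), 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<List<Object>, double[]> e : joint.entrySet()) {
            double[] tally = e.getValue();
            double pJoint = tally[0] / n;
            double pFirst = firstCounts.get(e.getKey().get(0)) / (double) n;
            double pSecond = secondCounts.get(e.getKey().get(1)) / (double) n;
            double meanWeight = tally[1] / tally[0]; // assumed weighting scheme
            mi += meanWeight * pJoint * Math.log(pJoint / (pFirst * pSecond));
        }
        return mi / Math.log(2.0); // convert nats to bits
    }
}
```

With unit weights this reduces to the ordinary mutual information: two identical balanced binary lists share one bit, and two independent ones share none.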
-
mi
public static <T1,T2> double mi(WeightedPairDistribution<T1,T2> jointDist)
Calculates the discrete weighted mutual information, using histogram probability estimators.
Type Parameters:
T1 - Type of the first element.
T2 - Type of the second element.
Parameters:
jointDist - The weighted joint distribution.
Returns:
The weighted mutual information I_w(first;second)
-
mi
public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
Calculates the discrete weighted mutual information, using histogram probability estimators.
Type Parameters:
T1 - Type of the first element.
T2 - Type of the second element.
Parameters:
pairDist - The joint distribution.
weights - The element weights.
vs - The variable to apply the weights to.
Returns:
The weighted mutual information I_w(first;second)
-
jointEntropy
public static <T1,T2> double jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
first - An array of values.
second - Another array of values.
weights - Array of weight values.
Returns:
The entropy H(first,second)
-
weightedConditionalEntropy
public static <T1,T2> double weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights)
Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
vector - The main array of values.
condition - The array to condition on.
weights - Array of weight values.
Returns:
The weighted conditional entropy H_w(vector|condition).
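The following sketch shows one plausible reading of a weighted conditional entropy, H_w(X|Y) = -sum over (x,y) of w(x,y) p(x,y) log2(p(x,y) / p(y)). It is an illustration only. The class WeightedConditionalEntropySketch is hypothetical, and taking w(x,y) as the mean sample weight of each joint state is an assumption rather than something this page states.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a weighted conditional entropy; not Tribuo code. */
final class WeightedConditionalEntropySketch {
    static <T1, T2> double conditionalEntropy(List<T1> vector, List<T2> condition, List<Double> weights) {
        int n = vector.size();
        if (condition.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // Joint tallies: index 0 is the count, index 1 is the summed weight.
        Map<List<Object>, double[]> joint = new HashMap<>();
        Map<T2, Integer> conditionCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            List<Object> key = Arrays.<Object>asList(vector.get(i), condition.get(i));
            double[] tally = joint.computeIfAbsent(key, k -> new double[2]);
            tally[0] += 1.0;
            tally[1] += weights.get(i);
            conditionCounts.merge(condition.get(i), 1, Integer::sum);
        }
        double h = 0.0;
        for (Map.Entry<List<Object>, double[]> e : joint.entrySet()) {
            double[] tally = e.getValue();
            double pJoint = tally[0] / n;
            double pCondition = conditionCounts.get(e.getKey().get(1)) / (double) n;
            double meanWeight = tally[1] / tally[0]; // assumed weighting scheme
            h -= meanWeight * pJoint * Math.log(pJoint / pCondition);
        }
        return h / Math.log(2.0); // convert nats to bits
    }
}
```

When the condition determines the vector completely, every ratio p(x,y) / p(y) is one and the result is zero regardless of the weights.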
-
weightedEntropy
public static <T> double weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights)
Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.
Type Parameters:
T - Type of the array.
Parameters:
vector - The array of values.
weights - Array of weight values.
Returns:
The weighted entropy H_w(vector).
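A minimal sketch of a Guiasu-style weighted entropy, H_w(X) = -sum over x of w(x) p(x) log2 p(x), may help when reasoning about the returned values. The class WeightedEntropySketch is hypothetical and is not Tribuo's code. It assumes w(x) is the mean weight of the samples that take the value x, and it always reports bits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a Guiasu-style weighted entropy; not Tribuo code. */
final class WeightedEntropySketch {
    static <T> double weightedEntropy(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        // Per-state tallies: index 0 is the count, index 1 is the summed weight.
        Map<T, double[]> tallies = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            double[] tally = tallies.computeIfAbsent(vector.get(i), k -> new double[2]);
            tally[0] += 1.0;
            tally[1] += weights.get(i);
        }
        double n = vector.size();
        double h = 0.0;
        for (double[] tally : tallies.values()) {
            double p = tally[0] / n;                 // histogram probability estimate
            double meanWeight = tally[1] / tally[0]; // assumed weighting scheme
            h -= meanWeight * p * Math.log(p);
        }
        return h / Math.log(2.0); // convert nats to bits
    }
}
```

For the values a, a, b, b with unit weights this gives one bit. Doubling the weight of both a samples raises the result to 1.5, since that state's contribution is scaled by its mean weight.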
-
calculateWeightedCountDist
public static <T> Map<T,WeightCountTuple> calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights)
Generate the counts for a single vector.
Type Parameters:
T - The type inside the vector.
Parameters:
vector - An array of values.
weights - The array of weight values.
Returns:
A HashMap from states of T to Pairs of count and total weight for that state.
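The shape of the returned map can be pictured with a small stand-alone sketch. The Tally class below is a hypothetical stand-in for WeightCountTuple, whose fields are not shown on this page, and countWithWeights is an illustrative name rather than the library method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of weighted counting; not Tribuo's implementation. */
final class WeightedCountSketch {
    /** Stand-in for WeightCountTuple: occurrence count plus summed weight. */
    static final class Tally {
        long count;
        double weight;
    }

    static <T> Map<T, Tally> countWithWeights(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        Map<T, Tally> tallies = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            // Create the tally on first sight of a state, then update it.
            Tally tally = tallies.computeIfAbsent(vector.get(i), k -> new Tally());
            tally.count++;
            tally.weight += weights.get(i);
        }
        return tallies;
    }
}
```

For the values a, b, a with weights 0.5, 2.0, 1.5 the state a ends up with count 2 and total weight 2.0, and b with count 1 and total weight 2.0.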
-
normaliseWeights
public static <T> void normaliseWeights(Map<T,WeightCountTuple> map)
Normalizes the weights in the map, i.e., divides each weight by its count.
Type Parameters:
T - The type of the variable that was counted.
Parameters:
map - The map to normalize.
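The normalisation step amounts to turning each summed weight into a mean weight per occurrence, in place. The sketch below illustrates that with a hypothetical Tally class standing in for WeightCountTuple. The guard against a zero count is an addition for safety in this sketch and is not something this page describes.

```java
import java.util.Map;

/** Hypothetical sketch of in-place weight normalisation; not Tribuo code. */
final class NormaliseSketch {
    /** Stand-in for WeightCountTuple: occurrence count plus a weight. */
    static final class Tally {
        final long count;
        double weight;

        Tally(long count, double weight) {
            this.count = count;
            this.weight = weight;
        }
    }

    /** Replaces each summed weight with the mean weight per occurrence. */
    static <T> void normalise(Map<T, Tally> map) {
        for (Tally tally : map.values()) {
            if (tally.count > 0) {
                tally.weight /= tally.count;
            }
        }
    }
}
```

A state counted twice with a total weight of 3.0 holds a weight of 1.5 after the call, and its count is unchanged.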
-