Package org.tribuo.util.infotheory
Class InformationTheory
java.lang.Object
org.tribuo.util.infotheory.InformationTheory
A class of (discrete) information theoretic functions. Gives warnings if
there are insufficient samples to estimate the quantities accurately.
Defaults to log_2, so returns values in bits.
All functions expect that the element types have well-defined equals and hashCode methods, and that equals is consistent with hashCode. The behaviour is undefined if this is not true.
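For orientation, a minimal usage sketch (hypothetical data; both results are in bits under the default base):

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class Example {
        public static void main(String[] args) {
            // Two aligned discrete samples (hypothetical data).
            List<Integer> x = List.of(0, 0, 1, 1, 0, 1, 0, 1);
            List<Integer> y = List.of(0, 0, 1, 1, 0, 1, 1, 0);

            double hx  = InformationTheory.entropy(x); // H(X) in bits
            double ixy = InformationTheory.mi(x, y);   // I(X;Y) in bits
            System.out.printf("H(X) = %.3f, I(X;Y) = %.3f%n", hx, ixy);
        }
    }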
-
Nested Class Summary
static final class InformationTheory.GTestStatistics
An immutable named tuple containing the statistics from a G test.
-
Field Summary
static final int DEFAULT_MAP_SIZE
The initial size of the various maps.
static final double LOG_2
Log base 2.
static double LOG_BASE
Sets the base of the logarithm used in the information theoretic calculations.
static final double LOG_E
Log base e.
static final double SAMPLES_RATIO
The ratio of samples to symbols before emitting a warning.
-
Method Summary
static <T> Map<T,Long> calculateCountDist(List<T> vector)
Generate the counts for a single vector.
static double calculateEntropy(DoubleStream vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
static double calculateEntropy(Stream<Double> vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
static <T1,T2,T3> double cmi(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the conditional mutual information between first and second conditioned on the set.
static <T1,T2> double conditionalEntropy(List<T1> vector, List<T2> condition)
Calculates the discrete Shannon conditional entropy of two arrays, using histogram probability estimators.
static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T> double entropy(List<T> vector)
Calculates the discrete Shannon entropy, using histogram probability estimators.
static <T> double expectedMI(List<T> first, List<T> second)
Compute the expected mutual information assuming randomized inputs.
static <T1,T2,T3> InformationTheory.GTestStatistics gTest(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the GTest statistics for the input variables conditioned on the set.
static <T1,T2> double jointEntropy(List<T1> first, List<T2> second)
Calculates the Shannon joint entropy of two arrays, using histogram probability estimators.
static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
static <T1,T2> double mi(List<T1> first, List<T2> second)
Calculates the discrete Shannon mutual information, using histogram probability estimators.
static <T1,T2> double mi(Set<List<T1>> first, Set<List<T2>> second)
Calculates the mutual information between the two sets of random variables.
static <T1,T2> double mi(PairDistribution<T1,T2> pairDist)
Calculates the discrete Shannon mutual information, using histogram probability estimators.
-
Field Details
-
SAMPLES_RATIO
public static final double SAMPLES_RATIO
The ratio of samples to symbols before emitting a warning.
-
DEFAULT_MAP_SIZE
public static final int DEFAULT_MAP_SIZE
The initial size of the various maps.
-
LOG_2
public static final double LOG_2
Log base 2.
-
LOG_E
public static final double LOG_E
Log base e.
-
LOG_BASE
public static double LOG_BASE
Sets the base of the logarithm used in the information theoretic calculations. For LOG_2 the unit is "bit", for LOG_E the unit is "nat".
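Because LOG_BASE is a mutable static field, reassigning it changes the unit of every subsequent calculation in the JVM; a sketch (assuming single-threaded use):

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class LogBaseExample {
        public static void main(String[] args) {
            List<Integer> v = List.of(0, 1, 0, 1, 2, 2);

            double bits = InformationTheory.entropy(v); // default log_2: "bit"

            InformationTheory.LOG_BASE = InformationTheory.LOG_E;
            double nats = InformationTheory.entropy(v); // log_e: "nat"

            InformationTheory.LOG_BASE = InformationTheory.LOG_2; // restore the default
            System.out.println(bits + " bits = " + nats + " nats");
        }
    }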
-
-
Method Details
-
mi
public static <T1,T2> double mi(Set<List<T1>> first, Set<List<T2>> second)
Calculates the mutual information between the two sets of random variables.
Type Parameters:
T1 - The first type.
T2 - The second type.
Parameters:
first - The first set of random variables.
second - The second set of random variables.
Returns:
The mutual information I(first;second).
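A sketch of grouping variables (the Set<List<T>> signature here is reconstructed from the parameter descriptions; hypothetical data):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class GroupedMIExample {
        public static void main(String[] args) {
            // Each set of aligned variables is treated as one joint variable.
            Set<List<Integer>> first = new HashSet<>();
            first.add(List.of(0, 1, 0, 1, 1, 0));
            first.add(List.of(1, 1, 0, 0, 1, 0));

            Set<List<Integer>> second = new HashSet<>();
            second.add(List.of(0, 0, 1, 1, 0, 1));

            System.out.println(InformationTheory.mi(first, second));
        }
    }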
-
cmi
public static <T1,T2,T3> double cmi(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the conditional mutual information between first and second conditioned on the set.
Type Parameters:
T1 - The first type.
T2 - The second type.
T3 - The third type.
Parameters:
first - A sample from the first random variable.
second - A sample from the second random variable.
condition - A sample from the conditioning set of random variables.
Returns:
The conditional mutual information I(first;second|condition).
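A sketch of conditioning on a set of variables (signature reconstructed as above; hypothetical data):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class CmiExample {
        public static void main(String[] args) {
            List<Integer> first  = List.of(0, 1, 0, 1, 1, 0);
            List<Integer> second = List.of(0, 1, 1, 1, 0, 0);

            // The conditioning set may contain several aligned variables.
            Set<List<Integer>> condition = new HashSet<>();
            condition.add(List.of(0, 0, 1, 1, 0, 1));

            System.out.println(InformationTheory.cmi(first, second, condition));
        }
    }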
-
gTest
public static <T1,T2,T3> InformationTheory.GTestStatistics gTest(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the GTest statistics for the input variables conditioned on the set.
Type Parameters:
T1 - The first type.
T2 - The second type.
T3 - The third type.
Parameters:
first - A sample from the first random variable.
second - A sample from the second random variable.
condition - A sample from the conditioning set of random variables.
Returns:
The GTest statistics.
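A sketch of running the test (hypothetical data; the returned named tuple is printed rather than unpacked):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class GTestExample {
        public static void main(String[] args) {
            List<Integer> first  = List.of(0, 1, 0, 1, 1, 0, 1, 0);
            List<Integer> second = List.of(0, 1, 1, 1, 0, 0, 1, 0);
            Set<List<Integer>> condition = new HashSet<>();
            condition.add(List.of(0, 0, 1, 1, 0, 1, 1, 0));

            InformationTheory.GTestStatistics stats =
                    InformationTheory.gTest(first, second, condition);
            System.out.println(stats); // the G statistic and associated values
        }
    }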
-
jointMI
public static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type contained in the first array.
T2 - Type contained in the second array.
T3 - Type contained in the target array.
Parameters:
first - An array of values.
second - Another array of values.
target - The target array of values.
Returns:
The mutual information I(first,second;target)
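A sketch computing I(first,second;target) from three aligned lists (hypothetical data):

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class JointMIExample {
        public static void main(String[] args) {
            List<Integer> first  = List.of(0, 1, 0, 1, 1, 0);
            List<Integer> second = List.of(1, 1, 0, 0, 1, 0);
            List<Integer> target = List.of(1, 0, 0, 1, 1, 0);

            // first and second are treated as a single joint variable.
            System.out.println(InformationTheory.jointMI(first, second, target));
        }
    }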
-
jointMI
public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
Type Parameters:
T1 - Type of the first variable.
T2 - Type of the second variable.
T3 - Type of the target variable.
Parameters:
rv - The random variable to calculate the joint mutual information of.
Returns:
The mutual information I(first,second;target)
-
conditionalMI
public static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type contained in the first array.
T2 - Type contained in the second array.
T3 - Type contained in the condition array.
Parameters:
first - An array of values.
second - Another array of values.
condition - The array to condition upon.
Returns:
The conditional mutual information I(first;second|condition)
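A sketch with three aligned lists (hypothetical data):

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class ConditionalMIExample {
        public static void main(String[] args) {
            List<Integer> first     = List.of(0, 1, 0, 1, 1, 0);
            List<Integer> second    = List.of(0, 1, 1, 1, 0, 0);
            List<Integer> condition = List.of(0, 0, 1, 1, 0, 1);

            // I(first;second|condition) in bits under the default base.
            System.out.println(InformationTheory.conditionalMI(first, second, condition));
        }
    }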
-
conditionalMI
public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T2|T3).
Type Parameters:
T1 - Type of the first variable.
T2 - Type of the second variable.
T3 - Type of the condition variable.
Parameters:
rv - The triple random variable of the three inputs.
Returns:
The conditional mutual information I(first;second|condition)
-
conditionalMIFlipped
public static <T1,T2,T3> double conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T3|T2).
Type Parameters:
T1 - Type of the first variable.
T2 - Type of the condition variable.
T3 - Type of the second variable.
Parameters:
rv - The triple random variable of the three inputs.
Returns:
The conditional mutual information I(first;second|condition)
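When several conditional quantities are needed from the same three variables, the counts can be built once and reused; a sketch assuming the constructFromLists factory on org.tribuo.util.infotheory.impl.TripleDistribution:

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;
    import org.tribuo.util.infotheory.impl.TripleDistribution;

    public class TripleExample {
        public static void main(String[] args) {
            List<Integer> a = List.of(0, 1, 0, 1, 1, 0);
            List<Integer> b = List.of(0, 1, 1, 1, 0, 0);
            List<Integer> c = List.of(0, 0, 1, 1, 0, 1);

            // Count the joint distribution once, then query it twice.
            TripleDistribution<Integer,Integer,Integer> rv =
                    TripleDistribution.constructFromLists(a, b, c);

            double iABgivenC = InformationTheory.conditionalMI(rv);        // I(A;B|C)
            double iACgivenB = InformationTheory.conditionalMIFlipped(rv); // I(A;C|B)
            System.out.println(iABgivenC + " " + iACgivenB);
        }
    }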
-
mi
public static <T1,T2> double mi(List<T1> first, List<T2> second)
Calculates the discrete Shannon mutual information, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
first - An array of values.
second - Another array of values.
Returns:
The mutual information I(first;second)
-
mi
public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist)
Calculates the discrete Shannon mutual information, using histogram probability estimators.
Type Parameters:
T1 - Type of the first variable.
T2 - Type of the second variable.
Parameters:
pairDist - The PairDistribution of the two variables.
Returns:
The mutual information I(first;second)
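A sketch assuming the constructFromLists factory on org.tribuo.util.infotheory.impl.PairDistribution:

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;
    import org.tribuo.util.infotheory.impl.PairDistribution;

    public class PairMIExample {
        public static void main(String[] args) {
            List<String>  x = List.of("a", "b", "a", "b", "a", "a");
            List<Integer> y = List.of(0, 1, 0, 1, 1, 0);

            // Build the pair distribution once if it will be queried repeatedly.
            PairDistribution<String,Integer> dist =
                    PairDistribution.constructFromLists(x, y);
            System.out.println(InformationTheory.mi(dist));
        }
    }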
-
jointEntropy
public static <T1,T2> double jointEntropy(List<T1> first, List<T2> second)
Calculates the Shannon joint entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
first - An array of values.
second - Another array of values.
Returns:
The entropy H(first,second)
-
conditionalEntropy
public static <T1,T2> double conditionalEntropy(List<T1> vector, List<T2> condition)
Calculates the discrete Shannon conditional entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
Type Parameters:
T1 - Type of the first array.
T2 - Type of the second array.
Parameters:
vector - The main array of values.
condition - The array to condition on.
Returns:
The conditional entropy H(vector|condition).
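The chain rule H(X|Y) = H(X,Y) - H(Y) ties this method to jointEntropy(List, List) and entropy(List); a sketch checking the identity on hypothetical data:

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class ChainRuleExample {
        public static void main(String[] args) {
            List<Integer> x = List.of(0, 1, 0, 1, 1, 0);
            List<Integer> y = List.of(0, 0, 1, 1, 0, 1);

            double hxy      = InformationTheory.jointEntropy(x, y);       // H(X,Y)
            double hy       = InformationTheory.entropy(y);               // H(Y)
            double hxGivenY = InformationTheory.conditionalEntropy(x, y); // H(X|Y)

            // Equal up to floating point error: H(X|Y) = H(X,Y) - H(Y).
            System.out.println(hxGivenY + " == " + (hxy - hy));
        }
    }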
-
entropy
public static <T> double entropy(List<T> vector)
Calculates the discrete Shannon entropy, using histogram probability estimators.
Type Parameters:
T - Type of the array.
Parameters:
vector - The array of values.
Returns:
The entropy H(vector).
-
calculateCountDist
public static <T> Map<T,Long> calculateCountDist(List<T> vector)
Generate the counts for a single vector.
Type Parameters:
T - The type inside the vector.
Parameters:
vector - An array of values.
Returns:
A HashMap from states of T to counts.
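A sketch of the histogram (the map's value type is left as a wildcard, since only its toString is used here):

    import java.util.List;
    import java.util.Map;
    import org.tribuo.util.infotheory.InformationTheory;

    public class CountExample {
        public static void main(String[] args) {
            List<String> v = List.of("a", "b", "a", "c", "a", "b");
            Map<String, ?> counts = InformationTheory.calculateCountDist(v);
            System.out.println(counts); // e.g. {a=3, b=2, c=1}
        }
    }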
-
calculateEntropy
public static double calculateEntropy(DoubleStream vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
Parameters:
vector - The probability distribution.
Returns:
The entropy.
-
calculateEntropy
public static double calculateEntropy(Stream<Double> vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
Parameters:
vector - The probability distribution.
Returns:
The entropy.
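Both overloads take the probabilities themselves rather than samples; for the distribution (0.5, 0.25, 0.25) the entropy is 1.5 bits under the default base. A sketch using the primitive-stream overload:

    import java.util.stream.DoubleStream;
    import org.tribuo.util.infotheory.InformationTheory;

    public class StreamEntropyExample {
        public static void main(String[] args) {
            // The elements are probabilities and should sum to 1.
            double h = InformationTheory.calculateEntropy(DoubleStream.of(0.5, 0.25, 0.25));
            System.out.println(h); // 1.5 (bits)
        }
    }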
-
expectedMI
public static <T> double expectedMI(List<T> first, List<T> second)
Compute the expected mutual information assuming randomized inputs.
Type Parameters:
T - The type inside the list. Must define equals and hashCode.
Parameters:
first - The first vector.
second - The second vector.
Returns:
The expected mutual information under a hypergeometric distribution.
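A sketch comparing the observed MI with its chance baseline (hypothetical data; comparing the two is the usual use of this quantity, e.g. in adjusted mutual information):

    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class ExpectedMIExample {
        public static void main(String[] args) {
            List<Integer> first  = List.of(0, 1, 0, 1, 1, 0, 1, 0);
            List<Integer> second = List.of(0, 1, 1, 1, 0, 0, 1, 0);

            double observed = InformationTheory.mi(first, second);
            double expected = InformationTheory.expectedMI(first, second);
            System.out.println(observed + " vs chance baseline " + expected);
        }
    }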
-