Class InformationTheory
java.lang.Object
org.tribuo.util.infotheory.InformationTheory
A class of (discrete) information theoretic functions. Gives warnings if
 there are insufficient samples to estimate the quantities accurately.
 
Defaults to log_2, so returns values in bits.
All functions expect that the element types have well-defined equals and hashCode methods, and that equals is consistent with hashCode. The behaviour is undefined if this is not true.
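For example, a minimal usage sketch (the sample data is invented; mi and LOG_BASE are the members documented below):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class InfoTheoryExample {
        public static void main(String[] args) {
            // Two aligned discrete samples; Integer has well-defined equals and hashCode.
            List<Integer> x = Arrays.asList(0, 0, 1, 1, 0, 1, 0, 1);
            List<Integer> y = Arrays.asList(0, 1, 1, 1, 0, 1, 0, 0);

            double bits = InformationTheory.mi(x, y); // LOG_BASE defaults to LOG_2, so bits

            InformationTheory.LOG_BASE = InformationTheory.LOG_E; // switch the unit to nats
            double nats = InformationTheory.mi(x, y);

            System.out.printf("I(X;Y) = %f bits = %f nats%n", bits, nats);
        }
    }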
Nested Class Summary

static final class  InformationTheory.GTestStatistics
    An immutable named tuple containing the statistics from a G test.
Field Summary

static final int  DEFAULT_MAP_SIZE
static final double  LOG_2
static double  LOG_BASE
    Sets the base of the logarithm used in the information theoretic calculations.
static final double  LOG_E
static final double  SAMPLES_RATIO
Method Summary

static <T> Map<T,Long>  calculateCountDist(List<T> vector)
    Generate the counts for a single vector.
static double  calculateEntropy(DoubleStream vector)
    Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
static double  calculateEntropy(Stream<Double> vector)
    Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
static <T1,T2,T3> double  cmi(List<T1> first, List<T2> second, Set<List<T3>> condition)
    Calculates the conditional mutual information between first and second conditioned on the set.
static <T1,T2> double  conditionalEntropy(List<T1> vector, List<T2> condition)
    Calculates the discrete Shannon conditional entropy of two arrays, using histogram probability estimators.
static <T1,T2,T3> double  conditionalMI(List<T1> first, List<T2> second, List<T3> condition)
    Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double  conditionalMI(TripleDistribution<T1,T2,T3> rv)
    Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T1,T2,T3> double  conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)
    Calculates the discrete Shannon conditional mutual information, using histogram probability estimators.
static <T> double  entropy(List<T> vector)
    Calculates the discrete Shannon entropy, using histogram probability estimators.
static <T1,T2,T3> InformationTheory.GTestStatistics  gTest(List<T1> first, List<T2> second, Set<List<T3>> condition)
    Calculates the GTest statistics for the input variables conditioned on the set.
static <T1,T2> double  jointEntropy(List<T1> first, List<T2> second)
    Calculates the Shannon joint entropy of two arrays, using histogram probability estimators.
static <T1,T2,T3> double  jointMI(List<T1> first, List<T2> second, List<T3> target)
    Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
static <T1,T2,T3> double  jointMI(TripleDistribution<T1,T2,T3> rv)
    Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
static <T1,T2> double  mi(List<T1> first, List<T2> second)
    Calculates the discrete Shannon mutual information, using histogram probability estimators.
static <T1,T2> double  mi(Set<List<T1>> first, Set<List<T2>> second)
    Calculates the mutual information between the two sets of random variables.
static <T1,T2> double  mi(PairDistribution<T1,T2> pairDist)
    Calculates the discrete Shannon mutual information, using histogram probability estimators.
Field Details
SAMPLES_RATIO
public static final double SAMPLES_RATIO

DEFAULT_MAP_SIZE
public static final int DEFAULT_MAP_SIZE

LOG_2
public static final double LOG_2

LOG_E
public static final double LOG_E

LOG_BASE
public static double LOG_BASE
Sets the base of the logarithm used in the information theoretic calculations. For LOG_2 the unit is "bit", for LOG_E the unit is "nat".
Method Details
mi
public static <T1,T2> double mi(Set<List<T1>> first, Set<List<T2>> second)
Calculates the mutual information between the two sets of random variables.
- Type Parameters:
- T1- The first type.
- T2- The second type.
- Parameters:
- first- The first set of random variables.
- second- The second set of random variables.
- Returns:
- The mutual information I(first;second).
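A sketch of calling this overload, assuming each inner List is one variable's aligned sample (the data is invented):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class SetMiExample {
        public static void main(String[] args) {
            // Group two variables into the first set and one into the second.
            Set<List<Integer>> first = new HashSet<>();
            first.add(Arrays.asList(0, 1, 0, 1, 1, 0));
            first.add(Arrays.asList(1, 1, 0, 0, 1, 0));

            Set<List<Integer>> second = new HashSet<>();
            second.add(Arrays.asList(0, 1, 1, 1, 0, 0));

            System.out.println(InformationTheory.mi(first, second));
        }
    }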
 
- 
cmi
public static <T1,T2,T3> double cmi(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the conditional mutual information between first and second conditioned on the set.
- Type Parameters:
- T1- The first type.
- T2- The second type.
- T3- The third type.
- Parameters:
- first- A sample from the first random variable.
- second- A sample from the second random variable.
- condition- A sample from the conditioning set of random variables.
- Returns:
- The conditional mutual information I(first;second|condition).
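A sketch of a cmi call with a single conditioning variable (the data is invented):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class CmiExample {
        public static void main(String[] args) {
            List<Integer> first  = Arrays.asList(0, 1, 1, 0, 1, 0);
            List<Integer> second = Arrays.asList(1, 1, 0, 0, 1, 0);
            // The conditioning set holds one variable; add more Lists to condition on more.
            Set<List<Integer>> condition = new HashSet<>();
            condition.add(Arrays.asList(0, 0, 1, 1, 0, 1));

            System.out.println(InformationTheory.cmi(first, second, condition));
        }
    }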
 
- 
gTest
public static <T1,T2,T3> InformationTheory.GTestStatistics gTest(List<T1> first, List<T2> second, Set<List<T3>> condition)
Calculates the GTest statistics for the input variables conditioned on the set.
- Type Parameters:
- T1- The first type.
- T2- The second type.
- T3- The third type.
- Parameters:
- first- A sample from the first random variable.
- second- A sample from the second random variable.
- condition- A sample from the conditioning set of random variables.
- Returns:
- The GTest statistics.
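A sketch of running the G test; it prints the returned tuple rather than naming its fields, since only the class itself is documented here (the data is invented):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.tribuo.util.infotheory.InformationTheory;

    public class GTestExample {
        public static void main(String[] args) {
            List<Integer> first  = Arrays.asList(0, 1, 1, 0, 1, 0, 0, 1);
            List<Integer> second = Arrays.asList(1, 1, 0, 0, 1, 0, 1, 1);
            Set<List<Integer>> condition = new HashSet<>();
            condition.add(Arrays.asList(0, 0, 1, 1, 0, 1, 0, 1));

            InformationTheory.GTestStatistics stats =
                    InformationTheory.gTest(first, second, condition);
            System.out.println(stats); // the tuple is immutable; print it for inspection
        }
    }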
 
- 
jointMI
public static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators. Arrays must be the same length.
- Type Parameters:
- T1- Type contained in the first array.
- T2- Type contained in the second array.
- T3- Type contained in the target array.
- Parameters:
- first- An array of values.
- second- Another array of values.
- target- Target array of values.
- Returns:
- The mutual information I(first,second;target)
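A sketch of the list overload, e.g. measuring how much two features together tell you about a target (the data is invented):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class JointMiExample {
        public static void main(String[] args) {
            List<Integer> first  = Arrays.asList(0, 1, 1, 0, 1, 0);
            List<Integer> second = Arrays.asList(1, 0, 1, 0, 0, 1);
            List<Integer> target = Arrays.asList(1, 1, 0, 0, 1, 1);

            // I(first,second;target): the two features are treated as one joint variable.
            System.out.println(InformationTheory.jointMI(first, second, target));
        }
    }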
 
- 
jointMI
public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon joint mutual information, using histogram probability estimators.
- Type Parameters:
- T1- Type contained in the first array.
- T2- Type contained in the second array.
- T3- Type contained in the target array.
- Parameters:
- rv- The random variable to calculate the joint mutual information of.
- Returns:
- The mutual information I(T1,T2;T3)
 
- 
conditionalMI
public static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Arrays must be the same length.
- Type Parameters:
- T1- Type contained in the first array.
- T2- Type contained in the second array.
- T3- Type contained in the condition array.
- Parameters:
- first- An array of values.
- second- Another array of values.
- condition- Array to condition upon.
- Returns:
- The conditional mutual information I(first;second|condition)
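A sketch of the list overload; unlike cmi, this conditions on a single variable rather than a set (the data is invented):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class ConditionalMiExample {
        public static void main(String[] args) {
            List<Integer> first     = Arrays.asList(0, 1, 1, 0, 1, 0);
            List<Integer> second    = Arrays.asList(1, 1, 0, 0, 1, 0);
            List<Integer> condition = Arrays.asList(0, 0, 1, 1, 0, 1);

            // I(first;second|condition), in bits by default.
            System.out.println(InformationTheory.conditionalMI(first, second, condition));
        }
    }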
 
- 
conditionalMI
public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T2|T3).
- Type Parameters:
- T1- Type of the first variable.
- T2- Type of the second variable.
- T3- Type of the condition variable.
- Parameters:
- rv- The triple random variable of the three inputs.
- Returns:
- The conditional mutual information I(first;second|condition)
 
- 
conditionalMIFlipped
public static <T1,T2,T3> double conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T3|T2).
- Type Parameters:
- T1- Type of the first variable.
- T2- Type of the condition variable.
- T3- Type of the second variable.
- Parameters:
- rv- The triple random variable of the three inputs.
- Returns:
- The conditional mutual information I(first;second|condition)
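When several of these quantities are needed over the same three variables, building the TripleDistribution once avoids recounting. A sketch, assuming TripleDistribution.constructFromLists is the factory method (check the TripleDistribution javadoc for the exact construction API; the data is invented):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;
    import org.tribuo.util.infotheory.impl.TripleDistribution;

    public class TripleExample {
        public static void main(String[] args) {
            List<Integer> a = Arrays.asList(0, 1, 1, 0, 1, 0);
            List<Integer> b = Arrays.asList(1, 0, 1, 0, 0, 1);
            List<Integer> c = Arrays.asList(1, 1, 0, 0, 1, 0);

            // Count the joint distribution once, then reuse it for three quantities.
            TripleDistribution<Integer,Integer,Integer> rv =
                    TripleDistribution.constructFromLists(a, b, c);

            double jmi     = InformationTheory.jointMI(rv);              // I(a,b;c)
            double cmi     = InformationTheory.conditionalMI(rv);        // I(a;b|c)
            double flipped = InformationTheory.conditionalMIFlipped(rv); // I(a;c|b)
            System.out.printf("%f %f %f%n", jmi, cmi, flipped);
        }
    }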
 
- 
mi
public static <T1,T2> double mi(List<T1> first, List<T2> second)
Calculates the discrete Shannon mutual information, using histogram probability estimators. Arrays must be the same length.
- Type Parameters:
- T1- Type of the first array.
- T2- Type of the second array.
- Parameters:
- first- An array of values.
- second- Another array of values.
- Returns:
- The mutual information I(first;second)
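As a sanity check, the histogram estimators satisfy the usual identity I(X;Y) = H(X) + H(Y) - H(X,Y) up to floating point error, since all four quantities are computed from the same counts. A sketch (entropy and jointEntropy are documented below):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class MiIdentityExample {
        public static void main(String[] args) {
            List<Integer> x = Arrays.asList(0, 0, 1, 1, 0, 1);
            List<Integer> y = Arrays.asList(0, 1, 1, 0, 0, 1);

            double lhs = InformationTheory.mi(x, y);
            double rhs = InformationTheory.entropy(x)
                       + InformationTheory.entropy(y)
                       - InformationTheory.jointEntropy(x, y);
            System.out.println(Math.abs(lhs - rhs) < 1e-10); // true
        }
    }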
 
- 
mi
public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist)
Calculates the discrete Shannon mutual information, using histogram probability estimators.
- Type Parameters:
- T1- Type of the first variable.
- T2- Type of the second variable.
- Parameters:
- pairDist- PairDistribution for the two variables.
- Returns:
- The mutual information I(first;second)
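As with the TripleDistribution overloads, a precomputed PairDistribution lets you reuse the counts. A sketch, assuming PairDistribution.constructFromLists is the factory method (check the PairDistribution javadoc; the data is invented):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;
    import org.tribuo.util.infotheory.impl.PairDistribution;

    public class PairMiExample {
        public static void main(String[] args) {
            List<String> x = Arrays.asList("a", "b", "a", "b", "a");
            List<String> y = Arrays.asList("u", "v", "u", "u", "v");

            PairDistribution<String,String> pd = PairDistribution.constructFromLists(x, y);
            System.out.println(InformationTheory.mi(pd));
        }
    }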
 
- 
jointEntropy
public static <T1,T2> double jointEntropy(List<T1> first, List<T2> second)
Calculates the Shannon joint entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
- Type Parameters:
- T1- Type of the first array.
- T2- Type of the second array.
- Parameters:
- first- An array of values.
- second- Another array of values.
- Returns:
- The entropy H(first,second)
 
- 
conditionalEntropy
public static <T1,T2> double conditionalEntropy(List<T1> vector, List<T2> condition)
Calculates the discrete Shannon conditional entropy of two arrays, using histogram probability estimators. Arrays must be the same length.
- Type Parameters:
- T1- Type of the first array.
- T2- Type of the second array.
- Parameters:
- vector- The main array of values.
- condition- The array to condition on.
- Returns:
- The conditional entropy H(vector|condition).
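The joint and conditional entropies above obey the chain rule H(X,Y) = H(X) + H(Y|X), which makes a quick consistency check for these estimators. A sketch (the data is invented):

    import java.util.Arrays;
    import java.util.List;
    import org.tribuo.util.infotheory.InformationTheory;

    public class ChainRuleExample {
        public static void main(String[] args) {
            List<Integer> x = Arrays.asList(0, 1, 1, 0, 2, 1);
            List<Integer> y = Arrays.asList(1, 1, 0, 0, 1, 0);

            double joint = InformationTheory.jointEntropy(x, y);       // H(X,Y)
            double chain = InformationTheory.entropy(x)                // H(X)
                         + InformationTheory.conditionalEntropy(y, x); // + H(Y|X)
            System.out.println(Math.abs(joint - chain) < 1e-10); // true
        }
    }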
 
- 
entropy
public static <T> double entropy(List<T> vector)
Calculates the discrete Shannon entropy, using histogram probability estimators.
- Type Parameters:
- T- The type inside the vector.
- Parameters:
- vector- An array of values.
- Returns:
- The entropy H(vector).
- 
calculateCountDist
public static <T> Map<T,Long> calculateCountDist(List<T> vector)
Generate the counts for a single vector.
- Type Parameters:
- T- The type inside the vector.
- Parameters:
- vector- An array of values.
- Returns:
- A HashMap from states of T to counts.
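A sketch of the counting helper (the Map<T,Long> return type follows the reconstruction in the method summary above):

    import java.util.Arrays;
    import java.util.Map;
    import org.tribuo.util.infotheory.InformationTheory;

    public class CountDistExample {
        public static void main(String[] args) {
            Map<String,Long> counts =
                    InformationTheory.calculateCountDist(Arrays.asList("a", "b", "a", "c", "a"));
            System.out.println(counts); // e.g. {a=3, b=1, c=1}
        }
    }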
 
- 
calculateEntropy
public static double calculateEntropy(Stream<Double> vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
- Parameters:
- vector- The probability distribution.
- Returns:
- The entropy.
 
- 
calculateEntropy
public static double calculateEntropy(DoubleStream vector)
Calculates the discrete Shannon entropy of a stream, assuming each element of the stream is an element of the same probability distribution.
- Parameters:
- vector- The probability distribution.
- Returns:
- The entropy.
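A sketch of both stream overloads; the streams carry probabilities that should sum to 1. For {0.5, 0.25, 0.25} the entropy is 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits under the default LOG_BASE:

    import java.util.stream.DoubleStream;
    import java.util.stream.Stream;
    import org.tribuo.util.infotheory.InformationTheory;

    public class StreamEntropyExample {
        public static void main(String[] args) {
            double boxed     = InformationTheory.calculateEntropy(Stream.of(0.5, 0.25, 0.25));
            double primitive = InformationTheory.calculateEntropy(DoubleStream.of(0.5, 0.25, 0.25));
            System.out.println(boxed + " " + primitive); // both 1.5
        }
    }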
 