org.tribuo.util.infotheory

## Class InformationTheory

• ```public final class InformationTheory
extends Object```
A class of (discrete) information theoretic functions. The functions give warnings if there are too few samples to estimate the quantities accurately.

The logarithm defaults to base 2, so values are returned in bits.

All functions expect the element types to have well-defined `equals` and `hashCode` methods, with `equals` consistent with `hashCode`. The behaviour is undefined if this is not true.
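The estimators count states in hash-based maps, so equality decides which samples share a state. The plain-Java sketch below is not Tribuo code: `EqualsHashCodeDemo`, `IdentityState`, `ValueState` and `entropyBits` are hypothetical names. It estimates entropy in bits from a histogram. Four samples labelled a, a, b, b give 1 bit when equality is by value. They give 2 bits when each object keeps `Object`'s identity-based `equals`, because every sample then counts as its own state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class EqualsHashCodeDemo {
    // Relies on Object's identity-based equals/hashCode: every instance is a distinct state.
    static final class IdentityState {
        final String label;
        IdentityState(String label) { this.label = label; }
    }

    // Value-based equals/hashCode that agree with each other.
    static final class ValueState {
        final String label;
        ValueState(String label) { this.label = label; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueState && ((ValueState) o).label.equals(label);
        }
        @Override public int hashCode() { return Objects.hash(label); }
    }

    // Histogram entropy in bits: count each distinct state, then sum -p * log2(p).
    static <T> double entropyBits(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T s : samples) {
            counts.merge(s, 1L, Long::sum);
        }
        double n = samples.size();
        double h = 0.0;
        for (long c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p);
        }
        return h / Math.log(2);
    }

    public static void main(String[] args) {
        List<ValueState> byValue = Arrays.asList(
                new ValueState("a"), new ValueState("a"), new ValueState("b"), new ValueState("b"));
        List<IdentityState> byIdentity = Arrays.asList(
                new IdentityState("a"), new IdentityState("a"), new IdentityState("b"), new IdentityState("b"));
        System.out.println(entropyBits(byValue));    // two states, about 1 bit
        System.out.println(entropyBits(byIdentity)); // four "distinct" states, about 2 bits
    }
}
```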

• ### Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| `static class` | `InformationTheory.GTestStatistics` | An immutable named tuple containing the statistics from a G-test. |

• ### Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| `static int` | `DEFAULT_MAP_SIZE` | |
| `static double` | `LOG_2` | |
| `static double` | `LOG_BASE` | Sets the base of the logarithm used in the information theoretic calculations. |
| `static double` | `LOG_E` | |
| `static double` | `SAMPLES_RATIO` | |

• ### Method Summary

All Methods

| Modifier and Type | Method | Description |
|---|---|---|
| `static <T> Map<T,Long>` | `calculateCountDist(List<T> vector)` | Generates the counts for a single vector. |
| `static double` | `calculateEntropy(DoubleStream vector)` | Calculates the discrete Shannon entropy of a stream, treating each element of the stream as a probability from the same distribution. |
| `static double` | `calculateEntropy(Stream<Double> vector)` | Calculates the discrete Shannon entropy of a stream, treating each element of the stream as a probability from the same distribution. |
| `static <T1,T2,T3> double` | `cmi(List<T1> first, List<T2> second, Set<List<T3>> condition)` | Calculates the conditional mutual information between `first` and `second`, conditioned on the set. |
| `static <T1,T2> double` | `conditionalEntropy(List<T1> vector, List<T2> condition)` | Calculates the discrete Shannon conditional entropy of two lists, using histogram probability estimators. |
| `static <T1,T2,T3> double` | `conditionalMI(List<T1> first, List<T2> second, List<T3> condition)` | Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. |
| `static <T1,T2,T3> double` | `conditionalMI(TripleDistribution<T1,T2,T3> rv)` | Calculates the discrete Shannon conditional mutual information of T1 and T2 given T3, using histogram probability estimators. |
| `static <T1,T2,T3> double` | `conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)` | Calculates the discrete Shannon conditional mutual information of T1 and T3 given T2, using histogram probability estimators. |
| `static <T> double` | `entropy(List<T> vector)` | Calculates the discrete Shannon entropy, using histogram probability estimators. |
| `static <T1,T2,T3> InformationTheory.GTestStatistics` | `gTest(List<T1> first, List<T2> second, Set<List<T3>> condition)` | Calculates the G-test statistics for the input variables, conditioned on the set. |
| `static <T1,T2> double` | `jointEntropy(List<T1> first, List<T2> second)` | Calculates the Shannon joint entropy of two lists, using histogram probability estimators. |
| `static <T1,T2,T3> double` | `jointMI(List<T1> first, List<T2> second, List<T3> target)` | Calculates the discrete Shannon joint mutual information, using histogram probability estimators. |
| `static <T1,T2,T3> double` | `jointMI(TripleDistribution<T1,T2,T3> rv)` | Calculates the discrete Shannon joint mutual information, using histogram probability estimators. |
| `static <T1,T2> double` | `mi(List<T1> first, List<T2> second)` | Calculates the discrete Shannon mutual information, using histogram probability estimators. |
| `static <T1,T2> double` | `mi(PairDistribution<T1,T2> pairDist)` | Calculates the discrete Shannon mutual information, using histogram probability estimators. |
| `static <T1,T2> double` | `mi(Set<List<T1>> first, Set<List<T2>> second)` | Calculates the mutual information between the two sets of random variables. |

• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Field Detail

• #### SAMPLES_RATIO

`public static final double SAMPLES_RATIO`
See Also: Constant Field Values
• #### DEFAULT_MAP_SIZE

`public static final int DEFAULT_MAP_SIZE`
See Also: Constant Field Values
• #### LOG_2

`public static final double LOG_2`
• #### LOG_E

`public static final double LOG_E`
• #### LOG_BASE

`public static double LOG_BASE`
Sets the base of the logarithm used in the information theoretic calculations. With `LOG_2` the unit is the bit, and with `LOG_E` the unit is the nat.
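Changing the base only rescales the results, since log_b(x) = ln(x) / ln(b). The sketch below assumes the rescaling works by dividing natural-log results by the log of the chosen base. `LogBaseSketch`, `entropyNats` and `inBase` are hypothetical helpers, not part of this class. A fair coin has an entropy of ln 2 (about 0.693) nats, which is exactly 1 bit.

```java
final class LogBaseSketch {
    // Entropy of a probability vector using natural logs, i.e. measured in nats.
    static double entropyNats(double[] probs) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0.0) {
                h -= p * Math.log(p); // 0 * log 0 is taken as 0
            }
        }
        return h;
    }

    // Change of base: H_b = H_e / ln(b). b = 2 gives bits; b = e leaves nats unchanged.
    static double inBase(double nats, double base) {
        return nats / Math.log(base);
    }

    public static void main(String[] args) {
        double nats = entropyNats(new double[] {0.5, 0.5});
        System.out.println(nats);              // ln 2, about 0.693 nats
        System.out.println(inBase(nats, 2.0)); // about 1 bit
    }
}
```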
• ### Method Detail

• #### mi

```public static <T1,T2> double mi(Set<List<T1>> first,
Set<List<T2>> second)```
Calculates the mutual information between the two sets of random variables.
Type Parameters:
`T1` - The first type.
`T2` - The second type.
Parameters:
`first` - The first set of random variables.
`second` - The second set of random variables.
Returns:
The mutual information I(first;second).
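One way to read this method is that each set is first turned into a single joint variable whose samples are row tuples, after which ordinary mutual information applies. The sketch below assumes that each `List` in a set holds one variable's samples, aligned by index. `SetMiSketch`, `zipColumns` and `entropyBits` are hypothetical names. The sketch uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y) rather than anything taken from Tribuo's source. Because the inputs are sets, two columns with identical contents collapse into one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SetMiSketch {
    // Zips a set of equal-length columns (one List per variable) into row tuples.
    // The column order is fixed once, so every row uses the same order.
    static <T> List<List<T>> zipColumns(Set<List<T>> columns) {
        List<List<T>> cols = new ArrayList<>(columns);
        int n = cols.isEmpty() ? 0 : cols.get(0).size();
        for (List<T> col : cols) {
            if (col.size() != n) {
                throw new IllegalArgumentException("All columns must be the same length");
            }
        }
        List<List<T>> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            List<T> row = new ArrayList<>(cols.size());
            for (List<T> col : cols) {
                row.add(col.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Plug-in (histogram) entropy in bits.
    static <T> double entropyBits(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T s : samples) {
            counts.merge(s, 1L, Long::sum);
        }
        double n = samples.size();
        double h = 0.0;
        for (long c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p);
        }
        return h / Math.log(2);
    }

    // I(X;Y) = H(X) + H(Y) - H(X,Y), with X and Y the zipped row tuples.
    static <T1, T2> double mi(Set<List<T1>> first, Set<List<T2>> second) {
        List<List<T1>> x = zipColumns(first);
        List<List<T2>> y = zipColumns(second);
        if (x.size() != y.size()) {
            throw new IllegalArgumentException("Both sets need the same number of samples");
        }
        List<List<Object>> xy = new ArrayList<>(x.size());
        for (int i = 0; i < x.size(); i++) {
            List<Object> pair = new ArrayList<>(2);
            pair.add(x.get(i));
            pair.add(y.get(i));
            xy.add(pair);
        }
        return entropyBits(x) + entropyBits(y) - entropyBits(xy);
    }
}
```

Take X to be the columns [0,0,1,1] and [0,1,0,1], and Y the single column [0,1,1,0], which is their XOR. The sketch then returns 1 bit.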
• #### cmi

```public static <T1,T2,T3> double cmi(List<T1> first,
List<T2> second,
Set<List<T3>> condition)```
Calculates the conditional mutual information between first and second conditioned on the set.
Type Parameters:
`T1` - The first type.
`T2` - The second type.
`T3` - The third type.
Parameters:
`first` - A sample from the first random variable.
`second` - A sample from the second random variable.
`condition` - A sample from the conditioning set of random variables.
Returns:
The conditional mutual information I(first;second|condition).
• #### gTest

```public static <T1,T2,T3> InformationTheory.GTestStatistics gTest(List<T1> first,
List<T2> second,
Set<List<T3>> condition)```
Calculates the G-test statistics for the input variables, conditioned on the set.
Type Parameters:
`T1` - The first type.
`T2` - The second type.
`T3` - The third type.
Parameters:
`first` - A sample from the first random variable.
`second` - A sample from the second random variable.
`condition` - A sample from the conditioning set of random variables.
Returns:
The G-test statistics.
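For a test of conditional independence, the G statistic equals 2N times the conditional mutual information measured in nats, where N is the number of samples. The sketch below computes G directly from counts as 2 Σ O ln(O/E), with expected counts E = n(x,z) n(y,z) / n(z). It illustrates the statistic and is not the contents of `GTestStatistics`. `GTestSketch` and `gStatistic` are hypothetical names. The condition here is a single list; a set of conditioning variables could first be zipped into row tuples. The sketch does not compute degrees of freedom or a p-value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GTestSketch {
    // G = 2 * sum over observed (x,y,z) cells of O * ln(O / E),
    // with E = n(x,z) * n(y,z) / n(z), the count expected if X and Y are independent given Z.
    static <A, B, C> double gStatistic(List<A> x, List<B> y, List<C> z) {
        if (x.size() != y.size() || x.size() != z.size()) {
            throw new IllegalArgumentException("Lists must be the same length");
        }
        Map<List<Object>, Long> xyz = new HashMap<>();
        Map<List<Object>, Long> xz = new HashMap<>();
        Map<List<Object>, Long> yz = new HashMap<>();
        Map<Object, Long> zc = new HashMap<>();
        for (int i = 0; i < x.size(); i++) {
            xyz.merge(Arrays.<Object>asList(x.get(i), y.get(i), z.get(i)), 1L, Long::sum);
            xz.merge(Arrays.<Object>asList(x.get(i), z.get(i)), 1L, Long::sum);
            yz.merge(Arrays.<Object>asList(y.get(i), z.get(i)), 1L, Long::sum);
            zc.merge(z.get(i), 1L, Long::sum);
        }
        double g = 0.0;
        for (Map.Entry<List<Object>, Long> e : xyz.entrySet()) {
            Object xv = e.getKey().get(0);
            Object yv = e.getKey().get(1);
            Object zv = e.getKey().get(2);
            double observed = e.getValue().doubleValue();
            double expected = xz.get(Arrays.<Object>asList(xv, zv)).doubleValue()
                    * yz.get(Arrays.<Object>asList(yv, zv)).doubleValue()
                    / zc.get(zv).doubleValue();
            g += observed * Math.log(observed / expected);
        }
        return 2.0 * g;
    }
}
```

Dividing G by 2N, and then by ln 2, recovers the conditional mutual information in bits.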
• #### jointMI

```public static <T1,T2,T3> double jointMI(List<T1> first,
List<T2> second,
List<T3> target)```
Calculates the discrete Shannon joint mutual information, using histogram probability estimators. The lists must be the same length.
Type Parameters:
`T1` - Type contained in the first list.
`T2` - Type contained in the second list.
`T3` - Type contained in the target list.
Parameters:
`first` - A list of values.
`second` - Another list of values.
`target` - Target list of values.
Returns:
The mutual information I(first,second;target)
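Joint mutual information measures what the pair (first, second) says about the target. This can exceed what either variable says alone. The sketch below uses the identity I(X,Y;Z) = H(X,Y) + H(Z) - H(X,Y,Z) with plug-in entropies. `JointMiSketch`, `zip` and `entropyBits` are hypothetical names, not Tribuo API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JointMiSketch {
    // Plug-in (histogram) entropy in bits.
    static <T> double entropyBits(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T s : samples) {
            counts.merge(s, 1L, Long::sum);
        }
        double n = samples.size();
        double h = 0.0;
        for (long c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p);
        }
        return h / Math.log(2);
    }

    // Combines index-aligned columns into row tuples, i.e. samples of the joint variable.
    static List<List<Object>> zip(List<?>... columns) {
        int n = columns[0].size();
        List<List<Object>> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            List<Object> row = new ArrayList<>(columns.length);
            for (List<?> col : columns) {
                row.add(col.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // I(X,Y;Z) = H(X,Y) + H(Z) - H(X,Y,Z).
    static <T1, T2, T3> double jointMI(List<T1> first, List<T2> second, List<T3> target) {
        if (first.size() != second.size() || first.size() != target.size()) {
            throw new IllegalArgumentException("Lists must be the same length");
        }
        return entropyBits(zip(first, second)) + entropyBits(target)
                - entropyBits(zip(first, second, target));
    }
}
```

Take x = [0,0,1,1], y = [0,1,0,1] and z = x XOR y = [0,1,1,0]. Each of x and y alone shares 0 bits with z, yet `jointMI(x, y, z)` returns 1 bit.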
• #### jointMI

`public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv)`
Calculates the discrete Shannon joint mutual information, using histogram probability estimators. Note this calculates I(T1,T2;T3).
Type Parameters:
`T1` - Type of the first variable.
`T2` - Type of the second variable.
`T3` - Type of the target variable.
Parameters:
`rv` - The triple random variable to calculate the joint mutual information of.
Returns:
The mutual information I(first,second;target)
• #### conditionalMI

```public static <T1,T2,T3> double conditionalMI(List<T1> first,
List<T2> second,
List<T3> condition)```
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. The lists must be the same length.
Type Parameters:
`T1` - Type contained in the first list.
`T2` - Type contained in the second list.
`T3` - Type contained in the condition list.
Parameters:
`first` - A list of values.
`second` - Another list of values.
`condition` - List to condition upon.
Returns:
The conditional mutual information I(first;second|condition)
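Conditional mutual information can be written in terms of joint entropies as I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). What follows is a minimal sketch of that identity with histogram estimates. `ConditionalMiSketch`, `zip` and `entropyBits` are hypothetical names, and this is not necessarily how Tribuo computes the quantity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConditionalMiSketch {
    // Plug-in (histogram) entropy in bits.
    static <T> double entropyBits(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T s : samples) {
            counts.merge(s, 1L, Long::sum);
        }
        double n = samples.size();
        double h = 0.0;
        for (long c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p);
        }
        return h / Math.log(2);
    }

    // Combines index-aligned columns into row tuples (samples of the joint variable).
    static List<List<Object>> zip(List<?>... columns) {
        int n = columns[0].size();
        List<List<Object>> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            List<Object> row = new ArrayList<>(columns.length);
            for (List<?> col : columns) {
                row.add(col.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).
    static <T1, T2, T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition) {
        if (first.size() != second.size() || first.size() != condition.size()) {
            throw new IllegalArgumentException("Lists must be the same length");
        }
        return entropyBits(zip(first, condition)) + entropyBits(zip(second, condition))
                - entropyBits(zip(first, second, condition)) - entropyBits(condition);
    }
}
```

For independent x = [0,0,1,1] and y = [0,1,0,1], I(x;y) is 0. Conditioning on z = [0,1,1,0] (their XOR) gives 1 bit, because once z is known, x determines y.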
• #### conditionalMI

`public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv)`
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T2|T3).
Type Parameters:
`T1` - Type of the first variable.
`T2` - Type of the second variable.
`T3` - Type of the condition variable.
Parameters:
`rv` - The triple random variable of the three inputs.
Returns:
The conditional mutual information I(first;second|condition)
• #### conditionalMIFlipped

`public static <T1,T2,T3> double conditionalMIFlipped(TripleDistribution<T1,T2,T3> rv)`
Calculates the discrete Shannon conditional mutual information, using histogram probability estimators. Note this calculates I(T1;T3|T2).
Type Parameters:
`T1` - Type of the first variable.
`T2` - Type of the condition variable.
`T3` - Type of the second variable.
Parameters:
`rv` - The triple random variable of the three inputs.
Returns:
The conditional mutual information I(first;second|condition)
• #### mi

```public static <T1,T2> double mi(List<T1> first,
List<T2> second)```
Calculates the discrete Shannon mutual information, using histogram probability estimators. The lists must be the same length.
Type Parameters:
`T1` - Type of the first list.
`T2` - Type of the second list.
Parameters:
`first` - A list of values.
`second` - Another list of values.
Returns:
The mutual information I(first;second)
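The histogram estimator replaces each probability with a relative frequency, giving I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))] over the observed pairs. The sketch below evaluates that sum from counts. `MiSketch` is a hypothetical class. Throwing `IllegalArgumentException` on a length mismatch is this sketch's choice, not behaviour documented on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiSketch {
    // Plug-in estimate of I(X;Y) in bits. From counts, each observed pair contributes
    // (n_xy / n) * log2(n * n_xy / (n_x * n_y)).
    static <T1, T2> double mi(List<T1> first, List<T2> second) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException("Lists must be the same length");
        }
        Map<List<Object>, Long> joint = new HashMap<>();
        Map<T1, Long> firstCounts = new HashMap<>();
        Map<T2, Long> secondCounts = new HashMap<>();
        for (int i = 0; i < first.size(); i++) {
            joint.merge(Arrays.<Object>asList(first.get(i), second.get(i)), 1L, Long::sum);
            firstCounts.merge(first.get(i), 1L, Long::sum);
            secondCounts.merge(second.get(i), 1L, Long::sum);
        }
        double n = first.size();
        double mi = 0.0;
        for (Map.Entry<List<Object>, Long> e : joint.entrySet()) {
            double nxy = e.getValue().doubleValue();
            double nx = firstCounts.get(e.getKey().get(0)).doubleValue();
            double ny = secondCounts.get(e.getKey().get(1)).doubleValue();
            mi += (nxy / n) * Math.log(n * nxy / (nx * ny));
        }
        return mi / Math.log(2);
    }
}
```

Two identical lists [0,0,1,1] share 1 bit, while [0,0,1,1] and [0,1,0,1] share 0 bits.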
• #### mi

`public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist)`
Calculates the discrete Shannon mutual information, using histogram probability estimators.
Type Parameters:
`T1` - Type of the first variable
`T2` - Type of the second variable
Parameters:
`pairDist` - PairDistribution for the two variables.
Returns:
The mutual information I(first;second)
• #### jointEntropy

```public static <T1,T2> double jointEntropy(List<T1> first,
List<T2> second)```
Calculates the Shannon joint entropy of two lists, using histogram probability estimators. The lists must be the same length.
Type Parameters:
`T1` - Type of the first list.
`T2` - Type of the second list.
Parameters:
`first` - A list of values.
`second` - Another list of values.
Returns:
The entropy H(first,second)
• #### conditionalEntropy

```public static <T1,T2> double conditionalEntropy(List<T1> vector,
List<T2> condition)```
Calculates the discrete Shannon conditional entropy of two lists, using histogram probability estimators. The lists must be the same length.
Type Parameters:
`T1` - Type of the first list.
`T2` - Type of the second list.
Parameters:
`vector` - The main list of values.
`condition` - The list to condition on.
Returns:
The conditional entropy H(vector|condition).
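Conditional entropy follows from the chain rule, H(X|Y) = H(X,Y) - H(Y), so it can be computed from a joint entropy and a marginal one. The sketch below assumes plug-in (histogram) entropies in bits. `ConditionalEntropySketch` and its methods are hypothetical, not Tribuo's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConditionalEntropySketch {
    // Plug-in (histogram) entropy in bits.
    static <T> double entropyBits(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T s : samples) {
            counts.merge(s, 1L, Long::sum);
        }
        double n = samples.size();
        double h = 0.0;
        for (long c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p);
        }
        return h / Math.log(2);
    }

    // H(X,Y): entropy of the paired samples (x_i, y_i).
    static <T1, T2> double jointEntropy(List<T1> first, List<T2> second) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException("Lists must be the same length");
        }
        List<List<Object>> pairs = new ArrayList<>(first.size());
        for (int i = 0; i < first.size(); i++) {
            pairs.add(Arrays.<Object>asList(first.get(i), second.get(i)));
        }
        return entropyBits(pairs);
    }

    // Chain rule: H(X|Y) = H(X,Y) - H(Y).
    static <T1, T2> double conditionalEntropy(List<T1> vector, List<T2> condition) {
        return jointEntropy(vector, condition) - entropyBits(condition);
    }
}
```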
• #### entropy

`public static <T> double entropy(List<T> vector)`
Calculates the discrete Shannon entropy, using histogram probability estimators.
Type Parameters:
`T` - Type of the list elements.
Parameters:
`vector` - The list of values.
Returns:
The entropy H(vector).
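A histogram estimator counts each distinct value, converts the counts to relative frequencies and sums -p log2 p. The stream-based sketch below illustrates this. `EntropySketch`, `countDist` and `entropy` are hypothetical names, and `countDist` only mimics the documented `HashMap` return of `calculateCountDist`. `Collectors.groupingBy` rejects null keys, so this sketch does not accept null elements.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class EntropySketch {
    // Counts occurrences of each distinct value, grouping by equals/hashCode.
    // Collectors.groupingBy throws NullPointerException on null keys.
    static <T> Map<T, Long> countDist(List<T> vector) {
        return vector.stream()
                .collect(Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.counting()));
    }

    // Plug-in estimate of H(X) in bits: sum of -p * log2(p) over relative frequencies.
    static <T> double entropy(List<T> vector) {
        double n = vector.size();
        return countDist(vector).values().stream()
                .mapToDouble(count -> count / n)
                .map(p -> -p * Math.log(p))
                .sum() / Math.log(2);
    }
}
```

For ["a", "a", "b", "c"] the frequencies are 1/2, 1/4 and 1/4, so the entropy is 1.5 bits.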
• #### calculateCountDist

`public static <T> Map<T,Long> calculateCountDist(List<T> vector)`
Generates the counts for a single vector.
Type Parameters:
`T` - The type inside the vector.
Parameters:
`vector` - A list of values.
Returns:
A HashMap from states of T to counts.
• #### calculateEntropy

`public static double calculateEntropy(Stream<Double> vector)`
Calculates the discrete Shannon entropy of a stream, treating each element of the stream as a probability from the same discrete distribution.
Parameters:
`vector` - The probabilities of the distribution's states.
Returns:
The entropy.
• #### calculateEntropy

`public static double calculateEntropy(DoubleStream vector)`
Calculates the discrete Shannon entropy of a stream, treating each element of the stream as a probability from the same discrete distribution.
Parameters:
`vector` - The probabilities of the distribution's states.
Returns:
The entropy.
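Both `calculateEntropy` overloads take probabilities rather than raw samples. The sketch below computes -Σ p log2 p. `ProbabilityEntropySketch` is a hypothetical class. Skipping zero probabilities (the usual 0 log 0 = 0 convention) is an assumption, since this page does not say how zeros are handled.

```java
import java.util.stream.DoubleStream;
import java.util.stream.Stream;

final class ProbabilityEntropySketch {
    // H(p) = -sum of p_i * log2(p_i) over a stream of probabilities.
    // Zero probabilities are skipped (assumed convention: 0 * log 0 = 0).
    static double calculateEntropy(DoubleStream probabilities) {
        return probabilities.filter(p -> p > 0.0)
                .map(p -> -p * Math.log(p))
                .sum() / Math.log(2);
    }

    // Boxed overload: unboxes and delegates to the primitive version.
    static double calculateEntropy(Stream<Double> probabilities) {
        return calculateEntropy(probabilities.mapToDouble(Double::doubleValue));
    }
}
```

For example, the probabilities 0.5, 0.25 and 0.25 give 1.5 bits.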