java.lang.Object

org.tribuo.util.infotheory.WeightedInformationTheory

public final class WeightedInformationTheory extends Object

A class of (discrete) weighted information theoretic functions. Gives warnings if there are insufficient samples to estimate the quantities accurately.

Defaults to log_2, so returns values in bits.

All functions expect the element types to have well-defined equals and hashCode methods, with equals consistent with hashCode. The behaviour is undefined if this is not true.
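
The equals and hashCode requirement matters because histogram estimators group samples by state, typically in a hash map. The following is a minimal sketch of that effect, not the library's code: `Label`, `BrokenLabel` and `distinctStates` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class EqualsHashCodeSketch {
    /** Hypothetical element type with value-based equals and hashCode. */
    public static final class Label {
        final String name;
        public Label(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Label && ((Label) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name); }
    }

    /** Hypothetical element type that inherits identity-based equals and hashCode from Object. */
    public static final class BrokenLabel {
        final String name;
        public BrokenLabel(String name) { this.name = name; }
    }

    /** Counts how many distinct states a HashMap-based histogram sees. */
    public static <T> int distinctStates(List<T> samples) {
        Map<T, Long> counts = new HashMap<>();
        for (T sample : samples) {
            counts.merge(sample, 1L, Long::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        List<Label> good = new ArrayList<>();
        List<BrokenLabel> broken = new ArrayList<>();
        for (String s : new String[] {"a", "b", "a", "b"}) {
            good.add(new Label(s));
            broken.add(new BrokenLabel(s));
        }
        // Two states, because equal names collapse into one key.
        System.out.println(distinctStates(good));
        // Four states, because every instance is treated as a different key.
        System.out.println(distinctStates(broken));
    }
}
```

With `BrokenLabel` every sample looks like its own state, so any probability estimate built on those counts would be meaningless.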

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

WeightedInformationTheory.VariableSelector

Chooses which variable is the one with associated weights.
Field Summary

Fields

Modifier and Type

Field

Description

static final int

DEFAULT_MAP_SIZE

static final double

LOG_2

static double

LOG_BASE

Sets the base of the logarithm used in the information theoretic calculations.

static final double

LOG_E

static final double

SAMPLES_RATIO
Method Summary

Modifier and Type

Method

Description

static <T> Map<T, WeightCountTuple>

calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights)

Generate the counts for a single vector.

static <T1,T2,T3> double

conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights)

Calculates the discrete weighted conditional mutual information, using histogram probability estimators.

static <T1,T2,T3> double

conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)

static <T1,T2,T3> double

conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)

static <T1,T2> double

jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)

Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators.

static <T1,T2,T3> double

jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights)

Calculates the discrete weighted joint mutual information, using histogram probability estimators.

static <T1,T2,T3> double

jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)

static <T1,T2,T3> double

jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)

static <T1,T2> double

mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)

Calculates the discrete weighted mutual information, using histogram probability estimators.

static <T1,T2> double

mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)

static <T1,T2> double

mi(WeightedPairDistribution<T1,T2> jointDist)

static <T> void

normaliseWeights(Map<T, WeightCountTuple> map)

Normalizes the weights in the map, i.e., divides each weight by its count.

static <T1,T2> double

weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights)

Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators.

static <T> double

weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights)

Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- SAMPLES_RATIO
  public static final double SAMPLES_RATIO
  
  See Also:
  
  Constant Field Values
- DEFAULT_MAP_SIZE
  public static final int DEFAULT_MAP_SIZE
  
  See Also:
  
  Constant Field Values
- LOG_2
  
  public static final double LOG_2
- LOG_E
  
  public static final double LOG_E
- LOG_BASE
  
  public static double LOG_BASE
  
  Sets the base of the logarithm used in the information theoretic calculations. For LOG_2 the unit is "bit", for LOG_E the unit is "nat".
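
As an illustration of how the logarithm base changes the unit, the sketch below computes the entropy of a fair coin in bits and in nats. It assumes that the constants hold the natural logarithm of the base and that results are divided by the chosen constant; that is an assumption about the convention, not a statement of how the library defines or uses these fields. `LogBaseSketch` and `entropy` are hypothetical names.

```java
public class LogBaseSketch {
    // Assumed values: the natural log of each base. The real constants may be defined differently.
    public static final double LOG_2 = Math.log(2.0);
    public static final double LOG_E = Math.log(Math.E);

    /** Shannon entropy of a probability vector, in the unit selected by logBase. */
    public static double entropy(double[] probs, double logBase) {
        double h = 0.0;
        for (double p : probs) {
            if (p > 0.0) {
                h -= p * Math.log(p);
            }
        }
        // Dividing a natural-log result by ln(base) converts it to that base.
        return h / logBase;
    }

    public static void main(String[] args) {
        double[] fairCoin = {0.5, 0.5};
        System.out.println(entropy(fairCoin, LOG_2)); // entropy in bits
        System.out.println(entropy(fairCoin, LOG_E)); // entropy in nats
    }
}
```

A fair coin carries one bit, which is the same quantity as ln 2 (about 0.693) nats.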
Method Details
- jointMI
  
  public static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights)
  
  Calculates the discrete weighted joint mutual information, using histogram probability estimators. Arrays must be the same length.
  
  Type Parameters:
  
  T1 - Type contained in the first array.
  
  T2 - Type contained in the second array.
  
  T3 - Type contained in the target array.
  
  Parameters:
  
  first - An array of values.
  
  second - Another array of values.
  
  target - Target array of values.
  
  weights - Array of weight values.
  
  Returns:
  
  The joint mutual information I(first,second;target)
- jointMI
  
  public static <T1,T2,T3> double jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
- jointMI
  
  public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
- conditionalMI
  
  public static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights)
  
  Calculates the discrete weighted conditional mutual information, using histogram probability estimators. Arrays must be the same length.
  
  Type Parameters:
  
  T1 - Type contained in the first array.
  
  T2 - Type contained in the second array.
  
  T3 - Type contained in the condition array.
  
  Parameters:
  
  first - An array of values.
  
  second - Another array of values.
  
  condition - Array to condition upon.
  
  weights - Array of weight values.
  
  Returns:
  
  The conditional mutual information I(first;second|condition)
- conditionalMI
  
  public static <T1,T2,T3> double conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
- conditionalMI
  
  public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
- mi
  
  public static <T1,T2> double mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
  
  Calculates the discrete weighted mutual information, using histogram probability estimators.
  Arrays must be the same length.
  
  Type Parameters:
  
  T1 - Type of the first array
  
  T2 - Type of the second array
  
  Parameters:
  
  first - An array of values
  
  second - Another array of values
  
  weights - Array of weight values.
  
  Returns:
  
  The mutual information I(first;second)
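
For orientation, one common form of weighted mutual information is the sum over observed pairs of w(x,y) p(x,y) log(p(x,y) / (p(x) p(y))), with probabilities estimated from counts. The sketch below implements that form in bits, taking w(x,y) as the mean weight of the samples in each pair state. Both the formula and the weight averaging are assumptions made for illustration and have not been checked against this method's implementation; `WeightedMiSketch` and `weightedMi` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedMiSketch {
    /** Weighted mutual information in bits, using count-based probability estimates. */
    public static <A, B> double weightedMi(List<A> first, List<B> second, List<Double> weights) {
        if (first.size() != second.size() || first.size() != weights.size()) {
            throw new IllegalArgumentException("All three lists must be the same length");
        }
        Map<A, Long> firstCounts = new HashMap<>();
        Map<B, Long> secondCounts = new HashMap<>();
        // Pair state -> {count, summed weight}
        Map<Map.Entry<A, B>, double[]> joint = new HashMap<>();
        for (int i = 0; i < first.size(); i++) {
            firstCounts.merge(first.get(i), 1L, Long::sum);
            secondCounts.merge(second.get(i), 1L, Long::sum);
            Map.Entry<A, B> key = new AbstractMap.SimpleImmutableEntry<>(first.get(i), second.get(i));
            double[] stats = joint.computeIfAbsent(key, k -> new double[2]);
            stats[0] += 1.0;
            stats[1] += weights.get(i);
        }
        double n = first.size();
        double mi = 0.0;
        for (Map.Entry<Map.Entry<A, B>, double[]> e : joint.entrySet()) {
            double jointCount = e.getValue()[0];
            double meanWeight = e.getValue()[1] / jointCount;
            double firstCount = firstCounts.get(e.getKey().getKey());
            double secondCount = secondCounts.get(e.getKey().getValue());
            // (n * jointCount) / (firstCount * secondCount) equals p(x,y) / (p(x) p(y)).
            mi += meanWeight * (jointCount / n) * Math.log((n * jointCount) / (firstCount * secondCount));
        }
        return mi / Math.log(2.0);
    }

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(0, 0, 1, 1);
        List<Double> unit = Arrays.asList(1.0, 1.0, 1.0, 1.0);
        // Identical variables with unit weights: about one bit.
        System.out.println(weightedMi(x, x, unit));
        // Independent variables: about zero.
        System.out.println(weightedMi(x, Arrays.asList(0, 1, 0, 1), unit));
    }
}
```

With every weight equal to 1 this reduces to the ordinary histogram estimate of mutual information.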
- mi
  
  public static <T1,T2> double mi(WeightedPairDistribution<T1,T2> jointDist)
- mi
  
  public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
- jointEntropy
  
  public static <T1,T2> double jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
  
  Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators.
  Arrays must be the same length.
  
  Type Parameters:
  
  T1 - Type of the first array.
  
  T2 - Type of the second array.
  
  Parameters:
  
  first - An array of values.
  
  second - Another array of values.
  
  weights - Array of weight values.
  
  Returns:
  
  The entropy H(first,second)
- weightedConditionalEntropy
  
  public static <T1,T2> double weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights)
  
  Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators.
  Arrays must be the same length.
  
  Type Parameters:
  
  T1 - Type of the first array.
  
  T2 - Type of the second array.
  
  Parameters:
  
  vector - The main array of values.
  
  condition - The array to condition on.
  
  weights - Array of weight values.
  
  Returns:
  
  The weighted conditional entropy H_w(vector|condition).
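
As a rough illustration, a weighted conditional entropy can be written as the negated sum over observed pairs of w(x,y) p(x,y) log p(x|y). The sketch below computes that quantity in bits from counts, using the mean sample weight of each pair state as w(x,y). The formula and the averaging are assumptions for illustration, not a description of this method's internals; `WeightedConditionalEntropySketch` is a hypothetical name.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedConditionalEntropySketch {
    /** Weighted conditional entropy H_w(vector|condition) in bits, from count-based estimates. */
    public static <A, B> double weightedConditionalEntropy(List<A> vector, List<B> condition, List<Double> weights) {
        if (vector.size() != condition.size() || vector.size() != weights.size()) {
            throw new IllegalArgumentException("All three lists must be the same length");
        }
        Map<B, Long> conditionCounts = new HashMap<>();
        // Pair state -> {count, summed weight}
        Map<Map.Entry<A, B>, double[]> joint = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            conditionCounts.merge(condition.get(i), 1L, Long::sum);
            Map.Entry<A, B> key = new AbstractMap.SimpleImmutableEntry<>(vector.get(i), condition.get(i));
            double[] stats = joint.computeIfAbsent(key, k -> new double[2]);
            stats[0] += 1.0;
            stats[1] += weights.get(i);
        }
        double n = vector.size();
        double entropy = 0.0;
        for (Map.Entry<Map.Entry<A, B>, double[]> e : joint.entrySet()) {
            double jointCount = e.getValue()[0];
            double meanWeight = e.getValue()[1] / jointCount;
            double conditionCount = conditionCounts.get(e.getKey().getValue());
            // jointCount / conditionCount is the estimate of p(x|y).
            entropy -= meanWeight * (jointCount / n) * Math.log(jointCount / conditionCount);
        }
        return entropy / Math.log(2.0);
    }

    public static void main(String[] args) {
        List<Integer> vector = Arrays.asList(0, 1, 0, 1);
        List<Integer> condition = Arrays.asList(0, 0, 1, 1);
        // The condition says nothing about the vector, so about one bit remains.
        System.out.println(weightedConditionalEntropy(vector, condition, Arrays.asList(1.0, 1.0, 1.0, 1.0)));
        // Doubling the weights of the first two samples raises the result to about 1.5.
        System.out.println(weightedConditionalEntropy(vector, condition, Arrays.asList(2.0, 2.0, 1.0, 1.0)));
    }
}
```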
- weightedEntropy
  
  public static <T> double weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights)
  
  Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.
  
  Type Parameters:
  
  T - Type of the array.
  
  Parameters:
  
  vector - The array of values.
  
  weights - Array of weight values.
  
  Returns:
  
  The weighted entropy H_w(vector).
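
The Guiasu weighted entropy is usually written as H_w(X) = -sum over x of w(x) p(x) log p(x). The sketch below is a standalone illustration of that formula in bits, with p(x) estimated from counts and w(x) taken as the mean weight of the samples in state x. The averaging step is an assumption suggested by the description of normaliseWeights, and the code is not the library's implementation; `WeightedEntropySketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedEntropySketch {
    /** Weighted entropy H_w(vector) in bits, from count-based probability estimates. */
    public static <T> double weightedEntropy(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        // State -> {count, summed weight}
        Map<T, double[]> stats = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            double[] s = stats.computeIfAbsent(vector.get(i), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
        }
        double n = vector.size();
        double entropy = 0.0;
        for (double[] s : stats.values()) {
            double prob = s[0] / n;
            double meanWeight = s[1] / s[0];
            entropy -= meanWeight * prob * Math.log(prob);
        }
        return entropy / Math.log(2.0);
    }

    public static void main(String[] args) {
        List<String> vector = Arrays.asList("a", "b", "a", "b");
        // Unit weights give the ordinary Shannon entropy, about one bit here.
        System.out.println(weightedEntropy(vector, Arrays.asList(1.0, 1.0, 1.0, 1.0)));
        // Weighting state "a" twice as heavily gives about 1.5.
        System.out.println(weightedEntropy(vector, Arrays.asList(2.0, 1.0, 2.0, 1.0)));
    }
}
```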
- calculateWeightedCountDist
  
  public static <T> Map<T, WeightCountTuple> calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights)
  
  Generate the counts for a single vector.
  
  Type Parameters:
  
  T - The type inside the vector.
  
  Parameters:
  
  vector - An array of values.
  
  weights - The array of weight values.
  
  Returns:
  
  A HashMap from states of T to Pairs of count and total weight for that state.
- normaliseWeights
  
  public static <T> void normaliseWeights(Map<T, WeightCountTuple> map)
  
  Normalizes the weights in the map, i.e., divides each weight by its count.
  
  Type Parameters:
  
  T - The type of the variable that was counted.
  
  Parameters:
  
  map - The map to normalize.
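
The two helper methods above can be pictured as a count-and-accumulate pass followed by a division. The sketch below shows that idea with a hypothetical `CountAndWeight` holder standing in for WeightCountTuple. The field names, and the choice to keep counting and normalization as separate calls, are assumptions for illustration and may not match the library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedCountSketch {
    /** Hypothetical stand-in for a tuple of count and weight. */
    public static final class CountAndWeight {
        public long count;
        public double weight;
    }

    /** Accumulates, per state, the number of samples and the sum of their weights. */
    public static <T> Map<T, CountAndWeight> countDist(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        Map<T, CountAndWeight> dist = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            CountAndWeight tuple = dist.computeIfAbsent(vector.get(i), k -> new CountAndWeight());
            tuple.count += 1;
            tuple.weight += weights.get(i);
        }
        return dist;
    }

    /** Divides each summed weight by its count, leaving the mean weight per state. */
    public static <T> void normalise(Map<T, CountAndWeight> map) {
        for (CountAndWeight tuple : map.values()) {
            tuple.weight /= tuple.count;
        }
    }

    public static void main(String[] args) {
        Map<String, CountAndWeight> dist =
                countDist(Arrays.asList("x", "x", "y"), Arrays.asList(1.0, 3.0, 5.0));
        normalise(dist);
        // State "x" was seen twice with weights 1 and 3, so its mean weight is 2.
        System.out.println(dist.get("x").count + " " + dist.get("x").weight);
        System.out.println(dist.get("y").count + " " + dist.get("y").weight);
    }
}
```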
