Class WeightedInformationTheory

java.lang.Object
org.tribuo.util.infotheory.WeightedInformationTheory

public final class WeightedInformationTheory extends Object
A class of (discrete) weighted information theoretic functions. Gives warnings if there are insufficient samples to estimate the quantities accurately.

Defaults to log_2, so returns values in bits.

All functions expect the element types to have well-defined equals and hashCode implementations, with equals consistent with hashCode. The behaviour is undefined if this is not true.

  • Field Details

    • SAMPLES_RATIO

      public static final double SAMPLES_RATIO
      The ratio of samples to symbols before emitting a warning.
    • DEFAULT_MAP_SIZE

      public static final int DEFAULT_MAP_SIZE
      The initial size of the various maps.
    • LOG_2

      public static final double LOG_2
      Log base 2.
    • LOG_E

      public static final double LOG_E
      Log base e.
    • LOG_BASE

      public static double LOG_BASE
      Sets the base of the logarithm used in the information theoretic calculations. For LOG_2 the unit is "bit", for LOG_E the unit is "nat".
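
How the logarithm base changes the unit can be illustrated without the library. The following is a minimal sketch, assuming the base is applied by dividing a natural-log result by the natural log of the base. The class `LogBaseSketch` and its `entropy` method are hypothetical names for illustration, and the numeric convention of the LOG_BASE field itself is not stated on this page.

```java
final class LogBaseSketch {
    /**
     * Shannon entropy of a probability vector.
     * logBase is the base itself (2.0 gives bits, Math.E gives nats);
     * this parameter convention is an assumption of the sketch.
     */
    static double entropy(double[] probs, double logBase) {
        double sum = 0.0;
        for (double p : probs) {
            if (p > 0.0) {
                // Zero-probability states contribute nothing to the sum.
                sum -= p * Math.log(p);
            }
        }
        // Change of base: log_b(x) = ln(x) / ln(b).
        return sum / Math.log(logBase);
    }

    public static void main(String[] args) {
        double[] fairCoin = {0.5, 0.5};
        System.out.println("bits: " + entropy(fairCoin, 2.0));
        System.out.println("nats: " + entropy(fairCoin, Math.E));
    }
}
```

A fair coin carries one bit, which is ln(2) nats, so the two results differ only by that constant factor.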
  • Method Details

    • jointMI

      public static <T1, T2, T3> double jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights)
      Calculates the discrete weighted joint mutual information, using histogram probability estimators. Arrays must be the same length.
      Type Parameters:
      T1 - Type contained in the first array.
      T2 - Type contained in the second array.
      T3 - Type contained in the target array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      target - Target array of values.
      weights - Array of weight values.
      Returns:
      The weighted joint mutual information I_w(first,second;target)
    • jointMI

      public static <T1, T2, T3> double jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
      Calculates the discrete weighted joint mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - The first element type.
      T2 - The second element type.
      T3 - The third element type.
      Parameters:
      tripleRV - The weighted triple distribution.
      Returns:
      The weighted joint mutual information I_w(first,second;third)
    • jointMI

      public static <T1, T2, T3> double jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
      Calculates the discrete weighted joint mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - The first element type.
      T2 - The second element type.
      T3 - The third element type.
      Parameters:
      rv - The triple distribution.
      weights - The weights for one of the variables.
      vs - The weighted variable id.
      Returns:
      The weighted joint mutual information I_w(first,second;third)
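
The quantity computed by the jointMI overloads can be sketched in plain Java. This is a minimal sketch, not Tribuo's implementation. It assumes the weighted joint mutual information is the sum over joint states of w(x,y,t) p(x,y,t) log2[p(x,y,t) / (p(x,y) p(t))], with histogram probabilities and w(x,y,t) taken as the mean weight of the samples in that joint state. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedJointMiSketch {
    static <A, B, C> double weightedJointMi(List<A> first, List<B> second,
                                            List<C> target, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || target.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // (x,y,t) -> {count, summed weight}
        Map<List<Object>, double[]> tripleStats = new HashMap<>();
        Map<List<Object>, Integer> pairCounts = new HashMap<>();
        Map<C, Integer> targetCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            A x = first.get(i);
            B y = second.get(i);
            C t = target.get(i);
            double[] s = tripleStats.computeIfAbsent(
                    Arrays.<Object>asList(x, y, t), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            pairCounts.merge(Arrays.<Object>asList(x, y), 1, Integer::sum);
            targetCounts.merge(t, 1, Integer::sum);
        }
        double total = 0.0;
        for (Map.Entry<List<Object>, double[]> e : tripleStats.entrySet()) {
            List<Object> key = e.getKey();
            double[] s = e.getValue();
            double pxyt = s[0] / n;
            double pxy = pairCounts.get(Arrays.asList(key.get(0), key.get(1))) / (double) n;
            double pt = targetCounts.get(key.get(2)) / (double) n;
            double w = s[1] / s[0]; // mean weight of this joint state (assumption)
            total += w * pxyt * Math.log(pxyt / (pxy * pt));
        }
        return total / Math.log(2.0); // convert nats to bits
    }
}
```

With unit weights and a target that is the XOR of two independent binary variables, the sketch returns about one bit: neither variable alone predicts the target, but the pair determines it.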
    • conditionalMI

      public static <T1, T2, T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights)
      Calculates the discrete weighted conditional mutual information, using histogram probability estimators. Arrays must be the same length.
      Type Parameters:
      T1 - Type contained in the first array.
      T2 - Type contained in the second array.
      T3 - Type contained in the condition array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      condition - Array to condition upon.
      weights - Array of weight values.
      Returns:
      The weighted conditional mutual information I_w(first;second|condition)
    • conditionalMI

      public static <T1, T2, T3> double conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
      Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - The first element type.
      T2 - The second element type.
      T3 - The condition element type.
      Parameters:
      tripleRV - The weighted triple distribution.
      Returns:
      The weighted conditional mutual information I_w(first;second|condition)
    • conditionalMI

      public static <T1, T2, T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
      Calculates the discrete weighted conditional mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - The first element type.
      T2 - The second element type.
      T3 - The condition element type.
      Parameters:
      rv - The triple distribution.
      weights - The element weights.
      vs - The variable to apply the weights to.
      Returns:
      The weighted conditional mutual information I_w(first;second|condition)
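
The conditional variant can be sketched the same way. This is a minimal sketch under stated assumptions, not Tribuo's code: it takes the weighted conditional mutual information to be the sum over joint states of w(x,y,z) p(x,y,z) log2[p(z) p(x,y,z) / (p(x,z) p(y,z))], using histogram probabilities and the mean per-sample weight of each joint state. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedCmiSketch {
    static <A, B, C> double weightedCmi(List<A> first, List<B> second,
                                        List<C> condition, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || condition.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // (x,y,z) -> {count, summed weight}
        Map<List<Object>, double[]> tripleStats = new HashMap<>();
        Map<List<Object>, Integer> xzCounts = new HashMap<>();
        Map<List<Object>, Integer> yzCounts = new HashMap<>();
        Map<C, Integer> zCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            A x = first.get(i);
            B y = second.get(i);
            C z = condition.get(i);
            double[] s = tripleStats.computeIfAbsent(
                    Arrays.<Object>asList(x, y, z), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            xzCounts.merge(Arrays.<Object>asList(x, z), 1, Integer::sum);
            yzCounts.merge(Arrays.<Object>asList(y, z), 1, Integer::sum);
            zCounts.merge(z, 1, Integer::sum);
        }
        double total = 0.0;
        for (Map.Entry<List<Object>, double[]> e : tripleStats.entrySet()) {
            List<Object> key = e.getKey();
            double[] s = e.getValue();
            double pxyz = s[0] / n;
            double pxz = xzCounts.get(Arrays.asList(key.get(0), key.get(2))) / (double) n;
            double pyz = yzCounts.get(Arrays.asList(key.get(1), key.get(2))) / (double) n;
            double pz = zCounts.get(key.get(2)) / (double) n;
            double w = s[1] / s[0]; // mean weight of this joint state (assumption)
            total += w * pxyz * Math.log((pz * pxyz) / (pxz * pyz));
        }
        return total / Math.log(2.0); // convert nats to bits
    }
}
```

Conditioning on a constant leaves the ordinary weighted mutual information, while conditioning on a copy of the first variable removes all shared information and gives zero.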
    • mi

      public static <T1, T2> double mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
      Calculates the discrete weighted mutual information, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array
      T2 - Type of the second array
      Parameters:
      first - An array of values
      second - Another array of values
      weights - Array of weight values.
      Returns:
      The weighted mutual information I_w(first;second)
    • mi

      public static <T1, T2> double mi(WeightedPairDistribution<T1,T2> jointDist)
      Calculates the discrete weighted mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - Type of the first element.
      T2 - Type of the second element.
      Parameters:
      jointDist - The weighted joint distribution.
      Returns:
      The weighted mutual information I_w(first;second)
    • mi

      public static <T1, T2> double mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
      Calculates the discrete weighted mutual information, using histogram probability estimators.
      Type Parameters:
      T1 - Type of the first element.
      T2 - Type of the second element.
      Parameters:
      pairDist - The joint distribution.
      weights - The element weights.
      vs - The variable to apply the weights to.
      Returns:
      The weighted mutual information I_w(first;second)
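
For the list-based mi overload, the calculation can be sketched as follows. This is a minimal sketch, not Tribuo's implementation. It assumes the weighted mutual information is the sum over joint states of w(x,y) p(x,y) log2[p(x,y) / (p(x) p(y))], with histogram probabilities and w(x,y) taken as the mean weight of the samples in that joint state. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedMiSketch {
    static <A, B> double weightedMi(List<A> first, List<B> second, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // (x,y) -> {count, summed weight}
        Map<List<Object>, double[]> jointStats = new HashMap<>();
        Map<A, Integer> firstCounts = new HashMap<>();
        Map<B, Integer> secondCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            A x = first.get(i);
            B y = second.get(i);
            double[] s = jointStats.computeIfAbsent(
                    Arrays.<Object>asList(x, y), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            firstCounts.merge(x, 1, Integer::sum);
            secondCounts.merge(y, 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<List<Object>, double[]> e : jointStats.entrySet()) {
            double[] s = e.getValue();
            double pxy = s[0] / n;
            double px = firstCounts.get(e.getKey().get(0)) / (double) n;
            double py = secondCounts.get(e.getKey().get(1)) / (double) n;
            double w = s[1] / s[0]; // mean weight of this joint state (assumption)
            mi += w * pxy * Math.log(pxy / (px * py));
        }
        return mi / Math.log(2.0); // convert nats to bits
    }
}
```

With unit weights this reduces to the usual histogram estimate of mutual information: identical binary lists give about one bit and independent ones give zero.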
    • jointEntropy

      public static <T1, T2> double jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
      Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array.
      T2 - Type of the second array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      weights - Array of weight values.
      Returns:
      The weighted joint entropy H_w(first,second)
    • weightedConditionalEntropy

      public static <T1, T2> double weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights)
      Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array.
      T2 - Type of the second array.
      Parameters:
      vector - The main array of values.
      condition - The array to condition on.
      weights - Array of weight values.
      Returns:
      The weighted conditional entropy H_w(vector|condition).
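
A plain-Java sketch of the conditional entropy follows. It is a minimal sketch under stated assumptions, not Tribuo's code: it takes H_w(vector|condition) to be the negated sum over joint states of w(x,y) p(x,y) log2[p(x,y) / p(y)], with histogram probabilities and the mean per-sample weight of each joint state. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedConditionalEntropySketch {
    static <A, B> double weightedConditionalEntropy(List<A> vector, List<B> condition,
                                                    List<Double> weights) {
        int n = vector.size();
        if (condition.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("All lists must be the same length");
        }
        // (x,y) -> {count, summed weight}
        Map<List<Object>, double[]> jointStats = new HashMap<>();
        Map<B, Integer> conditionCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            B y = condition.get(i);
            double[] s = jointStats.computeIfAbsent(
                    Arrays.<Object>asList(vector.get(i), y), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            conditionCounts.merge(y, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (Map.Entry<List<Object>, double[]> e : jointStats.entrySet()) {
            double[] s = e.getValue();
            double pxy = s[0] / n;
            double py = conditionCounts.get(e.getKey().get(1)) / (double) n;
            double w = s[1] / s[0]; // mean weight of this joint state (assumption)
            entropy -= w * pxy * Math.log(pxy / py);
        }
        return entropy / Math.log(2.0); // convert nats to bits
    }
}
```

If the condition determines the vector the result is zero, and if the two are independent the result equals the weighted entropy of the vector.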
    • weightedEntropy

      public static <T> double weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights)
      Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.
      Type Parameters:
      T - Type of the array.
      Parameters:
      vector - The array of values.
      weights - Array of weight values.
      Returns:
      The weighted entropy H_w(vector).
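
The Guiasu weighted entropy can be sketched in a few lines. This is a minimal sketch, not Tribuo's implementation: it assumes H_w(X) is the negated sum over states of w(x) p(x) log2 p(x), where p(x) is the histogram probability and w(x) is the mean weight of the samples in state x. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedEntropySketch {
    static <T> double weightedEntropy(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        // state -> {count, summed weight}
        Map<T, double[]> stats = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            double[] s = stats.computeIfAbsent(vector.get(i), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
        }
        double n = vector.size();
        double entropy = 0.0;
        for (double[] s : stats.values()) {
            double p = s[0] / n;    // histogram probability of the state
            double w = s[1] / s[0]; // mean weight of the state (assumption)
            entropy -= w * p * Math.log(p);
        }
        return entropy / Math.log(2.0); // convert nats to bits
    }

    public static void main(String[] args) {
        List<String> symbols = Arrays.asList("a", "a", "b", "b");
        // Unit weights recover the Shannon entropy, about 1 bit here.
        System.out.println(weightedEntropy(symbols, Arrays.asList(1.0, 1.0, 1.0, 1.0)));
        // Doubling the weight of "a" raises its contribution, giving about 1.5.
        System.out.println(weightedEntropy(symbols, Arrays.asList(2.0, 2.0, 1.0, 1.0)));
    }
}
```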
    • calculateWeightedCountDist

      public static <T> Map<T,WeightCountTuple> calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights)
      Generate the counts for a single vector.
      Type Parameters:
      T - The type inside the vector.
      Parameters:
      vector - An array of values.
      weights - The array of weight values.
      Returns:
      A map from each state of T to a WeightCountTuple holding the count and total weight for that state.
    • normaliseWeights

      public static <T> void normaliseWeights(Map<T,WeightCountTuple> map)
      Normalizes the weights in the map, i.e., divides each weight by its count.
      Type Parameters:
      T - The type of the variable that was counted.
      Parameters:
      map - The map to normalize.
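
The counting and normalisation steps described for calculateWeightedCountDist and normaliseWeights can be sketched together. This is a minimal sketch, not Tribuo's code: the `Tally` class is a hypothetical stand-in for WeightCountTuple, whose actual fields are not shown on this page, and the method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedCountSketch {
    /** Hypothetical count-and-weight holder; not Tribuo's WeightCountTuple. */
    static final class Tally {
        long count;
        double weight;
    }

    /** Accumulates, for each distinct state, its occurrence count and summed weight. */
    static <T> Map<T, Tally> countWithWeights(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        Map<T, Tally> map = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            Tally t = map.computeIfAbsent(vector.get(i), k -> new Tally());
            t.count++;
            t.weight += weights.get(i);
        }
        return map;
    }

    /** Replaces each summed weight with the mean weight, i.e. weight divided by count. */
    static <T> void normalise(Map<T, Tally> map) {
        for (Tally t : map.values()) {
            t.weight /= t.count;
        }
    }

    public static void main(String[] args) {
        Map<String, Tally> map = countWithWeights(
                Arrays.asList("x", "x", "y"), Arrays.asList(1.0, 3.0, 5.0));
        normalise(map);
        for (Map.Entry<String, Tally> e : map.entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count
                    + " meanWeight=" + e.getValue().weight);
        }
    }
}
```

Normalising in place keeps the count available alongside the mean weight, which is what a histogram estimator needs to compute both p(x) and w(x) from one map.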