Class WeightedInformationTheory

java.lang.Object
org.tribuo.util.infotheory.WeightedInformationTheory

public final class WeightedInformationTheory extends Object
A class of (discrete) weighted information theoretic functions. Gives warnings if there are insufficient samples to estimate the quantities accurately.

Defaults to log_2, so returns values in bits.

All functions expect that the element types have well-defined equals and hashCode implementations, and that equals is consistent with hashCode. The behaviour is undefined otherwise.

  • Field Details

  • Method Details

    • jointMI

      public static <T1,T2,T3> double jointMI(List<T1> first, List<T2> second, List<T3> target, List<Double> weights)
      Calculates the discrete weighted joint mutual information, using histogram probability estimators. Arrays must be the same length.
      Type Parameters:
      T1 - Type contained in the first array.
      T2 - Type contained in the second array.
      T3 - Type contained in the target array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      target - Target array of values.
      weights - Array of weight values.
      Returns:
      The joint mutual information I(first,second;target)
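As an illustration of the quantity described here, the following is a minimal self-contained sketch, not the Tribuo implementation. It assumes the weighted joint mutual information takes the form sum over (x,y,t) of w(x,y,t) p(x,y,t) log2( p(x,y,t) / (p(x,y) p(t)) ), where probabilities are histogram (count-based) estimates and w(x,y,t) is the mean weight of the samples in that joint state. The class name `WeightedJointMiSketch` and the method `jointMi` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of Tribuo.
final class WeightedJointMiSketch {
    /** Assumed form: sum w(x,y,t) p(x,y,t) log2( p(x,y,t) / (p(x,y) p(t)) ). */
    static <A, B, C> double jointMi(List<A> first, List<B> second,
                                    List<C> target, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || target.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("all lists must be the same length");
        }
        // (first, second, target) -> [count, summed weight]
        Map<List<Object>, double[]> triple = new HashMap<>();
        // (first, second) -> count
        Map<List<Object>, Integer> pair = new HashMap<>();
        // target -> count
        Map<C, Integer> targetCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            A a = first.get(i);
            B b = second.get(i);
            C c = target.get(i);
            double[] s = triple.computeIfAbsent(Arrays.<Object>asList(a, b, c), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            pair.merge(Arrays.<Object>asList(a, b), 1, Integer::sum);
            targetCounts.merge(c, 1, Integer::sum);
        }
        double total = 0.0;
        for (Map.Entry<List<Object>, double[]> e : triple.entrySet()) {
            List<Object> key = e.getKey();
            double[] s = e.getValue();
            double pTriple = s[0] / n;
            double pPair = pair.get(key.subList(0, 2)) / (double) n;
            double pTarget = targetCounts.get(key.get(2)) / (double) n;
            double meanWeight = s[1] / s[0];
            total += meanWeight * pTriple * Math.log(pTriple / (pPair * pTarget));
        }
        return total / Math.log(2.0); // convert nats to bits
    }
}
```

Under these assumptions, with first = [0, 0, 1, 1], second = [0, 1, 0, 1], target equal to their XOR and unit weights, the sketch yields 1 bit: the pair (first, second) fully determines the target. The composite map keys rely on `List` having consistent equals and hashCode, which mirrors the requirement the class places on element types.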
    • jointMI

      public static <T1,T2,T3> double jointMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
    • jointMI

      public static <T1,T2,T3> double jointMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
    • conditionalMI

      public static <T1,T2,T3> double conditionalMI(List<T1> first, List<T2> second, List<T3> condition, List<Double> weights)
      Calculates the discrete weighted conditional mutual information, using histogram probability estimators. Arrays must be the same length.
      Type Parameters:
      T1 - Type contained in the first array.
      T2 - Type contained in the second array.
      T3 - Type contained in the condition array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      condition - Array to condition upon.
      weights - Array of weight values.
      Returns:
      The conditional mutual information I(first;second|condition)
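To make the conditional quantity concrete, here is a minimal self-contained sketch under stated assumptions; it is not the Tribuo implementation. It assumes the weighted conditional mutual information is sum over (x,y,z) of w(x,y,z) p(x,y,z) log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), with count-based probability estimates and w(x,y,z) taken as the mean weight of the samples in that joint state. The names `WeightedConditionalMiSketch` and `conditionalMi` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of Tribuo.
final class WeightedConditionalMiSketch {
    /** Assumed form: sum w p(x,y,z) log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ). */
    static <A, B, C> double conditionalMi(List<A> first, List<B> second,
                                          List<C> condition, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || condition.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("all lists must be the same length");
        }
        // (first, second, condition) -> [count, summed weight]
        Map<List<Object>, double[]> triple = new HashMap<>();
        Map<List<Object>, Integer> firstCond = new HashMap<>();  // (first, condition) -> count
        Map<List<Object>, Integer> secondCond = new HashMap<>(); // (second, condition) -> count
        Map<C, Integer> cond = new HashMap<>();                  // condition -> count
        for (int i = 0; i < n; i++) {
            A a = first.get(i);
            B b = second.get(i);
            C c = condition.get(i);
            double[] s = triple.computeIfAbsent(Arrays.<Object>asList(a, b, c), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            firstCond.merge(Arrays.<Object>asList(a, c), 1, Integer::sum);
            secondCond.merge(Arrays.<Object>asList(b, c), 1, Integer::sum);
            cond.merge(c, 1, Integer::sum);
        }
        double total = 0.0;
        for (Map.Entry<List<Object>, double[]> e : triple.entrySet()) {
            List<Object> key = e.getKey();
            double[] s = e.getValue();
            double pTriple = s[0] / n;
            double pFirstCond = firstCond.get(Arrays.asList(key.get(0), key.get(2))) / (double) n;
            double pSecondCond = secondCond.get(Arrays.asList(key.get(1), key.get(2))) / (double) n;
            double pCond = cond.get(key.get(2)) / (double) n;
            double meanWeight = s[1] / s[0];
            total += meanWeight * pTriple
                    * Math.log((pCond * pTriple) / (pFirstCond * pSecondCond));
        }
        return total / Math.log(2.0); // convert nats to bits
    }
}
```

With unit weights, conditioning on a variable identical to `first` gives 0 bits in this sketch, since nothing about `first` remains to be learned once the condition is known.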
    • conditionalMI

      public static <T1,T2,T3> double conditionalMI(WeightedTripleDistribution<T1,T2,T3> tripleRV)
    • conditionalMI

      public static <T1,T2,T3> double conditionalMI(TripleDistribution<T1,T2,T3> rv, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
    • mi

      public static <T1,T2> double mi(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
      Calculates the discrete weighted mutual information, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array.
      T2 - Type of the second array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      weights - Array of weight values.
      Returns:
      The mutual information I(first;second)
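The following is a minimal sketch of a weighted mutual information estimate, offered as an illustration rather than as the library's code. It assumes the form sum over (x,y) of w(x,y) p(x,y) log2( p(x,y) / (p(x) p(y)) ), where the probabilities are histogram estimates and w(x,y) is the mean weight of the samples in that joint state. `WeightedMiSketch` and `weightedMi` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of Tribuo.
final class WeightedMiSketch {
    /** Assumed form: sum w(x,y) p(x,y) log2( p(x,y) / (p(x) p(y)) ). */
    static <A, B> double weightedMi(List<A> first, List<B> second, List<Double> weights) {
        int n = first.size();
        if (second.size() != n || weights.size() != n) {
            throw new IllegalArgumentException("all lists must be the same length");
        }
        // (first, second) -> [count, summed weight]
        Map<List<Object>, double[]> joint = new HashMap<>();
        Map<A, Integer> firstCounts = new HashMap<>();
        Map<B, Integer> secondCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            A a = first.get(i);
            B b = second.get(i);
            double[] s = joint.computeIfAbsent(Arrays.<Object>asList(a, b), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
            firstCounts.merge(a, 1, Integer::sum);
            secondCounts.merge(b, 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<List<Object>, double[]> e : joint.entrySet()) {
            List<Object> key = e.getKey();
            double[] s = e.getValue();
            double pJoint = s[0] / n;
            double pFirst = firstCounts.get(key.get(0)) / (double) n;
            double pSecond = secondCounts.get(key.get(1)) / (double) n;
            double meanWeight = s[1] / s[0];
            mi += meanWeight * pJoint * Math.log(pJoint / (pFirst * pSecond));
        }
        return mi / Math.log(2.0); // convert nats to bits
    }
}
```

In this sketch, two identical balanced binary lists with unit weights give 1 bit, and two independent balanced binary lists give 0 bits. Raising the weights on one joint state scales that state's contribution proportionally.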
    • mi

      public static <T1,T2> double mi(WeightedPairDistribution<T1,T2> jointDist)
    • mi

      public static <T1,T2> double mi(PairDistribution<T1,T2> pairDist, Map<?,Double> weights, WeightedInformationTheory.VariableSelector vs)
    • jointEntropy

      public static <T1,T2> double jointEntropy(ArrayList<T1> first, ArrayList<T2> second, ArrayList<Double> weights)
      Calculates the Shannon/Guiasu weighted joint entropy of two arrays, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array.
      T2 - Type of the second array.
      Parameters:
      first - An array of values.
      second - Another array of values.
      weights - Array of weight values.
      Returns:
      The entropy H(first,second)
    • weightedConditionalEntropy

      public static <T1,T2> double weightedConditionalEntropy(ArrayList<T1> vector, ArrayList<T2> condition, ArrayList<Double> weights)
      Calculates the discrete Shannon/Guiasu Weighted Conditional Entropy of two arrays, using histogram probability estimators.

      Arrays must be the same length.

      Type Parameters:
      T1 - Type of the first array.
      T2 - Type of the second array.
      Parameters:
      vector - The main array of values.
      condition - The array to condition on.
      weights - Array of weight values.
      Returns:
      The weighted conditional entropy H_w(vector|condition).
    • weightedEntropy

      public static <T> double weightedEntropy(ArrayList<T> vector, ArrayList<Double> weights)
      Calculates the discrete Shannon/Guiasu Weighted Entropy, using histogram probability estimators.
      Type Parameters:
      T - Type of the array.
      Parameters:
      vector - The array of values.
      weights - Array of weight values.
      Returns:
      The weighted entropy H_w(vector).
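A minimal sketch of a Guiasu-style weighted entropy may help clarify what is being estimated. It assumes the definition H_w(X) = -sum over x of w(x) p(x) log2 p(x), with p(x) a histogram estimate and w(x) the mean weight of the samples in state x. This is an illustration under those assumptions, not the Tribuo source; `WeightedEntropySketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of Tribuo.
final class WeightedEntropySketch {
    /** Assumed form: H_w(X) = -sum_x w(x) p(x) log2 p(x). */
    static <T> double weightedEntropy(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        // state -> [count, summed weight]
        Map<T, double[]> stats = new HashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            double[] s = stats.computeIfAbsent(vector.get(i), k -> new double[2]);
            s[0] += 1.0;
            s[1] += weights.get(i);
        }
        double n = vector.size();
        double entropy = 0.0;
        for (double[] s : stats.values()) {
            double p = s[0] / n;             // histogram probability estimate
            double meanWeight = s[1] / s[0]; // mean weight of this state
            entropy -= meanWeight * p * Math.log(p);
        }
        return entropy / Math.log(2.0); // convert nats to bits
    }
}
```

For the vector [a, b, a, b] with weights [2, 1, 2, 1], this sketch gives 1.5 bits: each state has probability 0.5 and contributes its mean weight times 0.5. With unit weights it reduces to the ordinary Shannon entropy of 1 bit.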
    • calculateWeightedCountDist

      public static <T> Map<T, WeightCountTuple> calculateWeightedCountDist(ArrayList<T> vector, ArrayList<Double> weights)
      Generate the counts for a single vector.
      Type Parameters:
      T - The type inside the vector.
      Parameters:
      vector - An array of values.
      weights - The array of weight values.
      Returns:
      A map from each state of T to a WeightCountTuple holding the count and total weight for that state.
    • normaliseWeights

      public static <T> void normaliseWeights(Map<T, WeightCountTuple> map)
      Normalizes the weights in the map, i.e., divides each weight by its count.
      Type Parameters:
      T - The type of the variable that was counted.
      Parameters:
      map - The map to normalize.
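The counting and normalisation steps described for calculateWeightedCountDist and normaliseWeights can be sketched as follows. This is a self-contained illustration, not the library code: the `Tally` class is a hypothetical stand-in for WeightCountTuple, and `countDist` and `normalise` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of Tribuo.
final class WeightedCountSketch {
    /** Hypothetical stand-in for WeightCountTuple: a mutable weight and count. */
    static final class Tally {
        double weight;
        long count;
    }

    /** Accumulates, for each distinct state, its occurrence count and summed weight. */
    static <T> Map<T, Tally> countDist(List<T> vector, List<Double> weights) {
        if (vector.size() != weights.size()) {
            throw new IllegalArgumentException("vector and weights must be the same length");
        }
        Map<T, Tally> dist = new LinkedHashMap<>();
        for (int i = 0; i < vector.size(); i++) {
            Tally t = dist.computeIfAbsent(vector.get(i), k -> new Tally());
            t.count++;
            t.weight += weights.get(i);
        }
        return dist;
    }

    /** Replaces each summed weight with the mean weight of its state. */
    static <T> void normalise(Map<T, Tally> dist) {
        for (Tally t : dist.values()) {
            t.weight /= t.count;
        }
    }
}
```

For the vector [x, y, x] with weights [1.0, 4.0, 3.0], the sketch first records x with count 2 and total weight 4.0, and y with count 1 and total weight 4.0. After normalising, x carries weight 2.0 and y carries weight 4.0.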