Class MultiLabel

java.lang.Object
org.tribuo.multilabel.MultiLabel
All Implemented Interfaces:
Serializable, Classifiable<MultiLabel>, Output<MultiLabel>

public class MultiLabel extends Object implements Classifiable<MultiLabel>
A class for multi-label classification.

Multi-label classification is where a (possibly empty) set of labels is predicted for each example. For example, a Reuters article might be predicted to have both the Finance and Sports labels.

Both the individual labels in the set and the MultiLabel itself may have optional scores (which are not required to be probabilities). A score that is not present is represented by Double.NaN. This is most common with ground-truth labels, which usually do not supply scores.
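Because a missing score is encoded as Double.NaN, callers need to test for it with Double.isNaN rather than ==. The following is a minimal, standard-library-only sketch of that check; the class and method names (ScoreCheckSketch, isUnscored) are illustrative and are not part of Tribuo.

```java
final class ScoreCheckSketch {
    /** Returns true when a score is the "not present" sentinel, Double.NaN. */
    static boolean isUnscored(double score) {
        // NaN never compares equal to anything, including itself,
        // so "score == Double.NaN" is always false. Use Double.isNaN instead.
        return Double.isNaN(score);
    }

    public static void main(String[] args) {
        double groundTruthScore = Double.NaN; // e.g. a ground-truth label without a score
        double predictedScore = 0.87;         // e.g. a score supplied by a model

        System.out.println(isUnscored(groundTruthScore));       // true
        System.out.println(isUnscored(predictedScore));         // false
        System.out.println(groundTruthScore == Double.NaN);     // false, hence the helper
    }
}
```

In real code the same check would be applied to the value returned by getScore().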

  • Field Details

    • NEGATIVE_LABEL_STRING

      public static final String NEGATIVE_LABEL_STRING
    • NEGATIVE_LABEL

      public static final Label NEGATIVE_LABEL
      A Label representing the binary negative label. Used in binary approaches to multi-label classification to represent the absence of a Label.
  • Constructor Details

    • MultiLabel

      public MultiLabel(Set<Label> labels)
      Builds a MultiLabel object from a Set of Labels.

      Sets the whole set score to Double.NaN.

      Parameters:
      labels - A set of (possibly scored) labels.
    • MultiLabel

      public MultiLabel(Set<Label> labels, double score)
      Builds a MultiLabel object from a Set of Labels, when the whole set has a score as well as (optionally) the individual labels.
      Parameters:
      labels - A set of (possibly scored) labels.
      score - An overall score for the set.
    • MultiLabel

      public MultiLabel(String label)
      Builds a MultiLabel with a single String label.

      The created Label is unscored and used by MultiLabelInfo.

      Sets the whole set score to Double.NaN.

      Parameters:
      label - The label.
    • MultiLabel

      public MultiLabel(Label label)
      Builds a MultiLabel from a single Label.

      Sets the whole set score to Double.NaN.

      Parameters:
      label - The label.
  • Method Details

    • createLabel

      public Label createLabel(Label otherLabel)
      Creates a binary label from this multilabel. The returned Label is the input parameter if this MultiLabel contains that Label, and NEGATIVE_LABEL otherwise.
      Parameters:
      otherLabel - The input label.
      Returns:
      A binarised form of this MultiLabel.
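The binarisation rule described for createLabel can be sketched with plain strings. This is an illustration of the rule only, not Tribuo's implementation: BinariseSketch and binarise are hypothetical names, and HYPOTHETICAL_NEGATIVE is a placeholder because the actual value of NEGATIVE_LABEL_STRING is not given on this page.

```java
import java.util.Set;

final class BinariseSketch {
    // Placeholder only: stands in for NEGATIVE_LABEL, whose real string value
    // is not stated in this documentation.
    static final String HYPOTHETICAL_NEGATIVE = "NEGATIVE";

    /**
     * Returns the target label if the label set contains it,
     * and the negative placeholder otherwise.
     */
    static String binarise(Set<String> labelNames, String target) {
        return labelNames.contains(target) ? target : HYPOTHETICAL_NEGATIVE;
    }

    public static void main(String[] args) {
        Set<String> article = Set.of("Finance", "Sports");
        System.out.println(binarise(article, "Finance"));  // Finance
        System.out.println(binarise(article, "Politics")); // NEGATIVE
    }
}
```

Applying this once per known label turns one multi-label example into a series of binary examples, which is the use the NEGATIVE_LABEL field description refers to.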
    • getLabelString

      public String getLabelString()
      Returns a comma-separated string representing the labels in this multilabel instance.
      Returns:
      A comma-separated string of labels.
    • getScore

      public double getScore()
      The overall score for this set of labels.
      Returns:
      The score for this MultiLabel.
    • getLabelSet

      public Set<Label> getLabelSet()
      The set of labels contained in this multilabel.
      Returns:
      The set of labels.
    • getNameSet

      public Set<String> getNameSet()
      The set of strings that represent the labels in this multilabel.
      Returns:
      The set of strings.
    • contains

      public boolean contains(String input)
      Does this MultiLabel contain this string?
      Parameters:
      input - A string representing a Label.
      Returns:
      True if the label string is in this MultiLabel.
    • contains

      public boolean contains(Label input)
      Does this MultiLabel contain this Label?
      Parameters:
      input - A Label.
      Returns:
      True if the label is in this MultiLabel.
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • fullEquals

      public boolean fullEquals(MultiLabel o)
      Description copied from interface: Output
      Compares other to this output. Uses all score values and the strings.
      Specified by:
      fullEquals in interface Output<MultiLabel>
      Parameters:
      o - Another output instance.
      Returns:
      True if the other instance has value equality to this instance. False otherwise.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • copy

      public MultiLabel copy()
      Description copied from interface: Output
      Deep copy of the output up to its immutable state.
      Specified by:
      copy in interface Output<MultiLabel>
      Returns:
      A copy of the output.
    • getSerializableForm

      public String getSerializableForm(boolean includeConfidence)
      For a MultiLabel with label set = {a, b, c}, outputs a string of the form:
       "a=true,b=true,c=true"
       
      If includeConfidence is set to true, outputs a string of the form:
       "a=true,b=true,c=true:0.5"
       
      where the last element after the colon is this label's score.
      Specified by:
      getSerializableForm in interface Output<MultiLabel>
      Parameters:
      includeConfidence - Include whatever confidence score the label contains, if known.
      Returns:
      A comma-separated, densified string representation of this MultiLabel.
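The two output shapes described for getSerializableForm can be reproduced with a short standard-library sketch. It is not the library's code: SerializableFormSketch and serialize are hypothetical names, the labels are passed as an ordered List so the output order is fixed, and the value after the colon is assumed to be the overall score of the set.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SerializableFormSketch {
    /**
     * Joins each label name as "name=true" with commas, and optionally
     * appends ":score" (assumed here to be the overall set score).
     */
    static String serialize(List<String> labelNames, double score, boolean includeConfidence) {
        String body = labelNames.stream()
                .map(name -> name + "=true")
                .collect(Collectors.joining(","));
        return includeConfidence ? body + ":" + score : body;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("a", "b", "c");
        System.out.println(serialize(labels, 0.5, false)); // a=true,b=true,c=true
        System.out.println(serialize(labels, 0.5, true));  // a=true,b=true,c=true:0.5
    }
}
```

How the real method renders an unset (Double.NaN) score is not covered by this page, so the sketch does not special-case it.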
    • convertToDenseVector

      public DenseVector convertToDenseVector(ImmutableOutputInfo<MultiLabel> info)
      Converts this MultiLabel into a DenseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise. Labels which are not present are given the score 0.0.
      Parameters:
      info - The info to use for the ids.
      Returns:
      A DenseVector representing this MultiLabel.
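The dense conversion rule (score if non-NaN, 1.0 if NaN, 0.0 if absent) can be sketched without Tribuo types. In this illustration a Map from label name to id stands in for the ImmutableOutputInfo, a double[] stands in for DenseVector, and ids are assumed to be contiguous from 0; DenseVectorSketch and toDense are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;

final class DenseVectorSketch {
    /**
     * @param labelScores label name to score (Double.NaN when unscored)
     * @param labelIds    label name to index; assumed to cover 0..size-1
     */
    static double[] toDense(Map<String, Double> labelScores, Map<String, Integer> labelIds) {
        double[] dense = new double[labelIds.size()]; // absent labels stay at 0.0
        for (Map.Entry<String, Double> e : labelScores.entrySet()) {
            Integer id = labelIds.get(e.getKey());
            if (id == null) {
                continue; // assumption of this sketch: labels unknown to the index are skipped
            }
            double score = e.getValue();
            dense[id] = Double.isNaN(score) ? 1.0 : score;
        }
        return dense;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = Map.of("a", 0, "b", 1, "c", 2);
        Map<String, Double> labels = Map.of("a", Double.NaN, "c", 0.25);
        System.out.println(Arrays.toString(toDense(labels, ids))); // [1.0, 0.0, 0.25]
    }
}
```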
    • convertToSparseVector

      public SparseVector convertToSparseVector(ImmutableOutputInfo<MultiLabel> info)
      Converts this MultiLabel into a SparseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise.
      Parameters:
      info - The info to use for the ids.
      Returns:
      A SparseVector representing this MultiLabel.
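The sparse conversion applies the same value rule but stores only the labels that are present. A rough sketch, assuming a Map from label name to id in place of the ImmutableOutputInfo and a TreeMap from index to value in place of SparseVector (SparseVectorSketch and toSparse are hypothetical names):

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseVectorSketch {
    /** Returns index-to-value pairs, sorted by index, for the present labels only. */
    static TreeMap<Integer, Double> toSparse(Map<String, Double> labelScores, Map<String, Integer> labelIds) {
        TreeMap<Integer, Double> sparse = new TreeMap<>();
        for (Map.Entry<String, Double> e : labelScores.entrySet()) {
            Integer id = labelIds.get(e.getKey());
            if (id == null) {
                continue; // assumption of this sketch: labels unknown to the index are skipped
            }
            double score = e.getValue();
            sparse.put(id, Double.isNaN(score) ? 1.0 : score);
        }
        return sparse;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = Map.of("a", 0, "b", 1, "c", 2);
        Map<String, Double> labels = Map.of("a", Double.NaN, "c", 0.25);
        System.out.println(toSparse(labels, ids)); // {0=1.0, 2=0.25}
    }
}
```

Unlike the dense form, no entry is created for the absent label b.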
    • parseString

      public static MultiLabel parseString(String s)
      Parses a string of the form: dimension-name=output,...,dimension-name=output where output must be readable by Boolean.parseBoolean(String).
      Parameters:
      s - The string form of a multi-label example.
      Returns:
      A MultiLabel parsed from the input string.
    • parseString

      public static MultiLabel parseString(String s, char splitChar)
      Parses a string of the form:
       dimension-name=output<splitChar>...<splitChar>dimension-name=output
       
      where output must be readable by Boolean.parseBoolean(java.lang.String).
      Parameters:
      s - The string form of a multilabel output.
      splitChar - The char to split on.
      Returns:
      A MultiLabel output parsed from the input string.
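The input format accepted by parseString can be illustrated with a small standard-library parser. This is a sketch under stated assumptions rather than the library's logic: it returns the set of label names instead of a MultiLabel, it assumes that dimensions whose value parses to false are left out of the result, and it assumes a bare name with no "=" counts as present. ParseStringSketch and parse are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class ParseStringSketch {
    /** Splits on splitChar and keeps the names whose value parses to true. */
    static Set<String> parse(String s, char splitChar) {
        Set<String> present = new LinkedHashSet<>();
        if (s.isEmpty()) {
            return present; // an empty string yields an empty label set in this sketch
        }
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        for (String element : s.split(Pattern.quote(String.valueOf(splitChar)))) {
            int eq = element.indexOf('=');
            String name = eq < 0 ? element : element.substring(0, eq);
            boolean value = eq < 0 || Boolean.parseBoolean(element.substring(eq + 1));
            if (value) {
                present.add(name);
            }
        }
        return present;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=true,b=false,c=true", ',')); // [a, c]
        System.out.println(parse("a=true|b=true", '|'));         // [a, b]
    }
}
```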
    • parseElement

      public static com.oracle.labs.mlrg.olcut.util.Pair<String,Boolean> parseElement(String s)
      Parses a string of the form:
           class1=true
       
      OR of the form:
           class1
       
      In the first case, the value in the "key=value" pair must be parseable by Boolean.parseBoolean(String). TODO: Boolean.parseBoolean("1") returns false. We may want to think more carefully about this case.
      Parameters:
      s - The string form of a single dimension from a multilabel input.
      Returns:
      A tuple representing the dimension name and the value.
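The caveat about Boolean.parseBoolean is easy to demonstrate: it returns true only for the string "true" (ignoring case), so values such as "1" or "yes" parse to false. The sketch below mirrors the two accepted input forms using java.util.Map.Entry as a stand-in for the OLCUT Pair type. ParseElementSketch is a hypothetical name, and treating the bare "class1" form as true is an assumption of this sketch.

```java
import java.util.Map;

final class ParseElementSketch {
    /** Parses "name=value" or a bare "name" into a (name, boolean) pair. */
    static Map.Entry<String, Boolean> parseElement(String s) {
        int eq = s.indexOf('=');
        if (eq < 0) {
            // Bare "class1" form: assumed here to mean the label is present.
            return Map.entry(s, true);
        }
        String name = s.substring(0, eq);
        // Only "true" (case-insensitive) yields true; everything else is false.
        boolean value = Boolean.parseBoolean(s.substring(eq + 1));
        return Map.entry(name, value);
    }

    public static void main(String[] args) {
        System.out.println(parseElement("class1=true")); // class1=true
        System.out.println(parseElement("class1=TRUE")); // class1=true
        System.out.println(parseElement("class1=1"));    // class1=false
        System.out.println(parseElement("class1"));      // class1=true
    }
}
```

Data sets that encode positives as 1/0 would therefore need converting to true/false before being passed through this kind of parser.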
    • createFromPairList

      public static MultiLabel createFromPairList(List<com.oracle.labs.mlrg.olcut.util.Pair<String,Boolean>> dimensions)
      Creates a MultiLabel from a list of dimensions.
      Parameters:
      dimensions - The dimensions to use.
      Returns:
      A MultiLabel representing these dimensions.
    • intersectionSize

      public static int intersectionSize(MultiLabel first, MultiLabel second)
      The number of labels present in both MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The set intersection size.
    • unionSize

      public static int unionSize(MultiLabel first, MultiLabel second)
      The number of unique labels across both MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The set union size.
    • jaccardScore

      public static double jaccardScore(MultiLabel first, MultiLabel second)
      The Jaccard score/index between the two MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The Jaccard score.
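The three set measures above are related by Jaccard = |intersection| / |union|. The following sketch computes them over plain sets of label names; JaccardSketch and its methods are hypothetical names, not the Tribuo implementation. With two empty sets the division is 0.0 / 0, which Java evaluates to NaN; this page does not say how the library handles that case.

```java
import java.util.HashSet;
import java.util.Set;

final class JaccardSketch {
    /** Number of names present in both sets. */
    static int intersectionSize(Set<String> first, Set<String> second) {
        Set<String> common = new HashSet<>(first);
        common.retainAll(second);
        return common.size();
    }

    /** Number of unique names across both sets. */
    static int unionSize(Set<String> first, Set<String> second) {
        Set<String> all = new HashSet<>(first);
        all.addAll(second);
        return all.size();
    }

    /** Intersection size divided by union size; NaN when both sets are empty. */
    static double jaccard(Set<String> first, Set<String> second) {
        return ((double) intersectionSize(first, second)) / unionSize(first, second);
    }

    public static void main(String[] args) {
        Set<String> truth = Set.of("Finance", "Sports");
        Set<String> predicted = Set.of("Sports", "Politics");
        System.out.println(intersectionSize(truth, predicted)); // 1
        System.out.println(unionSize(truth, predicted));        // 3
        System.out.println(jaccard(truth, predicted));          // one third
    }
}
```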