Class MultiLabel

java.lang.Object
org.tribuo.multilabel.MultiLabel
All Implemented Interfaces:
Serializable, Classifiable<MultiLabel>, Output<MultiLabel>, ProtoSerializable<org.tribuo.protos.core.OutputProto>

public class MultiLabel extends Object implements Classifiable<MultiLabel>
A class for multi-label classification.

Multi-label classification is the task of predicting a (possibly empty) set of labels for each example, for instance predicting that a Reuters article has both the Finance and Sports labels.

Both the labels in the set and the MultiLabel itself may have optional scores (which are not required to be probabilities). A score that is not present is represented by Double.NaN. This is most common with ground-truth labels, which usually do not supply scores.

  • Field Details

    • NEGATIVE_LABEL_STRING

      public static final String NEGATIVE_LABEL_STRING
      The string for the binary negative label.
    • NEGATIVE_LABEL

      public static final Label NEGATIVE_LABEL
      A Label representing the binary negative label. Used in binary approaches to multi-label classification to represent the absence of a Label.
  • Constructor Details

    • MultiLabel

      public MultiLabel(Set<Label> labels)
      Builds a MultiLabel object from a Set of Labels.

      Sets the whole set score to Double.NaN.

      Parameters:
      labels - A set of (possibly scored) labels.
    • MultiLabel

      public MultiLabel(Set<Label> labels, double score)
      Builds a MultiLabel object from a Set of Labels, when the whole set has a score as well as (optionally) the individual labels.
      Parameters:
      labels - A set of (possibly scored) labels.
      score - An overall score for the set.
    • MultiLabel

      public MultiLabel(String label)
      Builds a MultiLabel with a single String label.

      The created Label is unscored and used by MultiLabelInfo.

      Sets the whole set score to Double.NaN.

      Parameters:
      label - The label.
    • MultiLabel

      public MultiLabel(Label label)
      Builds a MultiLabel from a single Label.

      Sets the whole set score to Double.NaN.

      Parameters:
      label - The label.
  • Method Details

    • deserializeFromProto

      public static MultiLabel deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • serialize

      public org.tribuo.protos.core.OutputProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.protos.core.OutputProto>
      Returns:
      The protobuf.
    • createLabel

      public Label createLabel(Label otherLabel)
      Creates a binary label from this multilabel. The returned Label is the input parameter if this MultiLabel contains that Label, and NEGATIVE_LABEL otherwise.
      Parameters:
      otherLabel - The input label.
      Returns:
      A binarised form of this MultiLabel.
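      The binarisation rule described above can be sketched over plain strings, without Tribuo on the classpath. This is a standalone illustration and not the library's implementation: BinariseSketch and its negativeLabel parameter are hypothetical names, and the caller supplies the negative marker because its actual value is not stated on this page.

```java
import java.util.Set;

/** Standalone sketch of one-vs-rest binarisation; not Tribuo code. */
final class BinariseSketch {
    /**
     * Returns the queried label if the set contains it, otherwise the
     * caller-supplied negative marker (a hypothetical stand-in for NEGATIVE_LABEL).
     */
    static String createLabel(Set<String> labels, String query, String negativeLabel) {
        return labels.contains(query) ? query : negativeLabel;
    }
}
```

      Calling the sketch once per label in the output domain gives one binary target per label, which is the usual shape of a binary approach to multi-label classification.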
    • getLabelString

      public String getLabelString()
      Returns a comma separated string representing the labels in this multilabel instance.
      Returns:
      A comma separated string of labels.
    • getScore

      public double getScore()
      The overall score for this set of labels.
      Returns:
      The score for this MultiLabel.
    • getLabelScore

      public OptionalDouble getLabelScore(Label label)
      Returns the score for the specified label if that label is present, and an empty optional otherwise.
      Parameters:
      label - The label to check.
      Returns:
      The score for the label if present.
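      A minimal sketch of the lookup, using a plain map in place of the label set. LabelScoreSketch is a hypothetical name and this is not Tribuo's code. It assumes, as one reading of the description above, that a label which is present but unscored yields an optional holding Double.NaN, while only an absent label yields an empty optional.

```java
import java.util.Map;
import java.util.OptionalDouble;

/** Standalone sketch; not Tribuo code. */
final class LabelScoreSketch {
    /** Looks up a label's score; a label missing from the map yields an empty optional. */
    static OptionalDouble labelScore(Map<String, Double> scores, String label) {
        Double score = scores.get(label);
        // Assumption: a present but unscored label is stored as NaN and is returned as such.
        return score == null ? OptionalDouble.empty() : OptionalDouble.of(score);
    }
}
```

      Under that assumption a caller needs two checks: isPresent() to see whether the label is in the set, then Double.isNaN on the value to see whether it carries a score.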
    • getLabelSet

      public Set<Label> getLabelSet()
      The set of labels contained in this multilabel.
      Returns:
      The set of labels.
    • getNameSet

      public Set<String> getNameSet()
      The set of strings that represent the labels in this multilabel.
      Returns:
      The set of strings.
    • contains

      public boolean contains(String input)
      Does this MultiLabel contain this string?
      Parameters:
      input - A string representing a Label.
      Returns:
      True if the label string is in this MultiLabel.
    • contains

      public boolean contains(Label input)
      Does this MultiLabel contain this Label?
      Parameters:
      input - A Label.
      Returns:
      True if the label is in this MultiLabel.
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • fullEquals

      public boolean fullEquals(MultiLabel o)
      Description copied from interface: Output
      Compares other to this output. Uses all score values and the strings.
      Specified by:
      fullEquals in interface Output<MultiLabel>
      Parameters:
      o - Another output instance.
      Returns:
      True if the other instance has value equality to this instance. False otherwise.
    • fullEquals

      public boolean fullEquals(MultiLabel o, double tolerance)
      Description copied from interface: Output
      Compares other to this output. Uses all score values and the strings.

      The default implementation of this method ignores the tolerance for compatibility reasons; it is overridden in all output classes in Tribuo.

      Specified by:
      fullEquals in interface Output<MultiLabel>
      Parameters:
      o - Another output instance.
      tolerance - The tolerance level for an absolute value comparison.
      Returns:
      True if the other instance has value equality to this instance. False otherwise.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • copy

      public MultiLabel copy()
      Description copied from interface: Output
      Deep copy of the output up to its immutable state.
      Specified by:
      copy in interface Output<MultiLabel>
      Returns:
      A copy of the output.
    • getSerializableForm

      public String getSerializableForm(boolean includeConfidence)
      For a MultiLabel with label set = {a, b, c}, outputs a string of the form:
       "a=true,b=true,c=true"
       
      If includeConfidence is set to true, outputs a string of the form:
       "a=true,b=true,c=true:0.5"
       
      where the last element after the colon is this label's score.
      Specified by:
      getSerializableForm in interface Output<MultiLabel>
      Parameters:
      includeConfidence - Include whatever confidence score the label contains, if known.
      Returns:
      a comma-separated, densified string representation of this MultiLabel
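      The two string shapes above can be reproduced with a short standalone sketch. SerializableFormSketch is a hypothetical name and this is not Tribuo's implementation. It assumes the value after the colon is the overall score of the set, and it takes a List so that the output order is fixed.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Standalone sketch of the documented string format; not Tribuo code. */
final class SerializableFormSketch {
    /** Joins names as "name=true" pairs and optionally appends ":score". */
    static String format(List<String> names, double score, boolean includeConfidence) {
        String body = names.stream()
                .map(name -> name + "=true")
                .collect(Collectors.joining(","));
        // Assumption: the suffix carries the overall score of the set.
        return includeConfidence ? body + ":" + score : body;
    }
}
```

      With names a, b, c and a score of 0.5, the sketch yields "a=true,b=true,c=true" without confidence and "a=true,b=true,c=true:0.5" with it.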
    • convertToDenseVector

      public DenseVector convertToDenseVector(ImmutableOutputInfo<MultiLabel> info)
      Converts this MultiLabel into a DenseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise. Labels which are not present are given the score 0.0.
      Parameters:
      info - The info to use for the ids.
      Returns:
      A DenseVector representing this MultiLabel.
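      The value rule above (the score if it is not NaN, 1.0 if it is, 0.0 for absent labels) can be sketched with a plain array. DenseVectorSketch and its two map parameters are hypothetical stand-ins for the MultiLabel and the output info, and this is not Tribuo's code. The sketch assumes the indices are contiguous from 0 and silently skips labels the index map does not know, which the library may handle differently.

```java
import java.util.Map;

/** Standalone sketch of the dense conversion rule; not Tribuo code. */
final class DenseVectorSketch {
    /**
     * @param labelScores labels present in the set, with NaN meaning "unscored"
     * @param indices     label name to position (assumed contiguous from 0)
     */
    static double[] toDense(Map<String, Double> labelScores, Map<String, Integer> indices) {
        double[] values = new double[indices.size()]; // absent labels stay at 0.0
        for (Map.Entry<String, Double> entry : labelScores.entrySet()) {
            Integer index = indices.get(entry.getKey());
            if (index != null) {
                double score = entry.getValue();
                values[index] = Double.isNaN(score) ? 1.0 : score;
            }
        }
        return values;
    }
}
```

      A sparse variant would store only the positions written inside the loop and omit the zero entries.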
    • convertToSparseVector

      public SparseVector convertToSparseVector(ImmutableOutputInfo<MultiLabel> info)
      Converts this MultiLabel into a SparseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise.
      Parameters:
      info - The info to use for the ids.
      Returns:
      A SparseVector representing this MultiLabel.
    • parseString

      public static MultiLabel parseString(String s)
      Parses a string of the form:
       dimension-name=output,...,dimension-name=output
       
      where output must be readable by Boolean.parseBoolean(String).
      Parameters:
      s - The string form of a multi-label example.
      Returns:
      A MultiLabel parsed from the input string.
    • parseString

      public static MultiLabel parseString(String s, char splitChar)
      Parses a string of the form:
       dimension-name=output<splitChar>...<splitChar>dimension-name=output
       
      where output must be readable by Boolean.parseBoolean(java.lang.String).
      Parameters:
      s - The string form of a multilabel output.
      splitChar - The char to split on.
      Returns:
      A MultiLabel output parsed from the input string.
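      A standalone sketch of the split-then-parse step, returning the names whose value parses as true. ParseStringSketch is a hypothetical name and this is not Tribuo's implementation. It assumes dimensions that parse to false are dropped and that a bare name counts as present, and it performs no validation of malformed input.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Standalone sketch of parsing "name=bool<splitChar>name=bool"; not Tribuo code. */
final class ParseStringSketch {
    static Set<String> parse(String s, char splitChar) {
        Set<String> names = new LinkedHashSet<>();
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        String separator = Pattern.quote(String.valueOf(splitChar));
        for (String element : s.split(separator)) {
            String[] parts = element.split("=", 2);
            // Assumption: a bare name means "present"; otherwise defer to Boolean.parseBoolean.
            boolean present = parts.length == 1 || Boolean.parseBoolean(parts[1]);
            if (present && !parts[0].isEmpty()) {
                names.add(parts[0]);
            }
        }
        return names;
    }
}
```

      For the input "a=true,b=false,c=true" with ',' as the separator, the sketch keeps a and c.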
    • parseElement

      public static com.oracle.labs.mlrg.olcut.util.Pair<String,Boolean> parseElement(String s)
      Parses a string of the form:
           class1=true
       
      OR of the form:
           class1
       
      In the first case, the value in the "key=value" pair must be parseable by Boolean.parseBoolean(String). TODO: Boolean.parseBoolean("1") returns false. We may want to think more carefully about this case.
      Parameters:
      s - The string form of a single dimension from a multilabel input.
      Returns:
      A tuple representing the dimension name and the value.
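      The two accepted shapes, and the Boolean.parseBoolean caveat noted above, can be checked with a small standalone sketch. ParseElementSketch is a hypothetical name, Map.Entry stands in for the OLCUT Pair, and this is not Tribuo's code. Treating a bare name as true is an assumption.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Standalone sketch of parsing a single "name=bool" or "name" element; not Tribuo code. */
final class ParseElementSketch {
    static Map.Entry<String, Boolean> parseElement(String s) {
        String[] parts = s.split("=", 2);
        if (parts.length == 1) {
            // Assumption: a bare name means the label is present.
            return new SimpleImmutableEntry<>(parts[0], true);
        }
        // Boolean.parseBoolean is true only for "true", ignoring case.
        return new SimpleImmutableEntry<>(parts[0], Boolean.parseBoolean(parts[1]));
    }
}
```

      Because Boolean.parseBoolean returns true only for the string "true" (ignoring case), inputs such as "class1=1" or "class1=yes" parse as false in this sketch.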
    • createFromPairList

      public static MultiLabel createFromPairList(List<com.oracle.labs.mlrg.olcut.util.Pair<String,Boolean>> dimensions)
      Creates a MultiLabel from a list of dimensions.
      Parameters:
      dimensions - The dimensions to use.
      Returns:
      A MultiLabel representing these dimensions.
    • intersectionSize

      public static int intersectionSize(MultiLabel first, MultiLabel second)
      The number of labels present in both MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The set intersection size.
    • unionSize

      public static int unionSize(MultiLabel first, MultiLabel second)
      The number of unique labels across both MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The set union size.
    • jaccardScore

      public static double jaccardScore(MultiLabel first, MultiLabel second)
      The Jaccard score/index between the two MultiLabels.
      Parameters:
      first - The first MultiLabel.
      second - The second MultiLabel.
      Returns:
      The Jaccard score.
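      The three set statistics above are related by Jaccard = |A ∩ B| / |A ∪ B|. The following standalone sketch computes them over sets of label names. JaccardSketch is a hypothetical name and this is not Tribuo's implementation. Returning NaN when both sets are empty is a choice made for this sketch, since the page does not say how that case is handled.

```java
import java.util.HashSet;
import java.util.Set;

/** Standalone sketch of intersection size, union size and Jaccard score; not Tribuo code. */
final class JaccardSketch {
    /** Number of names present in both sets. */
    static int intersectionSize(Set<String> first, Set<String> second) {
        Set<String> common = new HashSet<>(first);
        common.retainAll(second);
        return common.size();
    }

    /** Number of distinct names across both sets. */
    static int unionSize(Set<String> first, Set<String> second) {
        Set<String> all = new HashSet<>(first);
        all.addAll(second);
        return all.size();
    }

    /** Intersection size divided by union size. */
    static double jaccard(Set<String> first, Set<String> second) {
        int union = unionSize(first, second);
        if (union == 0) {
            return Double.NaN; // sketch choice for two empty sets
        }
        return ((double) intersectionSize(first, second)) / union;
    }
}
```

      For {Finance, Sports} and {Sports, Politics} the intersection size is 1, the union size is 3, and the Jaccard score is 1/3.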