Package org.tribuo.multilabel
Class MultiLabel
java.lang.Object
org.tribuo.multilabel.MultiLabel
- All Implemented Interfaces:
Serializable, Classifiable<MultiLabel>, Output<MultiLabel>
A class for multi-label classification.
Multi-label classification predicts a (possibly empty) set of labels for each example, for instance predicting that a Reuters article has both the Finance and Sports labels.
Both the individual labels in the set and the MultiLabel itself may carry optional scores, which are not required to be probabilities. A score that is not present is represented by Double.NaN. This is most common with ground-truth labels, which usually do not supply scores.
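Because a missing score is encoded as Double.NaN, presence has to be tested with Double.isNaN rather than with ==. The following is a minimal sketch of that check; the class and the hasScore helper are hypothetical names used for illustration and are not part of Tribuo.

```java
final class ScoreSentinelSketch {
    // Hypothetical helper, not part of Tribuo: true when a score is actually present.
    static boolean hasScore(double score) {
        // NaN never compares equal to anything, including itself,
        // so `score == Double.NaN` would always be false.
        return !Double.isNaN(score);
    }

    public static void main(String[] args) {
        double groundTruthScore = Double.NaN; // unscored, as is typical for ground truth
        double predictedScore = 0.87;         // illustrative value only
        System.out.println(hasScore(groundTruthScore));
        System.out.println(hasScore(predictedScore));
    }
}
```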
-
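Field Summary
Field - Description
NEGATIVE_LABEL - A Label representing the binary negative label.
NEGATIVE_LABEL_STRING - The string for the binary negative label.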
-
Constructor Summary
Constructor - Description
MultiLabel(String label) - Builds a MultiLabel with a single String label.
MultiLabel(Set<Label> labels) - Builds a MultiLabel object from a Set of Labels.
MultiLabel(Set<Label> labels, double score) - Builds a MultiLabel object from a Set of Labels, when the whole set has a score as well as (optionally) the individual labels.
MultiLabel(Label label) - Builds a MultiLabel from a single Label.
-
Method Summary
Modifier and Type, Method - Description
boolean contains(String input) - Does this MultiLabel contain this string?
boolean contains(Label input) - Does this MultiLabel contain this Label?
DenseVector convertToDenseVector(info) - Converts this MultiLabel into a DenseVector using the indices from the output info.
SparseVector convertToSparseVector(info) - Converts this MultiLabel into a SparseVector using the indices from the output info.
MultiLabel copy() - Deep copy of the output up to its immutable state.
static MultiLabel createFromPairList(List<com.oracle.labs.mlrg.olcut.util.Pair<String, Boolean>> dimensions) - Creates a MultiLabel from a list of dimensions.
Label createLabel(Label otherLabel) - Creates a binary label from this multilabel.
boolean equals(Object o)
boolean fullEquals(MultiLabel o) - Compares other to this output.
getLabelScore(Label label) - The score for the specified label if present; returns an empty optional otherwise.
Set<Label> getLabelSet() - The set of labels contained in this multilabel.
String getLabelString() - Returns a comma-separated string representing the labels in this multilabel instance.
Set<String> getNameSet() - The set of strings that represent the labels in this multilabel.
double getScore() - The overall score for this set of labels.
String getSerializableForm(boolean includeConfidence) - For a MultiLabel with label set = {a, b, c}, outputs a string of the form "a=true,b=true,c=true".
int hashCode()
static int intersectionSize(MultiLabel first, MultiLabel second) - The number of labels present in both MultiLabels.
static double jaccardScore(MultiLabel first, MultiLabel second) - The Jaccard score/index between the two MultiLabels.
parseElement(String s) - Parses a string of the form class1=true or of the form class1.
static MultiLabel parseString(String s) - Parses a string of the form dimension-name=output,...,dimension-name=output where output must be readable by Boolean.parseBoolean(String).
static MultiLabel parseString(String s, char splitChar) - Parses a string of the form dimension-name=output<splitChar>...<splitChar>dimension-name=output.
String toString()
static int unionSize(MultiLabel first, MultiLabel second) - The number of unique labels across both MultiLabels.
-
Field Details
-
NEGATIVE_LABEL_STRING
The string for the binary negative label.
-
NEGATIVE_LABEL
A Label representing the binary negative label. Used in binary approaches to multi-label classification to represent the absence of a Label.
-
-
Constructor Details
-
MultiLabel
MultiLabel(Set<Label> labels)
Builds a MultiLabel object from a Set of Labels. Sets the whole set score to Double.NaN.
- Parameters:
labels - A set of (possibly scored) labels.
-
MultiLabel
MultiLabel(Set<Label> labels, double score)
Builds a MultiLabel object from a Set of Labels, when the whole set has a score as well as (optionally) the individual labels.
- Parameters:
labels - A set of (possibly scored) labels.
score - An overall score for the set.
-
MultiLabel
MultiLabel(String label)
Builds a MultiLabel with a single String label. The created Label is unscored and used by MultiLabelInfo. Sets the whole set score to Double.NaN.
- Parameters:
label - The label.
-
MultiLabel
MultiLabel(Label label)
Builds a MultiLabel from a single Label. Sets the whole set score to Double.NaN.
- Parameters:
label - The label.
-
-
Method Details
-
createLabel
Creates a binary label from this multilabel. The returned Label is the input parameter if this MultiLabel contains that Label, and NEGATIVE_LABEL otherwise.
- Parameters:
otherLabel - The input label.
- Returns:
A binarised form of this MultiLabel.
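The binarisation rule described for createLabel can be sketched with plain strings standing in for Label objects. This is only an illustration under stated assumptions: the class name, the binarise helper and the "<negative>" placeholder are hypothetical, and the actual value of NEGATIVE_LABEL_STRING is not shown on this page.

```java
import java.util.Set;

final class BinariseSketch {
    // Placeholder only: the real NEGATIVE_LABEL_STRING value is not given on this page.
    static final String NEGATIVE = "<negative>";

    // Returns the candidate itself when it is in the label set,
    // and the negative placeholder otherwise.
    static String binarise(Set<String> labelNames, String candidate) {
        return labelNames.contains(candidate) ? candidate : NEGATIVE;
    }

    public static void main(String[] args) {
        Set<String> article = Set.of("Finance", "Sports");
        System.out.println(binarise(article, "Sports"));
        System.out.println(binarise(article, "Politics"));
    }
}
```

Applying this once per candidate label turns one multi-label example into a series of binary examples, which is the setting in which the negative label is described as being used.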
-
getLabelString
Returns a comma-separated string representing the labels in this multilabel instance.
- Returns:
A comma-separated string of labels.
-
getScore
public double getScore()
The overall score for this set of labels.
- Returns:
The score for this MultiLabel.
-
getLabelScore
The score for the specified label if present; returns an empty optional otherwise.
- Parameters:
label - The label to check.
- Returns:
The score for the label if present.
-
getLabelSet
The set of labels contained in this multilabel.
- Returns:
The set of labels.
-
getNameSet
The set of strings that represent the labels in this multilabel.
- Returns:
The set of strings.
-
contains
Does this MultiLabel contain this string?
- Parameters:
input - A string representing a Label.
- Returns:
True if the label string is in this MultiLabel.
-
contains
Does this MultiLabel contain this Label?
- Parameters:
input - A Label.
- Returns:
True if the label is in this MultiLabel.
-
equals
-
fullEquals
Description copied from interface: Output
Compares other to this output. Uses all score values and the strings.
- Specified by:
fullEquals in interface Output<MultiLabel>
- Parameters:
o - Another output instance.
- Returns:
True if the other instance has value equality to this instance. False otherwise.
-
hashCode
public int hashCode()
-
toString
-
copy
Description copied from interface: Output
Deep copy of the output up to its immutable state.
- Specified by:
copy in interface Output<MultiLabel>
- Returns:
A copy of the output.
-
getSerializableForm
For a MultiLabel with label set = {a, b, c}, outputs a string of the form:
"a=true,b=true,c=true"
If includeConfidence is set to true, outputs a string of the form:
"a=true,b=true,c=true:0.5"
where the last element after the colon is this label's score.
- Specified by:
getSerializableForm in interface Output<MultiLabel>
- Parameters:
includeConfidence - Include whatever confidence score the label contains, if known.
- Returns:
A comma-separated, densified string representation of this MultiLabel.
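The two output shapes described for getSerializableForm can be reproduced with a short sketch. It assumes label names are supplied as an ordered list (the real class holds a set, so element order is an assumption here) and that the trailing value is the overall score; the class and the serialize method are hypothetical names, not Tribuo API.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SerializableFormSketch {
    // Joins each label name as "name=true"; optionally appends ":<score>" at the end.
    static String serialize(List<String> labelNames, double score, boolean includeConfidence) {
        String body = labelNames.stream()
                .map(name -> name + "=true")
                .collect(Collectors.joining(","));
        return includeConfidence ? body + ":" + score : body;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c");
        System.out.println(serialize(names, 0.5, false));
        System.out.println(serialize(names, 0.5, true));
    }
}
```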
-
convertToDenseVector
Converts this MultiLabel into a DenseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise. Labels which are not present are given the score 0.0.
- Parameters:
info - The info to use for the ids.
- Returns:
A DenseVector representing this MultiLabel.
-
convertToSparseVector
Converts this MultiLabel into a SparseVector using the indices from the output info. The label score is used as the value for that index if it's non-NaN, and is 1.0 otherwise.
- Parameters:
info - The info to use for the ids.
- Returns:
A SparseVector representing this MultiLabel.
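The value rule shared by the dense and sparse conversions (score if non-NaN, 1.0 otherwise, and 0.0 for absent labels in the dense case) can be sketched without Tribuo's vector types. Here a double[] stands in for DenseVector, a TreeMap for SparseVector, and a Map from label name to id for the output info; all of these stand-ins, the class name and the choice to skip labels without an id are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class VectorEncodingSketch {
    // Value stored for one present label: its score, or 1.0 when the score is NaN.
    static double valueFor(double score) {
        return Double.isNaN(score) ? 1.0 : score;
    }

    // Dense form: one slot per known label id; labels that are not present stay at 0.0.
    // Assumes ids are exactly 0..labelIds.size()-1.
    static double[] toDense(Map<String, Double> labelScores, Map<String, Integer> labelIds) {
        double[] values = new double[labelIds.size()];
        for (Map.Entry<String, Double> entry : labelScores.entrySet()) {
            Integer id = labelIds.get(entry.getKey());
            if (id != null) { // assumption: labels without an id are skipped
                values[id] = valueFor(entry.getValue());
            }
        }
        return values;
    }

    // Sparse form: only present labels are stored, keyed by id in ascending order.
    static TreeMap<Integer, Double> toSparse(Map<String, Double> labelScores, Map<String, Integer> labelIds) {
        TreeMap<Integer, Double> values = new TreeMap<>();
        for (Map.Entry<String, Double> entry : labelScores.entrySet()) {
            Integer id = labelIds.get(entry.getKey());
            if (id != null) { // same assumption as in toDense
                values.put(id, valueFor(entry.getValue()));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = Map.of("Finance", 0, "Politics", 1, "Sports", 2);
        Map<String, Double> scores = Map.of("Finance", Double.NaN, "Sports", 0.25);
        System.out.println(Arrays.toString(toDense(scores, ids)));
        System.out.println(toSparse(scores, ids));
    }
}
```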
-
parseString
Parses a string of the form:
dimension-name=output,...,dimension-name=output
where output must be readable by Boolean.parseBoolean(String).
- Parameters:
s - The string form of a multi-label example.
- Returns:
A MultiLabel parsed from the input string.
-
parseString
Parses a string of the form:
dimension-name=output<splitChar>...<splitChar>dimension-name=output
where output must be readable by Boolean.parseBoolean(java.lang.String).
- Parameters:
s - The string form of a multilabel output.
splitChar - The char to split on.
- Returns:
A MultiLabel output parsed from the input string.
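The string format accepted by parseString can be illustrated with a small stand-alone parser that returns the names whose value parses as true. This is a sketch, not Tribuo's implementation: the class name, the Set<String> return type, the empty-input behaviour and the choice to reject elements without '=' are all assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class ParseStringSketch {
    // Splits on splitChar, then keeps each dimension whose value parses as true.
    static Set<String> parse(String s, char splitChar) {
        Set<String> present = new LinkedHashSet<>();
        if (s.isEmpty()) {
            return present; // assumption: empty input means the empty label set
        }
        // Pattern.quote keeps characters such as '|' from being read as regex syntax.
        for (String element : s.split(Pattern.quote(String.valueOf(splitChar)))) {
            String[] keyValue = element.split("=", 2);
            if (keyValue.length != 2) {
                // assumption: this sketch only accepts the key=value form
                throw new IllegalArgumentException("Expected key=value, found '" + element + "'");
            }
            if (Boolean.parseBoolean(keyValue[1].trim())) {
                present.add(keyValue[0].trim());
            }
        }
        return present;
    }

    public static void main(String[] args) {
        System.out.println(parse("Finance=true,Sports=true,Politics=false", ','));
        System.out.println(parse("Finance=true|Sports=false", '|'));
    }
}
```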
-
parseElement
Parses a string of the form:
class1=true
OR of the form:
class1
In the first case, the value in the "key=value" pair must be parseable by Boolean.parseBoolean(String).
TODO: Boolean.parseBoolean("1") returns false. We may want to think more carefully about this case.
- Parameters:
s - The string form of a single dimension from a multilabel input.
- Returns:
A tuple representing the dimension name and the value.
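The caveat in the TODO above comes from the JDK itself: Boolean.parseBoolean returns true only for the string "true" (ignoring case), so "1" and "yes" both parse as false. A minimal sketch of the two accepted element forms follows, using Map.Entry in place of the tuple type and assuming that the bare form (no '=') means the label is present; the class name and both of these choices are illustrative assumptions.

```java
import java.util.Map;

final class ParseElementSketch {
    // Parses "name=value" or a bare "name" into a (name, value) pair.
    static Map.Entry<String, Boolean> parseElement(String s) {
        int equalsIndex = s.indexOf('=');
        if (equalsIndex < 0) {
            // assumption: a bare dimension name means the label is present
            return Map.entry(s.trim(), true);
        }
        String name = s.substring(0, equalsIndex).trim();
        // true only for "true" in any letter case; "1" and "yes" give false
        boolean value = Boolean.parseBoolean(s.substring(equalsIndex + 1).trim());
        return Map.entry(name, value);
    }

    public static void main(String[] args) {
        System.out.println(parseElement("class1=true"));
        System.out.println(parseElement("class1"));
        System.out.println(parseElement("class1=1"));
    }
}
```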
-
createFromPairList
public static MultiLabel createFromPairList(List<com.oracle.labs.mlrg.olcut.util.Pair<String, Boolean>> dimensions)
Creates a MultiLabel from a list of dimensions.
- Parameters:
dimensions - The dimensions to use.
- Returns:
A MultiLabel representing these dimensions.
-
intersectionSize
The number of labels present in both MultiLabels.
- Parameters:
first - The first MultiLabel.
second - The second MultiLabel.
- Returns:
The set intersection size.
-
unionSize
The number of unique labels across both MultiLabels.
- Parameters:
first - The first MultiLabel.
second - The second MultiLabel.
- Returns:
The set union size.
-
jaccardScore
The Jaccard score/index between the two MultiLabels.
- Parameters:
first - The first MultiLabel.
second - The second MultiLabel.
- Returns:
The Jaccard score.
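The three set statistics above are related by the standard definition of the Jaccard index: intersection size divided by union size. The sketch below computes them over plain Set<String> values standing in for the label sets of two MultiLabels. The class name is hypothetical, and the behaviour for two empty sets (NaN in this sketch, from 0.0 / 0) is not specified on this page.

```java
import java.util.HashSet;
import java.util.Set;

final class JaccardSketch {
    // Number of labels present in both sets.
    static int intersectionSize(Set<String> first, Set<String> second) {
        Set<String> common = new HashSet<>(first);
        common.retainAll(second);
        return common.size();
    }

    // Number of unique labels across both sets.
    static int unionSize(Set<String> first, Set<String> second) {
        Set<String> all = new HashSet<>(first);
        all.addAll(second);
        return all.size();
    }

    // Jaccard index: |first ∩ second| / |first ∪ second|.
    static double jaccard(Set<String> first, Set<String> second) {
        return ((double) intersectionSize(first, second)) / unionSize(first, second);
    }

    public static void main(String[] args) {
        Set<String> truth = Set.of("Finance", "Sports");
        Set<String> predicted = Set.of("Sports", "Politics");
        System.out.println(intersectionSize(truth, predicted));
        System.out.println(unionSize(truth, predicted));
        System.out.println(jaccard(truth, predicted));
    }
}
```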
-