Class CRFModel
java.lang.Object
  org.tribuo.sequence.SequenceModel<Label>
    org.tribuo.classification.sequence.ConfidencePredictingSequenceModel
      org.tribuo.classification.sgd.crf.CRFModel
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable, ProtoSerializable<org.tribuo.protos.core.SequenceModelProto>
An inference-time model for a linear-chain CRF trained using SGD.
It can be switched to use Viterbi, belief propagation, or constrained belief propagation at test time. By default it uses Viterbi.
See:
Lafferty J, McCallum A, Pereira FC. "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data" Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001).
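As an illustration of the default Viterbi decoding, the following is a minimal sketch, not Tribuo's implementation. It assumes the per-token label scores (emission[t][y], for example each label's weights dotted with token t's features) and the label-to-label transition scores (transition[a][b]) have already been computed, and it ignores any start or end scores. The class name ViterbiSketch is hypothetical.

```java
import java.util.Arrays;

/** Minimal linear-chain Viterbi decoder (illustrative sketch, not Tribuo's code). */
final class ViterbiSketch {
    private ViterbiSketch() {}

    /**
     * Returns the highest-scoring label index for each token.
     *
     * @param emission   emission[t][y] = score of label y at token t (assumed precomputed)
     * @param transition transition[a][b] = score of label a followed by label b
     */
    static int[] decode(double[][] emission, double[][] transition) {
        int n = emission.length;
        int k = emission[0].length;
        double[][] best = new double[n][k]; // best[t][y]: max score of any path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        best[0] = Arrays.copyOf(emission[0], k);
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                int argmax = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int prev = 0; prev < k; prev++) {
                    double s = best[t - 1][prev] + transition[prev][y];
                    if (s > max) {
                        max = s;
                        argmax = prev;
                    }
                }
                best[t][y] = max + emission[t][y];
                back[t][y] = argmax;
            }
        }
        // Choose the best final label, then follow the back-pointers to the start.
        int[] path = new int[n];
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][path[n - 1]]) {
                path[n - 1] = y;
            }
        }
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

For example, decode(new double[][]{{2.0, 0.0}, {0.0, 1.0}, {0.0, 1.0}}, new double[][]{{0.0, -3.0}, {0.0, 0.5}}) returns [1, 1, 1]: the penalty on a 0 to 1 transition, together with the 1 to 1 bonus, outweighs the stronger emission score for label 0 at the first token.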
Nested Class Summary
static enum: The type of subsequence-level confidence to predict.
Nested classes/interfaces inherited from class org.tribuo.classification.sequence.ConfidencePredictingSequenceModel
ConfidencePredictingSequenceModel.Subsequence
-
Field Summary
static final int CURRENT_VERSION: Protobuf serialization version.
Fields inherited from class org.tribuo.sequence.SequenceModel
featureIDMap, name, outputIDMap, provenanceOutput
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Method Summary
static com.oracle.labs.mlrg.olcut.util.Pair<int[],SparseVector[]> convert(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap): Deprecated.
static <T extends Output<T>> SparseVector[] convert(SequenceExample<T> example, ImmutableFeatureMap featureIDMap): Deprecated. Replaced with convertToVector(org.tribuo.sequence.SequenceExample<T>, org.tribuo.ImmutableFeatureMap), which is more flexible.
static com.oracle.labs.mlrg.olcut.util.Pair<int[],SGDVector[]> convertToVector(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap): Converts a SequenceExample into an array of SGDVectors and labels suitable for CRF prediction.
static <T extends Output<T>> SGDVector[] convertToVector(SequenceExample<T> example, ImmutableFeatureMap featureIDMap): Converts a SequenceExample into an array of SGDVectors suitable for CRF prediction.
static CRFModel deserializeFromProto(int version, String className, com.google.protobuf.Any message): Deserialization factory.
String generateWeightsString(): Generates a human-readable string containing all the weights in this model.
getFeatureWeights(int featureID): Gets a copy of the weights for feature featureID.
getFeatureWeights(String featureName): Gets a copy of the weights for the feature named featureName.
getTopFeatures(int n): Gets the top n features associated with this model.
predict(SequenceExample<Label> example): Uses the model to predict the output for a single example.
scoreChunks(SequenceExample<Label> example, List<Chunk> chunks): Scores the chunks using constrained belief propagation.
<SUB extends ConfidencePredictingSequenceModel.Subsequence> List<Double> scoreSubsequences(SequenceExample<Label> example, List<Prediction<Label>> predictions, List<SUB> subsequences): The scoring function for the subsequences.
org.tribuo.protos.core.SequenceModelProto serialize(): Serializes this object to a protobuf.
void setConfidenceType(type): Sets the inference method used for confidence prediction.
Methods inherited from class org.tribuo.classification.sequence.ConfidencePredictingSequenceModel
multiplyWeights
Methods inherited from class org.tribuo.sequence.SequenceModel
castModel, createDataCarrier, deserialize, deserializeFromFile, deserializeFromStream, getFeatureIDMap, getName, getOutputIDInfo, getProvenance, predict, predict, serializeToFile, serializeToStream, setName, toMaxLabels, toString, validate
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
Method Details
-
deserializeFromProto
public static CRFModel deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
setConfidenceType
Sets the inference method used for confidence prediction. If set to CONSTRAINED_BP, it uses the constrained belief propagation algorithm from Culotta and McCallum 2004; if MULTIPLY, it multiplies the maximum marginal for each token; if NONE, it uses Viterbi.
- Parameters:
type - Enum specifying the confidence type.
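The MULTIPLY option relies on per-token marginals. On a linear chain, belief propagation amounts to the forward-backward algorithm, so a minimal sketch can compute the marginals that way in log space and then multiply the largest marginal of each token in a span. It assumes hypothetical precomputed emission[t][y] and transition[a][b] score arrays with no start or end scores; MarginalConfidenceSketch and its methods are illustrative names, not part of Tribuo.

```java
/** Forward-backward marginals and MULTIPLY-style span confidence (illustrative sketch). */
final class MarginalConfidenceSketch {
    private MarginalConfidenceSketch() {}

    /** Numerically stable log(sum(exp(xs))). */
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /** marginals[t][y] = p(y_t = y | x), computed by forward-backward in log space. */
    static double[][] marginals(double[][] emission, double[][] transition) {
        int n = emission.length;
        int k = emission[0].length;
        double[][] alpha = new double[n][k]; // log forward scores
        double[][] beta = new double[n][k];  // log backward scores; beta[n - 1] stays 0 = log(1)
        double[] tmp = new double[k];
        alpha[0] = emission[0].clone();
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                for (int prev = 0; prev < k; prev++) {
                    tmp[prev] = alpha[t - 1][prev] + transition[prev][y];
                }
                alpha[t][y] = logSumExp(tmp) + emission[t][y];
            }
        }
        for (int t = n - 2; t >= 0; t--) {
            for (int y = 0; y < k; y++) {
                for (int next = 0; next < k; next++) {
                    tmp[next] = transition[y][next] + emission[t + 1][next] + beta[t + 1][next];
                }
                beta[t][y] = logSumExp(tmp);
            }
        }
        double logZ = logSumExp(alpha[n - 1]);
        double[][] result = new double[n][k];
        for (int t = 0; t < n; t++) {
            for (int y = 0; y < k; y++) {
                result[t][y] = Math.exp(alpha[t][y] + beta[t][y] - logZ);
            }
        }
        return result;
    }

    /** Product of each token's largest marginal over tokens [start, end). */
    static double multiplyConfidence(double[][] marginals, int start, int end) {
        double confidence = 1.0;
        for (int t = start; t < end; t++) {
            double max = 0.0;
            for (double p : marginals[t]) {
                max = Math.max(max, p);
            }
            confidence *= max;
        }
        return confidence;
    }
}
```

With all scores set to zero over two labels, every label has marginal 0.5 at every token, so the MULTIPLY confidence of a two-token span is 0.25.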
-
getFeatureWeights
Gets a copy of the weights for feature featureID.
- Parameters:
featureID - The feature ID.
- Returns:
The per-class weights.
-
getFeatureWeights
Gets a copy of the weights for the feature named featureName.
- Parameters:
featureName - The feature name.
- Returns:
The per-class weights.
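Both getFeatureWeights overloads return a copy rather than a view of the model. A minimal sketch of that contract follows, assuming a hypothetical dense weights[label][feature] matrix and a name-to-ID map; Tribuo's actual storage and return type may differ, and throwing on an unknown feature name is also an assumption of this sketch. FeatureWeightsSketch is a hypothetical name.

```java
import java.util.Map;

/** Copy-on-read access to per-label feature weights (hypothetical sketch). */
final class FeatureWeightsSketch {
    private final double[][] weights;              // weights[label][feature], assumed dense layout
    private final Map<String, Integer> featureIDs; // feature name -> feature ID

    FeatureWeightsSketch(double[][] weights, Map<String, Integer> featureIDs) {
        this.weights = weights;
        this.featureIDs = featureIDs;
    }

    /** Returns a fresh array of per-label weights, so callers cannot mutate the model. */
    double[] getFeatureWeights(int featureID) {
        double[] copy = new double[weights.length];
        for (int label = 0; label < weights.length; label++) {
            copy[label] = weights[label][featureID];
        }
        return copy;
    }

    /** Looks up the feature ID by name; throwing on an unknown name is an assumption. */
    double[] getFeatureWeights(String featureName) {
        Integer id = featureIDs.get(featureName);
        if (id == null) {
            throw new IllegalArgumentException("Unknown feature: " + featureName);
        }
        return getFeatureWeights(id.intValue());
    }
}
```

Returning a copy means a caller that edits the returned array, for example to zero out a weight while experimenting, cannot silently change the model's predictions.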
-
predict
Description copied from class: SequenceModel
Uses the model to predict the output for a single example.
- Specified by:
predict in class SequenceModel<Label>
- Parameters:
example - The example to predict.
- Returns:
The result of the prediction.
-
getTopFeatures
Description copied from class: SequenceModel
Gets the top n features associated with this model.
If the model does not produce per-output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.
If the model cannot describe its top features, it returns Collections.emptyMap().
- Specified by:
getTopFeatures in class SequenceModel<Label>
- Parameters:
n - The number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
- Returns:
A map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model.
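The per-output ranking and the n < 0 convention can be sketched as follows. This sketch assumes a hypothetical dense weights[label][feature] matrix and ranks features by absolute weight, which is an assumption rather than documented behavior; it also omits the Model.ALL_OUTPUTS and empty-map cases. TopFeaturesSketch is a hypothetical name.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Per-label top-n feature ranking over a dense weight matrix (hypothetical sketch). */
final class TopFeaturesSketch {
    private TopFeaturesSketch() {}

    /**
     * weights[label][feature] holds the model weights; n < 0 returns every feature.
     * Features are ranked by absolute weight, which is an assumption of this sketch.
     */
    static Map<String, List<Map.Entry<String, Double>>> topFeatures(
            double[][] weights, String[] labelNames, String[] featureNames, int n) {
        Map<String, List<Map.Entry<String, Double>>> result = new LinkedHashMap<>();
        for (int label = 0; label < weights.length; label++) {
            List<Map.Entry<String, Double>> ranked = new ArrayList<>();
            for (int f = 0; f < featureNames.length; f++) {
                ranked.add(new AbstractMap.SimpleImmutableEntry<>(featureNames[f], weights[label][f]));
            }
            // Largest magnitude first.
            ranked.sort((a, b) -> Double.compare(Math.abs(b.getValue()), Math.abs(a.getValue())));
            int limit = n < 0 ? ranked.size() : Math.min(n, ranked.size());
            result.put(labelNames[label], new ArrayList<>(ranked.subList(0, limit)));
        }
        return result;
    }
}
```

Copying the sublist into a new ArrayList keeps each returned list independent of the temporary ranking list.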
-
scoreSubsequences
public <SUB extends ConfidencePredictingSequenceModel.Subsequence> List<Double> scoreSubsequences(SequenceExample<Label> example, List<Prediction<Label>> predictions, List<SUB> subsequences)
Description copied from class: ConfidencePredictingSequenceModel
The scoring function for the subsequences. Provides the scores which should be assigned to each subsequence.
- Specified by:
scoreSubsequences in class ConfidencePredictingSequenceModel
- Type Parameters:
SUB - The subsequence type.
- Parameters:
example - The input sequence example.
predictions - The predictions produced by this model.
subsequences - The subsequences to score.
- Returns:
The scores for the subsequences.
-
scoreChunks
Scores the chunks using constrained belief propagation.
- Parameters:
example - The example to score.
chunks - The predicted chunks.
- Returns:
The scores.
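Constrained belief propagation in the sense of Culotta and McCallum (2004) scores a chunk as the share of the total probability mass assigned to label sequences that agree with the chunk: the partition function with the chunk's tokens clamped to their labels, divided by the unconstrained partition function. The sketch below computes both with the forward algorithm in log space. It assumes hypothetical precomputed emission[t][y] and transition[a][b] score arrays and represents a chunk as a start token plus one label index per token; ConstrainedForwardSketch is not a Tribuo class.

```java
import java.util.Arrays;

/** Chunk confidence as a ratio of constrained to unconstrained partition functions (sketch). */
final class ConstrainedForwardSketch {
    private ConstrainedForwardSketch() {}

    /** Numerically stable log(sum(exp(xs))); returns -infinity if every entry is -infinity. */
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /** True if token t may take label y; constraint[t] < 0 means the token is unconstrained. */
    static boolean allowed(int[] constraint, int t, int y) {
        return constraint[t] < 0 || constraint[t] == y;
    }

    /** Forward algorithm in log space, with disallowed labels given a score of -infinity. */
    static double logPartition(double[][] emission, double[][] transition, int[] constraint) {
        int n = emission.length;
        int k = emission[0].length;
        double[] alpha = new double[k];
        double[] next = new double[k];
        double[] tmp = new double[k];
        for (int y = 0; y < k; y++) {
            alpha[y] = allowed(constraint, 0, y) ? emission[0][y] : Double.NEGATIVE_INFINITY;
        }
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                if (!allowed(constraint, t, y)) {
                    next[y] = Double.NEGATIVE_INFINITY;
                    continue;
                }
                for (int prev = 0; prev < k; prev++) {
                    tmp[prev] = alpha[prev] + transition[prev][y];
                }
                next[y] = logSumExp(tmp) + emission[t][y];
            }
            double[] swap = alpha; // reuse buffers instead of allocating per token
            alpha = next;
            next = swap;
        }
        return logSumExp(alpha);
    }

    /** Probability that tokens start .. start + labels.length - 1 take exactly the given labels. */
    static double chunkConfidence(double[][] emission, double[][] transition, int start, int[] labels) {
        int[] free = new int[emission.length];
        Arrays.fill(free, -1);
        int[] clamped = free.clone();
        for (int i = 0; i < labels.length; i++) {
            clamped[start + i] = labels[i];
        }
        double logZ = logPartition(emission, transition, free);
        double logZConstrained = logPartition(emission, transition, clamped);
        return Math.exp(logZConstrained - logZ);
    }
}
```

With all-zero scores over two labels, clamping both tokens of a two-token sequence to label 0 gives a confidence of 0.25. This matches the product of the per-token marginals only because the tokens are independent in that case; with non-zero transitions the constrained score accounts for the coupling between tokens, which a product of marginals does not.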
-
generateWeightsString
Generates a human-readable string containing all the weights in this model.
- Returns:
A string containing all the weight values.
-
serialize
public org.tribuo.protos.core.SequenceModelProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Specified by:
serialize in interface ProtoSerializable<org.tribuo.protos.core.SequenceModelProto>
- Overrides:
serialize in class SequenceModel<Label>
- Returns:
The protobuf.
-
convert
@Deprecated public static <T extends Output<T>> SparseVector[] convert(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
Deprecated. Replaced with convertToVector(org.tribuo.sequence.SequenceExample<T>, org.tribuo.ImmutableFeatureMap), which is more flexible.
Converts a SequenceExample into an array of SparseVectors suitable for CRF prediction.
- Type Parameters:
T - The type parameter of the sequence example.
- Parameters:
example - The sequence example to convert.
featureIDMap - The feature ID map, used to discover the number of features.
- Returns:
An array of SparseVector.
-
convert
@Deprecated public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SparseVector[]> convert(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
Deprecated. Replaced with convertToVector(org.tribuo.sequence.SequenceExample<Label>, org.tribuo.ImmutableFeatureMap, org.tribuo.ImmutableOutputInfo<Label>), which is more flexible.
Converts a SequenceExample into an array of SparseVectors and labels suitable for CRF prediction.
- Parameters:
example - The sequence example to convert.
featureIDMap - The feature ID map, used to discover the number of features.
labelIDMap - The label ID map, used to get the index of the labels.
- Returns:
A Pair of an int array of labels and an array of SparseVector.
-
convertToVector
public static <T extends Output<T>> SGDVector[] convertToVector(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
Converts a SequenceExample into an array of SGDVectors suitable for CRF prediction.
- Type Parameters:
T - The type parameter of the sequence example.
- Parameters:
example - The sequence example to convert.
featureIDMap - The feature ID map, used to discover the number of features.
- Returns:
An array of SGDVector.
-
convertToVector
public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SGDVector[]> convertToVector(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
Converts a SequenceExample into an array of SGDVectors and labels suitable for CRF prediction.
- Parameters:
example - The sequence example to convert.
featureIDMap - The feature ID map, used to discover the number of features.
labelIDMap - The label ID map, used to get the index of the labels.
- Returns:
A Pair of an int array of labels and an array of SGDVector.
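Conceptually, convertToVector maps each token's named features to feature IDs and builds one vector per token, and the labelled overload also looks up each token's label ID. A minimal sketch of that idea with plain collections follows. It assumes features arrive as name-to-value maps, drops names missing from the feature map (an assumption here), and stores each vector as sorted parallel index/value arrays. ConvertSketch and SparseRow are hypothetical stand-ins for Tribuo's SequenceExample, ImmutableFeatureMap and SGDVector types.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-collections sketch of per-token feature and label conversion (hypothetical). */
final class ConvertSketch {
    private ConvertSketch() {}

    /** A sparse vector stored as parallel arrays of ascending feature IDs and their values. */
    static final class SparseRow {
        final int[] indices;
        final double[] values;

        SparseRow(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }
    }

    /** One SparseRow per token; features absent from featureIDMap are dropped (an assumption). */
    static SparseRow[] convert(List<Map<String, Double>> tokens, Map<String, Integer> featureIDMap) {
        SparseRow[] rows = new SparseRow[tokens.size()];
        for (int t = 0; t < rows.length; t++) {
            TreeMap<Integer, Double> sorted = new TreeMap<>(); // keeps feature IDs in ascending order
            for (Map.Entry<String, Double> feature : tokens.get(t).entrySet()) {
                Integer id = featureIDMap.get(feature.getKey());
                if (id != null) {
                    sorted.put(id, feature.getValue());
                }
            }
            int[] indices = new int[sorted.size()];
            double[] values = new double[sorted.size()];
            int i = 0;
            for (Map.Entry<Integer, Double> entry : sorted.entrySet()) {
                indices[i] = entry.getKey();
                values[i] = entry.getValue();
                i++;
            }
            rows[t] = new SparseRow(indices, values);
        }
        return rows;
    }

    /** Label ID for each token; throws NullPointerException on an unknown label. */
    static int[] labels(List<String> tokenLabels, Map<String, Integer> labelIDMap) {
        int[] ids = new int[tokenLabels.size()];
        for (int t = 0; t < ids.length; t++) {
            ids[t] = labelIDMap.get(tokenLabels.get(t));
        }
        return ids;
    }
}
```

Sorting feature IDs per token keeps each row compatible with index-ordered sparse dot products, which is the usual reason sparse vector types store their indices in ascending order.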