Class CRFModel

All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable, ProtoSerializable<org.tribuo.protos.core.SequenceModelProto>

public class CRFModel extends ConfidencePredictingSequenceModel
An inference-time model for a linear chain CRF trained using SGD.

Can be switched to use Viterbi, belief propagation, or constrained BP at test time. By default it uses Viterbi.
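
As an illustration of the default inference mode, here is a minimal, self-contained sketch of Viterbi decoding over a linear chain. It does not use any Tribuo classes. The class name `ViterbiSketch`, the method `decode`, and the dense `local`/`trans` log-score arrays are assumptions made for this example and are not part of the CRFModel API.

```java
/** Hypothetical helper, not part of Tribuo: Viterbi decoding for a linear chain. */
final class ViterbiSketch {
    /**
     * @param local local[t][y] is the log-score of label y at position t.
     * @param trans trans[a][b] is the log-score of moving from label a to label b.
     * @return the highest scoring label sequence.
     */
    static int[] decode(double[][] local, double[][] trans) {
        int n = local.length;
        if (n == 0) {
            return new int[0];
        }
        int k = local[0].length;
        double[][] best = new double[n][k]; // best[t][y]: best score of any path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        best[0] = local[0].clone();
        for (int t = 1; t < n; t++) {
            for (int cur = 0; cur < k; cur++) {
                double max = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int prev = 0; prev < k; prev++) {
                    double s = best[t - 1][prev] + trans[prev][cur];
                    if (s > max) {
                        max = s;
                        arg = prev;
                    }
                }
                best[t][cur] = max + local[t][cur];
                back[t][cur] = arg;
            }
        }
        // Pick the best final label, then follow the back-pointers.
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) {
                last = y;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

In this sketch the transition scores can override a locally preferred label, which is what distinguishes sequence decoding from labelling each token independently.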

See:

 Lafferty J, McCallum A, Pereira FC.
 "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data"
 Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001).
 
  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
  • Method Details

    • deserializeFromProto

      public static CRFModel deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • setConfidenceType

      public void setConfidenceType(CRFModel.ConfidenceType type)
      Sets the inference method used for confidence prediction. CONSTRAINED_BP uses the constrained belief propagation algorithm from Culotta and McCallum 2004, MULTIPLY multiplies the maximum marginal for each token, and NONE uses Viterbi.
      Parameters:
      type - Enum specifying the confidence type.
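
The MULTIPLY option described above can be sketched as follows, assuming the per-token marginals have already been computed (for example by belief propagation). The class `MaxMarginalConfidence`, the method `multiply`, and the half-open `[start, end)` span convention are hypothetical names chosen for this example. They are not Tribuo API.

```java
/** Hypothetical helper, not part of Tribuo: product of per-token maximum marginals. */
final class MaxMarginalConfidence {
    /**
     * @param marginals marginals[t][y] is the marginal probability of label y at token t.
     * @param start     first token of the span (inclusive).
     * @param end       end of the span (exclusive).
     * @return the product of the largest marginal at each token in the span.
     */
    static double multiply(double[][] marginals, int start, int end) {
        double score = 1.0;
        for (int t = start; t < end; t++) {
            double max = 0.0;
            for (double p : marginals[t]) {
                max = Math.max(max, p);
            }
            score *= max;
        }
        return score;
    }
}
```

Because every factor is at most 1, longer spans tend to receive lower scores under this scheme.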
    • getFeatureWeights

      public DenseVector getFeatureWeights(int featureID)
      Get a copy of the weights for feature featureID.
      Parameters:
      featureID - The feature ID.
      Returns:
      The per class weights.
    • getFeatureWeights

      public DenseVector getFeatureWeights(String featureName)
      Get a copy of the weights for feature named featureName.
      Parameters:
      featureName - The feature name.
      Returns:
      The per class weights.
    • predict

      public List<Prediction<Label>> predict(SequenceExample<Label> example)
      Description copied from class: SequenceModel
      Uses the model to predict the output for a single example.
      Specified by:
      predict in class SequenceModel<Label>
      Parameters:
      example - the example to predict.
      Returns:
      the result of the prediction.
    • getTopFeatures

      public Map<String,List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> getTopFeatures(int n)
      Description copied from class: SequenceModel
      Gets the top n features associated with this model.

      If the model does not produce per output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.

      If the model cannot describe its top features then it returns Collections.emptyMap().

      Specified by:
      getTopFeatures in class SequenceModel<Label>
      Parameters:
      n - the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
      Returns:
      a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
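
The contract for n can be illustrated with a small standalone sketch for a single output's weights. It assumes features are ranked by absolute weight, which is an assumption of this example and is not stated above. `TopFeaturesSketch` and `topN` are hypothetical names, not Tribuo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Tribuo: selects the top n features for one output. */
final class TopFeaturesSketch {
    /**
     * @param weights feature name to weight, for a single output.
     * @param n       number of features to return; a negative value returns all of them.
     * @return entries ordered by decreasing absolute weight (an assumed ranking).
     */
    static List<Map.Entry<String, Double>> topN(Map<String, Double> weights, int n) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            sorted.add(Map.entry(e.getKey(), e.getValue())); // defensive copy of each entry
        }
        sorted.sort((a, b) -> Double.compare(Math.abs(b.getValue()), Math.abs(a.getValue())));
        if (n < 0 || n >= sorted.size()) {
            return sorted;
        }
        return new ArrayList<>(sorted.subList(0, n));
    }
}
```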
    • scoreSubsequences

      public <SUB extends ConfidencePredictingSequenceModel.Subsequence> List<Double> scoreSubsequences(SequenceExample<Label> example, List<Prediction<Label>> predictions, List<SUB> subsequences)
      Description copied from class: ConfidencePredictingSequenceModel
      The scoring function for the subsequences. Provides the scores which should be assigned to each subsequence.
      Specified by:
      scoreSubsequences in class ConfidencePredictingSequenceModel
      Type Parameters:
      SUB - The subsequence type.
      Parameters:
      example - The input sequence example.
      predictions - The predictions produced by this model.
      subsequences - The subsequences to score.
      Returns:
      The scores for the subsequences.
    • scoreChunks

      public List<Double> scoreChunks(SequenceExample<Label> example, List<Chunk> chunks)
      Scores the chunks using constrained belief propagation.
      Parameters:
      example - The example to score.
      chunks - The predicted chunks.
      Returns:
      The scores.
    • generateWeightsString

      public String generateWeightsString()
      Generates a human-readable string containing all the weights in this model.
      Returns:
      A string containing all the weight values.
    • serialize

      public org.tribuo.protos.core.SequenceModelProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.protos.core.SequenceModelProto>
      Overrides:
      serialize in class SequenceModel<Label>
      Returns:
      The protobuf.
    • convert

      @Deprecated public static <T extends Output<T>> SparseVector[] convert(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
      Deprecated.
      Converts a SequenceExample into an array of SparseVectors suitable for CRF prediction.
      Type Parameters:
      T - The type parameter of the sequence example.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      Returns:
      An array of SparseVector.
    • convert

      @Deprecated public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SparseVector[]> convert(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
      Deprecated.
      Converts a SequenceExample into an array of SparseVectors and labels suitable for CRF prediction.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      labelIDMap - The label id map, used to get the index of the labels.
      Returns:
      A Pair of an int array of labels and an array of SparseVector.
    • convertToVector

      public static <T extends Output<T>> SGDVector[] convertToVector(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
      Converts a SequenceExample into an array of SGDVectors suitable for CRF prediction.
      Type Parameters:
      T - The type parameter of the sequence example.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      Returns:
      An array of SGDVector.
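
To make the role of the feature ID map concrete, the following standalone sketch converts one token's named features into an index/value representation. It is not the Tribuo implementation: `SparseConversionSketch`, `Sparse`, and `toSparse` are hypothetical, plain maps stand in for the example and for ImmutableFeatureMap, and silently dropping unknown feature names is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Tribuo: maps named features to indexed sparse values. */
final class SparseConversionSketch {
    /** Minimal stand-in for a sparse vector: parallel arrays sorted by index. */
    static final class Sparse {
        final int dimension;
        final int[] indices;
        final double[] values;

        Sparse(int dimension, int[] indices, double[] values) {
            this.dimension = dimension;
            this.indices = indices;
            this.values = values;
        }
    }

    /**
     * @param features   feature name to value for one token.
     * @param featureIds feature name to integer ID; its size gives the vector dimension.
     * @return the sparse representation, ignoring names missing from featureIds.
     */
    static Sparse toSparse(Map<String, Double> features, Map<String, Integer> featureIds) {
        TreeMap<Integer, Double> byId = new TreeMap<>(); // keeps indices in ascending order
        for (Map.Entry<String, Double> e : features.entrySet()) {
            Integer id = featureIds.get(e.getKey());
            if (id != null) {
                byId.put(id, e.getValue());
            }
        }
        int[] indices = new int[byId.size()];
        double[] values = new double[byId.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : byId.entrySet()) {
            indices[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new Sparse(featureIds.size(), indices, values);
    }
}
```

Applying `toSparse` to each token of a sequence in order would yield an array with one vector per token, mirroring the shape of the return value described above.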
    • convertToVector

      public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SGDVector[]> convertToVector(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
      Converts a SequenceExample into an array of SGDVectors and labels suitable for CRF prediction.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      labelIDMap - The label id map, used to get the index of the labels.
      Returns:
      A Pair of an int array of labels and an array of SGDVector.