Class CRFModel

All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable

public class CRFModel extends ConfidencePredictingSequenceModel
An inference-time model for a linear-chain CRF trained using SGD.

Inference can be switched between Viterbi, belief propagation, and constrained belief propagation (BP) at test time. Viterbi is the default.

See:

 Lafferty J, McCallum A, Pereira FC.
 "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data"
 Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001).
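
The default Viterbi decoding can be illustrated with a small, self-contained sketch. It is not this class's implementation: the class name ViterbiSketch and the dense score arrays local and trans are hypothetical, and the scores are assumed to be in log space.

```java
// Illustrative sketch only; the array layout is an assumption, not the library's internals.
final class ViterbiSketch {
    /**
     * Finds the highest-scoring label sequence for a linear chain.
     *
     * @param local local[t][y] is the score of label y at position t.
     * @param trans trans[i][j] is the score of moving from label i to label j.
     * @return The best label index for each position.
     */
    static int[] decode(double[][] local, double[][] trans) {
        int n = local.length;
        if (n == 0) {
            return new int[0];
        }
        int k = local[0].length;
        double[][] best = new double[n][k]; // best[t][j]: best score of any path ending in j at t
        int[][] back = new int[n][k];       // back[t][j]: predecessor label on that best path
        best[0] = local[0].clone();
        for (int t = 1; t < n; t++) {
            for (int j = 0; j < k; j++) {
                double max = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int i = 0; i < k; i++) {
                    double s = best[t - 1][i] + trans[i][j];
                    if (s > max) {
                        max = s;
                        arg = i;
                    }
                }
                best[t][j] = max + local[t][j];
                back[t][j] = arg;
            }
        }
        // Pick the best final label, then follow the back-pointers.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (best[n - 1][j] > best[n - 1][last]) {
                last = j;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

With all-zero transition scores the result is the per-position argmax of local. A transition matrix that rewards staying on the same label can override an individual position's local preference.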
 
  • Method Details

    • setConfidenceType

      public void setConfidenceType(CRFModel.ConfidenceType type)
      Sets the inference method used for confidence prediction: CONSTRAINED_BP uses the constrained belief propagation algorithm from Culotta and McCallum 2004, MULTIPLY multiplies the maximum marginal for each token, and NONE uses Viterbi.
      Parameters:
      type - Enum specifying the confidence type.
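
The MULTIPLY option described above can be sketched in a few lines. This is a minimal illustration under assumptions: MultiplyConfidenceSketch, the marginals array, and the half-open [start, end) span are hypothetical names and conventions, not part of this class.

```java
// Illustrative sketch only; not the library's implementation.
final class MultiplyConfidenceSketch {
    /**
     * Scores a token span by multiplying each token's maximum marginal.
     *
     * @param marginals marginals[t][y] is the marginal probability of label y at token t.
     * @param start The first token of the span (inclusive).
     * @param end The end of the span (exclusive).
     * @return The product of the per-token maximum marginals.
     */
    static double score(double[][] marginals, int start, int end) {
        double product = 1.0;
        for (int t = start; t < end; t++) {
            double max = 0.0;
            for (double p : marginals[t]) {
                max = Math.max(max, p);
            }
            product *= max;
        }
        return product;
    }
}
```

Because every factor is at most 1, the score can only shrink as the span grows, so longer subsequences tend to receive lower confidence under this scheme.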
    • getFeatureWeights

      public DenseVector getFeatureWeights(int featureID)
      Gets a copy of the weights for the feature with ID featureID.
      Parameters:
      featureID - The feature ID.
      Returns:
      The per-class weights.
    • getFeatureWeights

      public DenseVector getFeatureWeights(String featureName)
      Gets a copy of the weights for the feature named featureName.
      Parameters:
      featureName - The feature name.
      Returns:
      The per-class weights.
    • predict

      public List<Prediction<Label>> predict(SequenceExample<Label> example)
      Description copied from class: SequenceModel
      Uses the model to predict the output for a single example.
      Specified by:
      predict in class SequenceModel<Label>
      Parameters:
      example - the example to predict.
      Returns:
      the result of the prediction.
    • getTopFeatures

      public Map<String,List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> getTopFeatures(int n)
      Description copied from class: SequenceModel
      Gets the top n features associated with this model.

      If the model does not produce per output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.

      If the model cannot describe its top features then it returns Collections.emptyMap().

      Specified by:
      getTopFeatures in class SequenceModel<Label>
      Parameters:
      n - the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
      Returns:
      a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
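
The contract above (a per-output ordered list, with a negative n meaning all features) can be sketched as follows. The class TopFeaturesSketch and its dense weights array are hypothetical, Map.Entry stands in for Pair, and ranking by absolute weight is an assumption of this sketch rather than something this page states.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the library's implementation.
final class TopFeaturesSketch {
    /**
     * Builds a map from each label to its n highest-magnitude features.
     *
     * @param labels The label names.
     * @param features The feature names.
     * @param weights weights[l][f] is the weight of feature f for label l.
     * @param n The number of features per label; a negative value returns all of them.
     */
    static Map<String, List<Map.Entry<String, Double>>> topFeatures(
            String[] labels, String[] features, double[][] weights, int n) {
        Map<String, List<Map.Entry<String, Double>>> out = new LinkedHashMap<>();
        for (int l = 0; l < labels.length; l++) {
            List<Map.Entry<String, Double>> scored = new ArrayList<>();
            for (int f = 0; f < features.length; f++) {
                scored.add(new AbstractMap.SimpleImmutableEntry<String, Double>(features[f], weights[l][f]));
            }
            // Assumption: order by descending absolute weight.
            scored.sort((a, b) -> Double.compare(Math.abs(b.getValue()), Math.abs(a.getValue())));
            int limit = n < 0 ? scored.size() : Math.min(n, scored.size());
            out.put(labels[l], new ArrayList<>(scored.subList(0, limit)));
        }
        return out;
    }
}
```

Sorting by magnitude keeps strongly negative weights visible, which is often useful when inspecting which features argue against a label.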
    • scoreSubsequences

      public <SUB extends ConfidencePredictingSequenceModel.Subsequence> List<Double> scoreSubsequences(SequenceExample<Label> example, List<Prediction<Label>> predictions, List<SUB> subsequences)
      Description copied from class: ConfidencePredictingSequenceModel
      The scoring function for the subsequences. Provides the scores which should be assigned to each subsequence.
      Specified by:
      scoreSubsequences in class ConfidencePredictingSequenceModel
      Type Parameters:
      SUB - The subsequence type.
      Parameters:
      example - The input sequence example.
      predictions - The predictions produced by this model.
      subsequences - The subsequences to score.
      Returns:
      The scores for the subsequences.
    • scoreChunks

      public List<Double> scoreChunks(SequenceExample<Label> example, List<Chunk> chunks)
      Scores the chunks using constrained belief propagation.
      Parameters:
      example - The example to score.
      chunks - The predicted chunks.
      Returns:
      The scores.
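
One way to picture constrained belief propagation on a linear chain is as a ratio of two forward passes: one with the chunk's labels pinned and one unconstrained. The sketch below illustrates that idea only. ConstrainedForwardSketch, the log-space arrays local and trans, and the constraint encoding (-1 meaning free) are all hypothetical, and it is not this method's implementation.

```java
import java.util.Arrays;

// Illustrative sketch only; not the library's implementation.
final class ConstrainedForwardSketch {
    /** Computes log(sum(exp(x))) stably. */
    static double logSumExp(double[] x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double v : x) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    /** Log partition function; constraint[t] >= 0 pins position t to that label, -1 leaves it free. */
    static double logZ(double[][] local, double[][] trans, int[] constraint) {
        int n = local.length;
        int k = local[0].length;
        double[] alpha = new double[k];
        for (int j = 0; j < k; j++) {
            boolean allowed = constraint[0] < 0 || constraint[0] == j;
            alpha[j] = allowed ? local[0][j] : Double.NEGATIVE_INFINITY;
        }
        double[] tmp = new double[k];
        for (int t = 1; t < n; t++) {
            double[] next = new double[k];
            for (int j = 0; j < k; j++) {
                if (constraint[t] >= 0 && constraint[t] != j) {
                    next[j] = Double.NEGATIVE_INFINITY; // label j is ruled out at position t
                    continue;
                }
                for (int i = 0; i < k; i++) {
                    tmp[i] = alpha[i] + trans[i][j];
                }
                next[j] = logSumExp(tmp) + local[t][j];
            }
            alpha = next;
        }
        return logSumExp(alpha);
    }

    /** Probability that positions start..start+chunkLabels.length-1 take exactly chunkLabels. */
    static double chunkConfidence(double[][] local, double[][] trans, int start, int[] chunkLabels) {
        int[] free = new int[local.length];
        Arrays.fill(free, -1);
        int[] pinned = free.clone();
        System.arraycopy(chunkLabels, 0, pinned, start, chunkLabels.length);
        return Math.exp(logZ(local, trans, pinned) - logZ(local, trans, free));
    }
}
```

Unlike a product of per-token marginals, this ratio scores the chunk's labels jointly, so it accounts for the transitions inside the chunk.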
    • generateWeightsString

      public String generateWeightsString()
      Generates a human-readable string containing all the weights in this model.
      Returns:
      A string containing all the weight values.
    • convert

      @Deprecated public static <T extends Output<T>> SparseVector[] convert(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
      Deprecated.
      Converts a SequenceExample into an array of SparseVectors suitable for CRF prediction.
      Type Parameters:
      T - The type parameter of the sequence example.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      Returns:
      An array of SparseVector.
    • convert

      @Deprecated public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SparseVector[]> convert(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
      Deprecated.
      Converts a SequenceExample into an array of SparseVectors and labels suitable for CRF prediction.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      labelIDMap - The label id map, used to get the index of the labels.
      Returns:
      A Pair of an int array of labels and an array of SparseVector.
    • convertToVector

      public static <T extends Output<T>> SGDVector[] convertToVector(SequenceExample<T> example, ImmutableFeatureMap featureIDMap)
      Converts a SequenceExample into an array of SGDVectors suitable for CRF prediction.
      Type Parameters:
      T - The type parameter of the sequence example.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      Returns:
      An array of SGDVector.
    • convertToVector

      public static com.oracle.labs.mlrg.olcut.util.Pair<int[],SGDVector[]> convertToVector(SequenceExample<Label> example, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<Label> labelIDMap)
      Converts a SequenceExample into an array of SGDVectors and labels suitable for CRF prediction.
      Parameters:
      example - The sequence example to convert.
      featureIDMap - The feature id map, used to discover the number of features.
      labelIDMap - The label id map, used to get the index of the labels.
      Returns:
      A Pair of an int array of labels and an array of SGDVector.
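
The conversion step can be pictured as mapping each token's named features to integer IDs and storing them as sorted index/value arrays. The sketch below is a stand-alone illustration under assumptions: SparseConvertSketch and its Sparse holder are hypothetical stand-ins for the real vector types, plain maps stand in for the feature ID map, and silently dropping features missing from the map is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not the library's implementation.
final class SparseConvertSketch {
    /** One token's features as parallel arrays sorted by feature ID. */
    static final class Sparse {
        final int dimension;
        final int[] indices;
        final double[] values;

        Sparse(int dimension, int[] indices, double[] values) {
            this.dimension = dimension;
            this.indices = indices;
            this.values = values;
        }
    }

    /**
     * Converts a sequence of named feature maps into sparse vectors.
     *
     * @param tokens One map of feature name to value per token.
     * @param featureIds The feature name to ID map; its size is the vector dimension.
     */
    static Sparse[] convert(List<Map<String, Double>> tokens, Map<String, Integer> featureIds) {
        Sparse[] out = new Sparse[tokens.size()];
        for (int t = 0; t < tokens.size(); t++) {
            // TreeMap keeps the IDs in ascending order.
            TreeMap<Integer, Double> sorted = new TreeMap<>();
            for (Map.Entry<String, Double> e : tokens.get(t).entrySet()) {
                Integer id = featureIds.get(e.getKey());
                if (id != null) { // assumption: features missing from the map are dropped
                    sorted.put(id, e.getValue());
                }
            }
            int[] idx = new int[sorted.size()];
            double[] val = new double[sorted.size()];
            int i = 0;
            for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
                idx[i] = e.getKey();
                val[i] = e.getValue();
                i++;
            }
            out[t] = new Sparse(featureIds.size(), idx, val);
        }
        return out;
    }
}
```

The labelled overload would additionally look up each token's label in a label ID map and return the resulting int array alongside the vectors.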