Class NgramProcessor

java.lang.Object
org.tribuo.data.text.impl.NgramProcessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextProcessor

public class NgramProcessor extends Object implements TextProcessor
A text processor that generates token ngrams of a fixed size.
  • Constructor Details

    • NgramProcessor

      public NgramProcessor(Tokenizer tokenizer, int n, double value)
      Creates a processor that will generate token ngrams of size n.
      Parameters:
      tokenizer - The tokenizer to use to process text.
      n - The size of the ngram to generate.
      value - The value assigned to each generated feature.
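      To illustrate what "token ngrams of size n" means, here is a minimal standalone sketch. It does not use Tribuo and is not the library's implementation: the class NgramSketch, the helper ngrams, and the space separator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not Tribuo's implementation.
final class NgramSketch {

    // Hypothetical helper: joins each run of n consecutive tokens with a
    // space. The separator is an assumption chosen for readability.
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive, found " + n);
        }
        List<String> out = new ArrayList<>();
        // Slide a window of width n across the tokens; inputs shorter
        // than n produce no ngrams.
        for (int start = 0; start + n <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        // With n = 2, four tokens yield three overlapping bigrams.
        System.out.println(ngrams(tokens, 2));
    }
}
```

      In this sketch a sequence of k tokens yields k - n + 1 ngrams when k >= n, and none otherwise. How the real class treats short inputs is not stated in this page.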
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
    • process

      public List<Feature> process(String text) throws TextProcessingException
      Description copied from interface: TextProcessor
      Extracts features from the supplied text.
      Specified by:
      process in interface TextProcessor
      Parameters:
      text - The text to extract features from.
      Returns:
      The extracted features.
      Throws:
      TextProcessingException - If an error occurred during extraction (usually from tokenization).
    • process

      public List<Feature> process(String tag, String text) throws TextProcessingException
      Description copied from interface: TextProcessor
      Extracts features from the supplied text.
      Specified by:
      process in interface TextProcessor
      Parameters:
      tag - The feature name tag.
      text - The text to extract features from.
      Returns:
      The extracted features.
      Throws:
      TextProcessingException - If an error occurred during extraction (usually from tokenization).
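      The interplay of the tag, the ngram size, and the feature value can be sketched without Tribuo as follows. Everything here is hypothetical: the NamedValue type stands in for Feature, whitespace splitting stands in for the configured Tokenizer, and the "tag:ngram" naming scheme is an assumption rather than the names the library actually emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the types and the naming scheme below are
// assumptions, not Tribuo's Feature class or its real feature names.
final class TaggedNgramSketch {

    // Hypothetical stand-in for a named feature with a numeric value.
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Emits one NamedValue per ngram, prefixing the name with the tag
    // when a non-empty tag is supplied.
    static List<NamedValue> process(String tag, String text, int n, double value) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive, found " + n);
        }
        List<NamedValue> features = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        // Whitespace splitting is a simplification of real tokenization.
        String[] tokens = trimmed.split("\\s+");
        String prefix = (tag == null || tag.isEmpty()) ? "" : tag + ":";
        for (int start = 0; start + n <= tokens.length; start++) {
            String ngram = String.join(" ", Arrays.copyOfRange(tokens, start, start + n));
            features.add(new NamedValue(prefix + ngram, value));
        }
        return features;
    }

    public static void main(String[] args) {
        for (NamedValue f : process("title", "the quick brown fox", 2, 1.0)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

      Every feature carries the same constant value, so the output records which ngrams are present rather than how often each one occurs. Repeated ngrams appear as repeated entries in this sketch.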
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>