java.lang.Object

org.tribuo.data.text.impl.NgramProcessor

All Implemented Interfaces: com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextProcessor

public class NgramProcessor extends Object implements TextProcessor

A text processor that tokenizes its input and generates token ngrams of a fixed size n.
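The following minimal sketch illustrates what "token ngrams" means here by applying the sliding-window technique to an already-tokenized input. The class `NgramWindowSketch`, its `ngrams` method, and the single-space separator are hypothetical. This page does not say how Tribuo joins tokens or names the resulting features.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of token ngram generation; not Tribuo's implementation.
final class NgramWindowSketch {
    private NgramWindowSketch() {}

    /**
     * Slides a window of n tokens across the input and joins each window with a space.
     * Returns tokens.size() - n + 1 ngrams, or an empty list when there are fewer than n tokens.
     */
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1, found " + n);
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            result.add(String.join(" ", tokens.subList(start, start + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        System.out.println(ngrams(tokens, 2)); // [the quick, quick brown, brown fox]
    }
}
```

A sequence of k tokens yields k - n + 1 ngrams when k >= n and none otherwise. For this reason, the loop condition checks `start + n <= tokens.size()`.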

Constructor Summary

Constructors

- NgramProcessor(Tokenizer tokenizer, int n, double value): Creates a processor that will generate token ngrams of size n.

Method Summary

- com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
- void postConfig(): Used by the OLCUT configuration system, and should not be called by external code.
- List<Feature> process(String text): Extracts features from the supplied text.
- List<Feature> process(String tag, String text): Extracts features from the supplied text.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- NgramProcessor
  
  public NgramProcessor(Tokenizer tokenizer, int n, double value)
  
  Creates a processor that will generate token ngrams of size n.
  
  Parameters:
  
  tokenizer - The tokenizer used to split the text into tokens.
  
  n - The size of the ngrams to generate, in tokens.
  
  value - The value assigned to each generated feature.
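This hedged sketch shows the roles of the three constructor arguments. A `java.util.function.Function` stands in for `Tokenizer`, and a small hypothetical `SimpleFeature` class stands in for `Feature`. Every ngram is emitted with the same `value`. The sketch also emits one feature per occurrence and does not merge repeated ngrams. This page does not state whether `NgramProcessor` or downstream code aggregates duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in showing how tokenizer, n and value interact; not Tribuo's implementation.
final class ConstructorParamsSketch {
    /** Hypothetical minimal (name, value) pair standing in for Feature. */
    static final class SimpleFeature {
        final String name;
        final double value;

        SimpleFeature(String name, double value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    private final Function<String, List<String>> tokenizer; // stands in for Tokenizer
    private final int n;                                     // number of tokens per ngram
    private final double value;                              // value given to every generated feature

    ConstructorParamsSketch(Function<String, List<String>> tokenizer, int n, double value) {
        this.tokenizer = tokenizer;
        this.n = n;
        this.value = value;
    }

    /** Emits one feature per ngram occurrence, each carrying the same value. */
    List<SimpleFeature> features(String text) {
        List<String> tokens = tokenizer.apply(text);
        List<SimpleFeature> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(new SimpleFeature(String.join(" ", tokens.subList(i, i + n)), value));
        }
        return out;
    }

    public static void main(String[] args) {
        // Whitespace splitting stands in for a real Tokenizer.
        ConstructorParamsSketch sketch = new ConstructorParamsSketch(s -> List.of(s.split("\\s+")), 2, 1.0);
        System.out.println(sketch.features("to be or not to be"));
        // [to be=1.0, be or=1.0, or not=1.0, not to=1.0, to be=1.0]
    }
}
```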
Method Details
- postConfig
  
  public void postConfig()
  
  Used by the OLCUT configuration system, and should not be called by external code.
  
  Specified by:
  
  postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
- process
  
  public List<Feature> process(String text) throws TextProcessingException
  
  Description copied from interface: TextProcessor
  
  Extracts features from the supplied text.
  
  Specified by:
  
  process in interface TextProcessor
  
  Parameters:
  
  text - The text to extract features from.
  
  Returns:
  
  The extracted features.
  
  Throws:
  
  TextProcessingException - If an error occurred during extraction (usually from tokenization).
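Because `TextProcessingException` is checked, callers must handle tokenization failures. This sketch assumes the processor wraps a runtime error from its tokenizer into a checked exception. `SketchTextProcessingException` and `ProcessTextSketch` are hypothetical stand-ins, not Tribuo types, and the wrapping behavior is an assumption. In this sketch, input with fewer than n tokens yields an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical checked exception standing in for TextProcessingException.
class SketchTextProcessingException extends Exception {
    SketchTextProcessingException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical single-argument process(text); tokenizer failures surface as a checked exception.
final class ProcessTextSketch {
    private final Function<String, List<String>> tokenizer;
    private final int n;

    ProcessTextSketch(Function<String, List<String>> tokenizer, int n) {
        this.tokenizer = tokenizer;
        this.n = n;
    }

    List<String> process(String text) throws SketchTextProcessingException {
        List<String> tokens;
        try {
            tokens = tokenizer.apply(text);
        } catch (RuntimeException e) {
            // Assumed behavior: wrap tokenization errors in a single checked exception type.
            throw new SketchTextProcessingException("Tokenization failed", e);
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams; // empty when the text has fewer than n tokens
    }

    public static void main(String[] args) {
        ProcessTextSketch processor = new ProcessTextSketch(text -> {
            if (text == null) {
                throw new IllegalArgumentException("null text");
            }
            return List.of(text.trim().split("\\s+"));
        }, 2);
        try {
            System.out.println(processor.process("hello"));       // []
            System.out.println(processor.process("hello world")); // [hello world]
            processor.process(null);
        } catch (SketchTextProcessingException e) {
            System.out.println("caught: " + e.getCause().getMessage()); // caught: null text
        }
    }
}
```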
- process
  
  public List<Feature> process(String tag, String text) throws TextProcessingException
  
  Description copied from interface: TextProcessor
  
  Extracts features from the supplied text.
  
  Specified by:
  
  process in interface TextProcessor
  
  Parameters:
  
  tag - The feature name tag.
  
  text - The text to extract features from.
  
  Returns:
  
  The extracted features.
  
  Throws:
  
  TextProcessingException - If an error occurred during extraction (usually from tokenization).
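The two-argument overload takes a tag that becomes part of each feature name. One plausible use is keeping ngrams from different fields, such as a title and a body, distinct. The `tag:ngram` naming format and the class `TaggedNgramSketch` are hypothetical. This page does not document the exact feature names Tribuo produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the tag namespaces feature names. The "tag:ngram" format is an assumption.
final class TaggedNgramSketch {
    private TaggedNgramSketch() {}

    static List<String> featureNames(String tag, List<String> tokens, int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            // Prefixing with the tag keeps identical ngrams from different fields distinct.
            names.add(tag + ":" + gram);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("red", "car");
        System.out.println(featureNames("title", tokens, 2)); // [title:red car]
        System.out.println(featureNames("body", tokens, 2));  // [body:red car]
    }
}
```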
- getProvenance
  
  public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
  
  Specified by:
  
  getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
