Class BasicPipeline

java.lang.Object
org.tribuo.data.text.impl.BasicPipeline
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextPipeline

public class BasicPipeline extends Object implements TextPipeline
An example implementation of TextPipeline that generates unique n-gram features.
  • Constructor Summary

    Constructors
    Constructor                                      Description
    BasicPipeline(Tokenizer tokenizer, int ngram)    Constructs a basic text pipeline which tokenizes the input and generates word n-gram features in the range 1 to ngram.
  • Method Summary

    Modifier and Type, Method, Description

    com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()

    void postConfig()
        Used by the OLCUT configuration system, and should not be called by external code.

    List<Feature> process(String tag, String data)
        Extracts a list of features from the supplied text, prepending the tag to each feature name.

    String toString()

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • BasicPipeline

      public BasicPipeline(Tokenizer tokenizer, int ngram)
      Constructs a basic text pipeline which tokenizes the input and generates word n-gram features in the range 1 to ngram.
      Parameters:
      tokenizer - The tokenizer.
      ngram - The maximum n-gram size. N-grams of every size from 1 to ngram are generated.
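      The phrase "in the range 1 to ngram" means that every n-gram size up to the limit contributes features, not just size ngram. The following is a minimal, self-contained sketch of that range behaviour over a pre-tokenized input. It does not use Tribuo: the class name NgramRangeSketch is hypothetical, joining tokens with a single space is an assumption chosen for readability, and the rejection of values below 1 is this sketch's own choice rather than documented BasicPipeline behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emulates "word n-grams in the range 1 to ngram".
// Not Tribuo code; the space-joined n-gram strings are an assumption.
class NgramRangeSketch {

    /** Returns every contiguous n-gram of size 1..maxN, smallest sizes first. */
    static List<String> ngramsUpTo(List<String> tokens, int maxN) {
        if (maxN < 1) {
            // The sketch rejects this; the real class's behaviour is not documented here.
            throw new IllegalArgumentException("maxN must be >= 1, found " + maxN);
        }
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            // A sequence of T tokens has T - n + 1 windows of size n, and none if n > T.
            for (int start = 0; start + n <= tokens.size(); start++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }
}
```

      For the tokens the, quick, brown, fox with maxN = 2, the sketch returns the four unigrams followed by the three bigrams "the quick", "quick brown" and "brown fox", seven strings in total.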
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
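      This page does not describe what BasicPipeline does inside postConfig. As background, an OLCUT-style configuration system instantiates a class, injects its configured fields, and then calls postConfig() once so the object can build any state derived from those fields. The sketch below mimics that sequence with plain reflection. Every type in it (SketchConfigurable, SketchPipeline, SketchLoader) is hypothetical and stands in for, but is not, the real OLCUT API.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OLCUT's Configurable interface; not the real type.
interface SketchConfigurable {
    default void postConfig() {}
}

// Hypothetical component: configured fields are set first, then postConfig()
// derives the remaining state from them.
class SketchPipeline implements SketchConfigurable {
    private int ngram = 2;                                  // a configurable field
    private final List<Integer> sizes = new ArrayList<>();  // derived from ngram

    // No-arg constructor reserved for the hypothetical configuration loader.
    private SketchPipeline() {}

    // Code-facing constructor: sets the field, then finishes setup itself.
    SketchPipeline(int ngram) {
        this.ngram = ngram;
        postConfig();
    }

    @Override
    public void postConfig() {
        sizes.clear(); // makes this sketch's setup safe to rerun
        for (int n = 1; n <= ngram; n++) {
            sizes.add(n);
        }
    }

    List<Integer> sizes() {
        return sizes;
    }
}

// Hypothetical loader following the sequence described above:
// instantiate, inject fields by reflection, then call postConfig() once.
class SketchLoader {
    static SketchPipeline load(int ngram) {
        try {
            Constructor<SketchPipeline> ctor = SketchPipeline.class.getDeclaredConstructor();
            ctor.setAccessible(true);
            SketchPipeline p = ctor.newInstance();
            Field f = SketchPipeline.class.getDeclaredField("ngram");
            f.setAccessible(true);
            f.setInt(p, ngram);
            p.postConfig();
            return p;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not configure SketchPipeline", e);
        }
    }
}
```

      Both paths, the loader and the public constructor, end with postConfig() having run exactly once. That ordering is why external code has no reason to call it directly.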
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • process

      public List<Feature> process(String tag, String data)
      Description copied from interface: TextPipeline
      Extracts a list of features from the supplied text, prepending the tag to each feature name.
      Specified by:
      process in interface TextPipeline
      Parameters:
      tag - The feature name tag.
      data - The text to extract features from.
      Returns:
      The extracted features.
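      This page does not document the exact feature names or values that process returns. The following self-contained sketch emulates a process(tag, data) call under loudly stated assumptions: whitespace splitting stands in for the configured Tokenizer, feature names take the hypothetical form tag + "-" + n-gram, and each distinct name is kept once with value 1.0 to mirror "unique" n-grams. Features are represented as a Map rather than Tribuo's Feature type so the example runs without Tribuo. The class name ProcessSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a process(tag, data)-style method. Assumptions:
//  - whitespace tokenization replaces the configured Tokenizer,
//  - names are tag + "-" + the space-joined n-gram (not BasicPipeline's format),
//  - each distinct name is emitted once with value 1.0.
class ProcessSketch {

    static Map<String, Double> process(String tag, String data, int ngram) {
        List<String> tokens = new ArrayList<>();
        for (String t : data.trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields one empty string; skip it
                tokens.add(t);
            }
        }
        // LinkedHashMap keeps first-seen order; putIfAbsent drops repeated names.
        Map<String, Double> features = new LinkedHashMap<>();
        for (int n = 1; n <= ngram; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                String gram = String.join(" ", tokens.subList(start, start + n));
                features.putIfAbsent(tag + "-" + gram, 1.0);
            }
        }
        return features;
    }
}
```

      With tag "body", text "to be or not to be" and ngram = 2, the repeated unigrams "to" and "be" and the repeated bigram "to be" each appear only once, so the sketch returns 8 features rather than 11.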
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>