Interface DocumentPreprocessor

All Superinterfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
All Known Implementing Classes:
CasingPreprocessor, NewsPreprocessor, RegexPreprocessor

public interface DocumentPreprocessor extends com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
An interface for things that can pre-process documents before they are broken into features.
  • Method Summary

    Modifier and Type    Method                    Description
    String               processDoc(String doc)    Processes the content of part of a document stored as a string, returning a new string.

    Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable

    postConfig

    Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.Provenancable

    getProvenance
  • Method Details

    • processDoc

      String processDoc(String doc)
      Processes the content of part of a document stored as a string, returning a new string.
      Parameters:
      doc - the document to process
      Returns:
      the processed string. The return value may be null, in which case the result is ignored.
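
The processDoc contract above, including the null return case, can be illustrated with a minimal sketch. To stay self-contained it does not depend on Tribuo or OLCUT: SimplePreprocessor is a hypothetical stand-in that mirrors only the documented processDoc(String) signature, while the real DocumentPreprocessor also requires the inherited postConfig and getProvenance methods. The two implementations, the PreprocessorSketch helper, and the way the caller skips null results are assumptions made for illustration, not the library's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in: mirrors only the documented processDoc(String) signature.
// The real DocumentPreprocessor also extends Configurable and Provenancable.
interface SimplePreprocessor {
    String processDoc(String doc);
}

// Illustrative implementation: returns a new, lowercased string.
final class LowercasePreprocessor implements SimplePreprocessor {
    @Override
    public String processDoc(String doc) {
        return doc.toLowerCase(Locale.ROOT);
    }
}

// Illustrative implementation: trims the input and returns null for blank text,
// signalling that this part of the document should be ignored.
final class DropBlankPreprocessor implements SimplePreprocessor {
    @Override
    public String processDoc(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }
}

final class PreprocessorSketch {
    // Applies each step in order; stops early once a step returns null.
    static String applyAll(List<SimplePreprocessor> steps, String doc) {
        String current = doc;
        for (SimplePreprocessor step : steps) {
            current = step.processDoc(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Assumed caller behaviour: parts whose processed value is null are skipped.
    static List<String> processParts(List<SimplePreprocessor> steps, List<String> parts) {
        List<String> kept = new ArrayList<>();
        for (String part : parts) {
            String processed = applyAll(steps, part);
            if (processed != null) {
                kept.add(processed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimplePreprocessor> steps =
                List.of(new DropBlankPreprocessor(), new LowercasePreprocessor());
        // The blank second part yields null and is dropped.
        System.out.println(processParts(steps, List.of("  Hello World ", "   ", "Tribuo")));
        // prints [hello world, tribuo]
    }
}
```

Stopping at the first null keeps later steps from having to handle a null argument themselves; whether the real callers short-circuit in the same way is not stated in this page.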