Class RegexPreprocessor

java.lang.Object
org.tribuo.data.text.impl.RegexPreprocessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, DocumentPreprocessor

public final class RegexPreprocessor extends Object implements DocumentPreprocessor
A simple document preprocessor that applies regular-expression replacements to the input text.
  • Constructor Details

    • RegexPreprocessor

      public RegexPreprocessor(List<String> regexStrings, List<String> replacements)
      Construct a regex preprocessor.
      Parameters:
      regexStrings - A list of strings containing regular expressions.
      replacements - A list of replacement strings, used to replace matches of the corresponding regular expressions in the input.
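A minimal sketch of how the two constructor lists might interact. It uses only java.util.regex, so it runs without Tribuo on the classpath. RegexPreprocessorSketch is a hypothetical name, and the behavior it encodes is an assumption rather than Tribuo's source: regexStrings.get(i) is paired with replacements.get(i), the pairs are applied in list order, and each pair is applied with Matcher.replaceAll, so $1 in a replacement refers to a capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for RegexPreprocessor, for illustration only.
 * Assumes regexStrings.get(i) pairs with replacements.get(i) and that
 * the pairs are applied in list order.
 */
final class RegexPreprocessorSketch {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements;

    RegexPreprocessorSketch(List<String> regexStrings, List<String> replacements) {
        if (regexStrings.size() != replacements.size()) {
            throw new IllegalArgumentException("Expected one replacement per regex");
        }
        for (String regex : regexStrings) {
            patterns.add(Pattern.compile(regex)); // compile once, reuse for every document
        }
        this.replacements = new ArrayList<>(replacements);
    }

    String processDoc(String doc) {
        String result = doc;
        for (int i = 0; i < patterns.size(); i++) {
            // replaceAll treats $n in the replacement as a reference to capture group n
            result = patterns.get(i).matcher(result).replaceAll(replacements.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        RegexPreprocessorSketch pre = new RegexPreprocessorSketch(
                List.of("https?://\\S+", "\\s+", "(\\d+)%"),
                List.of("<URL>", " ", "$1 percent"));
        // prints: See <URL> for 99 percent accuracy
        System.out.println(pre.processDoc("See  https://tribuo.org for 99%\taccuracy"));
    }
}
```

Under this assumption the lists behave like a pipeline: each pattern sees the output of the previous one, so rewriting "a" to "b" and then "b" to "c" turns "ab" into "cc".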
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
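The description above only says that postConfig is invoked by the OLCUT configuration system. The sketch below is a hypothetical illustration of a common pattern for such a hook: the framework fills in the configured fields first, then calls the hook once to build derived state (here, compiled Pattern objects) and to fail fast on bad input. ConfiguredRegexes and PostConfigurable are invented stand-ins, not OLCUT or Tribuo types, and whether RegexPreprocessor performs these exact checks is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical configurable object; fields are assumed to be set before postConfig runs. */
final class ConfiguredRegexes implements PostConfigurable {
    // Raw configuration values, which a config framework would inject.
    List<String> regexStrings = new ArrayList<>();
    List<String> replacements = new ArrayList<>();

    // Derived state built from the raw configuration.
    private final List<Pattern> patterns = new ArrayList<>();

    @Override
    public void postConfig() {
        if (regexStrings.size() != replacements.size()) {
            throw new IllegalStateException("regexStrings has " + regexStrings.size()
                    + " entries but replacements has " + replacements.size());
        }
        patterns.clear();
        for (String regex : regexStrings) {
            try {
                patterns.add(Pattern.compile(regex));
            } catch (PatternSyntaxException e) {
                throw new IllegalStateException("Invalid regex: " + regex, e);
            }
        }
    }

    int patternCount() {
        return patterns.size();
    }

    public static void main(String[] args) {
        ConfiguredRegexes c = new ConfiguredRegexes();
        c.regexStrings = List.of("\\d+", "[A-Z]+"); // simulate field injection
        c.replacements = List.of("<NUM>", "<CAPS>");
        c.postConfig(); // then the hook runs once
        System.out.println(c.patternCount()); // prints 2
    }
}

/** Hypothetical stand-in for a post-configuration hook; not the OLCUT interface. */
interface PostConfigurable {
    void postConfig();
}
```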
    • processDoc

      public String processDoc(String doc)
      Description copied from interface: DocumentPreprocessor
      Processes part of a document, supplied as a string, and returns a new string.
      Specified by:
      processDoc in interface DocumentPreprocessor
      Parameters:
      doc - the document to process
      Returns:
      the processed string, or null; a null return means the input is ignored.
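The null return in the contract above lets a preprocessor drop an input entirely. A hypothetical driver that chains several preprocessors and skips any input for which a stage returns null might look like the following. Preprocessor and PreprocessorChain are illustrative stand-ins for DocumentPreprocessor and for Tribuo's own data-loading code, which may handle null differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver: applies stages in order and stops at the first null. */
final class PreprocessorChain {
    static String applyAll(List<Preprocessor> stages, String doc) {
        String text = doc;
        for (Preprocessor stage : stages) {
            text = stage.processDoc(text);
            if (text == null) {
                return null; // a null result means "ignore this input"
            }
        }
        return text;
    }

    public static void main(String[] args) {
        List<Preprocessor> stages = List.of(
                String::trim,
                s -> s.isEmpty() ? null : s, // drop blank documents
                s -> s.replaceAll("\\s+", " "));
        List<String> kept = new ArrayList<>();
        for (String doc : List.of("  hello   world ", "   ", "tribuo")) {
            String out = applyAll(stages, doc);
            if (out != null) {
                kept.add(out);
            }
        }
        System.out.println(kept); // prints [hello world, tribuo]
    }
}

/** Hypothetical stand-in with the same shape as DocumentPreprocessor.processDoc. */
@FunctionalInterface
interface Preprocessor {
    String processDoc(String doc);
}
```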
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>