Class NewsPreprocessor

java.lang.Object
org.tribuo.data.text.impl.NewsPreprocessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, DocumentPreprocessor

public class NewsPreprocessor extends Object implements DocumentPreprocessor
A document preprocessor for the 20 Newsgroups data. It takes a newsgroup message as a string and reduces it to the message's subject and body. It handles several irregular cases, such as messages with no headers or with no identifiable subject.
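The reduction described above can be illustrated with a minimal, self-contained sketch. This is not Tribuo's implementation: the class `NewsMessageSketch`, the method `subjectAndBody`, and the assumptions that headers end at the first blank line and that the subject is joined to the body with a newline are all hypothetical choices made for illustration.

```java
// Hypothetical sketch only; NOT the actual NewsPreprocessor implementation.
final class NewsMessageSketch {

    /**
     * Reduces a newsgroup message to its subject followed by its body.
     * Assumes headers are "Name: value" lines ending at the first blank line.
     */
    static String subjectAndBody(String message) {
        String normalized = message.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n");

        // No blank line at all: treat the whole message as body (no headers).
        if (split < 0) {
            return normalized.trim();
        }

        String[] headerLines = normalized.substring(0, split).split("\n");

        // The first block does not look like headers: keep everything as body.
        if (!headerLines[0].matches("[A-Za-z][A-Za-z0-9-]*:.*")) {
            return normalized.trim();
        }

        String body = normalized.substring(split + 2).trim();

        // Look for a Subject header, ignoring case.
        String subject = null;
        for (String line : headerLines) {
            if (line.regionMatches(true, 0, "Subject:", 0, 8)) {
                subject = line.substring(8).trim();
                break;
            }
        }

        // Subject not found (or empty): fall back to the body alone.
        if (subject == null || subject.isEmpty()) {
            return body;
        }
        return subject + "\n" + body;
    }

    public static void main(String[] args) {
        String msg = "From: a@b.com\nSubject: Hello world\nLines: 2\n\nFirst line.\nSecond line.\n";
        System.out.println(subjectAndBody(msg));
    }
}
```

The fallbacks mirror the irregular cases named in the class description: a message without headers is kept whole, and a message without a subject yields only its body.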
  • Constructor Details

    • NewsPreprocessor

      public NewsPreprocessor()
      Creates a new NewsPreprocessor; the constructor takes no arguments.
  • Method Details

    • processDoc

      public String processDoc(String doc)
      Description copied from interface: DocumentPreprocessor
      Processes the content of part of a document stored as a string, returning a new string.
      Specified by:
      processDoc in interface DocumentPreprocessor
      Parameters:
      doc - the document to process
      Returns:
      the processed string. The return value may be null, in which case the result is ignored.
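      Because processDoc may return null, calling code needs a null check before using the result. The sketch below shows one way a caller might do this. It is an illustration under stated assumptions: `NullAwareCaller` and `processAll` are hypothetical names, and a `java.util.function.UnaryOperator<String>` stands in for a preprocessor so the example compiles without Tribuo on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical caller; the UnaryOperator stands in for DocumentPreprocessor.processDoc.
final class NullAwareCaller {

    /** Applies the preprocessor to each document and skips any null result. */
    static List<String> processAll(List<String> docs, UnaryOperator<String> preprocessor) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            String processed = preprocessor.apply(doc);
            if (processed != null) {
                kept.add(processed); // null results are ignored
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in preprocessor: trims text and returns null for blank input.
        UnaryOperator<String> trimOrNull = s -> s.trim().isEmpty() ? null : s.trim();
        System.out.println(processAll(Arrays.asList("first ", "   ", "second"), trimOrNull));
    }
}
```

      With the real library, a method reference such as `new NewsPreprocessor()::processDoc` could presumably be passed in place of the stand-in lambda, since processDoc takes and returns a String.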
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>