Package org.tribuo.data.text.impl
Class NewsPreprocessor
java.lang.Object
    org.tribuo.data.text.impl.NewsPreprocessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, DocumentPreprocessor
A document preprocessor for 20 Newsgroups data. This processor takes a newsgroup message stored in a string and reduces it to the message's subject and body. It handles a variety of unusual conditions (e.g., missing headers, or a subject line that cannot be found).
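The reduction described above can be illustrated with a small standalone sketch. It is not Tribuo's NewsPreprocessor code and its rules may differ from the real class: it assumes RFC 822-style messages whose header block ends at the first blank line, treats a message whose first line does not look like a "Name: value" header as having no headers, and returns the subject and body joined by a newline. The class name NewsSubjectBodySketch and its methods are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Illustrative sketch only, NOT Tribuo's NewsPreprocessor implementation.
// Assumption: headers end at the first blank line (RFC 822 style).
final class NewsSubjectBodySketch {

    // Assumed heuristic: a header line looks like "Name: value".
    private static final Pattern HEADER = Pattern.compile("[A-Za-z][A-Za-z0-9-]*:.*");

    // Reduces a raw message to "subject\nbody", falling back to the body alone.
    static String process(String doc) {
        if (doc == null) {
            return null;
        }
        // Normalise Windows line endings so the blank-line search also works for \r\n input.
        String text = doc.replace("\r\n", "\n");
        int split = text.indexOf("\n\n");
        String firstLine = text.lines().findFirst().orElse("");
        if (split < 0 || !HEADER.matcher(firstLine).matches()) {
            // No recognisable header block: treat the whole message as body.
            return text.strip();
        }
        String headers = text.substring(0, split);
        String body = text.substring(split + 2).strip();
        String subject = findSubject(headers).orElse("");
        if (subject.isEmpty()) {
            // Headers present but no Subject line could be found: keep only the body.
            return body;
        }
        return body.isEmpty() ? subject : subject + "\n" + body;
    }

    private static Optional<String> findSubject(String headers) {
        for (String line : headers.split("\n")) {
            // Header names are case-insensitive, so match "subject:" ignoring case.
            if (line.regionMatches(true, 0, "subject:", 0, 8)) {
                return Optional.of(line.substring(8).strip());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String msg = "From: alice@example.com\n"
                + "Subject: Re: orbit question\n"
                + "\n"
                + "The probe uses a Hohmann transfer.\n";
        System.out.println(process(msg));
    }
}
```

Running main prints the subject line followed by the body. Falling back to the body alone when no subject is found, and to the whole text when no header block is recognised, shows the kind of edge-case handling the description mentions.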
Constructor Summary
Constructor | Description
NewsPreprocessor() | Constructor.
Method Summary
Modifier and Type | Method | Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
String | processDoc(String doc) | Processes the content of part of a document stored as a string, returning a new string.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable
postConfig
Constructor Details

NewsPreprocessor
public NewsPreprocessor()
Constructor.
Method Details

processDoc
public String processDoc(String doc)
Description copied from interface: DocumentPreprocessor
Processes the content of part of a document stored as a string, returning a new string.
Specified by:
processDoc in interface DocumentPreprocessor
Parameters:
doc - the document to process
Returns:
the processed string. Note that the return value may be null, in which case the resulting string will be ignored.
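Because processDoc may return null, a caller that chains several preprocessors has to stop and drop the document when that happens. The sketch below shows one plausible way to honor that contract without depending on Tribuo: java.util.function.UnaryOperator<String> stands in for DocumentPreprocessor, and PreprocessingPipelineSketch and preprocessAll are hypothetical names, not part of the Tribuo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: UnaryOperator<String> stands in for DocumentPreprocessor
// so that this compiles without Tribuo on the classpath.
final class PreprocessingPipelineSketch {

    // Applies the preprocessors in order; a null result drops the document.
    static List<String> preprocessAll(List<String> docs, List<UnaryOperator<String>> preprocessors) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            String current = doc;
            for (UnaryOperator<String> p : preprocessors) {
                current = p.apply(current);
                if (current == null) {
                    break; // null means "ignore this document", so skip later steps
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        UnaryOperator<String> dropBlank = s -> s.isBlank() ? null : s;
        UnaryOperator<String> lower = String::toLowerCase;
        List<String> out = preprocessAll(List.of("Hello", "   ", "WORLD"), List.of(dropBlank, lower));
        System.out.println(out); // prints [hello, world]
    }
}
```

Breaking out of the inner loop on null matters: later preprocessors never see a null input, so each one only needs to handle real text.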
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>