Package org.tribuo.data.text
Interface DocumentPreprocessor
- All Superinterfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
- All Known Implementing Classes:
CasingPreprocessor, NewsPreprocessor, RegexPreprocessor
public interface DocumentPreprocessor
extends com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
An interface for things that can pre-process documents before they are
broken into features.
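As a rough illustration of the contract, the sketch below uses a stand-in interface, `DocPreprocessor`, that mirrors only the `processDoc(String)` method; it is not the Tribuo type. A real implementation would also have to satisfy `Configurable` and `Provenancable` (for example by providing `getProvenance`), which is omitted here. The class name `WhitespacePreprocessor` is hypothetical.

```java
public class PreprocessorSketch {
    /** Stand-in for DocumentPreprocessor: mirrors processDoc only (assumption, not the Tribuo type). */
    interface DocPreprocessor {
        String processDoc(String doc);
    }

    /** Hypothetical preprocessor: trims the text and collapses whitespace runs into single spaces. */
    static final class WhitespacePreprocessor implements DocPreprocessor {
        @Override
        public String processDoc(String doc) {
            return doc.trim().replaceAll("\\s+", " ");
        }
    }

    public static void main(String[] args) {
        DocPreprocessor p = new WhitespacePreprocessor();
        // The cleaned text would then be handed on to tokenization / feature extraction.
        System.out.println(p.processDoc("  Hello \t  world\n")); // prints "Hello world"
    }
}
```

The preprocessor only transforms the raw string; splitting it into features remains the job of whatever consumes the result.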
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| String | processDoc(String doc) | Processes the content of part of a document stored as a string, returning a new string. |

Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable
postConfig
Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.Provenancable
getProvenance
Method Details
processDoc
String processDoc(String doc)
Processes the content of part of a document stored as a string, returning a new string.
- Parameters:
doc - the document to process
- Returns:
the processed string. Note that the return value may be null, in which case the result is ignored.
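One way a caller might honor the nullable return value is sketched below. Each step stands in for a `processDoc` call and is modelled as a `UnaryOperator<String>` rather than the Tribuo interface. The helper `applyAll` and the choice to drop the text as soon as any step returns `null` are assumptions made for illustration, not documented Tribuo behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.UnaryOperator;

public class NullAwarePipeline {
    /**
     * Applies each step in order (hypothetical helper).
     * If any step returns null, the text is dropped and an empty Optional is returned.
     */
    static Optional<String> applyAll(List<UnaryOperator<String>> steps, String doc) {
        String current = doc;
        for (UnaryOperator<String> step : steps) {
            current = step.apply(current);
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        steps.add(String::trim);
        steps.add(s -> s.isEmpty() ? null : s);          // hypothetical step that rejects blank text
        steps.add(s -> s.toLowerCase(Locale.ROOT));

        System.out.println(applyAll(steps, "  Some TEXT ")); // prints "Optional[some text]"
        System.out.println(applyAll(steps, "    "));         // prints "Optional.empty"
    }
}
```

Wrapping the result in `Optional` keeps the null check in one place, so downstream code never sees a `null` string.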