Package org.tribuo.data.text.impl
Class RegexPreprocessor
java.lang.Object
org.tribuo.data.text.impl.RegexPreprocessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, DocumentPreprocessor
A simple document preprocessor which applies regular expressions to the input.
-
Constructor Summary
Constructor | Description
RegexPreprocessor(List<String> regexStrings, List<String> replacements) | Construct a regex preprocessor.
-
Method Summary
Modifier and Type | Method | Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
String | processDoc(String doc) | Processes the content of part of a document stored as a string, returning a new string.
-
Constructor Details
-
RegexPreprocessor
public RegexPreprocessor(List<String> regexStrings, List<String> replacements)
Construct a regex preprocessor.
Parameters:
regexStrings - A list of strings containing regular expressions.
replacements - A list of strings containing the replacements for matches to the regular expressions in the input.
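The two constructor arguments can be read as parallel lists. The following is a minimal sketch of that idea using only java.util.regex, not the Tribuo class itself. The class name RegexPreprocessorSketch is hypothetical, and the size check and the in-order, replace-all behaviour are assumptions that this page does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; this is not org.tribuo.data.text.impl.RegexPreprocessor.
public class RegexPreprocessorSketch {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements;

    public RegexPreprocessorSketch(List<String> regexStrings, List<String> replacements) {
        // Assumption: each regex is paired with the replacement at the same index.
        if (regexStrings.size() != replacements.size()) {
            throw new IllegalArgumentException("Each regex needs exactly one replacement");
        }
        for (String regex : regexStrings) {
            patterns.add(Pattern.compile(regex));
        }
        this.replacements = new ArrayList<>(replacements);
    }

    // Assumption: patterns are applied in list order, each replacing every match.
    public String processDoc(String doc) {
        String current = doc;
        for (int i = 0; i < patterns.size(); i++) {
            current = patterns.get(i).matcher(current).replaceAll(replacements.get(i));
        }
        return current;
    }

    public static void main(String[] args) {
        RegexPreprocessorSketch p = new RegexPreprocessorSketch(
            List.of("[0-9]+", "\\s+"),   // digits first, then runs of whitespace
            List.of("<NUM>", " "));
        // prints: Order <NUM> shipped in <NUM> days
        System.out.println(p.processDoc("Order  42 shipped\tin 3   days"));
    }
}
```

Under these assumptions the order of the list matters, because each pattern sees the output of the previous replacement rather than the original input.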
-
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
processDoc
public String processDoc(String doc)
Description copied from interface: DocumentPreprocessor
Processes the content of part of a document stored as a string, returning a new string.
Specified by:
processDoc in interface DocumentPreprocessor
Parameters:
doc - the document to process
Returns:
the processed string. Note that the return value may be null, in which case the resulting string will be ignored.
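Because the return value may be null, calling code needs to check for it before using the result. The sketch below shows one way a caller might drop null results. It uses a java.util.function.Function as a hypothetical stand-in for a DocumentPreprocessor, and the class and method names are illustrative, not part of Tribuo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical caller for illustration only; not part of Tribuo.
public class NullSkippingCaller {
    // 'preprocessor' stands in for a call to DocumentPreprocessor.processDoc.
    public static List<String> processAll(List<String> docs, Function<String, String> preprocessor) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            String processed = preprocessor.apply(doc);
            // A null result means "ignore this string", so it is not collected.
            if (processed != null) {
                kept.add(processed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example preprocessor: returns null for whitespace-only input, trimmed text otherwise.
        Function<String, String> dropBlank = d -> d.trim().isEmpty() ? null : d.trim();
        // prints: [a, b]
        System.out.println(processAll(List.of(" a ", "   ", "b"), dropBlank));
    }
}
```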
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-