Package org.tribuo.data.text.impl
Class NgramProcessor
java.lang.Object
org.tribuo.data.text.impl.NgramProcessor
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextProcessor
A text processor that will generate token ngrams of a particular size.
-
Constructor Summary
Constructor / Description
NgramProcessor(Tokenizer tokenizer, int n, double value) - Creates a processor that will generate token ngrams of size n.
-
Method Summary
Modifier and Type / Method / Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
void postConfig() - Used by the OLCUT configuration system, and should not be called by external code.
process(text) - Extracts features from the supplied text.
process(tag, text) - Extracts features from the supplied text.
-
Constructor Details
-
NgramProcessor
Creates a processor that will generate token ngrams of size n.
- Parameters:
tokenizer - The tokenizer to use to process text.
n - The size of the ngram to generate.
value - The value we will put in the new features.
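The roles of the three constructor parameters can be illustrated with a minimal plain-Java sketch. It is not Tribuo's implementation: the whitespace tokenizer, the "/" separator used to join tokens, and the class and method names are all assumptions made for illustration. It slides a window of n tokens over the token sequence and pairs each window with the fixed value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this is not the Tribuo NgramProcessor source.
final class NgramSketch {

    // Hypothetical stand-in for a Tokenizer: splits on runs of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Slides a window of n tokens over the sequence. Each window becomes one
    // (name, value) pair; the "/" separator is an assumed naming choice.
    static List<Map.Entry<String, Double>> ngramFeatures(List<String> tokens, int n, double value) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive, found " + n);
        }
        List<Map.Entry<String, Double>> features = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            String name = String.join("/", tokens.subList(start, start + n));
            features.add(Map.entry(name, value));
        }
        return features;
    }

    public static void main(String[] args) {
        // Four tokens with n = 2 yield three bigram features.
        System.out.println(ngramFeatures(tokenize("the quick brown fox"), 2, 1.0));
    }
}
```

In this sketch, text with fewer than n tokens produces no features, and every generated feature carries the same value regardless of how often the ngram occurs.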
-
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
- Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
process
Description copied from interface: TextProcessor
Extracts features from the supplied text.
- Specified by:
process in interface TextProcessor
- Parameters:
text - The text to extract.
- Returns:
The extracted features.
- Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
-
process
Description copied from interface: TextProcessor
Extracts features from the supplied text.
- Specified by:
process in interface TextProcessor
- Parameters:
tag - The feature name tag.
text - The text to extract.
- Returns:
The extracted features.
- Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
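One plausible use of a feature name tag is to keep ngrams drawn from different text fields (for example a title and a body) from sharing a feature name. The plain-Java sketch below illustrates that idea only. The "tag-" prefix format, the "/" separator, and the class and method names are assumptions, not the naming scheme Tribuo actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the naming scheme here is hypothetical.
final class TaggedNgramSketch {

    // Builds ngram feature names, prefixing each with the tag when one is given.
    static List<String> featureNames(String tag, List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive, found " + n);
        }
        // Assumed format: "<tag>-" before the ngram, or no prefix without a tag.
        String prefix = (tag == null || tag.isEmpty()) ? "" : tag + "-";
        List<String> names = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            names.add(prefix + String.join("/", tokens.subList(start, start + n)));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("new", "york", "city");
        // The same tokens produce distinct names under different tags.
        System.out.println(featureNames("title", tokens, 2));
        System.out.println(featureNames("body", tokens, 2));
        System.out.println(featureNames(null, tokens, 2));
    }
}
```

Under this scheme the bigram for "new york" in a title and the same bigram in a body become two separate features, so a downstream model can weight them independently.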
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-