Class NgramProcessor
java.lang.Object
org.tribuo.data.text.impl.NgramProcessor
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextProcessor
A text processor that will generate token ngrams of a particular size.
-
Constructor Summary
Constructors
Constructor: NgramProcessor(Tokenizer tokenizer, int n, double value)
Description: Creates a processor that will generate token ngrams of size n.
-
Method Summary
Modifier and Type, Method, Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance, getProvenance
void, postConfig: Used by the OLCUT configuration system, and should not be called by external code.
process (text): Extracts features from the supplied text.
process (tag, text): Extracts features from the supplied text.
-
Constructor Details
-
NgramProcessor
Creates a processor that will generate token ngrams of size n.
- Parameters:
tokenizer - The tokenizer to use to process text.
n - The size of the ngram to generate.
value - The value we will put in the new features.
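To illustrate what n and value control, here is a minimal, library-free sketch of sliding-window ngram generation. It is not Tribuo's implementation: the class NgramSketch, the helper ngrams, the whitespace splitting (a stand-in for the configured Tokenizer), and the space-joined feature names are all assumptions made for illustration. The real processor's feature naming may differ.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Tribuo.
class NgramSketch {

    // Slides a window of n tokens over the text and emits one
    // (name, value) pair per window. Whitespace splitting stands in
    // for the Tokenizer passed to the real constructor.
    static List<Map.Entry<String, Double>> ngrams(String text, int n, double value) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<Map.Entry<String, Double>> features = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        String[] tokens = trimmed.split("\\s+");
        // If there are fewer than n tokens, the loop body never runs
        // and the result is empty.
        for (int start = 0; start + n <= tokens.length; start++) {
            // Assumed naming scheme: the n tokens joined by single spaces.
            String name = String.join(" ", Arrays.copyOfRange(tokens, start, start + n));
            features.add(new AbstractMap.SimpleEntry<>(name, value));
        }
        return features;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> f : ngrams("the quick brown fox", 2, 1.0)) {
            System.out.println(f.getKey() + " -> " + f.getValue());
        }
    }
}
```

In this sketch, a text of k tokens yields k - n + 1 features when k is at least n, and every feature carries the same value.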
-
-
Method Details
-
postConfig
Used by the OLCUT configuration system, and should not be called by external code.
- Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
process
Description copied from interface: TextProcessor
Extracts features from the supplied text.
- Specified by:
process in interface TextProcessor
- Parameters:
text - The text to extract.
- Returns:
The extracted features.
- Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
-
process
Description copied from interface: TextProcessor
Extracts features from the supplied text.
- Specified by:
process in interface TextProcessor
- Parameters:
tag - The feature name tag.
text - The text to extract.
- Returns:
The extracted features.
- Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
-
getProvenance
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-