public class NgramProcessor extends Object implements TextProcessor
Constructor and Description |
---|
NgramProcessor(Tokenizer tokenizer, int n, double value): Creates a processor that will generate token ngrams of size n. |
Modifier and Type | Method and Description |
---|---|
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
void | postConfig(): Used by the OLCUT configuration system, and should not be called by external code. |
List<Feature> | process(String text): Extracts features from the supplied text. |
List<Feature> | process(String tag, String text): Extracts features from the supplied text. |
public NgramProcessor(Tokenizer tokenizer, int n, double value)
Creates a processor that will generate token ngrams of size n.
Parameters:
tokenizer - The tokenizer to use to process text.
n - the size of the ngram to generate
value - the value we will put in the new features.
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
Specified by: postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
public List<Feature> process(String text) throws TextProcessingException
Description copied from interface: TextProcessor
Extracts features from the supplied text.
Specified by: process in interface TextProcessor
Parameters:
text - The text to extract.
Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
public List<Feature> process(String tag, String text) throws TextProcessingException
Description copied from interface: TextProcessor
Extracts features from the supplied text.
Specified by: process in interface TextProcessor
Parameters:
tag - The feature name tag.
text - The text to extract.
Throws:
TextProcessingException - If an error occurred during extraction (usually from tokenization).
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by: getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
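To make "token ngrams of size n" concrete, the following is a minimal, self-contained sketch of the sliding-window idea. It does not use Tribuo. NgramSketch, its ngrams method, the whitespace splitting that stands in for the Tokenizer, and the "/" separator are all assumptions made for illustration. The feature names that NgramProcessor actually produces may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tribuo: illustrates token ngrams of size n.
class NgramSketch {

    // Returns every run of n consecutive tokens, joined with "/".
    // The "/" separator is an assumption made for this sketch.
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1, got " + n);
        }
        List<String> result = new ArrayList<>();
        // A text with fewer than n tokens yields no ngrams.
        for (int start = 0; start + n <= tokens.size(); start++) {
            result.add(String.join("/", tokens.subList(start, start + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Whitespace splitting stands in for the Tokenizer passed to the constructor.
        List<String> tokens = Arrays.asList("the quick brown fox".split("\\s+"));
        System.out.println(ngrams(tokens, 2));
    }
}
```

Run as written, main prints [the/quick, quick/brown, brown/fox]. In NgramProcessor, each such ngram would presumably become one Feature carrying the value passed to the constructor.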
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.