Package org.tribuo.data.text.impl
Class BasicPipeline
java.lang.Object
org.tribuo.data.text.impl.BasicPipeline
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, TextPipeline
An example implementation of TextPipeline. Generates unique n-grams.
Constructor Summary
Constructor | Description
BasicPipeline(Tokenizer tokenizer, int ngram) | Constructs a basic text pipeline which tokenizes the input and generates word n-gram features in the range 1 to ngram.
Method Summary
Modifier and Type | Method | Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
List<Feature> | process(String tag, String data) | Extracts a list of features from the supplied text, using the tag to prepend the feature names.
String | toString() |
Constructor Details
BasicPipeline
public BasicPipeline(Tokenizer tokenizer, int ngram)
Constructs a basic text pipeline which tokenizes the input and generates word n-gram features in the range 1 to ngram.
Parameters:
tokenizer - The tokenizer.
ngram - The maximum size of the n-grams to generate.
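To make the phrase "in the range 1 to ngram" concrete, the following is a minimal standalone sketch of producing every word n-gram of sizes 1 through a maximum. It uses only the Java standard library and is not Tribuo's implementation: the class name NgramRangeSketch, the method ngrams, and the choice to join tokens with a single space are assumptions made for illustration, and the input is assumed to be already tokenized.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of Tribuo. */
final class NgramRangeSketch {

    /**
     * Returns all word n-grams of sizes 1..maxN, ordered by size and then
     * by position in the token list. Tokens are joined with a space
     * (an assumption of this sketch).
     */
    static List<String> ngrams(List<String> tokens, int maxN) {
        if (maxN < 1) {
            throw new IllegalArgumentException("maxN must be at least 1, found " + maxN);
        }
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            // Slide a window of width n across the tokens.
            for (int start = 0; start + n <= tokens.size(); start++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "fox");
        // prints [the, quick, fox, the quick, quick fox]
        System.out.println(ngrams(tokens, 2));
    }
}
```

With three tokens and a maximum of 2, the sketch yields the three unigrams followed by the two bigrams. A maximum larger than the token count simply contributes no n-grams at the longer sizes.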
Method Details
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
toString
public String toString()
Overrides:
toString in class java.lang.Object
process
public List<Feature> process(String tag, String data)
Description copied from interface: TextPipeline
Extracts a list of features from the supplied text, using the tag to prepend the feature names.
Specified by:
process in interface TextPipeline
Parameters:
tag - The feature name tag.
data - The text to extract features from.
Returns:
The extracted features.
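As a rough illustration of what "using the tag to prepend the feature names" can look like, the sketch below builds tag-prefixed n-gram feature names from a piece of text, keeping one entry per distinct n-gram. It is a standard-library approximation, not Tribuo's code: the class TaggedFeatureSketch, the whitespace tokenization, the name format tag-Ngram=text, the fixed value 1.0, and the reading of "unique" as "one entry per distinct n-gram" are all assumptions of this sketch. A map from name to value stands in for the real feature list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not part of Tribuo. */
final class TaggedFeatureSketch {

    /**
     * Splits the text on whitespace, builds n-grams of sizes 1..maxN, and
     * stores each distinct n-gram once under a tag-prefixed name.
     * The name format is an assumption made for this sketch.
     */
    static Map<String, Double> process(String tag, String data, int maxN) {
        String trimmed = data.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        // LinkedHashMap keeps first-seen order; repeated n-grams overwrite
        // the same key, so each name appears once.
        Map<String, Double> features = new LinkedHashMap<>();
        for (int n = 1; n <= maxN; n++) {
            for (int start = 0; start + n <= tokens.length; start++) {
                String gram = String.join(" ", Arrays.copyOfRange(tokens, start, start + n));
                features.put(tag + "-" + n + "gram=" + gram, 1.0);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Double> features = process("body", "to be or not to be", 2);
        features.forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Prefixing with the tag keeps features from different text fields apart: the same word extracted under a "subject" tag and a "body" tag ends up under two different names.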
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>