Package org.tribuo.data.text.impl
Provides implementations of text data processors.
Class and description:
A feature aggregator that averages feature values across a feature list.
An example implementation of TextPipeline.
A document preprocessor which uppercases or lowercases the input.
The possible casing operations.
Hashes the feature names to reduce the dimensionality.
A document pre-processor for 20 newsgroup data.
A text processor that will generate token ngrams of a particular size.
A simple document preprocessor which applies regular expressions to the input.
SimpleStringDataSource<T extends Output<T>>: A version of SimpleTextDataSource that accepts a List of Strings.
Provenance for SimpleStringDataSource.
SimpleTextDataSource<T extends Output<T>>: A dataset for a simple data format for text classification experiments.
Provenance for SimpleTextDataSource.
A feature aggregator that aggregates occurrence counts across a number of feature lists.
TextFeatureExtractorImpl<T extends Output<T>>
A pipeline for generating ngram features.
Aggregates feature tokens, generating unique features.
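
The descriptions above mention three generic techniques: generating token ngrams, summing occurrence counts, and hashing feature names to reduce dimensionality. The following is a minimal plain-Java sketch of those ideas only. It does not use or reproduce the Tribuo API: the class NgramSketch, its method names, the "/" separator, and the use of String.hashCode as the hash function are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration class; not part of Tribuo. */
final class NgramSketch {
    private NgramSketch() {}

    /** Returns every contiguous run of n tokens, joined with '/' (assumed separator). */
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join("/", tokens.subList(i, i + n)));
        }
        return out;
    }

    /** Sums occurrence counts per feature name, keeping first-seen order. */
    static Map<String, Double> sumCounts(List<String> featureNames) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String name : featureNames) {
            counts.merge(name, 1.0, Double::sum);
        }
        return counts;
    }

    /**
     * Maps a feature name onto one of 'dimension' buckets.
     * String.hashCode is a stand-in hash chosen for this sketch.
     */
    static int hashBucket(String name, int dimension) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(name.hashCode(), dimension);
    }
}
```

For the tokens a, b, a, b with n = 2, ngrams yields a/b, b/a, a/b, and sumCounts over that list gives a count of 2.0 for a/b and 1.0 for b/a. Hashing trades exact feature names for a fixed-size feature space, so distinct names can collide in the same bucket.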