Package org.tribuo.data.text
Class    Description

DirectoryFileSource<T extends Output<T>>
    A data source for a somewhat-common format for text classification datasets: a top level directory that contains a number of subdirectories.

    Provenance for DirectoryFileSource.

    An interface for things that can pre-process documents before they are broken into features.

    An interface for aggregating feature values into other values.

    A feature transformer maps a list of features to a new list of features. Useful, for example, to apply the hashing trick to a set of features.

    Splits data in our standard text format into training and testing portions.

    Command line options.

TextDataSource<T extends Output<T>>
    A base class for textual data sets.

TextFeatureExtractor<T extends Output<T>>
    An interface for things that take text and turn them into examples that we can use to train or evaluate a classifier.

    A pipeline that takes a String and returns a List of Features.

    An exception thrown by the text processing system.

    A TextProcessor takes some text and optionally a feature tag and generates a list of Features from that text.
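
The descriptions above mention two steps that are easier to follow in code: a pipeline that turns a String into a list of features, and a feature transformer that applies the hashing trick. The following is a minimal, dependency-free sketch of that idea. It does not use the Tribuo API: the names TextFeatureSketch, Feature, tokenize and hash are hypothetical and exist only for this illustration, and the real interfaces in this package may differ in shape and behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TextFeatureSketch {

    /** Hypothetical stand-in for a named feature value; not Tribuo's Feature class. */
    static final class Feature {
        final String name;
        final double value;

        Feature(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    /**
     * Pipeline step (assumed behaviour): lower-case the text, split on anything
     * that is not an ASCII letter, and emit one feature of value 1.0 per token.
     */
    static List<Feature> tokenize(String text) {
        List<Feature> features = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                features.add(new Feature(token, 1.0));
            }
        }
        return features;
    }

    /**
     * Transformer step: the hashing trick. Each feature name is mapped to one of
     * numBuckets bucket names, and values that land in the same bucket are summed.
     */
    static List<Feature> hash(List<Feature> input, int numBuckets) {
        Map<String, Double> buckets = new TreeMap<>();
        for (Feature f : input) {
            // floorMod keeps the bucket index non-negative for negative hash codes.
            int bucket = Math.floorMod(f.name.hashCode(), numBuckets);
            buckets.merge("hash=" + bucket, f.value, Double::sum);
        }
        List<Feature> output = new ArrayList<>();
        for (Map.Entry<String, Double> e : buckets.entrySet()) {
            output.add(new Feature(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Feature> hashed = hash(tokenize("the cat sat on the mat"), 8);
        for (Feature f : hashed) {
            System.out.println(f.name + " -> " + f.value);
        }
    }
}
```

Hashing bounds the feature space at numBuckets names regardless of vocabulary size, at the cost of occasional collisions between unrelated tokens. Summing colliding values is one possible aggregation choice made for this sketch; it is not taken from the package documentation.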