Package | Description |
---|---|
org.tribuo.data.text | |
org.tribuo.data.text.impl | Provides implementations of text data processors. |
Modifier and Type | Field and Description |
---|---|
protected List<DocumentPreprocessor> | TextDataSource.preprocessors: Document preprocessors that should be run on the documents that make up this data set. |
protected List<DocumentPreprocessor> | DirectoryFileSource.preprocessors: Document preprocessors that should be run on the documents that make up this data set. |
Constructor and Description |
---|
DirectoryFileSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors): Creates a data source that will use the given feature extractor and document preprocessors on the data read from the files in the directories representing classes. |
DirectoryFileSource(Path newsDir, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors) |
TextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors) |
TextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors): Creates a text data set by reading it from a path. |
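
The constructors above all accept `DocumentPreprocessor... preprocessors`, and the sources expose them as a `protected List<DocumentPreprocessor>` field. The following is a minimal sketch of how a varargs parameter might be captured into such a list and run over a document in order. It does not use Tribuo itself: `DocPreprocessor`, `SketchSource`, and `preprocess` are hypothetical stand-ins introduced for illustration, and the single string-to-string method on the stand-in interface is an assumption about the shape of a preprocessor, not a statement of the library's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class VarargsSourceSketch {
    /** Hypothetical stand-in for a document preprocessor; not the Tribuo interface. */
    public interface DocPreprocessor {
        String processDoc(String doc);
    }

    /** Hypothetical, much simplified stand-in for a text data source. */
    public static class SketchSource {
        protected final List<DocPreprocessor> preprocessors;

        public SketchSource(DocPreprocessor... preprocessors) {
            // Copy the varargs array so later changes to it do not affect this source.
            this.preprocessors =
                Collections.unmodifiableList(Arrays.asList(preprocessors.clone()));
        }

        /** Runs every preprocessor in order, feeding each output into the next. */
        public String preprocess(String doc) {
            String current = doc;
            for (DocPreprocessor p : preprocessors) {
                current = p.processDoc(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        // Two illustrative preprocessors: trim whitespace, then lowercase.
        SketchSource source =
            new SketchSource(String::trim, d -> d.toLowerCase(Locale.ROOT));
        System.out.println(source.preprocess("  Hello World  "));
    }
}
```

Passing no preprocessors is also valid with a varargs signature; in this sketch the document is then returned unchanged.
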
Modifier and Type | Class and Description |
---|---|
class | CasingPreprocessor: A document preprocessor which uppercases or lowercases the input. |
class | NewsPreprocessor: A document preprocessor for the 20 Newsgroups data. |
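
To illustrate the kind of transformation a casing preprocessor performs, here is a small self-contained sketch. It is not the `CasingPreprocessor` implementation: the `CasingSketch` class, the `Casing` enum, and `applyCasing` are hypothetical names, and the use of a fixed locale is an assumption made here so that results do not depend on the JVM default.

```java
import java.util.Locale;

public class CasingSketch {
    /** Hypothetical enum naming the two casing operations. */
    public enum Casing { LOWERCASE, UPPERCASE }

    /** Applies the chosen casing to the whole document using a fixed locale (an assumption). */
    public static String applyCasing(Casing op, String doc) {
        switch (op) {
            case LOWERCASE:
                return doc.toLowerCase(Locale.ROOT);
            case UPPERCASE:
                return doc.toUpperCase(Locale.ROOT);
            default:
                throw new IllegalArgumentException("Unknown casing operation: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyCasing(Casing.LOWERCASE, "Tribuo Text"));
        System.out.println(applyCasing(Casing.UPPERCASE, "Tribuo Text"));
    }
}
```
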
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.