public abstract class TextDataSource<T extends Output<T>> extends Object implements ConfigurableDataSource<T>
Modifier and Type | Field and Description |
---|---|
protected List<Example<T>> | data - The actual data read out of the text file. |
protected TextFeatureExtractor<T> | extractor - The extractor that we'll use to turn text into examples. |
protected OutputFactory<T> | outputFactory - The factory that converts a String into an Output. |
protected Path | path - The path that data was read from. |
protected List<DocumentPreprocessor> | preprocessors - Document preprocessors that should be run on the documents that make up this data set. |
Modifier | Constructor and Description |
---|---|
protected | TextDataSource() - For OLCUT. |
public | TextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors) |
public | TextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors) - Creates a text data set by reading it from a path. |
Modifier and Type | Method and Description |
---|---|
OutputFactory<T> | getOutputFactory() - Returns the output factory used to convert the text input into an Output. |
protected String | handleDoc(String doc) - A method that can be overridden to do different things to each document that we've read. |
Iterator<Example<T>> | iterator() |
protected abstract void | read() - Reads the data from the Path. |
String | toString() |
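Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator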
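The `handleDoc` hook listed in the method summary, together with the `preprocessors` field, suggests a simple chain in which each preprocessor transforms the document in turn. The block below is a minimal sketch of that idea only. It does not use the Tribuo classes: `HandleDocSketch` is a hypothetical name, and `UnaryOperator<String>` is an assumed stand-in for `DocumentPreprocessor`, whose methods are not shown on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the preprocessing hook; not the Tribuo classes.
final class HandleDocSketch {
    // Assumption: a preprocessor maps one document String to another.
    private final List<UnaryOperator<String>> preprocessors;

    HandleDocSketch(List<UnaryOperator<String>> preprocessors) {
        this.preprocessors = preprocessors;
    }

    // Applies every preprocessor in list order and returns the final document.
    String handleDoc(String doc) {
        String current = doc;
        for (UnaryOperator<String> preprocessor : preprocessors) {
            current = preprocessor.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        HandleDocSketch sketch = new HandleDocSketch(Arrays.asList(
                String::trim,
                s -> s.toLowerCase(Locale.ROOT)));
        System.out.println(sketch.handleDoc("  Hello World  ")); // prints "hello world"
    }
}
```

Because the chain runs in list order, the order in which preprocessors are supplied matters whenever one step depends on the output of another.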
@Config(description="The document preprocessors to run on each document in the data source.") protected List<DocumentPreprocessor> preprocessors
@Config(mandatory=true, description="The path to read the data from.") protected Path path
@Config(mandatory=true, description="The factory that converts a String into an Output instance.") protected OutputFactory<T> outputFactory
The factory that converts a String into an Output.
@Config(mandatory=true, description="The feature extractor that generates Features from text.") protected TextFeatureExtractor<T> extractor
protected TextDataSource()
public TextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors)
path - The path to read data from.
outputFactory - The output factory used to generate the outputs.
extractor - The feature extractor to run on the text.
preprocessors - Processors that will be run on the data before it is added as examples.
public TextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors)
protected String handleDoc(String doc)
doc - The document to handle.
protected abstract void read() throws IOException
IOException - if there is any error reading the data.
public OutputFactory<T> getOutputFactory()
Returns the output factory used to convert the text input into an Output.
Specified by: getOutputFactory in interface DataSource<T extends Output<T>>
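Since `read()` is abstract, each concrete data source decides how the file at `path` is split into documents. The following is a rough, self-contained sketch of that division of labour. It uses hypothetical, simplified stand-ins rather than the Tribuo types: `TextSourceSketch` and `LinePerDocSourceSketch` are invented names, documents are stored as plain `String` values instead of `Example<T>`, the one-document-per-line format is an assumption, and calling `read()` from the subclass constructor is a choice made for this sketch, not something this page states.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the base class described above.
abstract class TextSourceSketch implements Iterable<String> {
    protected final Path path;
    protected final List<String> data = new ArrayList<>();

    protected TextSourceSketch(Path path) {
        this.path = path;
    }

    // Hook applied to every document; here it only trims whitespace.
    protected String handleDoc(String doc) {
        return doc.trim();
    }

    // Subclasses decide how the file at 'path' is split into documents.
    protected abstract void read() throws IOException;

    @Override
    public Iterator<String> iterator() {
        return data.iterator();
    }
}

// Assumed format for illustration: one document per non-blank line.
final class LinePerDocSourceSketch extends TextSourceSketch {
    LinePerDocSourceSketch(Path path) throws IOException {
        super(path);
        read(); // Eager loading is a choice of this sketch.
    }

    @Override
    protected void read() throws IOException {
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            String doc = handleDoc(line);
            if (!doc.isEmpty()) {
                data.add(doc);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docs", ".txt");
        Files.write(tmp, Arrays.asList(" first doc ", "", "second doc"), StandardCharsets.UTF_8);
        for (String doc : new LinePerDocSourceSketch(tmp)) {
            System.out.println(doc);
        }
        Files.delete(tmp);
    }
}
```

Keeping the file parsing in `read()` and the per-document cleanup in `handleDoc` lets a subclass change one without touching the other.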
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.