T - The type of the features built by the underlying text processing infrastructure.
public class DirectoryFileSource<T extends Output<T>> extends Object implements ConfigurableDataSource<T>
This data source reads a directory whose subdirectories each represent a class. Each subdirectory contains a number of files, and each file represents a single document that should be labeled with the name of its subdirectory.
The data source produces an appropriately labeled Example<T> from each of these files.
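The directory convention described above can be sketched with the standard library alone. This is an illustrative sketch, not Tribuo's implementation: `LabeledDirectoryWalk` and `LabeledDocument` are hypothetical names, and UTF-8 file encoding is an assumption. It shows only the labeling rule, where each document takes the name of its parent subdirectory as its label; feature extraction is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Tribuo) that labels files by their parent directory. */
final class LabeledDirectoryWalk {

    /** A document's text together with the label taken from its subdirectory name. */
    static final class LabeledDocument {
        final String label;
        final String text;

        LabeledDocument(String label, String text) {
            this.label = label;
            this.text = text;
        }
    }

    /** Reads every regular file one level below root; files directly in root are ignored. */
    static List<LabeledDocument> read(Path root) throws IOException {
        List<LabeledDocument> docs = new ArrayList<>();
        try (DirectoryStream<Path> labelDirs =
                 Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path labelDir : labelDirs) {
                // The subdirectory name is the label for every file inside it.
                String label = labelDir.getFileName().toString();
                try (DirectoryStream<Path> files =
                         Files.newDirectoryStream(labelDir, p -> Files.isRegularFile(p))) {
                    for (Path file : files) {
                        // Assumption: documents are UTF-8 encoded text.
                        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                        docs.add(new LabeledDocument(label, text));
                    }
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LabeledDirectoryWalk <root-directory>");
            return;
        }
        for (LabeledDocument doc : read(Paths.get(args[0]))) {
            System.out.println(doc.label + "\t" + doc.text.length() + " chars");
        }
    }
}
```

In the real class, the text of each file would additionally pass through the configured document preprocessors and the TextFeatureExtractor, and the label string would be converted by the OutputFactory.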
Modifier and Type | Class and Description |
---|---|
static class | DirectoryFileSource.DirectoryFileSourceProvenance: Provenance for DirectoryFileSource. |
Modifier and Type | Field and Description |
---|---|
protected TextFeatureExtractor<T> | extractor: The extractor that we'll use to turn text into examples. |
protected OutputFactory<T> | outputFactory: The factory that converts a String into an Output. |
protected List<DocumentPreprocessor> | preprocessors: Document preprocessors that should be run on the documents that make up this data set. |
Modifier | Constructor and Description |
---|---|
protected | DirectoryFileSource(): For OLCUT. |
public | DirectoryFileSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors): Creates a data source that will use the given feature extractor and document preprocessors on the data read from the files in the directories representing classes. |
public | DirectoryFileSource(Path newsDir, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors) |
Modifier and Type | Method and Description |
---|---|
OutputFactory<T> | getOutputFactory(): Returns the OutputFactory associated with this Output subclass. |
ConfiguredDataSourceProvenance | getProvenance() |
Iterator<Example<T>> | iterator() |
String | toString() |
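Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator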
@Config(description="The preprocessors to apply to the input documents.") protected List<DocumentPreprocessor> preprocessors
@Config(mandatory=true, description="The output factory to use.") protected OutputFactory<T extends Output<T>> outputFactory
The factory that converts a String into an Output.
@Config(mandatory=true, description="The feature extractor that converts text into examples.") protected TextFeatureExtractor<T extends Output<T>> extractor
protected DirectoryFileSource()
public DirectoryFileSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors)
outputFactory - The output factory used to generate the outputs.
extractor - The text feature extractor that will run on the documents.
preprocessors - Pre-processors that we will run on the documents before extracting their features.
public DirectoryFileSource(Path newsDir, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor, DocumentPreprocessor... preprocessors)
public OutputFactory<T> getOutputFactory()
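Description copied from interface: DataSource
Returns the OutputFactory associated with this Output subclass.
Specified by: getOutputFactory in interface DataSource<T extends Output<T>>
public ConfiguredDataSourceProvenance getProvenance()
Specified by: getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>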
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.