Class SimpleTextDataSource<T extends Output<T>>
java.lang.Object
org.tribuo.data.text.TextDataSource<T>
org.tribuo.data.text.impl.SimpleTextDataSource<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
- Direct Known Subclasses:
SimpleStringDataSource
A dataset for a simple data format for text classification experiments. A line in the file looks like:
OUTPUT##Document text
Each line in the file specifies a single output and document pair. Leading and trailing spaces will be trimmed from outputs and documents. Outputs will be converted to upper case.
As with all of our text data, the file should be in UTF-8.
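The line format described above can be illustrated with a minimal standalone sketch. It uses only the Java standard library and is not Tribuo's implementation: the class name `SimpleLineFormatSketch`, the `Pair` holder, the choice to split at the first `##`, and the choice to return an empty `Optional` for lines without a separator are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Optional;

public class SimpleLineFormatSketch {

    /** Hypothetical holder for one parsed line; not a Tribuo type. */
    public static final class Pair {
        public final String output;
        public final String document;

        Pair(String output, String document) {
            this.output = output;
            this.document = document;
        }
    }

    /**
     * Parses a line of the form OUTPUT##Document text.
     * Assumption: the line is split at the first "##", and lines without
     * a separator (or with an empty side) yield an empty Optional.
     */
    public static Optional<Pair> parse(String line) {
        int idx = line.indexOf("##");
        if (idx < 0) {
            return Optional.empty();
        }
        // Trim both sides; upper-case only the output, as the format describes.
        String output = line.substring(0, idx).trim().toUpperCase(Locale.ROOT);
        String document = line.substring(idx + 2).trim();
        if (output.isEmpty() || document.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Pair(output, document));
    }

    public static void main(String[] args) {
        Pair p = parse("  spam ## Win a free prize today  ").get();
        // prints "SPAM -> Win a free prize today"
        System.out.println(p.output + " -> " + p.document);
    }
}
```

In the real class the output string would be handed to the supplied OutputFactory and the document text to the TextFeatureExtractor; the sketch stops at the string level so that it runs without Tribuo on the classpath.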
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type | Field | Description
protected ConfiguredDataSourceProvenance | provenance | The data source provenance.
Fields inherited from class org.tribuo.data.text.TextDataSource
data, extractor, outputFactory, path, preprocessors
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | SimpleTextDataSource() | For OLCUT.
public | SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) | Constructs a simple text data source by reading lines from the supplied file.
public | SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) | Constructs a simple text data source by reading lines from the supplied path.
protected | SimpleTextDataSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) | Constructs a data source without a path.
-
Method Summary
Modifier and Type | Method | Description
protected ConfiguredDataSourceProvenance | cacheProvenance() | Computes the provenance.
 | getProvenance() | 
 | parseLine | Parses a line in Tribuo's default text format.
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
protected void | read() | Reads the data from the Path.
Methods inherited from class org.tribuo.data.text.TextDataSource
getOutputFactory, handleDoc, iterator, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
provenance
The data source provenance.
-
-
Constructor Details
-
SimpleTextDataSource
protected SimpleTextDataSource()
For OLCUT.
-
SimpleTextDataSource
public SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
Constructs a simple text data source by reading lines from the supplied path.
- Parameters:
path - The path to load.
outputFactory - The output factory to use.
extractor - The feature extractor.
- Throws:
IOException - If the path could not be read.
-
SimpleTextDataSource
public SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
Constructs a simple text data source by reading lines from the supplied file.
- Parameters:
file - The file to load.
outputFactory - The output factory to use.
extractor - The feature extractor.
- Throws:
IOException - If the file could not be read.
-
SimpleTextDataSource
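protected SimpleTextDataSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
Constructs a data source without a path.
- Parameters:
outputFactory - The output factory.
extractor - The text extraction pipeline.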
-
-
Method Details
-
postConfig
Used by the OLCUT configuration system, and should not be called by external code.
- Throws:
IOException
-
parseLine
-
read
Description copied from class: TextDataSource
Reads the data from the Path.
- Specified by:
read in class TextDataSource<T extends Output<T>>
- Throws:
IOException - if there is any error reading the data.
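As a rough illustration of what reading such a file involves, the sketch below loads a UTF-8 file line by line and splits each line at `##`. It uses only the standard library and does not reproduce Tribuo's `read()`: the class name `Utf8LineReaderSketch`, the `readPairs` helper, and the decision to silently skip lines without a separator are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class Utf8LineReaderSketch {

    /**
     * Reads OUTPUT##Document text lines from a UTF-8 file.
     * Each returned array holds {output, document}.
     * Assumption: lines without "##" are skipped rather than reported.
     */
    public static List<String[]> readPairs(Path path) throws IOException {
        List<String[]> pairs = new ArrayList<>();
        // Decode explicitly as UTF-8 rather than relying on the platform default.
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            int idx = line.indexOf("##");
            if (idx < 0) {
                continue;
            }
            String output = line.substring(0, idx).trim().toUpperCase(Locale.ROOT);
            String document = line.substring(idx + 2).trim();
            pairs.add(new String[] {output, document});
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Write a small hypothetical data file, then read it back.
        Path tmp = Files.createTempFile("simple-text", ".txt");
        try {
            Files.write(tmp,
                    Arrays.asList("pos##Great film",
                                  "not a valid line",
                                  " neg ## Caf\u00e9 was dull "),
                    StandardCharsets.UTF_8);
            List<String[]> pairs = readPairs(tmp);
            // prints "2": the line without a separator is skipped
            System.out.println(pairs.size());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Passing the charset explicitly matters on platforms whose default encoding is not UTF-8; otherwise non-ASCII document text could be decoded incorrectly.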
-
getProvenance
-
cacheProvenance
Computes the provenance.
- Returns:
The provenance.
-