Class SimpleTextDataSource<T extends Output<T>>
java.lang.Object
org.tribuo.data.text.TextDataSource<T>
org.tribuo.data.text.impl.SimpleTextDataSource<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
- Direct Known Subclasses:
SimpleStringDataSource
A data source for a simple data format for text classification experiments. A line in the file looks like:
OUTPUT##Document text
Each line in the file specifies a single output and document pair. Leading and trailing spaces are trimmed from outputs and documents, and outputs are converted to upper case.
As with all of our text data, the file should be in UTF-8.
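The line format described above can be illustrated with a minimal, standard-library-only sketch. It is not Tribuo's implementation: the class `LineFormatSketch` and its `parse` method are hypothetical names, and splitting on the first `##` only, as well as upper-casing with `Locale.ROOT`, are assumptions made for the illustration.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper, not part of Tribuo: parses one "OUTPUT##Document text" line. */
final class LineFormatSketch {
    /** Separator between the output and the document text. */
    static final String SEPARATOR = "##";

    /**
     * Returns {output, document}, or empty if the line is blank or has no separator.
     * Assumption: only the first "##" separates the fields; the real class may
     * handle additional separators differently.
     */
    static Optional<String[]> parse(String line) {
        String trimmed = line.trim();
        int idx = trimmed.indexOf(SEPARATOR);
        if (trimmed.isEmpty() || idx < 0) {
            return Optional.empty();
        }
        // Trim both fields and upper-case the output, as the description states.
        // Locale.ROOT is an assumption chosen to keep the result locale-independent.
        String output = trimmed.substring(0, idx).trim().toUpperCase(Locale.ROOT);
        String document = trimmed.substring(idx + SEPARATOR.length()).trim();
        return Optional.of(new String[] {output, document});
    }

    public static void main(String[] args) {
        // Prints: SPAM -> Win a free prize today
        parse("  spam ## Win a free prize today  ")
            .ifPresent(p -> System.out.println(p[0] + " -> " + p[1]));
    }
}
```

In the real class the output string would presumably be handed to the configured OutputFactory and the document text to the TextFeatureExtractor; the sketch stops at the two strings.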
-
Nested Class Summary
Nested Classes
-
Field Summary
Fields
Fields inherited from class org.tribuo.data.text.TextDataSource
data, extractor, outputFactory, path, preprocessors
-
Constructor Summary
Constructors
Modifier    Constructor    Description
protected SimpleTextDataSource()    for olcut
SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
protected SimpleTextDataSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
-
Method Summary
Modifier and Type    Method    Description
protected ConfiguredDataSourceProvenance
void postConfig()    Used by the OLCUT configuration system, and should not be called by external code.
protected void read()    Reads the data from the Path.
Methods inherited from class org.tribuo.data.text.TextDataSource
getOutputFactory, handleDoc, iterator, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
provenance
-
-
Constructor Details
-
SimpleTextDataSource
protected SimpleTextDataSource()
For OLCUT.
-
SimpleTextDataSource
public SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
- Throws:
IOException
-
SimpleTextDataSource
public SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
- Throws:
IOException
-
SimpleTextDataSource
-
-
Method Details
-
postConfig
Used by the OLCUT configuration system, and should not be called by external code.
- Throws:
IOException
-
parseLine
-
read
Description copied from class: TextDataSource
Reads the data from the Path.
- Specified by:
read in class TextDataSource<T extends Output<T>>
- Throws:
IOException - if there is any error reading the data.
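As a rough illustration of what reading such a file involves, the sketch below decodes a file explicitly as UTF-8 and collects one output and document pair per line. It is a stand-in written against the standard library only, not the Tribuo code: `ReadSketch` is a hypothetical name, and silently skipping lines without a `##` separator is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in, not Tribuo code: reads "OUTPUT##Document text" lines from a UTF-8 file. */
final class ReadSketch {
    /** Returns one {output, document} pair per well-formed line; other lines are skipped. */
    static List<String[]> read(Path path) throws IOException {
        List<String[]> pairs = new ArrayList<>();
        // Decode explicitly as UTF-8 rather than relying on the platform default charset.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int idx = line.indexOf("##");
                if (idx < 0) {
                    continue; // assumption: lines without a separator (including blank lines) are skipped
                }
                pairs.add(new String[] {
                    line.substring(0, idx).trim().toUpperCase(Locale.ROOT),
                    line.substring(idx + 2).trim()
                });
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file in the expected format, then read it back.
        Path tmp = Files.createTempFile("simple-text", ".txt");
        Files.write(tmp, List.of("ham##See you at lunch", "", "spam##Win a prize"), StandardCharsets.UTF_8);
        for (String[] pair : read(tmp)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        Files.delete(tmp);
    }
}
```

Any I/O failure surfaces as an IOException from `read`, mirroring the Throws clause above.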
-
getProvenance
-
cacheProvenance
-