Class SimpleTextDataSource<T extends Output<T>>
java.lang.Object
org.tribuo.data.text.TextDataSource<T>
org.tribuo.data.text.impl.SimpleTextDataSource<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
- Direct Known Subclasses:
SimpleStringDataSource
A dataset for a simple data format for text classification experiments. A line in the file looks like:
OUTPUT##Document text
Each line in the file specifies a single output and document pair. Leading and trailing spaces will be trimmed from outputs and documents. Outputs will be converted to upper case.
As with all of our text data, the file should be in UTF-8.
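The line format described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the class name `SimpleLineFormat`, the nested `Pair` type, and the choice to split at the first `##` and to reject lines with a missing or blank half are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper illustrating the OUTPUT##Document text line format. */
public final class SimpleLineFormat {

    /** One parsed line: the normalised output and the document text. */
    public static final class Pair {
        public final String output;
        public final String document;

        Pair(String output, String document) {
            this.output = output;
            this.document = document;
        }
    }

    private SimpleLineFormat() {}

    /**
     * Splits the line at the first "##" (an assumption of this sketch),
     * trims both halves and upper-cases the output.
     * Returns an empty Optional when the separator is missing or either half is blank.
     */
    public static Optional<Pair> parse(String line) {
        int idx = line.indexOf("##");
        if (idx < 0) {
            return Optional.empty();
        }
        String output = line.substring(0, idx).trim().toUpperCase(Locale.ROOT);
        String document = line.substring(idx + 2).trim();
        if (output.isEmpty() || document.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Pair(output, document));
    }
}
```

With this sketch, a line such as `  spam ## Win a free prize now  ` yields the output `SPAM` and the document `Win a free prize now`.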
-
Field Summary
Fields inherited from class org.tribuo.data.text.TextDataSource
data, extractor, outputFactory, path, preprocessors
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected SimpleTextDataSource()
for olcut
SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
protected SimpleTextDataSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
-
Method Summary
Modifier and Type, Method, Description
protected ConfiguredDataSourceProvenance
void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
protected void read()
Reads the data from the Path.
Methods inherited from class org.tribuo.data.text.TextDataSource
getOutputFactory, handleDoc, iterator, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
provenance
-
-
Constructor Details
-
SimpleTextDataSource
protected SimpleTextDataSource()
for olcut
-
SimpleTextDataSource
public SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
- Throws:
IOException
-
SimpleTextDataSource
public SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
- Throws:
IOException
-
SimpleTextDataSource
-
-
Method Details
-
postConfig
Used by the OLCUT configuration system, and should not be called by external code.
- Throws:
IOException
-
parseLine
-
read
Description copied from class: TextDataSource
Reads the data from the Path.
- Specified by:
read
in class TextDataSource<T extends Output<T>>
- Throws:
IOException
- if there is any error reading the data.
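As a rough illustration of reading such a file from a Path as UTF-8 and letting IOException propagate, the following standard-library sketch may help. It is not this class's `read()` implementation: `Utf8LineReader` and `readLines` are hypothetical names, and skipping blank lines is an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads the lines of a UTF-8 text file from a Path. */
public final class Utf8LineReader {

    private Utf8LineReader() {}

    /**
     * Returns every non-blank line of the file, trimmed of leading and
     * trailing whitespace. Any read error surfaces as an IOException.
     */
    public static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        // Decode explicitly as UTF-8 rather than relying on the platform default.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) { // assumption: blank lines are skipped
                    lines.add(trimmed);
                }
            }
        }
        return lines;
    }
}
```

Each returned line could then be split into an output and a document according to the `OUTPUT##Document text` format described for this class.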
-
getProvenance
-
cacheProvenance
-