Package org.tribuo.data.text.impl
Class SimpleTextDataSource<T extends Output<T>>
java.lang.Object
org.tribuo.data.text.TextDataSource<T>
org.tribuo.data.text.impl.SimpleTextDataSource<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
- Direct Known Subclasses:
SimpleStringDataSource
A data source for a simple data format for text classification experiments. A line in the file looks like:
OUTPUT##Document text
Each line in the file specifies a single output and document pair. Leading and trailing spaces will be trimmed from outputs and documents. Outputs will be converted to upper case.
As with all of our text data, the file should be in UTF-8.
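The line format described above can be illustrated with a small standalone sketch. It is not this class's implementation and does not use Tribuo at all: the class name SimpleFormatSketch, the Pair holder, and the splitLine and readAll helpers are hypothetical names introduced for illustration. The sketch assumes that a line is split on the first ## only and that lines without a separator are skipped; neither behavior is stated on this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class SimpleFormatSketch {

    /** One parsed line: the normalized output string and the document text. */
    public static final class Pair {
        public final String output;
        public final String document;

        Pair(String output, String document) {
            this.output = output;
            this.document = document;
        }
    }

    /**
     * Splits one "OUTPUT##Document text" line into its two parts.
     * Assumption: only the first "##" is treated as the separator, and
     * lines without a separator produce an empty Optional.
     */
    public static Optional<Pair> splitLine(String line) {
        int idx = line.indexOf("##");
        if (idx < 0) {
            return Optional.empty();
        }
        // Trim both parts; convert the output to upper case.
        String output = line.substring(0, idx).trim().toUpperCase(Locale.ROOT);
        String document = line.substring(idx + 2).trim();
        return Optional.of(new Pair(output, document));
    }

    /** Reads a UTF-8 file and parses every line that contains the separator. */
    public static List<Pair> readAll(Path path) throws IOException {
        List<Pair> pairs = new ArrayList<>();
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            splitLine(line).ifPresent(pairs::add);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Write a two-line example file in UTF-8, then read it back.
        Path tmp = Files.createTempFile("simple-text", ".txt");
        Files.write(tmp, List.of("spam ## Win a prize now ", "ham##See you at lunch"),
                StandardCharsets.UTF_8);
        for (Pair p : readAll(tmp)) {
            System.out.println(p.output + " -> " + p.document);
        }
        Files.delete(tmp);
    }
}
```

Writing the file with an explicit StandardCharsets.UTF_8 avoids depending on the platform default encoding, which matches the UTF-8 requirement stated above.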
-
Nested Class Summary
-
Field Summary
Fields inherited from class org.tribuo.data.text.TextDataSource
data, extractor, outputFactory, path, preprocessors
-
Constructor Summary
Modifier / Constructor / Description
protected SimpleTextDataSource()
For OLCUT.
SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
Constructs a simple text data source by reading lines from the supplied file.
SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
Constructs a simple text data source by reading lines from the supplied path.
protected SimpleTextDataSource(OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor)
-
Method Summary
Modifier and Type / Method / Description
protected ConfiguredDataSourceProvenance
void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
protected void read()
Reads the data from the Path.
Methods inherited from class org.tribuo.data.text.TextDataSource
getOutputFactory, handleDoc, iterator, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
provenance
-
-
Constructor Details
-
SimpleTextDataSource
protected SimpleTextDataSource()
For OLCUT.
-
SimpleTextDataSource
public SimpleTextDataSource(Path path, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
Constructs a simple text data source by reading lines from the supplied path.
- Parameters:
path - The path to load.
outputFactory - The output factory to use.
extractor - The feature extractor.
- Throws:
IOException - If the path could not be read.
-
SimpleTextDataSource
public SimpleTextDataSource(File file, OutputFactory<T> outputFactory, TextFeatureExtractor<T> extractor) throws IOException
Constructs a simple text data source by reading lines from the supplied file.
- Parameters:
file - The file to load.
outputFactory - The output factory to use.
extractor - The feature extractor.
- Throws:
IOException - If the file could not be read.
-
SimpleTextDataSource
-
-
Method Details
-
postConfig
Used by the OLCUT configuration system, and should not be called by external code.
- Throws:
IOException
-
parseLine
-
read
Description copied from class: TextDataSource
Reads the data from the Path.
- Specified by:
read in class TextDataSource<T extends Output<T>>
- Throws:
IOException - if there is any error reading the data.
-
getProvenance
-
cacheProvenance
-