Class SimpleTextDataSource<T extends Output<T>>

java.lang.Object
org.tribuo.data.text.TextDataSource<T>
org.tribuo.data.text.impl.SimpleTextDataSource<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
Direct Known Subclasses:
SimpleStringDataSource

public class SimpleTextDataSource<T extends Output<T>> extends TextDataSource<T>
A data source for a simple text format used in text classification experiments. Each line in the file looks like:
 OUTPUT##Document text
 
Each line in the file specifies a single output and document pair. Leading and trailing spaces will be trimmed from outputs and documents. Outputs will be converted to upper case.
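The normalization described above can be illustrated with a minimal sketch. This is not the library's implementation: the class name `SimpleTextLineSketch` and the method `parseLine` are hypothetical, and splitting at the first `##`, returning `null` for a line without a separator, and upper-casing with `Locale.ROOT` are all assumptions made for the example.

```java
import java.util.Locale;

public class SimpleTextLineSketch {

    /**
     * Splits one line of the form OUTPUT##Document text.
     * Returns {output, document}, or null when the line has no "##" separator.
     * Assumption: the split happens at the first "##" only.
     */
    public static String[] parseLine(String line) {
        int idx = line.indexOf("##");
        if (idx < 0) {
            return null; // hypothetical handling of malformed lines
        }
        // Trim surrounding spaces, then upper-case the output.
        // Locale.ROOT is an assumption that keeps the result locale-independent.
        String output = line.substring(0, idx).trim().toUpperCase(Locale.ROOT);
        // The document text is trimmed but its case is left unchanged.
        String document = line.substring(idx + 2).trim();
        return new String[] {output, document};
    }

    public static void main(String[] args) {
        String[] pair = parseLine("  positive ## A great film.  ");
        System.out.println(pair[0] + " -> " + pair[1]); // prints "POSITIVE -> A great film."
    }
}
```

Under these assumptions, `positive`, `Positive`, and ` POSITIVE ` all map to the same output, so inconsistent casing or spacing in the file does not create spurious extra classes.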

As with all of our text data, the file should be in UTF-8.
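
When producing such a file from Java, passing the charset explicitly avoids depending on the platform default. The following is a sketch using only the standard library; the class name `Utf8FileSketch`, the method `roundTrip`, and the sample lines are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class Utf8FileSketch {

    /** Writes two sample lines to a temporary file as UTF-8 and reads them back. */
    public static List<String> roundTrip() {
        // Hypothetical sample data in the OUTPUT##Document text format.
        List<String> lines = Arrays.asList(
                "POSITIVE##The caf\u00e9 was lovely",
                "NEGATIVE##The service was slow");
        try {
            Path file = Files.createTempFile("simple-text", ".txt");
            // Name the charset explicitly on both write and read.
            Files.write(file, lines, StandardCharsets.UTF_8);
            List<String> readBack = Files.readAllLines(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return readBack;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        roundTrip().forEach(System.out::println);
    }
}
```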