Class DirectoryFileSource<T extends Output<T>>

java.lang.Object
org.tribuo.data.text.DirectoryFileSource<T>
Type Parameters:
T - The output type of the examples built by the underlying text processing infrastructure.
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>

public class DirectoryFileSource<T extends Output<T>> extends Object implements ConfigurableDataSource<T>
A data source for a somewhat common format for text classification datasets: a top-level directory that contains a number of subdirectories. Each of these subdirectories contains the data for an output whose name is the name of the subdirectory.

In these subdirectories are a number of files. Each file represents a single document that should be labeled with the name of the subdirectory.

This data source will produce an appropriately labeled Example<T> from each of these files.
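
The directory layout described above can be illustrated with a minimal sketch that uses only the JDK. It does not call the Tribuo API and is not how DirectoryFileSource is implemented; the class name DirectoryLayoutSketch and the method readLabeledDocuments are hypothetical names introduced for illustration. It also assumes each document is UTF-8 text and skips anything that is not a subdirectory (at the top level) or a regular file (inside a subdirectory).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tribuo: it only mirrors the on-disk
// layout that the class description talks about.
final class DirectoryLayoutSketch {

    private DirectoryLayoutSketch() { }

    /**
     * Walks a top-level directory and groups document text by label,
     * where the label is the name of the subdirectory holding the file.
     * Assumption: every document is UTF-8 encoded text.
     */
    static Map<String, List<String>> readLabeledDocuments(Path root) throws IOException {
        // TreeMap keeps the labels in a deterministic (sorted) order.
        Map<String, List<String>> documentsByLabel = new TreeMap<>();
        try (DirectoryStream<Path> labelDirs = Files.newDirectoryStream(root)) {
            for (Path labelDir : labelDirs) {
                if (!Files.isDirectory(labelDir)) {
                    continue; // stray files at the top level carry no label
                }
                String label = labelDir.getFileName().toString();
                List<String> documents = new ArrayList<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(labelDir)) {
                    for (Path file : files) {
                        if (Files.isRegularFile(file)) {
                            // One file is one document for this label.
                            byte[] bytes = Files.readAllBytes(file);
                            documents.add(new String(bytes, StandardCharsets.UTF_8));
                        }
                    }
                }
                documentsByLabel.put(label, documents);
            }
        }
        return documentsByLabel;
    }
}
```

Given a root directory containing subdirectories such as spam and ham, the returned map would have one key per subdirectory, each mapped to the text of the files inside it. The real data source goes further and turns each document into a labeled Example<T>, which this sketch does not attempt.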