Class TextDataSource<T extends Output<T>>

java.lang.Object
org.tribuo.data.text.TextDataSource<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
Direct Known Subclasses:
SimpleTextDataSource

public abstract class TextDataSource<T extends Output<T>> extends Object implements ConfigurableDataSource<T>
A base class for textual data sets. We assume that all textual data is written and read using UTF-8.
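The class description says that all text is read and written as UTF-8, and that concrete formats are left to subclasses such as SimpleTextDataSource. The sketch below shows that pattern in plain Java. It is not Tribuo's API: `SketchTextSource`, `LinePerDocSource` and the one-document-per-line format are hypothetical. It assumes only that a subclass parses the file and fills an in-memory list that the source then iterates over.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for an abstract text data source (not Tribuo code).
// It is Iterable over the documents read so far; subclasses decide the file format.
abstract class SketchTextSource implements Iterable<String> {
    protected final Path path;
    protected final List<String> data = new ArrayList<>();

    protected SketchTextSource(Path path) {
        this.path = path;
    }

    // Subclasses parse the file at `path` and append documents to `data`.
    protected abstract void read() throws IOException;

    @Override
    public Iterator<String> iterator() {
        // Expose a read-only view so callers cannot mutate the loaded data.
        return Collections.unmodifiableList(data).iterator();
    }
}

// Hypothetical subclass: one document per line, always decoded as UTF-8.
class LinePerDocSource extends SketchTextSource {
    LinePerDocSource(Path path) throws IOException {
        super(path);
        read(); // called here, after the superclass is fully initialised
    }

    @Override
    protected void read() throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) { // skip blank lines
                    data.add(line);
                }
            }
        }
    }
}
```

The sketch decodes with an explicit charset rather than the platform default. Files written on one machine therefore read back identically on another, which is the point of fixing the encoding to UTF-8.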
  • Field Details

    • preprocessors

      @Config(description="The document preprocessors to run on each document in the data source.") protected List<DocumentPreprocessor> preprocessors
      Document preprocessors that should be run on the documents that make up this data set.
    • path

      @Config(mandatory=true, description="The path to read the data from.") protected Path path
      The path that data was read from.
    • outputFactory

      @Config(mandatory=true, description="The factory that converts a String into an Output instance.") protected OutputFactory<T> outputFactory
      The factory that converts a String into an Output.
    • extractor

      @Config(mandatory=true, description="The feature extractor that generates Features from text.") protected TextFeatureExtractor<T> extractor
      The extractor that we'll use to turn text into examples.
    • data

      protected final List<Example<T>> data
      The actual data read out of the text file.
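Taken together, these fields describe a pipeline. Each document read from `path` passes through `preprocessors` in order, `outputFactory` turns its label string into an output, `extractor` turns the text into an example, and the result is stored in `data`. The plain-Java sketch below illustrates that flow. Every type in it (`DocPreprocessor`, `SketchOutputFactory`, `SketchExtractor`, `SketchExample`, `SketchPipeline`, `BagOfWords`) is hypothetical and is not Tribuo's interface. It also assumes, as a design choice of the sketch only, that a preprocessor returning `null` drops the document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the roles of DocumentPreprocessor, OutputFactory
// and TextFeatureExtractor; these are NOT Tribuo's interfaces.
interface DocPreprocessor {
    String process(String doc); // returning null drops the document (sketch assumption)
}

interface SketchOutputFactory<T> {
    T fromString(String label);
}

interface SketchExtractor<T> {
    SketchExample<T> extract(T output, String text);
}

final class SketchExample<T> {
    final T output;
    final Map<String, Double> features;

    SketchExample(T output, Map<String, Double> features) {
        this.output = output;
        this.features = features;
    }
}

final class SketchPipeline<T> {
    final List<DocPreprocessor> preprocessors = new ArrayList<>();
    final SketchOutputFactory<T> outputFactory;
    final SketchExtractor<T> extractor;
    final List<SketchExample<T>> data = new ArrayList<>();

    SketchPipeline(SketchOutputFactory<T> outputFactory, SketchExtractor<T> extractor) {
        this.outputFactory = outputFactory;
        this.extractor = extractor;
    }

    // Runs every preprocessor in order; a null result short-circuits the chain.
    String handleDoc(String doc) {
        String current = doc;
        for (DocPreprocessor p : preprocessors) {
            current = p.process(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Converts one (label, text) pair into an example and stores it in `data`.
    void add(String label, String text) {
        String processed = handleDoc(text);
        if (processed != null) {
            data.add(extractor.extract(outputFactory.fromString(label), processed));
        }
    }
}

// Hypothetical bag-of-words extractor: counts whitespace-separated tokens.
final class BagOfWords implements SketchExtractor<String> {
    @Override
    public SketchExample<String> extract(String output, String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return new SketchExample<>(output, counts);
    }
}
```

For example, a pipeline whose preprocessors lower-case the text and then drop blank documents would turn `("pos", "Good good movie")` into one example with the features `good=2.0` and `movie=1.0`, and would store nothing for a whitespace-only document.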
  • Constructor Details

  • Method Details