Class CSVDataSource<T extends Output<T>>

java.lang.Object
org.tribuo.data.columnar.ColumnarDataSource<T>
org.tribuo.data.csv.CSVDataSource<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>

public class CSVDataSource<T extends Output<T>> extends ColumnarDataSource<T>
A DataSource for loading delimiter-separated data from a text file (e.g., CSV, TSV) and applying FieldProcessors to it.
  • Constructor Details

    • CSVDataSource

      public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired)
      Creates a CSVDataSource using the specified RowProcessor to process the data.

      Uses ',' as the separator, '"' as the quote character, and '\' as the escape character.

      Parameters:
      dataPath - The Path to the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
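      To illustrate what the separator, quote, and escape characters mean when a row is read, here is a minimal, self-contained sketch of a line splitter. It is not the parser used by CSVDataSource; the class CsvLineSketch and the method parseLine are hypothetical names introduced only for this example, and the handling is deliberately simplified (for instance, it does not treat doubled quote characters specially).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT the parser used by CSVDataSource.
final class CsvLineSketch {

    // Splits one line into fields using the given separator, quote and escape characters.
    static List<String> parseLine(String line, char separator, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // An escaped character is copied literally, whatever it is.
                current.append(line.charAt(++i));
            } else if (c == quote) {
                // Quote characters toggle quoted mode and are not part of the field.
                inQuotes = !inQuotes;
            } else if (c == separator && !inQuotes) {
                // An unquoted separator ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing separator.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Same characters as the defaults described above: ',' '"' and '\'.
        System.out.println(parseLine("id,\"Smith, Jane\",42", ',', '"', '\\'));
        // A tab-separated line, as with the constructors that accept a separator.
        System.out.println(parseLine("id\tname\tage", '\t', '"', '\\'));
    }
}
```

      In this sketch a separator inside a quoted field, or one preceded by the escape character, does not start a new field. Passing '\t' as the separator corresponds to reading a TSV file.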
    • CSVDataSource

      public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired)
      Creates a CSVDataSource using the specified RowProcessor to process the data.

      Uses ',' as the separator, '"' as the quote character, and '\' as the escape character.

      Parameters:
      dataFile - A URI for the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
    • CSVDataSource

      public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
      Creates a CSVDataSource using the specified RowProcessor to process the data.

      Uses '"' as the quote character and '\' as the escape character.

      Parameters:
      dataPath - The Path to the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
    • CSVDataSource

      public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
      Creates a CSVDataSource using the specified RowProcessor to process the data.

      Uses '"' as the quote character and '\' as the escape character.

      Parameters:
      dataFile - A URI for the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
    • CSVDataSource

      public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
      Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
      Parameters:
      dataFile - A URI for the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
      quote - The quote character in the data file.
    • CSVDataSource

      public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
      Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.

      Used in CSVLoader to read a CSV without headers.

      If headers is an empty list, the headers are read from the file; otherwise the supplied list is used as the column names and the file is assumed not to contain a header row.

      Parameters:
      dataFile - A URI for the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
      quote - The quote character in the data file.
      headers - The CSV column headers, or an empty list to read them from the file.
    • CSVDataSource

      public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
      Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
      Parameters:
      dataPath - The Path to the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
      quote - The quote character in the data file.
    • CSVDataSource

      public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
      Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.

      If headers is an empty list, the headers are read from the file; otherwise the supplied list is used as the column names and the file is assumed not to contain a header row.

      Parameters:
      dataPath - The Path to the data file.
      rowProcessor - The row processor which converts a row into an Example.
      outputRequired - Whether the output must be present in the data file.
      separator - The separator character in the data file.
      quote - The quote character in the data file.
      headers - The CSV column headers, or an empty list to read them from the file.
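      The rule for the headers argument can be sketched in plain Java as follows. This is an illustration of the documented behaviour under simplifying assumptions, not the library's code: the class HeaderResolutionSketch, the nested Resolved type, and the resolve method are hypothetical, and the header row is split naively on the separator without any quote handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the logic shipped in CSVDataSource.
final class HeaderResolutionSketch {

    // Holds the column names and the rows that are treated as data.
    static final class Resolved {
        final List<String> headers;
        final List<String> dataRows;

        Resolved(List<String> headers, List<String> dataRows) {
            this.headers = headers;
            this.dataRows = dataRows;
        }
    }

    // Applies the documented rule: an empty list means "take the headers from the file".
    static Resolved resolve(List<String> lines, List<String> suppliedHeaders, char separator) {
        if (suppliedHeaders.isEmpty()) {
            if (lines.isEmpty()) {
                throw new IllegalArgumentException("No header row to read");
            }
            // Naive split for the sketch; a real parser would also honour quotes.
            String regex = Pattern.quote(String.valueOf(separator));
            List<String> headers = Arrays.asList(lines.get(0).split(regex, -1));
            // The first line was consumed as the header row, so it is not data.
            return new Resolved(headers, new ArrayList<>(lines.subList(1, lines.size())));
        }
        // Headers were supplied, so every line in the file is treated as data.
        return new Resolved(new ArrayList<>(suppliedHeaders), new ArrayList<>(lines));
    }

    public static void main(String[] args) {
        List<String> withHeader = Arrays.asList("height,width,species", "1.0,2.0,setosa");
        Resolved fromFile = resolve(withHeader, Collections.emptyList(), ',');
        System.out.println(fromFile.headers + " with " + fromFile.dataRows.size() + " data row(s)");

        List<String> headerless = Arrays.asList("1.0,2.0,setosa", "3.0,4.0,virginica");
        Resolved supplied = resolve(headerless, Arrays.asList("height", "width", "species"), ',');
        System.out.println(supplied.headers + " with " + supplied.dataRows.size() + " data row(s)");
    }
}
```

      One consequence of this rule, visible in the sketch, is that supplying a non-empty header list for a file that does contain a header row would cause that row to be treated as data.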
  • Method Details