Package org.tribuo.data.csv
Class CSVDataSource<T extends Output<T>>
java.lang.Object
org.tribuo.data.columnar.ColumnarDataSource<T>
org.tribuo.data.csv.CSVDataSource<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<T>>, ConfigurableDataSource<T>, DataSource<T>
A DataSource for loading character-separated data from a text file (e.g., CSV or TSV) and applying FieldProcessors to it.
-
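The description above says rows are turned into examples by applying per-column processors. The sketch below illustrates that idea with plain JDK types only. It is not Tribuo's RowProcessor or FieldProcessor API: the class names `RowProcessingSketch` and `ExampleSketch`, the use of `java.util.function.Function` as a stand-in processor, and the `colour@red` feature naming are all hypothetical choices made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class RowProcessingSketch {
    /** Hypothetical stand-in for an Example: an output value plus named numeric features. */
    public static final class ExampleSketch {
        public final String output;
        public final Map<String, Double> features;

        ExampleSketch(String output, Map<String, Double> features) {
            this.output = output;
            this.features = features;
        }
    }

    /**
     * Converts one row (column name to raw string) into an ExampleSketch. Each entry of
     * fieldProcessors maps a column name to a function that yields named features from that
     * column's value; responseColumn names the column holding the output.
     */
    public static ExampleSketch process(Map<String, String> row, String responseColumn,
                                        Map<String, Function<String, Map<String, Double>>> fieldProcessors) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, Map<String, Double>>> e : fieldProcessors.entrySet()) {
            String value = row.get(e.getKey());
            if (value != null) {
                // Apply the column's processor and merge its features into the example.
                features.putAll(e.getValue().apply(value));
            }
        }
        return new ExampleSketch(row.get(responseColumn), features);
    }

    public static void main(String[] args) {
        Map<String, Function<String, Map<String, Double>>> processors = new LinkedHashMap<>();
        // Numeric column: parse the value into a single feature named after the column.
        processors.put("height", v -> Map.of("height", Double.parseDouble(v)));
        // Categorical column: an indicator feature "colour@<value>" with value 1.0.
        processors.put("colour", v -> Map.of("colour@" + v, 1.0));

        Map<String, String> row = Map.of("height", "1.8", "colour", "red", "label", "yes");
        ExampleSketch ex = process(row, "label", processors);
        System.out.println(ex.output + " " + ex.features);
    }
}
```

Keeping one processor per column means each column's parsing rule can be changed independently of the file-reading logic, which is the separation the class description implies.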
Nested Class Summary
-
Field Summary
Fields inherited from class org.tribuo.data.columnar.ColumnarDataSource
outputRequired, rowProcessor
-
Constructor Summary
Constructor and Description
- CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired)
  Creates a CSVDataSource using the specified RowProcessor to process the data.
- CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
  Creates a CSVDataSource using the specified RowProcessor to process the data.
- CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
  Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
- CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
  Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
- CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired)
  Creates a CSVDataSource using the specified RowProcessor to process the data.
- CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
  Creates a CSVDataSource using the specified RowProcessor to process the data.
- CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
  Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
- CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
  Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
-
Method Summary
Modifier and Type, Method, and Description
- void postConfig()
  Used by the OLCUT configuration system, and should not be called by external code.
- ColumnarIterator rowIterator()
  The iterator that emits ColumnarIterator.Row objects from the underlying data source.
- String toString()
Methods inherited from class org.tribuo.data.columnar.ColumnarDataSource
getMetadataTypes, getOutputFactory, iterator
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
CSVDataSource
public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired)
Creates a CSVDataSource using the specified RowProcessor to process the data. Uses ',' as the separator, '"' as the quote character, and '\' as the escape character.
- Parameters:
dataPath - The Path to the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
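The three characters named above (separator, quote, escape) control how each line is split into fields. The following is a simplified, hypothetical tokenizer written only to illustrate those roles; it is not the parser CSVDataSource uses internally, and it handles a single line with no embedded newlines. The class name `SeparatedLineSketch` is an assumption, and treating a doubled quote inside a quoted field as a literal quote is a common CSV convention assumed here rather than taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatedLineSketch {
    /**
     * Splits one line into fields. A separator inside a quoted region does not end the field,
     * the escape character makes the following character literal, and a doubled quote inside
     * a quoted region produces one literal quote character.
     */
    public static List<String> split(String line, char separator, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // Take the next character literally, whatever it is.
                current.append(line.charAt(++i));
            } else if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // doubled quote inside quotes -> one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // enter or leave a quoted region
                }
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString()); // unquoted separator ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The defaults described above: ',' separator, '"' quote, '\' escape.
        System.out.println(split("1.5,\"red, bright\",yes", ',', '"', '\\'));
        // A tab-separated line with the same quote and escape characters.
        System.out.println(split("1.5\tred\tyes", '\t', '"', '\\'));
    }
}
```

In the first call the comma inside `"red, bright"` stays part of the second field, which is why the quote character matters even when the separator is correct.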
-
CSVDataSource
public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired)
Creates a CSVDataSource using the specified RowProcessor to process the data. Uses ',' as the separator, '"' as the quote character, and '\' as the escape character.
- Parameters:
dataFile - A URI for the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
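Each constructor comes in a Path form and a URI form. A caller holding one can obtain the other with standard JDK calls, as in this sketch; it shows only the JDK conversions and does not touch Tribuo. The file name `data/train.csv` and the class name `PathUriSketch` are placeholders.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathUriSketch {
    public static void main(String[] args) {
        // A relative Path; toUri() resolves it against the current working directory.
        Path dataPath = Paths.get("data", "train.csv");
        // An absolute "file:" URI, the kind of value the URI overloads accept.
        URI dataFile = dataPath.toUri();
        // The reverse direction: Paths.get(URI) turns a "file:" URI back into an absolute Path.
        Path back = Paths.get(dataFile);
        System.out.println(dataFile);
        System.out.println(back.isAbsolute() + " " + back.endsWith(dataPath));
    }
}
```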
-
CSVDataSource
public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
Creates a CSVDataSource using the specified RowProcessor to process the data. Uses '"' as the quote character and '\' as the escape character.
- Parameters:
dataPath - The Path to the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
-
CSVDataSource
public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator)
Creates a CSVDataSource using the specified RowProcessor to process the data. Uses '"' as the quote character and '\' as the escape character.
- Parameters:
dataFile - A URI for the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
-
CSVDataSource
public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
- Parameters:
dataFile - A URI for the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
quote - The quote character in the data file.
-
CSVDataSource
public CSVDataSource(URI dataFile, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file. Used in CSVLoader to read a CSV without headers. If headers is the empty list, the headers are read from the file; otherwise the file is assumed not to contain a header row.
- Parameters:
dataFile - A URI for the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
quote - The quote character in the data file.
headers - The CSV file headers, or the empty list to read them from the file.
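The headers rule described above (an empty list means the first line is the header row; a non-empty list means every line is data) can be sketched as follows. This is an illustration of the documented rule only, not Tribuo's code: the names `HeaderResolutionSketch`, `Resolved`, and `firstDataLine` are hypothetical, and the header line is split with a plain `String.split` that ignores quoting for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class HeaderResolutionSketch {
    /** The column names to use and the index of the first line that holds data. */
    public static final class Resolved {
        public final List<String> headers;
        public final int firstDataLine;

        Resolved(List<String> headers, int firstDataLine) {
            this.headers = headers;
            this.firstDataLine = firstDataLine;
        }
    }

    public static Resolved resolve(List<String> suppliedHeaders, List<String> lines, char separator) {
        if (suppliedHeaders.isEmpty()) {
            // Empty list: line 0 is the header row, so data starts at line 1.
            String[] names = lines.get(0).split(Pattern.quote(String.valueOf(separator)), -1);
            return new Resolved(Arrays.asList(names), 1);
        }
        // Non-empty list: the file is assumed to have no header row, so line 0 is already data.
        return new Resolved(suppliedHeaders, 0);
    }

    public static void main(String[] args) {
        Resolved fromFile = resolve(Collections.emptyList(), Arrays.asList("x,y,label", "1,2,a"), ',');
        Resolved supplied = resolve(Arrays.asList("x", "y", "label"), Arrays.asList("1,2,a"), ',');
        System.out.println(fromFile.headers + " data from line " + fromFile.firstDataLine);
        System.out.println(supplied.headers + " data from line " + supplied.firstDataLine);
    }
}
```

The practical consequence is that passing headers for a file that does contain a header row would turn that row into a data row, so the list should only be supplied for header-less files.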
-
CSVDataSource
public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote)
Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file.
- Parameters:
dataPath - The Path to the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
quote - The quote character in the data file.
-
CSVDataSource
public CSVDataSource(Path dataPath, RowProcessor<T> rowProcessor, boolean outputRequired, char separator, char quote, List<String> headers)
Creates a CSVDataSource using the specified RowProcessor to process the data, and the supplied separator and quote characters to read the input data file. If headers is the empty list, the headers are read from the file; otherwise the file is assumed not to contain a header row.
- Parameters:
dataPath - The Path to the data file.
rowProcessor - The row processor which converts a row into an Example.
outputRequired - Whether the output must be present in the data file.
separator - The separator character in the data file.
quote - The quote character in the data file.
headers - The CSV file headers, or the empty list to read them from the file.
-
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
-
toString
-
rowIterator
Description copied from class: ColumnarDataSource
The iterator that emits ColumnarIterator.Row objects from the underlying data source.
- Specified by:
rowIterator in class ColumnarDataSource<T extends Output<T>>
- Returns:
The row level iterator.
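A row-level iterator of this kind reads lines lazily and pairs each value with its column name. The sketch below models a row as a `Map` from header to value; it is a hypothetical stand-in for ColumnarIterator.Row, not its actual API. The class name `RowIteratorSketch`, the plain `split` that ignores quoting, and the choice to throw `IllegalStateException` on a field-count mismatch are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

public class RowIteratorSketch implements Iterator<Map<String, String>> {
    private final List<String> headers;
    private final Iterator<String> lines;
    private final Pattern separator;

    public RowIteratorSketch(List<String> headers, Iterator<String> lines, char separator) {
        this.headers = headers;
        this.lines = lines;
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        this.separator = Pattern.compile(Pattern.quote(String.valueOf(separator)));
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public Map<String, String> next() {
        if (!lines.hasNext()) {
            throw new NoSuchElementException();
        }
        // Only one line is read per call, so large files are never held in memory at once.
        String[] values = separator.split(lines.next(), -1);
        if (values.length != headers.size()) {
            throw new IllegalStateException("Expected " + headers.size() + " fields, found " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            row.put(headers.get(i), values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        RowIteratorSketch rows = new RowIteratorSketch(
                List.of("x", "label"), List.of("1.0,a", "2.0,b").iterator(), ',');
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```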
-
getProvenance
-