Class CSVLoader<T extends Output<T>>
- Type Parameters:
T
- The type of the output generated.
The delimiter and quote characters are user controlled, so this class can parse TSVs, CSVs, semicolon-separated data, and any other format that uses a single-character delimiter.
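To make the role of the two characters concrete, here is a minimal JDK-only sketch of splitting one line on a configurable separator while honouring a configurable quote character. It is not Tribuo's parser: `DelimitedLineSketch` and `split` are hypothetical names, and the handling of a doubled quote inside a quoted field is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of Tribuo.
final class DelimitedLineSketch {

    // Splits one line on `separator`, treating text between `quote` pairs as literal.
    // Assumption of this sketch: a doubled quote inside a quoted field is one quote character.
    static List<String> split(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                }
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1.0,2.5,cat", ',', '"'));      // comma separated
        System.out.println(split("1.0\t2.5\tcat", '\t', '"'));   // tab separated
        System.out.println(split("1.0;'a;b';cat", ';', '\''));   // semicolon separated, single-quote quoting
    }
}
```

The same function handles all three inputs because only the two characters change, which is the property the class description relies on.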
This class is a simple loader *only* for numerical CSV files with a String response field. If you need more complex processing, the response field isn't present, or you don't wish to use all of the columns as features, then you should use CSVDataSource and build a RowProcessor to cope with your specific input format.
CSVLoader is thread safe and immutable.
Multi-output responses such as MultiLabel or Regressor can be processed in two different ways: either as a single column of separated values, or as multiple columns. If there is a single column, the value is passed directly to the OutputFactory. If there are multiple response columns, the name of each column is concatenated with its value, and a list of the concatenated values is passed to the OutputFactory.
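The multiple-column case can be sketched with JDK collections alone. The class and method names below (`ResponseEncodingSketch`, `encodeMultiple`) and the sample column names are hypothetical. The sketch shows only the name and value concatenation, using the 'column-name=column-value' form given in the method documentation, and is not Tribuo's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a hypothetical helper, not part of Tribuo.
final class ResponseEncodingSketch {

    // For each response column, in the set's iteration order, emits "name=value".
    static List<String> encodeMultiple(Map<String, String> row, Set<String> responseNames) {
        List<String> encoded = new ArrayList<>();
        for (String name : responseNames) {
            encoded.add(name + "=" + row.get(name));
        }
        return encoded;
    }

    public static void main(String[] args) {
        // One parsed row: a feature column followed by two response columns.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("f1", "0.5");
        row.put("height", "1.8");
        row.put("weight", "72");

        Set<String> responseNames = new LinkedHashSet<>(List.of("height", "weight"));
        System.out.println(encodeMultiple(row, responseNames)); // [height=1.8, weight=72]
    }
}
```

In the single-column case no such encoding is needed, because the raw column value is handed over unchanged.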
-
Constructor Summary
CSVLoader(char separator, char quote, OutputFactory<T> outputFactory)
- Creates a CSVLoader using the supplied separator, quote and output factory.
CSVLoader(char separator, OutputFactory<T> outputFactory)
- Creates a CSVLoader using the supplied separator and output factory.
CSVLoader(OutputFactory<T> outputFactory)
- Creates a CSVLoader using the supplied output factory.
-
Method Summary
MutableDataset<T> load(Path csvPath, String responseName)
- Loads a DataSource from the specified csv file then wraps it in a dataset.
MutableDataset<T> load(Path csvPath, String responseName, String[] header)
- Loads a DataSource from the specified csv file then wraps it in a dataset.
MutableDataset<T> load(Path csvPath, Set<String> responseNames)
- Loads a DataSource from the specified csv file then wraps it in a dataset.
MutableDataset<T> load(Path csvPath, Set<String> responseNames, String[] header)
- Loads a DataSource from the specified csv file then wraps it in a dataset.
DataSource<T> loadDataSource(URL csvPath, String responseName)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(URL csvPath, String responseName, String[] header)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(URL csvPath, Set<String> responseNames)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(URL csvPath, Set<String> responseNames, String[] header)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(Path csvPath, String responseName)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(Path csvPath, String responseName, String[] header)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(Path csvPath, Set<String> responseNames)
- Loads a DataSource from the specified csv path.
DataSource<T> loadDataSource(Path csvPath, Set<String> responseNames, String[] header)
- Loads a DataSource from the specified csv path.
-
Constructor Details
-
CSVLoader
Creates a CSVLoader using the supplied separator, quote and output factory.
- Parameters:
separator
- The separator character.
quote
- The quote character.
outputFactory
- The output factory.
-
CSVLoader
Creates a CSVLoader using the supplied separator and output factory. Sets the quote to CSVIterator.QUOTE.
- Parameters:
separator
- The separator character.
outputFactory
- The output factory.
-
CSVLoader
Creates a CSVLoader using the supplied output factory. Sets the separator to CSVIterator.SEPARATOR and the quote to CSVIterator.QUOTE.
- Parameters:
outputFactory
- The output factory.
-
-
Method Details
-
load
Loads a DataSource from the specified csv file then wraps it in a dataset.
- Parameters:
csvPath
- The path to load.
responseName
- The name of the response variable.
- Returns:
- A dataset containing the csv data.
- Throws:
IOException
- If the read failed.
-
load
public MutableDataset<T> load(Path csvPath, String responseName, String[] header) throws IOException
Loads a DataSource from the specified csv file then wraps it in a dataset.
- Parameters:
csvPath
- The path to load.
responseName
- The name of the response variable.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A dataset containing the csv data.
- Throws:
IOException
- If the read failed.
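The role of the header parameter can be pictured as attaching column names to a file whose first line is already data. The sketch below is JDK-only and entirely hypothetical (`SuppliedHeaderSketch`, `nameColumns`). Rejecting a row whose field count differs from the header length is a choice made for this sketch, not documented Tribuo behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Tribuo.
final class SuppliedHeaderSketch {

    // Pairs each supplied column name with the field at the same position.
    static Map<String, String> nameColumns(String[] header, String[] fields) {
        // Assumption of this sketch: a length mismatch is treated as an error.
        if (header.length != fields.length) {
            throw new IllegalArgumentException(
                "Expected " + header.length + " fields, found " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<>(); // keeps column order
        for (int i = 0; i < header.length; i++) {
            row.put(header[i], fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] header = {"f1", "f2", "label"};        // supplied by the caller
        String[] firstLine = {"1.0", "2.5", "cat"};     // first line of a headerless file
        System.out.println(nameColumns(header, firstLine)); // {f1=1.0, f2=2.5, label=cat}
    }
}
```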
-
load
Loads a DataSource from the specified csv file then wraps it in a dataset.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The path to load.
responseNames
- The names of the response variables.
- Returns:
- A dataset containing the csv data.
- Throws:
IOException
- If the read failed.
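The advice to pass an ordered set can be illustrated with standard JDK collections. The helper name `orderedResponses` and the column names are hypothetical. The point of the sketch is only that `LinkedHashSet` iterates in insertion order and `TreeSet` in sorted order, whereas `HashSet` makes no ordering guarantee.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a hypothetical helper, not part of Tribuo.
final class ResponseOrderSketch {

    // Builds a set that iterates in the order the names were given.
    static Set<String> orderedResponses(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> insertionOrder = orderedResponses("weight", "height", "age");
        System.out.println(insertionOrder); // [weight, height, age]

        // A sorted set is also ordered, but by natural ordering rather than insertion.
        Set<String> sortedOrder = new TreeSet<>(insertionOrder);
        System.out.println(sortedOrder);    // [age, height, weight]
    }
}
```

Either set gives a stable traversal from run to run, so the emitted outputs stay in the same order.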
-
load
public MutableDataset<T> load(Path csvPath, Set<String> responseNames, String[] header) throws IOException
Loads a DataSource from the specified csv file then wraps it in a dataset.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The path to load.
responseNames
- The names of the response variables.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A dataset containing the csv data.
- Throws:
IOException
- If the read failed.
-
loadDataSource
Loads a DataSource from the specified csv path.
- Parameters:
csvPath
- The csv to load from.
responseName
- The name of the response variable.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
Loads a DataSource from the specified csv path.
- Parameters:
csvPath
- The csv to load from.
responseName
- The name of the response variable.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
public DataSource<T> loadDataSource(Path csvPath, String responseName, String[] header) throws IOException
Loads a DataSource from the specified csv path.
- Parameters:
csvPath
- The csv to load from.
responseName
- The name of the response variable.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
public DataSource<T> loadDataSource(URL csvPath, String responseName, String[] header) throws IOException
Loads a DataSource from the specified csv path.
- Parameters:
csvPath
- The csv to load from.
responseName
- The name of the response variable.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
Loads a DataSource from the specified csv path.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The csv to load from.
responseNames
- The names of the response variables.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
Loads a DataSource from the specified csv path.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The csv to load from.
responseNames
- The names of the response variables.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
public DataSource<T> loadDataSource(Path csvPath, Set<String> responseNames, String[] header) throws IOException
Loads a DataSource from the specified csv path.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The csv to load from.
responseNames
- The names of the response variables.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-
loadDataSource
public DataSource<T> loadDataSource(URL csvPath, Set<String> responseNames, String[] header) throws IOException
Loads a DataSource from the specified csv path.
The responseNames set is traversed in iteration order to emit outputs, and should be an ordered set to ensure reproducibility.
If there are multiple elements in responseNames then the responses are processed into the form 'column-name=column-value' before being passed to the OutputFactory for conversion into an Output.
- Parameters:
csvPath
- The csv to load from.
responseNames
- The names of the response variables.
header
- The header of the CSV if it's not present in the file.
- Returns:
- A datasource containing the csv data.
- Throws:
IOException
- If the disk read failed.
-