Package org.tribuo.data
Class DataOptions
java.lang.Object
org.tribuo.data.DataOptions
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Options

Options for working with training and test data in a CLI.
Nested Class Summary
Modifier and Type  Class                    Description
static enum        DataOptions.Delimiter    The delimiters supported by CSV files in this options object.
static enum        DataOptions.InputFormat  The input formats supported by this options object.
Field Summary
Modifier and Type        Field                Description
char                     csvQuoteChar         Quote character in the CSV file.
String                   csvResponseName      Response name in the csv file.
DataOptions.Delimiter    delimiter            Delimiter
int                      hashDim              Hashing dimension used for standard text format.
DataOptions.InputFormat  inputFormat          Loads the data using the specified format.
int                      minCount             Minimum cardinality of the features.
boolean                  modelOutputProtobuf  Write the model out as a protobuf.
int                      ngram                Ngram size to generate when using standard text format.
Path                     outputPath           Path to serialize model to.
RowProcessor<?>          rowProcessor         The name of the row processor from the config file.
boolean                  scaleFeatures        Scales the features to the range 0-1 independently.
boolean                  scaleIncZeros        Includes implicit zeros in the scale range calculation.
long                     seed                 RNG seed.
boolean                  termCounting         Use term counts instead of boolean when using the standard text format.
Path                     testingPath          Path to the testing file.
Path                     trainingPath         Path to the training file.

Fields inherited from interface com.oracle.labs.mlrg.olcut.config.Options
header
Constructor Summary
Constructor    Description
DataOptions()
Method Summary
Modifier and Type                                                                    Method                                Description
<T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>,Dataset<T>>   load(OutputFactory<T> outputFactory)  Loads the training and testing data from trainingPath and testingPath according to the other parameters specified in this class.
<T extends Output<T>> void                                                           saveModel(Model<T> model)             Saves the model out to the path in outputPath.
Field Details
hashDim
@Option(longName="hashing-dimension", usage="Hashing dimension used for standard text format.")
public int hashDim
Hashing dimension used for standard text format.
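As an illustration of what a hashing dimension controls, the sketch below maps token names into a fixed number of buckets. It is a minimal example of the general feature-hashing idea, not Tribuo's implementation: the class `FeatureHashingSketch`, its methods, and the use of `String.hashCode()` as the hash function are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

public class FeatureHashingSketch {
    /** Maps a feature name to a bucket index in [0, hashDim). */
    public static int bucket(String featureName, int hashDim) {
        if (hashDim <= 0) {
            throw new IllegalArgumentException("hashDim must be positive");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        // String.hashCode() is an assumption here; a real hasher may differ.
        return Math.floorMod(featureName.hashCode(), hashDim);
    }

    /** Builds a dense count vector of length hashDim from a list of tokens. */
    public static double[] hashTokens(List<String> tokens, int hashDim) {
        double[] vector = new double[hashDim];
        for (String token : tokens) {
            vector[bucket(token, hashDim)] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = hashTokens(Arrays.asList("the", "cat", "sat", "the"), 8);
        System.out.println(Arrays.toString(v));
    }
}
```

A smaller dimension bounds the feature space more tightly but makes it more likely that distinct tokens collide in the same bucket.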
ngram
@Option(longName="ngram", usage="Ngram size to generate when using standard text format.")
public int ngram
Ngram size to generate when using standard text format.
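To make the n-gram size concrete, the following sketch generates word n-grams from a token list. It assumes that a size of `n` means every contiguous n-gram of length 1 through `n`, joined by a single space; this page does not state that, and the class `NgramSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NgramSketch {
    /** Returns all contiguous word n-grams of sizes 1..maxN, each joined by a space. */
    public static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            // Slide a window of width n across the token list.
            for (int start = 0; start + n <= tokens.size(); start++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(Arrays.asList("a", "b", "c"), 2));
    }
}
```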
termCounting
@Option(longName="term-counting", usage="Use term counts instead of boolean when using the standard text format.")
public boolean termCounting
Use term counts instead of boolean when using the standard text format.
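The difference between term counts and boolean features can be sketched as follows. This is an illustration under assumptions: the class `TermCountingSketch` is hypothetical, and representing a "boolean" feature as the value 1.0 is a choice made for this example rather than something stated on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermCountingSketch {
    /**
     * Converts tokens to feature values. With termCounting the value is the
     * number of occurrences; otherwise it is 1.0 for any token that appears.
     */
    public static Map<String, Double> featurise(List<String> tokens, boolean termCounting) {
        Map<String, Double> features = new TreeMap<>();
        for (String token : tokens) {
            if (termCounting) {
                features.merge(token, 1.0, Double::sum);
            } else {
                features.put(token, 1.0);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "the");
        System.out.println(featurise(tokens, true));
        System.out.println(featurise(tokens, false));
    }
}
```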
outputPath
@Option(charName='f', longName="model-output-path", usage="Path to serialize model to.")
public Path outputPath
Path to serialize model to.

modelOutputProtobuf
@Option(longName="model-output-protobuf", usage="Serialize the model as a protobuf.")
public boolean modelOutputProtobuf
Write the model out as a protobuf.

seed
@Option(charName='r', longName="seed", usage="RNG seed.")
public long seed
RNG seed.
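This page does not say where the seed is consumed. As a general illustration of why a fixed RNG seed matters, the sketch below shows that seeding `java.util.Random` with the same value reproduces the same shuffle; the class `SeededShuffleSketch` is hypothetical and is not part of Tribuo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleSketch {
    /** Returns the indices 0..size-1 in an order determined only by the seed. */
    public static List<Integer> shuffledIndices(int size, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        // The same seed always produces the same sequence of random numbers.
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(shuffledIndices(10, 1L));
        System.out.println(shuffledIndices(10, 1L)); // same order as the line above
    }
}
```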
inputFormat
@Option(charName='s', longName="input-format", usage="Loads the data using the specified format.")
public DataOptions.InputFormat inputFormat
Loads the data using the specified format.

csvResponseName
@Option(longName="csv-response-name", usage="Response name in the csv file.")
public String csvResponseName
Response name in the csv file.

delimiter
Delimiter

csvQuoteChar
@Option(longName="csv-quote-char", usage="Quote character in the CSV file.")
public char csvQuoteChar
Quote character in the CSV file.
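The roles of the delimiter and the quote character can be illustrated with a small line splitter. This is a simplified sketch, not the CSV parser Tribuo uses: the class `CsvSplitSketch` is hypothetical, it handles a single line only, and it assumes that a doubled quote inside a quoted field stands for one literal quote.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSplitSketch {
    /** Splits one CSV line into fields, honouring the delimiter and quote character. */
    public static List<String> splitLine(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.append(quote);
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == delimiter && !inQuotes) {
                // An unquoted delimiter ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("a,\"b,c\",d", ',', '"'));
        System.out.println(splitLine("x;'y;z'", ';', '\''));
    }
}
```

A delimiter that appears between quote characters stays inside its field, which is why the quote character has to match the one used when the file was written.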
rowProcessor
@Option(longName="columnar-row-processor", usage="The name of the row processor from the config file.")
public RowProcessor<?> rowProcessor
The name of the row processor from the config file.

minCount
@Option(longName="min-count", usage="Minimum cardinality of the features.")
public int minCount
Minimum cardinality of the features.
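One way to read a minimum feature cardinality is as a filter that drops rarely seen features. The sketch below assumes that cardinality means the number of examples in which a feature appears and that features below the threshold are discarded; both are assumptions for illustration, and the class `MinCountSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MinCountSketch {
    /** Returns the names of features that appear in at least minCount examples. */
    public static Set<String> keptFeatures(List<? extends Set<String>> examples, int minCount) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (Set<String> example : examples) {
            for (String name : example) {
                occurrences.merge(name, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : occurrences.entrySet()) {
            if (entry.getValue() >= minCount) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TreeSet<String>> examples = Arrays.asList(
            new TreeSet<String>(Arrays.asList("a", "b")),
            new TreeSet<String>(Arrays.asList("a")),
            new TreeSet<String>(Arrays.asList("a", "c")));
        // Only "a" appears in two or more examples.
        System.out.println(keptFeatures(examples, 2));
    }
}
```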
trainingPath
@Option(charName='u', longName="training-file", usage="Path to the training file.")
public Path trainingPath
Path to the training file.

testingPath
@Option(charName='v', longName="testing-file", usage="Path to the testing file.")
public Path testingPath
Path to the testing file.

scaleFeatures
@Option(longName="scale-features", usage="Scales the features to the range 0-1 independently.")
public boolean scaleFeatures
Scales the features to the range 0-1 independently.

scaleIncZeros
@Option(longName="scale-including-zeros", usage="Includes implicit zeros in the scale range calculation.")
public boolean scaleIncZeros
Includes implicit zeros in the scale range calculation.
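The effect of including implicit zeros in the range can be shown with a small min-max scaling sketch for a single sparse feature. It assumes that a feature missing from an example has an implicit value of 0 and that scaling is `(x - min) / (max - min)`; the class `MinMaxScaleSketch` and its parameters are hypothetical and do not reflect how Tribuo implements the transformation.

```java
import java.util.Arrays;

public class MinMaxScaleSketch {
    /**
     * Scales the observed values of one feature to [0, 1].
     * observed holds the explicitly stored values; totalExamples is the dataset size,
     * so totalExamples - observed.length examples carry an implicit zero.
     */
    public static double[] scale(double[] observed, int totalExamples, boolean includeZeros) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : observed) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (includeZeros && observed.length < totalExamples) {
            // At least one example has an implicit zero, so zero joins the range.
            min = Math.min(min, 0.0);
            max = Math.max(max, 0.0);
        }
        double range = max - min;
        double[] scaled = new double[observed.length];
        for (int i = 0; i < observed.length; i++) {
            // A constant feature has zero range; map it to 0.0 to avoid dividing by zero.
            scaled[i] = (range == 0.0) ? 0.0 : (observed[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] observed = {2.0, 4.0, 6.0};
        System.out.println(Arrays.toString(scale(observed, 5, false)));
        System.out.println(Arrays.toString(scale(observed, 5, true)));
    }
}
```

With zeros excluded the smallest observed value maps to 0; with zeros included the smallest observed value stays above 0 whenever it is positive, because the implicit zero becomes the minimum of the range.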
Constructor Details
DataOptions
public DataOptions()
Method Details
getOptionsDescription
Specified by:
getOptionsDescription in interface com.oracle.labs.mlrg.olcut.config.Options
load
public <T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>,Dataset<T>> load(OutputFactory<T> outputFactory) throws IOException
Loads the training and testing data from trainingPath and testingPath according to the other parameters specified in this class.
Type Parameters:
T - The dataset output type.
Parameters:
outputFactory - The output factory to use to process the inputs.
Returns:
A pair containing the training and testing datasets. The training dataset is element 'A' and the testing dataset is element 'B'.
Throws:
IOException - If the paths could not be loaded.
saveModel
public <T extends Output<T>> void saveModel(Model<T> model) throws IOException
Saves the model out to the path in outputPath.
Type Parameters:
T - The model's output type.
Parameters:
model - The model to save.
Throws:
IOException - If the model could not be saved.