java.lang.Object

org.tribuo.data.DataOptions

All Implemented Interfaces: com.oracle.labs.mlrg.olcut.config.Options

public final class DataOptions extends Object implements com.oracle.labs.mlrg.olcut.config.Options

Options for working with training and test data in a CLI.

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static enum | DataOptions.Delimiter | The delimiters supported by CSV files in this options object. |
| static enum | DataOptions.InputFormat | The input formats supported by this options object. |

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| char | csvQuoteChar | Quote character in the CSV file. |
| String | csvResponseName | Response name in the CSV file. |
| DataOptions.Delimiter | delimiter | Delimiter used in the CSV file. |
| int | hashDim | Hashing dimension used for standard text format. |
| DataOptions.InputFormat | inputFormat | Loads the data using the specified format. |
| int | minCount | Minimum cardinality of the features. |
| boolean | modelOutputProtobuf | Write the model out as a protobuf. |
| int | ngram | Ngram size to generate when using standard text format. |
| Path | outputPath | Path to serialize model to. |
| RowProcessor<?> | rowProcessor | The name of the row processor from the config file. |
| boolean | scaleFeatures | Scales the features to the range 0-1 independently. |
| boolean | scaleIncZeros | Includes implicit zeros in the scale range calculation. |
| long | seed | RNG seed. |
| boolean | termCounting | Use term counts instead of boolean when using the standard text format. |
| Path | testingPath | Path to the testing file. |
| Path | trainingPath | Path to the training file. |
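The text-related fields summarized above (ngram, hashDim, termCounting) can be easier to picture with a small example. The following is a conceptual sketch only, not Tribuo's text pipeline: the class TextOptionsSketch and its methods are hypothetical, whitespace tokenization is assumed, String.hashCode() stands in for whatever hash function the library uses, and emitting every n-gram size from 1 up to ngram is an assumption about what "Ngram size" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Conceptual sketch only; this is not Tribuo's text pipeline. */
final class TextOptionsSketch {

    /** Emits word n-grams of every size from 1 up to ngram (an assumption), joined with '/'. */
    static List<String> ngrams(String[] tokens, int ngram) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= ngram; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                out.add(String.join("/", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }

    /**
     * Maps each n-gram to a bucket in [0, hashDim) and stores either its count
     * (termCounting = true) or a 1.0 presence flag (termCounting = false).
     * String.hashCode() is a stand-in hash chosen for this sketch.
     */
    static Map<Integer, Double> featurize(String text, int ngram, int hashDim, boolean termCounting) {
        if (hashDim <= 0) {
            throw new IllegalArgumentException("hashDim must be positive in this sketch");
        }
        Map<Integer, Double> features = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        for (String gram : ngrams(trimmed.split("\\s+"), ngram)) {
            int bucket = Math.floorMod(gram.hashCode(), hashDim);
            if (termCounting) {
                features.merge(bucket, 1.0, Double::sum); // accumulate term counts
            } else {
                features.put(bucket, 1.0);                // presence only
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Same text, unigrams only, 16 hash buckets: counts versus presence flags.
        System.out.println(featurize("a b a", 1, 16, true));
        System.out.println(featurize("a b a", 1, 16, false));
    }
}
```

With a small hashing dimension, distinct n-grams can land in the same bucket, so a larger dimension trades memory for fewer collisions.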

Fields inherited from interface com.oracle.labs.mlrg.olcut.config.Options
header

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| DataOptions() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| String | getOptionsDescription() | |
| <T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>, Dataset<T>> | load(OutputFactory<T> outputFactory) | Loads the training and testing data from trainingPath and testingPath according to the other parameters specified in this class. |
| <T extends Output<T>> void | saveModel(Model<T> model) | Saves the model out to the path in outputPath. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- hashDim
  
  @Option(longName="hashing-dimension", usage="Hashing dimension used for standard text format.") public int hashDim
  
  Hashing dimension used for standard text format.
- ngram
  
  @Option(longName="ngram", usage="Ngram size to generate when using standard text format.") public int ngram
  
  Ngram size to generate when using standard text format.
- termCounting
  
  @Option(longName="term-counting", usage="Use term counts instead of boolean when using the standard text format.") public boolean termCounting
  
  Use term counts instead of boolean when using the standard text format.
- outputPath
  
  @Option(charName='f', longName="model-output-path", usage="Path to serialize model to.") public Path outputPath
  
  Path to serialize model to.
- modelOutputProtobuf
  
  @Option(longName="model-output-protobuf", usage="Serialize the model as a protobuf.") public boolean modelOutputProtobuf
  
  Write the model out as a protobuf.
- seed
  
  @Option(charName='r', longName="seed", usage="RNG seed.") public long seed
  
  RNG seed.
- inputFormat
  
  @Option(charName='s', longName="input-format", usage="Loads the data using the specified format.") public DataOptions.InputFormat inputFormat
  
  Loads the data using the specified format.
- csvResponseName
  
  @Option(longName="csv-response-name", usage="Response name in the csv file.") public String csvResponseName
  
  Response name in the CSV file.
- delimiter
  
  @Option(longName="csv-delimiter", usage="Delimiter") public DataOptions.Delimiter delimiter
  
  Delimiter used in the CSV file.
- csvQuoteChar
  
  @Option(longName="csv-quote-char", usage="Quote character in the CSV file.") public char csvQuoteChar
  
  Quote character in the CSV file.
- rowProcessor
  
  @Option(longName="columnar-row-processor", usage="The name of the row processor from the config file.") public RowProcessor<?> rowProcessor
  
  The name of the row processor from the config file.
- minCount
  
  @Option(longName="min-count", usage="Minimum cardinality of the features.") public int minCount
  
  Minimum cardinality of the features.
- trainingPath
  
  @Option(charName='u', longName="training-file", usage="Path to the training file.") public Path trainingPath
  
  Path to the training file.
- testingPath
  
  @Option(charName='v', longName="testing-file", usage="Path to the testing file.") public Path testingPath
  
  Path to the testing file.
- scaleFeatures
  
  @Option(longName="scale-features", usage="Scales the features to the range 0-1 independently.") public boolean scaleFeatures
  
  Scales the features to the range 0-1 independently.
- scaleIncZeros
  
  @Option(longName="scale-including-zeros", usage="Includes implicit zeros in the scale range calculation.") public boolean scaleIncZeros
  
  Includes implicit zeros in the scale range calculation.
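As an illustration of how scaleFeatures and scaleIncZeros might interact, here is a minimal sketch of per-feature min-max scaling. It is not Tribuo's implementation: the class ScaleSketch and its scale method are hypothetical, and the reading of "implicit zeros" as examples in which a sparse feature is simply absent is an assumption.

```java
import java.util.Arrays;

/** Conceptual sketch only; this is not Tribuo's scaling code. */
final class ScaleSketch {

    /**
     * Min-max scales one feature column to the range [0, 1].
     *
     * @param observed     the values explicitly stored for this feature
     * @param numExamples  the total number of examples in the dataset
     * @param includeZeros if true, examples lacking the feature count as 0.0
     *                     when computing the minimum and maximum
     */
    static double[] scale(double[] observed, int numExamples, boolean includeZeros) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : observed) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Implicit zeros exist only if some example does not store this feature.
        if (includeZeros && observed.length < numExamples) {
            min = Math.min(min, 0.0);
            max = Math.max(max, 0.0);
        }
        double range = max - min;
        double[] out = new double[observed.length];
        for (int i = 0; i < observed.length; i++) {
            // A constant column has zero range; map it to 0.0 to avoid dividing by zero.
            out[i] = (range == 0.0) ? 0.0 : (observed[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] observed = {2.0, 4.0, 6.0}; // feature present in 3 of 5 examples
        System.out.println(Arrays.toString(scale(observed, 5, false)));
        System.out.println(Arrays.toString(scale(observed, 5, true)));
    }
}
```

In this sketch, ignoring the implicit zeros maps the smallest stored value to 0, while including them shifts the minimum to 0.0 so the smallest stored value stays above 0 after scaling.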
Constructor Details
- DataOptions
  
  public DataOptions()

Method Details
- getOptionsDescription
  
  public String getOptionsDescription()
  
  Specified by:
  
  getOptionsDescription in interface com.oracle.labs.mlrg.olcut.config.Options
- load
  
  public <T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>, Dataset<T>> load(OutputFactory<T> outputFactory) throws IOException
  
  Loads the training and testing data from trainingPath and testingPath according to the other parameters specified in this class.
  
  Type Parameters:
  
  T - The dataset output type.
  
  Parameters:
  
  outputFactory - The output factory to use to process the inputs.
  
  Returns:
  
  A pair containing the training and testing datasets. The training dataset is element 'A' and the testing dataset is element 'B'.
  
  Throws:
  
  IOException - If the paths could not be loaded.
- saveModel
  
  public <T extends Output<T>> void saveModel(Model<T> model) throws IOException
  
  Saves the model out to the path in outputPath.
  
  Type Parameters:
  
  T - The model's output type.
  
  Parameters:
  
  model - The model to save.
  
  Throws:
  
  IOException - If the model could not be saved.
