Class DataOptions

java.lang.Object
org.tribuo.data.DataOptions
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Options

public final class DataOptions extends Object implements com.oracle.labs.mlrg.olcut.config.Options
Options for working with training and test data in a CLI.
  • Field Details

    • hashDim

      @Option(longName="hashing-dimension", usage="Hashing dimension used for standard text format.") public int hashDim
      Hashing dimension used for the standard text format.
    • ngram

      @Option(longName="ngram", usage="Ngram size to generate when using standard text format.") public int ngram
      Ngram size to generate when using the standard text format.
    • termCounting

      @Option(longName="term-counting", usage="Use term counts instead of boolean when using the standard text format.") public boolean termCounting
      Use term counts instead of boolean when using the standard text format.
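The three text-format options above (hashDim, ngram, termCounting) interact, so a small sketch may help. The code below is not Tribuo's implementation: it is a JDK-only illustration that assumes whitespace tokenization, assumes all n-gram sizes from 1 up to the configured size are generated, and swaps in String.hashCode() for whatever hash function the library actually uses. The class and method names (TextHashingSketch, ngrams, featurize) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: shows how n-gram size, hashing dimension and term counting combine. */
public class TextHashingSketch {

    /** Builds every n-gram of size 1..maxN from the tokens (assumption: all sizes are kept). */
    public static List<String> ngrams(String[] tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }

    /**
     * Maps each n-gram into one of hashDim buckets.
     * With termCounting the bucket accumulates occurrences, otherwise it is set to 1.0.
     */
    public static double[] featurize(String text, int maxN, int hashDim, boolean termCounting) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        double[] features = new double[hashDim];
        for (String gram : ngrams(tokens, maxN)) {
            // floorMod keeps the bucket index non-negative even for negative hash codes.
            int bucket = Math.floorMod(gram.hashCode(), hashDim);
            if (termCounting) {
                features[bucket] += 1.0;
            } else {
                features[bucket] = 1.0;
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String text = "to be or not to be";
        System.out.println(Arrays.toString(featurize(text, 2, 16, true)));
        System.out.println(Arrays.toString(featurize(text, 2, 16, false)));
    }
}
```

With term counting enabled, the values in the vector sum to the number of generated n-grams. With it disabled, every entry is either 0.0 or 1.0, so repeated terms and hash collisions are indistinguishable from a single occurrence.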
    • outputPath

      @Option(charName='f', longName="model-output-path", usage="Path to serialize model to.") public Path outputPath
      Path to serialize model to.
    • seed

      @Option(charName='r', longName="seed", usage="RNG seed.") public long seed
      RNG seed.
    • inputFormat

      @Option(charName='s', longName="input-format", usage="Loads the data using the specified format.") public DataOptions.InputFormat inputFormat
      Loads the data using the specified format.
    • csvResponseName

      @Option(longName="csv-response-name", usage="Response name in the csv file.") public String csvResponseName
      Response name in the csv file.
    • delimiter

      @Option(longName="csv-delimiter", usage="Delimiter") public DataOptions.Delimiter delimiter
      Delimiter used in the CSV file.
    • csvQuoteChar

      @Option(longName="csv-quote-char", usage="Quote character in the CSV file.") public char csvQuoteChar
      Quote character in the CSV file.
    • rowProcessor

      @Option(longName="columnar-row-processor", usage="The name of the row processor from the config file.") public RowProcessor<?> rowProcessor
      The name of the row processor from the config file.
    • minCount

      @Option(longName="min-count", usage="Minimum cardinality of the features.") public int minCount
      Minimum cardinality of the features, i.e. the minimum number of times a feature must occur to be kept.
    • trainingPath

      @Option(charName='u', longName="training-file", usage="Path to the training file.") public Path trainingPath
      Path to the training file.
    • testingPath

      @Option(charName='v', longName="testing-file", usage="Path to the testing file.") public Path testingPath
      Path to the testing file.
    • scaleFeatures

      @Option(longName="scale-features", usage="Scales the features to the range 0-1 independently.") public boolean scaleFeatures
      Scales the features to the range 0-1 independently.
    • scaleIncZeros

      @Option(longName="scale-including-zeros", usage="Includes implicit zeros in the scale range calculation.") public boolean scaleIncZeros
      Includes implicit zeros in the scale range calculation.
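The difference between scaleFeatures alone and scaleFeatures combined with scaleIncZeros is easiest to see on a single sparse feature. The following is a minimal JDK-only sketch, not the library's code: it assumes a feature is stored as the list of its explicitly observed values, that any example without a stored value has an implicit zero, and that scaling is plain min-max rescaling. The names ScalingSketch and scale are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: min-max scaling of one sparse feature, with and without implicit zeros. */
public class ScalingSketch {

    /**
     * @param observed     the explicitly stored values of one feature
     * @param numExamples  total number of examples in the dataset
     * @param includeZeros whether examples without a stored value count as 0.0 in the range
     * @return the observed values rescaled to the range 0-1
     */
    public static double[] scale(double[] observed, int numExamples, boolean includeZeros) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : observed) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // If some examples lack this feature, their implicit zero widens the range.
        if (includeZeros && observed.length < numExamples) {
            min = Math.min(min, 0.0);
            max = Math.max(max, 0.0);
        }
        double range = max - min;
        double[] scaled = new double[observed.length];
        for (int i = 0; i < observed.length; i++) {
            // A constant feature has zero range; map it to 0.0 rather than divide by zero.
            scaled[i] = (range == 0.0) ? 0.0 : (observed[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] observed = {2.0, 4.0, 6.0};   // feature present in 3 of 5 examples
        // Range is [2, 6]: prints [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(scale(observed, 5, false)));
        // Range is [0, 6] because two examples hold an implicit zero.
        System.out.println(Arrays.toString(scale(observed, 5, true)));
    }
}
```

Because each feature's range is computed separately, features are scaled independently of one another. Including implicit zeros matters mostly for sparse data, where the smallest observed value of a feature may be well above zero.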
  • Constructor Details

    • DataOptions

      public DataOptions()
  • Method Details

    • getOptionsDescription

      public String getOptionsDescription()
      Specified by:
      getOptionsDescription in interface com.oracle.labs.mlrg.olcut.config.Options
    • load

      public <T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>,Dataset<T>> load(OutputFactory<T> outputFactory) throws IOException
      Loads the training and testing data from trainingPath and testingPath according to the other parameters specified in this class.
      Type Parameters:
      T - The dataset output type.
      Parameters:
      outputFactory - The output factory to use to process the inputs.
      Returns:
      A pair containing the training and testing datasets. The training dataset is element 'A' and the testing dataset is element 'B'.
      Throws:
      IOException - If the paths could not be loaded.
    • saveModel

      public <T extends Output<T>> void saveModel(Model<T> model) throws IOException
      Saves the model to the path given by outputPath.
      Type Parameters:
      T - The model's output type.
      Parameters:
      model - The model to save.
      Throws:
      IOException - If the model could not be saved.