Class LabelledDataGenerator

java.lang.Object
org.tribuo.classification.example.LabelledDataGenerator

public final class LabelledDataGenerator extends Object
Generates three example train and test datasets, used for unit testing. They don't necessarily have sensible classification boundaries; they exist to test the machinery rather than accuracy.
  • Method Details

    • denseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> denseTrainTest()
      Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
      Returns:
      A pair of datasets.
    • denseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> denseTrainTest(double negate)
      Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
      Parameters:
      negate - Supply -1.0 to insert some negative values into the dataset.
      Returns:
      A pair of datasets.
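
      The description does not say which values negate affects. One plausible reading, sketched below in plain Java without any Tribuo types, is that negate acts as a multiplier on a fixed subset of feature values, so 1.0 leaves the data unchanged and -1.0 flips the sign of those values. The class NegateSketch, the method scaleSelected, the sample values, and the chosen indices are all hypothetical and are not taken from the Tribuo source.

```java
import java.util.Arrays;

final class NegateSketch {
    // Hypothetical helper: returns a copy of `values` in which the entries
    // at `indices` have been multiplied by `negate`.
    static double[] scaleSelected(double[] values, int[] indices, double negate) {
        double[] out = Arrays.copyOf(values, values.length);
        for (int i : indices) {
            out[i] *= negate;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] features = {1.0, 0.5, 1.0, 2.0}; // stand-ins for features A, B, C, D
        int[] selected = {1, 3};                  // assumed positions to negate

        // With 1.0 the values are unchanged; with -1.0 the selected ones flip sign.
        System.out.println(Arrays.toString(scaleSelected(features, selected, 1.0)));
        System.out.println(Arrays.toString(scaleSelected(features, selected, -1.0)));
    }
}
```

      Under this reading, a single multiplier lets one generator produce both an all-positive dataset and one with negative values, which is useful for testing code paths that are sensitive to the sign of feature values.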
    • sparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> sparseTrainTest()
      Generates a pair of datasets where the features are sparse and unknown features appear in the test data. The classes are the same 4 used by the dense datasets, {Foo,Bar,Baz,Quux}.
      Returns:
      A pair of train and test datasets.
    • sparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> sparseTrainTest(double negate)
      Generates a pair of datasets where the features are sparse and unknown features appear in the test data. The classes are the same 4 used by the dense datasets, {Foo,Bar,Baz,Quux}.
      Parameters:
      negate - Supply -1.0 to negate some values in this dataset.
      Returns:
      A pair of train and test datasets.
    • binarySparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> binarySparseTrainTest()
      Generates a pair of datasets with sparse features and unknown features in the test data. Has binary labels {Foo,Bar}.
      Returns:
      A pair of train and test datasets.
    • binarySparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>,Dataset<Label>> binarySparseTrainTest(double negate)
      Generates a pair of datasets with sparse features and unknown features in the test data. Has binary labels {Foo,Bar}.
      Parameters:
      negate - Supply -1.0 to negate some values in this dataset.
      Returns:
      A pair of train and test datasets.
    • invalidSparseExample

      public static Example<Label> invalidSparseExample()
      Generates an example with the feature ids 1,5,8, which do not intersect with the ids used elsewhere in this class. This should make the example empty at prediction time.
      Returns:
      An example with features {1:1.0,5:5.0,8:8.0}.
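
      Why such an example ends up empty at prediction time can be illustrated without Tribuo. The sketch below assumes, as a simplification, that a model keeps only the features whose ids it saw during training. The class UnknownFeatureSketch, the method keepKnown, and the training id set {A,B,C,D} (borrowed from the dense datasets) are illustrative assumptions and not Tribuo's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class UnknownFeatureSketch {
    // Hypothetical helper: keeps only the features whose ids are in `known`.
    static Map<String, Double> keepKnown(Map<String, Double> example, Set<String> known) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : example.entrySet()) {
            if (known.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed ids seen at training time.
        Set<String> known = new HashSet<>(Arrays.asList("A", "B", "C", "D"));

        // Features matching the documented return value {1:1.0,5:5.0,8:8.0}.
        Map<String, Double> invalid = new LinkedHashMap<>();
        invalid.put("1", 1.0);
        invalid.put("5", 5.0);
        invalid.put("8", 8.0);

        // None of the ids are known, so nothing survives the filter.
        System.out.println(keepKnown(invalid, known).isEmpty());
    }
}
```

      A test can use an example like this to check how a model behaves when no usable features remain after filtering.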
    • emptyExample

      public static Example<Label> emptyExample()
      Generates an example with no features.
      Returns:
      An example with no features.