Class AnomalyDataGenerator

java.lang.Object
org.tribuo.anomaly.example.AnomalyDataGenerator

public abstract class AnomalyDataGenerator extends Object
Generates three example train and test datasets, used for unit testing. They don't necessarily have sensible boundaries; they exist to test the machinery rather than accuracy.

Also has a dataset generator which returns a training dataset with no anomalies, sampled from a single gaussian, and a test dataset sampled from two gaussians, where points from the second gaussian are labelled anomalous.

For most use cases that are not unit tests, it is recommended to use GaussianAnomalyDataSource which has similar functionality but is more flexible and configurable.

  • Constructor Details

    • AnomalyDataGenerator

      public AnomalyDataGenerator()
  • Method Details

    • gaussianAnomaly

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> gaussianAnomaly()
      Generates two datasets: the first contains no anomalies and is drawn from a single gaussian; the second is drawn from a mixture of two gaussians, with points from the second gaussian tagged anomalous. Generates 200 training examples and 200 test examples, with 20% anomalies.
      Returns:
      A pair of datasets.
    • gaussianAnomaly

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> gaussianAnomaly(long size, double fractionAnomalous)
      Generates two datasets: the first contains no anomalies and is drawn from a single gaussian; the second is drawn from a mixture of two gaussians, with points from the second gaussian tagged anomalous.
      Parameters:
      size - The number of points to sample for each dataset.
      fractionAnomalous - The fraction of anomalous data to generate.
      Returns:
      A pair of datasets.
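
The sampling scheme described for gaussianAnomaly can be sketched in plain Java. This is not Tribuo's implementation: the class GaussianAnomalySketch, its Point type, the one-dimensional data, the means and standard deviation, and the choice to mark each point anomalous independently with probability fractionAnomalous are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: a plain-Java stand-in, not Tribuo's implementation. */
final class GaussianAnomalySketch {

    /** A labelled one-dimensional sample (the documented methods return Example<Event> objects). */
    static final class Point {
        final double value;
        final boolean anomalous;

        Point(double value, boolean anomalous) {
            this.value = value;
            this.anomalous = anomalous;
        }
    }

    // Assumed distribution parameters, chosen only for this sketch.
    static final double EXPECTED_MEAN = 0.0;
    static final double ANOMALOUS_MEAN = 5.0;
    static final double STD_DEV = 1.0;

    private GaussianAnomalySketch() {}

    /** Training-style data: every point is drawn from the single "expected" gaussian. */
    static List<Point> sampleExpected(int size, Random rng) {
        return sampleMixture(size, 0.0, rng);
    }

    /**
     * Test-style data: a mixture of two gaussians, where points drawn from the
     * second gaussian are flagged anomalous. The documented method takes a long
     * size; an int keeps this sketch simple.
     */
    static List<Point> sampleMixture(int size, double fractionAnomalous, Random rng) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive, found " + size);
        }
        // Written this way so that NaN is rejected as well.
        if (!(fractionAnomalous >= 0.0 && fractionAnomalous <= 1.0)) {
            throw new IllegalArgumentException(
                "fractionAnomalous must be in [0,1], found " + fractionAnomalous);
        }
        List<Point> points = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            // Assumption: each point is anomalous independently with the given probability.
            boolean anomalous = rng.nextDouble() < fractionAnomalous;
            double mean = anomalous ? ANOMALOUS_MEAN : EXPECTED_MEAN;
            points.add(new Point(mean + STD_DEV * rng.nextGaussian(), anomalous));
        }
        return points;
    }
}
```

Under these assumptions, sampleExpected(200, rng) plays the role of the training set and sampleMixture(200, 0.2, rng) the test set. Because each label is drawn independently, the observed share of anomalies in a given sample will only be close to the requested fraction, not exactly equal to it.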
    • denseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> denseTrainTest()
      Makes a simple dataset for training and testing.

      Used for smoke testing; it doesn't have a real boundary.

      Returns:
      A pair containing a training dataset and a testing dataset.
    • denseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> denseTrainTest(double negate)
      Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
      Parameters:
      negate - Supply -1.0 to negate some feature values.
      Returns:
      A pair of datasets.
    • sparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> sparseTrainTest()
      Makes a simple dataset for training and testing.

      Used for smoke testing; it doesn't have a real boundary.

      Returns:
      A pair containing a training dataset and a testing dataset.
    • sparseTrainTest

      public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> sparseTrainTest(double negate)
      Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data. It has the same 4 clusters {0,1,2,3}.
      Parameters:
      negate - Supply -1.0 to negate some feature values.
      Returns:
      A pair of datasets.
    • invalidSparseExample

      public static Example<Event> invalidSparseExample()
      Generates an example with the feature ids 1,5,8, which do not intersect with the ids used elsewhere in this class. This should make the example empty at prediction time.
      Returns:
      An example with features {1:1.0,5:5.0,8:8.0}.
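
Why such an example ends up empty can be illustrated with a small plain-Java sketch. It assumes that features unseen during training are dropped before prediction; the class UnknownFeatureSketch and its retainKnown helper are hypothetical and are not part of Tribuo, and representing feature ids as strings is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: mimics dropping features that were never seen in training. */
final class UnknownFeatureSketch {

    private UnknownFeatureSketch() {}

    /** Returns a copy of the example that keeps only features present in the training set. */
    static Map<String, Double> retainKnown(Map<String, Double> example, Set<String> trainingFeatures) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> feature : example.entrySet()) {
            if (trainingFeatures.contains(feature.getKey())) {
                kept.put(feature.getKey(), feature.getValue());
            }
        }
        return kept;
    }
}
```

Passing the features {1:1.0,5:5.0,8:8.0} together with a training feature set such as {A,B,C,D} leaves nothing behind, which is the situation this method is meant to provoke in a unit test.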
    • emptyExample

      public static Example<Event> emptyExample()
      Generates an example with no features.
      Returns:
      An example with no features.