Class GaussianAnomalyDataSource

java.lang.Object
org.tribuo.anomaly.example.GaussianAnomalyDataSource
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>, Iterable<Example<Event>>, ConfigurableDataSource<Event>, DataSource<Event>

public final class GaussianAnomalyDataSource extends Object implements ConfigurableDataSource<Event>
Generates an anomaly detection dataset by sampling each feature independently from a univariate Gaussian.

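Equivalently, all the features are sampled jointly from a multivariate Gaussian with a diagonal covariance matrix (a spherical Gaussian when all the variances are equal). Can accept at most 26 features.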

By default the expected means are (1.0, 2.0, 1.0, 2.0, 5.0), with variances (1.0, 0.5, 0.25, 1.0, 0.1). The default anomalous means are (-2.0, 2.0, -2.0, 2.0, -10.0); the default anomalous variances are (1.0, 0.5, 0.25, 1.0, 0.1), the same as the default expected variances.
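The sampling scheme described above can be sketched with the standard library alone. This is a minimal illustration, not the class's actual implementation: the class name IndependentGaussianSketch and the helpers sample and sampleMeans are hypothetical, and only the default expected means and variances are taken from this page. Each feature is drawn as mean + sqrt(variance) * z, where z is a standard normal draw.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: NOT part of Tribuo, only illustrates per-feature Gaussian sampling.
class IndependentGaussianSketch {

    /** Draws one feature vector where feature i ~ N(means[i], variances[i]). */
    static double[] sample(Random rng, double[] means, double[] variances) {
        double[] x = new double[means.length];
        for (int i = 0; i < means.length; i++) {
            // nextGaussian() is N(0, 1); scale by the standard deviation, not the variance.
            x[i] = means[i] + Math.sqrt(variances[i]) * rng.nextGaussian();
        }
        return x;
    }

    /** Averages n draws per feature, as a sanity check on the sampler. */
    static double[] sampleMeans(long seed, int n, double[] means, double[] variances) {
        Random rng = new Random(seed);
        double[] sums = new double[means.length];
        for (int s = 0; s < n; s++) {
            double[] x = sample(rng, means, variances);
            for (int i = 0; i < x.length; i++) {
                sums[i] += x[i];
            }
        }
        for (int i = 0; i < sums.length; i++) {
            sums[i] /= n;
        }
        return sums;
    }

    public static void main(String[] args) {
        // Default expected means and variances quoted in the class description.
        double[] means = {1.0, 2.0, 1.0, 2.0, 5.0};
        double[] variances = {1.0, 0.5, 0.25, 1.0, 0.1};
        System.out.println(Arrays.toString(sampleMeans(42L, 100_000, means, variances)));
    }
}
```

With a large number of draws the printed per-feature averages should sit close to the configured means, which is a quick way to confirm that the variance was converted to a standard deviation before scaling.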

  • Constructor Details

    • GaussianAnomalyDataSource

      public GaussianAnomalyDataSource(int numSamples, float fractionAnomalous, long seed)
      Generates anomaly detection examples by sampling each feature independently from a univariate Gaussian, using the default means and variances.

      Equivalently, all the features are sampled jointly from a multivariate Gaussian with a diagonal covariance matrix.

      Can accept at most 26 features.

      Parameters:
      numSamples - The size of the output dataset.
      fractionAnomalous - The fraction of anomalies in the generated data.
      seed - The RNG seed to use.
    • GaussianAnomalyDataSource

      public GaussianAnomalyDataSource(int numSamples, double[] expectedMeans, double[] expectedVariances, double[] anomalousMeans, double[] anomalousVariances, float fractionAnomalous, long seed)
      Generates anomaly detection examples by sampling each feature independently from a univariate Gaussian.

      Equivalently, all the features are sampled jointly from a multivariate Gaussian with a diagonal covariance matrix.

      Can accept at most 26 features.

      Parameters:
      numSamples - The size of the output dataset.
      expectedMeans - The means of the expected event features.
      expectedVariances - The variances of the expected event features.
      anomalousMeans - The means of the anomalous event features.
      anomalousVariances - The variances of the anomalous event features.
      fractionAnomalous - The fraction of anomalies to generate.
      seed - The RNG seed to use.
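The constraints on these parameters can be made concrete with a small pre-flight check. This is a hedged sketch: only the 26-feature limit comes from this page, while the equal-length, positive-variance, and [0, 1] fraction checks are assumptions about what sensible arguments look like, not a statement of what the constructor enforces. GaussianArgsCheck and validate are hypothetical names.

```java
// Hypothetical helper: NOT part of Tribuo. Only MAX_FEATURES reflects the documentation;
// the remaining checks are assumptions made for illustration.
class GaussianArgsCheck {

    static final int MAX_FEATURES = 26; // limit stated in the documentation

    static void validate(double[] expectedMeans, double[] expectedVariances,
                         double[] anomalousMeans, double[] anomalousVariances,
                         float fractionAnomalous) {
        int n = expectedMeans.length;
        if (n == 0 || n > MAX_FEATURES) {
            throw new IllegalArgumentException(
                "Expected between 1 and " + MAX_FEATURES + " features, found " + n);
        }
        // Assumption: all four arrays describe the same set of features.
        if (expectedVariances.length != n || anomalousMeans.length != n
                || anomalousVariances.length != n) {
            throw new IllegalArgumentException("All mean and variance arrays must have length " + n);
        }
        // Assumption: a variance must be strictly positive to define a Gaussian.
        for (int i = 0; i < n; i++) {
            if (expectedVariances[i] <= 0.0 || anomalousVariances[i] <= 0.0) {
                throw new IllegalArgumentException("Variances must be positive, feature index " + i);
            }
        }
        // Assumption: the anomaly fraction is a probability. The negated form also rejects NaN.
        if (!(fractionAnomalous >= 0.0f && fractionAnomalous <= 1.0f)) {
            throw new IllegalArgumentException("fractionAnomalous must lie in [0, 1]");
        }
    }

    public static void main(String[] args) {
        double[] means = {1.0, 2.0, 1.0, 2.0, 5.0};
        double[] variances = {1.0, 0.5, 0.25, 1.0, 0.1};
        double[] anomalousMeans = {-2.0, 2.0, -2.0, 2.0, -10.0};
        validate(means, variances, anomalousMeans, variances, 0.3f);
        System.out.println("default-style arguments accepted");
    }
}
```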
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
    • getOutputFactory

      public OutputFactory<Event> getOutputFactory()
      Description copied from interface: DataSource
      Returns the OutputFactory associated with this Output subclass.
      Specified by:
      getOutputFactory in interface DataSource<Event>
      Returns:
      The output factory.
    • getProvenance

      public DataSourceProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DataSourceProvenance>
    • iterator

      public Iterator<Example<Event>> iterator()
      Specified by:
      iterator in interface Iterable<Example<Event>>
    • generateDataset

      public static MutableDataset<Event> generateDataset(int numSamples, double[] expectedMeans, double[] expectedVariances, double[] anomalousMeans, double[] anomalousVariances, float fractionAnomalous, long seed)
      Generates an anomaly detection dataset by sampling each feature independently from a univariate Gaussian.

      Equivalently, all the features are sampled jointly from a multivariate Gaussian with a diagonal covariance matrix.

      Can accept at most 26 features.

      Parameters:
      numSamples - The size of the output dataset.
      expectedMeans - The means of the expected event features.
      expectedVariances - The variances of the expected event features.
      anomalousMeans - The means of the anomalous event features.
      anomalousVariances - The variances of the anomalous event features.
      fractionAnomalous - The fraction of anomalies to generate.
      seed - The RNG seed to use.
      Returns:
      A dataset sampled from the specified Gaussians.
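How the parameters of generateDataset interact can be sketched without the library. The code below is an assumption-laden illustration, not Tribuo's implementation: it assumes each example is independently marked anomalous with probability fractionAnomalous and then drawn from the matching set of means and variances. LabelledGaussianSketch, Row, and generate are hypothetical names, and a plain list of rows stands in for MutableDataset<Event>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: NOT part of Tribuo. Labelling by an independent coin flip
// per example is an assumption made for illustration.
class LabelledGaussianSketch {

    /** One generated example: its label and its feature values. */
    static final class Row {
        final boolean anomalous;
        final double[] features;

        Row(boolean anomalous, double[] features) {
            this.anomalous = anomalous;
            this.features = features;
        }
    }

    static List<Row> generate(int numSamples,
                              double[] expectedMeans, double[] expectedVariances,
                              double[] anomalousMeans, double[] anomalousVariances,
                              float fractionAnomalous, long seed) {
        Random rng = new Random(seed); // a fixed seed makes the output reproducible
        List<Row> rows = new ArrayList<>(numSamples);
        for (int s = 0; s < numSamples; s++) {
            // Assumption: each example is anomalous with probability fractionAnomalous.
            boolean anomalous = rng.nextDouble() < fractionAnomalous;
            double[] means = anomalous ? anomalousMeans : expectedMeans;
            double[] variances = anomalous ? anomalousVariances : expectedVariances;
            double[] features = new double[means.length];
            for (int i = 0; i < means.length; i++) {
                features[i] = means[i] + Math.sqrt(variances[i]) * rng.nextGaussian();
            }
            rows.add(new Row(anomalous, features));
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] means = {1.0, 2.0, 1.0, 2.0, 5.0};
        double[] variances = {1.0, 0.5, 0.25, 1.0, 0.1};
        double[] anomalousMeans = {-2.0, 2.0, -2.0, 2.0, -10.0};
        List<Row> rows = generate(1000, means, variances, anomalousMeans, variances, 0.2f, 1L);
        long anomalies = rows.stream().filter(r -> r.anomalous).count();
        System.out.println(rows.size() + " rows, " + anomalies + " anomalous");
    }
}
```

Under this scheme the realised share of anomalies varies around fractionAnomalous from seed to seed; a generator that needs an exact count would instead fix the number of anomalous rows up front and shuffle.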