Package org.tribuo.anomaly.example
Class AnomalyDataGenerator
java.lang.Object
org.tribuo.anomaly.example.AnomalyDataGenerator
Generates three example train and test datasets, used for unit testing.
They don't necessarily have sensible boundaries; they exist to test the machinery rather than accuracy.
Also has a dataset generator which returns a training dataset with no anomalies sampled from a single Gaussian, and a test dataset sampled from two Gaussians where points from the second are labelled anomalous.
For most use cases that are not unit tests, it is recommended to use GaussianAnomalyDataSource, which has similar functionality but is more flexible and configurable.
-
Constructor Summary
-
Method Summary
denseTrainTest()
- Makes a simple dataset for training and testing.
denseTrainTest(double negate)
- Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
emptyExample()
- Generates an example with no features.
gaussianAnomaly()
- Generates two datasets, one without anomalies drawn from a single Gaussian and the second drawn from a mixture of two Gaussians, with the second tagged anomalous.
gaussianAnomaly(long size, double fractionAnomalous)
- Generates two datasets, one without anomalies drawn from a single Gaussian and the second drawn from a mixture of two Gaussians, with the second tagged anomalous.
invalidSparseExample()
- Generates an example with the feature ids 1,5,8, which does not intersect with the ids used elsewhere in this class.
sparseTrainTest()
- Makes a simple dataset for training and testing.
sparseTrainTest(double negate)
- Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
-
Constructor Details
-
AnomalyDataGenerator
public AnomalyDataGenerator()
-
-
Method Details
-
gaussianAnomaly
Generates two datasets: the first contains no anomalies and is drawn from a single Gaussian; the second is drawn from a mixture of two Gaussians, with points from the second Gaussian tagged anomalous. Generates 200 training examples and 200 test examples, with 20% anomalies.
- Returns:
- A pair of datasets.
-
gaussianAnomaly
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> gaussianAnomaly(long size, double fractionAnomalous)
Generates two datasets: the first contains no anomalies and is drawn from a single Gaussian; the second is drawn from a mixture of two Gaussians, with points from the second Gaussian tagged anomalous.
- Parameters:
size
- The number of points to sample for each dataset.
fractionAnomalous
- The fraction of anomalous data to generate.
- Returns:
- A pair of datasets.
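The sampling scheme described above can be sketched without Tribuo as follows. This is a minimal illustration, not the library's implementation: the class name GaussianAnomalySketch, the Point type, the two-dimensional means, the unit standard deviation, the explicit seed arguments, and the choice to flag each test point as anomalous independently with probability fractionAnomalous are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; not Tribuo's implementation. */
final class GaussianAnomalySketch {

    /** A sampled point plus a flag saying whether it came from the anomalous Gaussian. */
    static final class Point {
        final double[] features;
        final boolean anomalous;

        Point(double[] features, boolean anomalous) {
            this.features = features;
            this.anomalous = anomalous;
        }
    }

    // Assumed parameters: this page does not state the generator's means or variances.
    private static final double[] EXPECTED_MEAN = {1.0, 2.0};
    private static final double[] ANOMALOUS_MEAN = {-4.0, -5.0};
    private static final double STD_DEV = 1.0;

    /** Draws one point from an axis-aligned Gaussian centred on the given mean. */
    private static double[] sample(Random rng, double[] mean) {
        double[] x = new double[mean.length];
        for (int i = 0; i < mean.length; i++) {
            x[i] = mean[i] + STD_DEV * rng.nextGaussian();
        }
        return x;
    }

    /** Training data: every point is drawn from the single "expected" Gaussian. */
    static List<Point> train(long size, long seed) {
        Random rng = new Random(seed);
        List<Point> points = new ArrayList<>();
        for (long i = 0; i < size; i++) {
            points.add(new Point(sample(rng, EXPECTED_MEAN), false));
        }
        return points;
    }

    /** Test data: each point comes from the anomalous Gaussian with probability fractionAnomalous. */
    static List<Point> test(long size, double fractionAnomalous, long seed) {
        if (fractionAnomalous < 0.0 || fractionAnomalous > 1.0) {
            throw new IllegalArgumentException(
                "fractionAnomalous must be in [0,1], found " + fractionAnomalous);
        }
        Random rng = new Random(seed);
        List<Point> points = new ArrayList<>();
        for (long i = 0; i < size; i++) {
            boolean anomalous = rng.nextDouble() < fractionAnomalous;
            double[] mean = anomalous ? ANOMALOUS_MEAN : EXPECTED_MEAN;
            points.add(new Point(sample(rng, mean), anomalous));
        }
        return points;
    }
}
```

Under these assumptions, GaussianAnomalySketch.train(200, 1L) and GaussianAnomalySketch.test(200, 0.2, 2L) mirror the sizes and anomaly fraction of the no-argument gaussianAnomaly described above, although the realised number of anomalies in the sketch varies with the seed.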
-
denseTrainTest
Makes a simple dataset for training and testing. Used for smoke testing; it doesn't have a real boundary.
- Returns:
- A pair containing a training dataset and a testing dataset.
-
denseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> denseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
- Parameters:
negate
- Supply -1.0 to negate some feature values.
- Returns:
- A pair of datasets.
-
sparseTrainTest
Makes a simple dataset for training and testing. Used for smoke testing; it doesn't have a real boundary.
- Returns:
- A pair containing a training dataset and a testing dataset.
-
sparseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>,Dataset<Event>> sparseTrainTest(double negate)
Generates a pair of datasets where the features are sparse and unknown features appear in the test data. It has the same 4 clusters, {0,1,2,3}.
- Parameters:
negate
- Supply -1.0 to negate some feature values.
- Returns:
- A pair of datasets.
-
invalidSparseExample
Generates an example with the feature ids 1, 5, and 8, which do not intersect with the ids used elsewhere in this class. This should make the example empty at prediction time.
- Returns:
- An example with features {1:1.0,5:5.0,8:8.0}.
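Why such an example ends up empty at prediction time can be sketched as follows. This is a simplified illustration under stated assumptions, not Tribuo's code: the class UnknownFeatureSketch and the method retainKnown are hypothetical, features are modelled as a plain map from name to value, and the model is assumed to discard any feature name it did not see during training.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; not Tribuo's implementation. */
final class UnknownFeatureSketch {

    /**
     * Keeps only the features whose names were seen during training.
     * An example whose feature names are all unknown comes back empty.
     */
    static Map<String, Double> retainKnown(Map<String, Double> example, Set<String> knownFeatures) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : example.entrySet()) {
            if (knownFeatures.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

With known features {A,B,C,D}, as in the dense dataset, passing the map {1:1.0,5:5.0,8:8.0} to retainKnown leaves nothing behind, which is the situation this example is meant to exercise.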
-
emptyExample
Generates an example with no features.
- Returns:
- An example with no features.
-