Package org.tribuo.clustering.example
Class ClusteringDataGenerator
java.lang.Object
org.tribuo.clustering.example.ClusteringDataGenerator
Generates three example train and test datasets, used for unit testing.
They don't necessarily have sensible cluster boundaries;
they exist to test the machinery rather than accuracy.
Also has a dataset generator which returns a dataset sampled from a mixture of 2-dimensional gaussians.
-
Constructor Summary
-
Method Summary
Method and Description
denseTrainTest()
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
denseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
emptyExample()
Generates an example with no features.
gaussianClusters(long size, long seed)
Generates a dataset drawn from a mixture of 5 2d gaussians.
invalidSparseExample()
Generates an example with the feature ids 1,5,8, which does not intersect with the ids used elsewhere in this class.
sparseTrainTest()
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
sparseTrainTest(double negate)
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
-
Constructor Details
-
ClusteringDataGenerator
public ClusteringDataGenerator()
-
-
Method Details
-
gaussianClusters
Generates a dataset drawn from a mixture of 5 2d gaussians.
Parameters:
size - The number of points to sample for the dataset.
seed - The RNG seed.
Returns:
A dataset sampled from the gaussian mixture.
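The idea of sampling a seeded dataset from a mixture of five 2d gaussians can be sketched in plain Java. This is a minimal illustration, not Tribuo's implementation: the class name GaussianMixtureSketch, the Point type, the component means, the unit variances, and the uniform mixing weights are all assumptions made for the example, since this page does not state the generator's actual parameters.

```java
import java.util.Random;

/** Hypothetical stand-in for a seeded 2d gaussian mixture sampler. */
final class GaussianMixtureSketch {

    // Assumed component means; the real generator's parameters are not documented here.
    private static final double[][] MEANS = {
        {0.0, 0.0}, {5.0, 5.0}, {-5.0, 5.0}, {5.0, -5.0}, {-5.0, -5.0}
    };

    /** One sampled point plus the index of the component that produced it. */
    static final class Point {
        final double x;
        final double y;
        final int component;

        Point(double x, double y, int component) {
            this.x = x;
            this.y = y;
            this.component = component;
        }
    }

    /** Samples {@code size} points; the same seed always yields the same points. */
    static Point[] sample(long size, long seed) {
        Random rng = new Random(seed);
        Point[] points = new Point[Math.toIntExact(size)];
        for (int i = 0; i < points.length; i++) {
            int c = rng.nextInt(MEANS.length);            // uniform mixing weights (assumption)
            double x = MEANS[c][0] + rng.nextGaussian();  // unit variance (assumption)
            double y = MEANS[c][1] + rng.nextGaussian();
            points[i] = new Point(x, y, c);
        }
        return points;
    }

    public static void main(String[] args) {
        Point[] first = sample(10L, 42L);
        Point[] second = sample(10L, 42L);
        System.out.println("points: " + first.length);
        System.out.println("repeatable: " + (first[0].x == second[0].x));
    }
}
```

Fixing the seed is what makes a generator like this usable in unit tests: two calls with the same size and seed produce identical points, so assertions do not depend on chance.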
-
denseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>,Dataset<ClusterID>> denseTrainTest()
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
Returns:
A pair of datasets.
-
denseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>,Dataset<ClusterID>> denseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
Parameters:
negate - Supply -1.0 to negate some feature values.
Returns:
A pair of datasets.
-
sparseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>,Dataset<ClusterID>> sparseTrainTest()
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data. It has the same 4 clusters {0,1,2,3}.
Returns:
A pair of datasets.
-
sparseTrainTest
public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>,Dataset<ClusterID>> sparseTrainTest(double negate)
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data. It has the same 4 clusters {0,1,2,3}.
Parameters:
negate - Supply -1.0 to negate some feature values.
Returns:
A pair of datasets.
-
invalidSparseExample
Generates an example with the feature ids 1,5,8, which does not intersect with the ids used elsewhere in this class. This should make the example empty at prediction time.
Returns:
An example with features {1:1.0,5:5.0,8:8.0}.
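Why an example with only unseen feature ids ends up empty at prediction time can be sketched as follows. This is a hedged illustration in plain Java, not the library's code: the class UnknownFeatureSketch and the method keepKnown are hypothetical, and the known feature set {A,B,C,D} is assumed from the dense datasets described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: drop features that a trained model has never seen. */
final class UnknownFeatureSketch {

    /** Returns only the entries of {@code features} whose names are in {@code known}. */
    static Map<String, Double> keepKnown(Map<String, Double> features, Set<String> known) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            if (known.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed training-time feature names.
        Set<String> known = Set.of("A", "B", "C", "D");

        // Mirrors the documented example {1:1.0,5:5.0,8:8.0}.
        Map<String, Double> invalid = new LinkedHashMap<>();
        invalid.put("1", 1.0);
        invalid.put("5", 5.0);
        invalid.put("8", 8.0);

        // No feature name overlaps the known set, so nothing survives the filter.
        System.out.println(keepKnown(invalid, known).isEmpty()); // prints "true"
    }
}
```

A test built on such an example can then check that the code under test handles an effectively empty input deliberately, for instance by rejecting it, rather than failing in an unexpected place.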
-
emptyExample
Generates an example with no features.
Returns:
An example with no features.
-