java.lang.Object

org.tribuo.clustering.example.ClusteringDataGenerator

public abstract class ClusteringDataGenerator extends Object

Generates three example train and test datasets for use in unit tests. The datasets don't necessarily have sensible cluster boundaries; they exist to test the machinery rather than accuracy.

Also provides a generator which returns a dataset sampled from a mixture of 2-dimensional Gaussians.

Constructor Summary

Constructors

Constructor

Description

ClusteringDataGenerator()
Method Summary

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

denseTrainTest()

Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

denseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.

static Example<ClusterID>

emptyExample()

Generates an example with no features.

static Dataset<ClusterID>

gaussianClusters(long size, long seed)

Generates a dataset drawn from a mixture of five 2D Gaussians.

static Example<ClusterID>

invalidSparseExample()

Generates an example with the feature ids 1, 5 and 8, which do not intersect with the ids used elsewhere in this class.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

sparseTrainTest()

Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

sparseTrainTest(double negate)

Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- ClusteringDataGenerator
  
  public ClusteringDataGenerator()
Method Details
- gaussianClusters
  
  public static Dataset<ClusterID> gaussianClusters(long size, long seed)
  
  Generates a dataset drawn from a mixture of five 2D Gaussians.
  
  Parameters:
  
  size - The number of points to sample for the dataset.
  
  seed - The RNG seed.
  
  Returns:
  
  A dataset sampled from the Gaussian mixture.
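To make the sampling idea concrete, here is a minimal standard-library sketch of drawing seeded points from a five-component mixture of 2D Gaussians. It is not the Tribuo implementation: the class name `GaussianMixtureSketch`, the means, standard deviations and mixing weights are all hypothetical values chosen for illustration, the components are assumed to be isotropic for simplicity, and `size` is an `int` here rather than a `long`.

```java
import java.util.Random;

// Hypothetical illustration only; this is not how Tribuo implements gaussianClusters.
final class GaussianMixtureSketch {
    // Assumed component parameters (not taken from the Tribuo source).
    private static final double[][] MEANS = {{0, 0}, {5, 5}, {2.5, 2.5}, {10, 0}, {-1, 0}};
    private static final double[] STD_DEVS = {1.0, 1.0, 0.5, 1.5, 0.3};
    private static final double[] WEIGHTS = {0.1, 0.35, 0.05, 0.25, 0.25}; // sums to 1.0

    /** Returns {@code size} rows of {x, y, componentIndex}. */
    static double[][] sample(int size, long seed) {
        Random rng = new Random(seed); // same seed gives the same points
        double[][] points = new double[size][3];
        for (int i = 0; i < size; i++) {
            // Choose a component by walking the cumulative mixing weights.
            double u = rng.nextDouble();
            int k = 0;
            double cumulative = WEIGHTS[0];
            while (u > cumulative && k < WEIGHTS.length - 1) {
                k++;
                cumulative += WEIGHTS[k];
            }
            // Draw an isotropic 2D Gaussian point around the chosen mean.
            points[i][0] = MEANS[k][0] + STD_DEVS[k] * rng.nextGaussian();
            points[i][1] = MEANS[k][1] + STD_DEVS[k] * rng.nextGaussian();
            points[i][2] = k;
        }
        return points;
    }

    public static void main(String[] args) {
        for (double[] p : sample(5, 1L)) {
            System.out.printf("%.3f %.3f %d%n", p[0], p[1], (int) p[2]);
        }
    }
}
```

Because the generator is seeded, two calls with the same `size` and `seed` produce identical points, which is the property that makes this kind of data convenient for unit tests.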
- denseTrainTest
  
  public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>> denseTrainTest()
  
  Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
  
  Returns:
  
  A pair of datasets.
- denseTrainTest
  
  public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>> denseTrainTest(double negate)
  
  Generates a train/test dataset pair which is dense in the features. Each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
  
  Parameters:
  
  negate - Supply -1.0 to negate some feature values.
  
  Returns:
  
  A pair of datasets.
- sparseTrainTest
  
  public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>> sparseTrainTest()
  
  Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data. The datasets have the same 4 clusters, {0,1,2,3}, as the dense datasets.
  
  Returns:
  
  A pair of datasets.
- sparseTrainTest
  
  public static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>> sparseTrainTest(double negate)
  
  Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data. The datasets have the same 4 clusters, {0,1,2,3}, as the dense datasets.
  
  Parameters:
  
  negate - Supply -1.0 to negate some feature values.
  
  Returns:
  
  A pair of datasets.
- invalidSparseExample
  
  public static Example<ClusterID> invalidSparseExample()
  
  Generates an example with the feature ids 1, 5 and 8, which do not intersect with the ids used elsewhere in this class. This should make the example empty at prediction time.
  
  Returns:
  
  An example with features {1:1.0,5:5.0,8:8.0}.
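The following standard-library sketch illustrates why such an example ends up empty at prediction time: if only features known from training are kept, none of the ids 1, 5 and 8 survive. It does not use Tribuo's own classes; the class `UnknownFeatureSketch`, the method `retainKnown`, and the assumption that the known feature domain is {A,B,C,D} are hypothetical and serve only to illustrate the idea.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; Tribuo's own feature handling is not shown here.
final class UnknownFeatureSketch {
    /** Keeps only the features whose names appear in the known feature domain. */
    static Map<String, Double> retainKnown(Map<String, Double> example, Set<String> knownFeatures) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : example.entrySet()) {
            if (knownFeatures.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Builds the feature map {1:1.0, 5:5.0, 8:8.0} described above. */
    static Map<String, Double> invalidSparseFeatures() {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("1", 1.0);
        features.put("5", 5.0);
        features.put("8", 8.0);
        return features;
    }

    public static void main(String[] args) {
        // Assumed training feature domain, taken from the dense datasets' {A,B,C,D}.
        Set<String> domain = new HashSet<>(Arrays.asList("A", "B", "C", "D"));
        Map<String, Double> kept = retainKnown(invalidSparseFeatures(), domain);
        System.out.println("Features left after filtering: " + kept.size());
    }
}
```

The same filtering step explains the sparse test data: features that never appeared in training would be dropped, leaving only the overlap with the known domain.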
- emptyExample
  
  public static Example<ClusterID> emptyExample()
  
  Generates an example with no features.
  
  Returns:
  
  An example with no features.
