org.tribuo.clustering.hdbscan.HdbscanModel

All Implemented Interfaces: com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable

public final class HdbscanModel extends Model<ClusterID>

A trained HDBSCAN* model which provides the cluster assignment labels and outlier scores for every data point.

The predict method of this model approximates the cluster labels for new data points, based on the current clustering. The model is not updated with the new data. This is a novel prediction technique which leverages the computed cluster exemplars from the HDBSCAN* algorithm.
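As a rough illustration of exemplar-based prediction, the sketch below assigns a new point the label of its nearest exemplar under Euclidean distance. This is only an assumption about the general idea, not the library's implementation: the class `NearestExemplarSketch`, the method `predictLabel`, the plain `double[]` feature representation, and the choice of Euclidean distance are all hypothetical.

```java
class NearestExemplarSketch {
    // Euclidean distance between two dense feature vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the label of the exemplar closest to the point (hypothetical helper).
    // The exemplars themselves are never modified, mirroring a model that is
    // not updated by prediction.
    static int predictLabel(double[] point, double[][] exemplars, int[] exemplarLabels) {
        if (exemplars.length == 0) {
            throw new IllegalArgumentException("At least one exemplar is required");
        }
        int bestLabel = exemplarLabels[0];
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < exemplars.length; i++) {
            double distance = euclidean(point, exemplars[i]);
            if (distance < bestDistance) {
                bestDistance = distance;
                bestLabel = exemplarLabels[i];
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Made-up exemplars for two clusters labelled 3 and 5.
        double[][] exemplars = {{0.0, 0.0}, {10.0, 10.0}};
        int[] labels = {3, 5};
        System.out.println(predictLabel(new double[]{1.0, 1.0}, exemplars, labels));
        System.out.println(predictLabel(new double[]{9.0, 8.0}, exemplars, labels));
    }
}
```

A real implementation may also decide that a point far from every exemplar is noise; the sketch leaves that out.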

Field Summary

Fields inherited from class org.tribuo.Model
ALL_OUTPUTS, BIAS_FEATURE, featureIDMap, generatesProbabilities, name, outputIDInfo, provenance, provenanceOutput
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected HdbscanModel | copy(String newName, ModelProvenance newProvenance) | Copies a model, replacing its provenance and name with the supplied values. |
| List<Integer> | getClusterLabels() | Returns the cluster labels for the training data. |
| Optional<Excuse<ClusterID>> | getExcuse(Example<ClusterID> example) | Generates an excuse for an example. |
| List<Double> | getOutlierScores() | Returns the GLOSH (Global-Local Outlier Scores from Hierarchies) outlier scores for the training data. |
| Map<String, List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> | getTopFeatures(int n) | Gets the top n features associated with this model. |
| Prediction<ClusterID> | predict(Example<ClusterID> example) | Uses the model to predict the output for a single example. |

Methods inherited from class org.tribuo.Model
castModel, copy, generatesProbabilities, getExcuses, getFeatureIDMap, getName, getOutputIDInfo, getProvenance, innerPredict, predict, predict, setName, toString, validate

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Method Details
- getClusterLabels
  
  public List<Integer> getClusterLabels()
  
  Returns the cluster labels for the training data.
  The cluster labels are in the same order as the original data points. A label of HdbscanTrainer.OUTLIER_NOISE_CLUSTER_LABEL indicates an outlier or noise point.
  
  Returns:
  
  The cluster labels for every data point from the training data.
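  Because the labels follow the order of the training data, the list index identifies the data point. A minimal sketch of grouping point indices by cluster while skipping noise points, using only java.util: the literal list stands in for the value returned by getClusterLabels(), the helper `groupByCluster` is hypothetical, and the noise label is passed in as a parameter because the actual value of HdbscanTrainer.OUTLIER_NOISE_CLUSTER_LABEL is not stated here (0 is used purely as an assumption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClusterLabelSketch {
    // Groups data point indices by cluster label, leaving out noise points.
    static Map<Integer, List<Integer>> groupByCluster(List<Integer> labels, int noiseLabel) {
        Map<Integer, List<Integer>> clusters = new TreeMap<>();
        for (int i = 0; i < labels.size(); i++) {
            int label = labels.get(i);
            if (label == noiseLabel) {
                continue; // outlier or noise point, belongs to no cluster
            }
            clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(i);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Stand-in for model.getClusterLabels(); the values are made up.
        List<Integer> labels = List.of(3, 3, 0, 5, 3, 5);
        // 0 is an assumed noise label for illustration only.
        System.out.println(groupByCluster(labels, 0));
    }
}
```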
- getOutlierScores
  
  public List<Double> getOutlierScores()
  
  Returns the GLOSH (Global-Local Outlier Scores from Hierarchies) outlier scores for the training data. These are values between 0 and 1. A higher score indicates that a point is more likely to be an outlier.
  The outlier scores are in the same order as the original data points.
  
  Returns:
  
  The outlier scores for every data point from the training data.
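  Since the scores lie between 0 and 1 and share the ordering of the training data, one simple use is to flag the points whose score exceeds a cut-off. The sketch below assumes a literal list in place of the value returned by getOutlierScores(); the helper `indicesAbove` and the threshold of 0.8 are hypothetical choices, not library defaults.

```java
import java.util.ArrayList;
import java.util.List;

class OutlierScoreSketch {
    // Returns the indices of the data points whose score is strictly above the threshold.
    static List<Integer> indicesAbove(List<Double> scores, double threshold) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < scores.size(); i++) {
            if (scores.get(i) > threshold) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Stand-in for model.getOutlierScores(); the values are made up.
        List<Double> scores = List.of(0.05, 0.92, 0.40, 0.81);
        // 0.8 is an arbitrary cut-off chosen for illustration.
        System.out.println(indicesAbove(scores, 0.8));
    }
}
```

  A suitable threshold depends on the data set; inspecting the score distribution first is usually more informative than a fixed cut-off.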
- predict
  
  public Prediction<ClusterID> predict(Example<ClusterID> example)
  
  Description copied from class: Model
  
  Uses the model to predict the output for a single example.
  predict does not mutate the example.
  Throws IllegalArgumentException if the example has no features or no feature overlap with the model.
  
  Specified by:
  
  predict in class Model<ClusterID>
  
  Parameters:
  
  example - the example to predict.
  
  Returns:
  
  the result of the prediction.
- getTopFeatures
  
  public Map<String, List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> getTopFeatures(int n)
  
  Description copied from class: Model
  
  Gets the top n features associated with this model.
  If the model does not produce per output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.
  
  If the model cannot describe its top features then it returns Collections.emptyMap().
  
  Specified by:
  
  getTopFeatures in class Model<ClusterID>
  
  Parameters:
  
  n - the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
  
  Returns:
  
  a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
- getExcuse
  
  public Optional<Excuse<ClusterID>> getExcuse(Example<ClusterID> example)
  
  Description copied from class: Model
  
  Generates an excuse for an example.
  This attempts to explain a classification result. Generating an excuse may be quite an expensive operation.
  This excuse either contains per class information or an entry with key Model.ALL_OUTPUTS.
  The optional is empty if the model does not provide excuses.
  
  Specified by:
  
  getExcuse in class Model<ClusterID>
  
  Parameters:
  
  example - The input example.
  
  Returns:
  
  An optional excuse object. The optional is empty if this model does not provide excuses.
- copy
  
  protected HdbscanModel copy(String newName, ModelProvenance newProvenance)
  
  Description copied from class: Model
  
  Copies a model, replacing its provenance and name with the supplied values.
  Used to provide the provenance removal functionality.
  
  Specified by:
  
  copy in class Model<ClusterID>
  
  Parameters:
  
  newName - The new name.
  
  newProvenance - The new provenance.
  
  Returns:
  
  A copy of the model.
