Class KMeansModel

java.lang.Object
org.tribuo.Model<ClusterID>
org.tribuo.clustering.kmeans.KMeansModel
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable, ProtoSerializable<org.tribuo.protos.core.ModelProto>

public class KMeansModel extends Model<ClusterID>
A K-Means model with a selectable distance function.

The predict method of this model assigns the provided input to one of the model's centroids; it does not update those centroids.

The predict method is single threaded.
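
The assignment step described above can be illustrated with a minimal, library-independent sketch. It is not Tribuo's implementation: the class NearestCentroidSketch, its methods, and the use of plain double[] arrays are assumptions made for illustration only. It shows a pluggable distance function and an assignment that only reads the centroids.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration; none of these names come from Tribuo.
final class NearestCentroidSketch {

    /** Euclidean (L2) distance between two equal-length vectors. */
    public static double l2(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan (L1) distance between two equal-length vectors. */
    public static double l1(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /**
     * Returns the index of the centroid closest to the point under the
     * supplied distance. The centroids are read but never modified.
     */
    public static int assign(double[][] centroids, double[] point,
                             ToDoubleBiFunction<double[], double[]> distance) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distance.applyAsDouble(centroids[i], point);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[] point = {9.0, 8.0};
        // The same assignment routine works with either distance function.
        System.out.println(assign(centroids, point, NearestCentroidSketch::l2));
        System.out.println(assign(centroids, point, NearestCentroidSketch::l1));
    }
}
```

Passing the distance as a function argument keeps the assignment loop unchanged when the metric changes, which is one simple way to model a selectable distance function.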

See:

 J. Friedman, T. Hastie, & R. Tibshirani.
 "The Elements of Statistical Learning"
 Springer 2001.
 
  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
  • Method Details

    • deserializeFromProto

      public static KMeansModel deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • getCentroidVectors

      public DenseVector[] getCentroidVectors()
      Returns a copy of the centroids.

      In most cases you should prefer getCentroids() as it performs the mapping from Tribuo's internal feature ids to the externally visible feature names for you. This method provides direct access to the centroid vectors for use in downstream processing if the ids are not relevant (or are known to match).

      Returns:
      The centroids.
    • getCentroids

      public List<List<Feature>> getCentroids()
      Returns a list of feature lists, one per centroid.

      This should be used in preference to getCentroidVectors() as it performs the mapping from Tribuo's internal feature ids to the externally visible feature names.

      Returns:
      A list containing all the centroids.
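
      The difference between a raw centroid vector and a named centroid can be shown with a small standalone sketch. It does not use Tribuo's classes: CentroidNamingSketch, nameCentroid, and the featureNames array indexed by feature id are hypothetical stand-ins for the id-to-name mapping that getCentroids() performs for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from Tribuo.
final class CentroidNamingSketch {

    /**
     * Pairs each value of a dense centroid with the feature name stored at
     * the same index, preserving id order. Assumes feature names are unique.
     */
    public static Map<String, Double> nameCentroid(double[] centroid, String[] featureNames) {
        if (centroid.length != featureNames.length) {
            throw new IllegalArgumentException("Centroid and name table differ in length");
        }
        Map<String, Double> named = new LinkedHashMap<>();
        for (int id = 0; id < centroid.length; id++) {
            named.put(featureNames[id], centroid[id]);
        }
        return named;
    }

    public static void main(String[] args) {
        String[] featureNames = {"height", "weight"}; // index = internal feature id (assumed)
        double[] centroid = {1.7, 65.0};              // values indexed by the same ids
        System.out.println(nameCentroid(centroid, featureNames));
    }
}
```

      Working on the raw vector is only safe when the consumer already agrees on what each index means, which is why the named form is the recommended default.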
    • predict

      public Prediction<ClusterID> predict(Example<ClusterID> example)
      Description copied from class: Model
      Uses the model to predict the output for a single example.

      predict does not mutate the example.

      Throws IllegalArgumentException if the example has no features or no feature overlap with the model.

      Specified by:
      predict in class Model<ClusterID>
      Parameters:
      example - the example to predict.
      Returns:
      the result of the prediction.
    • getTopFeatures

      public Map<String,List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> getTopFeatures(int n)
      Description copied from class: Model
      Gets the top n features associated with this model.

      If the model does not produce per-output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.

      If the model cannot describe its top features then it returns Collections.emptyMap().

      Specified by:
      getTopFeatures in class Model<ClusterID>
      Parameters:
      n - the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
      Returns:
      a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
    • getExcuse

      public Optional<Excuse<ClusterID>> getExcuse(Example<ClusterID> example)
      Description copied from class: Model
      Generates an excuse for an example.

      This attempts to explain a classification result. Generating an excuse may be quite an expensive operation.

      This excuse contains either per-class information or an entry with key Model.ALL_OUTPUTS.

      The optional is empty if the model does not provide excuses.

      Specified by:
      getExcuse in class Model<ClusterID>
      Parameters:
      example - The input example.
      Returns:
      An optional excuse object. The optional is empty if this model does not provide excuses.
    • serialize

      public org.tribuo.protos.core.ModelProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.protos.core.ModelProto>
      Overrides:
      serialize in class Model<ClusterID>
      Returns:
      The protobuf.
    • copy

      protected KMeansModel copy(String newName, ModelProvenance newProvenance)
      Description copied from class: Model
      Copies a model, replacing its provenance and name with the supplied values.

      Used to provide the provenance removal functionality.

      Specified by:
      copy in class Model<ClusterID>
      Parameters:
      newName - The new name.
      newProvenance - The new provenance.
      Returns:
      A copy of the model.