Class FeatureHasher

java.lang.Object
org.tribuo.data.text.impl.FeatureHasher
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, FeatureTransformer

public class FeatureHasher extends Object implements FeatureTransformer
Hashes the feature names to reduce the dimensionality.

Uses murmurhash3_x86_32 as the hashing function for the feature names.
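
The hashing idea described above can be sketched in plain Java. This is an illustrative, standalone sketch and not Tribuo's implementation: the class name HashBucketSketch, the choice to hash the UTF-8 bytes of the name, and the use of Math.floorMod to fold the hash into a bucket are all assumptions made for the example. The murmur3_32 method follows the publicly documented MurmurHash3 x86_32 algorithm.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Tribuo.
final class HashBucketSketch {
    private HashBucketSketch() {}

    /** MurmurHash3 x86_32 over a byte array, following the public reference algorithm. */
    public static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int blocks = data.length / 4;

        // Body: mix each little-endian 4-byte block into the state.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k1 = (data[o] & 0xff)
                    | ((data[o + 1] & 0xff) << 8)
                    | ((data[o + 2] & 0xff) << 16)
                    | ((data[o + 3] & 0xff) << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1 to 3 bytes, if any.
        int tail = blocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k1 ^= (data[tail] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization: fold in the length and avalanche the bits.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Folds a feature name into a bucket index in [0, dimension). Assumed scheme. */
    public static int bucket(String featureName, int dimension, int seed) {
        int hash = murmur3_32(featureName.getBytes(StandardCharsets.UTF_8), seed);
        return Math.floorMod(hash, dimension); // floorMod keeps the index non-negative
    }

    public static void main(String[] args) {
        // Many distinct names share only 16 buckets, so collisions are expected.
        for (String name : new String[] {"the", "quick", "brown", "fox"}) {
            System.out.println(name + " -> " + bucket(name, 16, 42));
        }
    }
}
```

Because the byte encoding and the folding step are assumptions, the bucket indices printed here need not match the feature names that FeatureHasher itself produces.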

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    DEFAULT_HASH_SEED
    Default value for the hash function seed.
    static final int
    DEFAULT_VALUE_HASH_SEED
    Default value for the value hash function seed.
  • Constructor Summary

    Constructors
    Constructor
    Description
    FeatureHasher(int dimension)
    Constructs a feature hasher using the supplied hash dimension.
    FeatureHasher(int dimension, boolean preserveValue)
    Constructs a feature hasher using the supplied hash dimension.
    FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue)
    Constructs a feature hasher using the supplied hash dimension and seed values.
  • Method Summary

    Modifier and Type
    Method
    Description
    com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance
    getProvenance()
    List<Feature>
    map(String tag, List<Feature> features)
    Transforms features into a new list of features.
    void
    postConfig()
    Used by the OLCUT configuration system, and should not be called by external code.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT_HASH_SEED

      public static final int DEFAULT_HASH_SEED
      Default value for the hash function seed.
    • DEFAULT_VALUE_HASH_SEED

      public static final int DEFAULT_VALUE_HASH_SEED
      Default value for the value hash function seed.
  • Constructor Details

    • FeatureHasher

      public FeatureHasher(int dimension)
      Constructs a feature hasher using the supplied hash dimension.

      Note that a hasher built with this constructor also hashes each feature value into {-1, 1}.

      Parameters:
      dimension - The dimension to reduce the hashed features into.
    • FeatureHasher

      public FeatureHasher(int dimension, boolean preserveValue)
      Constructs a feature hasher using the supplied hash dimension.
      Parameters:
      dimension - The dimension to reduce the hashed features into.
      preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
    • FeatureHasher

      public FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue)
      Constructs a feature hasher using the supplied hash dimension and seed values.
      Parameters:
      dimension - The dimension to reduce the hashed features into.
      hashSeed - The seed used in the murmurhash computation.
      valueHashSeed - The seed used in the murmurhash computation for the feature value, unused if preserveValue is true.
      preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
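
The effect of preserveValue can be illustrated with a small standalone sketch. The documentation does not say how the value hash is reduced to {-1, 1}; taking the sign bit of a 32-bit hash, as done here, is one common choice and is an assumption of this example. The class ValueHashSketch and its methods are hypothetical names, and valueHash stands in for a hash computed with the value hash seed.

```java
// Hypothetical helper class, not part of Tribuo.
final class ValueHashSketch {
    private ValueHashSketch() {}

    /** Maps a 32-bit hash to -1 or +1 using its sign bit (assumed scheme). */
    public static int sign(int hash) {
        // The arithmetic shift yields -1 for negative hashes and 0 otherwise;
        // OR-ing with 1 turns those into -1 and +1 respectively.
        return (hash >> 31) | 1;
    }

    /** Returns the value an output feature would carry under each setting. */
    public static double outputValue(double originalValue, int valueHash, boolean preserveValue) {
        return preserveValue ? originalValue : sign(valueHash);
    }

    public static void main(String[] args) {
        int valueHash = 0x9e3779b9; // arbitrary example hash with the sign bit set
        System.out.println(outputValue(3.5, valueHash, true));  // prints 3.5
        System.out.println(outputValue(3.5, valueHash, false)); // prints -1.0
    }
}
```

With preserveValue set to true the value hash is never consulted, which is consistent with valueHashSeed being unused in that case.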
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
    • map

      public List<Feature> map(String tag, List<Feature> features)
      Description copied from interface: FeatureTransformer
      Transforms features into a new list of features.
      Specified by:
      map in interface FeatureTransformer
      Parameters:
      tag - The feature name tag.
      features - The features to transform.
      Returns:
      The transformed features.
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>