Class FeatureHasher
java.lang.Object
org.tribuo.data.text.impl.FeatureHasher
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, FeatureTransformer
Hashes the feature names to reduce the dimensionality.
Uses murmurhash3_x86_32 as the hashing function for the feature names.
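As a rough illustration of the idea described above, the following self-contained sketch maps feature names into a fixed number of buckets using an independent MurmurHash3 x86 32-bit implementation over UTF-8 bytes. It is not Tribuo's code: the class name `HashingTrickSketch`, the `bucket` helper, the seed value 42, and the use of `Math.floorMod` to reduce the hash to a bucket are all assumptions made for the example, so the indices it prints need not match what FeatureHasher produces.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the hashing trick; not part of Tribuo.
final class HashingTrickSketch {

    /** MurmurHash3 x86 32-bit over a byte array (independent implementation). */
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length & ~3; // length rounded down to a multiple of 4
        for (int i = 0; i < blocks; i += 4) {
            // Read one little-endian 32-bit block and mix it into the state.
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Mix in the 1 to 3 trailing bytes, if any.
        int k = 0;
        int tail = data.length & 3;
        if (tail == 3) k ^= (data[blocks + 2] & 0xff) << 16;
        if (tail >= 2) k ^= (data[blocks + 1] & 0xff) << 8;
        if (tail >= 1) {
            k ^= (data[blocks] & 0xff);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
        }
        // Finalization: fold in the length, then avalanche the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps a feature name to a bucket in [0, dimension). The reduction step is an assumption. */
    static int bucket(String featureName, int seed, int dimension) {
        int hash = murmur3x86_32(featureName.getBytes(StandardCharsets.UTF_8), seed);
        return Math.floorMod(hash, dimension);
    }

    public static void main(String[] args) {
        int dimension = 16; // deliberately small so collisions are easy to see
        for (String name : new String[] {"the", "quick", "brown", "fox"}) {
            System.out.println(name + " -> " + bucket(name, 42, dimension));
        }
    }
}
```

However many distinct feature names appear in the input, the output space never exceeds `dimension` buckets. The price is that unrelated names can collide in the same bucket.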
Field Summary
Fields
Modifier and Type | Field | Description
static final int | DEFAULT_HASH_SEED | Default value for the hash function seed.
static final int | DEFAULT_VALUE_HASH_SEED | Default value for the value hash function seed.
Constructor Summary
Constructors
Constructor | Description
FeatureHasher(int dimension) | Constructs a feature hasher using the supplied hash dimension.
FeatureHasher(int dimension, boolean preserveValue) | Constructs a feature hasher using the supplied hash dimension.
FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue) | Constructs a feature hasher using the supplied hash dimension and seed values.
Method Summary
Modifier and Type | Method | Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
 | map(tag, features) | Transforms features into a new list of features.
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
Field Details
DEFAULT_HASH_SEED
public static final int DEFAULT_HASH_SEED
Default value for the hash function seed.
DEFAULT_VALUE_HASH_SEED
public static final int DEFAULT_VALUE_HASH_SEED
Default value for the value hash function seed.
Constructor Details
FeatureHasher
public FeatureHasher(int dimension)
Constructs a feature hasher using the supplied hash dimension.
Note that the hasher also hashes the feature value into {-1, 1}.
Parameters:
dimension - The dimension to reduce the hashed features into.
FeatureHasher
public FeatureHasher(int dimension, boolean preserveValue)
Constructs a feature hasher using the supplied hash dimension.
Parameters:
dimension - The dimension to reduce the hashed features into.
preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
FeatureHasher
public FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue)
Constructs a feature hasher using the supplied hash dimension and seed values.
Parameters:
dimension - The dimension to reduce the hashed features into.
hashSeed - The seed used in the murmurhash computation.
valueHashSeed - The seed used in the murmurhash computation for the feature value; unused if preserveValue is true.
preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
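The effect of preserveValue can be sketched in isolation. The example below is a minimal illustration under stated assumptions, not Tribuo's implementation: `ValueHashSketch`, `standInHash`, and `outputValue` are hypothetical names, the hash is a simple seeded stand-in rather than murmurhash3, and the choice to derive the sign from the feature name and the lowest hash bit is an assumption.

```java
// Hypothetical sketch of the preserveValue semantics; not part of Tribuo.
final class ValueHashSketch {

    /** Stand-in seeded hash (NOT murmurhash3): String.hashCode mixed with the seed. */
    static int standInHash(String s, int seed) {
        int h = s.hashCode() ^ seed;
        h ^= h >>> 16;
        h *= 0x45d9f3b;
        h ^= h >>> 16;
        return h;
    }

    /** Returns the value a hashed feature would carry under the assumptions above. */
    static double outputValue(String name, double value, int valueHashSeed, boolean preserveValue) {
        if (preserveValue) {
            // The original value passes through unchanged; the value seed is never consulted.
            return value;
        }
        // Otherwise one bit of a second, separately seeded hash selects the sign.
        return (standInHash(name, valueHashSeed) & 1) == 0 ? -1.0 : 1.0;
    }

    public static void main(String[] args) {
        System.out.println(outputValue("colour=red", 3.5, 7, true));  // original value kept
        System.out.println(outputValue("colour=red", 3.5, 7, false)); // either -1.0 or 1.0
    }
}
```

Using a separate seed for the sign keeps it from being tied to the bits that select the bucket, which is one plausible reason for exposing hashSeed and valueHashSeed as distinct parameters.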
Method Details
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
map
Description copied from interface: FeatureTransformer
Transforms features into a new list of features.
Specified by:
map in interface FeatureTransformer
Parameters:
tag - The feature name tag.
features - The features to transform.
Returns:
The transformed features.
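The list-in, list-out shape of map can be sketched as follows. This is a hedged stand-alone illustration, not the library's code: `MapSketch` and `NamedValue` are hypothetical stand-ins for the real types, the extra `dimension` argument replaces the hasher's configured state, `String.hashCode` is swapped in for murmurhash3, and the `tag:bucket` naming scheme is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a tag-aware feature transformation; not part of Tribuo.
final class MapSketch {

    /** Hypothetical stand-in for a named feature; not Tribuo's Feature class. */
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Produces one output feature per input feature, renamed to tag plus bucket index. */
    static List<NamedValue> map(String tag, List<NamedValue> features, int dimension) {
        List<NamedValue> out = new ArrayList<>(features.size());
        for (NamedValue f : features) {
            // Stand-in hash: String.hashCode rather than murmurhash3.
            int bucket = Math.floorMod(f.name.hashCode(), dimension);
            // The "tag:bucket" naming scheme is an assumption made for this sketch.
            out.add(new NamedValue(tag + ":" + bucket, f.value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<NamedValue> input = Arrays.asList(
                new NamedValue("hello", 1.0),
                new NamedValue("world", 1.0));
        for (NamedValue f : map("body", input, 8)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

When two input names fall into the same bucket, the sketch emits two features with identical names. How such duplicates are combined is left to the caller here.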
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>