Package org.tribuo.data.text.impl
Class FeatureHasher
java.lang.Object
org.tribuo.data.text.impl.FeatureHasher
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, FeatureTransformer
Hashes the feature names to reduce the dimensionality.
Uses murmurhash3_x86_32 as the hashing function for the feature names.
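The idea of hashing feature names into a fixed number of buckets can be sketched as follows. This is a minimal, self-contained illustration and not Tribuo's code: the class name NameHashingSketch, the bucket helper, the seed value, the UTF-8 encoding of names, and the use of Math.floorMod to obtain a non-negative index are all assumptions of the sketch. The hash routine follows the standard murmurhash3_x86_32 algorithm over a byte array; Tribuo's own implementation may treat the input characters differently, so bucket indices need not match.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not Tribuo's implementation.
class NameHashingSketch {

    // Standard murmurhash3_x86_32 over a byte array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // largest multiple of 4 <= length

        // Body: mix in one little-endian 4-byte block at a time.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                   | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16)
                   | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        int tail = data.length & 3;
        if (tail == 3) k1 ^= (data[roundedEnd + 2] & 0xff) << 16;
        if (tail >= 2) k1 ^= (data[roundedEnd + 1] & 0xff) << 8;
        if (tail >= 1) {
            k1 ^= data[roundedEnd] & 0xff;
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
        }

        // Finalization: fold in the length and avalanche the bits.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Maps a feature name to a bucket in [0, dimension).
    // floorMod keeps the index non-negative; this choice is an assumption of the sketch.
    static int bucket(String name, int dimension, int seed) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(murmur3_32(bytes, seed), dimension);
    }

    public static void main(String[] args) {
        int dimension = 16; // hypothetical hash dimension
        int seed = 42;      // hypothetical seed
        for (String name : new String[] {"the", "quick", "brown", "fox"}) {
            System.out.println(name + " -> bucket " + bucket(name, dimension, seed));
        }
    }
}
```

However many distinct names appear in the input, at most dimension distinct buckets come out, which is where the dimensionality reduction comes from. Distinct names can collide in the same bucket.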
-
Field Summary
Modifier and Type | Field | Description
static final int | DEFAULT_HASH_SEED | Default value for the hash function seed.
static final int | DEFAULT_VALUE_HASH_SEED | Default value for the value hash function seed.
-
Constructor Summary
Constructor | Description
FeatureHasher(int dimension) | Constructs a feature hasher using the supplied hash dimension.
FeatureHasher(int dimension, boolean preserveValue) | Constructs a feature hasher using the supplied hash dimension.
FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue) | Constructs a feature hasher using the supplied hash dimension and seed values.
-
Method Summary
Modifier and Type | Method | Description
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
List<Feature> | map(String tag, List<Feature> features) | Transforms features into a new list of features.
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
-
Field Details
-
DEFAULT_HASH_SEED
public static final int DEFAULT_HASH_SEED
Default value for the hash function seed.
-
DEFAULT_VALUE_HASH_SEED
public static final int DEFAULT_VALUE_HASH_SEED
Default value for the value hash function seed.
-
-
Constructor Details
-
FeatureHasher
public FeatureHasher(int dimension)
Constructs a feature hasher using the supplied hash dimension.
Note that this constructor does not preserve the feature values: the hasher also hashes each feature value into {-1, 1}.
- Parameters:
dimension - The dimension to reduce the hashed features into.
-
FeatureHasher
public FeatureHasher(int dimension, boolean preserveValue)
Constructs a feature hasher using the supplied hash dimension.
- Parameters:
dimension - The dimension to reduce the hashed features into.
preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
-
FeatureHasher
public FeatureHasher(int dimension, int hashSeed, int valueHashSeed, boolean preserveValue)
Constructs a feature hasher using the supplied hash dimension and seed values.
- Parameters:
dimension - The dimension to reduce the hashed features into.
hashSeed - The seed used in the murmurhash computation.
valueHashSeed - The seed used in the murmurhash computation for the feature value; unused if preserveValue is true.
preserveValue - If true, the feature value is used unaltered in the new features; if false, it is hashed into the values {-1, 1}.
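The effect of preserveValue and valueHashSeed on the output value can be sketched as below. This is an illustration under stated assumptions, not Tribuo's code: ValueHashingSketch, standInHash and hashedValue are hypothetical names, standInHash is a simple seeded stand-in rather than murmurhash3, and deriving the sign from the top bit of the hash is one possible way to map a hash into {-1, 1}.

```java
// Illustrative sketch only; not Tribuo's implementation.
class ValueHashingSketch {

    // Stand-in string hash (NOT murmurhash3): seeded polynomial hash plus a bit mixer.
    static int standInHash(String name, int seed) {
        int h = seed;
        for (int i = 0; i < name.length(); i++) {
            h = 31 * h + name.charAt(i);
        }
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    // Returns the value that the hashed feature would carry.
    static double hashedValue(String name, double value, int valueHashSeed, boolean preserveValue) {
        if (preserveValue) {
            // The original value passes through unaltered; valueHashSeed is not used.
            return value;
        }
        // Arithmetic shift by 31 yields -1 (sign bit set) or 0; OR with 1 yields -1 or +1.
        return (standInHash(name, valueHashSeed) >> 31) | 1;
    }

    public static void main(String[] args) {
        int valueHashSeed = 7; // hypothetical seed
        System.out.println(hashedValue("quick", 3.5, valueHashSeed, true));
        System.out.println(hashedValue("quick", 3.5, valueHashSeed, false));
    }
}
```

Using a second, independently seeded hash for the sign is the usual design in feature hashing: when two names collide in one bucket, their signs are uncorrelated with the bucket choice, so collisions tend to cancel rather than accumulate.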
-
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
- Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
map
public List<Feature> map(String tag, List<Feature> features)
Description copied from interface: FeatureTransformer
Transforms features into a new list of features.
- Specified by:
map in interface FeatureTransformer
- Parameters:
tag - The feature name tag.
features - The features to transform.
- Returns:
The transformed features.
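The shape of this transformation, a tag plus a list of features in and a new list of features out, can be sketched without any Tribuo dependency. Everything here is hypothetical: NamedValue stands in for a named, real-valued feature, String.hashCode stands in for murmurhash3, and the "tag-hash=bucket" naming scheme is an assumption made for the illustration, not a documented output format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not Tribuo's implementation.
class MapSketch {

    // Hypothetical stand-in for a named, real-valued feature.
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Builds a new list in which every feature name is replaced by a hashed name.
    // The input list is left unmodified and values are carried over unchanged.
    static List<NamedValue> map(String tag, List<NamedValue> features, int dimension) {
        List<NamedValue> hashed = new ArrayList<>(features.size());
        for (NamedValue f : features) {
            // String.hashCode is a stand-in hash for this sketch.
            int bucket = Math.floorMod(f.name.hashCode(), dimension);
            // Hypothetical naming scheme: the tag followed by the bucket index.
            hashed.add(new NamedValue(tag + "-hash=" + bucket, f.value));
        }
        return hashed;
    }

    public static void main(String[] args) {
        List<NamedValue> input = Arrays.asList(
                new NamedValue("quick", 1.0),
                new NamedValue("brown", 2.0));
        for (NamedValue f : map("word", input, 8)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```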
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-