Package org.tribuo.hash
Class HashedFeatureMap
java.lang.Object
org.tribuo.FeatureMap
org.tribuo.ImmutableFeatureMap
org.tribuo.hash.HashedFeatureMap
- All Implemented Interfaces:
Serializable, Iterable<VariableInfo>, ProtoSerializable<org.tribuo.protos.core.FeatureDomainProto>
A FeatureMap used by the HashingTrainer to provide feature name hashing. It guarantees that the Model does not contain feature name information, but still works with unhashed feature names.
The salt must be set after this object has been deserialized.
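The idea described above can be illustrated with a small, self-contained sketch. It is not Tribuo's implementation: the class `HashedLookupSketch`, its methods, and the choice of salted SHA-256 are all assumptions made for illustration. It shows a map that stores only hashes of feature names, yet still answers lookups made with the unhashed names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a hashed feature map; not Tribuo's implementation. */
final class HashedLookupSketch {
    private final Map<String, Integer> idByHash = new HashMap<>();
    private final String salt;

    HashedLookupSketch(List<String> featureNames, String salt) {
        this.salt = salt;
        for (String name : featureNames) {
            // Only the hash is stored; the first name to claim a hash keeps its id.
            idByHash.putIfAbsent(hash(name), idByHash.size());
        }
    }

    /** Salted SHA-256 of the name, hex encoded (an assumed hash choice). */
    String hash(String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((salt + name).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    /** Looks up an unhashed name; returns -1 if it is unknown. */
    int getID(String name) {
        return idByHash.getOrDefault(hash(name), -1);
    }

    /** True if the raw (unhashed) name is stored as a key in the map. */
    boolean storesRawName(String name) {
        return idByHash.containsKey(name);
    }
}
```

A caller passes the plain feature name to `getID`; the sketch hashes it internally with the salt, which is why the same salt is needed at lookup time as at construction time.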
-
Field Summary
Modifier and Type | Field | Description
static final int | CURRENT_VERSION | Protobuf serialization version.
Fields inherited from class org.tribuo.ImmutableFeatureMap
idMap, size
Fields inherited from class org.tribuo.FeatureMap
m
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Method Summary
Modifier and Type | Method | Description
static HashedFeatureMap | deserializeFromProto(int version, String className, com.google.protobuf.Any message) | Deserialization factory.
static HashedFeatureMap | generateHashedFeatureMap(FeatureMap map, Hasher hasher) | Converts a standard FeatureMap by hashing each entry using the supplied hash function Hasher.
VariableIDInfo | get(String name) | Gets the VariableIDInfo for this name.
int | getID(String name) | Gets the id number for this feature, returns -1 if it's unknown.
void | setSalt(String salt) | The salt is not serialised with the Model.
Methods inherited from class org.tribuo.ImmutableFeatureMap
generateIDs, generateIDs, get, serialize, size
Methods inherited from class org.tribuo.FeatureMap
deserialize, domainEquals, equals, hashCode, iterator, keySet, toReadableString, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
Method Details
-
deserializeFromProto
public static HashedFeatureMap deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
get
public VariableIDInfo get(String name)
Description copied from class: ImmutableFeatureMap
Gets the VariableIDInfo for this name. Returns null if it's unknown.
- Overrides:
get in class ImmutableFeatureMap
- Parameters:
name - The name to look up.
- Returns:
The VariableInfo, or null.
-
getID
public int getID(String name)
Gets the id number for this feature, returns -1 if it's unknown.
- Overrides:
getID in class ImmutableFeatureMap
- Parameters:
name - The name of the feature.
- Returns:
A non-negative integer if the feature is known, -1 otherwise.
-
setSalt
public void setSalt(String salt)
The salt is not serialised with the Model. After deserialisation it must be set to the same value that was used at training time.
If the salt is invalid, this method throws IllegalArgumentException.
- Parameters:
salt - The salt value. Must be the same as the one from training time.
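The salt lifecycle described here can be sketched with plain Java serialisation. Everything in the example is hypothetical: `TransientSaltSketch` is not a Tribuo class, and the validation rule (non-null, non-blank) is an assumption chosen only so that the IllegalArgumentException path can be shown. The sketch demonstrates that a field excluded from the serialised form comes back unset and has to be supplied again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical hasher whose salt is excluded from serialisation. */
final class TransientSaltSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: the salt is never written to the serialised form.
    private transient String salt;

    TransientSaltSketch(String salt) {
        setSalt(salt);
    }

    /** Assumed validation rule for this sketch only: non-null and non-blank. */
    void setSalt(String salt) {
        if (salt == null || salt.trim().isEmpty()) {
            throw new IllegalArgumentException("Salt must be non-empty");
        }
        this.salt = salt;
    }

    boolean hasSalt() {
        return salt != null;
    }

    /** Serialises and deserialises this object in memory. */
    TransientSaltSketch roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientSaltSketch) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the copy returned by `roundTrip()` has no salt until `setSalt` is called again. Keeping the salt out of the stored bytes means the serialised object alone is not enough to recompute the hashes.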
-
generateHashedFeatureMap
public static HashedFeatureMap generateHashedFeatureMap(FeatureMap map, Hasher hasher)
Converts a standard FeatureMap by hashing each entry using the supplied hash function Hasher.
This preserves the index ordering of the original feature names, which is important for good test-time performance.
It guarantees that any collision produces a feature id number lower than the previous feature's number, so collisions can be easily removed.
- Parameters:
map - The FeatureMap to hash.
hasher - The hashing function.
- Returns:
A HashedFeatureMap.
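The ordering and collision behaviour can be pictured with a simplified sketch. It is one plausible reading of the description, not Tribuo's code: `OrderPreservingHashSketch`, `assignIds`, and `collisionPositions` are hypothetical names, and the hash function is passed in as a plain `Function<String, String>` so that a deliberately weak hash can force a collision. Names are processed in their original id order. A new hash receives the next id, while a colliding name reuses an id that was already handed out, so its id never exceeds the largest id seen so far.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of order-preserving id assignment under hashing. */
final class OrderPreservingHashSketch {

    /** Returns the id assigned to each name, in the order the names were given. */
    static List<Integer> assignIds(List<String> namesInIdOrder,
                                   Function<String, String> hasher) {
        Map<String, Integer> idByHash = new LinkedHashMap<>();
        List<Integer> ids = new ArrayList<>();
        for (String name : namesInIdOrder) {
            String hash = hasher.apply(name);
            Integer existing = idByHash.get(hash);
            if (existing == null) {
                // First time this hash is seen: hand out the next id.
                int id = idByHash.size();
                idByHash.put(hash, id);
                ids.add(id);
            } else {
                // Collision: reuse the id that was already assigned to this hash.
                ids.add(existing);
            }
        }
        return ids;
    }

    /** Positions whose id does not exceed the largest id seen before them. */
    static List<Integer> collisionPositions(List<Integer> ids) {
        List<Integer> positions = new ArrayList<>();
        int maxSeen = -1;
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i) <= maxSeen) {
                positions.add(i);
            } else {
                maxSeen = ids.get(i);
            }
        }
        return positions;
    }
}
```

For example, hashing `apple`, `banana`, `avocado`, `cherry` by first letter only makes `avocado` collide with `apple`, and a single pass over the assigned ids is enough to find that position.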
-