Interface FieldProcessor
- All Superinterfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
- All Known Implementing Classes:
DoubleFieldProcessor, IdentityProcessor, RegexFieldProcessor, TextFieldProcessor
public interface FieldProcessor
extends com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
An interface for things that process the columns in a data set.
-
Nested Class Summary
Nested Classes
- static enum: The types of generated features.
-
Field Summary
Fields
- NAMESPACE
-
Method Summary
- copy: Returns a copy of this FieldProcessor bound to the supplied newFieldName.
- getFeatureType: Returns the feature type this FieldProcessor generates.
- getFieldName: Gets the field name this FieldProcessor uses.
- getNumNamespaces (default int): Binarised categoricals can be namespaced, where the field name is appended with "#<non-negative-int>" to denote the namespace.
- process: Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable
postConfig
Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.Provenancable
getProvenance
-
Field Details
-
NAMESPACE
-
-
Method Details
-
getFieldName
Gets the field name this FieldProcessor uses.
-
process
Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
- Parameters:
value - The field value to process.
- Returns:
A list of ColumnarFeatures.
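The "possibly empty" part of the contract can be illustrated with a minimal sketch. It does not use the library types: SketchFeature and SketchNumericProcessor are hypothetical stand-ins for ColumnarFeature and a FieldProcessor implementation, and the choice to return an empty list for unparsable input is an assumption made for illustration, not documented behaviour of any listed implementation.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for ColumnarFeature; not the library class. */
final class SketchFeature {
    final String fieldName;
    final double value;

    SketchFeature(String fieldName, double value) {
        this.fieldName = fieldName;
        this.value = value;
    }
}

/** Hypothetical processor that reads a field value as a number. */
final class SketchNumericProcessor {
    private final String fieldName;

    SketchNumericProcessor(String fieldName) {
        this.fieldName = fieldName;
    }

    String getFieldName() {
        return fieldName;
    }

    /** Returns one feature for a parsable value, or an empty list otherwise. */
    List<SketchFeature> process(String value) {
        if (value == null) {
            return Collections.emptyList();
        }
        try {
            double parsed = Double.parseDouble(value.trim());
            return Collections.singletonList(new SketchFeature(fieldName, parsed));
        } catch (NumberFormatException e) {
            // Assumption: unparsable input yields no features rather than an error.
            return Collections.emptyList();
        }
    }
}
```

Under these assumptions, callers always get a list back and can iterate over it without a null check, whether or not the value produced a feature.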
-
getFeatureType
Returns the feature type this FieldProcessor generates.
- Returns:
The feature type.
-
getNumNamespaces
Binarised categoricals can be namespaced, where the field name is appended with "#<non-negative-int>" to denote the namespace. This allows one FieldProcessor to emit multiple binarised categoricals from the same field value, provided each emitted feature is in a different namespace. Without this guarantee it's impossible to recover the original categorical distribution before binarisation. If there is only a single namespace, it is omitted from the feature name.
- Returns:
The number of namespaces.
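The naming rule described above can be sketched as follows. NamespaceSketch and its methods are hypothetical helpers, not part of the library. The sketch assumes that the NAMESPACE field holds the "#" separator and that namespaces are numbered from 0; neither is stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing the "#<non-negative-int>" naming rule. */
final class NamespaceSketch {
    /** Assumption: this mirrors the value of FieldProcessor.NAMESPACE. */
    static final String NAMESPACE = "#";

    /** Appends the namespace to the field name unless there is only one namespace. */
    static String namespacedFieldName(String fieldName, int namespace, int numNamespaces) {
        if (numNamespaces < 1 || namespace < 0 || namespace >= numNamespaces) {
            throw new IllegalArgumentException(
                "namespace " + namespace + " is outside [0, " + numNamespaces + ")");
        }
        return numNamespaces == 1 ? fieldName : fieldName + NAMESPACE + namespace;
    }

    /** Lists the field name for every namespace of a processor. */
    static List<String> allNamespacedFieldNames(String fieldName, int numNamespaces) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numNamespaces; i++) {
            names.add(namespacedFieldName(fieldName, i, numNamespaces));
        }
        return names;
    }
}
```

With a field named "colour" and three namespaces, this yields "colour#0", "colour#1" and "colour#2"; with a single namespace it yields plain "colour".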
-
copy
Returns a copy of this FieldProcessor bound to the supplied newFieldName.
- Parameters:
newFieldName - The new field name for the copy.
- Returns:
A copy of this FieldProcessor.
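One way to satisfy this contract is to keep the processor immutable and build a new instance that shares the configuration but carries the new field name. The sketch below assumes that approach. SketchPrefixProcessor and its prefix setting are hypothetical and do not correspond to any implementing class listed on this page.

```java
/** Hypothetical immutable processor with one configuration value. */
final class SketchPrefixProcessor {
    private final String fieldName;
    private final String prefix; // hypothetical configuration carried over by copy

    SketchPrefixProcessor(String fieldName, String prefix) {
        this.fieldName = fieldName;
        this.prefix = prefix;
    }

    String getFieldName() {
        return fieldName;
    }

    String getPrefix() {
        return prefix;
    }

    /** Returns a new processor with the same configuration, bound to newFieldName. */
    SketchPrefixProcessor copy(String newFieldName) {
        return new SketchPrefixProcessor(newFieldName, prefix);
    }
}
```

Because the original is left unchanged, one configured processor can serve as a template for several columns, for example `template.copy("title")` and `template.copy("body")`.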
-