Package org.tribuo.data.columnar
Interface FieldProcessor
- All Superinterfaces: com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
- All Known Implementing Classes: DateFieldProcessor, DoubleFieldProcessor, IdentityProcessor, RegexFieldProcessor, TextFieldProcessor
public interface FieldProcessor
extends com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
An interface for things that process the columns in a data set.
-
Nested Class Summary
Modifier and Type, Interface: Description
static enum FieldProcessor.GeneratedFeatureType: The types of generated features.
-
Field Summary
NAMESPACE: The namespacing separator.
-
Method Summary
Modifier and Type, Method: Description
FieldProcessor copy(String newFieldName): Returns a copy of this FieldProcessor bound to the supplied newFieldName.
FieldProcessor.GeneratedFeatureType getFeatureType(): Returns the feature type this FieldProcessor generates.
String getFieldName(): Gets the field name this FieldProcessor uses.
default int getNumNamespaces(): Binarised categoricals can be namespaced, where the field name is appended with "#<non-negative-int>" to denote the namespace.
List<ColumnarFeature> process(String value): Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable: postConfig
Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.Provenancable: getProvenance
-
Field Details
-
NAMESPACE
The namespacing separator.
-
Method Details
-
getFieldName
String getFieldName()
Gets the field name this FieldProcessor uses.
- Returns: The field name.
-
process
List<ColumnarFeature> process(String value)
Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
- Parameters: value - The field value to process.
- Returns: A list of ColumnarFeatures.
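The "possibly empty" part of this contract can be illustrated with a small self-contained sketch. The Feature and DoubleSketchProcessor classes below are hypothetical stand-ins, not Tribuo's ColumnarFeature or DoubleFieldProcessor. The sketch assumes a processor that emits one real-valued feature when the value parses as a double and no features otherwise.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical processor sketch, not a Tribuo class: emits one feature named
// after the field, or an empty list when the value is not a valid double.
final class DoubleSketchProcessor {
    private final String fieldName;

    DoubleSketchProcessor(String fieldName) {
        this.fieldName = fieldName;
    }

    String getFieldName() {
        return fieldName;
    }

    List<Feature> process(String value) {
        try {
            double parsed = Double.parseDouble(value.trim());
            return Collections.singletonList(new Feature(fieldName, parsed));
        } catch (NumberFormatException e) {
            // Unparseable input yields no features instead of an exception.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        DoubleSketchProcessor processor = new DoubleSketchProcessor("height");
        System.out.println(processor.process("1.85").size()); // one feature
        System.out.println(processor.process("n/a").size());  // no features
    }
}

// Hypothetical stand-in for a generated feature; not Tribuo's ColumnarFeature.
final class Feature {
    final String name;
    final double value;

    Feature(String name, double value) {
        this.name = name;
        this.value = value;
    }
}
```

Returning an empty list keeps the caller's loop over generated features uniform: a missing or malformed value contributes nothing to the row.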
-
getFeatureType
FieldProcessor.GeneratedFeatureType getFeatureType()
Returns the feature type this FieldProcessor generates.
- Returns: The feature type.
-
getNumNamespaces
default int getNumNamespaces()
Binarised categoricals can be namespaced, where the field name is appended with "#<non-negative-int>" to denote the namespace. This allows one FieldProcessor to emit multiple binarised categoricals from the same field value, provided each emitted feature is in a different namespace. Without this guarantee it is impossible to recover the original categorical distribution before binarisation. If there is only a single namespace, it is omitted from the feature name.
- Returns: The number of namespaces.
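The naming rule described above can be sketched as a small helper. NamespaceSketch and namespacedName are hypothetical names, not part of Tribuo, and the sketch assumes namespace indices run from 0 to numNamespaces - 1 (the description only says the index is a non-negative int).

```java
// Hypothetical helper, not part of Tribuo: builds the namespaced field name
// following the "#<non-negative-int>" rule described for getNumNamespaces.
final class NamespaceSketch {
    // The separator shown in the description above.
    static final String NAMESPACE = "#";

    static String namespacedName(String fieldName, int namespace, int numNamespaces) {
        // Assumption: valid indices are 0 .. numNamespaces - 1.
        if (numNamespaces < 1 || namespace < 0 || namespace >= numNamespaces) {
            throw new IllegalArgumentException(
                "namespace " + namespace + " is out of range for " + numNamespaces + " namespaces");
        }
        // With a single namespace the suffix is omitted from the name.
        return numNamespaces == 1 ? fieldName : fieldName + NAMESPACE + namespace;
    }

    public static void main(String[] args) {
        System.out.println(namespacedName("colours", 0, 1)); // prints "colours"
        System.out.println(namespacedName("colours", 1, 3)); // prints "colours#1"
    }
}
```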
-
copy
FieldProcessor copy(String newFieldName)
Returns a copy of this FieldProcessor bound to the supplied newFieldName.
- Parameters: newFieldName - The new field name for the copy.
- Returns: A copy of this FieldProcessor.
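One way to read this contract is that the copy keeps the processor's settings and changes only the field it is bound to. The sketch below uses a hypothetical CopySketchProcessor with an illustrative lowercase option; neither is a Tribuo class or setting, and the assumption that all other configuration carries over is the sketch's, not a statement about any specific implementation.

```java
// Hypothetical processor sketch; not a Tribuo class.
final class CopySketchProcessor {
    private final String fieldName;
    private final boolean lowercase; // illustrative configuration option

    CopySketchProcessor(String fieldName, boolean lowercase) {
        this.fieldName = fieldName;
        this.lowercase = lowercase;
    }

    String getFieldName() {
        return fieldName;
    }

    boolean isLowercase() {
        return lowercase;
    }

    // Assumption: the copy keeps every setting except the field name,
    // and the original instance is left unchanged.
    CopySketchProcessor copy(String newFieldName) {
        return new CopySketchProcessor(newFieldName, lowercase);
    }

    public static void main(String[] args) {
        CopySketchProcessor original = new CopySketchProcessor("title", true);
        CopySketchProcessor copied = original.copy("subtitle");
        System.out.println(original.getFieldName()); // prints "title"
        System.out.println(copied.getFieldName());   // prints "subtitle"
    }
}
```

Keeping the fields final means a processor configured for one column can be reused as a template for other columns without the instances affecting each other.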
-