Interface FieldProcessor

All Superinterfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
All Known Implementing Classes:
DateFieldProcessor, DoubleFieldProcessor, IdentityProcessor, RegexFieldProcessor, TextFieldProcessor

public interface FieldProcessor extends com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
An interface for objects that process the values of a named field (column) in a data set.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Interface
    Description
    static enum 
    The types of generated features.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    The namespacing separator.
  • Method Summary

    Modifier and Type
    Method
    Description
    FieldProcessor
    copy(String newFieldName)
    Returns a copy of this FieldProcessor bound to the supplied newFieldName.
    getFeatureType()
    Returns the feature type this FieldProcessor generates.
    String
    getFieldName()
    Gets the field name this FieldProcessor uses.
    default int
    getNumNamespaces()
    Binarised categoricals can be namespaced: "#<non-negative-int>" is appended to the field name to denote the namespace.
    List<ColumnarFeature>
    process(String value)
    Processes the field value and generates a (possibly empty) list of ColumnarFeatures.

    Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable

    postConfig

    Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.Provenancable

    getProvenance
  • Field Details

  • Method Details

    • getFieldName

      String getFieldName()
      Gets the field name this FieldProcessor uses.
      Returns:
      The field name.
    • process

      List<ColumnarFeature> process(String value)
      Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
      Parameters:
      value - The field value to process.
      Returns:
      A list of ColumnarFeatures.
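
      As a rough illustration of the "possibly empty" contract, the sketch below parses a field value as a double and returns no features when the value is blank or unparseable. It is not the library's implementation: `SketchFeature` and `SketchDoubleProcessor` are hypothetical stand-ins for ColumnarFeature and a FieldProcessor implementation, and the real DoubleFieldProcessor may handle bad input differently.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for ColumnarFeature, reduced to a name and a value. */
final class SketchFeature {
    final String name;
    final double value;

    SketchFeature(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical processor that parses a field value as a double. */
final class SketchDoubleProcessor {
    private final String fieldName;

    SketchDoubleProcessor(String fieldName) {
        this.fieldName = fieldName;
    }

    String getFieldName() {
        return fieldName;
    }

    /** Returns one feature for a parseable value, or an empty list otherwise. */
    List<SketchFeature> process(String value) {
        if (value == null || value.trim().isEmpty()) {
            // Missing values produce no features (an assumption of this sketch).
            return Collections.emptyList();
        }
        try {
            double parsed = Double.parseDouble(value.trim());
            return Collections.singletonList(new SketchFeature(fieldName, parsed));
        } catch (NumberFormatException e) {
            // Unparseable input also produces no features in this sketch.
            return Collections.emptyList();
        }
    }
}
```

      Callers should therefore be prepared for an empty list rather than assuming exactly one feature per field value.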
    • getFeatureType

      Returns the feature type this FieldProcessor generates.
      Returns:
      The feature type.
    • getNumNamespaces

      default int getNumNamespaces()
      Binarised categoricals can be namespaced: "#<non-negative-int>" is appended to the field name to denote the namespace. This allows one FieldProcessor to emit multiple binarised categoricals from the same field value, provided each emitted feature is in a different namespace. Without this guarantee, the categorical distribution that existed before binarisation cannot be recovered. If there is only a single namespace, the suffix is omitted from the feature name.
      Returns:
      The number of namespaces.
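
      The naming rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the library's code: the `NamespaceNaming` class and its methods are hypothetical, the "#" separator is taken from the "#<non-negative-int>" convention in the description, and namespaces are assumed to be numbered from zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the namespacing rule; not part of the documented API. */
final class NamespaceNaming {
    /** Separator taken from the "#<non-negative-int>" convention in the description. */
    static final String SEPARATOR = "#";

    private NamespaceNaming() {}

    /**
     * Builds the namespaced field name for one namespace.
     * With a single namespace the suffix is omitted.
     */
    static String featureName(String fieldName, int namespace, int numNamespaces) {
        if (namespace < 0 || namespace >= numNamespaces) {
            throw new IllegalArgumentException("namespace out of range: " + namespace);
        }
        return numNamespaces == 1 ? fieldName : fieldName + SEPARATOR + namespace;
    }

    /** Lists every namespaced name a processor with numNamespaces namespaces could use. */
    static List<String> allNames(String fieldName, int numNamespaces) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numNamespaces; i++) {
            names.add(featureName(fieldName, i, numNamespaces));
        }
        return names;
    }
}
```

      Under these assumptions, `featureName("colour", 1, 3)` yields `colour#1`, while `featureName("colour", 0, 1)` yields plain `colour`.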
    • copy

      FieldProcessor copy(String newFieldName)
      Returns a copy of this FieldProcessor bound to the supplied newFieldName.
      Parameters:
      newFieldName - The new field name for the copy.
      Returns:
      A copy of this FieldProcessor.
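
      One way to satisfy the copy contract is to keep the processor immutable and construct a new instance that reuses the existing configuration with a different field name. The sketch below shows that pattern; `SketchRegexProcessor` and its `getPattern` accessor are hypothetical and only stand in for a FieldProcessor implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical immutable processor; stands in for a FieldProcessor implementation. */
final class SketchRegexProcessor {
    private final String fieldName;
    private final Pattern pattern;

    SketchRegexProcessor(String fieldName, Pattern pattern) {
        this.fieldName = fieldName;
        this.pattern = pattern;
    }

    String getFieldName() {
        return fieldName;
    }

    Pattern getPattern() {
        return pattern;
    }

    /** Returns a new processor with the same pattern, bound to newFieldName. */
    SketchRegexProcessor copy(String newFieldName) {
        // Pattern is immutable, so sharing it between copies is safe.
        return new SketchRegexProcessor(newFieldName, pattern);
    }
}
```

      Because the original is left unchanged, the same configured processor can be reused as a template for several fields.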