Class RegexFieldProcessor

java.lang.Object
org.tribuo.data.columnar.processors.field.RegexFieldProcessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, FieldProcessor

public class RegexFieldProcessor extends Object implements FieldProcessor
A FieldProcessor which applies a regex to a field and generates ColumnarFeatures based on the matches.
  • Constructor Details

    • RegexFieldProcessor

      public RegexFieldProcessor(String fieldName, Pattern regex, EnumSet<RegexFieldProcessor.Mode> modes)
      Constructs a field processor which emits features when the field value matches the supplied regex.
      Parameters:
      fieldName - The field name to read.
      regex - The regex to use for matching.
      modes - The set of matching modes to apply.
    • RegexFieldProcessor

      public RegexFieldProcessor(String fieldName, String regex, EnumSet<RegexFieldProcessor.Mode> modes)
      Constructs a field processor which emits features when the field value matches the supplied regex.

      The regex is compiled on construction.

      Parameters:
      fieldName - The field name to read.
      regex - The regex to use for matching.
      modes - The set of matching modes to apply.
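
      The practical difference between the two constructors is when an invalid regex is detected. The following is a minimal sketch using only java.util.regex; EagerRegexSketch and its methods are hypothetical stand-ins, not part of Tribuo. It assumes the String constructor behaves like a plain Pattern.compile call, so a malformed pattern fails at construction rather than on first use.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in, not the Tribuo class: it only illustrates eager regex compilation.
final class EagerRegexSketch {
    private final String fieldName;
    private final Pattern regex;

    // Mirrors the Pattern-accepting constructor: the caller supplies a compiled pattern.
    EagerRegexSketch(String fieldName, Pattern regex) {
        this.fieldName = fieldName;
        this.regex = regex;
    }

    // Mirrors the String-accepting constructor: compilation happens here,
    // so a malformed regex throws PatternSyntaxException immediately.
    EagerRegexSketch(String fieldName, String regex) {
        this(fieldName, Pattern.compile(regex));
    }

    String fieldName() { return fieldName; }

    Pattern regex() { return regex; }

    // Returns true if constructing with the given regex string fails at construction time.
    static boolean failsAtConstruction(String regex) {
        try {
            new EagerRegexSketch("example", regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtConstruction("[a-z]+")); // prints false
        System.out.println(failsAtConstruction("[a-z"));   // prints true (unclosed character class)
    }
}
```

      Supplying a precompiled Pattern is useful when the same pattern is shared across several processors or when compile flags such as Pattern.CASE_INSENSITIVE are needed.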
  • Method Details

    • postConfig

      public void postConfig()
      Used by the OLCUT configuration system, and should not be called by external code.
      Specified by:
      postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
    • getFieldName

      public String getFieldName()
      Description copied from interface: FieldProcessor
      Gets the field name this FieldProcessor uses.
      Specified by:
      getFieldName in interface FieldProcessor
      Returns:
      The field name.
    • process

      public List<ColumnarFeature> process(String value)
      Description copied from interface: FieldProcessor
      Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
      Specified by:
      process in interface FieldProcessor
      Parameters:
      value - The field value to process.
      Returns:
      A list of ColumnarFeatures.
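
      To make the "possibly empty list" contract concrete, here is a rough sketch of how regex matches could be turned into features. Everything in it is an assumption made for illustration: RegexFeatureSketch, the SketchMode constants, and the feature-name format are hypothetical, and plain strings stand in for ColumnarFeature. Consult RegexFieldProcessor.Mode for the actual modes and their semantics.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Tribuo implementation.
final class RegexFeatureSketch {
    // Illustrative mode names; the real constants live in RegexFieldProcessor.Mode.
    enum SketchMode { WHOLE_MATCH, CONTAINS, GROUPS }

    // Emits one feature name per satisfied mode; returns an empty list when nothing matches.
    static List<String> process(String fieldName, Pattern regex,
                                EnumSet<SketchMode> modes, String value) {
        List<String> features = new ArrayList<>();
        // The whole value must match the pattern.
        if (modes.contains(SketchMode.WHOLE_MATCH) && regex.matcher(value).matches()) {
            features.add(fieldName + "@WHOLE_MATCH");
        }
        // Any substring of the value may match the pattern.
        if (modes.contains(SketchMode.CONTAINS) && regex.matcher(value).find()) {
            features.add(fieldName + "@CONTAINS");
        }
        // One feature per non-null capture group of a whole-value match.
        if (modes.contains(SketchMode.GROUPS)) {
            Matcher m = regex.matcher(value);
            if (m.matches()) {
                for (int i = 1; i <= m.groupCount(); i++) {
                    if (m.group(i) != null) {
                        features.add(fieldName + "@GROUP" + i + "=" + m.group(i));
                    }
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d+)-(\\d+)");
        EnumSet<SketchMode> all = EnumSet.allOf(SketchMode.class);
        // prints [range@WHOLE_MATCH, range@CONTAINS, range@GROUP1=10, range@GROUP2=20]
        System.out.println(process("range", p, all, "10-20"));
        // prints []
        System.out.println(process("range", p, all, "none"));
    }
}
```

      Returning an empty list instead of null for a non-matching value lets callers iterate over the result without a null check.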
    • getFeatureType

      public FieldProcessor.GeneratedFeatureType getFeatureType()
      Description copied from interface: FieldProcessor
      Returns the feature type this FieldProcessor generates.
      Specified by:
      getFeatureType in interface FieldProcessor
      Returns:
      The feature type.
    • copy

      public RegexFieldProcessor copy(String newFieldName)
      Description copied from interface: FieldProcessor
      Returns a copy of this FieldProcessor bound to the supplied newFieldName.
      Specified by:
      copy in interface FieldProcessor
      Parameters:
      newFieldName - The new field name for the copy.
      Returns:
      A copy of this FieldProcessor.
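
      The copy contract can be pictured as rebinding the same regex and modes to a different field. The sketch below is an assumption about one reasonable way to do this, not the Tribuo source; CopySketch and SketchMode are hypothetical names. It relies on the documented fact that java.util.regex.Pattern instances are immutable, so the compiled pattern can be shared between the original and the copy.

```java
import java.util.EnumSet;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Tribuo class.
final class CopySketch {
    // Illustrative mode names only.
    enum SketchMode { WHOLE_MATCH, CONTAINS }

    private final String fieldName;
    private final Pattern regex;
    private final EnumSet<SketchMode> modes;

    CopySketch(String fieldName, Pattern regex, EnumSet<SketchMode> modes) {
        this.fieldName = fieldName;
        this.regex = regex;
        // Defensive copy so later changes to the caller's set do not leak in.
        this.modes = EnumSet.copyOf(modes);
    }

    String getFieldName() { return fieldName; }

    Pattern getRegex() { return regex; }

    // Returns a new instance bound to newFieldName; this instance is left unchanged.
    CopySketch copy(String newFieldName) {
        return new CopySketch(newFieldName, regex, modes);
    }

    public static void main(String[] args) {
        CopySketch original = new CopySketch("title", Pattern.compile("\\d+"),
                EnumSet.of(SketchMode.CONTAINS));
        CopySketch copied = original.copy("body");
        System.out.println(original.getFieldName()); // prints title
        System.out.println(copied.getFieldName());   // prints body
    }
}
```

      This pattern is convenient when one regex configuration needs to be applied to several columns of the same data source.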
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>