Class TextFieldProcessor

java.lang.Object
org.tribuo.data.columnar.processors.field.TextFieldProcessor
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, FieldProcessor

public class TextFieldProcessor extends Object implements FieldProcessor
A FieldProcessor which takes a text field and runs a TextPipeline on it to generate features.
  • Constructor Details

    • TextFieldProcessor

      public TextFieldProcessor(String fieldName, TextPipeline pipeline)
      Constructs a field processor which uses the supplied text pipeline to process the field value.
      Parameters:
      fieldName - The field name to read.
      pipeline - The text processing pipeline to use.
  • Method Details

    • getFieldName

      public String getFieldName()
      Description copied from interface: FieldProcessor
      Gets the field name this FieldProcessor uses.
      Specified by:
      getFieldName in interface FieldProcessor
      Returns:
      The field name.
    • process

      public List<ColumnarFeature> process(String value)
      Description copied from interface: FieldProcessor
      Processes the field value and generates a (possibly empty) list of ColumnarFeatures.
      Specified by:
      process in interface FieldProcessor
      Parameters:
      value - The field value to process.
      Returns:
      A list of ColumnarFeatures.
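
As a rough illustration of what processing a text field involves, the sketch below runs a toy pipeline over the field value and tags each resulting feature with the field name. It uses only the JDK: `ProcessSketch`, `countTokens`, the whitespace tokenisation and the `@` separator are all assumptions made for this example, not Tribuo's classes or its actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ProcessSketch {
    /** Hypothetical pipeline stand-in: lowercases, splits on whitespace, counts tokens. */
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Mimics process(value): run the pipeline, then tag each feature with the field name. */
    static List<String> process(String fieldName, String value) {
        List<String> features = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countTokens(value).entrySet()) {
            // The "@" separator is an assumption of this sketch.
            features.add(fieldName + "@" + e.getKey() + "=" + e.getValue());
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(process("review", "Good good film"));
        // A blank value yields an empty list, matching the "possibly empty" contract.
        System.out.println(process("review", "   "));
    }
}
```

The empty-list case is worth handling in calling code, since a blank or missing text field produces no features at all.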
    • getFeatureType

      public FieldProcessor.GeneratedFeatureType getFeatureType()
      Description copied from interface: FieldProcessor
      Returns the feature type this FieldProcessor generates.
      Specified by:
      getFeatureType in interface FieldProcessor
      Returns:
      The feature type.
    • copy

      public TextFieldProcessor copy(String newFieldName)
      Note: the copy shares the text pipeline with the original. This may cause multithreading issues if the underlying pipeline is not thread safe. Tribuo's built-in pipelines are thread safe.
      Specified by:
      copy in interface FieldProcessor
      Parameters:
      newFieldName - The new field name for the copy.
      Returns:
      A copy of this TextFieldProcessor with the new field name.
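
The sharing described in the note above can be pictured with a small JDK-only sketch. `SharedPipelineSketch` and its nested `Processor` are hypothetical stand-ins, not Tribuo types; the point is only that a copy made this way holds the same pipeline object as the original, so a stateful custom pipeline would be reached from both.

```java
import java.util.function.Function;

final class SharedPipelineSketch {
    /** Hypothetical stand-in for a field processor; not a Tribuo class. */
    static final class Processor {
        final String fieldName;
        final Function<String, String[]> pipeline;

        Processor(String fieldName, Function<String, String[]> pipeline) {
            this.fieldName = fieldName;
            this.pipeline = pipeline;
        }

        /** Copies the processor under a new field name, reusing the same pipeline object. */
        Processor copy(String newFieldName) {
            return new Processor(newFieldName, pipeline);
        }
    }

    public static void main(String[] args) {
        Function<String, String[]> splitter = text -> text.split("\\s+");
        Processor body = new Processor("body", splitter);
        Processor title = body.copy("title");

        System.out.println(title.fieldName);
        // Reference equality: both processors hold the very same pipeline instance.
        System.out.println(title.pipeline == body.pipeline);
    }
}
```

If a custom pipeline keeps mutable state, one option is to build a separate processor with its own pipeline instance for each field instead of copying.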
    • wrapFeatures

      public static List<ColumnarFeature> wrapFeatures(String fieldName, List<Feature> inputFeatures)
      Converts the Features from a text pipeline into ColumnarFeatures with the right field name.
      Parameters:
      fieldName - The field name to prepend.
      inputFeatures - The features to convert.
      Returns:
      A list of columnar features.
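
A minimal sketch of this kind of conversion, written against the JDK only. `NamedValue` is a hypothetical stand-in for a feature (a name plus a value), and joining the field name to the feature name with `@` is an assumption of this example; consult the ColumnarFeature documentation for the actual naming scheme.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapFeaturesSketch {
    /** Hypothetical stand-in for a named feature; not Tribuo's Feature class. */
    static final class NamedValue {
        final String name;
        final double value;

        NamedValue(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Assumed separator between the field name and the feature name. */
    static final String JOINER = "@";

    /** Returns new features whose names carry the field name as a prefix; values are unchanged. */
    static List<NamedValue> wrap(String fieldName, List<NamedValue> input) {
        List<NamedValue> output = new ArrayList<>(input.size());
        for (NamedValue f : input) {
            output.add(new NamedValue(fieldName + JOINER + f.name, f.value));
        }
        return output;
    }

    public static void main(String[] args) {
        List<NamedValue> tokens = new ArrayList<>();
        tokens.add(new NamedValue("hello", 1.0));
        tokens.add(new NamedValue("world", 2.0));
        for (NamedValue f : wrap("body", tokens)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

Prefixing with the field name keeps features from different text columns distinct, so the same token in a title field and a body field does not collapse into one feature.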
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>