Class WhitespaceTokenizer

java.lang.Object
org.tribuo.util.tokens.impl.SplitFunctionTokenizer
org.tribuo.util.tokens.impl.WhitespaceTokenizer
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, Cloneable, Tokenizer

public class WhitespaceTokenizer extends SplitFunctionTokenizer
A simple tokenizer that splits on whitespace. It does not emit tokens for the whitespace itself, only for the spans of text delimited by whitespace. For example, the text "a b" produces two tokens, "a" and "b".
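The splitting behaviour described above can be illustrated with a minimal, JDK-only sketch. This is not Tribuo's implementation: the class name `WhitespaceSplitSketch` and its `split` method are hypothetical, and the sketch assumes that "whitespace" means any character for which `Character.isWhitespace` returns true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Tribuo.
public class WhitespaceSplitSketch {

    /** Returns the spans of text delimited by whitespace, in order of appearance. */
    public static List<String> split(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // -1 means we are not currently inside a token
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                // Whitespace closes the current token, if any, and is itself discarded.
                if (start >= 0) {
                    tokens.add(text.subSequence(start, i).toString());
                    start = -1;
                }
            } else if (start < 0) {
                // First non-whitespace character after a gap opens a new token.
                start = i;
            }
        }
        // Flush a token that runs to the end of the input.
        if (start >= 0) {
            tokens.add(text.subSequence(start, text.length()).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a b"));              // prints [a, b]
        System.out.println(split("  hello   world ")); // prints [hello, world]
    }
}
```

Runs of consecutive whitespace, and leading or trailing whitespace, produce no empty tokens in this sketch, which matches the statement that only the delimited spans become tokens.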
  • Field Details

  • Constructor Details

    • WhitespaceTokenizer

      public WhitespaceTokenizer()
      Constructs a tokenizer that splits on whitespace.
  • Method Details

    • getProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
    • clone

      public WhitespaceTokenizer clone()
      Description copied from interface: Tokenizer
      Clones a tokenizer with its configuration. A cloned tokenizer does not process the same text as the original tokenizer and must be reset with a fresh CharSequence.
      Specified by:
      clone in interface Tokenizer
      Overrides:
      clone in class SplitFunctionTokenizer
      Returns:
      A tokenizer with the same configuration, but independent state.