Class WhitespaceTokenizer

All Implemented Interfaces: Configurable, Provenancable<ConfiguredObjectProvenance>, Cloneable, Tokenizer

public class WhitespaceTokenizer extends SplitFunctionTokenizer
A simple tokenizer that splits on whitespace. This tokenizer does not create tokens for the whitespace itself, only for the spans of text delimited by whitespace. For example, the text "a b" produces two tokens, "a" and "b".
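The splitting rule above can be illustrated with a small standalone sketch. It is not the WhitespaceTokenizer source: it assumes Character.isWhitespace as the whitespace test, and the class WhitespaceSpans and its nested Span type are hypothetical names. It reports each token together with its [start, end) character offsets, which makes it visible that whitespace only delimits tokens and is never emitted as one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch (not the library's code): finds the spans of text
// delimited by whitespace, assuming Character.isWhitespace as the whitespace test.
class WhitespaceSpans {

    // One token: its text plus [start, end) character offsets into the input.
    static final class Span {
        final String text;
        final int start;
        final int end;

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    static List<Span> spans(CharSequence input) {
        List<Span> result = new ArrayList<>();
        int start = -1; // start of the current token, or -1 when between tokens
        for (int i = 0; i < input.length(); i++) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            if (!ws && start < 0) {
                start = i; // first character of a new token
            } else if (ws && start >= 0) {
                result.add(new Span(input.subSequence(start, i).toString(), start, i));
                start = -1; // whitespace closes the token but is never emitted itself
            }
        }
        if (start >= 0) { // a token that runs to the end of the input
            result.add(new Span(input.subSequence(start, input.length()).toString(),
                    start, input.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("a b"));        // [a[0,1), b[2,3)]
        System.out.println(spans("  hi\tyou ")); // [hi[2,4), you[5,8)]
    }
}
```

Runs of spaces, tabs, and newlines all act as a single boundary, and leading or trailing whitespace produces no empty tokens.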
  • Field Details

  • Constructor Details

    • WhitespaceTokenizer

      public WhitespaceTokenizer()
      Constructs a tokenizer that splits on whitespace.
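Because WhitespaceTokenizer extends SplitFunctionTokenizer, its constructor mainly has to supply the whitespace splitting rule to a general split-function tokenizer. The sketch below illustrates that design under stated assumptions: PredicateSplitTokenizer and WhitespacePredicateTokenizer are hypothetical classes, and a plain IntPredicate stands in for the library's split function, whose real signature is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical generic tokenizer parameterized by a per-character separator test.
// This is an illustration of the design, not the SplitFunctionTokenizer API.
class PredicateSplitTokenizer {
    private final IntPredicate isSeparator; // the only configuration this tokenizer holds

    PredicateSplitTokenizer(IntPredicate isSeparator) {
        this.isSeparator = isSeparator;
    }

    List<String> split(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isSeparator.test(c)) {
                flush(current, tokens); // a separator ends the current token, if any
            } else {
                current.append(c);
            }
        }
        flush(current, tokens); // emit a token that runs to the end of the text
        return tokens;
    }

    // Adds the buffered token (if non-empty) and clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}

// The whitespace variant only supplies the predicate; all splitting logic is inherited.
class WhitespacePredicateTokenizer extends PredicateSplitTokenizer {
    WhitespacePredicateTokenizer() {
        super(Character::isWhitespace);
    }

    public static void main(String[] args) {
        System.out.println(new WhitespacePredicateTokenizer().split("a b")); // [a, b]
    }
}
```

Keeping the rule in a constructor argument means other tokenizers in the same family can reuse the scanning loop and differ only in the function they pass up.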
  • Method Details

    • getProvenance

      public ConfiguredObjectProvenance getProvenance()
    • clone

      public WhitespaceTokenizer clone()
      Description copied from interface: Tokenizer
      Clones a tokenizer with its configuration. Cloned tokenizers do not process the same text as the original tokenizer and must be reset with a fresh CharSequence.
      Specified by:
      clone in interface Tokenizer
      Overrides:
      clone in class SplitFunctionTokenizer
      Returns:
      A tokenizer with the same configuration, but independent state.
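The clone contract (same configuration, independent state, reset required before use) can be seen in a minimal stateful sketch. StatefulWhitespaceTokenizer is a hypothetical class whose reset/advance/getText methods only mimic a streaming tokenizer; they are assumptions, not a copy of the Tokenizer interface. Since the whitespace rule is its only configuration and is fixed, clone() can simply return a fresh instance with no text attached.

```java
// Hypothetical stateful tokenizer illustrating the clone contract; not library code.
class StatefulWhitespaceTokenizer implements Cloneable {
    private CharSequence text = ""; // text currently being tokenized
    private int pos = 0;            // scan position within text
    private String current = null;  // most recently produced token

    // Points the tokenizer at new text and discards any previous progress.
    void reset(CharSequence text) {
        this.text = text;
        this.pos = 0;
        this.current = null;
    }

    // Moves to the next whitespace-delimited token; returns false when none remain.
    boolean advance() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++; // skip delimiting whitespace
        }
        if (pos >= text.length()) {
            current = null;
            return false;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++; // consume the token body
        }
        current = text.subSequence(start, pos).toString();
        return true;
    }

    String getText() {
        return current;
    }

    // Same configuration (the fixed whitespace rule), independent and unset state:
    // the caller must reset() the clone before using it.
    @Override
    public StatefulWhitespaceTokenizer clone() {
        return new StatefulWhitespaceTokenizer();
    }

    public static void main(String[] args) {
        StatefulWhitespaceTokenizer original = new StatefulWhitespaceTokenizer();
        original.reset("a b c");
        original.advance();                 // original is now at "a"
        StatefulWhitespaceTokenizer copy = original.clone();
        copy.reset("x y");                  // the clone needs its own text
        copy.advance();
        original.advance();                 // unaffected by the clone
        System.out.println(copy.getText() + " " + original.getText()); // x b
    }
}
```

Advancing or resetting the clone never moves the original's position, which is what allows clones to tokenize different texts in parallel.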