Class SplitCharactersTokenizer.SplitCharactersSplitterFunction

java.lang.Object
org.tribuo.util.tokens.impl.SplitCharactersTokenizer.SplitCharactersSplitterFunction
All Implemented Interfaces:
SplitFunctionTokenizer.SplitFunction
Enclosing class:
SplitCharactersTokenizer

public static class SplitCharactersTokenizer.SplitCharactersSplitterFunction extends Object implements SplitFunctionTokenizer.SplitFunction
Splits tokens at the supplied characters.
  • Constructor Details

    • SplitCharactersSplitterFunction

      public SplitCharactersSplitterFunction(char[] splitCharacters, char[] splitXDigitsCharacters)
      Constructs a splitting function using the supplied split characters.
      Parameters:
      splitCharacters - The characters to split on.
      splitXDigitsCharacters - Characters to split on, except when they occur inside a run of digits.
  • Method Details

    • apply

      public SplitFunctionTokenizer.SplitResult apply(int codepoint, int index, CharSequence cs)
      Description copied from interface: SplitFunctionTokenizer.SplitFunction
      Applies the split function.
      Specified by:
      apply in interface SplitFunctionTokenizer.SplitFunction
      Parameters:
      codepoint - The codepoint to check.
      index - The character index.
      cs - The sequence that's being split.
      Returns:
      How the sequence should be split.
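
As a rough illustration of the decision apply makes, the standalone sketch below mirrors the documented contract without depending on Tribuo. The class DigitAwareSplitSketch and its methods are hypothetical. It returns a boolean rather than a SplitFunctionTokenizer.SplitResult, and it assumes that "outside of a run of digits" means "not directly between two digits" and that whitespace is detected with Character.isWhitespace. The actual implementation may differ on each of these points.

```java
// Hypothetical, standalone sketch; not the Tribuo implementation.
final class DigitAwareSplitSketch {

    // Linear membership test over a small array of characters.
    static boolean contains(char[] chars, char c) {
        for (char candidate : chars) {
            if (candidate == c) {
                return true;
            }
        }
        return false;
    }

    // Returns true if the character at 'index' in 'cs' should split the token.
    static boolean shouldSplit(char c, int index, CharSequence cs,
                               char[] splitChars, char[] splitXDigitsChars) {
        // Unconditional split characters, plus whitespace (assumed Character.isWhitespace).
        if (contains(splitChars, c) || Character.isWhitespace(c)) {
            return true;
        }
        // Conditional split characters: assumed not to split when flanked by digits.
        if (contains(splitXDigitsChars, c)) {
            boolean digitBefore = index > 0 && Character.isDigit(cs.charAt(index - 1));
            boolean digitAfter = index + 1 < cs.length() && Character.isDigit(cs.charAt(index + 1));
            return !(digitBefore && digitAfter);
        }
        return false;
    }

    public static void main(String[] args) {
        char[] splitChars = {';'};
        char[] splitXDigitsChars = {',', '.'};
        String text = "1,000, ok";
        // Print the split decision for every character position.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            System.out.println(i + " '" + c + "' -> "
                + shouldSplit(c, i, text, splitChars, splitXDigitsChars));
        }
    }
}
```

Under these assumptions the comma at index 1 of "1,000, ok" sits between two digits and does not split, while the comma at index 5 is followed by a space and does.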
    • isSplitCharacter

      public boolean isSplitCharacter(char c)
      Checks if this is a valid split character or whitespace.
      Parameters:
      c - The character to check.
      Returns:
      True if the character should split the token.
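
The "or whitespace" part of this check means that whitespace splits tokens even when the supplied array is empty. A minimal sketch of that behavior follows. The class SplitCharacterCheckSketch is hypothetical and takes the array as an argument instead of holding it as a field, and the use of Character.isWhitespace is an assumption about which whitespace definition applies.

```java
// Hypothetical, standalone sketch; not the Tribuo implementation.
final class SplitCharacterCheckSketch {

    // True if 'c' is one of the supplied split characters or is whitespace.
    static boolean isSplitCharacter(char c, char[] splitCharacters) {
        for (char candidate : splitCharacters) {
            if (candidate == c) {
                return true;
            }
        }
        // Assumed whitespace definition: java.lang.Character.isWhitespace.
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        char[] none = {};
        char[] punctuation = {';', '!'};
        System.out.println(isSplitCharacter(' ', none));        // whitespace, empty array
        System.out.println(isSplitCharacter(';', punctuation)); // listed character
        System.out.println(isSplitCharacter('a', punctuation)); // neither
    }
}
```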
    • isSplitXDigitCharacter

      public boolean isSplitXDigitCharacter(char c)
      Checks if this is a valid split character outside of a run of digits.
      Parameters:
      c - The character to check.
      Returns:
      True if the character should split the token.