Class SplitCharactersTokenizer.SplitCharactersSplitterFunction
java.lang.Object
org.tribuo.util.tokens.impl.SplitCharactersTokenizer.SplitCharactersSplitterFunction
- All Implemented Interfaces:
SplitFunctionTokenizer.SplitFunction
- Enclosing class:
SplitCharactersTokenizer
public static class SplitCharactersTokenizer.SplitCharactersSplitterFunction
extends Object
implements SplitFunctionTokenizer.SplitFunction
Splits tokens at the supplied characters.
- Author:
- Philip Ogren
Constructor Summary
Constructors
- SplitCharactersSplitterFunction(char[] splitCharacters, char[] splitXDigitsCharacters)
  Constructs a splitting function using the supplied split characters.
Method Summary
- apply(int codepoint, int index, CharSequence cs)
  Applies the split function.
- boolean isSplitCharacter(char c)
  Checks if this is a valid split character or whitespace.
- boolean isSplitXDigitCharacter(char c)
  Checks if this is a valid split character outside of a run of digits.
Constructor Details
SplitCharactersSplitterFunction
public SplitCharactersSplitterFunction(char[] splitCharacters, char[] splitXDigitsCharacters)
Constructs a splitting function using the supplied split characters.
- Parameters:
splitCharacters - The characters to split on.
splitXDigitsCharacters - Characters that are valid split points outside of a run of digits.
Method Details
apply
Description copied from interface: SplitFunctionTokenizer.SplitFunction
Applies the split function.
- Specified by:
apply in interface SplitFunctionTokenizer.SplitFunction
- Parameters:
codepoint - The codepoint to check.
index - The character index.
cs - The sequence that's being split.
- Returns:
- How the sequence should be split.
-
isSplitCharacter
public boolean isSplitCharacter(char c)
Checks if this is a valid split character or whitespace.
- Parameters:
c - The character to check.
- Returns:
- True if the character should split the token.
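The check described above can be pictured as a membership test combined with a whitespace test. The following is a minimal standalone sketch, not Tribuo's implementation: the class name SplitCharacterSketch, its field, and the use of Character.isWhitespace for the whitespace test are assumptions made for illustration.

```java
public class SplitCharacterSketch {
    // Hypothetical stand-in for the splitCharacters constructor argument.
    private final char[] splitCharacters;

    public SplitCharacterSketch(char[] splitCharacters) {
        // Defensive copy so later changes to the caller's array have no effect.
        this.splitCharacters = splitCharacters.clone();
    }

    // True if c is whitespace or one of the supplied split characters.
    public boolean isSplitCharacter(char c) {
        if (Character.isWhitespace(c)) {
            return true;
        }
        for (char s : splitCharacters) {
            if (s == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SplitCharacterSketch sketch = new SplitCharacterSketch(new char[]{';', '('});
        System.out.println(sketch.isSplitCharacter(';'));  // listed split character
        System.out.println(sketch.isSplitCharacter('\t')); // whitespace, not listed
        System.out.println(sketch.isSplitCharacter('a'));  // neither
    }
}
```

Under this reading, whitespace splits a token even when it is absent from the supplied array, so callers only need to list punctuation-like characters.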
-
isSplitXDigitCharacter
public boolean isSplitXDigitCharacter(char c)
Checks if this is a valid split character outside of a run of digits.
- Parameters:
c - The character to check.
- Returns:
- True if the character should split the token.
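One way to read "outside of a run of digits" is that a character such as '.' or ',' splits a token unless it sits between two digits, as in 3.14 or 1,000. The sketch below illustrates that reading using the same index and cs context that apply receives. It is an assumption about the intended behavior rather than Tribuo's code, and DigitRunSketch and splitsAt are hypothetical names.

```java
public class DigitRunSketch {
    // Hypothetical stand-in for the splitXDigitsCharacters constructor argument.
    private final char[] splitXDigitsCharacters;

    public DigitRunSketch(char[] splitXDigitsCharacters) {
        this.splitXDigitsCharacters = splitXDigitsCharacters.clone();
    }

    // Context-free membership test, mirroring isSplitXDigitCharacter(char).
    public boolean isSplitXDigitCharacter(char c) {
        for (char s : splitXDigitsCharacters) {
            if (s == c) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: a listed character splits unless both neighbours are digits.
    public boolean splitsAt(int index, CharSequence cs) {
        if (!isSplitXDigitCharacter(cs.charAt(index))) {
            return false;
        }
        boolean digitBefore = index > 0 && Character.isDigit(cs.charAt(index - 1));
        boolean digitAfter = index < cs.length() - 1 && Character.isDigit(cs.charAt(index + 1));
        return !(digitBefore && digitAfter);
    }

    public static void main(String[] args) {
        DigitRunSketch sketch = new DigitRunSketch(new char[]{'.', ','});
        String text = "Pi is 3.14.";
        System.out.println(sketch.splitsAt(7, text));  // '.' between '3' and '1'
        System.out.println(sketch.splitsAt(10, text)); // trailing '.' after '4'
    }
}
```

The membership test alone cannot tell the two '.' characters in the example apart. The decision depends on the neighbouring characters, which is why the sketch takes the index and the full sequence rather than a single char.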