Package org.tribuo.util.tokens.impl
Class SplitCharactersTokenizer.SplitCharactersSplitterFunction
java.lang.Object
org.tribuo.util.tokens.impl.SplitCharactersTokenizer.SplitCharactersSplitterFunction
- All Implemented Interfaces:
SplitFunctionTokenizer.SplitFunction
- Enclosing class:
SplitCharactersTokenizer
public static class SplitCharactersTokenizer.SplitCharactersSplitterFunction
extends Object
implements SplitFunctionTokenizer.SplitFunction
Splits tokens at the supplied characters.
-
Constructor Summary
- SplitCharactersSplitterFunction(char[] splitCharacters, char[] splitXDigitsCharacters): Constructs a splitting function using the supplied split characters.
-
Method Summary
- apply(int codepoint, int index, CharSequence cs): Applies the split function.
- boolean isSplitCharacter(char c): Checks if this is a valid split character or whitespace.
- boolean isSplitXDigitCharacter(char c): Checks if this is a valid split character outside of a run of digits.
-
Constructor Details
-
SplitCharactersSplitterFunction
public SplitCharactersSplitterFunction(char[] splitCharacters, char[] splitXDigitsCharacters)
Constructs a splitting function using the supplied split characters.
- Parameters:
splitCharacters - The characters to split on.
splitXDigitsCharacters - Characters that are valid split points outside of a run of digits.
-
-
Method Details
-
apply
Description copied from interface: SplitFunctionTokenizer.SplitFunction
Applies the split function.
- Specified by:
apply in interface SplitFunctionTokenizer.SplitFunction
- Parameters:
codepoint - The codepoint to check.
index - The character index.
cs - The sequence that's being split.
- Returns:
How the sequence should be split.
-
isSplitCharacter
public boolean isSplitCharacter(char c)
Checks if this is a valid split character or whitespace.
- Parameters:
c - The character to check.
- Returns:
True if the character should split the token.
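The check described above can be illustrated with a small standalone sketch. It is not the Tribuo source: the class name SplitCharacterSketch and the sorted-array lookup are assumptions made for illustration, and only the documented behaviour (a configured split character or whitespace returns true) is taken from this page.

```java
import java.util.Arrays;

// Hypothetical stand-in, NOT the Tribuo class: it only mirrors the documented
// "valid split character or whitespace" check.
final class SplitCharacterSketch {
    private final char[] splitCharacters;

    SplitCharacterSketch(char[] splitCharacters) {
        // Defensive copy, sorted so that binarySearch can be used for lookups.
        this.splitCharacters = splitCharacters.clone();
        Arrays.sort(this.splitCharacters);
    }

    // True if c is one of the configured split characters or is whitespace.
    boolean isSplitCharacter(char c) {
        return Arrays.binarySearch(splitCharacters, c) >= 0 || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        SplitCharacterSketch sketch = new SplitCharacterSketch(new char[] {';', '!'});
        System.out.println(sketch.isSplitCharacter(';')); // true: configured split character
        System.out.println(sketch.isSplitCharacter(' ')); // true: whitespace
        System.out.println(sketch.isSplitCharacter('a')); // false: neither
    }
}
```

Note that whitespace splits even when it is not listed among the split characters, which is why the sketch needs no explicit entry for the space character.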
-
isSplitXDigitCharacter
public boolean isSplitXDigitCharacter(char c)
Checks if this is a valid split character outside of a run of digits.
- Parameters:
c - The character to check.
- Returns:
True if the character should split the token.
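To make the "outside of a run of digits" rule concrete, here is a minimal standalone sketch. It is not the Tribuo implementation. The class DigitRunSplitSketch and its methods splitsAt and tokenize are hypothetical names, and the sketch rests on two assumptions: that a character counts as inside a run of digits when it has a digit immediately before and after it, and that split characters are dropped from the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the Tribuo class.
final class DigitRunSplitSketch {
    private final String splitCharacters;
    private final String splitXDigitsCharacters;

    DigitRunSplitSketch(char[] splitCharacters, char[] splitXDigitsCharacters) {
        this.splitCharacters = new String(splitCharacters);
        this.splitXDigitsCharacters = new String(splitXDigitsCharacters);
    }

    // Decides whether the character at index is a split point.
    boolean splitsAt(int index, CharSequence cs) {
        char c = cs.charAt(index);
        // Ordinary split characters and whitespace always split.
        if (splitCharacters.indexOf(c) >= 0 || Character.isWhitespace(c)) {
            return true;
        }
        if (splitXDigitsCharacters.indexOf(c) >= 0) {
            // ASSUMPTION: "inside a run of digits" means a digit on both sides.
            boolean digitBefore = index > 0 && Character.isDigit(cs.charAt(index - 1));
            boolean digitAfter = index + 1 < cs.length() && Character.isDigit(cs.charAt(index + 1));
            return !(digitBefore && digitAfter);
        }
        return false;
    }

    // ASSUMPTION: split characters are discarded; non-empty pieces are kept.
    List<String> tokenize(CharSequence cs) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < cs.length(); i++) {
            if (splitsAt(i, cs)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(cs.charAt(i));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        DigitRunSplitSketch sketch =
                new DigitRunSplitSketch(new char[] {';'}, new char[] {',', '.'});
        // prints: pi | is | 3.14 | roughly | 1,000 | is | larger
        System.out.println(String.join(" | ",
                sketch.tokenize("pi is 3.14, roughly; 1,000 is larger.")));
    }
}
```

Under these assumptions the '.' in 3.14 and the ',' in 1,000 stay inside their tokens, while the ',' after 3.14 and the final '.' act as split points.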
-