Package org.tribuo.util.tokens.impl
Enum Class SplitFunctionTokenizer.SplitType
java.lang.Object
java.lang.Enum<SplitFunctionTokenizer.SplitType>
org.tribuo.util.tokens.impl.SplitFunctionTokenizer.SplitType
- All Implemented Interfaces:
Serializable, Comparable<SplitFunctionTokenizer.SplitType>, Constable
- Enclosing class:
- SplitFunctionTokenizer
Defines different ways that a tokenizer can split the input text at a given character.
-
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>>
-
Enum Constant Summary
Enum Constant / Description
NO_SPLIT
The current character is added to the in-progress token (i.e. do not split on the current character).
SPLIT_AFTER
The current character will cause the in-progress token to be completed after the current character is appended to the in-progress token.
SPLIT_AT
The current character will cause the in-progress token to be completed.
SPLIT_BEFORE
The current character will cause the in-progress token to be completed, and the current character will be included in the next token.
SPLIT_BEFORE_AND_AFTER
The current character should cause the in-progress token to be completed.
-
Method Summary
Modifier and Type / Method / Description
static SplitFunctionTokenizer.SplitType
valueOf(String name)
Returns the enum constant of this class with the specified name.
static SplitFunctionTokenizer.SplitType[]
values()
Returns an array containing the constants of this enum class, in the order they are declared.
-
Enum Constant Details
-
NO_SPLIT
The current character is added to the in-progress token (i.e. do not split on the current character).
-
SPLIT_AT
The current character will cause the in-progress token to be completed. The current character will not be included in any returned token, and the token type of the corresponding SplitResult is ignored (see SplitFunctionTokenizer.SplitResult.SPLIT_AT). This SplitType may be useful for whitespace.
-
SPLIT_BEFORE
The current character will cause the in-progress token to be completed, and the current character will be included in the next token. The token type of the corresponding SplitResult is ignored (see SplitFunctionTokenizer.SplitResult.SPLIT_BEFORE). This SplitType may be useful for, e.g., capitalized letters when splitting CamelCase, or digits when separating out a currency symbol.
-
SPLIT_AFTER
The current character will cause the in-progress token to be completed after the current character is appended to the in-progress token. The created token (which includes the current character) will be assigned the token type included with the SplitFunctionTokenizer.SplitResult.
-
SPLIT_BEFORE_AND_AFTER
The current character should cause the in-progress token to be completed. The token type assigned to the in-progress token will be whatever was previously assigned to the previous character. This token will be followed by a second single-character token consisting of the current character. The token type assigned to this second token will be provided with the SplitFunctionTokenizer.SplitResult.
-
-
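The five behaviours described above can be illustrated with a small standalone sketch. It does not use the Tribuo classes: `SketchSplitType`, `SplitTypeSketch`, `tokenize`, `flush`, and `classify` are hypothetical names introduced here for illustration, and token types (which the real SplitResult carries) are left out entirely. The character rules in `classify` are likewise assumptions chosen so that each constant is exercised once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in that mirrors the constant names documented above.
// It is NOT org.tribuo.util.tokens.impl.SplitFunctionTokenizer.SplitType.
enum SketchSplitType { NO_SPLIT, SPLIT_AT, SPLIT_BEFORE, SPLIT_AFTER, SPLIT_BEFORE_AND_AFTER }

class SplitTypeSketch {

    // Walks the text one char at a time and applies the documented semantics.
    static List<String> tokenize(String text, IntFunction<SketchSplitType> splitFunction) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (splitFunction.apply(c)) {
                case NO_SPLIT:               // keep building the in-progress token
                    current.append(c);
                    break;
                case SPLIT_AT:               // complete the token, drop the character
                    flush(current, tokens);
                    break;
                case SPLIT_BEFORE:           // complete the token, start the next with c
                    flush(current, tokens);
                    current.append(c);
                    break;
                case SPLIT_AFTER:            // append c, then complete the token
                    current.append(c);
                    flush(current, tokens);
                    break;
                case SPLIT_BEFORE_AND_AFTER: // complete the token, emit c on its own
                    flush(current, tokens);
                    tokens.add(String.valueOf(c));
                    break;
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the in-progress token if it is non-empty and resets the buffer.
    static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    // Assumed rules for illustration only: whitespace, capitals, ':' and ','.
    static SketchSplitType classify(int c) {
        if (Character.isWhitespace(c)) return SketchSplitType.SPLIT_AT;
        if (Character.isUpperCase(c)) return SketchSplitType.SPLIT_BEFORE;
        if (c == ':') return SketchSplitType.SPLIT_AFTER;
        if (c == ',') return SketchSplitType.SPLIT_BEFORE_AND_AFTER;
        return SketchSplitType.NO_SPLIT;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("camelCase key:value, done", SplitTypeSketch::classify));
    }
}
```

With these assumed rules, the input `camelCase key:value, done` yields the six tokens `camel`, `Case`, `key:`, `value`, `,`, and `done`: the capital starts a new token, the colon stays attached to the token it ends, the comma becomes a single-character token, and whitespace is discarded.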
Method Details
-
values
Returns an array containing the constants of this enum class, in the order they are declared.
- Returns:
- an array containing the constants of this enum class, in the order they are declared
-
valueOf
Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
- Parameters:
name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
IllegalArgumentException - if this enum class has no constant with the specified name
NullPointerException - if the argument is null
-
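Because `valueOf` requires an exact match and throws on unknown or null names, callers reading a split type from configuration may want a defensive wrapper. The sketch below uses a hypothetical local enum that copies the constant names from this page (assuming the order in the details section mirrors declaration order) so that it runs without Tribuo on the classpath; `EnumLookupSketch` and `parseOrDefault` are illustrative names, not part of the library.

```java
import java.util.Arrays;

class EnumLookupSketch {

    // Hypothetical local copy of the constant names, for illustration only.
    enum SplitType { NO_SPLIT, SPLIT_AT, SPLIT_BEFORE, SPLIT_AFTER, SPLIT_BEFORE_AND_AFTER }

    // Trims surrounding whitespace and falls back instead of throwing.
    // Matching stays case-sensitive, as with plain valueOf.
    static SplitType parseOrDefault(String name, SplitType fallback) {
        if (name == null) {
            return fallback; // valueOf(null) would throw NullPointerException
        }
        try {
            return SplitType.valueOf(name.trim());
        } catch (IllegalArgumentException e) {
            return fallback; // no constant with that exact name
        }
    }

    public static void main(String[] args) {
        // values() lists the constants in declaration order.
        System.out.println(Arrays.toString(SplitType.values()));
        System.out.println(parseOrDefault(" SPLIT_AT ", SplitType.NO_SPLIT));
        System.out.println(parseOrDefault("split_at", SplitType.NO_SPLIT));
    }
}
```

Here `" SPLIT_AT "` resolves to `SPLIT_AT` only because the wrapper trims it first, while the lowercase `"split_at"` falls back to `NO_SPLIT`, since enum names are matched case-sensitively.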