public abstract class SplitFunctionTokenizer extends Object implements Tokenizer

Subclasses are constructed with a SplitFunctionTokenizer.SplitFunction, which is called for each character and returns a SplitFunctionTokenizer.SplitResult consisting of a SplitFunctionTokenizer.SplitType and a Token.TokenType. Tokenization is based on the SplitFunctionTokenizer.SplitResult returned for each character. See the notes for each SplitFunctionTokenizer.SplitType and SplitFunctionTokenizer.SplitResult.

Modifier and Type | Class and Description |
---|---|
static interface | SplitFunctionTokenizer.SplitFunction: An interface for checking if the text should be split at the supplied codepoint. |
static class | SplitFunctionTokenizer.SplitResult: A combination of a SplitFunctionTokenizer.SplitType and a Token.TokenType. |
static class | SplitFunctionTokenizer.SplitType: Defines different ways that a tokenizer can split the input text at a given character. |
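
The per-character decision described above can be illustrated with a minimal, self-contained sketch. The names `SplitSketch`, `SplitKind`, and `CodepointSplitter` are hypothetical stand-ins invented for this example and are not the library's types. The sketch also simplifies: it only distinguishes "no split" from "split at this codepoint" and omits the token type carried by a SplitResult.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the library's classes.
class SplitSketch {

    /** Simplified, assumed set of split decisions for a single codepoint. */
    enum SplitKind { NO_SPLIT, SPLIT_AT }

    /** Hypothetical decision function, asked once per codepoint. */
    interface CodepointSplitter {
        SplitKind apply(int codepoint);
    }

    /** Walks the text codepoint by codepoint and cuts a token wherever the splitter answers SPLIT_AT. */
    static List<String> tokenize(CharSequence text, CodepointSplitter splitter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            if (splitter.apply(cp) == SplitKind.SPLIT_AT) {
                // The splitting codepoint itself is dropped; emit the pending token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            // Step by the codepoint's width so surrogate pairs are not cut in half.
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For instance, a splitter that answers `SPLIT_AT` for whitespace and commas would turn `"a b,c"` into the three tokens `a`, `b`, and `c`.
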
Modifier and Type | Field and Description |
---|---|
protected SplitFunctionTokenizer.SplitFunction | splitFunction |
Modifier | Constructor and Description |
---|---|
protected | SplitFunctionTokenizer(): Constructs a tokenizer, used by OLCUT. |
public | SplitFunctionTokenizer(SplitFunctionTokenizer.SplitFunction splitFunction): Creates a new tokenizer using the supplied split function. |
Modifier and Type | Method and Description |
---|---|
boolean | advance(): Advances the tokenizer to the next token. |
Tokenizer | clone(): Clones a tokenizer with its configuration. |
int | getEnd(): Gets the ending offset (exclusive) of the current token in the character sequence. |
int | getStart(): Gets the starting character offset of the current token in the character sequence. |
String | getText(): Gets the text of the current token, as a string. |
Token.TokenType | getType(): Gets the type of the current token. |
void | reset(CharSequence cs): Resets the tokenizer so that it operates on a new sequence of characters. |
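
The methods in this table suggest a cursor-style usage pattern: call reset once, then call advance repeatedly and read the current token through the getters. The sketch below shows that loop under stated assumptions. `CursorTokenizer` and `WhitespaceCursor` are hypothetical types written for this example that only borrow the method names listed above. They are not the library's Tokenizer interface, and the assumption that advance returns false once the input is exhausted is not confirmed by the summary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the library's classes.
class CursorSketch {

    /** Assumed minimal interface reusing the method names from the summary table. */
    interface CursorTokenizer {
        void reset(CharSequence cs);
        boolean advance();   // assumed: false once no tokens remain
        String getText();
        int getStart();
        int getEnd();        // exclusive end offset
    }

    /** Toy whitespace tokenizer, present only to make the loop runnable. */
    static class WhitespaceCursor implements CursorTokenizer {
        private CharSequence cs = "";
        private int start = 0;
        private int end = 0;

        @Override public void reset(CharSequence cs) {
            this.cs = cs;
            this.start = 0;
            this.end = 0;
        }

        @Override public boolean advance() {
            int i = end;
            // Skip whitespace that separates tokens.
            while (i < cs.length() && Character.isWhitespace(cs.charAt(i))) i++;
            if (i >= cs.length()) return false;
            start = i;
            // Consume characters up to the next whitespace or the end of input.
            while (i < cs.length() && !Character.isWhitespace(cs.charAt(i))) i++;
            end = i;
            return true;
        }

        @Override public String getText() { return cs.subSequence(start, end).toString(); }
        @Override public int getStart() { return start; }
        @Override public int getEnd() { return end; }
    }

    /** Drives the reset/advance loop and records each token with its [start,end) offsets. */
    static List<String> describe(CursorTokenizer tokenizer, CharSequence text) {
        List<String> out = new ArrayList<>();
        tokenizer.reset(text);
        while (tokenizer.advance()) {
            out.add(tokenizer.getText() + "[" + tokenizer.getStart() + "," + tokenizer.getEnd() + ")");
        }
        return out;
    }
}
```

Because the cursor holds mutable state (the current offsets), an instance of this kind should not be shared between threads without cloning or other isolation.
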
Methods inherited from class Object: equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Tokenizer: createSupplier, createThreadLocal, getToken, split, tokenize

protected SplitFunctionTokenizer.SplitFunction splitFunction

protected SplitFunctionTokenizer()

public SplitFunctionTokenizer(SplitFunctionTokenizer.SplitFunction splitFunction)
Parameters: splitFunction - The split function.

public void reset(CharSequence cs)
Specified by: reset in interface Tokenizer

public boolean advance()
Specified by: advance in interface Tokenizer

public String getText()
Specified by: getText in interface Tokenizer

public int getStart()
Specified by: getStart in interface Tokenizer

public int getEnd()
Specified by: getEnd in interface Tokenizer

public Token.TokenType getType()
Specified by: getType in interface Tokenizer

public Tokenizer clone() throws CloneNotSupportedException
Specified by: clone in interface Tokenizer
Overrides: clone in class Object
Throws: CloneNotSupportedException - if the tokenizer isn't cloneable.

Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.