Class | Description |
---|---|
BreakIteratorTokenizer | A tokenizer wrapping a BreakIterator instance. |
NonTokenizer | A convenience class for when you are required to provide a tokenizer but you don't actually want to split up the text into tokens. |
ShapeTokenizer | This tokenizer is loosely based on the notion of word shape, which is a common feature used in NLP. |
SplitCharactersTokenizer | This implementation of Tokenizer is instantiated with an array of characters that are considered split characters. |
SplitPatternTokenizer | This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens. |
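
The splitting strategies summarized in the table can be illustrated with JDK classes alone. The following is a minimal sketch, not the implementation of the classes listed above: the class name `TokenizerSketch` and its three static methods are hypothetical, and details such as dropping empty tokens or skipping punctuation-only segments are assumptions made for the example. It only uses `java.text.BreakIterator` and `java.util.regex.Pattern` to show the underlying idea behind wrapping a BreakIterator, splitting on a set of characters, and splitting on a regular expression.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration class; it is not part of the documented package.
final class TokenizerSketch {

    // Walks the word boundaries reported by a BreakIterator and keeps only
    // segments containing at least one letter or digit (an assumption of this
    // sketch; whitespace and punctuation-only segments are dropped).
    static List<String> breakIteratorTokens(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    // Splits wherever one of the given characters occurs; empty tokens
    // produced by adjacent split characters are discarded.
    static List<String> splitOnCharacters(String text, char[] splitChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (contains(splitChars, c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits on every match of the regular expression; empty pieces are discarded.
    static List<String> splitOnPattern(String text, Pattern pattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : pattern.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    private static boolean contains(char[] chars, char c) {
        for (char candidate : chars) {
            if (candidate == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(breakIteratorTokens("Hello, world!", Locale.US));
        System.out.println(splitOnCharacters("a,b c", new char[] {',', ' '}));
        System.out.println(splitOnPattern("a, b;c", Pattern.compile("[,;\\s]+")));
    }
}
```

The character-based variant avoids regular expression overhead when the delimiters are a small fixed set, while the pattern-based variant is the more flexible choice when delimiters vary in length or form.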
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.