Package | Description |
---|---|
org.tribuo.util.tokens.impl | Simple fixed rule tokenizers. |

Class | Description |
---|---|
BreakIteratorTokenizer | A tokenizer wrapping a BreakIterator instance. |
NonTokenizer | A convenience class for when you are required to provide a tokenizer but you don't actually want to split up the text into tokens. |
ShapeTokenizer | This tokenizer is loosely based on the notion of word shape which is a common feature used in NLP. |
SplitCharactersTokenizer | This implementation of Tokenizer is instantiated with an array of characters that are considered split characters. |
SplitPatternTokenizer | This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens. |
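
The two splitting strategies named above (a wrapped BreakIterator and a regular expression pattern) can be sketched with JDK classes alone. This is a minimal illustration, not Tribuo's implementation: the class `TokenizerSketch` and its methods `splitOnWords` and `splitOnPattern` are hypothetical names, and the filtering rules (dropping segments without a letter or digit, dropping empty pieces) are assumptions made for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Tribuo.
final class TokenizerSketch {

    // Splits text at the word boundaries reported by a JDK BreakIterator.
    // Assumption: segments with no letter or digit (spaces, punctuation) are dropped.
    static List<String> splitOnWords(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    // Splits text wherever the regular expression matches.
    // Assumption: empty pieces (for example from a leading delimiter) are dropped.
    static List<String> splitOnPattern(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWords("Hello, world!"));          // prints [Hello, world]
        System.out.println(splitOnPattern("a, b ,c", "\\s*,\\s*")); // prints [a, b, c]
    }
}
```

The Tribuo classes expose their results through the Tokenizer interface rather than returning plain lists, so treat the sketch as a description of the splitting rules only.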
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.