Uses of Package
org.tribuo.util.tokens.impl
Package
Description
Simple fixed rule tokenizers.
Provides an implementation of a Wordpiece tokenizer which implements the Tribuo Tokenizer API.
Class
Description
A tokenizer wrapping a BreakIterator instance.
A convenience class for when you are required to provide a tokenizer but you don't actually want to split up the text into tokens.
This tokenizer is loosely based on the notion of word shape, which is a common feature used in NLP.
This implementation of Tokenizer is instantiated with an array of characters that are considered split characters.
This class supports character-by-character (that is, codepoint-by-codepoint) iteration over input text to create tokens.
An interface for checking if the text should be split at the supplied codepoint.
A combination of a SplitFunctionTokenizer.SplitType and a Token.TokenType.
Defines different ways that a tokenizer can split the input text at a given character.
This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens.
A simple tokenizer that splits on whitespace.
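The descriptions above name three splitting strategies: word boundaries from a BreakIterator, a regular expression pattern, and plain whitespace. The following is a minimal sketch of those ideas using only JDK classes. It does not use or reproduce the Tribuo classes; the class name TokenizerSketches and its method names are hypothetical, and the choice to skip spans that contain no letter or digit is an assumption made for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Tribuo.
public class TokenizerSketches {

    // Emits the spans between consecutive word boundaries reported by a
    // java.text.BreakIterator. Spans with no letter or digit (whitespace,
    // bare punctuation) are skipped; that filter is an assumption of this sketch.
    static List<String> breakIteratorTokens(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String span = text.substring(start, end);
            if (span.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(span);
            }
        }
        return tokens;
    }

    // Splits the text wherever the supplied pattern matches and drops the
    // empty strings that Pattern.split can produce at the start of the input.
    static List<String> patternTokens(String text, Pattern splitPattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : splitPattern.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Whitespace splitting expressed as a special case of pattern splitting.
    static List<String> whitespaceTokens(String text) {
        return patternTokens(text, Pattern.compile("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(breakIteratorTokens("hello world", Locale.US));
        System.out.println(patternTokens("a, b,c", Pattern.compile("[,\\s]+")));
        System.out.println(whitespaceTokens("  one two\tthree "));
    }
}
```

Expressing whitespace splitting through the pattern-based method keeps the sketch short; a dedicated implementation could avoid the regular expression entirely.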
Class
Description
This class supports character-by-character (that is, codepoint-by-codepoint) iteration over input text to create tokens.
An interface for checking if the text should be split at the supplied codepoint.
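To make the codepoint-by-codepoint idea concrete, here is a small self-contained sketch of a tokenizer driven by a split callback. It is not the Tribuo API: the names SplitFunctionSketch, Split, SplitRule and WHITESPACE_AND_PUNCTUATION are hypothetical, and the three split types shown are a simplified assumption rather than the values defined by SplitFunctionTokenizer.SplitType.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Tribuo.
public class SplitFunctionSketch {

    // Hypothetical split types: how the tokenizer treats a single codepoint.
    enum Split { NO_SPLIT, SPLIT_AND_DROP, SPLIT_AND_KEEP }

    // Hypothetical callback that decides whether to split at a codepoint.
    @FunctionalInterface
    interface SplitRule {
        Split apply(int codepoint, int index, CharSequence text);
    }

    // Example rule: whitespace ends a token and is discarded; any other
    // codepoint that is not a letter or digit becomes a token of its own.
    static final SplitRule WHITESPACE_AND_PUNCTUATION = (cp, index, text) -> {
        if (Character.isWhitespace(cp)) {
            return Split.SPLIT_AND_DROP;
        }
        if (!Character.isLetterOrDigit(cp)) {
            return Split.SPLIT_AND_KEEP;
        }
        return Split.NO_SPLIT;
    };

    // Walks the text one codepoint at a time (not one char at a time), so
    // supplementary characters encoded as surrogate pairs stay intact.
    static List<String> tokenize(String text, SplitRule rule) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            switch (rule.apply(cp, i, text)) {
                case NO_SPLIT:
                    current.appendCodePoint(cp);
                    break;
                case SPLIT_AND_DROP:
                    flush(current, tokens);
                    break;
                case SPLIT_AND_KEEP:
                    flush(current, tokens);
                    tokens.add(new String(Character.toChars(cp)));
                    break;
            }
            i += Character.charCount(cp);
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the pending token, if any, and resets the buffer.
    static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hi, there!", WHITESPACE_AND_PUNCTUATION));
    }
}
```

Keeping the split decision in a callback means the iteration loop stays the same while the splitting behaviour is swapped by passing a different rule.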