Class BreakIteratorTokenizer
java.lang.Object
org.tribuo.util.tokens.impl.BreakIteratorTokenizer
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, Cloneable, Tokenizer
A tokenizer wrapping a BreakIterator instance.
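To illustrate what wrapping a BreakIterator involves, here is a minimal sketch that uses only the JDK's java.text.BreakIterator to split text at locale-sensitive word boundaries. The class name WordBoundaryDemo and the choice to drop whitespace-only segments are assumptions made for illustration; this is not the Tribuo implementation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Tribuo.
final class WordBoundaryDemo {

    /** Returns the non-whitespace segments found between consecutive word boundaries. */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns the following boundary, or DONE when the text is exhausted.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Assumption: whitespace-only segments are not useful as tokens.
            if (!segment.trim().isEmpty()) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US));
    }
}
```

The locale passed to getWordInstance selects the boundary rules, which is presumably why the tokenizer's constructor takes a Locale.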
Constructor Summary
Constructors
Constructor | Description
BreakIteratorTokenizer(Locale locale) | Constructs a BreakIteratorTokenizer using the specified locale.
Method Summary
Modifier and Type | Method | Description
boolean | advance() | Advances the tokenizer to the next token.
 | clone() | Clones a tokenizer with its configuration.
int | getEnd() | Gets the ending offset (exclusive) of the current token in the character sequence.
 | getLanguageTag() | Returns the locale string this tokenizer uses.
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
int | getStart() | Gets the starting character offset of the current token in the character sequence.
 | getText() | Gets the text of the current token, as a string.
 | getType() | Gets the type of the current token.
void | postConfig() | Used by the OLCUT configuration system, and should not be called by external code.
void | reset(CharSequence cs) | Resets the tokenizer so that it operates on a new sequence of characters.
-
Constructor Details
-
BreakIteratorTokenizer
Constructs a BreakIteratorTokenizer using the specified locale.
- Parameters:
locale - The locale to use.
-
-
Method Details
-
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
- Specified by:
postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
-
getLanguageTag
Returns the locale string this tokenizer uses.
- Returns:
The locale string.
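The method name suggests that the locale string is an IETF BCP 47 language tag, although the text above does not state this outright. Under that assumption, the sketch below shows how the JDK converts between a Locale and its tag. LanguageTagDemo and roundTrip are hypothetical names used only for this example.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Tribuo.
final class LanguageTagDemo {

    /** Parses a BCP 47 tag into a Locale and formats it back into a tag. */
    static String roundTrip(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(Locale.US.toLanguageTag()); // prints "en-US"
        System.out.println(roundTrip("fr-CA"));        // prints "fr-CA"
    }
}
```

A tag string of this kind is convenient for configuration and provenance because it is a plain String that can be turned back into a Locale later.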
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-
reset
Description copied from interface: Tokenizer
Resets the tokenizer so that it operates on a new sequence of characters.
-
advance
-
getText
-
getStart
-
getEnd
-
getType
Description copied from interface: Tokenizer
Gets the type of the current token.
-
clone
Description copied from interface: Tokenizer
Clones a tokenizer with its configuration. A cloned tokenizer does not share the text being processed by the original tokenizer and must be reset with a fresh CharSequence before use.
-
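The reset, advance, getText, getStart and getEnd methods listed above form a cursor-style protocol: reset supplies the text, each successful advance moves to the next token, and the getters describe the current token. The following sketch imitates that protocol with a hypothetical SketchTokenizer built directly on java.text.BreakIterator. It is an illustration under stated assumptions (word boundaries, whitespace-only segments skipped), not the Tribuo class itself.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in that mirrors the method names documented above.
final class SketchTokenizer {
    private final BreakIterator breaks;
    private CharSequence cs = "";
    private int start;
    private int end;

    SketchTokenizer(Locale locale) {
        this.breaks = BreakIterator.getWordInstance(locale);
    }

    /** Points the tokenizer at a new character sequence. */
    void reset(CharSequence newCs) {
        this.cs = newCs;
        breaks.setText(newCs.toString());
        this.start = 0;
        this.end = breaks.first();
    }

    /** Moves to the next non-whitespace token; returns false when none remain. */
    boolean advance() {
        while (true) {
            int next = breaks.next();
            if (next == BreakIterator.DONE) {
                return false;
            }
            start = end;
            end = next;
            // Assumption: whitespace-only segments are skipped.
            if (!getText().trim().isEmpty()) {
                return true;
            }
        }
    }

    String getText() { return cs.subSequence(start, end).toString(); }
    int getStart() { return start; }  // inclusive offset
    int getEnd() { return end; }      // exclusive offset

    public static void main(String[] args) {
        SketchTokenizer t = new SketchTokenizer(Locale.US);
        t.reset("two words");
        while (t.advance()) {
            System.out.println(t.getText() + " [" + t.getStart() + ", " + t.getEnd() + ")");
        }
    }
}
```

Because the cursor state lives in the instance, a copy made for use elsewhere would need its own call to reset before advance is meaningful, which matches the note on clone above.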