Class BreakIteratorTokenizer
java.lang.Object
org.tribuo.util.tokens.impl.BreakIteratorTokenizer
All Implemented Interfaces:
 com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, Cloneable, Tokenizer
A tokenizer wrapping a BreakIterator instance.
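To make "wrapping a BreakIterator" concrete, here is a minimal JDK-only sketch of word-boundary iteration. It assumes a word-instance BreakIterator and assumes whitespace-only spans are skipped; neither detail is stated on this page, and the class name WordBreakSketch is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the JDK mechanism; this is not Tribuo's code.
final class WordBreakSketch {

    /** Splits text at locale-specific word boundaries, dropping whitespace-only spans (assumed policy). */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Each pair of consecutive boundaries delimits one candidate span.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String span = text.substring(start, end);
            if (!span.trim().isEmpty()) {
                out.add(span);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.US));
    }
}
```

Punctuation comes back as its own span because the JDK word iterator places boundaries on both sides of it.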
Constructor Summary
Constructors:
 BreakIteratorTokenizer(Locale locale) - Constructs a BreakIteratorTokenizer using the specified locale.
Method Summary
Methods:
 boolean advance() - Advances the tokenizer to the next token.
 clone() - Clones a tokenizer with its configuration.
 int getEnd() - Gets the ending offset (exclusive) of the current token in the character sequence.
 getLanguageTag() - Returns the locale string this tokenizer uses.
 com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
 int getStart() - Gets the starting character offset of the current token in the character sequence.
 getText() - Gets the text of the current token, as a string.
 getType() - Gets the type of the current token.
 void postConfig() - Used by the OLCUT configuration system, and should not be called by external code.
 void reset(CharSequence cs) - Resets the tokenizer so that it operates on a new sequence of characters.
Constructor Details
BreakIteratorTokenizer
Constructs a BreakIteratorTokenizer using the specified locale.
Parameters:
 locale - The locale to use.

Method Details
postConfig
public void postConfig()
Used by the OLCUT configuration system, and should not be called by external code.
Specified by:
 postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable

getLanguageTag
Returns the locale string this tokenizer uses.
Returns:
 The locale string.

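The method name suggests that the locale string is a BCP 47 language tag, although this page does not say so explicitly. Under that assumption, the sketch below shows how such a tag round-trips through the JDK Locale API. The class and helper names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper showing the JDK round trip; not part of Tribuo.
final class LanguageTagSketch {

    /** Converts a Locale to its BCP 47 language tag, for example to store it in configuration. */
    static String toTag(Locale locale) {
        return locale.toLanguageTag();
    }

    /** Rebuilds a Locale from a BCP 47 language tag. */
    static Locale fromTag(String tag) {
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        String tag = toTag(Locale.US);
        System.out.println(tag);
        System.out.println(fromTag(tag).equals(Locale.US));
    }
}
```

A string form like this is convenient for configuration systems because it can be stored as text and turned back into a Locale later.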
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by:
 getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>

reset
Description copied from interface: Tokenizer
Resets the tokenizer so that it operates on a new sequence of characters.

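advance
Advances the tokenizer to the next token.

getText
Gets the text of the current token, as a string.

getStart
Gets the starting character offset of the current token in the character sequence.

getEnd
Gets the ending offset (exclusive) of the current token in the character sequence.
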
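The reset, advance, getText, getStart and getEnd methods together form a cursor-style protocol: reset supplies the text, each successful advance moves to the next token, and the getters describe the current token. The sketch below mimics that protocol on top of the JDK BreakIterator. CursorTokenizerSketch is a hypothetical stand-in rather than Tribuo's implementation, and the whitespace-skipping policy and the exception thrown before reset are assumptions.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in that mimics the reset/advance/get* protocol; not Tribuo's code.
final class CursorTokenizerSketch {
    private final BreakIterator breaks;
    private CharSequence text;
    private int cursor;      // offset where the next candidate span begins
    private int start = -1;  // inclusive start offset of the current token
    private int end = -1;    // exclusive end offset of the current token

    CursorTokenizerSketch(Locale locale) {
        this.breaks = BreakIterator.getWordInstance(locale);
    }

    /** Points the cursor at the beginning of a new character sequence. */
    void reset(CharSequence cs) {
        text = cs;
        breaks.setText(cs.toString());
        cursor = breaks.first();
        start = -1;
        end = -1;
    }

    /** Moves to the next non-whitespace span; returns false once the text is exhausted. */
    boolean advance() {
        if (text == null) {
            throw new IllegalStateException("reset must be called before advance");
        }
        for (int next = breaks.next(); next != BreakIterator.DONE; next = breaks.next()) {
            int spanStart = cursor;
            cursor = next;
            if (!text.subSequence(spanStart, next).toString().trim().isEmpty()) {
                start = spanStart;
                end = next;
                return true;
            }
        }
        return false;
    }

    // The getters are only meaningful after advance() has returned true.
    String getText() { return text.subSequence(start, end).toString(); }
    int getStart() { return start; }
    int getEnd() { return end; }

    public static void main(String[] args) {
        CursorTokenizerSketch tok = new CursorTokenizerSketch(Locale.US);
        tok.reset("Hello, world!");
        while (tok.advance()) {
            System.out.println(tok.getText() + " [" + tok.getStart() + ", " + tok.getEnd() + ")");
        }
    }
}
```

Because the end offset is exclusive, getEnd() minus getStart() equals the token length, and the token text is the subsequence from start to end of the input.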
getType
Description copied from interface: Tokenizer
Gets the type of the current token.

clone
Description copied from interface: Tokenizer
Clones a tokenizer with its configuration. Cloned tokenizers do not process the same text as the original tokenizer and must be reset with a fresh CharSequence.
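The clone contract has a close analogue in the JDK itself: a cloned BreakIterator keeps the same rules but iterates independently once it is given its own text. The sketch below demonstrates only that JDK-level behaviour. How the tokenizer's own clone is implemented is not described on this page, and the class name CloneSketch is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical illustration using only the JDK; not Tribuo's code.
final class CloneSketch {

    /** Returns {original position, clone position} after advancing only the clone. */
    static int[] positionsAfterCloneAdvance() {
        BreakIterator original = BreakIterator.getWordInstance(Locale.US);
        original.setText("one two");

        // The clone keeps the same boundary rules but has its own iteration state.
        BreakIterator copy = (BreakIterator) original.clone();
        copy.setText("three four"); // give the clone its own text before using it
        copy.next();                // moves only the clone, to the boundary after "three"

        return new int[] { original.current(), copy.current() };
    }

    public static void main(String[] args) {
        int[] positions = positionsAfterCloneAdvance();
        System.out.println("original=" + positions[0] + ", clone=" + positions[1]);
    }
}
```

Giving each worker thread its own clone, reset with its own text, is one way to use this kind of contract without sharing mutable iteration state.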