Class WhitespaceTokenizer
java.lang.Object
org.tribuo.util.tokens.impl.SplitFunctionTokenizer
org.tribuo.util.tokens.impl.WhitespaceTokenizer
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>, Cloneable, Tokenizer
A simple tokenizer that splits on whitespace. This tokenizer does not create
tokens that correspond to whitespace, only those spans of text delimited by
whitespace. For example, the text "a b" produces two tokens, "a" and "b".
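The behaviour described above can be illustrated with a small standalone sketch. It is not Tribuo's implementation: the class WhitespaceSplitSketch, its Span type, and the use of Character.isWhitespace as the whitespace test are all assumptions made for illustration, as are the half-open [start, end) offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the Tribuo class.
class WhitespaceSplitSketch {

    // A token's text plus its character offsets in the input.
    // Offsets are half-open [start, end), an assumption of this sketch.
    static final class Span {
        final String text;
        final int start;
        final int end;

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the runs of non-whitespace characters; whitespace itself
    // never becomes a token.
    static List<Span> tokenize(CharSequence input) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < input.length(); i++) {
            if (Character.isWhitespace(input.charAt(i))) {
                if (start >= 0) {
                    // Whitespace closes the token that was open.
                    spans.add(new Span(input.subSequence(start, i).toString(), start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        if (start >= 0) {
            // The input ended while a token was still open.
            spans.add(new Span(input.subSequence(start, input.length()).toString(),
                               start, input.length()));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : tokenize("a b")) {
            System.out.println(s.text + " [" + s.start + ", " + s.end + ")");
        }
        // prints:
        // a [0, 1)
        // b [2, 3)
    }
}
```

Leading, trailing, and repeated whitespace produce no tokens in this sketch, so an input consisting only of spaces yields an empty list.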
-
Nested Class Summary
Nested classes/interfaces inherited from class org.tribuo.util.tokens.impl.SplitFunctionTokenizer
SplitFunctionTokenizer.SplitFunction, SplitFunctionTokenizer.SplitResult, SplitFunctionTokenizer.SplitType
-
Field Summary
Fields
whitespaceSplitCharacterFunction
Fields inherited from class org.tribuo.util.tokens.impl.SplitFunctionTokenizer
splitFunction
-
Constructor Summary
Constructors
WhitespaceTokenizer()
Method Summary
Modifier and Type, Method, Description:
clone(): Clones a tokenizer with its configuration.
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Methods inherited from class org.tribuo.util.tokens.impl.SplitFunctionTokenizer
advance, getEnd, getStart, getText, getType, reset
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable
postConfig
-
Field Details
-
whitespaceSplitCharacterFunction
-
-
Constructor Details
-
WhitespaceTokenizer
public WhitespaceTokenizer()
-
-
Method Details
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
-
clone
Description copied from interface: Tokenizer
Clones a tokenizer with its configuration. A cloned tokenizer does not process the same text as the original tokenizer and needs to be reset with a fresh CharSequence.
- Specified by:
clone in interface Tokenizer
- Overrides:
clone in class SplitFunctionTokenizer
- Returns:
A tokenizer with the same configuration, but independent state.
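The clone contract (same configuration, independent state, reset required before use) can be sketched with a stand-in class. MiniTokenizer below is hypothetical and is not the Tribuo class. It borrows the method names reset, advance, and getText from the inherited-method list purely for readability, and their behaviour here is an assumption of this sketch.

```java
// Hypothetical stand-in used to illustrate the clone contract; not Tribuo code.
class CloneSketch {

    static final class MiniTokenizer {
        // Per-instance state: the text being tokenized and the cursor into it.
        private CharSequence text = "";
        private int pos = 0;
        private int start = 0;
        private int end = 0;

        // Points this tokenizer at new text and rewinds it.
        void reset(CharSequence cs) {
            text = cs;
            pos = 0;
            start = 0;
            end = 0;
        }

        // Moves to the next whitespace-delimited token; returns false when exhausted.
        boolean advance() {
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            if (pos >= text.length()) {
                return false;
            }
            start = pos;
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            end = pos;
            return true;
        }

        // Text of the current token.
        String getText() {
            return text.subSequence(start, end).toString();
        }

        // This sketch has no configuration fields, so a fresh instance is an
        // equivalent copy. It deliberately shares no text or cursor with the original.
        @Override
        public MiniTokenizer clone() {
            return new MiniTokenizer();
        }
    }

    public static void main(String[] args) {
        MiniTokenizer original = new MiniTokenizer();
        original.reset("a b");
        original.advance(); // original now sits on "a"

        MiniTokenizer copy = original.clone();
        copy.reset("c d"); // the clone must be given its own text
        while (copy.advance()) {
            System.out.println(copy.getText());
        }
        // Advancing the clone did not move the original.
        System.out.println(original.getText());
    }
}
```

Keeping the text and cursor out of the copy is what would let each thread or caller hold its own clone without coordinating with the others.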
-