Package org.tribuo.util.tokens.impl
Class NonTokenizer
java.lang.Object
org.tribuo.util.tokens.impl.NonTokenizer
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.config.Configurable
,com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
,Cloneable
,Tokenizer
A convenience class for when you are required to provide a tokenizer but you
don't actually want to split up the text into tokens. This tokenizer will
serve up a single "token" corresponding to the input text.
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionboolean
advance()
Advances the tokenizer to the next token.clone()
Clones a tokenizer with it's configuration.int
getEnd()
Gets the ending offset (exclusive) of the current token in the character sequencecom.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance
int
getStart()
Gets the starting character offset of the current token in the character sequencegetText()
Gets the text of the current token, as a stringgetType()
Gets the type of the current token.void
reset
(CharSequence cs) Resets the tokenizer so that it operates on a new sequence of characters.Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.oracle.labs.mlrg.olcut.config.Configurable
postConfig
-
Constructor Details
-
NonTokenizer
public NonTokenizer()Constructs a NonTokenizer.
-
-
Method Details
-
getProvenance
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()- Specified by:
getProvenance
in interfacecom.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
-
reset
Description copied from interface:Tokenizer
Resets the tokenizer so that it operates on a new sequence of characters. -
advance
public boolean advance()Description copied from interface:Tokenizer
Advances the tokenizer to the next token. -
getText
Description copied from interface:Tokenizer
Gets the text of the current token, as a string -
getStart
public int getStart()Description copied from interface:Tokenizer
Gets the starting character offset of the current token in the character sequence -
getEnd
public int getEnd()Description copied from interface:Tokenizer
Gets the ending offset (exclusive) of the current token in the character sequence -
getType
Description copied from interface:Tokenizer
Gets the type of the current token. -
clone
Description copied from interface:Tokenizer
Clones a tokenizer with it's configuration. Cloned tokenizers are not processing the same text as the original tokenizer and need to be reset with a fresh CharSequence.
-