public class ShapeTokenizer extends Object implements Tokenizer
Strings are split on whitespace and into contiguous runs of characters from the same character class, with one exception: uppercase letters that are immediately followed by lowercase letters are kept together. This recognizes camel case, so "CamelCase" is split into "Camel" and "Case". Likewise, "ABCdef AAbb" is split into "ABCdef" and "AAbb".
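The splitting rule described above can be sketched as a small standalone function. This is an illustrative re-implementation, not the library's code: the class name `ShapeSplitSketch`, the `split` helper, and the choice of character classes (uppercase, lowercase, digit, whitespace, with every other character treated as its own class) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; ShapeSplitSketch is a hypothetical name,
// not part of the documented API.
class ShapeSplitSketch {
    // Assumed character classes. Negative codes cannot collide with char values.
    static final int WHITESPACE = -1;
    static final int UPPER = -2;
    static final int LOWER = -3;
    static final int DIGIT = -4;

    static int charClass(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isDigit(c)) return DIGIT;
        return c; // assumption: any other character is a class of its own
    }

    static List<String> split(CharSequence cs) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < cs.length()) {
            // Whitespace separates tokens and is never part of one.
            if (charClass(cs.charAt(i)) == WHITESPACE) {
                i++;
                continue;
            }
            int start = i;
            int prev = charClass(cs.charAt(i));
            i++;
            while (i < cs.length()) {
                int cur = charClass(cs.charAt(i));
                boolean sameClass = cur == prev;
                // The one exception: uppercase followed by lowercase stays together.
                boolean upperThenLower = prev == UPPER && cur == LOWER;
                if (!sameClass && !upperThenLower) break;
                prev = cur;
                i++;
            }
            tokens.add(cs.subSequence(start, i).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("CamelCase"));   // prints [Camel, Case]
        System.out.println(split("ABCdef AAbb")); // prints [ABCdef, AAbb]
    }
}
```

Under these assumptions a lowercase run followed by an uppercase letter still splits, which is why "CamelCase" yields two tokens while "ABCdef" stays whole.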
Constructor and Description |
---|
ShapeTokenizer() |
Modifier and Type | Method and Description |
---|---|
boolean | advance(): Advances the tokenizer to the next token. |
ShapeTokenizer | clone(): Clones a tokenizer with its configuration. |
int | getEnd(): Gets the ending offset (exclusive) of the current token in the character sequence. |
com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance | getProvenance() |
int | getStart(): Gets the starting character offset of the current token in the character sequence. |
String | getText(): Gets the text of the current token, as a string. |
Token.TokenType | getType(): Gets the type of the current token. |
void | reset(CharSequence cs): Resets the tokenizer so that it operates on a new sequence of characters. |
Methods inherited from class Object: equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Tokenizer: createSupplier, createThreadLocal, getToken, split, tokenize
public com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance getProvenance()
Specified by: getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance>
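public void reset(CharSequence cs)
Specified by: reset in interface Tokenizer
public boolean advance()
Specified by: advance in interface Tokenizer
public String getText()
Specified by: getText in interface Tokenizer
public int getStart()
Specified by: getStart in interface Tokenizer
public int getEnd()
Specified by: getEnd in interface Tokenizer
public Token.TokenType getType()
Specified by: getType in interface Tokenizer
public ShapeTokenizer clone()
Specified by: clone in interface Tokenizer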
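The cursor-style methods above (reset, advance, getText, getStart, getEnd) are typically driven in a loop. The sketch below uses a hypothetical stand-in class, `CursorSketch`, that mimics those method names but splits on whitespace only, so the offset bookkeeping is easy to follow. It assumes that advance() returns true while a token is available and false once the input is exhausted; this page does not state that contract.

```java
// Hypothetical stand-in that mimics the method names documented above.
// It is not the library class and only splits on whitespace.
class CursorSketch {
    private CharSequence cs = "";
    private int start = 0; // inclusive offset of the current token
    private int end = 0;   // exclusive offset of the current token

    // Point the cursor at a new character sequence and rewind it.
    void reset(CharSequence newCs) {
        cs = newCs;
        start = 0;
        end = 0;
    }

    // Assumed contract: true if a token was found, false at end of input.
    boolean advance() {
        int i = end;
        while (i < cs.length() && Character.isWhitespace(cs.charAt(i))) i++;
        if (i >= cs.length()) return false;
        start = i;
        while (i < cs.length() && !Character.isWhitespace(cs.charAt(i))) i++;
        end = i;
        return true;
    }

    int getStart() { return start; }

    int getEnd() { return end; }

    String getText() { return cs.subSequence(start, end).toString(); }

    public static void main(String[] args) {
        CursorSketch t = new CursorSketch();
        t.reset("ab  cde");
        while (t.advance()) {
            // Prints "ab [0, 2)" and then "cde [4, 7)".
            System.out.println(t.getText() + " [" + t.getStart() + ", " + t.getEnd() + ")");
        }
    }
}
```

Because the end offset is exclusive, `cs.subSequence(getStart(), getEnd())` recovers the token text directly, and `getEnd() - getStart()` is the token length.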
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.