Package | Description |
---|---|
org.tribuo.classification.explanations.lime | Provides an implementation of LIME (Local Interpretable Model-agnostic Explanations). |
org.tribuo.data.text.impl | Provides implementations of text data processors. |
org.tribuo.util.tokens | Core definitions for tokenization. |
org.tribuo.util.tokens.impl | Simple fixed-rule tokenizers. |
org.tribuo.util.tokens.impl.wordpiece | Provides an implementation of a Wordpiece tokenizer which conforms to the Tribuo Tokenizer API. |
org.tribuo.util.tokens.options | OLCUT Options implementations which can construct Tokenizers of various types. |
org.tribuo.util.tokens.universal | An implementation of a "universal" tokenizer which splits on word boundaries, or on character boundaries for languages where word boundaries are contextual. |
Constructor and Description |
---|
LIMEColumnar(SplittableRandom rng, Model&lt;Label&gt; innerModel, SparseTrainer&lt;Regressor&gt; explanationTrainer, int numSamples, RowProcessor&lt;Label&gt; exampleGenerator, Tokenizer tokenizer) Constructs a LIME explainer for a model which uses the columnar data processing system. |
LIMEText(SplittableRandom rng, Model&lt;Label&gt; innerModel, SparseTrainer&lt;Regressor&gt; explanationTrainer, int numSamples, TextFeatureExtractor&lt;Label&gt; extractor, Tokenizer tokenizer) Constructs a LIME explainer for a model which uses text data. |
Constructor and Description |
---|
BasicPipeline(Tokenizer tokenizer, int ngram) |
NgramProcessor(Tokenizer tokenizer, int n, double value) Creates a processor that will generate token ngrams of size n. |
TokenPipeline(Tokenizer tokenizer, int ngram, boolean termCounting) Creates a new token pipeline. |
TokenPipeline(Tokenizer tokenizer, int ngram, boolean termCounting, int dimension) Creates a new token pipeline. |
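To make the n-gram constructors above concrete, here is a minimal plain-Java sketch of what "generate token ngrams of size n" means: sliding a window of n consecutive tokens over the token stream and joining each window into one feature name. The class name, method name, and the "/" joiner are illustrative assumptions, not Tribuo's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of token n-gram generation; not Tribuo's NgramProcessor.
public class NgramSketch {
    // Joins each run of n consecutive tokens into a single feature name.
    public static List<String> ngrams(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i <= tokens.size() - n; i++) {
            out.add(String.join("/", tokens.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        System.out.println(ngrams(tokens, 2)); // prints [the/quick, quick/brown, brown/fox]
    }
}
```

In a pipeline such as TokenPipeline, a tokenizer first produces the token stream and an n-gram step like this then turns it into features, optionally counting term occurrences.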
Modifier and Type | Method and Description |
---|---|
Tokenizer | Tokenizer.clone() Clones a tokenizer with its configuration. |
Modifier and Type | Method and Description |
---|---|
static Supplier&lt;Tokenizer&gt; | Tokenizer.createSupplier(Tokenizer tokenizer) |
static ThreadLocal&lt;Tokenizer&gt; | Tokenizer.createThreadLocal(Tokenizer tokenizer) |
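The createSupplier and createThreadLocal helpers exist because a tokenizer holds mutable state while it walks through text, so one instance should not be shared across threads; instead each thread gets its own clone. The stdlib sketch below shows the pattern with a stand-in interface (SimpleTokenizer is an illustrative assumption, much simpler than Tribuo's Tokenizer).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Stand-in for a cloneable, stateful tokenizer; illustrative only.
interface SimpleTokenizer {
    List<String> split(String text);
    SimpleTokenizer clone();
}

public class ThreadLocalTokenizerDemo {
    // A trivial whitespace tokenizer used as the prototype to clone from.
    static SimpleTokenizer whitespace() {
        return new SimpleTokenizer() {
            public List<String> split(String text) { return Arrays.asList(text.trim().split("\\s+")); }
            public SimpleTokenizer clone() { return whitespace(); }
        };
    }

    public static void main(String[] args) {
        SimpleTokenizer prototype = whitespace();
        // Mirrors the pattern the static helpers above suggest: wrap a cloning
        // supplier in a ThreadLocal so each thread tokenizes with its own copy.
        Supplier<SimpleTokenizer> supplier = prototype::clone;
        ThreadLocal<SimpleTokenizer> perThread = ThreadLocal.withInitial(supplier);
        System.out.println(perThread.get().split("hello tokenizer world")); // prints [hello, tokenizer, world]
    }
}
```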
Modifier and Type | Class and Description |
---|---|
class | BreakIteratorTokenizer A tokenizer wrapping a BreakIterator instance. |
class | NonTokenizer A convenience class for when you are required to provide a tokenizer but you don't actually want to split the text into tokens. |
class | ShapeTokenizer This tokenizer is loosely based on the notion of word shape, a common feature used in NLP. |
class | SplitCharactersTokenizer This implementation of Tokenizer is instantiated with an array of characters that are considered split characters. |
class | SplitFunctionTokenizer This class supports character-by-character (that is, codepoint-by-codepoint) iteration over input text to create tokens. |
class | SplitPatternTokenizer This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens. |
class | WhitespaceTokenizer A simple tokenizer that splits on whitespace. |
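The classes above are all variations on one idea: a rule that decides where token boundaries fall. A short stdlib sketch of the split-pattern style shows the behaviour, with a regex defining the boundaries (the class and method names here are illustrative, not Tribuo's SplitPatternTokenizer API).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative regex-boundary tokenizer in plain Java; not Tribuo's class.
public class SplitPatternSketch {
    private final Pattern splitPattern;

    public SplitPatternSketch(String regex) {
        this.splitPattern = Pattern.compile(regex);
    }

    // Splits on every match of the pattern, discarding empty fragments.
    public List<String> tokenize(String text) {
        return Arrays.stream(splitPattern.split(text))
                     .filter(s -> !s.isEmpty())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Split on any run of non-alphanumeric characters.
        SplitPatternSketch tok = new SplitPatternSketch("[^A-Za-z0-9]+");
        System.out.println(tok.tokenize("state-of-the-art, 2021!")); // prints [state, of, the, art, 2021]
    }
}
```

WhitespaceTokenizer and SplitCharactersTokenizer are the same shape with simpler boundary rules, while SplitFunctionTokenizer generalizes the rule to an arbitrary per-codepoint function.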
Modifier and Type | Method and Description |
---|---|
Tokenizer | SplitFunctionTokenizer.clone() |
Modifier and Type | Class and Description |
---|---|
class | WordpieceBasicTokenizer A tokenizer that is used "upstream" of WordpieceTokenizer and implements much of the functionality of the 'BasicTokenizer' implementation in Hugging Face. |
class | WordpieceTokenizer This Tokenizer is meant to be a reasonable approximation of Hugging Face's BertTokenizer. |
Constructor and Description |
---|
WordpieceTokenizer(Wordpiece wordpiece, Tokenizer tokenizer, boolean toLowerCase, boolean stripAccents, Set&lt;String&gt; neverSplit) Constructs a wordpiece tokenizer. |
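At the core of Wordpiece tokenization is a greedy longest-match-first loop: repeatedly take the longest vocabulary entry that prefixes the remaining word, marking non-initial pieces with "##" and falling back to an unknown token when nothing matches. The sketch below shows just that loop; the vocabulary, the "##" marker convention, and "[UNK]" follow the usual BERT scheme, while Tribuo's WordpieceTokenizer additionally layers lowercasing, accent stripping, and the neverSplit set (seen in the constructor above) on top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative greedy longest-match Wordpiece loop; not Tribuo's implementation.
public class WordpieceSketch {
    public static List<String> wordpiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String piece = null;
            // Find the longest substring starting at 'start' that is in the vocabulary.
            while (end > start) {
                String candidate = word.substring(start, end);
                if (start > 0) {
                    candidate = "##" + candidate; // continuation-piece marker
                }
                if (vocab.contains(candidate)) {
                    piece = candidate;
                    break;
                }
                end--;
            }
            if (piece == null) {
                return List.of("[UNK]"); // no vocabulary match: whole word is unknown
            }
            pieces.add(piece);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("token", "##izer", "##ize", "un");
        System.out.println(wordpiece("tokenizer", vocab)); // prints [token, ##izer]
    }
}
```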
Modifier and Type | Method and Description |
---|---|
Tokenizer | TokenizerOptions.getTokenizer() Creates the appropriately configured tokenizer. |
Tokenizer | SplitPatternTokenizerOptions.getTokenizer() |
Tokenizer | SplitCharactersTokenizerOptions.getTokenizer() |
Tokenizer | CoreTokenizerOptions.getTokenizer() |
Tokenizer | BreakIteratorTokenizerOptions.getTokenizer() |
Modifier and Type | Class and Description |
---|---|
class | UniversalTokenizer This class was originally written for the purpose of document indexing in an information retrieval context (principally used in Sun Labs' Minion search engine). |
Modifier and Type | Method and Description |
---|---|
Tokenizer | UniversalTokenizer.clone() |
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.