Uses a TextFeatureExtractor to explain the prediction for a given piece of text.
LIME uses a naive sampling procedure that blanks out words, then trains the linear model on a set of binary features, each of which indicates the presence of a word+position combination. Words are not permuted and new words are not added, so the explanation only captures cases where the absence of a word would change the prediction.
Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?: Explaining the predictions of any classifier" Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016.
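The encoding described above can be sketched for a single perturbed sample as follows. This is a minimal illustration, not the library's implementation: the class BlankOutSketch and its methods are hypothetical names, and joining the surviving tokens with a single space is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library: encodes one perturbed sample.
final class BlankOutSketch {

    // Rebuilds the perturbed text, keeping only tokens whose mask entry is true.
    // Joining with a single space is an assumption made for this sketch.
    static String perturbedText(List<String> tokens, boolean[] keep) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (keep[i]) {
                kept.add(tokens.get(i));
            }
        }
        return String.join(" ", kept);
    }

    // One binary feature per token position: 1.0 if the word is still present, 0.0 if blanked.
    static double[] presenceFeatures(boolean[] keep) {
        double[] features = new double[keep.length];
        for (int i = 0; i < keep.length; i++) {
            features[i] = keep[i] ? 1.0 : 0.0;
        }
        return features;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "film", "was", "not", "good");
        boolean[] keep = {true, true, true, false, true}; // blank out "not"
        System.out.println(perturbedText(tokens, keep));                 // the film was good
        System.out.println(Arrays.toString(presenceFeatures(keep)));     // [1.0, 1.0, 1.0, 0.0, 1.0]
    }
}
```

The perturbed text is what the inner model would score, while the binary vector is what the linear explanation model would be trained on. Because features only switch from present to absent, a weight in the linear model reflects the effect of removing that word at that position.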
Method Summary
Modifier and Type, Method, Description:
protected String nameFeature: Generate the feature name by combining the word and index.
sampleData: Samples a new dataset from the input text.
Methods inherited from class org.tribuo.classification.explanations.lime.LIMEBase
explain, explainWithSamples, kernelDist, measureDistance, samplePoint, trainExplainer, transformOutput
(SplittableRandom rng, Model<Label> innerModel, SparseTrainer<Regressor> explanationTrainer, int numSamples, TextFeatureExtractor<Label> extractor, Tokenizer tokenizer)
Constructs a LIME explainer for a model which uses text data.
rng - The RNG to use for sampling.
innerModel - The model to explain.
explanationTrainer - The sparse trainer to use to generate explanations.
numSamples - The number of samples to generate for each explanation.
extractor - The TextFeatureExtractor used to generate text features from a string.
tokenizer - The tokenizer used to tokenize the examples.
explain
Description copied from interface: TextExplainer
Converts the supplied text into an Example, and generates an explanation of the contained model's prediction.
nameFeature
Generate the feature name by combining the word and index.
name - The word.
idx - The index.
Returns: A string representing both of the inputs.
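A small sketch of why the index is part of the feature name: a word that occurs twice in the text must map to two distinct features. The class FeatureNameSketch is hypothetical, and the "@idx" separator is an assumption for illustration; the documentation only states that the name combines the word and the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the library's code.
final class FeatureNameSketch {

    // Combines the word and its position; the "@idx" separator is an assumed format.
    static String nameFeature(String name, int idx) {
        return name + "@idx" + idx;
    }

    // Names every token by its word and position in the tokenized text.
    static List<String> nameAll(List<String> tokens) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            names.add(nameFeature(tokens.get(i), i));
        }
        return names;
    }

    public static void main(String[] args) {
        // The repeated word "good" yields two different feature names.
        System.out.println(nameAll(List.of("good", "very", "good"))); // [good@idx0, very@idx1, good@idx2]
    }
}
```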
sampleData
Samples a new dataset from the input text. Uses the tokenized representation and removes words by blanking them out. It only removes words to generate a new sentence, and never generates the empty sentence.
inputText - The input text.
tokens - The tokenized representation of the input text.
Returns: A list of samples from the input text.
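The sampling step can be sketched roughly as below. This is an illustration under stated assumptions, not the library's implementation: the class SampleDataSketch and its methods are hypothetical, each token is kept or blanked by an independent coin flip (the actual sampling distribution is not specified here), and an all-blank draw is simply rejected and redrawn so that the empty sentence never appears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

// Hypothetical sketch of blank-out sampling; not the library's code.
final class SampleDataSketch {

    // Draws numSamples keep-masks over the tokens; every mask keeps at least one token.
    static List<boolean[]> sampleMasks(int numTokens, int numSamples, SplittableRandom rng) {
        if (numTokens < 1) {
            throw new IllegalArgumentException("Need at least one token to sample from");
        }
        List<boolean[]> masks = new ArrayList<>();
        while (masks.size() < numSamples) {
            boolean[] keep = new boolean[numTokens];
            boolean anyKept = false;
            for (int i = 0; i < numTokens; i++) {
                keep[i] = rng.nextBoolean(); // assumed: independent coin flip per token
                anyKept |= keep[i];
            }
            if (anyKept) {
                masks.add(keep); // all-blank masks are discarded and redrawn
            }
        }
        return masks;
    }

    // Builds the sampled sentence from the kept tokens; order is preserved, nothing is added.
    static String toText(List<String> tokens, boolean[] keep) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (keep[i]) {
                kept.add(tokens.get(i));
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "film", "was", "not", "good");
        SplittableRandom rng = new SplittableRandom(42L);
        for (boolean[] mask : sampleMasks(tokens.size(), 5, rng)) {
            System.out.println(toText(tokens, mask));
        }
    }
}
```

Passing the SplittableRandom in explicitly mirrors the constructor's rng parameter and keeps the samples reproducible for a fixed seed.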