Package org.tribuo.util.tokens.impl.wordpiece
Provides an implementation of a Wordpiece tokenizer which implements the Tribuo Tokenizer API.
Class | Description
Wordpiece | This is a vanilla implementation of the Wordpiece algorithm as found here: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/tokenization_bert.py
WordpieceBasicTokenizer | This is a tokenizer that is used "upstream" of WordpieceTokenizer and implements much of the functionality of the 'BasicTokenizer' implementation in huggingface.
WordpieceTokenizer | This Tokenizer is meant to be a reasonable approximation of the BertTokenizer defined in huggingface transformers.
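To illustrate how the two stages described above fit together, here is a minimal standalone sketch. It is not the Tribuo API: the class `WordpieceSketch`, its methods, the `[UNK]` token, the `##` continuation prefix and the 100-character word limit are all hypothetical choices modeled on the huggingface `tokenization_bert.py` file linked above. A basic pre-tokenizer lowercases the text and splits on whitespace and ASCII punctuation. Each resulting word is then split by greedy longest-match-first lookup against a vocabulary. The sketch omits accent stripping, CJK handling and control-character cleanup that the huggingface BasicTokenizer also performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch only; none of these names belong to Tribuo.
public class WordpieceSketch {

    static final String UNKNOWN = "[UNK]";       // assumed unknown-token string
    static final String CONTINUATION = "##";     // assumed prefix for non-initial pieces
    static final int MAX_CHARS_PER_WORD = 100;   // assumed limit; longer words become UNKNOWN

    // Basic pre-tokenization: lowercase, split on whitespace,
    // and emit every punctuation character as its own token.
    static List<String> basicTokenize(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            boolean punct = isPunctuation(c);
            if (Character.isWhitespace(c) || punct) {
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
                if (punct) {
                    out.add(String.valueOf(c));
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    // ASCII punctuation ranges only (a simplification of Unicode punctuation).
    static boolean isPunctuation(char c) {
        return (c >= 33 && c <= 47) || (c >= 58 && c <= 64)
                || (c >= 91 && c <= 96) || (c >= 123 && c <= 126);
    }

    // Greedy longest-match-first: at each position take the longest
    // substring found in the vocabulary; if none matches, the whole word is UNKNOWN.
    static List<String> wordpiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        if (word.length() > MAX_CHARS_PER_WORD) {
            pieces.add(UNKNOWN);
            return pieces;
        }
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (start < end) {
                String candidate = word.substring(start, end);
                if (start > 0) {
                    candidate = CONTINUATION + candidate;
                }
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                List<String> unknown = new ArrayList<>();
                unknown.add(UNKNOWN);
                return unknown;
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    // Full pipeline: basic pre-tokenization followed by wordpiece splitting.
    static List<String> tokenize(String text, Set<String> vocab) {
        List<String> out = new ArrayList<>();
        for (String word : basicTokenize(text)) {
            out.addAll(wordpiece(word, vocab));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("un", "##aff", "##able", "runs", "!");
        System.out.println(tokenize("Unaffable runs!", vocab)); // prints [un, ##aff, ##able, runs, !]
    }
}
```

Falling back to a single unknown token for the whole word, rather than keeping the pieces matched so far, follows the behavior of the huggingface implementation and keeps the output vocabulary-consistent.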