Package org.tribuo.util.tokens.universal
An implementation of a "universal" tokenizer that splits on word boundaries, or on character boundaries for languages where word boundaries are contextual.

It was originally developed to support information retrieval, and it serves as a useful baseline tokenizer for generating machine learning features.
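
The sketch below shows one way this tokenizer might be driven through Tribuo's Tokenizer interface. It assumes the no-argument UniversalTokenizer constructor, the default tokenize method inherited from Tokenizer, and the public text, start, end, and type fields on Token; treat it as an illustrative usage sketch rather than a definitive reference.

import java.util.List;

import org.tribuo.util.tokens.Token;
import org.tribuo.util.tokens.universal.UniversalTokenizer;

public class UniversalTokenizerExample {
    public static void main(String[] args) {
        // Assumed: the no-argument constructor uses the default
        // punctuation-handling behavior.
        UniversalTokenizer tokenizer = new UniversalTokenizer();

        // tokenize is a default method on the Tokenizer interface that
        // returns each token with its character offsets and token type.
        List<Token> tokens = tokenizer.tokenize("Hello, world! This is a test.");
        for (Token token : tokens) {
            System.out.printf("%s [%d,%d] %s%n",
                    token.text, token.start, token.end, token.type);
        }
    }
}

For word-boundary languages such as English the output is one token per word; the start and end offsets make it straightforward to map tokens back to spans of the original text when generating features.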