
Package org.tribuo.util.tokens.universal

An implementation of a "universal" tokenizer which splits on word boundaries, or on character boundaries for languages where word boundaries are contextual.


It was originally developed to support information retrieval and forms a useful baseline tokenizer for generating features for machine learning.
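A minimal usage sketch is shown below. It assumes the no-argument UniversalTokenizer constructor, the tokenize default method inherited from org.tribuo.util.tokens.Tokenizer, and the public text/start/end fields on org.tribuo.util.tokens.Token; treat it as illustrative rather than authoritative.

    import java.util.List;

    import org.tribuo.util.tokens.Token;
    import org.tribuo.util.tokens.universal.UniversalTokenizer;

    public class UniversalTokenizerDemo {
        public static void main(String[] args) {
            // Assumed: the no-argument constructor uses the default settings.
            UniversalTokenizer tokenizer = new UniversalTokenizer();

            // For scripts with explicit word boundaries, tokens are whole words.
            List<Token> tokens = tokenizer.tokenize("Hello, tokenized world!");
            for (Token token : tokens) {
                // Print each token's text and its [start,end) character offsets.
                System.out.println(token.text + " [" + token.start + "," + token.end + ")");
            }
        }
    }

For input in languages where word boundaries are contextual, the same call falls back to character-boundary splits, as described above.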


Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.