The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 Research · May 21, 2026
Tokenization with Split Trees
We introduce Tokenization with Split Trees (ToaST), a subword tokenization method that directly optimizes compression under a new recursive inference procedure. ToaST greedily splits each pretoken into a full binary tree using precomputed byte n-gram counts, independent of any vocabulary. Given a vo...
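The tree-building step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the excerpt does not give the split criterion, so `split_score` (favor the cut whose rarer half is most frequent) is a stand-in, and the names `byte_ngram_counts`, `SplitNode`, `build_split_tree`, `leaves`, and the `max_n` cap are all hypothetical. The vocabulary-dependent inference step is cut off in the excerpt and is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def byte_ngram_counts(pretokens, max_n=8):
    """Count every byte n-gram of length 1..max_n across the pretokens."""
    counts = Counter()
    for tok in pretokens:
        for i in range(len(tok)):
            for j in range(i + 1, min(len(tok), i + max_n) + 1):
                counts[tok[i:j]] += 1
    return counts


@dataclass
class SplitNode:
    span: bytes
    left: Optional["SplitNode"] = None
    right: Optional["SplitNode"] = None


def split_score(counts, left, right):
    # ASSUMPTION: the real ToaST criterion is not stated in the excerpt.
    # Here a cut is scored by the count of its rarer half.
    return min(counts[left], counts[right])


def build_split_tree(pretoken, counts):
    """Greedily split a pretoken into a full binary tree with single-byte leaves.

    No vocabulary is consulted; only the precomputed n-gram counts are used.
    """
    node = SplitNode(pretoken)
    if len(pretoken) > 1:
        # Pick the best cut position; ties go to the leftmost position.
        k = max(
            range(1, len(pretoken)),
            key=lambda i: split_score(counts, pretoken[:i], pretoken[i:]),
        )
        node.left = build_split_tree(pretoken[:k], counts)
        node.right = build_split_tree(pretoken[k:], counts)
    return node


def leaves(node):
    """Return the leaf spans of the tree in left-to-right order."""
    if node.left is None:
        return [node.span]
    return leaves(node.left) + leaves(node.right)


counts = byte_ngram_counts([b"lower", b"low", b"slow"])
tree = build_split_tree(b"lower", counts)
```

Each internal node has exactly two children, so a pretoken of n bytes yields n leaves and n - 1 internal nodes, and concatenating the leaves reproduces the pretoken.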