Byte-Pair Encoding Explained: The Algorithm Powering Modern LLM Tokenization
From a 1994 compression algorithm to the tokenizer powering modern LLMs: how Byte-Pair Encoding works and why it became the standard.
From a 1994 compression algorithm to the tokenizer powering modern LLMs: how Byte-Pair Encoding works and why it became the standard.
Working through the edge cases of vocabulary design to understand why modern LLMs use subword tokenization.
Interactive tools and visualizations for teaching LLM architecture, gathered from preparing a workshop on LLMs.