Fast Byte Latent Transformer: Efficient Byte-Level Generation via Diffusion and Speculation
Combining hierarchical latent tokenization with block-wise discrete diffusion and self-speculation for faster byte-level language models
This paper introduces three methods for accelerating byte-level language models: BLT Diffusion (BLT-D), BLT Self-speculation (BLT-S), and BLT Diffusion+Verification (BLT-DV). By replacing autoregressive decoding with block-wise diffusion and verification, the methods reduce memory bandwidth by more than 50%, and by up to 92% with larger blocks, while maintaining competitive performance on translation and code generation tasks.
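The draft-then-verify idea behind block-wise generation with verification can be illustrated with a toy sketch. This is not the paper's implementation: `draft_block`, `next_byte`, `verify_block`, and `generate` are hypothetical names, the "models" are lookup functions over a fixed byte string, and the verifier is called once per byte for clarity, whereas a real system would typically score the whole block in one parallel pass.

```python
from typing import Callable, List, Tuple

TARGET = list(b"hello world")  # toy "ground truth" the verifier always predicts


def next_byte(context: List[int]) -> int:
    """Hypothetical autoregressive verifier: returns the next byte of TARGET."""
    return TARGET[len(context)] if len(context) < len(TARGET) else 0


def draft_block(context: List[int], block_size: int) -> List[int]:
    """Hypothetical block drafter: proposes a block, with the last byte of
    every full block deliberately wrong to exercise the rejection path."""
    start = len(context)
    block = TARGET[start:start + block_size]
    if len(block) == block_size:
        block[-1] ^= 1  # corrupt the final drafted byte
    return block


def verify_block(prefix: List[int], draft: List[int],
                 verifier: Callable[[List[int]], int]) -> List[int]:
    """Accept drafted bytes until the first disagreement with the verifier,
    then substitute the verifier's byte and stop."""
    if not draft:
        return [verifier(prefix)]  # always make progress
    accepted: List[int] = []
    for b in draft:
        expected = verifier(prefix + accepted)
        if b != expected:
            accepted.append(expected)
            return accepted
        accepted.append(b)
    return accepted


def generate(prompt: List[int], block_size: int,
             max_new: int) -> Tuple[List[int], int]:
    """Generate up to max_new bytes; returns (bytes, number of block steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        draft = draft_block(out, block_size)
        out.extend(verify_block(out, draft, next_byte))
        steps += 1
    return out[:len(prompt) + max_new], steps


out, steps = generate([], block_size=4, max_new=len(TARGET))
print(bytes(out), steps)
```

In this toy run, 11 bytes are produced in 3 block steps instead of 11 single-byte steps, which is the kind of saving that larger blocks are meant to provide. The verified output matches what the verifier alone would have generated.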