Training

Inside Talkie: The 13B LM Trained Only on Pre-1931 Text
Talkie is a 13B-parameter language model trained exclusively on 260 billion tokens of text published before 1931. Built by Nick Levine, Alec Radford, and David Duvenaud to study AI generalization, it sparks discussion on historical perspective and anachronistic outputs. This deep dive covers data sources, processing, limitations, and public release plans.

Gemma 4 MTP Fails to Deliver Speed Gains on Top GPUs
Reddit users tested the work-in-progress Gemma 4 MTP model. Most high-end GPU configurations saw equal or worse performance than non-MTP inference; only a mixed VRAM/CPU setup showed a significant speedup. Testers also reported stability issues, and the community anticipates further optimizations.

Fast Byte Latent Transformer: Efficient Byte-Level Generation via Diffusion and Speculation
This paper introduces BLT Diffusion (BLT-D), BLT Self-speculation (BLT-S), and BLT Diffusion+Verification (BLT-DV) to accelerate byte-level language models. By replacing autoregressive decoding with block-wise diffusion and verification, the methods reduce memory-bandwidth use by over 50%, and by up to 92% with larger blocks, while maintaining competitive performance on translation and code generation tasks.