Training

dots.tts: 2B-Parameter Continuous Autoregressive TTS Foundation Model
dots.tts is a 2B-parameter continuous autoregressive text-to-speech foundation model. It combines AudioVAE, full-history conditioning, and self-corrective post-training to deliver strong results on multilingual benchmarks, with stable generation, voice cloning, and emotional expressiveness, and it uses MeanFlow distillation for efficiency.

Hyper-Epoch Pretraining (q0) for Data-Constrained Language Models
1Q Labs researchers introduce Hyper-Epoch Pretraining (q0), a conceptual shift from single-model training to exploring and aggregating a population of models. q0 uses cyclic schedules, chain distillation, and a learned prior to achieve significant data efficiency gains and lower validation loss in multi-epoch pretraining.

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Language Tasks
Introducing SCOPE, a data-free self-play framework for open-ended tasks that co-evolves a Challenger for task generation and a Solver for answering. It uses a self-judge to create rubrics and grade responses, improving 7-8B instruction-tuned models by up to +10.4 points on open-ended and +13.8 points on held-out QA benchmarks.

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer
SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it runs at 1280×704 resolution and 24 FPS on consumer GPUs, with a focus on temporal coherence and throughput.

Harness-1: Reinforcement Learning for Search Agents
Harness-1 applies reinforcement learning to search agents through state-externalizing harnesses. The work is detailed in arXiv:2606.02373.

Cosmos 3: Omnimodal World Models for Physical AI
NVIDIA introduces Cosmos 3, an omnimodal world model designed for physical AI applications. It draws on diverse data inputs to help robots and embodied AI systems better understand and interact with the physical world.

How DiffusionBlocks Overcomes the Deep Learning Memory Wall
Explore the "memory wall" in deep learning and how DiffusionBlocks, by reinterpreting residual networks as diffusion processes, offers a principled, block-wise training method. Learn how it dramatically cuts memory usage for large Transformer models, making them accessible on standard hardware.

Why Clean-Latent Prediction Outperforms Velocity in Diffusion Models
Explore how the choice of prediction target profoundly impacts diffusion model performance, even in latent spaces. This article details a controlled study comparing clean-latent (JLT) and velocity prediction (DiT), revealing why direct clean-latent regression consistently yields superior results due to fundamental differences in the underlying regression problem.

Why Gaussianity is Key to Identifiable World Models in AI
Explore the "if and only if" theorem behind LeJEPA's success in representation learning. Understand the role of Gaussian distributions, alignment, and regularization in achieving linear identifiability in AI's quest for robust world models.

SkillOpt: Optimizing LLM Behavior with Trainable Skill Documents
SkillOpt optimizes large language model behavior by iteratively refining natural-language "skill documents" through a propose-and-test loop. An optimizer model suggests edits, the edits are applied under a bounded textual learning rate, and each change is validated before it is kept. The aim is robust, portable domain adaptation, even for closed-source frontier models.

LLMs Learn to "Sleep" for Deeper Reasoning
This article explores how "LLM sleep," an offline consolidation phase, allows hybrid attention-SSM models to improve deep reasoning by iteratively refining fast-weight memories. Inspired by hippocampal replay, this method addresses the computational bottleneck of context eviction, enhancing performance on complex sequential tasks without increasing prediction-time cost.

Missing Paper Content Hinders Accurate Synthesis
This article highlights the challenges of producing accurate and comprehensive paper summaries when only a title is provided. It emphasizes that a full understanding of research requires complete content, encompassing abstract, methodology, results, and illustrative figures, to ensure an evidence-based synthesis.