Academic
Page 1 of 3

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Language Tasks
Introducing SCOPE, a data-free self-play framework for open-ended tasks that co-evolves a Challenger for task generation and a Solver for answering. It uses a self-judge to create rubrics and grade responses, improving 7-8B instruction-tuned models by up to +10.4 points on open-ended and +13.8 points on held-out QA benchmarks.

Scaling PEFT for Trillion-Parameter Personal Models
This article explores the scaling capabilities of Parameter-Efficient Fine-Tuning (PEFT) towards creating millions of personal models, each potentially reaching trillion-parameter scales. It delves into the architectural and practical considerations for achieving such unprecedented model personalization and efficiency.

How LFM2.5-8B-A1B Powers On-Device AI with Unmatched Throughput
LFM2.5-8B-A1B is a new family of hybrid models designed for on-device deployment, building on the LFM2 architecture with extended pre-training and reinforcement learning. It offers competitive performance with larger models on instruction following and agentic tasks, boasting unmatched throughput on CPU and GPU inference with day-one support for llama.cpp, MLX, vLLM, and SGLang.

How Science Superpowers Transforms AI Agents into Disciplined Scientific Collaborators
Science Superpowers guides AI agents through a rigorous, preregistered workflow for scientific collaboration, ensuring precision, reproducibility, and protection against p-hacking. This guide details its functionality, emphasizing its zero third-party dependency design and installation across various agent harnesses like Cursor, Claude Code, and Gemini CLI.

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer
SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it achieves 1280x704 resolution at 24 FPS with superior temporal coherence and throughput on consumer GPUs.

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion
UNISON is a unified latent flow-matching framework for audio and speech generation and editing. Using a single set of weights, it integrates text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing—all in one model, one architecture, one forward pass, leveraging deep LLM fusion with Qwen2.5-Omni-7B.

Cosmos 3: Omnimodal World Models for Physical AI
NVIDIA introduces Cosmos 3, a cutting-edge omnimodal world model designed for physical AI applications. This project leverages diverse data inputs to enable robots and embodied AI systems to better understand and interact with the physical world, pushing the boundaries of autonomous intelligence.

What is Ideogram 4: The Open-Weight Text-to-Image Foundation Model?
Ideogram 4 is Ideogram's first open-weight text-to-image foundation model, trained from scratch. It features a new structured JSON prompting interface, best-in-class multilingual text rendering, deep language understanding, explicit layout/color controls, and native 2k resolution. It leads open-weight models in Design Arena and ContraLabs typography evaluations.

How DiffusionBlocks Overcomes the Deep Learning Memory Wall
Explore the "memory wall" in deep learning and how DiffusionBlocks, by reinterpreting residual networks as diffusion processes, offers a principled, block-wise training method. Learn how it dramatically cuts memory usage for large Transformer models, making them accessible on standard hardware.

Why Clean-Latent Prediction Outperforms Velocity in Diffusion Models
Explore how the choice of prediction target profoundly impacts diffusion model performance, even in latent spaces. This article details a controlled study comparing clean-latent (JLT) and velocity prediction (DiT), revealing why direct clean-latent regression consistently yields superior results due to fundamental differences in the underlying regression problem.

New LLM "Sleep" Phase Boosts Long-Context Performance
Researchers propose a "sleep" phase for large language models that converts recent context into persistent fast weights, clearing the key-value cache. This innovative approach addresses the attention bottleneck, enabling models to handle long-context tasks efficiently and perform better on complex benchmarks like math reasoning.

Why Gaussianity is Key to Identifiable World Models in AI
Explore the "if and only if" theorem behind LeJEPA's success in representation learning. Understand the role of Gaussian distributions, alignment, and regularization in achieving linear identifiability in AI's quest for robust world models.