Training

Why Gaussianity is Key to Identifiable World Models in AI
Explore the "if and only if" theorem behind LeJEPA's success in representation learning. Understand the role of Gaussian distributions, alignment, and regularization in achieving linear identifiability for robust world models.

MiniMax Unveils M2 Series, Teases M3 with 9.7x Speedup via Sparse Attention
MiniMax releases a technical report on its M2 model series, featuring a sparse Mixture-of-Experts backbone and "interleaved thinking." The report also previews the upcoming M3 model, which achieves a 9.7x prefilling speedup on 1-million-token sequences with MiniMax Sparse Attention (MSA).

How MOSS-SoundEffect v2.0 Revolutionizes Text-to-Audio Synthesis
Discover MOSS-SoundEffect v2.0, a cutting-edge text-to-audio model using a 1.3B-parameter Diffusion Transformer and Flow Matching for superior sound generation. Learn about its capabilities, multilingual support, and optimal settings for creating diverse audio content.

SkillOpt: Optimizing LLM Behavior with Trainable Skill Documents
SkillOpt optimizes large language model behavior by iteratively refining natural-language "skill documents" through a propose-and-test loop. It uses an optimizer model to suggest edits, applies them under a bounded textual learning rate, and validates improvements, ensuring robust and portable domain adaptation even for closed-source frontier models.

LLMs Learn to "Sleep" for Deeper Reasoning
This article explores how "LLM sleep," an offline consolidation phase, allows hybrid attention-SSM models to improve deep reasoning by iteratively refining fast-weight memories. Inspired by hippocampal replay, this method addresses the computational bottleneck of context eviction, enhancing performance on complex sequential tasks without increasing prediction-time cost.

The Recursion Ceiling is a Myth: NovaSky Unleashes Recursive Language Models
Discover how NovaSky's SkyRL framework shatters the limitations of large language models. By spawning recursive child agents within persistent Python sandboxes, models can now reason in multi-turn, multi-agent trees, redefining what "thinking" means for AI.

xAI Completes Grok V9-Medium Training, June Release Expected
xAI has finished training Grok V9-Medium, a 1.5-trillion-parameter foundation model with significant improvements over its predecessor, v8-small. The model, which heavily emphasizes coding tasks through Cursor data, is now undergoing fine-tuning and reinforcement learning, with a public release anticipated in early to mid-June 2026.

How to Compile Multi-Step AI Workflows Directly into Small Models
Discover how synthetic data and full-parameter fine-tuning can internalize complex procedures in a small LLM, removing the need for external orchestration and delivering dramatic cost savings.

Z-Anime: Full Anime Fine-Tune on Z-Image Base
Z-Anime is a full fine-tune of the Z-Image Base architecture, not a LoRA merge. It provides anime-style generation with natural language prompting, high diversity, and multiple variants including Base, Distill-8-Step, Distill-4-Step, GGUF, and AIO. It runs on 8GB of VRAM and includes the VAE and text encoder.

Juggernaut Z V1: Cinematic Fine-Tune of Z-Image Base
Juggernaut Z V1 is a cinematic fine-tune of Z-Image Base, trained by KandooAI and released by RunDiffusion. It features dramatic lighting, sharper focus, natural skin, improved anatomy, and better ethnic diversity out of the box. Available in FP16, FP8, and GGUF formats for Diffusers and other workflows.

SANA-WM: Open-Source Bidirectional World Model for Minute-Long Video
SANA-WM is an efficient open-source world model trained for one-minute video generation. It uses a bidirectional image-to-video diffusion transformer with hybrid linear attention, dual-branch camera control, and a two-stage pipeline. It runs in under 8GB of VRAM and generates 60-second 720p clips in 34 seconds on a single RTX 5090.

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
LongLive-2.0 presents the first end-to-end NVFP4 system for long video generation. It introduces Balanced Sequence Parallelism (SP) and NVFP4 quantization to accelerate training and inference. On Blackwell GPUs, W4A4 inference and a quantized KV cache reduce memory use and boost throughput. A clean training pipeline directly fine-tunes diffusion models into autoregressive models, with a standalone LoRA for real-time generation. A multi-shot attention sink enables stable streaming. Experiments show up to a 2.15× training speedup and a 1.84× inference speedup, reaching 45.7 FPS at 5B parameters.