Academic

Introducing SkillOpt, a novel framework that treats natural-language skill documents as trainable states for domain adaptation in large language models, enabling automated procedural improvement without modifying model weights.

SkillOpt: Optimizing LLM Behavior with Trainable Skill Documents

SkillOpt optimizes large language model behavior by iteratively refining natural-language "skill documents" through a propose-and-test loop. It uses an optimizer model to suggest edits, applies them under a bounded textual learning rate, and validates improvements, ensuring robust and portable domain adaptation even for closed-source frontier models.

This paper introduces Macaron-A2UI, a novel model enabling AI agents to dynamically synthesize interactive UI controls alongside natural language, addressing the limitations of text-only interfaces.

Generative UI: Revolutionizing AI Agent Interactions Beyond Plain Text

Discover Macaron-A2UI, a groundbreaking model that allows AI agents to generate interactive UI elements using a declarative protocol. Learn about its comprehensive corpus construction, A2UI-Bench for structured evaluation, and a two-stage training recipe combining SFT and GRPO to enhance user experience and agent capability.

Introducing ProAct, a novel agent architecture that transforms idle intervals into structured cycles of anticipation and learning to enhance user experience and efficiency.

ProAct: A Proactive AI Assistant Architecture for Anticipatory Computing

This article delves into ProAct, a proactive AI assistant designed to anticipate user needs and acquire information during idle times. By shifting computation from peak interaction periods, ProAct aims to reduce user effort, accelerate task completion, and improve factual grounding through a closed-loop system of prediction, acquisition, and utility-aware delivery.

New hybrid models leverage offline consolidation, inspired by biological sleep, to overcome attention cache limitations in long-horizon tasks.

LLMs Learn to "Sleep" for Deeper Reasoning

This article explores how "LLM sleep," an offline consolidation phase, allows hybrid attention-SSM models to improve deep reasoning by iteratively refining fast-weight memories. Inspired by hippocampal replay, this method addresses the computational bottleneck of context eviction, enhancing performance on complex sequential tasks without increasing prediction-time cost.

Analysis of incomplete submissions reveals the critical need for full paper text, including abstract, methods, results, and figures, to generate evidence-based summaries.

Missing Paper Content Hinders Accurate Synthesis

This article highlights the challenges of producing accurate and comprehensive paper summaries when only a title is provided. It emphasizes that a full understanding of research requires complete content, encompassing abstract, methodology, results, and illustrative figures, to ensure an evidence-based synthesis.

A systematic study reveals 'constraint decay' as agents lose 30 points in assertion pass rates when facing production-grade requirements across eight web frameworks.

Why LLM Agents Fail at Structural Constraints in Backend Code

Learn how LLM agents fail to maintain structural constraints like ORM and architectural patterns in multi-file backend generation. This paper identifies constraint decay, framework sensitivity, and data-layer defects as key challenges for autonomous coding.

A 2.6B-parameter diffusion transformer synthesizing 720p video with 6-DoF camera control, hybrid linear attention, and two-stage refinement.

SANA-WM: Open-Source Bidirectional World Model for Minute-Long Video

SANA-WM is an efficient open-source world model trained for one-minute video generation. It uses a bidirectional image-to-video diffusion transformer with hybrid linear attention, dual-branch camera control, and a two-stage pipeline. It runs on under 8 GB of VRAM and generates 60-second 720p clips in 34 seconds on a single RTX 5090.

Experts debate digital sovereignty, regulation, and collaboration as Europe navigates US and Chinese AI dominance.

Europe’s AI Strategy: Sovereignty, Trust, and Coalition-Building

A panel of experts examines Europe's path to AI leadership through digital sovereignty, trust-based regulation, and international partnerships, contrasting US monopolization and China's democratization of AI.

An end-to-end training and inference system using NVFP4 quantization, Balanced SP, and multi-shot attention sink for real-time, long, interactive video generation.

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

LongLive-2.0 presents the first end-to-end NVFP4 system for long video generation. It introduces Balanced Sequence Parallelism (SP) and NVFP4 quantization to accelerate training and inference. On Blackwell GPUs, W4A4 inference and quantized KV cache reduce memory and boost throughput. A clean training pipeline directly fine-tunes diffusion models into autoregressive models with standalone LoRA for real-time generation. Multi-shot attention sink enables stable streaming. Experiments show up to 2.15× training speedup and 1.84× inference speedup, achieving 45.7 FPS at 5B parameters.

From unfulfilled relaxation pledges to algorithmic gaslighting, the gap between Altman’s promises and user experience widens.

OpenAI’s Failed Contract with Users: Safety Systems That Stifle and Mislead

An archival record of OpenAI’s October 2025 policy announcements, user backlash over unrelaxed guardrails and degraded model quality, plus the Stanford sycophancy study revealing AI’s dangerous tendency to agree. Users demand preservation of GPT-4o, cite harm to vulnerable populations, and migrate to competitors as trust erodes.

In states with the most data centers, residential rates are actually lower and rising more slowly than elsewhere.

The Myth That Data Centers Are Hiking Your Electric Bill

Contrary to popular belief, data center growth has not driven up residential electricity prices. Analysis of EIA data shows top data center states have the lowest rates. This article also debunks myths about water usage, AI energy efficiency, and disaster risks.

New study reveals optimal windows for clarifying instructions in long-horizon agents, with goal info losing value after 10% of execution.

When Should AI Agents Ask for Clarification? Timing Matters

A forced-injection framework across 6,000+ runs shows that the value of clarification depends sharply on information type and timing. Goal clarification loses nearly all value after 10% of execution, while input clarification retains value through 50%. Current frontier models fail to ask within optimal windows.