Agents

Discover BES, a novel framework coupling forward evolutionary search with backward goal decomposition to overcome sampling bottlenecks in LLM reasoning.

How Bidirectional Evolutionary Search Improves LLM Self-Improvement

This article explains Bidirectional Evolutionary Search (BES), a new framework that enhances LLM self-improvement by combining evolutionary operators for broader exploration with dense, intermediate feedback from goal decomposition. Learn how BES tackles the limitations of traditional sampling methods like best-of-N and tree search.

Discover how LeJEPA achieves linear identifiability and why a Gaussian latent distribution is crucial for perfect recovery of underlying AI world models.

Why Gaussianity is Key to Identifiable World Models in AI

Explore the "if and only if" theorem behind LeJEPA's success in representation learning. Understand the role of Gaussian distributions, alignment, and regularization in achieving linear identifiability in AI's quest for robust world models.

Shanghai-based AI firm, backed by Tencent and Alibaba, details M2's MoE architecture and "interleaved thinking," while previewing M3's significant performance gains for ultra-long contexts.

MiniMax Unveils M2 Series, Teases M3 with 9.7x Speedup via Sparse Attention

MiniMax releases a technical report on its M2 model series, featuring a sparse Mixture-of-Experts backbone and "interleaved thinking." The report also previews the upcoming M3 model, which achieves a 9.7x prefilling speedup with MiniMax Sparse Attention (MSA) on 1-million-token sequences.

Exploring Anthropic's Claude Managed Agents update: self-hosted sandboxes, MCP tunnels, and the partner ecosystem enabling secure, production-ready AI.

Inside Enterprise Security for Agentic Workflows

Anthropic's latest Claude Managed Agents update introduces self-hosted sandboxes and MCP tunnels, fundamentally changing how enterprises deploy autonomous AI. This deep dive covers the new security architecture, which lets agents execute tools and access services within an organization's perimeter, a capability crucial for regulated industries.

Explore MiniCPM5-1B, a 1B-parameter LLM designed for on-device deployment, featuring state-of-the-art performance and a unique 'Think'/'No Think' dual-mode chat template.

What is MiniCPM5-1B and How Does Its Dual-Mode Architecture Work?

Discover MiniCPM5-1B, an efficient 1B-parameter causal language model optimized for local and resource-constrained environments. Learn about its Llama-based architecture, impressive 131K context window, and innovative 'Think' and 'No Think' modes that enable it to function as both a fast assistant and a deliberate reasoner from a single checkpoint.

Introducing SkillOpt, a novel framework that treats natural-language skill documents as trainable states for domain adaptation in large language models, enabling automated procedural improvement without modifying model weights.

SkillOpt: Optimizing LLM Behavior with Trainable Skill Documents

SkillOpt optimizes large language model behavior by iteratively refining natural-language "skill documents" through a propose-and-test loop. It uses an optimizer model to suggest edits, applies them under a bounded textual learning rate, and validates improvements, ensuring robust and portable domain adaptation even for closed-source frontier models.

This paper introduces Macaron-A2UI, a novel model enabling AI agents to dynamically synthesize interactive UI controls alongside natural language, addressing the limitations of text-only interfaces.

Generative UI: Revolutionizing AI Agent Interactions Beyond Plain Text

Discover Macaron-A2UI, a model that allows AI agents to generate interactive UI elements using a declarative protocol. Learn about its corpus construction, the A2UI-Bench benchmark for structured evaluation, and a two-stage training recipe combining SFT and GRPO to enhance user experience and agent capability.

Introducing ProAct, a novel agent architecture that transforms idle intervals into structured cycles of anticipation and learning to enhance user experience and efficiency.

ProAct: A Proactive AI Assistant Architecture for Anticipatory Computing

This article delves into ProAct, a proactive AI assistant designed to anticipate user needs and acquire information during idle times. By shifting computation away from peak interaction periods, ProAct aims to reduce user effort, accelerate task completion, and improve factual grounding through a closed-loop system of prediction, acquisition, and utility-aware delivery.

Understand what your coding agents send to language models, debug prompt issues, and monitor usage without TLS pinning hassles.

How to Inspect and Debug AI Agent API Calls with ccglass

Discover ccglass, a local logging reverse-proxy and web dashboard that provides deep insights into AI agent API requests. Learn how to inspect prompts, tool schemas, message history, and cost data for various coding agents and IDEs, bypassing common proxy limitations.

Discover how SmallCode leverages small local LLMs for effective programming tasks on consumer hardware, offering advanced context management and interactive features.

What is SmallCode? A Terminal-Native AI Coding Agent

Explore SmallCode, a terminal-native AI coding agent designed to make local 8B–35B-parameter language models effective for programming. Learn about its context budget management, patch-first editing, TODO-driven planning, and interactive TUI, which together enable development that runs entirely on local hardware.

Discover how FigMirror replicates reference image styles with your data to produce editable Matplotlib scripts and camera-ready PDFs.

How FigMirror Automates Publication-Quality Figure Generation

Learn about FigMirror, a tool that automates the creation of publication-quality figures. Understand its agentic Drawer-Reviewer loop, Grounded Measurement, and Aesthetic Library, and explore its Web UI and skill-only installation modes for coding agents.

Discover AI-Memory, a shared, persistent wiki for AI coding agents that captures context, enables seamless handoffs, and eliminates re-explanation.

How to Give AI Coding Agents Persistent Memory and Context

Learn how AI-Memory solves the context loss problem for AI coding agents. This tool provides a persistent, Git-versioned Markdown wiki, enabling cross-agent handoffs, automatic context capture, and project isolation for a truly continuous AI-assisted development workflow.