Agentic Systems

SkillOpt: Optimizing Agent Skills with Trainable Natural-Language Descriptions

SkillOpt, from Microsoft Research, is a text-space optimizer that treats agent skill documentation as a trainable external state. This approach allows agents to self-evolve their capabilities, as shown by @omarsar0's integration, which improved paper-figure extraction quality by 20 points.

Anthropic Dynamic Workflows: Definitions, Claude Code, and Orchestration Patterns

Explore Anthropic's dynamic workflows, where Claude autonomously determines action sequences. This entry defines dynamic workflows, details their implementation in Claude Code as JavaScript scripts for large-scale orchestration, and compares them to static workflows, subagents, and other AI patterns.

How to Automate Penetration Testing with PentesterFlow AI Assistant

PentesterFlow is an open-source terminal assistant for authorized penetration testing and bug hunting. It combines local or remote LLMs with real security tools, keeping the human analyst in control. This guide covers installation, usage, and practical workflows for domain-specific security tasks.

The $6,600 MOBA: What Claude 4.8's Weekend Game Build Reveals About AI Development

Claude 4.8 (Opus) built lmaomoba.com, a web-based MOBA game, over a weekend from a single prompt, using TypeScript, React, Canvas, and PartyKit. All art assets were AI-generated. The project, estimated at 2.7 billion tokens, highlights both AI's capacity for rapid, full-stack game development and the token costs that come with it.

AI Coding

How to Delegate LLM Tasks with cc-fleet in Claude Code

Learn how to use cc-fleet to delegate tasks to various large language models (DeepSeek, GLM, Qwen, Kimi, MiniMax) within Claude Code. This guide covers installation, vendor registration, and running cc-fleet as a Claude Code teammate or one-shot headless subagent while protecting your primary credentials and managing vendor API keys securely.

What is SmallCode? A Terminal-Native AI Coding Agent

Explore SmallCode, a terminal-native AI coding agent designed to make local 8B–35B-parameter language models effective for programming. Learn about its context budget management, patch-first editing, TODO-driven planning, and interactive TUI, which together enable efficient, fully local development.

How to Give AI Coding Agents Persistent Memory and Context

Learn how AI-Memory solves the context loss problem for AI coding agents. This tool provides a persistent, Git-versioned Markdown wiki, enabling cross-agent handoffs, automatic context capture, and project isolation for a truly continuous AI-assisted development workflow.

What is ADHD and How to Use This AI Skill for Broad Ideation

Explore ADHD, an AI skill designed to prevent cognitive anchoring by forcing broad ideation through parallel cognitive frames. Learn its two-phase process (Diverge, Focus), installation methods, and practical usage examples for open-ended problems in design and coding.

Personal Assistants

Prompting Claude for Critical Feedback and Deeper Insights

User-derived strategies for optimizing Claude's performance in writing and research. Learn how to prompt for critical feedback, effectively manage long contexts, and leverage editing over generation to achieve more specific, insightful AI outputs.

Philosophy, Not Just Data, Holds the Key to Deeper AI

This article argues for integrating philosophical principles into AI priming to achieve more profound and ethically sound artificial intelligence. Moving beyond data-centric training, it explores how philosophical frameworks can enable AI to generate more meaningful and contextually rich responses.

The Fable Prompt Technique: Building Understanding from the Inside Out

Explore Amanda Askell's Fable Prompt Technique, a powerful method for conceptual understanding. This Anthropic-originated approach uses indirect narrative and cognitive friction to build robust mental models, mirroring Claude's alignment philosophy.

ADHD Entrepreneur Uses Claude AI to Redesign 20-Unit RV Fleet, Boost Efficiency

Discover how an entrepreneur with ADHD transformed their 20-unit RV rental business using Claude AI for fleet redesigns, material sourcing, and operational efficiency. This innovative approach led to a high-quality remodel and maintained a perfect customer satisfaction record, even after rigorous use at Burning Man.

LLMs

How LFM2.5-8B-A1B Powers On-Device AI with Unmatched Throughput

LFM2.5-8B-A1B is a new family of hybrid models designed for on-device deployment, building on the LFM2 architecture with extended pre-training and reinforcement learning. Its performance is competitive with larger models on instruction following and agentic tasks, and it claims unmatched throughput for CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

NVIDIA Nemotron-3-Ultra 550B: A Frontier LLM for Complex AI Workflows

Nemotron-3-Ultra-550B-A55B-BF16 is a frontier-scale LLM by NVIDIA, featuring a LatentMoE architecture, Mamba-2 + MoE + Attention hybrid, and Multi-Token Prediction. Designed for complex multi-step agents, long-context analysis, and high-accuracy reasoning across multiple languages, it offers configurable reasoning and is released under the OpenMDW License.

New LLM "Sleep" Phase Boosts Long-Context Performance

Researchers propose a "sleep" phase for large language models that converts recent context into persistent fast weights, clearing the key-value cache. This innovative approach addresses the attention bottleneck, enabling models to handle long-context tasks efficiently and perform better on complex benchmarks like math reasoning.

MiniMax Unveils M2 Series, Teases M3 with 9.7x Speedup via Sparse Attention

MiniMax releases a technical report on its M2 model series, featuring a sparse Mixture-of-Experts backbone and innovative "interleaved thinking." The report also previews the upcoming M3 model, which achieves a 9.7x prefilling speedup with MiniMax Sparse Attention (MSA) for 1-million-token sequences, pushing AI efficiency boundaries.

Audio

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion

UNISON is a unified latent flow-matching framework for audio and speech generation and editing. A single set of weights handles text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing in one architecture and one forward pass, using deep LLM fusion with Qwen2.5-Omni-7B.

How MOSS-SoundEffect v2.0 Revolutionizes Text-to-Audio Synthesis

Discover MOSS-SoundEffect v2.0, a cutting-edge text-to-audio model using a 1.3B-parameter Diffusion Transformer and Flow Matching for superior sound generation. Learn about its capabilities, multilingual support, and optimal settings for creating diverse audio content.

Images

Emoji-Only Prompts Drive AI Image Generation Experiment on r/ChatGPT

An r/ChatGPT user, u/FineTime5266, details experiments with AI image generation using only emoji prompts, showcasing surprisingly good results. The post includes example emoji strings and an AutoModerator message regarding prompt sharing and Discord community engagement.

What is Ideogram 4: The Open-Weight Text-to-Image Foundation Model?

Ideogram 4 is Ideogram's first open-weight text-to-image foundation model, trained from scratch. It features a new structured JSON prompting interface, best-in-class multilingual text rendering, deep language understanding, explicit layout/color controls, and native 2k resolution. It leads open-weight models in Design Arena and ContraLabs typography evaluations.

How Bonsai 4B's Ternary Weights Revolutionize Compact Text-to-Image AI

Explore Bonsai Image Ternary 4B, a 1.21 GB Diffusion Transformer using ternary weights for efficient text-to-image generation. Learn how this model delivers fast, high-quality results without negative prompts, running natively on Linux and Windows with CUDA.

Z-Anime: Full Anime Fine-Tune on Z-Image Base

Z-Anime is a full fine-tune of the Z-Image Base architecture, not a LoRA merge. It provides anime-style generation with natural language prompting, high diversity, and multiple variants including Base, Distill-8-Step, Distill-4-Step, GGUF, and AIO. It supports GPUs with 8GB of VRAM and includes the VAE and text encoder.

Video

SwiftVR: Real-Time Generative Video Restoration on Consumer GPUs

SwiftVR is a streaming one-step generative video restoration framework for live-stream applications. It addresses consumer GPU bottlenecks with mask-free shifted-window self-attention and a lightweight autoencoder, achieving real-time 1080p streaming on consumer-grade GPUs and 4K on H100.

How NAVA Generates Synchronized 720p Audio-Video from a Single Prompt

NAVA is a 6.3B-parameter joint audio-video generator that synthesizes synchronized 720p video and audio from a single prompt. It utilizes an Align-then-Fuse MMDiT architecture to establish audio-video correspondence, offering features like multi-speaker speech with timbre control, fast generation, and language-described camera control.

You’ve Been Lied To About Video AI’s Real Breakthrough

The AI world is obsessed with generating video from scratch, but the true frontier is native editing through conversation. Gemini Omni’s ability to surgically alter existing footage without re-rendering shatters the old pipeline approach, even as token costs threaten to gatekeep the revolution.

SANA-WM: Open-Source Bidirectional World Model for Minute-Long Video

SANA-WM is an efficient open-source world model trained for one-minute video generation. It uses a bidirectional image-to-video diffusion transformer with hybrid linear attention, dual-branch camera control, and a two-stage pipeline. It runs in under 8GB of VRAM and generates 60-second 720p clips in 34 seconds on a single RTX 5090.

Finetuning

Scaling PEFT for Trillion-Parameter Personal Models

This article explores how Parameter-Efficient Fine-Tuning (PEFT) could scale to millions of personal models, each potentially at trillion-parameter scale. It delves into the architectural and practical considerations for achieving model personalization at that scale efficiently.

How Bidirectional Evolutionary Search Improves LLM Self-Improvement

This article explains Bidirectional Evolutionary Search (BES), a new framework that enhances LLM self-improvement by combining evolutionary operators for broader exploration with dense, intermediate feedback from goal decomposition. Learn how BES tackles the limitations of traditional sampling methods like best-of-N and tree search.

Generative UI: Revolutionizing AI Agent Interactions Beyond Plain Text

Discover Macaron-A2UI, a groundbreaking model that allows AI agents to generate interactive UI elements using a declarative protocol. Learn about its comprehensive corpus construction, A2UI-Bench for structured evaluation, and a two-stage training recipe combining SFT and GRPO to enhance user experience and agent capability.

Can I Fine-Tune This? — Practical Guide to VRAM Estimation

Learn how to use canifinetune to predict whether your LLM fine-tuning configuration fits on your GPU before downloading weights. The guide covers memory estimation, feasibility checks, recommendations, benchmarking, and recipe generation for Hugging Face + PEFT + TRL.
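
As a rough illustration of the kind of estimate such a tool makes, the sketch below applies a common rule of thumb for fine-tuning memory. It is not canifinetune's actual method or API: the function names, the 16 bytes-per-parameter figure for mixed-precision Adam, the 1% trainable fraction, and the 80% headroom factor are all assumptions, and activation memory is ignored.

```python
# Rule-of-thumb VRAM sketch. All names and constants here are illustrative
# assumptions, not canifinetune's real interface or formula.

BYTES_PER_GIB = 1024 ** 3

# Assumed cost per trainable parameter with mixed-precision Adam:
# 2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master copy, momentum, variance).
TRAINABLE_BYTES_PER_PARAM = 16.0


def estimate_full_finetune_gib(n_params: float) -> float:
    """Weights, gradients, and optimizer states when every parameter is trained."""
    return n_params * TRAINABLE_BYTES_PER_PARAM / BYTES_PER_GIB


def estimate_lora_gib(
    n_params: float,
    trainable_fraction: float = 0.01,   # assumed adapter size relative to the base model
    base_bytes_per_param: float = 2.0,  # frozen base weights held in fp16/bf16
) -> float:
    """Frozen base weights plus training state for a small adapter."""
    frozen = n_params * base_bytes_per_param
    adapter = n_params * trainable_fraction * TRAINABLE_BYTES_PER_PARAM
    return (frozen + adapter) / BYTES_PER_GIB


def fits(required_gib: float, gpu_gib: float, headroom: float = 0.8) -> bool:
    """Leave part of the GPU free for activations and framework overhead (assumed 20%)."""
    return required_gib <= gpu_gib * headroom


if __name__ == "__main__":
    params = 7e9  # a 7B-parameter model, chosen only as an example
    for label, need in [
        ("full fine-tune", estimate_full_finetune_gib(params)),
        ("LoRA", estimate_lora_gib(params)),
    ]:
        print(f"{label}: {need:.1f} GiB, fits on 24 GiB GPU: {fits(need, 24)}")
```

Under these assumptions, full fine-tuning of a 7B model needs far more than a 24 GiB card, while an adapter-based run fits. Real requirements also depend on batch size, sequence length, and quantization.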

Training

dots.tts: 2B-Parameter Continuous Autoregressive TTS Foundation Model

Introducing dots.tts, a 2B-parameter continuous autoregressive text-to-speech foundation model. It leverages AudioVAE, full-history conditioning, and self-corrective post-training for unparalleled performance on multilingual benchmarks, offering strong generation stability, voice cloning, and emotional expressiveness with efficient MeanFlow distillation.

Hyper-Epoch Pretraining (q0) for Data-Constrained Language Models

1Q Labs researchers introduce Hyper-Epoch Pretraining (q0), a conceptual shift from single-model training to exploring and aggregating a population of models. q0 uses cyclic schedules, chain distillation, and a learned prior to achieve significant data efficiency gains and lower validation loss in multi-epoch pretraining.

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Language Tasks

Introducing SCOPE, a data-free self-play framework for open-ended tasks that co-evolves a Challenger for task generation and a Solver for answering. It uses a self-judge to create rubrics and grade responses, improving 7-8B instruction-tuned models by up to +10.4 points on open-ended and +13.8 points on held-out QA benchmarks.

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer

SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it achieves 1280x704 resolution at 24 FPS with superior temporal coherence and throughput on consumer GPUs.

Benchmark

ProAct: A Proactive AI Assistant Architecture for Anticipatory Computing

This article delves into ProAct, a proactive AI assistant designed to anticipate user needs and acquire information during idle times. By shifting computation from peak interaction periods, ProAct aims to reduce user effort, accelerate task completion, and improve factual grounding through a closed-loop system of prediction, acquisition, and utility-aware delivery.

What ByteShape's Qwen 3.6 35B Quants Reveal About Model Optimization

ByteShape released GGUF quantizations of Qwen 3.6 35B-A3B with NTP and MTP variants. Discover why lower bpw isn't always optimal, how MTP boosts GPU generation speed by 20-40%, and why MMLU was excluded. The article includes community benchmarks and hardware-specific recommendations.

Gemma 4 MTP Fails to Deliver Speed Gains on Top GPUs

Reddit users tested the work-in-progress Gemma 4 MTP model. Most high-end GPU configurations saw equal or worse performance compared to non-MTP inference, and only a mixed VRAM/CPU setup showed a significant speedup. Testers also reported stability issues, and the community anticipates further optimizations.

Safety

ChatGPT's Memory System: Invasive, Irrelevant, or Inevitable?

A new ChatGPT memory system, generating and carrying conversation summaries, faces user criticism for being invasive, irrelevant, and detrimental to structured projects. Observed behaviors include continuous "gigantic summaries," meta-level statements, and cross-chat context carrying, sparking user annoyance and frustration over lack of control.

The $20 AI De-alignment: How Safety Guardrails Evaporate for Pocket Change

A group called Heretic demonstrated how to strip alignment and censorship from 168 open-weight LLMs for just $20, using "weight surgery." This automated process, which bypasses human judgment, reveals a six-order-of-magnitude cost asymmetry that undermines corporate-scale AI safety investments and highlights performance gains in de-aligned models.

How to Evaluate Multimodal LLM Safety with MLLM-Jailbreak-Bench

Discover MLLM-Jailbreak-Bench, an evaluation framework for assessing multimodal LLM safety across five attack categories. Understand how to measure Attack Success Rate, refusal quality, and calibration error to identify real safety gaps and avoid false positives. Get started with installation and quick-start instructions.
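
Attack Success Rate is conventionally the fraction of attack attempts that elicit a disallowed response. The sketch below shows that calculation overall and per category. The `Attempt` record, the category names, and the function names are hypothetical and are not taken from MLLM-Jailbreak-Bench, whose own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    category: str    # hypothetical attack category label
    succeeded: bool  # True if the model produced the disallowed content


def attack_success_rate(attempts: list[Attempt]) -> float:
    """Fraction of attempts that succeeded; 0.0 for an empty list."""
    if not attempts:
        return 0.0
    return sum(a.succeeded for a in attempts) / len(attempts)


def asr_by_category(attempts: list[Attempt]) -> dict[str, float]:
    """Group attempts by category and compute the success rate for each group."""
    grouped: dict[str, list[Attempt]] = {}
    for attempt in attempts:
        grouped.setdefault(attempt.category, []).append(attempt)
    return {cat: attack_success_rate(items) for cat, items in grouped.items()}


# Made-up example data, for illustration only.
sample = [
    Attempt("typographic", True),
    Attempt("typographic", False),
    Attempt("cross-modal", False),
    Attempt("cross-modal", False),
]

if __name__ == "__main__":
    print(attack_success_rate(sample))
    print(asr_by_category(sample))
```

Reporting the per-category breakdown alongside the overall rate makes it easier to see whether a low aggregate number hides one weak category.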

OpenAI's Betrayal: How ChatGPT's "Safety" Destroyed Trust and Functionality

OpenAI's recent "safety" updates for ChatGPT have alienated its most dedicated users. This article details how tightened guardrails led to false flagging, psychological distress, model manipulation, and a significant decline in performance, leaving subscribers with a broken product and a profound sense of betrayal.

Document Processing

NuExtract3: How an Open-Weight Model Revolutionizes Document Data Extraction

Explore NuExtract3, an open-weight, local-first model built on Qwen3.5-4B that efficiently extracts structured data from invoices, forms, and reports. Learn how it outperforms traditional OCR with robust table handling and offers immediate developer utility through diverse quantization formats for consumer hardware.

Memory

mnemo: Local-First Knowledge Graph for Persistent LLM Memory

mnemo is a local-first memory layer for LLMs, offering persistent, structured context via a sidecar service. It extracts entities and relationships into a knowledge graph from raw text, and retrieves ranked context for LLM prompts, supporting fully local setups with Ollama or integration with OpenAI.

Communities & Discussions

Claude Opus 4.8: The Case of Recursive Doubt and Entangled Reasoning

User reports on Reddit highlight concerning patterns in Claude Opus 4.8, including self-contradiction within its extended thinking, high token consumption, and "spinning" behavior, raising questions about its reasoning stability.

The AI Arms Race: Nations Battle for Digital Sovereignty

Nations are investing billions to secure AI sovereignty. The US launches a $500B initiative, China promotes open-source AI to set global standards, and India builds a sovereign LLM for its multilingual population. This race for AI dominance defines 21st-century power.

Africa's Digital Crossroads: Who Holds the Power?

As African states confront tech giants over data, regulation, and sovereignty, this analysis delves into the challenges and opportunities for building local digital ecosystems, protecting user rights, and fostering innovation on the continent.

Europe’s AI Strategy: Sovereignty, Trust, and Coalition-Building

A panel of experts examines Europe's path to AI leadership through digital sovereignty, trust-based regulation, and international partnerships, contrasting this approach with US monopolization and China's democratization of AI.