ai/news: latest articles

Scaling PEFT for Trillion-Parameter Personal Models

This article examines how Parameter-Efficient Fine-Tuning (PEFT) could scale to millions of personal models, each potentially reaching trillion-parameter scale. It covers the architectural and practical considerations for model personalization and efficiency at that scale.

LLMs

How LFM2.5-8B-A1B Powers On-Device AI with Unmatched Throughput

LFM2.5-8B-A1B is a new family of hybrid models designed for on-device deployment, building on the LFM2 architecture with extended pre-training and reinforcement learning. Its performance is competitive with larger models on instruction following and agentic tasks, and it is reported to deliver unmatched throughput for CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Personal Assistants

Prompting Claude for Critical Feedback and Deeper Insights

User-derived strategies for getting better results from Claude in writing and research. Learn how to prompt for critical feedback, manage long contexts effectively, and favor editing over generation to get more specific, insightful output.

Agentic Systems

Munder Difflin: Beyond The Office's Humor, a Serious Open-Source Multi-Agent System Emerges

Explore Munder Difflin, an open-source multi-agent system drawing inspiration from "The Office." This project offers a practical, distributed AI architecture, demonstrating how pop culture can spark serious software innovation.

Safety

ChatGPT's Memory System: Invasive, Irrelevant, or Inevitable?

A new ChatGPT memory system, which generates conversation summaries and carries them forward, faces user criticism for being invasive, irrelevant, and detrimental to structured projects. Observed behaviors include continuous "gigantic summaries," meta-level statements, and context carried across chats, leaving users frustrated by the lack of control.

Agentic Systems

How Science Superpowers Transforms AI Agents into Disciplined Scientific Collaborators

Science Superpowers guides AI agents through a rigorous, preregistered workflow for scientific collaboration, aiming for precision, reproducibility, and protection against p-hacking. This guide details how it works, emphasizing its design with no third-party dependencies and its installation across agent harnesses such as Cursor, Claude Code, and Gemini CLI.

Training

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer

SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it achieves 1280x704 resolution at 24 FPS with superior temporal coherence and throughput on consumer GPUs.

Audio

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion

UNISON is a unified latent flow-matching framework for audio and speech generation and editing. With a single set of weights, it covers text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing. All of this runs in one model, one architecture, and one forward pass, using deep LLM fusion with Qwen2.5-Omni-7B.

Personal Assistants

Philosophy, Not Just Data, Holds the Key to Deeper AI

This article argues for integrating philosophical principles into AI priming to achieve more profound and ethically sound artificial intelligence. Moving beyond data-centric training, it explores how philosophical frameworks can enable AI to generate more meaningful and contextually rich responses.

Training

Harness-1: Reinforcement Learning for Search Agents

Harness-1 introduces a novel approach to reinforcement learning for search agents through state-externalizing harnesses. This project, detailed in arXiv:2606.02373, provides a framework for advanced AI agent development.

Agentic Systems

How to Build an AI App-Builder with Sandboxed

Learn how to set up and use sandboxed, an open-source engine that powers AI app-builders by providing isolated cloud dev environments, built-in coding agents, and live preview links for multiple users on a single server. Understand its architecture and practical usage.

AI Coding

How to Delegate LLM Tasks with cc-fleet in Claude Code

Learn how to use cc-fleet to delegate tasks to various large language models (DeepSeek, GLM, Qwen, Kimi, MiniMax) within Claude Code. This guide covers installation, vendor registration, and running cc-fleet as a Claude Code teammate or a one-shot headless subagent, while protecting your primary credentials and keeping vendor API keys secure.