Latest Articles

Scaling PEFT for Trillion-Parameter Personal Models
This article examines how Parameter-Efficient Fine-Tuning (PEFT) could scale to millions of personal models, each potentially reaching trillion-parameter scale. It covers the architectural and practical considerations involved in personalizing models efficiently at that scale.

How LFM2.5-8B-A1B Powers On-Device AI with Unmatched Throughput
LFM2.5-8B-A1B is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning. It performs competitively with larger models on instruction following and agentic tasks and claims unmatched throughput for CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Prompting Claude for Critical Feedback and Deeper Insights
Strategies drawn from users' experience for getting better results from Claude in writing and research. Learn how to prompt for critical feedback, manage long contexts effectively, and ask Claude to edit rather than generate from scratch to get more specific, insightful output.

Munder Difflin: Beyond The Office's Humor, a Serious Open-Source Multi-Agent System Emerges
Explore Munder Difflin, an open-source multi-agent system drawing inspiration from "The Office." This project offers a practical, distributed AI architecture, demonstrating how pop culture can spark serious software innovation.

ChatGPT's Memory System: Invasive, Irrelevant, or Inevitable?
ChatGPT's new memory system, which generates conversation summaries and carries them across chats, is drawing user criticism for being invasive, irrelevant, and disruptive to structured projects. Reported behaviors include continuous "gigantic summaries," meta-level statements, and context carried over from other chats, leaving users annoyed and frustrated by their lack of control.

How Science Superpowers Transforms AI Agents into Disciplined Scientific Collaborators
Science Superpowers guides AI agents through a rigorous, preregistered workflow for scientific collaboration, promoting precision and reproducibility and guarding against p-hacking. This guide explains how it works, highlights its design with zero third-party dependencies, and shows how to install it in agent harnesses such as Cursor, Claude Code, and Gemini CLI.

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer
SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it achieves 1280x704 resolution at 24 FPS with superior temporal coherence and throughput on consumer GPUs.

How UNISON Unifies Audio and Speech Generation with Deep LLM Fusion
UNISON is a unified latent flow-matching framework for audio and speech generation and editing. Using a single set of weights, it handles text-to-audio, text-to-speech, zero-shot speaker cloning, mixed speech-and-sound scene generation, and audio/speech-in-scene editing in one model, one architecture, and one forward pass, relying on deep LLM fusion with Qwen2.5-Omni-7B.

Philosophy, Not Just Data, Holds the Key to Deeper AI
This article argues for integrating philosophical principles into AI priming to achieve more profound and ethically sound artificial intelligence. Moving beyond data-centric training, it explores how philosophical frameworks can enable AI to generate more meaningful and contextually rich responses.

Harness-1: Reinforcement Learning for Search Agents
Harness-1 introduces a new approach to reinforcement learning for search agents built on state-externalizing harnesses. The project, described in arXiv:2606.02373, provides a framework for developing such agents.

How to Build an AI App-Builder with Sandboxed
Learn how to set up and use sandboxed, an open-source engine for AI app-builders that provides isolated cloud dev environments, built-in coding agents, and live preview links for multiple users on a single server. The guide also explains its architecture and practical usage.

How to Delegate LLM Tasks with cc-fleet in Claude Code
Learn how to use cc-fleet to delegate tasks to other large language models (DeepSeek, GLM, Qwen, Kimi, MiniMax) from within Claude Code. This guide covers installation, vendor registration, and running cc-fleet either as a Claude Code teammate or as a one-shot headless subagent, keeping your primary credentials protected and your vendor API keys securely managed.