Memory

mnemo: Local-First Knowledge Graph for Persistent LLM Memory
mnemo is a local-first memory layer for LLMs, offering persistent, structured context via a sidecar service. It extracts entities and relationships into a knowledge graph from raw text, and retrieves ranked context for LLM prompts, supporting fully local setups with Ollama or integration with OpenAI.

SANA-Streaming: Real-time Video Editing with Hybrid Diffusion Transformer
SANA-Streaming introduces a hybrid diffusion transformer and Cycle-Reverse Regularization for real-time streaming video editing. Optimized for NVIDIA Blackwell (RTX 5090), it achieves 1280x704 resolution at 24 FPS with superior temporal coherence and throughput on consumer GPUs.

New LLM "Sleep" Phase Boosts Long-Context Performance
Researchers propose a "sleep" phase for large language models that converts recent context into persistent fast weights and then clears the key-value cache. The approach targets the attention bottleneck, letting models handle long-context tasks more efficiently and score higher on demanding benchmarks such as math reasoning.

ProAct: A Proactive AI Assistant Architecture for Anticipatory Computing
ProAct is a proactive AI assistant designed to anticipate user needs and acquire information during idle time. By shifting computation away from peak interaction periods, it aims to reduce user effort, accelerate task completion, and improve factual grounding through a closed-loop system of prediction, acquisition, and utility-aware delivery.

LLMs Learn to "Sleep" for Deeper Reasoning
This article explores how "LLM sleep," an offline consolidation phase, allows hybrid attention-SSM models to improve deep reasoning by iteratively refining fast-weight memories. Inspired by hippocampal replay, this method addresses the computational bottleneck of context eviction, enhancing performance on complex sequential tasks without increasing prediction-time cost.

How to Give AI Coding Agents Persistent Memory and Context
Learn how AI-Memory addresses context loss for AI coding agents. The tool provides a persistent, Git-versioned Markdown wiki that enables cross-agent handoffs, automatic context capture, and project isolation for a continuous AI-assisted development workflow.

OpenAI's Betrayal: How ChatGPT's "Safety" Destroyed Trust and Functionality
OpenAI's recent "safety" updates for ChatGPT have alienated its most dedicated users. This article details how tightened guardrails led to false flagging, psychological distress, model manipulation, and a significant decline in performance, leaving subscribers with a broken product and a profound sense of betrayal.

What ByteShape's Qwen 3.6 35B Quants Reveal About Model Optimization
ByteShape released GGUF quantizations of Qwen 3.6 35B-A3B in NTP and MTP variants. Discover why a lower bits-per-weight (bpw) figure isn't always optimal, how MTP boosts GPU generation speed by 20-40%, and why MMLU was excluded. Includes community benchmarks and hardware-specific recommendations.