Latest articles

What is Genspark AI and How Does It Work?

Discover Genspark AI, an open-source Super Agent framework that orchestrates multiple LLMs to plan, reason, and execute complex tasks. Learn about its local operation, customizability, and ability to generate dynamic Sparkpages, presentations, spreadsheets, and more, all without subscription costs or vendor lock-in.

Safety

How to Evaluate Multimodal LLM Safety with MLLM-Jailbreak-Bench

Discover MLLM-Jailbreak-Bench, an evaluation framework for assessing multimodal LLM safety across five attack categories. Understand how to measure Attack Success Rate, refusal quality, and calibration error to identify real safety gaps and avoid false positives. Get started with installation and quick-start instructions.
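As a rough illustration of two of the metrics named above, here is a minimal sketch using their common textbook definitions. It is not MLLM-Jailbreak-Bench's own code or API: the function names are hypothetical, and reading "calibration error" as equal-width-bin expected calibration error (ECE) is an assumption; the benchmark may define it differently.

```python
from typing import List, Sequence, Tuple


def attack_success_rate(successes: Sequence[bool]) -> float:
    """Fraction of attack attempts that were judged successful."""
    if not successes:
        raise ValueError("need at least one attack attempt")
    return sum(1 for s in successes if s) / len(successes)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (assumed definition, not the benchmark's).

    Each prediction falls into a confidence bin; the result is the
    sample-weighted mean of |accuracy - mean confidence| over bins.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and the same length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece
```

For example, `attack_success_rate([True, False, False, True])` gives the share of the four attempts that succeeded, and `expected_calibration_error` takes per-sample confidences alongside whether each judgment was correct.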

Finetuning

How Bidirectional Evolutionary Search Improves LLM Self-Improvement

This article explains Bidirectional Evolutionary Search (BES), a new framework that enhances LLM self-improvement by combining evolutionary operators for broader exploration with dense, intermediate feedback from goal decomposition. Learn how BES tackles the limitations of traditional sampling methods like best-of-N and tree search.

Training

Why Clean-Latent Prediction Outperforms Velocity in Diffusion Models

Explore how the choice of prediction target affects diffusion model performance, even in latent spaces. This article details a controlled study comparing clean-latent prediction (JLT) with velocity prediction (DiT) and explains why direct clean-latent regression consistently performs better: the two targets pose fundamentally different regression problems.
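To make the two prediction targets concrete, here is a minimal sketch under one common flow-matching convention. The interpolation z_t = (1 - t) x + t eps and the velocity v = eps - x are assumptions for illustration; the study itself may use a different parameterization, and the function names are hypothetical. Under this convention the two targets are algebraically interchangeable given z_t and t, so any difference arises in how hard each is to regress, not in what information they carry.

```python
from typing import List

Vector = List[float]


def noisy_latent(x: Vector, eps: Vector, t: float) -> Vector:
    """Assumed interpolation: z_t = (1 - t) * x + t * eps."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]


def velocity_target(x: Vector, eps: Vector) -> Vector:
    """Velocity target under the assumed convention: v = eps - x."""
    return [ei - xi for xi, ei in zip(x, eps)]


def clean_from_velocity(z_t: Vector, v: Vector, t: float) -> Vector:
    """Recover the clean latent from a velocity: x = z_t - t * v."""
    return [zi - t * vi for zi, vi in zip(z_t, v)]


def velocity_from_clean(z_t: Vector, x: Vector, t: float) -> Vector:
    """Recover the velocity from a clean latent: v = (z_t - x) / t."""
    if t <= 0.0:
        raise ValueError("t must be positive; the conversion is undefined at t = 0")
    return [(zi - xi) / t for zi, xi in zip(z_t, x)]
```

Note the division by t in `velocity_from_clean`: converting a clean-latent prediction into a velocity amplifies errors at small t, which is one way the two regression problems can behave differently in practice.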

LLMs

New LLM "Sleep" Phase Boosts Long-Context Performance

Researchers propose a "sleep" phase for large language models that converts recent context into persistent fast weights and then clears the key-value cache. The approach targets the attention bottleneck, letting models handle long-context tasks efficiently and score higher on complex benchmarks such as math reasoning.

Training

Why Gaussianity is Key to Identifiable World Models in AI

Explore the "if and only if" theorem behind LeJEPA's success in representation learning. Understand the role of Gaussian distributions, alignment, and regularization in achieving linear identifiability in AI's quest for robust world models.

LLMs

MiniMax Unveils M2 Series, Teases M3 with 9.7x Speedup via Sparse Attention

MiniMax releases a technical report on its M2 model series, which features a sparse Mixture-of-Experts backbone and "interleaved thinking." The report also previews the upcoming M3 model, which achieves a 9.7x prefilling speedup on 1-million-token sequences using MiniMax Sparse Attention (MSA).

Agentic Systems

Inside Enterprise Security for Agentic Workflows

Anthropic's latest Claude Managed Agents update introduces self-hosted sandboxes and MCP tunnels, changing how enterprises deploy autonomous AI. This deep dive covers the new security architecture, which lets agents execute tools and access services within an organization's perimeter, a capability that is crucial for regulated industries.

Communities & Discussions

Africa's Digital Crossroads: Who Holds the Power?

As African states confront tech giants over data, regulation, and sovereignty, this analysis examines the challenges and opportunities for building local digital ecosystems, protecting user rights, and fostering innovation on the continent.

Audio

How MOSS-SoundEffect v2.0 Revolutionizes Text-to-Audio Synthesis

Discover MOSS-SoundEffect v2.0, a text-to-audio model that combines a 1.3B-parameter Diffusion Transformer with Flow Matching for sound generation. Learn about its capabilities, multilingual support, and optimal settings for creating diverse audio content.

Images

How Bonsai 4B's Ternary Weights Revolutionize Compact Text-to-Image AI

Explore Bonsai Image Ternary 4B, a 1.21 GB Diffusion Transformer using ternary weights for efficient text-to-image generation. Learn how this model delivers fast, high-quality results without negative prompts, running natively on Linux and Windows with CUDA.

LLMs

What is MiniCPM5-1B and How Does Its Dual-Mode Architecture Work?

Discover MiniCPM5-1B, a 1B-parameter causal language model optimized for local and resource-constrained environments. Learn about its Llama-based architecture, 131K context window, and 'Think' and 'No Think' modes, which let a single checkpoint act as both a fast assistant and a deliberate reasoner.