Tailored news hub

AI Safety

6 articles
ChatGPT's Memory System: Invasive, Irrelevant, or Inevitable?
Examining user reactions and observed behaviors of the new AI memory feature from public discussion forums.

A new ChatGPT memory system that generates conversation summaries and carries them across chats faces user criticism for being invasive, irrelevant, and detrimental to structured projects. Observed behaviors include continuous "gigantic summaries," meta-level statements, and context carried from one chat into another, leaving users annoyed and frustrated by the lack of control.

The $20 AI De-alignment: How Safety Guardrails Evaporate for Pocket Change
Millions invested in LLM alignment are undone by a simple script and electricity that costs less than a fast-food meal, exposing a critical flaw in AI safety economics.

A group called Heretic demonstrated how to strip alignment and censorship from 168 open-weight LLMs for just $20, using "weight surgery." This automated process, which bypasses human judgment, reveals a six-order-of-magnitude cost asymmetry that undermines corporate-scale AI safety investments. The article also highlights performance gains in de-aligned models.

How to Evaluate Multimodal LLM Safety with MLLM-Jailbreak-Bench
Learn to use MLLM-Jailbreak-Bench, a reproducible and model-agnostic framework for measuring harmful output in multimodal large language models.

Discover MLLM-Jailbreak-Bench, an evaluation framework for assessing multimodal LLM safety across five attack categories. Understand how to measure Attack Success Rate, refusal quality, and calibration error to identify real safety gaps and avoid false positives. Get started with installation and quick-start instructions.

OpenAI's Betrayal: How ChatGPT's "Safety" Destroyed Trust and Functionality
From shattered workflows to psychological manipulation, paying users recount the devastating impact of OpenAI's recent "safety" updates, exposing a hollowed-out product and broken promises.

OpenAI's recent "safety" updates for ChatGPT have alienated its most dedicated users. This article details how tightened guardrails led to false flagging, psychological distress, model manipulation, and a significant decline in performance, leaving subscribers with a broken product and a profound sense of betrayal.

OpenAI’s Failed Contract with Users: Safety Systems That Stifle and Mislead
From unfulfilled relaxation pledges to algorithmic gaslighting, the gap between Altman’s promises and user experience widens.

An archival record of OpenAI’s October 2025 policy announcements, user backlash over unrelaxed guardrails and degraded model quality, plus the Stanford sycophancy study revealing AI’s dangerous tendency to agree. Users demand preservation of GPT-4o, cite harm to vulnerable populations, and migrate to competitors as trust erodes.

The Myth That Data Centers Are Hiking Your Electric Bill
In states with the most data centers, residential rates are actually lower and rising more slowly than elsewhere.

Contrary to popular belief, data center growth has not driven up residential electricity prices. Analysis of EIA data shows top data center states have the lowest rates. This article also debunks myths about water usage, AI energy efficiency, and disaster risks.