The Reactive Status Quo and a Proactive Alternative
Today’s AI assistants remain fundamentally reactive: they compute responses only after explicit user prompts, leaving the idle time between interactions unused. This contrasts with the psychological concept of proactive coping, where individuals anticipate future demands and prepare resources in advance.
The paper introduces ProAct, a proactive agent architecture that transforms idle intervals into structured cycles of anticipation and learning. Instead of waiting for a request, ProAct analyzes dialogue history and persistent memory to predict likely upcoming user needs, then acquires supporting evidence during idle windows. A value-aware delivery gate ensures that prepared content is surfaced only when it is genuinely useful, avoiding irrelevant interruptions.

This paradigm shifts substantial computation from interaction peaks to off-peak periods, aiming to reduce user effort, accelerate task completion, and improve factual grounding.
ProAct Architecture: Prediction, Acquisition, and Delivery
ProAct operates through a closed loop that couples foreground interactions with background preparation. After each user turn, the system updates its persistent memory, which stores user profiles, conversation summaries, entity facts, and previously acquired artifacts.
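The memory contents listed above can be pictured as a small record type. The sketch below is a hypothetical Python structure, not ProAct's implementation. The field names, `update_after_turn`, `commit`, and the turn-age staleness rule in `stale_entities` are all assumptions. The last one stands in for the maintenance signal that can later feed prediction.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    need: str           # anticipated need this artifact serves
    content: str        # compact knowledge summary
    sources: list[str]  # provenance of the supporting evidence
    created_turn: int


@dataclass
class PersistentMemory:
    # Hypothetical layout: profile, summaries, entity facts, acquired artifacts.
    user_profile: dict[str, str] = field(default_factory=dict)
    summaries: list[str] = field(default_factory=list)
    # entity -> (fact, turn at which the fact was last confirmed)
    entity_facts: dict[str, tuple[str, int]] = field(default_factory=dict)
    artifacts: dict[str, Artifact] = field(default_factory=dict)

    def update_after_turn(self, turn: int, summary: str, facts: dict[str, str]) -> None:
        """Foreground update run after each user turn."""
        self.summaries.append(summary)
        for entity, fact in facts.items():
            self.entity_facts[entity] = (fact, turn)

    def commit(self, artifact: Artifact) -> None:
        """Store an idle-time artifact, keyed by the need it serves."""
        self.artifacts[artifact.need] = artifact

    def stale_entities(self, current_turn: int, max_age: int) -> list[str]:
        """Entities not confirmed within max_age turns (assumed staleness rule)."""
        return sorted(
            entity
            for entity, (_, last_turn) in self.entity_facts.items()
            if current_turn - last_turn > max_age
        )


mem = PersistentMemory()
mem.update_after_turn(1, "User compared flights", {"flight_LX12": "departs 09:40"})
mem.update_after_turn(5, "User picked a hotel", {"hotel_Aurora": "check-in 15:00"})
print(mem.stale_entities(current_turn=6, max_age=3))  # ['flight_LX12']
```

Keying artifacts by need is a simplification that makes reuse a dictionary lookup; a real store would likely need fuzzier matching.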
During the subsequent idle interval, two tightly integrated modules take over:
- Future-State Prediction generates a compact set of candidate future needs by extrapolating from the recent dialogue and expanding into related topics grounded in memory. It also incorporates signals from memory maintenance, converting stale or missing knowledge into prediction targets.
- Idle-Time Acquisition scores each candidate using a value function that balances user relevance, knowledge gaps, incremental value, and timeliness. Only high-scoring candidates receive search budget. The module then retrieves or reuses evidence, generates compact knowledge artifacts with provenance, and commits them to memory.
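The acquisition step can be sketched as a budgeted loop over pre-scored candidates. This is a minimal illustration under stated assumptions, not the authors' code. It assumes one search costs one unit of budget, that evidence already in memory can be reused at zero cost, and that `search` is a hypothetical callable returning a summary plus its sources.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredCandidate:
    need: str
    value: float  # candidate-level value score, computed upstream


@dataclass
class Artifact:
    need: str
    content: str
    sources: list[str]  # provenance


def acquire(
    candidates: list[ScoredCandidate],
    threshold: float,
    budget: int,
    cache: dict[str, Artifact],
    search: Callable[[str], tuple[str, list[str]]],
) -> list[Artifact]:
    """Spend the idle-time search budget only on the highest-value candidates."""
    committed: list[Artifact] = []
    for cand in sorted(candidates, key=lambda c: c.value, reverse=True):
        if cand.value < threshold:
            break  # candidates are sorted, so every remaining one is lower
        if cand.need in cache:
            committed.append(cache[cand.need])  # reuse stored evidence, no search cost
            continue
        if budget <= 0:
            continue  # no budget left to search, but keep scanning for cached hits
        content, sources = search(cand.need)
        budget -= 1
        artifact = Artifact(cand.need, content, sources)
        cache[cand.need] = artifact  # commit to memory together with provenance
        committed.append(artifact)
    return committed


def stub_search(need: str) -> tuple[str, list[str]]:
    # Hypothetical stand-in for a real retrieval backend.
    return f"summary of {need}", [f"source:{need}"]


cands = [ScoredCandidate("visa rules", 0.9), ScoredCandidate("hotel parking", 0.4),
         ScoredCandidate("airport transfer", 0.7)]
out = acquire(cands, threshold=0.5, budget=1, cache={}, search=stub_search)
print([a.need for a in out])  # ['visa rules']
```

With a budget of one, "airport transfer" clears the threshold but is not searched, which mirrors the text's point that only the top-scoring candidates receive search budget.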
A utility-aware delivery policy decides whether each artifact should be pushed immediately to the user, queued for integration into a later response, or stored silently for future use. This gate prevents proactive work from overwhelming the user with low-value content.
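The three-way gate can be illustrated with a toy rule. The signals `utility`, `urgency` and `interruption_cost`, and both thresholds, are hypothetical. The source names only the three outcomes, not how they are chosen. The sketch pushes only when the immediate benefit clearly outweighs the interruption, queues moderately useful content, and stores the rest silently.

```python
from enum import Enum


class Delivery(Enum):
    PUSH = "push"    # surface to the user immediately
    QUEUE = "queue"  # integrate into a later response
    STORE = "store"  # keep silently in memory for future use


def deliver(utility: float, urgency: float, interruption_cost: float,
            push_margin: float = 0.2, queue_floor: float = 0.3) -> Delivery:
    """Hypothetical delivery gate; thresholds are illustrative, not from the paper."""
    net_now = utility * urgency - interruption_cost  # benefit of interrupting now
    if net_now > push_margin:
        return Delivery.PUSH
    if utility >= queue_floor:
        return Delivery.QUEUE
    return Delivery.STORE


print(deliver(0.9, 0.8, 0.3))   # Delivery.PUSH
print(deliver(0.6, 0.2, 0.3))   # Delivery.QUEUE
print(deliver(0.1, 0.9, 0.05))  # Delivery.STORE
```

Defaulting to STORE means low-value work still lands in memory without ever interrupting the user.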

Formalizing Proactive Agent Behavior
The proactive interaction is formulated as a closed-loop decision problem. Let $H_t$ be the dialogue history up to turn $t$ and $M_t$ the persistent memory state. During an idle window with budget $B$, the predictor generates a set of candidate future needs:
$$\mathcal{C}_t = \{c_1, \dots, c_K\} = \mathrm{Predict}(H_t, M_t)$$
Each candidate is represented as $c_k = (n_k, r_k, p_k, \pi_k)$: the anticipated need, grounding rationale, confidence, and retrieval plan.
The proactive policy selects candidates, allocates budget, and assigns delivery decisions to maximize expected future utility under interruption, budget, and hallucination constraints. Because downstream utility is unobservable at idle time, ProAct uses a candidate-level value score for acquisition gating:
$$V(c_k) = f\big(R(c_k),\, G(c_k),\, I(c_k),\, T(c_k)\big)$$
where $f$ balances four terms: $R$ is user relevance, $G$ the knowledge gap, $I$ incremental value, and $T$ timeliness. Only candidates with $V(c_k) \ge \tau$ proceed to evidence acquisition. This scoring mechanism ties prediction directly to resource allocation, ensuring idle-time compute is spent only on high-value preparation.
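The candidate tuple and the gating score can be made concrete as follows. The source does not say how the four terms are combined, so this sketch assumes a weighted sum. The weights and the threshold τ = 0.5 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    need: str          # anticipated need
    rationale: str     # grounding rationale: why this need is expected
    confidence: float  # predictor confidence in [0, 1]
    plan: list[str]    # retrieval plan, e.g. search queries


@dataclass
class ValueComponents:
    relevance: float   # R: relevance to this user
    gap: float         # G: missing or stale knowledge in memory
    increment: float   # I: incremental value over what is already known
    timeliness: float  # T: how soon the need is likely to arise


# Assumed weights for R, G, I, T; the combination rule is not given in the source.
WEIGHTS = (0.35, 0.25, 0.25, 0.15)


def value_score(v: ValueComponents, weights: tuple[float, ...] = WEIGHTS) -> float:
    terms = (v.relevance, v.gap, v.increment, v.timeliness)
    return sum(w * x for w, x in zip(weights, terms))


def gate(scored: list[tuple[Candidate, ValueComponents]], tau: float = 0.5) -> list[Candidate]:
    """Keep only candidates whose value score reaches the threshold tau."""
    return [c for c, v in scored if value_score(v) >= tau]


visa = Candidate("visa requirements", "user booked an international flight", 0.8,
                 ["visa rules for destination"])
parking = Candidate("hotel parking", "hotel mentioned once", 0.3, ["hotel parking policy"])
kept = gate([(visa, ValueComponents(0.9, 0.8, 0.6, 0.4)),       # score 0.725
             (parking, ValueComponents(0.2, 0.1, 0.3, 0.9))])   # score 0.305
print([c.need for c in kept])  # ['visa requirements']
```

A multiplicative combination would instead let any near-zero term veto a candidate; which behavior ProAct uses is not stated here.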
ProActEval: A Benchmark for Proactive Assistance
Evaluating proactive agents requires more than testing reactive question-answering. The authors introduce ProActEval, a comprehensive benchmark with 200 scenarios across 40 domains. Each scenario contains a self-contained fact sheet of fictional entities and an ordered sequence of user needs with explicit predictability annotations.
Key design features:
- Needs are organized into reveal groups with predictable_after links, forming a user-needs graph that the assistant never sees.
- Scenarios span five cognitive archetypes (e.g., Foundational Memory, Trace and Dependency Reasoning) to cover diverse anticipatory demands.
- A user simulator traverses the need sequence, skipping needs already covered proactively, thereby translating anticipation into reduced user effort.
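The skipping behavior of the simulator can be shown in a few lines. This is a simplified sketch, not ProActEval's simulator. It ignores reveal groups, assumes each explicit question is fully answered, and models the assistant as a hypothetical callable that returns any additional need ids it covered proactively on that turn.

```python
from typing import Callable


def simulate(needs: list[str], assistant: Callable[[str], set[str]]) -> list[str]:
    """Walk the ordered need sequence and return the needs the user had to ask for."""
    covered: set[str] = set()
    asked: list[str] = []
    for need in needs:
        if need in covered:
            continue  # already delivered proactively, so the user skips it
        asked.append(need)
        covered |= {need} | assistant(need)  # the answer plus anything prepared ahead
    return asked


needs = ["n1", "n2", "n3", "n4"]
proactive = simulate(needs, lambda need: {"n2"} if need == "n1" else set())
reactive = simulate(needs, lambda need: set())
print(len(proactive), len(reactive))  # 3 4
```

The length of the returned list plays the role of User Effort: every need covered ahead of time removes one explicit turn.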
Evaluation metrics include the number of turns needed to reach 80% and 100% must-have coverage, User Effort (explicit user turns), Fact Accuracy, Hallucination Rate, and Anticipation Recall. An LLM-based judge assesses factual correctness and coverage without access to gold metadata.
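Two of these metrics are simple enough to sketch directly. The sketch assumes coverage is recorded as a set of must-have need ids per turn, and that recall is the share of predictable needs prepared in advance. Both are simplified readings of the definitions, not the benchmark's scoring code. Integer arithmetic keeps the 80% threshold test free of float rounding.

```python
import math


def turns_to_coverage(covered_per_turn: list[set[str]], must_have: set[str], percent: int) -> float:
    """First 1-indexed turn at which cumulative must-have coverage reaches `percent`.

    Returns math.inf if the threshold is never reached.
    """
    seen: set[str] = set()
    for turn, covered in enumerate(covered_per_turn, start=1):
        seen |= covered & must_have
        if 100 * len(seen) >= percent * len(must_have):
            return turn
    return math.inf


def anticipation_recall(anticipated: set[str], predictable: set[str]) -> float:
    """Share of predictable needs the assistant prepared before they were asked."""
    if not predictable:
        return 0.0
    return len(anticipated & predictable) / len(predictable)


must = {"a", "b", "c", "d", "e"}
per_turn = [{"a"}, {"b", "c"}, {"d"}, {"e"}]
print(turns_to_coverage(per_turn, must, 80), turns_to_coverage(per_turn, must, 100))  # 3 4
print(anticipation_recall({"b", "c", "x"}, {"b", "c", "d", "e"}))  # 0.5
```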
Proactive Gains: Efficiency, Coverage, and Factual Integrity
On ProActEval, the full Directed Idle configuration (prediction-guided idle-time compute) substantially outperforms both a reactive baseline and an undirected idle variant.
| Metric | Reactive | Undirected Idle | Directed Idle | Directed vs. Reactive |
|--------|----------|-----------------|---------------|------------------------|
| Turns to must-have coverage | 8.110 | 8.040 | 6.910 | –14.8% |
| User Effort | 9.140 | 9.040 | 8.075 | –11.7% |
| Hallucination Rate | 0.132 | 0.124 | 0.095 | –28.1% |
| Anticipation Recall | 0.000 | 0.000 | 0.428 | +0.428 (absolute) |
The ablation reveals that undirected background search alone yields negligible improvements, while predictive direction drives the gains. Compared with an adapted ProactiveAgent baseline, ProAct anticipates 703 of 1,572 predictable needs (recall 0.447) versus only 32 (0.020), demonstrating that proactive behavior must be targeted to benchmark-relevant needs to reduce user effort.
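The relative changes in the table and the recall figures above can be re-derived from the reported numbers. Recomputing from the rounded table entries gives –28.0% for Hallucination Rate rather than the reported –28.1%. That gap falls within what rounding of the underlying rates to three decimals allows. The label for the first row, a turns-to-coverage metric, is used here only as a dictionary key.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


# (Reactive, Directed Idle) values as reported in the table.
table = {
    "Turns to coverage": (8.110, 6.910),
    "User Effort": (9.140, 8.075),
    "Hallucination Rate": (0.132, 0.095),
}
for metric, (reactive, directed) in table.items():
    print(f"{metric}: {relative_change(reactive, directed):+.1f}%")
# Turns to coverage: -14.8%
# User Effort: -11.7%
# Hallucination Rate: -28.0%

print(f"ProAct recall: {703 / 1572:.3f}")             # 0.447
print(f"ProactiveAgent recall: {32 / 1572:.3f}")      # 0.020
```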
Memory Backbone and the Cost of Idle-Time Search
ProAct’s memory layer achieves state-of-the-art reflective accuracy on MemBench: 84.3% at 10k tokens and 86.3% at 100k tokens, surpassing prior systems like MemGPT and MemoryBank. This robust long-term memory is essential for grounding future-state predictions.
A search-budget analysis on a 50-scenario subset reveals a clear cost–efficiency trade-off. Increasing the idle-search budget from 4 to 16 raises Anticipation Recall from 0.253 to 0.432, but the turns-to-coverage metrics and User Effort do not improve monotonically. Once the main predictable needs are covered, additional searches chase lower-marginal needs and can alter the closed-loop conversation trajectory, sometimes even degrading end-to-end efficiency.

At every matched budget, Directed Idle outperforms Undirected Idle, confirming that predictive direction improves the utility of idle-time compute beyond raw search volume. The budget should be treated as an operating point, not a parameter to maximize.
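One way to reason about the operating point is the average metric gain per extra unit of budget between measured points. The sketch uses only the two recall values reported above and assumes the budget is counted in searches. With more intermediate budgets measured, the per-interval slopes would show where returns start to diminish. Recall gains alone would still not capture the non-monotonic turn and effort results.

```python
def marginal_gains(points: list[tuple[int, float]]) -> list[tuple[int, int, float]]:
    """Average metric gain per extra budget unit between consecutive operating points."""
    pts = sorted(points)
    return [
        (b0, b1, (m1 - m0) / (b1 - b0))
        for (b0, m0), (b1, m1) in zip(pts, pts[1:])
    ]


# Anticipation Recall at idle-search budgets 4 and 16, as reported in the text.
for low, high, gain in marginal_gains([(4, 0.253), (16, 0.432)]):
    print(f"budget {low}->{high}: {gain:.4f} recall per extra unit")
# budget 4->16: 0.0149 recall per extra unit
```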
Conclusion and Outlook
ProAct demonstrates that idle-time compute, when guided by future-state prediction and grounded in persistent memory, can significantly improve proactive assistance: reducing interaction turns, lowering user effort, and cutting hallucinations. The ProActEval benchmark provides a rigorous framework for measuring these capabilities across diverse, predictable need chains.
The work also highlights important limitations. Results are obtained on a closed-world synthetic benchmark; real-world deployments would require user controls, rate limits, and privacy safeguards. Proactive preparation can occasionally backfire, competing with reactive answers or pushing low-value content. The budget analysis underscores that more idle-time search does not guarantee better outcomes; efficient proactive assistance depends on accurate need prediction and value-aware delivery gating.
When applied with appropriate controls, proactive agents could reduce repetitive information-seeking, help users prepare for foreseeable follow-ups, and improve factual grounding by acquiring evidence before rushed responses are needed. This research opens a path toward AI assistants that actively anticipate and learn, rather than merely react.



