SkillOpt: Optimizing Agent Skills with Trainable Natural-Language Descriptions

Microsoft Research's text-space optimizer enables self-evolving agent capabilities, demonstrated in a multimodal paper-figure extraction task.

June 10, 2026

#Agents #Automation #Framework #LLM #Open Source

SkillOpt, from Microsoft Research, is a text-space optimizer that treats agent skill documentation as trainable external state. This approach lets agents improve their capabilities without retraining the underlying model, as shown by @omarsar0's integration, which raised a paper-figure extraction quality score from 0.73 to 0.93.

The Silent Killer of AI Agent Performance

Many AI agent failures trace back not to weak language models but to poorly written agent skill documentation. Hand-crafting skill documents has become the default: authors write descriptions of how an agent should behave, then hope those instructions generalize across tasks. The SkillOpt team at Microsoft Research puts it cautiously but pointedly: this manual approach is “probably not optimal.”

SkillOpt reframes the problem. Instead of treating agent skill docs as static text authored once, it treats them as a trainable external state, so an agent’s abilities can be improved continuously without touching the frozen model underneath. The project, openly available on GitHub at microsoft/SkillOpt, offers a glimpse into a future where your agent’s skill documentation evolves on its own.

From Static Manuals to Trainable External State

The core insight behind SkillOpt is that natural-language skill descriptions are just long pieces of text — and text can be optimized. SkillOpt operates as a text-space optimizer, searching for better wordings that improve downstream task performance. It keeps the underlying agent model frozen and only modifies the reusable skill descriptions.
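To make the idea concrete, here is a minimal sketch of what searching over wordings with a frozen model can look like. It is not SkillOpt's actual algorithm or API. The names `optimize_skill_text`, `toy_propose`, and `toy_score` are hypothetical, and the scoring function is a toy stand-in for running the frozen agent on evaluation tasks.

```python
from typing import Callable, List


def optimize_skill_text(
    skill_text: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    rounds: int = 5,
) -> str:
    """Greedy text-space search: keep a rewrite only if it scores higher.

    The agent model is never modified; only the skill text changes.
    """
    best_text, best_score = skill_text, score(skill_text)
    for _ in range(rounds):
        for candidate in propose(best_text):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_text, best_score = candidate, candidate_score
    return best_text


# Hypothetical stand-ins. In practice, `propose` would likely ask an LLM for
# rewrites and `score` would run the frozen agent on a set of test tasks.
HINTS = ["Include tables.", "Report the page number.", "Keep captions."]


def toy_propose(text: str) -> List[str]:
    # Offer one candidate per hint that the text does not contain yet.
    return [f"{text} {hint}" for hint in HINTS if hint not in text]


def toy_score(text: str) -> float:
    # Fraction of hints present in the text (a toy quality proxy).
    return sum(hint in text for hint in HINTS) / len(HINTS)


start = "Extract figures from the paper."
improved = optimize_skill_text(start, toy_propose, toy_score, rounds=3)
print(improved)
```

The greedy acceptance rule is the simplest possible choice. A real optimizer could use a different search strategy, and the source does not say which one SkillOpt uses.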

This is a radical departure from the prevailing workflow of writing agent instructions and moving on. In the SkillOpt paradigm, the skill documents become a state machine of sorts: an externally adjustable configuration that guides the agent’s behavior. The optimizer iteratively refines that state, making skills measurably sharper. Whatever agent framework you use, whether it follows Anthropic’s Claude agent skills format or is a custom orchestrator, SkillOpt plugs in as a generic, task-agnostic improver.


A Real-World Integration

Developer Elvis (@omarsar0) put SkillOpt to the test just days after its public mention. He integrated the optimizer into his own agent orchestrator and saw an immediate shift. His agent skills suddenly had a proper testing framework and the capacity to self-evolve. Rather than guessing whether a skill description was good enough, he could now run SkillOpt and watch it automatically produce better variants.

This wasn’t a theoretical exercise. The integration revealed that even skill documents that “looked fine” to a human could be optimized significantly. The agent’s outputs became more reliable after each round of text-space optimization. The process turned skill authoring from an art into a measurable, test-driven improvement loop.

Case in Point: Extracting Figures from Papers

A concrete example highlights the leap. The test task involved multimodal analysis — extracting figures and tables from academic papers. The metric was a straightforward quality score.

Task	Metric	Before	After	Improvement
Paper figure/table extraction	Quality score	0.73	0.93	+0.20

That is an absolute gain of 0.20 on the quality score after SkillOpt optimization (20 points if the score is read on a 100-point scale, or roughly 27% relative to the 0.73 baseline). This wasn’t achieved by swapping the underlying model or adding more data. It came solely from refining the skill description, the text that tells the agent how to perform the task. The result suggests how much latent performance can be trapped inside today’s agents simply because their documentation is imprecise.
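The arithmetic behind the table can be checked in a few lines. The only inputs are the two scores reported above. The variable names are illustrative.

```python
# Scores reported in the table above.
before, after = 0.73, 0.93

absolute_gain = after - before           # difference in score units
relative_gain = absolute_gain / before   # gain as a share of the baseline

print(f"absolute: {absolute_gain:+.2f}")  # absolute: +0.20
print(f"relative: {relative_gain:+.1%}")  # relative: +27.4%
```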

Skill Documentation as a State Machine

SkillOpt effectively turns agent skill documentation into a dynamic, optimizable component. The analogy of a state machine fits naturally: the docs are no longer a static manual but a continuously updated external state that governs the agent’s decision flow. Every round of optimization adjusts that state to produce better outcomes.

This shift has profound implications. Until now, agent skill docs have been treated as fixed artifacts. With SkillOpt, they become living, trainable assets. The optimizer can be run whenever new evaluation data arrives, keeping documentation aligned with real-world requirements. For the broader community, it means that maintaining an agent’s skill library is no longer a hand-crafting chore — it’s an automated, quality-driven process.
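One way to picture re-running the optimizer as new evaluation data arrives is a simple acceptance gate: a candidate skill text replaces the current one only if it scores better on the fresh cases. The sketch below is an assumption about how such a gate could look, not something taken from the SkillOpt repository. `accept_update`, `toy_evaluate`, and the `min_gain` threshold are hypothetical.

```python
from typing import Callable, Sequence


def accept_update(
    current: str,
    candidate: str,
    evaluate: Callable[[str, Sequence[str]], float],
    holdout: Sequence[str],
    min_gain: float = 0.01,
) -> str:
    """Return the candidate only if it beats the current text by min_gain."""
    current_score = evaluate(current, holdout)
    candidate_score = evaluate(candidate, holdout)
    if candidate_score - current_score >= min_gain:
        return candidate
    return current


def toy_evaluate(text: str, cases: Sequence[str]) -> float:
    # Toy proxy: fraction of required keywords the skill text mentions.
    # A real evaluator would run the frozen agent on each case instead.
    lowered = text.lower()
    return sum(case in lowered for case in cases) / len(cases)


new_cases = ["figure", "table", "caption"]
current = "Extract every figure."
better = "Extract every figure and table, keeping each caption."
worse = "Summarize the paper."

print(accept_update(current, better, toy_evaluate, new_cases))  # keeps `better`
print(accept_update(current, worse, toy_evaluate, new_cases))   # keeps `current`
```

Gating on held-out cases rather than the cases used during optimization guards against a rewrite that only fits the examples it was tuned on.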

The Road Ahead for Self-Optimizing Agents

SkillOpt challenges the assumption that human-written skill descriptions are good enough. The evidence reported so far, from the research team and from at least one independent integration, suggests that text-only changes to a skill description can produce large performance gains. As agent frameworks increasingly adopt patterns like Anthropic’s Claude agent skills, text-space optimization is likely to become a mainstream need.

The optimizer is already public on GitHub at microsoft/SkillOpt. It signals a turning point: we are moving from hand-built agent behaviors to self-evolving skill documentation that gets better every time you run a test. Agents won’t just follow instructions — they will constantly refine the very instructions that define them.