ALITA: Agent That Creates Its Own Tools

6/21/2025•Dishant Miyani•6 min read

An AI agent that starts with minimal tools and evolves by creating its own capabilities on the fly, achieving state-of-the-art results on the GAIA benchmark.

ALITA (Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution) represents a paradigm shift in AI agent architecture: instead of being a consumer of predefined tools, the agent becomes a creator of its own tools as needed.

Executive Summary of ALITA #

ALITA is a new type of AI agent designed to autonomously perform complex, open-ended tasks. Unlike many existing AI agents that rely on a large, manually predefined set of tools and workflows, ALITA is built on two core principles:

1. Minimal Predefinition #

ALITA starts with very few built-in tools. Essentially, its main predefined capability is a “Web Agent” that can search the internet.

2. Maximal Self-Evolution #

ALITA is designed to create its own tools as needed. When faced with a task it can’t solve with its existing (limited) capabilities, it:

Brainstorms: Figures out what kind of tool or capability it’s missing
Searches: Uses its Web Agent to find open-source code, libraries, or documentation that could help build that tool
Generates: Writes new code (typically Python scripts) to implement the needed functionality
Tests & Packages: Runs the new code in an isolated environment. If it works, it packages the script as a tool exposed through the Model Context Protocol (MCP); the paper refers to each packaged tool simply as an “MCP”
Reuses: Stores these self-created MCPs in an “MCP Box” for future use by itself or potentially other agents

This approach allows ALITA to be highly adaptable, scalable, and generalizable across different domains without developers needing to anticipate and pre-program every possible tool. The paper shows ALITA achieving top-ranking performance on the GAIA benchmark, outperforming more complex systems.

Key Research Insights #

Core Philosophy #

“Simplicity is the ultimate sophistication.” The design emphasizes a lean starting point and dynamic capability generation.

Architecture Components #

Manager Agent: The central coordinator that analyzes tasks, decides if new MCPs are needed, and orchestrates the MCP creation process or uses existing MCPs.

Web Agent: Searches for information, libraries, and code.

MCP Creation Component: Includes sub-modules for brainstorming, script generation, code running, and environment management (e.g., creating isolated Conda environments).

MCP (Model Context Protocol) Creation Process #

MCPs are essentially self-generated, reusable tools or scripts. Each one is created through the following steps:

MCP Brainstorming
Open-source Searching (via Web Agent)
Script Generation
Virtual Environment Execution & Testing
Encapsulation as MCP

This allows the agent to learn and expand its toolkit over time in a self-reinforcing cycle.

Performance Results #

Achieves state-of-the-art results on the GAIA benchmark (75.15% pass@1), outperforming systems like OpenAI’s Deep Research
Shows strong performance on MathVista and PathVQA
Distillation Effect: MCPs generated by ALITA (using powerful LLMs like GPT-4o) can be used by simpler agents, significantly boosting their performance

What Makes This Game-Changing? #

The revolutionary aspect is the shift from agents being consumers of predefined tools to agents being creators of their own tools on-the-fly.

Key Advantages #

Scalability & Adaptability: Traditional agents hit a wall because you can’t pre-build a tool for every conceivable task. ALITA’s approach means it can adapt to entirely new problems by finding and integrating existing open-source solutions.

Reduced Development Overhead: Developers don’t need to spend time building and maintaining a vast library of tools. Instead, they focus on building the meta-capability of tool creation.

Emergent Capabilities: The agent can develop capabilities that the original designers might not have thought of, simply by combining information and code it finds in novel ways.

Democratization of Tooling: The MCPs, once created, are standardized and can potentially be shared and reused across different agent systems.

This “maximal self-evolution” through dynamic tool generation is a significant step towards more autonomous and truly “generalist” agents.

Future Applications #

1. More Powerful Generalist Assistants #

As LLMs improve, ALITA-like agents will become even better at creating more complex and reliable tools, tackling sophisticated tasks like complex data analysis, automated scientific experimentation, and advanced software development assistance.

2. Personalized Agent Ecosystems #

Users could have agents that build up a unique set of MCPs tailored to their specific needs, software, and workflows, becoming highly personalized and efficient assistants.

3. Automated Software Development #

Agents could not only write code for specific functions but also find, adapt, and integrate existing libraries and APIs to build more complex applications with less human intervention.

4. Rapid Prototyping #

Need a quick script to convert file formats, scrape a specific website, or automate a repetitive task? An ALITA-like agent could generate it for you.

5. Collaborative Agent Networks #

Agents could share their self-generated MCPs, leading to a collective intelligence where the capabilities of the entire network grow much faster.

Practical Implementation for Developers #

Even without implementing the full ALITA framework, developers can adopt the core principles:

1. Embrace Dynamic Tool Creation #

Design your agent to recognize when it’s missing a capability
Have the LLM describe the tool it needs (inputs, outputs, function)
Search for existing Python libraries, code snippets, or APIs
Use LLMs to generate Python scripts based on task specifications
Always test in isolated environments (Docker containers, virtual environments)
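
As a rough sketch of the last point, generated code can at least be run in a separate interpreter with a timeout and a scratch working directory. The helper name `run_in_isolation` is hypothetical, and a child process is only a first layer of isolation: it does not restrict network or file system access the way a Docker container would.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_isolation(source: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run generated code in a child interpreter inside a temporary directory.

    Returns (succeeded, combined stdout and stderr).
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate_tool.py"
        script.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                # -I is Python's isolated mode: it ignores PYTHON* environment
                # variables and the user site-packages directory.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s}s"
        return result.returncode == 0, result.stdout + result.stderr


ok, output = run_in_isolation("print(2 + 2)")
```

The boolean result gives the agent a simple signal for deciding whether to keep a generated script or regenerate it, and the captured output can be fed back to the LLM as an error report.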

2. Build a Reusable “Toolbox” #

Save working scripts with descriptions
Create a simple system to store these scripts (folder of Python files with JSON manifest)
Teach your agent to check its “toolbox” before generating new tools
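
One possible shape for such a toolbox is a folder of scripts indexed by a `manifest.json` file. The `Toolbox` class, the manifest layout, and the keyword lookup below are all assumptions made for illustration; a real agent might instead ask the LLM or use embeddings to match a task to a stored tool.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class Toolbox:
    """Folder of Python scripts indexed by a manifest.json file."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.manifest_path = self.root / "manifest.json"

    def _load(self) -> dict:
        if self.manifest_path.exists():
            return json.loads(self.manifest_path.read_text(encoding="utf-8"))
        return {}

    def save(self, name: str, description: str, source: str) -> Path:
        """Write the script and record its description in the manifest."""
        script_path = self.root / f"{name}.py"
        script_path.write_text(source, encoding="utf-8")
        manifest = self._load()
        manifest[name] = {"description": description, "file": script_path.name}
        self.manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
        return script_path

    def find(self, query: str) -> Optional[str]:
        """Return the first tool whose name or description contains every query word."""
        words = query.lower().split()
        if not words:
            return None
        for name, entry in self._load().items():
            haystack = f"{name} {entry['description']}".lower()
            if all(word in haystack for word in words):
                return name
        return None


# Usage: check the toolbox before generating anything new.
box = Toolbox(Path(tempfile.mkdtemp()) / "toolbox")
box.save("csv_to_json", "Convert a CSV file to JSON", "print('placeholder')\n")
existing = box.find("csv json")  # matches the saved tool
missing = box.find("weather")    # no match, so a new tool would be needed
```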

3. Focus on “Manager Agent” Logic #

Design prompts that encourage the LLM to:

Break down complex problems
Identify when existing tools are sufficient
Recognize when new tools are needed
Plan the steps to create or use tools

Example Workflow #

User: "Find current weather in London and summarize top 3 AI news articles"

Manager Agent:
1. "Do I have a weather tool?" → If no, create one using weather API
2. "Do I have a news search tool?" → If no, create one
3. "Do I have a summarizer?" → If no, create one
4. Execute tools and combine results
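
The check-then-create logic in this workflow might look like the following sketch. All names here (`ensure_tools`, `handle_request`, `stub_factory`) are hypothetical, the tool factory is a stub rather than real code generation, and each step runs independently; feeding one tool's output into the next is left out for brevity.

```python
from typing import Callable

Tool = Callable[[str], str]


def ensure_tools(
    required: list[str],
    toolbox: dict[str, Tool],
    create_tool: Callable[[str], Tool],
) -> list[str]:
    """Create any missing capability and return the names that were created."""
    created = []
    for capability in required:
        if capability not in toolbox:                      # "Do I have a ... tool?"
            toolbox[capability] = create_tool(capability)  # "If no, create one"
            created.append(capability)
    return created


def handle_request(
    steps: list[tuple[str, str]],
    toolbox: dict[str, Tool],
    create_tool: Callable[[str], Tool],
) -> str:
    """Make sure every needed tool exists, run each step, and combine the results."""
    ensure_tools([capability for capability, _ in steps], toolbox, create_tool)
    results = [toolbox[capability](argument) for capability, argument in steps]
    return "\n".join(results)


def stub_factory(capability: str) -> Tool:
    # Placeholder: a real factory would generate, test, and package code.
    return lambda argument: f"[{capability}] {argument}"


# The toolbox already has a summarizer; the other two tools get created.
toolbox: dict[str, Tool] = {"summarizer": lambda text: f"summary of {text}"}
steps = [
    ("weather", "London"),
    ("news_search", "AI"),
    ("summarizer", "top 3 AI news articles"),
]
report = handle_request(steps, toolbox, stub_factory)
```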

By adopting this mindset of dynamic capability generation, agents can become more flexible and powerful without developers having to foresee every possible need.

Key Limitation #

The ability of ALITA to generate useful MCPs is highly dependent on the coding and reasoning capabilities of the underlying Large Language Model. When run with weaker LLMs, performance drops significantly, though this gap is likely to narrow as LLMs advance.