
We Gave Our AI Agent 26 Tools. Here's Why That's the Right Number. (2026)

Vercel removed 80% of their agent's tools. We kept 26. How to design AI agent tool sets — when more tools are better and when fewer wins.

April 17, 2026 · 24 min read · Stan Chang · AI · #engineering #ai-agents #tool-design
Vercel removed 80% of their AI agent's tools and got better results. We tried the same thing and got worse. The difference: Vercel's agent crosses one system boundary, Taskade's EVE crosses four. 26 tools, one per system operation, zero overlap. Mode-based filtering narrows the action space per context. [Build with AI agents →](/agents)

TL;DR: The "fewer tools is better" narrative is only half right. It works for single-system agents. Multi-system orchestrators like Taskade need one tool per system operation — 26 in our case, across 8 categories spanning Projects, Agents, Automations, and Apps. Mode-based filtering gives each context only the tools it needs. Try Taskade Genesis free →

In January 2026, Vercel published a post titled "We removed 80% of our agent's tools." It went viral in engineering circles. The message was clean and compelling: fewer tools equals better agent performance. Developers on Hacker News celebrated. The consensus formed quickly — tool sprawl is the enemy of agent accuracy.

We read it the day it dropped. And then we tried it.

We stripped our AI agent down from 26 tools to 6 core tools. The results were immediate. Our agent stopped creating automations when users asked for them. It could not wire knowledge bases to agents. It lost the ability to orchestrate multi-agent conversations. In other words, it lost the capabilities that make it useful.

We added the tools back within a week.

This post is not a rebuttal of Vercel's conclusion. Their conclusion was correct — for their agent. But the universal application of "fewer tools is better" is wrong. The right number of tools is a function of how many system boundaries your agent crosses, not a universal constant. Here is our tool taxonomy, why each tool exists, and the design principles that make 26 work without confusion.


Why Vercel Was Right (For Vercel)

Vercel's agent operates within a single system: a code editor. It reads files, writes files, and runs commands. Within that single domain, having multiple tools that modify files — "edit file," "replace in file," "write file," "patch file" — creates genuine confusion. The agent cannot reliably distinguish when to use "edit" versus "replace" because both tools operate on the same system in the same way.

Reducing to the minimal set that covers the system boundary is the correct move for single-system agents. If your agent does one thing, give it the fewest tools needed to do that thing well.

*[Diagram: Vercel's single-system agent needs only read_file, write_file, and run_command against one code editor, while Taskade's EVE agent spans eight tool categories: Projects (4), Agents (5), Automations (4), Apps (3), Content (3), Navigation (3), Meta (2), and System (2).]*

The lesson Vercel taught is real: do not give an agent multiple tools that do the same thing. We agree with this completely. But the corollary is just as important: do not remove tools that do different things across different systems just because the total count feels high.


Why We Need 26 (For Taskade)

Taskade's EVE agent orchestrates four distinct systems: Memory (Projects), Intelligence (Agents), Execution (Automations), and Interface (Apps). This is what we call Workspace DNA — the self-reinforcing loop where Memory feeds Intelligence, Intelligence triggers Execution, and Execution creates Memory.

Each system has operations that cannot be combined or abstracted away:

  • You cannot use a project tool to manage agents. Projects store data. Agents process it. Different systems, different schemas, different lifecycles.
  • You cannot use an automation tool to edit app code. Automations orchestrate workflows across 100+ integrations. App editing manipulates files in a virtual filesystem. Completely separate domains.
  • You cannot use a file editor to configure knowledge bases. Knowledge bases connect to agents through a specific wiring API. A text editor does not know what a knowledge base is.

When you break down 26 tools across four system boundaries, you get roughly 6-7 tools per system. That is a perfectly reasonable number for any single domain. The tools do not overlap — each maps to exactly one system operation.

The principle: one tool per system operation. If two tools operate on the same system in the same way, merge them. If they operate on different systems, keep them separate.

This is the distinction the "fewer tools" narrative misses. Vercel has one system boundary. We have four. The math is different.


The 26-Tool Taxonomy

Here is every tool our agent has access to, grouped by the system boundary it operates on. No tool touches two systems. No two tools do the same thing.

| Category | Count | Tools | System Boundary |
|---|---|---|---|
| Navigation | 3 | vfs_manager, taskade_navigate, taskade_retrieve_entities | Workspace filesystem |
| Memory (Projects) | 4 | project_manage, project_retrieve, project_search, project_content | Project data CRUD |
| Intelligence (Agents) | 5 | agent_manage, agent_chat, agent_team_chat, agent_configure, knowledge_connect | Agent lifecycle |
| Execution (Automations) | 4 | automation_manage, automation_resolve, automation_retrieve, workflow_action_as_tool | Flow lifecycle |
| Interface (App) | 3 | str_replace_editor, file_tree_manage, app_preview | App code editing |
| Content Creation | 3 | gen_image, web_search, fetch_webpage | External content |
| Meta | 2 | retrieve_entities, retrieve_media_file | Workspace introspection |
| System | 2 | app_logs_review, deploy_app | App operations |

Let me walk through each category and explain why every tool earns its place.

Navigation (3 Tools) — How the Agent Sees the Workspace

Navigation tools give the agent spatial awareness. Without them, the agent is blind — it does not know what exists in the workspace or where it is.

  • vfs_manager — Navigate the Virtual Filesystem. The workspace is represented as a file tree: spaces are directories, projects are files, folders are subdirectories. This tool lets the agent ls, cd, and pwd through the workspace structure.
  • taskade_navigate — Move between spaces, folders, and projects. While vfs_manager treats the workspace as a filesystem, taskade_navigate uses the Taskade domain model (spaces, folders, projects) with proper permissions and visibility rules.
  • taskade_retrieve_entities — List available resources in the current scope. This is the agent's "what can I work with?" query. It returns typed entities (projects, agents, automations, apps) with metadata.

Why three and not one? Because navigating a filesystem, navigating a domain model, and querying available entities are three distinct operations. Merging them would create a single tool with a complex, overloaded interface — exactly the kind of design that Vercel correctly removed.
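To make the distinction concrete, here is a minimal sketch of three narrow navigation tools. The tool names come from the taxonomy above, but the definition shape and descriptions are illustrative assumptions, not Taskade's actual API:

```typescript
// Hypothetical definition shape, not Taskade's actual API:
// three narrow interfaces instead of one overloaded "navigate" tool.
type ToolDef = { name: string; description: string };

const navigationTools: ToolDef[] = [
  {
    name: "vfs_manager",
    description:
      "Navigate the workspace as a filesystem: list, enter, and locate spaces (directories) and projects (files).",
  },
  {
    name: "taskade_navigate",
    description:
      "Move between spaces, folders, and projects via the domain model, respecting permissions and visibility.",
  },
  {
    name: "taskade_retrieve_entities",
    description:
      "List typed entities (projects, agents, automations, apps) available in the current scope.",
  },
];

// A merged tool would need a mode discriminator plus the union of all three
// parameter sets: exactly the kind of overloaded interface worth avoiding.
```

Each definition stays small and self-describing; merging them would push the complexity into a single tool's parameters, where the agent is most likely to get confused.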

Memory (4 Tools) — How the Agent Manages Project Data

Memory tools operate on Projects, which are the fundamental data unit in Taskade. Projects store tasks, notes, mind maps, documents — anything the user creates.

  • project_manage — Create, update, and delete projects. This is the CRUD tool for project lifecycle operations.
  • project_retrieve — Read project contents and metadata. Separate from project_manage because reads and writes have different permission models, different caching behaviors, and different error modes.
  • project_search — Find projects by content, title, or metadata. Search is not a read operation. It crosses project boundaries, uses different indexes, and returns ranked results rather than exact data.
  • project_content — Bulk content operations like import and export. This handles operations that act on project content as a whole rather than individual items.

Intelligence (5 Tools) — How the Agent Manages Other Agents

Intelligence tools manage AI agents — creating them, configuring them, and orchestrating conversations between them. This is the largest category because agent lifecycle management has the most distinct operations.

  • agent_manage — Create, configure, and delete agents. The CRUD baseline for agent lifecycle.
  • agent_chat — Talk to another agent. This is the inter-agent communication channel. EVE uses it to delegate subtasks to specialized agents.
  • agent_team_chat — Orchestrate multi-agent conversations. Different from agent_chat because it manages a conversation with multiple participants, handling turn-taking and context sharing.
  • agent_configure — Set agent parameters, tools, and personality. Separate from agent_manage because configuration is a deep operation with its own schema (system prompts, tool selections, model preferences, temperature settings).
  • knowledge_connect — Wire knowledge bases to agents. Knowledge bases are their own entity type with their own lifecycle. Connecting one to an agent is a relationship operation, not a configuration operation.

Execution (4 Tools) — How the Agent Manages Automations

Execution tools manage automation workflows — the system that handles triggers, actions, and integrations.

  • automation_manage — Create, update, and delete automation flows.
  • automation_resolve — Trigger and monitor flow execution. This is the "run this workflow and tell me what happened" tool.
  • automation_retrieve — Read flow definitions and execution history. Separate from automation_manage because reading flow history is a fundamentally different operation from modifying a flow.
  • workflow_action_as_tool — Invoke any automation action directly as a tool. This is the bridge pattern. More on this in a dedicated section below.

Interface (3 Tools) — How the Agent Builds Apps

Interface tools manipulate Genesis app code — the files that make up a published application.

  • str_replace_editor — Edit individual files in the app using targeted string replacements. Precise, predictable, and idempotent.
  • file_tree_manage — Create and delete files and directories. Structural operations on the app's file system.
  • app_preview — Generate a preview of the current app state. The agent needs to see what it built. This is a read-only rendering operation with no write side effects.

Content Creation (3 Tools) — How the Agent Gets External Content

Content tools reach outside the workspace to get information the agent does not already have.

  • gen_image — Generate images via AI models. Used when building apps or projects that need visual content.
  • web_search — Search the web for information. The agent uses this when it needs facts, data, or context that is not in the workspace.
  • fetch_webpage — Retrieve content from a specific URL. Different from web_search because it fetches a known page rather than discovering pages.

Meta + System (4 Tools) — Support Operations

  • retrieve_entities — Workspace-wide introspection across entity types. Returns metadata about the workspace structure itself.
  • retrieve_media_file — Access media files (images, documents) stored in the workspace.
  • app_logs_review — Review application logs for debugging. Essential for the build-test-fix loop in Genesis app development.
  • deploy_app — Deploy the current app to production. The final step in the Genesis build pipeline.

Mode-Based Tool Filtering

Here is the part that makes 26 tools work in practice. Not all 26 tools are available all the time. We use mode-based filtering to dynamically adjust which tools the agent can access based on the current task.

| Mode | Tools Available | Use Case |
|---|---|---|
| Genesis | 26 (all) | Creating a complete app from scratch — full orchestration across all four systems |
| Projects | 8 | Managing tasks, notes, and project data — only project-related tools |
| Agents | 6 | Configuring and chatting with agents — only agent-related tools |
| Automations | 5 | Building and managing workflows — only automation-related tools |

In Projects Mode, the agent sees 8 tools, not 26. It cannot accidentally try to create an automation when the user asked to manage tasks. It cannot attempt to deploy an app when the user is organizing a mind map. Mode filtering narrows the agent's action space to match the user's intent.

This is the key insight that reconciles the "fewer tools" camp with our 26-tool architecture: you can have many tools total while showing few tools per context. The agent gets the accuracy benefits of a narrow tool set AND the capability benefits of a broad one.

Barry Zhang from Anthropic warns in his talk "How We Build Effective Agents" that "cost and latency go up with agency." Mode filtering is our direct answer. When the user is in Projects Mode, the agent reasons over 8 tools instead of 26. Fewer options mean fewer tokens spent on tool selection reasoning, faster responses, and fewer errors.

The result is context-appropriate tool availability. Users in Genesis mode get maximum capability for building full applications. Users managing tasks in a project get a focused, reliable agent that does exactly what they ask.
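The filtering itself can be sketched in a few lines. The mode names, partial tool list, and prefix-based whitelists below are illustrative assumptions, not Taskade's internal implementation:

```typescript
// Sketch of mode-based filtering: each mode whitelists the tool names the
// agent may see. Names are from the taxonomy; the filters are invented.
type Mode = "genesis" | "projects" | "agents" | "automations";

const ALL_TOOLS = [
  "project_manage", "project_retrieve", "project_search", "project_content",
  "agent_manage", "agent_chat", "agent_team_chat", "agent_configure",
  "knowledge_connect", "automation_manage", "automation_resolve",
  "automation_retrieve", "workflow_action_as_tool",
  // ...remaining navigation, interface, content, meta, and system tools
];

const MODE_FILTERS: Record<Mode, (tool: string) => boolean> = {
  genesis: () => true, // full orchestration: everything is visible
  projects: (t) => t.startsWith("project_"),
  agents: (t) => t.startsWith("agent_") || t === "knowledge_connect",
  automations: (t) => t.startsWith("automation_"),
};

function toolsForMode(mode: Mode): string[] {
  return ALL_TOOLS.filter(MODE_FILTERS[mode]);
}
```

The agent never sees `ALL_TOOLS` directly; it sees `toolsForMode(currentMode)`, so its tool-selection reasoning runs over the narrow set.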


Tool Design Principles

Having the right number of tools is necessary but not sufficient. Each tool must be well-designed or the agent will misuse it regardless of how many tools exist. Here are six principles we learned from three years of iteration.

1. Clear Boundaries

Each tool operates on exactly one system. No tool touches two systems. If a tool needs to coordinate across systems (create a project AND assign an agent to it), that coordination happens in the agent's reasoning, not inside a single tool call.

This is Anthropic's "keep it simple" principle applied at the tool level. Each tool is simple. The agent handles orchestration. The system is complex because the agent composes simple tools into complex workflows.

2. Descriptive Names

Agents use tool names and descriptions to decide which tool to call. A tool named crud_data tells the agent nothing. A tool named taskade_project_manage tells the agent exactly what system it operates on and what kind of operations it supports.

We treat tool names as part of the prompt. A bad tool name is a bad prompt. A good tool name is self-documenting.

3. Structured Inputs with Zod Schemas

Every tool has a Zod schema defining its input parameters. This constrains what the agent can provide, reducing hallucinated parameters. Instead of accepting a free-form string, a tool schema specifies exactly which fields are required, what types they accept, and what values are valid.

```typescript
// Example: structured input prevents hallucinated parameters
import { z } from 'zod';

const ProjectManageInput = z.object({
  action: z.enum(['create', 'update', 'delete']),
  projectId: z.string().optional(),
  title: z.string().max(500).optional(),
  spaceId: z.string(),
  content: z.string().optional(),
});
```

The schema IS the contract. The agent cannot send parameters the schema does not define. This eliminates an entire class of hallucination errors where the agent invents parameters that do not exist.

4. Informative Outputs

Tool outputs include status, result data, AND suggested next steps. When the agent creates a project, the response is not just { "status": "ok" }. It includes the created project's ID, its URL, and a hint about what the agent might do next.

```json
{
  "status": "success",
  "projectId": "abc-123",
  "url": "/p/abc-123",
  "hint": "Project created. You may want to connect an agent or set up an automation next."
}
```

These hints do not force the agent's hand. They provide context that improves the agent's next decision. The agent still decides. But it decides with better information.

5. LLM-Friendly Error Messages

When a tool fails, the error message is written for the LLM, not for a log file. "Permission denied for workspace X. Try switching to workspace Y or asking the user for access." — not just "403."

Standard HTTP error codes are meaningless to an agent. The agent does not know what 403 means in the context of workspaces. A descriptive error message lets the agent recover gracefully — it can explain the problem to the user or try an alternative approach.
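A minimal sketch of this translation layer, with invented error shapes and wording:

```typescript
// Illustrative error translation (assumed shapes, not Taskade's code):
// convert a transport-level failure into guidance the model can act on.
interface ToolError {
  code: number;
  workspace?: string;
}

function toAgentMessage(err: ToolError): string {
  switch (err.code) {
    case 403:
      return (
        `Permission denied for workspace ${err.workspace ?? "(unknown)"}. ` +
        `Try switching to a workspace you can access, or ask the user to grant access.`
      );
    case 404:
      return "The resource does not exist. List available entities before retrying.";
    case 429:
      return "Rate limited. Wait briefly and retry, or batch the remaining calls.";
    default:
      return `Unexpected error (code ${err.code}). Report it to the user rather than retrying blindly.`;
  }
}
```

The point is that each message names a concrete next action, which is what lets the agent recover instead of looping on an opaque status code.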

6. Idempotent Where Possible

Calling project_manage with the same parameters twice should not create two projects. Where full idempotency is not possible (some operations are inherently non-idempotent), the tool checks for duplicates and returns the existing resource instead of creating a new one.

This matters because agents retry. Network hiccups, timeout errors, and reasoning loops can all cause the agent to invoke a tool multiple times. If the tool is not idempotent, the user ends up with duplicate projects, duplicate automations, and duplicate agents.
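A duplicate-checking create might look like this sketch; the in-memory store and lookup helper are hypothetical stand-ins for the real persistence layer:

```typescript
// Idempotent create, sketched: a retried call returns the existing
// project instead of creating a duplicate.
interface Project {
  id: string;
  title: string;
  spaceId: string;
}

const store: Project[] = []; // stand-in for real persistence

function findProject(title: string, spaceId: string): Project | undefined {
  return store.find((p) => p.title === title && p.spaceId === spaceId);
}

function createProjectIdempotent(title: string, spaceId: string): Project {
  const existing = findProject(title, spaceId);
  if (existing) return existing; // duplicate call: return, don't re-create
  const created: Project = { id: `p-${store.length + 1}`, title, spaceId };
  store.push(created);
  return created;
}
```

A retried call is indistinguishable from the first from the caller's point of view, which is exactly the property an agent's retry loop needs.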


The workflow_action_as_tool Bridge Pattern

The most innovative tool in our taxonomy is workflow_action_as_tool. It is a meta-tool — a tool that turns other things into tools.

Here is the problem it solves. Taskade has 100+ integrations across categories like Communication (Slack, Discord), Email (Gmail, Outlook), Payments (Stripe, Shopify), Development (GitHub, GitLab), and more. Each integration has multiple actions — Slack has "send message," "create channel," "list users." Shopify has "list products," "create order," "update inventory."

Traditionally, each of these actions would need its own dedicated agent tool. That would mean hundreds of tools. Even with mode-based filtering, that is too many.

The workflow_action_as_tool bridge solves this by letting the agent invoke ANY automation action directly as a tool call. The agent says "I want to send a Slack message" and the bridge routes it to the Slack integration's "send message" action, using the same schema and execution path that automation workflows use.

This means:

  1. Zero agent-side engineering per integration. When we add a new integration to the automation platform, it automatically becomes available as an agent tool. No tool code to write. No schema to define. The integration's existing schema IS the tool schema.

  2. The agent's capabilities grow with every integration we add. 100+ integrations today means 100+ potential tool actions. When we add Salesforce, every agent immediately gets Salesforce tools. When we add Notion, every agent gets Notion tools.

  3. Consistency across automation and agent paths. The same action executed by an automation workflow and by an agent tool produces the same result. There is no "agent version" and "automation version" of the same operation.

This is what Mahesh Murag from Anthropic calls the "build once, connect everywhere" principle in his talk on building agents with MCP. Each integration is built once and becomes both an automation action and an agent tool. The 26 core tools are the foundation. The bridge pattern extends them to hundreds.
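Stripped to its essence, the bridge is a registry plus one dispatching tool. The registry, action names, and handlers below are invented for illustration, not Taskade's implementation:

```typescript
// Sketch of the bridge idea: one generic tool dispatches to any registered
// integration action, so each integration is defined once.
type ActionHandler = (params: Record<string, unknown>) => string;

const actionRegistry = new Map<string, ActionHandler>();

// Registering an integration action once makes it callable from both
// automation workflows and agent tool calls.
actionRegistry.set("slack.send_message", (p) => `sent to ${p.channel}`);
actionRegistry.set("shopify.list_products", () => "products listed");

function workflowActionAsTool(
  action: string,
  params: Record<string, unknown>
): string {
  const handler = actionRegistry.get(action);
  if (!handler) {
    // LLM-friendly failure: tell the agent what IS available
    return `Unknown action "${action}". Available: ${[...actionRegistry.keys()].join(", ")}`;
  }
  return handler(params);
}
```

One tool entry in the agent's context fans out to the whole registry, which is how 26 core tools can front hundreds of integration actions.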



Single-System vs. Multi-System: A Comparison

The "how many tools" debate only makes sense when you specify how many systems the agent crosses. Here is a framework for reasoning about it.

| Approach (Tool Count) | Examples | When It Works | When It Fails |
|---|---|---|---|
| Minimal (3-5) | Vercel, simple chatbots | Single-task, one system boundary | Multi-system orchestration |
| Moderate (10-15) | GitHub Copilot, Cursor | Multi-file editing, one domain | Cross-domain workflows |
| Full (20-30) | Taskade EVE, enterprise agents | Multi-system, multi-domain | If tools overlap or boundaries blur |
| Excessive (50+) | Over-engineered systems | Never | Always — agent confusion scales super-linearly |

The key variable is system boundaries. Count yours:

  • 1 boundary (code editor, chatbot, search agent) → 3-5 tools
  • 2 boundaries (editor + deployment, chat + knowledge) → 8-12 tools
  • 3-4 boundaries (projects + agents + automations + apps) → 18-28 tools
  • 5+ boundaries → Consider splitting into multiple specialized agents rather than one agent with 40+ tools
| Capability | Single-System (Vercel v0) | Multi-System (Taskade EVE) |
|---|---|---|
| System boundaries crossed | 1 (code editor) | 4 (Projects, Agents, Automations, Apps) |
| Total tools | 3-5 | 26 |
| Tools visible per context | 3-5 (always all) | 5-8 (mode-filtered) |
| External integrations as tools | No | Yes (100+ via bridge pattern) |
| Multi-agent orchestration | No | Yes (agent_team_chat) |
| Automation workflows | No | Yes (4 dedicated tools) |
| Knowledge base wiring | No | Yes (knowledge_connect) |
| App deployment | Yes (single output) | Yes (deploy_app with custom domains) |

This is not a criticism of Vercel's approach. v0 is an excellent code generation agent precisely because it stays within one system boundary and does that one thing well. The comparison illustrates why tool count is a function of architecture, not a universal best practice.


How We Got to 26: The Timeline

The number 26 was not planned. It grew organically as we added system boundaries. Here is the timeline.

| Date | Version | Milestone | Tool Count |
|---|---|---|---|
| Dec 2022 | v4.22.0 | First AI tools (editor slash commands) | 3 |
| May 2023 | v4.76.0 | Agent-specific tools launch | 8 |
| Jun 2024 | v5.61.0 | Web search tool for agents | 10 |
| Oct 2024 | v5.98.0 | Custom agent tool editing | 14 |
| Jul 2025 | v5.200.0 | Agents + Automation tools converge | 22 |
| Oct 2025 | v6.30.0 | Full 26-tool orchestration with mode filtering | 26 |
| Feb 2026 | v6.110.0 | Agent Public API (external tools call agents) | 26 + external |
Each jump in tool count corresponds to a new system boundary. When we added agent management, we added agent tools. When we added automations, we added automation tools. When we added Genesis app building, we added interface and system tools.

The number tracks architecture, not ambition. If we add a fifth system boundary tomorrow, we will add 4-6 more tools. If we remove a system boundary (we will not), we will remove its tools.


Production Lessons

Three years of running 26 tools in production taught us things that design documents cannot.

Tool Descriptions Are Prompts

The description you write for a tool is not documentation. It is a prompt. The agent reads the description to decide whether to call the tool. A vague description ("manages projects") leads to over-use. A precise description ("creates, updates, or deletes projects in the current workspace; use project_retrieve for read operations") leads to correct use.

We spent more engineering time writing tool descriptions than writing the agent's system prompt. That sounds backwards until you realize that 26 tool descriptions are, collectively, a larger prompt surface than the system prompt itself.
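The difference is easy to see side by side. A minimal sketch in TypeScript — the tool names come from this article, but the spec shape and wording are illustrative, not Taskade's actual definitions:

```typescript
// Sketch: a tool description is prompt text the model reads, not docs.
// The ToolSpec shape and both descriptions are illustrative.
interface ToolSpec {
  name: string;
  description: string;
}

// Vague description: invites over-use for reads, writes, everything.
const vague: ToolSpec = {
  name: "project_manage",
  description: "Manages projects.",
};

// Precise description: states scope and explicitly routes the read path
// to a different tool.
const precise: ToolSpec = {
  name: "project_manage",
  description:
    "Creates, updates, or deletes projects in the current workspace. " +
    "For read operations, use project_retrieve instead.",
};

// Every registered description lands in the model's context, so the
// full tool list is itself a prompt surface.
function promptSurface(tools: ToolSpec[]): string {
  return tools.map((t) => `${t.name}: ${t.description}`).join("\n");
}

console.log(promptSurface([precise, vague]));
```

Multiply the precise version by 26 tools and the collective description text easily outweighs a typical system prompt, which is why it deserves the same editing effort.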

Tool Call Frequency Reveals Design Problems

We track how often each tool is called. This data reveals two types of problems:

  • Never called — The tool is redundant or its description is confusing. Either merge it with another tool or rewrite its description.
  • Called more than 40% of the time — The tool is too broad. It is probably handling operations that should be split into separate tools. A tool called in nearly half of all interactions is doing too many things.

The sweet spot for most tools is 5-15% of interactions. Core tools like project_manage and str_replace_editor run higher. Specialized tools like deploy_app and knowledge_connect run lower. But every tool should have a nonzero call rate. Dead tools are dead weight in the agent's reasoning context.
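The audit itself is simple to automate. A sketch of the heuristic described above — the thresholds mirror the text (never called means dead, above 40% means too broad), while the log format and function names are invented for illustration:

```typescript
// Sketch: flag dead and overloaded tools from a call log.
// Thresholds follow the article's heuristics; the API is illustrative.
type Verdict = "dead" | "too_broad" | "ok";

function auditToolUsage(
  registered: string[],
  callLog: string[],
): Map<string, { rate: number; verdict: Verdict }> {
  const counts = new Map<string, number>(registered.map((t) => [t, 0]));
  for (const call of callLog) {
    counts.set(call, (counts.get(call) ?? 0) + 1);
  }
  const total = callLog.length || 1; // avoid divide-by-zero on empty logs
  const report = new Map<string, { rate: number; verdict: Verdict }>();
  for (const tool of registered) {
    const rate = (counts.get(tool) ?? 0) / total;
    const verdict: Verdict =
      rate === 0 ? "dead" : rate > 0.4 ? "too_broad" : "ok";
    report.set(tool, { rate, verdict });
  }
  return report;
}

// Example: 8 of 10 calls hit one tool, one tool is never called.
const report = auditToolUsage(
  ["project_manage", "web_search", "deploy_app"],
  [...Array(8).fill("project_manage"), "web_search", "web_search"],
);
```

In this example `project_manage` at 80% would be flagged as too broad and `deploy_app` as dead, both candidates for redesign.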

Agents Will Surprise You

An agent will try to use web_search when you expected it to use project_search. It will call agent_chat to ask another agent about a project instead of using project_retrieve to read the project directly. These creative uses are not bugs — they are the agent reasoning about the fastest path to the answer.

Design tools to fail gracefully when used creatively. If web_search is called when project_search would be better, the web search result will be less relevant but not catastrophic. The agent can self-correct on the next turn. Catastrophic failures — tools that corrupt data when called in the wrong context — are the real enemy.

26 Is Stable but Not Permanent

We started with 3 tools and grew to 8, then 10, 14, 22, and finally 26. The number stabilized at 26 because our system boundary count stabilized at four. If we add a new system (and we will), the tool count will grow. If we merge two systems (unlikely), the count will shrink.

The important thing is that 26 was never a target. It was never a goal. It emerged from applying one principle consistently: one tool per system operation, no overlap, clear boundaries.

Testing Tool Interactions Is Harder Than Testing Tools

Individual tool tests are straightforward. Call project_manage with valid inputs, verify it creates a project. Call it with invalid inputs, verify it returns a useful error.

The hard part is testing tool sequences. A user says "build me a task manager with Slack notifications." The agent needs to: create a project (project_manage), create an agent (agent_manage), configure the agent (agent_configure), create an automation flow (automation_manage), wire the Slack integration (workflow_action_as_tool), build the app interface (str_replace_editor, file_tree_manage), preview it (app_preview), and deploy it (deploy_app). That is 9 tools in sequence, and the output of each informs the input of the next.

We test these sequences end-to-end. Individual tool reliability compounds into agent-level reliability only if the composition works. A chain is only as strong as its weakest tool.
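A sequence test can be sketched as a pipeline where each step consumes the context the previous step produced. The tool names below come from the article; the mock implementations, `Step` shape, and context fields are invented for illustration:

```typescript
// Sketch: end-to-end sequence test with mocked tools, where each step's
// output feeds the next. Mock implementations are illustrative.
type Ctx = Record<string, string>;
type Step = { tool: string; run: (ctx: Ctx) => Ctx };

function runSequence(steps: Step[]): Ctx {
  let ctx: Ctx = {};
  for (const step of steps) {
    ctx = step.run(ctx); // the output of each tool informs the next
  }
  return ctx;
}

// Abbreviated "build me a task manager" chain.
const buildTaskManager: Step[] = [
  { tool: "project_manage", run: (c) => ({ ...c, projectId: "p1" }) },
  { tool: "agent_manage", run: (c) => ({ ...c, agentId: `a-for-${c.projectId}` }) },
  { tool: "automation_manage", run: (c) => ({ ...c, flowId: "f1" }) },
  { tool: "deploy_app", run: (c) => ({ ...c, url: `https://app/${c.projectId}` }) },
];

const result = runSequence(buildTaskManager);
```

The assertion that matters is not that each mock ran, but that values created early in the chain (the project ID) actually reached the steps that depend on them.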


Framework: Finding Your Right Number

If you are building an agent and wondering how many tools it should have, here is the framework we use.

Step 1: Count your system boundaries. A system boundary is a distinct domain your agent interacts with. A code editor is one boundary. A database is another. An API gateway is a third. If two operations use the same data store, same permissions model, and same error modes, they are in the same system.

Step 2: List operations per system. For each system boundary, list the distinct operations: create, read, update, delete, search, configure, deploy, monitor. Most systems have 4-7 distinct operations.

Step 3: Check for overlaps. If two operations in different systems do the same thing (both "read" operations return the same data type), consider whether they can share a tool. Usually they cannot — different systems have different schemas. But check.

Step 4: Add mode-based filtering. Once you have your total tool count, decide which tools belong to which mode. Users managing projects should not see deployment tools. Users building apps should not see automation configuration tools.

Step 5: Validate with call frequency. After launch, track tool call frequency. Tools that are never called are candidates for removal. Tools called too frequently are candidates for splitting. Iterate.
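Steps 1 through 4 can be sketched as a derivation: one tool per system operation, then a mode filter over the result. The system and tool names below echo the article; the exact operation split per system is illustrative, not Taskade's real registry:

```typescript
// Sketch of steps 1-4: derive a tool list from system boundaries and
// their operations, then filter it by mode. The operation split is
// illustrative, not the real 26-tool registry.
function deriveTools(systems: Record<string, string[]>): string[] {
  const tools: string[] = [];
  for (const [system, ops] of Object.entries(systems)) {
    for (const op of ops) tools.push(`${system}_${op}`); // one tool per operation
  }
  return tools;
}

function filterByMode(tools: string[], allowedSystems: string[]): string[] {
  return tools.filter((t) => allowedSystems.some((s) => t.startsWith(`${s}_`)));
}

// Four hypothetical system boundaries, 4-5 operations each.
const systems = {
  project: ["manage", "retrieve", "search", "task"],
  agent: ["manage", "configure", "chat", "run", "team_chat"],
  automation: ["manage", "trigger", "monitor", "action"],
  app: ["edit", "preview", "deploy"],
};

const all = deriveTools(systems);                     // full orchestrator set
const projectsMode = filterByMode(all, ["project"]);  // narrowed action space
```

The total falls out of the architecture: four boundaries times a handful of operations each, with modes exposing only the relevant slice.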

Your number will be different from ours. A code generation agent might correctly have 5 tools. An enterprise integration agent might correctly have 35. The number is a function of your architecture, not ours.


The Deeper Question

The "how many tools" debate masks a deeper question: what kind of agent are you building?

If you are building a single-task agent — one that does one thing in one system — fewer tools is almost always better. Vercel's approach is correct. Strip down to the minimum. Remove anything that could confuse the agent. Ship it.

If you are building a multi-system orchestrator — one that coordinates across multiple domains, manages multiple entity types, and composes workflows from different systems — you need more tools. You need one tool per system operation. You need mode-based filtering to manage complexity. And you need the workflow_action_as_tool bridge to scale your agent's capabilities with your platform's integrations.

The real innovation is not having 26 tools. The real innovation is the architecture that makes 26 tools work: clear boundaries, structured schemas, informative outputs, mode-based filtering, and the bridge pattern that turns 100+ integrations into agent capabilities automatically.

We did not set out to build an agent with 26 tools. We set out to build a workspace where AI agents, projects, automations, and apps work together as one system. Twenty-six tools is what it took to make that real.


Further Reading

  • Taskade AI Agents — Build custom agents with 22+ built-in tools and persistent memory
  • Taskade Genesis — Build live apps from prompts with AI agents, automations, and databases
  • Automation Workflows — Reliable automation with 100+ integrations, branching, and looping
  • Community Gallery — Explore apps built by the Taskade community
  • Learn: Custom Agents — Step-by-step guide to configuring AI agents
  • Learn: Automation Triggers — How to set up automation workflows
  • Integrations — 100+ integrations across 10 categories
  • Pricing — Free tier with 3,000 credits, paid plans from $6/month

Frequently Asked Questions

How many tools should an AI agent have?

The right number of AI agent tools depends on how many system boundaries the agent crosses. Single-system agents (like code editors) need 3-5 tools. Multi-system orchestrators (like Taskade EVE, which coordinates Projects, Agents, Automations, and Apps) need 20-30. The principle is one tool per distinct system operation, with no overlap.

Why did Vercel remove 80 percent of their agent's tools?

Vercel's agent operates within a single system (a code editor) where multiple tools for similar operations caused confusion. Reducing to the minimal set improved agent accuracy. This is the correct approach for single-system agents, but multi-system orchestrators like Taskade need more tools to cover different system boundaries.

What are mode-based tool filters for AI agents?

Mode-based filtering dynamically adjusts which tools an agent can access based on the current task. In Taskade, Genesis Mode exposes all 26 tools for full app creation, while Projects Mode limits the agent to 8 project-related tools. This narrows the action space, reducing errors and improving tool selection accuracy.

What is the workflow_action_as_tool pattern?

The workflow_action_as_tool pattern allows AI agents to invoke any automation action directly as a tool. In Taskade, this means 100+ integration actions (Slack, Gmail, Shopify, GitHub) become tools the agent can call. The agent's capabilities grow automatically with every new integration added to the platform.

How do you design tools for AI agents that minimize hallucination?

Key practices include using structured Zod schemas for inputs (constraining what agents can provide), writing descriptive tool names and descriptions (agents use these to decide which tool to call), returning informative error messages for LLMs (not just status codes), and making tools idempotent where possible to prevent duplicate operations.

What is the difference between single-system and multi-system AI agents?

A single-system agent operates within one domain, such as a code editor or a chatbot. A multi-system agent orchestrates across multiple distinct systems. Taskade EVE coordinates four systems: Projects (memory), Agents (intelligence), Automations (execution), and Apps (interface). Each system boundary requires its own set of tools, which is why multi-system agents need more tools than single-system agents.

How does mode-based tool filtering improve AI agent accuracy?

Mode-based filtering reduces the action space an agent must reason over. When a user is managing tasks, the agent sees only 8 project tools instead of all 26. Fewer options mean less confusion about which tool to select. The agent cannot accidentally invoke an automation tool when the user asked to organize a project. This narrowing improves tool selection accuracy without removing capabilities from the platform.

Can AI agent tools grow automatically with new integrations?

Yes. The workflow_action_as_tool bridge pattern in Taskade turns every automation integration action into a callable agent tool. When a new integration is added to the platform, its actions become available to the agent without any agent-side engineering. This means 100+ integration actions across Slack, Gmail, Shopify, GitHub, and more are already available as agent tools.

Why 26 AI Agent Tools Is the Right Number (2026) | Taskade Blog