Skill vs Tool: the distinction that makes MONO different

2026-04-14·MONO Team·5 min read

Why the separation matters

Most agent frameworks (LangChain, AutoGPT, etc.) treat the tool, or function, as the base atom. The agent is handed a flat list of, say, 50 functions, and the LLM decides which one to call.

This works with 5-10 tools. It breaks at 50. The prompt bloats with repetitive descriptions, and similar tools (log_expense vs create_transaction vs record_spending) confuse the model.

Skills introduce a layer of semantic grouping. Instead of "here are 83 functions, pick one", we say "here are 21 skills; which one(s) are relevant?". The Haiku router picks 1-3 skills. Then only the tools that belong to the selected skills are exposed to the model.
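The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration, not MONO's router: the `Skill` struct, `demoSkills`, `routeSkills`, `exposedTools`, and the keyword lists are all hypothetical names, and plain keyword matching stands in for the model-based router.

```go
package main

import (
	"fmt"
	"strings"
)

// Skill groups a few tools under one domain.
// Keywords is a hypothetical stand-in for what the LLM router would infer.
type Skill struct {
	Name     string
	Keywords []string
	Tools    []string
}

// demoSkills returns two illustrative skills (a real catalog would hold 21).
func demoSkills() []Skill {
	return []Skill{
		{
			Name:     "expenses",
			Keywords: []string{"spent", "spend"},
			Tools:    []string{"log_expense", "list_expenses", "expense_summary"},
		},
		{
			Name:     "calendar",
			Keywords: []string{"meeting", "event"},
			Tools:    []string{"create_event", "list_events", "delete_event"},
		},
	}
}

// routeSkills is stage one: pick at most limit skills for the message.
// Keyword matching only imitates the model-based router here.
func routeSkills(msg string, skills []Skill, limit int) []Skill {
	lower := strings.ToLower(msg)
	var picked []Skill
	for _, s := range skills {
		if len(picked) >= limit {
			break
		}
		for _, kw := range s.Keywords {
			if strings.Contains(lower, kw) {
				picked = append(picked, s)
				break
			}
		}
	}
	return picked
}

// exposedTools is stage two: only the tools of the picked skills reach the prompt.
func exposedTools(picked []Skill) []string {
	var tools []string
	for _, s := range picked {
		tools = append(tools, s.Tools...)
	}
	return tools
}

func main() {
	picked := routeSkills("spent 300 on uber", demoSkills(), 3)
	fmt.Println(exposedTools(picked))
}
```

The point of the split is that the second prompt only carries the tool descriptions of the picked skills, so its size tracks the size of a skill rather than the size of the whole catalog.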

Anatomy of a skill

A skill in MONO contains:

Tools: the functions it executes (create_event, list_events, delete_event for the calendar skill).
Manifest: YAML file with description, activation examples, UI conditions, boundaries.
Renderer: Go function that turns results into Dynamic UI (rendered HTML).
Modes: declaration of which modes it supports (tool, think, or agent; see the router post).
Proactive monitors (optional): cron jobs that run without user input (e.g. calendar's morning brief at 7am).
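As a rough picture of how these five parts might fit together in Go, here is a sketch. All type and field names (`Skill`, `Tool`, `Manifest`, `Monitor`, `Mode`, `Supports`) are assumptions made for illustration, the `Manifest` struct stands for the YAML file after parsing, and the modes assigned to the calendar skill are invented.

```go
package main

import (
	"fmt"
	"html"
)

// Mode names follow the post: tool, think, agent.
type Mode string

const (
	ModeTool  Mode = "tool"
	ModeThink Mode = "think"
	ModeAgent Mode = "agent"
)

// Tool is one executable function inside a skill.
type Tool struct {
	Name string
	Run  func(args map[string]string) (string, error)
}

// Manifest is a hypothetical in-memory form of the YAML manifest.
type Manifest struct {
	Description  string
	Examples     map[string]string // user phrase -> tool name
	UIConditions []string
	Boundaries   []string
}

// Monitor is an optional proactive job that runs without user input.
type Monitor struct {
	Cron string // standard five-field cron expression
	Run  func() string
}

// Skill bundles tools, manifest, renderer, modes, and monitors.
type Skill struct {
	Name     string
	Tools    []Tool
	Manifest Manifest
	Render   func(result string) string // returns HTML
	Modes    []Mode
	Monitors []Monitor
}

// Supports reports whether the skill declares the given mode.
func (s Skill) Supports(m Mode) bool {
	for _, have := range s.Modes {
		if have == m {
			return true
		}
	}
	return false
}

// calendarSkill builds an illustrative calendar skill with stub tools.
func calendarSkill() Skill {
	stub := func(map[string]string) (string, error) { return "ok", nil }
	return Skill{
		Name: "calendar",
		Tools: []Tool{
			{Name: "create_event", Run: stub},
			{Name: "list_events", Run: stub},
			{Name: "delete_event", Run: stub},
		},
		Manifest: Manifest{
			Description: "Create, list, and delete calendar events",
			Examples:    map[string]string{"what's on my calendar today?": "list_events"},
		},
		Render:   func(result string) string { return "<p>" + html.EscapeString(result) + "</p>" },
		Modes:    []Mode{ModeTool}, // assumed for the example
		Monitors: []Monitor{{Cron: "0 7 * * *", Run: func() string { return "morning brief" }}},
	}
}

func main() {
	s := calendarSkill()
	fmt.Println(s.Name, len(s.Tools), s.Supports(ModeAgent))
}
```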

Example: the Expenses skill

Tools (6): log_expense, list_expenses, expense_summary, delete_expense, categorize_expense, analyze_expenses (agent mode).

Manifest examples: "spent 300 on uber" → log_expense. "how much did I spend this month?" → expense_summary. "explain my patterns" → analyze_expenses (agent mode).

Renderer: HTML table with categories, monthly projection, delta vs last month, bar chart for top-5 categories.

Monitor: every Friday 6pm, compute the weekly summary and proactively send it if the user exceeded their average.

The LLM never sees any of these 6 tools until the router has picked "expenses" as a relevant skill. Only then are they exposed in context.
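The Friday monitor's decision rule can be sketched in a few lines. This assumes "their average" means the mean of previous weekly totals, which the post does not specify; `shouldSendSummary`, `mean`, and `exampleFriday` are hypothetical names, and scheduling itself is left to whatever cron mechanism runs the monitor.

```go
package main

import (
	"fmt"
	"time"
)

// exampleFriday is a Friday at 18:00 UTC, used only for the demo below.
var exampleFriday = time.Date(2026, time.April, 17, 18, 0, 0, 0, time.UTC)

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// shouldSendSummary reports whether the weekly summary should be pushed:
// only on Friday during the 6pm hour, and only if this week's total
// is above the mean of the previous weekly totals (an assumed definition).
func shouldSendSummary(now time.Time, thisWeek float64, pastWeeks []float64) bool {
	if now.Weekday() != time.Friday || now.Hour() != 18 {
		return false
	}
	if len(pastWeeks) == 0 {
		return false // no baseline yet, so stay quiet
	}
	return thisWeek > mean(pastWeeks)
}

func main() {
	past := []float64{100, 120, 80} // mean is 100
	fmt.Println(shouldSendSummary(exampleFriday, 150, past)) // above average
	fmt.Println(shouldSendSummary(exampleFriday, 90, past))  // below average
}
```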

Architectural consequences

Composability: adding a new skill = creating a Go package + a YAML manifest + a renderer. No need to touch the router, executor, or UI system.
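One common way to get this property in Go is a registry that each skill package adds itself to, so the router and executor only ever iterate over whatever is registered. The sketch below assumes that pattern; `Registry`, `Register`, and `Names` are hypothetical, the post does not say how MONO wires skills in, and `log_workout` is an invented tool name.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Skill is reduced to a name and its tool names for this sketch.
type Skill struct {
	Name  string
	Tools []string
}

// Registry holds every skill the router and executor can see.
type Registry struct {
	skills map[string]Skill
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{skills: make(map[string]Skill)}
}

// Register adds a skill and rejects empty or duplicate names.
func (r *Registry) Register(s Skill) error {
	if s.Name == "" {
		return errors.New("skill name is required")
	}
	if _, exists := r.skills[s.Name]; exists {
		return fmt.Errorf("skill %q is already registered", s.Name)
	}
	r.skills[s.Name] = s
	return nil
}

// Names lists registered skills in sorted order, e.g. for the router prompt.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.skills))
	for name := range r.skills {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	_ = r.Register(Skill{Name: "expenses", Tools: []string{"log_expense"}})
	// Adding a skill is one more Register call; nothing else changes.
	_ = r.Register(Skill{Name: "fitness", Tools: []string{"log_workout"}})
	fmt.Println(r.Names())
}
```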

Isolated testing: tools are tested independently. The manifest is validated against a schema. The renderer is tested with output fixtures.
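To make the manifest check concrete, here is a small sketch of what validation could look like once the YAML has been parsed. The rules shown (a required description, at least one activation example, every example pointing at a tool the skill actually has) are assumptions, not MONO's schema, and `validateManifest` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Manifest is a hypothetical parsed form of a skill's YAML manifest.
type Manifest struct {
	Description string
	Examples    map[string]string // user phrase -> tool name
}

// validateManifest returns a sorted list of problems; empty means valid.
func validateManifest(m Manifest, tools []string) []string {
	var problems []string
	if m.Description == "" {
		problems = append(problems, "description is required")
	}
	if len(m.Examples) == 0 {
		problems = append(problems, "at least one activation example is required")
	}
	known := make(map[string]bool, len(tools))
	for _, t := range tools {
		known[t] = true
	}
	for phrase, tool := range m.Examples {
		if !known[tool] {
			problems = append(problems,
				fmt.Sprintf("example %q targets unknown tool %q", phrase, tool))
		}
	}
	sort.Strings(problems) // map iteration order is random, so sort for stable output
	return problems
}

func main() {
	tools := []string{"log_expense", "expense_summary", "analyze_expenses"}
	m := Manifest{
		Description: "Track and analyze personal spending",
		Examples: map[string]string{
			"spent 300 on uber":                "log_expense",
			"how much did I spend this month?": "expense_summary",
			"explain my patterns":              "record_spending", // deliberate mistake
		},
	}
	for _, p := range validateManifest(m, tools) {
		fmt.Println(p)
	}
}
```

A check like this runs without the router, the LLM, or the renderer, which is what keeps each part of a skill testable on its own.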

Billing per skill: we can charge for individual skills (email $2/mo, calls $7/mo) because they're discrete units with clear boundaries.

User-facing vocabulary: users think in skills ("I want to activate the Fitness skill"), not tools. Internally we have ~83 tools, but the UI has 21 skills. That's the right abstraction for the product.

TL;DR

Tool = executable function. Skill = domain capability that groups tools + manifest + renderer + monitors. The separation lets us scale to 83 tools without confusing the LLM, bill per capability, and test each domain in isolation.