Skill vs Tool: the distinction that makes MONO different
Why the separation matters
Most agent frameworks (LangChain, AutoGPT, etc.) treat the "tool" or "function" as the base atom. The agent sees a flat list of, say, 50 functions, and the LLM decides which one to call.
This works with 5-10 tools. It breaks at 50. The prompt bloats with repetitive descriptions, and similar tools (log_expense vs create_transaction vs record_spending) confuse the model.
Skills introduce a layer of semantic grouping. Instead of "here are 83 functions, pick one", we say "here are 21 skills; which one(s) are relevant?". The Haiku router picks 1-3 skills. Then, within each selected skill, only that skill's own subset of tools is exposed.
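To make the two stages concrete, here is a minimal sketch in Go. It is not MONO's actual router: the Skill struct, routerPrompt, and exposedTools are hypothetical names, and the call to the router model is left out. The point is only that the router prompt grows with the number of skills, while the main model sees just the tools of whatever was selected.

```go
package main

import "fmt"

// Skill is a hypothetical, trimmed-down view of a skill: just enough for routing.
type Skill struct {
	Name        string
	Description string   // what the router reads to decide relevance
	Tools       []string // tool names, exposed only after the skill is selected
}

// routerPrompt lists skills, not tools, so its size grows with the number of
// skills rather than the number of functions.
func routerPrompt(skills []Skill) string {
	prompt := "Pick the 1-3 skills relevant to the user message:\n"
	for _, s := range skills {
		prompt += fmt.Sprintf("- %s: %s\n", s.Name, s.Description)
	}
	return prompt
}

// exposedTools returns the tools of the selected skills, in selection order.
// Names that match no registered skill are ignored.
func exposedTools(skills []Skill, selected []string) []string {
	byName := make(map[string]Skill, len(skills))
	for _, s := range skills {
		byName[s.Name] = s
	}
	var tools []string
	for _, name := range selected {
		if s, ok := byName[name]; ok {
			tools = append(tools, s.Tools...)
		}
	}
	return tools
}

func main() {
	skills := []Skill{
		{"calendar", "Create, list and delete events", []string{"create_event", "list_events", "delete_event"}},
		{"expenses", "Log and analyze spending", []string{"log_expense", "list_expenses", "expense_summary"}},
	}
	fmt.Print(routerPrompt(skills))

	// Stage 1 (not shown): the router model answers with skill names.
	selected := []string{"expenses"}

	// Stage 2: only the selected skills' tools go into the main model's context.
	fmt.Println(exposedTools(skills, selected))
}
```

With two skills the saving is trivial. With 21 skills and ~83 tools, the router reads 21 short descriptions and the main model only ever sees a handful of tool schemas.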
Anatomy of a skill
A skill in MONO contains:
- Tools: the functions it executes (create_event, list_events, delete_event for the calendar skill).
- Manifest: YAML file with description, activation examples, UI conditions, boundaries.
- Renderer: Go function that turns results into Dynamic UI (rendered HTML).
- Modes: declaration of which modes it supports (tool/think/agent; see the router post).
- Proactive monitors (optional): cron jobs that run without user input (e.g. calendar's morning brief at 7am).
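The five parts above can be pictured as a single Go type. This is an illustrative sketch, not MONO's real definition: every type and field name here (Tool, Manifest, Monitor, Skill, Supports) is an assumption made for the example, and the manifest would normally be loaded from YAML rather than written inline.

```go
package main

import (
	"fmt"
	"html"
)

// Mode is one of the execution modes a skill can declare.
type Mode string

const (
	ModeTool  Mode = "tool"
	ModeThink Mode = "think"
	ModeAgent Mode = "agent"
)

// Tool is one executable function.
type Tool struct {
	Name string
	Run  func(args map[string]string) (string, error)
}

// Manifest mirrors the YAML fields listed above; the field names are guesses.
type Manifest struct {
	Description        string
	ActivationExamples []string
	UIConditions       []string
	Boundaries         []string
}

// Monitor is a scheduled job that runs without user input.
type Monitor struct {
	Cron string // e.g. "0 7 * * *" for 7am every day
	Run  func() error
}

// Skill bundles everything one domain needs.
type Skill struct {
	Name     string
	Tools    []Tool
	Manifest Manifest
	Render   func(result string) string // returns an HTML fragment
	Modes    []Mode
	Monitors []Monitor // optional; nil when the skill has none
}

// Supports reports whether the skill declares the given mode.
func (s Skill) Supports(m Mode) bool {
	for _, mode := range s.Modes {
		if mode == m {
			return true
		}
	}
	return false
}

func main() {
	calendar := Skill{
		Name: "calendar",
		Tools: []Tool{{Name: "list_events", Run: func(map[string]string) (string, error) {
			return "Standup <10:00>", nil
		}}},
		Manifest: Manifest{Description: "Manage calendar events"},
		Render: func(result string) string {
			// Escape tool output before embedding it in HTML.
			return "<li>" + html.EscapeString(result) + "</li>"
		},
		Modes: []Mode{ModeTool, ModeThink},
	}

	out, err := calendar.Tools[0].Run(nil)
	if err != nil {
		fmt.Println("tool failed:", err)
		return
	}
	fmt.Println(calendar.Render(out))
	fmt.Println("agent mode supported:", calendar.Supports(ModeAgent))
}
```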
Example: the Expenses skill
Tools (6): log_expense, list_expenses, expense_summary, delete_expense, categorize_expense, analyze_expenses (agent mode).
Manifest examples: "spent 300 on uber" → log_expense. "how much did I spend this month?" → expense_summary. "explain my patterns" → analyze_expenses (agent mode).
Renderer: HTML table with categories, monthly projection, delta vs last month, bar chart for top-5 categories.
Monitor: every Friday 6pm, compute the weekly summary and proactively send it if the user exceeded their average.
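The monitor's decision rule is small enough to sketch. The version below assumes "their average" means the mean of the user's previous weekly totals. That reading is our assumption for the example, as are the names weeklySummaryCron and shouldSendSummary; scheduling and delivery are left out.

```go
package main

import "fmt"

// weeklySummaryCron fires every Friday at 18:00 in standard five-field cron
// syntax (minute, hour, day of month, month, day of week).
const weeklySummaryCron = "0 18 * * 5"

// shouldSendSummary reports whether this week's spending is strictly above the
// mean of the previous weekly totals. With no history there is no average to
// compare against, so it returns false.
func shouldSendSummary(thisWeek float64, previousWeeks []float64) bool {
	if len(previousWeeks) == 0 {
		return false
	}
	var sum float64
	for _, w := range previousWeeks {
		sum += w
	}
	return thisWeek > sum/float64(len(previousWeeks))
}

func main() {
	history := []float64{1200, 900, 1500} // mean is 1200

	fmt.Println("schedule:", weeklySummaryCron)
	fmt.Println("send at 1400:", shouldSendSummary(1400, history)) // above the mean
	fmt.Println("send at 1000:", shouldSendSummary(1000, history)) // below the mean
}
```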
The LLM never sees any of these 6 tools until the router has picked "expenses" as the relevant skill. Then, and only then, are they exposed in context.
Architectural consequences
Composability: adding a new skill = creating a Go package + a YAML + a renderer. No need to touch the router, executor, or UI system.
Isolated testing: tools are tested independently. The manifest is validated against a schema. The renderer is tested with output fixtures.
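As an illustration of what manifest validation can catch, here is a hedged sketch that checks an already-parsed manifest. The real schema is not shown in this post, so the Manifest and Example types and the rules in Validate are assumptions: required fields must be present, and every activation example must point at a tool the skill actually declares. It relies on errors.Join, which needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Example maps a sample user utterance to the tool it should trigger.
type Example struct {
	Utterance string
	Tool      string
}

// Manifest is a hypothetical parsed form of a skill's YAML file.
type Manifest struct {
	Skill       string
	Description string
	Tools       []string
	Examples    []Example
}

// Validate returns nil for a well-formed manifest, or one error that wraps
// every problem found.
func Validate(m Manifest) error {
	var errs []error
	if m.Skill == "" {
		errs = append(errs, errors.New("skill name is required"))
	}
	if m.Description == "" {
		errs = append(errs, errors.New("description is required"))
	}
	known := make(map[string]bool, len(m.Tools))
	for _, t := range m.Tools {
		known[t] = true
	}
	for _, ex := range m.Examples {
		if !known[ex.Tool] {
			errs = append(errs, fmt.Errorf("example %q targets unknown tool %q", ex.Utterance, ex.Tool))
		}
	}
	// errors.Join returns nil when there is nothing to join.
	return errors.Join(errs...)
}

func main() {
	m := Manifest{
		Skill:       "expenses",
		Description: "Log and analyze spending",
		Tools:       []string{"log_expense", "expense_summary"},
		Examples: []Example{
			{"spent 300 on uber", "log_expense"},
			{"explain my patterns", "analyze_expenses"}, // not declared in Tools above
		},
	}
	if err := Validate(m); err != nil {
		fmt.Println("invalid manifest:")
		fmt.Println(err)
	}
}
```

A check like this runs in CI without a model in the loop, which is what makes a skill testable on its own.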
Billing per skill: we can charge for individual skills (email $2/mo, calls $7/mo) because they're discrete units with clear boundaries.
User-facing vocabulary: users think in skills ("I want to activate the Fitness skill"), not tools. Internally we have ~83 tools, but the UI has 21 skills. That's the right abstraction for the product.
TL;DR
Tool = executable function. Skill = domain capability that groups tools + manifest + renderer + monitors. The separation lets us scale to 83 tools without confusing the LLM, bill per capability, and test each domain in isolation.