
How MONO learns from you without sending your data anywhere

MONO Team · 6 min read

How most assistants "learn"

ChatGPT, Copilot, and Gemini all store your conversation history on their servers. They use that history for two things: (a) to give you context when you return, and (b) to train future models (unless you pay for Enterprise and opt out).

When you tell them "remember my mom is named Rose", that data lives in OpenAI's or Google's central database. Multi-tenant. Next to the data of millions of other users. A mis-filtered SQL query, a curious employee, a court order: all are leak vectors.

MONO's memory: local, yours

In MONO, your memory lives in SurrealDB running on your VPS. Three main tables:

  • memory: atomic facts ("mom's name is Rose", "birthday May 3", "shellfish allergy").
  • entity: people, places, companies, projects (with relationships).
  • topic: self-emerging thematic clusters (work, health, family).

Each table has vector (HNSW) and full-text (BM25) indexes over your data. When you ask your MONO "what do I know about my mom?", the agent searches these tables on your VPS, finds the related facts, and uses them as context for its response.
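MONO's actual query layer isn't shown in this post, but the hybrid idea behind those two indexes can be sketched in a few lines: blend a vector-similarity score with a keyword-relevance score and keep the top few facts. Everything below is illustrative (the in-memory `MEMORY` table, the toy embeddings, and the crude keyword scorer all stand in for SurrealDB's real HNSW and BM25 machinery):

```python
import math

# Hypothetical in-memory stand-in for the SurrealDB `memory` table.
# Real MONO uses HNSW + BM25 indexes; this only illustrates the ranking idea.
MEMORY = [
    {"fact": "mom's name is Rose", "embedding": [0.9, 0.1, 0.0]},
    {"fact": "birthday May 3",     "embedding": [0.2, 0.8, 0.1]},
    {"fact": "shellfish allergy",  "embedding": [0.1, 0.2, 0.9]},
]

def cosine(a, b):
    # Vector similarity, as an HNSW index would approximate at scale.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, fact):
    # Crude stand-in for BM25: fraction of query terms found in the fact.
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in fact.lower())
    return hits / len(terms) if terms else 0.0

def search(query_embedding, query_text, k=3, alpha=0.5):
    # Blend both signals, sort descending, keep the top-k facts.
    scored = [
        (alpha * cosine(query_embedding, m["embedding"])
         + (1 - alpha) * keyword_score(query_text, m["fact"]), m["fact"])
        for m in MEMORY
    ]
    scored.sort(reverse=True)
    return [fact for _, fact in scored[:k]]

top_facts = search([0.9, 0.1, 0.0], "what do I know about my mom?", k=2)
```

The `alpha` knob is the usual hybrid-search trade-off: lean on embeddings for paraphrased questions, on keywords for exact names and dates.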

What the LLM sees

Here's the subtle point. MONO uses Claude (Anthropic) or GPT (OpenAI) to generate responses. Those calls go to the provider's cloud. So what data is sent?

Only the current request's context. If you ask "what do I know about my mom?", the pipeline does the following:

  1. Searches your local memory — returns 3-5 relevant facts (~300 tokens).
  2. Builds a prompt: "User asks X. Relevant context: [facts]. Respond."
  3. Sends the prompt to Claude. Claude generates response. Done.
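The three steps above can be sketched as a tiny function. `retrieve` here is a hypothetical stand-in for the local SurrealDB search; the point is that only its small output, never the full memory base, ends up in the prompt that crosses the network:

```python
def retrieve(query: str) -> list[str]:
    # Hypothetical: in MONO this queries the memory/entity/topic tables
    # on your VPS and returns a handful of relevant facts (~300 tokens).
    return ["mom's name is Rose", "birthday May 3"]

def build_prompt(query: str) -> str:
    # Step 1: local search. Step 2: assemble the minimal prompt.
    facts = retrieve(query)
    context = "\n".join(f"- {f}" for f in facts)
    return (
        f"User asks: {query}\n"
        f"Relevant context:\n{context}\n"
        "Respond using only this context."
    )

# Step 3 would send this string to the LLM provider; everything else
# (the full memory base) stays on the VPS.
prompt = build_prompt("what do I know about my mom?")
```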

Claude does not retain that prompt (Anthropic's API zero-data-retention policy). The facts sent are minimal, not your full history. The complete memory base never leaves your VPS.

But still — it reaches the LLM?

Yes. That's the honest trade-off. If you want zero cloud, you need local inference: a Llama 3.3 model running on your own VPS. That's possible, but it requires a larger VPS (16 GB+ RAM) and is slower (fewer tokens per second than cloud inference).
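One thing worth noting: local servers such as Ollama and llama.cpp expose an OpenAI-compatible chat API, so switching from cloud to local inference is mostly a matter of changing the base URL. The sketch below only builds the request payload (the endpoint, port, and model name are assumptions, not MONO's actual configuration):

```python
import json

# Hypothetical local endpoint. Ollama's OpenAI-compatible server listens
# on port 11434 by default; the model name is an assumption.
LOCAL_BASE_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str) -> dict:
    # Same chat-completion payload shape the cloud providers accept;
    # pointing it at localhost is what keeps the request off the internet.
    return {
        "url": LOCAL_BASE_URL,
        "body": json.dumps({
            "model": "llama-3.3-70b",
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_request("what do I know about my mom?")
```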

MONO is working on self-hosted Llama 3.3 support on the same VPS. By mid-2026, if you configure MONO in BYOK with a local model, your conversations never touch the cloud. Zero data leaves.

For now, the default is Claude with a no-retention policy. For most users, that's a reasonable trade-off: your full history lives locally (a big advantage over ChatGPT), and each individual request passes through a provider with a privacy contract (still better than a multi-tenant store).

What your MONO learns over time

Because memory persists across conversations, your MONO accumulates:

  • Names and birthdays of your circle.
  • Diets, allergies, preferences.
  • Routines (gym time, bedtime).
  • Spending patterns by category.
  • Ongoing projects and their state.
  • Passwords and docs (encrypted, Vault skill).
  • Notes and reflections (Memory skill).

After 3 months of use, MONO knows more about your day-to-day than most of your family does. That's why the privacy architecture matters so much: the very data that makes an assistant useful is what other services would call a "catastrophic breach" if it leaked.

The honest version

MONO isn't 100% offline yet. It uses cloud LLMs for responses. But: (1) your full history is never uploaded, only per-request fragments; (2) the providers have no-retention policies; (3) your VPS is yours alone, not multi-tenant. Compared to ChatGPT, where everything lives in OpenAI's multi-tenant cloud, the delta is huge. And we're heading toward local Llama to close the last gap.