Introduction
"It's like talking to a friend who actually remembers what I told them last week." That's the line we kept hearing from testers — and it's the goal we set when we started building the FastAI Coach. Generic chatbots forget you between sessions. FastAI's Coach doesn't, and this post explains how.
If you've used a typical AI coaching app, you've felt the gap. You explain your routine on Monday — that you train fasted, can't tolerate dairy, work the night shift on Wednesdays — and by Friday the coach is back to suggesting milk-based smoothies for breakfast. The AI is technically intelligent. It just isn't intelligent about you.
This is the engineering story of how we closed that gap with a memory pipeline built on Claude, Voyage 3.5-lite embeddings, and retrieval-augmented generation — tuned not for corporate documents but for the messy, personal context of someone's health habits.
The Problem with Stateless AI Coaches
Most AI features in consumer apps today are stateless. The model gets a prompt, returns an answer, and forgets the conversation the moment it ends. For a search query or a one-off question, that's fine. For a coach you talk to every day for months, it's the wrong shape entirely.
The naive fix is to dump the full chat history into every new prompt. That breaks fast:
- Token cost explodes: A user with three months of daily check-ins has tens of thousands of words of history. Sending all of it on every turn is unaffordable.
- Models lose focus: Even with massive context windows, the signal-to-noise ratio drops. Important nuggets get diluted by routine chatter.
- Privacy surface grows: Every byte of memory you send to the model is another byte of personal-health data crossing a network. Less is better.
- Latency suffers: Bigger prompts mean slower responses. For a Coach you're chatting with, every extra second matters.
The right shape is selective recall. Don't send everything; send the right things. That's exactly what RAG — retrieval-augmented generation — was designed for. The trick is tuning it for personal-health memory, where what matters isn't a document but a habit.
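The core of selective recall can be stated in a few lines of code. The sketch below is illustrative, not FastAI's actual implementation: it ranks stored memory snippets by relevance to the current question and packs only the best into a fixed token budget, so prompt size stays constant no matter how long the history grows.

```typescript
// Hypothetical sketch of selective recall: rank stored memory snippets
// by relevance to the current question, then pack the best ones into a
// fixed token budget instead of sending the full history.

type Snippet = { text: string; score: number }; // score = relevance to the question

// Crude token estimate: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function selectForBudget(snippets: Snippet[], budgetTokens: number): string[] {
  const picked: string[] = [];
  let used = 0;
  // Highest-relevance snippets first; skip anything that would bust the budget.
  for (const s of [...snippets].sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(s.text);
    if (used + cost > budgetTokens) continue;
    picked.push(s.text);
    used += cost;
  }
  return picked;
}
```

However the relevance scores are produced (embeddings, recency, or both), the budget cap is what keeps token cost, latency, and privacy surface flat as a user's history grows.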
The Architecture: Three Layers of Memory
FastAI's Coach has three distinct memory layers, each tuned for a different time horizon and a different shape of recall.
Layer 1 — Live Context (this session)
The current conversation, your live fasting status, your last few meals, your weight today. This is the immediate frame the Coach reasons within and is sent on every Claude call. It's small, fresh, and structured as natural-language summaries rather than raw JSON — which both saves tokens and improves the model's reasoning.
Layer 2 — Recent Window (4–6 weeks)
A rolling window of recent conversations and meal logs, retained in Convex but only surfaced when relevant to the current question. If you ask "why did I plateau?", the Coach pulls in the last 30 days of fasting durations and meals. If you're just checking in, that history doesn't crowd the prompt.
Layer 3 — Long-Term Memory (extracted nuggets)
This is where the magic happens. Every night, a Convex cron runs a Claude pass over recent conversations and asks: "what's worth remembering about this user beyond a few weeks?" The model returns short, structured memory nuggets — facts, preferences, patterns:
- "Trains fasted on Tuesdays and Thursdays; says workouts feel stronger when fasted past 16 hours."
- "Avoids dairy entirely — caused digestive issues in week 2."
- "Goal is body recomp, not pure weight loss; weight stalls don't bother them if measurements are improving."
- "Works night shifts on Wednesdays; eating window shifts to 4 PM – midnight that day."
Each nugget gets embedded with Voyage 3.5-lite and stored in Convex with vector search enabled. When the Coach gets a new question, we embed the question, run a similarity search over the user's memory nuggets, and pull the top few back into the prompt. The Coach answers with both the live context AND the relevant long-term memory loaded — exactly as if a human coach had skimmed their notes before the call.
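Under the hood, that lookup is a nearest-neighbour ranking. In production the ranking happens inside Convex's vector index, server-side; the sketch below only illustrates the math — embed the question, score every nugget by cosine similarity, keep the top few:

```typescript
// Illustrative sketch of the retrieval step: score the user's memory
// nuggets by cosine similarity to the question embedding and return the
// top k. In production a vector index does this ranking server-side.

type Nugget = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topNuggets(question: number[], nuggets: Nugget[], k: number): Nugget[] {
  return [...nuggets]
    .sort((a, b) => cosine(question, b.embedding) - cosine(question, a.embedding))
    .slice(0, k);
}
```

The important property is that the search is scoped to one user's nuggets — relevance is computed within their memory store, never across users.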
Why Voyage 3.5-lite (Not the Bigger Models)
Picking an embedding model is a tradeoff between recall quality, dimension size, and cost-per-token. We started on a heavier model and switched to Voyage 3.5-lite in v1.6 for three reasons:
Cost at scale. Personal-health memory means embedding a lot of small chunks for every user, every night. A lighter model with smaller embedding dimensions cuts both token cost and storage cost without meaningfully hurting recall on short, single-fact nuggets like "avoids dairy."
Latency on retrieval. Smaller embeddings mean faster vector search at query time, and because retrieval runs on every turn, those savings compound into a Coach that simply feels snappier.
Recall quality holds. For the kind of memory we extract — short, declarative, single-topic — the lite model retrieves the right nuggets just as well as the heavier model in our tests. Save the bigger embeddings for use cases that actually need them.
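The storage side of the tradeoff is simple arithmetic — vector storage scales linearly with embedding dimension. The nugget counts and dimensions below are illustrative, not FastAI's production numbers:

```typescript
// Back-of-the-envelope storage math (numbers illustrative): per-user vector
// storage scales linearly with embedding dimension, so halving the dimension
// halves the storage.

function vectorStorageBytes(nuggetCount: number, dims: number): number {
  return nuggetCount * dims * 4; // 4 bytes per float32 component
}

// 300 nuggets at 512 dims -> ~0.6 MB; at 1024 dims -> ~1.2 MB per user.
```

Per user the difference is small; multiplied across every user and re-embedded nightly, it's the difference between a rounding error and a line item.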
It's a deliberately boring choice. The interesting design isn't picking the fanciest model; it's picking the right one for the shape of data you actually have.
What We Extract — and What We Don't
The single most important decision in the whole pipeline is what counts as "memory-worthy." Get it wrong in one direction and you fill the user's memory store with chatter that drowns out signal. Get it wrong in the other and you miss the things that actually matter.
Our extraction prompt asks the model to look for four specific shapes:
- Hard preferences — dietary restrictions, foods that consistently cause problems, religious/cultural constraints around food.
- Routine patterns — fasting windows tied to specific weekdays, training schedules, meal timing patterns that recur.
- Goals & motivations — what success looks like to this user, the deeper "why" behind the fasting habit.
- Stated health context — conditions or sensitivities the user has shared that should inform every future suggestion.
And critically, we explicitly don't extract:
- Pleasantries, greetings, casual chitchat.
- Single-day events or transient feelings ("I'm a bit tired today").
- Anything the model is uncertain about — better to forget than to confidently remember a misread.
The bias is toward sparse, high-quality memory rather than dense, low-quality memory. A user with thirty good nuggets is better served than one with three hundred mediocre ones.
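Those inclusion and exclusion rules translate into a simple post-filter on whatever the extraction pass returns. A hypothetical sketch — the category names echo the four shapes above, and the 0.8 confidence threshold is an assumption for illustration, not FastAI's tuned value:

```typescript
// Hypothetical post-filter on extracted nuggets: keep only the four
// memory-worthy shapes, and drop anything the model flagged as uncertain.
// The 0.8 confidence threshold is illustrative.

type ExtractedNugget = {
  text: string;
  category: string;    // model-assigned shape, e.g. "preference"
  confidence: number;  // model's self-reported certainty, 0..1
};

const ALLOWED = new Set<string>([
  "preference",     // hard preferences, restrictions
  "routine",        // recurring patterns tied to schedule
  "goal",           // goals and motivations
  "health_context", // stated conditions or sensitivities
]);

function keepNugget(n: ExtractedNugget): boolean {
  // Better to forget than to confidently remember a misread.
  return ALLOWED.has(n.category) && n.confidence >= 0.8;
}
```

Anything categorized outside the four shapes — chitchat, one-off moods — or flagged as uncertain never reaches the memory store in the first place.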
The Daily Insight: Memory Made Visible
Memory does its best work when the user can see it working. That's the role of the Daily Insight — the morning summary FastAI generates each day. It explicitly references long-term memory in its language: "Given that you avoid dairy and trained fasted yesterday, your protein intake at 4 PM was right on target." When the user reads that, they feel the Coach knowing them — and they trust the suggestions that come next.
This is not a UI flourish. It's the feedback loop that makes memory durable. The user sees their constraints honored, so they tell the Coach more. They tell the Coach more, so the memory store gets richer. The memory store gets richer, so the suggestions get sharper. That's the loop we're optimizing for.
Privacy, Honestly
Long-term memory of someone's health habits is sensitive. We took it seriously from the first commit:
- Memory lives in your account, not a shared corpus. Each user's nuggets are scoped to their identity. Nothing is mixed into a global model or used to train anything.
- You can delete it. Account deletion is a real cascade — across nine tables — validated end-to-end by real testers, not a "request" form. More on the audit close-out here →
- Telemetry is EU-hosted and PII-scrubbed. Sentry and PostHog are both in the EU region, and we strip emails, JWTs, and secrets before any event leaves the device.
- Prompt-injection guard on every AI surface. Even if a user pastes adversarial text, it can't override our system prompt or extract another user's data — there's no other user's data to extract.
What Memory Won't Fix
It's worth being honest about the things memory can't solve. A Coach that remembers you better still won't make a hard fast easy. It won't replace a doctor. It won't tell you what to eat with the certainty of a clinical trial. What it does is much smaller and, for daily use, much more important: it stops asking you the same questions twice. It stops suggesting things you've already told it won't work for you. It builds, over weeks, into a sense that this app actually knows you — and that's the difference between an AI tool you use once and an AI tool you keep coming back to.
What's Next
The memory pipeline today is good enough to ship and good enough to feel. The roadmap from here is about depth, not novelty:
- Wearable signals as memory inputs. Heart-rate variability and sleep quality from Apple Watch / Oura would become extraction inputs alongside chat — letting the Coach notice "your fasts feel harder when your sleep score is below 70" without you having to say it.
- User-visible memory editing. A simple settings screen where you can read what the Coach remembers about you, correct anything wrong, and pin a memory you want it to never forget.
- Memory consolidation. Periodic re-extraction passes that merge near-duplicate nuggets and prune ones that haven't been retrieved in months — keeping the store sharp as it grows.
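As a sketch of what that consolidation pass could look like — this is roadmap, not shipped code, and the 0.95 similarity threshold is purely an assumption — near-duplicate nuggets can be merged by comparing their embeddings and keeping the most recently updated wording:

```typescript
// Sketch of a possible consolidation pass (roadmap, not shipped): drop
// nuggets whose embeddings nearly duplicate a newer nugget's. The 0.95
// similarity threshold is an assumption for illustration.

type StoredNugget = { text: string; embedding: number[]; updatedAt: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function consolidate(nuggets: StoredNugget[], threshold = 0.95): StoredNugget[] {
  const kept: StoredNugget[] = [];
  // Newest first, so an older near-duplicate is absorbed by the newer wording.
  for (const n of [...nuggets].sort((a, b) => b.updatedAt - a.updatedAt)) {
    if (!kept.some((k) => cosine(k.embedding, n.embedding) >= threshold)) {
      kept.push(n);
    }
  }
  return kept;
}
```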
Try the Coach that Remembers
FastAI Health Coach is live on the iOS App Store (v2.12.2 currently in review), and Android is in Closed Testing on Google Play. Try the Coach with real memory today.
🍎 Download on iOS App Store →