Coach that Remembers: How FastAI Built Long-Term AI Memory for Health Habits

📖 9 min read
AI / Memory & RAG
April 2026

Introduction

"It's like talking to a friend who actually remembers what I told them last week." That's the line we kept hearing from testers — and it's the goal we set when we started building the FastAI Coach. Generic chatbots forget you between sessions. FastAI's Coach doesn't, and this post explains how.

If you've used a typical AI coaching app, you've felt the gap. You explain your routine on Monday — that you train fasted, can't tolerate dairy, work the night shift on Wednesdays — and by Friday the coach is back to suggesting milk-based smoothies for breakfast. The AI is technically intelligent. It just isn't yours.

This is the engineering story of how we closed that gap with a memory pipeline built on Claude, Voyage 3.5-lite embeddings, and retrieval-augmented generation — tuned not for corporate documents but for the messy, personal context of someone's health habits.

The Problem with Stateless AI Coaches

Most AI features in consumer apps today are stateless. The model gets a prompt, returns an answer, and forgets the conversation the moment it ends. For a search query or a one-off question, that's fine. For a coach you talk to every day for months, it's the wrong shape entirely.

The naive fix is to dump the full chat history into every new prompt. That breaks fast:

  • Token cost explodes: A user with three months of daily check-ins has tens of thousands of words of history. Sending all of it on every turn is unaffordable.
  • Models lose focus: Even with massive context windows, the signal-to-noise ratio drops. Important nuggets get diluted by routine chatter.
  • Privacy surface grows: Every byte of memory you send to the model is another byte of personal-health data crossing a network. Less is better.
  • Latency suffers: Bigger prompts mean slower responses. For a Coach you're chatting with, every extra second matters.

The right shape is selective recall. Don't send everything; send the right things. That's exactly what RAG — retrieval-augmented generation — was designed for. The trick is tuning it for personal-health memory, where what matters isn't a document but a habit.
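To make the cost difference concrete, here is a back-of-the-envelope comparison. All numbers are illustrative assumptions, not FastAI's measured figures:

```typescript
// Rough token-cost comparison: full-history prompts vs. selective recall.
// Word counts and the tokens-per-word ratio are illustrative assumptions.

const TOKENS_PER_WORD = 1.3; // common rule of thumb for English text

// Three months of daily check-ins at ~250 words each.
const historyWords = 90 * 250;
const fullHistoryTokens = Math.round(historyWords * TOKENS_PER_WORD); // 29250

// Selective recall: a small live context plus a handful of retrieved nuggets.
const liveContextWords = 300;
const nuggetsRetrieved = 5;
const wordsPerNugget = 15;
const selectiveTokens = Math.round(
  (liveContextWords + nuggetsRetrieved * wordsPerNugget) * TOKENS_PER_WORD
); // 488
```

Under these assumptions, selective recall sends roughly sixty times fewer tokens per turn — and the gap widens every day the user keeps logging.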

The Architecture: Three Layers of Memory

FastAI's Coach has three distinct memory layers, each tuned for a different time horizon and a different shape of recall.

Layer 1 — Live Context (this session)

The current conversation, your live fasting status, your last few meals, your weight today. This is the immediate frame the Coach reasons within and is sent on every Claude call. It's small, fresh, and structured as natural-language summaries rather than raw JSON — which both saves tokens and improves the model's reasoning.
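A minimal sketch of that natural-language framing — the field names and phrasing here are hypothetical, not FastAI's actual schema:

```typescript
// Sketch: render structured session state as a natural-language summary
// for the prompt, rather than sending raw JSON. Hypothetical field names.
interface LiveContext {
  fastingHours: number;
  lastMeals: string[];
  weightKg: number;
}

function summarizeLiveContext(ctx: LiveContext): string {
  const meals = ctx.lastMeals.join(", ");
  return (
    `The user is ${ctx.fastingHours} hours into a fast. ` +
    `Recent meals: ${meals}. Current weight: ${ctx.weightKg} kg.`
  );
}

const summary = summarizeLiveContext({
  fastingHours: 14,
  lastMeals: ["grilled salmon", "oatmeal with berries"],
  weightKg: 72.5,
});
```

A sentence like this costs a fraction of the equivalent JSON blob in tokens, and the model reasons over it more naturally.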

Layer 2 — Recent Window (4–6 weeks)

A rolling window of recent conversations and meal logs, retained in Convex but only surfaced when relevant to the current question. If you ask "why did I plateau?", the Coach pulls in the last 30 days of fasting durations and meals. If you're just checking in, that history doesn't crowd the prompt.
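The gating logic can be sketched in a few lines — the window size and trigger keywords below are illustrative, not the production heuristics:

```typescript
// Sketch of the recent-window rule: history is only surfaced when the
// question calls for it. Trigger keywords are illustrative.
interface LogEntry {
  date: Date;
  fastingHours: number;
}

function recentWindow(logs: LogEntry[], days: number, now: Date): LogEntry[] {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  return logs.filter((l) => l.date.getTime() >= cutoff);
}

function needsHistory(question: string): boolean {
  // A trend question pulls in the window; a simple check-in does not.
  return /plateau|trend|progress|why/i.test(question);
}

needsHistory("why did I plateau?"); // true → attach last 30 days of logs
needsHistory("checking in, feeling good"); // false → keep the prompt lean
```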

Layer 3 — Long-Term Memory (extracted nuggets)

This is where the magic happens. Every night, a Convex cron runs a Claude pass over recent conversations and asks: "what's worth remembering about this user beyond a few weeks?" The model returns short, structured memory nuggets — facts, preferences, patterns:

  • Fact: "Can't tolerate dairy."
  • Preference: "Trains fasted in the morning."
  • Pattern: "Works the night shift on Wednesdays, so meals and sleep shift with it."
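Downstream of the extraction call, a cheap quality gate can enforce the preference for short, single-topic nuggets. A sketch, with illustrative thresholds and a hypothetical Nugget type:

```typescript
// Sketch of a post-extraction quality gate: keep only short, declarative,
// single-topic nuggets. The 20-word threshold is an illustrative choice.
type NuggetKind = "fact" | "preference" | "pattern";

interface Nugget {
  kind: NuggetKind;
  text: string;
}

function isMemoryWorthy(n: Nugget): boolean {
  const words = n.text.trim().split(/\s+/).length;
  // One idea, a sentence or less, no multi-line ramble.
  return words > 0 && words <= 20 && !n.text.includes("\n");
}

const extracted: Nugget[] = [
  { kind: "fact", text: "Can't tolerate dairy." },
  { kind: "pattern", text: "Works the night shift on Wednesdays." },
  {
    kind: "fact",
    text: "Today the user talked at length about the weather, a TV show, traffic on the way to work, and several other things that do not matter next month.",
  },
];

const kept = extracted.filter(isMemoryWorthy); // drops the rambling entry
```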

Each nugget gets embedded with Voyage 3.5-lite and stored in Convex with vector search enabled. When the Coach gets a new question, we embed the question, run a similarity search over the user's memory nuggets, and pull the top few back into the prompt. The Coach answers with both the live context AND the relevant long-term memory loaded — exactly as if a human coach had skimmed their notes before the call.
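The recall step reduces to a nearest-neighbor search. In production this is Convex's vector search over Voyage embeddings; the sketch below does it by hand with toy 3-dimensional vectors purely for illustration:

```typescript
// Minimal sketch of recall: score the question embedding against stored
// nugget embeddings by cosine similarity, take the top few.
interface StoredNugget {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], nuggets: StoredNugget[], k: number): StoredNugget[] {
  return [...nuggets]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

const store: StoredNugget[] = [
  { text: "Can't tolerate dairy.", embedding: [0.9, 0.1, 0.0] },
  { text: "Trains fasted in the morning.", embedding: [0.1, 0.9, 0.2] },
  { text: "Night shift on Wednesdays.", embedding: [0.0, 0.2, 0.9] },
];

// A breakfast question lands near the dairy nugget in this toy space.
const hits = topK([0.8, 0.2, 0.1], store, 2);
```

Only those top hits enter the prompt — the rest of the memory store stays home.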

Why Voyage 3.5-lite (Not the Bigger Models)

Picking an embedding model is a tradeoff between recall quality, dimension size, and cost-per-token. We started on a heavier model and switched to Voyage 3.5-lite in v1.6 for three reasons:

Cost at scale. Personal-health memory means embedding a lot of small chunks for every user, every night. A lighter model with smaller embedding dimensions cuts both token cost and storage cost without meaningfully hurting recall on short, single-fact nuggets like "avoids dairy."

Latency on retrieval. Smaller embeddings mean faster vector search at query time — and in a chat loop where retrieval runs on every turn, those saved milliseconds compound into a Coach that feels snappier.

Recall quality holds. For the kind of memory we extract — short, declarative, single-topic — the lite model retrieves the right nuggets just as well as the heavier model in our tests. Save the bigger embeddings for use cases that actually need them.
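The storage side of that tradeoff is simple arithmetic. Dimensions and nugget counts below are illustrative, not Voyage's or FastAI's actual numbers:

```typescript
// Storage for float32 vectors grows linearly with embedding dimension.
// Dimension sizes and per-user nugget counts here are illustrative.
function vectorBytes(nuggets: number, dims: number): number {
  return nuggets * dims * 4; // 4 bytes per float32 component
}

const perUserSmall = vectorBytes(100, 512);  // 204800 bytes ≈ 200 KB
const perUserLarge = vectorBytes(100, 1536); // 614400 bytes ≈ 600 KB
// Across a million users, that gap is roughly 400 GB of vector storage —
// before counting the extra compute on every similarity comparison.
```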

It's a deliberately boring choice. The interesting design isn't picking the fanciest model; it's picking the right one for the shape of data you actually have.

What We Extract — and What We Don't

The single most important decision in the whole pipeline is what counts as "memory-worthy." Get it wrong in one direction and you fill the user's memory store with chatter that drowns out signal. Get it wrong in the other and you miss the things that actually matter.

Our extraction prompt asks the model to look for a few specific shapes: durable facts ("can't tolerate dairy"), stated preferences ("trains fasted in the morning"), and recurring patterns ("works the night shift on Wednesdays").

And critically, we explicitly don't extract routine chatter — greetings, one-off logistics, anything the rolling recent window already covers and that won't matter in a month.

The bias is toward sparse, high-quality memory rather than dense, low-quality memory. A user with thirty good nuggets is better served than one with three hundred mediocre ones.

The Daily Insight: Memory Made Visible

Memory does its best work when the user can see it working. That's the role of the Daily Insight — the morning summary FastAI generates each day. It explicitly references long-term memory in its language: "Given that you avoid dairy and trained fasted yesterday, your protein intake at 4 PM was right on target." When the user reads that, they feel the Coach knowing them — and they trust the suggestions that come next.
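One way to make the Coach reference memory explicitly is to weave the retrieved nuggets into the generation instruction itself. The template below is a hypothetical sketch, not FastAI's production prompt:

```typescript
// Sketch: assemble a Daily Insight prompt that surfaces long-term memory
// in the instruction, so the output visibly honors it. Hypothetical template.
function buildInsightPrompt(nuggets: string[], yesterday: string): string {
  return [
    "Write a short morning insight for the user.",
    "Reference their long-term context explicitly so they see it honored:",
    ...nuggets.map((n) => `- ${n}`),
    `Yesterday: ${yesterday}`,
  ].join("\n");
}

const prompt = buildInsightPrompt(
  ["Avoids dairy.", "Trains fasted."],
  "16h fast completed; protein at 4 PM on target."
);
```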

This is not a UI flourish. It's the feedback loop that makes memory durable. The user sees their constraints honored, so they tell the Coach more. They tell the Coach more, so the memory store gets richer. The memory store gets richer, so the suggestions get sharper. That's the loop we're optimizing for.

Privacy, Honestly

Long-term memory of someone's health habits is sensitive. We took it seriously from the first commit: memory lives in the user's own Convex records, only the handful of retrieved nuggets crosses the network on any given call, and the bias toward sparse extraction doubles as a privacy bias — the less we store and send, the less there is to protect.

What Memory Won't Fix

It's worth being honest about the things memory can't solve. A Coach that remembers you better still won't make a hard fast easy. It won't replace a doctor. It won't tell you what to eat with the certainty of a clinical trial. What it does is much smaller and, for daily use, much more important: it stops asking you the same questions twice. It stops suggesting things you've already told it won't work for you. It builds, over weeks, into a sense that this app actually knows you — and that's the difference between an AI tool you use once and an AI tool you keep coming back to.

What's Next

The memory pipeline today is good enough to ship and good enough to feel. The roadmap from here is about depth, not novelty.

Try the Coach that Remembers

FastAI Health Coach is live on the iOS App Store (v2.12.2 in review), with Android in Closed Testing on Google Play. Try the Coach with real memory today.

🍎 Download on iOS App Store →