AI writing assist in a diary app: prompt design, validation, fallbacks, and usage control

Storyie Engineering Team

How we built Storyie's AI writing assist feature — covering structured prompt design across six diary patterns, LLM output validation, graceful fallbacks when the API is down, and subscription-gated usage quotas.


The hardest part of keeping a diary is the first sentence. We built Storyie's writing assist feature to solve exactly that: given a user's mood, the time of day, and any keywords they want to write about, the feature generates a natural opening that they can continue from.

This post is about the engineering behind it — prompt structure, output validation, fallback behavior, and how the usage quota ties into the subscription system. Not the UX rationale; the technical decisions.

TL;DR

  • Both the web app and the Expo mobile app call a single Next.js API route at POST /api/ai/assist. The AI layer lives on the server — auth, quota, and usage recording all happen before Claude is ever invoked.
  • Six diary patterns each constrain the output to a specific structure (emoji markers, section headings). The LLM is not given creative latitude over format.
  • Per-pattern validators check that the structure survived the generation. Failures fall through to a static fallback pool, so the feature always returns something.
  • Usage quota is enforced with a monthly COUNT query — no reset job required.

| Concern | Approach |
|---|---|
| Cross-platform delivery | Single Next.js API route, called by both Expo and web |
| Output consistency | Six diary patterns with fixed structure; per-pattern validators |
| Fault tolerance | Three-tier fallback: API down → API error → validation failure |
| Cost control | Claude Haiku, per-pattern max_tokens caps, subscription quota |
| Quota reset | Monthly COUNT query on created_at; no cron job |

Architecture overview

[Mobile / Web client]
        ↓ POST /api/ai/assist
[Next.js API Route]
        ├── Auth check (Cookie / Bearer token)
        ├── Subscription plan → monthly usage check
        ├── Prompt assembly (pattern × language × context)
        ├── Anthropic API call (Claude Haiku)
        ├── Response validation → fallback decision
        └── Usage recording → response

The key decision here is that all AI logic sits in the API route. The Expo app and the web app both call the same endpoint with the same payload shape. There is no platform-specific AI path to keep in sync, and the API key never leaves the server.
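To make the shared contract concrete, the payload might look something like the following. These type and field names are illustrative assumptions, not the actual Storyie schema:

```typescript
// Hypothetical request/response shapes for POST /api/ai/assist
// (field names assumed; the post only describes the contract informally).
interface AssistRequest {
  pattern: "free" | "three_lines" | "fact_feel_next" | "gratitude" | "five_min" | "growth";
  language: string;      // e.g. "ja" — one of the ten supported languages
  mood_score?: number;   // optional mood input
  keywords?: string[];   // optional topics the user wants to write about
  streak_days?: number;  // optional client hint; recomputed server-side
}

interface AssistResponse {
  success: boolean;
  suggestions: string[];
  usage: { used: number; limit: number | null; remaining: number };
}

// Both clients build the same object and call the same endpoint:
const req: AssistRequest = { pattern: "free", language: "ja", mood_score: 4 };
```

Because both platforms share one shape, a contract change is a single type edit rather than two divergent client updates.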

Six diary patterns

Letting the model write whatever it wants produces repetitive output. Storyie defines six diary patterns, each with a fixed output structure.

| Pattern | Shape | Example output |
|---|---|---|
| Free | 1–2 open sentences | "I took a different route home tonight..." |
| Three Lines | Three emoji-marked lines (Kobayashi method) | 😔 What didn't go well / ✨ What moved me / 🏃 What I'll do tomorrow |
| Fact → Feel → Next | Three labeled sections | 📝 What happened / 💬 How I felt / 🔮 What's next |
| Gratitude | Three-angle gratitude reflection | — |
| 5-Minute Journal | Morning vs. evening variant | 🌅 Intention setting / 🌙 Evening review |
| Growth | Four-step growth diary | 📌 Fact → 🔍 Discovery → 📖 Lesson → 💪 Declaration |

Each pattern constrains the output structure in the system prompt. The model is told exactly which emoji markers to use and in what order — there is no creative latitude on format, only on the text within each section. This constraint is what makes validation possible.

Prompt design

System prompt: two layers

The system prompt is split into shared rules and pattern-specific instructions.

The shared rules enforce five things:

  1. Language fidelity — respond in the user's language, no exceptions (ten languages supported)
  2. Brevity — the opening should be short; leave room for the user to continue
  3. Consistent tone — warm, non-judgmental, like an encouraging friend
  4. Diversity — don't repeat previous openings (context about recent entries is included)
  5. No assumptions — don't invent specifics about the user's day that weren't provided

Pattern-specific instructions follow the shared block. For Three Lines, the model is told the exact three emoji markers, in order, and that each should be a single reflective sentence.
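Assembly of the two layers can be sketched as a lookup plus concatenation. The instruction wording below is a guess at the shape, not the production prompt:

```typescript
// Sketch: pattern-specific instruction blocks appended to the shared rules.
// Wording is illustrative; only the marker set and ordering come from the post.
const PATTERN_INSTRUCTIONS: Record<string, string> = {
  three_lines: [
    "Write exactly three lines, in this order:",
    "😔 <one sentence: what didn't go well>",
    "✨ <one sentence: what moved me>",
    "🏃 <one sentence: what I'll do tomorrow>",
    "Each line must start with its emoji marker. Add no other text.",
  ].join("\n"),
};

function buildSystemPrompt(sharedRules: string, pattern: string): string {
  return `${sharedRules}\n\n${PATTERN_INSTRUCTIONS[pattern] ?? ""}`;
}

const prompt = buildSystemPrompt("SHARED RULES", "three_lines");
```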

User prompt: key-value over prose

We pass context as a structured key-value string rather than natural language:

Language=ja, Date=2026-03-20, Time=evening, Pattern=free, MoodScore=4, StreakDays=10

Natural language ("The user is writing on the evening of March 20th...") costs more tokens and is harder to build programmatically. Claude reads key-value input without any loss of fidelity, and adding a new context field like StreakDays is a one-line code change.
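The serializer for this format is a few lines. This is a minimal sketch (the function name is assumed), but it shows why adding a field is a one-line change:

```typescript
// Sketch: serialize context into the "Key=value, Key=value" format above.
// Undefined fields are dropped so optional context never emits "Key=undefined".
function buildUserPrompt(
  ctx: Record<string, string | number | undefined>,
): string {
  return Object.entries(ctx)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${value}`)
    .join(", ");
}

const userPrompt = buildUserPrompt({
  Language: "ja",
  Time: "evening",
  MoodScore: 4,
  StreakDays: undefined, // omitted from output
});
```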

Per-pattern temperature

This is a quiet but effective lever. Structured patterns need the model to stay on format; free-form patterns benefit from variety.

const PATTERN_TEMPERATURE: Record<DiaryPattern, number> = {
  free: 0.85,        // maximize variety
  three_lines: 0.7,  // structured, some flexibility
  fact_feel_next: 0.65,
  gratitude: 0.7,
  five_min: 0.65,    // strict format adherence
  growth: 0.65,
};

Lowering temperature for structured patterns meaningfully improved the validator pass rate in our testing. The Free pattern stays high because sameness defeats the purpose.

Output validation

The baseline assumption is that LLM output cannot be trusted to match the specified format. Every pattern has a dedicated validator.

// Three Lines: require at least 2 of 3 expected emoji markers
function validateThreeLines(text: string): ValidationResult {
  const markers = ["😔", "✨", "🏃"];
  const found = countMarkers(text, markers);
  if (found < 2) {
    return { valid: false, reason: `found ${found}/3 markers` };
  }
  const cleaned = stripTrailingMeta(stripPreamble(text, markers));
  return { valid: true, cleaned };
}

The threshold is 2/3, not 3/3. Claude occasionally substitutes a similar emoji for one of the specified markers. Requiring exact matches would spike the rejection rate without meaningful quality gain. "Close enough" is the right bar here.
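The countMarkers helper referenced above isn't shown in the post; a minimal version just counts which expected markers appear at least once:

```typescript
// Sketch of countMarkers: number of distinct expected markers present in the text.
// (The production helper isn't shown in the post; this matches its usage.)
function countMarkers(text: string, markers: string[]): number {
  return markers.filter((marker) => text.includes(marker)).length;
}
```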

Stripping LLM noise

Models frequently add preamble and sign-off text that the user should never see:

  • Preamble: "Of course! Here's a diary opener for you..."
  • Trailing meta: "Feel free to adjust this to match your style!"

stripPreamble finds the first occurrence of an expected emoji marker and discards everything before it. stripTrailingMeta removes trailing lines matching patterns like "Feel free to..." or "Hope this helps...". Both run on every response before it reaches the validator.
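The two cleaners can be sketched as follows. The regex patterns are assumptions based on the examples above, not the production rules:

```typescript
// Sketch: drop everything before the first expected marker.
// If no marker is found, return the text unchanged and let the validator reject it.
function stripPreamble(text: string, markers: string[]): string {
  const first = Math.min(
    ...markers.map((m) => text.indexOf(m)).filter((i) => i >= 0),
  );
  return Number.isFinite(first) ? text.slice(first) : text;
}

// Sketch: peel off trailing sign-off lines like "Feel free to..." / "Hope this helps...".
function stripTrailingMeta(text: string): string {
  const metaLine = /^(feel free to|hope this helps)/i;
  const lines = text.split("\n");
  while (lines.length > 0 && metaLine.test(lines[lines.length - 1].trim())) {
    lines.pop();
  }
  return lines.join("\n").trimEnd();
}
```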

Fallback strategy

Three situations trigger the fallback path:

  1. The Anthropic client is not initialized (API key missing in environment)
  2. The API call throws (network error, rate limit, service unavailable)
  3. The response fails validation

In all three cases, the handler returns a 200 response with suggestions from the static fallback pool. From the client's perspective, the feature worked.

if (!client) {
  const suggestions = getFallbackSuggestions(language, pattern);
  return NextResponse.json({ success: true, suggestions, ... });
}

The fallback pool covers all six patterns in English and Japanese. For other locales, we cascade: try the user's language → fall back to English → fall back to the Free pattern for that language. The priority is that something is always returned.
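The cascade is a chain of lookups with English Free as the floor. The pool shape below is assumed; the real getFallbackSuggestions presumably closes over the pool rather than taking it as a parameter:

```typescript
// Sketch of the cascading fallback lookup (pool shape assumed).
type FallbackPool = Record<string, Record<string, string[]>>; // language → pattern → texts

function getFallbackSuggestions(
  pool: FallbackPool,
  language: string,
  pattern: string,
): string[] {
  return (
    pool[language]?.[pattern] ?? // 1. user's language, requested pattern
    pool["en"]?.[pattern] ??     // 2. English, requested pattern
    pool[language]?.["free"] ??  // 3. user's language, Free pattern
    pool["en"]["free"]           // 4. floor: English Free always exists
  );
}

const pool: FallbackPool = {
  en: { free: ["I keep coming back to one moment from today..."], three_lines: ["😔 ...\n✨ ...\n🏃 ..."] },
  ja: { free: ["今日、ふと立ち止まった瞬間があった…"] },
};
```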

Subscription-gated usage quota

AI generation has a real cost per call, so we gate usage by plan:

| Plan | Monthly limit |
|---|---|
| Free | 5 uses |
| Pro | 30 uses |

The check runs against an ai_assist_usage table in Supabase:

const limit = await getUserLimit(user.id);
const used = await countMonthlyUsage(user.id);
if (limit !== null && used >= limit) {
  return NextResponse.json({
    error: "AI_ASSIST_LIMIT_EXCEEDED",
    usage: { used, limit, remaining: 0 },
  }, { status: 429 });
}

countMonthlyUsage is a COUNT query filtered by created_at >= start of current month. No scheduled reset job — the counter resets automatically when the month rolls over because the WHERE clause always scopes to the current month.
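The only moving part is computing the month boundary; the query itself is a filtered count. The helper below is a sketch, and the commented query is one plausible supabase-js formulation rather than the production code:

```typescript
// Sketch: start of the current month in UTC, for the created_at filter.
function monthStartISO(now: Date = new Date()): string {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1),
  ).toISOString();
}

// Plausible supabase-js count query (assumed shape, not the production code):
// const { count } = await supabase
//   .from("ai_assist_usage")
//   .select("id", { count: "exact", head: true })
//   .eq("user_id", userId)
//   .gte("created_at", monthStartISO());
```

Because the boundary is recomputed on every request, the first call in a new month naturally sees a count of zero.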

Every successful response includes the usage object (used / limit / remaining) so the client can display remaining count without an extra API call.

Non-fatal usage recording

Failing to record a use does not fail the request:

async function recordUsage(userId: string, tokenCount: number): Promise<void> {
  const { error } = await supabase.from("ai_assist_usage").insert({ ... });
  if (error) {
    console.error("[AI Assist API] Failed to record usage:", error);
    // Non-fatal: the response still goes out
  }
}

A database hiccup that drops a count or two is less harmful than an error that leaves the user with no response. The plan limits are soft enough that occasional under-counting is acceptable.

Streak days as context

We compute the user's current diary streak server-side and inject it into the prompt as StreakDays. This lets the model produce contextually relevant openings — acknowledging a ten-day streak, for example, without requiring the client to send that information.

The streak is computed from the last 60 days of diary records. If the client sends a streak_days field we use it; otherwise we compute it server-side. Streak length only shapes the prompt, so accepting the client's value is a harmless shortcut; anything actually trust-sensitive is always computed server-side.
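A consecutive-days count over the fetched entry dates might look like this. It's a sketch under assumed conventions (ISO date strings, UTC, and a streak allowed to end yesterday if today's entry isn't written yet):

```typescript
// Sketch: count consecutive diary days ending today (or yesterday, if the
// user hasn't written today's entry yet). Dates are "YYYY-MM-DD" strings.
function prevDay(iso: string): string {
  const d = new Date(iso + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}

function computeStreak(entryDates: string[], today: string): number {
  const days = new Set(entryDates);
  let cursor = days.has(today) ? today : prevDay(today);
  let streak = 0;
  while (days.has(cursor)) {
    streak += 1;
    cursor = prevDay(cursor);
  }
  return streak;
}
```

With a 60-day window the loop is bounded, which is why fetching only recent records is enough.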

Model and token budget

We use claude-haiku-4-5-20251001. Writing assist is a short-generation task, and Haiku delivers adequate quality at lower latency and cost than larger models.

max_tokens is capped per pattern:

const PATTERN_MAX_TOKENS: Record<DiaryPattern, number> = {
  free: 300,
  three_lines: 400,
  fact_feel_next: 500,
  gratitude: 500,
  five_min: 500,
  growth: 500,
};

Free suggestions are intentionally brief; structured patterns need more tokens for their multiple sections. Capping tokens per pattern prevents runaway generation and keeps costs predictable.

What worked, what we'd change

Worked well:

  • The pattern × language matrix from a single endpoint gives a lot of output variety without adding complexity on the client.
  • The validation + fallback double safety net means AI downtime is invisible to users.
  • Tying quota enforcement to the @storyie/subscription package means plan changes propagate automatically — the AI route picks up new limits without any logic change of its own.

Would change:

  • Static fallbacks get stale if the user hits them repeatedly. A rotation index would help.
  • The structured pattern validators are emoji-dependent. If the model substitutes a visually similar emoji, the validator may still accept it, but the text won't render with the expected icon. A semantic marker approach would be more robust.
  • Streak computation runs on every request. A short-lived cache or a precomputed column would be a better trade-off as usage scales.

Key takeaways

  1. Constrain LLM output with structure, then validate it. Giving the model a free-form brief produces inconsistent output. Specifying exact markers and sections, then validating their presence, is more reliable than hoping the model follows directions every time.
  2. Fallbacks are a first-class feature, not an afterthought. The writing assist UI should never return empty. Designing the three fallback trigger points from the start made it easy to reason about fault modes.
  3. Usage control belongs in the platform layer from day one. Wiring quota enforcement to the existing subscription logic at the start meant no retrofitting. Any future plan change is automatically reflected in the AI route.

Try Storyie

The writing assist feature is live in the app. Try it on the web or on iOS — pick a diary pattern, enter a mood score, and see what comes back.