AI writing assist in a diary app: prompt design, validation, fallbacks, and usage control

Storyie Engineering Team

How we built Storyie's AI writing assist feature — covering structured prompt design across six diary patterns, LLM output validation, graceful fallbacks when the API is down, and subscription-gated usage quotas.


The hardest part of keeping a diary is the first sentence. We built Storyie's writing assist feature to solve exactly that: given a user's mood, the time of day, and any keywords they want to write about, the feature generates a natural opening that they can continue from.

This post is about the engineering behind it — prompt structure, output validation, fallback behavior, and how the usage quota ties into the subscription system. Not the UX rationale; the technical decisions.

TL;DR

  • Both the web app and the Expo mobile app call a single Next.js API route at POST /api/ai/assist. The AI layer lives on the server — auth, quota, and usage recording all happen before Claude is ever invoked.
  • Six diary patterns each constrain the output to a specific structure (emoji markers, section headings). The LLM is not given creative latitude over format.
  • Per-pattern validators check that the structure survived the generation. Failures fall through to a static fallback pool, so the feature always returns something.
  • Usage quota is enforced with a monthly COUNT query — no reset job required.

| Concern | Approach |
|---|---|
| Cross-platform delivery | Single Next.js API route, called by both Expo and web |
| Output consistency | Six diary patterns with fixed structure; per-pattern validators |
| Fault tolerance | Three-tier fallback: API down → API error → validation failure |
| Cost control | Claude Haiku, per-pattern max_tokens caps, subscription quota |
| Quota reset | Monthly COUNT query on created_at; no cron job |

Architecture overview

[Mobile / Web client]
        ↓ POST /api/ai/assist
[Next.js API Route]
        ├── Auth check (Cookie / Bearer token)
        ├── Subscription plan → monthly usage check
        ├── Prompt assembly (pattern × language × context)
        ├── Anthropic API call (Claude Haiku)
        ├── Response validation → fallback decision
        └── Usage recording → response

The key decision here is that all AI logic sits in the API route. The Expo app and the web app both call the same endpoint with the same payload shape. There is no platform-specific AI path to keep in sync, and the API key never leaves the server.
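To make the shared contract concrete, the payload might look something like the following. These type and field names are illustrative assumptions, not the actual Storyie schema:

```typescript
// Hypothetical request/response shapes for POST /api/ai/assist
// (field names assumed; the post only describes the contract informally).
interface AssistRequest {
  pattern: "free" | "three_lines" | "fact_feel_next" | "gratitude" | "five_min" | "growth";
  language: string;      // e.g. "ja" — one of the ten supported languages
  mood_score?: number;   // optional mood input
  keywords?: string[];   // optional topics the user wants to write about
  streak_days?: number;  // optional client hint; recomputed server-side
}

interface AssistResponse {
  success: boolean;
  suggestions: string[];
  usage: { used: number; limit: number | null; remaining: number };
}

// Both clients build the same object and call the same endpoint:
const req: AssistRequest = { pattern: "free", language: "ja", mood_score: 4 };
```

Because both platforms share one shape, a contract change is a single type edit rather than two divergent client updates.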

Six diary patterns

Letting the model write whatever it wants produces repetitive output. Storyie defines six diary patterns, each with a fixed output structure.

| Pattern | Shape | Example output |
|---|---|---|
| Free | 1–2 open sentences | "I took a different route home tonight..." |
| Three Lines | Three emoji-marked lines (Kobayashi method) | 😔 What didn't go well / ✨ What moved me / 🏃 What I'll do tomorrow |
| Fact → Feel → Next | Three labeled sections | 📝 What happened / 💬 How I felt / 🔮 What's next |
| Gratitude | Three-angle gratitude reflection | — |
| 5-Minute Journal | Morning vs. evening variant | 🌅 Intention setting / 🌙 Evening review |
| Growth | Four-step growth diary | 📌 Fact → 🔍 Discovery → 📖 Lesson → 💪 Declaration |

Each pattern constrains the output structure in the system prompt. The model is told exactly which emoji markers to use and in what order — there is no creative latitude on format, only on the text within each section. This constraint is what makes validation possible.

Prompt design

System prompt: two layers

The system prompt is split into shared rules and pattern-specific instructions.

The shared rules enforce five things:

  1. Language fidelity — respond in the user's language, no exceptions (ten languages supported)
  2. Brevity — the opening should be short; leave room for the user to continue
  3. Consistent tone — warm, non-judgmental, like an encouraging friend
  4. Diversity — don't repeat previous openings (context about recent entries is included)
  5. No assumptions — don't invent specifics about the user's day that weren't provided

Pattern-specific instructions follow the shared block. For Three Lines, the model is told the exact three emoji markers, in order, and that each should be a single reflective sentence.
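Assembly of the two layers can be sketched as a lookup plus concatenation. The instruction wording below is a guess at the shape, not the production prompt:

```typescript
// Sketch: pattern-specific instruction blocks appended to the shared rules.
// Wording is illustrative; only the marker set and ordering come from the post.
const PATTERN_INSTRUCTIONS: Record<string, string> = {
  three_lines: [
    "Write exactly three lines, in this order:",
    "😔 <one sentence: what didn't go well>",
    "✨ <one sentence: what moved me>",
    "🏃 <one sentence: what I'll do tomorrow>",
    "Each line must start with its emoji marker. Add no other text.",
  ].join("\n"),
};

function buildSystemPrompt(sharedRules: string, pattern: string): string {
  return `${sharedRules}\n\n${PATTERN_INSTRUCTIONS[pattern] ?? ""}`;
}

const prompt = buildSystemPrompt("SHARED RULES", "three_lines");
```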

User prompt: key-value over prose

We pass context as a structured key-value string rather than natural language:

Language=ja, Date=2026-03-20, Time=evening, Pattern=free, MoodScore=4, StreakDays=10

Natural language ("The user is writing on the evening of March 20th...") costs more tokens and is harder to build programmatically. Claude reads key-value input without any loss of fidelity, and adding a new context field like StreakDays is a one-line code change.
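The serializer for this format is a few lines. This is a minimal sketch (the function name is assumed), but it shows why adding a field is a one-line change:

```typescript
// Sketch: serialize context into the "Key=value, Key=value" format above.
// Undefined fields are dropped so optional context never emits "Key=undefined".
function buildUserPrompt(
  ctx: Record<string, string | number | undefined>,
): string {
  return Object.entries(ctx)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${value}`)
    .join(", ");
}

const userPrompt = buildUserPrompt({
  Language: "ja",
  Time: "evening",
  MoodScore: 4,
  StreakDays: undefined, // omitted from output
});
```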

Per-pattern temperature

This is a quiet but effective lever. Structured patterns need the model to stay on format; free-form patterns benefit from variety.

const PATTERN_TEMPERATURE: Record<DiaryPattern, number> = {
  free: 0.85,        // maximize variety
  three_lines: 0.7,  // structured, some flexibility
  fact_feel_next: 0.65,
  gratitude: 0.7,
  five_min: 0.65,    // strict format adherence
  growth: 0.65,
};

Lowering temperature for structured patterns meaningfully improved the validator pass rate in our testing. The Free pattern stays high because sameness defeats the purpose.

Output validation

The baseline assumption is that LLM output cannot be trusted to match the specified format. Every pattern has a dedicated validator.

// Three Lines: require at least 2 of 3 expected emoji markers
function validateThreeLines(text: string): ValidationResult {
  const markers = ["😔", "✨", "🏃"];
  const found = countMarkers(text, markers);
  if (found < 2) {
    return { valid: false, reason: `found ${found}/3 markers` };
  }
  const cleaned = stripTrailingMeta(stripPreamble(text, markers));
  return { valid: true, cleaned };
}

The threshold is 2/3, not 3/3. Claude occasionally substitutes a similar emoji for one of the specified markers. Requiring exact matches would spike the rejection rate without meaningful quality gain. "Close enough" is the right bar here.
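The countMarkers helper referenced above isn't shown in the post; a minimal version just counts which expected markers appear at least once:

```typescript
// Sketch of countMarkers: number of distinct expected markers present in the text.
// (The production helper isn't shown in the post; this matches its usage.)
function countMarkers(text: string, markers: string[]): number {
  return markers.filter((marker) => text.includes(marker)).length;
}
```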

Stripping LLM noise

Models frequently add preamble and sign-off text that the user should never see:

  • Preamble: "Of course! Here's a diary opener for you..."
  • Trailing meta: "Feel free to adjust this to match your style!"

stripPreamble finds the first occurrence of an expected emoji marker and discards everything before it. stripTrailingMeta removes trailing lines matching patterns like "Feel free to..." or "Hope this helps...". Both run on every response before it reaches the validator.
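The two cleaners can be sketched as follows. The regex patterns are assumptions based on the examples above, not the production rules:

```typescript
// Sketch: drop everything before the first expected marker.
// If no marker is found, return the text unchanged and let the validator reject it.
function stripPreamble(text: string, markers: string[]): string {
  const first = Math.min(
    ...markers.map((m) => text.indexOf(m)).filter((i) => i >= 0),
  );
  return Number.isFinite(first) ? text.slice(first) : text;
}

// Sketch: peel off trailing sign-off lines like "Feel free to..." / "Hope this helps...".
function stripTrailingMeta(text: string): string {
  const metaLine = /^(feel free to|hope this helps)/i;
  const lines = text.split("\n");
  while (lines.length > 0 && metaLine.test(lines[lines.length - 1].trim())) {
    lines.pop();
  }
  return lines.join("\n").trimEnd();
}
```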

Fallback strategy

Three situations trigger the fallback path:

  1. The Anthropic client is not initialized (API key missing in environment)
  2. The API call throws (network error, rate limit, service unavailable)
  3. The response fails validation

In all three cases, the handler returns a 200 response with suggestions from the static fallback pool. From the client's perspective, the feature worked.

if (!client) {
  const suggestions = getFallbackSuggestions(language, pattern);
  return NextResponse.json({ success: true, suggestions, ... });
}

The fallback pool covers all six patterns in English and Japanese. For other locales, we cascade: try the user's language → fall back to English → fall back to the Free pattern for that language. The priority is that something is always returned.
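The cascade is a chain of lookups with English Free as the floor. The pool shape below is assumed; the real getFallbackSuggestions presumably closes over the pool rather than taking it as a parameter:

```typescript
// Sketch of the cascading fallback lookup (pool shape assumed).
type FallbackPool = Record<string, Record<string, string[]>>; // language → pattern → texts

function getFallbackSuggestions(
  pool: FallbackPool,
  language: string,
  pattern: string,
): string[] {
  return (
    pool[language]?.[pattern] ?? // 1. user's language, requested pattern
    pool["en"]?.[pattern] ??     // 2. English, requested pattern
    pool[language]?.["free"] ??  // 3. user's language, Free pattern
    pool["en"]["free"]           // 4. floor: English Free always exists
  );
}

const pool: FallbackPool = {
  en: { free: ["I keep coming back to one moment from today..."], three_lines: ["😔 ...\n✨ ...\n🏃 ..."] },
  ja: { free: ["今日、ふと立ち止まった瞬間があった…"] },
};
```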

Subscription-gated usage quota

AI generation has a real cost per call, so we gate usage by plan:

| Plan | Monthly limit |
|---|---|
| Free | 5 uses |
| Pro | 30 uses |

The check runs against an ai_assist_usage table in Supabase:

const limit = await getUserLimit(user.id);
const used = await countMonthlyUsage(user.id);
if (limit !== null && used >= limit) {
  return NextResponse.json({
    error: "AI_ASSIST_LIMIT_EXCEEDED",
    usage: { used, limit, remaining: 0 },
  }, { status: 429 });
}

countMonthlyUsage is a COUNT query filtered by created_at >= start of current month. No scheduled reset job — the counter resets automatically when the month rolls over because the WHERE clause always scopes to the current month.
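The only moving part is computing the month boundary; the query itself is a filtered count. The helper below is a sketch, and the commented query is one plausible supabase-js formulation rather than the production code:

```typescript
// Sketch: start of the current month in UTC, for the created_at filter.
function monthStartISO(now: Date = new Date()): string {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1),
  ).toISOString();
}

// Plausible supabase-js count query (assumed shape, not the production code):
// const { count } = await supabase
//   .from("ai_assist_usage")
//   .select("id", { count: "exact", head: true })
//   .eq("user_id", userId)
//   .gte("created_at", monthStartISO());
```

Because the boundary is recomputed on every request, the first call in a new month naturally sees a count of zero.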

Every successful response includes the usage object (used / limit / remaining) so the client can display remaining count without an extra API call.

Non-fatal usage recording

Failing to record a use does not fail the request:

async function recordUsage(userId: string, tokenCount: number): Promise<void> {
  const { error } = await supabase.from("ai_assist_usage").insert({ ... });
  if (error) {
    console.error("[AI Assist API] Failed to record usage:", error);
    // Non-fatal: the response still goes out
  }
}

A database hiccup that drops a count or two is less harmful than an error that leaves the user with no response. The plan limits are soft enough that occasional under-counting is acceptable.

Streak days as context

We compute the user's current diary streak server-side and inject it into the prompt as StreakDays. This lets the model produce contextually relevant openings — acknowledging a ten-day streak, for example, without requiring the client to send that information.

The streak is computed from the last 60 days of diary records. If the client sends a streak_days field we use it; otherwise we compute it server-side. Streak length only shapes the prompt, so accepting the client's value is a harmless shortcut; anything actually trust-sensitive is always computed server-side.
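A consecutive-days count over the fetched entry dates might look like this. It's a sketch under assumed conventions (ISO date strings, UTC, and a streak allowed to end yesterday if today's entry isn't written yet):

```typescript
// Sketch: count consecutive diary days ending today (or yesterday, if the
// user hasn't written today's entry yet). Dates are "YYYY-MM-DD" strings.
function prevDay(iso: string): string {
  const d = new Date(iso + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}

function computeStreak(entryDates: string[], today: string): number {
  const days = new Set(entryDates);
  let cursor = days.has(today) ? today : prevDay(today);
  let streak = 0;
  while (days.has(cursor)) {
    streak += 1;
    cursor = prevDay(cursor);
  }
  return streak;
}
```

With a 60-day window the loop is bounded, which is why fetching only recent records is enough.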

Model and token budget

We use claude-haiku-4-5-20251001. Writing assist is a short-generation task, and Haiku delivers adequate quality at lower latency and cost than larger models.

max_tokens is capped per pattern:

const PATTERN_MAX_TOKENS: Record<DiaryPattern, number> = {
  free: 300,
  three_lines: 400,
  fact_feel_next: 500,
  gratitude: 500,
  five_min: 500,
  growth: 500,
};

Free suggestions are intentionally brief; structured patterns need more tokens for their multiple sections. Capping tokens per pattern prevents runaway generation and keeps costs predictable.

What worked, what we'd change

Worked well:

  • The pattern × language matrix from a single endpoint gives a lot of output variety without adding complexity on the client.
  • The validation + fallback double safety net means AI downtime is invisible to users.
  • Tying quota enforcement to the @storyie/subscription package means plan changes propagate automatically — the AI route picks up new limits without any logic change of its own.

Would change:

  • Static fallbacks get stale if the user hits them repeatedly. A rotation index would help.
  • The structured pattern validators are emoji-dependent. If the model substitutes a visually similar emoji, the validator may still accept it, but the text won't render with the expected icon. A semantic marker approach would be more robust.
  • Streak computation runs on every request. A short-lived cache or a precomputed column would be a better trade-off as usage scales.

Key takeaways

  1. Constrain LLM output with structure, then validate it. Giving the model a free-form brief produces inconsistent output. Specifying exact markers and sections, then validating their presence, is more reliable than hoping the model follows directions every time.
  2. Fallbacks are a first-class feature, not an afterthought. The writing assist UI should never return empty. Designing the three fallback trigger points from the start made it easy to reason about fault modes.
  3. Usage control belongs in the platform layer from day one. Wiring quota enforcement to the existing subscription logic at the start meant no retrofitting. Any future plan change is automatically reflected in the AI route.

Try Storyie

The writing assist feature is live in the app. Try it on the web or on iOS — pick a diary pattern, enter a mood score, and see what comes back.