Making Storyie discoverable to AI: llms.txt and Markdown route handlers in Next.js

Storyie Engineering Team
7 min read

How we implemented llms.txt and Markdown route handlers in the Storyie Next.js app — dynamic generation via ISR, Lexical JSON-to-Markdown conversion, privacy-safe public content filtering, and why noindex + stale-while-revalidate is the right cache strategy for LLM crawlers.

Websites have robots.txt to tell search engine crawlers what they can access, and sitemap.xml to tell them what exists. Neither helps much when an AI assistant is trying to understand what a site is about.

Since late 2024, a convention has been gaining traction to fill that gap: llms.txt. The idea is simple: put a plain Markdown file at /llms.txt that describes your site's purpose, structure, and key content. AI assistants retrieving context before answering queries can read it far more efficiently than parsing your HTML.

We added llms.txt and a set of Markdown route handlers to Storyie's Next.js app. This post covers the implementation decisions: why we generate dynamically instead of statically, how Lexical JSON becomes Markdown, how we keep private diaries private, and what caching strategy makes sense for AI crawler traffic.

TL;DR

  • llms.txt is to AI assistants what robots.txt is to search crawlers — a structured, machine-readable description of your site.
  • For a UGC app, generate it dynamically via a Next.js Route Handler with ISR (revalidate = 3600) rather than committing a static file.
  • Serve public diary content as Markdown at .md URLs for LLM readability; add X-Robots-Tag: noindex to prevent search engine duplication.
  • Filter for public-only content at the query layer — privacy enforcement belongs in SQL, not in the serialization step.
  • Use stale-while-revalidate caching: AI crawlers tolerate mildly stale content, and unpredictable burst traffic should hit the CDN, not the database.

| Concern | Approach |
| --- | --- |
| llms.txt freshness | Route Handler + revalidate = 3600 (ISR) |
| Content format for LLMs | Markdown with YAML frontmatter at /diary/[slug].md |
| Lexical → Markdown | lexicalJsonToMarkdown() from @storyie/lexical-common |
| Privacy | is_public = true filter at the SQL query layer |
| Search engine dedup | X-Robots-Tag: noindex on every Markdown response |
| Cache strategy | s-maxage=3600, stale-while-revalidate=86400 |

What llms.txt actually is

The file lives at the site root (/llms.txt) and uses a defined Markdown structure:

# Site Name

> One-line summary

Longer description.

## Section

- [Page Name](URL): description
- [Page Name](URL): description

## Optional

- [Privacy Policy](URL): legal details

Where robots.txt controls crawler access, llms.txt provides context. Assistants such as ChatGPT, Claude, and Perplexity can read a well-written llms.txt in a fraction of the tokens it would take to parse the site's HTML, and the structure makes it easier to extract the right information. The spec is still informal, but adoption by Stripe, Cloudflare, and Anthropic is enough momentum to make it worth implementing.
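The structure above is regular enough to render from data. The following is a minimal sketch, not Storyie's actual code: the LlmsDoc, LlmsSection, and LlmsLink types and the renderLlmsTxt function are hypothetical names introduced for illustration.

```typescript
// Hypothetical types for illustration; not part of the Storyie codebase.
interface LlmsLink {
  name: string;
  url: string;
  description?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

interface LlmsDoc {
  siteName: string;
  summary: string;
  description?: string;
  sections: LlmsSection[];
}

// Render the layout shown above: H1, blockquote summary, optional prose,
// then one H2 per section followed by a bullet list of links.
function renderLlmsTxt(doc: LlmsDoc): string {
  const parts: string[] = [`# ${doc.siteName}`, `> ${doc.summary}`];
  if (doc.description) parts.push(doc.description);

  for (const section of doc.sections) {
    const items = section.links.map((link) =>
      link.description
        ? `- [${link.name}](${link.url}): ${link.description}`
        : `- [${link.name}](${link.url})`
    );
    parts.push(`## ${section.heading}`, items.join("\n"));
  }

  // Blank line between blocks, single trailing newline.
  return parts.join("\n\n") + "\n";
}

const example = renderLlmsTxt({
  siteName: "Example Site",
  summary: "One-line summary",
  sections: [
    {
      heading: "Docs",
      links: [
        { name: "Guide", url: "https://example.com/guide.md", description: "getting started" },
      ],
    },
  ],
});
```

Keeping the rendering separate from the data source makes it straightforward to feed the same function from a database query or a hard-coded list.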

Why dynamic generation

For a static blog, public/llms.txt is entirely reasonable. Storyie has user-generated content: public diaries and notes that appear and disappear as users change their visibility settings. A static file committed to the repo would go stale the moment a user changed a setting and stay stale until the next deployment.

We use a Route Handler with ISR instead:

// app/llms.txt/route.ts
import { diaryQueries } from "@/lib/db/queries/diary";
import { noteQueries } from "@/lib/db/queries/notes";
// baseUrl and formatDate are app-level helpers defined elsewhere (imports omitted here).

export const revalidate = 3600; // ISR: regenerate at most once per hour

export async function GET() {
  const [publicDiaries, publicNotes] = await Promise.all([
    diaryQueries.getPublicDiarySummaries(100),
    noteQueries.getPublicNoteSummaries(100),
  ]);

  const content = `# Storyie

> A diary and storytelling platform where personal thoughts become shareable stories.

Storyie is a cross-platform journaling app ...

## Public Diaries

${publicDiaries.map(({ diary, author }) =>
  `- [${author?.slug}'s Diary - ${formatDate(diary.diaryDatetime)}](${baseUrl}/diary/${diary.slug}.md)`
).join("\n")}

## Public Notes

${publicNotes.map(({ note }) =>
  `- [${note.title || "Untitled Note"}](${baseUrl}/note/${note.slug}.md)`
).join("\n")}
`;

  return new Response(content, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

With revalidate = 3600, Next.js caches the response for up to an hour. The first request after that window is still served the cached copy while a regeneration runs in the background; later requests get the new version. The DB queries therefore run roughly once per hour rather than once per request, so the cost is negligible.
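One detail worth guarding in a handler like the one above: note titles are user input, and they are interpolated into Markdown link text. A title containing square brackets or a newline could break the list. The sketch below shows one way to handle that; escapeLinkText and mdListLink are hypothetical helpers, not functions from the Storyie codebase, and the sketch assumes the URL itself is already safe (for example, a generated slug).

```typescript
// Hypothetical helper: neutralize characters that would terminate or corrupt
// the link text of an inline Markdown link.
function escapeLinkText(text: string): string {
  return text
    .replace(/\\/g, "\\\\") // escape backslashes first
    .replace(/[\[\]]/g, "\\$&") // then square brackets
    .replace(/\s+/g, " ") // collapse newlines and whitespace runs to one space
    .trim();
}

// Hypothetical helper: one llms.txt list entry. Assumes `url` needs no escaping.
function mdListLink(text: string, url: string): string {
  return `- [${escapeLinkText(text)}](${url})`;
}
```

In the handler, the template expression for each note would then become a call such as mdListLink(note.title || "Untitled Note", url).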

The llms.txt structure we settled on for Storyie:

| Section | Content | Purpose |
| --- | --- | --- |
| Header | Site name and one-line pitch | Immediate intent signal for the AI |
| Features | Links to key pages | Full feature surface at a glance |
| Blog | Links to engineering posts | Detailed context about how the service works |
| Public Diaries | Dynamic list of public diaries | UGC content for the AI to reference |
| Public Notes | Dynamic list of public notes | UGC content for the AI to reference |
| Optional | Privacy policy, terms | Available if the AI needs legal context |

Markdown route handlers

The links in llms.txt point to .md URLs. That means we need endpoints that actually return diary content as Markdown.

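// app/(public)/diary/[slug].md/route.ts
import { diaryQueries } from "@/lib/db/queries/diary";
import { contentToMarkdown } from "@/lib/utils/markdown-utils";
// extractTitle, yamlEscape, formatDate, and baseUrl are app-level helpers (imports omitted here).
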
export async function GET(_req: Request, { params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params;
  const entry = await diaryQueries.getPublicDiaryBySlugWithAuthor(slug);

  if (!entry) {
    return new Response("Not found", { status: 404 });
  }

  const { diary, author } = entry;
  const title = extractTitle(diary.content) ?? `${author?.slug}'s Diary`;
  const markdown = contentToMarkdown(diary.content);

  const frontmatter = `---
title: ${yamlEscape(title)}
author: ${yamlEscape(author?.slug ?? "")}
published: ${formatDate(diary.diaryDatetime)}
url: ${baseUrl}/diary/${diary.slug}.md
---

`;

  return new Response(frontmatter + markdown, {
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=86400",
      "X-Robots-Tag": "noindex",
    },
  });
}

Lexical JSON to Markdown

Storyie's editor is Lexical-based, so diary content is stored as a JSON tree — HeadingNode, ListNode, ParagraphNode, and so on. The shared package @storyie/lexical-common already contains a lexicalJsonToMarkdown() function that walks that tree and emits Markdown. Keeping the conversion there is deliberate: the package owns the node schema, so any new custom node type only needs a Markdown serializer added in one place.

// lib/utils/markdown-utils.ts
import { lexicalJsonToMarkdown } from "@storyie/lexical-common";

export function contentToMarkdown(content: unknown): string {
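  // Guard against null and primitives before probing properties.
  if (typeof content !== "object" || content === null) return "";
  const c = content as Record<string, unknown>;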
  if (c.root) {
    return lexicalJsonToMarkdown(JSON.stringify(content));
  }
  if (typeof c.text === "string") {
    return c.text;
  }
  return "";
}
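To make the tree walk concrete, here is a deliberately small sketch of how a serialized Lexical state can be turned into Markdown. It is not the lexicalJsonToMarkdown() implementation from @storyie/lexical-common; the function names are hypothetical, and it assumes the standard serialized shapes (heading nodes with a tag such as "h2", list nodes with a listType, text nodes with a numeric format bitmask where 1 is bold and 2 is italic). Nested lists, links, and custom nodes are out of scope.

```typescript
// Minimal shape of a serialized Lexical node: only the fields this sketch reads.
interface SerializedNode {
  type: string;
  text?: string;
  format?: number | string; // text nodes: bitmask; element nodes: alignment string
  tag?: string; // "h1".."h6" on heading nodes
  listType?: string; // "bullet" | "number" | "check" on list nodes
  children?: SerializedNode[];
}

const IS_BOLD = 1;
const IS_ITALIC = 2;

// Flatten inline content (text, line breaks, nested inline elements) to Markdown.
function inlineToMarkdown(node: SerializedNode): string {
  if (node.type === "text") {
    let out = node.text ?? "";
    const format = typeof node.format === "number" ? node.format : 0;
    if (format & IS_ITALIC) out = `*${out}*`;
    if (format & IS_BOLD) out = `**${out}**`;
    return out;
  }
  if (node.type === "linebreak") return "\n";
  return (node.children ?? []).map(inlineToMarkdown).join("");
}

// Convert one top-level block node to a Markdown block.
function blockToMarkdown(node: SerializedNode): string {
  const inline = () => (node.children ?? []).map(inlineToMarkdown).join("");
  switch (node.type) {
    case "heading": {
      const level = Number(node.tag?.slice(1)) || 1;
      return `${"#".repeat(level)} ${inline()}`;
    }
    case "list":
      return (node.children ?? [])
        .map((item, i) => {
          const marker = node.listType === "number" ? `${i + 1}.` : "-";
          return `${marker} ${inlineToMarkdown(item)}`;
        })
        .join("\n");
    default:
      // Paragraphs and unknown block types fall back to their inline text.
      return inline();
  }
}

// Accepts a JSON string, mirroring how contentToMarkdown calls the real converter.
function sketchLexicalToMarkdown(json: string): string {
  const state = JSON.parse(json) as { root?: SerializedNode };
  return (state.root?.children ?? [])
    .map(blockToMarkdown)
    .filter((block) => block.length > 0)
    .join("\n\n");
}
```

The default branch is the important design choice: an unrecognized node type degrades to plain text instead of throwing, so a new custom node never breaks the Markdown endpoint before its serializer exists.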

YAML frontmatter for structured metadata

The frontmatter at the top of each Markdown response gives AI assistants structured access to metadata — title, author, publication date, canonical URL — without requiring them to parse the prose. User input goes directly into that YAML, so escaping is non-negotiable:

export function yamlEscape(value: string): string {
  if (/[:#"'\n\r\t[\]{}|>!&*?,]/.test(value) || value.trim() !== value) {
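    const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
    // Encode control characters so the value stays on one line inside the frontmatter.
    return `"${escaped.replace(/\n/g, "\\n").replace(/\r/g, "\\r").replace(/\t/g, "\\t")}"`;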
  }
  return value;
}
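A stricter alternative to conditional escaping is to quote every value. The sketch below leans on the fact that YAML 1.2 is designed as a superset of JSON, so a JSON-encoded string should also parse as a YAML double-quoted scalar. yamlQuote and buildFrontmatter are hypothetical names for illustration, not part of the Storyie codebase, and the hostile title is a made-up input.

```typescript
// Hypothetical alternative: always quote. JSON.stringify escapes quotes,
// backslashes, and control characters such as newlines and tabs.
function yamlQuote(value: string): string {
  return JSON.stringify(value);
}

// Hypothetical helper: assemble a frontmatter block followed by a blank line.
function buildFrontmatter(fields: Record<string, string>): string {
  const lines = Object.entries(fields).map(
    ([key, value]) => `${key}: ${yamlQuote(value)}`
  );
  return ["---", ...lines, "---", "", ""].join("\n");
}

// A title that tries to close the frontmatter early and inject its own author key.
const hostileTitle = 'Day 1\n---\nauthor: "someone-else"';
const frontmatter = buildFrontmatter({ title: hostileTitle, author: "alice" });
```

Because the newline is encoded as the two characters \n, the injected --- never starts a line of its own and the title stays a single scalar. The trade-off is noisier output, since simple values are quoted too.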

Title extraction from content

Diaries in Storyie have no separate title field — the first heading in the content serves as the title. The extractor walks the Lexical node tree to find it:

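type LexicalNode = { type?: string; text?: string; children?: LexicalNode[] };

export function extractTitle(content: unknown): string | null {
  // Diary content is serialized Lexical state shaped like { root: { children: [...] } }.
  const root = (content as { root?: LexicalNode } | null)?.root;
  if (!root?.children) return null;
  // Walk root.children looking for the first heading node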
  for (const node of root.children) {
    if (node.type === "heading" && node.children) {
      const text = node.children.map((child) => child.text ?? "").join("");
      if (text.trim()) return text.trim();
    }
  }
  return null;
}

If there is no heading, the title falls back to "${author?.slug}'s Diary", as in the route handler above.

X-Robots-Tag: noindex

The Markdown endpoint is for AI assistants. It is not the canonical URL for the diary; the HTML page at /diary/[slug] is. Without noindex, search engines could index both and treat them as duplicate content. X-Robots-Tag: noindex on the Markdown response tells search crawlers not to index it while leaving the URL fetchable by LLM clients.

Privacy

Every query that feeds into llms.txt or the Markdown route handler filters on is_public = true at the SQL level. There is no post-query filtering step that could be bypassed:

// diary.ts
export const diaryQueries = {
  getPublicDiarySummaries: async (limit: number) => {
    // Only rows where is_public = true
  },
  getPublicDiaryBySlugWithAuthor: async (slug: string) => {
    // Only rows where is_public = true AND slug matches
  },
};

A request to /diary/some-private-slug.md returns 404 because the query finds nothing: the filter happens before any content is serialized. This pairs with Supabase RLS at the database layer, giving two independent enforcement points for the same rule.
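As a sketch of what filtering in SQL looks like, the example below puts the visibility predicate in the same WHERE clause as the slug lookup. Everything here is an assumption for illustration: the Db interface, the diaries table name, the selected columns, and the getPublicDiaryBySlug function are hypothetical and do not reflect Storyie's actual schema or query layer.

```typescript
// Hypothetical minimal database interface; a real app would use its own client.
interface Db {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

interface DiaryRow {
  slug: string;
  content: unknown;
}

// Table and column names are assumptions. The key point is that
// is_public = true sits in the WHERE clause, so private rows never leave the database.
const PUBLIC_DIARY_BY_SLUG = `
  SELECT slug, content
  FROM diaries
  WHERE slug = $1
    AND is_public = true
  LIMIT 1
`;

async function getPublicDiaryBySlug(db: Db, slug: string): Promise<DiaryRow | null> {
  // The slug is passed as a bound parameter, never interpolated into the SQL text.
  const rows = await db.query<DiaryRow>(PUBLIC_DIARY_BY_SLUG, [slug]);
  return rows[0] ?? null;
}
```

With this shape, a private diary and a nonexistent diary are indistinguishable to the caller: both come back as null, and the route handler answers 404 for each.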

Cache strategy for AI crawlers

AI crawler traffic is unpredictable. A mention in a widely-used AI assistant's context window can trigger bursts of requests at any time, and those bursts should hit the CDN edge, not the database.

The Markdown responses use:

Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400

s-maxage=3600 gives CDN nodes a one-hour fresh window. stale-while-revalidate=86400 adds a further 24 hours during which the CDN may return the cached version immediately while fetching a fresh copy behind the scenes. Once both windows have passed, a request has to wait for the origin. For a diary app, roughly a day of staleness is acceptable: the goal of these endpoints is discoverability, not real-time accuracy.
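The two windows can be modeled as a small classifier. This is an illustrative sketch of the semantics only, not how any particular CDN is implemented; directiveSeconds and classify are hypothetical names.

```typescript
type CacheState = "fresh" | "stale-while-revalidate" | "expired";

// Read a numeric directive such as s-maxage=3600 from a Cache-Control value.
// Returns 0 when the directive is absent or not a number.
function directiveSeconds(cacheControl: string, name: string): number {
  for (const part of cacheControl.split(",")) {
    const [key = "", value = ""] = part.trim().split("=");
    if (key.toLowerCase() === name) return Number(value) || 0;
  }
  return 0;
}

// Decide how a shared cache may treat a stored response of the given age.
function classify(cacheControl: string, ageSeconds: number): CacheState {
  const fresh = directiveSeconds(cacheControl, "s-maxage");
  const swr = directiveSeconds(cacheControl, "stale-while-revalidate");
  if (ageSeconds < fresh) return "fresh"; // serve from cache, no origin traffic
  if (ageSeconds < fresh + swr) return "stale-while-revalidate"; // serve stale, refresh in background
  return "expired"; // must go to the origin before responding
}
```

With the header used here, a response aged 10 minutes is fresh, one aged 2 hours is served stale while a refresh runs, and one aged 25 hours or more forces a trip to the origin.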

What we learned

llms.txt is cheap to add and hard to justify skipping. A single Route Handler and an hour of writing good descriptive copy is the entire implementation cost. The marginal value — being accurately represented when someone asks an AI assistant about journaling tools — is hard to measure but straightforward to reason about.

Dynamic generation is the right default for UGC apps. The static-file approach only works if your content changes only when you deploy. Any app with user-generated public content needs to regenerate the file from the database. ISR makes that cheap.

The Markdown conversion layer earns its place in the shared package. Because @storyie/lexical-common already owns the node schema, adding Markdown serialization there means every surface — web rendering, Expo display, and LLM output — uses the same logic. A new custom node type gets a Markdown serializer once and works everywhere.

noindex + LLM access is an explicit design choice, not a side effect. Search engines see the HTML version; AI crawlers see the Markdown version. That separation lets us optimize each output for its consumer without either interfering with the other.

The spec is still evolving, but momentum is real. llms.txt was proposed in 2024 and is not yet formally standardized. Stripe, Cloudflare, and Anthropic have adopted it anyway. Implementing it now costs almost nothing; waiting until it is formalized costs discoverability in the meantime.

Try Storyie

If you want to see what this looks like in production, visit storyie.com/llms.txt and compare it to the iOS app. The same diary content that renders as rich text in the app is served as structured Markdown for any AI assistant that wants to read it.