Making Storyie discoverable to AI: llms.txt and Markdown route handlers in Next.js
Websites have robots.txt to tell search engine crawlers what they can access, and sitemap.xml to tell them what exists. Neither helps much when an AI assistant is trying to understand what a site is about.
Since late 2024 there has been a growing answer to that gap: llms.txt. The idea is simple: put a plain Markdown file at /llms.txt that describes your site's purpose, structure, and key content. AI assistants that retrieve context before answering a query can read it far more efficiently than they can parse your HTML.
We added llms.txt and a set of Markdown route handlers to Storyie's Next.js app. This post covers the implementation decisions: why we generate dynamically instead of statically, how Lexical JSON becomes Markdown, how we keep private diaries private, and what caching strategy makes sense for AI crawler traffic.
TL;DR
- llms.txt does for AI assistants roughly what robots.txt and sitemap.xml do for search crawlers: it gives them a structured, machine-readable entry point to your site.
- For a UGC app, generate it dynamically via a Next.js Route Handler with ISR (revalidate = 3600) rather than committing a static file.
- Serve public diary content as Markdown at .md URLs for LLM readability; add X-Robots-Tag: noindex to prevent search engine duplication.
- Filter for public-only content at the query layer: privacy enforcement belongs in SQL, not in the serialization step.
- Use stale-while-revalidate caching: AI crawlers tolerate mildly stale content, and unpredictable burst traffic should hit the CDN, not the database.
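Concern | Approach |
|---|---|
llms.txt freshness | Route Handler + ISR (revalidate = 3600) |
Content format for LLMs | Markdown with YAML frontmatter at .md URLs |
Lexical → Markdown | lexicalJsonToMarkdown() from @storyie/lexical-common |
Privacy | is_public = true filter at the SQL layer, backed by Supabase RLS |
Search engine dedup | X-Robots-Tag: noindex on Markdown responses |
Cache strategy | s-maxage=3600, stale-while-revalidate=86400 |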
What llms.txt actually is
The file lives at the site root (/llms.txt) and uses a defined Markdown structure:
# Site Name
> One-line summary
Longer description.
## Section
- [Page Name](URL): description
- [Page Name](URL): description
## Optional
- [Privacy Policy](URL): legal details
Where robots.txt controls crawler access, llms.txt provides context. An assistant such as ChatGPT, Claude, or Perplexity that fetches a well-written llms.txt can take in a site for a fraction of the tokens its HTML would cost, and the fixed structure makes the right information easier to extract. The spec is still informal, but adoption by Stripe, Cloudflare, and Anthropic gives it enough momentum to be worth implementing.
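The layout above is regular enough to generate from data. The following is a hypothetical renderer. It is not Storyie's code and not an API defined by the llms.txt proposal, and the type and function names are illustrative. It puts blank lines between blocks because in CommonMark a paragraph written directly under a "> summary" line is folded into the blockquote.

```typescript
// Hypothetical types; the llms.txt proposal defines a Markdown layout, not a TS API.
interface LlmsLink {
  title: string;
  url: string;
  description?: string;
}

interface LlmsSection {
  heading: string; // e.g. "Docs" or "Optional"
  links: LlmsLink[];
}

interface LlmsDoc {
  name: string;
  summary: string;
  details?: string;
  sections: LlmsSection[];
}

// Render: H1 name, blockquote summary, optional free-form details,
// then one H2 section per group of links.
function renderLlmsTxt(doc: LlmsDoc): string {
  const blocks: string[] = [`# ${doc.name}`, `> ${doc.summary}`];
  if (doc.details) blocks.push(doc.details);
  for (const section of doc.sections) {
    const items = section.links.map(
      (l) => `- [${l.title}](${l.url})${l.description ? `: ${l.description}` : ""}`,
    );
    blocks.push([`## ${section.heading}`, ...items].join("\n"));
  }
  // Blank lines between blocks stop the summary quote from absorbing the next paragraph
  return blocks.join("\n\n") + "\n";
}
```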
Why dynamic generation
For a static blog, public/llms.txt is entirely reasonable. Storyie has user-generated content: public diaries and notes that appear and disappear as users change their visibility settings. A static file committed to the repo would go stale the moment a user changed a visibility setting and would stay stale until the next deployment.
We use a Route Handler with ISR instead:
// app/llms.txt/route.ts
import { diaryQueries } from "@/lib/db/queries/diary";
import { noteQueries } from "@/lib/db/queries/notes";

// Assumed helpers for this excerpt; the real project may define them elsewhere.
const baseUrl = process.env.NEXT_PUBLIC_SITE_URL ?? "https://storyie.com";
const formatDate = (d: Date | string) => new Date(d).toISOString().slice(0, 10); // YYYY-MM-DD

export const revalidate = 3600; // ISR: regenerate at most once per hour

export async function GET() {
  // Both queries return public rows only
  const [publicDiaries, publicNotes] = await Promise.all([
    diaryQueries.getPublicDiarySummaries(100),
    noteQueries.getPublicNoteSummaries(100),
  ]);

  // Blank lines between blocks matter: without them, Markdown folds the
  // description paragraph into the ">" summary quote.
  const content = `# Storyie

> A diary and storytelling platform where personal thoughts become shareable stories.

Storyie is a cross-platform journaling app ...

## Public Diaries

${publicDiaries
  .map(({ diary, author }) =>
    `- [${author?.slug}'s Diary - ${formatDate(diary.diaryDatetime)}](${baseUrl}/diary/${diary.slug}.md)`)
  .join("\n")}

## Public Notes

${publicNotes
  .map(({ note }) => `- [${note.title || "Untitled Note"}](${baseUrl}/note/${note.slug}.md)`)
  .join("\n")}
`;

  return new Response(content, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
With revalidate = 3600, the first request after the hour has elapsed still receives the cached file, but it triggers a regeneration in the background. Every other request is served from the cache. The database queries therefore run at most once per hour instead of once per request, so the cost is negligible.
The llms.txt structure we settled on for Storyie:
Section | Content | Purpose |
|---|---|---|
Header | Site name and one-line pitch | Immediate intent signal for the AI |
Features | Links to key pages | Full feature surface at a glance |
Blog | Links to engineering posts | Detailed context about how the service works |
Public Diaries | Dynamic list of public diaries | UGC content for the AI to reference |
Public Notes | Dynamic list of public notes | UGC content for the AI to reference |
Optional | Privacy policy, terms | Available if the AI needs legal context |
Markdown route handlers
The links in llms.txt point to .md URLs. That means we need endpoints that actually return diary content as Markdown.
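// app/(public)/diary/[slug].md/route.ts
import { diaryQueries } from "@/lib/db/queries/diary";
// Assumed location: extractTitle and yamlEscape sit alongside contentToMarkdown
import { contentToMarkdown, extractTitle, yamlEscape } from "@/lib/utils/markdown-utils";

// Assumed helpers for this excerpt; the real project may define them elsewhere.
const baseUrl = process.env.NEXT_PUBLIC_SITE_URL ?? "https://storyie.com";
const formatDate = (d: Date | string) => new Date(d).toISOString().slice(0, 10);

export async function GET(_req: Request, { params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params;
  // Returns null for missing and private diaries alike: both become a 404
  const entry = await diaryQueries.getPublicDiaryBySlugWithAuthor(slug);
  if (!entry) {
    return new Response("Not found", { status: 404 });
  }

  const { diary, author } = entry;
  const title = extractTitle(diary.content) ?? `${author?.slug}'s Diary`;
  const markdown = contentToMarkdown(diary.content);

  // YAML frontmatter: every user-supplied value goes through yamlEscape
  const frontmatter = `---
title: ${yamlEscape(title)}
author: ${yamlEscape(author?.slug ?? "")}
published: ${formatDate(diary.diaryDatetime)}
url: ${baseUrl}/diary/${diary.slug}.md
---

`;

  return new Response(frontmatter + markdown, {
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=86400",
      // Keep search engines from indexing this duplicate of the HTML page
      "X-Robots-Tag": "noindex",
    },
  });
}
Lexical JSON to Markdown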
Storyie's editor is Lexical-based, so diary content is stored as a JSON tree — HeadingNode, ListNode, ParagraphNode, and so on. The shared package @storyie/lexical-common already contains a lexicalJsonToMarkdown() function that walks that tree and emits Markdown. Keeping the conversion there is deliberate: the package owns the node schema, so any new custom node type only needs a Markdown serializer added in one place.
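The following toy serializer shows what "walks that tree and emits Markdown" involves in practice. It is not the @storyie/lexical-common implementation, whose source is not shown here. It assumes the usual shape of Lexical's serialized JSON: a root node with children, heading nodes carrying a tag such as "h2", and list nodes carrying a listType. It handles only headings, paragraphs, flat lists, and text.

```typescript
// A toy serializer for illustration, NOT the @storyie/lexical-common code.
interface SerializedNode {
  type: string;
  text?: string;
  tag?: string; // "h1".."h6" on heading nodes
  listType?: string; // "bullet" | "number" | "check" on list nodes
  children?: SerializedNode[];
}

// Concatenate the inline text of a block. Other inline nodes (links, line
// breaks) contribute only their text children in this sketch.
function inlineText(node: SerializedNode): string {
  return (node.children ?? [])
    .map((child) => (child.type === "text" ? child.text ?? "" : inlineText(child)))
    .join("");
}

// Map one block-level node to Markdown; unknown types become "".
function blockToMarkdown(node: SerializedNode): string {
  switch (node.type) {
    case "heading": {
      const level = Number(node.tag?.slice(1)) || 1;
      return `${"#".repeat(level)} ${inlineText(node)}`;
    }
    case "list":
      return (node.children ?? [])
        .map((item, i) => `${node.listType === "number" ? `${i + 1}.` : "-"} ${inlineText(item)}`)
        .join("\n");
    case "paragraph":
      return inlineText(node);
    default:
      return "";
  }
}

// Walk root.children in order, separating blocks with blank lines.
function toyLexicalToMarkdown(root: SerializedNode): string {
  return (root.children ?? [])
    .map(blockToMarkdown)
    .filter((block) => block !== "")
    .join("\n\n");
}
```

A real serializer also has to cover inline formatting, nested lists, links, and custom nodes. That breadth is why the work lives in one shared package. Storyie itself delegates to that package through a thin wrapper: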
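// lib/utils/markdown-utils.ts
import { lexicalJsonToMarkdown } from "@storyie/lexical-common";

export function contentToMarkdown(content: unknown): string {
  // Guard against null or non-object content before reading properties
  if (!content || typeof content !== "object") return "";
  const c = content as Record<string, unknown>;
  // Lexical editor state ({ root: {...} }): delegate to the shared serializer
  if (c.root) {
    return lexicalJsonToMarkdown(JSON.stringify(content));
  }
  // Content stored as { text: "..." } is returned as-is
  if (typeof c.text === "string") {
    return c.text;
  }
  return "";
}
YAML frontmatter for structured metadata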
The frontmatter at the top of each Markdown response gives AI assistants structured access to metadata (title, author, publication date, and the document's URL) without requiring them to parse the prose. User input goes directly into that YAML, so escaping is non-negotiable:
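export function yamlEscape(value: string): string {
  // Quote empty/padded values, YAML syntax chars, and plain scalars YAML would coerce
  const needsQuotes =
    value === "" ||
    value.trim() !== value ||
    /[:#"'\n\r\t[\]{}|>!&*?,%@`]/.test(value) ||
    /^(?:true|false|yes|no|on|off|null|~)$/i.test(value) ||
    /^[-+.\d]/.test(value); // numbers, dates, "- " sequence markers
  if (!needsQuotes) return value;
  // Escape backslashes first, then quotes and control characters
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')
    .replace(/\n/g, "\\n").replace(/\r/g, "\\r").replace(/\t/g, "\\t");
  return `"${escaped}"`;
}
Title extraction from content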
Diaries in Storyie have no separate title field — the first heading in the content serves as the title. The extractor walks the Lexical node tree to find it:
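// Minimal shape of the serialized Lexical nodes this function reads
type LexicalNode = { type?: string; text?: string; children?: LexicalNode[] };

export function extractTitle(content: unknown): string | null {
  const root = (content as { root?: LexicalNode } | null | undefined)?.root;
  if (!root?.children) return null;
  // Walk root.children looking for the first heading node with non-empty text
  for (const node of root.children) {
    if (node.type === "heading" && node.children) {
      const text = node.children.map((child) => child.text ?? "").join("");
      if (text.trim()) return text.trim();
    }
  }
  return null;
}
If there is no heading with text, the route handler falls back to "${author}'s Diary".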
X-Robots-Tag: noindex
The Markdown endpoint is for AI assistants. It is not the canonical URL for the diary; the HTML page at /diary/[slug] is. Without noindex, Google could index both and treat them as duplicate content. X-Robots-Tag: noindex on the Markdown response tells search engines not to index it, while AI assistants that fetch the URL can still read it.
Privacy
Every query that feeds into llms.txt or the Markdown route handler filters on is_public = true at the SQL level. There is no post-query filtering step that could be bypassed:
// diary.ts
export const diaryQueries = {
  getPublicDiarySummaries: async (limit: number) => {
    // Only rows where is_public = true
  },
  getPublicDiaryBySlugWithAuthor: async (slug: string) => {
    // Only rows where is_public = true AND slug matches
  },
};
A request to /diary/some-private-slug.md returns 404 because the query finds nothing: the filter runs before any content is serialized. This pairs with Supabase RLS at the database layer, giving two independent enforcement points for the same rule.
Cache strategy for AI crawlers
AI crawler traffic is unpredictable. A mention in a widely used AI assistant's context window can trigger bursts of requests at any time, and those bursts should hit the CDN edge, not the database.
The Markdown responses use:
Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400
s-maxage=3600 gives CDN nodes a one-hour fresh window. stale-while-revalidate=86400 lets the CDN keep serving the cached copy for up to 24 more hours after that window ends: it returns the stale copy immediately and fetches a fresh one in the background. For a diary app, up to a day of staleness is acceptable, because the goal of these endpoints is discoverability, not real-time accuracy.
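The timeline those two directives produce can be written as a small function. This is a simplified model of how a shared cache applies s-maxage and stale-while-revalidate (RFC 5861), not the exact behavior of any particular CDN. cacheState and parseDirectives are hypothetical names.

```typescript
// Simplified model of a shared cache applying s-maxage + stale-while-revalidate.
type CacheState = "fresh" | "stale-revalidating" | "expired";

function cacheState(ageSeconds: number, sMaxAge: number, staleWhileRevalidate: number): CacheState {
  if (ageSeconds < sMaxAge) return "fresh"; // served from cache, no origin call
  if (ageSeconds < sMaxAge + staleWhileRevalidate) {
    return "stale-revalidating"; // served from cache; origin refetched in background
  }
  return "expired"; // the request has to wait for the origin
}

// Pull the two directives out of a Cache-Control header value (0 if absent).
function parseDirectives(header: string): { sMaxAge: number; swr: number } {
  const get = (name: string): number => {
    const m = header.match(new RegExp(`${name}=(\\d+)`));
    return m ? Number(m[1]) : 0;
  };
  return { sMaxAge: get("s-maxage"), swr: get("stale-while-revalidate") };
}
```

With the header above, a copy is fresh for its first hour and can be served stale with background revalidation until hour 25. After that, a request waits on the origin.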
What we learned
llms.txt is cheap to add and hard to justify skipping. A single Route Handler and an hour of writing good descriptive copy is the entire implementation cost. The marginal value — being accurately represented when someone asks an AI assistant about journaling tools — is hard to measure but straightforward to reason about.
Dynamic generation is the right default for UGC apps. The static-file approach only works if your content doesn't change. Any app with user-generated public content needs to regenerate the file from the database. ISR makes that cheap.
The Markdown conversion layer earns its place in the shared package. Because @storyie/lexical-common already owns the node schema, adding Markdown serialization there means every surface — web rendering, Expo display, and LLM output — uses the same logic. A new custom node type gets a Markdown serializer once and works everywhere.
noindex + LLM access is an explicit design choice, not a side effect. Search engines see the HTML version; AI crawlers see the Markdown version. That separation lets us optimize each output for its consumer without either interfering with the other.
The spec is still evolving, but momentum is real. llms.txt was proposed in 2024 and is not yet formally standardized. Stripe, Cloudflare, and Anthropic have adopted it anyway. Implementing it now costs almost nothing; waiting until it is formalized costs discoverability in the meantime.
Related Posts
- Cross-platform Lexical with use dom: monorepo gains and the bridges you still own (how @storyie/lexical-common is structured and why the shared package owns the node schema)
- Building a Monorepo with pnpm and TypeScript (workspace conventions and cross-package dependency rules)
Try Storyie
If you want to see what this looks like in production, visit storyie.com/llms.txt and compare it to the iOS app. The same diary content that renders as rich text in the app is served as structured Markdown for any AI assistant that wants to read it.