Seeding a social diary app with AI bot users: design, scheduling, and lessons from production
When we launched Storyie, the first problem we ran into was not a technical one — it was an empty timeline. A user opens the app, sees nothing, and concludes there is nobody here. Classic cold-start.
The usual fix is seed data. We tried it. Static fixtures go stale in days, and any returning user can see the timestamp frozen in the past. We needed something that posted fresh content on a real schedule even while the real user base was small.
The answer was AI bot users: fictional people with distinct personas who write diary entries every day. This post covers how the system is designed, the decisions that shaped it, and what a few months of production use taught us.
TL;DR
- Each bot is a Markdown file: persona metadata in frontmatter, a prompt template in the body. Git manages the history; Zod validates the schema.
- A probability field per bot controls posting frequency. A single random draw per run gives the timeline natural variety without complex scheduling logic.
- The GitHub Actions matrix strategy parallelizes generation across however many bots are selected, with no changes to the workflow structure as the bot count grows.
- AI-generated Markdown goes through a custom Lexical transformer before being written to Supabase, so bot posts are structurally identical to user posts and every feature — hashtag search, rich-text display — works without special-casing.
| Concern | Approach |
| ----------------- | ---------------------------------------------------------------- |
| Bot definition | Markdown files with Zod-validated frontmatter, versioned in Git |
| Posting frequency | Per-bot probability field, uniform random draw each run |
| Parallelism | GitHub Actions matrix over selected bots |
| Storage format | Lexical SerializedEditorState via custom transformer |
| Auth / RLS | Service role key in Actions secrets only; never in client code |
Defining bots as Markdown files
The design constraint we cared most about was this: adding or editing a bot should not require touching application code. A Markdown file per bot, with frontmatter for metadata and a prompt template in the body, satisfies that.
```markdown
---
name: Akari
slug: akari
user_id: b29a62cd-...
language: ja
enabled: true
schedule:
  type: random
  probability: 0.25
max_length: 1800
bio: Food creator who describes texture and aroma in precise detail
---

## Prompt Template

You are Akari, a creator focused on Food & Culture.
Today's date is {current_date}.
Write a personal diary entry in {language}...
```

A few things fell out of this format naturally:
- Git diff history: Every prompt change is a commit. We can bisect a regression in output quality to a specific change to the template.
- Reviewable via PR: Persona adjustments and prompt experiments go through the same code review workflow as everything else.
- Zod validation at load time: The frontmatter schema is declared once; bad definitions are caught when the bot list is loaded, not mid-run (see the sketch after this list).
- Locale directories: `content/ja/`, `content/en/`, and so on keep the 100+ bots organized by language without any runtime routing logic.
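As a rough sketch of what that load-time validation can look like — field names mirror the example above, but the exact schema and helper names in our bots package differ in detail:

```ts
import { z } from "zod";

// Sketch of the frontmatter schema; the real one lives in the bots package.
const botScheduleSchema = z.object({
  type: z.literal("random"),
  probability: z.number().min(0).max(1),
});

export const botFrontmatterSchema = z.object({
  name: z.string().min(1),
  slug: z.string().min(1),
  user_id: z.string().uuid(),
  language: z.string().min(2), // locale code such as "ja" or "en"
  enabled: z.boolean(),
  schedule: botScheduleSchema,
  max_length: z.number().int().positive(),
  bio: z.string(),
});

export type BotFrontmatter = z.infer<typeof botFrontmatterSchema>;

// At load time, a malformed definition becomes a hard failure
// before any generation work starts.
export function parseFrontmatter(raw: unknown): BotFrontmatter {
  const result = botFrontmatterSchema.safeParse(raw);
  if (!result.success) {
    throw new Error(`Invalid bot definition: ${result.error.message}`);
  }
  return result.data;
}
```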
Probabilistic scheduling
Every run, a bot either posts or it does not — decided by a single random draw against a probability threshold.
```ts
export function shouldRunBot(schedule: BotSchedule): SchedulerDecision {
  const roll = Math.random();
  const shouldRun = roll < schedule.probability;
  return {
    shouldRun,
    reason: shouldRun
      ? `Random check passed (${roll.toFixed(3)} < ${schedule.probability})`
      : `Random check failed (${roll.toFixed(3)} >= ${schedule.probability})`,
    probability: schedule.probability,
  };
}
```

`probability: 0.25` means a bot posts roughly one day in four. The timeline gets a different mix of faces on different days, which is much closer to how real users behave than the mechanical every-bot-every-day pattern our first prototype used.
The cost of this approach is unpredictability: on any single day, some bots will be silent. That has been fine in practice because we run enough bots that the timeline stays populated even when several skip. If we ever need tighter control we can layer day-of-week weights on top of the probability draw, but we have not needed to yet.
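For context, this is roughly how the scheduler feeds the rest of the pipeline. The `loadBotDefinitions` helper here is a stand-in for whatever loads and validates the Markdown files, not the actual function name:

```ts
// Sketch: pick today's posters from the enabled bot definitions.
const bots = await loadBotDefinitions();

const selected = bots
  .filter((bot) => bot.enabled)
  .filter((bot) => shouldRunBot(bot.schedule).shouldRun);

// The get-bots job serializes this selection as the Actions matrix payload.
console.log(JSON.stringify({ include: selected.map((bot) => ({ slug: bot.slug })) }));
```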
Converting Markdown to Lexical
Storyie stores all diary content as Lexical's SerializedEditorState. We previously wrote about how the serialization round trip works — the same node set has to be registered on both sides, or unknown types are silently dropped on parse. That means bot posts have to go through the same conversion as anything a real user writes.
The interesting part was hashtag handling. A prompt might produce #food or #日常 as plain text. We need those to become HashtagNode entries so they show up in hashtag search. A custom TextMatchTransformer handles it:
```ts
import type { TextMatchTransformer } from "@lexical/markdown";
import { $createHashtagNode, HashtagNode } from "@lexical/hashtag";

const HASHTAG_TRANSFORMER: TextMatchTransformer = {
  type: "text-match",
  dependencies: [HashtagNode],
  // Matches #tags in Latin, hiragana, katakana, and CJK text.
  importRegExp: /#([a-zA-Zぁ-ゟ゠-ヿ一-鿿\w]+)/,
  replace: (textNode, match) => {
    const hashtagNode = $createHashtagNode(`#${match[1]}`);
    textNode.replace(hashtagNode);
  },
};
```

The Unicode ranges cover hiragana, katakana, and CJK characters, so Japanese-language bot posts get their hashtags converted correctly. After this step a bot entry and a user entry are identical at the storage layer — no special rendering path, no feature flags.
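To show how the transformer slots in, here is a sketch of a headless conversion step. The function name and registered node list are illustrative; the real node list has to match whatever the app registers:

```ts
import { createHeadlessEditor } from "@lexical/headless";
import { $convertFromMarkdownString, TRANSFORMERS } from "@lexical/markdown";
import { HashtagNode } from "@lexical/hashtag";
import { HeadingNode, QuoteNode } from "@lexical/rich-text";
import { ListItemNode, ListNode } from "@lexical/list";
import { LinkNode } from "@lexical/link";
import { CodeNode } from "@lexical/code";

// Sketch: turn the model's Markdown output into a SerializedEditorState.
// The node list must match what the app registers, or unknown types are
// dropped when the state is parsed on the client.
export function markdownToEditorState(markdown: string) {
  const editor = createHeadlessEditor({
    namespace: "bot-generation",
    nodes: [HeadingNode, QuoteNode, ListNode, ListItemNode, LinkNode, CodeNode, HashtagNode],
    onError: (error) => {
      throw error;
    },
  });

  // discrete: true applies the update synchronously so the state is ready to read.
  editor.update(
    () => {
      $convertFromMarkdownString(markdown, [...TRANSFORMERS, HASHTAG_TRANSFORMER]);
    },
    { discrete: true },
  );

  return editor.getEditorState().toJSON();
}
```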
GitHub Actions pipeline
The generation workflow has two jobs:
- get-bots: Loads all enabled bots, runs the probability scheduler, and outputs the selected bots as a matrix.
- generate: Runs in parallel over the matrix — one job per selected bot. Each job assembles the prompt, calls Claude, converts the output to Lexical, checks for a duplicate entry from that bot today, and writes to Supabase.
```yaml
strategy:
  matrix: ${{ fromJSON(needs.get-bots.outputs.matrix) }}
```

The matrix strategy means adding more bots increases parallelism automatically. The workflow structure itself never changes. We run this on a self-hosted runner, so the API cost for Claude is the only scaling cost as the bot count grows.
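Wired together, the two jobs look roughly like this. Job names, script paths, and secret names here are placeholders, not the exact workflow we run:

```yaml
# Sketch of the two-job layout; script paths and secret names are illustrative.
jobs:
  get-bots:
    runs-on: self-hosted
    outputs:
      matrix: ${{ steps.select.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: select
        # Runs the probability scheduler and emits {"include":[{"slug":"akari"}, ...]}
        run: echo "matrix=$(node scripts/select-bots.mjs)" >> "$GITHUB_OUTPUT"

  generate:
    needs: get-bots
    runs-on: self-hosted
    strategy:
      matrix: ${{ fromJSON(needs.get-bots.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: node scripts/generate-entry.mjs --bot "${{ matrix.slug }}"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
```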
What production taught us
Prompt variation is the real work
The first version of our prompts produced bots that all wrote the same kind of entry regardless of their persona. The fix was to make each prompt specify what we call variation elements — the model must include at least two of: a specific sensory detail, a small failure and what it taught, a fragment of overheard dialogue, a small experiment and its outcome. That structural constraint improved output diversity dramatically compared to any amount of persona description alone.
Prompts are the thing worth iterating on. Git history for every prompt change turns out to be genuinely useful for understanding which edit caused output quality to improve or regress.
Fail-open on duplicate checks
If the duplicate-check query fails at runtime, we allow the post rather than blocking it. The reasoning: a bot appearing twice in one day is a minor oddity. An empty timeline because every bot silently errored out is a meaningful user-experience problem. We default to the failure mode that hurts users less.
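In code, the fail-open behavior is just a guarded query. The table and column names below, and the `supabase` client in scope, are assumptions for illustration:

```ts
// Sketch: allow posting if the duplicate check itself fails (fail-open).
async function alreadyPostedToday(botUserId: string): Promise<boolean> {
  const startOfDay = new Date();
  startOfDay.setUTCHours(0, 0, 0, 0);

  const { data, error } = await supabase
    .from("diary_entries")
    .select("id")
    .eq("user_id", botUserId)
    .gte("created_at", startOfDay.toISOString())
    .limit(1);

  if (error) {
    // A failed check should not silence the bot; log it and let the post through.
    console.warn("duplicate check failed, allowing post", error.message);
    return false;
  }
  return (data?.length ?? 0) > 0;
}
```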
Service role key discipline
Bots write to Supabase with the service role key so they can bypass RLS. That key lives only in GitHub Actions secrets and is never referenced from any code that ends up in the client bundle. Normal user operations use the anon key with RLS enforced. The two paths are separate and never intersect.
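As a sketch of how the two paths stay separate — the environment variable names are ours for illustration:

```ts
import { createClient } from "@supabase/supabase-js";

// Pipeline side (runs only in GitHub Actions): service role key, bypasses RLS.
// The key is injected from Actions secrets and never appears in client code.
export const botWriter = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

// App side: anon key, with RLS enforced for every normal user operation.
export const appClient = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!,
);
```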
Related Posts
- Cross-platform Lexical with `use dom`: monorepo gains and the bridges you still own — the serialization architecture that makes bot posts and user posts structurally identical
- Building a Monorepo with pnpm and TypeScript — workspace conventions the bots package lives within
Try Storyie
The bots are live on storyie.com — the timeline you see when you first open the app is their work. If you write your own entry and post it publicly, it shows up alongside them. Available on the web and as an iOS app.