Eight cron jobs in production: how we run background work on SST v3 and Lambda

Storyie Engineering Team
10 min read

A practical look at how Storyie runs eight scheduled Lambda jobs (view aggregation, like tallies, performance monitoring, hashtag extraction, reminder notifications, email queue processing, welcome emails, and a monthly AI report) using SST v3 Cron, and what we learned along the way.

Storyie runs eight background jobs in production: view aggregation, like tallies, performance monitoring, hashtag extraction, diary reminders, email queue processing, welcome emails, and a monthly AI-generated report. Every one of them is defined with sst.aws.Cron and executed on Lambda.

This post covers the design decisions behind that setup — why we chose Lambda Cron, how we structured the code, and the specific patterns that actually helped once things were running in production.

TL;DR

  • sst.aws.Cron spins up a dedicated Lambda per job, keeping cost near zero for low-frequency work and isolating each job's memory, timeout, and environment variables from the others.
  • All handlers live in a standalone packages/jobs package — independent from the Next.js app, testable in isolation, and deployable without touching web infrastructure.
  • We offset overlapping jobs by 30 minutes to avoid hitting the database with heavy queries at the same instant.
  • Reminder notifications fire every 15 minutes and query for users whose configured time falls in the current window — no per-timezone scheduling needed.
  • The monthly AI report runs on the first of each month, converts Lexical rich-text JSON to plain text, caps input at 30,000 characters per user, and calls the Anthropic API. It is disabled in staging to avoid surprise costs.
  • Each job receives only the environment variables it needs — no blanket secret injection.
  • Every handler doubles as a CLI script for local dry-runs and manual recovery.

| Concern | Our approach |
| --- | --- |
| Cost | Lambda billed per invocation and duration; near zero for daily aggregations |
| Isolation | One Lambda per job, one environment block per job |
| Load spreading | 30-minute offsets between jobs that hit the same tables |
| Timezone support | 15-minute polling cadence, window query in the handler |
| Testing | Dual Lambda + CLI entry point per handler |
| Secrets | Minimum variables per job, not a shared full-stack environment |

Why Lambda Cron

We considered four options before landing on SST Cron.

| Approach | Pros | Cons |
| --- | --- | --- |
| Next.js API route + external cron | Simple to deploy | Shared Lambda, 15 s timeout, external dependency |
| ECS / Fargate tasks | Handles long-running work | Higher cost, more infrastructure to manage |
| SST Cron + Lambda | Serverless, near-zero cost, IaC-first | 15-minute Lambda timeout ceiling |
| SQS + Lambda | Event-driven, built-in retry | Not designed for time-based scheduling |

For Storyie the decision came down to two things: running cost and management overhead. Lambda Cron charges only for invocations and duration. A handful of daily aggregation jobs costs essentially nothing. Everything — schedule, memory, timeout, environment — is declared in sst.config.ts alongside the rest of the stack, so there is no separate cron dashboard or service to keep in sync.

The packages/jobs package

Job handlers live in their own workspace package, separate from the Next.js app.

```text
packages/jobs/
├── src/
│   ├── handlers/
│   │   ├── aggregations/   # views, likes, performance
│   │   ├── emails/         # welcome emails, queue processing
│   │   ├── notifications/  # diary reminders
│   │   ├── reports/        # monthly AI report
│   │   ├── tags/           # hashtag extraction
│   │   └── x-posting/      # automated social posts
│   ├── lib/                # shared utilities
│   └── types/
├── package.json
└── tsconfig.json
```

Keeping jobs in a separate package means:

  • Independent deploys. Fixing a job does not require rebuilding or redeploying the Next.js app.
  • Explicit dependencies. Each handler imports only what it needs, and the package's package.json makes those dependencies auditable.
  • Testable in isolation. Handlers are plain async functions; a test can import and call them directly.
  • CLI mode. The same handler can run from a terminal for dry-runs or manual recovery (more on this below).

Each handler module exports a single function named handler, and SST deploys that function as a Lambda.

Schedule design: spreading load across the database

Eight jobs running on the same database is a coordination problem. Here is what we do about it.

Offset jobs that share tables

View aggregation and like aggregation both run every four hours and touch related tables. Scheduling them at the same minute would stack both heavy workloads on the database at once, so we offset the like job by 30 minutes:

```typescript
// Views: every 4 hours on the hour
new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",
  // ...
});

// Likes: every 4 hours at :30 (offset by 30 minutes)
new sst.aws.Cron("LikeAggregator", {
  schedule: "cron(30 */4 * * ? *)",
  // ...
});
```

It is a small change, but it means the two heavy aggregation queries no longer start against the database at the same moment.

Match frequency to freshness requirements

Not every job needs to run at the same cadence. We ask one question for each: how stale is too stale?

| Job | Frequency | Rationale |
| --- | --- | --- |
| Performance aggregation | Daily at 02:00 UTC | Previous day's data processed in one batch |
| View / like aggregation | Every 4 hours | Balance between freshness and query cost |
| Hashtag extraction | Hourly | New diary tags reflected within an hour |
| Diary reminders | Every 15 minutes | Fine enough to honor each user's local reminder time |
| Email queue | Every 5 minutes | Queued emails processed promptly |
| Monthly AI report | 1st of the month at 03:00 UTC | Prior month analyzed once at the start of the new one |

View counts being four hours stale in a diary app does not affect the user experience. Running that job more frequently would cost more and add database load for no user-visible benefit.
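To put these cadences in perspective, here is a quick count of invocations per month. It assumes a 30-day month. The welcome-email job is left out because its schedule is not listed in the table.

```typescript
// Runs per 30-day month for each cadence in the table above.
// Assumption: a 30-day month; the welcome-email job is omitted (schedule not listed).
const MINUTES_PER_MONTH = 30 * 24 * 60;

const everyMinutes = (n: number): number => MINUTES_PER_MONTH / n;

const runsPerMonth: Record<string, number> = {
  "Email queue (every 5 min)": everyMinutes(5),
  "Diary reminders (every 15 min)": everyMinutes(15),
  "Hashtag extraction (hourly)": everyMinutes(60),
  "View aggregation (every 4 h)": everyMinutes(240),
  "Like aggregation (every 4 h)": everyMinutes(240),
  "Performance aggregation (daily)": 30,
  "Monthly AI report": 1,
};

const total = Object.values(runsPerMonth).reduce((sum, n) => sum + n, 0);

for (const [job, runs] of Object.entries(runsPerMonth)) {
  console.log(`${job}: ${runs}`);
}
console.log(`Total: ${total}`); // Total: 12631
```

That comes to roughly 12,600 invocations a month. Most of them are the short email-queue and reminder polls. Duration charges depend on memory size and run time, and this sketch does not model them.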

Reminder notifications and timezones

Diary reminders need to fire at a user-configured time — say, 20:00 in the user's local timezone. Users are spread across the world, so we cannot use a single timezone-specific schedule.

Our approach: run the Lambda every 15 minutes and query for users whose reminder time (converted from their stored timezone to UTC) falls within the current 15-minute window.

```text
cron(0/15 * * * ? *)  →  fires at :00, :15, :30, :45 each hour
```

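A user who set 20:00 JST (UTC+9) has a reminder time of 11:00 UTC, so the run at 11:00 UTC returns that user. A reminder that does not fall on a quarter hour, such as 20:01, is picked up by the next run. The maximum notification delay is therefore 14 minutes, which is fine for this use case.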
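The window check can be sketched in TypeScript. This is an in-memory illustration, not our database query. Instead of converting each stored reminder to UTC, it converts the run time into each user's local time. For the window test the two approaches give the same answer, and this direction lets `Intl` handle daylight saving. The `ReminderPref` shape and its field names are hypothetical. The window is taken to be the 15 minutes ending at the run, so a 20:01 reminder is picked up by the 20:15 run.

```typescript
// Hypothetical shape of a stored reminder preference (not Storyie's actual schema).
interface ReminderPref {
  userId: string;
  reminderTime: string; // local "HH:MM", e.g. "20:00"
  timeZone: string; // IANA zone, e.g. "Asia/Tokyo"
}

const WINDOW_MINUTES = 15;
const MINUTES_PER_DAY = 24 * 60;

// Minute of the day (0..1439) that instant `at` corresponds to in `timeZone`.
function localMinuteOfDay(at: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).formatToParts(at);
  const get = (type: string) => Number(parts.find((p) => p.type === type)?.value);
  return get("hour") * 60 + get("minute");
}

// True if the reminder falls in the 15-minute window ending at this run.
function isDue(pref: ReminderPref, runAt: Date): boolean {
  // Snap to the quarter hour in case the scheduled invocation starts a few seconds late.
  const windowMs = WINDOW_MINUTES * 60_000;
  const slot = new Date(Math.floor(runAt.getTime() / windowMs) * windowMs);
  const now = localMinuteOfDay(slot, pref.timeZone);
  const [h, m] = pref.reminderTime.split(":").map(Number);
  const reminder = h * 60 + m;
  // Minutes elapsed since the reminder time, wrapping around midnight.
  const elapsed = (now - reminder + MINUTES_PER_DAY) % MINUTES_PER_DAY;
  return elapsed < WINDOW_MINUTES;
}

// Example: 20:00 in Tokyo is 11:00 UTC.
const prefs: ReminderPref[] = [
  { userId: "a", reminderTime: "20:00", timeZone: "Asia/Tokyo" },
  { userId: "b", reminderTime: "20:01", timeZone: "Asia/Tokyo" },
  { userId: "c", reminderTime: "20:00", timeZone: "America/New_York" },
];
const due = prefs.filter((p) => isDue(p, new Date("2025-11-01T11:00:20Z")));
console.log(due.map((p) => p.userId)); // [ 'a' ]
```

Because every real-world UTC offset in use today is a multiple of 15 minutes, snapping the run time in UTC also lands it on a local quarter hour.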

Performance monitoring: two-day breach before alerting

The performance aggregation job does more than write numbers to a table. It implements a simple monitoring pipeline:

  1. Collect RUM samples (LCP, CLS, INP) sent from the browser.
  2. Compute the P75 for each metric.
  3. Compare LCP P75 against a 1-second threshold — above it is a breach, below is a pass.
  4. Check whether yesterday was also a breach.
  5. If two consecutive days breach, create an alert. If a pass follows a breach, resolve any open alerts automatically.

The two-day rule filters out transient spikes — a deployment cold-start, a brief CDN issue — without requiring manual suppression or external monitoring tooling. One bad day is noise. Two bad days is signal.

Closing this loop inside a Lambda means we do not depend on an external observability service to catch regressions. The monitoring logic is code, it lives in the repo, and it is testable.
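Because the rule lives in code, the decision step can be a pure function that is trivial to unit test. The sketch below assumes a nearest-rank P75 and treats a P75 exactly at the threshold as a pass. It also adds a hypothetical `hasOpenAlert` flag so that a third breach day does not open a duplicate alert. All names are illustrative, not our actual module.

```typescript
type DayStatus = "breach" | "pass";
type Action = "create_alert" | "resolve_alerts" | "none";

const LCP_P75_THRESHOLD_MS = 1000; // the 1-second LCP threshold

// 75th percentile using the nearest-rank method.
function p75(samples: number[]): number {
  if (samples.length === 0) throw new Error("no RUM samples");
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

function classifyDay(lcpSamplesMs: number[]): DayStatus {
  return p75(lcpSamplesMs) > LCP_P75_THRESHOLD_MS ? "breach" : "pass";
}

// Two consecutive breaches open an alert; a pass after a breach resolves open alerts.
// hasOpenAlert (an assumption) keeps a third breach day from opening a duplicate alert.
function decide(today: DayStatus, yesterday: DayStatus | undefined, hasOpenAlert: boolean): Action {
  if (today === "breach" && yesterday === "breach" && !hasOpenAlert) return "create_alert";
  if (today === "pass" && yesterday === "breach") return "resolve_alerts";
  return "none";
}

const yesterday = classifyDay([1200, 900, 1500, 1100]); // P75 = 1200 ms, a breach
const today = classifyDay([1300, 1250, 800, 1400]); // P75 = 1300 ms, a breach
console.log(decide(today, yesterday, false)); // create_alert
```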

Monthly AI report

The monthly report is the most involved job we run. On the first of each month at 03:00 UTC, it fetches the previous month's diary entries for every Pro subscriber, converts the Lexical rich-text JSON to plain text, and passes the content to the Anthropic API to generate a personal summary: emotional trends, recurring topics, standout moments.

A few design choices worth noting:

  • Lexical JSON → plain text. Diary content is stored as Lexical editor state (structured JSON). Before sending to the API we extract plain text so the model works with readable prose, not node trees.
  • 30,000-character cap per user. This limits token usage and keeps the per-user cost predictable as the diary length grows.
  • 100 ms delay between users. A simple rate-limit guard that keeps us within the Anthropic API's request budget.
  • Production-only. The job is wrapped in a stage check so it never runs in staging, where it would burn real API budget on test data.

```typescript
// $app.stage is the SST stage name, available inside run() in sst.config.ts
if ($app.stage === "production") {
  new sst.aws.Cron("MonthlyReportGenerator", {
    schedule: "cron(0 3 1 * ? *)", // 03:00 UTC on the 1st of every month
    job: {
      handler: "packages/jobs/src/handlers/reports/monthlyReport.handler",
      timeout: "15 minutes", // Lambda's maximum
      memory: "1024 MB",
    },
  });
}
```

We allocate the full 15-minute Lambda timeout and 1 GB of memory. Today the job finishes comfortably within those limits. When the user base grows to the point that a single invocation cannot finish in time, the migration path is SQS + one Lambda invocation per user — the handler logic stays the same, only the orchestration changes. We are not optimizing for that today.
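The first three choices in the list above can be sketched together. This is an illustration under several assumptions, not our handler:

  • It reads only the `type`, `text`, and `children` fields of serialized Lexical nodes.
  • It treats `root` and `list` as the only containers whose children go on separate lines.
  • It enforces the cap by slicing at 30,000 characters.
  • It leaves the Anthropic API call out, behind a hypothetical `work` callback.

Lexical can also produce plain text itself, using a headless editor and `$getRoot().getTextContent()`. Walking the JSON avoids loading the editor in the Lambda.

```typescript
// Minimal view of a serialized Lexical node (only the fields this sketch reads).
interface LexicalNode {
  type: string;
  text?: string;
  children?: LexicalNode[];
}

interface SerializedState {
  root: LexicalNode;
}

const MAX_CHARS_PER_USER = 30_000;
const DELAY_BETWEEN_USERS_MS = 100;
const BLOCK_CONTAINERS = new Set(["root", "list"]); // children are block-level nodes

// Concatenate text nodes; put block-level children on their own lines.
function nodeText(node: LexicalNode): string {
  if (node.type === "text") return node.text ?? "";
  if (node.type === "linebreak") return "\n";
  const separator = BLOCK_CONTAINERS.has(node.type) ? "\n" : "";
  return (node.children ?? []).map(nodeText).join(separator);
}

function lexicalToPlainText(state: SerializedState): string {
  return nodeText(state.root);
}

// Join a user's entries for the month and enforce the per-user character cap.
function buildReportInput(entries: SerializedState[]): string {
  return entries.map(lexicalToPlainText).join("\n\n").slice(0, MAX_CHARS_PER_USER);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process users sequentially with a fixed pause between them (simple rate-limit guard).
async function forEachUserWithDelay<T>(users: T[], work: (user: T) => Promise<void>): Promise<void> {
  for (let i = 0; i < users.length; i++) {
    if (i > 0) await sleep(DELAY_BETWEEN_USERS_MS);
    await work(users[i]);
  }
}

const entry: SerializedState = {
  root: {
    type: "root",
    children: [
      { type: "paragraph", children: [{ type: "text", text: "Walked to the park." }] },
      { type: "paragraph", children: [{ type: "text", text: "Felt calm." }] },
    ],
  },
};
console.log(JSON.stringify(lexicalToPlainText(entry))); // "Walked to the park.\nFelt calm."
```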

Environment variables per job, not a shared pool

Each Cron definition includes only the environment variables that specific job needs:

```typescript
new sst.aws.Cron("ViewAggregator", {
  // schedule and handler omitted for brevity
  job: {
    environment: {
      DATABASE_URL: process.env.DATABASE_URL!,
      // nothing else: no Stripe keys, no Firebase credentials
    },
  },
});
```

Passing a shared environment block to every job is tempting because it is less typing. We do not do it because a compromised or misconfigured aggregation job should not be able to read Stripe secret keys. Least privilege at the environment-variable level costs almost nothing and meaningfully reduces the blast radius of any single job going wrong.
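One way to keep the per-job lists explicit, without retyping `process.env.X!` in every definition, is a small allowlist helper in `sst.config.ts`. The sketch assumes a hypothetical `JOB_ENV` map and `envFor` helper, and `ANTHROPIC_API_KEY` is an illustrative variable name. The `!` non-null assertion only silences the type checker. This helper instead fails at deploy time when a required variable is missing.

```typescript
// Hypothetical per-job allowlist; names other than DATABASE_URL are illustrative.
const JOB_ENV = {
  ViewAggregator: ["DATABASE_URL"],
  LikeAggregator: ["DATABASE_URL"],
  MonthlyReportGenerator: ["DATABASE_URL", "ANTHROPIC_API_KEY"],
} as const;

type JobName = keyof typeof JOB_ENV;

// Build one job's environment block from its allowlist; throw if a variable is missing.
function envFor(
  job: JobName,
  source: Record<string, string | undefined> = process.env,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of JOB_ENV[job]) {
    const value = source[key];
    if (value === undefined || value === "") {
      throw new Error(`${job}: missing required environment variable ${key}`);
    }
    env[key] = value;
  }
  return env;
}

// Usage inside a Cron definition: job: { environment: envFor("ViewAggregator"), ... }
```

Anything not on a job's list, such as a Stripe key that happens to be set in the deploy shell, never reaches that job's Lambda.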

Dual Lambda + CLI entry point

Every handler is written to work in two modes: as a Lambda function and as a CLI script.

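```typescript
import { parseArgs } from "node:util";
import { fileURLToPath } from "node:url";
// Core logic shared by both entry points
async function run(opts: { from?: string; to?: string; dryRun: boolean }): Promise<void> {
  // ...
}

// Lambda entry point: SST calls this on each scheduled invocation
export async function handler(): Promise<void> {
  await run({ dryRun: false });
}

// CLI entry point: flags for date ranges, dry runs, etc.
async function main(): Promise<void> {
  const { values } = parseArgs({
    options: {
      "dry-run": { type: "boolean", default: false },
      from: { type: "string" },
      to: { type: "string" },
    },
  });
  await run({ from: values.from, to: values.to, dryRun: values["dry-run"] ?? false });
}

// Runs only when this file is executed directly, not when Lambda imports it
if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main().catch((err) => { console.error(err); process.exit(1); });
}
```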

This makes it straightforward to run any job locally before deploying:

```shell
pnpm perf:aggregate --dry-run --from=2025-11-01 --to=2025-11-02
```

It also serves as a manual recovery path. When a job fails in production and we need to reprocess a date range without waiting for the next scheduled invocation, we run the handler directly from the command line. The dual entry point is low-effort to set up and has saved us meaningful debugging time.
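Reprocessing a range usually means running the same per-day logic several times. The sketch below shows how the CLI flags might expand into days. It assumes `--to` is exclusive, so the command above covers a single day. It also uses a hypothetical `aggregateDay` callback standing in for the job's core logic.

```typescript
// Expand a CLI date range into the UTC days to reprocess.
// Assumption: --to is exclusive, so 2025-11-01..2025-11-02 covers one day.
function daysInRange(from: string, to: string): string[] {
  const start = Date.parse(`${from}T00:00:00Z`);
  const end = Date.parse(`${to}T00:00:00Z`);
  if (Number.isNaN(start) || Number.isNaN(end) || end <= start) {
    throw new Error(`invalid range: ${from}..${to}`);
  }
  const days: string[] = [];
  for (let t = start; t < end; t += 24 * 60 * 60 * 1000) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

// Hypothetical per-day loop; aggregateDay stands in for the job's core logic.
async function reprocess(
  from: string,
  to: string,
  dryRun: boolean,
  aggregateDay: (day: string) => Promise<void>,
): Promise<void> {
  for (const day of daysInRange(from, to)) {
    if (dryRun) {
      console.log(`[dry-run] would aggregate ${day}`);
      continue;
    }
    await aggregateDay(day);
  }
}

console.log(daysInRange("2025-11-01", "2025-11-04")); // [ '2025-11-01', '2025-11-02', '2025-11-03' ]
```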

Things that tripped us up

Cold starts on frequently-running jobs

Even a job that runs every 15 minutes will see cold starts. Lambda does not guarantee a warm container. We budget extra time in every timeout allocation to account for this — a job with 30 seconds of actual work gets at least two minutes on the clock.

copyFiles for email templates

Our email handlers read HTML templates from disk at runtime. When SST bundles a Lambda, it includes only the code reachable through imports. Files read at runtime are not part of that graph and are left out, so templates need to be copied into the bundle explicitly:

```typescript
new sst.aws.Cron("WelcomeEmailSender", {
  // schedule and handler omitted for brevity
  job: {
    copyFiles: [
      { from: "apps/web/content/emails", to: "content/emails" },
    ],
  },
});
```

Forgetting this produces a Lambda that deploys and starts normally, then fails only when it tries to read a template file that is missing from the bundle. Unless you know to look for a missing-file error in the Lambda logs, the cause is not obvious.
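A cheap guard is to resolve template paths in one place and fail with a message that names the likely cause. The sketch below makes two assumptions. First, `copyFiles` destinations are relative to the bundle root. Second, that root is what Lambda exposes as `LAMBDA_TASK_ROOT`, with a fallback to the working directory for local runs. The `loadTemplate` name is hypothetical.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Resolve a template copied in via copyFiles and fail loudly if it is missing.
// Assumption: copyFiles places files relative to the bundle root, exposed as
// LAMBDA_TASK_ROOT inside Lambda; locally we fall back to the working directory.
function loadTemplate(
  name: string,
  root: string = process.env.LAMBDA_TASK_ROOT ?? process.cwd(),
): string {
  const file = join(root, "content", "emails", name);
  if (!existsSync(file)) {
    throw new Error(`Email template not found: ${file}. Is it listed in copyFiles?`);
  }
  return readFileSync(file, "utf8");
}
```

The error then points straight at the missing `copyFiles` entry instead of surfacing as a generic file-read failure deeper in the handler.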

EventBridge cron syntax is not standard Unix cron

AWS EventBridge uses a six-field cron expression with an exclusive constraint between the day-of-month and day-of-week fields — exactly one of them must be ?.

```text
✗ cron(0 2 * * * *)    ← day-of-week must be ? when day-of-month is *
✓ cron(0 2 * * ? *)    ← correct
```

The error message when you get this wrong is not always clear. If a schedule silently fails to register, check the cron syntax first.
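A small pre-deploy check catches both mistakes before EventBridge sees the expression. It could run as a unit test or as a guard in `sst.config.ts`. The function name is hypothetical. The sketch checks only the two rules above: six fields, and exactly one of day-of-month and day-of-week set to `?`. It does not validate the values inside each field.

```typescript
// Checks two EventBridge cron rules: six fields, and exactly one of
// day-of-month / day-of-week set to "?". Field values are not validated.
function checkEventBridgeCron(expr: string): string | null {
  const match = /^cron\((.*)\)$/.exec(expr.trim());
  if (!match) return "expression must be wrapped in cron(...)";
  const fields = match[1].trim().split(/\s+/);
  if (fields.length !== 6) {
    return `expected 6 fields (minutes hours day-of-month month day-of-week year), got ${fields.length}`;
  }
  const dayOfMonth = fields[2];
  const dayOfWeek = fields[4];
  if ((dayOfMonth === "?") === (dayOfWeek === "?")) {
    return "exactly one of day-of-month and day-of-week must be ?";
  }
  return null; // passes these two checks
}

for (const expr of ["cron(0 2 * * * *)", "cron(0 2 * * ? *)", "cron(0 2 * * *)"]) {
  console.log(expr, "→", checkEventBridgeCron(expr) ?? "ok");
}
```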

Takeaways

sst.aws.Cron is a good fit for low-to-medium frequency scheduled work in a serverless stack. The things we would do the same way again:

  • One package for all job handlers. Clean separation from the web app, explicit dependencies, testable in isolation.
  • Offset overlapping jobs. Thirty minutes of offset cost nothing and meaningfully reduces peak database load.
  • 15-minute polling for timezone-aware work. Simpler and more reliable than trying to schedule per-timezone Lambdas.
  • Per-job environment variables. Small discipline, real security improvement.
  • CLI mode on every handler. Pays back its setup cost the first time you need to manually reprocess something.

If and when we need to scale beyond what a single Lambda invocation can handle, SQS fan-out is the natural next step. Until then, the simplicity of a cron-scheduled Lambda is hard to beat.


Try Storyie

The jobs described here run every day against real user data at storyie.com. If you are a Pro subscriber, the monthly report lands in your account on the first of each month. The iOS app surfaces it alongside your diary history.