Why use SST Cron + Lambda instead of a Next.js API route with an external cron service?

Next.js API routes work for quick tasks, but they share the Lambda that also serves web traffic, which means cold-start pressure and a 15-second response timeout. An external cron service also adds another dependency to manage. SST Cron spins up a dedicated Lambda per job, so each job is isolated: its own timeout, its own memory allocation, its own environment variables. Infrastructure is declared in the same TypeScript file as the rest of the stack, so there is nothing to configure outside the repo. For us the combination of cost (near-zero for a handful of daily runs), isolation, and IaC-first management made it the easy call.

How do you handle timezone-aware reminders when Lambda runs on a fixed UTC schedule?

We run a Lambda every 15 minutes around the clock. Each invocation queries the database for users whose reminder time falls within the current 15-minute window, converting from their stored timezone to UTC for the comparison. The Lambda does not try to be timezone-smart itself — it just fires frequently enough that any user-configured time lands within one window. The maximum drift is 14 minutes, which is acceptable for a diary reminder. This approach scales to any number of timezones without changing the schedule.

How does the performance monitoring job avoid noisy alerts?

It collects browser RUM samples (LCP, CLS, INP), computes the P75 for each metric, and checks whether LCP P75 exceeds 1 second. A single breach does not fire an alert — we mark that day as a breach and move on. An alert only fires when the previous day was also marked as a breach. That two-day consecutive rule filters out transient spikes like a deployment cold-start or a CDN hiccup without needing any external monitoring service or manual suppression window.

What is the monthly AI report job doing, and how do you keep the cost reasonable?

The job runs on the first of each month at 03:00 UTC, in production only; it is disabled in staging. It fetches the previous month's diary entries for every Pro subscriber, converts Lexical rich-text JSON to plain text, and passes up to 30,000 characters per user to the Anthropic API. Claude analyzes the entries and generates a personal report covering emotional trends, recurring themes, and highlights. We cap the input per user to control token cost, insert a 100 ms delay between users to stay within API rate limits, and allocate a 15-minute timeout and 1 GB of memory to the Lambda. If the user base grows to the point where one Lambda invocation can't finish in time, the natural next step is SQS + Lambda so each user is processed independently.

Why does each job only receive the environment variables it actually needs?

Least-privilege at the environment-variable level. A view-aggregation job only needs a database connection string — it has no business touching Stripe keys or Firebase credentials. Passing only what is needed means a misconfigured or compromised job cannot read secrets it was never supposed to see. It also makes each job easier to reason about in isolation: you can look at the environment block and know exactly what external services the job talks to.

How do you test and manually recover jobs without deploying?

Every handler exports both a Lambda entry point and a CLI entry point. When the module is executed directly (process.argv[1] matches the module's own file path), it runs a main() function that accepts command-line flags such as --dry-run and date ranges. This lets us run any job locally against real or test data before deploying: `pnpm perf:aggregate --dry-run --from=2025-11-01 --to=2025-11-02`. The same path serves as a manual recovery tool when a job fails in production and we need to reprocess a specific date range without waiting for the next scheduled run.

Eight cron jobs in production: how we run background work on SST v3 and Lambda

Storyie runs eight background jobs in production: view aggregation, like tallies, performance monitoring, hashtag extraction, diary reminders, email queue processing, welcome emails, and a monthly AI-generated report. Every one of them is defined with sst.aws.Cron and executed on Lambda.

This post covers the design decisions behind that setup — why we chose Lambda Cron, how we structured the code, and the specific patterns that actually helped once things were running in production.

TL;DR

sst.aws.Cron spins up a dedicated Lambda per job, keeping cost near zero for low-frequency work and isolating each job's memory, timeout, and environment variables from the others.
All handlers live in a standalone packages/jobs package — independent from the Next.js app, testable in isolation, and deployable without touching web infrastructure.
We offset overlapping jobs by 30 minutes to avoid hitting the database with heavy queries at the same instant.
Reminder notifications fire every 15 minutes and query for users whose configured time falls in the current window — no per-timezone scheduling needed.
The monthly AI report runs on the first of each month, converts Lexical rich-text JSON to plain text, caps input at 30,000 characters per user, and calls the Anthropic API. It is disabled in staging to avoid surprise costs.
Each job receives only the environment variables it needs — no blanket secret injection.
Every handler doubles as a CLI script for local dry-runs and manual recovery.

Concern	Our approach
Cost	Lambda billed per invocation — near zero for daily aggregations
Isolation	One Lambda per job, one environment block per job
Load spreading	30-minute offsets between jobs that hit the same tables
Timezone support	15-minute polling cadence, window query in the handler
Testing	Dual Lambda + CLI entry point per handler
Secrets	Minimum variables per job, not a shared full-stack environment

Why Lambda Cron

We considered four options before landing on SST Cron.

Approach	Pros	Cons
Next.js API route + ext. cron	Simple to deploy	Shared Lambda, 15 s timeout, external dependency
ECS / Fargate tasks	Handles long-running work	Higher cost, more infrastructure to manage
SST Cron + Lambda	Serverless, near-zero cost, IaC-first	15-minute Lambda timeout ceiling
SQS + Lambda	Event-driven, built-in retry	Not designed for time-based scheduling

For Storyie the decision came down to two things: running cost and management overhead. Lambda Cron charges only for invocations and duration. A handful of daily aggregation jobs costs essentially nothing. Everything — schedule, memory, timeout, environment — is declared in sst.config.ts alongside the rest of the stack, so there is no separate cron dashboard or service to keep in sync.

The `packages/jobs` package

Job handlers live in their own workspace package, separate from the Next.js app.

packages/jobs/
├── src/
│   ├── handlers/
│   │   ├── aggregations/   # views, likes, performance
│   │   ├── emails/         # welcome emails, queue processing
│   │   ├── notifications/  # diary reminders
│   │   ├── reports/        # monthly AI report
│   │   ├── tags/           # hashtag extraction
│   │   └── x-posting/      # automated social posts
│   ├── lib/                # shared utilities
│   └── types/
├── package.json
└── tsconfig.json

Keeping jobs in a separate package means:

Independent deploys. Fixing a job does not require rebuilding or redeploying the Next.js app.
Explicit dependencies. Each handler imports only what it needs, and the package's package.json makes those dependencies auditable.
Testable in isolation. Handlers are plain async functions; a test can import and call them directly.
CLI mode. The same handler can run from a terminal for dry-runs or manual recovery (more on this below).

Each handler module exports a single handler function. SST deploys that function as a Lambda.

Schedule design: spreading load across the database

Eight jobs running on the same database is a coordination problem. Here is what we do about it.

Offset jobs that share tables

View aggregation and like aggregation both run every four hours and both touch related tables. Running them at the same second doubles the query pressure. We offset one by 30 minutes:

// Views: every 4 hours on the hour
new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",
  // ...
});

// Likes: every 4 hours at :30 (offset by 30 minutes)
new sst.aws.Cron("LikeAggregator", {
  schedule: "cron(30 */4 * * ? *)",
  // ...
});

It is a small change, but it makes a real difference when both jobs are running heavy aggregation queries.

Match frequency to freshness requirements

Not every job needs to run at the same cadence. We ask one question for each: how stale is too stale?

Job	Frequency	Rationale
Performance aggregation	Daily at 02:00 UTC	Previous day's data processed in one batch
View / like aggregation	Every 4 hours	Balance between freshness and query cost
Hashtag extraction	Every 1 hour	New diary tags reflected within a reasonable window
Diary reminders	Every 15 minutes	Enough resolution to respect per-user timezone times
Email queue	Every 5 minutes	Queued emails processed promptly
Monthly AI report	1st of month at 03:00 UTC	Prior month analyzed once at the start of the new one

View counts being four hours stale in a diary app does not affect the user experience. Running that job more frequently would cost more and add database load for no user-visible benefit.
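To put the "near zero" cost claim in perspective, the schedules in the table can be turned into a rough monthly invocation count. This is a back-of-the-envelope sketch, not a billing calculation: it assumes a 30-day month, leaves out the welcome-email job because the table does not list its cadence, and ignores duration charges entirely. The job names in the map are labels chosen for this example.

```typescript
// Invocations implied by the schedule table above.
// Assumptions: 30-day month; welcome emails omitted (cadence not listed).
const MINUTES_PER_DAY = 24 * 60;

// Interval between runs, in minutes, for each listed job.
const intervalMinutes: Record<string, number> = {
  performanceAggregation: MINUTES_PER_DAY, // daily
  viewAggregation: 4 * 60,                 // every 4 hours
  likeAggregation: 4 * 60,                 // every 4 hours
  hashtagExtraction: 60,                   // hourly
  diaryReminders: 15,                      // every 15 minutes
  emailQueue: 5,                           // every 5 minutes
};

function invocationsPerDay(interval: number): number {
  return MINUTES_PER_DAY / interval;
}

const perDay = Object.keys(intervalMinutes)
  .map((job) => invocationsPerDay(intervalMinutes[job]))
  .reduce((sum, n) => sum + n, 0);

// Add the single monthly report run.
const perMonth = perDay * 30 + 1;

console.log({ perDay, perMonth }); // { perDay: 421, perMonth: 12631 }
```

Roughly 12,600 invocations a month is a little over 1% of the one million requests in Lambda's monthly free tier, so for this workload the request charge is negligible and duration is the only cost that matters.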

Reminder notifications and timezones

Diary reminders need to fire at a user-configured time — say, 20:00 in the user's local timezone. Users are spread across the world, so we cannot use a single timezone-specific schedule.

Our approach: run the Lambda every 15 minutes and query for users whose reminder time (converted from their stored timezone to UTC) falls within the current 15-minute window.

cron(0/15 * * * ? *)  →  fires at :00, :15, :30, :45 each hour

A reminder set for 20:00 JST corresponds to 11:00 UTC. When the Lambda fires at 11:00 UTC, the query returns that user. A notification can be off by at most 14 minutes, which is fine for this use case.
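The window check can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the production query: the real job does the comparison in the database, whereas this version does it in application code, and it converts the window start into the user's local time rather than converting the reminder time to UTC (the two are equivalent). The `ReminderSetting` shape, the function names, and the choice of a half-open window starting at the invocation time are all hypothetical.

```typescript
const WINDOW_MINUTES = 15;
const MINUTES_PER_DAY = 24 * 60;

// Hypothetical shape of a stored reminder preference.
interface ReminderSetting {
  userId: string;
  reminderTime: string; // "HH:MM" in the user's local time
  timeZone: string;     // IANA zone name, e.g. "Asia/Tokyo"
}

/** Minutes since local midnight for `instant`, as seen in `timeZone`. */
function localMinutesOfDay(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): number => {
    const part = parts.find((p) => p.type === type);
    return part ? Number(part.value) : 0;
  };
  // Some engines format midnight as "24" when hour12 is false.
  return (get("hour") % 24) * 60 + get("minute");
}

/** True when the reminder falls in [windowStart, windowStart + 15 min). */
function isDueInWindow(setting: ReminderSetting, windowStart: Date): boolean {
  const [h = 0, m = 0] = setting.reminderTime.split(":").map(Number);
  const reminderMinutes = h * 60 + m;
  const localStart = localMinutesOfDay(windowStart, setting.timeZone);
  // Modulo keeps the difference non-negative across midnight.
  const offset =
    (reminderMinutes - localStart + MINUTES_PER_DAY) % MINUTES_PER_DAY;
  return offset < WINDOW_MINUTES;
}

const tokyoUser: ReminderSetting = {
  userId: "u1",
  reminderTime: "20:00",
  timeZone: "Asia/Tokyo",
};
console.log(isDueInWindow(tokyoUser, new Date("2025-11-01T11:00:00Z"))); // true
console.log(isDueInWindow(tokyoUser, new Date("2025-11-01T11:15:00Z"))); // false
```

Because the zone conversion goes through `Intl.DateTimeFormat`, daylight-saving changes are handled by the runtime's timezone data rather than by a stored UTC offset.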

Performance monitoring: two-day breach before alerting

The performance aggregation job does more than write numbers to a table. It implements a simple monitoring pipeline:

Collect RUM samples (LCP, CLS, INP) sent from the browser.
Compute the P75 for each metric.
Compare LCP P75 against a 1-second threshold — above it is a breach, below is a pass.
Check whether yesterday was also a breach.
If two consecutive days breach, create an alert. If a pass follows a breach, resolve any open alerts automatically.

The two-day rule filters out transient spikes — a deployment cold-start, a brief CDN issue — without requiring manual suppression or external monitoring tooling. One bad day is noise. Two bad days is signal.
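The steps above can be sketched as pure functions. This is a minimal illustration, not the production code: it assumes a nearest-rank percentile (other interpolation methods give slightly different values), treats a P75 exactly at the threshold as a pass, and uses hypothetical names throughout.

```typescript
type DayStatus = "pass" | "breach";
type AlertAction = "create" | "resolve" | "none";

const LCP_THRESHOLD_MS = 1000; // the 1-second threshold from the text

/** Nearest-rank percentile of a non-empty sample set. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

/** A day is a breach when its LCP P75 is above the threshold. */
function classifyDay(lcpSamplesMs: number[]): DayStatus {
  return percentile(lcpSamplesMs, 75) > LCP_THRESHOLD_MS ? "breach" : "pass";
}

/**
 * Two consecutive breaches create an alert; a pass right after a breach
 * resolves open alerts; everything else leaves the alert state untouched.
 */
function decideAlert(
  today: DayStatus,
  yesterday: DayStatus | null,
): AlertAction {
  if (today === "breach" && yesterday === "breach") return "create";
  if (today === "pass" && yesterday === "breach") return "resolve";
  return "none";
}

const today = classifyDay([800, 900, 1100, 1200]); // P75 = 1100 ms
console.log(today, decideAlert(today, "pass"));    // breach none
console.log(today, decideAlert(today, "breach"));  // breach create
```

Keeping the classification and the alert decision free of I/O is what makes this kind of logic easy to unit test: the handler only has to load samples, call the functions, and persist the result.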

Closing this loop inside a Lambda means we do not depend on an external observability service to catch regressions. The monitoring logic is code, it lives in the repo, and it is testable.

Monthly AI report

The monthly report is the most involved job we run. On the first of each month at 03:00 UTC, it fetches the previous month's diary entries for every Pro subscriber, converts the Lexical rich-text JSON to plain text, and passes the content to the Anthropic API to generate a personal summary: emotional trends, recurring topics, standout moments.

A few design choices worth noting:

Lexical JSON → plain text. Diary content is stored as Lexical editor state (structured JSON). Before sending to the API we extract plain text so the model works with readable prose, not node trees.
30,000-character cap per user. This limits token usage and keeps the per-user cost predictable as diaries grow longer.
100 ms delay between users. A simple rate-limit guard that keeps us within the Anthropic API's request budget.
Production-only. The job is wrapped in a stage check so it never runs in staging, where it would burn real API budget on test data.

if (stage === "production") {
  new sst.aws.Cron("MonthlyReportGenerator", {
    schedule: "cron(0 3 1 * ? *)",
    job: {
      handler: "packages/jobs/src/handlers/reports/monthlyReport.handler",
      timeout: "15 minutes",
      memory: "1024 MB",
    },
  });
}

We allocate the full 15-minute Lambda timeout and 1 GB of memory. Today the job finishes comfortably within those limits. When the user base grows to the point that a single invocation cannot finish in time, the migration path is SQS + one Lambda invocation per user — the handler logic stays the same, only the orchestration changes. We are not optimizing for that today.
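The text extraction and the per-user cap described above might look roughly like the sketch below. It assumes the usual serialized Lexical layout (a `root` node whose descendants are element nodes with `children` and text nodes with `text`) and models only the fields it reads. The function names and the blank-line separator between entries are assumptions for illustration, and the API call and the 100 ms pause between users are left out.

```typescript
const MAX_CHARS_PER_USER = 30000;

// Minimal view of a serialized Lexical node; real nodes carry more fields.
interface SerializedNode {
  type: string;
  text?: string;
  children?: SerializedNode[];
}

interface SerializedEditorState {
  root: SerializedNode;
}

function nodeToText(node: SerializedNode): string {
  if (typeof node.text === "string") return node.text;
  if (node.type === "linebreak") return "\n";
  const children = node.children ?? [];
  // Top-level blocks are separated by newlines; inline children are
  // concatenated. Nested blocks such as list items are not separated here.
  const separator = node.type === "root" ? "\n" : "";
  return children.map(nodeToText).join(separator);
}

function lexicalToPlainText(json: string): string {
  const state = JSON.parse(json) as SerializedEditorState;
  return nodeToText(state.root);
}

/** Concatenate entries until the character budget is used up. */
function buildPromptInput(
  entries: string[],
  maxChars: number = MAX_CHARS_PER_USER,
): string {
  let out = "";
  for (const entry of entries) {
    const text = lexicalToPlainText(entry);
    const next = out === "" ? text : out + "\n\n" + text;
    // The cap counts UTF-16 code units, which is what string length measures.
    if (next.length > maxChars) return next.slice(0, maxChars);
    out = next;
  }
  return out;
}
```

Truncating at a fixed character count is the simplest way to bound cost; a variant that drops whole entries instead of cutting one mid-sentence would trade a little input for cleaner prompts.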

Environment variables per job, not a shared pool

Each Cron definition includes only the environment variables that specific job needs:

new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",
  job: {
    // handler and other settings omitted
    environment: {
      DATABASE_URL: process.env.DATABASE_URL!,
      // nothing else: no Stripe keys, no Firebase credentials
    },
  },
});

Passing a shared environment block to every job is tempting because it is less typing. We do not do it because a compromised or misconfigured aggregation job should not be able to read Stripe secret keys. Least privilege at the environment-variable level costs almost nothing and meaningfully reduces the blast radius of any single job going wrong.

Dual Lambda + CLI entry point

Every handler is written to work in two modes: as a Lambda function and as a CLI script.

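import { fileURLToPath } from "node:url";

// Lambda entry point: SST deploys and invokes this export
export async function handler() {
  // core logic
}

// CLI entry point: runs only when the file is executed directly
async function main() {
  const options = parseArgs(); // flag-parsing helper, not shown here
  // same logic, command-line flags for date ranges, dry-run, etc.
}

// fileURLToPath decodes the module URL into a real file path, so the check
// also holds for paths containing spaces and on Windows
if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main().catch((error) => {
    console.error(error);
    process.exitCode = 1;
  });
}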

This makes it straightforward to run any job locally before deploying:

pnpm perf:aggregate --dry-run --from=2025-11-01 --to=2025-11-02

It also serves as a manual recovery path. When a job fails in production and we need to reprocess a date range without waiting for the next scheduled invocation, we run the handler directly from the command line. The dual entry point is low-effort to set up and has saved us meaningful debugging time.
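For the flags used in the example command, a small hand-rolled parser is enough. The sketch below is one possible shape, with `parseFlags` and `JobOptions` as hypothetical names. It accepts only `--dry-run`, `--from=YYYY-MM-DD` and `--to=YYYY-MM-DD`, and rejects anything else so that a typo cannot silently widen a reprocessing run. Node's built-in `util.parseArgs` is an alternative if a dependency-free helper is not required.

```typescript
interface JobOptions {
  dryRun: boolean;
  from?: string; // ISO date, e.g. "2025-11-01"
  to?: string;   // ISO date, e.g. "2025-11-02"
}

/** Parse the arguments after the script name, e.g. process.argv.slice(2). */
function parseFlags(argv: string[]): JobOptions {
  const options: JobOptions = { dryRun: false };
  for (const arg of argv) {
    if (arg === "--dry-run") {
      options.dryRun = true;
      continue;
    }
    const match = /^--(from|to)=(\d{4}-\d{2}-\d{2})$/.exec(arg);
    if (!match) throw new Error(`Unrecognized argument: ${arg}`);
    if (match[1] === "from") options.from = match[2];
    else options.to = match[2];
  }
  // ISO dates in YYYY-MM-DD form sort correctly as plain strings.
  if (options.from && options.to && options.from > options.to) {
    throw new Error("--from must not be later than --to");
  }
  return options;
}

console.log(parseFlags(["--dry-run", "--from=2025-11-01", "--to=2025-11-02"]));
// { dryRun: true, from: "2025-11-01", to: "2025-11-02" }
```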

Things that tripped us up

Cold starts on frequently-running jobs

Even a job that runs every 15 minutes will see cold starts. Lambda does not guarantee a warm container. We budget extra time in every timeout allocation to account for this — a job with 30 seconds of actual work gets at least two minutes on the clock.

`copyFiles` for email templates

The email queue handler references HTML templates at runtime. When SST bundles the Lambda, it does not automatically include files that are not imported in the dependency graph. Templates need to be copied explicitly:

new sst.aws.Cron("WelcomeEmailSender", {
  // schedule and handler omitted
  job: {
    copyFiles: [
      { from: "apps/web/content/emails", to: "content/emails" },
    ],
  },
});

Forgetting this produces a Lambda that boots successfully and then fails silently when it tries to read a template file that does not exist in the bundle. The error is not obvious from the Lambda logs unless you know to look for it.

EventBridge cron syntax is not standard Unix cron

AWS EventBridge uses a six-field cron expression with an exclusive constraint between the day-of-month and day-of-week fields — exactly one of them must be ?.

✗ cron(0 2 * * * *)    ← day-of-week must be ? when day-of-month is *
✓ cron(0 2 * * ? *)    ← correct

The error message when you get this wrong is not always clear. If a schedule silently fails to register, check the cron syntax first.
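A cheap guard is to check schedule strings before deploying. The sketch below covers only the two structural rules mentioned in this section, the six-field count and the `?` constraint; it does not validate value ranges or the rest of the EventBridge grammar. `checkEventBridgeCron` is a hypothetical helper, not part of SST or the AWS SDK.

```typescript
interface CronCheck {
  ok: boolean;
  reason?: string;
}

function checkEventBridgeCron(expression: string): CronCheck {
  const match = /^cron\((.*)\)$/.exec(expression.trim());
  if (!match) return { ok: false, reason: "expected the form cron(...)" };

  const fields = match[1].trim().split(/\s+/);
  if (fields.length !== 6) {
    return { ok: false, reason: `expected 6 fields, got ${fields.length}` };
  }

  // Field order: minutes, hours, day-of-month, month, day-of-week, year.
  const dayOfMonth = fields[2];
  const dayOfWeek = fields[4];
  const questionMarks = [dayOfMonth, dayOfWeek].filter((f) => f === "?").length;
  if (questionMarks !== 1) {
    return {
      ok: false,
      reason: "exactly one of day-of-month and day-of-week must be ?",
    };
  }
  return { ok: true };
}

console.log(checkEventBridgeCron("cron(0 2 * * * *)").ok); // false
console.log(checkEventBridgeCron("cron(0 2 * * ? *)").ok); // true
```

Running a check like this over every schedule string in a unit test turns a confusing deploy-time failure into a readable assertion message.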

Takeaways

sst.aws.Cron is a good fit for low-to-medium frequency scheduled work in a serverless stack. The things we would do the same way again:

One package for all job handlers. Clean separation from the web app, explicit dependencies, testable in isolation.
Offset overlapping jobs. Thirty minutes of offset costs nothing and meaningfully reduces peak database load.
15-minute polling for timezone-aware work. Simpler and more reliable than trying to schedule per-timezone Lambdas.
Per-job environment variables. Small discipline, real security improvement.
CLI mode on every handler. Pays back its setup cost the first time you need to manually reprocess something.

If and when we need to scale beyond what a single Lambda invocation can handle, SQS fan-out is the natural next step. Until then, the simplicity of a cron-scheduled Lambda is hard to beat.

Related Posts

Building a Monorepo with pnpm and TypeScript — how the workspace is structured and how packages/jobs fits into it
Building a Cross-Platform Mobile App with Expo — the Expo side of the same stack these jobs support
Cross-platform Lexical with use dom: monorepo gains and the bridges you still own — how diary content is stored as Lexical JSON, which the monthly report job has to parse

Try Storyie

The jobs described here run every day against real user data at storyie.com. If you are a Pro subscriber, the monthly report lands in your account on the first of each month. The iOS app surfaces it alongside your diary history.
