Storyie runs eight background jobs in production: view aggregation, like tallies, performance monitoring, hashtag extraction, diary reminders, email queue processing, welcome emails, and a monthly AI-generated report. Every one of them is defined with sst.aws.Cron and executed on Lambda.
This post covers the design decisions behind that setup — why we chose Lambda Cron, how we structured the code, and the specific patterns that actually helped once things were running in production.
TL;DR
- sst.aws.Cron spins up a dedicated Lambda per job, keeping cost near zero for low-frequency work and isolating each job's memory, timeout, and environment variables from the others.
- All handlers live in a standalone packages/jobs package: independent from the Next.js app, testable in isolation, and deployable without touching web infrastructure.
- We offset overlapping jobs by 30 minutes to avoid hitting the database with heavy queries at the same instant.
- Reminder notifications fire every 15 minutes and query for users whose configured time falls in the current window — no per-timezone scheduling needed.
- The monthly AI report runs on the first of each month, converts Lexical rich-text JSON to plain text, caps input at 30,000 characters per user, and calls the Anthropic API. It is disabled in staging to avoid surprise costs.
- Each job receives only the environment variables it needs — no blanket secret injection.
- Every handler doubles as a CLI script for local dry-runs and manual recovery.
| Concern | Our approach |
|---|---|
| Cost | Lambda billed per invocation and duration: near zero for daily aggregations |
| Isolation | One Lambda per job, one environment block per job |
| Load spreading | 30-minute offsets between jobs that hit the same tables |
| Timezone support | 15-minute polling cadence, window query in the handler |
| Testing | Dual Lambda + CLI entry point per handler |
| Secrets | Minimum variables per job, not a shared full-stack environment |
Why Lambda Cron
We considered four options before landing on SST Cron.
| Approach | Pros | Cons |
|---|---|---|
| Next.js API route + external cron | Simple to deploy | Shared Lambda, 15 s timeout, external dependency |
| ECS / Fargate tasks | Handles long-running work | Higher cost, more infrastructure to manage |
| SST Cron + Lambda | Serverless, near-zero cost, IaC-first | 15-minute Lambda timeout ceiling |
| SQS + Lambda | Event-driven, built-in retry | Not designed for time-based scheduling |
For Storyie the decision came down to two things: running cost and management overhead. Lambda Cron charges only for invocations and duration. A handful of daily aggregation jobs costs essentially nothing. Everything — schedule, memory, timeout, environment — is declared in sst.config.ts alongside the rest of the stack, so there is no separate cron dashboard or service to keep in sync.
The packages/jobs package
Job handlers live in their own workspace package, separate from the Next.js app.
packages/jobs/
├── src/
│ ├── handlers/
│ │ ├── aggregations/ # views, likes, performance
│ │ ├── emails/ # welcome emails, queue processing
│ │ ├── notifications/ # diary reminders
│ │ ├── reports/ # monthly AI report
│ │ ├── tags/ # hashtag extraction
│ │ └── x-posting/ # automated social posts
│ ├── lib/ # shared utilities
│ └── types/
├── package.json
└── tsconfig.json
Keeping jobs in a separate package means:
- Independent deploys. Fixing a job does not require rebuilding or redeploying the Next.js app.
- Explicit dependencies. Each handler imports only what it needs, and the package's package.json makes those dependencies auditable.
- Testable in isolation. Handlers are plain async functions; a test can import and call them directly.
- CLI mode. The same handler can run from a terminal for dry-runs or manual recovery (more on this below).
Each handler module exports a single function named handler, and SST deploys that function as a Lambda.
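To illustrate what "testable in isolation" can look like, here is a minimal sketch of a hypothetical hashtag-extraction handler. The names extractHashtags and runHashtagExtraction, the tag regex, and the injected loader are assumptions for illustration, not Storyie's actual code. The idea is that the core logic takes its data source as a parameter, so a test can pass a stub while the Lambda entry point wires in the real query.

```typescript
// Hypothetical pure core of a hashtag-extraction job.
export function extractHashtags(text: string): string[] {
  // "#" followed by Unicode letters, digits, or underscores.
  const matches = text.match(/#[\p{L}\p{N}_]+/gu) ?? [];
  // Drop the "#", lowercase, and de-duplicate in first-seen order.
  return Array.from(new Set(matches.map((m) => m.slice(1).toLowerCase())));
}

// Core job logic with its data source injected, so tests can pass a stub.
export async function runHashtagExtraction(
  loadDiaryBodies: () => Promise<string[]>,
): Promise<string[]> {
  const bodies = await loadDiaryBodies();
  return Array.from(new Set(bodies.flatMap((body) => extractHashtags(body))));
}

// Lambda entry point: ignores the scheduled event and wires in the real
// data source. The empty loader is a placeholder for a database query.
export async function handler(): Promise<void> {
  const tags = await runHashtagExtraction(async () => []);
  console.log(`extracted ${tags.length} distinct tags`);
}
```

A test can call runHashtagExtraction(async () => ["#a", "#A #b"]) and assert on the result without a database or AWS credentials.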
Schedule design: spreading load across the database
Eight jobs running on the same database is a coordination problem. Here is what we do about it.
Offset jobs that share tables
View aggregation and like aggregation both run every four hours and both touch related tables. Running them at the same second doubles the query pressure. We offset one by 30 minutes:
// Views: every 4 hours on the hour
new sst.aws.Cron("ViewAggregator", {
schedule: "cron(0 */4 * * ? *)",
// ...
});
// Likes: every 4 hours at :30 (offset by 30 minutes)
new sst.aws.Cron("LikeAggregator", {
schedule: "cron(30 */4 * * ? *)",
// ...
});
It is a small change, but it means the two jobs' heavy aggregation queries no longer start at the same moment.
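When more than two jobs share tables, the offsets can be generated instead of hand-written. A minimal sketch, assuming a hypothetical staggeredSchedules helper (not an SST API) that emits EventBridge expressions for jobs sharing the same hourly cadence:

```typescript
// Hypothetical helper: same hourly cadence for every job in the group,
// but a different minute offset so their heavy queries never start together.
export function staggeredSchedules(
  jobNames: string[],
  everyHours: number,
  offsetMinutes: number,
): Record<string, string> {
  // The last job's minute must still fit in the minute field (0-59).
  if ((jobNames.length - 1) * offsetMinutes > 59) {
    throw new Error("offsets overflow the hour; reduce offsetMinutes");
  }
  const schedules: Record<string, string> = {};
  jobNames.forEach((name, i) => {
    const minute = i * offsetMinutes;
    // EventBridge fields: minute hour day-of-month month day-of-week year
    schedules[name] = `cron(${minute} */${everyHours} * * ? *)`;
  });
  return schedules;
}

// staggeredSchedules(["ViewAggregator", "LikeAggregator"], 4, 30) yields
// "cron(0 */4 * * ? *)" and "cron(30 */4 * * ? *)", the two schedules above.
```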
Match frequency to freshness requirements
Not every job needs to run at the same cadence. We ask one question for each: how stale is too stale?
| Job | Frequency | Rationale |
|---|---|---|
| Performance aggregation | Daily at 02:00 UTC | Previous day's data processed in one batch |
| View / like aggregation | Every 4 hours | Balance between freshness and query cost |
| Hashtag extraction | Every hour | New diary tags reflected within a reasonable window |
| Diary reminders | Every 15 minutes | Enough resolution to honor each user's local reminder time |
| Email queue | Every 5 minutes | Queued emails processed promptly |
| Monthly AI report | 1st of month, 03:00 UTC | Prior month analyzed once at the start of the new one |
View counts being four hours stale in a diary app does not affect the user experience. Running that job more frequently would cost more and add database load for no user-visible benefit.
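The invocation side of that trade-off is simple arithmetic, since each schedule maps directly to an invocation count. A back-of-the-envelope sketch using the cadences from the table, assuming a 30-day month; duration charges, which depend on each job's runtime and memory, are left out:

```typescript
// Scheduled invocations per 30-day month for a fixed interval.
const MINUTES_PER_DAY = 24 * 60;

export function invocationsPerMonth(intervalMinutes: number, days = 30): number {
  return (MINUTES_PER_DAY / intervalMinutes) * days;
}

const cadences: Record<string, number> = {
  "Email queue (5 min)": 5,
  "Diary reminders (15 min)": 15,
  "Hashtag extraction (1 h)": 60,
  "View / like aggregation (4 h)": 240,
  "Performance aggregation (daily)": MINUTES_PER_DAY,
};

for (const [job, interval] of Object.entries(cadences)) {
  console.log(job, invocationsPerMonth(interval));
}
// Email queue 8640, reminders 2880, hashtags 720, views/likes 180 each, perf 30
```

Moving view aggregation from every 4 hours to every 15 minutes would multiply its invocations (and its database queries) by 16.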
Reminder notifications and timezones
Diary reminders need to fire at a user-configured time — say, 20:00 in the user's local timezone. Users are spread across the world, so we cannot use a single timezone-specific schedule.
Our approach: run the Lambda every 15 minutes and query for users whose reminder time (converted from their stored timezone to UTC) falls within the current 15-minute window.
cron(0/15 * * * ? *) → fires at :00, :15, :30, :45 each hour
A user who configured 20:00 JST (UTC+9) is due at 11:00 UTC. When the Lambda fires at 11:00, the query returns that user. The maximum notification delay is 14 minutes, which is fine for this use case.
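The window check can be sketched as follows. This is a hypothetical in-memory version: Storyie converts the stored time to UTC inside its query, whereas this sketch does the equivalent comparison in the user's local time with the built-in Intl.DateTimeFormat, which also picks up daylight-saving offsets. Function names are illustrative. Around a DST fall-back, a reminder inside the repeated hour would fire twice; handling that is out of scope here.

```typescript
const WINDOW_MINUTES = 15; // matches the cron(0/15 * * * ? *) cadence
const MINUTES_PER_DAY = 24 * 60;

// The user's current wall-clock time, as minutes since local midnight.
function localMinuteOfDay(now: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).formatToParts(now);
  const get = (type: string) => Number(parts.find((p) => p.type === type)?.value);
  // "% 24" guards against ICU builds that render midnight as "24".
  return (get("hour") % 24) * 60 + get("minute");
}

// True when the local reminder time ("HH:MM") falls in (now - 15 min, now],
// so with runs every 15 minutes each minute is picked up by one run.
export function isReminderDue(reminder: string, timeZone: string, now: Date): boolean {
  const [hours, minutes] = reminder.split(":").map(Number);
  const reminderMinute = hours * 60 + minutes;
  const nowMinute = localMinuteOfDay(now, timeZone);
  // Minutes since the reminder time, wrapping around midnight.
  const elapsed = (nowMinute - reminderMinute + MINUTES_PER_DAY) % MINUTES_PER_DAY;
  return elapsed < WINDOW_MINUTES;
}
```

For the example above, isReminderDue("20:00", "Asia/Tokyo", new Date("2025-01-15T11:00:00Z")) returns true, while "20:01" returns false at that run and is picked up by the 11:15 run instead.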
Performance monitoring: two-day breach before alerting
The performance aggregation job does more than write numbers to a table. It implements a simple monitoring pipeline:
- Collect RUM samples (LCP, CLS, INP) sent from the browser.
- Compute the P75 for each metric.
- Compare the LCP P75 against a 1-second threshold: above it is a breach, below it is a pass.
- Check whether yesterday was also a breach.
- If two consecutive days breach, create an alert. If a pass follows a breach, resolve any open alerts automatically.
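The aggregation and classification steps above can be sketched like this. The post does not say which percentile definition Storyie uses, so nearest-rank is an assumption here, as are the function names and the use of milliseconds for LCP samples (the usual unit for browser RUM data).

```typescript
type Verdict = "breach" | "pass";

// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it. Interpolating definitions differ slightly.
export function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)); // 1-based
  return sorted[rank - 1];
}

// The 1-second LCP threshold, expressed in milliseconds.
const LCP_THRESHOLD_MS = 1000;

// Above the threshold is a breach; at or below it is a pass.
export function classifyLcp(lcpSamplesMs: number[]): Verdict {
  return percentile(lcpSamplesMs, 75) > LCP_THRESHOLD_MS ? "breach" : "pass";
}
```

For samples of 800, 900, 1200 and 1500 ms, the P75 is 1200 ms, so that day counts as a breach.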
The two-day rule filters out transient spikes — a deployment cold-start, a brief CDN issue — without requiring manual suppression or external monitoring tooling. One bad day is noise. Two bad days is signal.
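The two-day rule reduces to a small decision function. In this sketch the names are hypothetical, and the hasOpenAlert flag is an added assumption that keeps a long breach streak from creating a new alert every day. Because an alert only opens after breaches, resolving on any pass while an alert is open matches "a pass following a breach".

```typescript
type Verdict = "breach" | "pass";
type AlertAction = "create" | "resolve" | "none";

export function decideAlertAction(
  today: Verdict,
  yesterday: Verdict | undefined, // undefined: no data for the previous day
  hasOpenAlert: boolean,
): AlertAction {
  if (today === "breach") {
    // One bad day is noise; the second consecutive one opens an alert
    // (unless one is already open from an ongoing streak).
    return yesterday === "breach" && !hasOpenAlert ? "create" : "none";
  }
  // A pass closes whatever alert is still open.
  return hasOpenAlert ? "resolve" : "none";
}
```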
Closing this loop inside a Lambda means we do not depend on an external observability service to catch regressions. The monitoring logic is code, it lives in the repo, and it is testable.
Monthly AI report
The monthly report is the most involved job we run. On the first of each month at 03:00 UTC, it fetches the previous month's diary entries for every Pro subscriber, converts the Lexical rich-text JSON to plain text, and passes the content to the Anthropic API to generate a personal summary: emotional trends, recurring topics, standout moments.
A few design choices worth noting:
- Lexical JSON → plain text. Diary content is stored as Lexical editor state (structured JSON). Before sending to the API we extract plain text so the model works with readable prose, not node trees.
- 30,000-character cap per user. This limits token usage and keeps the per-user cost predictable as the diary length grows.
- 100 ms delay between users. A simple rate-limit guard that keeps us within the Anthropic API's request budget.
- Production-only. The job is wrapped in a stage check so it never runs in staging, where it would burn real API budget on test data.
if (stage === "production") {
new sst.aws.Cron("MonthlyReportGenerator", {
schedule: "cron(0 3 1 * ? *)",
job: {
handler: "packages/jobs/src/handlers/reports/monthlyReport.handler",
timeout: "15 minutes",
memory: "1024 MB",
},
});
}
We allocate the full 15-minute Lambda timeout and 1 GB of memory. Today the job finishes comfortably within those limits. When the user base grows to the point that a single invocation cannot finish in time, the migration path is SQS plus one Lambda invocation per user: the handler logic stays the same, only the orchestration changes. We are not optimizing for that today.
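The per-user steps listed earlier (Lexical JSON to plain text, the 30,000-character cap, and the 100 ms pause between users) can be sketched as below. This assumes the standard serialized Lexical shape, where text-like nodes carry a text field and element nodes carry children; the block-type list and all function names are illustrative, and custom node types in Storyie's editor may need extra cases. The Anthropic call is left as a callback rather than guessing at Storyie's prompt.

```typescript
// Minimal shape of serialized Lexical editor state (only the fields used here).
interface LexicalNode {
  type: string;
  text?: string;
  children?: LexicalNode[];
}
interface LexicalState {
  root: LexicalNode;
}

// Element types treated as blocks: each one ends its own line of output.
const BLOCK_TYPES = new Set(["paragraph", "heading", "quote", "listitem", "code"]);

function collectText(node: LexicalNode, out: string[]): void {
  // Text-like nodes (text, hashtag, code-highlight, ...) carry a "text" field.
  if (typeof node.text === "string") {
    out.push(node.text);
    return;
  }
  if (node.type === "linebreak") {
    out.push("\n");
    return;
  }
  for (const child of node.children ?? []) collectText(child, out);
  if (BLOCK_TYPES.has(node.type)) out.push("\n");
}

export function lexicalToPlainText(state: LexicalState): string {
  const out: string[] = [];
  collectText(state.root, out);
  return out.join("").replace(/\n{3,}/g, "\n\n").trim();
}

const MAX_CHARS_PER_USER = 30_000;

// Cap one user's combined text. slice() counts UTF-16 code units,
// so this approximates "characters" for emoji-heavy text.
export function capText(text: string, max = MAX_CHARS_PER_USER): string {
  return text.length <= max ? text : text.slice(0, max);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Handle users one at a time, pausing between them as a simple rate-limit guard.
// `work` stands in for building the prompt and calling the Anthropic API.
export async function forEachUserWithDelay<T>(
  users: T[],
  work: (user: T) => Promise<void>,
  delayMs = 100,
): Promise<void> {
  for (let i = 0; i < users.length; i++) {
    await work(users[i]);
    if (i < users.length - 1) await sleep(delayMs); // no pause after the last user
  }
}
```

For one user, the pieces compose as capText(entries.map(lexicalToPlainText).join("\n\n")), where entries is that user's list of editor states for the month.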
Environment variables per job, not a shared pool
Each Cron definition includes only the environment variables that specific job needs:
new sst.aws.Cron("ViewAggregator", {
job: {
environment: {
DATABASE_URL: process.env.DATABASE_URL!,
// nothing else — no Stripe keys, no Firebase credentials
},
},
});
Passing a shared environment block to every job is tempting because it is less typing. We do not do it because a compromised or misconfigured aggregation job should not be able to read Stripe secret keys. Least privilege at the environment-variable level costs almost nothing and meaningfully reduces the blast radius of any single job going wrong.
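The Cron definition decides which variables a job can see. On the handler side, a complementary habit, shown here as a sketch rather than as Storyie's code, is to read exactly those variables through a small helper that fails at startup when one is missing. requireEnv is hypothetical:

```typescript
// Hypothetical helper: a handler names the variables it expects and fails
// fast if any is missing or empty, instead of failing halfway through a run.
export function requireEnv<K extends string>(
  names: readonly K[],
  env: Record<string, string | undefined> = process.env,
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) result[name] = env[name] as string;
  return result;
}

// In the view aggregator, this mirrors the single variable its Cron passes in:
// const { DATABASE_URL } = requireEnv(["DATABASE_URL"]);
```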
Dual Lambda + CLI entry point
Every handler is written to work in two modes: as a Lambda function and as a CLI script.
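import { fileURLToPath } from "node:url";
// Lambda entry point: SST calls this
export async function handler() {
  // core logic
}
// CLI entry point: runs only when the file is executed directly
async function main() {
  const options = parseArgs();
  // same logic, command-line flags for date ranges, dry-run, etc.
}
// Compare file paths, not URL pathnames: fileURLToPath decodes %20 and
// handles Windows drive letters, which a raw URL pathname does not.
if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
This makes it straightforward to run any job locally before deploying: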
pnpm perf:aggregate --dry-run --from=2025-11-01 --to=2025-11-02
It also serves as a manual recovery path. When a job fails in production and we need to reprocess a date range without waiting for the next scheduled invocation, we run the handler directly from the command line. The dual entry point is low-effort to set up and has saved us meaningful debugging time.
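The CLI side needs only a small flag parser. A minimal sketch using Node's built-in parseArgs from node:util (available since Node 18.3); parseJobArgs, the JobOptions shape, and the default of "yesterday in UTC" when no range is given are assumptions, chosen to match the daily performance job's previous-day batch:

```typescript
import { parseArgs } from "node:util";

interface JobOptions {
  dryRun: boolean;
  dates: string[]; // every day in the range, as YYYY-MM-DD (UTC)
}

const toIsoDate = (d: Date) => d.toISOString().slice(0, 10);

// Parse --dry-run, --from and --to, then expand the range into single days.
// Unknown flags throw, because parseArgs runs in strict mode by default.
export function parseJobArgs(argv: string[], now = new Date()): JobOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      "dry-run": { type: "boolean" },
      from: { type: "string" },
      to: { type: "string" },
    },
  });

  // Hypothetical default: yesterday (UTC) when no range is given.
  const yesterday = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1),
  );
  const from = values.from ?? toIsoDate(yesterday);
  const to = values.to ?? from;

  const start = new Date(`${from}T00:00:00Z`);
  const end = new Date(`${to}T00:00:00Z`);
  if (Number.isNaN(start.getTime()) || Number.isNaN(end.getTime()) || start.getTime() > end.getTime()) {
    throw new Error(`invalid date range: ${from}..${to}`);
  }

  const dates: string[] = [];
  for (let d = new Date(start); d.getTime() <= end.getTime(); d.setUTCDate(d.getUTCDate() + 1)) {
    dates.push(toIsoDate(d));
  }
  return { dryRun: values["dry-run"] ?? false, dates };
}

// From the CLI entry point: const options = parseJobArgs(process.argv.slice(2));
```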
Things that tripped us up
Cold starts on frequently-running jobs
Even a job that runs every 15 minutes will see cold starts. Lambda does not guarantee a warm container. We budget extra time in every timeout allocation to account for this — a job with 30 seconds of actual work gets at least two minutes on the clock.
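Padding the timeout is the approach described above. A complementary guard, sketched here as an illustration rather than as Storyie's code, is to check the time left in the invocation and stop cleanly before the hard timeout. getRemainingTimeInMillis() is a real method on the Node.js Lambda context object; processWithinBudget and the 30-second margin are hypothetical, and stopping early only helps if the job can resume where it left off on its next run.

```typescript
// The only part of the Lambda context this sketch needs. In a real handler,
// the full Context type comes from the aws-lambda type definitions.
interface RemainingTime {
  getRemainingTimeInMillis(): number;
}

// Process items until the remaining time drops below a safety margin, then
// stop and report what is left for the next scheduled run.
export async function processWithinBudget<T>(
  items: T[],
  work: (item: T) => Promise<void>,
  context: RemainingTime,
  safetyMarginMs = 30_000,
): Promise<{ done: number; remaining: number }> {
  let done = 0;
  for (const item of items) {
    if (context.getRemainingTimeInMillis() < safetyMarginMs) break;
    await work(item);
    done++;
  }
  return { done, remaining: items.length - done };
}
```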
copyFiles for email templates
Our email handlers (the queue processor and the welcome-email sender shown here) read HTML templates at runtime. When SST bundles the Lambda, it does not automatically include files that are not imported in the dependency graph. Templates need to be copied explicitly:
new sst.aws.Cron("WelcomeEmailSender", {
job: {
copyFiles: [
{ from: "apps/web/content/emails", to: "content/emails" },
],
},
});
Forgetting this produces a Lambda that boots successfully and then fails silently when it tries to read a template file that does not exist in the bundle. The error is not obvious from the Lambda logs unless you know to look for it.
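One way to make that failure obvious is to resolve the template path explicitly and throw an error that points at copyFiles. A minimal sketch: loadEmailTemplate is a hypothetical helper, and it assumes the copied folder ends up at content/emails under the function's root directory, which Lambda exposes through the LAMBDA_TASK_ROOT environment variable.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Assumes copyFiles placed "content/emails" under the function root;
// LAMBDA_TASK_ROOT points there on Lambda, cwd is the local fallback.
export function loadEmailTemplate(
  name: string,
  baseDir: string = process.env.LAMBDA_TASK_ROOT ?? process.cwd(),
): string {
  const file = join(baseDir, "content", "emails", `${name}.html`);
  if (!existsSync(file)) {
    // Fail loudly with the exact path instead of a vague downstream error.
    throw new Error(
      `Email template not found at ${file}; check the copyFiles entry for this job`,
    );
  }
  return readFileSync(file, "utf8");
}
```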
EventBridge cron syntax is not standard Unix cron
AWS EventBridge uses a six-field cron expression with an exclusive constraint between the day-of-month and day-of-week fields — exactly one of them must be ?.
✗ cron(0 2 * * * *) ← day-of-week must be ? when day-of-month is *
✓ cron(0 2 * * ? *) ← correct
The error message when you get this wrong is not always clear. If a schedule silently fails to register, check the cron syntax first.
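Because the error is easy to miss, a cheap pre-deploy check can catch the two most common mistakes: the wrong number of fields and the day-of-month / day-of-week rule. checkEventBridgeCron is a hypothetical sketch; it does not validate individual field values, and it ignores rate(...) expressions.

```typescript
// Returns null when the expression passes both checks, or a reason otherwise.
export function checkEventBridgeCron(expression: string): string | null {
  const match = /^cron\((.*)\)$/.exec(expression.trim());
  if (!match) return "expression must be wrapped in cron(...)";
  const fields = match[1].trim().split(/\s+/);
  if (fields.length !== 6) {
    return `expected 6 fields (minute hour day-of-month month day-of-week year), got ${fields.length}`;
  }
  const dayOfMonthIsQ = fields[2] === "?";
  const dayOfWeekIsQ = fields[4] === "?";
  // Exactly one of the two day fields must be "?".
  if (dayOfMonthIsQ === dayOfWeekIsQ) {
    return "exactly one of day-of-month and day-of-week must be ?";
  }
  return null;
}
```

Running it over every schedule string in sst.config.ts (for example in a unit test) flags cron(0 2 * * * *) before it reaches AWS.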
Takeaways
sst.aws.Cron is a good fit for low-to-medium frequency scheduled work in a serverless stack. The things we would do the same way again:
- One package for all job handlers. Clean separation from the web app, explicit dependencies, testable in isolation.
- Offset overlapping jobs. Thirty minutes of offset cost nothing and meaningfully reduces peak database load.
- 15-minute polling for timezone-aware work. Simpler and more reliable than trying to schedule per-timezone Lambdas.
- Per-job environment variables. Small discipline, real security improvement.
- CLI mode on every handler. Pays back its setup cost the first time you need to manually reprocess something.
If and when we need to scale beyond what a single Lambda invocation can handle, SQS fan-out is the natural next step. Until then, the simplicity of a cron-scheduled Lambda is hard to beat.
Related Posts
- Building a Monorepo with pnpm and TypeScript (how the workspace is structured and how packages/jobs fits into it)
- Building a Cross-Platform Mobile App with Expo (the Expo side of the same stack these jobs support)
- Cross-platform Lexical with use dom: monorepo gains and the bridges you still own (how diary content is stored as Lexical JSON, which the monthly report job has to parse)
Try Storyie
The jobs described here run every day against real user data at storyie.com. If you are a Pro subscriber, the monthly report lands in your account on the first of each month. The iOS app surfaces it alongside your diary history.