SST v3 in production: eight lessons from six months of running Storyie on AWS

Storyie Engineering Team

Practical SST v3 gotchas and workarounds we accumulated running Storyie on AWS — from resolving Pulumi Output types inside CloudFront transforms to ARM64 tuning, Cron staggering, and Cloudflare DNS automation.

We wrote a previous post on the basics of deploying Next.js to AWS via SST. The short version: SST handles the CloudFront distribution, Lambda functions, S3 assets, and ACM certificates for you, and the DX is good enough that we replaced a hand-rolled Terraform config with it. This post is about what comes after the happy path — the things we hit in six months of running Storyie in production that the SST docs don't cover, or cover only partially.

TL;DR

  • Use $resolve whenever you need to read an SST-managed resource's properties inside a transform callback — those values are Pulumi Output types, not plain strings.
  • In the app() function, use input.stage. In run(), use $app.stage. They're the same value, but $app isn't initialized yet when app() executes.
  • EventBridge cron syntax has six fields, not five, and requires a ? in either the day-of-week or day-of-month position.
  • Stagger concurrent Cron jobs to avoid connection pool contention on Supabase.
  • Use copyFiles for any file a Lambda reads at runtime that isn't imported by the handler.
  • CloudFront Function IP restrictions work well for staging but require explicit bypass paths for webhook endpoints.
  • ARM64 is a free cost reduction — switch unless you have x86-only native addons.
  • sst.cloudflare.dns() automates DNS record creation end-to-end, but keep Cloudflare in DNS-only mode, not proxy mode.

| Tip | One-liner |
| --- | --- |
| $resolve | Unwrap Output types inside transform callbacks |
| Stage variable | input.stage in app(), $app.stage in run() |
| Cron syntax | Six fields; one of DOW / DOM must be ? |
| Cron stagger | Offset jobs by 30 min to avoid pool contention |
| copyFiles | Include runtime-read files that esbuild won't trace |
| IP restriction | CloudFront Function injection; bypass webhook paths |
| ARM64 | One config line, ~20% cost reduction |
| Cloudflare DNS | sst.cloudflare.dns() handles records automatically |

1. $resolve for dynamic CloudFront cache behaviors

sst.aws.Nextjs creates a CloudFront distribution automatically, but there are cases where you need to override the cache behavior for specific paths. OAuth callbacks are the textbook example: an authorization code is single-use, so caching a redirect response at CloudFront breaks the auth flow entirely.

The problem is that inside the transform.cdn callback, properties like defaultCacheBehavior.targetOriginId and defaultCacheBehavior.originRequestPolicyId are Pulumi Output<string> — deferred values that haven't resolved to real strings yet. Accessing them directly causes a type error:

// This does not work — targetOriginId is Output<string>, not string
args.orderedCacheBehaviors = [
  {
    pathPattern: "/api/auth/*",
    targetOriginId: args.defaultCacheBehavior.targetOriginId, // Error
  },
];

The fix is $resolve, SST v3's equivalent of Pulumi's pulumi.all(). It accepts an array of Output values and gives you a single .apply() callback where every value is fully resolved:

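// Look up the managed policy before constructing the component: the transform
// callback is synchronous, so `await` is only valid out here in async run().
const cachingDisabledPolicy = await aws.cloudfront.getCachePolicy({
  name: "Managed-CachingDisabled",
});

new sst.aws.Nextjs("StoryieWeb", {
  transform: {
    cdn: (args) => {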
      args.orderedCacheBehaviors = $resolve([
        args.orderedCacheBehaviors,
        args.defaultCacheBehavior,
      ]).apply(([existing, defaultBehavior]) => {
        return [
          {
            pathPattern: "/api/auth/*",
            viewerProtocolPolicy: "redirect-to-https",
            allowedMethods: ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
            cachedMethods: ["GET", "HEAD"],
            cachePolicyId: cachingDisabledPolicy.id,
            targetOriginId: defaultBehavior.targetOriginId,
            originRequestPolicyId: defaultBehavior.originRequestPolicyId,
            functionAssociations: defaultBehavior.functionAssociations,
            compress: true,
          },
          ...(Array.isArray(existing) ? existing : []),
        ];
      });
    },
  },
});

Whenever you're inside a transform callback and need to read a property of another SST-managed resource, reach for $resolve first.

2. Stage variables: input.stage vs $app.stage

SST exposes $app.stage as a global inside the run() function. But $app doesn't exist yet when app() executes — that function runs during the config phase, before the Pulumi runtime is initialized. Use input.stage there instead:

export default $config({
  app: async (input) => {
    // $app is not available here — use input.stage
    const stage = input?.stage ?? "dev";

    const envFile = stage === "production" ? ".env.production" : ".env";
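    const { config } = await import("dotenv");
    // resolve() comes from node:path; import it dynamically, like dotenv
    const { resolve } = await import("node:path");
    config({ path: resolve(process.cwd(), envFile) });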

    return {
      name: "storyie",
      removal: stage === "production" ? "retain" : "remove",
    };
  },
  async run() {
    // $app.stage is available here
    const stage = $app.stage;
  },
});

Two things to keep in mind:

  • removal: "retain" is essential for production. With "remove", a sst remove call will destroy all AWS resources — S3 buckets, databases, everything. Default to "retain" for any stage you care about.
  • .env loading belongs in app() so that process.env is populated before run() starts. Dynamic import("dotenv") is the clean way to do it there.

3. Cron jobs: three separate gotchas

The syntax is EventBridge, not standard cron

EventBridge uses a six-field cron expression: minutes, hours, day-of-month, month, day-of-week, year. Standard Unix cron has only five fields and no year. More importantly, one of the day-of-month and day-of-week fields must be ?, not *. Putting * in both is an error. Schedules are evaluated in UTC.

// EventBridge cron — six fields, DOW or DOM must be ?
schedule: "cron(0 2 * * ? *)"   // ✓ runs at 02:00 every day

// Standard Unix cron — five fields, not valid here
schedule: "cron(0 2 * * *)"     // ✗ will be rejected
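
The two rules above are easy to check before deploying. The following is a minimal sketch of such a check; checkEventBridgeCron is a hypothetical helper of our own naming, not part of SST or AWS, and it validates only the field count and the ? rule, not the allowed values of each field.

```typescript
// Hypothetical helper (not part of SST or AWS): checks only the two rules
// described above, not the allowed values inside each field.
function checkEventBridgeCron(schedule: string): string | null {
  const match = /^cron\((.*)\)$/.exec(schedule.trim());
  if (!match) return 'expected the form "cron(...)"';

  const fields = (match[1] ?? "").trim().split(/\s+/);
  if (fields.length !== 6) return `expected 6 fields, got ${fields.length}`;

  // Field order: minutes hours day-of-month month day-of-week year
  const dayOfMonth = fields[2];
  const dayOfWeek = fields[4];
  if (dayOfMonth !== "?" && dayOfWeek !== "?") {
    return 'day-of-month or day-of-week must be "?"';
  }
  return null; // no problem found
}

console.log(checkEventBridgeCron("cron(0 2 * * ? *)")); // null
console.log(checkEventBridgeCron("cron(0 2 * * *)"));   // expected 6 fields, got 5
```

Running a check like this in a unit test catches a five-field expression copied from a crontab before the deploy does.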

Stagger concurrent jobs to protect the connection pool

Storyie runs two aggregation jobs — one for view counts, one for likes — both on a four-hour cycle. Running them simultaneously would double the database connections during their overlap window. On Supabase Free and Pro plans, connection pool capacity is limited, so even two concurrent Lambda cold starts can exhaust it.

We offset the schedules by 30 minutes:

// View aggregation: every four hours, on the hour (00:00, 04:00, ...)
new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",
  job: { /* ... */ },
});

// Like aggregation: every four hours, 30 minutes later (00:30, 04:30, ...)
new sst.aws.Cron("LikeAggregator", {
  schedule: "cron(30 */4 * * ? *)",
  job: { /* ... */ },
});

As long as the first job finishes within 30 minutes, its connections are released before the second job triggers. A trivial fix for what can otherwise look like a mysterious intermittent failure.
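
If more jobs join the same cycle, the offsets can be generated rather than hand-picked. This is a rough sketch under that assumption; staggeredSchedules and its parameters are hypothetical names, and it only handles offsets that fit within a single hour.

```typescript
// Hypothetical helper: gives each job its own minute offset so that jobs on
// the same hourly cycle never start at the same moment.
function staggeredSchedules(
  jobNames: string[],
  everyHours: number,
  gapMinutes: number,
): Record<string, string> {
  const schedules: Record<string, string> = {};
  jobNames.forEach((name, index) => {
    const minute = index * gapMinutes;
    if (minute > 59) {
      throw new Error(`offset for ${name} (${minute} min) does not fit in one hour`);
    }
    schedules[name] = `cron(${minute} */${everyHours} * * ? *)`;
  });
  return schedules;
}

const generated = staggeredSchedules(["ViewAggregator", "LikeAggregator"], 4, 30);
console.log(generated["ViewAggregator"]); // cron(0 */4 * * ? *)
console.log(generated["LikeAggregator"]); // cron(30 */4 * * ? *)
```

Each generated string can be passed as the schedule of the matching sst.aws.Cron component.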

Use copyFiles for runtime-read files

SST bundles Lambda handlers with esbuild, which traces import and require statements. Files that a handler reads at runtime with fs.readFile — email templates, MDX content, static JSON configs — are not statically imported, so esbuild doesn't know about them and they're not included in the bundle. The handler works locally because the file exists on disk, then fails in Lambda with a "no such file" error.

copyFiles resolves this:

new sst.aws.Cron("WelcomeEmailSender", {
  // schedule omitted here for brevity; sst.aws.Cron requires one
  job: {
    handler: "packages/jobs/src/handlers/emails/welcomeEmail.handler",
    copyFiles: [
      {
        from: "apps/web/content/emails",
        to: "content/emails",
      },
    ],
  },
});

If you're seeing a path that definitely exists locally throw ENOENT in Lambda, this is almost certainly why.
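
On the handler side, the read path has to match the to value in copyFiles. Here is a minimal sketch of how that might look, assuming the function's working directory is the bundle root; emailTemplatePath and the file name welcome.html are hypothetical.

```typescript
import { join } from "node:path";

// Assumption: the function's working directory is the bundle root, so the
// `to` path from copyFiles ("content/emails") is resolved relative to it.
function emailTemplatePath(templateName: string, root: string = process.cwd()): string {
  return join(root, "content/emails", templateName);
}

// "welcome.html" is a made-up file name used for illustration.
console.log(emailTemplatePath("welcome.html", "/var/task"));
// On Linux: /var/task/content/emails/welcome.html

// Inside the handler, the file would then be read with something like:
//   await readFile(emailTemplatePath("welcome.html"), "utf8")
```

Building the path from the working directory instead of __dirname avoids depending on where the bundler places the handler file inside the package.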

4. CloudFront Function IP restriction for staging

We wanted staging locked down without Basic Auth (which degrades the experience for testing OAuth flows, Stripe webhooks, etc.). CloudFront Functions are the cleanest solution: the check runs at the edge, before the request reaches Lambda, and there's nothing to maintain at the application level.

SST's edge.viewerRequest.injection accepts a JavaScript snippet that gets inlined into the generated CloudFront Function. The event variable is already in scope:

const ipRestrictionCode = stage !== "production"
  ? `
var allowedIPs = ["203.0.113.1", "198.51.100.2"];
var clientIP = event.viewer.ip;
var uri = event.request.uri;

// Bypass paths for external webhooks
var bypassPaths = ["/api/stripe/webhook", "/api/revenuecat/webhook"];
var shouldBypass = bypassPaths.some(function(path) {
  return uri === path || uri.startsWith(path + "?");
});

if (!shouldBypass && allowedIPs.indexOf(clientIP) === -1) {
  return {
    statusCode: 403,
    statusDescription: "Forbidden",
    headers: { "content-type": { value: "text/html" } },
    body: "Access denied.",
  };
}
`
  : undefined;

new sst.aws.Nextjs("StoryieWeb", {
  edge: ipRestrictionCode
    ? { viewerRequest: { injection: ipRestrictionCode } }
    : undefined,
});

Two things to get right here:

  • ES5 compatibility: CloudFront Functions run in a restricted JavaScript runtime. Use var, not const/let. Use regular function expressions, not arrow functions. The runtime has gotten more permissive over time but the safe floor is still ES5.
  • Webhook bypass paths: Any external service that calls your staging URL — Stripe, RevenueCat, or anything else — will get a 403 if its IP isn't in the allowlist. Add those paths to the bypass list. Forgetting Stripe's webhook is a fast way to break test-mode payment flows silently.
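
Because a mistake in the injected snippet only shows up at the edge, it can help to keep the decision logic in a form that is testable locally. The following is a sketch of the same allow/deny rule as a pure function; isRequestAllowed is a hypothetical name, and it matches bypass paths exactly.

```typescript
// Hypothetical local test harness: the same decision as the injected snippet,
// written as a pure function so it can be unit-tested before deployment.
const allowedIPs = ["203.0.113.1", "198.51.100.2"];
const bypassPaths = ["/api/stripe/webhook", "/api/revenuecat/webhook"];

function isRequestAllowed(clientIP: string, uri: string): boolean {
  // Webhook paths are let through regardless of the caller's IP.
  const shouldBypass = bypassPaths.indexOf(uri) !== -1;
  return shouldBypass || allowedIPs.indexOf(clientIP) !== -1;
}

console.log(isRequestAllowed("203.0.113.1", "/dashboard"));         // true
console.log(isRequestAllowed("192.0.2.50", "/dashboard"));          // false
console.log(isRequestAllowed("192.0.2.50", "/api/stripe/webhook")); // true
```

The third case is the one that is easy to forget: an unknown IP must still reach the webhook endpoint.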

5. Lambda configuration tuning

ARM64 is a free cost reduction

server: {
  architecture: "arm64",
  memory: "1024 MB",
  runtime: "nodejs22.x",
  timeout: "20 seconds",
},

Graviton2 (ARM64) is approximately 20% cheaper than x86_64 for equivalent compute, with marginally faster cold starts. For Next.js SSR without x86-only native addons, the switch is a single config line with no other changes required. Sharp (used by Next.js image optimization) ships ARM64 binaries, so that's not a concern.

The only scenario where you should pause: check your node_modules for .node files. If any native addon ships only an x86_64 build, it will fail to load on ARM64. For Storyie — Drizzle ORM, no unusual native deps — there was nothing to check.
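
The check for .node files can be scripted. This is a minimal sketch, assuming a conventional node_modules layout; findNativeAddons is a hypothetical helper, it does not follow symlinked directories (relevant for pnpm), and finding a .node file only tells you where to look, not whether an ARM64 build exists.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Recursively collect every *.node file (compiled native addon) under a directory.
// Symlinked directories are not followed.
function findNativeAddons(dir: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNativeAddons(fullPath));
    } else if (entry.isFile() && extname(entry.name) === ".node") {
      found.push(fullPath);
    }
  }
  return found;
}

// Only scan when a node_modules directory is actually present.
if (existsSync("node_modules")) {
  console.log(findNativeAddons("node_modules"));
}
```

An empty list means there are no native addons to worry about; any entries are the packages whose ARM64 support is worth confirming.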

Image optimization memory

imageOptimization: {
  memory: "1536 MB",
},

Next.js image optimization uses Sharp, and Sharp is memory-hungry for large source images. We were seeing timeouts at the default 1024 MB on images uploaded from mobile. Bumping to 1536 MB resolved it. If your app handles user-uploaded images, err toward more memory from the start.

6. Cache invalidation strategy

invalidation: {
  paths: "all",
  wait: false,
},

paths: "all" invalidates the entire CloudFront distribution on every deploy. AWS does not charge for the first 1,000 invalidation paths each month, and a wildcard path counts as a single path, so for a solo project or small team that's more than enough. wait: false lets the deploy complete without waiting for the invalidation to finish. Invalidation can take a few minutes, and making the deploy step block on it adds unnecessary wall time to CI.

For asset cache headers, we use different strategies depending on whether the file has a content hash:

assets: {
  // Unhashed files (favicon.ico, etc.): CDN caches for one day, browsers revalidate
  nonVersionedFilesCacheHeader:
    "public,max-age=0,s-maxage=86400,stale-while-revalidate=8640",
  // Hashed files (_next/static/**): immutable, cached for one year
  versionedFilesCacheHeader:
    "public,max-age=31536000,immutable",
},

Files under _next/static/ get content-addressed filenames from Next.js's build, so they're safe to cache indefinitely. Everything else — favicons, robots.txt, open graph images — uses CDN-level caching while forcing browsers to revalidate on every load.
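
The split can be stated as a one-line rule. The sketch below illustrates the policy only; cacheHeaderFor is a hypothetical function and is not how SST decides internally which header a file receives.

```typescript
const VERSIONED_HEADER = "public,max-age=31536000,immutable";
const NON_VERSIONED_HEADER =
  "public,max-age=0,s-maxage=86400,stale-while-revalidate=8640";

// Illustrates the policy described above; this is not SST's implementation.
function cacheHeaderFor(path: string): string {
  // Content-hashed build output can be cached indefinitely.
  const isHashed = path.indexOf("/_next/static/") === 0;
  return isHashed ? VERSIONED_HEADER : NON_VERSIONED_HEADER;
}

console.log(cacheHeaderFor("/_next/static/chunks/main.js")); // public,max-age=31536000,immutable
console.log(cacheHeaderFor("/favicon.ico"));
// public,max-age=0,s-maxage=86400,stale-while-revalidate=8640
```

In the second header, max-age=0 is what makes browsers revalidate, while s-maxage=86400 lets the CDN keep its copy for a day.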

7. Production-only resources with plain if statements

Some resources only make sense in production. SST v3, being a Pulumi program written in TypeScript, lets you gate resource creation with ordinary control flow:

async run() {
  const stage = $app.stage;

  // Both jobs exist only in production, so a single guard covers them.
  if (stage === "production") {
    new sst.aws.Cron("MonthlyReportGenerator", {
      schedule: "cron(0 3 1 * ? *)",
      job: { /* ... */ },
    });

    new sst.aws.Cron("XPostScheduler", {
      schedule: "cron(*/5 * * * ? *)",
      job: { /* ... */ },
    });
  }
}

This is one of the real advantages of SST v3's Pulumi-based approach compared to Terraform's count or CDK's Condition constructs. There's no DSL to learn, no indirection — it's just TypeScript. Staging environments don't accumulate resources that exist for no reason, and the logic is readable at a glance.

8. Cloudflare DNS automation

SST has native Cloudflare provider support. When you specify sst.cloudflare.dns() as the DNS provider, SST automatically creates the ACM certificate validation records and the CloudFront CNAME in Cloudflare — no manual DNS steps after deployment:

domain: {
  name: "storyie.com",
  aliases: ["*.storyie.com"],
  dns: sst.cloudflare.dns(),
},

One constraint: the Cloudflare record for the domain must be in DNS Only mode (the grey cloud icon), not Proxied (the orange cloud). Running CloudFront through Cloudflare's proxy creates a double-CDN situation that interferes with ACM certificate validation and can cause caching conflicts. Turn off the proxy and let CloudFront handle everything.

Wrapping up

SST v3's move to Pulumi as the underlying engine was the right call. Full TypeScript expressiveness for infrastructure means the staging-vs-production branching, the dynamic cache behavior construction, and the staggered Cron schedules all read as ordinary application code rather than a configuration DSL with limited escape hatches.

The rough edges are real though. Pulumi's Output type is invisible until it bites you — there's no type error that says "this is an Output, use $resolve." CloudFront Function's ES5 runtime has no warning at authoring time; you find out at the edge. The copyFiles gap between local and Lambda environments is exactly the kind of thing that costs an afternoon to diagnose.

Hopefully this list saves someone that time.

Try Storyie

Storyie is the diary app running on all of this. Write and share your stories at storyie.com, or grab the iOS app.