Why use a CloudFront Function for IP gating instead of AWS WAF?

WAF starts at around $5/month per web ACL before you count rule and request charges. For a staging environment that sits idle most of the time, that is a fixed cost with little to show for it. A CloudFront Function runs at the edge, stays within CloudFront's always-free tier at staging traffic levels (2 million function invocations and 10 million requests per month), and plugs into SST's `edge.viewerRequest.injection` without any extra infrastructure. The tradeoff is that the CloudFront Functions runtime is built around ES 5.1, so the safe choice is `var`, `indexOf`, and plain `function` expressions rather than modern syntax. For a short IP allowlist check, that constraint is trivial.

How do you test Stripe and RevenueCat webhooks against staging when IP gating is on?

We bypass the IP check for specific paths — `/api/stripe/webhook` and `/api/revenuecat/webhook` — inside the CloudFront Function itself. The bypass is checked before the IP comparison, so Stripe's servers can always reach the endpoint. The webhook handlers themselves verify the request signature, so an attacker who guesses the bypass path still can't forge a valid event. It's a two-lock model: IP gating for the front door, signature verification for the room.

Why does the SST transform use `$resolve` when adding the auth cache behavior?

SST's Nextjs component generates CloudFront configuration as Pulumi `Input` types, which may be plain values, promises, or Pulumi outputs whose values are only known during deployment. You can't spread or merge them directly. `$resolve` combines those inputs into a single output whose `apply` callback receives the plain values, and whatever the callback returns is wrapped back up for SST. Without it, there is no way to copy `targetOriginId` and `functionAssociations` from the default behavior onto the new one, which would leave the auth route with no origin to talk to and no IP check running on it.

Does staging run the same Lambda configuration as production?

Yes, intentionally. We use the same memory (1024 MB), runtime (Node.js 22, ARM64), and timeout (20 seconds) on both stages. The goal is to catch Lambda cold-start behavior, memory pressure, and arm64-specific issues in staging before they hit production. The small cost difference between stages is worth the confidence. The only things that differ are domain, IP gating, environment variables, and which cron jobs are deployed.

How do you handle environment variables across stages without them leaking?

The env file selection (`".env.production"` vs `".env"`) is determined by the SST stage at deploy time, but any URL-shaped variable is explicitly overridden in the SST config code rather than trusted from the file. This means even if someone writes a production URL into the wrong env file, the deployed function gets the correct value for its stage. All env files are encrypted with dotenvx, so the raw values are never in plaintext in the repo.

When should staging be running and when should it be torn down?

We run staging on demand — `sst deploy --stage staging` before a significant change, and `sst remove --stage staging` when the work is merged. Because staging resources use `removal: "remove"`, the teardown is clean and complete. Keeping staging permanently alive would cost money for idle Lambda capacity and CloudFront distributions; spinning it up per-feature means we only pay when we're actually testing. The Cloudflare DNS record is cleaned up automatically by SST when the stack is removed.

Staging environment design with SST and CloudFront: safely isolating production from everything else

At some point in every project's life, "just test it in production" stops being an option. For Storyie, that moment came when we started wiring up Stripe webhooks, OAuth callback flows, and CloudFront configuration changes that we weren't willing to break live. We needed a staging environment — and we needed it to be safe, cheap to run, and easy to tear down.

This post walks through how we built it using SST v3's stage system and a CloudFront Function for IP gating, all from a single sst.config.ts.

TL;DR

SST v3's --stage flag spins up a completely independent AWS stack with one command. Staging and production share no resources.
A CloudFront Function gates staging to allowlisted IPs at effectively no cost (the always-free tier covers 2M function invocations and 10M requests per month), so no WAF is required.
Webhook endpoints (Stripe, RevenueCat) bypass the IP check; their own signature verification handles security.
The /api/auth/* path gets Managed-CachingDisabled applied via SST's transform.cdn so OAuth callbacks are never served from cache.
Production-only cron jobs (monthly reports, X posting) are conditionally created with a plain if (stage === "production") guard.

| Item | Production | Staging |
| --- | --- | --- |
| Domain | `storyie.com` + `*.storyie.com` | `staging.storyie.com` |
| IP restriction | None | CloudFront Function |
| Webhook access | Unrestricted | Bypass IP check |
| Auth route caching | Disabled | Disabled |
| Resource removal | `retain` | `remove` |
| Cron jobs | Active | Not deployed |

Why staging at all

We started with the standard two-environment setup: local development and production. It worked until it didn't. The specific pain points:

Stripe webhook testing: the Stripe CLI can forward webhooks locally, but Lambda execution behavior on AWS is different from a local Node process. We kept hitting subtle differences.
OAuth callback URLs: Google and Apple OAuth require a registered redirect URI. localhost is fine for development, but some auth provider behaviors (particularly around Apple's private email relay) only trigger on real HTTPS domains.
SST / CloudFront config changes: CloudFront behavior configuration is infrastructure-level. We weren't going to test distribution settings directly on production.
Demo access: occasionally we need to show a feature to someone outside the team without handing them access to production data.

Domain branching

The SST Nextjs component accepts a domain object. We switch it based on stage:

const domain =
  stage === "production"
    ? {
        name: "storyie.com",
        aliases: ["*.storyie.com"],
        dns: sst.cloudflare.dns(),
      }
    : {
        name: "staging.storyie.com",
        aliases: ["*.staging.storyie.com"],
        dns: sst.cloudflare.dns(),
      };

SST creates and updates the Cloudflare DNS records automatically on deploy, and removes them on sst remove. The wildcard alias is there because we're building toward per-tenant subdomains — staging gets the same shape so we can test that routing before it touches production.

IP gating with a CloudFront Function

The most important property of a staging environment is that it's not publicly reachable. We use a CloudFront Function on the viewer request event to enforce this:

const ipRestrictionCode =
  stage !== "production"
    ? `
var allowedIPs = ["xxx.xxx.xxx.xxx"];
var clientIP = event.viewer.ip;

if (allowedIPs.indexOf(clientIP) === -1) {
  return {
    statusCode: 403,
    statusDescription: "Forbidden",
    body: "Access denied.",
  };
}
`
    : undefined;

A few things worth noting here. First, the condition is stage !== "production" rather than stage === "staging", so any non-production stage gets gated by default, not just one named "staging." Second, this is deliberately ES 5.1 JavaScript: var, not const; indexOf, not includes; plain function expressions, not arrows. CloudFront Functions run in a restricted runtime (the 1.0 runtime is ES 5.1 with only a handful of later features), and sticking to ES 5.1 syntax avoids finding out after deployment which ones are missing.

We chose CloudFront Functions over Lambda@Edge for two reasons: latency (CloudFront Functions run directly at the edge location, before the request ever reaches a Lambda) and cost (CloudFront's always-free tier includes 2M function invocations per month, far more than staging traffic generates). WAF would have worked too, but WAF carries a fixed monthly fee regardless of traffic.
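Since mistakes in the injected string tend to show up only after a deploy, it can help to exercise the logic locally first. The sketch below is a hypothetical harness, not part of SST: it assumes the injected code is placed at the top of a `handler(event)` function that otherwise returns `event.request`, and it uses a documentation-range address (`203.0.113.10`) in place of the real allowlist.

```typescript
// The parts of the CloudFront viewer-request event that the check reads.
type ViewerEvent = {
  viewer: { ip: string };
  request: { uri: string };
};

type Blocked = { statusCode: number; statusDescription: string; body: string };
type Handler = (event: ViewerEvent) => Blocked | ViewerEvent["request"];

// Same logic as the injected snippet, with a placeholder allowlist entry.
const ipRestrictionCode = `
var allowedIPs = ["203.0.113.10"];
var clientIP = event.viewer.ip;

if (allowedIPs.indexOf(clientIP) === -1) {
  return {
    statusCode: 403,
    statusDescription: "Forbidden",
    body: "Access denied.",
  };
}
`;

// Assumption: the injection runs first, and the handler falls through to
// returning the request when the injected code does not return early.
function buildHandler(code: string): Handler {
  const body = `${code}\nreturn event.request;`;
  return new Function("event", body) as Handler;
}

const handler = buildHandler(ipRestrictionCode);

const allowed = handler({ viewer: { ip: "203.0.113.10" }, request: { uri: "/" } });
const denied = handler({ viewer: { ip: "198.51.100.7" }, request: { uri: "/" } });

console.log(allowed); // the untouched request object
console.log(denied); // the 403 response object
```

This only checks the allow/deny logic. Node accepts modern syntax that the CloudFront runtime may reject, so it is not a substitute for testing the deployed function.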

Bypassing IP gating for webhooks

IP-gating the entire staging domain would block Stripe and RevenueCat from delivering webhook events. Those services call our endpoints from IP ranges we don't control. The fix is a path-level bypass inside the same function:

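var uri = event.request.uri; // path only; the query string lives on event.request.querystring
var bypassPaths = ["/api/stripe/webhook", "/api/revenuecat/webhook"];
var shouldBypass = bypassPaths.some(function(path) {
  // indexOf(...) === 0 is the ES 5.1 spelling of startsWith
  return uri === path || uri.indexOf(path + "?") === 0;
});

// The bypass is evaluated first, so webhook senders never hit the IP comparison.
if (!shouldBypass && allowedIPs.indexOf(clientIP) === -1) {
  return { statusCode: 403, ... };
}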

This isn't a security hole. The bypass only removes IP gating for those paths; the webhook handlers themselves still verify the request signature (Stripe's stripe.webhooks.constructEvent, RevenueCat's equivalent). The IP gate protects the staging environment as a whole; the signature check protects the individual endpoints. Two different locks, two different things being protected.
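The ordering of the two checks is the part that is easy to get wrong, so here is the same decision written as a small typed function that can be unit-tested outside CloudFront. The names `isBypassPath` and `isAllowed` and the sample addresses are illustrative, and the sketch assumes `uri` is the path portion of the request, which is how CloudFront Functions expose it (the query string is a separate field).

```typescript
const bypassPaths = ["/api/stripe/webhook", "/api/revenuecat/webhook"];

// Mirrors the function's check: exact path, or path followed by "?".
// A longer path such as "/api/stripe/webhook-debug" does not inherit the bypass.
function isBypassPath(uri: string): boolean {
  return bypassPaths.some((path) => uri === path || uri.indexOf(path + "?") === 0);
}

// The bypass is evaluated before the allowlist, so webhook senders are
// never rejected for coming from an unknown address.
function isAllowed(uri: string, clientIP: string, allowedIPs: string[]): boolean {
  return isBypassPath(uri) || allowedIPs.indexOf(clientIP) !== -1;
}

const sampleAllowlist = ["203.0.113.10"]; // placeholder address

console.log(isAllowed("/api/stripe/webhook", "198.51.100.7", sampleAllowlist)); // true
console.log(isAllowed("/dashboard", "198.51.100.7", sampleAllowlist)); // false
console.log(isAllowed("/dashboard", "203.0.113.10", sampleAllowlist)); // true
```

Matching on exact paths rather than prefixes keeps the unauthenticated surface as small as the two webhook endpoints.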

Disabling auth route caching

OAuth callbacks carry a one-time authorization code. If CloudFront serves a cached response for /api/auth/callback, the code gets reused, the second use fails, and the login breaks. We apply AWS's Managed-CachingDisabled policy to all /api/auth/* paths:

const cachingDisabledPolicy = await aws.cloudfront.getCachePolicy({
  name: "Managed-CachingDisabled",
});

const authCacheBehavior = {
  pathPattern: "/api/auth/*",
  viewerProtocolPolicy: "redirect-to-https",
  allowedMethods: ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
  cachePolicyId: cachingDisabledPolicy.id,
  compress: true,
};

Getting this behavior wired into SST's auto-generated CloudFront distribution requires transform.cdn:

new sst.aws.Nextjs("StoryieWeb", {
  transform: {
    cdn: (args) => {
      args.orderedCacheBehaviors = $resolve([
        args.orderedCacheBehaviors,
        args.defaultCacheBehavior,
      ]).apply(([existing, defaultBehavior]) => {
        return [
          {
            ...authCacheBehavior,
            targetOriginId: defaultBehavior.targetOriginId,
            originRequestPolicyId: defaultBehavior.originRequestPolicyId,
            functionAssociations: defaultBehavior.functionAssociations,
          },
          ...existing,
        ];
      });
    },
  },
});

The $resolve call is important. SST exposes CloudFront configuration as Pulumi Input types: they are not necessarily plain values yet, and may be promises or outputs that only resolve during deployment. $resolve unwraps them so you can read defaultBehavior.targetOriginId inside the apply callback. Without it, there is nothing to copy the origin and the IP check function from, and an auth behavior that lacks them breaks in two different ways: it has no origin to route to and no gate in front of it.
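Once the inputs are resolved, the callback is ordinary object and array manipulation. The sketch below reproduces that step with plain objects so the merge can be checked in isolation. The `Behavior` type is a simplified stand-in for the real CloudFront behavior shape, and `withAuthBehavior` is a hypothetical helper name. The auth behavior goes first because CloudFront evaluates ordered cache behaviors in the order listed and applies the first path pattern that matches.

```typescript
type FunctionAssociation = { eventType: string; functionArn: string };

// Simplified stand-in for a CloudFront cache behavior.
type Behavior = {
  pathPattern?: string;
  targetOriginId: string;
  cachePolicyId?: string;
  originRequestPolicyId?: string;
  functionAssociations?: FunctionAssociation[];
};

// Copies the origin, origin request policy and edge functions from the
// default behavior, then places the auth behavior ahead of the existing ones.
function withAuthBehavior(
  existing: Behavior[] | undefined,
  defaultBehavior: Behavior,
  cachePolicyId: string,
): Behavior[] {
  const auth: Behavior = {
    pathPattern: "/api/auth/*",
    cachePolicyId,
    targetOriginId: defaultBehavior.targetOriginId,
    originRequestPolicyId: defaultBehavior.originRequestPolicyId,
    functionAssociations: defaultBehavior.functionAssociations,
  };
  // Guard against an undefined list; spreading undefined into an array throws.
  return [auth, ...(existing ?? [])];
}
```

The `?? []` guard is an addition for the sketch: it covers the case where no ordered behaviors have been defined yet.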

Keeping production-only jobs out of staging

Some things should simply not run in staging: the monthly digest emails, the X (Twitter) posting cron, anything that touches external services we'd have to clean up. SST's stage system is just string comparison, so the guard is a plain if:

if (stage === "production") {
  new sst.aws.Cron("MonthlyReportGenerator", { ... });
  new sst.aws.Cron("XPostScheduler", { ... });
}

No abstraction needed. Resources that don't exist cost nothing and generate no noise in CloudWatch.

Environment variable discipline

We pick the dotenvx-encrypted env file based on stage:

const envFile = stage === "production" ? ".env.production" : ".env";

But we don't trust the file for URL-shaped variables. Those get set explicitly in the SST config:

NEXT_PUBLIC_BASE_URL:
  stage === "production"
    ? "https://storyie.com"
    : "https://staging.storyie.com",

The reasoning: if someone accidentally writes a production URL into .env, the deployed function for staging still gets the right value, because the SST code wins. The configuration is correct by construction, not by convention.
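A minimal sketch of that precedence rule, assuming the decrypted file contents are already available as a plain key/value object. How the real config loads the file is not shown here, and `buildEnvironment`, `baseUrlFor` and the `STRIPE_MODE` key are hypothetical names. The only point is the spread order: file values first, stage-derived values last.

```typescript
function baseUrlFor(stage: string): string {
  return stage === "production"
    ? "https://storyie.com"
    : "https://staging.storyie.com";
}

// Later keys win in an object spread, so the stage-derived URL always
// replaces whatever the env file happened to contain.
function buildEnvironment(
  stage: string,
  fileEnv: Record<string, string>,
): Record<string, string> {
  return {
    ...fileEnv,
    NEXT_PUBLIC_BASE_URL: baseUrlFor(stage),
  };
}

// A staging deploy whose env file mistakenly carries the production URL.
const stagingEnv = buildEnvironment("staging", {
  NEXT_PUBLIC_BASE_URL: "https://storyie.com",
  STRIPE_MODE: "test",
});

console.log(stagingEnv.NEXT_PUBLIC_BASE_URL); // "https://staging.storyie.com"
```

Other variables pass through untouched, so the override list stays limited to values that must never cross stages.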

Resource removal policy

removal: input?.stage === "production" ? "retain" : "remove",

retain on production means sst remove won't delete the CloudFront distribution or S3 bucket — useful if you ever need to roll back at the infrastructure level. remove on staging means sst remove --stage staging does a complete teardown. Staging is meant to be disposable.

Lambda configuration

server: {
  memory: "1024 MB",
  runtime: "nodejs22.x",
  architecture: "arm64",
  timeout: "20 seconds",
},

Staging runs identical Lambda configuration to production. The point of staging is to catch issues before they hit production; if staging runs different memory or a different architecture, you're not actually testing what will run in production. Lambda's per-GB-second price on ARM64 (Graviton) is roughly 20% lower than on x86, which is a free win on both stages as long as your dependencies run on ARM.

How we actually use it

Day-to-day the workflow is:

Local dev: sst dev for hot reload against a local Next.js server.
Pre-merge check: sst deploy --stage staging to get a real AWS deployment with a real domain and real HTTPS.
Webhook testing: point Stripe's test-mode webhook URL at staging.storyie.com/api/stripe/webhook.
Production deploy: sst deploy --stage production from GitHub Actions on merge to main.
Teardown: sst remove --stage staging once the change is merged.

Staging is not always running. We spin it up per-feature and tear it down after. The remove policy makes cleanup trivial.

A few things that bit us along the way

CloudFront Function syntax: ES 5.1 only. When we first wrote the IP check with const and arrow functions, the deployment succeeded but the function failed silently at runtime with no useful error. Test with a minimal function first, then expand.

Staging doesn't need production data: we seed staging with a handful of test accounts and a few sample diary entries. The point is that the infrastructure matches, not the data. Trying to sync production data into staging creates more problems than it solves.

Two locks for webhooks: when we first designed the webhook bypass, the question was whether IP gating was redundant given that signature verification exists. It isn't — they protect different things. IP gating keeps unauthorized traffic off the staging environment as a whole. Signature verification keeps forged events out of the webhook handler. If either one fails, the other still holds.
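To make the second lock concrete, this is roughly what a Stripe-style signature check does under the hood: the sender signs the timestamp and raw body with a shared secret using HMAC-SHA256, and the receiver recomputes and compares. It is a simplified illustration built on Node's `crypto` module, not a replacement for the official library's verification helper; `verifySignature` and its parameters are invented for the example, and it only looks at a single `v1` entry.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Header format: "t=<unix seconds>,v1=<hex HMAC-SHA256 of "<t>.<rawBody>">"
function verifySignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300,
): boolean {
  const parts = new Map<string, string>();
  for (const item of header.split(",")) {
    const [key, value] = item.split("=");
    if (key && value) parts.set(key.trim(), value.trim());
  }

  const timestamp = parts.get("t");
  const signature = parts.get("v1");
  if (!timestamp || !signature) return false;

  // Reject stale or malformed timestamps to limit replay of captured requests.
  const sentAt = Number(timestamp);
  if (!Number.isFinite(sentAt)) return false;
  if (Math.abs(nowSeconds - sentAt) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The check needs the raw request body exactly as received. Parsing the JSON and re-serializing it before verification would change the bytes and make valid events fail.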

Takeaways

SST's stage mechanism does the heavy lifting here — one config file, two fully independent AWS stacks, with no shared resources between them. The CloudFront Function adds IP gating at essentially zero cost, and the transform.cdn escape hatch lets us layer in auth-specific cache behavior without fighting SST's defaults.

The design isn't elaborate. if (stage === "production") for conditional resources, explicit env var overrides for URLs, removal: "remove" for clean teardowns. The simplicity is the point — a staging environment you have to fight to maintain doesn't get used.

Related Posts

Deploying Next.js to AWS with SST v3: the full deployment setup this staging design builds on
Building a Monorepo with pnpm and TypeScript: workspace conventions the config lives inside

Try Storyie

The staging environment lets us ship with confidence before anything reaches production. If you'd like to see the result, storyie.com is the production version — and the iOS app talks to the same backend.
