At some point in every project's life, "just test it in production" stops being an option. For Storyie, that moment came when we started wiring up Stripe webhooks, OAuth callback flows, and CloudFront configuration changes that we weren't willing to break live. We needed a staging environment — and we needed it to be safe, cheap to run, and easy to tear down.
This post walks through how we built it using SST v3's stage system and a CloudFront Function for IP gating, all from a single sst.config.ts.
TL;DR
- SST v3's --stage flag spins up a completely independent AWS stack with one command. Staging and production share no resources.
- A CloudFront Function (free tier: 2M invocations/month) gates staging to allowlisted IPs, so no WAF is required.
- Webhook endpoints (Stripe, RevenueCat) bypass the IP check; their own signature verification handles security.
- The /api/auth/* path gets Managed-CachingDisabled applied via SST's transform.cdn so OAuth callbacks are never served from cache.
- Production-only cron jobs (monthly reports, X posting) are conditionally created with a plain if (stage === "production") guard.
| Item | Production | Staging |
|---|---|---|
| Domain | storyie.com | staging.storyie.com |
| IP restriction | None | CloudFront Function |
| Webhook access | Unrestricted | Bypass IP check |
| Auth route caching | Disabled | Disabled |
| Resource removal | retain | remove |
| Cron jobs | Active | Not deployed |
Why staging at all
We started with the standard two-environment setup: local development and production. It worked until it didn't. The specific pain points:
- Stripe webhook testing: the Stripe CLI can forward webhooks locally, but Lambda execution behavior on AWS is different from a local Node process. We kept hitting subtle differences.
- OAuth callback URLs: Google and Apple OAuth require a registered redirect URI. localhost is fine for development, but some auth provider behaviors (particularly around Apple's private email relay) only trigger on real HTTPS domains.
- SST / CloudFront config changes: CloudFront behavior configuration is infrastructure-level. We weren't going to test distribution settings directly on production.
- Demo access: occasionally we need to show a feature to someone outside the team without handing them access to production data.
Domain branching
The SST Nextjs component accepts a domain object. We switch it based on stage:
const domain =
stage === "production"
? {
name: "storyie.com",
aliases: ["*.storyie.com"],
dns: sst.cloudflare.dns(),
}
: {
name: "staging.storyie.com",
aliases: ["*.staging.storyie.com"],
dns: sst.cloudflare.dns(),
};
SST creates and updates the Cloudflare DNS records automatically on deploy, and removes them on sst remove. The wildcard alias is there because we're building toward per-tenant subdomains; staging gets the same shape so we can test that routing before it touches production.
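The branching above can also be pulled into a small pure function so it is easy to check in isolation. This is only a sketch: domainFor and ROOT_DOMAIN are hypothetical names that do not appear in the original config, and the dns: sst.cloudflare.dns() entry is left out because it only exists inside sst.config.ts.

```typescript
// Hypothetical helper; the post inlines this ternary in sst.config.ts.
interface DomainConfig {
  name: string;
  aliases: string[];
}

const ROOT_DOMAIN = "storyie.com";

function domainFor(stage: string): DomainConfig {
  // Production gets the apex domain; every other stage gets the staging host.
  const name = stage === "production" ? ROOT_DOMAIN : `staging.${ROOT_DOMAIN}`;
  // The wildcard alias mirrors the per-tenant subdomain shape on both stages.
  return { name, aliases: [`*.${name}`] };
}
```

One thing this makes visible: as written, every non-production stage name resolves to the same staging hostname.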
IP gating with a CloudFront Function
The most important property of a staging environment is that it's not publicly reachable. We use a CloudFront Function on the viewer request event to enforce this:
const ipRestrictionCode =
stage !== "production"
? `
var allowedIPs = ["xxx.xxx.xxx.xxx"];
var clientIP = event.viewer.ip;
if (allowedIPs.indexOf(clientIP) === -1) {
return {
statusCode: 403,
statusDescription: "Forbidden",
body: "Access denied.",
};
}
`
: undefined;
A few things worth noting here. First, the condition is stage !== "production" rather than stage === "staging". That way any non-production stage gets gated by default, not just one named "staging." Second, this is ES 5.1-style JavaScript: var, not const; indexOf, not includes; function expressions, not arrows. CloudFront Functions run in a restricted runtime. The original runtime (cloudfront-js-1.0) is ES 5.1 with a small set of later additions, and it rejects const and let.
We chose CloudFront Functions over Lambda@Edge for two reasons: latency (CloudFront Functions run at the edge location itself, while Lambda@Edge runs one hop back at a regional edge cache) and cost (the first 2M invocations per month fall under CloudFront's always-free tier). WAF would have worked too, but WAF carries a fixed monthly fee regardless of traffic.
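For context, the snippet above is only a function body. A standalone CloudFront Function is a handler(event) that returns either the request (to let it continue) or a response object (to short-circuit). The sketch below shows that shape with the same IP check. The type annotations are there for local testing only and would need to be stripped before deployment, the interfaces cover just the fields used here, and the allowlisted address is a placeholder from the documentation range, not a real value.

```typescript
// Minimal shapes for the parts of the viewer-request event used below.
interface ViewerRequestEvent {
  viewer: { ip: string };
  request: { uri: string; method: string };
}

interface BlockedResponse {
  statusCode: number;
  statusDescription: string;
  body: string;
}

// Placeholder allowlist (203.0.113.0/24 is reserved for documentation).
var ALLOWED_IPS = ["203.0.113.10"];

function handler(
  event: ViewerRequestEvent
): ViewerRequestEvent["request"] | BlockedResponse {
  var clientIP = event.viewer.ip;
  if (ALLOWED_IPS.indexOf(clientIP) === -1) {
    // Returning a response object stops the request at the edge.
    return {
      statusCode: 403,
      statusDescription: "Forbidden",
      body: "Access denied.",
    };
  }
  // Returning the request lets CloudFront continue to the cache and origin.
  return event.request;
}
```

Keeping the check in a plain function like this also makes it possible to exercise the logic locally with fake events before deploying.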
Bypassing IP gating for webhooks
IP-gating the entire staging domain would block Stripe and RevenueCat from delivering webhook events. Those services call our endpoints from IP ranges we don't control. The fix is a path-level bypass inside the same function:
var uri = event.request.uri; // path only; CloudFront exposes the query string separately
var bypassPaths = ["/api/stripe/webhook", "/api/revenuecat/webhook"];
var shouldBypass = bypassPaths.some(function (path) {
  return uri === path || uri.startsWith(path + "?");
});
if (!shouldBypass && allowedIPs.indexOf(clientIP) === -1) {
  return { statusCode: 403, ... };
}
This isn't a security hole. The bypass only removes IP gating for those paths. The webhook handlers themselves still verify the request signature (Stripe's stripe.webhooks.constructEvent in the Node SDK, RevenueCat's equivalent). The IP gate protects the staging environment as a whole; the signature check protects the individual endpoints. Two different locks, two different things being protected.
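To make the second lock concrete, here is a minimal sketch of what a timestamped HMAC signature check does. It follows the general scheme Stripe documents (HMAC-SHA256 over the timestamp, a dot, and the raw payload), but verifySignature and its parameters are hypothetical names. In a real handler the provider's SDK should do this work, including rejecting stale timestamps, which this sketch skips.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative only: returns true when signatureHex is the HMAC-SHA256 of
// "<timestamp>.<payload>" under the shared secret.
function verifySignature(
  payload: string,
  timestamp: string,
  signatureHex: string,
  secret: string
): boolean {
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`)
    .digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

A request that reaches the bypassed path from any IP still fails this check unless the sender knows the webhook secret, which is why the path-level bypass does not open the endpoint.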
Disabling auth route caching
OAuth callbacks carry a one-time authorization code. If CloudFront serves a cached response for /api/auth/callback, the code gets reused, the second use fails, and the login breaks. We apply AWS's Managed-CachingDisabled policy to all /api/auth/* paths:
const cachingDisabledPolicy = await aws.cloudfront.getCachePolicy({
name: "Managed-CachingDisabled",
});
const authCacheBehavior = {
pathPattern: "/api/auth/*",
viewerProtocolPolicy: "redirect-to-https",
allowedMethods: ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
cachePolicyId: cachingDisabledPolicy.id,
compress: true,
};
Getting this behavior wired into SST's auto-generated CloudFront distribution requires transform.cdn:
new sst.aws.Nextjs("StoryieWeb", {
transform: {
cdn: (args) => {
args.orderedCacheBehaviors = $resolve([
args.orderedCacheBehaviors,
args.defaultCacheBehavior,
]).apply(([existing, defaultBehavior]) => {
return [
{
...authCacheBehavior,
targetOriginId: defaultBehavior.targetOriginId,
originRequestPolicyId: defaultBehavior.originRequestPolicyId,
functionAssociations: defaultBehavior.functionAssociations,
},
...(existing ?? []), // guard against no pre-existing ordered behaviors
];
});
},
},
});
The $resolve call is important. SST exposes CloudFront configuration as Pulumi Input types. They're not plain values yet; like promises, they only resolve during deployment. $resolve unwraps them so you can read defaultBehavior.targetOriginId. Without it, the auth behavior ends up with no origin and no IP check function, which means it silently breaks in two different ways.
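The logic inside the apply callback is plain array manipulation once the values are unwrapped, so it can be sketched and tested without SST or Pulumi. The names below (CacheBehavior, withAuthBehavior) are hypothetical and the interface lists only the fields that matter here. The ordering matters because CloudFront evaluates ordered cache behaviors in sequence and uses the first path pattern that matches.

```typescript
// Simplified, illustrative shape of a CloudFront cache behavior.
interface CacheBehavior {
  pathPattern: string;
  targetOriginId: string;
  cachePolicyId?: string;
  originRequestPolicyId?: string;
  functionAssociations?: { eventType: string; functionArn: string }[];
}

type DefaultBehavior = Omit<CacheBehavior, "pathPattern">;

function withAuthBehavior(
  existing: CacheBehavior[] | undefined,
  defaultBehavior: DefaultBehavior,
  cachePolicyId: string
): CacheBehavior[] {
  const auth: CacheBehavior = {
    pathPattern: "/api/auth/*",
    cachePolicyId,
    // Inherit origin and edge function from the default behavior so the
    // auth routes hit the same server and keep the IP gate.
    targetOriginId: defaultBehavior.targetOriginId,
    originRequestPolicyId: defaultBehavior.originRequestPolicyId,
    functionAssociations: defaultBehavior.functionAssociations,
  };
  // Prepend so the auth pattern wins over any broader existing pattern.
  return [auth, ...(existing ?? [])];
}
```

If the auth behavior were appended instead of prepended, a broader pattern earlier in the list could match first and the caching policy would never apply.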
Keeping production-only jobs out of staging
Some things should simply not run in staging: the monthly digest emails, the X (Twitter) posting cron, anything that touches external services we'd have to clean up. SST's stage system is just string comparison, so the guard is a plain if:
if (stage === "production") {
new sst.aws.Cron("MonthlyReportGenerator", { ... });
new sst.aws.Cron("XPostScheduler", { ... });
}
No abstraction needed. Resources that don't exist cost nothing and generate no noise in CloudWatch.
Environment variable discipline
We pick the dotenvx-encrypted env file based on stage:
const envFile = stage === "production" ? ".env.production" : ".env";
But we don't trust the file for URL-shaped variables. Those get set explicitly in the SST config:
NEXT_PUBLIC_BASE_URL:
stage === "production"
? "https://storyie.com"
: "https://staging.storyie.com",
The reasoning: if someone accidentally writes a production URL into .env, the deployed function for staging still gets the right value, because the SST code wins. The configuration is correct by construction, not by convention.
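The "SST code wins" property comes down to merge order: values from the file go in first and the stage-derived values are written last. A minimal sketch of that idea follows, where buildEnvironment and baseUrlFor are hypothetical helpers and fileEnv stands in for whatever the decrypted env file produced.

```typescript
type Env = Record<string, string>;

function baseUrlFor(stage: string): string {
  return stage === "production"
    ? "https://storyie.com"
    : "https://staging.storyie.com";
}

function buildEnvironment(stage: string, fileEnv: Env): Env {
  return {
    // File values first...
    ...fileEnv,
    // ...then explicit overrides, so a wrong URL in the file cannot leak through.
    NEXT_PUBLIC_BASE_URL: baseUrlFor(stage),
  };
}
```

Reversing the spread order would quietly hand control back to the file, so the position of the override is the part worth protecting in review.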
Resource removal policy
removal: input?.stage === "production" ? "retain" : "remove",
In SST v3, retain on production means sst remove keeps stateful resources such as S3 buckets and DynamoDB tables and removes the rest, so production data survives an accidental teardown. remove on staging means sst remove --stage staging does a complete teardown. Staging is meant to be disposable.
Lambda configuration
server: {
memory: "1024 MB",
runtime: "nodejs22.x",
architecture: "arm64",
timeout: "20 seconds",
},
Staging runs identical Lambda configuration to production. The point of staging is to catch issues before they hit production: if staging runs different memory or a different architecture, you're not actually testing what will run in production. ARM64 (Graviton) is roughly 20% cheaper than x86 at equivalent performance, which is a free win on both stages.
How we actually use it
Day-to-day the workflow is:
- Local dev: sst dev for hot reload against a local Next.js server.
- Pre-merge check: sst deploy --stage staging to get a real AWS deployment with a real domain and real HTTPS.
- Webhook testing: point Stripe's test-mode webhook URL at staging.storyie.com/api/stripe/webhook.
- Production deploy: sst deploy --stage production from GitHub Actions on merge to main.
- Teardown: sst remove --stage staging once the change is merged.
Staging is not always running. We spin it up per-feature and tear it down after. The remove policy makes cleanup trivial.
A few things that bit us along the way
CloudFront Function syntax: ES 5.1 only. When we first wrote the IP check with const and arrow functions, the deployment succeeded but the function failed silently at runtime with no useful error. Test with a minimal function first, then expand.
Staging doesn't need production data: we seed staging with a handful of test accounts and a few sample diary entries. The point is that the infrastructure matches, not the data. Trying to sync production data into staging creates more problems than it solves.
Two locks for webhooks: when we first designed the webhook bypass, the question was whether IP gating was redundant given that signature verification exists. It isn't — they protect different things. IP gating keeps unauthorized traffic off the staging environment as a whole. Signature verification keeps forged events out of the webhook handler. If either one fails, the other still holds.
Takeaways
SST's stage mechanism does the heavy lifting here — one config file, two fully independent AWS stacks, with no shared resources between them. The CloudFront Function adds IP gating at essentially zero cost, and the transform.cdn escape hatch lets us layer in auth-specific cache behavior without fighting SST's defaults.
The design isn't elaborate. if (stage === "production") for conditional resources, explicit env var overrides for URLs, removal: "remove" for clean teardowns. The simplicity is the point — a staging environment you have to fight to maintain doesn't get used.
Related Posts
- Deploying Next.js to AWS with SST v3 — the full deployment setup this staging design builds on
- Building a Monorepo with pnpm and TypeScript — workspace conventions the config lives inside
Try Storyie
The staging environment lets us ship with confidence before anything reaches production. If you'd like to see the result, storyie.com is the production version — and the iOS app talks to the same backend.