Deploying Next.js to AWS with SST: CloudFront, IP restrictions, and cron jobs

Storyie Engineering Team

How we deploy Storyie's Next.js app to AWS using SST v3 — covering stage isolation, CloudFront function IP restrictions, cache policy customization for OAuth routes, Graviton Lambda tuning, and cron job scheduling.

Storyie's web app runs on Next.js, deployed to AWS via SST v3. Most Next.js apps are fine on Vercel, but Storyie has enough infrastructure requirements that managing our own AWS stack with SST pays off in control and cost. This post walks through our actual sst.config.ts: the stage setup, the CloudFront customizations, Lambda tuning, and the cron job schedule.

TL;DR

  • sst.aws.Nextjs wraps OpenNext to build a Lambda + CloudFront + S3 deployment from a single component declaration.
  • Stages (production, staging) share one config file and produce fully isolated environments. The removal policy, env file, and domain all branch on input.stage.
  • CloudFront Functions handle staging IP restriction at the edge — written in ES5, with explicit bypasses for webhook paths.
  • OAuth callback routes need Managed-CachingDisabled to prevent auth codes from being served from cache.
  • Graviton (arm64) Lambda is ~20% cheaper with no meaningful performance difference.
  • Seven cron jobs run on staggered schedules to spread DB load.

| Concern | Mechanism |
| --- | --- |
| Next.js deployment | sst.aws.Nextjs via OpenNext |
| Environment isolation | SST stages with per-stage domains and env files |
| Staging IP restriction | CloudFront Function on viewer-request |
| OAuth cache bypass | Ordered cache behavior with Managed-CachingDisabled |
| Lambda cost optimization | arm64 + tuned memory per function type |
| Scheduled batch processing | sst.aws.Cron with offset schedules |
| Env var source of truth | sst.config.ts (infra code, not .env files) |

What SST gives you out of the box

SST's sst.aws.Nextjs component is the main primitive. It calls OpenNext internally to split your Next.js app into Lambda functions (server rendering, image optimization) plus S3 (static assets) plus CloudFront (CDN/routing), then wires it all together.

new sst.aws.Nextjs("StoryieWeb", {
  path: "./apps/web",
  domain: {
    name: "storyie.com",
    aliases: ["*.storyie.com"],
    dns: sst.cloudflare.dns(),
  },
});

That block creates the CloudFront distribution, Lambda functions, and S3 bucket. Storyie uses Cloudflare for DNS, so we pass sst.cloudflare.dns() and SST handles the CNAME/alias records automatically.

The wildcard alias (*.storyie.com) is required for multi-tenant user subdomains. Every user gets a {username}.storyie.com page, so we need CloudFront to match any subdomain and route it through Next.js.
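To make the subdomain routing concrete, here is a minimal sketch of how a request's Host header could be mapped to a username once CloudFront forwards it to Next.js. The helper name tenantFromHost and its rules (single label only, apex returns null) are assumptions for illustration, not Storyie's actual routing code.

```typescript
// Hypothetical helper: extract the tenant (username) label from a Host header.
// Returns null for the apex domain, unrelated hosts, and nested labels.
function tenantFromHost(host: string, baseDomain: string): string | null {
  // Drop an optional port and normalize case; hostnames are case-insensitive.
  const hostname = host.split(":")[0].toLowerCase();
  const suffix = "." + baseDomain.toLowerCase();

  // The hostname must end with ".<baseDomain>" to be a subdomain of it.
  if (hostname.slice(-suffix.length) !== suffix) {
    return null;
  }

  const label = hostname.slice(0, hostname.length - suffix.length);
  // A wildcard certificate covers exactly one label, so reject "a.b".
  if (label.length === 0 || label.indexOf(".") !== -1) {
    return null;
  }
  return label;
}

console.log(tenantFromHost("alice.storyie.com", "storyie.com"));
console.log(tenantFromHost("storyie.com", "storyie.com"));
```

A real implementation would also need a list of reserved labels (for example www or staging) that must never be treated as usernames.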

Stage isolation

SST's stage system is one of its strongest features. A single config file generates completely separate infrastructure per stage:

app: async (input) => {
  const stage = input?.stage || "dev";
  const envFile = stage === "production" ? ".env.production" : ".env";

  return {
    name: "storyie",
    removal: input?.stage === "production" ? "retain" : "remove",
    home: "aws",
  };
},

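Two things are worth calling out here:

removal: "retain" is a safety net against accidental destruction. If you ever run sst remove on the production stage, SST keeps the stateful resources (per the SST docs, things like S3 buckets and DynamoDB tables) and removes the rest, so the Lambda functions and CloudFront distribution are not preserved. If you want everything kept, the setting is "retain-all". Note that SST v3 is built on Pulumi, so there is no CloudFormation stack involved. Staging uses "remove" so it cleans itself up.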

Domain branching gives staging its own subdomain with the same wildcard structure:

const domain =
  stage === "production"
    ? {
        name: "storyie.com",
        aliases: ["*.storyie.com"],
        dns: sst.cloudflare.dns(),
      }
    : {
        name: "staging.storyie.com",
        aliases: ["*.staging.storyie.com"],
        dns: sst.cloudflare.dns(),
      };

staging.storyie.com mirrors production's multi-tenant structure. Any feature involving user subdomains can be tested against {username}.staging.storyie.com before it goes live.

CloudFront Function for IP restriction

Staging is restricted by IP. Only our office IPs and personal connections can reach it. We implement this with a CloudFront Function on the viewer-request event. It runs at the edge before the request hits Lambda, so blocked traffic costs nothing beyond the CloudFront request price.

const ipRestrictionCode =
  stage !== "production"
    ? `
var allowedIPs = ["221.246.xxx.xxx", "153.166.xxx.xxx"];
var clientIP = event.viewer.ip;
var uri = event.request.uri;

// Webhook paths come from external services, so they bypass the IP check.
var bypassPaths = ["/api/stripe/webhook"];
var shouldBypass = bypassPaths.some(function (path) {
  // indexOf instead of startsWith keeps this within plain ES5.
  return uri === path || uri.indexOf(path + "?") === 0;
});

if (!shouldBypass && allowedIPs.indexOf(clientIP) === -1) {
  return {
    statusCode: 403,
    statusDescription: "Forbidden",
    headers: { "content-type": { value: "text/html" } },
    body: "Access denied.",
  };
}
`.trim()
    : undefined;

Three things to get right here:

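  1. CloudFront Functions run a restricted JavaScript runtime, and we write ours in plain ES5. Runtime 1.0 is based on ES 5.1 and does not support const or let (runtime 2.0 does), so var and indexOf() are the safe choice on either. Writing unsupported syntax here will either fail silently or throw a CloudFront runtime error that's annoying to debug.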
  2. Webhook paths must be bypassed explicitly. Stripe's webhook events come from Stripe's IP ranges, not ours. Without the bypass, stripe trigger in development and live webhook deliveries both get 403'd.
  3. The injection API takes a code string. SST's edge.viewerRequest.injection injects your code into the CloudFront Function's handler before the return statement:

edge: ipRestrictionCode
  ? {
      viewerRequest: {
        injection: ipRestrictionCode,
      },
    }
  : undefined,
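
Because the injected code only runs inside CloudFront, it can help to keep a local mirror of the same decision logic that you can unit test before deploying. The sketch below is such a mirror under stated assumptions: the function name checkAccess, the event interface, and the IP addresses (taken from the reserved documentation ranges) are hypothetical, and the deployed version remains the ES5 string shown above.

```typescript
// Minimal shape of the viewer-request event fields the check relies on.
interface ViewerRequestEvent {
  viewer: { ip: string };
  request: { uri: string };
}

interface BlockResponse {
  statusCode: number;
  statusDescription: string;
}

// Placeholder addresses from the documentation ranges, not real office IPs.
const ALLOWED_IPS = ["203.0.113.10", "198.51.100.7"];
// Paths called by external services that must skip the IP check.
const BYPASS_PATHS = ["/api/stripe/webhook"];

// Returns a 403 response for blocked requests, or null to let the request through.
function checkAccess(event: ViewerRequestEvent): BlockResponse | null {
  const uri = event.request.uri;
  const shouldBypass = BYPASS_PATHS.some((path) => uri === path);
  if (!shouldBypass && ALLOWED_IPS.indexOf(event.viewer.ip) === -1) {
    return { statusCode: 403, statusDescription: "Forbidden" };
  }
  return null;
}

// An unknown IP is blocked on a normal page but allowed on the webhook path.
console.log(checkAccess({ viewer: { ip: "192.0.2.1" }, request: { uri: "/" } }));
console.log(checkAccess({ viewer: { ip: "192.0.2.1" }, request: { uri: "/api/stripe/webhook" } }));
```

The exact-match comparison is enough here because CloudFront passes the query string separately from request.uri.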

Disabling the cache on OAuth routes

OAuth authorization codes are one-time-use. If CloudFront caches the response from /api/auth/callback, the second visitor (or retry) that hits the cache gets a stale response with an already-consumed code, and authentication breaks.

The fix is an ordered cache behavior that attaches Managed-CachingDisabled specifically to the /api/auth/* path pattern:

const cachingDisabledPolicy = await aws.cloudfront.getCachePolicy({
  name: "Managed-CachingDisabled",
});

const authCacheBehavior = {
  pathPattern: "/api/auth/*",
  viewerProtocolPolicy: "redirect-to-https",
  allowedMethods: ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
  cachedMethods: ["GET", "HEAD"],
  cachePolicyId: cachingDisabledPolicy.id,
  compress: true,
};

SST exposes transform.cdn to reach the underlying Pulumi CloudFront resource. We prepend our behavior ahead of SST's defaults using $resolve to unwrap the Input<T> types:

transform: {
  cdn: (args) => {
    args.orderedCacheBehaviors = $resolve([
      args.orderedCacheBehaviors,
      args.defaultCacheBehavior,
    ]).apply(([existing, defaultBehavior]) => {
      const existingBehaviors = Array.isArray(existing) ? existing : [];
      return [
        {
          ...authCacheBehavior,
          targetOriginId: defaultBehavior.targetOriginId,
          originRequestPolicyId: defaultBehavior.originRequestPolicyId,
          functionAssociations: defaultBehavior.functionAssociations,
        },
        ...existingBehaviors,
      ];
    });
  },
},

The $resolve + .apply() pattern is the correct way to work with Pulumi's Input types in SST. args.defaultCacheBehavior is an Input, which may be an Output whose value is only known at deploy time, so reading its fields directly does not give you the actual values.
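
Why prepend rather than append? CloudFront compares path patterns in the order the cache behaviors are listed and uses the first match. The sketch below simulates that rule in isolation. The resolveBehavior function, the simplified wildcard matching, and the "/api/*" behavior with its policy name are assumptions for illustration, not SST's actual defaults.

```typescript
interface Behavior {
  pathPattern: string;
  cachePolicy: string;
}

// Convert a CloudFront-style pattern ("*" = any run of characters,
// "?" = exactly one character) into an anchored regular expression.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Return the first listed behavior whose pattern matches, else the default.
function resolveBehavior(path: string, ordered: Behavior[], fallback: Behavior): Behavior {
  for (const behavior of ordered) {
    if (patternToRegExp(behavior.pathPattern).test(path)) {
      return behavior;
    }
  }
  return fallback;
}

const auth: Behavior = { pathPattern: "/api/auth/*", cachePolicy: "Managed-CachingDisabled" };
// Hypothetical broader behavior that would shadow the auth rule if listed first.
const broad: Behavior = { pathPattern: "/api/*", cachePolicy: "HypotheticalApiPolicy" };
const fallback: Behavior = { pathPattern: "*", cachePolicy: "DefaultPolicy" };

console.log(resolveBehavior("/api/auth/callback", [auth, broad], fallback).cachePolicy);
console.log(resolveBehavior("/api/auth/callback", [broad, auth], fallback).cachePolicy);
```

With the auth behavior first, the callback resolves to the no-cache policy. With the order reversed, the broader pattern wins and the auth rule never applies.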

Lambda configuration

server: {
  memory: "1024 MB",
  runtime: "nodejs22.x",
  architecture: "arm64",
  timeout: "20 seconds",
},
imageOptimization: {
  memory: "1536 MB",
},

arm64 (AWS Graviton) is meaningfully cheaper, roughly 20% less per GB-second than x86, with equivalent or better performance for Node.js workloads. Unless you depend on native modules that lack arm64 builds, there's little reason not to use it for new deployments.

Image optimization gets more memory than the server function because Next.js's <Image> resize pipeline is memory-hungry. We found 1536 MB eliminates the occasional OOM on large uploaded images; the server function runs fine at 1024 MB.

nodejs22.x is the current LTS runtime at the time of writing. OpenNext keeps up with Node.js releases, so staying on the latest LTS gets you security patches without breaking changes.
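
To put the 20% figure in concrete terms, here is a rough duration-cost calculation. The per-GB-second prices are assumed us-east-1 on-demand rates and the workload numbers are invented, so check the current AWS Lambda pricing page before relying on any of this. Request charges and the free tier are ignored.

```typescript
type Architecture = "x86_64" | "arm64";

// Assumed us-east-1 on-demand prices in USD per GB-second of duration.
const PRICE_PER_GB_SECOND: Record<Architecture, number> = {
  x86_64: 0.0000166667,
  arm64: 0.0000133334,
};

// Duration cost only: memory (GB) x average duration (s) x invocations x price.
function durationCost(
  arch: Architecture,
  memoryMb: number,
  avgDurationMs: number,
  invocations: number
): number {
  const gbSeconds = (memoryMb / 1024) * (avgDurationMs / 1000) * invocations;
  return gbSeconds * PRICE_PER_GB_SECOND[arch];
}

// Hypothetical workload: 1,000,000 invocations at 200 ms on a 1024 MB function.
const x86Cost = durationCost("x86_64", 1024, 200, 1000000);
const armCost = durationCost("arm64", 1024, 200, 1000000);
const savings = 1 - armCost / x86Cost;

console.log(x86Cost.toFixed(2), armCost.toFixed(2), (savings * 100).toFixed(1) + "%");
```

The saving is proportional, so it applies equally to the 1536 MB image optimization function.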

Cron jobs

SST's sst.aws.Cron creates an EventBridge rule with a schedule expression that invokes a Lambda function. All our background jobs live in the same sst.config.ts, next to the web deployment:

| Job | Schedule | Purpose |
| --- | --- | --- |
| PerformanceAggregator | Daily at 02:00 UTC | Aggregate performance metrics |
| ViewAggregator | Every 4 hours at :00 | Count diary views |
| LikeAggregator | Every 4 hours at :30 | Count diary likes |
| TagManager | Every hour | Extract and sync tags from diaries |
| DiaryReminderNotifier | Every 15 minutes | Send diary reminder push notifications |
| WeeklySummaryEmailSender | Sundays at 12:00 UTC | Send weekly summary emails |
| MilestoneEmailSender | Every 4 hours | Detect milestones and send emails |

The 30-minute offset between ViewAggregator and LikeAggregator is deliberate:

// Views at :00
new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",
  // ...
});

// Likes at :30 — staggered to avoid simultaneous DB load
new sst.aws.Cron("LikeAggregator", {
  schedule: "cron(30 */4 * * ? *)",
  // ...
});

Both jobs hit the same database tables. Running them simultaneously would double the instantaneous query load. Offsetting by 30 minutes costs nothing and keeps the DB load smooth.
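
A quick way to sanity-check a stagger is to expand each schedule into its firing times and measure the smallest gap. The sketch below does that for schedules of the form cron(M */N * * ? *). The helper names are hypothetical and it handles only this one schedule shape.

```typescript
// Minutes-of-day at which "cron(<minute> */<everyHours> * * ? *)" fires.
function fireMinutes(minute: number, everyHours: number): number[] {
  const result: number[] = [];
  for (let hour = 0; hour < 24; hour += everyHours) {
    result.push(hour * 60 + minute);
  }
  return result;
}

// Smallest gap in minutes between any firing of job A and any firing of
// job B, treating the day as circular so 23:50 and 00:10 are 20 minutes apart.
function smallestGap(a: number[], b: number[]): number {
  const day = 24 * 60;
  let gap = day;
  for (const x of a) {
    for (const y of b) {
      const diff = Math.abs(x - y);
      gap = Math.min(gap, diff, day - diff);
    }
  }
  return gap;
}

const views = fireMinutes(0, 4); // ViewAggregator: cron(0 */4 * * ? *)
const likes = fireMinutes(30, 4); // LikeAggregator: cron(30 */4 * * ? *)

console.log(smallestGap(views, likes));
```

A gap of zero means two jobs start in the same minute, which is the case the offset is meant to avoid.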

Environment variables: one source of truth

SST passes environment variables to Lambda via the environment property. The principle we follow: stage-specific values live in infra code, not in .env files.

environment: {
  NEXT_PUBLIC_BASE_URL:
    stage === "production"
      ? "https://storyie.com"
      : "https://staging.storyie.com",
  // ... other vars
}

Even if .env.production contains NEXT_PUBLIC_BASE_URL=https://storyie.com, the sst.config.ts value overrides it at deploy time. This matters because .env files and infra code can drift independently. If the URL is defined in both places, one of them will eventually be wrong. Centralizing stage-specific values in sst.config.ts makes it the authoritative source.
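
The precedence rule can be pictured as a simple merge where infra-defined values are applied last. This is an illustrative sketch of the principle only; resolveEnvironment is a hypothetical function and not how SST merges variables internally.

```typescript
type Env = Record<string, string>;

// Merge .env values with stage-specific values defined in infra code.
function resolveEnvironment(stage: string, dotenv: Env): Env {
  const stageValues: Env = {
    NEXT_PUBLIC_BASE_URL:
      stage === "production"
        ? "https://storyie.com"
        : "https://staging.storyie.com",
  };
  // Later spreads win, so infra-defined values override the .env file.
  return { ...dotenv, ...stageValues };
}

// A stale .env value is overridden, while unrelated keys pass through.
const resolved = resolveEnvironment("staging", {
  NEXT_PUBLIC_BASE_URL: "https://storyie.com",
  FEATURE_FLAG: "on",
});
console.log(resolved);
```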

Cache headers

invalidation: {
  paths: "all",
  wait: false,
},
assets: {
  nonVersionedFilesCacheHeader:
    "public,max-age=0,s-maxage=86400,stale-while-revalidate=8640",
  versionedFilesCacheHeader:
    "public,max-age=31536000,immutable",
},

Versioned files (everything under _next/static/) get a one-year browser cache plus immutable. The content hash in the filename guarantees these files never change between builds, so there's no reason to revalidate.

Non-versioned files get no browser cache (max-age=0) but a one-day CDN cache (s-maxage=86400) with a stale-while-revalidate window of 8640 seconds (about 2.4 hours). Browsers revalidate on every request, but the CDN doesn't hammer the origin each time.

wait: false on the invalidation means deploy doesn't block waiting for CloudFront to flush all paths — it kicks off the invalidation and returns. The invalidation finishes in the background, typically within a minute.
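
To see which directive applies to which cache, it can help to split the header into its parts. Below is a minimal Cache-Control parser; parseCacheControl is a hypothetical helper and it ignores quoted values, which these headers do not use.

```typescript
// Parse a Cache-Control header into a directive map. Valueless directives
// such as "public" or "immutable" map to true.
function parseCacheControl(header: string): Record<string, number | true> {
  const directives: Record<string, number | true> = {};
  for (const part of header.split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    if (rawName === "") {
      continue; // skip empty segments such as a trailing comma
    }
    directives[rawName.toLowerCase()] =
      rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

const nonVersioned = parseCacheControl(
  "public,max-age=0,s-maxage=86400,stale-while-revalidate=8640"
);
const versioned = parseCacheControl("public,max-age=31536000,immutable");

console.log(nonVersioned);
console.log(versioned);
```

max-age governs browsers, while s-maxage applies only to shared caches such as CloudFront. That split is what lets the two diverge for non-versioned files.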

Compared to Vercel

To be direct: if your Next.js app doesn't have unusual infrastructure requirements, Vercel is easier. git push deploys, preview environments are automatic, and you don't manage any AWS resources yourself.

We use SST because Storyie needs:

  • Fine-grained CloudFront control: IP restrictions, per-route cache policies, edge functions.
  • Cron jobs in the same codebase: Lambda-based scheduling without a separate service.
  • Multi-tenant wildcard domains: full control over *.storyie.com routing.
  • Cost: AWS charges per usage. Vercel Pro is per-seat regardless of scale.
  • No platform lock-in: AWS infrastructure generalizes; if SST itself is ever a problem, the underlying resources are standard AWS services (Lambda, S3, CloudFront) defined through Pulumi.

The SST downsides are real:

  • Initial stack creation takes 10–15 minutes. First-time provisioning, the CloudFront distribution in particular, is slow.
  • Deploys take 2–5 minutes vs. Vercel's ~30 seconds.
  • OpenNext compatibility lags. New Next.js features sometimes need an OpenNext release before they work correctly on Lambda.
  • Debugging requires CloudWatch. There's no Vercel-style function log UI — you go to CloudWatch Logs.

Takeaways

  1. SST stages give you full environment isolation from a single config. Production and staging diverge only where they need to — domain, env file, removal policy — and share everything else.
  2. CloudFront Functions are the right tool for edge IP restriction. Lightweight, cheap, and they run before Lambda. Write them in ES5 and remember to bypass external webhook paths.
  3. OAuth routes must have caching disabled. Authorization codes are one-time-use; a cached response breaks auth.
  4. Graviton (arm64) is a free cost reduction for Lambda-backed Next.js deployments. Use it by default.
  5. Stagger cron jobs that share a database. Simultaneous batch queries compound unnecessarily.
  6. Stage-specific env vars belong in sst.config.ts, not in .env files. Two sources of truth drift apart eventually.

Try Storyie

Storyie is live at storyie.com — the infrastructure described here is exactly what serves it. If you write a diary on the web and open it on the iOS app, you're seeing the same AWS deployment from two different entry points.