SitecoreAI CMS (XM Cloud) Next.js on GCP Cloud Run, Part 2: ISR with Google Cloud Storage as Distributed Cache

Miguel Minoldo

In Part 1, I got a SitecoreAI Next.js head running on Cloud Run with Cloud Build and Artifact Registry. It worked, but it was pure SSR. Every page hit triggered a full GraphQL fetch to Sitecore's Edge delivery API. Fine for a PoC. Not how you want to run a content site at scale.

The natural next step was ISR (Incremental Static Regeneration): pages built once, cached, served fast, regenerated only when the TTL expires. On Vercel this is a checkbox. On Cloud Run, it turned into a three-day detour through a Redis dead-end before landing on a clean Google Cloud Storage solution.

The problem with ISR on Cloud Run

On Vercel, ISR is automatic. On a container platform, you own the cache infrastructure. Two questions come up immediately:

Does ISR even work on Cloud Run? Cloud Run scales to multiple instances, and each instance has its own local file system. If instance A regenerates a page, instance B doesn't know about it.

If you add a distributed cache, will Next.js use it? Next.js instruments the global fetch to track rendering dependencies, and has done so since the early App Router days. It turns out this matters a lot when you pick your cache backend, and on Next.js 16 it's what killed my first attempt.

The short answer to the first question: yes. Each instance keeps its own local cache, so the first request for a page on a given instance regenerates it, and later requests to that instance are fast. With --min-instances=1 (which we already set in Part 1), one instance stays warm and you get consistent ISR behaviour. Without a shared cache, multiple instances each keep their own copy. That's not ideal for efficiency, since each instance may regenerate the same page independently, but the site still serves correctly.
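To make the per-instance behaviour concrete, here is a minimal simulation. It is a sketch, not Next.js or Cloud Run code: createInstance, countRenders and the Map-based caches are hypothetical stand-ins, and a "render" is just a counter increment.

```javascript
// Simplified model: each "instance" is an object with its own cache Map.
// Nothing here is Next.js or Cloud Run API; all names are hypothetical.
function createInstance(name, sharedCache) {
  const localCache = new Map();
  // Use the shared cache when one is provided, else the instance-local one.
  const cache = sharedCache ?? localCache;
  return {
    name,
    renders: 0,
    handle(path) {
      if (!cache.has(path)) {
        this.renders += 1; // stands in for a full render + GraphQL fetch
        cache.set(path, `<html>${path} rendered by ${name}</html>`);
      }
      return cache.get(path);
    },
  };
}

function countRenders(sharedCache) {
  const a = createInstance('A', sharedCache);
  const b = createInstance('B', sharedCache);
  // The load balancer alternates between the two instances.
  for (const instance of [a, b, a, b]) instance.handle('/mysite/en');
  return a.renders + b.renders;
}

const perInstanceRenders = countRenders(undefined); // local caches only
const sharedRenders = countRenders(new Map()); // one shared store

console.log({ perInstanceRenders, sharedRenders });
```

With local caches each instance renders the page once. With a shared store only the first instance does, which is the gap a distributed cache closes.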

The short answer to the second question: it depends entirely on how your cache client makes HTTP calls. That's where the detour started.

First attempt: Upstash Redis, and why it failed

Redis is the obvious choice for a shared ISR cache. Upstash offers serverless Redis with a free tier, a good Node.js SDK, and an HTTP API that works without persistent connections. It looked like a natural fit for Cloud Run.

I installed @upstash/redis and wired it up through @neshca/cache-handler, a popular wrapper that maps the Next.js cacheHandler interface to various backends. First snag: @neshca/cache-handler@1.9.0 has a peer dependency on next@">=13.5.1 <15", and this project runs Next.js 16. I worked around it with legacy-peer-deps=true for the time being.

Then I deployed. The logs told the story:

Error: Page changed from static to dynamic at runtime /mysite/en, reason: no-store fetch https://<your-instance>.upstash.io/pipeline

Every page was being forced to dynamic rendering. ISR was silently disabled.

What's happening

The @upstash/redis SDK uses the Web fetch API to talk to Upstash over HTTPS. Next.js 16 instruments the global fetch to track caching semantics. When Upstash makes its request with cache: 'no-store' (which it does by design, you don't want Redis responses cached by Next.js), Next.js sees a no-store fetch during page rendering and concludes the page has dynamic data dependencies. It overrides export const revalidate and forces the page to render on every request.

This is not a bug in Upstash or in Next.js. The no-store opt-out behaviour has existed since Next.js 13/14, it's documented, and Next.js 16 still enforces it. But it completely breaks the ISR approach with any fetch-based cache client.

Why this is unfixable with Upstash

The no-store annotation is set deep inside the Upstash SDK. You can't remove it. And even if you could, caching Redis responses inside Next.js would introduce correctness problems. In my testing, Upstash's HTTP-based client triggered Next.js dynamic rendering semantics and disabled ISR. Other fetch-based cache clients may exhibit similar behaviour and should be validated carefully.

To be clear, this is not "Redis doesn't work for ISR". Plenty of teams run Next.js ISR on Redis successfully, with node-redis or ioredis over a persistent TCP connection, which never touches the instrumented global fetch. What fails is specifically the HTTP/fetch-based client. For this setup I didn't go down the TCP route: on Cloud Run that means Memorystore plus a VPC connector, real infrastructure and cost for a PoC, and long-lived TCP connections sit awkwardly with scale-to-zero containers. Which led me to look for a backend that needed no extra infrastructure at all.

Second attempt: Google Cloud Storage, and why it works

The key insight is that Next.js only tracks calls made through the global Web fetch that it patches at runtime. It knows nothing about HTTP calls that bypass it.

The @google-cloud/storage SDK doesn't use the global fetch. Its transport layer (gaxios) goes through node-fetch, a separate module built directly on Node.js's http stack. From Next.js's perspective, the GCS calls are invisible. The page renders with its normal revalidate = 300 TTL intact, and the GCS read/write happens as a side effect that Next.js doesn't track.

This isn't a workaround. It's the correct mental model. The Next.js cache handler interface is designed to be opaque. What matters is that your get() and set() implementations don't go through the instrumented global fetch.
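The difference between the two attempts can be modelled in a few lines. This is a deliberately simplified, synchronous sketch and not Next.js source code: renderState, patchedFetch, rawTransport and renderWith are hypothetical names, the URLs are placeholders, and no network call is made.

```javascript
// Simplified model of fetch instrumentation. NOT Next.js internals:
// every name below is hypothetical and the transport is a stub.
const renderState = { dynamicReason: null };

// Stand-in for an HTTP transport so the sketch runs offline.
function rawTransport(url) {
  return { status: 200, url };
}

// The "global fetch" after patching: same transport, plus tracking.
function patchedFetch(url, init = {}) {
  if (init.cache === 'no-store' && renderState.dynamicReason === null) {
    // A no-store request during rendering marks the page as dynamic.
    renderState.dynamicReason = `no-store fetch ${url}`;
  }
  return rawTransport(url);
}

function renderWith(cacheLookup) {
  renderState.dynamicReason = null;
  cacheLookup(); // the cache handler's get() runs during rendering
  return renderState.dynamicReason === null ? 'static' : 'dynamic';
}

// A cache client built on the patched global fetch (the Upstash case).
const viaPatchedFetch = renderWith(() =>
  patchedFetch('https://example.invalid/pipeline', { cache: 'no-store' }));

// A cache client with its own transport (the GCS SDK case).
const viaOwnTransport = renderWith(() =>
  rawTransport('https://example.invalid/bucket/object'));

console.log({ viaPatchedFetch, viaOwnTransport });
```

Both clients make an HTTP call, but only the one that goes through the patched function is visible to the renderer.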

One honest caveat: this property depends on the SDK's transport, which is a Google implementation detail. There are open requests asking Google to migrate gaxios to the native global fetch. If that ever ships in a future major version, this behaviour changes. Pin your @google-cloud/storage version and re-test the static/dynamic behaviour when you upgrade it.

The architecture

[Figure: Caching and ISR architecture diagram]

All Cloud Run instances share the same GCS bucket. The first instance to render a page wins and writes to GCS. Subsequent requests can reuse the shared cache entry regardless of which instance serves them.

Implementation

1. Enable ISR on the page route

In src/app/[site]/[locale]/[[...path]]/page.tsx, add:

typescript
export const revalidate = 300; // 5 minutes

This is the only change needed in the page component itself. Next.js handles the rest via the cache handler.

One warning: do not wrap the page render in a manual getCachedPage() helper or any custom caching layer. Next.js ISR manages the full-page cache lifecycle. Adding your own layer creates double-caching with unpredictable TTL behaviour.

A production XM Cloud implementation would ideally trigger revalidation from Sitecore publishing events rather than relying exclusively on TTL expiration.

2. Create the GCS cache handler

Create cache-handler.js at the app root (not inside src/):

javascript
const { Storage } = require('@google-cloud/storage');

class GCSCacheHandler {
  constructor(options) {
    this.revalidatedTags = options.revalidatedTags || [];

    // Without a bucket name the handler becomes a no-op (every get() is a miss).
    const bucketName = process.env.GCS_CACHE_BUCKET;
    if (bucketName) {
      this.bucket = new Storage().bucket(bucketName);
    } else {
      console.warn('[cache-handler] GCS_CACHE_BUCKET not set, caching disabled');
    }
  }

  async get(key) {
    if (!this.bucket) return null;

    try {
      const [content] = await this.bucket.file(toObjectName(key)).download();
      const stored = JSON.parse(content.toString('utf8'));

      // Treat entries with a revalidated tag or an expired TTL as misses.
      if (stored.tags?.some((tag) => this.revalidatedTags.includes(tag))) {
        return null;
      }
      if (stored.revalidateAfter && Date.now() > stored.revalidateAfter) {
        return null;
      }

      return { lastModified: stored.lastModified, value: stored.value };
    } catch (err) {
      // A missing object is an ordinary cache miss, not an error.
      if (err.code === 404) return null;
      console.error('[cache-handler] get error:', err.message);
      return null;
    }
  }

  async set(key, data, ctx = {}) {
    if (!this.bucket) return;

    const { revalidate, tags = [] } = ctx;

    try {
      const payload = JSON.stringify({
        value: data,
        lastModified: Date.now(),
        tags,
        // revalidate is in seconds; store an absolute expiry in milliseconds.
        revalidateAfter: revalidate ? Date.now() + revalidate * 1000 : null,
      });

      await this.bucket.file(toObjectName(key)).save(payload, {
        contentType: 'application/json',
      });
    } catch (err) {
      console.error('[cache-handler] set error:', err.message);
    }
  }

  async revalidateTag(tag) {
    // Time-based ISR only, the no-op keeps Next.js happy
    console.log(`[cache-handler] revalidateTag: ${tag}`);
  }
}

// Map a cache key to an object name under the isr/ prefix.
function toObjectName(key) {
  return `isr/${key.replace(/[#?[\]*]/g, '_')}`;
}

module.exports = GCSCacheHandler;
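The no-op revalidateTag limits this handler to time-based ISR. If tag-based invalidation is ever needed (for example, driven by Sitecore publishing events), one possible approach is to store a marker per tag and compare timestamps in get(). The sketch below is an assumption-heavy illustration, not part of the handler: an in-memory Map stands in for the bucket, the tags/<tag> marker layout and the TagAwareStore class are hypothetical, and I have not validated it against how Next.js 16 actually calls revalidateTag.

```javascript
// Sketch only: an in-memory Map stands in for the GCS bucket, and the
// "tags/<tag>" marker layout is an assumption, not part of the real handler.
class TagAwareStore {
  constructor(store = new Map()) {
    this.store = store; // shared by every handler instance
  }

  set(key, value, tags = [], now = Date.now()) {
    this.store.set(`isr/${key}`, { value, tags, lastModified: now });
  }

  // Record when a tag was invalidated; accepts one tag or an array.
  revalidateTag(tagOrTags, now = Date.now()) {
    for (const tag of [tagOrTags].flat()) {
      this.store.set(`tags/${tag}`, { revalidatedAt: now });
    }
  }

  get(key) {
    const entry = this.store.get(`isr/${key}`);
    if (!entry) return null;
    // Miss if any tag was invalidated after the entry was written.
    const stale = entry.tags.some((tag) => {
      const marker = this.store.get(`tags/${tag}`);
      return marker !== undefined && marker.revalidatedAt >= entry.lastModified;
    });
    return stale ? null : entry;
  }
}

const cache = new TagAwareStore();
cache.set('/mysite/en', '<html>v1</html>', ['home'], 1000);
const beforePublish = cache.get('/mysite/en'); // entry is returned
cache.revalidateTag('home', 2000); // e.g. fired by a publish webhook
const afterPublish = cache.get('/mysite/en'); // null, forces a re-render
console.log(beforePublish !== null, afterPublish);
```

Because the markers live in the shared store rather than in instance memory, every instance would see the invalidation on its next lookup.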

Why a direct implementation, not @neshca/cache-handler? The wrapper declares a peer dependency on next@">=13.5.1 <15" and this project uses Next.js 16. It's dead weight. The Next.js cacheHandler interface is simple enough to implement directly in ~70 lines, with no dependency conflicts.

Why plain JavaScript, not TypeScript? next.config.ts only passes the path to cache-handler.js (resolved with path.resolve()); Next.js loads the file itself at runtime. It expects a plain Node.js module it can load directly: CommonJS and .mjs both work, TypeScript does not. I kept CommonJS for simplicity.

Authentication. No credentials in code. Cloud Run uses the default service account via Application Default Credentials (ADC), and the Storage() constructor picks this up automatically. Locally, run gcloud auth application-default login and set GCS_CACHE_BUCKET in .env.local.

3. Update next.config.ts

typescript
import path from 'path';
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  output: 'standalone',

  // Only use the GCS handler in production builds.
  cacheHandler:
    process.env.NODE_ENV === 'production'
      ? path.resolve('./cache-handler.js')
      : undefined,
  cacheMaxMemorySize: 0, // disable in-memory cache, GCS is the source of truth

  // ... rest of config
};

export default nextConfig;

For this implementation I disabled the in-memory cache (cacheMaxMemorySize: 0) so GCS remains the single source of truth across instances. Leaving it enabled would defeat the purpose of the shared cache: an instance would serve stale in-memory content even after GCS has been updated.
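The staleness risk of an in-memory layer in front of a shared store can be shown with a small two-tier model. This is a sketch under stated assumptions: the Map called sharedStore stands in for the GCS bucket, memoryEnabled loosely mirrors a non-zero cacheMaxMemorySize, and none of it is Next.js internals.

```javascript
// Simplified two-tier model; names are hypothetical, not Next.js internals.
const sharedStore = new Map(); // stands in for the GCS bucket

function createInstance({ memoryEnabled }) {
  const memory = new Map();
  return {
    read(key) {
      // The memory layer answers first when it is enabled and warm.
      if (memoryEnabled && memory.has(key)) return memory.get(key);
      const value = sharedStore.get(key);
      if (memoryEnabled) memory.set(key, value);
      return value;
    },
    write(key, value) {
      sharedStore.set(key, value);
      if (memoryEnabled) memory.set(key, value);
    },
  };
}

const writer = createInstance({ memoryEnabled: false });
const withMemory = createInstance({ memoryEnabled: true });
const withoutMemory = createInstance({ memoryEnabled: false });

writer.write('/mysite/en', 'v1');
withMemory.read('/mysite/en'); // warms the in-memory layer with v1
withoutMemory.read('/mysite/en');
writer.write('/mysite/en', 'v2'); // another instance regenerates the page

const staleRead = withMemory.read('/mysite/en');
const freshRead = withoutMemory.read('/mysite/en');
console.log({ staleRead, freshRead });
```

The instance with the memory layer keeps answering from its own copy, while the one without it always sees what the shared store holds.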

Good to know for Next.js 16: cacheHandler (singular) covers ISR, route handler responses, and optimized images. It is not used by 'use cache' directives. If you adopt Cache Components later, those go through the separate cacheHandlers (plural) config. Two different systems, easy to confuse.

4. Update the Dockerfile

The standalone output doesn't automatically include files outside src/. You need to explicitly copy cache-handler.js into the runner stage:

dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
# ... install, build steps ...
RUN npm run docker:build

FROM node:22-alpine AS runner
WORKDIR /app
# ... copy standalone output ...
COPY --from=builder --chown=nextjs:nodejs /app/cache-handler.js ./
# ^^^^ this line is the only addition

USER nextjs
CMD ["node", "server.js"]

If you forget this, the container still starts and serves traffic. The only sign is a Cannot find module './cache-handler.js' line in the logs, while Next.js falls back to per-instance file-system caching. More on that in the issues section.

5. GCP setup

Create the bucket:

bash
gcloud storage buckets create gs://sitecore-isr-cache \
  --project=YOUR_PROJECT_ID \
  --location=europe-west1 \
  --uniform-bucket-level-access

Grant Cloud Run's service account read/write access:

bash
# Get your project number
gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)"

gcloud storage buckets add-iam-policy-binding gs://sitecore-isr-cache \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

roles/storage.objectAdmin gives create, read, update, and delete on objects. Don't be tempted to scope down to roles/storage.objectCreator: it grants create only. The handler's get() would fail on every lookup for lack of read access, and revalidation would fail as well, because it overwrites the same object key and overwriting in GCS requires delete permission. The cache would write and never hit, silently. If you want something tighter than objectAdmin, roles/storage.objectUser (read/write without ACL management) is the minimum that works.
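The role comparison can be written down as a small lookup. This is a sketch for reasoning only: the permission lists are a simplified subset limited to what this handler touches (the real roles carry more permissions), and missingPermissions is a hypothetical helper, not a Google Cloud API.

```javascript
// Simplified subset of the object permissions in each predefined role,
// limited to the ones this cache handler touches.
const rolePermissions = {
  'roles/storage.objectCreator': ['storage.objects.create'],
  'roles/storage.objectViewer': ['storage.objects.get', 'storage.objects.list'],
  'roles/storage.objectUser': [
    'storage.objects.create',
    'storage.objects.delete',
    'storage.objects.get',
    'storage.objects.list',
    'storage.objects.update',
  ],
};

// What the handler's operations need, following the paragraph above.
const requiredByHandler = {
  get: ['storage.objects.get'], // download()
  set: ['storage.objects.create', 'storage.objects.delete'], // save() over an existing key
};

// Hypothetical helper: which required permissions does a role lack?
function missingPermissions(role) {
  const granted = new Set(rolePermissions[role] ?? []);
  return Object.values(requiredByHandler)
    .flat()
    .filter((permission) => !granted.has(permission));
}

const creatorGaps = missingPermissions('roles/storage.objectCreator');
const userGaps = missingPermissions('roles/storage.objectUser');
console.log({ creatorGaps, userGaps });
```

An empty list means the role covers both get() and set(); anything else is a permission the handler would trip over at runtime.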

6. Pass GCS_CACHE_BUCKET to Cloud Run

In cloudbuild.yaml, add GCS_CACHE_BUCKET to the --set-env-vars flag on the gcloud run deploy step:

yaml
- '--set-env-vars=...,GCS_CACHE_BUCKET=${_GCS_CACHE_BUCKET}'

And in the substitutions section:

yaml
substitutions:
  _GCS_CACHE_BUCKET: 'sitecore-isr-cache'

This is not a secret, it's just a bucket name. No need to put it in Secret Manager.

The Issues

Same drill as Part 1. This is the part that cost me the three days.

Issue 1: @neshca/cache-handler peer dep conflict with Next.js 16

Error:

npm error ERESOLVE could not resolve @neshca/cache-handler@1.9.0 peer next@">= 13.5.1 < 15"

What's happening: The wrapper library has not been updated for Next.js 15/16. The ERESOLVE blocks the Docker build.

Fix: Remove @neshca/cache-handler from package.json. Implement the cacheHandler class interface directly (see above). The interface is three methods: get, set, revalidateTag. No wrapper needed. (If you do want a maintained wrapper, the community successor is @fortedigital/nextjs-cache-handler, which targets Next.js 15+, but for this use case the direct implementation is smaller than the integration code would be.)

Issue 2: Page changed from static to dynamic with Upstash Redis

Error (Cloud Run logs):

Error: Page changed from static to dynamic at runtime /mysite/en, reason: no-store fetch https://<your-instance>.upstash.io/pipeline

What's happening: The Upstash SDK uses the Web fetch API with cache: 'no-store'. Next.js 16 instruments fetch globally to track rendering dependencies. A no-store fetch during page rendering forces the page to dynamic rendering, overriding export const revalidate.

Fix: Switch from the fetch-based Upstash client (or any cache client built on the global fetch) to a backend whose SDK bypasses it. GCS qualifies: the Google Cloud Storage SDK @google-cloud/storage does not appear to use the instrumented global fetch that Next.js tracks for rendering decisions.

Issue 3: GCS_CACHE_BUCKET not set on Cloud Run despite the env var being configured

What's happening: The handler logs GCS_CACHE_BUCKET not set even though it appears in the Cloud Run service configuration.

What to check:

  1. The env var is set on the Cloud Run service, not just in Cloud Build substitutions.
  2. The deployment step in cloudbuild.yaml includes GCS_CACHE_BUCKET in --set-env-vars. Cloud Build substitution values do not automatically become Cloud Run env vars, they must be explicitly forwarded.

This is the same class of mistake as Issue 6 in Part 1: provisioning that silently doesn't apply what you think it applied.
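One way to make this mistake loud instead of silent is to fail fast when the variable is missing in production. The handler above only warns, so this is an alternative, not what the post's code does. resolveCacheBucket and its error message are hypothetical.

```javascript
// Hypothetical startup check; the function name and message are illustrative.
function resolveCacheBucket(env = process.env) {
  const bucket = (env.GCS_CACHE_BUCKET || '').trim();
  if (bucket) return bucket;

  if (env.NODE_ENV === 'production') {
    // Fail fast instead of silently running without the shared cache.
    throw new Error('GCS_CACHE_BUCKET is not set on this Cloud Run revision');
  }
  return null; // local development: caching stays disabled
}

// Example with explicit env objects, so nothing depends on the real environment.
const configured = resolveCacheBucket({
  NODE_ENV: 'production',
  GCS_CACHE_BUCKET: 'sitecore-isr-cache',
});
const localDev = resolveCacheBucket({ NODE_ENV: 'development' });
console.log({ configured, localDev });
```

Called from the handler's constructor, a throw like this would surface the misconfiguration in the Cloud Run logs on the first request rather than leaving the cache quietly disabled. Whether a hard failure is preferable to a degraded site is a judgment call.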

Issue 4: Cannot find module './cache-handler.js' on Cloud Run

What's happening: cache-handler.js was not copied into the runner stage of the Docker build. The output: 'standalone' build only includes what's under .next/standalone. Files in the app root are not included automatically.

Fix: Add this line to the Dockerfile runner stage:

dockerfile
COPY --from=builder --chown=nextjs:nodejs /app/cache-handler.js ./

The dangerous part is that this fails silently. The container starts, the site serves, ISR appears to work. It's just falling back to per-instance file-system caching, and your distributed cache does nothing.
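A cheap guard against this is a check that the file is actually present in the image, run in CI or just before starting the server. The sketch below is a minimal version of that idea. handlerFileStatus is a hypothetical helper, and it assumes cache-handler.js is expected in the working directory (/app in the Dockerfile above).

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical pre-start check; appDir defaults to the working directory.
function handlerFileStatus(appDir = process.cwd()) {
  const handlerPath = path.join(appDir, 'cache-handler.js');
  return { handlerPath, present: fs.existsSync(handlerPath) };
}

const status = handlerFileStatus();
if (!status.present) {
  console.error(
    `[startup] missing ${status.handlerPath}, the shared ISR cache will not load`
  );
}
```

Running a check like this inside the built image turns a silent fallback into a visible log line or, if you choose to exit non-zero, a failed deploy.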

Validating the cache

Once deployed, verify that ISR is writing to GCS.

GCP Console: Cloud Storage → Buckets → your bucket → isr/ folder. You should see one object per cached page path, named after the URL path with special characters replaced by _.
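To predict what an object will be called, you can run the handler's key mapping on its own. The function below is copied from cache-handler.js so the snippet runs standalone. The sample keys are made up for illustration and are not taken from a real deployment.

```javascript
// Same mapping as toObjectName in cache-handler.js, repeated here so the
// snippet runs on its own. The sample keys are hypothetical.
function toObjectName(key) {
  return `isr/${key.replace(/[#?[\]*]/g, '_')}`;
}

const examples = ['/mysite/en', '/search?q=[a]*', 'plain-key'];
for (const key of examples) {
  console.log(`${key} -> ${toObjectName(key)}`);
}
```

Note that / is not in the replaced set, so a key containing slashes keeps them and shows up as nested prefixes in the bucket browser. Use the listing to find the exact object names in your bucket.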

gcloud CLI:

bash
gcloud storage ls gs://sitecore-isr-cache/isr/

Check a cached object (substitute an object name from the listing above):

bash
gcloud storage cat gs://sitecore-isr-cache/isr/mysite_en \
  | python -m json.tool | head -30

You should see a JSON object with value, lastModified, tags, and revalidateAfter fields. The revalidateAfter timestamp should be ~5 minutes in the future from when the page was last rendered.
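To sanity-check the revalidateAfter field of a downloaded object, the expiry rule from the handler's get() can be applied by hand. A minimal sketch follows. describeEntry is a hypothetical helper and the sample payload is fabricated for illustration, using the 300-second TTL from this post.

```javascript
// Mirrors the expiry rule in the handler's get(); describeEntry is a
// hypothetical helper and the sample entry below is made up.
function describeEntry(stored, now = Date.now()) {
  if (!stored.revalidateAfter) return { state: 'no-ttl', secondsLeft: null };
  const expired = now > stored.revalidateAfter; // same comparison as get()
  const secondsLeft = Math.ceil((stored.revalidateAfter - now) / 1000);
  return { state: expired ? 'expired' : 'fresh', secondsLeft };
}

const renderedAt = 1_700_000_000_000; // arbitrary fixed timestamp (ms)
const sample = {
  value: {},
  lastModified: renderedAt,
  tags: [],
  revalidateAfter: renderedAt + 300 * 1000, // revalidate = 300
};

const oneMinuteIn = describeEntry(sample, renderedAt + 60_000);
const afterTtl = describeEntry(sample, renderedAt + 301_000);
console.log({ oneMinuteIn, afterTtl });
```

A fresh entry should show a positive number of seconds left. An expired one is what the handler treats as a miss on the next request.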

Confirm ISR is working (not SSR):

  1. Load a page and note the response time (~200–500ms on first render).
  2. Reload immediately. It should be much faster (~20–50ms from a GCS cache hit).
  3. Check the Cloud Run logs. You should not see a new Sitecore Edge API call on the second load.
  4. Wait 5+ minutes and reload again. You should see a fresh Sitecore Edge call as the page revalidates.

What This Actually Teaches You

ISR is a rendering strategy, not a deployment feature. Vercel makes it feel like a platform capability, but it's a Next.js runtime behaviour. It works anywhere you can run node server.js, as long as you provide the right cacheHandler implementation.

Next.js's fetch tracking decides which cache backends are viable. Any backend whose SDK uses the patched global fetch (Upstash, most HTTP-based stores) will silently disable ISR for your pages. Use backends whose SDKs bypass it: GCS (via node-fetch), S3 (Node http handler), ioredis over TCP, filesystem. This is underdocumented, version-dependent, and the single most important takeaway of this post. When you upgrade a cache SDK, re-check that your pages are still static.

The cacheHandler interface is intentionally simple. Three methods. You don't need a wrapper library. Writing a direct implementation gives you full control, removes peer dependency risks, and is easy to debug.

GCS as a cache is not "optimal", it's pragmatic. GCS is not an in-memory store. Latency is higher than Redis. For a Sitecore content site where pages are cached for minutes at a time, the GCS read latency (~50–100ms) on a cache hit is acceptable. If sub-10ms cache reads matter to you, Redis is still the right tool, but use an SDK that doesn't go through the global fetch (e.g., ioredis over TCP, not Upstash's HTTP client).

Cloud Run with --min-instances=1 is a viable ISR host. One warm instance gives you consistent ISR behaviour. Multiple instances share the GCS cache. Scale-up causes brief ISR duplication (two instances may independently regenerate the same page), but correctness is maintained.

Silent fallbacks are the real enemy. Two of the four issues here (the missing env var and the missing COPY line) fail without any error at request time. The site works, the cache just isn't shared. Validate the bucket contents, don't trust the response times alone.

Final file structure

[Figure: Next.js app on Cloud Run with GCS ISR cache]

The Bottom Line

In Part 1 the friction came from Vercel assumptions in the starter kit. Here, the friction comes from Vercel assumptions in the ecosystem around Next.js caching. The tooling everyone reaches for (Upstash, @neshca/cache-handler) was built for a world where the platform manages ISR for you, and it quietly breaks under Next.js 16's fetch tracking.

The fix is not clever. It's a 70-line CommonJS file, a GCS bucket, and one COPY line in the Dockerfile. The hard part was understanding why Redis failed, because the failure mode (every page going dynamic) looks nothing like a caching problem.

What's Next

With ISR and distributed caching working, the next layer is the edge CDN I mentioned at the end of Part 1. The current setup regenerates pages in Cloud Run (europe-west1), so users far from that region still pay the 50–100ms GCS read. Putting a CDN in front (Fastly, Cloudflare, or Cloud CDN) would serve cached responses from points of presence closer to users.

The caveat: HTML caching at the CDN layer conflicts with Sitecore Personalize if you're serving variant pages per user. The model I'm leaning towards is CDN for static assets (_next/static/*) and origin shielding, while ISR handles the HTML cache at the Cloud Run layer. That's the setup I'm evaluating next, and most likely Part 3.
