GCP + Cloud Run + SitecoreAI

Deploying SitecoreAI CMS (XM Cloud) Next.js to Google GCP Cloud Run

Miguel Minoldo

The SitecoreAI CMS (formerly XM Cloud) starter kits are built for Vercel/Netlify. That's fine. But if your organisation runs on GCP, you're going to hit a wall (several of them, in sequence). I had some fun working through them myself while preparing a PoC for one of our customers.

This post documents the full path: the architecture decisions, the pipeline, and every concrete error we hit deploying a Next.js SitecoreAI head to Cloud Run using Cloud Build and Artifact Registry.

No "hello world" framing. Just what actually happened and why.

The architecture

Three decisions shaped everything.

No static generation. Cloud Run is a request-time SSR platform, so generateStaticPaths: false goes in sitecore.config.ts. There's no value in pre-generating paths at build time, and it would fail anyway (more on that below).

Secrets at runtime, not in the image. SITECORE_EDGE_CONTEXT_ID and other secrets are injected by Cloud Run from Secret Manager. The image contains no credentials. One exception: NEXT_PUBLIC_* variables must be baked in at next build time because Next.js replaces them with literal values in the client bundle. Runtime config has no effect on those.
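To make the compile-time behaviour concrete, here is a toy model of what the bundler does with NEXT_PUBLIC_* references (entirely my own sketch, not Next.js source): they are replaced with literal strings during next build, so changing the variable at runtime can never affect the client bundle.

```typescript
// Toy model of Next.js build-time inlining (illustrative only — the real
// bundler does this via DefinePlugin-style substitution at `next build`).
function inlinePublicEnv(source: string, env: Record<string, string | undefined>): string {
  return source.replace(
    /process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g,
    (_match, name: string) => JSON.stringify(env[name] ?? '')
  );
}

const bundled = inlinePublicEnv(
  'const id = process.env.NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID;',
  { NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID: 'ctx-123' }
);
// `bundled` now contains the literal string "ctx-123"; the env var itself
// is gone from the output, which is why runtime config cannot change it.
```

Unprefixed variables are left untouched by this substitution, which is exactly why they stay server-only and can keep living in Secret Manager.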

sitecore-tools:build runs outside Docker. This SDK step makes live calls to the Edge API to generate the .sitecore/ files (sites.json, metadata.json, and the import maps). It can't run inside a Docker build context, where secrets are reasonably not available as build-args. It runs before the Docker build, and the output files travel with the source into the Cloud Build workspace.

GCP Setup

I used Cloud Shell, which conveniently comes with a VS Code-style IDE.

GCP Cloud Shell console + IDE

Creating a Google Cloud Project

  • Create a Google Cloud project (replace miguel-sitecore-poc with your own project ID throughout):
    • gcloud projects create miguel-sitecore-poc
  • Verify that billing is enabled for the project.
  • Select the project you just created:
    • gcloud config set project miguel-sitecore-poc
  • Enable the Cloud Run Admin and Cloud Build APIs:
    • gcloud services enable run.googleapis.com cloudbuild.googleapis.com
    • After the Cloud Run Admin API is enabled, the Compute Engine default service account is created automatically.

Grant the Cloud Build service account access to your project

Cloud Build automatically uses the Compute Engine default service account as the default Cloud Build service account to build your source code and deploy your Cloud Run resource, unless you override this behavior.

For Cloud Build to build your sources, grant the Cloud Build service account the Cloud Run Builder (roles/run.builder) role on your project:

gcloud projects add-iam-policy-binding miguel-sitecore-poc \
--member=serviceAccount:SERVICE_ACCOUNT_EMAIL_ADDRESS \
--role=roles/run.builder

Replace miguel-sitecore-poc with your Google Cloud project ID and SERVICE_ACCOUNT_EMAIL_ADDRESS with the email address of the Cloud Build service account. If you're using the Compute Engine default service account as the Cloud Build service account, then use the following format for the service account email address:

PROJECT_NUMBER-compute@developer.gserviceaccount.com

Replace PROJECT_NUMBER with your Google Cloud project number.

For detailed instructions on how to find your project ID and project number, see Creating and managing projects.

Granting the Cloud Run builder role takes a couple of minutes to propagate.


Finally, I used the IDE's GitHub integration to clone my SUGCON project (Content SDK v2.0 vanilla).

Artifact Registry repository, Secret Manager secrets, and IAM bindings

Run this once per environment:

gcloud artifacts repositories create sitecore-heads \
--repository-format=docker \
--location=europe-west1 \
--project=YOUR_PROJECT_ID

Create the secrets from environment variables:

echo -n "$SITECORE_EDGE_CONTEXT_ID" | \
gcloud secrets create sitecore-edge-context-id --data-file=- --project=YOUR_PROJECT_ID

echo -n "$SITECORE_EDITING_SECRET" | \
gcloud secrets create sitecore-editing-secret --data-file=- --project=YOUR_PROJECT_ID

echo -n "$REVALIDATE_SECRET" | \
gcloud secrets create revalidate-secret --data-file=- --project=YOUR_PROJECT_ID

echo -n "$FASTLY_API_TOKEN" | \
gcloud secrets create fastly-api-token --data-file=- --project=YOUR_PROJECT_ID

Grant the Compute default service account access to every secret:

PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)")
COMPUTE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

for SECRET in sitecore-edge-context-id sitecore-editing-secret fastly-api-token revalidate-secret; do
gcloud secrets add-iam-policy-binding $SECRET \
--project=YOUR_PROJECT_ID \
--member="serviceAccount:${COMPUTE_SA}" \
--role="roles/secretmanager.secretAccessor"
done

Do not skip the loop. More on what happens when you do (Issue 6).

The Dockerfile

Two-stage build. Builder runs next build. Runner uses the standalone output.

FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
RUN npm install

COPY . .

ARG SITECORE_EDGE_CONTEXT_ID
ENV SITECORE_EDGE_CONTEXT_ID=$SITECORE_EDGE_CONTEXT_ID

ARG NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID
ENV NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID=$NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID

RUN npm run docker:build
RUN mkdir -p /app/public

FROM node:22-alpine AS runner
WORKDIR /app

ENV NODE_ENV=production
ENV PORT=3000

RUN addgroup --system --gid 1001 nodejs \
&& adduser --system --uid 1001 nextjs

COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public

USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]

The docker:build script runs only next build, not sitecore-tools:build:

"docker:build": "cross-env NODE_ENV=production next build"

The RUN mkdir -p /app/public line will look unnecessary until it isn't. See Issue 5.

The Cloud Build Pipeline

steps:
  - name: 'node:22-alpine'
    entrypoint: npm
    args: ['install']
    dir: '${_HEAD_APP_DIR}'
    id: install

  - name: 'node:22-alpine'
    entrypoint: npm
    args: ['run', 'sitecore-tools:build']
    dir: '${_HEAD_APP_DIR}'
    secretEnv: ['SITECORE_EDGE_CONTEXT_ID']
    id: sitecore-tools
    waitFor: ['install']

  - name: 'gcr.io/cloud-builders/docker'
    secretEnv: ['SITECORE_EDGE_CONTEXT_ID', 'NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID']
    args:
      - build
      - '--build-arg'
      - 'SITECORE_EDGE_CONTEXT_ID'
      - '--build-arg'
      - 'NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID'
      - '-t'
      - '${_REGION}-docker.pkg.dev/${_PROJECT_ID}/${_REPO_NAME}/${_SERVICE_NAME}:${_IMAGE_TAG}'
      - '${_HEAD_APP_DIR}'
    id: docker-build
    waitFor: ['sitecore-tools']

  - name: 'gcr.io/cloud-builders/docker'
    args: [push, '--all-tags', '${_REGION}-docker.pkg.dev/${_PROJECT_ID}/${_REPO_NAME}/${_SERVICE_NAME}']
    id: docker-push
    waitFor: ['docker-build']

  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - run
      - deploy
      - '${_SERVICE_NAME}'
      - '--image=${_REGION}-docker.pkg.dev/${_PROJECT_ID}/${_REPO_NAME}/${_SERVICE_NAME}:${_IMAGE_TAG}'
      - '--region=${_REGION}'
      - '--platform=managed'
      - '--allow-unauthenticated'
      - '--port=3000'
      - '--min-instances=1'
      - '--max-instances=10'
      - '--memory=1Gi'
      - '--set-env-vars=SITECORE_SITE_NAME=${_SITECORE_SITE_NAME},NEXT_PUBLIC_DEFAULT_LANGUAGE=${_DEFAULT_LANGUAGE},SITECORE_INTERNAL_EDITING_HOST_URL=http://localhost:3000'
      - '--set-secrets=SITECORE_EDGE_CONTEXT_ID=sitecore-edge-context-id:latest,SITECORE_EDITING_SECRET=sitecore-editing-secret:latest,FASTLY_API_TOKEN=fastly-api-token:latest,REVALIDATE_SECRET=revalidate-secret:latest'
    id: deploy
    waitFor: ['docker-push']

availableSecrets:
  secretManager:
    - versionName: projects/${_PROJECT_ID}/secrets/sitecore-edge-context-id/versions/latest
      env: SITECORE_EDGE_CONTEXT_ID
    - versionName: projects/${_PROJECT_ID}/secrets/sitecore-edge-context-id/versions/latest
      env: NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID

options:
  logging: CLOUD_LOGGING_ONLY
  machineType: 'E2_HIGHCPU_8'

Note the same secret mapped to two env names in availableSecrets. That's deliberate. The context ID needs to reach the client bundle as NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID and the server-side config as SITECORE_EDGE_CONTEXT_ID. Same value, two names, one secret version.

The Results

Build + Deploy successfully finished

Then I created a new external Editing Host to check that everything was working as expected (spoiler: it wasn’t, see issues 7 & 8).

New external Rendering Host pointing to the GCP app
Assign the new RH to the site
Website deployed to GCP Cloud Run
Pages Builder fully functional with the GCP external RH

The Issues

This is the part that cost me actual time.

Issue 1: sitecore-tools:build fails inside Docker

Error: Configuration error: provide either Edge contextId

What's happening: sitecore-tools:build makes live calls to the Sitecore Experience Edge API to build the .sitecore/ folder — sites.json, metadata.json, import-map.server.ts, import-map.client.ts, component-map.client.ts. It needs a real, valid SITECORE_EDGE_CONTEXT_ID. Inside a Docker build context, you don't want secrets as build-args — they end up in image layers and in docker history.

Fix: Run it outside Docker, before the image build. The generated files ride along with the source into the Cloud Build workspace. The Dockerfile only runs next build.

# build.ps1
npm --prefix $HeadAppDir install
npm --prefix $HeadAppDir run sitecore-tools:build # live Edge API here
gcloud builds submit $HeadAppDir --config=cloudbuild.build-only.yaml

Issue 2: Module not found: Can't resolve '.sitecore/import-map.server'

Error: Module not found: Can't resolve '.sitecore/import-map.server'

What's happening: .sitecore/ is in .gitignore. gcloud builds submit defaults to .gitignore rules, so the files you just generated locally never reach the Cloud Build workspace. The Docker build runs with the source missing those generated files entirely.

Fix: Create a .gcloudignore inside the app directory — not the repo root. gcloud builds submit <dir> picks it up from the submitted directory. Just omit .sitecore/ from the exclusions:

# .gcloudignore
node_modules/
.next/
.git/
.env*
*.md

.gitignore still excludes .sitecore/ for version control. .gcloudignore lets it through to the build. Two separate concerns, two separate files.

Issue 3: SDK validation throws during next build

Error: Configuration error: provide either Edge contextId at validateApiConfiguration (...)

This one hits during Next.js "Collecting page data", after both previous issues are resolved.

What's happening: sitecore.config.ts uses defineConfig, which sets up a lazy getter for api.edge.contextId. When Next.js spawns worker processes for static generation, those workers reimport modules and the SITECORE_EDGE_CONTEXT_ID Docker ENV is not reliably available in them. Additionally, not-found.tsx calls client.getErrorPage() at module-init time, and Next.js tries to statically render the 404 page at build time.

Fix: sitecore.config.ts: Guard the context ID with a NEXT_PHASE check:

const contextId =
  process.env.SITECORE_EDGE_CONTEXT_ID ||
  process.env.NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID ||
  (process.env.NEXT_PHASE === 'phase-production-build' ? '__build_phase__' : '');

const clientContextId = process.env.NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID || contextId;

export default defineConfig({
  api: { edge: { contextId, clientContextId } },
  generateStaticPaths: false,
});

NEXT_PHASE is set to phase-production-build by Next.js during next build. The placeholder value satisfies the SDK validator. It will never be used at request time.
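Because the guard is pure string logic, it can be exercised outside Next.js entirely. A standalone sketch (the function name is mine; the logic mirrors the config above):

```typescript
// Standalone version of the contextId guard from sitecore.config.ts.
type Env = Record<string, string | undefined>;

function resolveContextId(env: Env): string {
  return (
    env.SITECORE_EDGE_CONTEXT_ID ||
    env.NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID ||
    // Only during `next build` do we fall back to a placeholder that
    // satisfies the SDK validator without a live Edge connection.
    (env.NEXT_PHASE === 'phase-production-build' ? '__build_phase__' : '')
  );
}

// During `next build` with no secrets available:
const atBuild = resolveContextId({ NEXT_PHASE: 'phase-production-build' });
// At request time on Cloud Run, the injected secret wins:
const atRuntime = resolveContextId({ SITECORE_EDGE_CONTEXT_ID: 'real-context-id' });
```

Note that outside the build phase, a missing secret still resolves to an empty string, so misconfiguration fails visibly at request time instead of being papered over.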

Fix: not-found.tsx: Force it to server-render, not statically generate:

export const dynamic = 'force-dynamic';

Issue 4: GraphQL Error 404: "No sitecore context" in generateStaticParams

Error: GraphQL Error (Code: 404): {"response":{"errors":[{"message":"No sitecore context"}]}}

What's happening: Even with the NEXT_PHASE guard satisfying the SDK validator, Next.js still calls generateStaticParams for the catch-all [...path] route. That function calls client.getAppRouterStaticParams(), which hits the live Edge API with the __build_phase__ placeholder — which is not a valid context ID, which returns a 404.

Fix: generateStaticPaths: false in sitecore.config.ts. The Content SDK checks this flag before making the Edge API call. It's also semantically correct — Cloud Run is an SSR deployment. There's no CDN pre-rendering, so static path generation serves no purpose here.
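A simplified model of why the flag matters (my own sketch, not SDK source): with static path generation disabled, the Edge API is never contacted at build time, so the placeholder context ID never reaches it.

```typescript
// Toy model: the flag short-circuits before any Edge API call is attempted.
function staticParams(
  generateStaticPaths: boolean,
  fetchPathsFromEdge: () => string[]
): string[] {
  if (!generateStaticPaths) {
    return []; // SSR-only: every path resolves at request time
  }
  return fetchPathsFromEdge(); // with '__build_phase__' this would 404
}

let edgeCalled = false;
const params = staticParams(false, () => {
  edgeCalled = true;
  return ['/home'];
});
// `params` is empty and `edgeCalled` stays false.
```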

Issue 5: COPY failed: stat app/public: file does not exist

Error: COPY failed: stat app/public: file does not exist

This one lands at Docker build step 16 of 19, after a fully successful next build. Painful.

What's happening: The standard Next.js standalone Dockerfile template includes COPY public ./public. Most apps have a public/ directory for static assets. This project doesn't. Docker's COPY has no conditional syntax; it fails hard if the source doesn't exist.

Fix: Add RUN mkdir -p /app/public in the builder stage after next build. The directory always exists for the runner stage to copy from:

RUN npm run docker:build
RUN mkdir -p /app/public

Issue 6: Permission denied on secret: fastly-api-token at Cloud Run revision creation

Error: (gcloud.run.deploy) spec.template.spec.containers[0].env[5].value_from.secret_key_ref.name:
Permission denied on secret: projects/.../secrets/fastly-api-token/versions/latest for Revision service account ...-compute@developer.gserviceaccount.com.

What's happening: The provisioning script skips any secret whose corresponding environment variable wasn't set when the script ran. FASTLY_API_TOKEN wasn't set in the shell at the time, so the IAM binding for the Compute service account was silently never applied. Cloud Run fails at revision creation (not at deploy time) which delays the feedback loop by a full build cycle.

Fix:

gcloud secrets add-iam-policy-binding fastly-api-token \
--project=YOUR_PROJECT_ID \
--member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"

Prevention: Decouple "create secret value" from "grant access". Always run the IAM binding step against existing secrets regardless of whether you're updating the value. Or grant secretmanager.secretAccessor at the project level if your security posture allows it. Either way, provisioning scripts should fail loudly when values are missing, not skip silently.
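A small guard along those lines (the helper name and shape are mine, not from the actual provisioning script): abort loudly when a secret value is absent, instead of silently skipping and leaving the IAM binding unapplied.

```typescript
// Hypothetical provisioning guard: a missing secret value should stop the
// script immediately, not silently skip the secretAccessor binding (Issue 6).
function requireSecretValue(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Refusing to provision: ${name} is not set in the environment`);
  }
  return value;
}

// With FASTLY_API_TOKEN unset this throws at provisioning time — hours
// before a Cloud Run revision creation would have surfaced the same gap.
```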

Issue 7: "Client Edge API settings missing from configuration"

Error (browser console, Sitecore Pages): Client Edge API settings missing from configuration

The original diagnosis pointed at the SDK. It wasn't the SDK.

What's actually happening: Grepping the codebase for the exact error string reveals it originates in src/Bootstrap.tsx project code, not the SDK. That component is 'use client' and reads config.api.edge?.clientContextId directly from sitecore.config. In the browser bundle, SITECORE_EDGE_CONTEXT_ID is undefined (no NEXT_PUBLIC_ prefix, never inlined by Next.js). clientContextId is an empty string. The CloudSDK call hits the else-branch and logs the error.

Three things made this hard to find: the SDK's own types reference clientContextId, the error is a console.error not an exception, and the NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID build-arg fix, genuinely needed for personalisation, didn't silence this particular message. So you apply the right fix for the wrong reason and the error persists.

Fix: Server Component prop pattern:

Server Components have full access to runtime environment variables. Read the secret there, pass it as a prop. Never import unprefixed process.env.SITECORE_* inside a 'use client' file.

src/app/[site]/layout.tsx - Server Component:

// src/app/[site]/layout.tsx — Server Component
import { draftMode } from 'next/headers';
import Bootstrap from '../../Bootstrap';

export default async function SiteLayout({ children, params }) {
  const { site } = await params;
  const { isEnabled } = await draftMode();
  const contextId = process.env.SITECORE_EDGE_CONTEXT_ID || '';

  return (
    <>
      <Bootstrap siteName={site} isPreviewMode={isEnabled} contextId={contextId} />
      {children}
    </>
  );
}

src/Bootstrap.tsx - Client Component

// src/Bootstrap.tsx — Client Component
'use client';
import { useEffect } from 'react';
// CloudSDK and config imports as in the starter kit

const Bootstrap = ({ siteName, isPreviewMode, contextId }: {
  siteName: string; isPreviewMode: boolean; contextId: string;
}): JSX.Element | null => {
  useEffect(() => {
    if (process.env.NODE_ENV === 'development') return;
    if (isPreviewMode) return;
    if (contextId) {
      CloudSDK({
        sitecoreEdgeUrl: config.api.edge?.edgeUrl,
        sitecoreEdgeContextId: contextId, // prop from Server Component, not client bundle
        siteName,
        enableBrowserCookie: true,
        cookieDomain: window.location.hostname.replace(/^www\./, ''),
      })
        .addEvents()
        .initialize();
    }
  }, [siteName, isPreviewMode, contextId]);

  return null;
};

export default Bootstrap;

Next.js serialises the prop when the Server Component renders. The value reaches the browser without being baked into the bundle.

NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID is still required, but for a separate reason. The SDK's PersonalizeMiddleware needs clientContextId from sitecore.config, which must be a NEXT_PUBLIC_* value compiled at build time. Bake it in via next.config.ts:

// next.config.ts
env: {
  NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID:
    process.env.NEXT_PUBLIC_SITECORE_EDGE_CONTEXT_ID ||
    process.env.SITECORE_EDGE_CONTEXT_ID ||
    '',
},

And map the same secret to both env names in cloudbuild.yaml's availableSecrets, same secret version, two names.

Note on secret naming: The editing secret env var must be SITECORE_EDITING_SECRET, not JSS_EDITING_SECRET. The Content SDK reads specifically process.env.SITECORE_EDITING_SECRET. The Secret Manager secret should be named sitecore-editing-secret. Map them accordingly: SITECORE_EDITING_SECRET=sitecore-editing-secret:latest.

Issue 8: Pages Builder loads forever: AbortError: signal is aborted without reason

Error (browser console): AbortError: signal is aborted without reason at main.<hash>.js:36:92759 at w.<computed> (polyfills.<hash>.js:1:11062)

After fixing Issue 7, the Pages Builder still hangs. The spinner never resolves, fields are non-editable. The error appears in minified Next.js bundle files, not application code, which makes it look like a framework bug.

What's happening: The Content SDK's editing render route handler (/api/editing/render) makes an internal server-side HTTP request to fetch the page HTML for the editing experience. The URL it constructs follows this priority:

  1. SITECORE_INTERNAL_EDITING_HOST_URL set → use it as-is (explicit override)
  2. process.env.SITECORE set → http://localhost:3000 (XM Cloud managed hosting)
  3. process.env.VERCEL or process.env.NETLIFY set → https://${host}
  4. Fallback → http://${host}

On Cloud Run, none of the first three conditions are set. The fallback resolves to http://your-service-xxxx-ew.a.run.app, an HTTP request to a domain that Cloud Run serves exclusively over HTTPS. That request either redirects (301) or is refused. The handler falls back to Response.redirect(route) with a bare path like /site/en/home. The Pages Builder's internal fetch receives a 307 redirect to a relative URL it can't follow, and aborts. The AbortError surfaces deep in the Next.js runtime with no useful stack trace pointing at the actual cause.

Normal page rendering is completely unaffected, only the editing render route makes this internal self-request.
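The priority order can be modelled as a small function (my own reconstruction of the behaviour described above, not the SDK's actual code):

```typescript
// Reconstruction of the editing host URL resolution order described above.
type Env = Record<string, string | undefined>;

function resolveEditingHostUrl(env: Env, requestHost: string): string {
  if (env.SITECORE_INTERNAL_EDITING_HOST_URL) return env.SITECORE_INTERNAL_EDITING_HOST_URL;
  if (env.SITECORE) return 'http://localhost:3000';         // XM Cloud managed hosting
  if (env.VERCEL || env.NETLIFY) return `https://${requestHost}`;
  return `http://${requestHost}`;                           // the Cloud Run trap
}

// On Cloud Run with no override, the fallback produces a plain-HTTP URL
// for a domain that only serves HTTPS:
const broken = resolveEditingHostUrl({}, 'my-service-xxxx-ew.a.run.app');
// With the override set, the internal fetch never leaves the container:
const fixed = resolveEditingHostUrl(
  { SITECORE_INTERNAL_EDITING_HOST_URL: 'http://localhost:3000' },
  'my-service-xxxx-ew.a.run.app'
);
```

Seen this way, the fix below is just forcing branch 1 of the priority list.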

Fix: Set SITECORE_INTERNAL_EDITING_HOST_URL=http://localhost:3000 as a plain env var on the Cloud Run service. This tells the handler to make its internal fetch directly to the container's Node.js process, bypassing the external load balancer entirely.

# Immediate fix, no rebuild required:
gcloud run services update YOUR_SERVICE_NAME \
--region=europe-west1 \
--project=YOUR_PROJECT_ID \
--update-env-vars="SITECORE_INTERNAL_EDITING_HOST_URL=http://localhost:3000"

Add it permanently to all deploy paths:

# cloudbuild.yaml — deploy step
- '--set-env-vars=...,SITECORE_INTERNAL_EDITING_HOST_URL=http://localhost:3000'

localhost:3000 is correct because the Dockerfile runs the Next.js standalone server on PORT=3000. The internal fetch hits the Node.js process directly, no round-trip through Google's load balancer, no HTTP→HTTPS mismatch.

This is not a Cloud Run quirk. It will affect any container-based hosting that isn't Vercel, Netlify, or Sitecore's own managed hosting. The SDK knows about three environments. Cloud Run isn't one of them.

What This Actually Teaches You

These aren't random errors. They all trace back to a handful of architectural mismatches between the Sitecore/Next.js starter kit assumptions and what Cloud Run actually is.

sitecore-tools:build is a provisioning step, not a build step. It needs a live, valid Edge connection. Treat it like code generation that runs in your CI environment with injected credentials, not inside Docker. The output is source: upload it; don't try to regenerate it inside the image.

.gcloudignore is not .gitignore. They coexist. Generated files that belong in your build context but not in your repository stay excluded in .gitignore and are deliberately omitted from .gcloudignore's exclusions.

NEXT_PUBLIC_* variables are compile-time constants. The name tells you this. They are literal substitutions baked into the client bundle at next build. No Cloud Run runtime configuration matters after the image is built. Anything the browser bundle needs must exist as a --build-arg.

generateStaticPaths: false is the correct default for Cloud Run. Not a workaround, an architectural statement. Cloud Run does not do CDN pre-rendering. SSR is the model.

NEXT_PHASE is your safety valve for build-time config validation. The Content SDK validates config at module-init time. NEXT_PHASE === 'phase-production-build' lets you provide a placeholder value that satisfies the validator without a live Edge connection during next build.

Secret Manager IAM bindings are per-secret, not per-service. Provisioning scripts that silently skip secrets when env vars are absent leave holes you won't find until Cloud Run revision creation fails. Separate value creation from access grants.

Grep for the exact error string before reading SDK source. "Client Edge API settings missing" looked like an SDK message. It was project code. A thirty-second grep told us exactly who owned the bug.

Server Components are the right place to read runtime secrets. SITECORE_EDGE_CONTEXT_ID is server-only. Read it in a Server Component and pass it down as a prop. Never import unprefixed process.env.SITECORE_* inside a 'use client' file; it will always be undefined in the browser.

SITECORE_INTERNAL_EDITING_HOST_URL is mandatory on Cloud Run. The SDK assumes Vercel, Netlify, or Sitecore-managed hosting for its internal editing fetch. On Cloud Run the fallback points at your HTTPS-only public URL over HTTP. Set http://localhost:3000 and the handler talks directly to the container process. One env var, no rebuild required.

SitecoreClient.getData() is the correct API for GraphQL queries in new code. GraphQLRequestClient is a JSS-era pattern that requires a raw GraphQL endpoint and an API key. The Content SDK handles authentication via the Context ID automatically. If you see GraphQLRequestClient in a new project, it shouldn't be there.

Final File Structure


The Bottom Line

Deploying Sitecore XM Cloud to Cloud Run works. The architecture is clean. Secrets management in GCP is excellent. Request-time SSR maps naturally to Sitecore's headless content delivery model.

The friction is almost entirely Vercel assumptions in the starter kit: static path generation, build-time SDK validation, an editing handler that only knows about three hosting environments, and the expectation that generated files exist in the repository. Once you understand that the pipeline needs two distinct phases, SDK tooling with live Edge credentials, then Docker build with a placeholder, the rest is straightforward.

The key distinction: what needs the live Edge API (sitecore-tools:build) versus what only needs source code (next build inside Docker). Keep those two phases separate, set SITECORE_INTERNAL_EDITING_HOST_URL, and the rest follows.

Next Steps

Since this setup relies solely on SSR (there is no SSG/ISR), my natural next step is to explore options to improve caching and therefore performance. Aligned with the customer's ecosystem, I'll be giving Fastly a try.

References