How Google Cloud Run Provides Built In Fault Tolerance for Highly Available Services

TechOps Examples

Hey — It's Govardhana MK 👋

Welcome to another technical edition.

Every Tuesday – You’ll receive a free edition with a byte-size use case, remote job opportunities, top news, tools, and articles.

Every Thursday and Saturday – You’ll receive a special edition with a deep dive use case, remote job opportunities and articles.


IN TODAY'S EDITION

🧠 Use Case

How Google Cloud Run Provides Built In Fault Tolerance for Highly Available Services

👀 Remote Jobs

Optimum is hiring a Senior DevOps Engineer
Remote Location: Worldwide
Zimperium is hiring a Site Reliability Engineer
Remote Location: India

📚 Resources

AWS Privilege Escalation Techniques

pre-commit hooks are fundamentally broken

Kubernetes Optimization using In-Place Pod Resizing and Zone-Aware Routing


🛠 TOOL OF THE DAY

Nelm - A Helm 4 alternative. It is a Kubernetes deployment tool that manages Helm Charts and deploys them to Kubernetes.


🧠 USE CASE

How Google Cloud Run Provides Built In Fault Tolerance for Highly Available Services

If you are new to fault tolerance, it simply means building systems that keep working even when something breaks. Servers can crash, networks can fail, and traffic can spike unexpectedly. A fault-tolerant system is designed so these problems do not stop users from accessing the service.

By default, a Cloud Run service is deployed to a single region and already handles failures across zones inside that region. This protects you from instance and zone-level issues, but it does not protect you from a full regional outage. To tolerate regional failures, you must design the architecture explicitly.

Architecting Cloud Run for Regional Fault Tolerance

Ref: Google Cloud

Cloud Run supports this natively through multi-regional deployments:

Deploy the same Cloud Run service to multiple regions using the same container image and configuration
Place a global external application load balancer in front
Configure one backend per region, each backed by a Serverless Network Endpoint Group (NEG)
Expose everything through one global external IP as the single entry point for users

In simple terms, users hit one global IP, and traffic is routed to Cloud Run services running in different regions.

A Serverless NEG is just the glue between the load balancer and Cloud Run. It tells the load balancer, “this backend is a Cloud Run service,” without exposing servers, instances, or pods.
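The setup above can be sketched as a short Python script that only builds the gcloud commands for each step, so they can be reviewed before anything is run. This is a minimal sketch, not a complete deployment: the service name, image path, region list, and backend name are hypothetical placeholders, and the remaining load balancer pieces (URL map, target proxy, forwarding rule, and the reserved global IP) are left out for brevity.

```python
import shlex
from typing import List

# Hypothetical values for illustration only; replace with your own.
SERVICE = "my-app"
IMAGE = "us-docker.pkg.dev/my-project/my-repo/my-app:latest"
REGIONS = ["us-central1", "europe-west1"]
BACKEND = "my-app-backend"


def build_commands(service: str, image: str, regions: List[str],
                   backend: str) -> List[List[str]]:
    """Return the gcloud commands for a multi-region Cloud Run backend."""
    commands = [[
        # One global backend service that the load balancer sends traffic to.
        "gcloud", "compute", "backend-services", "create", backend,
        "--load-balancing-scheme=EXTERNAL_MANAGED", "--global",
    ]]
    for region in regions:
        neg = f"{service}-neg-{region}"
        # Same service name and image in every region.
        commands.append([
            "gcloud", "run", "deploy", service,
            f"--image={image}", f"--region={region}",
        ])
        # A serverless NEG that points at the regional Cloud Run service.
        commands.append([
            "gcloud", "compute", "network-endpoint-groups", "create", neg,
            f"--region={region}", "--network-endpoint-type=serverless",
            f"--cloud-run-service={service}",
        ])
        # Attach the regional NEG as one backend of the global backend service.
        commands.append([
            "gcloud", "compute", "backend-services", "add-backend", backend,
            "--global", f"--network-endpoint-group={neg}",
            f"--network-endpoint-group-region={region}",
        ])
    return commands


if __name__ == "__main__":
    for command in build_commands(SERVICE, IMAGE, REGIONS, BACKEND):
        print(shlex.join(command))
```

Each region contributes three commands (deploy, create NEG, attach NEG), so adding a region is just a matter of extending the region list.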

Without Cloud Run Service Health

When Cloud Run is deployed to multiple regions behind a global external Application Load Balancer, traffic is routed based on proximity, not actual service health.

Ref: Google Cloud

Users in Europe are sent to the Europe region
Users in the US are sent to the US region
The load balancer assumes the regional service is healthy as long as it exists

If the Cloud Run service in a region is partially failing or degraded, traffic still flows to it. The load balancer has no direct signal that the service itself is unhealthy.

Result:

Requests hit a failing regional service
Users see errors or high latency
A healthy region is available but unused

This setup gives regional routing, but not regional fault tolerance.
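To make the failure mode concrete, here is a toy Python model of proximity-only routing. It is a simplified illustration, not how the load balancer is actually implemented: the latency table, region names, and status codes are made-up assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical latency (ms) from a user location to each region.
LATENCY_MS = {
    "europe": {"europe-west1": 20, "us-central1": 110},
    "us": {"europe-west1": 110, "us-central1": 25},
}


@dataclass
class Region:
    name: str
    healthy: bool


def route_by_proximity(user_location: str, regions: List[Region]) -> Region:
    """Pick the closest region. Health is never consulted."""
    return min(regions, key=lambda r: LATENCY_MS[user_location][r.name])


def handle_request(user_location: str, regions: List[Region]) -> Tuple[str, int]:
    """Return the chosen region and the HTTP status the user would see."""
    target = route_by_proximity(user_location, regions)
    return target.name, 200 if target.healthy else 503


if __name__ == "__main__":
    regions = [Region("europe-west1", healthy=False),
               Region("us-central1", healthy=True)]
    print(handle_request("europe", regions))
    print(handle_request("us", regions))
```

In this model, European users keep landing on the unhealthy europe-west1 service even though us-central1 could serve them.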

With Cloud Run Service Health

Here the routing behavior changes.

Cloud Run actively reports regional service health to the global load balancer.

Ref: Google Cloud

When a regional Cloud Run service becomes unhealthy, it is marked as unavailable
The load balancer automatically stops sending traffic to that region
Requests are routed to another healthy region, even if it is farther away

From the user’s perspective, the service continues to work despite a regional failure. This turns multi-region Cloud Run from “closest-region routing” into automatic regional failover.

The system no longer depends only on location. It depends on real service health, which is what fault tolerance actually requires.
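The change in behavior can be sketched as a toy Python routing function that filters out unhealthy regions before choosing the closest one. This is an illustrative model under assumed inputs (the latency table and health map are hypothetical), not the load balancer's real algorithm.

```python
from typing import Dict, Optional

# Hypothetical latency (ms) from a user location to each region.
LATENCY_MS: Dict[str, Dict[str, int]] = {
    "europe": {"europe-west1": 20, "us-central1": 110},
    "us": {"europe-west1": 110, "us-central1": 25},
}


def route_by_health(user_location: str,
                    latency_ms: Dict[str, Dict[str, int]],
                    healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the closest region among those currently reported healthy."""
    candidates = [region for region in latency_ms[user_location]
                  if healthy.get(region, False)]
    if not candidates:
        return None  # No healthy region is left to serve the request.
    return min(candidates, key=lambda region: latency_ms[user_location][region])


if __name__ == "__main__":
    # europe-west1 is reported unhealthy, so European users fail over.
    health = {"europe-west1": False, "us-central1": True}
    print(route_by_health("europe", LATENCY_MS, health))
    print(route_by_health("us", LATENCY_MS, health))
```

With europe-west1 marked unhealthy, European requests go to us-central1: a farther region, but one that works.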
