The Hidden Risk of Zombie Workflows in GitHub Actions

In partnership with

TechOps Examples

Hey — It's Govardhana MK 👋

Welcome to another technical edition.

Every Tuesday – You’ll receive a free edition with a byte-size use case, remote job opportunities, top news, tools, and articles.

Every Thursday and Saturday – You’ll receive a special edition with a deep dive use case, remote job opportunities and articles.

👋 👋 A big thank you to today's sponsor UDACITY

Build real AI and tech skills, faster

Udacity helps you build the AI and tech skills employers actually need—fast. Learn from industry experts through hands-on projects designed to mirror real-world work, not just theory.

Whether you’re advancing in your current role or preparing for what’s next, Udacity’s flexible, fully online courses let you learn on your schedule and apply new skills immediately. From AI and machine learning to data, programming, and cloud technologies, you’ll gain practical experience you can show, not just list on a résumé.

Build confidence, stay competitive, and move your career forward with AI and tech skills that are in demand.


IN TODAY'S EDITION

🧠 Use Case

The Hidden Risk of Zombie Workflows in GitHub Actions

👀 Remote Jobs

Fanvue is hiring a Senior Platform Engineer
Remote Location: Worldwide
Category Labs is hiring an Infrastructure Engineer
Remote Location: Worldwide

📚 Resources

What's Wrong with Kubernetes Today

Google Lessons for Using AI Agents for Securing Our Enterprise

Build multi-step applications and AI workflows with AWS Lambda durable functions

If you’re not a subscriber, here’s what you missed last week.

How DNS Routing Works in Amazon Route 53 and How to Configure It

How Load Balancing Works in Kubernetes

To receive all the full articles and support TechOps Examples, consider subscribing:

One-time 25% off all annual membership plans. Closes soon.

🧠 USE CASE

The Hidden Risk of Zombie Workflows in GitHub Actions

A zombie workflow is a GitHub Actions run that is no longer useful but is still executing or retrying. Common patterns:

Old commits still running CI after a newer commit is already merged
Rerun loops triggered by flaky steps
Long-running jobs waiting on resources that will never arrive
Parallel matrix jobs continuing even after the result is irrelevant

Sample Workflow Pattern

I recently went through an interesting study by SonarSource. They started with 28,384 popular GitHub repositories and found that only 15,691 actually used GitHub Actions. After removing single-branch repos, 14,130 multi-branch repositories remained for analysis.

Across these repos, they scanned 7.7 million branches and found 442,321 unique workflow files, many of them older versions of the same workflow preserved on different branches. Keeping only workflows triggered by pull_request_target reduced this to 18,002 potentially attackable workflows. That trigger runs in the context of the base repository, with access to its secrets and a token that can have write permissions, even when the pull request comes from a fork.

Ref: SonarSource

Using a strict heuristic focused on secret usage and write permissions, the list shrank to 2,191 high-risk candidates, of which 188 workflows were confirmed vulnerable after manual review.

Ref: SonarSource
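For illustration, here is a hypothetical workflow with the traits that heuristic looks for (secret usage and write permissions), plus the step that typically turns them into a vulnerability: checking out and running the pull request's own code. It is not taken from the study; the workflow name, the npm commands, and the DEPLOY_TOKEN secret are placeholders. Treat it as the shape to search your branches for, not one to copy.

```yaml
# HYPOTHETICAL risky shape: do not copy.
name: pr-build
on:
  pull_request_target:            # base repo context: secrets and token available
permissions:
  contents: write                 # write-capable GITHUB_TOKEN
  pull-requests: write
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the contributor's untrusted head commit
          ref: ${{ github.event.pull_request.head.sha }}
      # Untrusted code (including npm lifecycle scripts) runs with a secret in scope
      - run: npm ci && npm run build
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # hypothetical secret name
```

A safer shape for building contributor code is the plain pull_request trigger, which runs fork pull requests with a read-only token and without repository secrets.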

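Of those 188, 121 vulnerabilities still existed on default branches, leaving 67 true Zombie Workflows that lived only in non-default branches. They stay exploitable because a pull_request_target run uses the workflow file from the pull request's base branch, and whoever opens the pull request can choose that base branch, so a vulnerable copy on a forgotten branch remains reachable after the default branch is fixed. These were found in well-known projects, showing that this is a real risk hiding in forgotten branches.

Ref: SonarSource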

How Zombie Workflows are Born

1. Push-triggered workflows without cancellation

```yaml
# No concurrency block: every push to main starts its own full run
on:
  push:
    branches:
      - main
```

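Every push creates a new run (a single git push carrying several commits creates one run, for the newest commit). If you push 5 times in quick succession:

5 workflow runs start
All of them run the full CI suite
Only the last one matters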

The first four are zombies the moment a newer commit exists.

2. Pull request workflows with retries and flaky tests

A test flakes. A retry step in the workflow re-runs it (GitHub Actions does not retry failed jobs on its own), the developer clicks “Re-run all jobs”, and then pushes a fix. Now you have:

Old run retrying
New run triggered by updated commit
Both competing for runners

No guardrails. No cancellation.

3. Matrix jobs that don’t fail fast

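```yaml
strategy:
  fail-fast: false
  matrix:
    region: [us-east-1, eu-west-1, ap-south-1]
```

One region fails early, but the other regions keep running for 20 more minutes. Because GitHub Actions sets fail-fast to true by default, this happens when someone has switched it off, as in the snippet above, for example to see every region's result.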

From a deployment perspective, the outcome is already decided. But the runners don’t know that. Zombie behavior.

How We Can Fix It

1. Use concurrency aggressively

This single block eliminates most push and pull request zombies.

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

Groups runs by workflow + ref (the branch for pushes, the PR merge ref for pull requests)
Cancels the older in-progress run in the group when a new one is queued
Ensures only the latest commit's run finishes

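This is mandatory for CI workflows, Terraform plans, preview environments, and any other push-triggered automation. The exception is anything that applies changes, such as terraform apply or a production deploy: cancelling it midway can leave infrastructure half-updated, which is why deployments get their own workflow in the next step.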
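A minimal sketch of where the block sits in a complete workflow that runs on pushes to main and on pull requests. The make test step is a placeholder for your real test command. For pull requests, github.ref is refs/pull/<number>/merge, so each PR gets its own group and a new push to a PR cancels only that PR's older run, not other PRs or main.

```yaml
name: CI
on:
  push:
    branches:
      - main
  pull_request:

# Top-level concurrency applies to the whole workflow run.
# One group per workflow + ref; a newly queued run cancels the older one.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test   # hypothetical test command
```

With this in place, five quick pushes to main leave the earlier runs marked as cancelled and only the newest one completes.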

2. Split CI and CD workflows

One workflow for validation. Another for deployment. This prevents old CI runs from blocking production deploys.
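One way to wire this up, as a sketch: a separate deploy workflow triggered by workflow_run when a workflow named CI (an assumption; use your CI workflow's actual name) completes on main. Its concurrency group uses cancel-in-progress: false, so a running deploy is never cut off midway: a newer deploy waits for it, and an older deploy still waiting is replaced by the newer one. The deploy script path is hypothetical.

```yaml
name: Deploy
on:
  workflow_run:
    workflows: ["CI"]          # must match the CI workflow's name
    types: [completed]
    branches: [main]

# Deploys wait for each other instead of cancelling mid-rollout
concurrency:
  group: deploy-production
  cancel-in-progress: false

jobs:
  deploy:
    # Only deploy when the triggering CI run succeeded
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Deploy the exact commit CI validated, not whatever main is now
          ref: ${{ github.event.workflow_run.head_sha }}
      - run: ./scripts/deploy.sh   # hypothetical deploy script
```

A CI run that concurrency cancelled still completes with the conclusion cancelled, so the if condition also stops zombie CI runs from triggering a deploy.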

3. Fail fast on matrix jobs

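```yaml
strategy:
  fail-fast: true
```

If one matrix job fails, the other in-progress and queued matrix jobs are cancelled, runners are freed, and the signal is immediate. Since true is already the default, the practical fix is to remove any fail-fast: false you find, or set it explicitly as above.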
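A fuller sketch of the region matrix from earlier, with fail-fast left on. The deploy script is a hypothetical placeholder. One caveat: a matrix job marked continue-on-error: true does not trigger fail-fast cancellation when it fails, so check for that setting as well.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      # true is already the default; stating it guards against someone
      # flipping it to false to "see all results"
      fail-fast: true
      matrix:
        region: [us-east-1, eu-west-1, ap-south-1]
    steps:
      - uses: actions/checkout@v4
      # Hypothetical script: a non-zero exit fails this region's job
      # and cancels the other in-progress or queued regions
      - run: ./scripts/deploy.sh "${{ matrix.region }}"
```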

4. Time box everything

```yaml
jobs:
  build:
    timeout-minutes: 20
```

No job should run indefinitely. Without this setting, GitHub cancels a job only after the default limit of 360 minutes. If it can’t finish in 20 minutes, it’s either broken or waiting on something external. Either way, kill it.
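A sketch of time boxing at two levels, assuming a job that first waits on an external dependency. timeout-minutes works on a whole job and on an individual step, so the step that waits on something external can get a tighter limit than the job. The wait script and make target are hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20              # whole job is cancelled after 20 minutes
    steps:
      - uses: actions/checkout@v4
      - name: Wait for test database
        timeout-minutes: 5           # tighter limit for the step that waits externally
        run: ./scripts/wait-for-db.sh   # hypothetical script
      - name: Build
        run: make build                 # hypothetical build command
```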


🔴 Get my DevOps & Kubernetes ebooks! (free for Premium Club and Personal Tier newsletter subscribers)

Looking to promote your company, product, service, or event to 58,000+ DevOps and Cloud Professionals? Let's work together.

