The Hidden Risk of Zombie Workflows in GitHub Actions
TechOps Examples
Hey — It's Govardhana MK 👋
Welcome to another technical edition.
Every Tuesday – You’ll receive a free edition with a byte-size use case, remote job opportunities, top news, tools, and articles.
Every Thursday and Saturday – You’ll receive a special edition with a deep dive use case, remote job opportunities and articles.
👋 👋 A big thank you to today's sponsor UDACITY
IN TODAY'S EDITION
🧠 Use Case
The Hidden Risk of Zombie Workflows in GitHub Actions
👀 Remote Jobs
Fanvue is hiring a Senior Platform Engineer
Remote Location: Worldwide
Category Labs is hiring an Infrastructure Engineer
Remote Location: Worldwide
📚️ Resources
🧠 USE CASE
The Hidden Risk of Zombie Workflows in GitHub Actions
A zombie workflow is a GitHub Actions run that is no longer useful but is still executing or retrying. Common patterns:
Old commits still running CI after a newer commit is already merged
Rerun loops triggered by flaky steps
Long running jobs waiting on resources that will never arrive
Parallel matrix jobs continuing even after the result is irrelevant
Sample Workflow Pattern

I recently went through an interesting study by SonarSource. They started with 28,384 popular GitHub repositories and found that only 15,691 actually used GitHub Actions. After removing single-branch repos, 14,130 multi-branch repositories remained for analysis.
Across these repos, they scanned 7.7 million branches and discovered 442,321 unique workflow files, many of them duplicated across branches as historical snapshots. Filtering for workflows that use the pull_request_target trigger reduced this to 18,002 potentially attackable workflows.

Ref: sonarsource
Using a strict heuristic focused on secret usage and write permissions, the list shrank to 2,191 high-risk candidates, of which 188 workflows were confirmed vulnerable after manual review.

Ref: sonarsource
Of those 188, 121 vulnerabilities still existed on default branches, leaving 67 true Zombie Workflows that lived only in non-default branches. These were found in well-known projects, which shows this is a real risk hiding in forgotten branches.

Ref: sonarsource
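To make the heuristic concrete, here is a hedged sketch of the general shape such a scan looks for: a pull_request_target workflow that holds write permissions, reads a secret, and runs code from the pull request. The workflow name, the secret name DEPLOY_TOKEN, and the script path are hypothetical and are not taken from the SonarSource study. Treat it as a pattern to recognize in old branches, not one to copy.

```yaml
# RISKY PATTERN, shown for recognition only.
# Workflow name, secret name, and script path are hypothetical.
name: pr-preview

on:
  pull_request_target:          # runs in the context of the base repository

permissions:
  contents: write               # write access to the repository
  pull-requests: write

jobs:
  preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the contributor's commit instead of the base branch
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Build preview
        # Code from the pull request now runs with the secret in scope
        run: ./scripts/build-preview.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```

Because pull_request_target takes the workflow definition from the pull request's base branch, a copy of a file like this left on a stale branch can still be triggered by a pull request opened against that branch, even after the default branch has been fixed.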
How Zombie Workflows are Born
1. Push triggered workflows without cancellation
on:
  push:
    branches:
      - main
Every push creates a new run. If you push 5 times in quick succession:
5 workflows start
All of them run full CI
Only the last one matters
The first four are zombies the moment a newer commit exists.
2. Pull Request workflows with retries and flaky tests
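A test flakes. GitHub Actions does not retry failed jobs on its own, so the retry comes from a retry wrapper in the workflow or from a developer clicking “Re-run all jobs”. Meanwhile, a new commit lands on the pull request. Now you have: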
Old run retrying
New run triggered by updated commit
Both competing for runners
No guardrails. No cancellation.
3. Matrix jobs that don’t fail fast
strategy:
  fail-fast: false   # overrides the default of true
  matrix:
    region: [us-east-1, eu-west-1, ap-south-1]
One region fails early. Because fail-fast has been turned off, the other regions keep running for 20 more minutes.
From a deployment perspective, the outcome is already decided. But the runners don’t know that. Zombie behavior.
How We Can Fix It
1. Use concurrency aggressively
This single block eliminates most zombies.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
Groups runs by workflow + branch
Cancels older runs when a new one starts
Ensures only the latest commit matters
This is mandatory for CI workflows, Terraform plans, preview environments, and any push-triggered automation.
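Placed in a full workflow, the concurrency block might look like the sketch below. Two details are assumptions on my part rather than part of the original snippet: grouping pull request runs by PR number, and an optional variation that lets in-progress runs on main finish instead of cancelling them. The workflow name and the script path are hypothetical.

```yaml
# Hypothetical CI workflow; job and script names are illustrative.
name: ci

on:
  push:
    branches:
      - main
  pull_request:

concurrency:
  # One group per workflow and per pull request (or per branch for pushes)
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  # Optional variation: let in-progress runs on main finish,
  # cancel superseded runs everywhere else
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-tests.sh
```

If you want the simpler behavior described above, replace the expression with a plain `cancel-in-progress: true`.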
2. Split CI and CD workflows
One workflow for validation. Another for deployment. This prevents old CI runs from blocking production deploys.
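One possible way to lay out the split is sketched below, assuming two workflow files linked by the workflow_run trigger. The file paths, workflow names, the production-deploy group name, and the scripts are all hypothetical. The two files are shown in one block, separated by `---`, only for readability.

```yaml
# File 1 (hypothetical path): .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true        # stale validation runs are safe to cancel
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-tests.sh
---
# File 2 (hypothetical path): .github/workflows/deploy.yml
name: Deploy
on:
  workflow_run:
    workflows: ["CI"]             # must match the name of the CI workflow
    types: [completed]
    branches: [main]
concurrency:
  group: production-deploy        # deploys queue behind each other
  cancel-in-progress: false       # never interrupt a deploy that has started
jobs:
  deploy:
    # Deploy only when the triggering CI run succeeded
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      - run: ./scripts/deploy.sh
```

The design choice here is that validation and deployment get different cancellation rules: CI runs are disposable, while a deploy in progress is allowed to finish.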
3. Fail fast on matrix jobs
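strategy:
  fail-fast: true
fail-fast defaults to true, so setting it explicitly guards against a workflow that turned it off. If one matrix job fails:
Other in-progress and queued matrix jobs are cancelled
Runners are freed
Signal is immediate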
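In context, the fail-fast setting sits next to the matrix definition inside a job. The sketch below reuses the three regions from the earlier example; the workflow name, the manual trigger, and the deploy script are hypothetical.

```yaml
# Hypothetical workflow; the trigger and script are illustrative.
name: regional-deploy

on:
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true             # the default, stated explicitly
      matrix:
        region: [us-east-1, eu-west-1, ap-south-1]
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to ${{ matrix.region }}
        run: ./scripts/deploy.sh "$REGION"
        env:
          REGION: ${{ matrix.region }}
```

For real deployments, cancelling in-flight regions can leave them partially applied, so whether fail-fast suits a deploy matrix as well as it suits a test matrix is a judgment call.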
4. Time box everything
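jobs:
  build:
    timeout-minutes: 20
No job should run unbounded. Without timeout-minutes, a job can run for up to 360 minutes, the GitHub Actions default, before it is cancelled. If it can’t finish in 20 minutes, it’s broken or waiting on something external. Either way, kill it.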
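Timeouts can also be set per step, which helps when one step is the likely place for a hang. A minimal sketch, assuming a hypothetical step that waits on an external service; the workflow name, the script paths, and the 5 minute value are illustrative.

```yaml
# Hypothetical workflow; script paths and the step limit are illustrative.
name: build

on:
  push:

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20           # limit for the whole job
    steps:
      - uses: actions/checkout@v4
      - name: Wait for external service
        timeout-minutes: 5        # tighter limit for the step most likely to hang
        run: ./scripts/wait-for-service.sh
      - name: Build
        run: ./scripts/build.sh
```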


