Kubernetes Design Mistakes You Should Avoid
TechOps Examples
Hey — It's Govardhana MK 👋
Welcome to another technical edition.
🧠 DEEP DIVE USE CASE
Kubernetes Design Mistakes You Should Avoid
Setting up a Kubernetes environment is far easier than running and optimizing it. In my 8+ years with Kubernetes, I have seen even mature companies fall into anti-patterns. These are not beginner mistakes. Even the best practitioners repeat them, especially under scale, pressure, or team churn.
The top design mistakes I repeatedly see:
1. Single Cluster Deployment

Teams often start with a single production cluster and keep scaling everything into it (services, workloads, secrets, users, and integrations). It works fine until it doesn't.
In real-world scenarios, I've seen how a single issue like a control plane outage, a faulty node group update, or a bad deployment can take down the entire system. There is zero fault tolerance and no blast radius control.
Even large-scale companies with strong DevOps teams end up routing all production traffic to a single cluster for operational convenience. But convenience breaks under pressure.
What usually goes wrong:
• The cluster becomes unreachable during an upgrade or a KMS issue
• One tenant affects others in a shared multi-tenant cluster
• A region-specific outage takes down global access
• There is no way to isolate or fail over in critical production moments
Adopt a Multi-Cluster Architecture:
• Split production workloads across two or more regional clusters
• Use DNS-level routing or global load balancers (such as AWS Route 53, GCLB, or Akamai) to handle failover and traffic steering
• If the SLA is critical, design clusters per business unit or per customer tier
• Back up cluster state regularly using tools like Velero
Operational Tips:
• Start with two clusters if budget allows, and test the multi-cluster setup from Day 1
• Set up observability and alerting per cluster (not globally)
• Practice failover once every quarter
• Avoid hardcoding cluster names or URLs into CI/CD pipelines or secrets
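To make the failover and "no hardcoded cluster names" tips concrete, here is a minimal Python sketch of the selection logic a pipeline or traffic-steering script might use. The cluster names, endpoints, the `CONFIG` string, and the `healthy` map are all hypothetical. In a real setup the list would come from configuration, and health would come from your load balancer or per-cluster monitoring.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    name: str      # hypothetical cluster identifier
    endpoint: str  # hypothetical ingress hostname for this cluster
    priority: int  # lower number means more preferred


# Assumption: in practice this JSON is loaded from pipeline configuration,
# not embedded in the script. It is inlined here only to keep the sketch runnable.
CONFIG = """
[
  {"name": "prod-a", "endpoint": "a.example.internal", "priority": 1},
  {"name": "prod-b", "endpoint": "b.example.internal", "priority": 2}
]
"""


def load_clusters(raw_json: str) -> list:
    """Parse the cluster list from configuration instead of hardcoding it."""
    return [Cluster(**entry) for entry in json.loads(raw_json)]


def pick_target(clusters: list, healthy: dict) -> Optional[Cluster]:
    """Return the most preferred healthy cluster, or None if all are down."""
    candidates = [c for c in clusters if healthy.get(c.name, False)]
    return min(candidates, key=lambda c: c.priority, default=None)


if __name__ == "__main__":
    clusters = load_clusters(CONFIG)
    # Simulate the preferred cluster failing its health check.
    target = pick_target(clusters, {"prod-a": False, "prod-b": True})
    print(target.endpoint if target else "no healthy cluster")
```

Because the selection only depends on configuration and health signals, a quarterly failover drill can be as simple as marking the preferred cluster unhealthy and checking that traffic and deployments move to the next one.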
2. Chaotic Access Control

Giving broad cluster access for convenience is a dangerous habit. Developers end up accessing or deleting production resources by accident because RBAC was loosely configured.
This usually happens when teams share the same cluster for dev and prod, and use ClusterRoleBinding to quickly unblock access. Over time, there's no clear boundary between environments, and no visibility into who can do what.
Fix it with namespace-scoped RBAC:
• Use Role and RoleBinding instead of cluster-wide bindings
• Create separate access policies for dev, stage, and prod
• Group users and bind roles to those groups
• Only grant the minimum permissions needed
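As an illustration of the points above, the following Python sketch generates a namespace-scoped Role and a RoleBinding that binds it to a group, with different verbs per environment. It assumes one namespace per environment named `dev`, `stage`, and `prod`. The role name `app-team`, the group `payments-devs`, and the verb sets are hypothetical and only show the shape of the manifests.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"

# Hypothetical per-environment permissions: broad in dev, read-only in prod.
ENV_VERBS = {
    "dev": ["get", "list", "watch", "create", "update", "patch", "delete"],
    "stage": ["get", "list", "watch", "update", "patch"],
    "prod": ["get", "list", "watch"],
}


def build_role(namespace, name, resources, verbs):
    """Build a Role, which only applies inside the given namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        # "" is the core API group (pods); "apps" covers deployments.
        "rules": [{"apiGroups": ["", "apps"], "resources": resources, "verbs": verbs}],
    }


def build_role_binding(namespace, name, role_name, group):
    """Bind a Role to a group rather than to individual users."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API},
    }


def manifests_for(env, group):
    """Return the Role and RoleBinding for one environment namespace."""
    role = build_role(env, "app-team", ["pods", "pods/log", "deployments"], ENV_VERBS[env])
    binding = build_role_binding(env, "app-team-binding", "app-team", group)
    return [role, binding]


if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List", "items": manifests_for("prod", "payments-devs")}
    print(json.dumps(bundle, indent=2))
```

The printed JSON is in a form that could be piped to `kubectl apply -f -`. The point of the sketch is that the same group gets a different, narrower Role in each environment, and nothing is granted cluster-wide.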
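Practical Hygiene:
• Run kubectl get rolebindings --all-namespaces (and kubectl get clusterrolebindings) regularly to review who has access
• Use tools like rakkess or kubectl-who-can to see who can do what
• Never add human users to the system:masters group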
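A regular review is easier to keep up if part of it is scripted. The sketch below is one possible approach, not a replacement for the tools named above: it scans the JSON that `kubectl get clusterrolebindings -o json` produces and reports users and groups that are bound to `cluster-admin` or that reference `system:masters`. The `SAMPLE` data and the names in it are made up for illustration.

```python
RISKY_ROLES = {"cluster-admin"}
RISKY_GROUPS = {"system:masters"}

# Hypothetical sample in the shape of `kubectl get clusterrolebindings -o json`.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "cluster-admin"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "Group", "name": "system:masters"}],
        },
        {
            "metadata": {"name": "unblock-dev"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "User", "name": "alice@example.com"}],
        },
        {
            "metadata": {"name": "ci-view"},
            "roleRef": {"kind": "ClusterRole", "name": "view"},
            "subjects": [{"kind": "ServiceAccount", "name": "ci", "namespace": "build"}],
        },
    ]
}


def risky_subjects(bindings):
    """Return (binding, kind, name) for users or groups with very broad access."""
    findings = []
    for item in bindings.get("items", []):
        role = item.get("roleRef", {}).get("name", "")
        # "subjects" can be missing or null on a binding, so fall back to [].
        for subject in item.get("subjects") or []:
            kind = subject.get("kind", "")
            name = subject.get("name", "")
            is_human_facing = kind in ("User", "Group")
            if is_human_facing and (role in RISKY_ROLES or name in RISKY_GROUPS):
                findings.append((item["metadata"]["name"], kind, name))
    return findings


if __name__ == "__main__":
    for binding, kind, name in risky_subjects(SAMPLE):
        print(f"{binding}: {kind} {name}")
```

To run it against a real cluster, you could replace `SAMPLE` with `json.load(sys.stdin)` and pipe the kubectl output in. The built-in `cluster-admin` binding to `system:masters` will show up in the report; what matters is any other entry, such as a binding created to quickly unblock someone.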
3. No Policy as Code
Manual reviews and verbal guidelines are not security. In many teams, enforcement of naming conventions, image registries, host paths, and privilege flags is done through Slack messages or code reviews. That does not scale. It also does not prevent mistakes.
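To show what turning such guidelines into code can look like, here is a small Python sketch that checks a Pod manifest against the four kinds of rules mentioned above. It only illustrates the idea; in a cluster, rules like these are usually enforced at admission time, not by a standalone script. The allowed registry, the naming pattern, and both sample Pods are hypothetical.

```python
import re

# Hypothetical policy values, chosen only for illustration.
ALLOWED_REGISTRIES = ("registry.example.com/",)
NAME_PATTERN = re.compile(r"^(payments|search)-[a-z0-9-]+$")


def violations(pod):
    """Return a list of human-readable policy violations for a Pod manifest."""
    problems = []
    name = pod.get("metadata", {}).get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"pod name '{name}' does not follow the naming convention")

    spec = pod.get("spec", {})
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            problems.append(f"volume '{volume.get('name')}' uses hostPath")

    for container in spec.get("containers", []) + spec.get("initContainers", []):
        cname = container.get("name", "")
        image = container.get("image", "")
        # str.startswith accepts a tuple, so any allowed registry prefix passes.
        if not image.startswith(ALLOWED_REGISTRIES):
            problems.append(f"container '{cname}' pulls from a disallowed registry: {image}")
        if container.get("securityContext", {}).get("privileged", False):
            problems.append(f"container '{cname}' runs privileged")
    return problems


GOOD_POD = {
    "metadata": {"name": "payments-api"},
    "spec": {"containers": [{"name": "api", "image": "registry.example.com/payments/api:1.4.2"}]},
}

BAD_POD = {
    "metadata": {"name": "Test"},
    "spec": {
        "volumes": [{"name": "host", "hostPath": {"path": "/var/run"}}],
        "containers": [
            {
                "name": "web",
                "image": "docker.io/library/nginx:latest",
                "securityContext": {"privileged": True},
            }
        ],
    },
}

if __name__ == "__main__":
    for problem in violations(BAD_POD):
        print("DENY:", problem)
```

Unlike a Slack reminder or a review comment, a check like this gives the same answer every time and can block a change before it ships.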

