TechOps Examples

Hey — It's Govardhana MK 👋

Welcome to another technical edition.

Every Tuesday – You’ll receive a free edition with a byte-size use case, remote job opportunities, top news, tools, and articles.

Every Thursday and Saturday – You’ll receive a special edition with a deep dive use case, remote job opportunities and articles.

So why are so many teams running:

  • half-empty nodes

  • over-provisioned workloads

  • unpredictable autoscaling

  • rising cloud costs

The root cause is often poor binpacking caused by a few common (and fixable) configuration issues.

Join Eli Birger and Anton Weiss as they walk through real examples of what causes poor node utilization and how to fix it fast.

What You’ll Learn

  • Why inaccurate resource requests are often the root cause of wasted capacity

  • How affinity rules, topology constraints, and priority classes create bottlenecks

  • Hands-on fixes that improve node utilization immediately

  • How PerfectScale by DoiT helps surface inefficiencies before they compound

Can't make the time? Feel free to sign up anyway and we'll send you a recording after the session. We hope to see you there!

IN TODAY'S EDITION

🧠 Use Case
  • Kubernetes Liveness Probes: A Practical Guide

👀 Remote Jobs

Powered by: Jobsurface.com

📚 Resources

If you’re not a subscriber, here’s what you missed last week.

To receive all the full articles and support TechOps Examples, consider subscribing:

🧠 USE CASE

Kubernetes Liveness Probes: A Practical Guide

Kubernetes probes are diagnostic checks that the kubelet runs against your containers on a recurring schedule. They are the mechanism through which Kubernetes answers a fundamental operational question: is this container actually doing what it is supposed to be doing, or has it entered a broken state that requires intervention?

There are three probe types.

  • Liveness probes answer "is this container still alive and worth keeping?"

  • Readiness probes answer "is this container ready to accept traffic?"

  • Startup probes answer "has this container finished its initialization sequence?"

Each solves a different problem, and conflating them is one of the most common probe configuration mistakes in production clusters.
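As a sketch of how the three types fit together, a single container can declare all three in its pod spec. The image, port, and endpoint paths below are illustrative placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # illustrative name
spec:
  containers:
    - name: app
      image: example/app:1.0  # placeholder image
      ports:
        - containerPort: 8080
      startupProbe:           # gates the other two probes until init completes
        httpGet:
          path: /healthz      # assumed endpoint
          port: 8080
        failureThreshold: 30  # allows up to 30 x 5s = 150s to start
        periodSeconds: 5
      readinessProbe:         # controls whether the pod receives traffic
        httpGet:
          path: /ready        # assumed endpoint
          port: 8080
        periodSeconds: 10
      livenessProbe:          # restarts the container on repeated failure
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```

While the startup probe is still running, the liveness and readiness probes are held back, which is what prevents a slow-starting container from being killed mid-initialization.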

All probes have five parameters that are crucial to configure.

initialDelaySeconds: Time to wait after the container starts (default 0)
periodSeconds: Probe execution frequency (default 10)
timeoutSeconds: Time to wait for the reply (default 1)
successThreshold: Successful checks to mark healthy (default 1)
failureThreshold: Failed checks to mark unhealthy (default 3)
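Put together, a liveness probe exercising all five parameters might look like this (path and port are placeholders):

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8080
  initialDelaySeconds: 10   # wait 10s after container start before first probe
  periodSeconds: 10         # probe every 10 seconds
  timeoutSeconds: 2         # fail the probe if no reply within 2s
  successThreshold: 1       # must be 1 for liveness and startup probes
  failureThreshold: 3       # restart only after 3 consecutive failures
```

Note that Kubernetes requires successThreshold to be 1 for liveness and startup probes; only readiness probes may set it higher.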

What is a Liveness Probe?

A liveness probe detects when a container has entered a state it cannot recover from on its own: the container process has deadlocked, leaked memory into an unrecoverable state, or entered an infinite loop that makes it functionally dead while technically still running.

The kubelet periodically calls the probe endpoint on the running container. When the probe returns healthy, the container continues running and serving client traffic normally. When the probe fails, the kubelet does not wait for the application to recover on its own; it restarts the container immediately. The key word here is restart, not reschedule. A liveness failure triggers a container restart in place on the same node. It is not an eviction or a rescheduling event.

How Kubelet Executes the Probe

This is exactly what happens during a liveness failure sequence in practice.

The kubelet sends an HTTP GET to the configured path and port on the container. While the container is healthy, it responds with a 2xx status code and the kubelet takes no action. When the container begins degrading, it stops responding. The kubelet sends the probe again. No answer. And again. No answer. Once the configured failure threshold is crossed, the kubelet restarts the container.

This sequence is why the failureThreshold parameter matters. A single missed probe does not trigger a restart. Kubelet applies the threshold to tolerate transient network hiccups and brief GC pauses that would cause spurious restarts if the threshold were set to 1.
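The threshold behavior can be illustrated with a toy Python model (this is a simplified simulation, not kubelet source): failures must be consecutive to trigger a restart, and any success resets the streak.

```python
def probe_results_to_actions(results, failure_threshold=3):
    """Simulate how consecutive probe failures trigger a restart.

    `results` is a sequence of booleans (True = probe succeeded).
    Returns one action per probe: 'ok' or 'restart'.
    """
    failures = 0
    actions = []
    for ok in results:
        if ok:
            failures = 0              # any success resets the streak
            actions.append("ok")
        else:
            failures += 1
            if failures >= failure_threshold:
                actions.append("restart")
                failures = 0          # restarted container starts clean
            else:
                actions.append("ok")  # tolerated transient failure
    return actions

# Two transient failures followed by a success never cross the threshold:
# probe_results_to_actions([False, False, True, False]) -> all "ok"
```

This is why a brief GC pause that drops one or two probes leaves the container untouched, while a genuine deadlock that fails every probe eventually crosses the threshold.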

Configuring Liveness Probes for Production (Keep it simple and safe.)

  • The liveness endpoint must be cheap and self-contained. Do not call databases or external services.

  • Return 200 if the process is alive, 5xx only for unrecoverable states like deadlocks.

  • Use a startup probe for slow apps. It prevents premature restarts during initialization.

  • Test under load. Slow responses can trigger false restarts with low timeoutSeconds.

  • Set failureThreshold conservatively. Restarts should happen only for real failures, not temporary slowdowns.

Rule: Liveness probes are for unrecoverable failures, not full system health checks.
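As a sketch of the "cheap and self-contained" rule, one common pattern is an in-process heartbeat: a worker thread bumps a counter as it makes progress, and the health handler reports 500 once progress stops. Names here are illustrative, and a real service would serve this status from the endpoint the probe's httpGet targets:

```python
import threading

class Heartbeat:
    """Toy liveness signal: the worker bumps a counter on each completed
    iteration; the probe handler reports 500 once progress stops.
    No database or network calls, so the check stays cheap."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self._last_seen = 0

    def beat(self):
        # Called by the worker loop on every completed iteration.
        with self._lock:
            self._count += 1

    def liveness_status(self):
        # Called by the health handler on each probe:
        # 200 if the worker advanced since the last probe, 500 otherwise.
        with self._lock:
            alive = self._count != self._last_seen
            self._last_seen = self._count
            return 200 if alive else 500
```

Note the tradeoff this design implies: the worker must beat at least once per probe period, so periodSeconds has to be comfortably longer than one work iteration, or a healthy-but-slow worker would be restarted.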


🔴 Get my DevOps & Kubernetes ebooks! (free for Premium Club and Personal Tier newsletter subscribers)

Looking to promote your company, product, service, or event to 56,000+ DevOps and Cloud Professionals? Let's work together.

Keep Reading