How Kubernetes Applies Resource Quotas

Kubernetes clusters are shared infrastructure. Without guardrails, a single team running a poorly configured batch job can consume all the CPU and memory on every node, starving every other team's workloads. Production services then go down not because of a bug in their own code but because a neighbouring namespace consumed everything.

A ResourceQuota sets a hard ceiling on the total resources a namespace can consume. Once the limit is reached, new pods that would push the namespace over it are rejected at the API server. Running workloads continue unaffected.

Resource Quota Request Flow

After authentication and authorization, every request that creates or modifies an object passes through admission control before it is written to etcd. The ResourceQuota plugin sits in this pipeline alongside other admission plugins such as ServiceAccount.

When a pod creation request arrives, the ResourceQuota plugin evaluates it against the namespace's current consumption. If the request would exceed the quota, the API server returns an error and nothing is written to etcd. Every plugin in the chain, whether it handles resource management or security, must pass before the object is persisted.
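The flow above can be sketched as a chain of steps that must all succeed before a write. This is a toy model, not the API server's code: `create_pod`, the two step functions, the `state` dictionary, and the `store` dictionary standing in for etcd are all hypothetical names chosen for illustration.

```python
class Rejected(Exception):
    """Raised by an admission step to stop the request (a 403 in the real API)."""


def service_account_step(pod, state):
    # Mutating step: assign the "default" service account if none is set.
    pod.setdefault("serviceAccountName", "default")


def resource_quota_step(pod, state):
    # Validating step: reject if the new pod would push usage over the quota.
    if state["used_cpu_m"] + pod["cpu_request_m"] > state["hard_cpu_m"]:
        raise Rejected("exceeded quota")
    state["used_cpu_m"] += pod["cpu_request_m"]


def create_pod(pod, state, store,
               steps=(service_account_step, resource_quota_step)):
    for step in steps:
        step(pod, state)        # an exception here aborts before the write
    store[pod["name"]] = pod    # only reached when every step passed


state = {"used_cpu_m": 200, "hard_cpu_m": 250}  # millicores
store = {}                                       # stand-in for etcd
create_pod({"name": "small", "cpu_request_m": 50}, state, store)
try:
    create_pod({"name": "big", "cpu_request_m": 500}, state, store)
except Rejected as err:
    print("rejected:", err)
print(sorted(store))
```

The point of the sketch is the ordering: the write to the store is the last statement, so a rejected pod leaves no trace there.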

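Critically, quota enforcement happens at admission time, not retroactively. If you add a quota to a namespace already consuming 8 CPU and set the limit to 4 CPU, existing pods keep running. Only new pods are blocked going forward. This makes quotas safe to apply to live namespaces without evicting anything that is running. The caveat is that a replacement pod counts as a new pod: if an existing pod is deleted, evicted, or replaced during a rollout while the namespace is still over quota, its replacement is rejected.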
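To make the "not retroactive" point concrete, here is a small simulation of the 8 CPU / 4 CPU scenario. The `NamespaceState` class and its fields are hypothetical; the only behaviour it models is that the check runs against the incoming pod and never removes pods that were already admitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NamespaceState:
    # Hypothetical model: pod name -> CPU request in millicores.
    pods: Dict[str, int] = field(default_factory=dict)
    # Quota ceiling in millicores; None means no quota is defined.
    hard_cpu_m: Optional[int] = None

    def used_cpu_m(self) -> int:
        return sum(self.pods.values())

    def admit(self, name: str, cpu_request_m: int) -> bool:
        """Check only the incoming pod; never touch pods already admitted."""
        if self.hard_cpu_m is not None:
            if self.used_cpu_m() + cpu_request_m > self.hard_cpu_m:
                return False
        self.pods[name] = cpu_request_m
        return True


ns = NamespaceState()
ns.admit("web-1", 4000)
ns.admit("web-2", 4000)      # namespace now uses 8 CPU, no quota yet

ns.hard_cpu_m = 4000         # a 4 CPU quota is added afterwards
admitted = ns.admit("web-3", 100)

print(ns.used_cpu_m(), admitted, sorted(ns.pods))
# prints: 8000 False ['web-1', 'web-2']
```

Usage stays at 8000 millicores after the quota is added, and only the new pod is turned away.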

Quotas are Scoped Per Namespace

Each ResourceQuota applies to exactly one namespace. Different namespaces have completely independent limits and independent tracking.

A pod in Namespace 2 consuming 200 milliCPUs has zero effect on the quota tracking in Namespace 1. This independence is what makes the namespace model viable for multi-tenant clusters. Each team gets their allocated resource budget, and teams cannot interfere with each other.

The quota values are aggregates. A requests.cpu quota of 250m (250 milliCPUs) means the CPU requests of all pods in that namespace cannot exceed 250 milliCPUs combined. The sum of all pod requests is what matters, not any individual pod. You can also set quotas on object counts, limiting the number of pods, Services, PersistentVolumeClaims, Secrets, and ConfigMaps per namespace.
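The aggregate, per-namespace accounting can be sketched in a few lines. The namespace names, pod names, and the helper functions below are made up for illustration; the 250m ceiling and the 200m consumer are the figures used above.

```python
from collections import defaultdict


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def usage_by_namespace(pods):
    """Sum CPU requests per namespace; namespaces never share a total."""
    totals = defaultdict(int)
    for namespace, _name, cpu in pods:
        totals[namespace] += cpu_to_millicores(cpu)
    return dict(totals)


def fits(namespace, new_cpu, pods, hard):
    """True if a new pod's CPU request keeps the namespace within its quota."""
    used = usage_by_namespace(pods).get(namespace, 0)
    limit = cpu_to_millicores(hard[namespace])
    return used + cpu_to_millicores(new_cpu) <= limit


pods = [
    ("namespace-1", "api", "100m"),
    ("namespace-1", "worker", "100m"),
    ("namespace-2", "batch", "200m"),
]
hard = {"namespace-1": "250m", "namespace-2": "250m"}

print(usage_by_namespace(pods))
print(fits("namespace-1", "50m", pods, hard))    # 200m + 50m stays within 250m
print(fits("namespace-1", "100m", pods, hard))   # 200m + 100m exceeds 250m
```

The 200m consumed in namespace-2 never enters the namespace-1 calculation, which is the independence the section describes.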

How Quotas Connect to Pod-Level Resource Specs

ResourceQuota operates at the namespace aggregate level but connects directly to the resource requests and limits on individual pods and containers.

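The quota system tracks the summed requests and limits of all containers in every pod in the namespace that has not reached a terminal state (Succeeded or Failed), so Pending pods count as well as Running ones. Whenever a new pod is submitted, the ResourceQuota admission plugin checks that this running total plus the new pod's values stays within the quota.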

This creates a dependency that catches many engineers off guard. When a namespace has a ResourceQuota that constrains a compute resource, every pod submitted to that namespace must specify that value on every container: a quota on requests.cpu requires a CPU request, a quota on limits.memory requires a memory limit, and so on. A pod without them is rejected immediately.

This is intentional. A quota tracking total CPU requests cannot function correctly if some pods have no CPU requests, because those pods consume unmeasured resources and the accounting breaks. This is why LimitRange is commonly paired with ResourceQuota. LimitRange injects default requests and limits into containers that do not specify them, so developers do not need to add resource specs manually on every pod, while the quota enforcement stays complete and accurate.
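The defaulting step can be illustrated with a deliberately simplified sketch. The function name `apply_defaults` and the dictionary shapes are assumptions for illustration, and the sketch only covers the two clear-cut cases: a container with no resource spec at all, and a container that already specifies everything. Containers that set only a request or only a limit follow additional rules that are not modelled here, nor are LimitRange min/max checks.

```python
import copy


def apply_defaults(container, default_request, default_limit):
    """Return a copy of the container with missing requests/limits filled in.

    Values the container already specifies are left untouched.
    """
    result = copy.deepcopy(container)
    resources = result.setdefault("resources", {})
    requests = resources.setdefault("requests", {})
    limits = resources.setdefault("limits", {})
    for name, value in default_request.items():
        requests.setdefault(name, value)
    for name, value in default_limit.items():
        limits.setdefault(name, value)
    return result


# Hypothetical defaults, in the spirit of a LimitRange for containers.
default_request = {"cpu": "100m", "memory": "128Mi"}
default_limit = {"cpu": "200m", "memory": "256Mi"}

bare = {"name": "app"}
explicit = {
    "name": "db",
    "resources": {
        "requests": {"cpu": "500m", "memory": "1Gi"},
        "limits": {"cpu": "1", "memory": "2Gi"},
    },
}

print(apply_defaults(bare, default_request, default_limit)["resources"])
print(apply_defaults(explicit, default_request, default_limit)["resources"])
```

After defaulting, every container carries a value for each tracked resource, so the quota sum has no unmeasured entries.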

What Happens When a Quota is Exceeded

The API server returns a 403 Forbidden with a message like:

pods "my-app-xyz" is forbidden: exceeded quota: team-quota, requested: requests.cpu=500m, used: requests.cpu=200m, limited: requests.cpu=250m
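The three figures in that message are enough to work out how far over the pod would go. The parser below is a rough sketch that assumes the single-resource CPU format shown above; messages that list several resources at once are not handled, and `parse_quota_error` is a hypothetical helper, not part of any Kubernetes client.

```python
import re

PATTERN = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<resource>[\w./-]+)=(?P<requested>[\w.]+), "
    r"used: [\w./-]+=(?P<used>[\w.]+), "
    r"limited: [\w./-]+=(?P<limited>[\w.]+)"
)


def millicores(quantity):
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def parse_quota_error(message):
    """Extract quota name and CPU figures; return None if the format differs."""
    match = PATTERN.search(message)
    if match is None:
        return None
    requested = millicores(match["requested"])
    used = millicores(match["used"])
    limited = millicores(match["limited"])
    return {
        "quota": match["quota"],
        "resource": match["resource"],
        "requested_m": requested,
        "used_m": used,
        "limited_m": limited,
        "headroom_m": limited - used,            # what is still available
        "over_by_m": used + requested - limited, # how far the pod overshoots
    }


MESSAGE = (
    'pods "my-app-xyz" is forbidden: exceeded quota: team-quota, '
    "requested: requests.cpu=500m, used: requests.cpu=200m, "
    "limited: requests.cpu=250m"
)
info = parse_quota_error(MESSAGE)
print(info["headroom_m"], info["over_by_m"])
# prints: 50 450
```

Here only 50m of headroom remains, so a 500m request overshoots the quota by 450m.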

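For Deployments and StatefulSets, the rejection hits the controller rather than you. The ReplicaSet controller (for a Deployment) or the StatefulSet controller keeps retrying pod creation. The Deployment appears partially rolled out, with some replicas running and the rest never created. The failure is easy to miss because no Pending pod exists to inspect; it surfaces as FailedCreate events on the ReplicaSet or StatefulSet.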
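A toy reconcile loop shows why the rollout stalls part-way instead of failing outright. The `reconcile` function is hypothetical and assumes the workload's pods are the only consumers in the namespace; real controllers also back off between attempts, which is not modelled.

```python
def reconcile(desired, running, pod_cpu_m, hard_cpu_m):
    """Try to reach `desired` replicas; stop at the first quota rejection.

    Returns the replica count reached and the events recorded on the way.
    """
    events = []
    while running < desired:
        if (running + 1) * pod_cpu_m > hard_cpu_m:
            events.append("FailedCreate: exceeded quota")
            break
        running += 1
    return running, events


# 5 replicas wanted, 100m each, against a 250m quota.
running, events = reconcile(desired=5, running=0, pod_cpu_m=100, hard_cpu_m=250)
print(running, events)
# prints: 2 ['FailedCreate: exceeded quota']

# The next reconcile makes no progress and records another event.
running, events = reconcile(desired=5, running=running, pod_cpu_m=100, hard_cpu_m=250)
print(running, events)
# prints: 2 ['FailedCreate: exceeded quota']
```

Each pass ends in the same state, which matches what you see in practice: a stable but incomplete replica count and a growing list of creation failures.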

Running kubectl describe quota -n <namespace> is the first thing to check when a workload in a namespace has fewer pods than expected or is unexpectedly unavailable.

Teams that use quotas well apply them automatically to every new namespace through a Helm chart or Kustomize overlay, expose quota utilisation in Grafana dashboards, and treat consistently high utilisation as a signal to increase the allocation before the team hits the ceiling at an inconvenient moment.
