Resource details
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. A request is the amount a container is guaranteed to get: Kubernetes only schedules the container on a node that has that much unreserved capacity. A limit is the ceiling a container may not exceed. A container that reaches its CPU limit is throttled; a container that exceeds its memory limit is terminated (OOM-killed).
CPU resources are defined in cores or millicores (m). If your container needs two full cores to run, you would put the value “2” or “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”.
Memory resources are defined in bytes. Normally you give a value in mebibytes (Mi). Note that 1 Mi is 1,048,576 bytes, about 5 % larger than a megabyte (M, 1,000,000 bytes), so the two suffixes are not interchangeable. You can use any unit from bytes to petabytes.
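To make the units concrete, here is a minimal sketch that converts CPU and memory quantities into millicores and bytes. The function names `cpu_to_millicores` and `memory_to_bytes` are hypothetical helpers written for this illustration and are not part of GAP or Kubernetes tooling; the sketch covers only the common notations discussed above.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) memory suffixes.
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15,
}


def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as "2", "0.25" or "250m" to millicores."""
    value = value.strip()
    if value.endswith("m"):
        return int(value[:-1])
    return int(Decimal(value) * 1000)


def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as "125Mi", "1Gi" or "500M" to bytes."""
    value = value.strip()
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(MEMORY_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return int(Decimal(value[: -len(suffix)]) * MEMORY_SUFFIXES[suffix])
    return int(Decimal(value))  # no suffix: plain bytes


print(cpu_to_millicores("2"), cpu_to_millicores("250m"))
print(memory_to_bytes("125Mi"), memory_to_bytes("125M"))
```

The last line shows the difference between the two suffixes: the same number with `Mi` yields noticeably more bytes than with `M`.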
The current GAP resource request and limit defaults on the various clusters are the following.
| CLUSTER | CPU REQUEST | CPU LIMIT | RAM REQUEST | RAM LIMIT |
|---|---|---|---|---|
| staging | 125m | 500m | 125Mi | 500Mi |
| production | 250m | 1 | 250Mi | 1Gi |
Set your request and limit values according to your needs, keeping the following in mind:
- Resource limits must be greater than or equal to the resource requests.
- CPU limits must not be set below 100m, because lower values cause unreliable throttling on the cluster. Requests may be set below 100m if an app is mostly idle.
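The two rules above can be sketched as a small pre-deployment check. This is an illustrative helper only: `check_resources` is a hypothetical name, it is not part of GAP, and it assumes the values have already been converted to millicores and bytes.

```python
MIN_CPU_LIMIT_MILLICORES = 100  # floor for CPU limits stated in the rules above


def check_resources(cpu_request_m, cpu_limit_m, mem_request_bytes, mem_limit_bytes):
    """Return a list of rule violations (empty when the values are acceptable)."""
    problems = []
    if cpu_limit_m < cpu_request_m:
        problems.append("CPU limit is below the CPU request")
    if mem_limit_bytes < mem_request_bytes:
        problems.append("memory limit is below the memory request")
    if cpu_limit_m < MIN_CPU_LIMIT_MILLICORES:
        problems.append("CPU limit is below 100m")
    return problems


MI = 2**20
# A mostly idle app: a 50m request is fine, but an 80m limit is too low.
print(check_resources(50, 80, 64 * MI, 128 * MI))
```

A low request combined with a limit of at least 100m passes the check, which matches the allowance for mostly idle apps.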
Resource requests and limits can be defined on a per-deployment basis in gap.yaml. Please see the deployments section in gap-setup.
To set different values for staging and production, use per-environment override YAML files.
Resource requests directly determine what we pay for. Kubernetes reserves the requested amount on a node regardless of whether the application actually uses it. Over-requesting wastes cluster capacity and increases infrastructure costs for everyone.
Please review the resource usage of every one of your deployments — on both staging and production — and adjust requests to match real-world needs.
We provide detailed Grafana dashboards that make this straightforward:
- Open the deployment dashboards (EU production | EU staging) and select your application. Observability and logging solutions for the other multi-region instances are still being worked on.
- Inspect each running container in the pod individually. A pod can contain multiple containers (e.g. your app, cloudsql-proxy, istio-proxy, flagd sidecar) — check every one of them, as each has its own resource settings. The aggregated pod-level view bundles all container requests and usage together, which can be misleading: a pod may look reasonably utilized overall while one container is heavily over-requested and another is near its limit. Selecting containers individually gives you a clear picture of exactly where adjustments are needed.
- Look at the usage-to-request percentage panels. These show at a glance how much of the requested resources is actually being consumed. If your container consistently sits at 5–10 % of its request, you are heavily over-requesting.
- Compare staging and production. Staging typically receives much less traffic and can use lower requests — this is often where requests are the most inflated relative to actual usage. Check the EU staging dashboard to see how much your apps really consume there. Use per-environment overrides to set appropriate values for each cluster separately.
- Use a sufficiently long time range. When analysing usage on the dashboard, look at at least a few days of data. Very short time frames (e.g. the last hour or the last few hours) can be misleading: they may capture an idle period or an unusual spike that is not representative of normal behaviour. A longer window gives a much more reliable picture of your application’s actual resource consumption. If the dashboard loads slowly with a wider range, try 2–3 days as a compromise between accuracy and performance.
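The point about aggregated pod-level views can be illustrated with a little arithmetic. The sketch below uses made-up numbers (not taken from any real dashboard) for a pod with two containers, and `usage_to_request` is a hypothetical helper name.

```python
def usage_to_request(containers):
    """Return ({container: percent}, pod_percent) from {name: (usage_m, request_m)}."""
    per_container = {
        name: round(100 * usage / request, 1)
        for name, (usage, request) in containers.items()
    }
    total_usage = sum(usage for usage, _ in containers.values())
    total_request = sum(request for _, request in containers.values())
    return per_container, round(100 * total_usage / total_request, 1)


# Hypothetical CPU figures in millicores: (observed usage, configured request).
pod = {"app": (450, 500), "istio-proxy": (10, 500)}
per_container, pod_level = usage_to_request(pod)
print(per_container)  # per-container usage as a percentage of the request
print(pod_level)      # the same ratio aggregated over the whole pod
```

With these numbers the pod as a whole looks moderately utilized, while one container is close to its request and the other uses almost none of it. That is why the per-container view is the one to act on.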
A container whose usage/request ratio stays below 20 % most of the time is a strong candidate for lower requests. Start by reducing the request to roughly 1.5–2× the observed P95 usage and monitor from there.
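As a rough sketch of this rule of thumb, the helper below takes a series of usage samples and proposes a new request. It is only an illustration under stated assumptions: `suggest_request` and `percentile` are hypothetical names, the P95 is computed with a simple nearest-rank method, and "below 20 % most of the time" is approximated as "P95 usage is below 20 % of the request".

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_request(usage_samples, current_request, headroom=1.5, threshold=0.20):
    """Propose a lower request when P95 usage is under `threshold` of the request."""
    p95 = percentile(usage_samples, 95)
    if p95 / current_request >= threshold:
        return current_request  # not clearly over-requested; keep the value
    return math.ceil(p95 * headroom)


# Hypothetical CPU samples in millicores collected over a few days.
samples = [40] * 19 + [60]
print(suggest_request(samples, current_request=1000))              # headroom 1.5
print(suggest_request(samples, current_request=1000, headroom=2))  # headroom 2
```

The `headroom` parameter corresponds to the 1.5–2× range: start at either end, deploy, and adjust after observing the dashboards again.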
Set your application’s request values to match the usual / sustained load, not the theoretical maximum. If your app idles at 50m CPU and occasionally spikes to 300m, a request of 100–150m is far more appropriate than 1000m.
Key guidelines:
- Check all environments. Staging workloads are typically lighter — don’t copy production values to staging (or vice versa). Use per-environment overrides.
- Check every container in the pod. Sidecar containers (proxies, log shippers, etc.) often need very little CPU/memory. Review them individually on the dashboard.
- Iterate. Lower your requests, deploy, observe for a few days, and adjust again if needed.
A good rule of thumb is to set your limit 25 % above what you think your maximum spike usage could be.
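Worked as arithmetic, the 25 % rule looks like the sketch below. `suggest_cpu_limit` is a hypothetical helper; it additionally assumes that the result should never drop below the request or below the 100m floor for CPU limits mentioned earlier.

```python
import math

MIN_CPU_LIMIT_MILLICORES = 100  # CPU limits below this throttle unreliably


def suggest_cpu_limit(max_spike_m, request_m, margin=0.25):
    """Return a CPU limit (millicores) `margin` above the expected maximum spike."""
    with_margin = math.ceil(max_spike_m * (1 + margin))
    # Never go below the request or the cluster-wide floor for CPU limits.
    return max(with_margin, request_m, MIN_CPU_LIMIT_MILLICORES)


# An app that spikes to about 300m with a 150m request.
print(suggest_cpu_limit(max_spike_m=300, request_m=150))
```

For an app that spikes to roughly 300m, this yields a limit of 375m.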
A typical issue caused by incorrectly set limits is throttling — the mechanism Kubernetes uses to prevent your application from exceeding its CPU limit. Throttling makes your application slower, which can lead to gateway timeouts and other cascading failures.
You can see when your pods are throttling on the deployment dashboard or globally on the throttling dashboard.
For Node.js applications with high or spiky CPU usage, set the CPU limit to “1250m”. Unless the application explicitly uses multi-threading or clustering (e.g. throng), a Node.js process cannot use more than this.
You can see the resource usage of your applications on the deployment dashboard.
Auto scaling is based on resource requests. You can read more about the subject in the auto scaling documentation.