Cluster Upgrade
- Check the Kubernetes changelog
- For each cluster (staging, production, hq), run kubepug to check for deprecated APIs in use.
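A minimal sketch of the per-cluster kubepug check, assuming kubepug and kubectl are on the PATH. The kubectl context names below are hypothetical placeholders, and the target version is only an example. The script builds the command list and prints it rather than running anything.

```python
from typing import Dict, List

# Hypothetical kubectl context names (not from the runbook); replace with the real ones.
CONTEXTS: Dict[str, str] = {
    "staging": "gap-staging",
    "production": "gap-production",
    "hq": "gap-hq",
}


def kubepug_commands(target_version: str) -> List[List[str]]:
    """Return the commands to run, in order, to check every cluster."""
    commands: List[List[str]] = []
    for context in CONTEXTS.values():
        # Point kubectl (and therefore kubepug) at the cluster to check.
        commands.append(["kubectl", "config", "use-context", context])
        # Compare the APIs in use against the target Kubernetes version.
        commands.append(["kubepug", f"--k8s-version={target_version}"])
    return commands


if __name__ == "__main__":
    # "v1.27.0" is an example target version, not a recommendation.
    for cmd in kubepug_commands("v1.27.0"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before running each one by hand.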
- Announce the upgrade in the #infra-announcements Slack channel beforehand with the template:
⚠️ GAP <staging/production/hq>
Dear Users,
We’re starting a cluster upgrade on the <staging/production/hq> cluster. No issues are expected<if staging or production: , but some Deployment Replicas Unavailable alerts might fire during this time>.
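The template can be filled in with a small helper so the alert note is only included for staging and production. This is an illustrative sketch; the function name `render_announcement` is hypothetical, and the message still has to be posted to Slack by hand.

```python
# Clusters for which the announcement mentions possible alerts.
CLUSTERS_WITH_ALERT_NOTE = {"staging", "production"}
KNOWN_CLUSTERS = {"staging", "production", "hq"}


def render_announcement(cluster: str) -> str:
    """Render the upgrade announcement for one cluster."""
    if cluster not in KNOWN_CLUSTERS:
        raise ValueError(f"unknown cluster: {cluster!r}")
    note = ""
    if cluster in CLUSTERS_WITH_ALERT_NOTE:
        note = (
            ", but some Deployment Replicas Unavailable alerts "
            "might fire during this time"
        )
    return (
        f"⚠️ GAP {cluster}\n"
        "Dear Users,\n"
        f"We’re starting a cluster upgrade on the {cluster} cluster. "
        f"No issues are expected{note}."
    )


if __name__ == "__main__":
    print(render_announcement("staging"))
```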
- Upgrade the control plane node:
  - In the Google Cloud Console, upgrade the cluster control plane (GKE -> Cluster -> Details -> Version -> click Upgrade)
  - Set the target version to the desired version
  - Click Save Changes
- Upgrade the node pools (Node pool details -> Edit -> Node version) in the following order for <staging/production>:
  - Cluster-components
  - Whitelist-internal (make sure the gap-staging-whitelist-internal-ip-* whitelist-internal fixed IPs are assigned to nodes in GKE under VPC Network/IP addresses)
  - Whitelist (make sure the gap-staging-whitelist-ip-* whitelist fixed IPs are assigned to nodes in GKE under VPC Network/IP addresses)
  - Standard pool
  - Baseline pool (on staging, you can upgrade the Standard and Baseline pools together)
  - Ingress
- Upgrade the node pools in the following order for <hq>:
  - Cluster-components
  - Baseline pool
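The upgrade orders above can be kept as data so the next pool to upgrade is never ambiguous. This is a sketch under assumptions: the lowercase pool identifiers are hypothetical stand-ins for the real node pool names, the second order is assumed to apply to hq (the only cluster not covered by the first list), and the staging exception (Standard and Baseline together) is not modeled.

```python
from typing import Dict, List, Optional

# Hypothetical pool identifiers; the order follows the runbook.
FULL_ORDER: List[str] = [
    "cluster-components",
    "whitelist-internal",
    "whitelist",
    "standard",
    "baseline",
    "ingress",
]

UPGRADE_ORDER: Dict[str, List[str]] = {
    "staging": FULL_ORDER,
    "production": FULL_ORDER,
    # Assumption: the shorter order is the one for hq.
    "hq": ["cluster-components", "baseline"],
}


def next_pool(cluster: str, already_upgraded: List[str]) -> Optional[str]:
    """Return the next pool to upgrade, or None when all pools are done."""
    order = UPGRADE_ORDER[cluster]
    done = len(already_upgraded)
    # The pools upgraded so far must be a prefix of the prescribed order.
    if already_upgraded != order[:done]:
        raise ValueError("pools were upgraded out of order")
    return order[done] if done < len(order) else None


if __name__ == "__main__":
    print(next_pool("production", ["cluster-components"]))
```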
- Update the kindest/node version here and here for the CI to validate the chart against the current Kubernetes version
- Check the dashboards and alerts to make sure everything is in order.