GAP Documentation

Troubleshooting resources

One-off debug pod

apiVersion: v1
kind: Pod
metadata:
  name: netshoot-unprivileged-meshed
  namespace: cloud-platform
  labels:
    app: netshoot-unprivileged-meshed
    istio.io/rev: default
spec:
  containers:
  - name: netshoot-unprivileged
    resources:
      limits:
        cpu: 50m
        memory: 50Mi
      requests:
        cpu: 50m
        memory: 50Mi
    image: eu.gcr.io/ems-gap-images/netshoot-unprivileged:latest
    command: ["/bin/sleep", "3650d"]
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 1000
  restartPolicy: Always
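
The manifest above could be used roughly as follows. This is a minimal sketch, not a prescribed workflow: the function names and the default manifest file name (`netshoot-unprivileged-meshed.yaml`) are hypothetical, and `/bin/sh` is assumed to exist in the image. The pod name, namespace, and container name are taken from the manifest. Because the pod carries an `istio.io/rev` label, a sidecar is likely injected, so the container is named explicitly when opening a shell.

```shell
#!/bin/sh
# Values taken from the manifest above.
NAMESPACE="cloud-platform"
POD="netshoot-unprivileged-meshed"
CONTAINER="netshoot-unprivileged"

# Create the pod and wait until it is Ready.
# The default file name is an assumption; pass the real path as $1.
start_debug_pod() {
  kubectl apply -f "${1:-netshoot-unprivileged-meshed.yaml}"
  kubectl wait --for=condition=Ready "pod/$POD" -n "$NAMESPACE" --timeout=120s
}

# Open an interactive shell in the debug container (not the mesh sidecar).
debug_shell() {
  kubectl exec -it "$POD" -n "$NAMESPACE" -c "$CONTAINER" -- /bin/sh
}

# Remove the pod when finished; restartPolicy: Always keeps it running otherwise.
stop_debug_pod() {
  kubectl delete pod "$POD" -n "$NAMESPACE"
}
```

Typical use would be `start_debug_pod`, then `debug_shell`, then `stop_debug_pod` once the investigation is done.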

Node networking debug checklist

  • check node resource usage (including bandwidth)
    • overall node resource trend
    • per-pod resource trend
  • calico-node is running and not being throttled
    • check logs for obvious errors
  • netd is running and not being throttled
    • check logs for obvious errors
  • kube-dns is running and not being throttled
    • check logs for obvious errors
  • check basic connectivity
    • from the node (one-shot pod or another running workload) (telnet/curl/nc -vz)
    • to the node (one-shot pod or another running workload) (telnet/curl/nc -vz)
  • symptoms
    • connection reset
      • indicates that the other side is closing an existing connection
      • if sporadic, rule out a keepalive timeout
      • if reproducible
    • connection refused
      • indicates that the other side is not accepting the connection at all
    • no route to host
      • usually a mistake on the caller's side: check that the source/destination IPs belong to active resources
    • timeout
      • indicates packets dropped at the firewall level or bad routing
      • rule out network policies
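
The checklist above can be partly scripted. The following is a sketch under stated assumptions: all function names are hypothetical, the daemons are assumed to run in `kube-system` with the labels `k8s-app=calico-node`, `k8s-app=netd`, and `k8s-app=kube-dns` (verify these in your cluster), and `kubectl top` needs a metrics source in the cluster. Throttling itself is not visible through these commands and still has to be checked in your monitoring.

```shell
#!/bin/sh
# Overall usage of one node, plus the pods scheduled on it.
node_usage() {
  kubectl top node "$1"
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

# Recent error-looking log lines from a networking daemon.
# $1 is the assumed k8s-app label value: calico-node, netd, or kube-dns.
daemon_errors() {
  kubectl logs -n kube-system -l "k8s-app=$1" --all-containers --tail=200 \
    | grep -iE 'error|fail|panic' || echo "no obvious errors for $1"
}

# TCP reachability probe with a 5 second timeout.
# Run it inside the debug pod or another running workload.
probe() {
  nc -vz -w 5 "$1" "$2" 2>&1
}

# Map a client error message to the matching checklist hint.
classify_symptom() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"connection reset"*)
      echo "reset: other side closed an existing connection; if sporadic, rule out keepalive timeout" ;;
    *"connection refused"*)
      echo "refused: other side is not accepting the connection" ;;
    *"no route to host"*)
      echo "no-route: check that source/destination IPs belong to active resources" ;;
    *"timed out"*|*"timeout"*)
      echo "timeout: packets dropped by a firewall or bad routing; rule out network policies" ;;
    *)
      echo "unknown: no matching symptom in the checklist" ;;
  esac
}
```

For example, `classify_symptom "$(probe 10.0.0.12 443)"` (the address and port are placeholders) prints the hint for whatever error `nc` reported, and `daemon_errors netd` scans the last 200 log lines per container for obvious errors.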