Security policies

TL;DR: jump to the practical guide

With the Kubernetes v1.25 release, the PodSecurityPolicy admission controller, which GAP relied on, was removed from Kubernetes. We decided to replace it with a purpose-built policy engine, Kyverno.

Of the policy types Kyverno provides, GAP uses two: validating and mutating.

Validating policy

A validating policy checks a Kubernetes resource against a set of rules, which can require that certain fields have certain values or that certain fields are absent. If a pod violates any of the rules, it is rejected and cannot be started (in enforce mode).

Mutating policy

A mutating policy modifies the pod manifest so that it complies with the required rules. If a pod does not comply with a rule, the controller automatically patches it with the needed changes.

Policies

Business workloads on GAP need to comply with the following policies, which are based on the Restricted profile of the Pod Security Standards.

Host namespaces

type: validating

Workloads are not allowed to use the host Linux namespaces (PID, network, IPC). The rule requires the following fields to be absent or set to false:

spec.hostPID
spec.hostNetwork
spec.hostIPC
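
For illustration, a validating rule of this kind can be expressed in Kyverno roughly as follows. This is a sketch modelled on the disallow-host-namespaces sample from the public Kyverno policy library, not necessarily the exact policy deployed on GAP; the policy name, rule name, and message are assumptions.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces     # assumed policy name
spec:
  validationFailureAction: Enforce   # reject violating pods instead of only reporting them
  background: true                   # also scan already existing resources
  rules:
    - name: host-namespaces          # assumed rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Sharing the host namespaces is not allowed."
        pattern:
          spec:
            # The =() anchor means: if the field is present, it must have this value.
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"
```

With such a pattern, a pod that omits the three fields passes, and a pod that sets any of them to true is rejected.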

Privileged containers

type: mutating

Workloads are not allowed to run privileged containers. The rule sets the following fields to false, adding them if they are not present and overwriting any other value:

spec.containers[*].securityContext.privileged
spec.initContainers[*].securityContext.privileged
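
A mutating rule like this could look roughly as follows in Kyverno. This is a minimal sketch under assumptions: the policy and rule names are hypothetical, only the containers list is shown (init containers would need an analogous patch), and the policy actually deployed on GAP may be written differently.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers   # hypothetical policy name
spec:
  rules:
    - name: set-privileged-false         # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # The (name): "*" conditional anchor applies the patch to every container.
              - (name): "*"
                securityContext:
                  privileged: false      # added if missing, overwritten otherwise
```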

Privilege escalation

type: mutating

Containers are not allowed to gain additional privileges at runtime (e.g. via the set-user-ID or set-group-ID file mode). The rule sets the following fields to false:

spec.containers[*].securityContext.allowPrivilegeEscalation
spec.initContainers[*].securityContext.allowPrivilegeEscalation

Run as non-root

type: mutating

Containers must explicitly disallow running as root. The rule sets the following field to true:

spec.securityContext.runAsNonRoot

Run as non-root user

type: validating

Containers must explicitly set the user to a nonzero value, either at container level or at pod level. The rule requires the following fields to be present with a nonzero value:

spec.securityContext.runAsUser or
spec.containers[*].securityContext.runAsUser
spec.initContainers[*].securityContext.runAsUser
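
The two ways of satisfying this rule can be sketched as the following manifest fragments. Container names, images, and the uid 1000 are placeholders chosen for illustration.

```yaml
# Option 1: pod-level setting, inherited by all containers and init containers
spec:
  securityContext:
    runAsUser: 1000
  containers:
    - name: app                 # hypothetical container
      image: example/app:1.0    # hypothetical image
---
# Option 2: container-level setting on every container and init container
spec:
  initContainers:
    - name: init                # hypothetical init container
      image: example/init:1.0   # hypothetical image
      securityContext:
        runAsUser: 1000
  containers:
    - name: app
      image: example/app:1.0
      securityContext:
        runAsUser: 1000
```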

Drop all capabilities

type: mutating

Containers must drop ALL capabilities. The rule adds the below snippet to the following fields:

spec.containers[*].securityContext
spec.initContainers[*].securityContext

capabilities:
  drop:
  - ALL

Disallow add capabilities

type: validating

Containers are not allowed to add capabilities. The rule requires the following fields to be absent:

spec.containers[*].securityContext.capabilities.add
spec.initContainers[*].securityContext.capabilities.add

Volume types

type: validating

Workloads are allowed to use certain volume types only. The rule validates the following field against the allowed types:

spec.volumes[*]

Allowed types are the following:

configMap
csi
downwardAPI
emptyDir
ephemeral
persistentVolumeClaim
projected
secret
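
As an example, the fragment below mixes allowed and disallowed volume types. The volume and ConfigMap names are hypothetical; hostPath is used as the disallowed example simply because it does not appear in the list above.

```yaml
spec:
  volumes:
    # Allowed: configMap is in the list
    - name: config
      configMap:
        name: app-config   # hypothetical ConfigMap
    # Allowed: emptyDir is in the list
    - name: scratch
      emptyDir: {}
    # Not allowed: hostPath is not in the list, so the pod would be rejected
    - name: host-logs
      hostPath:
        path: /var/log
```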

AppArmor and Seccomp

type: mutating

Workloads are required to use runtime/default profiles. The rule adds the following fields:

metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]: runtime/default for each container
metadata.annotations["seccomp.security.alpha.kubernetes.io/pod"]: runtime/default
spec.securityContext.seccompProfile.type: RuntimeDefault
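
For a pod with a single container named app (a hypothetical name), the mutated manifest would then contain roughly the following. The fragment only shows the fields listed above.

```yaml
metadata:
  annotations:
    # One AppArmor annotation per container; the suffix is the container name
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app   # hypothetical container name
```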

Extra policies

We implemented other policies not directly related to pod security.

Deny `ExternalName` service port

type: validating

A service of type ExternalName with any port defined causes all traffic in Istio to be redirected to that service. This policy blocks services that set spec.type: ExternalName and define the following field with any value:

spec.ports[*].port
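
The difference between a blocked and an accepted service can be sketched as follows. The service names and the external host name are made up for illustration.

```yaml
# Rejected: ExternalName service that defines a port
apiVersion: v1
kind: Service
metadata:
  name: external-db              # hypothetical name
spec:
  type: ExternalName
  externalName: db.example.com   # hypothetical external host
  ports:
    - port: 5432
---
# Accepted: the same service without spec.ports
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com
```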

Prevent namespace deletion

type: validating

This policy aims to prevent accidental namespace deletion by requiring a specific label on the namespace when a DELETE request is made. The deletion is rejected unless the namespace has the following label set:

metadata.labels.delete: allow
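
A namespace that is ready to be deleted would therefore carry the label as in the sketch below. The namespace name is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-team-sandbox   # hypothetical namespace
  labels:
    delete: allow         # without this label, DELETE requests are rejected
```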

How does that affect me? (Violations and mitigation)

The policy engine prevents workloads that violate policies from starting (for an exception, see Known issues).

You can find AdmissionReport and BackgroundScanReport in the cluster for each pod and see which policies the pod passed or failed. Example AdmissionReport:

apiVersion: kyverno.io/v1alpha2
kind: AdmissionReport
metadata:
  creationTimestamp: "2023-08-25T11:23:28Z"
  generation: 1
  labels: {...}
  name: 210be514-77ff-4c88-958a-8f4cc658a4ef
  namespace: default
spec:
  owner:
    apiVersion: ""
    kind: ""
    name: ""
    uid: ""
  results:
  - category: Pod Security Standards (Baseline)
    message: validation rule 'adding-capabilities' passed.
    policy: disallow-capabilities # <---------- Policy name
    resources:
    - apiVersion: v1
      kind: Pod
      name: test
      namespace: default
      uid: 210be514-77ff-4c88-958a-8f4cc658a4ef
    result: pass # <---------- Result of validation
    rule: adding-capabilities # <---------- Rule name within policy
    scored: true
    severity: medium
    source: kyverno
    timestamp:
      nanos: 0
      seconds: 1692962608
  [...]
  summary:
    error: 0
    fail: 6
    pass: 13
    skip: 0
    warn: 0

GAP applications generated from the gap.yaml configuration should not violate any policy; however, patches, custom resources, and ad-hoc pods may.

If you encounter unexpected validation failures, find the violated rule above and try to mitigate the issue. If you are uncertain, you can always ask for help in the #infra-support Slack channel.

GAP application

If your GAP configuration results in a policy violation, your deployment to the staging environment will fail. In ArgoCD, open your application, click Sync failed, and look for the red (💔) events. An error like the one below tells you which policy was violated; if you are unsure how to mitigate it, ask for help in the #infra-support Slack channel.

Error from server: error when creating "test-objects/pod.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/default/test was blocked due to the following policies 

disallow-capabilities-strict:
  require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'

Ad-hoc pod

If you try to start an ad-hoc pod (e.g. via the kubectl run command) that violates the policies and the mutation controller cannot fix it automatically, your pod will be rejected. You will see the above error message directly in your terminal.
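
For reference, an ad-hoc pod manifest along the following lines sets the fields required by the policies in this document explicitly, so it should not depend on mutation. This is a minimal sketch: the pod name, image, and command are assumptions, and 65534 is used as the uid of the user nobody.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug                        # hypothetical pod name
spec:
  securityContext:
    runAsNonRoot: true               # Run as non-root
    runAsUser: 65534                 # Run as non-root user (nonzero uid)
    seccompProfile:
      type: RuntimeDefault           # Seccomp
  containers:
    - name: debug
      image: busybox:1.36            # assumed image, replace with your own
      command: ["sleep", "3600"]
      securityContext:
        privileged: false            # Privileged containers
        allowPrivilegeEscalation: false   # Privilege escalation
        capabilities:
          drop:
            - ALL                    # Drop all capabilities
```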

Known issues

Non-numeric user

This issue does not affect the main business application container for the following reason:
When you create a new user to run your application in your Dockerfile, it gets the uid of 1000 by default, since the uid for new users starts from 1000 on most distributions.
Then the GAP manifest generation automatically adds runAsUser: 1000 to the main container.

If your container image sets the user as a username instead of a numeric ID, the kubelet cannot verify that the user is non-root, which the runAsNonRoot setting requires. The pod will be stuck in CreateContainerConfigError, and describing the pod shows an event similar to the following:

Warning  Failed           4s (x2 over 5s)  kubelet            Error: container has runAsNonRoot and image has non-numeric user (test), cannot verify user is non-root (pod: "test_default(d0733e6a-7485-4ca5-811f-eb56ed4c5a30)", container: test)

This can happen if your container image is built so that it runs as a named user. An example is the stunnel image (eu.gcr.io/ems-gap-images/stunnel:v2) that some teams are using. When stunnel is installed, a user named stunnel is created and the container is set up so that the process runs as this user, but the container runtime (containerd) cannot determine the uid of the user.

To mitigate this issue, run your image locally and determine the uid of the named user:

docker run --rm -it eu.gcr.io/ems-gap-images/stunnel:v2 sh
/ $ id
uid=100(stunnel) gid=101(stunnel) groups=101(stunnel)
/ $

As we see, the uid of stunnel is 100.

After that you can set it in your patch or custom resource in the container’s securityContext as follows:

containers:
  - name: stunnel
    image:
      repository: sap-ems-base-infra-package-p/gap-images/stunnel
      tag: v2
    command:
      - dumb-init
    args:
      - stunnel
      - /etc/stunnel/config
    securityContext:
      runAsUser: 100
    ...

Or, if you are running an ad-hoc pod with kubectl run, add the --overrides flag with a JSON value as follows:

kubectl run --overrides='{
  "apiVersion": "v1",
  "spec": {
    "securityContext": {
      "runAsUser": 65534
    }
  }
}' ...

(65534 is the uid of the user nobody which is present in most Linux distributions.)
