Security policies
TL;DR: jump to the practical guide.
With the Kubernetes v1.25 release, the PodSecurityPolicy admission controller, which GAP relied on, was removed from Kubernetes. We decided to replace it with a purpose-built policy engine, Kyverno.
Kyverno provides two types of policies: validating and mutating.
A validating policy checks a Kubernetes resource against a set of rules, which can require certain fields to have certain values or to be absent. If a pod violates any of the rules, it is rejected and cannot be started (in enforce mode).
A mutating policy modifies the pod manifest so that it complies with the rules. If a pod does not comply with a rule, the controller patches it with the required changes automatically.
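As a rough illustration of the two policy types, the sketch below shows what a validating and a mutating Kyverno ClusterPolicy can look like. It is modeled on the public Kyverno sample policies and is not the actual GAP policy set: the policy and rule names are hypothetical, and the spelling and placement of the enforcement setting can differ between Kyverno versions.

```yaml
# Illustrative sketch only; names are hypothetical and the real GAP policies may differ.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-disallow-host-namespaces   # hypothetical name
spec:
  # Enforce rejects non-compliant resources; Audit only reports them.
  validationFailureAction: Enforce
  rules:
    - name: host-namespaces
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Sharing the host namespaces is not allowed."
        pattern:
          spec:
            # =(field) means: if the field is present, it must match the value.
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-set-run-as-non-root   # hypothetical name
spec:
  rules:
    - name: set-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        # Strategic merge patch applied to the incoming Pod at admission.
        patchStrategicMerge:
          spec:
            securityContext:
              runAsNonRoot: true
```

The first policy only inspects the resource and rejects it on a mismatch, while the second one changes the resource so that it complies.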
Business workloads on GAP need to comply with the following policies, which are based on the Restricted profile of the Pod Security Standards.
type: validating
Workloads are not allowed to use the host Linux namespaces (PID, network, IPC). The rule requires the following fields to be absent or set to false:
- spec.hostPID
- spec.hostNetwork
- spec.hostIPC
type: mutating
Workloads are not allowed to run privileged containers. The rule sets the following fields to false, adding them if they are not present:
- spec.containers[*].securityContext.privileged
- spec.initContainers[*].securityContext.privileged
type: mutating
Containers are not allowed to gain additional privileges at runtime (e.g. via set-user-ID or set-group-ID file mode). The rule sets the following fields to false:
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.initContainers[*].securityContext.allowPrivilegeEscalation
type: mutating
Containers must explicitly disallow running as root. The rule sets the following field to true:
spec.securityContext.runAsNonRoot
type: validating
Containers must explicitly set the user to a nonzero value, either at container level or at pod level. The rule requires the following fields to be present with a nonzero value:
- spec.securityContext.runAsUser
or
- spec.containers[*].securityContext.runAsUser
- spec.initContainers[*].securityContext.runAsUser
type: mutating
Containers must drop ALL capabilities. The rule adds the snippet below to the following fields:
- spec.containers[*].securityContext
- spec.initContainers[*].securityContext
capabilities:
  drop:
    - ALL
type: validating
Containers are not allowed to add capabilities. The rule requires the following fields to be absent:
- spec.containers[*].securityContext.capabilities.add
- spec.initContainers[*].securityContext.capabilities.add
type: validating
Workloads are allowed to use certain volume types only. The rule validates the following field against the allowed types:
spec.volumes[*]
Allowed types are the following:
- configMap
- csi
- downwardAPI
- emptyDir
- ephemeral
- persistentVolumeClaim
- projected
- secret
type: mutating
Workloads are required to use the runtime/default AppArmor and seccomp profiles. The rule adds the following fields:
- metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]: runtime/default (for each container)
- metadata.annotations["seccomp.security.alpha.kubernetes.io/pod"]: runtime/default
- spec.securityContext.seccompProfile.type: RuntimeDefault
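To tie the pod security rules above together, here is a minimal sketch of a Pod manifest that should satisfy them. The pod name, container name, image, and uid are hypothetical. Several of these fields are set by the mutating policies anyway, so writing them out explicitly mainly serves as documentation.

```yaml
# Illustrative Pod that follows the rules listed above.
# Pod name, container name, image and uid are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  # hostPID, hostNetwork and hostIPC are left out; absent fields are compliant.
  securityContext:
    runAsNonRoot: true          # containers must not run as root
    runAsUser: 1000             # explicit nonzero uid at pod level
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/example-app:1.0.0
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL               # no capabilities.add entry
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir: {}              # one of the allowed volume types
```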
We also implemented policies that are not directly related to pod security.
type: validating
A Service of type ExternalName that defines any port causes Istio to redirect all traffic to that service. This policy blocks services that set spec.type: ExternalName and define the following field with any value:
spec.ports[*].port
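As an illustration, the following sketch shows an ExternalName Service that this policy should accept, because it has no ports section. The service name and the external host name are hypothetical.

```yaml
# Accepted: ExternalName Service without any ports defined.
# Names are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: example-external
spec:
  type: ExternalName
  externalName: api.example.com
  # Adding a section like the one below would make the policy reject the Service:
  # ports:
  #   - port: 443
```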
type: validating
This policy aims to prevent accidental namespace deletion by requiring a specific label on DELETE requests. The deletion is rejected unless the namespace has the following label set:
metadata.labels.delete: allow
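For example, a namespace that is meant to be deletable could carry the label as sketched below. The namespace name is hypothetical, and the label has to be on the namespace before the DELETE request is sent.

```yaml
# Namespace that the policy allows to be deleted.
# The namespace name is hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: example-sandbox
  labels:
    delete: allow   # without this label, the DELETE request is rejected
```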
The policy engine prevents workloads that violate policies from starting (for exceptions, see Known Issues).
For each pod, you can find AdmissionReport and BackgroundScanReport resources in the cluster that show which policies the pod passed or failed. Example AdmissionReport:
apiVersion: kyverno.io/v1alpha2
kind: AdmissionReport
metadata:
  creationTimestamp: "2023-08-25T11:23:28Z"
  generation: 1
  labels: {...}
  name: 210be514-77ff-4c88-958a-8f4cc658a4ef
  namespace: default
spec:
  owner:
    apiVersion: ""
    kind: ""
    name: ""
    uid: ""
  results:
  - category: Pod Security Standards (Baseline)
    message: validation rule 'adding-capabilities' passed.
    policy: disallow-capabilities # <---------- Policy name
    resources:
    - apiVersion: v1
      kind: Pod
      name: test
      namespace: default
      uid: 210be514-77ff-4c88-958a-8f4cc658a4ef
    result: pass # <---------- Result of validation
    rule: adding-capabilities # <---------- Rule name within policy
    scored: true
    severity: medium
    source: kyverno
    timestamp:
      nanos: 0
      seconds: 1692962608
  [...]
  summary:
    error: 0
    fail: 6
    pass: 13
    skip: 0
    warn: 0
GAP applications generated from the gap.yaml configuration should not violate any policy; however, patches, custom resources, and ad-hoc pods may.
If you encounter unexpected validation failures, find the violated rule above and try to mitigate the issue. If you are uncertain, you can always ask for help in the #infra-support Slack channel.
If your GAP configuration results in a policy violation, your deployment to the staging environment will fail. Go to your application in ArgoCD, click on Sync failed, and look for the red (💔) events. If you see an error like the one below, you can determine which policy is violated; if you are unsure how to mitigate it, ask for help in the #infra-support Slack channel.
Error from server: error when creating "test-objects/pod.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:
resource Pod/default/test was blocked due to the following policies
disallow-capabilities-strict:
  require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'
If you try to start an ad-hoc pod (e.g. via the kubectl run command) that violates the policies and the mutation controller cannot fix it automatically, your pod will be rejected. You will see the above error message directly in your terminal.
The issue described below does not affect the main business application container, for the following reasons:
- When you create a new user to run your application in your Dockerfile, it gets the uid of 1000 by default, since the uid for new users starts from 1000 on most distributions.
- Then the GAP manifest generation automatically adds runAsUser: 1000 to the main container.
If your container image sets the user by name instead of by numeric ID, validation of the rule Run as non-root user fails with an error. The pod gets stuck in CreateContainerConfigError, and describing the pod shows an event similar to the following:
Warning Failed 4s (x2 over 5s) kubelet Error: container has runAsNonRoot and image has non-numeric user (test), cannot verify user is non-root (pod: "test_default(d0733e6a-7485-4ca5-811f-eb56ed4c5a30)", container: test)
This can happen if your container image is built so that it runs as a named user. An example is the stunnel image (eu.gcr.io/ems-gap-images/stunnel:v2) that some teams use. When stunnel is installed, a user named stunnel is created and the container is set up so that the process runs as this user, but the container runtime (containerd) cannot determine the uid of the user.
To mitigate this issue, run your image locally and determine the uid of the named user:
docker run --rm -it eu.gcr.io/ems-gap-images/stunnel:v2 sh
/ $ id
uid=100(stunnel) gid=101(stunnel) groups=101(stunnel)
/ $
As we see, the uid of stunnel is 100.
After that you can set it in your patch or custom resource in the container’s securityContext as follows:
containers:
  - name: stunnel
    image:
      repository: sap-ems-base-infra-package-p/gap-images/stunnel
      tag: v2
    command:
      - dumb-init
    args:
      - stunnel
      - /etc/stunnel/config
    securityContext:
      runAsUser: 100
...
Or, if you are running an ad-hoc pod with kubectl run, add the --overrides flag as follows:
kubectl run --overrides='{
  "apiVersion": "v1",
  "spec": {
    "securityContext": {
      "runAsUser": 65534
    }
  }
}' ...
(65534 is the uid of the user nobody which is present in most Linux distributions.)