GAP application setup

Set up your GAP deployment configuration in your repository

This configuration file describes how to deploy your application: its name, the deployments it should have, and so on.

Create a new folder named gap in your repository.
In the gap folder, create a file named gap.yaml. This file is used by the pipeline to generate the proper files for your deployment. See a sample gap.yaml file below.

gap.yaml

name: "<name-of-your-application>"
namespace: "<your-teams-namespace>"
image:
  repository: sap-ems-base-infra-package-p/gap-images/<name-of-your-application>
deployments:
  web:
    ingress:
      enabled: true
    command:
      - command-to-run-in-web
    args:
      - argument-for-command

name: (required) the name of your application. This name will be used for your Kubernetes service/pod/ingress/etc
namespace: (required) your team’s namespace
usePrebuiltImage: (optional) boolean, more info
useServiceMesh: (optional) boolean that enables the Service Mesh more info
useProxyEnvVars: (optional) defaults to false, boolean that renders http proxy env vars necessary to make HTTP/S egress calls to the internet from the allowed list, more info
deletionProtection: (optional) boolean that protects the Kubernetes resources (Deployments, Cronjobs) of an application considered critical from deletion, so that e.g. an ArgoCD delete or a manual delete via kubectl is blocked. Secrets are not protected by this setting; they need the deletionProtection: enabled label added manually. Custom Resources also need that label added manually, and if their kind is not in this list, you need to reach out to us to have it included in the list of watched resources.
defaultAuthorizationPolicy: (optional) see the defaultAuthorizationPolicy docs
autoSyncProduction: (optional) sets up the application to be automatically promoted to production on a schedule
- enabled: (required) true, if you want to enable the automatic promotion
- autoSyncHourUTC: (required) integer between 0 and 23, the hour (in UTC) at the start of which your application should be promoted to production
image: (optional) specifies the container image for your application.
- repository: (required when using the multiregion pipeline) the Artifact Registry repository path, e.g. sap-ems-base-infra-package-p/gap-images/<my-application-name>. The registry is automatically calculated per region.
- tag: (optional) the image tag — this is automatically set by the update-image-tag workflow step and should not be set manually.
env: (optional) see the environments docs
slackNotificationChannel: (optional) if set to the name of a Slack channel that the “Argo-Bot” application has already been added to, you will receive notifications about your application in that channel. See the ArgoCD page for more info.
serviceAccount: (optional) sets up the service account for your application (it is created by default)
- enabled: (optional) generates a GAP service account and related authorization policies if applicable (defaults to true). See docs.
- name: (optional) to declare the name of the Kubernetes service account (defaults to the application name)
- annotations: (optional) a map of annotations to add to your service account, for example to connect it to a GCP service account (iam.gke.io/gcp-service-account: example@project.iam.gserviceaccount.com)
metricCollectorSidecar: (optional) specifies the global properties of the telemetry metric collector sidecar.
- enabled: (optional) true, if you want to enable the telemetry metric collection, thus injecting a sidecar to your deployments. This is a global setting that will enable it for all deployments (defaults to false). If you enable this feature, you either need to put an otel configuration file per environment to the following path gap/<environment>/config/otel.yaml or define the exporters via metricCollectorSidecar.config in case of setting it globally.
- config: (optional) configuration for the otel sidecar config exporters
  - exporters: (required) at least one of the following must be set:
    - googlemanagedprometheus: (optional) exporter settings for the googlemanagedprometheus. Any setting is allowed that can be found in the exporter. Docs can be found here.
    - debug: (optional) exporter settings for the debug. Any setting is allowed that can be found in the exporter. Docs can be found here.
- configMap: (optional) If you do not want to use the default (otel.yaml) configuration, you need to create a k8s ConfigMap resource in your gap/<environment> folder named gap_<somename>.yaml. You should set the name of the ConfigMap resource to this property. The content of the configuration file has to be put on key otel.yaml in the ConfigMap. If you do not specify this property, and have not set the global metricCollectorSidecar.config, the ConfigMap generated from the gap/<environment>/config/otel.yaml config file will be used. Cannot be set at the same time as the global metricCollectorSidecar.config.
- jobShutdownDelay: (optional, integer seconds) applies to every cronjob unless overridden via the cronjob-level jobShutdownDelay. If you use metric aggregation and the main process in your pod is terminated, or your job runs to completion, in less time than the aggregation period, the metrics will not be sent to Cloud Monitoring (e.g. you aggregate for 60s but the job completes in 20s). To mitigate this we added a delay to the shutdown of the metric collector container, which defaults to the terminationGracePeriodSeconds cronjob setting in gap.yaml (default 30 seconds). You can override that default with a shorter or longer delay. Please note that terminationGracePeriodSeconds in gap.yaml needs to be set to a higher value, for two reasons: if e.g. the aggregation_interval in the otel config is 60s and terminationGracePeriodSeconds is kept at its default of 30s (which is then also the default of jobShutdownDelay), no metric will ever be sent from a short-lived pod; and once terminationGracePeriodSeconds is reached, the entire pod is terminated.
- image: (optional) image for the otel collector sidecar, defaults to eu.gcr.io/ems-gap-images/otel-collector:latest.
cloudSQLProxy: (optional) injects a cloudsql-proxy container to help connect to Google-managed SQL services over TLS. It assumes workload identity is already configured, and it is set up for the PRIVATE IP type.
- enabled: (optional) enable the sidecar for all deployments, pre/post deploy pods and cronjobs
- enableStartupProbe: (optional) if true, it adds a startup probe to CloudSQLProxy sidecar container, so that it only transitions into ready state once the connection to the CloudSQL instance is established.
- instance: (required unless instances is set; the two are mutually exclusive) name of the instance
- instances: (required unless instance is set; the two are mutually exclusive) allows connecting to multiple instances, provided as an array
  - instance: (required) the name of the instance
  - port: (required) if the instances notation is used, the port number is required
  - project: (optional) the GCP project that contains the instance, if not provided, the value for the project key one level above will be used
  - region: (optional) if not provided, the value for the region key one level above will be used
- project: (required) the GCP project that contains the instance
- region: (optional) defaults to europe-west3
- port: (optional) defaults to 5432
- enableIAMLogin: (optional) if true the -enable_iam_login command line flag will be added to the proxy startup (defaults to true)
- terminationGracePeriod: (optional) how long to wait for connections to close after TERM signal. Defaults to 30s
- resources: (optional) override default resource requests and limits
isProduction: (optional) defaults to true, indicates whether the application runs in production. If set to false, the production manifests and ArgoCD application will not be created. If the production app already existed and is now being disabled, please follow this guide to remove your production application, keeping in mind that only the production app is being deleted.
deployments - (optional) map of deployment items where the key is the name of the deployment
cronJobs - (optional) map of cronjobs where the key is the name of the cronjob
ingress
preDeploy
postDeploy
flagd
waitForWorkloadIdentity
environment overrides
labels - (optional) list of custom labels to be added to the labels of every resource that has metadata object.
apiGateway
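
As a rough illustration of how several of the optional top-level fields above fit together, the sketch below combines autoSyncProduction, cloudSQLProxy with multiple instances, and a global metricCollectorSidecar configuration. The project, instance names, ports and the chosen hour are hypothetical placeholders, and the verbosity setting is an assumption taken from the OpenTelemetry debug exporter rather than from this page.

```yaml
# gap/gap.yaml (illustrative sketch; all values are placeholders)
name: "name-of-your-application"
namespace: "your-teams-namespace"
autoSyncProduction:
  enabled: true
  autoSyncHourUTC: 9          # promote at the start of 09:00 UTC
cloudSQLProxy:
  enabled: true
  project: my-gcp-project     # hypothetical GCP project
  region: europe-west3
  instances:                  # use either instance or instances, not both
    - instance: orders-db     # hypothetical instance name
      port: 5432
    - instance: reports-db    # hypothetical instance name
      port: 5433
metricCollectorSidecar:
  enabled: true
  config:
    exporters:                # at least one exporter must be set
      debug:
        verbosity: basic      # assumed exporter setting
deployments:
  web:
    command:
      - command-to-run-in-web
```

Because the exporters are defined globally here, no per-environment gap/<environment>/config/otel.yaml file and no configMap property should be needed for this variant.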

`deployments`

(optional) here you can define your deployments (service, one-shot, etc).

name - (required) the short name of the deployment (e.g: web)
- replicas: (optional) the number of replicas (default is 2), also accepts the string “null” (quotes necessary) for disabling override in case of custom autoscaling
- command: (required) the command to run in the container. For more complex commands see the examples below. The command field corresponds to the entrypoint field in Docker. Make sure that the command you specify handles process signals (e.g. SIGTERM) correctly. For example, do not use npm run here, since it does not forward signals to the child process that it starts. Either call node directly, or use an init system such as dumb-init and pass your npm command in the args field.
- args: (optional) the list of arguments to the command to run in the container.
- useProxyEnvVars: (optional) defaults to false, boolean that renders http proxy env vars necessary to make HTTP/S egress calls to the internet from the allowed list, more info
- authorizationPolicy: (optional) see the authorizationPolicy docs
- annotations: (optional) list of annotations added to the deployment
- podLabels: (optional) list of custom labels to be added to the pods of the deployment
- podAnnotations: (optional) list of annotations added to the pods of the deployment. If none is provided only cluster-autoscaler.kubernetes.io/safe-to-evict is added as true.
- externalServiceAccount: (optional) to override the service account if an externally managed service account is desired.
- image: (optional) image location that can override the default image from gap.yaml
- env: (optional) see the environments docs
- ingress: (optional) see the ingress section.
- service: (optional) creates the service, defaults to port named http with port value 80 and targetPort 8080. Cannot be used together with servicePort.
  - enabled: (required) set to true to have the service created with the default port (defaults to false).
  - ports: (optional, advanced) list of ports to define similarly to the ports field in the K8s Service API. Please note that the above mentioned default port won’t apply when setting the ports field, therefore e.g the http port must be set manually by the user.
    - name: (required) name of the port.
    - port: (required) the port that will be exposed by this service.
    - targetPort: (optional) number of the port to access on the pods targeted by the service, k8s will default to port value if not set.
- servicePort: (deprecated, use the service object instead)(optional) enables creation of service resource for the deployment (defaults to 8080 or 8081 if routeLogSidecar is enabled as well, ingress enables this implicitly). Cannot be set simultaneously with service object.
- dailyRestart: (optional) set this explicitly to true if you want your application to be restarted daily. For more information please read the documentation.
- collectMetrics: (optional) Set this to true if your deployment has a /metrics endpoint and you want your exposed metrics to be collected by Prometheus. Defaults to false.
- terminationGracePeriodSeconds: (optional, integer seconds) amount of time k8s should wait for the pod to shut down gracefully after SIGTERM before forcefully killing it. Defaults to 30 seconds.
- initContainers: (optional) init containers can run additional commands before the pod containers, see more. For workloads on service mesh init container networking is disabled by default. There is a workaround, you can find it here.
- resources: (optional) Your deployment’s (technically the main container’s) resources (defaults). Further details can be found in the official Kubernetes documentation
  - requests: (optional) Your deployment’s resource requests. These values will be guaranteed to your deployment.
    - cpu: (optional) How much CPU your deployment will have.
    - memory: (optional) How much memory your deployment will have.
  - limits: (optional) Your deployment’s resource limits. With these values you can set the maximum resources your deployment can use.
    - cpu: (optional) The maximum CPU your deployment will be able to use.
    - memory: (optional) The maximum memory your deployment will be able to use.
- livenessProbe: (optional) as defined by Kubernetes. If ingress is enabled, a default HTTP probe is generated for the /healthcheck endpoint. An override is not merged with this default, so all fields need to be specified. See healthcheck best practices.
- readinessProbe: (optional) as defined by Kubernetes. If ingress is enabled, a default HTTP probe is generated for the /healthcheck endpoint. An override is not merged with this default, so all fields need to be specified. See healthcheck best practices.
- autoscaling: (optional) the definition of autoscaling settings to use for the deployment. For further details see the autoscaling page or the official Kubernetes documentation.
  - enabled: (required) set to true to enable autoscaling configuration generation (defaults to false)
  - minReplicas: (required) the number of the minimum replicas
  - maxReplicas: (required) the number of the maximum replicas
  - metrics: (required) the definition of the metrics to drive autoscaling
    - type: (required) can be Resource or Object or External
    - name: (required) the name of the resource, eg.: cpu, memory, pubsub.googleapis.com|subscription|num_undelivered_messages, etc.
    - one of targetValue, targetAverageValue or targetAverageUtilization: (required) the target value to use by the autoscaler
    - selector: (optional) in case of Object or External, selectors are required to identify the metric to autoscale by, more info.
    - describedObject: (required for the Object type) example.
- podDisruptionBudget: (optional) You can specify the number of pod disruptions your application can tolerate without an outage. It’s enabled by default. Further details can be found in the official Kubernetes documentation.
  - enabled: (optional) enables the pod disruption budget generation (defaults to true)
  - maxUnavailable: (optional) sets the maxUnavailable value of the pod disruption budget. If set as a percentage, it will be rounded up to the nearest integer (defaults to 25%).
- strategy: (optional) The deployment strategy Kubernetes uses during deployment. Further details in the Kubernetes documentation.
  - type: (required) Setting this to Recreate will remove all running pods and recreate them in one go; RollingUpdate will roll out your changes in smaller iterations based on its settings. Default is RollingUpdate.
  - rollingUpdate: (optional) specifies the parameters for the RollingUpdate deployment strategy.
    - maxSurge: Whole number or percentage of the number of pods which can be started above the set replica count. Default is 25%.
    - maxUnavailable: Whole number or percentage of the number of pods which can be unavailable. Default is 25%.
- tolerations: (optional) Allows specifying tolerations so that pods can run on tainted nodes. Requires a list of objects.
  - list of objects as [{key: role, value: baseline}]
  - specific use-cases should be discussed with the cloud platform team
- affinityRequire: (optional) Allows specifying a required node-pool affinity, so that the deployment requires nodes based on a label. Requires a list of objects shaped like the matchExpression block in the mentioned k8s doc.
  - key: label key (e.g: role)
  - operator: operator (usually In)
  - values: list of label values ["fixip"]
- metricCollectorSidecar: (optional) specifies the properties of the deployment specific override of the telemetry metric collector sidecar.
  - enabled: (required) true, if you want to enable the telemetry metric collection, thus injecting a sidecar into your deployments (defaults to false). If you enable this feature, you need to put an otel configuration file per environment at the following path: gap/<environment>/config/otel.yaml. In case you want to use the exact same otel configs in more of your repositories, please refer to the Namespace wide config section to see how to set it up.
  - configMap: (optional) If you do not want to use the default (otel.yaml) configuration, you need to create a k8s ConfigMap resource in your gap/<environment> folder named gap_<somename>.yaml. You should set the name of the ConfigMap resource to this property. The content of the configuration file has to be put on key otel.yaml in the ConfigMap. If you do not specify this property, and have not set the global metricCollectorSidecar.config, the ConfigMap generated from the gap/<environment>/config/otel.yaml config file will be used.
  - image: (optional) image for the otel collector sidecar, defaults to eu.gcr.io/ems-gap-images/otel-collector:latest.
- cloudSQLProxy: (optional) injects a CloudSQL auth proxy sidecar, see above for configuration
  - enabled: (optional) boolean, enable/disable sidecar for the deployment
- waitForWorkloadIdentity: See the specific section for the configuration of the field.
- flagd: See the specific section for the configuration of the field.

We set a PORT environment variable in your deployment’s container to 8080 by default, and it cannot be overridden. Please use differently named port env vars, especially if you are using custom ports via service.ports.

Example

# gap/gap.yaml
name: "name-of-your-application"
namespace: "your-teams-namespace"
deployments:
  web:
    command:
      - command-to-run-in-web
    args:
      - argument-for-command
    ingress:
      enabled: true
    dailyRestart: true
    collectMetrics: true
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 4
      metrics:
        - type: Resource
          name: cpu
          targetAverageUtilization: 50
    podDisruptionBudget:
      enabled: true
      maxUnavailable: 1
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 1000m
        memory: 1000Mi
  # Use dumb-init to capture process signals correctly
  npm-example:
    command:
      - dumb-init
    args:
      - npm
      - start
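
The example above leaves out a few of the more advanced fields from the list. As a hedged sketch, a deployment with custom service ports, an explicit rollout strategy and node placement settings might look like the following. The deployment name, the grpc port and the role value are hypothetical, and tolerations or affinity settings should be agreed with the cloud platform team first.

```yaml
# gap/gap.yaml (illustrative sketch; names and port numbers are placeholders)
name: "name-of-your-application"
namespace: "your-teams-namespace"
deployments:
  api:
    command:
      - command-to-run-in-api
    terminationGracePeriodSeconds: 60
    service:
      enabled: true
      ports:
        - name: http          # the default http port has to be declared by hand once ports is set
          port: 80
          targetPort: 8080
        - name: grpc          # hypothetical additional port
          port: 9090
          targetPort: 9090
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: "25%"
        maxUnavailable: "25%"
    tolerations:
      - key: role
        value: baseline
    affinityRequire:
      - key: role
        operator: In
        values: ["baseline"]
```

Note that service and servicePort cannot be combined, so this sketch uses only the service object.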

ingress

enabled: (optional) should be true if this deployment needs an ingress (i.e. it needs to be accessible from outside). false by default.
useCustomVirtualService: (optional) set to true in order to use custom virtual services, e.g. for porting the ingress.rules to virtual services as detailed here.
annotations: (optional) custom annotations can be set here for the ingress object. defaults:
- cert-manager.io/cluster-issuer: letsencrypt-prod
labels: (optional) custom labels to be added to the ingress object
class: (optional) can only be nginx or gce, defaults to nginx. Set it to gce to use the Google HTTP/S Load Balancer. For the gce class a FrontendConfig will be created with a predefined SSL policy (gap-ssl-policy), and the NEG annotation will be added to the service manifest.
backendConfigName: (optional) if using gce ingress class this would be the name of the BackendConfig object that is defined in a gap_*.yaml file in your gap folder. If specified this will be added to the service’s annotations
hosts: (optional) the list of hosts to include in the ingress, so that traffic directed at those hosts is routed to your service. Defaults to <application>-staging.gservice.emarsys.com for staging and <application>.gservice.emarsys.net for production. Note that if hosts are given manually, the defaults need to be included in the list as well. See examples below.
rules: (optional) if not defined, it defaults to a configuration like the example below (the domain is autogenerated per environment as described above). See the Custom Ingress section for use cases. It cannot be set in combination with useServiceMesh; please see the examples here for how to set up hosts when useServiceMesh is enabled.
- some-domain: (required) the domain which should point to your application. This block can be defined multiple times.
  - /some-path: (required) the subpath of the domain given above which should point to your application. This block can be defined multiple times.
    - serviceName: (optional) sets the service name to given value (defaults to the deployment’s name, e.g.: web). If you are using a custom service, make sure it exposes port 80.
    - pathType: (optional) sets the ingress pathType parameter (defaults to ImplementationSpecific). More info
    - path: (optional) sets the ingress path parameter (defaults to /).
      - Validation Rules: (for Exact or Prefix pathType)
        The value must start with a forward slash /.
        It can include alphanumeric characters, as well as hyphens - and underscores _.
        For example, valid path values could look like /api, /user-profile, or /service-endpoint_1.

Enabling an ingress will include the generation of a corresponding k8s Service resource

The TLS generation is automatically taken care of for the domains declared in the rules. If this is not sufficient please contact the cloud platform team.
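
To illustrate the note on hosts above, here is a minimal sketch of a production override that adds one extra host while keeping the default one in the list. It assumes an application named myapp; the second host is a hypothetical example.

```yaml
# gap/p-eu1-01/gap.yaml (EU production; illustrative sketch)
deployments:
  web:
    ingress:
      enabled: true
      hosts:
        - myapp.gservice.emarsys.net   # the default host, listed explicitly
        - app.eservice.emarsys.net     # hypothetical additional host
```

A matching staging override would list the staging default (myapp-staging.gservice.emarsys.com) in the same way.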

Adding an EmarsysSSO authentication layer using OAuthProxy

If your application is an internal tool for SAP Emarsys colleagues, you can configure the Ingress resources of your application to work with EmarsysSSO via OAuthProxy. All you need to do is add the following annotations to the ingress field of your gap.yaml. Since the values for these annotations differ across environments, put them in your environment-specific gap.yaml files (e.g. gap/s-eu1-01/gap.yaml for EU staging or gap/p-eu1-01/gap.yaml for EU production).

# Staging:
nginx.ingress.kubernetes.io/auth-signin: https://auth-staging.gservice.emarsys.com/oauth2/start?rd=https://$host$escaped_request_uri
nginx.ingress.kubernetes.io/auth-url: https://auth-staging.gservice.emarsys.com/oauth2/auth

# Production:
nginx.ingress.kubernetes.io/auth-signin: https://auth.gservice.emarsys.net/oauth2/start?rd=https://$host$escaped_request_uri
nginx.ingress.kubernetes.io/auth-url: https://auth.gservice.emarsys.net/oauth2/auth

Example

deployments:
  <deployment-name>:
    ingress:
      enabled: true
      annotations:
        nginx.ingress.kubernetes.io/auth-signin: https://auth-staging.gservice.emarsys.com/oauth2/start?rd=https://$host$escaped_request_uri
        nginx.ingress.kubernetes.io/auth-url: https://auth-staging.gservice.emarsys.com/oauth2/auth
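
The example above uses the staging values. For completeness, a production counterpart could be sketched as follows, assuming it lives in the EU production override file and reuses the production URLs listed earlier.

```yaml
# gap/p-eu1-01/gap.yaml (EU production)
deployments:
  <deployment-name>:
    ingress:
      enabled: true
      annotations:
        nginx.ingress.kubernetes.io/auth-signin: https://auth.gservice.emarsys.net/oauth2/start?rd=https://$host$escaped_request_uri
        nginx.ingress.kubernetes.io/auth-url: https://auth.gservice.emarsys.net/oauth2/auth
```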

Default Ingress and Virtual Service

If ingress is enabled and no hosts or rules are defined, a default ingress and a virtual service are created for both environments. The domains are auto-generated from the application name and the environment default domains (<name>-staging.gservice.emarsys.com for staging, <name>.gservice.emarsys.net for production).

Example

The default ingress and Virtual Service are generated with these defaults for the production environment, as if they were manually given.

name: "application"
namespace: "your-teams-namespace"
deployments:
  web:
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
      hosts:
      - <application>.gservice.emarsys.net

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: "application"
spec:
  hosts:
  - <application>.gservice.emarsys.net
  http:
  - route:
    - destination:
        host: <deployment-name>.<your-teams-namespace>.svc.cluster.local

Custom Ingress

Please refer to this doc to set up your custom ingress traffic logic via virtual services (recommended). If the application is not yet using the service mesh, the sections below show how custom traffic routing can be done via Ingress.

If the default ingress is not enough it is possible to use custom ingress configurations by manually defining the rules. Environment override should be created with instance specific domains (see how).

Use case #1: I have multiple/custom domains for my service

# gap/gap.yaml
name: "name-of-your-application"
namespace: "your-teams-namespace"
deployments:
  web:
    ingress:
      enabled: true

# gap/s-eu1-01/gap.yaml (EU staging)
deployments:
  web:
    ingress:
      rules:
        myapp-staging.gservice.emarsys.com:
          /: {}
        app-staging.eservice.emarsys.com:
          /: {}

# gap/p-eu1-01/gap.yaml (EU production)
deployments:
  web:
    ingress:
      rules:
        myapp.gservice.emarsys.net:
          /: {}
        app.eservice.emarsys.net:
          /: {}

Use case #2: I have a custom service that I want to expose via my ingress

In this case make sure your custom service exposes port 80. We do not support custom ports at the moment.

# gap/gap.yaml
name: "name-of-your-application"
namespace: "your-teams-namespace"
deployments:
  web:
    ingress:
      enabled: true

# gap/s-eu1-01/gap.yaml (EU staging)
deployments:
  web:
    ingress:
      rules:
        myapp-staging.gservice.emarsys.com:
          /: {}
          /api:
            serviceName: my-api-service
            pathType: Prefix

# gap/p-eu1-01/gap.yaml (EU production)
deployments:
  web:
    ingress:
      rules:
        myapp.gservice.emarsys.net:
          /: {}
          /api:
            serviceName: my-api-service
            pathType: Prefix

cronJobs

(optional) - Configuration for cronjobs.

Under the cronJobs property you can define keys, each of which becomes the name of a cronjob. Each cronjob can have the following properties:

schedule: (required) - the jobs schedule in cron format
timeZone: (optional) - the time zone name the given schedule is interpreted in, defaults to the kube-controller-manager’s time zone if not provided. See more about time zones.
image: (optional) image location that can override the default image from gap.yaml
env: (optional) see the environments docs
command: (required) - command to run on each job start. For more complex commands see the examples below.
args: (optional) - arguments to use for the command
useProxyEnvVars: (optional) defaults to false, boolean that renders http proxy env vars necessary to make HTTP/S egress calls to the internet from the allowed list, more info
podLabels: (optional) list of custom labels to be added to the pods of the cronjob
suspend: (optional) - true if you want to suspend your job
resources: (optional) - Your cronjob’s resources (defaults). Further details can be found in the official Kubernetes documentation
- requests: (optional) - Your cronjob’s resource requests. These values will be guaranteed to your cronjob.
  - cpu: (optional) - How much CPU your cronjob will have.
  - memory: (optional) - How much memory your cronjob will have.
- limits: (optional) - Your cronjob’s resource limits. With these values you can set the maximum resources your cronjob can use.
  - cpu: (optional) - The maximum CPU your cronjob will be able to use.
  - memory: (optional) - The maximum memory your cronjob will be able to use.
concurrencyPolicy: (optional) specifies how to treat concurrent executions of a Job created by the CronJob controller. It can be Allow, Forbid or Replace. Defaults to Allow.
startingDeadlineSeconds: (optional) the maximum number of seconds the CronJob can take to start if it misses its scheduled time for any reason. Defaults to 3600.
successfulJobsHistoryLimit: (optional) the number of successful CronJob executions that are kept. Defaults to 3.
failedJobsHistoryLimit: (optional) the number of failed CronJob executions that are kept. Defaults to 1.
parallelism: (optional) can be set to any non-negative value. Defaults to 1. Actual parallelism (the number of pods running at any instant) may be more or less than the requested parallelism, for a variety of reasons. For more information see the documentation.
backoffLimit: (optional) there are situations where you want to fail a Job after some number of retries, e.g. due to a logical error in configuration. You can specify the number of retries before a Job is considered failed. Defaults to 6.
activeDeadlineSeconds: (optional) duration after which k8s terminates the job if it has not exited, resulting in a failed status. Defaults to 300s.
ttlSecondsAfterFinished: (optional) duration after which k8s cleans up the finished job (either Complete or Failed), see more. If you do not set this, the job will not be cleaned up automatically.
initContainers: (optional) init containers can run additional commands before the pod containers, see more. For workloads on service mesh init container networking is disabled by default. There is a workaround, you can find it here.
terminationGracePeriodSeconds: (optional, integer seconds, becomes the default value of jobShutdownDelay) - Amount of time k8s should wait for the pod to shut down gracefully after SIGTERM before forcefully killing it. Defaults to 30 seconds. Note that if metricCollectorSidecar is enabled as well, the value of terminationGracePeriodSeconds becomes the default value of jobShutdownDelay unless that gap.yaml field is set explicitly.
cloudSQLProxy: (optional) injects a CloudSQL auth proxy sidecar, see above for configuration
- enabled: (optional) boolean, enable/disable sidecar for the cronjob
waitForWorkloadIdentity: See the specific section for the configuration of the field.
flagd: See the specific section for the configuration of the field.
tolerations: (optional) - Allows specifying tolerations so that pods can run on tainted nodes. Requires a list of objects.
- list of objects as [{key: role, value: baseline}]
- specific use-cases should be discussed with the GAP team.
affinityRequire: (optional) - Allows specifying a required node-pool affinity, so that the cronjob requires nodes based on a label. Requires a list of objects shaped like the matchExpression block in the mentioned k8s doc.
- key: label key (e.g: role)
- operator: operator (usually In)
- values: list of label values ["fixip"]
metricCollectorSidecar: (optional) specifies the properties of the cronjob specific override of the telemetry metric collector sidecar.
- enabled: (required) true, if you want to enable the telemetry metric collection, thus injecting a sidecar into your cron jobs (defaults to false). If you enable this feature, you need to put an otel configuration file per environment at the following path: gap/<environment>/config/otel.yaml. In case you want to use the exact same otel configs in more of your repositories, please refer to the Namespace wide config section to see how to set it up.
- configMap: (optional) If you do not want to use the default (otel.yaml) configuration, you need to create a k8s ConfigMap resource in your gap/<environment> folder named gap_<somename>.yaml. You should set the name of the ConfigMap resource to this property. The content of the configuration file has to be put on key otel.yaml in the ConfigMap. If you do not specify this property the ConfigMap generated from the gap/<environment>/config/otel.yaml config file will be used.
- jobShutdownDelay: (optional, integer seconds) If you use metric aggregation and the main process in your pod is terminated, or your job runs to completion, in less time than the aggregation period, the metrics will not be sent to Cloud Monitoring (e.g. you aggregate for 60s but the job completes in 20s). To mitigate this we added a delay to the shutdown of the metric collector container, which defaults to the terminationGracePeriodSeconds cronjob setting in gap.yaml (default 30 seconds). You can override that default with a shorter or longer delay. Please note that terminationGracePeriodSeconds in gap.yaml needs to be set to a higher value, for two reasons: if e.g. the aggregation_interval in the otel config is 60s and terminationGracePeriodSeconds is kept at its default of 30s (which is then also the default of jobShutdownDelay), no metric will ever be sent from a short-lived pod; and once terminationGracePeriodSeconds is reached, the entire pod is terminated.
- image: (optional) image for the otel collector sidecar, defaults to eu.gcr.io/ems-gap-images/otel-collector:latest.

Example

name: "name-of-your-application"
namespace: "your-teams-namespace"
cronJobs:
  periodic-job:
    schedule: "* * * * *"
    command: ["command", "to", "run", "periodic", "job"]
    args:
      - argument-for-command
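
As a rough sketch of the metricCollectorSidecar settings described above, a cron job that enables the sidecar might look like the fragment below. The placement of terminationGracePeriodSeconds directly under the cron job, the 60s aggregation interval, and the concrete delay values are assumptions made for illustration; check them against your own otel configuration.

```yaml
cronJobs:
  periodic-job:
    schedule: "*/5 * * * *"
    command: ["command", "to", "run", "periodic", "job"]
    # Assumed: the otel config aggregates for 60s, so the grace period is kept above that.
    terminationGracePeriodSeconds: 90
    metricCollectorSidecar:
      enabled: true
      # Assumed value: longer than the aggregation interval, shorter than the grace period.
      jobShutdownDelay: 70
```

With these values the collector has time to flush one full aggregation window before the pod is terminated.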

preDeploy

(optional) - Command executed before rollout. For more complex commands see the examples below.

If this command exits with 0 the flow will continue with the rollout of your application.

If it exits with a non-zero code, the deployment flow will exit and your build will fail.

This command runs on the same image as the application.

Optionally, the cloudSQLProxy configuration can be specified to allow connecting to GCP SQL databases for arguments, see above for configuration options.

Some of the settings on a preDeploy level:

env: (optional) see the environments docs

useProxyEnvVars: (optional) defaults to false, boolean that renders http proxy env vars necessary to make HTTP/S egress calls to the internet from the allowed list, more info

Lastly, flagd and/or waitForWorkloadIdentity configurations can be set on a preDeploy scope as well.

name: "name-of-your-application"
namespace: "your-teams-namespace"
preDeploy: 
  command:
    - "command-to-run-in-pre-deploy"
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    ingress:
      enabled: true
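
The preDeploy-level settings listed above could be combined roughly as follows. This is a minimal sketch: the variable name and value are hypothetical, and env is assumed to take the same key-value form as the root-level env.

```yaml
preDeploy:
  command:
    - "command-to-run-in-pre-deploy"
  # Hypothetical variable, visible only to the preDeploy command.
  env:
    LOG_LEVEL: debug
  # Renders the HTTP proxy env vars for egress calls to the allowed list.
  useProxyEnvVars: true
```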

Disabling the creation of a default service account

serviceAccount:
  enabled: false

In case your application does not need to use the service account created for it by GAP, and every deployment and every cronjob has an external service account specified, you can set this field to false to avoid generating an unused service account and authorization policy.

Please note that externalServiceAccount must be set for every deployment and every cronjob when serviceAccount.enabled is set to false; otherwise your GAP pipeline build will fail.
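
A sketch of what this requirement could look like in practice is shown below. The exact shape of the externalServiceAccount value is not described in this section, so the plain string form and the service account names used here are assumptions; consult the externalServiceAccount documentation for the real structure.

```yaml
serviceAccount:
  enabled: false
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    # Assumed shape and hypothetical name
    externalServiceAccount: "my-existing-service-account"
cronJobs:
  periodic-job:
    schedule: "* * * * *"
    command: ["command", "to", "run", "periodic", "job"]
    # Every cronjob needs one as well, otherwise the build fails
    externalServiceAccount: "my-existing-service-account"
```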

defaultAuthorizationPolicy and authorizationPolicy

(optional) - Configuration for the app or deployment level Istio Authorization Policy

rules: (required) match requests from a list of sources that perform a list of operations subject to a list of conditions. A rule match occurs when at least one source, one operation and all conditions match the request. At least one of from, to or when must be set in a rule. See about the defaults below.
- from: (optional) A list of sources. Specifies the source of a request, if not set, any source is allowed. Sources will be ORed together.
  - source: (required) specifies the source identities of a request.
    - principals: (required) A list of namespaces, each with a list of service accounts of the application(s) from which traffic is allowed. If not set, any principal is allowed unless specific notPrincipals are set. Principals will be ORed together.
      - namespace: (required) The namespace where the service account(s) exist.
      - serviceAccountName: (required) A list of service account name(s) of the application(s) from which traffic is allowed. The name of the service account will be the name of the application you’d like to allow traffic from, unless a custom service account name is set by the respective team.
    - notPrincipals: (required) A list of namespaces, each with a list of service accounts of the application(s) from which traffic is not allowed. If not set, any principal is allowed unless specific principals are set. Principals will be ORed together.
      - namespace: (required) The namespace where the service account(s) exist.
      - serviceAccountName: (required) A list of service account name(s) of the application(s) from which traffic is not allowed. The name of the service account will be the name of the application you’d like to disallow traffic from, unless a custom service account name is set by the respective team.
- to: (optional) A list of operations. Specifies the operation of a request, if not set, any operation is allowed. Operations will be ORed together.
  - operation: (required) specifies the operation of a request. The fields in the operations will be ANDed together. At least one of the following operations must be set.
    - hosts: (optional) A list of hosts to allow as specified in the HTTP request. The match is case-insensitive. If not set, any host is allowed unless specific notHosts are set.
    - notHosts: (optional) A list of hosts to disallow as specified in the HTTP request. The match is case-insensitive. If not set, any host is allowed unless specific hosts are set.
    - ports: (optional) A list of ports to allow as specified in the connection. If not set, any port is allowed unless specific notPorts are set.
    - notPorts: (optional) A list of ports to disallow as specified in the connection. If not set, any port is allowed unless specific ports are set.
    - methods: (optional) A list of methods to allow as specified in the HTTP request. If not set, any method is allowed unless specific notMethods are set.
    - notMethods: (optional) A list of methods to disallow as specified in the HTTP request. If not set, any method is allowed unless specific methods are set.
    - paths: (optional) A list of paths to allow as specified in the HTTP request. If not set, any path is allowed unless specific notPaths are set. Regex is supported when the wildcard character is applied in curly brackets. See the documentation here
    - notPaths: (optional) A list of paths to disallow as specified in the HTTP request. If not set, any path is allowed unless specific paths are set.
- when: (optional) A list of conditions. Specifies a list of additional conditions of a request, if not set, any condition is allowed. Conditions therein will be ANDed together.
  - key: (required) The name of an Istio attribute. See the full list of supported attributes.
  - values: (required) A list of allowed values for the attribute.

It can be set on the app level and/or on the deployment level. Please note that when set on the app level it should be defined as defaultAuthorizationPolicy.

Please note that yaml arrays are not merged like dictionaries due to a tooling limitation. If default and/or deployment level authorization policy rules are set in the root gap.yaml and also in an environment specific file (e.g. production/gap.yaml), only the environment specific rules will apply for that environment, leaving the root level gap.yaml authorization policy rules unused.
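
To illustrate the array behaviour described above, consider the following sketch with two files shown as separate YAML documents. The override path gap/production-defaults/gap.yaml is used here as an example location, and the principals are taken from the example further down this page.

```yaml
# gap/gap.yaml (root)
defaultAuthorizationPolicy:
  rules:
  - from:
    - source:
        principals:
        - namespace: cloud-platform
          serviceAccountName: ["gap-docs"]
---
# gap/production-defaults/gap.yaml (environment override)
defaultAuthorizationPolicy:
  rules:
  - from:
    - source:
        principals:
        - namespace: segmentation
          serviceAccountName: ["segment-registry"]
```

In production only the segment-registry rule applies, because the rules array from the override replaces the root array instead of being appended to it. If gap-docs should keep its access in production, its rule has to be repeated in the override file.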

The gap.yaml configuration of authorizationPolicy is similar to the one defined in the Istio docs, the main differences being:

the action being set to ALLOW which is not modifiable.
in the source only the principals can be selected.
our abstraction on principals.

If you’d like to have a more advanced Authorization Policy by fully utilizing the API provided by Istio you can create custom resources, providing the Authorization Policies manually.

By default an authorization policy will be created, which will apply to all deployments under your application defined in the gap.yaml, allowing all the application’s components to communicate with each other.

If ingress is enabled for a deployment, an authorization policy will be created applying to the specific deployment where the ingress is enabled, allowing all traffic from the ingress-nginx. This behaviour can be overridden by setting a rule on the deployment level authorization policy with the ingress-nginx principal, as in the example below.
If collectMetrics is set to true, traffic to the /metrics path with the GET method will be allowed by default.

Please note that a rule with the ingress-nginx principal may not be set on the application level defaultAuthorizationPolicy.

Example

name: "name-of-your-application"
namespace: "your-teams-namespace"
# following policy applies to all deployments, in the first rule allowing from any of the principals 
# AND with the specified operation, where the method should be GET or HEAD and the host has suffix .example.com 
# AND when the condition is satisfied that the header value has what is defined. 
# If the first rule is not matched, then the second one is evaluated.
defaultAuthorizationPolicy: 
  rules:
  - from:
    - source:
        principals:
        - namespace: cloud-platform
          serviceAccountName: ["gap-docs", "gap-example-docker"]
        - namespace: mobile-engage
          serviceAccountName: ["me-delivery"]   
    to:
    - operation:
        hosts: ["*.example.com"]
        methods: ["GET", "HEAD"]
    when:
    - key: request.headers[User-Agent]
      values: ["Mozilla/*"]
  - from:
    - source:
        principals:
        - namespace: segmentation
          serviceAccountName: ["segment-registry"]
    to:
    - operation:
        methods: ["GET", "POST"]
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    # following policy applies only to the web deployment, allowing all traffic from test-app of cloud platform, 
    # and also all from ingress-nginx as no specific rule containing ingress-nginx principal was set.
    authorizationPolicy:
      rules:
      - from:
        - source:
            principals:
            - namespace: cloud-platform
              serviceAccountName: ["test-app"]
    ingress:
      enabled: true
  web2:
    command: ["command", "to", "run", "in", "web2"]
    authorizationPolicy:
      rules:
      # following policy rule applies only to the web2 deployment, allowing only GET and POST method operations from ingress-nginx.
      - from:
        - source:
            principals:
            - namespace: ingress-nginx
              serviceAccountName: ["ingress-nginx"]
        to:
        - operation:
            methods: ["GET", "POST"]
      # following policy rule applies only to the web2 deployment, allowing any Escher signed request
      # Note: Escher signature validation is your responsibility
      - when:
        - key: request.headers[x-ems-auth]
          values:
          - '*'
    ingress:
      enabled: true
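
The example above does not use the paths and notPaths fields. A minimal sketch of how they might be combined in a single operation follows. The deployment name api and the path values are hypothetical; because the fields within one operation are ANDed, the rule matches GET requests under /api/ except those under /api/admin/.

```yaml
deployments:
  api:
    command: ["command", "to", "run", "in", "api"]
    authorizationPolicy:
      rules:
      - from:
        - source:
            principals:
            - namespace: cloud-platform
              serviceAccountName: ["gap-docs"]
        to:
        - operation:
            methods: ["GET"]
            # Hypothetical paths, for illustration only
            paths: ["/api/*"]
            notPaths: ["/api/admin/*"]
```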

postDeploy

(optional) - Command executed after rollout. For more complex commands see the examples below.

If this command exits with 0 your build is successful.

If it exits with a non-zero code, the deployment flow will exit and your build will fail.

This command runs on the same image as the application.

env: (optional) see the environments docs

useProxyEnvVars: (optional) defaults to false, boolean that renders http proxy env vars necessary to make HTTP/S egress calls to the internet from the allowed list, more info

Optionally the cloudSQLProxy configuration can be specified to allow connecting to GCP SQL databases for arguments, see above for configuration options.

Lastly, flagd and/or waitForWorkloadIdentity configurations can be set on a postDeploy scope as well.

name: "name-of-your-application"
namespace: "your-teams-namespace"
postDeploy: 
  command:
    - "command-to-run-in-post-deploy"
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    ingress:
      enabled: true
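
As a sketch of the postDeploy-scoped settings mentioned above, the fragment below enables flagd for the postDeploy command only. It assumes the keys are nested directly under postDeploy, in the same way they are nested under a deployment.

```yaml
postDeploy:
  command:
    - "command-to-run-in-post-deploy"
  flagd:
    enabled: true
  # Must be true on the same scope where flagd is enabled
  waitForWorkloadIdentity: true
```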

waitForWorkloadIdentity

(optional, except when flagd is enabled on the same scope) set this field to true to add an initContainer to your pod that waits for the Workload Identity setup to finish, preventing the application container from starting before it is ready. This is useful if you experience authentication issues on application startup while using Workload Identity. Please note that using this feature may affect startup time (it adds a few seconds on average), so the overall duration of a deploy/restart may be longer.
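
A minimal sketch of enabling this on a single deployment, without flagd, might look like this:

```yaml
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    # Adds an initContainer that waits for Workload Identity to be ready
    waitForWorkloadIdentity: true
```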

flagd

(optional) with it you can define your flagd sidecar. Please note, that when it is enabled on a scope (root level, deployment etc.), waitForWorkloadIdentity must be set to true alongside it on the same scope (see the example below).

enabled: (required) set to true, if you would like to enable flagd sidecar for the specific scope (root level, deployment etc.) (defaults to false).
source: (optional) define the uri and the provider for your flagd feature flag configuration (defaults to the one provided by DEVX). Documentation for the fields can be read here
- uri: (required) specify the uri which points to the location where the flagd feature flag configuration lies.
- provider: (required) specify the provider for the location of the flagd feature flag configuration.
resources: (optional) Your flagd sidecar’s resources.
- requests: (optional) Your flagd sidecar’s resource requests. These values will be guaranteed for your sidecar.
  - cpu: (optional, defaults to 250m) How much cpu will your sidecar have.
  - memory: (optional, defaults to 250Mi) How much memory will your sidecar have.
- limits: (optional) Your flagd sidecar’s resource limits. With these values you can set the maximum resources your sidecar can use.
  - cpu: (optional, defaults to 500m) How much is the maximum cpu your sidecar will be able to use.
  - memory: (optional, defaults to 750Mi) How much is the maximum memory your sidecar will be able to use.

Important

In order for your sidecar to access the default flagd.json, you must:

Follow the workload identity setup docs
After setting up the workload identity, please apply the following:

For Stage: gcloud storage buckets add-iam-policy-binding gs://sap-gap-feature-toggles-s_s-1-gcp-europe-west3_output --member=serviceAccount:<YOUR-GCP-SERVICE-ACCOUNT-EMAIL> --role=roles/storage.objectViewer (email example: poller-service@payment-team.iam.gserviceaccount.com)
For Prod: gcloud storage buckets add-iam-policy-binding gs://sap-gap-feature-toggles-p_p-1-gcp-europe-west3_output --member=serviceAccount:<YOUR-GCP-SERVICE-ACCOUNT-EMAIL> --role=roles/storage.objectViewer (email example: poller-service@payment-team.iam.gserviceaccount.com)

Example configuration:

name: "name-of-your-application"
namespace: "your-teams-namespace"
postDeploy: 
  command:
    - "command-to-run-in-post-deploy"
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    flagd:
      enabled: true
    waitForWorkloadIdentity: true # required when flagd is enabled
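
The optional source and resources fields could be spelled out as in the sketch below. The resource values simply repeat the documented defaults, and the uri and provider values are left as placeholders because the valid values depend on where your flagd configuration is stored.

```yaml
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    waitForWorkloadIdentity: true # required when flagd is enabled
    flagd:
      enabled: true
      source:
        uri: "<uri-of-your-flagd-configuration>"
        provider: "<provider-of-your-flagd-configuration>"
      resources:
        requests:
          cpu: 250m     # documented default
          memory: 250Mi # documented default
        limits:
          cpu: 500m     # documented default
          memory: 750Mi # documented default
```

Omitting source keeps the default configuration provided by DEVX.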

More info and support

Guidance pertaining to Flagd and Feature Toggles can be followed in this documentation by the Devx team, and for support requests regarding them please refer to the project-feature-toggles Slack channel by the aforementioned team.

Environment variables

Environment variables can be set as key-value pairs on the root and deployment level. If both root and deployment level variables are set, they will be merged, with the deployment level taking precedence over the root level in case of conflict.

namespace: example
name: example

env:
  LOG_LEVEL: info
  DEBUG: "false"
  ENVIRONMENT: staging
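
The merge between root and deployment level can be sketched as follows. The variable names and the deployment-level values are hypothetical.

```yaml
namespace: example
name: example

env:
  LOG_LEVEL: info
  DEBUG: "false"

deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    env:
      LOG_LEVEL: debug   # conflicts with the root level, so this value wins for web
      WORKER_COUNT: "4"  # only defined for web
```

Under the merge rule described above, the web deployment would see LOG_LEVEL=debug, DEBUG=false and WORKER_COUNT=4, while other deployments keep LOG_LEVEL=info.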

Configuration overrides

All configurations set in the gap/gap.yaml will be used across all instances. If you want to set a setting on a specific instance or on all instances within an environment, you need to create an override gap.yaml in the corresponding subfolder. This yaml essentially acts like a patch file for your gap/gap.yaml, overriding values in certain paths. You can also think of it like merging two objects.

Though the gap.yaml structure conforms in many cases to kubernetes manifests, it does not entirely. When you are creating the overriding gap.yaml files you should always use the gap.yaml structure.

Environment overrides

Environment specific overrides can be dropped in the corresponding folders:

staging-defaults - applies to all staging instances
production-defaults - applies to all production instances

Typical use cases for this are overriding the resource requests and limits for production environments or adding autoscaling to production only. In the latter case you would not specify autoscaling settings in the main gap.yaml or in a staging override, but only in the production override.

Staging and production typically have very different resource usage profiles. Don’t just copy the same requests across environments — check your actual usage on the Grafana or GCM dashboards and set values accordingly. See the resource details guide for practical advice on setting your requests and limits.

In addition to instance overrides, you can create staging-defaults and production-defaults directories inside the gap folder. These apply overrides to all staging or all production clusters respectively, without having to duplicate configuration across individual cluster folders. See our own gap folder for example.

Instance overrides

The folders are named after the instances:

s-eu1-01 — EU staging
p-eu1-01 — EU production
s-us1-01 — US staging
p-us1-01 — US production

The merge order is:

gap/gap.yaml (base)
gap/staging-defaults/gap.yaml or gap/production-defaults/gap.yaml (environment type defaults)
gap/s-eu1-01/gap.yaml, gap/p-eu1-01/gap.yaml, etc. (instance overrides)

Each layer patches the previous one, so instance overrides take precedence over environment type defaults, which in turn take precedence over the base gap.yaml.

This is especially useful for resource requests — you can set sensible lower defaults for all staging clusters in staging-defaults and appropriate production values in production-defaults, then only use instance overrides for the exceptions.

Example

# gap/gap.yaml (base — shared across all environments)
name: "name-of-your-application"
namespace: "your-teams-namespace"
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]

# gap/staging-defaults/gap.yaml — applies to all staging clusters (s-eu1-01, s-us1-01, etc.)
deployments:
  web:
    replicas: 1
    resources:
      requests:
        cpu: 100m
        memory: 128Mi

# gap/production-defaults/gap.yaml — applies to all production clusters (p-eu1-01, p-us1-01, etc.)
deployments:
  web:
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 1000m
        memory: 1000Mi

# gap/s-us1-01/gap.yaml — instance override, only for US staging (takes precedence over staging-defaults)
deployments:
  web:
    replicas: 2

Use staging-defaults and production-defaults to set baseline resource requests that reflect the typical usage difference between environments. You can then fine-tune individual instances when needed. See the resource details guide for help on choosing the right values.
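
Applying the merge order to the example files above, the effective configuration for the s-us1-01 instance would look roughly like this (a sketch derived by hand from the three layers, not pipeline output):

```yaml
# effective configuration for s-us1-01
name: "name-of-your-application"          # from gap/gap.yaml
namespace: "your-teams-namespace"         # from gap/gap.yaml
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]  # from gap/gap.yaml
    replicas: 2                           # gap/s-us1-01/gap.yaml overrides staging-defaults (1)
    resources:
      requests:
        cpu: 100m                         # from gap/staging-defaults/gap.yaml
        memory: 128Mi                     # from gap/staging-defaults/gap.yaml
```

Nothing from production-defaults appears here, because that layer only applies to production instances.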

labels

(optional) list of custom labels to be added to the labels of every resource that has a metadata object.

If custom labels are added to a specific object, its labels list will be merged with the root level labels. If a key exists both in the root level labels and in the specific list, the specific value will be used.

Note: Setting the following labels is prohibited, as they are used internally: app, applicationName, app.kubernetes.io/instance, app.kubernetes.io/version, helm.sh/chart, istio.io/rev, laas, mesh

Example

# gap/gap.yaml
name: "name-of-your-application"
namespace: "your-teams-namespace"

labels:
  component: 'ABC-DEF-GHI'
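
The merge with object-specific labels might look like the sketch below. It assumes that a deployment accepts its own labels key, and the label names and values are hypothetical.

```yaml
labels:
  component: 'ABC-DEF-GHI'
  team: 'my-team'

deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    # Assumed placement of object-specific labels
    labels:
      component: 'XYZ'   # overrides the root level value for web resources
      tier: 'frontend'   # added on top of the root level labels
```

Resources of the web deployment would then carry component=XYZ, team=my-team and tier=frontend.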

apiGateway

The apiGateway object contains configuration for exposing your service via the API Gateway.

enabled: (optional) set to true if you would like to enable availability of the deployed service behind the API Gateway (defaults to false).
servicePath: (optional) the path on your service where the Gateway will redirect requests (defaults to ""). Its purpose is to prepend a specific path segment to the request URI before the request is routed to the upstream.

Example

deployments:
  web:
    ingress:
      enabled: true
    apiGateway:
      enabled: true
      servicePath: /internal/v2/tenants/~tenant_id~

Command - complex cases

To use environment variables or more complex commands, pass the main command as a multiline string, as in the example below. Any needed escape characters will be applied automatically during rendering.

postDeploy:
  command: [sh]
  args:
    - -c
    - |-
      curl -H "Accept: application/vnd.github.everest-preview+json" -H "Authorization: token $EMARSYS_DEPLOYER_GITHUB_TOKEN" --request POST --data '{"event_type": "run-e2e-test-after-prod-deploy" }' https://api.github.com/repos/emartech/unified-segmentation/dispatches
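
The same pattern can carry several shell commands. The sketch below uses a hypothetical DATABASE_HOST variable and a hypothetical ./migrate script purely for illustration.

```yaml
preDeploy:
  command: [sh]
  args:
    - -c
    - |-
      set -e  # stop at the first failing command so the build fails
      echo "Running migrations against $DATABASE_HOST"
      ./migrate --target latest
```

Because preDeploy fails the build on a non-zero exit code, set -e makes sure a failure in any line is not masked by a later successful command.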

Healthcheck best practices

It is advised that you don’t check the health of underlying resources (e.g. checking whether Cloud Spanner is available) within your healthcheck logic. If the database is unavailable for a short period of time, Kubernetes will kill all your pods, causing several minutes of downtime. Be aware that Spring Boot Actuator’s healthcheck mechanism checks the health of all resources and external systems used by your service, such as Cloud Spanner and PubSub. Use a simple healthcheck that returns HTTP 200 OK with a simple {"success": true} body.