Enrollment
Add the following field on the root level of gap.yaml:
useServiceMesh: true
After deploying your application as you usually do, all your pods (deployments, cronjobs, pre- and post-deploy pods) will get an istio proxy sidecar injected into them.
Pilots have reported that Envoy interprets the HTTP spec more strictly than Nginx. For some applications this manifested as protocol errors on invalid responses. If you encounter protocol errors, let us know so we can confirm. So far we have seen two issues, both tied to 1-to-1 proxying of HTTP responses:
- Transfer-Encoding: chunked sent together with Content-Length: xxx
  - This happens when the original response was chunked, but the proxying application read the entire message and forwarded the headers unmodified. It is not valid HTTP to declare the response chunked and also send a Content-Length. Filtering the proxied headers is necessary.
- Duplicate header whose key differs only in letter casing
  - This is not allowed by the HTTP spec. It is recommended to filter the proxied headers.
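The header filtering recommended above could look roughly like the following sketch. It is not taken from any pilot application: the function name, the FRAMING_HEADERS and REPEATABLE_HEADERS sets, and the decision to keep the first occurrence of a duplicated header are all assumptions made for illustration. It also assumes that your HTTP framework sets the correct Content-Length for the body it actually sends.

```python
from typing import Iterable, List, Tuple

Header = Tuple[str, str]

# Assumption: these describe how the upstream framed its body, not how we re-send it.
FRAMING_HEADERS = {"transfer-encoding", "content-length"}
# Assumption: headers that may legitimately appear more than once.
REPEATABLE_HEADERS = {"set-cookie"}


def filter_proxied_headers(headers: Iterable[Header]) -> List[Header]:
    """Return the upstream headers that are safe to forward after the body was fully read."""
    first_spelling = {}
    filtered = []
    for name, value in headers:
        key = name.lower()
        if key in FRAMING_HEADERS:
            # Dropped: the framework is expected to describe the re-sent body itself.
            continue
        if key in first_spelling:
            if key not in REPEATABLE_HEADERS:
                # Same header in a different casing: keep only the first occurrence.
                continue
            # Repeatable header: reuse the first spelling so the casing stays consistent.
            name = first_spelling[key]
        else:
            first_spelling[key] = name
        filtered.append((name, value))
    return filtered
```

Keeping the first occurrence of a duplicate is an arbitrary choice; if your upstream sends conflicting values, you may prefer to log and reject the response instead.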
If you rely on a manual init container to wait for the workload identity, your pod will not be able to start after onboarding to the mesh. Your best choice is to use the built-in waitForWorkloadIdentity flag in gap.yaml, which works with the mesh as well. If you have a custom use case, please read on.
With the introduction of the service mesh sidecar (istio-proxy) networking is disabled in initContainers. This is because the Istio init container changes the network settings (iptables) so that the proxy intercepts all network connections in the pod, but the proxy is not yet started in the pod init phase (PodInitializing).
There is a workaround for this: running the init container as the user 1337 allows your process to circumvent the rules and access the network without the proxy. So if you have a custom init container that needs network access, you have to modify the config as follows:
...
initContainers:
  - name: custom-init
    command: ["run", "init"]
    image: busybox
    securityContext:
      runAsUser: 1337 # <-- This is the important part
...
Currently, the Docker images of meshed cronjobs CANNOT be based on Distroless or Scratch images.
GAP currently does NOT support any command in gap.yaml that contains whitespace in any of its arguments.
To be able to trap SIGTERM and shut down sidecar containers after a cronjob is finished, we have to wrap the commands defined in gap.yaml in a sh -c call. This requires sh to be available in the container, causing Distroless- and Scratch-based images to crash. This problem does not have a workaround.
The commands defined in gap.yaml are interpolated into the manifests as arguments of the above-mentioned sh -c call using Go templating. Due to its peculiarities, single and double quotes are lost in the process, so an argument that contains whitespace is interpreted as several separate arguments. This problem can be worked around by writing a shell script that contains the command in its original form, and invoking that shell script in the gap.yaml command.
Example:
# WRONG
cronJobs:
  bad-example-cronjob:
    command: [ "sh" ]
    args:
      - -c
      - "PYTHONPATH=. python3 my_python_script.py"
# FIXED
cronJobs:
  good-example-cronjob:
    command:
      - sh
      - scripts/my_wrapped_script.sh
# scripts/my_wrapped_script.sh
#!/bin/sh
PYTHONPATH=. python3 my_python_script.py
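To see why lost quotes break the command, the following sketch uses Python's shlex module to mimic shell-style word splitting. It only illustrates the effect; it does not reproduce GAP's Go templating, and the variable names are made up for this example.

```python
import shlex

original = 'sh -c "PYTHONPATH=. python3 my_python_script.py"'

# With the quotes intact, the whole script is a single argument of `sh -c`.
quoted_args = shlex.split(original)

# Once the quotes are gone, nothing marks where that argument starts or ends,
# so every whitespace-separated word becomes its own argument.
unquoted_args = shlex.split(original.replace('"', ""))

print(len(quoted_args), quoted_args)
print(len(unquoted_args), unquoted_args)
```

In the unquoted case, sh -c receives only PYTHONPATH=. as the script to run, which is why wrapping the command in a script file avoids the problem.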
In a future upgrade to Kubernetes v1.30+, sidecars will be natively supported, so the above tricks will not be needed anymore. We cannot offer an ETA yet for this upgrade.
Most applications used the <xxxx>.gservice.emarsys.(com|net) domain to call other applications, because this was the way to produce the router logs that are mandated by security and used in LaaS alerts. With this approach, the request goes through ingress-nginx for routing, load balancing and logging.
With the mesh enabled, it is now recommended to call applications directly for in-cluster communication.
As an example, let’s say you have been calling pmta-manager.gservice.emarsys.net/healthcheck from your application. That application will most likely have a <app-name>-web deployment, for which the pipeline generates a Service called pmta-manager-web if ingress.enabled or service.enabled is true (e.g.: doc). The Kubernetes Service is generated with the default port 80, which maps to port 8080, so port 80 should be used when calling the Service. The following two examples show how you could set this in your application config secret to call the Service.
# if the service is within the same namespace
URLS_TO_CALL = http://pmta-manager-web/healthcheck
# if the service is in some other namespace
URLS_TO_CALL = http://pmta-manager-web.<other-namespace>/healthcheck
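If you build such URLs in code rather than in a config secret, a small helper along these lines may be useful. This is a sketch only: the function name and its parameters are hypothetical, and it assumes the <app-name>-<deployment-name> Service naming and the default port 80 described above.

```python
from typing import Optional


def in_cluster_url(app_name: str, path: str = "/",
                   namespace: Optional[str] = None,
                   deployment: str = "web") -> str:
    """Build a direct Service URL instead of a gservice domain URL."""
    # The pipeline-generated Service is named after the application and deployment.
    service = f"{app_name}-{deployment}"
    # Within the same namespace the bare Service name resolves; otherwise append the namespace.
    host = f"{service}.{namespace}" if namespace else service
    # Port 80 is the default for http, so it does not need to be spelled out.
    return f"http://{host}/{path.lstrip('/')}"


print(in_cluster_url("pmta-manager", "/healthcheck"))
print(in_cluster_url("pmta-manager", "/healthcheck", namespace="other-namespace"))
```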
The routing settings set in your Ingresses will not apply to meshed applications with pod-to-pod traffic. Instead, you will be able to handle the routing for your meshed application(s) with a Virtual Service resource. A GAP YAML API for this will be available soon. If you need it sooner, you can create a Virtual Service via a patch in your GAP folder.
The following example shows how one can do routing with a Virtual Service similarly to how it was done with the Custom Ingress way previously.
You only need to set the following. Note that ingress.rules is not supported when useServiceMesh is enabled:
# gap/staging/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      hosts:
        - myapp-staging.gservice.emarsys.com
        - app-staging.eservice.emarsys.com
# gap/production/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      hosts:
        - myapp.gservice.emarsys.net
        - app.eservice.emarsys.net
Previously with the Custom Ingress way:
# gap/staging/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      rules:
        myapp-staging.gservice.emarsys.com:
          /: {}
          /api:
            serviceName: my-api-service
            pathType: Prefix
# gap/production/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      rules:
        myapp.gservice.emarsys.net:
          /: {}
          /api:
            serviceName: my-api-service
            pathType: Prefix
Now with Virtual Services:
# gap/staging/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      useCustomVirtualService: true
# gap/staging/gap_virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-web
spec:
  hosts:
    - myapp-staging.gservice.emarsys.com
  http:
    - match: # the rules are evaluated in sequential order; if the /api prefix does not match, the route below applies
        - uri:
            prefix: "/api"
      route:
        - destination:
            host: my-custom-service
    - route:
        - destination:
            host: myapp-web # name of your Service (same as your deployment), which is generated by the pipeline
# gap/production/gap.yaml
deployments:
  web:
    ingress:
      enabled: true
      useCustomVirtualService: true
# gap/production/gap_virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-web
spec:
  hosts:
    - myapp.gservice.emarsys.net
  http:
    - match:
        - uri:
            prefix: "/api"
      route:
        - destination:
            host: my-custom-service
    - route:
        - destination:
            host: myapp-web # name of your Service (same as your deployment), which is generated by the pipeline
Please note that the match rules for Virtual Services are evaluated in sequential order from top to bottom, with the first rule being given the highest priority. Therefore please make sure to have a catch-all rule (a route without a match condition) at the bottom of each of your Virtual Services, so that traffic always matches at least one rule.
Please refer to the Virtual Service docs for more information.
More examples about routing rules can be found here
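The top-to-bottom evaluation can be pictured with the following simplified model. It is not Istio code: it only handles prefix matches via plain string comparison, and the Route type, the pick_destination function and the ROUTES list are invented for this illustration, using the Service names from the example above.

```python
from typing import List, Optional, Tuple

# (prefix, destination host); a prefix of None stands for a route without a match block.
Route = Tuple[Optional[str], str]


def pick_destination(routes: List[Route], path: str) -> Optional[str]:
    """Return the destination of the first route that matches, scanning top to bottom."""
    for prefix, destination in routes:
        if prefix is None or path.startswith(prefix):
            return destination
    # No rule matched: this is the situation a catch-all route at the bottom prevents.
    return None


ROUTES = [("/api", "my-custom-service"), (None, "myapp-web")]

print(pick_destination(ROUTES, "/api/users"))
print(pick_destination(ROUTES, "/index.html"))
# Putting the catch-all first would shadow the more specific /api rule.
print(pick_destination(list(reversed(ROUTES)), "/api/users"))
```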
By default, Istio uses a least requests load balancing policy, where requests are distributed among your application’s pods with the least number of requests.
This is the recommended way of load balancing by Istio as it outperforms other options such as Round Robin.
If you require a different method of load balancing for your use case, you can avail yourself of Destination Rule Istio objects, where you can define policies that apply to traffic intended for an application after routing has occurred past your VirtualService.
In the following example, the rule uses the least request load balancing policy for all traffic to port 80, and a round robin load balancing policy for traffic to port 9080.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-web
spec:
  host: myapp-web # name of your Service (same as your deployment), which is generated by the pipeline
  trafficPolicy:
    portLevelSettings:
      - port:
          number: 80
        loadBalancer:
          simple: LEAST_REQUEST
      - port:
          number: 9080
        loadBalancer:
          simple: ROUND_ROBIN
More on what you can do with DestinationRule can be found in the docs
The Istio proxy access logs can be found in the gap-ingress-nginx index with the filter @gap.resource.labels.container_name: istio-proxy. That index with the given filter can be used for alert rules based on the access logs. It is configured with the same logging format as Nginx was, with fields mapped to the Elastic Common Schema standard.
You shouldn’t need to change anything in your alerts but reviewing them is recommended. Please note that for internal calls (not coming through NGINX) the router.ingress_name field will be empty. If you rely on this field in your alert query, be aware that it will alert for external requests only.
Should you make use of the following fields in your alerts, adaptations would be required:
* @gap.service
* router.ingress_name
* router.service_name
* router.namespace
For instance, to catch entries from both ingress and Istio, you could adjust your alert query filter as follows:
(@gap.service:"myappname--web" OR labels.k8s-pod\/gap\/application-name:"myappname")
Note that any slash needs to be escaped with a backslash; otherwise Kibana reports an error that looks like: "M of N shards failed. The data you are seeing might be incomplete or wrong."
In case you receive false alerts for logs having request="-" and response="0", this is due to the logging of outgoing requests encrypted via TLS. You could add an extra filter to disregard those entries.
For the time being, the value shown in serve_time for log entries originating from Istio should be read without the last three zeros. This is a known issue that will be addressed in the future: https://emarsys.jira.com/browse/GAP-182. In case you receive alerts due to this change of magnitude, you can adjust your alert until this is fixed, for example:
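If you generate alert queries programmatically, the escaping can be done in one place. The sketch below is an illustration only: escape_slashes and app_filter are hypothetical helper names, and the query shape simply mirrors the example filter above, assuming the <app-name>--<deployment-name> value for @gap.service.

```python
def escape_slashes(field: str) -> str:
    """Escape every slash with a backslash; expects a name that is not escaped yet."""
    return field.replace("/", "\\/")


def app_filter(app_name: str, deployment: str = "web") -> str:
    """Build a query filter that matches both ingress and Istio log entries."""
    label = escape_slashes("labels.k8s-pod/gap/application-name")
    return f'(@gap.service:"{app_name}--{deployment}" OR {label}:"{app_name}")'


print(app_filter("myappname"))
```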
- Existing alert: serve_time:>=25000
- New alert: serve_time:>=25000000
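The adjustment amounts to a factor of 1000, as the following sketch spells out. The function and constant names are hypothetical, and the factor is derived only from the "last three zeros" description above, so treat it as an assumption until the linked issue is resolved.

```python
# Assumption: Istio entries report serve_time 1000 times larger than ingress entries.
ISTIO_SERVE_TIME_FACTOR = 1000


def istio_threshold(existing_threshold: int) -> int:
    """Scale an existing serve_time alert threshold so it fits Istio log entries."""
    return existing_threshold * ISTIO_SERVE_TIME_FACTOR


def comparable_serve_time(serve_time: int, from_istio: bool) -> float:
    """Bring a logged serve_time back to the magnitude used by ingress entries."""
    return serve_time / ISTIO_SERVE_TIME_FACTOR if from_istio else float(serve_time)


print(istio_threshold(25000))
print(comparable_serve_time(25000000, from_istio=True))
```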
Should you want to save a Kibana view that helps distinguish log entries originating from ingress from those originating from Istio, we recommend showing the field applicationName as a column. The possible values would look like:
- ingress-nginx: self-explanatory, log entries coming from ingress
- myappname: those entries originate from the Istio sidecar of your service / application
Due to the occurrence of OOM (out of memory) kill errors that resulted from the higher memory use of Istio proxies, the default memory request for ALL Istio proxy sidecars is raised from 128 MB to 192 MB.
Because traffic to and from your pods flows through the Istio proxy, you may need to modify its resource requests and limits. This can be done with the following annotations on the deployment level:
deployments:
  <deployment-name>:
    podAnnotations:
      sidecar.istio.io/proxyCPU: 1
      sidecar.istio.io/proxyCPULimit: 2
      sidecar.istio.io/proxyMemory: 1Gi
      sidecar.istio.io/proxyMemoryLimit: 2Gi
Please note that due to Istio’s implementation of using these resource annotations, if you only set a request for either CPU or memory, the limits, and the request for the other resource (if not set explicitly) will be zeroed out. If you only set a limit for either CPU or memory, the request for the same resource will be set to the same amount. This is undesirable behaviour for most cases, so we strongly recommend that if you need to adjust any of the defaults, please set all four annotations.
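A simple pre-deploy check can catch partially set annotations. The sketch below is not part of GAP: the function name is made up, and it assumes you have already loaded the podAnnotations of a deployment into a dictionary.

```python
from typing import Dict, List

PROXY_RESOURCE_ANNOTATIONS = (
    "sidecar.istio.io/proxyCPU",
    "sidecar.istio.io/proxyCPULimit",
    "sidecar.istio.io/proxyMemory",
    "sidecar.istio.io/proxyMemoryLimit",
)


def missing_proxy_annotations(pod_annotations: Dict[str, object]) -> List[str]:
    """List the proxy resource annotations that still need to be set.

    Returns an empty list when none of the four is set (defaults apply)
    or when all four are set.
    """
    present = [key for key in PROXY_RESOURCE_ANNOTATIONS if key in pod_annotations]
    if not present:
        return []
    return [key for key in PROXY_RESOURCE_ANNOTATIONS if key not in pod_annotations]


print(missing_proxy_annotations({"sidecar.istio.io/proxyMemory": "1Gi"}))
```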
To opt your application out of the Service Mesh, set the useServiceMesh field to false on the root level of gap.yaml.