Migrating applications to multiregion pipeline
We highly encourage you to do the migration in one sitting. In our experience, it should take no more than one hour. Running the old and new pipelines at the same time is not supported and can lead to unexpected synchronization problems.
The US production cluster (p-us1-01) will be enabled by the end of April 2026 at the latest. This may happen earlier — we will make an announcement before enabling it.
Add the new cluster(s) to your machine so that you can use them in e.g. k9s (if you are not yet authenticated to GCP locally, follow the related points here):

    gcloud container clusters get-credentials s-us1-01-gap-primary --region us-east4 --project ems-gap-s-us1-01 --dns-endpoint
    gcloud container clusters get-credentials p-us1-01-gap-primary --region us-east4 --project ems-gap-p-us1-01 --dns-endpoint

Remove autoSyncProduction from your gap.yaml if you have it set, and commit this change. autoSyncProduction should be disabled before the migration to avoid situations, confusing at first glance, where the production app is out of sync after the migration. E.g.: while the old applications are not deleted yet, they can take back ownership of the resources and redeploy an old state. The new pipeline also doesn’t support auto-sync to production environments for compliance reasons, and leaving the setting in place would be confusing, as it becomes a no-op after the migration.
Use the usePrebuiltImage feature of gap.yaml if not doing so already:
- Add a GitHub workflow that builds your application, similarly to the example defined here
- Add usePrebuiltImage: true to the root of the gap.yaml.

If you were already using usePrebuiltImage before:
- add the GitHub Actions workflow-level env var IMAGE_TAG with the value of, for example, the github.sha built-in variable:

      env:
        IMAGE_TAG: "${{ github.sha }}"

- add the IMAGE_NAME env var as below, so the image is pushed to the new registry and repository:

      env:
        IMAGE_NAME: europe-west3-docker.pkg.dev/sap-ems-base-infra-package-p/gap-images/<your-application-name>

- update the following in your image building step in the GitHub Actions workflow:
  - your tag value in docker build, to use the workflow-level environment variable set above. Notice that the latest tag was removed, as the new image repositories have tag immutability enabled:

        run: docker build . --file Dockerfile --tag ${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

  - the docker registry authentication command before docker push at the last stage:

        gcloud auth configure-docker europe-west3-docker.pkg.dev
Migrate to the new image specification format:
- Use the new image specification format in gap.yaml, as in the example below. The registry is automatically calculated for the region if not set explicitly. The tag will be automatically generated by the update-image-tag step:

      image:
        repository: sap-ems-base-infra-package-p/gap-images/<your-application-name>
Add GAP-Workflow application to your repository: https://github.com/apps/gap-workflows/installations/107694084
- If your repo has branch protection policies that require PRs or block pushes to master/main, add the GAP-Workflow application to the Bypass list on GitHub.
Make sure that the gap folder env-level override directories, if there are or would be any, are added for the new envs under the new environment names (where they were previously set to staging and/or production, still keep those), or added to the staging-defaults and production-defaults folders respectively if there are common configurations across instances:
- s-eu1-01 (for the current EU staging; if there is a staging folder, copy it under this new name and do not delete it yet)
- p-eu1-01 (for the current EU production; if there is a production folder, copy it under this new name and do not delete it yet)
- s-us1-01 (for the US staging)
- p-us1-01 (for the US production)
- staging-defaults (common configs across staging instances)
- production-defaults (common configs across prod instances)

Your gap folder should look like this during the migration (old directories kept until step 15):

    gap/
    ├── gap.yaml
    ├── staging/               # old, keep until migration is complete (step 15)
    │   └── gap.yaml
    ├── production/            # old, keep until migration is complete (step 15)
    │   └── gap.yaml
    ├── staging-defaults/      # (optional) overrides for all staging clusters
    │   └── gap.yaml
    ├── production-defaults/   # (optional) overrides for all production clusters
    │   └── gap.yaml
    ├── s-eu1-01/              # new EU staging (copied from staging/)
    │   └── gap.yaml
    ├── p-eu1-01/              # new EU production (copied from production/)
    │   └── gap.yaml
    ├── s-us1-01/              # new US staging
    │   └── gap.yaml
    └── p-us1-01/              # new US production
        └── gap.yaml

If you don’t commit these before adding the application to gap-registry, the first manifest generation could lead to dangling resources.
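The copy part of this step can be scripted. The following is only a minimal sketch, assuming the directory layout shown above; the function name seed_overrides is made up for illustration and is not part of any GAP tooling. It copies the old EU directories under their new names and never deletes or overwrites anything. The US directories are left out on purpose, since their values usually need to be chosen per cluster.

```shell
#!/bin/sh
# Sketch only: seed the new EU override directories from the old ones.
# seed_overrides is a hypothetical helper, not part of gap-cli.
# Usage: seed_overrides [GAP_DIR]   (GAP_DIR defaults to "gap")
seed_overrides() {
    gap_dir="${1:-gap}"
    # old-name:new-name pairs taken from the list above
    for pair in staging:s-eu1-01 production:p-eu1-01; do
        old="${pair%%:*}"
        new="${pair##*:}"
        # Copy only if the old directory exists and the new one does not yet,
        # so existing work is never overwritten and the old directory stays.
        if [ -d "$gap_dir/$old" ] && [ ! -e "$gap_dir/$new" ]; then
            cp -R "$gap_dir/$old" "$gap_dir/$new"
            echo "copied $old -> $new"
        fi
    done
}
```

Running seed_overrides gap from the repository root and then reviewing the result with git status before committing keeps the step easy to check.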
The migration is a great opportunity to review and rationalize your resource requests for each environment. Staging apps often have significantly inflated requests relative to their actual usage. Check your apps on our Grafana dashboards (EU staging and EU production) and set appropriate values per cluster. See the resource details guide for practical guidance. We are currently working on logging and monitoring solutions for the new multi-region clusters.
If you need to make Egress calls to the outside (non-GCP), please refer to this doc.
Add a GitHub workflow that updates the image tag in gap.yaml. It includes paths-ignore so that the update-image-tag job below does not run when only the gap folder changed, as no new image is built in that case.

Make sure you update the following placeholders below:
- <name-of-container-building-step>
- <name-of-the-main-branch>

    on:
      push:
        paths-ignore:
          - 'gap/**'
    jobs:
      update-image-tag:
        timeout-minutes: 10
        permissions:
          contents: "write"
          id-token: "write"
        needs:
          - <name-of-container-building-step>
        runs-on: ubuntu-latest
        if: ${{ github.ref == 'refs/heads/<name-of-the-main-branch>' && github.actor != 'gap-workflows[bot]' }}
        steps:
          - uses: actions/create-github-app-token@v3
            id: app-token
            with:
              client-id: ${{ secrets.GAP_WORKFLOW_APP_ID }}
              private-key: ${{ secrets.GAP_WORKFLOW_APP_PEM }}
          - uses: "actions/checkout@v6"
            with:
              token: ${{ steps.app-token.outputs.token }}
          - name: Get GitHub App User ID
            id: get-user-id
            run: echo "user-id=$(gh api "/users/${{ steps.app-token.outputs.app-slug }}[bot]" --jq .id)" >> "$GITHUB_OUTPUT"
            env:
              GH_TOKEN: ${{ steps.app-token.outputs.token }}
          - name: Update image tag
            run: |
              yq -i '.image.tag = env(IMAGE_TAG)' gap/gap.yaml
          - name: Commit and push changes
            id: commit_gap_yaml
            run: |
              git config --global user.name '${{ steps.app-token.outputs.app-slug }}[bot]'
              git config --global user.email '${{ steps.get-user-id.outputs.user-id }}+${{ steps.app-token.outputs.app-slug }}[bot]@users.noreply.github.com'
              git add gap/gap.yaml
              git commit -m "Update image tag to ${{ env.IMAGE_TAG }}"
              git push origin HEAD:${{ github.ref_name }}

Remove the submit-build step (sometimes called gap-deploy) from the current workflow. Using the old and new workflows at the same time is not supported, as it can lead to unexpected problems. It’s much easier to just clean it up.

Add an entry to https://github.com/emartech/gap-registry; the GAP team will approve and merge your PR. If you cannot create a PR, please reach out to the GAP team on the #infra-support Slack channel or open a ticket in the team’s Jira project.
- create the app.yaml with the below specification:

      your-namespace/your-app-name/app.yaml (e.g. cloud-platform/gap-docs/app.yaml)

  - inside app.yaml:
    - app
      - name: name of your app, same as the name in gap.yaml
      - repo: URL of your app repo, such as https://github.com/emartech/gap-docs.git
      - labels: (optional)
        - group: (string, optional) the value set for the group key can be used to group Applications together on the Argo CD Applications view UI (if it is set in gap.yaml with the applicationLabels field, please move it over to the app.yaml)
      - path: (optional, defaults to “gap”) if there are multiple gap folders in the app repo, the name of the gap folder in the root for the application
      - autoSyncToStaging: (optional, defaults to true) if using preDeploy, set it to false for the first sync in order to sync only the ServiceAccount first. The field value only applies to staging environments.
      - instanceConfig: (optional) per-instance overrides
        - <instance-key>: (e.g. s-us1-01)
          - disable: true (if set to true, the application will not be deployed on the specified instance. If it is set to true after the app has already been deployed, the app must also be deleted on Argo CD with the default Foreground propagation policy.)
      - targetRepoRevision: (optional, defaults to "HEAD", which means the most current revision in the repo) can be set to a tag/branch/commit

Example:

    app:
      name: <my-application>
      repo: https://github.com/emartech/<my-app-repo-name>.git
Make sure you have committed your cluster-specific overrides in your application’s gap folder before adding the application to gap-registry, otherwise the first manifest generation could lead to dangling resources. See step 7 for more details.
- Ensure, if needed, the workload identity is configured to access e.g your databases with the help of this guide.
- Don’t forget to create your secrets for your application for example with gap-cli config:init.
- Verify that the new applications have appeared in Argo CD for the new cluster(s). It can take a few minutes for them to show up correctly. Unless autoSyncToStaging was set to false, the staging environments will be auto-synced.
  - Check the status of the new staging application. If autoSyncToStaging was not set to false, the staging app will take ownership of your old app’s resources automatically.
  - After checking that the diff for the production app looks good (nothing unexpected, mostly metadata changes), sync the new application (if using preDeploy, sync the ServiceAccount first), so that it takes ownership of the old resources.
    - If you use preDeploy and forgot to set autoSyncToStaging to false for staging for the first run, or synced all production resources without syncing the ServiceAccount first, click on the button that displays the sync in progress, and on the page that pops up click Terminate.

  SharedResourceWarning will show up on Argo CD in the App Conditions. These warnings can safely be ignored; they will resolve after the sync completes. You are seeing them because you already have an application with the old name and workflow.

  If the app shows as OutOfSync after the initial sync but auto-sync does not trigger, this is expected behaviour. Argo CD will not re-attempt automated sync for a commit SHA it has already successfully synced against, even if chart metadata or generation fields have since changed. Simply sync the application manually once from the Argo CD UI to resolve it.
- At this point, the staging and production gap folder env-level override directories should be deleted. usePrebuiltImage is a no-op unless you roll back, so you can remove it.
- Ask the GAP team to delete the no longer necessary gap-application directory by creating a ticket in the GAP team’s Jira project with the following template.

  Important: Avoid creating separate tickets for each application. The ideal case is to get rid of the whole namespace with a single ticket once all apps are migrated. If you are migrating multiple applications but the rollout is spread out over time, do open a ticket per app rather than waiting, because having duplicate Argo CD apps running in parallel for extended periods is not recommended.
Dear GAP team,
Please remove the following folder(s) from the gap-applications repository for our migration process:
- <your-namespace>
or
- <your-namespace>/<your-migrated-app>
- <your-namespace>/<your-other-migrated-app>
This namespace/application has been successfully migrated and is no longer needed in the gap-applications repository.
and please delete our application from Argo CD with the non-cascading option.
Cheers,
<your-name>
- Set the app.autoSyncToStaging field to false in the app.yaml of the app that is being rolled back to the old pipeline.
- (If using env-level gap folder overrides) Bring back the staging and/or production directories in the gap folder for the proper overrides to apply.
- Remove the on: paths-ignore in the workflow; it might look like this:

      on:
        push:
          paths-ignore:
            - 'gap/**'

- Add back the latest tag in your docker build command:

      run: docker build . --file Dockerfile --tag ${{ env.IMAGE_NAME }}:latest --tag ${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

- Comment out the update-image-tag workflow job.
- Put back the submit-build step which was deleted, and sync the applications that it brings up.
If you have a monorepo with multiple gap.yaml files and multiple workflows, each building one of the applications, you can end up with a race condition in the update-image-tag step: parallel runs push to the same branch, and the slower push is rejected.
To work around this, you can use a retry mechanism:
    - uses: nick-fields/retry@v3
      name: Commit and push changes
      id: commit_gap_yaml
      with:
        timeout_minutes: 10
        max_attempts: 10
        retry_on_exit_code: 1
        on_retry_command: echo "Retrying commit and push due to parallel GAP deployment changes..."
        command: |
          git pull --rebase origin ${{ github.ref_name }}
          git config --global user.name '${{ steps.app-token.outputs.app-slug }}[bot]'
          git config --global user.email '${{ steps.get-user-id.outputs.user-id }}+${{ steps.app-token.outputs.app-slug }}[bot]@users.noreply.github.com'
          git add gap-workers/gap.yaml
          git commit -m "Update image tag to ${{ env.IMAGE_TAG }}"
          git push origin HEAD:${{ github.ref_name }}
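
If you prefer not to depend on a third-party action, the same retry idea can be sketched in plain shell inside a run step. This is only an illustration under stated assumptions: the retry function, the push_tag_update helper and the BRANCH variable are hypothetical names, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Sketch only: a generic retry helper.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical usage in a workflow "run:" step, after the commit was created:
# push_tag_update() {
#     git pull --rebase origin "$BRANCH" && git push origin "HEAD:$BRANCH"
# }
# retry 10 5 push_tag_update
```

Rebasing before every push attempt is what lets a rejected push succeed on the next try, since the other workflow's commit touches a different gap.yaml file.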
Each application in a monorepo needs its own entry in the gap-registry. Create a separate <your-namespace>/<your-app-name>/app.yaml file for each application, and make sure each entry uses the correct app.path to point to its respective gap folder (e.g. app.path: gap-workers).
If your application uses preDeploy, the first sync requires extra care because Argo CD needs to create the ServiceAccount before it can run the pre-deploy job and sync the rest of the resources.
Before the first sync:
- In your app.yaml in the gap-registry, set autoSyncToStaging: false. This prevents Argo CD from auto-syncing all resources at once before the ServiceAccount exists.
First sync (staging and production):
- Manually sync only the ServiceAccount resource first, using the selective sync option in the Argo CD UI. (Click on the ‘…’ on the box of the SA resource and select Sync.)
- Once the ServiceAccount is in place, sync the remaining resources.
If you forgot to set autoSyncToStaging: false (staging) or synced everything at once without the ServiceAccount first:
- Click on the sync operation that is in progress in the Argo CD UI, and on the page that appears click Terminate.
- Then follow the steps above from the beginning.
After the first successful sync, you can (and probably should) set autoSyncToStaging: true again in your app.yaml.
So far we have seen only very few applications that do not build their own image but use one provided by another team (e.g. contact-data-deletion-v2). You should only use this approach if the image is updated infrequently. In such cases the migration steps are mostly the same, with the following differences:
- You will need to remove the submit-build step, but you don’t need to add the update-image-tag step, as there is no image being built in the workflow.
- You will need to specify the image repository and tag in the gap.yaml file. E.g.:

      image:
        repository: sap-ems-base-infra-package-p/gap-images/contact-data-deletion-v2
        tag: "2.7" # make sure you use the quotes for such tags, otherwise it will be misinterpreted as a float number and not a string, which causes the schema validation to fail and the pipeline to break

- In case of using patches, the full image path should be used, because there is no registry interpolation. E.g.:

      image: europe-west3-docker.pkg.dev/sap-ems-base-infra-package-p/gap-images/contact-data-deletion-v2:2.7

  instead of just

      image: contact-data-deletion-v2:2.7

- You will need to change the image tag manually in the gap.yaml file for every new image version, when notified by the team that provides the image to you.
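
Since an unquoted tag such as 2.7 breaks schema validation, a quick local check before committing can help. The following is a rough sketch, not part of the GAP tooling: the function name find_unquoted_numeric_tags is made up, and the pattern only catches plain integers and decimals written on a tag: line, not every value YAML would read as a number.

```shell
#!/bin/sh
# Sketch only: flag tag values that YAML would parse as a number.
# Usage: find_unquoted_numeric_tags FILE
# Prints "line-number:line" for each hit; exit status is 0 if any hit
# was found and 1 if the file looks fine (standard grep semantics).
find_unquoted_numeric_tags() {
    grep -nE '^[[:space:]]*tag:[[:space:]]*[0-9]+(\.[0-9]+)?[[:space:]]*(#.*)?$' "$1"
}

# Hypothetical usage from the repository root:
# if find_unquoted_numeric_tags gap/gap.yaml; then
#     echo "quote the tag values listed above" >&2
# fi
```

A tag like 2.7.1 or a commit SHA containing letters is already read as a string by YAML, so the check deliberately leaves those alone.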