Global Load Balancer

Introduction

The Global Load Balancer (also called External HTTP Load Balancer) is Google’s managed reverse proxy service. In this section we introduce the features of the offering and provide guidance on how to use it with applications hosted on GAP.

When using the GLB from GAP, which is based on GKE, all configuration is done through Kubernetes manifests. An in-cluster controller takes care of configuring all moving parts of the solution (certificates, target proxies, URL maps, network endpoint groups, etc.), and as such, it “owns” the process - there is no need to interact with the load balancer configuration in any other way.

Glossary

GFE: Google’s frontend layer. It consists of many servers at the GFE points of presence (PoP), which are much more numerous than the GCP regions.

Anycast IP: In this terminology, an IP address (defined in the forwarding rule) that is announced over external BGP from all GFE PoPs. It gives the customer a single IP address that is reachable with low latency from all around the world.

Forwarding rule: Defines the frontend IP address, protocol and port, and targets a single target proxy. Multiple forwarding rules can target the same target proxy (e.g. an IPv4 and an IPv6 forwarding rule).

Target proxy: The logical component that terminates client HTTP/HTTPS connections, makes routing decisions based on the URL map, and forwards the request to the appropriate backend service.

URL map: A collection of configuration entries defining routing rules based on request data. In our case it is derived from the Ingress configuration.

Backend service: An entity that defines the destination of the request. Its backends can be VMs, instance groups, network endpoint groups, etc. Caching policy and authentication are also defined here. It also carries capacity and utilization data, which the GLB tracks (e.g. to shift traffic to another region when a backend is at capacity). The backend services are also derived from the Ingress configuration, but there are separate objects to modify their behaviour.

Cloud CDN: A product that enables standard HTTP caching in the GLB layer.

Cloud Armor: Enables complex authorization of the requests in the GLB layer.

Container-native load balancing: The GFE can reach pods directly on their internal IP (with the help of Network Endpoint Groups). We call this container-native because all layers of the “standard” Kubernetes ingress and load balancer stack are skipped and the GLB has a native connection to the pods. It can also assess their individual health status.

Network endpoint group (NEG): A type of backend, attached to a backend service, that lets the GLB address individual endpoints (here, pods) directly. Like other configuration entries, it is automatically derived from the Kubernetes manifests. For more information please see https://cloud.google.com/load-balancing/docs/negs

Global health check: With container-native load balancing, the external load balancer performs its own health checks, which come from pre-defined IP ranges. The most important implication is that a pod has an additional health check beyond the “local” Kubernetes mechanisms. The external checks are integrated and report back to Kubernetes, which means that a Pod can be reported as not ready if the global health checks cannot verify it.

When to use the Global Load Balancer

The solution, even though it is operationally fully managed, adds complexity, so all factors should be evaluated.

Consider the GLB when the application is both high-traffic and user-facing, i.e. it receives connections directly from end-user devices. The GFE significantly reduces latency, mainly because TLS is terminated at a proxy close to the user, which shortens the round trips of the handshake.

There is also a cost factor; please see Pricing. Note that the bandwidth-based costs are paid for all applications because we use another type of load balancer in front of the ingress controller, so normally only the forwarding rule cost is an increase. Things change when Cloud CDN or Cloud Armor are used because they incur per-request fees. They should be slightly cheaper than Amazon’s CloudFront offering.

Getting started (deprecated)

Please take note that the following guide is now deprecated. Please use the Setting up a new Load Balancer with TLS guide instead.

The following assumes that there’s already a functioning GAP application that needs to be changed to utilize the GLB.

Service object

Ingresses target Service objects, which target Pods. You have to create a Service definition manifest that targets your pods with the following annotation:

cloud.google.com/neg: '{"ingress": true}'

This annotation is added automatically if the ingress class is gce in gap.yaml.

This instructs the controller to create and manage a Network Endpoint Group (NEG), which is required for container-native load balancing. NEG-annotated Services can be used the same way inside Kubernetes; the annotation only adds functionality.
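For orientation, a complete Service manifest carrying this annotation might look like the sketch below. The name my-app, the selector label and the port numbers are placeholders and not part of the original guide; adapt them to your application.

```yaml
# Minimal sketch of a NEG-annotated Service.
# "my-app", the selector and the port numbers are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # Asks the controller to create and manage a NEG for this Service.
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: my-app          # must match the labels on your Pods
  ports:
    - name: http         # named port, referenced again from the Ingress
      port: 80
      targetPort: 8080
```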

Ingress object

The Ingress object needs to be modified with the following annotations:

kubernetes.io/ingress.class: gce
acme.cert-manager.io/http01-edit-in-place: "true"

These annotations are added automatically if the ingress class is gce in gap.yaml.

The ingress class annotation does the following:

instructs the default ingress controller to ignore this Ingress
instructs the GCE controller to pick up this ingress and start provisioning the GLB resources

The cert-manager annotation ensures compatibility with the GLB. Normally, cert-manager creates a separate Ingress object for the challenge URL. This annotation instructs it to put the challenge path into the same Ingress, and it also turns on some features for GLB compatibility.

For more details on these annotations please see https://cert-manager.io/docs/usage/ingress/#supported-annotations and https://github.com/kubernetes/ingress-gce
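Put together, an Ingress using both annotations could look roughly like the following sketch. It assumes a cluster that serves the networking.k8s.io/v1 Ingress API and a cert-manager ClusterIssuer; the host name, Secret name, issuer name and Service name are all hypothetical.

```yaml
# Sketch of a GLB-backed Ingress. All names are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-glb
  annotations:
    # Picked up by the GCE controller, ignored by the default controller.
    kubernetes.io/ingress.class: gce
    # Makes cert-manager add the challenge path to this Ingress.
    acme.cert-manager.io/http01-edit-in-place: "true"
    # Assumption: a ClusterIssuer with this name exists in the cluster.
    cert-manager.io/cluster-issuer: my-cluster-issuer
spec:
  tls:
    - hosts:
        - my-app.example.com
      secretName: my-app-tls
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /*
            pathType: ImplementationSpecific
            backend:
              service:
                name: my-app     # the NEG-annotated Service
                port:
                  name: http
```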

Existing certificate keys will be picked up and transferred automatically.

As with the Service object, it is recommended to create a separate Ingress first to get familiar with the process.

Monitoring

After applying the modifications, both the Service and the Ingress objects should have additional status fields and annotations to report on the status of the process and the IP address assigned.
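As a rough illustration of what to look for, the excerpts below sketch the kind of fields that may appear when the objects are read back with kubectl get ... -o yaml. The values are placeholders, and the exact set of annotations depends on the controller version, so treat this as an assumption to check against your own cluster.

```yaml
# Service (excerpt): the controller reports the NEGs it created.
# Placeholder values; the real NEG name is generated by the controller.
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"80":"<generated-neg-name>"},"zones":["<zone>"]}'
---
# Ingress (excerpt): the assigned external IP shows up in the status.
# 203.0.113.10 is a documentation address, not a real one.
status:
  loadBalancer:
    ingress:
      - ip: 203.0.113.10
```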

Visual dashboards are available at the following locations:

See the troubleshooting guide for a more detailed view.

Advanced topics

Attaching a BackendConfig

Out of the box, the GLB integration provides sane defaults, but often there is a need to configure its behaviour. The BackendConfig resource definition provides a mechanism to alter the GLB backend service configuration. The fields are directly mapped to GLB backend service API call properties.

First, create a BackendConfig object:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  timeoutSec: 30
  connectionDraining:
    drainingTimeoutSec: 60

Due to a GCP bug it is possible that the readiness path will not be picked up for the health check (https://github.com/kubernetes/ingress-gce/issues/241#issuecomment-384749607). To make sure the health check will work properly, add a property to spec similar to the following:

healthCheck:
  checkIntervalSec: 13
  timeoutSec: 9
  type: HTTP
  requestPath: <your health check path>
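Combining the base object with the health check fragment gives a manifest along these lines. This is only a sketch: /healthz is a hypothetical path standing in for your own health check path.

```yaml
# Sketch: BackendConfig with an explicit health check.
# "/healthz" is a hypothetical placeholder for your health check path.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  timeoutSec: 30
  connectionDraining:
    drainingTimeoutSec: 60
  healthCheck:
    checkIntervalSec: 13
    timeoutSec: 9        # must not exceed checkIntervalSec
    type: HTTP
    requestPath: /healthz
```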

The BackendConfig must be referenced by name from the NEG Service with an additional annotation. The example assumes that the serving port is a named port ‘http’; otherwise use the key "default" to cover every port, or a port number:

cloud.google.com/neg: '{"ingress": true}'
cloud.google.com/backend-config: '{"ports": {"http":"my-backendconfig"}}'

These annotations are added automatically if the ingress class is gce in gap.yaml.
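In context, a Service carrying both annotations might look like the sketch below. The Service name, selector and port numbers are hypothetical; the commented-out line shows the "default" variant that applies the BackendConfig to every port.

```yaml
# Sketch: NEG Service that references a BackendConfig.
# "my-app", the selector and the port numbers are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    # Applies my-backendconfig to the port named "http" only.
    cloud.google.com/backend-config: '{"ports": {"http":"my-backendconfig"}}'
    # Alternative: apply it to every port of the Service.
    # cloud.google.com/backend-config: '{"default": "my-backendconfig"}'
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
```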

Please see https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service and https://cloud.google.com/kubernetes-engine/docs/concepts/backendconfig for deep insight. Please also see https://cloud.google.com/load-balancing/docs/backend-service for information about GLB backend services. The BackendConfig directly manipulates these backend service configurations.

Access logs

The GLB provides its own logging mechanism in the form of structured JSON logs written to Stackdriver. This is incompatible with the mechanism that GAP currently provides. There are two approaches to manage this:

Logs can be found in Google Cloud Logging for the specific project (ems-gap-stage or ems-gap-production) by filtering to

resource.type="http_load_balancer"
resource.labels.forwarding_rule_name="<<load-balance-name>>"

or selecting ‘Cloud HTTP Load Balancer’ and then the name of your LB from the list.

Utilize the router-log-sidecar.

In the latter case it makes sense to disable logging in the GLB. This is not currently possible through BackendConfig in the current GKE version (see the BackendConfig documentation above), so it needs to be done manually as a one-step process.
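For reference, the GKE BackendConfig documentation for newer releases describes a logging block. Whether the GKE version behind GAP accepts it is an assumption that needs to be verified before relying on it; if it does, disabling the GLB access logs might look like this sketch:

```yaml
# Sketch, assuming a GKE version whose BackendConfig supports "logging".
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  logging:
    enable: false        # turn off GLB access logging for this backend
```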

Cloud CDN

You can enable the Cloud CDN feature (standard caching) with the following BackendConfig setting:

spec:
  cdn:
    enabled: true

Please see https://cloud.google.com/kubernetes-engine/docs/how-to/cdn-backendconfig for further options and https://cloud.google.com/cdn/docs for deep insight into the Cloud CDN product (features, pricing).
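As a fuller sketch, the same setting inside a complete BackendConfig, together with a cache key policy, could look as follows. The cachePolicy values shown are illustrative choices, not recommendations from the original guide.

```yaml
# Sketch: BackendConfig enabling Cloud CDN with an explicit cache key policy.
# The cachePolicy values are illustrative assumptions.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  cdn:
    enabled: true
    cachePolicy:
      includeHost: true          # cache separately per Host header
      includeProtocol: true      # cache http and https separately
      includeQueryString: false  # ignore the query string in the cache key
```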

GLB-specific metrics

The GLB publishes advanced metrics to Google Cloud Monitoring, which also offers ready-made dashboards for monitoring. Currently there is no plan to integrate these into the GAP Prometheus infrastructure.

Keepalive connections

The GFE system heavily utilizes long-running TCP connections to minimize latency and maximize throughput.

It is highly recommended that the backend service properly supports keepalive connections in the following way:

keep idle connections open for longer than 600 seconds
accept an unlimited number of requests on a connection

The reasoning is that the target proxies keep connections open for 10 minutes. Normally it isn’t a big deal if the backend closes a connection, but with the GLB these connections can have a significant round-trip time (e.g. from another continent), and there is an inherent race condition when the backend closes an idle connection while a request from the proxy is already in flight. The proxies cannot retry unsafe (non-idempotent) requests.
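How to apply this depends on the server running in the pod. As one illustration, assuming the backend happens to be nginx, the two recommendations could be expressed in a ConfigMap like the sketch below. The ConfigMap name, file name, port and the way it is mounted are hypothetical.

```yaml
# Sketch, assuming an nginx backend; all names and the port are hypothetical.
# The file is meant to be mounted where nginx includes server blocks,
# for example under /etc/nginx/conf.d/.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-nginx
data:
  my-app.conf: |
    server {
      listen 8080;
      # Keep idle connections open longer than the proxies' 600 seconds.
      keepalive_timeout 650s;
      # nginx has no "unlimited" value; a large number approximates it.
      keepalive_requests 1000000;
      location / {
        root /usr/share/nginx/html;
      }
    }
```

Other servers expose equivalent settings under different names; the point is that the backend's idle timeout should exceed the 600 seconds used by the proxies.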

With the GLB, it is typical to see a large, stable number of connections being kept open and receiving traffic on only a few of them (depending on conditions). This is one of the reasons why the GLB doesn’t make much sense below a certain size - it simply cannot ensure proper, equal distribution below a certain traffic level.

Further resources

https://cloud.google.com/load-balancing/docs/https

https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service