An Intermediate-Level Deep Dive into GKE Workload Identity

riddle
Published in MIXI DEVELOPERS
Apr 6, 2022 · 6 min read

Hello.

This is riddle from the SRE Group, Development Division of mixi, Inc.

GKE Workload Identity allows Pods to operate GCP resources by linking a Kubernetes Service Account to a Google Service Account.

What is Workload Identity?

GKE Workload Identity is very useful, but I didn’t have a concrete idea of exactly how Pods on GKE get Google Cloud authorization, so I looked into it.

Table of Contents

  1. How Apps on Google Cloud use Google Service Accounts
    1.1 The Answer: Metadata Servers
  2. How GKE Pods Get Access Tokens
  3. How to Use Workload Identity and the gke-metadata-server
  4. Using Workload Identity with GKE
  5. The System Flow for a Pod Getting an Access Token
  6. Afterword
  7. References

How Apps on Google Cloud use Google Service Accounts

Applications deployed on Google Cloud can use a Service Account through the Application Default Credentials (ADC) mechanism in the client libraries provided by Google Cloud.

For example, the gcloud command installed on a Google Compute Engine instance can be used directly, because it already has the information for the Service Account attached to that instance.

So how does the gcloud command get the name and the access token of the Service Account attached to the instance?

The Answer: Metadata Servers

Apps on Google Cloud and the gcloud command get the access token from the Metadata Server. Let’s look at this in more detail.

The ADC lookup order is defined in the FindDefaultCredentialsWithParams function of the Go x/oauth2 library (golang.org/x/oauth2/google).

Here are the comments in the code:

default.go — Go

The noteworthy item is No. 4, where applications that run on Google Compute Engine get credentials from the Metadata Server.
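To make the preference order concrete, here is a simplified Go sketch of it. It is not the library code: the environment struct and the credentialSource function are hypothetical names invented for this illustration, the App Engine first-generation case is left out, and the real function reads and parses the files instead of only checking that they exist.

```go
package main

import "fmt"

// environment is a hypothetical snapshot of what the lookup can see.
type environment struct {
	credentialsEnvVar string // value of GOOGLE_APPLICATION_CREDENTIALS, "" if unset
	gcloudFileExists  bool   // gcloud's application_default_credentials.json is present
	onGCE             bool   // a metadata server is reachable
}

// credentialSource returns the first source that applies, following the
// preference order described in the comments (simplified).
func credentialSource(env environment) string {
	switch {
	case env.credentialsEnvVar != "":
		return "JSON file from GOOGLE_APPLICATION_CREDENTIALS"
	case env.gcloudFileExists:
		return "gcloud well-known file"
	case env.onGCE:
		return "metadata server"
	default:
		return "none"
	}
}

func main() {
	// On a plain GCE instance nothing else is configured, so the
	// metadata server is what ends up being used.
	fmt.Println(credentialSource(environment{onGCE: true}))
}
```

The point of the ordering is that an explicitly configured key file always wins, and the Metadata Server is the fallback that makes things work with no configuration on Google Cloud.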

This is the actual code that obtains the credential.

google.go — Go

In this code, you can see that it builds the URL from "instance/service-accounts/" + acct + "/token", sends an HTTP request to it, and decodes the response.

If you send the same request yourself on Google Compute Engine, you do get an access token.
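The same request can be sketched in Go with only the standard library. This is an illustration under assumptions, not the x/oauth2 code: tokenURL, parseToken and fetchToken are names made up here, and the base URL, the Metadata-Flavor header and the JSON field names follow the Compute Engine metadata server's token endpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Base URL of the metadata server as seen from inside an instance.
const metadataBase = "http://metadata.google.internal/computeMetadata/v1/"

// tokenResponse mirrors the JSON body returned by the token endpoint.
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	ExpiresIn   int    `json:"expires_in"`
	TokenType   string `json:"token_type"`
}

// tokenURL builds the token endpoint for a service account ("default" works).
func tokenURL(acct string) string {
	return metadataBase + "instance/service-accounts/" + acct + "/token"
}

// parseToken decodes the JSON body of a token response.
func parseToken(body []byte) (tokenResponse, error) {
	var tok tokenResponse
	err := json.Unmarshal(body, &tok)
	return tok, err
}

// fetchToken asks the metadata server for an access token.
func fetchToken(client *http.Client, acct string) (tokenResponse, error) {
	req, err := http.NewRequest(http.MethodGet, tokenURL(acct), nil)
	if err != nil {
		return tokenResponse{}, err
	}
	// The metadata server rejects requests without this header.
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := client.Do(req)
	if err != nil {
		return tokenResponse{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return tokenResponse{}, fmt.Errorf("metadata server returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return tokenResponse{}, err
	}
	return parseToken(body)
}

func main() {
	tok, err := fetchToken(&http.Client{Timeout: 2 * time.Second}, "default")
	if err != nil {
		// Outside Google Cloud the host name does not resolve, so this is expected.
		fmt.Println("could not fetch token:", err)
		return
	}
	fmt.Println("token type:", tok.TokenType, "expires in (s):", tok.ExpiresIn)
}
```

Run outside Google Cloud, the program only prints the error, since metadata.google.internal is resolvable only from inside an instance.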

In summary, apps and commands on Google Compute Engine can access Google Cloud resources by getting the Service Account’s access token using the Metadata Server through the client library.

How GKE Pods Get Access Tokens

Google Kubernetes Engine

GKE Pods can likewise use client libraries to obtain access tokens. However, since many different Pods run side by side on the same GKE Node, letting all of them reuse the Service Account of the underlying Google Compute Engine instance is not safe from a security point of view.

Therefore, GKE recommends using Workload Identity to configure the Google Service Account to be used for each Pod.

How to Use Workload Identity and the gke-metadata-server

When Workload Identity is enabled, a DaemonSet named gke-metadata-server is launched. (Since it is a DaemonSet, it is started on each Node.)

As its name implies, the gke-metadata-server is a server that manages metadata.

Normally, the metadata server is reachable at the link-local address 169.254.169.254 (metadata.google.internal), so on GCE a client sends an HTTP request to this address to get an access token.

However, with Workload Identity enabled, any access from the Pod to 169.254.169.254 (metadata.google.internal) is forwarded to the gke-metadata-server.

This happens because of iptables rules that are added to the Node when Workload Identity is enabled. DNAT is performed according to the following settings, so the traffic is routed to the gke-metadata-server.

In other words, Pods can no longer reach the original Metadata Server directly when Workload Identity is enabled.

The following configuration is a simplified version of the iptables settings.

The gke-metadata-server listens on ports 987 and 988 on the Node using HostPort, as you can see in the following manifest of the gke-metadata-server.

That is, Pods on GKE get their access tokens via the gke-metadata-server on the same Node, not directly from the Metadata Server.
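As a mental model of the DNAT step, the rewrite can be thought of as a small function on the destination of each packet. The Go sketch below is only a toy: the endpoint type and the dnat function are hypothetical, and the choice of rewriting port 80 to port 988 on the Node address is an assumption made for illustration (the text above only says that the server listens on 987 and 988).

```go
package main

import "fmt"

// endpoint is a destination IP and port.
type endpoint struct {
	ip   string
	port int
}

// dnat models a single DNAT rule: traffic addressed to the metadata
// address is redirected to the gke-metadata-server on the local Node.
// The 80 -> 988 mapping is an assumption, not taken from the real rules.
func dnat(dst endpoint, nodeIP string) endpoint {
	if dst.ip == "169.254.169.254" && dst.port == 80 {
		return endpoint{ip: nodeIP, port: 988}
	}
	// Everything else passes through unchanged.
	return dst
}

func main() {
	before := endpoint{ip: "169.254.169.254", port: 80}
	after := dnat(before, "10.0.0.5") // 10.0.0.5 is a made-up Node IP
	fmt.Printf("%s:%d -> %s:%d\n", before.ip, before.port, after.ip, after.port)
}
```

The Pod itself never notices the rewrite: it keeps talking to metadata.google.internal, and only the Node decides who actually answers.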

Using Workload Identity with GKE

Now let’s use Workload Identity. Cluster and node pool settings for Workload Identity must be enabled in advance.

For the first step, add the necessary permissions.

roles/iam.workloadIdentityUser has the following four permissions.

iam.serviceAccounts.get
iam.serviceAccounts.getAccessToken
iam.serviceAccounts.getOpenIdToken
iam.serviceAccounts.list

Understanding roles | Google Cloud

Since iam.serviceAccounts.getAccessToken is the permission needed to retrieve the access token, this setting allows PROJECT_ID.svc.id.goog[default/gke-workload] to get an access token for gke-workload@PROJECT_ID.iam.gserviceaccount.com.
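The two identifiers in that binding follow fixed patterns, which the small Go sketch below spells out. The helper names workloadIdentityMember and gsaEmail are hypothetical; the string formats are the ones used in the text, with the serviceAccount: prefix that IAM expects for the member of a binding.

```go
package main

import "fmt"

// workloadIdentityMember builds the IAM member that stands for a
// Kubernetes Service Account in the project's workload identity pool.
func workloadIdentityMember(projectID, namespace, ksa string) string {
	return fmt.Sprintf("serviceAccount:%s.svc.id.goog[%s/%s]", projectID, namespace, ksa)
}

// gsaEmail builds the email address of a Google Service Account.
func gsaEmail(name, projectID string) string {
	return fmt.Sprintf("%s@%s.iam.gserviceaccount.com", name, projectID)
}

func main() {
	// The member is granted roles/iam.workloadIdentityUser on the Google SA.
	fmt.Println("member:", workloadIdentityMember("PROJECT_ID", "default", "gke-workload"))
	fmt.Println("target:", gsaEmail("gke-workload", "PROJECT_ID"))
}
```

Note that the namespace is part of the member, so a Kubernetes Service Account with the same name in another namespace is a different identity.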

Next, prepare a Kubernetes Service Account and two kinds of Pods: one that uses the Service Account and one that does not. (In GKE, the Pod uses the permissions of the Google Service Account named in the annotation of the Kubernetes Service Account.)

When the manifest is ready, let’s deploy it with kubectl apply -f FILENAME.

This time, I logged into each Pod to check the service account used via the gke-metadata-server.

pod-without-sa: The case for Pods without a Service Account.

pod-with-sa: The case for Pods with a Service Account.

For a Pod with a Service Account, the gke-metadata-server returns the email address of the Google Service Account specified in the annotation. That means the Pod can get an access token for that account through the gke-metadata-server.

On the other hand, for a Pod without a Service Account, the default account for the Workload Identity Pool is shown because no Google Service Account is set.

Commands that need Google Cloud authorization will fail when run from this Pod.
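The two cases can be told apart by the shape of the email the metadata server reports. Here is a small Go sketch of that check; identityKind is a hypothetical helper, and the suffixes are assumptions based on the identifiers that appear in this article.

```go
package main

import (
	"fmt"
	"strings"
)

// identityKind classifies the service account email reported by the
// metadata server for the current Pod.
func identityKind(email string) string {
	switch {
	case strings.HasSuffix(email, ".gserviceaccount.com"):
		// A real Google Service Account, e.g. the one named in the annotation.
		return "google service account"
	case strings.HasSuffix(email, ".svc.id.goog"):
		// Only the workload identity pool itself: no Google SA is linked.
		return "workload identity pool default"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(identityKind("gke-workload@PROJECT_ID.iam.gserviceaccount.com"))
	fmt.Println(identityKind("PROJECT_ID.svc.id.goog"))
}
```

A check like this can be handy in a startup probe, so that a Pod missing its annotation fails early instead of at the first API call.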

The System Flow for a Pod Getting an Access Token

Finally, let’s look at the system flow of how the Pod gets an access token.

GKE Workload Identity Flow

Here is a quick explanation of this system flow.

  1. MyPod gets credentials from the gke-metadata-server Pod using the client library.
  2. The gke-metadata-server Pod looks up the Kubernetes Service Account that MyPod is using and gets the iam.gke.io/gcp-service-account: gke-workload@PROJECT_ID.iam.gserviceaccount.com annotation.
  3. The gke-metadata-server Pod gets an OIDC-signed JWT from the OIDC Provider on the GKE control plane.
  4. The gke-metadata-server Pod passes the OIDC-signed JWT to IAM to get an access token for the workload identity, which is PROJECT_ID.svc.id.goog[default/gke-workload].
    * IAM queries the OIDC Provider on GKE to verify the JWT.
  5. The gke-metadata-server Pod gets the access token for the Google Service Account, gke-workload@PROJECT_ID.iam.gserviceaccount.com, by using the access token obtained in step 4.
    * IAM checks the binding between the workload identity and the Google SA.
    * The binding is granted with roles/iam.workloadIdentityUser.
  6. The gke-metadata-server Pod passes the access token retrieved in step 5 to MyPod.
  7. MyPod uses the access token to manipulate Google Cloud resources.
  Notes:
    * This flow is introduced in Keyless Entry: Securely Access GCP Services From Kubernetes (Cloud Next ’19).
    * Step 2 may come between step 4 and step 5 depending on the (undisclosed) implementation of the gke-metadata-server (what I wrote in step 2 is just my guess).
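The decision logic of steps 2 to 5 can be modeled in a few lines of Go. This is a toy, in-memory model and not the real implementation (which is undisclosed): the cluster type, its fields and the fake token string are all hypothetical, and the JWT exchange of steps 3 and 4 is reduced to building the workload identity string.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical stand-in for the Kubernetes API and IAM.
type cluster struct {
	projectID   string
	annotations map[string]string // "namespace/ksa" -> annotated Google SA email
	bindings    map[string]string // Google SA email -> workload identity allowed to use it
}

// workloadIdentity returns the identity that steps 3 and 4 would yield.
func (c cluster) workloadIdentity(namespace, ksa string) string {
	return fmt.Sprintf("%s.svc.id.goog[%s/%s]", c.projectID, namespace, ksa)
}

// accessTokenFor walks steps 2 to 5 and returns a fake token.
func (c cluster) accessTokenFor(namespace, ksa string) (string, error) {
	// Step 2: read the annotation on the Kubernetes Service Account.
	gsa, ok := c.annotations[namespace+"/"+ksa]
	if !ok {
		return "", errors.New("no Google SA annotation on " + namespace + "/" + ksa)
	}
	// Steps 3 and 4: the verified JWT is exchanged for the workload identity.
	identity := c.workloadIdentity(namespace, ksa)
	// Step 5: IAM checks that this identity is bound to the Google SA.
	if c.bindings[gsa] != identity {
		return "", errors.New(identity + " is not bound to " + gsa)
	}
	return "fake-access-token-for:" + gsa, nil
}

func main() {
	gsa := "gke-workload@PROJECT_ID.iam.gserviceaccount.com"
	c := cluster{
		projectID:   "PROJECT_ID",
		annotations: map[string]string{"default/gke-workload": gsa},
		bindings:    map[string]string{gsa: "PROJECT_ID.svc.id.goog[default/gke-workload]"},
	}
	fmt.Println(c.accessTokenFor("default", "gke-workload"))
}
```

The model makes one property of the flow easy to see: both the annotation on the Kubernetes side and the binding on the IAM side must agree, and either one missing is enough to deny the token.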

This is how the GKE Pod gets the access token via the gke-metadata-server. It’s complicated…

Afterword

In this article, we took a brief look at Workload Identity, which securely manages privileges in GKE. When I first used it, I was enticed by its convenience. But looking back, I realize I had no idea how it worked.

It’s not simple, but let’s try to understand it step by step!

References
