Securing Kubernetes on any platform can be a daunting task. In this post I will attempt to explain the main building blocks to securing Kubernetes on Azure. Although the concepts can be applied to any Kubernetes hosting platform, my focus will be on Azure as it is my cloud of choice for Kubernetes at the moment.

All the information in this post can be found in the official Azure Kubernetes Service documentation. But it took me a while to grasp when and why to use each feature, so I thought a big-picture guide to security on Azure Kubernetes would be a good idea.

Introduction: The Main Pillars for Kubernetes Security

(Figure: the main AKS security pillars)

Identity: Service Principals and Azure RBAC

A Kubernetes cluster on Azure must have its own identity, similar to a service account. You have two alternatives which lead to the same result: you can either use a Service Principal or a Managed Service Identity (which is basically a wrapper around a service principal).

I’m not going to go into the details of the difference between them or how to create them; you can find that in the links above. What I want to draw your attention to is what they are used for.

A service principal allows the cluster to access the Azure resources it needs to function, such as Azure Container Registry, networking, storage, and Azure Container Instances. Again, we’re not going to talk about how to do that; you can easily find that here.

What is important to understand however from a security perspective, is that this service principal will have access to other things outside of the cluster itself. You’ll have to keep track of your service principal, learn how to rotate its credentials, and learn how to delete it after it is no longer required.

When you create a new cluster using the az aks create command, the credentials for the service principal used by the cluster are saved in a file located at ~/.azure/aksServicePrincipal.json. It is important to keep track of that file and keep it secure. If a malicious actor gained access to that file, they could use it to access everything that service principal has access to, including the Container Registry, the cluster network, and cluster storage.
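If you want a quick way to check that this file is locked down, here is a minimal sketch. It uses a mock file under /tmp so it is safe to run anywhere; point it at ~/.azure/aksServicePrincipal.json to check the real credentials file.

```shell
# Use a mock copy so this sketch is safe to run; point FILE at
# ~/.azure/aksServicePrincipal.json to check the real credentials file.
FILE="/tmp/aksServicePrincipal.json"
printf '{"mock":"service principal credentials"}\n' > "$FILE"

# Restrict the file to owner read/write only.
chmod 600 "$FILE"

# Print the resulting permission bits (should be 600).
stat -c '%a' "$FILE" 2>/dev/null || stat -f '%Lp' "$FILE"
```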

In case the file is compromised and you want to create an entirely new service principal, or rotate the credentials of the existing one, you can use this guide to do so. The service principal credential (client secret) is valid for one year by default; this is a security measure by Azure to force you to rotate it at least once a year.

Also, it is important to remember that if a Service Principal is created for an AKS Cluster, it is not automatically deleted upon the deletion of the cluster itself. It is a separate object which you must delete separately, otherwise you risk having a lingering identity which has access to other resources which still may exist. To delete a service principal, use the az ad sp delete command. You can find various examples online on how to track down unused service principals.
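As a sketch, the cleanup could look like the following. The resource group and cluster names are placeholders, and the commands are written to a review script rather than executed directly, so this can be read (and run) without an Azure subscription.

```shell
# Placeholder names -- substitute your own resource group and cluster.
RG="myResourceGroup"
CLUSTER="myAKSCluster"

# Write the cleanup steps to a script instead of running them, so the
# sketch can be reviewed first; run the generated file when ready.
cat > /tmp/cleanup-sp.sh <<EOF
# Find the appId of the service principal the cluster was created with.
SP_ID=\$(az aks show --resource-group $RG --name $CLUSTER --query servicePrincipalProfile.clientId --output tsv)

# After deleting the cluster, delete the lingering service principal.
az ad sp delete --id "\$SP_ID"
EOF

cat /tmp/cleanup-sp.sh
```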

Identity: Azure AD and Kubernetes RBAC

When you create your AKS cluster, you’ll have the option to stick with vanilla cluster authentication and simply use vanilla Kubernetes RBAC to secure access to your cluster. This means your users will be provided separate usernames and passwords to authenticate to Kubernetes. To me this sounds like a very bad idea: users have to memorize a separate set of credentials just to use the cluster, and you end up with an additional authentication system to manage separately.

Alternatively, you can use Kubernetes RBAC with AD Integration. That means that the authentication process will be handled by Azure Active Directory, so your users can use their normal domain identities to login to the cluster. Then the authorization part will still be provided by Kubernetes using vanilla cluster role bindings and role bindings.

Needless to say, you’ll still have to take care of your role bindings in Kubernetes separately. But at least your users won’t have to worry about memorizing different sets of credentials, and you can take advantage of all the built-in security mechanisms in Azure AD, such as multi-factor authentication, tracking failed logins, and conditional access.

Here is how you can enable Azure AD Integration.

And then, here is how to use Azure AD Integration with Kubernetes RBAC.
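To make the authorization side concrete, here is a hedged example of a vanilla ClusterRoleBinding that grants cluster-admin to an Azure AD group. The group is referenced by its AAD object ID; the GUID below is a placeholder.

```yaml
# Hypothetical example: grant an Azure AD group cluster-admin rights
# through vanilla Kubernetes RBAC. The GUID is a placeholder for the
# AAD group's object ID.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aks-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "00000000-0000-0000-0000-000000000000" # AAD group object ID
```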

Platform: Monitoring & Auditing

Master Logs & Control Plane

A typical need from a security perspective is to log and audit all administrative user access and actions on the cluster. The challenge with doing this on AKS is that the control plane, including the API server, is actually managed for you by Azure. Therefore, you’ll need the help of Log Analytics to view the logs for the entire control plane.

Enabling and using this feature is covered in detail in this article.

This allows you to query these logs using Log Analytics and to put the proper audits and alerts in place as you see fit.
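Once the diagnostic logs land in your workspace, you can query them with Kusto. A hedged sketch, assuming the kube-audit category is being collected into the AzureDiagnostics table:

```
// Sketch of a Log Analytics (Kusto) query over the AKS control-plane
// diagnostic logs; assumes diagnostic settings send the "kube-audit"
// category to this workspace.
AzureDiagnostics
| where Category == "kube-audit"
| where log_s contains "delete"
| project TimeGenerated, log_s
| order by TimeGenerated desc
```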

In case you don’t want to use Log Analytics and want to use your own log management tool instead, you can either archive these logs to a storage account and have your tool pick them up from there, or stream them to an event hub and push them to your preferred tool from there. We’re not going to cover that here as it depends on your log management tool and can be pretty involved, but various guides can be found online.

Container Logs & Insights

Another aspect of monitoring and logging is the containers themselves. For that, you can use Container Insights for Azure Monitor.

You can of course use alternative tools for that as well, such as Prometheus or others. But going the Azure Monitor route gives you the advantage of using the same tool, Log Analytics, for managing these logs. You’re going to have to learn your queries; a complete list of data sources can be found here.

Platform: Azure Security Center

Having access to the logs does not guarantee that nothing will go unnoticed. You’ll still need something to make sense of these logs and recommend to you fixes for clear vulnerabilities or issues with your configuration.

Container Security in Azure Security Center provides vulnerability management for your images, hardening for the Docker hosts, hardening for the cluster, and runtime protection for your containers.

This guide will walk you through enabling these features in Azure Security Center. Once you do that, you’ll start getting recommendations about vulnerabilities and issues with your configuration.

Platform: Azure Container Registry Security

The most common vulnerabilities with containers are in the images themselves. Azure Container Registry gives you several controls to secure your images:

  • The ability to quarantine pushed images until they are scanned by a security assessment tool such as Aqua, Twistlock, or Qualys.
  • Content trust in Azure Container Registry, which allows you to verify image signatures when an image is pushed to the registry and pulled by the Docker client.
  • The ability to restrict access to ACR so that it is reachable only from VNets you allow.
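A sketch of what those controls look like with the Azure CLI. The registry, VNet, and subnet names are placeholders (the network rules assume a Premium SKU registry), and the commands are written to a review script rather than executed directly.

```shell
# Placeholder names -- substitute your own registry, VNet, and subnet.
# Network rules assume a Premium SKU registry.
ACR="myregistry"
VNET="my-vnet"
SUBNET="acr-subnet"

# Write the hardening commands to a script for review instead of
# running them directly; run the generated file when ready.
cat > /tmp/harden-acr.sh <<EOF
# Enable content trust so image signatures can be verified.
az acr config content-trust update --registry $ACR --status enabled

# Deny access by default, then allow only the subnet you trust.
az acr update --name $ACR --default-action Deny
az acr network-rule add --name $ACR --vnet-name $VNET --subnet $SUBNET
EOF

cat /tmp/harden-acr.sh
```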

Cluster: Upgrading AKS

Cluster Control Plane Upgrade

Kubernetes is not a perfect product. It is a work in progress, and vulnerabilities are discovered in it regularly.

Upgrading your cluster to the latest Kubernetes version available on AKS will give you the best chance at avoiding these vulnerabilities.

The process of upgrading the cluster control plane is very simple: use the az aks upgrade command, or upgrade the cluster from the portal using the Upgrade blade.

Node Pools Upgrade

Now that your control plane is upgraded, you can proceed with upgrading your node pools. This is similarly simple: use the az aks nodepool upgrade command or the portal to do so.
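Putting both steps together, a sketch with placeholder names. The target version is hypothetical; pick one that az aks get-upgrades actually offers. The commands are written to a review script rather than executed directly.

```shell
# Placeholder names and version -- substitute your own.
RG="myResourceGroup"
CLUSTER="myAKSCluster"
VERSION="1.18.4"   # hypothetical; pick a version get-upgrades offers

cat > /tmp/aks-upgrade.sh <<EOF
# See which Kubernetes versions the cluster can move to.
az aks get-upgrades --resource-group $RG --name $CLUSTER --output table

# Upgrade the control plane first...
az aks upgrade --resource-group $RG --name $CLUSTER --kubernetes-version $VERSION --control-plane-only

# ...then upgrade each node pool to match.
az aks nodepool upgrade --resource-group $RG --cluster-name $CLUSTER --name nodepool1 --kubernetes-version $VERSION
EOF

cat /tmp/aks-upgrade.sh
```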

Node Operating System Updates

AKS automatically applies operating system patches and updates to nodes. So you generally do not need to worry about that. However, some of these updates may require operating system or service restarts to take effect. AKS will not reboot the nodes on your behalf. So you’ll need some way to keep track of when these nodes need to be rebooted and do that yourself.

For that, you can use kured (KUbernetes REboot Daemon) to watch for Linux nodes that require a reboot, then automatically handle the rescheduling of running pods and node reboot process.

Cluster: Internal Load Balancers

By default, when you deploy the cluster it will have an External Load Balancer, which makes it easy to publish your Kubernetes services. However, you probably do not want to publish your services directly to public IP addresses through this load balancer. In most cases, you’ll want to put an application gateway or a web application firewall in front of your services. Or maybe you’ll want your service to be accessible only internally through the VNet, or from your on-premises network through a VPN.

For that, you’ll have to publish services using an Internal Load Balancer instead of an External one.

The way to do that is very simple: just add the following annotation to your Kubernetes service YAML as such:

  metadata:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"

This will publish the service on an internal load balancer and assign it a private VNet IP address instead of a public one. Now that your service has a private IP address, it can be accessed directly from any other caller on the same VNet, a peered VNet, or an on-premises network through your VPN Gateway. It can also be published securely to the outside world through an Azure Application Gateway/WAF or any third-party WAF you may choose.
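For reference, a minimal Service manifest with the annotation in place; the service name, port, and selector are hypothetical:

```yaml
# Minimal sketch of a Service published on an internal load balancer.
# The name, port, and selector are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: internal-app
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: internal-app
```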

The topic of Azure Application Gateway as a WAF for your Kubernetes services will be covered in a follow-up post, as it can be a little involved. But for now this piece of info from the docs will do.

Cluster: Private Clusters

Even when you publish your Kubernetes services and workloads securely, your AKS control plane and API endpoint will still be publicly accessible. This is because AKS is a managed service where the control plane is provided to you by Azure.

However, now you can create Private Clusters on Azure, whereby the control plane and api server will have an internal IP address.

Doing so gives you certain security advantages, but also multiple limitations when it comes to DevOps hosted agents, Dev Spaces, Azure Monitor live data, and other small nuances related to connectivity between some services and the control plane now that it is not publicly accessible.

Cluster: Access & Management

For accessing the API server securely, since it typically has a public IP, a good practice is to set authorized IP ranges. This limits API server connectivity to a preset set of IP ranges, which blocks unauthorized access attempts from anywhere you haven’t already approved.
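A sketch of setting authorized ranges. The names are placeholders, the IP ranges are documentation (TEST-NET) addresses you would replace with your own, and the command is written to a review script rather than executed directly.

```shell
# Placeholder names -- substitute your own resource group and cluster.
RG="myResourceGroup"
CLUSTER="myAKSCluster"

# Documentation (TEST-NET) ranges; replace with your office or VPN
# egress ranges.
RANGES="203.0.113.0/24,198.51.100.10/32"

cat > /tmp/authorized-ranges.sh <<EOF
az aks update --resource-group $RG --name $CLUSTER --api-server-authorized-ip-ranges $RANGES
EOF

cat /tmp/authorized-ranges.sh
```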

If you choose to go with the private cluster route, or if you want to gain access to the cluster nodes themselves (these always have a private IP address), you’ll need a secure way to do that.

The alternatives are:

  • A Jumpbox VM within the same VNet, or a Peered VNet (who wants to do that?!)
  • ExpressRoute or a VPN connection between your network and the VNet. This is fine for accessing the API server, but for accessing the nodes I find it limited: it only allows access from networks that already have the established connection, and it requires a lot of pre-planning, because the cluster network cannot have a CIDR that collides with any connected network.
  • Azure Bastion. This is my favorite method for managing any SSH host on Azure, including the cluster nodes. I will also be writing a dedicated post for this.

Cluster: Certificates

If you’ve ever done Kubernetes the hard way, then you know how impossibly difficult it can be to rotate Kubernetes certificates. Fortunately, you can easily rotate the certificates using a single command: az aks rotate-certs.

Cluster: Egress Traffic

When one of your pods attempts to access the outside world, it will do so through a randomly assigned egress IP address. This is absolutely fine for most cases, except when - for any security reason - you want to make sure this traffic egresses from a static public IP which your external services can identify and whitelist. You can configure this static IP address using this guide.

Cluster: Network Policies

In case you are not using a Service Mesh or some fancy CNI, you’re going to have to control traffic within your AKS cluster using vanilla Kubernetes Network Policies. This is no different than any other Kubernetes cluster. I just thought it should get an honorable mention as this is still an essential part of securing your cluster.
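As a starting point, a common pattern is a default-deny ingress policy per namespace, which you then punch holes in with more specific policies. A minimal example (the namespace name is hypothetical):

```yaml
# Default-deny all ingress traffic to every pod in the namespace.
# The namespace name is hypothetical; more specific policies can then
# allow the traffic you actually need.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```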

Pod: Pod Identity

Your pods will most certainly need to access external resources, and they will need to authenticate against them. You can use Pod Identities so that your pods can gain access to external Azure resources such as Storage, SQL, or Key Vault without the need for dedicated credentials. The process is quite involved, but the details are laid out in this GitHub repository. I think this is a crucial component of a secure application deployed on Azure Kubernetes.

Pod: Storage

Persistent storage in Azure Kubernetes Service can be provided in many ways, such as Azure Files, Azure Disks, Azure Blobs, NFS, and others.

It is important to encrypt and secure access to these storage types regardless of which you use. Azure Files and Azure Blobs are inherently encrypted at rest and in transit, but you still need to secure access to them using Azure RBAC.

As for Azure Disks, you’re going to have to apply Azure Disk Encryption to these disks to encrypt them at rest.

If you’re using NFS as persistent storage, then it is up to you to make sure your NFS storage provider is secure.

Pod: Secrets

Your pods will also need to be passed secrets, whether sensitive configuration values or credentials they must use to access external systems.

There are many good ways to do that in AKS; the two most popular are the Azure Key Vault Provider for Secrets Store CSI Driver and Vault by HashiCorp.

For Kubernetes version 1.15 and below, use Azure Key Vault FlexVolume instead of the Azure Key Vault Provider.

All these solutions serve the same function: to provide the secret to the pod in a convenient and secure manner. The main limitation with Azure Key Vault FlexVolume is that it provides secrets to the pod as files in a mounted volume path. This usually requires your pod to have logic in place to read the secret from those files. In contrast, Vault by HashiCorp can provide the secret as an environment variable, which is far more useful, as most container images are configured to receive configuration values via environment variables.

The ability to mount secrets to environment variables has been added to Azure Key Vault Provider. So if you’re using Kubernetes 1.16 or higher, you can use Azure Key Vault Provider and choose to mount your secrets as files in a mounted volume path or as environment variables.
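To make the CSI driver route concrete, here is a hedged sketch of a SecretProviderClass for the Azure provider. The vault name, secret name, and tenant ID are placeholders.

```yaml
# Hedged sketch of a SecretProviderClass for the Azure Key Vault
# provider of the Secrets Store CSI Driver. The vault name, secret
# name, and tenant ID are placeholders.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: "my-keyvault"            # placeholder vault name
    objects: |
      array:
        - |
          objectName: db-password          # placeholder secret name
          objectType: secret
    tenantId: "00000000-0000-0000-0000-000000000000" # placeholder
```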

Do not hardcode your secrets in your deployments or pods yaml definitions. Use any of the methods above instead.

Pod: Security Policy & Security Context

This is not specific to Azure. Basic Kubernetes security dictates that you use Pod Security Policies and Security Contexts.

A notable mention here is the use of AppArmor to restrict the capabilities of individual programs in your containers.

If your program does not need privilege escalation, make sure to set allowPrivilegeEscalation: false in the container security context in your YAML definition.
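A minimal sketch of such a security context in a pod definition. The pod name and image are hypothetical, and the extra hardening fields shown are common companions to allowPrivilegeEscalation: false.

```yaml
# Sketch of a hardened container security context. The pod name and
# image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
spec:
  containers:
  - name: app
    image: myregistry.azurecr.io/app:1.0
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      readOnlyRootFilesystem: true
```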

Please let me know if there is anything you think I’ve missed. As expected, this is an ever evolving process.