Create a new user node pool in AKS using Terraform

Introduction

In Azure Kubernetes Service (AKS), there are two types of node pools: system node pools and user node pools. These node pools serve different purposes and are used for distinct workloads.
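
For example, you can list the node pools of an existing cluster and check their mode (System or User) with the Azure CLI; the resource group and cluster names below are the same ones used later in this guide:

# List node pools and their mode for an existing AKS cluster
az aks nodepool list -g "rg-aks-dev" --cluster-name "aks-cluster1-dev" --query "[].{name:name, mode:mode}" -o table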

This guide will walk you through creating a new user node pool in Azure Kubernetes Service (AKS) using Terraform and implementing taints and tolerations.

Technical Scenario

In this scenario, we will create a new user node pool in an existing AKS cluster using Terraform. We'll apply a taint to this node pool. Then, we'll deploy an application with tolerations to ensure it runs on nodes with the corresponding taint.

Objective

In this exercise, we will accomplish and learn how to implement the following:

  • Step 1: Declare & Define Variables
  • Step 2: Create a new user node pool in AKS using Terraform
  • Step 3: Verify the new user node pool Taint
  • Step 4: Deploy an application with tolerations to run on nodes with the taint
  • Step 5: Verify your application status

Prerequisites:

  1. Terraform Installed: Ensure you have Terraform installed on your local machine.

  2. Azure CLI: Install the Azure Command-Line Interface (CLI) for authentication.

  3. Terraform Configuration: Make sure you have an existing Terraform configuration for your AKS cluster.
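
As a quick sanity check, you can confirm these tools (plus kubectl, which is used throughout this guide) are installed and available on your PATH:

# Confirm prerequisite tooling
terraform -version
az version
kubectl version --client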

Implementation Details

The following sections walk through each step of the implementation.

Login to Azure

Verify that you are logged in to the correct Azure subscription before starting anything in Visual Studio Code.

# Login to Azure
az login 

# Set Azure subscription
az account set -s "anji.keesari"
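
To double-check which subscription is active:

# Show the currently active subscription
az account show --query "{name:name, id:id}" -o table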

Connect to Cluster

Use the following command to connect to your AKS cluster.

# Azure Kubernetes Service Cluster User Role
az aks get-credentials -g "rg-aks-dev" -n "aks-cluster1-dev"

# Azure Kubernetes Service Cluster Admin Role
az aks get-credentials -g "rg-aks-dev" -n "aks-cluster1-dev" --admin

# get nodes
kubectl get nodes

# get namespaces
kubectl get namespaces
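
To confirm kubectl is pointing at the right cluster:

# Show the current kubectl context
kubectl config current-context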

Step 1: Declare & Define Variables

In your Terraform configuration file, declare variables for the AKS cluster details and the node pool configuration.

This table presents the variables along with their descriptions, data types, and default values:

| Variable Name | Description | Type | Default Value |
|---|---|---|---|
| user_node_pool_enabled | Should the User Node Pool be enabled? Defaults to false. | bool | false |
| user_node_pool_vm_size | Specifies the VM size of the user node pool | string | Standard_B4ms |
| user_node_pool_availability_zones | Specifies the availability zones of the user node pool | list(string) | ["1", "2", "3"] |
| user_node_pool_name | Specifies the name of the user node pool | string | agentpool |
| user_node_pool_subnet_name | Specifies the name of the subnet that hosts the user node pool | string | SystemSubnet |
| user_node_pool_subnet_address_prefix | Specifies the address prefix of the subnet that hosts the user node pool | list(string) | ["10.0.0.0/20"] |
| user_node_pool_enable_auto_scaling | Whether to enable the auto-scaler. Defaults to false. | bool | false |
| user_node_pool_enable_host_encryption | Should the nodes in this Node Pool have host encryption enabled? Defaults to false. | bool | false |
| user_node_pool_enable_node_public_ip | Should each node have a Public IP Address? Defaults to false. Changing this forces a new resource to be created. | bool | false |
| user_node_pool_max_pods | The maximum number of pods that can run on each agent. Changing this forces a new resource to be created. | number | 30 |
| user_node_pool_node_labels | A map of Kubernetes labels which should be applied to nodes in this Node Pool. Changing this forces a new resource to be created. | map(any) | {"kubernetes.azure.com/scalesetpriority" = "spot"} |
| user_node_pool_node_taints | A list of Kubernetes taints which should be applied to nodes in the agent pool (e.g. key=value:NoSchedule). Changing this forces a new resource to be created. | list(string) | ["kubernetes.azure.com/scalesetpriority=spot:NoSchedule"] |
| user_node_pool_os_disk_type | The type of disk which should be used for the Operating System. Possible values are Ephemeral and Managed. Defaults to Managed. Changing this forces a new resource to be created. | string | Managed |
| user_node_pool_os_type | The operating system used on each Node in this Node Pool. | string | Linux |
| user_node_pool_priority | The priority of the Virtual Machines in the Virtual Machine Scale Set backing this Node Pool. | string | Regular |
| user_node_pool_max_count | The maximum number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be greater than or equal to min_count. | number | 5 |
| user_node_pool_min_count | The minimum number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be less than or equal to max_count. | number | 1 |
| user_node_pool_node_count | The initial number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be in the range min_count - max_count. | number | 2 |
| user_node_pool_os_disk_size_gb | The size of the OS Disk on each Node in this Node Pool. | number | 128 |

Declare Variables

variables.tf
// ========================== Azure Kubernetes services (AKS)- User Node Pool ==========================

variable "user_node_pool_enabled" {
  description = "(Optional) Should the User Node Pool enabled? Defaults to false."
  type        = bool
  default     = false
}


variable "user_node_pool_vm_size" {
  description = "Specifies the vm size of the user node pool"
  default     = "Standard_B4ms"
  type        = string
}

variable "user_node_pool_availability_zones" {
  description = "Specifies the availability zones of the user node pool"
  default     = ["1", "2", "3"]
  type        = list(string)
}

variable "user_node_pool_name" {
  description = "Specifies the name of the user node pool"
  default     = "agentpool"
  type        = string
}

variable "user_node_pool_subnet_name" {
  description = "Specifies the name of the subnet that hosts the user node pool"
  default     = "SystemSubnet"
  type        = string
}

variable "user_node_pool_subnet_address_prefix" {
  description = "Specifies the address prefix of the subnet that hosts the user node pool"
  default     = ["10.0.0.0/20"]
  type        = list(string)
}

variable "user_node_pool_enable_auto_scaling" {
  description = "(Optional) Whether to enable auto-scaler. Defaults to false."
  type        = bool
  default     = false
}

variable "user_node_pool_enable_host_encryption" {
  description = "(Optional) Should the nodes in this Node Pool have host encryption enabled? Defaults to false."
  type        = bool
  default     = false
}

variable "user_node_pool_enable_node_public_ip" {
  description = "(Optional) Should each node have a Public IP Address? Defaults to false. Changing this forces a new resource to be created."
  type        = bool
  default     = false
}

variable "user_node_pool_max_pods" {
  description = "(Optional) The maximum number of pods that can run on each agent. Changing this forces a new resource to be created."
  type        = number
  default     = 30
}

variable "user_node_pool_node_labels" {
  description = "(Optional) A list of Kubernetes taints which should be applied to nodes in the agent pool (e.g key=value:NoSchedule). Changing this forces a new resource to be created."
  type        = map(any)
  default     = { "kubernetes.azure.com/scalesetpriority" = "spot" }
}

variable "user_node_pool_node_taints" {
  description = "(Optional) A map of Kubernetes labels which should be applied to nodes in this Node Pool. Changing this forces a new resource to be created."
  type        = list(string)
  default     = ["kubernetes.azure.com/scalesetpriority=spot:NoSchedule"]
}

variable "user_node_pool_os_disk_type" {
  description = "(Optional) The type of disk which should be used for the Operating System. Possible values are Ephemeral and Managed. Defaults to Managed. Changing this forces a new resource to be created."
  type        = string
  default     = "Managed"
}

variable "user_node_pool_os_type" {
  description = "(Optional) The type of Operating System. The operating system used on each Node in this Node Pool."
  type        = string
  default     = "Linux"
}

variable "user_node_pool_priority" {
  description = "(Optional) The priority of the Virtual Machines in the Virtual Machine Scale Set backing this Node Pool."
  type        = string
  default     = "Regular"
}


variable "user_node_pool_max_count" {
  description = "(Required) The maximum number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be greater than or equal to min_count."
  type        = number
  default     = 5
}

variable "user_node_pool_min_count" {
  description = "(Required) The minimum number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be less than or equal to max_count."
  type        = number
  default     = 1
}

variable "user_node_pool_node_count" {
  description = "(Optional) The initial number of nodes which should exist within this Node Pool. Valid values are between 0 and 1000 and must be a value in the range min_count - max_count."
  type        = number
  default     = 2
}

variable "user_node_pool_os_disk_size_gb" {
  description = "(Optional) The size of the OS Disk on each Node in this Node Pool."
  type        = number
  default     = 128
}

Define Variables

dev-variables.tfvars - update this existing file with the AKS values for the development environment.

dev-variables.tfvars
# user_node_pool
user_node_pool_enabled              = true
user_node_pool_enable_auto_scaling  = true
user_node_pool_max_count            = 5
user_node_pool_min_count            = 1
user_node_pool_max_pods             = 110
user_node_pool_node_count           = 1
user_node_pool_node_labels          = {"TenantName" = "tenant1"}
user_node_pool_node_taints          = ["TenantName=tenant1:NoSchedule"]
user_node_pool_name                 = "usernodepool" #"sysnodepool"
user_node_pool_os_disk_size_gb      = 128
user_node_pool_vm_size              = "Standard_B8ms"
user_node_pool_availability_zones   = ["1", "2", "3"]

Step 2: Create a new user node pool in AKS using Terraform

Define the new user node pool in your Terraform configuration. Ensure that you specify the desired taint on the node pool.

aks.tf
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  count                 = var.user_node_pool_enabled ? 1 : 0
  zones                 = var.user_node_pool_availability_zones
  enable_auto_scaling   = var.user_node_pool_enable_auto_scaling
  os_disk_type          = var.user_node_pool_os_disk_type
  os_type               = var.user_node_pool_os_type
  priority              = var.user_node_pool_priority
  os_disk_size_gb       = var.user_node_pool_os_disk_size_gb
  vm_size               = var.user_node_pool_vm_size
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  max_count             = var.user_node_pool_max_count
  min_count             = var.user_node_pool_min_count
  max_pods              = var.user_node_pool_max_pods
  node_count            = var.user_node_pool_node_count
  node_labels           = var.user_node_pool_node_labels
  node_taints           = var.user_node_pool_node_taints
  mode                  = "User"
  name                  = var.user_node_pool_name
  # orchestrator_version  = data.azurerm_kubernetes_service_versions.current.latest_version
  vnet_subnet_id        = azurerm_subnet.aks.id

  tags = merge(local.default_tags, var.aks_tags)
  lifecycle {
    ignore_changes = [
      tags,
    ]
  }
}

Terraform validate

terraform validate
# output
Success! The configuration is valid.

Run terraform plan and terraform apply again to create the new node pool.

Terraform plan

terraform plan -out=dev-plan -var-file="./environments/dev-variables.tfvars"

Terraform apply

terraform apply dev-plan


# output
azurerm_kubernetes_cluster_node_pool.user[0]: Creating...
.
.
.

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:
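
To confirm that Terraform is now tracking the new node pool, you can inspect the state:

# Confirm the node pool resource is in Terraform state
terraform state list | grep azurerm_kubernetes_cluster_node_pool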

Step 3: Verify the new user node pool Taint

After creating the new user node pool, it's essential to confirm that the taint has been successfully applied. You can use the Azure portal to verify this.

Node Pool Taint
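
You can also verify the taint from the command line; the pool and cluster names below match the values defined earlier:

# Show the taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Show the taints and labels configured on the node pool
az aks nodepool show -g "rg-aks-dev" --cluster-name "aks-cluster1-dev" -n "usernodepool" --query "{taints:nodeTaints, labels:nodeLabels}"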

Node List Status Before and After:

Before

Before creating the new user node pool, the list of nodes may look like this:

kubectl get nodes -o wide

# output
NAME                                   STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-25316841-vmss000000      Ready    agent   27h   v1.26.6   10.64.4.4     <none>        Ubuntu 22.04.3 LTS   5.15.0-1041-azure   containerd://1.7.1+azure-1
aks-agentpool-25316841-vmss000001      Ready    agent   26h   v1.26.6   10.64.4.113   <none>        Ubuntu 22.04.3 LTS   5.15.0-1041-azure   containerd://1.7.1+azure-1

After

After creating the new user node pool, the node list includes the new nodes and may look like this:

kubectl get nodes -o wide

# output
NAME                                   STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-25316841-vmss000000      Ready    agent   27h   v1.26.6   10.64.4.4     <none>        Ubuntu 22.04.3 LTS   5.15.0-1041-azure   containerd://1.7.1+azure-1
aks-agentpool-25316841-vmss000001      Ready    agent   26h   v1.26.6   10.64.4.113   <none>        Ubuntu 22.04.3 LTS   5.15.0-1041-azure   containerd://1.7.1+azure-1
aks-usernodepool-19872531-vmss000000   Ready    agent   21m   v1.26.6   10.64.4.222   <none>        Ubuntu 22.04.3 LTS   5.15.0-1041-azure   containerd://1.7.1+azure-1

Step 4: Deploy an Application with Tolerations

Now, let's deploy your application onto the AKS cluster, considering the taint we've applied to the user node pool. To achieve this, you need to create a Kubernetes manifest for your application with tolerations that match the taint on the node pool. Here's an example of a tolerations YAML manifest:

      tolerations:
        - key: "TenantName"
          operator: "Equal"
          value: "tenant1"
          effect: "NoSchedule"  # This pod tolerates the taint

Note that a toleration only allows pods to be scheduled onto tainted nodes; it does not force them there. To guarantee placement on the user node pool, you would typically also add a nodeSelector or node affinity that matches the pool's node label (for example, TenantName: tenant1).

Now, let's integrate this into your complete deployment.yaml file:

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aspnet-api
  namespace: sample
spec:
  replicas: 1
  selector: 
    matchLabels:
      app: aspnet-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5 
  template:
    metadata:
      labels:
        app: aspnet-api
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      serviceAccountName: default
      containers:
        - name: aspnet-api
          image: acr1dev.azurecr.io/sample/aspnet-api:20230323.15
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
      tolerations:
        - key: "TenantName"
          operator: "Equal"
          value: "tenant1"
          effect: "NoSchedule"  # This pod tolerates the taint
# kubectl apply -f deployment.yaml -n sample

Now that you have your deployment.yaml ready, proceed to the next steps.

Deploy the Application

To deploy your application to the AKS cluster, apply the Kubernetes manifest using the following command:

kubectl apply -f deployment.yaml
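
Once the deployment is created, you can confirm the toleration was applied to the pods; the label selector below matches the deployment manifest above:

# Inspect the tolerations on the deployed pods
kubectl describe pods -n sample -l app=aspnet-api | grep -A 3 Tolerations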

Step 5: Verify Your Application Status

To ensure your application has been successfully deployed, you can check the status of your application pods using kubectl. Here's how:

Before Applying the Tolerations:

kubectl get pods -o wide -A

# output
sample              aspnet-api-7d96f84c56-88dnz                         1/1     Running   0          3m31s   10.64.4.225   aks-agentpool-25316841-vmss000001   <none>           <none>
sample              aspnet-app-c5c847d44-tfhw6                          1/1     Running   0          7s      10.64.4.248   aks-agentpool-25316841-vmss000001   <none>           <none>

After Applying the Tolerations:

kubectl get pods -o wide -A

# output
sample              aspnet-api-7d96f84c56-88dnz                         1/1     Running   0          3m31s   10.64.4.225   aks-usernodepool-19872531-vmss000000   <none>           <none>
sample              aspnet-app-c5c847d44-tfhw6                          1/1     Running   0          7s      10.64.4.248   aks-usernodepool-19872531-vmss000000   <none>           <none>

You should now observe your application pods running on nodes with the corresponding taint, as indicated by the change in the node pool. This ensures that your pods are scheduled on the appropriate nodes, considering the taints and tolerations you've configured.
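
As a quick negative test, you can also run a throwaway pod without any tolerations (the pod name here is just an example) and confirm that it lands on the system node pool rather than the tainted user node pool:

# Run a pod with no tolerations; the taint keeps it off the user node pool
kubectl run no-toleration-test --image=nginx --restart=Never
kubectl get pod no-toleration-test -o wide

# Clean up the test pod
kubectl delete pod no-toleration-test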

Conclusion

You've successfully created a new user node pool in AKS using Terraform, applied a taint to it, and deployed an application with tolerations to ensure it runs on nodes with the taint.