Auto-installing the NVIDIA device plugin on GPU nodes only
Problem Statement
Before the GPUs in a Kubernetes cluster can be used, you need to deploy a DaemonSet for the NVIDIA device plugin. This DaemonSet runs a pod on each node that exposes the node's GPUs to Kubernetes as schedulable nvidia.com/gpu resources. For example, for an Azure AKS cluster, the official Azure documentation describes how to install the NVIDIA device plugin.
The YAML manifest is in the nvidia-device-plugin-ds.yaml file. Once this file is applied, the DaemonSet automatically runs a pod on each GPU node as soon as it is added, whether the node is scaled manually or by the cluster autoscaler. But it also runs the pod on nodes that have no GPUs at all, since that manifest doesn't constrain the pods to run only on GPU nodes. This wastes resources on the non-GPU nodes.
Solution
This is where "nodeSelector" and "affinity" can help. "nodeSelector" provides a very simple way to constrain pods to nodes with particular labels, while the "affinity/anti-affinity" feature greatly expands the types of constraints you can express.
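For the simple case of a single GPU node pool, a plain nodeSelector is enough. Here is a minimal sketch of a pod template fragment, assuming the pods should land only on nodes labeled agentpool=gpupool1 (the pool name is an assumption for illustration):

```yaml
# Pod template fragment (sketch): schedule only onto nodes carrying
# the label agentpool=gpupool1. nodeSelector can only AND exact label
# matches together, so it cannot express "gpupool1 OR gpupool2";
# for that, nodeAffinity with the In operator is needed.
spec:
  nodeSelector:
    agentpool: gpupool1
```

This limitation is why the manifest later in this article uses nodeAffinity instead of nodeSelector.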
Every node in an AKS node pool is automatically labeled with agentpool=<nodepool name>. You can use the command below to check the values of the agentpool label:
kubectl get nodes -L agentpool
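On a non-AKS cluster, or to bring an existing node into scope for testing, you can attach an equivalent label yourself. A sketch, where <node-name> is a placeholder for a real node name:

```shell
# Label a node so that selectors/affinity keyed on "agentpool" match it.
kubectl label nodes <node-name> agentpool=gpupool1

# Confirm the label is present.
kubectl get nodes -L agentpool
```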
We can leverage this label in a nodeAffinity rule, as shown below. Assuming the AKS cluster has two GPU node pools, gpupool1 and gpupool2, the DaemonSet will then run the NVIDIA device plugin only on nodes in gpupool1 and gpupool2, and not on nodes in any other node pool.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: gpu-resources
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure. This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above, marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - image: mcr.microsoft.com/oss/nvidia/k8s-device-plugin:1.11
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - gpupool1
                - gpupool2
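With the affinity block in place, the usual workflow is to re-apply the manifest and confirm where the DaemonSet pods actually land. A sketch, assuming the same file name and namespace as above:

```shell
kubectl apply -f nvidia-device-plugin-ds.yaml

# Each nvidia-device-plugin pod should be on a gpupool1 or gpupool2
# node; nodes in other pools should have no such pods at all.
kubectl get pods -n gpu-resources -o wide
```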
References
Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS)