# GPU Kubernetes Role

This document describes how to use the `gpu-k8s` role to deploy a simple Kubernetes cluster with NVIDIA GPU support.

## Overview

The role performs four main tasks:

1. **Create the Kubernetes cluster** using [sealos](https://github.com/labring/sealos). It runs the provided `sealos run` command to bootstrap the master and worker nodes.
2. **Install the NVIDIA drivers and the NVIDIA Container Toolkit** on the target hosts so that Kubernetes can access GPU resources.
3. **Verify the cluster state** after initialization by displaying the `sealos` version and the current Kubernetes nodes.
4. **Verify GPU access** by deploying the official NVIDIA device plugin and running a small CUDA workload.

The following command creates the cluster (example with one master and one worker):

```bash
sealos run \
  registry.cn-shanghai.aliyuncs.com/labring/kubernetes:v1.29.9 \
  registry.cn-shanghai.aliyuncs.com/labring/cilium:v1.13.4 \
  registry.cn-shanghai.aliyuncs.com/labring/helm:v3.9.4 \
  --masters 172.16.11.120 \
  --nodes 172.16.11.152 \
  --env '{}' \
  --cmd "kubeadm init --skip-phases=addon/kube-proxy"
```

The `--skip-phases=addon/kube-proxy` option stops kubeadm from installing kube-proxy, leaving Service routing to Cilium.

After the cluster is running, the role installs the NVIDIA device plugin and runs a test pod to ensure that `nvidia-smi` works inside the cluster.

## Usage

Add the role to your playbook along with the `ssh-trust` role, which configures passwordless SSH access from the ops host to the cluster nodes:

```yaml
- hosts: all
  roles:
    - ssh-trust
    - gpu-k8s
```

Example playbook snippet defining the IP lists:

```yaml
- hosts: all
  vars:
    master_ips:
      - "172.16.11.120"
    node_ips:
      - "172.16.11.152"
  roles:
    - ssh-trust
    - gpu-k8s
```

The playbook expects the `master_ips` and `node_ips` variables, both lists of IP addresses. Up to three masters can be specified.

Run the playbook with an inventory that contains the master and node IP addresses:
```bash
ansible-playbook -i inventory/hosts/all playbooks/demo_gpu_k8s.yml
```

The final step prints the output of `nvidia-smi` from inside a Kubernetes pod, confirming that the GPU is available to workloads.
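The `--masters` and `--nodes` flags of `sealos run` take comma-separated IP addresses, so the `master_ips` and `node_ips` lists have to end up in that form. How the role does this internally is not shown in this document; the following is a minimal Bash sketch, assuming the lists have already been joined with commas (in Ansible, for example, with `{{ master_ips | join(',') }}`). The function name `build_sealos_flags` is hypothetical. It also rejects more than three masters, matching the limit described under Usage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map comma-joined IP lists to `sealos run` host flags.
# Arguments: $1 = masters, e.g. "10.0.0.1,10.0.0.2,10.0.0.3"
#            $2 = worker nodes, e.g. "10.0.0.4" (may be empty)
build_sealos_flags() {
  local masters_csv="$1" nodes_csv="$2"
  local -a masters

  # Split the master list on commas to count the entries.
  IFS=',' read -r -a masters <<< "$masters_csv"

  # Enforce the 1-3 master limit from the Usage section.
  if (( ${#masters[@]} < 1 || ${#masters[@]} > 3 )); then
    echo "error: expected 1-3 masters, got ${#masters[@]}" >&2
    return 1
  fi

  # Omit --nodes entirely when no worker nodes are given.
  if [[ -n "$nodes_csv" ]]; then
    printf '%s\n' "--masters ${masters_csv} --nodes ${nodes_csv}"
  else
    printf '%s\n' "--masters ${masters_csv}"
  fi
}
```

With the IP addresses from the example above, `build_sealos_flags "172.16.11.120" "172.16.11.152"` prints `--masters 172.16.11.120 --nodes 172.16.11.152`, which matches the flags in the `sealos run` command shown in the Overview.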
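To repeat the GPU check by hand, for example after a driver update, a Bash sketch along the following lines can be used. It assumes `kubectl` is configured for the cluster and that the NVIDIA device plugin deployed by the role is already running. The function names, the pod name `gpu-smoke-test`, and the default CUDA image tag are assumptions for illustration, not what the role itself uses; pick an image whose CUDA version is supported by the installed driver.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a manual GPU smoke test; names are illustrative.

# Print a Pod manifest that requests one GPU and runs nvidia-smi once.
render_gpu_test_pod() {
  # Assumed default image tag; override it to match your driver version.
  local image="${1:-nvidia/cuda:12.2.0-base-ubuntu22.04}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Show how many GPUs each node advertises through the device plugin.
show_gpu_capacity() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Create the pod, wait for it to finish, print nvidia-smi output, clean up.
run_gpu_smoke_test() {
  render_gpu_test_pod "${1:-}" | kubectl apply -f - || return 1
  # Times out if the pod never reaches Succeeded (e.g. it failed or stayed Pending).
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    pod/gpu-smoke-test --timeout=300s
  local status=$?
  # Print logs even on failure, since they help diagnose driver problems.
  kubectl logs pod/gpu-smoke-test
  kubectl delete pod gpu-smoke-test --ignore-not-found
  return "$status"
}
```

After sourcing these functions, `show_gpu_capacity` should list a non-empty `GPU` value for each GPU node, and `run_gpu_smoke_test` should print an `nvidia-smi` table similar to the one the playbook shows in its final step.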