docs: cuda guide: include files directly and use local references

2021-07-05 16:23:02 +02:00
commit a2305bd87a
parent bf3f630d1b
3 changed files with 35 additions and 257 deletions
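
Note on the change: the guide now pulls each file in with an `{% include "..." %}` directive instead of carrying a pasted copy. As a rough mental model only, the expansion can be sketched as below. This is not the implementation of `mkdocs-include-markdown-plugin`; the helper name `expand_includes`, the regular expression, and the choice to resolve paths against a `base_dir` are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Matches directives of the form {% include "relative/path" %}.
# Hypothetical pattern for illustration; the real plugin accepts more options.
INCLUDE_RE = re.compile(r'\{%\s*include\s+"([^"]+)"\s*%\}')


def expand_includes(markdown: str, base_dir: Path) -> str:
    """Replace each include directive with the text of the referenced file.

    Paths are resolved relative to base_dir (an assumption of this sketch).
    A missing file raises FileNotFoundError.
    """
    def replace(match):
        # The function's return value is inserted literally by re.sub.
        return (base_dir / match.group(1)).read_text(encoding="utf-8")

    return INCLUDE_RE.sub(replace, markdown)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "cuda").mkdir()
        # Stand-in file content, not the real build.sh from the repository.
        (base / "cuda" / "build.sh").write_text(
            "#!/bin/bash\necho build\n", encoding="utf-8"
        )
        page = 'Build script:\n{% include "cuda/build.sh" %}\n'
        print(expand_includes(page, base))
```

The practical effect is that the files under `docs/usage/guides/cuda/` stay the single source of truth, so the rendered guide cannot drift from them.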
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -3,4 +3,5 @@ mkdocs-material
 pymdown-extensions
 mkdocs-git-revision-date-localized-plugin
 mkdocs-awesome-pages-plugin
-mdx_truly_sane_lists
+mdx_truly_sane_lists
+mkdocs-include-markdown-plugin # https://github.com/mondeja/mkdocs-include-markdown-plugin
--- a/docs/usage/guides/cuda.md
+++ b/docs/usage/guides/cuda.md
@@ -1,135 +1,32 @@
 # Running CUDA workloads

-If you want to run CUDA workloads on the K3S container you need to customize the container.  
+If you want to run CUDA workloads on the K3s container you need to customize the container.  
 CUDA workloads require the NVIDIA Container Runtime, so containerd needs to be configured to use this runtime.  
-The K3S container itself also needs to run with this runtime.  
+The K3s container itself also needs to run with this runtime.  
 If you are using Docker you can install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

-## Building a customized K3S image
+## Building a customized K3s image

-To get the NVIDIA container runtime in the K3S image you need to build your own K3S image.  
-The native K3S image is based on Alpine but the NVIDIA container runtime is not supported on Alpine yet.  
+To get the NVIDIA container runtime in the K3s image you need to build your own K3s image.  
+The native K3s image is based on Alpine but the NVIDIA container runtime is not supported on Alpine yet.  
 To get around this we need to build the image with a supported base image.

-### Dockerfiles:  
+### Dockerfiles
  
-Dockerfile.base:
+[Dockerfile.base](cuda/Dockerfile.base):
+
 ```Dockerfile
-FROM nvidia/cuda:11.2.0-base-ubuntu18.04
-
-ENV DEBIAN_FRONTEND noninteractive
-
-ARG DOCKER_VERSION
-ENV DOCKER_VERSION=$DOCKER_VERSION
-
-RUN set -x && \
-    apt-get update && \
-    apt-get install -y \
-    apt-transport-https \
-    ca-certificates \
-    curl \
-    wget \
-    tar \
-    zstd \
-    gnupg \
-    lsb-release \
-    git \
-    software-properties-common \
-    build-essential && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN set -x && \
-    curl -fsSL https://download.docker.com/linux/$(lsb_release -is | tr '[:upper:]' '[:lower:]')/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg && \
-    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null && \
-    apt-get update && \
-    apt-get install -y \
-    containerd.io \
-    docker-ce=5:$DOCKER_VERSION~3-0~$(lsb_release -is | tr '[:upper:]' '[:lower:]')-$(lsb_release -cs) \
-    docker-ce-cli=5:$DOCKER_VERSION~3-0~$(lsb_release -is | tr '[:upper:]' '[:lower:]')-$(lsb_release -cs) && \
-    rm -rf /var/lib/apt/lists/*
+{% include "cuda/Dockerfile.base" %}

 ```  
  
-  
-  
-Dockerfile.k3d-gpu:  
+[Dockerfile.k3d-gpu](cuda/Dockerfile.k3d-gpu):  

 ```Dockerfile
-FROM nvidia/cuda:11.2.0-base-ubuntu18.04 as base
-
-RUN set -x && \
-    apt-get update && \
-    apt-get install -y ca-certificates zstd
-
-COPY k3s/build/out/data.tar.zst /
-
-RUN set -x && \
-    mkdir -p /image/etc/ssl/certs /image/run /image/var/run /image/tmp /image/lib/modules /image/lib/firmware && \
-    tar -I zstd -xf /data.tar.zst -C /image && \
-    cp /etc/ssl/certs/ca-certificates.crt /image/etc/ssl/certs/ca-certificates.crt
-
-RUN set -x && \
-    cd image/bin && \
-    rm -f k3s && \
-    ln -s k3s-server k3s
-
-FROM nvidia/cuda:11.2.0-base-ubuntu18.04
-
-ARG NVIDIA_CONTAINER_RUNTIME_VERSION
-ENV NVIDIA_CONTAINER_RUNTIME_VERSION=$NVIDIA_CONTAINER_RUNTIME_VERSION
-
-RUN set -x && \
-    echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
-
-RUN set -x && \
-    apt-get update && \
-    apt-get -y install gnupg2 curl
-
-# Install NVIDIA Container Runtime
-RUN set -x && \
-    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | apt-key add -
-
-RUN set -x && \
-    curl -s -L https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/nvidia-container-runtime.list | tee /etc/apt/sources.list.d/nvidia-container-runtime.list
-
-RUN set -x && \
-    apt-get update && \
-    apt-get -y install nvidia-container-runtime=${NVIDIA_CONTAINER_RUNTIME_VERSION}
-
-
-COPY --from=base /image /
-
-RUN set -x && \
-    mkdir -p /etc && \
-    echo 'hosts: files dns' > /etc/nsswitch.conf
-
-RUN set -x && \
-    chmod 1777 /tmp
-
-# Provide custom containerd configuration to configure the nvidia-container-runtime
-RUN set -x && \
-    mkdir -p /var/lib/rancher/k3s/agent/etc/containerd/
-
-COPY config.toml.tmpl /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
-
-# Deploy the nvidia driver plugin on startup
-RUN set -x && \
-    mkdir -p /var/lib/rancher/k3s/server/manifests
-
-COPY gpu.yaml /var/lib/rancher/k3s/server/manifests/gpu.yaml
-
-VOLUME /var/lib/kubelet
-VOLUME /var/lib/rancher/k3s
-VOLUME /var/lib/cni
-VOLUME /var/log
-
-ENV PATH="$PATH:/bin/aux"
-
-ENTRYPOINT ["/bin/k3s"]
-CMD ["agent"]
+{% include "cuda/Dockerfile.k3d-gpu" %}
 ```

-These Dockerfiles [Dockerfile.base](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/Dockerfile.base) + [Dockerfile.k3d-gpu](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/Dockerfile.k3d-gpu) are based on the [K3s Dockerfile](https://github.com/rancher/k3s/blob/master/package/Dockerfile)
+These Dockerfiles are based on the [K3s Dockerfile](https://github.com/rancher/k3s/blob/master/package/Dockerfile)
 The following changes are applied:

 1. Change the base images to nvidia/cuda:11.2.0-base-ubuntu18.04 so the NVIDIA Container Runtime can be installed. The version of `cuda:xx.x.x` must match the one you're planning to use.
@@ -141,61 +38,7 @@ The following changes are applied:
 We need to configure containerd to use the NVIDIA Container Runtime. We need to customize the config.toml that is used at startup. K3s provides a way to do this using a [config.toml.tmpl](cuda/config.toml.tmpl) file. More information can be found on the [K3s site](https://rancher.com/docs/k3s/latest/en/advanced/#configuring-containerd).

 ```go
-[plugins.opt]
-  path = "{{ .NodeConfig.Containerd.Opt }}"
-
-[plugins.cri]
-  stream_server_address = "127.0.0.1"
-  stream_server_port = "10010"
-
-{{- if .IsRunningInUserNS }}
-  disable_cgroup = true
-  disable_apparmor = true
-  restrict_oom_score_adj = true
-{{end}}
-
-{{- if .NodeConfig.AgentConfig.PauseImage }}
-  sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
-{{end}}
-
-{{- if not .NodeConfig.NoFlannel }}
-[plugins.cri.cni]
-  bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
-  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
-{{end}}
-
-[plugins.cri.containerd.runtimes.runc]
-  # ---- changed from 'io.containerd.runc.v2' for GPU support
-  runtime_type = "io.containerd.runtime.v1.linux"
-
-# ---- added for GPU support
-[plugins.linux]
-  runtime = "nvidia-container-runtime"
-
-{{ if .PrivateRegistryConfig }}
-{{ if .PrivateRegistryConfig.Mirrors }}
-[plugins.cri.registry.mirrors]{{end}}
-{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
-[plugins.cri.registry.mirrors."{{$k}}"]
-  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
-{{end}}
-
-{{range $k, $v := .PrivateRegistryConfig.Configs }}
-{{ if $v.Auth }}
-[plugins.cri.registry.configs."{{$k}}".auth]
-  {{ if $v.Auth.Username }}username = "{{ $v.Auth.Username }}"{{end}}
-  {{ if $v.Auth.Password }}password = "{{ $v.Auth.Password }}"{{end}}
-  {{ if $v.Auth.Auth }}auth = "{{ $v.Auth.Auth }}"{{end}}
-  {{ if $v.Auth.IdentityToken }}identitytoken = "{{ $v.Auth.IdentityToken }}"{{end}}
-{{end}}
-{{ if $v.TLS }}
-[plugins.cri.registry.configs."{{$k}}".tls]
-  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
-  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
-  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
-{{end}}
-{{end}}
-{{end}}
+{% include "cuda/config.toml.tmpl" %}
 ```

 ### The NVIDIA device plugin
@@ -207,102 +50,34 @@ To enable NVIDIA GPU support on Kubernetes you also need to install the [NVIDIA
 * Run GPU enabled containers in your Kubernetes cluster.

 ```yaml
-apiVersion: apps/v1
-kind: DaemonSet
-metadata:
-  name: nvidia-device-plugin-daemonset
-  namespace: kube-system
-spec:
-  selector:
-    matchLabels:
-      name: nvidia-device-plugin-ds
-  template:
-    metadata:
-      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
-      # reserves resources for critical add-on pods so that they can be rescheduled after
-      # a failure.  This annotation works in tandem with the toleration below.
-      annotations:
-        scheduler.alpha.kubernetes.io/critical-pod: ""
-      labels:
-        name: nvidia-device-plugin-ds
-    spec:
-      tolerations:
-      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
-      # This, along with the annotation above marks this pod as a critical add-on.
-      - key: CriticalAddonsOnly
-        operator: Exists
-      containers:
-      - env:
-        - name: DP_DISABLE_HEALTHCHECKS
-          value: xids
-        image: nvidia/k8s-device-plugin:1.11
-        name: nvidia-device-plugin-ctr
-        securityContext:
-          allowPrivilegeEscalation: true
-          capabilities:
-            drop: ["ALL"]
-        volumeMounts:
-          - name: device-plugin
-            mountPath: /var/lib/kubelet/device-plugins
-      volumes:
-        - name: device-plugin
-          hostPath:
-            path: /var/lib/kubelet/device-plugins
+{% include "cuda/gpu.yaml" %}
 ```

-### Build the K3S image
+### Build the K3s image

-To build the custom image we need to build K3S because we need the generated output.
+To build the custom image we need to build K3s because we need the generated output.

 Put the following files in a directory:
-* [Dockerfile.base](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/Dockerfile.base)
-* [Dockerfile.k3d-gpu](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/Dockerfile.k3d-gpu)
+
+* [Dockerfile.base](cuda/Dockerfile.base)
+* [Dockerfile.k3d-gpu](cuda/Dockerfile.k3d-gpu)
 * [config.toml.tmpl](cuda/config.toml.tmpl)
-* [gpu.yaml](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/gpu.yaml)
-* [build.sh](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/build.sh)
-* [cuda-vector-add.yaml](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/cuda-vector-add.yaml)
+* [gpu.yaml](cuda/gpu.yaml)
+* [build.sh](cuda/build.sh)
+* [cuda-vector-add.yaml](cuda/cuda-vector-add.yaml)

 The `build.sh` script is configured using exports & defaults to `v1.21.2+k3s1`. Please set your CI_REGISTRY_IMAGE! The script performs the following steps:

-* pulls K3S
-* builds K3S
+* pulls K3s
+* builds K3s
 * build the custom K3D Docker image

 The resulting image is tagged as k3s-gpu:&lt;version tag&gt;. The version tag is the git tag but the '+' sign is replaced with a '-'.

-[build.sh](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/build.sh):
+[build.sh](cuda/build.sh):

 ```bash
-#!/bin/bash
-
-export CI_REGISTRY_IMAGE="YOUR_REGISTRY_IMAGE_URL"
-export VERSION="1.0"
-export K3S_TAG="v1.21.2+k3s1"
-export DOCKER_VERSION="20.10.7"
-export IMAGE_TAG="v1.21.2-k3s1"
-export NVIDIA_CONTAINER_RUNTIME_VERSION="3.5.0-1"
-
-docker build -f Dockerfile.base --build-arg DOCKER_VERSION=$DOCKER_VERSION -t $CI_REGISTRY_IMAGE/base:$VERSION . && \
-docker push $CI_REGISTRY_IMAGE/base:$VERSION
-
-rm -rf ./k3s && \
-git clone --depth 1 https://github.com/rancher/k3s.git -b "$K3S_TAG" && \
-docker run -ti -v ${PWD}/k3s:/k3s -v /var/run/docker.sock:/var/run/docker.sock $CI_REGISTRY_IMAGE/base:1.0 sh -c "cd /k3s && make" && \
-ls -al k3s/build/out/data.tar.zst
-
-if [ -f k3s/build/out/data.tar.zst ]; then
-  echo "File exists! Building!"
-  docker build -f Dockerfile.k3d-gpu \
-    --build-arg NVIDIA_CONTAINER_RUNTIME_VERSION=$NVIDIA_CONTAINER_RUNTIME_VERSION \
-    -t $CI_REGISTRY_IMAGE:$IMAGE_TAG . && \
-  docker push $CI_REGISTRY_IMAGE:$IMAGE_TAG
-  echo "Done!"
-else
-  echo "Error, file does not exist!"
-  exit 1
-fi
-
-docker build -t $CI_REGISTRY_IMAGE:$IMAGE_TAG .
+{% include "cuda/build.sh" %}
 ```

 ## Run and test the custom image with Docker
@@ -313,7 +88,7 @@ You can run a container based on the new image with Docker:
 docker run --name k3s-gpu -d --privileged --gpus all $CI_REGISTRY_IMAGE:$IMAGE_TAG
 ```

-Deploy a [test pod](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/cuda-vector-add.yaml):
+Deploy a [test pod](cuda/cuda-vector-add.yaml):

 ```bash
 docker cp cuda-vector-add.yaml k3s-gpu:/cuda-vector-add.yaml
@@ -329,7 +104,7 @@ Tou can use the image with k3d:
 k3d cluster create local --image=$CI_REGISTRY_IMAGE:$IMAGE_TAG --gpus=1
 ```

-Deploy a [test pod](https://github.com/vainkop/k3d/blob/main/docs/usage/guides/cuda/cuda-vector-add.yaml):
+Deploy a [test pod](cuda/cuda-vector-add.yaml):

 ```bash
 kubectl apply -f cuda-vector-add.yaml
@@ -346,10 +121,11 @@ Most of the information in this article was obtained from various sources:

 * [Add NVIDIA GPU support to k3s with containerd](https://dev.to/mweibel/add-nvidia-gpu-support-to-k3s-with-containerd-4j17)
 * [microk8s](https://github.com/ubuntu/microk8s)
-* [K3S](https://github.com/rancher/k3s)
+* [K3s](https://github.com/rancher/k3s)
 * [k3s-gpu](https://gitlab.com/vainkop1/k3s-gpu)

 ## Authors

- [@markrexwinkel](https://github.com/markrexwinkel)
- [@vainkop](https://github.com/vainkop)
+* [@markrexwinkel](https://github.com/markrexwinkel)
+* [@vainkop](https://github.com/vainkop)
+* [@iwilltry42](https://github.com/iwilltry42)
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -70,6 +70,7 @@ plugins:
  - git-revision-date-localized: # https://squidfunk.github.io/mkdocs-material/plugins/revision-date/
      type: date
  - awesome-pages # https://squidfunk.github.io/mkdocs-material/plugins/awesome-pages/
+  - include-markdown # https://github.com/mondeja/mkdocs-include-markdown-plugin

 # Other Settings
 strict: true # halt processing when a warning is raised