Commit Graph

18 Commits

Author SHA1 Message Date
Noel Georgi
37f2297e6b
feat: support lts and production nvidia modules
Support LTS and production versions of NVIDIA kernel modules as per https://docs.nvidia.com/datacenter/tesla/drivers/index.html#lifecycle

Part of: https://github.com/siderolabs/talos/issues/9086

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-06 22:31:38 +05:30
Noel Georgi
d6773dd25a
chore: bump deps
Bump dependencies

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-05 16:07:09 +05:30
Noel Georgi
5334e89374
fix: glibc search paths for nvidia
Set `glibc/lib` as the first `rpath` for `nvidia-container-cli`. Also
install nvidia libraries to `/usr/local/glibc/lib` so that any musl
libraries live separately.

`nvidia-container-cli` explicitly sets an `RPATH` of `$ORIGIN/../$LIB` here:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/blob/v1.14.6/Makefile?ref_type=tags#L183,
which means `/usr/local/lib` is searched first. Since `zfs` and nvidia
each ship their own `libtirpc`, `nvidia-container-cli` first tries to
use the `libtirpc` shipped with `zfs` at `/usr/local/lib` instead of
the one at `/usr/local/glibc/lib`. Fix this by setting an additional
`RPATH` of `$ORIGIN/../glibc/$LIB`, so that libraries in
`/usr/local/glibc/lib` take precedence.

```bash
❯ scanelf -r _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
 TYPE   RPATH FILE
ET_DYN $ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
```
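
The lookup order described above can be sketched with a small shell helper. This is only an illustration, not part of the change: `resolve_rpath` is a hypothetical function, the `libtirpc.so.3` file name and the scratch directory layout are assumptions, and the sketch models only the first-match walk over `RPATH` entries, not the full dynamic loader search.

```shell
# Hypothetical helper: print the first path in a colon-separated RPATH that
# contains the requested library, after expanding $ORIGIN and $LIB.
# Arguments: <rpath> <origin dir> <value for $LIB> <library file name>
resolve_rpath() {
  rpath=$1 origin=$2 lib=$3 soname=$4
  old_ifs=$IFS
  IFS=:
  for entry in $rpath; do
    # Substitute the loader tokens with the values passed in.
    dir=$(printf '%s\n' "$entry" |
      sed -e 's|[$]ORIGIN|'"$origin"'|g' -e 's|[$]LIB|'"$lib"'|g')
    if [ -e "$dir/$soname" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir/$soname"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Assumed layout: both the musl (zfs) and glibc (nvidia) trees ship libtirpc.
root=$(mktemp -d)
mkdir -p "$root/usr/local/bin" "$root/usr/local/lib" "$root/usr/local/glibc/lib"
touch "$root/usr/local/lib/libtirpc.so.3" "$root/usr/local/glibc/lib/libtirpc.so.3"

# With the additional entry listed first, the glibc copy is found first.
resolve_rpath '$ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB' \
  "$root/usr/local/bin" lib libtirpc.so.3
```

Passing only `'$ORIGIN/../$LIB'` as the first argument reproduces the earlier behaviour, where the copy under `usr/local/lib` is picked up.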

Properly fixes: #380

Fixes from #401 and #410 were not complete.

Manually tested by spinning up an NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-24 14:15:15 +05:30
Noel Georgi
3526f4507a
fix: zfs extensions with nvidia
Introduce a proper fix for #401: keep the musl paths as is, and use
`/usr/local/glibc` as the install path for all glibc-related files so that
any new common libraries will not cause an issue in the future.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-12 16:00:09 +08:00
Noel Georgi
4ed9ee5849
fix: zfs-tools libtirpc path
Use a custom path for libtirpc shipped with zfs-tools so that it doesn't
conflict with libtirpc built for nvidia-container-toolkit (as it's
linked against glibc).

Fixes: #380

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-07 21:58:38 +08:00
Noel Georgi
eb79cf81c2
chore: bump dependencies
Bump dependencies and bring in stable pkgs.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-02 19:17:14 +05:30
Andrey Smirnov
ba40f6e508
feat: update Go to 1.22.1, update releases
```
| Package | Update | Change |
|---|---|---|
| git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git | minor | `4.2` -> `4.3` |
| git://sourceware.org/git/elfutils.git | minor | `0.190` -> `0.191` |
| [https://github.com/qemu/qemu.git](https://togithub.com/qemu/qemu) | patch | `8.2.1` -> `8.2.2` |
| [tailscale/tailscale](https://togithub.com/tailscale/tailscale) | patch | `1.60.0` -> `1.60.1` |
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-07 16:02:21 +04:00
Noel Georgi
9cdf805a5d
chore: bump dependencies
Bump dependencies.

Use [go1.20 for building nvidia stuff](https://github.com/NVIDIA/nvidia-container-toolkit/issues/372).

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-21 23:47:01 +05:30
Andrey Smirnov
9105eef354
feat: bump dependencies
```
| Package | Update | Change |
|---|---|---|
| git://sourceware.org/git/elfutils.git | minor | `0.189` -> `0.190` |
| [https://github.com/qemu/qemu.git](https://togithub.com/qemu/qemu) | patch | `8.1.2` -> `8.1.3` |
| [nvidia/open-gpu-kernel-modules](https://togithub.com/nvidia/open-gpu-kernel-modules) | minor | `535.54.03` -> `535.129.03` |
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-28 22:27:45 +04:00
Noel Georgi
7c68b1b932
chore: use kres to manage project
Move to using kres to manage project.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-03 19:00:52 +05:30
Noel Georgi
a5c0b0086b
chore: revert nvidia bumps from #220
Revert nvidia bumps from #220. The extensions-test fails and there's not
much debug info available for now.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 19:18:06 +05:30
Noel Georgi
d9145f9b6b
chore: bump deps
| Package | Update | Change |
|---|---|---|
| [https://github.com/qemu/qemu.git](https://togithub.com/qemu/qemu) | minor | `8.0.2` -> `v8.1.0` |
| [https://gitlab.com/nvidia/container-toolkit/container-toolkit.git](https://gitlab.com/nvidia/container-toolkit/container-toolkit) | minor | `v1.13.5` -> `v1.14.0` |
| [https://gitlab.com/nvidia/container-toolkit/libnvidia-container.git](https://gitlab.com/nvidia/container-toolkit/libnvidia-container) | minor | `v1.13.5` -> `v1.14.0` |
| [https://gitlab.gnome.org/GNOME/glib.git](https://gitlab.gnome.org/GNOME/glib) | minor | `2.76.3` -> `2.77.3` |
| [siderolabs/bldr](https://togithub.com/siderolabs/bldr) | patch | `v0.2.0` -> `v0.2.1` |
| [tailscale/tailscale](https://togithub.com/tailscale/tailscale) | minor | `1.46.1` -> `1.48.1` |

Also fix the wolfi-base variable to get renovate updates.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-04 23:37:48 +05:30
Noel Georgi
d4d42e52d9
feat: use wolfi as base for nvidia
Use wolfi base as toolchain for NVIDIA build.
This removes a lot of hacks and patches we maintain.

Fixes: #171
Fixes: https://github.com/siderolabs/pkgs/issues/720

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-02 21:36:00 +05:30
Noel Georgi
130ebd5798
chore: bump deps
Bump dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 23:34:00 +05:30
Noel Georgi
8cb8014ce2
chore: bump deps
Bump dependencies and reduce renovate noise

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-17 16:08:17 +05:30
Noel Georgi
b4edb73cd4
chore: bump deps
Bump deps

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-22 12:03:40 +05:30
Noel Georgi
eac3211468
feat: enable renovate bot
Enable renovate bot for dependency updates.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-30 02:11:40 +05:30
Noel Georgi
e77f3477ee
feat: publish nvidia modules and toolkit
Publish the OSS Nvidia kernel modules built against a release version of
Talos and also the nvidia toolkit required for running GPU workloads on
Kubernetes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-18 22:44:10 +05:30