Since we no longer need to run emerge in the developer container, we can as
well just treat the developer container more like a container image and use an
ephemeral overlay.
Currently the setup-nvidia script fails when re-executed. It should work in
cases when the driver is already built and just needs to be loaded, or when it
needs to be rebuilt for a new kernel (but driver version may not have changed).
To make this work, several changes where necessary:
* `./nvidia*.run -x -s` fails when already unpacked. Allow it so that we can
rebuild
* there are several module dependencies for nvidia modules that are implicit,
related to i2c/ipmi. Probe those explicitly.
* `[ -f /dev/nvidia* ]` fails because those are character devices, so need a
`[ -c ...]` check.
* `nvidia-modprobe` previously always failed, because it doesn't actually know
the location of the modules and can only call modprobe (modprobe looks into
/lib/modules/). We now explicitly probe the important modules, at that point
nvidia-modprobe just creates additional device nodes.
* `is_nvidia_installation_required` checks whether building and loading is needed.
Factor out the loading check so that we can reload the module after an update.
Currently the script will reuse a developer container that was downloaded once,
without ensuring that the same version is used as the running image. This works
on the first boot, but wouldn't be correct after an OS update.
To resolve this, add a version number to the downloaded filename, and check for
the versioned dev container file. When the file is missing we also cleanup all
other dev container files via glob remove.
...by providing /etc/flatcar/nvidia-metadata. Newer driver packages do not
support some older Nvidia cards. An example is the Tesla K80 cards in
Standard_NC6 VMs on Azure, which are only supported up to the 470.x driver
version. To allow users to continue using those, give them a way to override
the driver version through /etc/flatcar/nvidia-metadata. For example, this
entry could be used to pin a specific driver version:
NVIDIA_DRIVER_VERSION=470.103.01
There are two ways to build the nvidia-driver - either against a full kernel
source tree in /usr/src/linux, or against a slim kernel-devel equivalent in
/lib/modules/*/build. The /lib/modules/*/build is provided by
sys-kernel/coreos-module, see `install_build_source`. The interesting thing is
that in absence of --kernel-source-path, nvidia-installer will autodetect which
to use and already builds against /lib/modules/*/build on Flatcar right now. By
passing --kernel-name, we make that choice explicit and this allows us to skip
the emerge steps of the build.
Since this runs in the developer container, there is also no point in trying to
execute systemctl or depmod, so pass the flags to disable usage of those.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
With the new mantle container image referenced by the scripts repo we
don't need the mantle copy in the SDK anymore.
Drop the mantle package and the unused kola-data package.
Found this while checking why I was still seeing lots of
!!! Section 'gentoo' in repos.conf is missing location attribute
messages while building. Turns out that after the last sync of portage we
stopped applying patches from files/. This was caused by a local variable
definition of PATCHES that was overriding the global one.
This might be a sign to drop them or we can refresh them, as they do fix bugs
that have been hit in CoreOS in the past. I opted to refresh them, and inject
them into the local variable.
If a GCP image is tagged with GVNIC support, GCP will replace the default
virtio nic with the more optimized GVE NIC. Enable building the kernel module
for that.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The "init" repo has a systemd unit with lines that should be kept in
sync with upstream. Normally changes are not expected but in case there
are some, it may be good to be aware.
The container performs multi-queue optimizations for ssd and network devices
which requires touching /proc and /sys/ mounts which systemd-nspawn usually
mounts readonly. Allow the container to modify those by setting the appropriate
environment variable (found via https://systemd.io/ENVIRONMENT/).
and add missing dependencies on dev-python/distro and sys-apps/coreutils. We
need to bump the version to 20190124 because:
* 20180611 is not compatible with python 3.9 because of missing distro module and
trying to access os.errno (instead of importing the errno module). Also why we
need the dependency on dev-python/distro
* 20190124 is the last version before the repo was split and reorganized which
would require more work to the ebuilds
The coreutils dependency is necessary because the scripts call basename/nproc/cat
but previously coreutils was pulled in by the following dependency chain:
(dependency required by "app-admin/eselect-1.4.16::portage-stable" [binary])
(dependency required by "app-eselect/eselect-python-20160516::portage-stable" [binary])
(dependency required by "dev-lang/python-2.7.15::portage-stable" [binary])
(dependency required by "dev-python/boto-2.48.0::portage-stable" [binary])
(dependency required by "app-emulation/google-compute-engine-20180611::coreos" [binary])
(dependency required by "coreos-base/coreos-oem-gce-0.0.1-r5::coreos" [binary])
(dependency required by "coreos-base/coreos-oem-gce" [argument])
This chain seems to not hold any longer and we should be explicit about
dependencies.
The oem-aci profile previously removed python3 from the produced oem
images by having an entry saying dev-lang/python-3.X is provided and
removing all python3 files. This only worked as long as python2 was
available and installed instead, but since python2 was removed from the
tree these entries in the profile resulted in oem-aci having no python
at all. This prevents the oem-gce service from working, since a lot of
what it does is python.
Remove the INSTALL_MASK and package.provided entries for python3 to
allow python3 into oem-aci images.
This enables support for the Intel Running Average Power Limit (RAPL)
technology via MSR interface, which allows power limits to be enforced
and monitored on modern Intel processors.
It can be useful for energy consumption monitoring tools.
src: https://github.com/torvalds/linux/blob/master/drivers/powercap/Kconfig
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
This pulls in
https://github.com/flatcar-linux/init/pull/66
to fix the problem that Ignition keys would be lost as soon as
update-ssh-keys runs. This is done by placing Ignition's keys in as
files in the authorized_keys.d folder and calling update-ssh-keys after
Ignition ran.