This involves putting libraries under /usr/lib64 and kernel modules under
/usr/lib/modules. This is an experiment in making the nvidia installation work
as a sysext as well, but there are still some open issues. The major one was
that `systemd-sysext refresh` would remove the OEM symlink, and I don't feel
comfortable running `systemctl restart systemd-sysext` from within another
unit.
If anyone wants to try it, it's now a matter of:
    ln -s /opt/nvidia/current /run/extensions/nvidia-driver
Bonus points for moving nvidia binaries from /opt/bin to
/opt/nvidia/current/usr/bin.
Since we no longer need to run emerge in the developer container, we can as
well just treat the developer container more like a container image and use an
ephemeral overlay.
Currently the setup-nvidia script fails when re-executed. It should work in
cases when the driver is already built and just needs to be loaded, or when it
needs to be rebuilt for a new kernel (but driver version may not have changed).
To make this work, several changes were necessary:
* `./nvidia*.run -x -s` fails when the installer was already unpacked. Allow
this so that we can rebuild.
* there are several implicit module dependencies for the nvidia modules,
related to i2c/ipmi. Probe those explicitly.
* `[ -f /dev/nvidia* ]` fails because those are character devices, so a
`[ -c ... ]` check is needed.
* `nvidia-modprobe` previously always failed, because it doesn't actually know
the location of the modules and can only call modprobe (which only looks in
/lib/modules/). We now explicitly probe the important modules; at that point
nvidia-modprobe just creates additional device nodes.
* `is_nvidia_installation_required` checks whether building and loading is
needed. Factor out the loading check so that we can reload the module after an
update. A sketch of these reworked checks follows this list.
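A hypothetical sketch of those checks; the helper names and exact module list
are assumptions, not the literal script:

```sh
# Character devices need a -c test, not -f.
nvidia_devices_present() {
  [ -c /dev/nvidiactl ]
}

# Probe the implicit i2c/ipmi dependencies explicitly, then the nvidia
# modules; afterwards nvidia-modprobe only creates additional device nodes.
load_nvidia_modules() {
  modprobe -a i2c_core ipmi_msghandler ipmi_devintf
  modprobe -a nvidia nvidia_uvm
  nvidia-modprobe
}
```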
Currently the script reuses a developer container that was downloaded once,
without ensuring that its version matches the running image. This works on the
first boot, but wouldn't be correct after an OS update.
To resolve this, add a version number to the downloaded filename, and check
for the versioned dev container file. When the file is missing, we also clean
up all other dev container files via a glob remove.
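A minimal sketch of that logic; the file naming scheme and download directory
are assumptions:

```sh
# /usr/share/flatcar/release provides FLATCAR_RELEASE_VERSION.
. /usr/share/flatcar/release
DEV_CONTAINER="flatcar_developer_container-${FLATCAR_RELEASE_VERSION}.bin.bz2"
if [ ! -f "/var/lib/nvidia/${DEV_CONTAINER}" ]; then
  # Glob-remove stale versions, then download the matching one.
  rm -f /var/lib/nvidia/flatcar_developer_container-*.bin.bz2
fi
```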
...by providing /etc/flatcar/nvidia-metadata. Newer driver packages do not
support some older Nvidia cards. An example is the Tesla K80 cards in
Standard_NC6 VMs on Azure, which are only supported up to the 470.x driver
version. To allow users to continue using those, give them a way to override
the driver version through /etc/flatcar/nvidia-metadata. For example, this
entry could be used to pin a specific driver version:
NVIDIA_DRIVER_VERSION=470.103.01
There are two ways to build the nvidia-driver - either against a full kernel
source tree in /usr/src/linux, or against a slim kernel-devel equivalent in
/lib/modules/*/build. The latter is provided by sys-kernel/coreos-modules,
see `install_build_source`. The interesting thing is that in the absence of
--kernel-source-path, nvidia-installer autodetects which one to use and
already builds against /lib/modules/*/build on Flatcar right now. By passing
--kernel-name, we make that choice explicit, which allows us to skip the
emerge steps of the build.
Since this runs in the developer container, there is also no point in trying
to execute systemctl or depmod, so pass the flags that disable their usage.
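A minimal sketch of the invocation; only --kernel-name is named above,
--silent is nvidia-installer's standard non-interactive switch, and the exact
flags for skipping systemctl and depmod should be taken from
`./nvidia-installer --advanced-options`:

```sh
# Build explicitly against /lib/modules/$(uname -r)/build.
./nvidia-installer --silent --kernel-name="$(uname -r)"
```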
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
With the new mantle container image referenced by the scripts repo we
don't need the mantle copy in the SDK anymore.
Drop the mantle package and the unused kola-data package.
Found this while checking why I was still seeing lots of
`!!! Section 'gentoo' in repos.conf is missing location attribute`
messages while building. It turns out that after the last sync of portage we
stopped applying the patches from files/. This was caused by a local PATCHES
variable definition that was overriding the global one.
This might be a sign to drop them, or we can refresh them, as they do fix
bugs that have been hit in CoreOS in the past. I opted to refresh them and
inject them into the local variable.
Crossdev currently uses binutils 2.36 (stable), while the SDK and sysroot both
build binutils 2.37 due to keywording. Kernel modules built within the
developer container fail to load due to relocation errors. Add the same
keywords to cross-*/binutils packages so that the versions match.
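A sketch of the added keyword entries; the exact cross toolchain tuples and
keywords are assumptions (the SDK host is amd64):

```
# e.g. in the SDK's package.accept_keywords
cross-x86_64-cros-linux-gnu/binutils ~amd64
cross-aarch64-cros-linux-gnu/binutils ~amd64
```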
If a GCP image is tagged with GVNIC support, GCP will replace the default
virtio NIC with the more optimized GVE NIC. Enable building the kernel module
for that.
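The gve driver lives under drivers/net/ethernet/google upstream, so this
presumably amounts to enabling the following kernel config symbol:

```
CONFIG_GVE=m
```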
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The "init" repo has a systemd unit with lines that should be kept in
sync with upstream. Normally changes are not expected but in case there
are some, it may be good to be aware.
The container performs multi-queue optimizations for SSD and network devices,
which requires touching the /proc and /sys mounts that systemd-nspawn usually
mounts read-only. Allow the container to modify those by setting the
appropriate environment variable (found via https://systemd.io/ENVIRONMENT/).
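Presumably the variable in question is the one documented on that page for
making the API virtual file systems writable:

```sh
# The image name is just a placeholder.
SYSTEMD_NSPAWN_API_VFS_WRITABLE=1 systemd-nspawn --image=oem-gce.img
```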
and add missing dependencies on dev-python/distro and sys-apps/coreutils. We
need to bump the version to 20190124 because:
* 20180611 is not compatible with python 3.9 because of the missing distro
module and because it tries to access os.errno (instead of importing the errno
module). This is also why we need the dependency on dev-python/distro.
* 20190124 is the last version before the repo was split and reorganized,
which would require more work on the ebuilds.
The coreutils dependency is necessary because the scripts call
basename/nproc/cat, but previously coreutils was pulled in by the following
dependency chain:
(dependency required by "app-admin/eselect-1.4.16::portage-stable" [binary])
(dependency required by "app-eselect/eselect-python-20160516::portage-stable" [binary])
(dependency required by "dev-lang/python-2.7.15::portage-stable" [binary])
(dependency required by "dev-python/boto-2.48.0::portage-stable" [binary])
(dependency required by "app-emulation/google-compute-engine-20180611::coreos" [binary])
(dependency required by "coreos-base/coreos-oem-gce-0.0.1-r5::coreos" [binary])
(dependency required by "coreos-base/coreos-oem-gce" [argument])
This chain no longer seems to hold, and we should be explicit about our
dependencies.
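In the ebuild, the explicit dependencies would look roughly like this (a
sketch, not the literal diff):

```sh
RDEPEND="
	dev-python/distro
	sys-apps/coreutils
"
```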
The oem-aci profile previously removed python3 from the produced oem
images by having an entry saying dev-lang/python-3.X is provided and
removing all python3 files. This only worked as long as python2 was available
and installed instead, but since python2 was removed from the tree, these
entries resulted in oem-aci having no Python at all. This prevents the oem-gce
service from working, since much of what it does is Python.
Remove the INSTALL_MASK and package.provided entries for python3 to
allow python3 into oem-aci images.
This enables support for the Intel Running Average Power Limit (RAPL)
technology via the MSR interface, which allows power limits to be enforced
and monitored on modern Intel processors.
It can be useful for energy consumption monitoring tools.
src: https://github.com/torvalds/linux/blob/master/drivers/powercap/Kconfig
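Per that Kconfig, the symbol enabled here should be the following, built as a
module (=y would also work):

```
CONFIG_INTEL_RAPL=m
```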
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
This pulls in
https://github.com/flatcar-linux/init/pull/66
to fix the problem that Ignition's keys would be lost as soon as
update-ssh-keys runs. This is done by placing Ignition's keys as files in the
authorized_keys.d folder and calling update-ssh-keys after Ignition ran.
Usually the last two versions are supported, so make sure we keep them both
updated, not just the latest. But also try to update the newest unsupported
version, in case there was a window where the update happened and then a new
major version was released.
When an action generates a couple of patches separately, it might be a good
idea to specify a numbering, so that the patches are applied in the desired
order. Without that, all the generated patches would start with the "0001-"
prefix.
They became enabled by default after an update. We didn't need them before,
and we don't need them now. Also, enabling smi pulls in net-libs/libsmi, which
does not even have an arm64 keyword.
It became enabled by default after an update, so revert that change in our
profiles. It was enabled upstream because it was needed by dev-qt/qtcore,
which we don't have.
We want to base the work branch (like rust-1.59-main) on top of the base
branch from our remote, not from the remote that came with the SDK. This will
make work branch creation fork-friendly.
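A sketch of the branch creation; the remote name "github" and base branch
"main" are assumptions:

```sh
git fetch github main
git checkout -B rust-1.59-main github/main
```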
This action runs over main and the release branches and creates a PR that
updates the mantle reference to the latest one. By using a fixed branch name,
rerunning the action will update or close an existing PR if new mantle commits
happen or if the PR becomes obsolete.
The tool is deprecated, nothing pulls it in anymore, and it has a dependency
on dev-perl/XML-Parser, an updated version of which would want to pull in a
bunch of new packages through dev-perl/libwww-perl. Avoid the hassle and drop
the tool.
Realmd didn't have dev-util/intltool listed as a dependency, but it
actually required it during build. Apply a patch from upstream that
converts the project from intltool to gettext in order to get rid of
the dependency on the obsolete tool. To apply the patch without
conflicts, apply also another patch from upstream that modernizes the
configure.ac file.
We also disable i18n through the --disable-nls flag. The disabling is not
complete though, so we still need to point gettext to the ITS rules we have
installed in ROOT.
Our GitHub actions use cork to create an SDK chroot, which pulls down bzipped
archives. The runners have 2 CPUs, so this unpacking could be faster if we
installed lbzip2, which cork transparently uses.
The size constraints of not only the /usr partition but also the /boot
partition require that we reduce the size of binaries as much as possible.
Strip all Go binaries by default.
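For reference, stripping is typically done via the Go linker flags that omit
the symbol table (-s) and DWARF data (-w):

```sh
go build -ldflags="-s -w" ./...
```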
This usually doesn't happen for releases, but for development dev containers
it might be the case that the portage-stable or coreos-overlay commit is
specified as some pull request reference - these need to be fetched
differently, as refs under refs/pull are usually not fetched by default.
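GitHub exposes pull request heads under refs/pull/, so such a commit can be
fetched explicitly (the PR number here is a placeholder):

```sh
git fetch origin refs/pull/1234/head:pr-1234
git checkout pr-1234
```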
We were appending the [build] section, and the updated cargo eclass already
added one to the config, so we ended up with two [build] sections in the
config file. Try to amend the section instead of appending it to the file.
While at it, do the same with the target.${RUST_TARGET} section to be a bit
more futureproof.
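A sketch of the idea, with the config path taken from the cargo eclass's
ECARGO_HOME (an assumption): only add the section header when it is missing,
then write our keys under it instead of blindly appending a second header.

```sh
if ! grep -q '^\[build\]' "${ECARGO_HOME}/config"; then
  printf '[build]\n' >> "${ECARGO_HOME}/config"
fi
```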
- sys-libs/pam: Make /sbin/unix_chkpwd suid
This is to avoid importing the fcaps eclass, which adds a dependency on
sys-libs/libcap, which in turn depends on sys-libs/pam. To get out of this
conundrum, we could specify a "-filecaps" USE flag for sys-libs/pam. The
problem with that solution would be no capability override for the binary,
making it unable to read /etc/shadow. Thus we make the binary suid. This is
strictly less secure than overriding its capabilities, but I have no idea how
to solve it in a less hacky way. See the sketch after this list.
- sys-libs/pam: Install configuration into /usr
Also provide a tmpfiles fragment to bring it back under /etc.
- sys-libs/pam: Locked accounts functionality
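Sketches of the two mechanisms mentioned above; the exact paths are
assumptions:

```sh
# In the ebuild: suid instead of file capabilities.
fperms 4711 /sbin/unix_chkpwd
```

```
# tmpfiles fragment copying the configuration back into /etc:
C /etc/pam.d - - - - /usr/lib/pam.d
```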
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
As sys-apps/shadow has its own su binary, sys-apps/util-linux should not
install one as well; otherwise the build will simply fail.
Disable the su USE flag for util-linux.
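In profile terms this is a one-line package.use entry:

```
sys-apps/util-linux -su
```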
The lib64/systemd location only happened to work through the symlink used on
Flatcar. The standard location is lib/systemd.
Use the standard location as we now want to split the lib folders.
The split of /usr/lib64 into /usr/lib and /usr/lib64 means that paths
to /usr/lib64/X that worked before now wouldn't.
Therefore, create compatibility symlinks.
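For one affected path, such a link could look like this (example path only):

```sh
ln -s /usr/lib/systemd /usr/lib64/systemd
```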
The profile Flatcar is on had SYMLINK_LIB set for amd64, which set up
(/usr)/lib as a symlink to (/usr)/lib64. This is not the case for arm64, nor
is it common in other recent distributions, and it causes systemd-sysext
loading to fail.
Disable SYMLINK_LIB for the amd64 board for now, leaving the SDK as is, but we
could also do this for the SDK, too. A future profile update will also bring
this change.
The /lib symlink does not point to /usr/lib but instead points to /usr/lib64
on current releases, which have a single /usr/lib64 folder and a symlink from
/usr/lib to it. This means that when they update to a release with a split lib
vs. lib64 setup, the kernel modules are not found, because /lib/modules does
not exist (since /lib still points to /usr/lib64 instead of /usr/lib).
Force recreation of the link to match the new layout. The system will still be
able to roll back because the link to /usr/lib remains valid: /usr/lib is
itself a link that forwards to /usr/lib64.
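The forced recreation boils down to overwriting the stale link in place:

```sh
# -f replaces the old /lib -> usr/lib64 link, -n avoids dereferencing it.
ln -sfn usr/lib /lib
```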
- remove unnecessary files
- drop `pkg_postinst`
- create `/etc/ssl` with tmpfiles
- mark openssl as stable for arm64 and amd64
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
For open-vm-tools 12.0.0, add a new salt-minion USE flag.
Pass `--disable-containerinfo` to fix build issues, because it is currently
not trivial to import the grpc++ dependency into Flatcar.
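A sketch of the relevant configure arguments in the ebuild; the salt-minion
switch name is inferred from the USE flag:

```sh
econf \
	--disable-containerinfo \
	$(use_enable salt-minion)
```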
rng-tools does not appear to be necessary for booting in virtual machine
environments in 2022. Back in the day the boot process would block if there
was not enough entropy to seed the system random pool, but over the years the
Linux kernel made sure that the pool is force-seeded if userspace does not do
so on its own. Remove rng-tools as it is not needed, and it would require work
to make sure it works (detection of tpm/hwrng/intel cpu instructions).
flatcar-eks/nvidia-drivers/nvidia-metadata are now required to build
AWS/Azure images on all architectures, so the packages must no longer be
amd64-only dependencies of board-packages or coreos.
coreos-base/oem-azure now requires systemd units installed by nvidia-drivers,
so the nvidia-drivers package needs to be available for both architectures.
nvidia-drivers depends on nvidia-metadata, so the same applies.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This enables containerd to do appropriate SELinux labeling of containers and
files by default. This should not be problematic, as Flatcar ships with
SELinux in permissive mode by default.
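In containerd's CRI plugin configuration this presumably corresponds to the
enable_selinux option:

```toml
# /etc/containerd/config.toml (fragment)
[plugins."io.containerd.grpc.v1.cri"]
  enable_selinux = true
```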
Signed-off-by: Juan Antonio Osorio <juan.osoriorobles@eu.equinix.com>
With Ignition v3, the `default.ign` configuration is no longer loaded. We can
safely remove this configuration since it won't be loaded anyway.
oem-cloudinit will be conditionally enabled based on the result of the
`ignition` execution.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
it mainly brings V3 support on top of V2 support for Ignition and ensures
backward compatibility with existing integrations.
Signed-off-by: Mathieu Tortuyaux <mathieu@kinvolk.io>
Recreate the old posix symlink for compatibility, and drop all the pkg
functions that maintain /etc/localtime, since we default to UTC.
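A sketch of the recreated link, assuming the posix data is identical to the
default zoneinfo tree:

```sh
ln -s . /usr/share/zoneinfo/posix
```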
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>