Inode sizes smaller than 256:
- don't support extended metadata (nanosecond timestamp resolution)
- cannot handle dates beyond 2038
- are deprecated
Change the default from 128 to 256. There is no way to apply this change on a
mounted filesystem so this change will only apply to new deployments.
Fixes: flatcar/flatcar#1082
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
disk_util sometimes bails out during build with an ASCII conversion
error:
Traceback (most recent call last):
File "/mnt/host/source/src/scripts/build_library/disk_util", line 1114, in <module>
main(sys.argv)
File "/mnt/host/source/src/scripts/build_library/disk_util", line 1110, in main
options.func(options)
File "/mnt/host/source/src/scripts/build_library/disk_util", line 779, in Verity
Tune2fsReadWrite(options, part, disable_rw=True)
File "/mnt/host/source/src/scripts/build_library/disk_util", line 716, in Tune2fsReadWrite
image.write(chr(flag_value))
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 0: ordinal not in range(128)
Curiously, the error does not reproduce every time (though the code
leading to the error is straightforward).
This change converts the integer to be written to a byte array (of size
1) instead of using chr(). Also, the file to be written is explicitly
opened in binary mode.
Signed-off-by: Thilo Fromm <thilo@kinvolk.io>
This is some fallout from converting scripts from python2 to
python3. Output received from the functions in subprocess module now
return bytearrays, but we operate on them as if they were a text. So
decode the bytearrays to strings. Otherwise we are either getting some
junk values passed to the command line utilities (for example:
`b'/dev/loop2'` instead of `/dev/loop2`), or exceptions are thrown,
because a function expected a string.
This is just a conversion done by 2to3 with a manual updates of
shebangs to mention python3 explicitly. The fixups for bytearray vs
string issues will follow up.
The defaults already give more space than the ext4 defaults but it's
recommended to use the mixed mode for filesystems smaller than 1-5 GB.
Another aspect is the duplication of metadata and while it currently is
off it's actually related to the underlying block device and could
change as soon as the block device type changes.
Select the mixed mode that uses a merged area for data and metadata
blocks. Also ensure that no metadata duplication gets enabled
automatically.
The limited /usr and OEM partiton size is a challenge when adding new
packages or updating a package. Since the disk layout can't be changed
for compatibility reasons when updating an existing instance, we can't
simply try out something without ensuring first that enough space is
there by removing something else. This situation can be relaxed by
leveraging btrfs compression. There was some support for btrfs but it
was a bit outdated and didn't allow to configure compression or setting
read-only flags.
Fix the btrfs support, allow to mark the default subvolume as read only
and add a compression variable that allows to select a compression
algorithm. Instead of enabling compression by setting the mount option,
we can set the filesystem attribute which has the benefit that
compression is still used with the default mount options for this (top)
directory and its contents. While for the ext2 /usr partition a hack
existed to force read-only mode by modifying some bytes and checking
these bytes could also be used to know if read-only should be used to
prevent corruption of dm-verity data, we rather check directly whether
dm-verity is active for this partition and mount it read-only (and
with the norecovery option to really prevent any write attempt).
When loop device partition nodes aren't cleaned up, building images
will fail with:
mkfs.vfat: Partitions or virtual mappings on device '/dev/loop0', not making filesystem (use -I to override)
Just add the flag unconditionally to work around it.
This reverts commit 7f058d61a1.
Reverting because of bug 2284 [1] where grub will sometimes fail due to
memory corruption. This is _not_ the cause of the bug, and the bug can
even be reproduced with this reversion, but it seems to occur less when
not using fat32.
[1] https://github.com/coreos/bugs/issues/2284
mkfs.vfat was defaulting to FAT16 based on the size of the partition.
The UEFI spec (2.7 errata A, section 13.3) implies that only FAT32 is
necessarily supported on the ESP, and we've received a report of
hardware that doesn't recognize FAT16.
Previously fsck output was suppressed to reduce the amount of noise in
build logs on the assumption that fsck really shouldn't have a reason to
fail. The filesystem is freshly created after all. However some users
have reported that fsck is failing but without error messages we don't
know why.
This change changes the default 'bytes-per-inode' ration from 16K to 4K,
the block size. To prevent this from wasting too much space change the
inode size from the default 256 to the minimum size, 128. Larger inodes
are used to store extended attributes more efficiently but since we do
not use SELinux the majority of files do not have security attributes.
These defaults may be modified via the new `bytes_per_inode` and
`inode_size` options.
Mark the initial copy of CoreOS as 'successful' and with a non-zero
priority. Required to boot with a stricter interpretation of the
partition selection scheme which ignores partitions that have a priority
of zero. The new grub implementation follows this rule and is what the
original ChromeOS spec used too.
For the sake of completeness if multiple partitions are configured in
the json file with this feature they will be prioritized in disk-order.
So far the default iteration order of python dicts has mostly matched
the order that we want the partitions on disk but this is not always the
case. I caught the BIOS-BOOT partition being ordered on disk after the
USR-A partition. Nothing bad came of this but consistancy is good.
The VHD disk format internally includes CHS addressing and qemu-img
respectfully aligns disk images to the common 16 heads 63 sectors
geometry when possible. This is unfortunate since images uploaded to
Azure must also be aligned to 1MB we normally do.
Since qemu-img doesn't have a way to handle this well right now adjust
our existing alignment logic to create disk images aligned to both.
We don't need to do anything like manually install the MBR boot code
for grub but we do need to continue to expose the ESP partition as a
hybrid partition to support pvgrub.
Calling cgpt create when resizing zeros the MBR boot code. This worked
with the syslinux setup because the boot code was re-written. When not
using syslinux it is easier to just preserve the existing MBR instead.
Attempting to work around an apparent race in mtools, the command
'extlinux' these days is just the install tool for mounted partitions
while 'syslinux' is for unmounted devices.
The new Update() performs the same tasks as the old Resize()
in addition to formatting previously-unformatted partitions. This
allows children disk-layouts to repartition the base layout in
addition to resizing.
btrfs isn't designed for small volumes and can run out of space sooner
than one would expect in our current setup, particularly with docker.
To try to improve the situation always create the filesystem initially
as 2GB instead of 512MB using the default settings: metadata is
duplicated, data is single, not mixed. The mixed setting may have been
partly why our performance can be so poor. For the default vm layout
use 6GB instead of 3GB, about what we use for EC2.
Using the classic mbr.bin was only needed during the transition from
syslinux 3 to 6 because the behavior of gptmbr.bin changed after 3.
Now that the transition is done and cgpt supports the new scheme now it
is time we switched back. This avoids depending on using a hybrid MBR.
cgpt now supports generating hybrid MBRs and the classic style mbr.bin
from any version of SYSLINUX should work the same with the hybrid MBR.
The other code, gptmbr.bin, changes after SYSLINUX 3. Switching lets me
play with different versions of SYSLINUX without breaking everything.
With this change all images feature a hybrid MBR so the special case for
some VM platforms has been removed.
The funky UUID and other special settings should only be applied to
coreos-rootfs and coreos-usr partitions which will never be fscked. When
STATE becomes ROOT in -usr images it gets fsked while mounted read-only
and fsck updates the filesystem's UUID if it is blank. Turns out this
causes disagreement between the kernel and the disk leading to bad
things. A related issue was fixed in a newer version of tune2fs but
unless I missed it the same bugfix didn't make it into e2fsck so
updating wouldn't resolve the issue.
http://e2fsprogs.sourceforge.net/e2fsprogs-release.html#1.42.9
Now disk_util is aware of the weird ext2 read-only hack, both by
providing a command to manipulate it and support in the mount command to
automatically set the 'ro' mount option for filesystems with it.
Making mount aware of the hack makes it much easier to mount prod
images with a mix read-only and read-write filesystems.
write_gpt --update <img> will read an existing image and make sure all
existing partitions will not get moved or truncated in the new layout.
This is mostly useful for resizing the final partition or just rewriting
metadata like partition types and labels.