Building with clang 16 on MIPS64 yields this warning:
src/slz.c:931:24: warning: unused function 'crc32_uint32' [-Wunused-function]
static inline uint32_t crc32_uint32(uint32_t data)
^
Let's guard it with UNALIGNED_LE_OK, which is the only case where it's
used. This saves us from introducing a possibly non-portable attribute.
This is libslz upstream commit f5727531dba8906842cb91a75c1ffa85685a6421.
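A minimal sketch of the guard, assuming only the existing
UNALIGNED_LE_OK build knob; the function body is a simplified
bit-at-a-time stand-in, not the actual implementation:

    #include <stdint.h>

    #if defined(UNALIGNED_LE_OK)
    static inline uint32_t crc32_uint32(uint32_t data)
    {
            /* CRC32 of the 4 little-endian bytes of <data>, bit at a time */
            uint32_t crc = ~0U ^ data;
            int bit;

            for (bit = 0; bit < 32; bit++)
                    crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320U : 0);
            return ~crc;
    }
    #endif /* only referenced by the UNALIGNED_LE_OK code path */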
Calling slz_rfc1950_finish() without emitting any data would result in
incorrectly emitting a gzip header (rfc1952) instead of a zlib header
(rfc1950) due to a copy-paste error between the two wrappers. The
impact is almost nonexistent since the zlib format is almost never used
in this context, and compressing totally empty messages is quite rare
as well. Let's take this opportunity to fix another mistaken RFC number
in a comment.
This is slz upstream commit 7f3fce4f33e8c2f5e1051a32a6bca58e32d4f818.
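For reference, the two headers differ a lot, which is why emitting the
wrong one on an empty stream matters; below is a purely illustrative
helper (not the slz implementation) showing both:

    #include <stdint.h>
    #include <string.h>

    /* emit either a 2-byte zlib (RFC1950) or a 10-byte gzip (RFC1952)
     * header and return its length; illustrative only
     */
    static size_t emit_header(uint8_t *out, int zlib_format)
    {
            static const uint8_t zlib_hdr[2]  = { 0x78, 0x01 };
            static const uint8_t gzip_hdr[10] = { 0x1f, 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0x03 };

            if (zlib_format) {
                    memcpy(out, zlib_hdr, sizeof(zlib_hdr));
                    return sizeof(zlib_hdr);
            }
            memcpy(out, gzip_hdr, sizeof(gzip_hdr));
            return sizeof(gzip_hdr);
    }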
The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architectures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that the new factor 0x1af42f resulted in
smoother hashes that provided on average 0.4% better compression on both
the Silesia corpus and an mbox file composed of very compressible emails
and uncompressible attachments. It's even slightly better than CRC32C
while being faster on Skylake. This patch enables this factor on archs
with a fast multiply.
This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
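A minimal sketch of what the multiply-based form looks like; the
function name and HASH_BITS value are assumptions for illustration,
only the 0x1af42f factor comes from the change described above:

    #include <stdint.h>

    #define HASH_BITS 13   /* assumed table size, for illustration */

    static inline uint32_t slz_hash(uint32_t a)
    {
            /* one short multiply, keeping the top HASH_BITS bits */
            return (a * 0x1af42fU) >> (32 - HASH_BITS);
    }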
If building for SSE4 and USE_CRC32C_HASH is defined, then we can use
crc32c to calculate the lookup hash. By default we don't do it because
even on Skylake it's slower (~5%) than the current hash, which only
involves a short multiply, and the compression gains it brings are
marginal (0.3%).
This is slz upstream commit 44ae4f3f85eb275adba5844d067d281e727d8850.
Note: this is not used by default and only merged in order to avoid
divergence between the code bases.
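A hedged sketch of the crc32c variant using the SSE4.2 intrinsic; the
function name and HASH_BITS are again illustrative assumptions:

    #include <stdint.h>

    #if defined(USE_CRC32C_HASH) && defined(__SSE4_2__)
    #include <nmmintrin.h>

    #define HASH_BITS 13   /* assumed table size, for illustration */

    static inline uint32_t slz_hash(uint32_t a)
    {
            /* the crc32 instruction mixes well but was measured ~5%
             * slower than the short multiply on Skylake
             */
            return _mm_crc32_u32(0, a) >> (32 - HASH_BITS);
    }
    #endif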
On 64-bit platforms, disassembling the code shows that send_huff() performs
a left shift followed by a right one, which are the result of integer
truncation and zero-extension caused solely by using different types at
different levels in the call chain. By making encode24() take a 64-bit
int on input and send_huff() take one optionally, we can remove one shift
in the hot path and gain 1% performance without affecting other platforms.
This is slz upstream commit fd165b36c4621579c5305cf3bb3a7f5410d3720b.
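A toy illustration of the type-width effect; the names and bodies are
made up for the example and are not the actual slz code:

    #include <stdint.h>

    struct bitq {
            uint64_t queue;   /* pending bits, LSB first */
            uint32_t qbits;   /* number of pending bits */
    };

    /* before: a 32-bit parameter may force the compiler to truncate and
     * zero-extend the value before the shift
     */
    static inline void send_huff_narrow(struct bitq *q, uint32_t code, uint32_t bits)
    {
            q->queue |= (uint64_t)code << q->qbits;
            q->qbits += bits;
    }

    /* after: a 64-bit parameter lets the value flow straight through */
    static inline void send_huff_wide(struct bitq *q, uint64_t code, uint32_t bits)
    {
            q->queue |= code << q->qbits;
            q->qbits += bits;
    }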
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the output
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, with 31 pending bits and a gzip header, a total of
4 + 5 + 10 = 19 bytes may thus be emitted at once upon a flush.
This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
Building with -DFIND_OPTIMAL_MATCH would fail on undeclared "len".
This one likely vanished in some cleanup.
This is libslz upstream commit 1ea20360715e1ad0cd81db83fa4361310716b8cc
Many ARMv8 processors also support AArch32 and can run armv7 and even
thumb2 code. While armv8 compilers will not emit these instructions,
armv7 compilers that are aware of these processors will. For example,
using gcc built for an armv7 target and passing it "-mcpu=cortex-a72"
or "-march=armv8-a+crc" will result in the CRC32 instruction being
used.
In this case the current assembly code fails because the ARM and
Thumb2 instruction sets have no "%wX" half-registers. We need to use
"%X" instead as the native 32-bit register when running with a 32-bit
instruction set, and use "%wX" when using the 64-bit instruction set
(A64).
This is slz upstream commit fab83248612a1e8ee942963fe916a9cdbf085097
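A sketch of the operand selection involved; the exact predicate macro
and surrounding code in slz may differ:

    #include <stdint.h>

    #if defined(__ARM_FEATURE_CRC32)
    static inline uint32_t crc32_char(uint32_t crc, uint8_t x)
    {
    #if defined(__aarch64__)
            /* A64: "%w" selects the 32-bit (W) view of the register */
            __asm__ volatile("crc32b %w0,%w0,%w1" : "+r"(crc) : "r"(x));
    #else
            /* A32/T32: registers are natively 32-bit, plain operands */
            __asm__ volatile("crc32b %0,%0,%1" : "+r"(crc) : "r"(x));
    #endif
            return crc;
    }
    #endif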
The test on FIND_OPTIMAL_MATCH for the experimental code can yield a
build warning when using -Wundef, so let's turn it into a regular ifdef.
This is slz upstream commit 05630ae8f22b71022803809eb1e7deb707bb30fb
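A minimal before/after sketch; the inner define is just a placeholder:

    /* before: -Wundef warns when FIND_OPTIMAL_MATCH is not defined at all */
    #if FIND_OPTIMAL_MATCH
    #  define WITH_OPTIMAL_MATCH 1
    #endif

    /* after: a plain ifdef only tests for definition and stays silent */
    #ifdef FIND_OPTIMAL_MATCH
    #  define WITH_OPTIMAL_MATCH 1
    #endif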
stdint.h is not as portable as inttypes.h. It doesn't exist at least
on AIX 5.1 and Solaris 7, while inttypes.h is present there and does
include stdint.h on platforms supporting it.
This is equivalent to libslz upstream commit e36710a ("slz: use
inttypes.h instead of stdint.h")
On ARM with native CRC support, no need to inflate the executable with
a 4kB CRC table, let's just drop it.
This is slz upstream commit d8715db20b2968d1f3012a734021c0978758f911.
This is the only place where we conditionally use the crc32_fast
table, so it's better to call the crc32_char inline function here. This
should also reduce the L1 cache footprint of the compression by ~1kB
when dealing with small blocks, and at least shows a consistent 0.5%
perf improvement.
This is slz upstream commit 075351b6c2513b548bac37d6582e46855bc7b36f.
As we now embed the library we don't need to support the older 1.0 API
any more, so we can remove the explicit calls to slz_make_crc_table()
and slz_prepare_dist_table().
SLZ is rarely packaged by distros and there have been complaints about
the CPU and memory usage of ZLIB, leading to some suggestions to better
address the issue by simply integrating SLZ into the tree (just 3 files).
See discussions below:
https://www.mail-archive.com/haproxy@formilux.org/msg38037.html
https://www.mail-archive.com/haproxy@formilux.org/msg40079.html
https://www.mail-archive.com/haproxy@formilux.org/msg40365.html
This patch does just this, after minor adjustments to these files:
- tables.h was renamed to slz-tables.h
- tables.h had the precomputed tables removed since they are not used here
- slz.c includes <import/slz*> instead of "slz*.h"
The slz commit imported here was b06c172 ("slz: avoid a build warning
with -Wimplicit-fallthrough"). No other change was performed either to
SLZ nor to haproxy at this point so that this operation may be replicated
if needed for a future version.