Released version 3.4-dev2 with the following main changes :
- BUG/MEDIUM: mworker/listener: ambiguous use of RX_F_INHERITED with shards
- BUG/MEDIUM: http-ana: Properly detect client abort when forwarding response (v2)
- BUG/MEDIUM: stconn: Don't report abort from SC if read0 was already received
- BUG/MEDIUM: quic: Don't try to use hystart if not implemented
- CLEANUP: backend: Remove useless test on server's xprt
- CLEANUP: tcpcheck: Remove useless test on the xprt used for healthchecks
- CLEANUP: ssl-sock: Remove useless tests on connection when resuming TLS session
- REGTESTS: quic: fix a TLS stack usage
- REGTESTS: list all skipped tests including 'feature cmd' ones
- CI: github: remove openssl no-deprecated job
- CI: github: add a job to test the master branch of OpenSSL
- CI: github: openssl-master.yml misses actions/checkout
- BUG/MEDIUM: backend: Do not remove CO_FL_SESS_IDLE in assign_server()
- CI: github: use git prefix for openssl-master.yml
- BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream
- REGTESTS: fix error when no test are skipped
- MINOR: cpu-topo: Turn the cpu policy configuration into a struct
- MEDIUM: cpu-topo: Add a "threads-per-core" keyword to cpu-policy
- MEDIUM: cpu-topo: Add a "cpu-affinity" option
- MEDIUM: cpu-topo: Add a new "max-threads-per-group" global keyword
- MEDIUM: cpu-topo: Add the "per-thread" cpu_affinity
- MEDIUM: cpu-topo: Add the "per-ccx" cpu_affinity
- BUG/MINOR: cpu-topo: fix -Wlogical-not-parentheses build with clang
- DOC: config: fix number of values for "cpu-affinity"
- MINOR: tools: add a secure implementation of memset
- MINOR: mux-h2: add missing glitch count for non-decodable H2 headers
- MINOR: mux-h2: perform a graceful close at 75% glitches threshold
- MEDIUM: mux-h1: implement basic glitches support
- MINOR: mux-h1: perform a graceful close at 75% glitches threshold
- MEDIUM: cfgparse: acknowledge that proxy ID auto numbering starts at 2
- MINOR: cfgparse: remove useless checks on no server in backend
- OPTIM/MINOR: proxy: do not init proxy management task if unused
- MINOR: patterns: preliminary changes for reorganization
- MEDIUM: patterns: reorganize pattern reference elements
- CLEANUP: patterns: remove dead code
- OPTIM: patterns: cache the current generation
- MINOR: tcp: add new bind option "tcp-ss" to instruct the kernel to save the SYN
- MINOR: protocol: support a generic way to call getsockopt() on a connection
- MINOR: tcp: implement the get_opt() function
- MINOR: tcp_sample: implement the fc_saved_syn sample fetch function
- CLEANUP: assorted typo fixes in the code, commits and doc
- BUG/MEDIUM: cpu-topo: Don't forget to reset visited_ccx.
- BUG/MAJOR: set the correct generation ID in pat_ref_append().
- BUG/MINOR: backend: fix the conn_retries check for TFO
- BUG/MINOR: backend: inspect request not response buffer to check for TFO
- MINOR: net_helper: add sample converters to decode ethernet frames
- MINOR: net_helper: add sample converters to decode IP packet headers
- MINOR: net_helper: add sample converters to decode TCP headers
- MINOR: net_helper: add ip.fp() to build a simplified fingerprint of a SYN
- MINOR: net_helper: prepare the ip.fp() converter to support more options
- MINOR: net_helper: add an option to ip.fp() to append the TTL to the fingerprint
- MINOR: net_helper: add an option to ip.fp() to append the source address
- DOC: config: fix the length attribute name for stick tables of type binary / string
- MINOR: mworker/cli: only keep positive PIDs in proc_list
- CLEANUP: mworker: remove duplicate list.h include
- BUG/MINOR: mworker/cli: fix show proc pagination using reload counter
- MINOR: mworker/cli: extract worker "show proc" row printer
- MINOR: cpu-topo: Factorize code
- MINOR: cpu-topo: Rename variables to better fit their usage
- BUG/MEDIUM: peers: Properly handle shutdown when trying to get a line
- BUG/MEDIUM: mux-h1: Take care to update <kop> value during zero-copy forwarding
- MINOR: threads: Avoid using a thread group mask when stopping.
- MINOR: hlua: Add support for lua 5.5
- MEDIUM: cpu-topo: Add an optional directive for per-group affinity
- BUG/MEDIUM: mworker: can't use signals after a failed reload
- BUG/MEDIUM: stconn: Move data from <kip> to <kop> during zero-copy forwarding
- DOC: config: fix a few typos and refine cpu-affinity
- MINOR: receiver: Remove tgroup_mask from struct shard_info
- BUG/MINOR: quic: fix deprecated warning for window size keyword
QUIC configuration was cleaned up in the previous release. Several
global keyword names were changed to unify the configuration. For each
of them the older keyword is marked as deprecated, with a warning to
mention the newer alternative.
This patch fixes the warning for 'tune.quic.frontend.default-max-size'
as the alternative proposed was not correct. The proper value now is
'tune.quic.fe.cc.max-win-size'.
This must be backported up to 3.3.
The only purpose from tgroup_mask seems to be to calculate how many
tgroups share the same shard, but this is an information we can
calculate differently, we just have to increment the number when a new
receiver is added to the shard, and decrement it when one is detached
from the shard. Removing thread group masks will allow us to increase
the maximum number of thread groups past 64.
There were two typos in the recently updated parts about per-group.
Also, change the commas to ':' after the options values, as sometimes
it would be confusing. Last, place quotes around keyword names so that
they're explicitly referred to as language keywords. No backport is
needed.
The <kip> of producer was not forwarded to <kop> of consumer when zero-copy
data forwarding was tried. Because of the issue, the chunking of emitted H1
messages could be invalid.
To fix the bug, sc_ep_fwd_kip() must be called at this stage.
This fix is related to the previous one (529a8dbfb "BUG/MEDIUM: mux-h1: Take
care to update <kop> value during zero-copy forwarding"). Both are required
to fully fix the issue #3230.
This patch must be backported to 3.3.
In issue #3229 it was reported that the master couldn't reload after a
failed reload following a wrong configuration.
It is still possible to do a reload using the "reload" command of the
master CLI. But every signals are blocked.
The problem was introduced in 709cde6d0 ("BUG/MEDIUM: mworker: signals
inconsistencies during startup and reload") which fixes the blocking of
signals during the reload.
However the patch missed a case, indeed, the
run_master_in_recovery_mode() is not being called when the worker failed
to parse the configuration, it is only failing when the master is
failing.
To handle this case, the mworker_unblock_signals() function must be
called upon mworker_on_new_child_failure(). But since this is called in
an haproxy signal handler it would mess with the signals.
Instead, the patch adds a task which is started by the signal handler,
and restores the signals outside of it.
This must be backported as far as 3.1.
When using per-group affinity, add an optional new directive. It accepts
the values of "auto", where when multiple thread groups are created, the
available CPUs are split equally across the groups, and is the new
default, and "loose", where all groups are bound to all available CPUs,
this is the old default.
Lua 5.5 adds an extra argument to lua_newstate(). Since there are
already a few other ifdefs in hlua.c checking for the Lua version,
and there's a single call place, let's do the same here. This should
be safe for backporting if needed.
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Remove the "stopped_tgroup_mask" variable, that indicated which thread
groups were stopping, and instead just use "stopped_tgroups", a counter
indicating how many thread groups are stopping. We want to remove all
thread group masks, so that we can increase the maximum number of thread
groups past 64.
Since the extra field was removed from the HTX structure, a regression was
introduced when forwarding of chunked messages. The <kop> value was not
decreased as it should be when data were sent via the zero-copy
forwarding. Because of this bug, it was possible to announce a chunk size
larger than the chunk data sent.
To fix the bug, an helper function was added to properly update the <kop>
value when a chunk size is emitted. This function is now called when new
chunk is announced, including during zero-copy forwarding.
As a workaround, "tune.disable-zero-copy-forwarding" or just
"tune.h1.zero-copy-fwd-send off" can be set in the global section.
This patch should fix the issue #3230. It must be backported to 3.3.
When a shutdown was reported to a peer applet, the event was not properly
handled if it failed to receive data. The function responsible to get data
was exiting too early if the applet buffer was empty, without testing the
sedesc status. Because of this issue, it was possible to have frozen peer
applets. For instance, it happend on client timeout. With too many frozen
applets, it was possible to reach the maxconn.
This patch should fix the issue #3234. It must be backported to 3.3.
Rename "visited_tsid" and "visited_ccx" to "touse_tsid" and
"touse_ccx". They are not there to remember which tsid/ccx we
alreaday visited, contrarily to visited_ccx_set and
visited_cl_set, they are there to know which tsid/ccx we should
use, so make that clear.
Introduce cli_append_worker_row() to centralize formatting of a single
worker row. Also, replace duplicated row-printing code in both current
and old workers loops with the helper. Motivation: Reduces LOC and
improves readability by removing duplication.
After commit 594408cd612b5 ("BUG/MINOR: mworker/cli: 'show proc' is limited
by buffer size"), related to ticket #3204, the "show proc" logic
has been fixed to be able to print more than 202 processes. However, this
fix can lead to the omission of entries in case they have the same
timestamp.
To fix this, we use the unique reload counter instead of the timestamp.
On partial flush, set ctx->next_reload = child->reloads.
On resume skip entries with child->reloads >= ctx->next_reload.
Finally, we clear ctx->next_reload at the end of a complete dump so
subsequent show proc starts from the top.
Could be backported in all stable branches.
Change mworker_env_to_proc_list() to if (child->pid > 0) before
LIST_APPEND, avoiding invalid PIDs (0/-1) in the process list.
This has no functional impact beyond stricter validation and it aligns
with existing kill safeguards.
The stick-table doc was reworked and moved in 3.2 with commit da67a89f3
("DOC: config: move stick-tables and peers to their own section"), however
the optional length attribute for binary/string types was mistakenly
spelled "length" while it's "len".
This must be backported to 3.2.
It can make sense to support extra components in the fingerprint to ease
configuration, so let's change the 0/1 value to a bit field. We also turn
the current 1 (TCP options list) to 2 so that we'll reuse 1 for the TTL.
Here we collect all the stuff that depends on the sender's settings,
such as TOS, IP version, TTL range, presence of DF bit or IP options,
presence of DATA in the SYN, CWR+ECE flags, TCP header length, wscale,
initial window, mss, as well as the list of TCP extension kinds. It's
obviously fairly limited but can allows to avoid blacklisting certain
valid clients sharing the same IP address as a misbehaving one.
It supports both a short and a long mode depending on the argument.
These can be used with the tcp-ss bind option. The doc was updated
accordingly.
This adds the following converters, used to decode fields
in an incoming tcp header:
tcp.dst, tcp.flags, tcp.seq, tcp.src, tcp.win,
tcp.options.mss, tcp.options.tsopt, tcp.options.tsval,
tcp.options.wscale, tcp.options_list,
These can be used with the tcp-ss bind option. The doc was updated
accordingly.
This adds a few converters that help decode parts of IP packets:
- ip.data : returns the next header (typically TCP)
- ip.df : returns the dont-fragment flags
- ip.dst : returns the destination IPv4/v6 address
- ip.hdr : returns only the IP header
- ip.proto: returns the upper level protocol (udp/tcp)
- ip.src : returns the source IPv4/v6 address
- ip.tos : returns the TOS / TC field
- ip.ttl : returns the TTL/HL value
- ip.ver : returns the IP version (4 or 6)
These can be used with the tcp-ss bind option. The doc was updated
accordingly.
This adds a few converters that help decode parts of ethernet frame
headers:
- eth.data : returns the next header (typically IP)
- eth.dst : returns the destination MAC address
- eth.hdr : returns only the ethernet header
- eth.proto: returns the ethernet proto
- eth.src : returns the source MAC address
- eth.vlan : returns the VLAN ID when present
These can be used with the tcp-ss bind option. The doc was updated
accordingly.
In 2.6, do_connect_server() was introduced by commit 0a4dcb65f ("MINOR:
stream-int/backend: Move si_connect() in the backend scope") and changed
the approach to work with a stream instead of a stream-interface. However
si_oc(si) was wrongly turned to &s->res instead of &s->req, which breaks
TFO by always inspecting the response channel to figure whether there are
data pending.
This fix can be backported to all versions till 2.6.
In 2.6, the retries counter on a stream was changed from retries left
to retries done via commit 731c8e6cf ("MINOR: stream: Simplify retries
counter calculation"). However, one comparison fell through the cracks
in order to detect whether or not we can use TFO (only first attempt),
resulting in TFO never working anymore.
This may be backported to all versions till 2.6.
We want to reset visited_ccx, as introduced by commit
8aef5bec1ef57eac449298823843d6cc08545745, each time we run the loop,
otherwise the chances of its content being correct are very low, and
will likely end up being bound to the wrong threads.
This was reported in github issue #3224.
This function retrieves the copy of a SYN packet that the system has
kept for us when bind option "tcp-ss" was set to 1 or above. It's
recommended to copy it to a local variable because it will be freed
after being read. It allows to inspect all parts of an incoming SYN
packet, provided that it was preserved (e.g. not possible with SYN
cookies). The doc provides examples of how to use it.
It's regularly needed to call getsockopt() on a connection, but each
time the calling code has to do all the job by itself. This commit adds
a "get_opt()" callback on the protocol struct, that directly calls
getsockopt() on the connection's FD. A generic implementation for
standard sockets is provided, though QUIC would likely require a
different approach, or maybe a mapping. Due to the overlap between
IP/TCP/socket option values, it is necessary for the caller to indicate
both the level and the option. An abstraction of the level could be
done, but the caller would nonetheless have to know the optname, which
is generally defined in the same include files. So for now we'll
consider that this callback is only for very specific use.
The levels and optnames are purposely passed as signed ints so that it
is possible to further extend the API by using negative levels for
internal namespaces.
This option enables TCP_SAVE_SYN on the listening socket, which will
cause the kernel to try to save a copy of the SYN packet header (L2,
IP and TCP are supported). This can permit to check the source MAC
address of a client, or find certain TCP options such as a source
address encapsulated using RFC7974. It could also be used as an
alternate approach to retrieving the source and destination addresses
and ports. For now setting the option is enabled, but sample fetch
functions and converters will be needed to extract info.
This makes a significant difference when loading large files and during
commit and clear operations, thanks to improved cache locality. In the
measurements below, master refers to the code before any of the changes
to the patterns code, not the code before this one commit.
Timing the replacement of 10M entries from the CLI with this command
which also reports timestamps at start, end of upload and end of clear:
$ (echo "prompt i"; echo "show activity"; echo "prepare acl #0";
awk '{print "add acl @1 #0",$0}' < bad-ip.map; echo "show activity";
echo "commit acl @1 #0"; echo "clear acl @0 #0";echo "show activity") |
socat -t 10 - /tmp/sock1 | grep ^uptim
master, on a 3.7 GHz EPYC, 3 samples:
uptime_now: 6.087030
uptime_now: 25.981777 => 21.9 sec insertion time
uptime_now: 29.286368 => 3.3 sec commit+clear
uptime_now: 5.748087
uptime_now: 25.740675 => 20.0s insertion time
uptime_now: 29.039023 => 3.3 s commit+clear
uptime_now: 7.065362
uptime_now: 26.769596 => 19.7s insertion time
uptime_now: 30.065044 => 3.3s commit+clear
And after this commit:
uptime_now: 6.119215
uptime_now: 25.023019 => 18.9 sec insertion time
uptime_now: 27.155503 => 2.1 sec commit+clear
uptime_now: 5.675931
uptime_now: 24.551035 => 18.9s insertion
uptime_now: 26.652352 => 2.1s commit+clear
uptime_now: 6.722256
uptime_now: 25.593952 => 18.9s insertion
uptime_now: 27.724153 => 2.1s commit+clear
Now timing the startup time with a 10M entries file (on another machine)
on master, 20 samples:
Standard Deviation, s: 0.061652677408033
Mean: 4.217
And after this commit:
Standard Deviation, s: 0.081821371548669
Mean: 3.78
Situations where we are iterating over elements and find one with a
different generation ID cannot arise anymore since the elements are kept
per-generation.
Instead of a global list (and tree) of pattern reference elements, we
now have an intermediate pat_ref_gen structure and store the elements in
those. This simplifies the logic of some operations such as commit and
clear, and improves performance in some cases - numbers to be provided
in a subsequent commit after one important optimization is added.
A lot of the changes are due to adding an extra level of indirection,
changing many cases where we iterate over all elements to an outer loop
iterating over the generation and an inner one iterating over the
elements of the current generation. It is therefore easier to read this
patch using 'git diff -w'.
Safe and non-functional changes that only add currently unused
structures, field, functions and macros, in preparation of larger
changes that alter the way pattern reference elements are stored.
This includes code to create and lookup generation objects, and
macros to iterate over the generations of a pattern reference.
Each proxy has its owned task for internal purpose. Currently, it is
only used either by frontends or if a stick-table is present.
This commit rendres the task allocation optional to only the required
case. Thus, it is not allocated anymore for backend only proxies without
stick-table.
A legacy check could be activated at compile time to reject backends
without servers. In practice this is not used anymore and does not have
much sense with the introduction of dynamic servers.
Each frontend/backend/listen proxies is assigned an unique ID. It can
either be set explicitely via 'id' keyword, or automatically assigned on
post parsing depending on the available values.
It was expected that the first automatically assigned value would start
at '1'. However, due to a legacy bug this is not the case as this value
is always skipped. Thus, automatically assigned proxies always start at
'2' or more.
To avoid breaking the current existing state, this situation is now
acknowledged with the current patch. The code is rewritten with an
explicit warning to ensure that this won't be fixed without knowing the
current status. A new regtest also ensures this.
We now count glitches for each parsing error, including those that
have been accepted via accept-unsafe-violations-*. Front and back
are considered and the connection gets killed on error once if the
threshold is reached or passed and the CPU usage is beyond the
configured limit (0 by default). This was tested with:
curl -ivH "host : blah" 0:4445{,,,,,,,,,}
which sends 10 requests to a configuration having a threshold of 5.
The global keywords are named similarly to H2 and quic:
tune.h1.be.glitches-threshold xxxx
tune.h1.fe.glitches-threshold xxxx
The glitches count of each connection is also reported when non-null
in the connection dumps (e.g. "show fd").
This avoids hitting the hard wall for connections with non-compliant
peers that would be accumulating errors over long connections. We now
permit to recycle the connection early enough to reset the connection
counter.
This was tested artificially by adding this to h2c_frt_handle_headers():
h2c_report_glitch(h2c, 1, "new stream");
or this to h2_detach():
h2c_report_glitch(h2c, 1, "detaching");
and injecting using h2load -c 1 -n 1000 0:4445 on a config featuring
tune.h2.fe.glitches-threshold 1000:
finished in 8.74ms, 85802.54 req/s, 686.62MB/s
requests: 1000 total, 751 started, 751 done, 750 succeeded, 250 failed, 250 errored, 0 timeout
status codes: 750 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 6.00MB (6293303) total, 132.57KB (135750) headers (space savings 29.84%), 5.86MB (6144000) data
min max mean sd +/- sd
time for request: 9us 178us 10us 6us 99.47%
time for connect: 139us 139us 139us 0us 100.00%
time to 1st byte: 339us 339us 339us 0us 100.00%
req/s : 87477.70 87477.70 87477.70 0.00 100.00%
The failures are due to h2load not supporting reconnection.
One rare error case could produce a protocol error on the stream when
not being able to decode response headers wasn't being accounted as a
glitch, so let's fix it.
This guarantees that the compiler will not optimize away the memset()
call if it detects a dead store.
Use this to clear SSL passphrases.
No backport needed.
src/cpu_topo.c:1325:15: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
1325 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^ ~
src/cpu_topo.c:1325:15: note: add parentheses after the '!' to evaluate the bitwise operator first
1325 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^
| ( )
src/cpu_topo.c:1325:15: note: add parentheses around left hand side expression to silence this warning
1325 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^
| ( )
src/cpu_topo.c:1533:15: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
1533 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^ ~
src/cpu_topo.c:1533:15: note: add parentheses after the '!' to evaluate the bitwise operator first
1533 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^
| ( )
src/cpu_topo.c:1533:15: note: add parentheses around left hand side expression to silence this warning
1533 | } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
| ^
| ( )
No backport needed.
Add a new cpu-affinity keyword, "per-thread".
If used, each thread will be bound to only one hardware thread of the
thread group.
If used in conjonction with the "threads-per-core 1" cpu_policy, then
each thread will be bound on a different core.