haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-15 16:01:02 +01:00

Author	SHA1	Message	Date
Willy Tarreau	4ef6be4a1f	[RELEASE] Released version 3.2-dev6 Released version 3.2-dev6 with the following main changes : - BUG/MEDIUM: debug: close a possible race between thread dump and panic() - DEBUG: thread: report the spin lock counters as seek locks - DEBUG: thread: make lock time computation more consistent - DEBUG: thread: report the wait time buckets for lock classes - DEBUG: thread: don't keep the redundant _locked counter - DEBUG: thread: make lock_stat per operation instead of for all operations - DEBUG: thread: reduce the struct lock_stat to store only 30 buckets - MINOR: lbprm: add a new callback ->server_requeue to the lbprm - MEDIUM: server: allocate a tasklet for asyncronous requeuing - MAJOR: leastconn: postpone the server's repositioning under contention - BUG/MINOR: quic: reserve length field for long header encoding - BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding - MINOR: quic: simplify length calculation for STREAM/CRYPTO frames - BUG/MINOR: mworker: section ignored in discovery after a post_section_parser - BUG/MINOR: mworker: post_section_parser for the last section in discovery - CLEANUP: mworker: "program" section does not have a post_section_parser anymore - MEDIUM: initcall: allow to register mutiple post_section_parser per section - CI: cirrus-ci: bump FreeBSD image to 14-2 - DOC: initcall: name correctly REGISTER_CONFIG_POST_SECTION() - REGTESTS: stop using truncated.vtc on freebsd - MINOR: quic: refactor STREAM encoding and splitting - MINOR: quic: refactor CRYPTO encoding and splitting - BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals - MINOR: ssl/cli: display more filenames in 'show ssl cert' - DOC: watchdog: document the sequence of the watchdog and panic - MINOR: ssl: store the filenames resulting from a lookup in ckch_conf - MINOR: startup: allow hap_register_feature() to enable a feature in the list - MINOR: quic: support frame type as a varint - BUG/MINOR: startup: leave at first post_section_parser which fails - BUG/MINOR: startup: hap_register_feature() fix for partial feature name - BUG/MEDIUM: cli: Be sure to drop all input data in END state - BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker - BUG/MEDIUM: filters: Handle filters registered on data with no payload callback - BUG/MINOR: fcgi: Don't set the status to 302 if it is already set - MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing - MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt() - MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL - MEDIUM: ssl/crtlist: "crt" keyword in frontend - DOC: configuration: document the "crt" frontend keyword - DEV: h2: add a Lua-based HTTP/2 connection tracer - BUG/MINOR: quic: prevent crash on conn access after MUX init failure - BUG/MINOR: mux-quic: prevent crash after MUX init failure - DEV: h2: fix flags for the continuation frame - REGTESTS: Fix truncated.vtc to send 0-CRLF - BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut - Revert "REGTESTS: stop using truncated.vtc on freebsd" - MINOR: mux-quic: define a QCC application state member - MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler - MINOR: mux-quic/h3: support temporary blocking on control stream sending v3.2-dev6	2025-02-19 18:39:51 +01:00
Amaury Denoyelle	a7645d7cd5	MINOR: mux-quic/h3: support temporary blocking on control stream sending When HTTP/3 layer is initialized via QUIC MUX, it first emits a SETTINGS frame on an unidirectional control stream. However, this could be prevented if client did not provide initial flow control. Previously, QUIC MUX was unable to deal with such situation. Thus, the connection was closed immediately and no transfer could occur. Improve this by extending QUIC MUX application layer API : initialization may now return a transient error. This allows MUX to continue to use the connection normally. Initialization will be retried periodically alter until it can succeed. This new API allows to deal with the flow control issue described above. Note that this patch is not considered as a bug fix. Indeed, clients are strongly advised to provide enough flow control for a SETTINGS frame exchange.	2025-02-19 11:08:02 +01:00
Amaury Denoyelle	06e7674399	MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler Previously, QUIC MUX application layer was installed and initialized via MUX init. However, the latter stage involve I/O operations, for example when using HTTP/3 with the emission of a SETTINGS frame. Change this to prevent any I/O operations during MUX init. As such, finalize app_ops callback is now called during the first invokation of qcc_io_send(), in the context of MUX tasklet. To implement this, a new application state value is added, to detect the transition from NULL to INIT stage.	2025-02-19 11:03:40 +01:00
Amaury Denoyelle	188fc45b95	MINOR: mux-quic: define a QCC application state member Introduce a new QCC field to track the current application layer state. For the moment, only INIT and SHUT state are defined. This allows to replace the older flag QC_CF_APP_SHUT. This commit does not bring major changes. It is only necessary to permit future evolutions on QUIC MUX. The only noticeable change is that QMUX traces can now display this new field.	2025-02-19 10:59:53 +01:00
Christopher Faulet	4a99f15f0c	Revert "REGTESTS: stop using truncated.vtc on freebsd" This reverts commit 0b9a75e8781593c250f6366a64a019018ade688e. Thanks to the previous fixes ("REGTESTS: Fix truncated.vtc to send 0-CRLF" and "BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut"), this script can be reenabled for FreeBSD.	2025-02-18 17:35:00 +01:00
Christopher Faulet	b70921f2c1	BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut On shut, truncated HTX messages were not properly handled by the H2 multiplexer. Depending on how data were emitted, a chunked HTX message without the 0-CRLF could be considered as full and an empty data with ES flag set could be emitted instead of a RST_STREAM(CANCEL) frame. In the H2 multiplexer, when a shut is performed, an HTX message is considered as truncated if more HTX data are still expected. It is based on the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream. However, this flag is set or unset depending on the HTX extra field value. This field is used to state how much data that must still be transferred, based on the announced data length. For a message with a content-length, this assumption is valid. But for a chunked message, it is not true. Only the length of the current chunk is announced. So we cannot rely on this field in that case to know if a message is full or not. Instead, we must rely on the HTX start-line flags to know if more HTX data are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag is set on the HTX start-line), it means that more data are always expected, until the end of message is reached (the HTX_FL_EOM flag is set on the HTX message). This is true for bodyless message because the end of message is reported with the end of headers. This is also true for tunneled messages because the end of message is received before switching the H2 stream in tunnel mode. This patch must be backported as far as 2.8.	2025-02-18 17:34:59 +01:00
Christopher Faulet	b93e419750	REGTESTS: Fix truncated.vtc to send 0-CRLF When a chunked messages is sent, the 0-CRLF must be explicitely sent. Since the begining, it is missing. Just add it.	2025-02-18 17:34:59 +01:00
Willy Tarreau	af5c07eee9	DEV: h2: fix flags for the continuation frame It's flag 2 (end of headers) that's defined there, not 3 (padded).	2025-02-18 14:17:17 +01:00
Amaury Denoyelle	2715dbe9d0	BUG/MINOR: mux-quic: prevent crash after MUX init failure qmux_init() may fail for several reasons. In this case, connection resources are freed and underlying and a CONNECTION_CLOSE will be emitted via its quic_conn instance. In case of qmux_init() failure, qcc_release() is used to clean up resources, but QCC <conn> member is first resetted to NULL, as connection released must be delayed. Some cleanup operations are thus skipped, one of them is the resetting of <ctx> connection member to NULL. This may cause a crash as <ctx> is a dangling pointer after QCC release. One of the possible reproducer is to activate QMUX traces, which will cause a segfault on the qmux_init() error leave trace. To fix this, simply reset <ctx> to NULL manually on qmux_init() failure. This must be backported up to 3.0.	2025-02-18 11:02:46 +01:00
Amaury Denoyelle	2cdc4695cb	BUG/MINOR: quic: prevent crash on conn access after MUX init failure Initially, QUIC-MUX was responsible to reset quic_conn <conn> member to NULL when MUX was released. This was performed via qcc_release(). However, qcc_release() is also used on qmux_init() failure. In this case, connection must be freed via its session, so QCC <conn> member is resetted to NULL prior to qcc_release(), which prevents quic_conn <conn> member to also be resetted. As the connection is freed soon after, quic_conn <conn> is a dangling pointer, which may cause crashes. This bug should be very rare as first it implies that QUIC-MUX initialization has failed (for example due to a memory alloc error). Also, <conn> member is rarely used by quic_conn instance. In fact, the only reproducible crash was done with QUIC traces activated, as in this case connection is accessed via quic_conn under __trace_enabled() function. To fix this, detach connection from quic_conn via the XPRT layer instead of the MUX. More precisely, this is performed via quic_close(). This should ensure that it will always be conducted, either on normal connection closure, but also after special conditions such as MUX init failure. This should be backported up to 2.6.	2025-02-18 10:43:56 +01:00
Willy Tarreau	607aa57b2e	DEV: h2: add a Lua-based HTTP/2 connection tracer The following config is sufficient to trace H2 exchanges between a client and a server: global lua-load "dev/h2/h2-tracer.lua" listen h2_sniffer mode tcp bind :8002 filter lua.h2-tracer #hex server s1 127.0.0.1:8003 The commented "hex" argument will also display full frames in hex (not recommended). The connections are prefixed with a 3-hex digit number in order to also support a bit of multiplexing without impacting the reading too much. The screen is split in two, with the request on the left and the response on the right. Here's an example of what it does between an haproxy backend and an haproxy frontend both in H2, when submitted a curl request for /?s=30k handled by httpterm: [001] ### req start [001] [PREFACE len=24] [001] [SETTINGS sid=0 len=24 (bytes=24)] [001] \| ### res start [001] \| [SETTINGS sid=0 len=18 (bytes=27)] [001] \| [SETTINGS ACK sid=0 len=0 (bytes=0)] [001] [SETTINGS ACK sid=0 len=0 (bytes=56)] [001] [HEADERS EH+ES sid=1 len=47 (bytes=47)] [001] \| [HEADERS EH sid=1 len=101 (bytes=15351)] [001] \| [DATA sid=1 len=15126 (bytes=15241)] [001] \| [DATA sid=1 len=1258 (bytes=106)] [001] \| ... -106 = 1152 [001] \| ... -1152 = 0 [001] [WINDOW_UPDATE sid=1 len=4 (bytes=43)] [001] [WINDOW_UPDATE sid=0 len=4 (bytes=30)] [001] [WINDOW_UPDATE sid=1 len=4 (bytes=17)] [001] [WINDOW_UPDATE sid=0 len=4 (bytes=4)] [001] \| [DATA ES sid=1 len=14336 (bytes=14336)] [001] [WINDOW_UPDATE sid=0 len=4 (bytes=4)] [001] ### req end: 31080 bytes total [001] \| [GOAWAY sid=0 len=8 (bytes=8)] [001] \| ### res end: 31097 bytes total It deserves some improvements. For instance at the moment it does not verify the preface, any 24 bytes will work. It does not perform any protocol validation either. Detecting some issues such as out-of-sequence frames could be helpful. But it already helps as-is.	2025-02-18 09:26:15 +01:00
William Lallemand	764f6910ed	DOC: configuration: document the "crt" frontend keyword Document the "crt" keyword of frontend and listen section.	2025-02-17 18:26:37 +01:00
William Lallemand	cd6a02ace9	MEDIUM: ssl/crtlist: "crt" keyword in frontend This patch implements the "crt" keywords in frontend, declaring an implicit crt-list named after the frontend. The patch is split in two steps: The first step is the crt keyword parser, which parses crt lines and fill a "cfg_crt_node" struct containing a ssl_bind_conf and a ckch_conf which are put in a list to be used later. After parsing the frontend section, as a 2nd step, a post_section_parser is called, it will create a crt-list named after the frontend and will fill it with certificates from the list of cfg_crt_node. Once created this crt-list will be loaded in every "ssl" bind lines that didn't declare any crt or crt-list. Example: listen https bind :443 ssl crt foobar.pem crt test1.net.crt key test1.net.key Implements part of #2854	2025-02-17 18:26:37 +01:00
William Lallemand	82f927817e	MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL ckch_conf_clean() mustn't be executed when the argument is NULL, this will keep the API more consistant like any free() function.	2025-02-17 18:26:37 +01:00
William Lallemand	0330011acf	MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt() Handle the case where crt_path == cc->crt, so the pointer doesn't get free'd before getting strdup'ed in crtlist_load_crt().	2025-02-17 18:26:37 +01:00
William Lallemand	69163cd63e	MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing ckch_conf loading is not that simple as it requires to check - if the cert already exists in the ckchs_tree - if the ckch_conf is compatible with an existing cert in ckchs_tree - if the cert is a bundle which need to load multiple ckch_store This logic could be reuse elsewhere, so this commit introduce the new crtlist_load_crt() function which does that.	2025-02-17 18:26:37 +01:00
Christopher Faulet	ca79ed5eef	BUG/MINOR: fcgi: Don't set the status to 302 if it is already set When a "Location" header was found in a FCGI response, the status code was forced to 302. But it should only be performed if no status code was set first. So now, we take care to not override an already defined status code when the "Location" header is found. This patch should fix the issue #2865. It must backported to all stable versions.	2025-02-17 16:37:53 +01:00
Christopher Faulet	34542d5ec2	BUG/MEDIUM: filters: Handle filters registered on data with no payload callback An HTTP filter with no http_payload callback function may be registered on data. In that case, this filter is obviously not called when some data are received but it remains important to update its internal state to be sure to keep it synchronized on the stream, especially its offet value. Otherwise, the wrong calculation on the global offset may be performed in flt_http_end(), leading to an integer overflow when data are moved from input to output. This overflow triggers a BUG_ON() in c_adv(). The same is true for TCP filters with no tcp_payload callback function. This patch must be backport to all stable versions.	2025-02-17 16:16:29 +01:00
Christopher Faulet	49b7bcf583	BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker On reload, the new worker requests bound FDs to the old one. The old worker sends them in message of at most 252 FDs. Each message is acknowledged by the new worker. All messages sent or received by the old worker are handled manually via sendmsg/recv syscalls. So the old worker must be sure consume all the ACK replies. However, the last one was never consumed. So it was considered as a command by the CLI applet. This issue was hidden since recently. But it was the root cause of the issue #2862. Note this last ack is also the first one when there are less than 252 FDs to transfer. This patch must be backported to all stable versions.	2025-02-17 15:31:07 +01:00
Christopher Faulet	972ce87676	BUG/MEDIUM: cli: Be sure to drop all input data in END state Commit 7214dcd ("BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR") revealed a bug with the CLI applet. Pending input data when the applet is in CLI_ST_END state were never consumed or dropped, leading to a wakeup loop. The CLI applet implements its own snd_buf callback function. It is important it consumes all pending input data. Otherwise, the applet is woken up in loop until it empties the request buffer. Another way to fix the issue would be to report an error. But in that case, it seems reasonnable to drop these data. The issue can be observed on reload, in master/worker mode, because of issue about the last ACK message which was never consummed by the _getsocks() command. This patch should fix the issue #2862. It must be backported to 3.1 with the commit above.	2025-02-17 15:31:07 +01:00
William Lallemand	ab2fa95bdd	BUG/MINOR: startup: hap_register_feature() fix for partial feature name In patch 2fe4cbd8e ("MINOR: startup: allow hap_register_feature() to enable a feature in the list"), the ability to overwrite a '-' in the feature list was added. However the code was not tokenizing correctly the string, and partial feature name found in the name could result in having the same feature name multiple time. This patch rewrites the lookup of the string by tokenizing it correctly.	2025-02-17 14:56:09 +01:00
William Lallemand	7268e9c249	BUG/MINOR: startup: leave at first post_section_parser which fails Since we are now iterating on post_section_parser() for a same keyword, we need to exit at the first ERR_ABORT. The post_section_parser() is called when parsing a new section, but also at the end of the file to be called for the last section. The changes in 4de86bb ("MEDIUM: initcall: allow to register mutiple post_section_parser per section") should have added tests on the ERR_ABORT value. Also pcs->post_section_parser() must be called instead of cs->post_section_parser() because we could have a NULL ptr. This bug does not affect anything since we don't use REGISTER_CONFIG_POST_SECTION() yet.	2025-02-17 11:21:20 +01:00
Amaury Denoyelle	32691e7c25	MINOR: quic: support frame type as a varint QUIC frame type is encoded as a variable-length integer. Thus, 64-bit integer should be used for them. Currently, this was not the case as type was represented as a 1-byte char inside quic_frame structure. This does not cause any issue with QUIC from RFC9000, as all frame types fit in this range. Furthermore, a QUIC implementation is required to use the smallest size varint when encoding a frame type. However, the current code is unable to accept QUIC extension with bigger frame types. This is notably the case for quic-on-streams draft. Thus, this commit readjusts quic_frame architecture to be able to support higher frame type values. First, type field of quic_frame is changed to a 64-bits variable. Both encoding and decoding frame functions uses variable-length integer helpers to manipulate the frame type field. Secondly, the quic_frame builders/parsers infrastructure is still preserved. However, it could be impossible to define new large frame type as an index into quic_frame_builders / quic_frame_parsers arrays. Thus, wrapper functions are now provided to access the builders and parsers. Both qf_builder() and qf_parser() wrappers can then be extended to return custom builder/parser instances for larger frame type. Finally, unknown frame type detection also uses the new wrapper quic_frame_is_known(). As with builders/parsers, for large frame type, this function must be manually completed to support a new type value.	2025-02-14 09:00:05 +01:00
William Lallemand	2fe4cbd8e5	MINOR: startup: allow hap_register_feature() to enable a feature in the list This patch allows hap_register_feature() to enable a feature in the list which was already registered and marked disabled. This way we could enable automatically some features under certain condition without the need of the USE argument with make and correctly report its activation.	2025-02-14 00:09:17 +01:00
William Lallemand	7034f2ca48	MINOR: ssl: store the filenames resulting from a lookup in ckch_conf With this patch, files resulting from a lookup (.key, .ocsp, *.issuer etc) are now stored in the ckch_conf. It allows to see the original filename from where it was loaded in "show ssl cert <filename>"	2025-02-13 17:44:00 +01:00
Willy Tarreau	a4d65c9cc8	DOC: watchdog: document the sequence of the watchdog and panic Each time we go into the watchdog and panic code, it's super hard to figure who calls what since signals are involved to bounce between threads. Let's document the main principles and sequences to ease the journey next time.	2025-02-13 16:45:07 +01:00
William Lallemand	0c0b38d64c	MINOR: ssl/cli: display more filenames in 'show ssl cert' "show ssl cert <file>" only displays a unique filename, which is the key used in the ckch_store tree. This patch extends it by displaying every filenames from the ckch_conf that can be configured with the crt-store. In order to be more consistant, some changes are needed in the future: - we need to store the complete path in the ckch_conf (meaning with crt-path or key-path) - we need to fill a ckch_conf in cases the files are autodiscovered	2025-02-13 16:18:06 +01:00
William Lallemand	5a7cbb8d81	BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals 1d3c8223 ("MINOR: ssl: allow to change the server signature algorithm") mplemented the sigals keyword in the crt-list but never the dump of the keyword over the CLI. Must be backported as far as 2.8.	2025-02-12 17:16:50 +01:00
William Lallemand	037d2e5498	BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals b6ae2aafde43 ("MINOR: ssl: allow to change the signature algorithm for client authentication") implemented the client-sigals keyword in the crt-list but never the dump of the keyword over the CLI. Must be backported as far as 2.8.	2025-02-12 17:16:50 +01:00
Willy Tarreau	561319bd1c	BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED The crappy epoll API stroke again with reloads and transferred FDs. Indeed, when listening sockets are retrieved by a new worker from a previous one, and the old one finally stops listening on them, it closes the FDs. But in this case, since the sockets themselves were not closed, epoll will not unregister them and will continue to report new activity for these in the old process, which can only observe, count an fd_poll_drop event and not unregister them since they're not reachable anymore. The unfortunate effect is that long-lasting old processes are woken up at the same rate as the new process when accepting new connections, and can waste a lot of CPU. Accept rates divided by 8 were observed on a small test involving a slow transfer on 10 connections facing a reload every second so that 10 processes were busy dealing with them while another process was hammering the service with new connections. Fortunately, years ago we implemented a flag FD_CLONED exactly for similar purposes. Let's simply mark transferred FDs with FD_CLONED so that the process knows that these ones require special treatment and have to be manually unregistered before being closed. This does the job fine, now old processes correctly unregister the FD before closing it and no longer receive accept events for the new process. This needs to be backported to all stable versions. It only affects epoll, as usual, and this time in combination with transferred FDs (typically reloads in master-worker mode). Thanks to Damien Claisse for providing all detailed measurements and statistics allowing to understand and reproduce the problem.	2025-02-12 16:35:01 +01:00
Amaury Denoyelle	e2744d23be	MINOR: quic: refactor CRYPTO encoding and splitting This patch is the direct follow-up of the previous one which refactor STREAM frame encoding. Reuse the newly defined quic_strm_frm_fillbuf() and quic_strm_frm_split() functions for CRYPTO frame encoding. The code for CRYPTO and STREAM frames encoding should now be clearer as it is mostly identical.	2025-02-12 15:10:54 +01:00
Amaury Denoyelle	f96af8e463	MINOR: quic: refactor STREAM encoding and splitting CRYPTO and STREAM frames encoding is similar. If payload is too large, frame will be splitted and only the first payload part will be written in the output QUIC packet. This process is complexified by the presence of a variable-length integer Length field prior to the payload. This commit aims at refactor these operations. Define two functions to simplify the code : * quic_strm_frm_fillbuf() which is used to calculate the optimal frame length of a STREAM/CRYPTO frame with its payload in a buffer * quic_strm_frm_split() which is used to split the frame payload if buffer is too small With this patch, both functions are now implemented for STREAM encoding.	2025-02-12 15:10:03 +01:00
William Lallemand	0b9a75e878	REGTESTS: stop using truncated.vtc on freebsd We never succeed to make the truncated.vtc reg-test work constantly on the Cirrus FreeBSD CI. Let's exclude it from the FreeBSD tests so the CI don't break randomly.	2025-02-12 13:34:40 +01:00
William Lallemand	0b47e5fa20	DOC: initcall: name correctly REGISTER_CONFIG_POST_SECTION() REGISTER_CONFIG_POST_SECTION() was not named correctly.	2025-02-12 13:27:44 +01:00
William Lallemand	6097938209	CI: cirrus-ci: bump FreeBSD image to 14-2 FreeBSD CI since to be broken for a while, try to upgrade the image to the latest 14.2 version.	2025-02-12 13:18:55 +01:00
William Lallemand	4de86bbbfc	MEDIUM: initcall: allow to register mutiple post_section_parser per section Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only one callback (<post>) called after the parsing of a section. It was limitating because you couldn't register a post callback from anywhere else in the code. This patch introduces the new REGISTER_CONFIG_SECTION_POST() macros which allows to register a new post callback for a section keyword from anywhere. This patch introduces the feature by allowing `struct cfg_section` entries that does not have a `section_parser`, and then iterating on all cfg_section with a post_section_parser for a keyword.	2025-02-12 12:52:41 +01:00
William Lallemand	5c2039b5b8	CLEANUP: mworker: "program" section does not have a post_section_parser anymore The "program" section does not have a post_section_parser anymore so no need to make an exception for it.	2025-02-12 12:37:01 +01:00
William Lallemand	313eeae7db	BUG/MINOR: mworker: post_section_parser for the last section in discovery Previous patch 2c270a05f ("BUG/MINOR: mworker: section ignored in discovery after a post_section_parser") needs an adjustment for the last section of the file. Indeed the post_section_parser of the last section must not be called in discovery mode. Must be backported in 3.1.	2025-02-12 12:34:57 +01:00
William Lallemand	2c270a05f0	BUG/MINOR: mworker: section ignored in discovery after a post_section_parser When a new section is discovered, the post_section_parser of the previous section is called. However in the new master-worker mode the discovery mode will skip the post_section_parser. But instead of trying to parse the current section keyword after that, it would skip completely the current line. This is a minor bug since there isn't a lot of section with post_section_parser, and not a lot of section to parse in discovery mode. But this could be reproduced like this: global expose-deprecated-directives resolvers res parse-resolv-conf program foo command sleep 10 program bar command sleep 10 Ths 'resolvers' section has a post_section_parser which will be ignored in discovery mode with the consequence of ignoring the first program section. This must be backported in 3.1.	2025-02-12 12:18:17 +01:00
Amaury Denoyelle	731340afbd	MINOR: quic: simplify length calculation for STREAM/CRYPTO frames STREAM and CRYPTO frames have a similar encoding format. In particular, both of them have a variable-length integer Length field just before the frame payload. It is complex to determine the optimal Length value before copying the payload data in the remaining buffer space. As such, helper functions were implemented to calculate this. However, CRYPTO and STREAM frames encoding implementation were not completely aligned, which renders the code harder to follow. The purpose of this commit is to simplify CRYPTO and STREAM frames encoding. First, a new helper quic_int_cap_length() is defined which is useful to determine the optimal buffer room available if prefixed by a variable-length integer as Length field. Then, processing of both CRYPTO and STREAM frames is now nearly identical, based on this new helper function. Functions max_available_room() and max_stream_data_size() are now unused and are removed.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	e6a223542a	BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding Function max_stream_data_size() is used to determine the payload length of a CRYPTO frame. It takes into account that the CRYPTO length field is a variable length integer. Implemented calcul was incorrect as it reserved too much space as a frame header. This error is mostly due because max_stream_data_size() reuses max_available_room() which also reserve space for a variable length integer. This results in CRYPTO frames shorter of 1 to 2 bytes than the maximum achievable value, which produces in the end datagram shorter than the MTU. Fix max_stream_data_size() implementation. It is now merely a wrapper on max_available_room(). This ensures that CRYPTO frame encoding is now properly optimized to use the MTU available. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	63747452a3	BUG/MINOR: quic: reserve length field for long header encoding Long header packets have a mandatory Length field, which contains the size of Packet number and payload, encoded as a variable-length integer. Its value can thus only be determined after the payload size is known, which depends on the remaining buffer space after this variable-length field. Packet payload are encoded in two steps. First, a list of input frames is processed until the packet buffer is full. CRYPTO and STREAM frames payload can be splitted if need to fill the buffer. Real encoding is then performed as a second stage operation, first with Length field, then with the selected frames themselves. Before this patch, no space was reserved in the buffer for Length field when attaching the frames to the packet. This could result in a error as the packet payload would be too large for the remaining space. In practice, this issue was rarely encounted, mostly as a side-effect from another issue linked to CRYPTO frame encoding. Indeed, a wrong calculation is performed on CRYPTO splitting, which results in frame payload shorter by a few bytes than expected. This however ensured there would be always enough room for the Length field and payload during encoding. As CRYPTO frames are the only big enough content emitted with a Long header packet, this renders the current issue mostly non reproducible. Fix the original issue by reserving some space for Length field prior to frame payload calculation, using a maximum value based on the remaining room space. Packet length is then reduced if needed when encoding is performed, which ensures there is always enough room for the selected frames. Note that the other issue impacting CRYPTO frame encoding is not yet fixed. This could result in datagrams with Long header packets not completely extended to the full MTU. The issue will be addressed in another patch. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Willy Tarreau	627280e15f	MAJOR: leastconn: postpone the server's repositioning under contention When leastconn is used under many threads, there can be a lot of contention on leastconn, because the same node has to be moved around all the time (when picking it and when releasing it). In GH issue #2861 it was noticed that 46 threads out of 64 were waiting on the same lock in fwlc_srv_reposition(). In such a case, the accuracy of the server's key becomes quite irrelevant because nobody cares if the same server is picked twice in a row and the next one twice again. While other approaches in the past considered using a floating key to avoid moving the server each time (which was not compatible with the round-robin rule for equal keys), here a more drastic solution is needed. What we're doing instead is that we turn this lock into a trylock. If we can grab it, we do the job. If we can't, then we just wake up a server's tasklet dedicated to this. That tasklet will then try again slightly later, knowing that during this short time frame, the server's position in the queue is slightly inaccurate. Note that any thread touching the same server will also reposition it and save that work for next time. Also if multiple threads wake the tasklet up, then that's fine, their calls will be merged and a single lock will be taken in the end. Testing this on a 24-core EPYC 74F3 showed a significant performance boost from 382krps to 610krps. The performance profile reported by perf top dropped from 43% to 2.5%: Before: Overhead Shared Object Symbol 43.46% haproxy-master-inlineebo [.] fwlc_srv_reposition 21.20% haproxy-master-inlineebo [.] fwlc_get_next_server 0.91% haproxy-master-inlineebo [.] process_stream 0.75% [kernel] [k] ice_napi_poll 0.51% [kernel] [k] tcp_recvmsg 0.50% [kernel] [k] ice_start_xmit 0.50% [kernel] [k] tcp_ack After: Overhead Shared Object Symbol 30.37% haproxy [.] fwlc_get_next_server 2.51% haproxy [.] fwlc_srv_reposition 1.91% haproxy [.] process_stream 1.46% [kernel] [k] ice_napi_poll 1.36% [kernel] [k] tcp_recvmsg 1.04% [kernel] [k] tcp_ack 1.00% [kernel] [k] skb_release_data 0.96% [kernel] [k] ice_start_xmit 0.91% haproxy [.] conn_backend_get 0.82% haproxy [.] connect_server 0.82% haproxy [.] run_tasks_from_lists Tested on an Ampere Altra with 64 aarch64 cores dedicated to haproxy, the gain is even more visible (3.6x): Before: 311-323k rps, 3.16-3.25ms, 6400% CPU Overhead Shared Object Symbol 55.69% haproxy-master [.] fwlc_srv_reposition 33.30% haproxy-master [.] fwlc_get_next_server 0.89% haproxy-master [.] process_stream 0.45% haproxy-master [.] h1_snd_buf 0.34% haproxy-master [.] run_tasks_from_lists 0.32% haproxy-master [.] connect_server 0.31% haproxy-master [.] conn_backend_get 0.31% haproxy-master [.] h1_headers_to_hdr_list 0.24% haproxy-master [.] srv_add_to_idle_list 0.23% haproxy-master [.] http_request_forward_body 0.22% haproxy-master [.] __pool_alloc 0.21% haproxy-master [.] http_wait_for_response 0.21% haproxy-master [.] h1_send After: 1.21M rps, 0.842ms, 6400% CPU Overhead Shared Object Symbol 17.44% haproxy [.] fwlc_get_next_server 6.33% haproxy [.] process_stream 4.40% haproxy [.] fwlc_srv_reposition 3.64% haproxy [.] conn_backend_get 2.75% haproxy [.] connect_server 2.71% haproxy [.] h1_snd_buf 2.66% haproxy [.] srv_add_to_idle_list 2.33% haproxy [.] run_tasks_from_lists 2.14% haproxy [.] h1_headers_to_hdr_list 1.56% haproxy [.] stream_set_backend 1.37% haproxy [.] http_request_forward_body 1.35% haproxy [.] http_wait_for_response 1.34% haproxy [.] h1_send And at similar loads, the CPU usage considerably drops (3.55x), as well as the response time (10x): After: 320k rps, 0.322ms, 1800% CPU Overhead Shared Object Symbol 7.62% haproxy [.] process_stream 4.64% haproxy [.] h1_headers_to_hdr_list 3.09% haproxy [.] h1_snd_buf 3.08% haproxy [.] h1_process_demux 2.22% haproxy [.] __pool_alloc 2.14% haproxy [.] connect_server 1.87% haproxy [.] h1_send > 1.84% haproxy [.] fwlc_srv_reposition 1.84% haproxy [.] run_tasks_from_lists 1.77% haproxy [.] sock_conn_iocb 1.75% haproxy [.] srv_add_to_idle_list 1.66% haproxy [.] http_request_forward_body 1.65% haproxy [.] wake_expired_tasks 1.59% haproxy [.] h1_parse_msg_hdrs 1.51% haproxy [.] http_wait_for_response > 1.50% haproxy [.] fwlc_get_next_server The cost of fwlc_get_next_server() naturally increases as the server count increases, but now has no visible effect on updates. The load distribution remains unchanged compared to the previous approach, the weight still being respected. For further improvements to the fwlc algo, please consult github issue #881 which centralizes everything related to this algorithm.	2025-02-12 11:48:10 +01:00
Willy Tarreau	b6a8318cc2	MEDIUM: server: allocate a tasklet for asyncronous requeuing This creates a tasklet that only expects to be called when the LB algorithm is under contention when trying to reposition the server in its tree. Indeed, that's one of the operations that usually requires to take a write lock on a highly contended area, often for very little benefits under contention; indeed, under load, if a server keeps its previous position for a few extra microseconds, usually there's no harm. Thus this new tasklet can be woken up by the LB algo to ask the server to later call lbprm.server_requeue(). It does nothing else.	2025-02-11 17:24:09 +01:00
Willy Tarreau	20b8c4ddba	MINOR: lbprm: add a new callback ->server_requeue to the lbprm This callback will be used to reposition a server to its expected position regardless of the fact that it was taken or dropped. It will only be used by supporting LB algos. For now, only fwlc defines it and assigns it to fwlc_srv_reposition(). At the moment it's not used yet.	2025-02-11 17:16:14 +01:00
Willy Tarreau	eced1d6d8a	DEBUG: thread: reduce the struct lock_stat to store only 30 buckets Storing only 30 buckets means we only keep 256 bytes per label. This further simplifies address calculation and reduces the memory used without complicating the locking code. It means we won't measure wait times larger than a second but we're not supposed to face this as it would trigger the watchdog anyway. It may become a little bit just if measuring using rdtsc() instead of now_mono_time() though (typically the limit would be around 350ms for a 3 GHz CPU).	2025-02-10 18:34:43 +01:00
Willy Tarreau	c2f2d6fd3c	DEBUG: thread: make lock_stat per operation instead of for all operations It's more convenient (and more readable) to have the lock stats arranged by operation type (read, seek, write). It will also allow to later simplify the structure format and the bucket address calculation. Now lock_stat[] got split into lock_stats_rd[], lock_stats_sk[], lock_stats_wr[].	2025-02-10 18:34:43 +01:00
Willy Tarreau	4168d1278c	DEBUG: thread: don't keep the redundant _locked counter Now that we have our sums by bucket, the _locked counter is redundant since it's always equal to the sum of all entries. Let's just get rid of it and replace its consumption with a loop over all buckets, this will reduce the overhead of taking each lock at the expense of a tiny extra effort when dumping all locks, which we don't care about.	2025-02-10 18:34:43 +01:00
Willy Tarreau	a22550fbd7	DEBUG: thread: report the wait time buckets for lock classes In addition to the total/average wait time, we now also store the wait time in 2^N buckets. There are 32 buckets for each type (read, seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s, which is quite sufficient, even if we'd want to switch from NS to CPU cycles in the future. The counters are only reported for non- zero buckets so as not to visually pollute the output. This significantly inflates the lock_stat struct, which is now aligned to 256 bytes and rounded up to 1kB. But that's not really a problem, given that there's only one per lock label.	2025-02-10 18:34:43 +01:00
Willy Tarreau	0b849c59fb	DEBUG: thread: make lock time computation more consistent The lock time computation was a bit inconsistent between functions, particularly those using a try_lock. Some of them would count the lock as taken without counting the time, others would simply not count it. This is essentially due to the way the time is retrieved, as it was done inside the atomic increment. Let's instead always use start_time to carry the elapsed time, by presetting it to the negative time before the event and addinf the positive time after, so that it finally contains the duration. Then depending on the try lock's success, we add the result or not. This was generalized to all lock functions for consistency, and because this will be handy for future changes.	2025-02-10 18:34:43 +01:00

1 2 3 4 5 ...

23960 Commits