haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-16 03:56:56 +02:00

Author	SHA1	Message	Date
Christopher Faulet	4b8098bf48	MINOR: connection: No longer include stconn type header in connection-t.h It is a small change, but it is cleaner not to include the stconn-t.h header in connection-t.h, mainly to avoid circular definitions. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	33ac3dabcb	MEDIUM: applet: Add a .shut callback function for applets Applets can now define a shutdown callback function, just like the multiplexer. It is especially useful to get the abort reason. This will be pretty handy to get the status code from the SPOP stream and report it at the SPOE filter level. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	1538c4aa82	MEDIUM: proxy/spoe: Add a SPOP mode The SPOE has been significantly lightened. It is now possible to refactor it to use a dedicated multiplexer. The first step is to add a SPOP mode for proxies. The corresponding multiplexer mode is also added. For now, there is no SPOP multiplexer, so it is only declarative. But eventually, the SPOP multiplexer will be automatically selected for servers inside a SPOP backend. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	b986952a75	MINOR: spoe: Remove the dedicated SPOE applet task The dedicated task per SPOE applet is no longer used. So it is removed. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	4e589095d9	MAJOR: spoe: Remove idle applets and pipelining support Management of idle applets is removed. Consequently, the pipelining support is also removed. It is a huge change but it should be transparent for the agents, except regarding performance. Of course, being able to reuse already opened connections and being able to multiplex frames on a given connection is a must-have. These features will be restored later. The hello and idle timeouts are no longer used. Because an applet is spawned to process a NOTIFY frame and closed after receiving the ACK reply, the processing timeout is the only one required. In addition, the parameters to limit the SPOE applet creation are no longer used either. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	2405881ab0	MINOR: spoe: Remove debugging All the SPOE debugging is removed. The code will be easier to rework this way, and the debugging will be mainly moved to the SPOP multiplexer via the trace API. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	d37489abef	MINOR: spoe: Use only a global engine-id per agent Because the async mode was removed, it is no longer mandatory to announce a different engine identifier per thread for a given SPOE agent. This was used to ensure that requests and their corresponding responses stay on the same thread. So, now, a SPOE agent only announces one engine identifier on all connections. No changes should be expected for agents. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	52ad7eb79e	MEDIUM: spoe: Remove async mode support The support for asynchronous mode, the ability to send messages on a connection and receive the responses on any other connection, is removed. It appears this feature was a bit overkill. And it is a problem for this refactoring. This feature is removed and will not be restored at the end. It is not a big deal for agents supporting the async mode because it is only usable if it is announced on both sides. HAProxy no longer announces it. This should be transparent for agents. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	e3c92209f7	MEDIUM: spoe: Remove fragmentation support It is the first patch of a long series to refactor the SPOE filter. The idea is to rely on a dedicated multiplexer instead of hacking HAProxy with a list of applets processing a message queue. First of all, optional features will be removed. Some will be restored at the end, some others will just be removed. This is the case here. The frame fragmentation support is removed. The only purpose of this feature is to support streaming. Because it is out of the scope of this refactoring, the fragmentation is removed. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Christopher Faulet	249a547f37	CLEANUP: stconn: Fix a typo in comments for SE_ABRT_SRC_* Just a little typo: s/set bu/ set by/	2024-07-12 15:27:04 +02:00
Valentine Krasnobaeva	9302869c95	BUG/MINOR: limits: fix license type in limits.h We need to use LGPL-2.1-or-later in headers since our headers default to LGPL.	2024-07-11 18:15:48 +02:00
Amaury Denoyelle	3be58fc720	CLEANUP: quic: rename TID affinity elements This commit is the renaming counterpart of the previous one, this time for the quic_conn module. Several elements related to TID affinity update in quic_conn have been renamed: public functions, but also the flag, renamed to QUIC_FL_CONN_TID_REBIND, and the trace event, renamed to QUIC_EV_CONN_BIND_TID. This should be backported with the same instructions as the previous commit.	2024-07-11 15:14:06 +02:00
Amaury Denoyelle	9fbe8b0334	CLEANUP: proto: rename TID affinity callbacks Since the following patch, the protocol API to update a connection TID affinity has been extended. commit `1a43b9f32c` MINOR: proto: extend connection thread rebind API The single callback set_affinity has been split into 3 different functions which are called at different stages during listener_accept(), depending on whether the accept queue push succeeds or not. However, the naming was rendered confusing by the use of the numeric suffixes 1 and 2. Rename the proto callbacks related to TID affinity update and use the following names: * bind_tid_prep * bind_tid_commit * bind_tid_reset This commit should probably be backported at least up to 3.0 with the above patch. This is because the fix was recently backported and it would allow keeping changes minimal between the two versions. It could even be backported up to 2.8 if there is no major conflict.	2024-07-11 15:14:06 +02:00
Amaury Denoyelle	b0990b38f8	MINOR: quic: add counters of sent bytes with and without GSO Add a sent bytes counter for each quic_conn instance. A secondary field only accounts for bytes sent via GSO, which is useful to check whether GSO is actually activated. For the moment, these counters are reported on "show quic" but not aggregated on proxy quic module stats.	2024-07-11 11:02:44 +02:00
Amaury Denoyelle	d0ea173e35	MEDIUM: quic: implement GSO fallback mechanism UDP GSO on Linux is not implemented in every network device. For example, it is not available for veth devices frequently used in container environments. In such a case, EIO is reported on send() invocation. It is impossible to test for proper GSO support at startup in this case as a listener may be bound on multiple network interfaces. Furthermore, network interfaces may change during the haproxy lifetime. As such, the only option is to react to a send syscall error when GSO is used. The purpose of this patch is to implement a fallback when encountering such conditions. Emission can be retried immediately by trying to send each prepared datagram individually. To support this, qc_send_ppkts() is able to iterate over each datagram in a so-called non-GSO fallback mode. Between each emission, a datagram header is rewritten in front of the buffer, which allows the sending loop to proceed until the last datagram is emitted. To complement this, the quic_conn listener is flagged on the first GSO send error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for all future emissions on QUIC connections using this listener. For the moment, non-GSO fallback mode is activated when EIO is reported after GSO has been set. This is the error reported for the veth usage described above.	2024-07-11 11:02:44 +02:00
Amaury Denoyelle	448d3d388a	MINOR: quic: add GSO parameter on quic_sock send API Add <gso_size> parameter to qc_snd_buf(). When non-zero, this specifies the value for socket option SOL_UDP/UDP_SEGMENT. This allows sending several datagrams in a single call by splitting data at each <gso_size> boundary. For now, <gso_size> remains set to 0 by the caller, so there should not be any functional change.	2024-07-11 11:02:44 +02:00
Amaury Denoyelle	96a34d79d9	MINOR: quic: define quic_cc_path MTU as constant Future commits will implement GSO support to be able to emit multiple datagrams in a single syscall invocation. This will be used every time there is more data to send than the UDP network MTU. No change will be done for Tx buffer encoding, in particular when using extra metadata datagram header. When GSO is used, the length field will contain the total length of all datagrams to emit in a single GSO syscall send. As such, QUIC send functions will detect that GSO is in use if the total length is greater than the MTU. This last assumption requires the MTU to be constant. Indeed, in case qc_send() is interrupted, the Tx buffer will be left with prepared datagrams. These datagrams will be emitted at the next qc_send() invocation. If the MTU changed between these two calls, it would be impossible to know whether GSO was used or not. To prevent this, mark the <mtu> field of quic_cc_path as constant.	2024-07-11 11:02:44 +02:00
Amaury Denoyelle	35470d5185	MINOR: quic: activate UDP GSO for QUIC if supported Add a startup test for GSO support in quic_test_socketopts() and automatically activate it in qc_prep_pkts() when building datagrams as big as the MTU. Also define a new config option tune.quic.disable-udp-gso. This is useful to prevent warnings on older platforms or to debug an issue which may be related to GSO.	2024-07-11 11:02:44 +02:00
Valentine Krasnobaeva	22db643648	MINOR: haproxy: prepare to move limits-related code This patch is done in order to prepare the move of handlers that compute and check process-related limits such as maxconn, maxsock and maxpipes. So, these handlers are no longer static due to the future move. We add the handler declarations in limits.h in this patch as well, in order to keep the next patch, dedicated to moving the code, free of any additional modifications. Such a split also ensures that this patch can be compiled separately from the next one, where the handlers are moved. This is important in case of git-bisect.	2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva	b8dc783eb9	REORG: global: move rlim_fd_*_at_boot in limits Let's move the global variables keeping the initial process fd limits into the 'limits' compilation unit.	2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva	47f2afb436	CLEANUP: fd: rm struct rlimit definition As raise_rlim_nofile() was moved to the limits compilation unit, limits.h includes the system <sys/resource.h>. So, this definition of the rlimit system structure is no longer needed to compile the fd unit.	2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva	3759674047	REORG: fd: move raise_rlim_nofile to limits Let's move raise_rlim_nofile() from 'fd' compilation unit to 'limits', as it wraps setrlimit to change process RLIMIT_NOFILE.	2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva	1517bcb5e3	MINOR: limits: prepare to keep limits in one place The code which gets, sets and checks initial and current fd limits and process-related limits (maxconn, maxsock, ulimit-n, fd-hard-limit) is spread around different functions in haproxy.c and in fd.c. Let's group it together in dedicated limits.c and limits.h. This patch is done in order to prepare moving limits-related functions from different places to the new 'limits' compilation unit. It helps keep the next patch clean, as it will only do the move without any additional modifications. Such a detailed split is needed in order to be sure not to accidentally break the limits logic and in order to be able to compile each commit separately in case of git-bisect.	2024-07-10 18:05:48 +02:00
Willy Tarreau	4e65fc66f6	MAJOR: import: update mt_list to support exponential back-off (try #2 ) This is the second attempt at importing the updated mt_list code (commit 59459ea3). The previous one was attempted with commit `c618ed5ff4` ("MAJOR: import: update mt_list to support exponential back-off") but revealed problems with QUIC connections and was reverted. The problem that was faced was that elements deleted inside an iterator were no longer reset, and that if they were to be recycled in this form, they could appear as busy to the next user. This was trivially reproduced with this: $ cat quic-repro.cfg global stats socket /tmp/sock1 level admin stats timeout 1h limited-quic frontend stats mode http bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3 timeout client 5s stats uri / $ ./haproxy -db -f quic-repro.cfg & $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/ => hang This was purely an API issue caused by the simplified usage of the macros for the iterator. The original version had two backups (one full element and one pointer) that the user had to take care of, while the new one only uses one that is transparent for the user. But during removal, the element still has to be unlocked if it's going to be reused. All of this sparked discussions with Fred and Aurélien regarding the still unclear state of locking. It was found that the lock API does too much at once and is lacking granularity. The new version offers much more fine-grained control, allowing one to selectively lock/unlock an element, a link, the rest of the list etc. It was also found that plenty of places just want to free the current element, or delete it to do anything with it, hence don't need to reset its pointers (e.g. event_hdl). 
Finally it appeared obvious that the root cause of the problem was the unclear usage of the list iterators themselves because one does not necessarily expect the element to be presented locked when not needed, which makes the unlock easy to overlook during reviews. The updated version of the list presents explicit lock status in the macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED suffix, the caller is expected to unlock the element if it intends to reuse it. At least the status is advertised. The _UNLOCKED variant, instead, always unlocks it before starting the loop block. This means it's not necessary to think about unlocking it, though it's obviously not usable with everything. A few _UNLOCKED were used at obvious places (i.e. where the element is deleted and freed without any prior check). Interestingly, the tests performed last year on QUIC forwarding, which resulted in limited traffic for the original version and higher bit rate for the new one, couldn't be reproduced because since then the QUIC stack has gained in efficiency, and the 100 Gbps barrier is now reached with or without the mt_list update. However the unit tests definitely show a huge difference, particularly on EPYC platforms where the EBO provides tremendous CPU savings. Overall, the following changes are visible from the application code: - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED() + 1 back elem - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED() => just manually set iterator to NULL however. For MT_LIST_FOR_EACH_ENTRY_LOCKED() => mt_list_unlock_self() (if element going to be reused) + NULL - MT_LIST_LOCK_ELT => mt_list_lock_full() - MT_LIST_UNLOCK_ELT => mt_list_unlock_full() - l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT(); => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)	2024-07-09 16:46:38 +02:00
Amaury Denoyelle	19b8c1b7cd	DEV: flags/quic: decode quic_conn flags Decode quic_conn flags via the qc_show_flags() function. To support this, the QUIC flag definitions have been moved outside of the USE_QUIC directive.	2024-07-08 09:38:35 +02:00
Amaury Denoyelle	95f624540b	BUG/MEDIUM: quic: prevent crash on accept queue full The handshake for quic_conn instances runs on a single non-chosen thread. On completion, listener_accept() is performed to select the least loaded thread before initializing the connection instance. As such, the quic_conn instance is migrated to the thread with its upper connection. In case the accept queue is full, listener_accept() falls back to local accept mode, which causes the connection to be assigned to the current thread. However, this is not supported by QUIC as the quic_conn instance is left on the previously selected thread. In most cases, this will cause a BUG_ON() due to a task manipulation from an outside thread. To fix this, handle quic_conn thread rebind in multiple steps using the new extended protocol API. Several operations have been moved from qc_set_tid_affinity1() to the newly defined qc_set_tid_affinity2(), in particular the CID TID update. This ensures that the quic_conn instance is not prematurely accessed on the new thread until the accept queue push is guaranteed to succeed. qc_reset_tid_affinity() is also newly defined to reassign the newly created tasks and tasklets to the current thread. This is necessary to prevent the BUG_ON() crash described above. This must be backported up to 2.8 after a period of observation. Note that it depends on the previous patch: MINOR: proto: extend connection thread rebind API	2024-07-04 17:28:56 +02:00
Amaury Denoyelle	1a43b9f32c	MINOR: proto: extend connection thread rebind API MINOR: listener: define callback for accept queue push Extend the connection thread rebind API by replacing the single callback set_affinity with three different ones. Each one of them is used at a different stage of the operation: * set_affinity1 is used similarly to the previous set_affinity * set_affinity2 is called directly from accept_queue_push_mp() when an entry has been found in the accept ring. This operation cannot fail. * reset_affinity is called after set_affinity1 in case of failure from accept_queue_push_mp() due to no space left in the accept ring. This is necessary for protocols which must reconfigure resources before falling back on the current tid. This patch does not have any functional changes. However, it will be required to fix crashes for QUIC connections when the accept queue ring is full. As such, it must be backported with it.	2024-07-04 16:33:21 +02:00
Valentine Krasnobaeva	41275a6918	MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD Let's provide a default value for fd_hard_limit, if it's not set in the configuration. With this patch we can set a specific default via the compile-time variable DEFAULT_MAXFD as well. Hopefully, this will be helpful for haproxy package maintainers. make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000 If haproxy is compiled without DEFAULT_MAXFD defined, the default will be set to 1048576. This is done to avoid the process being killed by its watchdog when it is started without any limitations in its configuration or on the command line and the hard RLIMIT_NOFILE is extremely huge (~1000000000). In this case we use compute_ideal_maxconn() to calculate maxconn and maxsock; maxsock defines the size of the internal fdtab, which becomes very large as well. When the process starts to simply loop over this fdtab (O(n)), this takes a lot of time, so the watchdog does its job. To avoid this, maxconn is now always reduced to some reasonable value either by an explicit global.fd-hard-limit from the configuration, or by its default. The default may be changed at build time and then overwritten by global.fd-hard-limit at runtime. An explicit global.fd-hard-limit from the configuration always has precedence over DEFAULT_MAXFD, if set. Must be backported to all stable versions down to and including v2.6.0.	2024-07-04 07:52:42 +02:00
Amaury Denoyelle	8550549cca	REORG: quic: remove quic_cid_trees reference from proto_quic The previous commit removed access to and manipulation of the QUIC CID global tree outside of the quic_cid module. This ensures that proper locking is always performed. This commit finalizes this cleanup by marking the CID global tree as static to the quic_cid source file only. Initialization of this tree is removed from proto_quic and now performed using the dedicated initcall quic_alloc_global_cid_tree(). As a side change, the CID global tree documentation is completed, in particular to explain the CID global tree artificial splitting and ODCID handling. Overall, the code is now clearer and safer.	2024-07-03 15:02:40 +02:00
Amaury Denoyelle	0a352ef08e	MINOR: quic: remove access to CID global tree outside of quic_cid module haproxy generates a set of CIDs for each QUIC connection. The peer must reuse them as DCID for its emitted packets. On datagram reception, the DCID field serves as the identifier to dispatch them on their correct thread. These CIDs are stored in a global CID tree. Access to this data structure must always be protected with CID_LOCK. This commit is a refactoring to regroup all CID tree accesses in the quic_cid module. Several code parts are adjusted: * quic_cid_insert() is extended to check for an insertion race condition. This is useful on quic_conn instantiation. Code where such a race cannot happen can use the unsafe _quic_cid_insert() instead. * on RETIRE_CONNECTION_ID frame reception, the existing quic_cid_delete() function is used. * remove tree lookup from qc_check_dcid(), extracted in the new quic_cmp_cid_conn() function. Ultimately, the latter should be removed as CID lookup could be conducted on the quic_conn owned tree without locking.	2024-07-03 15:02:40 +02:00
Amaury Denoyelle	a05fefe74d	CLEANUP: quic: cleanup prototypes related to CIDs handling Remove duplicated prototypes from quic_conn.h also present in quic_cid.h. Also remove quic_derive_cid() prototype and mark it as static.	2024-07-03 15:02:40 +02:00
Amaury Denoyelle	789d4abd73	BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid Ensure the :method pseudo-header is only made of valid characters according to RFC 9110. If an invalid value is found, the request is rejected and the stream is reset. Previously only characters forbidden in headers were rejected (NUL/CR/LF), but this is insufficient for :method, where some other forbidden chars might be used to trick a non-compliant backend server into seeing a different path from the one seen by haproxy. Note that header injection is not possible though. This must be backported up to 2.6. Many thanks to Yuki Mogi of FFRI Security Inc for the detailed report that allowed us to quickly spot, confirm and fix the problem.	2024-06-28 14:36:30 +02:00
Willy Tarreau	290659ffd3	MINOR: activity: make the memory profiling hash size configurable at build time The MEMPROF_HASH_BITS variable was set to 10 without a possibility to change it (beyond patching the code). After seeing a few reports already with "other" being listed and a list with close to 1024 entries, it looks like it's about time to either increase the hash size, or at least make it configurable for special cases. As a reminder, in order to remain fast, the algorithm searches no more than 16 places after the hash, so when a table is almost full, searches are long and new places are rare. The present patch just makes it possible to redefine it by passing "-DMEMPROF_HASH_BITS=11" or "-DMEMPROF_HASH_BITS=12" in CFLAGS, and moves the definition to defaults.h to make it easier to find. Such values should be more than sufficient for the vast majority of use cases. Maybe the default will change in the future. At least this version should be backported to ease rebuilds, say, till 2.8 or so.	2024-06-27 18:01:27 +02:00
Valentine Krasnobaeva	5e06d45df7	REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation Let's encapsulate the logic of the 'reload' sockpair and master CLI listeners creation, used by the master CLI, into a separate function, as we need this only in master-worker runtime mode. This makes the code of init() more readable.	2024-06-27 16:08:42 +02:00
Christopher Faulet	ad946a704d	MINOR: stick-table: Always decrement ref count before killing a session Guarded functions to kill a sticky session, stksess_kill() and stksess_kill_if_expired(), may or may not decrement and test its reference counter before really killing it. This depends on a parameter. If it is set to a non-zero value, the ref count is decremented and if it falls to zero, the session is killed. Otherwise, if this parameter is equal to zero, the session is killed, regardless of the ref count value. In the code, these functions are always called with a non-zero parameter and the ref count is always decremented and tested. So, there is no reason to still have a special case. Especially because it is not really easy to say if it is supported or not. Does it mean it is possible to kill a sticky session while it is still referenced somewhere? Probably not. So, does it mean it is possible to kill an unreferenced session? This case may be problematic because the session is accessed outside of any lock and thus may be released by another thread because it is unreferenced. Enlarging the scope of the lock to avoid any issue is possible but it is a bit of a shame to do so because there is no usage for now. The best is to simplify the API and remove this case. Now, the stksess_kill() and stksess_kill_if_expired() functions always decrement and test the ref count before killing a sticky session.	2024-06-26 15:05:06 +02:00
Christopher Faulet	9357873641	BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session When we try to kill a session, the shard must be locked before decrementing the ref count on the session. Otherwise, the ref count can fall to 0 and a purge task (stktable_trash_oldest or process_table_expire) may release the session before we have the opportunity to acquire the lock on the shard to effectively kill the session. This could lead to a double free. Here is the scenario: [Thread 1] stksess_kill(ts): if (ATOMIC_DEC(&ts->ref_cnt) != 0) return; /* here the ref count is 0 */ [Thread 2] stktable_trash_oldest(): LOCK(&sh_lock); if (!ATOMIC_LOAD(&ts->ref_cnt)) __stksess_free(ts); UNLOCK(&sh_lock); /* here the session was released */ [Thread 1] LOCK(&sh_lock); __stksess_free(ts); <--- double free UNLOCK(&sh_lock); The bug was introduced in 2.9 by the commit `7968fe3889` ("MEDIUM: stick-table: change the ref_cnt atomically"). The ref count must be decremented inside the lock for the stksess_kill() and stksess_kill_if_expired() functions. This patch should fix issue #2611. It must be backported as far as 2.9. In 2.9, there is no sharding: the whole table is locked. The patch will have to be adapted.	2024-06-26 12:05:37 +02:00
Frederic Lecaille	bc9821fd26	BUILD: Missing inclusion header for ssize_t type Compilation issue detected as follows by gcc: In file included from src/ncbuf.c:19: src/ncbuf.c: In function 'ncb_write_off': include/haproxy/bug.h:144:10: error: unknown type name 'ssize_t' 144 | extern ssize_t write(int, const void *, size_t); \	2024-06-26 10:17:09 +02:00
Willy Tarreau	2d27c80288	BUILD: debug: also declare strlen() in __ABORT_NOW() Previous commit `8f204fa8ae` ("MINOR: debug: print gdb hints when crashing") broke the CI build, where strlen() isn't known. Let's forward-declare it in the __ABORT_NOW() functions, just like write(). No backport is needed.	2024-06-26 08:04:40 +02:00
Willy Tarreau	8f204fa8ae	MINOR: debug: print gdb hints when crashing To make bug reporting easier for users, when crashing, let's suggest what to do. Typically when a BUG_ON() matches, only the current thread is useful the vast majority of the time, while when the watchdog triggers, all threads are interesting. The messages are printed at the end after the dump. We may adjust these with wiki links in the future if more detailed instructions are relevant.	2024-06-26 07:43:00 +02:00
Valentine Krasnobaeva	2cd52a88be	MINOR: cli/debug: show dev: show capabilities If haproxy is compiled with Linux capabilities support, let's show process capabilities before applying the configuration and at runtime in the 'show dev' command output. This may be useful for debugging purposes, especially when the process changes its UID and GID to non-privileged ones, or when it is started and run under a non-privileged UID with the needed capabilities set by the admin on the haproxy binary.	2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva	0d79c9bedf	MINOR: cli/debug: show dev: add cmdline and version The 'show dev' command is very convenient to obtain haproxy debugging information while the process runs in a container. Let's extend its output with version and cmdline. cmdline is useful as it shows the absolute binary path and its arguments, because sometimes the person debugging a failing container is not the same person who created and deployed it. argc and argv are stored in the exported global structure, because feed_post_mortem() is added as a post check function callback in the post_check_list. So we can't simply change the signature of feed_post_mortem() without breaking other post check callback APIs. Parsers are not supposed to modify argv, so we can safely pass its pointer to debug_parse_cli_show_dev(), without copying all argument strings somewhere in the heap or on the stack.	2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva	fcf1a0bcf5	MINOR: capabilities: export capget and __user_cap_header_struct To be able to show process capabilities before applying its configuration and also at runtime in the 'show dev' command output, we need to export the wrapper around the capget() syscall. It also seems more handy to place __user_cap_header_struct in the .data section and declare it as globally accessible, as we always fill it with the same values. This avoids allocating and filling these 8 bytes on the stack frame each time the capget() or capset() wrappers are called.	2024-06-26 07:38:21 +02:00
Aurelien DARRAGON	9d312212df	BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try) As shown in GH #2608 and ("BUG/MEDIUM: proxy: fix email-alert invalid free"), simply calling free_email_alert() from free_proxy() is not the right thing to do. In this patch, we reuse the proxy->email_alert.set memory space to introduce proxy->email_alert.flags in order to support 2 flags: PR_EMAIL_ALERT_SET (to mimic proxy->email_alert.set) and PR_EMAIL_ALERT_RESOLVED (set once init_email_alert() was called on the proxy to resolve the email_alert.mailer pointer). Thanks to the PR_EMAIL_ALERT_RESOLVED flag, free_email_alert() may now properly handle the freeing of proxy email_alert settings: if the RESOLVED flag is set, then it means the .email_alert.mailers.name parsing hint was replaced by the actual mailers pointer, thus no free should be attempted. No backport needed: as described in ("BUG/MEDIUM: proxy: fix email-alert invalid free"), this historical leak is not sensitive as it cannot be triggered at runtime; given that the fix is not backport-friendly, it's not worth the trouble.	2024-06-17 19:37:29 +02:00
Aurelien DARRAGON	ee8be55942	REORG: mailers: move free_email_alert() to mailers.c free_email_alert() was declared in cfgparse.c, but it should belong to mailers.c instead.	2024-06-17 19:37:29 +02:00
William Lallemand	30a432d198	MINOR: ssl: activate sigalgs feature for AWS-LC AWSLC lacks the SSL_CTX_set1_sigalgs_list define, however the function exists, which disables the feature in HAProxy, even if we could have build with it. SSL_CTX_set1_client_sigalgs_list() is not available, though. This patch introduce the define so the feature is enabled.	2024-06-17 17:40:49 +02:00
Aurelien DARRAGON	983513d901	DEBUG: hlua: distinguish burst timeout errors from exec timeout errors hlua burst timeout was introduced in `58e36e5b1` ("MEDIUM: hlua: introduce tune.lua.burst-timeout"). It is a safety measure that allows to detect when too much time is spent on a single lua execution (between 2 interruptions/yields), meaning that the current thread is not able to perform other tasks. Such scenario should be avoided because it will cause thread contention which may have negative performance impact and could cause the watchdog to trigger. When the burst timeout is exceeded, the current Lua execution is aborted and a timeout error is reported to the user. Unfortunately, the same error is currently being reported for cumulative (AKA execution) timeout and for burst timeout, which may be confusing to the user. Indeed, "execution timeout" error historically results from the current hlua context exceeding the total (cumulative) time it's allowed to run. It is set per lua context using the dedicated tunables: - tune.lua.session-timeout - tune.lua.task-timeout - tune.lua.service-timeout We've already faced an user report where the user was able to trigger the burst timeout and got "Lua task: execution timeout." error while the user didn't set cumulative timeout. Thus the error was actually confusing because it was indeed the burst timeout which was causing it due to the use of cpu-intensive call from within the task without sufficient manual "yield" keypoints around the cpu-intensive call to ensure it runs on a dedicated scheduler cycle. In this patch we make it so burst timeout related errors are reported as "burst timeout" errors instead of "execution timeout" errors (which in fact became the generic timeout errors catchall with `58e36e5b1`). 
To do this, hlua_timer_check() now returns a different value depending on whether the exceeded timeout is the burst one or the cumulative one, which allows us to return either HLUA_E_ETMOUT or HLUA_E_BTMOUT in hlua_ctx_resume(). It should improve the situation described in GH #2356 and may possibly be backported with `58e36e5b1` to improve error reporting if it applies cleanly.	2024-06-14 18:25:58 +02:00
William Lallemand	ee5aa4e5e6	BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0 AWS-LC has a lot of functions that do nothing, which are now deprecated and emit warnings. This patch disables the following useless functions that emit a warning: SSL_CTX_get_security_level(), SSL_CTX_set_tmp_dh_callback(), ERR_load_SSL_strings(), RAND_keep_random_devices_open(). The list of deprecated functions is here: https://github.com/aws/aws-lc/blob/main/docs/porting/functionality-differences.md	2024-06-14 10:41:36 +02:00
William Lallemand	7120c77b14	MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC AWS-LC does not support the SSL_CTX_set_client_hello_cb() function from OpenSSL, which allows analyzing the ciphers and signature algorithms of the ClientHello. However, it supports SSL_CTX_set_select_certificate_cb(), which allows the same thing but is the BoringSSL implementation. This patch uses SSL_CTX_set_select_certificate_cb() as well as the SSL_early_callback_ctx_extension_get() function to get the signature algorithms. This was successfully tested with openssl s_client as well as testssl.sh. This should allow enabling more reg-tests that depend on certificate selection. Requires at least AWS-LC 1.22.0.	2024-06-13 19:36:40 +02:00
William Lallemand	5149cc4990	BUILD: ssl: fix build with wolfSSL Fix the build with wolfSSL, broken since the reorg into src/ssl_clienthello.c.	2024-06-13 17:01:45 +02:00
William Lallemand	4ced880d22	REORG: ssl: move the SNI selection code in ssl_clienthello.c Move the code which is used to select the final certificate with the clienthello callback. ssl_sock_client_sni_pool needs to be exposed outside of ssl_sock.c.	2024-06-13 16:48:17 +02:00
William Lallemand	fc7c5d892b	MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h Add missing ssl_sock_bind_verifycbk() in ssl_sock.h	2024-06-13 16:48:17 +02:00
Aurelien DARRAGON	15e9c7da6b	MINOR: log: add log-profile parsing logic This patch implements the prerequisite log-profile struct and parser logic. It has no effect at runtime for now. Logformat expressions provided in log-profile "steps" are postchecked during postparsing for each proxy "log" directive that makes use of a given profile (this ensures that the logformat expressions used in the profile are compatible with the proxy using them).	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	33f3bec7ee	MINOR: log: add logger flags The logger struct may benefit from having a "flags" struct member to set or remove different logger states. For that, we reuse an existing 4-byte hole in the logger struct to store a 2-byte flags integer, leaving the struct with a 2-byte hole now.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	3102c89dde	MINOR: log: provide proxy context to resolve_logger() Prerequisite work for log-profiles: we need to know under which proxy context the logger is being used. When the info is not available (i.e. global section or log-forward section), <px> is set to NULL.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	8f34320e15	MINOR: log: provide log origin in logformat expressions using '%OG' '%OG' logformat alias may be used to report the log origin (when/where) that triggered log generation using sess_build_logline(). Possible values are: - "sess_error": log was generated during session error handling - "sess_killed": log was generated during session abortion (killed embryonic session) - "txn_accept": log was generated right after frontend conn was accepted - "txn_request": log was generated after client request was received - "txn_connect": log was generated after backend connection establishment - "txn_response": log was generated during server response handling - "txn_close": log was generated at the final txn step, before closing - "unspec": unknown or not specified Documentation was updated.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	b52862d401	MINOR: log: add log_orig_to_str() function Get a human-readable string from log_orig enum members.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	2a91bd52ad	MINOR: log: provide sending log context to process_send_log() when available This is another prerequisite in preparation for log-profiles: in this patch we make process_send_log() aware of the log origin, primarily aiming at sess and txn logging steps such as error, accept, connect and close, as well as of the relevant sess and stream pointers.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	0b7a5a64eb	MEDIUM: log/session: handle embryonic session log within sess_log() Move the embryonic session logging logic down to sess_log() in preparation for log-profiles because then log preferences will be set per logger and not per proxy. Indeed, as each logger may come with its own log-profile that possibly overrides proxy logformat preferences, the check will need to be performed at a central place by lower sending functions. To ensure the change doesn't break existing behavior, a dedicated sess_log_embryonic() wrapper was added and is exclusively used by session_kill_embryonic() to indicate that a special logging logic must be performed under sess_log(). Also, thanks to this change, log-format-sd will now be taken into account for legacy embryonic session logging.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	79a0a7b4d8	MINOR: session: expose session_embryonic_build_legacy_err() function Rename session_build_err_string() to session_embryonic_build_legacy_err() and add a new <out> buffer argument to the prototype. <out> will be used as the destination for the generated string instead of implicitly relying on the trash buffer. Finally, expose the new function through the header file so that it becomes usable from any source file. The function is expected to be called with a session originating from a connection and should not be used for applets.	2024-06-13 15:43:09 +02:00
Aurelien DARRAGON	ee288a4eef	REORG: log: reorder send log helpers by dependency order This commit looks messy, but all it does is reorganize send_log() helpers by dependency order to remove the need for forward-declaring some of them. Also, since they're all internal helpers, let's explicitly mark them as static to prevent any misuse.	2024-06-13 15:43:09 +02:00
Amaury Denoyelle	88681681cc	MINOR: quic: refactor qc_build_pkt() error handling qc_build_pkt() error handling was difficult to follow due to the multiple possible error codes. Improve this by defining a proper enum to describe the various error codes. Also clean up the ending labels inside qc_build_pkt().	2024-06-12 18:05:40 +02:00
Christopher Faulet	9748df29ff	BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego The previous fix (`792a645ec2` ["BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released"]) introduced a regression. The zero-copy data forwarding must only be unblocked if it was blocked by the producer, after a successful negotiation. It is important because during a negotiation, the consumer may be blocked for another reason, because of the flow control for instance. In that case, there is not necessarily a TX buffer, and it is unexpected to try to release an unallocated TX buf. In addition, the same may happen while a TX buf is still in use. In that case, it must also not be released. So testing the TX buffer is not the right solution. To fix the issue, a new IOBUF flag was added (IOBUF_FL_FF_WANT_ROOM). It must be set by the producer if it is blocked after a successful negotiation because it needs more room. In that case, we know a buffer was provided by the consumer. In the done_fastfwd() callback function, it is then possible to safely unblock the zero-copy data forwarding if this flag is set. This patch must be backported to 3.0 with the commit above.	2024-06-05 07:28:10 +02:00
Willy Tarreau	1eb0f22ee1	[RELEASE] Released version 3.1-dev0 Released version 3.1-dev0 with the following main changes : - MINOR: version: mention that it's development again	2024-05-29 15:00:02 +02:00
Willy Tarreau	555772e961	MINOR: version: mention that it's development again This essentially reverts `2e42a19cde`.	2024-05-29 14:59:19 +02:00
Willy Tarreau	2e42a19cde	MINOR: version: mention that it's 3.0 LTS now. The version will be maintained up to around Q2 2029. Let's also update the INSTALL file to mention this.	2024-05-29 14:40:26 +02:00
Willy Tarreau	decb7c90df	CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat Valentine noticed this ugly SSL_CTX_get_tlsext_status_cb() macro definition inside ssl_sock.c that is dedicated to openssl-1.0.2 only. It would be better placed in openssl-compat.h, which is what this patch does. It also adds a missing pair of parentheses and removes an invalid extra semicolon.	2024-05-28 19:17:57 +02:00
Aurelien DARRAGON	435a9da267	MINOR: log: rename 'log-format tag' to 'log-format alias' In 2.9 we started to introduce an ambiguity in the documentation by referring to historical log-format variables ('%var') as log-format tags in `739c4e5b1e` ("MINOR: sample: accept_date / request_date return %Ts / %tr timestamp values") and `454c372b60` ("DOC: configuration: add sample fetches for timing events"). In fact, we've had this confusion between log-format tag and log-format var for more than 10 years now, but 2.9 was the first time the confusion was exposed in the documentation. Indeed, both 'log-format variable' and 'log-format tag' actually refer to the same feature (that is: '%B' and friends that can be used for direct access to some log-oriented predefined fetches instead of using %[expr] with generic sample expressions). This feature was first implemented in `723b73ad75` ("MINOR: config: Parse the string of the log-format config keyword") and later documented in `4894040fa` ("DOC: log-format documentation"). At that time, it was clear that we named it 'log-format variable'. But later the same year, the 'log-format tag' naming started to appear in some commit messages (while still referring to the same feature), for instance with `ffc3fcd6d` ("MEDIUM: log: report SSL ciphers and version in logs using logformat %sslc/%sslv"). Unfortunately, in 2.9, when we added (and documented) new log-format variables, we officially started drifting to the misleading 'log-format tag' naming (perhaps because it was the most recent naming found for this feature in git log history, or because the confusion has always been there). Even worse, in 3.0 this confusion led us to rename all 'var' occurrences to 'tag' in log-format related code to unify the code with the doc. 
Fortunately, William quickly noticed that we made a mistake there, but instead of reverting to the historical naming (log-format variable), it was decided that we must use a different name that is less confusing than 'tags' or 'variables' (tags and variables are keywords that are already used to designate other features in the code and that are not very explicit in the log-format context today). Now we refer to '%B' and friends as a logformat alias, which is essentially a handy way to print some log-oriented information in the log string instead of leveraging '%[expr]' with generic sample expressions made of fetches and converters. Of course, there are some subtleties, such as a few log-format aliases that still don't have a sample fetch equivalent for historical reasons, and some aliases that may be a little faster than their generic sample expression equivalents because most aliases are pretty much hardcoded in the log building function. But in general, logformat aliases should simply be considered as an alternative to using expressions (with '%[expr]'). Also, in the log-format context, when we want to refer to either an alias ('%alias') or an expression ('%[expr]'), we should use the generic term 'logformat item', which in fact designates a single item within the logformat string provided by the user. Indeed, a logformat item (whether it is an alias or an expression) always starts with '%' and may accept optional flags / arguments. Both the code and the documentation were updated in that sense; hopefully this will clarify things and prevent future confusion.	2024-05-27 17:03:48 +02:00
Amaury Denoyelle	47168e217a	MEDIUM: connection: use pool-conn-name instead of sni on reuse Implement pool-conn-name support for idle connection reuse. It replaces SNI as the arbitrary identifier for connections in the idle pool. Thus, every SNI reference in this context has been replaced. The main change occurs in connect_server(), where the pool-conn-name sample fetch is now prehashed to generate the idle connection identifier. SNI is now solely used in the context of SSL for ssl_sock_set_servername().	2024-05-24 14:47:21 +02:00
Amaury Denoyelle	be4f89f2b2	MINOR: server: define pool-conn-name keyword Define a new server keyword pool-conn-name. The purpose of this keyword will be to identify connections inside the idle connections pool, replacing SNI in case SSL is not wanted. This keyword uses a sample expression argument. It can thus reuse the existing function parse_srv_expr() for parsing. In the future, it may be necessary to define a keyword variant which uses a logformat for extensibility. This patch only implements parsing. The argument is stored inside the new server field <pool_conn_name> and the expression is generated in _srv_parse_finalize() into <pool_conn_name_expr>. If pool-conn-name is not set but SNI is, the latter is reused automatically as pool-conn-name via _srv_parse_finalize(). This ensures current reuse behavior remains compatible and idle connection reuse will not mix connections with different SNIs by mistake. The main usage will be for rhttp when SSL is not wanted between the two haproxy instances. Previously, it was possible to use the "sni" keyword even without SSL on a server line, which had a similar effect. However, having a dedicated "pool-conn-name" keyword is deemed clearer. Besides, it would allow for more complex configurations where pool-conn-name and SNI are used in parallel with different values.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	91001422b4	MINOR: server: generalize sni expr parsing Two functions exist for server sni sample expression parsing. This is confusing, so this commit aims at clarifying this. Functions are renamed with the following identifiers. The first function is named parse_srv_expr() and can be used during parsing. Besides expression parsing, it has to ensure sample fetch validity in the context of a server line. The second function is renamed _parse_srv_expr() and is used internally by parse_srv_expr(). It only implements sample parsing without extra checks. It is already used for server instantiation derived from server-template, as checks were already performed. Also, it is now used in http-client code as SNI is a fixed string. Finally, both functions are generalized to remove any reference to SNI. This will allow reusing them to parse other server keywords which use an expression. This will be the case for the future keyword pool-conn-name.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	5764bc50b5	BUG/MINOR: quic: adjust restriction for stateless reset emission Review RFC 9000 and ensure restrictions on Stateless Reset are properly enforced. After careful examination, several changes are introduced. First, redefine the minimal emitted Stateless Reset packet length to 21 bytes (5 random bytes + a token). This is the new default length used in every case, unless the received packet which triggered it is 43 bytes or smaller. Ensure every emitted Stateless Reset packet is at least 1 byte shorter than the received packet which triggered it. No Stateless Reset will be emitted if this falls under the above limit of 21 bytes. Thus, this should prevent looping issues. This should be backported up to 2.6.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	45f40bac4c	MEDIUM: config: prevent communication with privileged ports This commit introduces a new global setting named harden.reject_privileged_ports.{tcp|quic}. When active, communications with clients which use privileged source ports are forbidden. Such behavior is considered suspicious as it can be used for spoofing or DNS/NTP amplification attacks. The value is configured per transport protocol. For TCP and QUIC, distinct code locations are impacted by this setting. The first one is in sock_accept_conn(), which acts as a filter for all TCP-based communications just after accept() returns a new connection. The second one is dedicated to QUIC communication in quic_recv(). In both cases, if a privileged source port is used and the setting is enabled, the received message is silently dropped. By default, protections are disabled for both protocols, so that this can be backported without breaking changes on stable releases. This should be backported as it is an interesting security feature yet relatively simple to implement.	2024-05-24 14:36:31 +02:00
Aurelien DARRAGON	9d37c4b989	DEBUG: tools: add vma_set_name_id() helper Just like vma_set_name() from `51a8f134e` ("DEBUG: tools: add vma_set_name() helper"), but also takes <id> as a parameter to append a "-$id" suffix after the name in order to differentiate 2 areas that were named using the same <type> and <name> combination. Example, using mmap + MAP_SHARED|MAP_ANONYMOUS: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon_shmem:type:name-id] Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using glibc/malloc() above MMAP_THRESHOLD: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon:type:name-id]	2024-05-24 12:07:13 +02:00
Willy Tarreau	381ed2a4dd	MINOR: config: add thread-hard-limit to set an upper bound to nbthread On today's large systems, it's not always desired to run on all threads for light loads, and users usually force nbthread to a lower value (e.g. 8). The problem is that this is a fixed value, and moving such configs to smaller machines continues to enforce the value, which becomes extremely unproductive due to having more threads than CPUs. This also happens quite a bit in VMs, containers, or cloud instances of various sizes. This commit introduces the thread-hard-limit setting, which only sets an upper bound to the number of threads without raising a lower value. This means that using "thread-hard-limit 8" will make sure that no more than 8 threads will be used when available, but the count will remain at two when run on a dual-core machine.	2024-05-24 09:46:49 +02:00
Willy Tarreau	c7335d55f8	BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305 As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's CHACHA20 in-place implementation that makes haproxy discard incoming QUIC packets encrypted with it. It's not very easy to observe the issue because: - QUIC recommends that CHACHA20 is used in priority - on x86 with AES-NI, LibreSSL prefers AES-GCM for performance reasons, so the problem is only observed there if a client explicitly forces TLS_CHACHA20_POLY1305_SHA256 only. - discarded packets cause retransmits showing some apparent activity, and the handshake succeeds, so it's not easy to analyze from the client, which thinks that the server is slow to respond. Thus in practice, on non-x86 machines running LibreSSL, requests made over QUIC freeze for a long time, unless the client explicitly forces algos excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by default on modern OpenBSD systems, and was reported in the issue above for an arm64 machine running OpenBSD -current, and was also observed on a mips64 one running OpenBSD 7.5. There is no simple solution to this problem due to some of the protocol's constraints without digging too low into the stack (and risking breaking more). Here we're taking a pragmatic approach consisting of making the connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected, regardless of the availability of other ciphers. This means that every time a connection would have hung, it will instead fail fast, allowing the client to retry over TLS/TCP. Theo Buehler recommends that we limit this protection to all LibreSSL versions before 4.0 since that's where the fix will be implemented. Older stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled, which should be sufficient to make QUIC work there again as well. 
The following config is sufficient to reproduce the issue (on a non-x86 machine, both arm64 & mips64 were confirmed to reproduce it): global limited-quic frontend stats mode http #bind :8181 #bind :8443 ssl crt rsa+dh2048.pem bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3 timeout client 5s stats uri / And the following commands will trigger the problem on affected LibreSSL versions: curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/ curl -v --http3 -k https://127.0.0.1:8443/ while these must work: curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/ curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/ Normally all of them will work with LibreSSL 4, and only the first one should fail with stable LibreSSL versions higher than 3.9.2. An haproxy version without this workaround will show an unresponsive command after the GET is sent, while a version with the workaround will close the connection on error. On a version with this workaround, if TCP listeners are uncommented, curl will automatically fall back to TCP and attempt the request again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode (hence the limited-quic option above) all of them must work. Many thanks to github user @lgv5 for the detailed report, tests, and for spotting the issue, and to @botovq (Theo Buehler) for the quick analysis, patch and help on this workaround. This needs to be backported to versions 2.6 and above.	2024-05-22 16:22:22 +02:00
Amaury Denoyelle	60496e884e	MINOR: connection: support PROXY v2 TLV emission without stream Update the API for PROXY protocol header encoding. Previously, it required the stream parameter to be set. Change make_proxy_line() and associated functions to add an extra session parameter. This is useful in contexts where no stream is instantiated. For example, this is the case for rhttp preconnect. This change allows extending PROXY v2 TLV encoding. Replace build_logline(), which requires a stream instance, with a direct call to sess_build_logline(). Note that the stream parameter is kept as it is necessary for unique ID encoding. This change has no functional impact for standard connections. However, it is necessary to support TLV encoding on rhttp preconnect.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	12c40c25a9	MEDIUM: rhttp: create session for active preconnect Modify rhttp preconnect by instantiating a new session for each connection attempt. The connection is thus linked to a session directly on its instantiation, contrary to previously where no session existed until listener_accept(). This patch will allow extending rhttp usage. Most notably, it will be useful to use various sample fetches on the server line and extend logging capabilities. Changes are minimal, yet the consequences are considered non-trivial, as for the first time a FE connection session is instantiated before listener_accept(). This requires an extra explicit check in session_accept_fd() to not overwrite an existing session. Also, the flag SESS_FL_RELEASE_LI is not set immediately, as listener counters must not be decremented if the connection and its session are freed before reversal is completed, or else listener counters will be invalid. conn_session_free() is used as the connection destroy callback to ensure the session will be freed automatically on connection release.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	45b80aed70	MINOR: session: define flag to explicitely release listener on free When a session is allocated for a FE connection, session_free() is responsible for calling listener_release() to decrement listener connection counters and resume listening. Until now, the <listener> member of the session was tested inside session_free() before invoking listener_release(). To highlight the relation between sessions and listeners more explicitly, introduce a new flag SESS_FL_RELEASE_LI. Only sessions with this flag set will invoke listener_release() on their cleanup. The flag is set inside session_accept_fd() on success. This patch has no functional change. However, it will be useful to implement session creation for rHTTP preconnect.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	2770ef352e	BUG/MINOR: rhttp: prevent listener suspend Ensure "disable frontend" on a reverse HTTP listener is forbidden by returning -1 from the suspend callback. Suspending such a listener has unknown effects and so is not properly implemented for now. This should be backported up to 2.9.	2024-05-22 10:01:57 +02:00
Valentine Krasnobaeva	5f713c03be	BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server This fixes the fd leak introduced in commit `d3fc982cd7` ("MEDIUM: proto: make common fd checks in sock_create_server_socket"). Initially, sock_create_server_socket() was designed to return only the created socket FD or -1. Its callers from upper protocol layers were required to test the returned errno and then to apply different configuration-related checks to the obtained positive sock_fd. A lot of this code was duplicated among protocol implementations. The new refactored version of sock_create_server_socket() gathers all duplicated checks in one place, but in order to be compliant with upper protocol layers, it needs the 3rd parameter, 'stream_err', in which it sets the Stream Error Flag for upper levels, if the obtained sock_fd has passed all additional checks. No backport needed since this was introduced in 3.0-dev10.	2024-05-21 20:14:05 +02:00
William Lallemand	e6657fd108	MEDIUM: ssl: don't load file by discovering them in crt-store In commit `55e9e9591` ("MEDIUM: ssl: temporarily load files by detecting their presence in crt-store"), ssl_sock_load_pem_into_ckch() was replaced by ssl_sock_load_files_into_ckch() in the crt-store loading. But the side effect was that we always try to autodetect, and this is not what we want. This patch reverts this and adds specific code in the crt-list loading, so we can autodetect in crt-list like it was done before, but still try to load files when a crt-store filename keyword is specified. Example: These crt-list lines won't autodetect files: foobar.crt [key foobar.key issuer foobar.issuer ocsp-update on] .foo.bar foobar.crt [key foobar.key] .foo.bar These crt-list lines will autodetect files: foobar.pem [ocsp-update on] *.foo.bar foobar.pem	2024-05-21 18:30:45 +02:00
Aurelien DARRAGON	51a8f134ef	DEBUG: tools: add vma_set_name() helper Following David Carlier's work in `98d22f21` ("MEDIUM: shctx: Naming shared memory context"), let's provide a helper function to set a name hint on a virtual memory area (i.e. an anonymous map created using mmap(), or a memory area returned by malloc()). Naming will only occur if available, and naming errors will be ignored. The function takes mandatory <type> and <name> parameters to build the map name as follows: "type:name". When looking at /proc/<pid>/maps, VMAs named using this helper function will show up this way (provided that the kernel has prctl support for PR_SET_VMA_ANON_NAME): Example, using mmap + MAP_SHARED|MAP_ANONYMOUS: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon_shmem:type:name] Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using glibc/malloc() above MMAP_THRESHOLD: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon:type:name]	2024-05-21 17:54:58 +02:00
Aurelien DARRAGON	0cfbeb1ae8	BUG/MINOR: ring: free ring's allocated area not ring's usable area when using maps Since `40d1c84bf0` ("BUG/MAJOR: ring: free the ring storage not the ring itself when using maps"), munmap() call for startup_logs's ring and file-backed rings fails to work (EINVAL) and causes memory leaks during process cleanup. munmap() fails because it is called with the ring's usable area pointer which is an offset from the underlying original memory block allocated using mmap(). Indeed, ring_area() helper function was misused because it didn't explicitly mention that the returned address corresponds to the usable storage's area, not the allocated one. To fix the issue, we add an explicit ring_allocated_area() helper to return the allocated area for the ring, just like we already have ring_allocated_size() for the allocated size, and we properly use both the allocated size and allocated area to manipulate them using munmap() and msync(). No backport needed.	2024-05-21 11:42:35 +02:00
William Lallemand	55e9e95914	MEDIUM: ssl: temporarily load files by detecting their presence in crt-store crt-store is maint to be stricter than your common crt argument on a bind line, and is supposed to be a declarative format. However, since the 'ocsp-update' was migrated from ssl_conf to ckch_conf, the .issuer file is not autodetected anymore when adding a ocsp-update keyword in a crt-list file, which breaks retro-compatibility. This patch is a quick fix that will disappear once we are able to be strict on a crt-store and autodetect on a crt-list.	2024-05-17 17:35:51 +02:00
William Lallemand	58103bc8e6	MINOR: ssl: ckch_conf_cmp() compare multiple ckch_conf structures The ckch_conf_cmp() function allow to compare multiple ckch_conf structures in order to check that multiple usage of the same crt in the configuration uses the same ckch_conf definition. A crt-list allows to use "crt-store" keywords that defines a ckch_store, that can lead to inconsistencies when a crt is called multiple time with different parameters. This function compare and dump a list of differences in the err variable to be output as error. The variant ckch_conf_cmp_empty() compares the ckch_conf structure to an empty one, which is useful for bind lines, that are not able to have crt-store keywords. These functions are used when a crt-store is already inialized and we need to verify if the parameters are compatible. ckch_conf_cmp() handles multiple cases: - When the previous ckch_conf was declared with CKCH_CONF_SET_EMPTY, we can't define any new keyword in the next initialisation - When the previous ckch_conf was declared with keywords in a crtlist (CKCH_CONF_SET_CRTLIST), the next initialisation must have the exact same keywords. - When the previous ckch_conf was declared in a "crt-store" (CKCH_CONF_SET_CRTSTORE), the next initialisaton could use no keyword at all or the exact same keywords.	2024-05-17 17:35:51 +02:00
William Lallemand	1bc6e990f2	MEDIUM: ssl/cli: handle crt-store keywords in crt-list over the CLI This patch adds crt-store keywords from the crt-list on the CLI. - keywords from crt-store can be used over the CLI when inserting certificate in a crt-list - keywords from crt-store are dumped when showing a crt-list content over the CLI The ckch_conf_kws.func function pointer needed a new "cli" parameter, in order to differenciate loading that come from the CLI or from the startup, as they don't behave the same. For example it must not try to load a file on the filesystem when loading a crt-list line from the CLI. dump_crtlist_sslconf() was renamed in dump_crtlist_conf() and takes a new ckch_conf parameter in order to dump relevant crt-store keywords.	2024-05-17 17:35:51 +02:00
William Lallemand	2bcf38c7c8	MEDIUM: ssl: add ocsp-update.disable global option This option allow to disable completely the ocsp-update. To achieve this, the ocsp-update.mode global keyword don't rely anymore on SSL_SOCK_OCSP_UPDATE_OFF during parsing to call ssl_create_ocsp_update_task(). Instead, we will inherit the SSL_SOCK_OCSP_UPDATE_* value from ocsp-update.mode for each certificate which does not specify its own mode. To disable completely the ocsp without editing all crt entries, ocsp-update.disable is used instead of "ocsp-update.mode" which is now only used as the default value for crt.	2024-05-17 17:35:51 +02:00
William Lallemand	2e6615b282	MINOR: ssl: ckch_conf_clean() utility function for ckch_conf - ckch_conf_clean() to free() the content of a ckch_conf structure, mostly the strings that were allocated with strdup()	2024-05-17 17:35:51 +02:00
William Lallemand	2b6b7fea58	MINOR: ssl/ocsp: use 'ocsp-update' in crt-store Use the ocsp-update keyword in the crt-store section. It is no longer handled as an exception in the crtlist code. This patch introduces the "ocsp_update_mode" variable in the ckch_conf structure. The SSL_SOCK_OCSP_UPDATE_* enum was changed to a define to match the ckch_conf on/off parser, so that "off" can be -1.	2024-05-17 17:35:51 +02:00
William Lallemand	462e5b0098	MINOR: ssl: handle PARSE_TYPE_INT and PARSE_TYPE_ONOFF in ckch_store_load_files() The callback used by ckch_store_load_files() only works with PARSE_TYPE_STR. This allows using a callback that takes an integer type for PARSE_TYPE_INT and PARSE_TYPE_ONOFF. This requires changing the callback to take a void * in order to pass either a char * or an int depending on the parsing type. The ssl_sock_load_* functions were encapsulated in ckch_conf_load_* functions just to match the type. This will allow handling crt-store keywords that are ONOFF or INT types.	2024-05-17 17:35:51 +02:00
William Lallemand	db09c2168f	CLEANUP: ssl/ocsp: remove the deprecated parsing code for "ocsp-update" Remove the "ocsp-update" keyword handling from the crt-list. The code was implemented as an exception everywhere so that ocsp-update could be activated for an individual certificate. The feature will still exist but will be parsed as a "crt-store" keyword, which will still be usable in a "crt-list". This will appear in future commits. This commit also disables the reg-tests for now.	2024-05-17 17:35:51 +02:00
William Lallemand	d616932076	MEDIUM: ssl/crtlist: loading crt-store keywords from a crt-list This patch allows the use of "crt-store" keywords from a "crt-list". The crtstore_parse_load() function was split into 2 functions, so the keyword parsing is done in ckch_conf_parse(). With this patch, crts are loaded with ckch_store_new_load_files_conf() or ckch_store_new_load_files_path() depending on whether or not there is a "crt-store" keyword. More checks need to be done on "crt" bind keywords to ensure that keywords are compatible. This patch does not introduce the feature on the CLI.	2024-05-17 17:35:51 +02:00
William Lallemand	8526d666d2	MINOR: ssl: ckch_store_new_load_files_conf() loads filenames from ckch_conf ckch_store_new_load_files_conf() is the equivalent of new_ckch_store_load_files_path() but instead of trying to find the files using a base filename, it will load them from a list of files.	2024-05-17 17:35:51 +02:00
Christopher Faulet	1a2699d5f7	CLEANUP: mux-h1: Remove unused H1S_F_ERROR_MASK mask value This mask value is unused, so we can safely remove it. This is fortunate, because its value was wrong. But there is no bug here, even in stable versions, because it is not used in any version.	2024-05-17 16:33:53 +02:00
Christopher Faulet	071057d112	REORG: mux-h1: Group H1S_F_BODYLESS_* flags To ease reading of H1S flags, H1S_F_BODYLESS_REQ and H1S_F_BODYLESS_RESP flags are grouped.	2024-05-17 16:33:53 +02:00
Christopher Faulet	8e55d29109	MINOR: mux-h1: Add a flag to ignore the request payload There was a flag to skip the response payload on output, if any, by stating it is bodyless. It is used for responses to HEAD requests or for 204/304 responses. This allows rewrites during analysis. For instance, a HEAD request can be rewritten to a GET request for any reason (e.g., a server not supporting HEAD requests). In this case, the server will send a response with a payload. On the frontend side, the payload will be skipped and a valid response (without payload) will be sent to the client. With this patch we introduce the corresponding flag for the request. It will be used to skip the request payload. In addition, when the payload must be skipped for a request or a response, zero-copy data forwarding is now disabled.	2024-05-17 16:33:53 +02:00
Willy Tarreau	0999e3d959	CLEANUP: compat: make the MIN/MAX macros more reliable After every release we say that MIN/MAX should be changed to be an expression that only evaluates each operand once, and before every version we forget to change it and we recheck that the code doesn't misuse them. Let's fix them now.	2024-05-17 15:57:18 +02:00
Willy Tarreau	ea3b89952d	BUILD: stick-tables: better mark the stktable_data as 32-bit aligned Aurélien reported that clang's build was broken by the recent fix `845fb846c7` ("BUG/MEDIUM: stick-tables: properly mark stktable_data as packed"), because it now wants to use a helper for some atomic ops (to increment std_t_uint). While it makes no sense to do something that slow on modern architectures like x86 and arm64, which are fine with unaligned accesses, we can simply mark the struct as aligned to its smallest element, which is 32-bit (but still packed). With this, it was verified that it is enough for clang to see that its 32-bit operations will always be aligned, while making 64-bit operations safe on 64-bit platforms that do not support unaligned accesses. This should be backported wherever the patch above is backported.	2024-05-17 11:00:45 +02:00
Amaury Denoyelle	216f70f989	MINOR: mux-quic: support glitches Implement basic support for glitches in the QUIC multiplexer. This is mostly identical to glitches for HTTP/2. A new configuration option named tune.quic.frontend.glitches-threshold is defined to limit the number of glitches on a connection before closing it. The glitch counter is incremented via qcc_report_glitch(). A new qcc_app_ops callback <report_susp> is defined. When the threshold is reached, it allows setting an application error code to close the connection. For HTTP/3, the value H3_EXCESSIVE_LOAD is returned. If not defined, the default code INTERNAL_ERROR is used. For the moment, no glitches are reported for QUIC or HTTP/3 usage. This will be added in future patches as needed.	2024-05-16 10:58:20 +02:00
Amaury Denoyelle	e094412337	MINOR: h3/qpack: adjust naming for errors Rename enum values used for HTTP/3 and QPACK RFC-defined codes. HTTP/3 values now use the H3_ERR_* prefix, which distinguishes them from the QPACK ones. Also separate QPACK values into a new dedicated enum qpack_err. This is deemed cleaner.	2024-05-16 10:31:17 +02:00
Amaury Denoyelle	2dabcf30be	MINOR: qpack: prepare error renaming There are two distinct enums, both related to QPACK error management. The first one is dedicated to RFC-defined codes. The other one is a set of internal values returned by qpack_decode_fs(). Issues were discovered recently due to the confusion between them. Rename internal values with the prefix QPACK_RET_. The older name QPACK_ERR_ will be used in a future commit for the first enum.	2024-05-16 10:31:17 +02:00
Willy Tarreau	b0349cf2de	MINOR: dynbuf: provide a b_dequeue() variant for multi-thread In order to forcefully unregister a buffer waiter during an inter-thread takeover under isolation, we'll need the function to work without th_ctx, using the target thread's ctx instead. Let's implement this by passing the target thread as an argument. Now b_dequeue() simply calls this one with tid. This is acceptable since it's not on a critical path, especially since the list has been checked for existence before performing the call.	2024-05-15 19:37:12 +02:00
Willy Tarreau	845fb846c7	BUG/MEDIUM: stick-tables: properly mark stktable_data as packed The stktable_data union is made of types of varying sizes, and depending on which types are stored in a table, some offsets might not necessarily be aligned. This results in a bus error for certain regtests (e.g. lb-services) on MIPS64. This bug may impact MIPS64, SPARC64, armv7 when accessing a 64-bit counter (e.g. bytes) and depending on how the compiler emitted the operation, and cause a trap that's emulated by the OS on RISCV (heavy cost). x86_64 and armv8 are not affected at all. Let's properly mark the struct with __attribute__((packed)) so that the compiler emits the suitable unaligned-compatible instructions when accessing the fields. This should be backported to all versions where it applies.	2024-05-15 19:03:18 +02:00
Willy Tarreau	276cdc11e8	BUG/MEDIUM: htx: mark htx_sl as packed since it may be realigned A test on MIPS64 revealed that the following reg tests would all fail at the same place in htx_replace_stline() when updating parts of the request line: reg-tests/cache/if-modified-since.vtc reg-tests/http-rules/h1or2_to_h1c.vtc reg-tests/http-rules/http_after_response.vtc reg-tests/http-rules/normalize_uri.vtc reg-tests/http-rules/path_and_pathq.vtc While the status line is normally aligned since it's the first block of the HTX, it may become unaligned once replaced. The problem is, it is a structure which contains some u16 and u32, and dereferencing them on machines not natively supporting unaligned accesses makes them crash or handle crap. Typically, MIPS/MIPS64/SPARC will crash, ARMv5 will either crash or (more likely) return swapped values and do crap, and RISCV will trap and turn to slow emulation. We can assign the htx_sl struct the packed attribute, but then this also causes the ints to fill the 2-bytes gap before them, always causing unaligned accesses for this part on such machines. The patch does a bit better, by explicitly filling this two-bytes hole, and packing the struct. This should be backported to all versions.	2024-05-15 19:03:17 +02:00
Amaury Denoyelle	86aafd0236	BUG/MINOR: qpack: fix error code reported on QPACK decoding failure qpack_decode_fs() is used to decode QPACK field sections during HTTP/3 headers parsing. Its return value is inconsistent, as it returns either QPACK_DECOMPRESSION_FAILED defined in RFC 9204 or any other internal value defined in qpack-dec.h. On failure, this return code is reused by the HTTP/3 layer to be reported via a CONNECTION_CLOSE frame. This is incorrect if an internal error value was reported, as it is not defined by any specification. Fix the return values of qpack_decode_fs() in two ways. Firstly, replace invalid usages of QPACK_DECOMPRESSION_FAILED, when the decoded content is too large, with the correct internal error QPACK_ERR_TOO_LARGE. Secondly, adjust the qpack_decode_fs() API to return only internal code values. A new internal enum value QPACK_ERR_DECOMP is defined to replace QPACK_DECOMPRESSION_FAILED. The caller is responsible for converting it to a suitable error value. For other internal values, H3_INTERNAL_ERROR is used. This is done through a set of conversion functions. This should be backported up to 2.6. Note that trailers are not supported in 2.6, so the chunk related to h3_trailers_to_htx() can be safely skipped.	2024-05-15 16:07:15 +02:00
Willy Tarreau	fc792694a6	MEDIUM: dynbuf: use emergency buffers upon failed memory allocations Now, if a pool_alloc() fails for a buffer and if conditions are met based on the queue number, we'll try to get an emergency buffer. Thanks to this the situation is way more stable now. With only 4 reserve buffers and 1 buffer it's possible to reliably serve 500 concurrent end-to-end H1 connections and consult stats in parallel in loops showing the growing number of buf_wait events in "show activity" without facing an instant stall like in the past. Lower values still cause quick stalls though. It's also apparent that some subsystems do not seem to detach from the buffer_wait lists when leaving. For example several crashes in the H1 part showed list elements still present after a free(), so maybe some operations performed inside h1_release() after the b_dequeue() call can sometimes result in a new allocation. Same for streams, where the dequeue is done relatively early.	2024-05-10 17:18:13 +02:00
Willy Tarreau	0ce51dc93b	MEDIUM: dynbuf: implement emergency buffers The buffer reserve set by tune.buffers.reserve has long been unused, and in order to deal gracefully with failed memory allocations we'll need to resort to a few emergency buffers that are pre-allocated per thread. These buffers are only for emergency use, so every time their count is below the configured number a b_free() will refill them. For this reason their count can remain pretty low. We changed the default number from 2 to 4 per thread, and the minimum value is now zero (e.g. for low-memory systems). The tune.buffers.limit setting has always been a problem when trying to deal with the reserve but now we could simplify it by simply pushing the limit (if set) to match the reserve. That was already done in the past with a static value, but now with threads it was a bit trickier, which is why the per-thread allocators increment the limit on the fly before allocating their own buffers. This also means that the configured limit is saner and now corresponds to the regular buffers that can be allocated on top of emergency buffers. At the moment these emergency buffers are not used upon allocation failure. The only reason is to ease bisecting later if needed, since this commit only has to deal with resource management.	2024-05-10 17:18:13 +02:00
Willy Tarreau	5b8d27617f	MEDIUM: channel: allocate without queuing when retrying Now when trying to allocate a channel buffer, we can check if we've been notified of availability via the producer stream connector callback, in which case we should not consult the queue, or if we're doing a first allocation and check the queue.	2024-05-10 17:18:13 +02:00
Willy Tarreau	f552f79ba5	MINOR: mux-h1: report that a buffer allocation succeeded When the buffer allocation callback is notified of a buffer availability, it will now set a MAYALLOC flag in addition to clearing the ALLOC one, for each of the 3 levels where we may fail an allocation. The flag will be cleared upon a successful allocation. This will soon be used to decide to re-allocate without waiting again in the queue. For now it has no effect. There's just a trick, we need to clear the various *_ALLOC flags before testing h1_recv_allowed() otherwise it will return false!	2024-05-10 17:18:13 +02:00
Willy Tarreau	cb2d758043	MINOR: applet: report about buffer allocation success When appctx_buf_available() is called, it now sets APPCTX_FL_IN_MAYALLOC or APPCTX_FL_OUT_MAYALLOC depending on the reportedly permitted buffer allocation, and these flags are cleared when the said buffers are allocated. For now they're not used for anything else.	2024-05-10 17:18:13 +02:00
Willy Tarreau	17d8916bb1	MINOR: stream: report that a buffer allocation succeeded When the buffer allocation callback is notified of a buffer availability, it will now set a MAYALLOC flag on the stream so that the stream knows it is allowed to bypass the queue checks. For now this is not used.	2024-05-10 17:18:13 +02:00
Willy Tarreau	7aff64518c	MINOR: stconn: report that a buffer allocation succeeded We used to have two states for the channel's input buffer used by the SC, NEED_BUFF or not, flipped by sc_need_buff() and sc_have_buff(). We want to have a 3rd state, indicating that we've just got a desired buffer. Let's add an HAVE_BUFF flag that is set by sc_have_buff() and that is cleared by sc_used_buff(). This way by looking at HAVE_BUFF we know that we're coming back from the allocation callback and that the offered buffer has not yet been used.	2024-05-10 17:18:13 +02:00
Willy Tarreau	d1eb48a12b	MEDIUM: dynbuf: refrain from offering a buffer if more critical ones are waiting Now b_alloc() will check the queues at the same and higher criticality levels before allocating a buffer, and will refrain from allocating one if these are not empty. The purpose is to put some priorities in the allocation order so that most critical allocators are offered a chance to complete. However in order to permit a freshly dequeued task to allocate again while siblings are still in the queue, there is a special DB_F_NOQUEUE flag to pass to b_alloc() that will take care of this special situation.	2024-05-10 17:18:13 +02:00
Willy Tarreau	4a42af1744	MINOR: applet: adjust the allocation criticity based on the requested buffer When we want to allocate an in buffer, it's in order to pass data to the applet, that will consume it, so it must be seen as the same as a send() from the higher level, i.e. MUX_TX. And for the outbuf, it's a stream endpoint returning data, i.e. DB_SE_RX.	2024-05-10 17:18:13 +02:00
Willy Tarreau	4ffb3b5ebe	MINOR: applet: set the blocking flag in the buffer allocation function Instead of having each caller of appctx_get_buf() think about setting the blocking flag, better have the function do it, since it's already handling the queue anyway. This way we're sure that both are consistent.	2024-05-10 17:18:13 +02:00
Willy Tarreau	f5566afec6	MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait Now thanks to this the bufq_map field is expected to remain accurate.	2024-05-10 17:18:13 +02:00
Willy Tarreau	f70bd5fad1	MINOR: dynbuf: provide a b_dequeue() function to detach a bw from the queue Now that we need to keep the bitmap in sync with the list heads, we don't want tasks to leave just doing a LIST_DEL_INIT() without updating the map. Let's provide a b_dequeue() function for that purpose. The function detects when it's going to remove the last element and figures the queue number based on the pointer since it points to the root. It's not used yet.	2024-05-10 17:18:13 +02:00
Willy Tarreau	53461e4d94	CLEANUP: tinfo: better align fields in thread_ctx The introduction of buffer_wq[] in thread_ctx pushed a few fields around and the cache line alignment is less satisfying. And more importantly, even before this, all the lists in the local parts were 8-aligned, with the first one split across two cache lines. We can do better: - sched_profile_entry is not atomic at all, the data it points to is atomic so it doesn't need to be in the atomic-only region, and it can fill the 8-hole before the lists - the align(2*void) that was only before tasklets[] moves before all lists (and it's a nop for now) This now makes the lists and buffer_wq[] start on a cache line boundary, leaves 48 bytes after the lists before the atomic-only cache line, and leaves a full cache line at the end for 128-alignment. This way we still have plenty of room in both parts with better aligned fields.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a5d6a79986	MEDIUM: dynbuf: make the buffer_wq an array of list heads Let's turn the buffer_wq into an array of 4 list heads. These are chosen by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level into one of these 4 queues. The goal here clearly is to make it possible to wake up the most critical queues in priority in order to let some tasks finish their job and release buffers that others can use. In order to avoid having to look up all queues, a bit map indicates which queues are in use, which also avoids looping in the most common case where queues are empty.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a214197ce7	MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere The code places that were used to manipulate the buffer_wq manually now just call b_queue() or b_requeue(). This will simplify the multiple list management later.	2024-05-10 17:18:13 +02:00
Willy Tarreau	d1c2f325a2	MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields When failing an allocation we always do the same dance, add the buffer_wait struct to a list if it's not, and return. Let's just add dedicated functions to centralize this, this will be useful to implement a bit more complex logic. For now they're not used.	2024-05-10 17:18:13 +02:00
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function that tries to allocate and otherwise subscribes. There's currently no distinction of direction nor of the part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Christopher Faulet	eca9831ec8	MINOR: muxes: Add ctl commands to get info on streams for a connection There are 2 new ctl commands that may be used to retrieve the current number of streams opened for a connection and its limit (the maximum number of streams a mux connection supports). For the PT and H1 muxes, the limit is always 1 and the current number of streams is 0 for idle connections, otherwise 1 is returned. For the H2 and the FCGI muxes, the info is already available in the mux connection. For the QUIC mux, the limit is also directly available. It is the maximum initial sub-ID of bidirectional streams allowed for the connection. For the current number of streams, it is the number of SCs attached to the connection plus the number of not-yet-attached streams present in the "opening_list" list.	2024-05-06 22:00:00 +02:00
Christopher Faulet	96f8b7ad08	MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes A reason is now passed as a parameter to mux shutdowns to pass additional info about the abort, if any. No info means no abort or only a generic one. For now, the reason is composed of two 32-bit integers. The first one represents the abort code and the other one represents the info about the code (for instance the source). The code should be interpreted according to the associated info. One info is the source, encoded on 5 bits. Other bits are reserved for now. For now, the muxes are the only supported source. But we can imagine extending it to applets, streams, health-checks... The current design is quite simple and will most probably evolve. But the idea is to let the opposite side forward some errors and let a mux know why its stream was aborted. At first glance, an abort reason must only be evaluated if the SE_SHW_SILENT flag is set. The main short-term goal is to forward some H2 RST_STREAM codes because it is mandatory for gRPC applications, mainly to forward gRPC cancellation from an H2 client to an H2 server. But we can imagine altering this reason at the application level to enrich it. It would also be used to report more accurate errors in logs.	2024-05-06 22:00:00 +02:00
Aurelien DARRAGON	48e0efb00b	MEDIUM: log: optimizing tmp->type handling in sess_build_logline() Instead of chaining 2 switchcases and performing encoding checks for all nodes let's actually split the logic in 2: first handle simple node types (text/separator), and then handle dynamic node types (tag, expr). Encoding options are only evaluated for dynamic node types. Also, last_isspace is always set to 0 after next_fmt label, since next_fmt label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR. Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now unused, let's get rid of it. No functional change should be expected. (Use diff -w to check patch changes since reindentation makes the patch look heavy, but in fact it remains fairly small)	2024-05-03 16:48:21 +02:00
Ilia Shipitsin	a65c6d3574	CLEANUP: assorted typo fixes in the code and comments This is the 42nd iteration of typo fixes	2024-05-03 09:01:36 +02:00
Amaury Denoyelle	53782b9ea5	MINOR: stats: extract proxy clear-counter in a dedicated function Split code related to proxies list looping in cli_parse_clear_counters() to a new dedicated function. This function is placed in the new module stats-proxy.	2024-05-02 16:43:26 +02:00
Amaury Denoyelle	f0644d1bd7	REORG: stats: define stats-proxy source module Create a new module stats-proxy. Move stats functions related to proxies list looping into it. This halves the size of the stats source file.	2024-05-02 16:42:36 +02:00
William Lallemand	964f093504	CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path() Rename the new_ckch_store_load_files_path() function to ckch_store_new_load_files_path(), in order to be more consistent.	2024-05-02 16:03:20 +02:00
Amaury Denoyelle	10ab56831e	MINOR: stats: convert age as generic column for proxy stat Convert FN_AGE in stat_cols_px[] to generic columns. These values will be automatically used for dump/preload of a stats-file. Remove the srv_lastsession() / be_lastsession() functions, which are now useless as last_sess is calculated via me_generate_field().	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	634cc2a5d8	MINOR: counters: move last_change into counters struct last_change was a member present in both proxy and server struct. It is used as an age statistics to report the last update of the object. Move last_change into fe_counters/be_counters. This is necessary to be able to manipulate it through generic stat column and report it into stats-file. Note that there is a change for proxy structure with now 2 different last_change values, on frontend and backend side. Special care was taken to ensure that the value is initialized only on the proxy side. The other value is set to 0 unless a listen proxy is instantiated. For the moment, only backend counter is reported in stats. However, with now two distinct values, stats could be extended to report it on both side.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	fec2ae9b76	MINOR: stats: support rate in stats-file Implement support for the FN_RATE stat column in stats-file. For the output part, only a minimal change is required. Reuse the function read_freq_ctr() to print the same value in both stats output and stats-file dump. For counter preloading, define a new utility function preload_freq_ctr(). This can be used to initialize a freq-ctr type by preloading the previous period value. Reuse this function in load_ctr() during stats-file parsing. At the moment, no rate column is defined as generic. Thus, this commit has no functional change. This will be changed as soon as FN_RATE columns are converted to generic columns.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	639e73f8f2	MINOR: counters: move freq-ctr from proxy/server into counters struct Move freq-ctr defined in proxy or server structures into their dedicated fe_counters/be_counters struct. Functionally, no change here. This commit will allow converting rate stats columns to generic ones, which is mandatory to manipulate them in the stats-file.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	4e9e841878	MINOR: stats: prepare stats-file support for values other than FN_COUNTER Currently, only FN_COUNTER values are dumped and preloaded via a stats-file. Thus, in several places we relied on the assumption that only FN_COUNTER values are valid in stats-file context. New stats types will soon be implemented, as they are also eligible for statistics reloading on process startup. Thus, prepare stats-file functions by removing any FN_COUNTER restriction. As part of this change, generate_stat_tree() now uses stcol_is_generic() for stats name tree indexing before stats-file parsing. Also related to stats-file parsing, the individual counter preloading step has been extracted from line parsing into a dedicated new function load_ctr(). This will allow extending it to support multiple mechanisms of counter preloading depending on the stats type.	2024-05-02 10:55:25 +02:00
Valentine Krasnobaeva	5cbb278fae	MINOR: capabilities: add cap_sys_admin support If the 'namespace' keyword is used in the backend server settings and/or in the bind string, it means that the haproxy process will call setns() to change its default namespace to the configured one and then create a socket in this new namespace. The setns() syscall requires the CAP_SYS_ADMIN capability in the process Effective set (see man 2 setns). Otherwise, the process must be run as root. To avoid running haproxy as root, let's add the cap_sys_admin capability in the same way as we already added support for some other network capabilities. As CAP_SYS_ADMIN belongs to the CAP_SYS_* capabilities type, let's add a separate flag LSTCHK_SYSADM for it. This flag is set if the 'namespace' keyword was found during configuration parsing. The flag may be unset only in prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which inspect the process EUID/RUID and the Effective and Permitted capability sets. If the system doesn't support Linux capabilities or 'cap_sys_admin' was not set in 'setcap', but the 'namespace' keyword is present in the configuration, we keep the previous strict behaviour: a process that has changed uid to a non-privileged user will terminate with an alert. This alert invites the user to recheck its configuration. When haproxy starts and runs under a non-root user and 'cap_sys_admin' is not set, but the 'namespace' keyword is present, this patch does not change the previous behaviour either. We still let the user try its configuration, but we warn that unexpected things, like socket creation errors, may occur.	2024-04-30 21:40:17 +02:00
Valentine Krasnobaeva	d3fc982cd7	MEDIUM: proto: make common fd checks in sock_create_server_socket quic_connect_server(), tcp_connect_server() and uxst_connect_server() duplicate the same code to check the different ERRNOs that socket() and setns() may return. They also duplicate some runtime condition checks applied to the obtained server socket fd. So, in order to remove these duplications and to improve code readability, let's encapsulate socket() and setns() ERRNO handling in sock_handle_system_err(). It must be called just before the fd's runtime condition checks, which we also move into sock_create_server_socket for the same reason.	2024-04-30 21:39:24 +02:00
Valentine Krasnobaeva	772d070ab5	MINOR: sock_set_mark: take sock family in account SO_MARK, SO_USER_COOKIE, SO_RTABLE socket options (used to set the special mark/ID on socket, in order to perform mark-based routing) are only supported by AF_INET sockets. So, let's check socket address family, when we enter into this function.	2024-04-30 21:38:29 +02:00
Aurelien DARRAGON	9931a62c3f	BUG/MINOR: log: fix global lf_expr node options behavior (2nd try) In `98b44e8` ("BUG/MINOR: log: fix global lf_expr node options behavior"), I properly restored global node options behavior for when encoding is not used, however the fix is not optimal when encoding is involved: Indeed, encoding logic in sess_build_logline() relies on global node options to know if encoding must be handled expression-wide or individually. However, because of the above fix, if an expression is made of 1 or multiple nodes that all set an encoding option manually (without '%o'), we consider that the option was set globally, but that's probably not what the user intended. Instead we should only evaluate global options from '%o', so that it remains possible to skip global encoding when needed. No backport needed.	2024-04-30 10:10:35 +02:00
William Lallemand	95949e6868	MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPROXY The new HC_F_HTTPPROXY flag allows using an absolute URI in a request, which won't be modified, so that an HTTP proxy can be used.	2024-04-29 17:10:47 +02:00
Aurelien DARRAGON	9bdce67585	CLEANUP: log: add a macro to know if a lf_node is configurable LF_NODE_WITH_OPT(node) returns true if the node's option may be set and thus should be considered. Logic is based on logformat node's type: for now only TAG and FMT nodes can be configured.	2024-04-29 14:47:37 +02:00
Aurelien DARRAGON	0e2aea8224	CLEANUP: tools/cbor: rename cbor_encode_ctx struct members Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and adjust some comments to make it clear that e_fct_ctx is here to provide additional user-ctx to the custom cbor encode function pointers. For now, only e_fct_byte function may be provided, but we could imagine having e_fct_int{16,32,64}() one day to speed up the encoding when we know we can encode multiple bytes at a time, but for now it's not worth the hassle.	2024-04-29 14:47:37 +02:00
Willy Tarreau	1db3a390bb	MINOR: list: add a macro to detect that a list contains at most one element The new LIST_ATMOST1() test verifies that the designated element is either alone or points on both sides to the same element. This is used to detect that a list has at most a single element, or that an element about to be deleted was the last one of a list.	2024-04-27 09:36:36 +02:00
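The test described above can be reproduced on a self-contained circular list. The struct mirrors haproxy's next/prev (n/p) layout, but the helper and macro bodies here are a sketch, not the real list.h. Note that n == p also holds for an element alone in its ring, so the first condition only mirrors the wording "either alone or".

```c
#include <stddef.h>

/* Circular doubly-linked list: the head is itself a member of the ring. */
struct list {
	struct list *n;   /* next */
	struct list *p;   /* prev */
};

#define LIST_INIT(l) ((l)->n = (l)->p = (l))

/* Hypothetical helper: insert <el> just before <head> (i.e. at the tail). */
static inline void sketch_list_append(struct list *head, struct list *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

/* <el> is alone, or its next and prev designate the same element:
 * the ring then holds at most <el> plus one other member. */
#define LIST_ATMOST1(el) ((el)->n == (el) || (el)->n == (el)->p)
```

Applied to a list head, it tells whether the list holds zero or one element; applied to an element, it tells whether that element is the last one in its list.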
Aurelien DARRAGON	c614fd3b9f	MINOR: log: add +cbor encoding option In this patch, we make use of the CBOR (RFC8949) encode helper functions from the previous commit to implement the '+cbor' encoding option for log-formats. The logic behind it is pretty similar to the '+json' encoding option, except that the produced output is a CBOR payload written in HEX format, so that it remains usable with regular syslog endpoints. Example: log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]" Will produce: BF6B6E616D65645F6669656C64626F6BFF Detailed view (from cbor.me): BF # map(*) 6B # text(11) 6E616D65645F6669656C64 # "named_field" 62 # text(2) 6F6B # "ok" FF # primitive(*) If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to the CBOR specification. Example: log-format "test cbor bool: %{+cbor}[bool(true)]" Will produce: test cbor bool: F5	2024-04-26 18:39:32 +02:00
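To see where the hex string in the example comes from, here is a sketch that encodes named text fields as an indefinite-length CBOR map and renders it as hex. The function names and the 512-byte limit are assumptions for illustration; haproxy's cbor helpers are structured differently.

```c
#include <stdio.h>
#include <string.h>

/* Write the CBOR head of a text string (major type 3: 0x60 | len for
 * lengths below 24). This sketch only handles lengths up to 65535. */
static size_t cbor_text_head(unsigned char *out, size_t len)
{
	if (len < 24) {
		out[0] = (unsigned char)(0x60 | len);
		return 1;
	}
	if (len <= 0xFF) {
		out[0] = 0x78;                 /* text, 1-byte length follows */
		out[1] = (unsigned char)len;
		return 2;
	}
	out[0] = 0x79;                         /* text, 2-byte length follows */
	out[1] = (unsigned char)(len >> 8);
	out[2] = (unsigned char)len;
	return 3;
}

/* Encode <n> key/value text pairs as an indefinite-length map and render
 * it as uppercase hex into <hex> (2 * encoded size + 1 bytes; 1025 always
 * suffices). Returns 0 on success, -1 if the payload would not fit. */
static int cbor_map_hex(const char *kv[][2], size_t n, char *hex)
{
	unsigned char buf[512];
	size_t pos = 0;

	buf[pos++] = 0xBF;                     /* map(*): indefinite length */
	for (size_t i = 0; i < n; i++) {
		for (int j = 0; j < 2; j++) {  /* key, then value */
			size_t len = strlen(kv[i][j]);

			/* +1 keeps room for the final break byte */
			if (len > 0xFFFF || pos + 3 + len + 1 > sizeof(buf))
				return -1;
			pos += cbor_text_head(buf + pos, len);
			memcpy(buf + pos, kv[i][j], len);
			pos += len;
		}
	}
	buf[pos++] = 0xFF;                     /* break: closes the map */

	for (size_t i = 0; i < pos; i++)
		sprintf(hex + 2 * i, "%02X", buf[i]);
	hex[2 * pos] = '\0';
	return 0;
}
```

Only the named node contributes a key/value pair, which is why `%[int(4)]` and the literal text do not appear in the output.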
Aurelien DARRAGON	810303e3e6	MINOR: tools: add cbor encode helpers Add cbor helpers to encode strings (bytes/text) and integers according to RFC8949, and also add the cbor_encode_ctx struct to pass encoding options such as how to encode a single byte.	2024-04-26 18:39:32 +02:00
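The integer part of RFC 8949 boils down to one "head" byte plus an optional big-endian argument. The following sketch (hypothetical function names, writing into a caller buffer rather than through an encode callback) shows that layout, with negative numbers stored as -1 - n under major type 1.

```c
#include <stdint.h>
#include <stddef.h>

/* The initial byte holds the major type in its top 3 bits and either the
 * value itself (0..23) or an additional-info code 24/25/26/27 announcing
 * a 1/2/4/8-byte big-endian argument. Returns bytes written (max 9). */
static size_t cbor_encode_head(uint8_t major, uint64_t arg, uint8_t *out)
{
	int nbytes;
	uint8_t ai;

	if (arg < 24) {
		out[0] = (uint8_t)(major << 5 | arg);
		return 1;
	}
	if (arg <= 0xFF)             { ai = 24; nbytes = 1; }
	else if (arg <= 0xFFFF)      { ai = 25; nbytes = 2; }
	else if (arg <= 0xFFFFFFFFu) { ai = 26; nbytes = 4; }
	else                         { ai = 27; nbytes = 8; }

	out[0] = (uint8_t)(major << 5 | ai);
	for (int i = 0; i < nbytes; i++)
		out[1 + i] = (uint8_t)(arg >> (8 * (nbytes - 1 - i)));
	return 1 + nbytes;
}

/* Major type 0 for n >= 0; major type 1 carrying -1 - n otherwise,
 * so -1 is encoded with argument 0. */
static size_t cbor_encode_int(int64_t n, uint8_t *out)
{
	if (n >= 0)
		return cbor_encode_head(0, (uint64_t)n, out);
	return cbor_encode_head(1, (uint64_t)(-1 - n), out);
}
```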
Aurelien DARRAGON	3f7c8387c0	MINOR: log: add +json encoding option In this patch, we add the "+json" log format option that can be set globally or per log format node. It sets the LOG_OPT_ENCODE_JSON flag for the current context, which is provided to all lf_* log building functions. This way, all lf_* functions are now aware of this option and try to comply with the JSON specification when the option is set. If the option is set globally, then sess_build_logline() will produce a map-like object with key=val pairs for named logformat nodes (logformat nodes that don't have a name are simply ignored). Example: log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]" Will produce: {"named_field": "ok"} If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to the JSON specification. Example: log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }" Will produce: {"manual_key": true} When the option is set, the +E option will be ignored, and partial numerical values (i.e. because of logasap) will be encoded as-is.	2024-04-26 18:39:32 +02:00
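Complying with JSON mainly means quoting string values correctly. Below is a minimal sketch of RFC 8259 string escaping; json_quote() is a hypothetical helper, not one of the lf_* functions, and it passes non-ASCII bytes through unchanged rather than validating UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Write <in> as a quoted JSON string into <out> (<size> bytes).
 * Escapes '"' and '\\', uses \n, \r, \t short forms and \u00XX for other
 * control characters. Returns the output length, or -1 if too small. */
static int json_quote(const char *in, char *out, size_t size)
{
	size_t o = 0;

	/* append one char, always keeping room for the final NUL */
#define PUT(c) do { if (o + 1 >= size) return -1; out[o++] = (char)(c); } while (0)
	PUT('"');
	for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
		switch (*p) {
		case '"':  PUT('\\'); PUT('"');  break;
		case '\\': PUT('\\'); PUT('\\'); break;
		case '\n': PUT('\\'); PUT('n');  break;
		case '\r': PUT('\\'); PUT('r');  break;
		case '\t': PUT('\\'); PUT('t');  break;
		default:
			if (*p < 0x20) {
				if (o + 6 >= size)
					return -1;
				snprintf(out + o, 7, "\\u%04x", *p);
				o += 6;
			} else {
				PUT(*p);
			}
		}
	}
	PUT('"');
#undef PUT
	out[o] = '\0';
	return (int)o;
}
```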
Aurelien DARRAGON	b7c3d8c87c	MINOR: log: add +bin logformat node option Support the '+bin' option argument on logformat nodes to try to preserve the binary output type with binary sample expressions. For this, we rely on the log/sink API, which is capable of conveying binary data since none of the related functions search for a terminating NULL byte in the provided log payload: they take a string pointer and a string length as arguments. Example: log-format "%{+bin}o %[bin(00AABB)]" Will produce: 00aabb (output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX characters) This should be used carefully, because many syslog endpoints don't expect binary data (especially NULL bytes). This is mainly intended for use with set-var-fmt actions or with ring/udp log endpoints that know how to deal with such binary payloads. Also, this option is only supported globally (for use with '%o'); it will not have any effect when set on an individual node (it makes no sense to have binary data in the middle of a log payload that was started without the binary data option).	2024-04-26 18:39:31 +02:00
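The reason the log/sink API can carry binary data is that it works on (pointer, length) pairs. This sketch, where sink_write_hex() is hypothetical, renders the commit's 00AABB example: with an explicit length all three bytes survive, whereas strlen() would stop at the leading NUL and see an empty payload.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sink taking a pointer and an explicit length, so embedded
 * 0x00 bytes are kept. Renders the payload as lowercase hex into <hex>,
 * which must hold 2 * len + 1 bytes. */
static void sink_write_hex(const char *msg, size_t len, char *hex)
{
	for (size_t i = 0; i < len; i++)
		sprintf(hex + 2 * i, "%02x", (unsigned char)msg[i]);
	hex[2 * len] = '\0';
}
```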
Aurelien DARRAGON	2caa921abf	MINOR: log: add LOG_OPT_NONE flag Add the LOG_OPT_NONE flag for the default value. The flag is not explicitly used yet, but this way we make it official that the 0 value means NONE.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	a1583ec7c7	MINOR: log: make all lf_* sess build helpers static There is no need to expose such functions since they are only involved in the log building process that occurs inside sess_build_logline(). Make the functions static and remove their public prototypes to ease code maintenance.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	507223d527	MINOR: log: global lf_expr node options Add options to lf_expr->nodes to store global options (those that are common to all nodes) for easier access. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	7ff4f09e23	MINOR: log: store lf_expr nodes inside substruct Add another struct level inside lf_expr struct to allow new information to be stored alongside lf_expr nodes.	2024-04-26 18:39:31 +02:00
Amaury Denoyelle	374dc08611	MINOR: stats: parse header lines from stats-file This patch implements parsing of header lines from stats-file. A header line is defined as starting with the '#' character. It is directly followed by a domain name. For the moment, either 'fe' or 'be' is allowed. The following lines will contain counter values relative to the domain context until the next header line. This is implemented via the static function parse_header_line(). It first sets the domain context used during apply_stats_file(). A stats column array is generated to contain the order in which columns are stored. This will be reused to parse the values of the following lines. If an invalid line is found and no header was parsed, the stats-file is considered ill-formatted and parsing stops. This allows parsing to be interrupted immediately if a garbage file was used, without emitting a ton of warnings to the user.	2024-04-26 11:34:02 +02:00
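Header parsing of this kind can be sketched as follows. The exact line syntax is an assumption: the sketch expects "#fe " or "#be " followed by a comma-separated list of column names, and the struct, enum and size limits are hypothetical.

```c
#include <string.h>

enum stfile_domain { STFILE_DOMAIN_NONE, STFILE_DOMAIN_FE, STFILE_DOMAIN_BE };

#define STFILE_MAX_COLS 64

/* Domain plus column names in file order, reused to interpret the
 * value lines that follow the header. */
struct stfile_hdr {
	enum stfile_domain domain;
	size_t ncols;
	char cols[STFILE_MAX_COLS][32];
};

/* Parse e.g. "#fe guid,bin,bout". Returns 0 on success, -1 if the line
 * is not a valid header. */
static int parse_header_line_sketch(const char *line, struct stfile_hdr *hdr)
{
	const char *p;

	if (strncmp(line, "#fe ", 4) == 0)
		hdr->domain = STFILE_DOMAIN_FE;
	else if (strncmp(line, "#be ", 4) == 0)
		hdr->domain = STFILE_DOMAIN_BE;
	else
		return -1;

	hdr->ncols = 0;
	for (p = line + 4; *p; ) {
		size_t len = strcspn(p, ",\n");

		if (!len || len >= sizeof(hdr->cols[0]) || hdr->ncols == STFILE_MAX_COLS)
			return -1;
		memcpy(hdr->cols[hdr->ncols], p, len);
		hdr->cols[hdr->ncols][len] = '\0';
		hdr->ncols++;
		p += len;
		if (*p == ',')
			p++;
		else
			break; /* newline or end of string */
	}
	return hdr->ncols ? 0 : -1;
}
```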
Amaury Denoyelle	34ae7755b3	MINOR: stats: apply stats-file on process startup This commit is the first one of a series to implement preloading of haproxy counters via stats-file parsing. This patch defines a basic apply_stats_file() function. It reads a stats-file line by line without any parsing for the moment. It is called automatically on process startup via init().	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	83731c8048	MINOR: guid: define guid_is_valid_fmt() Extract GUID format validation in a dedicated function named guid_is_valid_fmt(). For the moment, it is only used in guid_insert(). This will be reused when parsing stats-file, to ensure a GUID has a valid format before the tree lookup.	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	bc3c117dc0	MINOR: ist: define iststrip() new function Implement iststrip(). This function removes any trailing newline sequence, if present, from an ist.	2024-04-26 11:29:25 +02:00
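A sketch of the trimming, using a local stand-in for struct ist (a pointer plus a length, not NUL-terminated). Treating both "\n" and "\r\n" as newline sequences is an assumption about what "any trailing newline sequence" covers.

```c
#include <stddef.h>

/* Local stand-in for haproxy's struct ist. */
struct ist {
	char *ptr;
	size_t len;
};

/* Drop a trailing "\n" or "\r\n" by shrinking the length only;
 * the underlying bytes are left untouched. */
static inline struct ist iststrip_sketch(struct ist ist)
{
	if (ist.len && ist.ptr[ist.len - 1] == '\n') {
		ist.len--;
		if (ist.len && ist.ptr[ist.len - 1] == '\r')
			ist.len--;
	}
	return ist;
}
```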
Amaury Denoyelle	e74148fb7c	MEDIUM: stats: implement dump stats-file CLI Define a new CLI command "dump stats-file" with its handler cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump first the frontend and then the backend side. It reuses the common function stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict output to the correct side. A new module stats-file.c is added to group functions specific to stats-file. It defines two main functions: * stats_dump_file_header() to generate the column list prefixed by the line context, either "#fe" or "#be" * stats_dump_fields_file() to generate each stat line. Objects without a GUID are skipped. Each stat entry is separated by a comma. For the moment, stats-file does not support statistics modules. As such, stats_dump_*_line() functions are updated to prevent looping over stats modules on stats-file output.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	83281303f6	MINOR: stats: define stats-file output format support Prepare stats functions to handle a new format labelled "stats-file". Its purpose is to generate a statistics dump with a format close to the CSV output. Such output will then be used to preload haproxy internal counters on process startup. stats-file output differs from a standard CSV on several points. First, only an excerpt of all statistics is output. All values that do not make sense to preload are excluded. For the moment, stats-file only lists stats fully defined via the "struct stat_col" method. Contrary to a CSV, all columns of a stats-file will be filled. As such, an empty field value is used to mark stats which should not be output. Some adaptations specific to stats-file are necessary in me_generate_field(). First, stats-file will output frontend and backend values separately, each side with its own set of columns. As such, an empty field value is returned if a stat is not defined for either frontend/listener or backend/server when outputting the other side. Also, as stats-file does not support empty columns, stcol_hide() is not used for it. A minor adjustment was necessary for stats_fill_fe_line() to pass context flags. This is necessary to detect the stat output format. All other listener/server/backend corresponding functions already have it.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	6615252656	MEDIUM: stats: convert counters to new column definition Convert most of the proxy counters statistics to the new "struct stat_col" definition. Remove their corresponding switch..case entries in stats_fill_*_line() functions. Their values are automatically calculated via me_generate_field() invocation. Along with this, also complete stcol_hide() when some stats should be hidden. Only a few counters were not converted. This is because they rely on values stored outside of the fe/be_counters structure, which me_generate_field() cannot use for now.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	a7810b7be6	MINOR: stats: implement automatic metric generation from stat_col This commit is a direct follow-up of the previous one, which defines a new type "struct stat_col" to fully define a statistic entry. Define a new function metric_generate(). For metric statistics, it is able to automatically calculate a stat value field from the "offsets" of "struct stat_col". Use it in stats_fill_*_stats() functions. Maintain a fallback to the previously used switch-case for old-style statistics. This commit does not introduce any functional change, as currently no statistic is defined as "struct stat_col". This will be the subject of a future commit.	2024-04-26 10:20:57 +02:00
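The offset-based generation can be illustrated with offsetof(). The counters struct, column struct and metric_generate_sketch() are hypothetical; they only show how a column that records where its counter lives removes the need for one switch..case entry per statistic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters block standing in for fe/be_counters. */
struct counters_sketch {
	uint64_t bytes_in;
	uint64_t bytes_out;
	uint64_t cum_sess;
};

/* A column definition carrying the offset of its counter. */
struct stat_col_sketch {
	const char *name;
	size_t offset;
};

static const struct stat_col_sketch cols[] = {
	{ "bin",  offsetof(struct counters_sketch, bytes_in)  },
	{ "bout", offsetof(struct counters_sketch, bytes_out) },
	{ "stot", offsetof(struct counters_sketch, cum_sess)  },
};

/* Fetch the 64-bit counter described by <col> from <cnt>, generically. */
static uint64_t metric_generate_sketch(const struct stat_col_sketch *col,
                                       const struct counters_sketch *cnt)
{
	const char *base = (const char *)cnt;

	return *(const uint64_t *)(base + col->offset);
}
```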
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondence was done individually inside stats_fill_*_line() functions. This makes defining new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat columns, replacing name_desc usage, is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain types of objects. This new type is also further extended to include counter offsets. This allows defining a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. The new type "struct stat_col" is fully compatible with name_desc. This allows stats definitions to be converted gradually. The focus will first be on proxy counters, to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	861370a6d4	MINOR: stats: update ambiguous "metrics" naming to "stat_cols" The name "metrics" was chosen to represent the various lists of haproxy exposed statistics. However, it is deemed ambiguous, as some stats are indeed metrics in the true sense, but some are not, as highlighted by various "enum field_origin" values. Replace it by the new name "stat_cols" for statistic columns. Along with the already existing notion of stat lines, it should better reflect its purpose.	2024-04-26 10:20:57 +02:00
Christopher Faulet	608e23c495	MINOR: peers: Use a static variable to wait a resync on reload When a process is reloaded, the old process must perform a synchronisation with the new process. To do so, the sync task notifies the local peer to proceed and waits. Internally, the sync task used the PEERS_F_DONOTSTOP flag to know it should wait. However, this flag was only set/unset in a single function. There is no real reason to use a flag for this. A static variable set to 1 when the resync starts and to 0 when it is finished is enough.	2024-04-25 18:29:58 +02:00
Christopher Faulet	5df54f4796	DEV: flags/peers: Decode PEER and PEERS flags Decode peer and peers flags via peer_show_flags() and peers_show_flags() functions.	2024-04-25 18:29:58 +02:00
Christopher Faulet	697bd69efc	REORG: peers: Move peer and peers flags in the corresponding header file PEER_F_* and PEERS_F_* flags were moved to the <peer-t.h> header file. This is required to decode them with the "flags" dev tool.	2024-04-25 18:29:58 +02:00
Christopher Faulet	c904f7b440	MEDIUM: peers: Use true states for the learn state of a peer Some flags were used to define the learn state of a peer. It was a bit confusing, especially because the learn state of a peer is manipulated from the peer applet but also from the sync task. Transitions are harder to understand when they are based on flags than when they are based on a dedicated state defined by an enum. This is the purpose of this patch. Now, we can define the following rules regarding this learn state: * A peer is assigned to learn by the sync task * The learn state is then changed by the peer itself to notify that the learning is in progress and when it is finished. * Finally, when the peer has finished learning, the sync task must acknowledge it by unassigning the peer.	2024-04-25 18:29:57 +02:00
Christopher Faulet	ea9bd6d075	MEDIUM: peers: Use true states for the peer applets as seen from outside This patch is a cleanup of the recent change about the relation between a peer and the applet used to deal with I/O. Three flags were introduced to reflect the peer applet state as seen from outside (from the sync task in fact). Using flags instead of true states was in fact a bad idea. It works, but it is confusing, especially because it was mixed with LEARN and TEACH peer flags. So, to make it clearer, we now use a dedicated state for this purpose. From the outside, the peer may be in one of the following states with respect to its applet: * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED). * the peer applet was created with a validated connection from the protocol perspective, but the sync task must synchronize it with the peers section. It is in starting state (PEER_APP_ST_STARTING). * The starting state was acknowledged by the sync task, and the peer applet can start to process messages. It is in running state (PEER_APP_ST_RUNNING). * The last peer applet was released and the associated connection closed, but the sync task must synchronize it with the peers section. It is in stopping state (PEER_APP_ST_STOPPING). Functionally speaking, there is no real change here, but it should be easier to understand now. In addition to these changes, the __process_peer_state() function was renamed to sync_peer_app_state().	2024-04-25 18:29:57 +02:00
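The four states and who moves between them can be summarized in a small transition check. The PEER_APP_ST_* names come from the commit; the actor enum and the strict single-path transition table are an illustrative reading of the description, not haproxy's code (for instance, whether an applet can stop while still STARTING is not covered).

```c
#include <stdbool.h>

/* State names from the commit; layout and comments are illustrative. */
enum peer_app_state {
	PEER_APP_ST_STOPPED = 0,  /* no applet */
	PEER_APP_ST_STARTING,     /* applet created, awaiting sync task */
	PEER_APP_ST_RUNNING,      /* acknowledged, messages may be processed */
	PEER_APP_ST_STOPPING,     /* applet released, awaiting sync task */
};

/* Hypothetical: which side performs a transition. */
enum peer_actor { BY_APPLET, BY_SYNC_TASK };

static bool peer_app_transition_ok(enum peer_app_state from,
                                   enum peer_app_state to,
                                   enum peer_actor who)
{
	switch (from) {
	case PEER_APP_ST_STOPPED:
		return to == PEER_APP_ST_STARTING && who == BY_APPLET;
	case PEER_APP_ST_STARTING:
		return to == PEER_APP_ST_RUNNING && who == BY_SYNC_TASK;
	case PEER_APP_ST_RUNNING:
		return to == PEER_APP_ST_STOPPING && who == BY_APPLET;
	case PEER_APP_ST_STOPPING:
		return to == PEER_APP_ST_STOPPED && who == BY_SYNC_TASK;
	}
	return false;
}
```

Encoding the states as an enum makes each legal move a single readable case, which is the clarity argument the commit makes against combining flags.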
Christopher Faulet	bea541b70a	MINOR: applet: Add a function to know the side where an applet was created The appctx_is_back() function may be used to know whether an applet was created on the frontend side or on the backend side. It may be handy for some applets that may exist on both sides, like peer applets.	2024-04-25 18:29:57 +02:00
Willy Tarreau	13515d9fbe	MINOR: intops: add a pair of functions to check multi-byte ranges These new functions is_char4_outside() and is_char8_outside() are meant to be used to verify if any of the 4 or 8 chars represented respectively by a uint32_t or a uint64_t is outside of the min,max byte range passed in argument. This is the simplified, fast version of the function so it is restricted to less than 0x80 distance between min and max (sufficient to validate chars). Extra functions are also provided to check for min or max alone as well, with the same restriction. The use case typically is to check that the output of read_u32() or read_u64() contains exclusively certain bytes.	2024-04-24 15:54:55 +02:00
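The underlying trick is a SWAR (SIMD within a register) range check. The following is a sketch of the general technique under the same max - min < 0x80 restriction, not necessarily haproxy's exact implementation; the _sketch names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns non-zero if any byte of <x> lies outside [min, max], provided
 * max - min < 0x80. For in-range bytes both (b - min) and (max - b) stay
 * below 0x80. For the lowest out-of-range byte no borrow comes in from
 * below, and one of the two differences has bit 7 set; borrows can only
 * disturb bytes above it, when the result is already non-zero. */
static inline uint32_t is_char4_outside_sketch(uint32_t x, uint8_t min, uint8_t max)
{
	uint32_t min4 = min * 0x01010101U;
	uint32_t max4 = max * 0x01010101U;

	return ((x - min4) | (max4 - x)) & 0x80808080U;
}

static inline uint64_t is_char8_outside_sketch(uint64_t x, uint8_t min, uint8_t max)
{
	uint64_t min8 = min * 0x0101010101010101ULL;
	uint64_t max8 = max * 0x0101010101010101ULL;

	return ((x - min8) | (max8 - x)) & 0x8080808080808080ULL;
}
```

Loading the bytes with memcpy() (as read_u32()/read_u64() would) keeps the check independent of alignment, and the result does not depend on byte order.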
David Carlier	98d22f212a	MEDIUM: shctx: Naming shared memory context Since Linux 5.17, anonymous regions can be named via prctl/PR_SET_VMA so caches can be identified when looking at the HAProxy process memory mapping. The most likely error is a lack of kernel support, so we ignore it: if the naming fails, the memory context mapping must still occur.	2024-04-24 10:25:38 +02:00
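A minimal sketch of best-effort naming of a shared anonymous mapping. shctx_map_named() is hypothetical; prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, len, name) is the Linux interface, and its return value is deliberately ignored so the mapping is used even when naming is not supported.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Values from <linux/prctl.h> (Linux >= 5.17), for older headers. */
#if defined(__linux__) && !defined(PR_SET_VMA)
#define PR_SET_VMA            0x53564d41
#define PR_SET_VMA_ANON_NAME  0
#endif

/* Map <size> bytes of shared anonymous memory and try to label it.
 * Returns NULL only if the mapping itself failed. */
static void *shctx_map_named(size_t size, const char *name)
{
	void *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
	                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (area == MAP_FAILED)
		return NULL;
#ifdef __linux__
	/* Best effort: an error here (old kernel, feature disabled) is
	 * ignored on purpose, the mapping remains perfectly usable. */
	(void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
	            (unsigned long)area, (unsigned long)size,
	            (unsigned long)name);
#else
	(void)name;
#endif
	return area;
}
```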
Tim Duesterhus	aab6477b67	MINOR: Add `ha_generate_uuid_v7` This function generates a version 7 UUID as per draft-ietf-uuidrev-rfc4122bis-14.	2024-04-24 08:23:56 +02:00
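The version 7 layout can be sketched with the clock and randomness passed in, so the output is deterministic. uuid_v7_format() is hypothetical; the field layout follows the draft cited above (since published as RFC 9562): a 48-bit big-endian Unix millisecond timestamp, the version nibble 7, then random bits with the 10 variant.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a UUIDv7 from <unix_ms> and 10 random bytes into <out>
 * (37 bytes). Real code would read a clock and a CSPRNG instead. */
static void uuid_v7_format(uint64_t unix_ms, const uint8_t rnd[10], char out[37])
{
	uint8_t b[16];

	for (int i = 0; i < 6; i++)               /* 48-bit timestamp */
		b[i] = (uint8_t)(unix_ms >> (40 - 8 * i));
	for (int i = 0; i < 10; i++)
		b[6 + i] = rnd[i];
	b[6] = (uint8_t)(0x70 | (b[6] & 0x0F));   /* version 7 */
	b[8] = (uint8_t)(0x80 | (b[8] & 0x3F));   /* variant 10xx */

	snprintf(out, 37,
	         "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
	         "%02x%02x%02x%02x%02x%02x",
	         b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
	         b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

Because the timestamp leads, v7 UUIDs sort roughly by creation time, unlike the fully random v4 ones.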
Tim Duesterhus	c6cea750a9	MINOR: tools: Rename `ha_generate_uuid` to `ha_generate_uuid_v4` This is in preparation of adding support for other UUID versions.	2024-04-24 08:23:56 +02:00
Willy Tarreau	19f8762a98	BUILD: stick-tables: silence build warnings when threads are disabled Since 3.0-dev7 with commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), building without threads yields a warning about the shard not being used. This is because the locks API does nothing with its arguments, and the locks are the only place where the shard is used. We cannot modify the lock API to pretend to consume its argument because quite often it's not even instantiated. Let's just pretend we consume shard using an explicit ALREADY_CHECKED() statement instead. While we're at it, let's make sure that XXH32() is not called when there is a single bucket! No backport is needed.	2024-04-24 08:23:56 +02:00
Amaury Denoyelle	341bf913d4	MINOR: stats: use STAT_F_* prefix for flags Some flags are defined during statistics generation and output. They use the prefix STAT_* which is also used for other purposes. Rename them with the new prefix STAT_F_* to differentiate them from the other usages.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	e97375dcab	MINOR: stats: use stricter naming stats/field/line Several names were each used for different purposes in the statistics implementation. This made the code difficult to understand. * stat/stats name is removed when a more specific name could be used * restrict field usage to purely refer to <struct field> which represents a raw stat value. * use "line" naming to represent an array of <struct field>	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8dbb74542f	MINOR: stats: rename info stats Info is used to expose haproxy global metrics. It is similar to proxy statistics and any other module. As such, rename info indexes using the ST_I_INF_* prefix. Also, the info variable is renamed stat_line_info. Thanks to this, naming is now consistent between info and other statistics. It will help to integrate it as a "global" statistics module.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	02e0dd6d30	MINOR: stats: rename ambiguous stat_l and stat_count Statistics were extended with the introduction of stats modules. This mechanism allows various metrics to be exposed for several haproxy components. As a consequence of this, some static variables were transformed into dynamic ones to be able to group all statistics definitions. Rename these variables with more explicit names: * stat_lines can be used to generate one line of statistics for any module using struct field as value * metrics and metrics_len are used to store the description of metrics indexed by module Note that info is not integrated in the statistics module mechanism. However, it could be done in the future to better reflect its purpose.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8fc0b18087	MINOR: stats: rename proxy stats This commit is the first one of a series which adjusts the naming convention for stats modules. The objective is to remove ambiguity and better reflect how stats are implemented, especially since the introduction of stats modules. This patch renames elements related to proxy statistics. One of the main changes is to rename the ST_F_* statistics index prefix to the new name ST_I_PX_*. This removes the reference to field, which represents another concept in the stats module. In the same vein, the global stat_fields variable is renamed metrics_px.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	c02ec9a9db	BUG/MINOR: backend: use cum_sess counters instead of cum_conn This commit is part of a series to align counters usage between frontends/listeners on one side and backends/servers on the other. The "stot" metric refers to the total number of sessions. On the backend side, it is interpreted as a number of streams. Previously, this was accounted using the <cum_sess> be_counters field for servers, but <cum_conn> instead for backend proxies. Adjust this by using <cum_sess> for both proxies and servers. As such, the <cum_conn> field can be removed from be_counters. Note that several diagnostic messages which report total frontend and backend connections were adjusted to use <cum_sess>. However, this is outdated and misleading information, as it reports the stream count on the backend side. These messages should be fixed in a separate commit. This should be backported to all stable releases.	2024-04-22 10:35:18 +02:00
Amaury Denoyelle	93066be32d	MINOR: backend: use be_counters for health down accounting This commit is the first one of a series which aims to align counters usage between frontends/listeners on one side and backends/servers on the other. Remove <down_trans> field from proxy structure. Use instead the same name field from be_counters structure, which is already used for servers.	2024-04-22 10:35:18 +02:00
Christopher Faulet	fbc0850d36	MEDIUM: muxes: Use one callback function to shut a mux stream The mux-ops .shutr and .shutw callback functions are merged into a single function, called .shut. The shutdown mode is still passed as argument; muxes are responsible for testing it. Concretely, the .shut() function of each mux is now the content of the old .shutw() followed by the content of the old .shutr().	2024-04-19 16:33:40 +02:00
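The merged callback shape can be sketched as below. The mode bits, struct and function names are hypothetical placeholders; only the ordering (old .shutw() work first, then old .shutr() work, each gated by the mode) comes from the commit.

```c
/* Hypothetical shutdown mode bits; haproxy's real enum differs. */
enum se_shut_mode_sketch {
	SE_SHW_SKETCH = 0x1,  /* shut the write side */
	SE_SHR_SKETCH = 0x2,  /* shut the read side */
};

struct mux_stream_sketch {
	int read_closed;
	int write_closed;
};

/* One callback replacing separate .shutr/.shutw: the write-side work
 * runs first, then the read-side work, each only if requested. */
static void mux_shut_sketch(struct mux_stream_sketch *s, unsigned int mode)
{
	if (mode & SE_SHW_SKETCH)
		s->write_closed = 1;  /* old .shutw() body would go here */
	if (mode & SE_SHR_SKETCH)
		s->read_closed = 1;   /* old .shutr() body would go here */
}
```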
Christopher Faulet	1e38ac72ce	MEDIUM: stconn: Use one function to shut connection and applet endpoints se_shutdown() function is now used to perform a shutdown on a connection endpoint and an applet endpoint. The same function is used for both. sc_conn_shut() function was removed and appctx_shut() function was updated to only deal with the applet stuff.	2024-04-19 16:33:35 +02:00
Christopher Faulet	4b80442832	MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints It is the same as the previous patch but for applets. Here there is already only one function. But with this patch, the appctx_shut() function was modified to explicitly get the shutdown mode as parameter. In addition, appctx_shutw() was removed.	2024-04-19 16:25:06 +02:00
Christopher Faulet	c96a873ba3	MEDIUM: stconn: Use only one SC function to shut connection endpoints The SC API to perform shutdowns on connection endpoints was unified to have only one function, sc_conn_shut(), with read/write shut modes passed explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The next step is to do the same at the mux level.	2024-04-19 16:25:06 +02:00
Christopher Faulet	d2c3f8dde7	MINOR: stconn/connection: Move shut modes at the SE descriptor level CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to instruct the muxes how streams must be shut down. It is then the mux's responsibility to decide whether it must be propagated to the connection layer or not. And in this case, the modes above are only tested to pass a boolean (clean or not). So, it is not consistent to still use connection-related modes for information set at an upper layer and never used by the connection layer itself. These modes are thus moved to the sedesc level and merged into a single enum. The idea is to add more modes, not necessarily mutually exclusive, to pass more info to the muxes. For now, it is a one-for-one renaming.	2024-04-19 16:24:46 +02:00
Christopher Faulet	f58883002c	BUG/MINOR: stconn: Fix sc_mux_strm() return value Since the beginning, this function returns a pointer to an appctx while it should be a void pointer. It is the caller's responsibility to cast it to the right type, the corresponding mux stream in this case. However, it is not a big deal because this function is unused for now. Only the unsafe one is used. This patch must be backported as far as 2.6.	2024-04-19 15:31:06 +02:00
Olivier Houchard	a7caa14a64	MINOR: stats: Get the right prototype for stats_dump_html_end(). When the stat code was reorganized, and the prototype to stats_dump_html_end() was moved to its own header, it missed the function arguments. Fix that. This should fix issue 2540.	2024-04-19 01:54:00 +02:00
Amaury Denoyelle	0109c0658d	REORG: stats: extract JSON related functions This commit is similar to the previous one. This time it deals with functions related to stats JSON output.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b8c1fdf24e	REORG: stats: extract HTML related functions Extract functions related to the HTML stats webpage from stats.c into a new module named stats-html. This reduces stats.c to roughly half of its original size.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b3d5708adc	MINOR: stats: remove implicit static trash_chunk usage A static variable trash_chunk was used as an implicit buffer in most stats output functions. It was a one-line buffer used as temporary storage before emitting to the final applet or CLI buffer. Replace it by a buffer defined in the show_stat_ctx structure. This allows it to be retrieved in most stats output functions. An additional parameter was added for the functions where the context was not already used. This renders the code cleaner and will allow stats.c to be split into several source files. As a result of a new member in show_stat_ctx, the per-command context max size has increased. This forces APPLET_MAX_SVCCTX to be increased to ensure the pool size is big enough. Increase it to 128 bytes, which includes some extra room for the future.	2024-04-18 17:04:08 +02:00
Christopher Faulet	9b3a27f70c	BUILD: linuxcap: Properly declare prepare_caps_from_permitted_set() Expected arguments were not specified in the prepare_caps_from_permitted_set() function declaration. It is an issue for some compilers, for instance clang. But at the end, it is unexpected and deprecated. No backport needed, except if `f0b6436f57` ("MEDIUM: capabilities: check process capabilities sets") is backported.	2024-04-18 10:17:38 +02:00
Christopher Faulet	40aa87a28f	BUG/MEDIUM: applet: Fix applet API to put input data in a buffer applet_putblk and co were added to simplify applets. In 2.8, a fix was pushed to deal with all errors as a room error because the vast majority of applets didn't expect other kinds of errors. The API was changed with the commit 389b7d1f7b ("BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer"). Unfortunately, and for an unknown reason, the fix was totally broken. Checks on channel functions were just wrong and not consistent. The applet_putblk() function is especially affected because the error is returned but no flag is set on the SC to request more room. Because of this bug, applets relying on it may be blocked, waiting for more room, and never woken up. It is an issue for the peer and spoe applets. This patch must be backported as far as 2.8.	2024-04-18 09:17:03 +02:00
William Lallemand	10224d72fd	BUG/MINOR: ssl: fix crt-store load parsing The crt-store load line parser relies on offsets of members of the ckch_conf struct. However, the new "alias" keyword has an offset of -1, because it does not need to be used. The plan was to handle it that way in the parser, but it wasn't supported yet. So -1 was still used in an offset computation whose result was not used, but ASAN could see the problem. This patch fixes the issue by using a signed type for the offset value, so any negative value is skipped. It also introduces a PARSE_TYPE_NONE for the parser. No backport needed.	2024-04-17 21:00:34 +02:00
Ilya Shipitsin	ab7f05daba	CLEANUP: assorted typo fixes in the code and comments This is the 41st iteration of typo fixes	2024-04-17 11:14:44 +02:00
Willy Tarreau	99c918ed8a	BUILD: xxhash: silence a build warning on Solaris + gcc-5.5 Testing an undefined macro emits warnings due to -Wundef, and we have exactly one such case in xxhash: include/import/xxhash.h:3390:42: warning: "__cplusplus" is not defined [-Wundef] #if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. Tested with GCC 5.5 */ Let's just prepend "defined(__cplusplus) &&" before __cplusplus to resolve the problem. Upstream is still affected apparently.	2024-04-17 09:43:32 +02:00
Frederic Lecaille	98583c4256	BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses There were several places in grpc and its dependency protobuf where unaligned accesses were done. Read accesses to 32 (resp. 64) bits values should be performed by read_u32() (resp. read_u64()). Replace these unaligned read accesses by correct calls to these functions. Same fixes for doubles and floats. Such unaligned read accesses could lead to crashes with bus errors on CPU architectures which do not fix them up at run time. This patch depends on this previous commit: 861199fa71 MINOR: net_helper: Add support for floats/doubles. Must be backported as far as 2.6.	2024-04-16 07:37:28 +02:00
Frederic Lecaille	153fac4804	MINOR: net_helper: Add support for floats/doubles. Implement (read|write)_flt() (resp. (read|write)_dbl()) to read/write floats (resp. read/write doubles) from/to an unaligned buffer.	2024-04-16 07:37:28 +02:00
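Unaligned-safe float/double accessors are commonly written with memcpy(); the following is a sketch of that approach with hypothetical _sketch names, not a copy of haproxy's net_helper.h. It keeps host byte order, like a plain pointer dereference would; protobuf's fixed-width float/double fields are little-endian on the wire, so a big-endian host would additionally need a byte swap.

```c
#include <string.h>

/* memcpy() into a local lets the compiler emit whatever access is safe
 * for the target, instead of dereferencing a misaligned float * or
 * double *, which can raise SIGBUS on strict-alignment CPUs. */
static inline float read_flt_sketch(const void *p)
{
	float f;

	memcpy(&f, p, sizeof(f));
	return f;
}

static inline double read_dbl_sketch(const void *p)
{
	double d;

	memcpy(&d, p, sizeof(d));
	return d;
}

static inline void write_dbl_sketch(void *p, double d)
{
	memcpy(p, &d, sizeof(d));
}
```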
William Lallemand	fa5c4cc6ce	MINOR: ssl: 'key-base' allows to load a 'key' from a specific path The global 'key-base' keyword allows to read the 'key' parameter of a crt-store load line using a path prefix. This is the equivalent of the 'crt-base' keyword but for 'key'. It only applies on crt-store.	2024-04-15 15:27:10 +02:00
William Lallemand	6567d09af5	MINOR: ssl: supports crt-base in crt-store Add crt-base support for "crt-store". It will be used by the 'crt', 'ocsp', 'issuer' and 'sctl' load line parameters. In order to keep compatibility with previous configurations and scripts for the CLI, a crt-store load line will save its ckch_store using the absolute crt path with the crt-base as the ckch tree key. This way, a `show ssl cert` on the CLI will always have the completed path.	2024-04-15 15:25:36 +02:00
Willy Tarreau	4615cb510c	MINOR: ring: always check that the old ring fits in the new one in ring_dup() Let's add a BUG_ON() to make sure we don't accidentally shrink a buffer.	2024-04-15 08:31:01 +02:00
Willy Tarreau	b662c5d2b8	MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size() There's currently an ambiguity around ring_size(): it's said to return the allocated size but returns the usable size. We can't change it as it's used everywhere in the code like this. Let's fix the comment and add ring_allocated_size() instead for anything related to allocation.	2024-04-15 08:25:03 +02:00
Willy Tarreau	c0ee2d78d7	DEBUG: pools: report the data around the offending area in case of mismatch When the integrity check fails, it's useful to get a dump of the area around the first faulty byte. That's what this patch does. For example it now shows this before reporting info about the tag itself: Contents around first corrupted address relative to pool item:. Contents around address 0xe4febc0792c0+40=0xe4febc0792e8: 0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....] 0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........] 0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........] This may be backported to 2.9 and maybe even 2.8 as it does help spot the cause of the memory corruption.	2024-04-12 18:01:55 +02:00
Willy Tarreau	16e3655fbd	REORG: pool: move the area dump with symbol resolution to tools.c This function is particularly useful to dump unknown areas watching for opportunistic symbols, so let's move it to tools.c so that we can reuse it a little bit more.	2024-04-12 18:01:20 +02:00
William Lallemand	81e54ef197	MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path Remove the ambiguous "ckchs" name and reflect the fact that it's loaded from a path.	2024-04-12 15:38:54 +02:00
William Lallemand	00eb44864b	MINOR: ssl: add the section parser for 'crt-store' 'crt-store' is a new section useful to define the struct ckch_store. The "load" keyword in the "crt-store" section allows defining which files to load for a specific certificate definition. Ex: crt-store load crt "site1.crt" key "site1.key" load crt "site2.crt" key "site2.key" frontend in bind *:443 ssl crt "site1.crt" crt "site2.crt" This is part of the certificate loading which was discussed in #785.	2024-04-12 15:38:54 +02:00
Willy Tarreau	772f9a5874	BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option This option has been set by default for a very long time and also complicates the manipulation of the DEBUG variable. Let's make it the official default and allow unsetting it by setting it to zero. The other pool-related DEBUG options were adjusted to also explicitly check for the zero value for consistency.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b70981532a	BUILD: debug: make DEBUG_STRICT=1 the default We continue to carry it in the makefile, which adds to the difficulty of passing new options. Let's make DEBUG_STRICT=1 the default so that one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us to remove the option from the default DEBUG variable in the makefile.	2024-04-11 17:25:45 +02:00
Willy Tarreau	e791b243f0	BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0. Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly disables DEBUG_STRICT.	2024-04-11 16:41:08 +02:00
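A small sketch of the two preprocessor patterns these BUILD/BUG commits rely on: an option defaulting to 1 unless set, and a `> 0` test so that an explicit `=0` really disables it. `DEBUG_STRICT` is defined to 0 in the file to simulate `-DDEBUG_STRICT=0`; `DEBUG_EXAMPLE_POOLS` is a hypothetical stand-in name, and HAProxy's headers may structure these tests differently.

```c
#include <assert.h>

/* Simulate a build made with DEBUG_STRICT=0 (normally passed with -D). */
#define DEBUG_STRICT 0

/* Buggy form: defined() is true for any value, including 0. */
static int strict_enabled_buggy(void)
{
#if defined(DEBUG_STRICT)
        return 1;
#else
        return 0;
#endif
}

/* Fixed form: only a value strictly greater than zero enables the option.
 * (An undefined macro evaluates to 0 in #if, so "> 0" alone also works.) */
static int strict_enabled_fixed(void)
{
#if defined(DEBUG_STRICT) && (DEBUG_STRICT > 0)
        return 1;
#else
        return 0;
#endif
}

/* Default-to-enabled pattern: on unless the user passes another value. */
#ifndef DEBUG_EXAMPLE_POOLS
#define DEBUG_EXAMPLE_POOLS 1
#endif

static int pools_enabled(void)
{
#if DEBUG_EXAMPLE_POOLS > 0
        return 1;
#else
        return 0;
#endif
}
```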
Willy Tarreau	2a9ccf5b25	BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes Recent commit `4c1480f13b` ("MINOR: stick-tables: mark the seen stksess with a flag "seen"") introduced a build regression on gcc versions older than 4.7. In the old __sync_ API, the HA_ATOMIC_LOAD() implementation uses an intermediary return variable called "ret", which has the same name as the variable passed as an argument to the macro in the aforementioned commit. As such, the compiler complains with a cryptic error: src/peers.c: In function 'peer_teach_process_stksess_lookup': src/peers.c:1502: error: invalid type argument of '->' (have 'int') The solution is to avoid referencing that variable in the macro argument and to use an intermediary variable for the pointer instead, as done elsewhere in the code. No other place seems to be affected by this. It probably does not need to be backported since this code is antique and very rarely used nowadays.	2024-04-11 16:41:08 +02:00
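The name clash can be reproduced with any statement-expression macro that declares its own `ret`. `LOAD_WITH_RET()` below is a hypothetical macro with the same shape as the one described, not HAProxy's actual HA_ATOMIC_LOAD(); it uses GNU C statement expressions (supported by GCC and Clang) and shows the intermediary-pointer fix.

```c
#include <assert.h>

/* Hypothetical macro declaring a local named "ret", like the old
 * __sync_-based implementation described above. */
#define LOAD_WITH_RET(ptr) ({                   \
        __typeof__(*(ptr)) ret;                 \
        __sync_synchronize();                   \
        ret = *(ptr);                           \
        __sync_synchronize();                   \
        ret;                                    \
})

struct entry {
        int seen;
};

/* The caller's variable happens to be named "ret" as well. */
static int entry_was_seen(struct entry *ret)
{
        /* LOAD_WITH_RET(&ret->seen) would expand to "ret = *(&ret->seen)",
         * where "ret" now designates the macro's int local, hence
         * "invalid type argument of '->' (have 'int')".
         * Fix: go through an intermediary pointer with another name. */
        struct entry *ts = ret;

        return LOAD_WITH_RET(&ts->seen);
}
```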
Willy Tarreau	d78c346670	BUILD: makefile: support USE_xxx=0 as well William rightfully reported that not supporting =0 to disable a USE_xxx option is sometimes painful (e.g. a script might do USE_xxx=$(command)). It's not that difficult to handle actually, we just need to consider the value 0 as empty at the few places that test for an empty string in options.mk, and in each "ifneq" test in the main Makefile, so let's do that. We even take care of preserving the original value in the build options string so that building with USE_OPENSSL=0 will be reported as-is in haproxy -vv, and with "-OPENSSL" in the feature list.	2024-04-11 11:06:19 +02:00
Willy Tarreau	aa32ab13f0	BUILD: makefile: warn about unknown USE_* variables William suggested that it would be nice to warn about unknown USE_* variables to more easily catch misspelled ones. The valid ones are present in use_opts, so by appending "=%" to each of them, we can build a series of patterns to exclude from MAKEOVERRIDES and emit a warning for the ones that stand out. Example: $ make TARGET=linux-glibc USE_QUIC_COMPAT_OPENSSL=1 Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1 CC src/slz.o	2024-04-11 11:06:19 +02:00
Christopher Faulet	1fa6eb2eb9	BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values These values are obviously wrong. There is an extra zero at the end for both defines. By chance, it is harmless. But it is better to fix it. This patch should be backported as far as 2.6.	2024-04-10 15:50:00 +02:00
Amaury Denoyelle	34b31d85cb	OPTIM: quic: do not call qc_send() if nothing to emit qc_send() was systematically called by quic_conn IO handlers with all instantiated quic_enc_level. Change this to only register a quic_enc_level for sending if needed, and do not call qc_send() at all if no qel is registered. A new function qel_need_sending() is defined to detect if sending is required. First, it checks if the quic_enc_level has prepared frames or if probing is set. It can also return true if an ACK is required, either on the quic_enc_level itself or because the quic_conn ACK timer fired. Finally, a pending CONNECTION_CLOSE emission for the quic_conn is also a valid case. This should reduce the number of qc_send() invocations, which could slightly improve performance as well as simplify debugging with traces.	2024-04-10 11:17:21 +02:00
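The decision described for qel_need_sending() can be sketched as a plain predicate over a hypothetical, flattened view of the state; the field names below are invented for illustration and the real quic_enc_level / quic_conn checks are more specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the state consulted by the predicate. */
struct toy_qel {
        int  nb_prepared_frms;  /* frames ready to be encoded */
        bool probing;           /* probing packets requested */
        bool ack_required;      /* this level owes an ACK */
};

struct toy_conn {
        bool ack_timer_fired;   /* connection ACK timer expired (simplified:
                                 * applied to every level here) */
        bool need_conn_close;   /* CONNECTION_CLOSE must be emitted */
};

/* Returns true if <qel> must be registered for emission. */
static bool toy_qel_need_sending(const struct toy_qel *qel,
                                 const struct toy_conn *qc)
{
        if (qel->nb_prepared_frms > 0 || qel->probing)
                return true;
        if (qel->ack_required || qc->ack_timer_fired)
                return true;
        return qc->need_conn_close;
}

/* Number of levels that would be registered; 0 means no send call at all. */
static int toy_count_to_send(const struct toy_qel *qels, int nb,
                             const struct toy_conn *qc)
{
        int count = 0;

        for (int i = 0; i < nb; i++)
                if (toy_qel_need_sending(&qels[i], qc))
                        count++;
        return count;
}
```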
Amaury Denoyelle	7fc1ce5bc8	MEDIUM: quic: remove duplicate hdshk/app send functions A series of previous patches cleaned up the sending functions for the handshake case. Their newly exposed API is now flexible enough to convert the app case to use the same functions. As such, qc_send_hdshk_pkts() is renamed qc_send() and becomes the single entry point for QUIC emission. It is used during application packets emission in quic_conn_app_io_cb() and qc_send_mux(). Also the internal function qc_prep_hpkts() is renamed qc_prep_pkts(). Remove the now unneeded qc_send_app_pkts() and qc_prep_app_pkts(). Also remove qc_send_app_probing(), which was a simple wrapper over other application send functions. Now, the default qc_send() can be reused for such cases with the <old_data> argument set to true. An adjustment was needed when converting qc_send_hdshk_pkts() to the general qc_send() version. Previously, only a single packet encoding/emission cycle was performed. This was enough as handshake packets are always smaller than the Tx buffer. However, more application data may need to be emitted. As such, a loop is necessary to perform multiple encoding/emission cycles, as was already the case in qc_send_app_pkts(). No functional difference should happen with this commit. However, as these are critical functions with a lot of changes, this patch is labelled as medium.	2024-04-10 11:07:35 +02:00
Amaury Denoyelle	4e4127a66d	MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb quic_conn_io_cb() manually implements emission by using lower level functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the higher level function qc_send_hdshk_pkts(), which notably handles buffer allocation and purging. This allows cleaning up the send API by flagging qc_prep_pkts() and qc_send_ppkts() as static. They are now used in a single location inside qc_send_hdshk_pkts().	2024-04-10 11:07:19 +02:00
Amaury Denoyelle	3a8f4761e7	MINOR: quic: improve sending API on retransmit qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on retransmission. It was restricted to using two quic_enc_level pointers as distinct arguments. Adapt it to directly use the same list of quic_enc_level which is then passed to qc_prep_hpkts(). Now for retransmission, the quic_enc_level send list is built directly in qc_dgrams_retransmit(), which calls qc_send_hdshk_pkts(). Along with this change, a new utility function qel_register_send() is defined. It is a helper to build the quic_enc_level send list. It enforces that each quic_enc_level instance is only registered in a single list to prevent memory issues. It is used both in qc_dgrams_retransmit() and quic_conn_io_cb().	2024-04-10 11:06:55 +02:00
Amaury Denoyelle	93f5b4c8ae	MINOR: quic: uniformize sending methods for handshake Emission of packets during handshakes was implemented via an API which uses two alternative ways to specify the list of frames. The first one uses a NULL list of quic_enc_level as argument for qc_prep_hpkts(). This was an implicit method to iterate on all qels stored in the quic_conn instance, with frames already inserted in their corresponding quic_pktns. The second method was used for retransmission. It uses a custom local quic_enc_level list specified by the caller as input to qc_prep_hpkts(). Frames were accessible through the <retransmit> list pointer of each quic_enc_level, via an implicit mechanism. This commit clarifies the API by using a single common method. Now the quic_enc_level list must always be specified by the caller. As for the frames list, each qel must set its new field <send_frms> pointer to the list of frames to send. Callers of qc_prep_hpkts() are responsible for always clearing the qels send list. This prevents a single instance of quic_enc_level from being inserted while being attached to another list. This notably allows cleaning up some unnecessary code. First, the <retransmit> list of quic_enc_level is removed as it is replaced by the new <send_frms>. Also, it's now possible to use a proper list_for_each_entry() inside qc_prep_hpkts() to loop over each qel. Internal functions for quic_enc_level selection are now removed.	2024-04-10 11:06:41 +02:00
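The single-list rule for qel_register_send() and the "callers clear the send list" contract can be sketched with a minimal intrusive doubly-linked list. Everything below (`toy_list`, `toy_qel`, the return-code convention) is written from scratch for illustration and is not HAProxy's list API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly-linked list (not HAProxy's macros). */
struct toy_list {
        struct toy_list *n, *p;
};

static void toy_list_init(struct toy_list *l) { l->n = l->p = l; }
static int  toy_list_empty(const struct toy_list *l) { return l->n == l; }

static void toy_list_append(struct toy_list *head, struct toy_list *el)
{
        el->p = head->p;
        el->n = head;
        head->p->n = el;
        head->p = el;
}

static void toy_list_del_init(struct toy_list *el)
{
        el->p->n = el->n;
        el->n->p = el->p;
        toy_list_init(el);
}

struct toy_qel {
        struct toy_list el_send;     /* attach point in a send list */
        struct toy_list *send_frms;  /* frames to emit at this level */
};

/* Registers <qel> in <send_list> with frames <frms>. A qel whose attach point
 * is already in use is refused, so it can never sit in two lists at once. */
static int toy_qel_register_send(struct toy_list *send_list,
                                 struct toy_qel *qel, struct toy_list *frms)
{
        if (!toy_list_empty(&qel->el_send))
                return 0;
        toy_list_append(send_list, &qel->el_send);
        qel->send_frms = frms;
        return 1;
}

/* Caller-side cleanup once emission is done: detach every registered qel. */
static void toy_send_list_clear(struct toy_list *send_list)
{
        while (!toy_list_empty(send_list))
                toy_list_del_init(send_list->n);
}
```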
Aurelien DARRAGON	8226e92eb0	BUG/MINOR: tools/log: invalid encode_{chunk,string} usage encode_{chunk,string}() are often used this way: ret = encode_{chunk,string}(start, stop...) if (ret == NULL || *ret != '\0') { //error } //success Indeed, encode_{chunk,string} will always try to add a terminating NULL byte to the output string, unless no space is available for even 1 byte. However, this means that to be able to spot an error, the caller must provide a buffer (here: start) which is already initialized. But this is wrong: not only is this very tricky to use, but since those functions don't return NULL on failure, if the output buffer was not properly initialized prior to calling the function, the caller will perform invalid reads when checking for failure this way. Moreover, even if the buffer is initialized, we cannot reliably tell whether the function actually failed this way: if the buffer was previously initialized with a NULL byte, the caller might think that the call succeeded (since the function didn't return NULL and didn't update the buffer). Also, sess_build_logline() relies on the lf_encode_{chunk,string}() functions, which are in fact wrappers for the encode_{chunk,string}() functions and thus exhibit the same error handling mechanism. It turns out that sess_build_logline() makes unsafe use of those functions because it uses the error-checking logic mentioned above while its buffer (tmplog) is not guaranteed to be initialized when entering the function. This may ultimately cause malfunctions or invalid reads if the output buffer is lacking space. To fix the issue once and for all and prevent similar bugs from being introduced, encode_{string,chunk}() and escape_string() (based on encode_string()) now explicitly return NULL on failure (when the function failed to write at least the ending NULL byte). The lf_encode_{string,chunk}() helpers had to be patched as well due to code duplication. 
This should be backported to all stable versions. [ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to backport `ae1e14d65` ("CLEANUP: tools: removing escape_chunk() function") first, considering it's not very relevant to maintain a dead function]	2024-04-09 17:35:45 +02:00
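A simplified sketch of the calling convention the fix moves to: the encoder reports failure through its return value, so the caller never inspects the output buffer. `toy_encode_string()` and its parameters are hypothetical (HAProxy's encode_string() takes a character map rather than a string of characters to escape), and this sketch returns NULL on any truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encodes <string> into [start, stop), escaping every character listed in
 * <to_escape> as <escape> followed by two hex digits. Returns a pointer to
 * the trailing NUL on success, or NULL if the output (terminator included)
 * does not fit. */
static char *toy_encode_string(char *start, char *stop, char escape,
                               const char *to_escape, const char *string)
{
        static const char hex[] = "0123456789ABCDEF";

        for (; *string; string++) {
                unsigned char c = (unsigned char)*string;

                if (strchr(to_escape, c)) {
                        if (stop - start < 3)
                                return NULL;
                        *start++ = escape;
                        *start++ = hex[c >> 4];
                        *start++ = hex[c & 0xf];
                } else {
                        if (stop - start < 1)
                                return NULL;
                        *start++ = (char)c;
                }
        }
        if (start >= stop)
                return NULL;   /* no room left for the terminator */
        *start = '\0';
        return start;
}
```

A caller then only needs `if (toy_encode_string(...) == NULL) { /* error */ }`, which is safe even when the buffer was never initialized.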
Valentine Krasnobaeva	eef14e9574	CLEANUP: global: remove LSTCHK_CAP_BIND Remove LSTCHK_CAP_BIND as it is never set and never checked.	2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva	f0b6436f57	MEDIUM: capabilities: check process capabilities sets Since the Linux capabilities support add-on (see commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities")), we can also check the haproxy process effective and permitted capabilities sets when it starts and runs as non-root. This way, if the needed network capabilities are present only in the process permitted set, we can get this information with capget and put them in the process effective set via capset. To do this properly, let's introduce prepare_caps_from_permitted_set(). First, it checks if the binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If there is a match, LSTCHK_NETADM is removed from the global.last_checks list to avoid a warning, because in the initialization sequence some last configuration checks are based on the LSTCHK_NETADM flag and the haproxy process euid may stay unprivileged. If there is neither CAP_NET_ADMIN nor CAP_NET_RAW in the effective set, the permitted set will be checked and only the capabilities given in the 'setcap' keyword will be promoted to the process effective set. LSTCHK_NETADM will also be removed in this case, for the same reason. In order to be transparent, we promote from the permitted set only the capabilities given by the user in the 'setcap' keyword. So, if the caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM will not be unset and a warning about missing privileges will be emitted at initialization. This function needs to be called before protocol_bind_all() to allow binding to privileged ports as non-root, and 'setcap cap_net_bind_service' must be set in the global section in this case.	2024-04-05 18:01:54 +02:00
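The decision logic of prepare_caps_from_permitted_set(), as described, can be modeled with plain bitmasks. This sketch omits the actual capget()/capset() calls; `toy_prepare_caps()` and the 64-bit mask representation are assumptions, while the bit numbers match `<linux/capability.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define TOY_CAP_NET_BIND_SERVICE 10
#define TOY_CAP_NET_ADMIN        12
#define TOY_CAP_NET_RAW          13
#define TOY_CAPBIT(c) ((uint64_t)1 << (c))

/* Returns the effective set to apply, and sets <drop_netadm> when the
 * "missing network admin privileges" last check can be dropped. */
static uint64_t toy_prepare_caps(uint64_t effective, uint64_t permitted,
                                 uint64_t user_caps, bool *drop_netadm)
{
        const uint64_t netadm = TOY_CAPBIT(TOY_CAP_NET_ADMIN) |
                                TOY_CAPBIT(TOY_CAP_NET_RAW);

        if (effective & netadm) {
                *drop_netadm = true;
                return effective;
        }
        /* promote only what is permitted AND listed in 'setcap' */
        effective |= permitted & user_caps;
        *drop_netadm = (effective & netadm) != 0;
        return effective;
}
```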
Amaury Denoyelle	0489d85263	MINOR: listener: implement GUID support This commit is similar to the two previous ones. Its purpose is to add GUID support on listeners. Due to bind_conf and listeners configuration, some specificities were required. It's possible to define several listeners on a single bind line, for example by specifying multiple addresses. As such, it's impossible to support a "guid" keyword on a bind line. The problem is exacerbated by the cloning of listeners when sharding is used. To resolve this, a new keyword "guid-prefix" is defined for bind lines. It allows specifying a string which will be used as a prefix for the automatically generated GUID of each listener attached to a bind_conf. Automatic GUID generation for listeners is implemented via a new function bind_generate_guid(). It is called on post-parsing, after bind_complete_thread_setup(). For each listener on a bind_conf, a new GUID is generated with the bind_conf prefix and the index of the listener relative to the other listeners in the bind_conf. This last value is stored in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted, for example due to a non-unique value, an error is returned and startup is interrupted with the configuration rejected.	2024-04-05 15:40:42 +02:00
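A sketch of prefix-plus-index generation with a uniqueness check. The "-" separator is an assumption for illustration, since the commit message does not state the exact format, and both helpers are hypothetical (HAProxy stores GUIDs in a global tree rather than an array).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the GUID of the <idx>-th listener of a bind line from its
 * "guid-prefix". Returns 0 on success, -1 on truncation. */
static int toy_listener_guid(char *out, size_t outlen, const char *prefix, int idx)
{
        int ret = snprintf(out, outlen, "%s-%d", prefix, idx);

        return (ret < 0 || (size_t)ret >= outlen) ? -1 : 0;
}

/* Returns 0 if <guid> already appears among the first <n> entries of
 * <known>, mimicking the "non-unique value rejects the configuration" rule. */
static int toy_guid_is_unique(const char *guid, const char *const known[], int n)
{
        for (int i = 0; i < n; i++)
                if (strcmp(known[i], guid) == 0)
                        return 0;
        return 1;
}
```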
Amaury Denoyelle	8259456981	MINOR: server: implement GUID support This commit is similar to the previous one, except that it implements GUID support for server instances. A guid_node field is inserted into the server structure. A new "guid" server keyword is defined.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	da754b4533	MINOR: proxy: implement GUID support Implement proxy identification through GUID. As such, a guid_node member is inserted into the proxy structure. A proxy keyword "guid" is defined to allow the user to set its value.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	1009ca4160	MINOR: guid: restrict guid format GUID format is unspecified to allow users to choose the naming scheme. Some restrictions are however added by this patch, mainly to ensure coherence and to bound memory usage. The first restriction is on the length of the GUID: no more than 127 characters can be used, to prevent excessive memory consumption. The second restriction is on the character set allowed in a GUID. The utility function invalid_char() is used for this: it allows alphanumeric values and '-', '_', '.' and ':'.	2024-04-05 15:40:42 +02:00
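A sketch of a validator enforcing the two restrictions above (at most 127 characters; alphanumerics plus '-', '_', '.', ':'). `toy_guid_is_valid()` is hypothetical and does not reuse HAProxy's invalid_char(); rejecting the empty string is an additional assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TOY_GUID_MAX_LEN 127

/* Returns 1 if <guid> is acceptable, 0 otherwise. */
static int toy_guid_is_valid(const char *guid)
{
        size_t len = strlen(guid);

        if (len == 0 || len > TOY_GUID_MAX_LEN)   /* empty: assumption */
                return 0;
        for (; *guid; guid++) {
                unsigned char c = (unsigned char)*guid;

                if (!isalnum(c) && !strchr("-_.:", c))
                        return 0;
        }
        return 1;
}
```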
Amaury Denoyelle	84fa6b344a	MINOR: guid: introduce global UID module Define a new module guid. Its purpose is to be able to attach a global identifier to various objects such as proxies, servers and listeners. A new type guid_node is defined. It will be stored in the objects which can be referenced by such a GUID. Several functions are implemented to properly initialize, insert, remove and look up GUIDs in a global tree. Modification operations should only be conducted under thread isolation.	2024-04-05 15:40:42 +02:00
Aurelien DARRAGON	e751eebfc6	MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing Currently, the way proxy-oriented logformat directives are handled is way too complicated. Indeed, "log-format", "log-format-error", "log-format-sd" and "unique-id-format" all rely on preparsing hints stored inside proxy->conf member struct. Those preparsing hints include the original string that should be compiled once the proxy parameters are known plus the config file and line number where the string was found to generate precise error messages in case of failure during the compiling process that happens within check_config_validity(). Now that lf_expr API permits to compile a lf_expr struct that was previously prepared (with original string and config hints), let's leverage lf_expr_compile() from check_config_validity() and instead of relying on individual proxy->conf hints for each logformat expression, store string and config hints in the lf_expr struct directly and use lf_expr helpers funcs to handle them when relevant (ie: original logformat string freeing is now done at a central place inside lf_expr_deinit(), which allows for some simplifications) Doing so allows us to greatly simplify the preparsing logic for those 4 proxy directives, and to finally save some space in the proxy struct. Also, since httpclient proxy has its "logformat" automatically compiled in check_config_validity(), we now use the file hint from the logformat expression struct to set an explicit name that will be reported in case of error ("parsing [httpclient:0] : ...") and remove the extraneous check in httpclient_precheck() (logformat was parsed twice previously..)	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	2b79457bc0	MEDIUM: log: add compiling logic to logformat expressions split parse_logformat_string() into two functions: parse_logformat_string() sticks to the same behavior, but now becomes a helper for lf_expr_compile(), which uses explicit arguments so that it becomes possible to use lf_expr_compile() without a proxy, and also to compile an expression which was previously prepared for compiling (set string and config hints within the logformat expression to avoid manually storing string and config context if the compiling step happens later). lf_expr_dup() may be used to duplicate an expression before it is compiled, and lf_expr_xfer() now makes sure that the input logformat is already compiled. This is prerequisite work for the log-profiles implementation; no functional change should be expected.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7a21c3a4ef	MAJOR: log: implement proper postparsing for logformat expressions This patch tries to address a design flaw with how logformat expressions are parsed from config. Indeed, some parse_logformat_string() calls are performed during config parsing when the proxy mode is not yet known. Here's a config example that illustrates the issue: defaults mode tcp listen test bind :8888 http-response set-header custom-hdr "%trl" # needs http mode http The above config should work, because the effective proxy mode is http, yet haproxy fails with this error: [ALERT] (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode. To fix the issue once and for all, let's implement smart postparsing for logformat expressions encountered during config parsing: - split parse_logformat_string() (and subfunctions) in order to create a new lf_expr_postcheck() function that must be called to finish preparing and checking the logformat expression once the proxy type is known. - save some config hints info during parse_logformat_string() to generate more precise error messages during lf_expr_postcheck(), if needed, we rely on curpx->conf.args.{file,line} hints for that because parse_logformat_string() doesn't know about current file and line number. - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the function may try to make the proxy compatible with the expression, or if it should simply fail as soon as an incompatibility is detected. - if parse_logformat_string() is called from an unchecked proxy, then schedule the expression for postparsing, else (ie: during runtime), run the postcheck right away. This change will also allow for some logformat expression error handling simplifications in the future.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	56d8074798	MINOR: proxy: add PR_FL_CHECKED flag PR_FL_CHECKED is set on a proxy once the proxy configuration has been fully checked (including postparsing checks). This information may be useful to functions that need to know if some config-related proxy properties are likely to change or not due to parsing or postparsing/check logics. Also, during runtime, except for some rare cases, config-related proxy properties are not supposed to be changed.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	6810c41f8e	MEDIUM: tree-wide: add logformat expressions wrapper log format expressions are broadly used within the code: once they are parsed from input string, they are converted to a linked list of logformat nodes. We're starting to face some limitations because we're simply storing the converted expression as a generic logformat_node list. The first issue we're facing is that storing logformat expressions that way doesn't allow us to add metadata alongside the list, which is part of the prerequisites for implementing log-profiles. Another issue with storing logformat expressions as generic lists of logformat_node elements is that it's starting to become really hard to tell when we rely on logformat expressions or not in the code given that there isn't always a comment near the list declaration or manipulation to indicate that it's relying on logformat expressions under the hood, so this adds some complexity for code maintenance. This patch looks quite impressive due to changes in a lot of header and source files (since logformat expressions are broadly used), but it does a simple thing: it defines the lf_expr structure which itself holds a generic list of logformat nodes, and then declares some helpers to manipulate lf_expr elements and fixes the code so that we now exclusively manipulate logformat_node lists as lf_expr elements outside of log.c. For now, lf_expr struct only contains the list of logformat nodes (no additional metadata), but now that we have a dedicated type and helpers, doing so in the future won't be problematic at all and won't require extensive code changes.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7d8f45b647	MEDIUM: log: carry tag context in logformat node This is a pretty simple patch despite requiring some visible changes in the code: When parsing a logformat string, log tags (ie: '%tag', AKA log tags) are turned into logformat nodes with their type set to the type of the corresponding logformat_tag element which was matched by name. Thus, when "compiling" a logformat tag, we only keep a reference to the tag type from the original logformat_tag. For example, for "%B" log tag, we have the following logformat_tag element: { .name = "B", .type = LOG_FMT_BYTES, .mode = PR_MODE_TCP, .lw = LW_BYTES, .config_callback = NULL } When parsing "%B" string, we search for a matching logformat tag inside logformat_tags[] array using the provided name. Once we find a matching element, we craft a logformat node whose type will be LOG_FMT_BYTES, but from the node itself, we no longer have access to other information that is set in the logformat_tag struct element. Thus from a logformat_node resulting from a log tag, with current implementation, we cannot easily get back to matching logformat_tag struct element as it would require us to scan the whole logformat_tags array at runtime using node->type to find the matching element. Let's take a simpler path and consider all tag-specific LOG_FMT_* subtypes as being part of the same logformat node type: LOG_FMT_TAG. Thanks to that, we're now able to distinguish logformat nodes made from logformat tag from other logformat nodes, and link them to their corresponding logformat_tag element from logformat_tags[] array. All it costs is a simple indirection and an extra pointer in logformat_node struct. While at it, all LOG_FMT_* types related to logformat tags were moved inside log.c as they have no use outside of it since they are simply lookup indexes for sess_build_logline() and could even be replaced by function pointers some day...	2024-04-04 19:10:01 +02:00
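The "single LOG_FMT_TAG type plus a pointer to the descriptor" idea can be sketched with a trimmed-down, hypothetical table; only two illustrative entries are listed ("B" as TCP, "trl" as HTTP-only, following the examples in these commits), and the struct names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_MODE_TCP, TOY_MODE_HTTP };

/* Hypothetical, trimmed-down descriptor in the spirit of logformat_tag. */
struct toy_lf_tag {
        const char *name;
        int mode;            /* minimal proxy mode required by the tag */
};

static const struct toy_lf_tag toy_tags[] = {
        { "B",   TOY_MODE_TCP  },
        { "trl", TOY_MODE_HTTP },
};

enum toy_node_type { TOY_NODE_TEXT, TOY_NODE_TAG };

struct toy_lf_node {
        enum toy_node_type type;
        const struct toy_lf_tag *tag;   /* valid when type == TOY_NODE_TAG */
};

/* Resolves the tag once at parse time; later code follows node->tag instead
 * of scanning the table again. Returns 0 if the tag is unknown. */
static int toy_parse_tag(struct toy_lf_node *node, const char *name)
{
        for (size_t i = 0; i < sizeof(toy_tags) / sizeof(toy_tags[0]); i++) {
                if (strcmp(toy_tags[i].name, name) == 0) {
                        node->type = TOY_NODE_TAG;
                        node->tag = &toy_tags[i];
                        return 1;
                }
        }
        return 0;
}
```

The extra pointer per node is the "simple indirection" cost the commit mentions, in exchange for O(1) access to any per-tag property such as the required mode.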
Aurelien DARRAGON	8cf5c3d7f0	MINOR: log: expose logformat_tag struct rename logformat_type internal struct to logformat_tag to make it less confusing, then expose logformat_tag struct through header file so that it can be referenced in other structs. also rename logformat_keywords[] to logformat_tags[] for better consistency.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	c85cbc1061	MEDIUM: log: rename logformat var to logformat tag What we used to call logformat variable in the code is referred to as log-format tag in the documentation. Having both 'var' and 'tag' labels referring to the same thing is really confusing. Let's make the code comply with the documentation by replacing all logformat var/variable/VAR occurrences with either tag or TAG. No functional change should be expected, the only visible side-effect from user point of view is that "variable" was replaced by "tag" in some error messages.	2024-04-04 19:10:01 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. This applies to both keys and expiration dates. The shard number is calculated from the key value itself, both when looking up and when setting it. The "show table" dump on the CLI iterates over all shards so that the output is not fully sorted, it's only sorted within each shard. The Lua table dump just does the same. It was verified with a Lua program to count stick-table entries that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc): function dump_stk() local dmp = core.proxies['tbl'].stktable:dump({}); local count = 0 for _, __ in pairs(dmp) do count = count + 1 end core.Info('Total entries: ' .. count) end core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0); ## global tune.lua.log.stderr on lua-load-per-thread lua-cnttbl.lua listen front bind :8001 http-request lua.dump_stk if { path_beg /stk } http-request track-sc1 rand(),upper,hex table tbl http-request redirect location / backend tbl stick-table size 100k type string len 12 store http_req_cnt ## $ h2load -c 16 -n 10000 0:8001/ $ curl 0:8001/stk ## A count close to 100k appears on haproxy's stderr ## On the CLI, "show table tbl" | wc will show the same. Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools. 
It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008 kRPS and at 128 it gives 1040 kRPS for 8 times the initial size. 16 would be too low however, with 675k RPS. The stksess already has a shard number: it's the one used to decide which peer connection to send the entry to. Maybe we should also store the one associated with the entry itself instead of recalculating it, though it does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock but the way it's used is not very clear yet, so at this point it's sure this will not work. At this point, this allowed completely unlocking the performance on an 80-thread setup: before: 5.4 Gbps, 150k RPS, 80 cores 52.71% haproxy [.] stktable_lookup_key 36.90% haproxy [.] stktable_get_entry.part.0 0.86% haproxy [.] ebmb_lookup 0.18% haproxy [.] process_stream 0.12% haproxy [.] process_table_expire 0.11% haproxy [.] fwrr_get_next_server 0.10% haproxy [.] eb32_insert 0.10% haproxy [.] run_tasks_from_lists after: 36 Gbps, 980k RPS, 80 cores 44.92% haproxy [.] stktable_get_entry 5.47% haproxy [.] ebmb_lookup 2.50% haproxy [.] fwrr_get_next_server 0.97% haproxy [.] eb32_insert 0.92% haproxy [.] process_stream 0.52% haproxy [.] run_tasks_from_lists 0.45% haproxy [.] conn_backend_get 0.44% haproxy [.] __pool_alloc 0.35% haproxy [.] process_table_expire 0.35% haproxy [.] connect_server 0.35% haproxy [.] h1_headers_to_hdr_list 0.34% haproxy [.] eb_delete 0.31% haproxy [.] srv_add_to_idle_list 0.30% haproxy [.] h1_snd_buf WIP: uint64_t -> long WIP: ulong -> uint code is much smaller	2024-04-03 17:34:47 +02:00
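The key property of the sharding is that lookup and insertion both derive the shard from the key alone. The sketch below substitutes 32-bit FNV-1a for XXH32() to stay dependency-free (a different hash function), and hard-codes the 32 buckets quoted above as a hypothetical constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a, used here as a stand-in for XXH32(). */
static uint32_t toy_fnv1a32(const void *data, size_t len)
{
        const unsigned char *p = data;
        uint32_t h = 2166136261u;          /* FNV offset basis */

        while (len--) {
                h ^= *p++;
                h *= 16777619u;            /* FNV prime */
        }
        return h;
}

#define TOY_TBL_BUCKETS 32   /* value quoted above for 1024-thread builds */

/* A given key always lands in (and is searched in) the same tree. */
static unsigned toy_shard_of(const void *key, size_t len)
{
        return toy_fnv1a32(key, len) % TOY_TBL_BUCKETS;
}
```

Since each shard has its own tree and lock, threads working on keys in different shards no longer contend, at the price of a dump being sorted only within each shard.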
Willy Tarreau	4c1480f13b	MINOR: stick-tables: mark the seen stksess with a flag "seen" Right now we're taking the stick-tables update lock for reads just for the sake of checking if the update index is past it or not. That's costly because even taking the read lock is sufficient to provoke a cache line write, while when under load or attack it's frequent that the update has not yet been propagated and wouldn't require anything. This commit brings a new field to the stksess, "seen", which is zeroed when the entry is updated, and set to one as soon as at least one peer starts to consult it. This way it will reflect that the entry must be updated again so that this peer can see it. Otherwise no update will be necessary. For now the flag is only set/reset but not exploited. A great care is taken to avoid writes whenever possible.	2024-04-03 17:34:47 +02:00
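A sketch of the "seen" marker using C11 atomics, reading before writing so that the common case does not dirty the cache line. The struct and functions are hypothetical; relaxed ordering is a simplification; and `toy_needs_update_lock()` only illustrates how the flag might later be exploited, since this commit only sets and resets it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical entry carrying the "seen" marker. */
struct toy_stksess {
        atomic_uint seen;   /* 0 after an update, 1 once a peer consulted it */
};

/* Entry updated: peers must look at it again. Write only when needed. */
static void toy_entry_updated(struct toy_stksess *ts)
{
        if (atomic_load_explicit(&ts->seen, memory_order_relaxed))
                atomic_store_explicit(&ts->seen, 0, memory_order_relaxed);
}

/* A peer consulted the entry. Write only when needed. */
static void toy_entry_seen(struct toy_stksess *ts)
{
        if (!atomic_load_explicit(&ts->seen, memory_order_relaxed))
                atomic_store_explicit(&ts->seen, 1, memory_order_relaxed);
}

/* Possible future use: an entry nobody has seen since its last update is
 * already pending propagation, so the update lock could be skipped. */
static int toy_needs_update_lock(struct toy_stksess *ts)
{
        return atomic_load_explicit(&ts->seen, memory_order_relaxed) != 0;
}
```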
William Lallemand	aa3632962f	MEDIUM: mworker: get rid of libsystemd Given the xz drama which allowed liblzma to be linked to openssh, let's remove libsystemd to get rid of useless dependencies. The sd_notify API seems to be stable and is now documented. This patch replaces the sd_notify() and sd_notifyf() functions with a reimplementation inspired by the systemd documentation. This should not change anything functionally. These functions will be built when haproxy is built using USE_SYSTEMD=1. References: https://github.com/systemd/systemd/issues/32028 https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes Before: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffcfaf65000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000) libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000) libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000) libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000) liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000) libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000) liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000) /lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000) libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000) After: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffd51901000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f758d61b000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000) libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000) /lib64/ld-linux-x86-64.so.2 (0x00007f758d729000) A backport to all stable versions could be considered at some point.	2024-04-03 15:53:18 +02:00
Frederic Lecaille	a305bb92b9	MINOR: quic: HyStart++ implementation (RFC 9406) This is a simple algorithm to replace the classic slow start phase of the congestion control algorithms. It should reduce the high packet loss during this step. Implemented only for Cubic.	2024-04-02 18:47:19 +02:00
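A minimal C sketch of the HyStart++ delay-increase check from RFC 9406, for illustration only: the struct and the hspp_*() names are hypothetical and do not reflect HAProxy's QUIC code. It uses the RFC's recommended constants (4 ms and 16 ms bounds on the threshold, divisor 8, at least 8 RTT samples per round) and only decides when to leave classic slow start for Conservative Slow Start (CSS); the CSS phase itself is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants recommended by RFC 9406, times in microseconds. */
#define HSPP_MIN_RTT_THRESH    4000u       /* 4 ms */
#define HSPP_MAX_RTT_THRESH   16000u       /* 16 ms */
#define HSPP_MIN_RTT_DIVISOR      8u
#define HSPP_N_RTT_SAMPLE         8u
#define HSPP_RTT_UNKNOWN  UINT32_MAX       /* stands for "infinity" in the RFC */

/* Hypothetical per-round state kept during slow start. */
struct hspp_round {
	uint32_t last_round_min_rtt;  /* min RTT of the previous round */
	uint32_t curr_round_min_rtt;  /* min RTT seen so far in this round */
	uint32_t rtt_sample_count;    /* RTT samples taken in this round */
};

/* Records one RTT sample in the current round. */
static void hspp_on_rtt_sample(struct hspp_round *r, uint32_t rtt)
{
	assert(rtt != HSPP_RTT_UNKNOWN);
	if (rtt < r->curr_round_min_rtt)
		r->curr_round_min_rtt = rtt;
	r->rtt_sample_count++;
}

/* Starts a new round: the current minimum becomes the previous one. */
static void hspp_new_round(struct hspp_round *r)
{
	r->last_round_min_rtt = r->curr_round_min_rtt;
	r->curr_round_min_rtt = HSPP_RTT_UNKNOWN;
	r->rtt_sample_count = 0;
}

/* Returns true when the RTT increase suggests leaving slow start for CSS. */
static bool hspp_should_enter_css(const struct hspp_round *r)
{
	uint32_t thresh;

	if (r->rtt_sample_count < HSPP_N_RTT_SAMPLE ||
	    r->curr_round_min_rtt == HSPP_RTT_UNKNOWN ||
	    r->last_round_min_rtt == HSPP_RTT_UNKNOWN)
		return false;

	/* RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, MAX_RTT_THRESH) */
	thresh = r->last_round_min_rtt / HSPP_MIN_RTT_DIVISOR;
	if (thresh < HSPP_MIN_RTT_THRESH)
		thresh = HSPP_MIN_RTT_THRESH;
	if (thresh > HSPP_MAX_RTT_THRESH)
		thresh = HSPP_MAX_RTT_THRESH;

	return r->curr_round_min_rtt >= r->last_round_min_rtt + thresh;
}
```

With a 50 ms baseline the threshold is 6.25 ms, so a round whose minimum RTT grows by only 2 ms keeps slow start going, while a larger jump triggers the switch.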
Anthony Deschamps	faa8c3e024	MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key <key>" which can be set to "id" (the existing behaviour, default), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now check whether the server's hash has changed. If it has, we have to remove all its nodes first, since the node keys will also have to change.	2024-04-02 07:00:10 +02:00
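The following sketch illustrates deriving consistent-hash node keys from a server's address and port rather than from its puid. It is not HAProxy's implementation: the struct, the enum and the use of 32-bit FNV-1a as the hash function are assumptions made for illustration. Two servers occupying different slots (different puid) but resolved to the same address and port end up with identical node keys, and an unknown address falls back to the puid as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the "hash-key" values: id, addr, addr-port. */
enum hash_key_mode { HASH_KEY_ID, HASH_KEY_ADDR, HASH_KEY_ADDR_PORT };

/* Hypothetical server description (IPv4 only, for brevity). */
struct srv_desc {
	uint32_t puid;
	uint32_t ipv4;
	uint16_t port;
	int addr_known;
};

/* 32-bit FNV-1a, used here only as a stand-in hash function. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Computes the key of the <node>-th tree node of server <s>. */
static uint32_t chash_node_key(const struct srv_desc *s, enum hash_key_mode mode, uint32_t node)
{
	uint32_t h = 2166136261u;  /* FNV offset basis */

	assert(s != NULL);
	if (mode == HASH_KEY_ID || !s->addr_known) {
		h = fnv1a(h, &s->puid, sizeof(s->puid));   /* previous behaviour */
	} else {
		h = fnv1a(h, &s->ipv4, sizeof(s->ipv4));
		if (mode == HASH_KEY_ADDR_PORT)
			h = fnv1a(h, &s->port, sizeof(s->port));
	}
	return fnv1a(h, &node, sizeof(node));  /* one key per node of this server */
}
```

Hashing raw integer bytes makes the keys depend on byte order, which is harmless as long as all instances run on the same architecture.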
Aurelien DARRAGON	3c6dfa618a	MEDIUM: log/balance: leverage lbprm api for log load-balancing The log load-balancing implementation was not seamlessly integrated with the lbprm API. The consequence is that it could become harder to maintain over time since it added some specific cases just for the log backend. Moreover, it resulted in some code duplication since balance algorithms that are common to logs and regular (tcp, http) backends were specifically rewritten for log backends. Thanks to the previous commit, we now have all the prerequisites to make log load-balancing fully leverage lbprm logic. Thus in this patch we make __do_send_log_backend() use existing lbprm algorithms, and we no longer require log-specific lbprm initialization in cfgparse.c and in postcheck_log_backend(). As a bonus, for log backends this allows weighted algorithms to properly support weights (i.e. roundrobin, random and log-hash) since we now leverage the same lb algorithms that we use for tcp/http backends (doc was updated).	2024-03-29 17:08:37 +01:00
Aurelien DARRAGON	9aea6df81f	MINOR: lbprm: implement true "sticky" balance algo As previously mentioned in `cd352c0db` ("MINOR: log/balance: rename "log-sticky" to "sticky""), let's define a sticky algorithm that may be used from any protocol. Sticky algorithm sticks on the same server as long as it remains available. The documentation was updated accordingly.	2024-03-29 17:08:37 +01:00
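A rough sketch of the sticky behavior described above, assuming a plain array of servers with an availability flag (both hypothetical). Choosing the first available server as the fallback is an assumption of this sketch; the commit does not say how the replacement server is picked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified server entry. */
struct lb_srv {
	const char *name;
	int available;
};

/* Returns the index of the server to use. Keeps <*current> as long as that
 * server remains available; otherwise picks the first available one and
 * sticks to it from then on. Returns -1 if no server is available.
 */
static int sticky_pick(const struct lb_srv *srv, size_t nb, int *current)
{
	size_t i;

	assert(current != NULL);
	if (*current >= 0 && (size_t)*current < nb && srv[*current].available)
		return *current;

	for (i = 0; i < nb; i++) {
		if (srv[i].available) {
			*current = (int)i;
			return *current;
		}
	}
	*current = -1;
	return -1;
}
```

Note that once traffic has moved to another server, it stays there even after the original server comes back.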
Christopher Faulet	87426e82ec	MAJOR: cli: Use a custom .snd_buf function to only copy the current command The CLI applet is now using its own snd_buf callback function. Instead of copying as much output data as possible, only one command is copied at a time. To do so, a new state CLI_ST_PARSEREQ is added for the CLI applet. In this state, the CLI I/O handler knows a full command was copied into its input buffer and it must parse this command to evaluate it.	2024-03-28 17:32:55 +01:00
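To illustrate copying a single command at a time, here is a hypothetical helper that copies at most one LF-terminated command from an input block and leaves the rest for a later call. It assumes commands end with LF; HAProxy's actual .snd_buf callback and its CLI_ST_PARSEREQ handling are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most one complete command (up to and including the first LF)
 * from <in> into <out>. Returns the number of bytes consumed, or 0 when no
 * full command is available yet or it does not fit into <out>.
 */
static size_t cli_copy_one_cmd(const char *in, size_t in_len, char *out, size_t out_size)
{
	const char *lf = memchr(in, '\n', in_len);
	size_t len;

	assert(out_size > 0);
	if (!lf)
		return 0;               /* incomplete command: wait for more data */
	len = (size_t)(lf - in) + 1;
	if (len >= out_size)
		return 0;               /* command too long for the destination */
	memcpy(out, in, len);
	out[len] = '\0';
	return len;
}
```

The caller would then parse and execute the copied command before accepting the next one.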
Christopher Faulet	838fb54de6	MINOR: stconn: Add a connection flag to notify sending data are the last ones This flag can be used by endpoints to know that the data to send via the .snd_buf callback function are the last ones. It is useful to know that a shutdown is pending but cannot be delivered while the data being sent are not consumed.	2024-03-28 17:32:55 +01:00
Christopher Faulet	2c6321842b	MEDIUM: applet: Handle applets with their own buffers in put functions applet_putchk() and other similar functions now test the applet's type to use the applet's outbuf instead of the channel's buffer. This will ease applet conversion because most of them rely on these functions.	2024-03-28 17:28:20 +01:00
Christopher Faulet	1380b97285	MEDIUM: buf: Add b_getline() and b_getdelim() functions These functions are very similar to co_getline() and co_getdelim(). b_getdelim() retrieves the longest part of the buffer that is composed exclusively of characters not in a delimiter set. b_getline() stops on LF only and thus returns a line.	2024-03-28 17:28:20 +01:00
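A simplified, non-wrapping sketch of the two behaviors, assuming (as for co_getdelim()) that the delimiter found is included in the result. flat_getdelim() and flat_getline() are hypothetical names; the real b_getdelim() and b_getline() operate on struct buffer and handle wrapping, and their exact signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies into <str> the longest prefix of <buf> made only of characters not
 * in <delim>, plus the delimiter if one is found. Returns the number of bytes
 * copied, or 0 if <len> is too small to hold them and a trailing zero.
 */
static size_t flat_getdelim(const char *buf, size_t count, char *str, size_t len, const char *delim)
{
	size_t n = 0;

	assert(delim != NULL);
	while (n < count) {
		char c = buf[n++];

		if (memchr(delim, c, strlen(delim)))
			break;          /* the delimiter is part of the result */
	}
	if (n >= len)
		return 0;
	memcpy(str, buf, n);
	str[n] = '\0';
	return n;
}

/* Line variant: the delimiter set is reduced to LF. */
static size_t flat_getline(const char *buf, size_t count, char *str, size_t len)
{
	return flat_getdelim(buf, count, str, len, "\n");
}
```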
Christopher Faulet	5056cbdb86	MINOR: sc_strm: Add generic version to perform sync receives and sends sc_sync_recv() and sc_sync_send() were added to use the connection or applet versions, depending on the endpoint type. For now these functions are not used, but they will be used by process_stream() to replace the connection version.	2024-03-28 17:28:20 +01:00
Remi Tricot-Le Breton	7359c0c7f4	MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option This option can be used to set a default ocsp-update mode for all certificates of a given conf file. It allows enabling ocsp-update on certificates without the need to create separate crt-lists. It can still be superseded by the crt-list 'ocsp-update' option. It takes either "on" or "off" as value and defaults to "off". Since setting this new parameter to "on" would mean that we try to enable ocsp-update on any certificate, including certificates that don't have an OCSP URI, the checks performed in ssl_sock_load_ocsp were softened. We don't systematically raise an error when trying to enable ocsp-update on a certificate that does not have an OCSP URI, be it via the global option or the crt-list one. We will still raise an error when a user tries to load a certificate that does have an OCSP URI but a missing issuer certificate (if ocsp-update is enabled).	2024-03-27 11:38:28 +01:00
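The precedence rules described above can be summarized with small hypothetical helpers: a per-line crt-list setting wins when present, otherwise the global default applies. The enum and function names are illustrative only; the second helper models the error condition that remains after the checks were softened.

```c
#include <assert.h>

/* Hypothetical tri-state for the per-line crt-list "ocsp-update" option. */
enum ocsp_update_opt { OCSP_UPD_UNSET = 0, OCSP_UPD_OFF, OCSP_UPD_ON };

/* Returns 1 if OCSP auto-update applies to a certificate: the crt-list
 * option wins when set, otherwise the global mode (default "off") is used.
 */
static int ocsp_update_enabled(enum ocsp_update_opt crtlist_opt, int global_mode_on)
{
	if (crtlist_opt != OCSP_UPD_UNSET)
		return crtlist_opt == OCSP_UPD_ON;
	return global_mode_on;
}

/* A missing OCSP URI is tolerated; a certificate that has an OCSP URI but no
 * issuer is still an error when updates are enabled.
 */
static int ocsp_load_error(int has_ocsp_uri, int has_issuer, int update_on)
{
	assert(update_on == 0 || update_on == 1);
	return update_on && has_ocsp_uri && !has_issuer;
}
```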
Willy Tarreau	6c1b29d06f	MINOR: ring: make the number of queues configurable Now the rings have one wait queue per group. This should limit the contention on systems such as EPYC CPUs where the performance drops dramatically when using more than one CCX. Tests were run with different numbers and it was shown that value 6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and anything around these values degrades quickly. The value has been left tunable in the global section. This commit only introduces everything needed to set up the queue count so that it's easier to adjust it in the forthcoming patches, but it was initially added after the series, making it harder to compare. It was also shown that trying to group the threads in queues by their thread groups is counter-productive and that it was more efficient to do that by applying a modulo on the thread number. As surprising as it seems, it does have the benefit of balancing any number of threads well.	2024-03-25 17:34:19 +00:00
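A sketch of the modulo-based assignment, with hypothetical names. The queue count of 6 is used only because the commit reports it performed best; in HAProxy the value is tunable from the global section.

```c
#include <assert.h>

/* Hypothetical default queue count, per the measurements above. */
#define RING_WAIT_QUEUES 6

/* Maps a thread number to a wait queue with a simple modulo, which spreads
 * any thread count evenly instead of grouping threads by thread group.
 */
static unsigned int ring_queue_for_thread(unsigned int tid, unsigned int nb_queues)
{
	assert(nb_queues > 0);
	return tid % nb_queues;
}
```

With 48 threads and 6 queues, each queue serves exactly 8 threads.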
Willy Tarreau	e3f101a19a	MINOR: ring: add the definition of a ring waiting cell This is what will be used to describe one waiting thread, its message in the queues, and the aggregation of pending messages after it.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c7bd7a68e4	OPTIM: ring: have only one thread at a time wake up all readers It's inefficient and counter-productive that each ring writer iterates over all readers to wake them up. Let's just have one in charge of this, it strongly limits contention. The only thing is that since the thread is iterating over a list, we want to be sure that if the first readers have already completed their job, they will be woken up again. For this we keep a counter of messages delivered after the wakeup started, and the waking thread will check it before going back to sleep. In order to avoid looping forever, it will also drop its waking flag soon enough to possibly let another one take it. Before this, a few watchdog reports used to occur during the list iteration on a 24-core AMD EPYC platform; they no longer appear. Performance dropped a bit at 3C6T on the EPYC, from 6.61M to 6.0M, but remains unchanged at 24C48T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	9e99cfbeb6	MAJOR: ring: drop the now unneeded lock It was only used to protect the list which is now an mt_list so it doesn't provide any required protection anymore. It obviously also used to provide strict ordering between the writer and the reader when the writer started to update the messages, but that's now covered by the ordered tail updates and updates to the readers count to protect the area. The message rate on small thread counts (up to 12) saw a boost of roughly 5% while for large counts it lost about 2% due to some contention now becoming visible elsewhere. Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at 24C48T on the EPYC.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2d2dbf210	MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead Rings are keeping a lock only for the list, which apparently doesn't need anything more than an mt_list, so let's first turn it into that before dropping the lock. There should be no visible effect.	2024-03-25 17:34:19 +00:00
Willy Tarreau	eb3d5f464d	MEDIUM: ring: use the topmost bit of the tail as a lock We're now locking the tail while looking for some room in the ring. In fact it's still held while writing to it, but the goal definitely is to get rid of the lock ASAP. For this we reserve the topmost bit of the tail as a lock, which may have as a possible visible effect that buffers will be limited to 2GB instead of 4GB on 32-bit machines (though in practice, good luck allocating more than 2GB contiguous on 32-bit), but in practice since the size is read with atol() and some operating systems limit it to LONG_MAX unless passing negative numbers, the limit is already there. For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s on 48 threads on EPYC 24 cores) but this situation is only temporary so that changes can be reviewable and bisectable. Other approaches were attempted, such as using XCHG instead, which is slightly faster on x86 with low thread counts (but causes more write contention), and forces readers to stall under heavy traffic because they can't access a valid value for the queue anymore. A CAS requires preloading the value and is less efficient on ARMv8.1. XADD could also be considered with 12-13 upper bits of the offset dedicated to locking, but that looks overkill.	2024-03-25 17:34:19 +00:00
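A minimal C11 sketch of using the topmost bit of a tail offset as a spin lock, assuming a size_t tail. It uses atomic_fetch_or(); the commit does not say which atomic primitive HAProxy finally relies on (it discusses XCHG, CAS and XADD as alternatives), so treat this only as an illustration of the bit-lock idea. Unlocking publishes the new tail and clears the bit with a single release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Topmost bit of the tail used as a lock flag: offsets must stay below it. */
#define RING_TAIL_LOCK ((size_t)1 << (sizeof(size_t) * 8 - 1))

/* Spins until the lock bit is ours; returns the unlocked tail value. */
static size_t ring_tail_lock(_Atomic size_t *tail)
{
	size_t prev;

	for (;;) {
		prev = atomic_fetch_or_explicit(tail, RING_TAIL_LOCK, memory_order_acquire);
		if (!(prev & RING_TAIL_LOCK))
			return prev;    /* we set the bit: lock acquired */
		/* held by someone else: wait with plain loads until it clears */
		while (atomic_load_explicit(tail, memory_order_relaxed) & RING_TAIL_LOCK)
			;
	}
}

/* Stores the new tail, which also clears the lock bit. */
static void ring_tail_unlock(_Atomic size_t *tail, size_t new_tail)
{
	assert(!(new_tail & RING_TAIL_LOCK));
	atomic_store_explicit(tail, new_tail, memory_order_release);
}
```

Readers that only need the tail value can mask the bit out, which is the property that distinguishes this approach from swapping in a sentinel value.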
Willy Tarreau	dd8ea5d928	MEDIUM: ring: align the head and tail fields in the ring_storage structure We really want to let the readers and writers act on different areas, so we want to have the tail and the head on separate cache lines, themselves separate from the rest of the ring. Doing so improves the performance from 2.15 to 2.35M msg/s at 48 threads on a 24-core EPYC. This increases the header space from 32 to 192 bytes when threads are enabled. But since we already have the header size available in the file, haring remains able to detect the aligned vs unaligned formats and call dump_v2a() when aligned is detected.	2024-03-25 17:34:19 +00:00
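The layout idea can be illustrated with C11 _Alignas, assuming 64-byte cache lines; the struct below is hypothetical and is not HAProxy's ring_storage definition. The tail and head each start their own cache line, separate from the read-mostly fields, so an update to one does not invalidate the line holding the other.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* Hypothetical ring header with head and tail on separate cache lines. */
struct ring_hdr_sketch {
	size_t size;                        /* read-mostly */
	size_t rsvd;                        /* read-mostly */
	_Alignas(CACHE_LINE) size_t tail;   /* starts its own cache line */
	_Alignas(CACHE_LINE) size_t head;   /* starts another cache line */
};

static_assert(offsetof(struct ring_hdr_sketch, tail) % CACHE_LINE == 0,
              "tail must start a cache line");
static_assert(offsetof(struct ring_hdr_sketch, head) -
              offsetof(struct ring_hdr_sketch, tail) >= CACHE_LINE,
              "head and tail must not share a cache line");
```

The price is padding: this sketch's header grows to three cache lines.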
Willy Tarreau	bf3dead20c	MEDIUM: ring: remove the struct buffer from the ring The purpose is to store a head and a tail that are independent so that we can further improve the API to update them independently from each other. The struct was arranged like the original one so that as long as a ring has its head set to zero (i.e. no recycling) it will continue to work. The new format is already detectable thanks to the "rsvd" field which indicates the number of reserved bytes at the beginning. It's located where the buffer's area pointer previously was, so that older versions of haring can continue to open the ring in repair mode, and newer ones can use the fact that the upper bits of that variable are zero to guess that it's working with the new format instead of the old one. Also let's keep in mind that the layout will further change to place some alignment constraints. The haring tool is thus updated accordingly: it detects that the rsvd field is smaller than a page and that its sum with the size equals the mapped size, in which case it uses the new dump_v2() function instead of dump_v1(). The new function also creates a buffer from the ring's area, size, head and tail and calls the generic one so that no other code had to be adapted.	2024-03-25 17:34:19 +00:00
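The detection heuristic described above can be written as a small predicate. The function and enum names are hypothetical and the page size is passed in rather than queried from the system, so this is only a sketch of the check, not haring's code.

```c
#include <assert.h>
#include <stddef.h>

enum ring_fmt { RING_FMT_V1, RING_FMT_V2 };

/* New layout when <rsvd> (stored where the old buffer area pointer was) is
 * smaller than a page and rsvd + size equals the mapped file size.
 */
static enum ring_fmt ring_detect_format(size_t rsvd, size_t size, size_t mapped_size, size_t page_size)
{
	assert(page_size > 0);
	if (rsvd < page_size && rsvd + size == mapped_size)
		return RING_FMT_V2;
	return RING_FMT_V1;
}
```

An old-format file holds a pointer value in that slot, which is normally far larger than a page, so it falls through to the v1 dumper.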
Willy Tarreau	01aa0a057c	MEDIUM: ring: change the ring reader to use the new vector-based API now The code now looks cleaner and more easily shows what still needs to be addressed. There are not that many changes in practice, these are mostly mechanical, essentially hiding the buffer from the callers.	2024-03-25 17:34:19 +00:00
Willy Tarreau	03816ccfa9	MAJOR: ring: insert an intermediary ring_storage level We'll need to add more complex structures in the ring, such as wait queues. That's far too much to be stored into the area in case of file-backed contents, so let's split the ring definition and its storage once and for all. This patch introduces a struct ring_storage which is assigned to ring->storage, which contains minimal information to represent the storage layout, i.e. for now only the buffer, and all the rest remains in the ring itself. The storage is appended immediately after it and the buffer's pointer always points to that area. It has the benefit of remaining 100% compatible with the existing file-backed layout. In memory, the allocation loses the size of a struct buffer. It's not even certain it's worth placing the size there, given that it's constant and that a dump of a ring wouldn't really need it (the file size is sufficient). But for now everything comes with the struct buffer, and later this will change once split into head and tail. Also this area may be completed with more information in the future (e.g. storage version, format, endianness, word size etc).	2024-03-25 17:34:19 +00:00
Willy Tarreau	01abdcb307	MINOR: ring: add a flag to indicate a mapped file Till now we used to rely on a heuristic pointer comparison to check if a ring was mapped or allocated. Better assign a flag to clarify this because it's going to become difficult otherwise.	2024-03-25 17:34:19 +00:00
Willy Tarreau	b30fd8cc2d	MINOR: ring: also add ring_area(), ring_head(), ring_tail() These will essentially be used to simplify the conversion to a new API.	2024-03-25 17:34:19 +00:00
Willy Tarreau	dc4836c15c	MINOR: ring: add ring_dup() to copy a ring into another one This will mostly be used during reallocation and boot-time duplicates, the purpose is simply to save the caller from having to know the details of the internal representation.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a185d3d90d	MINOR: ring: add ring_size() to return the ring's size This is just to ease conversion so that callers stop accessing the ring's buffer.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4c41fcd0da	MINOR: ring: add ring_data() to report the amount of data in a ring This will be used as an accessor for the few functions that need this outside of ring.c.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c222cb8389	MINOR: vecpair: add necessary functions to use vecpairs from/to ring APIs Many ring-based APIs need a tail and a head, with the extra assumption that the user takes care of not filling the ring so that tail==head is unambiguous. Vectors are particularly suited to this usage so here we create 4 functions to create vectors representing free room or data from a ring, as well as updating rings based on a pair of vectors that represents either free space or data.	2024-03-25 17:34:19 +00:00
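A sketch of the two read-side helpers, expressing the data and the free room of a ring as up to two (pointer, length) vectors. struct vec stands in for HAProxy's ist, and the function names are hypothetical. To keep tail == head unambiguous, this sketch assumes the writer always leaves one byte unused.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for HAProxy's ist (pointer + length). */
struct vec {
	char *ptr;
	size_t len;
};

/* Data between <head> and <tail> in a ring of <size> bytes, as two vectors. */
static void ring_data_vecs(char *area, size_t size, size_t head, size_t tail,
                           struct vec *v1, struct vec *v2)
{
	assert(head < size && tail < size);
	if (head <= tail) {
		*v1 = (struct vec){ area + head, tail - head };
		*v2 = (struct vec){ area, 0 };
	} else {                                   /* data wraps */
		*v1 = (struct vec){ area + head, size - head };
		*v2 = (struct vec){ area, tail };
	}
}

/* Free room from <tail> up to one byte before <head>, as two vectors. */
static void ring_room_vecs(char *area, size_t size, size_t head, size_t tail,
                           struct vec *v1, struct vec *v2)
{
	assert(head < size && tail < size);
	if (tail < head) {
		*v1 = (struct vec){ area + tail, head - 1 - tail };
		*v2 = (struct vec){ area, 0 };
	} else if (head == 0) {                    /* keep the last byte free */
		*v1 = (struct vec){ area + tail, size - 1 - tail };
		*v2 = (struct vec){ area, 0 };
	} else {                                   /* room wraps */
		*v1 = (struct vec){ area + tail, size - tail };
		*v2 = (struct vec){ area, head - 1 };
	}
}
```

For any head and tail, data plus room always adds up to size - 1 bytes in this sketch.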
Willy Tarreau	63261aae39	MINOR: vecpair: add new vector pair based data manipulation mechanisms The buffers API defines both a storage layout and how to handle the data. The storage is shared with the chunks API which only deals with non-wrapping messages while buffers support wrapping both of the data and of the free space. As such, most of the buffers code already makes special cases of two parts in a buffer, the first one before wrapping and the optional second one after the wrapping occurred. The thing is, there are plenty of other places (e.g. rings) where the code dealing with wrapping is desirable but with a different storage layout. Let's export the existing buffer handling code related to reading/writing wrapping data and make it work with arbitrary vector pairs instead. This will handle wrapping and holes in messages if desired, and it will be up to the caller to decide how its messages are arranged and to pass the relevant ptr,len elements. The code is limited to two vectors because this is sufficient to deal with wrapping without making the code needlessly complex. I.e. this will not reassemble an iovec. For vectors, since we already had the ist type, there's no point inventing a new type, and it's even possible that over time some callers will find benefits in using this unified API (i.e. no NOP translation layer). It also allows passing inputs as direct arguments and outputs as pointers. Not only is this more efficient code-wise, but it also avoids the accidental use of a wrong function. It was indeed found that naming functions is even harder than with the buffer as the notion of from/to is even fuzzier here. The API will likely continue to evolve and some functions might get renamed to more explicit ones over time to limit confusion. For now the code provides everything needed to reset/create/fill/erase/read/peek or measure vector pairs and to manipulate chars/blocks/varints to/from there.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0b1c17a2dd	MINOR: ring: reserve one special value for the readers count In order to support concurrent writers we'll need to lock areas in the buffer. For this we'll use one special value of the single-byte readers count. Let's reserve it now and use the macro instead of the hardcoded 255.	2024-03-25 17:34:19 +00:00
Willy Tarreau	63242a59c4	MINOR: buf: add b_getblk_ofs() that works relative to area and not head For some concurrently accessed buffers we can't rely on head/data etc, but sometimes the access patterns guarantee that the buffer contents are there. Let's implement a function to read contents from a fixed offset, which never checks head nor data, only the area and its size. It's the caller's job to get this offset.	2024-03-25 17:34:19 +00:00
Willy Tarreau	2f28981546	MINOR: buf: add b_putblk_ofs() to copy a block at a specific position This new function b_putblk_ofs() puts one full block of data of length <len> from <blk> into the buffer, starting from absolute offset <offset> after the buffer's area. As a convenience to avoid complex checks in callers, the offset is allowed to exceed a valid one by no more than one buffer size, and will automatically be wrapped. The caller is responsible for ensuring that <len> doesn't exceed the known length of the available room at this position, otherwise data may be overwritten. The buffer's length is not updated, so generally the caller will have updated it before calling this function. This is meant to be used on concurrently accessed buffers, so that a writer can append data while a reader is blocked by other means from reaching the current area. The function guarantees never to use ->head or ->data.	2024-03-25 17:34:19 +00:00
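A simplified sketch of the offset-based copy in both directions, working only from the area pointer and its size, as b_getblk_ofs() and b_putblk_ofs() are described to do. struct area_buf and the area_*() names are hypothetical; the real functions operate on struct buffer. An offset up to one size too large is wrapped automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat view: only the area and its size are used. */
struct area_buf {
	char *area;
	size_t size;
};

/* Reads <len> bytes at absolute offset <ofs> into <blk>, wrapping at the end. */
static void area_getblk_ofs(const struct area_buf *b, char *blk, size_t len, size_t ofs)
{
	size_t first;

	assert(len <= b->size && ofs < 2 * b->size);
	if (ofs >= b->size)
		ofs -= b->size;
	first = b->size - ofs;
	if (first > len)
		first = len;
	memcpy(blk, b->area + ofs, first);
	memcpy(blk + first, b->area, len - first);
}

/* Writes <len> bytes from <blk> at absolute offset <ofs>, wrapping likewise.
 * The caller guarantees that this room is free.
 */
static void area_putblk_ofs(struct area_buf *b, const char *blk, size_t len, size_t ofs)
{
	size_t first;

	assert(len <= b->size && ofs < 2 * b->size);
	if (ofs >= b->size)
		ofs -= b->size;
	first = b->size - ofs;
	if (first > len)
		first = len;
	memcpy(b->area + ofs, blk, first);
	memcpy(b->area, blk + first, len - first);
}
```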
Willy Tarreau	c5004ccb36	MINOR: buf: add b_rel_ofs() to turn an absolute offset into a relative one It basically does the opposite of b_peek_ofs(). If x=b_peek_ofs(y), then y=b_rel_ofs(x).	2024-03-25 17:34:19 +00:00
Willy Tarreau	15e47b6a59	MINOR: buf: add b_add_ofs() to add a count to an absolute position This function is used to compute a new absolute buffer offset by adding a length to an existing valid offset. It will check for wrapping.	2024-03-25 17:34:19 +00:00
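The relation between b_peek_ofs(), b_rel_ofs() and b_add_ofs() can be sketched on a minimal buffer holding only a size and a head. These mb_*() helpers are hypothetical stand-ins written from the descriptions above, not HAProxy's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal buffer: only size and head matter here. */
struct mini_buf {
	size_t size;
	size_t head;
};

/* Absolute offset of the byte <rel> bytes after head (b_peek_ofs()-like). */
static size_t mb_peek_ofs(const struct mini_buf *b, size_t rel)
{
	size_t ofs = b->head + rel;

	if (ofs >= b->size)
		ofs -= b->size;
	return ofs;
}

/* Inverse: distance from head to absolute offset <abs> (b_rel_ofs()-like). */
static size_t mb_rel_ofs(const struct mini_buf *b, size_t abs)
{
	return (abs >= b->head) ? abs - b->head : abs + b->size - b->head;
}

/* Adds <count> to a valid absolute offset, wrapping (b_add_ofs()-like). */
static size_t mb_add_ofs(const struct mini_buf *b, size_t ofs, size_t count)
{
	assert(ofs < b->size && count <= b->size);
	ofs += count;
	if (ofs >= b->size)
		ofs -= b->size;
	return ofs;
}
```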
Willy Tarreau	c62a2d540d	MEDIUM: ring: move the ring reader code to ring_dispatch_messages() This new function is built around the loop that scans a ring for new messages and dispatches them to a message handler. It also takes ring flags (WAIT, NEW, etc) and offset pointers that the caller will use to initialize/reuse/update the current processing offset. The caller is still responsible for presetting it to ~0 before the first call if it wants the function to automatically adjust it (or set it to the correct value). The function may also return the last_ofs that was known before releasing the lock so that the caller knows what to compare against and if it needs to restart processing or not. The context remains a void* so that it does not necessarily depend on an appctx. The current "show ring" code was ported to this and it continues to work as expected.	2024-03-25 17:34:19 +00:00
Willy Tarreau	ad31e53287	REORG: dns/ring: split the ring between the generic one and the DNS one A ring is used for the DNS code but slightly differently from the generic one, which prevents some important changes from being made to the generic code without breaking DNS. As the use cases differ, it's better to just split them apart for now and have the DNS code use its own ring that we rename dns_ring and let the generic code continue to live on its own. The unused parts such as CLI registration were dropped, resizing and allocation from a mapped area were dropped. dns_ring_detach_appctx() was kept despite not being used, so as to stay consistent with the comments that say it must be called, despite the DNS code explicitly mentioning that it skips it for now (i.e. this may change in the future). Hopefully after the generic rings are converted the DNS code can migrate back to them, though this is really not necessary.	2024-03-25 17:34:19 +00:00
Willy Tarreau	201c706330	MINOR: log/applet: add new function syslog_applet_append_event() This function takes a buffer on input, an offset and a length, and consumes the block from that buffer to send it to the appctx's output buffer. Contrary to its sibling applet_append_line(), instead of just appending an LF at the end of the line, it prepends the message size in decimal and a space before the message, as expected by syslog TCP implementations. This will be used to simplify the ring reader code.	2024-03-25 17:34:19 +00:00
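The two framings can be compared with a standalone sketch: octet counting (decimal length, a space, then the message, as in RFC 6587) versus the LF terminator used by applet_append_line(). The helpers below are hypothetical and write into a flat char array rather than an appctx output buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Octet-counting framing: "<len> <msg>". Returns the frame length, or 0 if
 * it does not fit into <out> (trailing zero included).
 */
static size_t syslog_frame(char *out, size_t out_size, const char *msg, size_t len)
{
	int hdr;

	assert(out_size > 0);
	hdr = snprintf(out, out_size, "%zu ", len);
	if (hdr < 0 || (size_t)hdr + len >= out_size)
		return 0;
	memcpy(out + hdr, msg, len);
	out[hdr + len] = '\0';
	return (size_t)hdr + len;
}

/* Line framing for comparison: "<msg>\n". */
static size_t line_frame(char *out, size_t out_size, const char *msg, size_t len)
{
	if (len + 1 >= out_size)
		return 0;
	memcpy(out, msg, len);
	out[len] = '\n';
	out[len + 1] = '\0';
	return len + 1;
}
```

The length prefix lets a TCP receiver delimit messages even when they contain LF characters.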
Willy Tarreau	6ae41dc510	MINOR: applet: add new function applet_append_line() This function takes a buffer on input, an offset and a length, and consumes the block from that buffer to send it to the appctx's output buffer. This will be used to simplify the ring reader code.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c038ca8e8c	MINOR: atomic: add a read-specific variant of __ha_cpu_relax() Tests on various systems show that x86 prefers not to wait at all inside read loops while aarch64 prefers to wait a little bit. Instead of having to stuff ifdefs around __ha_cpu_relax() inside plenty of such loops waiting for a condition to appear, it is better to implement a new variant, called __ha_cpu_relax_for_read(), which honors each architecture's preferences and is the same as __ha_cpu_relax() for other ones.	2024-03-25 17:34:19 +00:00
Aurelien DARRAGON	3de1acfb23	BUILD: server: fix build regression on old compilers (<= gcc-4.4) Willy reported that since `3ac79b504` ("MEDIUM: server: make server_set_inetaddr() updater serializable"), haproxy fails to compile on some older compilers such as gcc-4.4 with this kind of error: src/server.c: In function 'snr_resolution_cb': src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer compilation terminated due to -Wfatal-errors. make: *** [Makefile:1006: src/server.o] Error 1 This is due to referencing a member inside an anonymous union from a compound literal assignment. Apparently such use of anonymous unions wasn't properly supported back then on older compilers. To fix the issue, we give the name "u" to the parent union and use this name to explicitly refer to the union where relevant in the code (only a few changes fortunately). The fix itself was verified to restore build compatibility with gcc 4.4 (and even 4.2). As `3ac79b504` is used as a prerequisite for `64c9c8ef3` ("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please consider backporting this patch too if `64c9c8ef3` happens to be backported to 2.9.	2024-03-25 16:23:37 +01:00
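A hypothetical reduction of the fix: once the union is named u, the compound literal spells out the full designator path. The struct and field names below are illustrative and are not the real server.c definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical update event whose union used to be anonymous. */
struct addr_update {
	int source;
	union {
		struct { uint32_t resolver_id; } dns_resolver;
		struct { uint32_t check_id; } health_check;
	} u;  /* named union: designators go through ".u." */
};

static struct addr_update make_dns_update(uint32_t id)
{
	/* With an anonymous union this would read ".dns_resolver.resolver_id",
	 * the form old compilers rejected with "unknown field ... specified in
	 * initializer"; the explicit path through "u" avoids that.
	 */
	return (struct addr_update){ .source = 1, .u.dns_resolver.resolver_id = id };
}
```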
Amaury Denoyelle	0d4273f04b	MEDIUM: server: close private idle connection before server deletion This commit is similar to the following one: 65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116 MEDIUM: server: close idle conn before server deletion This patch implements similar logic, this time to close private idle connections stored in sessions. The principle is identical to the above commit: conn_release() is used on idle connections after a takeover to ensure thread safety. An extra change was required to be able to perform a takeover on such connections. Their original thread ID was unknown, contrary to non-private connections which are stored in sharded lists. As such, a new tid member has been added to the sess_priv_conns chaining element.	2024-03-22 17:12:27 +01:00
Amaury Denoyelle	f3862a9bc7	MINOR: connection: extend takeover with release option Extend takeover API both for MUX and XPRT with a new boolean argument <release>. Its purpose is to signal if the connection will be freed immediately after the takeover, rendering new resources allocation unnecessary. For the moment, release argument is always false. However, it will be set to true on delete server CLI handler to proactively close server idle connections.	2024-03-22 16:12:36 +01:00
Amaury Denoyelle	ff2e71ae24	MINOR: connection: implement conn_release() Several places reuse the same code to ensure a connection is properly freed, either via its MUX or by calling the proper set of functions. Factorize all of this in a new function conn_release(). This new function is now called via session_free() and session_accept_fd(). It will also be reused on delete server to proactively close idle connections.	2024-03-22 16:12:36 +01:00
Remi Tricot-Le Breton	5c25c577a0	BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing The CLI command "update ssl ocsp-response" was forcefully removing an OCSP response from the update tree regardless of whether it used to be in it beforehand or not. But since the main OCSP update task works by removing the entry currently being updated from the update tree and then reinserting it when the update process is over, it meant that in the CLI command code we were modifying a structure that was already being used. These concurrent accesses were not properly locked in the "regular" update case because it was assumed that once an entry was removed from the update tree, the update task was the only one able to work on it. Rather than locking the whole update process, an "updating" flag was added to the certificate_ocsp in order to prevent the "update ssl ocsp-response" command from trying to update a response already being updated. An easy way to reproduce this crash was to perform two "simultaneous" calls to "update ssl ocsp-response" on the same certificate. It would then crash on an eb64_delete call in the main ocsp update task function. This patch can be backported up to 2.8. Wait a little bit before backporting.	2024-03-20 16:12:10 +01:00
Remi Tricot-Le Breton	69071490ff	BUG/MAJOR: ocsp: Separate refcount per instance and per store With the current way OCSP responses are stored, a single OCSP response is stored (in a certificate_ocsp structure) when it is loaded during certificate parsing, and each SSL_CTX that references it increments its refcount. The reference to the certificate_ocsp is kept in the SSL_CTX linked to each ckch_inst, in an ex_data entry that gets freed when the context is freed. One of the downsides of this implementation is that if every ckch_inst referencing a certificate_ocsp gets destroyed, then the OCSP response is removed from the system. So if we were to remove all crt-list lines containing a given certificate (that has an OCSP response), and if all the corresponding SSL_CTXs were destroyed (no ongoing connection using them), the OCSP response would be destroyed even if the certificate remains in the system (as an unused certificate). In such a case, we would want the OCSP response not to be "usable", since it is not used by any ckch_inst, but still remain in the OCSP response tree so that if the certificate gets reused (via an "add ssl crt-list" command for instance), its OCSP response is still known as well. But we would also like such an entry not to be updated automatically anymore once no instance uses it. An easy way to do it could have been to keep a reference to the certificate_ocsp structure in the ckch_store as well, on top of all the ones in the ckch_instances, and to remove the ocsp response from the update tree once the refcount falls to 1, but it would not work because of the way the ocsp response tree keys are calculated. They are decorrelated from the ckch_store and are the actual OCSP_CERTIDs, which are a combination of the issuer's name hash and key hash, and the certificate's serial number. So two copies of the same certificate but with different names would still point to the same ocsp response tree entry.
The solution that answers all the needs expressed above is actually to have two reference counters in the certificate_ocsp structure, one actual reference counter corresponding to the number of "live" pointers on the certificate_ocsp structure, incremented for every SSL_CTX using it, and one for the ckch stores. If the ckch_store reference counter falls to 0, the corresponding certificate must have been removed via CLI calls ('set ssl cert' for instance). If the actual refcount falls to 0, then no live SSL_CTX uses the response anymore. It could happen if all the corresponding crt-list lines were removed and there are no live SSL sessions using the certificate anymore. If any of the two refcounts becomes 0, we will always remove the response from the auto update tree, because there's no point in spending time updating an OCSP response that no new SSL connection will be able to use. But the certificate_ocsp object won't be removed from the tree unless both refcounts are 0. Must be backported up to 2.8. Wait a little bit before backporting.	2024-03-20 16:12:10 +01:00
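A sketch of the two-counter scheme, using hypothetical field names and boolean flags in place of the real trees: the entry leaves the auto-update tree as soon as either counter reaches zero, and leaves the response tree only when both do. Locking and the actual tree operations are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters described above. */
struct ocsp_entry {
	unsigned int refcount;        /* live SSL_CTX references */
	unsigned int refcount_store;  /* ckch_store references */
	bool in_update_tree;          /* scheduled for automatic updates */
	bool in_response_tree;        /* still findable by its OCSP_CERTID */
};

/* Re-evaluates tree membership after a counter was decremented. */
static void ocsp_entry_refresh(struct ocsp_entry *e)
{
	if (e->refcount == 0 || e->refcount_store == 0)
		e->in_update_tree = false;    /* no new connection can use it */
	if (e->refcount == 0 && e->refcount_store == 0)
		e->in_response_tree = false;  /* fully unused: drop it */
}

static void ocsp_put_ctx(struct ocsp_entry *e)
{
	assert(e->refcount > 0);
	e->refcount--;
	ocsp_entry_refresh(e);
}

static void ocsp_put_store(struct ocsp_entry *e)
{
	assert(e->refcount_store > 0);
	e->refcount_store--;
	ocsp_entry_refresh(e);
}
```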
Amaury Denoyelle	c130f74803	BUG/MINOR: session: ensure conn owner is set after insert into session A crash could occur if session_add_conn() temporarily failed when called via h2_detach(). In this case, the connection owner is reset to NULL. However, if this wasn't the last stream of the connection, the connection won't be destroyed. When h2_detach() is called again for another stream and this time session_add_conn() succeeds, a crash will occur due to session_check_idle_conn() being invoked with a NULL connection owner. To fix this, ensure the connection owner is always set after session_add_conn() succeeds. This bug is considered minor as the only failure reason for session_add_conn() is a pool allocation issue. This should be backported up to all stable releases.	2024-03-20 14:26:57 +01:00
Christopher Faulet	189f74d4ff	MINOR: cfgparse: Add a global option to expose deprecated directives Similarly to the "expose-experimental-directives" option, there is now a global option to expose some deprecated directives. The idea is to have a way to silence warnings about deprecated directives when there is no alternative solution. Of course, deprecated directives covered by this option are not listed and may change. It is only a best effort to let users upgrade smoothly.	2024-03-15 11:31:48 +01:00
Amaury Denoyelle	7dae3ceaa0	BUG/MAJOR: server: do not delete srv referenced by session A server can only be deleted if there are no elements which reference it. This is taken care of via srv_check_for_deletion(), most notably for active and idle connections. A special case occurs for connections directly managed by a session. This is the case for so-called private connections, when using http-reuse never or H2 + http-reuse safe for example. In this case, the server does not account for these connections in its idle lists. This caused a bug as the server was deleted despite the session still being able to access it. To properly fix this, add a new referencing element into the server for these session connections. An mt_list has been chosen for this. On default http-reuse, private connections are typically not used so it won't make any difference. If using H2 servers, or more generally when dealing with private connections, insert/delete should typically occur only once per session lifetime so impact on performance should be minimal. This should be backported up to 2.4. Note that srv_check_for_deletion() was introduced in 3.0 dev tree. On backport, the extra condition in it should be placed in cli_parse_delete_server() instead.	2024-03-14 15:21:07 +01:00
Amaury Denoyelle	5ad801c058	MINOR: session: rename private conns elements By default, backend connections are attached to a server instance. This allows connection reuse to be implemented. However, in some particular cases, a connection cannot be shared across several clients. These connections are considered private and are attached to the session instance instead. These private connections are also indexed by the target server to avoid mixing them. All of this is implemented via a dedicated structure previously named struct sess_srv_list. Rename it to struct sess_priv_conns to better reflect its usage. Also rename its internal members and all of the associated functions. This commit is only a renaming, thus no functional impact is expected.	2024-03-14 15:21:02 +01:00
Aurelien DARRAGON	07b2e84bce	BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try) While trying to reproduce another crash case involving lua filters reported by @bgrooot on GH #2467, we found out that mixing filters loaded from different contexts ('lua-load' vs 'lua-load-per-thread') for the same stream isn't supported and may even cause the process to crash. Historically, mixing lua-load and lua-load-per-thread for a stream wasn't supported, but this changed thanks to `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread"). However, the above fix didn't properly consider the lua filters use case: unlike lua fetches, actions or even services, lua filters don't simply use the stream hlua context as a "temporary" hlua running context to process some hlua code. For fetches, actions, etc., hlua executions are processed sequentially, so we simply reuse the hlua context from the previous action/fetch to run the next one (this allows bypassing memory allocations and initialization, thus it increases performance), unless we need to run on a different hlua state-id, in which case we perform a reset of the hlua context. But this cannot work with filters: indeed, once registered, a filter will last for the whole stream duration. It means that the filter will rely on the stream hlua context from ->attach() to ->detach(). And here is the catch: if for the same stream we register 2 lua filters from different contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue, because the stream hlua context will be re-created each time we switch between runtime contexts, which means each time we switch between the filters (which may happen at each stream processing step), and since lua filters rely on the stream hlua context to carry state between filtering steps, this state will be lost upon a switch. 
Given that the lua filters code was not designed with that in mind, this would confuse the code and cause unexpected behaviors ranging from lua errors to a crashing process. So here we take another approach: instead of re-creating the stream hlua context each time we switch between "global" and "per-thread" runtime contexts, let's have both of them inside the stream directly, as initially suggested by Christopher back when we talked about the original issue. For this we leverage the hlua_stream_ctx_prepare() and hlua_stream_ctx_get() helper functions which return the proper hlua context for a given stream and state_id combination. As for debugging info reported after ha_panic(), we check both hlua runtime contexts to see if one of them was active when the panic occurred (only 1 runtime ctx per stream may be active at a given time). This should be backported to all stable versions with `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread") This commit depends on: - "DEBUG: lua: precisely identify if stream is stuck inside lua or not" [for versions < 2.9 the ha_thread_dump_one() part should be skipped] - "MINOR: hlua: use accessors for stream hlua ctx" For 2.4, the filters API didn't exist. However it may be a good idea to backport it anyway because ->set_priv()/->get_priv() from tcp/http lua applets may also be affected by this bug, plus it will ease code maintenance. Of course, filters-related parts should be skipped in this case.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	1a2cdf64c9	DEBUG: lua: precisely identify if stream is stuck inside lua or not When ha_panic() is called by the watchdog, we try to guess from ha_task_dump() and ha_thread_dump_one() if the thread was stuck while executing lua from the stream context. However, we consider this to be the case by simply checking if the stream hlua context was set, but this is not very precise: if the hlua context is set, it simply means that at least one lua instruction was executed at the stream level, not that the thread was currently executing lua when the panic occurred. This is especially true with filters: one could simply register a lua filter that does nothing, but this will still end up initializing the stream hlua context for each stream. If the thread ends up being stuck during the stream handling, then debug dumping functions will report that the stream was stuck while handling lua, which is not necessarily true, and could in fact confuse us even more. So here we take another approach: we add the BUSY flag to the hlua context. This flag is set by hlua_ctx_resume() around the lua_resume() call; this way we can precisely tell if the thread was handling lua when it was interrupted, and we rely on this flag in debug functions to check if the thread was effectively stuck inside lua or not while processing the stream. No backport needed unless a commit depends on it.	2024-03-13 09:24:46 +01:00
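A minimal C sketch of the BUSY-flag idea described in the commit above: the flag is raised only for the duration of the interpreter call, so a dump can distinguish "a Lua context exists for this stream" from "the thread is inside Lua right now". The struct, flag and function names are hypothetical stand-ins, not HAProxy's definitions, and `fake_lua_resume()` replaces the real `lua_resume()` so the example compiles without Lua.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: HLUA_CTX_BUSY_SKETCH and struct hlua_ctx_sketch are
 * illustrative stand-ins, not HAProxy's real definitions. */
#define HLUA_CTX_BUSY_SKETCH 0x1u

struct hlua_ctx_sketch {
	unsigned int flags;
	int runs;        /* how many times the fake interpreter ran */
	bool busy_seen;  /* whether BUSY was set while it ran */
};

/* Stand-in for lua_resume(): records that it ran and whether the
 * BUSY flag was visible from inside the call. */
static void fake_lua_resume(struct hlua_ctx_sketch *ctx)
{
	ctx->runs++;
	ctx->busy_seen = (ctx->flags & HLUA_CTX_BUSY_SKETCH) != 0;
}

/* Raise BUSY only around the interpreter call. */
static void ctx_resume_sketch(struct hlua_ctx_sketch *ctx)
{
	assert(ctx != NULL);
	ctx->flags |= HLUA_CTX_BUSY_SKETCH;
	fake_lua_resume(ctx);
	ctx->flags &= ~HLUA_CTX_BUSY_SKETCH;
}

/* What a watchdog dump would test: the mere existence of a context is
 * not enough, the BUSY flag must be set. */
static bool stuck_in_lua_sketch(const struct hlua_ctx_sketch *ctx)
{
	return ctx != NULL && (ctx->flags & HLUA_CTX_BUSY_SKETCH) != 0;
}
```

After a resume returns, the context still exists but is no longer reported as busy, which is the false positive the commit removes.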
William Lallemand	bbc215d3bd	CLEANUP: ssl: remove useless #ifdef in openssl-compat.h Remove a useless #ifdef in openssl-compat.h	2024-03-13 08:51:04 +01:00
William Lallemand	501d9fdb86	MEDIUM: ssl: allow to change the OpenSSL security level from global section The new "ssl-security-level" option allows one to change the OpenSSL security level without having to change the openssl.cnf global file of your distribution. This directive applies to every SSL_CTX context. People sometimes change their security level directly in the ciphers directive, however there are some cases when the security level change is not applied in the right order (for example when applying a DH param). Before this patch, it was possible to work around this by using a specific openssl.cnf file and starting haproxy this way: OPENSSL_CONF=./openssl.cnf ./haproxy -f bug-2468.cfg Values for the security level can be found there: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html This was discussed in github issue #2468.	2024-03-12 17:37:11 +01:00
Amaury Denoyelle	c499d66f37	MINOR: quic: remove qc_treat_rx_crypto_frms() This commit removes qc_treat_rx_crypto_frms(). This function was used in a single place inside qc_ssl_provide_all_quic_data(). Besides, its naming was confusing as conceptually it is directly linked to the quic_ssl module instead of quic_rx. Thus, the body of qc_treat_rx_crypto_frms() is inlined directly inside qc_ssl_provide_all_quic_data(). Also, qc_ssl_provide_quic_data() is now only used inside quic_ssl so its scope is set to static. Overall, the API for CRYPTO frame handling is now cleaner.	2024-03-11 14:27:51 +01:00
matthias sweertvaegher	062ea3a3d4	BUILD: solaris: fix compilation errors Compilation on solaris fails because of usage of names reserved on that platform, i.e. 'queue' and 's_addr'. This patch redefines 'queue' as '_queue' and renames 's_addr' to 'srv_addr' which fixes compilation for now. Future plan: rename 'queue' in code base so define can be removed again. Backporting: 2.9, 2.8	2024-03-09 11:24:54 +01:00
Willy Tarreau	758cb450a2	OPTIM: sink: drop the sink lock used to count drops The sink lock was made to prevent event producers from passing while there were other threads trying to print a "dropped" message, in order to guarantee the absence of reordering. It has a serious impact however, which is that all threads need to take the read lock when producing a regular trace even when there's no reader. This patch takes a different approach. The drop counter is shifted left by one so that the lowest bit is used to indicate that one thread is already taking care of trying to dump the counter. Threads only read this value normally, and will only try to change it if it's non-null, in which case they'll first check if they are the first ones trying to dump it, otherwise will simply count another drop and leave. This has a large benefit. First, it will avoid the locking that causes stalls as soon as a slow reader is present. Second, it avoids any write on the fast path as long as there's no drop. And it remains very lightweight since we just need to add +2 or subtract 2*dropped in operations, while offering the guarantee that the sink_write() has succeeded before unlocking the counter. While a reader was previously limiting the traffic to 11k RPS under 4C/8T, now we reach 36k RPS vs 14k with no reader, so readers will no longer slow the traffic down and will instead even speed it up due to avoiding the contention down the chain in the ring. The locking cost dropped from ~75% to ~60% now (it's in ring_write now).	2024-03-09 11:23:52 +01:00
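The lock-free drop counter described above can be sketched with C11 atomics as follows. This is an illustration under assumptions, not HAProxy's sink code: the counter name, the helper names and the always-succeeding `emit_dropped_msg_sketch()` are hypothetical; only the encoding (drops stored shifted left by one, bit 0 marking "a thread is reporting") comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout following the commit message:
 * bit 0     = a thread is currently trying to report drops,
 * bits 1..  = number of dropped messages (so one drop adds 2). */
static atomic_uint sink_dropped_sketch;

static void count_drop_sketch(void)
{
	atomic_fetch_add(&sink_dropped_sketch, 2u);
}

/* Hypothetical stand-in for writing the "N messages dropped" line;
 * it always succeeds in this sketch. */
static bool emit_dropped_msg_sketch(unsigned int n)
{
	(void)n;
	return true;
}

/* Called before emitting a regular event. Returns true if the caller may
 * write its own message, false if it counted itself as dropped instead. */
static bool flush_drops_sketch(void)
{
	unsigned int old = atomic_load(&sink_dropped_sketch);

	if (old == 0)
		return true;                 /* fast path: read only, no write */

	old = atomic_fetch_or(&sink_dropped_sketch, 1u);
	if (old & 1u) {
		/* another thread is already reporting: count ourselves, leave */
		count_drop_sketch();
		return false;
	}
	if (old == 0) {
		/* drops were flushed meanwhile: just release bit 0 */
		atomic_fetch_and(&sink_dropped_sketch, ~1u);
		return true;
	}
	if (!emit_dropped_msg_sketch(old >> 1)) {
		atomic_fetch_and(&sink_dropped_sketch, ~1u); /* release bit 0 */
		count_drop_sketch();
		return false;
	}
	/* remove the 2*n reported drops and bit 0 in one operation; drops
	 * counted concurrently by other threads are preserved */
	atomic_fetch_sub(&sink_dropped_sketch, old + 1u);
	return true;
}
```

When nothing was dropped the fast path performs a single atomic load, and ownership of the report is taken with one `atomic_fetch_or()` instead of a lock.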
Willy Tarreau	26cd248feb	BUILD: ssl: define EVP_CTRL_AEAD_GET_TAG for older versions Amaury reported that previous commit `08ac282375` ("MINOR: Add aes_gcm_enc converter") broke the CI on OpenSSL 1.0.2 due to the define above not existing there. Let's just map it to its older name when not existing. For reference, these were renamed when switching to 1.1.0: https://marc.info/?l=openssl-cvs&m=142244190907706&w=2 No backport is needed.	2024-03-08 18:23:34 +01:00
Amaury Denoyelle	1ee7bf5bd9	MINOR: quic: always use ncbuf for rx CRYPTO The previous patch fixed the handling of in-order CRYPTO frames, which requires the usage of a new buffer for these data as their handling is delayed to run under TASK_HEAVY. In fact, as all CRYPTO frames handling must now be delayed, their handling can be unified. This is the purpose of this commit, which removes the just introduced new buffer. Now, all CRYPTO frames are buffered inside the ncbuf. Unused elements such as the crypto_frms member for encryption levels are also removed. This commit is not a bugfix but is a direct follow-up to the last one. As such, it can probably be backported with it to 2.9 to reduce code differences between these versions.	2024-03-08 17:22:48 +01:00
Amaury Denoyelle	81f118cec0	BUG/MEDIUM: quic: fix handshake freeze under high traffic QUIC relies on SSL_do_handshake() to be able to validate the handshake. As this function is computation heavy, since 2.9 it is only called under TASK_HEAVY. This has been implemented by the following patch : `94d20be138` MEDIUM: quic: Heavy task mode during handshake Instead of handling CRYPTO frames immediately during reception, this patch delays the process to run under a TASK_HEAVY tasklet. A frame copy is stored in the qel.rx.crypto_frms list. However, this frame still references the receive buffer. If the receive buffer is cleared before the tasklet is rescheduled, it will point to garbage data, resulting in a haproxy decryption error. This happens if a fair amount of data is received constantly, preempting the quic_conn tasklet execution. This bug can be reproduced with a fair number of clients. It is exhibited by 'show quic full' which can report connections blocked on handshake. Using the following command results in h2load not being able to complete the last connections. $ h2load --alpn-list h3 -t 8 -c 800 -m 10 -w 10 -n 8000 "https://127.0.0.1:20443/?s=10k" Also, haproxy QUIC listener socket mode was active to trigger the issue. This forces several connections to share the same reception buffer, making the bug even more likely to occur. It should be possible to reproduce it with connection sockets by increasing the number of clients. To fix this bug, define a new buffer under quic_cstream. It is used exclusively to copy CRYPTO data for in-order frames if the ncbuf is empty. This ensures data remain accessible even if the receive buffer is cleared. Note that this fix is only a temporary step. Indeed, a ncbuf is also already used for out-of-order data. It should be possible to unify its usage for both in and out-of-order data, rendering this new buffer instance unnecessary. In this case, several unneeded elements will become obsolete such as the qel.rx.crypto_frms list. 
This will be done in a future refactoring patch. This must be backported up to 2.9.	2024-03-08 17:22:48 +01:00
Nenad Merdanovic	e225e04ba7	MINOR: vars: export var_set and var_unset functions Co-authored-by: Dragan Dosen <ddosen@haproxy.com>	2024-03-08 17:20:43 +01:00
Willy Tarreau	93a0fb74f4	BUILD: buf: make b_ncat() take a const for the source In 2.7 with commit `35df34223b` ("MINOR: buffers: split b_force_xfer() into b_cpy() and b_force_xfer()"), b_ncat() was extracted from b_force_xfer() but kept its source variable instead of constant, making it unusable for calls from a const source. Let's just fix it.	2024-03-05 11:50:34 +01:00
Willy Tarreau	0a0041d195	BUILD: tree-wide: fix a few missing includes in a few files Some include files, mostly types definitions, are missing a few includes to define the types they're using, causing include ordering dependencies between files, which are most often not seen due to the alphabetical order of includes. Let's just fix them. These were spotted by building pre-compiled headers for all these files to .h.gch.	2024-03-05 11:50:34 +01:00
Willy Tarreau	ac692d7ee5	BUILD: thread: move lock label definitions to thread-t.h The 'lock_label' enum is defined in thread.h but it's used in a few type files, so let's move it to thread-t.h to allow explicit includes.	2024-03-05 11:50:34 +01:00
Amaury Denoyelle	f913d42aaf	MINOR: quic: add MUX output for show quic Extend "show quic" to be able to dump MUX related information. This is done via the new function qcc_show_quic(). This replaces the old streams dumping list which was incomplete. This information is displayed in the full output or by specifying the "mux" field.	2024-02-29 10:03:36 +01:00
Christopher Faulet	60fcc27577	MEDIUM: htx/http-ana: No longer close connection on early HAProxy response When a response was returned by HAProxy, a dedicated HTX flag was set. Thanks to this flag, it was possible to add a "connection: close" header to the response if the request was not fully received and to close the connection. In the same way, when a redirect rule was applied, keep-alive was forcefully disabled for unfinished requests. All these mechanisms are now useless because the H1 mux is able to drain the response. So HTX_FL_PROXY_RESP flag is removed and no special processing is performed on HAProxy response when the request is unfinished.	2024-02-28 16:02:33 +01:00
Christopher Faulet	077906da14	MAJOR: mux-h1: Drain requests on client side before shut a stream down Unlike for H2 and H3, there is no mechanism in H1 to notify the client it must stop uploading data when a response is sent before the end of the request without closing the connection. There is no RST_STREAM frame equivalent. Thus, there are only two ways to deal with this situation: closing the connection or draining the request. Until now, HAProxy didn't support draining H1 messages. Closing the connection in this case has however a major drawback. It leads to sending a TCP reset, thereby dropping all in-flight data. There is no guarantee the client has fully received the response. Draining H1 messages was never implemented because in old versions it was a bit tricky to implement. However, it is now far simpler to support this feature because it is possible to have a H1 stream without any applicative stream. That is the purpose of this patch. Now, when a shutdown is requested and the stream is detached from the connection, if the request is unfinished while the response was fully sent, the request is drained. To do so, in this case the shutdown and the detach are delayed. From the upper layer point of view, there are no changes. The endpoint is shut down and detached as usual. But from the H1 mux point of view, the H1 stream is still alive and is able to drain data. However the stream-endpoint descriptor is orphaned. Once the request is fully received (and drained), the connection is shut down if it cannot be reused for a new transaction and the H1 stream is destroyed.	2024-02-28 16:02:33 +01:00
Amaury Denoyelle	8a31783b64	BUG/MEDIUM: server: fix dynamic servers initial settings Contrary to static servers, dynamic servers do not initialize their settings from a default server instance. As such, _srv_parse_init() was responsible for setting minimal values to have a correct behavior. However, some settings were not properly initialized. This caused dynamic servers to not behave like static ones without explicit parameters. Currently, the main issue detected is connection reuse which was completely impossible. This is due to incorrect pool_purge_delay and max_reuse settings incompatible with srv_add_to_idle_list(). To fix connection reuse, but also more generally to ensure dynamic servers are aligned with other server instances, define a new function srv_settings_init(). This is used to set initial values for both default servers and dynamic servers. For static servers, srv_settings_cpy() is kept instead, using their default server as reference. This patch could have unexpected effects on dynamic servers behavior as it restores proper initial settings. Previously, they were set to 0 via the calloc() invocation from new_server(). This should be backported up to 2.6, after a brief period of observation.	2024-02-27 17:02:20 +01:00
William Lallemand	4895fdac5a	BUG/MAJOR: ssl/ocsp: crash with ocsp when old process exit or using ocsp CLI This patch reverts 2 fixes that were made in an attempt to fix the ocsp-update feature used with the 'commit ssl cert' command. These patches crash the worker when doing a soft-stop when the 'set ssl ocsp-response' command was used, or during runtime if the ocsp-update was used. This was reported in issues #2462 and #2442. The last patch reverted is the associated reg-test. Revert "BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing" This reverts commit `5e66bf26ec`. Revert "BUG/MEDIUM: ocsp: Separate refcount per instance and per store" This reverts commit 04b77f84d1b52185fc64735d7d81137479d68b00. Revert "REGTESTS: ssl: Add OCSP related tests" This reverts commit acd1b85d3442fc58164bd0fb96e72f3d4b501d15.	2024-02-26 18:04:25 +01:00
Willy Tarreau	a4d44250eb	BUG/MINOR: ist: only store NUL byte on succeeded alloc The trailing NUL added at the end of istdup() by recent commit `de0216758` ("BUG/MINOR: ist: allocate nul byte on istdup") was placed outside of the pointer validity test, rightfully showing null deref warnings. This fix should be backported along with the fix above, to the same versions.	2024-02-23 19:51:54 +01:00
Miroslav Zagorac	3f771f5118	MINOR: ssl: Call callback function after loading SSL CRL data Due to the possibility of calling a control process after adding CRLs, the ssl_commit_crlfile_cb variable was added. It is actually a pointer to the callback function, which is called if defined after initial loading of CRL data from disk and after committing CRL data via CLI command 'commit ssl crl-file ..'. If the callback function returns an error, then the CLI commit operation is terminated. Also, one case was added to the CLI context used by "commit cafile" and "commit crlfile": CACRL_ST_CRLCB in which the callback function is called. Signed-off-by: William Lallemand <wlallemand@haproxy.com>	2024-02-23 18:12:27 +01:00
Christopher Faulet	3d93ecc132	BUG/MAJOR: cli: Restore non-interactive mode behavior with pipelined commands The issue was described in commit "BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n". In non-interactive mode, it was possible to use a newline character as delimiter for pipelined commands. As a consequence, commands processing could stop in the middle. With the above commit, a warning is emitted to notify users. With this one, we restore the expected behavior, as documented in the management guide. Only the first line of commands is parsed. This commit will not be backported to avoid breaking changes on stable versions. This commit has of course some visible effects. All scripts using a newline character as delimiter to pipeline commands in non-interactive mode will stop working. Only the first command will be evaluated, all others will be ignored. Pipelined commands MUST now be separated by a semi-colon. But there is a more subtle and probably more annoying change. It is no longer possible to pipeline commands with a payload! A command with a payload will always be the last one evaluated because it must be finished by a newline (optionally preceded by a custom pattern). It is really annoying to introduce such a breaking change. But, in the long term, it is mandatory. The 2.8 will be the last LTS version supporting the old behavior (with some warning however). This will give users 4 years to adapt their scripts. No backport needed.	2024-02-23 15:19:49 +01:00
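To illustrate the restored rule (only the first line is evaluated in non-interactive mode, and pipelined commands are separated by `;`), here is a simplified C sketch of how such input would be split. `nth_cmd_sketch()` is a hypothetical helper, not the CLI parser: it ignores whitespace trimming, escaping and payloads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not HAProxy's parser: copy the n-th (0-based)
 * ';'-separated command found on the FIRST line of `input` into `out`.
 * Everything after the first '\n' is ignored, as in non-interactive mode.
 * Returns 1 if the command exists, 0 otherwise. */
static int nth_cmd_sketch(const char *input, int n, char *out, size_t outlen)
{
	const char *p = input;
	const char *end = input + strcspn(input, "\n");

	assert(outlen > 0);
	for (;;) {
		const char *sep = memchr(p, ';', (size_t)(end - p));
		const char *stop = sep ? sep : end;

		if (n-- == 0) {
			size_t len = (size_t)(stop - p);

			if (len >= outlen)
				len = outlen - 1;   /* truncate to fit */
			memcpy(out, p, len);
			out[len] = '\0';
			return 1;
		}
		if (!sep)
			return 0;               /* end of the first line reached */
		p = sep + 1;
	}
}
```

With `"show info\nshow stat\n"` only `show info` is reachable, which is the breaking change described for newline-delimited scripts.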
Christopher Faulet	598c7f164c	BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n This was broken since commit `0011c25144` ("BUG/MINOR: cli: avoid O(bufsize) parsing cost on pipelined commands"). It is not really a bug fix but it is labelled as such to make it more visible. Before, a full line was first retrieved from the request buffer before extracting the first command to eval it. Now, only one command is retrieved. But we rely on the request buffer state to interrupt processing in non-interactive mode. After a command is processed, if the output of the request buffer is empty, we leave. Before the above commit, this was not a problem. But since then, it is obviously a wrong assumption. First because some input data may still be there. It is not true today, but it might change. Then, there is no guarantee that all commands are received at the same time. For a small list of commands, it will be the case most of the time, but it is a dangerous assumption. For a long list of commands, it is almost always false. To be an issue, commands must be chunked exactly between two commands. But in this case, remaining commands are skipped. A good way to reproduce the issue is to wait a bit between two commands, for instance: (printf "show info;"; sleep 2; printf "show stat\n") | socat ... In fact, to properly fix the issue, we should exit on the first command finished by a newline. Indeed, as stated in the documentation, in non-interactive mode, a single line is processed. To pipeline commands, commands must be separated by a semi-colon. Unfortunately, the above commit introduced another change. It is possible to pipeline commands delimited by a newline. It was pushed 2 years ago and backported to all stable versions. Several scripts may rely on this behavior. So, on stable versions, the bug will not be fixed. However a warning will be emitted to notify users their scripts don't respect the documentation and they must adapt them. 
Mainly because the CLI behavior on this point will be changed in 3.0 to stick to the doc. This warning will only be emitted once over the whole worker process life. The idea is not to flood the logs with the same warning for every offending command. This commit should probably be backported to all stable versions, but with some caution because the CLI was often modified.	2024-02-23 15:19:49 +01:00
Amaury Denoyelle	de02167584	BUG/MINOR: ist: allocate nul byte on istdup istdup() is documented as having the same behavior as strdup(). However, it may cause confusion as it allocates a block of input length, without an extra byte for the \0 delimiter. This behavior is inconsistent since, in the case of an empty string, a single \0 is allocated. This API inconsistency could cause a bug anywhere an IST is used as a C-string after istdup() invocation. Currently, the only issue found is with the 'wait' CLI command using 'srv-unused'. This causes a buffer overflow due to ist0() invocation after istdup() for be_name and sv_name. Backport should be done to all stable releases. Even if no bug has been found outside of the wait CLI implementation, it ensures the code is more consistent across all releases.	2024-02-22 18:24:35 +01:00
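A minimal sketch of the corrected `istdup()` semantics: allocate one extra byte so the copy is always usable as a C string and, as required by the follow-up fix `a4d44250eb`, only write the trailing NUL once the allocation is known to have succeeded. `struct ist_sketch` is a simplified stand-in for HAProxy's `struct ist` (a pointer plus a length), and `istdup_sketch()` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for HAProxy's struct ist: pointer + length,
 * not necessarily NUL-terminated. */
struct ist_sketch {
	char *ptr;
	size_t len;
};

/* strdup()-like duplication: len + 1 bytes are allocated so the result
 * is always a valid C string, including for an empty input. */
static struct ist_sketch istdup_sketch(struct ist_sketch src)
{
	struct ist_sketch dst = { NULL, 0 };
	char *p = malloc(src.len + 1);

	if (p) {
		if (src.len)
			memcpy(p, src.ptr, src.len); /* avoid memcpy from NULL */
		p[src.len] = '\0';               /* only on successful alloc */
		dst.ptr = p;
		dst.len = src.len;
	}
	return dst;
}
```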
Aurelien DARRAGON	1c2e16ba8a	MINOR: log: add free_logformat_node() helper function Function may be used to free a single logformat node.	2024-02-22 15:32:42 +01:00
Aurelien DARRAGON	891bac673b	CLEANUP: proxy/log: remove unused proxy flag Since `3d6350e10` ("MINOR: log: Remove log-error-via-logformat option"), PR_O_ERR_LOGFMT flag is not used anymore, but it was left in the proxy-t.h header file. Simply removing it and adding a comment to indicate that the corresponding bit is now unused.	2024-02-22 15:32:42 +01:00
Amaury Denoyelle	8b950f40fa	MINOR: quic: only use sendmsg() syscall variant This patch is the direct followup of the previous one : MINOR: quic: remove sendto() usage variant This finalizes qc_snd_buf() simplification by removing send() syscall usage for quic-conn owned socket. Syscall invocation is merged in a single code location to the sendmsg() variant. The only difference for owned socket is that destination address for sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as valid for connected sockets. This allows maximum performance by avoiding unnecessary lookups on kernel socket address tables. As the previous patch, no functional change should happen here. However, it will be simpler to extend qc_snd_buf() for GSO usage.	2024-02-20 16:42:05 +01:00
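The pattern described above, one `sendmsg()` call for both socket kinds with a NULL destination on connected sockets, can be sketched in C as below. `send_dgram_sketch()` is a hypothetical helper, not `qc_snd_buf()`; it relies only on the POSIX `sendmsg()` API, where a connected datagram socket may be given a NULL `msg_name`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one datagram. When `dst` is NULL the socket must already be
 * connected: msg_name stays NULL and the kernel uses the connected peer,
 * avoiding any destination address lookup (see sendmsg(2)). */
static ssize_t send_dgram_sketch(int fd, const void *buf, size_t len,
                                 const struct sockaddr *dst, socklen_t dstlen)
{
	struct iovec iov;
	struct msghdr msg;

	iov.iov_base = (void *)buf;
	iov.iov_len = len;

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = (void *)dst;          /* NULL for connected sockets */
	msg.msg_namelen = dst ? dstlen : 0;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	return sendmsg(fd, &msg, 0);
}
```

A connected `AF_UNIX` datagram pair from `socketpair()` is enough to try it locally without any network setup.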
Aurelien DARRAGON	1448478d62	MINOR: log: explicit typecasting for logformat nodes Add the ability to manually specify desired output type after a custom field name for logformat nodes. Forcing the type can be useful to ensure value is stored with the proper type representation. (i.e.: forcing numerical to string to work around the limited resolution of JS number types) By default, type is set to SMP_T_SAME, which means the original type will be preserved. Currently supported types are: bool, str, sint	2024-02-20 15:49:54 +01:00
Aurelien DARRAGON	0cfcc64b79	MINOR: sample: add type_to_smp() helper function type_to_smp(type) does the reverse operation of smp_to_type[smp]: it takes a type name as input string and tries to return the corresponding SMP_T_* smp type or SMP_TYPES if not found.	2024-02-20 15:18:39 +01:00
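A sketch of the reverse lookup that `type_to_smp()` performs, written against a hypothetical three-entry table (bool, sint, str, the types listed above as supported for logformat typecasting). The real `SMP_T_*` enum and `smp_to_type[]` array contain more types; the names below are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset only: HAProxy's real SMP_T_* enum and
 * smp_to_type[] table contain more entries. */
enum smp_type_sketch { SK_T_BOOL, SK_T_SINT, SK_T_STR, SK_TYPES };

static const char *const smp_to_type_sketch[SK_TYPES] = {
	[SK_T_BOOL] = "bool",
	[SK_T_SINT] = "sint",
	[SK_T_STR]  = "str",
};

/* Reverse lookup: type name -> type, or SK_TYPES when unknown,
 * mirroring "SMP_TYPES if not found". */
static int type_to_smp_sketch(const char *name)
{
	assert(name != NULL);
	for (int t = 0; t < SK_TYPES; t++) {
		if (strcmp(smp_to_type_sketch[t], name) == 0)
			return t;
	}
	return SK_TYPES;
}
```

A linear scan is adequate for such a tiny table, and the lookup presumably only runs while parsing the configuration.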
Aurelien DARRAGON	2ed6068f2a	MINOR: log: custom name for logformat node Add the ability to specify a custom name (which will be used for representation in verbose output types such as json) for logformat nodes. For now, a custom name should be composed of characters [a-zA-Z0-9-_]*	2024-02-20 15:18:39 +01:00
Christopher Faulet	22d4b0e901	MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side The SE_FL_MAY_FASTFWD_CONS is added and it will be used by endpoints to announce their support for the zero-copy forwarding on the consumer side. The flag is not necessarily permanent. However, it will be used this way for now.	2024-02-14 15:09:14 +01:00
Christopher Faulet	7598c0ba69	MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield To fix a bug, a flag to announce the capability to support the zero-copy forwarding on the consumer side will be added on the SE descriptor. So the old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare for the addition of the new flag, the bitfield is slightly reordered.	2024-02-14 15:00:32 +01:00
Christopher Faulet	8cfc11f461	CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield To fix a bug, some SE flags must be added or renamed. To avoid mixing flags set by the endpoint and flags set by the app, the second set of flags are moved at the end of the bitfield, leaving the holes on the middle.	2024-02-14 14:52:25 +01:00
Christopher Faulet	40d98176ba	BUG/MEDIUM: stconn: Don't check pending shutdown to wake an applet up This reverts commits `0b93ff8c87` ("BUG/MEDIUM: stconn: Wake applets on sending path if there is a pending shutdown") and `9e394d34e0` ("BUG/MINOR: stconn: Don't report blocked sends during connection establishment") because they were not the right fixes. We must not wake an applet up when a shutdown is pending because it means some output data are still blocked in the channel buffer. The applet does not necessarily consume these data. In this case, the applet may be woken up infinitely, unless it explicitly reports it won't consume data yet. This patch must be backported as far as 2.8. For older versions, as far as 2.2, it may be backported. If so, a previous fix must be pushed to prevent an HTTP applet from being stuck. In http_ana.c, in http_end_request() and http_end_response(), the call to channel_htx_truncate() on the request channel in case of MSG_ERROR must be replaced by a call to channel_htx_erase().	2024-02-14 14:22:36 +01:00
Christopher Faulet	2c672f282d	BUG/MEDIUM: stconn: Allow expiration update when READ/WRITE event is pending When a READ or a WRITE activity is reported on a channel, the corresponding date is updated: the last-read-activity date (lra) is updated and the first-send-block date (fsb) is reset. The event is also reported at the channel level by setting CF_READ_EVENT or CF_WRITE_EVENT flags. When one of these flags is set, this prevents the update of the stream's task expiration date from sc_notify(). It also prevents the corresponding timeout from being reported from process_stream(). But it is a problem during the fast-forwarding stage if no expiration date was set by the stream. Only process_stream() resets these flags. So a first READ or WRITE event will prevent any update of the stream's expiration date until a new call to process_stream(). But with no expiration date, this will only happen on a shutdown/abort event, blocking the stream for a while. It is for instance possible to block the stats applet or the cli applet if a client does not consume the response. The stream may be blocked, the client timeout is not respected and the stream can only be closed on a client abort. So now, we update the stream's expiration date, regardless of reported READ/WRITE events. It is not a big deal because lra and fsb dates are properly updated. It also means an old READ/WRITE event will not prevent the stream from reporting a timeout, and this is expected too. This patch must be backported as far as 2.8. On older versions, timeouts and stream's expiration date are not updated in the same way and this works as expected.	2024-02-14 14:22:36 +01:00
Christopher Faulet	4a78f766ff	MEDIUM: applet: Add notion of shutdown for write for applets In fact there are already flags on the SE to state a shutdown for reads or writes was performed. But for applets, this notion does not exist. Both flags are set at the same time when the applet is released. But at the SC level, there are functions to perform a shutdown (formerly the shutw) and an abort (formerly the shutr). For applets, when a shutdown is performed on the SC, if the applet is not immediately released, nothing is acknowledged at the SE level. With the old way to implement applets, this was not a real issue until recently because applets accessed the channel/SC flags. It was thus possible to catch the shutdowns. But the "wait" command on the CLI reveals the flaw. Indeed, when this command is executed, nothing is read or sent. So, it is not possible to detect the shutdowns. As a workaround, a dedicated test on the SC flags was added at the end of the wait command I/O handler. But it is pretty ugly. With the new way to implement applets, there is no longer access to the channel or SC. So we must add a way to acknowledge shutdowns into the SE. This patch solves both sides of the issue. The shutw notion is added for applets. Its only purpose is to set the SE_FL_SHWN flag. This flag is tested by all applets, so it solves the issue quite simply. Note that it is described as a bug fix but there is no real issue, just a design flaw. However, if the "wait" command is backported, this patch must be backported too. Unfortunately it will require an adaptation because there are no appctx flags on older versions.	2024-02-14 14:22:36 +01:00
Christopher Faulet	5df45cff8f	BUG/MEDIUM: stconn/applet: Block 0-copy forwarding if producer needs more room This case does not exist yet with the H1 multiplexer, but applets may decide to not produce data if there is not enough room in the destination buffer (the applet's outbuf or the opposite SE buffer). It is true for the stats applets for instance. However this case is not properly handled when the zero-copy forwarding is in-use. To fix the issue, the se_done_ff() function was modified to return the number of bytes really forwarded and to subs for sends if nothing was forwarded while the zero-copy forwarding was blocked by the producer. On the applet side, we take care to block the zero-copy forwarding if the applet requests more room. At the end, zero-copy forwarding is unblocked if something was forwarded. This way, it is now possible for the stats applet to report a full buffer and block the zero-copy forwarding, even if the buffer is not really full, by requesting more room. No backport needed.	2024-02-14 14:22:36 +01:00
Christopher Faulet	ece002af1d	BUG/MEDIUM: applet: Add a flag to state an applet is using zero-copy forwarding An issue was introduced when zero-copy forwarding was added to the stats and cache applets. There is no test to be sure the upper layer is ready to use the zero-copy forwarding. So these applets refuse to deliver the response into the applet's output buffer if the zero-copy forwarding is supported by the opposite endpoint. It is especially an issue when a filter, like the compression, is in-use on the response channel. Because of this bug, the response is not delivered and the applet is woken up in loop to produce data. To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when the zero-copy forwarding is in-use. We rely on this flag to not fill the outbuf in the applet's I/O handler. No backport needed.	2024-02-14 14:22:36 +01:00
Frederic Lecaille	167e38e0e0	MINOR: quic: Add a counter for reordered packets A packet is considered as reordered when it is detected as lost because its packet number is above the largest acknowledeged packet number by at least the packet reordering threshold value. Add ->nb_reordered_pkt new quic_loss struct member at the same location that the number of lost packets to count such packets. Should be backported to 2.6.	2024-02-14 11:32:29 +01:00
Frederic Lecaille	eeeb81bb49	MINOR: quic: Dynamic packet reordering threshold Let's say that the largest packet number acknowledged by the peer is #10, when inspecting the non already acknowledged packets to detect if they are lost or not, this is the case a least if the difference between this largest packet number and and their packet numbers are bigger or equal to the packet reordering threshold as defined by the RFC 9002. This latter must not be less than QUIC_LOSS_PACKET_THRESHOLD(3). Which such a value, packets #7 and oldest are detected as lost if non acknowledged, contrary to packet number #8 or #9. So, the packet loss detection is very sensitive to such a network characteristic where non acknowledged packets are distant from each others by their packet number differences. Do not use this static value anymore for the packet reordering threshold which is used as a criteria to detect packet loss. In place, make it depend on the difference between the number of the last transmitted packet and the number of the oldest one among the packet which are still in flight before being inspected to be deemed as lost. Add new tune.quic.reorder-ratio setting to apply a ratio in percent to this dynamic packet reorder threshold. Should be backported to 2.6.	2024-02-14 11:32:29 +01:00
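A minimal C sketch of the loss criterion described above, assuming the dynamic threshold is computed as (last sent packet number - oldest in-flight packet number) * ratio / 100 and clamped to QUIC_LOSS_PACKET_THRESHOLD. The helpers reorder_threshold() and pkt_deemed_lost() are hypothetical; HAProxy's quic_loss code may compute the span or apply the ratio differently.

```c
#include <stdint.h>

/* Minimum threshold named in the commit (RFC 9002 kPacketThreshold). */
#define QUIC_LOSS_PACKET_THRESHOLD 3

/* Hypothetical: derive the reordering threshold from the span of
 * in-flight packet numbers, scaled by a ratio in percent, never going
 * below the static minimum. */
static uint64_t reorder_threshold(uint64_t last_sent_pn, uint64_t oldest_inflight_pn,
                                  unsigned int ratio_pct)
{
	uint64_t thr = (last_sent_pn - oldest_inflight_pn) * ratio_pct / 100;

	return thr < QUIC_LOSS_PACKET_THRESHOLD ? QUIC_LOSS_PACKET_THRESHOLD : thr;
}

/* A not yet acknowledged packet is deemed lost when the largest
 * acknowledged packet number exceeds its number by at least <threshold>. */
static int pkt_deemed_lost(uint64_t pn, uint64_t largest_acked_pn, uint64_t threshold)
{
	return largest_acked_pn > pn && largest_acked_pn - pn >= threshold;
}
```

With the static threshold of 3 and packet #10 as the largest acknowledged, packet #7 is deemed lost while #8 and #9 are not, matching the example in the commit message; a larger dynamic threshold (say 5) would spare #7 as well.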
Remi Tricot-Le Breton	5e66bf26ec	BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing The CLI command "update ssl ocsp-response" was forcefully removing an OCSP response from the update tree regardless of whether it was in it beforehand or not. But since the main OCSP update task works by removing the entry currently being updated from the update tree and then reinserting it when the update process is over, it meant that the CLI command code could modify a structure that was already in use. These concurrent accesses were not properly locked in the "regular" update case because it was assumed that once an entry was removed from the update tree, the update task was the only one able to work on it. Rather than locking the whole update process, an "updating" flag was added to the certificate_ocsp in order to prevent the "update ssl ocsp-response" command from trying to update a response already being updated. An easy way to reproduce this crash was to perform two "simultaneous" calls to "update ssl ocsp-response" on the same certificate. It would then crash on an eb64_delete call in the main OCSP update task function. This patch can be backported up to 2.8.	2024-02-12 11:15:45 +01:00
Willy Tarreau	613e959c7b	MINOR: cli/wait: add a condition to wait on a server to become unused The "wait" command now supports a condition, "srv-unused", which waits for the designated server to become totally unused, indicating that it is removable. Upon each wakeup it calls srv_check_for_deletion() to verify whether the conditions are met and, if not, whether the failure is recoverable, and proceeds accordingly, never waiting for a final decision longer than the configured delay. The purpose is to make it possible to remove servers from the CLI after waiting for their sessions to be terminated: $ socat -t5 /path/to/socket - <<< " disable server px/srv1 shutdown sessions server px/srv1 wait 2s srv-unused px/srv1 del server px/srv1" Or even wait for connections to terminate themselves: $ socat -t70 /path/to/socket - <<< " disable server px/srv1 wait 1m srv-unused px/srv1 del server px/srv1"	2024-02-09 20:38:08 +01:00
Willy Tarreau	66989ff426	MINOR: cli/wait: also pass up to 4 arguments to the external conditions Conditions will need to have context, arguments etc from the command line. Since these will vary with time (otherwise we wouldn't wait), let's just pass them as text (possibly pre-processed). We're starting with 4 strings that are expected to be allocated by strdup() and are always sent to free() upon release.	2024-02-09 20:38:08 +01:00
Willy Tarreau	2673f8be82	MINOR: cli/wait: also support an unrecoverable failure status Since we'll support waiting for an action to succeed or permanently fail, we need the ability to return an unrecoverable failure. Let's add CLI_WAIT_ERR_FAIL for this. A static error message may be placed into ctx->msg to report to the user why the failure is unrecoverable.	2024-02-09 20:38:08 +01:00
Willy Tarreau	9b680d7411	MINOR: server: split the server deletion code in two parts We'll need to be able to verify whether or not a server may be deleted. For now, both the verification and the action are performed in the same function, at once under thread isolation. The goal here is to extract the verification code into a new function that will perform these checks, return a status indicating success, recoverable failure or non-recoverable failure, and also return a message for the caller.	2024-02-09 20:38:08 +01:00
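The split between checking and acting can be sketched as a side-effect-free check returning a tri-state verdict plus a message. Everything below is hypothetical (the enum, the two conditions, the struct and the function names); the commits above only state that the check returns success, recoverable failure or non-recoverable failure with a message, and that the "wait" command keeps waiting only while the failure is recoverable and the delay has not expired.

```c
#include <stddef.h>

/* Hypothetical tri-state verdict: success / recoverable / non-recoverable. */
enum del_status {
	DEL_OK,          /* the server may be deleted right now */
	DEL_RECOVERABLE, /* not yet, but waiting may help */
	DEL_FATAL,       /* never: waiting is pointless */
};

/* Illustrative subset of server state; not HAProxy's struct server. */
struct srv_state {
	unsigned int cur_sess;  /* sessions still attached to the server */
	int tracked_by_other;   /* another server tracks this one's checks */
};

/* Verification only: no side effect, so it can run on every wakeup. */
static enum del_status check_for_deletion(const struct srv_state *srv, const char **msg)
{
	if (srv->tracked_by_other) {
		*msg = "server is tracked by another server";
		return DEL_FATAL;
	}
	if (srv->cur_sess > 0) {
		*msg = "server still has sessions";
		return DEL_RECOVERABLE;
	}
	*msg = NULL;
	return DEL_OK;
}

/* A waiting loop continues only on a recoverable failure and only while
 * its configured delay has not expired. */
static int keep_waiting(enum del_status st, int delay_expired)
{
	return st == DEL_RECOVERABLE && !delay_expired;
}
```

Keeping the check free of side effects is what allows it to be called repeatedly from a waiting command, while the actual deletion remains a separate step.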
Willy Tarreau	1d2255a78a	MINOR: cli: add a new "wait" command to wait for a certain delay This allows inserting delays between commands, e.g. to collect the same set of metrics at a fixed interval: $ socat -t20 /path/to/socket <<< "show activity; wait 10s; show activity" The goal will be to extend the feature to optionally support waiting on certain conditions. For this reason the struct definitions and enums were placed into cli-t.h.	2024-02-08 21:54:54 +01:00
Willy Tarreau	8581d62daf	MINOR: session: add the necessary functions to update the per-session glitches This provides a new function session_add_glitch_ctr() that will update the glitch counter and rate for the session, if tracked at all.	2024-02-08 15:51:49 +01:00
Willy Tarreau	c9c6b683fb	MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate This adds a new pair of stored types in the stick-tables: - glitch_cnt - glitch_rate These keep count of the number of glitches reported on a front connection, in order to decide how to act with a badly defective client or a potential attacker. For now nothing updates these counters, but all the infrastructure needed to configure, update and retrieve them was added, including the doc. No regtest was added yet since they're not filled yet.	2024-02-08 15:51:49 +01:00
Remi Tricot-Le Breton	befebf8b51	BUG/MEDIUM: ocsp: Separate refcount per instance and per store With the current way OCSP responses are stored, a single OCSP response is stored (in a certificate_ocsp structure) when it is loaded during a certificate parsing, and each ckch_inst that references it increments its refcount. The reference to the certificate_ocsp is actually kept in the SSL_CTX linked to each ckch_inst, in an ex_data entry that gets freed when the context is freed. One of the downsides of this implementation is that if every ckch_inst referencing a certificate_ocsp gets destroyed, then the OCSP response is removed from the system. So if we were to remove all crt-list lines containing a given certificate (that has an OCSP response), the response would be destroyed even if the certificate remains in the system (as an unused certificate). In such a case, we would want the OCSP response not to be "usable", since it is not used by any ckch_inst, but still remain in the OCSP response tree so that if the certificate gets reused (via an "add ssl crt-list" command for instance), its OCSP response is still known as well. But we would also like such an entry not to be updated automatically anymore once no instance uses it. An easy way to do it could have been to keep a reference to the certificate_ocsp structure in the ckch_store as well, on top of all the ones in the ckch_instances, and to remove the OCSP response from the update tree once the refcount falls to 1, but it would not work because of the way the OCSP response tree keys are calculated. They are decorrelated from the ckch_store and are the actual OCSP_CERTIDs, which are a combination of the issuer's name hash and key hash, and the certificate's serial number. So two copies of the same certificate but with different names would still point to the same OCSP response tree entry.
The solution that answers all the needs expressed above is actually to have two reference counters in the certificate_ocsp structure, one for the actual ckch instances and one for the ckch stores. If the instance refcount becomes 0, then we remove the entry from the auto update tree, and if the store refcount becomes 0, we can then remove the OCSP response from the tree. This allows chaining some "del ssl crt-list" and "add ssl crt-list" CLI commands without losing any functionality. Must be backported to 2.8.	2024-02-07 17:10:05 +01:00
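The two-counter scheme can be modeled in a few lines. This is a hypothetical sketch: the struct, fields and functions are illustrative stand-ins for certificate_ocsp and its tree handling, and re-inserting an entry into the update tree when an instance starts using it again is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an OCSP entry with two counters; names are
 * illustrative, not HAProxy's. */
struct ocsp_entry {
	unsigned int inst_refcount;   /* ckch instances using the response */
	unsigned int store_refcount;  /* ckch stores holding the certificate */
	bool in_update_tree;          /* scheduled for automatic updates */
	bool in_response_tree;        /* still known, can be reused later */
};

/* Last instance gone: the response is unused, so stop updating it, but
 * keep it known in case the certificate is used again. */
static void ocsp_inst_release(struct ocsp_entry *e)
{
	assert(e->inst_refcount > 0);
	if (--e->inst_refcount == 0)
		e->in_update_tree = false;
}

/* Last store gone: the certificate itself is gone, drop the response. */
static void ocsp_store_release(struct ocsp_entry *e)
{
	assert(e->store_refcount > 0);
	if (--e->store_refcount == 0)
		e->in_response_tree = false;
}
```

Removing every crt-list line that uses the certificate only drives the instance counter to zero, so the response leaves the update tree but stays in the response tree, where a later "add ssl crt-list" can find it again.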
Remi Tricot-Le Breton	28e78a0a74	MINOR: ssl: Use OCSP_CERTID instead of ckch_store in ckch_store_build_certid The only useful information taken out of the ckch_store in order to copy an OCSP certid into a buffer (later used as a key for entries in the OCSP response tree) is the ocsp_certid field of the ckch_data structure. We therefore don't need to pass a pointer to the full ckch_store to ckch_store_build_certid, or even any information related to the store itself. ckch_store_build_certid is thus converted into a helper function that simply takes an OCSP_CERTID and converts it into a char buffer.	2024-02-07 17:09:39 +01:00
Christopher Faulet	d7467cd495	MINOR: applet: Identify applets using their own buffers via a flag These applets can now be identified by testing the APPCTX_FL_INOUT_BUFS flag. This will be useful to distinguish between the two kinds of applets in helper functions.	2024-02-07 15:05:05 +01:00
Christopher Faulet	a9301c96f1	MINOR: applet: Use an option to disable zero-copy forwarding for all applets At the beginning of the 3.0-dev cycle, zero-copy forwarding support was added only for the cache applet, with an option to disable it. This was a hack, waiting for a better integration with applets. It is now possible to implement zero-copy forwarding for any applet. So the specific option for the cache applet was renamed to be used for all applets, and this option is now also checked for the stats applet. Concretely, 'tune.cache.zero-copy-forwarding' was renamed to 'tune.applet.zero-copy-forwarding'.	2024-02-07 15:05:01 +01:00
Christopher Faulet	ee53d8421f	MEDIUM: applet: Simplify a bit API to exchange data with applets Default .rcv_buf and .snd_buf functions that applets can use are now specialized to manipulate raw buffers or HTX buffers. Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf() while an HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf(). Note that the appctx is now directly passed to these functions instead of the SC.	2024-02-07 15:04:52 +01:00
Christopher Faulet	868205943c	MAJOR: stats: Send stats dump over HTTP using zero-copy forwarding Just like for the cache applet, it is now possible to send the response to the opposite side using zero-copy forwarding. Internal functions were slightly updated but there is nothing special to say, except that the size requested during the negotiation stage is not exact.	2024-02-07 15:04:48 +01:00
Christopher Faulet	1c18d32a0d	MEDIUM: stconn: Notify requested size during zero-copy forwarding nego is exact It is now possible to use a flag during zero-copy forwarding negotiation to specify that the requested size is exact, meaning the producer really expects to forward at least this amount of data. It can be used by the consumer to prepare some processing at this stage, based on the requested size. For instance, in the H1 mux, it is used to write the next chunk size.	2024-02-07 15:04:38 +01:00
Christopher Faulet	2297f52734	MINOR: stconn: Add support for flags during zero-copy forwarding negotiation During zero-copy forwarding negotiation, a pseudo flag was already used to notify the consumer whether the producer is able to use kernel splicing or not. But this was not extensible. So now we use a true bitfield to be able to pass flags during the negotiation. NEGO_FF_FL_* flags may be used now. Of course, for now, there is only one flag, the kernel splicing support on the producer side (NEGO_FF_FL_MAY_SPLICE).	2024-02-07 15:04:29 +01:00
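Passing a bitfield instead of a single pseudo flag lets new negotiation options be added without changing function signatures. In the sketch below, only NEGO_FF_FL_MAY_SPLICE is named in the commit; NEGO_FF_FL_EXACT_SIZE (standing for the "exact size" flag from the row above), all bit values, struct ff_nego and consumer_nego() are assumptions for illustration, not the stconn API.

```c
#include <stddef.h>

/* Bit values are assumptions; only the MAY_SPLICE name is from the commit. */
#define NEGO_FF_FL_NONE        0x00000000U
#define NEGO_FF_FL_MAY_SPLICE  0x00000001U  /* producer can use kernel splicing */
#define NEGO_FF_FL_EXACT_SIZE  0x00000002U  /* requested size is exact */

/* Hypothetical outcome of a consumer-side negotiation. */
struct ff_nego {
	int use_splice;    /* both sides agree on kernel splicing */
	size_t announced;  /* size the consumer may announce up front, 0 if unknown */
};

static struct ff_nego consumer_nego(size_t requested, unsigned int flags,
                                    int consumer_can_splice)
{
	struct ff_nego res = { 0, 0 };

	/* Each capability is tested independently in the bitfield. */
	res.use_splice = (flags & NEGO_FF_FL_MAY_SPLICE) && consumer_can_splice;

	/* Only an exact size can be committed to in advance,
	 * e.g. as an H1 chunk size. */
	if (flags & NEGO_FF_FL_EXACT_SIZE)
		res.announced = requested;
	return res;
}
```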
Christopher Faulet	39b6f5b04c	MEDIUM: applet: Add support for zero-copy forwarding from an applet Thanks to this patch, it is possible for an applet to directly send data to the opposite endpoint. To do so, it must implement the <fastfwd> appctx callback function and set the SE_FL_MAY_FASTFWD flag. Everything will be handled by the appctx_fastfwd() function. The applet is only responsible for transferring data. If it sets the <to_forward> value, it is used to limit the amount of data to forward.	2024-02-07 15:04:01 +01:00
Christopher Faulet	62a81cb6a6	MINOR: applet: Add callback function to deal with zero-copy forwarding This patch introduces the support for the callback function responsible for producing data via the zero-copy forwarding mechanism. There is no implementation for now. But a <to_forward> field was added in the appctx structure to let an applet indicate how much data it wants to forward. It is not mandatory but it will be used during the zero-copy forwarding negotiation.	2024-02-07 15:03:57 +01:00
Christopher Faulet	cc7b141e1c	MINOR: applet: Add an appctx flag to report shutdown to applets There are no separate shutdowns for reads and sends with applets. Both are performed when the appctx is released. So instead of 2 flags, like for muxes/connections, only one flag is used. But the idea is the same: acknowledge the event at the applet level.	2024-02-07 15:03:50 +01:00
Christopher Faulet	14bd091fd7	MINOR: applet: Remove appctx state field to only use the flags The appctx state was never really used as a state. It is only used to know when an applet should be freed on the next wakeup. This can be converted to a flag and the state can be removed. This is what this patch does.	2024-02-07 15:03:46 +01:00
Christopher Faulet	4434b03358	MINOR: applet: Add flags to deal with ends of input, ends of stream and errors Dedicated appctx flags to report EOI, EOS and errors (pending or terminal) were added, along with the functions to set these flags. It is pretty similar to what is done in most of the muxes.	2024-02-07 15:03:42 +01:00
Christopher Faulet	e8655546b7	MINOR: applet: Add flags on the appctx and stop abusing its state Till now, we've extended the appctx state to add some flags. However, the field name is misleading. So a bitfield was added to handle real flags. And helper functions to manipulate this bitfield were added.	2024-02-07 15:03:34 +01:00
Christopher Faulet	4ad8192ce4	MEDIUM: applet: Add the applet handler based on IN/OUT buffers A dedicated function to run applets was introduced, in addition to the old one, to deal with applets that use their own buffers. The main difference here is that this handler does not use channels at all. It performs a synchronous send before calling the applet and a synchronous receive just after. No applets are plugged on this handler for now.	2024-02-07 15:03:26 +01:00
Christopher Faulet	f81b704d01	MEDIUM: stconn: Add functions to handle applets I/O from the SC layer There is no tasklet to handle I/O subscriptions for applets, but functions to deal with receives and sends from the SC layer were added: a synchronous function to retrieve data from an applet and a synchronous function to push data to an applet. They are pretty similar to the functions used for muxes but there are some differences, so for now, we keep them separated. Zero-copy forwarding is not supported for now. In addition, there is no subscription mechanism.	2024-02-07 15:03:23 +01:00
Christopher Faulet	525ec12305	MINOR: applet: Implement default functions to exchange data with channels In this patch, we add a default function to copy data from a channel to the <inbuf> buffer of an applet (appctx_rcv_buf) and another one to copy data from the <outbuf> buffer of an applet to a channel (appctx_snd_buf). These functions are not used for now, but they will be used by applets to define their <rcv_buf> and <snd_buf> callback functions. Of course, it will be possible for a specific applet to implement its own functions, but these ones should be good enough for most applets. HTX and RAW buffers are supported.	2024-02-07 15:03:18 +01:00
Christopher Faulet	361b81bfca	MINOR: applet: Add support for callback functions to exchange data with channels For now, it is not usable, but this patch introduces the support of callback functions, in the applet structure, to exchange data between channels and applets. It is pretty similar to the callback functions defined by muxes.	2024-02-07 15:03:14 +01:00
Christopher Faulet	ab9d2c6ca8	MINOR: applet: Add dedicated IN/OUT buffers for appctx It is the first patch of a series aimed at aligning applets on connections. Here, dedicated buffers are added for applets. For now, buffers are initialized and helper functions to deal with allocation are added. In addition, flags to report allocation failures or full buffers are also introduced. <inbuf> will be used to push data to the applet from the stream and <outbuf> will be used to push data from the applet to the stream.	2024-02-07 15:03:01 +01:00
Christopher Faulet	0dd7ff0d67	MINOR: stconn: Be able to detect applets using HTX The IS_HTX_SC() macro is only usable if the stream-connector is attached to a connection. It is a bit restrictive because this cannot work if the SC is attached to an applet. So let's fix that by adding support for applets too.	2024-02-07 15:02:19 +01:00
Christopher Faulet	6734e56514	MINOR: task: Move wait_event in the task header file wait_event structure was in connection header file because it is only used by connections and muxes. But, this may change. For instance applets may be good candidates to use it too. So, the structure is moved to the task header file instead.	2024-02-07 15:02:13 +01:00
Willy Tarreau	25968c186a	MINOR: debug: add an optional message argument to the BUG_ON() family This commit adds support for an optional second argument to BUG_ON(), WARN_ON(), CHECK_IF(), that can be a constant string. When such an argument is given, it will be printed on a second line after the existing first message that contains the condition. This can be used to provide more human-readable explanations about what happened, such as "too low on memory" or "memory corruption detected" that may help a user resolve the incident by themselves.	2024-02-05 17:09:00 +01:00
Willy Tarreau	d417863828	MINOR: debug: support passing an optional message in ABORT_NOW() The ABORT_NOW() macro is not much used since we have BUG_ON(), but there are situations where it makes sense, typically if the program must always die regardless of DEBUG_STRICT, or if the condition must always be evaluated (e.g. decompress something and check it). It's inconvenient not to have any hint about what happened there. But providing too much info also results in wiping some registers, making the trace less exploitable, so a compromise must be found. What this patch does is to provide support for an optional argument to ABORT_NOW(). When an argument (a string) is passed, a message will be emitted with the file name, line number, the message and a trailing LF, before the stack dump and the crash. It should be used reasonably, for example in functions that have multiple calls that need to be more easily distinguished.	2024-02-05 17:09:00 +01:00
Willy Tarreau	bc70b385fd	MINOR: debug: make BUG_ON() catch build errors even without DEBUG_STRICT As seen in previous commit `59acb27001` ("BUILD: quic: Variable name typo inside a BUG_ON()."), it can sometimes happen that with DEBUG forced without DEBUG_STRICT, BUG_ON() statements are ignored. Sadly, it means that typos there are not even build-tested. This patch makes these statements reference sizeof(cond) to make sure the condition is parsed. This doesn't result in any code being emitted, but makes sure the expression is correct so that an issue such as the one above will fail to build (which was verified). This may be backported as it can help spot failed backports as well.	2024-02-05 15:09:37 +01:00
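The sizeof() trick works because the operand of sizeof is parsed and type-checked but not evaluated (for non-VLA types), so no code is emitted. A simplified sketch, not HAProxy's actual bug.h: with DEBUG_STRICT the condition is evaluated and a match aborts; without it, a misspelled variable in the condition still fails to build, while side effects in the condition never run.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG_STRICT
/* Strict mode: evaluate the condition and crash if it matches. */
#define BUG_ON(cond) do { \
	if (cond) { \
		fprintf(stderr, "FATAL: bug condition \"%s\" matched at %s:%d\n", \
		        #cond, __FILE__, __LINE__); \
		abort(); \
	} \
} while (0)
#else
/* Non-strict mode: sizeof() forces the expression to be parsed and
 * type-checked, but it is never evaluated and emits no code. */
#define BUG_ON(cond) do { (void)sizeof(cond); } while (0)
#endif

/* Example caller: without DEBUG_STRICT, the increment inside the
 * condition is never executed, so this returns 0. */
static int count_checks(void)
{
	int evaluated = 0;

	BUG_ON(++evaluated > 1);
	return evaluated;
}
```

Some compilers warn about side effects in an unevaluated operand, which is precisely why such conditions should stay side-effect free in real code.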
Aurelien DARRAGON	be0165b249	BUILD: debug: remove leftover parentheses in ABORT_NOW() Since `d480b7b` ("MINOR: debug: make ABORT_NOW() store the caller's line number when using abort"), building with 'DEBUG_USE_ABORT' fails with: |In file included from include/haproxy/api.h:35, | from include/haproxy/activity.h:26, | from src/ev_poll.c:20: |include/haproxy/thread.h: In function ‘ha_set_thread’: |include/haproxy/bug.h:107:47: error: expected ‘;’ before ‘_with_line’ | 107 | #define ABORT_NOW() do { DUMP_TRACE(); abort()_with_line(__LINE__); } while (0) | | ^~~~~~~~~~ |include/haproxy/bug.h:129:25: note: in expansion of macro ‘ABORT_NOW’ | 129 | ABORT_NOW(); \ | | ^~~~~~~~~ |include/haproxy/bug.h:123:9: note: in expansion of macro ‘__BUG_ON’ | 123 | __BUG_ON(cond, file, line, crash, pfx, sfx) | | ^~~~~~~~ |include/haproxy/bug.h:174:30: note: in expansion of macro ‘_BUG_ON’ | 174 | # define BUG_ON(cond) _BUG_ON (cond, __FILE__, __LINE__, 3, "FATAL: bug ", "") | | ^~~~~~~ |include/haproxy/thread.h:201:17: note: in expansion of macro ‘BUG_ON’ | 201 | BUG_ON(!thr->ltid_bit); | | ^~~~~~ |compilation terminated due to -Wfatal-errors. |make: *** [Makefile:1006: src/ev_poll.o] Error 1 This is because of a leftover: abort()_with_line(__LINE__); ^^ Fixing it by removing the extra parentheses after 'abort' since the abort() call is now performed under abort_with_line() helper function. This was raised by Ilya in GH #2440. No backport is needed, unless the above commit gets backported.	2024-02-05 14:55:04 +01:00
Willy Tarreau	d480b7be96	MINOR: debug: make ABORT_NOW() store the caller's line number when using abort Placing DO_NOT_FOLD() before abort() only works in -O2 but not in -Os which continues to place only 5 calls to abort() in h3.o for call places. The approach taken here is to replace abort() with a new function that wraps it and stores the line number in the stack. This slightly increases the code size (+0.1%) but when unwinding a crash, the line number remains present now. This is a very low cost, especially if we consider that DEBUG_USE_ABORT is almost only used by code coverage tools and occasional debugging sessions.	2024-02-02 17:12:06 +01:00
Willy Tarreau	2bb192ba91	MINOR: debug: make sure calls to ha_crash_now() are never merged As indicated in the previous commit, we don't want calls to ha_crash_now() to be merged, since it will make gdb return a wrong line number. This was found to happen with gcc 4.7 and 4.8 in h3.c, where 26 calls end up as only 5 to 18 "ud2" instructions depending on optimizations. By calling DO_NOT_FOLD() just before provoking the trap, we can reliably avoid this folding problem. Note that this does not address the case where abort() is used instead (DEBUG_USE_ABORT).	2024-02-02 17:12:06 +01:00
Willy Tarreau	e06e8a2390	MINOR: compiler: add a new DO_NOT_FOLD() macro to prevent code folding Modern compilers sometimes perform function tail merging and identical code folding, which consist of merging identical occurrences of the same code paths, generally final ones (e.g. before a return, a jump or an unreachable statement). In the case of ABORT_NOW(), the compiler may merge all of them into a single one in a function, defeating the purpose of the check, which initially was to figure out where the bug occurred. Here we're creating a DO_NOT_FOLD() macro which makes use of the line number and passes it as an integer argument to an empty asm() statement. The effect is a code position dependency which prevents the compiler from merging the code up to that point (though it may still merge the following code). In practice it's efficient at stopping the compilers from merging calls to ha_crash_now(), which was the initial purpose. It may also be used to force certain optimization constructs since it gives more control to the developer.	2024-02-02 17:12:06 +01:00
Christopher Faulet	3246f863d6	MEDIUM: stats: Be able to access a specific field into a stats module It is now possible to selectively retrieve extra counters from stats modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to return a specific counter.	2024-02-01 12:00:53 +01:00
Christopher Faulet	fd366a106b	MINOR: stats: Be able to access to registered stats modules from anywhere The list of modules registered on the stats to expose extra counters is now public. It is required to export these counters into the Prometheus exporter.	2024-02-01 12:00:53 +01:00
Aurelien DARRAGON	42a97d9feb	MEDIUM: tcp-act/backend: support for set-bc-{mark,tos} actions set-bc-{mark,tos} actions are pretty similar to set-fc-{mark,tos}, but set mark/tos on packets sent from haproxy to the server: set-bc-{mark,tos} actions act on the whole backend/srv connection, from connect() to connection teardown, thus they may only be used before the connection to the server is instantiated, meaning that they are only relevant for request-oriented rules such as tcp-request or http-request rules. For now their use is limited to content request rules, because tos and mark information is stored directly within the stream, thus it is required that the stream already exists. Stream flags are used in combination with dedicated stream struct member variables to pass 'tos' and 'mark' information so that they are correctly considered during the stream connection assignment logic (prior to actually connecting to the server). 'tos' and 'mark' fd sockopts are taken into account in conn hash parameters for the connection reuse mechanism. The documentation was updated accordingly.	2024-02-01 10:58:30 +01:00
Aurelien DARRAGON	b4ee7b044e	MEDIUM: tcp-act: <expr> support for set-fc-{mark,tos} actions In this patch we add the possibility to use a sample expression as argument for set-fc-{mark,tos} actions. To make it backward compatible with the previous behavior, during parsing we first try to parse the value as an integer (decimal or hex notation), and then fall back to expr parsing in case of failure. The documentation was updated accordingly.	2024-02-01 10:58:30 +01:00
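The backward-compatible parsing order (integer first, expression as fallback) can be sketched with strtoul(), whose base 0 accepts both decimal and 0x-prefixed hex. The helper parse_int_arg() is hypothetical, and strtoul() is also more permissive than HAProxy's parser may be (leading-zero octal, a leading sign and leading whitespace are accepted).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value when <arg> is a
 * complete integer in decimal or 0x-prefixed hex, 0 otherwise so the
 * caller can fall back to sample expression parsing. */
static int parse_int_arg(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &end, 0);  /* base 0: "42" or "0x2a" */
	if (end == arg || *end != '\0' || errno == ERANGE)
		return 0;               /* not entirely an integer */
	*out = v;
	return 1;
}
```

An argument such as 0x10 takes the integer path exactly as before, while anything that is not entirely a number, such as var(txn.mark), falls through to expression parsing.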
Aurelien DARRAGON	ea09075f59	OPTIM: connection: progressive hash for conn_calculate_hash() Some CPU time is needlessly wasted in conn_calculate_hash(), because all params are first copied into a temporary buffer before computing the hash on the whole buffer. Instead, let's leverage the XXH progressive hash update functions to avoid expensive memcpys.	2024-02-01 10:58:30 +01:00
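To illustrate why a progressive hash removes the intermediate copy, here is a sketch using 64-bit FNV-1a as a stand-in because it is tiny and dependency-free. xxHash offers the same reset/update/digest pattern through its streaming API (e.g. XXH64_reset(), XXH64_update(), XXH64_digest()), but the fnv64_* names are hypothetical and this is not HAProxy's conn_calculate_hash().

```c
#include <stddef.h>
#include <stdint.h>

/* Streaming 64-bit FNV-1a; stand-in for xxHash's progressive API. */
struct fnv64_state {
	uint64_t h;
};

static void fnv64_reset(struct fnv64_state *s)
{
	s->h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */
}

/* Feed one parameter at a time: no need to memcpy() it into a
 * temporary buffer first. */
static void fnv64_update(struct fnv64_state *s, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		s->h ^= p[i];
		s->h *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
	}
}

static uint64_t fnv64_digest(const struct fnv64_state *s)
{
	return s->h;
}
```

Because each update only depends on the running state, hashing the parameters one by one yields the same digest as hashing their concatenation, which is the property the optimization relies on.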
Aurelien DARRAGON	1de149fb6d	CLEANUP: connection: remove obsolete comment in header file The 0x00000008 bit for CO_FL_* flags is no longer unused since `8cc3fc73f1` ("MINOR: connection: update rhttp flags usage"). Removing the comment that says otherwise.	2024-02-01 10:58:30 +01:00
Amaury Denoyelle	4b5f557283	MINOR: mux-quic: realign Tx buffer if possible A major reorganization of QUIC MUX sending has been implemented. Now data transfer occurs over a single QCS buffer. This has improved performance but at the cost of restrictions on snd_buf. Indeed, buffer instances are now shared from the stream callback snd_buf up to the quic-conn layer. As such, snd_buf cannot freely manipulate data already present in the buffer. In particular, realign has been completely removed by the previous patches. This commit reintroduces partial realign support. This is only done if the buffer contains only unsent data, via a new MUX function qcc_realign_stream_txbuf() which is called during snd_buf.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	4513787d0d	MEDIUM: mux-quic: properly handle conn Tx buf exhaustion This commit is a direct follow-up on the major rearchitecture of send buffering. This patch implements the proper handling of temporary exhaustion of the connection buffer pool. The first step is to be able to differentiate a fatal allocation error from a temporary pool exhaustion. This is done via a new output argument on qcc_get_stream_txbuf(). For a fatal error, the application protocol layer will schedule the immediate connection closing. For a pool exhaustion, the QCC is flagged with QC_CF_CONN_FULL and the stream sending process is interrupted. The QCS instance is also registered in a new list <qcc.buf_wait_list>. A new connection buffer can become available when all ACKs are received for an older buffer. This process is handled by the quic-conn layer. It uses the qcc_notify_buf() function to clear QC_CF_CONN_FULL and to wake up every stream registered on buf_wait_list to resume the sending process.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	cd22200d23	MEDIUM: mux-quic: release Tx buf on too small room This commit is a direct follow-up on the major rearchitecture of send buffering. It allows application protocol to react if current QCS sending buffer space is too small. In this case, the buffer can be released to the quic-conn layer. This allows to allocate a new QCS buffer and retry HTX parsing, unless connection buffer pool is already depleted. A new function qcc_release_stream_txbuf() serves as API for app protocol to release the QCS sending buffer. This operation fails if there is unsent data in it. In this case, MUX has to keep it to finalize transfer of unsent data to quic-conn layer. QCS is thus flagged with QC_SF_BLK_MROOM to interrupt snd_buf operation. When all data are sent to the quic-conn layer, QC_SF_BLK_MROOM is cleared via qcc_streams_sent_done() and stream layer is woken up to restart snd_buf. Note that a new function qcc_stream_can_send() has been defined. It allows app proto to check if sending is currently blocked for the current QCS. For now, it checks QC_SF_BLK_MROOM flag. However, it will be extended to other conditions with the following patches.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	3fe3251593	MEDIUM: mux-quic: simplify sending API The previous commit was a major rework for QUIC MUX sending process. Following this, this patch cleans up a few elements that remains but can be removed as they are duplicated. Of notable changes, offset fields from QCS and QCC are removed. They are both equivalent to flow control soft offsets. A new function qcs_prep_bytes() is implemented. Its purpose is to return the count of prepared data bytes not yet sent. It also replaces qcs_need_sending().	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	00a3e5f786	MAJOR: mux-quic: remove intermediary Tx buffer Previously, QUIC MUX sending was implemented with data transferred along two different buffer instances per stream. The first QCS buffer was used for HTX blocks conversion into H3 (or other application protocol) during the snd_buf stream callback. The QCS instance was then registered for sending via qcc_io_cb(). For each sending QCS, data were copied with memcpy from the first buffer to a secondary one. A STREAM frame was produced for each QCS based on the content of its secondary buffer. This model is useful for QUIC MUX, which has a major difference from other muxes: data must be preserved longer, even after being sent to the lower layer. Data references are shared with the quic-conn layer, which implements retransmission and data deletion on ACK reception. This double-buffering model was the first one implemented and has remained active until now. One of its major drawbacks is that it requires a memcpy invocation for all data transferred between the two buffers. Another important drawback is that the first buffer is allocated by each QCS individually without restriction, whereas secondary buffers are accounted per connection. A bottleneck can appear if the secondary buffer pool is exhausted, causing unnecessary buffering in haproxy. The purpose of this commit is to completely break this model. The first buffer instance is removed. Now, application protocols directly allocate buffers from the qc_stream_desc layer. This completely removes the memcpy invocation. This commit has a lot of code modifications. The most obvious one is the removal of the <qcs.tx.buf> field. Now, qcc_get_stream_txbuf() returns a buffer instance from the qc_stream_desc layer. qcs_xfer_data(), which was responsible for the memcpy between the two buffers, is also completely removed. Offset fields of QCS and QCC are now incremented directly by qcc_send_stream(). These values are used as a boundary with the flow control real offset to delimit the STREAM frames built. As this change has a big impact on the code, this commit is only the first part of fully supporting single buffer emission. For the moment, some limitations are reintroduced and will be fixed in the next patches: * on snd_buf, if the QCS send buffer in use has room but not enough for the application protocol to store its content * on snd_buf, if the QCS send buffer is NULL and allocation cannot succeed due to connection pool exhaustion. One final important aspect is that extra care is now necessary in the snd_buf callback. The same buffer instance is referenced by both the stream and the quic-conn layer. As such, some operations such as realign can no longer be done freely.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	c6ef55407c	MINOR: mux-quic: remove unneeded sent-offset fields Both QCS and QCC have their own sent offset field. These fields store the newest offset sent to the quic-conn layer, which is equivalent to the QCS/QCC flow control real offset. This patch removes them and replaces them with the latter for code clarity. MINOR: mux-quic: remove unneeded qcc.tx.sent_offsets field This commit has a similar purpose to the previous one, except that it removes the QCC <sent_offsets> field, now equivalent to the connection flow control real offset.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	d4bf6f0526	MEDIUM: mux-quic: limit conn flow control on snd_buf This commit is a direct follow-up on the previous one. This time, it deals with connection level flow control. The process is similar to the stream level: the soft offset is incremented during snd_buf and the real offset during STREAM frame emission. On MAX_DATA reception, both the stream layer and QMUX are woken up if necessary. One extra feature for the connection level is the introduction of a new QCC list referencing QCS instances. It stores the instances for which the snd_buf callback has been interrupted because the QCC soft offset limit was reached. Every stream instance in this list is woken up on MAX_DATA reception if the soft offset is unblocked.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	c44692356d	MEDIUM: mux-quic: limit stream flow control on snd_buf This patch is the first of two to reimplement flow control emission limit checks. The objective is to account for flow control earlier, during the snd_buf stream callback. This should smooth transfers and prevent over-buffering on the haproxy side if the flow control limit is reached. The current patch deals with stream level flow control. It reuses the newly defined flow control type. The soft offset is incremented after HTX to data conversion. If the limit is reached, snd_buf is interrupted and the stream layer will subscribe on the QCS. In qcc_io_cb(), generation of STREAM frames is restricted as before, to ensure peer limits are never exceeded. Finally, the flow control real offset is incremented on lower layer send notification. Thus, it serves as the base offset for built STREAM frames. If the limit is reached, STREAM frame generation is suspended. Each time the QCS data flow control limit is reached, soft and real offsets are reconsidered. Finally, special care is taken when the flow control limit is incremented via MAX_STREAM_DATA reception. If the soft value is unblocked, the stream layer snd_buf is woken up. If the real value is unblocked, qcc_io_cb() is rescheduled.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	25493ca036	MINOR: mux-quic: define a flow control related type Create a new module dedicated to flow control handling. It will be used to implement earlier flow control update on the snd_buf stream callback. For the moment, only the Tx part is implemented (i.e. the limit set by the peer that haproxy must respect for sending). A type quic_fctl is defined to count emitted data bytes. Two offsets are used: a real one and a soft one. The difference is that the soft offset can be incremented beyond the limit unless it is already in excess. The soft offset will be used for HTX to H3 parsing. As the size of the generated H3 data is unknown before parsing, it allows the limit to be exceeded once. The real offset will be used during STREAM frame generation: this time the limit must not be exceeded, to prevent a protocol violation.	2024-01-31 16:28:54 +01:00
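The soft/real offset pair described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not HAProxy's quic_fctl code. The names (struct sk_fctl, sk_fctl_*) are hypothetical. Treating a soft offset equal to the limit as "in excess" is an assumption. The limit-raising helper mirrors the MAX_DATA/MAX_STREAM_DATA handling described in the neighbouring commits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Tx flow control counter (not HAProxy's struct quic_fctl). */
struct sk_fctl {
	uint64_t limit;    /* maximum offset allowed by the peer */
	uint64_t off_soft; /* bytes produced by HTX to H3 conversion */
	uint64_t off_real; /* bytes actually put into STREAM frames */
};

/* Assumption: the soft offset is "in excess" once it reached the limit. */
static bool sk_fctl_soft_blocked(const struct sk_fctl *f)
{
	return f->off_soft >= f->limit;
}

/* Accounts <n> converted bytes. The limit may be crossed once, because the
 * size of generated H3 data is only known after conversion. Returns false,
 * without accounting, if the soft offset was already in excess. */
static bool sk_fctl_inc_soft(struct sk_fctl *f, uint64_t n)
{
	if (sk_fctl_soft_blocked(f))
		return false;
	f->off_soft += n;
	return true;
}

/* Bytes that may still be emitted without violating the peer limit. */
static uint64_t sk_fctl_real_room(const struct sk_fctl *f)
{
	return f->limit - f->off_real;
}

static void sk_fctl_inc_real(struct sk_fctl *f, uint64_t n)
{
	assert(n <= sk_fctl_real_room(f)); /* real offset never exceeds limit */
	f->off_real += n;
}

/* Peer raised the limit. Returns a bitmask: 1 if the soft offset got
 * unblocked (wake snd_buf), 2 if the real offset got unblocked (reschedule
 * emission). A limit that does not increase is ignored. */
static int sk_fctl_set_limit(struct sk_fctl *f, uint64_t lim)
{
	int ret = 0;

	if (lim <= f->limit)
		return 0;
	if (f->off_soft >= f->limit && f->off_soft < lim)
		ret |= 1;
	if (f->off_real == f->limit)
		ret |= 2;
	f->limit = lim;
	return ret;
}
```

With a limit of 100, converting 150 bytes is accepted once. Any further conversion is refused until the peer raises the limit, while emission stops exactly at 100 bytes.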
Amaury Denoyelle	f32c08be34	MINOR: mux-quic: prepare for earlier flow control update Add a new argument to qcc_send_stream() to specify the count of sent bytes. For the moment this argument is unused. This commit is in fact a step to implement earlier flow control update during stream layer snd_buf.	2024-01-31 16:28:54 +01:00
Amaury Denoyelle	220386ae40	BUG/MINOR: ssl/quic: fix 0RTT define Previous patches reorganized the defines for SSL 0RTT support. However, a typo was introduced. This caused haproxy to disable the 0RTT support announcement and to report an erroneous warning about missing support on the SSL library side when using the quictls/openssl compat layer. This was detected using ngtcp2-client. No 0RTT packets were emitted by the client because haproxy did not advertise support. The faulty commit is the following one : commit `5c45199347` MEDIUM: ssl/quic: always compile the ssl_conf.early_data test This must be backported wherever the above patch is.	2024-01-31 16:28:32 +01:00
Willy Tarreau	fadabc430f	CLEANUP: h1: remove unused function h1_measure_trailers() This one stopped being used in 2.1 when HTX became mandatory, let's drop it.	2024-01-31 15:22:12 +01:00
William Lallemand	025f5105ee	MINOR: ssl: rename HA_OPENSSL_HAVE_0RTT_SUPPORT constant to HAVE_SSL_0RTT_QUIC Rename the constant to make it more understandable.	2024-01-31 11:57:54 +01:00
William Lallemand	f5353f2c45	MINOR: ssl: add HAVE_SSL_0RTT constant Add the HAVE_SSL_0RTT constant, which defines whether the SSL library supports 0RTT. This is different from HA_OPENSSL_HAVE_0RTT_SUPPORT, which was used only in the context of QUIC.	2024-01-31 11:57:54 +01:00
Christopher Faulet	4837e99892	BUG/MEDIUM: h1: Don't support LF only to mark the end of a chunk size It is similar to the previous fix but for chunk size parsing. This one is more annoying because a poorly coded application in front of haproxy may ignore the last digit before the LF, thinking it should be a CR. In this case it may be out of sync with HAProxy, and that could be exploited to perform some sort of request smuggling attack. While it seems unlikely, it is safer to forbid LF without CR at the end of a chunk size. This patch must be backported to 2.9 and probably to all stable versions because there is no reason to still support LF without CR in this case.	2024-01-30 15:00:14 +01:00
Christopher Faulet	7b737da825	BUG/MINOR: h1: Don't support LF only at the end of chunks When the message is chunked, all chunks must end with a CRLF. However, on old versions, to support bad client or server implementations, LF alone was also accepted. Nowadays, it seems useless and can even be considered an issue. Just forbid LF only at the end of chunks; it seems reasonable. This patch must be backported to 2.9 and probably to all stable versions because there is no reason to still support LF without CR in this case.	2024-01-30 14:58:59 +01:00
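The strict CRLF rule enforced by the two commits above can be illustrated with this small C sketch. It is not HAProxy's h1 parser. The function names (sk_parse_chunk_size(), sk_check_chunk_end()) are hypothetical. For brevity, the sketch also rejects chunk extensions (";..."), which a real HTTP/1.1 parser must accept. It reports "need more data", "error" (including a bare LF) or the number of bytes consumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Parses a chunk-size line "<hex>\r\n" from <p> (<n> bytes available).
 * Returns the number of bytes consumed, 0 if more data is needed, or -1
 * on error. A bare LF without CR is an error. Chunk extensions are not
 * supported by this sketch and are rejected as well. */
static long sk_parse_chunk_size(const char *p, size_t n, uint64_t *size)
{
	uint64_t v = 0;
	size_t i = 0;

	while (i < n && isxdigit((unsigned char)p[i])) {
		int c = tolower((unsigned char)p[i]);
		int d = (c <= '9') ? c - '0' : c - 'a' + 10;

		if (v > (UINT64_MAX >> 4))
			return -1;              /* size would overflow */
		v = (v << 4) | (uint64_t)d;
		i++;
	}
	if (i == n)
		return 0;                       /* need more data */
	if (i == 0 || p[i] != '\r')
		return -1;                      /* no digits, bare LF or junk */
	if (i + 1 == n)
		return 0;
	if (p[i + 1] != '\n')
		return -1;
	*size = v;
	return (long)(i + 2);
}

/* Checks the CRLF that must follow chunk data: returns 2 if present,
 * 0 if more data is needed, -1 otherwise (including a bare LF). */
static int sk_check_chunk_end(const char *p, size_t n)
{
	if (n == 0)
		return 0;
	if (p[0] != '\r')
		return -1;
	if (n == 1)
		return 0;
	return p[1] == '\n' ? 2 : -1;
}
```

Checking for the CR explicitly, rather than skipping an optional CR before the LF, is what prevents the desynchronization described in the BUG/MEDIUM commit.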
Miroslav Zagorac	24a5e42db6	CLEANUP: log: deinitialization of the log buffer in one function In several places in the source, there was the same block of code that was used to deinitialize the log buffer. There were even two functions that did this, but they were called only from the code that is in the same source file (free_tcpcheck_fmt() in src/tcpcheck.c and free_logformat_list() in src/proxy.c - they were both static functions). The function free_logformat_list() was moved from the file src/proxy.c to src/log.c, and a check of the list before freeing the memory was added to that function.	2024-01-30 08:27:26 +01:00
Willy Tarreau	e5ac9fc98b	BUILD: makefile: also define cmd_CXX to pretty-print C++ build commands Device Atlas' dummy lib will use a C++ file when built with cache support, so for completeness we'll have to pretty-print it as well. Let's define cmd_CXX.	2024-01-26 18:54:23 +01:00
Ilya Shipitsin	558d385c85	CLEANUP: fix spelling of "elemt"	2024-01-26 17:29:27 +01:00
Amaury Denoyelle	ad6b13d317	BUG/MEDIUM: quic: remove unsent data from qc_stream_desc buf QCS instances use qc_stream_desc for data buffering on emission. On stream reset, its Tx channel is closed earlier than expected. This may leave unsent data in qc_stream_desc. Before this patch, these unsent data would remain after QCS freeing. This prevents the buffer from being released, as no ACK reception will remove them. The buffer is only freed when the whole connection is closed. As qc_stream_desc buffers are limited per connection, this reduces the buffer pool for other streams of the same connection. In the worst case, if several streams are reset, this may completely freeze the transfer of the remaining streams of the connection. This bug was reproduced by reducing the connection buffer pool to a single buffer instance using the following global statement: tune.quic.frontend.conn-tx-buffers.limit 1. Then a QUIC client is used which opens a stream for a large enough object to ensure data are buffered. The client then emits a STOP_SENDING before reading all data, which forces the corresponding QCS instance to be reset. The client then opens a new request but the transfer is frozen due to this bug. To fix this, adjust the qc_stream_desc API. Add a new argument <final_size> to the qc_stream_desc_release() function. Its value is compared to the currently buffered offset in the latest qc_stream_desc buffer. If <final_size> is smaller, it means unsent data are present in the buffer. As such, qc_stream_desc_release() removes them to ensure the buffer will finally be freed when all ACKs are received. It is also possible that no data remain immediately, indicating that all ACKs were already received. In this case, the buffer instance is immediately removed by qc_stream_buf_free(). This must be backported up to 2.6. As this code section is known to be prone to regressions, a period of observation could be reserved before distributing it on LTS releases.	2024-01-26 16:02:05 +01:00
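The <final_size> comparison described above can be modelled with a few lines of C. This is a hypothetical sketch, not the qc_stream_desc API. struct sk_txbuf, sk_release() and sk_ack() are invented names. The model assumes the buffer holds one contiguous range of stream offsets, from the first unacknowledged byte to the last buffered byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the last buffer of a stream: it holds stream
 * offsets [base, base + len), i.e. data sent but not yet acknowledged
 * followed by data never sent. */
struct sk_txbuf {
	uint64_t base;   /* stream offset of the first unacknowledged byte */
	uint64_t len;    /* bytes still stored in the buffer */
};

/* The stream is released after <final_size> bytes were actually sent.
 * Unsent bytes beyond it would never be acknowledged, so they are dropped.
 * Returns true if the buffer is now empty and can be freed right away. */
static bool sk_release(struct sk_txbuf *b, uint64_t final_size)
{
	assert(final_size >= b->base);   /* acked data were necessarily sent */
	if (final_size < b->base + b->len)
		b->len = final_size - b->base;
	return b->len == 0;
}

/* ACK reception for the next <n> bytes. Returns true once empty. */
static bool sk_ack(struct sk_txbuf *b, uint64_t n)
{
	assert(n <= b->len);
	b->base += n;
	b->len -= n;
	return b->len == 0;
}
```

Without the trimming step in sk_release(), the unsent tail would keep `len` above zero forever, and the buffer would only be freed with the connection. That is the behaviour this commit fixes.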
Frederic Lecaille	ab75d89e07	BUILD: quic: Fix build error when building QUIC against libressl. The previous commit was not sufficient to completely fix the build issue related to TLS stack 0-RTT support. LibreSSL was the last TLS stack to refuse to compile, because of an undefined QUIC-specific function for 0-RTT: SSL_set_quic_early_data_enabled(). To get rid of such compilation issues, define HA_OPENSSL_HAVE_0RTT_SUPPORT only when building against a TLS stack with 0-RTT support. No need to backport.	2024-01-24 15:37:40 +01:00
Emeric Brun	ef02dba7bc	BUG/MEDIUM: cli: some err/warn msg dumps add LR into CSV output on stat's CLI The initial purpose of CSV stats through the CLI was to make them easily parsable by scripts. But in some specific cases, error or warning message strings containing LF were dumped into cells of this CSV. This caused parsing failures in several tools. In addition, if a warning or message contains two successive LFs, they are dumped directly, but a double LF marks the end of a response on the CLI, so the client may consider the response truncated. This patch extends the 'csv_enc_append' and 'csv_enc' functions, used to format quoted string content according to the RFC, with an additional parameter to convert multi-line strings to one line: CRs are skipped, and LFs are replaced with spaces. In addition, and optionally, it is also possible to remove the resulting trailing spaces. The calls to this function that fill strings into the stats CSV output are updated to force this conversion. This patch should be backported to all supported branches (the issue was already present in v2.0)	2024-01-24 08:38:59 +01:00
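The one-line conversion described above can be sketched in C as follows. This is not HAProxy's csv_enc()/csv_enc_append(). sk_csv_cell() and its parameters are hypothetical. The sketch writes the cell into a caller-provided buffer and always quotes it. Embedded double quotes are doubled as in RFC 4180, CRs are skipped and LFs become spaces. In this sketch, trimming removes every trailing space, including spaces that were already present in the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes <in> as a quoted CSV cell into <out> (<outsz> bytes). If
 * <oneline> is set, CRs are skipped and LFs are replaced with spaces; if
 * <trim> is also set, trailing spaces are removed. Returns 0 on success or
 * -1 if <out> is too small. */
static int sk_csv_cell(const char *in, int oneline, int trim,
                       char *out, size_t outsz)
{
	size_t o = 0;

	if (outsz < 3)
		return -1;
	out[o++] = '"';
	for (; *in; in++) {
		char c = *in;
		size_t need = (c == '"') ? 2 : 1;

		if (oneline && c == '\r')
			continue;              /* CRs are skipped */
		if (oneline && c == '\n')
			c = ' ';               /* LFs become spaces */
		if (o + need > outsz - 2)
			return -1;             /* keep room for closing quote + NUL */
		if (c == '"')
			out[o++] = '"';        /* RFC 4180: double embedded quotes */
		out[o++] = c;
	}
	if (oneline && trim)
		while (o > 1 && out[o - 1] == ' ')
			o--;
	out[o++] = '"';
	out[o] = '\0';
	return 0;
}
```

For example, "line1\r\nline2\n\n" becomes `"line1 line2"`. The cell therefore never contains the double LF that terminates a CLI response.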
Willy Tarreau	a3d6af6a0f	MINOR: connection: add a new mux_ctl to report number of connection glitches MUX_CTL_GET_GLITCHES will report the non-negative number of glitches observed on a connection, or -1 if not supported.	2024-01-18 17:21:44 +01:00
William Lallemand	97832ab823	MEDIUM: ssl: implements 'default-crt' keyword for bind Lines The 'default-crt' bind keyword allows specifying multiple default/fallback certificates, making it possible to have an RSA as well as an ECDSA default.	2024-01-12 17:40:42 +01:00
William Lallemand	83a0cde207	REORG: ssl: move 'generate-certificates' code to ssl_gencert.c A lot of code specific to the 'generate-certificates' option was left in ssl_sock.c. Move the code to 'ssl_gencert.c' and 'ssl_gencert.h'	2024-01-12 17:40:42 +01:00
William Lallemand	b80635a7e0	MEDIUM: ssl: does not use default_ctx for 'generate-certificate' option The 'generate-certificates' option does not need its dedicated SSL_CTX *, it only needs the default SSL_CTX. Use the default SSL_CTX found in the sni_ctx to generate certificates. This allows removing all the specific default_ctx initialization, as well as default_ssl_conf and 'default_inst'.	2024-01-12 17:40:42 +01:00
William Lallemand	0bf9d122a9	MEDIUM: ssl: generate '' SNI filters for default certificates This patch follows the previous one about default certificate selection ("MEDIUM: ssl: allow multiple fallback certificate to allow ECDSA/RSA selection"). This patch generates '' SNI filters for the first certificate of a bind line, which will be used to match default certificates, instead of setting the default_ctx pointer in the bind line. Since the filters are in the SNI tree, this allows multiple default certificates and restores the ecdsa/rsa selection with a multi-cert bundle. This configuration: # foobar.pem.ecdsa and foobar.pem.rsa bind *:8443 ssl crt foobar.pem crt next.pem will use "foobar.pem.ecdsa" and "foobar.pem.rsa" as default certificates. Note: there is still cleanup needed around default_ctx. This was discussed in github issue #2392.	2024-01-12 17:40:42 +01:00
Amaury Denoyelle	c121fcef30	BUILD: quic: missing include for quic_tp Add missing netinet/in.h required for in_addr/in6_addr types. This should be backported up to 2.9.	2024-01-12 16:08:36 +01:00
Willy Tarreau	3c135569c5	MINOR: http: add infrastructure to choose status codes for err / fail At the moment, http_err_cnt and http_fail_cnt are incremented on a well-defined set of status codes, which are checked at various places. Over time, there have been some complaints about 404, 401 or 407 triggering errors, or 500 triggering failures in SOAP environments for example. With a small bit field that fits in a cache line we can match the presence of a status code from 100 to 599, so that remains cheap. This patch adds two such bit fields, one per code class, and the accompanying functions to set/clear/test the codes. The arrays are preset at boot time. For now they are not used and it's not possible to adjust them.	2024-01-11 15:10:08 +01:00
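The size claim above checks out: codes 100 to 599 are 500 values, which round up to 512 bits, or 64 bytes. That is one cache line on most current CPUs. The C sketch below shows such a bit field with set/clear/test helpers. The names (struct sk_codes, sk_code_*) are hypothetical and are not HAProxy's functions. The code set used in the example is arbitrary and is not HAProxy's default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per status code from 100 to 599: 512 bits = 64 bytes. */
struct sk_codes {
	uint64_t bits[8];
};

static bool sk_code_valid(unsigned int code)
{
	return code >= 100 && code <= 599;
}

static void sk_code_set(struct sk_codes *s, unsigned int code)
{
	if (sk_code_valid(code))
		s->bits[(code - 100) / 64] |= UINT64_C(1) << ((code - 100) % 64);
}

static void sk_code_clr(struct sk_codes *s, unsigned int code)
{
	if (sk_code_valid(code))
		s->bits[(code - 100) / 64] &= ~(UINT64_C(1) << ((code - 100) % 64));
}

static bool sk_code_test(const struct sk_codes *s, unsigned int code)
{
	if (!sk_code_valid(code))
		return false;          /* out-of-range codes never match */
	return (s->bits[(code - 100) / 64] >> ((code - 100) % 64)) & 1;
}

/* Marks every code in [from, to], e.g. a whole class such as 5xx. */
static void sk_code_set_range(struct sk_codes *s, unsigned int from, unsigned int to)
{
	for (unsigned int c = from; c <= to; c++)
		sk_code_set(s, c);
}
```

A test then costs one shift and one mask. For instance, all 5xx codes can be marked with sk_code_set_range(), and a single code can then be excluded with sk_code_clr().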
Frédéric Lécaille	37d5a26cc5	CLEANUP: quic: Double quic_dgram_parse() prototype declaration. This function is defined in the RX part (quic_rx.c) and declared in quic_rx.h header. This is its correct place. Remove the useless declaration of this function in quic_conn.h. Should be backported in 2.9 where this double declaration was introduced when moving quic_dgram_parse() from quic_conn.c to quic_rx.c.	2024-01-10 17:22:24 +01:00
Willy Tarreau	5c0128d942	IMPORT: ebtree: make string_equal_bits() return an unsigned It used to return ssize_t for -1 but in fact we're using this -1 as the largest possible value and the result is generally cast to signed to check if the end was reached, so better make it clearly return an unsigned value here. This is cbtree commit e1e58a2b2ced2560d4544abaefde595273089704. This is ebtree commit d7531a7475f8ba8e592342ef1240df3330d0ab47.	2024-01-06 13:35:42 +01:00
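The function whose return type changes here counts how many leading bits two strings share. The C sketch below follows the logic described above. It returns the largest unsigned value when the strings are identical, and otherwise uses the position of the highest differing bit. It is an illustration, not the ebtree source. sk_string_equal_bits() and sk_flsnz8() are hypothetical names, and the bit-by-bit loop in sk_flsnz8() stands in for the optimized fls helpers.

```c
#include <assert.h>
#include <limits.h>

/* 1-based position of the highest set bit of a non-zero byte. */
static unsigned int sk_flsnz8(unsigned char x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Number of identical leading bits of NUL-terminated strings <a> and <b>,
 * starting at the byte containing bit <ignore> (earlier bits are known to
 * be equal). Bit 0 is the most significant bit of the first byte. Returns
 * UINT_MAX, the largest possible value, if both strings are identical. */
static unsigned int sk_string_equal_bits(const unsigned char *a,
                                         const unsigned char *b,
                                         unsigned int ignore)
{
	unsigned int beg = ignore >> 3;
	unsigned char c;

	while (1) {
		unsigned char d;

		c = a[beg];
		d = b[beg];
		beg++;
		c ^= d;
		if (c)
			break;         /* first differing byte found */
		if (!d)
			return UINT_MAX; /* both strings ended together */
	}
	/* <beg> now points past the differing byte; drop its differing bits */
	return (beg << 3) - sk_flsnz8(c);
}
```

For example, "a" (0x61) and "b" (0x62) differ in their two lowest bits, so 6 leading bits are identical. A caller that checks the result against the string length in bits naturally treats UINT_MAX as "reached the end", which is why an unsigned return type fits.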
Willy Tarreau	b7068b3152	IMPORT: ebtree: use unsigned ints for flznz() There's no reason to return signed values there. And it turns out that the compiler manages to improve the performance by ~2%. This is cbtree commit ab3fd53b8d6bbe15c196dfb4f47d552c3441d602. This is ebtree commit 0ebb1d7411d947de55fa5913d3ab17d089ea865c.	2024-01-06 13:35:42 +01:00
Willy Tarreau	2a14f99dbb	IMPORT: ebtree: make string_equal_bits turn back to unsigned char With flsnz() instead of flsnz_long() we're now getting a better performance on both x86 and ARM. The difference is that previously we were relying on a function that was forcing the use of register %eax for the 8-bit version and that was preventing the compiler from keeping the code optimized. The gain is roughly 5% on ARM and 1% on x86. This is cbtree commit 19cf39b2514bea79fed94d85e421e293be097a0e. This is ebtree commit a9aaf2d94e2c92fa37aa3152c2ad8220a9533ead.	2024-01-06 13:35:42 +01:00
Willy Tarreau	1c46a07460	IMPORT: ebtree: rework the fls macros to better deal with arch-specific ones The definitions were a bit of a mess and there wasn't even a fall back to __builtin_clz() on compilers supporting it. Now we instead define a macro for each implementation that is set on an arch-dependent case by case, and add the fall back ones only when not defined. This also allows the flsnz8() to automatically fall back to the 32-bit arch-specific version if available. This shows a consistent 33% speedup on arm for strings. This is cbtree commit c6075742e8d0a6924e7183d44bd93dec20ca8049. This is ebtree commit f452d0f83eca72f6c3484ccb138d341ed6fd27ed.	2024-01-06 13:35:42 +01:00
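The "define a fast version if available, add the fallback only when not defined" pattern described above can be sketched with the preprocessor as follows. This is not ebtree's header. The macro and function names are hypothetical. The builtin branch assumes a GCC/Clang-compatible compiler with a 32-bit `unsigned int`. An arch-specific inline-asm definition, not shown here, would be placed before the builtin one so that it takes precedence.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: 1-based index of the highest set bit, x != 0. */
static inline unsigned int sk_flsnz32_generic(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Compiler builtin when available (assumes 32-bit unsigned int). */
#if defined(__GNUC__) || defined(__clang__)
#define sk_flsnz32(x) (32u - (unsigned int)__builtin_clz((unsigned int)(x)))
#endif

/* Fallback only when no faster definition was provided above. */
#ifndef sk_flsnz32
#define sk_flsnz32(x) sk_flsnz32_generic(x)
#endif

/* The 8-bit variant reuses whichever 32-bit version was selected. */
#ifndef sk_flsnz8
#define sk_flsnz8(x) sk_flsnz32((uint8_t)(x))
#endif
```

Because each name is only defined when it is still undefined, adding a new architecture means adding one `#define` above the fallbacks. The 8-bit variant picks up that definition automatically.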
Willy Tarreau	fc421e5b3d	IMPORT: ebtree: switch the sizes and offsets to size_t and ssize_t Let's use these in order to avoid 32-64 bit casts on 64 bit platforms. This is cbtree commit e4f4c10fcb5719b626a1ed4f8e4e94d175468c34. This is ebtree commit cc10507385c784d9a9e74ea9595493317d3da99e.	2024-01-06 13:35:13 +01:00
Willy Tarreau	9afe3b59a7	IMPORT: ebtree: implement and use flsnz_long() to count bits The asm code shows multiple conversions. Gcc has always been terribly bad at dealing with chars, which are constantly converted to ints for every operation and zero-extended after each operation. But here in addition there are conversions before and after the flsnz(). Let's just mark the variables as long and use flsnz_long() to process them without any conversion. This shortens the code and makes it slightly faster. Note that the fls operations could make use of __builtin_clz() on gcc 4.6 and above, and it would be useful to implement native support for ARM as well. This is cbtree commit 1f0f83ba26f2279c8bba0080a2e09a803dddde47. This is ebtree commit 9c38dcae22a84f0b0d9c5a56facce1ca2ad0aaef.	2024-01-06 13:35:13 +01:00
Christopher Faulet	7cc4151422	BUG/MEDIUM: stconn: Set fsb date if zero-copy forwarding is blocked during nego During the zero-copy forwarding, if the consumer side reports it is blocked, it means it is blocked on send. At the stream-connector level, the event must be reported to be sure to set/update the fsb date. Otherwise, write timeouts cannot be properly reported. If this happens when no other timeout is armed, this freezes the stream. This patch must be backported to 2.9.	2024-01-05 17:28:06 +01:00
Frédéric Lécaille	fd178ccdb0	BUILD: quic: Missing quic_ssl.h header protection Such "#ifdef USE_QUIC" preprocessor statements are used by QUIC C headers to avoid inclusion of QUIC headers when QUIC support is not enabled (by the USE_QUIC make variable). Furthermore, this allows inclusion of QUIC headers from C files without having to protect them with other "#ifdef USE_QUIC" statements as follows: #ifdef USE_QUIC #include <a QUIC header> #include <another QUIC header> #endif /* USE_QUIC */ So here, if this quic_ssl.h header was included by a C file compiled without QUIC support, this would lead to build errors as follows: In file included from <a C file...>: include/haproxy/quic_ssl.h:35:35: warning: ‘enum ssl_encryption_level_t’ declared inside parameter list will not be visible outside of this definition or declaration Should be backported to 2.9 to avoid such build issues in the future.	2024-01-04 13:56:44 +01:00
Frédéric Lécaille	860028db47	CLEANUP: quic: Remaining useless code into server part Remove some QUIC members from the server structure, as the haproxy QUIC stack does not support the server part (QUIC client) at all at this time. Remove the statements related to their initialization. This patch should be backported as far as 2.6 to save memory.	2024-01-04 11:16:06 +01:00
Willy Tarreau	afba58f21e	MINOR: global: export a way to list build options The new function hap_get_next_build_opt() will iterate over the list of build options. This will be used for debugging, so that the build options can be retrieved from the CLI.	2024-01-02 11:44:42 +01:00
Dragan Dosen	96c1a61136	MEDIUM: udp: allow to retrieve the frontend destination address A new flag RX_F_PASS_PKTINFO is now available, whose purpose is to mark that the destination address is about to be retrieved on some listeners. The address can be retrieved from the first received datagram, and relies on the IP_PKTINFO, IP_RECVDSTADDR and IPV6_RECVPKTINFO support.	2024-01-02 11:44:42 +01:00
Dragan Dosen	1582ccf9d3	MINOR: tcpcheck: export proxy_parse_tcpcheck() Export proxy_parse_tcpcheck() in tcpcheck.h	2024-01-02 11:44:42 +01:00
Dragan Dosen	5b1609f9da	MINOR: backend: export get_server_*() functions This is in preparation for exposing more of the LB internals.	2024-01-02 11:44:42 +01:00
Aurelien DARRAGON	689784ed91	CLEANUP: resolvers: remove some more unused RSLV_UDP flags RSLV_UPD_CNAME and RSLV_UPD_NAME_ERROR flags have now become useless since `3cf7f987` ("MINOR: dns: proper domain name validation when receiving DNS response") as they are never set, but we forgot to remove them.	2024-01-02 10:29:41 +01:00
Aurelien DARRAGON	299501845d	CLEANUP: resolvers: remove unused RSLV_UPD_OBSOLETE_IP flag RSLV_UPD_OBSOLETE_IP was introduced with commit `a8c6db8d2` ("MINOR: dns: Cache previous DNS answers.") but the commit didn't make any use of it, and today the flag is still unused. Since we have no valid use for it, better remove it to prevent confusion.	2024-01-02 10:29:33 +01:00
Ilya Shipitsin	8705e45964	CLEANUP: assorted typo fixes in the code and comments This is 38th iteration of typo fixes	2024-01-02 10:19:48 +01:00
Frédéric Lécaille	10e96fcd17	BUG/MINOR: quic: Missing call to TLS message callbacks This bug impacts only the QUIC OpenSSL compatibility module (USE_QUIC_OPENSSL_COMPAT). The TLS capture of client hello information, enabled by tune.ssl.capture-buffer-size, could not work with USE_QUIC_OPENSSL_COMPAT. This is because the callback set for this feature was replaced by quic_tls_compat_msg_callback(). In fact, this callback must be registered with ssl_sock_register_msg_callback(), as is done for the TLS client hello capture. A call to this function appends the function passed as a parameter to a list of callbacks to be called when the TLS stack parses a TLS message. quic_tls_compat_msg_callback() had to be modified to return early if it is called for a non-QUIC TLS session. Must be backported to 2.8.	2023-12-21 16:33:06 +01:00
Amaury Denoyelle	235e8f1afd	MEDIUM: mux-quic: add BUG_ON if sending on locally closed QCS Previously, if a snd_buf operation was conducted although the QCS was already locally closed, the input buffer was silently dropped. This situation could happen if a RESET_STREAM was emitted but its emission was not reported to the stream layer. Silently resetting the buffer ensured that QUIC MUX remained compliant with RFC 9000, which forbids emission after RESET_STREAM. Since the previous commit, it is now ensured that RESET_STREAM sending will always be reported to the stream layer. Thus, there is no need anymore to silently reset the buffer. A BUG_ON() statement is added to ensure this assumption remains valid. The new code is deemed cleaner as it does not hide a missing error notification on the stconn layer. Previously, if an error was missing, sending would continue unnecessarily with a false success status reported for the stream. Note that the BUG_ON() statement was also added to the nego_ff callback. This is necessary to ensure both sending paths remain consistent. This patch is labelled as MEDIUM as issues were already encountered in the snd_buf/nego_ff implementation and it's not easy to cover all occurrences during tests. If the BUG_ON() is triggered without any apparent stream-layer issue, this commit should be reverted.	2023-12-21 15:42:08 +01:00
Aurelien DARRAGON	f6ae25858d	MINOR: peers: rely on srv->addr and remove peer->addr Similarly to the previous commit, we get rid of an unused peer member. peer->addr was only used to save a copy of the server's addr at parsing time. But instead of relying on an intermediate variable, we can actually use the server's address directly when initiating the peer session. As with other streams created from server settings (tcp/http, log, ring), we should rely on srv->svc_port for the port part of the address. This shouldn't change anything for peers since the address is fully resolved at parsing time and runtime changes are not supported, but it should help make the code future-proof.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	372d3e2934	CLEANUP: peers: remove unused "proto" and "xprt" struct members peer->proto and peer->xprt struct members are now pure legacy: they are only set during parsing but never used afterwards. This is due to commit `02efedac` ("MINOR: peers: now remove the remote connection setup code") which made some cleanup in the past, but the unused proto and xprt members were probably left unused by mistake. Since we don't have valid uses for them, we remove them. Also, peer_xprt() helper function was removed since it was related to peer->xprt struct member.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	334caefaaa	CLEANUP: peers: remove unused sock_init_arg struct member Since `be0688c6` ("MEDIUM: stream_interface: remove the si->init"), sock_init_arg is completely useless (set but never used later), thus we remove it.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	7293eb68ff	MEDIUM: peers: use server as stream target Historically, we used the internal peer proxy as stream target, because back then we only cared about initiating a basic tcp connection with the endpoint, and relying on parent proxy settings was enough. But later, we introduced the possibility to connect to an SSL peer by taking the server's SSL parameters into account. This was done in commit `1055e687` ("MINOR: peers: Make outgoing connection to SSL/TLS peers work.") However, the above commit introduced an ambiguity: the peer_session_target() function was introduced, and it returns either the peers proxy's object or the current server's object depending on whether ssl is configured. While this works fine to ensure proper SSL handling while being conservative with historical behavior, it causes other server transport-related settings to only work when ssl settings are provided, which is quite debatable. Indeed, while we're there, why not always use the server's object as the stream target, to ensure all transport-related options are properly handled? Moreover, the peers documentation says this: ... "support for all "server" parameters found in 5.2 paragraph that are related to transport settings" ... To remove the ambiguity and fully comply with the documentation, we make peer_session_target() always return the server's object.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	334ebfa1a2	MEDIUM: server/dns: clear RMAINT when addr resolves again snr_update_srv_status() and srvrq_update_srv_status() both set or clear the server RMAINT state depending on the result of the current dns resolution. This used to work pretty well in the past, but now that addr:svc_port changes are applied atomically through a dedicated task, the change is performed asynchronously. This can cause some flapping issues if the server is put out of maintenance while the server's address is still unassigned. To prevent errors, the resolver's code is now only allowed to put the server under maintenance, not to remove it from maintenance. The decision to remove a server from maintenance is made by the task responsible for updating the server's addr: if the addr resolves again thanks to a valid DNS resolution and the server was previously under RMAINT, then it is cleared from RMAINT state. srvrq_update_srv_status() was renamed srvrq_set_srv_down(), since it is only called to put the server in maintenance as a result of a failing SRV entry. snr_update_srv_status() was renamed srv_set_srv_down() and slightly modified so that it only takes care of putting the server under maintenance when needed. The cli command "set server x/y addr" does not need to remove the RMAINT flag anymore.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	72e2c8db3e	MINOR: server: add dns hint in server_inetaddr_updater struct This will allow event consumers to know whether the update was triggered by the dns/resolver code, by checking the ->dns boolean.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	33cd676e9e	MINOR: server/event_hdl: expose updater info through INETADDR event Thanks to the previous commit, we can now expose updater info through INETADDR event.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	3ac79b504a	MEDIUM: server: make server_set_inetaddr() updater serializable The server_set_inetaddr() updater argument is a simple char * string containing information about the caller responsible for the update. In this patch, we try to make this argument serializable, that is, make it so that we can easily export it without having to keep the original pointer passed by the caller or having to work with strings of variable lengths. This was a prerequisite for exposing more updater information through the SERVER_INETADDR event (upcoming patch). Static strings were simply mapped to a fixed ID that can be converted back to a string when needed using server_inetaddr_updater_by_to_str(). One special case was made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater, since in this case the updater hint has to be generated from the corresponding resolver id / nameserver id combination. This was achieved by saving the nameserver id within the updater struct. Since the resolver id can be derived directly from the server struct, it was not exposed through the updater struct. This patch depends on: - "MINOR: resolvers: add unique numeric id to nameservers" No functional change should be expected.	2023-12-21 14:22:27 +01:00
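The idea of replacing a caller string with a fixed numeric ID and rebuilding the text on demand is sketched below in C. This is a hypothetical illustration, not HAProxy's server_inetaddr_updater definitions. The enum values, the struct layout, the strings and sk_updater_to_str() are all invented for the example. The DNS resolver case rebuilds its hint from a resolver id, which the caller supplies because it comes from the server, and the saved nameserver id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical updater IDs replacing static caller strings. */
enum sk_updater_by {
	SK_UPD_BY_NONE = 0,
	SK_UPD_BY_CLI,
	SK_UPD_BY_HEALTH,
	SK_UPD_BY_DNS_RESOLVER,
	SK_UPD_BY_COUNT
};

/* Fixed-size and pointer-free, so it can be copied into an event as-is. */
struct sk_updater {
	enum sk_updater_by by;
	int dns;                        /* hint: update came from DNS code */
	unsigned int ns_puid;           /* only meaningful for DNS_RESOLVER */
};

static const char *sk_updater_by_to_str(enum sk_updater_by by)
{
	static const char *const names[SK_UPD_BY_COUNT] = {
		[SK_UPD_BY_NONE]         = "unknown",
		[SK_UPD_BY_CLI]          = "cli",
		[SK_UPD_BY_HEALTH]       = "health check",
		[SK_UPD_BY_DNS_RESOLVER] = "DNS resolver",
	};

	return ((unsigned int)by < SK_UPD_BY_COUNT) ? names[by] : "unknown";
}

/* Builds the human-readable hint into <out>. <resolver_id> is only used
 * for the DNS resolver case. Returns the snprintf() result. */
static int sk_updater_to_str(const struct sk_updater *u, const char *resolver_id,
                             char *out, size_t sz)
{
	if (u->by == SK_UPD_BY_DNS_RESOLVER)
		return snprintf(out, sz, "%s %s/ns#%u",
		                sk_updater_by_to_str(u->by), resolver_id, u->ns_puid);
	return snprintf(out, sz, "%s", sk_updater_by_to_str(u->by));
}
```

Because the struct has no pointers, an event consumer can keep a copy after the caller's strings are gone and still render a readable hint later.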
Aurelien DARRAGON	2f6120d6d4	MINOR: resolvers: add unique numeric id to nameservers When we want to avoid keeping pointers on a nameserver struct, it's not always convenient to refer as a nameserver using it's text-based unique identifier since it's not limited in length thus it cannot be serialized and deserialized safely. To address this limitation, we add a new ->puid member in dns_nameserver struct which is a parent-unique numeric value that can be used to refer to the dns nameserver within its parent resolver context. To achieve this, we reused the resolver->nb_nameserver member that wasn't used. Each time we add a new nameserver to a resolver: we set ns->puid to the current number of nameservers within the resolver and we increment this number right away. Public helper function find_nameserver_by_resolvers_and_id() was added to help retrieve nameserver pointer from (resolver X nameserver puid) combination.	2023-12-21 14:22:27 +01:00
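The puid assignment described above ("set ns->puid to the current number of nameservers, then increment") can be sketched as follows. The structs and the `find_ns_by_puid()` helper are simplified, hypothetical stand-ins for `dns_nameserver`, the resolver, and `find_nameserver_by_resolvers_and_id()`; a plain linked list replaces HAProxy's list macros.

```c
#include <stddef.h>

/* Simplified, hypothetical nameserver: the text id is unbounded in length,
 * the puid is a small fixed-size number that can be serialized. */
struct nameserver {
    const char *id;
    unsigned int puid;          /* parent-unique id within its resolver */
    struct nameserver *next;
};

struct resolver {
    unsigned int nb_nameservers;
    struct nameserver *nameservers;
};

/* Assign the current count as puid, then increment it right away. */
static void resolver_add_nameserver(struct resolver *r, struct nameserver *ns)
{
    ns->puid = r->nb_nameservers++;
    ns->next = r->nameservers;
    r->nameservers = ns;
}

/* Retrieve a nameserver from the (resolver, puid) combination. */
static struct nameserver *find_ns_by_puid(struct resolver *r, unsigned int puid)
{
    for (struct nameserver *ns = r->nameservers; ns; ns = ns->next)
        if (ns->puid == puid)
            return ns;
    return NULL;
}
```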
Aurelien DARRAGON	4fe0cca305	CLEANUP: resolvers: remove duplicate func prototype dns_dgram_init() function prototype was found in both resolvers and dns header files, but it should belong to the dns header file, so the duplicate entry was simply removed.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	ab6fef4882	CLEANUP: server: remove unused server_parse_addr_change_request() function server_parse_addr_change_request() was completely replaced by the newer srv_update_addr_port() function. Considering the function doesn't offer useful features that srv_update_addr_port() couldn't do, we simply remove the function.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	f1f4b93a67	MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic Both functions perform similar tasks, except that the _port() version does a bit more work. In this patch, we add the server_set_inetaddr() function that works like the srv_update_addr_port() but it takes parsed inputs instead of raw strings as arguments. Then, server_set_inetaddr() is used as underlying helper function for both srv_update_addr() and srv_update_addr_port() to make them easier to maintain. Also, helper functions were added: - server_set_inetaddr_warn() -> same as server_set_inetaddr() but report a warning on updates. - server_get_inetaddr() -> fills a struct server_inetaddr from srv Since the feedback message generation part was slightly reworked, some minor changes in the way addr:svc_port updates are reported in the logs or cli messages should be expected (no loss of information though).	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	2d0c7f5935	CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event Now that the purge_conn hint is ignored thanks to the previous commit, we can simply get rid of it.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	545e72546c	BUG/MINOR: server/event_hdl: propagate map port info through inetaddr event server addr:svc_port updates during runtime might set or clear the SRV_F_MAPPORTS flag. Unfortunately, the flag update is still directly performed by srv_update_addr_port() function while the addr:svc_port update is being scheduled for atomic update. Given that existing readers don't take server's lock to read addr:svc_port, they also check the SRV_F_MAPPORTS flag right after without the lock. So we could cause the readers to incorrectly interpret the svc_port from the server struct because the mapport information is not published atomically, resulting in inconsistencies between svc_port / mapport flag. (MAPPORTS flag causes svc_port to be used differently by the reader) To fix this, we publish the mapport information within the INETADDR server event and we let the task responsible for updating the server's addr and port set or clear the flag depending on the mapport hint. This patch depends on: - MINOR: server/event_hdl: add server_inetaddr struct to facilitate event data usage - MINOR: server/event_hdl: update _srv_event_hdl_prepare_inetaddr prototype This should be backported to 2.9 with `683b2ae01` ("MINOR: server/event_hdl: add SERVER_INETADDR event")	2023-12-21 14:22:26 +01:00
Aurelien DARRAGON	14893a6a00	MINOR: server/event_hdl: add server_inetaddr struct to facilitate event data usage event_hdl_cb_data_server_inetaddr struct had some anonymous structs defined in it, making it impossible to pass as a function argument and harder to maintain since changes must be performed at multiple places at once. So instead we define a storage struct named server_inetaddr that helps to save addr:port server information in INET context.	2023-12-21 14:22:26 +01:00
Aurelien DARRAGON	835263047e	OPTIM: server: ebtree lookups for findserver_unique_* functions `4e5e2664` ("MINOR: proxy: add findserver_unique_id() and findserver_unique_name()") added findserver_unique_id() and findserver_unique_name() functions that were inspired by the historical findserver() function, so unfortunately they don't perform well when used on large backend farms because they scan the whole server list linearly. I was about to provide a patch to optimize such functions when I stumbled on Baptiste's work: `19a106d24` ("MINOR: server: server_find functions: id, name, best_match") It turns out Baptiste already implemented helper functions to supersede the unoptimized findserver() function (at least at runtime when servers have been assigned their final IDs and inserted in the lookup trees): they offer more matching options and rely on eb lookups so they are much more suitable for fast queries. I don't know how I missed that, but they are a perfect base for the server rid matching functions. So in this patch, we essentially revert `4e5e2664` to provide the optimized equivalent functions named server_find_by_id_unique() and server_find_by_name_unique(), then we force existing findserver_unique_*() callers to switch to the new functions. This patch depends on: - "OPTIM: server: eb lookup for server_find_by_name()" This could be backported up to 2.8.	2023-12-21 14:22:26 +01:00
Aurelien DARRAGON	8a6cc6e3ea	MEDIUM: proxy: set PR_O_HTTP_UPG on implicit upgrades When a TCP frontend uses an HTTP backend, the stream is automatically upgraded and it results in a similar behavior as if a switch-mode http rule was evaluated since stream_set_http_mode() gets called in both situations and minimal HTTP analyzers are set. In the current implementation, some postparsing checks are generating errors or warnings when the frontend is in TCP mode with some HTTP options set and no upgrade is expected (no switch-rule http). But as you can guess, unfortunately this leads to issues when such "HTTP" only options are used in a frontend that has implicit switching rules (that is, when the frontend uses an HTTP backend for example), because in this case the PR_O_HTTP_UPG will not be set, so the postparsing checks will consider that some options are not relevant and will raise some warnings. Consider the following example: backend back mode http server s1 git.haproxy.org:80 frontend front mode tcp bind localhost:8080 http-request set-var(txn.test) str(TRUE),debug(WORKING,stderr) use_backend back By starting an haproxy instance with the above example conf, we end up having this warning: [WARNING] (400280) : config : 'http-request' rules ignored for frontend 'front' as they require HTTP mode. However, by making a request on the frontend, we notice that the request rules are still executed, and that's because the stream is effectively upgraded as a result of an implicit upgrade: [debug] WORKING: type=str <TRUE> So this confirms the previous description: since implicit and explicit upgrades result in approximately the same behavior on the frontend side, we should consider them both when doing postparsing checks. This is what this commit addresses: PR_O_HTTP_UPG flag is now more generic in the sense that it refers to either implicit (through default_backend or use_backend rules) or explicit (switch-mode rules) upgrades. Indeed, every time an HTTP or dynamic backend (where the mode cannot be assumed during parsing) is encountered in default_backend directive or use_backend rules, we explicitly set the upgrade flag so that further checks that depend on the proxy being in HTTP context don't report false warnings.	2023-12-21 14:22:26 +01:00
Aurelien DARRAGON	ef9d692544	MINOR: stats: store the parent proxy in stats ctx (http) Some HTTP related stats functions need to know the parent proxy, mainly to get a pointer on the related uri_auth set by the proxy or to check scope settings. The current design (probably historical as only the http context existed by then) took the other approach: it propagates the uri pointer from the http context deep down the calling stack up to the relevant functions. For non-http contexts (cli), the pointer is set to NULL. Doing so is not very pretty and not easy to maintain. Moreover, there were still some places in the code where the uri pointer was learned directly from the stream proxy because it was not available as an argument in those functions. This is error-prone, because if one day we decide to change the source proxy in the parent function, we might still have some functions down the stack that ignore the top most argument and still do their own lookup, and we'll probably end up with inconsistencies. So in this patch, we take a safer approach: the caller responsible for creating the stats applet should set the http_px pointer so that any stats function running under the applet that needs to know if it's running in http context or needs to access parent proxy info may do so thanks to the dedicated ctx->http_px pointer.	2023-12-21 14:20:03 +01:00
Christopher Faulet	123a9e7d83	BUG/MAJOR: stconn: Disable zero-copy forwarding if consumer is shut or in error A regression was introduced by commit `2421c6fa7d` ("BUG/MEDIUM: stconn: Block zero-copy forwarding if EOS/ERROR on consumer side"). When zero-copy forwarding is in use and the consumer side is shut or in error, we declare it as blocked and it is woken up. The idea is to handle this state at the stream-connector level. However this definitely blocks receives on the producer side. So if the mux is unable to close by itself, but instead waits for the peer to shut, this can lead to a wake up loop. And indeed, with the passthrough multiplexer this may happen. To fix the issue and prevent any loop, instead of blocking the zero-copy forwarding, we now disable it. This way, the stream-connector on producer side will fall back on classical receives and will be able to handle peer shutdown properly. In addition, the wakeup of the consumer side was removed. This will be handled, if necessary, by sc_notify(). This patch should fix the issue #2395. It must be backported to 2.9.	2023-12-21 11:00:57 +01:00
Christopher Faulet	2421c6fa7d	BUG/MEDIUM: stconn: Block zero-copy forwarding if EOS/ERROR on consumer side When the producer side (h1 for now) negotiates with the consumer side to perform a zero-copy forwarding, we now consider the consumer side as blocked if it is closed and this was reported to the SE via an end-of-stream or a (pending) error. It is performed before calling ->nego_ff callback function, in se_nego_ff(). This way, all consumers are concerned automatically. The aim of this patch is to fix an issue with the QUIC mux. Indeed, it is unexpected to send a frame on a closed stream. This triggers a BUG_ON(). Other muxes are not affected but it remains useless to try to send data if the stream is closed. This patch should fix the issue #2372. It must be backported to 2.9.	2023-12-13 16:45:29 +01:00
Amaury Denoyelle	e772d3f40f	CLEANUP: mux-quic: clean up app ops callback definitions qcc_app_ops is a set of callbacks used to unify application protocol running over QUIC. This commit introduces some changes to clarify its API : * write simple comment to reflect each callback purpose * rename decode_qcs to rcv_buf as this name is more common and is similar to already existing snd_buf * finalize is moved up as it is used during connection init stage All these changes are ported to HTTP/3 layer. Also function comments have been extended to highlight HTTP/3 special characteristics.	2023-12-11 16:15:13 +01:00
Amaury Denoyelle	f496c7469b	MINOR: mux-quic: clean up qcs Tx buffer allocation API This function is similar to the previous one, but this time for QCS sending buffer. Previously, each application layer redefined its own version of mux_get_buf() which was used to allocate <qcs.tx.buf>. Unify it under a single function renamed qcc_get_stream_txbuf().	2023-12-11 16:08:51 +01:00
Amaury Denoyelle	b526ffbfb9	MINOR: mux-quic: clean up qcs Rx buffer allocation API Replaces qcs_get_buf() function whose name does not reflect its purpose. Add a new function qcc_get_stream_rxbuf() which allocates <qcs.rx.app_buf> if needed and returns the buffer pointer. This function is reserved for application protocol layer. This buffer is then accessed by stconn layer. For other qcs_get_buf() invocations which were in effect used for a local buffer, replace these by a plain b_alloc().	2023-12-11 16:02:30 +01:00
Amaury Denoyelle	14d968f2f2	CLEANUP: mux-quic: remove unused prototype Remove qcc_emit_cc_app() prototype from header file. This function was removed by a previous commit and does not exist anymore.	2023-12-11 15:12:57 +01:00
William Lallemand	1c1bb8ef2a	BUG/MINOR: mworker/cli: fix set severity-output support "set severity-output" is one of these commands that changes the appctx state so the next commands are affected. Unfortunately the master CLI works with pipelining and server close mode, which means the connection between the master and the worker is closed after each response, so for the next command this is a new appctx state. To fix the problem, 2 new flags are added ACCESS_MCLI_SEVERITY_STR and ACCESS_MCLI_SEVERITY_NB which are used to prefix each command sent to the worker with the right "set severity-output" command. This patch fixes issue #2350. It could be backported as far as 2.6.	2023-12-07 17:37:23 +01:00
Christopher Faulet	67c03508d6	MEDIUM: pattern: Add support for virtual and optional files for patterns Before this patch, it was not possible to use a list of patterns, map or a list of acls, without an existing file. However, it could be handy to just use an ID, with no file on the disk. It is pretty useful for anyone dynamically managing these lists. It could also be handy to try to load a list from a file if it exists without failing if not. This way, it could be possible to make a cold start without any file (instead of empty file), dynamically add and del patterns, dump the list to the file periodically to reuse it on reload (via an external process). In this patch, we use some prefixes to be able to use virtual or optional files. The default case remains unchanged: regular files are used. A filename, with no prefix, is used as reference, and it must exist on the disk. With the prefix "file@", the same is performed. Internally this prefix is skipped. Thus the same file, with or without "file@" prefix, references the same list of patterns. To use a virtual map, "virt@" prefix must be used. No file is read, even if the following name looks like a file. It is just an ID. The prefix is part of ID and must always be used. To use an optional file, ie a file that may or may not exist on a disk at startup, "opt@" prefix must be used. If the file exists, its content is loaded. But HAProxy doesn't complain if not. The prefix is not part of ID. For a given file, optional files and regular files reference the same list of patterns. This patch should fix the issue #2202.	2023-12-06 10:24:41 +01:00
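A sketch of the prefixing idea, assuming that each forwarded command is simply preceded by the matching "set severity-output" command and a `;` separator. Only the two flag names come from the commit; their bit values, the helper `mcli_build_cmd()` and the exact prefix format are assumptions, not the actual master CLI code.

```c
#include <stdio.h>
#include <string.h>

/* Flag names from the commit; the bit values are assumptions. */
#define ACCESS_MCLI_SEVERITY_NB   0x01
#define ACCESS_MCLI_SEVERITY_STR  0x02

/* Hypothetical helper: build the command forwarded to a worker. Since the
 * master/worker connection is closed after each response, the severity
 * setting is re-sent in front of every command. */
static int mcli_build_cmd(unsigned int flags, const char *cmd,
                          char *out, size_t len)
{
    const char *prefix = "";

    if (flags & ACCESS_MCLI_SEVERITY_STR)
        prefix = "set severity-output string;";
    else if (flags & ACCESS_MCLI_SEVERITY_NB)
        prefix = "set severity-output number;";
    return snprintf(out, len, "%s%s", prefix, cmd);
}
```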
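The prefix rules above (plain name or "file@" name: same list, prefix skipped; "virt@": prefix kept as part of the ID; "opt@": prefix skipped, same list as the regular file) can be summarized with this hypothetical classifier. It is a sketch of the described semantics, not HAProxy's pattern.c code.

```c
#include <string.h>

enum pat_src { PAT_SRC_FILE, PAT_SRC_VIRT, PAT_SRC_OPT };

/* Hypothetical helper: classify a pattern reference name and return the
 * ID under which the list is looked up. */
static const char *pat_ref_classify(const char *name, enum pat_src *src)
{
    if (strncmp(name, "virt@", 5) == 0) {
        *src = PAT_SRC_VIRT;
        return name;            /* prefix is part of the ID, no file is read */
    }
    if (strncmp(name, "opt@", 4) == 0) {
        *src = PAT_SRC_OPT;
        return name + 4;        /* prefix skipped: file may be missing */
    }
    *src = PAT_SRC_FILE;
    if (strncmp(name, "file@", 5) == 0)
        return name + 5;        /* "file@x" and "x" share the same list */
    return name;
}
```

With this mapping, "opt@/etc/a.map", "file@/etc/a.map" and "/etc/a.map" all resolve to the same ID, while "virt@a" never matches a file-based list.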
Christopher Faulet	533121a56e	MINOR: cache: Add global option to enable/disable zero-copy forwarding tune.cache.zero-copy-forwarding parameter can now be used to enable or disable the zero-copy fast-forwarding for the cache applet only. It is enabled ('on') by default. It can be disabled by setting the parameter to 'off'.	2023-12-06 10:24:41 +01:00
Christopher Faulet	a40321eb3b	MINOR: channel: Use dedicated functions to deal with STREAMER flags For now, CF_STREAMER and CF_STREAMER_FAST flags are set in sc_conn_recv() function. The logic is moved into dedicated functions. First, channel_check_idletimer() function is now responsible for checking the channel's last read date against the idle timer value to be sure the producer is still streaming data. Otherwise, it removes STREAMER flags. Then, channel_check_xfer() function is responsible for checking the amount of data transferred during a receive, to update STREAMER flags if needed. In sc_conn_recv(), we now use these functions.	2023-12-06 10:24:41 +01:00
Willy Tarreau	eb67d63456	[RELEASE] Released version 3.0-dev0 Released version 3.0-dev0 with the following main changes : - exact copy of 2.9.0	2023-12-05 16:19:35 +01:00
Christopher Faulet	7732323cf3	MINOR: global: Use a dedicated bitfield to customize zero-copy fast-forwarding The zero-copy fast-forwarding feature is quite new and a bit sensitive. There is an option to disable it globally. However, not all protocols have the same maturity. For instance, for the PT multiplexer, there is nothing really new. The zero-copy fast-forwarding is only another name for the kernel splicing. However, for the QUIC/H3, it is pretty new, not really optimized and it will evolve. And soon, the support will be added for the cache applet. In this context, it is useful to be able to enable/disable zero-copy fast-forwarding per-protocol and applet. And when it is applicable, on sends or receives separately. So, instead of having one flag to disable it globally, there is now a dedicated bitfield, global.tune.no_zero_copy_fwd.	2023-12-04 15:31:47 +01:00
Aurelien DARRAGON	c2cd6a419c	BUG/MINOR: server/event_hdl: properly handle AF_UNSPEC for INETADDR event It is possible that a server's addr family is temporarily set to AF_UNSPEC even if we're certain to be in INET context (ipv4, ipv6). Indeed, as soon as IP address resolving is involved, srv->addr family will be set to AF_UNSPEC when the resolution fails (could happen at any time). However, _srv_event_hdl_prepare_inetaddr() wrongly assumed that it would only be called with AF_INET or AF_INET6 families. Because of that, the function will handle AF_UNSPEC address as an IPV6 address: not only could we risk reading from an uninitialized area, but we would then propagate false information when publishing the event. In this patch we make sure to properly handle the AF_UNSPEC family in both the "prev" and the "next" part for SERVER_INETADDR event and that every member is explicitly initialized. This bug was introduced by 6fde37e046 ("MINOR: server/event_hdl: add SERVER_INETADDR event"), no backport needed.	2023-12-01 20:43:42 +01:00
Amaury Denoyelle	0ce213d246	MINOR: quic_tp: use in_addr/in6_addr for preferred_address preferred_address is a transport parameter specified by the server. It specifies both an IPv4 and an IPv6 address. These addresses were defined as plain arrays in <struct tp_preferred_address>. Convert these addresses to use the common types in_addr/in6_addr. With this change, dumping of preferred_address is extended. It now displays the addresses using inet_ntop() and the CID value.	2023-11-30 15:59:45 +01:00
Amaury Denoyelle	f31719edae	CLEANUP: quic_cid: remove unused listener arg retrieve_qc_conn_from_cid() requires listener as argument whereas it is unused. This is an artifact from the old architecture where CID trees were stored on listener instances instead of globally. Remove it to better reflect this change.	2023-11-30 15:04:27 +01:00
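A sketch of how a per-user bitfield can replace a single global switch. Only the field name `no_zero_copy_fwd` comes from the commit; every bit name and value below, and the helper `zero_copy_allowed()`, are hypothetical.

```c
/* Hypothetical bit assignments; only the field name is from the commit. */
#define NO_ZERO_COPY_FWD           0x0001  /* disable everywhere */
#define NO_ZERO_COPY_FWD_PT        0x0002  /* passthrough mux */
#define NO_ZERO_COPY_FWD_QUIC_SND  0x0004  /* QUIC/H3, sends only */
#define NO_ZERO_COPY_FWD_CACHE     0x0008  /* cache applet */

struct tune {
    unsigned int no_zero_copy_fwd;
};

/* Zero-copy is allowed for a given user unless it is disabled globally
 * or through the user's dedicated bit. */
static int zero_copy_allowed(const struct tune *t, unsigned int user_bit)
{
    return !(t->no_zero_copy_fwd & (NO_ZERO_COPY_FWD | user_bit));
}
```

Disabling it for the cache applet alone (for instance via tune.cache.zero-copy-forwarding, mentioned above) then only needs to set one bit, leaving the other users untouched.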
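The fix can be sketched as a prepare function that zeroes the destination first and only copies an address for AF_INET/AF_INET6, so AF_UNSPEC is never read as IPv6. The `struct inetaddr` layout and `prepare_inetaddr()` are simplified, hypothetical stand-ins for `server_inetaddr` and `_srv_event_hdl_prepare_inetaddr()`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified, hypothetical layout inspired by struct server_inetaddr. */
struct inetaddr {
    int family;                 /* AF_UNSPEC, AF_INET or AF_INET6 */
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } addr;
    unsigned int svc_port;
};

static void prepare_inetaddr(struct inetaddr *dst,
                             const struct sockaddr_storage *ss,
                             unsigned int svc_port)
{
    memset(dst, 0, sizeof(*dst));     /* every member explicitly initialized */
    dst->family = ss->ss_family;
    dst->svc_port = svc_port;
    switch (ss->ss_family) {
    case AF_INET:
        dst->addr.v4 = ((const struct sockaddr_in *)ss)->sin_addr;
        break;
    case AF_INET6:
        dst->addr.v6 = ((const struct sockaddr_in6 *)ss)->sin6_addr;
        break;
    default:
        /* AF_UNSPEC (e.g. failed resolution): keep the zeroed address
         * instead of interpreting the storage as IPv6. */
        dst->family = AF_UNSPEC;
        break;
    }
}
```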
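With in_addr/in6_addr fields, the addresses can be dumped directly with the standard `inet_ntop()`. The struct below is a simplified, hypothetical version of `struct tp_preferred_address` (the CID and stateless reset token are omitted), and `dump_preferred_address()` is an illustrative helper, not the QUIC transport-parameter dump code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified, hypothetical preferred_address (CID and token omitted). */
struct tp_preferred_address {
    struct in_addr ipv4_addr;
    uint16_t ipv4_port;
    struct in6_addr ipv6_addr;
    uint16_t ipv6_port;
};

/* Dump both addresses in text form; returns 0 on success, -1 on failure. */
static int dump_preferred_address(const struct tp_preferred_address *pa,
                                  char *v4, size_t v4len,
                                  char *v6, size_t v6len)
{
    if (!inet_ntop(AF_INET, &pa->ipv4_addr, v4, v4len))
        return -1;
    if (!inet_ntop(AF_INET6, &pa->ipv6_addr, v6, v6len))
        return -1;
    return 0;
}
```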
Christopher Faulet	0f15dcd9a7	MINOR: muxes: Add a callback function to send commands to mux streams Just like the ->ctl() callback function, used to send commands to mux connections, the ->sctl() callback function can now be used to send commands to mux streams. The first command, MUX_SCTL_SID, is a way to request the mux stream ID. It will be implemented later for each mux.	2023-11-29 11:11:12 +01:00
Christopher Faulet	d982a37e4c	MINOR: muxes: Rename mux_ctl_type values to use MUX_CTL_ prefix Instead of the generic MUX_, we now use MUX_CTL_ prefix for all mux_ctl_type values. This will avoid any ambiguities with other enums, especially with a new one that will be added to get information on mux streams.	2023-11-29 11:11:12 +01:00
Christopher Faulet	8f56552862	MINOR: stream: Expose session terminate state via a new sample fetch It is now possible to retrieve the session terminate state, using "txn.sess_term_state". The sample fetch returns the 2-character session termination state. Of course, the result of this sample fetch is volatile. It is subject to change. It is also most of the time useless because no termination state is set except at the end. It should only be useful in http-after-response rule sets. It may also be used to customize the logs using a log-format directive. This patch should fix the issue #2221.	2023-11-29 11:11:12 +01:00
Christopher Faulet	b2f82b2b51	MINOR: http-fetch: Add a sample to retrieve the server status code The code returned by the "status" sample fetch is the one in the HTTP response at the moment the sample is evaluated. It may be the status code in the server response or the one of the HAProxy reply in case of error, deny, redirect... However, it could be handy to retrieve the status code returned by the server, when an HTTP response was actually received from it. This is the purpose of the "server_status" sample fetch. The server status code itself is stored in the HTTP txn.	2023-11-29 11:11:12 +01:00
Aurelien DARRAGON	2f2cb6d082	MEDIUM: log/balance: support FQDN for UDP log servers In the previous log backend implementation, we created a pseudo log target for each declared log server, and we made the log target's address point to the actual server address to save some time and prevent unnecessary copies. But this was done without knowing that when FQDN is involved (more broadly when dns/resolution is involved), the "port" part of server addr should not be relied upon, and we should explicitly use ->svc_port for that purpose. With that in mind and thanks to the previous commit, some changes were required: we allocate a dedicated addr within the log target when target is in DGRAM mode. The addr is first initialized with known values and it is then updated automatically by _srv_set_inetaddr() during runtime. (the change is atomic so readers don't need to worry about it) addr from server "log target" (INET/DGRAM mode) is made of the combination of server's address (lacking the port part) and server's svc_port.	2023-11-29 08:59:27 +01:00
Aurelien DARRAGON	cb3ec978fd	MINOR: event_hdl: add global tunables The local variable "event_hdl_async_max_notif_at_once" which was introduced with the event_hdl API was left as is but with a TODO note telling that we should make it a global tunable. Well, we're doing this now. To prepare for upcoming tunables related to event_hdl API, we add a dedicated struct named event_hdl_tune which is globally exposed through the event_hdl header file so that it may be used from everywhere. The struct is automatically initialized in event_hdl_init() according to defaults.h. "event_hdl_async_max_notif_at_once" now becomes "event_hdl_tune.max_events_at_once" with its dedicated configuration keyword: "tune.events.max-events-at-once". We're also taking this opportunity to raise the default value from 10 to 100 since it seems quite reasonable given existing async event_hdl users. The documentation was updated accordingly.	2023-11-29 08:59:27 +01:00
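The "server address without its port, plus svc_port" combination described above can be sketched with plain sockaddr structures. `log_target_set_addr()` is a hypothetical helper that illustrates the idea; it is not the actual log.c or `_srv_set_inetaddr()` code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the DGRAM log target address from the
 * server's resolved address (whose port must not be trusted when
 * FQDN/resolution is involved) and the server's svc_port. */
static void log_target_set_addr(struct sockaddr_storage *dst,
                                const struct sockaddr_storage *srv_addr,
                                unsigned short svc_port)
{
    memcpy(dst, srv_addr, sizeof(*dst));
    if (dst->ss_family == AF_INET)
        ((struct sockaddr_in *)dst)->sin_port = htons(svc_port);
    else if (dst->ss_family == AF_INET6)
        ((struct sockaddr_in6 *)dst)->sin6_port = htons(svc_port);
}
```

Keeping a dedicated copy in the log target (rather than pointing at the server's addr) is what allows the port to be overridden without touching the server struct.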
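A sketch of a global tunable struct initialized from a default and used to bound how many events are handled per call. The struct and field names follow the commit; the default macro name, `event_hdl_init_tune()` and `consume_events()` are hypothetical, and the counter decrement is only a stand-in for dequeuing real events.

```c
/* Default value from the commit (raised from 10 to 100); the macro name
 * is hypothetical. */
#define EVENT_HDL_MAX_AT_ONCE_DEFAULT 100

struct event_hdl_tune {
    unsigned int max_events_at_once;
};

static struct event_hdl_tune event_hdl_tune;

/* Hypothetical init step, mirroring "initialized according to defaults.h". */
static void event_hdl_init_tune(void)
{
    event_hdl_tune.max_events_at_once = EVENT_HDL_MAX_AT_ONCE_DEFAULT;
}

/* Handle at most max_events_at_once pending events per call so that one
 * busy subscriber cannot monopolize the thread; returns how many were
 * handled. */
static unsigned int consume_events(unsigned int *pending)
{
    unsigned int done = 0;

    while (*pending && done < event_hdl_tune.max_events_at_once) {
        (*pending)--;   /* stand-in for dequeuing and handling one event */
        done++;
    }
    return done;
}
```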
William Lallemand	08f1e2bea2	MINOR: mworker/cli: implements the customized payload pattern for master CLI Implements the customized payload pattern for the master CLI. The pattern is stored in the stream in char pcli_payload_pat[8]. The principle is basically the same as the CLI one: it looks for '<<', then stores what's between '<<' and '\n', and looks for it to exit the payload mode.	2023-11-28 19:13:49 +01:00
William Lallemand	e3557c7d45	MEDIUM: cli: allow custom pattern for payload The CLI payload syntax has a limitation: it can't handle payloads with empty lines, which is a common problem when uploading a PEM file over the CLI. This patch implements a way to customize the ending pattern of the CLI, so we can look for something other than empty lines. A char cli_payload_pat[8] is used in the appctx to store the customized pattern. The pattern can't be more than 7 characters and can still be empty to match an empty line. The cli_io_handler() identifies the pattern and stores it, and cli_parse_request() identifies the end of the payload. If the customized pattern between "<<" and "\n" is more than 7 characters, it is not considered as a pattern. This patch only implements the parser for the 'stats socket', another patch is needed for the 'master CLI'.	2023-11-28 19:12:32 +01:00
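The pattern rules described above (text between "<<" and the end of line, at most 7 characters so it fits in an 8-byte buffer, empty meaning "empty line") can be sketched with two hypothetical helpers. This is a simplified illustration that takes the line without its trailing '\n' and uses the first "<<" found; it is not the actual cli_io_handler()/cli_parse_request() logic.

```c
#include <string.h>

#define PAYLOAD_PAT_MAX 7   /* fits in char pat[8] with the NUL terminator */

/* Hypothetical helper: find "<<" in a command line (without its '\n').
 * On success, copy the optional custom pattern (possibly empty) into pat
 * and return 1; return 0 if there is no marker or the candidate pattern
 * is longer than 7 characters. */
static int cli_find_payload_pat(const char *line, char pat[PAYLOAD_PAT_MAX + 1])
{
    const char *p = strstr(line, "<<");
    size_t len;

    if (!p)
        return 0;
    p += 2;
    len = strlen(p);
    if (len > PAYLOAD_PAT_MAX)
        return 0;
    memcpy(pat, p, len);
    pat[len] = '\0';
    return 1;
}

/* The payload ends on the first line equal to the pattern; an empty
 * pattern therefore matches an empty line, as before. */
static int cli_payload_end(const char *line, const char *pat)
{
    return strcmp(line, pat) == 0;
}
```

A PEM file containing blank lines can then be sent with a non-empty terminator such as "EOF", since blank lines no longer end the payload.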
Frédéric Lécaille	ad61a5dde3	REORG: quic: Move quic_increment_curr_handshake() to quic_sock Move quic_increment_curr_handshake() from quic_conn.c to quic_sock.h to be inlined. Also move all the inlined functions at the end of this header.	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	95e9033fd2	REORG: quic: Add a new module for retransmissions Move several functions in relation with the retransmissions from TX part (quic_tx.c) to quic_retransmit.c new C file.	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	714d1096bc	REORG: quic: Move qc_notify_send() to quic_conn Move qc_notify_send() from quic_tx.c to quic_conn.c. Note that it was already exported from both quic_conn.h and quic_tx.h. Modify this latter header to fix the duplication.	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	b39362070d	BUILD: quic: Several compiler warns fixes after retry module creation Such a warning appeared after having added quic_retry.h which includes only headers for types (quic_cid-t.h, clock-t.h...) In file included from include/haproxy/quic_retry.h:12, from src/quic_retry.c:5: include/haproxy/quic_cid-t.h:26:26: error: field ‘seq_num’ has incomplete type 26 | struct eb64_node seq_num;	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	b5970967ca	REORG: quic: Add a new module for QUIC retry Add quic_retry.c new C file for the QUIC retry feature: quic_saddr_cpy() moved from quic_tx.c, quic_generate_retry_token_aad() moved from quic_generate_retry_token() moved from parse_retry_token() moved from quic_retry_token_check() moved from quic_retry_token_check() moved from	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	43fbea0f38	REORG: quic: Move ncbuf related function from quic_rx to quic_conn Move quic_get_ncbuf() and quic_free_ncbuf() from quic_rx.c to quic_conn.h as static inlined functions.	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	e0d3eb496b	REORG: quic: Move NEW_CONNECTION_ID frame builder to quic_cid Move qc_build_new_connection_id_frm() from quic_conn.c to quic_cid.c. Also move quic_connection_id_to_frm_cpy() from quic_conn.h to quic_cid.h.	2023-11-28 15:47:18 +01:00
Frédéric Lécaille	795d1a57bf	REORG: quic: Rename some (quic|qc)_conn* objects to quic_conn_closed These objects could be confused with the ones defined by the congestion control part (quic_cc.c).	2023-11-28 15:47:16 +01:00
Frédéric Lécaille	d7a5fa24dc	REORG: quic: Move qc_pkt_long() to quic_rx.h This inlined function takes a quic_rx_packet struct as its unique argument. Let's move it to QUIC RX part.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	0b872e24cd	REORG: quic: Move qc_may_probe_ipktns() to quic_tls.h This function is in relation with the Initial packet number space which is more linked to the QUIC TLS specifications. Let's move it to quic_tls.h to be inlined.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	c93ebcc59b	REORG: quic: Move quic_build_post_handshake_frames() to quic_conn module Move quic_build_post_handshake_frames() from quic_rx.c to quic_conn.c. This is a function which is also called from the TX part (quic_tx.c).	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	3482455ddd	REORG: quic: Move qc_handle_conn_migration() to quic_conn.c This function manipulates only quic_conn objects. It definitely belongs in quic_conn.c.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	581549851c	REORG: quic: Move QUIC path definitions/declarations to quic_cc module Move quic_path struct from quic_conn-t.h to quic_cc-t.h and rename it to quic_cc_path. Update the code accordingly. Also move some inlined functions in relation with QUIC path to quic_cc.h	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	f32fc26b62	REORG: quic: Rename some functions used upon ACK receipt Rename some functions to reflect more their jobs. Move qc_release_lost_pkts() to quic_loss.c	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	f74d882ef0	REORG: quic: Move the QUIC DCID parser to quic_sock.c Move quic_get_dgram_dcid() from quic_conn.c to quic_sock.c because it is only used in this file, and define it as static.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	09ab48472c	REORG: quic: Move several inlined functions from quic_conn.h Move quic_pkt_type(), quic_saddr_cpy(), quic_write_uint32(), max_available_room(), max_stream_data_size(), quic_packet_number_length(), quic_packet_number_encode() and quic_compute_ack_delay_us() to quic_tx.c because they are only used in this file. Also move quic_ack_delay_ms() and quic_read_uint32() to quic_tx.c because they are used only in this file. Move quic_rx_packet_refinc() and quic_rx_packet_refdec() to quic_rx.h header. Move qc_el_rx_pkts(), qc_el_rx_pkts_del() and qc_list_qel_rx_pkts() to quic_tls.h header.	2023-11-28 15:37:47 +01:00
Frédéric Lécaille	831764641f	REORG: quic: Move QUIC CRYPTO stream definitions/declarations to QUIC TLS Move quic_cstream struct definition from quic_conn-t.h to quic_tls-t.h. Its pool is also moved from quic_conn module to quic_tls. Same thing for quic_cstream_new() and quic_cstream_free().	2023-11-28 15:37:22 +01:00
Frédéric Lécaille	ae885b9b68	REORG: quic: Move CRYPTO data buffer definitions to QUIC TLS module Move quic_crypto_buf struct definition from quic_conn-t.h to quic_tls-t.h. Also move its pool definition/declaration to quic_tls-t.h/quic_tls.c.	2023-11-28 15:37:22 +01:00
Frédéric Lécaille	5f9bd6bbce	BUILD: quic: Missing RX header inclusions Fix such building issues: In file included from src/quic_tx.c:15: include/haproxy/quic_tx.h:51:23: warning: ‘struct quic_rx_packet’ I do not know why the compiler warns about such missing header inclusions just now. It should have complained a long time ago during the big QUIC source code split.	2023-11-28 15:37:22 +01:00
Frédéric Lécaille	f949f7df83	REORG: quic: QUIC connection types header cleaning Move UDP datagram definitions from quic_conn-t.h to quic_sock-t.h Move debug quic_rx_crypto_frm struct from quic_conn-t.h to quic_trace-t.h	2023-11-28 15:37:22 +01:00
Frédéric Lécaille	0fc0d45745	REORG: quic: Add a new module to handle QUIC connection IDs Move quic_cid and quic_connection_id from quic_conn-t.h to new quic_cid-t.h header. Move definitions of quic_stateless_reset_token_init(), quic_derive_cid(), new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c new C file.	2023-11-28 15:37:22 +01:00
Frédéric Lécaille	21615d4376	CLEANUP: quic: Remove dead definitions/declarations Remove useless definitions and declarations.	2023-11-28 15:37:22 +01:00
Christopher Faulet	2a307d273a	BUG/MEDIUM: stconn: Don't perform zero-copy FF if opposite SC is blocked When zero-copy data fast-forwarding is in use, if the opposite SC is blocked, there is no reason to try to fast-forward more data. Worse, in some cases, this can lead to a receive loop of the producer side while the consumer side is blocked. No backport needed.	2023-11-28 14:01:56 +01:00
Amaury Denoyelle	e97489a526	MINOR: trace: support -dt optional format Add an optional argument for "-dt". This argument is interpreted as a list of several trace statement separated by comma. For each statement, a specific trace name can be specified, or none to act on all sources. Using double-colon separator, it is possible to add specifications on the wanted level and verbosity.	2023-11-27 17:15:14 +01:00
Amaury Denoyelle	cef29d3708	MINOR: trace: define simple -dt argument Add a '-dt' argument to the haproxy process. It automatically activates all trace sources on stderr with the error level. This can be useful to troubleshoot issues such as protocol violations.	2023-11-27 17:10:18 +01:00
Willy Tarreau	3ac9912837	OPTIM: pattern: save memory and time using ebst instead of ebis In the pat_ref_elt struct, the pattern string is stored outside of the node element, using a pointer to a strdup()'d copy. Not only does this needlessly waste at least 16-24 bytes per entry (8 for the pointer, 8-16 for the allocator), it also makes the tree descent less efficient, since both the node and the string have to be visited at each layer (hence at least two cache lines). Let's use an ebmb storage and place the pattern right at the end of the pat_ref_elt, making it a variable-sized element instead. The set-map test below jumps from 173 to 182 kreq/s/core, and the memory usage drops from 356 MB to 324 MB: http-request set-map(/dev/null) %[rand(1000000)] 1 This is even more visible with large maps: after loading 16M IP addresses into a map, the process uses this amount of memory: - 3.15 GB with haproxy-2.8 - 4.21 GB with haproxy-2.9-dev11 - 3.68 GB with this patch So that's a net saving of 32 bytes per entry here, which cuts in half the extra cost of the tree, and loading a large map takes about 20% less time.	2023-11-27 11:25:07 +01:00
Willy Tarreau	fc800b6cb7	MINOR: task/profiling: do not record task_drop_running() as a caller task_drop_running() is used to remove the RUNNING bit and to check whether the task received a new wakeup from itself while it was running. Thus each time task_drop_running() marks itself as a caller, it in fact overwrites the previous caller that woke up the task, as shown below: Tasks activity over 10.439 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 57895273 6.396m 6.628us 2.733h 170.0us <- run_tasks_from_lists@src/task.c:658 task_drop_running It is better not to mark this function as a caller and to keep the original one: Tasks activity over 13.834 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 62424582 5.825m 5.599us 5.717h 329.7us <- sc_app_chk_rcv_applet@src/stconn.c:952 appctx_wakeup	2023-11-27 11:24:52 +01:00
William Lallemand	3dd55fa132	MINOR: mworker/cli: implement hard-reload over the master CLI The mworker mode never had a proper 'hard-stop' (-st) for the reload. This mode was commonly used with the daemon mode, but it was never implemented in mworker mode. This patch fixes the problem by implementing a "hard-reload" command over the master CLI. It does the same as the "reload" command, but instead of waiting for the connections to stop in the previous process, it immediately quits the previous process after binding.	2023-11-24 21:44:25 +01:00
Aurelien DARRAGON	f2629ebd4e	MINOR: proxy: add free_server_rules() helper function Take the px->server_rules freeing part out of free_proxy() and make it a dedicated helper function so that it becomes possible to use it from anywhere.	2023-11-24 16:27:55 +01:00
Aurelien DARRAGON	24da4d3ee7	MINOR: tools: use const for read only pointers in ip{cmp,cpy} In this patch we fix the prototype for ipcmp() and ipcpy() functions so that input pointers that are used exclusively for reads are declared as const pointers. This way, the compiler can safely assume that those variables won't be altered by the function.	2023-11-24 16:27:55 +01:00
Aurelien DARRAGON	683b2ae013	MINOR: server/event_hdl: add SERVER_INETADDR event In this patch we add the support for a new SERVER event in the event_hdl API. SERVER_INETADDR is implemented as an advanced server event. It is published each time the server's IP address or port is about to change (e.g. from the CLI, DNS, Lua...). SERVER_INETADDR data is an event_hdl_cb_data_server_inetaddr struct that provides additional info related to the server inet addr change, but it can be cast as a regular event_hdl_cb_data_server struct if the additional info is not needed.	2023-11-24 16:27:55 +01:00
Christopher Faulet	8d46a2c973	MAJOR: h3: Implement zero-copy support to send DATA frame When possible, we try to send DATA frames without copying data. To do so, we swap the input buffer with the QCS tx buffer. This is only possible if: * There is only one HTX block of data at the beginning of the message * The amount of data to send is equal to the size of the HTX data block * The QCS tx buffer is empty In this case, both buffers are swapped. The frame metadata are written at the beginning of the buffer, before the data, where the HTX structure is stored.	2023-11-24 07:42:43 +01:00
Christopher Faulet	1bcc0f8892	MEDIUM: mux-quic: Add consumer-side fast-forwarding support The QUIC multiplexer now implements callbacks to consume fast-forwarded data. It relies on the H3 stack to acquire the buffer and format the frame.	2023-11-24 07:42:43 +01:00
Amaury Denoyelle	a3187fe06c	MINOR: rhttp: add count of active conns per thread Add a new member <nb_rhttp_conns> in the thread_ctx structure. Its purpose is to count the current number of opened reverse HTTP connections, regardless of their listener membership. This patch will be useful to support multi-thread for active reverse HTTP, in order to select the least loaded thread. Note that although <nb_rhttp_conns> is only accessed by the current thread, atomic operations are used. This is because once multi-thread support is added, other threads will also read these values.	2023-11-23 17:43:01 +01:00
Amaury Denoyelle	55e78ff7e1	MINOR: rhttp: large renaming to use rhttp prefix The previous commit renamed the 'proto_reverse_connect' module to 'proto_rhttp'. This commit follows up by replacing various custom prefixes with 'rhttp_' to make the code uniform. Note that the 'reverse_' prefix was kept in the connection module. This is because if a new reversible protocol not based on HTTP is implemented, it may be necessary to reuse the same connection functions, which are protocol-agnostic.	2023-11-23 17:40:01 +01:00
Amaury Denoyelle	e09af499b4	MINOR: rhttp: rename proto_reverse_connect This commit renames the proto_reverse_connect module to proto_rhttp. This name was chosen because it is shorter and more precise.	2023-11-23 17:38:58 +01:00
Willy Tarreau	1de44daf7d	MINOR: ext-check: add an option to preserve environment variables In Github issue #2128, @jvincze84 explained the complexity of using external checks in some advanced setups due to the systematic purge of environment variables, and expressed the desire to preserve the existing environment. During the discussion an agreement was found around having an option to "external-check" to do that and that solution was tested and confirmed to work by user @nyxi. This patch just cleans this up, implements the option as "preserve-env" and documents it. The default behavior does not change, the environment is still purged, unless "preserve-env" is passed. The choice of not using "import-env" instead was made so that we could later use it to name specific variables that have to be imported instead of keeping the whole environment. The patch is simple enough that it could be backported if needed (and was in fact tested on 2.6 first).	2023-11-23 16:53:57 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is the 37th iteration of typo fixes.	2023-11-23 16:23:14 +01:00
Willy Tarreau	6455fd5024	MINOR: debug: add the ability to enter components in the post_mortem struct Here the idea is to collect components' versions and build options. The main component is haproxy, but the API is made so that any sub-system can easily add a component there (for example the detailed version of a device detection lib, or some info about a lib loaded from Lua). The elements are stored as a pointer to an array of structs and its count so that it's sufficient to issue this in gdb to list them all at once: print *post_mortem.components@post_mortem.nb_components For now we collect name, version, toolchain, toolchain options, build options and path. Maybe more could be useful in the future.	2023-11-23 15:39:21 +01:00
Willy Tarreau	2268f10dd6	DEBUG: tinfo: store the pthread ID and the stack pointer in tinfo When debugging a core, it's difficult to match a given gdb thread number against an internal thread. Let's just store the pthread ID and the stack pointer in each tinfo. This could help in the future by making it possible to just glance over them and pick the right one, depending on what info is found first.	2023-11-23 14:32:55 +01:00
Amaury Denoyelle	54c94c60d2	DEBUG: connection/flags: update flags for reverse HTTP Add missing CO_FL_REVERSED and CO_FL_ACT_REVERSING flag definitions in conn_show_flags(). These flags were introduced in this release with reverse HTTP support. No need to backport.	2023-11-20 18:10:12 +01:00
Amaury Denoyelle	decf29d06d	MINOR: quic: remove unneeded QUIC specific stopping function On CONNECTION_CLOSE reception/emission, QUIC connections enter the CLOSING state. At this stage, only CONNECTION_CLOSE can be re-emitted and all other exchanges are stopped. Previously, when the haproxy process was stopping, if all QUIC connections were in the CLOSING state, they were released before their closing timer expiration so as not to block the process shutdown. However, since a recent commit, the closing timer has been shortened to a more reasonable delay. It is now considered viable to respect the connections' closing state even on process shutdown. As such, the stopping-specific code in the QUIC connection idle timer task was removed. A dedicated function, quic_handle_stopping(), had been implemented to notify QUIC connections of the shutdown from the main() function. It should have been deleted along with the code removed from the QUIC idle timer task. This patch just does this.	2023-11-20 17:59:52 +01:00
Willy Tarreau	445fc1fe3a	BUG/MINOR: sock: mark abns sockets as non-suspendable and always unbind them In 2.3, we started to get a cleaner socket unbinding mechanism with commit `f58b8db47` ("MEDIUM: receivers: add an rx_unbind() method in the protocols"). This mechanism rightfully refrains from unbinding when sockets are expected to be transferable to another worker via "expose-fd listeners", but this is not compatible with ABNS sockets, which support neither reuseport, unbinding, nor being renamed: in short, they will always prevent a new process from binding. It turns out that this is not very visible because, by pure accident, GTUNE_SOCKET_TRANSFER is only set in the code dealing with master mode and daemons, so it's never set in foreground mode nor in tests, even if present on the stats socket. However, with master mode, it is now always set even when not present on the stats socket, and will always conflict. The only reasonable approach seems to consist of marking these abns sockets as non-suspendable so that the generic sock_unbind() code can decide to just unbind them regardless of GTUNE_SOCKET_TRANSFER. This should carefully be backported as far as 2.4.	2023-11-20 11:38:26 +01:00
Aurelien DARRAGON	4b2616f784	MINOR: log/backend: prevent stick table and stick rules with LOG mode Report a warning and prevent errors if the user tries to declare a stick table or use stick rules within a log backend.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	6a29888f60	MINOR: log/backend: ensure log exclusive params are not used in other modes Add a proxy_cfg_ensure_no_log() function (similar to proxy_cfg_ensure_no_http()) to ensure, at the end of proxy parsing, that no log-exclusive options are found if the proxy is not in log mode.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	b61147fd2a	MEDIUM: log/balance: merge tcp/http algo with log ones The "log-balance" directive was recently introduced to configure the balancing algorithm to use in a log backend. However, it is confusing and it causes issues when used in a defaults section. In this patch, we take another approach: first we remove the "log-balance" directive, and instead we rely on the existing "balance" directive to configure log load balancing in a log backend. Some algorithms such as roundrobin can be used as-is in a log backend, while log-only algorithms are implemented as "log-$name" inside the "balance" directive. The documentation was updated accordingly.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	f42dfaa214	MEDIUM: lbprm: store algo params on 32bits Make sure lbprm.algo can store 32 bits by declaring it as uint32_t. Then, use all 32 available bits to offer 4 extra bits for the BE_LB_NEED inputs. This will allow new required inputs to be easily added (up to 4 new ones, plus one that wasn't used yet if we keep them exclusive). This required some cleanup: all ALGO bitfields were rewritten in the 32-bit format and the high ones were shifted to make room for the new BE_LB_NEED bits.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	a327b80f1f	CLEANUP: backend: removing unused LB param BE_LB_HASH_RND was introduced with `760e81d35` ("MINOR: backend: implement random-based load balancing") but was never used since. Remove it to regain an extra slot for future types.	2023-11-18 11:16:21 +01:00

8215 Commits