haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-08 08:07:10 +02:00

Author	SHA1	Message	Date
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Willy Tarreau	45a9e4e24b	MINOR: init: add info about the main program to the post_mortem struct This way we'll still have haproxy's version, build options etc in core dumps and centralized all at once.	2023-11-23 15:39:21 +01:00
Willy Tarreau	6455fd5024	MINOR: debug: add the ability to enter components in the post_mortem struct Here the idea is to collect components' versions and build options. The main component is haproxy, but the API is made so that any sub-system can easily add a component there (for example the detailed version of a device detection lib, or some info about a lib loaded from Lua). The elements are stored as a pointer to an array of structs and its count so that it's sufficient to issue this in gdb to list them all at once: print *post_mortem.components@post_mortem.nb_components For now we collect name, version, toolchain, toolchain options, build options and path. Maybe more could be useful in the future.	2023-11-23 15:39:21 +01:00
Willy Tarreau	a88a3482b5	MINOR: debug: dump the mapping of the libs into post_mortem Having the libs and their addresses listed in the post_mortem struct is also helpful. Sometimes it helps notice that one version is not the expected one, e.g. due to some LD_LIBRARY_PATH. We don't emit it on "show dev" however since that's already available via "show libs".	2023-11-23 15:39:21 +01:00
Willy Tarreau	37e3dd718c	MINOR: debug: copy the thread info into the post_mortem struct The last starting thread now copies the pthread ID and stack top of each thread into post_mortem. That way it's as easy as issuing "p post_mortem" in gdb to see all thread IDs and stack frames and more easily map them to the threads met in a core.	2023-11-23 15:39:21 +01:00
Willy Tarreau	c0eec3a4aa	MINOR: debug: collect some boot-time info related to the process Here we collect the original uid/gid/rlimits for FD and RAM since these ones do affect behavior and are sometimes different from expected in containers or when starting as a service.	2023-11-23 15:39:21 +01:00
Willy Tarreau	ff9e06cd53	MINOR: debug: report any detected hypervisor in post_mortem When the x86 CPU flags show the "hypervisor" flag, we know we're running inside QEMU, VMware or possibly other flavors of hypervisors. In this case we'll report either "qemu", "vmware" or "yes" for other ones in the "virt_techno" field, based on the DMI hardware vendor name, otherwise "no" when the flag is not found.	2023-11-23 15:39:21 +01:00
Willy Tarreau	0cc799bdd1	MINOR: debug: detect CPU model and store it in post_mortem The CPU model and type has significant impact on certain bugs, such as contention issues caused by CPUs having split L3 caches, or stricter memory models that exhibit some barrier issues. It's complicated though because the info about the model depends on the arch. For example, x86 reports an SKU name while ARM rather reports the CPU core types, families and versions for each CPU core. There, the SoC will sometimes be reported in the device tree or DMI info instead. But we don't really care, it's essentially useful to know if the code is running on an armv8.0 such as A53, a 8.2 such as A55/A76/Neoverse etc. For MIPS the model appears to generally be there, and in addition the SoC is often present in the "system type" field before the first CPU, and the type of machine in the "machine" field, to replace the missing DMI and DT, so they are also collected. Note that only the first CPU is checked and reported, that's expected to be vastly sufficient, since we're just trying to spot known incompatibilities or issues.	2023-11-23 15:39:21 +01:00
Willy Tarreau	2974f3e71b	MINOR: debug: report in post_mortem if the container techno used is docker If we detect we're running inside a container on Linux, let's check if it seems to be docker. Docker usually creates a /.dockerenv file, which is easy to check. It's uncertain whether it's always the case, but on the few tested instances that was true, and we don't really care, what matters is to place helpful debugging info for developers. When this file is detected, we report "docker" instead of "yes" in the container techno.	2023-11-23 15:39:21 +01:00
Willy Tarreau	cf8be50a3d	MINOR: debug: report in port_mortem whether a container was detected Containers often cause significant trouble depending on how they're set up, and they're not always trivial for their users to extract info from. Here we're trying to detect if we're running inside a container on Linux. There are plenty of approaches and none is perfectly clean nor reliable, which makes sense since the goal is to remain transparent enough. One interesting approach is to rely on the observation that containers generally do not expose most kernel threads, and that the very firsts of them are extremely stable across all kernel versions: pid 2 was called "keventd" in kernel 2.4, became "kthreadd" in kernel 2.6, and has since not changed. This is true on all architectures tested, even with highly stripped down kernels such as those found on 15 year-old OpenWRT images. And this one doesn't appear inside containers. Thus here we check if we find such a thread via /proc and whether it's called keventd or kthreadd, to detect a container, and we set the "cont_techno" variable to "yes" or "no" depending on what is found.	2023-11-23 15:39:21 +01:00
Willy Tarreau	4e3f9921de	MINOR: debug: add OS/hardware info to the post_mortem struct Let's extract some info about the system (board model, vendor etc), this will indicate some hypervisors, some cloud instances or some uncommon embedded boards etc. Typically, vmware, qemu and raspberry-pi are visible here and can help during the troubleshooting session.	2023-11-23 15:39:21 +01:00
Willy Tarreau	0184597522	MINOR: debug: start to create a new struct post_mortem The goal here is to accumulate precious debugging information in a struct that is easy to find in memory. It's aligned to 256-byte as it also helps. We'll progressively add a lot of info about the startup conditions, the operating system, the hardware and hypervisor so as to limit the number of round trips between developers and users during debugging sessions. Also, opening a core file with an hex editor should often be sufficient to extract most of the info. In addition, a new "show dev" command will show these information so that they can be checked at runtime without having to wait for a crash (e.g. if a limit is bad in a container, better know it early). For now the struct only contains utsname that's fed at boot time.	2023-11-23 15:39:21 +01:00
Willy Tarreau	2268f10dd6	DEBUG: tinfo: store the pthread ID and the stack pointer in tinfo When debugging a core, it's difficult to match a given gdb thread number against an internal thread. Let's just store the pthread ID and the stack pointer in each tinfo. This could help in the future by allowing to just glance over them and pick the right one depending what info is found first.	2023-11-23 14:32:55 +01:00
Willy Tarreau	53da8bfcb6	BUG/MINOR: server: do not leak default-server in defaults sections When a default-server directive is used in a defaults section, it's never freed and the "defaults" proxy gets reset without freeing the fields from that default-server. Normally there are no allocation there, except for the config file location stored in srv->conf.file form an strdup() since commit `9394a9444` ("REORG: server: move alert traces in parse_server") that appeared in 2.4. In addition, if a "default-server" directive appears multiple times in a defaults section, one more entry will be leaked per call. This commit addresses this by checking that we don't overwrite the file upon multiple calls, and by clearing it when resetting the default proxy. This should be backported to 2.4.	2023-11-23 14:32:55 +01:00
Frédéric Lécaille	7fc52357cb	BUG/MINOR: quic: Possible RX packet memory leak under heavy load This bug could be reproduced with -dMfail and h2load generating plenty of connections. A "show pools" CLI command showed that some memory in relation with RX packet pool was never release. Furthermore, adding a RX packet counter to each connection and a BUG_ON() in quic_conn_release() has proved that this unreleased memory was in relation with RX packet which were not linked to a connection. The responsible is quic_dgram_parse() which does not release some RX packet memory before exiting after the connection thread affinity has changed. Must be backported as far as 2.7.	2023-11-22 18:03:26 +01:00
Frédéric Lécaille	cd225da46c	BUG/MINOR: quic: Possible leak of TX packets under heavy load This bug could be reproduced with -dMfail and detected added a counter of TX packet to the QUIC connection. When released calling quic_conn_release() the connection should have a null counter of TX packets. This was not always the case. This could occur during the handshake step: a first packet was built, then another one should have followed in the same datagram, but fail due to a memory allocation issue. As the datagram length and first TX packet were not written in the TX buffer, this latter could not really be purged by qc_purge_tx_buf() even if called. This bug occured only when building coalesced packets in the same datagram. To fix this, write the packet information (datagram length and first packet address) in the TX buffer before purging it. Must be backported as far as 2.6.	2023-11-22 18:03:26 +01:00
Frédéric Lécaille	dc8a20b317	BUG/MEDIUM: quic: Possible crash during retransmissions and heavy load This bug could be reproduced with -dMfail and dectected by libasan as follows: $ ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=f quic-freeze.cfg -dMfail -dMno-cache -dM0x55 ================================================================= ==82989==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc 0x560790cc4749 bp 0x7fff8e0e8e30 sp 0x7fff8e0e8e28 WRITE of size 8 at 0x7fff8e0ea338 thread T0 #0 0x560790cc4748 in qc_frm_free src/quic_frame.c:1222 #1 0x560790cc5260 in qc_release_frm src/quic_frame.c:1261 #2 0x560790d1de99 in qc_treat_acked_tx_frm src/quic_rx.c:312 #3 0x560790d1e708 in qc_ackrng_pkts src/quic_rx.c:370 #4 0x560790d22a1d in qc_parse_ack_frm src/quic_rx.c:694 #5 0x560790d25daa in qc_parse_pkt_frms src/quic_rx.c:988 #6 0x560790d2a509 in qc_treat_rx_pkts src/quic_rx.c:1373 #7 0x560790c72d45 in quic_conn_io_cb src/quic_conn.c:906 #8 0x560791207847 in run_tasks_from_lists src/task.c:596 #9 0x5607912095f0 in process_runnable_tasks src/task.c:876 #10 0x560791135564 in run_poll_loop src/haproxy.c:2966 #11 0x5607911363af in run_thread_poll_loop src/haproxy.c:3165 #12 0x56079113938c in main src/haproxy.c:3862 #13 0x7f92606edd09 in __libc_start_main ../csu/libc-start.c:308 #14 0x560790bcd529 in _start (/home/flecaille/src/haproxy/haproxy+0x Address 0x7fff8e0ea338 is located in stack of thread T0 at offset 1032 i #0 0x560790d29b52 in qc_treat_rx_pkts src/quic_rx.c:1341 This frame has 2 object(s): [32, 48) 'ar' (line 1380) [64, 1088) '_msg' (line 1368) <== Memory access at offset 1032 is inable HINT: this may be a false positive if your program uses some custom stacnism, swapcontext or vfork (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-use-after-scope src/quic_frame.c:1222 i Shadow bytes around the buggy address: 0x100071c15410: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x100071c15420: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x100071c15430: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x100071c15440: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 0x100071c15450: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 =>0x100071c15460: f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8 f3 f3 0x100071c15470: f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00 0x100071c15480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x100071c15490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x100071c154a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x100071c154b0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f3 f3 f3 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==82989==ABORTING AddressSanitizer:DEADLYSIGNAL AddressSanitizer:DEADLYSIGNAL AddressSanitizer:DEADLYSIGNAL AddressSanitizer:DEADLYSIGNAL AddressSanitizer:DEADLYSIGNAL AddressSanitizer:DEADLYSIGNAL Aborted (core dumped) Note that a coredump could not always be produced with all compilers. This was always the case with clang 11. When allocating frames to be retransmitted from qc_dgrams_retransmit(), if they could not be sent for any reason, they could remain attached to a local list to qc_dgrams_retransmit() and trigger a crash with libasan when releasing the original frames they were duplicated from. To fix this, always release the frames which could not be sent during retransmissions calling qc_free_frm_list() where needed. Must be backported as far as 2.6.	2023-11-22 18:03:26 +01:00
Frédéric Lécaille	34bc100b8f	MINOR: quic: Add traces to debug frames handling during retransmissions This is really boring to not know why some retransmissions could not be done from qc_prep_hpkts() which allocates frames, prepare packets and send them. Especially to not know about if frames are not remaining allocated and attached to list on the stack. This patch already helped in diagnosing such an issue during "-dMfail" tests.	2023-11-22 18:03:26 +01:00
Willy Tarreau	8f9e94ecff	BUILD: log: silence a build warning when threads are disabled Building without threads emits two warnings because the proxy pointer is no longer used (only serves for the lock) since 2.9 commit `9a74a6cb1` ("MAJOR: log: introduce log backends"). No backport is needed.	2023-11-22 11:21:07 +01:00
Amaury Denoyelle	89da4e9e5d	MINOR: acl: define explicit HTTP_3.0 Some ACL shortcuts are defined to match HTTP requests by their version. This exists for HTTP_1.0 to HTTP_2.0. This patch adds HTTP_3.0 definition.	2023-11-20 18:01:07 +01:00
Amaury Denoyelle	decf29d06d	MINOR: quic: remove unneeded QUIC specific stopping function On CONNECTION_CLOSE reception/emission, QUIC connections enter CLOSING state. At this stage, only CONNECTION_CLOSE can be reemitted and all other exchanges are stopped. Previously, on haproxy process stopping, if all QUIC connections were in CLOSING state, they were released before their closing timer expiration to not block the process shutdown. However, since a recent commit, the closing timer has been shorten to a more reasonable delay. It is now consider viable to respect connections closing state even on process shutdown. As such, stopping specific code in QUIC connections idle timer task was removed. A specific function quic_handle_stopping() was implemented to notify QUIC connections on shutdown from main() function. It should have been deleted along the removal in QUIC idle timer task. This patch just does this.	2023-11-20 17:59:52 +01:00
Frédéric Lécaille	756b3c5f7b	BUG/MEDIUM: quic: Possible crash for connections to be killed The connections are flagged as "to be killed" asap when the peer has left (detected by sendto() "Connection refused" errno) by qc_kill_conn(). This function has to wakeup the idle timer task to release the connection (and the idle timer and the idle timer task itself). Then if in the meantime the connection was flagged as having to process some retransmissions, some packet could lead to sendto() errors again with a call to qc_kill_conn(), this time with a released idle timer task. This bug could be detected by libasan as follows: .AddressSanitizer:DEADLYSIGNAL ================================================================= ==21018==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x 560b5d898717 bp 0x7f9aaac30000 sp 0x7f9aaac2ff80 T3) ==21018==The signal is caused by a READ memory access. ==21018==Hint: address points to the zero page. . #0 0x560b5d898717 in _task_wakeup include/haproxy/task.h:209 #1 0x560b5d8a563c in qc_kill_conn src/quic_conn.c:171 #2 0x560b5d97f832 in qc_send_ppkts src/quic_tx.c:636 #3 0x560b5d981b53 in qc_send_app_pkts src/quic_tx.c:876 #4 0x560b5d987122 in qc_send_app_probing src/quic_tx.c:910 #5 0x560b5d987122 in qc_dgrams_retransmit src/quic_tx.c:1397 #6 0x560b5d8ab250 in quic_conn_app_io_cb src/quic_conn.c:712 #7 0x560b5de41593 in run_tasks_from_lists src/task.c:596 #8 0x560b5de4333c in process_runnable_tasks src/task.c:876 #9 0x560b5dd6f2b0 in run_poll_loop src/haproxy.c:2966 #10 0x560b5dd700fb in run_thread_poll_loop src/haproxy.c:3165 #11 0x7f9ab9188ea6 in start_thread nptl/pthread_create.c:477 #12 0x7f9ab90a8a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV include/haproxy/task.h:209 in _task_wakeup Thread T3 created by T0 here: #0 0x7f9ab97ac2a2 in __interceptor_pthread_create ../../../../src/libsaniti zer/asan/asan_interceptors.cpp:214 #1 0x560b5df4f3ef in setup_extra_threads src/thread.c:252 o #2 0x560b5dd730c7 in main src/haproxy.c:3856 #3 0x7f9ab8fd0d09 in __libc_start_main ../csu/libc-start.c:308 i ==21018==ABORTING AddressSanitizer:DEADLYSIGNAL Aborted (core dumped) To fix, simply reset the connection flag QUIC_FL_CONN_RETRANS_NEEDED to cancel the retransmission when qc_kill_conn is called. Note that this new bug arrived with this fix which is correct and flagged as to be backported as far as 2.6. BUG/MINOR: quic: idle timer task requeued in the past Must be backported as far as 2.6.	2023-11-20 17:17:16 +01:00
Amaury Denoyelle	a8968701c0	BUG/MAJOR: quic: complete thread migration before tcp-rules A quic_conn is instantiated and tied on the first thread which has received the first INITIAL packet. After handshake completion, listener_accept() is called. For each quic_conn, a new thread is selected among the least loaded ones Note that this occurs earlier if handling 0-RTT data. This thread connection migration is done in two steps : * inside listener_accept(), on the origin thread, quic_conn tasks/tasklet are killed. After this, no quic_conn related processing will occur on this thread. The connection is flagged with QUIC_FL_CONN_AFFINITY_CHANGED. * as soon as the first quic_conn related processing occurs on the new thread, the migration is finalized. This allows to allocate the new tasks/tasklet directly on the destination thread. This last step on the new thread must be done prior to other quic_conn access. There is two events which may trigger it : * a packet is received on the new thread. In this case, qc_finalize_affinity_rebind() is called from quic_dgram_parse(). * the recently accepted connection is popped from accept_queue_ring via accept_queue_process(). This will called session_accept_fd() as listener.bind_conf.accept callback. This instantiates a new session and start connection stack via conn_xprt_start(), which itself calls qc_xprt_start() where qc_finalize_affinity_rebind() is used. A condition was recently found which could cause a closing to be used with qc_finalize_affinity_rebind() which is forbidden with a BUG_ON(). This lat step was not compatible with layer 4 rule such as "tcp-request connection reject" which closes the connection early. In this case, most of the body of session_accept_fd() is skipped, including qc_xprt_start(), so thread migration is not finalized. At the end of the function, conn_xprt_close() is then called which flags the connection as CLOSING. If a datagram is received for this connection before it is released, this will call qc_finalize_affinity_rebind() which triggers its BUG_ON() to prevent thread migration for CLOSING quic_conn. FATAL: bug condition "qc->flags & ((1U << 29)\|(1U << 30))" matched at src/quic_conn.c:2036 Thread 3 "haproxy" received signal SIGILL, Illegal instruction. [Switching to Thread 0x7ffff794f700 (LWP 2973030)] 0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036 2036 BUG_ON(qc->flags & (QUIC_FL_CONN_CLOSING\|QUIC_FL_CONN_DRAINING)); (gdb) bt #0 0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036 #1 0x0000555555682463 in quic_dgram_parse (dgram=0x7fff5003ef10, from_qc=0x0, li=0x555555f38670) at src/quic_rx.c:2602 #2 0x0000555555651aae in quic_lstnr_dghdlr (t=0x555555fc4440, ctx=0x555555fc3f78, state=32832) at src/quic_sock.c:189 #3 0x00005555558c9393 in run_tasks_from_lists (budgets=0x7ffff7944c90) at src/task.c:596 #4 0x00005555558c9e8e in process_runnable_tasks () at src/task.c:876 #5 0x000055555586b7b2 in run_poll_loop () at src/haproxy.c:2966 #6 0x000055555586be87 in run_thread_poll_loop (data=0x555555d3d340 <ha_thread_info+64>) at src/haproxy.c:3165 #7 0x00007ffff7b59609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007ffff7a7e133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 To fix this issue, ensure quic_conn migration is completed earlier inside session_accept_fd(), before any tcp rules processing. This is done by moving qc_finalize_affinity_rebind() invocation from qc_xprt_start() to qc_conn_init(). This must be backported up to 2.7.	2023-11-20 16:11:26 +01:00
Willy Tarreau	3e913909e7	BUILD: cache: fix build error on older compilers pre-c99 compilers will fail to build the cache since commit `48f81ec09` ("MAJOR: cache: Delay cache entry delete in reserve_hot function") due to an int declaration in the for loop. No backport is needed.	2023-11-20 11:43:52 +01:00
Willy Tarreau	445fc1fe3a	BUG/MINOR: sock: mark abns sockets as non-suspendable and always unbind them In 2.3, we started to get a cleaner socket unbinding mechanism with commit `f58b8db47` ("MEDIUM: receivers: add an rx_unbind() method in the protocols"). This mechanism rightfully refrains from unbinding when sockets are expected to be transferrable to another worker via "expose-fd listeners", but this is not compatible with ABNS sockets, which do not support reuseport, unbinding nor being renamed: in short they will always prevent a new process from binding. It turns out that this is not much visible because by pure accident, GTUNE_SOCKET_TRANSFER is only set in the code dealing with master mode and deamons, so it's never set in foreground mode nor in tests even if present on the stats socket. However with master mode, it is now always set even when not present on the stats socket, and will always conflict. The only reasonable approach seems to consist in marking these abns sockets as non-suspendable so that the generic sock_unbind() code can decide to just unbind them regardless of GTUNE_SOCKET_TRANSFER. This should carefully be backported as far as 2.4.	2023-11-20 11:38:26 +01:00
William Lallemand	ef9a195742	BUG/MINOR: startup: set GTUNE_SOCKET_TRANSFER correctly This bug was forbidding the GTUNE_SOCKET_TRANSFER option to be set when haproxy is neither in daemon mode nor in mworker mode. So it basically only impacts the foreground mode. The fix moves the code outside the 'if (global.mode & (MODE_DAEMON \| MODE_MWORKER \| MODE_MWORKER_WAIT))' condition. Bug was introduced with `7f80eb23` ("MEDIUM: proxy: zombify proxies only when the expose-fd socket is bound"). Must be backported in every stable version.	2023-11-20 10:49:05 +01:00
Aurelien DARRAGON	82f4bcafae	MINOR: log/backend: prevent "dynamic-cookie-key" use with LOG mode It doesn't make sense to set "dynamic-cookie-key" inside a log backend, thus we report a warning to the user and reset the setting.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	c7783fb32b	MINOR: log/backend: prevent "http-send-name-header" use with LOG mode It doesn't make sense to use the "http-send-name-header" directive inside a log backend so we report a warning in with case and reset the setting.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	4b2616f784	MINOR: log/backend: prevent stick table and stick rules with LOG mode Report a warning and prevent errors if user tries to declare a stick table or use stick rules within a log backend.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	5335618967	MINOR: log/backend: prevent tcp-{request,response} use with LOG mode We start implementing some postparsing compatibility checks for log backends. Here we report a warning if user tries to use tcp-{request,response} rules with log backend, and we properly ignore such rules when inherited from defaults section.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	6a29888f60	MINOR: log/backend: ensure log exclusive params are not used in other modes add proxy_cfg_ensure_no_log() function (similar to proxy_cfg_ensure_no_http()) to ensure at the end of proxy parsing that no log exclusive options are found if the proxy is not in log mode.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	42d7d1bd47	Revert "MINOR: filter: "filter" requires TCP or HTTP mode" This reverts commit `f9422551cd` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	c8948fb7ac	Revert "MINOR: flt_http_comp: "compression" requires TCP or HTTP mode" This reverts commit `225526dc16` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	33e5c4055f	Revert "MINOR: http_htx/errors: prevent the use of some keywords when not in tcp/http mode" This reverts commit `b41b77b4cc` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	0f9b475333	Revert "MINOR: fcgi-app: "use-fcgi-app" requires TCP or HTTP mode" This reverts commit `0ba731f50b` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	7d59730100	Revert "MINOR: cfgparse-listen: "http-reuse" requires TCP or HTTP mode" This reverts commit `65f1124b5d` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	f1a072d077	Revert "MINOR: cfgparse-listen: "dynamic-cookie-key" requires TCP or HTTP mode" This reverts commit `0b09727a22` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	a0a7dd1ee7	Revert "MINOR: cfgparse-listen: "http-send-name-header" requires TCP or HTTP mode" This reverts commit `d354947365` since we cannot perform the test during parsing as the proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	c90d7dc46b	Revert "MINOR: stktable: "stick" requires TCP or HTTP mode" This reverts commit `098ae743fd` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	8e20fdbb1c	Revert "MINOR: tcp_rules: tcp-{request,response} requires TCP or HTTP mode" This reverts commit `09b15e4163` since we cannot perform the test during parsing as the effective proxy mode is not yet known.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	b6e1e9ec8b	Revert "MINOR: proxy: report a warning for max_ka_queue in proxy_cfg_ensure_no_http()" This reverts commit `3934901` since it makes no sense to report a warning in this case given that max-keepalive-queue will also work with TCP backends.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	b61147fd2a	MEDIUM: log/balance: merge tcp/http algo with log ones "log-balance" directive was recently introduced to configure the balancing algorithm to use when in a log backend. However, it is confusing and it causes issues when used in default section. In this patch, we take another approach: first we remove the "log-balance" directive, and instead we rely on existing "balance" directive to configure log load balancing in log backend. Some algorithms such as roundrobin can be used as-is in a log backend, and for log-only algorithms, they are implemented as "log-$name" inside the "backend" directive. The documentation was updated accordingly.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	2c4943c18b	BUG/MINOR: proxy/stktable: missing frees on proxy cleanup In `1b8e68e` ("MEDIUM: stick-table: Stop handling stick-tables as proxies.") we forgot to free the table pointer which is now dynamically allocated. Let's take this opportunity to also fix a missing free in the table itself (the table expire task wasn't properly destroyed) This patch depends on: - "MINOR: stktable: add sktable_deinit function" It should be backported in every stable versions.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	e10cf61099	MINOR: stktable: add stktable_deinit function Adding sktable_deinit() helper function to properly cleanup a sticktable that was initialized using stktable_init().	2023-11-18 11:16:21 +01:00
Willy Tarreau	6c7771f1b4	MINOR: stream/cli: add another filter "susp" to "show sess" This one reports streams considered as "suspicious", i.e. those with no expiration dates or dates in the past, or those without a front endpoint. More criteria could be added in the future.	2023-11-17 19:30:07 +01:00
Willy Tarreau	3ffcf7beb1	MINOR: stream/cli: add an optional "older" filter for "show sess" It's often needed to be able to refine "show sess" when debugging, and very often a first glance at old streams is performed, but that's a difficult task in large dumps, and it takes lots of resources to dump everything. This commit adds "older <age>" to "show sess" in order to specify the minimum age of streams that will be dumped. This should simplify the identification of blocked ones.	2023-11-17 19:30:04 +01:00
Willy Tarreau	ec76e0138b	BUG/MINOR: stream/cli: report correct stream age in "show sess" Since 2.4-dev2 with commit `15e525f49` ("MINOR: stream: Don't retrieve anymore timing info from the mux csinfo"), we don't replace the tv_accept (now accept_ts) anymore with the current request's, so that it properly reflects the session's accept date and not the request's date. However, since then we failed to update "show sess" to make use of the request's timestamp instead of the session's timestamp, resulting in fantasist values in the "age" field of "show sess" for the task. Indeed, the session's age is displayed instead of the stream's, which leads to great confusion when debugging, particularly when it comes to multiplexed inter-proxy connections which are kept up forever. Let's fix this now. This must be backported as far as 2.4. However, for 2.7 and older, the field was named tv_request and was a timeval.	2023-11-17 18:59:12 +01:00
Willy Tarreau	662565ddb4	MINOR: backend: without ->connect(), allow to pick another thread's connection If less connections than threads are established on a reverse-http gateway and these servers have a non-nul pool-min-conn, then conn_backend_get() will refrain from picking available connections from other threads. But this makes no sense for protocols for which there is no ->connect(), since there's no way the current thread will manage to establish its own connection. For such situations we should always accept to use another thread's connection. That's precisely what this patch does.	2023-11-17 18:13:04 +01:00
Willy Tarreau	f592a0d5dd	MINOR: rhttp: remove the unused outgoing connect() function A dummy connect() function previously had to be installed for the log server so that a reverse-http address could be referenced on a "server" line, but after the recent rework of the server line parsing, this is no longer needed, and this is actually annoying as it makes one believe there is a way to connect outside, which is not true. Let's now get rid of this function.	2023-11-17 18:10:16 +01:00
Willy Tarreau	d069825c5f	BUG/MEDIUM: mux-fcgi: fail earlier on malloc in takeover() This is the equivalent of the previous "BUG/MEDIUM: mux-h1: fail earlier on malloc in takeover()". Connection takeover was implemented for fcgi in 2.2 by commit `a41bb0b6c` ("MEDIUM: mux_fcgi: Implement the takeover() method."). It does have one corner case related to memory allocation failure: in case the task or tasklet allocation fails, the connection gets released synchronously. Unfortunately the situation is bad there, because the lower layers are already switched to the new thread while the tasklet is either NULL or still the old one, and calling fcgi_release() will also result in touching the thread-local list of buffer waiters, calling unsubscribe(), There are even code paths where the thread will try to grab the lock of its own idle conns list, believing the connection is there while it has no useful effect. However, if the owner thread was doing the same at the same moment, and ended up trying to pick from the current thread (which could happen if picking a connection for a different name), the two could even deadlock. No tests were made to try to reproduce the problem, but the description above is sufficient to see that nothing can guarantee against it. This patch takes a simple but radically different approach. Instead of starting to migrate the connection before risking to face allocation failures, it first pre-allocates a new task and tasklet, then assigns them to the connection if the migration succeeds, otherwise it just frees them. This way it's no longer needed to manipulate the connection until it's fully migrated, and as a bonus this means the connection will continue to exist and the use-after-free condition is solved at the same time. This should be backported to 2.2. Thanks to Fred for the initial analysis of the problem!	2023-11-17 18:10:16 +01:00
Willy Tarreau	95fd2d6801	BUG/MEDIUM: mux-h1: fail earlier on malloc in takeover() This is the h1 equivalent of previous "BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover()". Connection takeover was implemented for H1 in 2.2 by commit `f12ca9f8f1` ("MEDIUM: mux_h1: Implement the takeover() method."). It does have one corner case related to memory allocation failure: in case the task or tasklet allocation fails, the connection gets released synchronously. Unfortunately the situation is bad there, because the lower layers are already switched to the new thread while the tasklet is either NULL or still the old one, and calling h1_release() will call some unsubscribe and and possibly other things whose safety is not guaranteed (and the ambiguity here alone is sufficient to be careful). There are even code paths where the thread will try to grab the lock of its own idle conns list, believing the connection is there while it has no useful effect. However, if the owner thread was doing the same at the same moment, and ended up trying to pick from the current thread (which could happen if picking a connection for a different name), the two could even deadlock. Contrary to mux-h2, a few tests were not sufficient to try to crash the process, but there's nothing that indicates it couldn't happen based on the description above. This patch takes a simple but radically different approach. Instead of starting to migrate the connection before risking to face allocation failures, it first pre-allocates a new task and tasklet, then assigns them to the connection if the migration succeeds, otherwise it just frees them. This way it's no longer needed to manipulate the connection until it's fully migrated, and as a bonus this means the connection will continue to exist and the use-after-free condition is solved at the same time. This should be backported to 2.2. Thanks to Fred for the initial analysis of the problem!	2023-11-17 18:10:16 +01:00
Willy Tarreau	4f02e3da67	BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover() Connection takeover was implemented for H2 in 2.2 by commit `cd4159f03` ("MEDIUM: mux_h2: Implement the takeover() method."). It does have one corner case related to memory allocation failure: in case the task or tasklet allocation fails, the connection gets released synchronously. Unfortunately the situation is bad there, because the lower layers are already switched to the new thread while the tasklet is either NULL or still the old one, and calling h2_release() will also result in h2_process() and h2_process_demux() that may process any possibly pending frames. Even the session remains the old one on the old thread, so that some sess_log() that are called when facing certain demux errors will be associated with the previous thread, possibly accessing a number of elements belonging to another thread. There are even code paths where the thread will try to grab the lock of its own idle conns list, believing the connection is there while it has no useful effect. However, if the owner thread was doing the same at the same moment, and ended up trying to pick from the current thread (which could happen if picking a connection for a different name), the two could even deadlock. The risk is extremely low, but Fred managed to reproduce use-after-free errors in conn_backend_get() after a takeover() failed by playing with -dMfail, indicating that h2_release() had been successfully called. In practise it's sufficient to have h2 on the server side with reuse-always and to inject lots of request on it with -dMfail. This patch takes a simple but radically different approach. Instead of starting to migrate the connection before risking to face allocation failures, it first pre-allocates a new task and tasklet, then assigns them to the connection if the migration succeeds, otherwise it just frees them. This way it's no longer needed to manipulate the connection until it's fully migrated, and as a bonus this means the connection will continue to exist and the use-after-free condition is solved at the same time. This should be backported to 2.2. Thanks to Fred for the initial analysis of the problem!	2023-11-17 18:10:16 +01:00
Willy Tarreau	c7a90cc181	CLEANUP: haproxy: remove old comment from 1.1 from the file header There was still a totally outdated comment speaking about issues affecting solaris on 1.1.8pre4 (April 2002, 21 year-old)! This proves that comments in headers are never read, so let's take this opportunity for also removing the outdated one recommending to read the "updated" RFC7230.	2023-11-17 18:10:16 +01:00
Fr�d�ric L�caille	888d1dc3dc	MINOR: quic: Rename "handshake" timeout to "client-hs" Use a more specific name for this timeout to distinguish it from a possible future one on the server side. Also update the documentation.	2023-11-17 18:09:41 +01:00
Frédéric Lécaille	373e40f0c1	MEDIUM: session: handshake timeout (TCP) Adapt session_accept_fd() called on accept() to set the handshake timeout from "hanshake-timeout" setting if set by configuration. If not set, continue to use the "client" timeout setting.	2023-11-17 17:31:42 +01:00
Frédéric Lécaille	392640a61b	BUG/MINOR: quic: Malformed CONNECTION_CLOSE frame This bug arrived with this commit: MINOR: quic: Avoid zeroing frame structures Before this latter, the CONNECTION_CLOSE was zeroed, especially the "reason phrase length". Restablish this behavior. No need to backport.	2023-11-17 17:31:42 +01:00
Frédéric Lécaille	953c7dc2b9	MINOR: quic: Dump the expiration date of the idle timer task This date is shared between the idle timer and hanshake timeout. So, it should be useful to dump the expiration date of the idle timer task itself, in place of the idle timer expiration date. This way, the handshake timeout value will be visible during the handshake from CLI "show quic full" command.	2023-11-17 17:31:42 +01:00
Frédéric Lécaille	e3e0bb90ce	MEDIUM: quic: Add support for "handshake" timeout setting. The idle timer task may be used to trigger the client handshake timeout. The hanshake timeout expiration date (qc->hs_expire) is initialized when the connection is allocated. Obviously, this timeout is taken into an account only during the handshake by qc_idle_timer_do_rearm() whose job is to rearm the idle timer. The idle timer expiration date could be initialized only one time, then never updated until the hanshake completes. But this only works if the handshake timeout is smaller than the idle timer task timeout. If the handshake timeout is set greater than the idle timeout, this latter may expire before the handshake timeout. This patch may have an impact on the L1/C1 interop tests (with heavy packet loss or corruption). This is why I guess some implementations with a hanshake timeout support set a big timeout during this test. This is at least the case for ngtcp2 which sets a 180s hanshake timeout! haproxy will certainly have to proceed the same way if it wants to have a chance to pass this test as before this handshake timeout.	2023-11-17 17:31:42 +01:00
Frédéric Lécaille	b33eacc523	MINOR: proxy: Add "handshake" new timeout (frontend side) Add a new timeout for the handshake, on the frontend side only. Such a hanshake will be typically used for TLS hanshakes during client connections to TLS/TCP or QUIC frontends.	2023-11-17 17:31:42 +01:00
Remi Tricot-Le Breton	d5cce92a46	BUG/MINOR: shctx: Remove old HA_SPIN_INIT The shctx lock was changed from a SPINLOCK to a RWLOCK in commit `ed35b94` "MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope" but a SPIN_INIT was left behind. This patch does not need to be backported.	2023-11-17 16:56:18 +01:00
Christopher Faulet	7676a2cdf6	BUG/MINOR: stconn/applet: Report send activity only if there was output data For applets and connection, when a send attempt is performed, we must be sure to not report a send activity if there was no output data at all before the attempt. It is not important for the <fsb> date itself but for the <lra> date for non-independent stream. This patch must be backported to 2.8.	2023-11-17 15:36:43 +01:00
Christopher Faulet	ab5ecaa2ea	BUG/MINOR: stconn: Use HTX-aware channel's functions to get info on buffer Some channel function are used to check if the channel's buffer is full, not empty or if there are input data. However, functions used are not HTX-aware. So it is not accurate and may prevent some actions to be performed (However, not sure there are really issues). Because HTX-aware versions now exist, use them instead. This patch may be backported as far as 2.2. It relies on * "MINOR: channel: Add functions to get info on buffers and deal with HTX streams" * "MINOR: htx: Use a macro for overhead induced by HTX"	2023-11-17 15:09:33 +01:00
Christopher Faulet	24409a5caa	BUG/MINOR: stconn: Fix streamer detection for HTX streams Since the HTX was introduced, the streamer detection is broken for HTX streams because the HTX overhead was not counted in the test to set CF_STREAMER and CF_STREAMER_FAST flags. The consequence was that the consumer side was no longer able to send more than tune.ssl.maxrecord at a time in SSL. To fix the issue, we now count the HTX overhead of HTX streams to be able to set CF_STREAMER/CF_STREAMER_FAST flags on a channel. This patch relies on folloing commits: * "MINOR: channel: Add functions to get info on buffers and deal with HTX streams" * "MINOR: htx: Use a macro for overhead induced by HTX" The series must be backported as far as 2.2.	2023-11-17 15:09:17 +01:00
Christopher Faulet	b68c579eda	BUG/MEDIUM: stconn: Update fsb date on partial sends The first-send-blocked date was originally designed to save the date of the first send of a series where some data remain blocked. It was relaxed recently (`3083fd90e` "BUG/MEDIUM: stconn: Report a send activity everytime data were sent") to save the date of the first full blocked send. However, it is not accurrate. When all data are sent, the fsb value must be reset to TICK_ETERNITY. When nothing is sent and if it is not already set, it must be set. But when data are partially sent, the value must be updated and not reset. Otherwise the write timeout may be ignored because fsb date is never set. So, changes brought by the patch above are reverted and sc_ep_report_blocked_send() was changed to know if some data were sent or not. This way we are able to update fsb value. l This patch must be backported to 2.8.	2023-11-17 12:13:00 +01:00
Remi Tricot-Le Breton	f1f8e2b3df	DOC: cache: Specify when function expects a cache lock Some functions are built on the fact that the cache lock must be already taken by the caller. This patch adds this information in the functions' descriptions.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	45a2ff0f4a	MINOR: shctx: Remove 'use_shared_mem' variable This global variable was used to avoid using locks on shared_contexts in the unlikely case of nbthread==1. Since the locks do not do anything when USE_THREAD is not defined, it will be more beneficial to simply remove this variable and the systematic test on its value in the shared context locking functions.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	4fe6c1365d	MINOR: shctx: Remove redundant arg from free_block callback The free_block callback does not get called on blocks that are not row heads anymore so we don't need too shared_block parameters.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	48f81ec09d	MAJOR: cache: Delay cache entry delete in reserve_hot function A reference counter on the cache_entry was added in a previous commit. Its value is atomically increased and decreased via the retain_entry and release_entry functions. This is needed because of the latest cache and shared_context modifications that introduced two separate locks instead of the preexisting single shctx_lock one. With the new logic, we have two main blocks competing for the two locks: - the one in the http_action_req_cache_use that performs a lookup in the cache tree (locked by the cache lock) and then tries to remove the corresponding blocks from the shared_context's 'avail' list until the response is sent to the client by the cache applet, - the shctx_row_reserve_hot that traverses the 'avail' list and gives them back to the caller, while removing previous row heads from the cache tree Those two blocks require the two locks but one of them would take the cache lock first, and the other one the shctx_lock first, which would end in a deadlock without the current patch. The way this conflict is resolved in this patch is by ensuring that at least one of those uses works without taking the two locks at the same time. The solution found was to keep taking the two locks in the cache_use case. We first lock the cache to lookup for an entry and we then take the shctx lock as well to detach the corresponding blocks from the 'avail' list. The subtlety is that between the cache lookup and the actual locking of the shctx, another thread might have called the reserve_hot function in which we only take the shctx lock. In this function we traverse the 'avail' list to remove blocks that are then given to the caller. If one of those blocks corresponds to a previous row head, we call the 'free_blocks' callback that used to delete the cache entry from the tree. We now avoid deleting directly the cache entries in reserve_hot and we rather set the cache entries 'complete' param to 0 so that no other thread tries to work with this entry. This way, when we release the shctx lock in reserve_hot, the first thread that had performed the cache lookup and had found an entry that we just gave to another thread will see that the 'complete' field is 0 and it won't try to work with this response. The actual removal of entries from the cache tree will now be performed in the new 'reserve_finish' callback called at the end of the shctx_row_reserve_hot function. It will iterate on all the row head that were inserted in a dedicated list in the 'free_block' callback and perform the actual delete.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	1cd91b4f2a	MINOR: shctx: Add new reserve_finish callback call to shctx_row_reserve_hot This patch adds a reserve_finish callback that can be defined by the subsystems that require a shared_context. It is called at the end of shctx_row_reserve_hot after the shared_context lock is released.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	11df806c88	MEDIUM: shctx: Descend shctx_lock calls into the shctx_row_reserve_hot Descend the shctx_lock calls into the shctx_row_reserve_hot so that the cases when we don't need to lock anything (enough space in the current row or not enough space in the 'avail' list) do not take the lock at all. In sh_ssl_sess_new_cb the lock had to be descended into sh_ssl_sess_store in order not to cover the shctx_row_reserve_hot call anymore.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	a29b073f26	MEDIUM: cache: Add refcount on cache_entry Add a reference counter on the cache_entry. Its value will be atomically increased and decreased via the retain_entry and release_entry functions. The release_entry function has two distinct versions, release_entry_locked and release_entry_unlocked that should be called when the cache lock is already taken in write mode or not (respectively). In the unlocked case the cache lock will only be taken in write mode on the last reference of the entry (before calling delete_entry). This allows to limit the amount of times when we need to take the cache lock during a release operation.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	ed35b9411a	MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope Since a lock on the cache tree was added in the latest cache changes, we do not need to use the shared_context's lock to lock more than pure shared_context related data anymore. This already existing lock will now only cover the 'avail' list from the shared_context. It can then be changed to a rwlock instead of a spinlock because we might want to only run through the avail list sometimes. Apart form changing the type of the shctx lock, the main modification introduced by this patch is to limit the amount of code covered by the shctx lock. This lock does not need to cover any code strictly related to the cache tree anymore.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	a0d7c290ec	MINOR: cache: Use dedicated trash for "show cache" cli command After the latest changes in the cache/shared_context mechanism, the cache and shared_context logic were decorrelated and in some unlikely cases we might end up using the "show cache" command while some regular cache processing is occurring (a response being stored in the cache for instance). In such a case, because we used the same 'trash' buffer in those two contexts, we could end up with the contents of a response in the ouput of the "show cache" command. This patch fixes this problem by allocating a dedicated trash for the CLI command.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	3831d8454f	MEDIUM: shctx: Remove 'hot' list from shared_context The "hot" list stored in a shared_context was used to keep a reference to shared blocks that were currently being used and were thus removed from the available list (so that they don't get reused for another cache response). This 'hot' list does not ever need to be shared across threads since every one of them only works on their current row. The main need behind this 'hot' list was to detach the corresponding blocks from the 'avail' list and to have a known list root when calling list_for_each_entry_from in shctx_row_data_append (for instance). Since we actually never need to iterate over all members of the 'hot' list, we can remove it and replace the inc_hot/dec_hot logic by a detach/reattach one.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	bd24118212	MEDIUM: cache: Use rdlock on cache in cache_use When looking for a valid entry in the cache tree in http_action_req_cache_use, we do not need to delete an expired entry at once because even if an expired entry exists, since the request will be forwarded to the server, then the expired entry will be overwritten when the updated response is seen. We can then use a simpler rdlock during cache_use operation.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	0dfb57bbf9	MINOR: cache: Add option to avoid removing expired entries in lookup function Any lookup in the cache tree done through entry_exist or secondary_entry_exist functions could end up deleting the corresponding entry if it is expired which prevents from using a rdlock on code paths that would just perform a lookup on the tree (in http_action_req_cache_use for instance). Adding a 'delete_expired' boolean as a parameter allows for "pure" lookups and thus it will allow to perform operations on the tree that simply require a rdlock instead of a "heavier" wrlock.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	ff3cb6dad4	MINOR: cache: Remove expired entry delete in "show cache" command The "show cache" CLI command iterates over all the entries of the cache tree and it used this opportunity to remove expired entries from the cache. This behavior was completely undocumented and does not seem that necessary. By removing it we can take the cache lock in read mode only which limits the impact on the other threads.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	ac9c49b40d	MEDIUM: cache: Use dedicated cache tree lock alongside shctx lock Every use of the cache tree was covered by the shctx lock even when no operations were performed on the shared_context lists (avail and hot). This patch adds a dedicated RW lock for the cache so that blocks of code that work on the cache tree only can use this lock instead of the superseding shctx one. This is useful for operations during which the concerned blocks are already in the hot list. When the two locks need to be taken at the same time, in http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one must be taken first. A new parameter needed to be added to the shared_context's free_block callback prototype so that cache_free_block can take the cache lock and release it afterwards.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	81d8014af8	MINOR: shctx: Remove explicit 'from' param from shctx_row_data_append This parameter is not necessary since the first element of a row always has a pointer to the row's tail.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	610b67fd8b	MEDIUM: shctx: Simplify shctx_row_reserve_hot loop The shctx_row_reserve_hot relied on two loop levels in order to first look for the first block of a preused row and then iterate on all the blocks of this row to reserve them for the new row. This was not the simplest nor the easiest to read way so this logic could be replaced by a single iteration on the avail list members. The two use cases of calling this function with or without a preexisting "first" member were a bit cumbersome as well and were replaced by a more straightforward approach.	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	eccb97f60e	MEDIUM: shctx: Move list between hot and avail list in O(1) Instead of iterating over all the elements of a given row when moving it between the hot and available lists, we can make use of the last_reserved pointer that already points to the last block of the list to perform the move in O(1).	2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton	55fbf82080	MINOR: shctx: Set last_append to NULL when reserving block in hot list Ensure that the last_append pointer is always set to NULL on first block of rows reserved by the subsystems using the shctx (cache for instance). This pointer will be used directly in shctx_row_data_append instead of the 'from' param which will simplify its uses.	2023-11-16 19:35:10 +01:00
Amaury Denoyelle	560cb1332a	MINOR: server: force add to idle on reverse A backend connection is inserted in server idle list via srv_add_to_idle_list(). This function has several conditions which may cause the connection to be rejected instead. One of this condition is based on the current estimate count of needed connections for the server. If the count of idle connections stored has already reached this estimation, the new connection is rejected. This is in opposition with the purpose of reverse HTTP. On active reverse, haproxy can instantiate several connections to properly serve the future traffic. However, the opposite passive haproxy will have only a low estimate of needed connection and will reject most of them. To fix this, simply check CO_FL_REVERSED connection flag on srv_add_to_idle_list(). If set, the connection is inserted without checking for estimate count. Note that all other conditions are not impacted, so it's still possible to reject a connection, for example if process FD limit is reached. This commit relies on recent patch which change CO_FL_REVERSED flag for connection after passive reverse.	2023-11-16 18:43:41 +01:00
Amaury Denoyelle	a1457296d5	BUG/MINOR: mux_h2: reject passive reverse conn if error on add to idle On passive reverse, H2 mux is responsible to insert the connection in the server idle list. This is done via srv_add_to_idle_list(). However, this function may fail for various reason, such as FD usage limit reached. Handle properly this error case. H2 mux flags the connection on error which will cause its release. Prior to this patch, the connection was only released on server timeout. This bug was found inspecting server curr_used_conns counter. Indeed, on connection reverse, this counter is first incremented. It is decremented just after on srv_add_to_idle_list() if insertion is validated. However, if insertion is rejected, the connection was not released which cause curr_used_conns to remains positive. This has the major downside to break the reusing of idle connection on rhttp causing spurrious 503 errors. No need to backport.	2023-11-16 18:43:32 +01:00
Amaury Denoyelle	8cc3fc73f1	MINOR: connection: update rhttp flags usage Change the flags used for reversed connection : * CO_FL_REVERSED is now put after reversal for passive connect. For active connect, it is delayed when accept is completed after reversal. * CO_FL_ACT_REVERSING replace the old CO_FL_REVERSED. It is put only for active connect on reversal and removes once accept is done. This allows to identify a connection as reversed during its whole lifetime. This should be useful to extend reverse connect.	2023-11-16 17:53:31 +01:00
Christopher Faulet	691f4cf449	BUG/MEDIUM: stream: Don't call mux .ctl() callback if not implemented The commit `5ff7d2276` ("BUG/MEDIUM: stream: Properly handle abortonclose when set on backend only") introduced a regression. Not all multiplexer implement the .ctl() callback function. Thus we must be sure this callback function is defined first to call it. This patch should fix a crash reported by Tristan in the issue #2095. It must be backported as far as 2.2, with the commit above.	2023-11-14 19:21:52 +01:00
William Lallemand	d76fa37534	BUG/MEDIUM: mworker: set the master variable earlier Since 2.7 and the mcli_reload_bind_conf (`56f73b21a5`), upon a reload failure because of a bind error, the mcli_reload_bind_conf go through a sock_unbind((). This is not supposed to do anything when a listener is RX_F_INHERITED in the master, but unfortunately this is done too early and provokes an exit of the master. We already suspected in the past that setting the 'master' variable this late could have negative impact. The fix sets the master variable earlier before the bind. This must be backported at least to 2.7. This could be backported earlier but better wait any feedbacks on the fix.	2023-11-14 14:32:39 +01:00
Willy Tarreau	a63e016d27	MINOR: activity: report profiling duration and age in "show profiling" Seeing counters in "show profiling" is not always very helpful without an indication of how long the analysis lasted nor if it's still active or not. Let's add a pair of start/stop timers for tasks and memory so that we can now indicate how long the measurements lasted and when they ended (or 0 if still running). Note that for tasks profiling set to "auto", the measurement is considered enabled since it can automatically switch on and off on a per-thread basis.	2023-11-14 11:46:37 +01:00
Christopher Faulet	ec3ea6f698	MINOR: stconn: Use SC to detect frontend connections in sc_conn_recv() In sc_conn_recv(), instead of using the connection to know we are on the frontend side, we now use the SC flags. It changes nothing but it is cleaner.	2023-11-14 11:01:51 +01:00
Christopher Faulet	5ff7d22767	BUG/MEDIUM: stream: Properly handle abortonclose when set on backend only Since the 2.2 and the commit `dedd30610` ("MEDIUM: h1: Don't wake the H1 tasklet if we got the whole request."), we avoid to subscribe for reads if the H1 message is fully received. However, this broke the abortonclose option. To fix the issue, a CO_RFL flag was added to instruct the mux it should still wait for read events to properly handle read0. Only the H1 mux was concerned. But since then, most of time, the option is only handled if it is set on the frontend proxy because the request is fully received before selecting the backend. If the backend is selected before the end of the request there is no issue. But otherwise, because the backend is not known yet, we are unable to properly handle the option and we miss to subscribe for reads. Of course the option cannot be set on a frontend proxy. So concretly it means the option is properly handled if it is enabled in the defaults section (if common to frontend and backend) or a listen proxy, but it is ignored if it is set on backend only. Thanks to previous patches, we can now instruct the mux it should subscribe for reads if not already done. We use this mechanism in process_stream() when the connection is set up, ie when backend SC is set to SC_ST_REQ state. This patch relies on following patches: * MINOR: connection: Add a CTL flag to notify mux it should wait for reads again * MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads This patch should be the issue #2344. All the series must be backported as far as 2.2.	2023-11-14 11:01:51 +01:00
Christopher Faulet	450ff71c95	MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads The H1 mux now handle MUX_SUBS_RECV flag in h1_ctl(). If it is not already subscribed for reads, it does so. This patch will be mandatory to properly handle abortonclose option.	2023-11-14 11:01:51 +01:00
Christopher Faulet	9327e7efa7	BUG/MINOR: stconn: Handle abortonclose if backend connection was already set up abortonclose option is a backend option, it should not be handle on frontend side. Of course a frontend can also be a backend but the option should not be handled too early because it is not necessarily the selected backend (think about a listen proxy routing requests to another backend). It is especially an issue when the abortonclose option is enabled in the defaults section and disabled by the selected backend. Because in this case, the option may still be enabled while it should not. Thus, now we wait the backend connection was set up to handle the option. To do so, we check the backend SC state. The option is ignored if it is in ST_CS_INI state. For all other states, it means the backend was already selected. This patch could be backported as far as 2.2.	2023-11-14 11:01:51 +01:00
Willy Tarreau	6a4591c3d0	BUG/MEDIUM: connection: report connection errors even when no mux is installed An annoying issue was met when testing the reverse-http mechanism, by which failed connection attempts would apparently not be attempted again when there was no connect timeout. It turned out to be more generalized than the rhttp system, and actually affects all outgoing connections relying on NPN or ALPN to choose the mux, on which no mux is installed and for which the subscriber (ssl_sock) must be notified instead. The problem appeared during 2.2-dev1 development. First, commit `062df2c23` ("MEDIUM: backend: move the connection finalization step to back_handle_st_con()") broke the error reporting by testing CO_FL_ERROR only under CO_FL_CONNECTED. While it still worked OK for cases where a mux was present, it did not for this specific situation because no single error path would be considered when no mux was present. Changing the CO_FL_CONNECTED test to also include CO_FL_ERROR did work, until a few commits later with `477902bd2` ("MEDIUM: connections: Get ride of the xprt_done callback.") which removed the xprt_done callback that was used to indicate success or failure of the transport layer setup, since, as the commit explains, we can report this via the mux. What this last commit says is true, except when there is no mux. For this, however, the sock_conn_iocb() function (formerly conn_fd_handler) is called for such errors, evaluates a number of conditions, none of which is matched in this error condition case, since sock_conn_check() instantly reports an error causing a jump to the leave label. There, the mux is notified if installed, and the function returns. In other error condition cases, readiness and activity are checked for both sides, the tasklets woken up and the corresponding subscriber flags removed. This means that a sane (and safe) approach would consist in just notifying the subscriber in case of error, if such a subscriber still exists: if still there, it means the event hasn't been caught earlier, then it's the right moment to report it. And since this is done after conn_notify_mux(), it still leaves all control to the mux once it's installed. This commit should be progressively backported as far as 2.2 since it's where the problem was introduced. It's important to clearly check the error path in each function to make sure the fix still does what it's supposed to.	2023-11-14 08:49:23 +01:00
Frédéric Lécaille	3741e4bf90	BUG/MINOR: quic: maximum window limits do not match the doc This bug arrived with this commit: MINOR: quic: Add a max window parameter to congestion control algorithms The documentation was been modified with missing/wrong modifications in the code part. The 'g' suffix must be accepted to parse value in gigabytes. And exctly 4g is also accepted. No need to backport.	2023-11-13 19:56:28 +01:00
Frédéric Lécaille	9021e8935e	MINOR: quic: Maximum congestion control window for each algo Make all the congestion support the maximum congestion control window set by configuration. There is nothing special to explain. For each each algo, each time the window is incremented it is also bounded.	2023-11-13 17:53:18 +01:00
Frédéric Lécaille	028a55a1d0	MINOR: quic: Add a max window parameter to congestion control algorithms Add a new ->max_cwnd member to bind_conf struct to store the maximum congestion control window value for each QUIC binding. Modify the "quic-cc-algo" keyword parsing to add an optional parameter to its value: the maximum congestion window value between parentheses as follows: ex: quic-cc-algo cubic(10m) This value must be bounded, greater than 10k and smaller than 1g.	2023-11-13 17:53:18 +01:00
Frédéric Lécaille	840af0928b	BUG/MEDIUM: quic: Non initialized CRYPTO data stream deferencing This bug arrived with this commit: BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data Before this commit qc->cstream was tested before entering qc_treat_rx_crypto_frms(). This patch restablishes this behavior. Furthermore, it simplyfies qc_ssl_provide_all_quic_data() which is a little bit ugly: the CRYPTO data frame may be freed asap in the list_for_each_entry_safe() block after having store its data pointer and length in local variables. Also interrupt the CRYPTO data process as soon as qc_ssl_provide_quic_data() or qc_treat_rx_crypto_frms() fail. No need to be backported.	2023-11-13 16:00:25 +01:00
Amaury Denoyelle	954b5b756a	BUG/MEDIUM: quic: fix FD for quic_cc_conn Since following commit, quic_conn closes its owned socket before transition to quic_cc_conn for closing state. This allows to save FDs as quic_cc_conn could use the listener socket for their I/O. commit `150c0da889` MEDIUM: quic: release conn socket before using quic_cc_conn This patch is incomplete as it removes initialization of <fd> member for quic_cc_conn. Thus, if sending is done on closing state, <fd> value is undefined which in most cases will result in a crash. Fix this by simply initializing <fd> member with qc_init_fd() in qc_new_cc_conn(). This bug should fix recent issue from #2095. Thanks to Tristan for its reporting and then testing of this patch. No need to backport.	2023-11-13 11:55:07 +01:00
Amaury Denoyelle	78d244e9f7	BUG/MINOR: quic: fix decrement of half_open counter on qc alloc failure Half open counter is used to comptabilize QUIC connections waiting for address validation. It was recently reworked to adjust its scope. With each decrement operation, a BUG_ON() was added to ensure the counter never wraps. This BUG_ON() could be triggered if an allocation fails for one of quic_conn members in qc_new_conn(). This is because half open counter is incremented at the end of qc_new_conn(). However, in case of alloc failure, quic_conn_release() is called immediately to ensure the counter is decremented if a connection is freed before peer address has been validated. To fix this, increment half open counter early in qc_new_conn() prior to every quic_conn members allocations. This issue was reproduced using -dMfail argument. This issue has been introduced by commit `278808915b` MINOR: quic: reduce half open counters scope No need to backport.	2023-11-13 11:16:41 +01:00
Amaury Denoyelle	92da3accfd	BUG/MINOR: quic: fix crash on qc_new_conn alloc failure A new counter was recently introduced to comptabilize the current number of active QUIC handshakes. This counter is stored on the listener instance. This counter is incremented at the beginning of qc_new_conn() to check if limit is not reached prior to quic_conn allocation. If quic_conn or one of its inner member allocation fails, special care is taken to decrement the counter as the connection instance is released. However, it relies on <l> variable which is initialized too late to cover pool_head_quic_conn allocation failure. To fix this, simply initialize <l> at the beginning of qc_new_conn(). This issue was reproduced using -dMfail argument. This issue was introduced by the following commit commit `3df6a60113` MEDIUM: quic: limit handshake per listener No need to backport.	2023-11-13 11:16:41 +01:00
Aurelien DARRAGON	76acde9107	BUG/MINOR: log: keep the ref in dup_logger() This bug was introduced with `969e212` ("MINOR: log: add dup_logsrv() helper function") When duplicating an existing log entry, we must take care to inherit from its original ->ref if it is set, because not doing so would make `28ac0999` ("MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries") ineffective given that global log directives will lose their original reference when duplicated resursively (at least twice), which is what happens when global log directives are first inherited to defaults which are then inherited to a regular proxy at the end of the chain. This can be easily reproduced using the following configuration: \|global \| log stdout format raw local0 \| \|defaults \| log global \| \|frontend test \| log global \| ... Logs from "test" proxy will be duplicated because test incorrectly inherited from global "log" directives twice, which `28ac0999` would normally detect and prevent. No backport needed unless `969e212` gets backported.	2023-11-13 11:06:05 +01:00
Christopher Faulet	33a1fc883a	BUG/MINOR: sample: Fix bytes converter if offset is bigger than sample length When the bytes converter was improved to be able to use variables (`915e48675` ["MEDIUM: sample: Enhances converter "bytes" to take variable names as arguments"]), the behavior of the sample slightly change. A failure is reported if the given offset is bigger than the sample length. Before, a empty binary sample was returned. This patch fixes the converter to restore the original behavior. The function was also refactored to properly handle failures by removing SMP_F_MAY_CHANGE flag. Because the converter now handles variables, the conversion to an integer may fail. In this case SMP_F_MAY_CHANGE flag must be removed to be sure the caller will not retry. This patch should fix the issue #2335. No backport needed except if commit above is backported.	2023-11-13 11:06:05 +01:00
William Lallemand	a06f6212c9	MEDIUM: startup: 'haproxy -c' is quiet when valid MODE_CHECK does not output "Configuration file is valid" by default anymore. To display this message the -V option must be used with -c. However the warning and errors are still output by default if they exist. This allows to clean the output of the systemd unit file with is doing a -c.	2023-11-13 09:59:34 +01:00
Willy Tarreau	cf07cb96be	BUG/MEDIUM: proxy: always initialize the default settings after init The proxy's initialization is rather odd. First, init_new_proxy() is called to zero all the lists and certain values, except those that can come from defaults, which are initialized by proxy_preset_defaults(). The default server settings are also only set there. This results in these settings not to be set for a number of internal proxies that do not explicitly call proxy_preset_defaults() after allocation, such as sink and log forwarders. This was revealed by last commit `79aa63823` ("MINOR: server: always initialize pp_tlvs for default servers") which crashes in log parsers when applied to certain proxies which did not initialize their default servers. In theory this should be backported, however it would be desirable to wait a bit before backporting it, in case certain parts would rely on these elements not being initialized.	2023-11-13 09:17:05 +01:00
Willy Tarreau	79aa638238	MINOR: server: always initialize pp_tlvs for default servers In commit `6f4bfed3a` ("MINOR: server: Add parser support for set-proxy-v2-tlv-fmt") a suspicious check for a NULL srv_tlv was placed in the list_for_each_entry(), that should not be needed. In practice, it's caused by the list head not being initialized, hence the first element is NULL, as shown by Alexander's reproducer below which crashes if the test in the loop is removed: backend dummy default-server send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)] server dummy_server 127.0.0.1:2319 The right place to initialize this field is proxy_preset_defaults(). We'd really need a function to initialize a server :-/ The check in the loop was removed. No backport is needed.	2023-11-13 08:53:28 +01:00
Fr�d�ric L�caille	dfda884633	BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data This issue could be reproduced with a TLS client certificate verificatio to generate enough CRYPTO data between the client and haproxy and with dev/udp/udp-perturb as network perturbator. Haproxy could crash thanks to a BUG_ON() call as soon as in disorder data were bufferized into a non-contiguous buffer. There is no need to pass a non NULL non-contiguous to qc_ssl_provide_quic_data() from qc_ssl_provide_all_quic_data() which handles in order CRYPTO data which have not been bufferized. If not, the first call to qc_ssl_provide_quic_data() to process the first block of in order data leads the non-contiguous buffer head to be advanced to a wrong offset, by <len> bytes which is the length of the in order CRYPTO frame. This is detected by a BUG_ON() as follows: FATAL: bug condition "ncb_ret != NCB_RET_OK" matched at src/quic_ssl.c:620 call trace(11): \| 0x5631cc41f3cc [0f 0b 8b 05 d4 df 48 00]: qc_ssl_provide_quic_data+0xca7/0xd78 \| 0x5631cc41f6b2 [89 45 bc 48 8b 45 b0 48]: qc_ssl_provide_all_quic_data+0x215/0x576 \| 0x5631cc3ce862 [48 8b 45 b0 8b 40 04 25]: quic_conn_io_cb+0x19a/0x8c2 \| 0x5631cc67f092 [e9 1b 02 00 00 83 45 e4]: run_tasks_from_lists+0x498/0x741 \| 0x5631cc67fb51 [89 c2 8b 45 e0 29 d0 89]: process_runnable_tasks+0x816/0x879 \| 0x5631cc625305 [8b 05 bd 0c 2d 00 83 f8]: run_poll_loop+0x8b/0x4bc \| 0x5631cc6259c0 [48 8b 05 b9 ac 29 00 48]: main-0x2c6 \| 0x7fa6c34a2ea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7 \| 0x7fa6c33c2a2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a Thank you to @Tristan971 for having reported this issue in GH #2095. No need to backport.	2023-11-10 18:16:14 +01:00
Aurelien DARRAGON	078ebde870	CLEANUP: sink: useless leftover in sink_add_srv() Removing a useless leftover which has been introduced with `31e8a003a5` ("MINOR: sink: function to add new sink servers")	2023-11-10 17:49:57 +01:00
Aurelien DARRAGON	2694621151	CLEANUP: sink: bad indent in sink_new_from_logger() Fixing bad indent in sink_new_from_logger() which was recently introduced	2023-11-10 17:49:57 +01:00
Aurelien DARRAGON	d710dfbacc	BUG/MINOR: sink: don't learn srv port from srv addr Since `04276f3d` ("MEDIUM: server: split the address and the port into two different fields") we should not use srv->addr to store server's port and rely on srv->svc_port instead. For sink servers, we correctly set >svc_port upon server creation but we didn't use it when initializing address for the connection. As a result, FQDN resolution will not work properly with sink servers. Hopefully, this used to work by accident because sink servers were resolved using the PA_O_RESOLVE flag in str2sa_range(), which made the srv->addr contain the port in addition to the address. But this will fail to work when FQDN resolution is postponed because only ->svc_port will contain the proper server port upon resolution. For instance, FQDN resolution with servers from log backends (which are resolved as regular servers, that is, without the PA_O_RESOLVE) will fail to work because of this. This may be backported as far as 2.2 even though the bug didn't have noticeable effects for versions below 2.9 [In 2.2, sink_forward_session_init() didn't exist it should be applied in sink_forward_session_create()]	2023-11-10 17:49:57 +01:00
Aurelien DARRAGON	64e0b63442	BUG/MEDIUM: server: invalid address (post)parsing checks This bug was introduced with `29b76ca` ("BUG/MEDIUM: server/log: "mode log" after server keyword causes crash ") Indeed, we cannot safely rely on addr_proto being set when str2sa_range() returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because proto lookup might be bypassed when FQDN addresses are involved. Unfortunately, the above patch wrongly assumed that proto would always be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when str2sa_range() was called), resulting in invalid postparsing checks being performed, which could as well lead to crashes with log backends ("mode log" set) because some postparsing init was skipped as a result of proto not being set and this wasn't expected later in the init code. To fix this, we now make use of the previous patch to perform server's address compatibility checks on hints that are always set when str2sa_range() succesfully returns. For log backend, we're also adding a complementary test to check if the address family is of expected type, else we report an error, plus we're moving the postinit logic in log api since _srv_check_proxy_mode() is only meant to check proxy mode compatibility and we were abusing it. This patch depends on: - "MINOR: tools: make str2sa_range() directly return type hints" No backport required unless `29b76ca` gets backported.	2023-11-10 17:49:57 +01:00
Aurelien DARRAGON	12582eb8e5	MINOR: tools: make str2sa_range() directly return type hints str2sa_range() already allows the caller to provide <proto> in order to get a pointer on the protocol matching with the string input thanks to `5fc9328a` ("MINOR: tools: make str2sa_range() directly return the protocol") However, as stated into the commit message, there is a trick: "we can fail to return a protocol in case the caller accepts an fqdn for use later. This is what servers do and in this case it is valid to return no protocol" In this case, we're unable to return protocol because the protocol lookup depends on both the [proto type + xprt type] and the [family type] to be known. While family type might not be directly resolved when fqdn is involved (because family type might be discovered using DNS queries), proto type and xprt type are already known. As such, the caller might be interested in knowing those address related hints even if the address family type is not yet resolved and thus the matching protocol cannot be looked up. Thus in this patch we add the optional net_addr_type (custom type) argument to str2sa_range to enable the caller to check the protocol type and transport type when the function succeeds.	2023-11-10 17:49:57 +01:00
Christopher Faulet	ebf90ca550	BUG/MEDIUM: applet: Remove appctx from buffer wait list on release For now, the appctx is removed from the buffer wait list when it is freed. However, when it is released, it is not necessarily freed immediately. But it is detached from the SC. If it is still registered in the buffer wait list, it could then be woken up to get a buffer. At this stage it is totally unexpected, especially because we must access the SC. The fix is obvious, the appctx must be removed from the buffer wait list on release. Note this bug exists because the appctx was moved at the mux level. This patch must be backported as far as 2.6.	2023-11-10 17:49:57 +01:00
Amaury Denoyelle	150c0da889	MEDIUM: quic: release conn socket before using quic_cc_conn After emission/reception of a CONNECTION_CLOSE, a connection enters the CLOSING state. In this state, only minimal exchanges occurs as only the packets which containted the CONNECTION_CLOSE frame can be reemitted. In conformance with the RFC, most resources are released and quic_conn instance is converted to the lighter quic_cc_conn. Push further this optimization by closing quic_conn socket FD before switching to a quic_cc_conn. This means that quic_cc_conn will rely on listener socket for its send/recv operation. This should not impact performance as as stated input/output are minimal on this state. This patch should improve FD consumption as prior to this a socket FD was kept during the closing delay which could cause maxsock to be reached for other connections. Note that fd member is kept in QUIC_CONN_COMMON and not removed from quic_cc_conn. This is because quic_cc_conn relies on qc_snd_buf() which access this field. As a side-effect to this change, jobs accounting for quic_conn is also updated. quic_cc_conn instances are now not counted as jobs. Indeed, the main objective of jobs is to prevent haproxy process to be stopped with data truncation. However, this relies on the connection to uses its owned socket as the listener socket is shut down inconditionaly on shutdown. A consequence of the jobs handling change is that haproxy process will be closed if only quic_cc_conn instances are present, thus preventing to respect the closing state. In case of a reload, if a client missed a CONNECTION_CLOSE frame just before process shutdown, it will probably received a Stateless Reset on sending retry. This change is considered safe as, for now, haproxy only emits CONNECTION_CLOSE on error conditions (such as protocol violation or timeout). It is considered as expected to suffer from data truncation from this. However, if connection closing is reused by haproxy to implement clean shutdown, it should be necessary to delay CONNECTION_CLOSE frame emission to ensure no data truncation happens here.	2023-11-10 15:27:45 +01:00
Amaury Denoyelle	f549eb2b34	MEDIUM: quic: respect closing state even on soft-stop Prior to this patch, a special condition was set when idle timer was rearmed for closing connections during haproxy process stopping. In this case, the timeout was ditched and the idle task woken up immediatly. The objective was to release quickly closing connections to not prevent the process stopping to be too long. However, it is not conform with RFC 9000 recommandations and may cause some clients to miss a CONNECTION_CLOSE in case of a packet loss. A recent fix was set to use a shorter timeout for closing state. Now a connection should only be left in this state for one second or less. This reduces greatly the importance of stopping special condition. Thus, this patch removes it completely.	2023-11-10 15:26:03 +01:00
Amaury Denoyelle	75e36c57f0	BUG/MINOR: quic: remove dead code in error path In quic_rx_pkt_retrieve_conn(), err label is now only used if qc is NULL. Thus, condition on qc can be removed. No need to backport. This issue was reported by coverity on github. This should fix issue #2338.	2023-11-10 15:26:03 +01:00
Willy Tarreau	0a7ab7067f	OPTIM: mux-h2: don't allocate more buffers per connections than streams When an H2 mux works with a slow downstream connection and without the mux-mux mode, it is possible that a single stream will allocate all 32 buffers in the connection. This is not desirable at all because 1) it brings no value, and 2) it allocates a lot of memory per connection, which, in addition to using a lot of memory, tends to degrade performance due to cache thrashing. This patch improves the situation by refraining from sending data frames over a connection when more mbufs than streams are allocated. On a test featuring 10k connections each with a single stream reading from the cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs, and improves the bandwidth. This may even be backported later to recent versions to improve memory usage. Note however that it is efficient only when combined with `e16762f8a` ("OPTIM: mux-h2: call h2_send() directly from h2_snd_buf()"), and tends to slightly reduce the single-stream performance without it, so in case of a backport, the two need to be considered together.	2023-11-09 17:24:00 +01:00
Willy Tarreau	a13f8425f0	MINOR: task/debug: make task_queue() and task_schedule() possible callers It's common to see process_stream() being woken up by wake_expired_tasks in the profiling output, without knowing which timeout was set to cause this. By making it possible to record the call places of task_queue() and task_schedule(), and by making wake_expired_tasks() explicitly not replace it, we'll be able to know which task_queue() or task_schedule() was triggered for a given wakeup. For example below: process_stream 51200 311.4ms 6.081us 34.59s 675.6us <- run_tasks_from_lists@src/task.c:659 task_queue process_stream 19227 70.00ms 3.640us 9.813m 30.62ms <- sc_notify@src/stconn.c:1136 task_wakeup process_stream 6414 102.3ms 15.95us 8.093m 75.70ms <- stream_new@src/stream.c:578 task_wakeup It's visible that it's the run_tasks_from_lists() which in fact applies on the task->expire returned by the ->process() function itself.	2023-11-09 17:24:00 +01:00
Amaury Denoyelle	4dee110f56	BUG/MINOR: quic: fix retry token check inconsistency A client may send multiple INITIAL packets if ClientHello is too big for only one. In case a Retry token is used, the client must reuse it for every INITIAL packets. On the haproxy server side, there was an inconsistency to handle these packets depending on the socket mode : * when using listener socket, token is always revalidated. * when using connection socket, token check is bypassed. This is because quic_conn instance is known through its socket and thus quic_rx_pkt_retrieve_conn() is not necessary. RFC 9000 does not seems to mandate retry token validation after the first INITIAL packet per connection. Thus, this patch chooses to bypass the check every time the connection instance is known, as this indicates that a previous token was already validated. This should be backported up to 2.7.	2023-11-09 16:57:37 +01:00
Amaury Denoyelle	bb28215d9b	MEDIUM: quic: define an accept queue limit QUIC connections are pushed manually into a dedicated listener queue when they are ready to be accepted. This happens after handshake finalization or on 0-RTT packet reception. Listener is then woken up to dequeue them with listener_accept(). This patch comptabilizes the number of connections currently stored in the accept queue. If reaching a certain limit, INITIAL packets are dropped on reception to prevent further QUIC connections allocation. This should help to preserve system resources. This limit is automatically derived from the listener backlog. Half of its value is reserved for handshakes and the other half for accept queues. By default, backlog is equal to maxconn which guarantee that there can't be no more than maxconn connections in handshake or waiting to be accepted.	2023-11-09 16:24:00 +01:00
Amaury Denoyelle	3df6a60113	MEDIUM: quic: limit handshake per listener Implement a limit per listener for concurrent number of QUIC connections. When reached, INITIAL packets for new connections are automatically dropped until the number of handshakes is reduced. The limit value is automatically based on listener backlog, which itself defaults to maxconn. This feature is important to ensure CPU and memory resources are not consume if too many handshakes attempt are started in parallel. Special care is taken if a connection is released before handshake completion. In this case, counter must be decremented. This forces to ensure that member <qc.state> is set early in qc_new_conn() before any quic_conn_release() invocation.	2023-11-09 16:23:52 +01:00
Amaury Denoyelle	278808915b	MINOR: quic: reduce half open counters scope Accounting is implemented for half open connections which represent QUIC connections waiting for handshake completion. When reaching a certain limit, Retry mechanism is automatically activated prior to instantiate new connections. The issue with this behavior is that two notions are mixed : QUIC connection handshake phase and Retry which is mechanism against amplification attacks. As such, only peer address validation should be taken into account to activate Retry protection. This patch chooses to reduce the scope of half_open_conn. Now only connection waiting to validate the peer address are now accounted for. Most notably, connections instantiated with a validated Retry token check are not accounted. One impact of this patch is that it should prevent to activate Retry mechanism too early, in particular in case if multiple handshakes are too slow. Another limitation should be implemented to protect against this scenario.	2023-11-09 16:23:52 +01:00
Amaury Denoyelle	d38bb7f8a7	MEDIUM: quic: adjust address validation When a new QUIC connection is created, server considers peer address as not yet validated. The server must limit its sending up to 3 times the content already received. This is a defensive measure to avoid flooding a remote host victim of address spoofing. This patch adjust the condition to consider the peer address as validated. Two conditions are now considered : * successful handling of a received HANDSHAKE packet. This was already done before although implemented in a different way. * validation of a Retry token. This was not considered prior this patch despite RFC recommandation. This patch also adjusts how a connection is internally labelled as using a validated peer address. Before, above conditions were checked via quic_peer_validated_addr(). Now, a flag QUIC_FL_CONN_PEER_VALIDATED_ADDR is set to labelled this. It already existed prior this patch but was only used for quic_cc_conn. This should now be more explicit.	2023-11-09 16:23:52 +01:00
Christopher Faulet	3a051ca0c8	BUG/MEDIUM: mux-h1: Exit early if fast-forward is not supported by opposite SC The commit `4be0c7c65` ("MEDIUM: stconn/muxes: Loop on data fast-forwarding to forward at least a buffer") introduced a regression. In h1_fastfwd(), if data fast-forwarding is not supported by the opposite SC, we must exit without calling se_donn_ff(). Otherwise a BUG_ON() will be triggered because the opposite mux has no .done_fastfwd() callback function. No backport needed.	2023-11-09 15:18:43 +01:00
William Lallemand	3ac3a06963	MEDIUM: mworker: -W is mandatory when using -S Defining a master CLI without the master-worker mode emits a warning since version 1.8. This patch enforce the behavior by forbiding the usage of the -S option without the master-worker mode.	2023-11-09 15:07:15 +01:00
William Lallemand	da24b462c3	MEDIUM: errors: move the MODE_QUIET test in print_message() Move the MODE_QUIET and MODE_VERBOSE test in print_message() so we always output in the startup-logs even with MODE_QUIET. ha_warning(), ha_alert() and ha_notice() does not check the MODE_QUIET and MODE_VERBOSE anymore, it is done before doing the fprintf() in print_message().	2023-11-09 14:39:11 +01:00
William Lallemand	59d699c0c4	MINOR: errors: does not check MODE_STARTING for log emission ha_alert(), ha_warning() and ha_notice() shouldn't check MODE_STARTING for log emission. Let's remove the check. This shouldn't do much since the stdio_quiet() function mute the output in main().	2023-11-09 14:39:11 +01:00
William Lallemand	b959b752f9	MINOR: errors: ha_alert() and ha_warning() uses warn_exec_path() Move the code to display the haproxy version and path during starting mode, which is called by the first ha_alert() or ha_warning().	2023-11-09 14:39:11 +01:00
Christopher Faulet	78021ee9ef	BUG/MEDIUM: stconn: Don't update stream expiration date if already expired The commit `08d7169f4` ("MINOR: stconn: Don't queue stream task in past in sc_notify()") tried to fix issues with epiration date set in past for the stream in sc_notify(). However it remains some cases where the stream expiration date may already be expired before recomputing it. This happens when an event is reported by the mux exactly when a timeout is triggered. In this case, depending on the scheduling, the SC may be woken up before the stream. For these cases, we fall into the BUG_ON() preventing to queue in the past. So, it remains unexpected to queue a task in the past. The BUG_ON() is correct at this place. We must just avoid to recompute the stream expiration date if it is already expired. At worst, the stream will be woken up for nothing. But it is not really a big deal because it will only happen on timeouts from time to time. It is so sporadic that we can ignore it from a performance point of view. This patch must be backpoted to 2.8. Be careful to remove the BUG_ON() on the 2.8.	2023-11-09 12:08:59 +01:00
Frédéric Lécaille	819690303d	BUG/MEDIUM: quic: Avoid some crashes upon TX packet allocation failures If a TX packet cannot be allocated (by qc_build_pkt()), as it can be coalesced to another one, this leads the TX buffer to have remaining not sent prepared data. Then haproxy crashes upon a BUG_ON() triggered by the next call to qc_txb_release(). This may happen only during handshakes. To fix this, qc_build_pkt() returns a new -3 error to dected such allocation failures followed which is for now on followed by a call to qc_purge_txbuf() to send the TX prepared data and purge the TX buffer. Must be backported as far as 2.6.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	b21e08cbd2	BUG/MEDIUM: quic: Possible crashes when sending too short Initial packets This may happen during handshakes when Handshake packets cannot be coalesced to a first Initial packet because of TX frame allocation failures (from qc_build_frms()). This leads too short (not padded) Initial packets to be sent. This is detected by a BUG_ON() in qc_send_ppkts(). To avoid this an Handshake packet without ack-eliciting frames which should have been built by qc_build_frms() is built. Must be backported as far as 2.6.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	c78cb49a3b	BUG/MEDIUM: quic: Avoid trying to send ACK frames from an empty ack ranges tree This may happen upon ack ranges allocation failures (from quic_update_ack_ranges_list(). This can lead to empty trees of ack ranges to be used to build ACK frames which is not good at all. Furthermore this is detected by a BUG_ON() (in qc_do_build_pkt()). To avoid this, simply update the acknowledgemen state of the connection only if quic_update_ack_ranges_list() succeeds, as it fails only in case of memory allocation failures. Must be backported as far as 2.6.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	4e3b28e8b6	BUG/MEDIUM: quic: Too short Initial packet sent (enc. level allocation failed) If the Handshake encryption level could not be allocated, this could lead to Initial packets to be sent because no Handshake CRYPTO frames were generated. Furthermore in such an allocation failure case, the connection should be closed as soon as possible. This is done making ha_quic_set_encryption_secrets() return 0 upon an encryption level allocation failure. Also fix a typo in the trace in relation to this allocation failure. No need to be backported.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	4cf784f38e	MINOR: quic: Avoid zeroing frame structures Do not initialize anymore ->type of quic_frame structures which leads to the others to be zeroed.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	f1be725474	CLEANUP: quic: Indentation fix in qc_do_build_pkt() Modification without any functional impact.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	7ecf4b34b9	BUG/MINOR: quic: idle timer task requeued in the past When the idle timer expired with a still present mux, this task was not freed and even requeued with a timer in the past. Fix this issue calling task_destroy() in this case. As the task is freed, its handler must return NULL setting local <t> variable to NULL in every cases. Also ensure that this timer task is not armed again after having been released with a <return> statement when this is the case from qc_idle_timer_do_rearm(). Must be backported as far as 2.6.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	b48abf0beb	MINOR: quic: Add idle timer task pointer to traces Helpful to detect if this timer was freed or not.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	4cfae3ac01	MINOR: quic: release the TLS context asap from quic_conn_release() This was no reason not to release as soon as possible the TLS/SSL QUIC connection context from quic_conn_release() before allocating a "closing connection" connection (quic_cc_conn struct).	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	3a8dd48e30	MEDIUM: quic: Heavy task mode with non contiguously bufferized CRYPTO data This patch sets the handshake task in heavy task mode when receiving in disorder CRYPTO data which results in in order bufferized CRYPTO data. This is done thanks to a non-contiguous buffer and from qc_handle_crypto_frm() after having potentially bufferized CRYPTO data in this buffer. qc_treat_rx_crypto_frms() is no more called from qc_treat_rx_pkts() but instead this is where the task is set in heavy task mode. Consequently, this is the job of qc_ssl_provide_all_quic_data() to call directly qc_treat_rx_crypto_frms() to provide the in order bufferized CRYPTO data to the TLS stack. As this function releases the non-contiguous buffer for the CRYPTO data, if possible, there is no need to do that from qc_treat_rx_crypto_frms() anymore.	2023-11-09 10:32:31 +01:00
Frédéric Lécaille	94d20be138	MEDIUM: quic: Heavy task mode during handshake Add a new pool for the CRYPTO data frames received in order. Add ->rx.crypto_frms list to each encryption level to store such frames when they are received in order from qc_handle_crypto_frm(). Also set the handshake task (qc_conn_io_cb()) in heavy task mode from this function after having received such frames. When this task detects that it is set in heavy mode, it calls qc_ssl_provide_all_quic_data() newly implemented function to provide the CRYPTO data to the TLS task. Modify quic_conn_enc_level_uninit() to release these CRYPTO frames when releasing the encryption level they are in relation with.	2023-11-09 10:32:31 +01:00
Christopher Faulet	84d26bcf3f	MINOR: stconn/mux-h2: Use a iobuf flag to report EOI to consumer side during FF IOBUF_FL_EOI iobuf flag is now set by the producer to notify the consumer that the end of input was reached. Thanks to this flag, we can remove the ugly ack in h2_done_ff() to test the opposite SE flags. Of course, for now, it works and it is good enough. But we must keep in mind that EOI is always forwarded from the producer side to the consumer side in this case. But if this change, a new CO_RFL_ flag will have to be added to instruct the producer if it can forward EOI or not.	2023-11-08 21:14:07 +01:00
Christopher Faulet	4be0c7c655	MEDIUM: stconn/muxes: Loop on data fast-forwarding to forward at least a buffer In the mux-to-mux data forwarding, we now try, as far as possible to send at least a buffer. Of course, if the consumer side is congested or if nothing more can be received, we leave. But the idea is to retry to fast-forward data if less than a buffer was forwarded. It is only performed for buffer fast-forwarding, not splicing. The idea behind this patch is to optimise the forwarding, when a first forward was performed to complete a buffer with some existing data. In this case, the amount of data forwarded is artificially limited because we are using a non-empty buffer. But without this limitation, it is highly probable that a full buffer could have been sent. And indeed, with H2 client, a significant improvement was observed during our test. To do so, .done_fastfwd() callback function must be able to deal with interim forwards. Especially for the H2 mux, to remove H2_SF_NOTIFIED flags on the H2S on the last call only. Otherwise, the H2 stream can be blocked by itself because it is in the send_list. IOBUF_FL_INTERIM_FF iobuf flag is used to notify the consumer it is not the last call. This flag is then removed on the last call.	2023-11-08 21:14:07 +01:00
Willy Tarreau	a57f2a5cfe	BUG/MEDIUM: pool: try once to allocate from another bucket if empty In order to limit inter-thread contention on the global pool, in 2.9-dev3 with commit `7bf829ace` ("MAJOR: pools: move the shared pool's free_list over multiple buckets"), it was decided that if the selected bucket had an empty free list, we would simply give up and fall back to the OS allocator. But this causes allocations to be made from the OS for certain threads, to be released to overloaded pools that are sent back to the OS. One visible effect is that sending a lot of traffic using h2load with 100 parallel streams over 100 connections causes 5-10k buffers to be allocated, then reducing the load to only 10 connections doesn't make these allocations go down, just because some buckets are no longer visited. Tests show that giving a second chance to pick another bucket in this case is sufficient to visit all other buckets and recycle their pending objects. Now "show pools" that starts at 10k buffers at 100 connections goes down to about 150 with 1 connection and 100 streams in a fraction of a second. No backport is needed, as the issue is only in 2.9.	2023-11-08 17:14:03 +01:00
Willy Tarreau	a9ae094b27	BUG/MINOR: pool: check one other random bucket on alloc conflict Since 2.9-dev3 with commit `7bf829ace` ("MAJOR: pools: move the shared pool's free_list over multiple buckets"), the global pool supports multiple heads to reduce inter-thread contention. However, when grabbing a freelist head fails because another thread is already picking from it, we just skip to the next one and try again. Unfortunately, it still maintains a bit of contention between thread pairs when for some reasons only a few threads are used. This may happen for example when running on a 4- or 8- thread system and the two most active ones end up on adjacent buckets. A better and much simpler solution consists in visiting a random bucket instead of the current one. Tests show that the CPU usage spent in pool_refill_local_from_shared() reduces at low number of connections (hence threads). No backport is needed, as the issue is only in 2.9.	2023-11-08 17:12:49 +01:00
Christopher Faulet	5705a6e3b7	BUG/MEDIUM: freq-ctr: Don't report overshoot for long inactivity period The function returning the excess of events over the current period for a target frequency (the overshoot) has a flaw if the inactivity period is too long. In this case, the result may overflow. Instead to be negative, a very high positive value is returned. This function is used by the bandwidth limitation filter. It means after a long inactivity period, a huge burst may be detected while it should not. In fact, the problem arise from the moment we're past the current period. In this case, we should not report any overshoot and just get the number of remaining events as usual. This patch should be backported as far as 2.7.	2023-11-08 16:38:06 +01:00
Christopher Faulet	2c9c2f9d77	BUG/MINOR: mux-h1: Properly handle http-request and http-keep-alive timeouts It is now the turn for the H1 mux to be fix to properly handle http-request and http-keep-alive timeouts. It is quite surprising but it is broken since the 2.2. For idle connections on client side, the smallest value between the client timeout and the http-request/http-keep-alive timeout is used while the client timeout should only be used if other ones are not defined. So, if the client timeout is the smallest value, the keep-alive timeout is not respected. It is only an issue for idle client connections. The http-request timeout is respected from the moment part of the next request was received. This patch should fix the issue #2334. It must be backported as far as 2.2. But be careful during the backports. The H1 mux had evolved a lot since the 2.2.	2023-11-08 16:38:06 +01:00
Aurelien DARRAGON	8dae361f35	MINOR: stktable/cli: support v6tov4 and v4tov6 conversions Add a special treatment for the IPV4 and IPV6 cases in table_process_entry_per_key() function so that input string is parsed in best effort (STR to pseudo type ADDR): input format is first considered over table type and then let smp_to_stkey() do the type conversion for us when needed. This patch heavily depends on: - "MEDIUM: stktable/cli: simplify entry key handling" And optionally depends on: - `72514a44` ("MEDIUM: tools/ip: v4tov6() and v6tov4() rework")	2023-11-08 16:38:06 +01:00
Aurelien DARRAGON	0a47e6bccc	MEDIUM: stktable/cli: simplify entry key handling Make use of smp_to_stkey() in table_process_entry_per_key() to simplify key handling and leverage auto type conversions from sample API. One noticeable side effect is that integer input checks will be relaxed given that c_str2int() sample conv is more permissible than the integrated table_process_entry_per_key() integer parser.	2023-11-08 16:38:06 +01:00
Aurelien DARRAGON	c6826b9570	BUG/MINOR: stick-table/cli: Check for invalid ipv4 key When an ipv4 key is used to filter a CLI command on a stick table clear/set/show table ...), inetaddr_host+htonl combination was used with no error checking. Instead, we now use inet_pton(), which is what we use for ipv6 addresses since `b7c962b0c0` ("BUG/MINOR: stick-table/cli: Check for invalid ipv6 key") Doing this allows us to easily check for parsing errors: we're trading off some parsing efficience to better catch input errors and ensure we get similar behavior between ipv4 and ipv6 addresses handling. This patch may be backported to all supported versions.	2023-11-08 16:38:06 +01:00
Christopher Faulet	ba6ad4654e	BUG/MINOR: mux-h1: Release empty ibuf during data fast-forwarding We must take care to release H1 input buffer when it is emptied during the fast-forwarding nego. Otherwise, it may be kept allocated for a while, waiting for the next "normal" receive or the H1C release. No backport needed.	2023-11-08 16:38:06 +01:00
Amaury Denoyelle	d434acd8bb	MINOR: proto_reverse_connect: use connect timeout Use backend connect timeout when a new connection is instantiated for rhttp. This ensures that if connect operation fails after a certain delay, reverse_connect listener task is woken up. This allows to free the current connection and retry a new connect. As a consequence of this change, rev_process() may be woken up even if connection is not reported with CO_FL_ERROR. This happens if timeout fired before any network reported issue. Connection freeing is adjusted as in this case MUX instance is already allocated. Use destroy callback to release MUX context prior to the connection itself. This patch is really useful as a side measure for a haproxy bug impacting connect with SSL for both backend connections and active reverse connect. This is caused by the delayed allocation of MUX allocation. Asynchronous connect error detected at the socket layer is not notified to upper layers. Currently, only connect timeout allows to release this failed connection.	2023-11-08 10:17:43 +01:00

1 2 3 4 5 ...

16773 Commits