haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-26 17:11:31 +02:00

Author	SHA1	Message	Date
Christopher Faulet	cc640e851a	BUG/MINOR: log: Preserve message facility when the log target is a ring buffer When a ring is used as log target, the original facility, if any, must be preserved. The default facility must only be used if no facility was found in the incoming log message. This patch should fix the issue #1901. It must be backported as far as 2.4.	2022-10-20 09:03:19 +02:00
Amaury Denoyelle	9e3026c58d	MINOR: quic: extend Retry token check function On Initial packet reception, the token is checked for validity through the quic_retry_token_check() function. However, some related parts were left in the parent function quic_rx_pkt_retrieve_conn(). Move this code directly into quic_retry_token_check() to facilitate its call in various contexts. The API of quic_retry_token_check() has also been refactored. Instead of working on a plain char* buffer, it now uses a quic_rx_packet instance. This helps to reduce the number of parameters. This change will allow checking the Retry token even if data were received with a FD-owned quic-conn socket. Indeed, in this case, the quic_rx_pkt_retrieve_conn() call will probably be skipped. This should be backported up to 2.6.	2022-10-19 18:45:58 +02:00
Amaury Denoyelle	6e56a9e055	MINOR: quic: refactor packet drop on reception Sometimes, a packet is dropped on reception. Several goto statements are used, mostly to increment a proxy drop counter or drop silently the packet. However, these labels are interleaved. Rearrange goto labels to simplify this process : * drop label is used to drop a packet with counter incrementation. This is the default method. * drop_silent is the next label which does the same thing but skips the counter incrementation. This is useful when we do not need to report the packet dropping operation. This should be backported up to 2.6.	2022-10-19 18:45:58 +02:00
Amaury Denoyelle	982896961c	MINOR: quic: split and rename qc_lstnr_pkt_rcv() This change continues the qc_lstnr_pkt_rcv() refactoring. This function has finally been split into several ones. The first half is renamed quic_rx_pkt_parse(). This function is responsible for parsing a QUIC packet header and calculating the packet length. QUIC connection retrieval has been extracted and is now called directly by quic_lstnr_dghdlr(). The second half of qc_lstnr_pkt_rcv() is renamed to qc_rx_pkt_handle(). This function is responsible for copying a QUIC packet content to a quic-conn receive buffer. A third function named qc_rx_check_closing() is responsible for detecting if the connection is already in closing state. As this requires dropping the whole datagram, it seems justified to be in a separate function. This change has no functional impact. It is part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate the integration of FD-owned quic-conn socket patches. This should be backported up to 2.6.	2022-10-19 18:45:55 +02:00
Amaury Denoyelle	449b1a8f55	MINOR: quic: extract connection retrieval Simplify qc_lstnr_pkt_rcv() by extracting code responsible to retrieve the quic-conn instance. This code is put in a dedicated function named quic_rx_pkt_retrieve_conn(). This new function could be skipped if a FD-owned quic-conn socket is used. The first traces of qc_lstnr_pkt_rcv() have been cleaned up as qc instance is always NULL here : thus qc parameter can be removed without any change. This change has no functional impact. It is a part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of FD-owned socket patches. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	deb7c87f55	MINOR: quic: define first packet flag Received packets treatment has some difference regarding if this is the first one or not of the encapsulating datagram. Previously, this was set via a function argument. Simplify this by defining a new Rx packet flag named QUIC_FL_RX_PACKET_DGRAM_FIRST. This change does not have functional impact. It will simplify API when qc_lstnr_pkt_rcv() is broken into several functions : their number of arguments will be reduced thanks to this patch. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	845169da58	MINOR: quic: extend pn_offset field from quic_rx_packet pn_offset field was only set if header protection cannot be removed. Extend the usage of this field : it is now set every time on packet parsing in qc_lstnr_pkt_rcv(). This change helps to clean up API of Rx functions by removing unnecessary variables and function arguments. This change has no functional impact. It is a part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of FD-owned socket patches. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	0eae57273b	MINOR: quic: add version field on quic_rx_packet Add a new field version on quic_rx_packet structure. This is set on header parsing in qc_lstnr_pkt_rcv() function. This change has no functional impact. It is a part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of FD-owned socket patches. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	6c940569f6	BUG/MINOR: quic: fix buffer overflow on retry token generation When generating a Retry token, client CID is used as encryption input. The client must reuse the same CID when emitting the token in a new Initial packet. A memory overflow can occur on quic_generate_retry_token() depending on the size of client CID. This is because space reserved for <aad> only accounted for QUIC_HAP_CID_LEN (size of haproxy owned generated CID). However, the client CID size only depends on client parameter and is instead limited to QUIC_CID_MAXLEN as specified in RFC9000. This was reproduced with ngtcp2 and haproxy built with ASAN. Here is the error log : ==14964==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffee228cee at pc 0x7ffff785f427 bp 0x7fffee2289e0 sp 0x7fffee228188 WRITE of size 17 at 0x7fffee228cee thread T5 #0 0x7ffff785f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 #1 0x555555906ea7 in quic_generate_retry_token_aad src/quic_conn.c:5452 #2 0x555555907e72 in quic_retry_token_check src/quic_conn.c:5577 #3 0x55555590d01e in qc_lstnr_pkt_rcv src/quic_conn.c:6103 #4 0x5555559190fa in quic_lstnr_dghdlr src/quic_conn.c:7179 #5 0x555555eb0abf in run_tasks_from_lists src/task.c:590 #6 0x555555eb285f in process_runnable_tasks src/task.c:855 #7 0x555555d9118f in run_poll_loop src/haproxy.c:2853 #8 0x555555d91f88 in run_thread_poll_loop src/haproxy.c:3042 #9 0x7ffff709f8fc (/usr/lib/libc.so.6+0x868fc) #10 0x7ffff7121a5f (/usr/lib/libc.so.6+0x108a5f) This must be backported up to 2.6.	2022-10-18 14:36:47 +02:00
Frédéric Lécaille	ea492e3e47	BUILD: quic: Fix build for m68k cross-compilation Fix several warnings such as this one: src/qmux_trace.c:80:45: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t’ {aka ‘const long long unsigned int’} [-Werror=format=] 80 \| chunk_appendf(&trace_buf, " qcs=%p .id=%lu .st=%s", \| ~~^ \| \| \| long unsigned int \| %llu 81 \| qcs, qcs->id, \| ~~~~~~~ \| \| \| uint64_t {aka const long long unsigned int} compilation terminated due to -Wfatal-errors. Cast remaining uint64_t variables as ullong with %llu as printf format and size_t others as ulong with %lu as printf format. Thank you to Ilya for having reported this issue in GH #1899. Must be backported to 2.6	2022-10-18 12:04:10 +02:00
Amaury Denoyelle	ba303deadc	BUILD: ssl_sock: fix null dereference for QUIC build A previous commit tries to fix uninitialized GCC warning on ssl code for QUIC build. See the fix here : 48e46f98ccf97427995eb41c6f28cc38705bdd7e BUILD: ssl_sock: bind_conf uninitialized in ssl_sock_bind_verifycbk() However, this is incomplete as it still reports possible NULL dereference on ctx variable (GCC v12.2.0). Here is the compilation result : src/ssl_sock.c: In function ‘ssl_sock_bind_verifycbk’: src/ssl_sock.c:1739:12: error: potential null pointer dereference [-Werror=null-dereference] 1739 \| ctx->xprt_st \|= SSL_SOCK_ST_FL_VERIFY_DONE; \| To fix this, remove the check on qc, which can never happen either, and replace it with a BUG_ON. This seems to satisfy GCC on my machine. This must be backported up to 2.6.	2022-10-17 18:58:09 +02:00
Thierry Fournier	74a9eb5216	BUG/MEDIUM: httpclient: segfault when the httpclient parser fails If the uri is unexpected ("/" in place of "http://xxx/"), some parsing functions fail. The failure is not handled. This patch handles these errors. Note: the return code is boolean, maybe we can return a more precise error for Lua reporting? Must be backported in 2.6.	2022-10-17 12:04:06 +02:00
Frédéric Lécaille	5a5d05c71b	BUILD: quic: QUIC mux build fix for 32-bit build Thank you to Ilya for having reported this issue in GH #1897 Must be backported to 2.6.	2022-10-14 22:43:08 +02:00
Christopher Faulet	380ae9c3ff	MINOR: httpclient/lua: Don't set req_payload callback if body is empty The HTTP client req_payload callback is set when a request payload must be streamed. In Lua, this callback is set when a body is passed as argument in one of httpclient functions (head/get/post/put/delete). However, there is no reason to set it if body string is empty. This patch is related to the issue #1898. It may be backported as far as 2.5.	2022-10-14 15:18:25 +02:00
Christopher Faulet	48005de17c	BUG/MEDIUM: httpclient: Don't set EOM flag on an empty HTX message In the HTTP client, when the request body is streamed, at the end of the payload, we must be sure to not set the EOM flag on an empty message. Otherwise, because there is no data, the buffer is reset to be released and the flag is lost. Thus, the HTTP client is never notified of the end of payload for the request and the applet is blocked. If the HTTP client is instantiated from a Lua script, it is even worse because we fall into a wakeup loop between the lua script and the HTTP client applet. At the end, HAProxy is killed because of the watchdog. This patch should fix the issue #1898. It must be backported to 2.6.	2022-10-14 15:18:25 +02:00
Frédéric Lécaille	48e46f98cc	BUILD: ssl_sock: bind_conf uninitialized in ssl_sock_bind_verifycbk() Even if this cannot happen, ensure <bind_conf> is initialized in this function to please some compilers. Takes the opportunity of this patch to replace an ABORT_NOW() by a BUG_ON() because if the variable values they test are not initialized, this is really because there is a bug. Must be backported to 2.6.	2022-10-14 10:25:11 +02:00
Willy Tarreau	f5a0c8abf5	MEDIUM: quic: respect the threads assigned to a bind line Right now the QUIC thread mapping derives the thread ID from the CID by dividing by global.nbthread. This is a problem because this makes QUIC work on all threads and ignores the "thread" directive on the bind lines. In addition, only 8 bits are used, which is no more compatible with the up to 4096 threads we may have in a configuration. Let's modify it this way: - the CID now dedicates 12 bits to the thread ID - on output we continue to place the TID directly there. - on input, the value is extracted. If it corresponds to a valid thread number of the bind_conf, it's used as-is. - otherwise it's used as a rank within the current bind_conf's thread mask so that in the end we still get a valid thread ID for this bind_conf. The extraction function now requires a bind_conf in order to get the group and thread mask. It was better to use bind_confs now as the goal is to make them support multiple listeners sooner or later.	2022-10-13 18:08:05 +02:00
William Lallemand	ec1f8a62ca	MINOR: mworker/cli: reload command displays the startup-logs Change the output of the "reload" command, it now displays "Success=0" if the reload failed and "Success=1" if it succeeded. If the startup-logs is available (USE_SHM_OPEN=1), the command will print a "--\n" line, followed by the content of the startup-logs. Example: $ echo "reload" \| socat /tmp/master.sock - Success=1 -- [NOTICE] (482713) : haproxy version is 2.7-dev7-4827fb-69 [NOTICE] (482713) : path to executable is ./haproxy [WARNING] (482713) : config : 'http-request' rules ignored for proxy 'frt1' as they require HTTP mode. [NOTICE] (482713) : New worker (482720) forked [NOTICE] (482713) : Loading success. $ echo "reload" \| socat /tmp/master.sock - Success=0 -- [NOTICE] (482886) : haproxy version is 2.7-dev7-4827fb-69 [NOTICE] (482886) : path to executable is ./haproxy [ALERT] (482886) : config : parsing [test3.cfg:1]: unknown keyword 'Aglobal' out of section. [ALERT] (482886) : config : Fatal errors found in configuration. [WARNING] (482886) : Loading failure! $	2022-10-13 17:59:48 +02:00
William Lallemand	eba6a54cd4	MINOR: logs: startup-logs can use a shm for logging the reload When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use an shm which is used to keep the logs when switching to mworker wait mode. This allows keeping the failed reload logs. When allocating the startup-logs at first start of the process, haproxy will do a shm_open with a unique path using the PID of the process, the file is unlinked immediately so that no stray file is left behind. The fd resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD environment variable so it can be mmapped again when switching to wait mode. When forking children, the process copies the mmap to a mallocated ring so we never share the same memory section between the master and the workers. When switching to wait mode, the shm is not used anymore as it is also copied to a mallocated structure. This allows using the "show startup-logs" command over the master CLI, to get the logs of the latest startup or reload. This way the logs of the latest failed reload are also kept. This is only activated on the linux-glibc target for now.	2022-10-13 16:50:22 +02:00
William Lallemand	9e4ead3095	MINOR: ring: ring_cast_from_area() cast from an allocated area Cast a unified ring + storage area to a ring from area, without reinitializing the data buffer. Reinitialize the waiters and the lock. It helps retrieving a previously allocated ring, from an mmap for example.	2022-10-13 16:45:28 +02:00
Amaury Denoyelle	91b2305ad7	MINOR: quic: implement datagram cleanup for quic_receiver_buf Each time data is read on QUIC receiver socket, we try to reuse the first datagram of the currently used quic_receiver_buf instead of allocating a new one. This algorithm is suboptimal if there are several unused datagrams as only the first one is tested and its buffer removed from quic_receiver_buf. If QUIC traffic is quite substantial, this can lead to a large number of quic_dgram occurrences allocated from pool_head_quic_dgram and a lack of free space in allocated quic_receiver_buf buffers. To improve this, each time we want to reuse a datagram, we pop elements until a not-yet-released datagram is found or the list is empty. All intermediary elements are freed and the last found datagram can be reused. This operation has been extracted in a dedicated function named quic_rxbuf_purge_dgrams(). This should improve memory consumption incurred by quic_dgram instances under heavy QUIC traffic. Note that there is still room for improvement as if the first datagram is still in use, it may block several unused datagrams behind it. However this requires supporting removal of datagrams out of order which is currently not possible. This should be backported up to 2.6.	2022-10-13 11:06:48 +02:00
Amaury Denoyelle	1cba8d60f3	CLEANUP: quic: improve naming for rxbuf/datagrams handling QUIC datagrams are read from a random thread. They are then redispatched to the connection thread according to the first packet DCID. These operations are implemented through a special buffer designed to avoid locking. Refactor this code with the following changes : * <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also renamed to highlight its attach point to a receiver. * <quic_dgram> and <quic_receiver_buf> definitions are moved to quic_sock-t.h. This helps to reduce the size of quic_conn-t.h. * <quic_dgram> list elements are renamed to highlight their attach point into a <quic_receiver_buf> and a <quic_dghdlr>. This should be backported up to 2.6.	2022-10-13 11:06:48 +02:00
Amaury Denoyelle	8c4d062d25	CLEANUP: quic: remove unused rxbufs member in receiver rxbuf is the structure used to store QUIC datagrams and redispatch them to the connection thread. Each receiver manages a list of rxbuf. This was stored both as an array and a mt_list. Currently, only mt_list is needed so remove <rxbufs> member from receiver structure. This should be backported up to 2.6.	2022-10-13 11:05:41 +02:00
Frédéric Lécaille	e1a49cfd4d	MINOR: quic: Split the secrets key allocation in two parts Implement quic_tls_secrets_keys_alloc()/quic_tls_secrets_keys_free() to allocate the memory for only one direction (RX or TX). Modify ha_quic_set_encryption_secrets() to call these functions for one of these directions (or both). So, from now on we can rely on the value of the secret keys to know if it was derived. Remove QUIC_FL_TLS_SECRETS_SET flag which is no longer useful. Consequently, the secrets are dumped by the traces only if derived. Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	4aa7d8197a	BUG/MINOR: quic: Stalled 0RTT connections with big ClientHello TLS message This issue was reproduced with -Q picoquic client option to split a big ClientHello message into two Initial packets and haproxy as server without any knowledge of any previous 0RTT session (restarted after a first 0RTT session). The 0RTT received packets were removed from their queue when the second Initial packet was parsed, and the QUIC handshake state never progressed and remained at Initial state. To avoid such situations, after having treated some Initial packets we always check if there are 0RTT packets to parse and we never remove them from their queue. This will be done after the handshake is completed or upon idle timeout expiration. Also add more traces to be able to analyze the handshake progression. Tested with ngtcp2 and picoquic Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	9f9263ed13	MINOR: quic: Use a non-contiguous buffer for RX CRYPTO data Implement quic_get_ncbuf() to dynamically allocate a new ncbuf to be attached to any quic_cstream struct which needs such a buffer. Note that there is no quic_cstream for 0RTT encryption level. quic_free_ncbuf() is added to release the memory allocated for a non-contiguous buffer. Modify qc_handle_crypto_frm() to call this function and allocate an ncbuf for crypto data which are not received in order. The crypto data which are received in order are not buffered but provided to the TLS stack (calling qc_provide_cdata()). Modify qc_treat_rx_crypto_frms() which is called after having provided the in order received crypto data to the TLS stack to provide again the remaining crypto data which has been buffered, if possible (if they are in order). Each time buffered CRYPTO data were consumed, we try to release the memory allocated for the non-contiguous buffer (ncbuf). Also move rx.crypto.offset quic_enc_level struct member to rx.offset quic_cstream struct member. Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	a20c93e6e2	MINOR: quic: Extract CRYPTO frame parsing from qc_parse_pkt_frms() Implement qc_handle_crypto_frm() to parse a CRYPTO frame. Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	7e3f7c47e9	MINOR: quic: New quic_cstream object implementation Add new quic_cstream struct definition to implement the CRYPTO data stream. This is a simplification of the qcs object (QUIC streams) for the CRYPTO data without any information about the flow control. They are not attached to any tree, but to a QUIC encryption level, one per encryption level except for the early data encryption level (for 0RTT). A stream descriptor is also allocated for each CRYPTO data stream. Must be backported to 2.6	2022-10-13 10:12:03 +02:00
Willy Tarreau	d114f4a68f	MEDIUM: checks: spread the checks load over random threads The CPU usage pattern was found to be high (5%) on a machine with 48 threads and only 100 servers checked every second. That was supposed to be only 100 connections per second, which should be very cheap. It was figured that due to the check tasks unbinding from any thread when going back to sleep, they're queued into the shared queue. Not only this requires to manipulate the global queue lock, but in addition it means that all threads have to check the global queue before going to sleep (hence take a lock again) to figure how long to sleep, and that they would all sleep only for the shortest amount of time to the next check, one would pick it and all other ones would go down to sleep waiting for the next check. That's perfectly visible in time-to-first-byte measurements. A quick test consisting in retrieving the stats page in CSV over a 48-thread process checking 200 servers every 2 seconds shows the following tail: percentile ttfb(ms) 99.98 2.43 99.985 5.72 99.99 32.96 99.995 82.176 99.996 82.944 99.9965 83.328 99.997 83.84 99.9975 84.288 99.998 85.12 99.9985 86.592 99.999 88 99.9995 89.728 99.9999 100.352 One solution could consist in forcefully binding checks to threads at boot time, but that's annoying, will cause trouble for dynamic servers and may cause some skew in the load depending on some server patterns. Instead here we take a different approach. A check remains bound to its thread for as long as possible, but upon every wakeup, the thread's load is compared with another random thread's load. If it's found that that other thread's load is less than half of the current one's, the task is bounced to that thread. In order to prevent that new thread from doing the same, we set a flag "CHK_ST_SLEEPING" that indicates that it just woke up and we're bouncing the task only on this condition. 
Tests have shown that the initial load was very unfair before, with a few checks threads having a load of 15-20 and the vast majority having zero. With this modification, after two "inter" delays, the load is either zero or one everywhere when checks start. The same test shows a CPU usage that significantly drops, between 0.5 and 1%. The same latency tail measurement is much better, roughly 10 times smaller: percentile ttfb(ms) 99.98 1.647 99.985 1.773 99.99 4.912 99.995 8.76 99.996 8.88 99.9965 8.944 99.997 9.016 99.9975 9.104 99.998 9.224 99.9985 9.416 99.999 9.8 99.9995 10.04 99.9999 10.432 In fact one difference here is that many threads work while in the past they were waking up and going down to sleep after having disturbed the shared lock. Thus it is anticipated that this will scale way smoother than before. Under strace it's clearly visible that all threads are sleeping for the time it takes to relaunch a check, there's no more thundering herd wakeups. However it is also possible that in some rare cases such as very short check intervals smaller than a scheduler's timeslice (such as 4ms), some users might have benefited from the work being concentrated on fewer threads and would instead observe a small increase of apparent CPU usage due to more total threads waking up even if that's for less work each and less total work. That's visible with 200 servers at 4ms where show activity shows that a few threads were overloaded and others doing nothing. It's not a problem though, as in practice checks are not supposed to eat much CPU and to wake up fast enough to represent a significant load anyway, and the main issue they could have been causing (aside the global lock) is an increased last-percentile latency.	2022-10-12 21:49:30 +02:00
Willy Tarreau	a840b4a39b	MINOR: checks: use the lighter PRNG for spread checks There's no point using ha_random32() which is heavy and uses shared variables to calculate a random timer when we have statistical_prng() which does the same and was made exactly for this.	2022-10-12 21:49:30 +02:00
Willy Tarreau	99521abd59	BUG/MINOR: server: make sure "show servers state" hides private bits In the past we've seen "show servers state" dump some internal bits for the check states, that were causing regtests to fail. The relevant bits have been added to the doc to fix the public API and make sure they do not change by accident, but the output doesn't take care of masking the undesired ones, causing regtests (and possibly user programs) to fail when new bits are added. Let's add the mask for the only documented ones (0x0F for check and 0x1F for agent respectively). This could be backported wherever the server state is present, though there's a tiny risk that some undocumented bits might have already leaked to some user scripts, so it might be wise to wait a bit before doing that or even not to backport too far.	2022-10-12 21:45:39 +02:00
Christopher Faulet	104985610d	BUG/MEDIUM: mux-h1: Handle abort with an incomplete message during parsing In h1_process_demux(), aborts for incomplete messages were not properly handled. It was not an issue because the abort was detected later in h1_process(). But it will be an issue to perform the aborts refactoring. First, when a read0 was detected, the SE_FL_EOI flag was set for messages in DONE or TUNNEL state or for messages without known length (so responses in close mode). The last statement is not accurate. The message must also be in DATA state. Otherwise, SE_FL_EOI flag may be set on incomplete message. Then, an error was reported, via SE_FL_ERROR flag, only when an incomplete message was detected on the payload parsing. It must also be reported if headers are incomplete. Here again, the error is detected later for now. But it could be an issue later. There is no reason to backport this patch.	2022-10-12 17:10:44 +02:00
Christopher Faulet	9009c974c1	BUG/MEDIUM: mux-h1: Add connection error handling when reading/sending on a pipe There is no error handling when we read or write on a pipe. The error is caught later, in the mux I/O handler. But there is no reason to not do so here. There is no reason to backport it because no issue was reported for now because of this "bug". In all cases, it must be evaluated first.	2022-10-12 17:10:44 +02:00
Christopher Faulet	3965aa7494	REORG: mux-fcgi: Extract flags and enums into mux_fcgi-t.h The same was performed for the H2 and H1 multiplexers. FCGI connection and stream flags are moved in a dedicated header file. It will be mainly used to be able to decode mux-fcgi flags from the flags utility. In this patch, we move the flags and enums to mux_fcgi-t.h, as well as the two state decoding inline functions.	2022-10-12 17:10:37 +02:00
Amaury Denoyelle	3e0648837c	BUG/MINOR: stick-table: fix build with DEBUG_THREAD Compilation is broken with DEBUG_THREAD since the following patch 76642223f014f89cd1f374291798499f4fba7dde MEDIUM: stick-table: switch the table lock to rwlock Fix this by updating a legacy HA_SPIN_INIT() to HA_RWLOCK_INIT(). No backport needed unless the mentioned patch is backported.	2022-10-12 16:54:59 +02:00
Willy Tarreau	cbdb528a76	MEDIUM: stick-table: requeue the wakeup task out of the write lock We don't need to call stktable_requeue_exp() with the table's lock held anymore, so let's move it out. It should slightly reduce the contention on the write lock, though it is now already quite low.	2022-10-12 14:19:05 +02:00
Willy Tarreau	dbae89e09c	MEDIUM: stick-table: always use atomic ops to requeue the table's task We're generalizing the change performed in previous commit "MEDIUM: stick-table: requeue the expiration task out of the exclusive lock" to stktable_requeue_exp() so that it can also be used by callers of __stktable_store(). At the moment there's still no visible change since it's still called under the write lock. However, the previous code in stktable_touch_with_exp() was updated to use this function.	2022-10-12 14:19:05 +02:00
Willy Tarreau	eb23e3e243	MINOR: stick-table: split stktable_store() between key and requeue __stktable_store() performs two distinct things, one is to insert a key and the other one is to requeue the task's expiration date. Since the latter might be done without a lock, let's first split the function in two halves. For now this has no impact.	2022-10-12 14:19:05 +02:00
Willy Tarreau	e3f5ae895a	MEDIUM: stick-table: requeue the expiration task out of the exclusive lock With 48 threads, a heavily loaded table with plenty of trackers and rules and a short expiration timer of 10ms saturates the CPU at 232k rps. By carefully using atomic ops we can make sure that t->exp_next and t->task->expire converge to the earliest next expiration date and that all of this can be performed under atomic ops without any lock. That's what this patch is doing in stktable_touch_with_exp(). This is sufficient to double the performance and reach 470k rps. It's worth noting that __stktable_store() uses a mix of eb32_insert() and task_queue, and that the second part of it could possibly benefit from this, even though sometimes it's called under a lock that was already held.	2022-10-12 14:19:05 +02:00
Willy Tarreau	e62885237c	MEDIUM: stick-table: make stktable_set_entry() look up under a read lock On a 24-core machine having some "stick-store response" rules, a lot of time is spent in the write lock in stktable_set_entry(). Let's apply the same mechanism as for the stktable_get_entry() consisting in looking up the value under the read lock and upgrading it to a write lock only to perform modifications. Here we even have the luxury of upgrading the lock since there are no alloc/free in the path. All this increases the performance by 40% (from 363k to 510k rps).	2022-10-12 14:19:05 +02:00
Willy Tarreau	996f1a5124	MEDIUM: stick-table: do not take a lock to update t->current anymore. We don't need to be protected by the table's lock when touching t->current if we do it using atomics, and that's great because it allows us to have a cleaner stksess_new() that doesn't require a lock either, and to avoid manipulating pools under a lock. That's another 1% performance gain from 2.07 to 2.10M req/s under 48 threads.	2022-10-12 14:19:05 +02:00
Willy Tarreau	47f229702e	MEDIUM: stick-table: make stktable_get_entry() look up under a read lock On a 24-core machine doing lots of track-sc, it was found that the lock in stktable_get_entry() was responsible for 25% of the CPU alone. It's sad because most of its job is to protect the table during the lookup. Here we're taking a slightly different approach: the lock is first taken for reads during the lookup, and only in case of failure we switch it for a write lock. We don't even perform an upgrade here since an allocation is needed between the two, it would be wasted to do it under the lock, and is generally not a good idea, so better release the read lock and try again. Here the performance under 48 threads with 3 trackers on the same table jumped from 455k to 2.07M, or 4.55x! Note that the same approach should be possible for stktable_set_entry().	2022-10-12 14:19:05 +02:00
Willy Tarreau	a7d6a1396e	MEDIUM: stick-table: switch to rdlock in stktable_lookup() and lookup_key() These functions do not modify anything in the table except the refcount on success. Let's just lock the table for shared accesses and make use of atomic ops to update the refcount. This brings a nice gain from 425k to 455k under 48 threads (7%), but some contention remains on the exclusive locks in other parts. Note that the refcount continues to be updated under the lock because it's not yet certain whether there are races between it and some of the exclusive lock on the table. The difference is marginal and we prefer to stay on the safe side for now.	2022-10-12 14:19:05 +02:00
Willy Tarreau	175aa06232	MEDIUM: stick-table: free newly allocated stksess if it couldn't be inserted In __stktable_get_entry() now we're planning for the possibility that the call to __stktable_store() doesn't add the newly allocated entry and instead finds a previously inserted one. At the moment this doesn't exist because the lookup + insert passes are made under the same lock. But it will soon change.	2022-10-12 14:19:05 +02:00
Willy Tarreau	d2d3fd9b5e	MEDIUM: stick-table: return inserted entry in __stktable_store() This function is used to create an entry in the table. But it doesn't consider the possibility that the entry already exists, because right now it's only called in situations where it was verified under a lock that it doesn't exist. Since we'll soon need to break that assumption we need it to verify that the requested entry was added and to return a pointer to the one in the tree so that the caller can detect any possible conflict. At the moment this is not used.	2022-10-12 14:19:05 +02:00
Willy Tarreau	8d3c3336f9	MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock stream_store_counters() calls stksess_kill_if_expired() for each active counter. And this one takes an exclusive lock on the table before checking if it has any work to do (hint: it almost never has since it only wants to delete expired entries). However a lock is still needed for now to protect the ref_cnt, but we can do it atomically under the read lock. Let's change the mechanism. Now what we do is to check out of the lock if the entry is expired. If it is, we take the write lock, expire it, and decrement the refcount. Otherwise we just decrement the refcount under a read lock. With this change alone, the config based on 3 trackers without the previous patches saw a 2.6x improvement, but here it doesn't yet change anything because some heavy contention remains on the lookup part.	2022-10-12 14:19:05 +02:00
Willy Tarreau	a7536ef9e1	MEDIUM: stick-table: only take the lock when needed in stktable_touch_with_exp() As previously mentioned, this function currently holds an exclusive lock on the table during all the time it takes to check if the entry needs to be updated and synchronized with peers. The reality is that many setups do not use peers and that on highly loaded setups, the same entries are hammered all the time so the key's expiration doesn't change between a number of consecutive accesses. With this patch we take a different approach. The function starts without taking the lock, and will take it only if needed, keeping track of it. This way we can avoid it most of the time, or even entirely. Finally if the decrefcnt argument requires that the refcount is decremented, we either do it using a non-atomic op if the table was locked (since no other entry may touch it) or via an atomic under the read lock only. With this change alone, a 48-thread test with 3 trackers increased from 193k req/s to 425k req/s, which is a 2.2x factor.	2022-10-12 14:19:05 +02:00
Willy Tarreau	9f5cb435b6	MINOR: stick-table: move the write lock inside stktable_touch_with_exp() Taking the write lock prior to entering that function is a problem because this function is full of conditions that most of the time can lead to eliminating the lock. This commit first moves the write lock inside the function and passes the extra argument required to implement stktable_touch_remote() and stktable_touch_local(). It also renames the function to remove the underscores since there's no other variant and it's exported under this name (probably an old rename that was not propagated). The code was stressed under 48 threads using 3 trackers on the same table. It already shows a tiny 3% improvement from 187k to 193k rps.	2022-10-12 14:19:05 +02:00
Willy Tarreau	4be073b99b	MINOR: stick-table: do not take an exclusive lock when downing ref_cnt At plenty of places we decrement ts->ref_cnt under the write lock because it's held. We don't technically need it to be done that way if there's contention and an atomic could suffice. However until all places are turned to atomic, we at least need to do that under a read lock for now, so that we don't mix atomic and non-atomic uses. Regardless it already brings ~1.5% req rate improvement with 3 trackers on the same table under 48 threads at 184k->187k rps.	2022-10-12 14:19:05 +02:00
Willy Tarreau	76642223f0	MEDIUM: stick-table: switch the table lock to rwlock Right now a spinlock is used, but most accesses are for reads, so let's switch the lock to an rwlock and switch all accesses to exclusive locks for now. There should be no visible difference at this point.	2022-10-12 14:19:05 +02:00

17544 Commits