haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-26 13:21:00 +01:00

Author	SHA1	Message	Date
Willy Tarreau	4b984c5baa	MINOR: ring: simplify the write loop a little bit This is mostly a cleanup in that it turns the two-level loop into a single one, but it also simplifies the code a little bit and brings some performance savings again, which are mostly noticeable on ARM, but don't change anything for x86.	2024-03-25 17:34:19 +00:00
Willy Tarreau	573bbbe127	MEDIUM: ring: improve speed in the queue waiting loop on x86_64 x86_64 doesn't have a native atomic FETCH_OR(), it's implemented using a CAS, which will always cause a write cycle. Here we know we can just wait as long as the lock bit is held so better loop on a load, and only attempt the CAS on success. This requires a tiny ifdef and brings nice benefits. This brings the performance back from 3.33M to 3.75M at 24C48T while doing no change at 3C6T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	30a659c355	MEDIUM: ring: significant boost in the loop by checking the ring queue ptr first By doing that and placing the cpu_relax at the right places, the ARM reaches 6.0M/s on 80 threads. On x86_64, at 3C6T the EPYC sees a small increase from 4.45M to 4.57M but at 24C48T it sees a drop from 3.82M to 3.33M due to the write contention hidden behind the CAS that implements the FETCH_OR(), that we'll address next.	2024-03-25 17:34:19 +00:00
Willy Tarreau	1e2311edbc	MAJOR: ring: implement a waiting queue in front of the ring The queue-based approach consists in forcing threads to wait away from the work area so as not to disturb the current writer, and to prepare the work by grouping them in a queue. The last arrived takes the head of the queue by placing its preinitialized ring cell there, becomes the queue's leader, informs itself about the amount of previously accumulated bytes so that when its turn comes, it immediately knows how much room is needed to be released. It can then take the whole queue with it, leaving an empty one for new threads to come while it's releasing the room needed to copy everything. By doing so we're cascading contention areas so that multiple parts can work in parallel. Note that we must never leave a write counter set to 0xFF at tail, and this happens when a message cannot fit and we give up, because in this case we're writing back tail_ofs, and only later we restore the counter. The solution here is to make a special case when we're going to drop the messages, and to write the readers count before restoring tail. This already shows a tremendous performance gain on ARM (385k -> 4.8M), thanks to the fact that now all waiting threads wait on the queue's head instead of polluting the tail lock. On x86_64, the EPYC sees a big boost at 24C48T (1.88M -> 3.82M) and a slowdown at 3C6T (6.0->4.45) though this one is much less of a concern as so few threads need less bandwidth than bigger counts.	2024-03-25 17:34:19 +00:00
Willy Tarreau	6c1b29d06f	MINOR: ring: make the number of queues configurable Now the rings have one wait queue per group. This should limit the contention on systems such as EPYC CPUs where the performance drops dramatically when using more than one CCX. Tests were run with different numbers and it was shown that value 6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and anything around these values degrades quickly. The value has been left tunable in the global section. This commit only introduces everything needed to set up the queue count so that it's easier to adjust it in the forthcoming patches, but it was initially added after the series, making it harder to compare. It was also shown that trying to group the threads in queues by their thread groups is counter-productive and that it was more efficient to do that by applying a modulo on the thread number. As surprising as it seems, it does have the benefit of well balancing any number of threads.	2024-03-25 17:34:19 +00:00
Willy Tarreau	e3f101a19a	MINOR: ring: add the definition of a ring waiting cell This is what will be used to describe one waiting thread, its message in the queues, and the aggregation of pending messages after it.	2024-03-25 17:34:19 +00:00
Willy Tarreau	447189f286	MINOR: ring: keep a few frequently used pointers in the local stack Code disassembly shows that ring->storage->tail and ring->queue are accessed a lot and reloaded a lot due to aliasing. Let's just have variables for them in the local stack. It makes the code smaller and slightly faster.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c7bd7a68e4	OPTIM: ring: have only one thread at a time wake up all readers It's inefficient and counter-productive that each ring writer iterates over all readers to wake them up. Let's just have one in charge of this, it strongly limits contention. The only thing is that since the thread is iterating over a list, we want to be sure that if the first readers have already completed their job, they will be woken up again. For this we keep a counter of messages delivered after the wakeup started, and the waking thread will check it before going back to sleep. In order to avoid looping forever, it will also drop its waking flag soon enough to possibly let another one take it. There used to be a few cases of watchdogs on the list iteration before this on a 24-core AMD EPYC platform; those never appeared anymore. The perf has dropped a bit on 3C6T on the EPYC, from 6.61M to 6.0M but remains unchanged at 24C48T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	1f8b14b7be	OPTIM: ring: don't even try to update offset when failed to read If there's nothing to read, it's pointless for a reader to try to update the offset pointer, that's two atomic ops to replace a value by itself twice. Let's just stop this.	2024-03-25 17:34:19 +00:00
Willy Tarreau	9e99cfbeb6	MAJOR: ring: drop the now unneeded lock It was only used to protect the list which is now an mt_list so it doesn't provide any required protection anymore. It obviously also used to provide strict ordering between the writer and the reader when the writer started to update the messages, but that's now covered by the ordered tail updates and updates to the readers count to protect the area. The message rate on small thread counts (up to 12) saw a boost of roughly 5%, while for large counts it lost about 2% due to some contention now becoming visible elsewhere. Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at 24C48T on the EPYC.	2024-03-25 17:34:19 +00:00
Willy Tarreau	cb482f92c4	MINOR: ring: make sure ring_dispatch waits when facing a changing message The writer is using tags 0xFF instead of readers count at the front of messages that are undergoing an update, while the tail has already been updated. The reader needs to take care of this because it can face these messages and mistakenly parse data that's still being written, leading to corruption (especially if this happens while the size is changing). Let's just stop reading when facing reserved codes, since they indicate that the end of usable messages was reached.	2024-03-25 17:34:19 +00:00
Willy Tarreau	31b93b40b0	MEDIUM: ring: protect the initialization of the initial reader offset Since we're going to remove the lock, there's no more way to prevent the ring from being fed while we're attaching a client to it. We need to freeze the buffer while looking at its head so that we can attach there and have a trustable one. We could do it by setting the lock bit on the tail offset but quite frankly we don't need to bother with that, attaching a client is rare enough to permit a thread_isolate().	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2d2dbf210	MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead Rings are keeping a lock only for the list, which apparently doesn't need anything more than an mt_list, so let's first turn it into that before dropping the lock. There should be no visible effect.	2024-03-25 17:34:19 +00:00
Willy Tarreau	04f1e3f3d9	MINOR: ring: don't take the readers lock if there are no readers There's no point looking for freshly attached readers if there are none, taking this lock requires an atomic write to a shared area, something we clearly want to avoid. A general test with 213-byte messages on different thread counts shows how the performance degrades across CCX and how this patch improves the situation (before -> after): 3C6T/1CCX: 6.39 Mmsg/s -> 6.35 Mmsg/s; 6C12T/2CCX: 2.90 Mmsg/s -> 3.16 Mmsg/s; 12C24T/4CCX: 2.14 Mmsg/s -> 2.33 Mmsg/s; 24C48T/8CCX: 1.75 Mmsg/s -> 1.92 Mmsg/s. This tends to confirm that the queues will really be needed and that they'll have to be per-ccx hence per thread-group. They will amortize the number of updates on head & tail (one per multiple messages).	2024-03-25 17:34:19 +00:00
Willy Tarreau	41d3ea521b	MEDIUM: ring: unlock the ring's tail earlier We know we can continue to protect the message area so we can unlock the tail as soon as we know its new value. Now we're seeing ~6.4M msg/s vs 5.4M previously on 3C6T of a 3rd gen EPYC, and 1.88M vs 1.54M for 24C48T threads, which is a significant gain! This requires to carefully write the new head counter before releasing the writers, and to change the calculation of the work area from tail..head to tail...new_tail while writing the message.	2024-03-25 17:34:19 +00:00
Willy Tarreau	3cdd3d27a8	MEDIUM: move the ring's lock to only protect the readers list Now the lock is only taken around the readers list. With careful ordering of writes to head/tail, the ring remains protected. The perf is a bit better, though (1.54M msg/s vs 1.4M at 48T on a 3rd gen EPYC, and 5.4M vs 5.3M for a 3C6T setup).	2024-03-25 17:34:19 +00:00
Willy Tarreau	eb3d5f464d	MEDIUM: ring: use the topmost bit of the tail as a lock We're now locking the tail while looking for some room in the ring. In fact it's still while writing to it, but the goal definitely is to get rid of the lock ASAP. For this we reserve the topmost bit of the tail as a lock, which may have as a possible visible effect that buffers will be limited to 2GB instead of 4GB on 32-bit machines (though good luck allocating more than 2GB contiguous on 32-bit), but in practice since the size is read with atol() and some operating systems limit it to LONG_MAX unless passing negative numbers, the limit is already there. For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s on 48 threads on EPYC 24 cores) but this situation is only temporary so that changes can be reviewable and bisectable. Other approaches were attempted, such as using XCHG instead, which is slightly faster on x86 with low thread counts (but causes more write contention), and forces readers to stall under heavy traffic because they can't access a valid value for the queue anymore. A CAS requires preloading the value and is less good on ARMv8.1. XADD could also be considered with 12-13 upper bits of the offset dedicated to locking, but that looks overkill.	2024-03-25 17:34:19 +00:00
Willy Tarreau	2192983ffd	MEDIUM: ring: protect the reader's positions against writers The reader now needs to protect the positions it's reading. This is already done via the readers counter at the beginning of messages, but as long as the lock is present, this counter is decremented before starting to parse messages, and incremented at the end. We must now do that in reverse, first protect the end of the messages, and only then remove ourselves from the already processed messages, so that at no point could a writer pass over and possibly overwrite data we're currently watching.	2024-03-25 17:34:19 +00:00
Willy Tarreau	73b2436fe6	MEDIUM: ring: lock the tail's readers counters before proceeding with the changes The goal here is to start to protect the writing area inside the area itself so that we'll later be able to release the ring's lock. We're not there yet, but at least the tail is marked as protected for as long as the message is not fully written.	2024-03-25 17:34:19 +00:00
Willy Tarreau	d336d71cbb	MINOR: ring: make the reader check the readers count before inc/dec We'll want to reserve some special values for the readers count to temporarily lock the following message, but for this it will be mandatory that readers check for them before incrementing/decrementing the counter. Let's do that using a CAS. The readers performance is not as critical as the writer's anyway so the slight overhead is not a problem.	2024-03-25 17:34:19 +00:00
Willy Tarreau	bf3dead20c	MEDIUM: ring: remove the struct buffer from the ring The purpose is to store a head and a tail that are independent so that we can further improve the API to update them independently from each other. The struct was arranged like the original one so that as long as a ring has its head set to zero (i.e. no recycling) it will continue to work. The new format is already detectable thanks to the "rsvd" field which indicates the number of reserved bytes at the beginning. It's located where the buffer's area pointer previously was, so that older versions of haring can continue to open the ring in repair mode, and newer ones can use the fact that the upper bits of that variable are zero to guess that it's working with the new format instead of the old one. Also let's keep in mind that the layout will further change to place some alignment constraints. The haring tool was thus updated based on this: it detects that the rsvd field is smaller than a page and that the sum of it with the size equals the mapped size, in which case it uses the new dump_v2() function instead of dump_v1(). The new function also creates a buffer from the ring's area, size, head and tail and calls the generic one so that no other code had to be adapted.	2024-03-25 17:34:19 +00:00
Willy Tarreau	01aa0a057c	MEDIUM: ring: change the ring reader to use the new vector-based API now The code now looks cleaner and more easily shows what still needs to be addressed. There are not that many changes in practice, these are mostly mechanical, essentially hiding the buffer from the callers.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4e6fadb8a1	MEDIUM: ring: replace the buffer API in ring_write() with the vec<->ring API This is the start of the replacement of the buffer API calls. Only the ring_write() function was touched. Instead of manipulating a buffer all along, we now extract the ring buffer's head and tail upon entry, store them locally and use them using the vec<->ring API until the last moment where we can update the buffer with the new values. One subtle point is that we must never fill the buffer past the last byte otherwise the vec-to-ring conversion gets lost and there's no more possibility to know where's the beginning nor the end (just like when dealing with head+tail in fact), because it then becomes impossible to distinguish between an empty and a full buffer.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4e6de42b27	MINOR: ring: allow to reduce a ring size In ring_resize() we used to check if the new ring was at least as large as the previous one before resizing it, but what counts is that it's as large as the previous one's contents. Initially it was thought this would not really matter, but given that rings are initially created as BUFSIZE, it's currently not possible to shrink them for debugging purposes. Now with this change it is.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0fa05ce171	MINOR: ring: resize only under thread isolation The ring resizing was already quite tricky, but when facing atomic writes it will no longer be possible and we definitely do not want to have to deal with a lock there. Since it's only done at boot time, and possibly later from the CLI, let's just do it under thread isolation.	2024-03-25 17:34:19 +00:00
Willy Tarreau	03816ccfa9	MAJOR: ring: insert an intermediary ring_storage level We'll need to add more complex structures in the ring, such as wait queues. That's far too much to be stored into the area in case of file-backed contents, so let's split the ring definition and its storage once for all. This patch introduces a struct ring_storage which is assigned to ring->storage, which contains minimal information to represent the storage layout, i.e. for now only the buffer, and all the rest remains in the ring itself. The storage is appended immediately after it and the buffer's pointer always points to that area. It has the benefit of remaining 100% compatible with the existing file-backed layout. In memory, the allocation loses the size of a struct buffer. It's not even certain it's worth placing the size there, given that it's constant and that a dump of a ring wouldn't really need it (the file size is sufficient). But for now everything comes with the struct buffer, and later this will change once split into head and tail. Also this area may be completed with more information in the future (e.g. storage version, format, endianness, word size etc).	2024-03-25 17:34:19 +00:00
Willy Tarreau	01abdcb307	MINOR: ring: add a flag to indicate a mapped file Till now we used to rely on a heuristic pointer comparison to check if a ring was mapped or allocated. Better assign a flag to clarify this because it's going to become difficult otherwise.	2024-03-25 17:34:19 +00:00
Willy Tarreau	80441a6983	MINOR: ring: use ring_size(), ring_area(), ring_head() and ring_tail() Some open-coded constructs were updated to make use of the ring accessors instead. This allows to remove some direct dependencies on the buffers API a bit more.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a75052d665	MINOR: errors: use ring_dup() to duplicate the startup_logs In startup_logs_dup() we currently need to reference the ring's buffer, better not do this as it will complicate operations when switching to other types.	2024-03-25 17:34:19 +00:00
Willy Tarreau	7c9ce715c9	MINOR: ring: make callers use ring_data() and ring_size(), not ring->buf As we're going to remove the ring's buffer, we don't want callers to access it directly, so let's use ring_data() and ring_size() instead for this.	2024-03-25 17:34:19 +00:00
Willy Tarreau	ee1c92cf10	MINOR: ring: rename totlen vs msglen in ring_write() The ring_write() function uses confusing variable names: totlen is in fact the length of the message, not the total length that is going to be written. Let's rename it msglen and have a real "needed" that corresponds to the total size we're going to write. We also add a BUG_ON_HOT() to catch mistakes causing discrepancies.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0b1c17a2dd	MINOR: ring: reserve one special value for the readers count In order to support concurrent writers we'll need to lock areas in the buffer. For this we'll use one special value of the single-byte readers count. Let's reserve it now and use the macro instead of the hardcoded 255.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0f611987da	MINOR: ring: make the ring reader use only absolute offsets The goal is to remove references to the buffer's head and tail in the fast path so that we can release the lock during some reads. This means no more comparisons with b_data() nor operations relative to b_head() will be possible anymore. As a first step we need to have an absolute offset in the buffer, and to use b_getblk_ofs() in the applet callbacks to retrieve the data based on this.	2024-03-25 17:34:19 +00:00
Willy Tarreau	8f3edf2ac6	MEDIUM: log/sink: make the log forwarder code use ring_dispatch_messages() This code becomes even simpler and almost does not need any knowledge of the structure of the ring anymore. It even highlighted that an old race had not been fixed due to code duplication, but that's now done.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c262442b1a	MEDIUM: sink: move the generic ring forwarder code use ring_dispatch_messages() Now the code is much simpler and the ring forwarding function almost does not need any knowledge of the structure of the ring anymore.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c62a2d540d	MEDIUM: ring: move the ring reader code to ring_dispatch_messages() This new function is made around the loop that scans a ring for new messages and dispatches them to a message handler. It also takes ring flags (WAIT, NEW, etc) and offset pointers that the caller will use to initialize/reuse/update the current processing offset. The caller is still responsible for presetting it to ~0 before the first call if it wants the function to automatically adjust it (or set it to the correct value). The function may also return the last_ofs that was known before releasing the lock so that the caller knows what to compare against and if it needs to restart processing or not. The context remains a void* so that it does not necessarily depend on an appctx. The current "show ring" code was ported to this and it continues to work as expected.	2024-03-25 17:34:19 +00:00
Willy Tarreau	ad31e53287	REORG: dns/ring: split the ring between the generic one and the DNS one A ring is used for the DNS code but slightly differently from the generic one, which prevents some important changes from being made to the generic code without breaking DNS. As the use cases differ, it's better to just split them apart for now and have the DNS code use its own ring that we rename dns_ring and let the generic code continue to live on its own. The unused parts such as CLI registration were dropped, resizing and allocation from a mapped area were dropped. dns_ring_detach_appctx() was kept despite not being used, so as to stay consistent with the comments that say it must be called, despite the DNS code explicitly mentioning that it skips it for now (i.e. this may change in the future). Hopefully after the generic rings are converted the DNS code can migrate back to them, though this is really not necessary.	2024-03-25 17:34:19 +00:00
Willy Tarreau	8022ae326c	MEDIUM: ring/sink: use applet_append_line()/syslog_applet_append_event() for readers The ring reader code was duplicated as-is in 2.2 for the ring forwarding code in commits 494c505703 ("MEDIUM: ring: add server statement to forward messages from a ring") and 975564784f ("MEDIUM: ring: add new srv statement to support octet counting forward") (which only differs by using a prefix instead of a suffix to delimit messages). Unfortunately, that makes it almost impossible to rework the core ring code because all these parts rely on it. This first commit aims at restoring a common structure for the core loop by just calling a distinct function based on the use case. The functions are either applet_append_line() when a whole line is to be emitted followed by an LF character, or syslog_applet_append_event() when trying to send a TCP syslog line prepended with its size in decimal. There is no functional change beyond this.	2024-03-25 17:34:19 +00:00
Willy Tarreau	201c706330	MINOR: log/applet: add new function syslog_applet_append_event() This function takes a buffer on input, an offset and a length, and consumes the block from that buffer to send it to the appctx's output buffer. Contrary to its sibling applet_append_line(), instead of just appending an LF at the end of the line, it prepends the message size in decimal and a space before the message, as expected by syslog TCP implementations. This will be used to simplify the ring reader code.	2024-03-25 17:34:19 +00:00
Willy Tarreau	6ae41dc510	MINOR: applet: add new function applet_append_line() This function takes a buffer on input, an offset and a length, and consumes the block from that buffer to send it to the appctx's output buffer. This will be used to simplify the ring reader code.	2024-03-25 17:34:19 +00:00
Willy Tarreau	5df0df96dd	MINOR: debug: add "debug dev trace" to flood with traces This new command, enabled only with "DEBUG_DEV", sends 2 or 20 traces per task wakeup (depending on the verbosity level), and stops after 1M wakeups per thread in order not to have to stop/start the process each time it's fired. We have two small messages and 18 larger ones from 20 to 270 bytes each, so that the average size is approx 213 bytes counting headers (the header adds approx 82 bytes), which matches what's generally observed on average when traces are enabled in all muxes. Typical figures show variations between 5.7M and 6.2M msg/s on an EPYC in a 3C6T setup (single CCX), and 2.12M - 2.22M in a 24C48T setup (across 8 CCX, with 8 thread groups).	2024-03-25 17:32:22 +00:00
Aurelien DARRAGON	db1cd8f881	OPTIM: http_ext: avoid useless copy in http_7239_extract_{ipv4,ipv6} In http_7239_extract_{ipv4,ipv6}, we declare a local buffer in order to use inet_pton() since it requires a valid destination argument (cannot be NULL). Then, if the caller provided <ip> argument, we copy inet_pton() result (from local buffer to <ip>). In fact when the caller provides <ip>, we may directly use <ip> as inet_pton() dst argument to avoid a useless copy. Thus the local buffer is only relevant when the user doesn't provide <ip>. While at it, let's add a missing testcase for the rfc7239_n2nn converter (to check that http_7239_extract_ipv4() with <ip> provided works properly). This could be backported in 2.8 with b2bb925 ("MINOR: proxy/http_ext: introduce proxy forwarded option")	2024-03-25 16:24:15 +01:00
Aurelien DARRAGON	3de1acfb23	BUILD: server: fix build regression on old compilers (<= gcc-4.4) Willy reported that since 3ac79b504 ("MEDIUM: server: make server_set_inetaddr() updater serializable"), haproxy fails to compile on some older compilers such as gcc-4.4 with this kind of error: src/server.c: In function 'snr_resolution_cb': src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer compilation terminated due to -Wfatal-errors. make: *** [Makefile:1006: src/server.o] Error 1 This is due to referencing a member inside an anonymous union from a compound literal assignment. Apparently such use of anonymous unions wasn't properly supported back then on older compilers. To fix the issue, we give the name "u" to the parent union and use this name to explicitly refer to the union where relevant in the code (only a few changes fortunately). The fix itself was verified to restore build compatibility with gcc 4.4 (and even 4.2). As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please consider backporting this patch too if 64c9c8ef3 happens to be backported in 2.9.	2024-03-25 16:23:37 +01:00
Christopher Faulet	56c4b29ff1	BUG/MEDIUM: mux-fcgi: Properly handle EOM flag on end-of-trailers HTX block Trailers are skipped by the FCGI multiplexer. However empty chunked messages are not properly handled. It may be a chunked H1 request with no payload or an H2/H3 POST request with no payload. In that case, the EOT HTX block is just ignored. The issue is that the EOM flag is thus ignored too. It means no empty STDIN record is sent to mark the end of the request to the server. To fix the issue, when an EOT HTX block is found and it is the last HTX block of the message (and it should be), the EOM flag is tested. If it is found, an empty STDIN record is emitted. This patch should fix the issue #2499. It must be backported as far as 2.4.	2024-03-25 11:06:41 +01:00
Amaury Denoyelle	bd384a359b	BUG/MINOR: mux-quic: close all QCS before freeing QCC tasklet QUIC MUX is freed via qcc_release(). This in turn liberates all the remaining QCS instances. For each one of them, their corresponding stream-desc is released via qc_stream_desc_release(). This last function may itself notify QUIC MUX when new buffers are available. This is useful when QCS are closed individually without the whole connection. However, when the connection is closed through qcc_release(), this may cause issues as some elements of QUIC MUX are already freed. In 2.9.6, a bug was detected directly linked to this. Indeed, QCC instance may be woken up on stream-desc release. If called through qcc_release(), this is an issue because QCC tasklet is freed before QCS instances. However, this bug is not systematic and relies on prior conditions: in particular, QUIC MUX must be under Tx buffers exhaustion prior to the qcc_release() invocation. The current dev tree is not impacted by this bug, thanks to QUIC MUX refactoring. Indeed, notification across layers has changed and now stream-desc release notifies individual QCS instances instead of the QCC element, which is a safer mechanism. However, to simplify the backport process, the bugfix is introduced in the current dev tree as it does not have any impact. Note that a proper fix would be to set quic-conn MUX state to QC_MUX_RELEASED. However, it is not possible to call quic_close() without having released all stream-desc elements first. The simpler solution was chosen to prevent other breaking issues during backports. This should fix github issue #2494. It should be backported up to 2.6. Note that prior to 2.7 qcc_release() was named qc_release().	2024-03-25 10:24:59 +01:00
Amaury Denoyelle	0d4273f04b	MEDIUM: server: close private idle connection before server deletion This commit is similar to the following one: 65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116 MEDIUM: server: close idle conn before server deletion This patch implements a similar logic, this time to close private idle connections stored in sessions. The principle is identical to the above commit: conn_release() is used on idle connections after a takeover to ensure thread safety. An extra change was required to be able to execute takeover on such connections. Their original thread ID was unknown, contrary to non private connections which are stored in sharded lists. As such, a new tid member has been added under sess_priv_conns chaining element.	2024-03-22 17:12:27 +01:00
Amaury Denoyelle	5e8eb3661b	MEDIUM: mux: prepare for takeover on private connections When a backend connection is marked as idle, a special flag TASK_F_USR1 is set on MUX tasklet. When MUX tasklet is reactivated, extra checks are executed under this flag to ensure no takeover occurred in the meantime. Previously, only non private connections could be targeted by a takeover. However, this will change when implementing private idle connections closure on "delete server" CLI handler. As such, TASK_F_USR1 is now also set for private connections in MUX detach callbacks.	2024-03-22 17:10:06 +01:00
Amaury Denoyelle	6e0afb2e27	MEDIUM: server: close idle conn on server deletion To be able to delete a server, a number of preconditions must be validated to ensure it is not in use anymore. Previously, if idle connections were stored in the server, the deletion was cancelled. No action was implemented to force idle connection closure, the only solution was to wait for the periodic purging to be achieved. This is an extra burden to be able to delete a server. Indeed, idle connections are by definition inactive and can be closed prior to deleting a server. This is the exact purpose of this patch. Idle connections removal is implemented inside "delete server" handler, once it has been determined that the server can be freely removed. A simple loop is run to call conn_release() over each idle connection. Takeover is also executed before conn_release() to ensure tasks/tasklets or any other sensitive elements are not deleted from a foreign thread. This patch should reduce the occurrence of rejected "delete server" execution, especially when connection reuse is high.	2024-03-22 16:59:02 +01:00
Amaury Denoyelle	f3862a9bc7	MINOR: connection: extend takeover with release option Extend takeover API both for MUX and XPRT with a new boolean argument <release>. Its purpose is to signal if the connection will be freed immediately after the takeover, rendering new resources allocation unnecessary. For the moment, release argument is always false. However, it will be set to true on delete server CLI handler to proactively close server idle connections.	2024-03-22 16:12:36 +01:00
Amaury Denoyelle	ff2e71ae24	MINOR: connection: implement conn_release() Several places reuse the same code to ensure a connection is properly freed, either via its MUX or by calling the proper set of functions. Factorize all of this in a new function conn_release(). This new function is now called via session_free() and session_accept_fd(). It will also be reused on delete server to proactively close idle connections.	2024-03-22 16:12:36 +01:00
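
The waiting loop described in commit 573bbbe127 above (loop on a plain load while the lock bit is held, and only attempt the atomic read-modify-write once the bit is seen clear) can be sketched as follows. This is only an illustration written with C11 <stdatomic.h>, not HAProxy's actual code: LOCK_BIT, lock_tail() and unlock_tail() are hypothetical names, and a real implementation would likely add a CPU relax hint inside the spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: topmost bit of the tail offset reserved as a lock bit. */
#define LOCK_BIT (~(SIZE_MAX >> 1))

/* Spin on plain loads while the lock bit is held, and only issue the
 * atomic OR once the bit was seen clear, so that waiters do not keep
 * causing write cycles on the shared cache line. Returns the offset
 * observed (lock bit clear) at the moment the lock was taken.
 */
static size_t lock_tail(_Atomic size_t *tail)
{
	size_t v;

	for (;;) {
		v = atomic_load_explicit(tail, memory_order_relaxed);
		if (v & LOCK_BIT)
			continue; /* still held: keep reading only */
		v = atomic_fetch_or_explicit(tail, LOCK_BIT, memory_order_acquire);
		if (!(v & LOCK_BIT))
			return v; /* the bit was clear before our OR: lock is ours */
	}
}

/* Publish the new offset and release the lock in a single store. */
static void unlock_tail(_Atomic size_t *tail, size_t new_ofs)
{
	assert(!(new_ofs & LOCK_BIT)); /* offsets must never use the lock bit */
	atomic_store_explicit(tail, new_ofs, memory_order_release);
}
```

A caller would take the offset with lock_tail(), compute the new tail, then hand it back with unlock_tail(). Reserving the top bit is also why the offset range is halved, as noted in commit eb3d5f464d.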


18755 Commits
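
For reference, the framing described in commit 201c706330 (the message size in decimal and a space prepended to the message) can be sketched as below. This is a standalone illustration under stated assumptions, not the code of syslog_applet_append_event(): frame_octet_counted() is a hypothetical helper that works on plain char arrays rather than on HAProxy buffers and appctx structures.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<len> <msg>" into <dst>, i.e. the
 * message length in decimal, one space, then the <len> message bytes.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t frame_octet_counted(char *dst, size_t dstlen,
                                  const char *msg, size_t len)
{
	char hdr[32];
	int hlen = snprintf(hdr, sizeof(hdr), "%zu ", len);

	if (hlen < 0 || (size_t)hlen >= sizeof(hdr))
		return 0; /* formatting error or truncated prefix */
	if ((size_t)hlen + len > dstlen)
		return 0; /* not enough room in the output area */

	memcpy(dst, hdr, (size_t)hlen);
	memcpy(dst + hlen, msg, len);
	return (size_t)hlen + len;
}
```

With a 9-byte message such as `<13>hello`, the output is the 11 bytes `9 <13>hello`, with no trailing LF.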