haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-12-04 01:01:00 +01:00

Author	SHA1	Message	Date
Willy Tarreau	a7f8693fa2	MEDIUM: ring: always allocate properly aligned ring structures The rings were manually padded to place the various areas that compose them into different cache lines, provided that the allocator returned a cache-aligned address, which until now was not granted. By now switching to the aligned API we can finally have this guarantee and hope for more consistent ring performance between tests. Like previously the few carefully crafted THREAD_PAD() could simply be replaced by generic THREAD_ALIGN() that dictate the type's alignment. This was the last user of THREAD_PAD() by the way.	2025-08-13 17:47:39 +02:00
Willy Tarreau	f4634e5a38	MINOR: ring/cli: support delimiting events with a trailing \0 on "show events" At the moment it is not supported to produce multi-line events on the "show events" output, simply because the LF character is used as the default end-of-event mark. However it could be convenient to produce well-formatted multi-line events, e.g. in JSON or other formats. UNIX utilities have already faced similar needs in the past and added "-print0" to "find" and "-0" to "xargs" to mention that the delimiter is the NUL character. This makes perfect sense since it's never present in contents, so let's do exactly the same here. Thus from now on, "show events <ring> -0" will delimit messages using a \0 instead of a \n, permitting a better and safer encapsulation.	2025-04-08 14:36:35 +02:00
Willy Tarreau	6c1b29d06f	MINOR: ring: make the number of queues configurable Now the rings have one wait queue per group. This should limit the contention on systems such as EPYC CPUs where the performance drops dramatically when using more than one CCX. Tests were run with different numbers and it was shown that value 6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and anything around these values degrades quickly. The value has been left tunable in the global section. This commit only introduces everything needed to set up the queue count so that it's easier to adjust it in the forthcoming patches, but it was initially added after the series, making it harder to compare. It was also shown that trying to group the threads in queues by their thread groups is counter-productive and that it was more efficient to do that by applying a modulo on the thread number. As surprising as it seems, it does have the benefit of well balancing any number of threads.	2024-03-25 17:34:19 +00:00
Willy Tarreau	e3f101a19a	MINOR: ring: add the definition of a ring waiting cell This is what will be used to describe one waiting thread, its message in the queues, and the aggregation of pending messages after it.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c7bd7a68e4	OPTIM: ring: have only one thread at a time wake up all readers It's inefficient and counter-productive that each ring writer iterates over all readers to wake them up. Let's just have one in charge of this, it strongly limits contention. The only thing is that since the thread is iterating over a list, we want to be sure that if the first readers have already completed their job, they will be woken up again. For this we keep a counter of messages delivered after the wakeup started, and the waking thread will check it before going back to sleep. In order to avoid looping forever, it will also drop its waking flag soon enough to possibly let another one take it. There used to be a few cases of watchdogs before this on a 24-core AMD EPYC platform on the list iteration; those never appeared anymore. The perf has dropped a bit on 3C6T on the EPYC, from 6.61 to 6.0M but remains unchanged at 24C48T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	9e99cfbeb6	MAJOR: ring: drop the now unneeded lock It was only used to protect the list which is now an mt_list so it doesn't provide any required protection anymore. It obviously also used to provide strict ordering between the writer and the reader when the writer started to update the messages, but that's now covered by the ordered tail updates and updates to the readers count to protect the area. The message rate on small thread counts (up to 12) saw a boost of roughly 5%, while for large counts it lost about 2% due to some contention now becoming visible elsewhere. Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at 24C48T on the EPYC.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2d2dbf210	MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead Rings are keeping a lock only for the list, which apparently doesn't need anything more than an mt_list, so let's first turn it into that before dropping the lock. There should be no visible effect.	2024-03-25 17:34:19 +00:00
Willy Tarreau	eb3d5f464d	MEDIUM: ring: use the topmost bit of the tail as a lock We're now locking the tail while looking for some room in the ring. In fact it's still while writing to it, but the goal definitely is to get rid of the lock ASAP. For this we reserve the topmost bit of the tail as a lock, which may have as a possible visible effect that buffers will be limited to 2GB instead of 4GB on 32-bit machines (though in practice, good luck for allocating more than 2GB contiguous on 32-bit), but in practice since the size is read with atol() and some operating systems limit it to LONG_MAX unless passing negative numbers, the limit is already there. For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s on 48 threads on EPYC 24 cores) but this situation is only temporary so that changes can be reviewable and bisectable. Other approaches were attempted, such as using XCHG instead, which is slightly faster on x86 with low thread counts (but causes more write contention), and forces readers to stall under heavy traffic because they can't access a valid value for the queue anymore. A CAS requires preloading the value and is less good on ARMv8.1. XADD could also be considered with 12-13 upper bits of the offset dedicated to locking, but that looks overkill.	2024-03-25 17:34:19 +00:00
Willy Tarreau	dd8ea5d928	MEDIUM: ring: align the head and tail fields in the ring_storage structure We really want to let the readers and writers act on different areas, so we want to have the tail and the head on separate cache lines, themselves separate from the rest of the ring. Doing so improves the performance from 2.15 to 2.35M msg/s at 48 threads on a 24-core EPYC. This increases the header space from 32 to 192 bytes when threads are enabled. But since we already have the header size available in the file, haring remains able to detect the aligned vs unaligned formats and call dump_v2a() when aligned is detected.	2024-03-25 17:34:19 +00:00
Willy Tarreau	bf3dead20c	MEDIUM: ring: remove the struct buffer from the ring The purpose is to store a head and a tail that are independent so that we can further improve the API to update them independently from each other. The struct was arranged like the original one so that as long as a ring has its head set to zero (i.e. no recycling) it will continue to work. The new format is already detectable thanks to the "rsvd" field which indicates the number of reserved bytes at the beginning. It's located where the buffer's area pointer previously was, so that older versions of haring can continue to open the ring in repair mode, and newer ones can use the fact that the upper bits of that variable are zero to guess that it's working with the new format instead of the old one. Also let's keep in mind that the layout will further change to place some alignment constraints. The haring tool is thus updated based on this and it detects that the rsvd field is smaller than a page and that the sum of it with the size equals the mapped size, in which case it uses the new dump_v2() function instead of dump_v1(). The new function also creates a buffer from the ring's area, size, head and tail and calls the generic one so that no other code had to be adapted.	2024-03-25 17:34:19 +00:00
Willy Tarreau	03816ccfa9	MAJOR: ring: insert an intermediary ring_storage level We'll need to add more complex structures in the ring, such as wait queues. That's far too much to be stored into the area in case of file-backed contents, so let's split the ring definition and its storage once for all. This patch introduces a struct ring_storage which is assigned to ring->storage, which contains minimal information to represent the storage layout, i.e. for now only the buffer, and all the rest remains in the ring itself. The storage is appended immediately after it and the buffer's pointer always points to that area. It has the benefit of remaining 100% compatible with the existing file-backed layout. In memory, the allocation loses the size of a struct buffer. It's not even certain it's worth placing the size there, given that it's constant and that a dump of a ring wouldn't really need it (the file size is sufficient). But for now everything comes with the struct buffer, and later this will change once split into head and tail. Also this area may be completed with more information in the future (e.g. storage version, format, endianness, word size etc).	2024-03-25 17:34:19 +00:00
Willy Tarreau	01abdcb307	MINOR: ring: add a flag to indicate a mapped file Till now we used to rely on a heuristic pointer comparison to check if a ring was mapped or allocated. Better assign a flag to clarify this because it's going to become difficult otherwise.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0b1c17a2dd	MINOR: ring: reserve one special value for the readers count In order to support concurrent writers we'll need to lock areas in the buffer. For this we'll use one special value of the single-byte readers count. Let's reserve it now and use the macro instead of the hardcoded 255.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2a3d5dd25	CLEANUP: ring: remove the now unused ring's offset Since the previous patch, the ring's offset is not used anymore. The haring utility remains backward-compatible since it can trust the buffer element that's at the beginning of the map and which still contains all the valid data.	2023-02-24 09:26:30 +01:00
Willy Tarreau	cba8838e59	CLEANUP: ring: pass the ring watch flags to ring_attach_cli(), not in ctx.cli The ring watch flags (wait, seek end) were dangerously passed via ctx.cli.i0 from "show buf" in sink.c:cli_parse_show_events(), or implicitly reset in "show errors". That's very inconvenient, difficult to follow, and prone to short-term breakage. Let's pass an extra argument to ring_attach_cli() to take these flags, now defined in ring-t.h as RING_WF_*, and let the function set them itself where appropriate (still ctx.cli.i0 for now).	2022-05-06 18:13:36 +02:00
Willy Tarreau	e5793916f0	REORG: include: make list-t.h part of the base API There are list definitions everywhere in the code, let's drop the need for including list-t.h to declare them. The rest of the list manipulation is huge however and not needed everywhere so using the list walking macros still requires to include list.h.	2020-06-11 10:18:59 +02:00
Willy Tarreau	d2ad57c352	REORG: include: move ring to haproxy/ring{,-t}.h Some includes were wrong in the type definition but beyond this no change was needed.	2020-06-11 10:18:57 +02:00
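
The idea in commit eb3d5f464d, reserving the topmost bit of the tail offset as a lock, can be illustrated with a minimal C11 sketch. This is not HAProxy's code: the names RING_TAIL_LOCK_SKETCH, tail_lock_sketch() and tail_unlock_sketch() are hypothetical, and the use of an atomic fetch-or is an assumption (the commit message only lists XCHG, CAS and XADD as alternatives that were tried).

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical name: the topmost bit of the tail word acts as the lock.
 * This halves the usable offset range, as the commit message notes.
 */
#define RING_TAIL_LOCK_SKETCH ((size_t)1 << (sizeof(size_t) * CHAR_BIT - 1))

/* Spin until the lock bit is acquired. Returns the tail offset that was
 * stored at the time of acquisition, without the lock bit.
 */
static size_t tail_lock_sketch(_Atomic size_t *tail)
{
    size_t old;

    for (;;) {
        /* Assumption: set the bit with a fetch-or and look at the old value */
        old = atomic_fetch_or_explicit(tail, RING_TAIL_LOCK_SKETCH,
                                       memory_order_acquire);
        if (!(old & RING_TAIL_LOCK_SKETCH))
            return old; /* the bit was clear before: we own the lock */

        /* Another writer holds it: wait with plain loads before retrying */
        while (atomic_load_explicit(tail, memory_order_relaxed) &
               RING_TAIL_LOCK_SKETCH)
            ;
    }
}

/* Publish the new tail offset and release the lock in a single store. */
static void tail_unlock_sketch(_Atomic size_t *tail, size_t new_ofs)
{
    assert(!(new_ofs & RING_TAIL_LOCK_SKETCH));
    atomic_store_explicit(tail, new_ofs, memory_order_release);
}
```

A writer would call tail_lock_sketch(), reserve room starting at the returned offset, then call tail_unlock_sketch() with the advanced offset. Readers that load the tail while the bit is set see an offset they cannot trust yet, which is why the real implementation has to mask or wait on that bit.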

17 Commits
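
The format check that commit bf3dead20c describes for the haring tool (the rsvd field is smaller than a page, and rsvd plus the size equals the mapped size) could look roughly like the sketch below. The function name, the enum and the parameter list are hypothetical and only restate the rule given in the commit message; the actual haring code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the sketch */
enum ring_fmt_sketch {
    RING_FMT_V1_SKETCH, /* old layout: a struct buffer at the start */
    RING_FMT_V2_SKETCH  /* new layout: rsvd, size, head and tail    */
};

/* Guess the on-disk layout from the word that holds either the old
 * area pointer (v1) or the number of reserved bytes (v2).
 */
static enum ring_fmt_sketch guess_ring_format_sketch(size_t rsvd, size_t size,
                                                     size_t mapped_size,
                                                     size_t page_size)
{
    assert(page_size > 0);

    /* Written as a subtraction so that rsvd + size cannot overflow */
    if (rsvd < page_size && size <= mapped_size &&
        rsvd == mapped_size - size)
        return RING_FMT_V2_SKETCH;

    /* Anything else, e.g. a large value that looks like a pointer */
    return RING_FMT_V1_SKETCH;
}
```

With a 4096-byte mapping and a 32-byte reserved header, a size of 4064 satisfies both conditions and selects the new format, while a pointer-sized value in the same field fails the "smaller than a page" test and falls back to the old one.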