haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-13 15:00:59 +01:00

Author	SHA1	Message	Date
Ilya Shipitsin	1f6e5f7a61	CLEANUP: assorted typo fixes in the code and comments This is 43rd iteration of typo fixes	2024-09-03 17:49:21 +02:00
Christopher Faulet	a7f6b0ac03	MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates Add a factor parameter to stick-tables, called "brates-factor", that is applied to in/out bytes rates to work around the 32-bit limit of the frequency counters. Thanks to this factor, it is possible to have bytes rates beyond 4GB. Instead of counting each byte, we count blocks of bytes. Among other things, it will be useful for the bwlim filter, to be able to configure a shared limit exceeding 4GB/s. For now, this parameter must be in the range ]0-1024].	2024-09-02 15:50:25 +02:00
Christopher Faulet	ad946a704d	MINOR: stick-table: Always decrement ref count before killing a session Guarded functions to kill a sticky session, stksess_kill() and stksess_kill_if_expired(), may or may not decrement and test its reference counter before really killing it. This depends on a parameter. If it is set to a non-zero value, the ref count is decremented and if it falls to zero, the session is killed. Otherwise, if this parameter is equal to zero, the session is killed, regardless of the ref count value. In the code, these functions are always called with a non-zero parameter and the ref count is always decremented and tested. So, there is no reason to still have a special case. Especially because it is not really easy to say if it is supported or not. Does it mean it is possible to kill a sticky session while it is still referenced somewhere? Probably not. So, does it mean it is possible to kill an unreferenced session? This case may be problematic because the session is accessed outside of any lock and thus may be released by another thread because it is unreferenced. Enlarging the scope of the lock to avoid any issue is possible but it is a bit of a shame to do so because there is no usage for now. The best is to simplify the API and remove this case. Now, stksess_kill() and stksess_kill_if_expired() functions always decrement and test the ref count before killing a sticky session.	2024-06-26 15:05:06 +02:00
Christopher Faulet	9357873641	BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session When we try to kill a session, the shard must be locked before decrementing the ref count on the session. Otherwise, the ref count can fall to 0 and a purge task (stktable_trash_oldest or process_table_expire) may release the session before we have the opportunity to acquire the lock on the shard to effectively kill the session. This could lead to a double free. Here is the scenario: Thread 1: stksess_kill(ts): if (ATOMIC_DEC(&ts->ref_cnt) != 0) return; /* here the ref count is 0 */ Thread 2: stktable_trash_oldest(): LOCK(&sh_lock); if (!ATOMIC_LOAD(&ts->ref_cnt)) __stksess_free(ts); UNLOCK(&sh_lock); /* here the session was released */ Thread 1: LOCK(&sh_lock); __stksess_free(ts); <--- double free; UNLOCK(&sh_lock). The bug was introduced in 2.9 by the commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt atomically"). The ref count must be decremented inside the lock for the stksess_kill() and stksess_kill_if_expired() functions. This patch should fix the issue #2611. It must be backported as far as 2.9. On the 2.9, there is no sharding. All the table is locked. The patch will have to be adapted.	2024-06-26 12:05:37 +02:00
Willy Tarreau	19f8762a98	BUILD: stick-tables: silence build warnings when threads are disabled Since 3.0-dev7 with commit 1a088da7c2 ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), building without threads yields a warning about the shard not being used. This is because the locks API does nothing with its arguments, which is the only place where the shard is being used. We cannot modify the lock API to pretend to consume its argument because quite often it's not even instantiated. Let's just pretend we consume shard using an explicit ALREADY_CHECKED() statement instead. While we're at it, let's make sure that XXH32() is not called when there is a single bucket! No backport is needed.	2024-04-24 08:23:56 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. That counts for keys and expiration dates. The shard number is calculated from the key value itself, both when looking up and when setting it. The "show table" dump on the CLI iterates over all shards so that the output is not fully sorted, it's only sorted within each shard. The Lua table dump just does the same. It was verified with a Lua program to count stick-table entries that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc): function dump_stk() local dmp = core.proxies['tbl'].stktable:dump({}); local count = 0 for _, __ in pairs(dmp) do count = count + 1 end core.Info('Total entries: ' .. count) end core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0); ## global tune.lua.log.stderr on lua-load-per-thread lua-cnttbl.lua listen front bind :8001 http-request lua.dump_stk if { path_beg /stk } http-request track-sc1 rand(),upper,hex table tbl http-request redirect location / backend tbl stick-table size 100k type string len 12 store http_req_cnt ## $ h2load -c 16 -n 10000 0:8001/ $ curl 0:8001/stk ## A count close to 100k appears on haproxy's stderr ## On the CLI, "show table tbl" | wc will show the same. Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools.
It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008 kRPS and at 128 it gives 1040 kRPS for 8 times the initial size. 16 would be too low however, with 675k RPS. The stksess already has a shard number, it's the one used to decide which peer connection to send the entry. Maybe we should also store the one associated with the entry itself instead of recalculating it, though it does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock but the way it's used is not very clear yet, so at this point it's sure this will not work. At this point, this allowed us to completely unlock the performance on an 80-thread setup: before: 5.4 Gbps, 150k RPS, 80 cores 52.71% haproxy [.] stktable_lookup_key 36.90% haproxy [.] stktable_get_entry.part.0 0.86% haproxy [.] ebmb_lookup 0.18% haproxy [.] process_stream 0.12% haproxy [.] process_table_expire 0.11% haproxy [.] fwrr_get_next_server 0.10% haproxy [.] eb32_insert 0.10% haproxy [.] run_tasks_from_lists after: 36 Gbps, 980k RPS, 80 cores 44.92% haproxy [.] stktable_get_entry 5.47% haproxy [.] ebmb_lookup 2.50% haproxy [.] fwrr_get_next_server 0.97% haproxy [.] eb32_insert 0.92% haproxy [.] process_stream 0.52% haproxy [.] run_tasks_from_lists 0.45% haproxy [.] conn_backend_get 0.44% haproxy [.] __pool_alloc 0.35% haproxy [.] process_table_expire 0.35% haproxy [.] connect_server 0.35% haproxy [.] h1_headers_to_hdr_list 0.34% haproxy [.] eb_delete 0.31% haproxy [.] srv_add_to_idle_list 0.30% haproxy [.] h1_snd_buf WIP: uint64_t -> long WIP: ulong -> uint code is much smaller	2024-04-03 17:34:47 +02:00
Willy Tarreau	0a0041d195	BUILD: tree-wide: fix a few missing includes in a few files Some include files, mostly types definitions, are missing a few includes to define the types they're using, causing include ordering dependencies between files, which are most often not seen due to the alphabetical order of includes. Let's just fix them. These were spotted by building pre-compiled headers for all these files to .h.gch.	2024-03-05 11:50:34 +01:00
Willy Tarreau	c9c6b683fb	MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate This adds a new pair of stored types in the stick-tables: - glitch_cnt - glitch_rate These keep count of the number of glitches reported on a front connection, in order to decide how to act with a badly defective client or a potential attacker. For now nothing updates these counters, but all the infrastructure needed to configure, update and retrieve them was added, including the doc. No regtest was added yet since they're not filled yet.	2024-02-08 15:51:49 +01:00
Aurelien DARRAGON	e10cf61099	MINOR: stktable: add stktable_deinit function Adding stktable_deinit() helper function to properly clean up a sticktable that was initialized using stktable_init().	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	b8c19f877a	MINOR: stktable: stktable_init() sets err_msg on error stktable_init() now sets err_msg when error occurs so that caller is able to precisely report the cause of the failure.	2023-11-03 17:30:30 +01:00
Willy Tarreau	7968fe3889	MEDIUM: stick-table: change the ref_cnt atomically Due to the ts->ref_cnt being manipulated and checked inside wrlocks, we continue to have it updated under plenty of read locks, which have an important cost on many-thread machines. This patch turns them all to atomic ops and carefully moves them outside of locks every time this is possible: - the ref_cnt is incremented before write-unlocking on creation otherwise the element could vanish before we can do it - the ref_cnt is decremented after write-locking on release - for all other cases it's updated out of locks since it's guaranteed by the sequence that it cannot vanish - checks are done before locking every time it's used to decide whether we're going to release the element (saves several write locks) - expiration tests are just done using atomic loads, since there's no particular ordering constraint there, we just want consistent values. For Lua, the loop that is used to dump stick-tables could switch to read locks only, but this was not done. For peers, the loop that builds updates in peer_send_teachmsgs is extremely expensive in write locks and it doesn't seem this is really needed since the only updated variables are last_pushed and commitupdate, the first one being on the shared table (thus not used by other threads) and the commitupdate could likely be changed using a CAS. Thus all of this could theoretically move under a read lock, but that was not done here. On a 80-thread machine with a peers section enabled, the request rate increased from 415 to 520k rps.	2023-08-11 19:03:35 +02:00
William Lallemand	3f210970bf	BUG/MINOR: stick_table: alert when type len has incorrect characters Alert when the len argument of a stick table type contains incorrect characters. Replace atol by strtol. Could be backported to every maintained version.	2023-04-13 14:46:08 +02:00
Willy Tarreau	6c0117168e	MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters The number of stick-counter entries usable by track-sc rules is currently set at build time. There is no good value for this since the vast majority of users don't need any, most need only a few and rare users need more. Adding more counters for everyone increases memory and CPU usages for no reason. This patch moves the per-session and per-stream arrays to a pool of a size defined at boot time. This way it becomes possible to set the number of entries at boot time via a new global setting "tune.stick-counters" that sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR value still applies, or 3 if not set, as before. It is also possible to lower the value to 0 to save a bit of memory if not used at all. Note that a few low-level sample-fetch functions had to be protected due to the ability to use sample-fetches in the global section to set some variables.	2023-01-06 18:08:49 +01:00
Willy Tarreau	d5cae6a0c7	MINOR: stick-table: change the API of the function used to calculate the shard The function used to calculate the shard number currently requires a stktable_key on input for this. Unfortunately, it happens that peers currently miss this calculation and they do not provide stktable_key at all, instead they're open-coding all the low-level stick-table work (hence why it's missing). Thus we'll need to be able to calculate the shard number in keys coming from peers as well but the current API does not make it possible. This commit addresses this by inverting the order where the length and the shard number are used. Now the low-level function is independent of stksess and stktable_key, it takes a table, pointer and length and does all the job. The upper function takes care of the type and key to get its length, and is for use only from stick-table code. This doesn't change anything except that the low-level one will be usable from outside (hence why it's exported now).	2022-11-29 18:06:42 +01:00
Willy Tarreau	dbae89e09c	MEDIUM: stick-table: always use atomic ops to requeue the table's task We're generalizing the change performed in previous commit "MEDIUM: stick-table: requeue the expiration task out of the exclusive lock" to stktable_requeue_exp() so that it can also be used by callers of __stktable_store(). At the moment there's still no visible change since it's still called under the write lock. However, the previous code in stktable_touch_with_exp() was updated to use this function.	2022-10-12 14:19:05 +02:00
Willy Tarreau	8d3c3336f9	MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock stream_store_counters() calls stksess_kill_if_expired() for each active counter. And this one takes an exclusive lock on the table before checking if it has any work to do (hint: it almost never has since it only wants to delete expired entries). However a lock is still needed for now to protect the ref_cnt, but we can do it atomically under the read lock. Let's change the mechanism. Now what we do is to check out of the lock if the entry is expired. If it is, we take the write lock, expire it, and decrement the refcount. Otherwise we just decrement the refcount under a read lock. With this change alone, the config based on 3 trackers without the previous patches saw a 2.6x improvement, but here it doesn't yet change anything because some heavy contention remains on the lookup part.	2022-10-12 14:19:05 +02:00
Willy Tarreau	9f5cb435b6	MINOR: stick-table: move the write lock inside stktable_touch_with_exp() Taking the write lock prior to entering that function is a problem because this function is full of conditions that most of the time can lead to eliminating the lock. This commit first moves the write lock inside the function and passes the extra argument required to implement stktable_touch_remote() and stktable_touch_local(). It also renames the function to remove the underscores since there's no other variant and it's exported under this name (probably an old rename that was not propagated). The code was stressed under 48 threads using 3 trackers on the same table. It already shows a tiny 3% improvement from 187k to 193k rps.	2022-10-12 14:19:05 +02:00
Willy Tarreau	76642223f0	MEDIUM: stick-table: switch the table lock to rwlock Right now a spinlock is used, but most accesses are for reads, so let's switch the lock to an rwlock and switch all accesses to exclusive locks for now. There should be no visible difference at this point.	2022-10-12 14:19:05 +02:00
Willy Tarreau	9310f481ce	CLEANUP: tree-wide: remove unneeded include time.h in ~20 files 20 files used to have haproxy/time.h included only for now_ms, and two were missing it for other things but used to inherit from it via other files.	2021-10-07 01:41:14 +02:00
Emeric Brun	c64a2a307c	MEDIUM: stick-table: handle arrays of standard types into stick-tables This patch provides the code to handle arrays of some standard types (SINT, UINT, ULL and FRQP) in stick table. This way we could define new "array" data types. Note: the number of elements of an array was limited to 100 to put a limit and to ensure that an encoded update message will continue to fit into a buffer when the peer protocol will handle such data types.	2021-07-06 07:24:42 +02:00
Emeric Brun	0e3457b63a	MINOR: stick-table: make skttable_data_cast to use only std types This patch replaces all advanced data type aliases on stktable_data_cast calls by standard types. This way we could call the same stktable_data_cast regardless of the used advanced data type as long as they are using the same std type. It also removes all the advanced data type aliases.	2021-07-06 07:24:42 +02:00
Willy Tarreau	6ec1f25bc5	REORG: stick-table: move composite address functions to stick_table.h These caddr_* functions were once placed into tools.h in the hope they would be useful but nobody knows they exist. They could deserve being moved to their own file with other pointer manipulation functions maybe, but for now they're the only reason left for stick_table.h to include tools.h, so let's move them directly there since it's its only user. This allows to remove tools.h from stick_table.h and slightly reduce the overall build time.	2021-05-08 20:24:09 +02:00
Willy Tarreau	3b63ca20f4	REORG: stick-table: uninline stktable_alloc_data_type() This function has no business being inlined in stick_table.h since it's only used at boot time by the config parser. In addition it causes an undesired dependency on tools.h because it uses parse_time_err(). Let's move it to stick_table.c.	2021-05-08 20:24:09 +02:00
Willy Tarreau	5703a38a06	BUILD: stick-table: include freq_ctr.h from stick_table.h It's needed for update_freq_ctr_period() which is used there.	2021-05-08 19:37:41 +02:00
Willy Tarreau	fa1258f02c	MINOR: freq_ctr: unify freq_ctr and freq_ctr_period into freq_ctr Both structures are identical except the name of the field starting the period and its description. Let's call them all freq_ctr and the period's start "curr_tick" which is generic. This is only a temporary change and fields are expected to remain the same with no code change (verified).	2021-04-11 11:11:27 +02:00
Willy Tarreau	826f3ab5e6	MINOR: stick-tables/counters: add http_fail_cnt and http_fail_rate data types Historically we've been counting lots of client-triggered events in stick tables to help detect misbehaving ones, but we've been missing the same on the server side, and there's been repeated requests for being able to count the server errors per URL in order to precisely monitor the quality of service or even to avoid routing requests to certain dead services, which is also called "circuit breaking" nowadays. This commit introduces http_fail_cnt and http_fail_rate, which work like http_err_cnt and http_err_rate in that they respectively count events and their frequency, but they only consider server-side issues such as network errors, unparsable and truncated responses, and 5xx status codes other than 501 and 505 (since these ones are usually triggered by the client). Note that retryable errors are purposely not accounted for, so that only what the client really sees is considered. With this it becomes very simple to put some protective measures in place to perform a redirect or return an excuse page when the error rate goes beyond a certain threshold for a given URL, and give more chances to the server to recover from this condition. Typically it could look like this to bypass a URL causing more than 10 requests per second: stick-table type string len 80 size 4k expire 1m store http_fail_rate(1m) http-request track-sc0 base # track host+path, ignore query string http-request return status 503 content-type text/html \ lf-file excuse.html if { sc0_http_fail_rate gt 10 } A more advanced mechanism using gpt0 could even implement high/low rates to disable/enable the service. Reg-test converteers_ref_cnt_never_dec.vtc was updated to test it.	2021-02-10 12:27:01 +01:00
Christopher Faulet	84600631cd	MINOR: stick-tables: Add functions to update some values of a tracked counter The cumulative numbers of http requests, http errors, bytes received and sent and their respective rates for a tracked counters are now updated using specific stream independent functions. These functions are used by the stream but the aim is to allow the session to do so too. For now, there is no reason to perform these updates from the session, except from the mux-h2 maybe. But, the mux-h1, on the frontend side, will be able to return some errors to the client, before the stream creation. In this case, it will be mandatory to update counters tracked at the session level.	2020-12-04 14:41:49 +01:00
Willy Tarreau	b2551057af	CLEANUP: include: tree-wide alphabetical sort of include files This patch fixes all the leftovers from the include cleanup campaign. There were not that many (~400 entries in ~150 files) but it was definitely worth doing it as it revealed a few duplicates.	2020-06-11 10:18:59 +02:00
Willy Tarreau	872f2ea209	REORG: include: move stick_table.h to haproxy/stick_table{,-t}.h The stktable_types[] array declaration was moved to the main file as it had nothing to do in the types. A few declarations were reordered in the types file so that defines were before the structs. Thread-t was added since there are a few __decl_thread(). The loss of peers.h revealed that cfgparse-listen needed it.	2020-06-11 10:18:58 +02:00
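
The double-free scenario described in commit 9357873641 comes down to where the reference count is decremented relative to the shard lock. The following is only a minimal sketch of the corrected ordering, not HAProxy's code: every `demo_*` name is hypothetical, a C11 `atomic_flag` spinlock stands in for the real shard lock, and a `released` flag stands in for the entry being unlinked and freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one shard; a spinlock replaces the real lock. */
struct demo_shard {
    atomic_flag lock;
};

/* Hypothetical stand-in for a sticky session. */
struct demo_sess {
    atomic_uint ref_cnt; /* references still held on the entry */
    bool released;       /* models "removed from the tree and freed" */
    int free_calls;      /* counts releases so a double free is visible */
};

static void demo_shard_init(struct demo_shard *sh)
{
    atomic_flag_clear(&sh->lock);
}

static void demo_sess_init(struct demo_sess *ts, unsigned refs)
{
    atomic_init(&ts->ref_cnt, refs);
    ts->released = false;
    ts->free_calls = 0;
}

static void demo_lock(struct demo_shard *sh)
{
    while (atomic_flag_test_and_set_explicit(&sh->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_shard *sh)
{
    atomic_flag_clear_explicit(&sh->lock, memory_order_release);
}

/* Drop one reference and release the entry if it was the last one.
 * The decrement happens INSIDE the lock, so a purge cannot observe a
 * zero count between the decrement and the release.
 * Returns true if this call released the entry. */
static bool demo_kill(struct demo_shard *sh, struct demo_sess *ts)
{
    bool did_release = false;

    demo_lock(sh);
    assert(!ts->released); /* the caller still holds a reference */
    if (atomic_fetch_sub(&ts->ref_cnt, 1) == 1) {
        ts->released = true;
        ts->free_calls++;
        did_release = true;
    }
    demo_unlock(sh);
    return did_release;
}

/* Purge path: release the entry only if it is unreferenced and still
 * present. Returns true if this call released the entry. */
static bool demo_purge(struct demo_shard *sh, struct demo_sess *ts)
{
    bool did_release = false;

    demo_lock(sh);
    if (!ts->released && atomic_load(&ts->ref_cnt) == 0) {
        ts->released = true;
        ts->free_calls++;
        did_release = true;
    }
    demo_unlock(sh);
    return did_release;
}
```

With two references held, the first `demo_kill()` only decrements, and the second one releases the entry exactly once; a later `demo_purge()` finds nothing left to release.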

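
Commit a7f6b0ac03 explains that "brates-factor" makes the frequency counters count blocks of bytes instead of single bytes. The arithmetic can be sketched as below. This is an illustration under assumptions, not the actual implementation: the `demo_*` helpers are hypothetical, and the truncating division and the saturation at the 32-bit limit are choices made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count into 32-bit counter units of `factor` bytes each.
 * Assumption: the remainder is simply dropped, and values that still do
 * not fit are saturated at UINT32_MAX. */
static uint32_t demo_bytes_to_units(uint64_t bytes, unsigned factor)
{
    uint64_t units;

    assert(factor >= 1 && factor <= 1024); /* range ]0-1024] from the commit */
    units = bytes / factor;
    return units > UINT32_MAX ? UINT32_MAX : (uint32_t)units;
}

/* Convert counter units back into an approximate byte count. */
static uint64_t demo_units_to_bytes(uint32_t units, unsigned factor)
{
    assert(factor >= 1 && factor <= 1024);
    return (uint64_t)units * factor;
}
```

For example, 5 GiB (5368709120 bytes) does not fit in a 32-bit counter with a factor of 1, but with a factor of 1024 it is stored as 5242880 units and converts back to the same byte count.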
29 Commits
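
Commits 1a088da7c2 and 19f8762a98 describe computing the shard number from the key itself by hashing it, and skipping the hash when there is a single bucket. A rough sketch of that idea follows. It is not HAProxy's function: the `demo_*` names are hypothetical, and a 32-bit FNV-1a hash is swapped in for XXH32() purely to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, used here only as a stand-in for XXH32(). */
static uint32_t demo_hash32(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u; /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;       /* FNV prime */
    }
    return h;
}

/* Map a key to a shard index in [0, buckets). The same key always maps
 * to the same shard, so lookups and insertions agree on which tree to use. */
static unsigned demo_calc_shard(const void *key, size_t len, unsigned buckets)
{
    assert(buckets >= 1);
    if (buckets == 1)
        return 0;             /* single bucket: no need to hash at all */
    return demo_hash32(key, len) % buckets;
}
```

Because the shard depends only on the key bytes and the bucket count, a dump that walks the shards one after another is sorted within each shard but not globally, which matches the behaviour the commit message reports for "show table".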