haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-06 23:27:04 +02:00

Author	SHA1	Message	Date
Willy Tarreau	e03d05c6ce	MINOR: check: remember when we migrate a check The goal here is to explicitly mark that a check was migrated so that we don't do it again. This will allow us to perform other actions on the target thread while still knowing that we don't want to be migrated again. The new READY bit combines with SLEEPING to form 4 possible states (SLP RDY: state, description): 0 0: - (reserved); 0 1: RUNNING, check is bound to current thread and running; 1 0: SLEEPING, check is sleeping, not bound to a thread; 1 1: MIGRATING, check is migrating to another thread. Thus we set READY upon migration, and check for it before migrating, this is sufficient to prevent a second migration. To make things a bit clearer, the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are adjacent.	2023-09-01 08:26:06 +02:00
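The four states formed by the SLEEPING and READY bits can be sketched as follows. This is only an illustration: the `EX_` names and bit values are hypothetical and do not match the real `CHK_ST_*` flags, and the real code updates the state atomically.

```c
/* Hypothetical bit values, chosen only so that the two bits are adjacent. */
#define EX_CHK_ST_READY    0x1u
#define EX_CHK_ST_SLEEPING 0x2u

enum ex_check_state { EX_RESERVED, EX_RUNNING, EX_SLEEPING, EX_MIGRATING };

/* Decode the (SLP, RDY) pair into one of the four states. */
static enum ex_check_state ex_decode_state(unsigned int flags)
{
    int slp = !!(flags & EX_CHK_ST_SLEEPING);
    int rdy = !!(flags & EX_CHK_ST_READY);

    if (slp && rdy)
        return EX_MIGRATING;
    if (slp)
        return EX_SLEEPING;
    if (rdy)
        return EX_RUNNING;
    return EX_RESERVED;
}

/* Start a migration only from the SLEEPING state. Setting READY marks the
 * check as migrated, so a second call returns 0 and does nothing. */
static int ex_try_migrate(unsigned int *flags)
{
    if (ex_decode_state(*flags) != EX_SLEEPING)
        return 0;
    *flags |= EX_CHK_ST_READY;
    return 1;
}
```

Testing READY before migrating and setting it during the migration is what prevents the same check from being moved twice.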
Willy Tarreau	7163f95b43	MINOR: checks: start the checks in sleeping state The CHK_ST_SLEEPING state was introduced by commit `d114f4a68` ("MEDIUM: checks: spread the checks load over random threads") to indicate that a check was not currently bound to a thread and that it could easily be migrated to any other thread. However it did not start the checks in this state, meaning that they were not redispatchable on startup. Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0) the cost of setting up new connections is so high that some threads may experience connection timeouts on startup. In this case it's better if they can transfer their excess load to other idle threads. By just marking the check as sleeping upon startup, we can do this and significantly reduce the number of failed initial checks.	2023-09-01 08:26:06 +02:00
Willy Tarreau	52b260bae4	MINOR: server/ssl: maintain an index of the last known valid SSL session When a thread creates a new session for a server, if none was known yet, we assign the thread id (hence the reused_sess index) to a shared variable so that other threads will later be able to find it when they don't have one yet. For now we only set and clear the pointer upon session creation, we do not yet pick it. Note that we could have done it per thread-group, so as to avoid any cross-thread exchanges, but it's anticipated that this is essentially used during startup, at a moment where the cost of inter-thread contention is very low compared to the ability to restart at full speed, which explains why instead we store a single entry.	2023-08-31 08:50:01 +02:00
Willy Tarreau	607041dec3	MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session The goal will be to permit a thread to update its session while having it shared with other threads. For now we only place the lock and arrange the code around it so that this is quite light. For now only the owner thread uses this lock so there is no contention. Note that there is a subtlety in the openssl API regarding i2s_SSL_SESSION() in that it fills the area pointed to by its argument with a dump of the session and returns a size that's equal to the previously allocated one. As such, it does modify the shared area even if that's not obvious at first glance.	2023-08-31 08:50:01 +02:00
Alexander Stephan	ece0d1ab49	MINOR: sample: Refactor fc_pp_authority by wrapping the generic TLV fetch We already have a call that can retrieve a TLV with any value. Therefore, the fetch logic is redundant and can be simplified by calling the generic fetch with the correct TLV ID set as an argument.	2023-08-29 15:31:51 +02:00
Alexander Stephan	fecc573da1	MEDIUM: connection: Generic, list-based allocation and look-up of PPv2 TLVs In order to be able to implement fetches in the future that allow retrieval of any TLVs, a new generic data structure for TLVs is introduced. Existing TLV fetches for PP2_TYPE_AUTHORITY and PP2_TYPE_UNIQUE_ID are migrated to use this new data structure. TLV-related pools are updated to not rely on type, but only on size. Pools accommodate the TLV list element with their associated value. For now, two pools for 128 B and 256 B values are introduced. More fine-grained solutions are possible in the future, if necessary.	2023-08-29 15:15:47 +02:00
Alexander Stephan	c9d47652d2	CLEANUP/MINOR: connection: Improve consistency of PPv2 related constants This patch improves readability by scoping HAProxy-related PPv2 constants with an "HA" prefix. Besides, a new constant for the length of a CRC32C TLV is introduced. The length is derived from the PPv2 spec, so 32 bits.	2023-08-29 15:15:47 +02:00
Willy Tarreau	bd84387beb	MEDIUM: capabilities: enable support for Linux capabilities For a while there has been the constraint of having to run as root for transparent proxying, and we're starting to see some cases where QUIC is not running in socket-per-connection mode due to the missing capability that would be needed to bind a privileged port. It's not realistic to ask all QUIC users on port 443 to run as root, so instead let's provide a basic support for capabilities at least on Linux. The ones currently supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The mechanism was made OS-specific with a dedicated file because it really is. It can be easily refined later for other OSes if needed. A new keyword "setcap" is added to the global section, to enumerate the capabilities that must be kept when switching from root to non-root. This is ignored in other situations though. HAProxy has to be built with USE_LINUX_CAP=1 for this to be supported, which is enabled by default for linux-glibc, linux-glibc-legacy and linux-musl. A good way to test this is to start haproxy with such a config: global uid 1000 setcap cap_net_bind_service frontend test mode http timeout client 3s bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt and run it under "sudo strace -e trace=bind,setuid", then connecting there from an H3 client. The bind() syscall must succeed despite the user id having been switched.	2023-08-29 11:11:50 +02:00
William Lallemand	e7d9082315	BUG/MINOR: ssl/cli: can't find ".crt" files when replacing a certificate Bug was introduced by commit 26654 ("MINOR: ssl: add "crt" in the cert_exts array"). When looking for a .crt directly in the cert_exts array, the ssl_sock_load_pem_into_ckch() function will be called with an argument which no longer has its ".crt" extension. If "ssl-load-extra-del-ext" is used this is not a problem since we try to add the ".crt" when doing the lookup in the tree. However, when using a ".crt" directly without this option, the lookup of the file in the tree will fail. The fix removes the "crt" entry from the array since it does not seem to be really useful without a rework of all the lookups. Should fix issue #2265 Must be backported as far as 2.6.	2023-08-28 18:20:39 +02:00
Willy Tarreau	892d04733f	BUILD: import: guard plock.h against multiple inclusion Surprisingly there's no include guard in plock.h though there is one in atomic-ops.h. Let's add one, or we cannot risk including the file multiple times.	2023-08-26 17:28:08 +02:00
Amaury Denoyelle	5afcb686b9	MAJOR: connection: purge idle conn by last usage Backend idle connections are purged on a recurring basis during the process lifetime. An estimated number of needed connections is calculated and the excess is removed periodically. Before this patch, purge was done directly using the idle then the safe connection tree of a server instance. This has the major drawback of taking no account of any specific order, so it may remove functional connections while leaving ones which will fail on the next reuse. The problem can be worse when using criteria to differentiate idle connections such as the SSL SNI. In this case, purge may remove connections with a high reuse rate while leaving connections whose criteria were never matched once, thus drastically reducing the reuse rate. To improve this, introduce an alternative storage for idle connections used in parallel with the idle/safe trees. Now, each connection inserted in one of these trees is also inserted in the new list at `srv_per_thread.idle_conn_list`. This guarantees that a recently used connection is present at the end of the list. During the purge, use this list instead of the idle/safe trees. Remove first the connections at the front of the list, which were not reused recently. This will ensure that connections that are frequently reused are not purged and should increase the reuse rate, particularly if distinct idle connection criteria are in use.	2023-08-25 15:57:48 +02:00
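The purge order described above can be illustrated with a small doubly linked list where idle connections are appended at the tail and purged from the head. This is a minimal sketch under assumptions: all `ex_` names are hypothetical, and the real code uses HAProxy's own list macros under the idle_conns lock.

```c
#include <stddef.h>

struct ex_conn {
    struct ex_conn *prev, *next;
    int id;
};

/* Sentinel-based list: head.next is the least recently used connection,
 * head.prev is the most recently used one. */
struct ex_idle_list {
    struct ex_conn head;
};

static void ex_list_init(struct ex_idle_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

/* Unlink a connection, e.g. when it is picked for reuse or purged. */
static void ex_list_del(struct ex_conn *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->prev = c->next = c;
}

/* Append at the tail: called each time a connection becomes idle. */
static void ex_list_add_tail(struct ex_idle_list *l, struct ex_conn *c)
{
    c->prev = l->head.prev;
    c->next = &l->head;
    l->head.prev->next = c;
    l->head.prev = c;
}

/* Remove up to <count> connections, oldest first; returns how many were
 * removed. A real purge would also close and free each connection. */
static int ex_purge(struct ex_idle_list *l, int count)
{
    int removed = 0;

    while (removed < count && l->head.next != &l->head) {
        ex_list_del(l->head.next);
        removed++;
    }
    return removed;
}
```

Because a reused connection is unlinked and later appended again at the tail, frequently reused connections stay away from the head and survive the purge.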
Amaury Denoyelle	61fc9568fb	MINOR: server: move idle tree insert in a dedicated function Define a new function _srv_add_idle(). This is a simple wrapper to insert a connection in the server idle tree. This is reserved for simple usage and requires the idle_conns lock to be held. In most cases, srv_add_to_idle_list() should be used. This patch does not have any functional change. However, it will help with the next patch as idle connections will always be inserted in a list as secondary storage along with idle/safe trees.	2023-08-25 15:57:48 +02:00
Amaury Denoyelle	77ac8eb4a6	MINOR: connection: simplify removal of idle conns from their trees Small change of API for conn_delete_from_tree(). Now the connection instance is taken as argument instead of its inner node. No functional change introduced with this commit. This slightly simplifies invocation of conn_delete_from_tree(). The most useful change is that this function will be extended in the next patch to be able to remove the connection from its new idle list at the same time as from its idle tree.	2023-08-25 15:57:48 +02:00
Frédéric Lécaille	81815a9a83	MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock. Replace ->lock type of pat_ref struct by HA_RWLOCK_T. Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK() (resp. HA_RWLOCK_WRUNLOCK()) when a write access is required. There is only one read access which is needed. This is in the "show map" command callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK()). Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.	2023-08-25 15:42:03 +02:00
Frédéric Lécaille	745d1a269b	MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl" action perfs) Store a pointer to the expression (struct pattern_expr) into the data structure used to chain/store the map element references (struct pat_ref_elt), e.g. the struct pattern_tree when stored into an ebtree or struct pattern_list when chained to a list. Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map and to look for the <elt> element passed as parameter to retrieve the sample data to be parsed. Indeed, thanks to the pointer added above to each pattern tree node or list element, they can all be inspected directly from the <elt> passed as parameter and its ->tree_head and ->list_head members: the pattern tree nodes are stored into elt->tree_head, and the pattern list elements are chained to the elt->list_head list. This inspection was also the job of pattern_find_smp() which is no longer useful. This patch removes the code of this function.	2023-08-25 15:41:59 +02:00
Frédéric Lécaille	0844bed7d3	MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs) Organize references to pattern elements of a map (struct pat_ref_elt) into an ebtree: - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its element (pat_ref_elt struct), - modify the code to insert these nodes into their ebtrees each time they are allocated. This is done in pat_ref_append(). Note that the ->head member (struct list) of the map (struct pat_ref) could have been removed. It was kept because it is still necessary to dump the map contents from the CLI in the order the map elements have been inserted. This patch also modifies http_action_set_map() which is the callback at least used by the "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt() is no longer ignored, but reused if not NULL by pat_ref_set() as the first element to look up from. The latter is also modified to use the ebtree attached to the map in place of the ->head list attached to each map element (pat_ref_elt struct). Also modify pat_ref_find_elt() to make it use the ->eb_root map ebtree added to the map by this patch in place of inspecting all the elements with a strcmp() call.	2023-08-25 15:41:56 +02:00
Amaury Denoyelle	5053e89142	MEDIUM: h2: prevent stream opening before connection reverse completed HTTP/2 demux must be handled with care for active reverse connection. Until accept has been completed, it should be forbidden to handle HEADERS frame as session is not yet ready to handle streams. To implement this, use the flag H2_CF_DEM_TOOMANY which blocks demux process. This flag is automatically set just after conn_reverse() invocation. The flag is removed on rev_accept_conn() callback via a new H2 ctl enum. H2 tasklet is woken up to restart demux process. As a side-effect, reporting in H2 mux may be blocked as demux functions are used to convert error status at the connection level with CO_FL_ERROR. To ensure error is reported for a reverse connection, check h2c_is_dead() specifically for this case in h2_wake(). This change also has its own side-effect : h2c_is_dead() conditions have been adjusted to always exclude !h2c->conn->owner condition which is always true for reverse connection or else H2 mux may kill them unexpectedly.	2023-08-24 17:03:08 +02:00
Amaury Denoyelle	47f502df5e	MEDIUM: proto_reverse_connect: bootstrap active reverse connection Implement active reverse connection initialization. This is done through a new task stored in the receiver structure. This task is instantiated via bind callback and first woken up via enable callback. Task handler is separated into two halves. On the first step, a new connection is allocated and stored in <pend_conn> member of the receiver. This new client connection will proceed to connect using the server instance referenced in the bind_conf. When connect has successfully been executed and HTTP/2 connection is ready for exchange after SETTINGS, reverse_connect task is woken up. As <pend_conn> is still set, the second half is executed, which only executes listener_accept(). This will in turn execute accept_conn callback which is defined to return the pending connection. The task is automatically requeued inside accept_conn callback if bind maxconn is not yet reached. This allows to specify how many connections should be opened. Each connection is instantiated and reversed serially one by one until maxconn is reached. conn_free() has been modified to handle failure if a reverse connection fails before being accepted. In this case, no session exists to notify about the failure. Instead, reverse_connect task is requeued with a 1 second delay, giving time to fix a possible network issue. This will allow to attempt a new connection reverse. Note that for the moment connection rebinding after accept is disabled for simplicity. Extra operations are required to migrate an existing connection and its stack to a new thread which will be implemented later.	2023-08-24 17:03:06 +02:00
Amaury Denoyelle	0747e493a0	MINOR: proto_reverse_connect: parse rev@ addresses for bind Implement parsing for "rev@" addresses on bind line. On config parsing, server name is stored on the bind_conf. Several new callbacks are defined on reverse_connect protocol to complete parsing. listen callback is used to retrieve the server instance from the bind_conf server name. If found, the server instance is stored on the receiver. Checks are implemented to ensure HTTP/2 protocol only is used by the server.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	008e8f67ee	MINOR: connection: extend conn_reverse() for active reverse Implement active reverse support inside conn_reverse(). This is used to transfer the connection from the backend to the frontend side. A new flag is defined CO_FL_REVERSED which is set just after this transition. This will be used to identify connections which were reversed but not yet accepted.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	5db6dde058	MINOR: proto: define dedicated protocol for active reverse connect A new protocol named "reverse_connect" is created. This will be used to instantiate connections that are opened by a reverse bind. For the moment, only a minimal set of callbacks are defined with no real work. This will be extended along the next patches.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	1723e21af2	MINOR: connection: use attach-srv name as SNI reuse parameter on reverse On connection passive reverse from frontend to backend, its hash node is calculated to be able to select it from the idle server pool. If attach-srv rule defined an associated name, reuse it as the value for SNI prehash. This change allows a client to select a reverse connection by its name by configuring its server line with a SNI to permit this.	2023-08-24 17:02:34 +02:00
Amaury Denoyelle	0b3758e18f	MINOR: tcp-act: define optional arg name for attach-srv Add an optional argument 'name' for attach-srv rule. This contains an expression which will be used as an identifier inside the server idle pool after reversal. To match this connection for a future transfer through the server, the SNI server parameter must match this name. If no name is defined, match will only occur with an empty SNI value. For the moment, only the parsing step is implemented. An extra check is added to ensure that the reverse server uses SSL with a SNI. Indeed, if name is defined but the server does not use a SNI, connections will never be selected for reuse after reversal due to a hash mismatch.	2023-08-24 15:28:38 +02:00
Amaury Denoyelle	58cb76d7e1	MINOR: tcp-act: parse 'tcp-request attach-srv' session rule Create a new tcp-request session rule 'attach-srv'. The parsing handler is used to extract the server targeted with the notation 'backend/server'. The server instance is stored in the act_rule instance under the new union variant 'attach_srv'. Extra checks are implemented in parsing to ensure attach-srv is only used for proxy in HTTP mode and with listeners/server with no explicit protocol reference or HTTP/2 only. The action handler itself is really simple. It assigns the stored server instance to the 'reverse' member of the connection instance. It will be used in a future patch to implement passive reverse-connect.	2023-08-24 15:02:32 +02:00
Amaury Denoyelle	6e428dfaf2	MINOR: backend: only allow reuse for reverse server A reverse server relies solely on its pool of idle connections to transfer requests, which will be populated through a new tcp-request rule 'attach-srv'. Several changes are required on connect_server() to implement this. First, reuse mode is forced to always for this type of server. Then, if no idle connection is found, the request will be aborted. This results in a 503 HTTP error code, similarly to when no server is available.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	e6223a3188	MINOR: server: define reverse-connect server Implement reverse-connect server. This server type cannot instantiate its own connection on transfer. Instead, it can only reuse connection from its idle pool. These connections will be populated using the future 'tcp-request session attach-srv' rule. A reverse-connect has no address. Instead, it uses a new custom server notation with '@' character prefix. For the moment, only '@reverse' is defined. An extra check is implemented to ensure server is used in a HTTP proxy.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	4fb538d4b6	MEDIUM: h2: reverse connection after SETTINGS reception Reverse connection after SETTINGS reception if it was set as reversible. This operation is done in a new function h2_conn_reverse(). It regroups common changes which are needed for both reversal directions: H2_CF_IS_BACK is set or unset and timeouts are inverted. For the moment, only passive reverse is fully implemented. Once done, the connection instance is directly inserted in its targeted server pool. It can then be used immediately for future transfers using this server.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	1f76b8ae07	MEDIUM: connection: implement passive reverse Define a new method conn_reverse(). This method is used to reverse a connection from frontend to backend or vice-versa depending on its initial status. For the moment, only passive reverse is implemented. This covers the transition from frontend to backend side. The connection is detached from its owner session which can then be freed. Then the connection is linked to the server instance.	2023-08-24 14:44:33 +02:00
Amaury Denoyelle	fbe35afaa4	MINOR: proxy: simplify parsing 'backend/server' Several CLI handlers use a server argument specified with the format '<backend>/<server>'. The parsing of this argument is done in two steps, first splitting the string with '/' delimiter and then using get_backend_server() to retrieve the server instance. Refactor these code sections with the following changes: * splitting is reimplemented using ist API * get_backend_server() is removed. Instead use the already existing proxy_be_by_name() then server_find_by_name() which contains duplicated code with the now removed function. No functional change occurs with this commit. However, it will be useful to add new configuration options reusing the same '<backend>/<server>' for reverse connect.	2023-08-24 14:44:33 +02:00
Willy Tarreau	9b47ed1a93	IMPORT: xxhash: update xxHash to version 0.8.2 Peter Varkoly reported a build issue on ppc64le in xxhash.h. Our version (0.8.1) was the last one 9 months ago, and since then this specific issue was addressed in 0.8.2, so let's apply the maintenance update. This should be backported to 2.8 and 2.7.	2023-08-24 12:01:06 +02:00
Amaury Denoyelle	cd97ba147c	BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1 Compilation is broken due to missing __pl_wait_unlock_long() definition when building with PLOCK_DISABLE_EBO=1. This has been introduced since the following commit which activates the inlining version of pl_wait_unlock_long() : commit `071d689a51` MINOR: threads: inline the wait function for pthread_rwlock emulation Add an extra check on PLOCK_DISABLE_EBO before choosing the inline or default version of pl_wait_unlock_long() to fix this.	2023-08-17 11:16:54 +02:00
Willy Tarreau	78fa54863d	MINOR: atomic: make sure to always relax after a failed CAS There were a few places left where we forgot to call __ha_cpu_relax() after a failed CAS, in the HA_ATOMIC_UPDATE_{MIN,MAX} macros, and in a few sync_* API macros (the same as above plus HA_ATOMIC_CAS and HA_ATOMIC_XCHG). Let's add them now. This could have been a cause of contention, particularly with process_stream() calling stream_update_time_stats() which uses 8 of them in a call (4 for the server, 4 for the proxy). This may be a possible explanation for the high CPU consumption reported in GH issue #2251. This should be backported at least to 2.6 as it's harmless.	2023-08-17 09:09:20 +02:00
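The pattern of relaxing after a failed CAS can be sketched with C11 atomics. The names below are hypothetical stand-ins: `ex_cpu_relax()` plays the role of `__ha_cpu_relax()` and `ex_atomic_update_max()` that of `HA_ATOMIC_UPDATE_MAX`, and the per-architecture instructions are assumptions for illustration only.

```c
#include <stdatomic.h>

/* Stand-in for a CPU relax hint. On architectures not listed here it
 * degrades to a plain compiler fence. */
static inline void ex_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__)
    __asm__ __volatile__("isb" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Raise *val to at least <new_val> and return the resulting maximum.
 * On a failed CAS, <old> is refreshed with the current value and the CPU
 * is relaxed before retrying, so that competing threads can progress. */
static inline unsigned int ex_atomic_update_max(_Atomic unsigned int *val,
                                                unsigned int new_val)
{
    unsigned int old = atomic_load(val);

    while (old < new_val &&
           !atomic_compare_exchange_weak(val, &old, new_val))
        ex_cpu_relax();

    return old < new_val ? new_val : old;
}
```

Without the relax call, several threads retrying the same CAS in a tight loop keep bouncing the cache line between cores, which is the kind of contention the commit message suspects.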
Willy Tarreau	071d689a51	MINOR: threads: inline the wait function for pthread_rwlock emulation When using pthread_rwlock emulation, contention is reported on pl_wait_unlock_long(). This is really not convenient to analyse what is happening. Now plock supports inlining the wait call for just the lorw functions by enabling PLOCK_LORW_INLINE_WAIT. Let's do this so that now the wait time will be precisely reported as either pthread_rwlock_rdlock() or pthread_rwlock_wrlock() depending on the contended function, but no more on pl_wait_unlock_long(), which will still be reported for all other locks.	2023-08-17 00:09:05 +02:00
Willy Tarreau	e56275378f	IMPORT: lorw: support inlining the wait call Now when PLOCK_LORW_INLINE_WAIT is defined, the pl_wait_unlock_long() calls in pl_lorw_rdlock() and pl_lorw_wrlock() will be inlined so that all the CPU time is accounted for in the calling function. This is plock upstream commit c993f81d581732a6eb8fe3033f21970420d21e5e.	2023-08-17 00:09:05 +02:00
Willy Tarreau	66dcc0550e	IMPORT: plock: always expose the inline version of the lock wait function Doing so will allow to expose the time spent in certain highly contended functions, which can be desirable for more accurate CPU profiling. For example this could be done in locking functions that are already not inlined so that they are the ones being reported as those consuming the CPU instead of just pl_wait_unlock_long(). This is plock upstream commit 7505c2e2c8c4aa0ab8f52a2288e1334ae6412be4.	2023-08-17 00:09:05 +02:00
Willy Tarreau	c6b98f05d2	IMPORT: plock: also support inlining the int code Commit 9db830b ("plock: support inlining exponential backoff code") added an option to support inlining of the wait code for longs but forgot to do it for ints. Let's do it now. This is plock upstream commit b1f9f0d252fa40577d11cfb2bc0a809d6960a297.	2023-08-17 00:09:05 +02:00
Willy Tarreau	7bf829ace1	MAJOR: pools: move the shared pool's free_list over multiple buckets This aims at further reducing the contention on the free_list when using global pools. The free_list pointer now appears for each bucket, and both the alloc and the release code skip to a next bucket when ending on a contended entry. The default entry used for allocations and releases depend on the thread ID so that locality is preserved as much as possible under low contention. It would be nice to improve the situation to make sure that releases to the shared pools doesn't consider the first entry's pointer but only an argument that would be passed and that would correspond to the bucket in the thread's cache. This would reduce computations and make sure that the shared cache only contains items whose pointers match the same bucket. This was not yet done. One possibility could be to keep the same splitting in the local cache. With this change, an h2load test with 5 * 160 conns & 40 streams on 80 threads that was limited to 368k RPS with the shared cache jumped to 3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets and 5.5M RPS for 64 buckets.	2023-08-12 19:04:34 +02:00
Willy Tarreau	8a0b5f783b	MINOR: pools: move the failed allocation counter over a few buckets The failed allocation counter cannot depend on a pointer, but since it's a perpetually increasing counter and not a gauge, we don't care where it's incremented. Thus instead we're hashing on the TID. There's no contention there anyway, but it's better not to waste the room in the pool's heads and to move that with the other counters.	2023-08-12 19:04:34 +02:00
Willy Tarreau	da6999f839	MEDIUM: pools: move the needed_avg counter over a few buckets That's the same principle as for ->allocated and ->used. Here we return the sum of the raw values, so the result still needs to be fed to swrate_avg(). It also means that we now use the local ->used instead of the global one for the calculations and do not need to call pool_used() anymore on fast paths. The number of samples should likely be divided by the number of buckets, but that's not done yet (better observe first). A function pool_needed_avg() was added to report aggregated values for the "show pools" command. With this change, an h2load made of 5 * 160 conn * 40 streams on 80 threads raised from 1.5M RPS to 6.7M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	9e5eb586b1	MEDIUM: pools: move the used counter over a few buckets That's the same principle as for ->allocated. The small difference here is that it's no longer possible to decrement ->used in batches when releasing clusters from the cache to the shared cache, so the counter has to be decremented for each of them. But as it provides less contention and it's done only during forced eviction, it shouldn't be a problem. A function "pool_used()" was added to return the sum of the entries. It's used by pool_alloc_nocache() and pool_free_nocache() which need to count the number of used entries. It's not a problem since such operations are done when picking/releasing objects to/from the OS, but it is a reminder that the number of buckets should remain small. With this change, an h2load test made of 5 * 160 conn * 40 streams on 80 threads raised from 812k RPS to 1.5M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	cdb711e42b	MEDIUM: pools: spread the allocated counter over a few buckets The ->used counter is one of the most stressed, and it heavily depends on the ->allocated one, so let's first move ->allocated to a few buckets. A function "pool_allocated()" was added to return the sum of the entries. It's important not to abuse it as it does iterate, so everywhere it's possible to avoid it by keeping a local counter, it's better. Currently it's used for limited pools which need to make sure they do not allocate too many objects. That's an acceptable tradeoff to save CPU on large machines at the expense of spending a little bit more on small ones which normally are not under load.	2023-08-12 19:04:34 +02:00
Willy Tarreau	06885aaea7	MINOR: pools: introduce the use of multiple buckets On many threads and without the shared cache, there can be extreme contention on the ->allocated counter, the ->free_list pointer, and the ->used counter. It's possible to limit this contention by spreading the counters a little bit over multiple entries, that are summed up when a consultation is needed. The criterion used to spread the values cannot be related to the thread ID due to migrations, since we need to keep consistent stats (allocated vs used). Instead we'll just hash the pointer, it provides an index that does the job and that is consistent for the object. When having just a few entries (16 here as it showed almost identical performance between global and non-global pools) even iterations should be short enough during measurements to not be a problem. A pair of functions designed to ease pointer hash bucket calculation were added, with one of them doing it for thread IDs because allocation failures will be associated with a thread and not a pointer. For now this patch only brings in the relevant parts of the infrastructure, the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512 threads or more are supported, 5 bits when 128 or more are supported, 4 bits when 16 or more are supported, otherwise 3 bits for small setups. The array in the pool_head and the two utility functions are already added. It should have no measurable impact beyond inflating the pool_head structure.	2023-08-12 19:04:34 +02:00
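The idea of spreading a counter over buckets selected by hashing the object's pointer can be sketched as follows. Everything here is hypothetical (the `ex_` names, the 4-bit bucket count and the hash constant); it is not the actual pool_head layout nor HAProxy's pointer hash.

```c
#include <stdatomic.h>
#include <stdint.h>

#define EX_BUCKET_BITS 4
#define EX_BUCKETS     (1u << EX_BUCKET_BITS)

struct ex_pool {
    struct {
        _Atomic unsigned int allocated;
    } buckets[EX_BUCKETS];
};

/* Illustrative multiplicative hash of a pointer to a bucket index. The
 * same object always maps to the same bucket, whichever thread uses it. */
static unsigned int ex_ptr_bucket(const void *p)
{
    uint32_t x = (uint32_t)((uintptr_t)p >> 4); /* drop alignment bits */

    return (uint32_t)(x * 2654435761u) >> (32 - EX_BUCKET_BITS);
}

static void ex_account_alloc(struct ex_pool *pool, const void *obj)
{
    atomic_fetch_add(&pool->buckets[ex_ptr_bucket(obj)].allocated, 1);
}

static void ex_account_free(struct ex_pool *pool, const void *obj)
{
    atomic_fetch_sub(&pool->buckets[ex_ptr_bucket(obj)].allocated, 1);
}

/* Sum all buckets. This iterates, so it belongs on slow paths only. */
static unsigned int ex_pool_allocated(struct ex_pool *pool)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i < EX_BUCKETS; i++)
        total += atomic_load(&pool->buckets[i].allocated);
    return total;
}
```

Hashing the pointer rather than the thread ID keeps each bucket consistent when an object is allocated on one thread and released on another, since both operations land in the same bucket.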
Willy Tarreau	29ad61fb00	OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated The pool's allocation counter doesn't strictly require to be updated from these functions, it may more efficiently be done in the caller (even out of a loop for pool_flush() and pool_gc()), and doing so will also help us spread the counters over an array later. The functions were renamed _noinc and _nodec to make sure we catch any possible user in an external patch. If needed, the original functions may easily be reimplemented in an inline function.	2023-08-12 19:04:34 +02:00
Willy Tarreau	f0d188f6ed	OPTIM: tools: improve hash distribution using a better prime seed During tests it was noticed that the current hash is not that good on 4- and 5- bit hashes. About 7.5% of all the 32-bit primes were tested as candidates for the hash function, by submitting them 128 arrangements of N pointers among 40k extracted from haproxy's pools, and the average fill rates for 1- to 12- bit hashes were measured and compared. It was clear that some values do not provide great hashes and other ones are way more resistant. The current value is not bad at all but delivers 42.6% unique 2-bit outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some values did perform significantly better, among which 0xacd1be85 which does 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3% 10-bit. The reverse value used in the ptr2_hash() was really underperforming and was replaced with 0x9d28e4e9 which does 49.6%, 40.4%, 42.6%, 39.1%, and 37.2% respectively. This should slightly improve the accuracy of the task and memory profiling, and will be useful for pools.	2023-08-12 19:04:34 +02:00
Willy Tarreau	58946d44f8	MINOR: tools: improve ptr hash distribution on 64 bits When testing the pointer hash on 64-bit real pointers (map entries), it appeared that the shift by 33 bits that hoped to compensate for the 3 null LSBs degrades the hash, and the centering is more optimal on 31-(bits+1)/2. This makes sense since the topmost bit of the multiplier is 31, so for an input of 1 bit and 1 bit of output we would always get zero. With the formula adjusted this way, we can get up to ~15% more unique entries at 10 bits and ~24% more at 11 bits.	2023-08-12 19:04:34 +02:00
Willy Tarreau	ab6cb5dea0	MINOR: tools: make ptr_hash() support 0-bit outputs When dealing with macro-based size definitions, it is useful to be able to hash pointers on zero bits so that the macro automatically returns a constant 0. For now it only supports 1-32. Let's just add this special case. It's automatically optimized out by the compiler since the function is inlined.	2023-08-12 19:04:34 +02:00
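A minimal sketch of the zero-bit special case follows. It is not the real ptr_hash(): `demo_ptr_hash()` is a hypothetical name, the folding and shift are simplified assumptions, and only the seed 0xacd1be85 is taken from the log above. The point is that a constant `bits == 0` lets an inlined call collapse to a constant 0, while also avoiding an undefined shift by the full word width.

```c
#include <stdint.h>

/* Hypothetical pointer hash returning <bits> bits (0..32). */
static inline unsigned int demo_ptr_hash(const void *p, int bits)
{
	uint64_t v = (uint64_t)(uintptr_t)p;
	uint32_t x;

	/* Special case: zero bits requested, the only possible output is 0.
	 * Without it, the shift below would be by 32 bits, which is
	 * undefined for a 32-bit operand.
	 */
	if (bits == 0)
		return 0;

	x = (uint32_t)(v ^ (v >> 32));  /* fold the pointer onto 32 bits */
	x *= 0xacd1be85U;               /* seed quoted in the log above */
	return x >> (32 - bits);        /* keep the <bits> topmost bits */
}
```

With a macro-defined size, a caller can then write `demo_ptr_hash(ptr, SOME_BITS_MACRO)` (macro name hypothetical) and get a valid index whether the macro is 0 or larger.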
Willy Tarreau	59c347c15e	BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP LONGBITS was defined long ago with old compilers that didn't provide the word size. It's still present as being referenced in various places in the code, but we must not use it to define other macros that may be evaluated at pre-processing time since it contains sizeof() and casts that are not compatible with preprocessor conditions. Let's switch MAX_THREADS_PER_GROUP to __WORDSIZE so that we can condition blocks of code on it if needed. LONGBITS should really be removed by now, given that we don't support compilers not providing __WORDSIZE anymore (gcc < 4.2).	2023-08-12 19:04:34 +02:00
Willy Tarreau	9e52c35de4	CLEANUP: stick-table: slightly reorder the stktable struct By moving the config-time stuff after the updt_lock, we can plug some holes without interfering with it. This allows us to get back to the 768-bytes struct. The performance was not affected at all.	2023-08-11 19:03:35 +02:00
Willy Tarreau	9c6248560e	MINOR: stick-table: move the update lock into its own cache line The read-lock contention observed on the update lock while turning it into an upgradable lock was due to false sharing with the nearby updates. Simply moving the lock alone into its own cache line is sufficient to almost double the performance again, raising from 2355 to 4480k RPS with very low contention: Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 743422995452 lost Overhead Shared Object Symbol 15.88% haproxy [.] stktable_lookup_key 5.94% haproxy [.] ebmb_lookup 5.69% haproxy [.] http_wait_for_request 3.66% haproxy [.] stktable_touch_with_exp 2.62% [kernel] [k] _raw_spin_unlock_irqrestore 1.86% haproxy [.] http_action_return 1.79% haproxy [.] stream_process_counters 1.78% [kernel] [k] skb_release_data 1.77% haproxy [.] process_stream Unfortunately, trying to move the line anywhere else didn't work, despite the remaining holes, because this structure is not quite clean. This adds 64 bytes to a struct that was already 768 long, so it's now 832. It's possible to repack it a little bit and regain these bytes by removing the THREAD_ALIGN before "keys" because we rarely use the config stuff, but that's a bit unsafe.	2023-08-11 19:03:35 +02:00
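The false-sharing fix described in this entry amounts to giving the lock a cache line of its own. The sketch below shows the general technique with C11 `alignas`; `struct demo_table`, its field layout and the 64-byte line size are assumptions for illustration, not the actual struct stktable (which uses haproxy's own alignment macros).

```c
#include <stddef.h>
#include <stdalign.h>

#define DEMO_CACHELINE 64  /* assumed cache line size */

/* Hypothetical table layout: the lock sits alone on its cache line so
 * that readers spinning on it do not share a line with the frequently
 * written update fields that follow.
 */
struct demo_table {
	unsigned long lookups;                           /* mostly-read area */

	alignas(DEMO_CACHELINE) unsigned long updt_lock; /* lock alone on its line */

	alignas(DEMO_CACHELINE) unsigned long update;    /* write-heavy area */
	unsigned long localupdate;
};
```

The cost is visible in the structure size: each forced alignment may leave padding behind, which is the 64-byte growth the commit message mentions.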
Willy Tarreau	87e072eea5	MEDIUM: stick-table: use a distinct lock for the updates tree Updating an entry in the updates tree is currently performed under the table's write lock, which causes huge contention with other accesses such as lookups and free. Aside the updates tree, the update, localupdate and commitupdate variables, nothing is manipulated, so let's create a distinct lock (updt_lock) to protect these together to remove this contention. It required to add an extra lock in the few places where we delete the update (though only if we're really going to delete it) to protect the tree. This is very convenient because now peer_send_teachmsgs() only needs to take this read lock, and there is very little contention left on the stick-table. With this alone, the performance jumped from 614k to 1140k/s on a 80-thread machine with a peers section! Stick-table updates with no peers however now has to stand two locks and slightly regressed from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the significant unlocking of the peers updates and considered totally acceptable.	2023-08-11 19:03:35 +02:00
Willy Tarreau	cc10fce9c2	MINOR: stick-table: better organize the struct stktable The structure currently mixes R/O and R/W fields, let's organize them by access type, focusing mainly on splitting the updates from the rest so that peers activity does not affect the rest. For now it doesn't bring any benefit but it paves the way for splitting the lock.	2023-08-11 19:03:35 +02:00
Willy Tarreau	7968fe3889	MEDIUM: stick-table: change the ref_cnt atomically Due to the ts->ref_cnt being manipulated and checked inside wrlocks, we continue to have it updated under plenty of read locks, which have an important cost on many-thread machines. This patch turns them all to atomic ops and carefully moves them outside of locks every time this is possible: - the ref_cnt is incremented before write-unlocking on creation otherwise the element could vanish before we can do it - the ref_cnt is decremented after write-locking on release - for all other cases it's updated out of locks since it's guaranteed by the sequence that it cannot vanish - checks are done before locking every time it's used to decide whether we're going to release the element (saves several write locks) - expiration tests are just done using atomic loads, since there's no particular ordering constraint there, we just want consistent values. For Lua, the loop that is used to dump stick-tables could switch to read locks only, but this was not done. For peers, the loop that builds updates in peer_send_teachmsgs is extremely expensive in write locks and it doesn't seem this is really needed since the only updated variables are last_pushed and commitupdate, the first one being on the shared table (thus not used by other threads) and the commitupdate could likely be changed using a CAS. Thus all of this could theoretically move under a read lock, but that was not done here. On a 80-thread machine with a peers section enabled, the request rate increased from 415 to 520k rps.	2023-08-11 19:03:35 +02:00
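As an illustration of the atomic reference counting described in this entry, here is a sketch using C11 `<stdatomic.h>`. The names (`struct demo_entry`, `demo_entry_*`) are hypothetical and haproxy uses its own atomic macros rather than these calls; the memory orderings shown are one reasonable choice, not necessarily the ones used in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stick-table entry reduced to its reference counter. */
struct demo_entry {
	atomic_uint ref_cnt;
};

/* Take a reference without holding any lock. The caller must already be
 * guaranteed that the entry cannot vanish (e.g. it was just created
 * under the write lock, or it already holds another reference).
 */
static inline void demo_entry_take(struct demo_entry *e)
{
	atomic_fetch_add_explicit(&e->ref_cnt, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this was the last one. */
static inline bool demo_entry_release(struct demo_entry *e)
{
	return atomic_fetch_sub_explicit(&e->ref_cnt, 1, memory_order_acq_rel) == 1;
}

/* Cheap pre-check done before taking a write lock: if the entry is
 * still referenced there is no point in locking to try to free it.
 */
static inline bool demo_entry_is_unused(struct demo_entry *e)
{
	return atomic_load_explicit(&e->ref_cnt, memory_order_relaxed) == 0;
}
```

A caller that intends to free an entry would typically test `demo_entry_is_unused()` first, take the write lock only if it returns true, and test again under the lock before actually releasing the element.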
Willy Tarreau	8178a5211c	MAJOR: threads/plock: update the embedded library again This updates the local copy of the plock library to benefit from finer memory ordering, EBO on more operations such as when take_w() and stow() wait for readers to leave and refined EBO, especially on common operations such as attempts to upgrade R to S, and avoids a counter-productive prior read in rtos() and take_r(). These changes have shown a 5% increase on regular operations on ARM, a 33% performance increase on ARM on stick-tables and 2% on x86, and a 14% and 4% improvements on peers updates respectively on ARM and x86. The availability of relaxed operations will probably be useful for stats counters which are still extremely expensive to update. The following plock commits were included in this update: 9db830b plock: support inlining exponential backoff code 008d3c2 plock: make the rtos upgrade faster 2f76dde atomic: clean up the generic xchg() 3c6919b atomic: make sure that the no-return macros do not return a value 97c2bb7 atomic: make the fallback bts use the pointed type for the shift f4c1880 atomic: also implement the missing pl_btr() 8329b82 atomic: guard all generic definitions to make it easier to provide specific ones 7c5cb62 atomic: use C11 atomics when available 96afaf9 atomic: prefer the C11 definitions in general f3ec7a6 atomic: implement load/store/atomic barriers 8bdbd1e atomic: add atomic load/stores 0f604c0 atomic: add more _noret operations 3fe35db atomic: remove the (void) cast from the C11 operations 3b08a7c atomic: allow to define the fallback _noret variants 28deb22 atomic: make x86 arithmetic operations the _noret variants 8061fe2 atomic: handle modern compilers that support returning flags b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>) 59817ca atomic: add memory order variants for most operations a40774f plock: explicitly make use of the pl_*_noret operations 6f1861b plock: switch to pl_sub_noret_lax() for cancellation c013980 plock: use 
pl_ldadd{_lax,_acq,} instead of pl_xadd() 382eea3 plock: use a release ordering when dropping the lock 60d750d plock: use EBO when waiting for readers to leave in take_w() and stow() fc01c4f plock: improve EBO a little bit 1ef6390 plock: switch to CAS + XADD for pl_take_r()	2023-08-11 19:03:35 +02:00
Frédéric Lécaille	b0e32c6263	BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands ->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept by the remaining QUIC connection object (struct quic_cc_conn) after having released the previous one (struct quic_conn) to allow "show fd/sess" commands to be functional without causing haproxy crashes. No need to backport.	2023-08-11 11:21:31 +02:00
Willy Tarreau	d93a00861d	MINOR: h2: pass accept-invalid-http-request down the request parser We're adding a new argument "relaxed" to h2_make_htx_request() so that we can control its level of acceptance of certain invalid requests at the proxy level with "option accept-invalid-http-request". The goal will be to add deactivable checks that are still desirable to have by default. For now no test is subject to it.	2023-08-08 19:10:54 +02:00
Willy Tarreau	30f58f4217	MINOR: http: add new function http_path_has_forbidden_char() As its name implies, this function checks if a path component has any forbidden characters starting at the designated location. The goal is to seek from the result of a successful ist_find_range() for more precise chars. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure we're not too strict at this point.	2023-08-08 19:10:54 +02:00
Willy Tarreau	197668de97	MINOR: ist: add new function ist_find_range() to find a character range This looks up the character range <min>..<max> in the input string and returns a pointer to the first one found. It's essentially the equivalent of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals with a range.	2023-08-08 19:10:54 +02:00
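The two-stage lookup described in these two entries (a coarse range search, then a precise per-character check from the first hit) can be sketched as below. This is a simplified byte-at-a-time illustration: the real ist_find_range() is said to scan 32 or 64 bits at once and works on ist strings, whereas `demo_find_range()` and `demo_path_has_forbidden_char()` are hypothetical helpers operating on a plain pointer and length.

```c
#include <stddef.h>

/* Returns a pointer to the first byte of <s> whose value lies within
 * <min>..<max> (inclusive), or NULL if none is found. Byte-wise stand-in
 * for a word-at-a-time range search.
 */
static inline const char *demo_find_range(const char *s, size_t len,
                                          unsigned char min, unsigned char max)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		if (c >= min && c <= max)
			return s + i;
	}
	return NULL;
}

/* Returns 1 if the path contains a control character (0x00-0x1F), a
 * space (0x20) or a '#' (0x23), otherwise 0. The coarse range 0x00..0x23
 * also matches '!' and '"', hence the precise check on each byte from
 * the first candidate onwards.
 */
static inline int demo_path_has_forbidden_char(const char *s, size_t len)
{
	const char *end = s + len;
	const char *p = demo_find_range(s, len, 0x00, 0x23);

	for (; p && p < end; p++) {
		unsigned char c = (unsigned char)*p;

		if (c <= 0x20 || c == 0x23)
			return 1;
	}
	return 0;
}
```

In the common case where the path contains no character at or below 0x23, only the fast range search runs and the precise loop is never entered.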
Willy Tarreau	d4069f3cee	REORG: http: move has_forbidden_char() from h2.c to http.h This function is not H2 specific but rather generic to HTTP. We'll need it in H3 soon, so let's move it to HTTP and rename it to http_header_has_forbidden_char().	2023-08-08 19:02:24 +02:00
Frédéric Lécaille	9f7cfb0a56	MEDIUM: quic: Allow the quic_conn memory to be asap released. When the connection enters the "connection closing" state after having sent a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory may be freed from quic_conn objects (QUIC connection). This is done by allocating a reduced-size object which keeps enough information to handle the remaining incoming packets for the connection in "connection closing" state, and to continue to resend the previous datagram with CONNECTION_CLOSE frames inside which has already been sent. Define a new quic_cc_conn struct which represents the connection object after entering the "connection close" state and after having released the quic_conn connection object. Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects. Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object (the connection before entering "connection close" state), and new quic_cc_conn struct object (the connection after entering "connection close"). So, all the members inside QUIC_CONN_COMMON may be indifferently dereferenced from a quic_conn struct or a quic_cc_conn struct pointer. Implement qc_new_cc_conn() function to allocate such connections in "connection close" state. This function is responsible for copying the required information from the original connection (quic_conn) to the remaining connection (quic_cc_conn). Among other initializations, it redefines the QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received packets and resends the datagram with the CONNECTION_CLOSE frame which has already been sent when entering "connection close" state. qc_cc_idle_timer_task() only releases the remaining quic_cc_conn struct object. Modify quic_conn_release() to allocate quic_cc_conn struct objects from the original connection passed as argument. 
It does nothing if this original connection is not in closing state, or if the idle timer has already expired. Implement quic_release_cc_conn() to release a "connection close" connection. It is called when its timer expires or if an error occurred when sending a packet from this connection when the peer is no longer reachable.	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	276697438d	MINOR: quic: Use a pool for the connection ID tree. Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects. Replace ->cids member of quic_conn objects by pointer to "quic_cids" and adapt the code consequently. Nothing special.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	dc9b8e1f27	MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer. Add a new pool <pool_head_quic_cc_buf> for the buffers used when building datagrams with CONNECTION_CLOSE frames inside, with QUIC_MIN_CC_PKTSIZE(128) as minimum size. Add ->cc_buf_area to quic_conn struct to store such buffers. Add ->cc_dgram_len to store the size of the "connection close" datagrams and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member value. Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate a struct "quic_cc_buf" buffer when the connection needs an immediate close or a buffer struct if not. Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such "quic_cc_buf" buffer when an immediate close is required.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	f7ab5918d1	MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct members to a bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep members respectively). They are moved before being defined into a bytes anonymous struct common to a future struct to be defined. Consequently adapt the code.	2023-08-07 18:57:45 +02:00
Frédéric Lécaille	a45f90dd4e	MINOR: quic: Amplification limit handling sanitization. Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit is respected when it returns false (0), i.e. when the connection is not validated. Implement quic_may_send_bytes() which returns the number of bytes which may be sent when the connection has not already been validated and call this function at several places when this is the case (after having called quic_peer_validated_addr()). Furthermore, this patch improves the code maintainability. Some patches to come will have to rename ->[rt]x.bytes quic_conn struct members.	2023-08-07 18:57:45 +02:00
Frédéric Lécaille	1f40b6c9fe	CLEANUP: quic: Remove quic_path_room(). This function is definitively no more needed/used.	2023-08-07 18:57:45 +02:00
Amaury Denoyelle	559482c11e	MINOR: h3: abort request if not completed before full response A HTTP server may provide a complete response even prior receiving the full request. In this case, RFC 9114 allows the server to abort read with a STOP_SENDING with error code H3_NO_ERROR. This scenario was notably reproduced with haproxy and an inactive server. If the client send a POST request, haproxy may provide a full HTTP 503 response before the end of the full request.	2023-08-04 16:17:16 +02:00
Christopher Faulet	8670bb42c2	CLEANUP: stconn: Move comment about sedesc fields on the field line Fields of sedesc structure were documented in the comment about the structure itself. It was not really convenient, hard to read, hard to update. So comments about the fields are moved on the corresponding field line, as usual.	2023-08-04 14:32:57 +02:00
Christopher Faulet	ef2b15998c	BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used There is a mechanism in the H1 and H2 multiplexers to skip the payload when a response is returned to the client when it must not contain any payload (response to a HEAD request or a 204/304 response). However, this does not work when the splicing is used. The H2 multiplexer does not support the splicing, so there is no issue. But with the mux-h1, when data are sent using the kernel splicing, the mux on the server side is not aware the client side should skip the payload. And once the data are put in a pipe, there is no way to stop the sending. It is a defect of the current design. It will be easier to deal with this case when the mux-to-mux forwarding is implemented. But for now, to fix the issue, we should add an HTX flag on the start-line to pass the info from the client side to the server side and be able to disable the splicing if necessary. The associated reg-test was improved to be sure it does not fail when the splicing is configured. This patch should be backported as far as 2.4.	2023-08-02 12:05:05 +02:00
Patrick Hemmer	7fccccccea	MINOR: acl: add acl() sample fetch This provides a sample fetch which returns the evaluation result of the conjunction of named ACLs.	2023-08-01 10:49:06 +02:00
Patrick Hemmer	00e00fb424	REORG: cfgparse: extract curproxy as a global variable This extracts curproxy from cfg_parse_listen so that it can be referenced by keywords that need the context of the proxy they are being used within.	2023-08-01 10:48:28 +02:00
Patrick Hemmer	997a31dbdf	CLEANUP: acl: remove cache_idx from acl struct It isn't used and never has been.	2023-08-01 10:48:05 +02:00
Frédéric Lécaille	c156c5bda6	MINOR: quic: Move the QUIC frame pool to its proper location pool_head_quic_frame QUIC frame pool definition is moved from quic_conn-t.h to quic_frame-t.h. Its declaration is moved from quic_conn.c to quic_frame.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	fa58f67787	CLEANUP: quic: quic_conn struct cleanup Remove no more used QUIC_TX_RING_BUFSZ macro. Remove several no more used quic_conn struct members.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	444c1a4113	MINOR: quic: Split QUIC connection code into three parts Move the TX part of the code to quic_tx.c. Add quic_tx-t.h and quic_tx.h headers for this TX part code. The definition of quic_tx_packet struct has been moved from quic_conn-t.h to quic_tx-t.h. Same thing for the RX part: Move the RX part of the code to quic_rx.c. Add quic_rx-t.h and quic_rx.h headers for this RX part code. The definition of quic_rx_packet struct has been moved from quic_conn-t.h to quic_rx-t.h.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	2fe50a01ca	CLEANUP: quic: Defined but no more used function (quic_get_tls_enc_levels()) This function is no more used since this commit: MEDIUM: quic: Handshake I/O handler rework. Let's remove it!	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	7008f16d57	MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements Extract the code in relation with the QUIC acknowledgements from quic_conn.c to quic_ack.c to accelerate the compilation of quic_conn.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	f454b78fa9	MINOR: quic: Add new "QUIC over SSL" C module. Move the code which directly calls the functions of the OpenSSL QUIC API into quic_ssl.c new C file. Some code have been extracted from qc_conn_finalize() to implement only the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c. qc_conn_finalize() has also been exported to be used from this new quic_ssl.c C module.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	57237f68ad	MINOR: quic: Move TLS related code to quic_tls.c quic_tls_key_update() and quic_tls_rotate_keys() are QUIC TLS functions. Let's move them to their proper location: quic_tls.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	953e67abb6	MINOR: quic: Export QUIC CLI code from quic_conn.c To accelerate the compilation of quic_conn.c file, export the code in relation with the QUIC CLI from quic_conn.c to quic_cli.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	6334f4f6c5	MINOR: quic: Export QUIC traces code from quic_conn.c To accelerate the compilation of quic_conn.c file, export the code in relation with the traces from quic_conn.c to quic_trace.c. Also add some headers (quic_trace-t.h and quic_trace.h).	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	f32201abb0	MINOR: quic: Add "limited-quic" new tuning setting This setting which may be used into a "global" section, enables the QUIC listener bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect when haproxy is compiled against a TLS stack with QUIC support, typically quictls.	2023-07-21 19:19:27 +02:00
Frédéric Lécaille	2fd67c558a	MINOR: quic: Missing encoded transport parameters for QUIC OpenSSL wrapper This wrapper needs to have an access to an encoded version of the local transport parameter (to be sent to the peer). They are provided to the TLS stack thanks to qc_ssl_compat_add_tps_cb() callback. These encoded transport parameters were attached to the QUIC connection but removed by this commit to save memory: MINOR: quic: Stop storing the TX encoded transport parameters This patch restores these transport parameters and attaches them again to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper is compiled. Implement qc_set_quic_transport_params() to encode the transport parameters for a connection and to set them into the stack and make this function work for both the OpenSSL wrapper or any other TLS stack with QUIC support. Its uses the encoded version of the transport parameters attached to the connection when compiled for the OpenSSL wrapper, or local parameters when compiled with TLS stack with QUIC support. These parameters are passed to quic_transport_params_encode() and SSL_set_quic_transport_params() as before this patch.	2023-07-21 17:27:40 +02:00
Frédéric Lécaille	7978493c2e	MINOR: quic: Add a quic_openssl_compat struct to quic_conn struct Add quic_openssl_compat struct to the quic_conn struct to support the QUIC OpenSSL wrapper feature.	2023-07-21 15:54:31 +02:00
Frédéric Lécaille	e3991e03cc	MINOR: quic: Export some KDF functions (QUIC-TLS) quic_hkdf_expand() and quic_hkdf_expand_label() must be used by the QUIC OpenSSL wrapper.	2023-07-21 15:53:41 +02:00
Frédéric Lécaille	780133548c	MINOR: quic: Include QUIC openssl wrapper header from TLS stacks compatibility header Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the compilation of the QUIC openssl wrapper for TLS stacks is enabled with USE_QUIC_OPENSSLCOMPAT.	2023-07-21 15:53:40 +02:00
Frédéric Lécaille	1b03f8016d	MINOR: quic: QUIC openssl wrapper implementation Highly inspired from nginx openssl wrapper code. This wrapper implements this list of functions: SSL_set_quic_method(), SSL_quic_read_level(), SSL_quic_write_level(), SSL_set_quic_transport_params(), SSL_provide_quic_data(), SSL_process_quic_post_handshake() and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls to support QUIC from OpenSSL. So, its aim is to support QUIC from a standard OpenSSL stack without QUIC support. It relies on the OpenSSL keylog feature to retrieve the secrets derived by the OpenSSL stack during a handshake and to pass them to the ->set_encryption_secrets() callback as this is done by quictls. It makes use of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages only on the receipt path. Some of them must be passed to the ->add_handshake_data() callback as this is done with quictls to be sent to the peer as CRYPTO data. quic_tls_compat_msg_callback() callback also sends the received TLS alert with ->send_alert() callback. AES 128-bits with CCM mode is not supported at this time. It is often disabled by the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites", the wrapper will send a TLS alert (Handshake failure) if this algorithm is negotiated between the client and the server. 0rtt is also not supported by this wrapper.	2023-07-21 15:53:40 +02:00
Frédéric Lécaille	72619bda4c	MINOR: quic: add trace about pktns packet/frames releasing Add useful traces which have already helped in debugging issues.	2023-07-21 14:31:42 +02:00
Frédéric Lécaille	0645e56a6e	MINOR: quic: Add traces for qc_frm_free() Useful to diagnose memory leak issues in relation with the QUIC frame objects.	2023-07-21 14:30:35 +02:00
Frédéric Lécaille	cf2368a3d5	MEDIUM: quic: Packet building rework. The aim of this patch is to allow the building of QUIC datagrams with as much as packets with different encryption levels inside during handshake. At this time, this is possible only for at most two encryption levels. That said, most of the time, a server only needs to use two encryption levels by datagram, except during retransmissions. Modify qc_prep_pkts(), the function responsible of building datagrams, to pass a list of encryption levels as parameter in place of two encryption levels. This function is also used when retransmitting datagrams. In this case this is a customized/flexible list of encryption level which is passed to this function. Add ->retrans new member to quic_enc_level struct, to be used as attach point to list of encryption level used only during retransmission, and ->retrans_frms new member which is a pointer to a list of frames to be retransmitted.	2023-07-21 14:30:35 +02:00
Frédéric Lécaille	2b8510d722	MINOR: quic: Release asap the negotiated Initial TLS context. This context may be released at the same time as the Initial TLS context. This is done calling quic_tls_ctx_secs_free() and pool_free() in two code locations. Implement quic_nictx_free() to do that.	2023-07-21 14:27:10 +02:00
Frédéric Lécaille	90a63ae4fa	MINOR: quic: Dynamic allocation for negotiated Initial TLS cipher context. Shorten ->negotiated_ictx quic_conn struct member (->nictx). This variable is used during version negotiation. Indeed, a connection may have to support several QUIC versions of packets during the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated QUIC version. This patch allows a connection to dynamically allocate this TLS cipher context. Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context objects. Modify qc_new_conn() to initialize ->nictx to NULL value. quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context. Modify it to do nothing if it is called with a NULL TLS cipher context. Modify qc_conn_finalize() to allocate ->nictx just before initializing its secrets. qc_conn_finalize() allocates ->nictx only if needed (if a new QUIC version was negotiated). Modify qc_conn_release() which releases a QUIC connection (quic_conn struct) to release ->nictx TLS cipher context.	2023-07-21 14:27:10 +02:00
Frédéric Lécaille	642dba8c22	MINOR: quic: Stop storing the TX encoded transport parameters There is no need to keep an encoded version of the QUIC listener transport parameters attached to the connection. Remove ->enc_params and ->enc_params_len members of quic_conn struct. Use variables to build the encoded transport parameter local to ha_quic_set_encryption_secrets() before they are passed to SSL_set_quic_transport_params(). Modify qc_ssl_sess_init() prototype. It was expected to be used with the encoded transport parameters as passed parameter, but they were not used. Cleanup this function.	2023-07-21 14:27:10 +02:00
Patrick Hemmer	57926fe8a3	MINOR: peers: add peers keyword registration This adds support for registering keywords in the 'peers' section.	2023-07-20 18:12:44 +02:00
Willy Tarreau	6ecabb3f35	CLEANUP: config: make parse_cpu_set() return documented values parse_cpu_set() stopped returning the undocumented -1 which was a leftover from an earlier attempt, changed from ulong to int since it only returns a success/failure and no more a mask. Thus it must not return -1 and its callers must only test for != 0, as is documented.	2023-07-20 11:01:09 +02:00
Willy Tarreau	f54d8c6457	CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map This field used to store the cpumap of the first thread in a group, and was used till 2.4 to hold some default settings, after which it was no longer used. Let's just drop it.	2023-07-20 11:01:09 +02:00
Willy Tarreau	151f9a2808	BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct We're currently having a problem with the porting from cpu_map from processes to thread-groups as it happened in 2.7 with commit `5b09341c0` ("MEDIUM: cpu-map: replace the process number with the thread group number"), though it seems that it has deeper roots even in 2.0 and that it was progressively made wrong over time. The issue stems from the way the per-process and per-thread cpu-sets were employed over time. Originally only processes were supported. Then threads were added after an optional "/" and it was documented that "cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified in 2.5 by commit `317804d28` ("DOC: update references to process numbers in cpu-map and bind-process")). The reality is different: when processes were still supported, setting "cpu-map 1" would apply the mask to the process itself (and only when run in the background, which is not documented either and is also a bug for another fix), and would be combined with any possible per-thread mask when calculating the threads' affinity, possibly resulting in empty sets. However, "cpu-map 1/all" would only set the mask for the threads and not the process. As such the following: cpu-map 1 odd cpu-map 1/1-8 even would leave no CPU while doing: cpu-map 1/all odd cpu-map 1/1-8 even would allow all CPUs. While such configs are very unlikely to ever be met (which is why this bug is tagged minor), this is becoming quite more visible while testing automatic CPU binding during 2.9 development because due to this bug it's much more common to end up with incorrect bindings. This patch fixes it by simply removing the .proc entry from cpu_map and always setting all threads' maps. The process is no longer arbitrarily bound to the group 1's mask, but in case threads are disabled, we'll use thread 1's mask since it contains the configured CPUs. 
This fix should be backported at least to 2.6, but no need to insist if it resists as it's easier to break cpu-map than to fix an unlikely issue.	2023-07-20 11:01:09 +02:00
Willy Tarreau	7134417613	MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found Since we'll soon want to adjust the "thread-groups" degree of freedom based on the presence of cpu-map, we first need to be able to detect if cpu-map was used. This function scans all cpu-map sets to detect if any is present, and returns true accordingly.	2023-07-20 11:01:09 +02:00
Aurelien DARRAGON	2e7d3d2e5c	BUG/MINOR: hlua: hlua_yieldk ctx argument should support pointers lua_yieldk ctx argument is of type lua_KContext which is typedefed to intptr_t when available so it can be used to store pointers. But the wrapper function hlua_yieldk() passes it as a regular int so it breaks that promise. Changing hlua_yieldk() prototype so that ctx argument is of type lua_KContext. This bug had no functional impact because ctx argument is not being actively used so far. This may be backported to all stable versions anyway.	2023-07-17 07:42:47 +02:00
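To illustrate why the context argument needs a pointer-sized integer type, here is a small sketch. `demo_kcontext` is a hypothetical stand-in for lua_KContext (assumed here to be intptr_t, as the commit message describes), and the helpers are illustrative only, not part of the Lua or haproxy APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for lua_KContext: wide enough to hold a pointer. */
typedef intptr_t demo_kcontext;

/* Store a pointer in the context value. */
static inline demo_kcontext demo_ctx_from_ptr(void *p)
{
	return (demo_kcontext)(intptr_t)p;
}

/* Recover the pointer from the context value. */
static inline void *demo_ctx_to_ptr(demo_kcontext ctx)
{
	return (void *)ctx;
}

/* A wrapper declared as wrapper(..., int ctx, ...) would convert the
 * value to int on the way through. On typical 64-bit platforms int is
 * 32 bits wide, so the upper half of the pointer would be lost; keeping
 * the parameter typed as demo_kcontext end to end avoids that.
 */
```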
Emeric Brun	49ddd87d41	CLEANUP: quic: remove useless parameter 'key' from quic_packet_encrypt Parameter 'key' was not used in this function. This patch removes it from the prototype of the function. This patch could be backported until v2.6.	2023-07-12 14:33:03 +02:00
Emeric Brun	cadb232e93	BUG/MEDIUM: quic: timestamp shared in token was using internal time clock The internal tick clock was used to export the timestamp in the token on retry packets. Doing this in cluster mode, the nodes don't understand the timestamp from tokens generated by others. This patch reworks this using the real current date (wall-clock time). Timestamps are also now considered in seconds instead of milliseconds. This patch should be backported until v2.6	2023-07-12 14:32:01 +02:00
Aurelien DARRAGON	b6e2d62fb3	MINOR: sink/api: pass explicit maxlen parameter to sink_write() sink_write() currently relies on sink->maxlen to know when to stop writing a given payload. But it could be useful to pass a smaller, explicit value to sink_write() to stop before the ring maxlen, for instance if the ring is shared between multiple feeders. sink_write() now takes an optional maxlen parameter: if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen is smaller than ring->maxlen, else only ring->maxlen will be considered. [for haproxy <= 2.7, patch must be applied by hand: that is: __sink_write() and sink_write() should be patched to take maxlen into account and function calls to sink_write() should use 0 as second argument to keep original behavior]	2023-07-10 18:28:08 +02:00
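The limit selection described for sink_write() above can be sketched as follows. This is an illustration only: `effective_maxlen()` is a hypothetical helper name and not part of the sink API; it merely encodes the rule stated in the message (0 keeps the original behavior, a smaller positive value wins, a larger one is ignored).

```c
#include <stddef.h>

/* Hypothetical helper: pick the length at which writing must stop.
 * <ring_maxlen> is the ring's own limit, <maxlen> the optional explicit
 * limit passed by the caller (0 means "no explicit limit"). */
static size_t effective_maxlen(size_t ring_maxlen, size_t maxlen)
{
    if (maxlen > 0 && maxlen < ring_maxlen)
        return maxlen;      /* caller asked to stop before the ring limit */
    return ring_maxlen;     /* otherwise only the ring limit applies */
}
```

Under this rule a caller passing 0 sees no change, which is consistent with the note that existing call sites should use 0 as the second argument.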
Aurelien DARRAGON	4f0e0f5a65	MEDIUM: sample: introduce 'same' output type Thierry Fournier reported an annoying side-effect when using the debug() converter. Consider the following examples: [1] http-request set-var(txn.test) bool(true),ipmask(24) [2] http-request redirect location /match if { bool(true),ipmask(32) } When starting haproxy with [1] example we get: config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied. With [2], we get: config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'. Now consider the following examples which are based on [1] and [2] but with the debug() sample conv inserted in-between those incompatible sample types: [1] http-request set-var(txn.test) bool(true),debug,ipmask(24) [2] http-request redirect location /match if { bool(true),debug,ipmask(32) } According to the documentation, "it is safe to insert the debug converter anywhere in a chain, even with non-printable sample types". Thus we don't expect any side-effect from using it within a chain. However in current implementation, because of debug() returning SMP_T_ANY type which is a pseudo type (only resolved at runtime), the sample compatibility checks performed at parsing time are completely ineffective. (haproxy will start and no warning will be emitted) The undesirable effect of this is that debug() prevents haproxy from warning you about impossible type conversions, hiding potential errors in the configuration that could result in unexpected evaluation or logic while serving live traffic. We better let haproxy warn you about this kind of errors when it has the chance. With our previous examples, this could cause some inconveniences. Let's say for example that you are testing a configuration prior to deploying it.
When testing the config you might want to use debug() converter from time to time to check how the conversion chain behaves. Now after deploying the exact same conf, you might want to remove those testing debug() statements since they are no longer relevant.. but removing them could "break" the config and suddenly prevent haproxy from starting upon reload/restart. (this is the expected behavior, but it comes a bit too late because of debug() hiding errors during tests..) To fix this, we introduce a new output type for sample expressions: SMP_T_SAME - may only be used as "expected" out_type (parsing time) for sample converters. As its name implies, it is a way for the developers to indicate that the resulting converter's output type is guaranteed to match the type of the sample that is presented on the converter's input side. (converter may alter data content, but data type must not be changed) What it does is that it tells haproxy that if switching to the converter (by looking at the converter's input only, since outype is SAME) is conversion-free, then the converter type can safely be ignored for type compatibility checks within the chain. debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows it to fully comply with the doc in the sense that it does not impact the conversion chain when inserted between sample items. Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	a635a1779a	MEDIUM: sample: add missing ADDR=>? compatibility matrix entries SMP_T_ADDR support was added in `b805f71` ("MEDIUM: sample: let the cast functions set their output type"). According to the above commit, it is made clear that the ADDR type is a pseudo/generic type that may be used for compatibility checks but that cannot be emitted from a fetch or converter. With that in mind, all conversions from ADDR to other types were explicitly disabled in the compatibility matrix. But later, when map_*_ip functions were updated in `b2f8f08` ("MINOR: map: The map can return IPv4 and IPv6"), we started using ADDR as "expected" output type for converters. This still complies with the original description from `b805f71`, because it is used as the theoretical output type, and is never emitted from the converters themselves (only "real" types such as IPV4 or IPV6 are actually being emitted at runtime). But this introduced an ambiguity as well as a few bugs, because some compatibility checks are being performed at config parse time, and thus rely on the expected output type to check if the conversion from current element to the next element in the chain is theoretically supported. However, because the compatibility matrix doesn't support ADDR to other types it is currently impossible to use map_*_ip converters in the middle of a chain (the only supported usage is when map_*_ip converters are at the very end of the chain).
To illustrate this, consider the following examples: acl test str(ok),map_str_ip(test.map) -m found # this will work acl test str(ok),map_str_ip(test.map),ipmask(24) -m found # this will raise an error Likewise, stktable_compatible_sample() check for stick tables also relies on out_type[table_type] compatibility check, so map_*_ip cannot be used with sticktables at the moment: backend st_test stick-table type string size 1m expire 10m store http_req_rate(10m) frontend fe bind localhost:8080 mode http http-request track-sc0 str(test),map_str_ip(test.map) table st_test # raises an error To fix this, and prevent future usage of ADDR as expected output type (for either fetches or converters) from introducing new bugs, the ADDR=>? part of the matrix should follow the ANY type logic. That is, ADDR, which is a pseudo-type, must be compatible with itself, and where IPV4 and IPV6 both support a backward conversion to a given type, ADDR must support it as well. It is done by setting the relevant ADDR entries to c_pseudo() in the compatibility matrix to indicate that the operation is theoretically supported (c_pseudo() will never be executed because ADDR should not be emitted: this only serves as a hint for compatibility checks at parse time). This is what's being done in this commit, thanks to this the broken examples documented above should work as expected now, and future usage of ADDR as out_type should not cause any issue.	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	30cd137d3f	MINOR: sample: introduce c_pseudo() conv function This function is used for ANY=>!ANY conversions in the compatibility matrix to help differentiate between real NOOP (c_none) and pseudo conversions that are theoretically supported at config parse time but can never occur at runtime. That is, to make explicit the fact that actual related runtime operations (e.g.: ANY->IPV4) are not NOOP since they might require some conversion to be performed depending on the input type. When checking the conf we don't know the effective out types so cast[pseudo type][pseudo type] is allowed in the compatibility matrix, but at runtime we only expect cast[real type][(real type || pseudo type)] because fetches and converters may not emit pseudo types, thus using c_none() everywhere was too ambiguous. The process will crash if c_pseudo() is invoked to help catch bugs: crashing here means that a pseudo type has been encountered on a converter's input at runtime (because it was emitted earlier in the chain), which is not supported and results from a broken sample fetch or converter implementation. (pseudo types may only be used as out_type in sample definitions for compatibility checks at parsing time)	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	58bbe41cb8	MEDIUM: acl/sample: unify sample conv parsing in a single function Both sample_parse_expr() and parse_acl_expr() implement some code logic to parse sample conv list after respective fetch or acl keyword. (Seems like the acl one was inspired by the sample one historically) But there is clearly code duplication between the two functions, making them hard to maintain. Hopefully, the parsing logic between them has stayed pretty much the same, thus the sample conv parsing part may be moved in a dedicated helper parsing function. This is what's being done in this commit, we're adding the new function sample_parse_expr_cnv() which does a single thing: parse the converters that are listed right after a sample fetch keyword and inject them into an already existing sample expression. Both sample_parse_expr() and parse_acl_expr() were adapted to now make use of this specific parsing function and duplicated code parts were cleaned up. Although sample_parse_expr() remains quite complicated (numerous function arguments due to contextual parsing data) the first goal was to get rid of code duplication without impacting the current behavior, with the added benefit that it may allow further code cleanups / simplification in the future.	2023-07-03 16:32:01 +02:00
Frédéric Lécaille	7f3c1bef37	MINOR: quic: Drop packet with type for discarded packet number space. This patch allows the low level packet parser to drop packets with type for discarded packet number spaces. Furthermore, this prevents it from reallocating new encryption levels and packet number spaces already released/discarded. When a packet number space is discarded, it MUST NOT be reallocated. As the packet number space discarding is done as soon as the type of packet received is known, some packet number space discarding checks may be safely removed from qc_try_rm_hp() and qc_qel_may_rm_hp() which are called after having parsed the packet header and its type.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	b97de9dc21	MINOR: quic: Move the packet number space status at quic_conn level As the packet number spaces and encryption level are dynamically allocated, the information about the packet number space discarded status must be kept somewhere else than in these objects. quic_tls_discard_keys() is no longer useful. Modify quic_pktns_discard() to do the same job: flag the quic_conn object as having discarded packet number space. Implement quic_tls_pktns_is_disarded() to check if a packet number space is discarded. Note the Application data packet number space is never discarded.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	f7749968d6	CLEANUP: quic: Remove two useless pools at low QUIC connection level Both "quic_tx_ring" and "quic_rx_crypto_frm" pools are no longer used. Should be backported as far as 2.6.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	a5c1a3b774	MINOR: quic: Reduce the maximum length of TLS secrets The maximum length of the secrets derived by the TLS stack is 384 bits. This reduces the size of the objects provided by the "quic_tls_secret" pool by 16 bytes. Should be backported as far as 2.6	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	3097be92f1	MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels Replace ->els static array of encryption levels by 4 pointers into the QUIC connection object, quic_conn struct. ->iel denotes the Initial encryption level, ->eel the Early-Data encryption level, ->hel the Handshake encryption level and ->ael the Application Data encryption level. Add ->qel_list to this structure to list the encryption levels after having been allocated. Modify consequently the encryption level object itself (quic_enc_level struct) so that it might be added to ->qel_list QUIC connection list of encryption levels. Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption level object. It is used to initialize the pointer newly added to the quic_conn structure. It also takes a packet number space pointer address as argument to initialize it if not already initialized. Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init() which is called by qc_enc_level_alloc() to allocate an encryption level object. Implement 2 new helper functions: - ssl_to_qel_addr() to match a pointer address to a quic_encryption level attached to a quic_conn object with a TLS encryption level enum value; - qc_quic_enc_level() to match a pointer to a quic_encryption level attached to a quic_conn object with an internal encryption level enum value. These functions are useful to be called from ->set_encryption_secrets() and ->add_handshake_data() TLS stack callbacks, which take a TLS encryption enum as argument (enum ssl_encryption_level_t). Replace all the use of the qc->els[] array element values by one of the newly added ->[ieha]el quic_conn struct member values.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	25a7b15144	MINOR: quic: Add a pool for the QUIC TLS encryption levels Very simple patch to define and declare a pool for the QUIC TLS encryptions levels. It will be used to dynamically allocate such objects to be attached to the QUIC connection object (quic_conn struct) and to remove from quic_conn struct the static array of encryption levels (see ->els[]).	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	7d9f12998d	CLEANUP: quic: Remove qc_list_all_rx_pkts() defined but not used This function is not used. May be safely removed.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	6635aa6a0a	MEDIUM: quic: Dynamic allocations of packet number spaces Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces. Remove the static array of packet number spaces at QUIC connection level (struct quic_conn) and add three new members to quic_conn struct as pointers to quic_pktns struct, one by packet number space as follows: ->ipktns for Initial packet number space, ->hpktns for Handshake packet number space and ->apktns for Application packet number space. Also add a ->pktns_list new member (struct list) to quic_conn struct to attach the list of the packet number spaces allocated for the QUIC connection. Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers from TLS stack encryption levels. Modify quic_pktns_init() to initialize these members. Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to allocate the packet numbers and initialize the encryption level. Implement quic_pktns_release() which takes pointers to pointers to packet number space objects to release the memory allocated for a packet number space attached to a QUIC connection and reset their address values. Modify qc_new_conn() to allocate only the Initial packet number space and Initial encryption level. Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list list attached to a QUIC connection in place of a static array of packet number spaces. Replace at several locations the use of elements of an array of packet number spaces by one of the three pointers to packet number spaces	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	ef39a74f4a	MINOR: quic: Move packet number space related functions Move packet number space related functions from quic_conn.h to quic_tls.h. Should be backported as far as 2.6 to ease future backports to come.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	411b6f73b7	MINOR: quic: Implement a packet number space identification function Implement quic_pktns_char() to identify a packet number space from a quic_conn object. Useful only for traces.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	dc6b339733	MINOR: quic: Move QUIC encryption level structure definition haproxy/quic_tls-t.h is the correct place for the quic_enc_level structure definition. Should be backported as far as 2.6 to ease any further backport to come.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	6593ec6f5e	MINOR: quic: Move QUIC TLS encryption level related code (quic_conn_enc_level_init()) The proper location for quic_conn_enc_level_init() is definitely the QUIC TLS API source file: src/quic_tls.c.	2023-06-30 16:20:55 +02:00
Willy Tarreau	90d18e2006	IMPORT: slz: implement a synchronous flush() operation In some cases it may be desirable for latency reasons to forcefully flush the queue even if it results in suboptimal compression. In our case the queue might contain up to almost 4 bytes, which need an EOB and a switch to literal mode, followed by 4 bytes to encode an empty message. This means that each call can add 5 extra bytes in the output stream. And the flush may also result in the header being produced for the first time, which can amount to 2 or 10 bytes (zlib or gzip). In the worst case, a total of 19 bytes may be emitted at once upon a flush with 31 pending bits and a gzip header. This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.	2023-06-30 16:12:36 +02:00
William Lallemand	593c895eed	MINOR: ssl: allow to change the client-sigalgs on server lines This patch introduces the "client-sigalgs" keyword for the server line, which allows to configure the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-server-client-sigalgs" in the global section.	2023-06-29 14:11:46 +02:00
William Lallemand	717f0ad995	MINOR: ssl: allow to change the server signature algorithm on server lines This patch introduces the "sigalgs" keyword for the server line, which allows to configure the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-server-sigalgs" in the global section.	2023-06-29 13:40:18 +02:00
Frédéric Lécaille	c2bab72d32	BUG/MINOR: quic: Missing TLS secret context initialization This bug arrived with this commit: MINOR: quic: Remove pool_zalloc() from qc_new_conn() Missing initialization of largest packet number received during a keyupdate phase. This prevented the keyupdate feature from working and made the keyupdate interop tests fail for all the clients. Furthermore, ->flags from quic_tls_ctx was also not initialized. This could also impact the keyupdate feature at least. No backport needed.	2023-06-19 19:05:45 +02:00
Frédéric Lécaille	ddc616933c	MINOR: quic: Remove pool_zalloc() from qc_new_conn() qc_new_conn() is used to initialize QUIC connections with quic_conn struct objects. This function calls quic_conn_release() when it fails to initialize a connection. quic_conn_release() is also called to release the memory allocated by a QUIC connection. Replace pool_zalloc() by pool_alloc() in this function and initialize all quic_conn struct members which are referenced by quic_conn_release() to prevent use of non initialized variables in this function. The ebtrees, the lists attached to quic_conn struct must be initialized. The tasks must be reset to their NULL default values to be safely destroyed by task_destroy(). This is also the case for all the TLS cipher contexts of the encryption levels (struct quic_enc_level) and those for the keyupdate. The packet number spaces (struct quic_pktns) must also be initialized. ->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update() from dereferencing this pointer. ->latest_rtt member of quic_loss struct must also be initialized. This is done by quic_loss_init() called by quic_path_init().	2023-06-16 16:55:58 +02:00
Frédéric Lécaille	d66896036a	BUG/MINOR: quic: Missing initialization (packet number space probing) ->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init(). This bug never occurred because all quic_pktns structs are attached to quic_conn structs which are always pool_zalloc()'ed. Must be backported as far as 2.6.	2023-06-14 11:35:22 +02:00
Aurelien DARRAGON	b7f8af3ca9	BUG/MINOR: proxy/server: free default-server on deinit proxy default-server is a specific type of server that is not allocated using new_server(): it is directly stored within the parent proxy structure. However, since it may contain some default config options that may be inherited by regular servers, it is also subject to dynamic members (strings, structures..) that needs to be deallocated when the parent proxy is cleaned up. Unfortunately, srv_drop() may not be used directly from p->defsrv since this function is meant to be used on regular servers only (those created using new_server()). To circumvent this, we're splitting srv_drop() to make a new function called srv_free_params() that takes care of the member cleaning which originally takes place in srv_drop(). This function is exposed through server.h, so it may be called from outside server.c. Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy() prevents any memory leaks due to dynamic parameters allocated when parsing a default-server line from a proxy section. This partially fixes GH #2173 and may be backported to 2.8. [While it could also be relevant for other stable versions, the patch won't apply due to architectural changes / name changes between 2.4 => 2.6 and then 2.6 => 2.8. Considering this is a minor fix that only makes memory analyzers happy during deinit paths (at least for <= 2.8), it might not be worth the trouble to backport them any further?]	2023-06-06 15:15:17 +02:00
Willy Tarreau	4ad1c9635a	BUG/MINOR: stream: do not use client-fin/server-fin with HTX Historically the client-fin and server-fin timeouts were made to allow a connection closure to be effective quickly if the last data were sent down a socket and the client didn't close, something that can happen when the peer's FIN is lost and retransmits are blocked by a firewall for example. This made complete sense in 1.5 for TCP and HTTP in close mode. But nowadays with muxes, it's not done at the right layer anymore and even the description doesn't match what is being done, because what happens is that the stream will abort the whole transfer after it's done sending to the mux and this timeout expires. We've seen in GH issue 2095 that this can happen with very short timeout values, and while this didn't trigger often before, now that the muxes (h2 & quic) properly report an end of stream before even the first sc_conn_sync_recv(), it seems that it can happen more often, and have two undesirable effects: - logging a timeout when that's not the case - aborting the request channel, hence the server-side conn, possibly before it had a chance to be put back to the idle list, causing this connection to be closed and not reusable. Unfortunately for TCP (mux_pt) this remains necessary because the mux doesn't have a timeout task. So here we're adding tests to only do this through an HTX mux. But to be really clean we should in fact completely drop all of this and implement these timeouts in the mux itself. This needs to be backported to 2.8 where the issue was discovered, and maybe carefully to older versions, though that is not sure at all. In any case, using a higher timeout or removing client-fin in HTTP proxies is sufficient to make the issue disappear.	2023-06-02 16:33:40 +02:00
Willy Tarreau	ae0f8be011	MINOR: stats: protect against future stats fields omissions As seen in commits `33a4461fa` ("BUG/MINOR: stats: Fix Lua's `get_stats` function") and `a46b142e8` ("BUG/MINOR: Missing stat_field_names (since `f21d17bb`)") it seems frequent to omit to update stats_fields[] when adding a new ST_F_xxx entry. This breaks Lua's get_stats() and shows a "(null)" in the header of "show stat", but that one is not detectable to the naked eye anymore. Let's add a reminder above the enum declaration about this, and a small reg tests checking for the absence of "(null)". It was verified to fail before the last patch above.	2023-06-02 08:39:53 +02:00
Willy Tarreau	cb6a35fdc1	[RELEASE] Released version 2.9-dev0 Released version 2.9-dev0 with the following main changes : - MINOR: version: mention that it's development again	2023-05-31 16:29:19 +02:00
Willy Tarreau	9dc8308a67	MINOR: version: mention that it's development again This essentially reverts `b9b6e94474`.	2023-05-31 16:28:34 +02:00
Willy Tarreau	b9b6e94474	MINOR: version: mention that it's LTS now. The version will be maintained up to around Q2 2028. Let's also update the INSTALL file to mention this.	2023-05-31 16:23:56 +02:00
Amaury Denoyelle	d68f8b5a4a	CLEANUP: mux-quic: rename internal functions This patch is similar to the previous one but for QUIC mux functions used inside the mux code itself or application layer. Replace all occurrences of qc_* prefix by qcc_* or qcs_*. This should help to better differentiate code between quic_conn and MUX. This should be backported up to 2.7.	2023-05-30 15:45:55 +02:00
Amaury Denoyelle	6d6ee0dc0b	MINOR: quic: fix stats naming for flow control BLOCKED frames There was a misnaming in stats counter for *_BLOCKED frames in regard to QUIC rfc convention. This patch fixes it to prevent future ambiguity : - STREAMS_BLOCKED -> STREAM_DATA_BLOCKED - STREAMS_DATA_BLOCKED_BIDI -> STREAMS_BLOCKED_BIDI - STREAMS_DATA_BLOCKED_UNI -> STREAMS_BLOCKED_UNI This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Amaury Denoyelle	087c5f041b	MINOR: mux-quic: remove nb_streams from qcc Remove nb_streams field from qcc. It was not used outside of a BUG_ON() statement to ensure we never have a negative count of streams. However this is already checked with other fields. This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Amaury Denoyelle	7b41dfd834	CLEANUP: mux-quic: remove unneeded fields in qcc Remove fields from qcc structure which are unused. This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Patrick Hemmer	425d7ad89d	MINOR: init: pre-allocate kernel data structures on init The Linux kernel maintains data structures to track a process's open file descriptors, and it expands these structures as necessary when FD usage grows (at every FD=2^X starting at 64). However when threading is in use, during expansion the kernel will pause (observed up to 47ms) while it waits for thread synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366). This change addresses the issue and avoids the random pauses by opening the maximum file descriptor during initialization, so that expansion will not occur while processing traffic.	2023-05-26 09:28:18 +02:00
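The pre-allocation idea above can be sketched with standard POSIX calls. This is a hedged illustration and not the code from the commit: `prealloc_fd_table()` is a hypothetical helper, and using dup2() from a temporary /dev/null descriptor is an assumption about one way to "open the maximum file descriptor".

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: briefly occupy descriptor <max_fd> so that the
 * kernel grows the process's FD table once, up front.
 * Returns 0 on success (or if <max_fd> is already open), -1 on error. */
static int prealloc_fd_table(int max_fd)
{
    int src, ret = 0;

    if (max_fd < 0)
        return -1;

    /* Already in use: the table is large enough, leave it alone. */
    if (fcntl(max_fd, F_GETFD) != -1)
        return 0;

    src = open("/dev/null", O_RDONLY);
    if (src < 0)
        return -1;

    if (src != max_fd) {
        /* Duplicating onto a high number forces the table expansion. */
        if (dup2(src, max_fd) == -1)
            ret = -1;
        else
            close(max_fd);  /* only the expansion is wanted, not the FD */
    }
    close(src);
    return ret;
}
```

A caller would typically pass the highest descriptor it expects to use, for instance the soft RLIMIT_NOFILE value minus one as returned by getrlimit(); the choice of that value is left to the caller here.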
Willy Tarreau	b298882acc	BUILD: compiler: systematically set USE_OBSOLETE_LINKER with TCC TCC silently ignores the weak and section attributes, which ruins the initcalls. Technically we're exactly in the same situation as with an obsolete linker. Let's just automatically set the flag if TCC is detected, this avoids surprises where the program compiles but does not start. No backport is needed.	2023-05-24 21:37:06 +02:00
Willy Tarreau	eced142aa8	BUILD: ist: use the literal declaration for ist_lc/ist_uc under TCC TCC doesn't know about __attribute__((weak)), it silently ignores it. We could add a "static" modifier there in this case but we already have an alternate portable mode that is based on a slightly larger literal for obsolete linkers (and non-ELF systems) which choke on weak. Let's just add the test for tcc there and use it in this case. No backport is needed.	2023-05-24 21:33:34 +02:00
Willy Tarreau	4e8720ab78	BUILD: ist: do not put a cast in an array declaration TCC is upset by the declaration looking like: const unsigned char ist_lc[256] __attribute__((weak)) = ((const unsigned char[256]){ ... }); It was written like this because it's expanded from the _IST_LC macro but it's never used as-is, it's only used from ist_lc, which should be the one containing the cast so that the macro only contains the list of bytes that can be used in both places. And this assigns more consistent roles to the lower and upper case macro/variable now, one is typed and the other one not. No backport is needed.	2023-05-24 21:27:39 +02:00
Frédéric Lécaille	12a815ad19	MINOR: quic: Add a counter for sent packets Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection level. Then, when the connection is released, the ->sent_pkt counter value is added to the one for the listener. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Frédéric Lécaille	bdd64fd71d	MINOR: quic: Add some counters at QUIC connection level Add some statistical counters to quic_conn struct from quic_counters struct which are used at listener level to handle them at QUIC connection level. This avoids calling atomic functions. Furthermore this will be useful soon when a counter will be added for the total number of packets which have been sent which will be very often incremented. Some counters were not added, especially those which count the number of QUIC errors by QUIC error types. Indeed such counters would be incremented most of the time only one time at QUIC connection level. Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level statistical counters to the listener level statistical counters. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Willy Tarreau	1e1c28873c	BUILD: makefile: fix build issue on GNU make < 3.82 Thierry Fournier reported a build breakage with the ubiquitous make 3.81, LDFLAGS were ignored. This is caused by the declaration of the collect_opt_flags macro that is defined with an "=" sign, something that only appeared in 3.82 and that is not necessary. With it removed, the build now works fine at least from 3.80 to 4.3. No backport is needed since this makefile cleanup appeared in 2.8.	2023-05-24 15:51:03 +02:00
Ilya Shipitsin	97c344dae0	BUILD: quic: re-enable chacha20_poly1305 for libressl this reverts `d2be9d4c48` LibreSSL implements EVP_chacha20_poly1305() with EVP_CIPHER for every released version starting with 3.6.0	2023-05-23 19:20:36 +02:00
Willy Tarreau	b7209d42d9	MEDIUM: stconn: make the SE_FL_ERR_PENDING to ERROR transition systematic During a code audit of the various situations that promote ERR_PENDING to ERROR, it appeared that: - all muxes use se_fl_set_error() to set it, which chooses either based on EOI/EOS presence ; - EOI/EOS that arrive late after ERR_PENDING were not systematically upgraded to ERROR This results in confusion about how such ERROR or ERR_PENDING ought to be handled, which is not quite desirable. This patch adds a test to se_fl_set() to detect if we're setting EOI or EOS while ERR_PENDING is present, or the other way around so that any sequence of EOI/EOS <-> ERR_PENDING results in ERROR being set. This way there will no longer be possible situations where ERROR is missing while the other ones are set.	2023-05-23 16:17:04 +02:00
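The promotion rule described for se_fl_set() above can be sketched as a small pure function. This is an illustration under stated assumptions: the flag names and values below are hypothetical stand-ins for the SE_FL_* bits, and `fl_set()` is not the real function, it only mirrors the rule that EOI/EOS meeting ERR_PENDING in either order results in ERROR.

```c
#include <stdint.h>

/* Hypothetical flag values standing in for the SE_FL_* bits. */
#define FL_EOI         0x01u
#define FL_EOS         0x02u
#define FL_ERR_PENDING 0x04u
#define FL_ERROR       0x08u

/* Return <flags> with the bits in <on> set, promoting to ERROR when
 * EOI/EOS is being set while ERR_PENDING is already present, or when
 * ERR_PENDING is being set while EOI/EOS is already present. */
static uint32_t fl_set(uint32_t flags, uint32_t on)
{
    if (((on & (FL_EOI | FL_EOS)) && (flags & FL_ERR_PENDING)) ||
        ((on & FL_ERR_PENDING) && (flags & (FL_EOI | FL_EOS))))
        on |= FL_ERROR;
    return flags | on;
}
```

For example, `fl_set(FL_ERR_PENDING, FL_EOS)` and `fl_set(FL_EOI, FL_ERR_PENDING)` both return a value with FL_ERROR set, while setting ERR_PENDING alone on empty flags leaves ERROR clear.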
Amaury Denoyelle	5eadc27623	MINOR: quic: remove return val of quic_aead_iv_build() quic_aead_iv_build() should never fail unless we call it with buffers of different size. This never happens in the code as all input buffers are of size QUIC_TLS_IV_LEN. Remove the return value and add a BUG_ON() to prevent future misuse. This is especially useful to remove one error handling on the sending path via quic_packet_encrypt(). This should be backported up to 2.7.	2023-05-22 11:17:18 +02:00
Willy Tarreau	5345490b8e	MINOR: clock: provide a function to automatically adjust now_offset Right now there's no way to enforce a specific value of now_ms upon startup in order to compensate for the time it takes to load a config, specifically when dealing with the health check startup. For this we'd need to force the now_offset value to compensate for the last known value of the current date. This patch exposes a function to do exactly this.	2023-05-17 09:33:54 +02:00
Willy Tarreau	5723b382ed	MINOR: stats: report the boot time in "show info" Just like we have the uptime in "show info", let's add the boot time. It's trivial to collect as it's just the difference between the ready date and the start date, and will allow users to monitor this element in order to take action before it starts becoming problematic. Here the boot time is reported in milliseconds, so this allows to even observe sub-second anomalies in startup delays.	2023-05-17 09:33:54 +02:00
Willy Tarreau	da4aa6905c	MINOR: clock: measure the total boot time Some huge configs take a significant amount of time to start and this can cause some trouble (e.g. health checks getting delayed and grouped, process not responding to the CLI etc). For example, some configs might start fast in certain environments and slowly in other ones just due to the use of a wrong DNS server that delays all libc's resolutions. Let's first start by measuring it by keeping a copy of the most recently known ready date, once before calling check_config_validity() and then refine it when leaving this function. A last call is finally performed just before deciding to split between master and worker processes, and it covers the whole boot. It's trivial to collect and even allows to get rid of a call to clock_update_date() in function check_config_validity() that was used in hope to better schedule future events.	2023-05-17 09:33:54 +02:00
Amaury Denoyelle	1a2faef92f	MINOR: mux-quic: uninline qc_attach_sc() Uninline and move qc_attach_sc() function to implementation source file. This will be useful for next commit to add traces in it. This should be backported up to 2.7.	2023-05-16 17:53:45 +02:00
Amaury Denoyelle	3cb78140cf	MINOR: mux-quic: properly report end-of-stream on recv The MUX is responsible for putting EOS on the stream when the read channel is closed. This happens if the underlying connection is closed or a RESET_STREAM is received. FIN STREAM is ignored in this case. For connection closure, simply check for CO_FL_SOCK_RD_SH. For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been introduced. It is set when RESET_STREAM is received, unless we already received all data. This conforms to the QUIC RFC, which allows a RESET_STREAM to be ignored in this case. During RESET_STREAM processing, the input buffer is emptied so EOS can be reported right away on the recv_buf operation. This should be backported up to 2.7.	2023-05-16 17:53:45 +02:00
William Lallemand	d0c363486c	BUILD: ssl: get0_verified chain is available on libreSSL Define HAVE_SSL_get0_verified_chain when it's using libreSSL >= 3.3.6.	2023-05-15 15:16:15 +02:00
William Lallemand	6e0c39d7ac	BUILD: ssl: ssl_c_r_dn fetches uses functions only available since 1.1.1 Fix the openssl build with older openssl versions by disabling the new ssl_c_r_dn fetch. This also disables the ssl_client_samples.vtc file for OpenSSL versions older than 1.1.1	2023-05-15 12:07:52 +02:00
Abhijeet Rastogi	df97f472fa	MINOR: ssl: add new sample ssl_c_r_dn This patch addresses #1514, adds the ability to fetch DN of the root ca that was in the chain when client certificate was verified during SSL handshake.	2023-05-15 10:48:05 +02:00
Amaury Denoyelle	6c501ed23b	BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc qc_stream_buf_alloc() can fail for two reasons : * limit of Tx buffer per connection reached * allocation failure The first case is properly treated. A flag QC_CF_CONN_FULL is set on the connection to interrupt emission. It is cleared when a buffer became available after in order ACK reception and the MUX tasklet is woken up. The allocation failure was handled with the same mechanism which in this case is not appropriate and could lead to a connection transfer freeze. Instead, prefer to close the connection with a QUIC internal error code. To differentiate the two causes, qc_stream_buf_alloc() API was changed to return the number of available buffers to the caller. This must be backported up to 2.6.	2023-05-12 16:26:20 +02:00
Amaury Denoyelle	93dd23cab4	MINOR: mux-quic: remove dedicated function to handle standalone FIN Remove QUIC MUX function qcs_http_handle_standalone_fin(). This function was only used when receiving an empty STREAM frame with the FIN bit. Besides, it was called by each application protocol, which could each take a different approach, making the function's purpose unclear. Invocations of qcs_http_handle_standalone_fin() have been replaced by explicit code in both the H3 and HTTP/0.9 modules. In the process, use htx_set_eom() to reliably put EOM on the HTX message. This should be backported up to 2.7, along with the previous patch which introduced htx_set_eom().	2023-05-12 15:50:30 +02:00
Amaury Denoyelle	25cf19d5c8	MINOR: htx: add function to set EOM reliably Implement a new HTX utility function htx_set_eom(). If the HTX message is empty, it will first add a dummy EOT block. This is a small trick needed to ensure readers will detect the HTX buffer as not empty and retrieve the EOM flag. Replace the H2 code related by a htx_set_eom() invocation. QUIC also has the same code which will be replaced in the next commit. This should be backported up to 2.7 before the related QUIC patch.	2023-05-12 15:29:28 +02:00
Willy Tarreau	ea07715ccf	MINOR: master/cli: also implement the timed prompt on the master CLI This provides more consistency between the master and the worker. When "prompt timed" is passed on the master, the timed mode is toggled. When enabled, for a master it will show the master process' uptime, and for a worker it will show this worker's uptime. Example: master> prompt timed [0:00:00:50] master> show proc #<PID> <type> <reloads> <uptime> <version> 11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21 # workers 11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21 # old workers 11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21 # programs [0:00:00:58] master> @!11955 [0:00:01:03] 11955> @!11942 [0:00:02:17] 11942> @ [0:00:01:10] master>	2023-05-11 16:38:52 +02:00
Willy Tarreau	225555711f	MINOR: cli: add an option to display the uptime in the CLI's prompt Entering "prompt timed" toggles reporting of the process' uptime in the prompt, which will report days, hours, minutes and seconds since it was started. As discussed with Tim in issue #2145, this can be convenient to roughly estimate the time between two outputs, as well as detecting that a process failed to be reloaded for example.	2023-05-11 16:38:52 +02:00
Aurelien DARRAGON	31b23aef38	CLEANUP: acl: discard prune_acl_cond() function Thanks to previous commit, we have no more use for prune_acl_cond(), let's remove it to prevent code duplication.	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	7abc9224a6	MINOR: proxy: add http_free_redirect_rule() function Adding http_free_redirect_rule() function to free a single redirect rule since it may be required to free rules outside of free_proxy() function. This patch is required for an upcoming bugfix. [for 2.2, free_proxy function did not exist (first seen in 2.4), thus http_free_redirect_rule() needs to be deducted from haproxy.c deinit() function if the patch is required]	2023-05-11 15:37:04 +02:00
Christopher Faulet	7542fb43d6	MINOR: stconn: Add a cross-reference between SE descriptor A xref is added between the endpoint descriptors. It is created when the server endpoint is attached to the SC and it is destroyed when an endpoint is detached. This xref is not used for now. But it will be useful to retrieve info about an endpoint for the opposite side. It is also the guarantee that there is still an endpoint attached on the other side.	2023-05-11 15:37:04 +02:00
Willy Tarreau	4cfb0019e6	MINOR: stats: report the listener's protocol along with the address in stats When "option socket-stats" is used in a frontend, its listeners have their own stats and will appear in the stats page. And when the stats page has "stats show-legends", then a tooltip appears on each such socket with ip:port and ID. The problem is that since QUIC arrived, it was not possible to distinguish the TCP listeners from the QUIC ones because no protocol indication was mentioned. Now we add a "proto" legend there with the protocol name, so we can see "tcp4" or "quic6" and figure out how the socket is bound.	2023-05-11 14:52:56 +02:00
Amaury Denoyelle	5f67b17a59	MEDIUM: mux-quic: adjust transport layer error handling Following previous patch, error notification from quic_conn has been adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR on the connection instance when a fatal error is detected. Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to the local error flag and will abort most of the future processing. To ensure stream upper layer is also notified, qc_wake_some_streams() called by qc_process() will put the stream on error if this new flag is set. This should be backported up to 2.7.	2023-05-11 14:12:48 +02:00
Amaury Denoyelle	b2e31d33f5	MEDIUM: quic: streamline error notification When an error is detected at quic-conn layer, the upper MUX must be notified. Previously, this was done relying on quic_conn flag QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on connection closure. Adjust this mechanism to use an approach more similar to other transport layers in haproxy. On error, connection flags are updated with CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then notified when the error happened instead of just before the closing. To reflect this change, qc_notify_close() has been renamed qc_notify_err(). This function must now be explicitly called every time a new error condition arises on the quic_conn layer. To ensure MUX send is disabled on error, qc_send_mux() now checks CO_FL_SOCK_WR_SH. If set, the function returns an error. This should prevent the MUX from sending data on closing or draining state. To complete this patch, MUX layer must now check for CO_FL_ERROR explicitly. This will be the subject of the following commit. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	2d5c3f5cd1	MINOR: mux-quic: add traces for stream wake Add traces for when an upper layer stream is woken up by the MUX. This should help to diagnose frozen stream issues. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Willy Tarreau	9615102b01	MINOR: stats: report the number of times the global maxconn was reached As discussed a few times over the years, it's quite difficult to know how often we stop accepting connections because the global maxconn was reached. This is not easy to know because when we reach the limit we stop accepting but we don't know if incoming connections are pending, so it's not possible to know how many were delayed just because of this. However, an interesting equivalent metric consists in counting the number of times an accepted incoming connection resulted in the limit being reached. I.e. "we've accepted the last one for now". That doesn't imply any other one got delayed but it's a factual indicator that something might have been delayed. And by counting the number of such events, it becomes easier to know whether some limits need to be adjusted because they're reached often, or if it's exceptionally rare. The metric is reported as a counter in show info and on the stats page in the info section right next to "maxconn".	2023-05-11 13:51:31 +02:00
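The counting rule from 9615102b01 ("we've accepted the last one for now") could look roughly like the sketch below. The `struct conn_limits` type and the `account_accept()` / `account_close()` helpers are hypothetical names used for illustration and say nothing about how haproxy stores these counters.

```c
#include <stdint.h>

/* Hypothetical accounting structure, for illustration only. */
struct conn_limits {
    uint32_t maxconn;          /* configured global limit */
    uint32_t actconn;          /* currently active connections */
    uint64_t maxconn_reached;  /* times an accept hit the limit */
};

/* Account for one incoming connection. Returns 1 if it was accepted,
 * 0 if the limit was already reached. The counter is bumped only when
 * the accepted connection is the one that fills the last free slot.
 */
static int account_accept(struct conn_limits *l)
{
    if (l->actconn >= l->maxconn)
        return 0;
    l->actconn++;
    if (l->actconn == l->maxconn)
        l->maxconn_reached++;
    return 1;
}

/* Account for one closed connection. */
static void account_close(struct conn_limits *l)
{
    if (l->actconn)
        l->actconn--;
}
```

The counter does not say how many connections were delayed, only how many times the limit was hit, which matches the intent stated in the commit message.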
Willy Tarreau	3c4a297d2b	MINOR: stats: report the total number of warnings issued Now in "show info" we have a TotalWarnings field that reports the total number of warnings issued since the process started. It's also reported in the stats page next to the uptime.	2023-05-11 12:02:21 +02:00
Willy Tarreau	29dcc5e559	DEBUG: list: add DEBUG_LIST to purposely corrupt list heads after delete LIST_DELETE doesn't affect the previous pointers of the stored element. This can sometimes hide bugs when such a pointer is reused by accident in a LIST_NEXT() or equivalent after having been detached for example, or if another LIST_DELETE is performed again, something that LIST_DEL_INIT() is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly detached list element with two invalid pointers that will cause a crash in case of accidental misuse. It's not enabled by default.	2023-05-11 11:33:35 +02:00
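The idea behind 29dcc5e559 can be illustrated with a small circular doubly-linked list. This is a sketch under assumptions: the `sk_*` helpers and the two poison values are hypothetical and do not reproduce the real LIST_DELETE macro or the values DEBUG_LIST actually writes.

```c
#include <stdint.h>

/* Minimal circular doubly-linked list element. */
struct list {
    struct list *n;  /* next */
    struct list *p;  /* previous */
};

/* Hypothetical poison values: small invalid addresses that fault when
 * dereferenced on typical systems.
 */
#define SK_POISON_NEXT ((struct list *)(uintptr_t)0x16)
#define SK_POISON_PREV ((struct list *)(uintptr_t)0x24)

static inline void sk_list_init(struct list *head)
{
    head->n = head->p = head;
}

/* Append <el> at the tail of the list starting at <head>. */
static inline void sk_list_append(struct list *head, struct list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

/* Detach <el>, then poison its own pointers so that any later use of
 * the stale element crashes instead of silently walking the list.
 */
static inline void sk_list_delete_poison(struct list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    el->n = SK_POISON_NEXT;
    el->p = SK_POISON_PREV;
}
```

Without the two poisoning assignments, the detached element would still point into the live list, which is the situation the commit message says can hide bugs.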
Frédéric Lécaille	b971696296	BUG/MINOR: quic: Possible crash when dumping version information ->others member of tp_version_information structure pointed to a buffer in the TLS stack used to parse the transport parameters. There is no guarantee that this buffer is available until the connection is released. Do not dump the available versions selected by the client anymore, but display the chosen one (selected by the client for this connection) and the negotiated one. Must be backported to 2.7 and 2.6.	2023-05-10 13:26:37 +02:00
Amaury Denoyelle	58721f2192	BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE A recent series of patches was introduced to streamline error generation by QUIC MUX. However, a regression was introduced : every error generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it should be only for H3/QPACK errors. Fix this by adding an argument <app> in qcc_set_error. When false, a standard CONNECTION_CLOSE is used as error. This bug was detected by QUIC tracker with the following tests "stop_sending" and "server_flow_control" which require a CONNECTION_CLOSE frame. This must be backported up to 2.7.	2023-05-09 18:42:34 +02:00
Christopher Faulet	557146ccc8	DOC: stconn: Update comments about ABRT/SHUT for stconn structure The comment for the stconn structure was still referencing the SHUTR/SHUTW flags. These flags were replaced and we now use ABRT/SHUT flags in comments. The comment itself was slightly updated to be accurate.	2023-05-09 16:36:45 +02:00
Christopher Faulet	e59f7583ee	MEDIUM: stconn: Be sure to always be able to unblock a SC that needs room When sc_need_room() is called, the caller cannot request more free space than a minimum value to be sure it is always possible to unblock it. It is a safety guard to not freeze any SC on NEED_ROOM condition. At worst it will lead to some wakeups in excess at the edge. To keep things simple, the following minimum is used: (global.tune.bufsize - global.tune.maxrewrite - sizeof(struct htx))	2023-05-09 11:53:28 +02:00
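The bound given in e59f7583ee amounts to capping the requested room at what an empty buffer can always offer. A minimal sketch, assuming a hypothetical `clamp_needed_room()` helper that takes the three quantities as plain parameters (the real code reads them from `global.tune` and `sizeof(struct htx)`):

```c
#include <stddef.h>

/* Cap <requested> at (bufsize - maxrewrite - htx_overhead), the amount
 * of free space an empty buffer can always provide. The caller is
 * assumed to pass a bufsize larger than maxrewrite + htx_overhead.
 */
static size_t clamp_needed_room(size_t requested, size_t bufsize,
                                size_t maxrewrite, size_t htx_overhead)
{
    size_t max_room = bufsize - maxrewrite - htx_overhead;

    return requested > max_room ? max_room : requested;
}
```

For instance, with a 16384-byte buffer, a 1024-byte rewrite reserve and a placeholder overhead of 40 bytes, a request for 20000 bytes would be capped at 15320, so the condition can always be satisfied once the buffer drains.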
Frédéric Lécaille	1bc6e318f0	CLEANUP: quic: Rename several <buf> variables in quic_frame.(c\|h) Most of the functions in quic_frame.c and quic_frame.h manipulate <buf> buffer position variables which have nothing to do with struct buffer variables. Rename them to <pos>. Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Frédéric Lécaille	d19a02a40e	CLEANUP: quic: No more used q_buf structure This definition is no more used. Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Willy Tarreau	652d1712dd	BUILD: quic: fix build warning when threads are disabled Commit `e83f937cc` ("MEDIUM: quic: use a global CID trees list") uses a local variable "tree" used only for locks, but when threads are disabled it spews a warning about this unused variable.	2023-05-07 15:06:22 +02:00
Willy Tarreau	dd9f921b3a	CLEANUP: fix a few reported typos in code comments These are only the few relevant changes among those reported here: https://github.com/haproxy/haproxy/actions/runs/4856148287/jobs/8655397661	2023-05-07 07:07:44 +02:00
Willy Tarreau	615c301db4	MINOR: config: allow cpu-map to take commas in lists of ranges The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was extended in 2.4 with commit `a80823543` ("MINOR: cfgparse: support the comma separator on parse_cpu_set") to support commas between ranges. But since it was quite late in the development cycle, by then it was decided not to add a last-minute surprise and not to magically support commas in cpu-map, hence the "comma_allowed" argument. Since then we know that it was not the best choice, because the comma is silently ignored in the cpu-map syntax, causing all sorts of surprises in the field with threads running on a single node for example. In addition it's quite common to copy-paste a taskset line and put it directly into the haproxy configuration. This commit relaxes this rule and finally allows cpu-map to support commas between ranges. It simply consists in removing the comma_allowed argument in the parse_cpu_set() function. The doc was updated to reflect this.	2023-05-05 18:41:52 +02:00
Aurelien DARRAGON	fc4ec0d653	MINOR: hlua: declare hlua_yieldk() function Declaring hlua_yieldk() function to make it usable from hlua_fcn.c.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	40cd44f52c	MINOR: hlua: declare hlua_gethlua() function Declaring hlua_gethlua() function to make it usable from hlua_fcn.c.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	34c86760fa	MINOR: hlua: declare hlua_{ref,pushref,unref} functions Declaring hlua_{ref,pushref,unref} functions to make them usable from hlua_fcn.c to simplify reference handling.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	5bed48fec8	MINOR: mailers/hlua: disable email sending from lua Exposing a new hlua function, available from body or init contexts, that forcefully disables the sending of email alerts even if the mailers are defined in haproxy configuration. This will help for sending email directly from lua. (prevent legacy email sending from interfering with lua)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	dcbc2d2cac	MINOR: checks/event_hdl: SERVER_CHECK event Adding a new event type: SERVER_CHECK. This event is published when a server's check state ought to be reported. (check status change or check result) SERVER_CHECK event is provided as a server event with additional data carrying relevant check's context such as check's result and health.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	a163d65254	MINOR: server/event_hdl: add SERVER_ADMIN event Adding a new SERVER event in the event_hdl API. SERVER_ADMIN is implemented as an advanced server event. It is published each time the administrative state changes. (when s->cur_admin changes) SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that provides additional info related to the admin state change, but can be casted as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	e3eea29f48	MINOR: server/event_hdl: add SERVER_STATE event Adding a new SERVER event in the event_hdl API. SERVER_STATE is implemented as an advanced server event. It is published each time the server's effective state changes. (when s->cur_state changes) SERVER_STATE data is an event_hdl_cb_data_server_state struct that provides additional info related to the server state change, but can be casted as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	3889efa8e4	MINOR: hlua_fcn: add Server.get_proxy() Server.get_proxy(): get the proxy to which the server belongs (or nil if not available)	2023-05-05 16:28:32 +02:00
Christopher Faulet	7b3d38a633	MEDIUM: tree-wide: Change sc API to specify required free space to progress sc_need_room() now takes the required free space to receive more data as parameter. All calls to this function are updated accordingly. For now, this value is set but not used. When we are waiting for a buffer, 0 is used. So we expect to be unblocked ASAP. However this must be reviewed because SC_FL_NEED_BUF is probably enough in this case and this flag is already set if the input buffer allocation fails.	2023-05-05 15:44:23 +02:00
Christopher Faulet	9aed1124ed	MINOR: stconn: Add a field to specify the room needed by the SC to progress When the SC is blocked because it is waiting for room in the input buffer, it will be responsible to specify the minimum free space required to progress. In this commit, we only introduce the field in the stconn structure that will be used to store this value. It is a signed value with the following meaning: * -1: The SC is waiting for room but not based on the buffer state. It will be typically used during splicing when the pipe is full. In this case, only a successful send can unblock the SC. * >= 0; The minimum free space in the input buffer to unblock the SC. 0 is a special value to specify the SC must be unblocked ASAP, by the stream, at the end of process_stream() or when output data are consumed on the opposite side.	2023-05-05 15:41:30 +02:00
Christopher Faulet	f4258bdf3b	MINOR: stats: Use the applet API to write data stats_putchk() is updated to use the applet API instead of the channel API to write data. To do so, the appctx is passed as parameter instead of the channel. This way, the applet does not need to take care to request more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:29 +02:00
William Lallemand	b6ae2aafde	MINOR: ssl: allow to change the signature algorithm for client authentication This commit introduces the keyword "client-sigalgs" for the bind line, which does the same as "sigalgs" but for the client authentication. "ssl-default-bind-client-sigalgs" allows to set the default parameter for all the bind lines. This patch should fix issue #2081.	2023-05-05 00:05:46 +02:00
William Lallemand	1d3c822300	MINOR: ssl: allow to change the server signature algorithm This patch introduces the "sigalgs" keyword for the bind line, which allows to configure the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-bind-sigalgs" in the default section. This patch was originally written by Bruno Henc.	2023-05-04 22:43:18 +02:00
Willy Tarreau	e69919d1ba	CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash() The function isn't used anymore since each call place performs its own loop. Let's get rid of it.	2023-05-04 19:19:04 +02:00
Willy Tarreau	9a6ecbd590	MEDIUM: debug: simplify the thread dump mechanism The thread dump mechanism that is used by "show threads" and by the panic dump is overly complicated due to an initial misdesign. It first wakes all threads, then serializes their dumps, then releases them, while taking extreme care not to face colliding dumps. In fact this is not what we need and it reached a limit where big machines cannot dump all their threads anymore due to buffer size limitations. What is needed instead is to be able to dump one thread, and to let the requester iterate on all threads. That's what this patch does. It adds the thread_dump_buffer to the struct thread_ctx so that the requester offers the buffer to the thread that is about to be dumped. This buffer also serves as a lock. A thread at rest has a NULL, a valid pointer indicates the thread is using it, and 0x1 (NULL+1) is used by the dumped thread to tell the requester it's done. This makes sure that a given thread is dumped once at a time. In addition to this, the calling thread decides whether it accesses the thread by itself or via the debug signal handler, in order to get a backtrace. This is much saner because the calling thread is free to do whatever it wants with the buffer after each thread is dumped, and there is no dependency between threads, once they've dumped, they're free to continue (and possibly to dump for another requester if needed). Finally, when the THREAD_DUMP feature is disabled and the debug signal is not used, the requester accesses the thread by itself like before. For now we still have the buffer size limitation but it will be addressed in future patches.	2023-05-04 19:15:44 +02:00
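The three-state pointer described in 9a6ecbd590 (NULL at rest, a valid pointer while dumping, 0x1 when done) can be sketched with C11 atomics. Everything named here (`struct thread_slot`, `request_dump()`, `perform_dump()`, `collect_dump()`) is hypothetical and only illustrates the handshake, not haproxy's actual thread_ctx layout or signal handling.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Marker stored by the dumped thread once its dump is complete. */
#define SK_DUMP_DONE ((char *)(uintptr_t)0x1)

/* Hypothetical per-thread slot: the buffer pointer doubles as a lock. */
struct thread_slot {
    _Atomic(char *) dump_buffer;
};

/* Requester side: offer <buf> to the thread. Returns 1 on success,
 * 0 if the slot is not at rest (another dump is in progress).
 */
static int request_dump(struct thread_slot *t, char *buf)
{
    char *expected = NULL;

    return atomic_compare_exchange_strong(&t->dump_buffer, &expected, buf);
}

/* Dumped thread side: write at most <size>-1 bytes of <msg> into the
 * offered buffer (size must be > 0), then publish the "done" marker.
 */
static void perform_dump(struct thread_slot *t, const char *msg, size_t size)
{
    char *buf = atomic_load(&t->dump_buffer);

    if (buf == NULL || buf == SK_DUMP_DONE)
        return;  /* nothing was requested */
    strncpy(buf, msg, size - 1);
    buf[size - 1] = '\0';
    atomic_store(&t->dump_buffer, SK_DUMP_DONE);
}

/* Requester side: returns 1 and puts the slot back at rest once the
 * thread has reported completion, 0 if the dump is not finished yet.
 */
static int collect_dump(struct thread_slot *t)
{
    char *expected = SK_DUMP_DONE;

    return atomic_compare_exchange_strong(&t->dump_buffer, &expected, NULL);
}
```

A requester would call `request_dump()` on one thread, wait until `collect_dump()` succeeds, consume the buffer, and then move on to the next thread, which mirrors the "dump one thread and let the requester iterate" design.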
Aurelien DARRAGON	e910909556	BUG/MINOR: time: fix NS_TO_TV macro NS_TO_TV helper was implemented in `591fa59` ("MINOR: time: add conversions to/from nanosecond timestamps") Due to NS_TO_TV being implemented as a macro and not a function, we must take extra care when manipulating user input. In current implementation, 't' argument is not isolated within the macro. Because of this, NS_TO_TV(1 + 1) will expand to: ((const struct timeval){ .tv_sec = 1 + 1 / 1000000000ULL, .tv_usec = (1 + 1 % 1000000000ULL) / 1000U }) Instead of: ((const struct timeval){ .tv_sec = 2 / 1000000000ULL, .tv_usec = (2 % 1000000000ULL) / 1000U }) As such, NS_TO_TV usage in hlua_now() is currently incorrect and this results in unexpected values being passed to lua. In this patch, we're adding an extra parenthesis around 't' in NS_TO_TV() macro to make it safe against such usages. (that is: ensure proper argument expansion as if NS_TO_TV was implemented as a function) This is a 2.8 specific bug, no backport needed.	2023-05-04 18:09:50 +02:00
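The expansion problem fixed in e910909556 can be reproduced with two stand-in macros. `NS_TO_TV_UNSAFE` follows the pre-fix expansion quoted in the message and `NS_TO_TV_SAFE` adds the parentheses. Both names, and the two wrapper functions, are hypothetical and exist only for this demonstration.

```c
#include <sys/time.h>

/* Pre-fix form: <t> is pasted as-is, so operator precedence leaks in. */
#define NS_TO_TV_UNSAFE(t)                                  \
    ((const struct timeval){                                \
        .tv_sec  = t / 1000000000ULL,                       \
        .tv_usec = (t % 1000000000ULL) / 1000U })

/* Fixed form: <t> is parenthesized and evaluated as one expression. */
#define NS_TO_TV_SAFE(t)                                    \
    ((const struct timeval){                                \
        .tv_sec  = (t) / 1000000000ULL,                     \
        .tv_usec = ((t) % 1000000000ULL) / 1000U })

/* Pass a compound expression (a + b) through each macro. */
static struct timeval sum_to_tv_unsafe(unsigned long long a, unsigned long long b)
{
    return NS_TO_TV_UNSAFE(a + b);
}

static struct timeval sum_to_tv_safe(unsigned long long a, unsigned long long b)
{
    return NS_TO_TV_SAFE(a + b);
}
```

With a = b = 1500000000 ns (3 seconds in total), the safe form yields tv_sec = 3 and tv_usec = 0, whereas the unsafe form computes a + (b / 1000000000) and a + (b % 1000000000) and returns tv_sec = 1500000001 and tv_usec = 2000000.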
Amaury Denoyelle	51f116d65e	MINOR: mux-quic: adjust local error API When a fatal error is detected by the QUIC MUX or H3 layer, the connection should be closed with a CONNECTION_CLOSE with an error code as the reason. Previously, a direct call was used to the quic_conn layer to try to close the connection. This API was adjusted to be more flexible. Now, when an error is detected, the function qcc_set_error() is called. This sets the flag QC_CF_ERRL with the error code stored by the MUX. The connection will be closed soon so most of the operations are not conducted anymore. Connection is then finally closed during qc_send() via quic_conn layer if QC_CF_ERRL is set. This will set the flag QC_CF_ERRL_DONE which indicates that the MUX instance can be freed. This model is cleaner and brings the following improvements : - interaction with quic_conn layer for closure is centralized on a single function - CO_FL_ERROR is not set anymore. This was incorrect as this should be reserved to errors reported by the transport layer to be similar with other haproxy components. As a consequence, qcc_is_dead() has been adjusted to check for QC_CF_ERRL_DONE to release the MUX instance. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	8d44bfaf0b	MINOR: mux-quic: add trace event for local error Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally detected error when a CONNECTION_CLOSE should be emitted. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	bc0adfa334	MINOR: proxy: factorize send rate measurement Implement a new dedicated function increment_send_rate() which can be called anywhere new bytes must be accounted for global total sent.	2023-04-28 16:53:44 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. 
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Willy Tarreau	e8e4712771	MINOR: checks: use a nanosecond counter instead of timeval for checks->start Now we store the checks start date as a nanosecond timestamp instead of a timeval, this will simplify the operations with "now" in the near future.	2023-04-28 16:08:08 +02:00
Willy Tarreau	ad5a5f6779	MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request Let's get rid of timeval in storage of internal timestamps so that they are no longer mistaken for wall clock time. These were exclusively used subtracted from each other or to/from "now" after being converted to ns, so this patch removes the tv_to_ns() conversion to use them natively. Two occurrences of tv_isge() were turned to a regular wrapping subtract.	2023-04-28 16:08:08 +02:00
Willy Tarreau	aaebcae58b	MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps Various points were collected during a request/response and were stored using timeval. Let's now switch them to nanosecond based timestamps.	2023-04-28 16:08:08 +02:00
Willy Tarreau	76d343d3d3	MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract Instead of operating on {sec, usec} now we convert both operands to ns then subtract them and convert to ms. This is a first step towards dropping timeval from these timestamps. Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at all and could be removed.	2023-04-28 16:08:08 +02:00
Willy Tarreau	591fa59da7	MINOR: time: add conversions to/from nanosecond timestamps In order to ease the transition away from the timeval used in internal timestamps, let's first create a few functions and macro to return a counter from a timeval and conversely, as well as ease the conversions to/from ns/us/ms/sec to save the user from having to count zeroes and to think about appending ULL in conversions.	2023-04-28 16:08:08 +02:00
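The conversion helpers introduced in 591fa59da7, and the "convert both operands to ns, subtract, convert to ms" approach from 76d343d3d3, might look like the sketch below. The `sk_` names and exact signatures are assumptions made for illustration; only the general shape is taken from the commit messages.

```c
#include <stdint.h>
#include <sys/time.h>

/* Convert a timeval to a nanosecond counter. */
static inline uint64_t sk_tv_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}

/* Convert a nanosecond counter to whole milliseconds. */
static inline uint64_t sk_ns_to_ms(uint64_t ns)
{
    return ns / 1000000ULL;
}

/* Convert a nanosecond counter to whole seconds. */
static inline uint64_t sk_ns_to_sec(uint64_t ns)
{
    return ns / 1000000000ULL;
}

/* Milliseconds elapsed between <from> and <to> using a single linear
 * subtraction instead of separate {sec, usec} arithmetic.
 */
static inline uint64_t sk_ms_elapsed(const struct timeval *from,
                                     const struct timeval *to)
{
    return sk_ns_to_ms(sk_tv_to_ns(to) - sk_tv_to_ns(from));
}
```

For example, going from {1 s, 500000 us} to {3 s, 250000 us} gives 3250000000 - 1500000000 = 1750000000 ns, i.e. 1750 ms, without any borrow handling between the two fields.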
Christopher Faulet	81951f264e	BUG/MINOR: stconn: Fix SC flags with same value SC_FL_SND_NEVERWAIT and SC_FL_SND_EXP_MORE flags have the same value. It is not critical because these flags are only used to know if MSG_MORE flag must be set on a send(). No backport needed.	2023-04-28 08:51:34 +02:00
Christopher Faulet	e99c43907c	BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones It is possible to start too many applets on sporadic burst of events after an inactivity period. It is due to the way we estimate if a new applet must be created or not. It is based on a frequency counter. We compare the events processing rate against the number of events currently processed (in progress or waiting to be processed). But we should also take care of the number of idle applets. We already track the number of idle applets, but it is global and not per-thread. Thus we now also track the number of idle applets per-thread. It is not a big deal because this fills a hole in the spoe_agent structure. Thanks to this counter, we can refrain from creating applets if there are enough idle applets to handle currently processed events. This patch should be backported to all stable versions.	2023-04-28 08:51:34 +02:00
Amaury Denoyelle	d6646dddcc	MINOR: quic: finalize affinity change as soon as possible During accept, a quic-conn is rebound to a new thread. This process is done in two steps : * first on the original thread via qc_set_tid_affinity() * then on the newly assigned thread via qc_finalize_affinity_rebind() Most quic_conn operations (I/O tasklet, task and quic_conn FD socket read) are reactivated only after the second step. However, there is a possibility that datagrams are handled before it via quic_dgram_parse() when using listener sockets. This does not seem to cause any issue but this may cause unexpected behavior in the future. To simplify this, qc_finalize_affinity_rebind() will be called both by qc_xprt_start() and quic_dgram_parse(). Only one invocation will be performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	24962dd178	BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length Some HTX responses may not always contain an EOM block. For example this is the case if the content-length header is missing from the HTTP server response. Stream termination is thus signaled to the QUIC mux via the shutw callback. However, this is interpreted unconditionally as an early close by the mux with a RESET_STREAM emission. Most of the time, QUIC clients report this as an error. To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for a qcs instance. If true, shutw will never be used to emit a RESET_STREAM. Instead, the stream will be closed properly with a FIN STREAM frame. If all data were already transferred, an empty STREAM frame is sent. This fix may help with the github issue #2004 where the chrome browser stops using QUIC after receiving RESET_STREAM frames. This issue was reported by Vladimir Zakharychev. Thanks to him for his help and testing. It was also reproduced locally using httpterm with the query string "/?s=1k&b=0&C=1". This should be backported up to 2.7.	2023-04-26 17:50:09 +02:00
Willy Tarreau	543e2544ca	DEBUG: crash using an invalid opcode on aarch64 instead of an invalid access On aarch64 there's also a guaranteed invalid instruction, called UDF, and which even supports an optional 16-bit immediate operand: https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UDF--Permanently-Undefined-?lang=en It's conveniently encoded as 4 zeroes (when the operand is zero). It's unclear when support for it was added into GAS, if at all; even a not-so-old 2.27 doesn't know about it. Let's byte-encode it. Tested on an A72 and works as expected.	2023-04-25 19:53:39 +02:00
Willy Tarreau	77787ec9bc	DEBUG: crash using an invalid opcode on x86/x86_64 instead of an invalid access BUG_ON() calls currently trigger a segfault. This is more convenient than abort() as it doesn't rely on any function call nor signal handler and never causes non-unwindable stacks when opening cores. But it adds quite some confusion in bug reports which are rightfully tagged "segv" and do not instantly allow to distinguish real segv (e.g. null derefs) from code asserts. Some CPU architectures offer various crashing methods. On x86 we have INT3 (0xCC), which stops into the debugger, and UD0/UD1/UD2. INT3 looks appealing but for whatever reason (maybe signal handling somewhere) it loses the last call point in the stack, making backtraces unusable. UD2 has the merit of being only 2 bytes and causing an invalid instruction, which almost never happens normally, so it's easily distinguishable. Here it was defined as a macro so that the line number in the core matches the one where the BUG_ON() macro is called, and the debugger shows the last frame exactly at its calling point. E.g. when calling "debug dev bug": Program terminated with signal SIGILL, Illegal instruction. 
#0 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408 408 BUG_ON(one > zero); [Current thread is 1 (Thread 0x7f7a660cc1c0 (LWP 14238))] (gdb) bt #0 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408 #1 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:402 #2 0x000000000061a69f in cli_parse_request (appctx=appctx@entry=0x181c0160) at src/cli.c:832 #3 0x000000000061af86 in cli_io_handler (appctx=0x181c0160) at src/cli.c:1035 #4 0x00000000006ca2f2 in task_run_applet (t=0x181c0290, context=0x181c0160, state=<optimized out>) at src/applet.c:449	2023-04-25 18:51:10 +02:00
Amaury Denoyelle	d5f03cd576	CLEANUP: quic: rename frame variables Rename all frame variables with the suffix _frm. This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:22 +02:00
Amaury Denoyelle	888c5f283a	CLEANUP: quic: rename frame types with an explicit prefix Each frame type used in quic_frame union has been renamed with the following prefix "qf_". This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:03 +02:00
Willy Tarreau	7310164b2c	MINOR: listener: add a new global tune.listener.default-shards setting This new setting accepts "by-process", "by-group" and "by-thread" and will dictate how listeners will be sharded by default when nothing is specified. While the default remains "by-process", "by-group" should be much more efficient with many threads, while not changing anything for single-group setups.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f1003ea7fa	MINOR: protocol: perform a live check for SO_REUSEPORT support When testing if a protocol supports SO_REUSEPORT, we're now able to verify if the OS does really support it. While it may be supported at build time, it may possibly have been blocked in a container for example so we'd rather know what it's like.	2023-04-23 09:46:15 +02:00
Willy Tarreau	b073573c10	MINOR: sock: add a function to check for SO_REUSEPORT support at runtime The new function _sock_supports_reuseport() will be used to check if a protocol type supports SO_REUSEPORT or not. This will be useful to verify that shards can really work.	2023-04-23 09:46:15 +02:00
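A runtime probe of this kind can be sketched as follows. This is a simplified illustration only: `probe_reuseport()` is a hypothetical name, and it merely checks that the option can be set on a fresh socket, whereas the real `_sock_supports_reuseport()` may perform further verification.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if SO_REUSEPORT can be set on a socket of
 * the given family/type/protocol at runtime, 0 otherwise (including when the
 * option is not defined at build time or the socket cannot be created).
 */
static int probe_reuseport(int family, int type, int protocol)
{
#ifdef SO_REUSEPORT
    int one = 1;
    int ret = 0;
    int fd = socket(family, type, protocol);

    if (fd < 0)
        return 0;
    /* the build-time definition is not enough: the kernel or a container
     * policy may still refuse the option, so try it for real.
     */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0)
        ret = 1;
    close(fd);
    return ret;
#else
    (void)family; (void)type; (void)protocol;
    return 0;
#endif
}
```

A caller could run `probe_reuseport(AF_INET, SOCK_STREAM, 0)` once at startup and fall back to a single shard when it returns 0.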
Willy Tarreau	8a5e6f4cca	MINOR: protocol: add a function to check if some features are supported The new function protocol_supports_flag() checks the protocol flags to verify if some features are supported, but will support being extended to refine the tests. Let's use it to check for REUSEPORT.	2023-04-23 09:46:15 +02:00
Willy Tarreau	785b89f551	MINOR: protocol: move the global reuseport flag to the protocols Some protocols support SO_REUSEPORT and others do not. Some have such a limitation in the kernel, and others in haproxy itself (e.g. sock_unix cannot support multiple bindings since each one will unbind the previous one). Also it's really protocol-dependent and not just family-dependent because on Linux for some time it was supported for TCP and not UDP. Let's move the definition to the protocols instead. Now it's preset in tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset. The enabled() config condition test validates IPv4 (generally sufficient), and -dR / noreuseport all protocols at once.	2023-04-23 09:46:15 +02:00
Willy Tarreau	65df7e028d	MINOR: protocol: add a flags field to store info about protocols We'll use these flags to know if some protocols are supported, and if so, with what options/extensions. Reuseport will move there for example. Two functions were added to globally set/clear a flag.	2023-04-23 09:46:15 +02:00
Willy Tarreau	da0d2cb698	MINOR: proxy: make proxy_type_str() recognize peers sections Now proxy_type_str() will emit "peers section" when the mode is set to peers, so as to ease sharing more code between peers and proxies.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f6a8444f55	REORG: listener: move the bind_conf's thread setup code to listener.c What used to be only two lines to apply a mask in a loop in check_config_validity() grew into a 130-line block that performs deeply listener-specific operations that do not have their place there anymore. In addition it's worth noting that the peers code still doesn't support shards nor being bound to more than one group, which is a second reason for moving that code to its own function. Nothing was changed except recreating the missing variables from the bind_conf itself (the fe only).	2023-04-23 09:46:15 +02:00
Willy Tarreau	4c538df28c	CLEANUP: protocol: move the nb_receivers to plug a hole in protocol This field forces an unaligned hole between two list heads. Let's move it up where it will be more easily combined with other fields. In addition, turn it to unsigned while it's still not used.	2023-04-23 09:46:15 +02:00
Willy Tarreau	798d6b4124	CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam There's a two-byte hole in proto_fam after sock_family, let's move the l3_addrlen there as a ushort. Note that contrary to what the comment says, it's still not used by hash algorithms though it could.	2023-04-23 09:46:15 +02:00
Willy Tarreau	df4051cd58	BUILD: proto_tcp: export the correct names for proto_tcpv[46] The exported names were not correct (missing the 'v').	2023-04-23 09:46:15 +02:00
Willy Tarreau	968a4f34fc	BUILD: sock_inet: forward-declare struct receiver Including sock_inet.h without receiver-t.h causes build failures due to struct receiver not being defined. Let's just forward-declare it.	2023-04-23 09:46:15 +02:00
Ilya Shipitsin	ccf8012f28	CLEANUP: assorted typo fixes in the code and comments This is 36th iteration of typo fixes	2023-04-23 09:44:53 +02:00
Tim Duesterhus	3a8c63d48d	MINOR: Make `tasklet_free()` safe to be called with `NULL` Make this freeing function safe, like other freeing functions are as discussed in GitHub issue #2126.	2023-04-23 00:28:25 +02:00
Willy Tarreau	ff18504d73	MINOR: listener: make sure to avoid ABA updates in per-thread index One limitation of the current thread index mechanism is that if the values are assigned multiple times to the same thread and the index loops, it can match again the old value, which will not prevent a competing thread from finishing its CAS and assigning traffic to a thread that's not the optimal one. The probability is low but the solution is simple enough and consists in implementing an update counter in the high bits of the index to force a mismatch in this case (assuming we don't try to cover for extremely unlikely cases where the update counter loops while the index remains equal). So let's do that. In order to improve the situation a little bit, we now set the index to a ulong so that in 32 bits we have 8 bits of counter and in 64 bits we have 40 bits.	2023-04-21 17:41:26 +02:00
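The update-counter idea can be sketched with C11 atomics as below. The 24-bit index width is inferred from the "8 bits of counter in 32 bits, 40 bits in 64 bits" figures above; the names `IDX_BITS`, `IDX_MASK` and `idx_update()` are hypothetical and not taken from the haproxy sources.

```c
#include <stdatomic.h>

#define IDX_BITS 24UL                      /* assumed width of the index part */
#define IDX_MASK ((1UL << IDX_BITS) - 1)   /* low bits: index, high bits: counter */

/* Try to install <new_idx> in <word>, given the value <seen> that was read
 * earlier. The update counter stored in the high bits is incremented on each
 * successful update, so a competing thread holding a stale <seen> fails its
 * CAS even if the index part has looped back to the same value.
 * Returns 1 on success, 0 if <word> changed since <seen> was read.
 */
static inline int idx_update(_Atomic unsigned long *word,
                             unsigned long seen, unsigned long new_idx)
{
    unsigned long counter = (seen >> IDX_BITS) + 1;
    unsigned long next = (counter << IDX_BITS) | (new_idx & IDX_MASK);

    return atomic_compare_exchange_strong(word, &seen, next);
}
```

The counter simply wraps when it overflows the high bits, which matches the accepted limitation mentioned in the commit message.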
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded than another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
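As an illustration of packing head and tail into one atomically read word, here is a minimal sketch. The structure and helper names are hypothetical, and the placement of the head in the low half is an arbitrary choice for the example, not necessarily the layout used in haproxy.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical queue descriptor: head in the low 16 bits, tail in the high
 * 16 bits of a single 32-bit word.
 */
struct accept_queue_sketch {
    _Atomic uint32_t idx;
};

static inline uint32_t aq_pack(uint16_t head, uint16_t tail)
{
    return ((uint32_t)tail << 16) | head;
}

/* Returns the number of pending entries. Both halves come from one atomic
 * load, so they always belong to the same snapshot. <size> is the ring size
 * and must be a power of two no larger than 65536.
 */
static inline unsigned int aq_len(struct accept_queue_sketch *q, unsigned int size)
{
    uint32_t v = atomic_load(&q->idx);
    uint16_t head = v & 0xffff;
    uint16_t tail = v >> 16;

    return (uint16_t)(tail - head) & (size - 1);
}
```

Comparing `aq_len()` of two queues then never mixes a head from one instant with a tail from another.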
Willy Tarreau	e4c36aa8a1	MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped The purpose of this new flag will be to mark that some listeners duplicate their reference's FD instead of trying to setup a completely new listener from scratch. This will be used when multiple groups want to listen to the same socket, via multiple FDs.	2023-04-21 17:41:26 +02:00
Willy Tarreau	aae1810b4d	MINOR: receiver: add a struct shard_info to store info about each shard In order to create multiple receivers for one multi-group shard, we'll need some more info about the shard. Here we store: - the number of groups (= number of receivers) - the number of threads (will be used for accept LB) - pointer to the reference rx (to get the FD and to find all threads) - pointers to the other members (to iterate over all threads) For now since there's only one group per shard it remains simple. The listener deletion code already takes care of removing the current member from its shards list and moving others' reference to the last one if it was their reference (so as to avoid o(n^2) updates during ordered deletes). Since the vast majority of setups will not use multi-group shards, we try to save memory usage by only allocating the shard_info when it is needed, so the principle here is that a receiver shard_info==NULL is alone and doesn't share its socket with another group. Various approaches were considered and tests show that the management of the listeners during boot makes it easier to just attach to or detach from a shard_info and automatically allocate it if it does not exist, which is what is being done here. For now the attach code is not called, but detach is already called on delete.	2023-04-21 17:41:26 +02:00
Willy Tarreau	84fe1f479b	MINOR: listener: support another thread dispatch mode: "fair" This new algorithm for rebalancing incoming connections to multiple threads is simpler and instead of considering the threads load, it will only cycle through all of them, offering a fair share of the traffic to each thread. It may be well suited for short-lived connections but is also convenient for very large thread counts where it's not always certain that the least loaded thread will always be found.	2023-04-21 17:41:26 +02:00
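A round-robin dispatch of the kind described can be sketched in a few lines. `fair_next_thread()` and its shared cursor are hypothetical; the actual listener code may differ, for example by iterating over a mask of eligible threads.

```c
#include <stdatomic.h>

/* Returns the next thread number in [0, nbthreads) by advancing a shared
 * cursor, without looking at the threads' load. <nbthreads> must be non-zero.
 * Note: when the cursor wraps around and <nbthreads> is not a power of two,
 * the sequence restarts at 0 regardless of where it was, which is harmless
 * for load spreading.
 */
static inline unsigned int fair_next_thread(_Atomic unsigned int *cursor,
                                            unsigned int nbthreads)
{
    return atomic_fetch_add(cursor, 1) % nbthreads;
}
```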
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Willy Tarreau	77d37b07b1	MINOR: quic: support migrating the listener as well When migrating a quic_conn to another thread, we may need to also switch the listener if the thread belongs to another group. When this happens, the freshly created connection will already have the target listener, so let's just pick it from the connection and use it in qc_set_tid_affinity(). Note that it will be the caller's responsibility to guarantee this.	2023-04-21 17:41:26 +02:00
Aurelien DARRAGON	76e255520f	MINOR: server: pass adm and op cause to srv_update_status() Operational and administrative state change causes are not propagated through srv_update_status(), instead they are directly consumed within the function to provide additional info during the call when required. Thus, there is no valid reason for keeping adm and op causes within server struct. We are wasting space and keeping unneeded complexity. We now explicitly pass change type (operational or administrative) and associated cause to srv_update_status() so that no extra storage is needed since those values are only relevant from srv_update_status().	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	1746b56e68	MINOR: server: change srv_op_st_chg_cause storage type This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type". While looking at current srv_op_st_chg_cause usage, it was clear that the struct needed some cleanup since some leftovers from asynchronous server state change updates were left behind and resulted in some useless code duplication, and making the whole thing harder to maintain. Two observations were made: - by tracking down srv_set_{running, stopped, stopping} usage, we can see that the <reason> argument is always a fixed statically allocated string. - check-related state change context (duration, status, code...) is not used anymore since srv_append_status() directly extracts the values from the server->check. This is pure legacy from when the state changes were applied asynchronously. To prevent code duplication, useless string copies and make the reason/cause more exportable, we store it as an enum now, and we provide srv_op_st_chg_cause() function to fetch the related description string. HEALTH and AGENT causes (check related) are now explicitly identified to make consumers like srv_append_op_chg_cause() able to fetch checks info from the server itself if they need to.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	f3b48a808e	MINOR: server: srv_append_status refacto srv_append_status() has become a swiss-knife function over time. It is used from server code and also from checks code, with various inputs and distinct code paths, making it very hard to guess the actual behavior of the function (resulting string output). To simplify the logic behind it, we're dividing it in multiple contextual functions that take simple inputs and do explicit things, making them more predictable and easier to maintain.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9b1ccd7325	MINOR: server: change adm_st_chg_cause storage type Even though it doesn't look like it at first glance, this is more like a cleanup than an actual code improvement: Given that srv->adm_st_chg_cause has been used to exclusively store static strings ever since it was implemented, we make the choice to store it as an enum instead of a fixed-size string within server struct. This will allow to save some space in server struct, and will make it more easily exportable (ie: event handlers) because of the reduced memory footprint during handling and the ability to later get the corresponding human-readable message when it's explicitly needed.	2023-04-21 14:36:45 +02:00
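The pattern of storing a cause as an enum and resolving the human-readable text only on demand can be sketched like this. The enum members and messages below are made up for illustration and are not the values used in the haproxy sources.

```c
#include <stddef.h>

/* Hypothetical causes: a small integer is stored instead of a fixed-size
 * string buffer.
 */
enum adm_cause_sketch {
    CAUSE_NONE = 0,
    CAUSE_CLI,
    CAUSE_DNS,
    CAUSE_COUNT   /* must remain last */
};

/* Returns the static description string associated with <cause> */
static const char *adm_cause_str(enum adm_cause_sketch cause)
{
    static const char *const msgs[CAUSE_COUNT] = {
        [CAUSE_NONE] = "",
        [CAUSE_CLI]  = "changed from CLI",
        [CAUSE_DNS]  = "changed by DNS resolution",
    };

    if ((unsigned int)cause >= CAUSE_COUNT)
        return "unknown";
    return msgs[cause];
}
```

An event handler can thus carry only the enum value and call the lookup function when it actually needs to print the message.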
Aurelien DARRAGON	e9314fb7a7	MINOR: event_hdl: provide event->when for advanced handlers For advanced async handlers only (Registered using EVENT_HDL_ASYNC_TASK() macro): event->when is provided as a struct timeval and fetched from 'date' haproxy global variable. Thanks to 'when', related event consumers will be able to timestamp events, even if they don't work in real-time or near real-time. Indeed, unlike sync or normal async handlers, advanced async handlers could purposely delay the consumption of pending events, which means that the date wouldn't be accurate if computed directly from within the handler.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	ebf58e991a	MINOR: event_hdl: dynamically allocated event data members Add the ability to provide a cleanup function for event data passed via the publishing function. One use case could be the need to provide valid pointers in the safe section of the data struct. Cleanup function will be automatically called with data (or copy of data) as argument when all handlers consumed the event, which provides an easy way to release some memory or decrement refcounts to resources that were provided through the data struct. data in itself may not be freed by the cleanup function, it is handled by the API. This would allow passing large (allocated) data blocks through the data struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA size limit. To do so, when publishing an event, where we would currently do: struct event_hdl_cb_data_new_family event_data; /* safe data, available from both sync and async contexts * may not use pointers to short-living resources */ event_data.safe.my_custom_data = x; /* unsafe data, only available from sync contexts */ event_data.unsafe.my_unsafe_data = y; /* once data is prepared, we can publish the event */ event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA(&event_data)); We could do: struct event_hdl_cb_data_new_family event_data; /* safe data, available from both sync and async contexts * may not use pointers to short-living resources, * unless EVENT_HDL_CB_DATA_DM is used to ensure pointer * consistency (ie: refcount) */ event_data.safe.my_custom_static_data = x; event_data.safe.my_custom_dynamic_data = malloc(1); /* unsafe data, only available from sync contexts */ event_data.unsafe.my_unsafe_data = y; /* once data is prepared, we can publish the event */ event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup)); With data_new_family_cleanup func which would look like this: void data_new_family_cleanup(const void *data) { const 
struct event_hdl_cb_data_new_family *event_data = ptr; /* some data members require specific cleanup once the event * is consumed */ free(event_data.safe.my_custom_dynamic_data); /* don't ever free data! it is not ours */ } Not sure if this feature will become relevant in the future, so I prefer not to mention it in the doc for now. But given that the implementation is trivial and does not put a burden on the existing API, it's a good thing to have it there, just in case.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	147691fd83	CLEANUP: event_hdl: fix comment typo about _sync assertion Fixing a comment relative to EVENT_HDL_ASSERT_SYNC macro where a typo was made and the comment was lacking some context.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	363ef4daa7	CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA EVENT_HDL_CB_DATA macro comments were not updated during the API refactor, fixing that.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	8273bfc639	BUG/MINOR: event_hdl: don't waste 1 event subtype slot ESUB_INDEX(n) index macro is used exclusively with n > 0 Fixing it so that it starts numbering at 1 instead of 2. This way, we don't waste a subtype slot in event_hdl_sub_type struct, and we comply with the structure comments about max supported event subtypes (currently set at 16). If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	a63f4903c9	MINOR: server/event_hdl: prepare for upcoming refactors This commit does nothing that ought to be mentioned, except that it adds missing comments and slightly moves some function calls out of "sensitive" code in preparation of some server code refactors.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	d714213862	MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server Expose proxy_uuid variable in event_hdl_cb_data_server struct to overcome proxy_name fixed length limitation. proxy_uuid may be used by the handler to perform proxy lookups. This should be preferred over lookups relying on proxy_name. (proxy_name is suitable for printing / logging purposes but not for ID lookups since it has a maximum fixed length)	2023-04-21 14:36:45 +02:00
Frédéric Lécaille	0ed94032b2	MINOR: quic: Do not allocate too much ack ranges Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32). Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Frédéric Lécaille	4b2627beae	BUG/MINOR: quic: Stop removing ACK ranges when building packets Since commit "BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit", ack ranges are more likely to be removed from their trees when building a packet. It is preferable to impose a limit on these trees; this will be the subject of an upcoming commit. For now, it is sufficient to stop deleting ack ranges from their trees. Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were there to do that. Make qc_frm_len() support ACK frames and call it to ensure an ACK frame may be added to a packet before building it. Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Aurelien DARRAGON	2a9764baae	CLEANUP: hlua: avoid confusion between internal timers and tick based timers Not all hlua "time" variables use the same time logic. hlua->wake_time relies on ticks since it's meant to be used in conjunction with task scheduling. Thus, it should be stored as a signed int and manipulated using the tick api. Adding a few comments about that to prevent mixups with hlua internal timer api which doesn't rely on the ticks api.	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	da9503ca9a	MEDIUM: hlua: reliable timeout detection For non yieldable lua handlers (converters, fetches or yield incompatible lua functions), current timeout detection relies on now_ms thread local variable. But within non-yieldable contexts, now_ms won't be updated if not by us (because we're momentarily stuck in lua context so we won't re-enter the polling loop, which is responsible for clock updates). To circumvent this, clock_update_date(0, 1) was manually performed right before now_ms is being read for the timeout checks. But this fails to work consistently, because if no other concurrent threads periodically run clock_update_global_date(), which does happen if we're the only active thread (nbthread=1 or low traffic), our clock_update_date() call won't reliably update our local now_ms variable. Moreover, clock_update_date() is not the right tool for this anyway, as it was initially meant to be used from the polling context. Using it could have negative impact on other threads relying on now_ms to be stable. (because clock_update_date() performs global clock update from time to time) -> Introducing hlua multipurpose timer, which is internally based on now_cpu_time_fast() that provides per-thread consistent clock readings. Thanks to this new hlua timer API, hlua timeout logic is less error-prone and more robust. This allows the timeout detection to work as expected for both yieldable and non-yieldable lua handlers. This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function" While this could theoretically be backported to all stable versions, it is advisable to avoid backports unless we're confident enough since it could cause slight behavior changes (timing related) in existing setups.	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	df188f145b	MINOR: clock: add now_cpu_time_fast() function Same as now_cpu_time(), but for fast queries (less accurate) Relies on now_cpu_time() and now_mono_time_fast() is used as a cache expiration hint to prevent now_cpu_time() from being called too often since it is known to be quite expensive. Depends on commit "MINOR: clock: add now_mono_time_fast() function"	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	07cbd8e074	MINOR: clock: add now_mono_time_fast() function Same as now_mono_time(), but for fast queries (less accurate) Relies on coarse clock source (also known as fast clock source on some systems). Fallback to now_mono_time() if coarse source is not supported on the system.	2023-04-19 11:03:31 +02:00
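A coarse monotonic clock read with a fallback to the regular one might look like the sketch below. It assumes a POSIX system; `mono_time_ns()` and `mono_time_fast_ns()` are hypothetical names and do not reproduce the haproxy implementation of `now_mono_time_fast()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Reads clock <clk> and returns its value in nanoseconds, or 0 on failure */
static uint64_t mono_time_ns(clockid_t clk)
{
    struct timespec ts;

    if (clock_gettime(clk, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Cheaper but less accurate monotonic time: uses the coarse clock source
 * where the system defines it, and falls back to the regular monotonic
 * clock when it is missing at build time or fails at runtime.
 */
static uint64_t mono_time_fast_ns(void)
{
#ifdef CLOCK_MONOTONIC_COARSE
    uint64_t t = mono_time_ns(CLOCK_MONOTONIC_COARSE);

    if (t)
        return t;
#endif
    return mono_time_ns(CLOCK_MONOTONIC);
}
```

The coarse source typically only advances once per kernel tick, which is why it suits cache-expiration hints better than precise measurements.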
Amaury Denoyelle	0783a7b08e	MINOR: listener: remove unneeded local accept flag Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC protocol before thread rebinding was supported by the quic_conn layer. This should be backported up to 2.7 after the previous patch has also been taken.	2023-04-18 17:09:34 +02:00
Amaury Denoyelle	739de3f119	MINOR: quic: properly finalize thread rebinding When a quic_conn instance is rebound to a new thread, its tasks and tasklet are destroyed and new ones created. Its socket is also migrated to a new thread, which stops reception on it. To properly reactivate a quic_conn after rebind, wake up its tasks and tasklet if they were active before thread rebind. Also reactivate reading on the socket FD. These operations are implemented in a new function qc_finalize_affinity_rebind(). This should be backported up to 2.7 after a period of observation.	2023-04-18 17:09:02 +02:00
Amaury Denoyelle	25174d51ef	MEDIUM: quic: implement thread affinity rebinding Implement a new function qc_set_tid_affinity(). This function is responsible for rebinding a quic_conn instance to a new thread. This operation consists mostly of releasing existing tasks and tasklet and allocating new instances on the new thread. If the quic_conn uses its own socket, it is also migrated to the new thread. The migration is finally completed by updating the CID TID to the new thread. After this step, the connection is thus accessible to the new thread and cannot be accessed anymore on the old one without risking a race condition. To ensure rebinding is either done completely or not at all, tasks and tasklet are pre-allocated before all operations. If this fails, an error is returned and rebinding is not done. To destroy the older tasklet, its context is set to NULL before wake up. In I/O callbacks, a new function qc_process() is used to check context and free the tasklet if NULL. The thread rebinding can cause a race condition if the older thread quic_dghdlrs::dgrams list contains datagrams for the connection after rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always checks if the packet CID is still associated to the current thread or not. In the latter case, no connection is returned and the new thread is returned to allow redispatching the datagram to the new thread in a thread-safe way. This should be backported up to 2.7 after a period of observation.	2023-04-18 17:08:34 +02:00


7303 Commits