haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-08 08:07:10 +02:00

Author	SHA1	Message	Date
Amaury Denoyelle	6133aba889	BUG/MINOR: h3: missing goto on buf alloc failure The following patch introduced proper error management on buffer allocation failure : `0abde9dee6` BUG/MINOR: mux-quic: properly handle buf alloc failure However, when decoding an empty STREAM frame with just FIN bit set, this was not done correctly. Indeed, there is a missing goto statement in case of a NULL buffer check. This was reported thanks to coverity analysis. This should fix github issue #2163. This must be backported up to 2.6.	2023-05-15 14:57:56 +02:00
Amaury Denoyelle	1611a7659b	BUG/MINOR: mux-quic: handle properly Tx buf exhaustion Since the following patch commit `6c501ed23b` BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc it is not possible to check if Tx buf allocation failed due to a configured limit exhaustion or a simple memory failure. This patch fixes it as the condition was inverted. Indeed, if buf_avail is null, this means that the limit has been reached. On the contrary case, this is a real memory alloc failure. This caused the flag QC_CF_CONN_FULL to not be properly used and may have caused disruption on transfer with several streams or large data. This was detected due to an abnormal error QUIC MUX traces. Also change in consequence trace for limit exhaustion to be more explicit. This must be backported up to 2.6.	2023-05-15 14:06:21 +02:00
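The rule stated in this commit (no buffer available means the configured limit was hit, anything else is a real memory failure) can be sketched in C. This is a simplified model and not HAProxy's code: `struct toy_conn`, `tx_buf_alloc()`, `classify()` and the `TX_ALLOC_*` names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical outcome of a Tx buffer request. */
enum tx_alloc_status {
    TX_ALLOC_OK,        /* a buffer was returned */
    TX_ALLOC_CONN_FULL, /* configured per-connection limit reached */
    TX_ALLOC_MEM_ERROR  /* the allocator itself failed */
};

/* Toy connection: only what the sketch needs. */
struct toy_conn {
    int buf_count;   /* buffers currently allocated */
    int buf_limit;   /* configured maximum */
    int fail_malloc; /* test knob simulating memory exhaustion */
};

/* Returns a buffer or NULL. <avail> reports how many buffers could still
 * be allocated when the call was made (0 once the limit is reached). */
static void *tx_buf_alloc(struct toy_conn *c, size_t size, int *avail)
{
    void *buf;

    assert(c != NULL && avail != NULL);
    *avail = c->buf_limit - c->buf_count;
    if (*avail <= 0) {
        *avail = 0;
        return NULL;
    }
    buf = c->fail_malloc ? NULL : malloc(size);
    if (buf == NULL)
        return NULL;
    c->buf_count++;
    return buf;
}

/* Caller side: a NULL buffer with no room left is a limit exhaustion,
 * a NULL buffer with room left is a real allocation failure. */
static enum tx_alloc_status classify(const void *buf, int avail)
{
    if (buf != NULL)
        return TX_ALLOC_OK;
    return avail == 0 ? TX_ALLOC_CONN_FULL : TX_ALLOC_MEM_ERROR;
}
```

In this model only `TX_ALLOC_CONN_FULL` would justify pausing emission until a buffer is released. `TX_ALLOC_MEM_ERROR` is the case for which the commits in this series close the connection instead.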
William Lallemand	6e0c39d7ac	BUILD: ssl: ssl_c_r_dn fetches uses functiosn only available since 1.1.1 Fix the openssl build with older OpenSSL versions by disabling the new ssl_c_r_dn fetch. This also disables the ssl_client_samples.vtc file for OpenSSL versions older than 1.1.1	2023-05-15 12:07:52 +02:00
Willy Tarreau	d38d8c6ccb	BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout Christopher found as part of the analysis of Tim's issue #1891 that commit `15a4733d5` ("BUG/MEDIUM: mux-h2: make use of http-request and keep-alive timeouts") introduced in 2.6 incompletely addressed a timeout issue in the H2 mux. The problem was that the http-keepalive and http-request timeouts were not applied before it. With that commit they are now considered, but if a GOAWAY is sent (or even attempted to be sent), then they are not used anymore again, because the way the code is arranged consists in applying the client-fin timeout (if set) to the current date, and falling back to the client timeout, without considering the idle_start period. This means that a config having a "timeout http-keepalive" would still not close the connection quickly when facing a client that periodically sends PING, PRIORITY or whatever other frame types. In addition, after the GOAWAY was attempted to be sent, there was no check for pending data in the output buffer, meaning that it would be possible to truncate some responses in configs involving a very short client-fin timeout. Finally the spreading of the closures during the soft-stop brought in 2.6 by commit `b5d968d9b` ("MEDIUM: global: Add a "close-spread-time" option to spread soft-stop on time window") didn't consider the particular case of an idle "pre-connect" connection, which would also live long if a browser failed to deliver a valid request for a long time. 
All of this indicates that the conditions must be reworked so as not to have that level of exclusion between conditions, but rather stick to the rules from the doc that are already enforced on other muxes: - timeout client always applies if there are data pending, and is relative to each new I/O ; - timeout http-request applies before the first complete request and is relative to the entry in idle state ; - timeout http-keepalive applies between idle and the next complete request and is relative to the entry in idle state ; - timeout client-fin applies when in idle after a shut was sent (here the shut is the GOAWAY). The shut may only be considered as sent if the buffer is empty and the flags indicate that it was successfully sent (or failed) but not if it's still waiting for some room in the output buffer for example. This implies that this timeout may then lower the http-keepalive/http-request ones. This is what this patch implements. Of course the client timeout still applies as a fallback when all the ones above are not set or when their conditions are not met. It would seem reasonable to backport this to 2.7 first, then only after one or two releases to 2.6.	2023-05-15 12:01:20 +02:00
Abhijeet Rastogi	df97f472fa	MINOR: ssl: add new sample ssl_c_r_dn This patch addresses #1514, adds the ability to fetch DN of the root ca that was in the chain when client certificate was verified during SSL handshake.	2023-05-15 10:48:05 +02:00
William Lallemand	7f95469163	MEDIUM: proxy: stop emitting logs for internal proxies when stopping The HTTPCLIENT and the OCSP-UPDATE proxies are internal proxies, we don't need to display logs of them stopping during the stopping of the process. This patch checks if a proxy has the flag PR_CAP_INT so it doesn't display annoying messages.	2023-05-15 10:38:09 +02:00
Christopher Faulet	6eb53b138d	MINOR: stconn: Remove useless test on sedesc on detach to release the xref When the SC is detached from the endpoint, the xref between the endpoints is removed. At this stage, the sedesc cannot be undefined. So we can remove the test on it. This issue should fix the issue #2156. No backport needed.	2023-05-15 09:53:30 +02:00
William Lallemand	1601eebcd1	MEDIUM: mworker/cli: does not disconnect the master CLI upon error In the proxy CLI analyzer, when pcli_parse_request returns -1, the client was shut to prevent any problem with the master CLI. This behavior is a little bit excessive and not handy at all in prompt mode. For example one could have activated multiple modes, then hit an error which disconnects the CLI, and they would have to reconnect and enter all the modes again. This patch introduces the pcli_error() function, which only outputs an error and flushes the input buffer, instead of closing everything. When encountering a parsing error, this function is used, and the prompt is written again, without any disconnection.	2023-05-14 18:42:31 +02:00
William Lallemand	4adb4b9903	MEDIUM: session/ssl: return the SSL error string during a SSL handshake error SSL handshake errors did not dump the OpenSSL error string by default; to get it, it was mandatory to configure an error-log-format with the ssl_fc_err fetch. This patch implements the session_build_err_string() function which creates the error log to send during session_kill_embryonic(), a special case is made with CO_ER_SSL_HANDSHAKE which is able to dump the error string with ERR_error_string(). Before: <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure After: <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure (error:0A000418:SSL routines::tlsv1 alert unknown ca)	2023-05-12 17:43:58 +02:00
Amaury Denoyelle	ee65efbfae	BUG/MINOR: mux-quic: free task on qc_init() app ops failure qc_init() is used to initialize a QUIC MUX instance. On failure, all resources are released via a series of goto statements. There is one issue if the app_ops.init callback fails. In this case, MUX task is not freed. This can cause a crash as the task is already scheduled. When the handler runs, it will crash when trying to access the qcc instance. To fix this, properly destroy qcc task on fail_install_app_ops label. The impact of this bug is minor as app_ops.init callback succeeds most of the time. However, it may fail on allocation failure due to memory exhaustion. This may fix github issue #2154. This must be backported up to 2.7.	2023-05-12 16:37:27 +02:00
Amaury Denoyelle	6c501ed23b	BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc qc_stream_buf_alloc() can fail for two reasons : * limit of Tx buffer per connection reached * allocation failure The first case is properly treated. A flag QC_CF_CONN_FULL is set on the connection to interrupt emission. It is cleared when a buffer became available after in order ACK reception and the MUX tasklet is woken up. The allocation failure was handled with the same mechanism which in this case is not appropriate and could lead to a connection transfer freeze. Instead, prefer to close the connection with a QUIC internal error code. To differentiate the two causes, qc_stream_buf_alloc() API was changed to return the number of available buffers to the caller. This must be backported up to 2.6.	2023-05-12 16:26:20 +02:00
Amaury Denoyelle	50fe00650f	BUG/MINOR: quic: do not alloc buf count on alloc failure The total number of buffer per connection for sending is limited by a configuration value. To ensure this, <stream_buf_count> quic_conn field is incremented on qc_stream_buf_alloc(). qc_stream_buf_alloc() may fail if the buffer cannot be allocated. In this case, <stream_buf_count> should not be incremented. To fix this, simply move increment operation after buffer allocation. The impact of this bug is low. However, if a connection suffers from several buffer allocation failure, it may cause the <stream_buf_count> to be incremented over the limit without being able to go back down. This must be backported up to 2.6.	2023-05-12 15:55:41 +02:00
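The ordering issue described in this commit is easy to show side by side. The sketch below is a toy model, not the quic_conn code: `struct toy_conn`, `toy_alloc()`, `buf_alloc_buggy()` and `buf_alloc_fixed()` are hypothetical names, and only the `stream_buf_count` idea comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_conn {
    int stream_buf_count;  /* buffers accounted against the limit */
    int alloc_should_fail; /* test knob simulating memory exhaustion */
};

/* Stand-in allocator that can be forced to fail. */
static void *toy_alloc(const struct toy_conn *c)
{
    return c->alloc_should_fail ? NULL : malloc(64);
}

/* Buggy ordering: the counter is bumped before the outcome is known,
 * so every failed allocation leaks one unit of the limit. */
static void *buf_alloc_buggy(struct toy_conn *c)
{
    assert(c != NULL);
    c->stream_buf_count++;
    return toy_alloc(c);
}

/* Fixed ordering: the counter only moves once the buffer exists. */
static void *buf_alloc_fixed(struct toy_conn *c)
{
    void *buf;

    assert(c != NULL);
    buf = toy_alloc(c);
    if (buf == NULL)
        return NULL;
    c->stream_buf_count++;
    return buf;
}
```

With the buggy variant, a burst of failed allocations leaves the counter above the number of buffers that really exist, and nothing will ever decrement it since no buffer is released.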
Amaury Denoyelle	d00b3093c9	BUG/MINOR: mux-quic: handle properly recv ncbuf alloc failure The function qc_get_ncbuf() is used to allocate a ncbuf content. Allocation failure was handled using a plain BUG_ON. Fix this by a proper error management. This buffer is only used for STREAM frame reception to support out-of-order offsets. When an allocation failed, close the connection with a QUIC internal error code. This should be backported up to 2.6.	2023-05-12 15:52:19 +02:00
Amaury Denoyelle	0abde9dee6	BUG/MINOR: mux-quic: properly handle buf alloc failure A convenience function qc_get_buf() is implemented to centralize buffer allocation on MUX and H3 layers. However, allocation failure was not handled properly with a BUG_ON() statement. Replace this by proper error management. On emission, the stream is temporarily skipped until the next qc_send() invocation. On reception, H3 uses this function for HTX conversion; on alloc failure the connection will be closed with QUIC internal error code. This must be backported up to 2.6.	2023-05-12 15:51:15 +02:00
Amaury Denoyelle	93dd23cab4	MINOR: mux-quic: remove dedicated function to handle standalone FIN Remove QUIC MUX function qcs_http_handle_standalone_fin(). The purpose of this function was only used when receiving an empty STREAM frame with FIN bit. Besides, it was called by each application protocol which could have different approach and render the function purpose unclear. Invocation of qcs_http_handle_standalone_fin() have been replaced by explicit code in both H3 and HTTP/0.9 module. In the process, use htx_set_eom() to reliably put EOM on the HTX message. This should be backported up to 2.7, along with the previous patch which introduced htx_set_eom().	2023-05-12 15:50:30 +02:00
Amaury Denoyelle	25cf19d5c8	MINOR: htx: add function to set EOM reliably Implement a new HTX utility function htx_set_eom(). If the HTX message is empty, it will first add a dummy EOT block. This is a small trick needed to ensure readers will detect the HTX buffer as not empty and retrieve the EOM flag. Replace the H2 code related by a htx_set_eom() invocation. QUIC also has the same code which will be replaced in the next commit. This should be backported up to 2.7 before the related QUIC patch.	2023-05-12 15:29:28 +02:00
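The "dummy block" trick described in this commit can be illustrated with a toy message type. This is not the HTX API: `struct toy_htx`, `toy_set_eom()` and `toy_reader_sees_eom()` are hypothetical, and the reader is assumed to skip empty messages entirely, which is the behaviour the commit message works around.

```c
#include <assert.h>
#include <stddef.h>

enum toy_blk_type { TOY_BLK_DATA, TOY_BLK_EOT };

#define TOY_FL_EOM   0x1u
#define TOY_MAX_BLKS 8

struct toy_htx {
    enum toy_blk_type blks[TOY_MAX_BLKS];
    size_t used;    /* number of blocks stored */
    unsigned flags; /* TOY_FL_EOM when the message is complete */
};

static int toy_is_empty(const struct toy_htx *m)
{
    return m->used == 0;
}

/* Sets the end-of-message flag. If the message holds no block, a dummy
 * EOT block is added first so that it no longer looks empty. */
static void toy_set_eom(struct toy_htx *m)
{
    assert(m != NULL);
    if (toy_is_empty(m))
        m->blks[m->used++] = TOY_BLK_EOT;
    m->flags |= TOY_FL_EOM;
}

/* Toy reader: it ignores empty messages, flags included. */
static int toy_reader_sees_eom(const struct toy_htx *m)
{
    if (toy_is_empty(m))
        return 0;
    return (m->flags & TOY_FL_EOM) != 0;
}
```

Setting the flag directly on an empty message would go unnoticed by such a reader, which is why the helper inserts a block first.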
Frédéric Lécaille	76d502588d	BUG/MINOR: quic: Wrong redispatch for external data on connection socket It is possible to receive a datagram from another connection on a dedicated quic-conn socket. This is due to a race condition between bind() and connect() system calls. To handle this, an explicit check is done on each datagram. If the DCID is not associated to the connection which owns the socket, the datagram is redispatched as if it arrived on the listener socket. This redispatch step was not properly done because the source address specified for the redispatch function was incorrect. Instead of using the datagram source address, we used the address of the socket quic-conn which received the datagram due to the above race condition. Fix this simply by using the address from the recvmsg() system call. The impact of this bug is minor as redispatch on connection socket should be really rare. However, when it happens it can lead to several kinds of problems, like for example a connection initialized with an incorrect peer address. It can also break the Retry token check as this relies on the peer address. In fact, Retry token check failure was the reason this bug was found. When using h2load with thousands of clients, the counter of Retry token failure was unusually high. With this patch, no failure is reported anymore for Retry. Must be backported to 2.7.	2023-05-12 14:48:30 +02:00
Aurelien DARRAGON	256d581fbd	BUG/MINOR: log: fix memory error handling in parse_logsrv() A check was missing in parse_logsrv() to make sure that malloc-dependent variable is checked for non-NULL before using it. If malloc fails, the function raises an error and stops, like it's already done at a few other places within the function. This partially fixes GH #2130. It should be backported to every stable versions.	2023-05-12 09:45:30 +02:00
Aurelien DARRAGON	d4dba38ab1	BUG/MINOR: errors: handle malloc failure in usermsgs_put() usermsgs_buf.size is set without first checking if previous malloc attempt succeeded. This could fool the buffer API into assuming that the buffer is initialized, resulting in unsafe read/writes. Guarding usermsgs_buf.size assignment with the malloc attempt result to make the buffer initialization safe against malloc failures. This partially fixes GH #2130. It should be backported up to 2.6.	2023-05-12 09:45:30 +02:00
Aurelien DARRAGON	ceb13b5ed3	MINOR: ncbuf: missing malloc checks in standalone code Some malloc results were not checked in standalone ncbuf code. As this is debug/test code, we don't need to explicitly handle memory errors, we just add some BUG_ON() to ensure that memory is properly allocated and prevent unexpected results. This partially fixes issue GH #2130. No backport needed.	2023-05-12 09:45:30 +02:00
Willy Tarreau	94df1b57ee	BUILD: debug: fix build issue on 32-bit platforms in "debug dev task" Commit `986798718` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") caused a build failure on 32-bit platforms when parsing the task's pointer. Let's use strtoul() and not strtoll(). No backport is needed, unless the commit above gets backported.	2023-05-12 04:40:06 +02:00
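A pointer typed on a command line can be parsed portably along the lines below. This is a sketch and not the "debug dev task" parser: `parse_ptr_arg()` is a hypothetical helper, and the explanation of the 32-bit failure (a 64-bit `long long` from strtoll() being cast to a 32-bit pointer) is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses <arg> as a hexadecimal address (an optional "0x" prefix is
 * accepted by strtoul() with base 16). Returns NULL on parse error.
 * On the usual ILP32 and LP64 platforms, unsigned long has the same
 * width as a pointer, so the conversion does not change size. */
static void *parse_ptr_arg(const char *arg)
{
    char *end;
    unsigned long v;

    assert(arg != NULL);
    errno = 0;
    v = strtoul(arg, &end, 16);
    if (errno != 0 || end == arg || *end != '\0')
        return NULL;
    return (void *)(uintptr_t)v;
}
```

Going through `uintptr_t` makes the integer-to-pointer conversion explicit. A value parsed with strtoll() would first need to be narrowed on a 32-bit target.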
William Lallemand	e279f595ad	MINOR: httpclient: allow to disable the DNS resolvers of the httpclient httpclient.resolvers.disabled allow to disable completely the resolvers of the httpclient, prevents the creation of the "default" resolvers section, and does not insert the http do-resolve rule in the proxies.	2023-05-11 21:25:37 +02:00
Willy Tarreau	fe0ba0e9f9	MINOR: cli: make "show fd" identify QUIC connections and listeners Now we can detect the listener associated with a QUIC listener and report a bit more info (e.g. listening port and frontend name), and provide a bit more info about connections as well, and filter on both front connections and listeners using the "l" and "f" flags.	2023-05-11 17:20:39 +02:00
Willy Tarreau	ea07715ccf	MINOR: master/cli: also implement the timed prompt on the master CLI This provides more consistency between the master and the worker. When "prompt timed" is passed on the master, the timed mode is toggled. When enabled, for a master it will show the master process' uptime, and for a worker it will show this worker's uptime. Example: master> prompt timed [0:00:00:50] master> show proc #<PID> <type> <reloads> <uptime> <version> 11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21 # workers 11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21 # old workers 11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21 # programs [0:00:00:58] master> @!11955 [0:00:01:03] 11955> @!11942 [0:00:02:17] 11942> @ [0:00:01:10] master>	2023-05-11 16:38:52 +02:00
Willy Tarreau	225555711f	MINOR: cli: add an option to display the uptime in the CLI's prompt Entering "prompt timed" toggles reporting of the process' uptime in the prompt, which will report days, hours, minutes and seconds since it was started. As discussed with Tim in issue #2145, this can be convenient to roughly estimate the time between two outputs, as well as detecting that a process failed to be reloaded for example.	2023-05-11 16:38:52 +02:00
Willy Tarreau	21d7125c92	BUG/MINOR: cli: don't complain about empty command on empty lines There's something very irritating on the CLI, when just pressing ENTER, it complains "Unknown command: ''..." and dumps all the help. This action is often done to add a bit of clearance after a dump to visually find delimiters later, but this stupid error makes it unusable. This patch addresses this by just returning on empty command instead of trying to look up a matching keyword. It will result in an empty line to mark the end of the empty command and a prompt again. It's probably not worth backporting this given that nobody seems to have complained about it yet.	2023-05-11 16:38:52 +02:00
Aurelien DARRAGON	31b23aef38	CLEANUP: acl: discard prune_acl_cond() function Thanks to previous commit, we have no more use for prune_acl_cond(), let's remove it to prevent code duplication.	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	c610095258	MINOR: tree-wide: use free_acl_cond() where relevant Now that we have free_acl_cond(cond) function that does cond prune then frees cond, replace all occurrences of this pattern: \| prune_acl_cond(cond) \| free(cond) with: \| free_acl_cond(cond)	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	cd9aff1321	CLEANUP: http_act: use http_free_redirect_rule() to clean redirect act Since redirect rules now have a dedicated cleanup function, better use it to prevent code duplication.	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	5313570605	BUG/MINOR: http_rules: fix errors paths in http_parse_redirect_rule() http_parse_redirect_rule() doesn't perform enough checks around NULL returning allocating functions. Moreover, existing error paths don't perform cleanups. This could lead to memory leaks. Adding a few checks and a cleanup path to ensure memory errors are properly handled and that no memory leaks occurs within the function (already allocated structures are freed on error path). It should partially fix GH #2130. This patch depends on ("MINOR: proxy: add http_free_redirect_rule() function") This could be backported up to 2.4. The patch is also relevant for 2.2 but "MINOR: proxy: add http_free_redirect_rule() function" would need to be adapted first. == Backport notes: -> For 2.2 only: Replace: (strcmp(args[cur_arg], "drop-query") == 0) with: (!strcmp(args[cur_arg],"drop-query")) -> For 2.2 and 2.4: Replace: "expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query', 'ignore-empty' or 'append-slash' (was '%s')", with: "expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query' or 'append-slash' (was '%s')",	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	7abc9224a6	MINOR: proxy: add http_free_redirect_rule() function Adding http_free_redirect_rule() function to free a single redirect rule since it may be required to free rules outside of free_proxy() function. This patch is required for an upcoming bugfix. [for 2.2, free_proxy function did not exist (first seen in 2.4), thus http_free_redirect_rule() needs to be deducted from haproxy.c deinit() function if the patch is required]	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	8dfc2491d2	BUG/MINOR: proxy: missing free in free_proxy for redirect rules cookie_str from struct redirect, which may be allocated through http_parse_redirect_rule() function is not properly freed on proxy cleanup within free_proxy(). This could be backported to all stable versions. [for 2.2, free_proxy() did not exist so the fix needs to be performed directly in deinit() function from haproxy.c]	2023-05-11 15:37:04 +02:00
Christopher Faulet	7542fb43d6	MINOR: stconn: Add a cross-reference between SE descriptor A xref is added between the endpoint descriptors. It is created when the server endpoint is attached to the SC and it is destroyed when an endpoint is detached. This xref is not used for now. But it will be useful to retrieve info about an endpoint for the opposite side. It is also the warranty there is still a endpoint attached on the other side.	2023-05-11 15:37:04 +02:00
Christopher Faulet	efebff35bb	BUG/MEDIUM: mux-fcgi: Don't request more room if mux is waiting for more data A mux must never report it is waiting for room in the channel buffer if this buffer is empty. Because there is nothing the application layer can do to unblock the situation. Indeed, when this happens, it means the mux is waiting for data to progress. It typically happens when all headers are not received. In the FCGI mux, if some data remain in the RX buffer but the channel buffer is empty, it does no longer report it is waiting for room. This patch should fix the issue #2150. It must be backported as far as 2.6.	2023-05-11 15:37:04 +02:00
Christopher Faulet	a272c39330	BUG/MEDIUM: mux-fcgi: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR When end-of-stream is reported by a FCGI stream, we must take care to also report an error if end-of-input was not reported. Indeed, it is now mandatory to set SE_FL_EOI or SE_FL_ERROR flags when SE_FL_EOS is set. It is a 2.8-specific issue. No backport needed.	2023-05-11 15:37:04 +02:00
Willy Tarreau	4cfb0019e6	MINOR: stats: report the listener's protocol along with the address in stats When "option socket-stats" is used in a frontend, its listeners have their own stats and will appear in the stats page. And when the stats page has "stats show-legends", then a tooltip appears on each such socket with ip:port and ID. The problem is that since QUIC arrived, it was not possible to distinguish the TCP listeners from the QUIC ones because no protocol indication was mentioned. Now we add a "proto" legend there with the protocol name, so we can see "tcp4" or "quic6" and figure how the socket is bound.	2023-05-11 14:52:56 +02:00
Amaury Denoyelle	5f67b17a59	MEDIUM: mux-quic: adjust transport layer error handling Following previous patch, error notification from quic_conn has been adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR on the connection instance when a fatal error is detected. Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to the local error flag and will abort most of the future processing. To ensure stream upper layer is also notified, qc_wake_some_streams() called by qc_process() will put the stream on error if this new flag is set. This should be backported up to 2.7.	2023-05-11 14:12:48 +02:00
Amaury Denoyelle	b2e31d33f5	MEDIUM: quic: streamline error notification When an error is detected at quic-conn layer, the upper MUX must be notified. Previously, this was done relying on quic_conn flag QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on connection closure. Adjust this mechanism to use an approach more similar to other transport layers in haproxy. On error, connection flags are updated with CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then notified when the error happened instead of just before the closing. To reflect this change, qc_notify_close() has been renamed qc_notify_err(). This function must now be explicitly called every time a new error condition arises on the quic_conn layer. To ensure MUX send is disabled on error, qc_send_mux() now checks CO_FL_SOCK_WR_SH. If set, the function returns an error. This should prevent the MUX from sending data on closing or draining state. To complete this patch, MUX layer must now check for CO_FL_ERROR explicitly. This will be the subject of the following commit. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	2ad41b8629	MINOR: mux-quic: simplify return path of qc_send() Remove the unnecessary err label for qc_send(). Anyway, this label cannot be used once some frames are sent because there is no cleanup part for it. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	b35e32e43b	MINOR: mux-quic: factorize send subscribing Factorize code for send subscribing on the lower layer in a dedicated function qcc_subscribe_send(). This allows to call the lower layer only if not already subscribed and print a trace in this case. This should help to understand when subscribing is really performed. In the future, this function may be extended to avoid subscribing under new conditions, such as connection already on error. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	04b2208aa0	MINOR: mux-quic: do not send STREAM frames if already subscribe Do not build STREAM frames if MUX is already subscribed for sending on lower layer. Indeed, this means that either socket currently encountered a transient error or congestion window is full. This change is an optimization which avoids allocating and releasing a series of STREAM frames for nothing under congestion. Note that nothing is done for other frames (flow-control, RESET_STREAM and STOP_SENDING). Indeed, these frames are not restricted by flow control. However, this means that they will be allocated for nothing if send is blocked on a transient error. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	2d5c3f5cd1	MINOR: mux-quic: add traces for stream wake Add traces for when an upper layer stream is woken up by the MUX. This should help to diagnose frozen stream issues. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	69670e88bd	BUG/MINOR: mux-quic: no need to subscribe for detach streams When detach is conducted by stream endpoint layer, a stream is either freed or just flagged as detached if the transfer is not yet finished. In the latter case, the stream will be finally freed via qc_purge_streams() which is called periodically. A subscribe was done on quic-conn layer if a stream cannot be freed via qc_purge_streams() as this means FIN STREAM has not yet been sent. However, this is unnecessary as either HTX EOM was not yet received and we are waiting for the upper layer, or FIN stream is still in the buffer but was not yet transmitted due to an incomplete transfer, in which case a subscribe should have already been done. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	131f2d93e1	BUG/MINOR: mux-quic: do not free frame already released by quic-conn MUX uses qc_send_mux() function to send frames list over a QUIC connection. On network congestion, the lower layer will reject some frames and it is the MUX responsibility to free them. There is another category of error which are when the sendto() fails. In this case, the lower layer will free the packet and its attached frames and the MUX should not touch them. This model was violated by MUX layer for RESET_STREAM and STOP_SENDING emission. In this case, frames were freed every time by the MUX on error. This causes a double free error which lead to a crash. Fix this by always ensuring if frames were rejected by the lower layer before freeing them on the MUX. This is done simply by checking if frame list is not empty, as RESET_STREAM and STOP_SENDING are sent individually. This bug was never reproduced in production. Thus, it is labelled as MINOR. This must be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	3fd40935d9	BUG/MINOR: mux-quic: do not prevent shutw on error Since recent modification of MUX error processing, shutw operation was skipped for a connection reported as on error. However, this can cause the stream layer to not be notified about the error. The impact of this bug is unknown but it may lead to a stream never being closed. To fix this, simply skip over send operations when connection is on error while still notifying the stream layer. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Willy Tarreau	9615102b01	MINOR: stats: report the number of times the global maxconn was reached As discussed a few times over the years, it's quite difficult to know how often we stop accepting connections because the global maxconn was reached. This is not easy to know because when we reach the limit we stop accepting but we don't know if incoming connections are pending, so it's not possible to know how many were delayed just because of this. However, an interesting equivalent metric consists in counting the number of times an accepted incoming connection resulted in the limit being reached. I.e. "we've accepted the last one for now". That doesn't imply any other one got delayed but it's a factual indicator that something might have been delayed. And by counting the number of such events, it becomes easier to know whether some limits need to be adjusted because they're reached often, or if it's exceptionally rare. The metric is reported as a counter in show info and on the stats page in the info section right next to "maxconn".	2023-05-11 13:51:31 +02:00
Willy Tarreau	3c4a297d2b	MINOR: stats: report the total number of warnings issued Now in "show info" we have a TotalWarnings field that reports the total number of warnings issued since the process started. It's also reported in the stats page next to the uptime.	2023-05-11 12:02:21 +02:00
Frédéric Lécaille	0dd4fa58e6	BUG/MINOR: quic: Buggy acknowlegments of acknowlegments function qc_treat_ack_of_ack() must remove ranges of acknowledgments from an ebtree which have been acknowledged. This is done keeping track of the largest acknowledged packet number which has been acknowledged and sent with an ack-eliciting packet. But due to the data structure of the acknowledgement ranges used to build an ACK frame, one must leave at least one range in such an ebtree which must at least contain a unique one-element range with the largest acknowledged packet number as element. This issue was revealed by @Tristan971 in GH #2140. Must be backported in 2.7 and 2.6.	2023-05-11 10:33:23 +02:00
Aurelien DARRAGON	d7d507aa8a	CLEANUP: hlua_fcn/queue: make queue:push() easier to read Adding some spaces and code comments in queue:push() function to make it easier to read.	2023-05-11 09:23:14 +02:00
Aurelien DARRAGON	c0af7cdba2	BUG/MINOR: hlua_fcn/queue: fix reference leak When pushing a lua object through lua Queue class, a new reference is created from the object so that it can be safely restored when needed. Likewise, when popping an object from lua Queue class, the object is restored at the top of the stack via its reference id. However, once the object is restored the related queue entry is removed, thus the object reference must be dropped to prevent reference leak.	2023-05-11 09:23:14 +02:00
Aurelien DARRAGON	bd8a94a759	BUG/MINOR: hlua_fcn/queue: fix broken pop_wait() queue:pop_wait() was broken during late refactor prior to merge. (Due to small modifications to ensure that pop() returns nil on empty queue instead of nothing) Because of this, pop_wait() currently behaves exactly as pop(), resulting in 100% active CPU when used in a while loop. Indeed, _hlua_queue_pop() should explicitly return 0 when the queue is empty since pop_wait logic relies on this and the pushnil should be handled directly in queue:pop() function instead. Adding some comments as well to document this.	2023-05-11 09:23:14 +02:00
Christopher Faulet	0fda8d2c8e	BUG/MEDIUM: filters: Don't deinit filters for disabled proxies during startup During the startup stage, if a proxy was disabled in config, all filters were released and removed. But it may be an issue if some info are shared between filters of the same type. Resources may be released too early. It happens with ACLs defined in SPOE configurations. Pattern expressions can be shared between filters. To fix the issue, filters for disabled proxies are no longer released during the startup stage but only when HAProxy is stopped. This commit depends on the previous one ("MINOR: spoe: Don't stop disabled proxies"). Both must be backported to all stable versions.	2023-05-11 09:22:46 +02:00
Christopher Faulet	7f4ffad46e	MINOR: spoe: Don't stop disabled proxies SPOE registers a signal handler to be able to stop SPOE applets ASAP during soft-stop. Disabled proxies must be ignored at this stage because they are not fully configured. For now, it is useless but this change is mandatory to fix a bug.	2023-05-11 09:22:46 +02:00
Christopher Faulet	16e314150a	BUILD: mjson: Fix warning about unused variables clang 15 reports unused variables in src/mjson.c: src/mjson.c:196:21: fatal error: expected ';' at end of declaration int __maybe_unused n = 0; and src/mjson.c:727:17: fatal error: variable 'n' set but not used [-Wunused-but-set-variable] int sign = 1, n = 0; An issue was created on the project, but it was not fixed for now: https://github.com/cesanta/mjson/issues/51 So for now, to fix the build issue, these variables are declared as unused. Of course, if there is any update on this library, be careful to review this patch first to be sure it is always required. This patch should fix the issue #1868. It should be backported as far as 2.4.	2023-05-11 09:22:46 +02:00
Ilya Shipitsin	83f54b9aef	CLEANUP: src/listener.c: remove redundant NULL check fixes #2031 quoting Willy Tarreau: "Originally the listeners were intended to work without a bind_conf (e.g. for FTP processing) hence these tests, but over time the bind_conf has become omnipresent"	2023-05-11 05:30:03 +02:00
Christopher Faulet	bd90a16564	MEDIUM: stream: Resync analyzers at the end of process_stream() on change At the end of process_stream(), if there was any change on request/response analyzers, we now trigger a resync. It is performed if any analyzer is added or removed. It should help to catch internal changes on a stream and possibly prevent it from being frozen. There is no reason to backport this patch. But it may be good to keep an eye on it, just in case.	2023-05-10 16:45:36 +02:00
Christopher Faulet	b1368adcc7	BUG/MEDIUM: stream: Forward shutdowns when unhandled errors are caught In process_stream(), after request and response analyzers evaluation, unhandled errors are processed, if any. In this case, depending on the situation, remaining request or response analyzers may be removed, unless the last one about end of filters. However, auto-close is not reenabled at the same time. Thus it is possible to not forward the shutdown for a side to the other one while no analyzer is there to do so or at least to make the situation evolve. In theory, it is thus possible to freeze a stream if no wakeup happens. And it seems possible because it explains a freeze we've observed. This patch could be backported to every stable version but only after a period of observation and if it may match an unexplained bug. It should not lead to any loop but, at worst, possibly to truncated messages.	2023-05-10 16:45:36 +02:00
Willy Tarreau	862588a4b5	BUG/MINOR: config: make compression work again in defaults section When commit `ead43fe4f` ("MEDIUM: compression: Make it so we can compress requests as well.") added the test for the direction flags to select the compression, it implicitly broke compression defined in defaults sections because the flags from the default proxy were not recopied, hence the compression was enabled but in no direction. No backport is needed, that's 2.8 only.	2023-05-10 16:41:21 +02:00
Frédéric Lécaille	b971696296	BUG/MINOR: quic: Possible crash when dumping version information ->others member of tp_version_information structure pointed to a buffer in the TLS stack used to parse the transport parameters. There is no guarantee that this buffer is available until the connection is released. Do not dump the available versions selected by the client anymore, but display the chosen one (selected by the client for this connection) and the negotiated one. Must be backported to 2.7 and 2.6.	2023-05-10 13:26:37 +02:00
Amaury Denoyelle	da24bcfad3	BUG/MEDIUM: mux-quic: wakeup tasklet to close on error A recent series of commits was introduced to rework error generation on QUIC MUX side. Now, all MUX/APP functions use qcc_set_error() to set the flag QC_CF_ERRL on error. Then, this flag is converted to QC_CF_ERRL_DONE with a CONNECTION_CLOSE emission by qc_send(). This has the advantage of centralizing the CONNECTION_CLOSE generation in one place and reduces the link between MUX and quic-conn layer. However, we must now ensure that every qcc_set_error() call is followed by a QUIC MUX tasklet wakeup to invoke qc_send(). This was not the case, thus when there is no active transfer, no CONNECTION_CLOSE frame is emitted and the connection remains open. To fix this, add a tasklet_wakeup() directly in qcc_set_error(). This is a brute force solution as this may be unneeded when already in the MUX tasklet context. However, it is the simplest solution as it is too tedious for the moment to list all qcc_set_error() invocations outside of the tasklet. This must be backported up to 2.7.	2023-05-09 18:42:34 +02:00
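The flag flow described in da24bcfad3 can be modelled with a small Python sketch. This is only an illustration under stated assumptions, not HAProxy code: the class `Qcc` and its attributes are hypothetical stand-ins for the QUIC MUX connection, `errl` and `errl_done` model QC_CF_ERRL and QC_CF_ERRL_DONE, and the `closes` list stands in for CONNECTION_CLOSE emission through the quic-conn layer.

```python
class Qcc:
    """Toy model of the QUIC MUX error flow (hypothetical, for illustration)."""

    def __init__(self):
        self.errl = False        # models QC_CF_ERRL
        self.errl_done = False   # models QC_CF_ERRL_DONE
        self.err_code = None
        self.woken = False       # models a pending tasklet wakeup
        self.closes = []         # CONNECTION_CLOSE frames "emitted" so far

    def set_error(self, code):
        """Models qcc_set_error(): record the error and wake the tasklet."""
        self.errl = True
        self.err_code = code
        self.woken = True        # the fix: always schedule the MUX tasklet

    def send(self):
        """Models qc_send(): convert ERRL into ERRL_DONE by emitting the close."""
        if self.errl and not self.errl_done:
            self.closes.append(self.err_code)
            self.errl_done = True

    def run_tasklet(self):
        """Models the MUX tasklet: it only runs send() if it was woken up."""
        if self.woken:
            self.woken = False
            self.send()

    def is_dead(self):
        """Models the qcc_is_dead() check on QC_CF_ERRL_DONE."""
        return self.errl_done
```

Without the wakeup in `set_error()`, `run_tasklet()` would never call `send()` on an idle connection, which is the situation the commit describes: no CONNECTION_CLOSE is emitted and the connection stays open.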
Amaury Denoyelle	58721f2192	BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE A recent series of patches was introduced to streamline error generation by QUIC MUX. However, a regression was introduced : every error generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it should be only for H3/QPACK errors. Fix this by adding an argument <app> in qcc_set_error. When false, a standard CONNECTION_CLOSE is used as error. This bug was detected by QUIC tracker with the following tests "stop_sending" and "server_flow_control" which require a CONNECTION_CLOSE frame. This must be backported up to 2.7.	2023-05-09 18:42:34 +02:00
Christopher Faulet	a236c58223	BUG/MEDIUM: stats: Require more room if buffer is almost full This was lost with commit `f4258bdf3` ("MINOR: stats: Use the applet API to write data"). When the buffer is almost full, the stats applet gives up. When this happens, the applet must require more room. Otherwise, data in the channel buffer are sent to the client but the applet is not woken up in return. It is a 2.8-specific bug, no backport needed.	2023-05-09 16:36:45 +02:00
William Lallemand	930afdf614	BUILD: ssl: buggy -Werror=dangling-pointer since gcc 13.0 GCC complains about swapping 2 list heads, one local and one global. gcc -Iinclude -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -Werror -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS -DUSE_EPOLL -DUSE_NETFILTER -DUSE_POLL -DUSE_THREAD -DUSE_BACKTRACE -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_SSL -DUSE_LUA -DUSE_ACCEPT4 -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT -DUSE_MATH -DUSE_SYSTEMD -DUSE_PRCTL -DUSE_THREAD_DUMP -DUSE_QUIC -DUSE_SHM_OPEN -DUSE_PCRE -DUSE_PCRE_JIT -I/github/home/opt/include -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.8-dev8-7d23e8d1a6db\" -DCONFIG_HAPROXY_DATE=\"2023/04/24\" -c -o src/ssl_sample.o src/ssl_sample.c In file included from include/haproxy/pool.h:29, from include/haproxy/chunk.h:31, from include/haproxy/dynbuf.h:33, from include/haproxy/channel.h:27, from include/haproxy/applet.h:29, from src/ssl_sock.c:47: src/ssl_sock.c: In function 'tlskeys_finalize_config': include/haproxy/list.h:48:88: error: storing the address of local variable 'tkr' in 'tlskeys_reference.p' [-Werror=dangling-pointer=] 48 \| #define LIST_INSERT(lh, el) ({ (el)->n = (lh)->n; (el)->n->p = (lh)->n = (el); (el)->p = (lh); (el); }) \| ~~~~~~~~^~~~~~ src/ssl_sock.c:1086:9: note: in expansion of macro 'LIST_INSERT' 1086 \| LIST_INSERT(&tkr, &tlskeys_reference); \| ^~~~~~~~~~~ compilation terminated due to -Wfatal-errors. This appears with gcc 13.0. The fix uses LIST_SPLICE() instead of inserting the head of the local list in the global list. Should fix issue #2136.	2023-05-09 14:25:10 +02:00
Christopher Faulet	d6f0557deb	BUG/MEDIUM: cache: Don't request more room than the max allowed Since a recent change on the SC API, a producer must specify the amount of free space it needs to progress when it is blocked. But, it must take care to never exceed the maximum size allowed in the buffer. Otherwise, the stream is frozen because it cannot reach the condition to unblock the producer. In this context, there is a bug in the cache applet when it fails to dump a message. It may request more space than allowed. It happens when the cached object is too big. It is a 2.8-specific bug. No backport needed.	2023-05-09 11:53:28 +02:00
Frédéric Lécaille	7a01ff7921	BUG/MINOR: quic: Wrong key update cipher context initialization for encryption As noticed by Miroslav, there was a typo in quic_tls_key_update() which led a cipher context for decryption to be initialized and used in place of a cipher context for encryption. Surprisingly, this did not prevent the key update from working. Perhaps this is due to the fact that the underlying cryptographic algorithms used by QUIC are all symmetric algorithms. Also modify incorrect traces. Must be backported in 2.6 and 2.7.	2023-05-09 11:03:26 +02:00
Frédéric Lécaille	a94612522d	CLEANUP: quic: Typo fix for quic_connection_id pool Remove an extra "n" letter. Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Frédéric Lécaille	1bc6e318f0	CLEANUP: quic: Rename several <buf> variables in quic_frame.(c\|h) Most of the functions in quic_frame.c and quic_frame.h manipulate <buf> buffer position variables which have nothing to do with struct buffer variables. Rename them to <pos>. Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Willy Tarreau	95e6c9999a	BUILD: debug: do not check the isolated_thread variable in non-threaded builds The build without thread support was broken by commit `b30ced3d8` ("BUG/MINOR: debug: fix incorrect profiling status reporting in show threads") because it accesses the isolated_thread variable that is not defined when threads are disabled. In fact both the test on harmless and this one make no sense without threads, so let's comment out the block and mark the related variables as unused. This may have to be backported to 2.7 if the commit above is.	2023-05-07 15:02:30 +02:00
Willy Tarreau	dd9f921b3a	CLEANUP: fix a few reported typos in code comments These are only the few relevant changes among those reported here: https://github.com/haproxy/haproxy/actions/runs/4856148287/jobs/8655397661	2023-05-07 07:07:44 +02:00
Willy Tarreau	615c301db4	MINOR: config: allow cpu-map to take commas in lists of ranges The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was extended in 2.4 with commit `a80823543` ("MINOR: cfgparse: support the comma separator on parse_cpu_set") to support commas between ranges. But since it was quite late in the development cycle, by then it was decided not to add a last-minute surprise and not to magically support commas in cpu-map, hence the "comma_allowed" argument. Since then we know that it was not the best choice, because the comma is silently ignored in the cpu-map syntax, causing all sorts of surprises in the field with threads running on a single node for example. In addition it's quite common to copy-paste a taskset line and put it directly into the haproxy configuration. This commit relaxes this rule and finally allows cpu-map to support commas between ranges. It simply consists in removing the comma_allowed argument in the parse_cpu_set() function. The doc was updated to reflect this.	2023-05-05 18:41:52 +02:00
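To illustrate the relaxed syntax described in 615c301db4, the sketch below parses a taskset-style list such as `0-3,8,10-11` into individual CPU numbers. It is a Python model only, not HAProxy's parser: the real work is done by the C function parse_cpu_set(), and the name `parse_cpu_list` and its error handling are assumptions made for this example.

```python
def parse_cpu_list(spec):
    """Parse a taskset-style CPU list such as "0-3,8,10-11" into a sorted list.

    Hypothetical helper for illustration; HAProxy's real parser is the C
    function parse_cpu_set() and differs in its details.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            # An empty element (e.g. "0-3,,4") is rejected rather than ignored.
            raise ValueError("empty range in %r" % spec)
        if "-" in part:
            low_s, high_s = part.split("-", 1)
            low, high = int(low_s), int(high_s)
            if low > high:
                raise ValueError("reversed range %r" % part)
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

With this model, `parse_cpu_list("0-3,8,10-11")` yields CPUs 0 to 3, 8, 10 and 11, where a parser that silently ignored the comma would have stopped after the first range.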
Amaury Denoyelle	2273af11e0	MINOR: quic: implement oneline format for "show quic" Add a new output format "oneline" for "show quic" command. This prints one connection per line with minimal information. The objective is to have an equivalent of the netstat/ss tools with just enough information to quickly find connections which are misbehaving. A legend is printed on the first line to describe the field columns starting with a dash character. This should be backported up to 2.7.	2023-05-05 18:08:37 +02:00
Amaury Denoyelle	bc1f5fed72	MINOR: quic: add format argument for "show quic" Add an extra optional argument for "show quic" to specify desired output format. Its objective is to control the verbosity per connection. For the moment, only "full" is supported, which is the already implemented output with maximum information. This should be backported up to 2.7.	2023-05-05 18:06:51 +02:00
Aurelien DARRAGON	86fb22c557	MINOR: hlua_fcn: add Queue class Adding a new lua class: Queue. This class provides a generic FIFO storage mechanism that may be shared between multiple lua contexts to easily pass data between them, as stock Lua doesn't provide easy methods for passing data between multiple coroutines. New Queue object may be obtained using core.queue() (it works like core.concat() for a concat Class) Lua documentation was updated (including some usage examples)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	40cd44f52c	MINOR: hlua: declare hlua_gethlua() function Declaring hlua_gethlua() function to make it usable from hlua_fcn.c.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	e0b16355ce	CLEANUP: hlua: hlua_register_task() may longjmp Adding __LJMP prefix to hlua_register_task() to indicate that the function may longjmp when executed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	977688bd57	MINOR: server: fix message report when IDRAIN is set and MAINT is cleared Remaining in drain mode after removing one of server admins flags leads to this message being generated: "Server name/backend is leaving forced drain but remains in drain mode." However this is not necessarily true: the server might just be leaving MAINT with the IDRAIN flag set, so the report is incorrect in this case. (FDRAIN was not set so it cannot be cleared) To prevent confusion around this message and to comply with the code comment above it: we remove the "leaving forced drain" precision to make the report suitable for multiple transitions.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	a2c5321045	BUG/MINOR: hlua: spinning loop in hlua_socket_handler() Since `3157222` ("MEDIUM: hlua/applet: Use the sedesc to report and detect end of processing"), hlua_socket_handler() might spin loop if the hlua socket is destroyed and some data was left unconsumed in the applet. Prior to the above commit, the stream was explicitly KILLED (when ctx->die == 1) so the app couldn't spinloop on unconsumed data. But since the refactor this is no longer the case. To prevent unconsumed data from waking the applet indefinitely, we consume pending data when any one of the EOS\|ERROR\|SHR\|SHW flags is set, as it is done everywhere else this check is performed in the code. Hence it was probably overlooked in the first place during the refactoring. This bug is 2.8 specific only, so no backport needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	717a38d135	MINOR: hlua: expose proxy mailers Proxy mailers, which are configured using "email-alert" directives in proxy sections from the configuration, are now being exposed directly in lua thanks to the proxy:get_mailers() method which returns a class containing the various mailers settings if email alerts are configured for the given proxy (else nil is returned). Both the class and the proxy method were marked as LEGACY since this feature relies on mailers configuration, and the goal here is to provide the last missing bits of information to lua scripts in order to make them capable of sending email alerts instead of relying on the soon-to-be deprecated mailers implementation based on checks (see src/mailers.c) Lua documentation was updated accordingly.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	5bed48fec8	MINOR: mailers/hlua: disable email sending from lua Exposing a new hlua function, available from body or init contexts, that forcefully disables the sending of email alerts even if the mailers are defined in haproxy configuration. This will help for sending email directly from lua. (prevent legacy email sending from interfering with lua)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	0bd53b2152	MINOR: hlua/event_hdl: expose SERVER_CHECK event Exposing SERVER_CHECK event through the lua API. New lua class named ServerEventCheck was added to provide additional data for SERVER_CHECK event. Lua documentation was updated accordingly.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	dcbc2d2cac	MINOR: checks/event_hdl: SERVER_CHECK event Adding a new event type: SERVER_CHECK. This event is published when a server's check state ought to be reported. (check status change or check result) SERVER_CHECK event is provided as a server event with additional data carrying relevant check's context such as check's result and health.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	948dd3ddfb	MINOR: hlua: expose SERVER_ADMIN event Exposing SERVER_ADMIN event in lua and updating the documentation.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	a163d65254	MINOR: server/event_hdl: add SERVER_ADMIN event Adding a new SERVER event in the event_hdl API. SERVER_ADMIN is implemented as an advanced server event. It is published each time the administrative state changes. (when s->cur_admin changes) SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that provides additional info related to the admin state change, but can be cast as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	c99f3adf10	MINOR: hlua: expose SERVER_STATE event Exposing SERVER_STATE event in lua and updating the documentation.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	c249f6d964	OPTIM: server: publish UP/DOWN events from STATE change Reuse cb_data from STATE event to publish UP and DOWN events. This saves some CPU time since the event is only constructed once to publish STATE, STATE+UP or STATE+DOWN depending on the state change.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	e3eea29f48	MINOR: server/event_hdl: add SERVER_STATE event Adding a new SERVER event in the event_hdl API. SERVER_STATE is implemented as an advanced server event. It is published each time the server's effective state changes. (when s->cur_state changes) SERVER_STATE data is an event_hdl_cb_data_server_state struct that provides additional info related to the server state change, but can be cast as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	306a5fc987	MINOR: server/event_hdl: publish macro helper Add a macro helper to help publish server events to global and per-server subscription lists at once since all server events support both subscription modes.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	fc84553df8	MINOR: hlua_fcn: add Proxy.get_srv_act() and Proxy.get_srv_bck() Proxy.get_srv_act: number of active servers that are eligible for LB Proxy.get_srv_bck: number of backup servers that are eligible for LB	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	fc759b4ac2	MINOR: hlua_fcn: add Server.get_pend_conn() and Server.get_cur_sess() Server.get_pend_conn: number of pending connections to the server Server.get_cur_sess: number of current sessions handled by the server Lua documentation was updated accordingly.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	3889efa8e4	MINOR: hlua_fcn: add Server.get_proxy() Server.get_proxy(): get the proxy to which the server belongs (or nil if not available)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	4be36a1337	MINOR: hlua_fcn: add Server.get_trackers() This function returns an array of servers that are currently tracking the server.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	406511a2df	MINOR: hlua_fcn: add Server.tracking() This function returns the currently tracked server, if any.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	7a03dee36f	MINOR: hlua_fcn: add Server.is_dynamic() This function returns true if the current server is dynamic, meaning that it was instantiated at runtime (ie: from the cli)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	c72051d53a	MINOR: hlua_fcn: add Server.is_backup() This function returns true if the current server is a backup server.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	862a0fe75a	MINOR: hlua_fcn: fix Server.is_draining() return type Adjusting Server.is_draining() return type from integer to boolean to comply with the documentation.	2023-05-05 16:28:32 +02:00
Christopher Faulet	e7405d4124	MEDIUM: stconn: Check room needed to unblock opposite SC when data was sent After a sending attempt, we check the opposite SC to see if it is waiting for a minimum free space to receive more data. If the condition is respected, it is unblocked. 0 is a special case where the SC is unconditionally unblocked.	2023-05-05 15:44:23 +02:00
Christopher Faulet	18b3309f38	MEDIUM: stconn: Check room needed to unblock SC on fast-forward During fast-forward, if the SC is waiting for a minimum free space to receive more data and some data was sent, it is only unblocked if the condition is respected. 0 is a special case where the SC is unconditionally unblocked.	2023-05-05 15:44:23 +02:00
Christopher Faulet	c184b11b1a	MEDIUM: applet: Check room needed to unblock opposite SC when data was consumed If the opposite SC is waiting for a minimum free space to receive more data, it is only unblocked if the condition is respected. 0 is a special case where the opposite SC is always unblocked.	2023-05-05 15:44:23 +02:00
Christopher Faulet	fab82bfd55	BUG/MEDIUM: stconn: Unblock SC from stream if there is enough room to progress At the end of process_stream(), in sc_update_rx(), the SC is now unblocked if it was waiting for room and the free space in the input buffer is large enough. This patch should fix an issue with the compression filter that can leave the channel's buffer empty while the endpoint is waiting for room to progress. Indeed, in this case, because the buffer is empty, there is no send attempt and no other way to unblock the SE. This commit depends on following commits: * MEDIUM: tree-wide: Change sc API to specify required free space to progress * MINOR: stconn: Add a field to specify the room needed by the SC to progress * MINOR: peers: Use the applet API to send message * MINOR: stats: Use the applet API to write data * MINOR: cli: Use applet API to write output message It should fix a regression introduced with the commit `341a5783b` ("BUG/MEDIUM: stconn: stop to enable/disable reads from streams via si_update_rx"). It must be backported iff the commit above is also backported. It was not backported yet and it is thus probably a good idea to not do so to avoid backporting too many changes.	2023-05-05 15:44:23 +02:00
Christopher Faulet	7b3d38a633	MEDIUM: tree-wide: Change sc API to specify required free space to progress sc_need_room() now takes the required free space to receive more data as parameter. All calls to this function are updated accordingly. For now, this value is set but not used. When we are waiting for a buffer, 0 is used. So we expect to be unblocked ASAP. However this must be reviewed because SC_FL_NEED_BUF is probably enough in this case and this flag is already set if the input buffer allocation fails.	2023-05-05 15:44:23 +02:00
Christopher Faulet	9aed1124ed	MINOR: stconn: Add a field to specify the room needed by the SC to progress When the SC is blocked because it is waiting for room in the input buffer, it will be responsible to specify the minimum free space required to progress. In this commit, we only introduce the field in the stconn structure that will be used to store this value. It is a signed value with the following meaning: * -1: The SC is waiting for room but not based on the buffer state. It will be typically used during splicing when the pipe is full. In this case, only a successful send can unblock the SC. * >= 0: The minimum free space in the input buffer to unblock the SC. 0 is a special value to specify the SC must be unblocked ASAP, by the stream, at the end of process_stream() or when output data are consumed on the opposite side.	2023-05-05 15:41:30 +02:00
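The meaning of the signed "room needed" value described in 9aed1124ed can be summarized with a short Python sketch. It is a model under assumptions, not the stconn code: the function name `should_unblock` and its parameters are hypothetical, and the `>=` comparison is one reading of "minimum free space".

```python
def should_unblock(room_needed, free_space, sent_ok=False):
    """Decide whether an SC waiting for room may be unblocked.

    Hypothetical model of the field's semantics:
      room_needed == -1: not based on the buffer state (e.g. a full pipe
                         during splicing); only a successful send unblocks.
      room_needed ==  0: unblock as soon as possible, unconditionally.
      room_needed  >  0: minimum free space required in the input buffer.
    """
    if room_needed < 0:
        return sent_ok
    if room_needed == 0:
        return True
    # Assumption: having exactly the requested space is enough to progress.
    return free_space >= room_needed
```

Under this reading, a producer that requests more room than the buffer can ever offer is never unblocked, which matches the freeze described for the cache applet in d6f0557deb.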
Christopher Faulet	7a48b72d39	MINOR: peers: Use the applet API to send message The peers applet now uses the applet API to send message instead of the channel API. This way, it does not need to take care to request more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:30 +02:00
Christopher Faulet	f4258bdf3b	MINOR: stats: Use the applet API to write data stats_putchk() is updated to use the applet API instead of the channel API to write data. To do so, the appctx is passed as parameter instead of the channel. This way, the applet does not need to take care to request more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:29 +02:00
Christopher Faulet	e8ee27b0fd	MINOR: cli: Use applet API to write output message Instead of using the channel API to write output message from the CLI applet, we use the applet API. This way, the applet does not need to take care to request more room if it fails to put its message into the channel's buffer.	2023-05-05 15:41:19 +02:00
William Lallemand	b6ae2aafde	MINOR: ssl: allow to change the signature algorithm for client authentication This commit introduces the keyword "client-sigalgs" for the bind line, which does the same as "sigalgs" but for the client authentication. "ssl-default-bind-client-sigalgs" allows to set the default parameter for all the bind lines. This patch should fix issue #2081.	2023-05-05 00:05:46 +02:00
William Lallemand	1d3c822300	MINOR: ssl: allow to change the server signature algorithm This patch introduces the "sigalgs" keyword for the bind line, which allows to configure the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-bind-sigalgs" in the default section. This patch was originally written by Bruno Henc.	2023-05-04 22:43:18 +02:00
Willy Tarreau	e69919d1ba	CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash() The function isn't used anymore since each call place performs its own loop. Let's get rid of it.	2023-05-04 19:19:04 +02:00
Willy Tarreau	009b5519e6	MINOR: debug: make "show threads" properly iterate over all threads Previously it would re-dump all threads to the same trash if the output buffer was full, which it never was since the trash is of the same size. Now it dumps one thread, copies it to the buffer and yields until it can continue. Showing 256 threads works as expected.	2023-05-04 19:15:50 +02:00
Willy Tarreau	880d1684a7	MINOR: debug: write panic dump to stderr one thread at a time Currently large setups cannot dump all their threads because they're first dumped to the trash buffer, then copied to stderr. Here we can now change this, instead we dump one thread at a time into the trash and immediately send it to stderr. We also keep a copy into a local trash chunk that's assigned to thread_dump_buffer so that a core file still contains a copy of a large number of threads, which is generally sufficient for the vast majority of situations. It was verified that dumping 256 threads now produces ~55kB of output and all of them are properly dumped.	2023-05-04 19:15:50 +02:00
Willy Tarreau	9a6ecbd590	MEDIUM: debug: simplify the thread dump mechanism The thread dump mechanism that is used by "show threads" and by the panic dump is overly complicated due to an initial misdesign. It first wakes all threads, then serializes their dumps, then releases them, while taking extreme care not to face colliding dumps. In fact this is not what we need and it reached a limit where big machines cannot dump all their threads anymore due to buffer size limitations. What is needed instead is to be able to dump one thread, and to let the requester iterate on all threads. That's what this patch does. It adds the thread_dump_buffer to the struct thread_ctx so that the requester offers the buffer to the thread that is about to be dumped. This buffer also serves as a lock. A thread at rest has a NULL, a valid pointer indicates the thread is using it, and 0x1 (NULL+1) is used by the dumped thread to tell the requester it's done. This makes sure that a given thread is dumped once at a time. In addition to this, the calling thread decides whether it accesses the thread by itself or via the debug signal handler, in order to get a backtrace. This is much saner because the calling thread is free to do whatever it wants with the buffer after each thread is dumped, and there is no dependency between threads, once they've dumped, they're free to continue (and possibly to dump for another requester if needed). Finally, when the THREAD_DUMP feature is disabled and the debug signal is not used, the requester accesses the thread by itself like before. For now we still have the buffer size limitation but it will be addressed in future patches.	2023-05-04 19:15:44 +02:00
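The three-state handshake on thread_dump_buffer described in 9a6ecbd590 can be sketched as follows. This is a single-threaded Python model written for illustration only: the real code is C, uses atomic operations and optionally the debug signal handler, and the names `ThreadCtx`, `dump_self`, `dump_all` and `DONE` are hypothetical.

```python
DONE = object()  # stands in for the 0x1 (NULL+1) "dump finished" marker


class ThreadCtx:
    """Toy per-thread context (hypothetical model of struct thread_ctx)."""

    def __init__(self, tid):
        self.tid = tid
        self.thread_dump_buffer = None  # None plays the role of NULL: at rest


def dump_self(ctx):
    """Run on behalf of the dumped thread: fill the offered buffer, then signal."""
    buf = ctx.thread_dump_buffer
    buf.append("thread %d: <state>" % ctx.tid)
    ctx.thread_dump_buffer = DONE  # tell the requester the dump is complete


def dump_all(threads):
    """Requester side: iterate over the threads, dumping one at a time."""
    out = []
    for ctx in threads:
        if ctx.thread_dump_buffer is not None:
            raise RuntimeError("thread %d is already being dumped" % ctx.tid)
        buf = []
        ctx.thread_dump_buffer = buf    # offering the buffer also acts as a lock
        dump_self(ctx)                  # in HAProxy: directly or via the signal
        if ctx.thread_dump_buffer is not DONE:
            raise RuntimeError("thread %d did not finish its dump" % ctx.tid)
        out.extend(buf)                 # the requester may now use the buffer
        ctx.thread_dump_buffer = None   # back to rest, ready for another dump
    return out
```

Because the requester consumes each buffer before moving to the next thread, the total output is no longer bounded by the size of a single buffer, which is what the follow-up commits 880d1684a7 and 009b5519e6 build on.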
Christopher Faulet	34f81d5815	BUG/MINOR: mux-h2: Also expect data when waiting for a tunnel establishment When a client H2 stream is waiting for a tunnel establishment, it must state it expects data from server. It is the second fix that should fix regressions of the commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long as request is unfinished") It is a 2.8-specific bug. No backport needed.	2023-05-04 16:58:33 +02:00
Willy Tarreau	cb01f5daa7	BUG/MINOR: debug: do not emit empty lines in thread dumps In 2.3, commit `471425f51` ("BUG/MINOR: debug: Don't dump the lua stack if it is not initialized") introduced the possibility to emit an empty line when there's no Lua info to dump. The problem is that doing this on the CLI in "show threads" marks the end of the output, and it may affect some external tools. We need to make sure that LFs are only emitted if there's something on the line and that all lines properly start with the prefix. This may be backported as far as 2.0 since the commit above was backported there.	2023-05-04 16:51:50 +02:00
Amaury Denoyelle	d4af04198b	MINOR: mux-quic: close connection asap on local error With the change for QUIC MUX local error API, the new flag QC_CF_ERRL is now checked on qc_detach(). If set, qcs instance is freed even though transfer is not finished. This should help to quickly release qcs and eventually all MUX instance resources. To further accelerate this, a specific check has been added in qc_shutw(). It is skipped if local error flag is set to prevent noisy reset stream invocation. In the same way, QUIC MUX is not rescheduled on qc_recv_buf() operation if local error flag set. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	35542ce7bf	MINOR: mux-quic: report local error on stream endpoint asap If an error is detected at the MUX layer, all remaining stream endpoints should be closed asap with error set. This is now done by checking for QC_CF_ERRL flag on qc_wake_some_streams() and qc_send_buf(). To complete this, qc_wake_some_streams() is called by qc_process() if needed. This should help to quickly release streams as soon as a new error is detected locally by the MUX or APP layer. This in turn allows the MUX instance itself to be freed. Previously, error would not have been automatically reported until the transport layer closure would occur on CONNECTION_CLOSE emission. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	51f116d65e	MINOR: mux-quic: adjust local error API When a fatal error is detected by the QUIC MUX or H3 layer, the connection should be closed with a CONNECTION_CLOSE with an error code as the reason. Previously, a direct call was used to the quic_conn layer to try to close the connection. This API was adjusted to be more flexible. Now, when an error is detected, the function qcc_set_error() is called. This sets the flag QC_CF_ERRL with the error code stored by the MUX. The connection will be closed soon so most of the operations are not conducted anymore. Connection is then finally closed during qc_send() via quic_conn layer if QC_CF_ERRL is set. This will set the flag QC_CF_ERRL_DONE which indicates that the MUX instance can be freed. This model is cleaner and brings the following improvements : - interaction with quic_conn layer for closure is centralized on a single function - CO_FL_ERROR is not set anymore. This was incorrect as this should be reserved to errors reported by the transport layer to be similar with other haproxy components. As a consequence, qcc_is_dead() has been adjusted to check for QC_CF_ERRL_DONE to release the MUX instance. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	b8901d2c86	MINOR: mux-quic: wake up after recv only if avail data When HTX content is transferred from qcs instance to upper stream endpoint, a wakeup is conducted for MUX tasklet. However, this is only necessary if demux was interrupted due to a full QCS HTX buffer. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	8d44bfaf0b	MINOR: mux-quic: add trace event for local error Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally detected error when a CONNECTION_CLOSE should be emitted. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	b737f95009	BUG/MINOR: mux-quic: prevent quic_conn error code to be overwritten When MUX performs a graceful shutdown, quic_conn error code is set to a "no error" code which depends on the application layer used. However, this may overwrite a previous error code if quic_conn layer has detected an error on its side. In practice, this behavior has not been seen on production. In fact, it may have undesirable effect only if this error code modification happens between the quic_conn error detection and the emission of the CONNECTION_CLOSE, so it should be pretty rare. However, there is still a tiny possibility it may happen. To prevent this, first check that quic_conn error code is not set before setting it. Ideally, transport layer API should be adjusted to be able to set this without fiddling with the quic_conn directly. This should be backported up to 2.6.	2023-05-04 16:36:51 +02:00
Christopher Faulet	4403cdf653	BUG/MEDIUM: mux-h2: Properly handle end of request to expect data from server The commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long as request is unfinished") introduced a regression in the H2 multiplexer. The end of the request is not systematically handled to state that an H2 stream on client side now expects data from the server. Indeed, while the client is uploading its request, the H2 stream warns it does not expect data from the server. This way, no server timeout is applied at this stage. When end of the request is detected, the H2 stream must state it now expects the server response. This enables the server timeout. However, it was only performed at one place while the end of the request can be handled at different places. First, during a zero-copy in h2_rcv_buf(). Then, when the SC is created with the full request. Because of this bug, it is possible to totally disable the server timeout for H2 streams. In h2_rcv_buf(), we now rely on h2s flags to detect the end of the request, but only when the rxbuf was emptied. It is a 2.8-specific bug. No backport needed.	2023-05-04 16:29:27 +02:00
Willy Tarreau	e5e62231d8	MINOR: debug: permit the "debug dev loop" to run under isolation Sometimes it's convenient to test the effect of tasks running under isolation, e.g. to validate the contents of the crash dumps. Let's add an optional "isolated" keyword to "debug dev loop" for this.	2023-05-04 11:50:26 +02:00
Willy Tarreau	b30ced3d88	BUG/MINOR: debug: fix incorrect profiling status reporting in show threads Thread dumps include a field "prof" for each thread that reports whether task profiling is currently active or not. It turns out that in 2.7-dev1, commit `680ed5f28` ("MINOR: task: move profiling bit to per-thread") mistakenly replaced it with a check for the current thread's bit in the thread dumps, which basically is the only place where another thread is being watched. The same mistake was done a few lines later by confusing threads_want_rdv_mask with the profiling mask. This mask disappeared in 2.7-dev2 with commit `598cf3f22` ("MAJOR: threads: change thread_isolate to support inter-group synchronization"), though instead we know the ID of the isolated thread. This commit fixes this and now reports "isolated" instead of "wantrdv". This can be backported to 2.7.	2023-05-04 11:41:33 +02:00
Willy Tarreau	8b3e39e37b	MINOR: activity: allow "show activity" to restart in the middle of a line 16kB buffers are not enough to dump 4096 threads with up to 10 bytes value on each line. By storing the column number in the applet's context, we can now restart from the last attempted column. This requires to dump all values as they are produced, but it doesn't cost that much: a 4096-thread output from a fresh process produces 300kB of output in ~8ms, or ~400us per call (19*16kB), most of which are spent in vfprintf(). Given that we don't print more than needed, it doesn't really change anything. The main caveat is that when interrupted on such large lines, there's a great possibility that the total or average on the first column doesn't match anymore the sum or average of all dumped values. In order to avoid this whenever possible (typically less than ~1500 threads), we first try to dump entire lines and only proceed one column at a time when we have to retry a failed dump. This is already the same for other stats that are dumped in an interruptible way anyway and there's little that can be done about it at this point (and not much immediately perceived benefit in doing this with extreme accuracy for >1500 threads).	2023-05-03 17:26:11 +02:00
Willy Tarreau	6ed0b9885d	MINOR: activity: allow "show activity" to restart dumping on any line When using many threads, it's difficult to see the end of "show activity" due to the numerous columns which fill the buffer. For example a dump of a 256-thread, freshly booted process yields around 15kB. Here by arranging the dump in a loop around a switch/case block where each case checks the code line number against the current dump position, we have a restartable counter for free with a granularity of the line of code, without having to maintain a matching between states and specific lines. It just requires to reset the trash buffer for each line and to try to dump it after each line. Now dumping 256 threads after a few seconds of traffic happily emits 20kB.	2023-05-03 17:24:54 +02:00
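The restartable dump described in 6ed0b9885d can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual "show activity" code: `struct dump_ctx`, `dump_some()` and the three sample entries are hypothetical names invented here, and a plain character buffer stands in for the trash chunk. The point is only that `__LINE__` inside each `case` gives a resume position for free, as long as the cases sit on consecutive source lines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical dump context: source line of the next entry to emit (0 = start). */
struct dump_ctx {
	int line;
};

/* Copies as many whole entries as fit in <room> bytes of <out>.
 * Returns 1 once every entry was emitted, 0 if it must be called again.
 */
static int dump_some(struct dump_ctx *ctx, char *out, size_t room)
{
	char tmp[64];
	size_t used = 0, len;
	int first;

	out[0] = 0;
	first = __LINE__ + 5; /* line number of the first "case" below */
	if (ctx->line < first)
		ctx->line = first;
	for (;;) {
		switch (ctx->line) {
		case __LINE__: snprintf(tmp, sizeof(tmp), "uptime: 12\n"); break;
		case __LINE__: snprintf(tmp, sizeof(tmp), "loops: 345\n"); break;
		case __LINE__: snprintf(tmp, sizeof(tmp), "wakes: 67\n"); break;
		default: return 1; /* went past the last entry: done */
		}
		len = strlen(tmp);
		if (used + len >= room)
			return 0; /* buffer full: resume from ctx->line next time */
		memcpy(out + used, tmp, len + 1);
		used += len;
		ctx->line++;
	}
}
```

A caller would invoke `dump_some()` repeatedly with the same context, flushing the output buffer between calls, until it returns 1. The sketch assumes the buffer can hold at least one entry.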
Willy Tarreau	8ee0d11cb8	MINOR: activity: iterate over all fields in a main loop for dumping Now each line of "show activity" will iterate over n+2 fields, one for the line header, one for the total, and one per thread. This will soon allow us to save the current state in a restartable way.	2023-05-03 17:24:54 +02:00
Willy Tarreau	a465b21516	MINOR: activity: show the line header inside the SHOW_VAL macro Doing so will allow us to drop the extra chunk_appendf() dedicated to the line header and simplify iteration over restartable columns.	2023-05-03 17:24:54 +02:00
Willy Tarreau	5ddf9bea09	MINOR: activity: use a single macro to iterate over all fields Instead of having SHOW_AVG() and SHOW_TOT(), let's just have SHOW_VAL() which iterates over all values.	2023-05-03 17:24:54 +02:00
Willy Tarreau	ff508f12c6	BUILD: cli: fix build on Windows due to isalnum() implemented as a macro Commit `986798718` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") broke the build on windows due to this: src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts] 940 \| caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0", \| ^~~~~~~~~~~~~ It's classical on platforms which implement ctype.h as macros instead of functions, let's cast it as uchar. No backport is needed.	2023-05-03 16:32:50 +02:00
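The cast mentioned in ff508f12c6 follows a general C rule: the `<ctype.h>` classifiers take an `int` that must be representable as `unsigned char` (or be `EOF`), so a plain `char` argument should be converted first. A minimal sketch, where `first_is_alnum()` is a hypothetical helper and not a function from src/debug.c:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first byte of <s> is alphanumeric,
 * 0 otherwise (including when <s> is NULL).
 */
static int first_is_alnum(const char *s)
{
	/* The cast avoids passing a negative value when char is signed, and
	 * silences -Wchar-subscripts where isalnum() is a table-lookup macro.
	 */
	return s && isalnum((unsigned char)*s);
}
```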
William Lallemand	117c7fde06	BUG/MINOR: ssl/sample: x509_v_err_str converter output when not found The x509_v_err_str converter now outputs the numerical value as a string when the corresponding constant name was not found. Must be backported as far as 2.7.	2023-05-03 15:19:38 +02:00
Willy Tarreau	9867987182	DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets When analyzing certain types of bugs in field, sometimes it would be nice to be able to wake up a task or tasklet to see how events progress (e.g. to detect a missing wakeup condition), or expire or kill such a task. This restricted command shows the current state of a task or tasklet and allows to manipulate it like this. However it must be used with extreme care because while it does verify that the pointers are mapped, it cannot know if they point to a real task, and performing such actions on something not a task will easily lead to a crash. In addition, performing a "kill" on a task has great chances of provoking a deferred crash due to a double free and/or another kill that is not idempotent. Use with extreme care!	2023-05-03 11:47:44 +02:00
Willy Tarreau	dd01448953	MINOR: debug: clarify "debug dev stream" help message The help message was insufficient to figure how to use it and specify the stream pointer and changes to operate.	2023-05-03 11:47:44 +02:00
Willy Tarreau	65efd33c06	BUG/MINOR: stream/cli: fix stream age calculation in "show sess" The "show sess" command displays the stream's age in synthetic form, and also makes it appear in the long version (show sess all). But that last one uses the wrong origin, it uses accept_date.tv_sec instead of accept_ts (formerly known as tv_accept). This was introduced in 1.4.2 with the long format, with commit `66dc20a17` ("[MINOR] stats socket: add show sess <id> to dump details about a session"), while the code that split the two variables was introduced in 1.3.16 with commit `b7f694f20` ("[MEDIUM] implement a monotonic internal clock"). This problem was revealed by recent change `ad5a5f677` ("MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request") that made this value report random garbage, and generally emphasized by the fact that in 2.8 the two clocks have sufficiently large an offset for such mistakes to be noticeable early. Arguably a difference between date and accept_date could also make sense, to indicate if the stream had been there for more than 49 days, but this would introduce instabilities for most sockets (including negative times) for extremely rare cases while the goal is essentially to see how much longer than a configured timeout a stream has been there. And that's what other locations (including the short form) provide. This patch could be backported but most users will never notice. In case of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.	2023-05-03 11:47:44 +02:00
William Lallemand	64a77e3ea5	MINOR: ssl: disable CRL checks with WolfSSL when no CRL file WolfSSL is enabling by default the CRL checks even if a CRL file wasn't provided. This patch resets the default X509_STORE flags so this is not checked by default.	2023-05-02 18:30:11 +02:00
Tim Duesterhus	0ababda701	BUG/MINOR: stats: fix typo in `TotalSplicedBytesOut` field name An additional `d` slipped in there. This likely should not be backported, because scripts might rely on the typoed name. Public discussion on this topic here: https://www.mail-archive.com/haproxy@formilux.org/msg43359.html	2023-05-02 11:15:49 +02:00
Amaury Denoyelle	bc0adfa334	MINOR: proxy: factorize send rate measurement Implement a new dedicated function increment_send_rate() which can be called anywhere new bytes must be accounted for global total sent.	2023-04-28 16:53:44 +02:00
Amaury Denoyelle	1bcb695a05	MINOR: quic: use real sending rate measurement Before this patch, global sending rate was measured on the QUIC lower layer just after sendto(). This meant that all QUIC frames were accounted for, including non STREAM frames and also retransmission. To have a better reflection of the application data transferred, move the incrementation into the MUX layer. This allows to account only for STREAM frames payload on their first emission. This should be backported up to 2.6.	2023-04-28 16:52:26 +02:00
Aleksandar Lazic	5529c9985e	MINOR: sample: Add bc_rtt and bc_rttvar This Patch adds fetch samples for backends round trip time.	2023-04-28 16:31:08 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. 
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
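As an illustration of why a flat 64-bit nanosecond counter simplifies the arithmetic described in 69530f59ae and 76d343d3d3, here is a small sketch. The helpers below are local stand-ins written for this example (the `ex_` prefix marks them as hypothetical); they are not HAProxy's own definitions, and the year length of 365 days is an assumption used only to check the order of magnitude of the wrap period.

```c
#include <stdint.h>

/* Local stand-in: whole seconds contained in a nanosecond count. */
static inline uint64_t ex_ns_to_sec(uint64_t ns)
{
	return ns / 1000000000ULL;
}

/* Elapsed milliseconds between two nanosecond timestamps: one linear
 * subtract, then one conversion, instead of {sec, usec} pair handling.
 */
static inline uint64_t ex_elapsed_ms(uint64_t from_ns, uint64_t to_ns)
{
	return (to_ns - from_ns) / 1000000ULL;
}

/* Whole 365-day years before a 64-bit nanosecond counter wraps. */
static inline uint64_t ex_wrap_years(void)
{
	return ex_ns_to_sec(UINT64_MAX) / (365ULL * 86400ULL);
}
```

With these definitions `ex_wrap_years()` evaluates to 584 full years, consistent with the "wraps every 585 years" estimate once rounded.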
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Willy Tarreau	e8e4712771	MINOR: checks: use a nanosecond counters instead of timeval for checks->start Now we store the check's start date as a nanosecond timestamp instead of a timeval, this will simplify the operations with "now" in the near future.	2023-04-28 16:08:08 +02:00
Willy Tarreau	b68d308aec	MINOR: activity: use nanoseconds, not timeval to compute uptime Now that we have the required functions, let's get rid of the timeval in intermediary calculations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	563efe62e9	MINOR: stats: use nanoseconds, not timeval to compute uptime Now that we have the required functions, let's get rid of the timeval in intermediary calculations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	ad5a5f6779	MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request Let's get rid of timeval in storage of internal timestamps so that they are no longer mistaken for wall clock time. These were exclusively used subtracted from each other or to/from "now" after being converted to ns, so this patch removes the tv_to_ns() conversion to use them natively. Two occurrences of tv_isge() were turned to a regular wrapping subtract.	2023-04-28 16:08:08 +02:00
Willy Tarreau	aaebcae58b	MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps Various points were collected during a request/response and were stored using timeval. Let's now switch them to nanosecond based timestamps.	2023-04-28 16:08:08 +02:00
Willy Tarreau	76d343d3d3	MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract Instead of operating on {sec, usec} now we convert both operands to ns then subtract them and convert to ms. This is a first step towards dropping timeval from these timestamps. Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at all and could be removed.	2023-04-28 16:08:08 +02:00
Willy Tarreau	7222db7b84	BUG/MINOR: stats: report the correct start date in "show info" The "show info" help for "Start_time_sec" says "Start time in seconds" so it's definitely the start date in human format, not the internal one that is solely used to compute uptime. Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"), both are split apart since the start time takes into account the offset needed to cause the early wraparound, so we must only use start_date here. No backport is needed.	2023-04-28 16:08:08 +02:00
Christopher Faulet	2ebac6a320	BUG/MEDIUM: tcpcheck: Don't eval custom expect rule on an empty buffer The commit `a664aa6a6` ("BUG/MINOR: tcpcheck: Be able to expect an empty response") introduced a regression for expect rules relying on a custom function. Indeed, there is no check on the buffer to be sure it is not empty before calling the custom function. But some of these functions expect to have data and don't perform any test on the buffer emptiness. So instead of fixing all custom functions, we just don't eval them if the buffer is empty. This patch must be backported but only if the commit above was backported first.	2023-04-28 15:01:10 +02:00
Christopher Faulet	89aeabff5b	BUG/MINOR: resolvers: Use sc_need_room() to wait more room when dumping stats It was a cut/paste typo during stream-interface to conn-stream refactoring. sc_have_room() was used instead of sc_need_room(). This patch must be backported as far as 2.6.	2023-04-28 08:51:34 +02:00
Christopher Faulet	e99c43907c	BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones It is possible to start too many applets on sporadic burst of events after an inactivity period. It is due to the way we estimate if a new applet must be created or not. It is based on a frequency counter. We compare the events processing rate against the number of events currently processed (in progress or waiting to be processed). But we should also take care of the number of idle applets. We already track the number of idle applets, but it is global and not per-thread. Thus we now also track the number of idle applets per-thread. It is not a big deal because this fills a hole in the spoe_agent structure. Thanks to this counter, we can refrain applets creation if there is enough idle applets to handle currently processed events. This patch should be backported to every stable versions.	2023-04-28 08:51:34 +02:00
Willy Tarreau	d2f61de8c2	BUG/MINOR: hlua: return wall-clock date, not internal date in core.now() That's hopefully the last one affected by this. It was a bit trickier because there's the promise in the doc that the date is monotonous, so we continue to use now-start_time as the uptime value and add it to start_date to get the current date. It was also emphasized by commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"), causing core.now() to return a date of Mar 20 on Apr 27. No backport is needed.	2023-04-27 18:44:14 +02:00
Willy Tarreau	bc3c4e85f0	BUG/MINOR: trace: show wall-clock date, not internal date in show activity Yet another case where "now" was used instead of "date" for a publicly visible date that was already incorrect and became worse after commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). No backport is needed.	2023-04-27 18:22:34 +02:00
Willy Tarreau	22b6d26c57	BUG/MINOR: calltrace: fix 'now' being used in place of 'date' Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot") we have a much clearer distinction between 'now' (the internal, drifting clock) and 'date' (the wall clock time). The calltrace code was using "now" instead of "date" since the value is displayed to humans. No backport is needed.	2023-04-27 18:14:57 +02:00
Willy Tarreau	fe1b3b8777	Revert "BUG/MINOR: clock: fix a few occurrences of 'now' being used in place of 'date'" This reverts commit `aadcfc9ea6`. The parts affecting the DeviceAtlas addon were wrong actually, the "now" variable was a local time_t in a file that's not compiled with the haproxy binary (dadwsch). Only the fix to the calltrace is correct, so better revert and fix the only one in a separate commit. No backport is needed.	2023-04-27 18:14:57 +02:00
Willy Tarreau	82bde18aa4	BUG/MINOR: activity: show wall-clock date, not internal date in show activity Another case where "now" was used instead of "date" for a publicly visible date that was already incorrect and became worse after commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). No backport is needed.	2023-04-27 14:47:50 +02:00
Willy Tarreau	a5f0e6cfc0	BUG/MINOR: spoe: use "date" not "now" in debug messages The debug messages were still emitted with a date taken from "now" instead of "date", which was not correct a long time ago but which became worse in 2.8 since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). Let's fix it. No backport is needed.	2023-04-27 11:57:53 +02:00
Willy Tarreau	aadcfc9ea6	BUG/MINOR: clock: fix a few occurrences of 'now' being used in place of 'date' Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot") we have a much clearer distinction between 'now' (the internal, drifting clock) and 'date' (the wall clock time). There were still a few places where 'now' was being used for human consumption. No backport is needed.	2023-04-26 19:21:25 +02:00
Amaury Denoyelle	7b516d3732	BUG/MINOR: quic: fix race on quic_conns list during affinity rebind Each quic_conn is attached to a global thread-local quic_conns list used for "show quic" command. During thread rebinding, a connection is detached from its local list instance and moved to its new thread list. However this operation is not thread-safe and may cause a race condition. To fix this, only remove the connection from its list inside qc_set_tid_affinity(). The connection is inserted only afterwards in qc_finalize_affinity_rebind() on the new thread instance, thus preventing a race condition. One impact of this is that a connection will be invisible during rebinding for "show quic". A connection must not transition to closing state in between these two steps or else cleanup via quic_handle_stopping() may miss it. To ensure this, this patch relies on the previous commit : commit `d6646dddcc` MINOR: quic: finalize affinity change as soon as possible This should be backported up to 2.7.	2023-04-26 17:50:22 +02:00
Amaury Denoyelle	d6646dddcc	MINOR: quic: finalize affinity change as soon as possible During accept, a quic-conn is rebound to a new thread. This process is done in two steps : * first on the original thread via qc_set_tid_affinity() * then on the newly assigned thread via qc_finalize_affinity_rebind() Most quic_conn operations (I/O tasklet, task and quic_conn FD socket read) are reactivated only after the second step. However, there is a possibility that datagrams are handled before it via quic_dgram_parse() when using listener sockets. This does not seem to cause any issue but this may cause unexpected behavior in the future. To simplify this, qc_finalize_affinity_rebind() will be called both by qc_xprt_start() and quic_dgram_parse(). Only one invocation will be performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	a57ab0fabe	MINOR: mux-quic: do not allocate Tx buf for empty STREAM frame Sometimes it may be necessary to send an empty STREAM frame to signal clean stream closure with FIN bit set. Prior to this change, a Tx buffer was allocated unconditionally even if no data is transferred. Most of the time, allocation was not performed because an older buffer was reused. But if data were already acknowledged, a new buffer is allocated. No memory leak occurs as the buffer is properly released when the empty frame acknowledge is received. But this allocation is unnecessary and it consumes a connection Tx buffer for nothing. Improve this by skipping buffer allocation if no data to transfer. qcs_build_stream_frm() is now able to deal with a NULL out argument. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	42c5b75cac	MINOR: mux-quic: do not set buffer for empty STREAM frame Previous patch fixes an issue occurring with empty STREAM frames without payload. The crash was hidden in part because buf/data fields of qf_stream were set even if no payload is referenced. This was not the true cause of the crash but to ease future debugging, a STREAM frame built with no payload now has its buf and data fields set to NULL. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	19eaf88fda	BUG/MINOR: quic: prevent buggy memcpy for empty STREAM Sometimes it may be necessary to send empty STREAM frames with only the FIN bit set. For these frames, memcpy is thus unnecessary as their payload is empty. However, we did not prevent its invocation inside quic_build_stream_frame(). Normally, memcpy invocation with length==0 is safe. However, there is an extra condition in our function to handle data wrapping. For an empty STREAM frame in the context of MUX emission, this is safe as the frame points to a valid buffer which causes the wrapping condition to be false and resulting in a memcpy with 0 length. However, in the context of retransmission, this may lead to a crash. Consider the following scenario : two STREAM frames A and B are produced, one with payload and one empty with FIN set, pointing to the same stream_desc buffer. If A is acknowledged by the peer, its buffer is released as no more data is left in it. If B needs to be resent, the wrapping condition will be messed up to a reuse of a freed buffer. Most of the times, <wrap> will be a negative number, which results in a memcpy invocation causing a buffer overflow. To fix this, simply add an extra condition to skip memcpy and wrapping check if STREAM frame length is null inside quic_build_stream_frame(). This crash is pretty rare as it relies on a lot of conditions difficult to reproduce. It seems to be the cause for the latest crashes reported under github issue #2120. In all the inspected dumps, the segfault occurred during retransmission with an empty STREAM frame being used as input. Thanks again to Tristan from Mangadex for his help and investigation on it. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
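The guard described in 19eaf88fda can be illustrated with a generic wrapping copy. This is only a sketch under stated assumptions: `copy_wrapping()` and its parameters are hypothetical and do not reproduce quic_build_stream_frame(); the ring layout (a flat area of `size` bytes read from offset `head`) is assumed for the example. The early return for a zero length means neither the wrapping computation nor memcpy() ever looks at a buffer that may already have been released.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies <len> bytes starting at offset <head> of the
 * ring <area> (capacity <size>) into <dst>, handling wrap-around.
 * Returns the number of bytes copied.
 */
static size_t copy_wrapping(char *dst, const char *area, size_t size,
                            size_t head, size_t len)
{
	size_t contig;

	if (!len)
		return 0; /* empty payload: never touch <area>, it may be stale */

	contig = size - head; /* bytes available before the end of the area */
	if (len > contig) {
		memcpy(dst, area + head, contig);
		memcpy(dst + contig, area, len - contig);
	} else {
		memcpy(dst, area + head, len);
	}
	return len;
}
```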
Amaury Denoyelle	7c5591facb	BUG/MEDIUM: mux-quic: improve streams fairness to prevent early timeout Since the following mentioned patch, a send-list mechanism was implemented to improve streams priorization on sending. commit `20f2a425ff` MAJOR: mux-quic: rework stream sending priorization This is done to prevent the same streams to always be used as first ones on emission. However there is still a flaw on the algorithm. Once put in the send-list, a stream is not removed until it has sent all of its content. When a stream transfers a large object, it will remain in the send-list during all the transfer and will soon monopolize the first place. Other streams behind won't have the opportunity to advance on their own transfers due to a Tx buffer exhaustion. This situation is especially problematic if a small timeout client is used. As some streams won't advance on their transfer for a long period of time, they will be aborted due to a stream layer timeout client causing a RESET_STREAM emission. To fix this, during sending each stream with at least some bytes transferred from its tx.buf to qc_stream_desc out buffer is put at the end of the send-list. This ensures that on the next iteration streams that cannot transfer anything will be used in priority. This patch improves significantly h2load benchmarks for large objects with several streams opened in parallel on a single connection. Without it, errors may be reported by h2load for aborted streams. For example, this improved the following scenario on a 10mbit/s link with a 10s timeout client : $ ./build/bin/h2load --npn-list h3 -t 1 -c 1 -m 30 -n 30 https://198.18.10.11:20443/?s=500k This fix may help with the github issue #2004 where chrome browser stops using QUIC after receiving RESET_STREAM frames. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	24962dd178	BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length Some HTX responses may not always contain an EOM block. For example this is the case if content-length header is missing from the HTTP server response. Stream termination is thus signaled to QUIC mux via shutw callback. However, this is interpreted unconditionally as an early close by the mux with a RESET_STREAM emission. Most of the time, QUIC clients report this as an error. To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for a qcs instance. If true, shutw will never be used to emit a RESET_STREAM. Instead, the stream will be closed properly with a FIN STREAM frame. If all data were already transferred, an empty STREAM frame is sent. This fix may help with the github issue #2004 where chrome browser stops using QUIC after receiving RESET_STREAM frames. This issue was reported by Vladimir Zakharychev. Thanks to him for his help and testing. It was also reproduced locally using httpterm with the query string "/?s=1k&b=0&C=1". This should be backported up to 2.7.	2023-04-26 17:50:09 +02:00
Frédéric Lécaille	7d23e8d1a6	CLEANUP: quic: Rename several <buf> variables into quic_sock.c Rename some variables which are not struct buffer variables. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	bb426aa5f1	CLEANUP: quic: Rename <buf> variable into qc_parse_hd_form() There is no struct buffer variable manipulated by this function. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	6ff52f9ce5	CLEANUP: quic: Rename <buf> variable into quic_packet_read_long_header() Make this function be more readable: there is no struct buffer variable passed as parameter to this function. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	81a02b59f5	CLEANUP: quic: Rename several <buf> variables at low level Make quic_stateless_reset_token_cpy(), quic_derive_cid() and quic_get_cid_tid() be more readable: there is no struct buffer variable manipulated by these functions. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	182934d80b	CLEANUP: quic: Rename quic_get_dgram_dcid() <buf> variable quic_get_dgram_dcid() does not manipulate any struct buffer variable. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	1e0f8255a1	CLEANUP: quic: Make qc_build_pkt() be more readable There is no <buf> variable passed to this function. Also rename <buf_end> to <end> to mimic other functions. Rename <beg> to <first_byte> and <end> to <last_byte>. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	3adb9e85a1	CLEANUP: quic: Rename <buf> variable for several low level functions Make quic_build_packet_long_header(), quic_build_packet_short_header() and quic_apply_header_protection() be more readable: there is no struct buffer variables used by these functions. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	bef3098d33	CLEANUP: quic: Rename <buf> variable into quic_rx_pkt_parse() Make this function be more readable: there is no struct buffer variable used by this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	7f0b1c7016	CLEANUP: quic: Rename <buf> variable into quic_padding_check() Make quic_padding_check() be more readable: there is no struct buffer variable used by this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	dad0ede28a	CLEANUP: quic: Rename <buf> variable to <token> in quic_generate_retry_token() Make quic_generate_retry_token() be more readable: there is no struct buffer variable used in this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	e66d67a1ae	CLEANUP: quic: Remove useless parameters passes to qc_purge_tx_buf() Remove the pointer to the connection passed as parameters to qc_purge_tx_buf() and other similar functions which came with qc_purge_tx_buf() implementation. They were there to track the connection during tests. Must be backported to 2.7.	2023-04-24 15:53:26 +02:00
Amaury Denoyelle	d5f03cd576	CLEANUP: quic: rename frame variables Rename all frame variables with the suffix _frm. This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:22 +02:00
Amaury Denoyelle	888c5f283a	CLEANUP: quic: rename frame types with an explicit prefix Each frame type used in quic_frame union has been renamed with the following prefix "qf_". This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:03 +02:00
Frédéric Lécaille	b73762ad78	BUG/MINOR: quic: Useless I/O handler task wakeups (draining, killing state) From the idle_timer_task(), the I/O handler must be woken up to send ack. But there is no reason to do that in draining state or killing state. In draining state this is even forbidden. Must be backported to 2.7.	2023-04-24 11:47:11 +02:00
Frédéric Lécaille	d21c628ffd	BUG/MINOR: quic: Useless probing retransmission in draining or killing state The timer task responsible of triggering probing retransmission did not inspect the state of the connection before doing its job. But there is no need to probe the peer when the connection is in draining or killing state. About the draining state, this is even forbidden. Must be backported to 2.7 and 2.6.	2023-04-24 11:46:33 +02:00
Frédéric Lécaille	c6bec2a3af	BUG/MINOR: quic: Possible leak during probing retransmissions qc_dgrams_retransmit() prepares two list of frames to be retransmitted into two datagrams. If the first datagram could not be sent, the TX buffer will be purged with the prepared packet and its frames, but this was not the case for the second list of frames. Must be backported in 2.7.	2023-04-24 11:38:28 +02:00
Frédéric Lécaille	ce0bb338c6	BUG/MINOR: quic: Possible memory leak from TX packets This bug arrived with this commit which was not sufficient: BUG/MEDIUM: quic: Missing TX buffer draining from qc_send_ppkts() Indeed, there were also remaining allocated TX packets to be released and their TX frames. Implement qc_purge_tx_buf() to do so which depends on qc_free_tx_coalesced_pkts() and qc_free_frm_list(). Must be backported to 2.7.	2023-04-24 11:38:28 +02:00
Frédéric Lécaille	e95e00e305	MINOR: quic: Move traces at proto level These traces have already been useful to debug issues. Must be backported to 2.7 and 2.6.	2023-04-24 11:38:16 +02:00
Willy Tarreau	0e875cf291	MEDIUM: listener: switch the default sharding to by-group Sharding by-group is exactly identical to by-process for a single group, and will use the same number of file descriptors for more than one group, while significantly lowering the kernel's locking overhead. Now that all special listeners (cli, peers) are properly handled, and that support for SO_REUSEPORT is detected at runtime per protocol, there should be no more reason for not switching to by-group by default. That's what this patch does. It does only this and nothing else so that it's easy to revert, should any issue be raised. Testing on an AMD EPYC 74F3 featuring 24 cores and 48 threads distributed into 8 core complexes of 3 cores each, shows that configuring 8 groups (one per CCX) is sufficient to simply double the forwarded connection rate from 112k to 214k/s, reducing kernel locking from 71 to 55%.	2023-04-23 10:18:16 +02:00
Willy Tarreau	7310164b2c	MINOR: listener: add a new global tune.listener.default-shards setting This new setting accepts "by-process", "by-group" and "by-thread" and will dictate how listeners will be sharded by default when nothing is specified. While the default remains "by-process", "by-group" should be much more efficient with many threads, while not changing anything for single-group setups.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c38499ceae	MINOR: listener: do not restrict CLI to first group anymore Now that we're able to run listeners on any set of groups, we don't need to maintain a special case about the stats socket anymore. It used to be forced to group 1 only so as to avoid startup failures in case several groups were configured, but if it's done now, it will automatically bind the needed FDs to have one per group so this is no more an issue.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f1003ea7fa	MINOR: protocol: perform a live check for SO_REUSEPORT support When testing if a protocol supports SO_REUSEPORT, we're now able to verify if the OS does really support it. While it may be supported at build time, it may possibly have been blocked in a container for example so we'd rather know what it's like.	2023-04-23 09:46:15 +02:00
Willy Tarreau	b073573c10	MINOR: sock: add a function to check for SO_REUSEPORT support at runtime The new function _sock_supports_reuseport() will be used to check if a protocol type supports SO_REUSEPORT or not. This will be useful to verify that shards can really work.	2023-04-23 09:46:15 +02:00
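A runtime probe of the kind described in the commit above could look roughly like the following. This is only a minimal sketch assuming POSIX sockets: `probe_reuseport()` is a hypothetical name and is not HAProxy's `_sock_supports_reuseport()`, whose implementation is not shown in this log. It only checks that the option can be set on a fresh socket; a stricter probe might also try to bind two sockets to the same address.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper (not HAProxy code): returns 1 if a socket of the
 * given family/type accepts SO_REUSEPORT at runtime, 0 otherwise. It also
 * returns 0 when the option is unknown at build time or when the socket
 * cannot be created at all.
 */
int probe_reuseport(int family, int type)
{
#ifdef SO_REUSEPORT
	int one = 1;
	int ok;
	int fd = socket(family, type, 0);

	if (fd < 0)
		return 0;

	/* the kernel (or a container policy) may reject the option here */
	ok = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0;
	close(fd);
	return ok;
#else
	(void)family;
	(void)type;
	return 0;
#endif
}
```

A caller would typically run such a probe once per protocol at startup, for example `probe_reuseport(AF_INET, SOCK_STREAM)`, and fall back to a single shard when it returns 0.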
Willy Tarreau	8a5e6f4cca	MINOR: protocol: add a function to check if some features are supported The new function protocol_supports_flag() checks the protocol flags to verify if some features are supported, but will support being extended to refine the tests. Let's use it to check for REUSEPORT.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c1fbdd6397	MINOR: listener: automatically adjust shards based on support for SO_REUSEPORT Now if multiple shards are explicitly requested, and the listener's protocol doesn't support SO_REUSEPORT, sharding is disabled, which will result in the socket being automatically duped if needed. A warning is emitted when this happens. If "shards by-group" or "shards by-thread" are used, these will automatically be turned down to 1 since we want this to be possible easily using -dR on the command line without having to adjust the config. For "by-thread", a diag warning will be emitted to help troubleshoot possible performance issues.	2023-04-23 09:46:15 +02:00
Willy Tarreau	785b89f551	MINOR: protocol: move the global reuseport flag to the protocols Some protocols support SO_REUSEPORT and others do not. Some have such a limitation in the kernel, and others in haproxy itself (e.g. sock_unix cannot support multiple bindings since each one will unbind the previous one). Also it's really protocol-dependent and not just family-dependent because on Linux for some time it was supported for TCP and not UDP. Let's move the definition to the protocols instead. Now it's preset in tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset. The enabled() config condition test validates IPv4 (generally sufficient), and -dR / noreuseport all protocols at once.	2023-04-23 09:46:15 +02:00
Willy Tarreau	65df7e028d	MINOR: protocol: add a flags field to store info about protocols We'll use these flags to know if some protocols are supported, and if so, with what options/extensions. Reuseport will move there for example. Two functions were added to globally set/clear a flag.	2023-04-23 09:46:15 +02:00
Willy Tarreau	a22db6567f	MEDIUM: peers: call bind_complete_thread_setup() to finish the config The listeners in peers sections were still not handling the thread groups correctly. Shards were silently ignored and if a listener was bound to more than one group, it would simply fail. Now we can call the dedicated function to resolve all this and possibly create the missing extra listeners. bind_complete_thread_setup() was adjusted to use the proxy_type_str() instead of writing "proxy" at the only place where this word was still hard-coded so that we continue to speak about peers sections when relevant.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f6a8444f55	REORG: listener: move the bind_conf's thread setup code to listener.c What used to be only two lines to apply a mask in a loop in check_config_validity() grew into a 130-line block that performs deeply listener-specific operations that do not have their place there anymore. In addition it's worth noting that the peers code still doesn't support shards nor being bound to more than one group, which is a second reason for moving that code to its own function. Nothing was changed except recreating the missing variables from the bind_conf itself (the fe only).	2023-04-23 09:46:15 +02:00
Willy Tarreau	e1a0107f9c	BUG/MINOR: config: fix NUMA topology detection on FreeBSD In 2.6-dev1, NUMA topology detection was enabled on FreeBSD with commit `f5d48f8b3` ("MEDIUM: cfgparse: numa detect topology on FreeBSD."). But it suffers from a minor bug which is that it forgets to check for the number of domains and always emits a confusing warning indicating that multiple sockets were found while it's not the case. This can be backported to 2.6.	2023-04-23 09:46:15 +02:00
Willy Tarreau	997ad155fe	BUG/MINOR: tools: check libssl and libcrypto separately The lib compatibility checks introduced in 2.8-dev6 with commit `c3b297d5a` ("MEDIUM: tools: further relax dlopen() checks too consider grouped symbols") were partially incorrect in that they check at the same time libcrypto and libssl. But if loading a library that only depends on libcrypto, the ssl-only symbols will be missing and this might present an inconsistency. This is what is observed on FreeBSD 13.1 when libcrypto is being loaded, where it sees two symbols having disappeared. The fix consists in splitting the checks for libcrypto and libssl. No backport is needed, unless the patch above finally gets backported.	2023-04-23 09:46:15 +02:00
Willy Tarreau	9f53b7b41a	BUG/MINOR: sock_inet: use SO_REUSEPORT_LB where available On FreeBSD 13.1 I noticed that thread balancing using shards was not always working. Sometimes several threads would work, but most of the time a single one was taking all the traffic. This is related to how SO_REUSEPORT works on FreeBSD since version 12, as it seems there is no guarantee that multiple sockets will receive the traffic. However there is SO_REUSEPORT_LB that is designed exactly for this, so we'd rather use it when available. This patch may possibly be backported, but nobody complained and it's not sure that many users rely on shards. So better wait for some feedback before backporting this.	2023-04-23 09:46:15 +02:00
Ilya Shipitsin	ccf8012f28	CLEANUP: assorted typo fixes in the code and comments This is the 36th iteration of typo fixes	2023-04-23 09:44:53 +02:00
Willy Tarreau	023c311d70	BUG/MINOR: cli: clarify error message about stats bind-process In 2.7-dev2, "stats bind-process" was removed by commit `94f763b5e` ("MEDIUM: config: remove deprecated "bind-process" directives from frontends") and an error message indicates that it's no more supported. However it says "stats" is not supported instead of "stats bind-process", making it a bit confusing. This should be backported to 2.7.	2023-04-23 09:40:56 +02:00
Tim Duesterhus	1307cd42d2	CLEANUP: Stop checking the pointer before calling `ring_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { ring_free(e); - } @@ expression e; @@ - if (e) { ring_free(e); - } @@ expression e; @@ - if (e) ring_free(e); @@ expression e; @@ - if (e != NULL) ring_free(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	fe83f58906	CLEANUP: Stop checking the pointer before calling `task_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { task_destroy(e); - } @@ expression e; @@ - if (e) { task_destroy(e); - } @@ expression e; @@ - if (e) task_destroy(e); @@ expression e; @@ - if (e != NULL) task_destroy(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	c18e244515	CLEANUP: Stop checking the pointer before calling `pool_free()` Changes performed with this Coccinelle patch: @@ expression e; expression p; @@ - if (e != NULL) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) pool_free(p, e); @@ expression e; expression p; @@ - if (e != NULL) pool_free(p, e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	b1ec21d259	CLEANUP: Stop checking the pointer before calling `tasklet_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { tasklet_free(e); - } @@ expression e; @@ - if (e) { tasklet_free(e); - } @@ expression e; @@ - if (e) tasklet_free(e); @@ expression e; @@ - if (e != NULL) tasklet_free(e); See GitHub Issue #2126	2023-04-23 00:28:25 +02:00
Willy Tarreau	8adffaa899	MINOR: listener: always compare the local thread as well By comparing the local thread's load with the least loaded thread's load, we can further improve the fairness and at the same time also improve locality since it allows a small ratio of connections not to be migrated. This is visible on CPU usage with long connections on very large thread counts (224) and high bandwidth (200G). The cost of checking the local thread's load remains fairly low so there's no reason not to do this. We continue to update the index if we select the local thread, because it means that the two other threads were both more loaded so we'd rather find better ones.	2023-04-21 17:41:26 +02:00
Willy Tarreau	ff18504d73	MINOR: listener: make sure to avoid ABA updates in per-thread index One limitation of the current thread index mechanism is that if the values are assigned multiple times to the same thread and the index loops, it can match again the old value, which will not prevent a competing thread from finishing its CAS and assigning traffic to a thread that's not the optimal one. The probability is low but the solution is simple enough and consists in implementing an update counter in the high bits of the index to force a mismatch in this case (assuming we don't try to cover for extremely unlikely cases where the update counter loops while the index remains equal). So let's do that. In order to improve the situation a little bit, we now set the index to a ulong so that in 32 bits we have 8 bits of counter and in 64 bits we have 40 bits.	2023-04-21 17:41:26 +02:00
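The update-counter idea from the commit above can be illustrated with C11 atomics. This is a minimal sketch under stated assumptions, not the listener code itself: `idx_update()`, `IDX_BITS` and `IDX_MASK` are hypothetical names, and the 24-bit index width is chosen only so that an `unsigned long` leaves 8 counter bits on 32-bit platforms and 40 on 64-bit ones, matching the figures quoted in the message.

```c
#include <stdatomic.h>

#define IDX_BITS 24u                       /* assumed width of the index part */
#define IDX_MASK ((1ul << IDX_BITS) - 1)   /* low bits: index, high bits: counter */

/* Hypothetical sketch: store <new_idx> in the low bits of <word> only if the
 * whole word (counter + index) still equals <seen>. The counter kept in the
 * high bits is incremented on every successful store, so a word whose index
 * has looped back to a previously seen value no longer matches a stale <seen>.
 * Returns 1 on success, 0 if another thread updated the word first.
 */
int idx_update(_Atomic unsigned long *word, unsigned long seen, unsigned long new_idx)
{
	unsigned long counter = (seen >> IDX_BITS) + 1;  /* wraps naturally */
	unsigned long next = (counter << IDX_BITS) | (new_idx & IDX_MASK);

	return atomic_compare_exchange_strong(word, &seen, next);
}
```

With a plain index and no counter, a thread holding an old snapshot could complete its CAS after the index had looped back to the same value; here that CAS fails and the caller has to reload the word and retry.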
Willy Tarreau	77e33509c8	MINOR: listener: resync with the thread index before heavy calculations During heavy accept competition, the CAS will occasionally fail and we'll have to go through all the calculation again. While the first two loops look heavy, they're almost never taken so they're quite cheap. However the rest of the operation is heavy because we have to consult connection counts and queue indexes for other threads, so better double-check if the index is still valid before continuing. Tests show that it's more efficient to retry half-way like this.	2023-04-21 17:41:26 +02:00
Willy Tarreau	b657492680	MINOR: listener: use a common thr_idx from the reference listener Instead of seeing each listener use its own thr_idx, let's use the same for all those from a shard. It should provide more accurate and smoother thread allocation.	2023-04-21 17:41:26 +02:00
Willy Tarreau	9d360604bd	MEDIUM: listener: rework thread assignment to consider all groups Till now threads were assigned in listener_accept() to other threads of the same group only, using a single group mask. Now that we have all the relevant info (array of listeners of the same shard), we can spread the thr_idx to cover all assigned groups. The thread indexes now contain the group number in their upper bits, and the indexes run over the whole list of threads, all groups included. One particular subtlety here is that switching to a thread from another group also means switching the group, hence the listener. As such, when changing the group we need to update the connection's owner to point to the listener of the same shard that is bound to the target group.	2023-04-21 17:41:26 +02:00
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded than another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
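Packing head and tail into one atomically accessed word, as the commit above describes, could be sketched as follows. The layout (tail in the upper 16 bits, head in the lower 16 bits) and the names `ring_pack()` and `ring_len()` are assumptions made for this illustration and may not match the accept queue code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: tail in the upper 16 bits, head in the lower 16 bits. */
uint32_t ring_pack(uint16_t head, uint16_t tail)
{
	return ((uint32_t)tail << 16) | head;
}

/* Returns the number of queued entries in a ring of <size> slots, where head
 * and tail are slot numbers in [0, size). A single atomic load yields both
 * values from the same instant, so the measurement is always consistent.
 */
unsigned int ring_len(_Atomic uint32_t *idx, unsigned int size)
{
	uint32_t v = atomic_load(idx);
	unsigned int head = v & 0xffff;
	unsigned int tail = v >> 16;

	return tail >= head ? tail - head : tail + size - head;
}
```

For instance, with a 1024-slot ring, head 1020 and tail 4, the tail has wrapped and `ring_len()` reports 8 pending entries.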
Willy Tarreau	09b52d1c3d	MEDIUM: config: permit to start a bind on multiple groups at once Now it's possible for a bind line to span multiple thread groups. When this happens, the first one will become the reference and will be entirely set up, and the subsequent ones will be duplicated from this reference, so that they can be registered in distinct groups. The reference is always setup and started first so it is always available when the other ones are started. The doc was updated to reflect this new possibility with its limitations and impacts, and the differences with the "shards" option.	2023-04-21 17:41:26 +02:00
Willy Tarreau	09e266e6f5	MINOR: proto: skip socket setup for duped FDs It's not strictly necessary, but it's still better to avoid setting up the same socket multiple times when it's being duplicated to a few FDs. We don't change that for inherited ones however since they may really need to be set up, so we only skip duplicated ones.	2023-04-21 17:41:26 +02:00
Willy Tarreau	0e1aaf4e78	MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP The different protocol's ->bind() function will now check the receiver's RX_F_MUST_DUP flag to decide whether to bind a fresh new listener from scratch or reuse an existing one and just duplicate it. It turns out that the existing code already supports reusing FDs since that was done as part of the FD passing and inheriting mechanism. Here it's not much different, we pass the FD of the reference receiver, it gets duplicated and becomes the new receiver's FD. These FDs are also marked RX_F_INHERITED so that they are not exported and avoid being touched directly (only the reference should be touched).	2023-04-21 17:41:26 +02:00
Willy Tarreau	aae1810b4d	MINOR: receiver: add a struct shard_info to store info about each shard In order to create multiple receivers for one multi-group shard, we'll need some more info about the shard. Here we store: - the number of groups (= number of receivers) - the number of threads (will be used for accept LB) - pointer to the reference rx (to get the FD and to find all threads) - pointers to the other members (to iterate over all threads) For now since there's only one group per shard it remains simple. The listener deletion code already takes care of removing the current member from its shards list and moving others' reference to the last one if it was their reference (so as to avoid O(n^2) updates during ordered deletes). Since the vast majority of setups will not use multi-group shards, we try to save memory usage by only allocating the shard_info when it is needed, so the principle here is that a receiver shard_info==NULL is alone and doesn't share its socket with another group. Various approaches were considered and tests show that the management of the listeners during boot makes it easier to just attach to or detach from a shard_info and automatically allocate it if it does not exist, which is what is being done here. For now the attach code is not called, but detach is already called on delete.	2023-04-21 17:41:26 +02:00
Willy Tarreau	84fe1f479b	MINOR: listener: support another thread dispatch mode: "fair" This new algorithm for rebalancing incoming connections to multiple threads is simpler and instead of considering the threads load, it will only cycle through all of them, offering a fair share of the traffic to each thread. It may be well suited for short-lived connections but is also convenient for very large thread counts where it's not always certain that the least loaded thread will always be found.	2023-04-21 17:41:26 +02:00
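The "fair" mode in the commit above cycles through threads instead of comparing their load. A minimal sketch of such a rotation is shown below; `pick_thread_fair()` is a hypothetical name and a shared atomic counter is only one possible way to implement the rotation, not necessarily the one used by the listener code.

```c
#include <stdatomic.h>

/* Hypothetical sketch: hands out thread numbers 0..nbthread-1 in turn,
 * ignoring load entirely. <nbthread> must be non-zero. When <nbthread> is
 * not a power of two, the distribution is very slightly uneven at the point
 * where the counter wraps, which is ignored here.
 */
unsigned int pick_thread_fair(_Atomic unsigned int *next, unsigned int nbthread)
{
	return atomic_fetch_add(next, 1) % nbthread;
}
```

Each call costs one atomic increment regardless of the thread count, which is the appeal of this approach for very large thread counts.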
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Willy Tarreau	77d37b07b1	MINOR: quic: support migrating the listener as well When migrating a quic_conn to another thread, we may need to also switch the listener if the thread belongs to another group. When this happens, the freshly created connection will already have the target listener, so let's just pick it from the connection and use it in qc_set_tid_affinity(). Note that it will be the caller's responsibility to guarantee this.	2023-04-21 17:41:26 +02:00
Aurelien DARRAGON	23f352f7d0	MINOR: server/event_hdl: prepare for server event data wrapper Adding the possibility to publish an event using a struct wrapper around existing SERVER events to provide additional contextual info. Using the specific struct wrapper is not required: it is supported to cast event data as a regular server event data struct so that we don't break the existing API. However, casting event data with a more explicit data type allows to fetch event-only relevant hints.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	f71e0645c1	MEDIUM: server: split srv_update_status() in two functions Considering that srv_update_status() is now synchronous again since `3ff577e1` ("MAJOR: server: make server state changes synchronous again"), and that we can easily identify if the update is from an operational or administrative context thanks to "MINOR: server: pass adm and op cause to srv_update_status()". And given that administrative and operational updates cannot be cumulated (since srv_update_status() is called synchronously and independently for admin updates and state/operational updates, and the function directly consumes the changes). We split srv_update_status() in 2 distinct parts: Either <type> is 0, meaning the update is an operational update which is handled by directly looking at cur_state and next_state to apply the proper transition. Also, the check to prevent operational state from being applied if MAINT admin flag is set is no longer needed given that the calling functions already ensure this (ie: srv_set_{running,stopping,stopped}). Or <type> is 1, meaning the update is an administrative update, where cur_admin and next_admin are evaluated to apply the proper transition and deduce the resulting server state (next_state is updated implicitly). Once this is done, both operations share a common code path in srv_update_status() to update proxy and servers stats if required. Thanks to this change, the function's behavior is much more predictable, it is not an all-in-one function anymore. Either we apply an operational change, else it is an administrative change. That's it, we cannot mix the 2 since both code paths are now properly separated.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	76e255520f	MINOR: server: pass adm and op cause to srv_update_status() Operational and administrative state change causes are not propagated through srv_update_status(), instead they are directly consumed within the function to provide additional info during the call when required. Thus, there is no valid reason for keeping adm and op causes within server struct. We are wasting space and keeping unneeded complexity. We now explicitly pass change type (operational or administrative) and associated cause to srv_update_status() so that no extra storage is needed since those values are only relevant from srv_update_status().	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	10518c0d59	CLEANUP: server: fix srv_set_{running, stopping, stopped} function comment Fixing function comments for the server state changing function since they still refer to asynchronous propagation of server state which is no longer in play. Moreover, there were some mixups between running/stopping.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	c54b98ac9a	CLEANUP: server: remove unused variables in srv_update_status() check and px local variable aliases are not very useful. Let's remove them and use s->check and s->proxy instead.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	1746b56e68	MINOR: server: change srv_op_st_chg_cause storage type This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type". While looking at current srv_op_st_chg_cause usage, it was clear that the struct needed some cleanup since some leftovers from asynchronous server state change updates were left behind and resulted in some useless code duplication, and making the whole thing harder to maintain. Two observations were made: - by tracking down srv_set_{running, stopped, stopping} usage, we can see that the <reason> argument is always a fixed statically allocated string. - check-related state change context (duration, status, code...) is not used anymore since srv_append_status() directly extracts the values from the server->check. This is pure legacy from when the state changes were applied asynchronously. To prevent code duplication, useless string copies and make the reason/cause more exportable, we store it as an enum now, and we provide srv_op_st_chg_cause() function to fetch the related description string. HEALTH and AGENT causes (check related) are now explicitly identified to make consumers like srv_append_op_chg_cause() able to fetch checks info from the server itself if they need to.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	f3b48a808e	MINOR: server: srv_append_status refacto srv_append_status() has become a swiss-knife function over time. It is used from server code and also from checks code, with various inputs and distinct code paths, making it very hard to guess the actual behavior of the function (resulting string output). To simplify the logic behind it, we're dividing it in multiple contextual functions that take simple inputs and do explicit things, making them more predictable and easier to maintain.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9b1ccd7325	MINOR: server: change adm_st_chg_cause storage type Even though it doesn't look like it at first glance, this is more like a cleanup than an actual code improvement: Given that srv->adm_st_chg_cause has been used to exclusively store static strings ever since it was implemented, we make the choice to store it as an enum instead of a fixed-size string within server struct. This will allow to save some space in server struct, and will make it more easily exportable (ie: event handlers) because of the reduced memory footprint during handling and the ability to later get the corresponding human-readable message when it's explicitly needed.	2023-04-21 14:36:45 +02:00
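Storing a cause as an enum and resolving the human-readable text only on demand, as described in the commit above, could look like the sketch below. The enum members, the strings and the name `adm_cause_str()` are all invented for this illustration; the actual identifiers and messages in HAProxy may differ.

```c
/* Hypothetical cause codes; not the real HAProxy enum. */
enum adm_cause {
	ADM_CAUSE_NONE = 0,
	ADM_CAUSE_CLI,
	ADM_CAUSE_DNS,
	ADM_CAUSE_COUNT   /* number of entries, must remain last */
};

/* Returns the static description string associated with <cause>, or
 * "unknown" when the value is out of range. Nothing is copied: the server
 * only needs to store (or pass around) the small enum value.
 */
const char *adm_cause_str(enum adm_cause cause)
{
	static const char *const descr[ADM_CAUSE_COUNT] = {
		[ADM_CAUSE_NONE] = "",
		[ADM_CAUSE_CLI]  = "changed from CLI",
		[ADM_CAUSE_DNS]  = "DNS resolution failure",
	};

	if ((unsigned int)cause >= ADM_CAUSE_COUNT)
		return "unknown";
	return descr[cause];
}
```

Compared with a fixed-size string embedded in the server struct, the enum takes a few bytes, can be handed to event handlers cheaply, and the message is looked up only when something actually needs to print it.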
Aurelien DARRAGON	85b91375bf	MINOR: server: propagate lb changes through srv_lb_propagate() Now that we have a generic srv_lb_propagate(s) function, let's use it each time we explicitly want to set the status down as well. Indeed, it is tricky to try to handle "down" case explicitly, instead we use srv_lb_propagate() which will call the proper function that will handle the new server state. This will allow some code cleanup and will prevent any logic error. This commit depends on: - "MINOR: server: propagate server state change to lb through single function"	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	8bbe643acc	MINOR: server: propagate server state change to lb through single function Use a dedicated helper function to propagate server state change to lb algorithms, since it is performed at multiple places within srv_update_status() function.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	5f80f8bbc5	MINOR: server: central update for server counters on state change Based on "BUG/MINOR: server: don't miss server stats update on server state transitions", we're also taking advantage of the new centralized logic to update down_trans server counter directly from there instead of multiple places.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9c21ff0208	BUG/MINOR: server: don't use date when restoring last_change from state file When restoring from a state file: the server "Status" reports weird values on the html stats page: "5s UP" becomes -> "? UP" after the restore This is due to a bug in srv_state_srv_update(): when restoring the states from a state file, we rely on date.tv_sec to compute the process-relative server last_change timestamp. This is wrong because everywhere else we use now.tv_sec when dealing with last_change, for instance in srv_update_status(). date (which is Wall clock time) deviates from now (monotonic time) in the long run. They should not be mixed, and given that last_change is an internal time value, we should rely on now.tv_sec instead. last_change export through "show servers state" cli is safe since we export a delta and not the raw time value in dump_servers_state(): srv_time_since_last_change = now.tv_sec - srv->last_change -- While this bug affects all stable versions, it was revealed in 2.8 thanks to `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot") This is due to the fact that "now" immediately deviates from "date", whereas in the past they had the same value when starting. Thus prior to 2.8 the bug is trickier since it could take some time for date and now to deviate sufficiently for the issue to arise, and instead of reporting absurd values that are easy to spot it could just result in last_change becoming inconsistent over time. As such, the fix should be backported to all stable versions. [for 2.2 the patch needs to be applied manually since srv_state_srv_update() was named srv_update_state() and can be found in server.c instead of server_state.c]	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9f5853fa38	BUG/MINOR: server: don't miss server stats update on server state transitions s->last_change and s->down_time updates were manually updated for each effective server state change within srv_update_status(). This is rather error-prone, and as a result there were still some state transitions that were not handled properly since at least 1.8. ie: - when transitioning from DRAIN to READY: downtime was updated (which is wrong since a server in DRAIN state should not be considered as DOWN) - when transitioning from MAINT to READY: downtime was not updated (this can be easily seen in the html stats page) To fix these all at once, and prevent similar bugs from being introduced, we centralize the server last_change and down_time stats logic at the end of srv_update_status(): If the server state changed during the call, then it means that last_change must be updated, with a special case when changing from STOPPED state which means the server was previously DOWN and thus downtime should be updated. This patch depends on: - "MINOR: server: explicitly commit state change in srv_update_status()" This could be backported to all stable versions.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	e80ddb18a8	BUG/MINOR: server: don't miss proxy stats update on server state transitions backend "down" stats logic has been duplicated multiple times in srv_update_status(), resulting in the logic now being error-prone. For example, the following bugfix was needed to compensate for a copy-paste introduced bug: `d332f139` ("BUG/MINOR: server: update last_change on maint->ready transitions too") While the above patch works great, we actually forgot to update the proxy downtime like it is done for other down->up transitions... This is simply illustrating that the current design is error-prone, it is very easy to miss something in this area. To properly update the proxy downtime stats on the maint->ready transition, to cleanup srv_update_status() and to prevent similar bugs from being introduced in the future, proxy/backend stats update are now automatically performed at the end of the server state change if needed. Thus we can remove existing updates that were performed at various places within the function, this simplifies things a bit. This patch depends on: - "MINOR: server: explicitly commit state change in srv_update_status()" This could be backported to all stable versions. Backport notes: 2.2: Replace struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state) by struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	22151c70bb	MINOR: server: explicitly commit state change in srv_update_status() As shown in `8f29829` ("BUG/MEDIUM: checks: a down server going to maint remains definitely stucked on down state."), state changes that don't result in explicit lb state change, require us to perform an explicit server state commit to make sure the next state is applied before returning from the function. This is the case for server state changes that don't trigger lb logic and only perform some logging. This is quite error prone, we could easily forget a state change combination that could result in next_state, next_admin or next_eweight not being applied. (cur_state, cur_admin and cur_eweight would be left with unexpected values) To fix this, we explicitly call srv_lb_commit_status() at the end of srv_update_status() to enforce the new values, even if they were already applied. (when a state change requires an lb state update, an implicit commit is already performed) Applying the state change multiple times is safe (since the next value always points to the current value). Backport notes: 2.2: Replace struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state) by struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9a1df02ccb	BUG/MINOR: server: incorrect report for tracking servers leaving drain Report message for tracking servers completely leaving drain is wrong: The check for "leaving drain .. via" never evaluates because the condition !(s->next_admin & SRV_ADMF_FDRAIN) is always true in the current block which is guarded by !(s->next_admin & SRV_ADMF_DRAIN). For tracking servers that leave inherited drain mode, this results in the following message being emitted: "Server x/b is UP (leaving forced drain)" Instead of: "Server x/b is UP (leaving drain) via x/a" To fix this, we check if FDRAIN is currently set, else it means that the drain status is inherited from the tracked server (IDRAIN) This regression was introduced with `64cc49cf` ("MAJOR: servers: propagate server status changes asynchronously."), thus it may be backported to all stable versions.	2023-04-21 14:36:45 +02:00
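The flag test described in the entry above can be sketched as follows. This is an illustration only: the `SK_ADMF_*` values and `leaving_drain_reason()` are hypothetical stand-ins, chosen so that the combined drain mask covers both the forced and the inherited bit as the commit message implies. The function assumes the caller already knows the server was draining before the change.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag values; the real SRV_ADMF_* constants may differ. */
#define SK_ADMF_FDRAIN 0x08u   /* drain forced on this server */
#define SK_ADMF_IDRAIN 0x10u   /* drain inherited from the tracked server */
#define SK_ADMF_DRAIN  (SK_ADMF_FDRAIN | SK_ADMF_IDRAIN)

/* cur_admin: flags before the change; next_admin: flags after it.
 * Returns NULL while the server is still draining. */
static const char *leaving_drain_reason(unsigned cur_admin, unsigned next_admin)
{
    if (next_admin & SK_ADMF_DRAIN)
        return NULL;
    /* Testing next_admin for FDRAIN here would always find it cleared,
     * so the decision has to look at the current flags instead. */
    if (cur_admin & SK_ADMF_FDRAIN)
        return "leaving forced drain";
    return "leaving drain via tracked server";
}
```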
Aurelien DARRAGON	096b383e16	MINOR: hlua/event_hdl: timestamp for events 'when' optional argument is provided to lua event handlers. It is an integer representing the number of seconds elapsed since Epoch and may be used in conjunction with lua `os.date()` function to provide a custom format string.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	e9314fb7a7	MINOR: event_hdl: provide event->when for advanced handlers For advanced async handlers only (Registered using EVENT_HDL_ASYNC_TASK() macro): event->when is provided as a struct timeval and fetched from 'date' haproxy global variable. Thanks to 'when', related event consumers will be able to timestamp events, even if they don't work in real-time or near real-time. Indeed, unlike sync or normal async handlers, advanced async handlers could purposely delay the consumption of pending events, which means that the date wouldn't be accurate if computed directly from within the handler.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	ebf58e991a	MINOR: event_hdl: dynamically allocated event data members Add the ability to provide a cleanup function for event data passed via the publishing function. One use case could be the need to provide valid pointers in the safe section of the data struct. Cleanup function will be automatically called with data (or copy of data) as argument when all handlers consumed the event, which provides an easy way to release some memory or decrement refcounts to resources that were provided through the data struct. data in itself may not be freed by the cleanup function, it is handled by the API. This would allow passing large (allocated) data blocks through the data struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA size limit. To do so, when publishing an event, where we would currently do:
    struct event_hdl_cb_data_new_family event_data;
    /* safe data, available from both sync and async contexts
     * may not use pointers to short-living resources
     */
    event_data.safe.my_custom_data = x;
    /* unsafe data, only available from sync contexts */
    event_data.unsafe.my_unsafe_data = y;
    /* once data is prepared, we can publish the event */
    event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA(&event_data));
We could do:
    struct event_hdl_cb_data_new_family event_data;
    /* safe data, available from both sync and async contexts
     * may not use pointers to short-living resources,
     * unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
     * consistency (ie: refcount)
     */
    event_data.safe.my_custom_static_data = x;
    event_data.safe.my_custom_dynamic_data = malloc(1);
    /* unsafe data, only available from sync contexts */
    event_data.unsafe.my_unsafe_data = y;
    /* once data is prepared, we can publish the event */
    event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));
With data_new_family_cleanup func which would look like this:
    void data_new_family_cleanup(const void *data)
    {
        const struct event_hdl_cb_data_new_family *event_data = data;
        /* some data members require specific cleanup once the event
         * is consumed
         */
        free(event_data->safe.my_custom_dynamic_data);
        /* don't ever free data! it is not ours */
    }
Not sure if this feature will become relevant in the future, so I prefer not to mention it in the doc for now. But given that the implementation is trivial and does not put a burden on the existing API, it's a good thing to have it there, just in case.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	a63f4903c9	MINOR: server/event_hdl: prepare for upcoming refactors This commit does nothing that ought to be mentioned, except that it adds missing comments and slightly moves some function calls out of "sensitive" code in preparation of some server code refactors.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	2f6a07dce8	MINOR: hlua/event_hdl: fix return type for hlua_event_hdl_cb_data_push_args Changing hlua_event_hdl_cb_data_push_args() return type to void since it does not return anything useful. Also changing its name to hlua_event_hdl_cb_push_args() since it does more than just pushing cb data argument (it also handles event type and mgmt). Errors caught by the function are reported as lua errors.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	55f84c7cab	MINOR: hlua/event_hdl: expose proxy_uuid variable in server events Adding proxy_uuid to ServerEvent class. proxy_uuid contains the uuid of the proxy to which the server belongs	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	3d9bf4e1a5	MINOR: hlua/event_hdl: rely on proxy_uuid instead of proxy_name for lookups Since "MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server" we may now use proxy_uuid variable to perform proxy lookups when handling a server event. It is more reliable since proxy_uuid isn't subject to any size limitation	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	d714213862	MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server Expose proxy_uuid variable in event_hdl_cb_data_server struct to overcome proxy_name fixed length limitation. proxy_uuid may be used by the handler to perform proxy lookups. This should be preferred over lookups relying on proxy_name. (proxy_name is suitable for printing / logging purposes but not for ID lookups since it has a maximum fixed length)	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	0ddf052972	CLEANUP: server: fix update_status() function comment srv_update_status() function comment says that the function "is designed to be called asynchronously". While this used to be true back then with `64cc49cf` ("MAJOR: servers: propagate server status changes asynchronously.") This is not true anymore since `3ff577e` ("MAJOR: server: make server state changes synchronous again") Fixing the comment in order to better reflect current behavior.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	88687f0980	CLEANUP: errors: fix obsolete function comments Since `9f903af5` ("MEDIUM: log: slightly refine the output format of alerts/warnings/etc"), messages generated by ha_{alert,warning,notice} don't embed date/time information anymore. Updating some old function comments that kept saying otherwise.	2023-04-21 14:36:45 +02:00
Amaury Denoyelle	a65dd3a2c8	BUG/MINOR: quic: consume Rx datagram even on error A BUG_ON crash can occur on qc_rcv_buf() if a Rx packet allocation failed. To fix this, datagrams are marked as consumed even if a fatal error occurred during parsing. For the moment, only a Rx packet allocation failure could provoke this. At this stage, it's unknown whether the datagram was partially parsed or not at all, so it's better to discard it completely. This bug was detected using -dMfail argument. This should be backported up to 2.7.	2023-04-20 14:49:32 +02:00
Amaury Denoyelle	d537ca79dc	BUG/MINOR: quic: prevent crash on qc_new_conn() failure Properly initialize el_th_ctx member first on qc_new_conn(). This prevents a segfault if release should be called later due to memory allocation failure in the function on qc_detach_th_ctx_list(). This should be backported up to 2.7.	2023-04-20 14:49:32 +02:00
Amaury Denoyelle	9bbfa72b67	BUG/MINOR: h3: fix crash on h3s alloc failure Do not emit a CONNECTION_CLOSE on h3s allocation failure. Indeed, this causes a crash as the calling function qcs_new() will also try to emit a CONNECTION_CLOSE which triggers a BUG_ON() on qcc_emit_cc(). This was reproduced using -dMfail. This should be backported up to 2.7.	2023-04-20 14:49:32 +02:00
Amaury Denoyelle	93d2ebe9f3	BUG/MINOR: mux-quic: properly handle STREAM frame alloc failure Previously, if a STREAM frame could not be allocated for emission, a crash would occur due to an ABORT_NOW() statement in _qc_send_qcs(). Replace this with proper error code handling. Each stream on which sending fails is temporarily moved from qcc::send_list to a list local to _qc_send_qcs(). Once emission has been conducted for all streams, failed streams are reinserted into qcc::send_list. This avoids looping again over failed streams in the second while loop at the end of _qc_send_qcs(). This crash was reproduced using -dMfail. This should be backported up to 2.6.	2023-04-20 14:49:32 +02:00
Amaury Denoyelle	ed820823f0	BUG/MINOR: mux-quic: fix crash with app ops install failure On MUX initialization, the application layer is set up via qcc_install_app_ops(). If this function fails, the MUX is deallocated and an error is returned. This code path causes a crash because the connection has already been registered in the mux_stopping_data::list used for stopping idle frontend conns. To fix this, insert the connection later in qc_init(), once no error can occur. The crash was seen when closing the process with SIGUSR1, as a segfault in mux_stopping_process(). This was reproduced using -dMfail. This regression was introduced by the following patch : commit `b4d119f0c7` BUG/MEDIUM: mux-quic: fix crash on H3 SETTINGS emission This should be backported up to 2.7.	2023-04-20 14:49:32 +02:00
Frédéric Lécaille	d07421331f	BUG/MINOR: quic: Wrong Retry token generation timestamp computing Again a now_ms variable value used without the ticks API. It is used to store the generation time of the Retry token to be received back from the client. Must be backported to 2.6 and 2.7.	2023-04-19 17:31:28 +02:00
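The entry above does not show the fix itself. As a generic illustration of why raw arithmetic on a wrapping millisecond counter is fragile, here is a minimal sketch of a wrap-tolerant comparison. `sk_tick_is_lt()` and `sk_token_expired()` are hypothetical helpers, not HAProxy's ticks API, and the sketch assumes a 32-bit counter on a two's-complement platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if tick a is before tick b, tolerating wrap.
 * The signed difference stays meaningful as long as the two ticks are
 * less than 2^31 ms apart. */
static int sk_tick_is_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: has a token generated at gen_ms expired at now_ms? */
static int sk_token_expired(uint32_t gen_ms, uint32_t now_ms, uint32_t max_age_ms)
{
    uint32_t deadline = gen_ms + max_age_ms;  /* may wrap around, by design */

    return !sk_tick_is_lt(now_ms, deadline);
}
```

A naive `now_ms >= gen_ms + max_age_ms` test would report a token as expired as soon as the deadline wraps past zero, even though it was generated moments earlier.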
Frédéric Lécaille	45662efb2f	BUG/MINOR: quic: Unchecked buffer length when building the token As a server, an Initial packet does not contain a token but only the token length field with zero as value. The remaining room was not checked before writing this field. Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
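The kind of check described in the entry above can be sketched as follows. `sk_put_empty_token_len()` is a hypothetical helper, not a function from the HAProxy tree. It relies only on the fact that a zero token length is encoded as a single zero byte in a QUIC variable-length integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append a zero token-length byte at *pos.
 * Returns 1 and advances *pos on success, 0 if no room is left
 * before end (one past the last writable byte). */
static int sk_put_empty_token_len(uint8_t **pos, const uint8_t *end)
{
    if (*pos >= end)
        return 0;      /* buffer full: refuse to write */
    *(*pos)++ = 0;     /* token length = 0, one-byte varint */
    return 1;
}
```

Returning a status instead of writing unconditionally lets the packet builder abort cleanly when the buffer is already full.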
Frédéric Lécaille	0ed94032b2	MINOR: quic: Do not allocate too much ack ranges Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32). Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Frédéric Lécaille	4b2627beae	BUG/MINOR: quic: Stop removing ACK ranges when building packets Since this commit: BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit. There are more chances that ack ranges may be removed from their trees when building a packet. It is preferable to impose a limit on these trees. This will be the subject of a later commit. For now, it is sufficient to stop deleting ack ranges from their trees. Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were there to do that. Make qc_frm_len() support ACK frames and call it to ensure an ACK frame may be added to a packet before building it. Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Aurelien DARRAGON	8cd620b46f	MINOR: hlua: safe coroutine.create() Overriding global coroutine.create() function in order to link the newly created subroutine with the parent hlua ctx. (hlua_gethlua() function from a subroutine will return hlua ctx from the hlua ctx on which the coroutine.create() was performed, instead of NULL) Doing so allows hlua_hook() function to support being called from subroutines created using coroutine.create() within user lua scripts. That is: the related subroutine will be immune to the forced-yield, but it will still be checked against hlua timeouts. If the subroutine fails to yield or finish before the timeout, the related lua handler will be aborted (instead of going rogue unnoticed like it would be the case prior to this commit)	2023-04-19 11:03:31 +02:00

16085 Commits