A recent review was done to rationalize ERR/EOS/EOI flags on the stream
endpoint. A common definition for the H1/H2/QUIC muxes has been written
in the following documentation:
./doc/internals/stconn-close.txt
Always set EOS along with the EOI flag to conform to this specification. EOI
is set whenever the proper stream end has been encountered: with QUIC it
corresponds to a STREAM frame with the FIN bit. At this stage, RESET_STREAM
frames are ignored by the QUIC MUX, as allowed by RFC 9000. This means we can
always set EOS at the same time as EOI.
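As an illustration, the end of stream can then be reported in one go; this is
only a hedged sketch (se_fl_set() is the stconn flag helper, the qcs->sd field
name is assumed):

    /* FIN-carrying STREAM frame fully transferred: report both flags */
    se_fl_set(qcs->sd, SE_FL_EOI | SE_FL_EOS);
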
This should be backported up to 2.7.
During the refactoring of SC/SE flags, it was stated that the SE_FL_EOS flag
should not be set without one of the SE_FL_EOI or SE_FL_ERROR flags. In fact,
this is a problem for the QUIC/H3 multiplexer. When a RST_STREAM frame is
received, it means no more data will be received from the peer. And this
happens before the end of the message (RST_STREAM frames received after the
end of the message are ignored). At this stage, it is a problem to report an
error because, from the QUIC point of view, it is valid. Data may still be
sent to the peer. If an error is reported, it will stop the data sending
too.
Along the same lines, the H1 multiplexer reports an error when the message is
truncated because of a read0. But only an EOS flag should be reported in
this case, not an error. Fundamentally, it is important to distinguish
errors from read shutdowns because some cases are valid. For instance, an H1
client can choose to stop uploading data if it has received the server's
response.
So, relax the tests on SE flags by removing the BUG_ON_HOT() on the SE_FL_EOS
flag. For now, the abort will be handled in the HTTP analyzers.
Output of the 'show quic' CLI command in oneline mode was not correctly
aligned. This was caused both by differing qc pointer sizes and port lengths.
Force proper alignment by using the maximum expected sizes and completing
with blanks if needed.
This should be backported up to 2.7.
qc_prep_app_pkts() is responsible for building several new packets for
sending. It can fail due to a memory allocation error. Before this patch,
the Tx buffer was released on error even if some packets were properly
generated.
With this patch, if an error happens in qc_prep_app_pkts(), we still try
to send the already built packets if the Tx buffer is not empty. The sending
loop is then interrupted and the Tx buffer is released with its data
cleared.
This should be backported up to 2.7.
It is expected that quic_packet_encrypt() and
quic_apply_header_protection() never fail as encryption is done in
place. This allows us to remove their return values.
This is useful to simplify error handling on the sending path. An error can
only be encountered during the first steps, when allocating a new packet or
copying its frame content. After a clear-text packet is successfully built,
no error is expected during encryption.
However, it's still unclear whether our assumption that in-place encryption
functions never fail holds true. As such, a WARN_ON() statement is used if an
error is detected at this stage. Currently, it's impossible to properly
handle this case without data loss as it would leave partially unencrypted
data in the send buffer. If warnings are reported, a solution will have to be
implemented.
This should be backported up to 2.7.
quic_aead_iv_build() should never fail unless we call it with buffers of
different sizes. This never happens in the code as all input buffers are
of size QUIC_TLS_IV_LEN.
Remove the return value and add a BUG_ON() to prevent future misuse.
This is especially useful to remove one error handling step on the sending
path via quic_packet_encrypt().
This should be backported up to 2.7.
Complete each useful BUG_ON statement with a comment to explain its
purpose. Also convert BUG_ON_HOT to BUG_ON as they should not have a
big impact.
This should be backported up to 2.7.
The task pointer check in debug_parse_cli_task() computes the theoretical end
address of the provided task pointer to check whether it is valid or not,
thanks to the may_access() helper function.
However, the relative ending address is calculated by adding the task size to
the 't' pointer (which is a struct task pointer), thus resulting in an
incorrect address since the compiler automatically translates 't + x' to
't + x * sizeof(*t)' internally (with sizeof(*t) != 1 here).
Solve the issue by using 'ptr' (which is the void * raw address) as the
starting address to prevent automatic pointer scaling.
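As a standalone C illustration of this scaling rule (this is just a demo of
the language behavior, not HAProxy code):

    #include <stdint.h>
    #include <stdio.h>

    struct task { char pad[160]; };        /* arbitrary size for the demo */

    int main(void)
    {
        struct task arr[161];              /* keeps the scaled pointer in bounds */
        struct task *t = arr;
        void *ptr = arr;

        /* 't + sizeof(*t)' is scaled: it advances by sizeof(*t) * sizeof(*t) bytes */
        size_t scaled = (size_t)((uintptr_t)(t + sizeof(*t)) - (uintptr_t)t);

        /* byte-wise arithmetic on the raw address gives the intended offset */
        size_t raw = (size_t)(((uintptr_t)ptr + sizeof(*t)) - (uintptr_t)ptr);

        printf("scaled: %zu bytes, raw: %zu bytes\n", scaled, raw);
        return 0;
    }
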
This was revealed by coverity, see GH #2157.
No backport is needed, unless 9867987 ("DEBUG: cli: add "debug dev task"
to show/wake/expire/kill tasks and tasklets") gets backported.
When hlua_event_runner() pauses the subscription (ie: if the consumer
can't keep up the pace), hlua_traceback() is used to get the current
lua trace (running context) to provide some info to the user.
However, as hlua_traceback() may raise an error (__LJMP is set), it is
used within a SET_SAFE_LJMP() / RESET_SAFE_LJMP() combination to ensure
lua errors are properly handled and don't result in unexpected behavior.
But the current usage of SET_SAFE_LJMP() within the function is wrong
since hlua_traceback() will run a second time (unprotected) if the
first (protected) attempt fails. This is undefined behavior and could
even lead to crashes.
Fortunately, it is very hard to trigger this code path, thus we can consider
this a minor bug.
Also use this opportunity to enhance the reported message to make
it more meaningful to the user.
This should fix GH #2159.
It is a 2.8 specific bug, no backport needed unless c84899c636
("MEDIUM: hlua/event_hdl: initial support for event handlers") gets
backported.
When the process is stopping, the server resolutions are suspended. However,
the task is still periodically woken up for nothing. If there is a huge
number of resolutions, it may lead to noticeable CPU consumption for no
reason.
To avoid this extra CPU cost, we stop scheduling the resolution tasks
during the stopping stage. Of course, this only applies to server
resolutions. Dynamic ones, via do-resolve actions, are not concerned. These
must still be triggered during the stopping stage.
Concretely, during the stopping stage, the resolvers task is no longer
scheduled if there are no running resolutions. In this case, if a do-resolve
action is evaluated, the task is woken up.
This patch should partially solve the issue #2145.
When the process is stopping, the health-checks are suspended. However, the
task is still periodically woken up for nothing. If there is a huge number
of health-checks and if they are woken up at the same time, it may lead to
noticeable CPU consumption for no reason.
To avoid this extra CPU cost, we stop scheduling the health-check tasks
when the proxy is disabled or stopped.
This patch should partially solve the issue #2145.
When the fcgi configuration is checked and fcgi rules are created, a useless
assignment to NULL is reported by Coverity. Let's remove it.
This patch should fix the coverity report #2161.
This is a better and more general solution to the problem described in
this commit:
BUG/MINOR: checks: postpone the startup of health checks by the boot time
Now we're updating the now_offset that is used to compute now_ms at the
few points where we update the ready date during boot. This ensures that
now_ms, while being stable during the whole boot process, will be correct
and will start with the boot value right after the boot is finished. As
such, the patch above is rolled back (we don't want to count the boot
time twice).
This must not be backported because it relies on the more flexible clock
architecture in 2.8.
Right now there's no way to enforce a specific value of now_ms upon
startup in order to compensate for the time it takes to load a config,
specifically when dealing with the health check startup. For this we'd
need to force the now_offset value to compensate for the last known
value of the current date. This patch exposes a function to do exactly
this.
When health checks are started at boot, now_ms could be off by the boot
time. In general it's not even noticeable, but with very large configs
taking up to one or even a few seconds to start, this can result in a
part of the servers' checks being scheduled slightly in the past. As
such, all of them will start grouped, partially defeating the purpose of
the spread-checks setting. For example, this can cause a burst of
connections for the network, or an excess of CPU usage during SSL
handshakes, possibly even causing some timeouts to expire early.
Here, in order to compensate for this, we simply add the known boot time
to the computed delay when scheduling the startup of checks. That's very
simple and particularly efficient. For example, a config with 5k servers
in 800 backends, checked every 5 seconds, that was taking 3.8 seconds to
start used to show this distribution of health checks despite
spread-checks 50:
3690 08:59:25
417 08:59:26
213 08:59:27
71 08:59:28
428 08:59:29
860 08:59:30
918 08:59:31
938 08:59:32
1124 08:59:33
904 08:59:34
647 08:59:35
890 08:59:36
973 08:59:37
856 08:59:38
893 08:59:39
154 08:59:40
Now with the fix it shows this:
470 08:59:59
929 09:00:00
896 09:00:01
937 09:00:02
854 09:00:03
827 09:00:04
906 09:00:05
863 09:00:06
913 09:00:07
873 09:00:08
162 09:00:09
This should be backported to all supported versions. It depends on
this commit:
MINOR: clock: measure the total boot time
For 2.8, where the internal clock is now totally independent from the human
one, a more generic fix will consist in simply updating now_ms to reflect
the startup time.
Just like we have the uptime in "show info", let's add the boot time.
It's trivial to collect as it's just the difference between the ready
date and the start date, and will allow users to monitor this element
in order to take action before it starts becoming problematic. Here
the boot time is reported in milliseconds, which allows observing even
sub-second anomalies in startup delays.
Some huge configs take a significant amount of time to start and this
can cause some trouble (e.g. health checks getting delayed and grouped,
process not responding to the CLI etc). For example, some configs might
start fast in certain environments and slowly in other ones just due to
the use of a wrong DNS server that delays all libc's resolutions. Let's
first start by measuring it by keeping a copy of the most recently known
ready date, once before calling check_config_validity() and then refine
it when leaving this function. A last call is finally performed just
before deciding to split between master and worker processes, and it covers
the whole boot. It's trivial to collect and even allows getting rid of a
call to clock_update_date() in check_config_validity() that was
used in the hope of better scheduling future events.
When integrating the number of warnings in "show info" in 2.8 with commit
3c4a297d2 ("MINOR: stats: report the total number of warnings issued"),
the update of the trash buffer used by the Tainted flag got displaced
lower. There's no harm for now until someone adds a new metric requiring
a call to chunk_newstr() and gets both values merged. Let's move the
call back to its proper location now.
In process_chk_conn(), some assignments to NULL are useless and are reported
by Coverity as unused values. While they are harmless, these assignments can
be removed.
This patch should fix the coverity report #2158.
When a server is transitioning from UP to DOWN, a log message is generated,
e.g.: "Server backend_name/server_name is DOWN".
However, since f71e064 ("MEDIUM: server: split srv_update_status() in two
functions"), the allocated buffer tmptrash, which is used to prepare the
log message, is not freed after it has been used, resulting in a small
memory leak each time a server goes DOWN because of an operational
change.
This is a 2.8 specific bug, no backport needed unless the above commit
gets backported.
Within the srv_update_status() subfunctions _op() and _adm(), each time
tmptrash is freed, we assign it to NULL to ensure it will not be reused.
However, within those functions it is not very useful given that tmptrash
is never checked against NULL except upon allocation through
alloc_trash_chunk(), which happens every time a new log message is
generated, sent, and then freed right away, so there are no code paths
that could lead to tmptrash being checked for reuse (tmptrash is
systematically overwritten since all log messages are independent of
each other).
This was raised by coverity, see GH #2162.
A regression was introduced with the commit cb59e0bc3 ("BUG/MINOR:
tcp-rules: Stop content rules eval on read error and end-of-input"). We
should not shorten the inspect-delay when the EOI flag is set on the SC.
The idea of the inspect-delay is to wait for a TCP rule to match. It is only
interrupted if an error occurs, on abort, or if the peer shuts down. It is
also interrupted if the buffer is full. This last case is a bit ambiguous
and debatable. It could be good to add ACLs, like "wait_complete" and
"wait_full", to do so. But for now, we only remove the test on the SC_FL_EOI
flag.
This patch must be backported to all stable versions.
This makes use of spread-checks also for the startup of the check tasks.
This provides a smoother load on startup for uneven configurations which
tend to enable only *some* servers. Below is the connection distribution
per second of the SSL checks of a config with 5k servers spread over 800
backends, with a check inter of 5 seconds:
- default:
682 08:00:50
826 08:00:51
773 08:00:52
1016 08:00:53
885 08:00:54
889 08:00:55
825 08:00:56
773 08:00:57
1016 08:00:58
884 08:00:59
888 08:01:00
491 08:01:01
- with spread-checks 50:
437 08:01:19
866 08:01:20
777 08:01:21
1023 08:01:22
1118 08:01:23
923 08:01:24
641 08:01:25
859 08:01:26
962 08:01:27
860 08:01:28
929 08:01:29
909 08:01:30
866 08:01:31
849 08:01:32
114 08:01:33
- with spread-checks 50 + this patch:
680 08:01:55
922 08:01:56
962 08:01:57
899 08:01:58
819 08:01:59
843 08:02:00
916 08:02:01
896 08:02:02
886 08:02:03
846 08:02:04
903 08:02:05
894 08:02:06
178 08:02:07
The load is much smoother from the start; this can help initial health
checks succeed when many target the same overloaded server, for example.
This could be backported as it should make border-line configs more
reliable across reloads.
When a full message is received for a stream, the MUX is responsible for
setting the EOI flag. This was done through the rcv_buf stream callback by
checking if the QCS HTX buffer contained the EOM flag.
This is not correct for HTTP messages without a body. In this case, the QCS
HTX buffer is never used. Only a local HTX buffer is used to transfer headers
when the stream endpoint is created. As such, EOI is never transmitted to the
upper layer.
If the transfer occurs without any issue, this does not seem to cause any
problem. However, if the transfer is aborted, the stream is never
released, which causes a memory leak and prevents process soft-stop.
To fix this, also check if EOM is set by the application layer during
headers conversion. If true, this is transferred through a new argument
to the qc_attach_sc() MUX function, which is responsible for setting the EOI
flag.
This issue was reproduced using h2load with hundreds of connections.
h2load is interrupted with a SIGINT, which causes streams to never be
closed on the haproxy side.
This should be backported up to 2.6.
Uninline the qc_attach_sc() function and move it to the implementation source
file. This will be useful for the next commit, to add traces in it.
This should be backported up to 2.7.
The MUX is responsible for putting EOS on the stream when the read channel is
closed. This happens if the underlying connection is closed or a RESET_STREAM
is received. FIN STREAM is ignored in this case.
For connection closure, simply check for CO_FL_SOCK_RD_SH.
For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been
introduced. It is set when a RESET_STREAM is received, unless we already
received all the data. This conforms to the QUIC RFC, which allows a
RESET_STREAM to be ignored in this case. During RESET_STREAM processing, the
input buffer is emptied so EOS can be reported right away by the recv_buf
operation.
This should be backported up to 2.7.
Add traces to render each stream transition more explicit. Also, move
ERR_PENDING to ERROR transition after other stream flags are set, as
with the MUX H2 implementation. This is purely a cosmetic change and it
should have no functional impact.
This should be backported up to 2.7.
The following patch introduced proper error management on buffer
allocation failure:
0abde9dee6
BUG/MINOR: mux-quic: properly handle buf alloc failure
However, when decoding an empty STREAM frame with just the FIN bit set, this
was not done correctly. Indeed, a goto statement is missing after the
NULL buffer check.
This was reported thanks to coverity analysis. This should fix github
issue #2163.
This must be backported up to 2.6.
Since the following patch
commit 6c501ed23b
BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc
it should be possible to check whether a Tx buf allocation failure is due to
the configured limit being exhausted or to a plain memory failure.
However, this did not work because the condition was inverted; this patch
fixes it. Indeed, if buf_avail is null, this means that the limit has been
reached. Otherwise, this is a real memory allocation failure. This caused the
flag QC_CF_CONN_FULL to not be properly used and may have caused disruption
on transfers with several streams or large data.
This was detected thanks to abnormal error traces in the QUIC MUX. Also
adjust the trace for limit exhaustion accordingly, to be more explicit.
This must be backported up to 2.6.
Fix the OpenSSL build with older OpenSSL versions by disabling the new
ssl_c_r_dn fetch.
This also disables the ssl_client_samples.vtc file for OpenSSL versions
older than 1.1.1.
Christopher found as part of the analysis of Tim's issue #1891 that commit
15a4733d5 ("BUG/MEDIUM: mux-h2: make use of http-request and keep-alive
timeouts") introduced in 2.6 incompletely addressed a timeout issue in the
H2 mux. The problem was that the http-keepalive and http-request timeouts
were not applied before it. With that commit they are now considered, but
if a GOAWAY is sent (or even attempted to be sent), then they are not used
anymore, because the way the code is arranged consists in applying
the client-fin timeout (if set) to the current date, and falling back to
the client timeout, without considering the idle_start period. This means
that a config having a "timeout http-keepalive" would still not close the
connection quickly when facing a client that periodically sends PING,
PRIORITY or whatever other frame types.
In addition, after the GOAWAY was attempted to be sent, there was no check
for pending data in the output buffer, meaning that it would be possible
to truncate some responses in configs involving a very short client-fin
timeout.
Finally the spreading of the closures during the soft-stop brought in 2.6
by commit b5d968d9b ("MEDIUM: global: Add a "close-spread-time" option to
spread soft-stop on time window") didn't consider the particular case of
an idle "pre-connect" connection, which would also live long if a browser
failed to deliver a valid request for a long time.
All of this indicates that the conditions must be reworked so as not to
have that level of exclusion between conditions, but rather stick to the
rules from the doc that are already enforced on other muxes:
- timeout client always applies if there are data pending, and is
relative to each new I/O ;
- timeout http-request applies before the first complete request and
is relative to the entry in idle state ;
- timeout http-keepalive applies between idle and the next complete
request and is relative to the entry in idle state ;
- timeout client-fin applies when in idle after a shut was sent (here
the shut is the GOAWAY). The shut may only be considered as sent if
the buffer is empty and the flags indicate that it was successfully
sent (or failed) but not if it's still waiting for some room in the
output buffer for example. This implies that this timeout may then
lower the http-keepalive/http-request ones.
This is what this patch implements. Of course the client timeout still
applies as a fallback when all the ones above are not set or when their
conditions are not met.
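For illustration, a configuration exercising these rules might look like this
(the values are arbitrary):

    defaults
        mode http
        timeout client          30s
        timeout http-request    10s
        timeout http-keep-alive 2s
        timeout client-fin      5s
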
It would seem reasonable to backport this to 2.7 first, then only after
one or two releases to 2.6.
This patch addresses #1514: it adds the ability to fetch the DN of the root
CA that was in the chain when the client certificate was verified during the
SSL handshake.
The HTTPCLIENT and the OCSP-UPDATE proxies are internal proxies; we
don't need to display logs about them stopping during the stopping of the
process.
This patch checks if a proxy has the PR_CAP_INT flag so it doesn't
display these annoying messages.
When the SC is detached from the endpoint, the xref between the endpoints is
removed. At this stage, the sedesc cannot be undefined. So we can remove the
test on it.
This should fix issue #2156. No backport needed.
In the proxy CLI analyzer, when pcli_parse_request() returns -1, the
client was shut down to prevent any problem with the master CLI.
This behavior is a little bit excessive and not handy at all in prompt
mode. For example, one could have activated multiple modes, then hit an
error which disconnects the CLI, and they would have to reconnect and
enter all the modes again.
This patch introduces the pcli_error() function, which only outputs an
error and flushes the input buffer, instead of closing everything.
When a parsing error is encountered, this function is used, and the prompt
is written again, without any disconnection.
SSL handshake errors were unable to dump the OpenSSL error string by
default; to do so it was mandatory to configure an error-log-format with
the ssl_fc_err fetch.
This patch implements the session_build_err_string() function which creates
the error log to send during session_kill_embryonic(). A special case is
made for CO_ER_SSL_HANDSHAKE, which is able to dump the error string
with ERR_error_string().
Before:
<134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure
After:
<134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure (error:0A000418:SSL routines::tlsv1 alert unknown ca)
qc_init() is used to initialize a QUIC MUX instance. On failure, the
resources are released via a series of goto statements. There is one
issue if the app_ops.init callback fails: in this case, the MUX task is not
freed.
This can cause a crash as the task is already scheduled. When the
handler runs, it will crash when trying to access the qcc instance.
To fix this, properly destroy the qcc task under the fail_install_app_ops
label.
The impact of this bug is minor as the app_ops.init callback succeeds most
of the time. However, it may fail on allocation failure due to memory
exhaustion.
This may fix github issue #2154.
This must be backported up to 2.7.
qc_stream_buf_alloc() can fail for two reasons:
* the per-connection limit of Tx buffers is reached
* allocation failure
The first case is properly handled. A flag QC_CF_CONN_FULL is set on the
connection to interrupt emission. It is cleared when a buffer becomes
available after an in-order ACK reception and the MUX tasklet is woken up.
The allocation failure was handled with the same mechanism, which in this
case is not appropriate and could lead to a connection transfer freeze.
Instead, prefer to close the connection with a QUIC internal error code.
To differentiate the two causes, the qc_stream_buf_alloc() API was changed
to return the number of available buffers to the caller.
This must be backported up to 2.6.
The total number of buffers per connection for sending is limited by a
configuration value. To ensure this, the <stream_buf_count> quic_conn field
is incremented in qc_stream_buf_alloc().
qc_stream_buf_alloc() may fail if the buffer cannot be allocated. In
this case, <stream_buf_count> should not be incremented. To fix this,
simply move the increment operation after the buffer allocation.
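A hedged sketch of the reordering (the pool name and surrounding code are
assumptions, only the principle is shown):

    /* bump the per-connection counter only once the buffer really exists */
    stream_buf = pool_alloc(pool_head_quic_stream_buf);
    if (!stream_buf)
        return NULL;
    qc->stream_buf_count++;
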
The impact of this bug is low. However, if a connection suffers from
several buffer allocation failures, it may cause <stream_buf_count>
to be incremented over the limit without being able to go back down.
This must be backported up to 2.6.
The function qc_get_ncbuf() is used to allocate an ncbuf content.
Allocation failure was handled using a plain BUG_ON().
Fix this with proper error management. This buffer is only used for
STREAM frame reception to support out-of-order offsets. When an
allocation fails, close the connection with a QUIC internal error code.
This should be backported up to 2.6.
A convenience function qc_get_buf() is implemented to centralize buffer
allocation in the MUX and H3 layers. However, allocation failure was not
handled properly, only with a BUG_ON() statement.
Replace this with proper error management. On emission, the stream is
temporarily skipped over until the next qc_send() invocation. On reception,
H3 uses this function for HTX conversion; on alloc failure, the
connection will be closed with a QUIC internal error code.
This must be backported up to 2.6.
Remove the QUIC MUX function qcs_http_handle_standalone_fin(). This function
was only used when receiving an empty STREAM frame with the FIN bit. Besides,
it was called by each application protocol, which could have different
approaches, rendering the function's purpose unclear.
Invocations of qcs_http_handle_standalone_fin() have been replaced by
explicit code in both the H3 and HTTP/0.9 modules. In the process, use
htx_set_eom() to reliably put EOM on the HTX message.
This should be backported up to 2.7, along with the previous patch which
introduced htx_set_eom().
Implement a new HTX utility function htx_set_eom(). If the HTX message
is empty, it will first add a dummy EOT block. This is a small trick
needed to ensure readers will detect the HTX buffer as not empty and
retrieve the EOM flag.
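A minimal sketch of what such a helper can look like, based on the
description above (the real implementation may differ slightly):

    /* Set the EOM flag on <htx>, first adding a dummy EOT block if the
     * message is empty so that readers see a non-empty buffer. Returns 0 on
     * allocation failure, non-zero on success.
     */
    static inline int htx_set_eom(struct htx *htx)
    {
        if (htx_is_empty(htx)) {
            if (!htx_add_endof(htx, HTX_BLK_EOT))
                return 0;
        }
        htx->flags |= HTX_FL_EOM;
        return 1;
    }
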
Replace the related H2 code with an htx_set_eom() invocation. QUIC also has
the same code, which will be replaced in the next commit.
This should be backported up to 2.7 before the related QUIC patch.
It is possible to receive a datagram from another connection on a dedicated
quic-conn socket. This is due to a race condition between the bind() and
connect() system calls.
To handle this, an explicit check is done on each datagram. If the DCID
is not associated with the connection which owns the socket, the datagram
is redispatched as if it had arrived on the listener socket.
This redispatch step was not done properly because the source address
passed to the redispatch function was incorrect. Instead of using
the datagram's source address, we used the address of the quic-conn
socket which received the datagram due to the above race condition.
Fix this simply by using the address from the recvmsg() system call.
The impact of this bug is minor as redispatch on a connection socket
should be really rare. However, when it happens it can lead to several
kinds of problems, for example a connection initialized with an
incorrect peer address. It can also break the Retry token check, as it
relies on the peer address.
In fact, a Retry token check failure was the reason this bug was found.
When using h2load with thousands of clients, the counter of Retry token
failures was unusually high. With this patch, no failure is reported
anymore for Retry.
Must be backported to 2.7.
A check was missing in parse_logsrv() to make sure that a malloc-dependent
variable is checked for non-NULL before using it.
If malloc fails, the function raises an error and stops, like it's already
done in a few other places within the function.
This partially fixes GH #2130.
It should be backported to every stable version.
usermsgs_buf.size is set without first checking if the previous malloc
attempt succeeded.
This could fool the buffer API into assuming that the buffer is
initialized, resulting in unsafe reads/writes.
Guard the usermsgs_buf.size assignment with the malloc attempt result
to make the buffer initialization safe against malloc failures.
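A hedged sketch of the idea (field names follow struct buffer, the actual
code may differ):

    /* only advertise a usable size once the area really exists */
    char *area = malloc(sz);
    if (area) {
        usermsgs_buf.area = area;
        usermsgs_buf.size = sz;
    }
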
This partially fixes GH #2130.
It should be backported up to 2.6.
Some malloc results were not checked in the standalone ncbuf code.
As this is debug/test code, we don't need to explicitly handle memory
errors; we just add some BUG_ON() checks to ensure that memory is properly
allocated and prevent unexpected results.
This partially fixes issue GH #2130.
No backport needed.
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") caused a build failure on 32-bit platforms when parsing
the task's pointer. Let's use strtoul() and not strtoll(). No backport is
needed, unless the commit above gets backported.
httpclient.resolvers.disabled allows completely disabling the resolvers
of the httpclient: it prevents the creation of the "default" resolvers
section and does not insert the http do-resolve rule in the proxies.
Now we can detect the listener associated with a QUIC connection and report
a bit more info (e.g. listening port and frontend name), and provide a bit
more info about connections as well, and filter on both front connections
and listeners using the "l" and "f" flags.
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:
master> prompt timed
[0:00:00:50] master> show proc
#<PID> <type> <reloads> <uptime> <version>
11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21
# workers
11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21
# old workers
11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21
# programs
[0:00:00:58] master> @!11955
[0:00:01:03] 11955> @!11942
[0:00:02:17] 11942> @
[0:00:01:10] master>
Entering "prompt timed" toggles reporting of the process' uptime in
the prompt, which will report days, hours, minutes and seconds since
it was started. As discussed with Tim in issue #2145, this can be
convenient to roughly estimate the time between two outputs, as well
as detecting that a process failed to be reloaded for example.
There's something very irritating on the CLI: when just pressing ENTER,
it complains "Unknown command: ''..." and dumps all the help. This
action is often done to add a bit of clearance after a dump to visually
find delimiters later, but this stupid error makes it unusable. This
patch addresses this by just returning on an empty command instead of trying
to look up a matching keyword. It will result in an empty line to mark
the end of the empty command and a prompt again.
It's probably not worth backporting this given that nobody seems to have
complained about it yet.
Now that we have the free_acl_cond(cond) function that prunes the cond and
then frees it, replace all occurrences of this pattern:
| prune_acl_cond(cond)
| free(cond)
with:
| free_acl_cond(cond)
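For reference, the wrapper essentially boils down to this (a sketch based on
the pattern above):

    void free_acl_cond(struct acl_cond *cond)
    {
        if (!cond)
            return;
        prune_acl_cond(cond);
        free(cond);
    }
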
http_parse_redirect_rule() doesn't perform enough checks around
NULL-returning allocating functions.
Moreover, existing error paths don't perform cleanups. This could lead to
memory leaks.
Add a few checks and a cleanup path to ensure memory errors are
properly handled and that no memory leaks occur within the function
(already allocated structures are freed on the error path).
It should partially fix GH #2130.
This patch depends on ("MINOR: proxy: add http_free_redirect_rule() function")
This could be backported up to 2.4. The patch is also relevant for
2.2 but "MINOR: proxy: add http_free_redirect_rule() function" would
need to be adapted first.
==
Backport notes:
-> For 2.2 only:
Replace:
(strcmp(args[cur_arg], "drop-query") == 0)
with:
(!strcmp(args[cur_arg],"drop-query"))
-> For 2.2 and 2.4:
Replace:
"expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query', 'ignore-empty' or 'append-slash' (was '%s')",
with:
"expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query' or 'append-slash' (was '%s')",
Adding an http_free_redirect_rule() function to free a single redirect rule,
since it may be required to free rules outside of the free_proxy() function.
This patch is required for an upcoming bugfix.
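A hedged sketch of such a helper (the struct redirect_rule field names and
helper calls are assumptions and may not match the actual implementation):

    void http_free_redirect_rule(struct redirect_rule *rdr)
    {
        struct logformat_node *lf, *lfb;

        free_acl_cond(rdr->cond);
        free(rdr->rdr_str);
        free(rdr->cookie_str);

        /* release the log-format nodes used for dynamic locations */
        list_for_each_entry_safe(lf, lfb, &rdr->rdr_fmt, list) {
            LIST_DELETE(&lf->list);
            release_sample_expr(lf->expr);
            free(lf->arg);
            free(lf);
        }
        free(rdr);
    }
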
[for 2.2, the free_proxy() function did not exist (first seen in 2.4), thus
http_free_redirect_rule() needs to be deduced from the haproxy.c deinit()
function if the patch is required]
cookie_str from struct redirect, which may be allocated through the
http_parse_redirect_rule() function, is not properly freed on proxy
cleanup within free_proxy().
This could be backported to all stable versions.
[for 2.2, free_proxy() did not exist so the fix needs to be performed
directly in deinit() function from haproxy.c]
An xref is added between the endpoint descriptors. It is created when the
server endpoint is attached to the SC and it is destroyed when an endpoint
is detached.
This xref is not used for now. But it will be useful to retrieve info about
the endpoint on the opposite side. It is also the guarantee that there is
still an endpoint attached on the other side.
A mux must never report that it is waiting for room in the channel buffer if
this buffer is empty, because there is nothing the application layer can do
to unblock the situation. Indeed, when this happens, it means the mux is
waiting for data to make progress. It typically happens when not all headers
have been received.
In the FCGI mux, if some data remain in the RX buffer but the channel buffer
is empty, it no longer reports that it is waiting for room.
This patch should fix the issue #2150. It must be backported as far as 2.6.
When end-of-stream is reported by an FCGI stream, we must take care to also
report an error if end-of-input was not reported. Indeed, it is now
mandatory to set the SE_FL_EOI or SE_FL_ERROR flag when SE_FL_EOS is set.
It is a 2.8-specific issue. No backport needed.
When "optioon socket-stats" is used in a frontend, its listeners have
their own stats and will appear in the stats page. And when the stats
page has "stats show-legends", then a tooltip appears on each such
socket with ip:port and ID. The problem is that since QUIC arrived, it
was not possible to distinguish the TCP listeners from the QUIC ones
because no protocol indication was mentioned. Now we add a "proto"
legend there with the protocol name, so we can see "tcp4" or "quic6"
and figure how the socket is bound.
Following the previous patch, error notification from quic_conn has been
adjusted to rely on standard connection flags, most notably CO_FL_ERROR
set on the connection instance when a fatal error is detected.
A check for CO_FL_ERROR is implemented in qc_send(). If it is set, the new
flag QC_CF_ERR_CONN will be set on the MUX instance. This flag is similar to
the local error flag and will abort most of the future processing. To
ensure the stream upper layer is also notified, qc_wake_some_streams(),
called by qc_process(), will put the stream on error if this new flag is
set.
This should be backported up to 2.7.
When an error is detected at the quic-conn layer, the upper MUX must be
notified. Previously, this was done by relying on the quic_conn flag
QUIC_FL_CONN_NOTIFY_CLOSE being set and the MUX wake callback being called
on connection closure.
Adjust this mechanism to use an approach more similar to other transport
layers in haproxy. On error, connection flags are updated with
CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then
notified when the error happens instead of just before the closing. To
reflect this change, qc_notify_close() has been renamed qc_notify_err().
This function must now be explicitly called every time a new error
condition arises on the quic_conn layer.
To ensure MUX send is disabled on error, qc_send_mux() now checks
CO_FL_SOCK_WR_SH. If set, the function returns an error. This should
prevent the MUX from sending data in closing or draining state.
To complete this patch, the MUX layer must now check for CO_FL_ERROR
explicitly. This will be the subject of the following commit.
This should be backported up to 2.7.
Remove the unnecessary err label for qc_send(). Anyway, this label
cannot be used once some frames are sent because there is no cleanup
part for it.
This should be backported up to 2.7.
Factorize the code for send subscription on the lower layer into a dedicated
function qcc_subscribe_send(). This allows calling the lower layer only
if not already subscribed, and printing a trace in this case. This should
help to understand when subscription is really performed.
In the future, this function may be extended to avoid subscribing under
new conditions, such as a connection already on error.
This should be backported up to 2.7.
Do not build STREAM frames if the MUX is already subscribed for sending on
the lower layer. Indeed, this means that either the socket currently
encountered a transient error or the congestion window is full.
This change is an optimization which prevents allocating and releasing a
series of STREAM frames for nothing under congestion.
Note that nothing is done for other frames (flow-control, RESET_STREAM
and STOP_SENDING). Indeed, these frames are not restricted by flow
control. However, this means that they will be allocated for nothing if
sending is blocked on a transient error.
This should be backported up to 2.7.
Add traces for when an upper layer stream is woken up by the MUX. This
should help to diagnose frozen stream issues.
This should be backported up to 2.7.
When detach is conducted by the stream endpoint layer, a stream is either
freed or just flagged as detached if the transfer is not yet finished.
In the latter case, the stream will finally be freed via
qc_purge_streams(), which is called periodically.
A subscription was done on the quic-conn layer if a stream could not be freed
via qc_purge_streams(), as this means the FIN STREAM has not yet been sent.
However, this is unnecessary as either the HTX EOM was not yet received and
we are waiting for the upper layer, or the FIN STREAM is still in the buffer
but was not yet transmitted due to an incomplete transfer, in which case
a subscription should have already been done.
This should be backported up to 2.7.
The MUX uses the qc_send_mux() function to send a list of frames over a QUIC
connection. On network congestion, the lower layer will reject some
frames and it is the MUX's responsibility to free them. There is another
category of error: when the sendto() call fails. In this case, the
lower layer will free the packet and its attached frames and the MUX
should not touch them.
This model was violated by the MUX layer for RESET_STREAM and STOP_SENDING
emission. In this case, frames were freed every time by the MUX on
error. This causes a double free which leads to a crash.
Fix this by always checking whether frames were rejected by the lower layer
before freeing them in the MUX. This is done simply by checking that the
frame list is not empty, as RESET_STREAM and STOP_SENDING are sent
individually.
This bug was never reproduced in production. Thus, it is labelled as
MINOR.
This must be backported up to 2.7.
Since the recent modification of MUX error processing, the shutw operation
was skipped for a connection reported as on error. However, this could cause
the stream layer to not be notified about the error. The impact of this bug
is unknown but it may lead to streams never being closed.
To fix this, simply skip over send operations when the connection is on
error while still notifying the stream layer.
This should be backported up to 2.7.
As discussed a few times over the years, it's quite difficult to know
how often we stop accepting connections because the global maxconn was
reached. This is not easy to know because when we reach the limit we
stop accepting but we don't know if incoming connections are pending,
so it's not possible to know how many were delayed just because of this.
However, an interesting equivalent metric consists in counting the number
of times an accepted incoming connection resulted in the limit being
reached. I.e. "we've accepted the last one for now". That doesn't imply
any other one got delayed but it's a factual indicator that something
might have been delayed. And by counting the number of such events, it
becomes easier to know whether some limits need to be adjusted because
they're reached often, or if it's exceptionally rare.
The metric is reported as a counter in show info and on the stats page
in the info section right next to "maxconn".
Now in "show info" we have a TotalWarnings field that reports the total
number of warnings issued since the process started. It's also reported
in the stats page next to the uptime.
qc_treat_ack_of_ack() must remove the ranges of acknowledgements which have
been acknowledged from the ebtree. This is done by keeping track of the
largest acknowledged packet number which has been acknowledged and sent with
an ack-eliciting packet.
But due to the data structure of the acknowledgement ranges used to build an
ACK frame, one must leave at least one range in such an ebtree, which must at
least contain a unique one-element range with the largest acknowledged packet
number as element.
This issue was revealed by @Tristan971 in GH #2140.
Must be backported in 2.7 and 2.6.
When pushing a lua object through the lua Queue class, a new reference is
created from the object so that it can be safely restored when needed.
Likewise, when popping an object from the lua Queue class, the object is
restored at the top of the stack via its reference id.
However, once the object is restored, the related queue entry is removed,
thus the object reference must be dropped to prevent a reference leak.
queue:pop_wait() was broken during a late refactor prior to merge
(due to small modifications to ensure that pop() returns nil on an empty
queue instead of nothing).
Because of this, pop_wait() currently behaves exactly like pop(), resulting
in 100% active CPU when used in a while loop.
Indeed, _hlua_queue_pop() should explicitly return 0 when the queue is
empty since the pop_wait logic relies on this, and the pushnil should be
handled directly in the queue:pop() function instead.
Some comments were added as well to document this.
During the startup stage, if a proxy was disabled in the configuration, all
its filters were released and removed. But it may be an issue if some info is
shared between filters of the same type: resources may be released too early.
It happens with ACLs defined in SPOE configurations. Pattern expressions can
be shared between filters. To fix the issue, filters for disabled proxies
are no longer released during the startup stage but only when HAProxy is
stopped.
This commit depends on the previous one ("MINOR: spoe: Don't stop disabled
proxies"). Both must be backported to all stable versions.
SPOE registers a signal handler to be able to stop SPOE applets ASAP during
soft-stop. Disabled proxies must be ignored at this stage because they are
not fully configured.
For now, it is useless, but this change is mandatory to fix a bug.
clang 15 reports unused variables in src/mjson.c:
src/mjson.c:196:21: fatal error: expected ';' at end of declaration
int __maybe_unused n = 0;
and
src/mjson.c:727:17: fatal error: variable 'n' set but not used [-Wunused-but-set-variable]
int sign = 1, n = 0;
An issue was created on the project, but it was not fixed for now:
https://github.com/cesanta/mjson/issues/51
So for now, to fix the build issue, these variables are declared as unused.
Of course, if there is any update on this library, be careful to review this
patch first to be sure it is still required.
This patch should fix the issue #1868. It should be backported as far as 2.4.
Fixes #2031.
Quoting Willy Tarreau:
"Originally the listeners were intended to work without a bind_conf
(e.g. for FTP processing) hence these tests, but over time the
bind_conf has become omnipresent"
At the end of process_stream(), if there was any change to the
request/response analyzers, we now trigger a resync. It is performed if any
analyzer is added, but also if any is removed. It should help to catch
internal changes on a stream and possibly avoid it becoming frozen.
There is no reason to backport this patch. But it may be good to keep an eye
on it, just in case.
In process_stream(), after the request and response analyzers evaluation,
unhandled errors are processed, if any. In this case, depending on the
situation, remaining request or response analyzers may be removed, except the
last one about end of filters. However, auto-close is not re-enabled at the
same time. Thus it is possible to not forward the shutdown from one side to
the other while no analyzer is there to do so, or at least to make the
situation evolve.
In theory, it is thus possible to freeze a stream if no wakeup happens. And
it seems possible because it would explain a freeze we've observed.
This patch could be backported to every stable version but only after a
period of observation and if it may match an unexplained bug. It should not
lead to any loop, but at worst possibly to truncated messages.
When commit ead43fe4f ("MEDIUM: compression: Make it so we can compress
requests as well.") added the test for the direction flags to select the
compression, it implicitly broke compression defined in defaults sections
because the flags from the default proxy were not recopied, hence the
compression was enabled but in no direction.
No backport is needed, that's 2.8 only.
->others member of tp_version_information structure pointed to a buffer in the
TLS stack used to parse the transport parameters. There is no garantee that this
buffer is available until the connection is released.
Do not dump the available versions selected by the client anymore, but displayed the
chosen one (selected by the client for this connection) and the negotiated one.
Must be backported to 2.7 and 2.6.
A recent series of commits has been introduced to rework error
generation on the QUIC MUX side. Now, all MUX/APP functions use
qcc_set_error() to set the flag QC_CF_ERRL on error. Then, this flag is
converted to QC_CF_ERRL_DONE with a CONNECTION_CLOSE emission by
qc_send().
This has the advantage of centralizing the CONNECTION_CLOSE generation
in one place and reduces the coupling between the MUX and quic-conn layers.
However, we must now ensure that every qcc_set_error() call is followed
by a QUIC MUX tasklet wakeup to invoke qc_send(). This was not the case,
thus when there is no active transfer, no CONNECTION_CLOSE frame is emitted
and the connection remains open.
To fix this, add a tasklet_wakeup() directly in qcc_set_error(). This is
a brute-force solution as it may be unneeded when already in the MUX
tasklet context. However, it is the simplest solution as it is too
tedious for the moment to list all qcc_set_error() invocations outside of
the tasklet.
This must be backported up to 2.7.
A recent series of patches was introduced to streamline error generation
by the QUIC MUX. However, a regression was introduced: every error
generated by the MUX was built as a CONNECTION_CLOSE_APP frame, whereas this
should only be the case for H3/QPACK errors.
Fix this by adding an argument <app> to qcc_set_error(). When false, a
standard CONNECTION_CLOSE is used as the error.
This bug was detected by the QUIC tracker with the following tests,
"stop_sending" and "server_flow_control", which require a
CONNECTION_CLOSE frame.
This must be backported up to 2.7.
This was lost with commit f4258bdf3 ("MINOR: stats: Use the applet API to
write data"). When the buffer is almost full, the stats applet gives up.
When this happens, the applet must request more room. Otherwise, data in the
channel buffer are sent to the client but the applet is not woken up in
return.
It is a 2.8-specific bug, no backport needed.
GCC complains about swapping two list heads, one local and one global.
gcc -Iinclude -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -Werror -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS -DUSE_EPOLL -DUSE_NETFILTER -DUSE_POLL -DUSE_THREAD -DUSE_BACKTRACE -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_SSL -DUSE_LUA -DUSE_ACCEPT4 -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT -DUSE_MATH -DUSE_SYSTEMD -DUSE_PRCTL -DUSE_THREAD_DUMP -DUSE_QUIC -DUSE_SHM_OPEN -DUSE_PCRE -DUSE_PCRE_JIT -I/github/home/opt/include -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.8-dev8-7d23e8d1a6db\" -DCONFIG_HAPROXY_DATE=\"2023/04/24\" -c -o src/ssl_sample.o src/ssl_sample.c
In file included from include/haproxy/pool.h:29,
from include/haproxy/chunk.h:31,
from include/haproxy/dynbuf.h:33,
from include/haproxy/channel.h:27,
from include/haproxy/applet.h:29,
from src/ssl_sock.c:47:
src/ssl_sock.c: In function 'tlskeys_finalize_config':
include/haproxy/list.h:48:88: error: storing the address of local variable 'tkr' in 'tlskeys_reference.p' [-Werror=dangling-pointer=]
48 | #define LIST_INSERT(lh, el) ({ (el)->n = (lh)->n; (el)->n->p = (lh)->n = (el); (el)->p = (lh); (el); })
| ~~~~~~~~^~~~~~
src/ssl_sock.c:1086:9: note: in expansion of macro 'LIST_INSERT'
1086 | LIST_INSERT(&tkr, &tlskeys_reference);
| ^~~~~~~~~~~
compilation terminated due to -Wfatal-errors.
This appears with gcc 13.0.
The fix uses LIST_SPLICE() instead of inserting the head of the local
list in the global list.
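The change essentially looks like this (a sketch; the LIST_SPLICE() argument
order is assumed):

    /* before: inserts one list head into the other */
    LIST_INSERT(&tkr, &tlskeys_reference);

    /* after: move the local list's elements into the global list */
    LIST_SPLICE(&tlskeys_reference, &tkr);
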
Should fix issue #2136.
Since a recent change in the SC API, a producer must specify the amount of
free space it needs to progress when it is blocked. But it must take care
to never exceed the maximum size allowed in the buffer. Otherwise, the
stream is frozen because the condition to unblock the producer can never be
reached.
In this context, there is a bug in the cache applet when it fails to dump a
message: it may request more space than allowed. It happens when the cached
object is too big.
It is a 2.8-specific bug. No backport needed.
As noticed by Miroslav, there was a typo in quic_tls_key_update() which led
to a cipher context for decryption being initialized and used in place of a
cipher context for encryption. Surprisingly, this did not prevent the key
update from working. Perhaps this is due to the fact that the underlying
cryptographic algorithms used by QUIC are all symmetric algorithms.
Also modify the incorrect traces.
Must be backported in 2.6 and 2.7.
Most of the functions in quic_frame.c and quic_frame.h manipulate <buf>
buffer position variables which have nothing to do with struct buffer
variables. Rename them to <pos>.
Should be backported to 2.7.
The build without thread support was broken by commit b30ced3d8 ("BUG/MINOR:
debug: fix incorrect profiling status reporting in show threads") because
it accesses the isolated_thread variable that is not defined when threads
are disabled. In fact both the test on harmless and this one make no sense
without threads, so let's comment out the block and mark the related
variables as unused.
This may have to be backported to 2.7 if the commit above is.
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
extended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.
Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in the field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.
This commit relaxes this rule and finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument from the parse_cpu_set() function. The doc was updated to
reflect this.
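For example, a range list copied from a taskset invocation can now be used
as-is (illustrative values):

    cpu-map auto:1/1-8 0-3,8-11
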
Add a new output format "oneline" for "show quic" command. This prints
one connection per line with minimal information. The objective is to
have an equivalent of the netstat/ss tools with just enough information
to quickly find connection which are misbehaving.
A legend is printed on the first line to describe the field columns
starting with a dash character.
This should be backported up to 2.7.
Add an extra optional argument for "show quic" to specify desired output
format. Its objective is to control the verbosity per connections. For
the moment, only "full" is supported, which is the already implemented
output with maximum information.
This should be backported up to 2.7.
Adding a new lua class: Queue.
This class provides a generic FIFO storage mechanism that may be shared
between multiple lua contexts to easily pass data between them, as stock
Lua doesn't provide easy methods for passing data between multiple
coroutines.
A new Queue object may be obtained using core.queue()
(it works like core.concat() for a Concat class).
Lua documentation was updated (including some usage examples).
Remaining in drain mode after removing one of the server admin flags leads
to this message being generated:
"Server name/backend is leaving forced drain but remains in drain mode."
However, this is not necessarily true: the server might just be leaving
MAINT with the IDRAIN flag set, so the report is incorrect in this case
(FDRAIN was not set, so it cannot be cleared).
To prevent confusion around this message and to comply with the code
comment above it, we remove the "leaving forced drain" precision to
make the report suitable for multiple transitions.
Since 3157222 ("MEDIUM: hlua/applet: Use the sedesc to report and detect
end of processing"), hlua_socket_handler() might spin loop if the hlua
socket is destroyed and some data was left unconsumed in the applet.
Prior to the above commit, the stream was explicitly KILLED
(when ctx->die == 1) so the app couldn't spinloop on unconsumed data.
But since the refactor this is no longer the case.
To prevent unconsumed data from waking the applet indefinitely, we consume
pending data when either one of EOS|ERROR|SHR|SHW flags are set, as it is
done everywhere else this check is performed in the code. Hence it was
probably overlooked in the first place during the refacto.
This bug is 2.8 specific only, so no backport needed.
Proxy mailers, which are configured using "email-alert" directives
in proxy sections from the configuration, are now being exposed
directly in lua thanks to the proxy:get_mailers() method which
returns a class containing the various mailers settings if email
alerts are configured for the given proxy (else nil is returned).
Both the class and the proxy method were marked as LEGACY since this
feature relies on mailers configuration, and the goal here is to provide
the last missing bits of information to lua scripts in order to make
them capable of sending email alerts instead of relying on the soon-to-
be deprecated mailers implementation based on checks (see src/mailers.c)
Lua documentation was updated accordingly.
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in the haproxy configuration.
This will help when sending emails directly from lua
(preventing legacy email sending from interfering with lua).
Exposing the SERVER_CHECK event through the lua API.
A new lua class named ServerEventCheck was added to provide additional
data for the SERVER_CHECK event.
Lua documentation was updated accordingly.
Adding a new event type: SERVER_CHECK.
This event is published when a server's check state ought to be reported.
(check status change or check result)
The SERVER_CHECK event is provided as a server event with additional data
carrying the relevant check context, such as the check's result and health.
Adding a new SERVER event in the event_hdl API.
SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)
SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be cast to a regular event_hdl_cb_data_server struct if additional
info is not needed.
Reuse cb_data from STATE event to publish UP and DOWN events.
This saves some CPU time since the event is only constructed
once to publish STATE, STATE+UP or STATE+DOWN depending on the
state change.
Adding a new SERVER event in the event_hdl API.
SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)
SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be cast to a regular event_hdl_cb_data_server struct if additional
info is not needed.
Add a macro helper to publish server events to the global and
per-server subscription lists at once, since all server events
support both subscription modes.
Server.get_pend_conn: number of pending connections to the server
Server.get_cur_sess: number of current sessions handled by the server
Lua documentation was updated accordingly.
After a sending attempt, we check the opposite SC to see if it is waiting
for a minimum free space to receive more data. If the condition is
respected, it is unblocked. 0 is a special case where the SC is
unconditionally unblocked.
During fast-forward, if the SC is waiting for a minimum free space to
receive more data and some data was sent, it is only unblocked if the
condition is respected. 0 is a special case where the SC is unconditionally
unblocked.
If the opposite SC is waiting for a minimum free space to receive more data,
it is only unblocked if the condition is respected. 0 is a special case where
the opposite SC is always unblocked.
At the end of process_stream(), in sc_update_rx(), the SC is now unblocked
if it was waiting for room and the free space in the input buffer is large
enough. This patch should fix an issue with the compression filter that can
leave the channel's buffer empty while the endpoint is waiting for room to
progress. Indeed, in this case, because the buffer is empty, there is no
send attempt and no other way to unblock the SE.
This commit depends on following commits:
* MEDIUM: tree-wide: Change sc API to specify required free space to progress
* MINOR: stconn: Add a field to specify the room needed by the SC to progress
* MINOR: peers: Use the applet API to send message
* MINOR: stats: Use the applet API to write data
* MINOR: cli: Use applet API to write output message
It should fix a regression introduced with the commit 341a5783b
("BUG/MEDIUM: stconn: stop to enable/disable reads from streams via
si_update_rx").
It must be backported only if the commit above is also backported. It was not
backported yet and it is thus probably a good idea not to do so, to avoid
backporting too many changes.
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
When the SC is blocked because it is waiting for room in the input buffer,
it will be responsible to specify the minimum free space required to
progress. In this commit, we only introduce the field in the stconn
structure that will be used to store this value. It is a signed value with
the following meaning:
* -1: The SC is waiting for room but not based on the buffer state. It
will be typically used during splicing when the pipe is full. In
this case, only a successful send can unblock the SC.
* >= 0: The minimum free space in the input buffer to unblock the SC. 0 is
a special value to specify the SC must be unblocked ASAP, by the
stream, at the end of process_stream() or when output data are
consumed on the opposite side.
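As a rough illustration of these semantics (a simplified sketch with
hypothetical names, not the actual stconn code), the unblocking decision
could look like:
  #include <stddef.h>
  #include <stdbool.h>

  /* 'room_needed' mirrors the new stconn field described above:
   *   -1  : waiting for room not tied to the buffer state (e.g. full pipe),
   *         only a successful send may unblock the SC
   *   >=0 : minimum free space required in the input buffer, 0 meaning
   *         "unblock ASAP"
   */
  struct sc_sketch {
      int room_needed;
  };

  /* evaluated after a send attempt or at the end of process_stream() */
  static bool sc_may_unblock(const struct sc_sketch *sc,
                             size_t free_space, bool send_succeeded)
  {
      if (sc->room_needed < 0)
          return send_succeeded;
      return free_space >= (size_t)sc->room_needed;
  }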
The peers applet now uses the applet API to send messages instead of the
channel API. This way, it does not need to take care of requesting more room if
it fails to put data into the channel's buffer.
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care of requesting more
room if it fails to put data into the channel's buffer.
Instead of using the channel API to write output messages from the CLI
applet, we use the applet API. This way, the applet does not need to take
care of requesting more room if it fails to put its message into the
channel's buffer.
This commit introduces the keyword "client-sigalgs" for the bind line,
which does the same as "sigalgs" but for the client authentication.
"ssl-default-bind-client-sigalgs" allows to set the default parameter
for all the bind lines.
This patch should fix issue #2081.
This patch introduces the "sigalgs" keyword for the bind line, which
allows to configure the list of server signature algorithms negotiated
during the handshake. Also available as "ssl-default-bind-sigalgs" in
the default section.
This patch was originally written by Bruno Henc.
Previously it would re-dump all threads to the same trash if the output
buffer was full, which it never was since the trash is of the same size.
Now it dumps one thread, copies it to the buffer and yields until it can
continue. Showing 256 threads works as expected.
Currently large setups cannot dump all their threads because they're
first dumped to the trash buffer, then copied to stderr. Here we can
now change this, instead we dump one thread at a time into the trash
and immediately send it to stderr. We also keep a copy into a local
trash chunk that's assigned to thread_dump_buffer so that a core file
still contains a copy of a large number of threads, which is generally
sufficient for the vast majority of situations.
It was verified that dumping 256 threads now produces ~55kB of output
and all of them are properly dumped.
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
first wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.
What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.
That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.
For now we still have the buffer size limitation but it will be
addressed in future patches.
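For illustration only, the hand-off can be sketched with C11 atomics and
hypothetical names (this is not the actual haproxy implementation, which also
has to deal with the debug signal handler and watchdog interactions):
  #include <stdatomic.h>
  #include <stdio.h>

  /* per-thread slot: NULL = at rest, valid pointer = buffer offered by the
   * requester, pointer+1 = dump finished, requester may harvest it */
  static _Atomic(char *) thread_dump_buffer;

  /* requester side: offer a buffer, wait for completion, then release */
  static void request_dump(char *buf)
  {
      atomic_store(&thread_dump_buffer, buf);
      /* ... signal the target thread (debug signal) or dump it ourselves ... */
      while (atomic_load(&thread_dump_buffer) != buf + 1)
          ; /* wait until the dumped thread reports completion */
      atomic_store(&thread_dump_buffer, NULL);
  }

  /* dumped thread side: fill the offered buffer then mark it done */
  static void perform_dump(size_t len)
  {
      char *buf = atomic_load(&thread_dump_buffer);

      if (!buf)
          return;
      snprintf(buf, len, "thread dump goes here\n");
      atomic_store(&thread_dump_buffer, buf + 1);
  }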
When a client H2 stream is waiting for a tunnel establishment, it must state
that it expects data from the server. This is the second fix needed to address
the regressions of the commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from
server as long as request is unfinished").
It is a 2.8-specific bug. No backport needed.
In 2.3, commit 471425f51 ("BUG/MINOR: debug: Don't dump the lua stack
if it is not initialized") introduced the possibility to emit an empty
line when there's no Lua info to dump. The problem is that doing this
on the CLI in "show threads" marks the end of the output, and it may
affect some external tools. We need to make sure that LFs are only
emitted if there's something on the line and that all lines properly
start with the prefix.
This may be backported as far as 2.0 since the commit above was
backported there.
With the change for QUIC MUX local error API, the new flag QC_CF_ERRL is
now checked on qc_detach(). If set, qcs instance is freed even though
transfer is not finished. This should help to quickly release qcs and
eventually all MUX instance resources.
To further accelerate this, a specific check has been added in
qc_shutw(). It is skipped if the local error flag is set to prevent noisy
reset stream invocations. In the same way, QUIC MUX is not rescheduled on
qc_recv_buf() operation if the local error flag is set.
This should be backported up to 2.7.
If an error is detected at the MUX layer, all remaining stream endpoints
should be closed asap with error set. This is now done by checking for
QC_CF_ERRL flag on qc_wake_some_streams() and qc_send_buf(). To complete
this, qc_wake_some_streams() is called by qc_process() if needed.
This should help to quickly release streams as soon as a new error is
detected locally by the MUX or APP layer. This allows to in turn free
the MUX instance itself. Previously, error would not have been
automatically reported until the transport layer closure would occur
on CONNECTION_CLOSE emission.
This should be backported up to 2.7.
When a fatal error is detected by the QUIC MUX or H3 layer, the
connection should be closed with a CONNECTION_CLOSE with an error code
as the reason.
Previously, a direct call to the quic_conn layer was used to try to
close the connection. This API was adjusted to be more flexible. Now,
when an error is detected, the function qcc_set_error() is called. This
sets the flag QC_CF_ERRL with the error code stored by the MUX. The
connection will be closed soon so most of the operations are not
conducted anymore. The connection is then finally closed during qc_send()
via the quic_conn layer if QC_CF_ERRL is set. This will set the flag
QC_CF_ERRL_DONE which indicates that the MUX instance can be freed.
This model is cleaner and brings the following improvements :
- interaction with quic_conn layer for closure is centralized on a
single function
- CO_FL_ERROR is not set anymore. This was incorrect as this should be
reserved to errors reported by the transport layer to be similar with
other haproxy components. As a consequence, qcc_is_dead() has been
adjusted to check for QC_CF_ERRL_DONE to release the MUX instance.
This should be backported up to 2.7.
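As a minimal sketch of the centralized error path described above (flag
values and types are illustrative; only the flag and function names come from
the text):
  /* local error detected, CONNECTION_CLOSE pending */
  #define QC_CF_ERRL       0x01
  /* CONNECTION_CLOSE scheduled at the transport layer, MUX freeable */
  #define QC_CF_ERRL_DONE  0x02

  struct qcc_sketch {
      unsigned int flags;
      unsigned long long err_code;  /* reported in CONNECTION_CLOSE */
  };

  static void qcc_set_error_sketch(struct qcc_sketch *qcc, unsigned long long err)
  {
      if (qcc->flags & QC_CF_ERRL)
          return;               /* keep the first reported error */
      qcc->err_code = err;
      qcc->flags |= QC_CF_ERRL;
  }

  /* later, on the sending path */
  static void qc_send_sketch(struct qcc_sketch *qcc)
  {
      if ((qcc->flags & QC_CF_ERRL) && !(qcc->flags & QC_CF_ERRL_DONE)) {
          /* ask the quic_conn layer to emit CONNECTION_CLOSE(err_code) */
          qcc->flags |= QC_CF_ERRL_DONE;
      }
  }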
When HTX content is transferred from a qcs instance to the upper stream
endpoint, a wakeup is performed for the MUX tasklet. However, this is only
necessary if demux was interrupted due to a full QCS HTX buffer.
This should be backported up to 2.7.
Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally
detected error when a CONNECTION_CLOSE should be emitted.
This should be backported up to 2.7.
When MUX performs a graceful shutdown, quic_conn error code is set to a
"no error" code which depends on the application layer used. However,
this may overwrite a previous error code if quic_conn layer has detected
an error on its side.
In practice, this behavior has not been seen in production. In fact, it
may have an undesirable effect only if this error code modification happens
between the quic_conn error detection and the emission of the
CONNECTION_CLOSE, so it should be pretty rare. However, there is still a
tiny possibility it may happen.
To prevent this, first check that quic_conn error code is not set before
setting it. Ideally, transport layer API should be adjusted to be able
to set this without fiddling with the quic_conn directly.
This should be backported up to 2.6.
The commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long
as request is unfinished") introduced a regression in the H2 multiplexer.
The end of the request was not systematically handled to state that a H2
stream on the client side now expects data from the server.
Indeed, while the client is uploading its request, the H2 stream warns it
does not expect data from the server. This way, no server timeout is applied
at this stage. When end of the request is detected, the H2 stream must state
it now expects the server response. This enables the server timeout.
However, it was only performed at one place while the end of the request can
be handled at different places. First, during a zero-copy in
h2_rcv_buf(). Then, when the SC is created with the full request. Because of
this bug, it is possible to totally disable the server timeout for H2
streams.
In h2_rcv_buf(), we now rely on h2s flags to detect the end of the request,
but only when the rxbuf was emptied.
It is a 2.8-specific bug. No backport needed.
Sometimes it's convenient to test the effect of tasks running under
isolation, e.g. to validate the contents of the crash dumps. Let's
add an optional "isolated" keyword to "debug dev loop" for this.
Thread dumps include a field "prof" for each thread that reports whether
task profiling is currently active or not. It turns out that in 2.7-dev1,
commit 680ed5f28 ("MINOR: task: move profiling bit to per-thread")
mistakenly replaced it with a check for the current thread's bit in the
thread dumps, which basically is the only place where another thread is
being watched. The same mistake was done a few lines later by confusing
threads_want_rdv_mask with the profiling mask. This mask disappeared
in 2.7-dev2 with commit 598cf3f22 ("MAJOR: threads: change thread_isolate
to support inter-group synchronization"), though instead we know the ID
of the isolated thread. This commit fixes this and now reports "isolated"
instead of "wantrdv".
This can be backported to 2.7.
16kB buffers are not enough to dump 4096 threads with up to 10 bytes value
on each line. By storing the column number in the applet's context, we can
now restart from the last attempted column. This requires to dump all values
as they are produced, but it doesn't cost that much: a 4096-thread output
from a fresh process produces 300kB of output in ~8ms, or ~400us per call
(19*16kB), most of which is spent in vfprintf(). Given that we don't print
more than needed, it doesn't really change anything.
The main caveat is that when interrupted on such large lines, there's a
great possibility that the total or average on the first column doesn't
match anymore the sum or average of all dumped values. In order to avoid
this whenever possible (typically less than ~1500 threads), we first try
to dump entire lines and only proceed one column at a time when we have
to retry a failed dump. This is already the same for other stats that are
dumped in an interruptible way anyway and there's little that can be done
about it at this point (and not much immediately perceived benefit in
doing this with extreme accuracy for >1500 threads).
When using many threads, it's difficult to see the end of "show activity"
due to the numerous columns which fill the buffer. For example a dump of
a 256-thread, freshly booted process yields around 15kB.
Here by arranging the dump in a loop around a switch/case block where
each case checks the code line number against the current dump position,
we have a restartable counter for free with a granularity of the line of
code, without having to maintain a matching between states and specific
lines. It just requires to reset the trash buffer for each line and to
try to dump it after each line.
Now dumping 256 threads after a few seconds of traffic happily emits 20kB.
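The principle can be illustrated with the following self-contained sketch (not
the actual haproxy code; emit_or_yield() and the 3-line buffer simulate the
applet's output buffer becoming full):
  #include <stdio.h>

  static int buffer_room = 3;  /* pretend only 3 lines fit per call */

  /* push one line, or remember where to restart when the buffer is full */
  static int emit_or_yield(int *resume, int here, const char *text)
  {
      if (*resume > here)
          return 1;            /* this line was already sent earlier */
      if (!buffer_room)
          return 0;            /* full: yield, restart from 'resume' */
      buffer_room--;
      puts(text);
      *resume = here + 1;      /* next call resumes after this line */
      return 1;
  }

  #define EMIT(txt) do { if (!emit_or_yield(&resume, __LINE__, (txt))) return 0; } while (0)

  static int dump_activity(void)
  {
      static int resume;       /* would live in the applet context */

      EMIT("loops: ...");
      EMIT("wake_tasks: ...");
      EMIT("poll_io: ...");
      EMIT("poll_drop_fd: ...");
      resume = 0;
      return 1;
  }

  int main(void)
  {
      while (!dump_activity())
          buffer_room = 3;     /* simulate the buffer being flushed */
      return 0;
  }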
Now each line of "show activity" will iterate over n+2 fields, one for
the line header, one for the total, and one per thread. This will soon
allow us to save the current state in a restartable way.
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") broke the build on windows due to this:
src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts]
940 | caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0",
| ^~~~~~~~~~~~~
It's classical on platforms which implement ctype.h as macros instead of
functions, so let's cast it to uchar. No backport is needed.
The x509_v_err_str converter now outputs the numerical value as a string
when the corresponding constant name was not found.
Must be backported as far as 2.7.
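A minimal sketch of the fallback logic (names and values below are
illustrative, not the actual converter code):
  #include <stdio.h>

  static const char *x509_v_err_to_str(long code, char *buf, size_t len)
  {
      switch (code) {
      case 0:  return "X509_V_OK";
      case 18: return "X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT";
      /* ... */
      default:
          /* unknown constant: dump the numerical value as a string */
          snprintf(buf, len, "%ld", code);
          return buf;
      }
  }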
When analyzing certain types of bugs in field, sometimes it would be
nice to be able to wake up a task or tasklet to see how events progress
(e.g. to detect a missing wakeup condition), or expire or kill such a
task. This restricted command shows the current state of a task or tasklet
and allows to manipulate it like this. However it must be used with extreme
care because while it does verify that the pointers are mapped, it cannot
know if they point to a real task, and performing such actions on something
not a task will easily lead to a crash. In addition, performing a "kill"
on a task has great chances of provoking a deferred crash due to a double
free and/or another kill that is not idempotent. Use with extreme care!
The "show sess" command displays the stream's age in synthetic form,
and also makes it appear in the long version (show sess all). But that
last one uses the wrong origin, it uses accept_date.tv_sec instead of
accept_ts (formerly known as tv_accept). This was introduced in 1.4.2
with the long format, with commit 66dc20a17 ("[MINOR] stats socket: add
show sess <id> to dump details about a session"), while the code that
split the two variables was introduced in 1.3.16 with commit b7f694f20
("[MEDIUM] implement a monotonic internal clock"). This problem was
revealed by recent change ad5a5f677 ("MEDIUM: tree-wide: replace timeval
with nanoseconds in tv_accept and tv_request") that made this value report
random garbage, and generally emphasized by the fact that in 2.8 the two
clocks have sufficiently large an offset for such mistakes to be noticeable
early.
Arguably a difference between date and accept_date could also make sense,
to indicate if the stream had been there for more than 49 days, but this
would introduce instabilities for most sockets (including negative times)
for extremely rare cases while the goal is essentially to see how much
longer than a configured timeout a stream has been there. And that's what
other locations (including the short form) provide.
This patch could be backported but most users will never notice. In case
of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.
WolfSSL enables CRL checks by default even if no CRL file was
provided. This patch resets the default X509_STORE flags so this is
not checked by default.
Before this patch, global sending rate was measured on the QUIC lower
layer just after sendto(). This meant that all QUIC frames were
accounted for, including non STREAM frames and also retransmission.
To have a better reflection of the application data transferred, move
the incrementation into the MUX layer. This allows to account only for
STREAM frame payloads on their first emission.
This should be backported up to 2.6.
Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never zero and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
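For illustration, the new arithmetic amounts to something like this (tv_to_ns
is mentioned above; ns_to_ms is assumed here for the final conversion):
  #include <stdint.h>
  #include <sys/time.h>

  static inline uint64_t tv_to_ns(const struct timeval *tv)
  {
      return (uint64_t)tv->tv_sec * 1000000000ULL +
             (uint64_t)tv->tv_usec * 1000ULL;
  }

  static inline uint64_t ns_to_ms(uint64_t ns)
  {
      return ns / 1000000ULL;
  }

  /* what tv_ms_elapsed(tv1, tv2) becomes: convert both operands to ns,
   * subtract them, then convert the difference to ms */
  static inline uint64_t ms_elapsed(const struct timeval *tv1,
                                    const struct timeval *tv2)
  {
      return ns_to_ms(tv_to_ns(tv2) - tv_to_ns(tv1));
  }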
The "show info" help for "Start_time_sec" says "Start time in seconds"
so it's definitely the start date in human format, not the internal one
that is solely used to compute uptime. Since commit 28360dc ("MEDIUM:
clock: force internal time to wrap early after boot"), both are split
apart since the start time takes into account the offset needed to cause
the early wraparound, so we must only use start_date here.
No backport is needed.
The commit a664aa6a6 ("BUG/MINOR: tcpcheck: Be able to expect an empty
response") instroduced a regression for expect rules relying on a custom
function. Indeed, there is no check on the buffer to be sure it is not empty
before calling the custom function. But some of these functions expect to
have data and don't perform any test on the buffer emptiness.
So instead of fixing all custom functions, we just don't eval them if the
buffer is empty.
This patch must be backported but only if the commit above was backported
first.
It was a cut/paste typo during stream-interface to conn-stream
refactoring. sc_have_room() was used instead of sc_need_room().
This patch must be backported as far as 2.6.
It is possible to start too many applets on sporadic burst of events after
an inactivity period. It is due to the way we estimate if a new applet must
be created or not. It is based on a frequency counter. We compare the events
processing rate against the number of events currently processed (in
progress or waiting to be processed). But we should also take care of the
number of idle applets.
We already track the number of idle applets, but it is global and not
per-thread. Thus we now also track the number of idle applets per-thread. It
is not a big deal because this fills a hole in the spoe_agent structure.
Thanks to this counter, we can refrain from creating applets if there are
enough idle applets to handle currently processed events.
This patch should be backported to all stable versions.
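The decision can be summarized by the following sketch (hypothetical names
and simplified inputs; the real code works on the SPOE frequency counter):
  /* refrain from creating a new applet when the per-thread idle applets
   * can already absorb the events being processed */
  static int spoe_wants_new_applet(unsigned int events_in_progress,
                                   unsigned int processing_rate,
                                   unsigned int idle_applets_on_thread)
  {
      if (idle_applets_on_thread >= events_in_progress)
          return 0;
      return events_in_progress > processing_rate;
  }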
That's hopefully the last one affected by this. It was a bit trickier
because there's the promise in the doc that the date is monotonous, so
we continue to use now-start_time as the uptime value and add it to
start_date to get the current date. It was also emphasized by commit
28360dc ("MEDIUM: clock: force internal time to wrap early after boot"),
causing core.now() to return a date of Mar 20 on Apr 27. No backport is
needed.
Yet another case where "now" was used instead of "date" for a publicly
visible date that was already incorrect and became worse after commit
28360dc ("MEDIUM: clock: force internal time to wrap early after boot").
No backport is needed.
Since commit 28360dc ("MEDIUM: clock: force internal time to wrap early
after boot") we have a much clearer distinction between 'now' (the internal,
drifting clock) and 'date' (the wall clock time). The calltrace code was
using "now" instead of "date" since the value is displayed to humans.
No backport is needed.
This reverts commit aadcfc9ea6.
The parts affecting the DeviceAtlas addon were wrong actually, the
"now" variable was a local time_t in a file that's not compiled with
the haproxy binary (dadwsch). Only the fix to the calltrace is correct,
so better revert and fix the only one in a separate commit. No backport
is needed.
Another case where "now" was used instead of "date" for a publicly visible
date that was already incorrect and became worse after commit 28360dc
("MEDIUM: clock: force internal time to wrap early after boot"). No
backport is needed.
The debug messages were still emitted with a date taken from "now" instead
of "date", which was not correct a long time ago but which became worse in
2.8 since commit 28360dc ("MEDIUM: clock: force internal time to wrap early
after boot"). Let's fix it. No backport is needed.
Since commit 28360dc ("MEDIUM: clock: force internal time to wrap early
after boot") we have a much clearer distinction between 'now' (the internal,
drifting clock) and 'date' (the wall clock time). There were still a few
places where 'now' was being used for human consumption.
No backport is needed.
Each quic_conn is attached to a global thread-local quic_conns list
used for the "show quic" command. During thread rebinding, a connection is
detached from its local list instance and moved to its new thread list.
However this operation is not thread-safe and may cause a race
condition.
To fix this, only remove the connection from its list inside
qc_set_tid_affinity(). The connection is only inserted afterwards, in
qc_finalize_affinity_rebind() on the new thread instance, thus preventing
the race condition. One impact of this is that a connection will be
invisible to "show quic" during rebinding.
A connection must not transition to the closing state in between these two
steps or else the cleanup via quic_handle_stopping() may miss it. To
ensure this, this patch relies on the previous commit :
commit d6646dddcc
MINOR: quic: finalize affinity change as soon as possible
This should be backported up to 2.7.
During accept, a quic-conn is rebound to a new thread. This process is
done in two steps :
* first on the original thread via qc_set_tid_affinity()
* then on the newly assigned thread via qc_finalize_affinity_rebind()
Most quic_conn operations (I/O tasklet, task and quic_conn FD socket
read) are reactivated only after the second step. However, there is a
possibility that datagrams are handled before it via quic_dgram_parse()
when using listener sockets. This does not seem to cause any issue but
this may cause unexpected behavior in the future.
To simplify this, qc_finalize_affinity_rebind() will be called both by
qc_xprt_start() and quic_dgram_parse(). Only one invocation will be
performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED.
This should be backported up to 2.7.
Sometimes it may be necessary to send an empty STREAM frame to signal a
clean stream closure with the FIN bit set. Prior to this change, a Tx buffer
was allocated unconditionally even if no data is transferred.
Most of the time, allocation was not performed because an older buffer was
reused. But if data were already acknowledged, a new buffer was allocated.
No memory leak occurs as the buffer is properly released when the empty
frame acknowledgement is received. But this allocation is unnecessary and it
consumes a connection Tx buffer for nothing.
Improve this by skipping buffer allocation if there is no data to transfer.
qcs_build_stream_frm() is now able to deal with a NULL out argument.
This should be backported up to 2.6.
Previous patch fixes an issue occurring with empty STREAM frames without
payload. The crash was hidden in part because buf/data fields of
qf_stream were set even if no payload is referenced. This was not the
true cause of the crash but to ease future debugging, a STREAM frame
built with no payload now has its buf and data fields set to NULL.
This should be backported up to 2.6.
Sometimes it may be necessary to send empty STREAM frames with only the
FIN bit set. For these frames, memcpy is thus unnecessary as their
payload is empty. However, we did not prevent its invocation inside
quic_build_stream_frame().
Normally, memcpy invocation with length==0 is safe. However, there is an
extra condition in our function to handle data wrapping. For an empty
STREAM frame in the context of MUX emission, this is safe as the frame
points to a valid buffer, which causes the wrapping condition to be false
and results in a memcpy with 0 length.
However, in the context of retransmission, this may lead to a crash.
Consider the following scenario : two STREAM frames A and B are
produced, one with payload and one empty with FIN set, pointing to the
same stream_desc buffer. If A is acknowledged by the peer, its buffer is
released as no more data is left in it. If B needs to be resent, the
wrapping condition will be messed up due to the reuse of a freed buffer. Most
of the time, <wrap> will be a negative number, which results in a
memcpy invocation causing a buffer overflow.
To fix this, simply add an extra condition to skip memcpy and wrapping
check if STREAM frame length is null inside quic_build_stream_frame().
This crash is pretty rare as it relies on a lot of conditions difficult
to reproduce. It seems to be the cause for the latest crashes reported
under github issue #2120. In all the inspected dumps, the segfault
occurred during retransmission with an empty STREAM frame being used as
input. Thanks again to Tristan from Mangadex for his help and
investigation on it.
This should be backported up to 2.6.
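The fix can be illustrated by the following simplified copy routine (not the
real quic_build_stream_frame(), which works on haproxy buffer structures):
  #include <string.h>
  #include <stddef.h>

  /* copy <len> bytes starting at <data_offset> from a circular buffer: when
   * <len> is 0 the frame may reference an already released buffer, so both
   * the wrapping computation and the memcpy must be skipped entirely */
  static void copy_stream_data(char *out, const char *buf_area, size_t buf_size,
                               size_t data_offset, size_t len)
  {
      size_t wrap;

      if (!len)
          return;  /* empty FIN-only frame: nothing to copy */

      wrap = buf_size - data_offset;  /* bytes until the end of the area */
      if (len > wrap) {
          memcpy(out, buf_area + data_offset, wrap);
          memcpy(out + wrap, buf_area, len - wrap);
      }
      else {
          memcpy(out, buf_area + data_offset, len);
      }
  }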
Since the patch mentioned below, a send-list mechanism was
implemented to improve stream prioritization on sending.
commit 20f2a425ff
MAJOR: mux-quic: rework stream sending priorization
This is done to prevent the same streams to always be used as first ones
on emission. However there is still a flaw in the algorithm. Once put in
the send-list, a stream is not removed until it has sent all of its
content. When a stream transfers a large object, it will remain in the
send-list during the whole transfer and will soon monopolize the first
place, never leaving its position until the transfer is finished. Other
streams behind won't have the opportunity to advance on their own
transfers due to Tx buffer exhaustion.
This situation is especially problematic if a small "timeout client" is
used. As some streams won't advance on their transfer for a long period
of time, they will be aborted due to a stream layer "timeout client",
causing a RESET_STREAM emission.
To fix this, during sending each stream with at least some bytes
transferred from its tx.buf to qc_stream_desc out buffer is put at the
end of the send-list. This ensures that on the next iteration streams
that cannot transfer anything will be used in priority.
This patch significantly improves h2load benchmarks for large objects
with several streams opened in parallel on a single connection. Without
it, errors may be reported by h2load for aborted streams. For example,
this improved the following scenario on a 10mbit/s link with a 10s
timeout client :
$ ./build/bin/h2load --npn-list h3 -t 1 -c 1 -m 30 -n 30 https://198.18.10.11:20443/?s=500k
This fix may help with the github issue #2004 where the Chrome browser stops
using QUIC after receiving RESET_STREAM frames.
This should be backported up to 2.7.
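The rotation itself is simple list surgery; a standalone sketch follows
(hypothetical types; haproxy itself manipulates the qcs send-list with its
own list macros):
  #include <stddef.h>

  struct qcs_sketch {
      struct qcs_sketch *prev, *next;  /* send-list linking */
  };

  struct send_list {
      struct qcs_sketch *head, *tail;
  };

  /* after a stream moved some bytes to its outgoing buffer, push it to the
   * end of the send-list so that the others get the Tx buffers next time */
  static void send_list_rotate(struct send_list *l, struct qcs_sketch *qcs,
                               size_t bytes_transferred)
  {
      if (!bytes_transferred || l->tail == qcs)
          return;

      /* detach */
      if (qcs->prev)
          qcs->prev->next = qcs->next;
      else
          l->head = qcs->next;
      qcs->next->prev = qcs->prev;

      /* append at the tail */
      qcs->prev = l->tail;
      qcs->next = NULL;
      l->tail->next = qcs;
      l->tail = qcs;
  }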
Some HTX responses may not always contain an EOM block. For example this
is the case if the content-length header is missing from the HTTP server
response. Stream termination is thus signaled to the QUIC mux via the shutw
callback. However, this is interpreted unconditionally as an early
close by the mux, with a RESET_STREAM emission. Most of the time, QUIC
clients report this as an error.
To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for
a qcs instance. If true, shutw will never be used to emit a
RESET_STREAM. Instead, the stream will be closed properly with a FIN
STREAM frame. If all data were already transferred, an empty STREAM frame
is sent.
This fix may help with the github issue #2004 where the Chrome browser stops
using QUIC after receiving RESET_STREAM frames.
This issue was reported by Vladimir Zakharychev. Thanks to him for his
help and testing. It was also reproduced locally using httpterm with the
query string "/?s=1k&b=0&C=1".
This should be backported up to 2.7.
Make quic_stateless_reset_token_cpy(), quic_derive_cid() and quic_get_cid_tid()
be more readable: there is no struct buffer variable manipulated by these
functions.
Should be backported to 2.7.
There is no <buf> variable passed to this function.
Also rename <buf_end> to <end> to mimic other functions.
Rename <beg> to <first_byte> and <end> to <last_byte>.
Should be backported to 2.7.
Make quic_build_packet_long_header(), quic_build_packet_short_header() and
quic_apply_header_protection() be more readable: there is no struct buffer
variables used by these functions.
Should be backported to 2.7.
Remove the pointer to the connection passed as a parameter to qc_purge_tx_buf()
and other similar functions which came with the qc_purge_tx_buf() implementation.
They were there to track the connection during tests.
Must be backported to 2.7.
Rename all frame variables with the suffix _frm. This helps to
differentiate frame instances from other internal objects.
This should be backported up to 2.7.
Each frame type used in quic_frame union has been renamed with the
following prefix "qf_". This helps to differentiate frame instances from
other internal objects.
This should be backported up to 2.7.
From the idle_timer_task(), the I/O handler must be woken up to send ack. But
there is no reason to do that in draining state or killing state. In draining
state this is even forbidden.
Must be backported to 2.7.
The timer task responsible for triggering probing retransmissions did not inspect
the state of the connection before doing its job. But there is no need to
probe the peer when the connection is in draining or killing state. About the
draining state, this is even forbidden.
Must be backported to 2.7 and 2.6.
qc_dgrams_retransmit() prepares two lists of frames to be retransmitted into
two datagrams. If the first datagram could not be sent, the TX buffer will
be purged with the prepared packet and its frames, but this was not the case for
the second list of frames.
Must be backported in 2.7.
This bug arrived with this commit which was not sufficient:
BUG/MEDIUM: quic: Missing TX buffer draining from qc_send_ppkts()
Indeed, there were also remaining allocated TX packets to be released and
their TX frames.
Implement qc_purge_tx_buf() to do so which depends on qc_free_tx_coalesced_pkts()
and qc_free_frm_list().
Must be backported to 2.7.
Sharding by-group is exactly identical to by-process for a single
group, and will use the same number of file descriptors for more than
one group, while significantly lowering the kernel's locking overhead.
Now that all special listeners (cli, peers) are properly handled, and
that support for SO_REUSEPORT is detected at runtime per protocol, there
should be no more reason for not switching to by-group by default.
That's what this patch does. It does only this and nothing else so that
it's easy to revert, should any issue be raised.
Testing on an AMD EPYC 74F3 featuring 24 cores and 48 threads distributed
into 8 core complexes of 3 cores each, shows that configuring 8 groups
(one per CCX) is sufficient to simply double the forwarded connection
rate from 112k to 214k/s, reducing kernel locking from 71 to 55%.
This new setting accepts "by-process", "by-group" and "by-thread" and
will dictate how listeners will be sharded by default when nothing is
specified. While the default remains "by-process", "by-group" should be
much more efficient with many threads, while not changing anything for
single-group setups.
Now that we're able to run listeners on any set of groups, we don't need
to maintain a special case about the stats socket anymore. It used to be
forced to group 1 only so as to avoid startup failures in case several
groups were configured, but if it's done now, it will automatically bind
the needed FDs to have one per group so this is no longer an issue.
When testing if a protocol supports SO_REUSEPORT, we're now able to
verify if the OS does really support it. While it may be supported at
build time, it may possibly have been blocked in a container for
example so we'd rather know what it's like.
The new function _sock_supports_reuseport() will be used to check if a
protocol type supports SO_REUSEPORT or not. This will be useful to verify
that shards can really work.
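A runtime probe for this can be as simple as the following sketch (standard
socket API; not necessarily the exact implementation):
  #include <sys/socket.h>
  #include <unistd.h>

  /* return 1 if SO_REUSEPORT can effectively be set on this protocol,
   * 0 otherwise (old kernel, container restrictions, not built in...) */
  static int sock_supports_reuseport(int family, int type)
  {
  #ifdef SO_REUSEPORT
      int one = 1;
      int ret = 0;
      int fd = socket(family, type, 0);

      if (fd >= 0) {
          ret = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT,
                           &one, sizeof(one)) == 0;
          close(fd);
      }
      return ret;
  #else
      return 0;
  #endif
  }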
The new function protocol_supports_flag() checks the protocol flags
to verify if some features are supported, but will support being
extended to refine the tests. Let's use it to check for REUSEPORT.
Now if multiple shards are explicitly requested, and the listener's
protocol doesn't support SO_REUSEPORT, sharding is disabled, which will
result in the socket being automatically duped if needed. A warning is
emitted when this happens. If "shards by-group" or "shards by-thread"
are used, these will automatically be turned down to 1 since we want
this to be possible easily using -dR on the command line without having
to adjust the config. For "by-thread", a diag warning will be emitted to
help troubleshoot possible performance issues.
Some protocols support SO_REUSEPORT and others do not. Some have such a
limitation in the kernel, and others in haproxy itself (e.g. sock_unix
cannot support multiple bindings since each one will unbind the previous
one). Also it's really protocol-dependent and not just family-dependent
because on Linux for some time it was supported for TCP and not UDP.
Let's move the definition to the protocols instead. Now it's preset in
tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset.
The enabled() config condition test validates IPv4 (generally sufficient),
and -dR / noreuseport all protocols at once.
We'll use these flags to know if some protocols are supported, and if
so, with what options/extensions. Reuseport will move there for example.
Two functions were added to globally set/clear a flag.
The listeners in peers sections were still not handling the thread
groups fine. Shards were silently ignored and if a listener was bound
to more than one group, it would simply fail. Now we can call the
dedicated function to resolve all this and possibly create the missing
extra listeners.
bind_complete_thread_setup() was adjusted to use the proxy_type_str()
instead of writing "proxy" at the only place where this word was still
hard-coded so that we continue to speak about peers sections when
relevant.
What used to be only two lines to apply a mask in a loop in
check_config_validity() grew into a 130-line block that performs deeply
listener-specific operations that do not have their place there anymore.
In addition it's worth noting that the peers code still doesn't support
shards nor being bound to more than one group, which is a second reason
for moving that code to its own function. Nothing was changed except
recreating the missing variables from the bind_conf itself (the fe only).
In 2.6-dev1, NUMA topology detection was enabled on FreeBSD with commit
f5d48f8b3 ("MEDIUM: cfgparse: numa detect topology on FreeBSD."). But
it suffers from a minor bug which is that it forgets to check for the
number of domains and always emits a confusing warning indicating that
multiple sockets were found while it's not the case.
This can be backported to 2.6.
The lib compatibility checks introduced in 2.8-dev6 with commit c3b297d5a
("MEDIUM: tools: further relax dlopen() checks too consider grouped
symbols") were partially incorrect in that they check at the same time
libcrypto and libssl. But if loading a library that only depends on
libcrypto, the ssl-only symbols will be missing and this might present
an inconsistency. This is what is observed on FreeBSD 13.1 when
libcrypto is being loaded, where it sees two symbols having disappeared.
The fix consists in splitting the checks for libcrypto and libssl.
No backport is needed, unless the patch above finally gets backported.
On FreeBSD 13.1 I noticed that thread balancing using shards was not
always working. Sometimes several threads would work, but most of the
time a single one was taking all the traffic. This is related to how
SO_REUSEPORT works on FreeBSD since version 12, as it seems there is
no guarantee that multiple sockets will receive the traffic. However
there is SO_REUSEPORT_LB that is designed exactly for this, so we'd
rather use it when available.
This patch may possibly be backported, but nobody complained and it's
not sure that many users rely on shards. So better wait for some feedback
before backporting this.
In 2.7-dev2, "stats bind-process" was removed by commit 94f763b5e
("MEDIUM: config: remove deprecated "bind-process" directives from
frontends") and an error message indicates that it's no more supported.
However it says "stats" is not supported instead of "stats bind-process",
making it a bit confusing.
This should be backported to 2.7.
By comparing the local thread's load with the least loaded thread's
load, we can further improve the fairness and at the same time also
improve locality since it allows a small ratio of connections not to
be migrated. This is visible on CPU usage with long connections on
very large thread counts (224) and high bandwidth (200G). The cost
of checking the local thread's load remains fairly low so there's no
reason not to do this. We continue to update the index if we select
the local thread, because it means that the two other threads were
both more loaded so we'd rather find better ones.
One limitation of the current thread index mechanism is that if the
values are assigned multiple times to the same thread and the index
loops, it can match again the old value, which will not prevent a
competing thread from finishing its CAS and assigning traffic to a
thread that's not the optimal one. The probability is low but the
solution is simple enough and consists in implementing an update
counter in the high bits of the index to force a mismatch in this
case (assuming we don't try to cover for extremely unlikely cases
where the update counter loops while the index remains equal). So
let's do that. In order to improve the situation a little bit, we
now set the index to a ulong so that in 32 bits we have 8 bits of
counter and in 64 bits we have 40 bits.
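The packing can be sketched as follows (the bit split below is arbitrary; the
real code derives it from the word size as described above):
  #include <stdatomic.h>

  #define IDX_BITS 16
  #define IDX_MASK ((1UL << IDX_BITS) - 1)

  /* bump the update counter stored in the high bits on every assignment so
   * that a stale CAS fails even if the same thread index is stored again */
  static int commit_thread_idx(_Atomic unsigned long *slot, unsigned long expected,
                               unsigned long new_thr)
  {
      unsigned long new_val = ((expected & ~IDX_MASK) + (1UL << IDX_BITS))
                              | (new_thr & IDX_MASK);

      return atomic_compare_exchange_strong(slot, &expected, new_val);
  }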
During heavy accept competition, the CAS will occasionally fail and
we'll have to go through all the calculation again. While the first
two loops look heavy, they're almost never taken so they're quite
cheap. However the rest of the operation is heavy because we have to
consult connection counts and queue indexes for other threads, so
better double-check if the index is still valid before continuing.
Tests show that it's more efficient to retry half-way like this.
Instead of seeing each listener use its own thr_idx, let's use the same
for all those from a shard. It should provide more accurate and smoother
thread allocation.
Till now threads were assigned in listener_accept() to other threads of
the same group only, using a single group mask. Now that we have all the
relevant info (array of listeners of the same shard), we can spread the
thr_idx to cover all assigned groups. The thread indexes now contain the
group number in their upper bits, and the indexes run over the whole list
of threads, all groups included.
One particular subtlety here is that switching to a thread from another
group also means switching the group, hence the listener. As such, when
changing the group we need to update the connection's owner to point to
the listener of the same shard that is bound to the target group.
There has always been a race when checking the length of an accept queue
to determine which one is more loaded than another, because the head and
tail are read at two different moments. This is not required, we can merge
them as two 16 bit numbers inside a single 32-bit index that is always
accessed atomically. This way we read both values at once and always have
a consistent measurement.
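The idea reduces to packing both positions into one atomically read word, for
example (illustrative only):
  #include <stdatomic.h>
  #include <stdint.h>

  /* head and tail of the accept ring share a single 32-bit word so that one
   * atomic load always returns a consistent pair */
  static unsigned int accept_queue_len(_Atomic uint32_t *idx)
  {
      uint32_t v    = atomic_load(idx);
      uint32_t head = v >> 16;
      uint32_t tail = v & 0xffff;

      return (tail - head) & 0xffff;
  }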
Now it's possible for a bind line to span multiple thread groups. When
this happens, the first one will become the reference and will be entirely
set up, and the subsequent ones will be duplicated from this reference,
so that they can be registered in distinct groups. The reference is
always setup and started first so it is always available when the other
ones are started.
The doc was updated to reflect this new possibility with its limitations
and impacts, and the differences with the "shards" option.
It's not strictly necessary, but it's still better to avoid setting
up the same socket multiple times when it's being duplicated to a few
FDs. We don't change that for inherited ones however since they may
really need to be set up, so we only skip duplicated ones.
The different protocol's ->bind() function will now check the receiver's
RX_F_MUST_DUP flag to decide whether to bind a fresh new listener from
scratch or reuse an existing one and just duplicate it. It turns out
that the existing code already supports reusing FDs since that was done
as part of the FD passing and inheriting mechanism. Here it's not much
different, we pass the FD of the reference receiver, it gets duplicated
and becomes the new receiver's FD.
These FDs are also marked RX_F_INHERITED so that they are not exported
and avoid being touched directly (only the reference should be touched).
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
- the number of groups (= number of receivers)
- the number of threads (will be used for accept LB)
- pointer to the reference rx (to get the FD and to find all threads)
- pointers to the other members (to iterate over all threads)
For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid o(n^2) updates during
ordered deletes).
Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver shard_info==NULL is
alone and doesn't share its socket with another group.
Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.
For now the attach code is not called, but detach is already called
on delete.
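For reference, the stored information maps to a structure along these lines
(field names are indicative only):
  #define MAX_TGROUPS 16  /* illustrative bound */

  struct receiver;        /* opaque here */

  struct shard_info_sketch {
      unsigned int nbgroups;                  /* = number of receivers     */
      unsigned int nbthreads;                 /* used for accept LB        */
      struct receiver *ref;                   /* reference rx, owns the FD */
      struct receiver *members[MAX_TGROUPS];  /* all members, to iterate   */
  };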
This new algorithm for rebalancing incoming connections to multiple
threads is simpler and instead of considering the threads load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts where it's not always certain
that the least loaded thread will always be found.
There's a li_per_thread array in each listener for use with QUIC
listeners. Since thread groups were introduced, this array can be
allocated too large because global.nbthread entries are allocated for each
listener, while no more than MIN(nbthread,MAX_THREADS_PER_GROUP)
may be used by a single listener. This was because the global thread
ID is used as the index instead of the local ID (since a listener may
only be used by a single group). Let's just switch to local ID and
reduce the allocated size.
When migrating a quic_conn to another thread, we may need to also
switch the listener if the thread belongs to another group. When
this happens, the freshly created connection will already have the
target listener, so let's just pick it from the connection and use
it in qc_set_tid_affinity(). Note that it will be the caller's
responsibility to guarantee this.
Adding the possibility to publish an event using a struct wrapper
around existing SERVER events to provide additional contextual info.
Using the specific struct wrapper is not required: it is supported
to cast event data as a regular server event data struct so
that we don't break the existing API.
However, casting event data with a more explicit data type allows
to fetch event-only relevant hints.
Considering that srv_update_status() is now synchronous again since
3ff577e1 ("MAJOR: server: make server state changes synchronous again"),
and that we can easily identify if the update is from an operational
or administrative context thanks to "MINOR: server: pass adm and op cause
to srv_update_status()".
And given that administrative and operational updates cannot be cumulated
(since srv_update_status() is called synchronously and independently for
admin updates and state/operational updates, and the function directly
consumes the changes).
We split srv_update_status() in 2 distinct parts:
Either <type> is 0, meaning the update is an operational update which
is handled by directly looking at cur_state and next_state to apply the
proper transition.
Also, the check to prevent operational state from being applied
if MAINT admin flag is set is no longer needed given that the calling
functions already ensure this (ie: srv_set_{running,stopping,stopped})
Or <type> is 1, meaning the update is an administrative update, where
cur_admin and next_admin are evaluated to apply the proper transition and
deduce the resulting server state (next_state is updated implicitly).
Once this is done, both operations share a common code path in
srv_update_status() to update proxy and servers stats if required.
Thanks to this change, the function's behavior is much more predictable,
it is not an all-in-one function anymore. Either we apply an operational
change, or an administrative one. That's it, we cannot mix
the two since both code paths are now properly separated.
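The resulting shape of the function can be outlined as follows (hypothetical
names and a simplified signature, for illustration only):
  struct server_sketch {
      int cur_state, next_state;
      int cur_admin, next_admin;
  };

  enum srv_update_type { SRV_UPDATE_OP = 0, SRV_UPDATE_ADM = 1 };

  static void srv_update_status_sketch(struct server_sketch *s,
                                       enum srv_update_type type, int cause)
  {
      (void)cause;  /* consumed for logging in the real function */

      if (type == SRV_UPDATE_OP) {
          /* operational: look at s->cur_state / s->next_state and apply
           * the transition (callers already prevent this under MAINT) */
      }
      else {
          /* administrative: look at s->cur_admin / s->next_admin,
           * next_state is deduced implicitly */
      }

      /* common tail: update proxy and server stats if needed */
  }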
Operational and administrative state change causes are not propagated
through srv_update_status(), instead they are directly consumed within
the function to provide additional info during the call when required.
Thus, there is no valid reason for keeping adm and op causes within
server struct. We are wasting space and keeping unneeded complexity.
We now explicitly pass the change type (operational or administrative) and
associated cause to srv_update_status() so that no extra storage is
needed since those values are only relevant from srv_update_status().
Fixing function comments for the server state changing function since they
still refer to asynchronous propagation of server state which is no longer
in play.
Moreover, there were some mixups between running/stopping.
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".
While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup since some leftovers from asynchronous server
state change updates were left behind and resulted in some useless code
duplication, making the whole thing harder to maintain.
Two observations were made:
- by tracking down srv_set_{running, stopped, stopping} usage,
we can see that the <reason> argument is always a fixed statically
allocated string.
- check-related state change context (duration, status, code...) is
not used anymore since srv_append_status() directly extracts the
values from the server->check. This is pure legacy from when
the state changes were applied asynchronously.
To prevent code duplication, useless string copies and make the reason/cause
more exportable, we store it as an enum now, and we provide
srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check related) are now explicitly identified to
make consumers like srv_append_op_chg_cause() able to fetch checks info
from the server itself if they need to.
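A reduced sketch of the new storage and lookup (enum values and wording are
illustrative, not the exact ones):
  enum srv_op_st_chg_cause_sketch {
      SRV_OP_STCHGC_NONE = 0,
      SRV_OP_STCHGC_HEALTH,       /* health check result */
      SRV_OP_STCHGC_AGENT,        /* agent check result  */
      SRV_OP_STCHGC_STATS_WEB,    /* action from the stats web interface */
  };

  static const char *srv_op_st_chg_cause_sketch_str(enum srv_op_st_chg_cause_sketch c)
  {
      switch (c) {
      case SRV_OP_STCHGC_HEALTH:    return "health check";
      case SRV_OP_STCHGC_AGENT:     return "agent check";
      case SRV_OP_STCHGC_STATS_WEB: return "web interface action";
      default:                      return "";
      }
  }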
srv_append_status() has become a swiss-knife function over time.
It is used from server code and also from checks code, with various
inputs and distinct code paths, making it very hard to guess the
actual behavior of the function (resulting string output).
To simplify the logic behind it, we're dividing it in multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
Even though it doesn't look like it at first glance, this is more like
a cleanup than an actual code improvement:
Given that srv->adm_st_chg_cause has been used to exclusively store
static strings ever since it was implemented, we make the choice to
store it as an enum instead of a fixed-size string within server
struct.
This will allow to save some space in server struct, and will make
it more easily exportable (ie: event handlers) because of the
reduced memory footprint during handling and the ability to later get
the corresponding human-readable message when it's explicitly needed.
Now that we have a generic srv_lb_propagate(s) function, let's
use it each time we explicitly want to set the status down as
well.
Indeed, it is tricky to try to handle "down" case explicitly,
instead we use srv_lb_propagate() which will call the proper
function that will handle the new server state.
This will allow some code cleanup and will prevent any logic
error.
This commit depends on:
- "MINOR: server: propagate server state change to lb through single function"
Use a dedicated helper function to propagate server state change to
lb algorithms, since it is performed at multiple places within
srv_update_status() function.
Based on "BUG/MINOR: server: don't miss server stats update on server
state transitions", we're also taking advantage of the new centralized
logic to update down_trans server counter directly from there instead
of multiple places.
When restoring from a state file: the server "Status" reports weird values on
the html stats page:
"5s UP" becomes -> "? UP" after the restore
This is due to a bug in srv_state_srv_update(): when restoring the states
from a state file, we rely on date.tv_sec to compute the process-relative
server last_change timestamp.
This is wrong because everywhere else we use now.tv_sec when dealing
with last_change, for instance in srv_update_status().
date (which is Wall clock time) deviates from now (monotonic time) in the
long run.
They should not be mixed, and given that last_change is an internal time value,
we should rely on now.tv_sec instead.
last_change export through "show servers state" cli is safe since we export
a delta and not the raw time value in dump_servers_state():
srv_time_since_last_change = now.tv_sec - srv->last_change
--
While this bug affects all stable versions, it was revealed in 2.8 thanks
to 28360dc ("MEDIUM: clock: force internal time to wrap early after boot")
This is due to the fact that "now" immediately deviates from "date",
whereas in the past they had the same value when starting.
Thus prior to 2.8 the bug is trickier since it could take some time for
date and now to deviate sufficiently for the issue to arise, and instead
of reporting absurd values that are easy to spot it could just result in
last_change becoming inconsistent over time.
As such, the fix should be backported to all stable versions.
[for 2.2 the patch needs to be applied manually since
srv_state_srv_update() was named srv_update_state() and can be found in
server.c instead of server_state.c]
s->last_change and s->down_time were manually updated for each
effective server state change within srv_update_status().
This is rather error-prone, and as a result there were still some state
transitions that were not handled properly since at least 1.8.
ie:
- when transitioning from DRAIN to READY: downtime was updated
(which is wrong since a server in DRAIN state should not be
considered as DOWN)
- when transitioning from MAINT to READY: downtime was not updated
(this can be easily seen in the html stats page)
To fix these all at once, and prevent similar bugs from being introduced,
we centralize the server last_change and down_time stats logic at the end
of srv_update_status():
If the server state changed during the call, then it means that
last_change must be updated, with a special case when changing from
STOPPED state which means the server was previously DOWN and thus
downtime should be updated.
This patch depends on:
- "MINOR: server: explicitly commit state change in srv_update_status()"
This could be backported to all stable versions.
backend "down" stats logic has been duplicated multiple times in
srv_update_status(), resulting in the logic now being error-prone.
For example, the following bugfix was needed to compensate for a
copy-paste introduced bug: d332f139
("BUG/MINOR: server: update last_change on maint->ready transitions too")
While the above patch works great, we actually forgot to update the
proxy downtime like it is done for other down->up transitions...
This is simply illustrating that the current design is error-prone,
it is very easy to miss something in this area.
To properly update the proxy downtime stats on the maint->ready transition,
to cleanup srv_update_status() and to prevent similar bugs from being
introduced in the future, proxy/backend stats update are now automatically
performed at the end of the server state change if needed.
Thus we can remove existing updates that were performed at various places
within the function, this simplifies things a bit.
This patch depends on:
- "MINOR: server: explicitly commit state change in srv_update_status()"
This could be backported to all stable versions.
Backport notes:
2.2:
Replace
struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state)
by
struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)
As shown in 8f29829 ("BUG/MEDIUM: checks: a down server going to
maint remains definitely stucked on down state."), state changes
that don't result in explicit lb state change, require us to perform
an explicit server state commit to make sure the next state is
applied before returning from the function.
This is the case for server state changes that don't trigger lb logic
and only perform some logging.
This is quite error prone, we could easily forget a state change
combination that could result in next_state, next_admin or next_eweight
not being applied. (cur_state, cur_admin and cur_eweight would be left
with unexpected values)
To fix this, we explicitly call srv_lb_commit_status() at the end
of srv_update_status() to enforce the new values, even if they were
already applied. (when a state changes requires lb state update
an implicit commit is already performed)
Applying the state change multiple times is safe (since the next value
always points to the current value).
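The commit step itself can be pictured with a short sketch (simplified field
names, not the real server struct); applying it is idempotent because next_*
always holds the intended current value:

    struct srv_state_sk {
        int cur_state, next_state;
        int cur_admin, next_admin;
        int cur_eweight, next_eweight;
    };

    /* fold the pending values into the current ones; safe to call twice */
    static void srv_commit_sk(struct srv_state_sk *s)
    {
        s->cur_state   = s->next_state;
        s->cur_admin   = s->next_admin;
        s->cur_eweight = s->next_eweight;
    }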
Backport notes:
2.2:
Replace
struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state)
by
struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)
Report message for tracking servers completely leaving drain is wrong:
The check for "leaving drain .. via" never evaluates because the
condition !(s->next_admin & SRV_ADMF_FDRAIN) is always true in the
current block which is guarded by !(s->next_admin & SRV_ADMF_DRAIN).
For tracking servers that leave inherited drain mode, this results in the
following message being emitted:
"Server x/b is UP (leaving forced drain)"
Instead of:
"Server x/b is UP (leaving drain) via x/a"
To fix this: we check if FDRAIN is currently set, else it means that the
drain status is inherited from the tracked server (IDRAIN)
This regression was introduced with 64cc49cf ("MAJOR: servers: propagate server
status changes asynchronously."), thus it may be backported to all stable
versions.
'when' optional argument is provided to lua event handlers.
It is an integer representing the number of seconds elapsed since Epoch
and may be used in conjunction with lua `os.date()` function to provide
a custom format string.
For advanced async handlers only
(Registered using EVENT_HDL_ASYNC_TASK() macro):
event->when is provided as a struct timeval and fetched from 'date'
haproxy global variable.
Thanks to 'when', related event consumers will be able to timestamp
events, even if they don't work in real-time or near real-time.
Indeed, unlike sync or normal async handlers, advanced async handlers
could purposely delay the consumption of pending events, which means
that the date wouldn't be accurate if computed directly from within
the handler.
Add the ability to provide a cleanup function for event data passed
via the publishing function.
One use case could be the need to provide valid pointers in the safe
section of the data struct.
Cleanup function will be automatically called with data (or copy of data)
as argument when all handlers consumed the event, which provides an easy
way to release some memory or decrement refcounts to resources that were
provided through the data struct.
The data itself must not be freed by the cleanup function; it is handled
by the API.
This would allow passing large (allocated) data blocks through the data
struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA
size limit.
To do so, when publishing an event, where we would currently do:
struct event_hdl_cb_data_new_family event_data;
/* safe data, available from both sync and async contexts
* may not use pointers to short-living resources
*/
event_data.safe.my_custom_data = x;
/* unsafe data, only available from sync contexts */
event_data.unsafe.my_unsafe_data = y;
/* once data is prepared, we can publish the event */
event_hdl_publish(NULL,
EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
EVENT_HDL_CB_DATA(&event_data));
We could do:
struct event_hdl_cb_data_new_family event_data;
/* safe data, available from both sync and async contexts
* may not use pointers to short-living resources,
* unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
* consistency (ie: refcount)
*/
event_data.safe.my_custom_static_data = x;
event_data.safe.my_custom_dynamic_data = malloc(1);
/* unsafe data, only available from sync contexts */
event_data.unsafe.my_unsafe_data = y;
/* once data is prepared, we can publish the event */
event_hdl_publish(NULL,
EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));
With data_new_family_cleanup func which would look like this:
void data_new_family_cleanup(const void *data)
{
    const struct event_hdl_cb_data_new_family *event_data = data;

    /* some data members require specific cleanup once the event
     * is consumed
     */
    free(event_data->safe.my_custom_dynamic_data);
    /* don't ever free data! it is not ours */
}
Not sure if this feature will become relevant in the future, so I prefer not
to mention it in the doc for now.
But given that the implementation is trivial and does not put a burden
on the existing API, it's a good thing to have it there, just in case.
This commit does nothing that ought to be mentioned, except that
it adds missing comments and slightly moves some function calls
out of "sensitive" code in preparation of some server code refactors.
Changing hlua_event_hdl_cb_data_push_args() return type to void since it
does not return anything useful.
Also changing its name to hlua_event_hdl_cb_push_args() since it does more
than just pushing cb data argument (it also handles event type and mgmt).
Errors caught by the function are reported as lua errors.
Since "MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server"
we may now use proxy_uuid variable to perform proxy lookups when
handling a server event.
It is more reliable since proxy_uuid isn't subject to any size limitation.
Expose proxy_uuid variable in event_hdl_cb_data_server struct to
overcome proxy_name fixed length limitation.
proxy_uuid may be used by the handler to perform proxy lookups.
This should be preferred over lookups relying on proxy_name.
(proxy_name is suitable for printing / logging purposes but not for
ID lookups since it has a maximum fixed length)
srv_update_status() function comment says that the function "is designed
to be called asynchronously".
While this used to be true back then with 64cc49cf
("MAJOR: servers: propagate server status changes asynchronously.")
This is not true anymore since 3ff577e ("MAJOR: server: make server state changes
synchronous again")
Fixing the comment in order to better reflect current behavior.
Since 9f903af5 ("MEDIUM: log: slightly refine the output format of
alerts/warnings/etc"), messages generated by ha_{alert,warning,notice}
don't embed date/time information anymore.
Updating some old function comments that kept saying otherwise.
A BUG_ON crash can occur on qc_rcv_buf() if a Rx packet allocation
failed.
To fix this, datagrams are marked as consumed even if a fatal error
occurred during parsing. For the moment, only a Rx packet allocation
failure could provoke this. At this stage, it's unknown whether the datagram
was partially parsed or not at all, so it's better to discard it
completely.
This bug was detected using -dMfail argument.
This should be backported up to 2.7.
Properly initialize the el_th_ctx member first in qc_new_conn(). This
prevents a segfault in qc_detach_th_ctx_list() if the release has to be
called later due to a memory allocation failure in the function.
This should be backported up to 2.7.
Do not emit a CONNECTION_CLOSE on h3s allocation failure. Indeed, this
causes a crash as the calling function qcs_new() will also try to emit a
CONNECTION_CLOSE which triggers a BUG_ON() on qcc_emit_cc().
This was reproduced using -dMfail.
This should be backported up to 2.7.
Previously, if a STREAM frame could not be allocated for emission, a crash
would occur due to an ABORT_NOW() statement in _qc_send_qcs().
Replace this with proper error code handling. Each stream whose sending
fails is removed temporarily from qcc::send_list to a list local to
_qc_send_qcs(). Once emission has been conducted for all streams, failed
streams are reinserted into qcc::send_list. This avoids relooping on
failed streams in the second while loop at the end of _qc_send_qcs().
This crash was reproduced using -dMfail.
This should be backported up to 2.6.
On MUX initialization, the application layer is set up via
qcc_install_app_ops(). If this function fails, the MUX is deallocated and an
error is returned.
This code path causes a crash because the connection has already been
registered into the mux_stopping_data::list used for stopping idle frontend
conns.
To fix this, insert the connection later in qc_init(), once no error can
occur.
The crash was seen when closing the process with SIGUSR1, with a segfault
in mux_stopping_process(). This was reproduced using -dMfail.
This regression was introduced by the following patch :
commit b4d119f0c7
BUG/MEDIUM: mux-quic: fix crash on H3 SETTINGS emission
This should be backported up to 2.7.
Again, a now_ms variable value was used without the ticks API. It is used
to store the generation time of the Retry token to be received back
from the client.
Must be backported to 2.6 and 2.7.
As a server, an Initial packet does not contain a token but only the token length
field with zero as value. The remaining room was not checked before writing this
field.
Must be backported to 2.6 and 2.7.
Since this commit:
BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.
There are more chances that ack ranges may be removed from their trees when
building a packet. It is preferable to impose a limit on these trees. This
will be the subject of a next commit to come.
From now on, it is sufficient to stop deleting ack ranges from their trees.
Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were
there to do that.
Make qc_frm_len() support ACK frames and call it to ensure an ACK frame
may be added to a packet before building it.
Must be backported to 2.6 and 2.7.
Overriding global coroutine.create() function in order to link the
newly created subroutine with the parent hlua ctx.
(the hlua_gethlua() function, when called from a subroutine, will return the
hlua ctx on which coroutine.create() was performed, instead of NULL)
Doing so allows hlua_hook() function to support being called from
subroutines created using coroutine.create() within user lua scripts.
That is: the related subroutine will be immune to the forced-yield,
but it will still be checked against hlua timeouts. If the subroutine
fails to yield or finish before the timeout, the related lua handler will
be aborted (instead of going rogue unnoticed like it would be the case prior
to this commit)
When forcing a yield attempt from hlua_hook(), we should perform it on
the known hlua state, not on a potential substate created using
coroutine.create() from an existing hlua state from lua script.
Indeed, only true hlua coroutines will properly handle the yield and
perform the required timeout checks when returning in hlua_ctx_resume().
So far, this was not a concern because hlua_gethlua() would return NULL
if hlua_hook() is not directly being called from a hlua coroutine anyway.
But with this we're trying to make hlua_hook() ready for being called
from a subcoroutine which inherits from a parent hlua ctx.
In this case, no yield attempt will be performed, we will simply check
for hlua timeouts.
Not doing so would result in the timeout checks not being performed since
hlua_ctx_resume() is completely bypassed when yielding from the subroutine,
resulting in a user-defined coroutine potentially going rogue unnoticed.
Not all hlua "time" variables use the same time logic.
hlua->wake_time relies on ticks since it's meant to be used in conjunction
with task scheduling. Thus, it should be stored as a signed int and
manipulated using the tick api.
Adding a few comments about that to prevent mixups with hlua internal
timer api which doesn't rely on the ticks api.
The "burst" execution timeout applies to any Lua handler.
If the handler fails to finish or yield before the timeout is reached,
handler will be aborted to prevent thread contention, to prevent
traffic from not being served for too long, and ultimately to prevent
the process from crashing because of the watchdog kicking in.
Default value is 1000ms.
Combined with forced-yield default value of 10000 lua instructions, it
should be high enough to prevent any existing script breakage, while
still being able to catch slow lua converters or sample fetches
causing thread contention and risking the process stability.
Setting value to 0 completely bypasses this check. (not recommended but
could be required to restore original behavior if this feature breaks
existing setups somehow...)
No backport needed, although it could be used to prevent watchdog crashes
due to poorly coded (slow/cpu consuming) lua sample fetches/converters.
For non yieldable lua handlers (converters, fetches or yield
incompatible lua functions), current timeout detection relies on now_ms
thread local variable.
But within non-yieldable contexts, now_ms won't be updated if not by us
(because we're momentarily stuck in lua context so we won't
re-enter the polling loop, which is responsible for clock updates).
To circumvent this, clock_update_date(0, 1) was manually performed right
before now_ms is being read for the timeout checks.
But this fails to work consistently, because if no other concurrent
thread periodically runs clock_update_global_date(), which does happen if
we're the only active thread (nbthread=1 or low traffic), our
clock_update_date() call won't reliably update our local now_ms variable.
Moreover, clock_update_date() is not the right tool for this anyway, as
it was initially meant to be used from the polling context.
Using it could have negative impact on other threads relying on now_ms
to be stable. (because clock_update_date() performs global clock update
from time to time)
-> Introducing hlua multipurpose timer, which is internally based on
now_cpu_time_fast() that provides per-thread consistent clock readings.
Thanks to this new hlua timer API, hlua timeout logic is less error-prone
and more robust.
This allows the timeout detection to work as expected for both yieldable
and non-yieldable lua handlers.
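As a rough illustration of the timer logic, assuming a now_cpu_time_fast()-like
helper returning per-thread CPU time in nanoseconds (names below are
illustrative, not the actual hlua API):

    #include <stdint.h>

    struct hlua_timer_sk {
        uint64_t start_ns;   /* CPU time when the handler (re)started */
        uint64_t burst_ns;   /* maximum allowed burst, 0 = no limit */
    };

    static void timer_start_sk(struct hlua_timer_sk *t, uint64_t now_ns, uint64_t burst_ns)
    {
        t->start_ns = now_ns;
        t->burst_ns = burst_ns;
    }

    /* returns 1 if the handler exceeded its burst budget */
    static int timer_expired_sk(const struct hlua_timer_sk *t, uint64_t now_ns)
    {
        return t->burst_ns && (now_ns - t->start_ns) > t->burst_ns;
    }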
This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function"
While this could theoretically be backported to all stable versions,
it is advisable to avoid backports unless we're confident enough
since it could cause slight behavior changes (timing related) in
existing setups.
Same as now_cpu_time(), but for fast queries (less accurate).
It relies on now_cpu_time(), and now_mono_time_fast() is used
as a cache expiration hint to prevent now_cpu_time() from being
called too often since it is known to be quite expensive.
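The caching idea can be sketched as follows, assuming the coarse monotonic
clock is cheap while thread CPU time is expensive (generic POSIX calls, not the
actual haproxy helpers):

    #include <stdint.h>
    #include <time.h>

    static uint64_t ns_of_sk(struct timespec ts)
    {
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /* thread CPU time, refreshed at most once per millisecond of monotonic time */
    static uint64_t cpu_time_fast_sk(void)
    {
        static __thread uint64_t cached_cpu;
        static __thread uint64_t last_mono;
        struct timespec mono, cpu;

        clock_gettime(CLOCK_MONOTONIC, &mono);            /* cheap reference clock */
        if (!cached_cpu || ns_of_sk(mono) - last_mono >= 1000000ULL) {
            clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu); /* expensive, rate-limited */
            cached_cpu = ns_of_sk(cpu);
            last_mono = ns_of_sk(mono);
        }
        return cached_cpu;
    }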
Depends on commit "MINOR: clock: add now_mono_time_fast() function"
Same as now_mono_time(), but for fast queries (less accurate)
Relies on coarse clock source (also known as fast clock source on
some systems).
Fallback to now_mono_time() if coarse source is not supported on the system.
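A minimal sketch of the fallback logic, using the standard clock_gettime()
interface (not the actual haproxy implementation):

    #include <stdint.h>
    #include <time.h>

    static uint64_t mono_time_fast_sk(void)
    {
        struct timespec ts;

    #ifdef CLOCK_MONOTONIC_COARSE
        if (clock_gettime(CLOCK_MONOTONIC_COARSE, &ts) != 0)  /* coarse: cheap, tick accuracy */
    #endif
            clock_gettime(CLOCK_MONOTONIC, &ts);              /* fallback: precise clock */
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }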
Commit 5003ac7fe ("MEDIUM: config: set useful ALPN defaults for HTTPS
and QUIC") revealed a build dependency bug: if QUIC is not enabled,
cfgparse doesn't have any dependency on the SSL stack, so the various
ifdefs that try to check special conditions such as rejecting an H2
config with too small a bufsize, are silently ignored. This was
detected because the default ALPN string was not set and caused the
alpn regtest to fail without QUIC support. Adding openssl-compat to
the list of includes seems to be sufficient to have what we need.
It's unclear when this dependency was broken, it seems that even 2.2
didn't have an explicit dependency on anything SSL-related, though it
could have been inherited through other files (as happens with QUIC
here). It would be safe to backport it to all stable branches. The
impact is very low anyway.
The following commit introduced a regression :
commit 1a5cc19cec
MINOR: quic: adjust Rx packet type parsing
Since this commit, the qv variable was left NULL as the version is stored
directly in quic_rx_packet instance. In most cases, this only causes
traces to skip version printing. However, qv is dereferenced when
sending a Retry which causes a segfault.
To fix this, simply remove qv variable and use pkt->version instead,
both for traces and send_retry() invocation.
This bug was detected thanks to QUIC interop runner. It can easily be
reproduced by using quic-force-retry on the bind line.
This must be backported up to 2.7.
This commit makes sure that if there is no "alpn", "npn" nor "no-alpn"
setting on a "bind" line which corresponds to an HTTPS or QUIC frontend,
we automatically turn on "h2,http/1.1" as an ALPN default for an HTTP
listener, and "h3" for a QUIC listener. This simplifies the configuration
for end users since they won't have to explicitly configure the ALPN
string to enable H2, considering that at the time of writing, HTTP/1.1
represents less than 7% of the traffic on large infrastructures. The
doc and regtests were updated. For more info, refer to the following
thread:
https://www.mail-archive.com/haproxy@formilux.org/msg43410.html
While it does not have any effect, it's better not to try to setup an
ALPN callback nor to try to lookup algorithms when the configured ALPN
string is empty as a result of "no-alpn" being used.
It's possible to replace a previously set ALPN but not to disable ALPN
if it was previously set. The new "no-alpn" setting allows to disable
a previously set ALPN setting by preparing an empty one that will be
replaced and freed when the config is validated.
On sending path, a pending error can be promoted to a terminal error at the
endpoint level (SE_FL_ERR_PENDING to SE_FL_ERROR). When this happens, we
must propagate the error on the SC to be able to handle it at the stream
level and eventually forward it to the other side.
Because of this bug, it is possible to freeze sessions, for instance on the
CLI.
It is a 2.8-specific issue. No backport needed.
When compiled in debug mode, HAProxy prints a debug message at the end of
the cli I/O handler. It is pretty annoying and useless because we can
activate applet traces. Thus, just remove it.
When compiled in debug mode, HAProxy prints a debug message at the beginning
of assign_server(). It is pretty annoying and useless because, in debug
mode, we can activate stream traces. Thus, just remove it.
The commit 9704797fa ("BUG/MEDIUM: http-ana: Properly switch the request in
tunnel mode on upgrade") fixes the switch in TUNNEL mode, but only
partially. Because both channels are switched in TUNNEL mode at the same time on
one side, the channel's analyzers on the opposite side are not updated
accordingly. This prevents the tunnel timeout from being applied.
So instead of updating both sides at the same time, we only force the analysis
on the other side by setting the CF_WAKE_ONCE flag when a channel is switched in
TUNNEL mode. In addition, we must take care to forward all data if there is
no DATA TCP filter registered.
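The idea can be sketched as follows (simplified flags and types, not the actual
http_ana/channel code):

    #define CF_WAKE_ONCE_SK  0x1u

    struct chn_sk {
        unsigned int flags;
        int tunnel;
    };

    /* switch one side to TUNNEL mode and force one analysis pass on the other */
    static void switch_to_tunnel_sk(struct chn_sk *chn, struct chn_sk *other)
    {
        chn->tunnel = 1;
        other->flags |= CF_WAKE_ONCE_SK;
    }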
This patch is related to the issue #2125. It is 2.8-specific. No backport
needed.
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.
This should be backported up to 2.7 after the previous patch has also
been taken.
Before this patch, QUIC protocol used a custom add_listener callback.
This was because a quic_conn instance was allocated before accept. Its
thread affinity was fixed and could not be changed afterwards. The thread was
itself derived from the CID selected by the client, which prevented an even
distribution of QUIC connections on multiple threads.
A series of patches was introduced with a lot of changes. The most
important ones :
* removal of affinity between an encoded CID and a thread
* possibility to rebind a quic_conn on a new thread
Thanks to this, it's possible to suppress the custom add_listener
callback. Accept is conducted for QUIC protocol as with the others. A
less loaded thread is selected on listener_accept() and the connection
stack is bound to it. This operation implies that the quic_conn instance is
moved to the new thread using the set_affinity QUIC protocol callback.
To reactivate quic_conn instance after thread rebind,
qc_finalize_affinity_rebind() is called after accept on the new thread
by qc_xprt_start() through accept_queue_process() / session_accept_fd().
This should be backported up to 2.7 after a period of observation.
When a quic_conn instance is rebound on a new thread, its tasks and
tasklet are destroyed and new ones created. Its socket is also migrated
to a new thread, which stops reception on it.
To properly reactivate a quic_conn after rebind, wake up its tasks and
tasklet if they were active before thread rebind. Also reactivate
reading on the socket FD. These operations are implemented on a new
function qc_finalize_affinity_rebind().
This should be backported up to 2.7 after a period of observation.
qc_set_timer() function is used to rearm the timer for loss detection
and probing. Previously, the timer was always rearmed when the congestion window
was free due to a wrong interpretation of the RFC which mandates the
client to rearm the timer before handshake completion to avoid a
deadlock related to anti-amplification.
Fix this by removing this code from quic_pto_pktns(). This allows
qc_set_timer() to be reentrant and only activate the timer if needed.
The impact of this bug seems limited. It could probably cause the timer
task to be processed too frequently, which could cause too frequent
probing.
This change will allow reusing qc_set_timer() easily after quic_conn
thread migration. As such, the new timer task will be scheduled only if
needed.
This should be backported up to 2.6.
Implement a new function qc_set_tid_affinity(). This function is
responsible to rebind a quic_conn instance to a new thread.
This operation consists mostly of releasing existing tasks and tasklet
and allocating new instances on the new thread. If the quic_conn uses
its owned socket, it is also migrated to the new thread. The migration
is finally completed by updating the CID TID to the new thread. After
this step, the connection is thus accessible to the new thread and
cannot be accessed anymore on the old one without risking a race condition.
To ensure rebinding is either done completely or not at all, tasks and
tasklet are pre-allocated before all operations. If this fails, an error
is returned and rebinding is not done.
To destroy the older tasklet, its context is set to NULL before wake up.
In I/O callbacks, a new function qc_process() is used to check context
and free the tasklet if NULL.
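The pattern can be illustrated with the following sketch (stand-in types, not
the actual tasklet API):

    #include <stdlib.h>

    struct tasklet_sk {
        void *context;
    };

    /* I/O callback: a NULL context means the connection migrated away and the
     * old tasklet only has to free itself.
     */
    static struct tasklet_sk *qc_process_sk(struct tasklet_sk *t)
    {
        if (!t->context) {
            free(t);          /* stand-in for tasklet_free() */
            return NULL;
        }
        /* ... normal I/O processing on t->context ... */
        return t;
    }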
The thread rebinding can cause a race condition if the older thread's
quic_dghdlrs::dgrams list contains datagrams for the connection after
rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always
checks if the packet CID is still associated to the current thread or
not. In the latter case, no connection is returned and the new thread is
returned to allow redispatching the datagram to the new thread in a
thread-safe way.
This should be backported up to 2.7 after a period of observation.
When QUIC handshake is completed on our side, some frames are prepared
to be sent :
* HANDSHAKE_DONE
* several NEW_CONNECTION_ID with CIDs allocated
This step was previously executed in quic_conn_io_cb() directly after
CRYPTO frames parsing. This patch delays it to be completed after
accept. Special care has been taken to ensure it is still functional
with 0-RTT activated.
For the moment, this patch should have no impact. However, when
quic_conn thread migration on accept will be implemented, it will be
easier to remap only one CID to the new thread. New CIDs will be
allocated after migration on the new thread.
This should be backported up to 2.7 after a period of observation.
Define a new protocol callback set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection on the selected thread queue. If the
callback fails, accept is done locally.
This change will be useful for protocols with state allocated before
accept is done. For the moment, only QUIC protocol is concerned. This
will allow rebinding the quic_conn to a new thread depending on its
load.
This should be backported up to 2.7 after a period of observation.
Each quic_conn is inserted in an accept queue to allocate the upper
layers. This is done through a listener tasklet in
quic_sock_accept_conn().
This patch interrupts the accept process for a quic_conn in
closing/draining state. Indeed, this connection will soon be closed so
it's unnecessary to allocate a complete stack for it.
This patch will become necessary when thread migration is implemented.
Indeed, it won't be allowed to proceed to thread migration for a closing
quic_conn.
This should be backported up to 2.7 after a period of observation.
TID encoding in CID was removed by a recent change. It is now possible
to access the <tid> member stored in the quic_connection_id instance.
For unknown CID, a quick solution was to redispatch to the thread
corresponding to the first CID byte. This ensures that an identical CID
will always be handled by the same thread to avoid creating multiple
identical connections. However, this forces an uneven load distribution which
can be critical for QUIC handshake operation.
To improve this, remove the above constraint. An unknown CID is now
handled by its receiving thread. However, this means that if multiple
packets are received with the same unknown CID, several threads will try
to allocate the same connection.
To prevent this race condition, CID insertion in global tree is now
conducted first before creating the connection. This is a thread-safe
operation which can only be executed by a single thread. The thread
which has inserted the CID will then proceed to quic_conn allocation.
Other threads won't be able to insert the same CID: this will stop the
treatment of the current packet, which is redispatched to the now owning
thread.
This should be backported up to 2.7 after a period of observation.
CIDs were moved from a per-thread list to a global list instance. The
TID encoding is thus no longer needed.
This should be backported up to 2.7 after a period of observation.
Previously, quic_connection_id instances were stored in per-thread trees.
Datagrams were first dispatched to the correct thread using the encoded
TID before a tree lookup was done.
Remove these trees and replace them with a global list of 256 trees. A
CID uses the list index corresponding to its first byte. On datagram
dispatch, the CID is looked up in its tree and the TID is retrieved
using the new member quic_connection_id.tid. As such, a read-write lock
protects each tree instance. With 256 entries, it is expected that
contention should be reduced.
A new structure, quic_cid_tree, serves as a tree container associated with
its read-write lock. An API is implemented to ensure lock safety for
insert/lookup/delete operations.
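The container can be pictured as follows (the tree type is a placeholder;
haproxy uses its own ebtree implementation and locking primitives):

    #include <pthread.h>
    #include <stddef.h>

    struct cid_tree_sk {
        pthread_rwlock_t lock;   /* protects this tree only */
        void *root;              /* placeholder for the tree root */
    };

    static struct cid_tree_sk cid_trees_sk[256];

    /* pick the tree from the first byte of the CID */
    static struct cid_tree_sk *cid_tree_of_sk(const unsigned char *cid, size_t len)
    {
        return &cid_trees_sk[len ? cid[0] : 0];
    }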
This patch is a step forward to be able to break the affinity between a
CID and a TID encoded thread. This is required to be able to migrate a
quic_conn after accept to select a thread based on its load.
This should be backported up to 2.7 after a period of observation.
Remove <tid> member in quic_conn. This is moved to quic_connection_id
instance.
For the moment, this change has no impact. Indeed, qc.tid reference
could easily be replaced by tid as all of this work was already done on
the connection thread. However, it is planned to support quic_conn
thread migration in the future, so removal of qc.tid will simplify this.
This should be backported up to 2.7.
ODCID are never stored in the CID tree. Instead, we store our generated
CID which is directly derived from the CID using a hash function. This
operation is done via quic_derive_cid().
Previously, the generated CID was returned as a 64-bit integer. However,
this is cumbersome to convert to an array of bytes, which is the most
common CID representation. Adjust this by modifying the return type to a
quic_cid struct.
This should be backported up to 2.7.
qc_parse_hd_form() is the function used to parse the first byte of a
packet and return its type and version. Its API has been simplified with
the following changes :
* extra out parameters are removed (long_header and version). All infos
are now stored directly in quic_rx_packet instance
* a new dummy version is declared in quic_versions array with a 0 number
code. This can be used to match Version negotiation packets.
* a new default packet type is defined QUIC_PACKET_TYPE_UNKNOWN to be
used as an initial value.
Also, the function has been exported to an include file. This will be
useful to reuse it in quic-sock to parse the first packet of a
datagram.
This should be backported up to 2.7.
Two different structs exist for QUIC connection ID :
* quic_connection_id which represents a full CID with its sequence
number
* quic_cid which is just a buffer with a length. It is contained in the
above structure.
To better differentiate them, rename all quic_connection_id variable
instances to "conn_id" by contrast to "cid" which is used for quic_cid.
This should be backported up to 2.7.
Remove quic_conn instance as first parameter of
quic_stateless_reset_token_init() and quic_stateless_reset_token_cpy()
functions. It was only used for trace purpose.
The main advantage is that it will be possible to allocate a QUIC CID
without a quic_conn instance using new_quic_cid(), which is required to
first check if a CID already exists before allocating a connection.
This should be backported up to 2.7.
QUIC_LOCK label is never used. Indeed, lock usage is minimal on QUIC as
every connection is pinned to its owned thread.
This should be backported up to 2.7.
Adjust BUG_ON() statement to allow tasklet_wakeup_after() for tasklets
with tid pinned to -1 (the current thread). This is similar to
tasklet_wakeup().
This should be backported up to 2.6.
For a long time the maximum number of concurrent streams was set once for
both sides (front and back) while the impacts are different. This commit
allows it to be configured separately for each side. The older setting
remains the fallback choice when the other ones are not set.
For a long time the initial window size (per-stream size) was set once
for both directions, frontend and backend, resulting in a tradeoff between
upload speed and download fairness. This commit allows it to be configured
separately for each side. The older setting remains the fallback choice
when the other ones are not set.
In the same way than for a stream-connector attached to a mux, an EOS is now
propagated from an applet to its stream-connector. To do so, sc_applet_eos()
function is added.
Now that there is a SC flag to state the endpoint has reported an end-of-stream,
it is possible to distinguish an EOS from an abort at the stream-connector
level.
sc_conn_read0() function is renamed to sc_conn_eos() and it propagates an
EOS by setting SC_FL_EOS instead of SC_FL_ABRT_DONE. It only concerns
stream-connectors attached to a mux.
SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will
be used to distinguish the end of stream reported by the endpoint, via the
SE_FL_EOS flag, and the abort triggered by the stream, via the
SC_FL_ABRT_DONE flag.
In this patch, the flag is defined and is systematically tested everywhere
SC_FL_ABRT_DONE is tested. It should be safe because it is never set.
In the syslog applet, when there is no output data, nothing is performed and
the applet leaves by requesting more data. But it is an issue because a
client abort is only handled if it is reported with the last bytes of the
message. If the abort occurs after the message was handled, it is ignored.
The session remains open and inactive until the client timeout is
triggered. If no such timeout is configured, given that the default maxconn
is 10, all slots can quickly become busy and make the applet unresponsive.
To fix the issue, the best is to always try to read a message when the I/O
handler is called. This way, the abort can be handled. And if there is no
data, we leave as usual.
This patch should fix the issue #2112. It must be backported as far as 2.4.
Since the commit f2b02cfd9 ("MAJOR: http-ana: Review error handling during
HTTP payload forwarding"), during the payload forwarding, we are analyzing a
side, we stop to test the opposite side. It means when the HTTP request
forwarding analyzer is called, we no longer check the response side and vice
versa.
Unfortunately, since then, the HTTP tunneling is broken after a protocol
upgrade. Only the response is switched in TUNNEL mode. The request remains in
DONE state. As a consequence, data received from the server are forwarded to
the client but not data received from the client.
To fix the bug, when both sides are in DONE state, both are switched at the
same time in TUNNEL mode if it was requested. It is performed in the same way in
http_end_request() and http_end_response().
This patch should fix the issue #2125. It is 2.8-specific. No backport
needed.
As revealed by GH #2120 opened by @Tristan971, there are cases where ACKs
have to be sent without a packet to acknowledge because the ACK timer has
been triggered and the connection needs to probe the peer at the same time.
Thank you to @Tristan971 for having reported this issue.
Must be backported to 2.6 and 2.7.
It is the last commit on this subject. We stop using the SE_FL_ERROR flag from
the SC, except at the I/O level. Otherwise, we rely on the SC_FL_ERROR
flag. Now, there should be a real separation between SE flags and SC flags.
In the same way as the previous commit, we stop using the SE_FL_ERROR flag from
analyzers and their sub-functions. We now fully rely on SC_FL_ERROR to do so.
SE_FL_ERROR flag is no longer set when an error is detected during the
connection establishment. SC_FL_ERROR flag is set instead. So it is safe to
remove test on SE_FL_ERROR to detect connection establishment error.
We can now fully rely on SC_FL_ERROR flag from the stream. The first step is
to stop setting the SE_FL_ERROR flag. Only endpoints are responsible for setting
this flag. It was a design limitation. It is now fixed.
From the stream, when SE_FL_ERROR flag is tested, we now also test the
SC_FL_ERROR flag. The idea is to stop relying on the SE descriptor to detect
errors.
The flag SC_FL_ERROR is added to ack errors on the endpoint. When the
SE_FL_ERROR flag is detected on the SE descriptor, the corresponding flag is
set on the SC. The idea is to avoid, as far as possible, manipulating the SE
descriptor in upper layers and to know when an error in the endpoint is
handled by the SC.
For now, this flag is only set and cleared but never tested.
There is no reason to remove this flag. When the SC endpoint is reset, it is
replaced by a new one. The old one is released. It was useful when the new
endpoint inherited some flags from the old one. But it is no longer
performed. Thus there is no reason to still unset this flag.
When an applet is woken up, before calling its io_handler, we pretend it has
no more data to deliver. So, after the io_handler execution, it is a bug if
an applet states it has more data to deliver while the end of input is
reached.
So a BUG_ON() is added to be sure it never happens.
It is not the SC responsibility to report errors on the SE descriptor. It is
the endpoint responsibility. It must switch SE_FL_ERR_PENDING into
SE_FL_ERROR if the end of stream was detected. It can even be considered as
a bug if it is not done by the endpoint.
So now, on sending path, a BUG_ON() is added to abort if SE_FL_EOS and
SE_FL_ERR_PENDING flags are set but not SE_FL_ERROR. It is truly important
to handle this case in the endpoint to be able to properly shut the endpoint
down.
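The check can be expressed roughly as below (arbitrary flag values, and
assert() standing in for BUG_ON()):

    #include <assert.h>

    #define SE_FL_EOS_SK          0x1u
    #define SE_FL_ERR_PENDING_SK  0x2u
    #define SE_FL_ERROR_SK        0x4u

    static void check_se_flags_sk(unsigned int fl)
    {
        /* EOS + pending error without a terminal error is an endpoint bug */
        assert(!((fl & SE_FL_EOS_SK) && (fl & SE_FL_ERR_PENDING_SK) &&
                 !(fl & SE_FL_ERROR_SK)));
    }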
SE_FL_ERR_PENDING is used to report an error on the write side. But it is
not a terminal error. Some incoming data may still be available. In the cli
analyzers, it is important to not close the stream when this flag is
set. Otherwise the response to a command can be truncated. It is probably
hard to observe. But it remains a bug.
While this patch could be backported to 2.7, there is no real reason to do
so, except if someone reports a bug about truncated responses.
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for reads but aborts. For now this flag is set when a read0 is
detected. It is of course not accurate. This will be changed later.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutw_now() is replaced by
sc_schedule_shutdown(). The request channel is replaced by the front SC and
the response is replaced by the back SC.
Because shutdowns for reads are now considered as aborts, the shutdowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replaced by the back SC.
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
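A small sketch of the resulting behavior (simplified flags, not the real
channel API):

    #define CF_SHUT_SCHEDULED_SK  0x1u
    #define CF_AUTO_CLOSE_SK      0x2u

    struct channel_sk {
        unsigned int flags;
    };

    /* aborting a channel now implies enabling auto-close on it */
    static void channel_abort_sk(struct channel_sk *chn)
    {
        chn->flags |= CF_SHUT_SCHEDULED_SK;
        chn->flags |= CF_AUTO_CLOSE_SK;
    }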
First, it is useless to abort both channels explicitly. For HTTP streams,
http_reply_and_close() is called. This function already takes care of aborting
processing. For TCP streams, we can rely on stream_retnclose().
To set termination flags, we can also rely on http_set_term_flags() for HTTP
streams and sess_set_term_flags() for TCP streams. Thus no reason to handle
them by hand.
At the end, the error handling after filters evaluation is now quite simple.
We erroneously thought that an attempt to receive data was not possible if the SC
was waiting for more room in the channel buffer. A BUG_ON() was added to detect
bugs. And in fact, it is possible.
The regression was added in commit 341a5783b ("BUG/MEDIUM: stconn: stop to
enable/disable reads from streams via si_update_rx").
This patch should fix the issue #2115. It must be backported if the commit
above is backported.
A regression was introduced when the stream's timeouts were refactored. Write
timeouts are not tested in the right order. When the timeouts of the front SC
are handled, we must then test the read timeout on the request channel and
the write timeout on the response channel. But the write timeout is tested on
the request channel instead. On the back SC, the same mix-up is performed.
We must be careful to handle timeouts before checking channel flags. To
avoid any confusion, all timeouts are handled first, on front and back SCs.
Then the flags of both channels are tested.
It is a 2.8-specific issue. No backport needed.
There is a bug at the beginning of process_stream(). The SE_FL_ERROR flag is
tested against backend stream-connector's flags instead of its SE
descriptor's flags. It is an old typo, introduced when the stream-interfaces
were replaced by the conn-streams.
This patch must be backported as far as 2.6.
This bug arrived with this commit:
MEDIUM: quic: Ack delay implementation
After having probed the Handshake packet number space, one must not select the
Application encryption level to continue trying to build packets as is done
when the connection is not probing. Indeed, if the ACK timer has been triggered
in the meantime, the packet builder will try to build a packet at the Application
encryption level to acknowledge the received packet. But there is very often
no 01RTT packet to acknowledge when the connection is probing before the
handshake is completed. This triggers a BUG_ON() in qc_do_build_pkt() which
checks that the tree of ACK ranges to be used is not empty.
Thank you to @Tristan971 for having reported this issue in GH #2109.
Must be backported to 2.6 and 2.7.
qel->pktns->tx.pto_probe is set to 0 after having prepared a probing
datagram. There is no reason to check this parameter. Furthermore
it is always 0 when the connection does not probe the peer.
Must be backported to 2.6 and 2.7.
As reported by @Tristan971 in GH #2116, the congestion control window could be zero
due to an inversion in the code about the reduction factor to be applied.
On a new loss event, it must be applied to the slow start threshold and the window
should never be below ->min_cwnd (2*max_udp_payload_sz).
The same issue exists in both the newReno and cubic algorithms. Furthermore,
in newReno, only the threshold was decremented.
Must be backported to 2.6 and 2.7.
Add two missing checks to avoid subtracting too big a value from a smaller
one. In this case the resulting wrapped huge values could be passed to the
function which has to remove the last range of a tree of ACK ranges, as the
encoded limit size not to go below, cancelling the ACK ranges deletion. The
consequence could be that no ACK was sent.
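The missing checks boil down to guarding unsigned subtractions, something
like:

    #include <stddef.h>

    /* returns a - b, or 0 when the subtraction would wrap on unsigned types */
    static size_t safe_sub_sk(size_t a, size_t b)
    {
        return (a > b) ? a - b : 0;
    }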
Must be backported to 2.6 and 2.7.
qc_may_build_pkt() has been modified several times regardless of the conditions
the functions it is supposed to allow to send packets (qc_build_pkt()/qc_do_build_pkt())
really use to finally send packets just after having received others, leading
to contradictions and possible very long loops sending empty packets (PADDING-only
packets) because qc_may_build_pkt() could allow qc_build_pkt()/qc_do_build_pkt()
to build a packet, while the latter did nothing except sending PADDING frames
because, from its point of view, there was nothing to send.
From now on, it is the job of qc_may_build_pkt() to decide if there are
packets to send just after having received others AND to provide this information
to qc_build_pkt()/qc_do_build_pkt().
Note that the unique case where the acknowledgements are completely ignored is
when the endpoint must probe. But at least this is when sending at most two datagrams!
This commit also fixes the issue reported by Willy about a very low throughput
performance when the client serialized its requests.
Must be backported to 2.7 and 2.6.
This should help in diagnosing issues.
Some adjustments have to be done to avoid dereferencing quic_conn objects from
TRACE_*() calls.
Must be backported to 2.7 and 2.6.
Do not ignore very short RTTs (less than 1ms) before computing the smoothed
RTT initializing it to an "infinite" value (UINT_MAX).
Must be backported to 2.7 and 2.6.
Add the number of lost packets and the maximum congestion control window computed
by the algorithms to "show quic".
Same thing for the traces of the existing congestion control algorithms.
Must be backported to 2.7 and 2.6.
In fd_update_events(), we loop until there's no bit in the running_mask
that is not in the thread_mask. Problem is, the thread sets its
running_mask bit before that loop, and so if 2 threads do the same, and
a 3rd one just closes the FD and sets the thread_mask to 0, then
running_mask will always be non-zero, and we will loop forever. This is
trivial to reproduce when using a DNS resolver that will just answer
"port unreachable", but could theoretically happen with other types of
file descriptors too.
To fix that, just don't bother looping if we're no longer in the
thread_mask, if that happens we know we won't have to take care of the
FD, anyway.
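The fix amounts to adding one condition to the wait loop; a simplified sketch
with illustrative names (not the actual fd code):

    /* wait until no foreign running bit remains, but give up immediately if
     * our bit was removed from thread_mask (the FD is no longer ours).
     */
    static void wait_running_sk(const volatile unsigned long *running_mask,
                                const volatile unsigned long *thread_mask,
                                unsigned long tid_bit)
    {
        while ((*thread_mask & tid_bit) && (*running_mask & ~*thread_mask))
            ; /* spin */
    }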
This should be backported to 2.7, 2.6 and 2.5.
Setting "shards by-group" will create one shard per thread group. This
can often be a reasonable tradeoff between a single one that can be
suboptimal on CPUs with many cores, and too many that will eat a lot
of file descriptors. It was shown to provide good results on a 224
thread machine, with a distribution that was even smoother than the
system's since here it can take into account the number of connections
per thread in the group. Depending on how popular it becomes, it could
even become the default setting in a future version.
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
fd_migrate_on() can be used to migrate an existing FD to any thread, even
one belonging to a different group from the current one and from the
caller's. All that is needed is to make sure the FD is still valid when
the operation is performed (which is the case when such operations happen).
This is potentially slightly expensive since it locks the tgid during the
delicate operation, but it is normally performed only from an owning
thread to offer the FD to another one (e.g. reassign a better thread upon
accept()).
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will
allow to quickly spot the presence of any such group in a set.
Alert when the len argument of a stick table type contains incorrect
characters.
Replace atol by strtol.
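The stricter parsing can be sketched with strtol() and an end pointer
(illustrative helper, not the actual parser):

    #include <errno.h>
    #include <stdlib.h>

    /* returns 0 on success and stores the value, -1 on invalid input */
    static int parse_len_arg_sk(const char *arg, long *out)
    {
        char *end;

        errno = 0;
        *out = strtol(arg, &end, 10);
        if (errno || end == arg || *end != '\0')
            return -1;   /* overflow, empty string or trailing characters */
        return 0;
    }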
Could be backported to all maintained versions.
It was missing from the output but is sometimes convenient to observe
and understand how incoming connections are distributed. The CPU usage
is reported as the instant measurement of 100-idle_pct for each thread,
and the average value is shown for the aggregated value.
This could be backported as it's helpful in certain troubleshooting
sessions.
As the ACK frames are not added to the packet list of ack-eliciting frames,
they could not be traced. But there is a flag to identify such packets.
Let's use it to add this information to the traces of TX packets.
Must be backported to 2.6 and 2.7.
Dump at proto level the packet information when its header protection was removed.
Remove the no longer used qpkt_trace variable.
Must be backported to 2.7 and 2.6.
It is possible that the handshake was not confirmed and there was no more
packet in flight to probe with. In this case the server must wait for
the client to be unblocked without probing any packet number space contrary
to what was revealed by interop tests as follows:
[01|quic|2|uic_loss.c:65] TX loss pktns : qc@0x7fac301cd390 pktns=I pp=0
[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=H pp=0 tole=-102ms
[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=01RTT pp=0 if=1054 tole=-1987ms
[01|quic|5|uic_loss.c:73] quic_loss_pktns(): leaving : qc@0x7fac301cd390
[01|quic|5|uic_loss.c:91] quic_pto_pktns(): entering : qc@0x7fac301cd390
[01|quic|3|ic_loss.c:121] TX PTO handshake not already completed : qc@0x7fac301cd390
[01|quic|2|ic_loss.c:141] TX PTO : qc@0x7fac301cd390 pktns=I pp=0 dur=83ms
[01|quic|5|ic_loss.c:142] quic_pto_pktns(): leaving : qc@0x7fac301cd390
[01|quic|3|c_conn.c:5179] needs to probe Initial packet number space : qc@0x7fac301cd390
This bug was not visible before this commit:
BUG/MINOR: quic: wake up MUX on probing only for 01RTT
This means that before it, one could do bad things (probing the 01RTT packet number
space before the handshake was confirmed).
Must be backported to 2.7 and 2.6.
When end-of-stream is reported by a H2 stream, we must take care to also
report an error if end-of-input was not reported. Indeed, it is now
mandatory to set SE_FL_EOI or SE_FL_ERROR flags when SE_FL_EOS is set.
It is a 2.8-specific issue. No backport needed.
When a TCP connection is first upgraded to H1 then to H2, the stream-connector,
created by the PT mux, must be destroyed because the H2 mux cannot inherit
from it. When it is performed, the SE_FL_EOS flag is set but SE_FL_EOI must
also be set. It is now required to never set SE_FL_EOS without SE_FL_EOI or
SE_FL_ERROR.
It is a 2.8-specific issue. No backport needed.
Timeouts for dynamic resolutions are not handled at the stream level but by
the resolvers themselves. It means there is no connect, client and server
timeouts defined on the internal proxy used by a resolver.
While it is not an issue for DNS resolution over UDP, it can be a problem
for resolution over TCP. New sessions are automatically created when
required, and killed on excess. But only established connections are
considered. Connecting ones are never killed. Because there is no connect
timeout, we rely on the kernel to report a connection error. And this may be
quite long.
Because resolutions are periodically triggered, this may lead to an excess
of unusable sessions in connecting state. This also prevents HAProxy from
quickly exiting on soft-stop. It is annoying, especially because there is no
reason to not set a connect timeout.
So to mitigate the issue, we now use the "resolve" timeout as connect
timeout for the internal proxy attached to a resolver.
This patch should be backported as far as 2.4.
Thanks to previous commit ("BUG/MEDIUM: dns: Kill idle DNS sessions during
stopping stage"), DNS idle sessions are killed on stopping staged. But the
task responsible to kill these sessions is running every 5 seconds. It
means, when HAProxy is stopped, we can observe a delay before the process
exits.
To reduce this delay, when the resolvers task is executed, all DNS idle
tasks are woken up.
This patch must be backported as far as 2.6.
There is no server timeout for DNS sessions over TCP. It means an idle session
cannot be killed by itself. There is a task running periodically, every 5s,
to kill the excess of idle sessions. But the last one is never
killed. During the stopping stage, it is an issue since the dynamic
resolutions are no longer performed (2ec6f14c "BUG/MEDIUM: resolvers:
Properly stop server resolutions on soft-stop").
Before the above commit, during stopping stage, the DNS sessions were killed
when a resolution was triggered. Now, nothing kills these sessions. This
prevents the process from finishing on soft-stop.
To fix this bug, the task killing the excess of idle sessions now kills all
idle sessions during the stopping stage.
This patch must be backported as far as 2.6.
When the log applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the stats applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the http-client applet is executed while a shut is pending, the
remaining output data must always be consumed. Otherwise, this can prevent
the stream from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the cli applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
This patch should fix the issue #2107. It is 2.8-specific. No backport
needed.
An applet must never set SE_FL_EOS flag without SE_FL_EOI or SE_FL_ERROR
flags. Here, SE_FL_EOI flag was missing for "quit" or "_getsocks"
commands. Indeed, these commands are terminal.
This bug triggers a BUG_ON() recently added.
This patch is related to the issue #2107. It is 2.8-specific. No backport
needed.
When calls to strcpy() were replaced with calls to strlcpy2(), one of them
was replaced incorrectly, and the source and size were inverted. Correct that.
This should fix issue #2110.
These ones are generally harmless on modern compilers because the
compiler checks them. While gcc optimizes them away without even
referencing strcpy(), clang prefers to call strcpy(). Nevertheless they
prevent from enabling stricter checks so better remove them altogether.
They were all replaced by strlcpy2() and the size of the destination
which is always known there.
strcpy() is quite nasty but tolerable to copy constants, but here
it copies a variable path into a node in a code path that's not
trivial to follow given that it takes the node as the result of
a tree lookup. Let's get rid of it and mention where the entry
is retrieved.
As every time strncat() is used, it's wrong, and this one is no exception.
Users often think that the length applies to the destination except it
applies to the source and makes it hard to use correctly. The bug did not
have an impact because the length was preallocated from the sum of all
the individual lengths as measured by strlen() so there was no chance one
of them would change in between. But it could change in the future. Let's
fix it to use memcpy() instead for strings, or byte copies for delimiters.
No backport is needed, though it can be done if it helps to apply other
fixes.
Add code so that compression can be used for requests as well.
New compression keywords are introduced :
"direction" that specifies what we want to compress. Valid values are
"request", "response", or "both".
"type-req" and "type-res" define content-type to be compressed for
requests and responses, respectively. "type" is kept as an alias for
"type-res" for backward compatibilty.
"algo-req" specifies the compression algorithm to be used for requests.
Only one algorithm can be provided.
"algo-res" provides the list of algorithm that can be used to compress
responses. "algo" is kept as an alias for "algo-res" for backward
compatibility.
Make provision for being able to store both compression algorithms and
content-types to compress for both requests and responses. For now only
the response ones are used.
Make provision for storing the compression algorithm and the compression
context twice, one for requests, and the other for responses. Only the
response ones are used for now.
When a trace message for an applet is dumped, if the SC exists, the stream
always exists too. There is no way to attach an applet to a health-check.
So, we can use the unsafe version __sc_strm() to get the stream.
This patch is related to #2106. Not sure it will be enough for
Coverity. However, there is no bug here.
On startup/reload, startup_logs_init() will try to export the startup logs shm
file descriptor through the internal HAPROXY_STARTUPLOGS_FD env variable.
While memprintf() is used to prepare the string to be exported via
setenv(), str_fd argument (first argument passed to memprintf()) could
be non NULL as a result of HAPROXY_STARTUPLOGS_FD env variable being
already set.
Indeed: str_fd is already used earlier in the function to store the result
of getenv("HAPROXY_STARTUPLOGS_FD").
The issue here is that memprintf() is designed to free the 'out' argument
if out != NULL, and here we don't expect str_fd to be freed since it was
provided by getenv(); freeing it would result in a memory violation.
To prevent any invalid free, we must ensure that str_fd is set to NULL
prior to calling memprintf().
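A hedged sketch of the pattern, assuming a memprintf()-like helper which frees
its output argument when it is non-NULL before allocating the new string
(simplified stand-ins, not the actual startup-logs code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* stand-in for memprintf(): frees *out if set, then stores a fresh string */
    static char *memprintf_sk(char **out, int fd)
    {
        char buf[32];

        snprintf(buf, sizeof(buf), "%d", fd);
        free(*out);
        *out = strdup(buf);
        return *out;
    }

    static void export_fd_sk(int fd)
    {
        char *str_fd = getenv("HAPROXY_STARTUPLOGS_FD"); /* not ours to free */

        /* ... str_fd may be parsed here to reuse an inherited fd ... */
        str_fd = NULL;                 /* the fix: forget the getenv() pointer */
        memprintf_sk(&str_fd, fd);
        if (str_fd) {
            setenv("HAPROXY_STARTUPLOGS_FD", str_fd, 1);
            free(str_fd);
        }
    }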
This must be backported in 2.7 with eba6a54cd4 ("MINOR: logs: startup-logs
can use a shm for logging the reload")
People who use HAProxy as a process 1 in containers sometimes start
other things from the program section. This is still not recommended as
the master process has minimal features regarding process management.
Environment variables are still inherited, even internal ones.
Since 2.7, it could provoke a crash when inheriting the
HAPROXY_STARTUPLOGS_FD variable.
Note: for future releases it should be better to clean the env and set
a list of variables to be exported. We need to determine which variables
are used by users before.
Must be backported in 2.7.
Previously, ODCID were concatenated with the client address. This was
done to prevent a collision between two endpoints which used the same
ODCID.
Thanks to the two previous patches, the first CID generated for a
connection is now directly derived from the client ODCID using a hash
function which uses the client source address for the same purpose. Thus,
it is now unneeded to concatenate the client address to the <odcid>
quic-conn member.
This change makes it possible to simplify the quic_cid structure management and
reduce its size which is important as it is embedded several times in
various structures such as quic_conn and quic_rx_packet.
This should be backported up to 2.7.
First connection CID generation has been altered. It is now directly
derived from client ODCID since previous commit :
commit 162baaff7a
MINOR: quic: derive first DCID from client ODCID
This patch removes the ODCID tree which is now unneeded. On connection
lookup via CID, if a DCID is not found the hash derivation is performed
for an INITIAL/0-RTT packet only. In case a client has used an ODCID
multiple times, this will allow retrieving our generated DCID in the
CID tree without storing the ODCID node.
The impact of these two combined patches is that they may slightly improve
the haproxy memory footprint by removing a tree node from the quic_conn
structure. The cpu cost induced by hash derivation should only be
paid a few times per connection as the client will start to
use our generated CID as soon as it receives it.
This should be backported up to 2.7.
Change the generation of the first CID of a connection. It is directly
derived from the client ODCID using a 64-bits hash function. Client
address is added to avoid collision between clients which could use the
same ODCID.
For the moment, this change has no functional impact. However, it will be
directly used for the next commit to be able to remove the ODCID tree.
This should be backported up to 2.7.
This is due to this commit:
MINOR: quic: Add trace to debug idle timer task issues
which was added without having been tested at developer level.
<qc> was dereferenced after having been released by qc_conn_release().
Set qc to NULL value after having been released to forbid its dereferencing.
Add a check for qc->idle_timer_task in the traces added by the mentioned
commit above to prevent its dereferencing if NULL.
Take the opportunity of this patch to modify trace events from
QUIC_EV_CONN_SSLALERT to QUIC_EV_CONN_IDLE_TIMER.
Must be backported to 2.6 and 2.7.
The HTTP message must remain in BODY state during the analysis, to be able
to report accurate termination state in logs. It is also important to know
the HTTP analysis is still in progress. Thus, when we are waiting for the
message payload, the message is no longer switched to DATA state. This was
used to not process "Expect: " header at each evaluation. But thanks to the
previous patch, it is no longer necessary.
This patch also fixes a bug in the lua filter api. Some functions must be
called during the message analysis and not during the payload forwarding. It
is not valid to try to manipulate headers during the forward stage because
headers are already forwarded. We rely on the message state to detect
errors. So the api was unusable if a "wait-for-body" action was used.
This patch should fix issue #2093. It relies on the commit:
* MINOR: http-ana: Add a HTTP_MSGF flag to state the Expect header was checked
Both must be backported as far as 2.5.
HTTP_MSGF_EXPECT_CHECKED is now set on the request message to know the
"Expect: " header was already handled, if any. The flag is set from the
moment we try to handle the header to send a "100-continue" response,
whether it was found or not.
This way, when we are waiting for the request payload, thanks to this flag,
we try to handle the "Expect: " header only once. Before, it was performed
by changing the message state from BODY to DATA. But this has some side
effects and it is not accurate. So, it is better to rely on a flag to do so.
Now that event_hdl api is properly implemented in hlua, we may add the
per-server event subscription in addition to the global event
subscription.
Per-server subscription allows being notified for events related to a
single server. It is useful to track server UP/DOWN and DEL events.
It works exactly like core.event_sub() except that the subscription
will be performed within the server dedicated subscription list instead
of the global one.
The callback function will only be called for server events affecting
the server from which the subscription was performed.
Regarding the implementation, it is pretty trivial at this point, we add
more doc than code this time.
Usage examples have been added to the (lua) documentation.
Now that the event handler API is pretty mature, we can expose it in
the lua API.
Introducing the core.event_sub(<event_types>, <cb>) lua function that
takes an array of event types <event_types> as well as a callback
function <cb> as argument.
The function returns a subscription <sub> on success.
Subscription <sub> allows you to manage the subscription from anywhere
in the script.
To this day only the sub->unsub method is implemented.
The following event types are currently supported:
- "SERVER_ADD": when a server is added
- "SERVER_DEL": when a server is removed from haproxy
- "SERVER_DOWN": server states goes from up to down
- "SERVER_UP": server states goes from down to up
As for the <cb> function: it will be called when one of the registered
event types occur. The function will be called with 3 arguments:
cb(<event>,<data>,<sub>)
<event>: event type (string) that triggered the function.
(could be any of the types used in <event_types> when registering
the subscription)
<data>: data associated with the event (specific to each event family).
For "SERVER_" family events, server details such as server name/id/proxy
will be provided.
If the server still exists (not yet deleted), a reference to the live
server is provided to spare you from an additional lookup if you need
to have direct access to the server from lua.
<sub> refers to the subscription. In case you need to manage it from
within an event handler.
(It refers to the same subscription that the one returned from
core.event_sub())
Subscriptions are per-thread: the thread that will be handling the
event is the one who performed the subscription using
core.event_sub() function.
Each thread treats events sequentially, it means that if you have,
let's say SERVER_UP, then SERVER_DOWN in a short timelapse, then your
cb function will first be called with SERVER_UP, and once you're done
handling the event, your function will be called again with SERVER_DOWN.
This is to ensure event consistency when it comes to logging / triggering
logic from lua.
Your lua cb function may yield if needed, but you're encouraged to process
the event as fast as possible to prevent the event queue from growing.
To prevent abuses, if the event queue for the current subscription goes
over 100 unconsumed events, the subscription will pause itself
automatically for as long as it takes for your handler to catch up.
This would lead to events being missed, so a warning will be emitted in
the logs to inform you about that. This is not something you want to let
happen too often, it may indicate that you subscribed to an event that
is occurring too frequently and/or that your callback function is too
slow to keep up the pace and you should review it.
If you want to do some parallel processing because your callback
functions are slow: you might want to create subtasks from lua using
core.register_task() from within your callback function to perform the
heavy job in a dedicated task and allow remaining events to be processed
more quickly.
Please check the lua documentation for more information.
Adding alternative findserver() functions to be able to perform a
unique match based on name or puid and by leveraging revision id (rid)
to make sure the function won't match with a new server reusing the
same name or puid of the "potentially deleted" server we were initially
looking for.
For example, if you were in the position of finding a server based on
a given name provided to you by a different context:
Since dynamic servers were implemented, between the time the name was
picked and the time you will perform the findserver() call some dynamic
server deletions/additions could have been performed in the meantime.
In such cases, findserver() could return a new server that re-uses the
name of a previously deleted server. Depending on your needs, it could
be perfectly fine, but there are some cases where you want to lookup
the original server that was provided to you (if it still exists).
While working on event handling from lua, the need for a pause/resume
function to temporarily disable a subscription was raised.
We solve this by introducing the EHDL_SUB_F_PAUSED flag for
subscriptions.
The flag is set via _pause() and cleared via _resume(), and it is
checked prior to notifying the subscription in publish function.
Pause and Resume functions are also available via lookups for
identified subscriptions.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
Use event_hdl_async_equeue_size() in advanced async task handler to
get the near real-time event queue size.
By near real-time, you should understand that the queue size is not
updated during element insertion/removal, but shortly before insertion
and shortly after removal, so the size should reflect the approximate
queue size at a given time but should definitely not be used as a
unique source of truth.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
advanced async mode (EVENT_HDL_ASYNC_TASK) provided full support for
custom tasklets registration.
Due to the similarities between tasks and tasklets, it may be useful
to use the advanced mode with an existing task (not a tasklet).
While the API did not explicitly disallow this usage, things would
get bad if we try to wake up a task using tasklet_wakeup() for notifying
the task about new events.
To make the API support both custom tasks and tasklets, we use the
TASK_IS_TASKLET() macro to call the proper waking function depending
on the task's type:
- For tasklets: we use tasklet_wakeup()
- For tasks: we use task_wakeup()
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
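A hedged sketch of the dispatch described above, built only from the names
mentioned in this commit and assuming the haproxy source tree (the wake reason
is an arbitrary choice for the sketch):

  #include <haproxy/task.h>

  /* wake either a tasklet or a plain task to notify it about new events */
  static void event_hdl_notify_sketch(struct task *t)
  {
          if (TASK_IS_TASKLET(t))
                  tasklet_wakeup((struct tasklet *)t);
          else
                  task_wakeup(t, TASK_WOKEN_OTHER);
  }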
In _event_hdl_publish(), when publishing an event to async handler(s),
async_data is allocated only once and then relies on a refcount
logic to reuse the same data block for multiple async event handlers.
(this allows saving a significant amount of memory)
Because the refcount is first set to 0, there is a small race where
the consumers could consume async data (async data refcount reaching 0)
before publishing is actually over.
The consequence is that async data may be freed by one of the consumers
while we still rely on it within _event_hdl_publish().
This was discovered by chance when stress-testing the API with multiple
async handlers registered to the same event: some of the handlers were
notified about a new event for which the event data was already freed,
resulting in invalid reads and/or segfaults.
To fix this, we first set the refcount to 1, assuming that the
publish function relies on async_data until the publish is over.
At the end of the publish, the reference to the async data is dropped.
This way, async_data is either freed by _event_hdl_publish() itself
or by one of the consumers, depending on who is the last one relying
on it.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
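A standalone sketch of the refcounting scheme described above, using C11
atomics for illustration (names and structure are hypothetical, not the actual
event_hdl code):

  #include <stdatomic.h>
  #include <stdlib.h>

  struct async_data {
          atomic_uint refcount;
          /* event payload ... */
  };

  /* drop one reference, free the block when the last one is gone */
  static void async_data_drop(struct async_data *d)
  {
          if (atomic_fetch_sub(&d->refcount, 1) == 1)
                  free(d);
  }

  static void publish_sketch(struct async_data *d, int nb_consumers)
  {
          /* the fix: start at 1 so the publisher holds its own reference
           * for the whole duration of the publish loop
           */
          atomic_init(&d->refcount, 1);

          for (int i = 0; i < nb_consumers; i++) {
                  atomic_fetch_add(&d->refcount, 1);
                  /* ... hand <d> to consumer i, which may drop it concurrently ... */
          }

          async_data_drop(d); /* release the publisher's reference at the end */
  }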
soft-stop was not explicitly handled in event_hdl API.
Because of this, event_hdl was causing some leaks on deinit paths.
Moreover, a task responsible for handling events could require some
additional cleanups (ie: advanced async task), and as the task was not
protected against abort when soft-stopping, such cleanup could not be
performed unless the task itself implements the required protections,
which is not optimal.
Consider this new approach:
'jobs' global variable is incremented whenever an async subscription is
created to prevent the related task from being aborted before the task
acknowledges the final END event.
Once the END event is acknowledged and freed by the task, the 'jobs'
variable is decremented, and the deinit process may continue (including
the abortion of remaining tasks not guarded by the 'jobs' variable).
To do this, a new global mt_list is required: known_event_hdl_sub_list
This list tracks the known (initialized) subscription lists within the
process.
sub_lists are automatically added to the "known" list when calling
event_hdl_sub_list_init(), and are removed from the list with
event_hdl_sub_list_destroy().
This allows us to implement a global thread-safe event_hdl deinit()
function that is automatically called on soft-stop thanks to signal(0).
When event_hdl deinit() is initiated, we simply iterate against the known
subscription lists to destroy them.
event_hdl_subscribe_ptr() was slightly modified to make sure that a sub_list
may not accept new subscriptions once it is destroyed (removed from the
known list)
This can occur between the time the soft-stop is initiated (signal(0)) and
haproxy actually enters in the deinit() function (once tasks are either
finished or aborted and other threads already joined).
It is safe to destroy() the subscription list multiple times as long
as the pointer is still valid (ie: first on soft-stop when handling
the '0' signal, then from regular deinit() path): the function does
nothing if the subscription list is already removed.
We partially reverted "BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe"
since we can use parent mt_list locking instead of a dedicated lock to make
the check against duplicate subscription IDs.
(insert_lock is not useful anymore)
The check in itself is not changed, only the locking method.
sizeof(event_hdl_sub_list) slightly increases: from 24 bytes to 32 bytes due
to the additional mt_list struct within it.
With that said, having thread-safe list to store known subscription lists
is a good thing: it could help to implement additional management
logic for subscription lists and could be useful to add some stats or
debugging tools in the future.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
event_hdl_sub_list_init() and event_hdl_sub_list_destroy() don't expect
to be called with a NULL argument (to use global subscription list
implicitly), simply because the global subscription list init and
destroy is internally managed.
Adding BUG_ON() to detect such invalid usages, and updating some comments
to prevent confusion around these functions.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
List insertion in event_hdl_subscribe() was not thread-safe when dealing
with unique identifiers. Indeed, in this case the list insertion is
conditional (we check for a duplicate, then we insert). And while we're
using mt lists for this, the whole operation is not atomic: there is a
race between the check and the insertion.
This could lead to the same ID being registered multiple times with
concurrent calls to event_hdl_subscribe() on the same ID.
To fix this, we add 'insert_lock' dedicated lock in the subscription
list struct. The lock's cost is nearly 0 since it is only used when
registering identified subscriptions and the lock window is very short:
we only guard the duplicate check and the list insertion to make the
conditional insertion "atomic" within a given subscription list.
This is the only place where we need the lock: as soon as the item is
properly inserted we're out of trouble because all other operations on
the list are already thread-safe thanks to mt lists.
A new lock hint is introduced: LOCK_EHDL which is dedicated to event_hdl
The patch may seem quite large since we had to rework the logic around
the subscribe function and switch from simple mt_list to a dedicated
struct wrapping both the mt_list and the insert_lock for the
event_hdl_sub_list type.
(sizeof(event_hdl_sub_list) is now 24 instead of 16)
However, all the changes are internal: we don't break the API.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
core.register_task(function) may now take up to 4 additional arguments
that will be passed as-is to the task function.
This could be convenient to spawn sub-tasks from existing functions
supporting core.register_task() without the need to use global
variables to pass some context to the newly created task function.
The new prototype is:
core.register_task(function[, arg1[, arg2[, ...[, arg4]]]])
Implementation remains backward-compatible with existing scripts.
Server revision ID was recently added to haproxy with 61e3894
("MINOR: server: add srv->rid (revision id) value")
Let's add it to the hlua server class.
Main lua lock is used at various places in the code.
Most of the time it is used from unprotected lua environments,
in which case the locking is mandatory.
But there are some cases where the lock is attempted from protected
lua environments, meaning that lock is already owned by the current
thread. Thus new locking attempt should be skipped to prevent any
deadlocks from occurring.
To address this, "already_safe" lock hint was implemented in
hlua_ctx_init() function with commit bf90ce1
("BUG/MEDIUM: lua: dead lock when Lua tasks are trigerred")
But this approach is not very safe, for 2 reasons:
First reason is that there are still some code paths that could lead
to deadlocks.
For instance, in register_task(), hlua_ctx_init() is called with
already_safe set to 1 to prevent deadlocks from occurring.
But in case of task init failure, hlua_ctx_destroy() will be called
from the same environment (protected environment), and hlua_ctx_destroy()
does not offer the already_safe lock hint, resulting in a deadlock.
Second reason is that already_safe hint is used to completely skip
SET_LJMP macros (which manipulate the lock internally), resulting
in some logic in the function being unprotected from lua aborts in
case of unexpected errors when manipulating the lua stack (the lock
does not protect against longjmps)
Instead of leaving the locking responsibility to the caller, which is
quite error-prone since we must find out ourselves whether or not we are in
a protected environment (and is not robust against code re-use),
we move the deadlock protection logic directly in hlua_lock() function.
Thanks to a thread-local lock hint, we can easily guess if the current
thread already owns the main lua lock, in which case the locking attempt
is skipped. The thread-local lock hint is implemented as a counter so
that the lock is properly dropped when the counter reaches 0.
(to match actual lock() and unlock() calls)
This commit depends on "MINOR: hlua: simplify lua locking"
It may be backported to every stable versions.
[prior to 2.5 lua filter API did not exist, filter-related parts
should be skipped]
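A generic, standalone sketch of the re-entrancy guard described above (plain
pthread/C11, not the actual hlua code): a thread-local counter tells whether
the current thread already owns the non-recursive global lock, so nested
locking attempts are skipped and the lock is only dropped when the counter
falls back to zero.

  #include <pthread.h>

  static pthread_mutex_t global_lua_lock = PTHREAD_MUTEX_INITIALIZER;
  static _Thread_local unsigned int lock_depth; /* per-thread lock hint */

  static void lua_lock_sketch(void)
  {
          if (lock_depth++ == 0)
                  pthread_mutex_lock(&global_lua_lock);
  }

  static void lua_unlock_sketch(void)
  {
          if (--lock_depth == 0)
                  pthread_mutex_unlock(&global_lua_lock);
  }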
The check on lua state==0 to know whether locking is required or not can
be performed in a locking wrapper to simplify things a bit and prevent
implementation errors.
Locking from hlua context should now be performed via hlua_lock(L) and
unlocking via hlua_unlock(L)
Using hlua_pushref() everywhere temporary lua objects are involved.
(ie: hlua_checkfunction(), hlua_checktable...)
Those references are expected to be cleared using hlua_unref() when
they are no longer used.
Using hlua_ref() everywhere temporary lua objects are involved.
Those references are expected to be cleared using hlua_unref()
when they are no longer used.
Several error paths were leaking function or table references.
(Obtained through hlua_checkfunction() and hlua_checktable() functions)
Now we properly release the references thanks to hlua_unref() in
such cases.
This commit depends on "MINOR: hlua: add simple hlua reference handling API"
This could be backported in every stable versions although it is not
mandatory as such leaks only occur on rare error/warn paths.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
hlua init function references were not released during
hlua_post_init_state().
Hopefully, this function is only used during startup so the resulting
leak is not a big deal.
Since each init lua function runs precisely once, it is safe to release
the ref as soon as the function is restored on the stack.
This could be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
In core.register_task(): we take a reference to the function passed as
argument in order to push it in the new coroutine substack.
However, once pushed in the substack: the reference is not useful
anymore and should be cleared.
Currently, this is not the case in hlua_register_task().
Explicitly dropping the reference once the function is pushed to the
coroutine's stack to prevent any reference leak (which could contribute
to resource shortage)
This may be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
hlua_checktable() and hlua_checkfunction() both return the raw
value of luaL_ref() function call.
As luaL_ref() returns a signed int, both functions should return a signed
int as well to prevent any misuse of the returned reference value.
We're doing this in an attempt to simplify temporary lua objects
references handling.
Adding the new hlua_unref() function to release lua object references
created using luaL_ref(, LUA_REGISTRYINDEX)
(ie: hlua_checkfunction() and hlua_checktable())
Failure to release unused object reference prevents the reference index
from being re-used and prevents the referenced resource from being garbage
collected.
Adding hlua_pushref(L, ref) to replace
lua_rawgeti(L, LUA_REGISTRYINDEX, ref)
Adding hlua_ref(L) to replace luaL_ref(L, LUA_REGISTRYINDEX)
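Based on the registry calls this commit says the helpers wrap, they most
likely boil down to something like the following sketch (standard Lua C API,
not necessarily the exact hlua code):

  #include <lua.h>
  #include <lauxlib.h>

  /* pop the value on top of the stack and return a reference to it */
  static int hlua_ref_sketch(lua_State *L)
  {
          return luaL_ref(L, LUA_REGISTRYINDEX);
  }

  /* push the referenced value back onto the stack */
  static void hlua_pushref_sketch(lua_State *L, int ref)
  {
          lua_rawgeti(L, LUA_REGISTRYINDEX, ref);
  }

  /* release the reference so the index can be reused and the value GC'd */
  static void hlua_unref_sketch(lua_State *L, int ref)
  {
          luaL_unref(L, LUA_REGISTRYINDEX, ref);
  }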
The comment for the hlua_ctx_destroy() function states that the "lua"
struct is not freed.
This is not true anymore since 2c8b54e7 ("MEDIUM: lua: remove Lua struct
from session, and allocate it with memory pools")
Updating the function comment to properly report the actual behavior.
This could be backported in every stable versions with 2c8b54e7
("MEDIUM: lua: remove Lua struct from session, and allocate it with memory pools")
Since ("MINOR: hlua_fcn: alternative to old proxy and server attributes"):
- s->name(), s->puid() are superseded by s->get_name() and s->get_puid()
- px->name(), px->uuid() are superseded by px->get_name() and
px->get_uuid()
And considering this is now the proper way to retrieve proxy name/uuid
and server name/puid from lua:
We're now removing such legacy attributes, but for retro-compatibility
purposes we will be emulating them and warning the user for some time
before completely dropping their support.
To do this, we first remove old legacy code.
Then we move server and proxy methods out of the metatable to allow
direct elements access without systematically involving the "__index"
metamethod.
This allows us to involve the "__index" metamethod only when the requested
key is missing from the table.
Then we define relevant hlua_proxy_index and hlua_server_index functions
that will be used as the "__index" metamethod to respectively handle
"name, uuid" (proxy) or "name, puid" (server) keys, in which case we
warn the user about the need to use the new getter function instead of the
legacy attribute (to prepare for the potential upcoming removal), and we
call the getter function to return the value as if the getter function
was directly called from the script.
Note: Using the legacy variables instead of the getter functions results
in a slight overhead due to the "__index" metamethod indirection, thus
it is recommended to switch to the getter functions right away.
With this commit we're also adding a deprecation notice about legacy
attributes.
This patch proposes to enumerate servers using internal HAProxy list.
Also, remove the flag SRV_F_NON_PURGEABLE which makes the server non
purgeable each time Lua uses the server.
Removing reg-tests/cli_delete_server_lua.vtc since this test is no
longer relevant (we don't set the SRV_F_NON_PURGEABLE flag anymore)
and we already have a more generic test:
reg-tests/server/cli_delete_server.vtc
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
This patch adds new lua methods:
- "Proxy.get_uuid()"
- "Proxy.get_name()"
- "Server.get_puid()"
- "Server.get_name()"
These methods will be equivalent to their old analog Proxy.{uuid,name}
and Server.{puid,name} attributes, but this will be the new preferred
way to fetch such infos as it duplicates memory only when necessary and
thus reduces the overall lua Server/Proxy objects memory footprint.
Legacy attributes (now superseded by the explicit getters) are expected
to be removed some day.
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
When HAProxy is loaded with a lot of frontends/backends (tested with 300k),
it is slow to start and it uses a lot of memory just for indexing backends
in the lua tables.
This patch uses the internal frontend/backend index of HAProxy in place of
lua table.
HAProxy startup is now quicker as each frontend/backend object is created
on demand and not at init.
This has to come with some cost: the execution of Lua will be a little bit
slower.
Two lua init functions seem to return something useful, but this
is not the case. The function "hlua_concat_init" seems to return
a failure status, but the function never fails. The function
"hlua_fcn_reg_core_fcn" seems to return a number of elements in
the stack, but it is not the case.
register_{init, converters, fetches, action, service, cli, filter} are
meant to run exclusively from body context according to the
documentation (unlike register_task which is designed to work from both
init and runtime contexts)
A quick code inspection confirms that only register_task implements
the required precautions to make it safe out of init context.
Trying to use those register_* functions from a runtime lua task will
lead to a program crash since they all assume that they are running from
the main lua context and with no concurrent runs:
core.register_task(function()
core.register_init(function()
end)
end)
When loaded from the config, the above example would segfault.
To prevent this undefined behavior, we now report an explicit error if
the user tries to use such functions outside of init/body context.
This should be backported in every stable versions.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
In hlua_process_task: when HLUA_E_ETMOUT was returned by
hlua_ctx_resume(), meaning that the lua task reached
tune.lua.task-timeout (default: none),
we logged "Lua task: unknown error." before stopping the task.
Now we properly handle HLUA_E_ETMOUT to report a meaningful error
message.
In function hlua_hook, a yieldk is performed when the function is yieldable.
But the following code in that function seems to assume that the yield
never returns, which is not the case!
Moreover, Lua documentation says that in this situation the yieldk call
must immediately be followed by a return.
This patch adds a return statement after the yieldk call.
It also adds some comments and removes a needless lua_sethook call.
It could be backported to all stable versions, but it is not mandatory,
because even if it is undefined behavior this bug doesn't seem to
negatively affect lua 5.3/5.4 stacks.
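A minimal sketch of the pattern the Lua manual requires, in plain Lua C API
terms (illustrative, not the actual hlua_hook code): a count/line hook may
only yield with zero results and must return right after yielding.

  #include <lua.h>

  static void count_hook_sketch(lua_State *L, lua_Debug *ar)
  {
          (void)ar;
          if (lua_isyieldable(L)) {
                  lua_yield(L, 0); /* hooks may only yield with 0 results */
                  return;          /* the fix: nothing may run after the yield */
          }
          /* not yieldable: simply let the chunk continue */
  }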
srv_drop() function is responsible for freeing the server when the
refcount reaches 0.
There is one exception: when global.mode has the MODE_STOPPING flag set,
srv_drop() will ignore the refcount and free the server on first
invocation.
This logic has been implemented with 13f2e2ce ("BUG/MINOR: server: do
not use refcount in free_server in stopping mode") and back then doing
so was not a problem since dynamic server API was just implemented and
srv_take() and srv_drop() were not widely used.
Now that dynamic server API is starting to get more popular we cannot
afford to keep the current logic: some modules or lua scripts may hold
references to existing servers and also do their cleanup in deinit phases.
In this kind of situation, it would be easy to trigger double-frees
since every call to srv_drop() on a specific server will try to free it.
To fix this, we take a different approach and try to fix the issue at
the source: we now properly drop server references involved with
checks/agent_checks in deinit_srv_check() and deinit_srv_agent_check().
While this could theoretically be backported up to 2.6, it is not very
relevant for now since srv_drop() usage in older versions is very
limited and we're only starting to face the issue in mid 2.8
developments. (ie: lua core updates)
In srv_drop(), we only call the ssl->destroy_srv() method on
specific conditions.
But this has two downsides:
First, destroy_srv() is responsible for freeing data that may have been
allocated in prepare_srv(), but not exclusively: it also frees
ssl-related parameters allocated when parsing a server entry, such as
ca-file for instance.
So this is quite error-prone, we could easily miss a condition where
some data needs to be deallocated using destroy_srv() even if
prepare_srv() was not used (since prepare_srv() is also conditional),
thus resulting in memory leaks.
Moreover, depending on srv->proxy to guard the check is probably not
a good idea here, since srv_drop() could be called in late de-init paths
in which related proxy could be freed already. srv_drop() should only
take care of freeing local server data without external logic.
Thankfully, destroy_srv() function performs the necessary checks to
ensure that a systematic call to the function won't result in invalid
reads or double frees.
No backport needed.
Proxies belonging to the cfg_log_forward proxy list are not cleaned up
in haproxy deinit() function.
We add the missing cleanup directly in the main deinit() function since
no other specific function may be used for this.
This could be backported up to 2.4
When a ring section is configured, a new sink is created and forward_px
proxy may be allocated and assigned to the sink.
Such sink-related proxies are added to the sink_proxies_list and thus
don't belong to the main proxy list which is cleaned up in
haproxy deinit() function.
We don't have to manually clean up sink_proxies_list in the main deinit()
func:
sink API already provides the sink_deinit() function so we just add the
missing free_proxy(sink->forward_px) there.
This could be backported up to 2.4.
[in 2.4, commit b0281a49 ("MINOR: proxy: check if p is NULL in free_proxy()")
must be backported first]
In stats_dump_proxy_to_buffer() function, special care was taken when
dealing with servers dump.
Indeed, stats_dump_proxy_to_buffer() can be interrupted and resumed if
buffer space is not big enough to complete dump.
Thus, a reference is taken on the server being dumped in the hope that
the server will still be valid when the function resumes.
(to prevent the server from being freed in the meantime)
While this is now true thanks to:
- "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"
We still have an issue: when resuming, saved server reference is not
dropped.
This prevents the server from being freed when we no longer use it.
Moreover, as the saved server might now be deleted
(SRV_F_DELETED flag set), the current deleted server may still be dumped
in the stats and while this is not a bug, this could be misleading for
the user.
Let's add a px_st variable to detect if the stats_dump_proxy_to_buffer()
is being resumed at the STAT_PX_ST_SV stage: perform some housekeeping
to skip deleted servers and properly drop the reference on the saved
server.
This commit depends on:
- "MINOR: server: add SRV_F_DELETED flag"
- "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"
This should be backported up to 2.6
We recently discovered a bug which affects dynamic server deletion:
When a server is deleted, it is removed from the "visible" server list.
But as we've seen in previous commit
("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by
someone who keeps a reference on it (waiting for the final srv_drop()).
Throughout this transient state, server ptr is still valid (may be
dereferenced) and the flag SRV_F_DELETED is set.
However, as the server is not part of server list anymore, we have
an issue: srv->next pointer won't be updated anymore as the only place
where we perform such update is in cli_parse_delete_server() by
iterating over the "visible" server list.
Because of this, we cannot guarantee that a server with the
SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing
to a fully removed (already freed) server.
This problem can be easily demonstrated with server dumping in
the stats:
server list dumping is performed in stats_dump_proxy_to_buffer()
The function can be interrupted and resumed later by design.
ie: output buffer is full: partial dump and finish the dump after
the flush
This is implemented by calling srv_take() on the server being dumped,
and only releasing it when we're done with it using srv_drop().
(drop can be delayed after function resume if buffer is full)
While the function design seems OK, it works with the assumption that
srv->next will still be valid after the function resumes, which is
not true. (especially if multiple servers are being removed in between
the 2 dumping attempts)
In practice, this did not cause any crash yet (at least this was not
reported so far), because server dumping is so fast that it is very
unlikely that multiple server deletions make their way between 2
dumping attempts in most setups. But still, this is a problem that we
need to address because some upcoming work might depend on this
assumption as well and for the moment it is not safe at all.
========================================================================
Here is a quick reproducer:
With this patch, we're creating a large deletion window of 3s as soon
as we reach a server named "t2" while iterating over the list.
This will give us plenty of time to perform multiple deletions before
the function is resumed.
| diff --git a/src/stats.c b/src/stats.c
| index 84a4f9b6e..15e49b4cd 100644
| --- a/src/stats.c
| +++ b/src/stats.c
| @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn *sc, struct htx *htx,
| * Temporarily increment its refcount to prevent its
| * anticipated cleaning. Call free_server to release it.
| */
| + struct server *orig = ctx->obj2;
| for (; ctx->obj2 != NULL;
| ctx->obj2 = srv_drop(sv)) {
|
| sv = ctx->obj2;
| + printf("sv = %s\n", sv->id);
| srv_take(sv);
| + if (!strcmp("t2", sv->id) && orig == px->srv) {
| + printf("deletion window: 3s\n");
| + thread_idle_now();
| + thread_harmless_now();
| + sleep(3);
| + thread_harmless_end();
| +
| + thread_idle_end();
| +
| + goto full; /* simulate full buffer */
| + }
|
| if (htx) {
| if (htx_almost_full(htx))
| @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx *appctx)
| struct channel *res = sc_ic(sc);
| struct htx *req_htx, *res_htx;
|
| + printf("http dump\n");
| /* only proxy stats are available via http */
| ctx->domain = STATS_DOMAIN_PROXY;
|
Ok, we're ready, now we start haproxy with the following conf:
global
stats socket /tmp/ha.sock mode 660 level admin expose-fd listeners thread 1-1
nbthread 2
frontend stats
mode http
bind *:8081 thread 2-2
stats enable
stats uri /
backend farm
server t1 127.0.0.1:1899 disabled
server t2 127.0.0.1:18999 disabled
server t3 127.0.0.1:18998 disabled
server t4 127.0.0.1:18997 disabled
And finally, we execute the following script:
curl localhost:8081/stats&
sleep .2
echo "del server farm/t2" | nc -U /tmp/ha.sock
echo "del server farm/t3" | nc -U /tmp/ha.sock
This should be enough to reveal the issue, I easily managed to
consistently crash haproxy with the following reproducer:
http dump
sv = t1
http dump
sv = t1
sv = t2
deletion window = 3s
[NOTICE] (2940566) : Server deleted.
[NOTICE] (2940566) : Server deleted.
http dump
sv = t2
sv = �����U
[1] 2940566 segmentation fault (core dumped) ./haproxy -f ttt.conf
========================================================================
To fix this, we add prev_deleted mt_list in server struct.
For a given "visible" server, this list will contain the pending
"deleted" servers references that point to it using their 'next' ptr.
This way, whenever this "visible" server is going to be deleted via
cli_parse_delete_server() it will check for servers in its
'prev_deleted' list and update their 'next' pointer so that they no
longer point to it, and then it will push them in its
'next->prev_deleted' list to transfer the update responsibility to the
next 'visible' server (if next != NULL).
Then, following the same logic, the server about to be removed in
cli_parse_delete_server() will push itself as well into its
'next->prev_deleted' list (if next != NULL) so that it may still use its
'next' ptr for the time it is in transient removal state.
In srv_drop(), right before the server is finally freed, we make sure
to remove it from the 'next->prev_deleted' list so that 'next' won't
try to perform the pointers update for this server anymore.
This has to be done atomically to prevent 'next' srv from accessing a
purged server.
As a result:
for a valid server, either deleted or not, 'next' ptr will always
point to a non deleted (ie: visible) server.
With the proposed fix, and several removal combinations (including
unordered cli_parse_delete_server() and srv_drop() calls), I cannot
reproduce the crash anymore.
Example tricky removal sequence that is now properly handled:
sv list: t1,t2,t3,t4,t5,t6
ops:
take(t2)
del(t4)
del(t3)
del(t5)
drop(t3)
drop(t4)
drop(t5)
drop(t2)
Set the SRV_F_DELETED flag when server is removed from the cli.
When removing a server from the cli (in cli_parse_delete_server()),
we update the "visible" server list so that the removed server is no
longer part of the list.
However, despite the server being removed from "visible" server list,
one could still access the server data from a valid ptr (ie: srv_take())
Deleted flag helps detecting when a server is in transient removal
state: that is, removed from the list, thus not visible but not yet
purged from memory.
SE_FL_EOS flag must never be set on the SE descriptor without SE_FL_EOI or
SE_FL_ERROR. When a mux or an applet report an end of stream, it must be
able to state if it is the end of input too or if it is an error.
Because all this part was recently refactored, especially the applet part,
it is a bit sensitive. Thus a BUG_ON_HOT() is used and not a BUG_ON().
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connector. The CF_ prefix is replaced by the SC_FL_ one. Of course, it is
not so simple because in many places we were testing if a channel was shut for
reads and writes at the same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. Here, the applet is a bit
refactored to handle SE descriptor EOS, EOI and ERROR flags
The state of the opposite SC is already tested to wait for the connection to
be established before sending messages. So, there is no reason to test it again
before looping on the ring buffer.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting for the applet to be released.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.
Here, the refactoring only reports errors by setting SE_FL_ERROR flag.
There are 3 kinds of applet in lua: The co-sockets, the TCP services and the
HTTP services. The three are refactored to use the SE descriptor instead of
the channel to report error and end-of-stream.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting for the applet to be released.
This patch is a bit different from the others because message handling is
dispatched across several functions. But the idea is the same.
It is now the dns turn to be refactored to use the SE descriptor instead of
the channel to report error and end-of-stream. We must just be sure to
consume request data when we are waiting for the applet to be released.
The state of the opposite SC is already tested to wait for the connection to be
established before sending requests. So, there is no reason to test it again
before looping on the ring buffer.
It is the same kind of change as for the cache applet. The idea is to use the SE
desc instead of the channel or the SC to report end-of-input, end-of-stream and
errors.
Truncated commands are now reported on error. Other changes are the same as
for the cache applet. We now set SE_FL_EOS flag instead of calling cf_shutr()
and calls to cf_shutw are removed.
We now try, as far as possible, to rely on the SE descriptor to detect end
of processing. Idea is to no longer rely on the channel or the SC to do so.
First, we now set SE_FL_EOS instead of calling cf_shutr() to report the
end of the stream. It happens when the response is fully sent (SE_FL_EOI is
already set in this case) or when an error is reported. In this last case,
SE_FL_ERROR is also set.
Thanks to this change, it is now possible to detect the applet must only
consume the request while waiting for the upper layer to release it. So, if
SE_FL_EOS or SE_FL_ERROR are set, it means the response was fully
handled. And if SE_FL_SHR or SE_FL_SHW are set, it means the applet was
released by upper layer and is waiting to be freed.
Just like for end of input, the end of stream reported by the endpoint
(SE_FL_EOS flag) is now handled in sc_applet_process(). The idea is to have
applets acting as muxes by reporting events through the SE descriptor, as
far as possible.
Thanks to the previous patch, it is now possible for applets to not set the
CF_EOI flag on the channels. On this point, the applets get closer to the
muxes.
The end of input reported by the endpoint (SE_FL_EOI flag), is now handled
in sc_applet_process(). This function is always called after an applet was
called. So, the applets can now only report EOI on the SE descriptor and
have no reason to update the channel too.
EOS is now acknowledged at the end of sc_conn_recv(), even if an error was
encountered. There is no reason not to do so, especially because, if it is not
performed here, it will be acknowledged in sc_conn_process().
Note, it is still performed in sc_conn_process() because this function is
also the .wake callback function and can be directly called from the lower
layer.
On a truncated message, a parsing error is still reported. But an error on the
SE descriptor is also reported. This will avoid any bugs in the future. We are
now sure the SC is able to detect the error, independently of the HTTP
analyzers.
In h1_rcv_pipe(), only the end of stream was reported when a read0 was
detected. However, it is also important to report the end of input or an
error, depending on the message state. This patch does not fix any real
issue for now, but some others, specific to the 2.8, rely on it.
No backport needed.
In the PT multiplexer, the end of stream is also the end of input. Thus
we must report EOI to the stream-endpoint descriptor when the EOS is
reported. For now, it is a bit useless but it will be important to
distinguish a shutdown from an error or an abort.
To be sure to not report an EOI on an error, the errors are now handled
first.
In sc_conn_recv(), if the EOS is reported by the endpoint, it will always be
acknowledged by the SC and a read0 will be performed on the input
channel. Thus there is no reason to still test it at the beginning of the
function because there is already a test on CF_SHUTR.
When a response is consumed, the result of co_getblk() is never checked. It
seems ok because the amount of output data is always checked first. But there is
an issue when we try to get the first 2 bytes to read the message length. If
there is only one byte followed by a shutdown, the applet ignores the
shutdown and loops until the timeout to get more data.
So to avoid any issue and improve shutdown detection, the co_getblk() return
value is always tested. In addition, if there is not enough data, the applet
explicitly asks for more data by calling applet_need_more_data().
This patch relies on the previous one:
* BUG/MEDIUM: channel: Improve reports for shut in co_getblk()
Both should be backported as far as 2.4. On 2.5 and 2.4,
applet_need_more_data() must be replaced by si_rx_endp_more().
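A hedged sketch of the check described above (illustrative, not the actual
applet code), relying on the co_getblk() change referenced above: read the
2-byte message length, give up on shutdown, and explicitly request more data
instead of silently looping.

  #include <stdint.h>
  #include <haproxy/applet.h>
  #include <haproxy/channel.h>
  #include <haproxy/stconn.h>

  /* peek the 2-byte message length from the applet's output channel:
   * returns -1 on shutdown with no data, 0 if more data is needed, 1 on success
   */
  static int peek_msg_len_sketch(struct appctx *appctx, uint16_t *msg_len)
  {
          struct stconn *sc = appctx_sc(appctx);
          char len_bytes[2];
          int ret;

          ret = co_getblk(sc_oc(sc), len_bytes, sizeof(len_bytes), 0);
          if (ret == -1)
                  return -1;                     /* empty channel + shutdown: give up */
          if (ret < (int)sizeof(len_bytes)) {
                  applet_need_more_data(appctx); /* explicitly ask for more data */
                  return 0;
          }
          *msg_len = ((uint8_t)len_bytes[0] << 8) | (uint8_t)len_bytes[1];
          return 1;
  }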
When co_getblk() is called with a length and an offset to 0, shutdown is
never reported. It may be an issue when the function is called to retrieve
all available output data, while there is no output data at all. And it
seems pretty annoying to handle this case in the caller.
Thus, now, in co_getblk(), -1 is returned when the channel is empty and a
shutdown was received.
There is no real reason to backport this patch alone. However, another fix
will rely on it.
There is a bug in a way the channels flags are checked to set clientfin or
serverfin timeout. Indeed, to set the clientfin timeout, the request channel
must be shut for reads (CF_SHUTR) or the response channel must be shut for
writes (CF_SHUTW). As the opposite, the serverfin timeout must be set when
the request channel is shut for writes (CF_SHUTW) or the response channel is
shut for reads (CF_SHUTR).
It is a 2.8-dev specific issue. No backport needed.
In the commit b08c5259e ("MINOR: stconn: Always report READ/WRITE event on
shutr/shutw"), a return statement was erroneously removed from
sc_app_shutr(). As a consequence, the CF_SHUTR flag was never set. Fortunately,
it is the default .shutr callback function. Thus when a connection or an
applet is attached to the SC, another callback is used to perform a
shutdown for reads.
It is a 2.8-dev specific issue. No backport needed.
It is not possible to successfully match an empty response. However using
regex, it should be possible to reject a response with any content. For
instance:
tcp-check expect !rstring ".+"
It may seem a bit strange to do that, but it is possible and it is a valid
config. So it must work. Thanks to this patch, it is now really supported.
This patch may be backported as far as 2.2. But only if someone asks for it.
As timestamps based on now_ms values are used to compute the probing timeout,
they may wrap. So, use the ticks API to compare them.
Must be backported to 2.7 and 2.6.
Since this commit, it is the ->idle_expire field of the quic_conn struct which
must be taken into account to display the idle timer task expiration value:
MEDIUM: quic: Ack delay implementation
Furthermore, this value was always zero until now_ms has wrapped (20 s after
the start time) due to this commit:
MEDIUM: clock: force internal time to wrap early after boot
Do not rely on the value of now_ms compared to ->idle_expire to display the
difference but use ticks_remain() to compute it.
Must be backported to 2.7 where "show quic" has already been backported.
This bug arrived with this commit:
MEDIUM: quic: Ack delay implementation
It is possible that the idle timer task was already in the run queue when its
->expire field was updated calling qc_idle_timer_do_rearm(). To prevent this
task from running in this condition, one must check its ->expire field value
with this condition to run the task if its timer has really expired:
!tick_is_expired(t->expire, now_ms)
Furthermore, as this task may be directly woken up with a call to task_wakeup(),
for instance by qc_kill_conn() to kill the connection, one must check that this
task has really been woken up when it was in the wait queue and not by a direct
call to task_wakeup() thanks to this test:
(state & TASK_WOKEN_ANY) == TASK_WOKEN_TIMER
Again, when this condition is not fulfilled, the task must be run.
Must be backported where the commit mentioned above was backported.
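A hedged sketch of the resulting condition (assuming the haproxy source tree;
the function name is generic, not the actual qc_idle_timer_task() code):

  #include <haproxy/clock.h>
  #include <haproxy/task.h>
  #include <haproxy/ticks.h>

  static struct task *idle_timer_task_sketch(struct task *t, void *ctx, unsigned int state)
  {
          (void)ctx;

          /* woken by the timer while the timer was rearmed further away:
           * nothing to do yet
           */
          if (((state & TASK_WOKEN_ANY) == TASK_WOKEN_TIMER) &&
              !tick_is_expired(t->expire, now_ms))
                  return t;

          /* timer really expired, or explicit wakeup (e.g. qc_kill_conn()):
           * handle it
           */
          return t;
  }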
As found in issue #2089, it's easy to mistakenly paste a colon in a
header name, or other chars (e.g. spaces) when quotes are in use, and
this causes all sort of trouble in field because such chars are rejected
by the peer.
Better try to detect these upfront. That's what we're doing here during
the parsing of the add-header/set-header/early-hint actions, where a
warning is emitted if a non-token character is found in a header name.
A special case is made for the colon at the beginning so that it remains
possible to place any future pseudo-headers that may appear. E.g:
[WARNING] (14388) : config : parsing [badchar.cfg:23] : header name 'X-Content-Type-Options:' contains forbidden character ':'.
This should be backported to 2.7, and ideally it should be turned to an
error in future versions.
As now_ms may be zero, these BUG_ON() could be triggered when its value has wrapped.
These calls to BUG_ON() may be removed because the values they were supposed to
check are safely used by the ticks API.
Must be backported to 2.6 and 2.7.
This very old bug has been there since the first implementation of the newreno
congestion algorithm. It was a very bad idea to put a state variable
into the quic_cc_algo struct, which only defines the congestion control algorithm
used by a QUIC listener, typically its type and its callbacks.
This bug could lead to crashes since BUG_ON() calls have been added to each
algorithm implementation. This was revealed by interop tests, but not very often
as several connections rarely run at the same time during these tests.
Hopefully this was also reported by Tristan in GH #2095.
Move the congestion algorithm state to the correct structures which are private
to a connection (see cubic and nr structs).
Must be backported to 2.7 and 2.6.
When entering a recovery period, the algo state is set by quic_enter_recovery().
And that's it! These two lines should have been removed with this commit:
BUG/MINOR: quic: Wrong use of now_ms timestamps (cubic algo)
Take the opportunity of this patch to add a missing TRACE_LEAVE() call in
quic_cc_cubic_ca_cb().
Must be backported to 2.7 and 2.6.
This algorithm does nothing except initializing the congestion control window
to a fixed value. Very smart!
Modify the QUIC congestion control configuration parser to support this new
algorithm. The congestion control algorithm must be set as follows:
quic-cc-algo nocc-<cc window size (KB)>
For instance if "nocc-15" is provided as quic-cc-algo keyword value, this
will set a fixed window of 15KB.
Depending on what we're debugging, some FDs can represent pollution in
the "show fd" output. Here we add a set of filters allowing to pick (or
exclude) any combination of listener, frontend conn, backend conn, pipes,
etc. "show fd l" will only list listening connections for example.
In the quic_loss struct, it is 8*srtt which is stored in ->srtt, so as not to
have to multiply/divide it to compute the RTT variance (at least). This is where
there was a bug in quic_loss_srtt_update(): each time ->srtt must be used, it
must be divided by 8 or right-shifted by 3.
This bug had a very bad impact on networks with non-negligible packet loss.
Must be backported to 2.6 and 2.7.
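A standalone sketch of the scaled bookkeeping at stake (RFC 6298-style
smoothing, not the actual quic_loss_srtt_update() code): the srtt field holds
8*srtt and the variance holds 4*rttvar, so any direct use of the smoothed RTT
must shift it right by 3 first.

  /* srtt8 stores 8*srtt, rtt_var4 stores 4*rttvar, rtt is the new sample */
  static void srtt_update_sketch(unsigned int *srtt8, unsigned int *rtt_var4,
                                 unsigned int rtt)
  {
          unsigned int srtt, delta;

          if (!*srtt8) {
                  *srtt8 = rtt << 3;    /* srtt = rtt */
                  *rtt_var4 = rtt << 1; /* rttvar = rtt / 2 */
                  return;
          }
          srtt = *srtt8 >> 3;           /* the point of the fix: unscale before use */
          delta = srtt > rtt ? srtt - rtt : rtt - srtt;
          *rtt_var4 += delta - (*rtt_var4 >> 2); /* rttvar = 3/4 rttvar + 1/4 |srtt-rtt| */
          *srtt8    += rtt   - (*srtt8 >> 3);    /* srtt   = 7/8 srtt  + 1/8 rtt */
  }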
Reuse the idle timeout task to delay the acknowledgments. The time of the
idle timer expiration is for now on stored in ->idle_expire. The one to
trigger the acknowledgements is stored in ->ack_expire.
Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection
as having had its acknowledgement timer triggered.
Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and
allows the connection to send packets when the ack timer has fired.
It is possible that acks are sent before the ack timer has triggered. In
this case it is cancelled only if ACK frames are really sent.
The idle timer expiration must be set again when the ack timer has been
triggered or when it is cancelled.
Must be backported to 2.7.
Dump variables displayed by TRACE_ENTER() or TRACE_LEAVE() by calls to TRACE_PROTO().
The two former macros no longer display any variables. From now on, this
information is accessible from the proto level.
Add new calls to TRACE_PROTO() at important locations in relation with the QUIC transport
protocol.
When relevant, try to prefix such traces with TX or RX keyword to identify the
concerned subpart (transmission or reception) of the protocol.
Must be backported to 2.7.
This callback was left as not implemented. It should at least display
the algorithm state, the congestion control window, the slow start threshold
and the time of the current recovery period. Should be helpful to debug.
Must be backported to 2.7.
This bug was revealed by handshakeloss interop tests (often with quiceh) where one
could see haproxy receive an Initial packet without a TLS ClientHello message (only
a padded PING frame). In this case, as the ->max_idle_timeout was not initialized,
the connection was closed about three seconds later, and haproxy opened a new one
with a new source connection ID upon receipt of the next client Initial packet. As
the interop runner counts the number of source connection IDs used by the server to
check there were exactly 50 such IDs used, it considered the test as failed.
So, the ->max_idle_timeout of the connection must be at least initialized
to the local "max_idle_timeout" transport parameter value to avoid such
a situation (closing connections too soon) until it is "negotiated" with the
client when receiving its TLS ClientHello message.
Must be backported to 2.7 and 2.6.
This patch is similar to the one for cubic algorithm:
"BUG/MINOR: quic: Wrong use of timestamps with now_ms variable (cubic algo)"
As now_ms may wrap, one must use the ticks API to protect the cubic congestion
control algorithm implementation from side effects due to this.
Furthermore, to make the newreno congestion control algorithm more readable and easy
to maintain, add quic_cc_cubic_rp_cb() new callback for the "in recovery period"
state (QUIC_CC_ST_RP).
Must be backported to 2.7 and 2.6.
Add ->srtt, ->rtt_var, ->rtt_min and ->pto_count values from ->path->loss
struct to "show quic". Same thing for ->cwnd from ->path struct.
Also take the opportunity of this patch to dump the packet number
space information directly from ->pktns[] array in place of ->els[]
array. Indeed, ->els[QUIC_TLS_ENC_LEVEL_EARLY_DATA] and ->els[QUIC_TLS_ENC_LEVEL_APP]
have the same packet number space.
Must be backported to 2.7 where "show quic" implementation has already been
backported.
As now_ms may wrap, one must use the ticks API to protect the cubic congestion
control algorithm implementation from side effects due to this.
Furthermore to make the cubic congestion control algorithm more readable and easy
to maintain, adding a new state ("in recovery period" QUIC_CC_ST_RP new enum) helps
in reaching this goal. Implement quic_cc_cubic_rp_cb() which is the callback for
this new state.
Must be backported to 2.7 and 2.6.
If a bundle is used in a crt-list, the ssl-min-ver and ssl-max-ver
options were not taken into account in entries other than the first one
because the corresponding fields in the ssl_bind_conf structure were not
copied in crtlist_dup_ssl_conf.
This should fix GitHub issue #2069.
This patch should be backported up to 2.4.
In some extremely unlikely case (or even impossible for now), we might
exit cli_parse_update_ocsp_response without raising an error but with a
filled 'err' buffer. It was not properly free'd.
It does not need to be backported.
This patch removes dead code from the cli_parse_update_ocsp_response
function. The 'end' label is only used in case of error, so the check of
the 'errcode' variable and the errcode variable itself become useless.
This patch does not need to be backported.
It fixes GitHub issue #2077.
During soft-stop, manage_proxy() (p->task) will try to purge
trashable (expired and not referenced) sticktable entries,
effectively releasing the process memory to leave some space
for new processes.
This is done by calling stktable_trash_oldest(), immediately
followed by a pool_gc() to give the memory back to the OS.
As already mentioned in dfe7925 ("BUG/MEDIUM: stick-table:
limit the time spent purging old entries"), calling
stktable_trash_oldest() with a huge batch can result in the function
spending too much time searching and purging entries, and ultimately
triggering the watchdog.
Lately, an internal issue was reported in which we could see
that the watchdog is being triggered in stktable_trash_oldest()
on soft-stop (thus initiated by manage_proxy()).
According to the report, the crash seems to only occur since 5938021
("BUG/MEDIUM: stick-table: do not leave entries in end of window during purge")
This could be the result of stktable_trash_oldest() now working
as expected, and thus spending a large amount of time purging
entries when called with a large enough <to_batch>.
Instead of adding new checks in stktable_trash_oldest(), here we
chose to address the issue directly in manage_proxy().
Since the stktable_trash_oldest() function is called with
<to_batch> == <p->table->current>, it's pretty obvious that it could
cause some issues during soft-stop if a large table, assuming it is
full prior to the soft-stop, suddenly sees most of its entries
becoming trashable because of the soft-stop.
Moreover, we should note that the call to stktable_trash_oldest() is
immediately followed by a call to pool_gc():
We know for sure that pool_gc(), as it involves malloc_trim() on
glibc, is rather expensive, and the more memory to reclaim,
the longer the call.
We need to ensure that both stktable_trash_oldest() + consequent
pool_gc() call both theoretically fit in a single task execution window
to avoid contention, and thus prevent the watchdog from being triggered.
To do this, we now allocate a "budget" for each purging attempt.
The budget is capped at 32K, which means that each sticktable cleanup
attempt will trash at most 32K entries.
The 32K value is quite arbitrary here, and might need to be adjusted or
even deduced from other parameters if this fails to properly address
the issue without introducing new side-effects.
The goal is to find a good balance between the max duration of each
cleanup batch and the frequency of (expensive) pool_gc() calls.
If most of the budget is actually spent trashing entries, then the task
will immediately be rescheduled to continue the purge.
This way, the purge is effectively batched over multiple task runs.
This may be slowly backported to all stable versions.
[Please note that this commit depends on 6e1fe25 ("MINOR: proxy/pool:
prevent unnecessary calls to pool_gc()")]
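Purely as an illustration of the batching logic above, here is a hedged C sketch
of what the purge step in manage_proxy() could look like; the 32K cap and the
immediate rescheduling come from the description, while the exact variable names
and the wake-up reason are assumptions:

  /* hedged sketch of the budgeted purge, not the actual haproxy code */
  unsigned int budget  = MIN(p->table->current, 32768);
  unsigned int trashed = stktable_trash_oldest(p->table, budget);

  pool_gc(NULL);  /* give the freed memory back to the OS */

  if (trashed >= budget) {
      /* the budget was fully consumed: reschedule immediately so the
       * purge continues on the next task run instead of looping here */
      task_wakeup(t, TASK_WOKEN_RES);
  }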
This commit adds a new optional argument to smp_fetch_url_param
and smp_fetch_url_param_val that makes the parameter key comparison
case-insensitive.
Now users can retrieve URL parameters regardless of their case,
allowing parameters to be matched in case-insensitive applications.
Doc was updated.
This commit adds a new argument to smp_fetch_url_param
that makes the parameter key comparison case-insensitive.
Several levels of callers were modified to pass this info.
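For illustration only (this is not the actual haproxy code), the new flag
essentially selects a case-insensitive comparison for the parameter key, along
these lines:

  #include <string.h>
  #include <strings.h>

  /* hedged sketch: compare a URL parameter key, optionally ignoring case */
  static int url_param_key_eq(const char *key, const char *cur, size_t len,
                              int insensitive)
  {
      return insensitive ? strncasecmp(key, cur, len) == 0
                         : strncmp(key, cur, len) == 0;
  }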
In preparation for adding a third parameter to the url_param
sample-fetch function, we need to make the second parameter optional.
Users can now pass an empty second argument to keep the default delimiter.
Since eb77824 ("MEDIUM: proxy: remove the deprecated "grace" keyword"),
stop_time is never set, so the related code in manage_proxy() is not
relevant anymore.
Removing code that refers to p->stop_time, since it was probably
overlooked.
Under certain soft-stopping conditions (ie: sticktable attached to proxy
and in-progress connections to the proxy that prevent haproxy from
exiting), manage_proxy() (p->task) will wake up every second to perform
a cleanup attempt on the proxy sticktable (to purge unused entries).
However, as reported by TimWolla in GH #2091, it was found that a
systematic call to pool_gc() could cause some CPU waste, mainly
because malloc_trim() (which is rather expensive) is being called
for each pool_gc() invocation.
As a result, such soft-stopping process could be spending a significant
amount of time in the malloc_trim->madvise() syscall for nothing.
Example "strace -c -f -p `pidof haproxy`" output (taken from
Tim's report):
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
46.77 1.840549 3941 467 1 epoll_wait
43.82 1.724708 13 128509 sched_yield
8.82 0.346968 11 29696 madvise
0.58 0.023011 24 951 clock_gettime
0.01 0.000257 10 25 7 recvfrom
0.00 0.000033 11 3 sendto
0.00 0.000021 21 1 rt_sigreturn
0.00 0.000021 21 1 timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00 3.935568 24 159653 8 total
To prevent this, we now only call pool_gc() when some memory is
really expected to be reclaimed as a direct result of the previous
stick table cleanup.
This is pretty straightforward since stktable_trash_oldest() returns
the number of trashed sticky sessions.
This may be backported to all stable versions.
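As a rough, hedged sketch of the change (the call arguments follow the
surrounding text, the rest is assumed):

  /* skip the expensive pool_gc()/malloc_trim() when the purge did not
   * actually release anything */
  if (stktable_trash_oldest(p->table, p->table->current))
      pool_gc(NULL);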
This bug arrived with this commit:
MINOR: quic: Send PING frames when probing Initial packet number space
This may happen when haproxy needs to probe the peer with very short packets
(only one PING frame). In this case, the packet must be padded. There was clearly
a case which was removed by the mentioned commit above. That said, there was
an extra byte which was added to the PADDING frame before the mentioned commit
above. This is no longer the case with this patch.
Thank you to @tatsuhiro-t (ngtcp2 manager) for having reported this issue which
was revealed by the keyupdate test (on client side).
Must be backported to 2.7 and 2.6.
When a backend H2 connection is waiting for the connection to be fully
established, nothing is sent. However, it remains useful to detect connection
errors at this stage. It is especially important to release the H2 connection on
a connect error. Being able to set the H2_CF_ERR_PENDING or H2_CF_ERROR flags
when the underlying connection is not fully established prevents the H2C from
being inserted into an idle list in h2_detach().
Without this fix, an H2C in PREFACE state and relying on a connection in
error can be inserted in the safe list. Of course, it will be purged if not
reused. But in the meantime, it can be reused. When this happens, the
connection remains in error and nothing happens. At the end a connection
error is returned to the client. On low traffic, we can imagine a scenario
where this dead connection is the only idle connection. If it is always
reused before being purged, no connection to the server is possible.
In addition, h2c_is_dead() is updated to declare as dead any H2 connection
with a pending error if its state is PREFACE or SETTINGS1 (thus if no
SETTINGS frame was received yet).
This patch should fix the issue #2092. It must be backported as far as 2.6.
In commit c2c043ed4 ("BUG/MEDIUM: stats: Consume the request except when
parsing the POST payload"), a change about applet was pushed too early. The
applet must still call cf_shutr() when the response is fully sent. It is
planned to rely on SE_FL_EOS flag, just like connections. But it is not
possible for now.
However, at first glance, this bug has no visible effect.
It is 2.8-specific. No backport needed.
Previously performing a config check of `.github/h2spec.config` would report a
20 byte leak as reported in GitHub Issue #2082.
The leak was introduced in a6c0a59e9a, which is
dev only. No backport needed.
This patch follows this commit which was not sufficient:
BUG/MINOR: quic: Missing STREAM frame data pointer updates
Indeed, after updating the ->offset field, the bit which informs the
frame builder of its presence must be systematically set.
This bug was revealed by the following BUG_ON() from
quic_build_stream_frame() :
bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515
This should fix the last crash reported in GitHub issue #2074.
Must be backported to 2.6 and 2.7.
Since 465a6c8 ("BUG/MEDIUM: applet: only set appctx->sedesc on
successful allocation"), sedesc is attached to the appctx after the
task is successfully allocated.
If the task fails to be allocated, the current sedesc cleanup is performed
on appctx->sedesc, which still points to NULL, so the sedesc won't be
freed.
This is fine when sedesc is provided as argument (!=NULL), but leads
to memory leaks if sedesc is allocated locally.
It was shown in GH #2086 that if sedesc != NULL when passed as
argument, it shouldn't be freed on error paths. This is what 465a6c8
was trying to address.
In an attempt to fix both issues at once, we do as Christopher
suggested: that is moving sedesc allocation attempt at the
end of the function, so that we don't have to free it in case
of error, thus removing the ambiguity.
(We won't risk freeing a sedesc that does not belong to us)
If we fail to allocate sedesc, then the task that was previously
created locally is simply destroyed.
This needs to be backported to 2.6 with 465a6c8 ("BUG/MEDIUM: applet:
only set appctx->sedesc on successful allocation")
[Copy pasting the original backport note from Willy:
In 2.6 the function is slightly
different and called appctx_new(), though the issue is exactly the
same.]
This old bug was revealed because of the commit 407210a34 ("BUG/MEDIUM:
stconn: Don't rearm the read expiration date if EOI was reached"). But it is
still possible to hit it if there is no server timeout. At first glance,
2.8 is not affected. But the fix remains valid.
When a shutdown for writes is performed, the H1 connection must be notified
to be released. If it is subscribed for any I/O events, it is not an
issue. But, if there is no subscription, no I/O event is reported to the H1
connection and it remains alive. If the message was already fully received,
nothing more happens.
On my side, I was able to trigger the bug by freezing the session. Some
users reported a spinning loop on process_stream(). Not sure how to trigger
the loop. To freeze the session, the client timeout must be reached while
the server response was already fully received. On old version (< 2.6), it
only happens if there is no server timeout.
To fix the issue, we must wake up the H1 connection on shutdown for writes
if there is no I/O subscription.
This patch must be backported as far as 2.0. It should fix the issue #2090
and #2088.
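A minimal hedged sketch of the fix described above (the field names follow common
mux conventions and are assumptions):

  /* in the shutdown-for-writes path: if nobody is subscribed for I/O
   * events, nothing will ever wake the H1 connection up, so do it now */
  if (!h1c->subs)
      tasklet_wakeup(h1c->wait_event.tasklet);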
The stats applet is designed to consume the request at the end, when it
finishes sending the response. And during the response forwarding, because
the request is not consumed, the applet states it will not consume
data. This avoids waking the applet up in a loop. When it finishes sending
the response, the request is consumed.
For POST requests, there is no issue because the response is small
enough. It is sent in one go and must be processed by HTTP analyzers. Thus
the forwarding is not performed by the applet itself. The applet is always
able to consume the request, regardless of the payload length.
But for other requests, it may be an issue. If the response is too big to be
sent in one go and if the request is not fully received when the response
headers are sent, the applet may be blocked infinitely, not consuming the
request. Indeed, in this case the applet will be switched to infinite forward
mode and the request will not be consumed immediately. At the end, the request
buffer is flushed. But if some data must still be received, the applet is not
woken up because it is still in "not-consuming" mode.
So, to fix the issue, we must take care to re-enable data consuming when the
end of the response is reached.
This patch must be backported as far as 2.6.
In the syslog applet, when a message was not fully received, we must request
more data by calling appctx_need_more_data() and not by setting the
CF_READ_DONTWAIT flag on the request channel. Indeed, this flag is only used
to perform a single read attempt.
This patch could be backported as far as 2.4. On 2.5 and 2.4,
applet_need_more_data() must be replaced by si_cant_get().
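A hedged sketch of the pattern in the syslog applet's I/O handler (the condition
name is illustrative only):

  if (msg_incomplete) {
      /* ask to be woken up when more request data arrives, instead of
       * setting CF_READ_DONTWAIT on the request channel */
      applet_need_more_data(appctx);
      return;
  }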
Replace all BUG_ON() on frame allocation failure by a CONNECTION_CLOSE
sending with INTERNAL_ERROR code. This can happen for the following
cases :
* sending of MAX_STREAM_DATA
* sending of MAX_DATA
* sending of MAX_STREAMS_BIDI
In other cases (STREAM, STOP_SENDING, RESET_STREAM), an allocation
failure will only result in the current operation being interrupted and
retried later. However, it may be desirable in the future to replace
this with a simpler CONNECTION_CLOSE emission to recover better under a
memory pressure issue.
This should be backported up to 2.7.
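For illustration, a hedged sketch of the new error path when building e.g. a
MAX_STREAM_DATA frame; the frame allocator name and the error constant are
assumptions, qcc_emit_cc() being the function named in the following patches:

  frm = qc_frm_alloc(QUIC_FT_MAX_STREAM_DATA);  /* assumed allocator name */
  if (!frm) {
      /* out of memory: close the whole connection instead of crashing */
      qcc_emit_cc(qcc, QC_ERR_INTERNAL_ERROR);
      return 1;
  }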
Emit a CONNECTION_CLOSE with INTERNAL_ERROR code each time qcs
allocation fails. This can happen in two cases :
* when creating a local stream through application layer
* when instantiating a remote stream through qcc_get_qcs()
In both cases, error paths are already in place to interrupt the current
operation and a CONNECTION_CLOSE will be emitted soon after.
This should be backported up to 2.7.
Add BUG_ON() statements to ensure qcc_emit_cc()/qcc_emit_cc_app() is not
called more than once per connection. This should improve the code
resilience of MUX-QUIC and H3 and will ensure that a scheduled
CONNECTION_CLOSE is not overwritten by another one with a different
error code.
This commit relies on the previous one to ensure that QUIC operations are
no longer conducted once a CONNECTION_CLOSE has been prepared :
commit d7fbf458f8a4c5b09cbf0da0208fbad70caaca33
MINOR: mux-quic: interrupt most operations if CONNECTION_CLOSE scheduled
This should be backported up to 2.7.
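A hedged sketch of the guard (QC_CF_CC_EMIT is the flag visible in the 2.6 diff
further down; the function body is deliberately simplified):

  void qcc_emit_cc(struct qcc *qcc, int err)
  {
      /* a CONNECTION_CLOSE must not already be scheduled for this connection */
      BUG_ON(qcc->flags & QC_CF_CC_EMIT);

      /* ... register <err> on the quic-conn layer here ... */
      qcc->flags |= QC_CF_CC_EMIT;
  }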
Ensure that external MUX operations are interrupted if a
CONNECTION_CLOSE is scheduled. This was already the cases for some
functions. This is extended to the qcc_recv*() family for
MAX_STREAM_DATA, RESET_STREAM and STOP_SENDING.
Also, qcc_release_remote_stream() is skipped in qcs_destroy() if a
CONNECTION_CLOSE is already scheduled.
All of this will ensure we only proceed to minimal treatment as soon as
a CONNECTION_CLOSE is prepared. Indeed, all sending and receiving is
stopped as soon as a CONNECTION_CLOSE is emitted so only internal
cleanup code should be necessary at this stage.
This should prevent a registered CONNECTION_CLOSE error status from being
overwritten by an error in a follow-up treatment.
This should be backported up to 2.7.
HTTP/3 graceful shutdown operation is used to emit a GOAWAY followed by
a CONNECTION_CLOSE with H3_NO_ERROR status. It is used for every
connection on release which means that if a CONNECTION_CLOSE was already
registered for a previous error, its status code is overwritten.
To fix this, skip shutdown operation if a CONNECTION_CLOSE is already
registered at the MUX level. This ensures that the correct error status
is reported to the peer.
This should be backported up to 2.6. Note that qc_shutdown() does not
exist on 2.6 so the modification will have to be made directly in
qc_release() as follows :
diff --git a/src/mux_quic.c b/src/mux_quic.c
index 49df0dc418..3463222956 100644
--- a/src/mux_quic.c
+++ b/src/mux_quic.c
@@ -1766,19 +1766,21 @@ static void qc_release(struct qcc *qcc)
TRACE_ENTER(QMUX_EV_QCC_END, conn);
- if (qcc->app_ops && qcc->app_ops->shutdown) {
- /* Application protocol with dedicated connection closing
- * procedure.
- */
- qcc->app_ops->shutdown(qcc->ctx);
+ if (!(qcc->flags & QC_CF_CC_EMIT)) {
+ if (qcc->app_ops && qcc->app_ops->shutdown) {
+ /* Application protocol with dedicated connection closing
+ * procedure.
+ */
+ qcc->app_ops->shutdown(qcc->ctx);
- /* useful if application protocol should emit some closing
- * frames. For example HTTP/3 GOAWAY frame.
- */
- qc_send(qcc);
- }
- else {
- qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0);
+ /* useful if application protocol should emit some closing
+ * frames. For example HTTP/3 GOAWAY frame.
+ */
+ qc_send(qcc);
+ }
+ else {
+ qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0);
+ }
}
if (qcc->task) {
A H3 unidirectional stream is always opened with its stream type first
encoded as a QUIC variable integer. If the STREAM frame contains some
data but not enough to decode this varint, haproxy would crash due to an
ABORT_NOW() statement.
To fix this, ensure we support an incomplete stream type. In this case,
h3_init_uni_stream() returns 0 and the buffer content is not cleared.
Stream decoding will resume when new data are received for this stream
which should be enough to decode the stream type varint.
This bug has never occurred in production because standard H3 stream types
are small enough to be encoded on a single byte.
This should be backported up to 2.6.
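A hedged sketch of the new behaviour of h3_init_uni_stream(); the varint decoder
name and buffer variable are assumptions:

  uint64_t type;
  size_t len = 0;

  if (!b_quic_dec_int(&type, rxbuf, &len)) {  /* assumed varint decoder */
      /* not enough bytes to decode the stream type: keep the buffer
       * content untouched and resume when more data is received */
      return 0;
  }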
Instead of reporting the inaccurate "malloc_trim() support" on -vv, let's
report the case where the memory allocator was actively replaced from the
one used at build time, as this is the corner case we want to be cautious
about. We also put a tainted bit when this happens so that it's possible
to detect it at run time (e.g. the user might have inherited it from an
environment variable during a reload operation).
The now unused is_trim_enabled() function was finally dropped.
The runtime detection of the default memory allocator was broken by
commit d8a97d8f6 ("BUG/MINOR: illegal use of the malloc_trim() function
if jemalloc is used") due to a misunderstanding of its role. The purpose
is not to detect whether we're on non-jemalloc but whether or not the
allocator was changed from the one we booted with, in which case we must
be extra cautious and absolutely refrain from calling malloc_trim() and
its friends.
This was done only to drop the message saying that malloc_trim() is
supported, which will be totally removed in another commit, and could
possibly be removed even in older versions if this patch would get
backported since in the end it provides limited value.
There's a recurring issue regarding shared library loading from Lua. If
the imported library is linked with a different version of openssl but
doesn't use it, the check will trigger and emit a warning. In practice
it's not necessarily a problem as long as the API is the same, because
all symbols are replaced and the library will use the included ssl lib.
It's only a problem if the library comes with a different API because
the dynamic linker will only replace known symbols with ours, and not
all. Thus the loaded lib may call (via a static inline or a macro) a
few different symbols that will allocate or preinitialize structures,
and which will then pass them to the common symbols coming from a
different and incompatible lib, exactly what happens to users of Lua's
luaossl when building haproxy with quictls and without rebuilding
luaossl.
In order to better address this situation, we now define groups of
symbols that must always appear/disappear in a consistent way. It's OK
if they're all absent from either haproxy or the lib, it means that one
of them doesn't use them so there's no problem. But if any of them is
defined on any side, all of them must be in the exact same state on the
two sides. The symbols are represented using a bit in a mask, and the
mask of the group of other symbols they're related to. This allows
checking 64 symbols, which should be OK for a while. The first ones that
are tested for now are symbols whose combination differs between
openssl versions 1.0, 1.1, and 3.0 as well as quictls. Thus a difference
there will indicate upcoming trouble, but no error will mean that we're
running on a seemingly compatible API and that all symbols should be
replaced at once.
The same mechanism could possibly be used for pcre/pcre2, zlib and the
few other optional libs that may occasionally cause runtime issues when
used by dependencies, provided that discriminatory symbols are found to
distinguish them. But in practice such issues are pretty rare, mainly
because loading standard libs via Lua on a custom build of haproxy is
not very common.
In the event that further symbol compatibility issues would be reported
in the future, backporting this patch as well as the following series
might be an acceptable solution given that the scope of changes is very
narrow (the malloc stuff is needed so that the malloc/free tests can be
dropped):
BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used
MINOR: pools: make sure 'no-memory-trimming' is always used
MINOR: pools: intercept malloc_trim() instead of trying to plug holes
MEDIUM: pools: move the compat code from trim_all_pools() to malloc_trim()
MINOR: pools: export trim_all_pools()
MINOR: pattern: use trim_all_pools() instead of a conditional malloc_trim()
MINOR: tools: relax dlopen() on malloc/free checks
Now that we can provide a safe malloc_trim() we don't need to detect
anymore that some dependencies use a different set of malloc/free
functions than ours because they will use the same as those we're
seeing, and we control their use of malloc_trim(). The comment about
the incompatibility with DEBUG_MEM_STATS is not true anymore either
since the feature relies on macros so we're now OK.
This will stop catching libraries linked against glibc's allocator
when haproxy is natively built with jemalloc. This was especially
annoying since dlopen() on a lib depending on jemalloc tends to
fail on TLS issues.
We already have some generic code in trim_all_pools() to implement the
equivalent of malloc_trim() on jemalloc and macos. Instead of keeping the
logic there, let's just move it to our own malloc_trim() implementation
so that we can unify the mechanism and the logic. Now any low-level code
calling malloc_trim() will either be disabled by haproxy's config if the
user decides to, or will be mapped to the equivalent mechanism if malloc()
was intercepted by a preloaded jemalloc.
trim_all_pools() preserves the benefit of serializing threads (which we
must not impose to other libs which could come with their own threads).
It means that our own code should mostly use trim_all_pools() instead of
calling malloc_trim() directly.
As reported by Miroslav in commit d8a97d8f6 ("BUG/MINOR: illegal use of
the malloc_trim() function if jemalloc is used") there are still occasional
cases where it's discovered that malloc_trim() is being used without its
suitability being checked first. This is a problem when using another
incompatible allocator. But there's a class of use cases we'll never be
able to cover, it's dynamic libraries loaded from Lua. In order to address
this more reliably, we now define our own malloc_trim() that calls the
previous one after checking that the feature is supported and that the
allocator is the expected one. This way child libraries that would call
it will also be safe.
The function is intentionally left defined all the time so that it will
be possible to clean up some code that uses it by removing ifdefs.
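Conceptually, and only as a hedged sketch (the names are illustrative, the real
code differs), the wrapper behaves like this:

  /* our own malloc_trim(): only forward to the real one when trimming is
   * allowed and the allocator seen at run time is the one we booted with */
  int malloc_trim(size_t pad)
  {
      if (!trim_enabled || !using_default_allocator)
          return 0;
      return real_malloc_trim(pad);  /* resolved at startup, e.g. via dlsym() */
  }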
The global option 'no-memory-trimming' was added in 2.6 with commit
c4e56dc58 ("MINOR: pools: add a new global option "no-memory-trimming"")
but there were some cases left where it was not considered. Let's make
is_trim_enabled() also consider it.
Complete traces with information from qcc and qcs instances about
flow-control level. This should help to debug further issues on the sending path.
This must be backported up to 2.7.
Change the trace from developer to data level whenever the flow control
limitation is updated following a MAX_DATA or MAX_STREAM_DATA reception.
This should be backported up to 2.7.
Add traces for _qc_send_qcs() function. Most notably, traces have been
added each time a qc_stream_desc buffer allocation fails and when the stream
or connection flow-control limit is reached. This should improve debugging for
emission issues.
This must be backported up to 2.7.
Connection flow-control level calculation is a bit complicated. To
ensure it is never exceeded, each time a transfer occurs from a
qcs.tx.buf to its qc_stream_desc buffer it is accounted in
qcc.tx.offsets at the connection level. This value is not decremented
even if the corresponding STREAM frame is rejected by the quic-conn
layer as its emission will be retried later.
In normal cases this works as expected. However there is an issue if a
qcs instance is removed with prepared data left. In this case, its data
is still accounted in qcc.tx.offsets despite being removed which may
block other streams. This happens every time a qcs is reset with
remaining data which will be discarded in favor of a RESET_STREAM frame.
To fix this, if a stream has prepared data in qcc_reset_stream(), it is
decremented from qcc.tx.offsets. A BUG_ON() has been added to ensure
qcs_destroy() is never called for a stream with prepared data left.
This bug can cause two issues :
* transfer freeze as data unsent from closed streams still counts against the
  connection flow-control limit and will block other streams. Note that
  this issue was not reproduced so it's unsure if this really happens
  without the following issue first.
* a crash on a BUG_ON() statement in the qc_send() loop over
  qc_send_frames(). Streams may remain in the send list with nothing
  to send due to the connection flow-control limit. However, the limit is
  never reached through qcc_streams_sent_done() so the QC_CF_BLK_MFCTL flag
  is not set, which allows the loop to continue.
The last case was reproduced after several minutes of testing using the
following command :
$ ngtcp2-client --exit-on-all-streams-close -t 0.1 -r 0.1 \
--max-data=100K -n32 \
127.0.0.1 20443 "https://127.0.0.1:20443/?s=1g" 2>/dev/null
This should fix github issues #2049 and #2074.
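A hedged sketch of the accounting fix (the helper computing the amount of
prepared data is an assumption, the field names come from the text above):

  /* in qcc_reset_stream(): give back connection-level credit for bytes
   * that were prepared but will never be emitted */
  uint64_t prep = qcs_prep_bytes(qcs);   /* assumed helper */
  if (prep)
      qcc->tx.offsets -= prep;

  /* in qcs_destroy(): no prepared data may be left at this point */
  BUG_ON(qcs_prep_bytes(qcs));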
In the event that HAProxy is linked with the jemalloc library, it is still
shown that malloc_trim() is enabled when executing "haproxy -vv":
..
Support for malloc_trim() is enabled.
..
It's not so much the message that is a problem as the fact that malloc_trim()
is called in the pat_ref_purge_range() function without any check.
This was solved by setting the using_default_allocator variable to the
correct value in the detect_allocator() function and before calling
malloc_trim() it is checked whether the function should be called.
Starting haproxy with -dL helps enumerate the list of libraries in use.
But sometimes in order to go further we'd like to see their address
ranges. This is already supported on the CLI's "show libs" but not on
the command line where it can sometimes help troubleshoot startup issues.
Let's dump them when in verbose mode. This way it doesn't change the
existing behavior for those trying to enumerate libs to produce an archive.
When threads are disabled, the compiler complains that we might be
accessing tg->abs[] out of bounds since the array is of size 1. It
cannot know that the condition to do this is never met, and given
that it's not in a fast path, we can make it more obvious.
qc_notify_send() is used to wake up the MUX layer for sending. This
function first ensures that all sending conditions are met to avoid
waking up the MUX unnecessarily.
One of these conditions is to check if there is room in the congestion
window. However, when probe packets must be sent due to a PTO
expiration, RFC 9002 explicitly mentions that the congestion window
must be ignored, which was not the case prior to this patch.
This commit fixes this by first setting <pto_probe> of 01RTT packet
space before invoking qc_notify_send(). This ensures that congestion
window won't be checked anymore to wake up the MUX layer until probing
packets are sent.
This commit replaces the following one which was not sufficient :
commit e25fce03eb
BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing
This should be backported up to 2.7.
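A hedged sketch of the ordering described above (the structure member names are
assumptions based on the text):

  /* mark the 01RTT packet number space as probing first, so that the
   * congestion window check in qc_notify_send() is bypassed */
  qc->pktns[QUIC_TLS_PKTNS_01RTT].tx.pto_probe = 1;
  qc_notify_send(qc);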
On PTO probe timeout expiration, a probe packet must be emitted.
quic_pto_pktns() is used to determine for which packet space the timer
has expired. However, if MUX is already subscribed for sending, it is
woken up without checking first if this happened for the 01RTT packet
space.
It is unclear whether this is really a bug as in most cases, the MUX is
established only after the Initial and Handshake packet spaces are removed.
However, the situation is not so clear when 0-RTT is used. For this
reason, adjust the code to explicitly check for the 01RTT packet space
before waking up the MUX layer.
This should be backported up to 2.6. Note that qc_notify_send() does not
exist in 2.6 so it should be replaced by an explicit check of
(qc->subs && qc->subs->events & SUB_RETRY_SEND).
If appctx_new_on() fails to allocate a task, it will not remove the
freshly allocated sedesc from the appctx despite freeing it, causing
a UAF. Let's only assign appctx->sedesc upon success.
This needs to be backported to 2.6. In 2.6 the function is slightly
different and called appctx_new(), though the issue is exactly the
same.
In h1c_frt_stream_new() and h1c_bck_stream_new(), if we fail to completely
initialize the freshly allocated h1s, typically because sc_attach_mux()
fails, we must use h1s_destroy() to de-initialize it. Otherwise it stays
attached to the h1c when released, causing use-after-free upon the next
wakeup. This can be triggered upon memory shortage.
This needs to be backported to 2.6.
Using -dMfail alone does nothing unless tune.fail-alloc is set, which
renders it pretty useless as-is, and is not intuitive. Let's change
this so that the failure rate is preset to 1% when the option is set on
the command line. This allows injecting failures without having to edit
the configuration.
If we fail to allocate a new stream in sc_new_from_endp(), and the call
to sc_new() allocated the sedesc itself (which normally doesn't happen),
then it doesn't get released on the failure path. Let's explicitly
handle this case so that it's not overlooked and avoids some head
scratching sessions.
This may be backported to 2.6.
There's an occasional crash that can be triggered in sc_detach_endp()
when calling conn->mux->detach() upon memory allocation error. The
problem in fact comes from sc_attach_mux(), which doesn't reset the
sc type flags upon tasklet allocation failure, leading to an attempt
at detaching an incompletely initialized stconn. Let's just attach
the sc after the tasklet allocation succeeds, not before.
This must be backported to 2.6.
On the allocation error path in h2_init() we may check if
h2c->wait_event.tasklet needs to be released but it has not yet been
zeroed. Let's do this before jumping to the freeing location.
This needs to be backported to all maintained versions.
In h2s_close() we may dereference h2s->sd to get the sc, but this
function may be called on allocation error paths, so we must check
for this specific condition. Let's also update the comment to make
it explicitly permitted.
This needs to be backported to 2.6.
In stream_free() if we fail to allocate s->scb() we go to the path where
we try to free it, and it doesn't like being called with a null at all.
It's easily reproducible with -dMfail,no-cache and "tune.fail-alloc 10"
in the global section.
This must be backported to 2.6.
This bug arrived with this commit:
"MINOR: quic: implement qc_notify_send()".
The ->tx.pto_probe variable was no longer set when qc_process_timer(), the timer
task for the connection responsible for detecting packet loss and probing upon
PTO expiration, was run, leading to interrupted stream transfers. This was
revealed by failed blackhole interop tests where one could see that
qc_process_timer() was woken up without traces such as the following in the log file:
"needs to probe 01RTT packet number space"
Must be backported to 2.7 and to 2.6 if the commit mentioned above
is backported to 2.6 in the meantime.
The ACK frame ranges of packets were handled from the largest to the smallest
packet number, leading to a big number of ebtree insertions when the packets are
handled in the inverse order they were sent. This was detected a long time ago
but left in the code to stress our implementation. It is time to be more
efficient and process the packets so as to avoid useless ebtree insertions.
Modify qc_ackrng_pkts(), responsible for handling the acknowledged packets from
an ACK frame range.
Must be backported to 2.7.
Despite having replaced the SSL BIOs to use our own raw_sock layer, we
still didn't exploit the CO_SFL_MSG_MORE flag which is pretty useful to
avoid sending incomplete packets. It's particularly important for SSL
since the extra overhead almost guarantees that each send() will be
followed by an incomplete (and often odd-sized) segment.
We already have an xprt_st set of flags to pass info to the various
layers, so let's just add a new one, SSL_SOCK_SEND_MORE, that is set
or cleared during ssl_sock_from_buf() to transfer the knowledge of
CO_SFL_MSG_MORE. This way we can recover this information and pass
it to raw_sock.
This alone is sufficient to increase by ~5-10% the H2 bandwidth over
SSL when multiple streams are used in parallel.
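A hedged sketch of the flag translation in ssl_sock_from_buf() (the surrounding
context is assumed):

  /* remember whether the caller announced more data, so the underlying
   * raw_sock layer can use MSG_MORE for this send */
  if (flags & CO_SFL_MSG_MORE)
      ctx->xprt_st |= SSL_SOCK_SEND_MORE;
  else
      ctx->xprt_st &= ~SSL_SOCK_SEND_MORE;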
Traces show that sendto() rarely has MSG_MORE on H2 despite sending
multiple buffers. The reason is that the loop iterating over the buffer
ring doesn't have this info and doesn't pass it down.
But now we know how many buffers are left to be sent, so we know whether
or not the current buffer is the last one. As such we can set this flag
for all buffers but the last one.
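A hedged sketch of the idea in the h2 send loop (br_count() is assumed here to
return the number of allocated buffers in the ring):

  /* every buffer except the last one of the ring can announce that
   * more data will follow */
  flags = (br_count(h2c->mbuf) > 1) ? CO_SFL_MSG_MORE : 0;
  sent  = conn->xprt->snd_buf(conn, conn->xprt_ctx, buf, b_data(buf), flags);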
Before muxes were used, we used to refrain from reading past the
buffer's reserve. But with muxes which have their own buffer, this
rule was a bit forgotten, resulting in an extraneous read to be
performed just because the rx buffer cannot be entirely transferred
to the stream layer:
sendto(12, "GET /?s=16k HTTP/1.1\r\nhost: 127."..., 84, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 84
recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 16320, 0, NULL, NULL) = 16320
recvfrom(12, ".123456789.12345", 16, 0, NULL, NULL) = 16
recvfrom(12, "6789.123456789.12345678\n.1234567"..., 15244, 0, NULL, NULL) = 182
recvfrom(12, 0x1e5d5d6, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
Here the server sends 16kB of payload after a headers block, the mux reads
16320 into the ibuf, and the stream layer consumes 15360 from the first
h1_rcv_buf(), which leaves 960 into the buffer and releases a few indexes.
The buffer cannot be realigned due to these remaining data, and a subsequent
read is made on 16 bytes, then again on 182 bytes.
By avoiding reading too much on the first call, we can avoid needlessly
filling this buffer:
recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 15360, 0, NULL, NULL) = 15360
recvfrom(12, "456789.123456789.123456789.12345"..., 16220, 0, NULL, NULL) = 1158
recvfrom(12, 0x1d52a3a, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
This is much more efficient and uses less RAM since the first buffer that
was emptied can now be released.
Note that a further improvement (tested) consists in reading even less
(typically 1kB) so that most of the data are transferred in zero-copy, and
are not read until process_stream() is scheduled. This patch doesn't do that
for now so that it can be backported without any obscure impact.
CertiK Skyfall Team reported that passing an index greater than
QPACK_SHT_SIZE in a qpack instruction referencing a literal field
name with name reference or an indexed field line will cause an
out-of-bounds read that may crash the process, and confirmed that
this fix addresses the issue.
This needs to be backported as far as 2.5.
sc-add-gpc() was implemented in 5a72d03 ("MINOR: stick-table:
implement the sc-add-gpc() action")
This new action was exposed everywhere sc-inc-gpc() is available,
except for http-after-response.
But there doesn't seem to be a technical constraint that prevents us from
exposing it in http-after-response.
It was probably overlooked, let's add it.
No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.
This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.
Must be backported to 2.6 and 2.7.
Emeric noticed that h2 bit-rate performance was always slightly lower
than h1 when the CPU is saturated. Strace showed that we were always
sending data in 2kB chunks, corresponding to the max_record size. What's
happening is that when this mechanism of dynamic record size was
introduced, the STREAMER flag at the stream level was relied upon.
Since all this was moved to the muxes, the flag has to be passed as
an argument to the snd_buf() function, but the mux h2 did not use it
despite a comment mentioning it, probably because before the multi-buf
it was not easy to figure out the status of the buffer.
The solution here consists in checking if the mbuf is congested or not,
by checking if it has more than one buffer allocated. If so we set the
CO_SFL_STREAMER flag, otherwise we don't. This way moderate size
exchanges continue to be made over small chunks, but downloads will
be able to use the large ones.
While it could be backported to all supported versions, it would be
better to limit it to the last LTS, so let's do it for 2.7 and 2.6 only.
This patch requires previous commit "MINOR: buffer: add br_single() to
check if a buffer ring has more than one buf".
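A hedged sketch of the heuristic (br_single() is the helper mentioned above, the
rest is assumed):

  /* more than one allocated buffer in the ring means the mbuf is
   * congested: treat the stream as a streamer to get large SSL records */
  if (!br_single(h2c->mbuf))
      flags |= CO_SFL_STREAMER;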
During performance tests, Emeric faced a case where the wakeups of
sc_conn_io_cb() caused by h2_resume_each_sending_h2s() were multiplied
by 5-50 and a lot of CPU was being spent doing this for apparently no
reason.
The culprit is h2_send() not behaving well with congested buffers and
small SSL records. What happens when the output is congested is that
all buffers are full, and data are emitted in 2kB chunks, which are
sufficient to wake all streams up again to ask them to send data again,
something that will obviously only work for one of them at best, and
waste a lot of CPU in wakeups and memcpy() due to the small buffers.
When this happens, the performance can be divided by 2-2.5 on large
objects.
Here the chosen solution against this is to keep in mind that as long
as there are still at least two buffers in the ring after calling
xprt->snd_buf(), it means that the output is congested and there's
no point trying again, because these data will just be placed into
such buffers and will wait there. Instead we only mark the buffer
decongested once we're back to a single allocated buffer in the ring.
By doing so we preserve the ability to deal with large concurrent
bursts while not causing a thundering herd by waking all streams for
almost nothing.
This needs to be backported to 2.7 and 2.6. Other versions could
benefit from it as well but it's not strictly necessary, and we can
reconsider this option if some excess calls to sc_conn_io_cb() are
faced.
Note that this fix depends on this recent commit:
MINOR: buffer: add br_single() to check if a buffer ring has more than one buf
When detaching a stream, if it's the last one and the mbuf is blocked,
we leave without freeing the stream yet. We also refresh the h2c task's
timeout, except that it's possible that there's no such task in case
there is no client timeout, causing a crash. The fix just consists in
doing this when the task exists.
This bug has always been there and is extremely hard to meet even
without a client timeout. This fix has to be backported to all
branches, but it's unlikely anyone has ever met it anyway.
The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. If a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. It means this counter may become negative and the
counter tracking the number of idle connections is not decremented,
suggesting a leak.
So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless of the connection
flags.
This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.
This patch should fix #2078 and #2057. It must be backported as far as 2.2.
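For illustration, a hedged sketch of the contract of the conn_get_idle_flag()
helper described above (the removal check is illustrative, not the actual
implementation):

  /* return CO_FL_SAFE_LIST or CO_FL_IDLE_LIST if the connection currently
   * sits in an idle list and is not scheduled for removal, 0 otherwise */
  static inline unsigned int conn_get_idle_flag(const struct connection *conn)
  {
      if (conn_is_scheduled_for_removal(conn))   /* hypothetical helper */
          return 0;
      return conn->flags & (CO_FL_SAFE_LIST | CO_FL_IDLE_LIST);
  }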
Some STREAM frame lengths were not updated before being duplicated, built
or requeued, contrary to their ack offsets. This leads haproxy to crash when
receiving acknowledgements for such frames, with this thread #1 backtrace:
Thread 1 (Thread 0x7211b6ffd640 (LWP 986141)):
#0 ha_crash_now () at include/haproxy/bug.h:52
No locals.
#1 b_del (b=<optimized out>, del=<optimized out>) at include/haproxy/buf.h:436
No locals.
#2 qc_stream_desc_ack (stream=stream@entry=0x7211b6fd9bc8, offset=offset@entry=53176, len=len@entry=1122) at src/quic_stream.c:111
Thank you to @Tristan971 for having provided such traces which reveal this issue:
[04|quic|5|c_conn.c:1865] qc_requeue_nacked_pkt_tx_frms(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1179] qc_frm_unref(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1186] qc_frm_unref(): remove frame reference : qc@0x72119c22cfe0 frm@0x72118863d260 STREAM_F uni=0 fin=1 id=460 off=52957 len=1122 3244
[04|quic|5|_frame.c:1194] qc_frm_unref(): leaving : qc@0x72119c22cfe0
[04|quic|5|c_conn.c:1902] qc_requeue_nacked_pkt_tx_frms(): updated partially acked frame : qc@0x72119c22cfe0 frm@0x72119c472290 STREAM_F uni=0 fin=1 id=460 off=53176 len=1122
Note that haproxy has a much higher chance to crash if this frame is the last
one (fin bit set). But another condition must be fulfilled to update the ack
offset: a previous STREAM frame from the same stream with the same offset but
with less data must be acknowledged by the peer.
For other frames without the fin bit in the same conditions, I guess the stream
may be truncated because too much data is removed from the stream when it is
acknowledged.
Must be backported to 2.6 and 2.7.
There is a bug in the smp_fetch_dport() function which affects the 'f' case,
also known as 'fc_dst_port' sample fetch.
conn_get_src() is used to retrieve the address prior to calling conn_dst().
But this is wrong: conn_get_dst() should be used instead.
Because of that, conn_dst() may return unexpected results since the dst
address is not guaranteed to be set depending on the conn state at the time
the sample fetch is used.
This was reported by Corin Langosch on the ML:
during his tests he noticed that using fc_dst_port in a log-format string
resulted in the correct value being printed in the logs but when he used it
in an ACL, the ACL did not evaluate properly.
This can be easily reproduced with the following test conf:
|frontend test-http
| bind 127.0.0.1:8080
| mode http
|
| acl test fc_dst_port eq 8080
| http-request return status 200 if test
| http-request return status 500 if !test
A request on 127.0.0.1:8080 should normally return 200 OK, but here it
will return a 500.
The same bug was also found in smp_fetch_dst_is_local() (fc_dst_is_local
sample fetch) by reading the code: the fix was applied twice.
This needs to be backported up to 2.5
[both sample fetches were introduced in 2.5 with 888cd70 ("MINOR:
tcp-sample: Add samples to get original info about client connection")]
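A hedged sketch of the fix in smp_fetch_dport() for the 'f' case (the surrounding
code is assumed):

  /* make sure the destination address is known before reading its port;
   * the bug was to call conn_get_src() here instead */
  if (!conn || !conn_get_dst(conn))
      return 0;
  smp->data.u.sint = get_host_port(conn_dst(conn));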
When a connection error is encountered during a receive, the error is not
immediately reported to the SE descriptor. We take care to process all
pending input data first. However, when the error is finally reported, a
fatal error is reported only if a read0 was also received. Otherwise, only a
pending error is reported.
With a raw socket, it is not an issue because shutdowns for read and write
are systematically reported too when a connection error is detected. So in
this case, the fatal error is always reported to the SE descriptor. But with
an SSL socket, in case of a pure SSL error, only the connection error is
reported. This prevents the fatal error from going up. And because the connection
is in error, no more receives or sends are performed on the socket. The mux is
blocked till a timeout is triggered at the stream level, looping infinitely
while trying to progress.
To fix the bug, during the demux stage, when there is no longer pending
data, the read error is reported to the SE descriptor, regardless of whether
the shutdown for reads was received or not. No more data are expected, so it is
safe.
This patch should fix the issue #2046. It must be backported to 2.7.
Like for the connection error code, when FDs are dumped, the SSL error code is
now displayed. This may help to diagnose why a connection error occurred.
This patch may be backported to help debugging.
When FDs are dumped, the connection error code is now displayed. This may help
to diagnose why a connection error occurred.
This patch may be backported to help debugging.
When HAProxy is stopping, the DNS resolutions must be stopped, except those
triggered from a "do-resolve" action. To do so, the resolutions themselves
cannot be destroyed, the current design is too complex. However, it is
possible to mute the resolvers tasks. The same is already performed with the
health-checks. On soft-stop, the tasks are still running periodically but
nothing is performed.
For the resolvers, when the process is stopping, before running a
resolution, we check all the requesters attached to this resolution. If at
least one requester is a stream or if a requester is attached to a running
proxy, a new resolution is triggered. Otherwise, we ignore the
resolution. It will be evaluated again on the next wakeup. This way,
"do-resolve" actions still work during soft-stop but other resolutions
are stopped.
Of course, it may be seen as a feature and not a bug because it was never
performed before. But it is in fact not expected at all to still perform
resolutions when HAProxy is stopping. In addition, a proxy option will be
added to change this behavior.
This patch partially fixes the issue #1874. It could be backported to 2.7
and maybe to 2.6. But no further.
On soft-stop, we must properly stop backends and not only proxies with at
least a listener. This is mandatory in order to stop the health checks. A
previous fix was provided to do so (ba29687bc1 "BUG/MEDIUM: proxy: properly
stop backends"). However, only the stop_proxy() function was fixed. When HAProxy
is stopped, this function is no longer used. So the same kind of fix must be
done in do_soft_stop_now().
This patch partially fixes the issue #1874. It must be backported as far as
2.4.
The ocsp-related CLI commands tend to work with OCSP_CERTIDs as well as
certificate paths so the path should also be added to the output of the
"show ssl ocsp-response" command when no certid or path is provided.
In order to increase usability, the "show ssl ocsp-response" also takes
a frontend certificate path as parameter. In such a case, it behaves the
same way as "show ssl cert foo.pem.ocsp".
If the last update before a deinit happens was successful, the pointer
to the httpclient in the ocsp update context was not reset while the
httpclient instance was already destroyed.
Instead of having a dedicated httpclient instance and its own code
decorrelated from the actual auto update one, the "update ssl
ocsp-response" will now use the update task in order to perform updates.
Since the cli command allows to update responses that were never
included in the auto update tree, a new flag was added to the
certificate_ocsp structure so that the said entry can be inserted into
the tree "by hand" and it won't be reinserted back into the tree after
the update process is performed. The 'update_once' flag "stole" a bit
from the 'fail_count' counter since it is the one less likely to reach
UINT_MAX among the ocsp counters of the certificate_ocsp structure.
This new logic required that every certificate_ocsp entry contained all
the ocsp-related information at all time since entries that are not
supposed to be configured automatically can still be updated through the
cli. The logic of the ssl_sock_load_ocsp was changed accordingly.
The dedicated proxy used for OCSP auto update is renamed OCSP-UPDATE
which should be more explicit than the previous HC_OCSP name. The
reference to the underlying httpclient is simply kept in the
documentation.
The certid is removed from the log line since it is not really
comprehensible and is replaced by the path to the corresponding frontend
certificate.
It is more or less a revert of commit b65af26e1 ("MEDIUM: mux-pt: Don't
always set a final error on SE on the sending path"). The PT multiplexer is
so simple that an error on the sending path is terminal. Unlike other muxes,
there is no connection level here. However, instead of reporting a final
error by setting SE_FL_ERROR, we set the SE_FL_EOS flag if a read0 was
received on the underlying connection. Concretely, it is always true with
the current design of the raw socket layer. But it is cleaner this way.
Without this patch, it is possible to block a TCP socket if a connection
error is triggered when data are sent (for instance a broken pipe) while the
upper stream does not expect to receive more data.
Note the patch above introduced a regression because error handling at the
connection level is quite simple. All errors are final. But we must keep in
mind it may change. And if so, this will require moving back to a 2-step
error handling in the mux-pt.
This patch must be backported to 2.7.
This one is printed as the iocb in the "show fd" output, and arguably
this wasn't very convenient as-is:
293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90)
Let's unstatify it and export it so that the symbol can now be resolved
from the various points that need it.
This bug was revealed by h2load tests run as follows:
h2load -t 4 --npn-list h3 -c 64 -m 16 -n 16384 -v https://127.0.0.1:4443/
This opens (-c) 64 QUIC connections and sends (-n) 16384 h3 requests from (-t) 4
threads, i.e. 256 requests per connection. Such tests could not always pass and
often ended with results such as these displayed by h2load:
finished in 53.74s, 38.11 req/s, 493.78KB/s
requests: 16384 total, 2944 started, 2048 done, 2048 succeeded, 14336
failed, 14336 errored, 0 timeout
status codes: 2048 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 25.92MB (27174537) total, 102.00KB (104448) headers (space
savings 1.92%), 25.80MB (27053569) data
UDP datagram: 3883 sent, 24330 received
min max mean sd ± sd
time for request: 48.75ms 502.86ms 134.12ms 75.80ms 92.68%
time for connect: 20.94ms 331.24ms 189.59ms 84.81ms 59.38%
time to 1st byte: 394.36ms 417.01ms 406.72ms 9.14ms 75.00%
req/s : 0.00 115.45 14.30 38.13 87.50%
The number of successful requests was always a multiple of 256.
Activating the traces also showed that some connections were blocked after having
successfully completed their handshakes because their mux was never started. The
mux is started upon acceptance of the connection.
Under heavy load, some connections were never accepted. From the moment where
more than 4 (MAXACCEPT) connections were enqueued before a listener could be
woken up to accept at most 4 connections, the remaining connections were not
accepted, not even later at the second listener tasklet wakeup.
Add a call to tasklet_wakeup() to the accept-list tasklet of the listeners to
wake it up if there are remaining connections to accept after having called
listener_accept(). In this case the listener must not be removed from this
accept list, otherwise at the next call it will not accept anything more.
Must be backported to 2.7 and 2.6.
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have a high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)
This patch fixes the issue by using the new define MASTER_MAXCONN which
defines a default maxconn of 100 for the master process.
Must be backported as far as 2.5.
The goal is to send signals to random threads at random instants so that
they spin for a random delay in a relax() loop, trying to give back the
CPU to another competing hardware thread, in hope that from time to time
this can trigger in critical areas and increase the chances to provoke a
latent concurrency bug. For now none were observed.
For example, this command starts 64 such tasks waking after random delays
of 0-1ms and delivering signals to trigger such loops on 3 random threads:
for i in {1..64}; do
socat - /tmp/sock1 <<< "expert-mode on;debug dev delay-inj 2 3"
done
This command is only enabled when DEBUG_DEV is set at build time.
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.
Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.
First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.
Second, the checks after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_delete_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().
With both points addressed, there's no more visible race left:
- takeover() only happens under the connection list's lock and cannot
compete with fd_delete() since fd_delete() must first remove the
connection from the list before deleting the FD. That's also why it
doesn't need to call _fd_delete_orphan() when dropping its running
bit.
- takeover() sets its running bit then atomically replaces the thread
mask, so that until that's done, it doesn't validate the condition
to end the synchronization loop in fd_update_events(). Once it's OK,
the previous thread's bit is lost, and this is checked for in
fd_update_events()
- fd_update_events() can compete with fd_delete() at various places
which are explained above. Since fd_delete() clears the thread mask
after setting its running bit and after setting the FD_MUST_CLOSE
bit, the synchronization loop guarantees that the thread mask is seen
before going further, and that once it's seen, the FD_MUST_CLOSE flag
is already present.
- fd_delete() may start while fd_update_events() has already started,
but fd_delete() must hold a bit in thread_mask before starting, and
that is checked by the first test in fd_update_events() before setting
the running_mask.
- the poller's _update_fd() will not compete against _fd_delete_orphan()
nor fd_insert() thanks to the fd_grab_tgid() that's always done before
updating the polled_mask, and guarantees that we never pretend that a
polled_mask has a bit before the FD is added.
The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:
{ volatile int i = 10000; while (i--); }
Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.
Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.
The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable in the long run.
In case too many thread groups are needed for the threads, we emit
an error indicating the problem. Unfortunately the threads and groups
counts were reversed.
This can be backported to 2.6.
The NUMA detection code tries not to interfere with any taskset the user
could have specified in init scripts. For this it compares the number of
CPUs available with the number the process is bound to. However, the CPU
count is retrieved after an upper bound of MAX_THREADS has been applied
to it, so if the machine has more than 64 CPUs, the comparison always fails
and makes haproxy think the user has already enforced a binding, and it
does not pin it anymore to a single NUMA node.
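As a rough illustration of the comparison involved (a standalone sketch,
not the haproxy CPU detection code; the helper name is made up), the point
of the fix is that the machine-wide CPU count must not be clamped to
MAX_THREADS before being compared to the process affinity:
    #define _GNU_SOURCE
    #include <sched.h>      /* sched_getaffinity(), CPU_COUNT() */
    #include <unistd.h>     /* sysconf() */
    #include <stdbool.h>

    /* Returns true if the process looks bound to a subset of the machine's
     * CPUs, i.e. the user already enforced a binding (taskset, cgroups...). */
    static bool user_already_bound(void)
    {
        cpu_set_t set;
        long online = sysconf(_SC_NPROCESSORS_ONLN);

        if (sched_getaffinity(0, sizeof(set), &set) != 0)
            return false;
        return CPU_COUNT(&set) < online;
    }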
This can be verified by issuing:
$ socat /path/to/sock - <<< "show info" | grep thread
On a dual 48-CPU machine it reports 64, implying that threads are allowed
to run on the second socket:
Nbthread: 64
With this fix, the function properly reports 96, and the output shows 48,
indicating that a single NUMA node was used:
Nbthread: 48
Of course nothing is changed when "no numa-cpu-mapping" is specified:
Nbthread: 64
This can be backported to 2.4.
This issue was revealed by the "Multiple streams" QUIC tracker test which very often
fails (locally) with a file of about 1 Mbyte (x4 streams). The QUIC tracker log
revealed that from its point of view, the 4 files were never all received entirely:
"results" : {
"stream_0_rec_closed" : true,
"stream_0_rec_offset" : 1024250,
"stream_0_snd_closed" : true,
"stream_0_snd_offset" : 15,
"stream_12_rec_closed" : false,
"stream_12_rec_offset" : 72689,
"stream_12_snd_closed" : true,
"stream_12_snd_offset" : 15,
"stream_4_rec_closed" : true,
"stream_4_rec_offset" : 1024250,
"stream_4_snd_closed" : true,
"stream_4_snd_offset" : 15,
"stream_8_rec_closed" : true,
"stream_8_rec_offset" : 1024250,
"stream_8_snd_closed" : true,
"stream_8_snd_offset" : 15
},
But this is in contradiction with other QUIC tracker logs which confirm that haproxy
has really (re)sent the stream at the suspected offset (stream_12_rec_offset):
  1152085,
  "transport",
  "packet_received",
  {
     "frames" : [
        {
           "frame_type" : "stream",
           "length" : "155",
           "offset" : "72689",
           "stream_id" : "12"
        }
     ],
     "header" : {
        "dcid" : "a14479169ebb9dba",
        "dcil" : "8",
        "packet_number" : "466",
        "packet_size" : 190
     },
     "packet_type" : "1RTT"
  }
When detected as lost, the packets are enlisted, then their frames are
requeued in their packet number space by qc_requeue_nacked_pkt_tx_frms().
This was done using a local list which was spliced to the packet number
frame list. This had the bad effect of retransmitting the frames in the
inverse order they were sent in. This is something the QUIC tracker Go
client does not like at all!
Removing the frame splicing fixes this issue and allows haproxy to pass the
"Multiple streams" test.
Must be backported to 2.7.
Normally the task_wakeup() in sock_conn_io_cb() is expected to
happen on the same thread the FD is attached to. But due to the
way the code was arranged in the past (with synchronous callbacks)
we continue to update connections after the wakeup, which always
makes the reader have to think deeply whether it's possible or not
to call another thread there. Let's just move the tasklet_wakeup()
at the end to make sure there's no problem with that.
This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.
Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame. It returns the CID to be
retired into the <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired.
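As a minimal sketch of the check described above (hypothetical types and
helper, not the haproxy implementation; RFC 9000 section 19.16 forbids a
peer from retiring the connection ID it used as the packet's DCID):
    #include <string.h>

    struct cid_sketch { unsigned char data[20]; unsigned char len; };

    /* Returns 0 if the frame is invalid because the CID designated by its
     * sequence number matches the DCID of the packet carrying the frame. */
    static int retire_cid_frm_is_valid(const struct cid_sketch *cid_to_retire,
                                       const struct cid_sketch *pkt_dcid)
    {
        if (cid_to_retire->len == pkt_dcid->len &&
            memcmp(cid_to_retire->data, pkt_dcid->data, pkt_dcid->len) == 0)
            return 0;
        return 1;
    }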
Must be backported to 2.7.
Since the following commit :
commit fb375574f9
MINOR: quic: mark quic-conn as jobs on socket allocation
quic-conn instances are marked as jobs. This prevents the haproxy process
from stopping while there are transfers in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.
However, there is no mechanism to wake up quic connections left in
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connection's timer has expired.
To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.
This should be backported up to 2.7.
A new global quic-conn list has been added by the previous patch. It will
contain every quic-conn in closing or draining state.
Thus, it is now easier to include or skip them on a "show quic" output :
when the default list on the current thread has been browsed entirely,
either we skip to the next thread or we look at the closing list on the
current thread.
This should be backported up to 2.7.
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained during a certain timer to handle packet
reordering or loss.
A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters one of these states, it will be moved from the
default global list to the new closing list.
The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.
A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attached to a quic-conn instance in case of a
running "show quic".
This should be backported up to 2.7.
This patch adds the support for the PS algorithms when verifying JWT
signatures (rsa-pss). It was not managed during the first implementation
and previously raised an "Unmanaged algorithm" error.
The tests use the same rsa signature as the plain rsa tests (RS256 ...)
and the implementation simply adds a call to
EVP_PKEY_CTX_set_rsa_padding in the function that manages rsa and ecdsa
signatures.
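As a rough illustration of that call (a minimal, hypothetical verification
sketch built on the public OpenSSL API, not the actual haproxy jwt code):
    #include <openssl/evp.h>
    #include <openssl/rsa.h>

    /* Verify an RS/PS-style signature over <msg>. For PS256/384/512, the
     * only difference with plain RSA is the padding mode set on the pkey
     * context. Returns 1 if the signature is valid. */
    static int verify_rsa_pss(EVP_PKEY *pkey, const EVP_MD *md,
                              const unsigned char *msg, size_t msglen,
                              const unsigned char *sig, size_t siglen)
    {
        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_PKEY_CTX *pctx = NULL;
        int ret = 0;

        if (!ctx)
            return 0;
        if (EVP_DigestVerifyInit(ctx, &pctx, md, NULL, pkey) == 1 &&
            EVP_PKEY_CTX_set_rsa_padding(pctx, RSA_PKCS1_PSS_PADDING) > 0 &&
            EVP_DigestVerify(ctx, sig, siglen, msg, msglen) == 1)
            ret = 1;

        EVP_MD_CTX_free(ctx);
        return ret;
    }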
The signatures in the reg-test were built thanks to the PyJWT python
library once again.
With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") relative offset calculation was fixed in dns_session_io_handler()
and dns_process_req() functions.
But if we compare with the changes performed in the patch that introduced
the bug: d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail
instead of absolute"), we can see that dns_resolve_send() is missing from
the patch.
This patch applies both 737d10f and ("BUG/MINOR: dns: fix ring offset calculation
on first read") fixes to the dns_resolve_send() function.
With this last commit, we should be back to the pre-d9c7188 behavior.
No backport needed.
With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") ring offset is now properly re-adjusted in dns_session_io_handler()
and dns_process_req().
But the previous patch does not cope well if the first read is performed
on a non-empty ring since relative ofs will be computed from ds->ofs=0 or
dss->ofs_req=0.
In this case, the relative offset could become invalid since we mix up
relative offsets with absolute offsets.
To fix this, we apply the same logic performed in d9c7188 ("MEDIUM: ring:
make the offset relative to the head/tail instead of absolute") for the
cli_io_handler_show_ring() function: that is using b_peek_ofs(buf, 0) to
set the contextual offset instead of hard-coding it to 0.
This should be considered as a minor bugfix since this bug was discovered by
reading the code: 737d10f already survived a good amount of stress-tests as
shown in GH #2068.
No backport needed as 737d10f is not marked for backports.
Since d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail instead
of absolute"), ring offset calculation has changed: we don't rely on ring->ofs
absolute offset anymore.
But with the above patch, relative offset is not properly calculated in
sink_forward_oc_io_handler() and sink_forward_io_handler().
The issue here is the same as 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head") since dns and sink_forward share the same
ring logic:
When the ring is becoming full, ring_write() will try to regain some space to
insert new data by calling b_del() on older messages. Here b_del() moves
buffer's head under the hood, and since ring->ofs cannot be used to "correct"
the relative offset, both sink_forward_oc_io_handler() and
sink_forward_io_handler() start to get invalid offsets.
At this point, we will suffer from ring data corruption resulting in unexpected
behavior or process crashes.
This can be easily demonstrated with the following test:
|log-forward syslog
| dgram-bind 127.0.0.1:5114
| log ring@logbuffer local0
|
|ring logbuffer
| format rfc5424
| size 16384
| server logserver 127.0.0.1:5114
Haproxy will forward incoming logs on udp@127.0.0.1:5114 to
tcp@127.0.0.1:5114
Then use the following tcp server:
nc -l -p 5114
With the following udp log sender:
|while [ 1 ]
|do
| logger --udp --server 127.0.0.1 -P 5114 -p user.warn "Test 7"
|done
Once the ring buffer is full (it takes less than a second to fill the 16k
buffer) haproxy starts to misbehave and the log forwarding stops.
We apply the same fix as in 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head").
Please note the ~0 case that is handled slightly differently in this patch:
this is required to properly start reading from a non-empty ring. This case
will be fixed in dns related code in the following patch.
This does not need to be backported as d9c7188 was not marked for backports.
Modify quic_transport_params_dump() and other functions related to dumping the
transport parameter values from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.
Must be backported to 2.7.
Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.
Validated by quic-tracker spin bit test.
Must be backported to 2.7.
Add a new ->curr_cid_seq_num member to the quic_conn struct to store the
connection ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.
Must be backported to 2.7.
Add a new ->next_cid_seq_num member to the quic_conn struct to store the next
sequence number to be used when allocating a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.
Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the handshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
these new connection IDs.
Must be backported to 2.7.
A peer must not send active_connection_id_limit values smaller than 2
which is also the minimum value when not sent.
Make the transport parameters decoding fail in this case.
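A minimal sketch of that decoding check (the constant name is illustrative;
RFC 9000 defines 2 as both the minimum and the default value):
    #define QUIC_TP_ACTIVE_CID_LIMIT_MIN 2ULL   /* illustrative constant */

    /* Returns 0 to make transport parameter decoding fail when the peer
     * advertises an active_connection_id_limit lower than the minimum. */
    static int quic_tp_check_active_cid_limit(unsigned long long value)
    {
        return value >= QUIC_TP_ACTIVE_CID_LIMIT_MIN;
    }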
Must be backported to 2.7.
STREAM frame retransmission has been recently fixed. A new boolean field
<dup> was created for quic_stream frame type. It is set for duplicated
STREAM frame to ensure extra checks on the underlying buffer are
conducted before sending the frame. All of this has been implemented by
this commit :
315a4f6ae5
BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX
However, the above commit is incomplete. In the MUX code, when a new
STREAM frame is created, <dup> is left uninitialized. In most cases this
is harmless as it will only add extra unneeded checks before sending the
frame. So this is mainly a performance issue.
There is however one case where this bug will lead to a crash : when the
response consists only of an empty STREAM frame. In this case, the empty
frame will be silently removed as it is incorrectly assimilated to an
already acked frame range in qc_build_frms(). This can trigger a
BUG_ON() on the MUX code as a qcs instance is still in the send list
after qc_send_frames() invocation.
Note that it is extremely rare to have only an empty STREAM frame. It
was reproduced with HTTP/0.9 where no HTTP status line exists on an
empty body. I do not know if this is possible on HTTP/3 as a status line
should be present each time in a HEADERS frame.
Properly initialize the <dup> field to 0 on each STREAM frame generated by
the QUIC MUX to fix this issue.
This crash may be linked to github issue #2049.
This should be backported up to 2.6.
Since the below patch, ring offset calculation for readers has changed.
commit d9c7188633
MEDIUM: ring: make the offset relative to the head/tail instead of absolute
For readers, this requires to adjust their offsets to be relative to the
ring head each time read is resumed. Indeed, buffer head can change any
time a ring_write() is performed after older entries were purged.
This operation was not performed on the DNS code which causes the offset
to become invalid. In most cases, the following BUG_ON() was triggered :
FATAL: bug condition "msg_len + ofs + cnt + 1 > b_data(buf)" matched
at src/dns.c:522
Fix this by adjusting DNS reader offsets when entering
dns_session_io_handler() and dns_process_req().
This bug was reproduced by using a backend with 10 servers using SRV
record resolution on a single resolvers section. A BUG_ON() crash would
occur after less than 5 minutes of process execution.
This does not need to be backported as the above patch is not.
This should fix github issue #2068.
While running some L7 retries tests, Christopher and I stumbled upon a
very strange behavior showing some occasional server timeouts when the
server closes keep-alive connections quickly. The issue can be
reproduced with the following config:
  global
      expose-experimental-directives
      #tune.fd.edge-triggered on  # can speed up the issue

  defaults
      mode http
      timeout client 5s
      timeout server 10s
      timeout connect 2s

  listen f
      bind :8001
      http-reuse always
      retry-on all-retryable-errors
      server next 127.0.0.1:8002

  frontend b
      bind :8002
      timeout http-keep-alive 1  # one ms
      redirect location /
Sending fast requests without reusing the client connection on port 8001
with a single connection and at least 3 threads on haproxy occasionally
shows some glitches and pauses (below with timeout server 2s):
$ taskset -c 2,3 h1load -e -t 1 -r 1 -c 1 http://127.0.0.1:8001/
# time conns tot_conn tot_req tot_bytes err cps rps bps ttfb
1 1 9794 9793 959714 0 9k79 9k79 7M67 42.94u
2 1 9794 9793 959714 0 0.00 0.00 0.00 -
3 1 9794 9793 959714 0 0.00 0.00 0.00 -
4 0 16015 16015 1569470 0 6k22 6k22 4M87 522.9u
5 0 18657 18656 1828190 2 2k63 2k63 2M06 39.22u
If this doesn't happen, limiting to a request rate close to 1/timeout
may help.
What is happening is that after several migrations, a late report
via fd_update_events() may detect that the thread is not welcome, and
will want to program an update so that the current thread's poller
disables its polling on it. It is allowed to do so because it used
fd_grab_tgid(). But what if _fd_delete_orphan() was just starting to
be called and already reset the update_mask? We'll end up with a bit
present in the update mask, then _fd_delete_orphan() resets the tgid,
which will prevent the poller from consuming that update. The update
is not needed anymore since the FD was closed, but in this case nobody
will clear this bit until the same FD is reused again and cleared. And
as long as the thread's bit remains in the update_mask, no new updates
will be programmed for the next use of this FD on the same thread since
due to the bit being present, fd_nbupdt will not be changed. This is
what is causing this timeout.
The fix consists in making sure _fd_delete_orphan() waits for the
occasional watchers to leave, and to do this before clearing the
update_mask. This will be either fd_update_events() trying to check
its thread_mask, or the poller checking its updates, so that's pretty
short. But it definitely closes this race.
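A loose sketch of that ordering (stand-in types, not the actual fd.c code):
wait for the occasional watchers to drop their running bit before wiping the
update mask.
    #include <stdatomic.h>

    struct fd_slot2 {
        _Atomic unsigned long running_mask; /* threads still touching the FD */
        _Atomic unsigned long update_mask;  /* threads with a pending update */
    };

    /* Illustrative: called while closing an orphaned FD. Any thread still
     * holding a running bit is only there for a very short time (checking
     * its thread_mask or consuming an update), so spinning is acceptable. */
    static void wait_watchers_then_clear_updates(struct fd_slot2 *fd)
    {
        while (atomic_load(&fd->running_mask))
            ;                               /* brief busy wait */
        atomic_store(&fd->update_mask, 0UL);
    }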
This fix is needed since the introduction of fd_grab_tgid(), hence 2.7.
Note that while testing the fix, another related issue concerning the
atomicity of running_mask vs thread_mask popped up and will have to be
fixed till 2.5 as part of another patch. It may make the tests for this
fix occasionally trigger a few BUG_ON() or face a null conn->subs in
sock_conn_iocb(), though these ones are much more difficult to trigger.
This is not caused by this fix.
The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client from opening new
streams for this connection.
The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible to request its
immediate closure with a STOP_SENDING + RESET_STREAM.
This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
commit 156a89aef8
BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released
However, this patch introduces a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.
This crash was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces a crash with the following backtrace :
#0 0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
#1 eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
#2 0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
#3 0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
#4 0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
#5 0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
#6 0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
#7 0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
#8 0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
#9 0x00005555558a725b in process_runnable_tasks () at src/task.c:876
#10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
#11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
#12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
#13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6
This should fix github issue #2067.
This must be backported up to 2.6.
In very rare cases, it is possible the Initial packet number space
must be probed even if there are no more in-flight CRYPTO frames.
In such cases, a PING frame is sent into an Initial packet. As this
packet is ack-eliciting, it must be padded by the server. qc_do_build_pkt()
is modified to do so.
Take the opportunity of this patch to modify the trace for TX frames to
easily distinguish them from other frame-related traces.
Must be backported to 2.7.
Mark the connection as limited by the anti-amplification limit when trying to
probe the peer.
Wake up the connection PTO/loss detection timer as soon as a datagram is
received. This was done only when the datagram was dropped.
This fixes deadlock issues revealed by some interop runner tests.
Must be backported to 2.7 and 2.6.
Some frames are marked as already acknowledged when they come from duplicated
packets whose original packet has been acknowledged. There is no need
to resend such packets or frames.
Implement qc_pkt_with_only_acked_frms() to detect packet with only
already acknowledged frames inside and use it from qc_prep_fast_retrans()
which selects the packet to be retransmitted.
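A minimal sketch of such a detection (hypothetical flag and struct names;
the real code walks haproxy lists):
    #include <stdbool.h>
    #include <stddef.h>

    #define FRM_FL_ACKED 0x1                 /* illustrative "already acked" flag */

    struct tx_frm { unsigned int flags; struct tx_frm *next; };
    struct tx_pkt { struct tx_frm *frms; };

    /* Returns true if every frame of <pkt> has already been acknowledged,
     * in which case the packet is not worth retransmitting. */
    static bool pkt_with_only_acked_frms(const struct tx_pkt *pkt)
    {
        const struct tx_frm *frm;

        for (frm = pkt->frms; frm; frm = frm->next)
            if (!(frm->flags & FRM_FL_ACKED))
                return false;
        return true;
    }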
Must be backported to 2.6 and 2.7.
Even if there is a check in callers of qc_prep_hdshk_fast_retrans() and
qc_prep_fast_retrans() to prevent retransmissions of packets with no ack-eliciting
frames, these two functions should pay attention not to do that, especially
if someone decides to modify their implementations in the future.
Must be backported to 2.6 and 2.7.
This is an old bug which arrived with this commit, probably due to a
misinterpretation of the RFC, where the desired effect was to acknowledge
all the handshake packets:
77ac6f566 BUG/MINOR: quic: Missing acknowledgments for trailing packets
This had the bad effect of acknowledging all the handshake packets, even
the ones which are not ack-eliciting.
Must be backported to 2.7 and 2.6.
Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulting new secret and the new key and IV to be used to
decrypt Application level packets.
Also add a trace when the key update is supposed to be initiated on haproxy side.
This has already helped in diagnosing an issue revealed by the key update interop
test with xquic as client.
Must be backported to 2.7.
v2 interop runner test revealed this bug as follows:
[01|quic|4|c_conn.c:4087] new packet : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080 rel=H
[01|quic|5|c_conn.c:1509] qc_pkt_decrypt(): entering : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:1553] quic_tls_decrypt() failed : qc@0x7f62ec026e30
[01|quic|5|c_conn.c:1575] qc_pkt_decrypt(): leaving : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:4091] packet decryption failed -> dropped : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080
Only the decryption of v2 Initial packets received from the client was impacted.
There is no issue encrypting v2 Initial packets. This is due to the fact that
when a version is negotiated, the client may send Initial packets with two
versions (currently v1, then v2). The TLS context selection was done for the
TX path but not on the RX path.
Implement qc_select_tls_ctx() to select the correct TLS cipher context for all
types of packets and call this function before removing the header protection
and before deciphering the packet.
Must be backported to 2.7.
When retransmitting datagrams with two coalesced packets inside, the second
packet was not taken into consideration when checking there is enough space
for the datagram, especially when limited by the anti-amplification limit.
Must be backported to 2.6 and 2.7.
Before building a packet into a datagram, ensure there is sufficient space for at
least 1200 bytes. Also pad datagrams with only one ack-eliciting Initial packet
inside.
Must be backported to 2.7 and 2.6.
Making http_7239_valid_obfsc() inline because it is only called by inline
functions.
Removing dead comment and documenting proxy_http_parse_{7239,xff,xot} functions.
No backport needed.
Anonymization mode has two CLI handlers "set anon <on|off>" and "set
anon global-key". The last one only requires admin level. However, as
cli_find_kw() is implemented, only the first handler will be retrieved
as they both start with the same prefix "set anon".
This has the effect of executing the wrong handler for "set anon
global-key", with an error message about an invalid keyword. To fix this,
handler definitions have been separated for the "set anon on" and "set
anon off" commands. This allows minimal changes while keeping
the same "set anon" prefix for each command.
Also take this opportunity to fix a reference to a non-existing "set
global-key" CLI handler in the documentation.
This must be backported up to 2.7.
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there are several
potential issues which may cause a segfault :
- when explicitly probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
This issue was encountered randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
In the OCSP response callback, instead of using the actual date of the
system, the scheduler's 'now' timer is used when checking a response's
validity.
This patch can be backported to all stable versions.
When adding a new certificate through the CLI and appending it to a
crt-list with the 'ocsp-update' option set, the new certificate would
not be added to the OCSP response update list.
The only thing that was missing was the copy of the ocsp_update mode
from the ssl_bind_conf into the ckch_store's object.
An extra wakeup of the update task also needed to happen in case the
newly inserted entry needs to be updated before the next wakeup of the
task.
This patch does not need to be backported.
The minimum and maximum delays between two automatic updates of a given
OCSP response can now be set via global options. It allows to limit the
update rate of OCSP responses for configurations that use many frontend
certificates with the ocsp-update option set if the updates are deemed
too costly.
A new format option can be passed to the "show ssl ocsp-response" CLI
command to dump the contents of an OCSP response in base64. This is
needed because thanks to the new OCSP auto update mechanism, we could
end up using an OCSP response internally that was never provided by the
user.
In case of successive OCSP update errors for a given OCSP response, the
retry delay will be multiplied by 2 for every new failure in order to
avoid retrying too often to update responses for which the responder is
unresponsive (for instance). The maximum delay will still be taken into
account so the OCSP update requests will still be sent at least every
hour.
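A minimal sketch of that retry schedule, assuming hypothetical minimum and
maximum delay values (the real option and variable names may differ):
    /* Compute the delay before the next update attempt after <nb_failures>
     * consecutive errors: double the minimum delay per failure, but never
     * exceed the configured maximum so an update is still tried regularly. */
    static unsigned int ocsp_next_update_delay(unsigned int min_delay,
                                               unsigned int max_delay,
                                               unsigned int nb_failures)
    {
        unsigned long long delay = min_delay;

        while (nb_failures-- && delay < max_delay)
            delay *= 2;
        return delay > max_delay ? max_delay : (unsigned int)delay;
    }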
Instead of using the same proxy as other http client calls (through lua
for instance), the OCSP update will use a dedicated proxy which will
enable it to change the log format and log conditions (for instance).
This proxy will have the NOLOGNORM option and regular logging will be
managed by the update task itself because in order to dump information
related to OCSP updates, we need to control the moment when the logs are
emitted (instead of relying on the stream's life which is decorrelated
from the update itself).
The update task then calls sess_log directly, which uses a dedicated
ocsp logformat that fetches specific OCSP data. sess_log() was preferred
to the lower level app_log() because it offers the strength of
"regular" sample fetches and allows adding generic information alongside
OCSP ones in the log line.
In case of connection error (unreachable server for instance), a regular
httpclient log line will also be emitted. This line will have some extra
HTTP related info that can't be provided by the ocsp update logging
mechanism.
This patch adds a series of sample fetches that rely on the specified
OCSP update context structure. They will then be of use only in the
context of an ongoing OCSP update.
They cannot be used directly in the configuration so they won't be made
public. They will be used in the OCSP update's specific log format which
should be emitted by the update task itself in a future patch.
This command can be used to dump information about the entries contained
in the ocsp update tree. It will display one line per concerned OCSP
response and will contain the expected next update time as well as the
time of the last successful update, and the number of successful and
failed attempts.
In order to have some information about the frontend certificate when
dumping the contents of the ocsp update tree from the cli, we could
either keep a reference to a ckch_store in the certificate_ocsp
structure, which might cause some dangling reference problems, or
simply copy the path to the certificate in the ocsp response structure.
This latter solution was chosen because of its simplicity.
Those new specific error codes will make it possible to know a bit better what
went wrong during an OCSP update process. They will come in handy in
future sample fetches as well as in debugging means (via the cli or
future traces).
In case of allocation error during the construction of an OCSP request
for instance, we would have ended up reinserting the ocsp entry at the same
place in the ocsp update tree, which could potentially lead to an
"endless" loop of errors in ssl_ocsp_update_responses. In such a case,
entries are now reinserted further in the tree (1 minute later) in order
to avoid such a chain of alloc failures.
When an abort is detected before all headers were received, and if there are
pending incoming data, we must report a parsing error instead of a
connection abort. This way it will be able to be handled as an invalid
message by HTTP analyzers instead of an early abort with no message.
It is especially important to be accurate on L7 retry. Indeed, without this
fix, this case will be handled by the "empty-response" retries policy while a
retry on "junk-response" is more accurate.
This patch must be backported to 2.7.
A recent fix (af124360e "BUG/MEDIUM: http-ana: Detect closed SC on opposite side
during body forwarding") was pushed to sync a side when the opposite one is in
closing state. However, sometimes, the synchro is performed too early,
preventing a L7 retry from being performed.
Indeed, while the above fix is valid on the response side, on the request side,
if the response was not yet received, we must wait before closing.
So, to fix the fix, on the request side, we at least wait until the response is
received before finishing the request analysis. Of course, if there is an error,
an abort or anything wrong on the server side, the response analyser should
handle it.
This patch is related to #2061. No backport needed.
A regression about "empty-response" L7 retry was introduced with the commit
dd6496f591 ("CLEANUP: http-ana: Remove useless if statement about L7
retries").
The if statement was removed on a wrong assumption. Indeed, L7 retries on
status are now handled in the HTTP analysers. Thus, the stream-connector
(formerly the conn-stream, and before again the stream-interface) no longer
reports a read error to force a retry. But it is still possible to get a read
error with no response. In this case, we must perform a retry if
"empty-response" is enabled.
So the if statement is re-introduced, reverting the cleanup.
This patch should fix the issue #2061. It must be backported as far as 2.4.
When we are about to perform a L7 retry, we deal with the conn_retries
counter, to be sure we can retry. However, there is an issue here because
the counter is incremented before it is checked against the backend
limit. So, we can miss a connection retry.
Of course, we must invert both operations. The conn_retries counter must be
incremented after the check against the backend limit.
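A tiny illustration of the ordering issue (hypothetical field names):
    struct retry_state { int conn_retries; int be_retry_limit; };

    /* Returns 1 if an L7 retry may be performed. The check must happen
     * BEFORE incrementing the counter, otherwise one allowed retry is lost. */
    static int l7_retry_allowed(struct retry_state *s)
    {
        if (s->conn_retries >= s->be_retry_limit)
            return 0;
        s->conn_retries++;
        return 1;
    }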
This patch must be backported as far as 2.6.
This patch completes the previous one with poller subscribe of quic-conn
owned socket on sendto() error. This ensures that mux-quic is notified
if waiting on sending when a transient sendto() error is cleared. As
such, qc_notify_send() is called directly inside socket I/O callback.
qc_notify_send() internal conditions have thus been extended. This will
prevent notifying the upper layer until all sending conditions are fulfilled:
room in the congestion window and no transient error on the socket FD.
This should be backported up to 2.7.
On sendto() transient error, prior to this patch sending was simulated
and we relied on retransmission to retry sending. This could significantly
hurt performance.
Thanks to quic-conn owned socket support, it is now possible to improve
this. On transient error, sending is interrupted and the quic-conn socket FD
is subscribed on the poller for sending. When send is possible,
quic_conn_sock_fd_iocb() will be in charge of restarting sending.
A consequence of this change is on the return value of qc_send_ppkts().
This function will now return 0 on transient error if quic-conn has its
owned socket. This is used to interrupt sending in the calling function.
The flag QUIC_FL_CONN_TO_KILL must be checked to differentiate a fatal
error from a transient one.
This should be backported up to 2.7.
Sending is implemented in two parts on quic-conn module. First, QUIC
packets are prepared in a buffer and then sendto() is called with this
buffer as input.
qc.tx.buf is used as the input buffer. It must always be empty before
starting to prepare new packets in it. Currently, this is guaranteed by
the fact that either sendto() is completed, a fatal error is encountered
which prevents future sends, or a transient error is encountered and we
rely on retransmission to send the remaining data.
This will change when poller subscription of the socket FD on sendto()
transient error is implemented. In this case, qc.tx.buf will not have been
emptied when sending resumes after the transient error is cleared. To allow
the current sending process to work as expected, a new function
qc_purge_txbuf() is implemented. It will try to send remaining data
before preparing new packets for sending. If successful, txbuf will be
emptied and sending can continue. If not, sending will be interrupted.
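A rough sketch of what such a purge helper could look like (the buffer type
and send helper are stand-ins, not haproxy's real API):
    #include <stddef.h>
    #include <stdbool.h>

    struct txbuf { size_t data; };           /* bytes still to be sent */

    /* stand-in for the sendto() wrapper: pretend everything was sent */
    static bool send_pending(struct txbuf *b)
    {
        b->data = 0;
        return true;
    }

    /* Try to flush data left over from a previous interrupted send before
     * preparing new packets. Returns true if the buffer is now empty and
     * the caller may continue, false to interrupt sending. */
    static bool purge_txbuf(struct txbuf *b)
    {
        if (b->data && !send_pending(b))
            return false;
        b->data = 0;
        return true;
    }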
This should be backported up to 2.7.
Implement qc_notify_send(). This function is responsible for notifying the
upper layer subscribed on SUB_RETRY_SEND if sending conditions are back
to normal.
For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when poller subscription of the socket on sendto() error
is implemented. qc_notify_send() will thus be responsible for ensuring that
all conditions are met before waking up the upper layer.
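A loose sketch of such a notification helper (flag names, subscription
structure and wakeup call are stand-ins for haproxy's internals):
    #include <stdbool.h>
    #include <stddef.h>

    #define SUB_RETRY_SEND 0x1               /* illustrative subscription bit */

    struct subscriber { int events; bool woken; };
    struct conn_state {
        bool cwnd_has_room;                  /* congestion window not full */
        bool sock_send_blocked;              /* transient sendto() error pending */
        struct subscriber *subs;
    };

    /* Wake the upper layer waiting to send only once every sending
     * condition is back to normal. */
    static void notify_send(struct conn_state *c)
    {
        if (!c->subs || !(c->subs->events & SUB_RETRY_SEND))
            return;
        if (!c->cwnd_has_room || c->sock_send_blocked)
            return;
        c->subs->events &= ~SUB_RETRY_SEND;
        c->subs->woken = true;               /* stand-in for tasklet_wakeup() */
    }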
This should be backported up to 2.7.
This patch simply cleans up return paths used in various send functions of
the quic-conn module. This will simplify the implementation of poller
subscription on sendto() error, which adds another error handling path.
This should be backported up to 2.7.
The Content-Length header is always added into the request for an HTTP
health-check. However, when there is no payload, this header may be skipped
for OPTIONS, GET, HEAD and DELETE methods. In fact, it is a "SHOULD NOT" in
RFC 9110 (#8.6).
It is not really an issue in itself but it seems to be an issue for AWS
ELB. It returns a 400-Bad-Request if a HEAD/GET request with no payload
contains a Content-Length header.
So, it is better to skip this header when possible.
This patch should fix the issue #2026. It could be backported as far as 2.2.
When the HTTP request of a health-check is forged, we must not pretend there
is no payload, by setting HTX_SL_F_BODYLESS, if a log-format body was
configured.
Indeed, a test on the body length was used but it is only valid for a plain
string. For a log-format string, a list is used. Note it is a bug with no
consequence for now.
This patch must be backported as far as 2.2.
If the response is closed before any data was received, we must not report
an error to the SE descriptor. It is important to be able to retry on an
empty response.
This patch should fix the issue #2061. It must be backported to 2.7.
When a connection is removed from the safe list or the idle list,
CO_FL_SAFE_LIST and CO_FL_IDLE_LIST flags must be cleared. It is performed
when the connection is reused. But not when it is moved into the
toremove_conns list. It may be an issue because the multiplexer owning the
connection may be woken up before the connection is really removed. If the
connection flags are not sanitized, it may think the connection is idle and
reinsert it in the corresponding list. From this point, we can imagine
several bugs: a UAF or a connection reused with an invalid state, for
instance.
To avoid any issue, the connection flags are sanitized when an idle
connection is moved into the toremove_conns list. The same is performed at
right places in the multiplexers. Especially because the connection release
may be delayed (for h2 and fcgi connections).
This patch should fix the issue #2057. It must carefully be backported as
far as 2.2. Especially on the 2.2 where the code is really different. But
some conflicts should be expected on the 2.4 too.
EBADF on sendto() is considered as a fatal error. As such, it is removed
from the list of the transient errors. The connection will be killed
when encountered.
For the record, EBADF can be encountered on process termination with the
listener socket.
This should be backported up to 2.7.
Send is conducted through qc_send_ppkts() for a QUIC connection. There
are two types of errors which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
data and retransmission process is used to have the opportunity to
retry emission
* fatal error. If this happens, the connection should be closed as soon
as possible. This is done via qc_kill_conn() function. Until this
patch, only ECONNREFUSED errno was considered as fatal.
Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
errno values above and also if a partial send has been conducted by
the kernel.
* on success, the return value of sendto() syscall is returned.
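As a rough illustration of that classification (a standalone helper around
sendto(); the exact haproxy wrapper differs):
    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Illustrative qc_snd_buf()-like wrapper: returns the number of bytes
     * sent on success, 0 on a transient error or partial send (caller must
     * retry later), and a negative value on a fatal error. */
    static ssize_t snd_buf_sketch(int fd, const void *data, size_t len)
    {
        ssize_t ret = sendto(fd, data, len, MSG_DONTWAIT | MSG_NOSIGNAL,
                             NULL, 0);

        if (ret < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK ||
                errno == ENOTCONN || errno == EINPROGRESS || errno == EBADF)
                return 0;   /* transient (EBADF later reclassified as fatal) */
            return -1;      /* fatal */
        }
        if ((size_t)ret < len)
            return 0;       /* partial send also treated as transient */
        return ret;
    }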
This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.
This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.
On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.
This should be backported up to 2.7.
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know what threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.
Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
There's currently a problem affecting thread groups. Stopping a listener
from a different group than the one that runs this listener will trigger
the BUG_ON() in fd_delete(). This typically happens by issuing "disable
frontend f" on the CLI for the following config since the CLI runs on
group 1:
  global
      nbthread 2
      thread-groups 2
      stats socket /tmp/sock1 level admin

  frontend f
      mode http
      bind abns@frt-sock thread 2
This happens because abns sockets cannot be suspended, so this requires
a full stop here.
A first approach would consist in isolating the caller during such rare
operations but it turns out that fd_delete() is not robust against even
such calling conditions, because it uses its own thread mask with an FD
that may be in a different group, and even though the threads would be
isolated and running_mask should be zero, we must not mix thread masks
from different groups like this.
A better solution consists in replacing the bug condition detection with
a self-protection. After all it's not trivial to figure all likely call
places, and forcing upper layers to protect the code is not clean if we
can do it at the bottom. Thus this is what is being done now. We detect
a thread group mismatch, and if so, we forcefully isolate ourselves and
entirely clean the socket. This has the merit of being much more robust
and easier to use (and harder to misuse). Given that such operations are
very rare (actually when they happen a crash follows), it's not a problem
to waste some time isolating the caller there.
This must be backported to 2.7, along with this previous patch:
BUG/MINOR: fd: used the update list from the fd's group instead of tgid
In _fd_delete_orphan() we try to remove the FD from its update list
which is supposed to be the current thread group's. However the function
might be called from another group during stopping or under isolation,
so FD is not queued in the current group's update list but in its own
group's list. Let's retrieve the group from the FD instead of using
tgid.
This should have no impact on existing code since there is no code path
calling fd_delete() under thread isolation for now, and other cases are
blocked in fd_delete().
This must be backported to 2.7.
As for the H1 and H2 stream, the QUIC stream now states it does not expect
data from the server as long as the request is unfinished. The aim is the
same. We must be sure to not trigger a read timeout on server side if the
client is still uploading data.
From the moment the end of the request is received and forwarded to upper
layer, the QUIC stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
As for the H1 stream, the H2 stream now states it does not expect data from
the server as long as the request is unfinished. The aim is the same. We
must be sure to not trigger a read timeout on server side if the client is
still uploading data.
From the moment the end of the request is received and forwarded to upper
layer, the H2 stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
On client side, as long as the request is unfinished, the H1 stream states
it does not expect data from the server. It does not mean the server must
not send its response but only that it may wait to receive the whole request
with no risk of triggering a read timeout.
When the request is finished, the H1 stream reports it expects to receive
data from the opposite endpoint.
The purpose of this patch is to never report a server timeout on receive if
the client is still uploading data. This way, it is possible to have a
smaller server timeout than the client one.
Instead of reporting a blocked send if nothing is sent, we do it if some
output data remain blocked after a write attempt or after a call to the
applet's I/O handler. It is mandatory to properly handle write timeouts.
Indeed, if an endpoint is blocked for a while but it partially consumed
output data, no timeout is triggered. It is especially true for
connections. But the same may happen for an applet; there is no reason it
should behave differently.
Of course, if the endpoint decides to partially consume output data because
it must wait to move on for any reason, it should use the se/applet API
(se/applet_will_consume(), se/applet_wont_consume() and
se/applet_need_more_data()).
This bug was introduced during the channels timeouts refactoring. No
backport is needed.
When we exit from process_stream(), if the task is expired, we try to handle
the stream timeouts and we resync the stream-connectors. This avoids a
useless immediate wakeup. It is not really an issue, but it is a small
improvement in edge cases.
This will be mandatory to be able to handle stream's timeouts before exiting
process_stream(). So, to not duplicate code, all this stuff is moved in a
dedicated function.
At the end of process_stream(), a BUG_ON() was recently added to abort if we
leave the function with an expired task. However, it may happen if an event
prevents the timeout from being handled but nothing evolved. In this case, the
task expiration is not updated and we expect to catch the timeout on the
immediate task wakeup.
No backport needed.
A bug during H1 data parsing may lead to copying more data than the maximum
allowed. The bug is an overflow on this max threshold when it is lower than
the size of an htx_blk structure.
At first glance, it means it is possible to not respect the buffer's
reserve. So it may lead to rewrite errors but it may also block any progress
on the stream if the compression is enabled. In this case, the channel
buffer appears as full and the compression must wait for space to
proceed. Outside of any bug, it is only possible when there are outgoing
data to forward, so the compression filter just waits. Because of this bug,
there is nothing to forward. The buffer is just full of input data. Thus
nothing moves and the stream is infinitely blocked.
To fix the bug, we must be sure to be able to create an HTX block of 1 byte
without exceeding the maximum allowed.
This patch should fix the issue #2053. It must be backported as far as 2.5.
With 4d9888c ("CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros") some
"volatile" keywords were dropped at various assignment places in fd's code.
In fd_add_to_fd_list() and fd_rm_from_fd_list(), because of the absence of
the "volatile" keyword, the compiler was able to perform some code
optimizations that prevented prev and next variables from being reloaded
between locking attempts (goto loop).
The result was that fd_add_to_fd_list() and fd_rm_from_fd_list() could enter in
infinite loops, preventing other threads from working further and ultimately
resulting in the watchdog being triggered as described in GH #2011.
To fix this, we made sure to re-audit 4d9888c in order to restore the required
memory barriers / compilers hints to prevent the compiler from mis-optimizing
the code around the fd's locks.
That is: using atomic loads to fetch the prev and next values, and restoring
the "volatile" cast for cur_list.ent variable assignment in fd_rm_from_fd_list()
Big thanks to @xanaxalan for his help and patience and to @wtarreau for his
findings and explanations in regard to compiler's optimizations.
This must be backported in 2.7 with 4d9888c ("CLEANUP: fd: get rid of the
__GET_{NEXT,PREV} macros")
This patch adds support for the HAPROXY_BRANCH environment variable.
It can be useful if some resources are loaded from different
locations when migrating from one version to another.
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
Since the previous patch, the ring's offset is not used anymore. The
haring utility remains backward-compatible since it can trust the
buffer element that's at the beginning of the map and which still
contains all the valid data.
The ring's offset currently contains a perpetually growing cursor which
is the number of bytes written from the start. It's used by readers to
know where to (re)start reading from. It was made absolute because both
the head and the tail can change during writes and we needed a fixed
position to know where the reader was attached. But this is complicated,
error-prone, and limits the ability to reduce the lock's coverage. In
fact what is needed is to know where the reader is currently waiting, if
at all. And this location is exactly where it stored its count, so the
absolute position in the buffer (the seek offset from the first storage
byte) does represent exactly this, as it doesn't move (we don't realign
the buffer), and is stable regardless of how head/tail changes with writes.
This patch modifies this so that the application code now uses this
representation instead. The most noticeable change is the initialization,
where we've kept ~0 as a marker to go to the end, and it's now set to
the tail offset instead of trying to resolve the current write offset
against the current ring's position.
The offset was also used at the end of the consuming loop, to detect
if a new write had happened between the lock being released and taken
again, so as to wake the consumer(s) up again. For this we used to
take a copy of the ring->ofs before unlocking and comparing with the
new value read in the next lock. Since it's not possible to write past
the current reader's location, there's no risk of complete rollover, so
it's sufficient to check if the tail has changed.
Note that the change also has an impact on the haring consumer which
needs to adapt as well. But that's good in fact because it will rely
on one less variable, and will use offsets relative to the buffer's
head, and the change remains backward-compatible.
If a ring is resized, we must not zero its head since the contents
are preserved in-situ. Till now it used to work because we only resize
during boot and we emit very few data (if at all) during boot. But this
can change in the future.
This can be backported to 2.2 though no older version should notice a
difference.
This issue arrived with this commit:
1dbeb35f8 MINOR: quic: Add new traces about by connection RX buffer handling
and revealed by the GH CI as follows:
src/quic_conn.c: In function ‘quic_rx_pkts_del’:
include/haproxy/trace.h:134:65: error: format ‘%zu’ expects argument of type ‘size_t’,
but argument 6 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
_msg_len = snprintf(_msg, sizeof(_msg), (fmt), ##args);
Replace all %zu printf integer formats with %llu.
Must be backported to 2.7 where the previous commit is supposed to be backported.
In haproxy startup, all init error paths after the protocol bind step
cautiously call protocol_unbind_all() before exiting except one that was
conditional. We're not making an exception to the rule and we now properly
call protocol_unbind_all() as well.
No backport needed as this patch is unnoticeable.
In sock_unix_bind_receiver(), uxst_bind_listener() and uxdg_bind_listener(),
properly dump ABNS socket names by leveraging sa2str() function which does the
hard work for us.
UNIX sockets are reported as is (unchanged) while ABNS UNIX sockets
are prefixed with 'abns@' to match the syntax used in config file.
(they were previously showing as empty strings because of the leading
NULL-byte that was not properly handled in this case)
This is only a minor debug improvement, however it could be useful to
backport it up to 2.4.
[for 2.4: you should replace "%s [%s]" by "%s for [%s]" for uxst and uxdg if
you want the patch to apply properly]
When a listener is suspended, we expect that it may not process any data for
the time it is suspended.
Yet for named UNIX socket, as the suspend operation is a no-op at the proto
level, recv events on the socket may still be processed by the polling loop.
This is quite disturbing as someone may rely on a paused proxy being harmless,
which is true for all protos except for named UNIX sockets.
To fix this behavior, we explicitly disable io recv events when suspending a
named UNIX socket listener (we call disable() method on the listener).
The io recv events will automatically be restored when the listener is resumed
since the l->enable() method is called at the end of the resume() operation.
This could be backported up to 2.4 after a reasonable observation
period to make sure that this change doesn't cause unwanted side-effects.
In sock_unix_addrcmp(), named UNIX sockets paths are manually compared in
order to properly handle tempname paths (ending with ".XXXX.tmp") that result
from the 2-step bind implemented in sock_unix_bind_receiver().
However, this logic does not take into account "final" path names (without the
".XXXX.tmp" suffix).
Example:
/tmp/test did not match with /tmp/test.1288.tmp prior to this patch
Indeed, depending on how the socket addr is retrieved, the same socket
could be designated either by its tempname or finalname.
socket addr is normally stored with its finalname within a receiver, but a
call to getsockname() on the same socket will return the tempname that was
used for the bind() call (sock_get_old_sockets() depends on getsockname()).
This causes sock_find_compatible_fd() to malfunction with named UNIX
sockets (ie: haproxy -x CLI option).
To fix this, we slightly modify the check around the temp suffix in
sock_unix_addrcmp(): we perform the suffix check even if one of the
paths is lacking the temp suffix (with proper precautions).
Now the function is able to match:
- finalname x finalname
- tempname x tempname
- finalname x tempname
That is: /tmp/test == /tmp/test.1288.tmp == /tmp/test.X.tmp
It should be backported up to 2.4
In 58651b42f ("MEDIUM: listener/proxy: make the listeners notify about
proxy pause/resume") we introduced the logic for pause/resume notify using
li_ready for pause and li_paused for resume.
Unfortunately, relying on li_paused for resume doesn't work reliably if we
resume a listener which is only made of receivers that are completely stopped.
For example, this could happen with receivers that don't support the
LI_PAUSED state like ABNS sockets.
This is especially true since pause_listener() was renamed to
suspend_listener() to better reflect its actual behavior in
("MINOR: listener: pause_listener() becomes suspend_listener())
To fix this, we now rely on the li_suspended state in resume_listener() to make
sure that suspend_listener() and resume_listener() notify messages are
consistent to each other:
"Proxy pause" is triggered when there are no more ready listeners.
"Proxy resume" is triggered when there are no more suspended listeners.
Also, we make use of the new PR_FL_PAUSED proxy flag to make sure we don't
report the same event twice.
This could be backported up to 2.4 after a reasonable observation
period to make sure that this change doesn't cause unwanted side-effects.
--
Backport notes:
This commit depends on:
- "MINOR: listener: pause_listener() becomes suspend_listener()"
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
|+ if (px && !(px->flags & PR_FL_PAUSED) && !px->li_ready) {
| /* PROXY_LOCK is required */
| proxy_cond_pause(px);
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
By this:
|+ if (px && !px->li_ready) {
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
| }
And this:
|+ if (px && (px->flags & PR_FL_PAUSED) && !px->li_suspended) {
| /* PROXY_LOCK is required */
| proxy_cond_resume(px);
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
By this:
|+ if (px && !px->li_suspended) {
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| }
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.
A suspended listener can be in two different valid states:
- LI_PAUSED: the listener is effectively paused, it will unpause on
resume_listener()
- LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
state, so it was unbound to satisfy the suspend request; it will
correctly re-bind on resume_listener()
Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().
We're also adding the li_suspended proxy variable to track the number of
currently suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.
The counter is increased on successful suspend in suspend_listener() and it is
decreased on successful resume in resume_listener().
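As a rough standalone model of this bookkeeping (simplified; the field and flag
names mimic the ones above, this is not the actual listener code):
struct mini_listener { unsigned int flags; };
struct mini_proxy    { int li_suspended; };   /* suspended listeners counter */
#define MINI_LI_F_SUSPENDED 0x0001            /* stand-in for LI_F_SUSPENDED */
/* on successful suspend: mark the listener and count it exactly once */
static void mark_suspended(struct mini_listener *l, struct mini_proxy *px)
{
    if (!(l->flags & MINI_LI_F_SUSPENDED)) {
        l->flags |= MINI_LI_F_SUSPENDED;
        px->li_suspended++;
    }
}
/* on successful resume: unmark and uncount it */
static void mark_resumed(struct mini_listener *l, struct mini_proxy *px)
{
    if (l->flags & MINI_LI_F_SUSPENDED) {
        l->flags &= ~MINI_LI_F_SUSPENDED;
        px->li_suspended--;
    }
}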
--
Backport notes:
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
| /* PROXY_LOCK is require
| proxy_cond_resume(px);
By this:
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:
Replace this:
|@@ -253,6 +253,7 @@ struct listener {
|
| /* listener flags (16 bits) */
| #define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x0002 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
By this:
|@@ -222,6 +222,7 @@ struct li_per_thread {
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
| #define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x00000004 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
Since fc974887c ("MEDIUM: protocol: explicitly start the receiver before
the listener"), resume from LI_ASSIGNED state does not work anymore.
This is because the binding part has since been divided into 2 distinct steps:
first bind(), then listen().
This new logic was properly implemented in the startup sequence
through protocol_bind_all() but wasn't properly reflected in the
default_resume_listener() function.
Fix default_resume_listener() to comply with the new logic.
This should help ABNS sockets to properly rebind in resume_listener()
after they have been stopped by pause_listener():
See Redmine:4475 for more context.
This commit depends on:
- "MINOR: listener: workaround for closing a tiny race between resume_listener() and stopping"
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
This could be backported up to 2.4 after a reasonable observation period to
make sure that this change doesn't cause unwanted side-effects.
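A standalone sketch of the fixed resume sequence (a simplified model; the two
callbacks stand for the protocol's bind() and listen() steps mentioned above,
and real error reporting is omitted):
struct mini_listener {
    int bound;                                /* receiver already bound? */
    int (*bind)(struct mini_listener *l);     /* step 1: bind the receiver */
    int (*listen)(struct mini_listener *l);   /* step 2: start listening */
};
static int mini_resume(struct mini_listener *l)
{
    if (!l->bound) {
        if (l->bind(l) != 0)
            return 0;
        l->bound = 1;
    }
    if (l->listen(l) != 0)
        return 0;
    return 1;      /* fully resumed, like protocol_bind_all() at startup */
}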
In resume_listener(), proto->resume() errors were not properly handled:
the function kept flowing down as if no errors were detected.
Instead, we're performing an early return when such errors are detected to
prevent undefined behaviors.
This could be backported up to 2.4.
--
Backport notes:
This commit depends on:
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
-> 2.4 ... 2.7:
Replace this:
| if (l->bind_conf->maxconn && l->nbconn >= l->bind_conf->maxconn) {
| l->rx.proto->disable(l);
By this:
| if (l->maxconn && l->nbconn >= l->maxconn) {
| l->rx.proto->disable(l);
Legacy suspend() return value handling in pause_listener() has been altered
over time.
First with fb76bd5ca ("BUG/MEDIUM: listeners: correctly report pause() errors")
Then with e03204c8e ("MEDIUM: listeners: implement protocol level
->suspend/resume() calls")
We aim to restore original function behavior and comply with resume_listener()
function description.
This is required for resume_listener() and pause_listener() to work as a whole.
Now, it is made explicit that pause_listener() may stop a listener if the
listener doesn't support the LI_PAUSED state (depending on the protocol
family, ie: ABNS sockets), in this case l->state will be set to LI_ASSIGNED
and this won't be considered as an error.
This could be backported up to 2.4 after a reasonable observation period
to make sure that this change doesn't cause unwanted side-effects.
--
Backport notes:
This commit depends on:
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
-> 2.4: manual change required because "MINOR: proxy/listener: support
for additional PAUSED state" was not backported: the contextual patch
lines don't match.
Replace this:
| if (px && !px->li_ready) {
| /* PROXY_LOCK is required */
By this:
| if (px && !px->li_ready) {
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
Some listeners are kept in LI_ASSIGNED state but are not supposed to be
started since they were bypassed on initial startup (eg: in protocol_bind_all()
or in enable_listener()...)
Introduce the LI_F_FINALIZED flag: when the flag is set, it means that the
listener made it past the LI_LISTEN state (finalized) at least once, so we can
safely pause/resume it. This way we won't risk accidentally lazy-starting a
previously bypassed listener which never made it that far and thus was never
expected to be started.
As listener_pause() and listener_resume() are currently partially broken, such
unexpected lazy-start won't happen. But we're trying to restore pause() and
resume() behavior so this patch will be required before going any further.
We had to re-introduce the listener's 'flags' struct member since it was recently
moved into the bind_conf struct. But here we do have a legitimate need for these
listener-only flags.
This should only be backported if explicitly required by another commit.
--
Backport notes:
-> 2.4 and 2.5:
The 2-byte hole we're using in the current patch does not apply; let's
use the 4-byte hole located under the 'option' field.
Replace this:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
By this:
|@@ -209,6 +209,8 @@ struct listener {
| short int nice; /* nice value to assign to the instantiated tasks */
| int luid; /* listener universally unique ID, used for SNMP */
| int options; /* socket options : LI_O_* */
|+ uint16_t flags; /* listener flags: LI_F_* */
|+ /* 2-bytes hole here */
| __decl_thread(HA_RWLOCK_T lock);
|
| struct fe_counters *counters; /* statistics counters */
-> 2.4 only:
We need to adjust some contextual lines.
Replace this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| if (!lli)
| HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
By this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
And this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| if (MT_LIST_INLIST(&l->wait_queue))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
By this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
-> 2.6 and 2.7 only:
struct listener 'flags' member still exists, let's use it.
Remove this from the current patch:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
Then, replace this:
|@@ -251,6 +250,9 @@ struct listener {
| EXTRA_COUNTERS(extra_counters);
| };
|
|+/* listener flags (16 bits) */
|+#define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
| * function pointer can be NULL if not implemented. The function also has an
By this:
|@@ -221,6 +221,7 @@ struct li_per_thread {
| };
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
|+#define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
This is an alternative fix that tries to address the same issue as
d1ebee177 ("BUG/MINOR: listener: close tiny race between
resume_listener() and stopping") while allowing resume_listener() to be
more versatile.
Indeed, because of the previous fix, resume_listener() is not able to
rebind stopped listeners, and this breaks the original behavior that is
documented in the function description:
"If the listener was only in the assigned
state, it's totally rebound. This can happen if a pause() has completely
stopped it. If the resume fails, 0 is returned and an error might be
displayed."
With relax_listener(), we now make sure to check l->state under the
listener lock so we don't call resume_listener() when the conditions are not
met.
As such, concurrently stopped listeners may not be rebound using
relax_listener().
Note: the documented race can't happen since 1b927eb3c ("MEDIUM: proto: stop
protocols under thread isolation during soft stop"), but older versions are
concerned as 1b927eb3c was not marked for backports.
Moreover, the patch also prevents the race between protocol_pause_all() and
resuming from LIMITED or FULL states.
This commit depends on:
- "MINOR: listener: add relax_listener() function"
This should be backported with d1ebee177 up to 2.4
(d1ebee177 is marked to be backported for all stable versions but the current
patch does not apply for versions < 2.4)
There is a need for a small difference between resuming and relaxing
a listener.
When resuming, we expect that the listener may completely resume, this includes
unpausing or rebinding if required.
Resuming a listener is a best-effort operation: no matter the current state,
try our best to bring the listener up to the LI_READY state.
There are some cases where we only want to "relax" listeners that were
previously restricted using limit_listener() or listener_full() functions.
Here we don't want to resuscitate listeners, we're simply interested in
cancelling out the previous restriction.
To this day, listener_resume() on an unbound listener is broken, which is why
the need for this wasn't felt yet.
But we're trying to restore historical listener_resume() behavior, so we better
prepare for this by introducing an explicit relax_listener() function that
only does what is expected in such cases.
This commit depends on:
- "MINOR: listener/api: add lli hint to listener functions"
Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions.
All these functions implicitly take the listener lock when they are called:
It could be useful to be able to call them while already holding the lock, so
we're adding lli hint to make them take the lock only when it is missing.
This should only be backported if explicitly required by another commit
--
-> 2.4 and 2.5 common backport notes:
These 2 commits need to be backported first:
- 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
- a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
coverity"
-> 2.4 special backport notes:
In addition to the previously mentioned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:
Replace this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if (l->state <= LI_PAUSED)
| goto end;
By this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
Replace this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
| }
By this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
| if (!listener->bind_conf->frontend->grace)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
Replace this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
| /* might be just a backend */
By this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!p->disabled && !p->li_ready) {
| /* might be just a backend */
The resume method was not explicitly defined for the uxst protocol family.
Here we can safely use default_resume_listener(), just like the uxdg family.
This could be backported up to 2.4.
In protocol_bind_all() (involved in startup sequence):
We only free errmsg (set by fam->bind() attempt) when we make use of it.
But this could lead to some memory leaks because there are some cases
where we ignore the error message (e.g: verbose=0 with ERR_WARN messages).
As long as errmsg is set, we should always free it.
As mentioned earlier, this really is a minor leak because it can only occur on
specific conditions (error paths) during the startup phase.
This may be backported up to 2.4.
--
Backport notes:
-> 2.4 only:
Replace this:
| ha_warning("Binding [%s:%d] for %s %s: %s\n",
| listener->bind_conf->file, listener->bind_conf->line,
| proxy_type_str(px), px->id, errmsg);
By this:
| else if (lerr & ERR_WARN)
| ha_warning("Starting %s %s: %s\n",
| proxy_type_str(px), px->id, errmsg);
In uxst_bind_listener() and uxdg_bind_listener(), when the function
fails because the listener is not bound, both functions set
the error message but don't set the err status before returning.
Because of this, such an error is not properly handled by the upper functions.
Make sure this error is properly caught by returning a combination of
ERR_FATAL and ERR_ALERT.
This could be backported up to 2.4.
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer any use for the .hcto field in the stconn
structure. So let's remove it.
We now directly use the proxy settings to set the half-close timeout of a
stream-connector. The function sc_set_hcto() must be used to do so. This
timeout is only set when a shutw is performed, so it is not really a big
deal to rely on a dedicated function for it.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers no longer have to take care of it. This will avoid some
bugs.
Expiration dates in trace messages are now relative to now_ms. It is far
easier to read traces this way. And an expired date is now negative, so it
is also easy to detect when a timeout was reached.
Because read and write timeouts are now detected using the .lra and .fsb fields
of the stream-endpoint descriptor, it is better to also use these fields to
report read and write expiration date in trace messages. Especially because
old rex and wex fields will be removed.
We stop using the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. All the stuff is handled in process_stream().
The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().
An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any
data from the opposite endpoint. This way, the stream will be able to
disable any read timeout on the opposite endpoint. Applets should use
applet_expect_no_data() and applet_expect_data() functions to set or clear
the flag. For now, only dns and sink forwarder applets are concerned.
The stream endpoint descriptor now owns two dates, lra (last read activity) and
fsb (first send blocked).
The first one is updated every time a read activity is reported, including data
received from the endpoint, successful connect, end of input and shutdown for
reads. A read activity is also reported when receives are unblocked. It will be
used to detect read timeouts.
The other one is updated when no data can be sent to the endpoint and reset
when some data are sent. It is the date of the first send blocked by the
endpoint. It will be used to detect write timeouts.
Helper functions are added to report read/send activity and to retrieve lra/fsb
date.
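The intent can be summed up with this simplified standalone model (not the real
structure or helper names):
struct mini_sedesc {
    unsigned int lra;    /* last read activity (ticks) */
    unsigned int fsb;    /* first send blocked (0 here = sends not blocked) */
};
static void report_read_activity(struct mini_sedesc *se, unsigned int now_ms)
{
    se->lra = now_ms;            /* data received, EOI, shutr, recv unblocked... */
}
static void report_blocked_send(struct mini_sedesc *se, unsigned int now_ms)
{
    if (!se->fsb)
        se->fsb = now_ms;        /* keep the date of the *first* blocked send */
}
static void report_send_activity(struct mini_sedesc *se)
{
    se->fsb = 0;                 /* some data were sent: sends are not blocked */
}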
Read and write timeouts (.rto and .wto) are now replaced by a unique
timeout, called .ioto. Since the recent refactoring of the channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
After I/O handling, in sc_notify(), the stream's task is no longer
requeued. The stream may be woken up, but its task is not requeued. It is
useless nowadays and only avoids a call to process_stream() for edge
cases. It is not really a big deal if the stream is woken up for nothing
because its task expired. At worst, it will be responsible to compute its
new expiration date.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplifies the timer manipulations.
Read and write timeouts concern the I/O. Thus, it is logical to move them into
the stconn. In the end, the stream is responsible for detecting the timeouts, so
it is logical to have these values in the stconn and not in the SE
descriptor. But it may change depending on the refactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto
This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on
SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write
errors and SE_FL_ERROR for read or unrecoverable errors.
When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a
read event and a write event are reported to be sure the stream will properly
process the error. At the stream-connector level, it is similar. When an error
is reported during a send, a write event is triggered. On the read side, nothing
more is performed because an error at this stage is enough to wake the stream
up.
A major change is brought with this patch. We stop checking the flags of the
opposite channel to report an abort or a timeout. It also means that when a read
or write error is reported on a side, we no longer update the other side. Thus
a read error on the server side no longer leads to a write error on the
client side. This should make error reporting clearer.
This flag was introduced in 1.3 to fix a design issue. It has been untouched
since then, but there is no reason to still have this trick. Note it could be
good to review what happens in HTTP when the server is waiting for the end of
the request. It could be good to be sure a client timeout is always reported.
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx is destroyed, related httpclients are correctly destroyed too
to prevent such httpclients from being resuscitated on a destroyed lua ctx.
This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.
This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same resources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.
But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.
This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)
It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.
Now that we properly understand the issue, the fix is pretty trivial:
We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.
Instead, we switched 'lua->hc_list' which was introduced with bb581423b
from simple list to mt_list so that concurrent accesses between
hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled.
The issue was reported by @Mark11122 on Github #2037.
This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
In hlua_httpclient_send(), we replace hc->req.url with a new url.
But we forgot to free the original url that was allocated in
hlua_httpclient_new() or in the previous httpclient_send() call.
Because of this, each httpclient request performed under lua scripts would
result in a small leak. When stress-testing a lua action which uses httpclient,
the leak is clearly visible since we're leaking several Mbytes per minute.
This bug was discovered by chance when trying to reproduce GH issue #2037.
It must be backported up to 2.5
Before looking for a secondary cache entry for a given request we
checked that the first entry was complete, which might prevent us from
using a valid entry if the first one with the same primary key is not
full yet.
Likewise, if the primary entry is complete but not the secondary entry
we try to use, we might end up using a partial entry from the cache as
a response.
This bug was raised in GitHub #2048.
It can be backported up to branch 2.4.
Since commit cc9bf2e5f "MEDIUM: cache: Change caching conditions"
responses that do not have an explicit expiration time are not cached
anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag
instead of the TX_CACHEABLE one. The effect this had is that a cacheable
response that corresponded to a request having a "Cache-Control:
no-cache" for instance would not be cached.
Contrary to what was said in the other commit message, the "checkcache"
option should not be impacted by the use of the TX_CACHEABLE flag
instead of the TX_CACHE_IGNORE one. The response is indeed considered as
not cacheable if it has no expiration time, regardless of the presence
of a cookie in the response.
This should fix GitHub issue #2048.
This patch can be backported up to branch 2.4.
HAPROXY_STARTUP_VERSION: contains the version used to start, in
master-worker mode this is the version which was used to start the
master, even after updating the binary and reloading.
This patch could be backported in every version since it is useful when
debugging.
This patch handles the case where the fd could be -1 when proc_self was
lost for some reason (environment variable corrupted or upgrade from < 1.9).
This could result in an out-of-bounds array access (fdtab[-1]) and would crash.
Must be backported to every maintained version.
Previous versions (< 1.9) of the master-worker process didn't have the
"HAPROXY_PROCESSES" environment variable which contains the list of
processes, fd etc.
The part which describes the master is created at first startup so if
you started the master with an old version you would never have
it.
Since patch 68836740 ("MINOR: mworker: implement a reload failure
counter"), the failedreloads member of the proc_self structure for the
master is set to 0. However if this structure does not exist, it will
result in a NULL dereference and crash the master.
This patch fixes the issue by creating the proc_self structure for the
master when it does not exist. It also shows a warning which states to
restart the master if that is the case, because we can't guarantee that
it will be working correctly.
This MUST be backported as far as 2.5, and could be backported to every
other stable branch.
When parsing the HAPROXY_PROCESSES environment variable, strtok() was applied
directly to the pointer returned by getenv(), which replaces the ';' separators
by '\0', showing confusing environment variables when debugging in /proc
or in a corefile.
Example:
(gdb) x/39s *environ
[...]
0x7fff6935af64: "HAPROXY_PROCESSES=|type=w"
0x7fff6935af7e: "fd=3"
0x7fff6935af83: "pid=4444"
0x7fff6935af8d: "rpid=1"
0x7fff6935af94: "reloads=0"
0x7fff6935af9e: "timestamp=1676338060"
0x7fff6935afb3: "id="
0x7fff6935afb7: "version=2.4.0-8076da-1010+11"
This patch fixes the issue by doing a strdup on the variable.
Could be backported in previous versions (mworker_proc_to_env_list
exists since 1.9)
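The principle of the fix, in a standalone form (simplified, not the exact
mworker code):
#include <stdlib.h>
#include <string.h>
/* never tokenize getenv()'s result in place: it points straight into the
 * process environment, so strtok() would overwrite the ';' separators with
 * '\0' inside environ itself. Work on a private copy instead.
 */
static char *get_haproxy_processes_copy(void)
{
    const char *env = getenv("HAPROXY_PROCESSES");
    return env ? strdup(env) : NULL;    /* caller parses and free()s the copy */
}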
Implement a way to test if some options are enabled at run-time. For now,
the following options may be detected:
POLL, EPOLL, KQUEUE, EVPORTS, SPLICE, GETADDRINFO, REUSEPORT,
FAST-FORWARD, SERVER-SSL-VERIFY-NONE
These options are those that can be disabled on the command line. This way
it is possible, from a reg-test for instance, to know if a feature is
supported or not :
feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"
The option was renamed to only permit to disable the fast-forward. First
there is no reason to enable it because it is the default behavior. Then it
introduced a bug because there is no way to be sure the command line has
precedence over the configuration this way. So, the option is now named
"tune.disable-fast-forward" and does not support any argument. And of
course, the command line option "-dF" now has precedence over the
configuration.
No backport needed.
For server connections, both the frontend and backend were considered to
enable the httpclose option. However, it is ambiguous because on the client side
only the frontend is considered. In addition, for 2 frontends, one with the
option enabled and the other without it, the HTTP connection mode may differ
while it is a backend setting.
Thus, now, for the server side, only the backend is considered. Of course,
if the option is set for a listener, the option will be enabled if the
listener is the backend's connection.
We must never exit from the stream processing function with an expired
task. Otherwise, we are pretty sure this will end in a spinning loop. It
is really better to abort as early as possible and with the original buggy
state preserved. This will ease debug sessions.
If the TX buffer (->tx.buf) attached to the connection is not drained, there
are chances that this will be detected by qc_txb_release(), which triggers
a BUG_ON_HOT() in that case, as follows:
[00|quic|2|c_conn.c:3477] UDP port unreachable : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
[00|quic|5|ic_conn.c:749] qc_kill_conn(): entering : qc@0x5584f18d6d50
[00|quic|5|ic_conn.c:752] qc_kill_conn(): leaving : qc@0x5584f18d6d50
[00|quic|5|c_conn.c:3532] qc_send_ppkts(): leaving : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
FATAL: bug condition "buf && b_data(buf)" matched at src/quic_conn.c:3098
Consume the remaining data in the TX buffer by calling b_del().
This bug arrived with this commit:
a2c62c314 MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt
Also take the opportunity of this patch to update the comments of qc_send_ppkts()
which should have arrived with commit a2c62c314.
Must be backported to 2.7 where this latter commit is supposed to be backported.
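The consuming part of the fix is essentially the following (simplified sketch;
buf stands for the connection's TX buffer on the error path):
/* sending failed (e.g. ECONNREFUSED after an ICMP port unreachable):
 * purge whatever is left so that qc_txb_release()'s
 * BUG_ON_HOT(buf && b_data(buf)) cannot trigger.
 */
if (b_data(buf))
    b_del(buf, b_data(buf));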
Traces from this function would miss a TRACE_LEAVE() on the success path,
which had for consequences 1) that it was difficult to figure out where the
function was left, and 2) that we never had the allocated stream ID
clearly visible (actually the one returned by h2c_frt_stream_new() is
the right one but it's not obvious).
This can be backported to 2.7 and 2.6.
Functions which are called with dummy streams pass them down to the traces,
and that leads to a somewhat confusing "h2s=0x1234568(0,IDL)" for example,
while the nature of the called function makes this stream useless at that
place. Better not report a random pointer, especially since it always
requires to look at the code before remembering how this should be
interpreted.
Now what we're doing is that the idle stream only prints "h2s=IDL" which
is shorter and doesn't report a pointer, closed streams do not report
anything since the stream ID 0 already implies it, and other ones are
reported normally.
This could be backported to 2.7 and 2.6 as it improves traces legibility.
With the previous commit, quic-conn instances are now handled as jobs to prevent the
termination of the haproxy process. This ensures that QUIC connections are
closed when all data are acknowledged by the client and there is no more
active streams.
The quic-conn layer emits a CONNECTION_CLOSE once the MUX has been
released and all streams are acknowledged. Then, the timer is scheduled
to definitely free the connection after the idle timeout period. This
allows to treat late-arriving packets.
Adjust this procedure to deactivate this timer when process stopping is
in progress. In this case, quic-conn timer is set to expire immediately
to free the quic-conn instance as soon as possible. This allows to
quickly close haproxy process.
This should be backported up to 2.7.
To prevent data loss for QUIC connections, haproxy global variable jobs
is incremented each time a quic-conn socket is allocated. This allows
the QUIC connection to terminate all its transfer operation during proxy
soft-stop. Without this patch, the process will be terminated without
waiting for QUIC connections.
Note that this is done in qc_alloc_fd(). This means only QUIC connections
with their own socket will properly support soft-stop. In the other
case, the connection will be interrupted abruptly as before. Similarly,
jobs decrement is conducted in qc_release_fd().
This should be backported up to 2.7.
Properly implement support for haproxy soft-stop on the QUIC MUX. This code
is similar to the H2 MUX:
* on timeout refresh, if soft-stop is in progress, schedule the timeout to
expire with regard to the close-spread-end window.
* after input/output processing, if soft-stop is in progress, shut down the
connection. This is randomly spread over the close-spread-end window. In the
case of an H3 connection, a GOAWAY is emitted and the connection is kept
until all data are sent for opened streams. If the client tries to use
new streams, they are rejected in conformance with the GOAWAY
specification.
This ensures that MUX is able to forward all content properly before
closing the connection. The lower quic-conn layer is then responsible
for retransmission and should be closed when all data are acknowledged.
This will be implemented in the next commit to fully support soft-stop
for QUIC connections.
This should be backported up to 2.7.
Implement client-fin timeout for MUX quic. This timeout is used once an
applicative layer shutdown has been called. In HTTP/3, this corresponds
to the emission of a GOAWAY.
This should be backported up to 2.7.
Define a new function qc_process(). This function will regroup several
internal operations which should be called both from the I/O tasklet and the
wake() callback. For the moment, only the streams purge is conducted there.
This patch is useful to support haproxy soft stop. This should be
backported up to 2.7.
Factorize the shutdown operation into a dedicated function, qc_shutdown(). This
will allow to call it from multiple places. A new flag QC_CF_APP_SHUT is
also defined to ensure it will only be executed once even if called
multiple times per connection.
This commit will be useful to properly support haproxy soft stop.
This should be backported up to 2.7.
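The rough shape of the guard is the following (simplified sketch, not the full
function):
static void qc_shutdown(struct qcc *qcc)
{
    if (qcc->flags & QC_CF_APP_SHUT)
        return;                  /* shutdown already performed once */
    /* ... emit the applicative shutdown here (a GOAWAY for HTTP/3) ... */
    qcc->flags |= QC_CF_APP_SHUT;
}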
When a GOAWAY has been emitted, an ID is announced to represent handled
streams. The H3 RFC suggests that higher streams should be reset with the
error code H3_REQUEST_CANCELLED. This allows the peer to replay requests
on another connection.
For the moment, the impact of this change is limited as GOAWAY is only
used on connection shutdown just before the MUX is freed. However, for
soft-stop support, a GOAWAY can be emitted in anticipation while keeping
the MUX to finish the active streams. In this case, new streams opened
by the client are resetted.
As a consequence of this change, app_ops.attach() operation has been
delayed at the very end of qcs_new(). This ensure that all qcs members
are initialized to support RESET_STREAM sending.
This should be backported up to 2.7.
h3s stores the current demux frame type and length as a state info. It
should be big enough to store a QUIC variable-length integer which is
the maximum H3 frame type and size.
Without this patch, there is a risk of integer overflow if H3 frame size
is bigger than INT_MAX. This can typically cause a demux state mismatch
and demux frame errors. However, no occurrence of this bug has been found yet
with the current implementation.
This should be backported up to 2.6.
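QUIC variable-length integers can encode values up to 2^62 - 1, so the demux
state must be at least 64 bits wide; in a simplified form (field names are
illustrative, not the real struct h3s layout):
#include <stdint.h>
struct h3s_demux_state {
    uint64_t ftype;   /* H3 frame type currently being demuxed */
    uint64_t flen;    /* remaining payload length of that frame */
};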
When the MUX is freed, the quic-conn layer may stay active until all
streams acknowledgment are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING + RESET_STREAM.
Prior to this patch, the received packet was not acknowledged. This is
undesirable if the quic-conn is able to properly reject the request as
this can lead to unneeded retransmission from the client.
This must be backported up to 2.6.
When the MUX is freed, the quic-conn layer may stay active until all
streams acknowledgment are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING.
This process has been completed to also emit a RESET_STREAM with the
same error code H3_REQUEST_REJECTED. This is done to conform with the H3
specification to invite the client to retry its request on a new
connection.
This should be backported up to 2.6.
When the MUX is freed, the quic-conn layer may stay active until all
streams acknowledgment are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING.
This process is closely related to HTTP/3 protocol despite being handled
by the quic-conn layer. This highlights a flaw in our QUIC architecture
which should be adjusted. To reflect this situation, the function
qc_stop_sending_frm_enqueue() is renamed qc_h3_request_reject(). Also,
internal H3 treatment such as uni-directional bypass has been moved
inside the function.
This commit is only a refactor. However, bug fix on next patches will
rely on it so it should be backported up to 2.6.
This was revealed by Amaury when setting tune.quic.frontend.max-streams-bidi to 8
and asking a client to open 12 streams. haproxy has to send short packets
with small MAX_STREAMS frames encoded on 2 bytes, in addition to a packet number
encoded on only one byte. In this case, <len_frms> is the length of the encoded
frames to be added to the packet plus the length of the packet number.
Ensure the length of the packet is at least QUIC_PACKET_PN_MAXLEN by adding a
PADDING frame with (QUIC_PACKET_PN_MAXLEN - <len_frms>) as size. For instance with
a two-byte MAX_STREAMS frame and a one-byte packet number length, this adds
one byte of padding.
See https://datatracker.ietf.org/doc/html/rfc9001#name-header-protection-sample.
Must be backported to 2.7 and 2.6.
When receiving an Initial packet, a peer must drop it if the datagram is smaller
than 1200 bytes. Before this patch, it was the entire datagram which was dropped.
In such a case, drop the packet after having parsed its length.
Must be backported to 2.6 and 2.7
This bug arrives with this commit:
982896961 MINOR: quic: split and rename qc_lstnr_pkt_rcv()
The first block of code consists in possibly setting this variable to true.
But it was already initialized to true before entering this code section.
Should be initialized to false.
Also take the opportunity to remove an unused "err" label.
Must be backported to 2.6 and 2.7.
Before probing the Initial packet number space, verify that we can send at least
1200 bytes per datagram. This may not be the case due to the amplification limit.
Must be backported to 2.6 and 2.7.
This should help in diagnosing issues revealed by the interop runner which counts
the number of handshakes from the number of Initial packets sent by the server.
Must be backported to 2.7.
The aim of this function is to rearm the idle timer. The ->expire
field of the timer task was updated without the task being requeued.
Some connections could be unexpectedly terminated.
Must be backported to 2.6 and 2.7.
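In other words, the fix boils down to requeuing the task once its date changed
(hedged two-line sketch with an illustrative timeout value; tick_add(),
task_queue() and now_ms are the regular scheduler primitives):
t->expire = tick_add(now_ms, idle_timeout_ms);   /* idle_timeout_ms: illustrative */
task_queue(t);     /* requeue so the scheduler picks up the new expiration date */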
This is very helpful during retransmission when receiving ICMP port unreachable
errors after the peer has left. This is the only case at present where
qc_send_hdshk_pkts() or qc_send_app_probing() may fail (when they call
qc_send_ppkts() which fails with ECONNREFUSED as errno).
Also make the callers of qc_dgrams_retransmit() stop their packet processing.
This is the case of quic_conn_app_io_cb() and quic_conn_io_cb().
These modifications definitively stop any packet processing when receiving
ICMP port unreachable errors.
Must be backported to 2.7.
The send*() syscalls, which are responsible for reporting such ICMP packet
receptions, fail with ECONNREFUSED as errno.
man(7) udp
ECONNREFUSED
No receiver was associated with the destination address.
This might be caused by a previous packet sent over the socket.
We must kill the underlying connection as soon as possible.
Must be backported to 2.7.
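A hedged sketch of the handling described above (variable names are illustrative;
qc_kill_conn() is the helper visible in the traces of the earlier patch):
ssize_t ret = sendto(fd, buf, len, 0, dest_addr, dest_len);
if (ret < 0 && errno == ECONNREFUSED) {
    /* an ICMP port unreachable was reported for this destination:
     * the peer is gone, kill the underlying connection right away
     */
    qc_kill_conn(qc);
    return 0;
}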
This code was there because the timer task was not running on the same thread
as the one which parses the QUIC packets. Now that this is no longer the case,
we can wake up this task directly.
Must be backported to 2.7.
Move quic_rx_pkts_del() out of quic_conn.h to make it benefit from the TRACE API.
Add traces which have already helped in diagnosing an issue encountered with
ngtcp2 which sent too many 1RTT packets before the handshake completion. This
has been fixed here after discussing it with Tasuhiro on the QUIC dev Slack:
https://github.com/ngtcp2/ngtcp2/pull/663
Must be backported to 2.7.
Some counters could potentially be incremented even though the send*() syscall
returned no error, when ret >= 0 and ret != sz. This could be the case for instance if
a first call to send*() returned -1 with errno set to EINTR (or any previous syscall
which set errno to a non-null value) and if the next call to send*() returned
something positive and smaller than <sz>.
Must be backported to 2.7 and 2.6.
Add traces inside h3_decode_qcs(). Every error path now has its
dedicated trace, which should simplify debugging. Each early return has
been converted to a goto invocation.
To complete the demux tracing, demux frame type and length are now
printed using the h3s instance whenever possible on trace
invocation. A new internal value H3_FT_UNINIT is used as a frame type to
mark demuxing as inactive.
This should be backported up to 2.7.
Since the recent changes on the clocks, now.tv_sec is not to be used
between processes because it's a clock which is local to the process and
does not contain a real unix timestamp. This patch fixes the issue by
using "data.tv_sec" which is the wall clock instead of "now.tv_sec'.
It prevents having incoherent timestamps.
It also introduces some checks on negatives values in order to never
displays a netative value if it was computed from a wrong value set by a
previous haproxy version.
It must be backported as far as 2.0.
Implement support for clients that emit the stream FIN with an empty
STREAM frame. For that, qcc_recv() offset comparison has been adjusted.
If the offset has already been received but the FIN bit is now transmitted,
do not skip the rest of the function and call the application layer
decode_qcs() callback.
Without this, streams will be kept open forever as the HTX EOM is never
transferred to the upper stream layer.
This behavior was observed with mvfst client prior to its patch
38c955a024aba753be8bf50fdeb45fba3ac23cfd
Fix hq-interop (HTTP 0.9 over QUIC)
This notably caused the interop multiplexing test to fail as unclosed
streams on haproxy side prevented the emission of new MAX_STREAMS frame
to the client.
This should be backported up to 2.6. It also relies on the previous commit:
381d8137e3
MINOR: h3/hq-interop: handle no data in decode_qcs() with FIN set
Properly handle a STREAM frame with no data but the FIN bit set at the
application layer. The H3 and hq-interop decode_qcs() callbacks have been
adjusted to not return early in this case.
If the FIN bit is accepted, a HTX EOM must be inserted for the upper
stream layer. If the FIN is rejected because the stream cannot be
closed, a proper CONNECTION_CLOSE error will be triggered.
A new utility function qcs_http_handle_standalone_fin() has been
implemented in the qmux_http module. This allows to simply add the HTX
EOM on the qcs HTX buffer. If the HTX buffer is empty, an EOT is first added
to ensure it will be transmitted above.
This commit will allow to properly handle FIN notification through an empty
STREAM frame. However, it is not sufficient as currently qcc_recv() skips
the decode_qcs() invocation when the offset is already received. This
will be fixed in the next commit.
This should be backported up to 2.6 along with the next patch.
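The helper described above roughly does the following (simplified sketch; the
real function also deals with buffer allocation and qcs bookkeeping):
static void standalone_fin_to_htx(struct htx *htx)
{
    /* an empty HTX message cannot carry the EOM indication by itself,
     * so add an EOT block first when needed
     */
    if (htx_is_empty(htx))
        htx_add_endof(htx, HTX_BLK_EOT);
    htx->flags |= HTX_FL_EOM;    /* end of message for the upper stream layer */
}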
Several times during debugging it has been difficult to find a way to
reliably indicate if a thread had been started and if it was still
running. It's really not easy because the elements we look at are not
necessarily reliable (e.g. harmless bit or idle bit might not reflect
what we think during a signal). And such notions can be subjective
anyway.
Here we define two thread flags, TH_FL_STARTED which is set as soon as
a thread enters run_thread_poll_loop() and drops the idle bit, and
another one, TH_FL_IN_LOOP, which is set when entering run_poll_loop()
and cleared when leaving it. This should help init/deinit code know
whether it's called from a non-initialized thread (i.e. tid must not
be trusted), or shared functions know if they're being called from a
running thread or from init/deinit code outside of the polling loop.
As reported in github issue #1881, there are situations where an excess
of TLS handshakes can cause a livelock. What's happening is that normally
we process at most one TLS handshake per loop iteration to maintain the
latency low. This is done by tagging them with TASK_HEAVY, queuing these
tasklets in the TL_HEAVY queue. But if something slows down the loop, such
as a connect() call when no more ports are available, we could end up
processing no more than a few hundred or thousands handshakes per second.
If the limit becomes lower than the rate of incoming handshakes, we will
accumulate them and at some point users will get impatient and give up or
retry. Then a new problem happens: the queue fills up with even more
handshake attempts, only one of which will be handled per iteration, so
we can end up processing only outdated handshakes at a low rate, with
basically nothing else in the queue. This can for example happen in
parallel with health checks, which don't require incoming handshakes to
succeed and keep causing some activity that maintains the high-latency
situation active.
Here we're taking a slightly different approach. First, instead of always
allowing only one handshake per loop (and usually it's critical for
latency), we take the current situation into account:
- if configured with tune.sched.low-latency, the limit remains 1
- if there are other non-heavy tasks, we set the limit to 1 + one
per 1024 tasks, so that a heavily loaded queue of 4k handshakes
per thread will be able to drain them at ~4 per loops with a
limited impact on latency
- if there are no other tasks, the limit grows to 1 + one per 128
tasks, so that a heavily loaded queue of 4k handshakes per thread
will be able to drain them at ~32 per loop with still a very
limited impact on latency since only I/O will get delayed.
It was verified on a 56-core Xeon-8480 that this did not degrade the
latency; all requests remained below 1ms end-to-end in full close+
handshake, and even 500us under low-lat + busy-polling.
This must be backported to 2.4.
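The budget computation described in the list above can be summarized with this
standalone sketch (a simplified model, not the exact scheduler code):
/* how many TASK_HEAVY tasklets may run in one polling loop iteration */
static unsigned int heavy_budget(unsigned int queued, int other_classes_present,
                                 int low_latency)
{
    if (low_latency)
        return 1;                     /* tune.sched.low-latency: strict limit */
    if (other_classes_present)
        return 1 + queued / 1024;     /* 4k queued handshakes -> ~4 per loop */
    return 1 + queued / 128;          /* heavy-only: 4k queued -> ~32 per loop */
}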
There's a per-thread "long_rq" counter that is used to indicate how
often we leave the scheduler with tasks still present in the run queue.
The purpose is to know when tune.runqueue-depth served to limit latency,
due to a large number of tasks being runnable at once.
However there's a bug there, it's not always set: if after the first
run, one heavy task was processed and later only heavy tasks remain,
we'll loop back to not_done_yet where we try to pick more tasks, but
none are eligible (since heavy ones have already run) so we directly
return without incrementing the counter. This is what causes ultra-low
values on long_rq during massive SSL handshakes, that are confusing
because they make one believe that tl_class_mask doesn't have the HEAVY
flag anymore. Let's just fix that by not returning from the middle of
the function.
This can be backported as far as 2.4.
In 2.7, the method used to check for a sleeping thread changed with
commit e7475c8e7 ("MEDIUM: tasks/fd: replace sleeping_thread_mask with
a TH_FL_SLEEPING flag"). Previously there was a global sleeping mask
and now there is a flag per thread. The commit above partially broke
the watchdog by looking at the current thread's flags via th_ctx
instead of the reported thread's flags, and using an AND condition
instead of an OR to update and leave. This can cause a wrong thread
to be killed when the load is uneven. For example, when enabling
busy polling and sending traffic over a single connection, all
threads have their run time grow, and if the one receiving the
signal is also processing some traffic, it will not match the
sleeping/harmless condition and will set the stuck flag, then die
upon next invocation. While it's reproducible in tests, it's unlikely
to be met in field.
This fix should be backported to 2.7.
The -dF option can now be used to disable data fast-forward. It does the
same than the global option "tune.fast-forward off". Some reg-tests may rely
on this optim. To detect the feature and skip such script, the following
vtest command must be used:
feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"
The new global option "tune.fast-forward" can be set to "off" to disable the
data fast-forward. It is a debug option, thus it is internally marked as
experimental. The directive "expose-experimental-directives" must be set
first to use this one. By default, the data fast-forward is enabled.
It could be useful to force the stream to be woken up when data are
received, to be sure everything works fine in this case. The data
fast-forward is an optim; everything must work without it. But some code may
rely on the fact that the stream will not be woken up. With this option, it is
possible to spot some hidden bugs.
At the stream level, the read expiration date is unset if a shutr was
received but not if the end of input was reached. If we know no more data
are expected, there is no reason to leave the read expiration date armed,
except to respect the clientfin/serverfin timeout in some circumstances.
This patch could slowly be backported as far as 2.2.
During the payload forwarding, since the commit f2b02cfd9 ("MAJOR: http-ana:
Review error handling during HTTP payload forwarding"), when an error
occurred on one side, we don't rely anymore on a specific HTTP message state
to detect it on the other side. However, nothing was added to detect the
error. Thus, when this happens, a spinning loop may be experienced, followed by
an abort triggered by the watchdog.
To fix the bug, we must detect the opposite side is closed by checking the
opposite SC state. Concretely, in http_end_request() and http_end_response(),
we wait for the other side iff the HTTP message state is lower than
HTTP_MSG_DONE (the message is not finished) and the SC state is not
SC_ST_CLO (the opposite side is not closed). In these functions, we don't
care if there was an error on the opposite side. We only take care to detect
when we must stop waiting for the other side.
This patch should fix the issue #2042. No backport needed.
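For reference, the waiting condition described in the previous patch boils down
to the following (names are approximate, not the exact http_end_request()/
http_end_response() code):
/* keep waiting for the other side only while its message is not finished
 * and its stream-connector is not closed
 */
if (other_msg->msg_state < HTTP_MSG_DONE && other_sc->state != SC_ST_CLO)
    return 0;    /* still waiting */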
The ssl_bind_kw structure is exclusively used for crt-list keywords; it
must be named otherwise to remove the confusion.
The structure was renamed ssl_crtlist_kws.
The HTTP header parsers surprisingly accept empty header field names,
and this is a leftover from the original code that was agnostic to this.
When muxes were introduced, for H2 first, the HPACK decompressor needed
to feed headers lists, and since empty header names were strictly
forbidden by the protocol, the lists of headers were purposely designed
to be terminated by an empty header field name (a principle that is
similar to H1's empty line termination). This principle was preserved
and generalized to other protocols migrated to muxes (H1/FCGI/H3 etc)
without anyone ever noticing that the H1 parser was still able to deliver
empty header field names to this list. In addition to this it turns out
that the HPACK decompressor, despite a comment in the code, may
successfully decompress an empty header field name, and this mistake
was propagated to the QPACK decompressor as well.
The impact is that an empty header field name may be used to truncate
the list of headers and thus make some headers disappear. While for
H2/H3 the impact is limited as haproxy sees a request with missing
headers, and headers are not used to delimit messages, in the case of
HTTP/1, the impact is significant because the presence (and sometimes
contents) of certain sensitive headers is detected during the parsing.
Thus, some of these headers may be seen, marked as present, their value
extracted, but never delivered to upper layers and obviously not
forwarded to the other side either. This can have for consequence that
certain important header fields such as Connection, Upgrade, Host,
Content-length, Transfer-Encoding etc are possibly seen as different
between what haproxy uses to parse/forward/route and what is observed
in http-request rules and of course, forwarded. One direct consequence
is that it is possible to exploit this property in HTTP/1 to make
affected versions of haproxy forward more data than is advertised on
the other side, and bypass some access controls or routing rules by
crafting extraneous requests. Note, however, that responses to such
requests will normally not be passed back to the client, but this can
still cause some harm.
This specific risk can be mostly worked around in configuration using
the following rule that will rely on the bug's impact to precisely
detect the inconsistency between the known body size and the one
expected to be advertised to the server (the rule works from 2.0 to
2.8-dev):
http-request deny if { fc_http_major 1 } !{ req.body_size 0 } !{ req.hdr(content-length) -m found } !{ req.hdr(transfer-encoding) -m found } !{ method CONNECT }
This will exclusively block such carefully crafted requests delivered
over HTTP/1. HTTP/2 and HTTP/3 do not need content-length, and a body
that arrives without being announced with a content-length will be
forwarded using transfer-encoding, hence will not cause discrepancies.
In HAProxy 2.0 in legacy mode ("no option http-use-htx"), this rule will
simply have no effect but will not cause trouble either.
A clean solution would consist in modifying the loops iterating over
these headers lists to check the header name's pointer instead of its
length (since both are zero at the end of the list), but this requires
to touch tens of places and it's very easy to miss one. Functions such
as htx_add_header(), htx_add_trailer(), htx_add_all_headers() would be
good starting points for such a possible future change.
Instead the current fix focuses on blocking empty headers where they
are first inserted, hence in the H1/HPACK/QPACK decoders. One benefit
of the current solution (for H1) is that it allows "show errors" to
report a precise diagnostic when facing such invalid HTTP/1 requests,
with the exact location of the problem and the originating address:
$ printf "GET / HTTP/1.1\r\nHost: localhost\r\n:empty header\r\n\r\n" | nc 0 8001
HTTP/1.1 400 Bad request
Content-length: 90
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
$ socat /var/run/haproxy.stat <<< "show errors"
Total events captured on [10/Feb/2023:16:29:37.530] : 1
[10/Feb/2023:16:29:34.155] frontend decrypt (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:31092
buffer starts at 0 (including 0 out), 16334 free,
len 50, wraps at 16336, error at position 33
H1 connection flags 0x00000000, H1 stream flags 0x00000810
H1 msg state MSG_HDR_NAME(17), H1 msg flags 0x00001410
H1 chunk len 0 bytes, H1 body len 0 bytes :
00000 GET / HTTP/1.1\r\n
00016 Host: localhost\r\n
00033 :empty header\r\n
00048 \r\n
I want to extend sincere and warm thanks for their great work to the
team composed of the following security researchers who found the issue
together and reported it: Bahruz Jabiyev, Anthony Gavazzi, and Engin
Kirda from Northeastern University, Kaan Onarlioglu from Akamai
Technologies, Adi Peleg and Harvey Tuch from Google. And kudos to Amaury
Denoyelle from HAProxy Technologies for spotting that the HPACK and
QPACK decoders would let this pass despite the comment explicitly
saying otherwise.
This fix must be backported as far as 2.0. The QPACK changes can be
dropped before 2.6. In 2.0 there is also the equivalent code for legacy
mode, which doesn't suffer from the list truncation, but it would better
be fixed regardless.
CVE-2023-25725 was assigned to this issue.
A parenthesis was placed at the wrong position in a memcmp() call.
As a consequence, clients could not reuse a UDP address for a new connection.
Must be backported to 2.7.
The commit d5983cef8 ("MINOR: listener: remove the useless ->default_target
field") revealed a bug in the SPOE. No default-target must be defined for
the SPOE agent frontend. SPOE applets are used on the frontend side and a
TCP connection is established on the backend side.
Because of this bug, since the commit above, the stream target is set to the
SPOE applet instead of the backend connection, leading to a spinning loop on
the applet when it is released because we are unable to close the backend side.
This patch should fix the issue #2040. It only affects the 2.8-DEV but to
avoid any future bug, it should be backported to all stable versions.
When a client timeout is reported by the H1 mux, it is not an error but an
abort. Thus, the H1C_F_ERROR flag must not be set. It is especially important to
not inhibit the send. Because of this bug, a 408-Request-time-out is
reported in logs but the error message is not sent to the client.
This patch must be backported to 2.7.
We should not report LAST data in log if the response is in TUNNEL mode on
client close/timeout because there is no way to be sure it is the last
data. It means, it can only be reported in DONE, CLOSING or CLOSE states.
No backport needed.
There is already a test on CF_EOI and CF_SHUTR. The last one is always set
when a read error is reported. Thus there is no reason to check
CF_READ_ERROR.
This change was performed on all applet I/O handlers but one was missed. In
the CLI I/O handler used to commit a CA/CRL file, we can remove the test on
CF_WRITE_ERROR because there is already a test on CF_SHUTW.
This has been detected by libasan as follows:
=================================================================
==3170559==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55cf77faad08 at pc 0x55cf77a87370 bp 0x7ffc01bdba70 sp 0x7ffc01bdba68
READ of size 8 at 0x55cf77faad08 thread T0
#0 0x55cf77a8736f in cli_find_kw src/cli.c:335
#1 0x55cf77a8a9bb in cli_parse_request src/cli.c:792
#2 0x55cf77a8c385 in cli_io_handler src/cli.c:1024
#3 0x55cf77d19ca1 in task_run_applet src/applet.c:245
#4 0x55cf77c0b6ba in run_tasks_from_lists src/task.c:634
#5 0x55cf77c0cf16 in process_runnable_tasks src/task.c:861
#6 0x55cf77b48425 in run_poll_loop src/haproxy.c:2934
#7 0x55cf77b491cf in run_thread_poll_loop src/haproxy.c:3127
#8 0x55cf77b4bef2 in main src/haproxy.c:3783
#9 0x7fb8b0693d09 in __libc_start_main ../csu/libc-start.c:308
#10 0x55cf7764f4c9 in _start (/home/flecaille/src/haproxy-untouched/haproxy+0x1914c9)
0x55cf77faad08 is located 0 bytes to the right of global variable 'cli_kws' defined in 'src/quic_conn.c:7834:27' (0x55cf77faaca0) of size 104
SUMMARY: AddressSanitizer: global-buffer-overflow src/cli.c:335 in cli_find_kw
Shadow bytes around the buggy address:
According to the cli_find_kw() code and the cli_kw_list struct definition, the
second member of this structure, ->kw[], must be a null-terminated array.
Add a last element with default initializers to the <cli_kws> global variable
which is impacted by this bug.
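For illustration, the usual registration pattern looks roughly like the
following (keyword strings and handler names are simplified; the final
empty element is the missing null terminator):
static struct cli_kw_list cli_kws = {{ }, {
        { { "show", "quic", NULL }, "show quic : display quic connections",
          cli_parse_show_quic, cli_io_handler_dump_quic },
        {{},}  /* null-terminating entry expected by cli_find_kw() */
}};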
This bug arrived with this commit:
15c74702d MINOR: quic: implement a basic "show quic" CLI handler
Must be backported to 2.7 where this previous commit has been already
backported.
This is not really a bug fix because it does not fix any known issue. And it is
flagged as MEDIUM because it is sensitive. But if there are some extra calls
to process_stream(), it can be an issue because, in si_update_rx(), we may
disable reading for the SC when outgoing data are blocked in the input
channel. But it is not really the process_stream() job to take care of
that. This may block data receipt.
This is old code, mainly here to avoid wakeups in a loop on the stats
applet. Today, it seems useless and can lead to bugs. An endpoint is
responsible to block the SC if it waits for some room and the opposite
endpoint is responsible to unblock it when some data are sent. The stream
should not interfere on this part.
This patch could be backported to 2.7 after a period of observation. And it
should only be backported to lower versions if an issue is reported.
For an unknown reason, the change of uptime calculation for the HTML
page didn't make it into commit 6093ba47c ("BUG/MINOR: clock: do not mix
wall-clock and monotonic time in uptime calculation"). Let's address it
as well otherwise the stats page will display an incorrect uptime.
No backport needed unless the patch above is backported.
Uptime calculation for master process was incorrect as it used
<start_date> as its timestamp base time. Fix this by using the scheduler
time <start_time> for this.
The impact of this bug is minor as timestamp base time is only used for
"show proc" CLI output. it was highlighted by the following commit.
which caused a negative value to be displayed for the master process
uptime on "show proc" output.
28360dc53f
MEDIUM: clock: force internal time to wrap early after boot
This should be backported up to 2.0.
Incorrect printf format specifier "%lu" was used on "show quic" handler
for uint64_t. This breaks build on 32-bits architecture. To fix this
portability issue, force an explicit cast to unsigned long long with
"%llu" specifier.
This must be backported up to 2.7.
With a connection, when data are received, if these data are sent to the
opposite side because the fast-forwarding is possible, the stream may be
woken up on some conditions (at the end of sc_app_chk_snd_conn()):
* The channel is shut for write
* The SC is not in the "established" state
* The stream must explicitly be woken up on write and all data was sent
* The connection was just established.
A bug on the last condition was introduced with the commit d89884153
("MEDIUM: channel: Use CF_WRITE_EVENT instead of CF_WRITE_PARTIAL"). The
stream is now woken up on any write events.
This patch fixes this issue and restores the original behavior. No backport
is needed.
Filtering of closing/draining connections on "show quic" was not
properly implemented. This caused the extra argument "all", which is meant
to display all connections, to have no effect. This patch fixes this and restores
the output of all connections.
This must be backported up to 2.7.
Reduce default "show quic" output by masking connection on
closing/draining state due to a CONNECTION_CLOSE emission/reception. These
connections can still be displayed using the special argument "all".
This should be backported up to 2.7.
Complete "show quic" handler by displaying information about
quic_stream_desc entries. These structures are used to emit stream data
and store them until acknowledgment is received.
This should be backported up to 2.7.
Complete "show quic" handler by displaying various information related
to each encryption level and packet number space. Most notably, ack
ranges and bytes in flight are present to help debug retransmission
issues.
This should be backported up to 2.7.
Complete "show quic" handler by displaying information related to the
quic_conn owned socket. First, the FD is printed, followed by the
address of the local and remote endpoint.
This should be backported up to 2.7.
Complete "show quic" handler. Source and destination CIDs are printed
for every connection. This is complemented by state info to reflect whether the
handshake is completed and if a CONNECTION_CLOSE has been emitted or
received and the allocation status of the attached MUX. Finally the idle
timer expiration is also printed.
This should be backported up to 2.7.
Implement a basic "show quic" CLI handler. This command will be useful
to display various information on all the active QUIC frontend
connections.
This work is heavily inspired by "show sess". Most notably, a global
list of quic_conn has been introduced to be able to loop over them. This
list is stored per thread in ha_thread_ctx.
Also add three CLI handlers for "show quic" in order to allocate and
free the command context. The dump handler runs on thread isolation.
Each quic_conn is referenced using a back-ref to handle deletion during
handler yielding.
For the moment, only a list of raw quic_conn pointers is displayed. The
handler will be completed over time with more information as needed.
This should be backported up to 2.7.
Commit 0aba11e9e ("MINOR: quic: remove unnecessary quic_session_accept()")
overlooked one problem, in session_accept_fd() at the end, there's a bunch
of FD-specific stuff that either sets up or resets the socket at the TCP
level. The tests are mostly performed for AF_INET/AF_INET6 families but
they're only for one part (i.e. to avoid setting up TCP options on UNIX
sockets). Other pieces continue to configure the socket regardless of its
family. All of this directly acts on the FD, which is not correct since
the FD is not valid here, it corresponds to the QUIC handle. The issue
is much more visible when "option nolinger" is enabled in the frontend,
because the access to fdtab[cfd].state immediately crashes on the first
connection, as can be seen in github issue #2030.
This patch bypasses this setup for FD-less connections, such as QUIC.
However some of them could definitely be relevant to the QUIC stack, or
even to UNIX sockets sometimes. A better long-term solution would consist
in implementing a setsockopt() equivalent at the protocol layer that would
be used to configure the socket, either the FD or the QUIC conn depending
on the case. Some of them would not always be implemented but that would
allow to unify all this code.
This fix must be backported everywhere the commit above is backported,
namely 2.6 and 2.7.
Thanks to github user @twomoses for the nicely detailed report.
The commit 7f59d68fe ("BUG/MEDIUM: stconn: Flush output data before
forwarding close to write side") introduced a regression. When the read side
is closed, the close is not forwarded to the write side if there are some
pending outgoing data. The idea is to forward data first and then close the
write side. However, when fast-forwarding is enabled and last data block is
received with the read0, the close is never forwarded.
We cannot revert the commit above because it really fixes an issue. However,
we can schedule the shutdown for write by setting CF_SHUTW_NOW flag on the
write side. Indeed, it is the purpose of this flag.
To avoid replicating an ugly and hardly maintainable code block at different
places in stconn.c, a helper function is used. Thus, sc_cond_forward_shutw() must
be called to know if the close can be forwarded or not. It returns 1 if it is
possible. In this case, the caller is responsible to forward the close to
the write side. Otherwise, if the close cannot be forwarded, 0 is
returned. It happens when it should not be performed at all. Or when it
should only be delayed, waiting for the input channel to be flushed. In this
last case, the CF_SHUTW_NOW flag is set in the output channel.
This patch should fix the issue #2033. It must be backported with the commit
above, thus at least as far as 2.2.
When a new server was added through the cli using "server add" command,
the maxconn/minconn consistency check historically implemented in
check_config_validity() for static servers was missing.
As a result, when adding a server with the maxconn parameter without the
minconn set, the server was unable to handle any connection because
srv_dynamic_maxconn() would always return 0.
Consider the following reproducer:
| global
| stats socket /tmp/ha.sock mode 660 level admin expose-fd listeners
|
| defaults
| timeout client 5s
| timeout server 5s
| timeout connect 5s
|
| frontend test
| mode http
| bind *:8081
| use_backend farm
|
| listen dummyok
| bind localhost:18999
| mode http
| http-request return status 200 hdr test "ok"
|
| backend farm
| mode http
Start haproxy and perform the following :
echo "add server farm/t1 127.0.0.1:18999 maxconn 100" | nc -U /tmp/ha.sock
echo "enable server farm/t1" | nc -U /tmp/ha.sock
curl localhost:8081 # -> 503 after 5s connect timeout
Thanks to ("MINOR: cfgparse/server: move (min/max)conn postparsing logic into
dedicated function"), we are now able to perform the consistency check after
the new dynamic server has been parsed.
This is enough to fix the issue documented here that was reported by
Thomas Pedoussaut on the ML.
This commit depends on:
- ("MINOR: cfgparse/server: move (min/max)conn postparsing logic into
dedicated function")
It must be backported to 2.6 and 2.7
In check_config_validity() function, we performed some consistency checks to
adjust minconn/maxconn attributes for each declared server.
We move this logic into a dedicated function named srv_minmax_conn_apply()
to be able to perform those checks later in the process life when needed
(ie: dynamic servers)
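As a rough sketch (the exact body may differ), the rule now exposed by
srv_minmax_conn_apply() boils down to:
static void srv_minmax_conn_apply(struct server *srv)
{
        if (srv->minconn > srv->maxconn) {
                /* only minconn was set, or it exceeds maxconn:
                 * it effectively becomes the fixed limit
                 */
                srv->maxconn = srv->minconn;
        }
        else if (srv->maxconn && !srv->minconn) {
                /* maxconn without minconn means a fixed limit was
                 * requested, so align minconn on it, otherwise
                 * srv_dynamic_maxconn() would compute 0
                 */
                srv->minconn = srv->maxconn;
        }
}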
Deduplicate the code which checks the OCSP update in the ckch_store and
in the crtlist_entry.
Also, jump immediately to error handling when ERR_FATAL is caught.
GH issue #2034 clearly indicates yet another case of time roll-over
that went badly. Issues that happen only once every 50 days are hard
to detect and debug, and are usually reported more or less synchronized
from multiple sources. This patch finally does what had long been planned
but never done yet, which is to force the time to wrap early after boot
so that any such remaining issue can be spotted quicker. The margin delay
here is 20s (it may be changed by setting BOOT_TIME_WRAP_SEC to another
value). This value seems sufficient to permit failed health checks to
succeed and traffic to come in and possibly start to update some time
stamps (accept dates in logs, freq counters, stick-tables expiration
dates etc).
It could theoretically be helpful to have this in 2.7, but as can be
seen with the two patches below, we've already had incorrect use cases
of the internal monotonic time when the wall-clock one was needed, so
we could expect to detect other ones in the future. Note that this will
*not* induce bugs, it will only make them happen much faster (i.e. no
need to wait for 50 days before seeing them). If it were to eventually
be backported, these two previous patches must also be backported:
BUG/MINOR: clock: use distinct wall-clock and monotonic start dates
BUG/MEDIUM: cache: use the correct time reference when comparing dates
The cache makes use of dates advertised by external components, such
as "last-modified" or "date". As such these are wall-clock dates, and
not internal dates. However, all comparisons are mistakenly made based
on the internal monotonic date which is designed to drift from the wall
clock one in order to catch up with stolen time (which can sometimes be
intense in VMs). As such after some run time some objects may fail to
validate or fail to expire depending on the direction of the drift. This
is particularly visible when applying an offset to the internal time to
force it to wrap soon after startup, as it will be shifted up to 49.7
days in the future depending on the current date; what happens in this
case is that the reg-test "cache_expires.vtc" fails on the 3rd test by
returning stale contents from the cache at the date of this commit.
It is really important that all external dates are compared against
"date" and not "now" for this reason.
This fix needs to be backported to all versions.
We've had a start date even before the internal monotonic clock existed,
but once the monotonic clock was added, the start date was not updated
to distinguish the wall clock time units and the internal monotonic time
units. The distinction is important because both clocks do not necessarily
progress at the same speed. The very rare occurrences of the wall-clock
date are essentially for human consumption and communication with third
parties (e.g. report the start date in "show info" for monitoring
purposes). However currently this one is also used to measure the distance
to "now" as being the process' uptime. This is actually not correct. It
only works because for now the two dates are initialized at the exact
same instant at boot but could still be wrong if the system's date shows
a big jump backwards during startup for example. In addition the current
situation prevents us from enforcing an arbitrary offset at boot to reveal
some heisenbugs.
This patch adds a new "start_time" at boot that is set from "now" and is
used in uptime calculations. "start_date" instead is now set from "date"
and will always reflect the system date for human consumption (e.g. in
"show info"). This way we're now sure that any drift of the internal
clock relative to the system date will not impact the reported uptime.
This could possibly be backported though it's unlikely that anyone has
ever noticed the problem.
At some moments expired stick table records stop being removed. This
happens when the internal time wraps around the 32-bit limit, or every
49.7 days. What precisely happens is that some elements that are collected
close to the end of the time window (2^32 - table's "expire" setting)
might have been updated and will be requeued further, at the beginning
of the next window. Here, three bad situations happen:
- the incorrect integer-based comparison that is not aware of wrapping
will result in the scan to restart from the freshly requeued element,
skipping all those at the end of the window. The net effect of this
is that at each wakeup of the expiration task, only one element from
the end of the window will be expired, and other ones will remain
there for a very long time, especially if they have to wait for all
the predecessors to be picked one at a time after slow wakeups due
to a long expiration ; this is what was observed in issue #2034
making the table fill up and appear as not expiring at all, and it
seems that issue #2024 reports the same problem at the same moment
(since such issues happen for everyone roughly at the same time
when the clock doesn't drift too much).
- the elements that were placed at the beginning of the next window
are skipped as well for as long as there are refreshed entries at
the end of the previous window, so these ones participate to filling
the table as well. This is caused by the restart from the current,
updated node that is generally placed after most other less recently
updated elements.
- once the last element at the end of the window is picked, suddenly
there is a large amount of expired entries at the beginning of the
next window that all have to be requeued. If the expiration delay
is large, the number can be big and it can take a long time, which
can very likely explain the periodic crashes reported in issue #2025.
Limiting the batch size as done in commit dfe79251d ("BUG/MEDIUM:
stick-table: limit the time spent purging old entries") would make
sense for process_table_expire() as well.
This patch addresses the incorrect tree scan algorithm to make sure that:
- there's always a next element to compare against, even when dealing
with the last one in the tree, the first one must be used ;
- time comparisons used to decide whether to restart from the current
element use tick_is_lt() as it is the only case where we know the
current element will be placed before any other one (since the tree
respects insertion ordering for duplicates)
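For reference, tick_is_lt() essentially boils down to a signed-difference
test, which is what makes the comparison wrapping-aware:
/* returns non-zero if tick <t1> is chronologically before tick <t2>,
 * even across a 32-bit wrap
 */
static inline int tick_is_lt(int t1, int t2)
{
        return (t1 - t2) < 0;
}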
In order to reproduce the issue, it was found that injecting traffic on
a random key that spans over half of the size of a table whose expiration
is set to 15s while the date is going to wrap in 20s does exhibit an
increase of the table's size 5s after startup, when entries start to be
pushed to the next window. It's more effective when a second load
generator constantly hammers a same key to be certain that none of them
is ready to expire. This doesn't happen anymore after this patch.
This fix needs to be backported to all stable versions. The bug has been
there for as long as the stick tables were introduced in 1.4-dev7 with
commit 3bd697e07 ("[MEDIUM] Add stick table (persistence) management
functions and types"). A cleanup could consists in deduplicating that
code by having process_table_expire() call __stktable_trash_oldest(),
with that one improved to support an optional time check.
Display a warning when some text exists between the filename and the
options. This part is completely ignored so if there are filters here,
they were never parsed.
This could be backported to every version. In the older versions, the
parsing was done in ssl_sock_load_cert_list_file() in ssl_sock.c.
Aurélien reported that the BUG_ON(!new_ts.nbgrp) added in 2.8-dev3 by
commit 50440457e ("MEDIUM: config: restrict shards, not bind_conf to one
group each") can trigger on some invalid configs where the thread_set on
the "bind" line couldn't be resolved. The reason is that we still enter
the parsing loop (as it was done previously) and we possibly have no
group to work on (which was the purpose of this assertion). There we
need to bypass all this block on such a condition.
No backport is needed.
Aurélien reported a bug making a statement such as "thread 2-2" fail for
a config made of exactly 2 threads. What happens is that the parser for
the "thread" keyword scans a range of thread numbers from either 1..64
or 0,-1,-2 for special values, and presets the bit masks accordingly in
the thread set, except that due to the 1..64 range, the shift length must
be reduced by one. Not doing this causes empty masks for single-bit values
that are exactly equal to the number of threads in the group and fails to
properly parse.
No backport is needed as this was introduced in 2.8-dev3 by commit
bef43dfa6 ("MINOR: thread: add a simple thread_set API").
Due to multithreading concurrency, it is difficult at this time to figure
out how this counter may become negative. This simple patch only checks that
this will never be the case.
This issue arrives with this commit:
"9969adbcdc MINOR: stats: add by HTTP version cumulated number of sessions and requests"
So, this patch should be backported when the latter has been backported.
When stats_putchk() fails to perform the dump because available data space in
htx is less than the number of bytes pending in the dump buffer, we wait
for more room in the htx (ie: sc_need_room()) to retry the dump attempt
on the next applet invocation.
To provide consistent output, we have to make sure that the stat ctx is not
updated (or at least correctly reverted) in case stats_putchk() fails so
that the new dumping attempt behaves just like the previous (failed) one.
STAT_STARTED is not following this logic, the flag is set in
stats_dump_fields_json() as soon as some data is written to the output buffer.
It's done too early: we need to delay this step after the stats_putchk() has
successfully returned if we want to correctly handle the retries attempts.
Because of this, JSON output could suffer from extraneous ',' characters which
could make json parsers unhappy.
For example, this is the kind of errors you could get when using
`python -m json.tool` on such badly formatted outputs:
"Expecting value: line 1 column 2 (char 1)"
Unfortunately, fixing this means that the flag needs to be enabled at
multiple places, which is what we're doing in this patch.
(in stats_dump_proxy_to_buffer() where stats_dump_one_line() is involved
by underlying stats_dump_{fe,li,sv,be} functions)
Thereby, this raises the need for a cleanup to reduce code duplication around
stats_dump_proxy_to_buffer() function and simplify things a bit.
It could be backported to 2.6 and 2.7
In ("MINOR: stats: introduce stats field ctx"), we forgot
to apply the patch to servers.
This prevents "BUG/MINOR: stats: fix show stat json buffer limitation"
from working with servers dump.
We're adding the missing part related to servers dump.
This commit should be backported with the aforementioned commits.
When ctx->field was introduced with ("MINOR: stats: introduce stats field ctx")
a mistake was made for the STAT_PX_ST_LI state in stats_dump_proxy_to_buffer():
current_field reset is placed after the for loop, ie: after multiple lines
are dumped. Instead it should be placed right after each li line is dumped.
This could cause some output inconsistencies (missing fields), especially when
http dump is used with JSON output and "socket-stats" option is enabled
on the proxy, because when htx is full we restore the ctx->field with
current_field (which contains outdated value in this case).
This should be backported with ("MINOR: stats: introduce stats field ctx")
In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"),
we forgot to apply the patch in resolvers.c which provides the
stats_dump_resolvers() function that is involved when dumping with "resolvers"
domain.
As a consequence, resolvers dump was broken because stats_dump_one_line(),
which is used in stats_dump_resolv_to_buffer(), implicitly uses trash_chunk
from stats.c to prepare the dump, and stats_putchk() is then called with
global trash (currently empty) as output data.
Given that trash_dump variable is static and thus only available within stats.c
we change stats_putchk() function prototype so that the function does not take
the output buffer as an argument. Instead, stats_putchk() will implicitly use
the local trash_dump variable declared in stats.c.
It will also prevent further mixups between stats_dump_* functions and
stats_putchk().
This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats")
In ("BUG/MINOR: stats: use proper buffer size for http dump"),
we used trash.size as source buffer size before applying the htx
overhead computation.
It is safer to use res->buf.size instead since res_htx (which is <htx> argument
passed to stats_putchk() in http context) is made from res->buf:
in http_stats_io_handler:
| res_htx = htx_from_buf(&res->buf);
This will prevent the hang bug from showing up again if res->buf.size were to be
less than trash.size (which is set according to tune.bufsize).
This should be backported with ("BUG/MINOR: stats: use proper buffer size for http dump")
When building STREAM frames in a packet buffer, if a frame is too large
it will be split in two. A shortened version will be used and the
original frame will be modified to represent the remaining space.
To ensure there is enough space to store the frame data length encoded
as a QUIC integer, we use the function max_available_room(). This
function can return 0 if there is only a small space left which is
insufficient for the frame header and the shortened data. Prior to this
patch, this wasn't checked and an empty unneeded STREAM frame was built
and sent for nothing.
Change this by checking the value returned by max_available_room(). If 0,
do not try to split this frame and continue to the next ones in the
packet.
On 2.6, this patch serves as an optimization which will prevent the building
of unneeded empty STREAM frames.
On 2.7, this behavior has the side-effect of triggering a BUG_ON()
statement on quic_build_stream_frame(). This BUG_ON() ensures that we do
not use a quic_frame with the OFF bit set if its offset is 0. This can happen
if the condition defined above is reproduced for a STREAM frame at
offset 0. An empty unneeded frame is built as described. The problem is
that the original frame is modified with its OFF bit set even if the
offset is still 0.
This must be backported up to 2.6.
Now that we're using thread_sets there's no need to restrict an entire
bind_conf to 1 group, the real concern being the FD, we can move that
restriction to the shard only. This means that as long as we have enough
shards and that they're properly aligned on group boundaries (i.e. shards
are an integer divider of the number of threads), we can support "bind"
lines spanning more than one group.
The check is still performed for shards to span more than one group,
and an error is emitted when this happens. But at least now it becomes
possible to have this:
global
nbthread 256
frontend foo
bind :1111 shards 4
bind :2222 shards by-thread
Let's now retrieve the first thread group and its mask from the
thread_set so that we don't need these fields in the bind_conf anymore.
For now we're still limited to the first group (like before) but that
allows to get rid of these fields and to make sure that there's nothing
"special" being done there anymore.
Instead of reading and storing a single group and a single mask for a
"thread" directive on a bind line, we now store the complete range in
a thread set that's stored in the bind_conf. The bind_parse_thread()
function now just calls parse_thread_set() to complete the current set,
which starts empty, and thread_resolve_group_mask() was updated to
support retrieving thread group numbers or absolute thread numbers
directly from the pre-filled thread_set, and continue to feed bind_tgroup
and bind_thread. The CLI parsers which were pre-initialized to set the
bind_tgroup to 1 cannot do it anymore as it would prevent one from
restricting the thread set. Instead check_config_validity() now detects
the CLI frontend and passes the info down to thread_resolve_group_mask()
that will automatically use only the group 1's threads for these
listeners. The same is done for the peers listeners for now.
At this step it's already possible to start with all previous valid
configs as well as extended ones supporting comma-delimited thread
sets. In addition the parser already accepts large ranges spanning
multiple groups, but since the underlying listeners infrastructure
is not ready, for now we're maintaining a specific check against this
at the higher level of the config validity check.
The patch is a bit large because thread resolution is performed in
multiple steps, so we need to adjust all of them at once to preserve
functional and technical consistency.
The purpose is to be able to store large thread sets, defined by ranges
that may cross group boundaries, as well as define lists of groups and
masks. The thread_set struct implements the storage, and the parser is
in parse_thread_set(), with a focus on "bind" lines, but not only.
During 2.5 development, a fallback was implemented for bind "thread"
directives that would not map to existing threads, with commit e3f4d7496
("MEDIUM: config: resolve relative threads on bind lines to absolute ones").
The approach consisted in remapping the threads to other ones. But now
that relative threads and not absolute threads are stored in this mask,
this case cannot happen anymore, and this confusing hack is not needed
anymore.
This flag is only used to tag a QUIC listener, which we now know by
its bind_conf's xprt as well. It's only used to decide whether or not
to perform an extra initialization step on the listener. Let's drop it
as well as the flags field.
With the various fields and options moved, the listener struct reduced
by 48 bytes total.
LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES are only set from the proxy
based on the presence or absence of tcp_req l4/l5 rules. It's basically
as cheap to check the list as it is to check the flag, except that there
is no need to maintain a copy. Let's get rid of them, and this may ease
addition of more dynamic stuff later.
These two flags are entirely for internal use and are even per proxy
in practice since they're used for peers and CLI to indicate (for the
first one) that the listener(s) are not subject to connection limits,
and for the second that the listener(s) should not be stopped on
soft-stop. No need to keep them in the listeners, let's move them to
the bind_conf under names BC_O_UNLIMITED and BC_O_NOSTOP.
These are only set per bind line and used when creating a sessions,
we can move them to the bind_conf under the names BC_O_ACC_PROXY and
BC_O_ACC_CIP respectively.
It's set per bind line ("tfo") and only used in tcp_bind_listener() so
there's no point keeping the address family tests, let's just store the
flag in the bind_conf under the name BC_O_TCP_FO.
This option is set per bind line, and was only stored when the
address family is AF_INET or AF_INET6. That's pointless since it's
used only in tcp_bind_listener() which is only used for such families
as well, so it can now be moved to the bind_conf under the name
BC_O_DEF_ACCEPT.
It's currently declared per-frontend, though it would make sense to
support it per-line but in no case per-listener. Let's move the option
to a bind_conf option BC_O_NOLINGER.
This field is used by stream_new() to optionally set the applet the
stream will connect to for simple proxies like the CLI for example.
But it has never been configurable to anything and is always strictly
equal to the frontend's ->default_target. Let's just drop it and make
stream_new() only use the frontend's. It makes more sense anyway as
we don't want the proxy to work differently based on the "bind" line.
This idea was brought in 1.6 hoping that the h2 implementation would
use applets for decoding (which was dropped after the very first
attempt in 1.8).
The accept callback directly derives from the upper layer, generally
it's session_accept_fd(). As such it's also defined per bind line
so it makes sense to move it there.
The maxconn is set per bind line so let's move it there. This might
possibly even slightly reduce inter-thread contention since this one
is read-mostly and it was stored next to nbconn which changes for
each connection setup or teardown.
Like for previous values, maxaccept is really per-bind_conf, so let's
move it there. Some frontends (peers, log) set it to 1 so the assignment
was slightly moved.
These two arguments were only set and only used with tcpv4/tcpv6. Let's
just store them into the bind_conf instead of duplicating them for all
listeners since they're fixed per "bind" line.
When bind_conf were created, some elements such as the analysers mask
ought to have moved there but that wasn't the case. Now that it's
getting clearer that bind_conf provides all binding parameters and
the listener is essentially a listener on an address, it's starting
to get really confusing to keep such parameters in the listener, so
let's move the mask to the bind_conf. We also take this opportunity
for pre-setting the mask to the frontend's upon initialization. Now
several loops have one less argument to take care of.
The SCID (source connection ID) used by a peer (client or server) is sent into the
long header of a QUIC packet in clear. But it is also sent into the transport
parameters (initial_source_connection_id). As these latter are encrypted into the
packet, one must check that these two pieces of information do not differ
due to a packet header corruption. Furthermore, as such a connection is unusable,
it must be killed and must stop processing RX/TX packets as soon as possible.
Implement qc_kill_con() to flag a connection as unusable and to kill it asap,
waking up the idle timer task to release the connection.
Add a check to quic_transport_params_store() to detect that the SCIDs do not
match and make it call qc_kill_con().
Add several tests for connections to be killed at several critical locations,
especially in the TLS stack callback to receive CRYPTO data from or derive secrets,
and before preparing packet after having received others.
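A hedged sketch of the added check (structure and field names below are
illustrative, not the real ones):
/* <hdr_scid> is the SCID taken from the long header, <tp_scid> the
 * initial_source_connection_id decoded from the transport parameters
 */
if (hdr_scid.len != tp_scid.len ||
    memcmp(hdr_scid.data, tp_scid.data, hdr_scid.len) != 0) {
        /* header corruption or misbehaving peer: kill the connection */
        qc_kill_con(qc);
        return 0;
}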
Must be backported to 2.6 and 2.7.
It is a bad idea to make the TLS ClientHello callback call qc_conn_finalize().
If this latter fails, this would generate a TLS alert and make the connection
send packets whereas it is not functional. But qc_conn_finalize()'s job was to
install the transport parameters sent by the QUIC listener. This installation
cannot be done at any time. This must be done after having possibly negotiated
the QUIC version and before sending the first Handshake packets. It seems
the best moment to do that is when the Handshake TX secrets are derived. This
has been found by inspecting the ngtcp2 code. Calling SSL_set_quic_transport_params()
too late would make the ServerHello be sent without the transport parameters.
The code for the connection update which was done from qc_conn_finalize() has
been moved to quic_transport_params_store(). So, this update is done as soon as
possible.
Add QUIC_FL_CONN_TX_TP_RECEIVED to flag the connection as having received the
peer transport parameters. Indeed this is required when the ClientHello message
is split across several packets.
Add QUIC_FL_CONN_FINALIZED to protect the connection from calling qc_conn_finalize()
more than one time. This latter is called only when the connection has received
the transport parameters and after returning from SSL_do_handshake() which is the
function which triggers the TLS ClientHello callback call.
Remove the calls to qc_conn_finalize() from the TLS ClientHello callbacks.
Must be backported to 2.6 and 2.7.
This bug was revealed by some C1 interop tests (heavy handshake packet
corruption) when receiving 1-RTT packets with a key phase update.
This led the packet to be decrypted with the next key phase secrets.
But this latter is initialized only after the handshake is complete.
In fact, 1-RTT must never be processed before the handshake is complete.
Relying on the "qc->mux_state == QC_MUX_NULL" condition to check the
handshake is complete is wrong during 0-RTT sessions when the mux
is initialized before the handshake is complete.
Must be backported to 2.7 and 2.6.
This is not really a bug fix but an improvement. When the Handshake packet number
space has been detected as needed to be probed, we should also try to probe the
Initial packet number space if there are still packets in flight. Furthermore
we should also try to send up to two datagrams.
Must be backported to 2.6 and 2.7.
This function is called only when probing only one packet number space
(Handshake) or two times the same one (Application). So, there is no risk
to prepare the same frame twice unnecessarily because we wanted to
probe two packet number spaces. The condition "ignore the packets which
have been coalesced to another one" is not necessary. More importantly
the bug is when we want to prepare an Application packet which has
been coalesced to a Handshake packet. This is always the case when
the first Application packet is sent. It is always coalesced to
a Handshake packet with an ACK frame. So, when lost, this first
application packet was never resent. It contains the HANDSHAKE_DONE
frame to confirm the completion of the handshake to the client.
Must be backported to 2.6 and 2.7.
During the handshake and when the handshake has not been confirmed
the acknowledgement delays reported by the peer may be larger
than max_ack_delay. max_ack_delay SHOULD be ignored before the
handshake is completed when computing the PTO. But the current code considered
the wrong condition "before the handshake is completed".
Replace the enum value QUIC_HS_ST_COMPLETED by QUIC_HS_ST_CONFIRMED to
fix this issue. In quic_loss.c, the parameter passed to quic_pto_pktns()
is renamed to avoid any possible confusion.
Must be backported to 2.7 and 2.6.
This may happen during retransmission of frames which can be split
(CRYPTO, or STREAM frames). One may have to split a frame to be
retransmitted due to the QUIC protocol properties (packet size limitation
and packet field encoding sizes). The remaining part of a frame which
cannot be retransmitted must be detached from the original frame it is
copied from. If not, when the part which was really sent is acknowledged,
the remaining part will be acknowledged too although it was never sent!
Must be backported to 2.7 and 2.6.
Add cum_sess_ver[] new array of counters to count the number of cumulated
HTTP sessions by version (h1, h2 or h3).
Implement proxy_inc_fe_cum_sess_ver_ctr() to increment these counters.
This function is called each time an HTTP mux is correctly initialized. The QUIC
layer must first verify that the mux application operations are for h3 before
calling proxy_inc_fe_cum_sess_ver_ctr().
The ST_F_SESS_OTHER stat field for the cumulated number of sessions other than
HTTP sessions is deduced from the ->cum_sess_ver counter (for all the sessions,
not only HTTP sessions) from which the HTTP sessions counters are subtracted.
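As an illustration only (the real prototype and counter layout may differ):
/* <http_ver> is 1, 2 or 3; called once the HTTP mux is initialized */
void proxy_inc_fe_cum_sess_ver_ctr(struct proxy *fe, unsigned int http_ver)
{
        if (http_ver == 0 || http_ver > 3)
                return;                 /* not an HTTP session */
        _HA_ATOMIC_INC(&fe->fe_counters.cum_sess_ver[http_ver - 1]);
}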
Add cum_req[] new array of counters to count the number of cumulated HTTP
requests by version and other than HTTP requests. This new member replaces ->cum_req.
Modify proxy_inc_fe_req_ctr() which increments these counters to pass an HTTP
version, the special value 0 meaning "other than an HTTP request". This is the case
for instance for syslog.c from which proxy_inc_fe_req_ctr() is called with 0
as version parameter.
The computation of the ST_F_REQ_TOT stat field for the cumulated number of requests is modified
to count the sum of all the cum_req[] counters.
As this patch is useful for QUIC, it must be backported to 2.7.
On high traffic benchmarks, it's visible that the CPU is dominated by
calls to memcpy(), and many of those come from htx functions. It was
measured that 63% of those coming from htx are made on 8-byte blocks
which really are not worth a call to the function since a single
read-write cycle does it fine.
This commit adds an inline htx_memcpy() function that explicitly
checks for this length and just copies the data without that call.
It's even likely that it could be detected on const sizes, though
that was not done. This is already effective in reducing the number
of calls to memcpy().
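A sketch of the idea (the actual helper may differ in details), assuming
the usual read_u64()/write_u64() helpers are available:
static inline void htx_memcpy(void *dst, const void *src, size_t len)
{
        if (len == 8) {
                /* most common case measured above: a single read-write
                 * cycle, no function call
                 */
                write_u64(dst, read_u64(src));
        }
        else
                memcpy(dst, src, len);
}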
Define a new configuration option "tune.quic.max-frame-loss". This is
used to specify the limit for which a single frame instance can be
detected as lost. If exceeded, the connection is closed.
This should be backported up to 2.7.
Add a <loss_count> new field in quic_frame structure. This field is set
to 0 and incremented each time a sent packet is declared lost. If
<loss_count> reaches a hard-coded limit, the connection is deemed to be
failing and is closed immediately with a CONNECTION_CLOSE using
INTERNAL_ERROR.
By default, limit is set to 10. This should ensure that overall memory
usage is limited if a peer behaves incorrectly.
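Conceptually, on loss detection the per-frame check looks like this (the
limit macro and close helper are hypothetical placeholders):
/* for each frame carried by a packet newly declared lost */
frm->loss_count++;
if (frm->loss_count >= QUIC_MAX_FRAME_LOSS) {
        /* the peer keeps losing this frame: give up and close the
         * connection immediately with INTERNAL_ERROR
         */
        close_with_internal_error(qc);  /* hypothetical helper */
        return;
}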
This should be backported up to 2.7.
Define a new function qc_frm_free() to handle frame deallocation. New
BUG_ON() statements ensure that the deallocated frame is not referenced
by another frame. To support this, all LIST_DELETE() calls have been replaced by
LIST_DEL_INIT(). This should enforce that frame deallocation is robust.
As a complement, qc_frm_unref() has been moved into quic_frame module.
It is justified as this is a utility function related to frame
deallocation. It allows to use it in quic_pktns_tx_pkts_release() before
calling qc_frm_free().
This should be backported up to 2.7.
Define two utility functions for quic_frame allocation :
* qc_frm_alloc() is used to allocate a new frame
* qc_frm_dup() is used to allocate a new frame by duplicating an
existing one
These functions are useful to centralize quic_frame initialization.
Note that pool_zalloc() is replaced by a proper pool_alloc() + explicit
initialization code.
This commit will simplify implementation of the per frame retransmission
limitation. Indeed, a new counter will be added in quic_frame structure
which must be initialized to 0.
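A simplified sketch of the allocation helper (member names are
illustrative; qc_frm_dup() additionally copies the original's content and
records it as the origin):
struct quic_frame *qc_frm_alloc(int type)
{
        struct quic_frame *frm;

        frm = pool_alloc(pool_head_quic_frame);
        if (!frm)
                return NULL;

        frm->type = type;
        frm->origin = NULL;          /* no parent frame */
        frm->loss_count = 0;         /* explicit now that pool_zalloc() is gone */
        LIST_INIT(&frm->list);
        LIST_INIT(&frm->reflist);    /* frames duplicated from this one */
        return frm;
}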
This should be backported up to 2.7.
Care must be taken when reading/writing offset for STREAM frames. A
special OFF bit is set in the frame type to indicate that the field is
present. If not set, it is assumed that offset is 0.
To represent this, offset field of quic_stream structure must always be
initialized with a valid value consistent with its frame type OFF bit.
The previous code has no bug in part because pool_zalloc() is used to
allocate quic_frame instances. To be able to use pool_alloc(), offset is
always explicitly set to 0. If a non-null value is used, OFF bit is set
at the same occasion. A new BUG_ON() statement is added on frame builder
to ensure that the caller has set OFF bit if offset is non null.
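Illustrative sketch (member and macro names approximate the real ones):
frm->stream.offset = offset;
if (offset)
        frm->type |= QUIC_STREAM_FRAME_TYPE_OFF_BIT;

/* and in the frame builder: a non-null offset implies the OFF bit */
BUG_ON(frm->stream.offset && !(frm->type & QUIC_STREAM_FRAME_TYPE_OFF_BIT));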
This should be backported up to 2.7.
A dedicated <fin> field was used in quic_stream structure. However, this
info is already encoded in the frame type field as specified by QUIC
protocol.
In fact, only code for packet reception used the <fin> field. On the
sending side, we only checked for the FIN bit. To align both sides,
remove the <fin> field and only use the FIN bit.
This should be backported up to 2.7.
In an attempt to fix GH #1873, ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats") explicitly reduced output buffer size to leave
enough space for htx overhead under http context.
Github user debtsandbooze, who first reported the issue, came back to us
and said he was still able to make the http dump "hang" with the new fix.
After some tests, it became clear that htx_add_data_atonce() could fail from
time to time in stats_putchk(), even if htx was completely empty:
In http context, buffer size is maxed out at channel_htx_recv_limit().
Unfortunately, channel_htx_recv_limit() is not what we're looking for here
because this limit doesn't compute the proper htx overhead.
Using buf_room_for_htx_data() instead of channel_htx_recv_limit() to compute
max "usable" data space seems to be the last piece of work required for
the previous fix to work properly.
This should be backported everywhere the aforementioned commit is.
idle and harmless bits in the tgroup_ctx structure were not explicitly
set during boot.
| struct tgroup_ctx ha_tgroup_ctx[MAX_TGROUPS] = { };
As the structure is first statically initialized,
.threads_harmless and .threads_idle are automatically zero-
initialized by the compiler.
Unfortunately, this means that such threads are not considered idle
nor harmless by the thread_isolate(_full)() functions until they enter
the polling loop (thread_harmless_now() and thread_idle_now() are
respectively called before entering the polling loop).
Because of this, any attempt to call thread_isolate() or thread_isolate_full()
during a startup phase with nbthreads >= 2 will cause thread_isolate to
loop until every secondary thread makes it through its first polling loop.
If the startup phase is aborted during boot (ie: "-c" option to check the
configuration), secondary threads may be initialized but will never be started
(ie: they won't enter the polling loop), thus thread_isolate()
would loop forever in such cases.
We can easily reveal the bug with this patch reproducer:
| diff --git a/src/haproxy.c b/src/haproxy.c
| index e91691658..0b733f6ee 100644
| --- a/src/haproxy.c
| +++ b/src/haproxy.c
| @@ -2317,6 +2317,10 @@ static void init(int argc, char **argv)
| if (pr || px) {
| /* At least one peer or one listener has been found */
| qfprintf(stdout, "Configuration file is valid\n");
| + printf("haproxy will loop...\n");
| + thread_isolate();
| + printf("we will never reach this\n");
| + thread_release();
| deinit_and_exit(0);
| }
| qfprintf(stdout, "Configuration file has no error but will not start (no listener) => exit(2).\n");
Now we start haproxy with a valid config:
$> haproxy -c -f valid.conf
Configuration file is valid
haproxy will loop...
^C
------------------------------------------------------------------------------
This did not cause any issue so far because no early deinit paths require
full thread isolation. But this may change when new features or requirements
are introduced, so we should fix this before it becomes a real issue.
To fix this, we explicitly assign .threads_harmless and .threads_idle
to .threads_enabled value in thread_map_to_groups() function during boot.
This is the proper place to do this since as long as .threads_enabled is not
explicitly set, its default value is also 0 (zero-initialized by the compiler).
Code snippet from the thread_isolate() function:
ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);
ulong th = _HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp].threads_harmless);
if ((th & te) == te)
break;
Thus thread_isolate(_full()) won't be looping forever in thread_isolate()
even if it were to be used before thread_map_to_groups() is executed.
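The fix amounts to something like this in thread_map_to_groups() (sketch):
/* consider not-yet-started threads as both harmless and idle so that
 * thread_isolate() can proceed before they reach the polling loop
 */
for (g = 0; g < global.nbtgroups; g++) {
        ha_tgroup_ctx[g].threads_harmless = ha_tgroup_info[g].threads_enabled;
        ha_tgroup_ctx[g].threads_idle     = ha_tgroup_info[g].threads_enabled;
}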
No backport needed unless this is a requirement.
This commit is identical to the preceding patch. However, these traces
are from another patch with a different backport scope :
56a86ddfb9
MINOR: h3: add missing traces on closure
This must be backported up to 2.7 where above patch is scheduled.
First H3 traces argument must be a connection instance or a NULL. Some
new traces were added recently with a qcc instance which caused a crash
when traces are activated.
This trace was added by the following patch :
87f8766d3f
BUG/MEDIUM: h3: handle STOP_SENDING on control stream
This must be backported up to 2.6 along with the above patch.
When using WolfSSL, there are some cases where the SSL_CTX_sess_new_cb is
called with an existing session ID. These cases are not met with
OpenSSL.
When the ID is found in the session tree during the insertion, the
shared_block len is not set to 0 and is not used. However if later the
block is reused, since the len is not set to 0, the release callback
will be called and an ebmb_delete will be attempted on the block, even if it's
not in the tree, provoking a crash.
The code was buggy from the beginning, but the case never happens with
openssl which changes the ID.
Must be backported to all maintained branches.
Add traces for function h3_shutdown() / h3_send_goaway(). This should
help to debug problems related to connection closure.
This should be backported up to 2.7.
This commit is similar to the previous one. It reports an error if a
RESET_STREAM is received for the remote control stream. This will
generate a CONNECTION_CLOSE with H3_CLOSED_CRITICAL_STREAM error.
Note that contrary to the previous bug related to STOP_SENDING, this bug
was not encountered in real environment. As such, it is labelled as
MINOR. However, it could trigger the same crash as the previous patch.
This should be backported up to 2.6.
Before this patch, STOP_SENDING reception was considered valid even on
H3 control stream. This causes the emission in return of RESET_STREAM
and eventually the closure and freeing of the QCS instance. This then
causes a crash during connection closure as a GOAWAY frame is emitted on
the control stream which is now released.
To fix this crash, STOP_SENDING on the control stream is now properly
rejected as specified by RFC 9114. The new app_ops close callback is
used which in turn will generate a CONNECTION_CLOSE with error
H3_CLOSED_CRITICAL_STREAM.
This bug was detected in github issue #2006. Note that however it is
triggered by an incorrect client behavior. It may be useful to determine
which client behaves like this. If this case is too frequent,
STOP_SENDING should probably be silently ignored.
To reproduce this issue, quiche was patched to emit a STOP_SENDING on
its send() function in quiche/src/lib.rs:
pub fn send(&mut self, out: &mut [u8]) -> Result<(usize, SendInfo)> {
- self.send_on_path(out, None, None)
+ let ret = self.send_on_path(out, None, None);
+ self.streams.mark_stopped(3, true, 0);
+ ret
}
This must be backported up to 2.6 along with the preceding commit:
MINOR: mux-quic/h3: define close callback
Define a new qcc_app_ops callback named close(). This will be used to
notify app-layer about the closure of a stream by the remote peer. Its
main usage is to ensure that the closure is allowed by the application
protocol specification.
For the moment, close is not implemented by H3 layer. However, this
function will be mandatory to properly reject a STOP_SENDING on the
control stream and preventing a later crash. As such, this commit must
be backported with the next one on 2.6.
This is related to github issue #2006.
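As an illustration, the new hook in the app_ops structure could look like
this (prototype simplified, other callbacks omitted):
struct qcc_app_ops {
        /* ... existing callbacks (init, decode, snd_buf, release, ...) ... */

        /* called when the remote peer closes <qcs>; the app layer may
         * reject the closure (H3 rejects it on its control streams with
         * H3_CLOSED_CRITICAL_STREAM)
         */
        int (*close)(struct qcs *qcs, int closing_side);
};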
h3_resp_trailers_send() may be called due to an HTX EOT block present
without a preceding HTX TRAILER block. In this case, no HEADERS frame
will be generated by H3 layer and MUX will emit an empty STREAM frame
with FIN set.
However, before skipping these, some operations are conducted on qcs
buffer to realign it and try to encode the QPACK field section line in a
buffer copy. These operations are thus unneeded if no trailer is
generated. Even worse, the function will fail if there is not enough
space in the buffer for the superfluous QPACK section line.
To improve this situation, this patch adds an early goto statement to
skip most operations in h3_resp_trailers_send() if no HTX trailer block
is found.
This patch is related to github issue #2006.
This should be backported up to 2.7.
Replace ABORT_NOW() by proper error management in
h3_resp_trailers_send() for QPACK encoding operation.
If a QPACK encoding operation fails, it means there is not enough space
in qcs buffer. In this case, flag qcs instance with QC_SF_BLK_MROOM and
return an error. MUX is responsible to remove this flag once buffer
space is available.
This should fix the crash reported by gabrieltz on github issue #2006.
This must be backported up to 2.7.