Rename the _srv_postparse() internal function to srv_init() and group
srv_init_per_thr() plus the idle conns list init inside it. This way we
can perform some simplifications, as srv_init() performs multiple server
init steps after parsing.
A new SRV_F_CHECKED flag was added; it is automatically set when srv_init()
runs successfully. If the flag is already set and srv_init() is called
again, nothing is done. This permits manually calling srv_init() earlier
than the default POST_CHECK hook when needed, without risking doing things
twice.
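A minimal sketch of the guard, assuming haproxy's usual flag conventions
(the init steps themselves are elided):

    static int srv_init(struct server *srv)
    {
            int err_code = ERR_NONE;

            /* only run the init sequence once per server */
            if (srv->flags & SRV_F_CHECKED)
                    return err_code;

            /* ... srv_init_per_thr(), idle conns list init, etc ... */

            srv->flags |= SRV_F_CHECKED; /* set on success only */
            return err_code;
    }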
While new_server() takes the parent proxy as argument and even assigns
srv->proxy to the parent proxy, it didn't actually insert the server
into the parent proxy's server list on success.
The result is that sometimes we add the server to the list after
new_server() is called, and sometimes we don't.
This is really error-prone, and because of that, hooks such as
REGISTER_POST_SERVER_CHECK(), which is run for all servers listed in
all proxies, may not be relied upon for servers which are not actually
inserted in their parent proxy's server list. Plus, it feels very strange
to have a server that points to a proxy while the proxy doesn't know
about it because it cannot find it in its server list.
To prevent errors and make the proxy->srv list reliable, we move the
insertion logic directly under new_server(). This requires knowing whether
we are called during parsing or at runtime, to either insert or append the
server to the parent proxy's list. For that we use the PR_FL_CHECKED flag
from the parent proxy (if the flag is set, then the proxy was already
checked, so we are past the init phase and we assume we are called at
runtime).
This implies that during startup, if new_server() has to be cancelled on
error paths, we need to call srv_detach() (which is now exposed in
server.h) before srv_drop().
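A hedged sketch of the resulting insertion logic inside new_server(),
using the usual <px->srv>/<srv->next> singly-linked list (the exact
ordering choice is an assumption):

    if (px->flags & PR_FL_CHECKED) {
            /* runtime (e.g. dynamic "add server"): append at the
             * tail so that iterators already walking the list are
             * not disturbed; srv->next is NULL from allocation
             */
            struct server **end = &px->srv;

            while (*end)
                    end = &(*end)->next;
            *end = srv;
    }
    else {
            /* still parsing: a simple head insertion is enough */
            srv->next = px->srv;
            px->srv = srv;
    }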
The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should
now run reliably on all servers created using new_server() (without having
to manually loop on the global servers_list).
REGISTER_POST_PROXY_CHECK() used to iterate over "main" proxies to run
registered callbacks. This means hidden proxies (and their servers) did
not get a chance to be post-checked, which could cause issues if some
post-checks are expected to be executed on all proxies no matter their
type. Instead, we now rely on the global proxies list. Another side effect
is that REGISTER_POST_SERVER_CHECK() now runs as well for servers from
proxies that are not part of the main proxies list.
postcheck_log_backend() checks are executed regardless of whether the
proxy actually has the backend capability, even though the checks depend
on it.
Let's fix that by adding an extra condition to ensure that the BE
capability is set.
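The added guard may look like the following sketch (the return value on
the skipped path is an assumption):

    static int postcheck_log_backend(struct proxy *px)
    {
            /* skip proxies lacking the backend capability: the
             * checks below only make sense for backends
             */
            if (!(px->cap & PR_CAP_BE))
                    return ERR_NONE;

            /* ... existing backend-dependent checks ... */
    }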
This issue is not tagged as a bug because, for now, it remains impossible
to have a syslog proxy without the BE capability in the main proxy list,
but this may change in the future.
We have the global proxies_list pointer, which is announced as the list of
"all existing proxies", but in fact it only represents regular proxies
declared in the config file through the "listen", "frontend" or "backend"
keywords. This is ambiguous, and we currently don't have a straightforward
method to iterate over all proxies (either public or internal ones) within
haproxy. Instead, we still have to manually iterate over multiple lists
(main proxies, log-forward proxies, peer proxies...), which is error-prone.
In this patch we add a struct list member (8 bytes) inside struct proxy
in order to store every proxy (except default ones) within a global
"proxies" list which is actually representative of all proxies existing
under the haproxy process, like we already have for servers.
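Hypothetical usage, assuming the new member is named <global_list> and the
list head <proxies>:

    struct proxy *px;

    /* visits every proxy: main, log-forward, peers, internal ones... */
    list_for_each_entry(px, &proxies, global_list) {
            /* ... */
    }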
proxy_cond_disable() collects and prints cumulated connections for be and
fe proxies regardless of their type. With shared stats, this may cause
issues because, depending on the proxy's capabilities, only fe or be
counters may be allocated.
In this patch we add some checks to ensure we only try to read from
valid memory locations, else we rely on default values (0).
The init_srv_requeue() and init_srv_slowstart() functions are called after
initial server parsing via the REGISTER_POST_SERVER_CHECK() hook, and they
are also manually called for dynamic servers after the server is
initialized.
This may conflict with _srv_postparse(), which is also registered via
REGISTER_POST_SERVER_CHECK() and called during dynamic server creation.
To ensure the functions don't conflict with each other, let's ensure they
are executed in the proper order by calling init_srv_requeue() and
init_srv_slowstart() from _srv_postparse(), which now becomes the parent
function for server-related post-parsing steps. No change of behavior is
expected.
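The resulting ordering can be sketched as follows (signatures are assumed
from the description above):

    static int _srv_postparse(struct server *srv)
    {
            int err_code = ERR_NONE;

            /* generic server post-parsing first ... */

            /* ... then the dependent initializations, in a fixed order */
            err_code |= init_srv_requeue(srv);
            err_code |= init_srv_slowstart(srv);

            return err_code;
    }
    REGISTER_POST_SERVER_CHECK(_srv_postparse);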
In resolve_sym_name() we declare a few symbols that we want to be able
to resolve. ha_dump_backtrace() was declared with a struct buffer instead
of a pointer to such a struct, which has no effect since we only want to
get the function's pointer, but produces a build warning with LTO, so
let's fix it.
This can be backported to 3.0.
It is not expected/supported to reuse an httpclient instance to process
several requests. A new instance must be created for each request. However,
in lua, there is nothing to prevent a user from creating an httpclient
object and using it in a loop to process requests.
That's unfortunate because this will apparently work: the requests will be
sent and a response will be received and processed. However, internally,
some resources will be allocated and never released. When the next response
is processed, the resources allocated for the previous one are definitively
lost.
In this patch we take care to check that the httpclient object was never
used when a request is sent from a lua script, by checking the
HTTPCLIENT_FS_STARTED flag. This flag is set when an httpclient applet is
spawned to process a request and is never removed after that. In lua, the
httpclient applet is created when the request is sent. So, it is the right
place to do this test.
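The check might be as simple as this sketch (the context variable names
are assumptions):

    /* reject reuse of an httpclient object from lua: one instance
     * must be created per request
     */
    if (hlua_hc->hc->flags & HTTPCLIENT_FS_STARTED)
            WILL_LJMP(luaL_error(L, "httpclient: cannot reuse an "
                                    "instance, create a new one"));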
This patch should fix the issue #2986. It should be backported as far as
2.6.
An optional timeout was added to AppletTCP.receive() to interrupt calls
after a delay. It was mandatory to be able to implement interactive applets
(like trisdemo). However, this broke the API and made it impossible to
differentiate shutdowns from delay expirations. Indeed, in both cases, an
empty string was returned.
Because historically an empty string was used to notify a connection
shutdown, this should not be changed. So now, a 'nil' value is returned
when no data was available before the delay expired.
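A lua script distinguishing the two cases might look like this hedged
sketch (the exact receive() argument form is assumed from the description
above):

    local data = applet:receive(16, 1000) -- up to 16 bytes, 1s delay
    if data == nil then
        -- delay expired with no data: the connection is still alive
    elseif data == "" then
        -- historical meaning preserved: the peer shut the connection
    else
        -- regular payload
    end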
The new AppletTCP:try_receive() function was also affected. To fix it,
instead of stating there is no delay when a receive is attempted, an
expired delay is set. Concretely, TICK_ETERNITY was replaced by now_ms.
Finally, the AppletTCP:getline() function is not affected for now because
there is no way to interrupt it after some delay.
The documentation and the trisdemo lua script were updated accordingly.
This patch depends on "BUG/MEDIUM: hlua: Properly detect shutdowns for TCP
applets based on the new API". However, it is a 3.2-specific issue, so no
backport is needed.
The commit e5e36ce09 ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work
with applet's buffers") fixed the TCP applets API to work with applets
using their own buffers. However, the getline() function was not updated.
It could be an issue for anyone registering a CLI command that reads lines.
This patch should be backported as far as 3.0.
The internal function responsible for receiving data for TCP applets with
internal buffers is buggy. Indeed, for these applets, the buffer API is
used to get data, so there are no tests on the SE to properly detect
connection shutdowns. So, it must be performed by hand after the call to
b_getblk_nc().
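A hedged sketch of the kind of test that is needed (the exact flag to
check is an assumption):

    size_t ret, len1, len2;
    const char *blk1, *blk2;

    ret = b_getblk_nc(buf, &blk1, &len1, &blk2, &len2, 0, b_data(buf));
    if (!ret && se_fl_test(appctx->sedesc, SE_FL_EOS)) {
            /* nothing to read and the endpoint reported an end of
             * stream: report the shutdown to the caller by hand
             */
            return 0;
    }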
This patch must be backported as far as 3.0.
The commit 03dc54d802 ("BUG/MINOR: ring: Fix I/O handler of "show event"
command to not rely on the SC") introduced a regression. By removing
dependencies on the SC, a test to detect client shutdowns was removed. So
now, the CLI applet is no longer released when the client shuts the
connection during a "show event -w".
So of course, we should not use the SC to detect the shutdowns. But the SE
must be used instead.
It is a 3.2-specific issue, so no backport needed.
It is now possible to set a label on a "bind" line. All sockets attached to
this bind line inherit this label. The idea is to be able to group sockets.
For now, there is no mechanism to create these groups; this must be done by
hand.
Since commit 2c3d656f8 ("MEDIUM: h3: use absolute URI form with
:authority"), the absolute URI form is used when a ':authority'
pseudo-header is found. However, this URI was not declared as normalized
internally. So, when the request is reformatted to be sent to an h1 server,
the absolute-form is used instead of the origin-form. This is unexpected
and may be an issue for some servers that could reject the request.
So, now, we take care to set the HTX_SL_F_HAS_AUTHORITY flag on the HTX
message when an authority was found, and the HTX_SL_F_NORMALIZED_URI flag
is set for "http" or "https" schemes.
No backport needed because the commit above must not be backported. It
should fix a regression reported on the 3.2-dev17 in issue #2977.
This commit depends on "BUG/MINOR: h3: Set HTX flags corresponding to the
scheme found in the request".
When a ":scheme" pseudo-header is found in a h3 request, the
HTX_SL_F_HAS_SCHM flag must be set on the HTX message. And if the scheme is
'http' or 'https', the corresponding HTX flag must also be set. So,
respectively, HTX_SL_F_SCHM_HTTP or HTX_SL_F_SCHM_HTTPS.
It is mainly used to send the right ":scheme" pseudo-header value to H2
server on backend side.
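The flag-setting logic boils down to something like this sketch (the
<scheme> ist and <sl> start line come from the surrounding parsing code
and are assumptions here):

    sl->flags |= HTX_SL_F_HAS_SCHM;
    if (isteqi(scheme, ist("http")))
            sl->flags |= HTX_SL_F_SCHM_HTTP;
    else if (isteqi(scheme, ist("https")))
            sl->flags |= HTX_SL_F_SCHM_HTTPS;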
This patch could be backported as far as 2.6.
When "vary" is enabled, we can have multiple entries for a given primary
key in the cache tree. There is a limit to how many secondary entries
can be inserted for a given key. When we try to insert a new secondary
entry, if the limit is already reached, we can try to find expired
entries with the same primary key, and if the limit is still reached we
want to abort the current insertion and to remove the node that was just
inserted.
In commit "a29b073: MEDIUM: cache: Add refcount on cache_entry" though,
a regression was introduced. Instead of removing the entry just inserted
as the comments suggested, we removed the second to last entry and
returned NULL. We then reset the eb.key of the cache_entry in the caller
because we assumed that the entry was already removed from the tree.
This means that some entries with an empty key were wrongly kept in the
tree and the last secondary entry, which keeps the number of secondary
entries of a given key was removed.
This ended up causing some crashes later on when we tried to iterate
over the elements of this given key. The crash could occur in multiple
places, either when trying to retrieve an entry or to add some new ones.
This crash was raised in GitHub issue #2950.
The fix should be backported up to 3.0.
A valid build warning was reported in the CI with the latest commit
b40ce97ecc ("BUG/MEDIUM: server: fix crash after duplicate GUID
insertion"). Indeed, if the first test in the function fails, we branch to
the err label with guid==NULL and will crash there. Let's just test guid
before dereferencing it for freeing.
This needs to be backported to 3.0 as well since the commit above was
meant to go there.
On "add server", if a GUID is defined, guid_insert() is used to add the
entry into the global GUID tree. If a similar entry already exists, GUID
insertion fails and the server creation is eventually aborted.
A crash could occur in this case because of an invalid memory access via
guid_remove(). The latter is caused via free_server() as the server
insertion is rejected. The invalid access occurs on the GUID key.
The issue occurs because of guid_insert(). The function properly
deallocates the GUID key on duplicate insertion, but it failed to reset
<guid.node.key> to NULL. This caused the invalid memory access in
guid_remove(). To fix this, ensure that the key member is properly reset
on the guid_insert() error path.
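The fix boils down to the following pattern on the error path (a sketch;
ha_free() both releases the key and resets the pointer):

    /* duplicate entry: release the key and reset the pointer so
     * that a later guid_remove() (via free_server()) does not
     * access it again
     */
    ha_free(&guid->node.key);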
This must be backported up to 3.0.
Implement stress mode on "add server help". This ensures that the
command is fully reentrant on a full output buffer.
For testing, it requires compilation with USE_STRESS and global setting
"stress-level 1".
Implement "help" as a sub-command for "add server" CLI. The objective is
to list all the keywords that are supported for dynamic servers. CLI IO
handler and add_srv_ctx are used to support reentrancy on full output
buffer.
Now that this command is implemented, the outdated keyword list on "add
server" from management documentation can be removed.
Extend "add server" to support an IO handler function named
cli_io_handler_add_server(). A context object is also defined whose
usage will depend on IO handler capabilities.
IO handler is skipped when "add server" is run in default mode, i.e. on
a dynamic server creation. Thus, currently IO handler is unneeded.
However, it will become useful to support sub-commands for "add server".
Note that the return value of the "add server" parser has been changed on
server creation success. Previously, it was used incorrectly to report
whether the server was inserted or not. In fact, the parser return value
is used by the CLI generic code to detect whether command processing has
been completed, or should continue to the IO handler. Now, "add server"
always returns 1 to signal that CLI processing is completed. This is
necessary to preserve the CLI output emitted by the parser, even now that
an IO handler is defined for the command. Previously, output was emitted
in all situations because no IO handler was defined. See the code snippet
from cli.c below for a better overview:
    if (kw->parse && kw->parse(args, payload, appctx, kw->private) != 0) {
            ret = 1;
            goto fail;
    }

    /* kw->parse could set its own io_handler or io_release handler */
    if (!appctx->cli_ctx.io_handler) {
            ret = 1;
            goto fail;
    }

    appctx->st0 = CLI_ST_CALLBACK;
    ret = 1;
    goto end;
Currently there is "no-tls-tickets", which is also supported in the
ssl-default-bind-options directive, but there's no way to re-enable
tickets on a specific "bind" line. This patch simply provides the option
to re-enable them. Note that the flag is inverted because tickets are
enabled by default, and the "no-tls-tickets" option sets the flag to
disable them.
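For example (hypothetical config, assuming the new keyword is named
"tls-tickets"):

    global
        ssl-default-bind-options no-tls-tickets

    frontend fe
        # re-enable tickets for this specific bind line only
        bind :8443 ssl crt site.pem tls-tickets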
Several users already reported that it would be nice to support
strict-sni in ssl-default-bind-options. However, in order to support
it, we also need an option to disable it.
This patch moves the setting of the option from the strict_sni field
to a flag in the ssl_options field so that it can be inherited from
the default bind options, and adds a new "no-strict-sni" directive to
allow disabling it on a specific "bind" line.
The test file "del_ssl_crt-list.vtc" which already tests both options
was updated to make use of the default option and the no- variant to
confirm everything continues to work.
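For example (a hypothetical config sketch):

    global
        ssl-default-bind-options strict-sni

    frontend fe
        # opt this bind line out of the inherited default
        bind :8443 ssl crt site.pem no-strict-sni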
In the Prometheus exporter, the last health check status is already
exposed, with its code and duration in seconds, and the server status is
also exposed. But the information about the agent check is not available.
This is not really handy because, when a server status is changed because
of the agent, it is not obvious when looking at the Prometheus metrics.
Indeed, the server may be reported as DOWN for instance, while the health
check status still reports a success. Being able to get the agent status
in that case could be valuable.
So now, the last agent check status is exposed, with its code and duration
in seconds. The following metrics can now be grabbed:
* haproxy_server_agent_status
* haproxy_server_agent_code
* haproxy_server_agent_duration_seconds
Note that unlike the other metrics, no per-backend aggregated metric is
exposed.
This patch is related to issue #2983.
It was mentioned during the development of glitches that it would be
nice to support not killing misbehaving connections below a certain
CPU usage, so that poor implementations that routinely misbehave without
impact are not killed. This is now possible by setting, via this
parameter, a CPU usage threshold under which we don't kill them. It
defaults to zero so that we continue to kill them by default.
Insert some missing include statements in QUIC source files. This was
detected with the next commit, which adjusts the include list used in
the quic_conn-t.h file.
The quic-conn layer has to handle STREAM frames itself after MUX release.
If the stream was already seen, the frame is probably only a retransmitted
one which can be safely ignored. For other streams, an active closure may
be needed.
Thus it's necessary that the quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via the <nb_streams> member array in the quic-conn structure.
Refactor this by replacing <nb_streams> with two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for the
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on them. Also, bidirectional
streams are always opened by the remote side.
Previously, <nb_streams> was set by the quic-stream layer. Now, the
<stream_max_uni>/<stream_max_bidi> members are only set once, just
prior to QUIC MUX release. This is sufficient as quic-conn does not use
them while the MUX is available.
Note that previously, IDs were used relative to their type, thus
incremented by 1 after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
Move the general function checking whether a stream is uni or bidirectional
from the QUIC MUX to the quic_utils module. This should prevent unnecessary
includes of the QUIC MUX header file in other sources.
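As a reminder, the stream type is encoded in the two low bits of the ID
(RFC 9000), so such helpers are trivial; a sketch of what the quic_utils
versions may look like:

    /* bit 0: initiator (0 = client, 1 = server)
     * bit 1: direction (0 = bidi, 1 = uni)
     */
    static inline int quic_stream_is_uni(uint64_t id)
    {
            return id & 0x2;
    }

    static inline int quic_stream_is_bidi(uint64_t id)
    {
            return !(id & 0x2);
    }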
Stop emitting '\n' in errmsg for intermediate error messages; this was
emitting multiline logs and was returning to a new line in the middle of
sentences.
We don't need to emit them in acme_start_task() since the errmsg is
output in a send_log, which already contains a '\n', or on the CLI, which
also emits one.
When starting the ACME task with a ckch_conf which does not contain the
domains, the ACME task would segfault because it tries to dereference a
NULL pointer in this case.
This patch fixes the issue by emitting a warning when no domains are
configured. It's not done at configuration parsing time because it is not
easy to emit the warning there: there is no callback system giving access
to the whole ckch_conf once a line is parsed.
No backport needed.
RX buffer allocation has been reworked in the current dev tree. The
objective is to support multiple buffers per QCS to improve upload
throughput.
RX buffer allocation failure is handled simply: the whole connection is
closed. This is done via qcc_set_error(), with INTERNAL_ERROR as the error
code. This function contains a BUG_ON() to ensure it is called only once
per connection instance.
On RX buffer alloc failure, the aforementioned BUG_ON() crashes due to a
double invocation of qcc_set_error(): first by qcs_get_rxbuf(), and
immediately after by qcc_recv(), which is the caller of the previous
one. This regression was introduced by the following commit.
60f64449fbba7bb6e351e8343741bb3c960a2e6d
MAJOR: mux-quic: support multiple QCS RX buffers
To fix this, simply remove the qcc_set_error() invocation from
qcs_get_rxbuf(). On buffer alloc failure, qcc_recv() is responsible for
setting the error.
This does not need to be backported.
The build failed on mips32 with a 64-bit time_t here:
https://github.com/haproxy/haproxy/actions/runs/15150389164/job/42595310111
Let's just turn the "remain" variable used to show the remaining time
into a more portable ullong and use %llu for all format specifiers,
since long remains limited to 32 bits on 32-bit archs.
No backport needed.
When building on MIPS-32 with gcc-9.5 and glibc-2.31, I got this:
src/ssl_trace.c: In function 'ssl_trace':
src/ssl_trace.c:118:42: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'ssize_t' {aka 'const int'} [-Wformat=]
118 | chunk_appendf(&trace_buf, " : size=%ld", *size);
| ~~^ ~~~~~
| | |
| | ssize_t {aka const int}
| long int
| %d
Let's just cast the type, as shown below. No backport needed.
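For illustration, the fix is a plain cast on the trace line from the
warning above:

    chunk_appendf(&trace_buf, " : size=%ld", (long)*size);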
With commit a06c215f08 ("MEDIUM: wdt: always make the faulty thread
report its own warnings"), when the TH_FL_STUCK flag was flipped on,
we'd then go to the panic code instead of giving a second chance like
before the commit. This can trigger rare cases that only happen under
moderate loads, like what was addressed by commit 24ce001771 ("BUG/MEDIUM:
wdt: fix the stuck detection for warnings"). This is in fact due to
the loss of the common "goto update_and_leave" that used to serve
both the warning code and the flag setting for probation, and it's
apparently what hit Christian in issue #2980.
Let's make sure we exit naturally when turning the bit on for the
first time. Let's also update the confusing comment at the end of
the check that was left over by the latest change.
Since the first commit was backported to 3.1, this commit should be
backported there as well.
For an unidentified reason, SSL_do_handshake() succeeds at its first call
when 0-RTT is enabled for the connection. This behavior looks very similar
to the one encountered with the AWS-LC stack; that said, it was documented
by AWS-LC. This issue leads the connection to stop sending handshake
packets after having released the handshake encryption level. In fact, no
handshake packets could even be sent, leading the handshake to always
fail.
To fix this, this patch simulates a "handshake in progress" state, waiting
for the application level read secret to be established by the TLS stack.
This may happen only after the QUIC listener has completed/confirmed the
handshake upon handshake CRYPTO data receipt from the peer.
A QUIC connection must send its transport parameters using a TLS custom
extension. This extension is reset by SSL_set_SSL_CTX(). It can be restored
by calling quic_ssl_set_tls_cbs() (which calls SSL_set_quic_tls_cbs()).
The quic_conn struct is modified for two reasons. The first one is to store
the encoded version of the local transport parameters, as is already done
for USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameters "should
remain valid until after the parameters have been sent" as mentioned in the
SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer
attached to the quic_conn object. The role of the
qc_ssl_set_quic_transport_params() function is to call
SSL_set_quic_tls_transport_params() (aliased by
SSL_set_quic_transport_params()) to set these local transport parameters
into the TLS stack from the buffer attached to the quic_conn struct.
The second quic_conn struct modification is the addition of the new
->prot_level (SSL protection level) member, added to store "the most
recent write encryption level set via the
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn callback (if it has been called)"
as mentioned in the SSL_set_quic_tls_cbs(3) manual.
This patch finally implements the five remaining callbacks to make the
haproxy QUIC implementation work.
OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send()) is
easy to implement. It calls ha_quic_add_handshake_data() after having
converted the qc->prot_level TLS protection level value to the correct
ssl_encryption_level_t (boringSSL API/quictls) value.
OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd())
provides the non-contiguous addresses to the TLS stack, without releasing
them.
OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn()
(ha_quic_ossl_crypto_release_rcd()) releases these non-contiguous buffers,
relying on the fact that the list of encryption levels (qc->qel_list) is
correctly ordered by the SSL protection level secret establishment order
(by the TLS stack).
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_yield_secret()) is
a simple wrapper over ha_quic_set_encryption_secrets() which is used by
the boringSSL/quictls API.
OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn()'s
(ha_quic_ossl_got_transport_params()) role is to store the transport
parameters received from the peer. It simply calls
quic_transport_params_store() and sets them into the TLS stack by calling
qc_ssl_set_quic_transport_params().
Also add some comments for all the OpenSSL 3.5 QUIC API callbacks.
This patch has no impact on the other uses of the QUIC API provided by the
other TLS stacks.
This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is
available and detected at compilation time. The detection relies on the
presence of the OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from
openssl-compat.h. Indeed, this macro has been defined by OpenSSL since the
3.5.0 version, and it is not defined by quictls. This helps in
distinguishing these two TLS stacks. When the detection succeeds,
HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. Then, it is this new
macro which is used to detect the availability of the new OpenSSL 3.5.0
QUIC TLS API.
Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not
requested. So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive.
At the same location, in openssl-compat.h, the ssl_encryption_level_t enum
is defined. This enum was defined by quictls and is used extensively by the
haproxy QUIC implementation. SSL_set_quic_transport_params() is replaced by
SSL_set_quic_tls_transport_params(). SSL_set_quic_early_data_enabled()
(quictls) is also replaced by SSL_set_quic_tls_early_data_enabled()
(OpenSSL). SSL_quic_read_level() (quictls) is not defined by OpenSSL. It is
only used by the traces to log the current TLS stack decryption level
(read). A macro makes it return -1, which is an unused value.
Most of the differences between the quictls and OpenSSL QUIC APIs are in
quic_ssl.c, where some callbacks must be defined for these two APIs. This
is why this patch modifies quic_ssl.c to define an array of OSSL_DISPATCH
structs, <ha_quic_dispatch>. Each element of this array defines a callback.
So, this patch implements these six callbacks:
- ha_quic_ossl_crypto_send()
- ha_quic_ossl_crypto_recv_rcd()
- ha_quic_ossl_crypto_release_rcd()
- ha_quic_ossl_yield_secret()
- ha_quic_ossl_got_transport_params() and
- ha_quic_ossl_alert().
But at this time, these implementations, which must return an int, all
return 0, which is interpreted as a failure by the OpenSSL QUIC API, except
for ha_quic_ossl_alert() which is implemented the same way as for quictls.
The five remaining functions above will be implemented by the next patches
to come.
ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been
moved so as to be defined for both the quictls and OpenSSL QUIC APIs.
These callbacks are attached to the SSL objects (sessions) by calling the
new qc_ssl_set_cbs() function. The latter attaches the correct callbacks
to the SSL objects (defined by <ha_quic_method> for quictls, and
<ha_quic_dispatch> for OpenSSL).
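For reference, the <ha_quic_dispatch> registration looks roughly like the
following sketch (based on the OpenSSL 3.5 dispatch-table convention; the
entry order is an assumption):

    static const OSSL_DISPATCH ha_quic_dispatch[] = {
            { OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND,
              (void (*)(void))ha_quic_ossl_crypto_send },
            { OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RECV_RCD,
              (void (*)(void))ha_quic_ossl_crypto_recv_rcd },
            /* ... crypto_release_rcd, yield_secret,
             * got_transport_params and alert follow the same pattern
             */
            OSSL_DISPATCH_END,
    };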
The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake()
have also been disabled. These functions are not defined by the OpenSSL
QUIC API. At this time, the functions which call them are still defined
when HAVE_OPENSSL_QUIC is defined.
There were no traces to diagnose qc_ssl_sess_init() failures from QUIC
traces. This patch adds calls to TRACE_DEVEL() in qc_ssl_sess_init() and
its caller (qc_alloc_ssl_sock_ctx()). This was useful at least to diagnose
SSL context initialization failures when porting QUIC to the new OpenSSL
3.5 QUIC API.
Should be easily backported as far as 2.6.
This code has been there since the start of the QUIC implementation. It
was supposed to initialize <ha_quic_meth> as a static BIO_METHOD object.
But this BIO_METHOD is not used at all!
Should be backported as far as 2.6 to help integrate the next patches to come.
Output a sink message when the certificate was renewed by the ACME
client.
The message is emitted on the "dpapi" sink and ends with \n\0.
Since the message contains this binary character, the right -0 parameter
must be used when consulting the sink over the CLI:
Example:
$ echo "show events dpapi -nw -0" | socat -t9999 /tmp/haproxy.sock -
<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0
When used with the master CLI, @@1 should be used instead of @1 in order
to keep the connection to the worker.
Example:
$ echo "@@1 show events dpapi -nw -0" | socat -t9999 /tmp/master.sock -
<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0
On ARM with 80 cores and a single server, it's sometimes possible to
observe a segfault in fwlc_get_next_server() at around 600-700k RPS. It
also seldom happens on x86 with 128 threads with the same config, at
around 1M RPS. It turns out that in fwlc_get_next_server(), before calling
fwlc_srv_reposition(), we have to drop the lock, and fwlc_srv_reposition()
takes it back again.
The problem is that anything can happen to our node during this time,
and it can be freed. Then, when continuing our work, we later iterate
over it and its successors to find a node with an acceptable key, and by
doing so we can visit either uninitialized memory or simply nodes that
are no longer in the tree.
A first attempt at fixing this consisted in artificially incrementing
the elements count before dropping the lock, but that turned out to be
even worse because other threads could loop forever on such an element
looking for an entry that does not exist. Maintaining a separate
refcount didn't work well either, and it required to deal with the
memory release while dropping it, which is really not convenient.
Here we're taking a different approach consisting in simply not
trusting this node anymore and going back to the beginning of the
loop, as is done at a few other places as well. This way we can
safely ignore the possibly released node, and the test runs reliably
on both the ARM and x86 platforms mentioned above. No performance
regression was observed either, likely because this operation is quite
rare.
No backport is needed since this appeared with the leastconn rework
in 3.2.
If there is an alloc failure during qc_new_conn(), cleanup is done via
quic_conn_release(). However, since the commit below, an unchecked
dereference of <qc.path> is performed in the latter.
e841164a4402118bd7b2e2dc2b5068f21de5d9d2
MINOR: quic: account for global congestion window
To fix this, simply check <qc.path> before dereferencing it in
quic_conn_release(). This is safe as it is properly initialized to NULL
during qc_new_conn()'s first stage.
This does not need to be backported.