25448 Commits

Author SHA1 Message Date
Amaury Denoyelle
fa1a168bf1 MEDIUM: session: close new idle conns if server in maintenance
Previous patch ensures that a backend connection going into idle state
is rejected and freed if its target server is in maintenance.

This patch introduces a similar change for connections attached in the
session. session_check_idle_conn() now returns an errorl if connection
target server is in maintenance, similarly to session max idle conns
limit reached. This is sufficient to instruct muxes to delete the
connection immediately.
2025-08-28 14:55:21 +02:00
Amaury Denoyelle
67df6577ff MEDIUM: server: close new idle conns if server in maintenance
Currently, when a server is set on maintenance mode, its idle connection
are scheduled for purge. However, this does not prevent currently used
connection to become idle later on, even if the server is still off.

Change this behavior : an idle connection is now rejected by the server
if it is in maintenance. This is implemented with a new condition in
srv_add_to_idle_list() which returns an error value. In this case, muxes
stream detach callback will immediately free the connection.

A similar change is also performed in each MUX and SSL I/O handlers and
in conn_notify_mux(). An idle connection is not reinserted in its idle
list if server is in maintenance, but instead it is immediately freed.
2025-08-28 14:55:18 +02:00
Amaury Denoyelle
f234b40cde MINOR: server: shard by thread sess_conns member
Server member <sess_conns> is a mt_list which contains every backend
connections attached to a session which targets this server. These
connecions are not present in idle server trees.

The main utility of this list is to be able to cleanup these connections
prior to removing a server via "del server" CLI. However, this procedure
will be adjusted by a future patch. As such, <sess_conns> member must be
moved into srv_per_thread struct. Effectively, this duplicates a list
for every threads.

This commit does not introduce functional change. Its goal is to ensure
that these connections are now ordered by their owning thread, which
will allow to implement a purge, similarly to idle connections attached
to servers.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
37fca75ef7 MEDIUM: session: protect sess conns list by idle_conns_lock
Introduce idle_conns_lock usage to protect manipulation to <priv_conns>
session member. This represents a list of intermediary elements used to
store backend connections attached to a session to prevent their sharing
across multiple clients.

Currently, this patch is unneeded as sessions are only manipulated on a
single-thread. Indeed, contrary to idle connections stored in servers,
takeover is not implemented for connections attached to a session.
However, a future patch will introduce purging of these connections,
which is already performed for connections attached to servers. As this
can be executed by any thread, it is necessary to introduce
idle_conns_lock usage to protect their manipulation.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
f3e8e863c9 MINOR: session: refactor alloc/lookup of sess_conns elements
By default backend connections are stored into idle/avail server trees.
However, if such connections cannot be shared between multiple clients,
session serves as the alternative storage.

To be able to quickly reuse a backend conn from a session, they are
indexed by their target, which is either a server or a backend proxy.
This is the purpose of 'struct sess_priv_conns' intermediary stockage
element.

Lookup and allocation of these elements are performed in several session
function, for example to add, get or remove a backend connection from a
session. The purpose of this patch is to simplify this by providing two
internal functions sess_alloc_sess_conns() and sess_get_sess_conns().

Along with this, a new BUG_ON() is added into session_unown_conn(),
which ensure that sess_priv_conns element is found when the connection
is removed from the session.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d4f7a2dbcc MINOR: session: uninline functions related to BE conns management
Move from header to source file functions related to session management
of backend connections. These functions are big enough to remove inline
attribute.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d0df41fd22 MINOR: session: document explicitely that session_add_conn() is safe
A set of recent patches have simplified management of backend connection
attached to sessions. The API is now stricter to prevent any misuse.

One of this change is the addition of a BUG_ON() in session_add_conn(),
which ensures that a connection is not attached to a session if its
<owner> field points to another entry.

On older haproxy releases, this assertion could not be enforced due to
NTLM as a connection is turned as private during its transfer. When
using a true multiplexed protocol on the backend side, the connection
could be assigned in turn to several sessions. However, NTLM is now only
applied for HTTP/1.1 as it does not make sense if the connection is
already shared.

To better clarify this situation, extend the comment on BUG_ON() inside
session_add_conn().
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
b3ce464435 BUG/MINOR: mux-quic: do not access conn after idle list insert
Once a connection is inserted into the server idle/safe tree during
stream detach, it is not accessed anymore by the muxes without
idle_conns_lock protection. This is because the connection could have
been already stolen by a takeover operation.

Adjust QUIC MUX detach implementation to follow the same pattern. Note
that, no bug can occur due to takeover as QUIC does not implement it.
However, prior to this patch, there may still exist race-conditions with
idle connection purging.

No backport needed.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
0be225f341 BUG/MINOR: server: decrement session idle_conns on del server
When a server is deleted, each of its idle connections are removed. This
is also performed for every private connections stored on sessions which
referenced the target server.

As mentionned above, these private connections are idle, guaranteed by
srv_check_for_deletion(). A BUG_ON() on CO_FL_SESS_IDLE is already
present to guarantee this. Thus, these connections are accounted on the
session to enforce max-session-srv-conns limit.

However, this counter is not decremented during private conns cleanup on
"del server" handler. This patch fixes this by adding a decrement for
every private connections removed via "del server".

This should be backported up to 3.0.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
bce29bc7a4 MINOR: cli: display failure reason on wait command
wait CLI command can be used to wait until either a defined timeout or a
specific condition is reached. So far, srv-removable is the only event
supported. This is tested via srv_check_for_deletion().

This is implemented via srv_check_for_deletion(), which is
able to report a message describing the reason if the condition is
unmet.

Previously, wait return a generic string, to specify if the condition is
met, the timer has expired or an immediate error is encountered. In case
of srv-removable, it did not report the real reason why a server could
not be removed.

This patch improves wait command with srv-removable. It now displays the
last message returned by srv_check_for_deletion(), either on immediate
error or on timeout. This is implemented by using dynamic string output
with cli_dynmsg/dynerr() functions.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
04f05f1880 BUG/MINOR: connection: remove extra session_unown_conn() on reverse
When a connection is reversed via rhttp protocol on the edge endpoint,
it migrates from frontend to backend side. This operation is performed
by conn_reverse(). During this transition, the conn owning session is
freed as it becomes unneeded.

Prior to this patch, session_unown_conn() was also called during
frontend to backend migration. However, this is unnecessary as this
function is only used for backend connection reuse. As such, this patch
removes this unnecessary call.

This does not cause any harm to the process as session_unown_conn() can
handle a connection not inserted yet. However, for clarity purpose it's
better to backport this patch up to 3.0.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
a96f1286a7 BUG/MINOR: connection: rearrange union list members
A connection can be stored in several lists, thus there is several
attach points in struct connection. Depending on its proxy side, either
frontend or backend, a single connection will only access some of them
during its lifetime.

As an optimization, these attach points are organized in a union.
However, this repartition was not correctly achieved along
frontend/backend side delimitation.

Furthermore, reverse HTTP has recently been introduced. With this
feature, a connection can migrate from frontend to backend side or vice
versa. As such, it becomes even more tedious to ensure that these
members are always accessed in a safe way.

This commit rearrange these fields. First, union is now clearly splitted
between frontend and backend only elements. Next, backend elements are
initialized with conn_backend_init(), which is already used during
connection reversal on an edge endpoint. A new function
conn_frontend_init() serves to initialize the other members, called both
on connection first instantiation and on reversal on a dialer endpoint.

This model is much cleaner and should prevent any access to fields from
the wrong side.

Currently, there is no known case of wrong access in the existing code
base. However, this cleanup is considered an improvement which must be
backported up to 3.0 to remove any possible undefined behavior.
2025-08-28 14:52:29 +02:00
William Lallemand
d7f6819161 BUG/MEDIUM: mworker: fix startup and reload on macOS
Since the mworker rework in haproxy 3.1, the worker need to tell the
master that it is ready. This is done using the sockpair protocol by
sending a _send_status message to the master.

It seems that the sockpair protocol is buggy on macOS because of a known
issue around fd transfer documented in sendmsg(2):

https://man.freebsd.org/cgi/man.cgi?sendmsg(2) BUGS section

  Because sendmsg() does not necessarily block until the data has been
  transferred, it is possible to transfer an open file descriptor across
  an AF_UNIX domain socket (see recv(2)), then close() it before it has
  actually been sent, the result being that the receiver gets a closed
  file descriptor. It is left to the application to implement an
  acknowledgment mechanism to prevent this from happening.

Indeed the recv side of the sockpair is closed on the send side just
after the send_fd_uxst(), which does not implement an acknowledgment
mechanism. So the master might never recv the _send_status message.

In order to implement an acknowledgment mechanism, a blocking read() is
done before closing the recv fd on the sending side, so we are sure that
the message was read on the other side.

This was only reproduced on macOS, meaning the master CLI is also
impacted on macOS. But no solution was found on macOS for it.
Implementing an acknowledgment mechanism would complexify too much the
protocol in non-blocking mode.

The problem was reported in ticket #3045, reproduced and analyzed by
@cognet.

Must be backported as far as 3.1.
2025-08-28 14:51:46 +02:00
Valentine Krasnobaeva
441cd614f9 BUG/MINOR: acl: set arg_list->kw to aclkw->kw string literal if aclkw is found
During configuration parsing *args can contain different addresses, it is
changing from line to line. smp_resolve_args() is called after the
configuration parsing, it uses arg_list->kw to create an error message, if a
userlist referenced in some ACL is absent. This leads to wrong keyword names
reported in such message or some garbage is printed.

It does not happen in the case of sample fetches. In this case arg_list->kw is
assigned to a string literal from the sample_fetch struct returned by
find_sample_fetch(). Let's do the same in parse_acl_expr(), when find_acl_kw()
lookup returns a corresponding acl_keyword structure.

This fixes the issue #3088 at GitHub.
This should be backported in all stable versions since 2.6 including 2.6.
2025-08-28 10:22:21 +02:00
Frederic Lecaille
ffa926ead3 BUG/MINOR: mux-quic: trace with non initialized qcc
This issue leads to crashes when the QUIC mux traces are enabled and could be
reproduced with -dMfail. When the qcc allocation fails (qcc_init()) haproxy
crashes into qmux_dump_qcc_info() because ->conn qcc member is initialized:

Program terminated with signal SIGSEGV, Segmentation fault.
    at src/qmux_trace.c:146
146             const struct quic_conn *qc = qcc->conn->handle.qc;
[Current thread is 1 (LWP 1448960)]
(gdb) p qcc
$1 = (const struct qcc *) 0x7f9c63719fa0
(gdb) p qcc->conn
$2 = (struct connection *) 0x155550508
(gdb)

This patch simply fixes the TRACE() call concerned to avoid <qcc> object
dereferencing when it is NULL.

Must be backported as far as 3.0.
2025-08-28 08:19:34 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where a ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no more useful.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-08-28 08:19:34 +02:00
William Lallemand
2ed515c632 DOC: configuration: clarify 'default-crt' and implicit default certificates
Clarify the behavior of implicit default certificates when used on the
same line as the default-crt keyword.

Should be backported as far as 3.2
2025-08-27 17:09:02 +02:00
William Lallemand
ab7358b366 MEDIUM: ssl: convert diag to warning for strict-sni + default-crt
Previous patch emits a diag warning when both 'strict-sni' +
'default-crt' are used on the same bind line.

This patch converts this diagnostic warning to a real warning, so the
previous patch could be backported without breaking configurations.

This was discussed in #3082.
2025-08-27 16:22:12 +02:00
William Lallemand
18ebd81962 MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used
It possible to use both 'strict-sni' and 'default-crt' on the same bind
line, which does not make much sense.

This patch implements a check which will look for default certificates
in the sni_w tree when strict-sni is used. (Referenced by their empty
sni ""). default-crt sets the CKCH_INST_EXPL_DEFAULT flag in
ckch_inst->is_default, so its possible to differenciate explicits
default from implicit default.

Could be backported as far as 3.0.

This was discussed in ticket #3082.
2025-08-27 16:22:12 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As chrome, ngtcp2 client decided to fragment its CRYPTO frames but in a much
more agressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.

To fix this, this patch approximatively proceeds the same way but with an
ebtree to reorder the CRYPTO by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there are very few less chances for the ncbufs used to store CRYPTO data
to reach a too much fragmented state.

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00
Frederic Lecaille
729196fbed BUG/MEDIUM: quic-be: avoid crashes when releasing Initial pktns
This bug arrived with this fix:

    BUG/MINOR: quic-be: missing Initial packet number space discarding

leading to crashes when dereferencing ->ipktns.

Such crashes could be reproduced with -dMfail option. To reach them, the
memory allocations must fail. So, this is relatively rare, except on systems
with limited memory.

To fix this, do not call quic_pktns_discard() if ->ipktns is NULL.

No need to backport.
2025-08-27 16:14:19 +02:00
William Lallemand
c36e4fb17f DOC: configuration: reword 'generate-certificates'
Reword the 'generate-certificates' keyword documentation to clarify
what's happening upon error.

This was discussed in ticket #3082.
2025-08-27 13:42:29 +02:00
Aurelien DARRAGON
2cd0afb430 MINOR: proxy: handle shared listener counters preparation from proxy_postcheck()
We used to allocate and prepare listener counters from
check_config_validity() all at once. But it isn't correct, since at that
time listeners's guid are not inserted yet, thus
counters_fe_shared_prepare() cannot work correctly, and so does
shm_stats_file_preload() which is meant to be called even earlier.

Thus in this commit (and to prepare for upcoming shm shared counters
preloading patches), we handle the shared listener counters prep in
proxy_postcheck(), which means that between the allocation and the
prep there is the proper window for listener's guid insertion and shm
counters preloading.

No change of behavior expected when shm shared counters are not
actually used.
2025-08-27 12:54:25 +02:00
Aurelien DARRAGON
cdb97cb73e MEDIUM: server: split srv_init() in srv_preinit() + srv_postinit()
We actually need more granularity to split srv postparsing init tasks:
Some of them are required to be run BEFORE the config is checked, and
some of them AFTER the config is checked.

Thus we push the logic from 368d0136 ("MEDIUM: server: add and use
srv_init() function") a little bit further and split the function
in two distinct ones, one of them executed under check_config_validity()
and the other one using REGISTER_POST_SERVER_CHECK() hook.

SRV_F_CHECKED flag was removed because it is no longer needed,
srv_preinit() is only called once, and so is srv_postinit().
2025-08-27 12:54:19 +02:00
Aurelien DARRAGON
9736221e90 MINOR: haproxy: abort config parsing on fatal errors for post parsing hooks
When pre-check and post-check postparsing hooks= are evaluated in
step_init_2() potential fatal errors are ignored during the iteration
and are only taken into account at the end of the loop. This is not ideal
because some errors (ie: memory errors) could cause multiple alert
messages in a row, which could make troubleshooting harder for the user.

Let's stop as soon as a fatal error is encountered for post parsing
hooks, as we use to do everywhere else.
2025-08-27 12:54:13 +02:00
Christopher Faulet
49db9739d0 BUG/MEDIUM: spoe: Improve error detection in SPOE applet on client abort
It is possible to interrupt a SPOE applet without reporting an error. For
instance, when the client of the parent stream aborts. Thanks to this patch,
we take care to report an error on the SPOE applet to be sure to interrupt
the processing. It is especially important if the connection to the agent is
queued. Thanks to 886a248be ("BUG/MEDIUM: mux-spop: Reject connection
attempts from a non-spop frontend"), it is no longer an issue. But there is
no reason to continue to process if the parent stream is gone.

In addition, in the SPOE filter, if the processing is interrupted when the
filter is destroyed, no specific status code was set. It is not a big deal
because it cannot be logged at this stage. But it can be used to notify the SPOE
applet. So better to set it.

This patch should be backported as far as 3.1.
2025-08-26 16:12:18 +02:00
William Lallemand
7a30c10587 REGTESTS: jwt: create dynamically "cert.ecdsa.pem"
Stop declaring "cert.ecdsa.pem" in a crt-store, and add it dynamically
over the stats socket insted.

This way we fully verify a JWS signature with a certificate which never
existed at HAProxy startup.
2025-08-25 16:44:24 +02:00
Christopher Faulet
886a248be4 BUG/MEDIUM: mux-spop: Reject connection attempts from a non-spop frontend
It is possible to crash the process by initializing a connection to a SPOP
server from a non-spop frontend. It is of course unexpected and invalid. And
there are some checks to prevent that when the configuration is
loaded. However, it is not possible to handle all cases, especially the
"use_backend" rules relying on log-format strings.

It could be good to improve the backend selection by checking the mode
compatibility (for now, it is only performed for the HTTP).

But at the end, this can also be handled by the SPOP multiplexer when it is
initialized. If the opposite SD is not attached to an SPOE agent, we should
fail the mux initialization and return an internal error.

This patch must be backported as far as 3.1.
2025-08-25 11:11:05 +02:00
Christopher Faulet
b4a92e7cb1 MEDIUM: applet: Set .rcv_buf and .snd_buf functions on default ones if not set
Based on the applet flags, it is possible to set .rcv_buf and .snd_buf
callback functions if necessary. If these functions are not defined for an
applet using the new API, it means the default functions must be used.

We also take care to choose the raw version or the htx version, depending on
the applet flags.
2025-08-25 11:11:05 +02:00
Christopher Faulet
71c01c1010 MINOR: applet: Make some applet functions HTX aware
applet_output_room() and applet_input_data() are now HTX aware. These
functions automatically rely on htx versions if APPLET_FL_HTX flag is set
for the applet.
2025-08-25 11:11:05 +02:00
Christopher Faulet
927884a3eb MINOR: applet: Add a flag to know an applet is using HTX buffers
Multiplexers already explicitly announce their HTX support. Now it is
possible to set flags on applet, it could be handy to do the same. So, now,
HTX aware applets must set the APPLET_FL_HTX flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
1c76e4b2e4 MINOR: applet: Add function to test applet flags from the appctx
appctx_app_test() function can now be used to test the applet flags using an
appctx. This simplify a bit tests on applet flags. For now, this function is
used to test APPLET_FL_NEW_API flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
3de6c375aa MINOR: applet: Rely on applet flag to detect the new api
Instead of setting a flag on the applet context by checking the defined
callback functions of the applet to know if an applet is using the new API
or not, we can now rely on the applet flags itself. By checking
APPLET_FL_NEW_API flag, it does the job. APPCTX_FL_INOUT_BUFS flag is thus
removed.
2025-08-25 11:11:05 +02:00
Aurelien DARRAGON
3da1d63749 BUG/MEDIUM: http_ana: handle yield for "stats http-request" evaluation
stats http-request rules evaluation is handled separately in
http_process_req_common(). Because of that, if a rule requires yielding,
the evaluation is interrupted as (F)YIELD verdict return values are not
handled there.

Since 3.2 with the introduction of costly ruleset interruption in
0846638 ("MEDIUM: stream: interrupt costly rulesets after too many
evaluations"), the issue started being more visible because stats
http-request rules would be interrupted when the evaluation counters
reached tune.max-rules-at-once, but the evaluation would never be
resumed, and the request would continue to be handled as if the
evaluation was complete. Note however that the issue already existed
in the past for actions that could return ACT_RET_YIELD such as
"pause" for instance.

This issue was reported by GH user @Wahnes in #3087, thanks to him for
providing useful repro and details.

To fix the issue, we merge rule vedict handling in
http_process_req_common() so that "stats http-request" evaluation benefits
from all return values already supported for the current ruleset.

It should be backported in 3.2 with 0846638 ("MEDIUM: stream: interrupt
costly rulesets after too many evaluations"), and probably even further
(all stable versions) if the patch adaptation is not to complex (before
HTTP_RULE_RES_FYIELD was introduced) because it is still relevant.
2025-08-25 10:59:16 +02:00
Aurelien DARRAGON
f9b227ebff MINOR: http_ana: fix typo in http_res_get_intercept_rule
HTTP_RULE_RES_YIELD was used where HTTP_RULE_RES_FYIELD should be used.
Hopefully, aside from debug traces, both return values were treated
equally. Let's fix that to prevent confusion and from causing bugs
in the future.

It may be backported in 3.2 with 0846638 ("MEDIUM: stream: interrupt
costly rulesets after too many evaluations") if it easily applies
2025-08-25 10:59:08 +02:00
Amaury Denoyelle
1529ec1a25 MINOR: quic: centralize padding for HP sampling on packet building
The below patch has simplified INITIAL padding on emission. Now,
qc_prep_pkts() is responsible to activate padding for this case, and
there is no more special case in qc_do_build_pkt() needed.

  commit 8bc339a6ad4702f2c39b2a78aaaff665d85c762b
  BUG/MAJOR: quic: fix INITIAL padding with probing packet only

However, qc_do_build_pkt() may still activate padding on its own, to
ensure that a packet is big enough so that header protection decryption
can be performed by the peer. HP decryption is performed by extracting a
sample from the ciphered packet, starting 4 bytes after PN offset.
Sample length is 16 bytes as defined by TLS algos used by QUIC. Thus, a
QUIC sender must ensures that length of packet number plus payload
fields to be at least 4 bytes long. This is enough given that each
packet is completed by a 16 bytes AEAD tag which can be part of the HP
sample.

This patch simplifies qc_do_build_pkt() by centralizing padding for this
case in a single location. This is performed at the end of the function
after payload is completed. The code is thus simpler.

This is not a bug. However, it may be interesting to backport this patch
up to 2.6, as qc_do_build_pkt() is a tedious function, in particular
when dealing with padding generation, thus it may benefit greatly from
simplification.
2025-08-25 08:48:24 +02:00
Amaury Denoyelle
7d554ca629 BUG/MINOR: quic: don't coalesce probing and ACK packet of same type
Haproxy QUIC stack suffers from a limitation : it's not possible to emit
a packet which contains probing data and a ACK frame in it. Thus, in
case qc_do_build_pkt() is invoked which both values as true, probing has
the priority and ACK is ignored.

However, this has the undesired side-effect of possibly generating two
coalesced packets of the same type in the same datagram : the first one
with the probing data and the second with an ACK frame. This is caused
by qc_prep_pkts() loop which may call qc_do_build_pkt() multiple times
with the same QEL instance. This case is normally use when a full
datagram has been built but there is still content to emit on the
current encryption level.

To fix this, alter qc_prep_pkts() loop : if both probing and ACK is
requested, force the datagram to be written after packet encoding. This
will result in a datagram containing the packet with probing data as
final entry. A new datagram is started for the next packet which will
can contain the ACK frame.

This also has some impact on INITIAL padding. Indeed, if packet must be
the last due to probing emission, qc_prep_pkts() will also activate
padding to ensure final datagram is at least 1.200 bytes long.

Note that coalescing two packets of the same type is not invalid
according to QUIC RFC. However it could cause issue with some shaky
implementations, so it is considered as a bug.

This must be backported up to 2.6.
2025-08-22 18:20:42 +02:00
Amaury Denoyelle
8bc339a6ad BUG/MAJOR: quic: fix INITIAL padding with probing packet only
A QUIC datagram that contains an INITIAL packet must be padded to 1.200
bytes to prevent any deadlock due to anti-amplification protection. This
is implemented by encoding a PADDING frame on the last packet of the
datagram if necessary.

Previously, qc_prep_pkts() was responsible to activate padding when
calling qc_do_build_pkt(), as it knows which packet is the last to
encode. However, this has the side-effect of preventing PING emission
for probing with no data as this case was handled in an else-if branch
after padding. This was fixed by the below commit

  217e467e89d15f3c22e11fe144458afbf718c8a8
  BUG/MINOR: quic: fix malformed probing packet building

Above logic was altered to fix the PING case : padding was set to false
explicitely in qc_prep_pkts(). Padding was then added in a specific
block dedicated to the PING case in qc_do_build_pkt() itself for INITIAL
packets.

However, the fix is incorrect if the last QEL used to built a packet is
not the initial one and probing is used with PING frame only. In this
case, specific block in qc_do_build_pkt() does not add padding. This
causes a BUG_ON() crash in qc_txb_store() which catches these packets as
irregularly formed.

To fix this while also properly handling PING emission, revert to the
original padding logic : qc_prep_pkts() is responsible to activate
INITIAL padding. To not interfere with PING emission, qc_do_build_pkt()
body is adjusted so that PING block is moved up in the function and
detached from the padding condition.

The main benefit from this patch is that INITIAL padding decision in
qc_prep_pkts() is clearer now.

Note that padding can also be activated by qc_do_build_pkt(), as packets
should be big enough for header protection decipher. However, this case
is different from INITIAL padding, so it is not covered by this patch.

This should be backported up to 2.6.
2025-08-22 18:12:32 +02:00
Amaury Denoyelle
0376e66112 BUG/MINOR: quic: do not emit probe data if CONNECTION_CLOSE requested
If connection closing is activated, qc_prep_pkts() can only built a
datagram with a single packet. This is because we consider that only a
single CONNECTION_CLOSE frame is relevant at this stage.

This is handled both by qc_prep_pkts() which ensure that only a single
packet datagram is built and also qc_do_build_pkt() which prevents the
invokation of qc_build_frms() if <cc> is set.

However, there is an incoherency for probing. First, qc_prep_pkts()
deactivates it if connection closing is requested. But qc_do_build_pkt()
may still emit probing frame as it does not check its <probe> argument
but rather <pto_probe> QEL field directly. This can results in a packet
mixing a PING and a CONNECTION close frames, which is useless.

Fix this by adjusting qc_do_build_pkt() : closing argument is also
checked on PING probing emission. Note that there is still shaky code
here as qc_do_build_pkt() should rely only on <probe> argument to ensure
this.

This should be backported up to 2.6.
2025-08-22 18:06:43 +02:00
Amaury Denoyelle
fc3ad50788 BUG/MEDIUM: quic: reset padding when building GSO datagrams
qc_prep_pkts() encodes input data into QUIC packets in a loop into one
or several datagrams. It supports GSO which requires to built a serie of
multiple datagrams of the same length.

Each packet encoding is performed via a call to qc_do_build_pkt(). This
function has an argument to specify if output packet must be completed
with a PADDING frame. This option is activated when qc_prep_pkts()
encodes the last packet of a datagram with at least one INITIAL packet
in it.

Padding is resetted each time a new datagram is started. However, this
was not performed if GSO is used to built the next datagram. This patch
fixes it by properly resetting padding in this case also.

The impact of this bug is unknown. It may have several effectfs, one of
the most obvious being the insertion of unnecessary padding in packets.
It could also potentially trigger an infinite loop in qc_prep_pkts(),
although this has never been encountered so far.

This must be backported up to 3.1.
2025-08-22 16:22:01 +02:00
Valentine Krasnobaeva
0dc8d8d027 MINOR: dns: dns_connect_nameserver: fix fd leak at error path
This fixes the commit 2c7e05f80e3b
("MEDIUM: dns: don't call connect to dest socket for AF_INET*"). If we fail to
bind AF_INET sockets or the address family of the nameserver protocol isn't
something, what we expect, we need to close the fd, obtained by
connect.

This fixes the issue GitHub #3085
This must be backported along with the commit 2c7e05f80e3b.
2025-08-22 10:50:47 +02:00
Christopher Faulet
a498e527b4 BUG/MAJOR: stream: Remove READ/WRITE events on channels after analysers eval
It is possible to miss a synchronous write event in process_stream() if the
stream was woken up on a write event. In that case, it is possible to freeze
the stream until the next I/O event or timeout.

Concretely, the stream is woken up with CF_WRITE_EVENT on a channel. this
flag is removed from the channel when we leave proces_stream(). But before
leaving process_stream(), when a synchronous send is tried on this channel,
the flag is removed and eventually set again on success. But this event is
masked by the previous one, and the channel is not resync as it should be.

To fix the bug, CF_READ_EVENT and CF_WRITE_EVENT flags are removed from a
channel after the corresponding analysers evaluation. This way, we will be
able to detect a successful synchronous send to restart analysers evaluation
based on the new channel state. It is safe (or it should be) to do so
becaues these flags are only used by analysers and tested to resync the
stream inside process_stream().

It is a very old bug and I guess all versions are affected. It was observed
on 2.9 and higher, and with the master/worker only. But it could affect any
stream. It is tagged a MAJOR because this area is really sensitive to any
change.

This patch should fix the issue #3070. It should probably be backported to
all stable versions, but only after a period of observation and with a
special care because this area is really sensitive to changes. It is
probably reasonnable to backport it as far as 3.0 and wait for older
versions.

Thanks to Valentine for its help on this issue !
2025-08-21 20:15:18 +02:00
William Lallemand
7b3b3d7146 BUG/MEDIUM: ssl: apply ssl-f-use on every "ssl" bind
This patch introduces a change of behavior in the configuration parsing.

Previously the "ssl-f-use" lines were only applied on "ssl" bind lines
that does not have any "crt" configured.
Since there is no warning and you could mix bind lines with and without
crt, this is really confusing.

This patch applies the "ssl-f-use" lines on every "ssl" bind lines.

This was discussed in ticket #3082.

Must be backported in 3.2.
2025-08-21 14:58:06 +02:00
Frederic Lecaille
e513620c72 BUG/MEDIUM: quic-be: crash after backend CID allocation failures
This bug impacts only the QUIC backends. It arrived with this commit:
   MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
which was supposed to be fixed by:
   BUG/MEDIUM: quic: crash after quic_conn allocation failures
but this commit was not sufficient.

Such a crashe could be reproduced with -dMfail option. To reach it, the
<conn_id> object allocation must fail (from qc_new_conn()). So, this is
relatively rare, except on systems with limited memory.

No need to backport.
2025-08-21 14:24:31 +02:00
Frederic Lecaille
9a22770ac5 BUG/MINOR: quic-be: missing Initial packet number space discarding
A QUIC client must discard the Initial packet number space as soon as it first
sends a Handshake packet.

This patch implements this packet number space which was missing.
2025-08-21 14:24:31 +02:00
Amaury Denoyelle
901de11157 BUG/MEDIUM: mux-h2: fix crash on idle-ping due to unwanted ABORT_NOW
An ABORT_NOW() was used during debugging idle-ping but was not removed
from the final code. This may cause crash, in particular when mixing
idle-ping with shorter http-request/http-keep-alive values.

Fix this situation by removing ABORT_NOW() statement.

This should fix github issue #3079.

This must be backported up to 3.2.
2025-08-21 14:21:11 +02:00
Willy Tarreau
82b002a225 [RELEASE] Released version 3.3-dev7
Released version 3.3-dev7 with the following main changes :
    - MINOR: quic: duplicate GSO unsupp status from listener to conn
    - MINOR: quic: define QUIC_FL_CONN_IS_BACK flag
    - MINOR: quic: prefer qc_is_back() usage over qc->target
    - BUG/MINOR: cfgparse: immediately stop after hard error in srv_init()
    - BUG/MINOR: cfgparse-listen: update err_code for fatal error on proxy directive
    - BUG/MINOR: proxy: avoid NULL-deref in post_section_px_cleanup()
    - MINOR: guid: add guid_get() helper
    - MINOR: guid: add guid_count() function
    - MINOR: clock: add clock_set_now_offset() helper
    - MINOR: clock: add clock_get_now_offset() helper
    - MINOR: init: add REGISTER_POST_DEINIT_MASTER() hook
    - BUILD: restore USE_SHM_OPEN build option
    - BUG/MINOR: stick-table: cap sticky counter idx with tune.nb_stk_ctr instead of MAX_SESS_STKCTR
    - MINOR: sock: update broken accept4 detection for older hardwares.
    - CI: vtest: add os name to OT cache key
    - CI: vtest: add Ubuntu arm64 builds
    - BUG/MEDIUM: ssl: Fix 0rtt to the server
    - BUG/MEDIUM: ssl: fix build with AWS-LC
    - MEDIUM: acme: use lowercase for challenge names in configuration
    - BUG/MINOR: init: Initialize random seed earlier in the init process
    - DOC: management: clarify usage of -V with -c
    - MEDIUM: ssl/cli: relax crt insertion in crt-list of type directory
    - MINOR: tools: implement ha_aligned_zalloc()
    - CLEANUP: fd: make use of ha_aligned_alloc() for the fdtab
    - MINOR: pools: distinguish the requested alignment from the type-specific one
    - MINOR: pools: permit to optionally specify extra size and alignment
    - MINOR: pools: always check that requested alignment matches the type's
    - DOC: api: update the pools API with the alignment and typed declarations
    - MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
    - OPTIM: tasks: align task and tasklet pools to 64
    - OPTIM: buffers: align the buffer pool to 64
    - OPTIM: queue: align the pendconn pools to 64
    - OPTIM: connection: align connection pools to 64
    - OPTIM: server: start to use aligned allocs in server
    - DOC: management: fix typo in commit f4f93c56
    - DOC: config: recommend single quoting passwords
    - MINOR: tools: also implement ha_aligned_alloc_typed()
    - MEDIUM: server: introduce srv_alloc()/srv_free() to alloc/free a server
    - MINOR: server: align server struct to 64 bytes
    - MEDIUM: ring: always allocate properly aligned ring structures
    - CI: Update to actions/checkout@v5
    - MINOR: quic: implement qc_ssl_do_hanshake()
    - BUG/MEDIUM: quic: listener connection stuck during handshakes (OpenSSL 3.5)
    - BUG/MINOR: mux-h1: fix wrong lock label
    - MEDIUM: dns: don't call connect to dest socket for AF_INET*
    - BUG/MINOR: spoe: Properly detect and skip empty NOTIFY frames
    - BUG/MEDIUM: cli: Report inbuf is no longer full when a line is consumed
    - BUG/MEDIUM: quic: crash after quic_conn allocation failures
    - BUG/MEDIUM: quic-be: do not initialize ->conn too early
    - BUG/MEDIUM: mworker: more verbose error upon loading failure
    - MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf().
    - MINOR: ssl: Add a "flags" field to ssl_sock_ctx.
    - MEDIUM: xprt: Add a "get_capability" method.
    - MEDIUM: mux_h1/mux_pt: Use XPRT_CAN_SPLICE to decide if we should splice
    - MINOR: cfgparse: Add a new "ktls" option to bind and server.
    - MINOR: ssl: Define HAVE_VANILLA_OPENSSL if openssl is used.
    - MINOR: build: Add a new option, USE_KTLS.
    - MEDIUM: ssl: Add kTLS support for OpenSSL.
    - MEDIUM: splice: Don't consider EINVAL to be a fatal error
    - MEDIUM: ssl: Add splicing with SSL.
    - MEDIUM: ssl: Add ktls support for AWS-LC.
    - MEDIUM: ssl: Add support for ktls on TLS 1.3 with AWS-LC
    - MEDIUM: ssl: Handle non-Application data record with AWS-LC
    - MINOR: ssl: Add a way to globally disable ktls.
v3.3-dev7
2025-08-20 21:52:39 +02:00
Olivier Houchard
6f21c5631a MINOR: ssl: Add a way to globally disable ktls.
Add a new global option, "noktls", as well as a command line option,
"-dT", to totally disable ktls usage, even if it is activated on servers
or binds in the configuration.
That makes it easier to quickly figure out if a problem is related to
ktls or not.
2025-08-20 18:33:11 +02:00
Olivier Houchard
5da3540988 MEDIUM: ssl: Handle non-Application data record with AWS-LC
Handle receiving and sending TLS records that are not application data
records.
When receiving, we ignore new session tickets records, we handle close
notify as a read0, and we consider any other records as a connection
error.
For sending, we're just sending close notify, so that the TLS connection
is properly closed.
2025-08-20 18:33:11 +02:00
Olivier Houchard
fefc1cce20 MEDIUM: ssl: Add support for ktls on TLS 1.3 with AWS-LC
AWS-LC added a new API in AWS-LC 1.54 that allows the user to retrieve
the keys for TLS 1.3 connections with SSL_get_read_traffic_secret(), so
use it to be able to use ktls with TLS 1.3 too.
2025-08-20 18:33:11 +02:00