This commit is the first of a series whose aim is to implement
transcoding of an HTX request into HTTP/3, which is necessary for QUIC
backend support.
Transcoding is implemented via a new function h3_req_headers_send(),
called when an HTX start-line is parsed. For now, most of the request
fields are hardcoded, using a GET method. This will be adjusted in the
following patches.
On backend side, QUIC MUX needs to initialize the first local stream
during MUX init operation. This is necessary so that the first transfer
can then be performed.
sc_attach_mux() is used to attach the created QCS instance to its stream
data layer. However, its return value was not checked, which may cause
issues on allocation error. This patch fixes it by returning an error on
the MUX init operation and freeing the QCS instance in case of
sc_attach_mux() error, as sketched below.
This fixes a coverity report from github issue #3007.
No need to backport.
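A minimal sketch of the fix, with assumed helper names (the actual QCS
creation and error paths in the MUX init code differ):

    /* backend side: pre-initialize the first local stream */
    qcs = qcc_init_stream_local(qcc, 1);     /* assumed helper name */
    if (!qcs)
        goto fail;

    if (sc_attach_mux(sc, qcs, conn) < 0) {
        qcs_free(qcs);   /* do not leak the QCS instance */
        goto fail;       /* report an error on MUX init */
    }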
When a connection error is reported, we try to collect as much information
as possible on the connection status, and the server status is adjusted
accordingly. However, the function does nothing if there is no connection
error and the healthcheck has not expired yet. It is a problem when an
internal error occurs. It may happen in many places and it is hard to be
sure an error is reported on the connection. In fact, it is already a
problem when the multiplexer allocation fails. In that case, the healthcheck
is not interrupted as it should be. Concretely, it can only happen once a
connection is established.
It is hard to predict the effects of this bug. It may be harmless, but it
could also lead to a crash. To avoid any issue, a SOCKERR status is now
set by default when a connection error is reported. There is no reason to
report a connection error for nothing, so a healthcheck failure must be
reported. There is no "internal error" status, so a socket error is
reported.
This patch must be backported to all stable versions.
This is not especially a bug fix. But the APPCTX_FL_EOS and APPCTX_FL_ERROR
flags must be handled first. These flags are set by the applet itself and
should mark the end of all processing. So there is no reason to get the
output buffer in the first place.
This patch could be backported as far as 3.0.
The output buffer must be available to process a command, at least to be
able to emit error messages. When this buffer is full or cannot be
allocated, we must wait. In that case, we must take care to notify that the
SE will not consume input data. It is important to avoid wakeups in a loop,
especially when the client aborts.
When the output buffer is available again and no longer full, and the CLI
applet is waiting for a command line, it must notify that it will consume
input data. A sketch of the intended flow follows.
This patch must be backported as far as 3.0.
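A hedged sketch of that flow, assuming the applet_get_outbuf() helper from
the recent applet API (the exact call sites in the CLI applet differ):

    struct buffer *buf = applet_get_outbuf(appctx);

    if (!buf) {
        /* full or allocation pending: tell the SE we won't consume
         * input data, to avoid wakeups in loop on client abort */
        applet_wont_consume(appctx);
        return;
    }
    /* output buffer usable again and a command line is awaited */
    applet_will_consume(appctx);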
QUIC support on the backend side has been implemented recently. This has
led to some adjustments in qc_new_conn() to handle both FE and BE sides,
with some of these changes performed by the following commit.
29fb1aee57
MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
An issue was introduced during some code adjustment. Initialization of
the ODCID was incorrectly performed, which caused haproxy to emit invalid
transport parameters. Most clients detected this and immediately
closed the connection.
Fix this by adjusting the qc_lstnr_params_init() invocation: replace
<qc.dcid>, which in fact points to the received SCID, by <qc.odcid>,
whose purpose is dedicated to original DCID storage.
This fixes github issue #3006. This issue also caused the majority of
the interop tests to fail.
No backport needed.
This patch is OpenSSL 3.5 QUIC API specific. It fixes the
OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() callback (see man(3) SSL_set_quic_tls_cb).
The role of this callback is to store the transport parameters received from the peer.
At this time it is never used by QUIC listeners because there is another callback
which is used to store the transport parameters. This latter callback is not specific
to the OpenSSL 3.5 QUIC API. As far as I know, the TLS stack calls only one of the
callbacks which have been set to receive and store the transport parameters, and
only one time.
That said, OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() is called for QUIC
backends to store the server transport parameters.
qc_ssl_set_quic_transport_params() is useless in this callback. It is dedicated
to storing the local transport parameters (which are sent to the peer). Furthermore,
the <server> second parameter of quic_transport_params_store() must be 0 for a listener
(or QUIC server) which calls it, denoting it does not receive the transport parameters
of a QUIC server. It must be 1 for a QUIC backend (a QUIC client which receives
the transport parameters of a QUIC server).
Must be backported to 3.2.
On the backend side, the HTTP/0.9 response body is copied into the stream
data HTX buffer. Properly handle the case where the HTX out buffer space is
too small: only a partial copy of the HTTP response is performed, and
transcoding is restarted when new room is available, as sketched below.
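A hedged sketch of the partial copy (the bookkeeping field is an
assumption):

    /* <data>/<len> designate the h09 payload to transcode (illustrative) */
    size_t ret = htx_add_data(htx, ist2(data, len));

    if (ret < len) {
        /* HTX out buffer full: remember the transcoded amount and
         * restart once the stream layer frees some room */
        h09s->body_offset += ret;   /* hypothetical bookkeeping field */
    }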
When QUIC is used on the frontend side, communication is restricted with
clients using privileged ports. This is a simple protection against
DNS/NTP spoofing.
This feature should not be activated on the backend side, as in this
case it is quite frequent to exchange with servers running on privileged
ports. As such, a new parameter is added to quic_recv() so that the check
is only active on the frontend side.
Without this patch, it is impossible to communicate with QUIC servers
running on privileged ports, as incoming datagrams would be silently
dropped.
No need to backport.
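A hedged sketch of the check performed in quic_recv() (the parameter name
is an assumption; get_host_port() is the existing address helper):

    /* drop datagrams coming from privileged ports on listeners only:
     * simple anti DNS/NTP spoofing protection */
    if (from_listener && get_host_port(&saddr) < 1024)
        goto drop;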
The keep-query redirect option must do nothing if there is no query-string.
However, there is a bug: when there is no QS, an error is returned, leading
to a 500-internal-error being returned to the client.
To fix the bug, instead of returning 0 when there is no QS, we just skip the
QS processing.
This patch should fix the issue #3005. It must be backported as far as 3.1.
The NEW_TOKEN frame is never emitted by a client, hence its parsing was not
tested on the frontend side.
On the backend side, an issue can occur, as the expected token length is
static, based on the token length used internally by haproxy. This is not
sufficient for most server implementations, which use larger tokens. This
causes a parsing error, which may cause skipping of following frames in
the same packet. This issue was detected using ngtcp2 as server.
As tokens are unused by haproxy for now, simply discard the test on token
length during NEW_TOKEN frame parsing. The token itself is merely
skipped without being stored, as sketched below. This is sufficient for now
to continue experimenting with the QUIC backend implementation.
This does not need to be backported.
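A hedged sketch of the relaxed parsing (cursor names are illustrative):

    uint64_t token_len;

    if (!quic_dec_int(&token_len, &pos, end) || !token_len ||
        (uint64_t)(end - pos) < token_len)
        return 0;       /* malformed NEW_TOKEN frame */
    pos += token_len;   /* token merely skipped, not stored */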
Report an error during server configuration if QUIC is used but SSL is
not activated via the 'ssl' keyword. This is done in _srv_parse_finalize(),
which is used by both static and dynamic servers.
Note that contrary to listeners, an error is reported instead of a
warning, and SSL is not automatically activated if missing. This is
mainly due to the complexity of server configuration: _srv_parse_finalize()
is ideal to affect every server, including dynamic entries. However, it
is executed after server SSL context allocation performed via the
<prepare_srv> XPRT operation. A proper fix would be to move the SSL ctx
allocation into _srv_parse_finalize(), but this may have unknown impacts.
Thus, for now a simpler solution has been chosen.
QUIC traces in ssl_quic_srv_new_ssl_ctx() are problematic as this
function is called early during startup. When activating traces via the
-dt command-line argument, a crash occurs due to the stderr sink not being
available yet.
Thus, traces from ssl_quic_srv_new_ssl_ctx() are simply removed.
No backport needed.
This commit added an "err" C label reachable only with USE_QUIC_OPENSSL_COMPAT:
MINOR: quic-be: Missing callbacks initializations (USE_QUIC_OPENSSL_COMPAT)
leading coverity to emit this warning:
*** CID 1611481: Control flow issues (UNREACHABLE)
/src/quic_ssl.c: 802 in ssl_quic_srv_new_ssl_ctx()
796 goto err;
797 #endif
798
799 leave:
800 TRACE_LEAVE(QUIC_EV_CONN_NEW);
801 return ctx;
>>> CID 1611481: Control flow issues (UNREACHABLE)
>>> This code cannot be reached: "err:
SSL_CTX_free(ctx);".
802 err:
803 SSL_CTX_free(ctx);
804 ctx = NULL;
805 TRACE_DEVEL("leaving on error", QUIC_EV_CONN_NEW);
806 goto leave;
807 }
The least intrusive way (without #ifdef) to fix this is to add a "goto err"
statement from the code part which is reachable without USE_QUIC_OPENSSL_COMPAT.
Thank you to @chipitsine for having reported this issue in GH #3003.
This issue may occur when qc_new_conn() fails after having allocated
<conn_cid> and attached it to its tree. This is the case, for a reason
unknown at this time, when building haproxy against WolfSSL. In this
case <conn_cid> is first released into its pool
(pool_head_quic_connection_id), then freed again by quic_conn_release().
This bug arrived with this commit:
MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
So, the aim of this patch is to free <conn_cid> only for QUIC backends
and only if it is not attached to its tree. This is the case when the
<conn_id> local variable, passed with a NULL value to qc_new_conn(), has
been initialized to the same value as <conn_cid>.
This patch should have come with this last commit for the latest
qc_new_conn() modifications for QUIC backends:
MINOR: quic-be: get rid of ->li quic_conn member
qc_new_conn() must be passed NULL pointers for several variables, as
mentioned by the comment. Some of these local variables are used to avoid
too many code modifications.
Implement transcoding of an HTX request into HTTP/0.9. This protocol is a
simplified version of HTTP. Requests only support the GET method, without
any header. As such, only a request line is written during the snd_buf
operation, as sketched below.
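A hedged sketch of the request-line emission (using the global trash
buffer for illustration):

    struct htx_sl *sl = http_get_stline(htx);
    struct ist uri = htx_sl_req_uri(sl);

    chunk_istcat(&trash, ist("GET "));
    chunk_istcat(&trash, uri);
    chunk_istcat(&trash, ist("\r\n"));  /* HTTP/0.9: no headers at all */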
Implement transcoding of an HTTP/0.9 response into an HTX message.
HTTP/0.9 is a really simple subset of the HTTP spec: the response does
not have any status line and contains only the payload body. The response
is finished when the underlying connection/stream is closed.
A status line is generated to be compliant with HTX. This is performed
on the first invocation of rcv_buf for the current stream. The status code
is set to 200. The payload body, if present, is then copied using
htx_add_data().
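A hedged sketch of the forged status line (version string and flags are
assumptions):

    struct htx_sl *sl = htx_add_stline(htx, HTX_BLK_RES_SL, HTX_SL_F_VER_11,
                                       ist("HTTP/1.1"), ist("200"), ist("OK"));
    if (!sl)
        return 0;       /* not enough room yet, retry later */
    sl->info.res.status = 200;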
This commit is the second and final step to initiate QUIC MUX on the
backend side. On handshake completion, MUX is woken up just after its
creation. This step is necessary to notify the stream layer, via the QCS
instance pre-initialized on MUX init, so that the transfer can be
resumed.
This mode of operation is similar to TCP stack when TLS+ALPN are used,
which forces MUX initialization to be delayed after handshake
completion.
Adjust qmux_init() to handle frontend and backend sides differently.
Most notably, on backend side, the first bidirectional stream is created
preemptively. This step is necessary as MUX layer will be woken up just
after handshake completion.
The stream data layer is notified that data is expected when FIN is
received, which marks the end of the HTTP request. This prepares the data
layer to be able to handle the expected HTTP response.
Thus, this step is only relevant on the frontend side. On the backend side,
FIN marks the end of the HTTP response. No further content is expected,
thus expect data should not be set in this case.
Note that the se_expect_data() invocation via qcs_attach_sc() is not
protected. This is because this function will only be called during
request headers parsing, which is performed on the frontend side.
The MUX connection is flagged with the new QC_CF_IS_BACK if used on the
backend side. For now the only change concerns traces, to be able to
differentiate frontend and backend usage.
Complete the documentation for rcv_buf/snd_buf operations. In particular,
the return value is now explicitly defined. For the H3 layer, the
associated functions' documentation is also extended.
Use conn_ctrl_init() on the connection when quic_connect_server()
succeeds. This is necessary so that the connection is considered as
completely initialized. Without this, the connect operation would be called
again if the connection is reused.
On backend side, multiplexer layer is initialized during
connect_server(). However, this step is not performed if ALPN is used,
as the negotiated protocol may be unknown. Multiplexer initialization is
delayed after TLS handshake completion.
There are still exceptions though that force the MUX to be initialized
even if ALPN is used. One of them was when the <mux_proto> server field was
already set at this stage, which is the case when an explicit proto is
selected on the server line configuration. Remove this condition so that
MUX init is now delayed with ALPN even if a proto is forced.
The scope of this change should be minimal. In fact, the only impact
concerns server config with both proto and ALPN set, which is pretty
unlikely as it is contradictory.
The main objective of this patch is to prepare QUIC support on the
backend side. Indeed, QUIC proto will be forced on the server if a QUIC
address is used, similarly to bind configuration. However, we still want
to delay MUX initialization after QUIC handshake completion. This is
mandatory to know the selected application protocol, required during
QUIC MUX init.
Change the wake callback behavior for the QUIC MUX. This operation loops
over each QCS and notifies their stream data layer on certain events via
the internal helper qcc_wake_some_streams().
Previously, streams were notified only if an error occurred on the
connection. Change this to notify the stream data layer every time the wake
callback is used. This behavior is now identical to the H2 MUX.
qcc_wake_some_streams() is also renamed to qcc_wake_streams(), as it
better reflects its true behavior.
This change should not have a performance impact as the wake mux ops should
not be called frequently. Note that qcc_wake_streams() can also be
called directly via qcc_io_process() to ensure a new error is correctly
propagated. As the wake callback first uses qcc_io_process(), it will only
call qcc_wake_streams() if no error is present.
No known issue is associated with this commit. However, it could prevent
a frozen transfer under certain conditions. As such, it is considered as
a bug fix worthy of backporting.
This should be backported after a period of observation.
It was still failing on Ubuntu-24.04 with GCC+ASAN. So, instead of
understanding the code path the compiler followed to report uninitialized
variables, let's just init them now.
No backport needed.
commit 16eb0fab3 ("MAJOR: counters: dispatch counters over thread groups")
introduced a build regression on some compilers:
src/listener.c: In function 'listener_accept':
src/listener.c:1095:3: error: 'for' loop initial declarations are only allowed in C99 mode
for (int it = 0; it < global.nbtgroups; it++)
^
src/listener.c:1095:3: note: use option -std=c99 or -std=gnu99 to compile your code
src/listener.c:1101:4: error: 'for' loop initial declarations are only allowed in C99 mode
for (int it = 0; it < global.nbtgroups; it++) {
^
make: *** [src/listener.o] Error 1
make: *** Waiting for unfinished jobs....
Let's fix that.
No backport needed
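The fix simply hoists the declaration out of the loop, e.g.:

    int it;

    for (it = 0; it < global.nbtgroups; it++) {
        /* per-tgroup counter updates, unchanged */
    }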
In hlua_applet_tcp_recv_try() and hlua_applet_tcp_getline_yield(), GCC 14.2
reports warnings about the 'blk2' variable that may be used uninitialized.
It is a bit strange because the code is pretty similar to before. But to
make it happy, and to avoid bugs if the API changes in the future, 'blk2'
is now used only when its length is greater than 0.
No need to backport.
In hlua_applet_tcp_getline_yield(), the function may yield if there is no
data available. However we must take care to add a return statement just
after the call to hlua_yieldk(), as sketched below. I don't know the
details of the Lua API, but at least this return statement fixes a build
error about uninitialized variables that may be used.
It is a 3.3-specific issue. No backport needed.
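A hedged sketch of the pattern (argument values are illustrative):

    if (!data_available) {
        hlua_yieldk(L, 0, 0, hlua_applet_tcp_getline_yield, TICK_ETERNITY, 0);
        return 0;   /* not reached on a successful yield, but tells the
                     * compiler nothing below runs with uninitialized vars */
    }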
On the backend side, the MUX is instantiated after QUIC handshake
completion. This step is performed via qc_ssl_provide_quic_data(). First,
connection flags for handshake completion are reset. Then, the MUX is
instantiated via the conn_create_mux() function.
Force QUIC as <mux_proto> for a server if a QUIC address is used. This is
similar to what is already done for bind instances on the frontend
side. This step ensures that conn_create_mux() will select the proper
protocol.
Replace the ->li quic_conn member, a pointer to a struct listener, by
->target, which is an object type enum, and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is where the code is specific to one connection type
(frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for the QUIC backend. qc_prep_pkts() is
modified to prevent it from building more than an MTU. As a consequence,
qc_send_ppkts() is prevented from using GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
Disable the code around SSL_get_peer_quic_transport_params(), as was done
for USE_QUIC_OPENSSL_COMPAT, because SSL_get_peer_quic_transport_params()
is not defined by the OpenSSL 3.5 QUIC API.
quic_tls_compat_keylog_callback() is the callback used by the QUIC OpenSSL
compatibility module to derive the TLS secrets from other secrets provided
by keylog. The <write> local variable of this function is initialized to
denote the direction (write to send, read to receive) the secret is supposed
to be used for. That said, as the QUIC cryptographic algorithms are
symmetrical, the direction is inverted between the peers: a secret which is
used to write/send/cipher data from one peer's point of view is also the
secret which is used to read/receive/decipher data on the other side. This
was confirmed by the fact that, without this patch, the TLS stack first
provides the peer with the Handshake secret to send/cipher data. The client
could not use such a secret to decipher the Handshake packets received from
the server. This patch simply inverts the direction stored in the <write>
variable to make the secrets derivation work for the QUIC client.
The quic_tls_compat_init() function is called from the OpenSSL QUIC
compatibility module (USE_QUIC_OPENSSL_COMPAT) to initialize the keylog
callback and the callback which stores the QUIC transport parameters as a
TLS extension into the stack.
These callbacks must also be initialized for QUIC backends.
This is done from the TLS secrets derivation callback at the Application
level (the last encryption level), calling
SSL_get_peer_quic_transport_params() to get access to the TLS transport
parameters extension embedded into the Server Hello TLS message.
Then, quic_transport_params_store() is called to store a decoded version of
these transport parameters.
For connections to QUIC servers, this patch modifies the moment where the
I/O handler callback is switched to quic_conn_app_io_cb(). This is no
longer done, as for listeners, just after the handshake has completed, but
rather just after it has been confirmed.
Discard the Initial packet number space as soon as possible. This is done
during handshakes in quic_conn_io_cb() as soon as a Handshake packet could
be successfully sent.
The initialization of the <ssl_app_data_index> SSL user data index is
required to make all the SSL sessions to QUIC servers work, as is done for
TCP servers. The conn object is notably retrieved from there by the SSL
callbacks which are server specific (e.g. ssl_sess_new_srv_cb()).
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet type to QUIC_PACKET_TYPE_0RTT if it already
matched as QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagrams with ack-eliciting packets
inside. This cannot be checked from quic_rx_pkt_parse() because one does
not know if there are ack-eliciting frames in the Initial packets. If the
packet must be dropped, this is done after having parsed it!
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
Instead, the object type address of the connection socket owner is passed,
to support haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy of the owner of the connection.
quic_rx_pkt_parse(), called by quic_dgram_parse(), is also modified to use
the object type address used by the latter as its last parameter. It is
also modified to send Retry packets only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
a non-null token length. It is also not supposed to receive 0-RTT packets,
which are dropped.
The QUIC datagram redispatch is there to counter the race condition which
exists only for QUIC connections to listeners, where datagrams may arrive
on the wrong socket between the bind() and connect() calls.
Run this code part only for listeners.
Allocate a connection to connect to QUIC servers from qc_conn_init(), which
is the ->init() QUIC xprt callback.
Also initialize the ->prepare_srv and ->destroy_srv callbacks as is done
for TCP servers.
For haproxy QUIC servers (or QUIC clients), the peer is considered as
validated. This is a property which is more specific to QUIC servers
(haproxy QUIC listeners).
No <odcid> is used for the QUIC client connection. It is used only on the
QUIC server side.
The <token_odcid> is also not used on the QUIC client side. It must be
embedded into the transport parameters only on the QUIC server side.
The quic_conn is created before the socket allocation. So, the local
address is zeroed.
Initialize the transport parameters with qc_srv_params_init().
Stop hardcoding the value passed as the <server> parameter to
qc_new_isecs() to correctly initialize the Initial secrets.
Modify quic_connect_server(), which is the ->connect() callback for the
QUIC protocol:
- add a BUG_ON() run when entering this function: the <fd> socket must
equal -1
- conn->handle is a union. conn->handle.qc is used for QUIC connections;
conn->handle.fd must not be used to store the fd.
- code alignment fix for the setsockopt(fd, SOL_SOCKET, (SO_SNDBUF|SO_RCVBUF))
statements
- remove the section of code which was duplicated from the ->connect() TCP
callback
- fd_insert() the new socket file descriptor created to connect to the QUIC
server, with quic_conn_sock_fd_iocb() as callback for read events.
This patch only adds a <proto_type> new proto_type enum parameter and a
<sock_type> socket type parameter to sock_create_server_socket() and adapts
its callers. This is to prepare the use of this function by QUIC
servers/backends.
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter.
It is NULL for a QUIC listener, not NULL for a QUIC server. This connection
object is set as the value of the ->conn quic_conn struct member.
Initialize the SSL session object from this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to take the SSL object
as parameter. This is the only parameter this function needs; the <qc>
parameter is used only for the trace.
SSL_do_handshake() must be called as soon as the SSL object is initialized
for the QUIC backend connection. This triggers the TLS CRYPTO data
delivery. tasklet_wakeup() is also called to send these CRYPTO data asap.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
Implement ssl_sock_new_ssl_ctx() to allocate an SSL server context, as is
currently done for TCP servers and also for QUIC servers, depending on the
<is_quic> boolean value passed as a new parameter. For QUIC servers, this
function calls ssl_quic_srv_new_ssl_ctx(), which is specific to QUIC.
From connect_server(), the QUIC protocol could not be retrieved by
protocol_lookup() because of the PROTO_TYPE_STREAM default passed as
argument. Instead, to support QUIC, srv->addr_type.proto_type may be safely
passed.
The QUIC servers' xprts have already been set at server line parsing time.
This patch prevents the QUIC servers' xprts from being reset to the
<ssl_sock> value, which is the value used for SSL/TCP connections.
Add the ->quic_params new member to the server struct.
Also set the ->xprt member of the server being initialized and initialize
its transport parameters asap from _srv_parse_init().
This XPRT callback is called from check_config_validity() after the configuration
has been parsed to initialize all the SSL server contexts.
This patch implements the same thing for the QUIC servers.
Add a little check to verify that the version chosen by the server matches
the client one. Initialize the local transport parameters
->negotiated_version value with this version if this is the case. If not,
0 is returned.
According to the RFC, a QUIC client must encode the QUIC versions it
supports into the "Available Versions" field of the "Version Information"
transport parameter, ordered by descending preference.
This is done by defining <quic_version_2> and <quic_version_draft_29>, new
variables pointing to the corresponding elements of the <quic_versions>
array.
A client announces its available versions as follows: v1, v2, draft29.
Activate QUIC protocol support for MUX-QUIC on the backend side, in
addition to the current frontend support. This change is mandatory to be
able to implement QUIC on the backend side.
Without this modification, it is impossible to explicitly activate the
QUIC protocol on a server line, hence an error is reported:
config : proxy 'xxxx' : MUX protocol 'quic' is not usable for server 'yyyy'
Mark QUIC address support for servers as experimental on the backend
side. Previously, it was allowed but wouldn't function as expected. As
QUIC backend support requires several changes, it is better to declare
it as experimental first.
QUIC is not implemented on the backend side. To prevent any issue, it is
better to reject any configured server which uses it. This is done via
_srv_parse_init(), which is used both for static and dynamic servers.
This should be backported up to all stable versions.
The functions responsible to extract data from the applet input buffer or to
push data into the applet output buffer are now relying on the newly added
functions in the applet API. This simplifies a bit the code.
qc_alloc_ssl_sock_ctx() allocates an SSL_CTX object for each connection. It
also allocates an SSL object. When this function failed, it freed only the
SSL_CTX object. The correct way to free both of them is to call
qc_free_ssl_sock_ctx().
Must be backported as far as 2.6.
If an empty argument is used in configuration, for example due to an
undefined environment variable, the rest of the line is not parsed. As
such, a warning is emitted to report this.
The warning was not totally correct as it reported the wrong argument
index. This patch fixes that. Note that there is still an issue with
the "^" indicator, but this is not as easy to fix yet.
This is related to github issue #2995.
This should be backported up to 3.2.
Hide the warning about an empty argument outside of discovery mode. This is
necessary, else the message will be displayed twice, which hampers
haproxy output legibility.
This should fix github issue #2995.
This should be backported up to 3.2.
Empty lines were not properly parsed and could lead to crashes because the
last argument was parsed outside of the cmdline buffer. Indeed, the last
argument is parsed to look for an eventual payload pattern. It is started
one character after the newline at the end of the command line. But this is
only valid for a non-empty command line.
So, now, this case is properly detected and we leave when an empty line is
detected.
This patch must be backported to 3.2.
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside a "shared" sub-struct member under
be_counters and fe_counters.
Now that they are properly identified, they would greatly benefit from
being shared over thread groups to reduce the cost of atomic operations
when updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the
stat to be allocated global.nbtgroups times. When updating a counter
without concurrency, the first counter from the array may be updated.
To consult the shared counters (which requires aggregation of the per-tgid
individual counters), some helper functions were added to counters.h to
ease code maintenance and avoid computation errors, as sketched below.
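A hedged sketch of such a read-side helper (the real helpers in counters.h
may differ):

    static inline uint64_t counters_shared_sum(const uint64_t *vals)
    {
        uint64_t sum = 0;
        int it;

        /* one slot per thread group; aggregate them for consultation */
        for (it = 0; it < global.nbtgroups; it++)
            sum += HA_ATOMIC_LOAD(&vals[it]);
        return sum;
    }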
Proxy, listener and server shared counters are now managed via helpers
added in one of the previous commits.
When the guid is not set (ie: not yet assigned), the shared counters
pointer is allocated using calloc() (local memory) and a flag is set on the
shared counters struct to know how to manipulate it (and free it). Else, if
the guid is set, then it means that the counters may be shared, so while
for now we don't actually use a shared memory location, the API is ready
for that.
The way it works: for proxies and servers (for which the guid is not known
during creation), we first call counters_{fe,be}_shared_get with the guid
not set, which results in a local pointer being retrieved (as if we had
just manually called calloc()). Later (during postparsing), if the guid is
set, we try to upgrade the pointer from local to shared.
Lastly, since the memory location of some objects' counters (proxies and
servers) may change from creation to postparsing, let's update the
counters->last_change member directly under counters_{fe,be}_shared_get()
so we don't miss it.
No change of behavior is expected, this is only preparation work.
Create include/haproxy/counters.h and src/counters.c files to anticipate
further helpers, as some counters-specific tasks need to be carried
out, and since counters are shared between multiple object types (ie:
listener, proxy, server..) we need generic helpers.
Add some shared counters helpers which are not yet used but will be put to
use in upcoming commits.
Shareable counters are now tagged as shared counters and are dynamically
allocated in a separate memory area as a prerequisite for being stored
in a shared memory area. For now, GUID and thread groups are not taken into
account; this is only a first step.
Also, we ensure all counters are now manipulated using atomic operations;
namely, the "last_change" counter is now read from and written to using
atomic ops.
Despite the numerous changes caused by the counters being moved away from
the counters struct, no change of behavior should be expected.
As reported by Ilya in GH #2994, some cleanup parts in the
sink_new_from_logger() function are not used.
We can actually simplify the cleanup logic to remove dead code; let's
do that by renaming the "error_final" label to "error" and only making use
of the "error" label, because sink_free() already takes care of proper
cleanup for all sink members.
When we try to allocate a new SPOP stream, if an error is encountered,
spop_strm_destroy() is called to release the eventually allocated
stream. But it must only be called if a stream was actually allocated. If
the reported error is an SPOP stream allocation failure, we must just leave
to avoid a null-pointer dereference.
This patch should fix point 1 of the issue #2993. It must be backported as
far as 3.1.
When a healthcheck is processed, once the first wakeup has passed to start
the check, and as long as the expiration timer is not reached, only I/O
events are able to wake it up. It is an issue when there is a check timeout
defined, especially if the connect timeout is high and the check timeout is
low. In that case, the healthcheck's task is never requeued to handle any
timeout update. When the connection is established, the check timeout is
set to replace the connect timeout. It is thus possible to report a success
while a timeout should have been reported.
So, now, when an I/O event is handled, the healthcheck is requeued, except
if a success or an abort is reported.
Thanks to Thierry Fournier for the report and the reproducer.
This patch must be backported to all stable versions.
In fwlc_srv_reposition(), set the server's tree_elt while we still hold
the lbprm read lock. While it was protected from concurrent
fwlc_srv_reposition() calls by the server's lb_lock, it was not protected
from the dequeuing/requeuing that could occur if the server goes down/up or
its weight is changed, and that would lead to inconsistencies, with the
watchdog killing the process because it is stuck in an infinite loop in
fwlc_get_next_server().
This hopefully fixes github issue #2990.
This should be backported to 3.2.
Rename the _srv_postparse() internal function to srv_init() and group
srv_init_per_thr() plus the idle conns list init inside it. This way we can
perform some simplifications, as srv_init() performs multiple server
init steps after parsing.
The SRV_F_CHECKED flag was added; it is automatically set when srv_init()
runs successfully. If the flag is already set and srv_init() is called
again, nothing is done. This permits manually calling srv_init() earlier
than the default POST_CHECK hook when needed, without risking doing things
twice.
While new_server() takes the parent proxy as argument and even assigns
srv->proxy to the parent proxy, it didn't actually insert the server
into the parent proxy's server list on success.
The result is that sometimes we add the server to the list after
new_server() is called, and sometimes we don't.
This is really error-prone, and because of that, hooks such as
REGISTER_POST_SERVER_CHECK(), which are run for all servers listed in
all proxies, may not be relied upon for servers which are not actually
inserted in their parent proxy's server list. Plus it feels very strange
to have a server that points to a proxy while the proxy doesn't know
about it because it cannot find it in its server list.
To prevent errors and make the proxy->srv list reliable, we move the
insertion logic directly under new_server(). This requires knowing if we
are called during parsing or during runtime, to either insert or append the
server to the parent proxy list. For that we use the PR_FL_CHECKED flag
from the parent proxy (if the flag is set, then the proxy was checked, so
we are past the init phase, thus we assume we are called during runtime).
This implies that during startup, if new_server() has to be cancelled on
error paths, we need to call srv_detach() (which is now exposed in
server.h) before srv_drop().
The consequence of this commit is that REGISTER_POST_SERVER_CHECK() can now
run reliably on all servers created using new_server(), without having
to manually loop on the global servers_list.
REGISTER_POST_PROXY_CHECK() used to iterate over "main" proxies to run
registered callbacks. This means hidden proxies (and their servers) did
not get a chance to get post-checked and could cause issues if some post-
checks are expected to be executed on all proxies no matter their type.
Instead we now rely on the global proxies list. Another side effect is that
the REGISTER_POST_SERVER_CHECK() now runs as well for servers from proxies
that are not part of the main proxies list.
postcheck_log_backend() checks are executed no matter if the proxy
actually has the backend capability, while the checks actually depend
on it.
Let's fix that by adding an extra condition to ensure that the BE
capability is set.
This issue is not tagged as a bug because for now it remains impossible
to have a syslog proxy without the BE capability in the main proxy list,
but this may change in the future.
We have the global proxies_list pointer which is announced as the list of
"all existing proxies", but in fact it only represents regular proxies
declared in the config file through "listen", "frontend" or "backend"
keywords.
It is ambiguous, and we currently don't have a straightforward method to
iterate over all proxies (either public or internal ones) within haproxy.
Instead we still have to manually iterate over multiple lists (main
proxies, log-forward proxies, peer proxies..), which is error-prone.
In this patch we add a struct list member (8 bytes) inside struct proxy
in order to store every proxy (except default ones) within a global
"proxies" list which is actually representative of all proxies existing
under the haproxy process, like we already have for servers.
proxy_cond_disable() collects and prints cumulated connections for be and
fe proxies no matter their type. With shared stats it may cause issues
because depending on the proxy capabilities only fe or be counters may
be allocated.
In this patch we add some checks to ensure we only try to read from
valid memory locations, else we rely on default values (0).
The init_srv_requeue() and init_srv_slowstart() functions are called after
initial server parsing via the REGISTER_POST_SERVER_CHECK() hook, and they
are also manually called for dynamic servers after the server is
initialized.
This may conflict with _srv_postparse(), which is also registered via
REGISTER_POST_SERVER_CHECK() and called during dynamic server creation.
To ensure the functions don't conflict with each other, let's ensure they
are executed in proper order by calling init_srv_requeue() and
init_srv_slowstart() from _srv_postparse(), which now becomes the parent
function for server-related postparsing stuff. No change of behavior is
expected.
In resolve_sym_name() we declare a few symbols that we want to be able
to resolve. ha_dump_backtrace() was declared with a struct buffer instead
of a pointer to such a struct, which has no effect since we only want to
get the function's pointer, but produces a build warning with LTO, so
let's fix it.
This can be backported to 3.0.
It is not expected/supported to reuse an httpclient instance to process
several requests. A new instance must be created for each request. However,
in lua, there is nothing to prevent a user from creating an httpclient
object and using it in a loop to process requests.
That's unfortunate because this will apparently work: the requests will be
sent and a response will be received and processed. However, internally
some resources will be allocated and never released. When the next response
is processed, the resources allocated for the previous one are definitively
lost.
In this patch we take care to check that the httpclient object was never
used when a request is sent from a lua script, by checking the
HTTPCLIENT_FS_STARTED flag. This flag is set when an httpclient applet is
spawned to process a request and never removed after that. In lua, the
httpclient applet is created when the request is sent. So, it is the right
place to do this test.
This patch should fix the issue #2986. It should be backported as far as
2.6.
An optional timeout was added to AppletTCP.receive() to interrupt calls
after a delay. It was mandatory to be able to implement interactive applets
(like trisdemo). However, this broke the API and made it impossible to
differentiate shutdowns from delay expirations. Indeed, in both cases, an
empty string was returned.
Because historically an empty string was used to notify a connection
shutdown, it should not be changed. So now, the 'nil' value is returned
when no data was available before the delay expired.
The new AppletTCP:try_receive() function was also affected. To fix it,
instead of stating there is no delay when a receive is attempted, an
expired delay is set. Concretely, TICK_ETERNITY was replaced by now_ms.
Finally, AppletTCP:getline() function is not concerned for now because there
is no way to interrupt it after some delay.
The documentation and trisdemo lua script were updated accordingly.
This patch depends on "BUG/MEDIUM: hlua: Properly detect shudowns for TCP
applets based on the new API". However, it is a 3.2-specific issue, so no
backport is needed.
The commit e5e36ce09 ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work
with applet's buffers") fixed the TCP applets API to work with applets
using their own buffers. However, the getline() function was not updated.
It could be an issue for anyone registering CLI commands reading lines.
This patch should be backported as far as 3.0.
The internal function responsible for receiving data for TCP applets with
internal buffers is buggy. Indeed, for these applets, the buffer API is
used to get data, so there are no tests on the SE to properly detect
connection shutdowns. This detection must thus be performed by hand after
the call to b_getblk_nc().
This patch must be backported as far as 3.0.
The commit 03dc54d802 ("BUG/MINOR: ring: Fix I/O handler of "show event"
command to not rely on the SC") introduced a regression. By removing
dependencies on the SC, a test to detect client shutdowns was removed. So
now, the CLI applet is no longer released when the client shut the
connection during a "show event -w".
So of course, we should not use the SC to detect the shutdowns. But the SE
must be used instead.
It is a 3.2-specific issue, so no backport needed.
It is now possible to set a label on a bind line. All sockets attached to
this bind line inherit this label. The idea is to be able to group
sockets. For now, there is no mechanism to create these groups; this must
be done by hand.
Since commit 2c3d656f8 ("MEDIUM: h3: use absolute URI form with
:authority"), the absolute URI form is used when a ':authority'
pseudo-header is found. However, this URI was not declared as normalized
internally. So, when the request is reformatted to be sent to an h1 server,
the absolute-form is used instead of the origin-form. It is unexpected and
may be an issue for some servers that could reject the request.
So, now, we take care to set the HTX_SL_F_HAS_AUTHORITY flag on the HTX
message when an authority was found, and the HTX_SL_F_NORMALIZED_URI flag
is set for "http" or "https" schemes.
No backport needed because the commit above must not be backported. It
should fix a regression reported on the 3.2-dev17 in issue #2977.
This commit depends on "BUG/MINOR: h3: Set HTX flags corresponding to the
scheme found in the request".
When a ":scheme" pseudo-header is found in a h3 request, the
HTX_SL_F_HAS_SCHM flag must be set on the HTX message. And if the scheme is
'http' or 'https', the corresponding HTX flag must also be set. So,
respectively, HTX_SL_F_SCHM_HTTP or HTX_SL_F_SCHM_HTTPS.
It is mainly used to send the right ":scheme" pseudo-header value to H2
server on backend side.
This patch could be backported as far as 2.6.
When "vary" is enabled, we can have multiple entries for a given primary
key in the cache tree. There is a limit to how many secondary entries
can be inserted for a given key. When we try to insert a new secondary
entry, if the limit is already reached, we can try to find expired
entries with the same primary key, and if the limit is still reached we
want to abort the current insertion and to remove the node that was just
inserted.
In commit "a29b073: MEDIUM: cache: Add refcount on cache_entry" though,
a regression was introduced. Instead of removing the entry just inserted
as the comments suggested, we removed the second to last entry and
returned NULL. We then reset the eb.key of the cache_entry in the caller
because we assumed that the entry was already removed from the tree.
This means that some entries with an empty key were wrongly kept in the
tree, and the last secondary entry, which keeps track of the number of
secondary entries of a given key, was removed.
This ended up causing some crashes later on when we tried to iterate
over the elements of this given key. The crash could occur in multiple
places, either when trying to retrieve an entry or to add some new ones.
This crash was raised in GitHub issue #2950.
The fix should be backported up to 3.0.
A valid build warning was reported in the CI with latest commit b40ce97ecc
("BUG/MEDIUM: server: fix crash after duplicate GUID insertion"). Indeed,
if the first test in the function fails, we branch to the err label
with guid==NULL and will crash there. Let's just test guid before
dereferencing it for freeing.
This needs to be backported to 3.0 as well since the commit above was
meant to go there.
On "add server", if a GUID is defined, guid_insert() is used to add the
entry into the global GUID tree. If a similar entry already exists, GUID
insertion fails and the server creation is eventually aborted.
A crash could occur in this case because of an invalid memory access via
guid_remove(). The latter is called via free_server() as the server
insertion is rejected. The invalid access occurs on the GUID key.
The issue occurs because of guid_insert(). The function properly
deallocates the GUID key on duplicate insertion, but it fails to reset
<guid.node.key> to NULL. This caused the invalid memory access in
guid_remove(). To fix this, ensure that the key member is properly reset
on the guid_insert() error path, as sketched below.
This must be backported up to 3.0.
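A minimal sketch of the error path fix in guid_insert():

    /* duplicate key: release it and reset the member so that a later
     * guid_remove() from free_server() does not access freed memory */
    free(guid->node.key);
    guid->node.key = NULL;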
Implement stress mode on "add server help". This ensures that the
command is fully reentrant on a full output buffer.
For testing, it requires compilation with USE_STRESS and global setting
"stress-level 1".
Implement "help" as a sub-command for the "add server" CLI command. The
objective is to list all the keywords that are supported for dynamic
servers. The CLI IO handler and add_srv_ctx are used to support reentrancy
on a full output buffer.
Now that this command is implemented, the outdated keyword list on "add
server" from management documentation can be removed.
Extend "add server" to support an IO handler function named
cli_io_handler_add_server(). A context object is also defined whose
usage will depend on IO handler capabilities.
The IO handler is skipped when "add server" is run in default mode, i.e. on
a dynamic server creation. Thus, currently the IO handler is unneeded.
However, it will become useful to support sub-commands for "add server".
Note that the return value of the "add server" parser has been changed on
server creation success. Previously, it was used incorrectly to report
whether the server was inserted or not. In fact, the parser return value is
used by the CLI generic code to detect if command processing has been
completed, or should continue to the IO handler. Now, "add server" always
returns 1 to signal that CLI processing is completed. This is necessary to
preserve the CLI output emitted by the parser, even now that an IO handler
is defined for the command. Previously, output was emitted in all
situations due to the IO handler not being defined. See the code snippet
below from cli.c for a better overview:
if (kw->parse && kw->parse(args, payload, appctx, kw->private) != 0) {
ret = 1;
goto fail;
}
/* kw->parse could set its own io_handler or io_release handler */
if (!appctx->cli_ctx.io_handler) {
ret = 1;
goto fail;
}
appctx->st0 = CLI_ST_CALLBACK;
ret = 1;
goto end;
Currently there is "no-tls-tickets", which is also supported by the
ssl-default-bind-options directive, but there's no way to re-enable
tickets on a specific "bind" line. This patch simply provides the option
to re-enable them, as illustrated below. Note that the flag is inverted
because tickets are enabled by default and the no-tls-tickets option sets
the flag to disable them.
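An illustrative configuration, assuming the re-enabling keyword is named
"tls-tickets":

    global
        ssl-default-bind-options no-tls-tickets

    frontend fe
        # tickets disabled by default above, re-enabled on this line only
        bind :443 ssl crt /etc/haproxy/site.pem tls-tickets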
Several users already reported that it would be nice to support
strict-sni in ssl-default-bind-options. However, in order to support
it, we also need an option to disable it.
This patch moves the setting of the option from the strict_sni field
to a flag in the ssl_options field so that it can be inherited from
the default bind options, and adds a new "no-strict-sni" directive to
allow disabling it on a specific "bind" line, as illustrated below.
The test file "del_ssl_crt-list.vtc" which already tests both options
was updated to make use of the default option and the no- variant to
confirm everything continues to work.
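An illustrative configuration combining both:

    global
        ssl-default-bind-options strict-sni

    frontend fe
        # inherited strict-sni relaxed on this specific bind line
        bind :443 ssl crt /etc/haproxy/site.pem no-strict-sni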
In the Prometheus exporter, the last health check status is already
exposed, with its code and duration in seconds. The server status is also
exposed. But information about the agent check is not available. It is not
really handy because when a server status is changed because of the agent,
it is not obvious when looking at the Prometheus metrics. Indeed, the
server may be reported as DOWN for instance, while the health check status
still reports a success. Being able to get the agent status in that case
could be valuable.
So now, the last agent check status is exposed, with its code and duration
in seconds. The following metrics can now be collected:
* haproxy_server_agent_status
* haproxy_server_agent_code
* haproxy_server_agent_duration_seconds
Note that unlike the other metrics, no per-backend aggregated metric is
exposed.
This patch is related to issue #2983.
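Illustrative scrape output for these metrics (the proxy/server label pair
follows the exporter's usual convention; values are examples):

    haproxy_server_agent_status{proxy="be",server="srv1"} 1
    haproxy_server_agent_code{proxy="be",server="srv1"} 0
    haproxy_server_agent_duration_seconds{proxy="be",server="srv1"} 0.004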
It was mentioned during the development of glitches that it would be
nice to support not killing misbehaving connections below a certain
CPU usage so that poor implementations that routinely misbehave without
impact are not killed. This is now possible by setting a CPU usage
threshold under which we don't kill them via this parameter. It defaults
to zero so that we continue to kill them by default.
Insert some missing include statements in QUIC source files. This was
detected with the next commit, which adjusts the include list used in the
quic_conn-t.h file.
The quic-conn layer has to handle STREAM frames itself after MUX release.
If the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.
Thus it's necessary that the quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via the <nb_streams> member array in the quic-conn structure.
Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for the
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on them. Also, bidirectional
streams are always opened by the remote side.
Previously, <nb_streams> was set by the quic-stream layer. Now, the
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn does not use
them if the MUX is available.
Note that previously, IDs were used relative to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
Move the general function to check if a stream is uni or bidirectional from
the QUIC MUX to the quic_utils module, as sketched below. This should
prevent unnecessary includes of the QUIC MUX header file in other sources.
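A hedged sketch of such helpers, following the RFC 9000 stream ID encoding
(bit 0x1: server-initiated, bit 0x2: unidirectional); the real function
names and location may differ:

    static inline int quic_stream_is_uni(uint64_t id)
    {
        return !!(id & 0x2);
    }

    static inline int quic_stream_is_bidi(uint64_t id)
    {
        return !(id & 0x2);
    }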
Stop emitting \n in errmsg for intermediate error messages; this was
emitting multiline logs and was jumping to a new line in the middle of
sentences.
We don't need to emit them in acme_start_task() since the errmsg is
output in a send_log, which already contains a \n, or on the CLI, which
also emits one.
When starting the ACME task with a ckch_conf which does not contain the
domains, the ACME task would segfault because it would try to dereference
NULL in this case.
This patch fixes the issue by emitting a warning when no domains are
configured. It's not done at configuration parsing time because it is not
easy to emit the warning there: there is no callback system which gives
access to the whole ckch_conf once a line is parsed.
No backport needed.
RX buffer allocation has been reworked in current dev tree. The
objective is to support multiple buffers per QCS to improve upload
throughput.
RX buffer allocation failure is handled simply: the whole connection is
closed. This is done via qcc_set_error(), with INTERNAL_ERROR as error
code. This function contains a BUG_ON() to ensure it is called only one
time per connection instance.
On RX buffer alloc failure, the aforementioned BUG_ON() crashes due to a
double invocation of qcc_set_error(): first by qcs_get_rxbuf(), and
immediately after it by qcc_recv(), which is the caller of the previous
one. This regression was introduced by the following commit.
60f64449fb
MAJOR: mux-quic: support multiple QCS RX buffers
To fix this, simply remove the qcc_set_error() invocation in
qcs_get_rxbuf(). On buffer alloc failure, qcc_recv() is responsible for
setting the error.
This does not need to be backported.
The build failed on mips32 with a 64-bit time_t here:
https://github.com/haproxy/haproxy/actions/runs/15150389164/job/42595310111
Let's just turn the "remain" variable used to show the remaining time
into a more portable ullong and use %llu for all format specifiers,
since long remains limited to 32-bit on 32-bit archs.
No backport needed.
When building on MIPS-32 with gcc-9.5 and glibc-2.31, I got this:
src/ssl_trace.c: In function 'ssl_trace':
src/ssl_trace.c:118:42: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'ssize_t' {aka 'const int'} [-Wformat=]
118 | chunk_appendf(&trace_buf, " : size=%ld", *size);
| ~~^ ~~~~~
| | |
| | ssize_t {aka const int}
| long int
| %d
Let's just cast the type. No backport needed.
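The corrected call is simply:

    chunk_appendf(&trace_buf, " : size=%ld", (long)*size);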
With commit a06c215f08 ("MEDIUM: wdt: always make the faulty thread
report its own warnings"), when the TH_FL_STUCK flag was flipped on,
we'd then go to the panic code instead of giving a second chance like
before the commit. This can trigger rare cases that only happen with
moderate loads like was addressed by commit 24ce001771 ("BUG/MEDIUM:
wdt: fix the stuck detection for warnings"). This is in fact due to
the loss of the common "goto update_and_leave" that used to serve
both the warning code and the flag setting for probation, and it's
apparently what hit Christian in issue #2980.
Let's make sure we exit naturally when turning the bit on for the
first time. Let's also update the confusing comment at the end of
the check that was left over by latest change.
Since the first commit was backported to 3.1, this commit should be
backported there as well.
For an unidentified reason, SSL_do_handshake() succeeds at its first call
when 0-RTT is enabled for the connection. This behavior looks very similar
to the one encountered with the AWS-LC stack. That said, it was documented
by AWS-LC. This issue leads the connection to stop sending handshake
packets after having released the handshake encryption level. In fact, no
handshake packets could even be sent, leading the handshake to always fail.
To fix this, this patch simulates a "handshake in progress" state waiting
for the application level read secret to be established by the TLS stack.
This may happen only after the QUIC listener has completed/confirmed the handshake
upon handshake CRYPTO data receipt from the peer.
A QUIC endpoint must send its transport parameters using a custom TLS
extension. This extension is reset by SSL_set_SSL_CTX(). It can be restored
by calling quic_ssl_set_tls_cbs() (which calls SSL_set_quic_tls_cbs()).
The quic_conn struct is modified for two reasons. The first one is to store
the encoded version of the local transport parameters, as is done for
USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameters "should
remain valid until after the parameters have been sent" as mentioned in the
SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer
attached to the quic_conn object. qc_ssl_set_quic_transport_params() is the
function whose role is to call SSL_set_tls_quic_transport_params() (aliased
by SSL_set_quic_transport_params()) to set these local transport parameters
into the TLS stack from the buffer attached to the quic_conn struct.
The second quic_conn struct modification is the addition of the new
->prot_level (SSL protection level) member, added to the quic_conn struct
to store "the most recent write encryption level set via the
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn callback (if it has been called)",
as mentioned in the SSL_set_quic_tls_cbs(3) manual.
This patch finally implements the five remaining callbacks to make the
haproxy QUIC implementation work.
OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send()) is
easy to implement. It calls ha_quic_add_handshake_data() after having
converted the qc->prot_level TLS protection level value to the correct
ssl_encryption_level_t (boringSSL API/quictls) value.
OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd())
provides the non-contiguous addresses to the TLS stack, without releasing
them.
OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd())
releases these non-contiguous buffers, relying on the fact that the list of
encryption levels (qc->qel_list) is correctly ordered by the order in which
the TLS stack establishes the SSL protection level secrets.
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_yield_secret())
is a simple wrapper over ha_quic_set_encryption_secrets(), which is used
by the boringSSL/quictls API.
OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params())
has the role of storing the transport parameters received from the peer. It
simply calls quic_transport_params_store() and sets them into the TLS stack
by calling qc_ssl_set_quic_transport_params().
Also add some comments for all the OpenSSL 3.5 QUIC API callbacks.
This patch has no impact on the other uses of the QUIC API provided by the
other TLS stacks.
This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is
available and detected at compilation time. The detection, performed in
openssl-compat.h, relies on the presence of the
OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro. Indeed, this macro has been defined
by OpenSSL since version 3.5.0 and is not defined by quictls, which helps
distinguish these two TLS stacks. When the detection succeeds,
HAVE_OPENSSL_QUIC is also defined by openssl-compat.h; it is this new macro
which is then used to detect the availability of the new OpenSSL 3.5.0 QUIC
TLS API. Note that this detection is only performed when
USE_QUIC_OPENSSL_COMPAT is not requested, so USE_QUIC_OPENSSL_COMPAT and
HAVE_OPENSSL_QUIC are mutually exclusive.
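The detection logic boils down to something like this (simplified
sketch of what openssl-compat.h does):

  /* OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND is defined by OpenSSL since
   * 3.5.0 and is not defined by quictls, which distinguishes the two
   * stacks. The detection is skipped when the compatibility layer is
   * requested, making both macros mutually exclusive.
   */
  #if !defined(USE_QUIC_OPENSSL_COMPAT) && \
      defined(OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND)
  #define HAVE_OPENSSL_QUIC
  #endif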
At the same location in openssl-compat.h, the ssl_encryption_level_t enum is
defined. This enum was defined by quictls and is extensively used by the
haproxy QUIC implementation. SSL_set_quic_transport_params() is replaced by
SSL_set_quic_tls_transport_params(). SSL_set_quic_early_data_enabled()
(quictls) is also replaced by SSL_set_quic_tls_early_data_enabled() (OpenSSL).
SSL_quic_read_level() (quictls) is not defined by OpenSSL; it is only used by
the traces to log the current TLS stack decryption level (read), so a macro
makes it return -1, an unused value.
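In practice this amounts to something like the following in
openssl-compat.h; the enum values mirror the quictls definition:

  #ifdef HAVE_OPENSSL_QUIC
  /* quictls enum, redefined for builds using the OpenSSL QUIC TLS API */
  enum ssl_encryption_level_t {
      ssl_encryption_initial = 0,
      ssl_encryption_early_data,
      ssl_encryption_handshake,
      ssl_encryption_application,
  };
  /* only used by the traces: return -1, an unused value */
  #define SSL_quic_read_level(ssl) (-1)
  #endif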
Most of the differences between the quictls and OpenSSL QUIC APIs are in
quic_ssl.c, where some callbacks must be defined for these two APIs. This is
why this patch modifies quic_ssl.c to define an array of OSSL_DISPATCH
structs, <ha_quic_dispatch>, each element of which defines a callback (see the
sketch after the list below). So, this patch implements these six callbacks:
- ha_quic_ossl_crypto_send()
- ha_quic_ossl_crypto_recv_rcd()
- ha_quic_ossl_crypto_release_rcd()
- ha_quic_ossl_yield_secret()
- ha_quic_ossl_got_transport_params() and
- ha_quic_ossl_alert().
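As an illustration, the shape of this dispatch array is sketched below,
assuming the standard OSSL_DISPATCH layout from <openssl/core.h> and the
operation identifiers documented in SSL_set_quic_tls_cbs(3):

  static const OSSL_DISPATCH ha_quic_dispatch[] = {
      { OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND,
        (void (*)(void))ha_quic_ossl_crypto_send },
      { OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RECV_RCD,
        (void (*)(void))ha_quic_ossl_crypto_recv_rcd },
      { OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RELEASE_RCD,
        (void (*)(void))ha_quic_ossl_crypto_release_rcd },
      { OSSL_FUNC_SSL_QUIC_TLS_YIELD_SECRET,
        (void (*)(void))ha_quic_ossl_yield_secret },
      { OSSL_FUNC_SSL_QUIC_TLS_GOT_TRANSPORT_PARAMS,
        (void (*)(void))ha_quic_ossl_got_transport_params },
      { OSSL_FUNC_SSL_QUIC_TLS_ALERT,
        (void (*)(void))ha_quic_ossl_alert },
      OSSL_DISPATCH_END
  };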
But at this time, these implementations, which must return an int, return 0,
which is interpreted as a failure by the OpenSSL QUIC API, except for
ha_quic_ossl_alert() which is implemented the same way as for quictls. The
five remaining functions above will be implemented by the next patches to come.
ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been
moved so that they are defined for both the quictls and OpenSSL QUIC APIs.
These callbacks are attached to the SSL objects (sessions) by calling the new
qc_ssl_set_cbs() function. The latter calls the correct function to attach the
correct callbacks to the SSL objects (defined by <ha_quic_method> for quictls,
and <ha_quic_dispatch> for OpenSSL).
The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake()
have also been disabled. These functions are not defined by the OpenSSL QUIC
API. At this time, the functions which call them are still defined when
HAVE_OPENSSL_QUIC is defined.
There were no QUIC trace events to diagnose qc_ssl_sess_init() failures.
This patch adds calls to TRACE_DEVEL() into qc_ssl_sess_init() and its caller
(qc_alloc_ssl_sock_ctx()). This was useful at least to diagnose SSL context
initialization failures when porting QUIC to the new OpenSSL 3.5 QUIC API.
Should be easily backported as far as 2.6.
This code has been there since the start of the QUIC implementation. It was
supposed to initialize <ha_quic_meth> as a static BIO_METHOD object, but this
BIO_METHOD is not used at all!
Should be backported as far as 2.6 to help integrate the next patches to come.
Output a sink message when a certificate is renewed by the ACME
client.
The message is emitted on the "dpapi" sink and ends with \n\0.
Since the message contains this binary character, the -0 parameter
must be used when consulting the sink over the CLI:
Example:
$ echo "show events dpapi -nw -0" | socat -t9999 /tmp/haproxy.sock -
<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0
When used with the master CLI, @@1 should be used instead of @1 in order
to keep the connection to the worker.
Example:
$ echo "@@1 show events dpapi -nw -0" | socat -t9999 /tmp/master.sock -
<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0
On ARM with 80 cores and a single server, it's sometimes possible to see
a segfault in fwlc_get_next_server() around 600-700k RPS. It seldom
happens as well on x86 with 128 threads with the same config around 1M
RPS. It turns out that in fwlc_get_next_server(), before calling
fwlc_srv_reposition(), we have to drop the lock, and that function takes
it back again.
The problem is that anything can happen to our node during this time,
and it can be freed. Then when continuing our work, we later iterate
over it and its next to find a node with an acceptable key, and by
doing so we can visit either uninitialized memory or simply nodes that
are no longer in the tree.
A first attempt at fixing this consisted in artificially incrementing
the elements count before dropping the lock, but that turned out to be
even worse because other threads could loop forever on such an element
looking for an entry that does not exist. Maintaining a separate
refcount didn't work well either, and it required to deal with the
memory release while dropping it, which is really not convenient.
Here we're taking a different approach consisting in simply not
trusting this node anymore and going back to the beginning of the
loop, as is done at a few other places as well. This way we can
safely ignore the possibly released node, and the test runs reliably
both on the arm and the x86 platforms mentioned above. No performance
regression was observed either, likely because this operation is quite
rare.
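The resulting pattern resembles this sketch, with purely illustrative
names:

  /* Once the lock was dropped and taken back, the node found earlier
   * may have been freed, so it must not be dereferenced anymore:
   * restart the scan from scratch instead of resuming from it.
   */
  while (1) {
      node = fwlc_find_candidate(p);     /* scan under the lock */
      if (!node)
          break;
      if (must_reposition(node)) {
          /* drops and re-takes the lock internally */
          fwlc_srv_reposition(srv_from_node(node));
          continue;                      /* don't trust <node>: redo */
      }
      /* <node> is usable here, still under the lock */
      break;
  }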
No backport is needed since this appeared with the leastconn rework
in 3.2.
If there is an alloc failure during qc_new_conn(), cleaning is done via
quic_conn_release(). However, since the commit below, an unchecked
dereference of <qc.path> is performed in the latter.
e841164a44
MINOR: quic: account for global congestion window
To fix this, simply check <qc.path> before dereferencing it in
quic_conn_release(). This is safe as it is properly initialized to NULL
during the first stage of qc_new_conn().
This does not need to be backported.
The queue length was moved to its own variable in commit 583303c48
("MINOR: proxies/servers: Calculate queueslength and use it."), however a
few places were missed in pendconn_unlink() and assign_server_and_queue()
resulting in never decreasing counts on aborted streams. This was
reproduced when injecting more connections than the total backend
could stand in TCP mode and letting some of them time out in the
queue. No backport is needed, this is only 3.2.
Since commit 9fe72bba3 ("MAJOR: leastconn; Revamp the way servers are
ordered."), there's no way to escape the loop visiting the mt_list heads
in fwlc_get_next_server if all servers in the list are saturated,
resulting in a watchdog panic. It can be reproduced with this config
and injecting with more than 2 concurrent conns:
balance leastconn
server s1 127.0.0.1:8000 maxconn 1
server s2 127.0.0.1:8000 maxconn 1
Here we count the number of saturated servers that were encountered, and
escape the loop once the number of remaining servers exceeds the number
of saturated ones (see the sketch below). No backport is needed since this
arrived in 3.2.
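For illustration, the escape logic resembles this sketch; all names are
hypothetical:

  unsigned int saturated = 0;

  /* stop scanning once it is certain that only saturated servers
   * remain, instead of spinning on the mt_list heads forever */
  while ((srv = next_candidate(p))) {
      if (!srv_is_saturated(srv))
          break;                         /* usable server found */
      if (++saturated >= p->srv_act) {   /* only saturated ones remain */
          srv = NULL;
          break;
      }
  }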
Building with clang 16 on MIPS64 yields this warning:
src/slz.c:931:24: warning: unused function 'crc32_uint32' [-Wunused-function]
static inline uint32_t crc32_uint32(uint32_t data)
^
Let's guard it using UNALIGNED_LE_OK which is the only case where it's
used. This saves us from introducing a possibly non-portable attribute.
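The fix is essentially the following guard; the function body shown here
only illustrates a plain bitwise CRC32 and is not the slz one:

  /* compile the helper only when the sole code path using it is
   * compiled too, instead of adding an "unused" attribute */
  #if defined(UNALIGNED_LE_OK)
  static inline uint32_t crc32_uint32(uint32_t data)
  {
      uint32_t crc = ~0u ^ data;
      int bit;

      for (bit = 0; bit < 32; bit++)
          crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320 : 0);
      return ~crc;
  }
  #endif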
This is libslz upstream commit f5727531dba8906842cb91a75c1ffa85685a6421.
Calling slz_rfc1950_finish() without emitting any data would result in
incorrectly emitting a gzip header (rfc1952) instead of a zlib header
(rfc1950), due to a copy-paste error between the two wrappers. The impact is
almost nonexistent since the zlib format is almost never used in this
context, and compressing totally empty messages is quite rare as well.
Let's take this opportunity to fix another mistake on an RFC number
in a comment.
This is slz upstream commit 7f3fce4f33e8c2f5e1051a32a6bca58e32d4f818.
The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architectures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that new factor 0x1af42f resulted in
smoother hashes that provided on average 0.4% better compression on both
the Silesia corpus and an mbox file composed of very compressible emails
and uncompressible attachments. It's even slightly better than CRC32C
while being faster on Skylake. This patch enables this factor on archs
with a fast multiply.
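For illustration, the change amounts to something like this; the guard
name is hypothetical, HASH_BITS is slz's hash-table size parameter, and
the exact shift form is an assumption:

  #if defined(HAVE_FAST_MULT) /* hypothetical guard for such archs */
  /* smoother distribution: ~0.4% better compression on average */
  #define slz_hash(a) ((((uint32_t)(a)) * 0x1af42f) >> (32 - HASH_BITS))
  #else
  /* previous form: shifts/additions that compilers map to a multiply */
  #define slz_hash(a) ((((a) << 19) + ((a) << 6) - (a)) >> (32 - HASH_BITS))
  #endif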
This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
If building for SSE4 and USE_CRC32C_HASH is defined, then we can use
crc32c to calculate the lookup hash. By default we don't do it because
even on Skylake it's slower than the current hash, which only involves
a short multiply (~5% slower). But the gains are marginal (0.3%).
This is slz upstream commit 44ae4f3f85eb275adba5844d067d281e727d8850.
Note: this is not used by default and only merged in order to avoid
divergence between the code bases.
On 64-bit platforms, disassembling the code shows that send_huff() performs
a left shift followed by a right one, which are the result of integer
truncation and zero-extension caused solely by using different types at
different levels in the call chain. By making encode24() take a 64-bit
int on input and send_huff() optionally take one, we can remove one shift
in the hot path and gain 1% performance without affecting other platforms.
This is slz upstream commit fd165b36c4621579c5305cf3bb3a7f5410d3720b.
In continuation with 9a05c1f574 ("BUG/MEDIUM: h2/h3: reject some
forbidden chars in :authority before reassembly") and the discussion
in issue #2941, @DemiMarie rightfully suggested that Host should also
be sanitized, because it is sometimes used in concatenation, such as
this:
http-request set-url https://%[req.hdr(host)]%[pathq]
which was proposed as a workaround for h2 upstream servers that require
:authority here:
https://www.mail-archive.com/haproxy@formilux.org/msg43261.html
The current patch then adds the same check for forbidden chars in the
Host header, using the same function as for the patch above, since in
both cases we validate the host:port part of the authority. This way
we won't reconstruct ambiguous URIs by concatenating Host and path.
Just like the patch above, this can be backported after a period of
observation.
Let's make sure we drop extraneous Host headers after having compared
them. That also works when :authority was already present. This way,
like for h1 and h2, we only keep one copy of it, while still making
sure that Host matches :authority. This way, if a request has both
:authority and Host, only one Host header will be produced (from
:authority). Note that due to the different organization of the code
and wording along the evolving RFCs, here we also check that all
duplicates are identical, while h2 ignores them as per RFC7540, but
this will be re-unified later.
This should be backported to stable versions, at least as far as 2.8, though
thanks to the existing checks the impact is probably nil.
It is especially a problem with Lua filters, but it is important to disable
the 0-copy forwarding if a filter alters the payload, or at least to be able
to disable it. While the filter is registered for data filtering, it is
not an issue (and it is the common case) because there is no way to
fast-forward data at all. But it may be an issue if a filter decides to
alter the payload and to unregister from data filtering. In that case, the
0-copy forwarding can be re-enabled in a hardly predictable state.
To fix the issue, an SC flag was added to do so. The HTTP compression filter
sets it, and Lua filters too if the body length is changed (via
HTTPMessage.set_body_len()).
Note that this issue exists because of a bad design choice in the HTX: much
of the information about the message is stored in the HTX structure itself.
It must be refactored to move several pieces of information to the
stream-endpoint descriptor. This should ease modifications at the stream
level, from filters or TCP/HTTP rules.
This should be backported as far as 3.0. If necessary, it may be backported
on lower versions, as far as 2.6. In that case, it must be reviewed and
adapted.
There was no function for a Lua filter to change the body length of an HTTP
message, although this is mandatory to be able to alter the message payload.
It is not possible to directly update the message headers because the
internal state of the message must also be updated accordingly.
This is the purpose of the HTTPMessage.set_body_len() function. The new body
length must be passed as argument. If it is an integer, the corresponding
"Content-Length" header is set. If the "chunked" string is used, it forces
the message to be chunked-encoded and, in that case, the "Transfer-Encoding"
header is set.
This patch should fix the issue #2837. It could be backported as far as 2.6.
There's a configurable limit to the number of messages sent to a
peer (tune.peers.max-updates-at-once), but this one is not applied to
the receive side. While it can usually be OK with default settings,
setups involving a large tune.bufsize (1MB and above) regularly
experience high latencies and even watchdogs during reloads because
the full learning process sends a lot of data that manages to fill
the entire buffer, and due to the compactness of the protocol, 1MB
of buffer can contain more than 100k updates, meaning taking locks
etc during this time, which is not workable.
Let's make sure the receiving side also respects the max-updates-at-once
setting. For this it counts incoming updates, and refrains from
continuing once the limit is reached. It's a bit tricky to do because
after receiving updates we still have to send ours (and possibly some
ACKs) so we cannot just leave the loop.
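The idea resembles this sketch, with illustrative names; the point is to
stop demuxing without skipping the send path that follows:

  unsigned int rx_updates = 0;

  /* bound the number of updates applied per receive pass */
  while (peer_recv_msg(peer, &msg) > 0) {
      if (!msg_is_update(&msg))
          continue;
      peer_apply_update(peer, &msg);     /* takes stick-table locks */
      if (++rx_updates >= max_updates_at_once)
          break;  /* stop demuxing; updates and ACKs are still sent below */
  }
  /* then emit our own updates and any pending ACKs as usual */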
This issue was reported on 3.1 but it should progressively be backported
to all versions having the max-updates-at-once option available.
using "send-proxy" or "send-proxy-v2" option on a ring server is not
relevant nor supported. Worse, on 2.4 it causes haproxy process to
crash as reported in GH #2965.
Let's be more explicit about the fact that this keyword is not supported
under "ring" context by ignoring the option and emitting a warning message
to inform the user about that.
Ideally, we should do the same for peers and log servers. The proper way
would be to check server options during postparsing, but we currently lack
proper cross-type server postparsing hooks. This will come later and will
give us a chance to perform the compatibility checks for server options
depending on the proxy type. But for now let's simply fix the "ring" case
since it is the only one known to cause a crash.
It may be backported to all stable versions.
Some trace messages were not really accurate, reporting a CLOSED connection
while only an error was reported on it. In addition, a TRACE_ERROR() was
used to report a short read on the HELLO/DISCONNECT frame header. But it is
not an error, so a TRACE_DEVEL() should be used instead.
This patch could be backported to 3.1 to ease future backports.
When a read error is detected, no error must be reported on the SPOP
connection if there are still data to parse. It is important to process all
data before reporting the error, to be sure not to truncate received frames.
However, we must also take care to handle the short-read case, to avoid
waiting for data that will never be received.
This patch must be backported to 3.1.
There was no test in the demux part to detect truncated frames and to report
an error at the connection level. The SPOP streams were properly switched to
half-closed state. But until the associated SPOE applets were woken up and
released, the SPOP connection could be woken up several times for nothing. I
never triggered the watchdog in that case, but it is not excluded.
Now, at the end of the demux function, a specific test was added to detect
truncated frames, to report an error and close the connection.
This patch must be backported to 3.1.
When a frame was not fully received, a short read must be reported on the
SPOP connection to help the demux to handle truncated frames. This was
performed for frames truncated on the header part but not on the payload
part. It is now properly detected.
This patch must be backported to 3.1.
The CLOSING state was not handled at all by the SPOP multiplexer, while it is
mandatory when a DISCONNECT frame is sent and the mux should wait for the
DISCONNECT frame in reply from the agent. Thanks to this patch, it should be
fixed.
In addition, if an error occurs during the AGENT HELLO frame parsing, the
SPOP connection is no longer switched to CLOSED state and remains in ERROR
state instead. This is important to be able to send the DISCONNECT frame to
the agent instead of closing the TCP connection immediately.
This patch depends on following commits:
* BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
* MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
* BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
* BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
All the series must be backported to 3.1.
SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, which were used to handle frame
parsing, were removed. The demux process now relies on the demux stream ID
to know whether it is waiting for the frame header or the frame
payload. Concretely, when the demux stream ID is not set (dsi == -1), the
demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is
waiting for the frame payload. This is especially important to be able to
properly handle DISCONNECT frames sent by the agents, as sketched below.
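A minimal sketch of the resulting demux dispatch; the parsing helpers
are illustrative names:

  /* the signed demux stream ID doubles as the parser state */
  if (spop_conn->dsi == -1) {
      /* no frame being processed: parse the next frame header */
      ret = spop_conn_recv_frame_hdr(spop_conn);
  }
  else {
      /* dsi >= 0: parse the payload of the current frame */
      ret = spop_conn_recv_frame_payload(spop_conn);
  }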
The SPOP_CS_RUNNING state is introduced to indicate that the hello handshake
was finished and that the SPOP connection is able to open SPOP streams and
exchange NOTIFY/ACK frames with the agents.
It depends on the following fixes:
* MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
* BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
This change will be mandatory for the next fix. It must be backported to 3.1
with the commits above.
After the ACK frame is parsed, it is useless to set the SPOP connection
state to SPOP_CS_FRAME_H because this will be automatically handled by
the demux function. It is not an issue, but this will simplify changes
in the next commit.
Till now, only fully closed SPOP connections, or those whose TCP connection
was in error, were concerned. But available streams could still be reported
for SPOP connections in error or in closing state, while in these states no
NOTIFY frames will be sent and no ACK frames will be parsed. So, no new SPOP
streams should be opened on them.
This patch should be backported to 3.1.