The new "dump ssl cert" CLI command allows to dump a certificate stored
into HAProxy memory. Until now it was only possible to dump the
description of the certificate using "show ssl cert", but with this new
command you can dump the PEM content on the filesystem.
This command is only available on a admin stats socket.
$ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock -
-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.
This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Hopefully, Github user @asmnek not only
provided useful informations, but since he was able to consistently
trigger the crash in his environment he was able to nail down the crash to
the use of pattern file involved with 2 named default sections. Big thanks
to him.
To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on "do_free" variable to know if the expression should be freed
or not (which proved to be insufficient in our case), let's switch to a
simple refcounting logic. This way, no matter who owns the expression, the
last one attempting to free it will be responsible for freeing it.
Refcount is implemented using a 32bit value which fills a previous 4 bytes
structure gap:
int mflags; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int lock; /* 88 8 */
(output from pahole)
Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theorically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
Using valgrind when running map_beg or map_str, the following error is
reported:
==242644== Conditional jump or move depends on uninitialised value(s)
==242644== at 0x2E4AB1: pat_match_str (pattern.c:457)
==242644== by 0x2E81ED: pattern_exec_match (pattern.c:2560)
==242644== by 0x343176: sample_conv_map (map.c:211)
==242644== by 0x27522F: sample_process_cnv (sample.c:1330)
==242644== by 0x2752DB: sample_process (sample.c:1373)
==242644== by 0x319917: action_store (vars.c:814)
==242644== by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697)
In fact, the error is legit, because in pat_match_{beg,str}, we
dereference the buffer on len+1 to check if a value was previously set,
and then decide to force NULL-byte if it wasn't set.
But the approach is no longer compatible with current architecture:
data past str.data is not guaranteed to be initialized in the buffer.
Thus we cannot dereference the value, else we expose us to uninitialized
read errors. Moreover, the check is useless, because we systematically
set the ending byte to 0 when the conditions are met.
Finally, restoring the older value after the lookup is not relevant:
indeed, either the sample is marked as const and in such case it
is already duplicated, or the sample is not const and we forcefully add
a terminating NULL byte outside from the actual string bytes (since we're
past str.data), so as we didn't alter effective string data and that data
past str.data cannot be dereferenced anyway as it isn't guaranteed to be
initialized, there's no point in restoring previous uninitialized data.
It could be backported in all stable versions. But since this was only
detected by valgrind and isn't known to cause issues in existing
deployments, it's probably better to wait a bit before backporting it
to avoid any breakage.. although the fix should be theoretically harmless.
This is a complementary patch to a68affeaa ("BUG/MINOR: pattern: a sample
marked as const could be written"). Indeed the same logic from
pat_match_str() is used there, but we lack the check to ensure that the
sample is not const before writing data to it.
It could be backported to all stable versions.
After commit e8b1ad4c2 ("BUG/MEDIUM: clock: also update the date offset
on time jumps"), @firexinghe mentioned that the issue was still present
in their case. In fact it depends on the load, which affects the
probability that the time changes between two poll() calls vs that it
changes during poll(). The time correction code used to only deal with
the latter. But under load if it changes between two poll() calls, what
happens then is that before_poll is off, and after returning from poll(),
the date is within bounds defined by before_poll, so no correction is
applied.
After many tests, it turns out that the most reliable solution without
using CLOCK_MONOTONIC is to prevent before_poll from being earlier than
the previous after_poll (trivial), and to cover forward jumps, we need
to enforce a margin. Given that the watchdog kills a looping task within
2 seconds and that no sane setup triggers it, it seems that 2 seconds
remains a safe enough margin. This means that in the worst case, some
forward jumps of up to 2 seconds will not be corrected, leading to an
apparent fast time and low rates. But this is supposed to be an exceptional
event anyway (typically an admin or crontab running ntpdate).
For future versions, given that we now opportunistically call
now_mono_time() before and after poll(), that returns zero if not
supported, we could imagine relying on this one for the thread's local
time when it's non-null.
"http-messaging/protocol_upgrade.vtc" script was updated to test upgrades
for requests with a payload. It should fail when the request is sent to a H2
server. When sent to a H1 server, it should succeed, except if the server
replies before the end of the request.
Since 1d2d77b27 ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade
requests with a body"), it is no longer possible to perform a protocol
upgrade for requests with a payload. The main reason was to be able to
support protocol upgrade for H1 client requesting a H2 server. In that case,
the upgrade request is converted to a CONNECT request. So, it is not
possible to convey a payload in that case.
But, it is a problem for anyone wanting to perform upgrades on H1 server
using requests with a payload. It is uncommon but valid. So, now, it is the
H2 multiplexer responsibility to reject upgrade requests, on server side, if
there is a payload. An INTERNAL_ERROR is returned for the H2S in that
case. On H1 side, the upgrade is now allowed, but only if the server waits
for the end of the request to return the 101-Switching-protocol
response. Indeed, it is quite hard to synchronise the frontend side and the
backend side in that case. Asking to servers to fully consume the request
payload before returned the response seems reasonable.
This patch should fix the issue #2684. It could be backported after a period
of observation, as far as 2.4 if possible. But only if it is not too
hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both
side are in DONE state".
For now, this case is already handled for all requests except for those
waiting for a tunnel establishment (CONNECT and protocol upgrades). It is
not an issue because only bodyless requests are supported in these cases. So
the request is always finished at the end of headers and therefore before
the response.
However, to relax conditions for full H1 protocol upgrades (H1 client and
server), this case will be necessary. Indeed, the idea is to be able to
perform protocol upgrades for requests with a payload. Today, the "Upgrade:"
header is removed before sending the request to the server. But to support
this case, this patch is required to properly finish transaction when the
server does not perform the upgrade.
When HAPROXY_HTTP_LOG_FMT was added by commit 537b9e7f36 ("MINOR: config:
add environment variables for default log format"), the example was placed
by accident after the clf log format instead of the HTTP log format,
causing a bit of confusion.
This can be backported to 2.8.
Released version 3.1-dev7 with the following main changes :
- MINOR: config: Created env variables for http and tcp clf formats
- MINOR: mux-quic: add buf_in_flight to QCC debug infos
- MINOR: mux-quic: correct qcc_bufwnd_full() documentation
- MINOR: tools: add helpers to backup/clean/restore env
- MINOR: mworker: restore initial env before wait mode
- BUG/MINOR: haproxy: free init_env in deinit only if allocated
- BUILD: tools: environ is not defined in OS X and BSD
- DEV: coccinelle: add a test to detect unchecked malloc()
- DEV: coccinelle: add a test to detect unchecked calloc()
- CI: QUIC Interop AWS-LC: enable ngtcp2 client
- CI: fix missing comma introduced in 956839c0f6
- CI: QUIC Interop: do not run bandwidth measurement tests
- CI: QUIC Interop: use different artifact names for uploading logs
- BUILD: quic: 32bits build broken by wrong integer conversions for printf()
- CLEANUP: ssl: cleanup the clienthello capture
- MEDIUM: ssl: capture the supported_versions extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_supported_versions_bin sample fetch
- MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_sigalgs_bin sample fetch
- MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
- BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
- BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
- BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
- CLEANUP: haproxy: fix typos in code comment
- CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
- MINOR: tools: Implement ipaddrcpy().
- MINOR: quic: Implement quic_tls_derive_token_secret().
- MINOR: quic: Token for future connections implementation.
- BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
- MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
- MINOR: quic: Implement qc_ssl_eary_data_accepted().
- MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
- BUG/MEDIUM: quic: always validate sender address on 0-RTT
- BUILD: quic: fix build errors on FreeBSD since recent GSO changes
- MINOR: tools: extend str2sa_range to add an alt parameter
- MINOR: server: add a alt_proto field for server
- MEDIUM: sock: use protocol when creating socket
- MEDIUM: protocol: add MPTCP per address support
- BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
- MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
- MEDIUM: bwlim: Use a read-lock on the sticky session to apply a shared limit
- BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
- BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
- BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
- BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
- MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
- BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: mux-spop: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
- BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
- CLEANUP: assorted typo fixes in the code and comments
- DEV: patchbot: count the number of backported/non-backported patches
- DEV: patchbot: add direct links to show only specific categories
- DEV: patchbot: detect commit IDs starting with 7 chars
- BUG/MEDIUM: clock: also update the date offset on time jumps
- MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.
Context:
Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.
This introduces the "init-state {up|down}" configuration parameter to
the server.
- when set to 'fully-up', the server is considered immediately available
and can turn to the DOWN sate when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
available and will initiate a health check that can turn it to the DOWN
state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
will initiate a health check that can turn it to the UP state immediately
if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
and can turn to the UP state when ALL health checks succeed.
The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.
Link: https://github.com/haproxy/haproxy/issues/51
In GH issue #2704, @swimlessbird and @xanoxes reported problems handling
time jumps. Indeed, since 2.7 with commit 4eaf85f5d9 ("MINOR: clock: do
not update the global date too often") we refrain from updating the global
offset in case it didn't change. But there's a catch: in case of a large
time jump, if the poller was interrupted, the local time remains the same
and we return immediately from there without updating the offset. It then
becomes incorrect regarding the "date" value, and upon subsequent call to
the poller, there's no way to detect a jump anymore so we apply the old,
incorrect offset and the date becomes wrong. Worse, going back to the
original time (then in the past), global_now_ns remains higher than the
local time and neither get updated anymore.
What is missing in practice is to immediately update the offset when
detecting a time jump. In an ideal world, the offset would be updated
upon every call, that's what was being done prior to commit above but
it's extremely CPU intensive on large systems. However we can perfectly
afford to update the offset every time we detect a time jump, as it's
not as common.
This needs to be backported as far as 2.8. Thanks to both participants
above for providing very helpful details.
It's useful to instantly see how many patches of each category have
already been backported and are still pending, let's count them and
report them at the top of the page.
A regression was introduced in the commit 76fa71f7a ("BUG/MEDIUM: mux-pt:
Never fully close the connection on shutdown") because of a typo on the
connection flags. CO_FL_SOCK_WR_SH flag must be tested to prevent a call to
conn_sock_shutw() and not CO_FL_SOCK_RD_SH.
Concretly, most of time, it is harmeless because shutdown for writes is
always performed before any shutdown for reads. Except in case describe by
the commit above. But it is not clear if it has an impact or not.
This patch must be backported with the commit above, so as far as 2.9.
This bug arrived with this naive commit:
BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
which omitted to consider the case where the Initial packet number space
could be discarded before receiving 0-RTT packets.
To fix this, append/insert the O-RTT (early-data) packet number space
into the encryption level list depending on the presence or not of
the Initial packet number space.
This issue was revealed when using aws-lc as TLS stack in GH #2701 issue.
Thank you to @Tristan971 for having reported this issue.
Must be backported where the commit mentionned above is supposed to be
backported: as far as 2.9.
That's the equivalent of the mux-h2 one, except that here there's no
real risk to loop since normally we cannot feed data that bypass the
closed state check (e.g. no zero-copy forward). But it still remains
dirty to be able to leave and empty mbuf with MFULL and MROOM set, so
better clear them as well.
No backport is needed since this is only in 3.1.
There exists an extremely tricky code path that was revealed in 3.0 by
the glitches feature, though it might theoretically have existed before.
TL;DR: a mux mbuf may be full after successfully sending GOAWAY, and
discard its remaining contents without clearing H2_CF_MUX_MFULL and
H2_CF_DEM_MROOM, then endlessly loop in h2_send(), until the watchdog
takes care of it.
What can happen is the following: Some data are received, h2_io_cb() is
called. h2_recv() is called to receive the incoming data. Then
h2_process() is called and in turn calls h2_process_demux() to process
input data. At some point, a glitch limit is reached and h2c_error() is
called to close the connection. The input frame was incomplete, so some
data are left in the demux buffer. Then h2_send() is called, which in
turn calls h2_process_mux(), which manages to queue the GOAWAY frame,
turning the state to H2_CS_ERROR2. The frame is sent, and h2_process()
calls h2_send() a last time (doing nothing) and leaves. The streams
are all woken up to notify about the error.
Multiple backend streams were waiting to be scheduled and are woken up
in turn, before their parents being notified, and communicate with the
h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(),
fill it, and commit it with h2_done_ff(). At some point the mux's output
buffer is full, and gets flags H2_CF_MUX_MFULL.
The io_cb is called again to process more incoming data. h2_send() isn't
called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv()
may or may not do anything (doesn't matter). h2_process() is called since
some data remain in the demux buf. It goes till the end, where it finds
st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where
the mbuf is empty and MFULL is still present.
Then it calls h2_send(), which doesn't call h2_process_mux() due to
MFULL, doesn't enter the for() loop since all buffers are empty, then
keeps sent=0, which doesn't allow to clear the MFULL flag, and since
"done" was not reset, it loops forever there.
Note that the glitches make the issue more reproducible but theoretically
it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes
it not happen with the data produced on the parsing side is that we
process a single buffer of input at once, and there's no way to amplify
this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK,
WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is
cleared upon every exit from h2_process() once the error was sent, it is
not possible to accumulate response data across multiple calls. And the
regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not
produce any data there either.
Probably that h2_nego_ff() should check for H2_CS_ERROR before accepting
to deliver a buffer, but this needs to be carefully studied. In the mean
time the real problem is that the MFULL flag was kept when clearing the
buffer, making the two inconsistent.
Since it doesn't seem possible to trigger this sequence without the
zero-copy-forward mechanism, this fix needs to be backported as far as
2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM
and MUX_MFULL at more places" which will strengthen the consistency
between these checks.
Many thanks to Annika Wickert for her detailed report that allowed to
diagnose this problem. CVE-2024-45506 was assigned to this problem.
The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared
is quite complex and assumptions about its state are extremely difficult
when reading the code. There are indeed long sequences where the mux might
possibly be empty, still having the flag set until it reaches h2_send()
which will clear it after the last send. Even then it's not obviour whether
it's always guaranteed to release the flag when invoked in multiple passes.
Let's just simplify the conditionnn so that h2_send() does not depend on
"sent" anymore and that h2_timeout_task() doesn't leave the flags set on
the buffer on emptiness. While it doesn't seem to fix anything, it will
make the code more robust against future changes.
When a 400/408/500/501 error is returned by the H1 multiplexer, we first try
to get the error message of the proxy before using the default one. This may
be configured to be mapped on /dev/null or on an empty file. In that case,
no message is emitted, as expected. But everything is handled as the error
was successfully sent.
However, there is an bug here. In h1_send_error() function, this case is not
properly handled. The flag H1C_F_ABRTED is not set on the H1 connection as it
should be and h1_close() function is not called, leaving the H1 connection in an
undefined state.
It is especially an issue when a "empty" 408-Request-Time-out error is emitted
while there are data blocked in the output buffer. In that case, the connection
remains openned until the client closes and a "cR--"/408 is logged repeatedly, every
time the client timeout is reached.
This patch must backported as far as 2.8.
qc_prep_hdshk_fast_retrans() job is to pick some packets to be retransmitted
from Initial and Handshake packet number spaces. A packet may be coalesced to
a first one into the same datagram. When a coalesced packet is inspected for
retransmission, it is skipped if its length would make the total datagram length
it is attached to exceeding the anti-amplification limit. But in this case, the
first packet must be kept for the current retransmission. This is tracked by
this trace statemement:
TRACE_PROTO("will probe Initial packet number space", QUIC_EV_CONN_SPPKTS, qc);
This was not the case because of the wrong "goto end" statement. This latter
must be run only if the Initial packet number space must not be probe with
the first packet found as coalesced to another one which must be skipped.
This bug was revealed by AWS-LC interop runner with handshakeloss and
handshakecorruption which always fail because this stack leads the server
to send more Initial packets.
Thank you to Ilya (@chipitsine) for this issue report in GH #2663.
Must be backported as far as 2.6.
When several commands are chained on the master CLI, the same client
connection is used. Because, it is a TCP connection, the mux PT is used. It
means there is no stream at the mux level. It is not possible to release the
applicative stream between each commands as for the HTTP. So, to work around
this limitation, between two commands, the master CLI is resetting the
stream. It does exactly what it was performed on HTTP to manage keep-alive
connections on old HAProxy versions.
But this part was copied from a code dealing with connection only while the
back endpoint can be an applet or a mux for the master cli. The previous fix
on the mux PT ("BUG/MEDIUM: mux-pt: Never fully close the connection on
shutdown") revealed a bug. Between two commands, the back endpoint was only
released if the connection's XPRT was closed. This works if the back
endpoint is an applet because there is no connection. But for commands sent
to a worker, a connection is used. At this stage, this only works if the
connection's XPRT is closed. Otherwise, the old endpoint is never detached
leading to undefined behavior on the next command execution (most probably a
crash).
Without the commit above, the connection's XPRT is always closed on
shutdown. It is no longer true. At this stage, we must inconditionnally
release the back endpoint by resetting the corresponding sedesc to fix the
bug.
This patch must be backported with the commit above in all stable
versions. On 2.4 and lower, it will need to be adapted.
When a shutdown is reported to the mux (shutdown for reads or writes), the
connexion is immediately fully closed if the mux detects the connexion is
closed in both directions. Only the passthrough multiplexer is able to
perform this action at this stage because there is no stream and no internal
data. Other muxes perform a full connection close during the mux's release
stage. It was working quite well since recently. But, in theory, the bug is
quite old.
In fact, it seems possible for the lower layer to report an error on the
connection in same time a shutdown is performed on the mux. Depending on how
events are scheduled, the following may happen:
1. An connection error is detected at the fd layer and a wakeup is
scheduled on the mux to handle the event.
2. A shutdown for writes is performed on the mux. Here the mux decides to
fully close the connexion. If the xprt is not used to log info, it is
released.
3. The mux is finally woken up. It tries to retrieve data from the xprt
because it is not awayre there was an error. This leads to a crash
because of a NULL-deref.
By reading the code, it is not obvious. But it seems possible with SSL
connection when the handshake is rearmed. It happens when a
SSL_ERROR_WANT_WRITE is reported on a SSL_read() attempt or a
SSL_ERROR_WANT_READ on a SSL_write() attempt.
This bug is only visible if the XPRT is not used to log info. So it is no so
common.
This patch should fix the 2nd crash reported in the issue #2656. It must
first be backported as far as 2.9 and then slowly to all stable versions.
There is no reason to acquire a write-lock on the sticky session when a
shared limit is applied because only the frequency is updated. The sticky
session itself is not modified. We must just take care it is not removed in
the mean time. So a read-lock may be used instead.
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bits limit of the
frequency counters. Thanks to this factor, it is possible to have bytes
rates beyond the 4GB. Instead of counting each bytes, we count blocks
of bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limit exceeding the 4GB/s.
For now, this parameter must be in the range ]0-1024].
This bug follows this patch:
MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
where a new third variable was added to be dumped from QUIC_EV_CONN_IO_CB trace
event. The quic_trace() code did not reveal there was already another variable
passed as third argument but not dumped. This leaded to crash when dereferencing
a point to an int in place of a point to an SSL object.
This issue was reproduced only by handshakecorruption aws-lc interop test with
s2n-quic as client.
Note that this patch must be backported with this one:
BUG/MEDIUM: quic: always validate sender address on 0-RTT
which depends on the commit mentionned above.
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.
Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.
This attached patch adds MPTCP per address support, to be used with:
mptcp{,4,6}@<address>[:port1[-port2]]
MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.
These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.
MPTCP is both supported for the frontend and backend sides.
Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.
Note that this is a re-implementation of Bjrn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.
Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.
This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, when the support for this option will be added, it should
contain the correct value for the MSS, allowing to correctly
set the TCP_MAXSEG option.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]
Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Use the protocol configured for a connection when creating the socket,
instead of always using 0.
This change is needed to allow new protocol to be used when creating
the sockets, such as MPTCP. Note however that this patch won't change
anything for now, as the only other value that proto->sock_prot could
hold is IPPROTO_TCP, which has the same behavior as 0 when passed to
socket.
Add a new field alt_proto to the server structures that
specify if an alternate protocol should be used for this server.
This field can be transparently passed to protocol_lookup to get
an appropriate protocol structure.
This change allows thus to create servers with different protocols,
and not only TCP anymore.
Add a new parameter "alt" that will store wether this configuration
use an alternate protocol.
This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.
This change is needed to allow for example the servers to know if it
need to use an alternate protocol or not.
The following commits broke the build on FreeBSD when QUIC is enabled:
35470d518 ("MINOR: quic: activate UDP GSO for QUIC if supported")
448d3d388 ("MINOR: quic: add GSO parameter on quic_sock send API")
Indeed, it turns out that netinet/udp.h requires sys/types.h to be
included before. Let's just change the includes order to fix the build.
No backport is needed.
It has been reported by Wedl Michael, a student at the University of Applied
Sciences St. Poelten, a potential vulnerability into haproxy as described below.
An attacker could have obtained a TLS session ticket after having established
a connection to an haproxy QUIC listener, using its real IP address. The
attacker has not even to send a application level request (HTTP3). Then
the attacker could open a 0-RTT session with a spoofed IP address
trusted by the QUIC listen to bypass IP allow/block list and send HTTP3 requests.
To mitigate this vulnerability, one decided to use a token which can be provided
to the client each time it successfully managed to connect to haproxy. These
tokens may be reused for future connections to validate the address/path of the
remote peer as this is done with the Retry token which is used for the current
connection, not the next one. Such tokens are transported by NEW_TOKEN frames
which was not used at this time by haproxy.
So, each time a client connect to an haproxy QUIC listener with 0-RTT
enabled, it is provided with such a token which can be reused for the
next 0-RTT session. If no such a token is presented by the client,
haproxy checks if the session is a 0-RTT one, so with early-data presented
by the client. Contrary to the Retry token, the decision to refuse the
connection is made only when the TLS stack has been provided with
enough early-data from the Initial ClientHello TLS message and when
these data have been accepted. Hopefully, this event arrives fast enough
to allow haproxy to kill the connection if some early-data have been accepted
without token presented by the client.
quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN
frame with this newly implemented token to be transported inside.
quic_tls_derive_retry_token_secret() was renamed to quic_do_tls_derive_token_secre()
and modified to be reused and derive the secret for the new token implementation.
quic_token_validate() has been implemented to validate both the Retry and
the new token implemented by this patch. When this is a non-retry token
which could not be validated, the datagram received is marked as requiring
a Retry packet to be sent, and no connection is created.
When the Initial packet does not embed any non-retry token and if 0-RTT is enabled
the connection is marked with this new flag: QUIC_FL_CONN_NO_TOKEN_RCVD. As soon
as the TLS stack detects that some early-data have been provided and accepted by
the client, the connection is marked to be killed (QUIC_FL_CONN_TO_KILL) from
ha_quic_add_handshake_data(). This is done calling qc_ssl_eary_data_accepted()
new function. The secret TLS handshake is interrupted as soon as possible returnin
0 from ha_quic_add_handshake_data(). The connection is also marked as
requiring a Retry packet to be sent (QUIC_FL_CONN_SEND_RETRY) from
ha_quic_add_handshake_data(). The the handshake I/O handler (quic_conn_io_cb())
knows how to behave: kill the connection after having sent a Retry packet.
About TLS stack compatibility, this patch is supported by aws-lc. It is
disabled for wolfssl which does not support 0-RTT at this time thanks
to HAVE_SSL_0RTT_QUIC.
This patch depends on these commits:
MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
MINOR: quic: Implement qc_ssl_eary_data_accepted().
MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
MINOR: quic: Token for future connections implementation.
MINOR: quic: Implement quic_tls_derive_token_secret().
MINOR: tools: Implement ipaddrcpy().
Must be backported as far as 2.6.
Dump the early data status from QUIC_EV_CONN_IO_CB trace event.
This is very helpful to know if the QUIC server has accepted the
early data received from clients.
This function is a wrapper around SSL_get_early_data_status() for
OpenSSL derived stack and SSL_early_data_accepted() boringSSL derived
stacks like AWS-LC. It returns true for a TLS server if it has
accepted the early data received from a client.
Also implement quic_ssl_early_data_status_str() which is dedicated to be used
for debugging purposes (traces). This function converts the enum returned
by the two function mentionned above to a human readable string.
Modify qf_new_token structure to use a static buffer with QUIC_TOKEN_LEN
as size as defined by the token for future connections (quic_token.c).
Modify consequently the NEW_TOKEN frame parser (see quic_parse_new_token_frame()).
Also add comments to denote that the NEW_TOKEN parser function is used only by
clients and that its builder is used only by servers.
quic_build_new_token_frame() is the function which is called to build
a NEW_TOKEN frame into a buffer. The position pointer for this buffer
was not updated, leading the NEW_TOKEN frame to be malformed.
Must be backported as far as 2.6.
There exist two sorts of token used by QUIC. They are both used to validate
the peer address (path validation). Retry are used for the current
connection the client want to open. This patch implement the other
sort of tokens which after having been received from a connection, may
be provided for the next connection from the same IP address to validate
it (or validate the network path between the client and the server).
The token generation is implemented by quic_generate_token(), and
the token validation by quic_token_chek(). The same method
is used as for Retry tokens to build such tokens to be reused for
future connections. The format is very simple: one byte for the format
identifier to distinguish these new tokens for the Retry token, followed
by a 32bits timestamps. As this part is ciphered with AEAD as cryptographic
algorithm, 16 bytes are needed for the AEAD tag. 16 more random bytes
are added to this token and a salt to derive the AEAD secret used
to cipher the token. In addition to this salt, this is the client IP address
which is used also as AAD to derive the AEAD secret. So, the length of
the token is fixed: 37 bytes.
This is function is similar to quic_tls_derive_retry_token_secret().
Its aim is to derive the secret used to cipher the token to be used
for future connections.
This patch renames quic_tls_derive_retry_token_secret() to a more
and reuses its code to produce a more generic one: quic_do_tls_derive_token_secret().
Two arguments are added to this latter to produce both quic_tls_derive_retry_token_secret()
and quic_tls_derive_token_secret() new function which calls
quic_do_tls_derive_token_secret().
There was a typo in the macro name, where LENGTH was incorrectly
written. This didn't cause any issue because the typo appeared in all
occurrences in the codebase.
Support for 429 was recently added to L7 retries (0d142e075 "MINOR: proxy:
Add support of 429-Too-Many-Requests in retry-on status"). But the
l7_status_match() function was not properly updated. The switch statement
must match the 429 status to be able to perform a L7 retry.
This patch must be backported if the commit above is backported. It is
related to #2687.
If an early error occurred on the client connection, we must prevent any
multiplexer upgrades. Indeed, it is unexpected for a mux to be initialized
with no xprt. On a normal workflow it is impossible. So it is not an
issue. But if a mux upgrade is performed at the stream level, an early error
on the connection may have already been handled by the previous mux and the
connection may be already fully closed. If the mux upgrade is still
performed, a crash can be experienced.
It is possible to have a crash with an implicit TCP>HTTP upgrade if there is no
data in the input buffer. But it is also possible to get a crash with an
explicit "switch-mode http" rule.
It must be backported to all stable versions. In 2.2, the patch must be
applied directly in stream_set_backend() function.
When DATA frames are sent via the 0-copy data forwarding, we must take care
to set the ES flag on the last DATA frame. It should be performed in
h2_done_ff() when IOBUF_FL_EOI flag was set by the producer. This flag is
here to know when the producer has reached the end of input. When this
happens, the h2s state is also updated. It is switched to "half-closed
local" or "closed" state depending on its previous state.
It is mainly an issue on uploads because the server may be blocked waiting
for the end of the request. A workaround is to disable the 0-copy forwarding
support the the H2 by setting "tune.h2.zero-copy-fwd-send" directive to off
in your global section.
This patch should fix the issue #2665. It must be backported as far as 2.9.
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remains sorted.
This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.
Activate the capture of the TLS signature_algorithms extension from the
Client Hello. This list is stored in the ssl_capture buffer when the
global option "tune.ssl.capture-cipherlist-size" is enabled.