This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymetric on frontend and backend side,
as connection argument was set only in the latter case.
This was required prior due to qc_alloc_ssl_sock_ctx() signature. This
has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. <conn> member of quic_conn instance
is always set outside it, in qc_xprt_start() on the backend case.
ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks.
Most notably it contains a <conn> member which is a pointer to struct
connection.
On QUIC frontend side, this member is always set to NULL. Indeed,
connection is only created after handshake completion. However, this has
changed for backend side, where the connection is instantiated prior to
its quic_conn counterpart. Thus, ssl_sock_ctx member would be set in
this case as a convenience for use later in qc_ssl_do_hanshake().
However, this method was unsafe as the connection can be released,
without resetting ssl_sock_ctx member. Thus, the previous patch fixes
this by using on <conn> member through the quic_conn instance which is
the proper way.
Thus, this patch resets ssl_sock_ctx <conn> member to NULL. This is
deemed the cleanest method as it ensures that both frontend and backend
sides must not use it anymore.
On backend side, a connection can be aborted and released prior to
handshake completion. This causes a crash in qc_ssl_do_hanshake() as
<conn> member of ssl_sock_ctx is not reset in this case.
To fix this, use <conn> member of quic_conn instead. This is safe as it
is properly set to NULL when a connection is released.
No impact on the frontend side as <conn> member is not accessed. Indeed,
in this case connection is most of the times allocated after handshake
completion.
No need to be backported.
macOS-15 images seems to have difficulties to run the reg-tests since a
few days for an unknown reason. Doing a rollback of both VTest2 and
haporxy doesn't seem to fix the problem so this is probably related to a
change in github actions.
This patch switches the image to the new macos-26 images which seems to
fix the problem.
Perf top showed that h1_snd_buf() was having great difficulties accessing
the proxy's server_id_hdr_name field in the middle of the headers loop.
Moving the assignment out of the loop to a local variable moved the
problem there as well:
| if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n
0.10 |20b0: mov -0x120(%rbp),%rdi
1.33 | mov 0x60(%rdi),%r10
0.01 | test %eax,%eax
0.18 | jne 2118
12.87 | mov 0x350(%r10),%rdi
0.01 | test %rdi,%rdi
0.05 | je 2118
| mov 0x358(%r10),%r11
It turns out that there are several atomically accessed fields in its
vicinity, causing the cache line to bounce all the time. Let's collect
the few frequently changed fields and place them together at the end
of the structure, and plug the 32-bit hole with another isolated field.
Doing so also reduced a little bit the cost of decrementing be->be_conn
in process_stream(), and overall the HTTP/1 performance increased by
about 1% both on ARM and x86_64.
build-ssl.sh is always prepending a "v" to the version, preventing to
build a FIPS version without FIPS enabled.
This patch checks if FIPS is in the version string to chose to add the
"v" or not.
Example:
AWS_LC_VERSION=AWS-LC-FIPS-3.0.0 BUILDSSL_DESTDIR=/opt/awslc-3.0.0 ./scripts/build-ssl.sh
When trying to reuse a backend connection, a connection hash is
calculated to match an entry with similar parameters. Previously, this
operation was skipped if the stream content wasn't based on HTTP, as it
would have been incompatible with http-reuse.
With the introduction of SPOP backends, this condition was removed, so
that it can also benefit from connection reuse. However, this means that
now hash calcul is always performed when connecting to a server, even
for TCP or log backends. This is unnecessary as these proxies cannot
perform connection reuse.
Note also that reuse mode is resetted on postparsing for incompatible
backends. This at least guarantees that no tree lookup will be performed
via be_reuse_connection(). However, connection lookup is still performed
in the session via session_get_conn() which is another unnecessary
operation.
Thus, this patch restores the condition so that reuse operations are now
entirely skipped if a backend mode is incompatible. This is implemented
via a new utility function named be_supports_conn_reuse().
This could be backported up to 3.1, as this commit could be considered
as a performance regression for tcp/log backend modes.
Since commit d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)"), the
last_state_change field in the counters is a uint (to match how it's
reported). However, it happens that there are explicit casts in function
me_generate_field() to retrieve the value, and which cause crashes on
aarch64 and likely other non-x86 64-bit platforms due to atomically
reading an unaligned 64-bit value, and may even randomly crash other
64-bit platforms when reading past the end of the structure.
The fix for now adapts the cast to match the one used by the accessed
type (i.e. unsigned int), but the approach must change, as there's
nothing there which allows to figure whether or not the type is correct
by just reading the code. At minima a typeof() on a named field is
needed, but this requires more invasive changes, hence this temporary
fix.
No backport is needed, as stats-file is only in 3.3.
Previous fixes restored round robin iteration, but an imbalance remains
when the response tree contains record types other than A or AAAA. Let's
take the following example: the DNS answers two A records and a CNAME.
The response "tree" (which is actually flat, more like a list) may look
as follows, ordered by hash:
- 1st item: first A record with IP 1
- 2nd item: second A record with IP 2
- 3rd item: CNAME record
As a consequence, resolv_get_ip_from_response will iterate as follows,
while the TTL is still valid:
- 1st call: DNS request is done, response tree is created, iteration
starts at the first item, IP 1 is returned.
- 2nd call: cached response tree is used, iteration starts at the second
item, IP 2 is returned.
- 3rd call: cached response tree is used, iteration starts at the third
item, but it's a CNAME, so we continue to the next item, which restarts
iteration at the first item, and IP 1 is returned.
- 4th call: cached response tree is used and iteration restarts at the
beginning, returning IP 1 again.
The 1-2-1-1-2-1-1-2 sequence will repeat, so IP 1 will be used twice as
often as IP 2, creating a strong imbalance. Even with more IP addresses,
the first one by hashing order in the tree will always receive twice the
traffic of the others.
To fix this, set the next iteration item to the one following the selected
IP record, if any. This ensures we never use the same IP twice in a row.
This commit should be backported where 3023e9819 ("BUG/MINOR: resolvers:
Restore round-robin selection on records in DNS answers") is, so as far
as 2.6.
The aes_gcm_enc() and aes_gcm_dec() sample converters now accept an
optional fifth argument for Additional Authenticated Data (AAD). When
provided, the AAD value is base64-decoded and used during AES-GCM
encryption or decryption. Both string and variable forms are supported.
This enables use cases that require authentication of additional data.
If a QUIC server is declared without ALPN, "h3" value is automatically
set during _srv_parse_finalize().
This patch adjusts this operation. Instead of relying on
ssl_sock_parse_alpn(), a plain strdup() is used. This is considered more
efficient as the ALPN string is constant in this case. This method is
already used for listeners on the frontend side.
Ensure that QUIC support is compiled into haproxy when a QUIC server is
configured. This check is performed during _srv_parse_finalize() so that
it is detected both on configuration parsing and when adding a dynamic
server via the CLI.
Note that this changes the behavior of srv_is_quic() utility function.
Previously, it always returned false when QUIC support wasn't compiled.
With this new check introduced, it is now guaranteed that a QUIC server
won't exist if compilation support is not active. Hence srv_is_quic()
does not rely anymore on USE_QUIC define.
Previously, QUIC servers were rejected if SSL was not explicitely
activated using 'ssl' configuration keyword.
Change this behavior : now SSL is automatically activated for QUIC
servers when the keyword is missing. A warning is displayed as it is
considered better to explicitely note that SSL is in use.
Released version 3.3-dev11 with the following main changes :
- BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
- BUG/MINOR: quic-be: unchecked connections during handshakes
- BUG/MEDIUM: cli: also free the trash chunk on the error path
- MINOR: initcalls: Add a new initcall stage, STG_INIT_2
- MEDIUM: stick-tables: Use a per-shard expiration task
- MEDIUM: stick-tables: Remove the table lock
- MEDIUM: stick-tables: Stop if stktable_trash_oldest() fails.
- MEDIUM: stick-tables: Stop as soon as stktable_trash_oldest succeeds.
- BUG/MEDIUM: h1-htx: Don't set HTX_FL_EOM flag on 1xx informational messages
- BUG/MEDIUM: h3: properly encode response after interim one in same buf
- BUG/MAJOR: pools: fix default pool alignment
- MINOR: ncbuf: extract common types
- MINOR: ncbmbuf: define new ncbmbuf type
- MINOR: ncbmbuf: implement add
- MINOR: ncbmbuf: implement iterator bitmap utilities functions
- MINOR: ncbmbuf: implement ncbmb_data()
- MINOR: ncbmbuf: implement advance operation
- MINOR: ncbmbuf: add tests as standalone mode
- BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
- MINOR: quic: remove received CRYPTO temporary tree storage
- MINOR: stats-file: fix typo in shm-stats-file object struct size detection
- MINOR: compiler: add FIXED_SIZE(size, type, name) macro
- MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct
- BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency
- BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization
- BUG/MEDIUM: stick-tables: Don't loop if there's nothing left
- MINOR: acme: add the dns-01-record field to the sink
- MINOR: acme: display the complete challenge_ready command in the logs
- BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el
- MINOR: quic: remove unused conn-tx-buffers limit keyword
- MINOR: quic: prepare support for options on FE/BE side
- MINOR: quic: rename "no-quic" to "tune.quic.listen"
- MINOR: quic: duplicate glitches FE option on BE side
- MINOR: quic: split congestion controler options for FE/BE usage
- MINOR: quic: split Tx options for FE/BE usage
- MINOR: quic: rename max Tx mem setting
- MINOR: quic: rename retry-threshold setting
- MINOR: quic: rename frontend sock-per-conn setting
- BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage
- BUG/MINOR: quic: split option for congestion max window size
- BUG/MINOR: quic: rename and duplicate stream settings
- BUG/MEDIUM: applet: Improve again spinning loops detection with the new API
- Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency"
- Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct"
- Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro"
- BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt)
- BUG/MINOR: stick-tables: properly index string-type keys
- BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1
- BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims
- MEDIUM: quic: Fix build with openssl-compat
- MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
- MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
- BUG/MEDIUM: cli: do not return ACKs one char at a time
- BUG/MEDIUM: ssl: Crash because of dangling ckch_store reference in a ckch instance
- BUG/MINOR: ssl: Remove unreachable code in CLI function
- BUG/MINOR: acl: warn if "_sub" derivative used with an explicit match
- DOC: config: fix confusing typo about ACL -m ("now" vs "not")
- DOC: config: slightly clarify the ssl_fc_has_early() behavior
- MINOR: ssl-sample: add ssl_fc_early_rcvd() to detect use of early data
- CI: disable fail-fast on fedora rawhide builds
- MINOR: http: fix 405,431,501 default errorfile
- BUG/MINOR: init: Do not close previously created fd in stdio_quiet
- MINOR: init: Make devnullfd global and create it earlier in init
- MINOR: init: Use devnullfd in stdio_quiet calls instead of recreating a fd everytime
- MEDIUM: ssl: Add certificate password callback that calls external command
- MEDIUM: ssl: Add local passphrase cache
- MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert'
- BUG/MINOR: resolvers: Apply dns-accept-family setting on additional records
- MEDIUM: h1: Immediately try to read data for frontend
- REGTEST: quic: add ssl_reuse.vtc new QUIC test
- BUG/MINOR: ssl: returns when SSL_CTX_new failed during init
- MEDIUM: ssl/ech: config and load keys
- MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI
- MINOR: listener: implement bind_conf_find_by_name()
- MINOR: ssl/ech: key management via stats socket
- CI: github: add USE_ECH=1 to haproxy for openssl-ech job
- DOC: configuration: "ech" for bind lines
- BUG/MINOR: ech: non destructive parsing in cli_find_ech_specific_ctx()
- DOC: management: document ECH CLI commands
- MEDIUM: mux-h2: do not needlessly refrain from sending data early
- MINOR: mux-h2: extract the code to send preface+settings into its own function
- BUG/MINOR: mux-h2: send the preface along with the first request if needed
Tests involving 0-RTT and H2 on the backend show that 0-RTT is being
partially used but does not work. The analysis shows that only the
preface and settings are sent using early-data and the request is sent
separately. As explained in the previous patch, this is caused by the
fact that a wakeup of the iocb is needed just to send the preface, then
a new call to process_stream is needed to try sending again.
Here with this patch, we're making h2_snd_buf() able to send the preface
if it was not yet sent. Thanks to this, the preface, settings and first
request can now leave as a single TCP segment. In case of TLS with 0-RTT,
it now allows all the block to leave in early data.
Even in clear-text H2, we're now seeing a 15% lower context-switch count,
and the number of calls to process_stream() per connection dropped from 3
to 2. The connection rate increased by an extra 9.5%. Compared to without
the last 3 patches, this is a 22% reduction of context-switches, 33%
reduction of process_stream() calls, and 15.7% increase in connection
rate. And more importantly, 0-RTT now really works with H2 on the
backend, saving one full RTT on the first request.
This fix is only for a missed optimization and a non-functional 0-RTT
on the backend. It's worth backporting it, but it doesn't cause enough
harm to hurry a backport. Better wait for it to live a little bit in
3.3 (till at least a week or two after the final release) before
backporting it. It's not sure that it's worth going beyond 3.2 in any
case. It depends on the these two previous commits:
MEDIUM: mux-h2: do not needlessly refrain from sending data early
MINOR: mux-h2: extract the code to send preface+settings into its own function
The code that deals with sending preface + settings and changing the
state currently is in h2_process_mux(), but we'll want to do it as
well from h2_snd_buf(), so let's move it to a dedicate function first.
At this point there is no functional change.
The mux currently refrains from sending data before H2_CS_FRAME_H, i.e.
before the peer's SETTINGS frame was received. While it makes sense on
the frontend, it's causing harm on the backend because it forces the
first request to be sent in two halves over an extra RTT: first the
preface and settings, second the request once the settings are received.
This is totally contrary to the philosophy of the H2 protocol, consisting
in permitting the client to send as soon as possible.
Actually what happens is the following:
- process_stream() calls connect_server()
- connect_server() creates a connection, and if the proto/alpn is guessed
or known, the mux is instantiated for the current request.
- the H2 init code wakes the h2 tasklet up and returns
- process_stream() tries to send the request using h2_snd_buf(), but that
one sees that we're before H2_CS_FRAME_H, refrains from doing so and
returns.
- process_stream() subscribes and quits
- the h2 tasklet can now execute to send the preface and settings, which
leave as a first TCP segment. The connection is ready.
- the iocb is woken again once the server's SETTINGS frame is received,
turning the connection to the H2_CS_FRAME_H state, and the iocb wake
up process_stream().
- process_stream() executes again and can try to send again.
- h2_snd_buf() is called and finally sends the request as a second TCP
segment.
Not only this is inefficient, but it also renders 0-RTT and TFO impossible
on H2 connections. When 0-RTT is used, only the preface and settings leave
as early data (the very first data of that connection), which is totally
pointless.
In order to fix this, we have to go through a few steps:
- first we need to let data be sent to a server immediately after the
SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of
H2_CS_FRAME_H). However, some protocol extensions are advertised by
the server using SETTINGS (e.g. RFC8441) and some requests might need
to know the existence of such extensions. For this reason we're adding
a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some
operations were not done because a server's SETTINGS frame is needed.
This is set when trying to send a protocol upgrade or extended CONNECT
during H2_CS_SETTINGS1, indicating that it's needed to wait for
H2_CS_FRAME_H in this case. The flag is always set on frontend
connections. This is what is being done in this patch.
- second, we need to be able to push the preface opportunistically with
the first h2_snd_buf() so that it's not needed to wake the tasklet up
just to send that and wake process_stream() again. This will be in a
separate patch.
By doing the first step, we're at least saving one needless tasklet
wakeup per connection (~9%), which results in ~5% backend connection
rate increase.
cli_find_ech_specific_ctx() parses the <frontend>/<bind_conf> and sets
a \0 in place the '/'. But the originals tring is still used to emit
messages in the CLI so we only output the frontend part.
This patch do the parsing in a trash buffer instead.
ECH is an experimental features which still a draft, but already exists as a
feature branch in OpenSSL.
This patch explains how to configure "ech" on bind lines.
This patch extends the ECH support by adding runtime CLI commands to
view and modify ECH configurations.
New commands are added to the HAProxy CLI:
- "show ssl ech [<name>]" displays all ECH configurations or a specific
one.
- "add ssl ech <name> <payload>" adds a new PEM-formatted ECH
configuration.
- "set ssl ech <name> <payload>" replaces all existing ECH
configurations.
- "del ssl ech <name> [<age-in-secs>]" removes ECH configurations,
optionally filtered by age.
Returns a pointer to the first bind_conf matching <name> in a frontend
<front>.
When name is prefixed by a @ (@<filename>:<linenum>), it tries to look
for the corresponding filename and line of the configuration file.
NULL is returned if no match is found.
This patch adds functions to expose Encrypted Client Hello (ECH) status
and outer SNI information for logging and sample fetching.
Two new helper functions are introduced in ech.c:
- conn_get_ech_status() places the ECH processing status string into a
buffer.
- conn_get_ech_outer_sni() retrieves the outer SNI value if ECH
succeeded.
Two new sample fetch keywords are added:
- "ssl_fc_ech_status" returns the ECH status string.
- "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH.
These allow ECH information to be used in HAProxy logs, ACLs, and
captures.
This patch introduces the USE_ECH option in the Makefile to enable
support for Encrypted Client Hello (ECH) with OpenSSL.
A new function, load_echkeys, is added to load ECH keys from a specified
directory. The SSL context initialization process in ssl_sock.c is
updated to load these keys if configured.
A new configuration directive, `ech`, is introduced to allow users to
specify the ECH key directory in the listener configuration.
In ssl_sock_initial_ctx(), returns when SSL_CTX_new() failed instead of
trying to apply anything on the ctx. This may avoid crashing when
there's not enough memory anymore during configuration parsing.
Could be backported in every haproxy versions
Note that this test does not work with OpenSSL 3.5.0 QUIC API because
the callback set by SSL_CTX_sess_set_new_cb() (ssl_sess_new_srv_cb()) is not
called (at least for QUIC clients)
The role of this new QUIC test is to run the same SSL/TCP test as
reg-tests/ssl/ssl_reuse.vtc but with QUIC connections where applicable (only with
TLSv1.3).
To do so, this QUIC test uses the "include" vtc command to run ssl/ssl_reuse.vtc
It also sets the VTC_SOCK_TYPE environment variable with the "setenv" command and
"quic" as value. This will ask vtest2 to use QUIC sockets for all "fd@{...}"
addresses prefixed by "${VTC_SOCK_TYPE}+" socket type if VTC_SOCK_TYPE value is "quic".
The SSL/TCP is modified to set this environment variable with "setenv -ifunset"
from ssl/ssl_reuse.vtc with "stream" as value, if it not already set.
vtest2 must be used with this patch to support this new QUIC test:
9aa4d498db
Thanks to this latter patch, vtest2 retrieves the VTC_SOCK_TYPE environment variable
value, then it parses the vtc file to retrieve all the fd addresses prefixed by
"${VTC_SOCK_TYPE}+" and creates a QUIC socket or a TCP socket depending on this
variable value.
In h1_init(), if we're a frontend connection, immediately attempt to
read data, if the connection is ready, instead of just subscribing.
There may already be data available, at least if we're using 0RTT.
This may be backported up to 2.8 in a while, after 3.3 is released, so
that if it causes problem, we have a chance to hear about it.
dns-accept-family setting was only evaluated for responses to A / AAAA DNS
queries. It was ignored when additional records in SRV responses were
parsed.
With this patch, whena SRV responses is parsed, additional records not
matching the dns-accept-family setting are ignored, as expected.
This patch must be backported to 3.2.
A private keys that is password protected and was decoded during init
thanks to the password obtained thanks to 'ssl-passphrase-cmd' should
not be dumped via 'dump ssl cert' CLI command.
Instead of calling the external password command for all loaded
encrypted certificates, we will keep a local password cache.
The passwords won't be stored as plain text, they will be stored
obfuscated into the password cache. The obfuscation is simply based on a
XOR'ing with a random number built during init.
After init is performed, the password cache is overwritten and freed so
that no dangling info allowing to dump the passwords remains.
When a certificate is protected by a password, we can provide the
password via the dedicated pem_password_cb param provided to
PEM_read_bio_PrivateKey.
HAProxy will fetch the password automatically during init by calling a
user-defined external command that should dump the right password on its
standard output (see new 'ssl-passphrase-cmd' global option).
Since commit "65760d MINOR: init: Make devnullfd global and create it
earlier in init" the devnullfd file descriptor pointing to /dev/null
is created regardless of the process's parameters so we can use it in
all 'stdio_quiet' calls instead or recreating an FD.
The devnull fd might be needed during configuration parsing, if some
options require to fork/exec for instance. So we now create it much
earlier in the init process and without depending on the '-q' or '-d'
parameters.
During init we were calling 'stdio_quiet' and passing the previously
created 'devnullfd' file descriptor. But the 'stdio_quiet' was also
closed afterwards which raised an error (EBADF).
If we keep from closing FDs that were opened outside of the
'stdio_quiet' function we will let the caller manage its FD and avoid
double close calls.
This patch can be backported to all stable branches.
A few typos were present in the default errorfiles for the status codes
above (missing dot at the end of the sentence, extra closing bracket).
This fixes them. This can be backported.
Previously builds were dependent in terms that if one fails, other are
stopped. By their nature those builds are independent, let's not to fail
them altogether
We currently have ssl_fc_has_early() which says that early data are still
unconfirmed by a final handshake, but nothing to see if a client has been
able to use early data at all, which is a problem because such mechanisms
generally depend on multiple factors and it's hard to know when they start
to work. This new sample fetch function will indicate that some early data
were seen over that front connection, i.e. this can be used to confirm
that at some point the client was able to push some. This is essentially
a debugging tool that has no practical use case other than debugging.
Clarify that it's about handshake *completion*, and also mention that
the action to be used to wait for the handshake is "wait-for-handshake",
which was not mentioned.
This can be backported though it's very minor.
A one-letter typo in the doc update comint with commit 6ea50ba462 ("MINOR:
acl; Warn when matching method based on a suffix is overwritten") inverts
the meaning of the sentence. It was "is not allowed" and not
"is now allowed". Needs to be backported only if the commit above ever is
(unlikely).
Recently, a new warning is displayed when an ACL derivative match method
is override with another '-m' method. This is implemented via the
following patch :
6ea50ba462692d6dcf301081f23cab3e0f6086e4
MINOR: acl; Warn when matching method based on a suffix is overwritten
However, this warning was not reported when "_sub" suffix was specified.
Fix this by adding PAT_MATCH_SUB in the warning comparison.
No backport needed except if above commit is.
When updating CAs via the CLI, we need to create new copies of all the
impacted ckch instances (as in referenced in the ckch_inst_link list of
the updated CA) in order to use them instead of the old ones once the
updated is completed. This relies on the ckch_inst_rebuild function that
would set the ckch_store field of the ckch_inst. But we forgot to also
add the newly created instances in the ckch_inst list of the
corresponding ckch_store.
When updating a certificate afterwards, we iterate over all the
instances linked in the ckch_inst list of the ckch_store (which is
missing some instances because of the previous command) and rebuild the
instances before replacing the ckch_store. The previous ckch_store,
still referenced by the dangling ckch instance then gets deleted which
means that the instance keeps a reference to a free'd object.
Then if we were to once again update the CA file, we would iterate over
the ckch instances referenced in the cafile_entry's ckch_inst_link list,
which includes the first mentioned ckch instance with the dead
ckch_store reference. This ends up crashing during the ckch_inst_rebuild
operation.
This bug was raised in GitHub #3165.
This patch should be backported to all stable branches.
Since 3.0 where the CLI started to use rcv_buf, it appears that some
external tools sending chained commands are randomly experiencing
failures. Each time this happens when the whole command is sent as a
single packet, immediately followed by a close. This is not a correct
way to use the CLI but this has been working for ages for simple
netcat-based scripts, so we should at least try to preserve this.
The cause of the failure is that the first LF that acks a command is
immediately sent back to the client and rejected due to the closed
connection. This in turn forwards the error back to the applet which
aborts its processing.
Before 3.0 the responses would be queued into the buffer, then sent
back to the channel, and would all fail at once. This changed when
snd_buf/rcv_buf were implemented because the applets are much more
responsive and since they yield between each command, they can
deliver one ACK at a time that is immediately forwarded down the
chain.
An easy way to observe the problem is to send 5 map updates, a shutdown,
and immediately close via tcploop, and in parallel run a periodic
"show map" to count the number of elements:
$ tcploop -U /tmp/sock1 C S:"add map #0 1 1; add map #0 2 2; add map #0 3 3; add map #0 4 4; add map #0 5 5\n" F K
Before 3.0, there would always be 5 elements. Since 3.0 and before
20ec1de214 ("MAJOR: cli: Refacor parsing and execution of pipelined
commands"), almost always 2. And since that commit above in 3.2, almost
always one. Doing the same using socat or netcat shows almost always 5...
It's entirely timing-dependent, and might even vary based on the RTT
between the client and haproxy!
The approach taken here consists in doing the same principle as MSG_MORE
or Nagle but on the response buffer: the applet doesn't need to send a
single ACK for each command when it has already been woken up and is
scheduled to come back to work. It's fine (and even desirable) that
ACKs are grouped in a single packet as much as possible.
For this reason, this patch implements APPCTX_CLI_ST1_YIELD, a new CLI
flag which indicates that the applet left in yielding condition, i.e.
it has not finished its work. This flag is used by .rcv_buf to hold
pending data. This way we won't return partial responses for no reason,
and we can continue to emulate the previous behavior.
One very nice benefit to this is that it saves huge amounts of CPU on
the client. In the test below that tries to update 1M map entries, the
CPU used by socat went from 100% to 0% and the total transfer time
dropped by 28%:
before:
$ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null
real 0m2.407s
user 0m1.485s
sys 0m1.682s
after:
$ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null
real 0m1.721s
user 0m0.952s
sys 0m0.057s
The difference is also quite visible on the number of syscalls during
the test (for 1k updates):
before:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.071691 0 100001 sendmsg
after:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000011 1 9 sendmsg
This patch will need to be backported to 3.0, and depends on these two
patches to be backported as well:
MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
This is in preparation for a future fix. For now it's simply a pure
copy of the original function, but dedicated to the CLI. It will
have to be backported to 3.0.
appctx_rcv_buf() prepares all the work to schedule the transfers between
the applet and the channel, and it takes care of setting the various flags
that indicate what condition is blocking the transfer from progressing.
There is one limitation though. In case an applet refrains from sending
data (e.g. rate-limited, prefers to aggregate blocks etc), it will leave
a possibly empty channel buffer, and keep some data in its outbuf. The
data in its outbuf will be seen by the function above as an indication
of a channel full condition, so it will place SE_FL_WANT_ROOM. But later,
sc_applet_recv() will see this flag with a possibly empty channel, and
will rightfully trigger a BUG_ON().
appctx_rcv_buf() should be more accurate in fact. It should only set
SE_FL_RCV_MORE when more data are present in the applet, then it should
either set or clear SE_FL_WANT_ROOM dependingon whether the channel is
empty or not.
Right now it doesn't seem possible to trigger this condition in the
current state of applets, but this will become possible with a future
bugfix that will have to be backported, so this patch will need to be
backported to 3.0.
As the QUIC options have been split into backend and frontend, there is
no more GTUNE_QUIC_LISTEN_OFF to be found in global.tune.options, look
for QUIC_TUNE_FE_LISTEN_OFF in quic_tune.fe instead.
This should fix the build with USE_QUIC and USE_QUIC_OPENSSL_COMPAT.
As a folow-up to f40f5401b9f24becc6fdd2e77d4f4578bbecae7f, explicitely
use atomic operations to set the prev and next fields, to make sure the
compiler can't assume anything about it, and just does it.
This should be backported after f40f5401b9 up to 2.8.