Some tables are currently used to decode bit blocks and lengths. We do
see such lookups in perf top. We have 4 512-byte tables and one 64-byte
one. Looking closer, the second half of the table (length) has so few
variations that most of the time it will be computed in a single "if",
and never more than 3. This alone allows to cut the tables in half. In
addition, one table (bits 15-11) is only 32-element long, while another
one (bits 11-4) starts at 0x60, so we can merge the two as they do not
overlap, and further save size. We're now down to 4 256-entries tables.
This is visible in h3 and h2 where the max request rate is slightly higher
(e.g. +1.6% for h2). The huff_dec() function got slightly larger but the
overall code size shrunk:
$ nm --size haproxy-before | grep huff_dec
000000000000029e T huff_dec
$ nm --size haproxy-after | grep huff_dec
0000000000000345 T huff_dec
$ size haproxy-before haproxy-after
text data bss dec hex filename
7591126 569268 2761348 10921742 a6a70e haproxy-before
7591082 568180 2761348 10920610 a6a2a2 haproxy-after
The locally defined static variables 'httpclient_srv_raw' and
'httpclient_srv_ssl' are not used anywhere in the source code,
except that they are set in the httpclient_precheck() function.
nb_hreq is a counter on qcc for active HTTP requests. It is incremented
for each qcs where a full HTTP request was received. It is decremented
when the stream is closed locally :
- on HTTP response fully transmitted
- on stream reset
A bug will occur if a stream is resetted without having processed a full
HTTP request. nb_hreq will be decremented whereas it was not
incremented. This will lead to a crash when building with
DEBUG_STRICT=2. If BUG_ON_HOT are not active, nb_hreq counter will wrap
which may break the timeout logic for the connection.
This bug was triggered on haproxy.org. It can be reproduced by
simulating the reception of a STOP_SENDING frame instead of a STREAM one
by patching qc_handle_strm_frm() :
+ if (quic_stream_is_bidi(strm_frm->id))
+ qcc_recv_stop_sending(qc->qcc, strm_frm->id, 0);
+ //ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+ // strm_frm->offset.key, strm_frm->fin,
+ // (char *)strm_frm->data);
To fix this bug, a qcs is now flagged with a new QC_SF_HREQ_RECV. This
is set when the full HTTP request is received. When the stream is closed
locally, nb_hreq will be decremented only if this flag was set.
This must be backported up to 2.6.
This commit adds a new command line option -dC to dump the configuration
file. An optional key may be appended to -dC in order to produce an
anonymized dump using this key. The anonymizing process uses the same
algorithm as the CLI so that the same key will produce the same hashes
for the same identifiers. This way an admin may share an anonymized
extract of a configuration to match against live dumps. Note that key 0
will not anonymize the output. However, in any case, the configuration
is dumped after tokenizing, thus comments are lost.
Modify proxy.c in order to anonymize the following confidential data on
commands 'show servers state' and 'show servers conn':
- proxy name
- server name
- server address
Modify stream.c in order to hash the following confidential data if the
anonymized mode is enabled:
- configuration elements such as frontend/backend/server names
- IP addresses
In order to allow users to dump internal states using a specific key
without changing the global one, we're introducing a key in the CLI's
appctx. This key is preloaded from the global one when "set anon on"
is used (and if none exists, a random one is assigned). And the key
can optionally be assigned manually for the whole CLI session.
A "show anon" command was also added to show the anon state, and the
current key if the users has sufficient permissions. In addition, a
"debug dev hash" command was added to test the feature.
Add a uint32_t key in global to hash words with it. A new CLI command
'set global-key <key>' was added to change the global anonymizing key.
The global may also be set in the configuration using the global
"anonkey" directive. For now this key is not used.
These macros and functions will be used to anonymize strings by producing
a short hash. This will allow to match config elements against dump elements
without revealing the original data. This will later be used to anonymize
configuration parts and CLI commands output. For now only string, identifiers
and addresses are supported, but the model is easily extensible.
Ilya reported in issue #1816 a build warning on armhf (promoted to error
here since -Werror):
src/fd.c: In function fd_rm_from_fd_list:
src/fd.c:209:87: error: passing argument 3 of __ha_cas_dw discards volatile qualifier from pointer target type [-Werror=discarded-array-qualifiers]
209 | unlikely(!_HA_ATOMIC_DWCAS(((long *)&fdtab[fd].update), (uint32_t *)&cur_list.u32, &next_list.u32))
| ^~~~~~~~~~~~~~
This happens only on such an architecture because the DWCAS requires the
pointer not the value, and gcc seems to be needlessly picky about reading
a const from a volatile! This may safely be backported to older versions.
Ed Hein reported in github issue #1856 some occasional watchdog panics
in 2.4.18 showing extreme contention on the proxy's lock while the libc
was in malloc()/free(). One cause of this problem is that we call free()
under the proxy's lock in proxy_capture_error(), which makes no sense
since if we can free the object under the lock after it's been detached,
we can also free it after releasing the lock (since it's not referenced
anymore).
This should be backported to all relevant versions, likely all
supported ones.
This fixes 4 tiny and harmless typos in mux_quic.c, quic_tls.c and
ssl_sock.c. Originally sent via GitHub PR #1843.
Signed-off-by: cui fliter <imcusg@gmail.com>
[Tim: Rephrased the commit message]
[wt: further complete the commit message]
When calling 'add server' with a hostname from the cli (runtime),
str2sa_range() does not resolve hostname because it is purposely
called without PA_O_RESOLVE flag.
This leads to 'srv->addr_node.key' being NULL. According to Willy it
is fine behavior, as long as we handle it properly, and is already
handled like this in srv_set_addr_desc().
This patch fixes GH #1865 by adding an extra check before inserting
'srv->addr_node' into 'be->used_server_addr'. Insertion and removal
will be skipped if 'addr_node.key' is NULL.
It must be backported to 2.6 and 2.5 only.
A stream is considered as remotely closed once we have received all the
data with the FIN bit set.
The condition to close the stream was wrong. In particular, if we
receive an empty STREAM frame with FIN bit set, this would have close
the stream even if we do not have yet received all the data. The
condition is now adjusted to ensure that Rx buffer contains all the data
up to the stream final size.
In most cases, this bug is harmless. However, if compiled with
DEBUG_STRICT=2, a BUG_ON_HOT crash would have been triggered if close is
done too early. This was most notably the case sometimes on interop test
suite with quinn or kwik clients. This can also be artificially
reproduced by simulating reception of an empty STREAM frame with FIN bit
set in qc_handle_strm_frm() :
+ if (strm_frm->fin) {
+ qcc_recv(qc->qcc, strm_frm->id, 0,
+ strm_frm->len, strm_frm->fin,
+ (char *)strm_frm->data);
+ }
ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
strm_frm->offset.key, strm_frm->fin,
(char *)strm_frm->data);
This must be backported up to 2.6.
Small cleanup on snd_buf for application protocol layer.
* do not export h3_snd_buf
* replace stconn by a qcs argument. This is better as h3/hq-interop only
uses the qcs instance.
This should be backported up to 2.6.
The same was performed for the H2 multiplexer. H1C and H1S flags are moved
in a dedicated header file. It will be mainly used to be able to decode
mux-h1 flags from the flags utility.
In this patch, we only move the flags to mux_h1-t.h.
H3 SETTINGS emission has recently been delayed. The idea is to send it
with the first STREAM to reduce sendto syscall invocation. This was
implemented in the following patch :
3dd79d378c
MINOR: h3: Send the h3 settings with others streams (requests)
This patch works fine under nominal conditions. However, it will cause a
crash if a HTTP/3 connection is released before having sent any data,
for example when receiving an invalid first request. In this case,
qc_release will first free qcc.app_ops HTTP/3 application protocol layer
via release callback. Then qc_send is called to emit any closing frames
built by app_ops release invocation. However, in qc_send, as no data has
been sent, it will try to complete application layer protocol
intialization, with a SETTINGS emission for HTTP/3. Thus, qcc.app_ops is
reused, which is invalid as it has been just freed. This will cause a
crash with h3_finalize in the call stack.
This bug can be reproduced artificially by generating incomplete HTTP/3
requests. This will in time trigger http-request timeout without any
data send. This is done by editing qc_handle_strm_frm function.
- ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+ ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len - 1,
strm_frm->offset.key, strm_frm->fin,
(char *)strm_frm->data);
To fix this, application layer closing API has been adjusted to be done
in two-steps. A new shutdown callback is implemented : it is used by the
HTTP/3 layer to generate GOAWAY frame in qc_release prologue.
Application layer context qcc.app_ops is then freed later in qc_release
via the release operation which is now only used to liberate app layer
ressources. This fixes the problem as the intermediary qc_send
invocation will be able to reuse app_ops before it is freed.
This patch fixes the crash, but it would be better to adjust H3 SETTINGS
emission in case of early connection closing : in this case, there is no
need to send it. This should be implemented in a future patch.
This should fix the crash recently experienced by Tristan in github
issue #1801.
This must be backported up to 2.6.
With quicTLS the set_encruption_secrets callback is always called with
the read_secret and the write_secret.
However this is not the case with libreSSL, which uses the
set_read_secret()/set_write_secret() mecanism. It still provides the
set_encryption_secrets() callback, which is called with a NULL
parameter for the write_secret during the read, and for the read_secret
during the write.
The exchange key was not designed in haproxy to be called separately for
read and write, so this patch allow calls with read or write key to
NULL.
httpclient_new_from_proxy() is a variant of httpclient_new() which
allows to create the requests from a different proxy.
The proxy and its 2 servers are now stored in the httpclient structure.
The proxy must have been created with httpclient_create_proxy() to be
used.
The httpclient_postcheck() callback will finish the initialization of
all proxies created with PR_CAP_HTTPCLIENT.
httpclient_create_proxy() is a function which creates a proxy that could
be used for the httpclient. It will allocate a proxy, a raw server and
an ssl server.
This patch moves most of the code from httpclient_precheck() into a
generic function httpclient_create_proxy().
The proxy will have the PR_CAP_HTTPCLIENT capability.
This could be used for specifics httpclient instances that needs
different proxy settings.
The init of tcp sink, particularly for SSL, was done
too early in the code, during parsing, and this can cause
a crash specially if nbthread was not configured.
This was detected by William using ASAN on a new regtest
on log forward.
This patch adds the 'struct proxy' created for a sink
to a list and this list is now submitted to the same init
code than the main proxies list or the log_forward's proxies
list. Doing this, we are assured to use the right init sequence.
It also removes the ini code for ssl from post section parsing.
This patch should be backported as far as v2.2
Note: this fix uses 'goto' labels created by commit
'BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized'
but this code didn't exist before v2.3 so this patch needs to be
adapted for v2.2.
Originally in 1.8 we wanted to have an independent mux that could possibly
be disabled and would not impose dependencies on the outside. Everything
would fit into a single C file and that was fine.
Nowadays muxes are unavoidable, and not being able to easily inspect them
from outside is sometimes a bit of a pain. In particular, the flags utility
still cannot be used to decode their flags.
As a first step towards this, this patch moves the flags and enums to
mux_h2-t.h, as well as the two state decoding inline functions. It also
dropped the H2_SS_*_BIT defines that nobody uses. The mux_h2.c file remains
the only one to include that for now.
Please refer to GH #1859 for more info.
Coverity suspected improper proxy pointer handling.
Without the fix it is considered safe for the moment, but it might not
be the case in the future as we want to keep the ability to have
isolated listeners.
Making sure stop_listener(), pause_listener(), resume_listener()
and listener_release() functions make proper use
of px pointer in that context.
No need for backport except if multi-connection protocols (ie:FTP)
were to be backported as well.
Since this counter was added, it was incremented at the wrong place for
client streams. It was incremented when the stream-connector (formely the
conn-stream) was created while it should be done when the H1 stream is
created. Thus, on parsing error, on H1>H2 upgrades or TCP>H1 upgrades, the
counter is not incremented. However, it is always decremented when the H1
stream is destroyed.
On bakcned side, there is no issue.
This patch must be backported to 2.6.
As reported by Ilya and Coverity in issue #1858, since recent commit
eea152ee6 ("BUG/MINOR: signals/poller: ensure wakeup from signals")
which removed the test for the global signal flag from the pollers'
loop, the remaining "wake" flag doesn't need to be tested since it
already participates to zeroing the wait_time and will be caught
on the previous line.
Let's just remove that test now.
This patch adresses the issue #1626.
Adding support for PR_FL_PAUSED flag in the function stats_fill_fe_stats().
The command 'show stat' now properly reports a disabled frontend
using "PAUSED" state label.
This patch depends on the following commits:
- 7d00077fd5 "BUG/MEDIUM: proxy: ensure pause_proxy()
and resume_proxy() own PROXY_LOCK".
- 001328873c "MINOR: listener: small API change"
- d46f437de6 "MINOR: proxy/listener: support for additional PAUSED state"
It should be backported to 2.6, 2.5 and 2.4
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).
It should be backported to 2.6, 2.5 and 2.4
A minor API change was performed in listener(.c/.h) to restore consistency
between stop_listener() and (resume/pause)_listener() functions.
LISTENER_LOCK was never locked prior to calling stop_listener():
lli variable hint is thus not useful anymore.
Added PROXY_LOCK locking in (resume/pause)_listener() functions
with related lpx variable hint (prerequisite for #1626).
It should be backported to 2.6, 2.5 and 2.4
There was a race involving hlua_proxy_* functions
and some proxy management functions.
pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to some race as lua code didn't make sure PROXY_LOCK
was owned before calling the proxy functions.
This patch makes sure it won't happen again elsewhere in the code
by locking PROXY_LOCK directly in resume and pause proxy functions
so that it's not the caller's responsibility anymore.
(based on stop_proxy() behavior that was already safe prior to the patch)
This should be backported to stable series.
Note that the API will likely differ < 2.4
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.
The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.
Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.
The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().
This fixes issue #1841.
Must be backported in every stable version.
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.
Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.
Perhaps there is a better way to do. This will have to be checked with Amaury.
Must be backported to 2.6.
This was due to a missing check in h3_trace() about the first argument
presence (connection) and h3_parse_settings_frm() which calls TRACE_LEAVE()
without any argument. Then this argument was dereferenced.
Must be backported to 2.6
<qc> variable was confused with <qel>. The consequence was that it was
always the same packet number space which was displayed: the first one (or
the Initial packet number space).
Must be backported to 2.6.
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).
Must be backported to 2.6.
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.
Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.
When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.
This should fix issue #1841
Must be backported in every stable version.
By default we now dump stats between caller and callee, but by
specifying "aggr" on the command line, stats get aggregated by
callee again as it used to be before the feature was available.
It may sometimes be helpful when comparing total call counts,
though that's about all.
The following task/tasklet handlers often appear in "show profiling tasks"
but were not resolved since static:
qc_io_cb, quic_conn_app_io_cb, process_timer,
quic_accept_run, qc_idle_timer_task
This commit simply exports them so they can be resolved now. "process_timer"
which was a bit too generic and renamed to qc_process_timer.
The function appears like this in "show profiling tasks", so let's export
it:
function calls cpu_tot cpu_avg lat_tot lat_avg
main+0x1463f0 92 77.28us 839.0ns 2.018ms 21.93us <- wake_expired_tasks@src/task.c:429 task_drop_running
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
Now that ->wake_date is common to tasks and tasklets, we don't need
anymore to carry a duplicate control block to read and update it for
tasks and tasklets. And given that this code was present early in the
if/else fork between tasks and tasklets, taking it out of the block
allows to move the task part into a more visible "else" branch that
also allows to factor the epilogue that resets th_ctx->current and
updates profile_entry->cpu_time, which also used to be duplicated.
Overall, doing just that saved 253 bytes in the function, or ~1/6,
which is not bad considering that it's on a hot path. And the code
got much ore readable.
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.
The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:
MINOR: sched: store the current profile entry in the thread context
It should be backported to 2.6 as it does help for troubleshooting.
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
- a new wakeup while the task is executing would reset the ->wake_date
field before returning and make abnormally low values being reported;
that was likely the case for taskrun_applet for self-waking applets;
- when the task dies, NULL is returned and the call date couldn't be
retrieved, so that CPU time was not being accounted for. This was
particularly visible with process_stream() which is usually called
only twice per request, and whose time was systematically halved.
The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.
One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.
Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.
Task profiling output after 1 million HTTP request previously looked like
this:
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
h1_io_cb 2012338 4.850s 2.410us 12.91s 6.417us
process_stream 2000136 9.594s 4.796us 34.26s 17.13us
sc_conn_io_cb 2000135 1.973s 986.0ns 30.24s 15.12us
h1_timeout_task 137 - - 2.649ms 19.34us
accept_queue_process 49 152.3us 3.107us 321.7yr 6.564yr
main+0x146430 7 5.250us 750.0ns 25.92us 3.702us
srv_cleanup_idle_conns 1 559.0ns 559.0ns 918.0ns 918.0ns
task_run_applet 1 - - 2.162us 2.162us
Now it looks like this:
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
h1_io_cb 2014194 4.794s 2.380us 13.75s 6.826us
process_stream 2000151 20.01s 10.00us 36.04s 18.02us
sc_conn_io_cb 2000148 2.167s 1.083us 32.27s 16.13us
h1_timeout_task 198 54.24us 273.0ns 3.487ms 17.61us
accept_queue_process 52 158.3us 3.044us 409.9us 7.882us
main+0x1466e0 18 16.77us 931.0ns 63.98us 3.554us
srv_cleanup_toremove_conns 8 282.1us 35.26us 546.8us 68.35us
srv_cleanup_idle_conns 3 149.2us 49.73us 8.131us 2.710us
task_run_applet 3 268.1us 89.38us 11.61us 3.871us
Note the two-fold difference on process_stream().
This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:
MINOR: task: permanently enable latency measurement on tasklets
CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.
This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.
This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.
This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.
This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.
A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
This verification is done by ssl_sock_bind_verifycbk() which is set at different
locations in the ssl_sock.c code . About QUIC connections, there are a lot of chances
the connection object is not initialized when entering this function. What must
be accessed is the SSL object to retrieve the connection or quic_conn objects,
then the bind_conf object of the listener. If the connection object is not found,
we try to find the quic_conn object.
Modify ssl_sock_dump_errors() interface which takes a connection object as parameter
to also passed a quic_conn object as parameter. Again this function try first
to access the connection object if not NULL or the quic_conn object if not.
There is a remaining thing to do for QUIC: store the certificate verification error
code as it is currently stored in the connection object. This error code is at least
used by the "bc_err" and "fc_err" sample fetches.
There are chances this bug is in relation with GH #1851. Thank you to @tasavis
for the report.
Must be merged into 2.6.
On frontend side, "h1-case-adjust-bogus-client" option is now supported in
TCP mode. It is important to be able to adjust the case of response headers
when a connection is routed to an HTTP backend. In this case, the client
connection is upgraded to H1.
On backend side, "h1-case-adjust-bogus-server" option is now also supported
in TCP mode to be able to perform HTTP health-checks with a case adjustment
of the request headers.
This patch should be backported as far as 2.0.
This trick is deprecated since the health-check refactoring, It is now
invalid. It means the following line will trigger an error during the
configuration parsing:
option httpchk OPTIONS * HTTP/1.1\r\nHost:\ www
It must be replaced by:
option httpchk OPTIONS * HTTP/1.1
http-check send hdr Host www
ssl_tlsext_ticket_key_cb() is called when "tls-ticket-keys" option is used on a
"bind" line. It needs to have an access to the TLS ticket keys which have been
stored into the listener bind_conf struct. The fix consists in nitializing the
<ref> variable (references to TLS secret keys) the correct way when this callback
is called for a QUIC connection. The bind_conf struct is store into the quic_conn
object (QUIC connection).
This issue may be in relation with GH #1851. Thank you for @tasavis for the report.
Must be backported to 2.6.
Obviously, frames which are duplicated from others must not be retransmitted if
the original frame they were duplicated from was already acknowledged.
This should have been detected by qc_build_frms() which skips such frames,
except if the QUIC xprt does really bad things which are not supported by
the upper layer. This will have to be checked with Amaury.
To prevent the retransmision of these frames which leads to crashes as reported by
hpn0t0ad this gdb backtrace in GH #1809 where the frame builder tries to copy a huge
number of bytes to the packet buffer:
Thread 7 (Thread 0x7fddf373a700 (LWP 13)):
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:520
No locals.
#1 0x000055b17435705e in quic_build_stream_frame (buf=0x7fddf372ef78, end=<optimized out>, frm=0x7fdde08d3470, conn=<optimized out>) at src/quic_frame.c:515
to_copy = 18446697703428890384
stream = 0x7fdde08d3490
wrap = <optimized out>
which matches this part of quic_frame.c code:
wrap = (const unsigned char *)b_wrap(stream->buf);
if (stream->data + stream->len > wrap) {
size_t to_copy = wrap - stream->data;
memcpy(*buf, stream->data, to_copy);
*buf += to_copy;
we release as soon as possible the impacted frames as there is really no need
to retransmit such frames.
Thank you to @hpn0t0ad for having provided us with useful traces in github
issue #1809.
Must be backported in 2.6.
Commit 5c83e3a156 made some adjustments
to clarify which TCP_INFO information is supported by each respective
OS.
There was a comment like so..
Note that fc_rtt and fc_rttvar are supported on any OS that has TCP_INFO,
not just linux/freebsd/netbsd, so we continue to expose them unconditionally.
But the diff didn't do so in a consistent manner.
In github issue #1850, Christian Ruppert reported a case of crash in
2.6 when failing to parse some http rules. This started to happen
with 2.6 commit dd7e6c6 ("BUG/MINOR: http-rules: completely free
incorrect TCP rules on error") but has some of its roots in 2.2
commit 2eb539687 ("MINOR: http-rules: Add release functions for
existing HTTP actions").
The cause is that when the release function is set for HTTP actions,
the rule->arg.http.fmt list head is not yet initialized, hence is
NULL, thus the release function crashes when it tries to iterate over
it. In fact this code was initially not written with the perspective
of releasing such elements upon error, so the arg list initialization
happened after error checking.
This patch just moves the list initialization just after setting the
release pointer and that's OK.
This patch must be backported to 2.6 since the problem is visible
there. It could be backported to 2.5 but the issue is not triggered
there without the first mentioned patch above that landed in 2.6, so
it will not bring any obvious benefit.
We now have two functions, one for dumping connections and the other
one for dumping the streams. This will permit to use it from show_sd.
A few optional line breaks were inserted where relevant to keep lines
homogenous when a prefix is passed.
It's very limited but at least provides the very basic info about QCS and
QCC when issuing "show sess all":
scf=0x7fa9642394a0 flags=0x00000080 state=EST endp=CONN,0x7fa9642351f0,0x02001001 sub=3
> qcs=0x7fa9642351f0 .flg=0x5 .id=396 .st=HCR .ctx=0x7fa9642353f0, .err=0
> qcc=0x7fa96405ce20 .flg=0 .nbsc=100 .nbhreq=100, .task=0x7fa964054260
co0=0x7fa96405cd50 ctrl=quic4 xprt=QUIC mux=QUIC data=STRM target=LISTENER:0x328c530
flags=0x00200300 fd=-1 fd.state=00 updt=0 fd.tmask=0x0
It will need to be improved but it's better than nothing already. This
should be backported to 2.6 if the other dumps are backported.
With this, it now becomes possible to see the state of each H2 stream from
"show sess all". Lines are still too long and need to be split, but that's
for another patch.
This helper will be called for muxes that provide it and will be used
to let the mux provide extra information about the stream attached to
a stream descriptor. A line prefix is passed in argument so that the
mux is free to break long lines without breaking indent. No prefix
means no line breaks should be produced (e.g. for short dumps).
The function will be reusable to dump streams, so let's extract it.
Note that due to "last_h2s" being originally printed as a prefix for
the stream dump, now the pointer is displayed by the caller instead.
When an appctx is found looping over itself, we report a number of info
but not the pointers to the definition nor the handler, which can be quite
handy in some cases. Let's add them and try to decode the symbol.
Commit 1776ffb97 ("MINOR: mux-fcgi: make the "show fd" helper also decode
the fstrm subscriber when known") improved the output of "show fd" for the
FCGI mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
This should be backported to 2.4.
Commit 150c4f8b7 ("MINOR: mux-h1: make the "show fd" helper also decode
the h1s subscriber when known") improved the output of "show fd" for the
H1 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
This should be backported to 2.4.
Commit 98e40b981 ("MINOR: mux-h2: make the "show fd" helper also decode
the h2s subscriber when known") improved the output of "show fd" for the
H2 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
This should be backported to 2.4.
Since everything is available for this, let's enable ALPN with the
usual "h2,http/1.1" on the https server. This will allow HTTPS requests
to use HTTP/2 when available.
It may be needed to permit to disable this (or to set the string) in
case some client code explicitly checks for the "HTTP/1.1" string, but
since httpclient is quite young it's unlikely that such code already
exists.
The servers were not set with default settings, meaning that a few
settings including the pool_max_delay were not set, thus disabling
connection pools, which is the cause of the fact that keep-alive was
disabled as reported in issue #1831. There might possibly be other
issues pending since all these fields were left to zero.
Note that this patch alone will not fix keep-alive because the applet
does not enforce SE_FL_NOT_FIRST and relies on the default http-reuse
safe, thus if servers are not shared, all requests are considered
first ones and do not reuse existing connections.
In 2.7, commit ecb40b2c3 ("MINOR: backend: always satisfy the first
req reuse rule with l7 retries") addressed this in a more elegant way
by fixing http-reuse to take into account the fact that properly
configured l7 retries provide exactly the capability that reuse safe
was trying to cover, and this patch is suitable for backporting.
This patch should be backported to 2.6 only.
There's a tiny issue in the I/O handler by which both a failed request
emission and missing response data will want to subscribe for more room
on output. That's not correct in that only the case where the request
buffer is full should cause this, the other one should just wait for
incoming data. This could theoretically cause spurious wakeups at
certain key points (e.g. connect() time maybe) though this could not
be reproduced but better fix this while it's easy enough.
It doesn't seem necessary to backport it right now, though this may
have to in case a concrete reproducible case is discovered.
If the caller dies before the server responds, the httpclient can crash
in hc_cli_res_end_cb() when unregistering because it dereferences
hc->caller which was already freed during the caller's unregistration.
The easiest way to reproduce it is by sending twice the following
request on the same CLI connection in expert mode, with httpterm
running on local port 8000:
httpclient GET http://127.0.0.1:8000/?t=600
Note the 600ms delay that's larger than socat's default 500.
The code checks for a NULL everywhere hc->caller is used, but the NULL
was forgotten in this specific case. It must be placed in the second
half of httpclient_stop_and_destroy() which is responsible for signaling
the client that the caller leaves.
This must be backported to 2.6.
In 1.9-dev, a new flag was introduced on the start line with commit
f1ba18d7b ("MEDIUM: htx: Don't rely on h1_sl anymore except during H1
header parsing") to designate a response message: HTX_SL_F_IS_RESP.
Unfortunately as it was done in parallel to the mux_h2 support for
the backend, it was never integrated there. It was not used by then
so this remained unnoticed for a while.
However the http_client now uses it, and missing that flag prevents
it from using the H2 mux, so let's properly add it.
There's no point in backporting this far away, but since the http_client
is fully operational in 2.6 it would make sense to backport this fix at
least there to secure the code.
The frame which are retransmitted by qc_dgrams_retransmit() are duplicated
from sent but not acknowledged packets and added to local frames lists.
Some may not have been sent. If not replaced somewhere (linked to the
connection) they are lost for ever (leak). We splice the list remaining
contents to the packets number space frame list to avoid such a situation.
Must be backported to 2.6.
<force_ack> boolean variable passed to qc_do_build_pkt() which builds a clear
packet is there to force this function to build an ACK frame regardless of
others conditions. This is used during handshake, when we acknowledge every
handshake packets received.
This variable was already taken into an account by the local variable <must_ack>
which is there at least to ignore any other conditions than this one: "are
we building a probing packet?". Indeed we do not want to add ACK frames when
we probe the peers. This is to have more chances to embed the new duplicated frames
into another packets without splitting them. So, the test on <force_ack> boolean
value is useless, silly and brakes the rule which consists in not acknowledging
when probing.
Must be backported to 2.6.
The "first req" rule consists in not delivering a connection's first
request to a connection that's not known for being safe so that we
don't deliver a broken page to a client if the server didn't intend to
keep it alive. That's what's used by "http-reuse safe" particularly.
But the reason this rule was created was precisely because haproxy was
not able to re-emit the request to the server in case of connection
breakage, which is precisely what l7 retries later brought. As such,
there's no reason for enforcing this rule when l7 retries are properly
enabled because such a blank page will trigger a retry and will not be
delivered to the client.
This patch simply checks that the l7 retries are enabled for the 3 cases
that can be triggered on a dead or dying connection (failure, empty, and
timeout), and if all 3 are enabled, then regular idle connections can be
reused.
This could almost be marked as a bug fix because a lot of users relying
on l7 retries do not necessarily think about using http-reuse always due
to the recommendation against it in the doc, while the protection that
the safe mode offers is never used in that mode, and it forces the http
client not to reuse existing persistent connections since it never sets
the "not first" flag.
It could also be decided that the protection is not used either when
the origin is an applet, as in this case this is internal code that
we can decide to let handle the retry by itself (all info are still
present). But at least the httpclient will be happy with this alone.
It would make sense to backport this at least to 2.6 in order to let
the httpclient reuse connections, maybe to older releases if some
users report low reuse counts.
When idle H1 connections cannot be stored into a server pool or are later
evicted, they're often seen closed with a FIN then an RST. The problem is
that this is sufficient to leave them in TIME_WAIT in the local sockets
table and port exhaustion may happen.
The reason is that in h1_release() we rely on h1_shutw_conn() which itself
decides whether to close in silent or normal mode only based on the
H1C_F_ST_SILENT_SHUT flag. This flag is only set by h1_shutw() based on
the requested mode. But when the connection is in the idle list, the mode
ought to always be silent.
What this patch does is to set the flag before trying to add to the idle
list, and remove it after removing from the idle list. This way if the
connection fails to be added or has to be killed, it's closed with an
RST.
This must be backported as far as 2.4. It's not sure whether older
versions need an equivalent.
The PCRE2 JIT support is buggy. If HAProxy is compiled with USE_PCRE2_JIT
option while the PCRE2 library is compiled without the JIT support, any
matching will fail because pcre2_jit_compile() return value is not properly
handled. We must fall back on pcre2_match() if PCRE2_ERROR_JIT_BADOPTION
error is returned.
This patch should fix the issue #1848. It must be backported as far as 2.4.
If the service is rechecked before a reload, that may cause the config
to be parsed twice and file-backed rings to be lost.
Here we make sure that such a ring does contain information before
deciding to rotate it. This way the first process starting after some
writes will cause a rotate but not subsequent ones until new writes
are applied.
An attempt was also made to disable rotations on checks but this was a
bad idea, as the ring is still initialized and this causes the contents
to be lost. The choice of initializing the ring during parsing is
questionable but the config check ought to be as close as possible to a
real start, and we could imagine that the ring is used by some code
during startup (e.g. lua). So this approach was abandonned and config
checks also cause a rotation, as the purpose of this rotation is to
preserve latest information against accidental removal.
ckch_inst_free() unlink the ckch_inst_link structure but never free it.
It can't be fixed simply because cli_io_handler_commit_cafile_crlfile()
is using a cafile_entry list to iterate a list of ckch_inst entries
to free. So both cli_io_handler_commit_cafile_crlfile() and
ckch_inst_free() would modify the list at the same time.
In order to let the caller manipulate the ckch_inst_link,
ckch_inst_free() now checks if the element is still attached before
trying to detach and free it.
For this trick to work, the caller need to do a LIST_DEL_INIT() during
the iteration over the ckch_inst_link.
list_for_each_entry was also replace by a while (!LIST_ISEMPTY()) on the
head list in cli_io_handler_commit_cafile_crlfile() so the iteration
works correctly, because it could have been stuck on the first detached
element. list_for_each_entry_safe() is not enough to fix the issue since
multiple element could have been removed.
Must be backported as far as 2.5.
Move these traces to QUIC_EV_CONN_SPPKTS trace event. They were displayed
at a useless location. Make them displayed just after having sent a packet
and when checking the anti-amplication limit.
Useful to diagnose issues in relation with the recovery.
This reverts commit 056ad01d55.
This reverts commit ddd480cbdc.
The architecture is ambiguous here: ckch_inst_free() is detaching and
freeing the "ckch_inst_link" linked list which must be free'd only from
the cafile_entry side.
The problem was also hidden by the fix ddd480c ("BUG/MEDIUM: ssl: Fix a
UAF when old ckch instances are released") which change the ckchi_link
inner loop by a safe one. However this can't fix entirely the problem
since both __ckch_inst_free_locked() could remove several nodes in the
ckchi_link linked list.
This revert is voluntary reintroducing a memory leak before really fixing
the problem.
Must be backported in 2.5 + 2.6.
When old chck instances is released at the end of "commit ssl ca-file" or
"commit ssl crl-file" commands, the link is released. But we walk through
the list using the unsafe macro. list_for_each_entry_safe() must be used.
This bug was introduced by commit 056ad01d5 ("BUG/MINOR: ssl: leak of
ckch_inst_link in ckch_inst_free()"). Thus this patch must be backported as
far as 2.5.
The commit 871dd8211 ("BUG/MINOR: tcpcheck: Disable QUICKACK only if data
should be sent after connect") introduced a regression. It removes the test
on the next rule to be able to disable TCP_QUICKACK when only a connect is
performed (so no next rule).
This patch must be backported as far as 2.2.
ckch_inst_free() unlink the ckch_inst_link structure but never free it.
It can cause a memory leak upon a ckch_inst_free() done with CLI
operation.
Bug introduced by commit 4458b97 ("MEDIUM: ssl: Chain ckch instances in
ca-file entries").
Must be backported as far as 2.5.
Commit b0c4827 ("BUG/MINOR: ssl: free the cafile entries on deinit")
introduced a double free.
The node was never removed from the tree before its free.
Fix issue #1836.
Must be backported where b0c4827 was backported. (2.6 for now).
Without such a trace, we do not know when a datagram is sent. Only trace for
the packets inside the datagrams were displayed.
Must be backported to 2.6.
This bug arrived with this commit:
"MINOR: quic: Add reusable cipher contexts for header protection"
haproxy could crash because of missing cipher contexts initializations for
the header protection and draft-v2 Initial secrets. This was due to the fact
that these initialization both for RX and TX secrets were done outside of
qc_new_isecs(). The role of this function is definitively to initialize these
cipher contexts in addition to the derived secrets. Indeed this function is called
by qc_new_conn() which initializes the connection but also by qc_conn_finalize()
which also calls qc_new_isecs() in case of a different QUIC version was negotiated
by the peers from the one used by the client for its first Initial packet.
This was reported by "v2" QUIC interop test with at least picoquic as client.
Must be backported to 2.6.
There's no point trying to send() on a socket on which an error was already
reported. This wastes syscalls. Till now it was possible to occasionally
see an attempt to sendto() after epoll_wait() had reported EPOLLERR.
In 2.2, commit 5d7dcc2a8 ("OPTIM: epoll: always poll for recv if neither
active nor ready") was added to compensate for the fact that our iocbs
are almost always asynchronous now and do not have the opportunity to
update the FD correctly. As such, they just perform a wakeup, the FD is
turned to inactive, the tasklet wakes up, performs the I/O, updates the
FD, most of the time this is done withing the same polling loop, and the
update cancels itself in the poller without having to switch the FD off
then on.
The issue was that when deciding to claim an FD was active for reads
if it was active for writes, we forgot one situation that unfortunately
causes excessive wakeups: dealing with errors. Indeed, errors are
reported and keep ringing as long as the FD is active for sending even
if the consumer disabled the FD for receiving. Usually this only causes
one extra wakeup for the time it takes to consider a potential write
subscriber and to call it, though with many tasks in a run queue, it
can last a bit longer and be reported more often.
The fix consists in checking that we really want to get more receive
events on this FD, that is:
- that no prevous EPOLLERR was reported
- that the FD doesn't carry a sticky error
- that the FD is not shut for reads
With this, after the last epoll_wait() reports EPOLLERR, one last recv()
is performed to flush pending data and the FD is immediately unregistered.
It's probably not needed to backport this as its effects are not much
visible, though it should not harm.
Before, EPOLLERR was seen twice:
accept4(4, {sa_family=AF_INET, sin_port=htons(22314), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_NONBLOCK) = 8
accept4(4, 0x261b160, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(8, "POST / HTTP/1.1\r\nConnection: close\r\nTransfer-encoding: chunk"..., 16320, 0, NULL, NULL) = 66
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(8002), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {events=EPOLLIN|EPOLLRDHUP, data={u32=8, u64=8}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
epoll_wait(3, [{events=EPOLLOUT, data={u32=9, u64=9}}], 200, 355) = 1
recvfrom(9, 0x25cfb30, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(9, "POST / HTTP/1.1\r\ntransfer-encoding: chunked\r\n\r\n", 47, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 47
epoll_ctl(3, EPOLL_CTL_MOD, 9, {events=EPOLLIN|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 354) = 1
recvfrom(9, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 16320, 0, NULL, NULL) = 57
sendto(8, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 57, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 57
->epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 354) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 9, 0x7ffe0b65fb24) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=8, u64=8}}], 200, 354) = 1
recvfrom(8, "A\n0123456789\r\n0\r\n\r\n", 16320, 0, NULL, NULL) = 19
close(9) = 0
close(8) = 0
After, EPOLLERR is only seen only once, with one less call to epoll_wait():
accept4(4, {sa_family=AF_INET, sin_port=htons(22362), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_NONBLOCK) = 8
accept4(4, 0x20d0160, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(8, "POST / HTTP/1.1\r\nConnection: close\r\nTransfer-encoding: chunk"..., 16320, 0, NULL, NULL) = 66
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(8002), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(3, EPOLL_CTL_ADD, 8, {events=EPOLLIN|EPOLLRDHUP, data={u32=8, u64=8}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
epoll_wait(3, [{events=EPOLLOUT, data={u32=9, u64=9}}], 200, 411) = 1
recvfrom(9, 0x2084b30, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(9, "POST / HTTP/1.1\r\ntransfer-encoding: chunked\r\n\r\n", 47, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 47
epoll_ctl(3, EPOLL_CTL_MOD, 9, {events=EPOLLIN|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 411) = 1
recvfrom(9, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 16320, 0, NULL, NULL) = 57
sendto(8, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 57, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 57
epoll_ctl(3, EPOLL_CTL_DEL, 9, 0x7ffc95d46f04) = 0
epoll_wait(3, [{events=EPOLLIN, data={u32=8, u64=8}}], 200, 411) = 1
recvfrom(8, "A\n0123456789\r\n0\r\n\r\n", 16320, 0, NULL, NULL) = 19
close(9) = 0
close(8) = 0
In 2.6-dev4, a fix for truncated response was brought with commit 99bbdbcc2
("BUG/MEDIUM: mux-h1: only turn CO_FL_ERROR to CS_FL_ERROR with empty ibuf"),
trying to address the situation where an error is present at the connection
level but some data are still pending to be read by the stream. However,
this patch did not consider the case where the stream was no longer willing
to read the pending data, resulting in a situation where some aborted
transfers could lead to excessive CPU usage by causing constant stream
wakeups for which no error was reported. This perfectly matches what was
observed and reported in github issue #1842. It's not trivial to reproduce,
but aborting HTTP/1 pipelining in the middle of transfers seems to give
good results (using h2load and Ctrl-C in the middle).
The fix was incorrct as the error should be held only if there were data
that the stream was able to read. This is the approach taken by this patch,
which also checks via SE_FL_EOI | SE_FL_EOS that the stream will be able
to consume the pending data.
Note that the loop was provoked by the attempt by sc_conn_io_cb() itself
to call sc_conn_send() which resulted in a write subscription in
h1_subscribe() which immediately calls a tasklet_wakeup() since the
event is ready, and that it is now stopped by the presence of SE_FL_ERROR
that is checked in sc_conn_io_cb(). It seems that an extra check down the
send() path to refrain from subscribing when the connection is in error
could speed up error detection or at least avoid a risk of loops in this
case, but this is tricky. In addition, there's already SE_FL_ERR_PENDING
that seems more suitable for reporting when there are pending data, but
similarly, it probably isn't checked well enough to be suitable for
backports.
FWIW the issue may (unreliably) be reproduced by chaining haproxy to
httpterm and issuing:
(printf "GET /?s=10g HTTP/1.1\r\n\r\n"; sleep 0.1; printf "\r\n") | \
nc6 --half-close 0 8001 | head -c1000000000 >/dev/null
It's necessary to play with the size of the head command that's supposed
to trigger the error at some point. A variant involving h2load in h1 mode
and multiple pipelined streams, that is stopped with Ctrl-C also tends to
work.
As the fix above was backported as far as 2.0, it would be tempting to
backport this one as far. However tests have shown that the oldest
version that can trigger this issue is 2.5, maybe due to subtle
differences in older ones, so it's probably not worth going further
until an issue is reported. Note that in 2.5 and older, the SE_FL_*
flags are applied on the conn_stream instead, as CS_FL_*.
Special thanks go to Felipe W Damasio for providing lots of detailed data
allowing to quickly spot the root cause of the problem.
applet:getline() and applet:receive() functions for HTTP applets must rely
on the channel flags to detect the end of the message and not on HTX
flags. It means CF_EOI must be used instead of HTX_FL_EOM.
It is important because the HTX flag is transient. Because there is no flag
on HTTP applets to save the info, it is not reliable. However CF_EOI once
set is never removed. So it is safer to rely on it. Otherwise, the call to
these functions hang.
This patch must be backported as far as 2.4.
On a reload, if the previous resync was not finished, the freshly old worker
must not try to start a new resync. Otherwise, it will compete with the
older wokers, slowing down or blocking the resync. Only an up-to-date woker
must try to perform a local resync.
This patch must be backported as far as 2.0 (and maybe to 1.8 too).
When a worker is stopped, the resync timer is used to limit in time the
connection stage to the new worker to perform the local resync. However,
this timer must be stopped when the resync is in progress and it must be
re-armed if the resync is interrupted (for instance because another
reload). Otherwise, if the resync is a bit long, an old worker may be killed
too early.
This bug was introduce by the commit 160fff665 ("BUG/MEDIUM: peers: limit
reconnect attempts of the old process on reload"). It must be backported as
far as 2.0.
Only the client timeout was set. Nothing prevent a peer applet to stall
during a connect or waiting a message from a remote peer. To avoid any
issue, it is important to also set connection and server timeouts. The
connect timeout is set to 1s and the server timeout is set to 5s.
This patch must be backported to all supported versions.
A bug was introduced by the commit b042e4f6f ("BUG/MAJOR: spoe: properly
detach all agents when releasing the applet"). The fix is not correct. We
really want to known if the released appctx is the last one or not. It is
important when async mode is used. If there are still running applets, we
just need to remove the reference on the current applet from streams in the
async waiting queue.
With the commit above, in async mode, if there are still running applets, it
will work as expected. Otherwise a processing timeout will be reported for
all these streams. So it is not too bad. But for other modes (sync and
pipelining), the async waiting queue is always empty. If at least one stream
is waiting to send a message, a new applet is created. It is an issue if the
SPOA is unhealthy because the number of running applets may explode.
However, the commit above tried to fix an issue. The bug is in fact when an
new SPOE applet is created. On success, we must remove reference on the
current appctx from the streams in the async waiting queue.
This patch must be backported as far as 1.8.
Several frames could remain as not build into <frm_list> built by qc_build_frms()
after having stopped at the first building error. So only one frame was reinserted in
the frame list passed as parameter to qc_do_build_pkt(). Then <frm_list> was
spliced to the packet frame list even its frames were not built, nor attached to
any packet. Such frames had their ->pkt member set to NULL, but considered as
built, then sent leading to a crash in qc_release_frm() where ->pkt is dereferenced.
This issue was again reported by useful traces provided by Tristan in GH #1808.
Must be backported to 2.6.
This function must duplicate frames be resent from packets. Some of
them are still in flight, others have already been detected as lost.
In this case the original frame ->pkt member is NULL.
Add a trace to distinguish these cases.
Thank you to Tristan for having reported this issue in GH #1808.
Must be backported to 2.6.
Fix the resolution in the httpclient when a port is associated to a
domain. The do-resolve action doesn't support a port in its input.
Must be backported to 2.6. Require the "host_only" converter to be
backported.
This reverts commit f61398a7ca.
After having checked a version with more traces and reproduced the issue
as reported by Tristan in GH #1808, there are remaining cases where
a duplicated but not already sent frame have to be marked as acked because
the frame it was copied from was acknowledeged before its copied was sent.
Must be backported to 2.6.
Since this commit:
"BUG/MINOR: quic: Wrong list_for_each_entry() use when building packets from
qc_do_build_pkt()"
there is no more reason that frames can be released without having been
sent, i.e. frames with non null ->pkt member. This ->pkt is the packet
the frame is attached to.
Must be backported to 2.6.
This function parses the QUIC packet from a UDP datagram. It was originally
supposed to be run by several thread. Here we remove a section of code
where the current thread checks there is not another thread which has already
inserted the new quic_conn it is trying to insert in the connections tree.
Must be backported to 2.6 to ease the future backports to come.
This was due to a missing I/O handler tasklet wakeup in process_timer() when
detecting packet loss. As, qc_release_lost_pkts() could remove the lost packets
from the in flight packets count, qc_set_timer() could cancel the timer used
to wakeup the connection I/O handler. Then the connection could remain idle
until it ends.
Must be backported to 2.6.
Packets with null "in flight" lengths are kept as the others packets as sent
but not already acknowledeged in the by packet number space trees.
But qc_release_lost_pkts() relied on this in fligh length to release the
memory allocated for this packets. We must release the memory allocated for
all the lost packets regardless of their in fligh lengths.
Modify this function to do nothing if the list of lost packets passed
as argument is empty. Stop using <lost_bytes> variable to decide if some packets
memory must be released or not.
Modify the callers to stop checking if this list is empty.
Should helping in fixing memory leak as reported by Tristan in GH #1801.
Must be backported to 2.6.
This reverts commit da9c441886.
Indeed this commit prevented the ACK only packets to be used as other packets
when they are acknowledged. Even if not ack-eliciting packets they are
acknowledged alongside others packets. Such acknowledged ACK only packets
must be used for instance to compute the RTT.
Must be backported to 2.6 if da9c441 was backported to 2.6.
Shut the connect() warning of resolvers_finalize_config() when the
configuration was not emitted manually.
This shuts the warning for the "default" resolvers which is created
automatically for the httpclient.
Must be backported in 2.6.
It is only a real problem for agent-checks when there is no agent string to
send. The condition to disable TCP_QUICKACK was only based on the action
type following the connect one. But it is not always accurate. indeed, for
agent-checks, there is always a SEND action. But if there is no "agent-send"
string defined, nothing is sent. In this case, this adds 200ms of latency
with no reason.
To fix the bug, a flag is now used on the CONNECT action to instruct there
are data that should be sent after the connect. For health-checks, this flag
is set if the action following the connect is a SEND action. For
agent-checks, it is set if an "agent-send" string is defined.
This patch should fix the issue #1836. It must be backported as far as 2.2.
When doing a re-exec, the master was creating a "default" resolvers,
which could result in a warning emitted because the "default" resolvers
of the configuration file is not available anymore.
Skip the creating of the "default" resolvers in wait mode, this is not
useful in the master.
Must be backported as far as 2.6.
Patch c31577f ("MEDIUM: resolvers: continue startup if network is
unavailable") was not working correctly. Indeed
resolvers_finalize_config() was returning a ERR type, but a postparser
is supposed to return 0 or 1.
The return value was never right, however it was only a problem since c31577f.
Must be backported in every stable branch.
The build on OpenBSD is broken since commit 5c83e3a15 ("MINOR: tcp_sample:
clarifying samples support per os, for further expansion."), hence it
only affects 2.7 and 2.6.
It looks like this changed things in such a way that if TCP_INFO is added
but the OS is not added to the list of OS's it will not build.
Extend support for get_tcp_info to OpenBSD.
This must be backported to 2.6.
As seen in GH issue #1770, peers synchronization do not cope well with
very large buffers because by default the only two reasons for stopping
the processing of updates is either that the end was reached or that
the buffer is full. This can cause high latencies, and even rightfully
trigger the watchdog when the operations are numerous and slowed down
by competition on the stick-table lock.
This patch introduces a limit to the number of messages one may send
at once, which now defaults to 200, regardless of the buffer size. This
means taking and releasing the lock up to 400 times in a row, which is
costly enough to let some other parts work.
After some observation this could be backported to 2.6. If so, however,
previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal
return condition from handler" and "BUG/MINOR: applet: make the call_rate
only count the no-progress calls" must be backported otherwise the call
rate might trigger the looping protection.
This is very similar to what we did in commit 6c539c4b8 ("BUG/MINOR:
stream: make the call_rate only count the no-progress calls"), it's
better to only count the call rate with no progress than to count all
calls and try to figure if there's no progress, because a fast running
applet might once satisfy the whole condition and trigger the bug. This
typically happens when artificially limiting the number of messages sent
at once by an applet, but could happen with plenty of highly interactive
applets.
This patch could be backported to stable versions if there are any
indications that it might be useful there.
We have quite numerous checks for abnormal applet handler behavior which
are supposed to trigger the loop protection. However, consecutive to
commit 15252cd9c ("MEDIUM: stconn: move the RXBLK flags to the stream
connector") that was merged into 2.6-dev12, one flag was incorrectly
renamed, and the check for an applet waiting for a buffer that is present
mistakenly turned to a check for missing room in the buffer. This erroneous
test could mistakenly trigger on applets that perform intensive I/Os doing
small exchanges each (e.g. cache, peers or HTTP client) if the load would
be sustained (>100k iops). For the cache this could represent higher than
13 Gbps on an object at least 1.6 GB large for example, which is quite
unlikely but theoretically possible.
This fix needs to be backported to 2.6.
Replace ->rx.pqpkts quic_enc_level struct member MT_LIST by an LIST.
Same thing for ->list quic_rx_packet struct member MT_LIST.
Update the code consequently. This was a reminisence of the multithreading
support (several threads by connection).
Must be backported to 2.6
Do not rely on the fact the callers of qc_build_frm() handle their
buffer passed to function the correct way (without leaving garbage).
Make qc_build_frm() update the buffer passed as argument only if
the frame it builds is well formed.
As far as I sse, there is no such callers which does not handle
carefully such buffers.
Must be backported to 2.6.
This is list_for_each_entry_safe() which must be used if we want to delete elements
inside its code block. This could explain that some frames which were not built were added
to packets with a NULL ->pkt member.
Thank you to Tristan for having reported this issue through backtraces in GH #1808
Must be backported to 2.6.
First, these packets must not be inserted in the tree of TX packets.
They are never explicitely acknowledged (for instance an ACK only
packet will never be acknowledged). Furthermore, if taken into an account
these packets may uselessly disturb the congestion control. We do not care
if they are lost or not. Furthermore as the ->in_fligh_len member value is null
they were not released by qc_release_lost_pkts() which rely on these values
to decide to release the allocated memory for such packets.
Must be backported to 2.6.
The master is re-exec with an empty proxies list if no master CLI is
configured.
This results in infinite loop since last patch:
3b68b602 ("BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized")
This patch avoid to loop again on log-forward proxies list if empty.
This patch should be backported until v2.3
We used to emit a diag warning in case ranges were used both with the
process and thread part of a thread spec. Now with groups it's not
longer a problem, so let's just kill this warning.
Since 2.7-dev2 with commit 5b09341c02 ("MEDIUM: cpu-map: replace the
process number with the thread group number"), the thread group has
replaced the process number in the "cpu-map" directive. In part due to
a design limit in 2.4 and 2.5, a special case was made of thread 1 in
commit bda7c1decd ("MEDIUM: config: simplify cpu-map handling"), because
there was no other location to store a single-threaded setup's mask by
then. The combination of the two resulted in a problem with thread
groups, by which as soon as one line exhibiting thread number 1 alone
was found in a config, the mask would be applied to all threads in the
group.
The loop was reworked to avoid this obsolete special case, and was
factored for better legibility. One obsolete comment about nbproc
was also removed. No backport is needed.
Some clients send CONNECTION_CLOSE frame without acknowledging the STREAM
data haproxy has sent. In this case, when closing the connection if
there were remaining data in QUIC stream buffers, they were not released.
Add a <closing> boolean option to qc_stream_desc_free() to force the
stream buffer memory releasing upon closing connection.
Thank you to Tristan for having reported such a memory leak issue in GH #1801.
Must be backported to 2.6.
In ticket #1805 an user is impacted by the limitation of size of the CLI
buffer when updating a ca-file.
This patch allows a user to append new certificates to a ca-file instead
of trying to put them all with "set ssl ca-file"
The implementation use a new function ssl_store_dup_cafile_entry() which
duplicates a cafile_entry and its X509_STORE.
ssl_store_load_ca_from_buf() was modified to take an apped parameter so
we could share the function for "set" and "add".
In order to be able to append new CA in a cafile_entry,
ssl_store_load_ca_from_buf() was reworked and a "append" parameter was
added.
The function is able to keep the previous X509_STORE which was already
present in the cafile_entry.
"set ssl ca-file" does not return any error when a ca-file is empty or
only contains comments. This could be a problem is the file was
malformated and did not contain any PEM header.
It must be backported as far as 2.5.
Implement quic_tls_rx_hp_ctx_init() and quic_tls_tx_hp_ctx_init() to initiliaze
such header protection cipher contexts for each RX and TX parts and for each
packet number spaces, only one time by connection.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (ORTT, 1RTT,
Handshake).
Modify qc_do_rm_hp() and quic_apply_header_protection() to reuse these
cipher contexts.
Note that there is no need to modify the key update for the header protection.
The header protection secrets are never updated.
Since commit 2071a99df ("MINOR: listener/ssl: set the SSL xprt layer only
once the whole config is known") the xprt is initialized for ssl directly
from a generic funtion used to parse bind args.
But the 'bind' lines from 'log-forward' sections were forgotten in commit
55f0f7bb5 ("MINOR: config: use the new bind_parse_args_list() to parse a
"bind" line").
This patch re-works 'log-forward' section parsing to use the generic
function to parse bind args and fix the issue.
Since the generic way to parse was introduced in 2.6, this patch
should be backported as far as this version.
Some initialisation for log forward proxies was missing such
as ssl configuration on 'log-forward's 'bind' lines.
After the loop on the proxy initialization code for proxies present
in the main proxies list, this patch force to loop again on this code
for proxies present in the log forward proxies list.
Those two lists should be merged. This will be part of a global
re-work of proxy initialization including peers proxies and resolver
proxies.
This patch was made in first attempt to fix the bug and to facilitate
the backport on older branches waiting for a cleaner re-work on proxies
initialization on the dev branch.
This patch should be backported as far as 2.3.
This wrong trace came with this commit:
"BUG/MINOR: quic: Possible crashes when dereferencing ->pkt quic_frame struct member"
In qc_release_frm() we mark frames as acked. Nothing to see with references
to frames.
Thank you to Willy for having caught this one.
Must be backported to 2.6 as these traces arrived with a bug fix to be backported
to 2.6.
When duplicated frames are splitted, we must propagate this information
to the new allocated frame and add a reference to this new frame
to the reference list of the original frame.
Must be backported to 2.6
This was done at several places. First in qc_requeue_nacked_pkt_tx_frms.
This aim of this function is, if needed, to requeue all the TX frames of a lost
<pkt> packet passed as argument and detach them from this packet they have been
sent from. They are possible cases where the frm->pkt quic_frame struct member could
be NULL, as a result of a duplication of an original frame by qc_dup_pkt_frms(). This
function adds the duplicated frame to the original frame reference list:
LIST_APPEND(&origin->reflist, &dup_frm->ref);
But, in this function, the packet which contains the frame is the one which is passed
as argument (for debug purpose). So let us prefer using this variable.
Also do not dereference this ->pkt quic_frame member in qc_release_frm() and
qc_frm_unref() and add a trace to catch the frame with a null ->pkt member.
They are logically frames which have not already been sent.
Thank you to Tristan for having reported such crashes in GH #1808.
Must be backported to 2.6