Originally in 1.8 we wanted to have an independent mux that could possibly
be disabled and would not impose dependencies on the outside. Everything
would fit into a single C file and that was fine.
Nowadays muxes are unavoidable, and not being able to easily inspect them
from outside is sometimes a bit of a pain. In particular, the flags utility
still cannot be used to decode their flags.
As a first step towards this, this patch moves the flags and enums to
mux_h2-t.h, as well as the two state decoding inline functions. It also
dropped the H2_SS_*_BIT defines that nobody uses. The mux_h2.c file remains
the only one to include that for now.
Please refer to GH #1859 for more info.
Coverity suspected improper proxy pointer handling.
Without the fix it is considered safe for the moment, but it might not
be the case in the future as we want to keep the ability to have
isolated listeners.
Making sure stop_listener(), pause_listener(), resume_listener()
and listener_release() functions make proper use
of px pointer in that context.
No need for backport except if multi-connection protocols (ie:FTP)
were to be backported as well.
Since this counter was added, it was incremented at the wrong place for
client streams. It was incremented when the stream-connector (formely the
conn-stream) was created while it should be done when the H1 stream is
created. Thus, on parsing error, on H1>H2 upgrades or TCP>H1 upgrades, the
counter is not incremented. However, it is always decremented when the H1
stream is destroyed.
On bakcned side, there is no issue.
This patch must be backported to 2.6.
As reported by Ilya and Coverity in issue #1858, since recent commit
eea152ee6 ("BUG/MINOR: signals/poller: ensure wakeup from signals")
which removed the test for the global signal flag from the pollers'
loop, the remaining "wake" flag doesn't need to be tested since it
already participates to zeroing the wait_time and will be caught
on the previous line.
Let's just remove that test now.
This patch adresses the issue #1626.
Adding support for PR_FL_PAUSED flag in the function stats_fill_fe_stats().
The command 'show stat' now properly reports a disabled frontend
using "PAUSED" state label.
This patch depends on the following commits:
- 7d00077fd5 "BUG/MEDIUM: proxy: ensure pause_proxy()
and resume_proxy() own PROXY_LOCK".
- 001328873c "MINOR: listener: small API change"
- d46f437de6 "MINOR: proxy/listener: support for additional PAUSED state"
It should be backported to 2.6, 2.5 and 2.4
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).
It should be backported to 2.6, 2.5 and 2.4
A minor API change was performed in listener(.c/.h) to restore consistency
between stop_listener() and (resume/pause)_listener() functions.
LISTENER_LOCK was never locked prior to calling stop_listener():
lli variable hint is thus not useful anymore.
Added PROXY_LOCK locking in (resume/pause)_listener() functions
with related lpx variable hint (prerequisite for #1626).
It should be backported to 2.6, 2.5 and 2.4
There was a race involving hlua_proxy_* functions
and some proxy management functions.
pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to some race as lua code didn't make sure PROXY_LOCK
was owned before calling the proxy functions.
This patch makes sure it won't happen again elsewhere in the code
by locking PROXY_LOCK directly in resume and pause proxy functions
so that it's not the caller's responsibility anymore.
(based on stop_proxy() behavior that was already safe prior to the patch)
This should be backported to stable series.
Note that the API will likely differ < 2.4
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.
The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.
Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.
The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().
This fixes issue #1841.
Must be backported in every stable version.
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.
Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.
Perhaps there is a better way to do. This will have to be checked with Amaury.
Must be backported to 2.6.
This was due to a missing check in h3_trace() about the first argument
presence (connection) and h3_parse_settings_frm() which calls TRACE_LEAVE()
without any argument. Then this argument was dereferenced.
Must be backported to 2.6
<qc> variable was confused with <qel>. The consequence was that it was
always the same packet number space which was displayed: the first one (or
the Initial packet number space).
Must be backported to 2.6.
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).
Must be backported to 2.6.
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.
Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.
When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.
This should fix issue #1841
Must be backported in every stable version.
By default we now dump stats between caller and callee, but by
specifying "aggr" on the command line, stats get aggregated by
callee again as it used to be before the feature was available.
It may sometimes be helpful when comparing total call counts,
though that's about all.
The following task/tasklet handlers often appear in "show profiling tasks"
but were not resolved since static:
qc_io_cb, quic_conn_app_io_cb, process_timer,
quic_accept_run, qc_idle_timer_task
This commit simply exports them so they can be resolved now. "process_timer"
which was a bit too generic and renamed to qc_process_timer.
The function appears like this in "show profiling tasks", so let's export
it:
function calls cpu_tot cpu_avg lat_tot lat_avg
main+0x1463f0 92 77.28us 839.0ns 2.018ms 21.93us <- wake_expired_tasks@src/task.c:429 task_drop_running
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
Now that ->wake_date is common to tasks and tasklets, we don't need
anymore to carry a duplicate control block to read and update it for
tasks and tasklets. And given that this code was present early in the
if/else fork between tasks and tasklets, taking it out of the block
allows to move the task part into a more visible "else" branch that
also allows to factor the epilogue that resets th_ctx->current and
updates profile_entry->cpu_time, which also used to be duplicated.
Overall, doing just that saved 253 bytes in the function, or ~1/6,
which is not bad considering that it's on a hot path. And the code
got much ore readable.
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.
The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:
MINOR: sched: store the current profile entry in the thread context
It should be backported to 2.6 as it does help for troubleshooting.
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
- a new wakeup while the task is executing would reset the ->wake_date
field before returning and make abnormally low values being reported;
that was likely the case for taskrun_applet for self-waking applets;
- when the task dies, NULL is returned and the call date couldn't be
retrieved, so that CPU time was not being accounted for. This was
particularly visible with process_stream() which is usually called
only twice per request, and whose time was systematically halved.
The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.
One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.
Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.
Task profiling output after 1 million HTTP request previously looked like
this:
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
h1_io_cb 2012338 4.850s 2.410us 12.91s 6.417us
process_stream 2000136 9.594s 4.796us 34.26s 17.13us
sc_conn_io_cb 2000135 1.973s 986.0ns 30.24s 15.12us
h1_timeout_task 137 - - 2.649ms 19.34us
accept_queue_process 49 152.3us 3.107us 321.7yr 6.564yr
main+0x146430 7 5.250us 750.0ns 25.92us 3.702us
srv_cleanup_idle_conns 1 559.0ns 559.0ns 918.0ns 918.0ns
task_run_applet 1 - - 2.162us 2.162us
Now it looks like this:
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
h1_io_cb 2014194 4.794s 2.380us 13.75s 6.826us
process_stream 2000151 20.01s 10.00us 36.04s 18.02us
sc_conn_io_cb 2000148 2.167s 1.083us 32.27s 16.13us
h1_timeout_task 198 54.24us 273.0ns 3.487ms 17.61us
accept_queue_process 52 158.3us 3.044us 409.9us 7.882us
main+0x1466e0 18 16.77us 931.0ns 63.98us 3.554us
srv_cleanup_toremove_conns 8 282.1us 35.26us 546.8us 68.35us
srv_cleanup_idle_conns 3 149.2us 49.73us 8.131us 2.710us
task_run_applet 3 268.1us 89.38us 11.61us 3.871us
Note the two-fold difference on process_stream().
This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:
MINOR: task: permanently enable latency measurement on tasklets
CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.
This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.
This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.
This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.
This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.
A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
This verification is done by ssl_sock_bind_verifycbk() which is set at different
locations in the ssl_sock.c code . About QUIC connections, there are a lot of chances
the connection object is not initialized when entering this function. What must
be accessed is the SSL object to retrieve the connection or quic_conn objects,
then the bind_conf object of the listener. If the connection object is not found,
we try to find the quic_conn object.
Modify ssl_sock_dump_errors() interface which takes a connection object as parameter
to also passed a quic_conn object as parameter. Again this function try first
to access the connection object if not NULL or the quic_conn object if not.
There is a remaining thing to do for QUIC: store the certificate verification error
code as it is currently stored in the connection object. This error code is at least
used by the "bc_err" and "fc_err" sample fetches.
There are chances this bug is in relation with GH #1851. Thank you to @tasavis
for the report.
Must be merged into 2.6.
On frontend side, "h1-case-adjust-bogus-client" option is now supported in
TCP mode. It is important to be able to adjust the case of response headers
when a connection is routed to an HTTP backend. In this case, the client
connection is upgraded to H1.
On backend side, "h1-case-adjust-bogus-server" option is now also supported
in TCP mode to be able to perform HTTP health-checks with a case adjustment
of the request headers.
This patch should be backported as far as 2.0.
This trick is deprecated since the health-check refactoring, It is now
invalid. It means the following line will trigger an error during the
configuration parsing:
option httpchk OPTIONS * HTTP/1.1\r\nHost:\ www
It must be replaced by:
option httpchk OPTIONS * HTTP/1.1
http-check send hdr Host www
ssl_tlsext_ticket_key_cb() is called when "tls-ticket-keys" option is used on a
"bind" line. It needs to have an access to the TLS ticket keys which have been
stored into the listener bind_conf struct. The fix consists in nitializing the
<ref> variable (references to TLS secret keys) the correct way when this callback
is called for a QUIC connection. The bind_conf struct is store into the quic_conn
object (QUIC connection).
This issue may be in relation with GH #1851. Thank you for @tasavis for the report.
Must be backported to 2.6.
Obviously, frames which are duplicated from others must not be retransmitted if
the original frame they were duplicated from was already acknowledged.
This should have been detected by qc_build_frms() which skips such frames,
except if the QUIC xprt does really bad things which are not supported by
the upper layer. This will have to be checked with Amaury.
To prevent the retransmision of these frames which leads to crashes as reported by
hpn0t0ad this gdb backtrace in GH #1809 where the frame builder tries to copy a huge
number of bytes to the packet buffer:
Thread 7 (Thread 0x7fddf373a700 (LWP 13)):
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:520
No locals.
#1 0x000055b17435705e in quic_build_stream_frame (buf=0x7fddf372ef78, end=<optimized out>, frm=0x7fdde08d3470, conn=<optimized out>) at src/quic_frame.c:515
to_copy = 18446697703428890384
stream = 0x7fdde08d3490
wrap = <optimized out>
which matches this part of quic_frame.c code:
wrap = (const unsigned char *)b_wrap(stream->buf);
if (stream->data + stream->len > wrap) {
size_t to_copy = wrap - stream->data;
memcpy(*buf, stream->data, to_copy);
*buf += to_copy;
we release as soon as possible the impacted frames as there is really no need
to retransmit such frames.
Thank you to @hpn0t0ad for having provided us with useful traces in github
issue #1809.
Must be backported in 2.6.
Commit 5c83e3a1563cd7face299bf08037e51f976eb5e3 made some adjustments
to clarify which TCP_INFO information is supported by each respective
OS.
There was a comment like so..
Note that fc_rtt and fc_rttvar are supported on any OS that has TCP_INFO,
not just linux/freebsd/netbsd, so we continue to expose them unconditionally.
But the diff didn't do so in a consistent manner.
In github issue #1850, Christian Ruppert reported a case of crash in
2.6 when failing to parse some http rules. This started to happen
with 2.6 commit dd7e6c6 ("BUG/MINOR: http-rules: completely free
incorrect TCP rules on error") but has some of its roots in 2.2
commit 2eb539687 ("MINOR: http-rules: Add release functions for
existing HTTP actions").
The cause is that when the release function is set for HTTP actions,
the rule->arg.http.fmt list head is not yet initialized, hence is
NULL, thus the release function crashes when it tries to iterate over
it. In fact this code was initially not written with the perspective
of releasing such elements upon error, so the arg list initialization
happened after error checking.
This patch just moves the list initialization just after setting the
release pointer and that's OK.
This patch must be backported to 2.6 since the problem is visible
there. It could be backported to 2.5 but the issue is not triggered
there without the first mentioned patch above that landed in 2.6, so
it will not bring any obvious benefit.
We now have two functions, one for dumping connections and the other
one for dumping the streams. This will permit to use it from show_sd.
A few optional line breaks were inserted where relevant to keep lines
homogenous when a prefix is passed.