Commit Graph

14616 Commits

Author SHA1 Message Date
Willy Tarreau
6a28a30efa MINOR: tasks: do not keep cpu and latency times in struct task
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.

Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.

The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
2022-09-08 14:19:15 +02:00
Willy Tarreau
beee600491 BUG/MINOR: stream/sched: take into account CPU profiling for the last call
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.

The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:

   MINOR: sched: store the current profile entry in the thread context

It should be backported to 2.6 as it does help for troubleshooting.
2022-09-08 14:19:15 +02:00
Willy Tarreau
1efddfa6bf MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
2022-09-08 14:19:15 +02:00
Willy Tarreau
62b5b96bcc BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
  - a new wakeup while the task is executing would reset the ->wake_date
    field before returning and make abnormally low values being reported;
    that was likely the case for taskèrun_applet for self-waking applets;

  - when the task dies, NULL is returned and the call date couldn't be
    retrieved, so that CPU time was not being accounted for. This was
    particularly visible with process_stream() which is usually called
    only twice per request, and whose time was systematically halved.

The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.

One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.

Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.

Task profiling output after 1 million HTTP request previously looked like
this:

  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2012338   4.850s    2.410us   12.91s    6.417us
    process_stream              2000136   9.594s    4.796us   34.26s    17.13us
    sc_conn_io_cb               2000135   1.973s    986.0ns   30.24s    15.12us
    h1_timeout_task                 137      -         -      2.649ms   19.34us
    accept_queue_process             49   152.3us   3.107us   321.7yr   6.564yr
    main+0x146430                     7   5.250us   750.0ns   25.92us   3.702us
    srv_cleanup_idle_conns            1   559.0ns   559.0ns   918.0ns   918.0ns
    task_run_applet                   1      -         -      2.162us   2.162us

  Now it looks like this:
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2014194   4.794s    2.380us   13.75s    6.826us
    process_stream              2000151   20.01s    10.00us   36.04s    18.02us
    sc_conn_io_cb               2000148   2.167s    1.083us   32.27s    16.13us
    h1_timeout_task                 198   54.24us   273.0ns   3.487ms   17.61us
    accept_queue_process             52   158.3us   3.044us   409.9us   7.882us
    main+0x1466e0                    18   16.77us   931.0ns   63.98us   3.554us
    srv_cleanup_toremove_conns        8   282.1us   35.26us   546.8us   68.35us
    srv_cleanup_idle_conns            3   149.2us   49.73us   8.131us   2.710us
    task_run_applet                   3   268.1us   89.38us   11.61us   3.871us

Note the two-fold difference on process_stream().

This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:

     MINOR: task: permanently enable latency measurement on tasklets
     CLEANUP: task: rename ->call_date to ->wake_date
2022-09-08 14:19:15 +02:00
Willy Tarreau
04e50b3d32 CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
2022-09-08 14:19:15 +02:00
Willy Tarreau
768c2c5678 MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.

This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.

This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
2022-09-08 14:19:15 +02:00
Frédéric Lécaille
614742b79c MINOR: quic: No TRACE_LEAVE() in retrieve_qc_conn_from_cid()
This macro was confused with TRACE_ENTER().

Should be backported to 2.6.
2022-09-07 15:59:43 +02:00
Frédéric Lécaille
449804e27d MINOR: quic: Add traces about sent or resent TX frames
Very useful to help in debugging issues, especially during retransmissions.

Should be backported to 2.6
2022-09-07 15:59:29 +02:00
William Lallemand
70a6e637b4 MINOR: quic: add QUIC support when no client_hello_cb
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.

This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h
2022-09-07 11:33:28 +02:00
William Lallemand
373ce73695 BUILD: quic: fix the #ifdef in ssl_quic_initial_ctx()
As done on with ssl_sock_initial_ctx(), cleanup the ifdef for the
client_hello_cb and the no anti replay.
2022-09-07 11:11:59 +02:00
William Lallemand
4b7938d160 BUILD: ssl: fix the ifdef mess in ssl_sock_initial_ctx
ssl_sock_initial_ctx uses the wrong #ifdef to check the availability of
the client_hello_cb.

Cleanup the #ifdef, add comments and indentation.
2022-09-07 10:54:17 +02:00
William Lallemand
e6ec626ac5 BUILD: quic: enable early data only with >= openssl 1.1.1
Disable the early data in the QUIC code when not built with openssl >=
1.1.1.

LibreSSL 3.6.0 is impacted.
2022-09-07 09:33:46 +02:00
William Lallemand
844009d77a BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.

A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
2022-09-07 09:33:46 +02:00
Frédéric Lécaille
2be0ac55e1 BUG/MINOR: quic: Possible crash when verifying certificates
This verification is done by ssl_sock_bind_verifycbk() which is set at different
locations in the ssl_sock.c code . About QUIC connections, there are a lot of chances
the connection object is not initialized when entering this function. What must
be accessed is the SSL object to retrieve the connection or quic_conn objects,
then the bind_conf object of the listener. If the connection object is not found,
we try to find the quic_conn object.

Modify ssl_sock_dump_errors() interface which takes a connection object as parameter
to also passed a quic_conn object as parameter. Again this function try first
to access the connection object if not NULL or the quic_conn object if not.

There is a remaining thing to do for QUIC: store the certificate verification error
code as it is currently stored in the connection object. This error code is at least
used by the "bc_err" and "fc_err" sample fetches.

There are chances this bug is in relation with GH #1851. Thank you to @tasavis
for the report.

Must be merged into 2.6.
2022-09-06 20:42:02 +02:00
Christopher Faulet
a9e934bbd1 BUG/MINOR: h1: Support headers case adjustment for TCP proxies
On frontend side, "h1-case-adjust-bogus-client" option is now supported in
TCP mode. It is important to be able to adjust the case of response headers
when a connection is routed to an HTTP backend. In this case, the client
connection is upgraded to H1.

On backend side, "h1-case-adjust-bogus-server" option is now also supported
in TCP mode to be able to perform HTTP health-checks with a case adjustment
of the request headers.

This patch should be backported as far as 2.0.
2022-09-06 18:23:14 +02:00
Christopher Faulet
4b5f3029bc MINOR: http-check: Remove support for headers/body in "option httpchk" version
This trick is deprecated since the health-check refactoring, It is now
invalid. It means the following line will trigger an error during the
configuration parsing:

  option httpchk OPTIONS * HTTP/1.1\r\nHost:\ www

It must be replaced by:

  option httpchk OPTIONS * HTTP/1.1
  http-check send hdr Host www
2022-09-06 18:23:14 +02:00
Frédéric Lécaille
6aec1f380e BUG/MINOR: quic: Possible crash with "tls-ticket-keys" on QUIC bind lines
ssl_tlsext_ticket_key_cb() is called when "tls-ticket-keys" option is used on a
"bind" line. It needs to have an access to the TLS ticket keys which have been
stored into the listener bind_conf struct. The fix consists in nitializing the
<ref> variable (references to TLS secret keys) the correct way when this callback
is called for a QUIC connection. The bind_conf struct is store into the quic_conn
object (QUIC connection).

This issue may be in relation with GH #1851. Thank you for @tasavis for the report.

Must be backported to 2.6.
2022-09-06 17:56:53 +02:00
Frédéric Lécaille
025945f12c BUG/MINOR: quic: Retransmitted frames marked as acknowledged
Obviously, frames which are duplicated from others must not be retransmitted if
the original frame they were duplicated from was already acknowledged.
This should have been detected by qc_build_frms() which skips such frames,
except if the QUIC xprt does really bad things which are not supported by
the upper layer. This will have to be checked with Amaury.

To prevent the retransmision of these frames which leads to crashes as reported by
hpn0t0ad this gdb backtrace in GH #1809 where the frame builder tries to copy a huge
number of bytes to the packet buffer:

Thread 7 (Thread 0x7fddf373a700 (LWP 13)):
 #0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:520
No locals.
 #1  0x000055b17435705e in quic_build_stream_frame (buf=0x7fddf372ef78, end=<optimized out>, frm=0x7fdde08d3470, conn=<optimized out>) at src/quic_frame.c:515
        to_copy = 18446697703428890384
        stream = 0x7fdde08d3490
        wrap = <optimized out>

which matches this part of quic_frame.c code:

    wrap = (const unsigned char *)b_wrap(stream->buf);
    if (stream->data + stream->len > wrap) {
        size_t to_copy = wrap - stream->data;
        memcpy(*buf, stream->data, to_copy);
        *buf += to_copy;

we release as soon as possible the impacted frames as there is really no need
to retransmit such frames.

Thank you to @hpn0t0ad for having provided us with useful traces in github
issue #1809.

Must be backported in 2.6.
2022-09-06 14:23:52 +02:00
Brad Smith
ef9d594839 MINOR: Revert part of clarifying samples support per os commit
Commit 5c83e3a156 made some adjustments
to clarify which TCP_INFO information is supported by each respective
OS.

There was a comment like so..

Note that fc_rtt and fc_rttvar are supported on any OS that has TCP_INFO,
not just linux/freebsd/netbsd, so we continue to expose them unconditionally.

But the diff didn't do so in a consistent manner.
2022-09-03 06:11:08 +02:00
Willy Tarreau
6a03a0d86d BUG/MINOR: http-act: initialize http fmt head earlier
In github issue #1850, Christian Ruppert reported a case of crash in
2.6 when failing to parse some http rules. This started to happen
with 2.6 commit dd7e6c6 ("BUG/MINOR: http-rules: completely free
incorrect TCP rules on error") but has some of its roots in 2.2
commit 2eb539687 ("MINOR: http-rules: Add release functions for
existing HTTP actions").

The cause is that when the release function is set for HTTP actions,
the rule->arg.http.fmt list head is not yet initialized, hence is
NULL, thus the release function crashes when it tries to iterate over
it. In fact this code was initially not written with the perspective
of releasing such elements upon error, so the arg list initialization
happened after error checking.

This patch just moves the list initialization just after setting the
release pointer and that's OK.

This patch must be backported to 2.6 since the problem is visible
there. It could be backported to 2.5 but the issue is not triggered
there without the first mentioned patch above that landed in 2.6, so
it will not bring any obvious benefit.
2022-09-02 19:24:12 +02:00
Willy Tarreau
e6f389d1a5 MINOR: mux-h1: provide a "show_sd" helper to output stream debugging info
With this, it now becomes possible to see the state of each H1 stream from
"show sess all". Example (added lines highlighted with '>'):

  0x7fc9b40460a0: [02/Sep/2022:16:29:31.267228] id=49 proto=tcpv4 source=127.0.0.1:53548
    flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x2dc4b20, pend_pos=(nil) waiting=0 epoch=0
    frontend=decrypt (id=2 mode=http), listener=? (id=3) addr=127.0.0.1:8001
    backend=decrypt (id=2 mode=http) addr=127.0.0.1:25168
    server=httpterm (id=1) addr=127.0.0.1:8000
    task=0x7fc9b4046490 (state=0x00 nice=0 calls=4 rate=0 exp=3s tid=7(1/7) age=2s)
    txn=0x7fc9b4046650 flags=0x3000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x0d
    scf=0x7fc9b4046030 flags=0x00000080 state=EST endp=CONN,0x7fc9b4041f00,0x02804001 sub=1
>       h1s=0x7fc9b4041f00 h1s.flg=0x104010 .sd.flg=0x2804001 .req.state=MSG_DONE .res.state=MSG_DATA
>       .meth=GET status=200 .sd.flg=0x02804001 .sc.flg=0x00000080 .sc.app=0x7fc9b40460a0
>       .subs=0x7fc9b4046040(ev=1 tl=0x7fc9b4046540 tl.calls=9 tl.ctx=0x7fc9b4046030 tl.fct=sc_conn_io_cb)
>       h1c=0x7fc9b402b3f0 h1c.flg=0x302200 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0
        co0=0x7fc9bc02e740 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x2dc3c40
        flags=0x00000300 fd=79 fd.state=421 updt=0 fd.tmask=0x80
    scb=0x7fc9b4046590 flags=0x00000011 state=EST endp=CONN,0x7fc9b4048660,0x02840001 sub=0
>       h1s=0x7fc9b4048660 h1s.flg=0x4010 .sd.flg=0x2840001 .req.state=MSG_DONE .res.state=MSG_DATA
>       .meth=GET status=200 .sd.flg=0x02840001 .sc.flg=0x00000011 .sc.app=0x7fc9b40460a0 .subs=(nil)
>       h1c=0x7fc9b4048490 h1c.flg=0x80002200 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0
        co1=0x7fc9b4048270 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x2dc4b20
        flags=0x00000300 fd=131 fd.state=10122 updt=0 fd.tmask=0x80
    req=0x7fc9b40460c0 (f=0x49c40080 an=0x8000 pipe=0 tofwd=0 total=56)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7fc9b40460c8 data=(nil) o=0 p=0 i=0 size=0
        htx=0xdd90a0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
    res=0x7fc9b4046120 (f=0x80070202 an=0x4000000 pipe=0 tofwd=-1 total=603840788)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7fc9b4046128 data=(nil) o=0 p=0 i=0 size=0
        htx=0xdd90a0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
2022-09-02 16:43:25 +02:00
Willy Tarreau
7079c0fbbd MINOR: mux-h1: split "show_fd" into connection and stream
We now have two functions, one for dumping connections and the other
one for dumping the streams. This will permit to use it from show_sd.
A few optional line breaks were inserted where relevant to keep lines
homogenous when a prefix is passed.
2022-09-02 16:43:25 +02:00
Willy Tarreau
b4a4feee87 MINOR: mux-quic: provide a "show_sd" helper to output stream debugging info
It's very limited but at least provides the very basic info about QCS and
QCC when issuing "show sess all":

  scf=0x7fa9642394a0 flags=0x00000080 state=EST endp=CONN,0x7fa9642351f0,0x02001001 sub=3
>     qcs=0x7fa9642351f0 .flg=0x5 .id=396 .st=HCR .ctx=0x7fa9642353f0, .err=0
>     qcc=0x7fa96405ce20 .flg=0 .nbsc=100 .nbhreq=100, .task=0x7fa964054260
      co0=0x7fa96405cd50 ctrl=quic4 xprt=QUIC mux=QUIC data=STRM target=LISTENER:0x328c530
      flags=0x00200300 fd=-1 fd.state=00 updt=0 fd.tmask=0x0

It will need to be improved but it's better than nothing already. This
should be backported to 2.6 if the other dumps are backported.
2022-09-02 16:43:25 +02:00
Willy Tarreau
7051f73efe MINOR: mux-h2: insert line breaks in "show sess all" output for legibility
h2s and h2c were extremely long in the "show sess all" output, around 300
chars each. This adds a few line breaks to improve legibility, there are
now 3 lines for each, which are around the same length as the other ones
while keeping a natural arrangement. E.g (lines highlighted with '>'):

  0x7faad8144f80: [02/Sep/2022:15:49:40.171620] id=105283 proto=tcpv4 source=127.0.0.1:42942
    flags=0x100c4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x1f44b20, pend_pos=(nil) waiting=0 epoch=0
    frontend=decrypt (id=2 mode=http), listener=? (id=3) addr=127.0.0.1:8001
    backend=decrypt (id=2 mode=http) addr=127.0.0.1:18144
    server=httpterm (id=1) addr=127.0.0.1:8000
    task=0x7faad812b7c0 (state=0x00 nice=0 calls=2 rate=0 exp=4s tid=7(1/7) age=0s)
    txn=0x7faad81453e0 flags=0x43000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x0d
    scf=0x7faad81625d0 flags=0x00000080 state=EST endp=CONN,0x7faad811d380,0x02805001 sub=1
>       h2s=0x7faad811d380 h2s.id=2113 .st=HCR .flg=0x207001 .rxbuf=0@(nil)+0/0
>       .sc=0x7faad81625d0(.flg=0x00000080 .app=0x7faad8144f80) .sd=0x7faad8119dc0(.flg=0x02805001)
>       .subs=0x7faad81625e0(ev=1 tl=0x7faad86d6500 tl.calls=4 tl.ctx=0x7faad81625d0 tl.fct=sc_conn_io_cb)
>       h2c=0x7faad802c640 h2c.st0=FRH .err=0 .maxid=2157 .lastid=-1 .flg=0x0600 .nbst=1 .nbsc=1
>       .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=2157 .dbuf=0@(nil)+0/0
>       .msi=-1 .mbuf=[6..6|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0]
        co0=0x7faae402efc0 ctrl=tcpv4 xprt=RAW mux=H2 data=STRM target=LISTENER:0x1f43c40
        flags=0x00000300 fd=95 fd.state=121 updt=0 fd.tmask=0x80
    scb=0x7faad8145370 flags=0x00000011 state=EST endp=CONN,0x7faad8115630,0x02840001 sub=1
        co1=0x7faad86c0730 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x1f44b20
        flags=0x00000300 fd=1656 fd.state=10121 updt=0 fd.tmask=0x80
    req=0x7faad8144fa0 (f=0x49c40000 an=0x8000 pipe=0 tofwd=0 total=110)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7faad8144fa8 data=(nil) o=0 p=0 i=0 size=0
        htx=0xdd90a0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
    res=0x7faad8145000 (f=0x80040202 an=0x4000000 pipe=0 tofwd=-1 total=60365)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7faad8145008 data=(nil) o=0 p=0 i=0 size=0
        htx=0xdd90a0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
2022-09-02 16:43:03 +02:00
Willy Tarreau
bf4ec6f4a0 MINOR: mux-h2: provide a "show_sd" helper to output stream debugging info
With this, it now becomes possible to see the state of each H2 stream from
"show sess all". Lines are still too long and need to be split, but that's
for another patch.
2022-09-02 15:48:50 +02:00
Willy Tarreau
ce57777660 MINOR: muxes: add a "show_sd" helper to complete "show sess" dumps
This helper will be called for muxes that provide it and will be used
to let the mux provide extra information about the stream attached to
a stream descriptor. A line prefix is passed in argument so that the
mux is free to break long lines without breaking indent. No prefix
means no line breaks should be produced (e.g. for short dumps).
2022-09-02 15:48:50 +02:00
Willy Tarreau
4e97bcc76b MINOR: mux-h2: extract the connection dump function out of h2_show_fd()
The function will be reusable to dump connections, so let's extract it.
2022-09-02 15:48:10 +02:00
Willy Tarreau
90bffa2ce3 MINOR: mux-h2: extract the stream dump function out of h2_show_fd()
The function will be reusable to dump streams, so let's extract it.
Note that due to "last_h2s" being originally printed as a prefix for
the stream dump, now the pointer is displayed by the caller instead.
2022-09-02 15:48:10 +02:00
Willy Tarreau
714900a3c9 MINOR: debug: report applet pointer and handler in crashes when known
When an appctx is found looping over itself, we report a number of info
but not the pointers to the definition nor the handler, which can be quite
handy in some cases. Let's add them and try to decode the symbol.
2022-09-02 15:48:10 +02:00
Willy Tarreau
410546145b BUG/MINOR: mux-fcgi: fix the "show fd" dest buffer for the subscriber
Commit 1776ffb97 ("MINOR: mux-fcgi: make the "show fd" helper also decode
the fstrm subscriber when known") improved the output of "show fd" for the
FCGI mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.

This should be backported to 2.4.
2022-09-02 14:23:56 +02:00
Willy Tarreau
9b6a187e26 BUG/MINOR: mux-h1: fix the "show fd" dest buffer for the subscriber
Commit 150c4f8b7 ("MINOR: mux-h1: make the "show fd" helper also decode
the h1s subscriber when known") improved the output of "show fd" for the
H1 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.

This should be backported to 2.4.
2022-09-02 14:23:56 +02:00
Willy Tarreau
ba7657ca0f BUG/MINOR: mux-h2: fix the "show fd" dest buffer for the subscriber
Commit 98e40b981 ("MINOR: mux-h2: make the "show fd" helper also decode
the h2s subscriber when known") improved the output of "show fd" for the
H2 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.

This should be backported to 2.4.
2022-09-02 14:23:56 +02:00
Willy Tarreau
df3231c74a MEDIUM: httpclient: enable ALPN support on outgoing https connections
Since everything is available for this, let's enable ALPN with the
usual "h2,http/1.1" on the https server. This will allow HTTPS requests
to use HTTP/2 when available.

It may be needed to permit to disable this (or to set the string) in
case some client code explicitly checks for the "HTTP/1.1" string, but
since httpclient is quite young it's unlikely that such code already
exists.
2022-09-02 13:54:30 +02:00
Willy Tarreau
f80713ba8e BUG/MINOR: httpclient: keep-alive was accidentely disabled
The servers were not set with default settings, meaning that a few
settings including the pool_max_delay were not set, thus disabling
connection pools, which is the cause of the fact that keep-alive was
disabled as reported in issue #1831. There might possibly be other
issues pending since all these fields were left to zero.

Note that this patch alone will not fix keep-alive because the applet
does not enforce SE_FL_NOT_FIRST and relies on the default http-reuse
safe, thus if servers are not shared, all requests are considered
first ones and do not reuse existing connections.

In 2.7, commit ecb40b2c3 ("MINOR: backend: always satisfy the first
req reuse rule with l7 retries") addressed this in a more elegant way
by fixing http-reuse to take into account the fact that properly
configured l7 retries provide exactly the capability that reuse safe
was trying to cover, and this patch is suitable for backporting.

This patch should be backported to 2.6 only.
2022-09-02 11:48:01 +02:00
Willy Tarreau
6486ff8cab BUG/MINOR: httpclient: only ask for more room on failed writes
There's a tiny issue in the I/O handler by which both a failed request
emission and missing response data will want to subscribe for more room
on output. That's not correct in that only the case where the request
buffer is full should cause this, the other one should just wait for
incoming data. This could theoretically cause spurious wakeups at
certain key points (e.g. connect() time maybe) though this could not
be reproduced but better fix this while it's easy enough.

It doesn't seem necessary to backport it right now, though this may
have to in case a concrete reproducible case is discovered.
2022-09-02 11:42:50 +02:00
Willy Tarreau
b48292068b BUG/MEDIUM: httpclient: always detach the caller before self-killing
If the caller dies before the server responds, the httpclient can crash
in hc_cli_res_end_cb() when unregistering because it dereferences
hc->caller which was already freed during the caller's unregistration.
The easiest way to reproduce it is by sending twice the following
request on the same CLI connection in expert mode, with httpterm
running on local port 8000:

   httpclient GET http://127.0.0.1:8000/?t=600

Note the 600ms delay that's larger than socat's default 500.

The code checks for a NULL everywhere hc->caller is used, but the NULL
was forgotten in this specific case. It must be placed in the second
half of httpclient_stop_and_destroy() which is responsible for signaling
the client that the caller leaves.

This must be backported to 2.6.
2022-09-02 11:19:07 +02:00
Willy Tarreau
d8a44d0b24 BUG/MINOR: h2: properly set the direction flag on HTX response
In 1.9-dev, a new flag was introduced on the start line with commit
f1ba18d7b ("MEDIUM: htx: Don't rely on h1_sl anymore except during H1
header parsing") to designate a response message: HTX_SL_F_IS_RESP.

Unfortunately as it was done in parallel to the mux_h2 support for
the backend, it was never integrated there. It was not used by then
so this remained unnoticed for a while.

However the http_client now uses it, and missing that flag prevents
it from using the H2 mux, so let's properly add it.

There's no point in backporting this far away, but since the http_client
is fully operational in 2.6 it would make sense to backport this fix at
least there to secure the code.
2022-09-02 11:19:07 +02:00
Frédéric Lécaille
a1075209c7 BUG/MINOR: quic: Frames leak during retransmissions
The frame which are retransmitted by qc_dgrams_retransmit() are duplicated
from sent but not acknowledged packets and added to local frames lists.
Some may not have been sent. If not replaced somewhere (linked to the
connection) they are lost for ever (leak). We splice the list remaining
contents to the packets number space frame list to avoid such a situation.

Must be backported to 2.6.
2022-09-02 08:47:38 +02:00
Frédéric Lécaille
a777ee36f6 MINOR: quic: Trace typo fix in qc_release_frm()
Grammar fix without any impact.
2022-09-02 08:47:38 +02:00
Frédéric Lécaille
26236f5a5d MINOR: quic: Add TX frames addresses to traces to several trace events
This should be useful to diagnose TX frames related issues.
2022-09-02 08:47:38 +02:00
Frédéric Lécaille
b866c69f4f BUG/MINOR: quic: Do not ack when probing
<force_ack> boolean variable passed to qc_do_build_pkt() which builds a clear
packet is there to force this function to build an ACK frame regardless of
others conditions. This is used during handshake, when we acknowledge every
handshake packets received.

This variable was already taken into an account by the local variable <must_ack>
which is there at least to ignore any other conditions than this one: "are
we building a probing packet?". Indeed we do not want to add ACK frames when
we probe the peers. This is to have more chances to embed the new duplicated frames
into another packets without splitting them. So, the test on <force_ack> boolean
value is useless, silly and brakes the rule which consists in not acknowledging
when probing.

Must be backported to 2.6.
2022-09-02 08:47:38 +02:00
Willy Tarreau
ecb40b2c38 MINOR: backend: always satisfy the first req reuse rule with l7 retries
The "first req" rule consists in not delivering a connection's first
request to a connection that's not known for being safe so that we
don't deliver a broken page to a client if the server didn't intend to
keep it alive. That's what's used by "http-reuse safe" particularly.

But the reason this rule was created was precisely because haproxy was
not able to re-emit the request to the server in case of connection
breakage, which is precisely what l7 retries later brought. As such,
there's no reason for enforcing this rule when l7 retries are properly
enabled because such a blank page will trigger a retry and will not be
delivered to the client.

This patch simply checks that the l7 retries are enabled for the 3 cases
that can be triggered on a dead or dying connection (failure, empty, and
timeout), and if all 3 are enabled, then regular idle connections can be
reused.

This could almost be marked as a bug fix because a lot of users relying
on l7 retries do not necessarily think about using http-reuse always due
to the recommendation against it in the doc, while the protection that
the safe mode offers is never used in that mode, and it forces the http
client not to reuse existing persistent connections since it never sets
the "not first" flag.

It could also be decided that the protection is not used either when
the origin is an applet, as in this case this is internal code that
we can decide to let handle the retry by itself (all info are still
present). But at least the httpclient will be happy with this alone.

It would make sense to backport this at least to 2.6 in order to let
the httpclient reuse connections, maybe to older releases if some
users report low reuse counts.
2022-09-01 20:52:29 +02:00
Willy Tarreau
4d1ff11f05 BUG/MEDIUM: mux-h1: always use RST to kill idle connections in pools
When idle H1 connections cannot be stored into a server pool or are later
evicted, they're often seen closed with a FIN then an RST. The problem is
that this is sufficient to leave them in TIME_WAIT in the local sockets
table and port exhaustion may happen.

The reason is that in h1_release() we rely on h1_shutw_conn() which itself
decides whether to close in silent or normal mode only based on the
H1C_F_ST_SILENT_SHUT flag. This flag is only set by h1_shutw() based on
the requested mode. But when the connection is in the idle list, the mode
ought to always be silent.

What this patch does is to set the flag before trying to add to the idle
list, and remove it after removing from the idle list. This way if the
connection fails to be added or has to be killed, it's closed with an
RST.

This must be backported as far as 2.4. It's not sure whether older
versions need an equivalent.
2022-09-01 20:52:29 +02:00
Christopher Faulet
f348ecd67a BUG/MINOR: regex: Properly handle PCRE2 lib compiled without JIT support
The PCRE2 JIT support is buggy. If HAProxy is compiled with USE_PCRE2_JIT
option while the PCRE2 library is compiled without the JIT support, any
matching will fail because pcre2_jit_compile() return value is not properly
handled. We must fall back on pcre2_match() if PCRE2_ERROR_JIT_BADOPTION
error is returned.

This patch should fix the issue #1848. It must be backported as far as 2.4.
2022-09-01 19:34:46 +02:00
Willy Tarreau
32872db605 MINOR: sink/ring: rotate non-empty file-backed contents only
If the service is rechecked before a reload, that may cause the config
to be parsed twice and file-backed rings to be lost.

Here we make sure that such a ring does contain information before
deciding to rotate it. This way the first process starting after some
writes will cause a rotate but not subsequent ones until new writes
are applied.

An attempt was also made to disable rotations on checks but this was a
bad idea, as the ring is still initialized and this causes the contents
to be lost. The choice of initializing the ring during parsing is
questionable but the config check ought to be as close as possible to a
real start, and we could imagine that the ring is used by some code
during startup (e.g. lua). So this approach was abandonned and config
checks also cause a rotation, as the purpose of this rotation is to
preserve latest information against accidental removal.
2022-09-01 08:25:34 +02:00
William Lallemand
e0fa91ffe1 BUG/MINOR: ssl: leak of ckch_inst_link in ckch_inst_free() v2
ckch_inst_free() unlink the ckch_inst_link structure but never free it.

It can't be fixed simply because cli_io_handler_commit_cafile_crlfile()
is using a cafile_entry list to iterate a list of ckch_inst entries
to free. So both cli_io_handler_commit_cafile_crlfile() and
ckch_inst_free() would modify the list at the same time.

In order to let the caller manipulate the ckch_inst_link,
ckch_inst_free() now checks if the element is still attached before
trying to detach and free it.

For this trick to work, the caller need to do a LIST_DEL_INIT() during
the iteration over the ckch_inst_link.

list_for_each_entry was also replace by a while (!LIST_ISEMPTY()) on the
head list in cli_io_handler_commit_cafile_crlfile() so the iteration
works correctly, because it could have been stuck on the first detached
element. list_for_each_entry_safe() is not enough to fix the issue since
multiple element could have been removed.

Must be backported as far as 2.5.
2022-08-31 15:24:01 +02:00
Frédéric Lécaille
bccbad2654 BUG/MINOR: quic: TX frames memleak
Missing call to pool_free() for quic_frame objects

Must be backported to 2.6.
2022-08-31 15:20:29 +02:00
Frédéric Lécaille
3a9b944955 MINOR: quic: Move traces about RX/TX bytes from QUIC_EV_CONN_PRSAFRM event
Move these traces to QUIC_EV_CONN_SPPKTS trace event. They were displayed
at a useless location. Make them displayed just after having sent a packet
and when checking the anti-amplication limit.
Useful to diagnose issues in relation with the recovery.
2022-08-31 15:20:24 +02:00
William Lallemand
0bfa3e7ff2 BUG/MINOR: ssl: revert two wrong fixes with ckhi_link
This reverts commit 056ad01d55.
This reverts commit ddd480cbdc.

The architecture is ambiguous here: ckch_inst_free() is detaching and
freeing the "ckch_inst_link" linked list which must be free'd only from
the cafile_entry side.

The problem was also hidden by the fix ddd480c ("BUG/MEDIUM: ssl: Fix a
UAF when old ckch instances are released") which change the ckchi_link
inner loop by a safe one. However this can't fix entirely the problem
since both __ckch_inst_free_locked() could remove several nodes in the
ckchi_link linked list.

This revert is voluntary reintroducing a memory leak before really fixing
the problem.

Must be backported in 2.5 + 2.6.
2022-08-30 18:12:28 +02:00
Christopher Faulet
ddd480cbdc BUG/MEDIUM: ssl: Fix a UAF when old ckch instances are released
When old chck instances is released at the end of "commit ssl ca-file" or
"commit ssl crl-file" commands, the link is released. But we walk through
the list using the unsafe macro. list_for_each_entry_safe() must be used.

This bug was introduced by commit 056ad01d5 ("BUG/MINOR: ssl: leak of
ckch_inst_link in ckch_inst_free()"). Thus this patch must be backported as
far as 2.5.
2022-08-30 16:27:51 +02:00
Christopher Faulet
f611248d8c BUG/MINOR: tcpcheck: Disable QUICKACK for default tcp-check (with no rule)
The commit 871dd8211 ("BUG/MINOR: tcpcheck: Disable QUICKACK only if data
should be sent after connect") introduced a regression. It removes the test
on the next rule to be able to disable TCP_QUICKACK when only a connect is
performed (so no next rule).

This patch must be backported as far as 2.2.
2022-08-30 10:31:16 +02:00
William Lallemand
056ad01d55 BUG/MINOR: ssl: leak of ckch_inst_link in ckch_inst_free()
ckch_inst_free() unlink the ckch_inst_link structure but never free it.
It can cause a memory leak upon a ckch_inst_free() done with CLI
operation.

Bug introduced by commit 4458b97 ("MEDIUM: ssl: Chain ckch instances in
ca-file entries").

Must be backported as far as 2.5.
2022-08-29 18:53:34 +02:00
William Lallemand
946580e17a BUG/MINOR: ssl: fix deinit of the ca-file tree
Commit b0c4827 ("BUG/MINOR: ssl: free the cafile entries on deinit")
introduced a double free.

The node was never removed from the tree before its free.

Fix issue #1836.

Must be backported where b0c4827 was backported. (2.6 for now).
2022-08-29 18:51:39 +02:00
Frédéric Lécaille
3a56137048 MINOR: quic: Add a trace to distinguish the datagram from the packets inside
Without such a trace, we do not know when a datagram is sent. Only trace for
the packets inside the datagrams were displayed.

Must be backported to 2.6.
2022-08-29 18:46:40 +02:00
Frédéric Lécaille
c242832af3 BUG/MINOR: quic: Missing header protection AES cipher context initialisations (draft-v2)
This bug arrived with this commit:
   "MINOR: quic: Add reusable cipher contexts for header protection"

haproxy could crash because of missing cipher contexts initializations for
the header protection and draft-v2 Initial secrets. This was due to the fact
that these initialization both for RX and TX secrets were done outside of
qc_new_isecs(). The role of this function is definitively to initialize these
cipher contexts in addition to the derived secrets. Indeed this function is called
by qc_new_conn() which initializes the connection but also by qc_conn_finalize()
which also calls qc_new_isecs() in case of a different QUIC version was negotiated
by the peers from the one used by the client for its first Initial packet.

This was reported by "v2" QUIC interop test with at least picoquic as client.

Must be backported to 2.6.
2022-08-29 18:46:40 +02:00
Willy Tarreau
c6fc77404e MINOR: raw-sock: don't try to send if an error was already reported
There's no point trying to send() on a socket on which an error was already
reported. This wastes syscalls. Till now it was possible to occasionally
see an attempt to sendto() after epoll_wait() had reported EPOLLERR.
2022-08-29 18:45:27 +02:00
Willy Tarreau
2c30de3b90 BUG/MINOR: epoll: do not actively poll for Rx after an error
In 2.2, commit 5d7dcc2a8 ("OPTIM: epoll: always poll for recv if neither
active nor ready") was added to compensate for the fact that our iocbs
are almost always asynchronous now and do not have the opportunity to
update the FD correctly. As such, they just perform a wakeup, the FD is
turned to inactive, the tasklet wakes up, performs the I/O, updates the
FD, most of the time this is done withing the same polling loop, and the
update cancels itself in the poller without having to switch the FD off
then on.

The issue was that when deciding to claim an FD was active for reads
if it was active for writes, we forgot one situation that unfortunately
causes excessive wakeups: dealing with errors. Indeed, errors are
reported and keep ringing as long as the FD is active for sending even
if the consumer disabled the FD for receiving. Usually this only causes
one extra wakeup for the time it takes to consider a potential write
subscriber and to call it, though with many tasks in a run queue, it
can last a bit longer and be reported more often.

The fix consists in checking that we really want to get more receive
events on this FD, that is:
  - that no prevous EPOLLERR was reported
  - that the FD doesn't carry a sticky error
  - that the FD is not shut for reads

With this, after the last epoll_wait() reports EPOLLERR, one last recv()
is performed to flush pending data and the FD is immediately unregistered.

It's probably not needed to backport this as its effects are not much
visible, though it should not harm.

Before, EPOLLERR was seen twice:

  accept4(4, {sa_family=AF_INET, sin_port=htons(22314), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_NONBLOCK) = 8
  accept4(4, 0x261b160, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
  recvfrom(8, "POST / HTTP/1.1\r\nConnection: close\r\nTransfer-encoding: chunk"..., 16320, 0, NULL, NULL) = 66
  socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 9
  connect(9, {sa_family=AF_INET, sin_port=htons(8002), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  epoll_ctl(3, EPOLL_CTL_ADD, 8, {events=EPOLLIN|EPOLLRDHUP, data={u32=8, u64=8}}) = 0
  epoll_ctl(3, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
  epoll_wait(3, [{events=EPOLLOUT, data={u32=9, u64=9}}], 200, 355) = 1
  recvfrom(9, 0x25cfb30, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  sendto(9, "POST / HTTP/1.1\r\ntransfer-encoding: chunked\r\n\r\n", 47, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 47
  epoll_ctl(3, EPOLL_CTL_MOD, 9, {events=EPOLLIN|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
  epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 354) = 1
  recvfrom(9, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 16320, 0, NULL, NULL) = 57
  sendto(8, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 57, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 57
->epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 354) = 1
  epoll_ctl(3, EPOLL_CTL_DEL, 9, 0x7ffe0b65fb24) = 0
  epoll_wait(3, [{events=EPOLLIN, data={u32=8, u64=8}}], 200, 354) = 1
  recvfrom(8, "A\n0123456789\r\n0\r\n\r\n", 16320, 0, NULL, NULL) = 19
  close(9)                          = 0
  close(8)                          = 0

After, EPOLLERR is only seen only once, with one less call to epoll_wait():

  accept4(4, {sa_family=AF_INET, sin_port=htons(22362), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_NONBLOCK) = 8
  accept4(4, 0x20d0160, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
  recvfrom(8, "POST / HTTP/1.1\r\nConnection: close\r\nTransfer-encoding: chunk"..., 16320, 0, NULL, NULL) = 66
  socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 9
  connect(9, {sa_family=AF_INET, sin_port=htons(8002), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  epoll_ctl(3, EPOLL_CTL_ADD, 8, {events=EPOLLIN|EPOLLRDHUP, data={u32=8, u64=8}}) = 0
  epoll_ctl(3, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
  epoll_wait(3, [{events=EPOLLOUT, data={u32=9, u64=9}}], 200, 411) = 1
  recvfrom(9, 0x2084b30, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  sendto(9, "POST / HTTP/1.1\r\ntransfer-encoding: chunked\r\n\r\n", 47, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 47
  epoll_ctl(3, EPOLL_CTL_MOD, 9, {events=EPOLLIN|EPOLLRDHUP, data={u32=9, u64=9}}) = 0
  epoll_wait(3, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=9, u64=9}}], 200, 411) = 1
  recvfrom(9, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 16320, 0, NULL, NULL) = 57
  sendto(8, "HTTP/1.1 200 OK\r\ncontent-length: 0\r\nconnection: close\r\n\r\n", 57, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 57
  epoll_ctl(3, EPOLL_CTL_DEL, 9, 0x7ffc95d46f04) = 0
  epoll_wait(3, [{events=EPOLLIN, data={u32=8, u64=8}}], 200, 411) = 1
  recvfrom(8, "A\n0123456789\r\n0\r\n\r\n", 16320, 0, NULL, NULL) = 19
  close(9)                          = 0
  close(8)                          = 0
2022-08-29 18:45:27 +02:00
Willy Tarreau
cad42a78b8 BUG/MEDIUM: mux-h1: do not refrain from signaling errors after end of input
In 2.6-dev4, a fix for truncated response was brought with commit 99bbdbcc2
("BUG/MEDIUM: mux-h1: only turn CO_FL_ERROR to CS_FL_ERROR with empty ibuf"),
trying to address the situation where an error is present at the connection
level but some data are still pending to be read by the stream. However,
this patch did not consider the case where the stream was no longer willing
to read the pending data, resulting in a situation where some aborted
transfers could lead to excessive CPU usage by causing constant stream
wakeups for which no error was reported. This perfectly matches what was
observed and reported in github issue #1842. It's not trivial to reproduce,
but aborting HTTP/1 pipelining in the middle of transfers seems to give
good results (using h2load and Ctrl-C in the middle).

The fix was incorrct as the error should be held only if there were data
that the stream was able to read. This is the approach taken by this patch,
which also checks via SE_FL_EOI | SE_FL_EOS that the stream will be able
to consume the pending data.

Note that the loop was provoked by the attempt by sc_conn_io_cb() itself
to call sc_conn_send() which resulted in a write subscription in
h1_subscribe() which immediately calls a tasklet_wakeup() since the
event is ready, and that it is now stopped by the presence of SE_FL_ERROR
that is checked in sc_conn_io_cb(). It seems that an extra check down the
send() path to refrain from subscribing when the connection is in error
could speed up error detection or at least avoid a risk of loops in this
case, but this is tricky. In addition, there's already SE_FL_ERR_PENDING
that seems more suitable for reporting when there are pending data, but
similarly, it probably isn't checked well enough to be suitable for
backports.

FWIW the issue may (unreliably) be reproduced by chaining haproxy to
httpterm and issuing:

  (printf "GET /?s=10g HTTP/1.1\r\n\r\n"; sleep 0.1; printf "\r\n") | \
    nc6 --half-close 0 8001 | head -c1000000000 >/dev/null

It's necessary to play with the size of the head command that's supposed
to trigger the error at some point. A variant involving h2load in h1 mode
and multiple pipelined streams, that is stopped with Ctrl-C also tends to
work.

As the fix above was backported as far as 2.0, it would be tempting to
backport this one as far. However tests have shown that the oldest
version that can trigger this issue is 2.5, maybe due to subtle
differences in older ones, so it's probably not worth going further
until an issue is reported. Note that in 2.5 and older, the SE_FL_*
flags are applied on the conn_stream instead, as CS_FL_*.

Special thanks go to Felipe W Damasio for providing lots of detailed data
allowing to quickly spot the root cause of the problem.
2022-08-29 18:45:27 +02:00
Christopher Faulet
4a20972a95 BUG/MINOR: hlua: Rely on CF_EOI to detect end of message in HTTP applets
applet:getline() and applet:receive() functions for HTTP applets must rely
on the channel flags to detect the end of the message and not on HTX
flags. It means CF_EOI must be used instead of HTX_FL_EOM.

It is important because the HTX flag is transient. Because there is no flag
on HTTP applets to save the info, it is not reliable. However CF_EOI once
set is never removed. So it is safer to rely on it. Otherwise, the call to
these functions hang.

This patch must be backported as far as 2.4.
2022-08-29 15:37:17 +02:00
Christopher Faulet
b372f16d35 BUG/MEDIUM: peers: Don't start resync on reload if local peer is not up-to-date
On a reload, if the previous resync was not finished, the freshly old worker
must not try to start a new resync. Otherwise, it will compete with the
older wokers, slowing down or blocking the resync. Only an up-to-date woker
must try to perform a local resync.

This patch must be backported as far as 2.0 (and maybe to 1.8 too).
2022-08-29 11:38:02 +02:00
Christopher Faulet
19a82b9495 BUG/MEDIUM: peers: Don't use resync timer when local resync is in progress
When a worker is stopped, the resync timer is used to limit in time the
connection stage to the new worker to perform the local resync. However,
this timer must be stopped when the resync is in progress and it must be
re-armed if the resync is interrupted (for instance because another
reload). Otherwise, if the resync is a bit long, an old worker may be killed
too early.

This bug was introduce by the commit 160fff665 ("BUG/MEDIUM: peers: limit
reconnect attempts of the old process on reload"). It must be backported as
far as 2.0.
2022-08-29 11:38:02 +02:00
Christopher Faulet
13db4bdbc6 BUG/MEDIUM: peers: Add connect and server timeut to peers proxy
Only the client timeout was set. Nothing prevent a peer applet to stall
during a connect or waiting a message from a remote peer. To avoid any
issue, it is important to also set connection and server timeouts. The
connect timeout is set to 1s and the server timeout is set to 5s.

This patch must be backported to all supported versions.
2022-08-29 11:38:02 +02:00
Christopher Faulet
42a0662910 BUG/MEDIUM: spoe: Properly update streams waiting for a ACK in async mode
A bug was introduced by the commit b042e4f6f ("BUG/MAJOR: spoe: properly
detach all agents when releasing the applet"). The fix is not correct. We
really want to known if the released appctx is the last one or not. It is
important when async mode is used. If there are still running applets, we
just need to remove the reference on the current applet from streams in the
async waiting queue.

With the commit above, in async mode, if there are still running applets, it
will work as expected. Otherwise a processing timeout will be reported for
all these streams. So it is not too bad. But for other modes (sync and
pipelining), the async waiting queue is always empty. If at least one stream
is waiting to send a message, a new applet is created. It is an issue if the
SPOA is unhealthy because the number of running applets may explode.

However, the commit above tried to fix an issue. The bug is in fact when an
new SPOE applet is created. On success, we must remove reference on the
current appctx from the streams in the async waiting queue.

This patch must be backported as far as 1.8.
2022-08-29 09:57:33 +02:00
Frédéric Lécaille
149c531fa1 BUG/MINOR: quic: Frames added to packets even if not built.
Several frames could remain as not build into <frm_list> built by qc_build_frms()
after having stopped at the first building error. So only one frame was reinserted in
the frame list passed as parameter to qc_do_build_pkt(). Then <frm_list> was
spliced to the packet frame list even its frames were not built, nor attached to
any packet. Such frames had their ->pkt member set to NULL, but considered as
built, then sent leading to a crash in qc_release_frm() where ->pkt is dereferenced.

This issue was again reported by useful traces provided by Tristan in GH #1808.

Must be backported to 2.6.
2022-08-27 18:33:19 +02:00
Frédéric Lécaille
e35463c767 BUG/MINOR: quic: Null packet dereferencing from qc_dup_pkt_frms() trace
This function must duplicate frames be resent from packets. Some of
them are still in flight, others have already been detected as lost.
In this case the original frame ->pkt member is NULL.
Add a trace to distinguish these cases.

Thank you to Tristan for having reported this issue in GH #1808.

Must be backported to 2.6.
2022-08-27 10:29:30 +02:00
William Lallemand
d78dfe7891 BUG/MINOR: httpclient: fix resolution with port
Fix the resolution in the httpclient when a port is associated to a
domain. The do-resolve action doesn't support a port in its input.

Must be backported to 2.6. Require the "host_only" converter to be
backported.
2022-08-26 17:00:22 +02:00
William Lallemand
dd754cba16 MINOR: sample: add the host_only and port_only converters
Add 2 converters that can manipulate the value of an Host header.
host_only will return the host without any port, and port_only will
return the port.
2022-08-26 17:00:22 +02:00
Frédéric Lécaille
eba9088a7c Revert "MINOR: quic: Remove useless traces about references to TX packets"
This reverts commit f61398a7ca.
After having checked a version with more traces and reproduced the issue
as reported by Tristan in GH #1808, there are remaining cases where
a duplicated but not already sent frame have to be marked as acked because
the frame it was copied from was acknowledeged before its copied was sent.

Must be backported to 2.6.
2022-08-25 16:06:48 +02:00
Frédéric Lécaille
f61398a7ca MINOR: quic: Remove useless traces about references to TX packets
Since this commit:
    "BUG/MINOR: quic: Wrong list_for_each_entry() use when building packets from
     qc_do_build_pkt()"
there is no more reason that frames can be released without having been
sent, i.e. frames with non null ->pkt member. This ->pkt is the packet
the frame is attached to.

Must be backported to 2.6.
2022-08-25 07:35:47 +02:00
Frédéric Lécaille
560ddfa003 CLEANUP: quic: Remove a useless check in qc_lstnr_pkt_rcv()
This function parses the QUIC packet from a UDP datagram. It was originally
supposed to be run by several thread. Here we remove a section of code
where the current thread checks there is not another thread which has already
inserted the new quic_conn it is trying to insert in the connections tree.

Must be backported to 2.6 to ease the future backports to come.
2022-08-24 18:59:23 +02:00
Frédéric Lécaille
15773f2101 BUG/MINOR: quic: Stalled connections (missing I/O handler wakeup)
This was due to a missing I/O handler tasklet wakeup in process_timer() when
detecting packet loss. As, qc_release_lost_pkts() could remove the lost packets
from the in flight packets count, qc_set_timer() could cancel the timer used
to wakeup the connection I/O handler. Then the connection could remain idle
until it ends.

Must be backported to 2.6.
2022-08-24 18:13:30 +02:00
Frédéric Lécaille
277c4629e7 BUG/MINOR: quic: Leak in qc_release_lost_pkts() for non in flight TX packets
Packets with null "in flight" lengths are kept as the others packets as sent
but not already acknowledeged in the by packet number space trees.
But qc_release_lost_pkts() relied on this in fligh length to release the
memory allocated for this packets. We must release the memory allocated for
all the lost packets regardless of their in fligh lengths.

Modify this function to do nothing if the list of lost packets passed
as argument is empty. Stop using <lost_bytes> variable to decide if some packets
memory must be released or not.
Modify the callers to stop checking if this list is empty.

Should helping in fixing memory leak as reported by Tristan in GH #1801.

Must be backported to 2.6.
2022-08-24 18:13:30 +02:00
Frédéric Lécaille
5f6c25e447 Revert "BUG/MINOR: quix: Memleak for non in flight TX packets"
This reverts commit da9c441886.

Indeed this commit prevented the ACK only packets to be used as other packets
when they are acknowledged. Even if not ack-eliciting packets they are
acknowledged alongside others packets. Such acknowledged ACK only packets
must be used for instance to compute the RTT.

Must be backported to 2.6 if da9c441 was backported to 2.6.
2022-08-24 18:12:59 +02:00
William Lallemand
b10b1196b8 MINOR: resolvers: shut the warning when "default" resolvers is implicit
Shut the connect() warning of resolvers_finalize_config() when the
configuration was not emitted manually.

This shuts the warning for the "default" resolvers which is created
automatically for the httpclient.

Must be backported in 2.6.
2022-08-24 14:56:42 +02:00
Christopher Faulet
871dd82117 BUG/MINOR: tcpcheck: Disable QUICKACK only if data should be sent after connect
It is only a real problem for agent-checks when there is no agent string to
send. The condition to disable TCP_QUICKACK was only based on the action
type following the connect one. But it is not always accurate. indeed, for
agent-checks, there is always a SEND action. But if there is no "agent-send"
string defined, nothing is sent. In this case, this adds 200ms of latency
with no reason.

To fix the bug, a flag is now used on the CONNECT action to instruct there
are data that should be sent after the connect. For health-checks, this flag
is set if the action following the connect is a SEND action. For
agent-checks, it is set if an "agent-send" string is defined.

This patch should fix the issue #1836. It must be backported as far as 2.2.
2022-08-24 11:59:04 +02:00
William Lallemand
6020c4e44e BUG/MINOR: mworker: does not create the "default" resolvers in wait mode
When doing a re-exec, the master was creating a "default" resolvers,
which could result in a warning emitted because the "default" resolvers
of the configuration file is not available anymore.

Skip the creating of the "default" resolvers in wait mode, this is not
useful in the master.

Must be backported as far as 2.6.
2022-08-24 11:28:29 +02:00
William Lallemand
866b88bc95 BUG/MINOR: resolvers: return the correct value in resolvers_finalize_config()
Patch c31577f ("MEDIUM: resolvers: continue startup if network is
unavailable") was not working correctly. Indeed
resolvers_finalize_config() was returning a ERR type, but a postparser
is supposed to return 0 or 1.

The return value was never right, however it was only a problem since c31577f.

Must be backported in every stable branch.
2022-08-24 10:11:17 +02:00
Brad Smith
02fd3caa8f BUILD: tcp_sample: fix build of get_tcp_info() on OpenBSD
The build on OpenBSD is broken since commit 5c83e3a15 ("MINOR: tcp_sample:
clarifying samples support per os, for further expansion."), hence it
only affects 2.7 and 2.6.

It looks like this changed things in such a way that if TCP_INFO is added
but the OS is not added to the list of OS's it will not build.

Extend support for get_tcp_info to OpenBSD.

This must be backported to 2.6.
2022-08-24 05:23:13 +02:00
Willy Tarreau
8bd146d8af MEDIUM: peers: limit the number of updates sent at once
As seen in GH issue #1770, peers synchronization do not cope well with
very large buffers because by default the only two reasons for stopping
the processing of updates is either that the end was reached or that
the buffer is full. This can cause high latencies, and even rightfully
trigger the watchdog when the operations are numerous and slowed down
by competition on the stick-table lock.

This patch introduces a limit to the number of messages one may send
at once, which now defaults to 200, regardless of the buffer size. This
means taking and releasing the lock up to 400 times in a row, which is
costly enough to let some other parts work.

After some observation this could be backported to 2.6. If so, however,
previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal
return condition from handler" and "BUG/MINOR: applet: make the call_rate
only count the no-progress calls" must be backported otherwise the call
rate might trigger the looping protection.
2022-08-23 20:19:11 +02:00
Willy Tarreau
df3cab1ca1 BUG/MINOR: applet: make the call_rate only count the no-progress calls
This is very similar to what we did in commit 6c539c4b8 ("BUG/MINOR:
stream: make the call_rate only count the no-progress calls"), it's
better to only count the call rate with no progress than to count all
calls and try to figure if there's no progress, because a fast running
applet might once satisfy the whole condition and trigger the bug. This
typically happens when artificially limiting the number of messages sent
at once by an applet, but could happen with plenty of highly interactive
applets.

This patch could be backported to stable versions if there are any
indications that it might be useful there.
2022-08-23 20:19:11 +02:00
Willy Tarreau
8a3f58280f BUG/MEDIUM: applet: fix incorrect check for abnormal return condition from handler
We have quite numerous checks for abnormal applet handler behavior which
are supposed to trigger the loop protection. However, consecutive to
commit 15252cd9c ("MEDIUM: stconn: move the RXBLK flags to the stream
connector") that was merged into 2.6-dev12, one flag was incorrectly
renamed, and the check for an applet waiting for a buffer that is present
mistakenly turned to a check for missing room in the buffer. This erroneous
test could mistakenly trigger on applets that perform intensive I/Os doing
small exchanges each (e.g. cache, peers or HTTP client) if the load would
be sustained (>100k iops). For the cache this could represent higher than
13 Gbps on an object at least 1.6 GB large for example, which is quite
unlikely but theoretically possible.

This fix needs to be backported to 2.6.
2022-08-23 20:19:11 +02:00
Frédéric Lécaille
a2d8ad20a3 MINOR: quic: Replace MT_LISTs by LISTs for RX packets.
Replace ->rx.pqpkts quic_enc_level struct member MT_LIST by an LIST.
Same thing for ->list quic_rx_packet struct member MT_LIST.
Update the code consequently. This was a reminisence of the multithreading
support (several threads by connection).

Must be backported to 2.6
2022-08-23 17:55:02 +02:00
Frédéric Lécaille
b8047de11a BUG/MINOR: quic: Safer QUIC frame builders
Do not rely on the fact the callers of qc_build_frm() handle their
buffer passed to function the correct way (without leaving garbage).
Make qc_build_frm() update the buffer passed as argument only if
the frame it builds is well formed.

As far as I sse, there is no such callers which does not handle
carefully such buffers.

Must be backported to 2.6.
2022-08-23 17:40:09 +02:00
Frédéric Lécaille
a8a6043240 BUG/MINOR: quic: Wrong list_for_each_entry() use when building packets from qc_do_build_pkt()
This is list_for_each_entry_safe() which must be used if we want to delete elements
inside its code block. This could explain that some frames which were not built were added
to packets with a NULL ->pkt member.

Thank you to Tristan for having reported this issue through backtraces in GH #1808

Must be backported to 2.6.
2022-08-23 12:06:40 +02:00
Frédéric Lécaille
da9c441886 BUG/MINOR: quix: Memleak for non in flight TX packets
First, these packets must not be inserted in the tree of TX packets.
They are never explicitely acknowledged (for instance an ACK only
packet will never be acknowledged). Furthermore, if taken into an account
these packets may uselessly disturb the congestion control. We do not care
if they are lost or not. Furthermore as the ->in_fligh_len member value is null
they were not released by qc_release_lost_pkts() which rely on these values
to decide to release the allocated memory for such packets.

Must be backported to 2.6.
2022-08-22 19:06:08 +02:00
Emeric Brun
8032a276ce BUG/MAJOR: mworker: fix infinite loop on master with no proxies.
The master is re-exec with an empty proxies list if no master CLI is
configured.

This results in infinite loop since last patch:
3b68b602 ("BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized")

This patch avoid to loop again on log-forward proxies list if empty.

This patch should be backported until v2.3
2022-08-22 13:09:29 +02:00
Willy Tarreau
f1cfd9bc97 MINOR: cpu-map: remove obsolete diag warning about combined ranges
We used to emit a diag warning in case ranges were used both with the
process and thread part of a thread spec. Now with groups it's not
longer a problem, so let's just kill this warning.
2022-08-22 10:46:13 +02:00
Willy Tarreau
3cd71acd06 BUG/MEDIUM: cpu-map: fix thread 1's affinity affecting all threads
Since 2.7-dev2 with commit 5b09341c02 ("MEDIUM: cpu-map: replace the
process number with the thread group number"), the thread group has
replaced the process number in the "cpu-map" directive. In part due to
a design limit in 2.4 and 2.5, a special case was made of thread 1 in
commit bda7c1decd ("MEDIUM: config: simplify cpu-map handling"), because
there was no other location to store a single-threaded setup's mask by
then. The combination of the two resulted in a problem with thread
groups, by which as soon as one line exhibiting thread number 1 alone
was found in a config, the mask would be applied to all threads in the
group.

The loop was reworked to avoid this obsolete special case, and was
factored for better legibility. One obsolete comment about nbproc
was also removed. No backport is needed.
2022-08-22 10:38:00 +02:00
Frédéric Lécaille
ea4a5cbbdf BUG/MINOR: mux-quic: Fix memleak on QUIC stream buffer for unacknowledged data
Some clients send CONNECTION_CLOSE frame without acknowledging the STREAM
data haproxy has sent. In this case, when closing the connection if
there were remaining data in QUIC stream buffers, they were not released.

Add a <closing> boolean option to qc_stream_desc_free() to force the
stream buffer memory releasing upon closing connection.

Thank you to Tristan for having reported such a memory leak issue in GH #1801.

Must be backported to 2.6.
2022-08-20 19:08:31 +02:00
William Lallemand
62c0b99e3b MINOR: ssl/cli: implement "add ssl ca-file"
In ticket #1805 an user is impacted by the limitation of size of the CLI
buffer when updating a ca-file.

This patch allows a user to append new certificates to a ca-file instead
of trying to put them all with "set ssl ca-file"

The implementation use a new function ssl_store_dup_cafile_entry() which
duplicates a cafile_entry and its X509_STORE.

ssl_store_load_ca_from_buf() was modified to take an apped parameter so
we could share the function for "set" and "add".
2022-08-19 19:58:53 +02:00
William Lallemand
d4774d3cfa MINOR: ssl: handle ca-file appending in cafile_entry
In order to be able to append new CA in a cafile_entry,
ssl_store_load_ca_from_buf() was reworked and a "append" parameter was
added.

The function is able to keep the previous X509_STORE which was already
present in the cafile_entry.
2022-08-19 19:58:53 +02:00
William Lallemand
ec7eb59d20 BUG/MINOR: ssl/cli: error when the ca-file is empty
"set ssl ca-file" does not return any error when a ca-file is empty or
only contains comments. This could be a problem is the file was
malformated and did not contain any PEM header.

It must be backported as far as 2.5.
2022-08-19 19:56:53 +02:00
Frédéric Lécaille
86a53c5669 MINOR: quic: Add reusable cipher contexts for header protection
Implement quic_tls_rx_hp_ctx_init() and quic_tls_tx_hp_ctx_init() to initiliaze
such header protection cipher contexts for each RX and TX parts and for each
packet number spaces, only one time by connection.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (ORTT, 1RTT,
Handshake).
Modify qc_do_rm_hp() and quic_apply_header_protection() to reuse these
cipher contexts.
Note that there is no need to modify the key update for the header protection.
The header protection secrets are never updated.
2022-08-19 18:31:59 +02:00
Emeric Brun
a8942cd9c4 BUG/MAJOR: log-forward: Fix ssl layer not initialized on bind even if configured
Since commit 2071a99df ("MINOR: listener/ssl: set the SSL xprt layer only
once the whole config is known") the xprt is initialized for ssl directly
from a generic funtion used to parse bind args.

But the 'bind' lines from 'log-forward' sections were forgotten in commit
55f0f7bb5 ("MINOR: config: use the new bind_parse_args_list() to parse a
"bind" line").

This patch re-works 'log-forward' section parsing to use the generic
function to parse bind args and fix the issue.

Since the generic way to parse was introduced in 2.6, this patch
should be backported as far as this version.
2022-08-19 16:09:06 +02:00
Emeric Brun
3b68b60261 BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized
Some initialisation for log forward proxies was missing such
as ssl configuration on 'log-forward's 'bind' lines.

After the loop on the proxy initialization code for proxies present
in the main proxies list, this patch force to loop again on this code
for proxies present in the log forward proxies list.

Those two lists should be merged. This will be part of a global
re-work of proxy initialization including peers proxies and resolver
proxies.

This patch was made in first attempt to fix the bug and to facilitate
the backport on older branches waiting for a cleaner re-work on proxies
initialization on the dev branch.

This patch should be backported as far as 2.3.
2022-08-19 16:08:03 +02:00
Frédéric Lécaille
a846a17fde MINOR: quic: Trace fix in qc_release_frm()
This wrong trace came with this commit:
  "BUG/MINOR: quic: Possible crashes when dereferencing ->pkt quic_frame struct member"
In qc_release_frm() we mark frames as acked. Nothing to see with references
to frames.

Thank you to Willy for having caught this one.

Must be backported to 2.6 as these traces arrived with a bug fix to be backported
to 2.6.
2022-08-19 12:15:05 +02:00
Frédéric Lécaille
e4c3074c00 MINOR: quic: Add the QUIC connection to mux traces
This should help for debugging purpose.

Should be backported to 2.6
2022-08-19 12:02:29 +02:00
Frédéric Lécaille
b827840b42 BUG/MINOR: quic: Wrong splitted duplicated frames handling
When duplicated frames are splitted, we must propagate this information
to the new allocated frame and add a reference to this new frame
to the reference list of the original frame.

Must be backported to 2.6
2022-08-19 10:10:43 +02:00
Frédéric Lécaille
2f16348d24 MINOR: quic: Add frame addresses to QUIC_EV_CONN_PRSAFRM event traces
This should be useful to diagnose some issues.

Should be backported to 2.6.
2022-08-19 09:59:07 +02:00
Frédéric Lécaille
1ba25c244e BUG/MINOR: quic: Possible crashes when dereferencing ->pkt quic_frame struct member
This was done at several places. First in qc_requeue_nacked_pkt_tx_frms.
This aim of this function is, if needed, to requeue all the TX frames of a lost
<pkt> packet passed as argument and detach them from this packet they have been
sent from. They are possible cases where the frm->pkt quic_frame struct member could
be NULL, as a result of a duplication of an original frame by qc_dup_pkt_frms(). This
function adds the duplicated frame to the original frame reference list:
        LIST_APPEND(&origin->reflist, &dup_frm->ref);
But, in this function, the packet which contains the frame is the one which is passed
as argument (for debug purpose). So let us prefer using this variable.
Also do not dereference this ->pkt quic_frame member in qc_release_frm() and
qc_frm_unref() and add a trace to catch the frame with a null ->pkt member.
They are logically frames which have not already been sent.

Thank you to Tristan for having reported such crashes in GH #1808.

Must be backported to 2.6
2022-08-19 09:58:28 +02:00
Willy Tarreau
473e0e54f5 BUG/MINOR: mux-h2: send a CANCEL instead of ES on truncated writes
If a POST upload is cancelled after having advertised a content-length,
or a response body is truncated after a content-length, we're not allowed
to send ES because in this case the total body length must exactly match
the advertised value. Till now that's what we were doing, and that was
causing the other side (possibly haproxy) to respond with an RST_STREAM
PROTOCOL_ERROR due to "ES on DATA frame before content-length".

We can behave a bit cleaner here. Let's detect that we haven't sent
everything, and send an RST_STREAM(CANCEL) instead, which is designed
exactly for this purpose.

This patch could be backported to older versions but only a little bit
of exposure to make sure it doesn't wake up a bad behavior somewhere.
It relies on the following previous commit:

   "MINOR: mux-h2: make streams know if they need to send more data"
2022-08-19 08:03:53 +02:00
Willy Tarreau
4877045f1d MINOR: mux-h2: make streams know if they need to send more data
H2 streams do not even know if they are expected to send more data or
not, which is problematic when closing because we don't know if we're
closing too early or not. Let's start by adding a new stream flag
"H2_SF_MORE_HTX_DATA" to indicate this on the tx path.
2022-08-19 08:03:53 +02:00
Willy Tarreau
ed2b9d9f27 MINOR: mux-h2/traces: report transition to SETTINGS1 before not after
Traces indicating "switching to XXX" generally apply before the transition
so that the current connection state is visible in the trace. SETTINGS1
was incorrect in this regard, with the trace being emitted after. Let's
fix this.

No need to backport this, as this is purely cosmetic.
2022-08-19 08:03:53 +02:00
Willy Tarreau
0f45871344 BUG/MEDIUM: mux-h2: do not fiddle with ->dsi to indicate demux is idle
When switching to H2_CS_FRAME_H, we do not want to present the previous
frame's state, flags, length etc in traces, or we risk to confuse the
analysis, making the reader think that the header information presented
is related to the new frame header being analysed. A naive approach could
have consisted in simply relying on the current parser state (FRAME_H
being that state), but traces are emitted before switching the state,
so traces cannot rely on this.

This was initially addressed by commit 73db434f7 ("MINOR: h2/trace: report
the frame type when known") which used to set dsi to -1 when the connection
becomes idle again, but was accidentally broken by commit 5112a603d
("BUG/MAJOR: mux_h2: Don't consume more payload than received for skipped
frames") which moved dsi after calling the trace function.

But in both cases there's problem with this approach. If an RST or WU frame
cannot be uploaded due to a busy mux, and at the same time we complete
processing on a perfect end of frame with no single new frame header, we
can leave the demux loop with dsi=-1 and with RST or WU to be sent, and
these ones will be sent for stream ID -1. This is what was reported in
github issue #1830. This can be reproduced with a config chaining an h1->h2
proxy to an empty h2 frontend, and uploading a large body such as below:

  $ (printf "POST / HTTP/1.1\r\nContent-length: 1000000000\r\n\r\n";
     cat /dev/zero) |  nc 0 4445 > /dev/null

This shows that we must never affect ->dsi which must always remain valid,
and instead we should set "something else". That something else could be
served by the demux frame type, but that one also needs to be preserved
for the RST_STREAM case. Instead, let's just add a connection flag to say
that the demuxing is in progress. This will be set once a new demux header
is set and reset after the end of a frame. This way the trace subsystem
can know that dft/dfl must not be displayed, without affecting the logic
relying on such values.

Given that the commits above are old and were backported to 1.8, this
new one also needs to be backported as far as 1.8.

Many thanks to David le Blanc (@systemmonkey42) for spotting, reporting,
capturing and analyzing this bug; his work permitted to quickly spot the
problem.
2022-08-19 08:03:53 +02:00
Willy Tarreau
1addf8b777 BUG/MEDIUM: cli: always reset the service context between commands
Erwan Le Goas reported that chaining certain commands on the CLI would
systematically crash the process; for example, "show version; show sess".
This happened since the conversion of cli context to appctx->svcctx,
because if applet_reserve_svcctx() is called a first time for a tiny
context, it's allocated in-situ, and later a keyword that wants a
larger one will see that it's not null and will reuse it and will
overwrite the end of the first one's context.

What is missing is a reset of the svcctx when looping back to
CLI_ST_GETREQ.

This needs to be backported to 2.6, and relies on previous commit
"MINOR: applet: add a function to reset the svcctx of an applet".
2022-08-18 18:16:36 +02:00
Willy Tarreau
1cc08a33e1 MINOR: applet: add a function to reset the svcctx of an applet
The CLI needs to reset the svcctx between commands, and there was nothing
done to handle this. Let's add appctx_reset_svcctx() to do that, it's the
closing equivalent of appctx_reserve_svcctx().

This will have to be backported to 2.6 as it will be used by a subsequent
patch to fix a bug.
2022-08-18 18:16:36 +02:00
Amaury Denoyelle
115ccce867 MEDIUM: h3: concatenate multiple cookie headers
As specified by RFC 9114, multiple cookie headers must be concatenated
into a single entry before passing it to a HTTP/1.1 connection. To
implement this, reuse the same function as already used for HTTP/2
module.

This should answer to feature requested in github issue #1818.
2022-08-18 16:13:33 +02:00
Amaury Denoyelle
2c5a7ee333 REORG: h2: extract cookies concat function in http_htx
As specified by RFC 7540, multiple cookie headers are merged in a single
entry before passing it to a HTTP/1.1 connection. This step is
implemented during headers parsing in h2 module.

Extract this code in the generic http_htx module. This will allow to
reuse it quickly for HTTP/3 implementation which has the same
requirement for cookie headers.
2022-08-18 16:13:33 +02:00
Amaury Denoyelle
704675656b BUG/MEDIUM: quic: fix crash on MUX send notification
MUX notification on TX has been edited recently : it will be notified
only when sending its own data, and not for example on retransmission by
the quic-conn layer. This is subject of the patch :
  b29a1dc2f4
  BUG/MINOR: quic: do not notify MUX on frame retransmit

A new flag QUIC_FL_CONN_RETRANS_LOST_DATA has been introduced to
differentiate qc_send_app_pkts invocation by MUX and directly by the
quic-conn layer in quic_conn_app_io_cb(). However, this is a first
problem as internal quic-conn layer usage is not limited to
retransmission. For example for NEW_CONNECTION_ID emission.

Another problem much important is that send functions are also called
through quic_conn_io_cb() which has not been protected from MUX
notification. This could probably result in crash when trying to notify
the MUX.

To fix both problems, quic-conn flagging has been inverted : when used
by the MUX, quic-conn is flagged with QUIC_FL_CONN_TX_MUX_CONTEXT. To
improve the API, MUX must now used qc_send_mux which ensure the flag is
set. qc_send_app_pkts is now static and can only be used by the
quic-conn layer.

This must be backported wherever the previously mentionned patch is.
2022-08-18 11:33:22 +02:00
Frédéric Lécaille
4173a39c1f BUG/MINOR: quic: Missing initializations for ducplicated frames.
When duplication frames in qc_dup_pkt_frms(), ->pkt member was not correctly
initialized (copied from the original frame). This could not have any impact
because this member is initialized whe the frame is added to a packet.
This was also the case for ->flags.
Also replace the pool_zalloc() call by a call to pool_alloc().

Must be backported to 2.6.
2022-08-18 10:28:31 +02:00
Mateusz Malek
4b85a963be BUG/MEDIUM: http-ana: fix crash or wrong header deletion by http-restrict-req-hdr-names
When using `option http-restrict-req-hdr-names delete`, HAproxy may
crash or delete wrong header after receiving request containing multiple
forbidden characters in single header name; exact behavior depends on
number of request headers, number of forbidden characters and position
of header containing them.

This patch fixes GitHub issue #1822.

Must be backported as far as 2.2 (buggy feature got included in 2.2.25,
2.4.18 and 2.5.8).
2022-08-17 15:52:17 +02:00
Amaury Denoyelle
b29a1dc2f4 BUG/MINOR: quic: do not notify MUX on frame retransmit
On STREAM emission, quic-conn notifies MUX through a callback named
qcc_streams_sent_done(). This also happens on retransmission : in this
case offset are examined and notification is ignored if already seen.

However, this behavior has slightly changed since
  e53b489826
  BUG/MEDIUM: mux-quic: fix server chunked encoding response

Indeed, if offset diff is NULL, frame is now not ignored. This is to
support FIN notification with a final empty STREAM frame. A side-effect
of this is that if the last stream frame is retransmitted, it won't be
ignored in qcc_streams_sent_done().

In most cases, this side-effect is harmless as qcs instance will soon be
freed after being closed. But if qcs is still alive, this will cause a
BUG_ON crash as it is considered as locally closed.

This bug depends on delay condition and seems to be extremely rare. But
it might be the reason for a crash seen on interop with s2n client on
http3 testcase :

FATAL: bug condition "qcs->st == QC_SS_CLO" matched at src/mux_quic.c:372
  call trace(16):
  | 0x558228912b0d [b8 01 00 00 00 c6 00 00]: main-0x1c7878
  | 0x558228917a70 [48 8b 55 d8 48 8b 45 e0]: qcc_streams_sent_done+0xcf/0x355
  | 0x558228906ff1 [e9 29 05 00 00 48 8b 05]: main-0x1d3394
  | 0x558228907cd9 [48 83 c4 10 85 c0 0f 85]: main-0x1d26ac
  | 0x5582289089c1 [48 83 c4 50 85 c0 75 12]: main-0x1d19c4
  | 0x5582288f8d2a [48 83 c4 40 48 89 45 a0]: main-0x1e165b
  | 0x5582288fc4cc [89 45 b4 83 7d b4 ff 74]: qc_send_app_pkts+0xc6/0x1f0
  | 0x5582288fd311 [85 c0 74 12 eb 01 90 48]: main-0x1dd074
  | 0x558228b2e4c1 [48 c7 c0 d0 60 ff ff 64]: run_tasks_from_lists+0x4e6/0x98e
  | 0x558228b2f13f [8b 55 80 29 c2 89 d0 89]: process_runnable_tasks+0x7d6/0x84c
  | 0x558228ad9aa9 [8b 05 75 16 4b 00 83 f8]: run_poll_loop+0x80/0x48c
  | 0x558228ada12f [48 8b 05 aa c5 20 00 48]: main-0x256
  | 0x7ff01ed2e609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
  | 0x7ff01e8ca163 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

To reproduce it locally, code was artificially patched to produce
retransmission and avoid qcs liberation.

In order to fix this and avoid future class of similar problem, the best
way is to not call qcc_streams_sent_done() to notify MUX for
retranmission. To implement this, we test if any of
QUIC_FL_CONN_RETRANS_OLD_DATA or the new flag
QUIC_FL_CONN_RETRANS_LOST_DATA is set. A new wrapper
qc_send_app_retransmit() has been added to set the new flag as a
complement to already existing qc_send_app_probing().

This must be backported up to 2.6.
2022-08-17 11:06:24 +02:00
Amaury Denoyelle
cc13047364 MINOR: quic: refactor application send
Adjust qc_send_app_pkts function : remove <old_data> arg and provide a
new wrapper function qc_send_app_probing() which should be used instead
when probing with old data.

This simplifies the interface of the default function, most notably for
the MUX which does not interfer with retransmission.
QUIC_FL_CONN_RETRANS_OLD_DATA flag is set/unset directly in the wrapper
qc_send_app_probing().

At the same time, function documentation has been updated to clarified
arguments and return values.

This commit will be useful for the next patch to differentiate MUX and
retransmission send context. As a consequence, the current patch should
be backported wherever the next one will be.
2022-08-17 11:05:49 +02:00
Amaury Denoyelle
3baab744e7 MINOR: mux-quic: add missing args on some traces
Complete some MUX traces by adding qcc or qcs instance as arguments when
this is possible. This will be useful when several connections are
interleaved.
2022-08-17 11:05:47 +02:00
Amaury Denoyelle
fd79ddb2d6 MINOR: mux-quic: adjust traces on stream init
Adjust traces on qcc_init_stream_remote() : replace "opening" by
"initializing" to avoid confusion with traces dealing with OPEN stream
state.
2022-08-17 11:05:46 +02:00
Amaury Denoyelle
bf3c208760 BUG/MEDIUM: mux-quic: reject uni stream ID exceeding flow control
Emit STREAM_LIMIT_ERROR if a client tries to open an unidirectional
stream with an ID greater than the value specified by our flow-control
limit. The code is similar to the bidirectional stream opening.

MAX_STREAMS_UNI emission is not implement for the moment and is left as
a TODO. This should not be too urgent for the moment : in HTTP/3, a
client has only a limited use for unidirectional streams (H3 control
stream + 2 QPACK streams). This is covered by the value provided by
haproxy in transport parameters.

This patch has been tagged with BUG as it should have prevented last
crash reported on github issue #1808 when opening a new unidirectional
streams with an invalid ID. However, it is probably not the main cause
of the bug contrary to the patch
  commit 11a6f4007b
  BUG/MINOR: quic: Wrong status returned by qc_pkt_decrypt()

This must be backported up to 2.6.
2022-08-17 11:05:19 +02:00
Amaury Denoyelle
26aa399d6b MINOR: qpack: report error on enc/dec stream close
As specified by RFC 9204, encoder and decoder streams must not be
closed. If the peer behaves incorrectly and closes one of them, emit a
H3_CLOSED_CRITICAL_STREAM connection error.

To implement this, QPACK stream decoding API has been slightly adjusted.
Firstly, fin parameter is passed to notify about FIN STREAM bit.
Secondly, qcs instance is passed via unused void* context. This allows
to use qcc_emit_cc_app() function to report a CONNECTION_CLOSE error.
2022-08-17 11:04:53 +02:00
Amaury Denoyelle
6b02c6bb47 MINOR: h3: report error on control stream close
As specified by RFC 9114 the control stream must not be closed. If the
peer behaves incorrectly and closes it, emit a H3_CLOSED_CRITICAL_STREAM
connection error.
2022-08-17 11:04:52 +02:00
Amaury Denoyelle
f372e744de MINOR: quic: adjust quic_frame flag manipulation
Replace a plain '=' operator by '|=' when setting quic_frame
QUIC_FL_TX_FRAME_LOST flag.

For the moment, this change has no impact as only two exclusive flags
are defined for quic_frame. On the edited code path we are certain that
QUIC_FL_TX_FRAME_ACKED is not set due to a previous if statement, so a
plain equal or a binary OR is strictly identical.

This change will be useful if new flags are defined for quic_frame in
the future. These new flags won't be resetted automatically thanks to
binary OR without explictly intended, which otherwise could easily lead
to new bugs.
2022-08-17 11:04:47 +02:00
Frédéric Lécaille
bbeec37b31 MINOR: stick-table: Add table_expire() and table_idle() new converters
table_expire() returns the expiration delay for a stick-table entry associated
to an input sample. Its counterpart table_idle() returns the time the entry
remained idle since the last time it was updated.
Both converters may take a default value as second argument which is returned
when the entry is not present.
2022-08-17 10:52:15 +02:00
Willy Tarreau
cc1a2a1867 MINOR: chunk: inline alloc_trash_chunk()
This function is responsible for all calls to pool_alloc(trash), whose
total size can be huge. As such it's quite a pain that it doesn't provide
more hints about its users. However, since the function is tiny, it fully
makes sense to inline it, the code is less than 0.1% larger with this.
This way we can now detect where the callers are via "show profiling",
e.g.:

       0 1953671           0 32071463136| 0x59960f main+0x10676f p_free(-16416) [pool=trash]
       0       1           0       16416| 0x59960f main+0x10676f p_free(-16416) [pool=trash]
 1953672       0 32071479552           0| 0x599561 main+0x1066c1 p_alloc(16416) [pool=trash]
       0  976835           0 16035723360| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash]
       0       1           0       16416| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash]
  976835       0 16035723360           0| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash]
       1       0       16416           0| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash]
2022-08-17 10:45:22 +02:00
Willy Tarreau
42b180dcdb MINOR: pools/memprof: store and report the pool's name in each bin
Storing the pointer to the pool along with the stats is quite useful as
it allows to report the name. That's what we're doing here. We could
store it in place of another field but that's not convenient as it would
require to change all functions that manipulate counters. Thus here we
store one extra field, as well as some padding because the struct turns
56 bytes long, thus better go to 64 directly. Example of output from
"show profiling memory":

      2      0       48         0|  0x4bfb2c ha_quic_set_encryption_secrets+0xcc/0xb5e p_alloc(24) [pool=quic_tls_iv]
      0  55252        0  10608384|  0x4bed32 main+0x2beb2 free(-192)
     15      0     2760         0|  0x4be855 main+0x2b9d5 p_alloc(184) [pool=quic_frame]
      1      0     1048         0|  0x4be266 ha_quic_add_handshake_data+0x2b6/0x66d p_alloc(1048) [pool=quic_crypto]
      3      0      552         0|  0x4be142 ha_quic_add_handshake_data+0x192/0x66d p_alloc(184) [pool=quic_frame]
  31276      0  6755616         0|  0x4bb8f9 quic_sock_fd_iocb+0x689/0x69b p_alloc(216) [pool=quic_dgram]
      0  31424        0   6787584|  0x4bb7f3 quic_sock_fd_iocb+0x583/0x69b p_free(-216) [pool=quic_dgram]
    152      0    32832         0|  0x4bb4d9 quic_sock_fd_iocb+0x269/0x69b p_alloc(216) [pool=quic_dgram]
2022-08-17 10:34:00 +02:00
Willy Tarreau
facfad2b64 MINOR: pool/memprof: report pool alloc/free in memory profiling
Pools are being used so well that it becomes difficult to profile their
usage via the regular memory profiling. Let's add new entries for pools
there, named "p_alloc" and "p_free" that correspond to pool_alloc() and
pool_free(). Ideally it would be nice to only report those that fail
cache lookups but that's complicated, particularly on the free() path
since free lists are released in clusters to the shared pools.

It's worth noting that the alloc_tot/free_tot fields can easily be
determined by multiplying alloc_calls/free_calls by the pool's size, and
could be better used to store a pointer to the pool itself. However it
would require significant changes down the code that sorts output.

If this were to cause a measurable slowdown, an alternate approach could
consist in using a different value of USE_MEMORY_PROFILING to enable pools
profiling. Also, this profiler doesn't depend on intercepting regular malloc
functions, so we could also imagine enabling it alone or the other one alone
or both.

Tests show that the CPU overhead on QUIC (which is already an extremely
intensive user of pools) jumps from ~7% to ~10%. This is quite acceptable
in most deployments.
2022-08-17 09:38:05 +02:00
Willy Tarreau
219afa2ca8 MINOR: memprof: export the minimum definitions for memory profiling
Right now it's not possible to feed memory profiling info from outside
activity.c, so let's export the function and move the enum and struct
to the include file.
2022-08-17 09:03:57 +02:00
Frédéric Lécaille
11a6f4007b BUG/MINOR: quic: Wrong status returned by qc_pkt_decrypt()
This bug came with this big commit:
     "MEDIUM: quic: xprt traces rework"

This is the <ret> variable value which must be returned by most of the xprt functions.
This leaded packets which could not be decrypted to be parsed, with weird frames
to be parsed as found by Tristan in GH #1808.

To be backported where the commit above was backported.
2022-08-16 14:54:32 +02:00
Frédéric Lécaille
ebb1070721 BUG/MINOR: quic: MIssing check when building TX packets
When building an ack-eliciting frame only packet, if we did not manage to add
at least one such a frame to the packet, we did not notify the caller about
the fact the packet is empty. This could lead the caller to believe
everything was ok and make it endlessly try to build packet again and again.
This issue was amplified by the recent changes where a while(1) loop has been
added to qc_send_app_pkt() which calls qc_do_build_pkt() through qc_prep_app_pkts()
until we could not prepare packets. Before this recent change, I guess only
one empty packet was sent.
This patch checks that non empty packets could be built by qc_do_build_pkt()
and makes this function return an error if this was the case. Also note that
such an issue could happened only when the packet building was limited by
the congestion control.

Thank you to Tristan for having reported this issue in GH #1808.

Must be backported to 2.6.
2022-08-16 12:23:38 +02:00
Amaury Denoyelle
35a66c0a36 BUG/MINOR: mux-quic: fix crash with traces in qc_detach()
qc_detach() is used to free a qcs as notified by sedesc. If there is no
more stream active and the connection is considered as dead, it will
then be freed. This prevent to dereference qcc in TRACE macro. Else this
will cause a crash.

Use a different code-path on release for qc_detach() to fix this bug.

This will fix the last occurence of crash on github issue #1808.

This has been introduced by recent QUIC MUX traces rework. Thus, it does
not need to be backport.
2022-08-12 16:02:00 +02:00
Willy Tarreau
ded77cc71f MINOR: ring: archive a previous file-backed ring on startup
In order to ensure that an instant restart of the process will not wipe
precious debugging information, and to leave time for an admin to archive
a copy of a ring, now upon startup, any previously existing file will be
renamed with the extra suffix ".bak", and any previously existing file
with suffix ".bak" will be removed.
2022-08-12 15:40:19 +02:00
Willy Tarreau
8e87705c21 BUILD: sink: replace S_IRUSR, S_IWUSR with their octal value
The build broke on freebsd with S_IRUSR undefined after commit 0b8e9ceb1
("MINOR: ring: add support for a backing-file"). Maybe another include
is needed there, but the point is that we really don't care about these
symbolic names, file modes are more readable as 0600 than via these
cryptic names anyway, so let's go back to 0600. This will teach me not
to try to make things too clean.

No backport is needed.
2022-08-12 15:03:12 +02:00
Frédéric Lécaille
bfb077acff BUG/MINOR: quic: memleak on wrong datagram receipt
There was a missing pool_free() call for such datagrams. As far as I see
there is no leak on valid datagram receipt.

Must be backported to 2.6.
2022-08-12 12:19:26 +02:00
Willy Tarreau
0b8e9ceb12 MINOR: ring: add support for a backing-file
This mmaps a file which will serve as the backing-store for the ring's
contents. The idea is to provide a way to retrieve sensitive information
(last logs, debugging traces) even after the process stops and even after
a possible crash. Right now this was possible by connecting to the CLI
and dumping the contents of the ring live, but this is not handy and
consumes quite a bit of resources before it is needed.

With a backing file, the ring is effectively RAM-mapped file, so that
contents stored there are the same as those found in the file (the OS
doesn't guarantee immediate sync but if the process dies it will be OK).

Note that doing that on a filesystem backed by a physical device is a
bad idea, as it will induce slowdowns at high loads. It's really
important that the device is RAM-based.

Also, this may have security implications: if the file is corrupted by
another process, the storage area could be corrupted, causing haproxy
to crash or to overwrite its own memory. As such this should only be
used for debugging.
2022-08-12 11:18:46 +02:00
Willy Tarreau
6df10d872b MINOR: ring: support creating a ring from a linear area
Instead of allocating two parts, one for the ring struct itself and
one for the storage area, ring_make_from_area() will arrange the two
inside the same memory area, with the storage starting immediately
after the struct. This will allow to store a complete ring state in
shared memory areas for example.
2022-08-12 11:18:46 +02:00
Frédéric Lécaille
7629f5d670 BUG/MEDIUM: quic: Wrong use of <token_odcid> in qc_lsntr_pkt_rcv()
This commit was not complete:
  "BUG/MEDIUM: quic: Possible use of uninitialized <odcid>
variable in qc_lstnr_params_init()"
<token_odcid> should have been directly passed to qc_lstnr_params_init()
without dereferencing it to prevent haproxy to have new chances to crash!

Must be backported to 2.6.
2022-08-11 19:12:12 +02:00
Willy Tarreau
18d1306abd BUG/MEDIUM: ring: fix too lax 'size' parser
It took me a while to figure why a ring declared with "size 1M" was causing
strange effects in a ring, it's just because it's parsed as "1", which is
smaller than the default 16384 size and errors are silently ignored.

This commit tries to address this the best possible way without breaking
existing configs that would work by accident, by warning that the size is
ignored if it's smaller than the current one, and by printing the parsed
size instead of the input string in warnings and errors. This way if some
users have "size 10000" or "size 100k" it will continue to work as 16kB
like today but they will now be aware of it.

In addition the error messages were a bit poor in context in that they
only provided line numbers. The ring name was added to ease locating the
problem.

As the issue was present since day one and was introduced in 2.2 with
commit 99c453df9d ("MEDIUM: ring: new section ring to declare custom ring
buffers."), it could make sense to backport this as far as 2.2, but with
2.2 being quite old now it doesn't seem very reasonable to start emitting
new config warnings in config that apparently worked well.

Thus it looks more reasonable to backport this as far as 2.4.
2022-08-11 19:05:19 +02:00
Frédéric Lécaille
e9325e97c2 BUG/MEDIUM: quic: Possible use of uninitialized <odcid> variable in qc_lstnr_params_init()
When receiving a token into a client Initial packet without a cluster secret defined
by configuration, the <odcid> variable used to parse the ODCID from the token
could be used without having been initialized. Such a packet must be dropped. So
the sufficient part of this patch is this check:

+             }
+             else if (!global.cluster_secret && token_len) {
+                    /* Impossible case: a token was received without configured
+                    * cluster secret.
+                    */
+                    TRACE_PROTO("Packet dropped", QUIC_EV_CONN_LPKT,
+                    NULL, NULL, NULL, qv);
+                    goto drop;
              }

Take the opportunity of this patch to rework and make it more readable this part
of code where such a packet must be dropped removing the <check_token> variable.
When an ODCID is parsed from a token, new <token_odcid> new pointer variable
is set to the address of the parsed ODCID. This way, is not set but used it will
make crash haproxy. This was not always the case with an uninitialized local
variable.

Adapt the API to used such a pointer variable: <token> boolean variable is removed
from qc_lstnr_params_init() prototype.

This must be backported to 2.6.
2022-08-11 18:33:36 +02:00
Amaury Denoyelle
6bdf9367fb BUG/MEDIUM: mux-quic: fix crash due to invalid trace arg
Traces argument were incorrectly used in qcs_free(). A qcs was specified
as first arg instead of a connection. This will lead to a crash if
developer qmux traces are activated. This is now fixed.

This bug has been introduced with QUIC MUX traces rework. No need to
backport.
2022-08-11 18:24:53 +02:00
Amaury Denoyelle
4c9a1642c1 MINOR: mux-quic: define new traces
Add new traces to help debugging on QUIC MUX. Most notable, the
following functions are now traced :
* qcc_emit_cc
* qcs_free
* qcs_consume
* qcc_decode_qcs
* qcc_emit_cc_app
* qcc_install_app_ops
* qcc_release_remote_stream
* qcc_streams_sent_done
* qc_init
2022-08-11 15:20:44 +02:00
Amaury Denoyelle
047d86a34b CLEANUP: mux-quic: adjust traces level
Change default devel level for some traces in QUIC MUX:
* proto : used to notify about reception/emission of frames
* state : modification of internal state of connection or streams
* data : detailled information about transfer and flow-control
2022-08-11 15:20:44 +02:00
Amaury Denoyelle
c7fb0d2b7a MINOR: mux-quic: define protocol error traces
Replace devel traces with error level on all errors situation. Also a
new event QMUX_EV_PROTO_ERR is used. This should help to detect invalid
situations quickly.
2022-08-11 15:20:44 +02:00
Amaury Denoyelle
f0b67f995c MINOR: mux-quic: adjust enter/leave traces
Improve MUX traces by adding some missing enter/leave trace points. In
some places, early function returns have been replaced by a goto
statement.
2022-08-11 15:20:43 +02:00
Frédéric Lécaille
59507de932 CLEANUP: quic: Remove trailing spaces
This spaces have come with this commit:
  "MEDIUM: quic: xprt traces rework".
2022-08-11 14:33:43 +02:00
Frédéric Lécaille
96d08d37d9 BUG/MINOR: quic: Possible infinite loop in quic_build_post_handshake_frames()
This loop is due to the fact that we do not select the next node before
the conditional "continue" statement. Furthermore the condition and the
"continue" statement may be removed after replacing eb64_first() call
by eb64_lookup_ge(): we are sure this condition may not be satisfied.
Add some comments: this function initializes connection IDs with sequence
number 1 upto <max> non included.
Take the opportunity of this patch to remove a "return" wich broke this
traces rule: for any function, do not call TRACE_ENTER() without TRACE_LEAVE()!
Add also TRACE_ERROR() for any encoutered errors.

Must be backported to 2.6
2022-08-11 14:33:43 +02:00
Frédéric Lécaille
a6920a25d9 MINOR: quic: Remove useless lock for RX packets
This lock was there be able to handle the RX packets for a connetion
from several threads. This is no more needed since a QUIC connection
is always handled by the same thread.

May be backported to 2.6
2022-08-11 14:33:43 +02:00
Willy Tarreau
6a378d1677 BUILD: stconn: fix build warning at -O3 about possible null sc
gcc-6.x and 7.x emit build warnings about sc possibly being null upon
return from sc_detach_endp(). This actually is not the case and the
compiler is a little bit overzealous there, but there exists code
paths that can make this analysis non-trivial so let's at least add
a similar BUG_ON() to let both the compiler and the deverloper know
this doesn't happen.

This should be backported to 2.6.
2022-08-11 13:59:13 +02:00
Frédéric Lécaille
a8b2f843d2 MEDIUM: quic: xprt traces rework
Add a least as much as possible TRACE_ENTER() and TRACE_LEAVE() calls
to any function. Note that some functions do not have any access to the
a quic_conn argument when  receiving or parsing datagram at very low level.
2022-08-11 11:11:20 +02:00
Willy Tarreau
e6ca435c04 BUG/MEDIUM: poller: use fd_delete() to release the poller pipes
The poller pipes needed to communicate between multiple threads are
allocated in init_pollers_per_thread() and released in
deinit_pollers_per_thread(). The former adds them via fd_insert()
so that they are known, but the former only closes them using a
regular close().

This asymmetry represents a problem, because we have in the fdtab[]
an entry for something that may disappear when one thread leaves, and
since these FD numbers are very low, there is a very high likelihood
that they are immediately reassigned to another thread trying to
connect() to a server or just sending a health check. In this case,
the other thread is going to fd_insert() the fd and the recently
added consistency checks will notive that ->owner is not NULL and
will crash. We just need to use fd_delete() here to match fd_insert().

Note that this test was added in 2.7-dev2 by commit 36d9097cf
("MINOR: fd: Add BUG_ON checks on fd_insert()") which was backported
to 2.4 as a safety measure (since it allowed to catch particularly
serious issues). The patch in itself isn't wrong, it just revealed
a long-dormant bug (been there since 1.9-dev1, 4 years ago). As such
the current patch needs to be backported wherever the commit above
is backported.

Many thanks to Christian Ruppert for providing detailed traces in
github issue #1807 and Cedric Paillet for bringing his complementary
analysis that helped to understand the required conditions for this
issue to happen (fast health checks @100ms + randomly long connections
~7s + fast reloads every second + hard-stop-after 5s were necessary
on the dev's machine to trigger it from time to time).
2022-08-10 17:25:23 +02:00
Willy Tarreau
54bc78693d BUG/MEDIUM: quic: always remove the connection from the accept list on close
Fred managed to reproduce a crash showing a corrupted accept_list when
firing thousands of concurrent picoquicdemo clients to a same instance.
It may happen if the connection was placed into the accept_list and
immediately closed before being processed (e.g. on error or t/o ?).

In any case the quic_conn_release() function should always detach a
connection to be deleted from any list, like it does for other lists,
so let's add an MT_LIST_DELETE() here.

This should be backported to 2.6.
2022-08-10 07:30:22 +02:00
Amaury Denoyelle
f0f92b2db8 BUG/MINOR: quic: fix crash on handshake io-cb for null next enc level
When arriving at the handshake completion, next encryption level will be
null on quic_conn_io_cb(). Thus this must be check this before
dereferencing it via qc_need_sending() to prevent a crash.

This was reproduced quickly when browsing over a local nextcloud
instance through QUIC with firefox.

This has been introduced in the current dev with quic-conn Tx
refactoring. No need to backport it.
2022-08-09 18:01:10 +02:00
Amaury Denoyelle
96ca1b7c39 BUG/MINOR: mux-quic: open stream on STOP_SENDING
Considered a stream as opened when receiving a STOP_SENDING frame as the
first frame on the stream.

This patch is tagged as BUG because a BUG_ON may occur if only a
STOP_SENDING frame has been received for a frame. This will reset the
stream in respect with RFC9000 but internally it is considered invalid
transition to reset an idle stream.

To fix this, simply use qcs_idle_open() on STOP_SENDING parsing
function. This will mark the stream as OPEN before resetting it.

This was detected on haproxy.org with the following backtrace :

FATAL: bug condition "qcs->st == QC_SS_IDLE" matched at
src/mux_quic.c:383
  call trace(12):
  |       0x490dd3 [b8 01 00 00 00 c6 00 00]: main-0x1d0633
  |       0x4975b8 [48 8b 85 58 ff ff ff 8b]: main-0x1c9e4e
  |       0x497df4 [48 8b 45 c8 48 89 c7 e8]: main-0x1c9612
  |       0x49934c [48 8b 45 c8 48 89 c7 e8]: main-0x1c80ba
  |       0x6b3475 [48 8b 05 54 1b 3a 00 64]: run_tasks_from_lists+0x45d/0x8b2
  |       0x6b4093 [29 c3 89 d8 89 45 d0 83]: process_runnable_tasks+0x7c9/0x824
  |       0x660bde [8b 05 fc b3 4f 00 83 f8]: run_poll_loop+0x74/0x430
  |       0x6611de [48 8b 05 7b a6 40 00 48]: main-0x228
  | 0x7f66e4fb2ea5 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea5
  | 0x7f66e455ab0d [48 89 c7 e8 5b 72 fc ff]: libc:clone+0x6d/0x86

Stream states have been implemented in the current dev tree. Thus, this
patch does not need to be backported.
2022-08-09 17:58:02 +02:00
Amaury Denoyelle
c09ef0c5fc MINOR: quic: skip sending if no frame to send in io-cb
Check on quic_conn_io_cb() if sending is required. This allows to skip
over Tx buffer allocation if not needed.

To implement this, we check if frame lists on current and next
encryption level are empty. We also need to check if there is no need to
send ACK, PROBE or CONNECTION_CLOSE. This has been isolated in a new
function qc_need_sending() which may be reuse in some other functions in
the future.
2022-08-09 16:03:49 +02:00
Amaury Denoyelle
654269c769 MINOR: quic: refactor datagram commit in Tx buffer
This is the final patch on quic-conn Tx refactor. Extend the function
which is used to write a datagram header to save at the same time
written buffer data. This makes sense as the two operations are used at
the same occasion when a pre-written datagram is comitted.
2022-08-09 16:00:30 +02:00
Amaury Denoyelle
5b68986d77 MINOR: quic: release Tx buffer on each send
Complete refactor of quic-conn Tx buffer. The buffer is now released
on every send operation completion. This should help to reduce memory
footprint as now Tx buffers are allocated and released on demand.

To simplify allocation/free of quic-conn Tx buffer, two static functions
are created named qc_txb_alloc() and qc_txb_release().
2022-08-09 16:00:02 +02:00
Amaury Denoyelle
f2476053f9 MINOR: quic: replace custom buf on Tx by default struct buffer
On first prototype version of QUIC, emission was multithreaded. To
support this, a custom thread-safe ring-buffer has been implemented with
qring/cbuf.

Now the thread model has been adjusted : a quic-conn is always used on
the same thread and emission is not multi-threaded. Thus, qring/cbuf
usage can be replace by a standard struct buffer.

The code has been simplified even more as for now buffer is always
drained after a prepare/send invocation. This is the case since a
datagram is always considered as sent even on sendto() error. BUG_ON
statements guard are here to ensure that this model is always valid.
Thus, code to handle data wrapping and consume too small contiguous
space with a 0-length datagram is removed.
2022-08-09 15:45:47 +02:00
Amaury Denoyelle
56c6154dba CLEANUP: mux-quic: remove loop on sending frames
qc_send_app_pkts() has now a while loop implemented which allows to send
all possible frames even if the send buffer is full between packet
prepare and send. This is present since commit :
  dc07751ed7
  MINOR: quic: Send packets as much as possible from qc_send_app_pkts()

This means we can remove code from the MUX which implement this at the
upper layer. This is useful to simplify qc_send_frames() function.

As mentionned commit is subject to backport, this commit should be
backported as well to 2.6.
2022-08-09 15:41:07 +02:00
Willy Tarreau
4a426e2082 MINOR: debug/memstats: automatically determine first column size
The first column's width may vary a lot depending on outputs, and it's
annoying to have large empty columns on small names and mangled large
columns that are not yet large enough. In order to overcome this, this
patch adds a width field to the memstats applet's context, and this
width is calculated the first time the function is entered, by estimating
the width of all lines that will be dumped. This is simple enough and
does the job well. If in the future some filtering criteria are added,
it will still be possible to perform a single pass on everything
depending on the desired output format.
2022-08-09 08:51:08 +02:00
Willy Tarreau
17200dd1f3 MINOR: debug: also store the function name in struct mem_stats
The calling function name is now stored in the structure, and it's
reported when the "all" argument is passed. The first column is
significantly enlarged because some names are really wide :-(
2022-08-09 08:42:42 +02:00
Willy Tarreau
55c950baa9 MINOR: debug: store and report the pool's name in struct mem_stats
Let's add a generic "extra" pointer to the struct mem_stats to store
context-specific information. When tracing pool_alloc/pool_free, we
can now store a pointer to the pool, which allows to report the pool
name on an extra column. This significantly improves tracing
capabilities.

Example:

  proxy.c:1598      CALLOC   size:  28832  calls:  4     size/call:  7208
  dynbuf.c:55       P_FREE   size:  32768  calls:  2     size/call:  16384  buffer
  quic_tls.h:385    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:389    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:554    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:558    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:562    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:401    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  quic_tls.h:403    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  xprt_quic.c:4060  MALLOC   size:  45376  calls:  5672  size/call:  8
  quic_sock.c:328   P_ALLOC  size:  46440  calls:  215   size/call:  216    quic_dgram
2022-08-09 08:26:59 +02:00
Frédéric Lécaille
ba19acd822 MINOR: quic: Replace pool_zalloc() by pool_malloc() for fake datagrams
These fake datagrams are only used by the low level I/O handler. They
are not provided to the "by connection" datagram handlers. This
is why they are not MT_LIST_APPEND()ed to the listner RX buffer list
(see &quic_dghdlrs[cid_tid].dgrams in quic_lstnr_dgram_dispatch().
Replace the call to pool_zalloc() to by the lighter call to pool_malloc()
and initialize only the ->buf and ->length members. This is safe because
only these fields are inspected by the low level I/O handler.
2022-08-08 21:10:58 +02:00
Frédéric Lécaille
ffde3168fc BUG/MEDIUM: quic: Missing AEAD TAG check after removing header protection
After removing the packet header protection, we can check the packet is long
enough to contain a 16 bytes length AEAD TAG (at this end of the packet).
This test was missing.

Must be backported to 2.6.
2022-08-08 18:41:16 +02:00
Frédéric Lécaille
adc7641536 MINOR: quic: Too much useless traces in qc_build_frms()
These traces about the available room into the packet currently built and
its payload length could be displayed for each STREAM frame, even for
those which have no chance to be embedded into a packet leading to
very traces to be displayed from a connection with a lot of stream.
This was revealed by traces provide by Tristan in GH #1808

May be backported to 2.6.
2022-08-08 16:18:55 +02:00
Frédéric Lécaille
99897d11d9 BUG/MEDIUM: quic: Wrong packet length check in qc_do_rm_hp()
When entering this function, we first check the packet length is not too short.
But this was done against the datagram lenght in place of the packet length.
This could lead to the header protection to be removed using data past
the end of the packet (without buffer overflow).

Use the packet length in place of the datagram length which is at <end>
address passed as parameter to this function. As the packet length
is already stored in ->len packet struct member, this <end> parameter is no
more useful.

Must be backported to 2.6.
2022-08-08 11:02:04 +02:00
Willy Tarreau
2e64472d16 BUILD: cfgparse: always defined _GNU_SOURCE for sched.h and crypt.h
_GNU_SOURCE used to be defined only when USE_LIBCRYPT was set. It's also
needed for sched_setaffinity() to be exported. As a side effect, when
USE_LIBCRYPT is not set, a warning is emitted, as Ilya found and reported
in issue #1815. Let's just define _GNU_SOURCE regardless of USE_LIBCRYPT,
and also explicitly add sched.h, as right now it appears to be inherited
from one of the other includes.

This should be backported to 2.4.
2022-08-07 16:55:07 +02:00
Ilya Shipitsin
52f2ff5b93 BUG/MEDIUM: fix DH length when EC key is used
dh of length 1024 were chosen for EVP_PKEY_EC key type.
let us pick "default_dh_param" instead.

issue was found on Ubuntu 22.04 which is shipped with OpenSSL configured
with SECLEVEL=2 by default. such SECLEVEL value prohibits DH shorter than
2048:

OpenSSL error[0xa00018a] SSL_CTX_set0_tmp_dh_pkey: dh key too small

better strategy for chosing DH still may be considered though.
2022-08-06 17:45:40 +02:00
Ilya Shipitsin
3b64a28e15 CLEANUP: assorted typo fixes in the code and comments
This is 31st iteration of typo fixes
2022-08-06 17:12:51 +02:00
Willy Tarreau
c80bdb2da6 MINOR: threads: report the number of thread groups in build options
haproxy -vv shows the number of threads but didn't report the number
of groups, let's add it.
2022-08-06 16:45:26 +02:00
Willy Tarreau
f9d4a7dad3 BUG/MEDIUM: quic: break out of the loop in quic_lstnr_dghdlr
The function processes packets sent by other threads in the current
thread's queue. But if, for any reason, other threads write faster
than the current one processes, this can lead to a situation where
the function never returns.

It seems that it might be what's happening in issue #1808, though
unfortunately, this function is one of the rare without traces. But
the amount of calls to functions like qc_lstnr_pkt_rcv() on a single
thread seems to indicate this possibility.

Thanks to Tristan for his efforts in collecting extremely precious
traces!

This likely needs to be backported to 2.6.
2022-08-05 16:12:00 +02:00
Amaury Denoyelle
6715cbf97f BUG/MINOR: quic: adjust errno handling on sendto
qc_snd_buf returned a size_t which means that it was never negative
despite its documentation. Thus the caller who checked for this was
never informed of a sendto error.

Clean this by changing the return value of qc_snd_buf() to an integer.
A 0 is returned on success. Every other values are considered as an
error.

This commit should be backported up to 2.6. Note that to not cause
malfunctions, it must be backported after the previous patch :
  906b058954
  MINOR: quic: explicitely ignore sendto error
This is to ensure that a sendto error does not cause send to be
interrupted which may cause a stalled transfer without a proper retry
mechanism.

The impact of this bug seems null as caller explicitely ignores sendto
error. However this part of code seems to be subject to strange issues
and it may fix them in part. It may be of interest for github issue #1808.
2022-08-05 15:53:16 +02:00
Amaury Denoyelle
906b058954 MINOR: quic: explicitely ignore sendto error
qc_snd_buf() returns an error if sendto has failed. On standard
conditions, we should check for EAGAIN/EWOULDBLOCK errno and if so,
register the file-descriptor in the poller to retry the operation later.

However, quic_conn uses directly the listener fd which is shared for all
QUIC connections of this listener on several threads. Thus, it's
complicated to implement fd supversion via the poller : there is no
mechanism to easily wakeup quic_conn or MUX after a sendto failure.

A quick and simple solution for the moment is to considered a datagram
as properly emitted even on sendto error. In the end, this will trigger
the quic_conn retransmission timer as data will be considered lost on
the network and the send operation will be retried. This solution will
be replaced when fd management for quic_conn is reworked.

In fact, this quick hack was already in use in the current code, albeit
not voluntarily. This is due to a bug caused by an API mismatch on the
return type of qc_snd_buf() which never emits a negative error code
despite its documentation. Thus, all its invocation were considered as a
success. If this bug was fixed, the sending would would have been
interrupted by a break which could cause the transfer to freeze.

qc_snd_buf() invocation is clean up : the break statement is removed.
Send operation is now always explicitely conducted entirely even on
error and buffer data is purged.

A simple optimization has been added to skip over sendto when looping
over several datagrams at the first sendto error. However, to properly
function, it requires a fix on the return type of qc_snd_buf() which is
provided in another patch.

As the behavior before and after this patch seems identical, it is not
labelled as a BUG. However, it should be backported for cleaning
purpose. It may also have an impact on github issue #1808.
2022-08-05 15:45:25 +02:00
Frédéric Lécaille
e7df68a219 BUG/MINOR: quic: Missing Initial packet dropping case
An Initial packet shorter than 1200 bytes must be dropped. The test was there
without the "goto drop"!

Must be backported to 2.6
2022-08-05 15:27:14 +02:00
Frédéric Lécaille
8ecb7363b5 MINOR: quic: Add two new stats counters for sendto() errors
Add "quic_socket_full" new stats counter for sendto() errors with EAGAIN as errno.
and "quic_sendto_err" counter for any other error.
2022-08-05 15:27:14 +02:00
Willy Tarreau
af5138fd07 BUG/MINOR: quic: do not reject datagrams matching minimum permitted size
The dgram length check in quic_get_dgram_dcid() rejects datagrams
matching exactly the minimum allowed length, which doesn't seem
correct. I doubt any useful packet would be that small but better
fix this to avoid confusing debugging sessions in the future.

This might be backported to 2.6.
2022-08-05 10:31:29 +02:00
Willy Tarreau
53bfab080c BUG/MINOR: sink: fix a race condition between the writer and the reader
This is the same issue as just fixed in b8e0fb97f ("BUG/MINOR: ring/cli:
fix a race condition between the writer and the reader") but this time
for sinks. They're also sucking the ring and present the same race at
high write loads.

This must be backported to 2.2 as well. See comments in the aforementioned
commit for backport hints if needed.
2022-08-04 17:21:16 +02:00
Christopher Faulet
96417f392d BUG/MEDIUM: sink: Set the sink ref for forwarders created during ring parsing
A reference to the sink was added in every forwarder by the commit 2ae25ea24
("MINOR: sink: Add a ref to sink in the sink_forward_target structure"). But
this commit is incomplete. It is not performed for the forwarders created
during a ring parsing.

This patch must be backported to 2.6.
2022-08-04 17:10:28 +02:00
Willy Tarreau
b8e0fb97f3 BUG/MINOR: ring/cli: fix a race condition between the writer and the reader
The ring's CLI reader unlocks the read side of a ring and relocks it for
writing only if it needs to re-subscribe. But during this time, the writer
might have pushed data, see nobody subscribed hence woken nobody, while
the reader would have left marking that the applet had no more data. This
results in a dump that will not make any forward progress: the ring is
clogged by this reader which believes there's no data and the writer
will never wake it up.

The right approach consists in verifying after re-attaching if the writer
had made any progress in between, and to report that another call is
needed. Note that a jump back to the beginning would also work but here
we provide better fairness between readers this way.

This needs to be backported to 2.2. The applet API needed to signal the
availability of new data changed a few times since then.
2022-08-04 17:00:21 +02:00
Frédéric Lécaille
48bb875908 BUG/MINOR: quic: Avoid sending truncated datagrams
There is a remaining loop in this ugly qc_snd_buf() function which could
lead haproxy to send truncated UDP datagrams. For now on, we send
a complete UDP datagram or nothing!

Must be backported to 2.6.
2022-08-03 21:09:04 +02:00
Amaury Denoyelle
30e260e2e6 MEDIUM: mux-quic: implement http-request timeout
Implement http-request timeout for QUIC MUX. It is used when the
connection is opened and is triggered if no HTTP request is received in
time. By HTTP request we mean at least a QUIC stream with a full header
section. Then qcs instance is attached to a sedesc and upper layer is
then responsible to wait for the rest of the request.

This timeout is also used when new QUIC streams are opened during the
connection lifetime to wait for full HTTP request on them. As it's
possible to demux multiple streams in parallel with QUIC, each waiting
stream is registered in a list <opening_list> stored in qcc with <start>
as timestamp in qcs for the stream opening. Once a qcs is attached to a
sedesc, it is removed from <opening_list>. When refreshing MUX timeout,
if <opening_list> is not empty, the first waiting stream is used to set
MUX timeout.

This is efficient as streams are stored in the list in their creation
order so CPU usage is minimal. Also, the size of the list is
automatically restricted by flow control limitation so it should not
grow too much.

Streams are insert in <opening_list> by application protocol layer. This
is because only application protocol can differentiate streams for HTTP
messaging from internal usage. A function qcs_wait_http_req() has been
added to register a request stream by app layer. QUIC MUX can then
remove it from the list in qc_attach_sc().

As a side-note, it was necessary to implement attach qcc_app_ops
callback on hq-interop module to be able to insert a stream in waiting
list. Without this, a BUG_ON statement would be triggered when trying to
remove the stream on sedesc attach. This is to ensure that every
requests streams are registered for http-request timeout.

MUX timeout is explicitely refreshed on MAX_STREAM_DATA and STOP_SENDING
frame parsing to schedule http-request timeout if a new stream has been
instantiated. It was already done on STREAM parsing due to a previous
patch.
2022-08-03 15:04:18 +02:00
Amaury Denoyelle
6ec9837fca MINOR: mux-quic: refactor refresh timeout function
Try to reorganize qcc_refresh_timeout() to improve its readability. The
main objective is to reduce the indentation level and if sequences by
using goto statement to the end of the function. Also, backend and
frontend code path should be more explicit with this new version.
2022-08-03 15:04:18 +02:00
Amaury Denoyelle
418ba21461 MINOR: mux-quic: refresh timeout on frame decoding
Refresh the MUX connection timeout in frame parsing functions. This is
necessary as these Rx operation are completed directly from the
quic-conn layer outside of MUX I/O callback. Thus, the timeout should be
refreshed on this occasion.

Note that however on STREAM parsing refresh is only conducted when
receiving the current consecutive data offset.

Timeouts related function have been moved up in the source file to be
able to use them in qcc_decode_qcs().

This commit will be useful for http-request timeout. Indeed, a new
stream may be opened during qcc_decode_qcs() which should trigger this
timeout until a full header section is received and qcs instance is
attached to sedesc.
2022-08-03 15:04:18 +02:00
Amaury Denoyelle
8d818c6eab MINOR: h3: support HTTP request framing state
Store the current step of HTTP message in h3s stream. This reports if we
are in the parsing of headers, content or trailers section. A new enum
h3s_st_req is defined for this.

This field is stored in h3s struct but only used for request stream. It
is left undefined for other streams (control or QPACK streams).

h3_is_frame_valid() has been extended to take into account this state
information. A connection error H3_FRAME_UNEXPECTED is reported if an
invalid frame according to the current state is received; for example a
DATA frame at the beginning of a stream.
2022-08-03 15:04:18 +02:00
Frédéric Lécaille
2c77a5eb8e BUG/MEDIUM: quic: Floating point exception in cubic_root()
It is illegal to call my_flsl() with 0 as parameter value. It is a UB.
This leaded cubic_root() to divide values by 0 at this line:

  x = 2 * x + (uint32_t)(val / ((uint64_t)x * (uint64_t)(x - 1)));

Thank you to Tristan971 for having reported this issue in GH #1808
and Willy for having spotted the root cause of this bug.

Must follow any cubic for QUIC backport (2.6).
2022-08-03 14:27:20 +02:00
Frédéric Lécaille
8ddde4f05e BUG/MINOR: quic: Missing in flight ack eliciting packet counter decrement
The decrement was missing in quic_pktns_tx_pkts_release() called each time a
packet number space is discarded. This is not sure this bug could have an impact
during handshakes. This counter is used to cancel the timer used both for packet
detection and PTO, setting its value to null. So there could be retransmissions
or probing which could be triggered for nothing.

Must be backported to 2.6.
2022-08-03 12:59:59 +02:00
Christopher Faulet
6bb86539db BUG/MEDIUM: proxy: Perform a custom copy for default server settings
When a proxy is initialized with the settings of the default proxy, instead
of doing a raw copy of the default server settings, a custom copy is now
performed by calling srv_settings_copy(). This way, all settings will be
really duplicated. Without this deep copy, some pointers are shared between
several servers, leading to UAF, double-free or such bugs.

This patch relies on following commits:

  * b32cb9b51 REORG: server: Export srv_settings_cpy() function
  * 0b365e3cb MINOR: server: Constify source server to copy its settings

This patch should fix the issue #1804. It must be backported as far as 2.0.
2022-08-03 11:44:34 +02:00
Christopher Faulet
b32cb9b515 REORG: server: Export srv_settings_cpy() function
This function will be used to init a proxy with settings of the default
proxy. It is mandatory to fix a bug. To do so, it must be exposed.
2022-08-03 11:28:52 +02:00
Christopher Faulet
0b365e3cb5 MINOR: server: Constify source server to copy its settings
The source server used to initialize a new server, in srv_settings_cpy() and
sub-functions, is now a constant.

This patch is mandatory to fix a bug.
2022-08-03 11:28:23 +02:00
Christopher Faulet
bc6b23813f BUG/MINOR: backend: Don't increment conn_retries counter too early
The connection retry counter is incremented too early when a connection
fails. In SC_ST_CER state, errors handling must be performed before
incrementing the counter. Otherwise, we may consider the max connection
attempt is reached while a last one is in fact possible.

This patch must be backported to 2.6.
2022-08-03 11:16:35 +02:00
Christopher Faulet
14a60d420a BUG/MEDIUM: dns: Properly initialize new DNS session
When a new DNS session is created, all its fields are not properly
initialized. For instance, "tx_msg_offset" can have any value after the
allocation. So, to fix the bug, pool_zalloc() is now used to allocate new
DNS session.

This patch should fix the issue #1781. It must be backported as far as 2.4.
2022-08-03 10:30:07 +02:00
Christopher Faulet
642170a653 BUG/MINOR: peers: Use right channel flag to consider the peer as connected
When a peer open a new connection to another peer, it is considered as
connected when the hello message is sent. To do so, the peer applet was
relying on CF_WRITE_PARTIAL channel flag. However it is not the right flag
to use. This one is a transient flag. Depending on the scheduling, this flag
may be removed by the stream before the peer has a chance to see
it. Instead, CF_WROTE_DATA flag must be checked.

This patch is related to the issue #1799. It must be backported as far as
2.0.
2022-08-03 09:56:38 +02:00
Christopher Faulet
160fff665e BUG/MEDIUM: peers: limit reconnect attempts of the old process on reload
When peers are configured and HAProxy is reloaded or restarted, a
synchronization is performed between the old process and the new one. To do
so, the old process connects on the new one. If the synchronization fails,
it retries. However, there is no delay and reconnect attempts are not
bounded. Thus, it may loop for a while, consuming all the CPU. Of course, it
is unexpected, but it is possible. For instance, if the local peer is
misconfigured, an infinite loop can be observed if the connection succeeds
but not the synchronization. This prevents the old process to exit, except
if "hard-stop-after" option is set.

To fix the bug, the reconnect is delayed. The local peer already has a
expiration date to delay the reconnects. But it was not used on stopping
mode. So we use it not. Thanks to the previous fix, the reconnect timeout is
shorter in this case (500ms against 5s on running mode). In addition, we
also use the peers resync expiration date to not infinitely retries. It is
accurate because the new process, on its side, use this timeout to switch
from a local resync to a remote resync.

This patch depends on "MINOR: peers: Use a dedicated reconnect timeout when
stopping the local peer". It fixes the issue #1799. It should be backported
as far as 2.0.
2022-08-03 09:56:38 +02:00
Christopher Faulet
ab4b094055 MINOR: peers: Use a dedicated reconnect timeout when stopping the local peer
When a process is stopped or reload, a dedicated reconnect timeout is now
used. For now, this timeout is not used because the current code retries
immediately to reconnect to perform the local synchronization with the new
local peer, if any.

This patch is required to fix the issue #1799. It should be backported as
far as 2.0 with next fixes.
2022-08-03 09:56:38 +02:00
Christopher Faulet
1b6fa7f5ea MINOR: peers: Add a warning about incompatible SSL config for the local peer
In peers section, it is possible to enable SSL for the local peer. In this
case, the bind line and the server line should both be configured. A
"default-server" directive may also be used to configure the SSL on the
server side. However there is no test to be sure the SSL is enabled on both
sides. It is an problem because the local resync performed during a reload
will be impossible and it is probably not the expected behavior.

So, it is now checked during the configuration validation. A warning message
is displayed if the SSL is not properly configured for the local peer.

This patch is related to issue #1799. It should probably be backported to 2.6.
2022-08-03 09:56:38 +02:00
Amaury Denoyelle
bd6ec1bf84 MEDIUM: mux-quic: implement http-keep-alive timeout
Complete QUIC MUX timeout refresh function by using http-keep-alive
timeout. It is used when the connection is idle after having handle at
least one request.

To implement this a new member <idle_start> has been defined in qcc
structure. This is used as timestamp for when the connection became idle
and is used as base time for http keep-alive timeout
2022-08-01 15:00:13 +02:00
Amaury Denoyelle
c603de4d84 MINOR: mux-quic: count in-progress requests
Add a new qcc member named <nb_hreq>. Its purpose is close to <nb_sc>
which represents the number of attached stream connectors. Both are
incremented inside qc_attach_sc().

The difference is on the decrement operation. While <nb_cs> is
decremented on sedesc detach callback, <nb_hreq> is decremented when the
qcs is locally closed.

In most cases, <nb_hreq> will be decremented before <nb_cs>. However, it
will be the reverse if a stream must be kept alive after detach callback.

The main purpose of this field is to implement http-keep-alive timeout.
Both <nb_sc> and <nb_hreq> must be null to activate the http-keep-alive
timeout.
2022-08-01 14:58:41 +02:00
Amaury Denoyelle
5fc05d17ad MEDIUM: mux-quic: adjust timeout refresh
Implement a new internal function qcc_refresh_timeout(). Its role will be
to reset QUIC MUX timeout depending if there is requests in progress or
not.

qcc_update_timeout() does not set a timeout if there is still attached
streams as in this case the upper layer is responsible to manage it.
Else it will activate the timeout depending on the connection current
status.

Timeout is refreshed on several locations : on stream detach and in I/O
handler and wake callback.

For the moment, only the default timeout is used (client or server). The
function may be expanded in the future to support more specific ones :
* http-keep-alive if connection is idle
* http-request when waiting for incomplete HTTP requests
* client/server-fin for graceful shutdown
2022-08-01 14:58:36 +02:00
Amaury Denoyelle
b6309456d0 MINOR: mux-quic: use timeout server for backend conns
Use timeout server in qcc_init() as default timeout for backend
connections. No impact for the moment as QUIC backend support is not
implemented.
2022-08-01 14:23:21 +02:00
Amaury Denoyelle
07bf8f4d86 MINOR: mux-quic: save proxy instance into qcc
Store a reference to proxy in the qcc structure. This will be useful to
access to proxy members outside of qcc_init().

Most notably, this change is required to implement timeout refreshing by
using the various timeouts configured at the proxy level.
2022-08-01 14:23:21 +02:00
Amaury Denoyelle
09ec3e09bd BUG/MINOR: mux-quic: do not free conn if attached streams
Ensure via qcc_is_dead() that a connection is not released instance
until all of qcs streams are detached by the upper layer, even if an
error has been reported or the timeout has fired.

On the other side, as qc_detach() always check the connection status,
this should ensure that we do not keep a connection if not necessary.

Without this patch, a qcc instance may be freed with some of its qcs
streams not detached. This is an incorrect behavior and will lead to a
BUG_ON fault. Note however that no occurence of this bug has been
produced currently. This patch is mainly a safety against future
occurences.

This should be backported up to 2.6.
2022-08-01 14:23:19 +02:00
Amaury Denoyelle
4ea5090f55 CLEANUP: mux-quic: remove useless app_ops is_active callback
Timeout in QUIC MUX has evolved from the simple first implementation. At
the beginning, a connection was considered dead unless bidirectional
streams were opened. This was abstracted through an app callback
is_active().

Now this paradigm has been reversed and a connection is considered alive
by default, unless an error has been reported or a timeout has already
been fired. The callback is_active() is thus not used anymore and can be
safely removed to simplify qcc_is_dead().

This commit should be backported to 2.6.
2022-08-01 14:13:51 +02:00
Amaury Denoyelle
d3973853c2 BUG/MINOR: mux-quic: prevent crash if conn released during IO callback
A qcc instance may be freed in the middle of qc_io_cb() if all streams
were purged. This will lead to a crash as qcc instance is reused after
this step. Jump directly to the end of the function to avoid this.

Note that this bug has not been triggered for the moment. This is a
safety fix to prevent it.

This must be backported up to 2.6.
2022-08-01 14:13:51 +02:00
Willy Tarreau
51d38a26fe BUG/MEDIUM: pattern: only visit equivalent nodes when skipping versions
Miroslav reported in issue #1802 a problem that affects atomic map/acl
updates. During an update, incorrect versions are properly skipped, but
in order to do so, we rely on ebmb_next() instead of ebmb_next_dup().
This means that if a new matching entry is in the process of being
added and is the first one to succeed in the lookup, we'll skip it due
to its version and use the next entry regardless of its value provided
that it has the correct version. For IP addresses and string prefixes
it's particularly visible because a lookup may match a new longer prefix
that's not yet committed (e.g. 11.0.0.1 would match 11/8 when 10/7 was
the only committed one), and skipping it could end up on 12/8 for
example. As soon as a commit for the last version happens, the issue
disappears.

This problem only affects tree-based matches: the "str", "ip", and "beg"
matches.

Here we replace the ebmb_next() values with ebmb_next_dup() for exact
string matches, and with ebmb_lookup_shorter() for longest matches,
which will first visit duplicates, then look for shorter prefixes. This
relies on previous commit:

    MINOR: ebtree: add ebmb_lookup_shorter() to pursue lookups

Both need to be backported to 2.4, where the generation ID was added.

Note that nowadays a simpler and more efficient approach might be employed,
by having a single version in the current tree, and a list of trees per
version. Manipulations would look up the tree version and work (and lock)
only in the relevant trees, while normal operations would be performed on
the current tree only. Committing would just be a matter of swapping tree
roots and deleting old trees contents.
2022-08-01 11:59:46 +02:00
Willy Tarreau
0dc9e6dca2 DEBUG: tools: provide a tree dump function for ebmbtrees as well
It's convenient for debugging IP trees. However we're not dumping the
full keys, for the sake of simplicity, only the 4 first bytes are dumped
as a u32 hex value. In practice this is sufficient for debugging. As a
reminder since it seems difficult to recover the command each time it's
needed, the output is converted to an image using dot from Graphviz:

    dot -o a.png -Tpng dump.txt
2022-08-01 11:59:15 +02:00