Commit Graph

6766 Commits

Author SHA1 Message Date
Willy Tarreau
77acaf5af5 MINOR: flags: add a new file to host flag dumping macros
The "flags" utility is useful but painful to maintain up to date. This
commit aims at providing a low-maintenance solution to keep flags up to
date, by proposing some macros that build a string from a set of flags
in a way that requires the least possible verbosity.

The idea will be to add an inline function dedicated to this just after
the flags declaration, and enumerate the flags one is interested in, and
that function will fill a string based on them.

Placing this inside the type files allows both haproxy and external tools
like "flags" to use it, but comes with a few constraints. First, the
files will be slightly less readable if these functions are huge, so they
need to stay as compact as possible. Second, the function will need
anprintf() and we don't want to include stdio.h in type files as it
proved to be particularly heavy and to cause definition headaches in
the past.

As such the file here only contains a macro enclosed in #ifdef EOF (that
is defined in stdio), and provides an alternate empty one when no stdio
is defined. This way it's the caller that has to include stdio first or
it won't get anything back, and in practice the locations relying on
this always have it.

The macro has to be used in 3 steps:
  - prologue: dumps 0 and exits if the value is zero
  - flags: the macro can be recursively called and it will push the
    flag from bottom to top so that they appear in the same order as
    today without requiring to be declared the other way around
  - epilogue: dump remaining flags that were not identified

The macro was arranged so that a single character can be used with no
other argument to declare all flags at once. Example:

  #define _(n, ...) __APPEND_FLAG(buf, len, del, flg, n, #n, __VA_ARGS__)
     _(0);
     _(X_FLAG1, _(X_FLAG2, _(X_FLAG3, _(X_FLAG4))));
     _(~0);
  #undef _

Existing files will have to be updated to rely on it, and more files
could come soon.
2022-09-09 14:47:31 +02:00
Frédéric Lécaille
3dd79d378c MINOR: h3: Send the h3 settings with others streams (requests)
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.

Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.

Perhaps there is a better way to do. This will have to be checked with Amaury.

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
bb995eafc7 BUG/MINOR: quic: Speed up the handshake completion only one time
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Willy Tarreau
3d4cdb198c MEDIUM: tasks/activity: combine the called function with the caller
Now instead of getting aggregate stats per called function, we have
them per function AND per call place. The "byaddr" sort considers
the function pointer first, then the call count, so that dominant
callers of a given callee are instantly spotted. This allows to get
sorted outputs like this:

Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  h1_io_cb                   17357952   40.91s    2.357us   4.849m    16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  sc_conn_io_cb              10357182   6.297s    607.0ns   27.93m    161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  process_stream              9891131   1.809m    10.97us   53.61m    325.2us <- sc_notify@src/stconn.c:1209 task_wakeup
  process_stream              9823934   1.887m    11.52us   48.31m    295.1us <- stream_new@src/stream.c:563 task_wakeup
  sc_conn_io_cb               9347863   16.59s    1.774us   6.143m    39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  h1_io_cb                     501344   1.848s    3.686us   6.544m    783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  sc_conn_io_cb                239717   492.3ms   2.053us   3.213m    804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  h2_io_cb                     173019   4.204s    24.30us   40.95s    236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                     149487   424.3ms   2.838us   14.63s    97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  other                        101893   4.626s    45.40us   14.84s    145.7us
  quic_lstnr_dghdlr             94389   614.0ms   6.504us   30.54s    323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  quic_conn_app_io_cb           92205   3.735s    40.51us   390.9ms   4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                      50355   19.01s    377.5us   10.65s    211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  h1_io_cb                      44427   155.0ms   3.489us   21.50s    484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  qc_io_cb                       9018   4.924s    546.0us   3.084s    342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  h1_timeout_task                3236   1.172ms   362.0ns   1.119s    345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup
  h1_io_cb                       2804   7.974ms   2.843us   1.980s    706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  sc_conn_io_cb                  2804   33.44ms   11.92us   2.597s    926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  qc_io_cb                       2623   2.669s    1.017ms   1.347s    513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  qc_process_timer                662   526.4us   795.0ns   1.081s    1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_conn_app_io_cb             648   12.62ms   19.47us   225.7ms   348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  accept_queue_process            286   1.571ms   5.494us   72.55ms   253.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  process_resolvers               176   157.8us   896.0ns   7.835ms   44.52us <- wake_expired_tasks@src/task.c:429 task_drop_running
  qc_io_cb                        167   10.71ms   64.12us   32.47ms   194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  sc_conn_io_cb                   123   80.05us   650.0ns   50.35ms   409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  h2_timeout_task                  32   30.69us   958.0ns   9.038ms   282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup
  task_run_applet                  24   33.79ms   1.408ms   5.838ms   243.3us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  accept_queue_process             17   56.34us   3.314us   7.505ms   441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  srv_cleanup_toremove_conns       16   1.133ms   70.81us   5.685ms   355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup
  srv_cleanup_idle_conns           16   74.57us   4.660us   2.797ms   174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running
  quic_conn_app_io_cb              12   786.9us   65.58us   2.042ms   170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup
  sc_conn_io_cb                     9   20.55us   2.283us   2.475ms   275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h2_io_cb                          8   34.12us   4.265us   1.784ms   223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  task_run_applet                   4   6.615ms   1.654ms   2.306us   576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  quic_conn_io_cb                   4   4.278ms   1.069ms   6.469us   1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                          2   20.81us   10.40us   4.943us   2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  quic_conn_app_io_cb               2   752.9us   376.4us   63.97us   31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  quic_accept_run                   2   13.84us   6.920us   172.8us   86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
  qc_idle_timer_task                2   295.0us   147.5us   8.761us   4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup
  qc_io_cb                          1   867.1us   867.1us   812.8us   812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup

... and calls sorted by address like this:

Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  task_run_applet                  23   32.73ms   1.423ms   5.837ms   253.8us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  task_run_applet                   4   6.615ms   1.654ms   2.306us   576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  accept_queue_process            285   1.566ms   5.495us   72.49ms   254.3us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  accept_queue_process             17   56.34us   3.314us   7.505ms   441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  sc_conn_io_cb              10357182   6.297s    607.0ns   27.93m    161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  sc_conn_io_cb               9347863   16.59s    1.774us   6.143m    39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  sc_conn_io_cb                239717   492.3ms   2.053us   3.213m    804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  sc_conn_io_cb                  2804   33.44ms   11.92us   2.597s    926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  sc_conn_io_cb                   123   80.05us   650.0ns   50.35ms   409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  sc_conn_io_cb                     9   20.55us   2.283us   2.475ms   275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  process_resolvers               159   145.9us   917.0ns   7.823ms   49.20us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_idle_conns           16   74.57us   4.660us   2.797ms   174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_toremove_conns       16   1.133ms   70.81us   5.685ms   355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup
  process_stream              9891130   1.809m    10.97us   53.61m    325.2us <- sc_notify@src/stconn.c:1209 task_wakeup
  process_stream              9823933   1.887m    11.52us   48.31m    295.1us <- stream_new@src/stream.c:563 task_wakeup
  h1_io_cb                   17357952   40.91s    2.357us   4.849m    16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h1_io_cb                     501344   1.848s    3.686us   6.544m    783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  h1_io_cb                      44427   155.0ms   3.489us   21.50s    484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  h1_io_cb                       2804   7.974ms   2.843us   1.980s    706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  h1_timeout_task                3236   1.172ms   362.0ns   1.119s    345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup
  h2_timeout_task                  32   30.69us   958.0ns   9.038ms   282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup
  h2_io_cb                     173019   4.204s    24.30us   40.95s    236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                     149487   424.3ms   2.838us   14.63s    97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  h2_io_cb                          8   34.12us   4.265us   1.784ms   223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  qc_io_cb                      50355   19.01s    377.5us   10.65s    211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  qc_io_cb                       9018   4.924s    546.0us   3.084s    342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  qc_io_cb                       2623   2.669s    1.017ms   1.347s    513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  qc_io_cb                        167   10.71ms   64.12us   32.47ms   194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  qc_io_cb                          2   20.81us   10.40us   4.943us   2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  qc_io_cb                          1   867.1us   867.1us   812.8us   812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup
  qc_idle_timer_task                2   295.0us   147.5us   8.761us   4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_conn_io_cb                   4   4.278ms   1.069ms   6.469us   1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb           92205   3.735s    40.51us   390.9ms   4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb             648   12.62ms   19.47us   225.7ms   348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  quic_conn_app_io_cb              12   786.9us   65.58us   2.042ms   170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup
  quic_conn_app_io_cb               2   752.9us   376.4us   63.97us   31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  quic_lstnr_dghdlr             94389   614.0ms   6.504us   30.54s    323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  qc_process_timer                662   526.4us   795.0ns   1.081s    1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_accept_run                   2   13.84us   6.920us   172.8us   86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
  other                        101892   4.626s    45.40us   14.84s    145.7us

It already becomes visible that some tasks have different very costs
depending where they're called (e.g. process_stream). The method used
to wake them up is also shown. Applets are handled specially and shown
as appctx_wakeup.
2022-09-08 16:21:22 +02:00
Willy Tarreau
a3423873fe CLEANUP: activity: make the number of sched activity entries more configurable
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
2022-09-08 14:55:09 +02:00
Willy Tarreau
e0e6d81460 CLEANUP: task: move tid and wake_date into the common part
There used to be one tid for tasklets and a thread_mask for tasks. Since
2.7, both tasks and tasklets now use a tid (albeit with a very slight
semantic difference for the negative value), to in order to limit code
duplication and to ease debugging it makes sense to move tid into the
common part. One limitation is that it will leave a hole in the structure,
but we now have the wake_date that is always present and can move there as
well to plug the hole.

This results in something overall pretty clean (and cleaner than before),
with the low-level stuff (state,tid,process,context) appearing first, then
the caller stuff (caller,wake_date,calls,debug) next, and finally the
type-specific stuff (rq/wq/expire/nice).
2022-09-08 14:30:38 +02:00
Willy Tarreau
2830d282e5 DEBUG: task: simplify the caller recording in DEBUG_TASK
Instead of storing an index that's swapped at every call, let's use the
two pointers as a shifting history. Now we have a permanent "caller"
field that records the last caller, and an optional prev_caller in the
debug section enabled by DEBUG_TASK that keeps a copy of the previous
caller one. This way, not only it's much easier to follow what's
happening during debugging, but it saves 8 bytes in the struct task in
debug mode and still keeps it under 2 cache lines in nominal mode, and
this will finally be usable everywhere and later in profiling.

The caller_idx was also used as a hint that the entry was freed, in order
to detect wakeup-after-free. This was changed by setting caller to -1
instead and preserving its value in caller[1].

Finally, the operations were made atomic. That's not critical but since
it's used for debugging and race conditions represent a significant part
of the issues in multi-threaded mode, it seems wise to at least eliminate
some possible factors of faulty analysis.
2022-09-08 14:30:38 +02:00
Willy Tarreau
8d71abf0cd DEBUG: applet: instrument appctx_wakeup() to log the caller's location
appctx_wakeup() relies on task_wakeup(), but since it calls it from a
function, the calling place is always appctx_wakeup() itself, which is
not very useful.

Let's turn it to a macro so that we can log the location of the caller
instead. As an example, the cli_io_handler() which used to be seen as
this:

  (gdb) p *appctx->t.debug.caller[0]
  $10 = {
    func = 0x9ffb78 <__func__.37996> "appctx_wakeup",
    file = 0x9b336a "include/haproxy/applet.h",
    line = 110,
    what = 1 '\001',
    arg8 = 0 '\000',
    arg32 = 0
  }

Now shows the more useful:

  (gdb) p *appctx->t.debug.caller[0]
  $6 = {
    func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet",
    file = 0xa00320 "src/stconn.c",
    line = 996,
    what = 6 '\006',
    arg8 = 0 '\000',
    arg32 = 0
  }
2022-09-08 14:30:38 +02:00
Willy Tarreau
e08af9a0f4 DEBUG: task: use struct ha_caller instead of arrays of file:line
This reduces the task struct by 8 bytes, reduces the code size a little
bit by simplifying the calling convention (one argument dropped), and
as a bonus provides the function name in the caller.
2022-09-08 14:30:38 +02:00
Willy Tarreau
d2b2ad902b DEBUG: task: define a series of wakeup types for tasks and tasklets
The WAKEUP_* values will be used to report how a task/tasklet was woken
up, and task_wakeup_type_str() wlil report the associated function name.
2022-09-08 14:30:16 +02:00
Willy Tarreau
d96d214b4c CLEANUP: debug: use struct ha_caller for memstat
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
2022-09-08 14:19:15 +02:00
Willy Tarreau
7f2f1f294c MINOR: debug: add struct ha_caller to describe a calling location
The purpose of this structure is to assemble all constant parts of a
generic calling point for a specific event. These ones are created by
the compiler as a static const element outside of the code path, so
they cost nothing in terms of CPU, and a pointer to that descriptor
can be passed to the place that needs it. This is very similar to what
is being done for the mem_stat stuff.

This will be useful to simplify and improve DEBUG_TASK.
2022-09-08 14:19:15 +02:00
Willy Tarreau
4a3907617f MINOR: tools: add generic pointer hashing functions
There are a few places where it's convenient to hash a pointer to compute
a statistics bucket. Here we're basically reusing the hash that was used
by memory profiling with a minor update that the multiplier was corrected
to be prime and stand by its promise to have equal numbers of 1 and 0,
and that 32-bit platforms won't lose range anymore.

A two-pointer variant was also added.
2022-09-08 14:19:15 +02:00
Willy Tarreau
6a28a30efa MINOR: tasks: do not keep cpu and latency times in struct task
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.

Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.

The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
2022-09-08 14:19:15 +02:00
Willy Tarreau
1efddfa6bf MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
2022-09-08 14:19:15 +02:00
Willy Tarreau
62b5b96bcc BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
  - a new wakeup while the task is executing would reset the ->wake_date
    field before returning and make abnormally low values being reported;
    that was likely the case for taskrun_applet for self-waking applets;

  - when the task dies, NULL is returned and the call date couldn't be
    retrieved, so that CPU time was not being accounted for. This was
    particularly visible with process_stream() which is usually called
    only twice per request, and whose time was systematically halved.

The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.

One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.

Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.

Task profiling output after 1 million HTTP request previously looked like
this:

  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2012338   4.850s    2.410us   12.91s    6.417us
    process_stream              2000136   9.594s    4.796us   34.26s    17.13us
    sc_conn_io_cb               2000135   1.973s    986.0ns   30.24s    15.12us
    h1_timeout_task                 137      -         -      2.649ms   19.34us
    accept_queue_process             49   152.3us   3.107us   321.7yr   6.564yr
    main+0x146430                     7   5.250us   750.0ns   25.92us   3.702us
    srv_cleanup_idle_conns            1   559.0ns   559.0ns   918.0ns   918.0ns
    task_run_applet                   1      -         -      2.162us   2.162us

  Now it looks like this:
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2014194   4.794s    2.380us   13.75s    6.826us
    process_stream              2000151   20.01s    10.00us   36.04s    18.02us
    sc_conn_io_cb               2000148   2.167s    1.083us   32.27s    16.13us
    h1_timeout_task                 198   54.24us   273.0ns   3.487ms   17.61us
    accept_queue_process             52   158.3us   3.044us   409.9us   7.882us
    main+0x1466e0                    18   16.77us   931.0ns   63.98us   3.554us
    srv_cleanup_toremove_conns        8   282.1us   35.26us   546.8us   68.35us
    srv_cleanup_idle_conns            3   149.2us   49.73us   8.131us   2.710us
    task_run_applet                   3   268.1us   89.38us   11.61us   3.871us

Note the two-fold difference on process_stream().

This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:

     MINOR: task: permanently enable latency measurement on tasklets
     CLEANUP: task: rename ->call_date to ->wake_date
2022-09-08 14:19:15 +02:00
Willy Tarreau
04e50b3d32 CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
2022-09-08 14:19:15 +02:00
Willy Tarreau
768c2c5678 MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.

This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.

This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
2022-09-08 14:19:15 +02:00
Willy Tarreau
0fae3a0360 BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.

There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.

This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.
2022-09-08 14:19:15 +02:00
Willy Tarreau
f27acd961e BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.

This should be backported to 2.4 as it was brought with initial commit
b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.
2022-09-08 14:19:15 +02:00
Frédéric Lécaille
3122c75fd1 BUG/MINOR: quic: Wrong connection ID to thread ID association
To work, quic_pin_cid_to_tid() must set cid[0] to a value with <target_id>
as <global.nbthread> modulo. For each integer n, (n - (n % m)) + d has always
d as modulo m (with d < m).

So, this statement seemed correct:

     cid[0] = cid[0] - (cid[0] % global.nbthread) + target_tid;

except when n wraps or when another modulo is applied to the addition result.
Here, for 8bit modulo arithmetic, if m does not divides 256, this cannot
works for values which wraps when we increment them by d.
For instance n=255 m=3 and d=1 the formula result is 0 (should be d).

To fix this, we first limit c[0] to 255 - <target_id> to prevent c[0] from wrapping.

Thank you to @esb for having reported this issue in GH #1855.

Must be backported to 2.6
2022-09-07 15:59:43 +02:00
William Lallemand
d2be9d4c48 BUILD: quic: temporarly ignore chacha20_poly1305 for libressl
LibreSSL does not implement EVP_chacha20_poly1305() with EVP_CIPHER but
uses the EVP_AEAD API instead:

https://man.openbsd.org/EVP_AEAD_CTX_init

This patch disables this cipher for libreSSL for now.
2022-09-07 09:33:46 +02:00
William Lallemand
844009d77a BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.

A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
2022-09-07 09:33:46 +02:00
William Lallemand
6d74e179ee BUILD: quic: add some ifdef around the SSL_ERROR_* for libressl
SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and
SSL_ERROR_WANT_CLIENT_HELLO_CB does not seems supported by libressl.
2022-09-07 09:33:46 +02:00
Willy Tarreau
ce57777660 MINOR: muxes: add a "show_sd" helper to complete "show sess" dumps
This helper will be called for muxes that provide it and will be used
to let the mux provide extra information about the stream attached to
a stream descriptor. A line prefix is passed in argument so that the
mux is free to break long lines without breaking indent. No prefix
means no line breaks should be produced (e.g. for short dumps).
2022-09-02 15:48:50 +02:00
Willy Tarreau
178dda6b41 DEBUG: stream: minor rearrangement of a few fields in struct stream.
Some recent traces started to show confusing stream pointers ending with
0xe. The reason was that the stream's obj_type was almost unused in the
past and was stuffed in a hole in the structure. But now it's present in
all "show sess all" outputs and having to mentally match this value against
another one that's 0x17e lower is painful. The solution here is to move the
obj_type at the top, like in almost every other structure, but without
breaking the efficient layout.

This patch moves a few fields around and manages to both plug some holes
(16 bytes saved, 976 to 960) and avoid channels needlessly crossing cache
boundaries (res was spread over 3 lines vs 2 now).

Nothing else was changed. It would be desirable to backport this to 2.6
since it's where dumps are currently being processed the most.
2022-09-02 15:48:10 +02:00
Willy Tarreau
d8009a1ca6 BUILD: debug: make sure debug macros are never empty
As outlined in commit f7ebe584d7 ("BUILD: debug: Add braces to if
statement calling only CHECK_IF()"), the BUG_ON() family of macros
is incorrectly defined to be empty when debugging is disabled, and
that can lead to trouble. Make sure they always fall back to the
usual "do { } while (0)". This may be backported to 2.6 if needed,
though no such issue was met there to date.
2022-08-31 10:53:53 +02:00
Frdric Lcaille
c242832af3 BUG/MINOR: quic: Missing header protection AES cipher context initialisations (draft-v2)
This bug arrived with this commit:
   "MINOR: quic: Add reusable cipher contexts for header protection"

haproxy could crash because of missing cipher contexts initializations for
the header protection and draft-v2 Initial secrets. This was due to the fact
that these initialization both for RX and TX secrets were done outside of
qc_new_isecs(). The role of this function is definitively to initialize these
cipher contexts in addition to the derived secrets. Indeed this function is called
by qc_new_conn() which initializes the connection but also by qc_conn_finalize()
which also calls qc_new_isecs() in case of a different QUIC version was negotiated
by the peers from the one used by the client for its first Initial packet.

This was reported by "v2" QUIC interop test with at least picoquic as client.

Must be backported to 2.6.
2022-08-29 18:46:40 +02:00
Frédéric Lécaille
f34c1c9568 CLEANUP: quic: No more use ->rx_list MT_LIST entry point (quic_rx_packet)
This quic_rx_packet is definitively no more used.

Should be backported to 2.6 to ease the future backports.
2022-08-24 18:17:13 +02:00
William Lallemand
b10b1196b8 MINOR: resolvers: shut the warning when "default" resolvers is implicit
Shut the connect() warning of resolvers_finalize_config() when the
configuration was not emitted manually.

This shuts the warning for the "default" resolvers which is created
automatically for the httpclient.

Must be backported in 2.6.
2022-08-24 14:56:42 +02:00
Christopher Faulet
871dd82117 BUG/MINOR: tcpcheck: Disable QUICKACK only if data should be sent after connect
It is only a real problem for agent-checks when there is no agent string to
send. The condition to disable TCP_QUICKACK was only based on the action
type following the connect one. But it is not always accurate. indeed, for
agent-checks, there is always a SEND action. But if there is no "agent-send"
string defined, nothing is sent. In this case, this adds 200ms of latency
with no reason.

To fix the bug, a flag is now used on the CONNECT action to instruct there
are data that should be sent after the connect. For health-checks, this flag
is set if the action following the connect is a SEND action. For
agent-checks, it is set if an "agent-send" string is defined.

This patch should fix the issue #1836. It must be backported as far as 2.2.
2022-08-24 11:59:04 +02:00
Frédéric Lécaille
a2d8ad20a3 MINOR: quic: Replace MT_LISTs by LISTs for RX packets.
Replace ->rx.pqpkts quic_enc_level struct member MT_LIST by an LIST.
Same thing for ->list quic_rx_packet struct member MT_LIST.
Update the code consequently. This was a reminisence of the multithreading
support (several threads by connection).

Must be backported to 2.6
2022-08-23 17:55:02 +02:00
Frédéric Lécaille
ea4a5cbbdf BUG/MINOR: mux-quic: Fix memleak on QUIC stream buffer for unacknowledged data
Some clients send CONNECTION_CLOSE frame without acknowledging the STREAM
data haproxy has sent. In this case, when closing the connection if
there were remaining data in QUIC stream buffers, they were not released.

Add a <closing> boolean option to qc_stream_desc_free() to force the
stream buffer memory releasing upon closing connection.

Thank you to Tristan for having reported such a memory leak issue in GH #1801.

Must be backported to 2.6.
2022-08-20 19:08:31 +02:00
William Lallemand
62c0b99e3b MINOR: ssl/cli: implement "add ssl ca-file"
In ticket #1805 an user is impacted by the limitation of size of the CLI
buffer when updating a ca-file.

This patch allows a user to append new certificates to a ca-file instead
of trying to put them all with "set ssl ca-file"

The implementation use a new function ssl_store_dup_cafile_entry() which
duplicates a cafile_entry and its X509_STORE.

ssl_store_load_ca_from_buf() was modified to take an apped parameter so
we could share the function for "set" and "add".
2022-08-19 19:58:53 +02:00
William Lallemand
d4774d3cfa MINOR: ssl: handle ca-file appending in cafile_entry
In order to be able to append new CA in a cafile_entry,
ssl_store_load_ca_from_buf() was reworked and a "append" parameter was
added.

The function is able to keep the previous X509_STORE which was already
present in the cafile_entry.
2022-08-19 19:58:53 +02:00
Frédéric Lécaille
86a53c5669 MINOR: quic: Add reusable cipher contexts for header protection
Implement quic_tls_rx_hp_ctx_init() and quic_tls_tx_hp_ctx_init() to initiliaze
such header protection cipher contexts for each RX and TX parts and for each
packet number spaces, only one time by connection.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (ORTT, 1RTT,
Handshake).
Modify qc_do_rm_hp() and quic_apply_header_protection() to reuse these
cipher contexts.
Note that there is no need to modify the key update for the header protection.
The header protection secrets are never updated.
2022-08-19 18:31:59 +02:00
Willy Tarreau
1cc08a33e1 MINOR: applet: add a function to reset the svcctx of an applet
The CLI needs to reset the svcctx between commands, and there was nothing
done to handle this. Let's add appctx_reset_svcctx() to do that, it's the
closing equivalent of appctx_reserve_svcctx().

This will have to be backported to 2.6 as it will be used by a subsequent
patch to fix a bug.
2022-08-18 18:16:36 +02:00
Amaury Denoyelle
2c5a7ee333 REORG: h2: extract cookies concat function in http_htx
As specified by RFC 7540, multiple cookie headers are merged in a single
entry before passing it to a HTTP/1.1 connection. This step is
implemented during headers parsing in h2 module.

Extract this code in the generic http_htx module. This will allow to
reuse it quickly for HTTP/3 implementation which has the same
requirement for cookie headers.
2022-08-18 16:13:33 +02:00
Amaury Denoyelle
704675656b BUG/MEDIUM: quic: fix crash on MUX send notification
MUX notification on TX has been edited recently : it will be notified
only when sending its own data, and not for example on retransmission by
the quic-conn layer. This is subject of the patch :
  b29a1dc2f4
  BUG/MINOR: quic: do not notify MUX on frame retransmit

A new flag QUIC_FL_CONN_RETRANS_LOST_DATA has been introduced to
differentiate qc_send_app_pkts invocation by MUX and directly by the
quic-conn layer in quic_conn_app_io_cb(). However, this is a first
problem as internal quic-conn layer usage is not limited to
retransmission. For example for NEW_CONNECTION_ID emission.

Another problem much important is that send functions are also called
through quic_conn_io_cb() which has not been protected from MUX
notification. This could probably result in crash when trying to notify
the MUX.

To fix both problems, quic-conn flagging has been inverted : when used
by the MUX, quic-conn is flagged with QUIC_FL_CONN_TX_MUX_CONTEXT. To
improve the API, MUX must now used qc_send_mux which ensure the flag is
set. qc_send_app_pkts is now static and can only be used by the
quic-conn layer.

This must be backported wherever the previously mentionned patch is.
2022-08-18 11:33:22 +02:00
Amaury Denoyelle
b29a1dc2f4 BUG/MINOR: quic: do not notify MUX on frame retransmit
On STREAM emission, quic-conn notifies MUX through a callback named
qcc_streams_sent_done(). This also happens on retransmission : in this
case offset are examined and notification is ignored if already seen.

However, this behavior has slightly changed since
  e53b489826
  BUG/MEDIUM: mux-quic: fix server chunked encoding response

Indeed, if offset diff is NULL, frame is now not ignored. This is to
support FIN notification with a final empty STREAM frame. A side-effect
of this is that if the last stream frame is retransmitted, it won't be
ignored in qcc_streams_sent_done().

In most cases, this side-effect is harmless as qcs instance will soon be
freed after being closed. But if qcs is still alive, this will cause a
BUG_ON crash as it is considered as locally closed.

This bug depends on delay condition and seems to be extremely rare. But
it might be the reason for a crash seen on interop with s2n client on
http3 testcase :

FATAL: bug condition "qcs->st == QC_SS_CLO" matched at src/mux_quic.c:372
  call trace(16):
  | 0x558228912b0d [b8 01 00 00 00 c6 00 00]: main-0x1c7878
  | 0x558228917a70 [48 8b 55 d8 48 8b 45 e0]: qcc_streams_sent_done+0xcf/0x355
  | 0x558228906ff1 [e9 29 05 00 00 48 8b 05]: main-0x1d3394
  | 0x558228907cd9 [48 83 c4 10 85 c0 0f 85]: main-0x1d26ac
  | 0x5582289089c1 [48 83 c4 50 85 c0 75 12]: main-0x1d19c4
  | 0x5582288f8d2a [48 83 c4 40 48 89 45 a0]: main-0x1e165b
  | 0x5582288fc4cc [89 45 b4 83 7d b4 ff 74]: qc_send_app_pkts+0xc6/0x1f0
  | 0x5582288fd311 [85 c0 74 12 eb 01 90 48]: main-0x1dd074
  | 0x558228b2e4c1 [48 c7 c0 d0 60 ff ff 64]: run_tasks_from_lists+0x4e6/0x98e
  | 0x558228b2f13f [8b 55 80 29 c2 89 d0 89]: process_runnable_tasks+0x7d6/0x84c
  | 0x558228ad9aa9 [8b 05 75 16 4b 00 83 f8]: run_poll_loop+0x80/0x48c
  | 0x558228ada12f [48 8b 05 aa c5 20 00 48]: main-0x256
  | 0x7ff01ed2e609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
  | 0x7ff01e8ca163 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

To reproduce it locally, code was artificially patched to produce
retransmission and avoid qcs liberation.

In order to fix this and avoid future class of similar problem, the best
way is to not call qcc_streams_sent_done() to notify MUX for
retranmission. To implement this, we test if any of
QUIC_FL_CONN_RETRANS_OLD_DATA or the new flag
QUIC_FL_CONN_RETRANS_LOST_DATA is set. A new wrapper
qc_send_app_retransmit() has been added to set the new flag as a
complement to already existing qc_send_app_probing().

This must be backported up to 2.6.
2022-08-17 11:06:24 +02:00
Amaury Denoyelle
cc13047364 MINOR: quic: refactor application send
Adjust qc_send_app_pkts function : remove <old_data> arg and provide a
new wrapper function qc_send_app_probing() which should be used instead
when probing with old data.

This simplifies the interface of the default function, most notably for
the MUX which does not interfer with retransmission.
QUIC_FL_CONN_RETRANS_OLD_DATA flag is set/unset directly in the wrapper
qc_send_app_probing().

At the same time, function documentation has been updated to clarified
arguments and return values.

This commit will be useful for the next patch to differentiate MUX and
retransmission send context. As a consequence, the current patch should
be backported wherever the next one will be.
2022-08-17 11:05:49 +02:00
Amaury Denoyelle
bf3c208760 BUG/MEDIUM: mux-quic: reject uni stream ID exceeding flow control
Emit STREAM_LIMIT_ERROR if a client tries to open an unidirectional
stream with an ID greater than the value specified by our flow-control
limit. The code is similar to the bidirectional stream opening.

MAX_STREAMS_UNI emission is not implement for the moment and is left as
a TODO. This should not be too urgent for the moment : in HTTP/3, a
client has only a limited use for unidirectional streams (H3 control
stream + 2 QPACK streams). This is covered by the value provided by
haproxy in transport parameters.

This patch has been tagged with BUG as it should have prevented last
crash reported on github issue #1808 when opening a new unidirectional
streams with an invalid ID. However, it is probably not the main cause
of the bug contrary to the patch
  commit 11a6f4007b
  BUG/MINOR: quic: Wrong status returned by qc_pkt_decrypt()

This must be backported up to 2.6.
2022-08-17 11:05:19 +02:00
Amaury Denoyelle
26aa399d6b MINOR: qpack: report error on enc/dec stream close
As specified by RFC 9204, encoder and decoder streams must not be
closed. If the peer behaves incorrectly and closes one of them, emit a
H3_CLOSED_CRITICAL_STREAM connection error.

To implement this, QPACK stream decoding API has been slightly adjusted.
Firstly, fin parameter is passed to notify about FIN STREAM bit.
Secondly, qcs instance is passed via unused void* context. This allows
to use qcc_emit_cc_app() function to report a CONNECTION_CLOSE error.
2022-08-17 11:04:53 +02:00
Willy Tarreau
cc1a2a1867 MINOR: chunk: inline alloc_trash_chunk()
This function is responsible for all calls to pool_alloc(trash), whose
total size can be huge. As such it's quite a pain that it doesn't provide
more hints about its users. However, since the function is tiny, it fully
makes sense to inline it, the code is less than 0.1% larger with this.
This way we can now detect where the callers are via "show profiling",
e.g.:

       0 1953671           0 32071463136| 0x59960f main+0x10676f p_free(-16416) [pool=trash]
       0       1           0       16416| 0x59960f main+0x10676f p_free(-16416) [pool=trash]
 1953672       0 32071479552           0| 0x599561 main+0x1066c1 p_alloc(16416) [pool=trash]
       0  976835           0 16035723360| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash]
       0       1           0       16416| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash]
  976835       0 16035723360           0| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash]
       1       0       16416           0| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash]
2022-08-17 10:45:22 +02:00
Willy Tarreau
42b180dcdb MINOR: pools/memprof: store and report the pool's name in each bin
Storing the pointer to the pool along with the stats is quite useful as
it allows to report the name. That's what we're doing here. We could
store it in place of another field but that's not convenient as it would
require to change all functions that manipulate counters. Thus here we
store one extra field, as well as some padding because the struct turns
56 bytes long, thus better go to 64 directly. Example of output from
"show profiling memory":

      2      0       48         0|  0x4bfb2c ha_quic_set_encryption_secrets+0xcc/0xb5e p_alloc(24) [pool=quic_tls_iv]
      0  55252        0  10608384|  0x4bed32 main+0x2beb2 free(-192)
     15      0     2760         0|  0x4be855 main+0x2b9d5 p_alloc(184) [pool=quic_frame]
      1      0     1048         0|  0x4be266 ha_quic_add_handshake_data+0x2b6/0x66d p_alloc(1048) [pool=quic_crypto]
      3      0      552         0|  0x4be142 ha_quic_add_handshake_data+0x192/0x66d p_alloc(184) [pool=quic_frame]
  31276      0  6755616         0|  0x4bb8f9 quic_sock_fd_iocb+0x689/0x69b p_alloc(216) [pool=quic_dgram]
      0  31424        0   6787584|  0x4bb7f3 quic_sock_fd_iocb+0x583/0x69b p_free(-216) [pool=quic_dgram]
    152      0    32832         0|  0x4bb4d9 quic_sock_fd_iocb+0x269/0x69b p_alloc(216) [pool=quic_dgram]
2022-08-17 10:34:00 +02:00
Willy Tarreau
facfad2b64 MINOR: pool/memprof: report pool alloc/free in memory profiling
Pools are being used so well that it becomes difficult to profile their
usage via the regular memory profiling. Let's add new entries for pools
there, named "p_alloc" and "p_free" that correspond to pool_alloc() and
pool_free(). Ideally it would be nice to only report those that fail
cache lookups but that's complicated, particularly on the free() path
since free lists are released in clusters to the shared pools.

It's worth noting that the alloc_tot/free_tot fields can easily be
determined by multiplying alloc_calls/free_calls by the pool's size, and
could be better used to store a pointer to the pool itself. However it
would require significant changes down the code that sorts output.

If this were to cause a measurable slowdown, an alternate approach could
consist in using a different value of USE_MEMORY_PROFILING to enable pools
profiling. Also, this profiler doesn't depend on intercepting regular malloc
functions, so we could also imagine enabling it alone or the other one alone
or both.

Tests show that the CPU overhead on QUIC (which is already an extremely
intensive user of pools) jumps from ~7% to ~10%. This is quite acceptable
in most deployments.
2022-08-17 09:38:05 +02:00
Willy Tarreau
219afa2ca8 MINOR: memprof: export the minimum definitions for memory profiling
Right now it's not possible to feed memory profiling info from outside
activity.c, so let's export the function and move the enum and struct
to the include file.
2022-08-17 09:03:57 +02:00
Willy Tarreau
0b8e9ceb12 MINOR: ring: add support for a backing-file
This mmaps a file which will serve as the backing-store for the ring's
contents. The idea is to provide a way to retrieve sensitive information
(last logs, debugging traces) even after the process stops and even after
a possible crash. Right now this was possible by connecting to the CLI
and dumping the contents of the ring live, but this is not handy and
consumes quite a bit of resources before it is needed.

With a backing file, the ring is effectively RAM-mapped file, so that
contents stored there are the same as those found in the file (the OS
doesn't guarantee immediate sync but if the process dies it will be OK).

Note that doing that on a filesystem backed by a physical device is a
bad idea, as it will induce slowdowns at high loads. It's really
important that the device is RAM-based.

Also, this may have security implications: if the file is corrupted by
another process, the storage area could be corrupted, causing haproxy
to crash or to overwrite its own memory. As such this should only be
used for debugging.
2022-08-12 11:18:46 +02:00
Willy Tarreau
6df10d872b MINOR: ring: support creating a ring from a linear area
Instead of allocating two parts, one for the ring struct itself and
one for the storage area, ring_make_from_area() will arrange the two
inside the same memory area, with the storage starting immediately
after the struct. This will allow to store a complete ring state in
shared memory areas for example.
2022-08-12 11:18:46 +02:00
Willy Tarreau
8df098c2b1 BUILD: ring: forward-declare struct appctx to avoid a build warning
When using ring.h standalone it emits warnings about appctx. Let's
forward-declare it.
2022-08-12 11:18:46 +02:00
Frdric Lcaille
7629f5d670 BUG/MEDIUM: quic: Wrong use of <token_odcid> in qc_lsntr_pkt_rcv()
This commit was not complete:
  "BUG/MEDIUM: quic: Possible use of uninitialized <odcid>
variable in qc_lstnr_params_init()"
<token_odcid> should have been directly passed to qc_lstnr_params_init()
without dereferencing it to prevent haproxy to have new chances to crash!

Must be backported to 2.6.
2022-08-11 19:12:12 +02:00
Frdric Lcaille
e9325e97c2 BUG/MEDIUM: quic: Possible use of uninitialized <odcid> variable in qc_lstnr_params_init()
When receiving a token into a client Initial packet without a cluster secret defined
by configuration, the <odcid> variable used to parse the ODCID from the token
could be used without having been initialized. Such a packet must be dropped. So
the sufficient part of this patch is this check:

+             }
+             else if (!global.cluster_secret && token_len) {
+                    /* Impossible case: a token was received without configured
+                    * cluster secret.
+                    */
+                    TRACE_PROTO("Packet dropped", QUIC_EV_CONN_LPKT,
+                    NULL, NULL, NULL, qv);
+                    goto drop;
              }

Take the opportunity of this patch to rework and make it more readable this part
of code where such a packet must be dropped removing the <check_token> variable.
When an ODCID is parsed from a token, new <token_odcid> new pointer variable
is set to the address of the parsed ODCID. This way, is not set but used it will
make crash haproxy. This was not always the case with an uninitialized local
variable.

Adapt the API to used such a pointer variable: <token> boolean variable is removed
from qc_lstnr_params_init() prototype.

This must be backported to 2.6.
2022-08-11 18:33:36 +02:00
Amaury Denoyelle
4c9a1642c1 MINOR: mux-quic: define new traces
Add new traces to help debugging on QUIC MUX. Most notable, the
following functions are now traced :
* qcc_emit_cc
* qcs_free
* qcs_consume
* qcc_decode_qcs
* qcc_emit_cc_app
* qcc_install_app_ops
* qcc_release_remote_stream
* qcc_streams_sent_done
* qc_init
2022-08-11 15:20:44 +02:00
Frdric Lcaille
a6920a25d9 MINOR: quic: Remove useless lock for RX packets
This lock was there be able to handle the RX packets for a connetion
from several threads. This is no more needed since a QUIC connection
is always handled by the same thread.

May be backported to 2.6
2022-08-11 14:33:43 +02:00
Frdric Lcaille
a8b2f843d2 MEDIUM: quic: xprt traces rework
Add a least as much as possible TRACE_ENTER() and TRACE_LEAVE() calls
to any function. Note that some functions do not have any access to the
a quic_conn argument when  receiving or parsing datagram at very low level.
2022-08-11 11:11:20 +02:00
Willy Tarreau
341ac99f4d BUG/MEDIUM: task: relax one thread consistency check in task_unlink_wq()
While testing the fix for the previous issue related to reloads with
hard_stop_after, I've met another one which could spuriously produce:

  FATAL: bug condition "t->tid >= 0 && t->tid != tid" matched at include/haproxy/task.h:266

In 2.3-dev2, we've added more consistency checks for a number of bug-
inducing programming errors related to the tasks, via commit e5d79bccc
("MINOR: tasks/debug: add a few BUG_ON() to detect use of wrong timer
queue"), and this check comes from there.

The problem that happens here is that when hard-stop-after is set, we
can abort the current thread even if there are still ongoing checks
(or connections in fact). In this case some tasks are present in a
thread's wait queue and are thus bound exclusively to this thread.

During deinit(), the collect and cleanup of all memory areas also
stops servers and kills their check tasks. And calling task_destroy()
does in turn call task_unlink_wq()... except that it's called from
thread 0 which doesn't match the initially planned thread number.

Several approaches are possible. One of them would consist in letting
threads perform their own cleanup (tasks, pools, FDs, etc). This would
possibly be even faster since done in parallel, but some corner cases
might be way more complicated (e.g. who will kill a check's task, or
what to do with a task found in a local wait queue or run queue, and
what about other consistency checks this could violate?).

Thus for now this patches takes an easier and more conservative
approach consisting in admitting that when the process is stopping,
this rule is not necessarily valid, and to let thread 0 collect all
other threads' garbage.

As such this patch can be backpoted to 2.4.
2022-08-10 18:03:11 +02:00
Amaury Denoyelle
f2476053f9 MINOR: quic: replace custom buf on Tx by default struct buffer
On first prototype version of QUIC, emission was multithreaded. To
support this, a custom thread-safe ring-buffer has been implemented with
qring/cbuf.

Now the thread model has been adjusted : a quic-conn is always used on
the same thread and emission is not multi-threaded. Thus, qring/cbuf
usage can be replace by a standard struct buffer.

The code has been simplified even more as for now buffer is always
drained after a prepare/send invocation. This is the case since a
datagram is always considered as sent even on sendto() error. BUG_ON
statements guard are here to ensure that this model is always valid.
Thus, code to handle data wrapping and consume too small contiguous
space with a 0-length datagram is removed.
2022-08-09 15:45:47 +02:00
Willy Tarreau
db3716b8db MINOR: debug/memstats: permit to pass the size to free()
Right now the free() call is not intercepted since all this is done
using macros and that would break a lot of stuff. Instead a __free()
macro was provided but never used. In addition it used to only report
a zero size, which is not very convenient.

With this patch comes a better solution. Instead it provides a new
will_free() macro that can be prepended before a call to free(). It
only keeps the counters up to date, and also supports being passed a
size. The pool_free_area() command now uses it, which finally allows
the stats to look correct:

  pool-os.h:38   MALLOC  size:   5802127832  calls:   3868044  size/call:   1500
  pool-os.h:47     FREE  size:   5800041576  calls:   3867444  size/call:   1499

The few other places directly calling free() could now be instrumented to
use this and to pass the correct sizeof() when known.
2022-08-09 09:11:27 +02:00
Willy Tarreau
17200dd1f3 MINOR: debug: also store the function name in struct mem_stats
The calling function name is now stored in the structure, and it's
reported when the "all" argument is passed. The first column is
significantly enlarged because some names are really wide :-(
2022-08-09 08:42:42 +02:00
Willy Tarreau
55c950baa9 MINOR: debug: store and report the pool's name in struct mem_stats
Let's add a generic "extra" pointer to the struct mem_stats to store
context-specific information. When tracing pool_alloc/pool_free, we
can now store a pointer to the pool, which allows to report the pool
name on an extra column. This significantly improves tracing
capabilities.

Example:

  proxy.c:1598      CALLOC   size:  28832  calls:  4     size/call:  7208
  dynbuf.c:55       P_FREE   size:  32768  calls:  2     size/call:  16384  buffer
  quic_tls.h:385    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:389    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:554    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:558    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:562    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:401    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  quic_tls.h:403    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  xprt_quic.c:4060  MALLOC   size:  45376  calls:  5672  size/call:  8
  quic_sock.c:328   P_ALLOC  size:  46440  calls:  215   size/call:  216    quic_dgram
2022-08-09 08:26:59 +02:00
Willy Tarreau
4bc37086b8 MINOR: debug: make the mem_stats section aligned to void*
Not specifying the alignment will let the linker choose it, and it turns
out that it will not necessarily be the same size as the one chosen for
struct mem_stats, as can be seen if any new fields are added there. Let's
enforce an alignment to void* both for the section and for the structure.
2022-08-09 08:09:24 +02:00
Ilya Shipitsin
3b64a28e15 CLEANUP: assorted typo fixes in the code and comments
This is 31st iteration of typo fixes
2022-08-06 17:12:51 +02:00
Willy Tarreau
693688e734 MINOR: config: automatically preset MAX_THREADS based on MAX_TGROUPS
MAX_THREADS was not changed when setting MAX_TGROUPS, which still limits
some possibilities. Let's preset it to 4 * LONGBITS when MAX_TGROUPS is
larger than 1, or LONGBITS when it's set to 1. This means that the new
default value is 256 threads.

The rationale behind this is that the main use of thread groups is
mostly to address NUMA issues and that we don't necessarily need large
thread counts when using many groups, and 256 threads is already plenty
even on quite large systems.

For now it's important not to go too far because some internal structs
are arrays of MAX_THREADS entries, for example accept_queue_ring, which
is around 8kB per thread. Such structures will need to become dynamic
before defaulting to large thread counts (at 4096 threads max the
accept queues would require 32 MB RAM alone).
2022-08-06 16:51:20 +02:00
Amaury Denoyelle
6715cbf97f BUG/MINOR: quic: adjust errno handling on sendto
qc_snd_buf returned a size_t which means that it was never negative
despite its documentation. Thus the caller who checked for this was
never informed of a sendto error.

Clean this by changing the return value of qc_snd_buf() to an integer.
A 0 is returned on success. Every other values are considered as an
error.

This commit should be backported up to 2.6. Note that to not cause
malfunctions, it must be backported after the previous patch :
  906b058954
  MINOR: quic: explicitely ignore sendto error
This is to ensure that a sendto error does not cause send to be
interrupted which may cause a stalled transfer without a proper retry
mechanism.

The impact of this bug seems null as caller explicitely ignores sendto
error. However this part of code seems to be subject to strange issues
and it may fix them in part. It may be of interest for github issue #1808.
2022-08-05 15:53:16 +02:00
Frédéric Lécaille
8ecb7363b5 MINOR: quic: Add two new stats counters for sendto() errors
Add "quic_socket_full" new stats counter for sendto() errors with EAGAIN as errno.
and "quic_sendto_err" counter for any other error.
2022-08-05 15:27:14 +02:00
Amaury Denoyelle
30e260e2e6 MEDIUM: mux-quic: implement http-request timeout
Implement http-request timeout for QUIC MUX. It is used when the
connection is opened and is triggered if no HTTP request is received in
time. By HTTP request we mean at least a QUIC stream with a full header
section. Then qcs instance is attached to a sedesc and upper layer is
then responsible to wait for the rest of the request.

This timeout is also used when new QUIC streams are opened during the
connection lifetime to wait for full HTTP request on them. As it's
possible to demux multiple streams in parallel with QUIC, each waiting
stream is registered in a list <opening_list> stored in qcc with <start>
as timestamp in qcs for the stream opening. Once a qcs is attached to a
sedesc, it is removed from <opening_list>. When refreshing MUX timeout,
if <opening_list> is not empty, the first waiting stream is used to set
MUX timeout.

This is efficient as streams are stored in the list in their creation
order so CPU usage is minimal. Also, the size of the list is
automatically restricted by flow control limitation so it should not
grow too much.

Streams are insert in <opening_list> by application protocol layer. This
is because only application protocol can differentiate streams for HTTP
messaging from internal usage. A function qcs_wait_http_req() has been
added to register a request stream by app layer. QUIC MUX can then
remove it from the list in qc_attach_sc().

As a side-note, it was necessary to implement attach qcc_app_ops
callback on hq-interop module to be able to insert a stream in waiting
list. Without this, a BUG_ON statement would be triggered when trying to
remove the stream on sedesc attach. This is to ensure that every
requests streams are registered for http-request timeout.

MUX timeout is explicitely refreshed on MAX_STREAM_DATA and STOP_SENDING
frame parsing to schedule http-request timeout if a new stream has been
instantiated. It was already done on STREAM parsing due to a previous
patch.
2022-08-03 15:04:18 +02:00
Amaury Denoyelle
8d818c6eab MINOR: h3: support HTTP request framing state
Store the current step of HTTP message in h3s stream. This reports if we
are in the parsing of headers, content or trailers section. A new enum
h3s_st_req is defined for this.

This field is stored in h3s struct but only used for request stream. It
is left undefined for other streams (control or QPACK streams).

h3_is_frame_valid() has been extended to take into account this state
information. A connection error H3_FRAME_UNEXPECTED is reported if an
invalid frame according to the current state is received; for example a
DATA frame at the beginning of a stream.
2022-08-03 15:04:18 +02:00
Frédéric Lécaille
8ddde4f05e BUG/MINOR: quic: Missing in flight ack eliciting packet counter decrement
The decrement was missing in quic_pktns_tx_pkts_release() called each time a
packet number space is discarded. This is not sure this bug could have an impact
during handshakes. This counter is used to cancel the timer used both for packet
detection and PTO, setting its value to null. So there could be retransmissions
or probing which could be triggered for nothing.

Must be backported to 2.6.
2022-08-03 12:59:59 +02:00
Christopher Faulet
b32cb9b515 REORG: server: Export srv_settings_cpy() function
This function will be used to init a proxy with settings of the default
proxy. It is mandatory to fix a bug. To do so, it must be exposed.
2022-08-03 11:28:52 +02:00
Amaury Denoyelle
bd6ec1bf84 MEDIUM: mux-quic: implement http-keep-alive timeout
Complete QUIC MUX timeout refresh function by using http-keep-alive
timeout. It is used when the connection is idle after having handle at
least one request.

To implement this a new member <idle_start> has been defined in qcc
structure. This is used as timestamp for when the connection became idle
and is used as base time for http keep-alive timeout
2022-08-01 15:00:13 +02:00
Amaury Denoyelle
c603de4d84 MINOR: mux-quic: count in-progress requests
Add a new qcc member named <nb_hreq>. Its purpose is close to <nb_sc>
which represents the number of attached stream connectors. Both are
incremented inside qc_attach_sc().

The difference is on the decrement operation. While <nb_cs> is
decremented on sedesc detach callback, <nb_hreq> is decremented when the
qcs is locally closed.

In most cases, <nb_hreq> will be decremented before <nb_cs>. However, it
will be the reverse if a stream must be kept alive after detach callback.

The main purpose of this field is to implement http-keep-alive timeout.
Both <nb_sc> and <nb_hreq> must be null to activate the http-keep-alive
timeout.
2022-08-01 14:58:41 +02:00
Amaury Denoyelle
07bf8f4d86 MINOR: mux-quic: save proxy instance into qcc
Store a reference to proxy in the qcc structure. This will be useful to
access to proxy members outside of qcc_init().

Most notably, this change is required to implement timeout refreshing by
using the various timeouts configured at the proxy level.
2022-08-01 14:23:21 +02:00
Amaury Denoyelle
4ea5090f55 CLEANUP: mux-quic: remove useless app_ops is_active callback
Timeout in QUIC MUX has evolved from the simple first implementation. At
the beginning, a connection was considered dead unless bidirectional
streams were opened. This was abstracted through an app callback
is_active().

Now this paradigm has been reversed and a connection is considered alive
by default, unless an error has been reported or a timeout has already
been fired. The callback is_active() is thus not used anymore and can be
safely removed to simplify qcc_is_dead().

This commit should be backported to 2.6.
2022-08-01 14:13:51 +02:00
Willy Tarreau
81f3b80e32 MINOR: ebtree: add ebmb_lookup_shorter() to pursue lookups
This function is designed to enlarge the scope of a lookup performed
by a caller via ebmb_lookup_longest() that was not satisfied with the
result. It will first visit next duplicates, and if none are found,
it will go up in the tree to visit similar keys with shorter prefixes
and will return them if they match. We only use the starting point's
value to perform the comparison since it was expected to be valid for
the looked up key, hence it has all bits in common with its own length.

The algorithm is a bit complex because when going up we may visit nodes
that are located beneath the level we just come from. However it is
guaranteed that keys having a shorter prefix will be present above the
current location, though they may be attached to the left branch of a
cover node, so we just visit all nodes as long as their prefix is too
large, possibly go down along the left branch on cover nodes, and stop
when either there's a match, or there's a non-matching prefix anymore.

The following tricky case now works fine and properly finds 10.0.0.0/7
when looking up 11.0.0.1 from tree version 1 though both belong to
different sub-trees:

  prepare map #1
    add map @1 #1 10.0.0.0/7 10.0.0.0/7
    add map @1 #1 10.0.0.0/7 10.0.0.0/7
  commit map @1 #1
  prepare map #1
    add map @2 #1 11.0.0.0/8 11.0.0.0/8
    add map @2 #1 11.0.0.0/8 11.0.0.0/8

  prepare map #1
    add map @1 #1 10.0.0.0/7 10.0.0.0/7
  commit map @1 #1
  prepare map #1
    add map @2 #1 10.0.0.0/7 10.0.0.0/7
    add map @2 #1 11.0.0.0/8 11.0.0.0/8
    add map @2 #1 11.0.0.0/8 11.0.0.0/8
2022-08-01 11:59:46 +02:00
Willy Tarreau
0dc9e6dca2 DEBUG: tools: provide a tree dump function for ebmbtrees as well
It's convenient for debugging IP trees. However we're not dumping the
full keys, for the sake of simplicity, only the 4 first bytes are dumped
as a u32 hex value. In practice this is sufficient for debugging. As a
reminder since it seems difficult to recover the command each time it's
needed, the output is converted to an image using dot from Graphviz:

    dot -o a.png -Tpng dump.txt
2022-08-01 11:59:15 +02:00
Willy Tarreau
688709d814 MAJOR: threads/plock: update the embedded library
The plock code hasn't been been updated since 2017 and didn't benefit
from the exponential back-off improvements that were added in 2018.
Simply updating the file shows a massive performance gain on large
thread count (>=48) with dequeuing going from 113k RPS to 300k RPS and
round robin from 229k RPS to 1020k RPS. It was about time to update.
In addition, some recent improvements to the code will be useful with
thread groups.

An interesting improvement concerns EPYC CPUs. This one alone increased
fairness and was sufficient to avoid crashes in process_srv_queue() there,
when hammering two servers with maxconn 200 under 1k connections.
2022-07-30 10:15:44 +02:00
Frédéric Lécaille
43910a9450 MINOR: quic: New "quic-cc-algo" bind keyword
As it could be interesting to be able to choose the QUIC control congestion
algorithm to be used by listener, add "quic-cc-algo" new keyword to do so.
Update the documentation consequently.

Must be backported to 2.6.
2022-07-29 17:32:05 +02:00
Frédéric Lécaille
1c9c2f6c02 MEDIUM: quic: Cubic congestion control algorithm implementation
Cubic is the congestion control algorithm used by default by the Linux kernel
since 2.6.15 version. This algorithm is supposed to achieve good scalability and
fairness between flows using the same network path, it should also be used by QUIC
by default. This patch implements this algorithm and select it as default algorithm
for the congestion control.

Must be backported to 2.6.
2022-07-29 17:32:05 +02:00
Frédéric Lécaille
c591459d11 MINOR: quic: Congestion control architecture refactoring
Ease the integration of new congestion control algorithm to come.
Move the congestion controller state to a private array of uint32_t
to stop using a union. We do not want to continue using such long
paths cc->algo_state.<algo>.<var> to modify the internal state variable
for each algorithm.

Must be backported to 2.6
2022-07-29 17:32:05 +02:00
William Lallemand
dc66f2f97d DEBUG: fd: split the fd check
Split the BUG_ON(fd < 0 || fd >= global.maxsock) so it's easier to know
if it quits because of a -1.
2022-07-26 10:35:24 +02:00
Willy Tarreau
51b1fcedeb DEBUG: fd: detect possibly invalid tgid in fd_insert()
Since the API is still a bit young, let's make sure nobody tries to
assign and FD to a group not strictly 1..MAX_TGROUPS as that would
indicate a bug.

Note: some of these might be relaxed to BUG_ON_HOT() in the future
2022-07-25 15:47:45 +02:00
Christopher Faulet
7e94b40a22 BUG/MINOR: fd: Properly init the fd state in fd_insert()
When a new fd is inserted in the fdtab array, its state is initialized. The
"newstate" variable is used to compute the right state (0 by default, but
FD_ET_POSSIBLE flag is set if edge-triggered is supported for the fd).
However, this variable is never used and the fd state is always set to 0.

Now, the fd state is initialized with "newstate" variable.

This bug was introduced by commit ddedc1662 ("MEDIUM: fd: make
fd_insert/fd_delete atomically update fd.tgid"). No backport needed.
2022-07-19 12:11:04 +02:00
Willy Tarreau
03f3049df1 BUG/MINOR: tools: fix statistical_prng_range()'s output range
This function was added by commit 84ebfabf7 ("MINOR: tools: add
statistical_prng_range() to get a random number over a range") but it
contains a bug on the range, since mul32hi() covers the whole input
range, we must pass it range-1. For now it didn't have any impact, but
if used to find an array's index it will cause trouble.

This should be backported to 2.4.
2022-07-18 19:09:55 +02:00
Willy Tarreau
5b3cd9561b BUG/MEDIUM: tools: avoid calling dlsym() in static builds (try 2)
The first approach in commit 288dc1d8e ("BUG/MEDIUM: tools: avoid calling
dlsym() in static builds") relied on dlopen() but on certain configs (at
least gcc-4.8+ld-2.27+glibc-2.17) it used to catch situations where it
ought not fail.

Let's have a second try on this using dladdr() instead. The variable was
renamed "build_is_static" as it's exactly what's being detected there.
We could even take it for reporting in -vv though that doesn't seem very
useful. At least the variable was made global to ease inspection via the
debugger, or in case it's useful later.

Now it properly detects a static build even with gcc-4.4+glibc-2.11.1 and
doesn't crash anymore.
2022-07-18 14:03:54 +02:00
Willy Tarreau
856d56d2d2 MINOR: config: change default MAX_TGROUPS to 16
This will allows nbtgroups > 1 to be declared in the config without
recompiling. The theoretical limit is 64, though we'd rather not push
it too far for now as some structures might be enlarged to be indexed
per group. Let's start with 16 groups max, allowing to experiment with
dual-socket machines suffering from up to 8 loosely coupled L3 caches.
It's a good start and doesn't engage us too far.
2022-07-15 21:51:48 +02:00
Willy Tarreau
c6b596dcce CLEANUP: threads: remove the now unused all_threads_mask and tid_bit
Since these are not used anymore, let's now remove them. Given the
number of places where we're using ti->ldit_bit, maybe an equivalent
might be useful though.
2022-07-15 20:25:41 +02:00
Willy Tarreau
88c4c14050 MINOR: fd: add fd_reregister_all() to deal with boot-time FDs
At boot the pollers are allocated for each thread and they need to
reprogram updates for all FDs they will manage. This code is not
trivial, especially when trying to respect thread groups, so we'd
rather avoid duplicating it.

Let's centralize this into fd.c with this function. It avoids closed
FDs, those whose thread mask doesn't match the requested one or whose
thread group doesn't match the requested one, and performs the update
if required under thread-group protection.
2022-07-15 20:16:30 +02:00
Willy Tarreau
ddedc16624 MEDIUM: fd: make fd_insert/fd_delete atomically update fd.tgid
These functions need to set/reset the FD's tgid but when they're called
there may still be wakeups on other threads that discover late updates
and have to touch the tgid at the same time. As such, it is not possible
to just read/write the tgid there. It must only be done using operations
that are compatible with what other threads may be doing.

As we're using inc/dec on the refcount, it's safe to AND the area to zero
the lower part when resetting the value. However, in order to set the
value, there's no other choice but fd_claim_tgid() which will assign it
only if possible (via a CAS). This is convenient in the end because it
protects the FD's masks from being modified by late threads, so while
we hold this refcount we can safely reset the thread_mask and a few other
elements. A debug test for non-null masks was added to fd_insert() as it
must not be possible to face this situation thanks to the protection
offered by the tgid.
2022-07-15 20:16:30 +02:00
Willy Tarreau
3638d174e5 MEDIUM: fd: make thread_mask now represent group-local IDs
With the change that was started on other masks, the thread mask was
still not fully converted, sometimes being used as a global mask and
sometimes as a local one. This finishes the code modifications so that
the mask is always considered as a group-local mask. This doesn't
change anything as long as there's a single group, but is necessary
for groups 2 and above since it's used against running_mask and so on.
2022-07-15 20:16:30 +02:00
Willy Tarreau
d6e1987612 MINOR: fd: make fd_clr_running() return the previous value instead
It's an AND so it destroys information and due to this there's a call
place where we have to perform two reads to know the previous value
then to change it. With a fetch-and-and instead, in a single operation
we can know if the bit was previously present, which is more efficient.
2022-07-15 20:16:30 +02:00
Willy Tarreau
a707d02657 MEDIUM: fd/poller: turn running_mask to group-local IDs
From now on, the FD's running_mask only refers to local thread IDs. However,
there remains a limitation, in updt_fd_polling(), we temporarily have to
check and set shared FDs against .thread_mask, which still contains global
ones. As such, nbtgroups > 1 may break (but this is not yet supported without
special build options).
2022-07-15 20:16:30 +02:00
Willy Tarreau
6d3c501c08 MEDIUM: fd/poller: turn update_mask to group-local IDs
From now on, the FD's update_mask only refers to local thread IDs. However,
there remains a limitation, in updt_fd_polling(), we temporarily have to
check and set shared FDs against .thread_mask, which still contains global
ones. As such, nbtgroups > 1 may break (but this is not yet supported without
special build options).
2022-07-15 20:16:30 +02:00
Willy Tarreau
ceffd17f52 MINOR: fd: add fd_get_running() to atomically return the running mask
The running mask is only valid if the tgid is the expected one. This
function takes a reference on the tgid before reading the running mask,
so that both are checked at once. It returns either the mask or zero if
the tgid differs, thus providing a simple way for a caller to check if
it still holds the FD.
2022-07-15 20:16:30 +02:00
Willy Tarreau
080373ea38 MINOR: fd: add functions to manipulate the FD's tgid
The FD's tgid is refcounted and must be atomically manipulated. Function
fd_grab_tgid() will increase the refcount but only if the tgid matches the
one in argument (likely the current one). fd_claim_tgid() will be used to
self-assign the tgid after waiting for its refcount to reach zero.
fd_drop_tgid() will be used to drop a temporarily held tgid. All of these
are needed to prevent an FD from being reassigned to another group, either
when inspecting/modifying the running_mask, or when checking for updates,
in order to be certain that the mask being seen corresponds to the desired
group. Note that once at least one bit is set in the running mask of an
active FD, it cannot be closed, thus not migrated, thus the reference does
not need to be held long.
2022-07-15 20:16:09 +02:00
Willy Tarreau
9464bb1f05 MEDIUM: fd: add the tgid to the fd and pass it to fd_insert()
The file descriptors will need to know the thread group ID in addition
to the mask. This extends fd_insert() to take the tgid, and will store
it into the FD.

In the FD, the tgid is stored as a combination of tgid on the lower 16
bits and a refcount on the higher 16 bits. This allows to know when it's
really possible to trust the tgid and the running mask. If a refcount is
higher than 1 it indeed indicates another thread else might be in the
process of updating these values.

Since a closed FD must necessarily have a zero refcount, a test was
added to fd_insert() to make sure that it is the case.
2022-07-15 19:58:06 +02:00
Willy Tarreau
512dd2dc1c MINOR: fd: make fd_insert() apply the thread mask itself
It's a bit ugly to see that half of the callers of fd_insert() have to
apply all_threads_mask themselves to the bit field they're passing,
because usually it comes from a listener that may have other bits set.
Let's make the function apply the mask itself.
2022-07-15 19:58:06 +02:00
Willy Tarreau
35ee710ece MEDIUM: fd/poller: make the update-list per-group
The update-list needs to be per-group because its inspection is based
on a mask and we need to be certain when scanning it if a mask is for
the same thread or another one. Once per-group there's no doubt about
it, even if the FD's polling changes, the entry remains valid. It will
be needed to check the tgid though.

Note that a soft-stop or pause/resume might not necessarily work here
with tgroups>1, because the operation might be delivered to a thread
that doesn't belong to the group and whoe update mask will not reflect
one that is interesting here. We can't do better at this stage.
2022-07-15 19:57:28 +02:00
Willy Tarreau
91a7c164b4 MINOR: task: move the niced_tasks counter to the thread group context
This one is only used as a hint to improve scheduling latency, so there
is no more point in keeping it global since each thread group handles
its own run q
2022-07-15 19:43:10 +02:00
Willy Tarreau
b0e7712fb2 MEDIUM: task/thread: move the task shared wait queues per thread group
Their migration was postponed for convenience only but now's time for
having the shared wait queues per thread group and not just per process,
otherwise the WQ lock uses a huge amount of CPU alone.
2022-07-15 19:43:10 +02:00
Willy Tarreau
82e378aa8a MINOR: fd/thread: get rid of thread_mask()
Since commit d2494e048 ("BUG/MEDIUM: peers/config: properly set the
thread mask") there must not remain any single case of a receiver that
is bound nowhere, so there's no need anymore for thread_mask().

We're adding a test in fd_insert() to make sure this doesn't happen by
accident though, but the function was removed and its rare uses were
replaced with the original value of the bind_thread msak.
2022-07-15 19:43:10 +02:00
Willy Tarreau
5b09341c02 MEDIUM: cpu-map: replace the process number with the thread group number
The principle remains the same, but instead of having a single process
and ignoring extra ones, now we set the affinity masks for the respective
threads of all groups.

The doc was updated with a few extra examples.
2022-07-15 19:43:10 +02:00
Willy Tarreau
7aa41196cf MEDIUM: debug/threads: make the lock debugging take tgroups into account
Since we have to use masks to verify owners/waiters, we have no other
option but to have them per group. This definitely inflates the size
of the locks, but this is only used for extreme debugging anyway so
that's not dramatic.

Thus as of now, all masks in the lock stats are local bit masks, derived
from ti->ltid_bit. Since at boot ltid_bit might not be set, we just take
care of this situation (since some structs are initialized under look
during boot), and use bit 0 from group 0 only.
2022-07-15 19:41:26 +02:00
Willy Tarreau
4d9888ca69 CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros
They were initially made to deal with both the cache and the update list
but there's no cache anymore and keeping them for the update list adds a
lot of obfuscation that is really not desired. Let's get rid of them now.

Their purpose was simply to get a pointer to fdtab[fd].update.{,next,prev}
in order to perform atomic tests and modifications. The offset passed in
argument to the functions (fd_add_to_fd_list() and fd_rm_from_fd_list())
was the offset of the ->update field in fdtab, and as it's not used anymore
it was removed. This also removes a number of casts, though those used by
the atomic ops have to remain since only scalars are supported.
2022-07-15 19:41:26 +02:00
Willy Tarreau
91f7a1af34 CLEANUP: applet: remove the obsolete command context from the appctx
The "ctx" and "st2" parts in the appctx were marked for removal in 2.7
and were emulated using memcpy/memset etc for possible external code.
Let's remove this now.
2022-07-15 19:41:26 +02:00
Amaury Denoyelle
d666d740d2 MINOR: mux-quic: support app graceful shutdown
Adjust qcc_emit_cc_app() to allow the delay of emission of a
CONNECTION_CLOSE. This will only set the error code but the quic-conn
layer is not flagged for immediate close. The quic-conn will be
responsible to shut the connection when deemed suitable.

This change will allow to implement application graceful shutdown, such
as HTTP/3 with GOAWAY emission. This will allow to emit closing frames
on MUX release. Once all work is done at the lower layer, the quic-conn
should emit a CONNECTION_CLOSE with the registered error code.
2022-07-15 15:06:59 +02:00
Amaury Denoyelle
57e6db7021 MINOR: quic: define a generic QUIC error type
Define a new structure quic_err to abstract a QUIC error type. This
allows to easily differentiate a transport and an application error
code. This simplifies error transmission from QUIC MUX and H3 layers.

This new type is defined in quic_frame module. It is used to replace
<err_code> field in <quic_conn>. QUIC_FL_CONN_APP_ALERT flag is removed
as it is now useless.

Utility functions are defined to be able to quickly instantiate
transport, tls and application errors.
2022-07-15 14:57:49 +02:00
Amaury Denoyelle
41cd879383 CLEANUP: quic: clean up include on quic_frame-t.h
quic_frame-t.h and xprt_quic-t.h include themselves mutually. This may
cause some troubles later.

In fact, xprt_quic does not need to include quic_frame so remove this.
And as quic_frame is a generic source file which is included in multiple
places, it is useful to also remove the xprt_quic include in it. Use
forward declaration for this.
2022-07-15 14:54:24 +02:00
Amaury Denoyelle
a5b5075211 MEDIUM: mux-quic: implement STOP_SENDING handling
Implement support for STOP_SENDING frame parsing. The stream is resetted
as specified by RFC 9000. This will automatically interrupt all future
send operation in qc_send(). A RESET_STREAM will be sent with the code
extracted from the original STOP_SENDING frame.
2022-07-11 16:45:04 +02:00
Amaury Denoyelle
843a1196b3 MEDIUM: mux-quic: implement RESET_STREAM emission
Implement functions to be able to reset a stream via RESET_STREAM.

If needed, a qcs instance is flagged with QC_SF_TO_RESET to schedule a
stream reset. This will interrupt all future send operations.

On stream emission, if a stream is flagged with QC_SF_TO_RESET, a
RESET_STREAM frame is generated and emitted to the transport layer. If
this operation succeeds, the stream is locally closed. If upper layer is
instantiated, error flag is set on it.
2022-07-11 16:45:04 +02:00
Amaury Denoyelle
38e6006da1 MINOR: mux-quic: define basic stream states
Implement a basic state machine to represent stream lifecycle. By
default a stream is idle. It is marked as open when sending or receiving
the first data on a stream.

Bidirectional streams has two states to represent the closing on both
receive and send channels. This distinction does not exists for
unidirectional streams which passed automatically from open to close
state.

This patch is mostly internal and has a limited visible impact. Some
behaviors are slightly updated :
* closed streams are garbage collected at the start of io handler
* send operation is interrupted if a stream is close locally

Outside of this, there is no functional change. However, some additional
BUG_ON guards are implemented to ensure that we do not conduct invalid
operation on a stream. This should strengthen the code safety. Also,
stream states are displayed on trace which should help debugging.
2022-07-11 16:37:21 +02:00
Amaury Denoyelle
b143723411 REORG: mux-quic: rename stream initialization function
Rename both qcc_open_stream_local/remote() functions to
qcc_init_stream_local/remote(). This change is purely cosmetic. It will
reduces the ambiguity with the soon to be implemented OPEN states for
QCS instances.
2022-07-11 16:24:03 +02:00
Willy Tarreau
481edaceb8 BUILD: debug: silence warning on gcc-5
In 2.6, 8a0fd3a36 ("BUILD: debug: work around gcc-12 excessive
-Warray-bounds warnings") disabled some warnings that were reported
around the the BUG() statement. But the -Wnull-dereference warning
isn't known from gcc-5, it only arrived in gcc-6, hence makes gcc-5
complain loudly that it doesn't know this directive. Let's just
condition this one to gcc-6.
2022-07-10 14:13:48 +02:00
Christopher Faulet
ca7218aaf0 MINOR: http: Add function to detect default port
http_is_default_port() can be used to test if a port is a default HTTP/HTTPS
port. A scheme may be specified. In this case, it is used to detect defaults
ports, 80 for "http://" and 443 for "https://". Otherwise, with no scheme, both
are considered as default ports.
2022-07-06 17:54:03 +02:00
Christopher Faulet
658f971621 MINOR: http: Add function to get port part of a host
http_get_host_port() function can be used to get the port part of a host. It
will be used to get the port of an uri authority or a host header
value. This function only look for a port starting from the end of the
host. It is the caller responsibility to call it with a valid host value. An
indirect string is returned.
2022-07-06 17:54:03 +02:00
Amaury Denoyelle
3f39b40fe0 MINOR: mux-quic: rename qcs flag FIN_RECV to SIZE_KNOWN
Rename QC_SF_FIN_RECV to the more generic name QC_SF_SIZE_KNOWN. This
better align with the QUIC RFC 9000 which uses the "Size Known" state
definition. This change is purely cosmetic.
2022-07-05 16:18:27 +02:00
Amaury Denoyelle
a509ffb505 MEDIUM: mux-quic: refactor streams opening
Review the whole API used to access/instantiate qcs.

A public function qcc_open_stream_local() is available to the
application protocol layer. It allows to easily opening a local stream.
The ID is automatically attributed to the next one available.

For remote streams, qcc_open_stream_remote() has been implemented. It
will automatically take care of allocating streams in a linear way
according to the ID. This function is called via qcc_get_qcs() which can
be used for each qcc_recv*() operations. For the moment, it is only used
for STREAM frames via qcc_recv(), but soon it will be implemented for
other frames types which can also be used to open a new stream.

qcs_new() and qcs_free() has been restricted to the MUX QUIC only as
they are now reserved for internal usage.

This change is a pure refactoring and should not have any noticeable
impact. It clarifies the developer intent and help to ensure that a
stream is not automatically opened when not desired.
2022-07-05 16:18:27 +02:00
Amaury Denoyelle
321fa7733c REORG: mux-quic: reorganize flow-control fields
<qcc.cl_bidi_r> is used to implement STREAM ID flow control enforcement.
Move it with all fields related to this operation and separated from MAX
STREAM DATA calcul.
2022-07-05 11:20:02 +02:00
Amaury Denoyelle
a441ec9c7a CLEANUP: mux-quic: do not export qc_get_ncbuf
qc_get_ncbuf() is only used internally : thus its prototype in QUIC MUX
include is not required.
2022-07-05 11:06:52 +02:00
Emeric Brun
36d9097cf3 MINOR: fd: Add BUG_ON checks on fd_insert()
This patch adds two BUG_ON on fd_insert() into the fdtab checking
if the fd has been correctly re-initialized into the fdtab
before a new insert.

It will raise a BUG if we try to insert the same fd multiple times
without an intermediate fd_delete().

First one checks that the owner for this fd in fdtab was reset to NULL.

Second one checks that the state flags for this fd in fdtab was reset
to 0.

This patch could be backported on version >= 2.4
2022-07-05 05:18:51 +02:00
Willy Tarreau
1229ef312d MINOR: wdt: do not rely on threads_to_dump anymore
This flag is not needed anymore as we're already marking the waiting
threads as harmless, thus the thread's bit is already covered by this
information. The variable was unexported.
2022-07-01 19:26:35 +02:00
Willy Tarreau
a2b8ed4b44 MINOR: thread: add is_thread_harmless() to know if a thread already is harmless
The harmless status is not re-entrant, so sometimes for signal handling
it can be useful to know if we're already harmless or not. Let's add a
function doing that, and make the debugger use it instead of manipulating
the harmless mask.
2022-07-01 19:26:35 +02:00
Willy Tarreau
598cf3f22e MAJOR: threads: change thread_isolate to support inter-group synchronization
thread_isolate() and thread_isolate_full() were relying on a set of thread
masks for all threads in different states (rdv, harmless, idle). This cannot
work anymore when the number of threads increases beyond LONGBITS so we need
to change the mechanism.

What is done here is to have a counter of requesters and the number of the
current isolated thread. Threads which want to isolate themselves increment
the request counter and wait for all threads to be marked harmless (or idle)
by scanning all groups and watching the respective masks. This is possible
because threads cannot escape once they discover this counter, unless they
also want to isolate and possibly pass first. Once all threads are harmless,
the requesting thread tries to self-assign the isolated thread number, and
if it fails it loops back to checking all threads. If it wins it's guaranted
to be alone, and can drop its harmless bit, so that other competing threads
go back to the loop waiting for all threads to be harmless. The benefit of
proceeding this way is that there's very little write contention on the
thread number (none during work), hence no cache line moves between caches,
thus frozen threads do not slow down the isolated one.

Once it's done, the isolated thread resets the thread number (hence lets
another thread take the place) and decrements the requester count, thus
possibly releasing all harmless threads.

With this change there's no more need for any global mask to synchronize
any thread, and we only need to loop over a number of groups to check
64 threads at a time per iteration. As such, tinfo's threads_want_rdv
could be dropped.

This was tested with 64 threads spread into 2 groups, running 64 tasks
(from the debug dev command), 20 "show sess" (thread_isolate()), 20
"add server blah/blah" (thread_isolate()), and 20 "del server blah/blah"
(thread_isolate_full()). The load remained very low (limited by external
socat forks) and no stuck nor starved thread was found.
2022-07-01 19:15:15 +02:00
Willy Tarreau
ef422ced91 MEDIUM: thread: make stopping_threads per-group and add stopping_tgroups
Stopping threads need a mask to figure who's still there without scanning
everything in the poll loop. This means this will have to be per-group.
And we also need to have a global stopping groups mask to know what groups
were already signaled. This is used both to figure what thread is the first
one to catch the event, and which one is the first one to detect the end of
the last job. The logic isn't changed, though a loop is required in the
slow path to make sure all threads are aware of the end.

Note that for now the soft-stop still takes time for group IDs > 1 as the
poller is not yet started on these threads and needs to expire its timeout
as there's no way to wake it up. But all threads are eventually stopped.
2022-07-01 19:15:15 +02:00
Willy Tarreau
03f9b35114 MEDIUM: tinfo: add a dynamic thread-group context
The thread group info is not sufficient to represent a thread group's
current state as it's read-only. We also need something comparable to
the thread context to represent the aggregate state of the threads in
that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this.
It's indexed on the group id and must be cache-line aligned. The thread
masks that were global and that do not need to remain global were moved
there (want_rdv, harmless, idle).

Given that all the masks placed there now become group-specific, the
associated thread mask (tid_bit) now switches to the thread's local
bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for
other values.

There's also a tg_ctx pointer in the thread so that it can be reached
from other threads.
2022-07-01 19:15:15 +02:00
Willy Tarreau
22b2a24eb2 CLEANUP: thread: remove thread_sync_release() and thread_sync_mask
This function was added in 2.0 when reworking the thread isolation
mechanism to make it more reliable. However it if fundamentally
incompatible with the full isolation mechanism provided by
thread_isolate_full() since that one will wait for all threads to
become idle while the former will wait for all threads to finish
waiting, causing a deadlock.

Given that it's not used, let's just drop it entirely before it gets
used by accident.
2022-07-01 19:15:15 +02:00
Willy Tarreau
cce203aae5 MINOR: thread: add a new all_tgroups_mask variable to know about active tgroups
In order to kill all_threads_mask we'll need to have an equivalent for
the thread groups. The all_tgroups_mask does just this, it keeps one bit
set per enabled group.
2022-07-01 19:15:15 +02:00
Willy Tarreau
377e37a80f MINOR: tinfo: add the mask of enabled threads in each group
In order to replace the global "all_threads_mask" we'll need to have an
equivalent per group. Take this opportunity for calling it threads_enabled
and make sure which ones are counted there (in case in the future we allow
to stop some).
2022-07-01 19:15:14 +02:00
Willy Tarreau
60fe4a95a2 MINOR: tinfo: replace the tgid with tgid_bit in tgroup_info
Now that the tgid is accessible from the thread, it's pointless to have
it in the group, and it was only set but never used. However we'll soon
frequently need the mask corresponding to the group ID and the risk of
getting it wrong with the +1 or to shift 1 instead of 1UL is important,
so let's store the tgid_bit there.
2022-07-01 19:15:14 +02:00
Willy Tarreau
66ad98a772 MINOR: tinfo: add the tgid to the thread_info struct
At several places we're dereferencing the thread group just to catch
the group number, and this will become even more required once we start
to use per-group contexts. Let's just add the tgid in the thread_info
struct to make this easier.
2022-07-01 19:15:14 +02:00
Willy Tarreau
e7475c8e79 MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag
Every single place where sleeping_thread_mask was still used was to test
or set a single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.

The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.

This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread():

   frontend test
        bind :8888 maxconn 10 thread 1-2
        rate-limit sessions 5

Alternately, disabling/enabling a frontend in loops via the CLI also
broadcasts such events, but they're more difficult to observe since
this is causing connection failures.
2022-07-01 19:15:14 +02:00
Willy Tarreau
dce4ad755f MEDIUM: thread: add a new per-thread flag TH_FL_NOTIFIED to remember wakeups
Right now when an inter-thread wakeup happens, we preliminary check if the
thread was asleep, and if so we wake the poller up and remove its bit from
the sleeping mask. That's not very clean since the sleeping mask cannot be
entirely trusted since a thread that's about to wake up will already have
its sleeping bit removed.

This patch adds a new per-thread flag (TH_FL_NOTIFIED) to remember that a
thread was notified to wake up. It's cleared before checking the task lists
last, so that new wakeups can be considered again (since wake_thread() is
only used to notify about task wakeups and FD polling changes). This way
we do not need to modify a remote thread's sleeping mask anymore. As such
wake_thread() now only tests and sets the TH_FL_NOTIFIED flag but doesn't
clear sleeping anymore.
2022-07-01 19:15:14 +02:00
Willy Tarreau
058b2c1015 MINOR: poller: centralize poll return handling
When returning from the polling syscall, all pollers have a certain
dance to follow, made of wall clock updates, thread harmless updates,
idle time management and sleeping mask updates. Let's have a centralized
function to deal with all of this boring stuff: fd_leaving_poll(), and
make all the pollers use it.
2022-07-01 19:15:14 +02:00
Willy Tarreau
bdcd32598f MINOR: thread: only use atomic ops to touch the flags
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
2022-07-01 19:15:14 +02:00
Willy Tarreau
8e079cdd44 MINOR: thread: move the flags to the shared cache line
The thread flags were once believed to be local to the thread, but as
it stands, even the STUCK flag is shared since it's looked at by the
watchdog. As such we'll need to use atomic ops to manipulate them, and
likely to move them into the shared area.

This patch only moves the flag into the shared area so that we can later
decide whether it's best to leave them there or to move them back to the
local area. Interestingly, some tests have shown a 3% better performance
on dequeuing with this, while they're not used by other threads yet, so
there are definitely alignment effects that might change over time.
2022-07-01 19:15:14 +02:00
Willy Tarreau
f3efef4d60 MINOR: thread: make wake_thread() take care of the sleeping threads mask
Almost every call place of wake_thread() checks for sleeping threads and
clears the sleeping mask itself, while the function is solely used for
there. Let's move the check and the clearing of the bit inside the function
itself. Note that updt_fd_polling() still performs the check because its
rules are a bit different.
2022-07-01 19:15:14 +02:00
Willy Tarreau
319d136ff9 MEDIUM: task: use regular eb32 trees for the run queues
Since we don't mix tasks from different threads in the run queues
anymore, we don't need to use the eb32sc_ trees and we can switch
to the regular eb32 ones. This uses cheaper lookup and insert code,
and a 16-thread test on the queues shows a performance increase
from 570k RPS to 585k RPS.
2022-07-01 19:15:14 +02:00
Willy Tarreau
c958c70ec8 MINOR: task: replace global_tasks_mask with a check for tree's emptiness
This bit field used to be a per-thread cache of the result of the last
lookup of the presence of a task for each thread in the shared cache.
Since we now know that each thread has its own shared cache, a test of
emptiness is now sufficient to decide whether or not the shared tree
has a task for the current thread. Let's just remove this mask.
2022-07-01 19:15:14 +02:00
Willy Tarreau
da195e8aab MINOR: task: remove grq_total and use rq_total instead
grq_total was only used to know how many tasks were being queued in the
global runqueue for stats purposes, and that was transferred to the per
thread rq_total counter once assigned. We don't need this anymore since
we know where they are, so let's just directly update rq_total and drop
that one.
2022-07-01 19:15:14 +02:00
Willy Tarreau
b17dd6cc19 MEDIUM: task: replace the global rq_lock with a per-rq one
There's no point having a global rq_lock now that we have one shared RQ
per thread, let's have one lock per runqueue instead.
2022-07-01 19:15:14 +02:00
Willy Tarreau
6f78038d72 MEDIUM: task: move the shared runqueue to one per thread
Since we only use the shared runqueue to put tasks only assigned to
known threads, let's move that runqueue to each of these threads. The
goal will be to arrange an N*(N-1) mesh instead of a central contention
point.

The global_rqueue_ticks had to be dropped (for good) since we'll now
use the per-thread rqueue_ticks counter for both trees.

A few points to note:
  - the rq_lock stlil remains the global one for now so there should not
    be any gain in doing this, but should this trigger any regression, it
    is important to detect whether it's related to the lock or to the tree.

  - there's no more reason for using the scope-based version of the ebtree
    now, we could switch back to the regular eb32_tree.

  - it's worth checking if we still need TASK_GLOBAL (probably only to
    delete a task in one's own shared queue maybe).
2022-07-01 19:15:14 +02:00
Willy Tarreau
a4fb79b4a2 MINOR: task: make rqueue_ticks atomic
The runqueue ticks counter is per-thread and wasn't initially meant to
be shared. We'll soon have to share it so let's make it atomic. It's
only updated when waking up a task, and no performance difference was
observed. It was moved in the thread_ctx struct so that it doesn't
pollute the local cache line when it's later updated by other threads.
2022-07-01 19:15:14 +02:00
Willy Tarreau
fc5de15baa CLEANUP: task: remove the now unused TASK_GLOBAL flag
TASK_GLOBAL was exclusively used by task_unlink_rq(), as such it can be
dropped.
2022-07-01 19:15:14 +02:00
Willy Tarreau
3961608f63 CLEANUP: task: remove the unused task_unlink_rq()
This function stopped being used before 2.4 because either the task is
dequeued by the scheduler itself and it knows where to find it, or it's
killed by any thread, and task_kill() must be used for this as only this
one is safe.

It's difficult to say whether task_unlink_rq() is still safe, but once
the lock moves to a thread declared in the task itself, it will be even
more difficult to keep it safe.

Let's just remove it now before someone reuses it and causes trouble.
2022-07-01 19:15:14 +02:00
Willy Tarreau
eed3911a54 MINOR: task: replace task_set_affinity() with task_set_thread()
The latter passes a thread ID instead of a mask, making the code simpler.
2022-07-01 19:15:14 +02:00
Willy Tarreau
159e3acf5d MEDIUM: task: remove TASK_SHARED_WQ and only use t->tid
TASK_SHARED_WQ was set upon task creation and never changed afterwards.
Thus if a task was created to run anywhere (e.g. a check or a Lua task),
all its timers would always pass through the shared timers queue with a
lock. Now we know that tid<0 indicates a shared task, so we can use that
to decide whether or not to use the shared queue. The task might be
migrated using task_set_affinity() but it's always dequeued first so
the check will still be valid.

Not only this removes a flag that's difficult to keep synchronized with
the thread ID, but it should significantly lower the load on systems with
many checks. A quick test with 5000 servers and fast checks that were
saturating the CPU shows that the check rate increased by 20% (hence the
CPU usage dropped by 17%). It's worth noting that run_task_lists() almost
no longer appears in perf top now.
2022-07-01 19:15:14 +02:00
Willy Tarreau
1f4bf7215a MEDIUM: task: only keep task_new_*() and drop task_new()
As previously advertised in comments, the mask-based task_new() is now
gone. The low-level function now is task_new_on() which takes a thread
number or a negative value for "any thread", which is turned to zero
for thread-less builds since there's no shared WQ in thiscase. The
task_new_here() and task_new_anywhere() functions were adjusted
accordingly.
2022-07-01 19:15:14 +02:00
Willy Tarreau
cb8542755e MEDIUM: applet: only keep appctx_new_*() and drop appctx_new()
This removes the mask-based variant so that from now on the low-level
function becomes appctx_new_on() and it takes either a thread number or
a negative value for "any thread". This way we can use task_new_on() and
task_new_anywhere() instead of task_new() which will soon disappear.
2022-07-01 19:15:14 +02:00
Willy Tarreau
0ad00befc1 CLEANUP: task: remove thread_mask from the struct task
It was not used anymore since everything moved to ->tid, so let's
remove it.
2022-07-01 19:15:14 +02:00
Willy Tarreau
29ffe26733 MAJOR: task: use t->tid instead of ffsl(t->thread_mask) to take the thread ID
At several places we need to figure the ID of the first thread allowed
to run a task. Till now this was performed using my_ffsl(t->thread_mask)
but since we now have the thread ID stored into the task, let's use it
instead. This is tagged major because it starts to assume that tid<0 is
strictly equivalent to atleast2(thread_mask), and that as such, among
the allowed threads are the current one.
2022-07-01 19:15:14 +02:00
Willy Tarreau
5b8e054732 MEDIUM: task/debug: move the ->thread_mask integrity checks to ->tid
Let's make sure the new ->tid field is always correct instead of checking
the thread mask.
2022-07-01 19:15:14 +02:00
Willy Tarreau
6ef52f4479 MEDIUM: task: add and preset a thread ID in the task struct
The tasks currently rely on a mask but do not have an assigned thread ID,
contrary to tasklets. However, in practice they're either running on a
single thread or on any thread, so that it will be worth simplifying all
this in order to ease the transition to the thread groups.

This patch introduces a "tid" field in the task struct, that's either
the number of the thread the task is attached to, or a negative value
if the task is not bound to a thread, (i.e. its mask is all_threads_mask).

The new ID is only set and updated but not used yet.
2022-07-01 19:15:14 +02:00
Willy Tarreau
66ef2c3af6 CLEANUP: config: remove unused proc_mask()
The function was used to return the mask of enabled processes, but it
now always returns 1 and is not called anymore. Let's drop it.
2022-07-01 19:15:14 +02:00
Willy Tarreau
252754c745 MINOR: tinfo: make tid temporarily still reflect global ID
For now we still set tid_bit to (1UL << tid) because FDs will not
work with more than one group without this, but once FDs start to
adopt local masks this must change to thr->ltid_bit.
2022-07-01 19:14:42 +02:00
Emeric Brun
f41a3f6762 MINOR: fd: add a new FD_DISOWN flag to prevent from closing a deleted FD
Some FDs might be offered to some external code (external libraries)
which will deal with them until they close them. As such we must not
close them upon fd_delete() but we need to delete them anyway so that
they do not appear anymore in the fdtab. This used to be handled by
fd_remove() before 2.3 but we don't have this anymore.

This patch introduces a new flag FD_DISOWN to let fd_delete() know that
the core doesn't own the fd and it must not be closed upon removal from
the fd_tab. This way it's totally unregistered from the poller but still
open.

This patch must be backported on branches >= 2.3 because it will be
needed to fix a bug affecting SSL async. it should be adapted on 2.3
because state flags were stored in a different way (via bits in the
structure).
2022-07-01 17:41:40 +02:00
Amaury Denoyelle
e0a92a7e56 MINOR: ncbuf: implement ncb_is_fragmented()
Implement a new status function for ncbuf. It allows to quickly report
if a buffer contains data in a fragmented way, i.e. with gaps in between
or at start of the buffer.

To summarize, a buffer is considered as non-fragmented in the following
cases :
- a null or empty buffer
- a full buffer
- a buffer containing exactly one data block at the beginning, following
  by a gap until the end.
2022-07-01 15:54:23 +02:00
Frédéric Lécaille
649b3fd9aa MINOR: quic: Increase the QUIC connections RX buffer size (upto 64Kb)
With ~1500 bytes QUIC datagrams, we can handle less than 200 datagrams
which is less than the default maxpollevents value. This should reduce
the chances of fulfilling the connections RX buffers as reported by
Tristan in GH #1737.

Must be backported to 2.6.
2022-06-30 14:34:32 +02:00
Frédéric Lécaille
1ca14950e6 MINOR: quic: Duplicated QUIC_RX_BUFSZ definition
This macro is already defined in src/quic_sock.c which is the correct place.

Must be backported to 2.6
2022-06-30 14:24:04 +02:00
Frédéric Lécaille
45a16295e3 MINOR: quic: Add new stats counter to diagnose RX buffer overrun
Remove the call to qc_list_all_rx_pkts() which print messages on stderr
during RX buffer overruns and add a new counter for the number of dropped packets
because of such events.

Must be backported to 2.6
2022-06-30 14:24:04 +02:00
Frédéric Lécaille
ad548b54a7 MINOR: task: Add tasklet_wakeup_after()
We want to be able to schedule a tasklet onto a thread after the current tasklet
is done. What we have to do is to insert this tasklet at the head of the thread
task list. Furthermore, we would like to serialize the tasklets. They must be
run in the same order as the order in which they have been scheduled. This is
implemented passing a list of tasklet as parameter (see <head> parameters) which
must be reused for subsequent calls.
_tasklet_wakeup_after_on() is implemented to accomplish this job.
tasklet_wakeup_after_on() and tasklet_wake_after() are only wrapper macros around
_tasklet_wakeup_after_on(). tasklet_wakeup_after_on() does exactly the same thing
as _tasklet_wakeup_after_on() without having to pass the filename and line in the
filename as parameters (usefull when DEBUG_TASK is enabled).
tasklet_wakeup_after() hides also the usage of the thread parameter which is
<tl> tasklet thread ID.
2022-06-30 14:24:04 +02:00
Frédéric Lécaille
628e89cfae BUILD: quic+h3: 32-bit compilation errors fixes
In GH #1760 (which is marked as being a feature), there were compilation
errors on MacOS which could be reproduced in Linux when building 32-bit code
(-m32 gcc option). Most of them were due to variables types mixing in QUIC_MIN macro
or using size_t type in place of uint64_t type.

Must be backported to 2.6.
2022-06-24 12:13:53 +02:00
Willy Tarreau
27061cd144 MEDIUM: debug: improve DEBUG_MEM_STATS to also report pool alloc/free
Sometimes using "debug dev memstats" can be frustrating because all
pool allocations are reported through pool-os.h and that's all.

But in practice there's nothing wrong with also intercepting pool_alloc,
pool_free and pool_zalloc and report their call counts and locations,
so that's what this patch does. It only uses an alternate set of macroes
for these 3 calls when DEBUG_MEM_STATS is defined. The outputs are
reported as P_ALLOC (for both pool_malloc() and pool_zalloc()) and
P_FREE (for pool_free()).
2022-06-23 11:58:01 +02:00
Christopher Faulet
aa55640b8c MINOR: freq_ctr: Add a function to get events excess over the current period
freq_ctr_overshoot_period() function may be used to retrieve the excess of
events over the current period for a givent frequency counter, ignoring the
history. It is a way compare the "current rate" (the number of events over
the current period) to a given rate and estimate the excess of events.

It may be used to safely add new events, especially at the begining of the
current period for a frequency counter with large period. This way, it is
possible to smoothly add events during the whole period without quickly
consuming all the quota at the beginning of the period and waiting for the
next one to be able to add new events.
2022-06-22 18:33:27 +02:00
Willy Tarreau
c7a8a3c7bd MINOR: intops: add a function to return a valid bit position from a mask
Sometimes we need to be able to signal one thread among a mask, without
caring much about which bit will be picked. At the moment we use ffsl()
for this but this sometimes results in imbalance at certain specific
places where the same first thread in a set is always the same one that
is selected.

Another approach would consist in using the rank finding function but it
requires a popcount and a setup phase, and possibly a modulo operation
depending on the popcount, which starts to be very expensive.

Here we take a different approach. The idea is an input bit value is
passed, from 0 to LONGBITS-1, and that as much as possible we try to
pick the bit matching it if it is set. Otherwise we look at a mirror
position based on a decreasing power of two, and jump to the side
that still has bits left. In 6 iterations it ends up spotting one bit
among 64 and the operations are very cheap and optimizable. This method
has the benefit that we don't care where the holes are located in the
mask, thus it shows a good distribution of output bits based on the
input ones. A long-time test shows an average of 16 cycles, or ~4ns
per lookup at 3.8 GHz, which is about twice as fast as using the rank
finding function.

Just like for that one, the code was stored into tools.c since we don't
have a C file for intops.
2022-06-21 20:29:57 +02:00
Frédéric Lécaille
4f5777a415 MINOR: quic: Dump version_information transport parameter
Implement quic_tp_version_info_dump() to dump such a transport parameter (only remote).
Call it from quic_transport_params_dump() which dump all the transport parameters.

Can be backported to 2.6 as it's useful for debugging.
2022-06-21 11:07:39 +02:00
Frédéric Lécaille
2aebaa49b1 BUG/MINOR: quic: Unexpected half open connection counter wrapping
This counter must be incremented only one time by connection and decremented
as soon as the handshake has failed or succeeded. This is a gauge. Under certain
conditions this counter could be decremented twice. For instance
after having received a TLS alert, then upon SSL_do_handshake() failure.
To stop having to deal to all the current combinations which can lead to such a
situation (and the next to come), add a connection flag to denote if this counter
has been already decremented for a connection. So, this counter must be decremented
only if this flag has not been already set.

Must be backported up to 2.6.
2022-06-20 14:57:09 +02:00
Willy Tarreau
177aed56dc MEDIUM: debug: detect redefinition of symbols upon dlopen()
In order to better detect the danger caused by extra shared libraries
which replace some symbols, upon dlopen() we now compare a few critical
symbols such as malloc(), free(), and some OpenSSL symbols, to see if
the loaded library comes with its own version. If this happens, a
warning is emitted and TAINTED_REDEFINITION is set. This is important
because some external libs might be linked against different libraries
than the ones haproxy was linked with, and most often this will end
very badly (e.g. an OpenSSL object is allocated by haproxy and freed
by such libs).

Since the main source of dlopen() calls is the Lua lib, via a "require"
statement, it's worth trying to show a Lua call trace when detecting a
symbol redefinition during dlopen(). As such we emit a Lua backtrace if
Lua is detected as being in use.
2022-06-19 17:58:32 +02:00
Willy Tarreau
40dde2d5c1 MEDIUM: debug: add a tainted flag when a shared library is loaded
Several bug reports were caused by shared libraries being loaded by other
libraries or some Lua code. Such libraries could define alternate symbols
or include dependencies to alternate versions of a library used by haproxy,
making it very hard to understand backtraces and analyze the issue.

Let's intercept dlopen() and set a new TAINTED_SHARED_LIBS flag when it
succeeds, so that "show info" makes it visible that some external libs
were added.

The redefinition is based on the definition of RTLD_DEFAULT or RTLD_NEXT
that were previously used to detect that dlsym() is usable, since we need
it as well. This should be sufficient to catch most situations.
2022-06-19 17:58:32 +02:00
Willy Tarreau
0b7b639d7e MINOR: hlua: add a new hlua_show_current_location() function
This function may be used to try to show where some Lua code is currently
being executed. It tries hard to detect the initialization phase, both for
the global and the per-thread states, and for runtime states. This intends
to be used by error handlers to provide the users with indications about
what Lua code was being executed when the error triggered.
2022-06-19 17:58:32 +02:00
Christopher Faulet
b68f77d626 BUG/MEDIUM: stream: Properly handle destructive client connection upgrades
When the protocol is changed for a client connection at the stream level
(from TCP to H1/H2), there are two cases. The stream may be reused or
not. The first case, when the stream is reused is working. The second one is
buggy since the conn-stream refactoring and leads to a crash.

In this case, the new mux don't reuse the stream. It must be silently
aborted. However, its front stream connector is still referencing the
connection. So it must be detached. But it must be performed in two stages,
to be sure to not loose the context for the upgrade and to be able to
rollback on error. So now, before the upgrade, we prepare to detach the
stconn and it is finally detached if the upgrade succeeds. There is a trick
here. Because we pretend the stconn is detached but its state is preserved.

This patch must be backported to 2.6.
2022-06-17 13:25:02 +02:00
Frédéric Lécaille
e06f7459fa CLEANUP: quic: Remove any reference to boringssl
I do not think we will support boringssl for QUIC soon ;)
2022-06-16 15:58:48 +02:00
Frédéric Lécaille
301425b880 MEDIUM: quic: Compatible version negotiation implementation (draft-08)
At this time haproxy supported only incompatible version negotiation feature which
consists in sending a Version Negotiation packet after having received a long packet
without compatible value in its version field. This version value is the version
use to build the current packet. This patch does not modify this behavior.

This patch adds the support for compatible version negotiation feature which
allows endpoints to negotiate during the first flight or packets sent by the
client the QUIC version to use for the connection (or after the first flight).
This is done thanks to "version_information" parameter sent by both endpoints.
To be short, the client offers a list of supported versions by preference order.
The server (or haproxy listener) chooses the first version it also supported as
negotiated version.

This implementation has an impact on the tranport parameters handling (in both
direcetions). Indeed, the server must sent its version information, but only
after received and parsed the client transport parameters). So we cannot
encode these parameters at the same time we instantiated a new connection.

Add QUIC_TP_DRAFT_VERSION_INFORMATION(0xff73db) new transport parameter.
Add tp_version_information new C struct to handle this new parameter.
Implement quic_transport_param_enc_version_info() (resp.
quic_transport_param_dec_version_info()) to encode (resp. decode) this
parameter.
Add qc_conn_finalize() which encodes the transport parameters and configure
the TLS stack to send them.
Add ->negotiated_ictx quic_conn C struct new member to store the Initial
QUIC TLS context for the negotiated version. The Initial secrets derivation
is version dependent.
Rename ->version to ->original_version and add ->negotiated_version to
this C struct to reflect the QUIC-VN RFC denomination.
Modify most of the QUIC TLS API functions to pass a version as parameter.
Export the QUIC version definitions to be reused at least from quic_tp.c
(transport parameters.
Move the token check after the QUIC connection lookup. As this is the original
version which is sent into a Retry packet, and because this original version is
stored into the connection, we must check the token after having retreived this
connection.
Add packet version to traces.

See https://datatracker.ietf.org/doc/html/draft-ietf-quic-version-negotiation-08
for more information about this new feature.
2022-06-16 15:58:48 +02:00
Frédéric Lécaille
86845c5171 MEDIUM: quic: Add QUIC v2 draft support
This is becoming difficult to handle the QUIC TLS related definitions
which arrive with a QUIC version (draft or not). So, here we add
quic_version C struct which does not define only the QUIC version number,
but also the QUIC TLS definitions which depend on a QUIC version.
Modify consequently all the QUIC TLS API to reuse these definitions
through new quic_version C struct.
Implement quic_pkt_type() function which return a packet type (0 up to 3)
depending on the QUIC version number.
Stop harding the Retry packet first byte in send_retry(): this is not more
possible because the packet type field depends on the QUIC version.
Also modify quic_build_packet_long_header() for the same reason: the packet
type depends on the QUIC version.
Add a quic_version C struct member to quic_conn C struct.
Modify qc_lstnr_pkt_rcv() to set this member asap.
Remove the version member from quic_rx_packet C struct: a packet is attached
asap to a connection (or dropped) which is the unique object which should
store the QUIC version.
Modify qc_pkt_is_supported_version() to return a supported quic_version C
struct from a version number.
Add Initial salt for QUIC v2 draft (initial_salt_v2_draft).
2022-06-16 14:56:24 +02:00
Frédéric Lécaille
83bf9ca25a CLEANUP: quid: QUIC draft-28 no more supported
Remove this useless definition.
2022-06-16 14:56:24 +02:00
Frédéric Lécaille
fa94f77bc5 BUG/MINOR: quic: Wrong PTO calculation
Due to missing brackets around the ternary C operator, quic_pto() could return zero
at the first run, before the QUIC connection was completely initialized. This leaded
the idle timeout task to be executed before this initialization completion. Then
this connection could be immediately released.

This bug was revealed by the multi_packet_client_hello QUIC tracker test.

Must be backported to 2.6.
2022-06-16 14:56:24 +02:00
Frédéric Lécaille
3f96a0a4c1 MINOR: quic: Add several nonce and key definitions for Retry tag
The nonce and keys used to cipher the Retry tag depend on the QUIC version.
Add these definitions for 0xff00001d (draft-29) and v2 QUIC version. At least
draft-29 is useful for QUIC tracker tests with "quic-force-retry" enabled
on haproxy side.
Validated with -v 0xff00001d ngtcp2 option.
Could not validate the v2 nonce and key at this time because not supported.
2022-06-16 14:56:24 +02:00
Amaury Denoyelle
60ef19f137 BUG/MINOR: h3/qpack: deal with too many headers
ensures that we never insert too many entries in a headers input list.
On the decoding side, a new error QPACK_ERR_TOO_LARGE is reported in
this case.

This prevents crash if headers number on a H3 request or response is
superior to tune.http.maxhdr config value. Previously, a crash would
occur in QPACK decoding function.

Note that the process still crashes later with ABORT_NOW() because error
reporting on frame parsing is not implemented for now. It should be
treated with a RESET_STREAM frame in most cases.

This can be backported up to 2.6.
2022-06-15 15:05:08 +02:00
Amaury Denoyelle
53eef46b88 MINOR: qpack: reduce dependencies on other modules
Clean up QPACK decoder API by removing dependencies on ncbuf and
MUX-QUIC. This reduces includes statements. It will also help to
implement a standalone QPACK decoder.
2022-06-15 11:20:48 +02:00
Amaury Denoyelle
c5d31ed8be MINOR: qpack: add comments and remove a useless trace
Add comments on the decoding function to facilitate code analysis.

Also remove the qpack_debug_hexdump() which prints the whole left buffer
on each header parsing. With large HEADERS frame payload, QPACK traces
are complicated to debug with this statement.
2022-06-15 11:20:42 +02:00
Willy Tarreau
3ccb14d60d MINOR: thread: get rid of MAX_THREADS_MASK
This macro was used both for binding and for lookups. When binding tasks
or FDs, using all_threads_mask instead is better as it will later be per
group. For lookups, ~0UL always does the job. Thus in practice the macro
was already almost not used anymore since the rest of the code could run
fine with a constant of all ones there.
2022-06-14 11:18:40 +02:00
Willy Tarreau
1a85a958dd MINOR: tinfo: remove the global thread ID bit (tid_bit)
Each thread has its own local thread id and its own global thread id,
in addition to the masks corresponding to each. Once the global thread
ID can go beyond 64 it will not be possible to have a global thread Id
bit anymore, so better start to remove it and use only the local one
from the struct thread_info.
2022-06-14 10:44:38 +02:00
Willy Tarreau
680ed5f28b MINOR: task: move profiling bit to per-thread
Instead of having a global mask of all the profiled threads, let's have
one flag per thread in each thread's flags. They are never accessed more
than one at a time an are better located inside the threads' contexts for
both performance and scalability.
2022-06-14 10:38:03 +02:00
Amaury Denoyelle
b9e0640405 BUG/MEDIUM: mux-quic: fix flow control connection Tx level
The flow control enforced at connection level is incorrectly calculated.
There is a risk of exceeding the limit. In most cases, this results in a
segfault produced by a BUG_ON which is here to catch this kind of error.
If not compiled with DEBUG_STRICT, this should generate a connection
closed by the client due to the flow control overflow.

The problem is encountered when transfered payload is big enough to fill
the transport congestion window. In this case, some data are rejected by
the transport layer and kept by the MUX to be reemitted later. However,
these preserved data are not counted on the connection flow control when
resubmitted, which gradually amplify the gap between expected and real
consumed flow control.

To fix this, handle the flow-control at the connection level in the same
way as the stream level. A new field qcc.tx.offsets is incremented as
soon as data are transfered between stream TX buffers. The field
qcc.tx.sent_offsets is preserved to count bytes handled by the transport
layer and stop the MUX transfer if limit is reached.

As already stated, this bug can occur during transfers with enough
emitted data, over multiple streams. When using a single stream, the
flow control at the stream level hides it.

The BUG_ON crash is reproduced systematically with quiche client :
$ quiche-client --no-verify --http-version HTTP/3 -n 10000 https://127.0.0.1:20443/10K

This must be backported up to 2.6 when confirmed to work as expected.

This should fix github issue #1738.
2022-06-10 17:30:41 +02:00
Amaury Denoyelle
c715eb7898 BUG/MINOR: h3: fix frame type definition
Frame type has changed during HTTP/3 specification process. Adjust it to
reflect the latest RFC 9114 status.

Concretly, type for GOAWAY and MAX_PUSH_ID frames has been adjusted.
The impact of this bug is limited as currently these frames are not
handled by haproxy and are ignored.

This can be backported up to 2.6.
2022-06-09 14:34:43 +02:00
Willy Tarreau
7d318ed8cc BUILD: compiler: implement unreachable for older compilers too
Benoit Dolez reported that gcc-4.4 emits several "may be used
uninitialized" warnings around places where there are BUG_ON()
or ABORT_NOW(). The reason is that __builtin_unreachable() was
introduced in gcc-4.5 thus older ones do not know that the code
after such statements is not reachable.

This patch solves the problem by deplacing the statement with
an infinite loop on older versions. The compiler knows that the
code following it cannot be reached, and this is quite cheap
(2 to 4 bytes depending on architectures). It even reduces the
code size a little bit as the compiler doesn't have to optimize
for branches that do not exist.

This may be backported to older versions.
2022-06-08 12:17:22 +02:00
Amaury Denoyelle
1f21ebdd76 MINOR: mux-quic/h3: adjust demuxing function return values
Clean the API used by decode_qcs() and transcoder internal functions.
Parsing functions now returns a ssize_t which represents the number of
consumed bytes or a negative error code. The total consumed bytes is
returned via decode_qcs().

The API is now unified and cleaner. The MUX can thus simply use the
return value of decode_qcs() instead of substracting the data bytes in
the buffer before and after the call. Transcoders functions are not
anymore obliged to remove consumed bytes from the buffer which was not
obvious.
2022-06-07 18:15:47 +02:00
Amaury Denoyelle
62eef85961 MINOR: mux-quic: simplify decode_qcs API
Slightly modify decode_qcs function used by transcoders. The MUX now
gives a buffer instance on which each transcoder is free to work on it.
At the return of the function, the MUX removes consume data from its own
buffer.

This reduces the number of invocation to qcs_consume at the end of a
full demuxing process. The API is also cleaner with the transcoders not
responsible of calling it with the risk of having the input buffer
freed if empty.
2022-06-07 18:15:47 +02:00
Amaury Denoyelle
c0156790e6 MINOR: h3: add h3c pointer into h3s instance
As a mirror to qcc/qcs types, add a h3c pointer into h3s struct. This
should help to clean up H3 code and avoid to use qcs.qcc.ctx to retrieve
the h3c instance.
2022-06-07 18:13:11 +02:00
Willy Tarreau
29698e39ed [RELEASE] Released version 2.7-dev0
Released version 2.7-dev0 with the following main changes :
    - MINOR: version: it's development again
2022-05-31 17:05:27 +02:00
Willy Tarreau
6391bb2de0 MINOR: version: it's development again
This essentially reverts b2c1e081f7.
2022-05-31 17:04:45 +02:00
Willy Tarreau
b2c1e081f7 MINOR: version: mention that it's LTS now.
The version will be maintained up to around Q2 2027. Let's
also update the INSTALL file to mention this.
2022-05-31 16:53:13 +02:00
Willy Tarreau
91a87918c9 BUILD: quic: use inttypes.h instead of stdint.h
The usual build joke on uncommon systems (AIX this time, though some
versions of Solaris are known for missing it as well).
2022-05-30 16:37:17 +02:00
Willy Tarreau
771483da3e MINOR: htx: add an unchecked version of htx_get_head_blk()
htx_get_head_blk() is used at plenty of places, many of which are known
to be safe, but the function checks for the presence of a first block
and returns NULL if it doesn't exist. While it's properly used, it makes
compilers complain at -Os on stream.c and mux_fcgi.c because they probably
don't propagate variables far enough to see that there's no risk.

Let's add an unchecked version for these use cases.
2022-05-30 16:25:16 +02:00
Frédéric Lécaille
6f7607ef1f MINOR: h3: Add a statistics module for h3
Add ->inc_err_cnt new callback to qcc_app_ops struct which can
be called from xprt to increment the application level error code counters.
It take the application context as first parameter to be generic and support
new QUIC applications to come.
Add h3_stats.c module with counters for all the frame types and error codes.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
eb79145f01 MINOR: quic_stats: Add transport new counters (lost, stateless reset, drop)
Add new counters to count the number of dropped packet upon parsing error, lost
sent packets and the number of stateless reset packet sent.
Take the oppportunity of this patch to rename CONN_OPENINGS to QUIC_ST_HALF_OPEN_CONN
(total number of half open connections) and QUIC_ST_HDSHK_FAILS to QUIC_ST_HDSHK_FAIL.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
3ccea6d276 MINOIR: quic_stats: add QUIC connection errors counters
Add statistical counters for all the transport level connection errrors.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
aee675746c MINOR: quic: Clarifications about transport parameters value
This is becoming difficult to distinguish the default values for
transport parameters which come with the RFC from our implementation
default values when not set by configuration (tunable parameters).
Add a comment to distinguish them.
Prefix these default values by QUIC_TP_DFLT_ to distinguish them from
QUIC_DFLT_* value even if there are not numerous.
Furthermore ->max_udp_payload_size must be first initialized to
QUIC_TP_DFLT_MAX_UDP_PAYLOAD_SIZE especially for received value.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
2674098569 MINOR: quic: Tunable "initial_max_streams_bidi" transport parameter
Add tunable "tune.quic.frontend.max_streams_bidi" setting for QUIC frontends
to set the "initial_max_streams_bidi" transport parameter.
Add some documentation for this new setting.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
1d96d6e024 MINOR: quic: Tunable "max_idle_timeout" transport parameter
Add two tunable settings both for backends and frontends "max_idle_timeout"
QUIC transport parameter, "tune.quic.frontend.max-idle-timeout" and
"tune.quic.backend.max-idle-timeout" respectively.
cfg_parse_quic_time() has been implemented to parse a time value thanks
to parse_time_err(). It should be reused for any tunable time value to be
parsed.
Add the documentation for this tunable setting only for frontend.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
c7785b5c26 MINOR: quic: Transport parameters dump
Add quic_transport_params_dump() static inline function to do so for
a quic_transport_parameters struct as parameter.
We use the trace API do dump these transport parameters both
after they have been initialized (RX/local) or received (TX/remote).
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
748ece68b8 MINOR: quic: QUIC transport parameters split.
Make the transport parameters be standlone as much as possible as
it consists only in encoding/decoding data into/from buffers.
Reduce the size of xprt_quic.h. Unfortunalety, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API
into this module.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
57ac3faed7 CLEANUP: quic: No more used handshake output buffer
->obuf quic_conn struct member is no more used.
2022-05-30 09:59:26 +02:00
Frédéric Lécaille
56f61b663b CLEANUP: quic: Useless QUIC_CONN_TX_BUF_SZ definition
This define is not more used.
2022-05-30 09:59:26 +02:00
Willy Tarreau
da59c895b9 CLEANUP: stconn: remove the new unneeded SE_FL_APP_MASK
The only two places where it was used was to carefully preserve the
SE_FL_WILL_CONSUME flag (since others are irrelevant there and the
previous RXBLK* flags moved to the stconn). Now that the flag is
cleared by default there's no need to re-created a fresh new one
when replacing the descriptor, so we can eliminate that remaining
trick.
2022-05-27 19:33:35 +02:00
Willy Tarreau
e4ebe261b1 MINOR: stconn: turn SE_FL_WILL_CONSUME to SE_FL_WONT_CONSUME
This flag was the only remaining one that was inverted as a blocking
condition, requiring special handling to preset it on sedesc allocation.
Let's flip it in its definition and accessors.
2022-05-27 19:33:35 +02:00
Willy Tarreau
d7b7e0df9a CLEANUP: mux-quic: rename the "endp" field to "sd"
The stream endpoint descriptor that was named "endp" is now called "sd"
both in the qcs struct and in the few functions using this.
2022-05-27 19:33:35 +02:00
Willy Tarreau
e68bc6178a CLEANUP: stconn: replace a few remaining occurrences of CS in comments or traces
A few "CS" desginating stconns were still present in code comments and
stream traces. This addresses them.
2022-05-27 19:33:35 +02:00
Willy Tarreau
1d2c79a53c CLEANUP: obj_type: rename OBJ_TYPE_CS to OBJ_TYPE_SC
Let's apply the new name to the type as well.
2022-05-27 19:33:35 +02:00
Willy Tarreau
df1a2fc234 CLEANUP: stream: rename stream_upgrade_from_cs() to stream_upgrade_from_sc()
It upgrades the protocol on a stream connector, let's update the name.
2022-05-27 19:33:35 +02:00
Willy Tarreau
c12b321661 CLEANUP: applet: rename appctx_cs() to appctx_sc()
It returns a stream connector, not a conn_stream anymore, so let's
fix its name.
2022-05-27 19:33:35 +02:00
Willy Tarreau
caff631bc0 CLEANUP: stats: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. Both the core functions and the ones in the
resolvers files were updated.
2022-05-27 19:33:35 +02:00
Willy Tarreau
b49672d21f CLEANUP: stream: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The HTTP analyser and the backend functions
were all updated after being reviewed. Function stream_update_both_cs()
was renamed to stream_update_both_sc()
2022-05-27 19:33:35 +02:00
Willy Tarreau
3215e731b6 CLEANUP: quic/h3: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The "nb_cs" stream-connector counter was
renamed to "nb_sc" and qc_attach_cs() was renamed to qc_attach_sc().
2022-05-27 19:33:35 +02:00
Willy Tarreau
0adb281fb0 CLEANUP: stconn: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The change is huge (~580 lines), so extreme
care was given not to change anything else.
2022-05-27 19:33:35 +02:00
Willy Tarreau
61f5675cb4 CLEANUP: connection: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion.
2022-05-27 19:33:35 +02:00
Willy Tarreau
bde14ad499 CLEANUP: check: rename all occurrences of stconn "cs" to "sc"
The check struct had a "cs" field renamed to "sc", which also required
a tiny update to a few functions using it to distinguish a check from
a stream (log.c, payload.c, ssl_sample.c, tcp_sample.c, tcpcheck.c,
connection.c).

Function arguments and local variables called "cs" were renamed to "sc".
The presence of one "cs=" in the debugging traces was also turned to
"sc=" for consistency.
2022-05-27 19:33:35 +02:00
Willy Tarreau
d137353ae3 CLEANUP: muxes: rename "get_first_cs" to "get_first_sc"
This is renamed both in the mux_ops descriptor and the mux functions
themselves to accommodate the new type name.
2022-05-27 19:33:35 +02:00
Willy Tarreau
cb086c6de1 REORG: stconn: rename conn_stream.{c,h} to stconn.{c,h}
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
2022-05-27 19:33:35 +02:00
Willy Tarreau
5edca2f0e1 REORG: rename cs_utils.h to sc_strm.h
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.

The alphabetical ordering of include files was preserved.
2022-05-27 19:33:35 +02:00
Willy Tarreau
74568cf023 CLEANUP: stconn: rename final state manipulation functions from cs_* to sc_*
This applies the following renaming. It's a bit large but pretty
mechanical:

cs_state -> sc_state  (enum)

cs_alloc_ibuf() -> sc_alloc_ibuf()
cs_is_conn_error() -> sc_is_conn_error()
cs_opposite() -> sc_opposite()
cs_report_error() -> sc_report_error()
cs_set_state() -> sc_set_state()
cs_state_bit() -> sc_state_bit()
cs_state_in() -> sc_state_in()
cs_state_str() -> sc_state_str()
2022-05-27 19:33:35 +02:00
Willy Tarreau
f61dd19284 CLEANUP: stconn: rename cs_{shut,chk}* to sc_*
This applies the following renaming:

cs_shutr() -> sc_shutr()
cs_shutw() -> sc_shutw()
cs_chk_rcv() -> sc_chk_rcv()
cs_chk_snd() -> sc_chk_snd()
cs_must_kill_conn() -> sc_must_kill_conn()
2022-05-27 19:33:35 +02:00
Willy Tarreau
d68ff018c5 CLEANUP: stconn: rename cs{,_get}_{src,dst} to sc_*
The following functions were renamed:

cs_src() -> sc_src()
cs_dst() -> sc_dst()
cs_get_src() -> sc_get_src()
cs_get_dst() -> sc_get_dst()
2022-05-27 19:33:35 +02:00
Willy Tarreau
19c65a9ded CLEANUP: stconn: rename remaining management functions from cs_* to sc_*
This is the end of the renaming for the generic SC management functions
and macros:

cs_applet_process() -> sc_applet_process()
cs_attach_applet()  -> sc_attach_applet()
cs_attach_mux()     -> sc_attach_mux()
cs_attach_strm()    -> sc_attach_strm()
cs_detach_app()     -> sc_detach_app()
cs_detach_endp()    -> sc_detach_endp()
cs_notify()         -> sc_notify()
cs_reset_endp()     -> sc_reset_endp()
cs_state_in()       -> sc_state_in()
cs_update()         -> sc_update()
cs_update_rx()      -> sc_update_rx()
cs_update_tx()      -> sc_update_tx()
IS_HTX_CS()         -> IS_HTX_SC()
2022-05-27 19:33:35 +02:00
Willy Tarreau
a0b58b537d CLEANUP: stconn: rename cs_{new,create,free,destroy}_* to sc_*
This renames the following functions:

cs_new_from_endp()  -> sc_new_from_endp()
cs_new_from_strm()  -> sc_new_from_strm()
cs_new_from_check() -> sc_new_from_check()
cs_applet_create()  -> sc_applet_create()
cs_destroy()        -> sc_destroy()
cs_free()           -> sc_free()
2022-05-27 19:33:35 +02:00
Willy Tarreau
90e8b455b7 CLEANUP: stconn: rename cs_cant_get() to se_need_more_data()
An equivalent applet_need_more_data() was added as well since that function
is mostly used from applet code. It makes it much clearer that the applet
is waiting for data from the stream layer.
2022-05-27 19:33:35 +02:00
Willy Tarreau
75a8f8e290 CLEANUP: stconn: rename cs_{want,stop}_get() to se_{will,wont}_consume()
These ones are essentially for the stream endpoint, let's give them a
name that matches the intent. Equivalent versions were provided in the
applet namespace to ease code legibility.
2022-05-27 19:33:35 +02:00
Willy Tarreau
9f07b697ee CLEANUP: stconn: remove cs_tx_blocked() and cs_tx_endp_ready()
These ones were used exactly once and together, in sc_is_send_allowed().
No need to give them confusing names, instead let's just put the flags,
they're way more explicit, and drop the two functions.
2022-05-27 19:33:35 +02:00
Willy Tarreau
79cf6e1f15 CLEANUP: stconn: rename SE_FL_WANT_GET to SE_FL_WILL_CONSUME
This flag indicates the that stream endpoint is willing to consume output
data from the stream. Its new name makes this more explicit. The function
names will be updated accordingly, which will remove the disturbing "get"
everywhere.
2022-05-27 19:33:35 +02:00
Willy Tarreau
15252cd9c0 MEDIUM: stconn: move the RXBLK flags to the stream connector
The following flags are not at all related to the endpoint but to the
connector itself:
  - SE_FL_RXBLK_ROOM
  - SE_FL_RXBLK_BUFF
  - SE_FL_RXBLK_CHAN

As such they have no business staying in the endpoint descriptor and
they must move to the stream connector. They've also been renamed
accordingly to better match what they correspond to (the same name
as the function that sets them).

The rare occurrences of cs_rx_blocked() were replaced by an explicit
test on the list of flags. The reason is that cs_rx_blocked() used to
preserve some tests that are not needed at certain places since already
known. For the same reason SE_FL_RXBLK_ANY wasn't converted. As such it
will later be possible to carefully review these few locations and
eliminate the unneeded flags from the tests. No particular function
was made to test them since they're explicit enough.

It now looks like ci_putchk() and friends could very well place the flag
themselves on the connector when they detect a buffer full condition, as
this would significantly simplify the high-level API. But all usages must
first be reviewed before this simplification can be done. For now it
remains done by applet_put*() instead.
2022-05-27 19:33:35 +02:00
Willy Tarreau
8c02f8de14 CLEANUP: stconn: rename SE_FL_RX_WAIT_EP to SE_FL_HAVE_NO_DATA
It's more explicit this way. The cs_rx_endp_ready() function could be
removed so that the flag is directly tested. In the future it should
be inverted and the few places where it's set (or preserved via
SE_FL_APP_MASK) could be dropped.
2022-05-27 19:33:35 +02:00
Willy Tarreau
13d63afacd MINOR: stconn: add sc_is_recv_allowed() to check for ability to receive
At plenty of places we combine multiple flags checks to determine if we
can receive (endp_ready, rx_blocked, cf_shutr etc). Let's group them
under a single function that is meant to replace existing tests.

Some tests were only checking the rxblk flags at the connection level,
so for now they were not converted, this requires a bit of auditing
first, and probably a test to determine whether or not to check for
cf_shutr (e.g. there is none if no stream is present).
2022-05-27 19:33:35 +02:00
Willy Tarreau
4164eb94f3 MINOR: stconn: start to rename cs_rx_endp_{more,done}() to se_have_{no_,}more_data()
The analysis of cs_rx_endp_more() showed that the purpose is for a stream
endpoint to inform the connector that it's ready to deliver more data to
that one, and conversely cs_rx_endp_done() that it's done delivering data
so it should not be bothered again for this.

This was modified two ways:
  - the operation is no longer performed on the connector but on the
    endpoint so that there is no more doubt when reading applet code
    about what this rx refers to; it's the endpoint that has more or
    no more data.

  - an applet implementation is also provided and mostly used from
    applet code since it saves the caller from having to access the
    endpoint descriptor.

It's visible that the flag ought to be inverted because some places
have to set it by default for no reason.
2022-05-27 19:33:35 +02:00
Willy Tarreau
0ed73c376c CLEANUP: stconn: rename cs_rx_buff_{blk,rdy} to sc_{need,have}_buff()
These functions are used by the application layer to disable or enable
reading at the stream connector's level when the input buffer failed to
be allocated (or was finally allocated). The new names makes things
clearer.
2022-05-27 19:33:35 +02:00
Willy Tarreau
9512ab6e00 CLEANUP: stconn: rename cs_rx_chan_{blk,rdy} to sc_{wont,will}_read()
These functions were used by the channel to inform the lower layer
whether reading was acceptable or not. Usually this directly mimmicks
the CF_DONT_READ flag from the channel, which may be set when it's
desired not to buffer incoming data that will not be processed, or
that the buffer wants to be flushed before starting to read again,
or that bandwidth limiting might be enforced, etc. It's always a
policy reason, not a purely resource-based one.
2022-05-27 19:33:35 +02:00
Willy Tarreau
99615ed85d CLEANUP: stconn: rename cs_rx_room_{blk,rdy} to sc_{need,have}_room()
The new name mor eclearly indicates that a stream connector cannot make
any more progress because it needs room in the channel buffer, or that
it may be unblocked because the buffer now has more room available. The
testing function is sc_waiting_room(). This is mostly used by applets.
Note that the flags will change soon.
2022-05-27 19:33:35 +02:00
Willy Tarreau
b73262fc85 MEDIUM: stconn: take SE_FL_APPLET_NEED_CONN out of the RXBLK_ANY flags
This makes SE_FL_APPLET_NEED_CONN autonomous, in that we check for it
everywhere we have a relevant cs_rx_blocked(), so that the flag doesn't
need anymore to be covered by cs_rx_blocked(). Indeed, this flag doesn't
really translate a receive blocking condition but rather a refusal to
wake up an applet that is waiting for a connection to finish to setup.

This also ensures we will not risk to set it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK, because the flag being
specific to the endpoint only and not to the connector, we don't
want to preserve it when replacing the endpoint.

It's possible that cs_chk_rcv() could later be further simplified if
we can demonstrate that the two tests in it can be merged.
2022-05-27 19:33:35 +02:00
Willy Tarreau
b23edc8b8d MINOR: stconn: rename SE_FL_RXBLK_CONN to SE_FL_APPLET_NEED_CONN
This flag is exclusively used when a front applet needs to wait for the
other side to connect (or fail to). Let's give it a more explicit name
and remove the ambiguous function that was used only once.

This also ensures we will not risk to set it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK, because the flag being
specific to the endpoint only and not to the connector, we don't
want to preserve it when replacing the endpoint.
2022-05-27 19:33:35 +02:00
Willy Tarreau
676c8db134 MEDIUM: stconn: remove SE_FL_RXBLK_SHUT
This flag is no more needed, it was only set on shut read to be tested
by cs_rx_blocked() which is now properly tested for shutr as well. The
cs_rx_blk_shut() calls were removed. Interestingly it allowed to remove
a special case in the L7 retry code.

This also ensures we will not risk to set it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK.
2022-05-27 19:33:35 +02:00
Willy Tarreau
e7866b1ff7 MEDIUM: stconn: always rely on CF_SHUTR in addition to cs_rx_blocked()
One flag (RXBLK_SHUT) is always set with CF_SHUTR, so in order to remove
it, we first need to make sure we always check for CF_SHUTR where
cs_rx_blocked() is being used.
2022-05-27 19:33:35 +02:00
Willy Tarreau
516621bbe6 MINOR: stconn: remove calls to cs_done_get()
It was only called after setting SHUTW on the output channel, and since
it's now handled by sc_is_send_allowed() we don't need it anymore.
2022-05-27 19:33:34 +02:00
Willy Tarreau
a1547ce0a0 MINOR: stconn: consider CF_SHUTW for sc_is_send_allowed()
When a shutdown(WR) is performed, send is no longer allowed, and that is
currently handled by the explicit cs_done_get() in the various shutw()
calls. That's a bit ugly and complicated for no reason, let's simply
integrate the test of SHUTW in sc_is_send_allowed().

Note that the test could also be added wherever sc_is_send_allowed() is
used but for now proceeding like this limits the changes.
2022-05-27 19:33:34 +02:00
Willy Tarreau
902ba7e2bc CLEANUP: stconn: use a single function to know if SC may send to SE
sc_is_send_allowed() is now used everywhere instead of the combination
of cs_tx_endp_ready() && !cs_tx_blocked(). There's no place where we
need them individually thus it's simpler. The test was placed in cs_util
as we'll complete it later.
2022-05-27 19:33:34 +02:00
Willy Tarreau
6001c9217c CLEANUP: stconn: make a few functions take a const argument
A number of functions in cs_utils.h are not usable from functions taking
a const because they're not declared as using const, despite never
modifying the stconn. Let's address this for the following ones:

  sc_ic(), sc_oc(), sc_ib(), sc_ob(), sc_strm_task(),
  cs_opposite(), sc_conn_ready(), cs_src(), cs_dst(),
2022-05-27 19:33:34 +02:00
Willy Tarreau
967955b156 CLEANUP: stconn: rename cs_ep_set_error() to se_fl_set_error()
First it applies to the stream endpoint and not the conn_stream, and
second it only tests and touches the flags so it makes sense to call
it se_fl_ like other functions which only manipulate the flags, as
it's just a special case of flags.
2022-05-27 19:33:34 +02:00
Willy Tarreau
108423819c CLEANUP: stconn: rename cs_conn_get_first() to conn_get_first_sc()
It returns an stconn from a connection and not the opposite, so the name
change was more appropriate. In addition it was moved to connection.h
which manipulates the connection stuff, and it happens that only
connection.c uses it.
2022-05-27 19:33:34 +02:00
Willy Tarreau
462b989d4c CLEANUP: stconn: rename cs_conn_*() to sc_conn_*()
The following functions which act on a connection-based stream connector
were renamed to sc_conn_* (~60 places):

  cs_conn_drain_and_shut
  cs_conn_process
  cs_conn_read0
  cs_conn_ready
  cs_conn_recv
  cs_conn_send
  cs_conn_shut
  cs_conn_shutr
  cs_conn_shutw
2022-05-27 19:33:34 +02:00
Willy Tarreau
f8d0ab54ec CLEANUP: stconn: rename cs_get_data_name() to sc_get_data_name()
Only used twice to dump stream debug info.
2022-05-27 19:33:34 +02:00
Willy Tarreau
fa57cc7b20 CLEANUP: stconn: rename __cs_endp_target() to __sc_endp()
The function returns the real stream endpoint so since there's no more
confusion around the terminology, let's drop "target".
2022-05-27 19:33:34 +02:00
Willy Tarreau
8e7c6e6907 CLEANUP: stconn: rename cs_appctx() to sc_appctx()
Nothing special, just s/cs/sc/, roughly 50-60 entries.
2022-05-27 19:33:34 +02:00
Willy Tarreau
417a31bb55 CLEANUP: stconn: rename cs_conn_mux() to sc_mux_ops()
This effectively returns the mux_ops from the connection when it exists
on an stconn.
2022-05-27 19:33:34 +02:00
Willy Tarreau
6fe2b42e45 CLEANUP: stconn: rename cs_mux() to sc_mux_strm()
The function doesn't return a pointer to the mux but to the mux stream
(h1s, h2s etc). Let's adjust its name to reflect this. It's rarely used,
the name can be enlarged a bit. And of course s/cs/sc to accommodate for
the updated name.
2022-05-27 19:33:34 +02:00
Willy Tarreau
fd9417ba3f CLEANUP: stconn: rename cs_conn() to sc_conn()
It's mostly used from upper layers. Both the checked and unchecked
functions were updated, or ~150 entries.
2022-05-27 19:33:34 +02:00
Willy Tarreau
ea27f48c5a CLEANUP: stconn: rename cs_{check,strm,strm_task} to sc_strm_*
These functions return the app-layer associated with an stconn, which
is a check, a stream or a stream's task. They're used a lot to access
channels, flags and for waking up tasks. Let's just name them
appropriately for the stream connector.
2022-05-27 19:33:34 +02:00
Willy Tarreau
40a9c32e3a CLEANUP: stconn: rename cs_{i,o}{b,c} to sc_{i,o}{b,c}
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
2022-05-27 19:33:34 +02:00
Willy Tarreau
15c25d5e1d MINOR: applet: add new wrappers to put chk/blk/str/chr to channel from appctx
The vast majority of calls to ci_putchk() etc are performed from applets
which directly know an endpoint. Figuring the correct API (writing into
input channel etc) isn't trivial for newcomers, and knowing that they
must mark the flag indicating a buffer full condition isn't trivial
either.

Here we're adding wrappers to these functions but to be used directly
from the appctx. That's already what is being done in multiple steps in
the applet code, where the endp is derived from the appctx, then the cs
from the endp, then the stream from the cs, then the channel from the
stream, and so on. But this time the function doesn't require to know
much of the internals, applet_putchr() writes a char from the appctx,
and marks the buffer full if needed. Period. This will allow to remove
a significant amount of obscure ci_putchk() and cs_ic() calls from the
code, hence a significant number of possible mistakes.
2022-05-27 19:33:34 +02:00
Willy Tarreau
2f2318df87 MEDIUM: stconn: merge the app_ops and the data_cb fields
For historical reasons (stream-interface and connections), we used to
require two independent fields for the application level callbacks and
the transport-level functions. Over time the distinction faded away so
much that the low-level functions became specific to the application
and conversely. For example, applets may only work with streams on top
since they rely on the channels, and the stream-level functions differ
between applets and connections. Right now the application level only
contains a wake() callback and the low-level ones contain the functions
that act at the lower level to perform the shutr/shutw and at the upper
level to notify about readability and writability. Let's just merge them
together into a single set and get rid of this confusing distinction.
Note that the check ops do not define any app-level function since these
are only called by streams.
2022-05-27 19:33:34 +02:00
Willy Tarreau
c086960a03 MINOR: conn_stream: test the various ops functions before calling them
We currently call all ->shutr, ->chk_snd etc from ->ops unconditionally,
while the ->wake() call from data_cb is checked. Better check ops as
well for consistency, this will help get them merged.
2022-05-27 19:33:34 +02:00
Willy Tarreau
f3ae34b67d MINOR: check: export wake_srv_chk()
We'll need it to centralize the stream connectors definitions.
2022-05-27 19:33:34 +02:00
Willy Tarreau
026e8fb290 CLEANUP: stconn: tree-wide rename stconn states CS_ST/SB_* to SC_ST/SB_*
This also follows the natural naming. There are roughly 238 changes, all
totally trivial. conn_stream-t.h has become completely void of any
"conn_stream" related stuff now (except its name).
2022-05-27 19:33:34 +02:00
Willy Tarreau
cb04166525 CLEANUP: stconn: tree-wide rename stream connector flags CS_FL_* to SC_FL_*
This follows the natural naming. There are roughly 100 changes, all
totally trivial.
2022-05-27 19:33:34 +02:00
Willy Tarreau
7cb9e6c6ba CLEANUP: stream: rename "csf" and "csb" to "scf" and "scb"
These are the stream connectors, let's give them consistent names. The
patch is large (405 locations) but totally trivial.
2022-05-27 19:33:34 +02:00
Willy Tarreau
c105492bf5 CLEANUP: stdesc: rename the stream connector ->cs field to ->sc
This is a rename of this field. Most of the places were in muxes, but
were already factored with the previous series adding *_sc().
2022-05-27 19:33:34 +02:00
Willy Tarreau
4596fe20d9 CLEANUP: conn_stream: tree-wide rename to stconn (stream connector)
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
2022-05-27 19:33:34 +02:00
Willy Tarreau
3a3f480d15 CLEANUP: conn_stream: rename cs_app_* to sc_app_*
Let's start to introduce the stream connector at the app_ops level.
This is entirely self-contained into conn_stream.c. The functions
were also updated to reflect the new name, and the comments were
updated.
2022-05-27 19:33:34 +02:00
Willy Tarreau
798465b02c CLEANUP: conn_stream: rename the conn_stream's endp to sedesc
Just like for the appctx, this is a pointer to a stream endpoint descriptor,
so let's make this explicit and not confuse it with the full endpoint. There
are very few changes thanks to the preliminary refactoring of the flags
manipulation.
2022-05-27 19:33:34 +02:00
Willy Tarreau
d869e13ed8 CLEANUP: applet: rename the sedesc pointer from "endp" to "sedesc"
Now at least it makes it obvious that it's the stream endpoint descriptor
and not an endpoint. There were few changes thanks to the previous refactor
of the flags.
2022-05-27 19:33:34 +02:00
Willy Tarreau
ea59b0201c CLEANUP: conn_stream: rename cs_endpoint to sedesc (stream endpoint descriptor)
After some discussion we found that the cs_endpoint was precisely the
descriptor for a stream endpoint, hence the naturally coming name,
stream endpoint constructor.

This patch renames only the type everywhere and the new/init/free functions
to remain consistent with it. Future patches will address field names and
argument names in various code areas.
2022-05-27 19:33:34 +02:00
Willy Tarreau
65d0597b2b CLEANUP: conn_stream: rename the cs_endpoint's target to "se"
That's the "stream endpoint" pointer. Let's change it now while it's
not much spread. The function __cs_endp_target() wasn't yet renamed
because that will change more globally soon.
2022-05-27 19:33:34 +02:00
Willy Tarreau
b605c4213f CLEANUP: conn_stream: rename the stream endpoint flags CS_EP_* to SE_FL_*
Let's now use the new flag names for the stream endpoint.
2022-05-27 19:33:34 +02:00
Willy Tarreau
d56377c5eb CLEANUP: conn_stream: apply endp_flags.cocci tree-wide
This changes all main uses of endp->flags to the se_fl_*() equivalent
by applying coccinelle script endp_flags.cocci. The se_fl_*() functions
themselves were manually excluded from the change, of course.

Note: 144 locations were touched, manually reviewed and found to be OK.

The script was applied with all includes:

  spatch --in-place --recursive-includes -I include --sp-file $script $files
2022-05-27 19:33:34 +02:00
Willy Tarreau
0cfcc40812 CLEANUP: conn_stream: apply cs_endp_flags.cocci tree-wide
This changes all main uses of cs->endp->flags to the sc_ep_*() equivalent
by applying coccinelle script cs_endp_flags.cocci.

Note: 143 locations were touched, manually reviewed and found to be OK,
except a single one that was adjusted in cs_reset_endp() where the flags
are read and filtered to be used as-is and not as a boolean, hence was
replaced with sc_ep_get() & $FLAGS.

The script was applied with all includes:

  spatch --in-place --recursive-includes -I include --sp-file $script $files
2022-05-27 19:33:34 +02:00
Willy Tarreau
cd1d585e53 MINOR: conn_stream: add new sets of functions to set/get endpoint flags
At plenty of places we need to manipulate the conn_stream's endpoint just
to set or clear a flag. This patch adds a handful of functions to perform
the common operations (clr/set/get etc) on these flags at both the endpoint
and at the conn_stream level.

The functions were named after the target names, i.e. se_fl_*() to act on
the stream endpoint flags, and sc_ep_* to manipulate the endpoint flags
from the stream connector (currently conn_stream).

For now they're not used.
2022-05-27 19:33:34 +02:00
Willy Tarreau
24d15b1891 CLEANUP: conn_stream: rename the cs_endpoint's context to "conn"
This one is exclusively used by the connection, regardless its generic
name "ctx" is rather confusing. Let's make it a struct connection* and
call it "conn". This way there's no doubt about what it is and there's
no way it will be used by accident by being taken for something else.
2022-05-27 19:33:34 +02:00
Christopher Faulet
a45403f965 Revert "BUG/MINOR: task: Don't defer tasks release when HAProxy is stopping"
This reverts commit d9404b464f.

In fact, there is a BUG_ON() in __task_free() function to be sure the task
is no longer in the wait-queue or the run-queue. Because the patch tries to
fix a "leak" on deinit, it is safer to revert it. there is no reason to
introduce potential bug for this kind of issues. And there is no reason to
impact the normal use-cases at runtime with additionnal conditions to only
remove a task on deinit.
2022-05-25 16:41:52 +02:00
Amaury Denoyelle
8c6176b8db MINOR: h3: refactor SETTINGS parsing/error reporting
Bring some improvment to h3_parse_settings_frm() function. The first one
is the parsing which now manipulates a buffer instead of a plain char*.
This is more to unify with other parsing functions rather than dealing
with data wrapping : it's unlikely to happen as SETTINGS is only
received as the first frame on the control STREAM.

Various errors are now properly reported as connection error :
* on incomplete frame payload
* on a duplicated settings in the same frame
* on reserved settings receive
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
849b24f15b MINOR: h3: abort read on unknown uni stream
As specified by HTTP/3 draft, an unknown unidirectional stream can be
aborted. To do this, use a new flag QC_SF_READ_ABORTED. When the MUX
detects this flag, QCS instance is automatically freed.

Previously, such streams were instead automatically drained. By aborting
them, we economize some useless memcpy instruction. On future data
reception, QCS instance is not found in the tree and considered as
already closed. The frame payload is thus deleted without copying it.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
9cc475182c CLEANUP: h3: remove h3 uni tasklet
Remove all unnecessary bits of code for H3 unidirectional streams. Most
notable, an individual tasklet is not require anymore for each stream.
This is useless since the merge of RX/TX uni streams handling with
bidirectional streams code.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
f8db5aaf78 MEDIUM: quic: refactor uni streams RX
The whole QUIC stack is impacted by this change :
* at quic-conn level, a single function is now used to handle uni and
  bidirectional streams. It uses qcc_recv() function from MUX.
* at MUX level, qc_recv() io-handler function does not skip uni streams
* most changes are conducted at app layer. Most notably, all received
  data is handle by decode_qcs operation.

Now that decode_qcs is the single app read function, the H3 layer can be
simplified. Uni streams parsing was extracted from h3_attach_ruqs() to
h3_decode_qcs().

h3_decode_qcs() is able to deal with all HTTP/3 frame types. It first
check if the frame is valid for the H3 stream type. Most notably,
SETTINGS parsing was moved from h3_control_recv() into h3_decode_qcs().

This commit has some major benefits besides removing duplicated code.
Mainly, QUIC flow control is now enforced for uni streams as with bidi
streams. Also, an unknown frame received on control stream does not set
an error : it is now silently ignored as required by the specification.

Some cleaning in H3 code is already done with this patch :
h3_control_recv() and h3_attach_ruqs() are removed as they are now
unused. A final patch should clean up the unneeded remaining bit.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
3236a8e85c MINOR: h3: define stream type
Define a new enum h3s_t. This is used to differentiate between the
different stream types used in a HTTP/3 connection, including the QPACK
encoder/decoder streams.

For the moment, only bidirectional streams is positioned. This patch
will be useful to unify reception of uni streams with bidirectional
ones.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
6b92394973 MINOR: h3/qpack: use qcs as type in decode callbacks
Replace h3_uqs type by qcs in stream callbacks. This change is done in
the context of unification between bidi and uni-streams. h3_uqs type
will be unneeded when this is achieved.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
f9e190e49a MINOR: quic: support CONNECTION_CLOSE_APP emission
Complete quic-conn API for error reporting. A new parameter <app> is
defined in the function quic_set_connection_close(). This will transform
the frame into a CONNECTION_CLOSE_APP type.

This type of frame will be generated by the applicative layer, h3 or
hq-interop for the moment. A new function qcc_emit_cc_app() is exported
by the MUX layer for them.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
081479df92 CLEANUP: h3: rename uni stream type constants
Cosmetic fix which reduce the name of unidirectional stream constants.
No impact on the code.
2022-05-25 15:41:25 +02:00
Amaury Denoyelle
1c25b18e17 MINOR: mux-quic: delay cs_endpoint allocation
Do not allocate cs_endpoint for every QCS instances in qcs_new().
Instead, this is delayed to qc_attach_cs() function.

In effect, with H3 as app protocol, cs_endpoint will be allocated on
HEADERS parsing. Thus, no cs_endpoint is allocated for H3 unidirectional
streams which do not convey any HTTP data.
2022-05-25 15:41:25 +02:00
Christopher Faulet
d9404b464f BUG/MINOR: task: Don't defer tasks release when HAProxy is stopping
A running or queued task is not released when task_destroy() is called,
except if it is the current task. Its process function is set to NULL and we
let the scheduler to release the task. However, when HAProxy is stopping, it
never happens and some tasks may leak. To fix the issue, we now also rely on
the global MODE_STOPPING flag. When this flag is set, the task is always
immediately released.

This patch should fix the issue #1714. It could be backported as far as 2.4
but it's not a real problem in practice because it only happens on
deinit. The leak exists on previous versions but not MODE_STOPPING flag.
2022-05-25 15:31:21 +02:00
David CARLIER
842e4a6617 BUILD/MINOR: cpuset fix build for FreeBSD 13.1
the cpuset api changes done fir the future 14 release had been
backported to the 13.1 release so changing the cpuset api of choice
condition change accordingly.
2022-05-20 23:06:03 +02:00
Willy Tarreau
b5821e12ce MINOR: connection: add flag MX_FL_FRAMED to mark muxes relying on framed xprt
In order to be able to check compatibility between muxes and transport
layers, we'll need a new flag to tag muxes that work on framed transport
layers like QUIC. Only QUIC has this flag now.
2022-05-20 18:41:55 +02:00
Willy Tarreau
91b780a455 CLEANUP: listener: store stream vs dgram at the bind_conf level
Let's collect the set of xprt-level and sock-level dgram/stream protocols
seen on a bind line and store that in the bind_conf itself while they're
being parsed. This will make it much easier to detect incompatibilities
later than the current approch which consists in scanning all listeners
in post-parsing.
2022-05-20 18:41:55 +02:00
Willy Tarreau
787e92a4fb CLEANUP: listener: replace bind_conf->quic_force_retry with BC_O_QUIC_FORCE_RETRY
It was only set and used once, let's replace it now and take it out of
the ifdef.
2022-05-20 18:41:51 +02:00
Willy Tarreau
1ea6e6a17f CLEANUP: listener: replace bind_conf->generate_cers with BC_O_GENERATE_CERTS
The new flag will now replace this boolean variable.
2022-05-20 18:39:43 +02:00
Willy Tarreau
11ba404c6b CLEANUP: listener: replace all uses of bind_conf->is_ssl with BC_O_USE_SSL
The new flag will now replace this boolean variable that was only set and
tested.
2022-05-20 18:39:43 +02:00
Willy Tarreau
c694471b21 MINOR: listener: add a new "options" entry in bind_conf
There is no way to store useful info there, yet there's about one entry
per boolean. Let's add an "options" attribute which will collect various
options.

In practice, even the BC_O_SSL_* flags and a few info such as strict_sni
could move there.
2022-05-20 18:39:43 +02:00
Willy Tarreau
fca044bda5 CLEANUP: listener: add a comment about what the BC_SSL_O_* flags are for
They're for ->ssl_options but it wasn't obvious.
2022-05-20 18:39:43 +02:00
Willy Tarreau
3882d2a96c MINOR: listener: provide a function to process all of a bind_conf's arguments
The "bind" parsing code was duplicated for the peers section and as a
result it wasn't kept updated, resulting in slightly different error
behavior (e.g. errors were not freed, warnings were emitted as alerts)
Let's first unify it into a new dedicated function that properly reports
and frees the error.
2022-05-20 18:39:43 +02:00
Willy Tarreau
91b47263f7 MINOR: protocol: replace ctrl_type with xprt_type and clarify it
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) denaturate the underlying control layer, except for QUIC
which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.

Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
2022-05-20 18:39:43 +02:00
Amaury Denoyelle
d46b0f52ae MINOR: mux-quic: emit FLOW_CONTROL_ERROR
Send a CONNECTION_CLOSE if the peer emits more data than authorized by
our flow-control. This is implemented for both stream and connection
level.

Fields have been added in qcc/qcs structures to differentiate received
offsets for limit enforcing with consumed offsets for sending of
MAX_DATA/MAX_STREAM_DATA frames.
2022-05-20 17:47:09 +02:00
Amaury Denoyelle
9fab9fd7e5 MINOR: quic/mux-quic: define CONNECTION_CLOSE send API
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which require to close the
whole connection.

On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable add future send operations.
2022-05-20 17:26:56 +02:00
Frédéric Lécaille
9286210aa8 MINOR: quic: Add tune.quic.retry-threshold keyword
This QUIC specific keyword may be used to set the theshold, in number of
connection openings, beyond which QUIC Retry feature will be automatically
enabled. Its default value is 100.
2022-05-20 17:11:13 +02:00
Frédéric Lécaille
cbd59c7ab6 MINOR: quic: QUIC stats counters handling
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake was successful (handshake completed
state) or failed or when the connection timed out without reaching the handshake
completed state.
2022-05-20 17:11:13 +02:00
Frédéric Lécaille
a89659a752 MINOR: quic: Attach proxy QUIC stats counters to the QUIC connection
Make usage of EXTRA_COUNTERS_GET() do to so from qc_new_conn().
2022-05-20 17:11:13 +02:00
Frédéric Lécaille
a58cafeb89 MINOR: quic_stats: Add a new stats module for QUIC
This is a very minimalist frontend only stats module with only one gauge for the
QUIC establishing connections count.
2022-05-20 17:11:13 +02:00
Frédéric Lécaille
2822593a12 BUILD: stats: Missing headers inclusions from stats.h
If we add a new stats module to C source files including only
stats.h we get these errors:

    include/haproxy/stats.h:39:31: error: array type has incomplete element type
    ‘struct name_desc’
       39 | extern const struct name_desc stat_fields[];

    include/haproxy/stats.h:55:50: warning: ‘struct listener’ declared inside
    parameter list will not be visible outside of this definition or declaration
       55 | int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags,

name_desc struct is defined in tools-t.h and listener struct in listner-t.h.
2022-05-20 16:57:12 +02:00
Frédéric Lécaille
6492e66e41 MINOR: quic: Move quic_lstnr_dgram_dispatch() out of xprt_quic.c
Remove this function from xprt_quic.c which for now implements only
"by thread attached to a connection" code.
2022-05-20 16:57:12 +02:00
Frédéric Lécaille
3f3ff47998 MINOR: quic: Retry implementation
Here is the format of a token:
        - format (1 byte)
        - ODCID (from 9 up 21 bytes)
        - creation timestamp (4 bytes)
        - salt (16 bytes)

A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.

The Retry token is ciphered after having derived a strong secret from the cluster secret
and generated the AEAD AAD, as well as a 16 bytes long salt. This salt is
added to the token. Obviously it is not ciphered. The format byte is not
ciphered too.

The AAD are built by quic_generate_retry_token_aad() which concatenates the version,
the client SCID and the IP address and port. We had to implement quic_saddr_cpy()
to copy the IP address and port to the AAD buffer. Only the Retry SCID is generated
on our side to build a Retry packet, the others fields come from the first packet
received by the client. It must reuse this Retry SCID in response to our Retry packet.
So, we have not to store it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.

quic_generate_retry_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives into an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the token
to store the ODCID into a local quic_cid struct variable. Finally this ODCID may
be stored into the transport parameter thanks to qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_generate_retry_check(). If quic_generate_retry_check() fails, the received
packet is dropped without anymore packet processing at this time.
2022-05-20 16:57:12 +02:00
Frédéric Lécaille
55367c8679 MINOR: quic_tls: Add quic_tls_decrypt2() implementation
This function does exactly the same thing as quic_tls_decrypt(), except that
it does reuse its input buffer as output buffer. This is needed
to decrypt the Retry token without modifying the packet buffer which
contains this token. Indeed, this would prevent us from decryption
the packet itself as the token belong to the AEAD AAD for the packet.
2022-05-20 16:57:12 +02:00
Frédéric Lécaille
a9c5d8da58 MINOR: quic_tls: Add quic_tls_derive_retry_token_secret()
This function must be used to derive strong secrets from a non pseudo-random
secret (cluster-secret setting in our case) and an IV. First it call
quic_hkdf_extract_and_expand() to do that for a temporary strong secret (tmpkey)
then two calls to quic_hkdf_expand() reusing this strong temporary secret
to derive the final strong secret and IV.
2022-05-20 16:57:12 +02:00
Frédéric Lécaille
359d877f73 MINOR: quic: Dump initial derived secrets
It seems <qc> parameters was removed for an unknown reason preventing
these secrets to dumped by the traces.
2022-05-20 16:57:12 +02:00
Amaury Denoyelle
fe1c785bcc CLEANUP: quic: adjust comment/coding style for TPs init
Fix typo in comment and adjust code alignment for better readability.
2022-05-19 17:40:09 +02:00
Amaury Denoyelle
0daef007e4 BUG/MEDIUM: quic: fix initialization for local/remote TPs
The local and remote TPs were both processed through the same function
quic_transport_params_init(). This caused the remote TPs to be
overwritten with values configured for our local usage.

Change this by reserving quic_transport_params_init() only for our local
TPs. Remote TPs are simply initialized via
quic_dflt_transport_params_cpy().

This bug could result in a connection closed in error by the client due
to a violation of its TPs. For example, curl client closed the
connection after receiving too many CONNECTION_ID due to an invalid
active_connection_id value used.
2022-05-19 17:40:09 +02:00
Christopher Faulet
c95eaefbfd MEDIUM: check: Use the CS to handle subscriptions for read/write events
Instead of using the health-check to subscribe to read/write events, we now
rely on the conn-stream. Indeed, on the server side, the conn-stream's
endpoint is a multiplexer. Thus it seems appropriate to handle subscriptions
for read/write events the same way than for the streams. Of course, the I/O
callback function is not the same. We use srv_chk_io_cb() instead of
cs_conn_io_cb().
2022-05-19 10:12:38 +02:00
Christopher Faulet
361417f9b4 REORG: check: Rename and export I/O callback function
event_srv_chk_io() function is renamed srv_chk_io_cb() to be consistant with
the I/O callback function of connections. In addition, this function is
exported. It will be required to use the conn-stream's subscriptions.
2022-05-19 10:12:38 +02:00
Amaury Denoyelle
c830e1e904 MINOR: mux-quic: implement MAX_DATA emission
This commit is similar to the previous one but deals with MAX_DATA for
connection-level data flow control. It uses the same function
qcc_consume_qcs() to update flow control level and generate a MAX_DATA
frame if needed.
2022-05-18 16:25:07 +02:00
Amaury Denoyelle
a977355aa1 MINOR: mux-quic: implement MAX_STREAM_DATA emission
Send MAX_STREAM_DATA frames when at least half of the allocated
flow-control has been demuxed, frame and cleared. This is necessary to
support QUIC STREAM with received data greater than a buffer.

Transcoders must use the new function qcc_consume_qcs() to empty the QCS
buffer. This will allow to monitor current flow-control level and
generate a MAX_STREAM_DATA frame if required. This frame will be emitted
via qc_io_cb().
2022-05-18 16:25:07 +02:00
Amaury Denoyelle
c985cb167d MINOR: mux-quic: reorganize flow-control frames emission
Adjust the mechanism for MAX_STREAMS_BIDI emission. When a bidirectional
stream is removed, current flow-control level is checked. If needed, a
MAX_STREAMS_BIDI frame is generated and inserted in a new list in the
QCS instance. The new frames will be emitted at the start of qc_send().

This has no impact on the current MAX_STREAMS_BIDI behavior. However,
this mechanism is more flexible and will allow to implement quickly
MAX_STREAM_DATA/MAX_DATA emission.
2022-05-18 15:52:44 +02:00
Amaury Denoyelle
3a0864067a MINOR: mux-quic: remove qcc_decode_qcs() call in XPRT
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible to call qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.

This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
2022-05-18 15:50:57 +02:00
Amaury Denoyelle
03dcf560ae BUG/MINOR: mux-quic: update session's idle delay before stream creation
This commit is an adaptation from the following patch :
  commit d0de677682
  Author: Willy Tarreau <w@1wt.eu>
  Date:   Fri Feb 4 09:05:37 2022 +0100
  BUG/MINOR: mux-h2: update the session's idle delay before creating the stream

This should fix the incorrect timeouts present in httplog format for
QUIC requests.
2022-05-18 15:30:13 +02:00
Amaury Denoyelle
ca21c768b9 MINOR: ncbuf: refactor ncb_advance()
First adjusted some typos in comments inside the function. Second,
change the naming of some variable to reduce confusion.

A special case has been inserted when advance is done inside a GAP block
and this block is the last of the buffer. In this case, the whole buffer
will be emptied, equivalent to a ncb_init() operation.
2022-05-18 15:30:13 +02:00
Amaury Denoyelle
82c51b561e OPTIM: quic: realign empty Rx buffer
quic_rx_pkts_del() function removes packets from QUIC RX buffer. In most
cases, the buffer will be emptied after it. In this case, it's useful to
realign it. This will avoid future data wrapping and use of an
unnecessary junk to fill a too small contiguous space.
2022-05-18 15:16:26 +02:00
Maciej Zdeb
d01be2ab13 MINOR: peers: Track number of applets run by thread
Maintain number of peers applets run on all threads. It will be used
in next patch for least loaded thread selection.
2022-05-17 16:13:22 +02:00
Christopher Faulet
d9c1d33fa1 MEDIUM: applet: Add support for async appctx startup on a thread subset
It is now possible to start an appctx on a thread subset. Some controls were
added here and there. It is forbidden to start a backend appctx on another
thread than the local one. If a frontend appctx is started on another thread
or a thread subset, the applet .init callback function must be defined. This
callback function is responsible to finalize the appctx startup. It can be
performed synchornously. In this case, the appctx is started on the local
thread. It is not really useful but it is valid. Or it can be performed
asynchronously. In this case, .init callback function is called when the
appctx is woken up for the first time. When this happens, the appctx
affinity is set to the current thread to be able to start the session and
the stream.
2022-05-17 16:13:22 +02:00
Christopher Faulet
6095d57701 MINOR: applet: Add API to start applet on a thread subset
In the same way than for the tasks, the applets api was changed to be able
to start a new appctx on a thread subset. For now the feature is
disabled. Only appctx_new_here() is working. But it will be possible to
start an appctx on a specific thread or a subset via a mask.
2022-05-17 16:13:22 +02:00
Christopher Faulet
387e79727c MINOR: peers: Add a ref to peers section in the peer structure
This change is required to handle asynchrone init of the appctx. It is now
possible to directly get the peers section associated to a peer.
2022-05-17 16:13:22 +02:00
Christopher Faulet
2ae25ea24b MINOR: sink: Add a ref to sink in the sink_forward_target structure
This change is required to be able to refactor the init stage of appctx. It
is now possible to directly get the sink from a forward target.
2022-05-17 16:13:22 +02:00
Christopher Faulet
d0c4ec04b8 MINOR: applet: Add function to release appctx on error during init stage
appctx_free_on_early_error() must be used to release a freshly created
frontend appctx if an error occurred during the init stage. It takes care to
release the stream instead of the appctx if it exists. For a backend appctx,
it just calls appctx_free().
2022-05-17 16:13:21 +02:00
Christopher Faulet
8718c95c0a MINOR: applet: Add a function to finalize frontend appctx startup
appctx_finalize_startup() may be used to finalize the frontend appctx
startup. It is responsible to create the appctx's session and the frontend
conn-stream. On error, it is the caller responsibility to release the
appctx. However, the session is released if it was created. On success, if
an error is encountered in the caller function, the stream must be released
instead of the appctx.

This function should ease the init stage when new appctx is created.
2022-05-17 16:13:21 +02:00
Christopher Faulet
16c0d9cda0 MINOR: applet: Add appctx_init() helper fnuction
It is just a helper function that call the .init applet callback function,
if it exists. This will simplify a bit the init stage when a new applet is
started. For now, this callback function is only used when a new service is
started.
2022-05-17 16:13:21 +02:00
Christopher Faulet
ab5d1dceed MINOR: stream: Export stream_free()
The stream_free() function is now public. It is mandatory to properly handle
errors when a new applet is started.
2022-05-17 16:13:21 +02:00
Christopher Faulet
c9929380a4 MINOR: applet: Change return value for .init callback function
0 is now returned on success and -1 on error.
2022-05-17 16:13:21 +02:00
Christopher Faulet
ac57bb527a MINOR: applet: Prepare appctx to own the session on frontend side
Applets were moved at the same level than multiplexers. Thus, gradually,
applets code is changed to be less dependent from the stream. With this
commit, the frontend appctx are ready to own the session. It means a
frontend appctx will be responsible to release the session.
2022-05-17 16:13:21 +02:00
Christopher Faulet
ef5e1bb4cf CLEANUP: conn-stream: Remove cs_applet_shut declaration from header file
This function was renamed and moved in applet code. cs_applet_shut() does
not exist anymore. Its declaration must be removed.
2022-05-17 16:13:21 +02:00
David Carlier
135c1ec139 BUILD: fix build warning on solaris based systems with __maybe_unused.
__maybe_unused is already defined there.
2022-05-17 11:42:20 +02:00
Remi Tricot-Le Breton
1746a388c5 MINOR: ssl: Add 'ssl-provider' global option
When HAProxy is linked to an OpenSSLv3 library, this option can be used
to load a provider during init. You can specify multiple ssl-provider
options, which will be loaded in the order they appear. This does not
prevent OpenSSL from parsing its own configuration file in which some
other providers might be specified.
A linked list of the providers loaded from the configuration file is
kept so that all those providers can be unloaded during cleanup. The
providers loaded directly by OpenSSL will be freed by OpenSSL.
2022-05-17 10:56:05 +02:00
Christopher Faulet
18c13d3bd8 MEDIUM: http-ana: Add a proxy option to restrict chars in request header names
The "http-restrict-req-hdr-names" option can now be set to restrict allowed
characters in the request header names to the "[a-zA-Z0-9-]" charset.

Idea of this option is to not send header names with non-alphanumeric or
hyphen character. It is especially important for FastCGI application because
all those characters are converted to underscore. For instance,
"X-Forwarded-For" and "X_Forwarded_For" are both converted to
"HTTP_X_FORWARDED_FOR". So, header names can be mixed up by FastCGI
applications. And some HAProxy rules may be bypassed by mangling header
names. In addition, some non-HTTP compliant servers may incorrectly handle
requests when header names contain characters ouside the "[a-zA-Z0-9-]"
charset.

When this option is set, the policy must be specify:

  * preserve: It disables the filtering. It is the default mode for HTTP
              proxies with no FastCGI application configured.

  * delete: It removes request headers with a name containing a character
            outside the "[a-zA-Z0-9-]" charset. It is the default mode for
            HTTP backends with a configured FastCGI application.

  * reject: It rejects the request with a 403-Forbidden response if it
            contains a header name with a character outside the
            "[a-zA-Z0-9-]" charset.

The option is evaluated per-proxy and after http-request rules evaluation.

This patch may be backported to avoid any secuirty issue with FastCGI
application (so as far as 2.2).
2022-05-16 16:00:26 +02:00
Amaury Denoyelle
45fce8fcb5 CLEANUP: quic: remove unused quic_rx_strm_frm
quic_rx_strm_frm type was used to buffered STREAM frames received out of
order. Now the MUX is able to deal directly with these frames and
buffered it inside its ncbuf.
2022-05-13 17:29:52 +02:00
Amaury Denoyelle
00f87bbaa3 CLEANUP: mux-quic: remove unused fields for Rx
Rx has been simplified since the conversion of buffer to a ncbuf. The
old buffer can now be removed. The frms tree is also removed. It was
used previously to stored out-of-order received STREAM frames. Now the
MUX is able to buffer them directly into the ncbuf.
2022-05-13 17:29:52 +02:00
Amaury Denoyelle
1290f1ebfb MEDIUM: mux-quic/h3/hq-interop: use ncbuf for bidir streams
Add a ncbuf for data reception on qcs. Thanks to this, the MUX is able
to buffered all received frame directly into the buffer. Flow control
parameters will be used to ensure there is never an overflow.

This change will simplify Rx path with the future deletion of acked
frames tree previously used for frames out of order.
2022-05-13 17:28:46 +02:00
Amaury Denoyelle
06749f3d6f MINOR: xprt_quic: adjust flow-control according to bufsize
Redefine the initial local flow-control to enforce by us. Use bufsize as
the maximum offset allowed to be received.

This change is part of an adjustement on the Rx path. Mux buffer will be
converted to a ncbuf. Flow-control parameters must ensure that we never
receive a frame larger than the buffer. With this, all received frames
will be stored in the MUX buffer.
2022-05-13 17:22:19 +02:00
Willy Tarreau
6796a06278 CLEANUP: conn_stream: merge cs_new_from_{mux,applet} into cs_new_from_endp()
The two functions became exact copies since there's no more special case
for the appctx owner. Let's merge them into a single one, that simplifies
the code.
2022-05-13 14:28:48 +02:00
Willy Tarreau
0698c80a58 CLEANUP: applet: remove the unneeded appctx->owner
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
2022-05-13 14:28:48 +02:00
Willy Tarreau
c1b8d77805 MINOR: applet: add appctx_strm() and appctx_cs() to access common fields
It's very common to have to access a stream or a conn_stream from the
appctx, let's add trivial accessors for that.
2022-05-13 14:28:48 +02:00
Willy Tarreau
1c3ead45a4 MINOR: applet: replace cs_applet_shut() with appctx_shut()
The former takes a conn_stream still attached to a valid appctx,
which also complicates the termination of the applet. Instead, let's
pass the appctx which already points to the endpoint, this allows us
to properly detach the conn_stream before the call, which is cleaner
and safer.
2022-05-13 14:28:48 +02:00
Willy Tarreau
4201ab791d CLEANUP: muxes: make mux->attach/detach take a conn_stream endpoint
The mux ->detach() function currently takes a conn_stream. This causes
an awkward situation where the caller cs_detach_endp() has to partially
mark it as released but not completely so that ->detach() finds its
endpoint and context, and it cannot be done later since it's possible
that ->detach() deletes the endpoint. As such the endpoint link between
the conn_stream and the mux's stream is in a transient situation while
we'd like it to be clean so that the mux's ->detach() code can call any
regular function it wants that knows the regular semantics of the
relation between the CS and the endpoint.

A better approach consists in slightly modifying the detach() API to
better match the reality, which is that the endpoint is detached but
still alive and that it's the only part the function is interested in.

As such, this patch modifies the function to take an endpoint there,
and by analogy (or simplicity) does the same for ->attach(), even
though it looks less important there since we're always attaching an
endpoint to a conn_stream anyway. It is possible that in the future
the API could evolve to use more endpoints that provide a bit more
flexibility in the API, but at this point we don't need to go further.
2022-05-13 14:28:48 +02:00
Willy Tarreau
01c2a4a86f MINOR: mux-quic: remove the now unneeded conn_stream from the qcs
Since we always have a valid endpoint we can safely use it to access
the conn_stream and stop using qcs->cs. That's one less pointer to
care about.
2022-05-13 14:28:48 +02:00
Willy Tarreau
efb4618c6e MINOR: conn_stream: add a pointer back to the cs from the endpoint
Muxes and applets need to have both a pointer to the endpoint and to the
conn_stream. It would seem more natural that they only have a pointer to
the endpoint (that is always there) and that this one has an optional
pointer to the conn_stream. This would reduce the number of elements to
manipulate in lower level code. In addition, the conn_stream is not much
used from the lower layers (wake and exceptional events mostly).
2022-05-13 14:28:48 +02:00
Willy Tarreau
386346f5eb MINOR: conn_stream: make cs_set_error() work on the endpoint instead
Wherever we need to report an error, we have an even easier access to
the endpoint than the conn_stream. Let's first adjust the API to use
the endpoint and rename the function accordingly to cs_ep_set_error().
2022-05-13 14:27:57 +02:00
Amaury Denoyelle
df25acf47f MINOR: ncbuf: implement advance
A new function ncb_advance() is implemented. This is used to advance the
buffer head pointer. This will consume the front data while forming a
new gap at the end for future data.

On success NCB_RET_OK is returned. The operation can be rejected if a
too small new gap is formed in front of the buffer.
2022-05-12 18:29:55 +02:00
Amaury Denoyelle
b830f0d8d9 MINOR: ncbuf: define various insertion modes
Define three different ways to proceed insertion. This configures how
overlapping data is treated.
- NCB_ADD_PRESERVE : in this mode, old data are kept during insertion.
- NCB_ADD_OVERWRT : new data will overwrite old ones.
- NCB_ADD_COMPARE : this mode adds a new test in check stage. The
  overlapping old and new data must be identical or else the insertion
  is not conducted. An error NCB_RET_DATA_REJ is used in this case.

The mode is specified with a new argument to ncb_add() function.
2022-05-12 18:27:05 +02:00
Amaury Denoyelle
077e096b30 MINOR: ncbuf: implement insertion
Implement a new function ncb_add() to insert data in ncbuf. This
operation is conducted in two stages. First, a simulation will be run to
ensure that insertion can be proceeded. If a gap is formed, either
before or after the new data, it must be big enough to store its header,
or else the insertion is aborted.

After this check stage, the insertion is conducted block by block with
the function pair ncb_fill_data_blk()/ncb_fill_gap_blk().

A new type ncb_ret is used as a return value. For the moment, only
success or gap-size error is used. It is planned to add new error types
in the future when insertion will be extended.
2022-05-12 18:27:05 +02:00
Amaury Denoyelle
edeb0a61a2 MINOR: ncbuf: optimize storage for the last gap
Relax the constraint for gap storage when this is the last block.
ncb_blk API functions will consider that if a gap is stored near the end
of the buffer, without the space to store its header, the gap will cover
entirely the buffer end.

For these special cases, the gap size/data size are not write/read
inside the gap to prevent an overflow. Such a gap is designed in
functions as "reduced gap" and will be flagged with the value
NCB_BK_F_FIN.

This should reduce the rejection on future add operation when receiving
data in-order. Without reduced gap handling, an insertion would be
rejected if it covers only partially the last buffer bytes, which can be
a very common case.
2022-05-12 18:18:47 +02:00
Amaury Denoyelle
d5d2ed90f0 MINOR: ncbuf: complete API and define block interal abstraction
Implement two new functions to report the total data stored accross the
whole buffer and the data stored at a specific offset until the next gap
or the buffer end.

To facilitate implementation of these new functions and also future
add/delete operations, a new abstraction is introduced : ncb_blk. This
structure represents a block of either data or gap in the buffer. It
simplifies operation when moving forward in the buffer. The first buffer
block can be retrieved via ncb_blk_first(buf). The block at a specific
offset is accessed via ncb_blk_find(buf, off).

This abstraction is purely used in functions but not stored in the ncbuf
structure per-se. This is necessary to keep the minimal memory
footprint.
2022-05-12 18:18:47 +02:00
Amaury Denoyelle
1b5f77fc18 MINOR: ncbuf: define non-contiguous buffer
Define the new type ncbuf. It can be used as a buffer with
non-contiguous data and wrapping support.

To reduce as much as possible the memory footprint, size of data and
gaps are stored in the gaps themselves. This put some limitation on the
buffer usage. A reserved space is present just before the head to store
the size of the first data block. Also, add and delete operations will
be constrained to ensure minimal gap sizes are preserved.

The sizes stored in the gaps are represented by a custom type named
ncb_sz_t. This type is a typedef to easily change it : this has a
direct impact on the maximum buffer size (MAX(ncb_sz_t) - sizeof(ncb_sz_t))
and the minimal gap sizes (sizeof(ncb_sz_t) * 2)).
Currently, it is set to uint32_t.
2022-05-12 18:13:21 +02:00
Frédéric Lécaille
31fe308acc CLEANUP: quic_tls: QUIC_TLS_IV_LEN defined two times
Hopefully with the same value!
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
a54e49d0b1 CLEANUP: quic: wrong use of eb*entry() macro
This wrong use has no consequence because the ->node member fields of
eb*node structs are the first.
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
e2fb1bf487 MINOR: quic: Send stateless reset tokens
Add send_stateless_reset() to send a stateless reset packet. It prepares
a packet to build a 1-RTT packet with quic_stateless_reset_token_cpy()
to copy a stateless reset token derived from the cluster secret with
the destination connection ID received as salt.
Also add QUIC_EV_STATELESS_RST new trace event to at least to have a trace
of the connection which are reset.
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
0226c521b0 MINOR: quic: new_quic_cid() code moving
This function will have to call another one from quic_tls.[ch] soon.
As we do not want to include quic_tls.h from xprt_quic.h because
quic_tls.h already includes xprt_quic.h, let's moving it into
xprt_quic.c.
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
7b92c81e43 MINOR: quic-tls: Add quic_hkdf_extract_and_expand() for HKDF
This is a wrapper function around OpenSSL HKDF API functions to
use the "extract-then-expand" HKDF mode as defined by rfc5869.
This function will be used to derived stateless reset tokens
from secrets ("cluster-secret" conf. keyword) and CIDs (as salts).
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
372508cc42 MINOR: config: Add "cluster-secret" new global keyword
It could be usefull to set a ASCII secret which could be used for different
usages. For instance, it will be used to derive QUIC stateless reset tokens.
2022-05-12 17:48:35 +02:00
Frédéric Lécaille
7cc8b3166a MINOR: quic: Add correct ack delay values to ACK frames
A ->time_received new member is added to quic_rx_packet to store the time the
packet are received. ->largest_time_received is added the the packet number
space structure to store this timestamp for the packet with a new largest
packet number to be acknowledged. QUIC_FL_PKTNS_NEW_LARGEST_PN new flag is
added to mark a packet number space as having to acknowledged a packet wih a
new largest packet number. In this case, the packet number space ack delay
must be recalculated.
Add quic_compute_ack_delay_us() function to compute the ack delay from the value
of the time a packet was received. Used only when a packet with a new largest
packet number.
2022-05-12 15:30:14 +02:00
Frédéric Lécaille
9475d890ee MINOR: quic: Congestion controller event trace fix (loss)
Missing event type (loss).
2022-05-12 15:30:14 +02:00
Frédéric Lécaille
f6e8594469 BUG/MINOR: quic: Wrong unit for ack delay for incoming ACK frames
This ACK frame field value is in microseconds. Everything is interpreted
and stored in milliseconds in our QUIC implementation.
2022-05-12 15:30:14 +02:00
Frédéric Lécaille
5b988ebed1 BUG/MINOR: quic: Dropped peer transport parameters
The call to quic_dflt_transport_params_cpy() is already first done by
quic_transport_params_init() which is a good thing. But this function was also
called each time we parsed a transport parameters with quic_transport_param_decode(),
re-initializing to default values some of them. The transport parameters concerned
by this bug are the following:
   - max_udp_payload_size
   - ack_delay_exponent
   - max_ack_delay
   - active_connection_id_limit
So, let's remove this call to quic_dflt_transport_params_cpy() which has nothing
to do here!
2022-05-12 15:26:10 +02:00
Frédéric Lécaille
8726d633d4 MINOR: quic: Add a debug counter for sendto() errors
As we do not have any task to be wake up by the poller after sendto() error,
we add an sendto() error counter to the quic_conn struct.
Dump its values from qc_send_ppkts().
2022-05-12 15:11:53 +02:00
Emeric Brun
314e6ec822 BUG/MAJOR: dns: multi-thread concurrency issue on UDP socket
This patch adds a lock on the struct dgram_conn to ensure
that an other thread cannot trash a fd or alter its status
while the current thread processing it on for send/receive/connect
operations.

Starting with the 2.4 version this could cause a crash when a DNS
request is failing, setting the FD of the dgram structure to -1. If the
dgram structure is reused after that, a read access to fdtab[-1] is
attempted. The crash was only triggered when compiled with ASAN.

In previous versions the concurrency issue also exists but is less
likely to crash.

This patch must be backported until v2.4 and should be
adapt for v < 2.4.
2022-05-11 15:20:10 +02:00
vigneshsp
47a4c61d63 BUG/MINOR: server: Make SRV_STATE_LINE_MAXLEN value from 512 to 2kB (2000 bytes).
The statefile before this patch can only parse lines within 512
characters, now as we made the value to 2000, it can support a
line of length of 2kB.

This patch fixes GitHub issue #1530.
It should be backported to all stable releases.
2022-05-11 11:39:06 +02:00
Willy Tarreau
8a0fd3a36c BUILD: debug: work around gcc-12 excessive -Warray-bounds warnings
As was first reported by Ilya in issue #1513, compiling with gcc-12
adds warnings about size 0 around each BUG_ON() call due to the
ABORT_NOW() macro that tries to dereference pointer value 1.

The problem is known, seems to be complex inside gcc and could only
be worked around for now by adjusting a pointer limit so that the
warning still catches NULL derefs in the first page but not other
values commonly used in kernels and boot loaders:
   https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=91f7d7e1b

It's described in more details here:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104657
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103768

And some projects had to work around it using various approaches,
some of which are described in the bugs reports above, plus another
one here:
   https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/HLK3BHP2T3FN6FZ46BIPIK3VD5FOU74Z/

In haproxy we can hide it by hiding the pointer in a DISGUISE() macro,
but this forces the pointer to be loaded into a register, so that
register is lost precisely where we want to get the maximum of them.

In our case we purposely use a low-value non-null pointer because:
  - it's mandatory that this value fits within an unmapped page and
    only the lowest one has this property
  - we really want to avoid register loads for the address, as these
    will be lost and will complicate the bug analysis, and they tend
    to be used for large addresses (i.e. instruction length limit).
  - the compiler may decide to optimize away the null deref when it
    sees it (seen in the past already)

As such, the current workaround merged in gcc-12 is not effective for
us.

Another approach consists in using pragmas to silently disable
-Warray-bounds and -Wnull-dereference only for this part. The problem
is that pragmas cannot be placed into macros.

The resulting solution consists in defining a forced-inlined function
only to trigger the crash, and surround the dereference with pragmas,
themselves conditionned to gcc >= 5 since older versions don't
understand them (but they don't complain on the dereference at least).

This way the code remains the same even at -O0, without the stack
pointer being modified nor any address register being modified on
common archs (x86 at least). A variation could have been to rely on
__builtin_trap() but it's not everywhere and it behaves differently
on different platforms (undefined opcode or a nasty abort()) while
the segv remains uniform and effective.

This may need to be backported to older releases once users start to
complain about gcc-12 breakage.
2022-05-09 20:32:11 +02:00
Willy Tarreau
ecab71fbac BUILD: stats: conditionally mark obsolete stats states as deprecated
The obsolete stats states STAT_ST_* were marked as deprecated with recent
commit 6ef1648dc ("CLEANUP: stats: rename the stats state values an mark
the old ones deprecated"), except that this feature requires gcc 6 and
above. Let's use the macro that depends on this condition instead.

The issue appeared on 2.6-dev9 so no backport is needed.
2022-05-09 20:32:11 +02:00
Willy Tarreau
4fc2cd7c8e MINOR: compiler: add a new macro to set an attribute on an enum when possible
Gcc 6 and above support placing an attribute on an enum's value. This
is convenient for marking some values as deprecated. We just need the
macro because older versions fail to parse __attribute__() there.
2022-05-09 20:32:11 +02:00
David CARLIER
4aed40e6c7 MINOR: tcp: socket translate TCP_KEEPIDLE for macOs equivalent
On Linux the interval before starting to send TCP keep-alive packets
is defined by TCP_KEEPIDLE. MacOS has an equivalent with TCP_KEEPIDLE,
which also uses seconds as a unit, so it's possible to simply remap the
definition of TCP_KEEPIDLE to TCP_KEEPALIVE there and get it to seamlessly
work. The other settings (interval and count) are not present, though.
2022-05-08 10:35:39 +02:00
Willy Tarreau
6ef1648dc2 CLEANUP: stats: rename the stats state values an mark the old ones deprecated
The STAT_ST_* values have been abused by virtually every applet and CLI
keyword handler, and this must not continue as it's a source of bugs and
of overly complicated code.

This patch renames the states to STAT_STATE_*, and keeps the previous
enum while marking each entry as deprecated. This should be sufficient to
catch out-of-tree code that might rely on them and to let them know what
to do with that.
2022-05-06 18:33:49 +02:00
Willy Tarreau
1c0715b12a CLEANUP: cli: move the status print context into its own context
Now that the CLI's print context is alone in the appctx, it's possible
to refine the appctx's ctx layout so that the cli part matches exactly
a regular svcctx, and as such move the CLI context into an svcctx like
other applets. External code will still build and work because the
struct cli perfectly maps onto the struct cli_print_ctx that's located
into svc.storage. This is of course only to make a smooth transition
during 2.6 and will disappear immediately after.

A tiny change had to be applied to the opentracing addon which performs
direct accesses to the CLI's err pointer in its own print function. The
rest uses the standard cli_print_* which were the only ones that needed
a small change.

The whole "ctx.cli" struct could be tagged as deprecated so that any
possibly existing external code that relies on it will get build
warnings, and the comments in the struct are pretty clear about the
way to fix it, and the lack of future of this old API.
2022-05-06 18:33:22 +02:00
Willy Tarreau
aa229ccc4c MINOR: lua: move the http service context out of appctx.ctx
Just like for the TCP service, let's move the context away from
appctx.ctx. A new struct hlua_http_ctx was defined, reserved in
hlua_applet_http_init() and used everywhere else. Similarly, the
task dump code will no more report decoded stack traces in case
these services would be involved. That may be solved later.
2022-05-06 18:13:36 +02:00
Willy Tarreau
e23f33bbfe MINOR: lua: move the tcp service storage outside of appctx.ctx
The use-service mechanism for Lua in TCP mode relies on the
hlua_tcp storage in appctx->ctx. We can move its definition to
hlua.c and simply use appctx_reserve_svcctx() to reserve and access
the stoage. One tiny side effect is that the task dump used in panics
will not show anymore the Lua call stack in its trace. For this a
better API is needed from the Lua code to expose a function that does
the job from an appctx.
2022-05-06 18:13:36 +02:00
Willy Tarreau
5321da9df0 MEDIUM: lua: move the cosocket storage outside of appctx.ctx
The Lua cosockets were using appctx.ctx.hlua_cosocket. Let's move this
to a local definition of "hlua_csk_ctx" in hlua.c, which is allocated
from the appctx by hlua_socket_new(). There's a notable change which is
that, while previously the xref link with the peer was established with
the appctx, it's now in the hlua_csk_ctx. This one must then hold a
pointer to the appctx. The code was adjusted accordingly, and now that
part of the code doesn't use the appctx.ctx anymore.
2022-05-06 18:13:36 +02:00
Willy Tarreau
f61494c708 CLEANUP: cache: take the context out of appctx.ctx
The context was moved to a local definition in the cache code, and
there's nothing specific to the cache anymore in the appctx. The
struct is stored into the appctx's storage area via the svcctx.
2022-05-06 18:13:36 +02:00
Willy Tarreau
c7afedc140 BUILD: applet: mark the appctx's st2 variable as deprecated
This one has been misused for a while as well, it's time to deprecate it
since we don't use it anymore. It will be removed in 2.7 and for now is
only marked as deprecated. Since we need to guarantee that it's zeroed
before starting any applet or CLI command, it was moved into an anonymous
union where its sibling is not marked as deprecated so that we can
continue to initialize it without triggering a warning.

If you found this commit after a bisect session you initiated to figure
why you got some build warnings and don't know what to do, have a look
at the code that deals with the "show fd", "show sess" or "show servers"
commands, as it's supposed to be self-explanatory about the tiny changes
to apply to your code to port it. If you find APPLET_MAX_SVCCTX to be
too small for your use case, either kindly ask for a tiny extension
(and try to get your code merged), or just use a pool.
2022-05-06 18:13:36 +02:00
Willy Tarreau
f50da2c320 BUILD: applet: mark the CLI's generic variables as deprecated
The generic context variables p0/p1/p2, i0/i1, o0/o1 have been abused
and causing trouble for too long, it's time to remove them now that
they are not used anymore.

However the risk that external code still uses them is not nul and we
had not warned before about their removal. Let's mark them deprecated
in 2.6 and removed in 2.7. This will let external code continue to work
(as well as it could if it misuses them), with a strong encouragement
on updating it.

If you found this commit after a bisect session you initiated to figure
why you got some build warnings and don't know what to do, have a look
at the code that deals with the "show fd", "show env" or "show servers"
commands, as it's supposed to be self-explanatory about the tiny changes
to apply to your code to port it. If you find APPLET_MAX_SVCCTX to be
too small for your use case, either kindly ask for a tiny extension
(and try to get your code merged), or just use a pool.
2022-05-06 18:13:36 +02:00
Willy Tarreau
23a2407843 CLEANUP: spoe: do not use appctx.ctx anymore
The spoe code already uses its own generic pointer, let's move it to
svcctx instead of keeping a struct spoe in the appctx union.
2022-05-06 18:13:36 +02:00
Willy Tarreau
455caef642 CLEANUP: peers: do not use appctx.ctx anymore
The peers code already uses its own generic pointer, let's move it to
svcctx instead of keeping a struct peers in the appctx union.
2022-05-06 18:13:36 +02:00
Willy Tarreau
1eea6657fb CLEANUP: httpclient: do not use the appctx.ctx anymore
The httpclient already uses its own pointer and only used to store this
single pointer into the appctx.ctx field. Let's just move it to the
svcctx and remove this entry from the appctx union.
2022-05-06 18:13:36 +02:00
Willy Tarreau
cba8838e59 CLEANUP: ring: pass the ring watch flags to ring_attach_cli(), not in ctx.cli
The ring watch flags (wait, seek end) were dangerously passed via ctx.cli.i0
from "show buf" in sink.c:cli_parse_show_events(), or implicitly reset in
"show errors". That's very unconvenient, difficult to follow, and prone to
short-term breakage.

Let's pass an extra argument to ring_attach_cli() to take these flags, now
defined in ring-t.h as RING_WF_*, and let the function set them itself
where appropriate (still ctx.cli.i0 for now).
2022-05-06 18:13:36 +02:00
Willy Tarreau
42cc831abf CLEANUP: sink: use the generic context to store the forwarder's context
Instead of having a struct that contains a single pointer in the appctx
context, let's directly use the generic context pointer and get rid of
the now unused sft.ptr entry.
2022-05-06 18:13:36 +02:00
Willy Tarreau
dec23dc43f CLEANUP: ssl/cli: use a local context for "commit ssl {ca|crl}file"
These two commands use distinct parse/release functions but a common
iohandler, thus they need to keep the same context. It was created
under the name "commit_cacrlfile_ctx" and holds a large part of the
pointers (6) and the ca_type field that helps distinguish between
the two commands for the I/O handler. It looks like some of these
fields could have been merged since apparently the CA part only
uses *cafile* and the CRL part *crlfile*, while both old and new
are of type cafile_entry and set only for each type. This could
probably even simplify some parts of the code that tries to use
the correct field.

These fields were the last ones to be migrated thus the appctx's
ssl context could finally be removed.
2022-05-06 18:13:36 +02:00
Willy Tarreau
329f4b4f2f CLEANUP: ssl/cli: use a local context for "set ssl cert"
The command doesn't really need any storage since there's only a parser,
but since it used this context, there might have been plans for extension,
so better continue with a persistent one. Only old_ckchs, new_ckchs, and
path were being used from the appctx's ssl context. There ones moved to
the local definition, and the two former ones were removed from the appctx
since not used anymore.
2022-05-06 18:13:36 +02:00
Willy Tarreau
96c9a6c752 CLEANUP: ssl/cli: use a local context for "show ssl cert"
This command only really uses old_ckchs, cur_ckchs and the index
in which the transaction was stored. The new structure "show_cert_ctx"
only has these 3 fields, and the now unused "cur_ckchs" and "index"
could be removed from the shared ssl context.
2022-05-06 18:13:36 +02:00
Willy Tarreau
f3e8b3e877 CLEANUP: ssl/cli: use a local context for "show crlfile"
Now this command doesn't share any context anymore with "show cafile"
nor with the other commands. The previous "cur_cafile_entry" field from
the applet's ssl context was removed as not used anymore. Everything was
moved to show_crlfile_ctx which only has 3 fields.
2022-05-06 18:13:36 +02:00
Willy Tarreau
50c2f1e0cd CLEANUP: ssl/cli: use a local context for "show cafile"
Saying that the layout and usage of the various variables in the ssl
applet context is a mess would be an understatement. It's very hard
to know what command uses what fields, even after having moved away
from the mix of cli and ssl.

Let's extract the parts used by "show cafile" into their own structure.
Only the "show_all" field would be removed from the ssl ctx, the other
fields are still shared with other commands.
2022-05-06 18:13:35 +02:00
Willy Tarreau
bcda5f6bcd CLEANUP: hlua/cli: take the hlua_cli context definition out of the appctx
This context is used by CLI keywords registered by Lua. We can take
it out of the appctx and use the generic command context allocation so
that the appctx doesn't have to declare a specific one anymore. The
context is created during parsing.
2022-05-06 18:13:35 +02:00
Willy Tarreau
41f885241e CLEANUP: stats/cli: stop using appctx->st2
Instead, let's have the state as an enum inside the context. It's much
cleaner and safer as we know nobody else touches it.
2022-05-06 18:13:35 +02:00
Willy Tarreau
91cefcaba4 CLEANUP: stats/cli: take the "show stat" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing (both in the CLI and HTTP).

The change looks large but it's particularly mechanical. The context
initialization appears in stats.c and http_ana.c. The context is used
in stats.c and resolvers.c since "show stat resolvers" points there.
That's the reason why the definition moved to stats.h. "show info"
and "show stat" continue to share the same state definition for now.

Nothing else was modified.
2022-05-06 18:13:35 +02:00
Willy Tarreau
cb8bf17900 CLEANUP: peers/cli: take the "show peers" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2 which deserves being
addressed in separate commit.
2022-05-06 18:13:35 +02:00
Willy Tarreau
0fcecc63c8 CLEANUP: map/cli: take the "show map" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. Many commands, including pure parsers, use this
context but that's not a problem as it's designed to be used this way.
Due to this, many lines are changed but that's in fact a replacement of
"appctx->ctx.map" with "ctx->". Note that the code also uses st2 which
deserves being addressed in separate commit.
2022-05-06 18:13:35 +02:00
Willy Tarreau
3c69e08e96 CLEANUP: stick-table/cli: take the "show table" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2 which deserves being
addressed in separate commit.
2022-05-06 18:13:35 +02:00
Willy Tarreau
0fd8f0e236 CLEANUP: proxy/cli: take the "show errors" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.

The code still has room for improvement, such as in the "flags" field
where bits are hard-coded, but they weren't modified.
2022-05-06 18:13:35 +02:00
Willy Tarreau
39f097d965 CLEANUP: stream/cli: take the "show sess" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
2022-05-06 18:13:35 +02:00
Willy Tarreau
f12f32a0fa MINOR: applet: reserve some generic storage in the applet's context
Instead of using existing fields and having to put keyword-specific
contexts in the applet definition, let's have the appctx provide a
generic storage area that's currently large enough for existing CLI
commands and small applets, and a function to allocate that storage.

The function will be responsible for verifying that the requested size
fits in the area so that the caller doesn't need to add specific checks;
it is validated during development as this size is static and will
not change at runtime. In addition the caller doesn't even need to
free() the area since it's part of an existing context. For the
caller's convenience, a context pointer "svcctx" for the command is
also provided so that the allocated area can be placed there (or
possibly any other one in case a larger area is needed).

The struct's layout has been temporarily complicated by adding one
level of anonymous union on top of the "ctx" one. This will allow us
to preserve "ctx" during 2.6 for compatibility with possible external
code and get rid of it in 2.7. This explains why the diff extends to
the whole "ctx" union, but a "git show -b" shows that only one extra
layer was added. In order to make both the svcctx pointer and its
storage accessible without further enlarging the appctx structure,
both svcctx and the storage share the same storage as the ctx part.
This is done by having them placed in the union with a protected
overlapping area for svcctx, for which a shadow member is also
present in the storage area:

    union {
       void* svcctx;         // variable accessed by services
       struct {
           void *shadow;     // shadow of svcctx;
           char storage[];   // where most services store their data
       };
       union {               // older commands store here and ignore svcctx
          ...
       } ctx;
    };

I.e. new applications will use appctx->svcctx while older ones will be
able to continue to use appctx->ctx.*

The whole area (including the pointer's context) is zeroed before any
applet is initialized, and before CLI keyword processor's first invocation,
as it is an important part of the existing keyword processors, which makes
CLI keywords effectively behave like applets.
2022-05-06 18:13:35 +02:00
Willy Tarreau
4fd9b4ddf0 BUG/MINOR: ssl/cli: fix "show ssl cert" not to mix cli+ssl contexts
The "show ssl cert" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. Amazingly, despite the use of both p0 and i0 to store
respectively a pointer to the current ckchs and a transaction id, there
was no overlap with the other pointers used during these operations,
but should these fields be reordered or slightly updated this will break.
Comments were added above the faulty functions to indicate which fields
they are using.

This needs to be backported to 2.5.
2022-05-06 18:13:35 +02:00
Willy Tarreau
06305798f7 BUG/MINOR: ssl/cli: fix "show ssl ca-file <name>" not to mix cli+ssl contexts
The "show ssl ca-file <name>" command mixes some generic pointers from
the "ctx.cli" struct and context-specific ones from "ctx.ssl" while both
are in a union. The i0 integer used to store the current ca_index overlaps
with new_crlfile_entry which is thus harmless for now but is at the mercy
of any reordering or addition of these fields. Let's add dedicated fields
into the ssl structure for this.

Comments were added on top of the affected functions to indicate what they
use.

This needs to be backported to 2.5.
2022-05-06 18:13:35 +02:00
Willy Tarreau
821c3b0b5e BUG/MINOR: ssl/cli: fix "show ssl ca-file/crl-file" not to mix cli+ssl contexts
The "show ca-file" and "show crl-file" commands mix some generic pointers
from the "ctx.cli" struct and context-specific ones from "ctx.ssl" while
both are in a union. It's fortunate that the p0 pointer in use is located
immediately before the first one used (it overlaps with next_ckchi_link,
and old_cafile_entry is safe). But should these fields be reordered or
slightly updated this will break.

Comments were added on top of the affected functions to indicate what they
use.

This needs to be backported to 2.5.
2022-05-06 18:13:35 +02:00
William Lallemand
7867f63313 MEDIUM: resolvers: create a "default" resolvers section at startup
Try to create a "default" resolvers section at startup, but does not
display any error nor warning. This section is initialized using the
/etc/resolv.conf of the system.

This is opportunistic and with no guarantee that it will work (but it should
on most systems).

This is useful for the httpclient as it allows to use the DNS resolver
without any configuration in most of the cases.

The function is called from the httpclient_pre_check() function to
ensure than we tried to create the section before trying to initiate the
httpclient. But it is also called from the resolvers.c to ensure the
section is created when the httpclient init was disabled.
2022-05-06 17:02:15 +02:00
Christopher Faulet
fa24379aeb MINOR: conn-stream: Add mask from flags set by endpoint or app layer
In flags set on the endpoints, some are set by endpoints itself and some are
set by the app layer. To help flags manipulations, 2 masks have been
added. The first one, CS_EP_ENDP_MASK, for all flags that an endpoint may
set. The other one, CS_EP_APP_MASK, for flags that the app layer may set.

This patch is mandatory for the next commit.
2022-05-05 09:23:35 +02:00
Frédéric Lécaille
664741e1c5 MINOR: quic: Make the quic_conn be aware of the number of streams
This is required when the retransmitted frame types when the mux is released.
We add a counter for the number of streams which were opened or closed by the mux.
After the mux has been released, we can rely on this counter to know if the STREAM
frames are retransmitted ones or not.
2022-05-03 10:13:40 +02:00
Frédéric Lécaille
b074966634 CLEANUP: mux: Useless xprt_quic-t.h inclusion
This inclusion is useless for mux_quic-t.h. Furthermore this fixes compilation
issues when we need to refer to mux_quic-t.h from xprt_quic-t.h.
2022-05-03 10:13:40 +02:00