Commit Graph

23038 Commits

Author SHA1 Message Date
Willy Tarreau
cb4c236fac BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
The following sequence was not properly caught:

   defaults def
   backend back from def
   defaults def

But this one was:

   defaults def
   defaults def
   backend back from def

Let's check when defaults are declared that they're not already
referenced.

Better not backport this. While it will catch broken configs (possibly
some with backends pasted after the wrong defaults), these might still
work by accident. It may be reported as a diag warning though.
2024-09-20 15:58:10 +02:00
Willy Tarreau
5b221d1e41 CLEANUP: cfgparse: factor proxy vs log-forward collisions
This simplifies the check added in 1a38684fbc ("MEDIUM: cfgparse:
detect collisions between defaults and log-forward"), by factoring it
with the other existing one.

The tests are ugly in that code because a first block tests pure
proxies, a second one proxies or defaults and inside that one we
have special cases for defaults. Let's just move the tests to the
"any proxy type" block.
2024-09-20 14:13:14 +02:00
Willy Tarreau
b325453c36 MINOR: proxy: use the global file names for conf->file
Proxy file names are assigned a bit everywhere (resolvers, peers,
cli, logs, proxy). All these elements were enumerated and now use
copy_file_name(). The only ha_free() call was turned to drop_file_name().

As a bonus side effect, a 300k backend config saved 14 MB of RAM.
2024-09-19 15:38:19 +02:00
Willy Tarreau
9ab21a3c2d CLEANUP: stick-table: make the file location point to a global file name
The file name used to point to the calling function's stack for stick
tables, which was OK during parsing but remained dangling afterwards.
At least it was already marked const so as not to accidentally free it.
Let's make it point to a file_name_node now.
2024-09-19 15:38:19 +02:00
Willy Tarreau
d6c060c5ae MINOR: tools: add minimal file name management
In proxies, stick-tables, servers, etc... at plenty of places we store
a file name and a line number. Some file names are the result of strdup()
(e.g. in proxies), others not (e.g. stick-tables) and leave dangling
pointers at the end of parsing. The risk of double-free is not null
either.

In order to stop this, let's first add a simple tool that allows to
register short strings inside a global list, these strings happening
to be server names. The strings are either duplicated and stored upon
failure to find them, or just added to this storage. Since file names
are not expected to disappear before the end of the process, for now
we don't even implement refcounting, and we free them all at the end.
There's already a drop_file_name() function to reset the pointer like
ha_free() used to do, and even if not strictly needed it's a good
habit to get used to doing it.

The strings are returned as const so that they're stored as-is in
structs, and that nasty free() calls are easily caught. The pointer
points to the char[] storage inside the node itself. This way later
if we want to implement refcounting, it will be trivial to just look
up a string and change its associated node's refcount. If needed,
comparisons can also be made on pointers.

For now they're not used yet and are released on deinit().
2024-09-19 15:36:58 +02:00
Willy Tarreau
30a0e93fe6 [RELEASE] Released version 3.1-dev8
Released version 3.1-dev8 with the following main changes :
    - DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
    - MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
    - BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
    - REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
    - BUG/MEDIUM: clock: detect and cover jumps during execution
    - BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
    - BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
    - BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
    - MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
    - BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
    - BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
    - REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
    - BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
    - REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
    - MINOR: server: allow init-state for dynamic servers
    - DOC: server: document what to check for when adding new server keywords
    - MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
    - BUG/MINOR: polling: fix time reporting when using busy polling
    - BUG/MINOR: clock: make time jump corrections a bit more accurate
    - BUG/MINOR: clock: validate that now_offset still applies to the current date
    - BUG/MEDIUM: queue: implement a flag to check for the dequeuing
    - OPTIM: sample: don't check casts for samples of same type
    - OPTIM: vars: remove the unneeded lock in vars_prune_*
    - OPTIM: vars: inline vars_prune() to avoid many calls
    - MINOR: vars: remove the emptiness tests in callers before pruning
    - IMPORT: import cebtree (compact elastic binary trees)
    - OPTIM: vars: use a cebtree instead of a list for variable names
    - OPTIM: vars: use multiple name heads in the vars struct
    - BUG/MINOR: peers: local entries updates may not be advertised after resync
    - DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
    - MINOR: proxy: Rename accept-invalid-http-* options
    - DOC: configuration: Remove dangerous directives from the proxy matrix
    - BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
    - BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
    - BUG/MEDIUM: promex: Wait to have the request before sending the response
    - MINOR: clock: test all clock_gettime() return values
    - MEDIUM: clock: collect the monotonic time in clock_local_update_date()
    - MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
    - MEDIUM: clock: use the monotonic clock for idle time calculation
    - MEDIUM: clock: don't compute before_poll when using monotonic clock
    - BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
    - BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
    - BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
    - BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
    - MEDIUM: cfgparse: warn about proxies having the same names
    - DOC: management: add init-state to add server keywords
    - BUG/MINOR: mux-quic: report glitches to session
    - BUILD: cebtree: silence a bogus gcc warning on impossible code paths
    - MEDIUM: cfgparse: warn about colliding names between defaults and proxies
    - MEDIUM: cfgparse: detect collisions between defaults and log-forward
2024-09-18 22:29:08 +02:00
Willy Tarreau
1a38684fbc MEDIUM: cfgparse: detect collisions between defaults and log-forward
Sadly, when log-forward were introduced they took great care of avoiding
collision with regular proxies but defaults were missed (they need to be
explicitly checked for). So now we have to move them to a warning for 3.1
instead of rejecting them.
2024-09-18 18:08:15 +02:00
Willy Tarreau
d8f4b07e40 MEDIUM: cfgparse: warn about colliding names between defaults and proxies
In order to complete the checks added in 303a66573d ("MEDIUM: cfgparse:
warn about proxies having the same names"), we also need to warn about
regular proxies having the same name as defaults sections as well as
defaults sections having the same name as proxies, since defaults
sections are inherently proxies, albeit stored in a separate list for
now.
2024-09-18 18:08:06 +02:00
Willy Tarreau
8df44eea6d BUILD: cebtree: silence a bogus gcc warning on impossible code paths
gcc-12 and above report a wrong warning about a negative length being
passed to memcmp() on an impossible code path when built at -O0. The
pattern is the same at a few places, basically:

  int foo(int op, const void *a, const void *b, size_t size, size_t arg)
  {
      if (op == 1) // arg is a strict multiple of size
          return memcmp(a, b, arg - size);
      return 0;
  }
  ...
  int bar()
  {
     return foo(0, a, b, sizeof(something), 0);
  }

It *might* be possible to invent dummy values for the "len" argument
above in the real code, but that significantly complexifies it and as
usual can easily result in introducing undesired bugs.

Here we take a different approach consisting in shutting the
-Wstringop-overread warning on gcc>=12 at -O0 since that's the only
condition that triggers it. The issue was reported to and confirmed by
the gcc team here:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114622

No backport needed, but this should be upstreamed into cebtree after
checking that all involved macros are available.
2024-09-18 17:42:52 +02:00
Amaury Denoyelle
fcd6d29acf BUG/MINOR: mux-quic: report glitches to session
Glitch counter was implemented for QUIC/HTTP3. The counter is stored in
the QCC MUX connection instance. However, this is never reported at the
session level which is necessary if glitch counter is tracked via a
stick-table.

To fix this, use session_add_glitch_ctr() in various QUIC MUX functions
which may increment glitch counter.

This should be backported up to 3.0.
2024-09-18 16:11:03 +02:00
Damien Claisse
2c783c25d6 DOC: management: add init-state to add server keywords
Commit ce6a621ae allowed init-state to be used for dynamic servers but I
forgot to update management doc.
2024-09-17 22:44:53 +02:00
Willy Tarreau
303a66573d MEDIUM: cfgparse: warn about proxies having the same names
As discussed below, there are too many problems and uncaught bugs
in the parser when trying to support proxies having similar names
but different types. There's specific code to detect the presence
of stick-tables in a pair of such proxies for example. It's even
possible that certain combinations of backend+listen that were not
previously detected have some nasty side effects.

According to the proposal in the discussion, this is now deprecated
in 3.1 (thus we emit a warning) and will become forbidden in 3.3.

A backport might be useful, but reporting a diag_warning only, not a
classical warning, so as not to break setups running in zero-warning
mode.

It was verified with a config involving all 9 combinations of
(frontend,backend,listen) followed by one of the same three that all
collisions are now properly blocked and that only back+front are kept
and emit a warning.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
2024-09-17 19:55:00 +02:00
Willy Tarreau
c70906c8a1 BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
As reported below, it's possible to declare a backend then a proxy with
the same name, because for the proxy we check a frontend capability (the
first one to be tested):

   backend b
   listen b
        bind :8888

Let's check the two capabilities in this case and not just the frontend.

Better not backport this, as there's a risk of breakage of existing
setups that work by accident. It might make sense to report them as
diag warnings though.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
2024-09-17 19:55:00 +02:00
Aurelien DARRAGON
17e52c922b BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
"option httpslog" override warning messaged used to be reported as
"option httplog", probably as a result of copy paste without adjusting
the context. Let's fix that to prevent emitting confusing warning messages

The issue exists since 98b930d ("MINOR: ssl: Define a default https log
format"), thus it should be backported up to 2.6
2024-09-17 15:40:02 +02:00
Aurelien DARRAGON
bc4bf5779f BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
Same as b85edd44db0 ("BUG/MINOR: fix missing "log-format overrides
previous 'option tcplog clf'..." detection") but for "option httpslog"
keyword.

No backport needed unless fd48b28 ("MINOR: Implements new log format of
option tcplog clf") is.
2024-09-17 15:40:02 +02:00
Aurelien DARRAGON
607b9adc9b BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
In commit fd48b28315 ("MINOR: Implements new log format of option tcplog clf")
"option tcplog clf" detection was correcly added for "option tcplog" and
"option httplog", but "log-format" case was overlooked. Thus, this config
would report erroneous warning message:

  defaults
    option tcplog clf
    log-format "ok"

[WARNING]  (727893) : config : parsing [test.conf:3]: 'log-format' overrides previous 'log-format' in 'defaults' section.

No backport needed unless fd48b28315 is.
2024-09-17 14:41:58 +02:00
Willy Tarreau
499e057644 MEDIUM: clock: don't compute before_poll when using monotonic clock
There's no point keeping both clocks up to date; if the monotonic clock
is ticking, let's just refrain from updating the wall clock one before
polling since we won't use it. We still do it after polling however as
we need a wall clock time to communicate with outside.

This saves one gettimeofday() call per loop and two timeval comparisons.
2024-09-17 09:08:10 +02:00
Willy Tarreau
24496803d1 MEDIUM: clock: use the monotonic clock for idle time calculation
By just keeping a copy of the last known value before entering
polling, we can apply the same algorithm as we're currently using,
except that it's now applied to the monotonic clock instead of the
wall clock, when it's detected that it's ticking. This improves
idle time calculation accuracy by making it independent on the
wall clock.
2024-09-17 09:08:10 +02:00
Willy Tarreau
4150851ce5 MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
We already collect CLOCK_MONOTONIC when it's available when leaving the
poller, but it's only used for profiling. The functions that return it
set the value to zero when it's not available, so we can use that to
detect if it works or not. The idea is that if the monotonic time is
non-zero, it is ticking and usable, then we use if for now_ns, otherwise
we use the corrected date. We continue to apply the now_offset to the
returned value because it helps forcing an early time wrap-around.

Proceeding like this presents two benefits:
  - on systems supporting this, the time is much more robust against
    time changes
  - when it works, it saves us from having to go through the time
    correction code, which is usually cheap, but better avoided anyway.

Note that idle time calculation continues to rely on the wall-clock
time.
2024-09-17 09:08:10 +02:00
Willy Tarreau
f793845f4a MEDIUM: clock: collect the monotonic time in clock_local_update_date()
Now we collect this clock in clock_local_update_date(), the closest from
the poller, which is also used when busy-polling, and the values is set
into the thread's curr_mono_time which did not exist before. Later,
clock_leaving_poll() just sets the prev_mono_time value from the curr_
one instead of retrieving the time at this specific point. It also means
that the monotonic time will now also cover the time needed to update
the global time, which should be negligible. Note that we don't collect
the CPU time in the clock_local_update_date() function even though it's
tempting, because when doing busy-polling, it would be collected on each
round while being useless.

Doing so will make sure that the local time always knows the monotonic
time when it is available.
2024-09-17 09:08:10 +02:00
Willy Tarreau
42e699903e MINOR: clock: test all clock_gettime() return values
Till now we were only using clock_gettime() for profiling, so if it
would fail it was no big deal. We intend to use it as the main clock
as well now, so we need to more reliably detect its absence or failure
and gracefully fall back to other options. Without the test we would
return anything present in the stack, which is neither clean nor easy
to detect.
2024-09-17 09:08:10 +02:00
Christopher Faulet
bb2a2bc5f2 BUG/MEDIUM: promex: Wait to have the request before sending the response
It is similar to the previous fix about the stats applet ("BUG/MEDIUM:
cache/stats: Wait to have the request before sending the response").
However, for promex, there is no crash and no obvious issue. But it depends
on the filter. Indeed, the request is used by promex, independantly if it
was considered as forwarded or not. So if it is modified by the filter,
modification are just ignored.

Same bug, same fix. We now wait the request was forwarded before processing
it and produce the response.
2024-09-16 22:56:28 +02:00
Christopher Faulet
afc50f2445 BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
It seems obvious. On a classical workflow, the request headers analysis is
finished when these applets are woken up for the first time. So they don't
take care to really have the request to start to process it and to send the
response. But with a filter, it is possible to stop the request analysis
after the applet creation.

If this happens for the stats applet, this leads to a crash because we
retrieve the request start-line without checking if it is available. For the
cache applet, the response is just immediatly sent. And here it is a problem
if the compression is enabled. In that case too, this may lead to a crash
because the compression may be enabled but not initialized.

For a true server, there is no issue because the connection cannot be
established. The server is chosen only after the request analysis. The issue
with applets is that once created, an applet is quickly switched to the
established state. So it is probably a point that must be carefully reviewed
and probably reworked.

In the mean time, as a fix, in the cache and the stats applet, we just take
care to have the request before sending the response. This will do the
trick.

The patch must be backported as far as 2.6. On 2.6, the patch must be adapted.
2024-09-16 22:55:40 +02:00
Christopher Faulet
5fc12b0afd BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
On a synchronous send from the stream to an applet, if some data were sent,
we must take care to wake the applet up. It is important because if
everything was sent at this stage, there is no other chance to wake the
applet up, mainly because SE_FL_WAIT_DATA flag is set on the applet's sedesc
in sc_update_tx() at the end of process_stream(). This flag prevent any
wakeup of the applet for a send event.

It is not necessary for a mux because the mux stream is called when a
syncrhonous send from the stream is performed. So it is reponsible to wake
the mux connection if necessary.

This patch must be backport to 3.0.
2024-09-16 22:55:40 +02:00
Christopher Faulet
655124f5cc DOC: configuration: Remove dangerous directives from the proxy matrix
For now, that only concerns accept-invalid-http-{request/response} and
accept-unsafe-violations-in-http-{request/response}. But the idea is to make
dangerous directives hard to find. It is one more way to discourage anyone
to use it. And, optionnaly, it is also handy because it keeps the matrix
aligned on 80 columns.
2024-09-16 22:55:25 +02:00
Christopher Faulet
4de6632693 MINOR: proxy: Rename accept-invalid-http-* options
With these options, it is possible to accept some invalid messages that may
considered as unsafe and may result as vulnerabilities. The naming is not
explicit enough on this point. These option must really be considered as
dangerous and only used as a temporary workaround. Unfortunately, when used,
it is probably because there are some legacy and unsupported applications in
place. Nevermind. The documentation warns about the use of these
options. Now the name of the options itself is a warning.

So now, "accept-invalid-http-request" and "accept-invalid-http-response"
options are deprecated and replaced by
"accept-unsafe-violations-in-http-request" and
"accept-unsafe-violations-in-http-response" options.
2024-09-16 22:55:25 +02:00
Christopher Faulet
0f4fad5291 DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
Time to time, new exceptions are added in the HTTP parsing (most of time H1)
to not reject some invalid messages sent by legacy applications. But the
documentation of accept-invalid-http-request and
accept-invalid-http-response options is not pretty clear. So, now, there is
an explicit list of relaxing rules for both options.
2024-09-16 22:55:24 +02:00
Aurelien DARRAGON
1e0920f855 BUG/MINOR: peers: local entries updates may not be advertised after resync
Since commit 864ac3117 ("OPTIM: stick-tables: check the stksess without
taking the read lock"), when entries for a local table are learned from
another peer upon resynchro, and this is the only peer haproxy speaks to,
local updates on such entries are not advertised to the peer anymore,
until they eventually expire and can be recreated upon local updates.

This is due to the fact that ts->seen is always set to 0 when creating
new entry, and also when touch_remote is performed on the entry.

Indeed, while 864ac3117 attempts to avoid useless updates, it didn't
consider entries learned from a remote peer. Such entries are exclusively
learned in peer_treat_updatemsg(): once the entry is created (or updated)
with new data, touch_remote is used to commit the change. However, unlike
touch_local, entries committed using touch_remote will not be advertised
to the peer from which the entry was just learned (otherwise we would
enter a looping situation). Due to the above patch, once an entry is
learned from the (unique) remote peer, 'seen' will be stuck to 0 so it
will never be advertised for its whole lifetime.

Instead, when entries are learned from a peer, we should consider that
the peer that taught us the entry has seen it.

To do this, let's set seen=1 in peer_treat_updatemsg() after calling
touch_remote(). This way, if we happen to perform updates on this entry,
it will be properly advertized to relevant peers. This patch should not
affect the performance gain documented in 864ac3117 given that the test
scenario didn't involved entries learned by remote peers, but solely
locally created entries advertised to remote peers upon updates.

This should be backported in 3.0 with 864ac3117.
2024-09-16 14:06:39 +02:00
Willy Tarreau
5d350d1e50 OPTIM: vars: use multiple name heads in the vars struct
Given that the original list-based version was using a list head as the
root of the variables, while the tree is using a single pointer, it made
sense to reuse that space to place multiple roots, indexed on the lower
bits of the name hash. Two roots slightly increase the performance level,
but the best gain is obtained with 4 roots. The performance is now always
above that of the list, even with small counts, and with 100 vars, it's
21% higher than before, or 67% higher than with the list.

We keep the same lock (it could have made sense to use one lock per head),
because most of the variables in large configs are attached to a stream
or a session, hence are not shared between threads. Thus there's no point
in sharding the pointer.
2024-09-15 23:51:51 +02:00
Willy Tarreau
47ec7c681e OPTIM: vars: use a cebtree instead of a list for variable names
Configs involving many variables can start to eat a lot of CPU in name
lookups. The reason is that the names themselves are dynamic in that
they are relative to dynamic objects (sessions, streams, etc), so
there's no fixed index for example. The current implementation relies
on a standard linked list, and in order to speed up lookups and avoid
comparing strings, only a 64-bit hash of the variable's name is stored
and compared everywhere.

But with just 100 variables and 1000 accesses in a config, it's clearly
visible that variable name lookup can reach 56% CPU with a config
generated this way:

  for i in {0..100}; do
    printf "\thttp-request set-var(txn.var%04d) int(%d)" $i $i;
    for j in {1..10}; do [ $i -lt $j ] || printf ",add(txn.var%04d)" $((i-j)); done;
    echo;
  done

The performance and a 4-core skylake 4.4 GHz reaches 85k RPS with a perf
profile showing:

  Samples: 170K of event 'cycles', Event count (approx.): 142378815419
  Overhead  Shared Object            Symbol
    56.39%  haproxy                  [.] var_to_smp
     6.65%  haproxy                  [.] var_set.part.0
     5.76%  haproxy                  [.] sample_process_cnv
     3.23%  haproxy                  [.] sample_conv_var2smp
     2.88%  haproxy                  [.] sample_conv_arith_add
     2.33%  haproxy                  [.] __pool_alloc
     2.19%  haproxy                  [.] action_store
     2.13%  haproxy                  [.] vars_get_by_desc
     1.87%  haproxy                  [.] smp_dup

[above, var_to_smp() calls var_get() under the read lock].

By switching to a binary tree, the cost is significantly lower, the
performance reaches 117k RPS (+37%) with this profile:

  Samples: 170K of event 'cycles', Event count (approx.): 142323631229
  Overhead  Shared Object            Symbol
    40.22%  haproxy                  [.] cebu64_lookup
     7.12%  haproxy                  [.] sample_process_cnv
     6.15%  haproxy                  [.] var_to_smp
     4.75%  haproxy                  [.] cebu64_insert
     3.79%  haproxy                  [.] sample_conv_var2smp
     3.40%  haproxy                  [.] cebu64_delete
     3.10%  haproxy                  [.] sample_conv_arith_add
     2.36%  haproxy                  [.] action_store
     2.32%  haproxy                  [.] __pool_alloc
     2.08%  haproxy                  [.] vars_get_by_desc
     1.96%  haproxy                  [.] smp_dup
     1.75%  haproxy                  [.] var_set.part.0
     1.74%  haproxy                  [.] cebu64_first
     1.07%  [kernel]                 [k] aq_hw_read_reg
     1.03%  haproxy                  [.] pool_put_to_cache
     1.00%  haproxy                  [.] sample_process

The performance lowers a bit earlier than with the list however. What
can be seen is that the performance maintains a plateau till 25 vars,
starts degrading a little bit for the tree while it remains stable till
28 vars for the list. Then both cross at 42 vars and the list continues
to degrade doing a hyperbole while the tree resists better. The biggest
loss is at around 32 variables where the list stays 10% higher.

Regardless, given the extremely narrow band where the list is better, it
looks relevant to switch to this in order to preserve the almost linear
performance of large setups. For example at 1000 variables and 10k
lookups, the tree is 18 times faster than the list.

In addition this reduces the size of the struct vars by 8 bytes since
there's a single pointer, though it could make sense to re-invest them
into a secondary head for example.
2024-09-15 23:49:01 +02:00
Willy Tarreau
a0205f9de4 IMPORT: import cebtree (compact elastic binary trees)
This is an import of the compact elastic binary trees at commit
a9cd84a ("OPTIM: descent: better prefetch less and for writes when
deleting")

These will be used to replace certain lists (and possibly certain
tree nodes as well). They're as fast (or even faster) than ebtrees
for lookups, as fast for insertion and slower for deletion, and a
node only uses 2 pointers (like a list).

The only changes were cebtree.h where common/tools.h was replaced
with ebtree.h which we already have and already provides the needed
functions and macros, and the addition of a wrapper cebtree-prv.h in
src/ to redirect to import/cebtree-prv.h.
2024-09-15 23:44:59 +02:00
Willy Tarreau
6e92988e20 MINOR: vars: remove the emptiness tests in callers before pruning
All callers of vars_prune_* currently check the list for emptiness.
Let's leave that to vars_prune() itself, it will ease some changes in
the code. Thanks to the previous inlining of the vars_prune() function,
there's no performance loss, and even a very tiny 0.1% gain.
2024-09-15 23:44:16 +02:00
Willy Tarreau
2c1a9c3a43 OPTIM: vars: inline vars_prune() to avoid many calls
Many configs don't have variables and call it for no reason, and even
configs with variables don't necessarily have some in all scopes.
2024-09-15 23:42:09 +02:00
Willy Tarreau
aad6b771dd OPTIM: vars: remove the unneeded lock in vars_prune_*
vars_prune() and vars_prune_all() take the variable lock while purging
all variables from a head. However this is not needed:
  - proc scope variables are only purged during deinit, hence no lock
    is needed ;
  - all other scopes are attached to entities bound to a single thread
    so no lock is needed either.

Removing the lock saves about 0.5% CPU on variables-intensive setups,
but above all simplify the code, so let's do it.
2024-09-15 23:05:50 +02:00
Willy Tarreau
51ade2f1db OPTIM: sample: don't check casts for samples of same type
Originally when converters were created, they were mostly for casting
types. Nowadays we have many artithmetic converters to perform operations
on integers, and a number of converters operating on strings. Both of
these categories most often do not need any cast since the input and
output types are the same, which is visible as the cast function is
c_none. However, profiling shows that when heavily using arithmetic
converters, it's possible to spend up to ~7% of the time in
sample_process_cnv(), a good part of which is only in accessing the
sample_casts[] array. Simply avoiding this lookup when input and ouput
types are equal saves about 2% CPU on such setups doing intensive use
of converters.
2024-09-15 12:43:56 +02:00
Willy Tarreau
b11495652e BUG/MEDIUM: queue: implement a flag to check for the dequeuing
As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()") does have some
side effects in that it can occasionally cause an endless loop.

As Christopher analysed it, the problem is that process_srv_queue(),
which uses a trylock in order to leave only one thread in charge of
the dequeueing process, can lose the lock race against pendconn_add().
If this happens on the last served request, then there's no more thread
to deal with the dequeuing, and assign_server_and_queue() will loop
forever on a condition that was initially exepected to be extremely
rare (and still is, except that now it can become sticky). Previously
what was happening is that such queued requests would just time out
and since that was very rare, nobody would notice.

The root of the problem really is that trylock. It was added so that
only one thread dequeues at a time but it doesn't offer only that
guarantee since it also prevents a thread from dequeuing if another
one is in the process of queuing. We need a different criterion.

What we're doing now is to set a flag "dequeuing" in the server, which
indicates that one thread is currently in the process of dequeuing
requests. This one is atomically tested, and only if no thread is in
this process, then the thread grabs the queue's lock and dequeues.
This way it will be serialized with pendconn_add() and no request
addition will be missed.

It is not certain whether the original race covered by the fix above
can still happen with this change, so better keep that fix for now.

Thanks to @Yenya (Jan Kasprzak) for the precise and complete report
allowing to spot the problem.

This patch should be backported wherever the patch above was backported.
2024-09-13 08:35:47 +02:00
Willy Tarreau
adaba6f904 BUG/MINOR: clock: validate that now_offset still applies to the current date
We want to make sure that now_offset is still valid for the current
date: another thread could very well have updated it by detecting a
backwards jump, and at the very same moment the time got fixed again,
that we retrieve and add to the new offset, which results in a larger
jump. Normally, for this to happen, it would mean that before_poll
was also affected by the jump and was detected before and bounded
within 2 seconds, resulting in max 2 seconds perturbations.

Here we try to detect this situation and fall back to re-adjusting the
offset instead.

It's more of a strengthening of what's done by commit e8b1ad4c2b
("BUG/MEDIUM: clock: also update the date offset on time jumps") than a
pure fix, in that the issue was not direclty observed but it's visibly
possible by reading the code, so this should be backported along with
the patch above. This is related to issue GH #2704.

Note that this could be simplified in terms of operations by migrating
the deadlines to nanoseconds, but this was the path to least intrusive
changes.
2024-09-12 19:09:19 +02:00
Willy Tarreau
af48e4cc6b BUG/MINOR: clock: make time jump corrections a bit more accurate
Since commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset
on time jumps") we try to update the now_offet based on the last known
valid date. But if it's off compared to the global_now_ns date shared
by other threads, we'll get the time off a little bit. When this happens,
we should consider the most recent of these dates so that if the global
date was already known to be more recent, we should use it and stick to
it. This will avoid setting too large an offset that could in turn provoke
a larger jump on another thread.

This is related to issue GH #2704.

This can be backported to other branches having the patch above.
2024-09-12 18:27:03 +02:00
Willy Tarreau
ad98edd00a BUG/MINOR: polling: fix time reporting when using busy polling
Since commit beb859abce ("MINOR: polling: add an option to support
busy polling") the time and status passed to clock_update_local_date()
were incorrect. Indeed, what is considered is the before_poll date
related to the configured timeout which does not correspond to what
is passed to the poller. That's not correct because before_poll+the
syscall's timeout will be crossed by the current date 100 ms after
the start of the poller. In practice it didn't happen when the poller
was limited to 1s timeout but at one minute it happens all the time.

That's particularly visible when running a multi-threaded setup with
busy polling and only half of the threads working (bind ... thread even).
In this case, the fixup code of clock_update_local_date() is executed
for each round of busy polling. The issue was made really visible
starting with recent commit e8b1ad4c2b ("BUG/MEDIUM: clock: also
update the date offset on time jumps") because upon a jump, the
shared offset is reset, while it should not be in this specific
case.

What needs to be done instead is to pass the configured timeout of
the poller (and not of the syscall), and always pass "interrupted"
set so as to claim we got an event (which is sort of true as it just
means the poller returned instantly). In this case we can still
detect backwards/forward jumps and will use a correct boundary
for the maximum date that covers the whole loop.

This can be backported to all versions since the issue was introduced
with busy-polling in 1.9-dev8.
2024-09-12 17:47:13 +02:00
Christopher Faulet
1900ca475f MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
Since the 2.6, A parsing error is reported when the chunked encoding is
found twice. As stated in RFC9112, A sender must not apply the chunked
transfer coding more than once to a message body. It means only one chunked
coding must be found. In addition, empty values are also rejected becaues it
is forbidden by RFC9110.

However, in both cases, it may be useful to relax the rules for trusted
legacy servers when accept-invalid-http-response option is set. Especially
because it was accepted on 2.4 and older. In addition, T-E header is now
sanitized before sending it. It is not a problem Because it is a hop-by-hop
header

Note that it remains invalid on client side because there is no good reason
to relax the parsing on this side. We can argue a server is trusted so we
can decide to support some legacy behavior. It is not true on client side
and it is highly suspicious if a client is sending an invalid T-E header.

Note also we continue to reject unsupported T-E values (so all codings except
"chunked"). Because the "TE" header is sanitized and cannot contain other value
than "Trailers", there is absolutely no reason for a server to use something
else.

This patch should fix the issue #2677. It could probably be backported as
far as 2.6 if necessary.
2024-09-12 09:21:57 +02:00
Willy Tarreau
2b95c77c08 DOC: server: document what to check for when adding new server keywords
It's too easy to overlook the dynamic servers when adding new server
keywords, and the fields on each keyword line are totally obscure. This
commit adds a title to each column of the table and explains what is
expected and what to check for when adding a keyword.
2024-09-10 18:50:12 +02:00
Damien Claisse
ce6a621ae3 MINOR: server: allow init-state for dynamic servers
Commit 50322df introduced the init-state keyword, but it didn't enable
it for dynamic servers. However, this feature is perfectly desirable
for virtual servers too, where someone would like a server inlived
through "set server be1/srv1 state ready" to be put out of maintenance
in down state until the next health check succeeds.
At reading the code, it seems that it's only a matter of allowing this
keyword for dynamic servers, as current code path calls
srv_adm_set_ready() which incidentally triggers a call to
_srv_update_status_adm().
2024-09-10 18:18:38 +02:00
Willy Tarreau
33deb4babe REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
Commit d6c4ed9a96 ("REGTESTS: h1/h2: Update script testing H1/H2
protocol upgrades") introduced a 0.5 second delay which is higher
than those of most other tests (usually 0.05 or 0.2) and triggers
timeouts on my side. Let's just shorten it to 0.2 since its goal
is only to send data separately.

Note: maybe a barrier approach would be possible, though not
      studied.
2024-09-10 10:36:59 +02:00
Willy Tarreau
9f8d9c9e8b BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
Commit 4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by
coverity") dropped the condition to concatenate error messages and as
such introduced a leading comma in front of all of them. Then commit
911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found")
changed the behavior to stop at the first error anyway, so all the
mechanics dedicated to the concatenation of error messages is no longer
needed and we can simply return the error as-is, without inserting any
comma.

This should be backported where the patches above are backported.
2024-09-10 08:55:29 +02:00
Willy Tarreau
036ab62231 REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
This test has an expect rule for syslog that looks for [cC]D, to
indicate a client abort or timeout during the data phase. The purpose
was to say that when it fails it must be this, but the very low timeout
(1ms) still makes it prone to succeeding if the machine is highly loaded.

This has become more visible since commit e8b1ad4c2b ("BUG/MEDIUM: clock:
also update the date offset on time jumps") because the clock drift
adjustments are more systematic. Since this commit, running 50 such tests
at twice more than the number of CPUs in parallel is sufficient to yield
errors due to some lines appearing as succeeding:

   make reg-tests -- --j $((($(nproc)+1)*2)) --vtestparams -n50 reg-tests/log/wrong_ip_port_logging.vtc

It was observed that pauses up to 300ms were observed in epoll_wait() in
such circumstances, which were properly fixed by the time drift detection..
Another approach would consist in increasing the permitted margin during
which we don't fix the clock drift but that would not be logical since the
base time had really been awaited for.

This should be backported to all stable releases since the commit above
will trigger the issue more often.
2024-09-09 19:38:28 +02:00
Christopher Faulet
a99d58819f BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
This reverts commit 225a4d02e1.

When a 200-OK response is replied to a CONNECT request or a
101-Switching-protocol, a tunnel is considered as established between the
client and the server. However, we must not declare the reponse as
bodyless. Of course, there is no payload, but tunneled data are expected.

Because of this bug, the zero-copy forwarding is disabled on the server
side.

This patch must be backported as far as 2.9.
2024-09-09 19:01:47 +02:00
Christopher Faulet
f6e193f1b0 BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
When the mux is woken up on I/O events, if the zero-copy forwarding is
enabled, receives are blocked. In this case, the SC is woken up to be able
to perform 0-copy forwarding to the other side. This works well, except for
the H1C in CLOSING state.

Indeed, in that case, in h1_process(), the SC is not woken up because only
RUNNING H1 connections are considered. As consequence, the mux will ignore
connection closure. The H1 connection remains blocked, waiting for the
shutdown timeout. If no timeout is configured, the H1 connection is never
closed leading to a leak.

This patch should fix leak reported by Damien Claisse in the issue #2697. It
should be backported as far as 2.8.
2024-09-09 19:01:47 +02:00
William Lallemand
021ac6a108 MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
The new "dump ssl cert" CLI command allows to dump a certificate stored
into HAProxy memory. Until now it was only possible to dump the
description of the certificate using "show ssl cert", but with this new
command you can dump the PEM content on the filesystem.

This command is only available on a admin stats socket.

$ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock -
-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
2024-09-09 16:54:48 +02:00
Aurelien DARRAGON
68cfb222b5 BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.

This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Hopefully, Github user @asmnek not only
provided useful informations, but since he was able to consistently
trigger the crash in his environment he was able to nail down the crash to
the use of pattern file involved with 2 named default sections. Big thanks
to him.

To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on "do_free" variable to know if the expression should be freed
or not (which proved to be insufficient in our case), let's switch to a
simple refcounting logic. This way, no matter who owns the expression, the
last one attempting to free it will be responsible for freeing it.
Refcount is implemented using a 32bit value which fills a previous 4 bytes
structure gap:

        int                        mflags;               /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        long unsigned int          lock;                 /*    88     8 */
(output from pahole)

Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theorically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
2024-09-09 16:07:05 +02:00
Aurelien DARRAGON
8157c1caf2 BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
Using valgrind when running map_beg or map_str, the following error is
reported:

==242644== Conditional jump or move depends on uninitialised value(s)
==242644==    at 0x2E4AB1: pat_match_str (pattern.c:457)
==242644==    by 0x2E81ED: pattern_exec_match (pattern.c:2560)
==242644==    by 0x343176: sample_conv_map (map.c:211)
==242644==    by 0x27522F: sample_process_cnv (sample.c:1330)
==242644==    by 0x2752DB: sample_process (sample.c:1373)
==242644==    by 0x319917: action_store (vars.c:814)
==242644==    by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697)

In fact, the error is legit, because in pat_match_{beg,str}, we
dereference the buffer on len+1 to check if a value was previously set,
and then decide to force NULL-byte if it wasn't set.

But the approach is no longer compatible with current architecture:
data past str.data is not guaranteed to be initialized in the buffer.
Thus we cannot dereference the value, else we expose us to uninitialized
read errors. Moreover, the check is useless, because we systematically
set the ending byte to 0 when the conditions are met.

Finally, restoring the older value after the lookup is not relevant:
indeed, either the sample is marked as const and in such case it
is already duplicated, or the sample is not const and we forcefully add
a terminating NULL byte outside from the actual string bytes (since we're
past str.data), so as we didn't alter effective string data and that data
past str.data cannot be dereferenced anyway as it isn't guaranteed to be
initialized, there's no point in restoring previous uninitialized data.

It could be backported in all stable versions. But since this was only
detected by valgrind and isn't known to cause issues in existing
deployments, it's probably better to wait a bit before backporting it
to avoid any breakage.. although the fix should be theoretically harmless.
2024-09-09 15:57:30 +02:00