Commit Graph

7303 Commits

Author SHA1 Message Date
Willy Tarreau
e03d05c6ce MINOR: check: remember when we migrate a check
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combine with SLEEPING to form 4 possible states:

 SLP  RDY   State      Description
  0    0    -          (reserved)
  0    1    RUNNING    Check is bound to current thread and running
  1    0    SLEEPING   Check is sleeping, not bound to a thread
  1    1    MIGRATING  Check is migrating to another thread

Thus we set READY upon migration, and check for it before migrating, this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are
adjacent.
2023-09-01 08:26:06 +02:00
Willy Tarreau
7163f95b43 MINOR: checks: start the checks in sleeping state
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.

Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
2023-09-01 08:26:06 +02:00
Willy Tarreau
52b260bae4 MINOR: server/ssl: maintain an index of the last known valid SSL session
When a thread creates a new session for a server, if none was known yet,
we assign the thread id (hence the reused_sess index) to a shared variable
so that other threads will later be able to find it when they don't have
one yet. For now we only set and clear the pointer upon session creation,
we do not yet pick it.

Note that we could have done it per thread-group, so as to avoid any
cross-thread exchanges, but it's anticipated that this is essentially
used during startup, at a moment where the cost of inter-thread contention
is very low compared to the ability to restart at full speed, which
explains why instead we store a single entry.
2023-08-31 08:50:01 +02:00
Willy Tarreau
607041dec3 MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
The goal will be to permit a thread to update its session while having
it shared with other threads. For now we only place the lock and arrange
the code around it so that this is quite light. For now only the owner
thread uses this lock so there is no contention.

Note that there is a subtlety in the openssl API regarding
i2s_SSL_SESSION() in that it fills the area pointed to by its argument
with a dump of the session and returns a size that's equal to the
previously allocated one. As such, it does modify the shared area even
if that's not obvious at first glance.
2023-08-31 08:50:01 +02:00
Alexander Stephan
ece0d1ab49 MINOR: sample: Refactor fc_pp_authority by wrapping the generic TLV fetch
We already have a call that can retreive an TLV with any value.
Therefore, the fetch logic is redundant and can be simplified
by simply calling the generic fetch with the correct TLV ID
set as an argument.
2023-08-29 15:31:51 +02:00
Alexander Stephan
fecc573da1 MEDIUM: connection: Generic, list-based allocation and look-up of PPv2 TLVs
In order to be able to implement fetches in the future that allow
retrieval of any TLVs, a new generic data structure for TLVs is introduced.

Existing TLV fetches for PP2_TYPE_AUTHORITY and PP2_TYPE_UNIQUE_ID are
migrated to use this new data structure. TLV related pools are updated
to not rely on type, but only on size. Pools accomodate the TLV list
element with their associated value. For now, two pools for 128 B and
256 B values are introduced. More fine-grained solutions are possible
in the future, if necessary.
2023-08-29 15:15:47 +02:00
Alexander Stephan
c9d47652d2 CLEANUP/MINOR: connection: Improve consistency of PPv2 related constants
This patch improves readability by scoping HA proxy related PPv2 constants
with a 'HA" prefix. Besides, a new constant for the length of a CRC32C
TLV is introduced. The length is derived from the PPv2 spec, so 32 Bit.
2023-08-29 15:15:47 +02:00
Willy Tarreau
bd84387beb MEDIUM: capabilities: enable support for Linux capabilities
For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
a basic support for capabilities at least on linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.

A new keyword "setcaps" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.

A good way to test this is to start haproxy with such a config:

    global
        uid 1000
        setcap cap_net_bind_service

    frontend test
        mode http
        timeout client 3s
        bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt

and run it under "sudo strace -e trace=bind,setuid", then connecting
there from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
2023-08-29 11:11:50 +02:00
William Lallemand
e7d9082315 BUG/MINOR: ssl/cli: can't find ".crt" files when replacing a certificate
Bug was introduced by commit 26654 ("MINOR: ssl: add "crt" in the
cert_exts array").

When looking for a .crt directly in the cert_exts array, the
ssl_sock_load_pem_into_ckch() function will be called with a argument
which does not have its ".crt" extensions anymore.

If "ssl-load-extra-del-ext" is used this is not a problem since we try
to add the ".crt" when doing the lookup in the tree.

However when using directly a ".crt" without this option it will failed
looking for the file in the tree.

The fix removes the "crt" entry from the array since it does not seem to
be really useful without a rework of all the lookups.

Should fix issue #2265

Must be backported as far as 2.6.
2023-08-28 18:20:39 +02:00
Willy Tarreau
892d04733f BUILD: import: guard plock.h against multiple inclusion
Surprisingly there's no include guard in plock.h though there is one in
atomic-ops.h. Let's add one, or we cannot risk including the file multiple
times.
2023-08-26 17:28:08 +02:00
Amaury Denoyelle
5afcb686b9 MAJOR: connection: purge idle conn by last usage
Backend idle connections are purged on a recurring occurence during the
process lifetime. An estimated number of needed connections is
calculated and the excess is removed periodically.

Before this patch, purge was done directly using the idle then the safe
connection tree of a server instance. This has a major drawback to take
no account of a specific ordre and it may removed functional connections
while leaving ones which will fail on the next reuse.

The problem can be worse when using criteria to differentiate idle
connections such as the SSL SNI. In this case, purge may remove
connections with a high rate of reusing while leaving connections with
criteria never matched once, thus reducing drastically the reuse rate.

To improve this, introduce an alternative storage for idle connection
used in parallel of the idle/safe trees. Now, each connection inserted
in one of this tree is also inserted in the new list at
`srv_per_thread.idle_conn_list`. This guarantees that recently used
connection is present at the end of the list.

During the purge, use this list instead of idle/safe trees. Remove first
connection in front of the list which were not reused recently. This
will ensure that connection that are frequently reused are not purged
and should increase the reuse rate, particularily if distinct idle
connection criterias are in used.
2023-08-25 15:57:48 +02:00
Amaury Denoyelle
61fc9568fb MINOR: server: move idle tree insert in a dedicated function
Define a new function _srv_add_idle(). This is a simple wrapper to
insert a connection in the server idle tree. This is reserved for simple
usage and require to idle_conns lock. In most cases,
srv_add_to_idle_list() should be used.

This patch does not have any functional change. However, it will help
with the next patch as idle connection will be always inserted in a list
as secondary storage along with idle/safe trees.
2023-08-25 15:57:48 +02:00
Amaury Denoyelle
77ac8eb4a6 MINOR: connection: simplify removal of idle conns from their trees
Small change of API for conn_delete_from_tree(). Now the connection
instance is taken as argument instead of its inner node.

No functional change introduced with this commit. This simplifies
slightly invocation of conn_delete_from_tree(). The most useful changes
is that this function will be extended in the next patch to be able to
remove the connection from its new idle list at the same time as in its
idle tree.
2023-08-25 15:57:48 +02:00
Frdric Lcaille
81815a9a83 MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
There is only one read access which is needed. This is in the "show map" command
callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
2023-08-25 15:42:03 +02:00
Frdric Lcaille
745d1a269b MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
Store a pointer to the expression (struct pattern_expr) into the data structure
used to chain/store the map element references (struct pat_ref_elt) , e.g. the
struct pattern_tree when stored into an ebtree or struct pattern_list when
chained to a list.

Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map
and to look for the <elt> element passed as parameter to retrieve the sample data
to be parsed. Indeed, thanks to the pointer added above to each pattern tree nodes
or list elements, they all can be inspected directly from the <elt> passed as
parameter and its ->tree_head and ->list_head member: the pattern tree nodes are
stored into elt->tree_head, and the pattern list elements are chained to
elt->list_head list. This inspection was also the job of pattern_find_smp() which
is no more useful. This patch removes the code of this function.
2023-08-25 15:41:59 +02:00
Frdric Lcaille
0844bed7d3 MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
Organize reference to pattern element of map (struct pat_ref_elt) into an ebtree:
  - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
    element (pat_ref_elt struct),
  - modify the code to insert these nodes into their ebtrees each time they are
    allocated. This is done in pat_ref_append().

Note that ->head member (struct list) of map (struct pat_ref) is not removed
could have been removed. This is not the case because still necessary to dump
the map contents from the CLI in the order the map elememnts have been inserted.

This patch also modifies http_action_set_map() which is the callback at least
used by "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt()
is no more ignored, but reused if not NULL by pat_ref_set() as first element to
lookup from. This latter is also modified to use the ebtree attached to the map
in place of the ->head list attached to each map element (pat_ref_elt struct).

Also modify pat_ref_find_elt() to makes it use ->eb_root map ebtree added to the
map by this patch in place of inspecting all the elements with a strcmp() call.
2023-08-25 15:41:56 +02:00
Amaury Denoyelle
5053e89142 MEDIUM: h2: prevent stream opening before connection reverse completed
HTTP/2 demux must be handled with care for active reverse connection.
Until accept has been completed, it should be forbidden to handle
HEADERS frame as session is not yet ready to handle streams.

To implement this, use the flag H2_CF_DEM_TOOMANY which blocks demux
process. This flag is automatically set just after conn_reverse()
invocation. The flag is removed on rev_accept_conn() callback via a new
H2 ctl enum. H2 tasklet is woken up to restart demux process.

As a side-effect, reporting in H2 mux may be blocked as demux functions
are used to convert error status at the connection level with
CO_FL_ERROR. To ensure error is reported for a reverse connection, check
h2c_is_dead() specifically for this case in h2_wake(). This change also
has its own side-effect : h2c_is_dead() conditions have been adjusted to
always exclude !h2c->conn->owner condition which is always true for
reverse connection or else H2 mux may kill them unexpectedly.
2023-08-24 17:03:08 +02:00
Amaury Denoyelle
47f502df5e MEDIUM: proto_reverse_connect: bootstrap active reverse connection
Implement active reverse connection initialization. This is done through
a new task stored in the receiver structure. This task is instantiated
via bind callback and first woken up via enable callback.

Task handler is separated into two halves. On the first step, a new
connection is allocated and stored in <pend_conn> member of the
receiver. This new client connection will proceed to connect using the
server instance referenced in the bind_conf.

When connect has successfully been executed and HTTP/2 connection is
ready for exchange after SETTINGS, reverse_connect task is woken up. As
<pend_conn> is still set, the second halve is executed which only
execute listener_accept(). This will in turn execute accept_conn
callback which is defined to return the pending connection.

The task is automatically requeued inside accept_conn callback if bind
maxconn is not yet reached. This allows to specify how many connection
should be opened. Each connection is instantiated and reversed serially
one by one until maxconn is reached.

conn_free() has been modified to handle failure if a reverse connection
fails before being accepted. In this case, no session exists to notify
about the failure. Instead, reverse_connect task is requeud with a 1
second delay, giving time to fix a possible network issue. This will
allow to attempt a new connection reverse.

Note that for the moment connection rebinding after accept is disabled
for simplicity. Extra operations are required to migrate an existing
connection and its stack to a new thread which will be implemented
later.
2023-08-24 17:03:06 +02:00
Amaury Denoyelle
0747e493a0 MINOR: proto_reverse_connect: parse rev@ addresses for bind
Implement parsing for "rev@" addresses on bind line. On config parsing,
server name is stored on the bind_conf.

Several new callbacks are defined on reverse_connect protocol to
complete parsing. listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure HTTP/2
protocol only is used by the server.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
008e8f67ee MINOR: connection: extend conn_reverse() for active reverse
Implement active reverse support inside conn_reverse(). This is used to
transfer the connection from the backend to the frontend side.

A new flag is defined CO_FL_REVERSED which is set just after this
transition. This will be used to identify connections which were
reversed but not yet accepted.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
5db6dde058 MINOR: proto: define dedicated protocol for active reverse connect
A new protocol named "reverse_connect" is created. This will be used to
instantiate connections that are opened by a reverse bind.

For the moment, only a minimal set of callbacks are defined with no real
work. This will be extended along the next patches.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
1723e21af2 MINOR: connection: use attach-srv name as SNI reuse parameter on reverse
On connection passive reverse from frontend to backend, its hash node is
calculated to be able to select it from the idle server pool. If
attach-srv rule defined an associated name, reuse it as the value for
SNI prehash.

This change allows a client to select a reverse connection by its name
by configuring its server line with a SNI to permit this.
2023-08-24 17:02:34 +02:00
Amaury Denoyelle
0b3758e18f MINOR: tcp-act: define optional arg name for attach-srv
Add an optional argument 'name' for attach-srv rule. This contains an
expression which will be used as an identifier inside the server idle
pool after reversal. To match this connection for a future transfer
through the server, the SNI server parameter must match this name. If no
name is defined, match will only occur with an empty SNI value.

For the moment, only the parsing step is implemented. An extra check is
added to ensure that the reverse server uses SSL with a SNI. Indeed, if
name is defined but server does not uses a SNI, connections will never
be selected on reused after reversal due to a hash mismatch.
2023-08-24 15:28:38 +02:00
Amaury Denoyelle
58cb76d7e1 MINOR: tcp-act: parse 'tcp-request attach-srv' session rule
Create a new tcp-request session rule 'attach-srv'.

The parsing handler is used to extract the server targetted with the
notation 'backend/server'. The server instance is stored in the act_rule
instance under the new union variant 'attach_srv'.

Extra checks are implemented in parsing to ensure attach-srv is only
used for proxy in HTTP mode and with listeners/server with no explicit
protocol reference or HTTP/2 only.

The action handler itself is really simple. It assigns the stored server
instance to the 'reverse' member of the connection instance. It will be
used in a future patch to implement passive reverse-connect.
2023-08-24 15:02:32 +02:00
Amaury Denoyelle
6e428dfaf2 MINOR: backend: only allow reuse for reverse server
A reverse server relies solely on its pool of idle connection to
transfer requests which will be populated through a new tcp-request rule
'attach-srv'.

Several changes are required on connect_server() to implement this.
First, reuse mode is forced to always for this type of server. Then, if
no idle connection is found, the request will be aborted. This results
with a 503 HTTP error code, similarly to when no server is available.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
e6223a3188 MINOR: server: define reverse-connect server
Implement reverse-connect server. This server type cannot instantiate
its own connection on transfer. Instead, it can only reuse connection
from its idle pool. These connections will be populated using the future
'tcp-request session attach-srv' rule.

A reverse-connect has no address. Instead, it uses a new custom server
notation with '@' character prefix. For the moment, only '@reverse' is
defined. An extra check is implemented to ensure server is used in a
HTTP proxy.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
4fb538d4b6 MEDIUM: h2: reverse connection after SETTINGS reception
Reverse connection after SETTINGS reception if it was set as reversable.
This operation is done in a new function h2_conn_reverse(). It regroups
common changes which are needed for both reversal direction :
H2_CF_IS_BACK is set or unset and timeouts are inverted.

For the moment, only passive reverse is fully implemented. Once done,
the connection instance is directly inserted in its targetted server
pool. It can then be used immediately for future transfers using this
server.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
1f76b8ae07 MEDIUM: connection: implement passive reverse
Define a new method conn_reverse(). This method is used to reverse a
connection from frontend to backend or vice-versa depending on its
initial status.

For the moment, passive reverse only is implemented. This covers the
transition from frontend to backend side. The connection is detached
from its owner session which can then be freed. Then the connection is
linked to the server instance.

only for passive connection on
frontend to transfer them on the backend side. This requires to free the
connection session after detaching it from.
2023-08-24 14:44:33 +02:00
Amaury Denoyelle
fbe35afaa4 MINOR: proxy: simplify parsing 'backend/server'
Several CLI handlers use a server argument specified with the format
'<backend>/<server>'. The parsing of this arguement is done in two
steps, first splitting the string with '/' delimiter and then use
get_backend_server() to retrieve the server instance.

Refactor this code sections with the following changes :
* splitting is reimplented using ist API
* get_backend_server() is removed. Instead use the already existing
  proxy_be_by_name() then server_find_by_name() which contains
  duplicated code with the now removed function.

No functional change occurs with this commit. However, it will be useful
to add new configuration options reusing the same '<backend>/<server>'
for reverse connect.
2023-08-24 14:44:33 +02:00
Willy Tarreau
9b47ed1a93 IMPORT: xxhash: update xxHash to version 0.8.2
Peter Varkoly reported a build issue on ppc64le in xxhash.h. Our version
(0.8.1) was the last one 9 months ago, and since then this specific issue
was addressed in 0.8.2, so let's apply the maintenance update.

This should be backported to 2.8 and 2.7.
2023-08-24 12:01:06 +02:00
Amaury Denoyelle
cd97ba147c BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1
Compilation is broken due to missing __pl_wait_unlock_long() definition
when building with PLOCK_DISABLE_EBO=1. This has been introduced since
the following commit which activates the inlining version of
pl_wait_unlock_long() :
  commit 071d689a51
  MINOR: threads: inline the wait function for pthread_rwlock emulation

Add an extra check on PLOCK_DISABLE_EBO before choosing the inline or
default version of pl_wait_unlock_long() to fix this.
2023-08-17 11:16:54 +02:00
Willy Tarreau
78fa54863d MINOR: atomic: make sure to always relax after a failed CAS
There were a few places left where we forgot to call __ha_cpu_relax()
after a failed CAS, in the HA_ATOMIC_UPDATE_{MIN,MAX} macros, and in
a few sync_* API macros (the same as above plus HA_ATOMIC_CAS and
HA_ATOMIC_XCHG). Let's add them now.

This could have been a cause of contention, particularly with
process_stream() calling stream_update_time_stats() which uses 8 of them
in a call (4 for the server, 4 for the proxy). This may be a possible
explanation for the high CPU consumption reported in GH issue #2251.

This should be backported at least to 2.6 as it's harmless.
2023-08-17 09:09:20 +02:00
Willy Tarreau
071d689a51 MINOR: threads: inline the wait function for pthread_rwlock emulation
When using pthread_rwlock emulation, contention is reported on
pl_wait_unlock_long(). This is really not convenient to analyse what is
happening. Now plock supports inlining the wait call for just the lorw
functions by enabling PLOCK_LORW_INLINE_WAIT. Let's do this so that now
the wait time will be precisely reported as either pthread_rwlock_rdlock()
or pthread_rwlock_wrlock() depending on the contended function, but no
more on pl_wait_unlock_long(), which will still be reported for all
other locks.
2023-08-17 00:09:05 +02:00
Willy Tarreau
e56275378f IMPORT: lorw: support inlining the wait call
Now when PLOCK_LORW_INLINE_WAIT is defined, the pl_wait_unlock_long()
calls in pl_lorw_rdlock() and pl_lorw_wrlock() will be inlined so that
all the CPU time is accounted for in the calling function.

This is plock upstream commit c993f81d581732a6eb8fe3033f21970420d21e5e.
2023-08-17 00:09:05 +02:00
Willy Tarreau
66dcc0550e IMPORT: plock: always expose the inline version of the lock wait function
Doing so will allow to expose the time spent in certain highly
contended functions, which can be desirable for more accurate CPU
profiling. For example this could be done in locking functions that
are already not inlined so that they are the ones being reported as
those consuming the CPU instead of just pl_wait_unlock_long().

This is plock upstream commit 7505c2e2c8c4aa0ab8f52a2288e1334ae6412be4.
2023-08-17 00:09:05 +02:00
Willy Tarreau
c6b98f05d2 IMPORT: plock: also support inlining the int code
Commit 9db830b ("plock: support inlining exponential backoff code")
added an option to support inlining of the wait code for longs but
forgot to do it for ints. Let's do it now.

This is plock upstream commit b1f9f0d252fa40577d11cfb2bc0a809d6960a297.
2023-08-17 00:09:05 +02:00
Willy Tarreau
7bf829ace1 MAJOR: pools: move the shared pool's free_list over multiple buckets
This aims at further reducing the contention on the free_list when using
global pools. The free_list pointer now appears for each bucket, and both
the alloc and the release code skip to a next bucket when ending on a
contended entry. The default entry used for allocations and releases
depend on the thread ID so that locality is preserved as much as possible
under low contention.

It would be nice to improve the situation to make sure that releases to
the shared pools doesn't consider the first entry's pointer but only an
argument that would be passed and that would correspond to the bucket in
the thread's cache. This would reduce computations and make sure that the
shared cache only contains items whose pointers match the same bucket.
This was not yet done. One possibility could be to keep the same splitting
in the local cache.

With this change, an h2load test with 5 * 160 conns & 40 streams on 80
threads that was limited to 368k RPS with the shared cache jumped to
3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets
and 5.5M RPS for 64 buckets.
2023-08-12 19:04:34 +02:00
Willy Tarreau
8a0b5f783b MINOR: pools: move the failed allocation counter over a few buckets
The failed allocation counter cannot depend on a pointer, but since it's
a perpetually increasing counter and not a gauge, we don't care where
it's incremented. Thus instead we're hashing on the TID. There's no
contention there anyway, but it's better not to waste the room in
the pool's heads and to move that with the other counters.
2023-08-12 19:04:34 +02:00
Willy Tarreau
da6999f839 MEDIUM: pools: move the needed_avg counter over a few buckets
That's the same principle as for ->allocated and ->used. Here we return
the summ of the raw values, so the result still needs to be fed to
swrate_avg(). It also means that we now use the local ->used instead
of the global one for the calculations and do not need to call pool_used()
anymore on fast paths. The number of samples should likely be divided by
the number of buckets, but that's not done yet (better observe first).

A function pool_needed_avg() was added to report aggregated values for
the "show pools" command.

With this change, an h2load made of 5 * 160 conn * 40 streams on 80
threads raised from 1.5M RPS to 6.7M RPS.
2023-08-12 19:04:34 +02:00
Willy Tarreau
9e5eb586b1 MEDIUM: pools: move the used counter over a few buckets
That's the same principle as for ->allocated. The small difference here
is that it's no longer possible to decrement ->used in batches when
releasing clusters from the cache to the shared cache, so the counter
has to be decremented for each of them. But as it provides less
contention and it's done only during forced eviction, it shouldn't be
a problem.

A function "pool_used()" was added to return the sum of the entries.
It's used by pool_alloc_nocache() and pool_free_nocache() which need
to count the number of used entries. It's not a problem since such
operations are done when picking/releasing objects to/from the OS,
but it is a reminder that the number of buckets should remain small.

With this change, an h2load test made of 5 * 160 conn * 40 streams on
80 threads raised from 812k RPS to 1.5M RPS.
2023-08-12 19:04:34 +02:00
Willy Tarreau
cdb711e42b MEDIUM: pools: spread the allocated counter over a few buckets
The ->used counter is one of the most stressed, and it heavily
depends on the ->allocated one, so let's first move ->allocated
to a few buckets.

A function "pool_allocated()" was added to return the sum of the entries.
It's important not to abuse it as it does iterate, so everywhere it's
possible to avoid it by keeping a local counter, it's better. Currently
it's used for limited pools which need to make sure they do not allocate
too many objects. That's an acceptable tradeoff to save CPU on large
machines at the expense of spending a little bit more on small ones which
normally are not under load.
2023-08-12 19:04:34 +02:00
Willy Tarreau
06885aaea7 MINOR: pools: introduce the use of multiple buckets
On many threads and without the shared cache, there can be extreme
contention on the ->allocated counter, the ->free_list pointer, and
the ->used counter. It's possible to limit this contention by spreading
the counters a little bit over multiple entries, that are summed up when
a consultation is needed. The criterion used to spread the values cannot
be related to the thread ID due to migrations, since we need to keep
consistent stats (allocated vs used).

Instead we'll just hash the pointer, it provides an index that does the
job and that is consistent for the object. When having just a few entries
(16 here as it showed almost identical performance between global and
non-global pools) even iterations should be short enough during
measurements to not be a problem.

A pair of functions designed to ease pointer hash bucket calculation were
added, with one of them doing it for thread IDs because allocation failures
will be associated with a thread and not a pointer.

For now this patch only brings in the relevant parts of the infrastructure,
the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512
threads or more are supported, 5 bits when 128 or more are supported, 4
bits when 16 or more are supported, otherwise 3 bits for small setups.
The array in the pool_head and the two utility functions are already
added. It should have no measurable impact beyond inflating the pool_head
structure.
2023-08-12 19:04:34 +02:00
Willy Tarreau
29ad61fb00 OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated
The pool's allocation counter doesn't strictly require to be updated
from these functions, it may more efficiently be done in the caller
(even out of a loop for pool_flush() and pool_gc()), and doing so will
also help us spread the counters over an array later. The functions
were renamed _noinc and _nodec to make sure we catch any possible
user in an external patch. If needed, the original functions may easily
be reimplemented in an inline function.
2023-08-12 19:04:34 +02:00
Willy Tarreau
f0d188f6ed OPTIM: tools: improve hash distribution using a better prime seed
During tests it was noticed that the current hash is not that good
on 4- and 5- bit hashes. About 7.5% of all the 32-bit primes were tested
as candidates for the hash function, by submitting them 128 arrangements
of N pointers among 40k extracted from haproxy's pools, and the average
fill rates for 1- to 12- bit hashes were measured and compared. It was
clear that some values do not provide great hashes and other ones are
way more resistant.

The current value is not bad at all but delivers 42.6% unique 2-bit
outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some
values did perform significantly better, among which 0xacd1be85 which
does 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3% 10-bit.

The reverse value used in the ptr2_hash() was really underperforming and
was replaced with 0x9d28e4e9 which does 49.6%, 40.4%, 42.6%, 39.1%, and
37.2% respectvely.

This should slightly improve the accuracy of the task and memory
profiling, and will be useful for pools.
2023-08-12 19:04:34 +02:00
Willy Tarreau
58946d44f8 MINOR: tools: improve ptr hash distribution on 64 bits
When testing the pointer hash on 64-bit real pointers (map entries),
it appeared that the shift by 33 bits that hoped to compensate for the
3 nul LSB degrades the hash, and the centering is more optimal on
31-(bits+1)/2. This makes sense since the topmost bit of the
multiplicator is 31, so for an input of 1 bit and 1 bit of output we
would always get zero. With the formula adjusted this way, we can get
up to ~15% more unique entries at 10 bits and ~24% more at 11 bits.
2023-08-12 19:04:34 +02:00
Willy Tarreau
ab6cb5dea0 MINOR: tools: make ptr_hash() support 0-bit outputs
When dealing with macro-based size definitions, it is useful to be able
to hash pointers on zero bits so that the macro automatically returns a
constant 0. For now it only supports 1-32. Let's just add this special
case. It's automatically optimized out by the compiler since the function
is inlined.
2023-08-12 19:04:34 +02:00
Willy Tarreau
59c347c15e BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP
LONGBITS was defined long ago with old compilers that didn't provide the
word size. It's still present as being referenced in various places in the
code, but we must not use it to define other macros that may be evaluated
at pre-processing time since it contains sizeof() and casts that are not
compatible with preprocessor conditions. Let's switch MAX_THREADS_PER_GROUP
to __WORDSIZE so that we can condition blocks of code on it if needed.

LONGBITS should really be removed by now, given that we don't support
compilers not providing __WORDSIZE anymore (gcc < 4.2).
2023-08-12 19:04:34 +02:00
Willy Tarreau
9e52c35de4 CLEANUP: stick-table: slightly reorder the stktable struct
By moving the config-time stuff after the updt_lock, we can plug some
holes without interfering with it. This allows us to get back to the
768-bytes struct. The performance was not affected at all.
2023-08-11 19:03:35 +02:00
Willy Tarreau
9c6248560e MINOR: stick-table: move the update lock into its own cache line
The read-lock contention observed on the update lock while turning it
into an upgradable lock were due to false sharing with the nearby
updates. Simply moving the lock alone into its own cache line is
sufficient to almost double the performance again, raising from 2355
to 4480k RPS with very low contention:

  Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 743422995452 lost
  Overhead  Shared Object          Symbol
    15.88%  haproxy                [.] stktable_lookup_key
     5.94%  haproxy                [.] ebmb_lookup
     5.69%  haproxy                [.] http_wait_for_request
     3.66%  haproxy                [.] stktable_touch_with_exp
     2.62%  [kernel]               [k] _raw_spin_unlock_irqrestore
     1.86%  haproxy                [.] http_action_return
     1.79%  haproxy                [.] stream_process_counters
     1.78%  [kernel]               [k] skb_release_data
     1.77%  haproxy                [.] process_stream

Unfortunately, trying to move the line anywhere else didn't work,
despite the remaining holes, because this structure is not quite
clean. This adds 64 bytes to a struct that was already 768 long,
so it's now 832. It's possible to repack it a little bit and regain
these bytes by removing the THREAD_ALIGN before "keys" because we
rarely use the config stuff, but that's a bit unsafe.
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. It required to add an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now has to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
2023-08-11 19:03:35 +02:00
Willy Tarreau
cc10fce9c2 MINOR: stick-table: better organize the struct stktable
The structure currently mixes R/O and R/W fields, let's organize them
by access type, focusing mainly on splitting the updates from the rest
so that peers activity does not affect the rest. For now it doesn't
bring any benefit but it paves the way for splitting the lock.
2023-08-11 19:03:35 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there, we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
8178a5211c MAJOR: threads/plock: update the embedded library again
This updates the local copy of the plock library to benefit from finer
memory ordering, EBO on more operations such as when take_w() and stow()
wait for readers to leave  and refined EBO, especially on common operation
such as attempts to upgade R to S, and avoids a counter-productive prior
read in rtos() and take_r().

These changes have shown a 5% increase on regular operations on ARM,
a 33% performance increase on ARM on stick-tables and 2% on x86, and
a 14% and 4% improvements on peers updates respectively on ARM and x86.

The availability of relaxed operations will probably be useful for stats
counters which are still extremely expensive to update.

The following plock commits were included in this update:

  9db830b plock: support inlining exponential backoff code
  008d3c2 plock: make the rtos upgrade faster
  2f76dde atomic: clean up the generic xchg()
  3c6919b atomic: make sure that the no-return macros do not return a value
  97c2bb7 atomic: make the fallback bts use the pointed type for the shift
  f4c1880 atomic: also implement the missing pl_btr()
  8329b82 atomic: guard all generic definitions to make it easier to provide specific ones
  7c5cb62 atomic: use C11 atomics when available
  96afaf9 atomic: prefer the C11 definitions in general
  f3ec7a6 atomic: implement load/store/atomic barriers
  8bdbd1e atomic: add atomic load/stores
  0f604c0 atomic: add more _noret operations
  3fe35db atomic: remove the (void) cast from the C11 operations
  3b08a7c atomic: allow to define the fallback _noret variants
  28deb22 atomic: make x86 arithmetic operations the _noret variants
  8061fe2 atomic: handle modern compilers that support returning flags
  b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>)
  59817ca atomic: add memory order variants for most operations
  a40774f plock: explicitly make use of the pl_*_noret operations
  6f1861b plock: switch to pl_sub_noret_lax() for cancellation
  c013980 plock: use pl_ldadd{_lax,_acq,} instead of pl_xadd()
  382eea3 plock: use a release ordering when dropping the lock
  60d750d plock: use EBO when waiting for readers to leave in take_w() and stow()
  fc01c4f plock: improve EBO a little bit
  1ef6390 plock: switch to CAS + XADD for pl_take_r()
2023-08-11 19:03:35 +02:00
Frédéric Lécaille
b0e32c6263 BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands
->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept
by the remaining QUIC connection object (struct quic_cc_conn) after having
release the previous one (struct quic_conn) to allow "show fd/sess" commands
to be functional without causing haproxy crashes.

No need to backport.
2023-08-11 11:21:31 +02:00
Willy Tarreau
d93a00861d MINOR: h2: pass accept-invalid-http-request down the request parser
We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal
will be to add deactivable checks that are still desirable to have by
default. For now no test is subject to it.
2023-08-08 19:10:54 +02:00
Willy Tarreau
30f58f4217 MINOR: http: add new function http_path_has_forbidden_char()
As its name implies, this function checks if a path component has any
forbidden headers starting at the designated location. The goal is to
seek from the result of a successful ist_find_range() for more precise
chars. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure
we're not too strict at this point.
2023-08-08 19:10:54 +02:00
Willy Tarreau
197668de97 MINOR: ist: add new function ist_find_range() to find a character range
This looks up the character range <min>..<max> in the input string and
returns a pointer to the first one found. It's essentially the equivalent
of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals
with a range.
2023-08-08 19:10:54 +02:00
Willy Tarreau
d4069f3cee REORG: http: move has_forbidden_char() from h2.c to http.h
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
2023-08-08 19:02:24 +02:00
Frédéric Lécaille
9f7cfb0a56 MEDIUM: quic: Allow the quic_conn memory to be asap released.
When the connection enters the "connection closing" state after having sent
a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory
may be freed from quic_conn objects (QUIC connection). This is done allocating
a reduced sized object which keeps enough information to handle the remaining
incoming packets for the connection in "connection closing" state, and to
continue to send again the previous datagram with CONNECTION_CLOSE frames inside
which has already been sent.

Define a new quic_cc_conn struct which represents the connection object after
entering the "connection close" state and after having release the quic_conn
connection object.

Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects.

Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object
(the connection before entering "connection close" state), and new quic_cc_conn
struct object (the connection after entering "connection close"). So, all the
members inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.

Implement qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible of copying the
required information from the original connection (quic_conn) to the remaining
connection (quic_cc_conn). Among others initialization, it redefined the
QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task
to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received and
resend the datagram which CONNECTION_CLOSE frame which has already been sent
when entering "connection close" state. qc_cc_idle_timer_task() only releases
the remaining quic_cc_conn struct object.

Modify quic_conn_release() to allocate quic_cc_conn struct objects from the
original connection passed as argument. It does nothing if this original
connection is not in closing state, or if the idle timer has already expired.

Implement quic_release_cc_conn() to release a "connection close" connection.
It is called when its timer expires or if an error occured when sending
a packet from this connection when the peer is no more reachable.
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
276697438d MINOR: quic: Use a pool for the connection ID tree.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace ->cids member of quic_conn objects by pointer to "quic_cids" and
adapt the code consequently. Nothing special.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
dc9b8e1f27 MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer.
Add a new pool <pool_head_quic_cc_buf> for buffer used when building datagram
wich CONNECTION_CLOSE frames inside with QUIC_MIN_CC_PKTSIZE(128) as minimum
size.
Add ->cc_buf_area to quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close" datagrams
and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member
value.
Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate
a struct "quic_cc_buf" buffer when the connection needs an immediate close
or a buffer struct if not.
Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such
"quic_cc_buf" buffer when an immediate close is required.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
f7ab5918d1 MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct
Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct member to
bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep member respectively).
They are moved before being defined into a bytes anonoymous struct common to
a future struct to be defined.

Consequently adapt the code.
2023-08-07 18:57:45 +02:00
Frédéric Lécaille
a45f90dd4e MINOR: quic: Amplification limit handling sanitization.
Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit
is respected when it return false(0), i.e. when the connection is not validated.

Implement quic_may_send_bytes() which returns the number of bytes which may be
sent when the connection has not already been validated and call this functions
at several places when this is the case (after having called
quic_peer_validated_addr()).

Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
2023-08-07 18:57:45 +02:00
Frédéric Lécaille
1f40b6c9fe CLEANUP: quic: Remove quic_path_room().
This function is definitively no more needed/used.
2023-08-07 18:57:45 +02:00
Amaury Denoyelle
559482c11e MINOR: h3: abort request if not completed before full response
A HTTP server may provide a complete response even prior receiving the
full request. In this case, RFC 9114 allows the server to abort read
with a STOP_SENDING with error code H3_NO_ERROR.

This scenario was notably reproduced with haproxy and an inactive
server. If the client send a POST request, haproxy may provide a full
HTTP 503 response before the end of the full request.
2023-08-04 16:17:16 +02:00
Christopher Faulet
8670bb42c2 CLEANUP: stconn: Move comment about sedesc fields on the field line
Fields of sedesc structure were documented in the comment about the
structure itself. It was not really convenient, hard to read, hard to
update. So comments about the fields are moved on the corresponding field
line, as usual.
2023-08-04 14:32:57 +02:00
Christopher Faulet
ef2b15998c BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used
There is a mechanisme in the H1 and H2 multiplexer to skip the payload when
a response is returned to the client when it must not contain any payload
(response to a HEAD request or a 204/304 response). However, this does not
work when the splicing is used. The H2 multiplexer does not support the
splicing, so there is no issue. But with the mux-h1, when data are sent
using the kernel splicing, the mux on the server side is not aware the
client side should skip the payload. And once the data are put in a pipe,
there is no way to stop the sending.

It is a defect of the current design. This will be easier to deal with this
case when the mux-to-mux forwarding will be implemented. But for now, to fix
the issue, we should add an HTX flag on the start-line to pass the info from
the client side to the server side and be able to disable the splicing in
necessary.

The associated reg-test was improved to be sure it does not fail when the
splicing is configured.

This patch should be backported as far as 2.4..
2023-08-02 12:05:05 +02:00
Patrick Hemmer
7fccccccea MINOR: acl: add acl() sample fetch
This provides a sample fetch which returns the evaluation result
of the conjunction of named ACLs.
2023-08-01 10:49:06 +02:00
Patrick Hemmer
00e00fb424 REORG: cfgparse: extract curproxy as a global variable
This extracts curproxy from cfg_parse_listen so that it can be referenced by
keywords that need the context of the proxy they are being used within.
2023-08-01 10:48:28 +02:00
Patrick Hemmer
997a31dbdf CLEANUP: acl: remove cache_idx from acl struct
It isn't used and never has been.
2023-08-01 10:48:05 +02:00
Frédéric Lécaille
c156c5bda6 MINOR: quic; Move the QUIC frame pool to its proper location
pool_head_quic_frame QUIC frame pool definition is move from quic_conn-t.h to
quic_frame-t.h.
Its declation is moved from quic_conn.c to quic_frame.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
fa58f67787 CLEANUP: quic: quic_conn struct cleanup
Remove no more used QUIC_TX_RING_BUFSZ macro.
Remove several no more used quic_conn struct members.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
444c1a4113 MINOR: quic: Split QUIC connection code into three parts
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part code.
The definition of quic_tx_packet struct has been move from quic_conn-t.h to
quic_tx-t.h.

Same thing for the TX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this TX part code.
The definition of quic_rx_packet struct has been move from quic_conn-t.h to
quic_rx-t.h.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
2fe50a01ca CLEANUP: quic: Defined but no more used function (quic_get_tls_enc_levels())
This function is no more used since this commit:

    MEDIUM: quic: Handshake I/O handler rework.

Let's remove it!
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
7008f16d57 MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements
Extract the code in relation with the QUIC acknowledgements from quic_conn.c
to quic_ack.c to accelerate the compilation of quic_conn.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
f454b78fa9 MINOR: quic: Add new "QUIC over SSL" C module.
Move the code which directly calls the functions of the OpenSSL QUIC API into
quic_ssl.c new C file.
Some code have been extracted from qc_conn_finalize() to implement only
the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c.
qc_conn_finalize() has also been exported to be used from this new quic_ssl.c
C module.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
57237f68ad MINOR: quic: Move TLS related code to quic_tls.c
quic_tls_key_update() and quic_tls_rotate_keys() are QUIC TLS functions.
Let's move them to their proper location: quic_tls.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
953e67abb6 MINOR: quic: Export QUIC CLI code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the QUIC CLI from quic_conn.c to quic_cli.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
6334f4f6c5 MINOR: quic: Export QUIC traces code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the traces from quic_conn.c to quic_trace.c.
Also add some headers (quic_trace-t.h and quic_trace.h).
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
f32201abb0 MINOR: quic: Add "limited-quic" new tuning setting
This setting which may be used into a "global" section, enables the QUIC listener
bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect
when haproxy is compiled against a TLS stack with QUIC support, typically quictls.
2023-07-21 19:19:27 +02:00
Frédéric Lécaille
2fd67c558a MINOR: quic: Missing encoded transport parameters for QUIC OpenSSL wrapper
This wrapper needs to have an access to an encoded version of the local transport
parameter (to be sent to the peer). They are provided to the TLS stack thanks to
qc_ssl_compat_add_tps_cb() callback.

These encoded transport parameters were attached to the QUIC connection but
removed by this commit to save memory:

      MINOR: quic: Stop storing the TX encoded transport parameters

This patch restores these transport parameters and attaches them again
to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper
is compiled.
Implement qc_set_quic_transport_params() to encode the transport parameters
for a connection and to set them into the stack and make this function work
for both the OpenSSL wrapper or any other TLS stack with QUIC support. Its uses
the encoded version of the transport parameters attached to the connection
when compiled for the OpenSSL wrapper, or local parameters when compiled
with TLS stack with QUIC support. These parameters are passed to
quic_transport_params_encode() and SSL_set_quic_transport_params() as before
this patch.
2023-07-21 17:27:40 +02:00
Frédéric Lécaille
7978493c2e MINOR: quic: Add a quic_openssl_compat struct to quic_conn struct
Add quic_openssl_compat struct to the quic_conn struct to support the
QUIC OpenSSL wrapper feature.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
e3991e03cc MINOR: quic: Export some KDF functions (QUIC-TLS)
quic_hkdf_expand() and quic_hkdf_expand_label() must be used by the QUIC OpenSSL
wrapper.
2023-07-21 15:53:41 +02:00
Frédéric Lécaille
780133548c MINOR: quic: Include QUIC opensssl wrapper header from TLS stacks compatibility header
Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the
compilation of the QUIC openssl wrapper for TLS stacks is enabled with
USE_QUIC_OPENSSLCOMPAT.
2023-07-21 15:53:40 +02:00
Frédéric Lécaille
1b03f8016d MINOR: quic: QUIC openssl wrapper implementation
Highly inspired from nginx openssl wrapper code.

This wrapper implement this list of functions:

   SSL_set_quic_method(),
   SSL_quic_read_level(),
   SSL_quic_write_level(),
   SSL_set_quic_transport_params(),
   SSL_provide_quic_data(),
   SSL_process_quic_post_handshake()

and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls
to support QUIC from OpenSSL. So, its aims is to support QUIC from a standard OpenSSL
stack without QUIC support. It relies on the OpenSSL keylog feature to retreive
the secrets derived by the OpenSSL stack during a handshake and to pass them to
the ->set_encryption_secrets() callback as this is done by quictls. It makes
usage of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages
only on the receipt path. Some of them must be passed to the ->add_handshake_data()
callback as this is done with quictls to be sent to the peer as CRYPTO data.
quic_tls_compat_msg_callback() callback also sends the received TLS alert with
->send_alert() callback.

AES 128-bits with CCM mode is not supported at this time. It is often disabled by
the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites",
the wrapper will send a TLS alerts (Handhshake failure) if this algorithm is
negotiated between the client and the server.

0rtt is also not supported by this wrapper.
2023-07-21 15:53:40 +02:00
Frédéric Lécaille
72619bda4c MINOR: quic: add trace about pktns packet/frames releasing
Add useful traces which have alredy helped in debugging issues.
2023-07-21 14:31:42 +02:00
Frédéric Lécaille
0645e56a6e MINOR: quic: Add traces for qc_frm_free()
Useful to diagnose memory leak issues in relation with the QUIC frame objects.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
cf2368a3d5 MEDIUM: quic: Packet building rework.
The aim of this patch is to allow the building of QUIC datagrams with
as much as packets with different encryption levels inside during handshake.
At this time, this is possible only for at most two encryption levels.
That said, most of the time, a server only needs to use two encryption levels
by datagram, except during retransmissions.

Modify qc_prep_pkts(), the function responsible of building datagrams, to pass
a list of encryption levels as parameter in place of two encryption levels. This
function is also used when retransmitting datagrams. In this case this is a
customized/flexible list of encryption level which is passed to this function.
Add ->retrans new member to quic_enc_level struct, to be used as attach point
to list of encryption level used only during retransmission, and ->retrans_frms
new member which is a pointer to a list of frames to be retransmitted.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
2b8510d722 MINOR: quic: Release asap the negotiated Initial TLS context.
This context may be released at the same time as the Initial TLS context.
This is done calling quic_tls_ctx_secs_free() and pool_free() in two code locations.
Implement quic_nictx_free() to do that.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
90a63ae4fa MINOR: quic: Dynamic allocation for negotiated Initial TLS cipher context.
Shorten ->negotiated_ictx quic_conn struct member (->nictx).
This variable is used during version negotiation. Indeed, a connection
may have to support support several QUIC versions of paquets during
the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated
QUIC version.

This patch allows a connection to dynamically allocate this TLS cipher context.

Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context object.
Modify qc_new_conn() to initialize ->nictx to NULL value.
quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context.
Modify it to do nothing if it is called with a NULL TLS cipher context.
Modify to allocate ->nictx from qc_conn_finalize() just before initializing
its secrets. qc_conn_finalize() allocates -nictx only if needed (if a new QUIC
version was negotiated).
Modify qc_conn_release() which release a QUIC connection (quic_conn struct) to
release ->nictx TLS cipher context.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
642dba8c22 MINOR: quic: Stop storing the TX encoded transport parameters
There is no need to keep an encoded version of the QUIC listener transport
parameters attache to the connection.

Remove ->enc_params and ->enc_params_len member of quic_conn struct.
Use variables to build the encoded transport parameter local to
ha_quic_set_encryption_secrets() before they are passed to
SSL_set_quic_transport_params().

Modify qc_ssl_sess_init() prototype. It was expected to be used with
the encoded transport parameters as passed parameter, but they were not
used. Cleanup this function.
2023-07-21 14:27:10 +02:00
Patrick Hemmer
57926fe8a3 MINOR: peers: add peers keyword registration
This adds support for registering keywords in the 'peers' section.
2023-07-20 18:12:44 +02:00
Willy Tarreau
6ecabb3f35 CLEANUP: config: make parse_cpu_set() return documented values
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, changed from ulong to int since
it only returns a success/failure and no more a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
2023-07-20 11:01:09 +02:00
Willy Tarreau
f54d8c6457 CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
2023-07-20 11:01:09 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made worng over time.

The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00
Willy Tarreau
7134417613 MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
2023-07-20 11:01:09 +02:00
Aurelien DARRAGON
2e7d3d2e5c BUG/MINOR: hlua: hlua_yieldk ctx argument should support pointers
lua_yieldk ctx argument is of type lua_KContext which is typedefed to
intptr_t when available so it can be used to store pointers.

But the wrapper function hlua_yieldk() passes it as a regular it so it
breaks that promise.

Changing hlua_yieldk() prototype so that ctx argument is of type
lua_KContext.

This bug had no functional impact because ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
2023-07-17 07:42:47 +02:00
Emeric Brun
49ddd87d41 CLEANUP: quic: remove useless parameter 'key' from quic_packet_encrypt
Parameter 'key' was not used in this function.

This patch removes it from the prototype of the function.

This patch could be backported until v2.6.
2023-07-12 14:33:03 +02:00
Emeric Brun
cadb232e93 BUG/MEDIUM: quic: timestamp shared in token was using internal time clock
The internal tick clock was used to export the timestamp int the token
on retry packets. Doing this in cluster mode the nodes don't
understand the timestamp from tokens generated by others.

This patch re-work this using the the real current date (wall-clock time).

Timestamp are also now considered in secondes instead of milleseconds.

This patch should be backported until v2.6
2023-07-12 14:32:01 +02:00
Aurelien DARRAGON
b6e2d62fb3 MINOR: sink/api: pass explicit maxlen parameter to sink_write()
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.

But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.

sink_write() now takes an optional maxlen parameter:
  if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
  is smaller than ring->maxlen, else only ring->maxlen will be considered.

[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
4f0e0f5a65 MEDIUM: sample: introduce 'same' output type
Thierry Fournier reported an annoying side-effect when using the debug()
converter.

Consider the following examples:

[1]  http-request set-var(txn.test) bool(true),ipmask(24)
[2]  http-request redirect location /match if { bool(true),ipmask(32) }

When starting haproxy with [1] example we get:

   config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied.

With [2], we get:

  config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'.

Now consider the following examples which are based on [1] and [2]
but with the debug() sample conv inserted in-between those incompatible
sample types:

[1*] http-request set-var(txn.test) bool(true),debug,ipmask(24)
[2*] http-request redirect location /match if { bool(true),debug,ipmask(32) }

According to the documentation, "it is safe to insert the debug converter
anywhere in a chain, even with non-printable sample types".
Thus we don't expect any side-effect from using it within a chain.

However in current implementation, because of debug() returning SMP_T_ANY
type which is a pseudo type (only resolved at runtime), the sample
compatibility checks performed at parsing time are completely uneffective.
(haproxy will start and no warning will be emitted)

The indesirable effect of this is that debug() prevents haproxy from
warning you about impossible type conversions, hiding potential errors
in the configuration that could result to unexpected evaluation or logic
while serving live traffic. We better let haproxy warn you about this kind
of errors when it has the chance.

With our previous examples, this could cause some inconveniences. Let's
say for example that you are testing a configuration prior to deploying
it. When testing the config you might want to use debug() converter from
time to time to check how the conversion chain behaves.

Now after deploying the exact same conf, you might want to remove those
testing debug() statements since they are no longer relevant.. but
removing them could "break" the config and suddenly prevent haproxy from
starting upon reload/restart. (this is the expected behavior, but it
comes a bit too late because of debug() hiding errors during tests..)

To fix this, we introduce a new output type for sample expressions:
  SMP_T_SAME - may only be used as "expected" out_type (parsing time)
               for sample converters.

As it name implies, it is a way for the developpers to indicate that the
resulting converter's output type is guaranteed to match the type of the
sample that is presented on the converter's input side.
(converter may alter data content, but data type must not be changed)

What it does is that it tells haproxy that if switching to the converter
(by looking at the converter's input only, since outype is SAME) is
conversion-free, then the converter type can safely be ignored for type
compatibility checks within the chain.

debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows
it to fully comply with the doc in the sense that it does not impact the
conversion chain when inserted between sample items.

Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
a635a1779a MEDIUM: sample: add missing ADDR=>? compatibility matrix entries
SMP_T_ADDR support was added in b805f71 ("MEDIUM: sample: let the cast
functions set their output type").

According to the above commit, it is made clear that the ADDR type is
a pseudo/generic type that may be used for compatibility checks but
that cannot be emitted from a fetch or converter.

With that in mind, all conversions from ADDR to other types were
explicitly disabled in the compatibility matrix.

But later, when map_*_ip functions were updated in b2f8f08 ("MINOR: map:
The map can return IPv4 and IPv6"), we started using ADDR as "expected"
output type for converters. This still complies with the original
description from b805f71, because it is used as the theoric output
type, and is never emitted from the converters themselves (only "real"
types such as IPV4 or IPV6 are actually being emitted at runtime).

But this introduced an ambiguity as well as a few bugs, because some
compatibility checks are being performed at config parse time, and thus
rely on the expected output type to check if the conversion from current
element to the next element in the chain is theorically supported.

However, because the compatibility matrix doesn't support ADDR to other
types it is currently impossible to use map_*_ip converters in the middle
of a chain (the only supported usage is when map_*_ip converters are
at the very end of the chain).

To illustrate this, consider the following examples:

  acl test str(ok),map_str_ip(test.map) -m found                          # this will work
  acl test str(ok),map_str_ip(test.map),ipmask(24) -m found               # this will raise an error

Likewise, stktable_compatible_sample() check for stick tables also relies
on out_type[table_type] compatibility check, so map_*_ip cannot be used
with sticktables at the moment:

  backend st_test
    stick-table type string size 1m expire 10m store http_req_rate(10m)
  frontend fe
    bind localhost:8080
    mode http
    http-request track-sc0 str(test),map_str_ip(test.map) table st_test     # raises an error

To fix this, and prevent future usage of ADDR as expected output type
(for either fetches or converters) from introducing new bugs, the
ADDR=>? part of the matrix should follow the ANY type logic.

That is, ADDR, which is a pseudo-type, must be compatible with itself,
and where IPV4 and IPV6 both support a backward conversion to a given
type, ADDR must support it as well. It is done by setting the relevant
ADDR entries to c_pseudo() in the compatibility matrix to indicate that
the operation is theorically supported (c_pseudo() will never be executed
because ADDR should not be emitted: this only serves as a hint for
compatibility checks at parse time).

This is what's being done in this commit, thanks to this the broken
examples documented above should work as expected now, and future
usage of ADDR as out_type should not cause any issue.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
30cd137d3f MINOR: sample: introduce c_pseudo() conv function
This function is used for ANY=>!ANY conversions in the compatibility
matrix to help differentiate between real NOOP (c_none) and pseudo
conversions that are theorically supported at config parse time but can
never occur at runtime,. That is, to explicit the fact that actual related
runtime operations (e.g.: ANY->IPV4) are not NOOP since they might require
some conversion to be performed depending on the input type.

When checking the conf we don't know the effective out types so
cast[pseudo type][pseudo type] is allowed in the compatibility matrix,
but at runtime we only expect cast[real type][(real type || pseudo type)]
because fetches and converters may not emit pseudo types, thus using
c_none() everywhere was too ambiguous.

The process will crash if c_pseudo() is invoked to help catch bugs:
crashing here means that a pseudo type has been encountered on a
converter's input at runtime (because it was emitted earlier in the
chain), which is not supported and results from a broken sample fetch
or converter implementation. (pseudo types may only be used as out_type
in sample definitions for compatibility checks at parsing time)
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
58bbe41cb8 MEDIUM: acl/sample: unify sample conv parsing in a single function
Both sample_parse_expr() and parse_acl_expr() implement some code
logic to parse sample conv list after respective fetch or acl keyword.

(Seems like the acl one was inspired by the sample one historically)

But there is clearly code duplication between the two functions, making
them hard to maintain.
Hopefully, the parsing logic between them has stayed pretty much the
same, thus the sample conv parsing part may be moved in a dedicated
helper parsing function.

This is what's being done in this commit, we're adding the new function
sample_parse_expr_cnv() which does a single thing: parse the converters
that are listed right after a sample fetch keyword and inject them into
an already existing sample expression.

Both sample_parse_expr() and parse_acl_expr() were adapted to now make
use of this specific parsing function and duplicated code parts were
cleaned up.

Although sample_parse_expr() remains quite complicated (numerous function
arguments due to contextual parsing data) the first goal was to get rid of
code duplication without impacting the current behavior, with the added
benefit that it may allow further code cleanups / simplification in the
future.
2023-07-03 16:32:01 +02:00
Frédéric Lécaille
7f3c1bef37 MINOR: quic: Drop packet with type for discarded packet number space.
This patch allows the low level packet parser to drop packets with type for discarded
packet number spaces. Furthermore, this prevents it from reallocating new encryption
levels and packet number spaces already released/discarded. When a packet number space
is discarded, it MUST NOT be reallocated.

As the packet number space discarding is done asap the type of packet received is
known, some packet number space discarding check may be safely removed from qc_try_rm_hp()
and qc_qel_may_rm_hp() which are called after having parse the packet header, and
is type.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
b97de9dc21 MINOR: quic: Move the packet number space status at quic_conn level
As the packet number spaces and encryption level are dynamically allocated,
the information about the packet number space discarded status must be kept
somewhere else than in these objects.

quic_tls_discard_keys() is no more useful.
Modify quic_pktns_discard() to do the same job: flag the quic_conn object
has having discarded packet number space.
Implement quic_tls_pktns_is_disarded() to check if a packet number space is
discarded. Note the Application data packet number space is never discarded.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
f7749968d6 CLEANUP: quic: Remove two useless pools a low QUIC connection level
Both "quic_tx_ring" and "quic_rx_crypto_frm" pool are no more used.

Should be backported as far as 2.6.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
a5c1a3b774 MINOR: quic: Reduce the maximum length of TLS secrets
The maximum length of the secrets derived by the TLS stack is 384 bits.
This reduces the size of the objects provided by the "quic_tls_secret" pool by
16 bytes.

Should be backported as far as 2.6
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
3097be92f1 MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels
Replace ->els static array of encryption levels by 4 pointers into the QUIC
connection object, quic_conn struct.
    ->iel denotes the Initial encryption level,
    ->eel the Early-Data encryption level,
    ->hel the Handshaske encryption level and
    ->ael the Application Data encryption level.

Add ->qel_list to this structure to list the encryption levels after having been
allocated. Modify consequently the encryption level object itself (quic_enc_level
struct) so that it might be added to ->qel_list QUIC connection list of
encryption levels.

Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption
level object. It is used to initialized the pointer newly added to the quic_conn
structure. It also takes a packet number space pointer address as argument to
initialize it if not already initialized.

Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init() which is
called by qc_enc_level_alloc() to allocate an encryption level object.

Implement 2 new helper functions:
  - ssl_to_qel_addr() to match and pointer address to a quic_encryption level
    attached to a quic_conn object with a TLS encryption level enum value;
  - qc_quic_enc_level() to match a pointer to a quic_encryption level attached
    to a quic_conn object with an internal encryption level enum value.
This functions are useful to be called from ->set_encryption_secrets() and
->add_handshake_data() TLS stack called which takes a TLS encryption enum
as argument (enum ssl_encryption_level_t).

Replace all the use of the qc->els[] array element values by one of the newly
added ->[ieha]el quic_conn struct member values.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
25a7b15144 MINOR: quic: Add a pool for the QUIC TLS encryption levels
Very simple patch to define and declare a pool for the QUIC TLS encryptions levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove from quic_conn struct the
static array of encryption levels (see ->els[]).
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
7d9f12998d CLEANUP: quic: Remove qc_list_all_rx_pkts() defined but not used
This function is not used. May be safely removed.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6635aa6a0a MEDIUM: quic: Dynamic allocations of packet number spaces
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to quic_conn struc as pointers to quic_pktns
struct, one by packet number space as follows:
     ->ipktns for Initial packet number space,
     ->hpktns for Handshake packet number space and
     ->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers
from TLS stack encryption levels.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data()  to
allocate the packet numbers and initialize the encryption level.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.

Modify qc_new_conn() to allocation only the Initial packet number space and
Initial encryption level.

Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.

Replace at several locations the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
ef39a74f4a MINOR: quic: Move packet number space related functions
Move packet number space related functions from quic_conn.h to quic_tls.h.

Should be backported as far as 2.6 to ease future backports to come.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
411b6f73b7 MINOR: quic: Implement a packet number space identification function
Implement quic_pktns_char() to identify a packet number space from a
quic_conn object. Usefull only for traces.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
dc6b339733 MINOR: quic: Move QUIC encryption level structure definition
haproxy/quic_tls-t.h is the correct place to quic_enc_level structure
definition.

Should be backported as far as 2.6 to ease any further backport to come.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6593ec6f5e MINOR: quic: Move QUIC TLS encryption level related code (quic_conn_enc_level_init())
quic_conn_enc_level_init() location is definitively in QUIC TLS API source file:
src/quic_tls.c.
2023-06-30 16:20:55 +02:00
Willy Tarreau
90d18e2006 IMPORT: slz: implement a synchronous flush() operation
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the ouput
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.

This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
2023-06-30 16:12:36 +02:00
William Lallemand
593c895eed MINOR: ssl: allow to change the client-sigalgs on server lines
This patch introduces the "client-sigalgs" keyword for the server line,
which allows to configure the list of server signature algorithms
negociated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
2023-06-29 14:11:46 +02:00
William Lallemand
717f0ad995 MINOR: ssl: allow to change the server signature algorithm on server lines
This patch introduces the "sigalgs" keyword for the server line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
2023-06-29 13:40:18 +02:00
Frédéric Lécaille
c2bab72d32 BUG/MINOR: quic: Missing TLS secret context initialization
This bug arrived with this commit:

     MINOR: quic: Remove pool_zalloc() from qc_new_conn()

Missing initialization of largest packet number received during a keyupdate phase.
This prevented the keyupdate feature from working and made the keyupdate interop
tests to fail for all the clients.

Furthermore, ->flags from quic_tls_ctx was also not initialized. This could
also impact the keyupdate feature at least.

No backport needed.
2023-06-19 19:05:45 +02:00
Frédéric Lécaille
ddc616933c MINOR: quic: Remove pool_zalloc() from qc_new_conn()
qc_new_conn() is ued to initialize QUIC connections with quic_conn struct objects.
This function calls quic_conn_release() when it fails to initialize a connection.
quic_conn_release() is also called to release the memory allocated by a QUIC
connection.

Replace pool_zalloc() by pool_alloc() in this function and initialize
all quic_conn struct members which are referenced by quic_conn_release() to
prevent use of non initialized variables in this fonction.
The ebtrees, the lists attached to quic_conn struct must be initialized.
The tasks must be reset to their NULL default values to be safely destroyed
by task_destroy(). This is all the case for all the TLS cipher contexts
of the encryption levels (struct quic_enc_level) and those for the keyupdate.
The packet number spaces (struct quic_pktns) must also be initialized.
->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update()
from dereferencing this pointer.
->latest_rtt member of quic_loss struct must also be initialized. This is done
by quic_loss_init() called by quic_path_init().
2023-06-16 16:55:58 +02:00
Frédéric Lécaille
d66896036a BUG/MINOR: quic: Missing initialization (packet number space probing)
->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init().
This bug never occured because all quic_pktns structs are attached to quic_conn
structs which are always pool_zalloc()'ed.

Must be backported as far as 2.6.
2023-06-14 11:35:22 +02:00
Aurelien DARRAGON
b7f8af3ca9 BUG/MINOR: proxy/server: free default-server on deinit
proxy default-server is a specific type of server that is not allocated
using new_server(): it is directly stored within the parent proxy
structure. However, since it may contain some default config options that
may be inherited by regular servers, it is also subject to dynamic members
(strings, structures..) that needs to be deallocated when the parent proxy
is cleaned up.

Unfortunately, srv_drop() may not be used directly from p->defsrv since
this function is meant to be used on regular servers only (those created
using new_server()).

To circumvent this, we're splitting srv_drop() to make a new function
called srv_free_params() that takes care of the member cleaning which
originally takes place in srv_drop(). This function is exposed through
server.h, so it may be called from outside server.c.

Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy()
prevents any memory leaks due to dynamic parameters allocated when
parsing a default-server line from a proxy section.

This partially fixes GH #2173 and may be backported to 2.8.

[While it could also be relevant for other stable versions, the patch
won't apply due to architectural changes / name changes between 2.4 => 2.6
and then 2.6 => 2.8. Considering this is a minor fix that only makes
memory analyzers happy during deinit paths (at least for <= 2.8), it might
not be worth the trouble to backport them any further?]
2023-06-06 15:15:17 +02:00
Willy Tarreau
4ad1c9635a BUG/MINOR: stream: do not use client-fin/server-fin with HTX
Historically the client-fin and server-fin timeouts were made to allow
a connection closure to be effective quickly if the last data were sent
down a socket and the client didn't close, something that can happen
when the peer's FIN is lost and retransmits are blocked by a firewall
for example. This made complete sense in 1.5 for TCP and HTTP in close
mode. But nowadays with muxes, it's not done at the right layer anymore
and even the description doesn't match what is being done, because what
happens is that the stream will abort the whole transfer after it's done
sending to the mux and this timeout expires.

We've seen in GH issue 2095 that this can happen with very short timeout
values, and while this didn't trigger often before, now that the muxes
(h2 & quic) properly report an end of stream before even the first
sc_conn_sync_recv(), it seems that it can happen more often, and have
two undesirable effects:
  - logging a timeout when that's not the case
  - aborting the request channel, hence the server-side conn, possibly
    before it had a chance to be put back to the idle list, causing
    this connection to be closed and not reusable.

Unfortunately for TCP (mux_pt) this remains necessary because the mux
doesn't have a timeout task. So here we're adding tests to only do
this through an HTX mux. But to be really clean we should in fact
completely drop all of this and implement these timeouts in the mux
itself.

This needs to be backported to 2.8 where the issue was discovered,
and maybe carefully to older versions, though that is not sure at
all. In any case, using a higher timeout or removing client-fin in
HTTP proxies is sufficient to make the issue disappear.
2023-06-02 16:33:40 +02:00
Willy Tarreau
ae0f8be011 MINOR: stats: protect against future stats fields omissions
As seen in commits 33a4461fa ("BUG/MINOR: stats: Fix Lua's `get_stats`
function") and a46b142e8 ("BUG/MINOR: Missing stat_field_names (since
f21d17bb)") it seems frequent to omit to update stats_fields[] when
adding a new ST_F_xxx entry. This breaks Lua's get_stats() and shows
a "(null)" in the header of "show stat", but that one is not detectable
to the naked eye anymore.

Let's add a reminder above the enum declaration about this, and a small
reg tests checking for the absence of "(null)". It was verified to fail
before the last patch above.
2023-06-02 08:39:53 +02:00
Willy Tarreau
cb6a35fdc1 [RELEASE] Released version 2.9-dev0
Released version 2.9-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2023-05-31 16:29:19 +02:00
Willy Tarreau
9dc8308a67 MINOR: version: mention that it's development again
This essentially reverts b9b6e94474.
2023-05-31 16:28:34 +02:00
Willy Tarreau
b9b6e94474 MINOR: version: mention that it's LTS now.
The version will be maintained up to around Q2 2028. Let's
also update the INSTALL file to mention this.
2023-05-31 16:23:56 +02:00
Amaury Denoyelle
d68f8b5a4a CLEANUP: mux-quic: rename internal functions
This patch is similar to the previous one but for QUIC mux functions
used inside the mux code itself or application layer. Replace all
occurences of qc_* prefix by qcc_* or qcs_*. This should help to better
differentiate code between quic_conn and MUX.

This should be backported up to 2.7.
2023-05-30 15:45:55 +02:00
Amaury Denoyelle
6d6ee0dc0b MINOR: quic: fix stats naming for flow control BLOCKED frames
There was a misnaming in stats counter for *_BLOCKED frames in regard to
QUIC rfc convention. This patch fixes it to prevent future ambiguity :

- STREAMS_BLOCKED -> STREAM_DATA_BLOCKED
- STREAMS_DATA_BLOCKED_BIDI -> STREAMS_BLOCKED_BIDI
- STREAMS_DATA_BLOCKED_UNI -> STREAMS_BLOCKED_UNI

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Amaury Denoyelle
087c5f041b MINOR: mux-quic: remove nb_streams from qcc
Remove nb_streams field from qcc. It was not used outside of a BUG_ON()
statement to ensure we never have a negative count of streams. However
this is already checked with other fields.

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Amaury Denoyelle
7b41dfd834 CLEANUP: mux-quic: remove unneeded fields in qcc
Remove fields from qcc structure which are unused.

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Patrick Hemmer
425d7ad89d MINOR: init: pre-allocate kernel data structures on init
The Linux kernel maintains data structures to track a processes' open file
descriptors, and it expands these structures as necessary when FD usage grows
(at every FD=2^X starting at 64). However when threading is in use, during
expansion the kernel will pause (observed up to 47ms) while it waits for thread
synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366).

This change addresses the issue and avoids the random pauses by opening the
maximum file descriptor during initialization, so that expansion will not occur
while processing traffic.
2023-05-26 09:28:18 +02:00
Willy Tarreau
b298882acc BUILD: compiler: systematically set USE_OBSOLETE_LINKER with TCC
TCC silently ignores the weak and section attributes, which ruins the
initcalls. Technically we're exactly in the same situation as with an
obsolete linker. Let's just automatically set the flag if TCC is
detected, this avoids surprises where the program compiles but does
not start.

No backport is needed.
2023-05-24 21:37:06 +02:00
Willy Tarreau
eced142aa8 BUILD: ist: use the literal declaration for ist_lc/ist_uc under TCC
TCC doesn't knoow about __attribute__((weak)), it silently ignores it.
We could add a "static" modifier there in this case but we already have
an alternate portable mode that is based on a slightly larger literal
for obsolete linkers (and non-ELF systems) which choke on weak. Let's
just add the test for tcc there and use it in this case.

No backport is needed.
2023-05-24 21:33:34 +02:00
Willy Tarreau
4e8720ab78 BUILD: ist: do not put a cast in an array declaration
TCC is upset by the declaration looking like:

  const unsigned char ist_lc[256] __attribute__((weak)) = ((const unsigned char[256]){ ... });

It was written like this because it's expanded from the _IST_LC macro
but it's never used as-is, it's only used from ist_lc, which should be
the one containing the cast so that the macro only contains the list of
bytes that can be used in both places. And this assigns more consistent
roles to the lower and upper case macro/variable now, one is typed and
the other one not. No backport is needed.
2023-05-24 21:27:39 +02:00
Frdric Lcaille
12a815ad19 MINOR: quic: Add a counter for sent packets
Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection
level. Then, when the connection is released, the ->sent_pkt counter value
is added to the one for the listener.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frdric Lcaille
bdd64fd71d MINOR: quic: Add some counters at QUIC connection level
Add some statistical counters to quic_conn struct from quic_counters struct which
are used at listener level to handle them at QUIC connection level. This avoid
calling atomic functions. Furthermore this will be useful soon when a counter will
be added for the total number of packets which have been sent which will be very
often incremented.

Some counters were not added, espcially those which count the number of QUIC errors
by QUIC error types. Indeed such counters would be incremented most of the time
only one time at QUIC connection level.

Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level
statistical counters to the listener level statistical counters.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Willy Tarreau
1e1c28873c BUILD: makefile: fix build issue on GNU make < 3.82
Thierry Fournier reported a build breakage with the ubiquitous make
3.81, LDFLAGS were ignored. This is caused by the declaration of the
collect_opt_flags macro that is defined with an "=" sign, something
that only appeared in 3.82 and that is not necessary. With it removed,
the build now works fine at least from 3.80 to 4.3.

No backport is needed since this makefile cleanup appeared in 2.8.
2023-05-24 15:51:03 +02:00
Ilya Shipitsin
97c344dae0 BUILD: quic: re-enable chacha20_poly1305 for libressl
this reverts d2be9d4c48

LibreSSL implements EVP_chacha20_poly1305() with EVP_CIPHER for every
released version starting with 3.6.0
2023-05-23 19:20:36 +02:00
Willy Tarreau
b7209d42d9 MEDIUM: stconn: make the SE_FL_ERR_PENDING to ERROR transition systematic
During a code audit of the various situations that promote ERR_PENDING to
ERROR, it appeared that:
  - all muxes use se_fl_set_error() to set it, which chooses either based
    on EOI/EOS presence ;
  - EOI/EOS that arrive late after ERR_PENDING were not systematically
    upgraded to ERROR

This results in confusion about how such ERROR or ERR_PENDING ought to
be handled, which is not quite desirable.

This patch adds a test to se_fl_set() to detect if we're setting EOI or
EOS while ERR_PENDING is present, or the other way around so that any
sequence of EOI/EOS <-> ERR_PENDING results in ERROR being set. This
way there will no longer be possible situations where ERROR is missing
while the other ones are set.
2023-05-23 16:17:04 +02:00
Amaury Denoyelle
5eadc27623 MINOR: quic: remove return val of quic_aead_iv_build()
quic_aead_iv_build() should never fail unless we call it with buffers of
different size. This never happens in the code as every input buffers
are of size QUIC_TLS_IV_LEN.

Remove the return value and add a BUG_ON() to prevent future misusage.
This is especially useful to remove one error handling on the sending
patch via quic_packet_encrypt().

This should be backported up to 2.7.
2023-05-22 11:17:18 +02:00
Willy Tarreau
5345490b8e MINOR: clock: provide a function to automatically adjust now_offset
Right now there's no way to enforce a specific value of now_ms upon
startup in order to compensate for the time it takes to load a config,
specifically when dealing with the health check startup. For this we'd
need to force the now_offset value to compensate for the last known
value of the current date. This patch exposes a function to do exactly
this.
2023-05-17 09:33:54 +02:00
Willy Tarreau
5723b382ed MINOR: stats: report the boot time in "show info"
Just like we have the uptime in "show info", let's add the boot time.
It's trivial to collect as it's just the difference between the ready
date and the start date, and will allow users to monitor this element
in order to take action before it starts becoming problematic. Here
the boot time is reported in milliseconds, so this allows to even
observe sub-second anomalies in startup delays.
2023-05-17 09:33:54 +02:00
Willy Tarreau
da4aa6905c MINOR: clock: measure the total boot time
Some huge configs take a significant amount of time to start and this
can cause some trouble (e.g. health checks getting delayed and grouped,
process not responding to the CLI etc). For example, some configs might
start fast in certain environments and slowly in other ones just due to
the use of a wrong DNS server that delays all libc's resolutions. Let's
first start by measuring it by keeping a copy of the most recently known
ready date, once before calling check_config_validity() and then refine
it when leaving this function. A last call is finally performed just
before deciding to split between master and worker processes, and it covers
the whole boot. It's trivial to collect and even allows to get rid of a
call to clock_update_date() in function check_config_validity() that was
used in hope to better schedule future events.
2023-05-17 09:33:54 +02:00
Amaury Denoyelle
1a2faef92f MINOR: mux-quic: uninline qc_attach_sc()
Uninline and move qc_attach_sc() function to implementation source file.
This will be useful for next commit to add traces in it.

This should be backported up to 2.7.
2023-05-16 17:53:45 +02:00
Amaury Denoyelle
3cb78140cf MINOR: mux-quic: properly report end-of-stream on recv
MUX is responsible to put EOS on stream when read channel is closed.
This happens if underlying connection is closed or a RESET_STREAM is
received. FIN STREAM is ignored in this case.

For connection closure, simply check for CO_FL_SOCK_RD_SH.

For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been
introduced. It is set when RESET_STREAM is received, unless we already
received all data. This is conform to QUIC RFC which allows to ignore a
RESET_STREAM in this case. During RESET_STREAM processing, input buffer
is emptied so EOS can be reported right away on recv_buf operation.

This should be backported up to 2.7.
2023-05-16 17:53:45 +02:00
William Lallemand
d0c363486c BUILD: ssl: get0_verified chain is available on libreSSL
Define HAVE_SSL_get0_verified_chain when it's using libreSSL >= 3.3.6.
2023-05-15 15:16:15 +02:00
William Lallemand
6e0c39d7ac BUILD: ssl: ssl_c_r_dn fetches uses functiosn only available since 1.1.1
Fix the openssl build with older openssl version by disabling the new
ssl_c_r_dn fetch.

This also disable the ssl_client_samples.vtc file for OpenSSL version
older than 1.1.1
2023-05-15 12:07:52 +02:00
Abhijeet Rastogi
df97f472fa MINOR: ssl: add new sample ssl_c_r_dn
This patch addresses #1514, adds the ability to fetch DN of the root
ca that was in the chain when client certificate was verified during SSL
handshake.
2023-05-15 10:48:05 +02:00
Amaury Denoyelle
6c501ed23b BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc
qc_stream_buf_alloc() can fail for two reasons :
* limit of Tx buffer per connection reached
* allocation failure

The first case is properly treated. A flag QC_CF_CONN_FULL is set on the
connection to interrupt emission. It is cleared when a buffer became
available after in order ACK reception and the MUX tasklet is woken up.

The allocation failure was handled with the same mechanism which in this
case is not appropriate and could lead to a connection transfer freeze.
Instead, prefer to close the connection with a QUIC internal error code.

To differentiate the two causes, qc_stream_buf_alloc() API was changed
to return the number of available buffers to the caller.

This must be backported up to 2.6.
2023-05-12 16:26:20 +02:00
Amaury Denoyelle
93dd23cab4 MINOR: mux-quic: remove dedicated function to handle standalone FIN
Remove QUIC MUX function qcs_http_handle_standalone_fin(). The purpose
of this function was only used when receiving an empty STREAM frame with
FIN bit. Besides, it was called by each application protocol which could
have different approach and render the function purpose unclear.

Invocation of qcs_http_handle_standalone_fin() have been replaced by
explicit code in both H3 and HTTP/0.9 module. In the process, use
htx_set_eom() to reliably put EOM on the HTX message.

This should be backported up to 2.7, along with the previous patch which
introduced htx_set_eom().
2023-05-12 15:50:30 +02:00
Amaury Denoyelle
25cf19d5c8 MINOR: htx: add function to set EOM reliably
Implement a new HTX utility function htx_set_eom(). If the HTX message
is empty, it will first add a dummy EOT block. This is a small trick
needed to ensure readers will detect the HTX buffer as not empty and
retrieve the EOM flag.

Replace the H2 code related by a htx_set_eom() invocation. QUIC also has
the same code which will be replaced in the next commit.

This should be backported up to 2.7 before the related QUIC patch.
2023-05-12 15:29:28 +02:00
Willy Tarreau
ea07715ccf MINOR: master/cli: also implement the timed prompt on the master CLI
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:

  master> prompt timed
  [0:00:00:50] master> show proc
  #<PID>          <type>          <reloads>       <uptime>        <version>
  11940           master          1 [failed: 0]   0d00h02m10s     2.8-dev11-474c14-21
  # workers
  11955           worker          0               0d00h00m59s     2.8-dev11-474c14-21
  # old workers
  11942           worker          1               0d00h02m10s     2.8-dev11-474c14-21
  # programs

  [0:00:00:58] master> @!11955
  [0:00:01:03] 11955> @!11942
  [0:00:02:17] 11942> @
  [0:00:01:10] master>
2023-05-11 16:38:52 +02:00
Willy Tarreau
225555711f MINOR: cli: add an option to display the uptime in the CLI's prompt
Entering "prompt timed" toggles reporting of the process' uptime in
the prompt, which will report days, hours, minutes and seconds since
it was started. As discussed with Tim in issue #2145, this can be
convenient to roughly estimate the time between two outputs, as well
as detecting that a process failed to be reloaded for example.
2023-05-11 16:38:52 +02:00
Aurelien DARRAGON
31b23aef38 CLEANUP: acl: discard prune_acl_cond() function
Thanks to previous commit, we have no more use for prune_acl_cond(),
let's remove it to prevent code duplication.
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
7abc9224a6 MINOR: proxy: add http_free_redirect_rule() function
Adding http_free_redirect_rule() function to free a single redirect rule
since it may be required to free rules outside of free_proxy() function.

This patch is required for an upcoming bugfix.

[for 2.2, free_proxy function did not exist (first seen in 2.4), thus
http_free_redirect_rule() needs to be deducted from haproxy.c deinit()
function if the patch is required]
2023-05-11 15:37:04 +02:00
Christopher Faulet
7542fb43d6 MINOR: stconn: Add a cross-reference between SE descriptor
A xref is added between the endpoint descriptors. It is created when the
server endpoint is attached to the SC and it is destroyed when an endpoint
is detached.

This xref is not used for now. But it will be useful to retrieve info about
an endpoint for the opposite side. It is also the warranty there is still a
endpoint attached on the other side.
2023-05-11 15:37:04 +02:00
Willy Tarreau
4cfb0019e6 MINOR: stats: report the listener's protocol along with the address in stats
When "optioon socket-stats" is used in a frontend, its listeners have
their own stats and will appear in the stats page. And when the stats
page has "stats show-legends", then a tooltip appears on each such
socket with ip:port and ID. The problem is that since QUIC arrived, it
was not possible to distinguish the TCP listeners from the QUIC ones
because no protocol indication was mentioned. Now we add a "proto"
legend there with the protocol name, so we can see "tcp4" or "quic6"
and figure how the socket is bound.
2023-05-11 14:52:56 +02:00
Amaury Denoyelle
5f67b17a59 MEDIUM: mux-quic: adjust transport layer error handling
Following previous patch, error notification from quic_conn has been
adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR
on the connection instance when a fatal error is detected.

Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag
QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to
the local error flag and will abort most of the futur processing. To
ensure stream upper layer is also notified, qc_wake_some_streams()
called by qc_process() will put the stream on error if this new flag is
set.

This should be backported up to 2.7.
2023-05-11 14:12:48 +02:00
Amaury Denoyelle
b2e31d33f5 MEDIUM: quic: streamline error notification
When an error is detected at quic-conn layer, the upper MUX must be
notified. Previously, this was done relying on quic_conn flag
QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on
connection closure.

Adjust this mechanism to use an approach more similar to other transport
layers in haproxy. On error, connection flags are updated with
CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then
notified when the error happened instead of just before the closing. To
reflect this change, qc_notify_close() has been renamed qc_notify_err().
This function must now be explicitely called every time a new error
condition arises on the quic_conn layer.

To ensure MUX send is disabled on error, qc_send_mux() now checks
CO_FL_SOCK_WR_SH. If set, the function returns an error. This should
prevent the MUX from sending data on closing or draining state.

To complete this patch, MUX layer must now check for CO_FL_ERROR
explicitely. This will be the subject of the following commit.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
2d5c3f5cd1 MINOR: mux-quic: add traces for stream wake
Add traces for when an upper layer stream is woken up by the MUX. This
should help to diagnose frozen stream issues.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Willy Tarreau
9615102b01 MINOR: stats: report the number of times the global maxconn was reached
As discussed a few times over the years, it's quite difficult to know
how often we stop accepting connections because the global maxconn was
reached. This is not easy to know because when we reach the limit we
stop accepting but we don't know if incoming connections are pending,
so it's not possible to know how many were delayed just because of this.
However, an interesting equivalent metric consist in counting the number
of times an accepted incoming connection resulted in the limit being
reached. I.e. "we've accepted the last one for now". That doesn't imply
any other one got delayed but it's a factual indicator that something
might have been delayed. And by counting the number of such events, it
becomes easier to know whether some limits need to be adjusted because
they're reached often, or if it's exceptionally rare.

The metric is reported as a counter in show info and on the stats page
in the info section right next to "maxconn".
2023-05-11 13:51:31 +02:00
Willy Tarreau
3c4a297d2b MINOR: stats: report the total number of warnings issued
Now in "show info" we have a TotalWarnings field that reports the total
number of warnings issued since the process started. It's also reported
in the the stats page next to the uptime.
2023-05-11 12:02:21 +02:00
Willy Tarreau
29dcc5e559 DEBUG: list: add DEBUG_LIST to purposely corrupt list heads after delete
LIST_DELETE doesn't affect the previous pointers of the stored element.
This can sometimes hide bugs when such a pointer is reused by accident
in a LIST_NEXT() or equivalent after having been detached for example, or
ia another LIST_DELETE is performed again, something that LIST_DEL_INIT()
is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly
detached list element with two invalid pointers that will cause a crash
in case of accidental misuse. It's not enabled by default.
2023-05-11 11:33:35 +02:00
Frédéric Lécaille
b971696296 BUG/MINOR: quic: Possible crash when dumping version information
->others member of tp_version_information structure pointed to a buffer in the
TLS stack used to parse the transport parameters. There is no garantee that this
buffer is available until the connection is released.

Do not dump the available versions selected by the client anymore, but displayed the
chosen one (selected by the client for this connection) and the negotiated one.

Must be backported to 2.7 and 2.6.
2023-05-10 13:26:37 +02:00
Amaury Denoyelle
58721f2192 BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE
A recent series of patch were introduced to streamline error generation
by QUIC MUX. However, a regression was introduced : every error
generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it
should be only for H3/QPACK errors.

Fix this by adding an argument <app> in qcc_set_error. When false, a
standard CONNECTION_CLOSE is used as error.

This bug was detected by QUIC tracker with the following tests
"stop_sending" and "server_flow_control" which requires a
CONNECTION_CLOSE frame.

This must be backported up to 2.7.
2023-05-09 18:42:34 +02:00
Christopher Faulet
557146ccc8 DOC: stconn: Update comments about ABRT/SHUT for stconn structure
The comment for the stconn structure was still referencing the SHUTR/SHUTW
flags. These flags were replaced and we now use ABRT/SHUT flags in
comments. The comment itself was slightly updated to be accurate.
2023-05-09 16:36:45 +02:00
Christopher Faulet
e59f7583ee MEDIUM: stconn: Be sure to always be able to unblock a SC that needs room
When sc_need_room() is called, the caller cannot request more free space
than a minimum value to be sure it is always possible to unblock it. it is a
safety guard to not freeze any SC on NEED_ROOM condition. At worse it will
lead to some wakeups un excess at the edge.

To keep things simple, the following minimum is used:

  (global.tune.bufsize - global.tune.maxrewrite - sizeof(struct htx))
2023-05-09 11:53:28 +02:00
Frédéric Lécaille
1bc6e318f0 CLEANUP: quic: Rename several <buf> variables in quic_frame.(c|h)
Most of the function in quic_frame.c and quic_frame.h manipulate <buf> buffer
position variables which have nothing to see with struct buffer variables.
Rename them to <pos>

Should be backported to 2.7.
2023-05-09 10:48:40 +02:00
Frédéric Lécaille
d19a02a40e CLEANUP: quic: No more used q_buf structure
This definition is no more used.

Should be backported to 2.7.
2023-05-09 10:48:40 +02:00
Willy Tarreau
652d1712dd BUILD: quic: fix build warning when threads are disabled
Commit e83f937cc ("MEDIUM: quic: use a global CID trees list") uses a
local variable "tree" used only for locks, but when threads are disabled
it spews a warning about this unused variable.
2023-05-07 15:06:22 +02:00
Willy Tarreau
dd9f921b3a CLEANUP: fix a few reported typos in code comments
These are only the few relevant changes among those reported here:

  https://github.com/haproxy/haproxy/actions/runs/4856148287/jobs/8655397661
2023-05-07 07:07:44 +02:00
Willy Tarreau
615c301db4 MINOR: config: allow cpu-map to take commas in lists of ranges
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
etended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.

Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.

This commit relaxes this rule an finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument in the parse_cpu_set() function. The doc was updated to
reflect this.
2023-05-05 18:41:52 +02:00
Aurelien DARRAGON
fc4ec0d653 MINOR: hlua: declare hlua_yieldk() function
Declaring hlua_yieldk() function to make it usable from hlua_fcn.c.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
40cd44f52c MINOR: hlua: declare hlua_gethlua() function
Declaring hlua_gethlua() function to make it usable from hlua_fcn.c.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
34c86760fa MINOR: hlua: declare hlua_{ref,pushref,unref} functions
Declaring hlua_{ref,pushref,unref} functions to make them usable from
hlua_fcn.c to simplify reference handling.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
5bed48fec8 MINOR: mailers/hlua: disable email sending from lua
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in haproxy configuration.

This will help for sending email directly from lua.
(prevent legacy email sending from intefering with lua)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
dcbc2d2cac MINOR: checks/event_hdl: SERVER_CHECK event
Adding a new event type: SERVER_CHECK.

This event is published when a server's check state ought to be reported.
(check status change or check result)

SERVER_CHECK event is provided as a server event with additional data
carrying relevant check's context such as check's result and health.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
a163d65254 MINOR: server/event_hdl: add SERVER_ADMIN event
Adding a new SERVER event in the event_hdl API.

SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)

SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
e3eea29f48 MINOR: server/event_hdl: add SERVER_STATE event
Adding a new SERVER event in the event_hdl API.

SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)

SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
3889efa8e4 MINOR: hlua_fcn: add Server.get_proxy()
Server.get_proxy(): get the proxy to which the server belongs
(or nil if not available)
2023-05-05 16:28:32 +02:00
Christopher Faulet
7b3d38a633 MEDIUM: tree-wide: Change sc API to specify required free space to progress
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
2023-05-05 15:44:23 +02:00
Christopher Faulet
9aed1124ed MINOR: stconn: Add a field to specify the room needed by the SC to progress
When the SC is blocked because it is waiting for room in the input buffer,
it will be responsible to specify the minimum free space required to
progress. In this commit, we only introduce the field in the stconn
structure that will be used to store this value. It is a signed value with
the following meaning:

  * -1: The SC is waiting for room but not based on the buffer state. It
        will be typically used during splicing when the pipe is full. In
        this case, only a successful send can unblock the SC.

  * >= 0; The minimum free space in the input buffer to unblock the SC. 0 is
          a special value to specify the SC must be unblocked ASAP, by the
          stream, at the end of process_stream() or when output data are
          consumed on the opposite side.
2023-05-05 15:41:30 +02:00
Christopher Faulet
f4258bdf3b MINOR: stats: Use the applet API to write data
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care to request more
room it it fails to put data into the channel's buffer.
2023-05-05 15:41:29 +02:00
William Lallemand
b6ae2aafde MINOR: ssl: allow to change the signature algorithm for client authentication
This commit introduces the keyword "client-sigalgs" for the bind line,
which does the same as "sigalgs" but for the client authentication.

"ssl-default-bind-client-sigalgs" allows to set the default parameter
for all the bind lines.

This patch should fix issue #2081.
2023-05-05 00:05:46 +02:00
William Lallemand
1d3c822300 MINOR: ssl: allow to change the server signature algorithm
This patch introduces the "sigalgs" keyword for the bind line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-bind-sigalgs" in
the default section.

This patch was originally written by Bruno Henc.
2023-05-04 22:43:18 +02:00
Willy Tarreau
e69919d1ba CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash()
The function isn't used anymore since each call place performs its
own loop. Let's get rid of it.
2023-05-04 19:19:04 +02:00
Willy Tarreau
9a6ecbd590 MEDIUM: debug: simplify the thread dump mechanism
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
firsts wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.

What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.

That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.

For now we still have the buffer size limitation but it will be
addressed in future patches.
2023-05-04 19:15:44 +02:00
Aurelien DARRAGON
e910909556 BUG/MINOR: time: fix NS_TO_TV macro
NS_TO_TV helper was implemented in 591fa59 ("MINOR: time: add conversions
to/from nanosecond timestamps")

Due to NS_TO_TV being implemented as a macro and not a function, we must
take extra care when manipulating user input.

In current implementation, 't' argument is not isolated within the macro.

Because of this, NS_TO_TV(1 + 1) will expand to:
  ((const struct timeval){ .tv_sec = 1 + 1 / 1000000000ULL, .tv_usec = (1 + 1 % 1000000000ULL) / 1000U })

Instead of:
  ((const struct timeval){ .tv_sec = 2 / 1000000000ULL, .tv_usec = (2 % 1000000000ULL) / 1000U })

As such, NS_TO_TV usage in hlua_now() is currently incorrect and this
results in unexpected values being passed to lua.

In this patch, we're adding an extra parenthesis around 't' in NS_TO_TV()
macro to make it safe against such usages. (that is: ensure proper
argument expansion as if NS_TO_TV was implemented as a function)

This is a 2.8 specific bug, no backport needed.
2023-05-04 18:09:50 +02:00
Amaury Denoyelle
51f116d65e MINOR: mux-quic: adjust local error API
When a fatal error is detected by the QUIC MUX or H3 layer, the
connection should be closed with a CONNECTION_CLOSE with an error code
as the reason.

Previously, a direct call was used to the quic_conn layer to try to
close the connection. This API was adjusted to be more flexible. Now,
when an error is detected, the function qcc_set_error() is called. This
set the flag QC_CF_ERRL with the error code stored by the MUX. The
connection will be closed soon so most of the operations are not
conducted anymore. Connection is then finally closed during qc_send()
via quic_conn layer if QC_CF_ERRL is set. This will set the flag
QC_CF_ERRL_DONE which indicates that the MUX instance can be freed.

This model is cleaner and brings the following improvments :
- interaction with quic_conn layer for closure is centralized on a
  single function
- CO_FL_ERROR is not set anymore. This was incorrect as this should be
  reserved to errors reported by the transport layer to be similar with
  other haproxy components. As a consequence, qcc_is_dead() has been
  adjusted to check for QC_CF_ERRL_DONE to release the MUX instance.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
8d44bfaf0b MINOR: mux-quic: add trace event for local error
Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally
detected error when a CONNECTION_CLOSE should be emitted.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
bc0adfa334 MINOR: proxy: factorize send rate measurement
Implement a new dedicated function increment_send_rate() which can be
call anywhere new bytes must be accounted for global total sent.
2023-04-28 16:53:44 +02:00
Willy Tarreau
c05d30e9d8 MINOR: clock: replace the timeval start_time with start_time_ns
Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
2023-04-28 16:08:08 +02:00
Willy Tarreau
69530f59ae MEDIUM: clock: replace timeval "now" with integer "now_ns"
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".

All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really  is.

The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.

The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.

The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.

One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
2023-04-28 16:08:08 +02:00
Willy Tarreau
eed5da1037 MINOR: clock: do not use now.tv_sec anymore
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
2023-04-28 16:08:08 +02:00
Willy Tarreau
e8e4712771 MINOR: checks: use a nanosecond counters instead of timeval for checks->start
Now we store the checks start date as a nanosecond timestamps instead
of a timeval, this will simplify the operations with "now" in the near
future.
2023-04-28 16:08:08 +02:00
Willy Tarreau
ad5a5f6779 MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
2023-04-28 16:08:08 +02:00
Willy Tarreau
aaebcae58b MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps
Various points were collected during a request/response and were stored
using timeval. Let's now switch them to nanosecond based timestamps.
2023-04-28 16:08:08 +02:00
Willy Tarreau
76d343d3d3 MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.

Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
2023-04-28 16:08:08 +02:00
Willy Tarreau
591fa59da7 MINOR: time: add conversions to/from nanosecond timestamps
In order to ease the transition away from the timeval used in internal
timestamps, let's first create a few functions and macro to return a
counter from a timeval and conversely, as well as ease the conversions
to/from ns/us/ms/sec to save the user from having to count zeroes and
to think about appending ULL in conversions.
2023-04-28 16:08:08 +02:00
Christopher Faulet
81951f264e BUG/MINOR: stconn: Fix SC flags with same value
SC_FL_SND_NEVERWAIT and SC_FL_SND_EXP_MORE flags have the same value. It is
not critical because these flags are only used to know if MSG_MORE flag must
be set on a send().

No backport needed.
2023-04-28 08:51:34 +02:00
Christopher Faulet
e99c43907c BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones
It is possible to start too many applets on sporadic burst of events after
an inactivity period. It is due to the way we estimate if a new applet must
be created or not. It is based on a frequency counter. We compare the events
processing rate against the number of events currently processed (in
progress or waiting to be processed). But we should also take care of the
number of idle applets.

We already track the number of idle applets, but it is global and not
per-thread. Thus we now also track the number of idle applets per-thread. It
is not a big deal because this fills a hole in the spoe_agent structure.
Thanks to this counter, we can refrain applets creation if there is enough
idle applets to handle currently processed events.

This patch should be backported to every stable versions.
2023-04-28 08:51:34 +02:00
Amaury Denoyelle
d6646dddcc MINOR: quic: finalize affinity change as soon as possible
During accept, a quic-conn is rebind to a new thread. This process is
done in two times :
* first on the original thread via qc_set_tid_affinity()
* then on the newly assigned thread via qc_finalize_affinity_rebind()

Most quic_conn operations (I/O tasklet, task and quic_conn FD socket
read) are reactivated ony after the second step. However, there is a
possibility that datagrams are handled before it via quic_dgram_parse()
when using listener sockets. This does not seem to cause any issue but
this may cause unexpected behavior in the future.

To simplify this, qc_finalize_affinity_rebind() will be called both by
qc_xprt_start() and quic_dgram_parse(). Only one invocation will be
performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED.

This should be backported up to 2.7.
2023-04-26 17:50:16 +02:00
Amaury Denoyelle
24962dd178 BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length
Some HTX responses may not always contain a EOM block. For example this
is the case if content-length header is missing from the HTTP server
response. Stream termination is thus signaled to QUIC mux via shutw
callback. However, this is interpreted inconditionnally as an early
close by the mux with a RESET_STREAM emission. Most of the times, QUIC
clients report this as an error.

To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for
a qcs instance. If true, shutw will never be used to emit a
RESET_STREAM. Instead, the stream will be closed properly with a FIN
STREAM frame. If all data were already transfered, an empty STREAM frame
is sent.

This fix may help with the github issue #2004 where chrome browser stop
to use QUIC after receiving RESET_STREAM frames.

This issue was reported by Vladimir Zakharychev. Thanks to him for his
help and testing. It was also reproduced locally using httpterm with the
query string "/?s=1k&b=0&C=1".

This should be backported up to 2.7.
2023-04-26 17:50:09 +02:00
Willy Tarreau
543e2544ca DEBUG: crash using an invalid opcode on aarch64 instead of an invalid access
On aarch64 there's also a guaranted invalid instruction, called UDF, and
which even supports an optional 16-bit immediate operand:

   https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UDF--Permanently-Undefined-?lang=en

It's conveniently encoded as 4 zeroes (when the operand is zero). It's
unclear when support for it was added into GAS, if at all; even a
not-so-old 2.27 doesn't know about it. Let's byte-encode it.

Tested on an A72 and works as expected.
2023-04-25 19:53:39 +02:00
Willy Tarreau
77787ec9bc DEBUG: crash using an invalid opcode on x86/x86_64 instead of an invalid access
BUG_ON() calls currently trigger a segfault. This is more convenient
than abort() as it doesn't rely on any function call nor signal handler
and never causes non-unwindable stacks when opening cores. But it adds
quite some confusion in bug reports which are rightfully tagged "segv"
and do not instantly allow to distinguish real segv (e.g. null derefs)
from code asserts.

Some CPU architectures offer various crashing methods. On x86 we have
INT3 (0xCC), which stops into the debugger, and UD0/UD1/UD2. INT3 looks
appealing but for whatever reason (maybe signal handling somewhere) it
loses the last call point in the stack, making backtraces unusable. UD2
has the merit of being only 2 bytes and causing an invalid instruction,
which almost never happens normally, so it's easily distinguishable.
Here it was defined as a macro so that the line number in the core
matches the one where the BUG_ON() macro is called, and the debugger
shows the last frame exactly at its calligg point.

E.g. when calling "debug dev bug":

Program terminated with signal SIGILL, Illegal instruction.
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  408             BUG_ON(one > zero);
  [Current thread is 1 (Thread 0x7f7a660cc1c0 (LWP 14238))]
  (gdb) bt
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  #1  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:402
  #2  0x000000000061a69f in cli_parse_request (appctx=appctx@entry=0x181c0160) at src/cli.c:832
  #3  0x000000000061af86 in cli_io_handler (appctx=0x181c0160) at src/cli.c:1035
  #4  0x00000000006ca2f2 in task_run_applet (t=0x181c0290, context=0x181c0160, state=<optimized out>) at src/applet.c:449
2023-04-25 18:51:10 +02:00
Amaury Denoyelle
d5f03cd576 CLEANUP: quic: rename frame variables
Rename all frame variables with the suffix _frm. This helps to
differentiate frame instances from other internal objects.

This should be backported up to 2.7.
2023-04-24 15:35:22 +02:00
Amaury Denoyelle
888c5f283a CLEANUP: quic: rename frame types with an explicit prefix
Each frame type used in quic_frame union has been renamed with the
following prefix "qf_". This helps to differentiate frame instances from
other internal objects.

This should be backported up to 2.7.
2023-04-24 15:35:03 +02:00
Willy Tarreau
7310164b2c MINOR: listener: add a new global tune.listener.default-shards setting
This new setting accepts "by-process", "by-group" and "by-thread" and
will dictate how listeners will be sharded by default when nothing is
specified. While the default remains "by-process", "by-group" should be
much more efficient with many threads, while not changing anything for
single-group setups.
2023-04-23 09:46:15 +02:00
Willy Tarreau
f1003ea7fa MINOR: protocol: perform a live check for SO_REUSEPORT support
When testing if a protocol supports SO_REUSEPORT, we're now able to
verify if the OS does really support it. While it may be supported at
build time, it may possibly have been blocked in a container for
example so we'd rather know what it's like.
2023-04-23 09:46:15 +02:00
Willy Tarreau
b073573c10 MINOR: sock: add a function to check for SO_REUSEPORT support at runtime
The new function _sock_supports_reuseport() will be used to check if a
protocol type supports SO_REUSEPORT or not. This will be useful to verify
that shards can really work.
2023-04-23 09:46:15 +02:00
Willy Tarreau
8a5e6f4cca MINOR: protocol: add a function to check if some features are supported
The new function protocol_supports_flag() checks the protocol flags
to verify if some features are supported, but will support being
extended to refine the tests. Let's use it to check for REUSEPORT.
2023-04-23 09:46:15 +02:00
Willy Tarreau
785b89f551 MINOR: protocol: move the global reuseport flag to the protocols
Some protocol support SO_REUSEPORT and others not. Some have such a
limitation in the kernel, and others in haproxy itself (e.g. sock_unix
cannot support multiple bindings since each one will unbind the previous
one). Also it's really protocol-dependent and not just family-dependent
because on Linux for some time it was supported for TCP and not UDP.

Let's move the definition to the protocols instead. Now it's preset in
tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset.
The enabled() config condition test validates IPv4 (generally sufficient),
and -dR / noreuseport all protocols at once.
2023-04-23 09:46:15 +02:00
Willy Tarreau
65df7e028d MINOR: protocol: add a flags field to store info about protocols
We'll use these flags to know if some protocols are supported, and if
so, with what options/extensions. Reuseport will move there for example.
Two functions were added to globally set/clear a flag.
2023-04-23 09:46:15 +02:00
Willy Tarreau
da0d2cb698 MINOR: proxy: make proxy_type_str() recognize peers sections
Now proxy_type_str() will emit "peers section" when the mode is set to
peers, so as to ease sharing more code between peers and proxies.
2023-04-23 09:46:15 +02:00
Willy Tarreau
f6a8444f55 REORG: listener: move the bind_conf's thread setup code to listener.c
What used to be only two lines to apply a mask in a loop in
check_config_validity() grew into a 130-line block that performs deeply
listener-specific operations that do not have their place there anymore.
In addition it's worth noting that the peers code still doesn't support
shards nor being bound to more than one group, which is a second reason
for moving that code to its own function. Nothing was changed except
recreating the missing variables from the bind_conf itself (the fe only).
2023-04-23 09:46:15 +02:00
Willy Tarreau
4c538df28c CLEANUP: protocol: move the nb_receivers to plug a hole in protocol
This field forces an unaligned hole between two list heads. Let's move
it up where it will be more easily combined with other fields. In
addition, turn it to unsigned while it's still not used.
2023-04-23 09:46:15 +02:00
Willy Tarreau
798d6b4124 CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam
There's a two-byte hole in proto_fam after sock_family, let's move the
l3_addrlen there as a ushort. Note that contrary to what the comment
says, it's still not used by hash algorithms though it could.
2023-04-23 09:46:15 +02:00
Willy Tarreau
df4051cd58 BUILD: proto_tcp: export the correct names for proto_tcpv[46]
The exported names were not correct (missing the 'v').
2023-04-23 09:46:15 +02:00
Willy Tarreau
968a4f34fc BUILD: sock_inet: forward-declare struct receiver
Including sock_inet.h without receiver-t.h causes build failures due to
struct receiver not being defined. Let's just forward-declare it.
2023-04-23 09:46:15 +02:00
Ilya Shipitsin
ccf8012f28 CLEANUP: assorted typo fixes in the code and comments
This is 36th iteration of typo fixes
2023-04-23 09:44:53 +02:00
Tim Duesterhus
3a8c63d48d MINOR: Make tasklet_free() safe to be called with NULL
Make this freeing function safe, like other freeing functions are as discussed
in GitHub issue #2126.
2023-04-23 00:28:25 +02:00
Willy Tarreau
ff18504d73 MINOR: listener: make sure to avoid ABA updates in per-thread index
One limitation of the current thread index mechanism is that if the
values are assigned multiple times to the same thread and the index
loops, it can match again the old value, which will not prevent a
competing thread from finishing its CAS and assigning traffic to a
thread that's not the optimal one. The probability is low but the
solution is simple enough and consists in implementing an update
counter in the high bits of the index to force a mismatch in this
case (assuming we don't try to cover for extremely unlikely cases
where the update counter loops while the index remains equal). So
let's do that. In order to improve the situation a little bit, we
now set the index to a ulong so that in 32 bits we have 8 bits of
counter and in 64 bits we have 40 bits.
2023-04-21 17:41:26 +02:00
Willy Tarreau
e6f5ab5afa MINOR: listener: make accept_queue index atomic
There has always been a race when checking the length of an accept queue
to determine which one is more loaded that another, because the head and
tail are read at two different moments. This is not required, we can merge
them as two 16 bit numbers inside a single 32-bit index that is always
accessed atomically. This way we read both values at once and always have
a consistent measurement.
2023-04-21 17:41:26 +02:00
Willy Tarreau
e4c36aa8a1 MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped
The purpose of this new flag will be to mark that some listeners
duplicate their reference's FD instead of trying to setup a completely
new listener from scratch. This will be used when multiple groups want
to listen to the same socket, via multiple FDs.
2023-04-21 17:41:26 +02:00
Willy Tarreau
aae1810b4d MINOR: receiver: add a struct shard_info to store info about each shard
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
  - the number of groups (= number of receivers)
  - the number of threads (will be used for accept LB)
  - pointer to the reference rx (to get the FD and to find all threads)
  - pointers to the other members (to iterate over all threads)

For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid o(n^2) updates during
ordered deletes).

Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver shard_info==NULL is
alone and doesn't share its socket with another group.

Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.

For now the attach code is not called, but detach is already called
on delete.
2023-04-21 17:41:26 +02:00
Willy Tarreau
84fe1f479b MINOR: listener: support another thread dispatch mode: "fair"
This new algorithm for rebalancing incoming connections to multiple
threads is simpler and instead of considering the threads load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts where it's not always certain
that the least loaded thread will always be found.
2023-04-21 17:41:26 +02:00
Willy Tarreau
6a4d48b736 MINOR: quic_sock: index li->per_thr[] on local thread id, not global one
There's a li_per_thread array in each listener for use with QUIC
listeners. Since thread groups were introduced, this array can be
allocated too large because global.nbthread is allocated for each
listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP)
may be used by a single listener. This was because the global thread
ID is used as the index instead of the local ID (since a listener may
only be used by a single group). Let's just switch to local ID and
reduce the allocated size.
2023-04-21 17:41:26 +02:00
Willy Tarreau
77d37b07b1 MINOR: quic: support migrating the listener as well
When migrating a quic_conn to another thread, we may need to also
switch the listener if the thread belongs to another group. When
this happens, the freshly created connection will already have the
target listener, so let's just pick it from the connection and use
it in qc_set_tid_affinity(). Note that it will be the caller's
responsibility to guarantee this.
2023-04-21 17:41:26 +02:00
Aurelien DARRAGON
76e255520f MINOR: server: pass adm and op cause to srv_update_status()
Operational and administrative state change causes are not propagated
through srv_update_status(), instead they are directly consumed within
the function to provide additional info during the call when required.

Thus, there is no valid reason for keeping adm and op causes within
server struct. We are wasting space and keeping uneeded complexity.

We now exlicitly pass change type (operational or administrative) and
associated cause to srv_update_status() so that no extra storage is
needed since those values are only relevant from srv_update_status().
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
1746b56e68 MINOR: server: change srv_op_st_chg_cause storage type
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".

While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup since some leftovers from asynchronous server
state change updates were left behind and resulted in some useless code
duplication, and making the whole thing harder to maintain.

Two observations were made:

- by tracking down srv_set_{running, stopped, stopping} usage,
  we can see that the <reason> argument is always a fixed statically
  allocated string.
- check-related state change context (duration, status, code...) is
  not used anymore since srv_append_status() directly extracts the
  values from the server->check. This is pure legacy from when
  the state changes were applied asynchronously.

To prevent code duplication, useless string copies and make the reason/cause
more exportable, we store it as an enum now, and we provide
srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check related) are now explicitly identified to
make consumers like srv_append_op_chg_cause() able to fetch checks info
from the server itself if they need to.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
f3b48a808e MINOR: server: srv_append_status refacto
srv_append_status() has become a swiss-knife function over time.
It is used from server code and also from checks code, with various
inputs and distincts code paths, making it very hard to guess the
actual behavior of the function (resulting string output).

To simplify the logic behind it, we're dividing it in multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
9b1ccd7325 MINOR: server: change adm_st_chg_cause storage type
Even though it doesn't look like it at first glance, this is more like
a cleanup than an actual code improvement:

Given that srv->adm_st_chg_cause has been used to exclusively store
static strings ever since it was implemented, we make the choice to
store it as an enum instead of a fixed-size string within server
struct.

This will allow to save some space in server struct, and will make
it more easily exportable (ie: event handlers) because of the
reduced memory footprint during handling and the ability to later get
the corresponding human-readable message when it's explicitly needed.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
e9314fb7a7 MINOR: event_hdl: provide event->when for advanced handlers
For advanced async handlers only
(Registered using EVENT_HDL_ASYNC_TASK() macro):

event->when is provided as a struct timeval and fetched from 'date'
haproxy global variable.

Thanks to 'when', related event consumers will be able to timestamp
events, even if they don't work in real-time or near real-time.
Indeed, unlike sync or normal async handlers, advanced async handlers
could purposely delay the consumption of pending events, which means
that the date wouldn't be accurate if computed directly from within
the handler.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
ebf58e991a MINOR: event_hdl: dynamically allocated event data members
Add the ability to provide a cleanup function for event data passed
via the publishing function.

One use case could be the need to provide valid pointers in the safe
section of the data struct.
Cleanup function will be automatically called with data (or copy of data)
as argument when all handlers consumed the event, which provides an easy
way to release some memory or decrement refcounts to ressources that were
provided through the data struct.
data in itself may not be freed by the cleanup function, it is handled
by the API.

This would allow passing large (allocated) data blocks through the data
struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA
size limit.

To do so, when publishing an event, where we would currently do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources
	 */
        event_data.safe.my_custom_data = x;

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA(&event_data));

We could do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources,
	 * unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
	 * consistency (ie: refcount)
	 */
        event_data.safe.my_custom_static_data = x;
	event_data.safe.my_custom_dynamic_data = malloc(1);

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));

With data_new_family_cleanup func which would look like this:

      void data_new_family_cleanup(const void *data)
      {
      	const struct event_hdl_cb_data_new_family *event_data = ptr;

	/* some data members require specific cleanup once the event
	 * is consumed
	 */
      	free(event_data.safe.my_custom_dynamic_data);
	/* don't ever free data! it is not ours */
      }

Not sure if this feature will become relevant in the future, so I prefer not
to mention it in the doc for now.

But given that the implementation is trivial and does not put a burden
on the existing API, it's a good thing to have it there, just in case.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
147691fd83 CLEANUP: event_hdl: fix comment typo about _sync assertion
Fixing a comment relative to EVENT_HDL_ASSERT_SYNC macro where a
typo was made and the comment was lacking some context.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
363ef4daa7 CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA
EVENT_HDL_CB_DATA macro comments were not updated during the API
refactor, fixing that.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
8273bfc639 BUG/MINOR: event_hdl: don't waste 1 event subtype slot
ESUB_INDEX(n) index macro is used exclusively with n > 0
Fixing it so that it starts numbering at 1 instead of 2.

This way, we don't waste a subtype slot in event_hdl_sub_type
struct, and we comply with the structure comments about max
supported event subtypes (currently set at 16).

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
a63f4903c9 MINOR: server/event_hdl: prepare for upcoming refactors
This commit does nothing that ought to be mentioned, except that
it adds missing comments and slighty moves some function calls
out of "sensitive" code in preparation of some server code refactors.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
d714213862 MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server
Expose proxy_uuid variable in event_hdl_cb_data_server struct to
overcome proxy_name fixed length limitation.

proxy_uuid may be used by the handler to perform proxy lookups.
This should be preferred over lookups relying proxy_name.
(proxy_name is suitable for printing / logging purposes but not for
ID lookups since it has a maximum fixed length)
2023-04-21 14:36:45 +02:00
Frédéric Lécaille
0ed94032b2 MINOR: quic: Do not allocate too much ack ranges
Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32).

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Frédéric Lécaille
4b2627beae BUG/MINOR: quic: Stop removing ACK ranges when building packets
Since this commit:

    BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.

There are more chances that ack ranges may be removed from their trees when
building a packet. It is preferable to impose a limit to these trees. This
will be the subject of the a next commit to come.

For now on, it is sufficient to stop deleting ack range from their trees.
Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were
there to do that.
Make qc_frm_len() support ACK frames and calls it to ensure an ACK frame
may be added to a packet before building it.

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Aurelien DARRAGON
2a9764baae CLEANUP: hlua: avoid confusion between internal timers and tick based timers
Not all hlua "time" variables use the same time logic.

hlua->wake_time relies on ticks since its meant to be used in conjunction
with task scheduling. Thus, it should be stored as a signed int and
manipulated using the tick api.
Adding a few comments about that to prevent mixups with hlua internal
timer api which doesn't rely on the ticks api.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
da9503ca9a MEDIUM: hlua: reliable timeout detection
For non yieldable lua handlers (converters, fetches or yield
incompatible lua functions), current timeout detection relies on now_ms
thread local variable.

But within non-yieldable contexts, now_ms won't be updated if not by us
(because we're momentarily stuck in lua context so we won't
re-enter the polling loop, which is responsible for clock updates).

To circumvent this, clock_update_date(0, 1) was manually performed right
before now_ms is being read for the timeout checks.

But this fails to work consistently, because if no other concurrent
threads periodically run clock_update_global_date(), which do happen if
we're the only active thread (nbthread=1 or low traffic), our
clock_update_date() call won't reliably update our local now_ms variable

Moreover, clock_update_date() is not the right tool for this anyway, as
it was initially meant to be used from the polling context.
Using it could have negative impact on other threads relying on now_ms
to be stable. (because clock_update_date() performs global clock update
from time to time)

-> Introducing hlua multipurpose timer, which is internally based on
now_cpu_time_fast() that provides per-thread consistent clock readings.

Thanks to this new hlua timer API, hlua timeout logic is less error-prone
and more robust.

This allows the timeout detection to work as expected for both yieldable
and non-yieldable lua handlers.

This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function"

While this could theorically be backported to all stable versions,
it is advisable to avoid backports unless we're confident enough
since it could cause slight behavior changes (timing related) in
existing setups.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
df188f145b MINOR: clock: add now_cpu_time_fast() function
Same as now_cpu_time(), but for fast queries (less accurate)
Relies on now_cpu_time() and now_mono_time_fast() is used
as a cache expiration hint to prevent now_cpu_time() from being
called too often since it is known to be quite expensive.

Depends on commit "MINOR: clock: add now_mono_time_fast() function"
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
07cbd8e074 MINOR: clock: add now_mono_time_fast() function
Same as now_mono_time(), but for fast queries (less accurate)
Relies on coarse clock source (also known as fast clock source on
some systems).

Fallback to now_mono_time() if coarse source is not supported on the system.
2023-04-19 11:03:31 +02:00
Amaury Denoyelle
0783a7b08e MINOR: listener: remove unneeded local accept flag
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.

This should be backported up to 2.7 after the previous patch has also
been taken.
2023-04-18 17:09:34 +02:00
Amaury Denoyelle
739de3f119 MINOR: quic: properly finalize thread rebinding
When a quic_conn instance is rebinded on a new thread its tasks and
tasklet are destroyed and new ones created. Its socket is also migrated
to a new thread which stop reception on it.

To properly reactivate a quic_conn after rebind, wake up its tasks and
tasklet if they were active before thread rebind. Also reactivate
reading on the socket FD. These operations are implemented on a new
function qc_finalize_affinity_rebind().

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:09:02 +02:00
Amaury Denoyelle
25174d51ef MEDIUM: quic: implement thread affinity rebinding
Implement a new function qc_set_tid_affinity(). This function is
responsible to rebind a quic_conn instance to a new thread.

This operation consists mostly of releasing existing tasks and tasklet
and allocating new instances on the new thread. If the quic_conn uses
its owned socket, it is also migrated to the new thread. The migration
is finally completed with updated the CID TID to the new thread. After
this step, the connection is thus accessible to the new thread and
cannot be access anymore on the old one without risking race condition.

To ensure rebinding is either done completely or not at all, tasks and
tasklet are pre-allocated before all operations. If this fails, an error
is returned and rebiding is not done.

To destroy the older tasklet, its context is set to NULL before wake up.
In I/O callbacks, a new function qc_process() is used to check context
and free the tasklet if NULL.

The thread rebinding can cause a race condition if the older thread
quic_dghdlrs::dgrams list contains datagram for the connection after
rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always
check if the packet CID is still associated to the current thread or
not. In the latter case, no connection is returned and the new thread is
returned to allow to redispatch the datagram to the new thread in a
thread-safe way.

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:08:34 +02:00