Commit Graph

7113 Commits

Author SHA1 Message Date
Willy Tarreau
cdb711e42b MEDIUM: pools: spread the allocated counter over a few buckets
The ->used counter is one of the most stressed, and it heavily
depends on the ->allocated one, so let's first move ->allocated
to a few buckets.

A function "pool_allocated()" was added to return the sum of the entries.
It's important not to abuse it as it does iterate, so everywhere it's
possible to avoid it by keeping a local counter, it's better. Currently
it's used for limited pools which need to make sure they do not allocate
too many objects. That's an acceptable tradeoff to save CPU on large
machines at the expense of spending a little bit more on small ones which
normally are not under load.
2023-08-12 19:04:34 +02:00
Willy Tarreau
06885aaea7 MINOR: pools: introduce the use of multiple buckets
On many threads and without the shared cache, there can be extreme
contention on the ->allocated counter, the ->free_list pointer, and
the ->used counter. It's possible to limit this contention by spreading
the counters a little bit over multiple entries, that are summed up when
a consultation is needed. The criterion used to spread the values cannot
be related to the thread ID due to migrations, since we need to keep
consistent stats (allocated vs used).

Instead we'll just hash the pointer, it provides an index that does the
job and that is consistent for the object. When having just a few entries
(16 here as it showed almost identical performance between global and
non-global pools) even iterations should be short enough during
measurements to not be a problem.

A pair of functions designed to ease pointer hash bucket calculation were
added, with one of them doing it for thread IDs because allocation failures
will be associated with a thread and not a pointer.

For now this patch only brings in the relevant parts of the infrastructure,
the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512
threads or more are supported, 5 bits when 128 or more are supported, 4
bits when 16 or more are supported, otherwise 3 bits for small setups.
The array in the pool_head and the two utility functions are already
added. It should have no measurable impact beyond inflating the pool_head
structure.
2023-08-12 19:04:34 +02:00
Willy Tarreau
29ad61fb00 OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated
The pool's allocation counter doesn't strictly require to be updated
from these functions, it may more efficiently be done in the caller
(even out of a loop for pool_flush() and pool_gc()), and doing so will
also help us spread the counters over an array later. The functions
were renamed _noinc and _nodec to make sure we catch any possible
user in an external patch. If needed, the original functions may easily
be reimplemented in an inline function.
2023-08-12 19:04:34 +02:00
Willy Tarreau
f0d188f6ed OPTIM: tools: improve hash distribution using a better prime seed
During tests it was noticed that the current hash is not that good
on 4- and 5- bit hashes. About 7.5% of all the 32-bit primes were tested
as candidates for the hash function, by submitting them 128 arrangements
of N pointers among 40k extracted from haproxy's pools, and the average
fill rates for 1- to 12- bit hashes were measured and compared. It was
clear that some values do not provide great hashes and other ones are
way more resistant.

The current value is not bad at all but delivers 42.6% unique 2-bit
outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some
values did perform significantly better, among which 0xacd1be85 which
does 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3% 10-bit.

The reverse value used in the ptr2_hash() was really underperforming and
was replaced with 0x9d28e4e9 which does 49.6%, 40.4%, 42.6%, 39.1%, and
37.2% respectvely.

This should slightly improve the accuracy of the task and memory
profiling, and will be useful for pools.
2023-08-12 19:04:34 +02:00
Willy Tarreau
58946d44f8 MINOR: tools: improve ptr hash distribution on 64 bits
When testing the pointer hash on 64-bit real pointers (map entries),
it appeared that the shift by 33 bits that hoped to compensate for the
3 nul LSB degrades the hash, and the centering is more optimal on
31-(bits+1)/2. This makes sense since the topmost bit of the
multiplicator is 31, so for an input of 1 bit and 1 bit of output we
would always get zero. With the formula adjusted this way, we can get
up to ~15% more unique entries at 10 bits and ~24% more at 11 bits.
2023-08-12 19:04:34 +02:00
Willy Tarreau
ab6cb5dea0 MINOR: tools: make ptr_hash() support 0-bit outputs
When dealing with macro-based size definitions, it is useful to be able
to hash pointers on zero bits so that the macro automatically returns a
constant 0. For now it only supports 1-32. Let's just add this special
case. It's automatically optimized out by the compiler since the function
is inlined.
2023-08-12 19:04:34 +02:00
Willy Tarreau
59c347c15e BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP
LONGBITS was defined long ago with old compilers that didn't provide the
word size. It's still present as being referenced in various places in the
code, but we must not use it to define other macros that may be evaluated
at pre-processing time since it contains sizeof() and casts that are not
compatible with preprocessor conditions. Let's switch MAX_THREADS_PER_GROUP
to __WORDSIZE so that we can condition blocks of code on it if needed.

LONGBITS should really be removed by now, given that we don't support
compilers not providing __WORDSIZE anymore (gcc < 4.2).
2023-08-12 19:04:34 +02:00
Willy Tarreau
9e52c35de4 CLEANUP: stick-table: slightly reorder the stktable struct
By moving the config-time stuff after the updt_lock, we can plug some
holes without interfering with it. This allows us to get back to the
768-bytes struct. The performance was not affected at all.
2023-08-11 19:03:35 +02:00
Willy Tarreau
9c6248560e MINOR: stick-table: move the update lock into its own cache line
The read-lock contention observed on the update lock while turning it
into an upgradable lock were due to false sharing with the nearby
updates. Simply moving the lock alone into its own cache line is
sufficient to almost double the performance again, raising from 2355
to 4480k RPS with very low contention:

  Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 743422995452 lost
  Overhead  Shared Object          Symbol
    15.88%  haproxy                [.] stktable_lookup_key
     5.94%  haproxy                [.] ebmb_lookup
     5.69%  haproxy                [.] http_wait_for_request
     3.66%  haproxy                [.] stktable_touch_with_exp
     2.62%  [kernel]               [k] _raw_spin_unlock_irqrestore
     1.86%  haproxy                [.] http_action_return
     1.79%  haproxy                [.] stream_process_counters
     1.78%  [kernel]               [k] skb_release_data
     1.77%  haproxy                [.] process_stream

Unfortunately, trying to move the line anywhere else didn't work,
despite the remaining holes, because this structure is not quite
clean. This adds 64 bytes to a struct that was already 768 long,
so it's now 832. It's possible to repack it a little bit and regain
these bytes by removing the THREAD_ALIGN before "keys" because we
rarely use the config stuff, but that's a bit unsafe.
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. It required to add an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now has to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
2023-08-11 19:03:35 +02:00
Willy Tarreau
cc10fce9c2 MINOR: stick-table: better organize the struct stktable
The structure currently mixes R/O and R/W fields, let's organize them
by access type, focusing mainly on splitting the updates from the rest
so that peers activity does not affect the rest. For now it doesn't
bring any benefit but it paves the way for splitting the lock.
2023-08-11 19:03:35 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there, we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
8178a5211c MAJOR: threads/plock: update the embedded library again
This updates the local copy of the plock library to benefit from finer
memory ordering, EBO on more operations such as when take_w() and stow()
wait for readers to leave  and refined EBO, especially on common operation
such as attempts to upgade R to S, and avoids a counter-productive prior
read in rtos() and take_r().

These changes have shown a 5% increase on regular operations on ARM,
a 33% performance increase on ARM on stick-tables and 2% on x86, and
a 14% and 4% improvements on peers updates respectively on ARM and x86.

The availability of relaxed operations will probably be useful for stats
counters which are still extremely expensive to update.

The following plock commits were included in this update:

  9db830b plock: support inlining exponential backoff code
  008d3c2 plock: make the rtos upgrade faster
  2f76dde atomic: clean up the generic xchg()
  3c6919b atomic: make sure that the no-return macros do not return a value
  97c2bb7 atomic: make the fallback bts use the pointed type for the shift
  f4c1880 atomic: also implement the missing pl_btr()
  8329b82 atomic: guard all generic definitions to make it easier to provide specific ones
  7c5cb62 atomic: use C11 atomics when available
  96afaf9 atomic: prefer the C11 definitions in general
  f3ec7a6 atomic: implement load/store/atomic barriers
  8bdbd1e atomic: add atomic load/stores
  0f604c0 atomic: add more _noret operations
  3fe35db atomic: remove the (void) cast from the C11 operations
  3b08a7c atomic: allow to define the fallback _noret variants
  28deb22 atomic: make x86 arithmetic operations the _noret variants
  8061fe2 atomic: handle modern compilers that support returning flags
  b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>)
  59817ca atomic: add memory order variants for most operations
  a40774f plock: explicitly make use of the pl_*_noret operations
  6f1861b plock: switch to pl_sub_noret_lax() for cancellation
  c013980 plock: use pl_ldadd{_lax,_acq,} instead of pl_xadd()
  382eea3 plock: use a release ordering when dropping the lock
  60d750d plock: use EBO when waiting for readers to leave in take_w() and stow()
  fc01c4f plock: improve EBO a little bit
  1ef6390 plock: switch to CAS + XADD for pl_take_r()
2023-08-11 19:03:35 +02:00
Frédéric Lécaille
b0e32c6263 BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands
->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept
by the remaining QUIC connection object (struct quic_cc_conn) after having
release the previous one (struct quic_conn) to allow "show fd/sess" commands
to be functional without causing haproxy crashes.

No need to backport.
2023-08-11 11:21:31 +02:00
Willy Tarreau
d93a00861d MINOR: h2: pass accept-invalid-http-request down the request parser
We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal
will be to add deactivable checks that are still desirable to have by
default. For now no test is subject to it.
2023-08-08 19:10:54 +02:00
Willy Tarreau
30f58f4217 MINOR: http: add new function http_path_has_forbidden_char()
As its name implies, this function checks if a path component has any
forbidden headers starting at the designated location. The goal is to
seek from the result of a successful ist_find_range() for more precise
chars. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure
we're not too strict at this point.
2023-08-08 19:10:54 +02:00
Willy Tarreau
197668de97 MINOR: ist: add new function ist_find_range() to find a character range
This looks up the character range <min>..<max> in the input string and
returns a pointer to the first one found. It's essentially the equivalent
of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals
with a range.
2023-08-08 19:10:54 +02:00
Willy Tarreau
d4069f3cee REORG: http: move has_forbidden_char() from h2.c to http.h
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
2023-08-08 19:02:24 +02:00
Frédéric Lécaille
9f7cfb0a56 MEDIUM: quic: Allow the quic_conn memory to be asap released.
When the connection enters the "connection closing" state after having sent
a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory
may be freed from quic_conn objects (QUIC connection). This is done allocating
a reduced sized object which keeps enough information to handle the remaining
incoming packets for the connection in "connection closing" state, and to
continue to send again the previous datagram with CONNECTION_CLOSE frames inside
which has already been sent.

Define a new quic_cc_conn struct which represents the connection object after
entering the "connection close" state and after having release the quic_conn
connection object.

Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects.

Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object
(the connection before entering "connection close" state), and new quic_cc_conn
struct object (the connection after entering "connection close"). So, all the
members inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.

Implement qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible of copying the
required information from the original connection (quic_conn) to the remaining
connection (quic_cc_conn). Among others initialization, it redefined the
QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task
to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received and
resend the datagram which CONNECTION_CLOSE frame which has already been sent
when entering "connection close" state. qc_cc_idle_timer_task() only releases
the remaining quic_cc_conn struct object.

Modify quic_conn_release() to allocate quic_cc_conn struct objects from the
original connection passed as argument. It does nothing if this original
connection is not in closing state, or if the idle timer has already expired.

Implement quic_release_cc_conn() to release a "connection close" connection.
It is called when its timer expires or if an error occured when sending
a packet from this connection when the peer is no more reachable.
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
276697438d MINOR: quic: Use a pool for the connection ID tree.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace ->cids member of quic_conn objects by pointer to "quic_cids" and
adapt the code consequently. Nothing special.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
dc9b8e1f27 MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer.
Add a new pool <pool_head_quic_cc_buf> for buffer used when building datagram
wich CONNECTION_CLOSE frames inside with QUIC_MIN_CC_PKTSIZE(128) as minimum
size.
Add ->cc_buf_area to quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close" datagrams
and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member
value.
Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate
a struct "quic_cc_buf" buffer when the connection needs an immediate close
or a buffer struct if not.
Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such
"quic_cc_buf" buffer when an immediate close is required.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
f7ab5918d1 MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct
Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct member to
bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep member respectively).
They are moved before being defined into a bytes anonoymous struct common to
a future struct to be defined.

Consequently adapt the code.
2023-08-07 18:57:45 +02:00
Frédéric Lécaille
a45f90dd4e MINOR: quic: Amplification limit handling sanitization.
Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit
is respected when it return false(0), i.e. when the connection is not validated.

Implement quic_may_send_bytes() which returns the number of bytes which may be
sent when the connection has not already been validated and call this functions
at several places when this is the case (after having called
quic_peer_validated_addr()).

Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
2023-08-07 18:57:45 +02:00
Frédéric Lécaille
1f40b6c9fe CLEANUP: quic: Remove quic_path_room().
This function is definitively no more needed/used.
2023-08-07 18:57:45 +02:00
Amaury Denoyelle
559482c11e MINOR: h3: abort request if not completed before full response
A HTTP server may provide a complete response even prior receiving the
full request. In this case, RFC 9114 allows the server to abort read
with a STOP_SENDING with error code H3_NO_ERROR.

This scenario was notably reproduced with haproxy and an inactive
server. If the client send a POST request, haproxy may provide a full
HTTP 503 response before the end of the full request.
2023-08-04 16:17:16 +02:00
Christopher Faulet
8670bb42c2 CLEANUP: stconn: Move comment about sedesc fields on the field line
Fields of sedesc structure were documented in the comment about the
structure itself. It was not really convenient, hard to read, hard to
update. So comments about the fields are moved on the corresponding field
line, as usual.
2023-08-04 14:32:57 +02:00
Christopher Faulet
ef2b15998c BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used
There is a mechanisme in the H1 and H2 multiplexer to skip the payload when
a response is returned to the client when it must not contain any payload
(response to a HEAD request or a 204/304 response). However, this does not
work when the splicing is used. The H2 multiplexer does not support the
splicing, so there is no issue. But with the mux-h1, when data are sent
using the kernel splicing, the mux on the server side is not aware the
client side should skip the payload. And once the data are put in a pipe,
there is no way to stop the sending.

It is a defect of the current design. This will be easier to deal with this
case when the mux-to-mux forwarding will be implemented. But for now, to fix
the issue, we should add an HTX flag on the start-line to pass the info from
the client side to the server side and be able to disable the splicing in
necessary.

The associated reg-test was improved to be sure it does not fail when the
splicing is configured.

This patch should be backported as far as 2.4..
2023-08-02 12:05:05 +02:00
Patrick Hemmer
7fccccccea MINOR: acl: add acl() sample fetch
This provides a sample fetch which returns the evaluation result
of the conjunction of named ACLs.
2023-08-01 10:49:06 +02:00
Patrick Hemmer
00e00fb424 REORG: cfgparse: extract curproxy as a global variable
This extracts curproxy from cfg_parse_listen so that it can be referenced by
keywords that need the context of the proxy they are being used within.
2023-08-01 10:48:28 +02:00
Patrick Hemmer
997a31dbdf CLEANUP: acl: remove cache_idx from acl struct
It isn't used and never has been.
2023-08-01 10:48:05 +02:00
Frédéric Lécaille
c156c5bda6 MINOR: quic; Move the QUIC frame pool to its proper location
pool_head_quic_frame QUIC frame pool definition is move from quic_conn-t.h to
quic_frame-t.h.
Its declation is moved from quic_conn.c to quic_frame.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
fa58f67787 CLEANUP: quic: quic_conn struct cleanup
Remove no more used QUIC_TX_RING_BUFSZ macro.
Remove several no more used quic_conn struct members.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
444c1a4113 MINOR: quic: Split QUIC connection code into three parts
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part code.
The definition of quic_tx_packet struct has been move from quic_conn-t.h to
quic_tx-t.h.

Same thing for the TX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this TX part code.
The definition of quic_rx_packet struct has been move from quic_conn-t.h to
quic_rx-t.h.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
2fe50a01ca CLEANUP: quic: Defined but no more used function (quic_get_tls_enc_levels())
This function is no more used since this commit:

    MEDIUM: quic: Handshake I/O handler rework.

Let's remove it!
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
7008f16d57 MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements
Extract the code in relation with the QUIC acknowledgements from quic_conn.c
to quic_ack.c to accelerate the compilation of quic_conn.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
f454b78fa9 MINOR: quic: Add new "QUIC over SSL" C module.
Move the code which directly calls the functions of the OpenSSL QUIC API into
quic_ssl.c new C file.
Some code have been extracted from qc_conn_finalize() to implement only
the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c.
qc_conn_finalize() has also been exported to be used from this new quic_ssl.c
C module.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
57237f68ad MINOR: quic: Move TLS related code to quic_tls.c
quic_tls_key_update() and quic_tls_rotate_keys() are QUIC TLS functions.
Let's move them to their proper location: quic_tls.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
953e67abb6 MINOR: quic: Export QUIC CLI code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the QUIC CLI from quic_conn.c to quic_cli.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
6334f4f6c5 MINOR: quic: Export QUIC traces code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the traces from quic_conn.c to quic_trace.c.
Also add some headers (quic_trace-t.h and quic_trace.h).
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
f32201abb0 MINOR: quic: Add "limited-quic" new tuning setting
This setting which may be used into a "global" section, enables the QUIC listener
bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect
when haproxy is compiled against a TLS stack with QUIC support, typically quictls.
2023-07-21 19:19:27 +02:00
Frédéric Lécaille
2fd67c558a MINOR: quic: Missing encoded transport parameters for QUIC OpenSSL wrapper
This wrapper needs to have an access to an encoded version of the local transport
parameter (to be sent to the peer). They are provided to the TLS stack thanks to
qc_ssl_compat_add_tps_cb() callback.

These encoded transport parameters were attached to the QUIC connection but
removed by this commit to save memory:

      MINOR: quic: Stop storing the TX encoded transport parameters

This patch restores these transport parameters and attaches them again
to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper
is compiled.
Implement qc_set_quic_transport_params() to encode the transport parameters
for a connection and to set them into the stack and make this function work
for both the OpenSSL wrapper or any other TLS stack with QUIC support. Its uses
the encoded version of the transport parameters attached to the connection
when compiled for the OpenSSL wrapper, or local parameters when compiled
with TLS stack with QUIC support. These parameters are passed to
quic_transport_params_encode() and SSL_set_quic_transport_params() as before
this patch.
2023-07-21 17:27:40 +02:00
Frédéric Lécaille
7978493c2e MINOR: quic: Add a quic_openssl_compat struct to quic_conn struct
Add quic_openssl_compat struct to the quic_conn struct to support the
QUIC OpenSSL wrapper feature.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
e3991e03cc MINOR: quic: Export some KDF functions (QUIC-TLS)
quic_hkdf_expand() and quic_hkdf_expand_label() must be used by the QUIC OpenSSL
wrapper.
2023-07-21 15:53:41 +02:00
Frédéric Lécaille
780133548c MINOR: quic: Include QUIC opensssl wrapper header from TLS stacks compatibility header
Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the
compilation of the QUIC openssl wrapper for TLS stacks is enabled with
USE_QUIC_OPENSSLCOMPAT.
2023-07-21 15:53:40 +02:00
Frédéric Lécaille
1b03f8016d MINOR: quic: QUIC openssl wrapper implementation
Highly inspired from nginx openssl wrapper code.

This wrapper implement this list of functions:

   SSL_set_quic_method(),
   SSL_quic_read_level(),
   SSL_quic_write_level(),
   SSL_set_quic_transport_params(),
   SSL_provide_quic_data(),
   SSL_process_quic_post_handshake()

and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls
to support QUIC from OpenSSL. So, its aims is to support QUIC from a standard OpenSSL
stack without QUIC support. It relies on the OpenSSL keylog feature to retreive
the secrets derived by the OpenSSL stack during a handshake and to pass them to
the ->set_encryption_secrets() callback as this is done by quictls. It makes
usage of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages
only on the receipt path. Some of them must be passed to the ->add_handshake_data()
callback as this is done with quictls to be sent to the peer as CRYPTO data.
quic_tls_compat_msg_callback() callback also sends the received TLS alert with
->send_alert() callback.

AES 128-bits with CCM mode is not supported at this time. It is often disabled by
the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites",
the wrapper will send a TLS alerts (Handhshake failure) if this algorithm is
negotiated between the client and the server.

0rtt is also not supported by this wrapper.
2023-07-21 15:53:40 +02:00
Frédéric Lécaille
72619bda4c MINOR: quic: add trace about pktns packet/frames releasing
Add useful traces which have alredy helped in debugging issues.
2023-07-21 14:31:42 +02:00
Frédéric Lécaille
0645e56a6e MINOR: quic: Add traces for qc_frm_free()
Useful to diagnose memory leak issues in relation with the QUIC frame objects.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
cf2368a3d5 MEDIUM: quic: Packet building rework.
The aim of this patch is to allow the building of QUIC datagrams with
as much as packets with different encryption levels inside during handshake.
At this time, this is possible only for at most two encryption levels.
That said, most of the time, a server only needs to use two encryption levels
by datagram, except during retransmissions.

Modify qc_prep_pkts(), the function responsible of building datagrams, to pass
a list of encryption levels as parameter in place of two encryption levels. This
function is also used when retransmitting datagrams. In this case this is a
customized/flexible list of encryption level which is passed to this function.
Add ->retrans new member to quic_enc_level struct, to be used as attach point
to list of encryption level used only during retransmission, and ->retrans_frms
new member which is a pointer to a list of frames to be retransmitted.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
2b8510d722 MINOR: quic: Release asap the negotiated Initial TLS context.
This context may be released at the same time as the Initial TLS context.
This is done calling quic_tls_ctx_secs_free() and pool_free() in two code locations.
Implement quic_nictx_free() to do that.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
90a63ae4fa MINOR: quic: Dynamic allocation for negotiated Initial TLS cipher context.
Shorten ->negotiated_ictx quic_conn struct member (->nictx).
This variable is used during version negotiation. Indeed, a connection
may have to support support several QUIC versions of paquets during
the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated
QUIC version.

This patch allows a connection to dynamically allocate this TLS cipher context.

Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context object.
Modify qc_new_conn() to initialize ->nictx to NULL value.
quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context.
Modify it to do nothing if it is called with a NULL TLS cipher context.
Modify to allocate ->nictx from qc_conn_finalize() just before initializing
its secrets. qc_conn_finalize() allocates -nictx only if needed (if a new QUIC
version was negotiated).
Modify qc_conn_release() which release a QUIC connection (quic_conn struct) to
release ->nictx TLS cipher context.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
642dba8c22 MINOR: quic: Stop storing the TX encoded transport parameters
There is no need to keep an encoded version of the QUIC listener transport
parameters attache to the connection.

Remove ->enc_params and ->enc_params_len member of quic_conn struct.
Use variables to build the encoded transport parameter local to
ha_quic_set_encryption_secrets() before they are passed to
SSL_set_quic_transport_params().

Modify qc_ssl_sess_init() prototype. It was expected to be used with
the encoded transport parameters as passed parameter, but they were not
used. Cleanup this function.
2023-07-21 14:27:10 +02:00
Patrick Hemmer
57926fe8a3 MINOR: peers: add peers keyword registration
This adds support for registering keywords in the 'peers' section.
2023-07-20 18:12:44 +02:00
Willy Tarreau
6ecabb3f35 CLEANUP: config: make parse_cpu_set() return documented values
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, changed from ulong to int since
it only returns a success/failure and no more a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
2023-07-20 11:01:09 +02:00
Willy Tarreau
f54d8c6457 CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
2023-07-20 11:01:09 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made worng over time.

The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00
Willy Tarreau
7134417613 MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
2023-07-20 11:01:09 +02:00
Aurelien DARRAGON
2e7d3d2e5c BUG/MINOR: hlua: hlua_yieldk ctx argument should support pointers
lua_yieldk ctx argument is of type lua_KContext which is typedefed to
intptr_t when available so it can be used to store pointers.

But the wrapper function hlua_yieldk() passes it as a regular it so it
breaks that promise.

Changing hlua_yieldk() prototype so that ctx argument is of type
lua_KContext.

This bug had no functional impact because ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
2023-07-17 07:42:47 +02:00
Emeric Brun
49ddd87d41 CLEANUP: quic: remove useless parameter 'key' from quic_packet_encrypt
Parameter 'key' was not used in this function.

This patch removes it from the prototype of the function.

This patch could be backported until v2.6.
2023-07-12 14:33:03 +02:00
Emeric Brun
cadb232e93 BUG/MEDIUM: quic: timestamp shared in token was using internal time clock
The internal tick clock was used to export the timestamp int the token
on retry packets. Doing this in cluster mode the nodes don't
understand the timestamp from tokens generated by others.

This patch re-work this using the the real current date (wall-clock time).

Timestamp are also now considered in secondes instead of milleseconds.

This patch should be backported until v2.6
2023-07-12 14:32:01 +02:00
Aurelien DARRAGON
b6e2d62fb3 MINOR: sink/api: pass explicit maxlen parameter to sink_write()
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.

But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.

sink_write() now takes an optional maxlen parameter:
  if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
  is smaller than ring->maxlen, else only ring->maxlen will be considered.

[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
4f0e0f5a65 MEDIUM: sample: introduce 'same' output type
Thierry Fournier reported an annoying side-effect when using the debug()
converter.

Consider the following examples:

[1]  http-request set-var(txn.test) bool(true),ipmask(24)
[2]  http-request redirect location /match if { bool(true),ipmask(32) }

When starting haproxy with [1] example we get:

   config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied.

With [2], we get:

  config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'.

Now consider the following examples which are based on [1] and [2]
but with the debug() sample conv inserted in-between those incompatible
sample types:

[1*] http-request set-var(txn.test) bool(true),debug,ipmask(24)
[2*] http-request redirect location /match if { bool(true),debug,ipmask(32) }

According to the documentation, "it is safe to insert the debug converter
anywhere in a chain, even with non-printable sample types".
Thus we don't expect any side-effect from using it within a chain.

However in current implementation, because of debug() returning SMP_T_ANY
type which is a pseudo type (only resolved at runtime), the sample
compatibility checks performed at parsing time are completely uneffective.
(haproxy will start and no warning will be emitted)

The indesirable effect of this is that debug() prevents haproxy from
warning you about impossible type conversions, hiding potential errors
in the configuration that could result to unexpected evaluation or logic
while serving live traffic. We better let haproxy warn you about this kind
of errors when it has the chance.

With our previous examples, this could cause some inconveniences. Let's
say for example that you are testing a configuration prior to deploying
it. When testing the config you might want to use debug() converter from
time to time to check how the conversion chain behaves.

Now after deploying the exact same conf, you might want to remove those
testing debug() statements since they are no longer relevant.. but
removing them could "break" the config and suddenly prevent haproxy from
starting upon reload/restart. (this is the expected behavior, but it
comes a bit too late because of debug() hiding errors during tests..)

To fix this, we introduce a new output type for sample expressions:
  SMP_T_SAME - may only be used as "expected" out_type (parsing time)
               for sample converters.

As it name implies, it is a way for the developpers to indicate that the
resulting converter's output type is guaranteed to match the type of the
sample that is presented on the converter's input side.
(converter may alter data content, but data type must not be changed)

What it does is that it tells haproxy that if switching to the converter
(by looking at the converter's input only, since outype is SAME) is
conversion-free, then the converter type can safely be ignored for type
compatibility checks within the chain.

debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows
it to fully comply with the doc in the sense that it does not impact the
conversion chain when inserted between sample items.

Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
a635a1779a MEDIUM: sample: add missing ADDR=>? compatibility matrix entries
SMP_T_ADDR support was added in b805f71 ("MEDIUM: sample: let the cast
functions set their output type").

According to the above commit, it is made clear that the ADDR type is
a pseudo/generic type that may be used for compatibility checks but
that cannot be emitted from a fetch or converter.

With that in mind, all conversions from ADDR to other types were
explicitly disabled in the compatibility matrix.

But later, when map_*_ip functions were updated in b2f8f08 ("MINOR: map:
The map can return IPv4 and IPv6"), we started using ADDR as "expected"
output type for converters. This still complies with the original
description from b805f71, because it is used as the theoric output
type, and is never emitted from the converters themselves (only "real"
types such as IPV4 or IPV6 are actually being emitted at runtime).

But this introduced an ambiguity as well as a few bugs, because some
compatibility checks are being performed at config parse time, and thus
rely on the expected output type to check if the conversion from current
element to the next element in the chain is theorically supported.

However, because the compatibility matrix doesn't support ADDR to other
types it is currently impossible to use map_*_ip converters in the middle
of a chain (the only supported usage is when map_*_ip converters are
at the very end of the chain).

To illustrate this, consider the following examples:

  acl test str(ok),map_str_ip(test.map) -m found                          # this will work
  acl test str(ok),map_str_ip(test.map),ipmask(24) -m found               # this will raise an error

Likewise, stktable_compatible_sample() check for stick tables also relies
on out_type[table_type] compatibility check, so map_*_ip cannot be used
with sticktables at the moment:

  backend st_test
    stick-table type string size 1m expire 10m store http_req_rate(10m)
  frontend fe
    bind localhost:8080
    mode http
    http-request track-sc0 str(test),map_str_ip(test.map) table st_test     # raises an error

To fix this, and prevent future usage of ADDR as expected output type
(for either fetches or converters) from introducing new bugs, the
ADDR=>? part of the matrix should follow the ANY type logic.

That is, ADDR, which is a pseudo-type, must be compatible with itself,
and where IPV4 and IPV6 both support a backward conversion to a given
type, ADDR must support it as well. It is done by setting the relevant
ADDR entries to c_pseudo() in the compatibility matrix to indicate that
the operation is theorically supported (c_pseudo() will never be executed
because ADDR should not be emitted: this only serves as a hint for
compatibility checks at parse time).

This is what's being done in this commit, thanks to this the broken
examples documented above should work as expected now, and future
usage of ADDR as out_type should not cause any issue.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
30cd137d3f MINOR: sample: introduce c_pseudo() conv function
This function is used for ANY=>!ANY conversions in the compatibility
matrix to help differentiate between real NOOP (c_none) and pseudo
conversions that are theorically supported at config parse time but can
never occur at runtime,. That is, to explicit the fact that actual related
runtime operations (e.g.: ANY->IPV4) are not NOOP since they might require
some conversion to be performed depending on the input type.

When checking the conf we don't know the effective out types so
cast[pseudo type][pseudo type] is allowed in the compatibility matrix,
but at runtime we only expect cast[real type][(real type || pseudo type)]
because fetches and converters may not emit pseudo types, thus using
c_none() everywhere was too ambiguous.

The process will crash if c_pseudo() is invoked to help catch bugs:
crashing here means that a pseudo type has been encountered on a
converter's input at runtime (because it was emitted earlier in the
chain), which is not supported and results from a broken sample fetch
or converter implementation. (pseudo types may only be used as out_type
in sample definitions for compatibility checks at parsing time)
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
58bbe41cb8 MEDIUM: acl/sample: unify sample conv parsing in a single function
Both sample_parse_expr() and parse_acl_expr() implement some code
logic to parse sample conv list after respective fetch or acl keyword.

(Seems like the acl one was inspired by the sample one historically)

But there is clearly code duplication between the two functions, making
them hard to maintain.
Hopefully, the parsing logic between them has stayed pretty much the
same, thus the sample conv parsing part may be moved in a dedicated
helper parsing function.

This is what's being done in this commit, we're adding the new function
sample_parse_expr_cnv() which does a single thing: parse the converters
that are listed right after a sample fetch keyword and inject them into
an already existing sample expression.

Both sample_parse_expr() and parse_acl_expr() were adapted to now make
use of this specific parsing function and duplicated code parts were
cleaned up.

Although sample_parse_expr() remains quite complicated (numerous function
arguments due to contextual parsing data) the first goal was to get rid of
code duplication without impacting the current behavior, with the added
benefit that it may allow further code cleanups / simplification in the
future.
2023-07-03 16:32:01 +02:00
Frédéric Lécaille
7f3c1bef37 MINOR: quic: Drop packet with type for discarded packet number space.
This patch allows the low level packet parser to drop packets with type for discarded
packet number spaces. Furthermore, this prevents it from reallocating new encryption
levels and packet number spaces already released/discarded. When a packet number space
is discarded, it MUST NOT be reallocated.

As the packet number space discarding is done asap the type of packet received is
known, some packet number space discarding check may be safely removed from qc_try_rm_hp()
and qc_qel_may_rm_hp() which are called after having parse the packet header, and
is type.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
b97de9dc21 MINOR: quic: Move the packet number space status at quic_conn level
As the packet number spaces and encryption level are dynamically allocated,
the information about the packet number space discarded status must be kept
somewhere else than in these objects.

quic_tls_discard_keys() is no more useful.
Modify quic_pktns_discard() to do the same job: flag the quic_conn object
has having discarded packet number space.
Implement quic_tls_pktns_is_disarded() to check if a packet number space is
discarded. Note the Application data packet number space is never discarded.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
f7749968d6 CLEANUP: quic: Remove two useless pools a low QUIC connection level
Both "quic_tx_ring" and "quic_rx_crypto_frm" pool are no more used.

Should be backported as far as 2.6.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
a5c1a3b774 MINOR: quic: Reduce the maximum length of TLS secrets
The maximum length of the secrets derived by the TLS stack is 384 bits.
This reduces the size of the objects provided by the "quic_tls_secret" pool by
16 bytes.

Should be backported as far as 2.6
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
3097be92f1 MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels
Replace ->els static array of encryption levels by 4 pointers into the QUIC
connection object, quic_conn struct.
    ->iel denotes the Initial encryption level,
    ->eel the Early-Data encryption level,
    ->hel the Handshaske encryption level and
    ->ael the Application Data encryption level.

Add ->qel_list to this structure to list the encryption levels after having been
allocated. Modify consequently the encryption level object itself (quic_enc_level
struct) so that it might be added to ->qel_list QUIC connection list of
encryption levels.

Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption
level object. It is used to initialized the pointer newly added to the quic_conn
structure. It also takes a packet number space pointer address as argument to
initialize it if not already initialized.

Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init() which is
called by qc_enc_level_alloc() to allocate an encryption level object.

Implement 2 new helper functions:
  - ssl_to_qel_addr() to match and pointer address to a quic_encryption level
    attached to a quic_conn object with a TLS encryption level enum value;
  - qc_quic_enc_level() to match a pointer to a quic_encryption level attached
    to a quic_conn object with an internal encryption level enum value.
This functions are useful to be called from ->set_encryption_secrets() and
->add_handshake_data() TLS stack called which takes a TLS encryption enum
as argument (enum ssl_encryption_level_t).

Replace all the use of the qc->els[] array element values by one of the newly
added ->[ieha]el quic_conn struct member values.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
25a7b15144 MINOR: quic: Add a pool for the QUIC TLS encryption levels
Very simple patch to define and declare a pool for the QUIC TLS encryptions levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove from quic_conn struct the
static array of encryption levels (see ->els[]).
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
7d9f12998d CLEANUP: quic: Remove qc_list_all_rx_pkts() defined but not used
This function is not used. May be safely removed.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6635aa6a0a MEDIUM: quic: Dynamic allocations of packet number spaces
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to quic_conn struc as pointers to quic_pktns
struct, one by packet number space as follows:
     ->ipktns for Initial packet number space,
     ->hpktns for Handshake packet number space and
     ->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers
from TLS stack encryption levels.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data()  to
allocate the packet numbers and initialize the encryption level.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.

Modify qc_new_conn() to allocation only the Initial packet number space and
Initial encryption level.

Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.

Replace at several locations the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
ef39a74f4a MINOR: quic: Move packet number space related functions
Move packet number space related functions from quic_conn.h to quic_tls.h.

Should be backported as far as 2.6 to ease future backports to come.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
411b6f73b7 MINOR: quic: Implement a packet number space identification function
Implement quic_pktns_char() to identify a packet number space from a
quic_conn object. Usefull only for traces.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
dc6b339733 MINOR: quic: Move QUIC encryption level structure definition
haproxy/quic_tls-t.h is the correct place to quic_enc_level structure
definition.

Should be backported as far as 2.6 to ease any further backport to come.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6593ec6f5e MINOR: quic: Move QUIC TLS encryption level related code (quic_conn_enc_level_init())
quic_conn_enc_level_init() location is definitively in QUIC TLS API source file:
src/quic_tls.c.
2023-06-30 16:20:55 +02:00
Willy Tarreau
90d18e2006 IMPORT: slz: implement a synchronous flush() operation
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the ouput
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.

This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
2023-06-30 16:12:36 +02:00
William Lallemand
593c895eed MINOR: ssl: allow to change the client-sigalgs on server lines
This patch introduces the "client-sigalgs" keyword for the server line,
which allows to configure the list of server signature algorithms
negociated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
2023-06-29 14:11:46 +02:00
William Lallemand
717f0ad995 MINOR: ssl: allow to change the server signature algorithm on server lines
This patch introduces the "sigalgs" keyword for the server line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
2023-06-29 13:40:18 +02:00
Frédéric Lécaille
c2bab72d32 BUG/MINOR: quic: Missing TLS secret context initialization
This bug arrived with this commit:

     MINOR: quic: Remove pool_zalloc() from qc_new_conn()

Missing initialization of largest packet number received during a keyupdate phase.
This prevented the keyupdate feature from working and made the keyupdate interop
tests to fail for all the clients.

Furthermore, ->flags from quic_tls_ctx was also not initialized. This could
also impact the keyupdate feature at least.

No backport needed.
2023-06-19 19:05:45 +02:00
Frédéric Lécaille
ddc616933c MINOR: quic: Remove pool_zalloc() from qc_new_conn()
qc_new_conn() is ued to initialize QUIC connections with quic_conn struct objects.
This function calls quic_conn_release() when it fails to initialize a connection.
quic_conn_release() is also called to release the memory allocated by a QUIC
connection.

Replace pool_zalloc() by pool_alloc() in this function and initialize
all quic_conn struct members which are referenced by quic_conn_release() to
prevent use of non initialized variables in this fonction.
The ebtrees, the lists attached to quic_conn struct must be initialized.
The tasks must be reset to their NULL default values to be safely destroyed
by task_destroy(). This is all the case for all the TLS cipher contexts
of the encryption levels (struct quic_enc_level) and those for the keyupdate.
The packet number spaces (struct quic_pktns) must also be initialized.
->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update()
from dereferencing this pointer.
->latest_rtt member of quic_loss struct must also be initialized. This is done
by quic_loss_init() called by quic_path_init().
2023-06-16 16:55:58 +02:00
Frédéric Lécaille
d66896036a BUG/MINOR: quic: Missing initialization (packet number space probing)
->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init().
This bug never occured because all quic_pktns structs are attached to quic_conn
structs which are always pool_zalloc()'ed.

Must be backported as far as 2.6.
2023-06-14 11:35:22 +02:00
Aurelien DARRAGON
b7f8af3ca9 BUG/MINOR: proxy/server: free default-server on deinit
proxy default-server is a specific type of server that is not allocated
using new_server(): it is directly stored within the parent proxy
structure. However, since it may contain some default config options that
may be inherited by regular servers, it is also subject to dynamic members
(strings, structures..) that needs to be deallocated when the parent proxy
is cleaned up.

Unfortunately, srv_drop() may not be used directly from p->defsrv since
this function is meant to be used on regular servers only (those created
using new_server()).

To circumvent this, we're splitting srv_drop() to make a new function
called srv_free_params() that takes care of the member cleaning which
originally takes place in srv_drop(). This function is exposed through
server.h, so it may be called from outside server.c.

Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy()
prevents any memory leaks due to dynamic parameters allocated when
parsing a default-server line from a proxy section.

This partially fixes GH #2173 and may be backported to 2.8.

[While it could also be relevant for other stable versions, the patch
won't apply due to architectural changes / name changes between 2.4 => 2.6
and then 2.6 => 2.8. Considering this is a minor fix that only makes
memory analyzers happy during deinit paths (at least for <= 2.8), it might
not be worth the trouble to backport them any further?]
2023-06-06 15:15:17 +02:00
Willy Tarreau
4ad1c9635a BUG/MINOR: stream: do not use client-fin/server-fin with HTX
Historically the client-fin and server-fin timeouts were made to allow
a connection closure to be effective quickly if the last data were sent
down a socket and the client didn't close, something that can happen
when the peer's FIN is lost and retransmits are blocked by a firewall
for example. This made complete sense in 1.5 for TCP and HTTP in close
mode. But nowadays with muxes, it's not done at the right layer anymore
and even the description doesn't match what is being done, because what
happens is that the stream will abort the whole transfer after it's done
sending to the mux and this timeout expires.

We've seen in GH issue 2095 that this can happen with very short timeout
values, and while this didn't trigger often before, now that the muxes
(h2 & quic) properly report an end of stream before even the first
sc_conn_sync_recv(), it seems that it can happen more often, and have
two undesirable effects:
  - logging a timeout when that's not the case
  - aborting the request channel, hence the server-side conn, possibly
    before it had a chance to be put back to the idle list, causing
    this connection to be closed and not reusable.

Unfortunately for TCP (mux_pt) this remains necessary because the mux
doesn't have a timeout task. So here we're adding tests to only do
this through an HTX mux. But to be really clean we should in fact
completely drop all of this and implement these timeouts in the mux
itself.

This needs to be backported to 2.8 where the issue was discovered,
and maybe carefully to older versions, though that is not sure at
all. In any case, using a higher timeout or removing client-fin in
HTTP proxies is sufficient to make the issue disappear.
2023-06-02 16:33:40 +02:00
Willy Tarreau
ae0f8be011 MINOR: stats: protect against future stats fields omissions
As seen in commits 33a4461fa ("BUG/MINOR: stats: Fix Lua's `get_stats`
function") and a46b142e8 ("BUG/MINOR: Missing stat_field_names (since
f21d17bb)") it seems frequent to omit to update stats_fields[] when
adding a new ST_F_xxx entry. This breaks Lua's get_stats() and shows
a "(null)" in the header of "show stat", but that one is not detectable
to the naked eye anymore.

Let's add a reminder above the enum declaration about this, and a small
reg tests checking for the absence of "(null)". It was verified to fail
before the last patch above.
2023-06-02 08:39:53 +02:00
Willy Tarreau
cb6a35fdc1 [RELEASE] Released version 2.9-dev0
Released version 2.9-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2023-05-31 16:29:19 +02:00
Willy Tarreau
9dc8308a67 MINOR: version: mention that it's development again
This essentially reverts b9b6e94474.
2023-05-31 16:28:34 +02:00
Willy Tarreau
b9b6e94474 MINOR: version: mention that it's LTS now.
The version will be maintained up to around Q2 2028. Let's
also update the INSTALL file to mention this.
2023-05-31 16:23:56 +02:00
Amaury Denoyelle
d68f8b5a4a CLEANUP: mux-quic: rename internal functions
This patch is similar to the previous one but for QUIC mux functions
used inside the mux code itself or application layer. Replace all
occurences of qc_* prefix by qcc_* or qcs_*. This should help to better
differentiate code between quic_conn and MUX.

This should be backported up to 2.7.
2023-05-30 15:45:55 +02:00
Amaury Denoyelle
6d6ee0dc0b MINOR: quic: fix stats naming for flow control BLOCKED frames
There was a misnaming in stats counter for *_BLOCKED frames in regard to
QUIC rfc convention. This patch fixes it to prevent future ambiguity :

- STREAMS_BLOCKED -> STREAM_DATA_BLOCKED
- STREAMS_DATA_BLOCKED_BIDI -> STREAMS_BLOCKED_BIDI
- STREAMS_DATA_BLOCKED_UNI -> STREAMS_BLOCKED_UNI

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Amaury Denoyelle
087c5f041b MINOR: mux-quic: remove nb_streams from qcc
Remove nb_streams field from qcc. It was not used outside of a BUG_ON()
statement to ensure we never have a negative count of streams. However
this is already checked with other fields.

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Amaury Denoyelle
7b41dfd834 CLEANUP: mux-quic: remove unneeded fields in qcc
Remove fields from qcc structure which are unused.

This should be backported up to 2.7.
2023-05-26 17:17:00 +02:00
Patrick Hemmer
425d7ad89d MINOR: init: pre-allocate kernel data structures on init
The Linux kernel maintains data structures to track a processes' open file
descriptors, and it expands these structures as necessary when FD usage grows
(at every FD=2^X starting at 64). However when threading is in use, during
expansion the kernel will pause (observed up to 47ms) while it waits for thread
synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366).

This change addresses the issue and avoids the random pauses by opening the
maximum file descriptor during initialization, so that expansion will not occur
while processing traffic.
2023-05-26 09:28:18 +02:00
Willy Tarreau
b298882acc BUILD: compiler: systematically set USE_OBSOLETE_LINKER with TCC
TCC silently ignores the weak and section attributes, which ruins the
initcalls. Technically we're exactly in the same situation as with an
obsolete linker. Let's just automatically set the flag if TCC is
detected, this avoids surprises where the program compiles but does
not start.

No backport is needed.
2023-05-24 21:37:06 +02:00
Willy Tarreau
eced142aa8 BUILD: ist: use the literal declaration for ist_lc/ist_uc under TCC
TCC doesn't knoow about __attribute__((weak)), it silently ignores it.
We could add a "static" modifier there in this case but we already have
an alternate portable mode that is based on a slightly larger literal
for obsolete linkers (and non-ELF systems) which choke on weak. Let's
just add the test for tcc there and use it in this case.

No backport is needed.
2023-05-24 21:33:34 +02:00
Willy Tarreau
4e8720ab78 BUILD: ist: do not put a cast in an array declaration
TCC is upset by the declaration looking like:

  const unsigned char ist_lc[256] __attribute__((weak)) = ((const unsigned char[256]){ ... });

It was written like this because it's expanded from the _IST_LC macro
but it's never used as-is, it's only used from ist_lc, which should be
the one containing the cast so that the macro only contains the list of
bytes that can be used in both places. And this assigns more consistent
roles to the lower and upper case macro/variable now, one is typed and
the other one not. No backport is needed.
2023-05-24 21:27:39 +02:00
Frédéric Lécaille
12a815ad19 MINOR: quic: Add a counter for sent packets
Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection
level. Then, when the connection is released, the ->sent_pkt counter value
is added to the one for the listener.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
bdd64fd71d MINOR: quic: Add some counters at QUIC connection level
Add some statistical counters to quic_conn struct from quic_counters struct which
are used at listener level to handle them at QUIC connection level. This avoid
calling atomic functions. Furthermore this will be useful soon when a counter will
be added for the total number of packets which have been sent which will be very
often incremented.

Some counters were not added, espcially those which count the number of QUIC errors
by QUIC error types. Indeed such counters would be incremented most of the time
only one time at QUIC connection level.

Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level
statistical counters to the listener level statistical counters.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Willy Tarreau
1e1c28873c BUILD: makefile: fix build issue on GNU make < 3.82
Thierry Fournier reported a build breakage with the ubiquitous make
3.81, LDFLAGS were ignored. This is caused by the declaration of the
collect_opt_flags macro that is defined with an "=" sign, something
that only appeared in 3.82 and that is not necessary. With it removed,
the build now works fine at least from 3.80 to 4.3.

No backport is needed since this makefile cleanup appeared in 2.8.
2023-05-24 15:51:03 +02:00
Ilya Shipitsin
97c344dae0 BUILD: quic: re-enable chacha20_poly1305 for libressl
this reverts d2be9d4c48

LibreSSL implements EVP_chacha20_poly1305() with EVP_CIPHER for every
released version starting with 3.6.0
2023-05-23 19:20:36 +02:00