Commit Graph

16712 Commits

Author SHA1 Message Date
Amaury Denoyelle
90873dc678 MINOR: proto_reverse_connect: support source address setting
Support backend configuration for an explicit source address on
pre-connect. These settings can be specified via the "source" backend
keyword or directly on the server line.

Previously, all source parameters triggered a BUG_ON() when binding a
reverse connect listener. This was done because some settings are
incompatible with the reverse connect context: this is the case for all
source settings which do not specify a fixed address but instead rely on
a frontend connection. Indeed, in the preconnect case, the connection is
initiated on its own, without any previous frontend connection.

This patch allows a source parameter with a fixed address to be used. All
other settings (usesrc client/clientip/hdr_ip) are rejected on listener
binding. On connection init, alloc_bind_address() is used to set the
optional source address.
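
As an illustration, here is a minimal, hedged sketch of a fixed source
address on the backend used for pre-connect (addresses and names are
hypothetical):

    backend be_preconnect
        # a fixed source address is accepted; the usesrc
        # client/clientip/hdr_ip variants are rejected at binding time
        source 192.0.2.10
        server remote 203.0.113.5:443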
2023-10-03 17:50:36 +02:00
Amaury Denoyelle
bd001ff346 MINOR: backend: refactor specific source address allocation
Refactor the alloc_bind_address() function, which is used to allocate a
sockaddr when a connection to a target server relies on a specific source
address setting.

The main objective of this change is to be able to use this function
outside of the backend module, namely for preconnections using a reverse
server. As such, the function is now exported globally.

For reverse connect, there is no stream instance. As such, the parts of
the function which relied on it were reduced to the minimum. Now, the
stream is only used if a non-static address is configured, which is the
case for usesrc client|clientip|hdr_ip. These options make no sense for
reverse connect, so it should be safe to use the same function.
2023-10-03 17:49:12 +02:00
Amaury Denoyelle
2ac5d9a657 MINOR: quic: handle perm error on bind during runtime
Improve EACCES permission errors encountered when using the QUIC connection
socket at runtime:

* The first occurrence of the error on the process will generate a log
  warning. This should prevent users from unknowingly using a privileged
  port without the required access rights.

* Socket mode will automatically fall back to the listener socket for the
  receiver instance. This requires duplicating the settings from the
  bind_conf to the receiver instance to support configurations with
  multiple addresses on the same bind line.
2023-10-03 16:52:02 +02:00
Amaury Denoyelle
3ef6df7387 MINOR: quic: define quic-socket bind setting
Define a new bind option quic-socket:
  quic-socket [ connection | listener ]

This new setting works in conjunction with the existing global setting
tune.quic.socket-owner and reuses the same semantics.

The purpose of this setting is to allow connection socket usage to be
disabled on individual listener instances. This will notably be useful
to deactivate it when a fatal permission error is encountered on bind()
at runtime.
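
For example, a hedged configuration sketch (certificate path is a
placeholder):

    frontend fe_quic
        # keep using the shared listener socket for this bind line
        # instead of one socket per connection
        bind quic4@:443 ssl crt /etc/haproxy/cert.pem alpn h3 quic-socket listener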
2023-10-03 16:49:26 +02:00
Remi Tricot-Le Breton
b019636cd7 DOC: sample: Add a comment in 'check_operator' to explain why 'vars_check_arg' should ignore the 'err' buffer
This extra comment ensures that we do not try to pass an 'err' argument
to 'vars_check_arg'; otherwise warnings would be raised if an operator
is given an integer directly in the configuration file.
2023-10-03 11:13:10 +02:00
Remi Tricot-Le Breton
6fe57303f7 Revert "MEDIUM: sample: Small fix in function check_operator for eror reporting"
This reverts commit d897d7da87.

The "check_operator" function is used for all the operator converters
such as "and", "or", "add"...
With such a converter that accepts a variable name as well as an
integer, the "vars_check_arg" call is expected to fail when an integer
is provided. Passing an "err" variable has the unwanted side effect of
raising a warning during init for a configuration such as the following:

    http-request set-query "s=%[rand,add(20)]"

which raises the following warning:
    [WARNING]  (33040) : config : parsing [hap.cfg:14] : invalid
    variable name '20'. A variable name must be start by its scope. The
    scope can be 'proc', 'sess', 'txn', 'req', 'res' or 'check'.
2023-10-03 11:13:10 +02:00
William Lallemand
c21ec3b735 BUG/MINOR: proto_reverse_connect: fix FD leak upon connect
new_reverse_conn() creates its own socket with
sock_create_server_socket(). However, the connect is done with
conn->ctrl->connect(), which is tcp_connect_server().

tcp_connect_server() also creates its own socket and sets it in the
struct conn, leaving the previous socket unclosed and leaking on each
attempt.

This patch fixes the issue by letting tcp_connect_server() handle the
socket part, and removes the socket creation from new_reverse_conn().
2023-09-30 00:53:43 +02:00
Amaury Denoyelle
c58fd4d1cc MINOR: tcp_act: remove limitation on protocol for attach-srv
This patch allows to specify "tcp-request session attach-srv" without
requiring that each associated bind lines mandates HTTP/2 usage. If a
non supported protocol is targetted by this rule, conn_install_mux_fe()
is responsible to reject it.

This change is mandatory to be able to mix attach-srv and standard
non-reversable connection on the same bind instances. An ACL can be used
to activate attach-srv only on some conditions.
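
For example, a hedged sketch where only connections from a trusted peer
are reversed (names, addresses and certificate path are hypothetical):

    frontend fe_edge
        bind :8443 ssl crt /etc/haproxy/cert.pem alpn h2,http/1.1
        # reverse only the connections coming from the known peer
        tcp-request session attach-srv be_reverse/edge1 if { src 192.0.2.0/24 }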
2023-09-29 18:11:10 +02:00
Amaury Denoyelle
337c71423f MINOR: connection: define mux flag for reverse support
Add a new MUX flag MX_FL_REVERSABLE. This value is used to indicate that
the MUX instance supports connection reversal. For the moment, only the
HTTP/2 multiplexer is flagged with it.

This makes it possible to dynamically check whether reversal can be
completed during MUX installation. This will allow relaxing the
configuration requirements for 'tcp-request session attach-srv', which
currently cannot be mixed with non-HTTP/2 listener instances, even if
used conditionally with an ACL.
2023-09-29 18:09:08 +02:00
Amaury Denoyelle
ac1164de7c MINOR: connection: define error for reverse connect
Define a new connection error code CO_ER_REVERSE. This will be used
to report an issue which happens on a connection targeted for reversal
before the reverse process is completed.
2023-09-29 18:08:26 +02:00
Amaury Denoyelle
753fe2b9ac BUG/MINOR: tcp_act: fix attach-srv rule ACL parsing
Fix the parser for the tcp-request session attach-srv rule. Before this
commit, it was impossible to use an anonymous ACL with it. This was
because support for the optional name argument was badly implemented.

No need to backport this.
2023-09-29 18:07:52 +02:00
Amaury Denoyelle
6118590e95 BUG/MINOR: proto_reverse_connect: fix FD leak on connection error
Listener using "rev@" address is responsible to setup connection and
reverse it using a server instance. If an error occured before reversal
is completed, proper freeing must be taken care of by the listener as no
session exists for this.

Currently, there is two locations where a connection is freed on error
before reversal inside reverse_connect protocol. Both of these were
incomplete as several function must be used to ensure connection is
properly freed. This commit fixes this by reusing the same cleaning
mechanism used inside H2 multiplexer.

One of the biggest drawback before this patch was that connection FD was
not properly removed from fdtab which caused a file-descriptor leak.

No need to backport this.
2023-09-29 18:02:36 +02:00
Willy Tarreau
b3dcd59f8d MINOR: stream: fix output alignment of stuck thread dumps
Since commit c185bc465 ("MEDIUM: stream: now provide full stream dumps
in case of loops"), the stuck threads show the stream's pointer in the
margin since it appears immediately after a line feed. Let's add it after
the prefix and "stream=" to make the output more readable.
2023-09-29 16:43:07 +02:00
Emeric Brun
3c250cb847 Revert "BUG/MEDIUM: quic: missing check of dcid for init pkt including a token"
This reverts commit 072e774939.

When doing h2load tests with h3, we noticed this behavior:

Client ---- INIT no token SCID = a , DCID = A ---> Server (1)
Client <--- RETRY+TOKEN DCID = a, SCID = B    ---- Server (2)
Client ---- INIT+TOKEN SCID = a , DCID = B    ---> Server (3)
Client <--- INIT DCID = a, SCID = C           ---- Server (4)
Client ---- INIT+TOKEN SCID = a, DCID = C     ---> Server (5)

With (5) dropped by haproxy due to token validation.

Indeed, the previous patch added the SCID of the retry packet to the AAD
used for token ciphering. It was useful to validate that the next INIT
packets including the token are sent by the client using the newly
provided SCID as their DCID, as mentioned in RFC 9000.
But this stateless information is lost on received INIT packets
following the first outgoing INIT packet from the server, because
the client is also supposed to re-use a second time the latest
received SCID for its new DCID. This will break the token validation
on those last packets and they will be dropped by haproxy.

It was discussed there:
https://mailarchive.ietf.org/arch/msg/quic/7kXVvzhNCpgPk6FwtyPuIC6tRk0/

To sum up: it is not the server's role to verify the re-use of the
retry's SCID as DCID in further client INIT packets.

The previous patch must be reverted in all versions where it was
backported (presumably down to 2.6).
2023-09-29 09:27:22 +02:00
Willy Tarreau
d956db6638 CLEANUP: stream: remove the now unused stream_dump() function
It was superseded by strm_dump_to_buffer() which provides much more
complete information and supports anonymizing.
2023-09-29 09:20:27 +02:00
Willy Tarreau
feff6296a1 MINOR: debug: use the more detailed stream dump in panics
Similarly upon a panic we'd like to have a more detailed dump of a
stream's state, so let's use the full dump function for this now.
2023-09-29 09:20:27 +02:00
Willy Tarreau
c185bc4656 MEDIUM: stream: now provide full stream dumps in case of loops
When a stream is caught looping, we produce some output to help figure out
its internal state and explain why it's looping. The problem is that this
debug output is quite old and the info it provides is quite insufficient
for debugging a modern process, and since such bugs happen only once or twice
a year the situation doesn't improve.

On the other hand the output of "show sess all" is extremely detailed
and kept up to date with code evolutions since it's a heavily used
debugging tool.

This commit replaces the call to the totally outdated stream_dump() with
a call to strm_dump_to_buffer(), and removes the filters dump since they
are already emitted there, and it now produces much more exploitable
output:

  [ALERT]    (5936) : A bogus STREAM [0x7fa8dc02f660] is spinning at 5653514 calls per second and refuses to die, aborting now! Please report this error to developers:
  0x7fa8dc02f660: [28/Sep/2023:09:53:08.811818] id=2 proto=tcpv4 source=127.0.0.1:58306
     flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x133f220, pend_pos=(nil) waiting=0 epoch=0x1
     frontend=public (id=2 mode=http), listener=? (id=1) addr=127.0.0.1:4080
     backend=public (id=2 mode=http) addr=127.0.0.1:61932
     server=s1 (id=1) addr=127.0.0.1:7443
     task=0x7fa8dc02fa40 (state=0x01 nice=0 calls=5749559 rate=5653514 exp=3s tid=1(1/1) age=1s)
     txn=0x7fa8dc02fbf0 flags=0x3000 meth=1 status=-1 req.st=MSG_DONE rsp.st=MSG_RPBEFORE req.f=0x4c rsp.f=0x00
     scf=0x7fa8dc02f5f0 flags=0x00000482 state=EST endp=CONN,0x7fa8dc02b4b0,0x05004001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc02b4b0 h1s.flg=0x100010 .sd.flg=0x5004001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05004001 .sc.flg=0x00000482 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02f608(ev=1 tl=0x7fa8dc02fae0 tl.calls=0 tl.ctx=0x7fa8dc02f5f0 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc0272d0 h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc0273f0 .exp=<NEVER>
         co0=0x7fa8dc027040 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x12840c0
         flags=0x00000300 fd=32 fd.state=20 updt=0 fd.tmask=0x2
     scb=0x7fa8dc02fb30 flags=0x00001411 state=EST endp=CONN,0x7fa8dc0300c0,0x05000001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc0300c0 h1s.flg=0x4010 .sd.flg=0x5000001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05000001 .sc.flg=0x00001411 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02fb48(ev=1 tl=0x7fa8dc02feb0 tl.calls=2 tl.ctx=0x7fa8dc02fb30 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc02ff00 h1c.flg=0x80000000 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc030020 .exp=<NEVER>
         co1=0x7fa8dc02fcd0 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x133f220
         flags=0x10000300 fd=33 fd.state=10421 updt=0 fd.tmask=0x2
     req=0x7fa8dc02f680 (f=0x1840000 an=0x8000 pipe=0 tofwd=0 total=79)
         an_exp=<NEVER> buf=0x7fa8dc02f688 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
     res=0x7fa8dc02f6d0 (f=0x80000000 an=0x1400000 pipe=0 tofwd=0 total=0)
         an_exp=<NEVER> buf=0x7fa8dc02f6d8 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
    call trace(10):
    |       0x59f2b7 [0f 0b 0f 1f 80 00 00 00]: stream_dump_and_crash+0x1f7/0x2bf
    |       0x5a0d71 [e9 af e6 ff ff ba 40 00]: process_stream+0x19f1/0x3a56
    |       0x68d7bb [49 89 c7 4d 85 ff 74 77]: run_tasks_from_lists+0x3ab/0x924
    |       0x68e0b4 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x374/0x6d6
    |       0x656f67 [83 3d f2 75 84 00 01 0f]: run_poll_loop+0x127/0x5a8
    |       0x6575d7 [48 8b 1d 42 50 5c 00 48]: main+0x1b22f7
    | 0x7fa8e0f35e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fa8e0e5a4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Note that the output is subject to the global anon key so that IPs and
object names can be anonymized if required. It could make sense to
backport this and the few related previous patches next time such an
issue is reported.
2023-09-29 09:20:27 +02:00
Willy Tarreau
b206504f43 MINOR: streams: add support for line prefixes to strm_dump_to_buffer()
Now the function can prepend every new line with a caller-fed prefix
that will later be used for indenting. The caller has to feed the
prefix for the first line itself though, allowing the first line to
possibly be appended at the end of an existing one.
2023-09-29 09:20:27 +02:00
Willy Tarreau
5743eeea88 MINOR: stream: make stream_dump() always multi-line
There used to be two working modes for this function, a single-line one
and a multi-line one, the difference being made on the "eol" argument
which could contain either a space or an LF (and with the prefix being
adjusted accordingly). Let's get rid of the single-line mode as it's
what limits the output contents because it's difficult to produce
exploitable structured data this way. It was only used in the rare case
of spinning streams and applets and these are the ones lacking info. Now
a spinning stream produces:

[ALERT]    (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers:
  strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1
  txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0
  rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000
  scf=0x24af280,EST,482 scb=0x24af430,EST,1411
  af=(nil),0 sab=(nil),0
  cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33)
  cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32)
  filters={}
  call trace(11):
  (...)
2023-09-29 09:20:27 +02:00
Willy Tarreau
5ddeba7af3 MINOR: stream: make strm_dump_to_buffer() show the list of filters
That's one of the rare pieces of information that was not present in
the full dump and only in the short one: the list of filters the stream
is subscribed to (the current filter, however, was present and more
detailed).
2023-09-29 09:20:27 +02:00
Willy Tarreau
3e630a9871 MINOR: stream: make strm_dump_to_buffer() take an arbitrary buffer
We won't always want to dump into the trash, so let's make the function
accept an arbitrary buffer.
2023-09-29 09:20:27 +02:00
Willy Tarreau
6bc07103f8 CLEANUP: stream: make strm_dump_to_buffer() take a const stream
Now that we don't need a variable anymore, let's pass a const stream.
It will void any doubt about what can happen to the stream when the
function is called from inspection points (show sess etc).
2023-09-29 09:20:27 +02:00
Willy Tarreau
1a01ee4740 CLEANUP: stream: use const filters in the dump function
The strm_dump_to_buffer() function requires a variable stream only
for a few functions in it that do not take a const. strm_flt() is
one of them (and for good reasons since most call places want to
update filters). Here we know we won't modify the filter nor the
stream so let's directly access the strm_flt in the stream and assign
it to a const filter. This will also catch any future accidental change.
2023-09-29 09:20:27 +02:00
Willy Tarreau
77ecb3146a MINOR: stream: split stats_dump_full_strm_to_buffer() in two
The function only works with the CLI's appctx and does most of the
convenient work of dumping a stream into a buffer (well, the trash
buffer for now). Let's split it in two so that most of the work is
done in a generic function and that the CLI-specific function relies
on that one.

The diff looks huge due to the changed indent caused by the extraction
of the switch/case statement, but when looked at using diff -b it's
small.
2023-09-29 09:20:27 +02:00
Willy Tarreau
6c2af048d6 CLEANUP: stream: make the dump code not depend on the CLI appctx
The HA_ANON_CLI() helper relies on the CLI appctx and prevents the code
from being made more generic. Let's extract the CLI's anon key separately
and pass it via HA_ANON_STR() instead.
2023-09-29 09:20:27 +02:00
Amaury Denoyelle
7cf9cf705e BUG/MINOR: mux-quic: remove full demux flag on ncbuf release
When the rcv_buf stream callback is invoked, the mux tasklet is woken up if
demux was previously blocked due to lack of buffer space. A BUG_ON() is
present to ensure there is data in the qcs Rx buffer. If this is not the
case, the wakeup is unneeded:

  BUG_ON(!ncb_data(&qcs->rx.ncbuf, 0));

This BUG_ON() may be triggered if RESET_STREAM is received after demux
has been blocked. On reset, the Rx buffer is purged according to RFC 9000,
which allows any data not yet consumed to be discarded. This will trigger
the BUG_ON() assertion if the rcv_buf stream callback is invoked after this.

To prevent the BUG_ON() crash, just clear the demux block flag each time
the Rx buffer is purged. This accordingly covers RESET_STREAM reception.

This should be backported up to 2.7.

This may fix github issue #2293.

This bug relies on several preconditions, so its occurrence is rare. It
was reproduced by using a custom client which posts enough data to fill
the buffer. It then emits a RESET_STREAM in place of a proper FIN.
Moreover, the mux code has been edited to artificially stall stream reads
to force demux blocking.

h3_data_to_htx:
-       return htx_sent;
+       return 1;

qcc_recv_reset_stream:
        qcs_free_ncbuf(qcs, &qcs->rx.ncbuf);
+       qcs_notify_recv(qcs);

qmux_strm_rcv_buf:
        char fin = 0;
+       static int i = 0;
+       if (++i < 2)
+               return 0;
        TRACE_ENTER(QMUX_EV_STRM_RECV, qcc->conn, qcs);
2023-09-28 11:44:53 +02:00
Vladimir Vdovin
f8b81f6eb7 MINOR: support for http-request set-timeout client
Added set-timeout for the frontend side of the session, so it can be used
to set custom per-client timeouts if needed. Also added the
cur_client_timeout sample fetch to retrieve the client timeout.
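
For example, a hedged configuration sketch:

    frontend fe_main
        bind :8080
        timeout client 30s
        # shorten the client timeout for a specific path
        http-request set-timeout client 5s if { path_beg /health }
        # expose the effective value in the logs
        log-format "%ci:%cp timeout=%[cur_client_timeout]"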
2023-09-28 08:49:22 +02:00
Amaury Denoyelle
b9bb3b932c MINOR: proto_reverse_connect: emit log for preconnect
Add reporting using send_log() for the preconnect operation. This is
minimal, to ensure we understand the current status of a listener in
active reverse connect.

To limit the logging quantity, only important transitions are considered.
This requires implementing a minimal state machine as a new field in the
receiver structure.

Here are the logs produced:
* Initiating: the first time preconnect is enabled on a listener
* Error: the last preconnect attempt was interrupted by a connection error
* Reaching maxconn: all necessary connections were reversed and are
  operational on a listener
2023-09-22 17:21:53 +02:00
Amaury Denoyelle
069ca55e70 MINOR: proto_reverse_connect: remove unneeded wakeup
No need to use task_wakeup() in rev_bind_listener() to bootstrap
preconnect. A similar call is done in rev_enable_listener(), which serves
both for bootstrap and also later to reinitiate attempts to maintain
maxconn if connections are freed.
2023-09-22 17:06:18 +02:00
Amaury Denoyelle
1f43fb71be MINOR: proto_reverse_connect: refactor preconnect failure
When a connection is freed during preconnect before reversal, the error
must be notified to the listener to remove any connection reference and
rearm a new preconnect attempt. Currently, this can occur through 2 code
paths:
* conn_free() called directly by the H2 mux
* an error during conn_create_mux(). In this case, the connection is
  flagged with CO_FL_ERROR and the reverse_connect task is woken up. The
  process task handler is then responsible for calling conn_free() for
  such a connection.

Duplicated steps were done both in conn_free() and the process task
handler. These are now removed. To facilitate code maintenance, the
dedicated operations have been centralized in a new function
rev_notify_preconn_err() which is called by conn_free().
2023-09-22 16:43:36 +02:00
Amaury Denoyelle
a37abee266 BUG/MINOR: proto_reverse_connect: set default maxconn
If maxconn is not set for preconnect, we assume a single connection should
be established. However, this does not work properly in case the
connection is closed after reversal: the listener is not resumed by the
protocol layer to attempt a new preconnect.

To fix this, explicitly set maxconn to 1 in the listener instance if
none is defined. This ensures the behavior is consistent. A BUG_ON() has
been added to validate that we never try to use a listener with a maxconn
of 0.
2023-09-22 16:40:58 +02:00
Emeric Brun
27b2fd2e06 MINOR: quic: handle external extra CIDs generator.
This patch adds the ability to externalize and customize the code
computing the extra CIDs after the first one was derived from the ODCID.

This is to prepare interoperability with extra components such as
different QUIC proxies or routers, for instance.

To do so, the patch defines two function callbacks:
- the first one computes a 64-bit hash from the first generated CID
  (which itself continues to be derived from the ODCID). The resulting
  hash is stored into the 'quic_conn'; 64 bits is chosen as large enough
  to store an entire haproxy CID.
- the second callback re-uses the previously computed hash to derive
  an extra CID using the custom algorithm. If not set, haproxy will
  continue to choose a randomized CID value.

Both functions also receive the 'cluster_secret' as an argument:
this way, it is usable for obfuscation or ciphering.
2023-09-22 10:32:14 +02:00
Lokesh Jindal
d897d7da87 MEDIUM: sample: Small fix in function check_operator for eror reporting
When function "check_operator" calls function "vars_check_arg" to decode
a variable, it passes in NULL value for pointer to the char array meant
for capturing the error message.  This commit replaces NULL with the
pointer to the real char array.  This should help in correct error
reporting.
2023-09-22 08:48:53 +02:00
Lokesh Jindal
915e48675a MEDIUM: sample: Enhances converter "bytes" to take variable names as arguments
Prior to this commit, converter "bytes" takes only integer values as
arguments.  After this commit, it can take variable names as inputs.
This allows us to dynamically determine the offset/length and capture
them in variables.  These variables can then be used with the converter.
Example use case: parsing a token present in a request header.
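
For example, a hedged sketch that extracts a fixed-size token from a
request header using offsets captured in variables (header name, offsets
and variable names are hypothetical):

    http-request set-var(txn.tok_off) int(7)
    http-request set-var(txn.tok_len) int(16)
    http-request set-header X-Token %[req.hdr(Authorization),bytes(txn.tok_off,txn.tok_len)]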
2023-09-22 08:48:51 +02:00
Amaury Denoyelle
d3db96f11a MINOR: proto_reverse_connect: prevent transparent server for pre-connect
Prevent using transparent servers for pre-connect on startup by emitting
a fatal error. This is used to ensure we never try to connect to a
target with an unspecified destination address or port.
2023-09-21 16:58:08 +02:00
Amaury Denoyelle
9b6812d781 BUG/MINOR: proto_reverse_connect: fix preconnect with startup name resolution
The addr member of the server structure is not set consistently depending
on the server address type. When using <IP:PORT> notation, its port is
properly set. However, when using <HOSTNAME:PORT>, only the IP address is
set after startup name resolution and the port is left at 0.

This behavior causes preconnect to not be functional when using a server
with a hostname for startup name resolution. Indeed, only srv.addr is used
as the connect argument through new_reverse_conn(). To fix this, rely on
srv.svc_port: this member is always set for servers using an IP or a
hostname. This is similar to what connect_server() does on the backend side.

This does not need to be backported.
2023-09-21 16:57:30 +02:00
Sébastien Gross
6a9ba85322 MINOR: hlua: Add support for the "http-after-res" action
This commit introduces support for the "http-after-res" action in
hlua, enabling the invocation of a Lua function in a
"http-after-response" rule. With this enhancement, a Lua action can be
registered using the "http-after-res" action type:

    core.register_action('myaction', {'http-after-res'}, myaction)

A new "lua.myaction" is created and can be invoked in a
"http-after-response" rule:

    http-after-response lua.myaction

This addition provides greater flexibility and extensibility in
handling post-response actions using Lua.

This commit depends on:
 - 4457783 ("MINOR: http_ana: position the FINAL flag for http_after_res execution")

Signed-off-by: Sébastien Gross <sgross@haproxy.com>
2023-09-21 16:31:20 +02:00
Aurelien DARRAGON
95c4d24825 BUG/MEDIUM: server/cli: don't delete a dynamic server that has streams
In cli_parse_delete_server(), we take care of checking that the server is
in MAINT and that the cur_sess counter is set to 0, in the hope that no
connection/stream resources continue to point to the server; otherwise we
refuse to delete it.

As shown in GH #2298, this is not sufficient.

Indeed, when the server option "on-marked-down shutdown-sessions" is not
used, server streams are not purged when the server enters maintenance mode.

As such, there could be remaining streams that point to the server. To
detect this, a secondary check on the srv->cur_sess counter was performed
in cli_parse_delete_server(). Unfortunately, there are some code paths
that could lead to cur_sess being decremented without the stream actually
being shut down. As such, if the delete_server CLI command is handled
right after cur_sess has been decremented with streams still pointing to
the server, we could face some nasty bugs where stream->srv_conn could
point to a garbage memory area, as described in the original github report.

To make the check more reliable prior to deleting the server, we don't
rely exclusively on cur_sess and directly check that the server is not
used in any stream through the srv_has_stream() helper function.

Thanks to @capflam who found the root cause of the bug and greatly
helped to provide the fix.

This should be backported up to 2.6.
2023-09-21 14:57:01 +02:00
Aurelien DARRAGON
0189a4679e MINOR: pattern/ip: simplify pat_match_ip() function
pat_match_ip() has been updated several times over the last decade to
introduce new features, but it was never cleaned up.

The result is that the function is pretty hard to read, and there are
multiple duplicated code blocks, so it becomes error-prone to maintain,
plus it bloats the haproxy binary for nothing.

In this patch, we move the tree search (ip4 / ip6) logic into 2
dedicated helper functions. This allows us to refactor pat_match_ip()
without touching the original behavior.
2023-09-21 09:50:56 +02:00
Aurelien DARRAGON
f80122db26 MINOR: pattern/ip: offload ip conversion logic to helper functions
Now that v4tov6() and v6tov4() were reworked to match behavior from
pat_match_ip() function in ("MINOR: tools/ip: v4tov6() and v6tov4()
rework"), we can remove code duplication in pat_match_ip() by directly
using those dedicated functions where relevant.
2023-09-21 09:50:55 +02:00
Aurelien DARRAGON
72514a4467 MEDIUM: tools/ip: v4tov6() and v6tov4() rework
The v4tov6() and v6tov4() helper functions were initially implemented in
4f92d3200 ("[MEDIUM] IPv6 support for stick-tables").

However, since ceb4ac9c3 ("MEDIUM: acl: support IPv6 address matching"),
support for legacy ip6 to ip4 conversion formats was added, with the
parsing logic directly performed in acl_match_ip (which later became
pat_match_ip).

The issue is that the original v6tov4() function which is used for sample
expression handling lacks those additional formats, so we could face
inconsistencies depending on whether we rely on ip4/ip6 conversions from
an acl context or an expression context.

To unify ip4/ip6 automatic mapping behavior, we reworked the v4tov6 and
v6tov4 functions so that they now behave like pat_match_ip().

Note: the '6to4 (RFC3056)' and 'RFC4291 ipv4 compatible address' formats
are still supported for legacy purposes despite being deprecated for a
while now.
2023-09-21 09:50:55 +02:00
Christopher Faulet
d3e379b3ce BUG/MEDIUM: http-ana: Try to handle response before handling server abort
In the request analyser responsible for forwarding the request, we try to
detect the server abort to stop the request forwarding. However, we must
be careful not to block the response processing, if any. Indeed, it is
possible to get the response and the server abort at the same time. In
this case, we must try to forward the response to the client first.

So to fix the issue, in the request analyser we no longer handle the server
abort if the response channel is not empty. In the end, the response
analyser is able to detect the server abort if it is relevant. Otherwise,
the stream will be woken up after the response forwarding and the server
abort should be handled at this stage.

This patch should be backported as far as 2.7 only because the risk of
breakage is high. And it is probably a good idea to wait a bit before
backporting it.
2023-09-21 09:36:37 +02:00
Willy Tarreau
cbbee15462 CLEANUP: ring: rename the ring lock "RING_LOCK" instead of "LOGSRV_LOCK"
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
2023-09-20 21:38:33 +02:00
Willy Tarreau
cec8b42cb3 MEDIUM: logs: atomically check and update the log sample index
The log server lock is pretty visible in perf top when using log samples
because it's taken for each server in turn while trying to validate and
update the log server's index. Let's change this to a CAS, since we have
the index and the range at hand now. This allows us to remove the logsrv
lock.

The test on 4 servers now shows a 3.7 times improvement thanks to much
lower contention. Without log sampling a test producing 4.4M logs/s
delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel.
After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would
previously drop to 1.13M log/s with 37 CPUs used and 75% spent in
process_send_log(). Now with this change, 4.25M logs/s are emitted,
using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput
improvement for a 30% global CPU usage reduction, but in practice it
mostly shows that the performance drop caused by having samples is much
less noticeable (each of the 4 servers has its index updated for each
log).

Note that in order to even avoid incrementing an index for each log srv
that is consulted, it would be more convenient to have a single index
per frontend and apply the modulus on each log server in turn to see if
the range has to be updated. It would then only perform one write per
range switch. However the place where this is done doesn't have access
to a frontend, so some changes would need to be performed for this, and
it would require to update the current range independently in each
logsrv, which is not necessarily easier since we don't know yet if we
can commit it.
2023-09-20 21:38:33 +02:00
Willy Tarreau
e00470378b MINOR: logs: use a single index to store the current range and index
By using a single long long to store both the current range and the
next index, we'll make it possible to perform atomic operations instead
of locking. Let's only regroup them for now under a new "curr_rg_idx".
The upper word is the range, the lower is the index.
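
As an illustration of the principle (a minimal generic sketch, not the
haproxy code), packing both values into one 64-bit word allows them to be
read and updated atomically:

    #include <stdatomic.h>
    #include <stdint.h>

    /* upper 32 bits: current range; lower 32 bits: next index */
    static _Atomic uint64_t curr_rg_idx;

    static void bump_index(uint32_t period)
    {
        uint64_t old = atomic_load(&curr_rg_idx);
        uint64_t new;

        do {
            uint32_t rg  = (uint32_t)(old >> 32);
            uint32_t idx = (uint32_t)old;

            idx = (idx + 1) % period;            /* wrap around the period */
            new = ((uint64_t)rg << 32) | idx;
        } while (!atomic_compare_exchange_weak(&curr_rg_idx, &old, new));
    }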
2023-09-20 21:38:33 +02:00
Willy Tarreau
49ddc0138c CLEANUP: logs: rename a confusing local variable "curr_rg" to "smp_rg"
The variable curr_rg in process_send_log() is misleading because it is
not related to the integer curr_rg that's used to calculate it; instead,
it's a pointer to the current smp_log_range from smp_rgs[], so let's call
it "smp_rg", as a singular of "smp_rgs", and put an end to this
confusion.
2023-09-20 21:38:33 +02:00
Willy Tarreau
3f1284560f MINOR: log: remove the unused curr_idx in struct smp_log_range
This index is useless because it only serves to know when the global
index reached the end, while the global one already knows it. Let's
just drop it and perform the test on the global range.

It was verified with the following config that the first server continues
to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the
4th one 4/10:

    log 127.0.0.1:10001 sample 1:10 local0
    log 127.0.0.1:10002 sample 2,5:10 local0
    log 127.0.0.1:10003 sample 3,7,9:10 local0
    log 127.0.0.1:10004 sample 4,6,8,10:10 local0
2023-09-20 21:38:33 +02:00
Willy Tarreau
4351364700 MINOR: logs: clarify the check of the log range
The test of the log range is not very clear, in part due to the
reuse of the "curr_idx" name that happens at two levels. The call
to in_smp_log_range() applies to the smp_info's index to which 1 is
added: it verifies that the next index is still within the current
range.

Let's just have a local variable "next_index" in process_send_log()
that gets assigned the next index (current+1) and compare it to the
current range's boundaries. This makes the test much clearer. We can
then simply remove in_smp_log_range() that's no longer needed.
2023-09-20 21:38:33 +02:00
Aurelien DARRAGON
2c9bd3ae80 BUG/MINOR: server: add missing free for server->rdr_pfx
rdr_pfx was not being freed during server cleanup, leading to a small
memory leak when the "redir" argument was used on a server line (HTTP only).

This should be backported to all stable versions.

[For 2.6 and 2.7: the free should be performed in srv_drop() directly.
 For older versions: free in deinit() function near the free for the
 cookie string]
2023-09-15 17:46:49 +02:00
Willy Tarreau
6cbb5a057b Revert "MAJOR: import: update mt_list to support exponential back-off"
This reverts commit c618ed5ff4.

The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However, while doing this in the quic_sock code seems to work, doing it
inside the macro shows total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial;
let's roll this back for the time it will take to fix the loop.
2023-09-15 17:13:43 +02:00
William Lallemand
694889ac2d BUILD: quic: fix build on centos 8 and USE_QUIC_OPENSSL_COMPAT
When using USE_QUIC_OPENSSL_COMPAT=1 on centos-8, the build fails this
way:

In file included from src/quic_openssl_compat.c:11:
/usr/include/openssl/kdf.h:33:46: error: unknown type name 'va_list'
 int EVP_KDF_vctrl(EVP_KDF_CTX *ctx, int cmd, va_list args);

This is because openssl/kdf.h is included before openssl-compat.h.
2023-09-14 16:26:58 +02:00
Christopher Faulet
89e20033c7 BUG/MAJOR: mux-h2: Report a protocol error for any DATA frame before headers
If any DATA frame is received before all headers are fully received, a
protocol error must be reported. It is required by the HTTP/2 RFC, but it
is also important because the HTTP analyzers expect the first HTX block to
be a start-line. It leads to a crash if this requirement is not respected.

For instance, it is possible to trigger a crash by sending an interim
message with a DATA frame (it may be an empty DATA frame with the ES
flag). AFAIK, only the server side is affected by this bug.

To fix the issue, a protocol error is reported for the stream.

This patch should fix the issue #2291. It must be backported as far as 2.2
(and probably to 2.0 too).
2023-09-14 11:39:39 +02:00
Frédéric Lécaille
3921bf80c7 BUG/MINOR: quic: Leak of frames to send.
In very rare cases, it is possible that packets are detected as lost and
their frames requeued, then the connection is released for any reason (to
be killed because of a fatal sendto() failure, for instance). Such frames
are lost and never released because the function which releases their
packet number spaces does not release the frames still enqueued to be sent.

Must be backported as far as 2.6.
2023-09-13 15:32:14 +02:00
William Lallemand
c7424a1bac MINOR: samples: implement bytes_in and bytes_out samples
%[bytes_in] and %[bytes_out] are equivalent to the %U and %B tags in
log-format.
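
For example, a hedged log-format sketch:

    # same counters as the legacy %U (bytes received) and %B (bytes sent) tags
    log-format "%ci:%cp in=%[bytes_in] out=%[bytes_out]"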
2023-09-13 14:54:50 +02:00
Willy Tarreau
5abbae2d3d CLEANUP: pools: simplify the pool expression when no pool was matched in dump
When dumping pool information, we make a special case of the condition
where the pool couldn't be identified and we consider that it was the
correct one. In the code arrangements brought by commit efc46dede ("DEBUG:
pools: inspect pools on fatal error and dump information found"), a
ternary expression for testing this depends on the "if" block condition
so this can be simplified and will make Coverity happy. This was reported
in GH #2290.
2023-09-13 13:31:41 +02:00
Willy Tarreau
c618ed5ff4 MAJOR: import: update mt_list to support exponential back-off
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (part of the internal API) and was ifdef'd out in mt_list.h.

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:

  - before: 35 Gbps, 3.5 Mpps, 7800% CPU
  - after:  41 Gbps, 4.2 Mpps, 2900% CPU
2023-09-13 11:50:33 +02:00
Christopher Faulet
13fb7170be BUG/MEDIUM: master/cli: Pin the master CLI on the first thread of the group 1
There is no reason to start the master CLI on several threads and on
several groups. And in fact, it must not be done, otherwise the same FD is
inserted several times in the fdtab, leading to a crash during startup
because of a BUG_ON(). It happens when several groups are configured.

To fix the bug, the master CLI is now pinned to the first thread of the
first group.

This patch should fix the issue #2259 and must be backported to 2.8.
2023-09-13 10:26:32 +02:00
Christopher Faulet
665703d456 BUG/MEDIUM: mux-fcgi: Don't swap trash and dbuf when handling STDERR records
Trash chunks are buffers but are not allocated from the buffers pool. And
the "trash" chunk is static and thread-local. These are two reasons not to
swap it with a regular buffer allocated from the buffers pool.

Unfortunately, this is exactly what is performed in the FCGI mux when a
STDERR record is handled. b_xfer() is used to copy data from the demux
buffer to the trash to format the error message. A zero-copy via a swap
may be performed. In this case, this leads to memory corruption and a
crash because, some time later, the demux buffer is released because it is
empty, and it is in fact the trash chunk.
b_force_xfer() must be used instead. This function forces the copy.

This patch must be backported as far as 2.2. For 2.4 and 2.2, b_force_xfer()
does not exist. For these versions, the following commit must be backported
too:

  * c7860007cc ("MINOR: buf: Add b_force_xfer() function")
2023-09-12 19:50:17 +02:00
Aurelien DARRAGON
1115fc348e BUG/MINOR: hlua/init: coroutine may not resume itself
It's not supported to call lua_resume with <L> and <from> designating
the same lua coroutine. It didn't cause visible bugs so far because
Lua 5.3 used to be more permissive about this, and moreover, yielding
is not involved during the hlua init state.

But this is wrong usage, and the doc clearly specifies that the <from>
argument can be NULL when there is no such coroutine, which is the case
here.

This should be backported to every stable version.
2023-09-12 19:50:17 +02:00
Aurelien DARRAGON
e7281f3f5d BUG/MEDIUM: hlua: don't pass stale nargs argument to lua_resume()
In hlua_ctx_resume(), we call lua_resume() function like this:

  lua_resume(lua->T, hlua_states[lua->state_id], lua->nargs)

Once the call returns, we may call the function again with the same
hlua context when E_YIELD is returned (the execution was interrupted
and may be resumed through another lua_resume() call).

The 3rd argument to lua_resume(), 'nargs', is a hint passed to Lua to
know how many (optional) arguments were pushed on the stack prior to
resuming the execution (arguments that Lua will then expose to the Lua
script).

But here is the catch: we never reset lua->nargs between successive
lua_resume() calls, meaning that next lua_resume() calls will still
inherit from the initial nargs value that was set in hlua ctx prior
to calling hlua_ctx_resume() (our wrapper function) for the first time.

This is problematic, because despite not being explicitly mentioned in
the Lua documentation, passed arguments (to which `nargs` refer to), are
already consumed once lua_resume() returns.

This means that we cannot keep calling lua_resume() with non-zero nargs
if we don't push new arguments on the stack prior to resuming lua after
the initial call: nargs applies to a single lua_resume() invocation.

Despite improper use of lua_resume() for a long time, this didn't cause
visible issues in the past with Lua 5.3, but it is particularly sensitive
starting with Lua 5.4.3 due to debugging hooks improvements that led to
some internal changes (see: lua/lua@58aa09a). Not using nargs properly
now exposes us to undefined behavior when resuming after a yield triggered
from a debugging hook, which may cause running scripts to crash
unexpectedly: for instance with Lua raising errors and complaining about
values being NULL where it should not be the case.

For reference, this issue was initially raised on the Lua mailing list:
  http://lua-users.org/lists/lua-l/2023-09/msg00005.html

In this patch, we immediately reset nargs when lua_resume() returns to
prevent any misuse.
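A simplified fragment of the idea (not the exact haproxy code):

    /* the arguments pushed on the stack are consumed by this call... */
    ret = lua_resume(lua->T, hlua_states[lua->state_id], lua->nargs);
    /* ...so a later resume of the same coroutine must not claim them again */
    lua->nargs = 0;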

It should be backported to every maintained version.
2023-09-12 19:50:17 +02:00
Willy Tarreau
93c2ea0ec3 MEDIUM: pools: refine pool size rounding
The pool sizes were rounded up a little bit too much with commit
30f931ead ("BUG/MEDIUM: pools: fix the minimum allocation size"). The
goal was in fact to make sure they were always at least large enough to
store 2 list heads, and stuffing this into the alignment calculation
resulted in the size being always rounded up to this size. This is
problematic because it means that the appended tag at the end doesn't
always catch potential overflows since more bytes than needed are
allocated. Moreover, this test was later reinforced by commit b5ba09ed5
("BUG/MEDIUM: pools: ensure items are always large enough for the
pool_cache_item"), proving that the first test was not always sufficient.

This needs to be reworked to proceed correctly:
  - the two lists are needed when the object is in the cache, hence
    when we don't care about the tag, which means that the tag's size,
    if any, can easily cover for the missing bytes to reach that size.
    This is actually what was already being checked for.

  - the rounding should not be performed (beyond the size of a word to
    preserve pointer alignment) when pool tagging is enabled, otherwise
    we don't detect small overflows. It means that there will be less
    merging when proceeding like this. Tests show that we merge 93 pools
    into 36 without tags and 43 with tags enabled.

  - the rounding should not consider the extra size, since it's already
    done when calculating the allocated size later (i.e. don't round up
    twice). The difference is subtle but it's what makes sure the tag
    immediately follows the area instead of starting from the end.

Thanks to this, now when writing one byte too many at the end of a struct
stream, the error is instantly caught.
2023-09-12 18:14:05 +02:00
Willy Tarreau
61575769ac DEBUG: pools: print the contents surrounding the expected tag location
When no tag matches a known pool, we can inspect around to help figure
what could have possibly overwritten memory. The contents are printed
one machine word per line in hex, then using printable characters, and
when they can be resolved to a pointer, either the pool's pointer name
or a resolvable symbol with offset. The goal here is to help recognize
what is easily identifiable in memory.

For example applying the following patch to stream_free():

  -	pool_free(pool_head_stream, s);
  +	pool_free(pool_head_stream, (void*)s+1);

Causes the following dump to be emitted:

  FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
    caller: 0x59e968 (stream_free+0x6d8/0xa0a)
    item: 0x13df5c1
    pool: 0x12782c0 ('stream', size 888, real 904, users 1)
  Tag does not match (0x4f00000000012782). Tag does not match any other pool.
  Contents around address 0x13df5c1+888=0x13df939:
         0x13df918 [00 00 00 00 00 00 00 00] [........]
         0x13df920 [00 00 00 00 00 00 00 00] [........]
         0x13df928 [00 00 00 00 00 00 00 00] [........]
         0x13df930 [00 00 00 00 00 00 00 00] [........]
         0x13df938 [c0 82 27 01 00 00 00 00] [..'.....] [pool:stream]
         0x13df940 [4f c0 59 00 00 00 00 00] [O.Y.....] [stream_new+0x4f/0xbec]
         0x13df948 [49 46 49 43 41 54 45 2d] [IFICATE-]
         0x13df950 [81 02 00 00 00 00 00 00] [........]
         0x13df958 [df 13 00 00 00 00 00 00] [........]
  Other possible callers:
  (...)

We notice that the tag references pool_head_stream with the allocation
point in stream_new. Another benefit is that a caller may be figured
from the tag even if the "caller" feature is not enabled, because upon
a free() we always put the caller's location into the tag. This should
be sufficient to debug most cases that normally require gdb.
2023-09-12 18:14:05 +02:00
Willy Tarreau
0f9a10c7f1 DEBUG: pools: also print the value of the tag when it doesn't match
Sometimes the tag's value may reveal a recognizable pattern, so let's
print it when it doesn't match a known pool.
2023-09-12 18:14:05 +02:00
Willy Tarreau
96c1a24224 DEBUG: pools: also print the item's pointer when crashing
To inspect a core or recognize some values, it's important to have the
item pointer, which was not provided until now.
2023-09-12 18:14:05 +02:00
Willy Tarreau
efc46dede9 DEBUG: pools: inspect pools on fatal error and dump information found
It's a bit frustrating sometimes to see pool checks catch a bug but not
provide exploitable information without a core.

Here we're adding a function "pool_inspect_item()" which is called just
before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and
which will display the error type, the pool's pointer and name, and will
try to check if the item's tag matches the pool, and if not, will iterate
over all pools to see if one would be a better candidate, then will try
to figure the last known caller and possibly other likely candidates if
the pool's tag is not sufficiently trusted. This typically helps better
diagnose corruption in use-after-free scenarios, or freeing to a pool
that differs from the one the object was allocated from, and will also
indicate calling points that may help figure where an object was last
released or allocated. The info is printed on stderr just before the
backtrace.

For example, the recent off-by-one test in the PPv2 changes would have
produced the following output in vtest logs:

  ***  h1    debug|FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
  ***  h1    debug|  caller: 0x62bb87 (conn_free+0x147/0x3c5)
  ***  h1    debug|  pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1)
  ***  h1    debug|Tag does not match. Possible origin pool(s):
  ***  h1    debug|  tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1)
  ***  h1    debug|Recorded caller if pool 'pp_tlv_128':
  ***  h1    debug|  @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24)

A mismatch in the allocated/released pool is already visible, and the
callers confirm it once resolved, where the allocator indeed allocates
from pp_tlv_128 and conn_free() releases to pp_tlv_256:

  $ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d'
  0x000000000062bb87: conn_free at connection.c:568
  0x000000000062c76d: conn_recv_proxy at connection.c:1177
2023-09-11 15:46:14 +02:00
Willy Tarreau
f6bee5a50b DEBUG: pools: make pool_check_pattern() take a pointer to the pool
This will be useful to report detailed bug traces.
2023-09-11 15:19:49 +02:00
Willy Tarreau
e92e96b00f DEBUG: pools: pass the caller pointer to the check functions and macros
In preparation for more detailed pool error reports, let's pass the
caller pointers to the check functions. This will be useful to produce
messages indicating where the issue happened.
2023-09-11 15:19:49 +02:00
Willy Tarreau
baf2070421 DEBUG: pools: always record the caller for uncached allocs as well
When recording the caller of a pool_alloc(), we currently store it only
when the object comes from the cache and never when it comes from the
heap. There's no valid reason for this except that the caller's pointer
was not passed to pool_alloc_nocache(), so it used to set NULL there.
Let's just pass it down the chain.
2023-09-11 15:19:49 +02:00
Frédéric Lécaille
2dedbe76c9 BUG/MINOR: quic: fdtab array underflow access
When using the listener socket as the file descriptor, the qc->fd value is -1.
In this case one must not access the fdtab[qc->fd] element to change its value.
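
A simplified illustration of the kind of guard needed (not the exact
haproxy code; the field update shown is hypothetical):

    /* qc->fd is -1 when the connection uses the shared listener socket,
     * so only touch fdtab[] for dedicated, valid descriptors
     */
    if (qc->fd >= 0)
        fdtab[qc->fd].owner = new_owner;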

This bug could have been detected by asan with such a backtrace:

=================================================================
==402222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa8ecf417ex7fa8e915cf90 sp 0x7fa8e915cf88
WRITE of size 8 at 0x7fa8ecf417e8 thread T6
    #0 0x55707a0bf18a in qc_new_cc_conn src/quic_conn.c:838
    #1 0x55707a0c6dc0 in quic_conn_release src/quic_conn.c:1408
    #2 0x55707a10916f in quic_close src/xprt_quic.c:35
    #3 0x55707a0cec77 in conn_xprt_close include/haproxy/connection.h:153
    #4 0x55707a0ceed0 in conn_full_close include/haproxy/connection.h:197
    #5 0x55707a0ec253 in qcc_release src/mux_quic.c:2412
    #6 0x55707a0ec7d0 in qcc_io_cb src/mux_quic.c:2443
    #7 0x55707a63ff2a in run_tasks_from_lists src/task.c:596
    #8 0x55707a641cc9 in process_runnable_tasks src/task.c:876
    #9 0x55707a56f7b2 in run_poll_loop src/haproxy.c:2954
    #10 0x55707a5705fd in run_thread_poll_loop src/haproxy.c:3153
    #11 0x7fa8f9450ea6 in start_thread nptl/pthread_create.c:477
    #12 0x7fa8f936ea2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x7fa8ecf417e8 is located 24 bytes to the left of 134217728-byte region [0x7fa8e
allocated by thread T0 here:
    #0 0x7fa8f9a37037 in __interceptor_calloc ../../../../src/libsanitizer/asan/
    #1 0x55707a71a61d in init_pollers src/fd.c:1161
    #2 0x55707a56cdf1 in init src/haproxy.c:2672
    #3 0x55707a5714c2 in main src/haproxy.c:3298
    #4 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

Thread T6 created by T0 here:
    #0 0x7fa8f99e22a2 in __interceptor_pthread_create ../../../../src/libsanitizpp:214
    #1 0x55707a748a21 in setup_extra_threads src/thread.c:252
    #2 0x55707a5735c9 in main src/haproxy.c:3844
    #3 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow src/quic_conn.c:838 in qc_new_cc
Shadow bytes around the buggy address:
  0x0ff59d9e02a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0ff59d9e02f0: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
  0x0ff59d9e0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==402222==ABORTING
Aborted

Thank you to @Tristan971 for having reported this bug in GH #2247.

No need to backport.
2023-09-11 15:14:22 +02:00
Willy Tarreau
4a18d9e560 REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
These ones were still in cfgparse.c but they're not specific to the
config at all and may actually be used even when parsing cpu list
entries in /sys. Better move them where they can be reused.
2023-09-08 16:25:19 +02:00
Willy Tarreau
5119109e3f MINOR: cpuset: dynamically allocate cpu_map
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.

Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
2023-09-08 16:25:19 +02:00
Willy Tarreau
b0f20ed79b MEDIUM: cfgparse: assign NUMA affinity to cpu-maps
Do not force affinity on the process, instead let's just apply it to
cpu-map, it will automatically be used later in the init process. We
can do this because we know that cpu-map was not set when we're using
this detection code.

This is much saner, as we don't need to manipulate the process' affinity
at this point in time, and just update the info that the user omitted to
set by themselves, which guarantees a better long-term consistency with
the documented feature.
2023-09-08 16:25:19 +02:00
Willy Tarreau
809a49da96 MINOR: cfgparse: use read_line_from_trash() to read from /sys
It's easier to use this function now to natively support variable
fields in the file's path. This also removes read_file_from_trash()
that was only used here and was static.
2023-09-08 16:25:19 +02:00
Willy Tarreau
1f2433fb6a MINOR: tools: add function read_line_to_trash() to read a line of a file
This function takes on input a printf format for the file name, making
it particularly suitable for /proc or /sys entries which take a lot of
numbers. It also automatically trims the trailing CR and/or LF chars.
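
A possible usage sketch (the exact prototype is not shown here; the call below
assumes the function returns a negative value on error and fills the shared
trash chunk, and the path and variable names are only illustrative):

    if (read_line_to_trash("/sys/devices/system/node/node%d/cpulist", node_id) >= 0)
        printf("node %d cpus: %s\n", node_id, trash.area);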
2023-09-08 16:25:19 +02:00
Willy Tarreau
5f10176e2c MEDIUM: init: initialize the trash earlier
More and more utility functions rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
2023-09-08 16:25:19 +02:00
Frédéric Lécaille
e3e218b98e CLEANUP: quic: Remove useless free_quic_tx_pkts() function.
This function is defined but no longer used since this commit:
    BUG/MAJOR: quic: Really ignore malformed ACK frames.
2023-09-08 10:17:25 +02:00
Frédéric Lécaille
292dfdd78d BUG/MINOR: quic: Wrong cluster secret initialization
The function generate_random_cluster_secret() which initializes the cluster secret
when it is not supplied by configuration is buggy. There is a 1/256 chance that the
cluster secret string is empty.

To fix this, the cluster secret is now stored in a reduced form: the first 128 bits
of its own SHA1 (160 bits) digest when it is defined by configuration. If it is not,
it is initialized with a 128-bit random value. This way, the cluster secret is
always initialized.

As the cluster secret is always initialized, several tests are from now on useless.
This patch removes such tests (if(global.cluster_secret)) in the QUIC code part and
at parsing time: there is no need to check that a cluster secret was initialized
with the "quic-force-retry" option.

Must be backported as far as 2.6.
2023-09-08 09:50:58 +02:00
William Lallemand
15e591b6e0 MINOR: ssl: add support for 'curves' keyword on server lines
This patch implements the 'curves' keyword on server lines as well as
the 'ssl-default-server-curves' keyword in the global section.

It also adds the keyword on the server line in the ssl_curves reg-test.

These keywords allow the configuration of the curves list for a server.
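
For illustration, a configuration using these keywords could look like this
(the curve list value is just an example):

    global
        # default curves list for SSL connections to servers
        ssl-default-server-curves X25519:P-256

    backend app
        server srv1 192.0.2.10:443 ssl verify none curves X25519:P-256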
2023-09-07 23:29:10 +02:00
Willy Tarreau
28ff1a5d56 MINOR: tasks/stats: report the number of niced tasks in "show info"
We currently know the number of tasks in the run queue that are niced,
and we don't expose it. It's too bad because it can give a hint about
what share of the load is relevant. For example if one runs a Lua
script that was purposely reniced, or if a stats page or the CLI is
hammered with slow operations, seeing them appear there can help
identify what part of the load is not caused by the traffic, and
improve monitoring systems or autoscalers.
2023-09-06 17:44:44 +02:00
Remi Tricot-Le Breton
e03d060aa3 MINOR: cache: Change hash function in default normalizer used in case of "vary"
When building the secondary signature for cache entries when vary is
enabled, the referer part of the signature was a simple crc32 of the
first referer header.
This patch changes it to a 64-bit hash based on the xxhash algorithm with a
random seed built during init. This will prevent "malicious" hash
collisions between entries of the cache.
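
A rough sketch of the idea (not the actual haproxy code; the seed variable name
is made up and the bundled xxhash API is assumed):

    #include <stdint.h>
    #include <import/xxhash.h>

    static uint64_t cache_referer_seed;  /* filled with random bytes at init */

    /* 64-bit seeded hash of the first Referer header value */
    static uint64_t hash_referer(const char *ref, size_t len)
    {
        return XXH64(ref, len, cache_referer_seed);
    }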
2023-09-06 16:11:31 +02:00
Aurelien DARRAGON
7a71801af6 CLEANUP: log: remove unnecessary trim in __do_send_log
Since both sink_write and fd_write_frag_line take the maxlen parameter
as argument, there is no added value for the trim before passing the
msg parameter to those functions.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
8e6339aa29 MEDIUM: sink: add sink_finalize() function
To further clean the code and remove duplication, some sink postparsing
and sink->sft finalization is now performed in a dedicated function
named sink_finalize().
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
b2879e3502 MEDIUM: sink/ring: introduce high level ring creation helper function
Ease code maintenance.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
5a8755681d MINOR: sink: add helper function to deallocate sink struct
In this patch we move sink freeing logic outside of sink_deinit() function
in order to create the sink_free() helper function that could be used
on error paths for example.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
6049a478e4 MEDIUM: spoe-agent: properly postresolve log rings
Now that we have sink_postresolve_logsrvs() function, we make use of it
for spoe-agent log postparsing logic.

This will allow this kind of config to work:
  |spoe-agent test
  |        log tcp@127.0.0.1:514 local0
  |        use-backend xxx

Plus, consistency checks will also be performed as for regular log
directives used from global, log-forward or proxy sections.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
486aa01204 MEDIUM: fcgi-app: properly postresolve logsrvs
Now that we have postresolve_logsrv_list() function, we make use of it
for fcgi-app log postparsing logic.

This will allow this kind of config to work:
  |fcgi-app test
  |        docroot /
  |        log-stderr tcp@127.0.0.1:514 local0

Plus, consistency checks will also be performed as for regular log
directives used from global, log-forward or proxy sections.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
d9b81e5b49 MEDIUM: log/sink: make logsrv postparsing more generic
We previously had postparsing logic but only for logsrv sinks, but now we
need to make this operation on logsrv directly instead of sinks to prepare
for additional postparsing logic that is not sink-specific.

To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.

Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().

Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
969e212c66 MINOR: log: add dup_logsrv() helper function
Ease code maintenance by introducing the dup_logsrv() helper function to
properly duplicate an existing logsrv struct.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
7a12e2d369 MEDIUM: httpclient/logs: rely on per-proxy post-check instead of global one
httpclient used to register a global post-check function to iterate over
all known proxies and post-initialize httpclient related ones (mainly
for logs initialization).

But we currently have an issue: post_sink_resolve() function which is
also registered using REGISTER_POST_CHECK() macro conflicts with
httpclient_postcheck() function.

This is because post_sink_resolve() relies on proxy->logsrvs to be
correctly initialized already, and httpclient_postcheck() may create
and insert new logsrvs entries to existing proxies when executed.

So depending on which function runs first, we could run into trouble.

Fortunately, to this day, everything works "by accident" because the
http_client.c file is loaded before the sink.c file when compiling the source
code.

But as soon as we would move one of the two functions to other files, or
if we rename files or make changes to the Makefile build recipe, we could
break this at any time.

To prevent post_sink_resolve() from randomly failing in the future, we now
make httpclient postcheck rely on per-proxy post-checks by slightly
modifying httpclient_postcheck() function so that it can be registered
using REGISTER_POST_PROXY_CHECK() macro.

As per-proxy post-check functions are executed right after config parsing
for each known proxy (vs global post-check which are executed a bit later
in the init process), we can be certain that functions registered using
global post-check macro, ie: post_sink_resolve(), will always be executed
after httpclient postcheck, effectively resolving the ordering conflict.

This should normally not cause visible behavior changes, and while it
could be considered as a bug, it's probably not worth backporting it
since the only way to trigger the issue is through code refactors,
unless we want to backport it to ease code maintenance of course,
in which case it should easily apply for >= 2.7.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
e187361b52 MINOR: log: move log-forwarders cleanup in log.c
Move the log-forward proxies cleanup from the global deinit() function into
the log-dedicated deinit function.

No backport needed.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
32f1db6d0d MEDIUM: sink: don't perform implicit truncations when maxlen is not set
maxlen now defaults to ~0 (instead of BUFSIZE) to make sure no implicit
truncation will be performed when the option is not specified, since the
doc doesn't mention any default value for maxlen. As such, if the payload
is too big, it will be dropped (this is the default expected behavior).
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
fdf82d058b MINOR: sink: inform the user when logs will be implicitly truncated
Consider the following example:

 |log ring@test-ring len 2000 local0
 |
 |ring test-ring
 |  maxlen 1000

This would result in emitted logs being silently truncated to 1000 because
test-ring maxlen is smaller than the log directive maxlen.

In this patch we're adding an extra check in post_sink_resolve() to detect
this kind of confusing setups and warn the user about the implicit
truncation when DIAG mode is on.

This commit depends on:
 - "MINOR: sink: simplify post_sink_resolve function"
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
ceaa1ddb06 MINOR: log/sink: detect when log maxlen exceeds sink size
To prevent logs from being silently (and unexpectedly) dropped at runtime,
we check that the maxlen parameter of the log directives is
strictly smaller than the targeted ring size.

      |global
      |     tune.bufsize 16384
      |     log tcp@127.0.0.1:514 len 32768
      |     log myring@127.0.0.1:514 len 32768
      |ring myring
      |     # no explicit size

On such configs, a diag warning will be reported.

This commit depends on:
 - "MINOR: sink: simplify post_sink_resolve function"
 - "MINOR: ring: add a function to compute max ring payload"
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
d499485aa9 MINOR: sink: simplify post_sink_resolve function
Simplify post_sink_resolve() function to reduce code duplication and
make it easier to maintain.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
ddd8671b19 BUG/MEDIUM: ring: adjust maxlen consistency check
When the user specifies a maxlen parameter that is greater than the size of
a given ring section, a warning is emitted to inform that the max length
exceeds the size, and maxlen is then forced to the size.

The logic is good, but imprecise, because it doesn't take into account
the slight overhead from storing payloads into the ring.

In practice, we cannot store a single message whose length is exactly equal
to the size. Doing so will result in the message being dropped at
runtime.

Thanks to the ring_max_payload() function introduced in "MINOR: ring: add
a function to compute max ring payload", we can now deduce the maximum
value for the maxlen parameter before it could result in messages being
dropped.

When maxlen is set to an improper value, the warning will be emitted
and maxlen will be forced to the maximum "single" payload length that could
fit in the ring buffer, preventing messages from being dropped
unexpectedly.

This commit depends on:
 - "MINOR: ring: add a function to compute max ring payload"

This may be backported as far as 2.2
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
5b295ff409 MINOR: ring: add a function to compute max ring payload
Add a helper function to the ring API to compute the maximum payload
length that could fit into the ring based on ring size.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
4457783ade MINOR: http_ana: position the FINAL flag for http_after_res execution
Ensure that the ACT_OPT_FINAL flag is always set when executing actions
from http_after_res context.

This will permit lua functions to be executed as http_after_res actions
since hlua_ctx_resume() automatically disables "yielding" when such a flag
is set: the hlua handler will only allow one-shot executions at this point
(lua or not, we don't want to reschedule http_after_res actions).
2023-09-06 11:42:34 +02:00
Aurelien DARRAGON
967608a432 BUG/MINOR: hlua/action: incorrect message on E_YIELD error
When hlua_action error messages were reworked in d5b073cf1
("MINOR: lua: Improve error message"), an error was made for the
E_YIELD case.

Indeed, everywhere the E_YIELD error is handled, a "yield is not allowed" or
similar error message is reported to the user. But instead we currently
have: "aborting Lua processing on expired timeout".

It is quite misleading because this error message often refers to the
HLUA_E_ETMOUT case.

Thus, we now report the proper error message thanks to this patch.

This should be backported to all stable versions.
[on 2.0, the patch needs to be slightly adapted]
2023-09-06 11:42:34 +02:00
Frédéric Lécaille
e7240a0ba6 BUG/MINOR: quic: Dereferenced unchecked pointer to Handshake packet number space
This issue was reported by longrtt interop test with quic-go as client
and @chipitsine in GH #2282 when haproxy is compiled against libressl.

Add two checks to prevent a pointer to the Handshake packet number space
from being dereferenced if this packet number space was released.

Thank you to @chipitsine for this report.

No need to backport.
2023-09-06 10:13:40 +02:00
Christopher Faulet
700ca14fc1 BUG/MINOR: ring/cli: Don't expect input data when showing events
The "show events" command may wait for now events if "-w" option is used. In
this case, no timeout must be triggered. So we explicitly state no input
data are expected. This disables the read timeout on the client side.

This patch should be backported to 2.8. It is probably useless to backport
it further. In all cases, it depends on the commit "BUG/MINOR: applet:
Always expect data when CLI is waiting for a new command"
2023-09-06 09:36:29 +02:00
Christopher Faulet
2f1e0a0a46 BUG/MINOR: applet: Always expect data when CLI is waiting for a new command
There is a mechanism for applets to disable the read timeout on the opposite
side if they are not waiting for any data. Of course, there is also a way to
re-activate it. But it must explicitly be handled by applets.

For the CLI, some commands may state no input data are expected. So we must
be sure to reset its state when the applet is waiting for a new command. For
now, it is not a bug because no CLI command uses this mechanism.

This patch must be backported to 2.8.
2023-09-06 09:36:19 +02:00
Christopher Faulet
8073094bf1 BUG/MEDIUM: stconn: Always update stream's expiration date after I/O
It is a revert of following patches:

  * d7111e7ac ("MEDIUM: stconn: Don't requeue the stream's task after I/O")
  * 3479d99d5 ("BUG/MEDIUM: stconn: Update stream expiration date on blocked sends")

Because the first one is reverted, the second one is useless and can be reverted
too.

The issue here is that I/O may be performed without stream wakeup. So if no
expiration date was set on the last call to process_stream(), the stream is
never rescheduled and no timeout can be detected. This especially happens on
TCP streams because fast-forward is enabled very early.

Instead of tracking all places where the stream's expiration date must be
updated, it is now centralized in sc_notify(), as it was performed before
the timeout refactoring.

This patch must be backported to 2.8.
2023-09-06 09:29:27 +02:00
Christopher Faulet
b9c87f8082 BUG/MEDIUM: stconn/stream: Forward shutdown on write timeout
The commit 7f59d68fe2 ("BUG/MEDIIM: stconn: Flush output data before
forwarding close to write side") introduced a regression. When a write
timeout is detected, the shutdown is no longer forwarded. Depending on the
channels' state, it may block the processing, waiting for the client or the
server to leave.

The commit above tries to avoid truncating messages on shutdown, but on a
write timeout, if the channel is not empty, there is nothing more we can do
to send these data. It means the endpoint is unable to send data. In this
case, we must forward the shutdown.

This patch should be backported as far as 2.2.
2023-09-06 09:29:27 +02:00
Christopher Faulet
d18657ae11 BUG/MEDIUM: applet: Report an error if applet request more room on aborted SC
If an abort was performed and the applet still requests more room, it means
the applet has not properly handled the error on its own. At least the CLI
applet is concerned. Instead of reviewing all applets, the error is now
handled in the task_run_applet() function.

Because of this bug, a session may be blocked infinitely and may also lead to
a wakeup loop.

This patch must only be backported to 2.8 for now, and only to lower
versions if a bug is reported, because it is a bit sensitive and the code in
older versions is very different.
2023-09-06 09:29:27 +02:00
Christopher Faulet
34645a6365 BUG/MEDIUM: stconn: Report read activity when a stream is attached to front SC
It only concerns the front SC. But it is important to report a read activity
when a stream is created and attached to the front SC, especially in TCP. In
HTTP, when this happens, the request was necessarily received. But in TCP,
the client may open a connection without sending anything. We must still
report a first read activity in this case to be able to properly report
client timeout.

This patch must be backported to 2.8.
2023-09-06 09:29:27 +02:00
Christopher Faulet
015fec6a29 BUG/MINOR: stconn: Don't inhibit shutdown on connection on error
In the SC function responsible to perform shutdown, there is a statement
inhibiting the shutdown if an error was encountered on the SC. This
statement is inherited from very old version and should in fact be
removed. The error may be set from the stream. In this case the shutdown
must be performed. In all cases, it is not a big deal if the shutdown is
performed twice because underlying functions already handle multiple calls.

This patch does not fix any bug. Thus there is no reason to backport it.
2023-09-06 09:29:27 +02:00
Frédéric Lécaille
fb4294be55 BUG/MINOR: quic: Wrong RTT computation (srtt and rrt_var)
Because several variables (rtt_var, srtt) were stored as multiples
of their real values, some calculations were less accurate than expected.

Stop storing 4*rtt_var values, and 8*srtt values.
Adjust all the impacted statements.
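
As a reminder, the classic estimator this refers to, with the values now kept
at their real scale, looks roughly like this (generic sketch, not the literal
haproxy code):

    /* alpha = 1/8 and beta = 1/4, as in the usual RTT estimation algorithm */
    static void rtt_update(uint32_t *srtt, uint32_t *rtt_var, uint32_t sample)
    {
        uint32_t diff = sample > *srtt ? sample - *srtt : *srtt - sample;

        *rtt_var = (3 * *rtt_var + diff) / 4;
        *srtt    = (7 * *srtt + sample) / 8;
    }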

Must be backported as far as 2.6.
2023-09-05 17:14:51 +02:00
Frédéric Lécaille
cf768f7456 BUG/MINOR: quic: Wrong RTT adjusments
There was a typo in the test statement that checks if the rtt must be adjusted
(>= incorrectly replaced by >).

Must be backported as far as 2.6.
2023-09-05 17:14:51 +02:00
William Lallemand
6bc00a97da MINOR: httpclient: allow to configure the timeout.connect
When using the httpclient, one could be bothered by it returning
after a very long time when failing. By default the httpclient has a
retries value of 3 and a connect timeout of 5s, which can result in a pause of
20s upon failure.

This patch allows the user to configure the "timeout connect" of the
httpclient so it could reduce the time to return an error.
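
A minimal sketch of what this could look like in the global section, assuming
the setting is exposed as the "httpclient.timeout.connect" keyword (keyword
name assumed from the description above):

    global
        # fail faster than the default 5s when the target is unreachable
        httpclient.timeout.connect 2s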

This patch helps fix part of issue #2269.

Could be backported in 2.7 if needed.
2023-09-05 16:42:27 +02:00
William Lallemand
c52948bd2c MINOR: httpclient: allow to configure the retries
When using the httpclient, one could be bothered by it returning after
a very long time when failing. By default the httpclient has a retries
value of 3 and a connect timeout of 5s, which can result in a pause of 20s
upon failure.

This patch allows the user to configure the retries of the httpclient so
it could reduce the time to return an error.

This patch helps fix part of issue #2269.

Could be backported in 2.7 if needed.
2023-09-05 15:55:04 +02:00
William Lallemand
fcb080d8f9 MEDIUM: mworker: display a more accessible message when a worker crash
Should fix issue #1034.

Display a more accessible message about what to do when a worker crashes.

Example:

	$ ./haproxy -W -f haproxy.cfg
	[NOTICE]   (308877) : New worker (308884) forked
	[NOTICE]   (308877) : Loading success.
	[NOTICE]   (308877) : haproxy version is 2.9-dev4-d90d3b-58
	[NOTICE]   (308877) : path to executable is ./haproxy
	[ALERT]    (308877) : Current worker (308884) exited with code 139 (Segmentation fault)
	[WARNING]  (308877) : A worker process unexpectedly died and this can only be explained by a bug in haproxy or its dependencies.
	Please check that you are running an up to date and maintained version of haproxy and open a bug report.
	HAProxy version 2.9-dev4-d90d3b-58 2023/09/05 - https://haproxy.org/
	Status: development branch - not safe for use in production.
	Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
	Running on: Linux 6.2.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Mon Aug 14 13:42:26 UTC 2023 x86_64
	[ALERT]    (308877) : exit-on-failure: killing every processes with SIGTERM
	[WARNING]  (308877) : All workers exited. Exiting... (139)
2023-09-05 15:31:04 +02:00
William Lallemand
d90d3bf894 MINOR: global: export the display_version() symbol
Export the display_version() function which can be used elsewhere than
in haproxy.c
2023-09-05 15:24:39 +02:00
Frédéric Lécaille
aeb2f28ca7 BUG/MINOR: quic: Unchecked pointer to Handshake packet number space
It is possible that there are still Initial crypto data in flight without
Handshake crypto data in flight. This is very rare but possible.

This issue was reported by handshakeloss interop test with quic-go as client
and @chipitsine in GH #2279.

No need to backport.
2023-09-05 11:38:33 +02:00
Frédéric Lécaille
3afe54ed5b BUILD: quic: Compilation issue on 32-bits systems with quic_may_send_bytes()
quic_may_send_bytes() implementation arrived with this commit:

  MINOR: quic: Amplification limit handling sanitization.

It returns a size_t. So when it is compared with qc->path->mtu by QUIC_MIN() there is
no need to cast the latter anymore because it is also a size_t.

Detected when compiled with -m32 gcc option.
2023-09-05 10:33:56 +02:00
Willy Tarreau
86854dd032 MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted to explain the
problems it will cause and how to address the situation.

Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
2023-09-04 19:39:17 +02:00
Willy Tarreau
8357f950cb MEDIUM: threads: detect incomplete CPU bindings
It's very easy to mess up with some cpu-map directives and to leave
some thread unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediary situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.

Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
2023-09-04 19:39:17 +02:00
Willy Tarreau
e65f54cf96 MINOR: cpuset: centralize a reliable bound cpu detection
Till now the CPUs that were bound were only retrieved in
thread_cpus_enabled() in order to count the number of CPUs allowed,
and it relied on arch-specific code.

Let's slightly arrange this into ha_cpuset_detect_bound() that
reuses the ha_cpuset struct and the accompanying code. This makes
the code much clearer without having to carry along some arch-specific
stuff out of this area.

Note that the macos-specific code used in thread.c only counts
online CPUs but does not retrieve a mask, so for now we can't infer
anything from it and can't implement it.

In addition and more importantly, this function is reliable in that
it will only return a value when the detection is accurate, and will
not return incomplete sets on operating systems where we don't have
an exact list, such as online CPUs.
2023-09-04 19:39:17 +02:00
Willy Tarreau
d3ecc67a01 MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets
This operation was not implemented and will be needed later.
2023-09-04 19:39:17 +02:00
Willy Tarreau
eb10567254 MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set
This function will be convenient to test for the presence of a given CPU
in a set.
2023-09-04 19:39:17 +02:00
Willy Tarreau
fca3fc0d90 BUILD: checks: shut up yet another stupid gcc warning
gcc has always had hallucinations regarding value ranges, and this one
is interesting, and affects branches 4.7 to 11.3 at least. When building
without threads, the randomly picked new_tid is reduced to a multiply
by 1 shifted right by 32 bits, hence a constant output of 0, which shows this
warning:

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:32: warning: array subscript [-1, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
  In file included from include/haproxy/thread.h:28,
                   from include/haproxy/list.h:26,
                   from include/haproxy/action.h:28,
                   from src/check.c:31:

or this one when trying to force the test to see that it cannot be zero(!):

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:54: warning: array subscript [0, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                                         ~~~~~~~~~~~~~^~~~~~
  include/haproxy/atomic.h:66:40: note: in definition of macro 'HA_ATOMIC_LOAD'
     66 | #define HA_ATOMIC_LOAD(val)          *(val)
        |                                        ^~~
  src/check.c:1150:24: note: in expansion of macro '_HA_ATOMIC_LOAD'
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                        ^~~~~~~~~~~~~~~

Let's just add an ALREADY_CHECKED() statement there, no other check seems
to get rid of it. No backport is needed.
2023-09-04 19:38:51 +02:00
Andrew Hopkins
b3f94f8b3b BUILD: ssl: Build with new cryptographic library AWS-LC
This adds a new Makefile option, USE_OPENSSL_AWSLC, and
updates the documentation with instructions to use HAProxy with
AWS-LC.

Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
2023-09-04 18:19:18 +02:00
Miroslav Zagorac
3cfc30416c MINOR: properly mark the end of the CLI command in error messages
In several places in the file src/ssl_ckch.c, in the message about the
incorrect use of a CLI command, the end of that CLI command is not
correctly marked with the ' sign.
2023-09-04 18:13:43 +02:00
Willy Tarreau
8547f5cfa2 BUG/MINOR: stream: further protect stream_dump() against incomplete sessions
As found by Coverity in issue #2273, the fix in commit e64bccab2 ("BUG/MINOR:
stream: protect stream_dump() against incomplete streams") was still not
enough, as scf/scb are still dereferenced to dump their flags and states.

This should be backported to 2.8.
2023-09-04 15:32:17 +02:00
Chris Staite
3939e39479 BUG/MEDIUM: h1-htx: Ensure chunked parsing with full output buffer
A previous fix to ensure that there is sufficient space in the output buffer
to place parsed data (#2053) introduced an issue: if the output buffer is
filled on a chunk boundary, no data is parsed but the congested flag is not set
because the state is not H1_MSG_DATA.

The check to ensure that there is sufficient space in the output buffer is
actually already performed in all downstream functions before it is used.
This makes the early optimisation that avoids the state transition to
H1_MSG_DATA needless.  Therefore, in order to allow the chunk parser to
continue in this edge case we can simply remove the early check.  This
ensures that the state can progress and set the congested flag correctly
in the caller.

This patch fixes #2262. The upstream change that caused this logic error was
backported as far as 2.5, therefore it makes sense to backport this fix back
that far also.
2023-09-04 12:15:36 +02:00
Willy Tarreau
135c66f6cb BUG/MEDIUM: connection: fix pool free regression with recent ppv2 TLV patches
In commit fecc573da ("MEDIUM: connection: Generic, list-based allocation
and look-up of PPv2 TLVs") there was a tiny mistake, elements of length
<= 128 are allocated from pool_pp_128 but only those of length < 128 are
released to this pool, other ones go to pool_pp_256. Because of this,
elements of size exactly 128 are allocated from 128 and released to 256.
It can be reproduced a few times by running sample_fetches/tlvs.vtc 1000
times with -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_EXPR
-DDEBUG_STRICT=2 -DDEBUG_POOL_INTEGRITY -DDEBUG_POOL_TRACING
-DDEBUG_NO_POOLS. Not sure why it doesn't reproduce more often though.
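
A simplified sketch of the boundary mismatch (pool names taken from the
message above, logic reduced to the length test; not the actual code):

    struct pool_head;

    static struct pool_head *tlv_pool_for(size_t len, int alloc,
                                          struct pool_head *pool_pp_128,
                                          struct pool_head *pool_pp_256)
    {
        if (alloc)
            return len <= 128 ? pool_pp_128 : pool_pp_256; /* allocation side */
        /* buggy release side: "<" sends exact 128-byte elements to pool_pp_256
         * although they were taken from pool_pp_128; it must also use "<=" */
        return len < 128 ? pool_pp_128 : pool_pp_256;
    }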

No backport is needed. This should address github issues #2275 and #2274.
2023-09-04 11:45:37 +02:00
Frédéric Lécaille
d52466726f BUG/MINOR: quic: Unchecked pointer to packet number space dereferenced
It is possible that there are still Initial crypto data in flight without
Handshake crypto data in flight. This is very rare but possible.

This issue was reported by long-rtt interop test with quic-go as client
and @chipitsine in GH #2276.

No need to backport.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
9077f20251 BUG/MAJOR: quic: Really ignore malformed ACK frames.
If not correctly parsed, an ACK frame must be ignored without any further
treatment. Before this patch, an ACK frame could be partially parsed, then some
errors could be detected, which led newly acknowledged packets to be released
in a wrong way by free_quic_tx_pkts(), called by qc_parse_ack_frm(). But there
is no reason to release such packets because of a malformed ACK frame.

This patch modifies qc_parse_ack_frm(). The handling of newly acknowledged TX
packets is now done in two steps. It first collects the newly acknowledged packets
by calling qc_newly_acked_pkts(), then proceeds the same way as before for the
treatment of the haproxy TX packets acknowledged by the peer. If the ACK frame
could not be fully parsed, the newly acknowledged packets are put back where they
were detached from: the tree of TX packets for their encryption level.

Must be backported as far as 2.6.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
7dad52bdbd MINOR: quic: Add a trace to quic_release_frm()
Display the address of the frame to be released as soon as quic_release_frm()
is entered, whose job is obviously to release the memory allocated
for the frame <frm> passed as parameter.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
3c90c1ce6b BUG/MINOR: quic: Possible skipped RTT sampling
There are very few chances this bug may occur. Furthermore the consequences
are not dramatic: an RTT sampling may be ignored. I guess this may happen
when the now_ms global value wraps.

Do not rely on the time a packet was sent to decide if it
is a newly acknowledged packet, but on its presence or not in the tx packet
ebtree.

Must be backported as far as 2.6.
2023-09-04 11:29:35 +02:00
Christopher Faulet
0b93ff8c87 BUG/MEDIUM: stconn: Wake applets on sending path if there is a pending shutdown
An applet is not woken up on sending path if it is not waiting for data or
if it states it will not consume data. However, it is important to still
wake it up if there is a pending shutdown. Otherwise, the event may be
missed and some data may remain blocked in the channel's buffer.

Because of this bug, it is possible to have a stream stuck if data are also
blocked on the opposite channel. It is for instance possible to hit the bug
with the stats applet and a client not consuming data.

This patch must slowly be backported as far as 2.2. It should partially fix
issue #2249.
2023-09-01 14:18:26 +02:00
Christopher Faulet
9e394d34e0 BUG/MINOR: stconn: Don't report blocked sends during connection establishment
The server timeout must not be handled during the connection establishment
so that it does not supersede the connect timeout. To do so, we must not consider
outgoing data as blocked during this stage. Concretely, it means the fsb
time must not be updated during connection establishment.

It is not an issue with regular clients because the server timeout is only
defined when the connection is established. However, it may be an issue for the
HTTP client, when the server timeout is lower than the connect timeout. In this
case, an early 502 may be reported with no connection retries.

This patch must be backported to 2.8.
2023-09-01 14:18:26 +02:00
Christopher Faulet
3479d99d5f BUG/MEDIUM: stconn: Update stream expiration date on blocked sends
When outgoing data are blocked, we must update the stream expiration date
and requeue the task. It is important to be sure to properly handle write
timeouts, especially if the stream cannot expire on reads. This bug was
introduced when handling of channel's timeouts was refactored to be managed
by the stream-connectors.

It is an issue if there is no server timeout and the client does not consume
the response (or the opposite but it is less common). It is also possible to
trigger the same scenario with applets on server side because, most of time,
there is no server timeout.

This patch must be backported to 2.8.
2023-09-01 14:18:26 +02:00
Christopher Faulet
49ed83e948 DEBUG: applet: Properly report opposite SC expiration dates in traces
The wrong label was used in trace to report expiration dates of the opposite
SC. "sc" was used instead of "sco".

This patch should be backported to 2.8.
2023-09-01 14:18:26 +02:00
Willy Tarreau
b0031d9679 MINOR: checks: also consider the thread's queue for rebalancing
Let's also check for other threads when the current one is queueing,
let's not wait for the load to be high. Now this totally eliminates
differences between threads.
2023-09-01 14:00:04 +02:00
Willy Tarreau
844a3bc25b MEDIUM: checks: implement a queue in order to limit concurrent checks
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance has started to reveal situations where it simply isn't
possible anymore to successfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!

This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.

Here we take a different approach by permitting to enforce the maximum
concurrent checks per thread limitation and implementing an ordered
queue. Thanks to this, if a thread about to start a check has reached
its limit, it will add the check at the end of a queue and it will be
processed once another check is finished. This proves to be extremely
efficient, with all checks completing in a reasonable amount of time
and not being disturbed by the rest of the traffic from other checks.
They're just cycling slower, but at the speed the machine can handle.

One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for all the required duration.
This is why the limit is not enforced by default.

Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
2023-09-01 14:00:04 +02:00
Willy Tarreau
cfc0bceeb5 MEDIUM: checks: search more aggressively for another thread on overload
When the current check is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the checks load very fair across all threads (~1% as well). The new
parameter is called tune.max-checks-per-thread.
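
For example, based on the SSL observations from the related queueing patch,
this could be enabled like this (value chosen arbitrarily):

    global
        # cap the number of concurrent checks per thread; excess checks
        # are queued and started once a slot is freed
        tune.max-checks-per-thread 20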
2023-09-01 08:26:06 +02:00
Willy Tarreau
016e189ea3 MINOR: check: also consider the random other thread's active checks
When checking if it's worth transferring a sleeping thread to another
random thread, let's also check if that random other thread has less
checks than the current one, which is another reason for transferring
the load there.

This commit adds a function "check_thread_cmp_load()" to compare two
threads' loads in order to simplify the decision taking.

The minimum active check count before starting to consider rebalancing
the load was now raised from 2 to 3, because tests show that at 15k
concurrent checks, at 2, 50% are evaluated for rebalancing and 30%
are rebalanced, while at 3, this is cut in half.
2023-09-01 08:26:06 +02:00
Willy Tarreau
00de9e0804 MINOR: checks: maintain counters of active checks per thread
Let's keep two check counters per thread:
  - one for "active" checks, i.e. checks that are no more sleeping
    and are assigned to the thread. These include sleeping and
    running checks ;

  - one for "running" checks, i.e. those which are currently
    executing on the thread.

By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
These numbers are reported in "show activity", which allows to check
per thread and globally how many checks are currently pending and running
on the system.

Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable to doing
this. However knowing the number of active checks is already a good
enough hint.
2023-09-01 08:26:06 +02:00
Willy Tarreau
3b7942a1c9 MINOR: check/activity: collect some per-thread check activity stats
We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
2023-09-01 08:26:06 +02:00
Willy Tarreau
e03d05c6ce MINOR: check: remember when we migrate a check
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combine with SLEEPING to form 4 possible states:

 SLP  RDY   State      Description
  0    0    -          (reserved)
  0    1    RUNNING    Check is bound to current thread and running
  1    0    SLEEPING   Check is sleeping, not bound to a thread
  1    1    MIGRATING  Check is migrating to another thread

Thus we set READY upon migration, and check for it before migrating, this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are
adjacent.
2023-09-01 08:26:06 +02:00
Willy Tarreau
3544c9f8a0 MINOR: checks: pin the check to its thread upon wakeup
When a check leaves the sleeping state, we must pin it to the thread that
is processing it. It's normally always the case after the first execution,
but initial checks that start assigned to any thread (-1) could be assigned
much later, causing problems with planned changes involving queuing. Thus
better do it early, so that all threads start properly pinned.
2023-09-01 08:26:06 +02:00
Willy Tarreau
7163f95b43 MINOR: checks: start the checks in sleeping state
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.

Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
2023-09-01 08:26:06 +02:00
Willy Tarreau
48442b8b15 BUG/MINOR: checks: do not queue/wake a bounced check
A small issue was introduced with commit d114f4a68 ("MEDIUM: checks:
spread the checks load over random threads"): when a check is bounced
to another thread, its expiration time is set to TICK_ETERNITY. This
makes it show as not expired upon first wakeup on the next thread,
thus being detected as "woke up too early" and being instantly
rescheduled. Only after this next wakeup will it be properly
considered.

Several approaches were attempted to fix this. The best one seems to
consist in resetting t->expire and expired upon wakeup, and changing
the !expired test for !tick_is_expired() so that we don't trigger on
this case.

This needs to be backported to 2.7.
2023-09-01 08:26:06 +02:00
Willy Tarreau
338431ecb6 MINOR: activity: report the current run queue size
While troubleshooting the causes of load spikes, it appeared that the
length of individual run queues was missing, let's add it to "show
activity".
2023-09-01 08:26:06 +02:00
Willy Tarreau
2cb896c4b0 MEDIUM: server/ssl: pick another thread's session when we have none yet
The per-thread SSL context in servers causes a burst of connection
renegotiations on startup, both for the forwarded traffic and for the
health checks. Health checks have been seen to continue to cause SSL
rekeying for several minutes after a restart on large thread-count
machines. The reason is that the context is exclusively per-thread
and that the more threads there are, the more likely it is for a new
connection to start on a thread that doesn't have such a context yet.

In order to improve this situation, this commit ensures that a thread
starting an SSL connection to a server without a session will first
look at the last session that was updated by another thread, and will
try to use it. In order to minimize the contention, we're using a read
lock here to protect the data, and the first-level index is an integer
containing the thread number, that is always valid and may always be
dereferenced. This way the session retrieval algorithm becomes quite
simple:

  - if the last thread index is valid, then try to use the same session
    under a read lock ;
  - if any error happens, then atomically nuke the index so that other
    threads don't use it and the next one to update a connection updates
    it again

And for the ssl_sess_new_srv_cb(), we have this:

  - update the entry under a write lock if the new session is valid,
    otherwise kill it if the session is not valid;
  - atomically update the index if it was 0 and the new one is valid,
    otherwise atomically nuke it if the session failed.

Note that even if only the pointer is destroyed, the element will be
re-allocated by the next thread during ssl_sess_new_srv_cb().

Right now a session is picked even if the SNI doesn't match, because
we don't know the SNI yet during ssl_sock_init(), but that's essentially
a matter of API, since connect_server() figures the SNI very early, then
calls conn_prepare() which calls ssl_sock_init(). Thus in the future we
could easily imaging storing a number of SNI-based contexts instead of
storing contexts per thread.

It could be worth backporting this to one LTS version after some
observation, though this is not strictly necessary. The current commit
depends on the following ones:

  BUG/MINOR: ssl_sock: fix possible memory leak on OOM
  MINOR: ssl_sock: avoid iterating realloc(+1) on stored context
  DOC: ssl: add some comments about the non-obvious session allocation stuff
  CLEANUP: ssl: keep a pointer to the server in ssl_sock_init()
  MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
  MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
  MINOR: server/ssl: maintain an index of the last known valid SSL session
  MINOR: server/ssl: clear the shared good session index on failure
  MEDIUM: server/ssl: pick another thread's session when we have none yet
2023-08-31 09:27:14 +02:00
Willy Tarreau
777f62cfb7 MINOR: server/ssl: clear the shared good session index on failure
If we fail to set the session using SSL_set_session(), we want to quickly
erase our index from the shared one so that any other thread with a valid
session replaces it.
2023-08-31 08:50:01 +02:00
Willy Tarreau
52b260bae4 MINOR: server/ssl: maintain an index of the last known valid SSL session
When a thread creates a new session for a server, if none was known yet,
we assign the thread id (hence the reused_sess index) to a shared variable
so that other threads will later be able to find it when they don't have
one yet. For now we only set and clear the pointer upon session creation,
we do not yet pick it.

Note that we could have done it per thread-group, so as to avoid any
cross-thread exchanges, but it's anticipated that this is essentially
used during startup, at a moment where the cost of inter-thread contention
is very low compared to the ability to restart at full speed, which
explains why instead we store a single entry.
2023-08-31 08:50:01 +02:00
Willy Tarreau
607041dec3 MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
The goal will be to permit a thread to update its session while having
it shared with other threads. For now we only place the lock and arrange
the code around it so that this is quite light. For now only the owner
thread uses this lock so there is no contention.

Note that there is a subtlety in the openssl API regarding
i2s_SSL_SESSION() in that it fills the area pointed to by its argument
with a dump of the session and returns a size that's equal to the
previously allocated one. As such, it does modify the shared area even
if that's not obvious at first glance.
2023-08-31 08:50:01 +02:00
Willy Tarreau
95ac5fe4a8 MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
In ssl_sock_set_servername(), we're retrieving the current server name
from the current thread, hoping it will not have changed. This is a
bit dangerous as strictly speaking it's not easy to prove that no other
connection had to use one between the moment it was retrieved in
ssl_sock_init() and the moment it's being read here. In addition, this
forces us to maintain one session per thread while this is not the real
need, in practice we only need one session per SNI. And the current model
prevents us from sharing sessions between threads.

This had been done in 2.5 via commit e18d4e828 ("BUG/MEDIUM: ssl: backend
TLS resumption with sni and TLSv1.3"), but as analyzed with William, it
turns out that a saner approach consists in keeping the call to
SSL_get_servername() there and instead to always assign the SNI to the
current SSL context via SSL_set_tlsext_host_name() immediately when the
session is retrieved. This way the session and SNI are consulted atomically
and the host name is only checked from the session and not from possibly
changing elements.

As a bonus the rdlock that was added by that commit could now be removed,
though it didn't cost much.
2023-08-31 08:49:15 +02:00
Willy Tarreau
335b5adf2c CLEANUP: ssl: keep a pointer to the server in ssl_sock_init()
We're using about 6 times "__objt_server(conn->target)" there, it's not
quite easy to read, let's keep a pointer to the server.
2023-08-30 18:58:40 +02:00
Willy Tarreau
bc31ef0896 DOC: ssl: add some comments about the non-obvious session allocation stuff
The SSL session allocation/reuse part is far from being trivial, and
there are some necessary tricks such as allocating then immediately
freeing that are required by the API due to internal refcount. All of
this is particularly hard to grasp, even with the scarce man pages.

Let's document a little bit what's granted and expected along this path
to help the reader later.
2023-08-30 11:43:06 +02:00
Willy Tarreau
2c6fe24001 MINOR: ssl_sock: avoid iterating realloc(+1) on stored context
The SSL context storage in servers is per-thread, and the contents are
allocated for a length that is determined from the session. It turns out
that placing some traces there revealed that the realloc() that is called
to grow the area can be called multiple times in a row even for just
health checks, to grow the area by just one or two bytes. Given that
malloc() allocates in multiples of 8 or 16 anyway, let's round the
allocated size up to the nearest multiple of 8 to avoid this unneeded
operation.
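
The rounding in question can be written as follows (illustrative snippet, not
the exact haproxy code):

    static size_t round_up_8(size_t len)
    {
        /* malloc() granularity is at least 8 bytes, so allocating in
         * multiples of 8 avoids a realloc() for 1-2 byte growths */
        return (len + 7) & ~(size_t)7;
    }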
2023-08-30 11:43:06 +02:00
Alexander Stephan
2cc53ecc8f MINOR: sample: Add common TLV types as constants for fc_pp_tlv
This patch adds common TLV types as specified in the PPv2 spec.
We will use the suffix of the type, e.g., PP2_TYPE_AUTHORITY becomes AUTHORITY.
2023-08-29 15:32:02 +02:00
Alexander Stephan
0a4f6992e0 MINOR: sample: Refactor fc_pp_unique_id by wrapping the generic TLV fetch
The fetch logic is redundant and can be simplified by simply
calling the generic fetch with the correct TLV ID set as an
argument, similar to fc_pp_authority.
2023-08-29 15:32:01 +02:00
Alexander Stephan
ece0d1ab49 MINOR: sample: Refactor fc_pp_authority by wrapping the generic TLV fetch
We already have a call that can retrieve a TLV with any value.
Therefore, the fetch logic is redundant and can be simplified
by simply calling the generic fetch with the correct TLV ID
set as an argument.
2023-08-29 15:31:51 +02:00
Alexander Stephan
f773ef721c MEDIUM: sample: Add fetch for arbitrary TLVs
Based on the new, generic allocation infrastructure, a new sample
fetch fc_pp_tlv is introduced. It is an abstraction for existing
PPv2 TLV sample fetches. It takes any valid TLV ID as argument and
returns the value as a string, similar to fc_pp_authority and
fc_pp_unique_id.
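
As an illustration, such a fetch could be used like this on a frontend that
accepts the PROXY protocol (the AUTHORITY constant comes from the companion
patch above; the exact configuration syntax should be checked against the
documentation):

    frontend fe_pp
        mode http
        bind :8443 accept-proxy
        # keep the PPv2 AUTHORITY TLV (PP2_TYPE_AUTHORITY) for later use
        http-request set-var(txn.pp_authority) fc_pp_tlv(AUTHORITY)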
2023-08-29 15:31:28 +02:00
Alexander Stephan
fecc573da1 MEDIUM: connection: Generic, list-based allocation and look-up of PPv2 TLVs
In order to be able to implement fetches in the future that allow
retrieval of any TLVs, a new generic data structure for TLVs is introduced.

Existing TLV fetches for PP2_TYPE_AUTHORITY and PP2_TYPE_UNIQUE_ID are
migrated to use this new data structure. TLV related pools are updated
to not rely on type, but only on size. Pools accomodate the TLV list
element with their associated value. For now, two pools for 128 B and
256 B values are introduced. More fine-grained solutions are possible
in the future, if necessary.
2023-08-29 15:15:47 +02:00
Alexander Stephan
c9d47652d2 CLEANUP/MINOR: connection: Improve consistency of PPv2 related constants
This patch improves readability by scoping HAProxy-related PPv2 constants
with an "HA" prefix. Besides, a new constant for the length of a CRC32C
TLV is introduced. The length is derived from the PPv2 spec, so 32 bits.
2023-08-29 15:15:47 +02:00
Willy Tarreau
bd84387beb MEDIUM: capabilities: enable support for Linux capabilities
For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
a basic support for capabilities at least on linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.

A new keyword "setcap" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.

A good way to test this is to start haproxy with such a config:

    global
        uid 1000
        setcap cap_net_bind_service

    frontend test
        mode http
        timeout client 3s
        bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt

and run it under "sudo strace -e trace=bind,setuid", then connecting
there from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
2023-08-29 11:11:50 +02:00
Willy Tarreau
e64bccab20 BUG/MINOR: stream: protect stream_dump() against incomplete streams
If a stream is interrupted during its initialization by a panic signal
and tries to dump itself, it may cause a crash during the dump due to
scf and/or scb not being fully initialized. This may also happen while
releasing an endpoint to attach a new one. The effect is that instead
of dying on an abort, the process dies on a segv. This race is ultra-
rare but totally possible. E.g:

  #0  se_fl_test (test=1, se=0x0) at include/haproxy/stconn.h:98
  #1  sc_ep_test (test=1, sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:148
  #2  sc_conn (sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:223
  #3  stream_dump (buf=buf@entry=0x7ff9507e7678, s=0x7ff4c40c8800, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>, eol=eol@entry=10 '\n') at src/stream.c:2840
  #4  0x000055996c493b42 in ha_task_dump (buf=buf@entry=0x7ff9507e7678, task=<optimized out>, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>) at src/debug.c:328
  #5  0x000055996c493edb in ha_thread_dump_one (thr=thr@entry=18, from_signal=from_signal@entry=0) at src/debug.c:227
  #6  0x000055996c493ff1 in ha_thread_dump (buf=buf@entry=0x7ff9507e7678, thr=thr@entry=18) at src/debug.c:270
  #7  0x000055996c494257 in ha_panic () at src/debug.c:430
  #8  ha_panic () at src/debug.c:411
  (...)
  #23 0x000055996c341fe8 in ssl_sock_close (conn=<optimized out>, xprt_ctx=0x7ff8dcae3880) at src/ssl_sock.c:6699
  #24 0x000055996c397648 in conn_xprt_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:148
  #25 conn_full_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:192
  #26 h1_release (h1c=0x7ff8c297b3c0) at src/mux_h1.c:1074
  #27 0x000055996c39c9f0 in h1_detach (sd=<optimized out>) at src/mux_h1.c:3502
  #28 0x000055996c474de4 in sc_detach_endp (scp=scp@entry=0x7ff9507e3148) at src/stconn.c:375
  #29 0x000055996c4752a5 in sc_reset_endp (sc=<optimized out>, sc@entry=0x7ff8d5cbd560) at src/stconn.c:475

Note that this cannot happen on "show sess" since a stream never leaves
process_stream in such an uninitialized state, thus it's really only the
crash dump that may cause this.

It should be backported to 2.8.
2023-08-29 11:11:50 +02:00
William Lallemand
e7d9082315 BUG/MINOR: ssl/cli: can't find ".crt" files when replacing a certificate
Bug was introduced by commit 26654 ("MINOR: ssl: add "crt" in the
cert_exts array").

When looking for a .crt directly in the cert_exts array, the
ssl_sock_load_pem_into_ckch() function will be called with an argument
which does not have its ".crt" extension anymore.

If "ssl-load-extra-del-ext" is used this is not a problem since we try
to add the ".crt" when doing the lookup in the tree.

However, when directly using a ".crt" without this option, it will fail
to look up the file in the tree.

The fix removes the "crt" entry from the array since it does not seem to
be really useful without a rework of all the lookups.

Should fix issue #2265

Must be backported as far as 2.6.
2023-08-28 18:20:39 +02:00
Willy Tarreau
0074c36dd2 BUILD: pools: import plock.h to build even without thread support
In 2.9-dev4, commit 544c2f2d9 ("MINOR: pools: use EBO to wait for
unlock during pool_flush()") broke the thread-less build by calling
pl_wait_new_long() without explicitly including plock.h which is
normally included by thread.h when threads are enabled.
2023-08-26 17:28:08 +02:00
Willy Tarreau
a7b9baa2cc BUG/MEDIUM: mux-h2: fix crash when checking for reverse connection after error
If the connection is closed in h2_release(), which is indicated by ret<0, we
must not dereference conn anymore. This was introduced in 2.9-dev4 by commit
5053e8914 ("MEDIUM: h2: prevent stream opening before connection reverse
completed") and detected after a few hours of runtime thanks to running with
pool integrity checks and caller enabled. No backport is needed.
2023-08-26 17:05:19 +02:00
Amaury Denoyelle
5afcb686b9 MAJOR: connection: purge idle conn by last usage
Backend idle connections are purged on a recurring basis during the
process lifetime. An estimated number of needed connections is
calculated and the excess is removed periodically.

Before this patch, the purge was done directly using the idle then the
safe connection tree of a server instance. This has a major drawback: it
takes no account of a specific order and may remove functional
connections while leaving ones which will fail on the next reuse.

The problem can be worse when using criteria to differentiate idle
connections such as the SSL SNI. In this case, the purge may remove
connections with a high reuse rate while leaving connections whose
criteria were never matched once, thus drastically reducing the reuse
rate.

To improve this, introduce an alternative storage for idle connections
used in parallel of the idle/safe trees. Now, each connection inserted
in one of these trees is also inserted in the new list at
`srv_per_thread.idle_conn_list`. This guarantees that recently used
connections are present at the end of the list.

During the purge, use this list instead of the idle/safe trees and
remove connections from the front of the list first, as these were not
reused recently. This ensures that connections that are frequently
reused are not purged and should increase the reuse rate, particularly
if distinct idle connection criteria are in use.
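
For illustration, a minimal C sketch of the principle (own minimal types;
this is neither HAProxy's struct list API nor the exact srv_per_thread
fields): connections are appended at the tail when they become idle, so
the purge simply pops from the head, where the least recently used sit.

  #include <stddef.h>

  struct conn {
      struct conn *prev, *next;           /* idle_conn_list linkage */
  };

  struct idle_list {
      struct conn *head, *tail;           /* head = oldest, tail = most recent */
  };

  /* a connection released to the idle pool goes to the tail */
  static void idle_list_append(struct idle_list *l, struct conn *c)
  {
      c->prev = l->tail;
      c->next = NULL;
      if (l->tail)
          l->tail->next = c;
      else
          l->head = c;
      l->tail = c;
  }

  /* the purge removes from the head: least recently used first */
  static struct conn *idle_list_purge_one(struct idle_list *l)
  {
      struct conn *c = l->head;
      if (!c)
          return NULL;
      l->head = c->next;
      if (l->head)
          l->head->prev = NULL;
      else
          l->tail = NULL;
      c->prev = c->next = NULL;
      return c;                           /* caller closes/frees it */
  }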
2023-08-25 15:57:48 +02:00
Amaury Denoyelle
61fc9568fb MINOR: server: move idle tree insert in a dedicated function
Define a new function _srv_add_idle(). This is a simple wrapper to
insert a connection in the server idle tree. This is reserved for simple
usage and requires the idle_conns lock to be held. In most cases,
srv_add_to_idle_list() should be used.

This patch does not have any functional change. However, it will help
with the next patch as idle connections will always be inserted in a
list as secondary storage along with the idle/safe trees.
2023-08-25 15:57:48 +02:00
Amaury Denoyelle
77ac8eb4a6 MINOR: connection: simplify removal of idle conns from their trees
Small change of API for conn_delete_from_tree(). Now the connection
instance is taken as argument instead of its inner node.

No functional change is introduced with this commit. It slightly
simplifies the invocation of conn_delete_from_tree(). The most useful
change is that this function will be extended in the next patch to be
able to remove the connection from its new idle list at the same time
as from its idle tree.
2023-08-25 15:57:48 +02:00
Frédéric Lécaille
81815a9a83 MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
Only one read access is needed, in the "show map" command callback,
cli_io_handler_map_lookup(), where a HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK()).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
2023-08-25 15:42:03 +02:00
Frédéric Lécaille
5fea59754b MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct ->head list
Replace, as much as possible, list_for_each*() loops on the ->head list
member of the pat_ref_elt struct by the use of its ->ebpt_root member,
which is an ebtree.
2023-08-25 15:42:01 +02:00
Frédéric Lécaille
745d1a269b MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
Store a pointer to the expression (struct pattern_expr) into the data structure
used to chain/store the map element references (struct pat_ref_elt), e.g. the
struct pattern_tree when stored into an ebtree or struct pattern_list when
chained to a list.

Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map
when looking for the <elt> element passed as parameter to retrieve the sample data
to be parsed. Indeed, thanks to the pointer added above to each pattern tree node
or list element, they can all be inspected directly from the <elt> passed as
parameter and its ->tree_head and ->list_head members: the pattern tree nodes are
stored into elt->tree_head, and the pattern list elements are chained to the
elt->list_head list. This inspection was also the job of pattern_find_smp(), which
is no longer useful. This patch removes this function.
2023-08-25 15:41:59 +02:00
Frédéric Lécaille
0844bed7d3 MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
Organize the references to the pattern elements of a map (struct pat_ref_elt) into an ebtree:
  - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
    element (pat_ref_elt struct),
  - modify the code to insert these nodes into their ebtrees each time they are
    allocated. This is done in pat_ref_append().

Note that the ->head member (struct list) of the map (struct pat_ref) could have
been removed but was not, because it is still necessary to dump the map contents
from the CLI in the order the map elements have been inserted.

This patch also modifies http_action_set_map(), which is the callback at least
used by the "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt()
is no longer ignored but, if not NULL, reused by pat_ref_set() as the first element
to look up from. The latter is also modified to use the ebtree attached to the map
in place of the ->head list attached to each map element (pat_ref_elt struct).

Also modify pat_ref_find_elt() to make it use the ->eb_root ebtree added to the
map by this patch in place of inspecting all the elements with a strcmp() call.
2023-08-25 15:41:56 +02:00
Willy Tarreau
ff9e653859 BUG/MINOR: ssl_sock: fix possible memory leak on OOM
That's the classical realloc() issue: if it returns NULL, the old area
is not freed but we erase the pointer. It was brought by commit e18d4e828
("BUG/MEDIUM: ssl: backend TLS resumption with sni and TLSv1.3"), and
should be backported where this commit was backported.
2023-08-25 14:32:50 +02:00
Aurelien DARRAGON
ee1891ccbe BUG/MINOR: hlua_fcn: potentially unsafe stktable_data_ptr usage
As reported by Coverity in GH #2253, stktable_data_ptr() usage in
hlua_stktable_dump() func is potentially unsafe because
stktable_data_ptr() may return NULL and the returned value is
dereferenced as-is without precautions.

In practice, this should not happen because some error checking was
already performed prior to calling stktable_data_ptr(). But since we're
using the safe stktable_data_ptr() function, all the error checking is
already done within the function, thus all we need to do is check ptr
against NULL instead to protect against NULL dereferences.

This should be backported to every stable version.
2023-08-25 11:52:43 +02:00
Amaury Denoyelle
6bd994d5d7 BUG/MINOR: h2: fix reverse if no timeout defined
h2c.task is not allocated in h2_init() if timeout client/server is not
defined, depending on the connection side. This caused a crash on
connection reverse due to the systematic requeuing of h2c.task in
h2_conn_reverse().

To fix this, check h2c.task in h2_conn_reverse(). If old timeout was
undefined but new one is, h2c.task must be allocated as it was not in
h2_init(). In the opposite situation, if the old timeout was defined and
the new one is not, h2c.task is freed. In this case, or if neither
timeout is defined, skip the task requeuing.

This bug is easily reproduced by using reverse bind or server with
undefined timeout client/server depending on the connection reverse
direction.

This bug has been introduced by reverse connect support.
No need to backport it.
2023-08-24 17:58:14 +02:00
Amaury Denoyelle
5053e89142 MEDIUM: h2: prevent stream opening before connection reverse completed
HTTP/2 demux must be handled with care for active reverse connection.
Until accept has been completed, it should be forbidden to handle a
HEADERS frame as the session is not yet ready to handle streams.

To implement this, use the flag H2_CF_DEM_TOOMANY which blocks demux
process. This flag is automatically set just after conn_reverse()
invocation. The flag is removed on rev_accept_conn() callback via a new
H2 ctl enum. H2 tasklet is woken up to restart demux process.

As a side-effect, reporting in H2 mux may be blocked as demux functions
are used to convert error status at the connection level with
CO_FL_ERROR. To ensure error is reported for a reverse connection, check
h2c_is_dead() specifically for this case in h2_wake(). This change also
has its own side-effect: h2c_is_dead() conditions have been adjusted to
always exclude the !h2c->conn->owner condition, which is always true for
reverse connections, or else the H2 mux may kill them unexpectedly.
2023-08-24 17:03:08 +02:00
Amaury Denoyelle
6820b9b393 MEDIUM: h2: implement active connection reversal
Implement active reverse on h2_conn_reverse().

Only minimal steps are done here : HTTP version session counters are
incremented on the listener instance. Also, the connection is inserted
in the mux_stopping_list to ensure it will be actively closed on process
shutdown/listener suspend.
2023-08-24 17:03:08 +02:00
Amaury Denoyelle
b130f8dbc3 MINOR: proto_reverse_connect: handle early error before reversal
An error can occur on a reverse connection before accept is completed.
In this case, no parent session can be notified. Instead, wake up the
receiver task on conn_create_mux().

As a counterpart to this, the receiver task is extended to match the
CO_FL_ERROR flag on the pending connection. In this case, the connection
is freed. The task is then requeued with a 1 second delay to start a new
reverse connection attempt.
2023-08-24 17:03:08 +02:00
Amaury Denoyelle
47f502df5e MEDIUM: proto_reverse_connect: bootstrap active reverse connection
Implement active reverse connection initialization. This is done through
a new task stored in the receiver structure. This task is instantiated
via bind callback and first woken up via enable callback.

Task handler is separated into two halves. On the first step, a new
connection is allocated and stored in <pend_conn> member of the
receiver. This new client connection will proceed to connect using the
server instance referenced in the bind_conf.

When connect has been successfully executed and the HTTP/2 connection is
ready for exchange after SETTINGS, the reverse_connect task is woken up.
As <pend_conn> is still set, the second half is executed, which only
executes listener_accept(). This will in turn execute the accept_conn
callback which is defined to return the pending connection.

The task is automatically requeued inside the accept_conn callback if
bind maxconn is not yet reached. This allows to specify how many
connections should be opened. Each connection is instantiated and
reversed serially, one by one, until maxconn is reached.

conn_free() has been modified to handle failure if a reverse connection
fails before being accepted. In this case, no session exists to notify
about the failure. Instead, the reverse_connect task is requeued with a
1 second delay, giving time to fix a possible network issue. This will
allow a new connection reverse attempt.

Note that for the moment connection rebinding after accept is disabled
for simplicity. Extra operations are required to migrate an existing
connection and its stack to a new thread which will be implemented
later.
2023-08-24 17:03:06 +02:00
Amaury Denoyelle
b781a1bb09 MINOR: connection: prepare init code paths for active reverse
When an active reverse connection is initialized, it has no stream-conn
attached to it, contrary to other backend connections. This forces us to
add extra checks on stream existence in conn_create_mux() and h2_init().

There are also extra checks required for session_accept_fd() after
reverse and accept are done. This is because, contrary to other frontend
connections, reversed connections have already initialized their mux and
transport layers. This forces us to skip the majority of the
session_accept_fd() initialization part.

Finally, if session_accept_fd() is interrupted due to an early error, a
reverse connection cannot be freed directly or else the mux will remain
alone. Instead, the mux destroy callback is used to free all connection
elements properly.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
0747e493a0 MINOR: proto_reverse_connect: parse rev@ addresses for bind
Implement parsing for "rev@" addresses on bind line. On config parsing,
server name is stored on the bind_conf.

Several new callbacks are defined on the reverse_connect protocol to
complete parsing. The listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure only the
HTTP/2 protocol is used by the server.
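
For illustration, a hypothetical configuration sketch using the syntax
described in this series (keyword names and details are assumptions and
may have evolved since, this is not authoritative):

  frontend fe_rev
      mode http
      # pre-open up to 10 reversable connections toward "edge1" from "be_edge"
      bind rev@be_edge/edge1 maxconn 10

  backend be_edge
      mode http
      server edge1 192.0.2.10:443 ssl verify none alpn h2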
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
008e8f67ee MINOR: connection: extend conn_reverse() for active reverse
Implement active reverse support inside conn_reverse(). This is used to
transfer the connection from the backend to the frontend side.

A new flag is defined CO_FL_REVERSED which is set just after this
transition. This will be used to identify connections which were
reversed but not yet accepted.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
5db6dde058 MINOR: proto: define dedicated protocol for active reverse connect
A new protocol named "reverse_connect" is created. This will be used to
instantiate connections that are opened by a reverse bind.

For the moment, only a minimal set of callbacks are defined with no real
work. This will be extended along the next patches.
2023-08-24 17:02:37 +02:00
Amaury Denoyelle
1723e21af2 MINOR: connection: use attach-srv name as SNI reuse parameter on reverse
On connection passive reverse from frontend to backend, its hash node is
calculated to be able to select it from the idle server pool. If the
attach-srv rule defined an associated name, it is reused as the value
for the SNI prehash.

This change allows a client to select a reverse connection by its name,
by configuring its server line with a matching SNI.
2023-08-24 17:02:34 +02:00
Amaury Denoyelle
0b3758e18f MINOR: tcp-act: define optional arg name for attach-srv
Add an optional argument 'name' for attach-srv rule. This contains an
expression which will be used as an identifier inside the server idle
pool after reversal. To match this connection for a future transfer
through the server, the SNI server parameter must match this name. If no
name is defined, match will only occur with an empty SNI value.

For the moment, only the parsing step is implemented. An extra check is
added to ensure that the reverse server uses SSL with an SNI. Indeed, if
a name is defined but the server does not use an SNI, connections will
never be selected for reuse after reversal due to a hash mismatch.
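
For illustration, a hypothetical sketch combining this rule with the
'@reverse' server described in this series (the exact syntax and sample
fetches are assumptions, not authoritative):

  frontend fe_devices
      mode http
      bind :8443 ssl crt dev.pem ca-file ca.pem verify required alpn h2
      # tag the reversed connection with the client certificate CN
      tcp-request session attach-srv be_devices/dev name ssl_c_s_dn(CN)

  backend be_devices
      mode http
      # pick a reversed connection whose name matches the requested host
      server dev @reverse ssl sni req.hdr(host) verify none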
2023-08-24 15:28:38 +02:00
Amaury Denoyelle
58cb76d7e1 MINOR: tcp-act: parse 'tcp-request attach-srv' session rule
Create a new tcp-request session rule 'attach-srv'.

The parsing handler is used to extract the server targeted with the
notation 'backend/server'. The server instance is stored in the act_rule
instance under the new union variant 'attach_srv'.

Extra checks are implemented in parsing to ensure attach-srv is only
used for proxies in HTTP mode and with listeners/servers with no
explicit protocol reference or HTTP/2 only.

The action handler itself is really simple. It assigns the stored server
instance to the 'reverse' member of the connection instance. It will be
used in a future patch to implement passive reverse-connect.
2023-08-24 15:02:32 +02:00
Amaury Denoyelle
6e428dfaf2 MINOR: backend: only allow reuse for reverse server
A reverse server relies solely on its pool of idle connections to
transfer requests, which will be populated through a new tcp-request
rule 'attach-srv'.

Several changes are required on connect_server() to implement this.
First, reuse mode is forced to "always" for this type of server. Then,
if no idle connection is found, the request will be aborted. This
results in a 503 HTTP error code, similarly to when no server is
available.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
e6223a3188 MINOR: server: define reverse-connect server
Implement reverse-connect server. This server type cannot instantiate
its own connection on transfer. Instead, it can only reuse connection
from its idle pool. These connections will be populated using the future
'tcp-request session attach-srv' rule.

A reverse-connect has no address. Instead, it uses a new custom server
notation with '@' character prefix. For the moment, only '@reverse' is
defined. An extra check is implemented to ensure the server is used in
an HTTP proxy.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
4fb538d4b6 MEDIUM: h2: reverse connection after SETTINGS reception
Reverse the connection after SETTINGS reception if it was set as
reversible. This operation is done in a new function h2_conn_reverse().
It regroups common changes which are needed for both reversal
directions: H2_CF_IS_BACK is set or unset and timeouts are inverted.

For the moment, only passive reverse is fully implemented. Once done,
the connection instance is directly inserted in its targeted server
pool. It can then be used immediately for future transfers using this
server.
2023-08-24 14:49:03 +02:00
Amaury Denoyelle
1f76b8ae07 MEDIUM: connection: implement passive reverse
Define a new method conn_reverse(). This method is used to reverse a
connection from frontend to backend or vice-versa depending on its
initial status.

For the moment, passive reverse only is implemented. This covers the
transition from frontend to backend side. The connection is detached
from its owner session which can then be freed. Then the connection is
linked to the server instance.

For now this is only used for passive connections on the frontend side,
to transfer them to the backend side. This requires freeing the
connection's session after detaching the connection from it.
2023-08-24 14:44:33 +02:00
Amaury Denoyelle
d8d9122a02 MINOR: connection: centralize init/deinit of backend elements
A connection contains extra elements which are only used for the backend
side. Regroup their allocation and deallocation in two new functions
named conn_backend_init() and conn_backend_deinit().

No functional change is introduced with this commit. The new functions
are reused in place of manual alloc/dealloc in conn_new() / conn_free().
This patch will be useful for reverse connect support with connection
conversion from backend to frontend side and vice-versa.
2023-08-24 14:44:33 +02:00
Amaury Denoyelle
fbe35afaa4 MINOR: proxy: simplify parsing 'backend/server'
Several CLI handlers use a server argument specified with the format
'<backend>/<server>'. The parsing of this argument is done in two
steps: first splitting the string on the '/' delimiter, then using
get_backend_server() to retrieve the server instance.

Refactor these code sections with the following changes:
* splitting is reimplemented using the ist API
* get_backend_server() is removed. Instead use the already existing
  proxy_be_by_name() then server_find_by_name(), whose code was
  duplicated by the now-removed function.

No functional change occurs with this commit. However, it will be useful
to add new configuration options reusing the same '<backend>/<server>'
for reverse connect.
2023-08-24 14:44:33 +02:00
Willy Tarreau
821fc95146 MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
If a pattern list is empty, there's no way we can find its elements in
the pattern cache, so let's avoid this expensive lookup. This can happen
for ACLs or maps loaded from files that may optionally be empty for
example. Doing so improves the request rate by roughly 10% for a single
such match for only 8 threads. That's normal because the LRU cache
pre-creates an entry that is about to be committed for the case the list
lookup succeeds after a miss, so we bypass all this.
2023-08-22 07:27:01 +02:00
William Lallemand
3fde27d980 BUG/MINOR: quic: ssl_quic_initial_ctx() uses error count not error code
ssl_quic_initial_ctx() is supposed to use the error count and not the
error code.

Bug was introduced by 557706b3 ("MINOR: quic: Initialize TLS contexts
for QUIC openssl wrapper").

No backport needed.
2023-08-21 15:35:17 +02:00
William Lallemand
8c004153e5 BUG/MINOR: quic: allow-0rtt warning must only be emitted with quic bind
When built with USE_QUIC_OPENSSL_COMPAT, a warning is emitted when using
allow-0rtt. However this warning is emitted for every allow-0rtt
keyword on the bind line, which is confusing; it must only be done in
case the bind is a QUIC one. Also this does not handle the case where
the allow-0rtt keyword is in the crt-list.

This patch moves the warning to ssl_quic_initial_ctx() in order to emit
the warning in all useful cases.
2023-08-21 15:33:26 +02:00
Frédéric Lécaille
2677dc1c32 MINOR: quic+openssl_compat: Emit an alert for "allow-0rtt" option
QUIC 0-RTT is not supported when haproxy is linked against a TLS stack
with limited QUIC support (OpenSSL).

Modify the "allow-0rtt" option callback to make it emit a warning if set on
a QUIC listener "bind" line.
2023-08-17 15:44:03 +02:00
Frédéric Lécaille
0e13325f23 MINOR: quic+openssl_compat: Do not start without "limited-quic"
Add a check for limited-quic in check_config_validity() when compiled
with USE_QUIC_OPENSSL_COMPAT so that we prevent a config from starting
accidentally with limited QUIC support. If a QUIC listener is found
when using the compatibility mode and limited-quic is not set, an error
message is reported explaining that the SSL library is not compatible
and proposing the user to enable limited-quic if that's what they want,
and the startup fails.

This partially reverts commit 7c730803d ("MINOR: quic: Warning for
OpenSSL wrapper QUIC bindings without "limited-quic"") since a warning
was not sufficient.
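
For illustration, the kind of configuration that now has to acknowledge
the limitation explicitly (paths and addresses are placeholders):

  global
      # acknowledge the lack of 0-RTT support of the compatibility mode
      limited-quic

  frontend fe_h3
      mode http
      bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3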
2023-08-17 15:44:03 +02:00
Willy Tarreau
544c2f2d9e MINOR: pools: use EBO to wait for unlock during pool_flush()
pool_flush() could become a source of contention on the pool's free list
if there are many competing threads using that pool. Let's make sure we
use EBO and not just a simple CPU relaxation there, to avoid disturbing
them.
2023-08-17 09:09:20 +02:00
Aurelien DARRAGON
7eb05891d8 BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
Following the previous commit's logic, we enable the use of sc-add-gpc
from tcp-request connection: support was probably forgotten in the first
place for sc-set-gpt0, and since sc-add-gpc was inspired by it, it lacks
it as well.

As sc-add-gpc was implemented in 5a72d03a58 ("MINOR: stick-table: implement
the sc-add-gpc() action"), this should only be backported to 2.8
2023-08-14 09:03:49 +02:00
Aurelien DARRAGON
6c79309fda BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
Both the documentation and original developer intents seem to suggest
that sc-set-gpt/sc-set-gpt0 actions should be available from
tcp-request connection.

Yet because it was probably forgotten when expr support was added to
sc-set-gpt0 in 0d7712dff0 ("MINOR: stick-table: allow sc-set-gpt0 to
set value from an expression") it doesn't work and will report this
kind of error:
 "internal error, unexpected rule->from=0, please report this bug!"

Fixing the code to comply with the documentation and the expected
behavior.

This must be backported to every stable version.

[for < 2.5, as only sc-set-gpt0 existed back then, the patch must be
manually applied to skip irrelevant parts]
2023-08-14 09:03:44 +02:00
Willy Tarreau
2d18717fb8 BUILD: pools: fix build error on clang with inline vs forceinline
clang is more picky than gcc regarding duplicate "inline". The functions
declared with "forceinline" don't need to have "inline" since it's already
in the macro.
2023-08-12 19:58:17 +02:00
Willy Tarreau
29eed99b50 MINOR: pools: make pool_evict_last_items() use pool_put_to_os_no_dec()
The bucket is already known, no need to calculate it again. Let's just
include the lower level functions.
2023-08-12 19:04:34 +02:00
Willy Tarreau
7bf829ace1 MAJOR: pools: move the shared pool's free_list over multiple buckets
This aims at further reducing the contention on the free_list when using
global pools. The free_list pointer now appears for each bucket, and both
the alloc and the release code skip to a next bucket when ending on a
contended entry. The default entry used for allocations and releases
depend on the thread ID so that locality is preserved as much as possible
under low contention.

It would be nice to improve the situation to make sure that releases to
the shared pools don't consider the first entry's pointer but only an
argument that would be passed and that would correspond to the bucket in
the thread's cache. This would reduce computations and make sure that the
shared cache only contains items whose pointers match the same bucket.
This was not yet done. One possibility could be to keep the same splitting
in the local cache.

With this change, an h2load test with 5 * 160 conns & 40 streams on 80
threads that was limited to 368k RPS with the shared cache jumped to
3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets
and 5.5M RPS for 64 buckets.
2023-08-12 19:04:34 +02:00
Willy Tarreau
8a0b5f783b MINOR: pools: move the failed allocation counter over a few buckets
The failed allocation counter cannot depend on a pointer, but since it's
a perpetually increasing counter and not a gauge, we don't care where
it's incremented. Thus instead we're hashing on the TID. There's no
contention there anyway, but it's better not to waste the room in
the pool's heads and to move that with the other counters.
2023-08-12 19:04:34 +02:00
Willy Tarreau
da6999f839 MEDIUM: pools: move the needed_avg counter over a few buckets
That's the same principle as for ->allocated and ->used. Here we return
the sum of the raw values, so the result still needs to be fed to
swrate_avg(). It also means that we now use the local ->used instead
of the global one for the calculations and do not need to call pool_used()
anymore on fast paths. The number of samples should likely be divided by
the number of buckets, but that's not done yet (better observe first).

A function pool_needed_avg() was added to report aggregated values for
the "show pools" command.

With this change, an h2load made of 5 * 160 conn * 40 streams on 80
threads raised from 1.5M RPS to 6.7M RPS.
2023-08-12 19:04:34 +02:00
Willy Tarreau
9e5eb586b1 MEDIUM: pools: move the used counter over a few buckets
That's the same principle as for ->allocated. The small difference here
is that it's no longer possible to decrement ->used in batches when
releasing clusters from the cache to the shared cache, so the counter
has to be decremented for each of them. But as it provides less
contention and it's done only during forced eviction, it shouldn't be
a problem.

A function "pool_used()" was added to return the sum of the entries.
It's used by pool_alloc_nocache() and pool_free_nocache() which need
to count the number of used entries. It's not a problem since such
operations are done when picking/releasing objects to/from the OS,
but it is a reminder that the number of buckets should remain small.

With this change, an h2load test made of 5 * 160 conn * 40 streams on
80 threads raised from 812k RPS to 1.5M RPS.
2023-08-12 19:04:34 +02:00
Willy Tarreau
cdb711e42b MEDIUM: pools: spread the allocated counter over a few buckets
The ->used counter is one of the most stressed, and it heavily
depends on the ->allocated one, so let's first move ->allocated
to a few buckets.

A function "pool_allocated()" was added to return the sum of the entries.
It's important not to abuse it as it does iterate, so everywhere it's
possible to avoid it by keeping a local counter, it's better. Currently
it's used for limited pools which need to make sure they do not allocate
too many objects. That's an acceptable tradeoff to save CPU on large
machines at the expense of spending a little bit more on small ones which
normally are not under load.
2023-08-12 19:04:34 +02:00
Willy Tarreau
06885aaea7 MINOR: pools: introduce the use of multiple buckets
On many threads and without the shared cache, there can be extreme
contention on the ->allocated counter, the ->free_list pointer, and
the ->used counter. It's possible to limit this contention by spreading
the counters a little bit over multiple entries, that are summed up when
a consultation is needed. The criterion used to spread the values cannot
be related to the thread ID due to migrations, since we need to keep
consistent stats (allocated vs used).

Instead we'll just hash the pointer, it provides an index that does the
job and that is consistent for the object. When having just a few entries
(16 here as it showed almost identical performance between global and
non-global pools) even iterations should be short enough during
measurements to not be a problem.

A pair of functions designed to ease pointer hash bucket calculation were
added, with one of them doing it for thread IDs because allocation failures
will be associated with a thread and not a pointer.

For now this patch only brings in the relevant parts of the infrastructure,
the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512
threads or more are supported, 5 bits when 128 or more are supported, 4
bits when 16 or more are supported, otherwise 3 bits for small setups.
The array in the pool_head and the two utility functions are already
added. It should have no measurable impact beyond inflating the pool_head
structure.
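
For illustration, a minimal sketch of the principle (the constants, the
mixing function and the atomicity details are illustrative, not HAProxy's
exact code): writers hash the object's pointer to pick a slot, readers
sum all slots.

  #include <stdint.h>

  #define POOL_BUCKETS_BITS 4
  #define POOL_BUCKETS      (1U << POOL_BUCKETS_BITS)

  struct pool_ctr {
      unsigned long allocated[POOL_BUCKETS];
  };

  /* spread updates according to the object's address */
  static inline unsigned int ptr_to_bucket(const void *ptr)
  {
      uintptr_t v = (uintptr_t)ptr;
      v ^= v >> 7;                      /* cheap mixing of the address bits */
      v ^= v >> 17;
      return (unsigned int)(v & (POOL_BUCKETS - 1));
  }

  static inline void pool_inc_allocated(struct pool_ctr *p, const void *obj)
  {
      p->allocated[ptr_to_bucket(obj)]++;   /* an atomic add in practice */
  }

  /* consultation iterates, hence the need to keep the bucket count small */
  static inline unsigned long pool_allocated(const struct pool_ctr *p)
  {
      unsigned long sum = 0;
      for (unsigned int i = 0; i < POOL_BUCKETS; i++)
          sum += p->allocated[i];
      return sum;
  }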
2023-08-12 19:04:34 +02:00
Willy Tarreau
29ad61fb00 OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated
The pool's allocation counter doesn't strictly require to be updated
from these functions, it may more efficiently be done in the caller
(even out of a loop for pool_flush() and pool_gc()), and doing so will
also help us spread the counters over an array later. The functions
were renamed _noinc and _nodec to make sure we catch any possible
user in an external patch. If needed, the original functions may easily
be reimplemented in an inline function.
2023-08-12 19:04:34 +02:00
Willy Tarreau
feeda4132b OPTIM: pools: use exponential back-off on shared pool allocation/release
Running a stick-table stress with -dMglobal under 56 threads shows
extreme contention on the pool's free_list because it has to be
processed in two phases, with only a simple cpu_relax() used on the
retry path.

Let's at least implement exponential back-off here to limit the neighbor's
noise and reduce the time needed to successfully acquire the pointer. Just
doing so shows there's still contention but almost doubled the performance,
from 1.1 to 2.1M req/s.
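
For illustration, a minimal C11 sketch of an exponential back-off retry
loop around a contended atomic exchange (HAProxy uses plock's primitives;
the pause stand-in and the cap below are assumptions):

  #include <stdatomic.h>

  static inline void cpu_relax_once(void)
  {
      __asm__ __volatile__("" ::: "memory");   /* stand-in for a real pause */
  }

  /* take ownership of a shared free list by swapping NULL in */
  static void *grab_free_list(_Atomic(void *) *head)
  {
      unsigned int wait = 1;
      void *cur = atomic_load_explicit(head, memory_order_acquire);

      while (!atomic_compare_exchange_weak(head, &cur, NULL)) {
          for (unsigned int i = 0; i < wait; i++)
              cpu_relax_once();               /* back off before retrying */
          if (wait < 1024)
              wait <<= 1;                     /* exponential growth, capped */
      }
      return cur;                             /* previous list contents */
  }

The point is simply that each failed attempt waits a bit longer, so
competing threads spread out instead of hammering the same cache line.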
2023-08-12 19:04:34 +02:00
Willy Tarreau
45eeaad45f MEDIUM: peers: drop the stick-table lock before entering peer_send_teachmsgs()
The function drops the lock very early, and the only operations that
are performed on the entry code are updating the current peer's
last_local_table, which doesn't need to be protected. Thus it's
easier to drop the lock before entering the function and it further
limits its scope.

This has raised the peak RPS from 2050 to 2355k/s with a peers section on
the 80-core machine.
2023-08-11 19:03:35 +02:00
Willy Tarreau
cfeca3a3a3 MEDIUM: stick-table: touch updates under an upgradable read lock
Instead of taking the update's write lock in stktable_touch_with_exp(),
while most of the time under high load there is nothing to update because
the entry is touched before having been synchronized present, let's do
the check under a read lock and upgrade it to perform the update if
needed. These updates are rare and the contention is not expected to be
very high, so at the first failure to upgrade we retry directly with a
write lock.

By doing so the performance has almost doubled again, from 1140 to 2050k
with a peers section enabled. The contention is now on taking the read
lock itself, so there's little to be gained beyond this in this function.
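
For illustration, the shape of the pattern with plain pthread locks
(pthread rwlocks cannot be upgraded, so this sketch drops and re-takes
the lock, which matches the fallback described above; the HA_RWLOCK API
used by HAProxy differs):

  #include <pthread.h>

  struct table {
      pthread_rwlock_t updt_lock;
      int              dirty;     /* stand-in for "an update is really needed" */
  };

  static void touch(struct table *t)
  {
      pthread_rwlock_rdlock(&t->updt_lock);
      int todo = t->dirty;                    /* cheap check under the read lock */
      pthread_rwlock_unlock(&t->updt_lock);

      if (!todo)
          return;                             /* frequent case under load */

      pthread_rwlock_wrlock(&t->updt_lock);   /* rare case: write lock */
      if (t->dirty) {                         /* re-check, state may have changed */
          /* ... perform the update ... */
          t->dirty = 0;
      }
      pthread_rwlock_unlock(&t->updt_lock);
  }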
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside from the updates tree and the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. This required adding an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now have to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
2023-08-11 19:03:35 +02:00
Willy Tarreau
29982ea769 MEDIUM: peers: only read-lock peer_send_teachmsgs()
This function doesn't need to be write-locked. It performs a lookup
of the next update at its index, atomically updates the ref_cnt on
the stksess, updates some shared_table fields on the local thread,
and updates the table's commitupdate. Now that this update is atomic
we don't need to keep the write lock during that period. In addition
this function's callers do not rely on the write lock to be held
either since it was dropped during peer_send_updatemsg() anyway.

Now, when the function is entered with a write lock, it's downgraded
to a read lock, otherwise a read lock is grabbed. Updates are looked
up under the read lock and the message is sent without the lock. The
commitupdate is still performed under the read lock (so as not to
break the code too much), and the write lock is re-acquired when
leaving if needed. This allows multiple peers to look up updates in
parallel and to avoid stalling stick-table lookups.
2023-08-11 19:03:35 +02:00
Willy Tarreau
d4f8286e45 MEDIUM: peers: drop then re-acquire the wrlock in peer_send_teachmsgs()
This function maintains the write lock for a while. In practice it does
not need to hold it that long, and some parts could be performed under a
read lock. This patch first drops then re-acquires the write lock at the
function's entry. The purpose is simply to break the end-to-end atomicity
to prove that it has no impact in case something needs to be bisected
later. In fact the write lock is already dropped while calling
peer_send_updatemsg().
2023-08-11 19:03:35 +02:00
Willy Tarreau
4eddf26f58 MEDIUM: peers: update ->commitupdate out of the lock using a CAS
The ->commitupdate index doesn't need to be kept consistent with other
operations, it only needs to be correct and to reflect the last known
value. Right now it's updated under the stick-table lock, which is
expensive and maintains this lock longer than needed. Let's move it
outside of the lock, and update it using a CAS. This patch simply
replaces the assignment with a CAS and makes sure all reads are atomic.
On failed CAS we use a simple cpu_relax(), no need for more as there
should not be that much contention here (updates are not that fast).
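
For illustration, a minimal sketch of such a lock-free update (the
forward-only check and the names are illustrative, not the exact
stick-table fields):

  #include <stdatomic.h>

  static _Atomic unsigned int commitupdate;

  static void commit_index(unsigned int new_idx)
  {
      unsigned int cur = atomic_load_explicit(&commitupdate, memory_order_relaxed);

      /* only move forward; on CAS failure <cur> is refreshed and re-checked */
      while ((int)(new_idx - cur) > 0 &&
             !atomic_compare_exchange_weak(&commitupdate, &cur, new_idx)) {
          ;   /* cpu_relax() here in HAProxy */
      }
  }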
2023-08-11 19:03:35 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there, we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
73b1dea4d1 MINOR: stick-table: move the task_wakeup() call outside of the lock
The write lock in stktable_touch_with_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_wakeup() so let's move it out.

On a 80-thread machine with a peers section, the request rate increased
from 397k to 415k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
322e4ab9d2 MINOR: stick-table: move the task_queue() call outside of the lock
The write lock in stktable_requeue_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_queue() so let's move it out.

On a 80-thread machine with a peers section, the request rate increased
from 368k to 397k rps.
2023-08-11 19:03:35 +02:00
Aurelien DARRAGON
09133860bf BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread
Michel Mayen reported that mixing lua actions loaded from 'lua-load'
and 'lua-load-per-thread' directives within a single http/tcp session
yields unexpected results.

When executing action defined in another running context from the one of
the previously executed action (from lua-load, then from
lua-load-per-thread or the opposite, order doesn't matter), it would yield
this kind of error:

"Lua function 'name': [state-id x] runtime error: attempt to call a nil value from ."

He also noted that when loading all actions using the same loading
directive, the issue is gone.

This is due to the fact that for lua actions, fetches and converters, lua
code is being executed from the stream lua context. However, the stream
lua context, which is created on the fly when first executing some lua
code related to the stream, is reused between multiple lua executions.

But the thing is, despite successive executions referring to the same
parent "stream" (which is also assigned to a given thread id), they don't
necessarily depend on the same running context from a lua point of view.

Indeed, since the function which is about to be executed could have been
loaded from either 'lua-load' or 'lua-load-per-thread', the function
declaration and related dependencies are defined in a specific stack ID
which is known by calling fcn_ref_to_stack_id() on the given function.

Thus, in order to make streams capable of chaining lua actions, fetches and
converters loaded in different lua stacks, we add a new detection logic
in hlua_stream_ctx_prepare() to be able to recreate the lua context in the
proper stack space when the existing one conflicts with the expected stack
id.

This must be backported to every stable version.

It depends on:
 - "MINOR: hlua: add hlua_stream_prepare helper function"
   [for < 2.5, skip the filter part since they didn't exist]

[wt: warning, wait a little bit before backporting too far, we
 need to be certain the added BUG_ON() will never trigger]
2023-08-11 19:02:59 +02:00
Aurelien DARRAGON
2fdb9d41b3 MINOR: hlua: add hlua_stream_ctx_prepare helper function
Stream-dedicated hlua ctx creation and attachment is now performed in
hlua_stream_ctx_prepare() helper function to ease code maintenance.

No functional behavior change should be expected.
2023-08-11 19:00:57 +02:00
Aurelien DARRAGON
12cf8d4db7 BUG/MINOR: hlua: fix invalid use of lua_pop on error paths
Multiple error paths made invalid use of lua_pop():

When the stack is emptied using lua_settop(0), lua_pop() (which is
implemented as a lua_settop() macro) should not be used right after,
because it could lead to invalid reads since the stack is already empty.

Unfortunately, some remnants from initial lua stack implementation kept
doing so, resulting in haproxy crashes on some lua runtime error paths
from time to time (ie: ERRRUN, ERRMEM).

Moreover, the extra lua_pop() instruction, even if it was safe, is totally
pointless in such a case.

Removing such unsafe lua_pop() statements when we know that the stack is
already empty.

This must be backported to every stable version.
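
For illustration, the invalid pattern being removed, using the standard
Lua C API (the function name is only for the example):

  #include <lua.h>

  static void error_path_cleanup(lua_State *L)
  {
      lua_settop(L, 0);   /* empties the stack: nothing else is needed */
      /* lua_pop(L, 1);      would pop below the bottom of the now-empty stack */
  }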
2023-08-11 19:00:55 +02:00
Amaury Denoyelle
7f80d51812 BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing
It is possible to trigger a loop of tasklets calls if a QUIC connection
is interrupted abruptly by the client. This is caused by the following
interaction :
* FD iocb is woken up for read. This causes a wakeup on quic_conn
  tasklet.
* quic_conn_io_cb is run and tries to read but fails as the connection
  socket is closed (typically with a ECONNREFUSED). FD read is
  subscribed to the poller via qc_rcv_buf() which will cause the loop.

The looping will stop automatically once the idle-timeout is expired and
the connection instance is finally released.

To fix this, ensure FD read is subscribed only for transient error cases
(EAGAIN or similar). All other cases are considered as fatal and thus
all future read operations will fail. Note that for the moment, nothing
is reported on the quic_conn which may not skip future reception. This
should be improved in a future commit to accelerate connection closing.

This bug can be reproduced frequently by interrupting the
following command. Quic traces should be activated on haproxy side to
detect the loop :
$ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \
  --session-file=/tmp/ngtcp2-session.txt \
  -r 0.3 -t 0.3 --exit-on-all-streams-close 127.0.0.1 20443 \
  "http://127.0.0.1:20443/?s=1024"

This must be backported up to 2.7.
2023-08-11 17:04:20 +02:00
Frédéric Lécaille
d355bce7e4 BUG/MINOR: quic: Missing tasklet (quic_cc_conn_io_cb) memory release (leak)
The tasklet responsible of handling the remaining QUIC connection object
and its traffic was not released, leading to a memory leak. Furthermore its
callback, quic_cc_conn_io_cb(), should return NULL after this tasklet is
released.
2023-08-11 11:43:19 +02:00
Frédéric Lécaille
b0e32c6263 BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands
->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept
by the remaining QUIC connection object (struct quic_cc_conn) after having
released the previous one (struct quic_conn) to allow "show fd/sess" commands
to be functional without causing haproxy crashes.

No need to backport.
2023-08-11 11:21:31 +02:00
Frédéric Lécaille
5d602f4f33 MINOR: quic: Add a trace for QUIC conn fd ready for receive
Add a trace as this is done for the "send ready" fd state.
2023-08-11 08:57:47 +02:00
Frédéric Lécaille
f3edbc792e BUG/MINOR: quic: Possible crash in quic_cc_conn_io_cb() traces.
Reset the local cc_qc and qc after having released cc_qc. Note that
cc_qc == qc. This is required to prevent haproxy from crashing
when TRACE_LEAVE() is called.

No need to backport.
2023-08-10 17:21:19 +02:00
Frédéric Lécaille
1ab6126ca7 BUG/MINOR: quic: mux started when releasing quic_conn
There are cases where the mux is started before the handshake is completed:
during 0-RTT sessions.

So, it was a bad idea to try to release the quic_conn object from quic_conn_io_cb()
without checking if the mux is started.

No need to backport.
2023-08-10 17:15:21 +02:00
Willy Tarreau
d39a9cbd13 BUILD: mux-h1: shut a build warning on clang from previous commit
Commit 5201b4abd ("BUG/MEDIUM: mux-h1: do not forget EOH even when no
header is sent") introduced a build warning on clang due to the remaining
two parenthesis in the expression. Let's fix this. No backport needed.
2023-08-09 16:03:39 +02:00
Willy Tarreau
5201b4abd1 BUG/MEDIUM: mux-h1: do not forget EOH even when no header is sent
Since commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make
code more readable"), outgoing H1 requests with no header at all (i.e.
essentially HTTP/1.0 requests) get delayed by 200ms. Christopher found
that it's due to the fact that we end processing too early and we don't
have the opportunity to send the EOH in this case.

This fix addresses it by verifying if it's required to emit EOH when
returning from h1_make_headers(). But maybe that block could be moved
after the while loop in fact, or the stop condition in the loop be
revisited not to stop on !htx_is_empty(). The current solution gets the
job done at least.

No backport is needed, this was in 2.9-dev.
2023-08-09 11:58:15 +02:00
Willy Tarreau
949371a00d BUG/MEDIUM: mux-h1: fix incorrect state checking in h1_process_mux()
That's a regression introduced in 2.9-dev by commit 723c73f8a ("MEDIUM:
mux-h1: Split h1_process_mux() to make code more readable") and found
by Christopher. The consequence is uncertain but the test definitely was
not right in that it would catch most existing states (H1_MSG_DONE=30).
At least it would emit too many "H1 request fully xferred".

No backport needed.
2023-08-09 11:51:58 +02:00
Willy Tarreau
22731762d9 BUG/MINOR: http: skip leading zeroes in content-length values
Ben Kallus also noticed that we preserve leading zeroes on content-length
values. While this is totally valid, it would be safer to at least trim
them before passing the value, because a bogus server written to parse
using "strtol(value, NULL, 0)" could inadvertently take a leading zero
as a prefix for an octal value. While there is not much that can be done
to protect such servers in general (e.g. lack of check for overflows etc),
at least it's quite cheap to make sure the transmitted value is normalized
and not taken for an octal one.

This is not really a bug, rather a missed opportunity to sanitize the
input, but is marked as a bug so that we don't forget to backport it to
stable branches.

A combined regtest was added to h1or2_to_h1c which already validates
end-to-end syntax consistency on aggregate headers.
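
For illustration, the kind of confusion a leading zero can cause on a
naive server parsing the value with base 0 (standard C, runnable as-is):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      printf("%ld\n", strtol("010", NULL, 0));    /* prints 8: octal prefix */
      printf("%ld\n", strtol("010", NULL, 10));   /* prints 10 */
      return 0;
  }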
2023-08-09 11:28:48 +02:00
Willy Tarreau
6492f1f29d BUG/MAJOR: http: reject any empty content-length header value
The content-length header parser has its dedicated function, in order
to take extreme care about invalid, unparsable, or conflicting values.
But there's a corner case in it, by which it stops comparing values
when reaching the end of the header. This has for a side effect that
an empty value or a value that ends with a comma does not deserve
further analysis, and it acts as if the header was absent.

While this is not necessarily a problem for the value ending with a
comma as it will cause header folding and will disappear, it is a
problem for the first isolated empty header because this one will not
be reconstructed when next ones are seen, and will be passed as-is to the
backend server. A vulnerable HTTP/1 server hosted behind haproxy that
would just use this first value as "0" and ignore the valid one would
then not be protected by haproxy and could be attacked this way, taking
the payload for an extra request.

In field the risk depends on the server. Most commonly used servers
already have safe content-length parsers, but users relying on haproxy
to protect a known-vulnerable server might be at risk (and the risk of
a bug even in a reputable server should never be dismissed).

A configuration-based work-around consists in adding the following rule
in the frontend, to explicitly reject requests featuring an empty
content-length header that would have not be folded into an existing
one:

    http-request deny if { hdr_len(content-length) 0 }

The real fix consists in adjusting the parser so that it always expects a
value at the beginning of the header or after a comma. It will now reject
requests and responses having empty values anywhere in the C-L header.

This needs to be backported to all supported versions. Note that the
modification was made to functions h1_parse_cont_len_header() and
http_parse_cont_len_header(). Prior to 2.8 the latter was in
h2_parse_cont_len_header(). One day the two should be merged but the
former is also used by Lua.

The HTTP messaging reg-tests were completed to test these cases.

Thanks to Ben Kallus of Dartmouth College and Narf Industries for
reporting this! (this is in GH #2237).
2023-08-09 09:27:38 +02:00
Willy Tarreau
2e97857a84 BUG/MINOR: h3: reject more chars from the :path pseudo header
This is the h3 version of this previous fix:

   BUG/MINOR: h2: reject more chars from the :path pseudo header

In addition to the current NUL/CR/LF, this will also reject all other
control chars, the space and '#' from the :path pseudo-header, to avoid
taking the '#' for a part of the path. It's still possible to fall back
to the previous behavior using "option accept-invalid-http-request".

Here the :path header value is scanned a second time to look for
forbidden chars because we don't know upfront if we're dealing with a
path header field or another one. This is no big deal anyway for now.

This should be progressively backported to 2.6, along with the
following commits it relies on (the same as for h2):

   REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
   REORG: http: move has_forbidden_char() from h2.c to http.h
   MINOR: ist: add new function ist_find_range() to find a character range
   MINOR: http: add new function http_path_has_forbidden_char()
2023-08-08 19:56:41 +02:00
Willy Tarreau
b3119d4fb4 BUG/MINOR: h2: reject more chars from the :path pseudo header
This is the h2 version of this previous fix:

    BUG/MINOR: h1: do not accept '#' as part of the URI component

In addition to the current NUL/CR/LF, this will also reject all other
control chars, the space and '#' from the :path pseudo-header, to avoid
taking the '#' for a part of the path. It's still possible to fall back
to the previous behavior using "option accept-invalid-http-request".

This patch modifies the request parser to change the ":path" pseudo header
validation function with a new one that rejects 0x00-0x1F (control chars),
space and '#'. This way such chars will be dropped early in the chain, and
the search for '#' doesn't incur a second pass over the header's value.

This should be progressively backported to stable versions, along with the
following commits it relies on:

     REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
     REORG: http: move has_forbidden_char() from h2.c to http.h
     MINOR: ist: add new function ist_find_range() to find a character range
     MINOR: http: add new function http_path_has_forbidden_char()
     MINOR: h2: pass accept-invalid-http-request down the request parser
2023-08-08 19:56:41 +02:00
Willy Tarreau
2eab6d3543 BUG/MINOR: h1: do not accept '#' as part of the URI component
Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:

    use_backend static if { path_end .png .jpg .gif .css .js }

Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".

The problem is that while the RFC says that this '#' must never be
emitted, as often it doesn't suggest how servers should handle it. A
diminishing number of servers still do accept it and trim it silently,
while others are rejecting it, as indicated in the conversation below
with other implementers:

   https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html

Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.

Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".

We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.

In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it also conditioned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs. Here this information is available through
h1m->err_pos that's set to -2 when the option is here (so we don't need
to change the API to expose the proxy). Example with a trivial GET
through netcat:

  [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
    backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
    buffer starts at 0 (including 0 out), 16361 free,
    len 23, wraps at 16336, error at position 7
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET /aa#bb HTTP/1.0\r\n
    00021  \r\n

This should be progressively backported to all stable versions along with
the following patch:

    REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests

Similar fixes for h2 and h3 will come in followup patches.

Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.
2023-08-08 19:56:11 +02:00
Willy Tarreau
d93a00861d MINOR: h2: pass accept-invalid-http-request down the request parser
We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal
will be to add deactivable checks that are still desirable to have by
default. For now no test is subject to it.
2023-08-08 19:10:54 +02:00
Willy Tarreau
db97bb42d9 MINOR: mux-h2/traces: also suggest invalid header upon parsing error
Historically the parsing error used to apply only to too large headers,
so this is what has been reported in traces. But nowadays we can also
reject invalid characters, and when this happens the trace is a bit
misleading, so let's mention "or invalid".
2023-08-08 19:02:24 +02:00
Willy Tarreau
d13a80abb7 BUG/MAJOR: h3: reject header values containing invalid chars
In practice it's exactly the same for h3 as 54f53ef7c ("BUG/MAJOR: h2:
reject header values containing invalid chars") was for h2: we must
make sure never to accept NUL/CR/LF in any header value because this
may be used to construct split headers on the backend. Hence we
apply the same solution. Here pseudo-headers, headers and trailers are
checked separately, which explains why we have 3 locations instead of
2 for h2 (+1 for response which we don't have here).

This is marked major for consistency and due to the impact if abused,
but the reality is that at the time of writing, this problem is limited
by the scarcity of the tools which would permit to build such a request
in the first place. But this may change over time.

This must be backported to 2.6. This depends on the following commit
that exposes the filtering function:

    REORG: http: move has_forbidden_char() from h2.c to http.h
2023-08-08 19:02:24 +02:00
Willy Tarreau
d4069f3cee REORG: http: move has_forbidden_char() from h2.c to http.h
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
2023-08-08 19:02:24 +02:00
Frédéric Lécaille
7c730803dc MINOR: quic: Warning for OpenSSL wrapper QUIC bindings without "limited-quic"
If the "limited-quic" globale option wa not set, the QUIC listener bindings were
not bound, this is ok, but silently ignored.

Add a warning in these cases to ask the user to explicitely enable the QUIC
bindings when building QUIC support against a TLS/SSL library without QUIC
support (OpenSSL).
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
ab95230200 MINOR: quic: Release asap quic_conn memory from ->close() xprt callback.
Add a condition to release the quic_conn memory as soon as possible, when the
connection is in "connection close" state, from the ->close() QUIC xprt callback.
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
b930ff03d6 MINOR: quic: Release asap quic_conn memory (application level)
Add a check to the QUIC packet handler running at application level (after the
handshake is complete) to release the quic_conn memory calling quic_conn_release().
This is done only if the mux is not started.
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
9f7cfb0a56 MEDIUM: quic: Allow the quic_conn memory to be asap released.
When the connection enters the "connection closing" state after having sent
a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory
may be freed from quic_conn objects (QUIC connection). This is done by allocating
a reduced-size object which keeps enough information to handle the remaining
incoming packets for the connection in "connection closing" state, and to
keep sending the previous datagram with CONNECTION_CLOSE frames inside,
which has already been sent.

Define a new quic_cc_conn struct which represents the connection object after
entering the "connection close" state and after having release the quic_conn
connection object.

Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects.

Define the QUIC_CONN_COMMON structure which is shared between the quic_conn struct
object (the connection before entering the "connection close" state) and the new
quic_cc_conn struct object (the connection after entering "connection close"). So, all the
members inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.
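
As a hedged illustration of this shared-prefix idea (field names and layout
below are invented for the example, not the actual HAProxy definitions):

  /* Both structs start with the same anonymous struct (C11), so the common
   * members can be read through either pointer type.
   */
  #define QUIC_CONN_COMMON_SKETCH              \
          struct {                             \
                  unsigned int flags;          \
                  unsigned long long rx_bytes; \
                  unsigned long long tx_bytes; \
          }

  struct quic_conn_sketch {
          QUIC_CONN_COMMON_SKETCH;
          /* ... full connection state used before "connection close" ... */
  };

  struct quic_cc_conn_sketch {
          QUIC_CONN_COMMON_SKETCH;
          /* ... just enough to keep replying with CONNECTION_CLOSE ... */
  };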

Implement the qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible for copying the
required information from the original connection (quic_conn) to the remaining
connection (quic_cc_conn). Among other initializations, it redefines the
QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task
to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received packets and
resends the datagram with the CONNECTION_CLOSE frame which has already been sent
when entering "connection close" state. qc_cc_idle_timer_task() only releases
the remaining quic_cc_conn struct object.

Modify quic_conn_release() to allocate quic_cc_conn struct objects from the
original connection passed as argument. It does nothing if this original
connection is not in closing state, or if the idle timer has already expired.

Implement quic_release_cc_conn() to release a "connection close" connection.
It is called when its timer expires, or if an error occurred when sending
a packet from this connection while the peer is no longer reachable.
2023-08-08 14:59:17 +02:00
Frédéric Lécaille
276697438d MINOR: quic: Use a pool for the connection ID tree.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace the ->cids member of quic_conn objects by a pointer allocated from
"quic_cids" and adapt the code accordingly. Nothing special.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
dc9b8e1f27 MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer.
Add a new pool <pool_head_quic_cc_buf> for the buffers used when building datagrams
with CONNECTION_CLOSE frames inside, with QUIC_MIN_CC_PKTSIZE (128) as minimum
size.
Add ->cc_buf_area to quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close" datagrams
and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member
value.
Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate
a struct "quic_cc_buf" buffer when the connection needs an immediate close
or a buffer struct if not.
Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such
"quic_cc_buf" buffer when an immediate close is required.
2023-08-08 10:57:00 +02:00
Frédéric Lécaille
f7ab5918d1 MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct
Move the rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct members to
a bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep members respectively).
They are moved before being defined into a bytes anonymous struct common to
a future struct to be defined.

Consequently adapt the code.
2023-08-07 18:57:45 +02:00
Frédéric Lécaille
a45f90dd4e MINOR: quic: Amplification limit handling sanitization.
Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit
is respected when it returns false (0), i.e. when the connection is not validated.

Implement quic_may_send_bytes() which returns the number of bytes which may be
sent when the connection has not already been validated, and call this function
at several places when this is the case (after having called
quic_peer_validated_addr()).
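
For reference, a rough sketch of the anti-amplification rule this relies on
(RFC 9000 limits an unvalidated server to three times the bytes received from
the client; the function and field names below are illustrative, not the real
quic_conn members):

  /* Sketch: bytes still allowed to be sent before address validation */
  static unsigned long long may_send_bytes_sketch(unsigned long long rx_bytes,
                                                  unsigned long long tx_bytes)
  {
          unsigned long long limit = 3 * rx_bytes;

          return tx_bytes < limit ? limit - tx_bytes : 0;
  }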

Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
2023-08-07 18:57:45 +02:00
Christopher Faulet
624979cf3b BUG/MAJOR: http-ana: Get a fresh trash buffer for each header value replacement
When a "replace-header" action is used, we loop on all headers in the
message to change the value of all headers matching a name. The new value is
placed in a trash. However, there is a race here because if the message must
be defragmented, another trash is used. If several defragmentations are
performed because several headers must be updated at the same time, the first
trash, used to store the new value, may be overwritten. Indeed, there are only 2
pre-allocated trash buffers used in rotation, and the trash used to store the new
value is never renewed. As a consequence, random data may be inserted into the
header value.

Here, to fix the issue, we must take care to refresh the trash buffer each time
a new header is evaluated. This way, one trash is used for the new value, and
possibly another one for the HTX defragmentation. But that's all.

Thanks to Christian Ruppert for his detailed report.

This patch must be backported to all stable versions. On the 2.0, the patch must be
applied on src/proto_htx.c where the function is named
htx_transform_header_str().
2023-08-04 17:06:31 +02:00
Amaury Denoyelle
559482c11e MINOR: h3: abort request if not completed before full response
An HTTP server may provide a complete response even prior to receiving the
full request. In this case, RFC 9114 allows the server to abort reading
with a STOP_SENDING carrying the error code H3_NO_ERROR.

This scenario was notably reproduced with haproxy and an inactive
server. If the client sends a POST request, haproxy may provide a full
HTTP 503 response before the end of the full request.
2023-08-04 16:17:16 +02:00
Amaury Denoyelle
f40a72a7ff BUILD: quic: fix wrong potential NULL dereference
GCC warns about a possible NULL dereference when requeuing a datagram on
the connection socket. This happens due to a MT_LIST_POP to retrieve a
rxbuf instance.

In fact, this can never be NULL as there are enough rxbufs allocated for each
thread. Once a thread has finished working with it, it must always
reappend it.

This issue was introduced with the following patch :
  commit b34d353968
  BUG/MEDIUM: quic: consume contig space on requeue datagram
As such, it must be backported in every version with the above commit.

This should fix the github CI compilation error.
2023-08-04 15:42:34 +02:00
Amaury Denoyelle
f59635c495 BUG/MINOR: quic: reappend rxbuf buffer on fake dgram alloc error
A thread must always reappend the rxbuf instance after finishing
datagram reception treatment. This was not the case on one error code
path: when fake datagram allocation fails on datagram requeuing.

This issue was introduced with the following patch :
  commit b34d353968
  BUG/MEDIUM: quic: consume contig space on requeue datagram
As such, it must be backported in every version with the above commit.
2023-08-04 15:42:30 +02:00
Christopher Faulet
e827b45821 BUG/MINOR: http-client: Don't forget to commit changes on HTX message
In the http-client I/O handler, the HTX request and response are loaded from the
channels' buffers. Some changes are performed on these messages. So, we must
take care to commit changes into the underlying buffer by calling
htx_to_buf().

It is especially important when the HTX message becomes empty, to be able to
quickly release the buffer.

This patch should be backported as far as 2.6.
2023-08-04 14:32:48 +02:00
Amaury Denoyelle
b34d353968 BUG/MEDIUM: quic: consume contig space on requeue datagram
When handling UDP datagram reception, it is possible to receive a QUIC
packet for one connection to the socket attached to another connection.
To protect against this, an explicit comparison is done against the
packet DCID and the quic-conn CID. On no match, the datagram is requeued
and dispatched via rxbuf and will be treated as if it arrived on the
listener socket.

One reason for this wrong reception is explained by the small race
condition that exists between bind() and connect() syscalls during
connection socket initialization. However, one other reason which was
not thought initially is when clients reuse the same IP:PORT for
different connections. In this case the current FD attribution is not
optimal and this can cause a substantial number of requeuing.

This situation has revealed a bug during requeuing. If rxbuf contig
space is not big enough for the datagram, the incoming datagram was
dropped, even if there is space at buffer origin. This can cause several
datagrams to be dropped in a series until eventually buffer head is
moved when passing through the listener FD.

To fix this, allocate a fake datagram to consume contig space. This is
similar to the handling of datagrams on the listener FD. This allows
then to store the datagram to requeue on buffer head and continue.

This can be reproduced by starting a lot of connections. To amplify the
phenomenon, POST requests are used to increase the number of dropped datagrams:

$ while true; do curl -F "a=@~/50k" -k --http3-only -o /dev/null https://127.0.0.1:20443/; done
2023-08-04 14:27:40 +02:00
Christopher Faulet
ef2b15998c BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used
There is a mechanism in the H1 and H2 multiplexers to skip the payload when
a response is returned to the client that must not contain any payload
(response to a HEAD request or a 204/304 response). However, this does not
work when splicing is used. The H2 multiplexer does not support splicing,
so there is no issue. But with the mux-h1, when data are sent using kernel
splicing, the mux on the server side is not aware that the client side should
skip the payload. And once the data are put in a pipe, there is no way to
stop the sending.

It is a defect of the current design. It will be easier to deal with this
case when mux-to-mux forwarding is implemented. But for now, to fix the
issue, we should add an HTX flag on the start-line to pass the info from
the client side to the server side and be able to disable splicing if
necessary.

The associated reg-test was improved to be sure it does not fail when the
splicing is configured.

This patch should be backported as far as 2.4.
2023-08-02 12:05:05 +02:00
Christopher Faulet
dabeb5cbd5 MEDIUM: stream: Reset response analyse expiration date if there is no analyzer
When the stream expiration date is computed at the end of process_stream(),
if there is no longer any analyzer on the request channel, its analyse
expiration date is reset. The same is now performed on the response
channel. This way, we are sure not to inherit an orphan expired date.

This should prevent spinning loop on process_stream().
2023-08-01 11:33:45 +02:00
Christopher Faulet
f1bf0b1a6b BUG/MEDIUM: bwlim: Reset analyse expiration date when then channel analyse ends
The bandwidth limitation filter sets the analyse expiration date on the
channel to restart the data forwarding and thus limit the bandwidth.
However, this expiration date is not reset on abort. So it is possible to
reuse the same expiration date to set the stream one. If it expired before
the end of the stream, this will lead to a spinning loop on process_stream()
because the task expiration date is always set in past.

To fix the issue, when the analyse ends on a channel, the bandwidth
limitation filter resets the corresponding analyse expiration date.

This patch should fix the issue #2230. It must be backported as far as 2.7.
2023-08-01 11:33:45 +02:00
Willy Tarreau
bbc3e4463e BUILD: cfgparse: keep a single "curproxy"
Surprisingly, commit 00e00fb42 ("REORG: cfgparse: extract curproxy as a
global variable") caused a build breakage on the CI but not on two
developers' machines. It looks like it's dependent on the linker version
used. What happens is that flt_spoe.c already has a curproxy struct which
already is a copy of the one passed by the parser because it also needed
it to be exported, so they now conflict. Let's just drop this unused copy.
2023-08-01 11:31:39 +02:00
Patrick Hemmer
7fccccccea MINOR: acl: add acl() sample fetch
This provides a sample fetch which returns the evaluation result
of the conjunction of named ACLs.
2023-08-01 10:49:06 +02:00
Patrick Hemmer
00e00fb424 REORG: cfgparse: extract curproxy as a global variable
This extracts curproxy from cfg_parse_listen so that it can be referenced by
keywords that need the context of the proxy they are being used within.
2023-08-01 10:48:28 +02:00
Frédéric Lécaille
bdd863477d BUG/MINOR: quic+openssl_compat: Non initialized TLS encryption levels
The ->openssl_compat struct member of the QUIC connection object was not fully
initialized. This was done on purpose, believing that the ->write_level and
->read_level members were initialized by quic_tls_compat_keylog_callback() (the
keylog callback) before entering quic_tls_compat_msg_callback(), which
has to parse the TLS messages. In fact this is not the case at all.
quic_tls_compat_msg_callback() is called before quic_tls_compat_keylog_callback()
when receiving the first TLS ClientHello message.

->write_level and ->read_level were not initialized to <ssl_encryption_initial> (= 0),
as this is implicitly done by the original nginx wrapper which calloc()s the OpenSSL
compatibility structure. This could lead to a crash after ssl_to_qel_addr() returns
NULL when called by ha_quic_add_handshake_data().

This patch explicitly initializes ->write_level and ->read_level to
<ssl_encryption_initial> (= 0).

No need to backport.
2023-07-31 15:18:36 +02:00
Christopher Faulet
2f8c44cfb7 BUG/MEDIUM: h3: Be sure to handle fin bit on the last DATA frame
When DATA frames are decoded for a QUIC stream, we take care not to exceed
the announced content-length, if any. To do so, we check that we do not receive
more data than expected, but also no less than announced. For the last check,
we rely on the fin bit.

However, it is possible to have several DATA frames to decode at a time
while the end of the stream was received. In this case, we must take care to
handle the fin bit only on the last frame. But because of a bug, the fin bit
was handled too early, erroneously triggering an internal error.

This patch must be backported as far as 2.6.
2023-07-28 09:33:32 +02:00
Dragan Dosen
f7596209ee BUG/MINOR: chunk: fix chunk_appendf() to not write a zero if buffer is full
If the buffer is completely full, the function chunk_appendf() would
write a zero past it, which can result in unexpected behavior.

Now we make a check before calling vsnprintf() and return the current
chunk size if no room is available.
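
A small self-contained sketch of the pattern (simplified buffer type, not the
actual struct buffer / chunk API):

  #include <stdarg.h>
  #include <stddef.h>
  #include <stdio.h>

  struct chunk_sketch {
          char   *area;  /* storage */
          size_t  data;  /* bytes currently used */
          size_t  size;  /* total allocated size */
  };

  /* Append a formatted string, but bail out first if the chunk is already
   * full so that nothing (not even a trailing zero) is written past it.
   */
  static size_t chunk_appendf_sketch(struct chunk_sketch *c, const char *fmt, ...)
  {
          va_list args;
          size_t room;
          int ret;

          if (c->data >= c->size)
                  return c->data;

          room = c->size - c->data;
          va_start(args, fmt);
          ret = vsnprintf(c->area + c->data, room, fmt, args);
          va_end(args);

          if (ret > 0)
                  c->data += ((size_t)ret < room) ? (size_t)ret : room - 1;
          return c->data;
  }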

This should be backported as far as 2.0.
2023-07-27 22:05:25 +02:00
Frédéric Lécaille
c156c5bda6 MINOR: quic; Move the QUIC frame pool to its proper location
The pool_head_quic_frame QUIC frame pool definition is moved from quic_conn-t.h to
quic_frame-t.h.
Its declaration is moved from quic_conn.c to quic_frame.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
fa58f67787 CLEANUP: quic: quic_conn struct cleanup
Remove the no longer used QUIC_TX_RING_BUFSZ macro.
Remove several no longer used quic_conn struct members.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
444c1a4113 MINOR: quic: Split QUIC connection code into three parts
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part code.
The definition of the quic_tx_packet struct has been moved from quic_conn-t.h to
quic_tx-t.h.

Same thing for the RX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this RX part code.
The definition of the quic_rx_packet struct has been moved from quic_conn-t.h to
quic_rx-t.h.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
7008f16d57 MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements
Extract the code in relation with the QUIC acknowledgements from quic_conn.c
to quic_ack.c to accelerate the compilation of quic_conn.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
f454b78fa9 MINOR: quic: Add new "QUIC over SSL" C module.
Move the code which directly calls the functions of the OpenSSL QUIC API into
quic_ssl.c new C file.
Some code has been extracted from qc_conn_finalize() to implement only
the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c.
qc_conn_finalize() has also been exported to be used from this new quic_ssl.c
C module.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
57237f68ad MINOR: quic: Move TLS related code to quic_tls.c
quic_tls_key_update() and quic_tls_rotate_keys() are QUIC TLS functions.
Let's move them to their proper location: quic_tls.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
953e67abb6 MINOR: quic: Export QUIC CLI code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the QUIC CLI from quic_conn.c to quic_cli.c.
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
6334f4f6c5 MINOR: quic: Export QUIC traces code from quic_conn.c
To accelerate the compilation of quic_conn.c file, export the code in relation
with the traces from quic_conn.c to quic_trace.c.
Also add some headers (quic_trace-t.h and quic_trace.h).
2023-07-27 10:51:03 +02:00
Frédéric Lécaille
8d19366832 BUG/MINOR: quic: Possible crash when acknowledging Initial v2 packets
The memory allocated for the TLS cipher context used to encrypt/decrypt QUIC v2
packets should not be released as soon as possible. Indeed, even if
one may drop the Initial TLS cipher context after having received a client
Handshake packet, it is often still needed to acknowledge Initial packets.

No need to backport.
2023-07-27 10:51:03 +02:00
William Lallemand
a6964cfb4f MINOR: sample: implement the T* timer tags from the log-format as fetches
This commit implements the following timer tags available in the log
format as sample fetches (a usage example follows the list):

	req.timer.idle (%Ti)
	req.timer.tq (%Tq)
	req.timer.hdr (%TR)
	req.timer.queue (%Tw)

	res.timer.hdr (%Tr)
	res.timer.user (%Tu)

	txn.timer.total (%Ta)
	txn.timer.data (%Td)

	bc.timer.connect (%Tc)
	fc.timer.handshake (%Th)
	fc.timer.total (%Tt)
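
For example, these fetches could be referenced from a custom log-format line
like any other sample expression (illustrative snippet, not taken from the
patch):

  # illustrative: expose a few of the new timers in a custom log format
  log-format "%ci:%cp hdr=%[req.timer.hdr] connect=%[bc.timer.connect] total=%[txn.timer.total]"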
2023-07-26 17:44:38 +02:00
William Lallemand
202b8e34a2 BUG/MINOR: sample: check alloc_trash_chunk() in conv_time_common()
Check the trash chunk allocation in conv_time_common(), also remove the
data initialisation which is already done when allocating.

Fixes issue #2227.

No backport needed.
2023-07-25 10:05:29 +02:00
William Lallemand
5e63e1636a MEDIUM: sample: implement us and ms variant of utime and ltime
Implement 4 new converters:

- ms_ltime
- ms_utime

- us_ltime
- us_utime

Which are the same as ltime and utime but with milliseconds and
microseconds input.

The converters also support the %N conversion specifier like in date(1).

Unfortunately since %N is not supported by strftime, the format string
is parsed twice, once manually to replace %N, and once by strftime.
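
As a hedged, self-contained sketch of that two-pass approach (not the
converter code itself): expand %N by hand first, then hand the result to
strftime().

  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  /* Sketch: replace the first %N with a zero-padded sub-second value, then
   * let strftime() handle the remaining conversion specifiers.
   */
  static size_t fmt_with_subsec(char *dst, size_t dstlen, const char *fmt,
                                const struct tm *tm, long nsec)
  {
          char expanded[256];
          const char *n = strstr(fmt, "%N");

          if (n) {
                  snprintf(expanded, sizeof(expanded), "%.*s%09ld%s",
                           (int)(n - fmt), fmt, nsec, n + 2);
                  fmt = expanded;
          }
          return strftime(dst, dstlen, fmt, tm);
  }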
2023-07-24 17:12:29 +02:00
William Lallemand
739c4e5b1e MINOR: sample: accept_date / request_date return %Ts / %tr timestamp values
Implement %[accept_date] which returns the same as %Ts log-format tag.
Implement %[request_date] which is a timestamp for %tr.

accept_date and request_date take an optional unit argument which can
be 's', 'ms' or 'us'.

The goal is to be able to convert these 2 timestamps to the HAProxy date
format like it's done with %T, %tr, %trg, etc.
2023-07-24 17:12:29 +02:00
William Lallemand
2a46bfe239 MINOR: sample: implement act_conn sample fetch
Implement the act_conn sample fetch which is the same as %ac (actconn)
in the log-format.
2023-07-24 17:12:29 +02:00
William Lallemand
ac87815be9 MINOR: sample: add pid sample
Implement the pid sample fetch.
2023-07-24 17:12:29 +02:00
Christopher Faulet
e42241ed2b BUG/MEDIUM: h3: Properly report a C-L header was found to the HTX start-line
When H3 HEADERS frames are converted to HTX, if a Content-Length header was
found, the HTX start-line must be notified by setting HTX_SL_F_CLEN flag.
Some components may rely on this flag to know there is a content-length
without looping on headers to get the info.

Among other things, it is mandatory for the FCGI multiplexer because it must
announce the message body length.

This patch must be backported as far as 2.6.
2023-07-24 11:58:35 +02:00
Remi Tricot-Le Breton
adb96fd9ff BUG/MINOR: ssl: OCSP callback only registered for first SSL_CTX
If multiple SSL_CTXs use the same certificate that has an OCSP response
file on the filesystem, only the first one will have the OCSP callback
set. This bug was introduced by "cc346678d MEDIUM: ssl: Add ocsp_certid
in ckch structure and discard ocsp buffer early" which cleared the
ocsp_response from the ckch_data after it was inserted in the tree,
which prevented subsequent contexts from having the callback registered.

This patch should be backported to 2.8.
2023-07-24 10:43:20 +02:00
Frédéric Lécaille
f32201abb0 MINOR: quic: Add "limited-quic" new tuning setting
This setting, which may be used in a "global" section, enables the QUIC listener
bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect
when haproxy is compiled against a TLS stack with QUIC support, typically quictls.
2023-07-21 19:19:27 +02:00
Frédéric Lécaille
2fd67c558a MINOR: quic: Missing encoded transport parameters for QUIC OpenSSL wrapper
This wrapper needs to have access to an encoded version of the local transport
parameters (to be sent to the peer). They are provided to the TLS stack thanks to
qc_ssl_compat_add_tps_cb() callback.

These encoded transport parameters were attached to the QUIC connection but
removed by this commit to save memory:

      MINOR: quic: Stop storing the TX encoded transport parameters

This patch restores these transport parameters and attaches them again
to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper
is compiled.
Implement qc_set_quic_transport_params() to encode the transport parameters
for a connection and to set them into the stack, and make this function work
for both the OpenSSL wrapper and any other TLS stack with QUIC support. It uses
the encoded version of the transport parameters attached to the connection
when compiled for the OpenSSL wrapper, or the local parameters when compiled
with a TLS stack with QUIC support. These parameters are passed to
quic_transport_params_encode() and SSL_set_quic_transport_params() as before
this patch.
2023-07-21 17:27:40 +02:00
Frédéric Lécaille
bcbd5a287b MINOR: quic: SSL context initialization with QUIC OpenSSL wrapper.
When the QUIC OpenSSL wrapper is used, the keylog has to be set and a QUIC
specific TLS 1.3 extension must be added to the EncryptedExtensions message.
This is done by quic_tls_compat_init().
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
173b3d9497 MINOR: quic: Useless call to SSL_CTX_set_quic_method()
SSL_set_quic_method() is already called at SSL session level. This call
is useless. Furthermore, SSL_CTX_set_quic_method() is not implemented by
the QUIC OpenSSL wrapper to come.

Should be backported as far as 2.6 to ease further backports to come.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
a53e523aef MINOR: quic: Call the keylog callback for QUIC openssl wrapper from SSL_CTX_keylog()
SSL_CTX_keylog() is the callback used when the TLS keylog feature is enabled with
tune.ssl.keylog configuration setting. But the QUIC openssl wrapper also needs
to use such a callback to receive the QUIC TLS secrets from the TLS stack.

Add a call to the keylog callback for the QUIC openssl wrapper to SSL_CTX_keylog()
to ensure that it will be called when the TLS keylog feature is enabled.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
557706b34c MINOR: quic: Initialize TLS contexts for QUIC openssl wrapper
When the QUIC OpenSSL wrapper use is enabled, all the TLS contexts (SSL_CTX) must
be configured to support it. This is done calling quic_tls_compat_init() from
ssl_sock_prepare_ctx(). Note that quic_tls_compat_init() ignore the TLS context
which are not linked to non-QUIC TLS sessions/connections.

Required for the QUIC openssl wrapper support.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
91f1950ed6 MINOR: quic: Make ->set_encryption_secrets() be callable two times
With this patch, ha_set_encryption_secrets() may be called twice,
once to derive the RX secrets and a second time to derive the TX secrets.

There was a missing step to do so when the RX secret was received from the stack.
In this case the secret was not stored for the keyupdate, leading the keyupdate
RX part to be uninitialized.

Add a label to initialize the keyupdate RX part and a "goto" statement to run
the concerned code after having derived the RX secrets.

This patch is required to make the keyupdate feature work with the OpenSSL wrapper.

Must be backported as far as 2.6.
2023-07-21 15:54:31 +02:00
Frédéric Lécaille
d66b95d33d MINOR: quic: Do not enable 0RTT with SSL_set_quic_early_data_enabled()
SSL_set_quic_early_data_enabled is not implemented by the QUIC OpenSSL wrapper.
Furthermore, 0-RTT is not supported by this wrapper. The reason is not known at
this time.
2023-07-21 15:53:41 +02:00
Frédéric Lécaille
039f5a8786 MINOR: quic: Set the QUIC connection as extra data before calling SSL_set_quic_method()
This patch is required for the QUIC OpenSSL wrapper, and does not break anything
for the other TLS stacks with their own QUIC support (quictls for instance).

The implementation of SSL_set_quic_method() needs to access the quic_conn object
to store data within. But SSL_set_quic_method() is only aware of the SSL session
object. This is the reason why it is required to set the quic_conn object
as extra data to the SSL session object before calling SSL_set_quic_method()
so that it can be retrieved by SSL_set_quic_method().
2023-07-21 15:53:41 +02:00
Frédéric Lécaille
85d763b11e MINOR: quic: Do not enable O-RTT with USE_QUIC_OPENSSL_COMPAT
Modify ssl_quic_initial_ctx() to disable 0-RTT when the QUIC OpenSSL wrapper is
enabled.
2023-07-21 15:53:41 +02:00
Frédéric Lécaille
1b03f8016d MINOR: quic: QUIC openssl wrapper implementation
Highly inspired by the nginx OpenSSL wrapper code.

This wrapper implement this list of functions:

   SSL_set_quic_method(),
   SSL_quic_read_level(),
   SSL_quic_write_level(),
   SSL_set_quic_transport_params(),
   SSL_provide_quic_data(),
   SSL_process_quic_post_handshake()

and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls
to support QUIC from OpenSSL. So, its aim is to support QUIC from a standard OpenSSL
stack without QUIC support. It relies on the OpenSSL keylog feature to retrieve
the secrets derived by the OpenSSL stack during a handshake and to pass them to
the ->set_encryption_secrets() callback as this is done by quictls. It makes
use of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages
only on the receipt path. Some of them must be passed to the ->add_handshake_data()
callback as this is done with quictls, to be sent to the peer as CRYPTO data.
The quic_tls_compat_msg_callback() callback also sends the received TLS alerts with
the ->send_alert() callback.

AES 128-bit with CCM mode is not supported at this time. It is often disabled by
the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites",
the wrapper will send a TLS alert (Handshake failure) if this algorithm is
negotiated between the client and the server.

0-RTT is also not supported by this wrapper.
2023-07-21 15:53:40 +02:00
Christopher Faulet
ff1c803279 BUG/MEDIUM: listener: Acquire proxy's lock in relax_listener() if necessary
Listener functions must follow a common locking pattern (a sketch follows the list):

  1. Get the proxy's lock if necessary
  2. Get the protocol's lock if necessary
  3. Get the listener's lock if necessary
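
A hedged sketch of this ordering with plain pthread mutexes (HAProxy uses its
own locking macros; this only illustrates the consistent lock order):

  #include <pthread.h>

  static pthread_mutex_t proxy_lock    = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t protocol_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t listener_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Always take the locks in the same order so that two threads can never
   * deadlock in ABBA fashion.
   */
  static void do_listener_work(void)
  {
          pthread_mutex_lock(&proxy_lock);
          pthread_mutex_lock(&protocol_lock);
          pthread_mutex_lock(&listener_lock);

          /* ... work on the listener ... */

          pthread_mutex_unlock(&listener_lock);
          pthread_mutex_unlock(&protocol_lock);
          pthread_mutex_unlock(&proxy_lock);
  }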

We must take care to respect this order to avoid any ABBA issue. However, an
issue was introduced in the commit bcad7e631 ("MINOR: listener: add
relax_listener() function"). relax_listener() gets the lisener's lock and if
resume_listener() is called, the proxy's lock is then acquired.

So to fix the issue, the proxy's lock is first acquired in relax_listener(),
if necessary.

This patch should fix the issue #2222. It must be backported as far as 2.4
because the above commit is marked to be backported there.
2023-07-21 15:08:27 +02:00
Marcos de Oliveira
462b54dee2 BUG/MINOR: server-state: Avoid warning on 'file not found'
On a clean installation, users might want to use server-state-file and
the recommended zero-warning option. This caused a problem if
server-state-file was not found, as a warning was emitted, causing
startup to fail.

This will allow users to specify nonexistent server-state-file at first,
and dump states to the file later.

Fixes #2190

CF: Technically speaking, this patch can be backported to all stable
    versions. But it is better to do so to 2.8 only for now.
2023-07-21 15:08:27 +02:00
Marcos de Oliveira
122a903b94 BUG/MINOR: server-state: Ignore empty files
Users might want to pre-create an empty file for later dumping
server-states. This commit allows for that by emitting a notice in case the
file is empty and a warning if the file is not empty but its version is unknown.

Fix partially: #2190

CF: Technically speaking, this patch can be backported to all stable
    versions. But it is better to do so to 2.8 only for now.
2023-07-21 15:08:27 +02:00
Frédéric Lécaille
bd6ef51fa5 MINOR: quic: Ping from Initial pktns before reaching anti-amplification limit
There are cases where there is enough room on the network to send 1200 bytes
in a PING-only Initial packet. This may be considered as the last chance
for the connection to complete the handshake. Indeed, the client should
reply with at least a 1200-byte datagram with an Initial packet inside.
This would give the haproxy endpoint a credit of 3600 bytes to complete
the handshake before reaching the anti-amplification limit again, and so on.
2023-07-21 14:31:42 +02:00
Frédéric Lécaille
f92d816e3d BUG/MINOR: quic: Missing parentheses around PTO probe variable.
It is hard to analyze the impact of this bug. I guess it could lead a connection
to probe indefinitely (with an exponential backoff probe timeout) during a handshake,
but such a case has never been observed.

Add missing parentheses around ->flags of the TX packet built by qc_do_build_pkt()
to detect that this packet embeds ack-eliciting frames. In this case if a probing
packet was needed the ->pto_probe value of the packet number space must be
decremented.
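
For context, a common form of this C precedence pitfall (flag name and
expression are illustrative, not necessarily the exact code fixed by the
patch): '&' binds less tightly than '==', so the bit test needs its own
parentheses.

  /* Illustrative flag, not an actual QUIC_FL_* value */
  #define PKT_ACK_ELICITING 0x02

  /* "flags & PKT_ACK_ELICITING == PKT_ACK_ELICITING" would be parsed as
   * "flags & (PKT_ACK_ELICITING == PKT_ACK_ELICITING)", i.e. "flags & 1",
   * which is not the intended bit test.
   */
  static int is_ack_eliciting(unsigned int flags)
  {
          return (flags & PKT_ACK_ELICITING) == PKT_ACK_ELICITING;
  }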

Must be backported as far as 2.6.
2023-07-21 14:31:42 +02:00
Frédéric Lécaille
0645e56a6e MINOR: quic: Add traces for qc_frm_free()
Useful to diagnose memory leak issues in relation with the QUIC frame objects.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
6d027c2edb MEDIUM: quic: Handshake I/O handler rework.
quic_conn_io_cb() is the I/O handler used during the handshakes. It called
qc_prep_pkts() after having called qc_treat_rx_pkts() to prepare datagrams
after having parsed incoming ones, with branches to the "next_level" label depending
on the connection state and on whether the current TLS session was a 0-RTT session
or not. The code doing that was ugly and not easy to maintain.

As qc_prep_pkts() is able to handle all the encryption levels available for
a connection, there is no need to keep this code.

After simplification, from now on, to be short, quic_conn_io_cb() calls
qc_prep_pkts() only once after having called qc_treat_rx_pkts().

Furthermore, there are more chances that this I/O handler could be reused for
the haproxy server side connections.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
367ece4add CLEANUP: quic: Remove a useless TLS related variable from quic_conn_io_cb().
<ssl_err> was defined and used but its value is never modified after having
been initialized. It is definitely useless.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
cf2368a3d5 MEDIUM: quic: Packet building rework.
The aim of this patch is to allow the building of QUIC datagrams with
as many packets with different encryption levels inside as possible during the handshake.
At this time, this is possible only for at most two encryption levels.
That said, most of the time, a server only needs to use two encryption levels
per datagram, except during retransmissions.

Modify qc_prep_pkts(), the function responsible for building datagrams, to take
a list of encryption levels as parameter in place of two encryption levels. This
function is also used when retransmitting datagrams. In this case, a
customized/flexible list of encryption levels is passed to this function.
Add a new ->retrans member to the quic_enc_level struct, to be used as an attach
point to a list of encryption levels used only during retransmission, and a new
->retrans_frms member which is a pointer to a list of frames to be retransmitted.
2023-07-21 14:30:35 +02:00
Frédéric Lécaille
dfb9fad183 MINOR: quic: Add traces to qc_may_build_pkt()
This patch is very useful to debug issues in relation with the packets building.
2023-07-21 14:30:19 +02:00
Frédéric Lécaille
2b8510d722 MINOR: quic: Release asap the negotiated Initial TLS context.
This context may be released at the same time as the Initial TLS context.
This is done by calling quic_tls_ctx_secs_free() and pool_free() in two code locations.
Implement quic_nictx_free() to do that.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
90a63ae4fa MINOR: quic: Dynamic allocation for negotiated Initial TLS cipher context.
Shorten the ->negotiated_ictx quic_conn struct member (to ->nictx).
This variable is used during version negotiation. Indeed, a connection
may have to support several QUIC versions of packets during
the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated
QUIC version.

This patch allows a connection to dynamically allocate this TLS cipher context.

Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context objects.
Modify qc_new_conn() to initialize ->nictx to the NULL value.
quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context.
Modify it to do nothing if it is called with a NULL TLS cipher context.
Modify the code to allocate ->nictx from qc_conn_finalize(), just before initializing
its secrets. qc_conn_finalize() allocates ->nictx only if needed (if a new QUIC
version was negotiated).
Modify qc_conn_release(), which releases a QUIC connection (quic_conn struct), to
release the ->nictx TLS cipher context.
2023-07-21 14:27:10 +02:00
Frédéric Lécaille
642dba8c22 MINOR: quic: Stop storing the TX encoded transport parameters
There is no need to keep an encoded version of the QUIC listener transport
parameters attached to the connection.

Remove the ->enc_params and ->enc_params_len members of the quic_conn struct.
Use variables to build the encoded transport parameters local to
ha_quic_set_encryption_secrets() before they are passed to
SSL_set_quic_transport_params().

Modify the qc_ssl_sess_init() prototype. It was expected to be used with
the encoded transport parameters as a passed parameter, but they were not
used. Clean up this function.
2023-07-21 14:27:10 +02:00
Patrick Hemmer
57926fe8a3 MINOR: peers: add peers keyword registration
This adds support for registering keywords in the 'peers' section.
2023-07-20 18:12:44 +02:00
Christopher Faulet
e2e72e578e BUG/MINOR: server: Don't warn on server resolution failure with init-addr none
During startup, when the "none" method for "init-addr" is evaluated, a
warning is emitted if a resolution failure was previously encountered. The
documentation of the "none" method states it should be used to ignore server
resolution failures and let the server start in DOWN state. However,
because a warning may be emitted, it is not possible to start HAProxy with
"zero-warning" option.

The same is true when the "-dr" command line option is used. It is
counterintuitive and, in a way, this contradicts what is specified in the
documentation.

So instead, a notice message is now emitted. In the end, if the "-dr" command
line option is used or if the "none" method is explicitly used, it means the
admin agrees to accept server resolution failures. There is no reason to emit a
warning.

This patch should fix the issue #2176. It could be backported to all stable
versions but backporting to 2.8 is probably enough for now.
2023-07-20 18:12:44 +02:00
Willy Tarreau
6ecabb3f35 CLEANUP: config: make parse_cpu_set() return documented values
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, and changed from ulong to int since
it only returns a success/failure and no longer a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
2023-07-20 11:01:09 +02:00
Willy Tarreau
f54d8c6457 CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
2023-07-20 11:01:09 +02:00
Willy Tarreau
c955659906 BUG/MINOR: init: set process' affinity even in foreground
The per-process CPU affinity settings are only applied during forking,
which means that cpu-map are ignored when running in foreground (e.g.
haproxy started with -db). This is historic due to the original semantics
of a process array, but isn't documented and causes surprises when trying
to debug affinity settings.

Let's make sure the setting is applied to the workers themselves even
in foreground. This may be backported to 2.6 though it is really not
important. If backported, it also depends on previous commit:

  BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
2023-07-20 11:01:09 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made wrong over time.

The issue stems from the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00
Willy Tarreau
67c99db0a7 BUG/MINOR: config: do not detect NUMA topology when cpu-map is configured
As documented, the NUMA auto-detection is not supposed to be used when
the CPU affinity was set either by taskset (already checked) or by a
cpu-map directive. However this check was missing, so that configs
having cpu-map entries would still first bind to a single node. In
practice it has no impact on correct configs since bindings will be
replaced. However for those where the cpu-map directives are not
exhaustive it will have the impact of binding those threads to one node,
which disagrees with the doc (and makes future evolutions significantly
more complicated).

This could be backported to 2.4 where numa-cpu-mapping was added, though
if nobody encountered this by then maybe we should only focus on recent
versions that are more NUMA-friendly (e.g. 2.8 only). This patch depends
on this previous commit that brings the function we rely on:

   MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
2023-07-20 11:01:09 +02:00
Willy Tarreau
7134417613 MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
2023-07-20 11:01:09 +02:00
Daan van Gorkum
f034139bc0 MINOR: lua: Allow reading "proc." scoped vars from LUA core.
This adds the "core.get_var()" method allow the reading
of "proc." scoped variables outside of TXN or HTTP/TCPApplet.

Fixes: #2212
Signed-off-by: Daan van Gorkum <djvg@djvg.net>
2023-07-20 10:55:28 +02:00
Christopher Faulet
083f917fe2 BUG/MINOR: h1-htx: Return the right reason for 302 FCGI responses
A FCGI response may contain a "Location" header with no status code. In this
case a 302-Found HTTP response must be returned to the client. However,
while the status code is indeed 302, the reason is wrong. "Found" must be
set instead of "Moved Temporarily".

This patch must be backported as far as 2.2. With the commit e3e4e0006
("BUG/MINOR: http: Return the right reason for 302"), this should fix the
issue #2208.
2023-07-20 09:51:00 +02:00
firexinghe
bfff46f411 BUG/MINOR: hlua: add check for lua_newstate
When calling lua_newstate() (to init the main Lua stack) in the hlua_init_state()
function, the return value of lua_newstate() may be NULL (for example
in case of OOM). In this case, L will be NULL, and then a crash happens
in lua_getextraspace(). So, we add a check for lua_newstate().

This should be backported at least to 2.4, maybe further.
2023-07-19 10:16:14 +02:00
Emeric Brun
c0456f45c8 BUILD: quic: fix warning during compilation using gcc-6.5
Building with gcc-6.5:

src/quic_conn.c: In function 'send_retry':
src/quic_conn.c:6554:2: error: dereferencing type-punned pointer will break
strict-aliasing rules [-Werror=strict-aliasing]
   *((uint32_t *)((unsigned char *)&buf[i])) = htonl(qv->num);

This patch uses write_n32 to set the value.
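
A hedged sketch of the underlying technique (an illustrative helper, not
necessarily HAProxy's exact write_n32()): going through memcpy() lets the
compiler emit a plain store without the aliasing violation of the cast.

  #include <stdint.h>
  #include <string.h>

  /* Store a 32-bit value at an arbitrary (possibly unaligned) address
   * without dereferencing a type-punned pointer.
   */
  static inline void write_u32_sketch(void *p, uint32_t v)
  {
          memcpy(p, &v, sizeof(v));
  }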

This could be backported until v2.6
2023-07-19 08:58:55 +02:00
Frédéric Lécaille
e5a17b0bc0 BUG/MINOR: quic: Unckecked encryption levels availability
This bug arrived with this commit:

   MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels

It is possible that haproxy receives a late Initial packet after it has
released its Initial or Handshake encryption levels. In this case
it must not try to retransmit packets from such encryption levels to
speed up the handshake completion.

No need to backport.
2023-07-18 11:50:31 +02:00
Mariam John
00b7b49a46 MEDIUM: ssl: new sample fetch method to get curve name
Adds a new sample fetch method to get the curve name used in the
key agreement to enable better observability. In OpenSSLv3, the function
`SSL_get_negotiated_group` returns the NID of the curve and from the NID,
we get the curve name by passing the NID to OBJ_nid2sn. This was not
available in v1.1.1. SSL_get_curve_name(), which returns the curve name
directly was merged into OpenSSL master branch last week but will be available
only in its next release.
2023-07-17 15:45:41 +02:00
Christopher Faulet
e3e4e00063 BUG/MINOR: http: Return the right reason for 302
Because of a cut/paste error, the wrong reason was returned for 302
code. The 301 reason was returned instead. Thus now, "Found" is returned for
302, instead of "Moved Permanently".

This patch should fix the issue #2208. It must be backported to all stable
versions.
2023-07-17 11:14:10 +02:00
Christopher Faulet
b982fc2177 BUG/MINOR: sample: Fix wrong overflow detection in add/sub conveters
When "add" or "sub" conveters are used, an overflow detection is performed.
When 2 negative integers are added (or a positive integer is substracted to
a positive one), we take care to not exceed the low limit (LLONG_MIN) and
when 2 positive integers are added, we take care to not exceed the high
limit (LLONG_MAX).

However, because of a missing 'else' statement, if there is no overflow in
the first case, we fall back on the second check (the one for positive adds)
and LLONG_MAX is returned. It means that most of the time, when 2 negative
integers are added (or a positive integer is subtracted from a negative one),
LLONG_MAX is returned.
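
A self-contained sketch of the corrected logic (simplified; the converter
itself operates on sample data, but the overflow checks follow the same
shape):

  #include <limits.h>

  /* Saturating signed addition: clamp to LLONG_MIN/LLONG_MAX on overflow. */
  static long long add_sat(long long a, long long b)
  {
          long long ret;

          if (b < 0 && a < LLONG_MIN - b)
                  ret = LLONG_MIN;
          else if (b > 0 && a > LLONG_MAX - b) /* the missing "else" was here */
                  ret = LLONG_MAX;
          else
                  ret = a + b;
          return ret;
  }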

This patch should solve the issue #2216. It must be backported to all stable
versions.
2023-07-17 11:14:10 +02:00
Aurelien DARRAGON
70e10ee5bc BUG/MEDIUM: hlua_fcn/queue: bad pop_wait sequencing
I assumed that the hlua_yieldk() function used in queue:pop_wait()
function would eventually return when the continuation function would
return.

But this is wrong, the continuation function is simply called back by the
resume after the hlua_yieldk() which does not return in this case. The
caller is no longer the initial calling function, but Lua, so when the
continuation function eventually returns, it does not give the hand back
to the C calling function (queue:pop_wait()), like we're used to, but
directly to Lua which will continue the normal execution of the (Lua)
function that triggered the C-function, effectively bypassing the end
of the C calling function.

Because of this, the queue waiting list cleanup never occurs!

This causes some undesirable effects:
 - pop_wait() will slowly leak over time, because the allocated queue
   waiting entry never gets deallocated when the function is finished
 - queue:push() will become slower and slower because the wait list will
   keep growing indefinitely as a result of the previous leak
 - the task that performed at least 1 pop_wait() could suffer from
   useless wakeups because it will stay indefinitely in the queue waiting
   list, so every queue:push() will try to wake the task, even if the
   task is not waiting for new queue items.
 - last but not least, if the task that performed at least 1 pop_wait ends
   or crashes, the next queue:push() will lead to invalid reads and
   process crash because it will try to wakeup a ghost task that doesn't
   exist anymore.

To fix this, the pop_wait function was reworked with the assumption that
the hlua_yieldk() with continuation function never returns. Indeed, it is
now the continuation function that will take care of the cleanup, instead
of the parent function.

This must be backported in 2.8 with 86fb22c5 ("MINOR: hlua_fcn: add Queue class")
2023-07-17 07:42:52 +02:00
Aurelien DARRAGON
2e7d3d2e5c BUG/MINOR: hlua: hlua_yieldk ctx argument should support pointers
lua_yieldk ctx argument is of type lua_KContext which is typedefed to
intptr_t when available so it can be used to store pointers.

But the wrapper function hlua_yieldk() passes it as a regular int, so it
breaks that promise.

Changing hlua_yieldk() prototype so that ctx argument is of type
lua_KContext.

This bug had no functional impact because ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
2023-07-17 07:42:47 +02:00
Emeric Brun
49ddd87d41 CLEANUP: quic: remove useless parameter 'key' from quic_packet_encrypt
Parameter 'key' was not used in this function.

This patch removes it from the prototype of the function.

This patch could be backported until v2.6.
2023-07-12 14:33:03 +02:00
Emeric Brun
cadb232e93 BUG/MEDIUM: quic: timestamp shared in token was using internal time clock
The internal tick clock was used to export the timestamp in the token
on retry packets. Doing this in cluster mode, the nodes don't
understand the timestamps from tokens generated by others.

This patch reworks this using the real current date (wall-clock time).

Timestamps are also now expressed in seconds instead of milliseconds.

This patch should be backported until v2.6
2023-07-12 14:32:01 +02:00
Emeric Brun
072e774939 BUG/MEDIUM: quic: missing check of dcid for init pkt including a token
RFC 9000, 17.2.5.1:
"The client MUST use the value from the Source Connection ID
field of the Retry packet in the Destination Connection ID
field of subsequent packets that it sends."

There was no control of this and we could accept a different
dcid on init packets containing a valid token.

The randomized value used as new scid on retry packets is now
added in the aad used to encode the token. This way the token
will appear as invalid if the dcid mismatches the scid of
the previous retry packet.

This should be backported until v2.6
2023-07-12 14:31:15 +02:00
Emeric Brun
cc0a4fa0cc BUG/MINOR: quic: retry token remove one useless intermediate expand
According to RFC 5869 about HKDF, the extract function returns a
pseudo-random key usable to perform expand operations using labels to derive keys.
So the intermediate expand on a label is useless; the key should be strong
enough using only one expand.

This patch should be backported until v2.6
2023-07-12 14:30:45 +02:00
Emeric Brun
075b8f4cd8 BUG/MEDIUM: quic: token IV was not computed using a strong secret
When computing the token key and IV, a stronger derived key was used
to compute the key, but the weak secret was still used to compute
the IV. This could be used to find the secret.

This patch fixes this by using the same derived key as the one used
to compute the token key.

This should backport until v2.6
2023-07-12 14:30:07 +02:00
Thierry Fournier
65f18d65a3 BUG/MINOR: config: Lenient port configuration parsing
Configuration parsing allowed a port like 8000/websocket/. This is
nonsense, and allowing this syntax may hide from the user something
not corresponding to their intent.

This patch should not be backported because it could break existing
configurations
2023-07-11 20:58:28 +02:00
Thierry Fournier
111351eebb BUG/MINOR: config: Remove final '\n' in error messages
Because error messages are displayed with a final '\n' appended, it's
useless to add '\n' in the message.

This patch should be backported.
2023-07-11 20:58:28 +02:00
Aurelien DARRAGON
33a8c2842b BUG/MINOR: hlua_fcn/queue: use atomic load to fetch queue size
In hlua_queue_size(), the queue size is loaded as a regular int, but the
queue might be shared by multiple threads that could perform some
atomic pushing or popping attempts in parallel, so we'd better use an
atomic load operation to guarantee consistent readings.
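
As an illustration of the kind of change (using the GCC/Clang builtin here
rather than HAProxy's own atomic macros):

  /* Sketch: read a counter that other threads may update concurrently.
   * A plain "*size_ptr" would be a non-atomic, potentially inconsistent read.
   */
  static int queue_size_sketch(const int *size_ptr)
  {
          return __atomic_load_n(size_ptr, __ATOMIC_RELAXED);
  }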

This could be backported in 2.8.
2023-07-11 16:04:39 +02:00
Aurelien DARRAGON
c38cf3cf98 BUG/MINOR: sink/log: properly deinit srv in sink_new_from_logsrv()
When errors are encountered in the sink_new_from_logsrv() function,
incompletely allocated resources are freed to prevent memory leaks.

For instance: logsrv implicit server is manually cleaned up on error prior
to returning from the function.

However, since 198e92a8e5 ("MINOR: server: add a global list of all known
servers") every server created using new_server() is registered to the
global list, but unfortunately the manual srv cleanup in
sink_new_from_logsrv() doesn't remove the srv from the global list, so the
freed server will still be referenced there, which can result in invalid
reads later.

Moreover, server API has evolved since, and now the srv_drop() function is
available for that purpose, so let's use it, but make sure that srv is
freed before the proxy because on older versions srv_drop() expects the
srv to be linked to a valid proxy pointer.

This must be backported up to 2.4.

[For 2.4 version, free_server() must be used instead of srv_drop()]
2023-07-11 10:26:09 +02:00
Aurelien DARRAGON
b58bd9794f MINOR: hlua_fcn/mailers: handle timeout mail from mailers section
As the example/lua/mailers.lua script does its best to mimic the c-mailer
implementation, it should support the "timeout mail" directive as well.

This could be backported in 2.8.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
b2f7069479 BUG/MINOR: server: set rid default value in new_server()
srv->rid default value is set in _srv_parse_init() after the server is
successfully allocated using new_server().

This is wrong because new_server() can be used independently, so the rid value
assignment would be skipped in this case.

Fortunately, new_server() allocates server data using calloc() so srv->rid
is already set to 0 in practice. But if calloc() is replaced by malloc()
or another memory allocating function that doesn't zero-initialize srv
members, this could lead to rid being uninitialized in some cases.

This should be backported in 2.8 with 61e3894dfe ("MINOR: server: add
srv->rid (revision id) value")
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
9859e00981 BUG/MINOR: sink: fix errors handling in cfg_post_parse_ring()
Multiple error paths (memory- and IO-related) in cfg_post_parse_ring() were
not implemented correctly and could result in memory leaks or undefined
behavior.

Fixing them all at once.

This can be backported in 2.4
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
a26b736300 BUG/MINOR: sink: invalid sft free in sink_deinit()
sft freeing attempt made in a575421 ("BUG/MINOR: sink: missing sft free in
sink_deinit()") is incomplete, because sink->sft is meant to be used as a
list and not a single sft entry.

Because of that, the previous fix only frees the first sft entry, which
fixes memory leaks for single-server forwarders (this is the case for
implicit rings), but could still result in memory leaks when multiple
servers are configured in explicit ring sections.

What this patch does: instead of directly freeing sink->sft, it iterates
over every list members to free them.

It must be backported up to 2.4 with a575421.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
9f9d557468 BUG/MINOR: log: free errmsg on error in cfg_parse_log_forward()
When leaving cfg_parse_log_forward() on error paths, errmsg, which is local
to the function, could still point to valid data, and it's our
responsibility to free it.

Instead of freeing it everywhere it is involved, we free it prior to
leaving the function.

This should be backported as far as 2.4.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
21cf42f579 BUG/MINOR: log: fix multiple error paths in cfg_parse_log_forward()
Multiple error paths were badly handled in cfg_parse_log_forward():
some errors were raised without interrupting the function execution,
resulting in undefined behavior.

Instead of fixing issues separately, let's fix the whole function at once.
This should be backported as far as 2.4.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
d1af50c807 BUG/MINOR: log: fix missing name error message in cfg_parse_log_forward()
"missing name for ip-forward section" is generated instead of "missing
name name for log-forward section" in cfg_parse_log_forward().

This may be backported up to 2.4.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
47ee036e5f BUG/MEDIUM: log: improper use of logsrv->maxlen for buffer targets
In e709e1e ("MEDIUM: logs: buffer targets now rely on new sink_write")
we started using the sink API instead of using the ring_write function
directly.

But as indicated in the commit message, the maxlen parameter of the log
directive now only applies to the message part and not the complete
payload. I don't know what the original intent was (maybe minimizing code
changes) but it seems wrong, because the doc doesn't mention this special
case, and the result is that the ring->buffer output can differ from all
other log output types, making it very confusing.

One last issue with this is that log messages can end up being dropped at
runtime, but only for the buffer target, even if logsrv->maxlen is
correctly set (including the default: 1024), because depending on the
generated header size the payload can grow bigger than the accepted sink
size (sink maxlen is not mandatory), and we have no simple way to detect
this at configuration time.

First, we partially revert e709e1e:

  TARGET_BUFFER still leverages the proper sink API, but thanks to
  "MINOR: sink: pass explicit maxlen parameter to sink_write()" we now
  explicitly pass the logsrv->maxlen to the sink_write function in order
  to stop writing as soon as either sink->maxlen or logsrv->maxlen is
  reached.

This restores pre-e709e1e behavior with the added benefit of using the
high-level API, which includes automatically announcing dropped message
events.

Then, we also need to take the ending '\n' into account: it is not
explicitly set when generating the logline for TARGET_BUFFER, but it will
be forcefully added by the sink_forward_io_handler function from the tcp
handler applet when log messages from the buffer are forwarded to tcp
endpoints.

In its current form, because the '\n' is added later in the chain, maxlen is
not taken into account anymore, so the final log message could exceed maxlen
by 1 byte, which could make receiving servers unhappy in a logging context.

To prevent this, we sacrifice 1 byte from logsrv->maxlen to ensure
that the final message will never exceed logsrv->maxlen, even if the '\n'
char is automatically appended later by the forwarding applet.

Thanks to this change TCP (over RING/BUFFER) target now behaves like
FD and UDP targets.

This commit depends on:
 - "MINOR: sink: pass explicit maxlen parameter to sink_write()"

It may be backported as far as 2.2

[For 2.2 and 2.4 the patch does not apply automatically, the sink_write()
call must be updated by hand]
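
A minimal sketch of the reservation described above (hypothetical helper,
not the actual haproxy code):

  #include <stddef.h>

  /* Reserve one byte from the configured maxlen so that the '\n' appended
   * later by the forwarding applet never makes the final message exceed it.
   */
  static size_t usable_maxlen(size_t cfg_maxlen)
  {
      return cfg_maxlen > 0 ? cfg_maxlen - 1 : 0;
  }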
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
b6e2d62fb3 MINOR: sink/api: pass explicit maxlen parameter to sink_write()
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.

But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.

sink_write() now takes an optional maxlen parameter:
  if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
  is smaller than ring->maxlen, else only ring->maxlen will be considered.

[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
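
A minimal sketch of the limit selection under these assumed semantics
(hypothetical helper name, not the actual haproxy code):

  #include <stddef.h>

  /* A non-zero caller-provided maxlen only lowers the limit; it can never
   * raise it above the ring's own maxlen.
   */
  static size_t effective_limit(size_t ring_maxlen, size_t caller_maxlen)
  {
      if (caller_maxlen > 0 && caller_maxlen < ring_maxlen)
          return caller_maxlen;
      return ring_maxlen;
  }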
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
901f31bc9a BUG/MINOR: log: LF upsets maxlen for UDP targets
A regression was introduced with 5464885 ("MEDIUM: log/sink: re-work
and merge of build message API.").

For UDP targets, a final '\n' is systematically inserted, but with the
rework of the build message API, it is inserted after the maxlen
limitation has been enforced, so this can lead to the final message
becoming maxlen+1. For strict syslog servers that only accept up to
maxlen characters, this could be a problem.

To fix the regression, we take the final '\n' into account prior to
building the message, like it was done before the rework of the API.

This should be backported up to 2.4.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
c103379847 BUG/MINOR: ring: maxlen warning reported as alert
When the maxlen parameter exceeds the sink size, a warning is generated and
maxlen is clamped to the sink size. But the err_code is incorrectly set to
ERR_ALERT.

Indeed, being a "warning", ERR_WARN should be used here.

This may be backported as far as 2.2
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
30ff33bd9b BUG/MINOR: ring: size warning incorrectly reported as fatal error
When a ring section defines its size using the "size" directive with a
smaller size than the default one or smaller than the previous one,
a warning is generated to inform the user that the new size will be
ignored.

However the err_code is returned as FATAL, so this causes haproxy to
incorrectly abort the init sequence.

Change the err_code to ERR_WARN so that this warning doesn't prevent the
process from starting successfully.

This should be backported as far as 2.4
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
a5754219e7 BUG/MINOR: sink: missing sft free in sink_deinit()
Add the missing free for sft (string_forward_target) in sink_deinit();
the missing free resulted in a minor leak for each declared ring target at
deinit(). (both explicit and implicit rings are affected)

This may be backported up to 2.4.
2023-07-06 15:41:17 +02:00
Aurelien DARRAGON
5028a6e50b BUG/MINOR: http_ext: unhandled ERR_ABORT in proxy_http_parse_7239()
_proxy_http_parse_7239_expr() helper used in proxy_http_parse_7239()
function may return ERR_ABORT in case of memory error. But the error check
used below is insufficient to catch ERR_ABORT so the function could keep
executing prior to returning ERR_ABORT, which may cause undefined
behavior. Fortunately no sensitive handling is performed in this case so
this bug has very limited impact, but let's fix it anyway.

We now use ERR_CODE mask instead of ERR_FATAL to check if err_code is set
to any kind of error combination that should prevent the function from
further executing.

This may be backported in 2.8 with b2bb9257d2 ("MINOR: proxy/http_ext:
introduce proxy forwarded option")
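
A toy sketch of the mask check (flag values here are placeholders, not
haproxy's real ones):

  /* ERR_ABORT is not part of ERR_FATAL, so testing ERR_FATAL alone lets an
   * ERR_ABORT slip through; testing the whole ERR_CODE mask catches both.
   */
  #define ERR_FATAL 0x1
  #define ERR_ABORT 0x2
  #define ERR_CODE  (ERR_FATAL | ERR_ABORT)

  static int must_stop_parsing(int err_code)
  {
      return (err_code & ERR_CODE) != 0;
  }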
2023-07-06 15:41:17 +02:00
Aurelien DARRAGON
999699a277 BUG/MEDIUM: sink: invalid server list in sink_new_from_logsrv()
The forward proxy server list created from sink_new_from_logsrv() is
invalid.

Indeed, srv->next is literally assigned to itself. This did not cause
issues during syslog handling because the sft was properly set, but it
will cause the free_proxy(sink->forward_px) at deinit to go wild since
free_proxy() will try to iterate through the proxy srv list to free
resources, and because of the improper list initialization, a double free
and an infinite loop will occur.

This bug was revealed by 9b1d15f53a ("BUG/MINOR: sink: free forward_px on deinit()")

It must be backported as far as 2.4.
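
For illustration, a tiny sketch of a correct list link (hypothetical types,
not the actual haproxy code); the bug was the equivalent of
node->next = node, which creates a one-element cycle that a later traversal
such as free_proxy() cannot terminate:

  struct srv_node {
      struct srv_node *next;
  };

  static void prepend_srv(struct srv_node **head, struct srv_node *node)
  {
      node->next = *head;   /* link to the previous head, not to itself */
      *head = node;
  }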
2023-07-06 15:41:17 +02:00
Remi Tricot-Le Breton
ca4fd73938 BUG/MINOR: cache: A 'max-age=0' cache-control directive can be overridden by a s-maxage
When a s-maxage cache-control directive is present, it overrides any
other max-age or expires value (see section 5.2.2.9 of RFC7234). So if
we have a max-age=0 alongside a strictly positive s-maxage, the response
should be cached.

This bug was raised in GitHub issue #2203.
The fix can be backported to all stable branches.
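
A minimal sketch of the precedence rule (assumed representation: a negative
value means the directive is absent; not the actual haproxy code):

  /* Per RFC 7234 section 5.2.2.9, s-maxage, when present, overrides both
   * max-age and Expires for shared caches.
   */
  static int effective_freshness(int s_maxage, int max_age, int expires_delta)
  {
      if (s_maxage >= 0)
          return s_maxage;
      if (max_age >= 0)
          return max_age;
      return expires_delta;
  }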
2023-07-04 22:15:00 +02:00
Frédéric Lécaille
6aeaa73d39 BUG/MINOR: quic: Possible crash in "show quic" dumping packet number spaces
This bug was introduced by this commit:

    MEDIUM: quic: Release encryption levels and packet number spaces asap

Add some checks before dereferencing pointers to packet number space objects
to dump them from the "show quic" command.

No backport needed.
2023-07-03 17:20:55 +02:00
Aurelien DARRAGON
4f0e0f5a65 MEDIUM: sample: introduce 'same' output type
Thierry Fournier reported an annoying side-effect when using the debug()
converter.

Consider the following examples:

[1]  http-request set-var(txn.test) bool(true),ipmask(24)
[2]  http-request redirect location /match if { bool(true),ipmask(32) }

When starting haproxy with [1] example we get:

   config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied.

With [2], we get:

  config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'.

Now consider the following examples which are based on [1] and [2]
but with the debug() sample conv inserted in-between those incompatible
sample types:

[1*] http-request set-var(txn.test) bool(true),debug,ipmask(24)
[2*] http-request redirect location /match if { bool(true),debug,ipmask(32) }

According to the documentation, "it is safe to insert the debug converter
anywhere in a chain, even with non-printable sample types".
Thus we don't expect any side-effect from using it within a chain.

However in the current implementation, because debug() returns the SMP_T_ANY
type, which is a pseudo type (only resolved at runtime), the sample
compatibility checks performed at parsing time are completely ineffective.
(haproxy will start and no warning will be emitted)

The undesirable effect of this is that debug() prevents haproxy from
warning you about impossible type conversions, hiding potential errors
in the configuration that could result in unexpected evaluation or logic
while serving live traffic. We'd better let haproxy warn you about this kind
of error when it has the chance.

With our previous examples, this could cause some inconveniences. Let's
say for example that you are testing a configuration prior to deploying
it. When testing the config you might want to use debug() converter from
time to time to check how the conversion chain behaves.

Now after deploying the exact same conf, you might want to remove those
testing debug() statements since they are no longer relevant... but
removing them could "break" the config and suddenly prevent haproxy from
starting upon reload/restart. (this is the expected behavior, but it
comes a bit too late because debug() hid the errors during tests)

To fix this, we introduce a new output type for sample expressions:
  SMP_T_SAME - may only be used as "expected" out_type (parsing time)
               for sample converters.

As its name implies, it is a way for developers to indicate that the
resulting converter's output type is guaranteed to match the type of the
sample that is presented on the converter's input side.
(the converter may alter the data content, but the data type must not change)

What it does is tell haproxy that if switching to the converter
(by looking at the converter's input only, since the out_type is SAME) is
conversion-free, then the converter type can safely be ignored for type
compatibility checks within the chain.

debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows
it to fully comply with the doc in the sense that it does not impact the
conversion chain when inserted between sample items.
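
A toy model of the parse-time effect (types and names are illustrative, not
haproxy's):

  enum toy_type { TOY_BOOL, TOY_IPV4, TOY_ANY, TOY_SAME };

  /* When a converter declares SAME as output, the chain check keeps the
   * current type instead of widening it to ANY and losing the information
   * needed to detect impossible conversions later in the chain.
   */
  static enum toy_type next_chain_type(enum toy_type cur, enum toy_type conv_out)
  {
      if (conv_out == TOY_SAME)
          return cur;
      return conv_out;
  }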

Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
4136aba830 MEDIUM: tree-wide: fetches that may return IPV4+IPV6 now return ADDR
Historically, the ADDR pseudo-type did not exist. So when IPV6 support
was added to existing IPV4 sample fetches (e.g.: src,dst,hdr_ip...) the
expected out_type in related sample definitions was left on IPV4 because
it was required to declare the out_type as the lowest common denominator
(the type that can be casted into all other ones) to make compatibility
checks at parse time work properly.

However, now that ADDR pseudo-type may safely be used as out_type since
("MEDIUM: sample: add missing ADDR=>? compatibility matrix entries"), we
can use ADDR for fetches that may output both IPV4 and IPV6 at runtime.

One added benefit on top of making the code less confusing is that
'haproxy -dKsmp' output will now show "addr" instead of "ipv4" for such
fetches, so the 'haproxy -dKsmp' output better complies with the fetches
signatures from the documentation.

out_ip fetch, which returns an ip according to the doc, was purposely
left as is (returning IPV4) since the smp_fetch_url_ip() implementation
forces the output type to IPV4 anyway, and since this is a historical fetch
I prefer not to touch it to prevent any regression. However if
smp_fetch_url_ip() were to be fixed to also return IPV6 in the future,
then its expected out_type may be changed to ADDR as well.

Multiple notes in the code were updated to mention that the appropriate
pseudo-type may be used instead of the lowest common denominator for
out_type when available.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
67544fdd1c MINOR: sample: fix ipmask sample definition
Since 1478aa7 ("MEDIUM: sample: Add IPv6 support to the ipmask converter")
ipmask converter may return IPV6 as well, so the sample definition must
be updated. (the output type was explicitly set to IPV4 until now)

This has no functional impact: one visible effect will be that
converter's visible output type will change from "ipv4" to "addr" when
using 'haproxy -dKcnv'.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
a635a1779a MEDIUM: sample: add missing ADDR=>? compatibility matrix entries
SMP_T_ADDR support was added in b805f71 ("MEDIUM: sample: let the cast
functions set their output type").

According to the above commit, it is made clear that the ADDR type is
a pseudo/generic type that may be used for compatibility checks but
that cannot be emitted from a fetch or converter.

With that in mind, all conversions from ADDR to other types were
explicitly disabled in the compatibility matrix.

But later, when map_*_ip functions were updated in b2f8f08 ("MINOR: map:
The map can return IPv4 and IPv6"), we started using ADDR as "expected"
output type for converters. This still complies with the original
description from b805f71, because it is used as the theoretical output
type, and is never emitted from the converters themselves (only "real"
types such as IPV4 or IPV6 are actually being emitted at runtime).

But this introduced an ambiguity as well as a few bugs, because some
compatibility checks are being performed at config parse time, and thus
rely on the expected output type to check if the conversion from the current
element to the next element in the chain is theoretically supported.

However, because the compatibility matrix doesn't support ADDR to other
types it is currently impossible to use map_*_ip converters in the middle
of a chain (the only supported usage is when map_*_ip converters are
at the very end of the chain).

To illustrate this, consider the following examples:

  acl test str(ok),map_str_ip(test.map) -m found                          # this will work
  acl test str(ok),map_str_ip(test.map),ipmask(24) -m found               # this will raise an error

Likewise, stktable_compatible_sample() check for stick tables also relies
on out_type[table_type] compatibility check, so map_*_ip cannot be used
with sticktables at the moment:

  backend st_test
    stick-table type string size 1m expire 10m store http_req_rate(10m)
  frontend fe
    bind localhost:8080
    mode http
    http-request track-sc0 str(test),map_str_ip(test.map) table st_test     # raises an error

To fix this, and prevent future usage of ADDR as expected output type
(for either fetches or converters) from introducing new bugs, the
ADDR=>? part of the matrix should follow the ANY type logic.

That is, ADDR, which is a pseudo-type, must be compatible with itself,
and where IPV4 and IPV6 both support a backward conversion to a given
type, ADDR must support it as well. It is done by setting the relevant
ADDR entries to c_pseudo() in the compatibility matrix to indicate that
the operation is theoretically supported (c_pseudo() will never be executed
because ADDR should not be emitted: this only serves as a hint for
compatibility checks at parse time).

This is what's being done in this commit, thanks to this the broken
examples documented above should work as expected now, and future
usage of ADDR as out_type should not cause any issue.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
30cd137d3f MINOR: sample: introduce c_pseudo() conv function
This function is used for ANY=>!ANY conversions in the compatibility
matrix to help differentiate between real NOOPs (c_none) and pseudo
conversions that are theoretically supported at config parse time but can
never occur at runtime. That is, to make explicit the fact that the actual
related runtime operations (e.g.: ANY->IPV4) are not NOOPs since they might
require some conversion to be performed depending on the input type.

When checking the conf we don't know the effective out types, so
cast[pseudo type][pseudo type] is allowed in the compatibility matrix,
but at runtime we only expect cast[real type][(real type || pseudo type)]
because fetches and converters may not emit pseudo types, thus using
c_none() everywhere was too ambiguous.

The process will crash if c_pseudo() is invoked to help catch bugs:
crashing here means that a pseudo type has been encountered on a
converter's input at runtime (because it was emitted earlier in the
chain), which is not supported and results from a broken sample fetch
or converter implementation. (pseudo types may only be used as out_type
in sample definitions for compatibility checks at parsing time)
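
A toy fragment of such a matrix (types, sizes and cast functions are
illustrative, not haproxy's):

  #include <stdlib.h>

  enum { T_ANY, T_IPV4, T_STR, T_MAX };

  typedef int (*cast_fn)(void *smp);

  static int c_none(void *smp)   { (void)smp; return 1; }
  /* must never run: only present so parse-time checks accept the chain */
  static int c_pseudo(void *smp) { (void)smp; abort(); }

  static const cast_fn toy_casts[T_MAX][T_MAX] = {
      [T_ANY][T_STR]  = c_pseudo,   /* pseudo type row: parse-time hint only */
      [T_IPV4][T_STR] = c_none,     /* real runtime conversion */
  };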
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
58bbe41cb8 MEDIUM: acl/sample: unify sample conv parsing in a single function
Both sample_parse_expr() and parse_acl_expr() implement some code
logic to parse sample conv list after respective fetch or acl keyword.

(Seems like the acl one was inspired by the sample one historically)

But there is clearly code duplication between the two functions, making
them hard to maintain.
Fortunately, the parsing logic between them has stayed pretty much the
same, thus the sample conv parsing part may be moved into a dedicated
helper parsing function.

This is what's being done in this commit, we're adding the new function
sample_parse_expr_cnv() which does a single thing: parse the converters
that are listed right after a sample fetch keyword and inject them into
an already existing sample expression.

Both sample_parse_expr() and parse_acl_expr() were adapted to now make
use of this specific parsing function and duplicated code parts were
cleaned up.

Although sample_parse_expr() remains quite complicated (numerous function
arguments due to contextual parsing data) the first goal was to get rid of
code duplication without impacting the current behavior, with the added
benefit that it may allow further code cleanups / simplification in the
future.
2023-07-03 16:32:01 +02:00
Aurelien DARRAGON
461fd2e296 BUG/MINOR: tcp_sample: bc_{dst,src} return IP not INT
bc_{dst,src} sample fetches were introduced in 7d081f0 ("MINOR:
tcp_samples: Add samples to get src/dst info of the backend connection")

However a copy-pasting error was probably made here because they both
return IP addresses (not integers) like their fc_{src,dst} or src,dst
analogs.

Fortunately, this had no functional impact because an integer can be converted
to both IPV4 and IPV6 types, so the compatibility checks were not affected.

Better fix this to prevent confusion in code and "haproxy -dKsmp" output.

This should be backported up to 2.4 with 7d081f0.
2023-07-03 16:32:01 +02:00
Frédéric Lécaille
f4d9fa4089 BUILD: quic: Compilation fixes for some gcc warnings with -O1
This is to prevent these gcc warnings when building with the -O1 option:

  willy@wtap:haproxy$ sh make-native-quic-memstat src/quic_conn.o CPU_CFLAGS="-O1"
    CC      src/quic_conn.o
  src/quic_conn.c: In function 'qc_prep_pkts':
  src/quic_conn.c:3700:44: warning: potential null pointer dereference [-Wnull-dereference]
   3700 |                                 frms = &qel->pktns->tx.frms;
        |                                         ~~~^~~~~~~
  src/quic_conn.c:3626:33: warning: 'end' may be used uninitialized in this function [-Wmaybe-uninitialized]
   3626 |                         if (end - pos < QUIC_INITIAL_PACKET_MINLEN) {
        |                             ~~~~^~~~~
2023-07-03 14:50:54 +02:00
Frédéric Lécaille
7405874555 BUG/MINOR: quic: Missing QUIC connection path member initialization
This bug was introduced by this commit:
  MINOR: quic: Remove pool_zalloc() from qc_new_conn().

If ->path is not initialized to NULL value, and if a QUIC connection object
allocation has failed (from qc_new_conn()), haproxy could crash in
quic_conn_prx_cntrs_update() when dereferencing this QUIC connection member.

No backport needed.
2023-07-03 10:51:12 +02:00
Frédéric Lécaille
0e53cb07a5 BUG/MINOR: quic: Possible leak when allocating an encryption level
This bug was reported by GH #2200 (coverity issue) as follows:

*** CID 1516590:  Resource leaks  (RESOURCE_LEAK)
/src/quic_tls.c: 159 in quic_conn_enc_level_init()
153
154             LIST_APPEND(&qc->qel_list, &qel->list);
155             *el = qel;
156             ret = 1;
157      leave:
158             TRACE_LEAVE(QUIC_EV_CONN_CLOSE, qc);
>>>     CID 1516590:  Resource leaks  (RESOURCE_LEAK)
>>>     Variable "qel" going out of scope leaks the storage it points to.
159             return ret;
160     }
161
162     /* Uninitialize <qel> QUIC encryption level. Never fails. */
163     void quic_conn_enc_level_uninit(struct quic_conn *qc, struct quic_enc_level *qel)
164     {

This bug was introduced by this commit which foolishly assumed the encryption
level memory would be released after quic_conn_enc_level_init() has failed. This
is no longer possible because this object is now dynamic and no longer a static
member of the QUIC connection object.

Anyway, this patch modifies quic_conn_enc_level_init() to ensure there is
no leak anymore when it fails, by calling quic_conn_enc_level_uninit()
in case of memory allocation error.

quic_conn_enc_level_uninit() code was moved without modification, only to be
defined before quic_conn_enc_level_init().

There is no need to backport this.
2023-07-03 10:50:08 +02:00
Willy Tarreau
1f60133bfb BUILD: debug: avoid a build warning related to epoll_wait() in debug code
Ilya reported in issue #2193 that the latest Fedora complains about us
passing NULL to epoll_wait() in the "debug dev fd" code to try to detect
an epoll FD. That was intentional to get the kernel's verifications and
make sure we're facing a poller, but since such a warning comes from the
libc, it's possible that it plans to replace the syscall with a wrapper
in the near future (e.g. epoll_pwait()), and that just hiding the NULL
(as was confirmed to work) might just postpone the problem.

Let's take another approach, instead we'll create a new dummy FD that
we'll try to remove from the epoll set using epoll_ctl(). Since we
created the FD we're certain it cannot be there.  In this case (and
only in this case) epoll_ctl() will return -ENOENT, otherwise it will
typically return EINVAL or EBADF. It was verified that it works and
doesn't return false positives for other FD types. It should be
backported to the branches that contain a backport of the commit which
introduced the feature, apparently as far as 2.4:

  5be7c198e ("DEBUG: cli: add a new "debug dev fd" expert command")
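
A minimal sketch of the detection trick (dummy-FD creation and helper name
are illustrative, not the actual haproxy code):

  #include <sys/epoll.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <errno.h>

  /* Only a real epoll FD answers EPOLL_CTL_DEL on an unknown FD with
   * -1/ENOENT; other FD types typically yield EINVAL or EBADF.
   */
  static int looks_like_epoll_fd(int fd)
  {
      int dummy = open("/dev/null", O_RDONLY);
      int ret = 0;

      if (dummy < 0)
          return 0;
      if (epoll_ctl(fd, EPOLL_CTL_DEL, dummy, NULL) == -1 && errno == ENOENT)
          ret = 1;
      close(dummy);
      return ret;
  }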
2023-07-02 11:01:37 +02:00
Frédéric Lécaille
36e6c8aa4b BUILD: quic: Add a DISGUISE() to please some compiler to qc_prep_hpkts() 1st parameter
Some compilers could complain with such a warning:
  src/quic_conn.c:3700:44: warning: potential null pointer dereference [-Wnull-dereference]
   3700 |                                 frms = &qel->pktns->tx.frms;

They could not figure out that <qel> cannot be NULL at this location.
This is fixed by calling qc_prep_hpkts() with a DISGUISE()'d first parameter.
2023-06-30 18:48:52 +02:00
Frédéric Lécaille
7f3c1bef37 MINOR: quic: Drop packet with type for discarded packet number space.
This patch allows the low level packet parser to drop packets whose type belongs
to a discarded packet number space. Furthermore, this prevents it from reallocating
new encryption levels and packet number spaces which were already released/discarded.
When a packet number space is discarded, it MUST NOT be reallocated.

As the packet number space discarding is done as soon as the type of the received
packet is known, some packet number space discarding checks may be safely removed
from qc_try_rm_hp() and qc_qel_may_rm_hp(), which are called after having parsed
the packet header and its type.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
b97de9dc21 MINOR: quic: Move the packet number space status at quic_conn level
As the packet number spaces and encryption levels are dynamically allocated,
the information about the packet number space discarded status must be kept
somewhere else than in these objects.

quic_tls_discard_keys() is no longer useful.
Modify quic_pktns_discard() to do the same job: flag the quic_conn object
as having discarded packet number spaces.
Implement quic_tls_pktns_is_disarded() to check if a packet number space is
discarded. Note that the Application data packet number space is never discarded.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
75ae0f7bbc CLEANUP: quic: Remove a useless test about discarded pktns (qc_handle_crypto_frm())
There is no need to check that the packet number space associated to the encryption
level to handle the CRYPTO frames is available when entering qc_handle_crypto_frm().
This has already been done by the caller: qc_treat_rx_pkts().
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
f1bfbf24cd MEDIUM: quic: Release encryption levels and packet number spaces asap
Release the memory allocated for the Initial and Handshake encryption level
under packet number spaces as soon as possible.

qc_treat_rx_pkts() has been modified to do that for the Initial case. This must
be done after all the Initial packets have been parsed and the underlying TLS
secrets have been flagged as to be discarded. As the Initial encryption level is
removed from the list attached to the quic_conn object inside a list_for_each_entry()
loop, the latter had to be converted into a list_for_each_entry_safe() loop.

The memory allocated by the Handshake encryption level and packet number spaces
is released just before leaving the handshake I/O callback (qc_conn_io_cb()).

As the ->iel and ->hel pointers to the Initial and Handshake encryption levels are
reset to a null value when releasing the encryption levels memory, some checks have
been added at several places before dereferencing them. The same goes for the
->ipktns and ->hpktns pointers to packet number spaces.

Also take the opportunity of this patch to modify qc_dgrams_retransmit() to
use packet number space variables in place of encryption level variables to shorten
some statements. Furthermore this reflects the job it has to do: retransmit
UDP datagrams from packet number spaces.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
53caf351a9 CLEANUP: quic: Remove server specific about Initial packet number space
Remove a code section about the QUIC client Initial packet number space
dropping.

Should be backported as far as 2.6 to ease future backports to come.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
2d6a234af5 MINOR: quic: Remove call to qc_rm_hp_pkts() from I/O callback
quic_conn_io_cb() is the I/O handler callback used during the handshakes.
quic_conn_app_io_cb() is used after the handshakes. Both call qc_rm_hp_pkts()
before parsing the RX packets ordered by their packet numbers calling qc_treat_rx_pkts().
qc_rm_hp_pkts() is there to remove the header protection to reveal the packet
numbers, then reorder the packets by their packet numbers.

qc_rm_hp_pkts() may be safely directly called by qc_treat_rx_pkts(), which is
itself called by the I/O handlers.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
081f5002eb MEDIUM: quic: Handle the RX in one pass
Insert a loop around the existing code of qc_treat_rx_pkts() to handle all
the RX packets by encryption level for all the encryption levels allocated
for the connection, i.e. for all the encryption levels with secrets derived
from the ones provided by the TLS stack through ->set_encryption_secrets().
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
f7749968d6 CLEANUP: quic: Remove two useless pools at low QUIC connection level
Both "quic_tx_ring" and "quic_rx_crypto_frm" pool are no more used.

Should be backported as far as 2.6.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
3097be92f1 MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels
Replace ->els static array of encryption levels by 4 pointers into the QUIC
connection object, quic_conn struct.
    ->iel denotes the Initial encryption level,
    ->eel the Early-Data encryption level,
    ->hel the Handshake encryption level and
    ->ael the Application Data encryption level.

Add ->qel_list to this structure to list the encryption levels after having been
allocated. Modify consequently the encryption level object itself (quic_enc_level
struct) so that it might be added to ->qel_list QUIC connection list of
encryption levels.

Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption
level object. It is used to initialize the pointers newly added to the quic_conn
structure. It also takes a packet number space pointer address as argument to
initialize it if not already initialized.

Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init(), which is
called by qc_enc_level_alloc() to allocate an encryption level object.

Implement 2 new helper functions:
  - ssl_to_qel_addr() to match a pointer address to a QUIC encryption level
    attached to a quic_conn object with a TLS encryption level enum value;
  - qc_quic_enc_level() to match a pointer to a QUIC encryption level attached
    to a quic_conn object with an internal encryption level enum value.
These functions are useful when called from the ->set_encryption_secrets() and
->add_handshake_data() TLS stack callbacks, which take a TLS encryption level enum
as argument (enum ssl_encryption_level_t).

Replace all the uses of the qc->els[] array element values by one of the newly
added ->[ieha]el quic_conn struct member values.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
25a7b15144 MINOR: quic: Add a pool for the QUIC TLS encryption levels
Very simple patch to define and declare a pool for the QUIC TLS encryption levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove from quic_conn struct the
static array of encryption levels (see ->els[]).
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6635aa6a0a MEDIUM: quic: Dynamic allocations of packet number spaces
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to the quic_conn struct as pointers to quic_pktns
struct, one by packet number space as follows:
     ->ipktns for Initial packet number space,
     ->hpktns for Handshake packet number space and
     ->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers
from TLS stack encryption levels.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to
allocate the packet number spaces and initialize the encryption levels.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.

Modify qc_new_conn() to allocate only the Initial packet number space and
Initial encryption level.

Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.

Replace at several locations the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
411b6f73b7 MINOR: quic: Implement a packet number space identification function
Implement quic_pktns_char() to identify a packet number space from a
quic_conn object. Useful only for traces.
2023-06-30 16:20:55 +02:00
Frédéric Lécaille
6593ec6f5e MINOR: quic: Move QUIC TLS encryption level related code (quic_conn_enc_level_init())
quic_conn_enc_level_init()'s definitive location is in the QUIC TLS API source file:
src/quic_tls.c.
2023-06-30 16:20:55 +02:00
Willy Tarreau
56f15298d9 MINOR: compression/slz: add support for a pure flush of pending bytes
While HTTP makes no promises of sub-message delivery, haproxy tries to
make reasonable efforts to be friendly to applications relying on this,
particularly through the "option http-no-delay" statement. However, it
was reported that when slz compression is being used, a few bytes can
remain pending for more data to complete them in the SLZ queue when
built on a 64-bit little endian architecture. This is because aligning
blocks on byte boundary is costly (requires to switch to literals and
to send 4 bytes of block size), so incomplete bytes are left pending
there until they reach at least 32 bits. On other architecture, the
queue is only 8 bits long.

Robert Newson from Apache's CouchDB project explained that the heartbeat
used by CouchDB periodically delivers a single LF character, that it used
to work fine before the change enlarging the queue for 64-bit platforms,
but only forwards once every 3 LF after the change. This was definitely
caused by this incomplete byte sequence queuing. Zlib is not affected,
and the code shows that ->flush() is always called. In the case of SLZ,
the called function is rfc195x_flush_or_finish() and when its "finish"
argument is zero, no flush is performed because there was no such flush()
operation.

The previous patch implemented a flush() operation in SLZ, and this one
makes rfc195x_flush_or_finish() call it when finish==0. This way each
delivered data block will now provoke a flush of the queue as is done
with zlib.

This may very slightly degrade the compression ratio, but another change
is needed to condition this on "option http-no-delay" only in order to
be consistent with other parts of the code.

This patch (and the preceding slz one) should be backported at least to
2.6, but any further change to depend on http-no-delay should not.
2023-06-30 16:12:36 +02:00
Willy Tarreau
90d18e2006 IMPORT: slz: implement a synchronous flush() operation
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the output
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.

This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
2023-06-30 16:12:36 +02:00
Frédéric Lécaille
17eaee31c3 BUG/MINOR: quic: Wrong endianness for version field in Retry token
This field must be sent in network byte order.

Must be backported as far as 2.6.
2023-06-30 14:57:30 +02:00
Frédéric Lécaille
5997d18c78 BUG/MINOR: quic: Wrong Retry packet version field endianness
The 32-bit version field of the Retry packet was inverted by the code. As this
field must be in network byte order on the wire, this code assumed that
the sender of the Retry packet would always be little endian. Fortunately this is
often the case on our Intel machines ;)

Must be backported as far as 2.6.
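
A minimal sketch of the expected serialization (buffer handling and helper
name are illustrative, not the actual haproxy code):

  #include <arpa/inet.h>
  #include <stdint.h>
  #include <string.h>

  /* The 32-bit version field must hit the wire in network byte order,
   * whatever the host endianness.
   */
  static void write_version_be(unsigned char *buf, uint32_t version)
  {
      uint32_t be = htonl(version);

      memcpy(buf, &be, sizeof(be));
  }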
2023-06-30 14:41:31 +02:00
Frédéric Lécaille
6c9bf2bdf5 BUG/MINOR: quic: Missing random bits in Retry packet header
The 4 least significant bits of the first byte in a Retry packet must be
random. They are generated by calling statistical_prng_range() with 16 as argument.

Must be backported as far as 2.6.
2023-06-30 12:17:36 +02:00
Patrick Hemmer
bce0ca696c BUG/MINOR: config: fix stick table duplicate name check
When a stick-table is defined within a peers section, the name is
prefixed with the peers section name. However when checking for
duplicate table names, the check was using the table name without
the prefix, and would thus never match.

Must be backported as far as 2.6.
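
For illustration only (helper name and exact prefixed form are assumptions,
not the actual haproxy code), the duplicate check must compare against the
prefixed name rather than the bare table name:

  #include <stdio.h>
  #include <string.h>

  /* A table declared in a peers section is registered under a name of the
   * form "<peers name>/<table name>", so that is the form to compare.
   */
  static int same_registered_name(const char *peers, const char *table,
                                  const char *registered)
  {
      char full[256];

      snprintf(full, sizeof(full), "%s/%s", peers, table);
      return strcmp(full, registered) == 0;
  }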
2023-06-30 10:27:16 +02:00
William Lallemand
593c895eed MINOR: ssl: allow to change the client-sigalgs on server lines
This patch introduces the "client-sigalgs" keyword for the server line,
which allows configuring the list of server signature algorithms
negotiated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
2023-06-29 14:11:46 +02:00
William Lallemand
717f0ad995 MINOR: ssl: allow to change the server signature algorithm on server lines
This patch introduces the "sigalgs" keyword for the server line, which
allows configuring the list of server signature algorithms negotiated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
2023-06-29 13:40:18 +02:00
Emeric Brun
f473eb7206 BUG/MEDIUM: quic: error checking buffer large enough to receive the retry tag
When building a Retry message, the offset of the tag was checked instead of the
remaining length in the buffer.

Must be backported as far as 2.6.
2023-06-27 18:54:10 +02:00
Willy Tarreau
e12e202f6a BUILD: mux-h1: silence a harmless fallthrough warning
This warning happened in 2.9-dev with commit 723c73f8a ("MEDIUM: mux-h1:
Split h1_process_mux() to make code more readable"). It's the usual gcc
crap that relies on comments to disable the warning but which drops these
comments between the preprocessor and the compiler, so using any split
build system (distcc, ccache etc) reintroduces the warning. Use the more
reliable and portable __fallthrough instead. No backport needed.
2023-06-27 16:08:13 +02:00
William Lallemand
3388b23465 BUG/MINOR: ssl: SSL_ERROR_ZERO_RETURN returns CO_ER_SSL_EMPTY
Return a more accurate error than the previous patch: CO_ER_SSL_EMPTY is
the code for "Connection closed during SSL handshake" which is more
precise than CO_ER_SSL_ABORT ("Connection error during SSL handshake").

No backport needed.
2023-06-26 19:10:24 +02:00
William Lallemand
e8e5762389 MEDIUM: ssl: handle the SSL_ERROR_ZERO_RETURN during the handshake
During an SSL_do_handshake(), SSL_ERROR_ZERO_RETURN can be returned in case
the remote peer sent a close_notify alert. Previously this would set the
connection error to CO_ER_SSL_HANDSHAKE; this patch sets it to
CO_ER_SSL_ABORT to have a more accurate error.
2023-06-26 18:52:53 +02:00
Frédéric Lécaille
1231810963 BUG/MINOR: quic: Prevent deadlock with CID tree lock
This bug was introduced by this commit which was not sufficient:
      BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()

It was revealed by the blackhole interop runner test with neqo as client.

qc_conn_release() could be called after having locked the CID tree when two different
threads were creating the same connection at the same time. Indeed, in this case
the last thread which tried to create a new connection for an already existing
CID could not manage to insert an already inserted CID in the connection CID tree.
This was expected. It then had to destroy the needlessly created connection, calling
qc_conn_release(). But this function also locks this tree when calling
free_quic_conn_cids(), leading to a deadlock.
A solution would have been to delete the newly created CID from its tree before
calling qc_conn_release().

A better solution is to stop inserting the first CID from qc_new_conn(), and to
insert it into the CID tree only if there was not an already created connection.
This is what is implemented by this patch.

Must be backported as far as 2.7.
2023-06-26 14:09:58 +02:00
William Lallemand
117b03ff4a BUG/MINOR: mworker: leak of a socketpair during startup failure
Aurelien Darragon found a case of leak when working on ticket #2184.

When a reexec_on_failure() happens *BEFORE* protocol_bind_all(), the
worker is not forked and the mworker_proc struct is still there with
its 2 socketpairs.

The socketpair that is supposed to be in the master is already closed in
mworker_cleanup_proc(), and the one for the worker was supposed to
be cleaned up in mworker_cleanlisteners().

However, since the fd is not bound during this failure, the fd is never
closed.

This patch fixes the problem by setting the fd to -1 in the mworker_proc
after the fork, so we ensure that it won't be closed if everything
was done right, and then we try to close it in mworker_cleanup_proc()
when it's not set to -1.

This could be triggered with the script in ticket #2184 and a `ulimit -H
-n 300`. This will fail before the protocol_bind_all() when trying to
increase the nofile setrlimit.

In recent version of haproxy, there is a BUG_ON() in fd_insert() that
could be triggered by this bug because of the global.maxsock check.

Must be backported as far as 2.6.

The problem could exist in previous version but the code is different
and this won't be triggered easily without other consequences in the
master.
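
A minimal sketch of the ownership rule (field and function names assumed,
not the actual haproxy code):

  #include <unistd.h>

  struct proc_slot {
      int ipc_fd;   /* worker side of the socketpair */
  };

  /* After the fork, the parent marks the FD as handed over with -1, so a
   * later cleanup only closes FDs it still owns.
   */
  static void cleanup_proc(struct proc_slot *p)
  {
      if (p->ipc_fd != -1) {
          close(p->ipc_fd);
          p->ipc_fd = -1;
      }
  }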
2023-06-21 09:44:18 +02:00
Aurelien DARRAGON
d35cee972b BUG/MINOR: http_ext: fix if-none regression in forwardfor option
A regression was introduced in 730b983 ("MINOR: proxy: move 'forwardfor'
option to http_ext")

Indeed, when the forwardfor if-none option is specified on the frontend
but forwardfor is not specified at all on the backend, if-none from the
frontend is ignored.

But this behavior conflicts with the historical one: if-none should only
be ignored if forwardfor is also enabled on the backend and if-none is
not set there.

It should fix GH #2187.

This should be backported in 2.8 with 730b983 ("MINOR: proxy: move
'forwardfor' option to http_ext")
2023-06-20 15:32:56 +02:00
Christopher Faulet
a150cfcfec CLEANUP: mux-h1: Remove useless __maybe_unused statement
h1_append_chunk_size() and h1_prepend_chunk_crlf() functions were marked as
possibly unused to be able to add them in standalone commits. Now that these
functions are used, the __maybe_unused statement can be removed.
2023-06-20 13:59:24 +02:00
Christopher Faulet
c6ca6db034 MEDIUM: mux-h1: Add splicing support for chunked messages
When the HTX was introduced, we lost support for kernel splicing of
chunked messages. Thanks to this patch set, it is possible
again. Of course, we still need to keep the H1 parser synchronized. Thus
only the chunk content can be spliced. We still need to read the chunk
envelope using a buffer.

There is no reason to backport this feature. But, just in case, this patch
depends on following patches:

  * "MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered"
  * "MINOR: mux-h1: Add function to prepend the chunk crlf to the output buffer"
  * "MINOR: mux-h1: Add function to append the chunk size to the output buffer"
  * "REORG: mux-h1: Rename functions to emit chunk size/crlf in the output buffer"
  * "MEDIUM: mux-h1: Split h1_process_mux() to make code more readable"
2023-06-20 13:34:49 +02:00
Christopher Faulet
8bd835b2d2 MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered
If an HTTP data filter is registered on a channel, we must not rely on the
HTX extra field because the payload may be changed and we cannot predict if
this value will change or not. It is too error-prone to let filters deal with
this responsibility. So we set it to 0 when payload filtering is performed,
but only if the payload length can be determined. This is important because
this field may be used when data are forwarded. In fact, it will be used by
the H1 multiplexer to be able to splice chunk-encoded payload.
2023-06-20 13:34:46 +02:00
Christopher Faulet
05fe76b540 MINOR: mux-h1: Add function to prepend the chunk crlf to the output buffer
The h1_prepend_chunk_crlf() function does the opposite of
h1_append_chunk_crlf(). It emits the chunk CRLF in front of the output
buffer.
2023-06-20 13:33:59 +02:00
Christopher Faulet
a07c85c5df MINOR: mux-h1: Add function to append the chunk size to the output buffer
The h1_append_chunk_size() function does the opposite of
h1_prepend_chunk_size(). It emits the chunk size at the end of the output
buffer.
2023-06-20 13:33:53 +02:00
Christopher Faulet
e081efd448 REORG: mux-h1: Rename functions to emit chunk size/crlf in the output buffer
h1_emit_chunk_size() and h1_emit_chunk_crlf() functions were renamed,
respectively, h1_prepend_chunk_size() and h1_append_chunk_crlf().
2023-06-20 13:33:23 +02:00
Christopher Faulet
723c73f8a7 MEDIUM: mux-h1: Split h1_process_mux() to make code more readable
The h1_process_mux() function was pretty huge and quite hard to debug. So, the
function is split into sub-functions. Each sub-function is responsible for a
part of the message (start-line, headers, payload, trailers...). We are
still relying on an HTTP parser to format the message to be sure to detect
errors. Functionally speaking, there is no change. But the code is now more
readable.
2023-06-20 13:33:01 +02:00
Frédéric Lécaille
a55acf993a BUG/MINOR: quic: ticks comparison without ticks API use
Replace a "less than" comparison between two tick variable by a call to tick_is_lt()
in quic_loss_pktns(). This bug could lead to a wrong packet loss detection
when the loss time computed values could wrap. This is the case 20 seconds after
haproxy has started.

Must be backported as far as 2.6.
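
A minimal sketch of a wrap-safe comparison on 32-bit ticks (same idea as
tick_is_lt(); the exact haproxy signature is not reproduced here):

  #include <stdint.h>

  /* A plain "a < b" breaks once the tick counter wraps; comparing the
   * signed difference keeps working across the wrap point.
   */
  static inline int tick_lt(uint32_t a, uint32_t b)
  {
      return (int32_t)(a - b) < 0;
  }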
2023-06-19 19:05:45 +02:00
William Lallemand
e6051a04ef BUG/MEDIUM: mworker: increase maxsock with each new worker
In ticket #2184, HAProxy is crashing in a BUG_ON() after a lot of reloads
when the previous processes did not exit.

Each worker has a socketpair which is an FD in the master; when reloading,
this FD still exists until the process leaves. But the global.maxconn
value is not incremented for each of these FDs. So when there are too many
workers and the number of FDs reaches maxsock, the next FD inserted in
the poller will crash the process.

This patch fixes the issue by increasing the maxsock for each remaining
worker.

Must be backported in every maintained version.
2023-06-19 17:32:32 +02:00
Frédéric Lécaille
98b55d1260 BUG/MINOR: quic: Missing transport parameters initializations
This bug was introduced by this commit:

     MINOR: quic: Remove pool_zalloc() from qc_new_conn()

The transport parameters were not initialized. This led to a crash when
dumping the received ones from TRACE()s.

Also reset the lengths of the CIDs attached to a quic_conn struct to 0 value
to prevent them from being dumped from traces when not already initialized.

No backport needed.
2023-06-19 08:49:04 +02:00
Frédéric Lécaille
30254d5e75 MINOR: quic: Remove pool_zalloc() from quic_dgram_parse()
Replace a call to pool_zalloc() by a call to pool_malloc() in quic_dgram_parse()
to allocate quic_rx_packet struct objects.
Initialize almost all the members of quic_rx_packet struct.
->saddr is initialized by quic_rx_pkt_retrieve_conn().
->pnl and ->pn are initialized by qc_do_rm_hp().
->dcid and ->scid are initialized by quic_rx_pkt_parse() which calls
quic_packet_read_long_header() for a long packet. For a short packet,
only ->dcid will be initialized.
2023-06-16 16:56:08 +02:00
Frédéric Lécaille
9f9bd5f84d MINOR: quic: Remove pool_zalloc() from qc_conn_alloc_ssl_ctx()
pool_zalloc() is replaced by pool_alloc() in qc_conn_alloc_ssl_ctx() to allocate
an ssl_sock_ctx struct. The ssl_sock_ctx struct members are all initialized to null
values except ->ssl, which is initialized by the next statement: a call to
qc_ssl_sess_init().
2023-06-16 16:56:08 +02:00
Frédéric Lécaille
ddc616933c MINOR: quic: Remove pool_zalloc() from qc_new_conn()
qc_new_conn() is used to initialize QUIC connections with quic_conn struct objects.
This function calls quic_conn_release() when it fails to initialize a connection.
quic_conn_release() is also called to release the memory allocated by a QUIC
connection.

Replace pool_zalloc() by pool_alloc() in this function and initialize
all quic_conn struct members which are referenced by quic_conn_release() to
prevent the use of uninitialized variables in this function.
The ebtrees and the lists attached to the quic_conn struct must be initialized.
The tasks must be reset to their NULL default values to be safely destroyed
by task_destroy(). The same goes for all the TLS cipher contexts
of the encryption levels (struct quic_enc_level) and those for the keyupdate.
The packet number spaces (struct quic_pktns) must also be initialized.
The ->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update()
from dereferencing this pointer.
The ->latest_rtt member of the quic_loss struct must also be initialized. This is
done by quic_loss_init(), called by quic_path_init().
2023-06-16 16:55:58 +02:00
Frédéric Lécaille
4ae29be18c BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()
This may happen when the initialization of a new QUIC conn fails in qc_new_conn()
when receiving an Initial packet. This happens after having allocated a CID with
new_quic_cid(), called by quic_rx_pkt_retrieve_conn(); this CID stays in the listener
connections tree without a QUIC connection attached to it. Then when the listener
receives another Initial packet for the same CID, quic_rx_pkt_retrieve_conn()
returns NULL again (no QUIC connection) but with a thread ID already bound to the
connection, leading the datagram to be requeued in the same datagram handler thread
queue. And so on.

To fix this, the connection is created after having created the connection ID.
If this fails, the connection is deallocated.

During the race condition, when two different threads handle two datagrams for
the same connection, in addition to releasing the newly created connection ID,
the newly created QUIC connection must also be released.

Must be backported as far as 2.7.
2023-06-16 16:10:58 +02:00