40 Commits

Author SHA1 Message Date
Dragan Dosen
96c1a61136 MEDIUM: udp: allow to retrieve the frontend destination address
A new flag RX_F_PASS_PKTINFO is now available, whose purpose is to mark
that the destination address is about to be retrieved on some listeners.

The address can be retrieved from the first received datagram, and
relies on the IP_PKTINFO, IP_RECVDSTADDR and IPV6_RECVPKTINFO support.
2024-01-02 11:44:42 +01:00
Amaury Denoyelle
55e78ff7e1 MINOR: rhttp: large renaming to use rhttp prefix
Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'.
This commits follows this by replacing various custom prefix by 'rhttp_'
to make the code uniform.

Note that 'reverse_' prefix was kept in connection module. This is
because if a new reversable protocol not based on HTTP is implemented,
it may be necessary to reused the same connection function which are
protocol agnostic.
2023-11-23 17:40:01 +01:00
Amaury Denoyelle
e09af499b4 MINOR: rhttp: rename proto_reverse_connect
This commit is renaming of module proto_reverse_connect to proto_rhttp.
This name is selected as it is shorter and more precise.
2023-11-23 17:38:58 +01:00
Willy Tarreau
445fc1fe3a BUG/MINOR: sock: mark abns sockets as non-suspendable and always unbind them
In 2.3, we started to get a cleaner socket unbinding mechanism with
commit f58b8db47 ("MEDIUM: receivers: add an rx_unbind() method in
the protocols"). This mechanism rightfully refrains from unbinding
when sockets are expected to be transferrable to another worker via
"expose-fd listeners", but this is not compatible with ABNS sockets,
which do not support reuseport, unbinding nor being renamed: in short
they will always prevent a new process from binding.

It turns out that this is not much visible because by pure accident,
GTUNE_SOCKET_TRANSFER is only set in the code dealing with master mode
and deamons, so it's never set in foreground mode nor in tests even if
present on the stats socket. However with master mode, it is now always
set even when not present on the stats socket, and will always conflict.

The only reasonable approach seems to consist in marking these abns
sockets as non-suspendable so that the generic sock_unbind() code can
decide to just unbind them regardless of GTUNE_SOCKET_TRANSFER.

This should carefully be backported as far as 2.4.
2023-11-20 11:38:26 +01:00
Amaury Denoyelle
bb28215d9b MEDIUM: quic: define an accept queue limit
QUIC connections are pushed manually into a dedicated listener queue
when they are ready to be accepted. This happens after handshake
finalization or on 0-RTT packet reception. Listener is then woken up to
dequeue them with listener_accept().

This patch comptabilizes the number of connections currently stored in
the accept queue. If reaching a certain limit, INITIAL packets are
dropped on reception to prevent further QUIC connections allocation.
This should help to preserve system resources.

This limit is automatically derived from the listener backlog. Half of
its value is reserved for handshakes and the other half for accept
queues. By default, backlog is equal to maxconn which guarantee that
there can't be no more than maxconn connections in handshake or waiting
to be accepted.
2023-11-09 16:24:00 +01:00
Amaury Denoyelle
3df6a60113 MEDIUM: quic: limit handshake per listener
Implement a limit per listener for concurrent number of QUIC
connections. When reached, INITIAL packets for new connections are
automatically dropped until the number of handshakes is reduced.

The limit value is automatically based on listener backlog, which itself
defaults to maxconn.

This feature is important to ensure CPU and memory resources are not
consume if too many handshakes attempt are started in parallel.

Special care is taken if a connection is released before handshake
completion. In this case, counter must be decremented. This forces to
ensure that member <qc.state> is set early in qc_new_conn() before any
quic_conn_release() invocation.
2023-11-09 16:23:52 +01:00
Amaury Denoyelle
2ac5d9a657 MINOR: quic: handle perm error on bind during runtime
Improve EACCES permission errors encounterd when using QUIC connection
socket at runtime :

* First occurence of the error on the process will generate a log
  warning. This should prevent users from using a privileged port
  without mandatory access rights.

* Socket mode will automatically fallback to listener socket for the
  receiver instance. This requires to duplicate the settings from the
  bind_conf to the receiver instance to support configurations with
  multiple addresses on the same bind line.
2023-10-03 16:52:02 +02:00
Amaury Denoyelle
b9bb3b932c MINOR: proto_reverse_connect: emit log for preconnect
Add reporting using send_log() for preconnect operation. This is minimal
to ensure we understand the current status of listener in active reverse
connect.

To limit logging quantity, only important transition are considered.
This requires to implement a minimal state machine as a new field in
receiver structure.

Here are the logs produced :
* Initiating : first time preconnect is enabled on a listener
* Error : last preconnect attempt interrupted on a connection error
* Reaching maxconn : all necessary connections were reversed and are
  operational on a listener
2023-09-22 17:21:53 +02:00
Amaury Denoyelle
47f502df5e MEDIUM: proto_reverse_connect: bootstrap active reverse connection
Implement active reverse connection initialization. This is done through
a new task stored in the receiver structure. This task is instantiated
via bind callback and first woken up via enable callback.

Task handler is separated into two halves. On the first step, a new
connection is allocated and stored in <pend_conn> member of the
receiver. This new client connection will proceed to connect using the
server instance referenced in the bind_conf.

When connect has successfully been executed and HTTP/2 connection is
ready for exchange after SETTINGS, reverse_connect task is woken up. As
<pend_conn> is still set, the second halve is executed which only
execute listener_accept(). This will in turn execute accept_conn
callback which is defined to return the pending connection.

The task is automatically requeued inside accept_conn callback if bind
maxconn is not yet reached. This allows to specify how many connection
should be opened. Each connection is instantiated and reversed serially
one by one until maxconn is reached.

conn_free() has been modified to handle failure if a reverse connection
fails before being accepted. In this case, no session exists to notify
about the failure. Instead, reverse_connect task is requeud with a 1
second delay, giving time to fix a possible network issue. This will
allow to attempt a new connection reverse.

Note that for the moment connection rebinding after accept is disabled
for simplicity. Extra operations are required to migrate an existing
connection and its stack to a new thread which will be implemented
later.
2023-08-24 17:03:06 +02:00
Amaury Denoyelle
0747e493a0 MINOR: proto_reverse_connect: parse rev@ addresses for bind
Implement parsing for "rev@" addresses on bind line. On config parsing,
server name is stored on the bind_conf.

Several new callbacks are defined on reverse_connect protocol to
complete parsing. listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure HTTP/2
protocol only is used by the server.
2023-08-24 17:02:37 +02:00
Willy Tarreau
e4c36aa8a1 MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped
The purpose of this new flag will be to mark that some listeners
duplicate their reference's FD instead of trying to setup a completely
new listener from scratch. This will be used when multiple groups want
to listen to the same socket, via multiple FDs.
2023-04-21 17:41:26 +02:00
Willy Tarreau
aae1810b4d MINOR: receiver: add a struct shard_info to store info about each shard
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
  - the number of groups (= number of receivers)
  - the number of threads (will be used for accept LB)
  - pointer to the reference rx (to get the FD and to find all threads)
  - pointers to the other members (to iterate over all threads)

For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid o(n^2) updates during
ordered deletes).

Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver shard_info==NULL is
alone and doesn't share its socket with another group.

Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.

For now the attach code is not called, but detach is already called
on delete.
2023-04-21 17:41:26 +02:00
Amaury Denoyelle
0783a7b08e MINOR: listener: remove unneeded local accept flag
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.

This should be backported up to 2.7 after the previous patch has also
been taken.
2023-04-18 17:09:34 +02:00
Willy Tarreau
d30e82b9f0 MINOR: receiver: reserve special values for "shards"
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
2023-04-13 17:12:50 +02:00
Amaury Denoyelle
1cba8d60f3 CLEANUP: quic: improve naming for rxbuf/datagrams handling
QUIC datagrams are read from a random thread. They are then redispatch
to the connection thread according to the first packet DCID. These
operations are implemented through a special buffer designed to avoid
locking.

Refactor this code with the following changes :
* <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also
  renamed to highligh its attach point to a receiver.
* <quic_dgram> and <quic_receiver_buf> definition are moved to
  quic_sock-t.h. This helps to reduce the size of quic_conn-t.h.
* <quic_dgram> list elements are renamed to highlight their attach point
  into a <quic_receiver_buf> and a <quic_dghdlr>.

This should be backported up to 2.6.
2022-10-13 11:06:48 +02:00
Amaury Denoyelle
8c4d062d25 CLEANUP: quic: remove unused rxbufs member in receiver
rxbuf is the structure used to store QUIC datagrams and redispatch them
to the connection thread.

Each receiver manages a list of rxbuf. This was stored both as an array
and a mt_list. Currently, only mt_list is needed so removed <rxbufs>
member from receiver structure.

This should be backported up to 2.6.
2022-10-13 11:05:41 +02:00
Willy Tarreau
cab054bbf9 CLEANUP: quic/receiver: remove the now unused tx_qring list
The tx_qrings[] and tx_qring_list in the receiver are not used
anymore since commit f2476053f ("MINOR: quic: replace custom buf on Tx
by default struct buffer"), the only place where they're referenced
was in quic_alloc_tx_rings_listener(), which by the way implies that
these were not even freed on exit.

Let's just remove them. This should be backported to 2.6 since the
commit above also was.
2022-10-11 08:40:38 +02:00
Amaury Denoyelle
8524f0f779 MINOR: quic: use a global dghlrs for each thread
Move the QUIC datagram handlers oustide of the receivers. Use a global
handler per-thread which is allocated on post-config. Implement a free
function on process deinit to avoid a memory leak.
2022-02-15 10:13:20 +01:00
Frédéric Lécaille
74904a4792 MINOR: quic: Make usage of by datagram handler trees
The CID trees are no more attached to the listener receiver but to the
underlying datagram handlers (one by thread) which run always on the same thread.
So, any operation on these trees do not require any locking.
2022-01-27 16:37:55 +01:00
Frédéric Lécaille
794d068d8f MINOR: proto_quic: Wrong allocations for TX rings and RX bufs
As mentionned in the comment, the tx_qrings and rxbufs members of
receiver struct must be pointers to pointers!
Modify the functions responsible of their allocations consequently.
Note that this code could work because sizeof rxbuf and sizeof tx_qrings
are greater than the size of pointer!
2022-01-27 16:37:55 +01:00
Frédéric Lécaille
69dd5e6a0b MINOR: proto_quic: Allocate datagram handlers
Add quic_dghdlr new struct do define datagram handler tasks, one by thread.
Allocate them and attach them to the listener receiver part calling
quic_alloc_dghdlrs_listener() newly implemented function.
2022-01-27 16:37:55 +01:00
Amaury Denoyelle
cfa2d5648f MAJOR: quic: implement accept queue
Do not proceed to direct accept when creating a new quic_conn. Wait for
the QUIC handshake to succeeds to insert the quic_conn in the accept
queue. A tasklet is then woken up to call listener_accept to accept the
quic_conn.

The most important effect is that the connection/mux layers are not
instantiated at the same time as the quic_conn. This forces to delay
some process to be sure that the mux is allocated :
* initialization of mux transport parameters
* installation of the app-ops

Also, the mux instance is not checked now to wake up the quic_conn
tasklet. This is safe because the xprt-quic code is now ready to handle
the absence of the connection/mux layers.

Note that this commit has a deep impact as it changes significantly the
lower QUIC architecture. Most notably, it breaks the 0-RTT feature.
2022-01-26 16:13:54 +01:00
Amaury Denoyelle
7f7713d6ef MINOR: receiver: define a flag for local accept
This flag is named RX_F_LOCAL_ACCEPT. It will be activated for special
receivers where connection balancing to threads is already handle
outside of listener_accept, such as with QUIC listeners.
2022-01-26 11:22:20 +01:00
Frédéric Lécaille
c1029f6182 MINOR: quic: Allocate listener RX buffers
At this time we allocate an RX buffer by thread.
Also take the opportunity offered by this patch to rename TX related variable
names to distinguish them from the RX part.
2021-11-05 15:20:04 +01:00
Willy Tarreau
6dfbef4145 MEDIUM: listener: add the "shards" bind keyword
In multi-threaded mode, on operating systems supporting multiple listeners on
the same IP:port, this will automatically create this number of multiple
identical listeners for the same line, all bound to a fair share of the number
of the threads attached to this listener. This can sometimes be useful when
using very large thread counts where the in-kernel locking on a single socket
starts to cause a significant overhead. In this case the incoming traffic is
distributed over multiple sockets and the contention is reduced. Note that
doing this can easily increase the CPU usage by making more threads work a
little bit.

If the number of shards is higher than the number of available threads, it
will automatically be trimmed to the number of threads. A special value
"by-thread" will automatically assign one shard per thread.
2021-10-14 21:27:48 +02:00
Willy Tarreau
01cac3f721 MEDIUM: listeners: split the thread mask between receiver and bind_conf
With groups at some point we'll have to have distinct masks/groups in the
receiver and the bind_conf, because a single bind_conf might require to
instantiate multiple receivers (one per group).

Let's split the thread mask and group to have one for the bind_conf and
another one for the receiver while it remains easy to do. This will later
allow to use different storage for the bind_conf if needed (e.g. support
multiple groups).
2021-10-14 21:27:48 +02:00
Willy Tarreau
d57b9ff7af MEDIUM: listeners: support the definition of thread groups on bind lines
This extends the "thread" statement of bind lines to support an optional
thread group number. When unspecified (0) it's an absolute thread range,
and when specified it's one relative to the thread group. Masks are still
used so no more than 64 threads may be specified at once, and a single
group is possible. The directive is not used for now.
2021-10-08 17:22:26 +02:00
Frédéric Lécaille
48f8e1925b MINOR: proto_quic: Allocate TX ring buffers for listeners
We allocate an array of QUIC ring buffer, one by thread, and arranges them in a
MT_LIST. Everything is allocated or nothing: we do not want to usse an incomplete
array of ring buffers to ensure that each thread may safely acquire one of these
buffers.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
f3d078d22e MINOR: quic: Make qc_lstnr_pkt_rcv() be thread safe.
Modify the I/O dgram handler principal function used to parse QUIC packets
be thread safe. Its role is at least to create new incoming connections
add to two trees protected by the same RW lock. The packets are for now on
fully parsed before possibly creating new connections.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
c28aba2a8d MINOR: quic: Replace the RX list of packet by a thread safety one.
This list is shared between the I/O dgram handler and the task responsible
for processing the QUIC packets.
2021-09-23 15:27:25 +02:00
Willy Tarreau
72faef3866 MEDIUM: global: remove dead code from nbproc/bind_proc removal
Lots of places iterating over nbproc or comparing with nbproc could be
simplified. Further, "bind-process" and "process" parsing that was
already limited to process 1 or "all" or "odd" resulted in a bind_proc
field that was either 0 or 1 during the init phase and later always 1.

All the checks for compatibilities were removed since it's not possible
anymore to run a frontend and a backend on different processes or to
have peers and stick-tables bound on different ones. This is the largest
part of this patch.

The bind_proc field was removed from both the proxy and the receiver
structs.

Since the "process" and "bind-process" directives are still parsed,
configs making use of correct values allowing process 1 will continue
to work.
2021-06-15 16:52:42 +02:00
Frdric Lcaille
884f2e9f43 MINOR: listener: Add QUIC info to listeners and receivers.
This patch adds a quic_transport_params struct to bind_conf struct
used for the listeners. This is to store the QUIC transport parameters
for the listeners. Also initializes them when calling str2listener().
Before str2sa_range() it's too early to figure we're going to speak QUIC,
and after it's too late as listeners are already created. So it seems that
doing it in str2listener() when the protocol is discovered is the best
place.

Also adds two ebtrees to the underlying receivers to store the connection
by connections IDs (one for the original connection IDs, and another
one for the definitive connection IDs which really identify the connections.

However it doesn't seem normal that it is stored in the receiver nor the
listener. There should be a private context in the listener so that
protocols can store internal information. This element should in
fact be the listener handle.

Something still feels wrong, and probably we'll have to make QUIC and
SSL co-exist: a proof of this is that there's some explicit code in
bind_parse_ssl() to prevent the "ssl" keyword from replacing the xprt.
2020-12-23 11:57:26 +01:00
Willy Tarreau
d2fb99f9d5 MINOR: protocol: add a default I/O callback and put it into the receiver
For now we're still using the protocol's default accept() function as
the I/O callback registered by the receiver into the poller. While
this is usable for most TCP connections where a listener is needed,
this is not suitable for UDP where a different handler is needed.

Let's make this configurable in the receiver just like the upper layer
is configurable for listeners. In order to ease stream protocols
handling, the protocols will now provide a default I/O callback
which will be preset into the receivers upon allocation so that
almost none of them has to deal with it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
18c20d28d7 MINOR: listeners: move the LI_O_MWORKER flag to the receiver
This listener flag indicates whether the receiver part of the listener
is specific to the master or to the workers. In practice it's only used
by the master's CLI right now. It's used to know whether or not the FD
must be closed before forking the workers. For this reason it's way more
of a receiver's property than a listener's property, so let's move it
there under the name RX_F_MWORKER. The rest of the code remains
unchanged.
2020-10-09 18:43:05 +02:00
Willy Tarreau
3fd3bdc836 MINOR: receiver: move the FOREIGN and V6ONLY options from listener to settings
The new RX_O_FOREIGN, RX_O_V6ONLY and RX_O_V4V6 options are now set into
the rx_settings part during the parsing, so that we don't need to adjust
them in each and every listener anymore. We have to keep both v4v6 and
v6only due to the precedence from v6only over v4v6.
2020-09-16 22:08:07 +02:00
Willy Tarreau
43046fa4f4 MINOR: listener: move the INHERITED flag down to the receiver
It's the receiver's FD that's inherited from the parent process, not
the listener's so the flag must move to the receiver so that appropriate
actions can be taken.
2020-09-16 22:08:07 +02:00
Willy Tarreau
0b9150155e MINOR: receiver: add a receiver-specific flag to indicate the socket is bound
In order to split the receiver from the listener, we'll need to know that
a socket is already bound and ready to receive. We used to do that via
tha LI_O_ASSIGNED state but that's not sufficient anymore since the
receiver might not belong to a listener anymore. The new RX_F_BOUND flag
is used for this.
2020-09-16 22:08:07 +02:00
Willy Tarreau
eef454224d MINOR: receiver: link the receiver to its owner
A receiver will have to pass a context to be installed into the fdtab
for use by the handler. We need to set this into the receiver struct
as the bind will happen longer after the configuration.
2020-09-16 22:08:07 +02:00
Willy Tarreau
0fce6bce34 MINOR: receiver: link the receiver to its settings
Just like listeners keep a pointer to their bind_conf, receivers now also
have a pointer to their rx_settings. All those belonging to a listener are
automatically initialized with a pointer to the bind_conf's settings.
2020-09-16 22:08:07 +02:00
Willy Tarreau
d45693d85c REORG: listener: move the receiver part to a new file
We'll soon add flags for the receivers, better add them to the final
file, so it's time to move the definition to receiver-t.h. The struct
receiver and rx_settings were placed there.
2020-09-16 22:08:07 +02:00