A new protocol named "reverse_connect" is created. This will be used to
instantiate connections that are opened by a reverse bind.
For the moment, only a minimal set of callbacks are defined with no real
work. This will be extended along the next patches.
When testing if a protocol supports SO_REUSEPORT, we're now able to
verify if the OS does really support it. While it may be supported at
build time, it may possibly have been blocked in a container for
example so we'd rather know what it's like.
Some protocol support SO_REUSEPORT and others not. Some have such a
limitation in the kernel, and others in haproxy itself (e.g. sock_unix
cannot support multiple bindings since each one will unbind the previous
one). Also it's really protocol-dependent and not just family-dependent
because on Linux for some time it was supported for TCP and not UDP.
Let's move the definition to the protocols instead. Now it's preset in
tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset.
The enabled() config condition test validates IPv4 (generally sufficient),
and -dR / noreuseport all protocols at once.
We'll use these flags to know if some protocols are supported, and if
so, with what options/extensions. Reuseport will move there for example.
Two functions were added to globally set/clear a flag.
This field forces an unaligned hole between two list heads. Let's move
it up where it will be more easily combined with other fields. In
addition, turn it to unsigned while it's still not used.
There's a two-byte hole in proto_fam after sock_family, let's move the
l3_addrlen there as a ushort. Note that contrary to what the comment
says, it's still not used by hash algorithms though it could.
Define a new protocol callback set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection on the selected thread queue. If the
callback fails, accept is done locally.
This change will be useful for protocols with state allocated before
accept is done. For the moment, only QUIC protocol is concerned. This
will allow to rebind the quic_conn to a new thread depending on its
load.
This should be backported up to 2.7 after a period of observation.
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) denaturate the underlying control layer, except for QUIC
which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.
Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
Right now the proto_fam descriptor provides a family-specific
get_src() and get_dst() pair of calls to retrieve a socket's source
or destination address. However this only works for connected mode
sockets. QUIC provides its own stream protocol, which relies on a
datagram protocol underneath, so the get_src()/get_dst() at that
protocol's family will not work, and QUIC would need to provide its
own.
This patch implements get_src() and get_dst() at the protocol level
from a connection, and makes sure that conn_get_src()/conn_get_dst()
will automatically use them if defined before falling back to the
family's pair of functions.
The protocol selection is currently performed based on the family,
control type and socket type. But this is often not enough, as both
only provide DGRAM or STREAM, leaving few variants. Protocols like
SCTP for example might be indistinguishable from TCP here. Same goes
for TCP extensions like MPTCP.
This commit introduces a new enum proto_type that is placed in each
and every protocol definition, that will usually more or less match
the sock_type, but being an enum, will support additional values.
No need to include the full tree management code, type files only
need the definitions. Doing so reduces the whole code size by around
3.6% and the build time is down to just 6s.
Right now the connection subscribe/unsubscribe code needs to manipulate
FDs, which is not compatible with QUIC. In practice what we need there
is to be able to either subscribe or wake up depending on readiness at
the moment of subscription.
This commit introduces two new functions at the control layer, which are
provided by the socket code, to check for FD readiness or subscribe to it
at the control layer. For now it's not used.
This is what we need to drain pending incoming data from an connection.
The code was taken from conn_sock_drain() without the connection-specific
stuff. It still takes a connection for now for API simplicity.
Currnetly conn_ctrl_init() does an fd_insert() and conn_ctrl_close() does an
fd_delete(). These are the two only short-term obstacles against using a
non-fd handle to set up a connection. Let's have pur these into the protocol
layer, along with the other connection-level stuff so that the generic
connection code uses them instead. This will allow to define new ones for
other protocols (e.g. QUIC).
Since we only support regular sockets at the moment, the code was placed
into sock.c and shared with proto_tcp, proto_uxst and proto_sockpair.
For the sake of an improved readability, let's group the protocol
field members according to where they're supposed to be defined:
- connection layer (note: for now even UDP needs one)
- binding layer
- address family
- socket layer
Nothing else was changed.
This field used to be needed before commit 2b5e0d8b6 ("MEDIUM: proto_udp:
replace last AF_CUST_UDP* with AF_INET*") as it was used as a protocol
entry selector. Since this commit it's always equal to the socket family's
value so it's entirely redundant. Let's remove it now to simplify the
protocol definition a little bit.
With the removal of the family-specific port setting, all protocol had
exactly the same implementation of ->add(). A generic one was created
with the name "default_add_listener" so that all other ones can now be
removed. The API was slightly adjusted so that the protocol and the
listener are passed instead of the listener and the port.
Note that all protocols continue to provide this ->add() method instead
of routinely calling default_add_listener() from create_listeners(). This
makes sure that any non-standard protocol will still be able to intercept
the listener addition if needed.
This could be backported to 2.3 along with the few previous patches on
listners as a pure code cleanup.
At various places we need to set a port on an IPv4 or IPv6 address, and
it requires casts that are easy to get wrong. Let's add a new set_port()
helper to the address family to assist in this. It will be directly
accessible from the protocol and will make the operation seamless.
Right now this is only implemented for sock_inet as other families do
not need a port.
We don't need to specify the handler anymore since it's set in the
receiver. Let's remove this argument from the function and clean up
the remains of code that were still setting it.
Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.
The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.
For now we're still using the protocol's default accept() function as
the I/O callback registered by the receiver into the poller. While
this is usable for most TCP connections where a listener is needed,
this is not suitable for UDP where a different handler is needed.
Let's make this configurable in the receiver just like the upper layer
is configurable for listeners. In order to ease stream protocols
handling, the protocols will now provide a default I/O callback
which will be preset into the receivers upon allocation so that
almost none of them has to deal with it.
This per-protocol function will be used to accept an incoming
connection and return it as a struct connection*. As such the protocol
stack's internal representation of a connection will not need to be
handled by the listener code.
No protocol defines it anymore. The last user used to be the monitor-net
stuff that got partially broken already when the tcp_drain() function
moved to conn_sock_drain() with commit e215bba95 ("MINOR: connection:
make conn_sock_drain() work for all socket families") in 1.9-dev2.
A part of this will surely move back later when non-socket connections
arrive with QUIC but better keep the API clean and implement what's
needed in time instead.
Now we introdce a new .rx_listening() function to report if a receiver is
actually a listening socket. The reason for this is to help detect shared
sockets that might have been broken by sibling processes.
Now we have ->suspend() and ->resume() for listeners at the protocol
level. This means that it now becomes possible for a protocol to redefine
its own way to suspend and resume. The default functions are provided for
TCP, UDP and unix, and they are pass-through to the receiver equivalent
as it used to be till now. Nothing was defined for sockpair since it does
not need to suspend/resume during reloads, hence it will succeed.
The inner part now goes into the protocol and is used to decide how to
unbind a given protocol's listener. The existing code which is able to
also unbind the receiver was provided as a default function that we
currently use everywhere. Some complex listeners like QUIC will use this
to decide how to unbind without impacting existing connections, possibly
by setting up other incoming paths for the traffic.
This is used as a generic way to unbind a receiver at the end of
do_unbind_listener(). This allows to considerably simplify that function
since we can now let the protocol perform the cleanup. The generic code
was moved to sock.c, along with the conditional rx_disable() call. Now
the code also supports that the ->disable() function of the protocol
which acts on the listener performs the close itself and adjusts the
RX_F_BUOND flag accordingly.
These methods will be used to enable/disable accepting new connections
so that listeners do not play with FD directly anymore. Since all the
currently supported protocols work on socket for now, these are identical
to the rx_enable/rx_disable functions. However they were not defined in
sock.c since it's likely that some will quickly start to differ. At the
moment they're not used.
We have to take care of fd_updt before calling fd_{want,stop}_recv()
because it's allocated fairly late in the boot process and some such
functions may be called very early (e.g. to stop a disabled frontend's
listeners).
These methods will be used to enable/disable rx at the receiver level so
that callers don't play with FDs directly anymore. All our protocols use
the generic ones from sock.c at the moment. For now they're not used.
This one undoes ->rx_suspend(), it tries to restore an operational socket.
It was only implemented for TCP since it's the only one we support right
now.
The ->pause method is inappropriate since it doesn't exactly "pause" a
listener but rather temporarily disables it so that it's not visible at
all to let another process take its place. The term "suspend" is more
suitable, since the "pause" is actually what we'll need to apply to the
FULL and LIMITED states which really need to make a pause in the accept
process. And it goes well with the use of the "resume" function that
will also need to be made per-protocol.
Let's rename the function and make it act on the receiver since it's
already what it essentially does, hence the prefix "_rx" to make it
more explicit.
The protocol struct was a bit reordered because it was becoming a real
mess between the parts related to the listeners and those for the
receivers.
Since the listeners were split into receiver+listener, this field ought
to have been renamed because it's confusing. It really links receivers
and not listeners, as most of the time it's used via rx.proto_list!
The nb_listeners field was updated accordingly.
We don't need to cheat with the sock_domain anymore, we now always have
the SOCK_DGRAM sock_type as a complementary selector. This patch restores
the sock_domain to AF_INET* in the udp* protocols and removes all traces
of the now unused AF_CUST_*.
This one will be needed to more accurately select a protocol. It may
differ from the socket type for QUIC, which uses dgram at the socket
layer and provides stream at the control layer. The upper level requests
a control layer only so we need this field.
At some places (log fd@XXX, bind fd@XXX) we support using an explicit
file descriptor number, that is placed into the sockaddr for later use.
The problem is that till now it was done with an AF_UNSPEC family, which
is also used for other situations like missing info or rings (for logs).
Let's create an "official" family AF_CUST_EXISTING_FD for this case so
that we are certain the FD can be found in the address when it is set.
This removes the following fields from struct protocol that are now
retrieved from the protocol family instead: .sock_family, .sock_addrlen,
.l3_addrlen, .addrcmp, .bind, .get_src, .get_dst.
This also removes the UDP-specific udp{,6}_get_{src,dst}() functions
which were referenced but not used yet. Their goal was only to remap
the original AF_INET* addresses to AF_CUST_UDP*.
Note that .sock_domain is still there as it's used as a selector for
the protocol struct to be used.
We need to specially handle protocol families which regroup common
functions used for a given address family. These functions include
bind(), addrcmp(), get_src() and get_dst() for now. Some fields are
also added about the address family, socket domain (protocol family
passed to the socket() syscall), and address length.
These protocol families are referenced from the protocols but not yet
used.
This will be the function that must be used to bind the receiver. It
solely depends on the address family but for now it's simpler to have
it per protocol.
The new addrcmp() protocol member points to the function to be used to
compare two addresses of the same family.
When picking an FD from a previous process, we can now use the address
specific address comparison functions instead of having to rely on a
local implementation. This will help move that code to a more central
place.
This patch introduce proto_udp.c targeting a further support of
log forwarding feature.
This code was originally produced by Frederic Lecaille working on
QUIC support and only minimal requirements for syslog support
have been merged.
There are list definitions everywhere in the code, let's drop the need
for including list-t.h to declare them. The rest of the list manipulation
is huge however and not needed everywhere so using the list walking macros
still requires to include list.h.
The protocol.h files are pretty low in the dependency and (sadly) used
by some files from common/. Almost nothing was changed except lifting a
few comments.