These two functions are used to pause and resume all listeners of
all protocols. They use the standard listener functions for this
so they're supposed to handle the situation gracefully regardless
of the upper proxies' states, and they will report completion on
proxies once the switch is performed.
It might be nice to define a particular "failed" state for listeners
that cannot resume and to count them on proxies in order to mention
that they're definitely stuck. On the other hand, the current
situation is retryable which is quite appreciable as well.
The two functions don't need to be distinguished anymore since they have
all the necessary info to act as needed on their listeners. Let's just
pass via stop_proxy() and make it check for each listener which one to
close or not.
Its sole remaining purpose was to display "proxy foo started", which
has little benefit and pollutes output for those with plenty of proxies.
Let's remove it now.
The VTCs were updated to reflect this, because many of them had explicit
counts of dropped lines to match this message.
This is tagged as MEDIUM because some users may be surprized by the
loss of this quite old message.
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
The enabled/disabled config options were stored into a "state" field
that is an integer but contained only PR_STNEW or PR_STSTOPPED, which
is a bit confusing, and causes a dependency with proxies. This was
renamed to "disabled" and is used as a boolean. The field was also
moved to the end of the struct to stop creating a hole and fill another
one.
This state was used to mention that a proxy was in PAUSED state, as opposed
to the READY state. This was causing some trouble because if a listener
failed to resume (e.g. because its port was temporarily in use during the
resume), it was not possible to retry the operation later. Now by checking
the number of READY or PAUSED listeners instead, we can accurately know if
something went bad and try to fix it again later. The case of the temporary
port conflict during resume now works well:
$ socat readline /tmp/sock1
prompt
> disable frontend testme3
> disable frontend testme3
All sockets are already disabled.
> enable frontend testme3
Failed to resume frontend, check logs for precise cause (port conflict?).
> enable frontend testme3
> enable frontend testme3
All sockets are already enabled.
This state is only set when a pause() fails but isn't even set when a
resume() fails. And we cannot recover from this state. Instead, let's
just count remaining ready listeners to decide to emit an error or not.
It's more accurate and will better support new attempts if needed.
Since v1.4 or so, it's almost not possible anymore to set this state. The
only exception is by using the CLI to change a frontend's maxconn setting
below its current usage. This case makes no sense, and for other cases it
doesn't make sense either because "full" is a vague concept when only
certain listeners are full and not all. Let's just remove this unused
state and make it clear that it's not reported. The "ready" or "open"
states will continue to be reported without being misleading as they
will be opposed to "stop".
The proxy state tries to be synthetic but that doesn't work well with
many listeners, especially for transition phases or after a failed
pause/resume.
In order to address this, we'll instead rely on counters of listeners in
a given state for the 3 major states (ready, paused, listen) and a total
counter. We'll now be able to determine a proxy's state by comparing these
counters only.
This function is used as a wrapper to set a listener's state everywhere.
We'll use it later to maintain some counters in a consistent state when
switching state so it's capital that all state changes go through it.
No functional change was made beyond calling the wrapper.
The zombie state is not used anymore by the listeners, because in the
last two cases where it was tested it couldn't match as it was covered
by the test on the process mask. Instead now the FD is either in the
LISTEN state or the INIT state. This also avoids forcing the listener
to be single-dimensional because actually belonging to another process
isn't totally exclusive with the other states, which explains some of
the difficulties requiring to check the proc_mask and the fd sometimes.
So let's get rid of it now not to be tempted to reuse it.
The doc on the listeners state was updated.
Use the new stats module API to integrate the dns counters in the
standard stats. This is done in order to avoid code duplication, keep
the code related to cli out of dns and use the full possibility of the
stats function, allowing to print dns stats in csv or json format.
Add a boolean 'clearable' on stats module structure. If set, it forces
all the counters to be reset on 'clear counters' cli command. If not,
the counters are reset only when 'clear counters all' is used.
This is executed on startup with the registered statistics module. The
existing statistics have been merged in a list containing all
statistics for each domain. This is useful to print all available
statistics in a generic way.
Allocate extra counters for all proxies/servers/listeners instances.
These counters are allocated with the counters from the stats modules
registered on startup.
Implement a small API to easily add extra counters inside a structure
instance. This will be used to implement dynamic statistics linked on
every type of object as needed.
The counters are stored in a dynamic array inside the relevant objects.
A stat module can be registered to quickly add new statistics on
haproxy. It must be attached to one of the available stats domain. The
register must be done using INITCALL on STG_REGISTER.
The stat module has a name which should be unique for each new module in
a domain. It also contains a statistics list with their name/desc and a
pointer to a function used to fill the stats from the module counters.
The module also provides the initial counters values used on
automatically allocated counters. The offset for these counters
are stored in the module structure.
This flag can be used to determine on what type of proxy object the
statistics should be relevant. It will be useful when adding dynamic
statistics. Currently, this flag is not used.
The domain option will be used to have statistics attached to other
objects than proxies/listeners/servers. At the moment, only the PROXY
domain is available.
Add an argument 'domain' on the 'show stats' cli command to specify the
domain. Only 'domain proxy' is available now. If not specified, proxy
will be considered the default domain.
For HTML output, only proxy statistics will be displayed.
Use an opaque pointer to store proxy instance. Regroup server/listener
as a single opaque pointer. This has the benefit to render the structure
more evolutive to support statistics on other types of objects in the
future.
This patch is needed to extend stat support for components other than
proxies objects.
The prometheus module has been adapted for these changes.
Render the stats size parametric in csv/json dump functions. This is
needed for the future patch which provides dynamic stats. For now the
static value ST_F_TOTAL_FIELDS is provided.
Remove unused parameter px on stats_dump_one_line.
This patch is needed to extend stat support to components other than
proxies objects.
Un-mark stats_dump_one_line and stats_putchk as static and export them
in the header file. These functions will be reusable by other components to
print their statistics.
This patch is needed to extend stat support to components other than
proxies objects.
A crash reported in github issue #880 looks impossible unless
pendconn_cond_unlink() occasionally sees a null leaf_p when attempting
to remove an entry, which seems to be confirmed by the reporter. What
seems to be happening is that depending on compiler optimizations,
this pointer can appear as null while pointers are moved if one of
the node's parents is removed from or inserted into the tree. There's
no explicit null of the pointer during these operations but those
pointers are rewritten in multiple steps and nothing prevents this
situation from happening, and there are no particular barrier nor
atomic ops around this.
This test was used to avoid unnecessary locking, for already deleted
entries, but looking at the code it appears that pendconn_free() already
resets s->pend_pos that's used as <p> there, and that the other call
reasons are after an error where the connection will be dropped as
well. So we don't save anything by doing this test, and make it
unsafe. The older code used to check for list emptiness there and
not inside pendconn_unlink(), which explains why the code has stayed
there. Let's just remove this now.
Thanks to @jaroslawr for reporting this issue in great details and for
testing the proposed fix.
This should be backpored to 1.8, where the test on LIST_ISEMPTY should
be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).
Allow the syntax "${...[*]}" to expand an environment variable
containing several values separated by spaces as individual arguments. A
new flag PARSE_OPT_WORD_EXPAND has been added to toggle this feature on
parse_line invocation. In case of an invalid syntax, a new error
PARSE_ERR_WRONG_EXPAND will be triggered.
This feature has been asked on the github issue #165.
As part of his GREASE experiments on Chromium, Bence Bky reported in
https://lists.w3.org/Archives/Public/ietf-http-wg/2020JulSep/0202.html
and https://bugs.chromium.org/p/chromium/issues/detail?id=1127060 that
a certain combination of frame type and frame flags was causing an error
on app.slack.com. It turns out that it's haproxy that is causing this
issue because the frame type is wrongly assumed to support padding, the
frame flags indicate padding is present, and the frame is too short for
this, resulting in an error.
The reason why only some frame types are affected is due to the frame
type being used in a bit shift to match against a mask, and where the
5 lower bits of the frame type only are used to compute the frame bit.
If the resulting frame bit matches a DATA, HEADERS or PUSH_PROMISE frame
bit, then padding support is assumed and the test is enforced, resulting
in a PROTOCOL_ERROR or FRAME_SIZE_ERROR depending on the payload size.
We must never match any such bit for unsupported frame types so let's
add a check for this. This must be backported as far as 1.8.
Thanks to Cooper Bethea for providing enough context to help narrow the
issue down and to Bence Bky for creating a simple reproducer.
We don't need to cheat with the sock_domain anymore, we now always have
the SOCK_DGRAM sock_type as a complementary selector. This patch restores
the sock_domain to AF_INET* in the udp* protocols and removes all traces
of the now unused AF_CUST_*.
The protocol array used to be only indexed by socket family, which is very
problematic with UDP (requiring an extra family) and with the forthcoming
QUIC (also requiring an extra family), especially since that binds them to
certain families, prevents them from supporting dgram UNIX sockets etc.
In order to address this, we now start to register the protocols with more
info, namely the socket type and the control type (either stream or dgram).
This is sufficient for the protocols we have to deal with, but could also
be extended further if multiple protocol variants were needed. But as is,
it still fits nicely in an array, which is convenient for lookups that are
instant.
This one will be needed to more accurately select a protocol. It may
differ from the socket type for QUIC, which uses dgram at the socket
layer and provides stream at the control layer. The upper level requests
a control layer only so we need this field.
Most callers of str2sa_range() need the protocol only to check that it
provides a ->connect() method. It used to be used to verify that it's a
stream protocol, but it might be a bit early to get rid of it. Let's keep
the test for now but move it to str2sa_range() when the new flag PA_O_CONNECT
is present. This way almost all call places could be cleaned from this.
There's a strange test in the server address parsing code that rechecks
the family from the socket which seems to be a duplicate of the previously
removed tests. It will have to be rechecked.
We'll need this so that it can return pointers to stacked protocol in
the future (for QUIC). In addition this removes a lot of tests for
protocol validity in the callers.
Some of them were checked further apart, or after a call to
str2listener() and they were simplified as well.
There's still a trick, we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this case it
is valid to return no protocol. A typical example is:
server foo localhost:1111
The function will need to use more than just a family, let's pass it
the selected protocol. The caller will then be able to do all the fancy
stuff required to pick the best protocol.
This is at least temporary, as the migration at once is way too difficuly.
For now it still creates listeners but only allows DGRAM sockets. This
aims at easing the split between listeners and receivers.
If a file descriptor was passed, we can optionally return it. This will
be useful for listening sockets which are both a pre-bound FD and a ready
socket.
These flags indicate whether the call is made to fill a bind or a server
line, or even just send/recv calls (like logs or dns). Some special cases
are made for outgoing FDs (e.g. pipes for logs) or socket FDs (e.g external
listeners), and there's a distinction between stream or dgram usage that's
expected to significantly help str2sa_range() proceed appropriately with
the input information. For now they are not used yet.
These flags indicate what is expected regarding port specifications. Some
callers accept none, some need fixed ports, some have it mandatory, some
support ranges, and some take an offset. Each possibilty is reflected by
an option. For now they are not exploited, but the goal is to instrument
str2sa_range() to properly parse that.
We currently have an argument to require that the address is resolved
but we'll soon add more, so let's turn it into a bit field. The old
"resolve" boolean is now PA_O_RESOLVE.
At some places (log fd@XXX, bind fd@XXX) we support using an explicit
file descriptor number, that is placed into the sockaddr for later use.
The problem is that till now it was done with an AF_UNSPEC family, which
is also used for other situations like missing info or rings (for logs).
Let's create an "official" family AF_CUST_EXISTING_FD for this case so
that we are certain the FD can be found in the address when it is set.
This removes the following fields from struct protocol that are now
retrieved from the protocol family instead: .sock_family, .sock_addrlen,
.l3_addrlen, .addrcmp, .bind, .get_src, .get_dst.
This also removes the UDP-specific udp{,6}_get_{src,dst}() functions
which were referenced but not used yet. Their goal was only to remap
the original AF_INET* addresses to AF_CUST_UDP*.
Note that .sock_domain is still there as it's used as a selector for
the protocol struct to be used.
We now take care of retrieving sock_family, l3_addrlen, bind(),
addrcmp(), get_src() and get_dst() from the protocol family and
not just the protocol itself. There are very few places, this was
only seldom used. Interestingly in sock_inet.c used to rely on
->sock_family instead of ->sock_domain, and sock_unix.c used to
hard-code PF_UNIX instead of using ->sock_domain.
Also it appears obvious we have something wrong it the protocol
selection algorithm because sock_domain is the one set to the custom
protocols while it ought to be sock_family instead, which would avoid
having to hard-code some conversions for UDP namely.
We need to specially handle protocol families which regroup common
functions used for a given address family. These functions include
bind(), addrcmp(), get_src() and get_dst() for now. Some fields are
also added about the address family, socket domain (protocol family
passed to the socket() syscall), and address length.
These protocol families are referenced from the protocols but not yet
used.
Note that for now we don't have a sockpair.c file to host that unusual
family, so the new function was placed directly into proto_sockpair.c.
It's no big deal given that this family is currently not shared with
multiple protocols.
The function does almost nothing but setting up the receiver. This is
normal as the socket the FDs are passed onto are supposed to have been
already created somewhere else, and the only usable identifier for such
a socket pair is the receiving FD itself.
The function was assigned to sockpair's ->bind() and is not used yet.
This function performs all the bind-related stuff for UNIX sockets that
was previously done in uxst_bind_listener(). There is a very tiny
difference however, which is that previously, in the unlikely event
where listen() would fail, it was still possible to roll back the binding
and rename the backup to the original socket. Now we have to rename it
before calling returning, hence it will be done before calling listen().
However, this doesn't cover any particular use case since listen() has no
reason to fail there (and the rollback is not done for inherited sockets),
that was just done that way as a generic error processing path.
The code is not used yet and is referenced in the uxst proto's ->bind().
This function collects all the receiver-specific code from both
tcp_bind_listener() and udp_bind_listener() in order to provide a more
generic AF_INET/AF_INET6 socket binding function. For now the API is
not very elegant because some info are still missing from the receiver
while there's no ideal place to fill them except when calling ->listen()
at the protocol level. It looks like some polishing code is needed in
check_config_validity() or somewhere around this in order to finalize
the receivers' setup. The main issue is that listeners and receivers
are created *before* bind_conf options are parsed and that there's no
finishing step to resolve some of them.
The function currently sets up a receiver and subscribes it to the
poller. In an ideal world we wouldn't subscribe it but let the caller
do it after having finished to configure the L4 stuff. The problem is
that the caller would then need to perform an fd_insert() call and to
possibly set the exported flag on the FD while it's not its job. Maybe
an improvement could be to have a separate sock_start_receiver() call
in sock.c.
For now the function is not used but it will soon be. It's already
referenced as tcp and udp's ->bind().
This will be the function that must be used to bind the receiver. It
solely depends on the address family but for now it's simpler to have
it per protocol.
The new RX_O_FOREIGN, RX_O_V6ONLY and RX_O_V4V6 options are now set into
the rx_settings part during the parsing, so that we don't need to adjust
them in each and every listener anymore. We have to keep both v4v6 and
v6only due to the precedence from v6only over v4v6.
It's the receiver's FD that's inherited from the parent process, not
the listener's so the flag must move to the receiver so that appropriate
actions can be taken.
In order to split the receiver from the listener, we'll need to know that
a socket is already bound and ready to receive. We used to do that via
tha LI_O_ASSIGNED state but that's not sufficient anymore since the
receiver might not belong to a listener anymore. The new RX_F_BOUND flag
is used for this.
A receiver will have to pass a context to be installed into the fdtab
for use by the handler. We need to set this into the receiver struct
as the bind will happen longer after the configuration.
Just like listeners keep a pointer to their bind_conf, receivers now also
have a pointer to their rx_settings. All those belonging to a listener are
automatically initialized with a pointer to the bind_conf's settings.
We'll soon add flags for the receivers, better add them to the final
file, so it's time to move the definition to receiver-t.h. The struct
receiver and rx_settings were placed there.
The receiver is the one which depends on the protocol while the listener
relies on the receiver. Let's move the protocol there. Since there's also
a list element to get back to the listener from the proto list, this list
element (proto_list) was moved as well. For now when scanning protos, we
still see listeners which are linked by their rx.proto_list part.
The listening socket is represented by its file descriptor, which is
generic to all receivers and not just listeners, so it must move to
the rx struct.
It's worth noting that in order to extend receivers and listeners to
other protocols such as QUIC, we'll need other handles than file
descriptors here, and that either a union or a cast to uintptr_t
will have to be used. This was not done yet and the field was
preserved under the name "fd" to avoid adding confusion.
In order to start to split the listeners into the listener part and the
event receiver part, we introduce a new field "rx" into struct listener
that will eventually become a separate struct receiver. This patch only
adds the struct with an options field that the receivers will need.
The netns is common to all listeners/receivers and is used to bind the
listening socket so it must be in the receiver settings and not in the
listener. This removes some yet another set of unnecessary loops.
The interface is common to all listeners/receivers and is used to bind
the listening socket so it must be in the receiver settings and not in
the listener. This removes some unnecessary loops.
There currently is a large inconsistency in how binding parameters are
split between bind_conf and listeners. It happens that for historical
reasons some parameters are available at the listener level but cannot
be configured per-listener but only for a bind_conf, and thus, need to
be replicated. In addition, some of the bind_conf parameters are in fact
for the listening socket itself while others are for the instanciated
sockets.
A previous attempt at splitting listeners into receivers failed because
the boundary between all these settings is not well defined.
This patch introduces a level of listening socket settings in the
bind_conf, that will be detachable later. Such settings that are solely
for the listening socket are:
- unix socket permissions (used only during binding)
- interface (used for binding)
- network namespace (used for binding)
- process mask and thread mask (used during startup)
The rest seems to be used only to initialize the resulting sockets, or
to control the accept rate. For now, only the unix params (bind_conf->ux)
were moved there.
Remove the last utility functions for handling the multi-cert bundles
and remove the multi-variable from the ckch structure.
With this patch, the bundles are completely removed.
Since the removal of the multi-certificates bundle support, this
variable is not useful anymore, we can remove all tests for this
variable and suppose that every ckch contains a single certificate.
Commit 4987a4744 ("CLEANUP: tree-wide: use VAR_ARRAY instead of [0] in
various definitions") broke the build on clang due to the tlv field used
to receive/send the proxy protocol. The problem is that struct tlv is
included at the beginning of struct tlv_ssl, which doesn't make much
sense. In fact the value[] array isn't really a var array but just an
end of struct marker, and must really be an array of size zero.
Surprisingly there were still a number of [0] definitions for variable
sized arrays in certain structures all over the code. We need to use
VAR_ARRAY instead of zero to accommodate various compilers' preferences,
as zero was used only on old ones and tends to report errors on new ones.
tcc supports variadic macros provided that there is always at least one
argument, like older gcc versions. Thus we need to always keep one and
define args as the remaining ones. It's not an issue at all and doesn't
change the way to use them, just the internal definitions.
For whatever reason, glibc decided that the __attribute__ keyword is
the exclusive property of gcc, and redefines it to an empty macro on
other compilers. Some non-gcc compilers also support it (possibly
partially), tinycc is one of them. By doing this, glibc silently
broke all constructors, resulting in code that arrives in main() with
uninitialized variables.
The solution we use here consists in undefining the macro on non-gcc
compilers, and redefining it to itself in order to cause a conflict in
the event the redefinition would happen afterwards. This visibly solved
the problem.
Some checks on __GNUC__ imply that if it's undefined it will match a
low value but that's not always what we want, like for example in the
VAR_ARRAY definition which is not needed on tcc. Let's always be explicit
on these tests.
A SRV record keeps a reference on the corresponding additional record, if
any. But this additional record is also inserted in a separate linked-list into
the dns response. The problems arise when obsolete additional records are
released. The additional records list is purged but the SRV records always
reference these objects, leading to an undefined behavior. Worst, this happens
very quickly because additional records are never renewed. Thus, once received,
an additional record will always expire.
Now, the addtional record are only associated to a SRV record or simply
ignored. And the last version is always used.
This patch helps to fix the issue #841. It must be backported to 2.2.
Ever since the protocols were added in 1.3.13, listeners used to be
started twice:
- once by start_proxies(), which iteratees over all proxies then all
listeners ;
- once by protocol_bind_all() which iterates over all protocols then
all listeners ;
It's a real mess because error reporting is not even consistent, and
more importantly now that some protocols do not appear in regular
proxies (peers, logs), there is no way to retry their binding should
it fail on the last step.
What this patch does is to make sure that listeners are exclusively
started by protocols. The failure to start a listener now causes the
emission of an error indicating the proxy's name (as it used to be
the case per proxy), and retryable failures are silently ignored
during all but last attempts.
The start_proxies() function was kept solely for setting the proxy's
state to READY and emitting the "Proxy started" message and log that
some have likely got used to seeking in their logs.
When calling the http_replace_res_status() function, an optional reason may now
be set. It is ignored if it points to NULL and the original reason is
preserved. Only the response status is replaced. Otherwise both the status and
the reason are replaced.
It simplifies the API and most of time, avoids an extra call to
http_replace_res_reason().
The http_replace_req_path() function now takes a third argument to evaluate the
query-string as part of the path or to preserve it. If <with_qs> is set, the
query-string is replaced with the path. Otherwise, only the path is replaced.
This patch is mandatory to fix issue #829. The next commit depends on it. So be
carefull during backports.
For now we still don't retrieve dgram sockets, but the code must be able
to distinguish them before we switch to receivers. This adds a new flag
to the xfer_sock_list indicating that a socket is of type SOCK_DGRAM. The
way to set the flag for now is by looking at the dummy address family which
equals AF_CUST_UDP{4,6} in this case (given that other dgram sockets are not
yet supported).
We'll want to store more info there and some info that are not represented
in listener options at the moment (such as dgram vs stream) so let's get
rid of these and instead use a new set of options (SOCK_XFER_OPT_*).
The new function was called sock_get_old_sockets() and was left as-is
except a minimum amount of style lifting to make it more readable. It
will never be awesome anyway since it's used very early in the boot
sequence and needs to perform socket I/O without any external help.
This code was highly redundant, existing for TCP clients, TCP servers
and UDP servers. Let's move it to sock_inet where it belongs. The new
functions are sock_inet4_make_foreign() and sock_inet6_make_foreign().
This is essentially a merge from tcp_find_compatible_fd() and
uxst_find_compatible_fd() that relies on a listener's address and
compare function and still checks for other variations. For AF_INET6
it compares a few of the listener's bind options. A minor change for
UNIX sockets is that transparent mode, interface and namespace used
to be ignored when trying to pick a previous socket while now if they
are changed, the socket will not be reused. This could be refined but
it's still better this way as there is no more risk of using a
differently bound socket by accident.
Eventually we should not pass a listener there but a set of binding
parameters (address, interface, namespace etc...) which ultimately will
be grouped into a receiver. For now this still doesn't exist so let's
stick to the listener to break dependencies in the rest of the code.
Let's determine it at boot time instead of doing it on first use. It
also saves us from having to keep it thread local. It's been moved to
the new sock_inet_prepare() function, and the variables were renamed
to sock_inet_tcp_maxseg_default and sock_inet6_tcp_maxseg_default.
The v6only_default variable is not specific to TCP but to AF_INET6, so
let's move it to the right file. It's now immediately filled on startup
during the PREPARE stage so that it doesn't have to be tested each time.
The variable's name was changed to sock_inet6_v6only_default.
The function now makes it clear that it's independent on the socket
type and solely relies on the address family. Note that it supports
both IPv4 and IPv6 as we don't seem to need it per-family.
This one is common to the TCPv4 and UDPv4 code, it retrieves the
destination address of a socket, taking care of the possiblity that for
an incoming connection the traffic was possibly redirected. The TCP and
UDP definitions were updated to rely on it and remove duplicated code.
The new addrcmp() protocol member points to the function to be used to
compare two addresses of the same family.
When picking an FD from a previous process, we can now use the address
specific address comparison functions instead of having to rely on a
local implementation. This will help move that code to a more central
place.
These files will regroup everything specific to AF_INET, AF_INET6 and
AF_UNIX socket definitions and address management. Some code there might
be agnostic to the socket type and could later move to af_xxxx.c but for
now we only support regular sockets so no need to go too far.
The files are quite poor at this step, they only contain the address
comparison function for each address family.
The new file sock.c will contain generic code for standard sockets
relying on file descriptors. We currently have way too much duplication
between proto_uxst, proto_tcp, proto_sockpair and proto_udp.
For now only get_src, get_dst and sock_create_server_socket were moved,
and are used where appropriate.
This is totally ugly, smp_fetch_src() is exported only so that stick_table.c
can (ab)use it in the {sc,src}_* sample fetch functions. It could be argued
that the sample could have been reconstructed there in place, but we don't
even need to duplicate the code. We'd rather simply retrieve the "src"
fetch's function from where it's used at init time and be done with it.
This new flag will be used to mark FDs that must be passed to any future
process across the CLI's "_getsocks" command.
The scheme here is quite complex and full of special cases:
- FDs inherited from parent processes are *not* exported this way, as
they are supposed to instead be passed by the master process itself
across reloads. However such FDs ought never to be paused otherwise
this would disrupt the socket in the parent process as well;
- FDs resulting from a "bind" performed over a socket pair, which are
in fact one side of a socket pair passed inside another control socket
pair must not be passed either. Since all of them are used the same
way, for now it's enough never to put this "exported" flag to FDs
bound by the socketpair code.
- FDs belonging to temporary listeners (e.g. a passive FTP data port)
must not be passed either. Fortunately we don't have such FDs yet.
- the rest of the listeners for now are made of TCP, UNIX stream, ABNS
sockets and are exportable, so they get the flag.
- UDP listeners were wrongly created as listeners and are not suitable
here. Their FDs should be passed but for now they are not since the
client doesn't even distinguish the SO_TYPE of the retrieved sockets.
In addition, it's important to keep in mind that:
- inherited FDs may never be closed in master process but may be closed
in worker processes if the service is shut down (useless since still
bound, but technically possible) ;
- inherited FDs may not be disabled ;
- exported FDs may be disabled because the caller will perform the
subsequent listen() on them. However that might not work for all OSes
- exported FDs may be closed, it just means the service was shut down
from the worker, and will be rebound in the new process. This implies
that we have to disable exported on close().
=> as such, contrary to an apparently obvious equivalence, the "exported"
status doesn't imply anything regarding the ability to close a
listener's FD or not.
This essentially undoes what we did in fd.c in 1.8 to support seamless
reload. Since we don't need to remove an fd anymore we can turn
fd_delete() to the simple function it used to be.
Let's not look at the listener options passed by the original process
and determine from the socket itself whether it is configured for
transparent mode or not. This is cleaner and safer, and doesn't rely
on flag values that could possibly change between versions.
haproxy supports generating SSL certificates based on SNI using a provided
CA signing certificate. Because CA certificates may be signed by multiple
CAs, in some scenarios, it is neccesary for the server to attach the trust chain
in addition to the generated certificate.
The following patch adds the ability to serve the entire trust chain with
the generated certificate. The chain is loaded from the provided
`ca-sign-file` PEM file.
When a regex had been succesfully compiled by the JIT pass, it is better
to use the related match, thanksfully having same signature, for better
performance.
Signed-off-by: David Carlier <devnexen@gmail.com>
The ARGT_PTR argument type may now be used to keep a reference to opaque data in
the argument array used by sample fetches and converters. It is a generic way to
point on data. I guess it could be used for some other arguments, like proxy,
server, map or stick-table.
A dedicated expiration date is now used to apply the inspect-delay of the
tcp-request or tcp-response rulesets. Before, the analyse expiratation date was
used but it may also be updated by the lua (at least). So a lua script may
extend or reduce the inspect-delay by side effect. This is not expected. If it
becomes necessary, a specific function will be added to do this. Because, for
now, it is a bit confusing.
The HTX_FL_EOI flag must now be set on a HTX message when no more data are
expected. Most of time, it must be set before adding the EOM block. Thus, if
there is no space for the EOM, there is still an information to know all data
were received and pushed in the HTX message. There is only an exception for the
HTTP replies (deny, return...). For these messages, the flag is set after all
blocks are pushed in the message, including the EOM block, because, on error,
we remove all inserted data.
__task_free() cannot be called with a task still in the queue. This
test adds a check which confirms there is no concurrency issue on such
a case where a thread could requeue nor wakeup a task being freed.
This aims at catching calls to task_unlink_wq() performed by the wrong
thread based on the shared status for the task, as well as calls to
__task_queue() with the wrong timer queue being used based on the task's
capabilities. This will at least help eliminate some hypothesis during
debugging sessions when suspecting that a wrong thread has attempted to
queue a task at the wrong place.
The BUG_ON() test in task_queue() only tests for the case where
we're queuing a task that doesn't run on the current thread. Let's
refine it a bit further to catch all cases where the task does not
run *exactly* on the current thread alone.
Reported github issue #759 shows there is no name resolving
on server lines for ring and peers sections.
This patch introduce the resolving for those lines.
This patch adds boolean a parameter to parse_server function to specify
if we want the function to perform an initial name resolving using libc.
This boolean is forced to true in case of peers or ring section.
The boolean is kept to false in case of classic servers (from
backend/listen)
This patch should be backported in branches where peers sections
support 'server' lines.
This patch adds a global counter of received syslog messages
and this one is exported on CLI "show info" as "CumRecvLogs".
This patch also updates internal conn counter and freq
of the listener and the proxy for each received log message to
prepare a further export on the "show stats".
Log forwarding:
It is possible to declare one or multiple log forwarding section,
haproxy will forward all received log messages to a log servers list.
log-forward <name>
Creates a new log forwarder proxy identified as <name>.
bind <addr> [param*]
Used to configure a log udp listener to receive messages to forward.
Only udp listeners are allowed, address must be prefixed using
'udp@', 'udp4@' or 'udp6@'. This supports for all "bind" parameters
found in 5.1 paragraph but most of them are irrelevant for udp/syslog case.
log global
log <address> [len <length>] [format <format>] [sample <ranges>:<smp_size>]
<facility> [<level> [<minlevel>]]
Used to configure target log servers. See more details on proxies
documentation.
If no format specified, haproxy tries to keep the incoming log format.
Configured facility is ignored, except if incoming message does not
present a facility but one is mandatory on the outgoing format.
If there is no timestamp available in the input format, but the field
exists in output format, haproxy will use the local date.
Example:
global
log stderr format iso local7
ring myring
description "My local buffer"
format rfc5424
maxlen 1200
size 32764
timeout connect 5s
timeout server 10s
# syslog tcp server
server mysyslogsrv 127.0.0.1:514 log-proto octet-count
log-forward sylog-loadb
bind udp4@127.0.0.1:1514
# all messages on stderr
log global
# all messages on local tcp syslog server
log ring@myring local0
# load balance messages on 4 udp syslog servers
log 127.0.0.1:10001 sample 1:4 local0
log 127.0.0.1:10002 sample 2:4 local0
log 127.0.0.1:10003 sample 3:4 local0
log 127.0.0.1:10004 sample 4:4 local0
This patch introduce a new fd handler used to parse syslog
message on udp.
The parsing function returns level, facility and metadata that
can be immediatly reused to forward message to a log server.
This handler is enabled on udp listeners if proxy is internally set
to mode PR_MODE_SYSLOG
This patch merges build message code between sink and log
and introduce a new API based on struct ist array to
prepare message header with zero copy, targeting the
log forwarding feature.
Log format 'iso' and 'timed' are now avalaible on logs line.
A new log format 'priority' is also added.
This patch introduce proto_udp.c targeting a further support of
log forwarding feature.
This code was originally produced by Frederic Lecaille working on
QUIC support and only minimal requirements for syslog support
have been merged.
In the continuity of the commit 7cf0e4517 ("MINOR: raw_sock: report global
traffic statistics"), we are now able to report the global number of bytes
emitted using the splicing. It can be retrieved in "show info" output on the
CLI.
Note this counter is always declared, regardless the splicing support. This
eases the integration with monitoring tools plugged on the CLI.
The srv_del_conn_from_list() function is now responsible to update the server
counters and the connection flags when a connection is removed from an idle list
(safe, idle or available). It is called when a connection is released or when a
connection is set as private. This function also removes the connection from the
idle list if necessary.
The srv_use_idle_conn() function is now responsible to update the server
counters and the connection flags when an idle connection is reused. The same
function is called when a new connection is created. This simplifies a bit the
connect_server() function.
When a new connection is created, its target is always set just after. So the
connection target may set when it is created instead, during its initialisation
to be precise. It is the purpose of this patch. Now, conn_new() function is
called with the connection target as parameter. The target is then passed to
conn_init(). It means the target must be passed when cs_new() is called. In this
case, the target is only used when the conn-stream is created with no
connection. This only happens for tcpchecks for now.
The session_get_conn() must now be used to look for an available connection
matching a specific target for a given session. This simplifies a bit the
connect_server() function.
When a connection is marked as private, it is now added in the session server
list. We don't wait a stream is detached from the mux to do so. When the
connection is created, this happens after the mux creation. Otherwise, it is
performed when the connection is marked as private.
To allow that, when a connection is created, the session is systematically set
as the connectin owner. Thus, a backend connection has always a owner during its
creation. And a private connection has always a owner until its death.
Note that outside the detach() callback, if the call to session_add_conn()
failed, the error is ignored. In this situation, we retry to add the connection
into the session server list in the detach() callback. If this fails at this
step, the multiplexer is destroyed and the connection is closed.
To set a connection as private, the conn_set_private() function must now be
called. It sets the CO_FL_PRIVATE flags, but it also remove the connection from
the available connection list, if necessary. For now, it never happens because
only HTTP/1 connections may be set as private after their creation. And these
connections are never inserted in the available connection list.
When a connection is added to an idle list, it's already detached and
cannot be seen by two threads at once, so there's no point using
TRY_ADDQ, there will never be any conflict. Let's just use the cheaper
ADDQ.
The TRY_ADDQ there was not needed since the wait list is exclusively
owned by the caller. There's a preliminary test on MT_LIST_ADDED()
that might have been eliminated by keeping MT_LIST_TRY_ADDQ() but
it would have required two more expensive writes before testing so
better keep the test the way it is.
Initially when mt_lists were added, their purpose was to be used with
the scheduler, where anyone may concurrently add the same tasklet, so
it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their
usage was extended and MT_LIST_ADD{,Q} started to be used on situations
where the element to be added was exclusively owned by the one performing
the operation so a conflict was impossible. This became more obvious with
the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK.
But this remains confusing and at many places it's not expected that
an MT_LIST_ADD could possibly fail, and worse, at some places we start
by initializing it before adding (and the test is superflous) so let's
rename them to something more conventional to denote the presence of the
check or not:
MT_LIST_ADD{,Q} : inconditional operation, the caller owns the
element, and doesn't care about the element's
current state (exactly like LIST_ADD)
MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not
already added or in the process of being added.
This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe"
anymore. This also means that in case of backport mistakes in the
future causing this to be overlooked, the slower and safer functions
will still be used by default.
Note that the missing unchecked MT_LIST_ADD macro was added.
The rest of the code will have to be reviewed so that a number of
callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove
the unneeded test.
It is now possible to customize TCP keepalive parameters.
These correspond to the socket options TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL
and are valid for the defaults, listen, frontend and backend sections.
This patch fixes GitHub issue #670.
The torture test run for previous commit 787dc20 ("BUG/MEDIUM: lists: add
missing store barrier on MT_LIST_BEHEAD()") finally broke again after 34M
connections. It appeared that MT_LIST_ADD and MT_LIST_ADDQ were suffering
from the same missing barrier when restoring the original pointers before
giving up, when checking if the element was already added. This is indeed
something which seldom happens with the shared scheduler, in case two
threads simultaneously try to wake up the same tasklet.
With a store barrier there after reverting the pointers, the torture test
survived 750M connections on the NanoPI-Fire3, so it looks good this time.
Probably that MT_LIST_BEHEAD should be added to test-list.c since it seems
to be more sensitive to concurrent accesses with MT_LIST_ADDQ.
It's worth noting that there is no barrier between the last two pointers
update, while there is one in MT_LIST_POP and MT_LIST_BEHEAD, the latter
having shown to be needed, but I cannot demonstrate why we would need
one here. Given that the code seems solid here, let's stick to what is
shown to work.
This fix should be backported to 2.1, just for the sake of safety since
the issue couldn't be triggered there, but it could change with the
compiler or when backporting a fix for example.
When running multi-threaded tests on my NanoPI-Fire3 (8 A53 cores), I
managed to occasionally get either a bus error or a segfault in the
scheduler, but only when running at a high connection rate (injecting
on a tcp-request connection reject rule). The bug is rare and happens
around once per million connections. I could never reproduce it with
less than 4 threads nor on A72 cores.
Haproxy 2.1.0 would also fail there but not 2.1.7.
Every time the crash happened with the TL_URGENT task list corrupted,
though it was not immediately after the LIST_SPLICE() call, indicating
background activity survived the MT_LIST_BEHEAD() operation. This queue
is where the shared runqueue is transferred, and the shared runqueue
gets fast inter-thread tasklet wakeups from idle conn takeover and new
connections.
Comparing the MT_LIST_BEHEAD() and MT_LIST_DEL() implementations, it's
quite obvious that a few barriers are missing from the former, and these
will simply fail on weakly ordered caches.
Two store barriers were added before the break() on failure, to match
what is done on the normal path. Missing them almost always results in
a segfault which is quite rare but consistent (after ~3M connections).
The 3rd one before updating n->prev seems intuitively needed though I
coudln't make the code fail without it. It's present in MT_LIST_DEL so
better not be needlessly creative. The last one is the most important
one, and seems to be the one that helps a concurrent MT_LIST_ADDQ()
detect a late failure and try again. With this, the code survives at
least 30M connections.
Interestingly the exact same issue was addressed in 2.0-dev2 for MT_LIST_DEL
with commit 690d2ad4d ("BUG/MEDIUM: list: add missing store barriers when
updating elements and head").
This fix must be backported to 2.1 as MT_LIST_BEHEAD() is also used there.
It's only tagged as medium because it will only affect entry-level CPUs
like Cortex A53 (x86 are not affected), and requires load levels that are
very hard to achieve on such machines to trigger it. In practice it's
unlikely anyone will ever hit it.
OpenSSL 1.1.1 provides a callback registering function
SSL_CTX_set_keylog_callback, which allows one to receive a string
containing the keys to deciphers TLSv1.3.
Unfortunately it is not possible to store this data in binary form and
we can only get this information using the callback. Which means that we
need to store it until the connection is closed.
This patches add 2 pools, the first one, pool_head_ssl_keylog is used to
store a struct ssl_keylog which will be inserted as a ex_data in a SSL *.
The second one is pool_head_ssl_keylog_str which will be used to store
the hexadecimal strings.
To enable the capture of the keys, you need to set "tune.ssl.keylog on"
in your configuration.
The following fetches were implemented:
ssl_fc_client_early_traffic_secret,
ssl_fc_client_handshake_traffic_secret,
ssl_fc_server_handshake_traffic_secret,
ssl_fc_client_traffic_secret_0,
ssl_fc_server_traffic_secret_0,
ssl_fc_exporter_secret,
ssl_fc_early_exporter_secret
Originally it was made to return a void* because some comparisons in the
code where it was used required a lot of casts. But now we don't need
that anymore. And having it non-const breaks the build on NetBSD 9 as
reported in issue #728.
So let's switch to const and adjust debug.c to accomodate this.
When we takeover a connection, let the xprt layer know. If it has its own
tasklet, and it is already scheduled, then it has to be destroyed, otherwise
it may run the new mux tasklet on the old thread.
Note that we only do this for the ssl xprt for now, because the only other
one that might wake the mux up is the handshake one, which is supposed to
disappear before idle connections exist.
No backport is needed, this is for 2.2.
In the various takeover() methods, make sure we schedule the old tasklet
on the old thread, as we don't want it to run on our own thread! This
was causing a very rare crash when building with DEBUG_STRICT, seeing
that either an FD's thread mask didn't match the thread ID in h1_io_cb(),
or that stream_int_notify() would try to queue a task with the wrong
tid_bit.
In order to reproduce this, it is necessary to maintain many connections
(typically 30k) at a high request rate flowing over H1+SSL between two
proxies, the second of which would randomly reject ~1% of the incoming
connection and randomly killing some idle ones using a very short client
timeout. The request rate must be adjusted so that the CPUs are nearly
saturated, but never reach 100%. It's easier to reproduce this by skipping
local connections and always picking from other threads. The issue
should happen in less than 20s otherwise it's necessary to restart to
reset the idle connections lists.
No backport is needed, takeover() is 2.2 only.
tasklet_wakeup() only checks tl->tid to know whether the task is
programmed to run on the current thread or on a specific thread. We'll
have to ease this selection in a subsequent patch, preferably without
modifying tl->tid, so let's have a new tasklet_wakeup_on() function
to specify the thread number to run on. That the logic has not changed
at all.
Now it's possible to preserve spacing everywhere except in "log-format",
"log-format-sd" and "unique-id-format" directives, where spaces are
delimiters and are merged. That may be useful when the response payload
is specified as a log format string by "lf-file" or "lf-string", or even
for headers or anything else.
In order to merge spaces, a new option LOG_OPT_MERGE_SPACES is applied
exclusively on options passed to function parse_logformat_string().
This patch fixes an issue #701 ("http-request return log-format file
evaluation altering spacing of ASCII output/art").
Now when building with -DDEBUG_MEM_STATS, some malloc/calloc/strdup/realloc
stats are kept per file+line number and may be displayed and even reset on
the CLI using "debug dev memstats". This allows to easily track potential
leakers or abnormal usages.
Enables ('on') or disables ('off') sharing of idle connection pools between
threads for a same server. The default is to share them between threads in
order to minimize the number of persistent connections to a server, and to
optimize the connection reuse rate. But to help with debugging or when
suspecting a bug in HAProxy around connection reuse, it can be convenient to
forcefully disable this idle pool sharing between multiple threads, and force
this option to "off". The default is on.
This could have been nice to have during the idle connections debugging,
but it's not too late to add it!
Add two new macros, MT_LIST_DEL_SAFE_NOINIT makes sure we remove the
element from the list, without reinitializing its next and prev, and
MT_LIST_ADDQ_NOCHECK is similar to MT_LIST_ADDQ(), except it doesn't check
if the element is already in a list.
The goal is to be able to move an element from a list we're currently
parsing to another, keeping it locked in the meanwhile.
task_kill() may be used by any thread to kill any task with less overhead
than a regular wakeup. In order to achieve this, it bypasses the priority
tree and inserts the task directly into the shared tasklets list, cast as
a tasklet. The task_list_size is updated to make sure it is properly
decremented after execution of this task. The task will thus be picked by
process_runnable_tasks() after checking the tree and sent to the TL_URGENT
list, where it will be processed and killed.
If the task is bound to more than one thread, its first thread will be the
one notified.
If the task was already queued or running, nothing is done, only the flag
is added so that it gets killed before or after execution. Of course it's
the caller's responsibility to make sur any resources allocated by this
task were already cleaned up or taken over.