This development tool can be used to convert a string representing a
termination event logs to its human redable representation. Several string
may be converting at a time. To do so, several arguments can be specified on
the commeand line or they can be provided on STDIN, using "-" argument.
Here is an exemple:
> term_events f2x2f4x4 m2m4m1 e2e1 s2s1S1 E1 M1 F1
### f2x2f4x4 : fd:shutr > xprt:shutr > fd:snd_err > xprt:snd_err
### m2m4m1 : muxc:shutr > muxc:snd_err > muxc:shutw
### e2e1 : se:eos > se:shutw
### s2s1S1 : strm:eos > strm:shutw > STRM:shutw
### E1 : SE:shutw
### M1 : MUXC:shutw
### F1 : FD:shutw
The make target "dev/term_events/term_events" must be used to compile it.
Enums used to report events were placed in the connection header for
conveniance. But it is not specifically related to connection. So, they are
moved at the end of the file to have a better isolation.
The function is now responsible to handle empty log because no event was
reported. In that case, an empty string is returned. It is also responsible to
handle case where termination events log is not supported for an given entity
(for instance the quic mux for now). In that case, a dash ("-") is returned.
"term_events" is a sample fetche function that can be used to get
termination events for all locations in one call. The format equivalent to:
{fc_term_events,fc_mux_term_events,fs.term_events,txn.term_events,bs.term_events,bc_mux_term_events,bc_term_events}
If no event was reported for a location, the field is empty. If the feature
is not supported yet, a dash ('-') is printed.
There is no termination events log for applet but events for the SE location
are filled when the endpoint is an applet. Most of them relies on the new
applet API. Only few events are reported for legacy applets.
It is hard to never detect the same event several time without painful
tests. In other words, the same termination event can be reported several
time and this must be handled. To do so, "tevt_report_event" macro is
updated to ignore an event if the last reported one is of the same type, for
the same location. Of course, if the same event is reported several times at
different moment, it will not be detected.
If it is the last patch to introduce dedicated termination events for each
location. In this one, events for the stream location are introcued. The old
enum is also removed because it is now unused.
Here, more accurate evets are added. The "intercepted" event was splitted.
Termination events dedicated to mux connection and stream-endpoint
descriptors are added in this patch. Specific events to these locations are
thus added. Changes for the H1 and H2 multiplexers are reviewed to be more
accurate.
To be able to add more accurate termination events for each location, the
enum will be splitted by location. Indeed, there are at most 16 possbile
events. It will be pretty confusing to use same termination events for the
different locations. So the best is to split them.
In this patch, the termination events for the fd, hs and xprt locations are
introduced. For now some holes are added to keep similar events aligned
across enums. But this may change in future.
MUX_CTL_TEVTS command is added to get the termination event logs of a mux
connection and MUX_SCTL_TEVTS command to get the termination event logs of a
mux stream.
The termiantion events logs of the multiplexer connection and stream are now
dumped when corresponding mux info are dumped. The termination event logs of
the underlying connection is also dumped in the debug string.
In this patch, events for the stream location are reported. These events are
first reported on the corresponding stream-connector. So front events on scf
and back event on scb. Then all events are both merged in the stream. But
only 4 events are saved on the stream.
Several internal events are for now grouped with the type
"tevt_type_intercepted". More events will be added to have a better
resolution. But at least the place to report these events are identified.
For now, when a event is reported on a SC, it is also reported on the stream
and vice versa.
shutdown for reads (read0), receive errors, shutdown for writes and timeouts
are reported, but only for the H2 connection for now.
As for the H1 multiplexer, more events must be added to report protocol
errors, goaways and rst-streams. And of course, all events for the H2
streams must be reported too.
shutdown for reads (read0), receive errors, shutdown for writes and timeouts
are reported. It is not too hard to know where to report events generated by
HAProxy (timeouts and shutw). For detected events (shutr and receive error),
it is not so simple. These events must not be reported when they are
detected but when the mux can handle them. For instance, some unprocessed
input data may block a read0. So, the experience will tell us if these
events are reported at the rigth time and on the right conditions.
For now, no internal errors (parsing errors, protocol errors, intenral
errors...) are reported because these event types have not yet been added.
This termination events log will be used to report events from the mux
streams. The location will be "tevt_loc_se" and the muxes will be
responsible to report the corresponding events.
Termination events logs will be used to report the events that led to close
a connection. Unlike flags, that reflect a state, the idea here is to store
a log to preserve the order of the events. Most of time, when debugging an
issue, the order of the events is crucial to be able to understand the root
cause of the issue. The traces are trully heplful to do so. But it is not
always possible to active them because it is pretty verbose. On heavily
loaded platforms, it is not acceptable. We hope that the termination events
logs will help us in that situations.
One termination events log will be be store at each layer (connection, mux
connection, mux stream...) as a 32-bits integer. Each event will be store on
8 bits, 4 bits for the location and 4 bits for the type. So the first four
events will be stored only for each layer. It should be enough why a
connection is closed.
In this patch, the enums defining the termination event locations and types
are added. The macro to report a new event is also added and a function to
convert a termination events log to a string that could be display in log
messages for instance.
When a demux error is reported by the H1S, an error must be reported on the
SE and not an end-of-input or an end-of-stream. So SE_FL_ERROR flag must be
set and not SE_FL_EOI/SE_FL_EOS.
It seems this bug has no impact. So there is no reason to backport it.
It is just a small patch to clean up mux/demux functions. Instead of listing
the H1S errors that must be handled during demux of mux operations, masks of
flags are used. It is more readable.
Now that we can see that some events are reported for older instances
of a file descriptor, let's skip these ones instead of reporting
dangerous events on them. It might possibly qualify as a bug if it
helps fixing strange issues in certain environments, in which case it
can make sense to backport it along with the following recent patches:
DEBUG: fd: add a counter of takeovers of an FD since it was last opened
MINOR: fd: add a generation number to file descriptors
DEBUG: epoll: store and compare the FD's generation count with reported event
There have been some reported cases where races between threads in epoll
were causing wrong reports of close or error events. Since the epoll_event
data is 64 bits, we can store the FD's generation counter in the upper
bits to verify if we're speaking about the same instance of the FD as the
current one or a stale one. If the generation number does not match, then
we classify these into 3 conditions and increment the relevant COUNT_IF()
counters (stale report for closed FD, stale report of harmless event on
reopened FD, stale report of HUP/ERR on reopened FD). Tests have shown that
with heavy concurrency, a very small maxconn (typically 1 per thread),
http-reuse always and a server closing connections first but randomly
(httpterm with /C=2r), such events can happen at a pace of a few per second
for the closed FDs, and a few per minute for the other ones, so there's value
in leaving this accessible for troubleshooting. E.g after a few minutes:
Count Type Location function(): "condition" [comment]
5541 CNT ev_epoll.c:296 _do_poll(): "1" [epoll report of event on a just closed fd (harmless)]
10 CNT ev_epoll.c:294 _do_poll(): "1" [epoll report of event on a closed recycled fd (rare)]
42 CNT ev_epoll.c:289 _do_poll(): "1" [epoll report of HUP on a stale fd reopened on the same thread (suspicious)]
212 CNT ev_epoll.c:279 _do_poll(): "1" [epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)]
1 CNT mux_h1.c:3911 h1_send(): "b_data(&h1c->obuf)" [connection error (send) with pending output data]
This one with the following setup, whicih abuses threads contention by
starting 64 threads on two cores:
- config:
global
nbthread 64
stats socket /tmp/sock1 level admin
stats timeout 1h
defaults
timeout client 5s
timeout server 5s
timeout connect 5s
mode http
listen p2
bind :8002
http-reuse always
server s1 127.0.0.1:8000 maxconn 4
- haproxy forcefully started on 2C4T:
$ taskset -c 0,1,4,5 ./haproxy -db -f epoll-dbg.cfg
- httpterm on port 8000, cpus 2,3,6,7 (2C4T)
- h1load with responses larger than a single buffer, and randomly
closing/keeping alive:
$ taskset -c 2,3,6,7 h1load -e -t 4 -c 256 -r 1 0:8002/?s=19k/C=2r
This patch adds a counter of close() on file descriptors in the fdtab.
The goal is to better detect if reported events concern the current or
a previous file descriptor. For now the counter is only added, and is
showed in "show fd" as "gen". We're reusing unused space at the end of
the struct. If it's needed for something more important later, this
patch can be reverted.
That's essentially in order to help with debugging strange cases like
the occasional epoll issues/races, by keeping a counter of how many
times an FD was taken over since last inserted. The room is available
so let's use it. If it's needed later, this patch can easily be reverted.
The counter is also reported in "show fd" as "tkov".
A new global option was recently introduced to disable pacing. However,
the value used (1<<31) caused issue with some compiler as options field
used for storage is declared as int. Move pacing deactivation flag
outside into the newly defined quic_tune to fix this.
This should be backported up to 3.1 after a period of observation. Note
that it relied on the previous patch which defined new quic_tune type.
Define a new structure quic_tune. It will be useful to regroup various
configuration settings and tunable related to QUIC, instead of defining
them into the global structure.
Pacing has recently been moved out of experimental status and is
activated by default. This is a mandatory requirement for BBR.
Furthermore, BBR is now considered stable. As such, removes its
experimental status with this commit.
Remove pacing experimental status, so it's not required anymore to use
expose-experimental-directives to enable it.
Along this change, pacing is now activated by default. As such, pacing
configuration is transformed into its final form. The global on/off
setting is turned into a disable setting without argument.
Pacing support was previously activated on each bind line individually,
via an optional argument of quic-cc-algo keyword. Remove this optional
argument and introduce a global setting to enable/disable pacing. Pacing
activation is still flagged as experimental.
One important change is that previously BBR usage automatically
activated pacing support. This is not the case anymore, so users should
now always explicitely activate pacing if BBR is selected. A new warning
message will be displayed if this is not the case.
Another consequence of this change is that now pacing_inter callback is
always defined for every quic_cc_algo types. As such, QUIC MUX uses
global.tune.options to determine if pacing is required.
This should be backported up to 3.1, after a period of observation.
Pacing is activated per bind line via an optional boolean argument of
quic-cc-algo keyword. Contrary to the default usage, pacing is
automatically activated when BBR is chosen. This is because this
algorithm is expected to run on top of pacing, else its behavior is
undefined.
Previously, pacing argument was thus ignored when BBR was selected.
Change this to support explicit deactivation of pacing with it. This
could be useful to test BBR without pacing when debugging some issues.
This should be backported up to 3.1, after a period of observation.
Pacing activation configuration has been recently revamped. Previously,
pacing related quic-cc-algo argument was used to specify a burst size.
It evolved into a boolean value as burst size is dynamically calculated
now. As such, removes any references to the old burst value in config
parsing code for cleaner code.
This should be backported up to 3.1, after a period of observation.
Late in 3.1 we've added an integrity check to make sure we didn't keep
trash objects allocated before resizing the trash with commit 0bfd36e7b8
("MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()"), but
it turns out that the counter that is being checked includes the number
of objects left in local thread caches. As such it can trigger despite
no object being allocated. This precisely happens when setting
tune.memory.hot-size to a few megabytes because some temporarily used
trash objects will remain in cache.
In order to address this, let's first flush the pool before running
the check. That was previously done by pool_destroy() but the check
had to be inserted before it. So now we first flush the trash pool,
then verify it's no longer used, and finally we can destroy it.
This needs to be backported to 3.1. Thanks to Christian Ruppert for
reporting this bug.
Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834
When building Wolfssl without renegotiation options, WolfSSL still
defines the macros about it, which warns during the build.
This patch completes the previous one by undefining the macros so
haproxy could build without any warning.
In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was
suggested to push --enable-haproxy within --enable-distro.
WolfSSL does not want to include the renegotiation support in
--enable-distro.
To achieve this, let haproxy build without SSL_renegotiate_pending()
when wolfssl does not define HAVE_SECURE_RENEGOCIATION or
HAVE_SERVER_RENEGOCIATION_INFO.
Commit 44537379fc ("MINOR: tools: add errname to print errno macro
name") brought a facility to report errno using a symbolic string
when known instead of showing only the value. However, among the
listed options, ETIME is mentioned but is unknown from FreeBSD where
it breaks the build. Let's simply drop it, we don't use ETIME anyway
and even if it would be reported, the default code path still reports
the numeric value so there's no harm. If other ones fail to build in
the future, they could be handled the same way.
Thanks to the previous patch, it is now possible to explicitly rely on
stream's events to shut it down. The right event is set in
stream_shutdown(), before waking up the stream, via an atomic operation. In
process_stream(), this event will be handled as expected.
Thus, TASK_F_UEVT* are no longer used, but not removed since still usable
for other tasks.
This patch depends on "MEDIUM: stream: Map task wake up reasons to dedicated
stream events".
To fix thread-safety issues when a stream must be shut, three new task
states were added. These states are generic (UEVT1, UEVT2 and UEVT3), the
task callback function is responsible to know what to do with them. However,
it is not really scalable.
The best is to use an atomic field in the stream structure itself to deal
with these dedicated events. There is already the "pending_events" field
that save wake up reasons (TASK_WOKEN_*) to not loose them if
process_stream() is interrupted before it had a chance to handle them.
So the idea is to introduce a new field to handle streams dedicated events
and merged them with the task's wake up reasons used by the stream. This
means a mapping must be performed between some task wake up reasons and
streams events. Note that not all task wake up reasons will be mapped.
In this patch, the "new_events" field is introduced. It is an atomic
bit-field. Streams events (STRM_EVT_*) are also introduced to map the task
wake up reasons used by process_stream(). Only TASK_WOKEN_TIMER and
TASK_WOKEN_MSG are mapped, in addition to TASK_F_UEVT* flags. In
process_stream(), "pending_events" field is now filled with new stream
events and the mapping of the wake up reasons.
shutdown-backup-sessions action for on-marked-up directive does not work anymore
since the stream_shutdown() function was modified to be async-safe.
When stream_shutdown() was modified to be async-safe, dedicated task events were
added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to
TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was
performed by process_stream() to shut the stream with the appropriate reason.
However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a
stream down because a preferred server became available, was not mapped in the
same way. So since commit b8e3b0a18d ("BUG/MEDIUM: stream: make
stream_shutdown() async-safe"), this action is ignored and does not work
anymore.
To fix an issue, and being able to bakcport the fix, a third task event was
added. TASK_F_EVT3 is now mapped on SF_ERR_UP.
This patch should fix the issue #2848. It must be backported as far as 2.6.
For both servers and proxies, use one connection queue per thread-group,
instead of only one. Having only one can lead to severe performance
issues on NUMA machines, it is actually trivial to get the watchdog to
trigger on an AMD machine, having a server with a maxconn of 96, and an
injector that uses 160 concurrent connections.
We now have one queue per thread-group, however when dequeueing, we're
dequeuing MAX_SELF_USE_QUEUE (currently 9) pendconns from our own queue,
before dequeueing one from another thread group, if available, to make
sure everybody is still running.
For both proxies and servers, properly calculates queueslength, which is
the total number of element in each queues (as they currently are only
using one queue, it is equivalent to the number of element of that
queue), and use it instead of the queue's length.
Add a per-thread group queue and associated fields in per-thread group
field in struct server, as well as a new field, queues length.
This is currently unused, so should change nothing.
Add a per-thread group field to struct proxy, that will contain a struct
queue, as well as a new field, "queueslength".
This is currently unused, so should change nothing.
Please note that proxy_init_per_thr() must now be called for each proxy
once the thread groups number is known.
A few times in the past we've seen cases where epoll was caught reporting
a wrong event that caused trouble (e.g. spuriously reporting HUP or RDHUP
after a successful connect()). The new tune.epoll.mask-events directive
permits to mask events such as ERR, HUP and RDHUP and convert them to IN
events that are processed by the regular receive path. This should help
better diagnose and troubleshoot issues such as this one, as well as rule
out such a cause when similar issues are reported:
https://github.com/haproxy/haproxy/issues/2368https://www.spinics.net/lists/netdev/msg876470.html
It should be harmless to backport this if necessary.
acl_match_cond() combines acl_exec_cond() + acl_pass() and a check on the
condition->pol (to check if the cond is inverted) in order to return
either 0 if the cond doesn't match or 1 if it matches (or NULL).
Thanks to this we can actually simplify some redundant constructs that
iterate over rules and evaluate if the condition matches or not.
Conditions for tcp-request inspect-content and tcp-response
inspect-content couldn't be simplified because they perform an extra
check for missing data, and thus still need to leverage acl_exec_cond()
It's best to display the patch using "-w", like "git show xxxx -w",
because some blocks had to be re-indented after the cleanup, which
makes the patch hard to review by default.
As ssl_sock_gencert_load_ca and ssl_sock_gencert_free_ca are compiled only if
SSL_NO_GENERATE_CERTIFICATES is not defined, let's align it and move these
declarations in ssl_gencert.h.
ssl_sock_load_ca is defined in ssl_gencert.c and compiled only if
SSL_NO_GENERATE_CERTIFICATES is not defined. It's name is a bit confusing, as
we may think at the first glance, that it's a generic function, which is also
used to load CA file, provided via 'ca-file' keyword.
ssl_set_verify_locations_file is used in this case.
So let's rename ssl_sock_load_ca into ssl_sock_gencert_load_ca. Same is
applied to ssl_sock_free_ca.
ssl_sock_load_ca and ssl_sock_free_ca definitions are compiled only, if
SSL_NO_GENERATE_CERTIFICATES is not set. In case, when we set this define and
build haproxy, linker throws an error. So, let's fix this.
This should be backported in all stable versions.