Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "4k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, preventing from starting when set
e.g. to "64k". Let's make use of parse_size_err() on it so that units are
supported. This requires to turn it to uint as well, and to explicitly
limit its range to INT_MAX - 2*sizeof(void*), which was previously
partially handled as part of the sign check.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, and
since it's sometimes compared to an int, we limit its range to
0..INT_MAX.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
The qcc_report_glitch() function is now replaced with a macro to support
enumerating counters for each individual glitch line. For now this adds
36 such counters. The macro supports an optional description, though that
is not being used for now.
As a reminder, this requires to build with -DDEBUG_GLITCHES=1.
proxy auth_uri struct was manually cleaned up during deinit, but the logic
behind was kind of akward because it was required to find out which ones
were shared or not. Instead, let's switch to a proper refcount mechanism
and free the auth_uri struct directly in proxy_free_common().
COUNT_GLITCH() will implement an unconditional counter on its declaration
line when DEBUG_GLITCHES is set, and do nothing otherwise. The output will
be reported as "GLT" and can be filtered as "glt" on the CLI. The purpose
is to help figure what's happening if some glitches counters start going
through the roof. The macro supports an optional string argument to
describe the cause of the glitch (e.g. "truncated header"), which is then
reported in the dump.
For now this is conditioned by DEBUG_GLITCHES but if it turns out to be
light enough, maybe we'll keep it enabled full time. In this case it
might have to be moved away from debug dev, or at least documented (or
done as debug counters maybe so that dev can remain undocumented and
updatable within a branch?).
In order to count new event types, we'll need to support empty conditions
so that we don't have to fake if (1) that would pollute the output. This
change checks if #cond is an empty string before concatenating it with
the optional var args, and avoids dumping the colon on the dump if the
whole description is empty.
After master-worker refactoring, master performs re-exec only once up to
receiving "reload" command or USR2 signal. There is no more the second
master's re-exec to free unused memory. Thus, there is no longer need to export
environment variable HAPROXY_LOAD_SUCCESS with worker process load status. This
status can be simply saved in a global variable load_status.
Since 3.0, it is possible to assign a GUID to proxies, listeners and
servers. These objects are stored in a global tree guid_tree.
Proxies and listeners are static. However, servers may be added or
deleted at runtime, which imply that guid_tree must be protected. Fix
this by declaring a read-write lock to protect tree access.
For now, only guid_insert() and guid_remove() are protected using a
write lock. Outside of these, GUID tree is not accessed at runtime. If
server CLI commands are extended to support GUID as server identifier,
lookup operation should be extended with a read lock protection.
Note that during stat-file preloading, GUID tree is accessed for lookup.
However, as it is performed on startup which is single threaded, there
is no need for lock here. A BUG_ON() has been added to ensure this
precondition remains true.
This bug could caused a segfault when using dynamic servers with GUID.
However, it was never reproduced for now.
This must be backported up to 3.0. To avoid a conflict issue, the
previous cleanup patch can be merged before it.
event_hdl_sub_list_empty() may be used to know if the subscription list
passed as argument is empty or not (ie: if there currently are any
subcribers or not). It can be useful to know if the subscription is empty
is order to avoid unecessary preparation work and skip event publishing to
save CPU time if we already know that no one is interested in tracking the
changes for a given subscription list.
In order to help users detect when threads are behaving abnormally, let's
try to emit a warning when one is no longer making any progress. This will
allow to catch faulty situations more accurately, instead of occasionally
triggering just after the long task. It will also let users know that there
is something wrong with their configuration, and inspect the call trace to
figure whether they're using excessively long rules or Lua for example (the
usual warnings about lua-load vs lua-load-per-thread are still reported).
The warning will only be emitted for threads not yet marked as stuck so
as not to interfere with panic dumps and avoid sending a warning just
before a panic. A tainted flag is set when this happens however (0x2000).
There's currently no way to just emit a warning informing that a thread
is stuck without crashing. This is a problem because sometimes users
would benefit from this info to clean up their configuration (e.g. abuse
of map_regm, lua-load etc).
This commit adds a new function ha_stuck_warning() that will emit a
warning indicating that the designated thread has been stuck for XX
milliseconds, with a number of streams blocked, and will make that
thread dump its own state. The warning will then be sent to stderr,
along with some reminders about the impacts of such situations to
encourage users to fix their configuration.
In order not to disrupt operations, a local 4kB buffer is allocated
in the stack. This should be quite sufficient.
For now the function is not used.
The comment asks to update the "metrics_info" array, which does not
exist, instead it's called stat_cols_info[] and is in stats.c. Let's
mention all that to save time searching for the needed info.
While no version seems to have ever known that "metrics_info", it's not
needed to backport this as it's only a comment.
A ClientHello may be splitted accross several different CRYPTO frames,
then mixed in a single QUIC packet. This is used notably by clients such
as chrome to render the first Initial packet opaque to middleboxes.
Each packet frame is handled sequentially. Out-of-order CRYPTO frames
are buffered in a ncbuf, until gaps are filled and data is transferred
to the SSL stack. If CRYPTO frames are heavily splitted with small
fragments, buffering may fail as ncbuf does not support small gaps. This
causes the whole packet to be rejected and unacknowledged. It could be
solved if the client reemits its ClientHello after remixing its CRYPTO
frames.
This patch is written to improve CRYPTO frame parsing. Each CRYPTO
frames which cannot be buffered due to ncbuf limitation are now stored
in a temporary list. Packet parsing is completed until all frames have
been handled. If temporary list is not empty, reparsing is done on the
stored frames. With the newly buffered CRYPTO frames, ncbuf insert
operation may this time succeeds if the frame now covers a whole gap.
Reparsing will loop until either no progress can be made or it has been
done at least 3 times, to prevent CPU utilization.
This patch should fix github issue #2776.
This should be backported up to 2.6, after a period of observation. Note
that it relies on the following refactor patches :
MINOR: quic: extend return value of CRYPTO parsing
MINOR: quic: use dynamically allocated frame on parsing
MINOR: quic: simplify qc_parse_pkt_frms() return path
qc_handle_crypto_frm() is the function used to handled a newly received
CRYPTO frame. Change its API to use a newly dedicated return type. This
allows to report if the frame was properly handled, ignored if already
parsed previously or rejected after a fatal error.
This commit does not have any functional changes. However, it allows to
simplify qc_handle_crypto_frm() API by removing <fast_retrans> as output
parameter. Also, this patch will be necessary to support multiple
iteration of packet parsing for CRYPTO frames.
As reported by Pierre Maoui in GH #2477, it's not possible to render
control chars from variables or expressions verbatim in the payload part
of http-return statements. That's a problem because this part should not
require to be encoded at all (we could even imagine building favicons on
the fly for example).
In fact it is the LOG_OPT_HTTP option when passed as default options on
parse_logformat_string() which tells the log encoder that the payload
should be http-encoded using lf_chunk() instead of being printed using the
per-type encoder.
This option was set when parsing logformat expressions for lf-string
expression under http-return statements, as well as logformat expressions
for set-map action. While it is true that those actions may only be
used under http context, the LOG_OPT_HTTP logformat option is not relevant
there, because the payload is expected to be used without being encoded.
So let's simply get rid of this option when parsing logformat expressions
for set-map action key/value and lf-string from http-request return
action, and add a note next to LOG_OPT_HTTP option to indicate that it is
used to tell the log encoder that the payload should be HTTP-encoded.
Thanks to Pierre for having reported the issue and Willy for the
analysis and patch proposal.
These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).
The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.
While we get reports of connection setup errors in fc_err/bc_err, we
don't have the equivalent for the recv/send/splice syscalls. Let's
add provisions for new codes that cover the common errno values that
recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT,
EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case
for when the poller reported the error itself. It's worth noting that
EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code
and should not be reported.
The only thing is that it's quite hard to forcefully (and reliably)
trigger these errors in automated tests as the timing is critical.
Using iptables to manually reset established connections in the
middle of large transfers at least permits to see some ECONNRESET
and/or EPIPE, but the other ones are harder to trigger.
It was the only one prefixed with "CO_ERR_", making it harder to batch
process and to look up. It was added in 2.5 by commit 61944f7a73 ("MINOR:
ssl: Set connection error code in case of SSL read or write fatal failure")
so it can be backported as far as 2.6 if needed to help integrate other
patches.
We're using a few occurrences of __builtin_prefetch() but tcc doesn't
know about it so let's give it a dummy definition. Now the code builds
and works again with tcc without thread support.
TCC is often convenient to quickly test builds, run CI tests etc. It has
limited thread support (e.g. no thread-local stuff) but that is often
sufficient for testing. TCC lacks __atomic_exchange_n() but has the
exactly equivalent __atomic_exchange(), and doesn't have any barrier.
For this reason we force the atomic_exchange to use the stricter SEQ_CST
mem ordering that allows to ignore the barrier.
[wt: that's upstream commit ca8b865 ("BUILD: support building with TCC")]
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allows to change the priority with setpriority().
tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is changed to the previous one after configuration parsing.
tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.
The feature was discussed in github issue #1919.
Example:
global
tune.renice.startup 15
tune.renice.runtime 0
When http-buffer-request option is set on a proxy, the processing will be
paused to wait the full request payload or a full buffer. So it is an entity
that block the processing, just like a rule or a filter that yields. So now,
it is reported as a waiting entity if an error or a timeout occurred.
To do so, an stream entity type is added for this option. There is no
pointer. And "waiting_entity" sample fetch returns the option name.
When a rule or a filter yields because it waits for something to be able to
continue its processing, this entity is saved in the stream. If an error or
a timeout occurred, info on this entity may be retrieved via the
"waiting_entity" sample fetch, for instance to dump it in the logs. This
info may be useful to found root cause of some bugs because it is a way to
know the processing was temporarily stopped. This may explain timeouts for
instance.
The sample fetch is not documented yet.
It is very similar to the last evaluated rule. When a filter returns an
error that interrupts the processing, it is saved in the stream, in the
last_entity field, with the type 2. The pointer on filter config is
saved. This pointer never changes during runtime and is part of the proxy's
structure. It is an element of the filter_configs list in the proxy
structure.
"last_entity" sample fetch was update accordingly. The filter identifier is
returned, if defined. Otherwise the save pointer.
The last evaluated rule is now saved in a generic structure, named
last_entity, with a type to identify it. The idea is to be able to store
other kind of entity that may interrupt a specific processing.
The type of the last evaluated rule is set to 1. It will be replace later by
an enum to be more explicit. In addition, the pointer to the rule itself is
saved instead of its location.
The sample fetch "last_entity" was added to retrieve the information about
it. In this case, it is the rule localtion, the config file containing the
rule followed by the line where the rule is defined, separated by a
colon. This sample fetch is not documented yet.
When an abstract unix socket is bound by HAProxy (using "abns@" prefix),
NUL bytes are appended at the end of its path until sun_path is filled
(for a total of 108 characters).
Here we add an alternative to pass only the non-NUL length of that path
to connect/bind calls, such that the effective path of the socket's name
is as humanly written. This may be useful to interconnect with existing
softwares that implement abstract sockets with this logic instead of the
default haproxy one.
This is achieved by implementing the "abnsz" socket prefix (instead of
"abns"), which stands for "zero-terminated ABNS". "abnsz" prefix may be
used anywhere "abns" is. Internally, haproxy uses the custom socket
family (AF_CUST_ABNS vs AF_CUST_ABNSZ) to differentiate default abns
sockets from zero-terminated ones.
Documentation was updated and regtest was added.
Fixes GH issues #977 and #2479
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
Thanks to previous commit, we may now use dedicated addrcmp functions for
each UNIX address family. This allows to simplify sock_unix_addrcmp()
function and avoid useless checks in order to try to guess the socket
type.
In this patch we implement sock_abns_addrcmp() and sock_abnsz_addrcmp()
functions, which are respectively used for ABNS and ABNSZ custom families
sock_unix_addrcmp() now only holds regular UNIX socket comparing logic.
For now it's the same as abns. We'll need to modify sock_unix_addrcmp(),
and a few other ones to support effective path length when dealing with
the \0. Let's check with Tristan's patch for this (upcoming patch).
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
This is a pre-requisite to adding the abnsz socket address family:
in this patch we make use of protocol API rework started by 732913f
("MINOR: protocol: properly assign the sock_domain and sock_family") in
order to implement a dedicated address family for ABNS sockets (based on
UNIX parent family).
Thanks to this, it will become trivial to implement a new ABNSZ (for abns
zero) family which is essentially the same as ABNS but with a slight
difference when it comes to path handling (ABNS uses the whole sun_path
length, while ABNSZ's path is zero terminated and evaluation stops at 0)
It was verified that this patch doesn't break reg-tests and behaves
properly (tests performed on the CLI with show sess and show fd).
Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no
distinction needs to be made, real_family() is used to fetch the proper
real family type to handle it properly.
Both stream and dgram were converted, so no functional change should be
expected for this "internal" rework, except that proto will be displayed
as "abns_{stream,dgram}" instead of "unix_{stream,dgram}".
Before ("show sess" output):
0x64c35528aab0: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=21,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=
After:
0x619da7ad74c0: proto=abns_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=22,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
When using trace with -dt, the trace_parse_cmd() function is doing a
strtok which write \0 into the argv string.
When using the mworker mode, and reloading, argv was modified and the
trace won't work anymore because the first : is replaced by a '\0'.
This patch fixes the issue by allocating a temporary string so we don't
modify the source string directly. It also replace strtok by its
reentrant version strtok_r.
Must be backported as far as 2.9.
strnlen2() is functionally equivalent to strnlen(). Goal is to provide
an alternative to strnlen() which is not portable since it requires
_POSIX_C_SOURCE >= 200809L
There is no reason to disable the 0-copy data forwarding if an end-of-stream
was reported on the consumer side. Indeed, the consumer will send data in
this case. So there is no reason to check the read side here.
This patch may be backported as far as 2.9.
Each server is inserted in a global list named servers_list on
new_server(). This list is then only used to finalize servers
initialization after parsing.
On dynamic server creation, there is no issue as new_server() is under
thread isolation. However, when a server is deleted after its refcount
reached zero, srv_drop() removes it from servers_list without lock
protection. In the longterm, this can cause list corruption and crashes,
especially if multiple adjacent servers are removed in parallel.
To fix this, convert servers_list to a mt_list. This should not impact
performance as servers_list is not used during runtime outside of server
creation/deletion.
This should fix github issue #2733. Thanks to Chris Staite who first
found the issue here.
This must be backported up to 2.6.
There are two parts in mworker_cli_proxy_create(): allocating and setting up
MASTER proxy and allocating and setting up servers on ipc_fd[0] of the
sockpairs shared with workers.
So, let's split mworker_cli_proxy_create() into two functions respectively.
Each of them takes **errmsg as an argument to write an error message, which may
be triggered by some subcalls. The content of this errmsg will allow to extend
the final alert message shown to user, if these new functions will fail.
The main goals of this split is to allow to move these two parts independantly
in future and makes the code of haproxy initialization in haproxy.c more
transparent.
The idea here is to record how many times a filter is being called on a
stream. We're incrementing the same counter all along, regardless of the
type of event, since the purpose is essentially to detect one that might
be misbehaving. The number of calls is reported in "show sess all" next
to the filter name. It may also help detect suboptimal processing. For
example compressing 1GB shows 138k calls to the compression filter, which
is roughly two calls per buffer. Maybe we wake up with incomplete buffers
and compress less. That's left for a future analysis.
Process_stream() is a complex function and a few times some lopos were
either witnessed or suspected. Each time this happens it's extremely
difficult to figure why because it involves combinations of analysers,
filters, errors etc.
Let's at least maintain a set of 4 counters per stream that report the
number of times we've been through each of the 4 most important blocks
(stconn changes, request analysers, response analysers, and propagation
of changes down). These ones are stored in the stream and reported in
"show sess all", just like they will be reported in panic dumps.
This macro works exactly like BUG_ON() except that it never logs anything
nor crashes, it only implements an atomic counter that is incremented on
every call. This can be used to count a number of unlikely events that are
worth checking at run time on setups showing unusual and unreproducible
behaviors.
These macros do not always kill the process, and sometimes it would be
nice to know if some match or not, and how many times (especially for the
CHECK_IF one).
This commit adds a new section "dbg_cnt" made of structs that contain
function name, file name, line number, check type, condition and match
count. A newe macro __DBG_COUNT() adds one to the counter, and is placed
inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact
type of the check is not very precise but in practice we don't care,
as most checks will cause the process to die anyway unless they're of
type _BUG_ON_ONCE() (used by CHECK_IF by default).
All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're
creating a section, thus we need a modern linker to be able to scan
this section later. Doing so adds ~50kB to the executable due to the
~1266 BUG_ON() and others placed there. That's not huge in comparison
to the visibility it can provide.
The BUG_ON() macros are made of two levels so as to resolve the condition
to a string. However this doesn't offer much flexibility for performing
other operations when the condition is validated, so let's adjust them so
that the condition is checked in the outer macro and the operations are
performed in the inner one.