This one used to maintain a shortcut in the pools allocation path that
was only justified by b_alloc_fast() which was not used! Let's get rid
of it as well so that the allocator becomes a bit more straight forward.
It is never used anymore since 1.7 where it was used by b_alloc_margin()
then replaced by direct calls to the pools function, and it maintains a
dependency on the exposed pools functions. It's time to get rid of it,
as it's not even certain it still works.
The channel's buffer allocator, channel_alloc_buffer(), was still relying
on the principle of a margin for the request and not for the response.
But this margin stopped working around 1.7 with the introduction of the
content filters such as SPOE, and was completely anihilated with the
local pools that came with threads. Let's simplify this and just use
b_alloc().
Right now there is a discrepancy beteween b_alloc() and b_allow_margin():
the former forcefully overwrites the target pointer while the latter tests
it and returns it as-is if already allocated.
As a matter of fact, all callers of b_alloc() either preliminary test the
buffer, or assume it's already null.
Let's remove this pain and make the function test the buffer's allocation
before doing it again, and match call places' expectations.
Some parts of the Lua are non-reentrant. We must be sure to carefully track
these parts to not dump the lua stack when it is interrupted inside such
parts. For now, we only identified the custom lua allocator. If the thread
is interrupted during the memory allocation, we must not try to print the
lua stack wich also allocate memory. Indeed, realloc() is not
async-signal-safe.
In this patch we introduce a thread-local counter. It is incremented before
entering in a non-reentrant part and decremented when exiting. It is only
performed in hlua_alloc() for now.
Now that connections aren't being reused when they failed, remove the
reset() method. It was unimplemented anywhere, except for H1 where it did
nothing, anyway.
Introduce a new XPRT method, start(). The init() method will now only
initialize whatever is needed for the XPRT to run, but any action the XPRT
has to do before being ready, such as handshakes, will be done in the new
start() method. That way, we will be sure the full stack of xprt will be
initialized before attempting to do anything.
The init() call is also moved to conn_prepare(). There's no longer any reason
to wait for the ctrl to be ready, any action will be deferred until start(),
anyway. This means conn_xprt_init() is no longer needed.
Remove static qualifier on stats_allocate_proxy_counters_internal. This
function will be used to allocate extra counters at runtime for dynamic
servers.
Prepare the server parsing API to support dynamic servers.
- define a new parsing flag to be used for dynamic servers
- each keyword contains a new field dynamic_ok to indicate if it can be
used for a dynamic server. For now, no keyword are supported.
- do not copy settings from the default server for a new dynamic server.
- a dynamic server is created in a maintenance mode and requires an
explicit 'enable server' command.
- a new server flag named SRV_F_DYNAMIC is created. This flag is set for
all servers created at runtime. It might be useful later, for example
to know if a server can be purged.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
The freq counters were using the thread's own time as the start of the
current period. The problem is that in case of contention, it was
occasionally possible to perform non-monotonic updates on the edge of
the next second, because if the upfront thread updates a counter first,
it causes a rotation, then the second thread loses the race from its
older time, and tries again, and detects a different time again, but
in the past so it only updates the counter, then a third thread on the
new date would detect a change again, thus provoking a rotation again.
The effect was triple:
- rare loss of stored values during certain transitions from one
period to the next one, causing counters to report 0
- half of the threads forced to go through the slow path every second
- difficult convergence when using many threads where the CAS can fail
a lot and we can observe N(N-1) attempts for N threads to complete
This patch fixes this issue in two ways:
- first, it now makes use og the monotonic global_now value which also
happens to be volatile and to carry the latest known time; this way
time will never jump backwards anymore and only the first thread
updates it on transition, the other ones do not need to.
- second, re-read the time in the loop after each failure, because
if the date changed in the counter, it means that one thread knows
a more recent one and we need to update. In this case if it matches
the new current second, the fast path is usable.
This patch relies on previous patch "MINOR: time: export the global_now
variable" and must be backported as far as 1.8.
This is the process-wide monotonic time that is used to update each
thread's own time. It may be required at a few places where a strictly
monotonic clock is required such as freq_ctr. It will be have to be
backported as a dependency of a forthcoming fix.
Some are not always easy to spot with "chk" vs "check" or hyphens at
some places and not at others. Now entering "option http-close" properly
suggests "httpclose" and "option tcp-chk" suggests "tcp-check". There's
no need to consider the proxy's capabilities, what matters is to figure
what related word the user tried to spell, and there are not that many
options anyway.
The distance between two words can be high due to a sub-word being missing
and in this case it happens that other totally unrealted words are proposed
because their average score looks lower thanks to being shorter. Here we're
introducing the notion of presence of each character so that word sequences
that contain existing sub-words are favored against the shorter ones having
nothing in common. In addition we do not distinguish being/end from a
regular delimitor anymore. That made it harder to spot inverted words.
When tasklets were derived from tasks, there was no immediate need for
the scheduler to know their status after execution, and in a spirit of
simplicity they just started to always return NULL. The problem is that
it simply prevents the scheduler from 1) accounting their execution time,
and 2) keeping track of their current execution status. Indeed, a remote
wake-up could very well end up manipulating a tasklet that's currently
being executed. And this is the reason why those handlers have to take
the idle lock before checking their context.
In 2.5 we'll take care of making tasklets and tasks work more similarly,
but trouble is to be expected if we continue to propagate the trend of
returning NULL everywhere, especially if some fixes relying on a stricter
model later need to be backported. For this reason this patch updates all
known tasklet handlers to make them return NULL only when the tasklet was
freed. It has no effect for now and isn't even guaranteed to always be
100% safe but it puts the code into the right direction for this.
There were still a very small list of functions, variables and fields
called "stats_" while they were really purely CLI-centric. There's the
frontend called "stats_fe" in the global section, which instantiates a
"cli_applet" called "<CLI>" so it was renamed "cli_fe".
The "alloc_stats_fe" function cas renamed to "cli_alloc_fe" which also
better matches the naming convention of all cli-specific functions.
Finally the "stats_permission_denied_msg" used to return an error on
the CLI was renamed "cli_permission_denied_msg".
Now there's no more "stats_something" that designates the CLI.
This is the number of args accepted on a command received on the CLI,
is has long been totally independent of stats and should not carry
this misleading "stats" name anymore.
Instead of making a new one from scratch, let's support not wiping the
existing fingerprint and updating it, and to do the same char by char.
The word-by-word one will still result in multiple beginnings and ends,
but that will accurately translate word boundaries. The char-based one
has more flexibility and requires that the caller maintains the previous
char to indicate the transition, which also allows to insert delimiters
for example.
Entering "show tls" would still emit 35 entries. By measuring the distance
between all unknown words and the candidates, we can sort them and pick the
10 most likely candidates. This works reasonably well, as now "show tls"
only proposes "show tls-keys", "show threads", "show pools" and "show tasks".
If the distance is still too high or if a word is missing, the whole
prefix list continues to be dumped, thus "show" alone will still report
the entire list of commands beginning with "show".
It's still impossible to skip a word, for example "show conn" will not
propose "show servers conn" because the distance is calculated for each
word individually. Some changes to the distance calculation to support
updating an existing map could easily address this. But this is already
a great improvement.
It was mentioned that ACCESS_MASTER_ONLY as for workers only instead of
master-only. And it wasn't clear that all ACCESS_* would belong to the
same thing.
The last time when an item was seen in a resolver responses is now stored in
milliseconds instead of seconds. This avoid some corner-cases at the
edges. This also simplifies time comparisons.
Another way to say it: "Safely unlink requester from a requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.
However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
srvrq_update_srv_status() update the server status based on result of SRV
resolution. For now, it is only used from snr_update_srv_status() when
appropriate.
resolv_purge_resolution_answer_records() must be used to removed all answers
attached to a resolution. For now, it is only used when a resolution is
released.
This function search for a SRV answer item associated to a requester
whose type is server.
This is mainly useful to "link" a server to its SRV record when no
additional record were found to configure the IP address.
This patch is required by a bug fix.
action_suggest() will return a pointer to an action whose keyword more or
less ressembles the passed argument. It also accepts to be more tolerant
against prefixes (since actions taking arguments are handled as prefixes).
This will be used to suggest approaching words.
Just like with the server keywords, now's the turn of "bind" keywords.
The difference is that 100% of the bind keywords are registered, thus
we do not need the list of extra keywords.
There are multiple bind line parsers today, all were updated:
- peers
- log
- dgram-bind
- cli
$ printf "listen f\nbind :8000 tcut\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26
[NOTICE] 070/101358 (25146) : path to executable is ./haproxy
[ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ?
[ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/101358 (25146) : Fatal errors found in configuration.
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
This introduces two functions, one which creates a fingerprint of a word,
and one which computes a distance between two words fingerprints. The
fingerprint is made by counting the transitions between one character and
another one. Here we consider the 26 alphabetic letters regardless of
their case, then any digit as a digit, and anything else as "other". We
also consider the first and last locations as transitions from begin to
first char, and last char to end. The distance is simply the sum of the
squares of the differences between two fingerprints. This way, doubling/
missing a letter has the same cost, however some repeated transitions
such as "e"->"r" like in "server" are very unlikely to match against
situations where they do not exist. This is a naive approach but it seems
to work sufficiently well for now. It may be refined in the future if
needed.
It is possible to have a session without a listener. It happens for applets
on the client side. Thus all accesses to the listener info from the session
must be guarded. It was the purpose of the commit 36119de18 ("BUG/MEDIUM:
session: NULL dereference possible when accessing the listener"). However,
some tests on the session's listener existence are missing in proxy_inc_*
functions.
This patch should fix the issues #1171, #1172, #1173, #1174 and #1175. It
must be backported with the above commit as far as 1.8.
Since commit f8fb4f75f ("MINOR: atomic: implement a more efficient arm64
__ha_cas_dw() using pairs"), on some modern arm64 (armv8.1+) compiled
with -march=armv8.1-a under gcc-7.5.0, a build error may appear on ev_poll.o :
/tmp/ccHD2lN8.s:1771: Error: reg pair must start from even reg at operand 1 -- `casp x27,x28,x22,x23,[x12]'
Makefile:927: recipe for target 'src/ev_poll.o' failed
It appears that the compiler cannot always assign register pairs there
for a structure made of two u64. It was possibly later addressed since
gcc-9.3 never caused this, but there's no trivially available info on
the subject in the changelogs. Unsuprizingly, using a u128 instead does
fix this, but it significantly inflates the code (+4kB for just 6 places,
very likely that it loaded some extra stubs) and the comparison is ugly,
involving two slower conditional jumps instead of a single one and a
conditional comparison. For example, ha_random64() grew from 144 bytes
to 232.
However, simply forcing the base register does work pretty well, and
makes the code even cleaner and more efficient by further reducing it
by about 4.5kB, possibly because it helps the compiler to pick suitable
registers for the pair there. And the perf on 64-cores looks steadily
0.5% above the previous one, so let's do this.
Note that the commit above was backported to 2.3 to fix scalability
issues on AWS Graviton2 platform, so this one will need to be as well.
The QUIC connection struct connection member was not initialized. This may
make randomly haproxy handle TLS connections as QUIC ones only when QUIC support
is enabled leading to such OpenSSL errors (captured from a reg test output, TLS
Client-Hello callback failed):
OpenSSL error[0x10000085] OPENSSL_internal: CONNECTION_REJECTED
OpenSSL error[0x10000410] OPENSSL_internal: SSLV3_ALERT_HANDSHAKE_FAILURE
OpenSSL error[0x1000009a] OPENSSL_internal: HANDSHAKE_FAILURE_ON_CLIENT_HELLO
This patch should fix#1168 github issue.
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread count configs. Testing various values
runqueue values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be drawn that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to optimally
configure, so let's do this by default now.
The recently introduced Financial Information eXchange (FIX)
converters have some hard coded tags based on the specification that
were misspelled. Specifically, SenderComID and TargetComID should
be SenderCompID and TargetCompID according to the specification [1][2].
This patch updates all references, which includes the converters
themselves, the regression test, and the documentation.
[1] https://fiximate.fixtrading.org/en/FIX.5.0SP2_EP264/tag49.html
[2] https://fiximate.fixtrading.org/en/FIX.5.0SP2_EP264/tag56.html
Parameter "accepted_payload_size" is currently considered regardless
the used nameserver is using TCP or UDP. It remains mandatory to annouce
such capability to support e-dns, so a value have to be announced also
in TCP. Maximum DNS message size in TCP is limited by protocol to 65535
and so for UDP (65507) if system supports such UDP messages. But
the maximum value for this option was arbitrary forced to 8192.
This patch change this maximum to 65535 to allow user to set bigger value
for UDP if its system supports. It also sets accepted_payload_size
in TCP allowing to retrieve huge responses if the configuration uses
TCP nameservers.
The request announcing the accepted_payload_size capability is currently
built at resolvers level and is common to all used nameservers of the
section regardess transport protocol used. A further patch should be
made to at least specify a different payload size depending of the
transport, and perhaps could be forced to 65535 in case of TCP and
maximum would be forced back to 65507 matching UDP max.
This patch is appliable since 2.4 version
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remain confusing due to
this naming. They were now called "strm" which will not even affect
indenting nor alignment.
It was brought by commit c44b8de99 ("CLEANUP: connection: Use `VAR_ARRAY`
in `struct tlv` definition") but breaks the build with clang. Actually it
had already been done 6 months ago by commit 4987a4744 ("CLEANUP: tree-wide:
use VAR_ARRAY instead of [0] in various definitions") then reverted by
commit 441b6c31e ("BUILD: connection: fix build on clang after the VAR_ARRAY
cleanup") which explained the same thing but didn't place a comment in the
code to justify this (in short it's just an end of struct marker).
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
TCC happens to define __OPTIMIZE__ at -O2 but doesn't proceed with dead
code elimination, resulting in ha_free() to always reference the link
error symbol. Let's condition this test on __GCC__ which others like
Clang also define.