This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.
The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:
$ cat quic-repro.cfg
global
stats socket /tmp/sock1 level admin
stats timeout 1h
limited-quic
frontend stats
mode http
bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
timeout client 5s
stats uri /
$ ./haproxy -db -f quic-repro.cfg &
$ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
=> hang
This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.
All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.
It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.
The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).
Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gaind in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.
Overall, the following changes are visible from the application code:
- mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
=> MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
+ 1 back elem
- MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
=> just manually set iterator to NULL however.
For MT_LIST_FOR_EACH_ENTRY_LOCKED()
=> mt_list_unlock_self() (if element going to be reused) + NULL
- MT_LIST_LOCK_ELT => mt_list_lock_full()
- MT_LIST_UNLOCK_ELT => mt_list_unlock_full()
- l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT();
=> l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
The following patch fixes a race condition during server addr/port
update :
cd994407a9
BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates
The new update mechanism is implemented via an event update. It uses
thread isolation to guarantee that no other thread is accessing server
addr/port. Furthermore, to ensure server instance is not deleted just
before the event handler, server instance is lookup via its ID in proxy
tree.
However, thread isolation is only entered after server lookup. This
leaves a tiny race condition as the thread will be marked as harmless
and a concurrent thread can delete the server in the meantime. This
causes server_atomic_sync() to manipulated a deleted server instance to
reinsert it in used_server_addr backend tree. This can cause a segfault
during this operation or possibly on a future used_server_addr tree
access.
This issue was detected by criteo. Several backtraces were retrieved,
each related to server addr_node insert or delete operation, either in
srv_set_addr_desc(), or add/delete dynamic server handlers.
To fix this, simply extend thread isolation section to start it before
server lookup. This ensures that once retrieved the server cannot be
deleted until its addr/port are updated. To ensure this issue won't
happen anymore, a new BUG_ON() is added in srv_set_addr_desc().
Also note that ebpt_delete() is now called every time on delete handler
as this is a safe idempotent operation.
To reproduce these crashes, a script was executed to add then remove
different servers every second. In parallel, the following CLI command
was issued repeatdly without any delay to force multiple update on
servers port :
set server <srv> addr 0.0.0.0 port $((1024 + RANDOM % 1024))
This must be backported at least up to 3.0. If above mentionned patch
has been selected for previous version, this commit must also be
backported on them.
This is a complementary patch to c16eba818 ("BUG/MEDIUM: server/dns:
preserve server's port upon resolution timeout or error").
Indeed, since c16eba818, the port is properly preserved, but unsetting
server's address this way results in server_atomic_sync() function
thinking that we're actually setting a new address and not unsetting
the previous one because addr family is != AF_UNSPEC.
Upon DNS timeout, this could be observed:
[WARNING] (2588257) : Server http/s1 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (2588257) : Server http/s1 ('test1.localhost') is UP/READY (resolves again).
Notice that server timeouts and then immediately resolves again. Of course
in this case case the server's address was properly set to 0, meaning
that the server will not receive any traffic, but it is confusing and
could result in haproxy temporarily thinking that the server is actually
available while it's not.
To properly fix the issue and restore historical behavior, let's
explicitly set inetaddr's family to AF_UNSPEC after fetching original
server's address.
It should be backported in 3.0 with c16eba818.
This is a follow-up for 7223296 ("BUG/MINOR: server: fix first server
template not being indexed").
Indeed, in 7223296 we added a new call to _srv_parse_set_id_from_prefix()
for the first server before handling additional ones. But we actually
overlooked the fact that _srv_parse_set_id_from_prefix() was already
performed at the end of _srv_parse_tmpl_init() for the same server.
Since _srv_parse_set_id_from_prefix() frees srv->id, it results in UAF
when performing name lookups on the first server, because used_server_name
node key still uses the freed string pointer.
The early _srv_parse_set_id_from_prefix() call (added in 7223296) and
the original one perform the same task, except that the new one is
followed by name node insertion logic required for name lookups to work
properly. So let's simply get rid of the old one at the end of the
function.
_srv_parse_set_id_from_prefix() in the 'err:' label was also removed since
is is now useless as well starting with 7223296 and would trigger the same
bug on error paths. Thanks to Amaury for noticing it.
This bug was discovered while trying to address GH issue #2620.
Thanks to @x-yuri for his detailed report (with working repro).
It should be backported in 3.0 with 7223296.
When a new "default-server" line is parsed, some resolver options are reset.
Thus previously defined default options cannot be inherited. There is no
reason to do so. First because other server options are inherited. And then
because not all resolver options are reset. It is not consistent.
This patch should fix issue #2559. It should be backported to all stable
versions.
@boi4 reported in GH #2578 that since 3.0-dev1 for servers with address
learned from A/AAAA records after a DNS flap server would be put out of
maintenance with proper address but with invalid port (== 0), making it
unusable and causing tcp checks to fail:
[NOTICE] (1) : Loading success.
[WARNING] (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (8) : backend 'mybackend' has no server available!
[WARNING] (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING] (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING] (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING] (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
@boi4 also mentioned that this used to work fine before.
Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")
Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.
Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.
No backport needed (unless 64c9c8e gets backported).
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.
This keyword uses a sample expression argument. It thus can reuse
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensability.
This patch only implement parsing. Argument is stored inside new server
field <pool_conn_name> and expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.
If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.
Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use "sni" keyword even
without SSL on a server line which have a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configuration where pool-conn-name and
SNI are use in parallel with different values.
Two functions exists for server sni sample expression parsing. This is
confusing so this commit aims at clarifying this.
Functions are renamed with the following identifiers. First function is
named parse_srv_expr() and can be used during parsing. Besides
expression parsing, it has ensure sample fetch validity in the context
of a server line.
Second function is renamed _parse_srv_expr() and is used internally by
parse_srv_expr(). It only implements sample parsing without extra
checks. It is already use for server instantiation derived from
server-template as checks were already performed. Also, it is now used
in http-client code as SNI is a fixed string.
Finally, both functions are generalized to remove any reference to SNI.
This will allow to reuse it to parse other server keywords which use an
expression. This will be the case for the future keyword pool-conn-name.
Dynamically allocated servers PROXY TLVs were not freed on server
release. This patch fixes this leak by extending srv_free_params().
Every server line with set-proxy-v2-tlv-fmt keyword is impacted.
For static servers, issue is minimal as it will only cause leak on
deinit(). However, this could be aggravated when performing multiple
removal of dynamic servers.
This should be backported up to 2.9.
Since the following commit, idle connections are cleared before a server
is deleted. This is better than blocking server deletion due to inactive
connections :
6e0afb2e27
MEDIUM: server: close idle conn on server deletion
A BUG_ON() has been added to ensure that server idle conn counter is nul
after these connections are removed. However, Willy managed to trigger
it easily by repeatedly and randomly delete servers accross a
single-thread haproxy using a server-template with 1000 instances. In
parallel, a h1load client is executed to generate traffic.
This BUG_ON() reflected that it some connections referencing the server
targetted for deletion remained, even though idle server list is empty.
In fact, this is caused by connections scheduled for purging. These
connections are moved from idle server list to a global toremove_list
while still being accounted by the server.
A first approach could be to decrement server idle counter while moving
connection to the purge list. However, this is functionnaly incorrect as
these purgeable connections still reference the server and it could
cause a crash if cleared after it.
The correct fix for this issue is simply to remove every purgeable
connections before a server is deleted. This is implemented by this
patch by extending cli_parse_delete_server(). It could be enough to only
remove connections targetted the deleted server, but as these
connections will be purged anyway it is justified to clear the whole
list.
This must not be backported, unless the above mentionned patch is.
Convert FN_AGE in stat_cols_px[] as generic columns. These values will
be automatically used for dump/preload of a stats-file.
Remove srv_lastsession() / be_lastsession() function which are now
useless as last_sess is calculated via me_generate_field().
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.
Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.
Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.
To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.
As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.
If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.
In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
We observed that a dynamic server which health check is down for longer
than slowstart delay at startup doesn't trigger the warmup phase, it
receives full traffic immediately. This has been confirmed by checking
haproxy UI, weight is immediately the full one (e.g. 75/75), without any
throttle applied. Further tests showed that it was similar if it was in
maintenance, and even when entering a down or maintenance state after
being up.
Another issue is that if the server is down for less time than
slowstart, when it comes back up, it briefly has a much higher weight
than expected for a slowstart.
An easy way to reproduce is to do the following:
- Add a server with e.g. a 20s slowstart and a weight of 10 in config
file
- Put it in maintenance using CLI (set server be1/srv1 state maint)
- Wait more than 20s, enable it again (set server be1/srv1 state ready)
- Observe UI, weight will show 10/10 immediately.
If server was down for less than 20s, you'd briefly see a weight and
throttle value that is inconsistent, e.g. 50% throttle value and a
weight of 5 if server comes back up after 10s before going back to
6% after a second or two.
Code analysis shows that the logic in server_recalc_eweight stops the
warmup task by setting server's next state to SRV_ST_RUNNING if it
didn't change state for longer than the slowstart duration, regardless
of its current state. As a consequence, a server being down or disabled
for longer than the slowstart duration will never enter the warmup phase
when it will be up again.
Regarding the weight when server comes back up, issue is that even if
the server is down, we still compute its next weight as if it was up,
hence when it comes back up, it can briefly have a much higher weight
than expected during slowstart, until the warmup task is called again
after last_change is updated.
This patch aims to fix both issues.
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
log format expressions are broadly used within the code: once they are
parsed from input string, they are converted to a linked list of
logformat nodes.
We're starting to face some limitations because we're simply storing the
converted expression as a generic logformat_node list.
The first issue we're facing is that storing logformat expressions that
way doesn't allow us to add metadata alongside the list, which is part
of the prerequites for implementing log-profiles.
Another issue with storing logformat expressions as generic lists of
logformat_node elements is that it's starting to become really hard to
tell when we rely on logformat expressions or not in the code given that
there isn't always a comment near the list declaration or manipulation
to indicate that it's relying on logformat expressions under the hood,
so this adds some complexity for code maintenance.
This patch looks quite impressive due to changes in a lot of header and
source files (since logformat expressions are broadly used), but it does
a simple thing: it defines the lf_expr structure which itself holds a
generic list of logformat nodes, and then declares some helpers to
manipulate lf_expr elements and fixes the code so that we now exclusively
manipulate logformat_node lists as lf_expr elements outside of log.c.
For now, lf_expr struct only contains the list of logformat nodes (no
additional metadata), but now that we have dedicated type and helpers,
doing so in the future won't be problematic at all and won't require
extensive code changes.
Since faa8c3e ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") the following configuration will cause haproxy to crash:
backend test1
mode http
balance hash int(1)
server s1 haproxy.org:80
This is because lbprm.update_server_eweight() method is now systematically
called in _srv_set_inetaddr_port() upon srv addr/port change (and with the
above config it happens during startup after initial dns resolution).
However, depending on the chosen lbprm algo, update_server_eweight function
may not be set (it is not a mandatory method, some lb implementations don't
define it).
Thus, using 'balance hash' with map-based hashing or 'balance sticky' will
cause a crash due to a NULL de-reference in _srv_set_inetaddr_port(). To
fix the issue, we first check that the update_server_eweight() method is
set before using it.
No backport needed unless faa8c3e ("MEDIUM: lb-chash: Deterministic node
hashes based on server address") gets backported.
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.
This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.
When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.
This patch is not considered a bugfix. However, it may backported if
needed as its impact seems minimal.
When adding a server dynamically, we observe that when a backend has a
dynamic persistence cookie, the new server has no cookie as we receive
the following HTTP header:
set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Whereas we were expecting to receive something like the following, which
is what we receive for a server added in the config file:
set-cookie: test-cookie=abcdef1234567890; path=/
After investigating code path, srv_set_dyncookie() is never called when
adding a server through CLI, it is only called when parsing config file
or using "set server bkd1/srv1 addr".
To fix this, call srv_set_dyncookie() inside cli_parse_add_server().
This patch must be backported up to 2.4.
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.
However, this principle is incompatible if "enabled" keyword is used on
"add server". The newly created instance will be unreacheable as proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.
To avoid this unexpected state, simply forbid "enabled" keyword for
dynamic servers. In the long-term, it could be possible to re authorize
it but at least this requires to call srv_lb_propagate() on dynamic
server creation.
This should fix github issue #2497.
This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
Sebastien Gross reported that 'interface' keyword ('source' subargument)
is silently ignored when used from 'default-server' directive despite the
documentation implicitly stating that the keyword should be supported
there.
When support for 'source' keyword was added to 'default-server' directive
in dba97077 ("MINOR: server: Make 'default-server' support 'source'
keyword."), we properly duplicated the conn iface_name from the default-
server but we forgot to copy the conn iface_len which must be set as well
since it is used as setsockopt()'s 'optlen' argument in
tcp_connect_server().
It should be backported to all stable versions.
Willy reported that since 3ac79b504 ("MEDIUM: server:
make server_set_inetaddr() updater serializable"), haproxy fails to
compile on some older compilers such as gcc-4.4 with this kind of error:
src/server.c: In function 'snr_resolution_cb':
src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:1006: src/server.o] Error 1
This is due to referencing a member inside anonymous union from a compound
literal assignment. Apparently such use of anonymous union wasn't properly
supported back then on older compilers. To fix the issue, we give "u" name
to the parent union use this name to explicitly refer to the union where
relevant in the code (only a few changes fortunately).
The fix itself was verified to restore build compatibility with gcc 4.4
(and even 4.2).
As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR:
server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please
consider backporting this patch too if 64c9c8ef3 happens to be backported
in 2.9.
This commit similar to the following one :
65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116
MEDIUM: server: close idle conn before server deletion
This patch implements a similar logic, this time to close private idle
connections stored in sessions. The principle is identical to the above
commit : conn_release() is used on idle connections after a takeover to
ensure thread safety.
An extra change was required to be able to execute takeover on such
connections. Their original thread ID was unknown, contrary to non
private connections which are stored in sharded lists. As such, a new
tid member has been added under sess_priv_conns chaining element.
To be able to delete a server, a number of preconditions must be
validated to ensure it is not in used anymore. Previously, if idle
connections were stored in the server, the deletion was cancelled. No
action was implemented to force idle connection closure, the only
solution was to wait for the periodic purging to be achieved.
This is an extra burden to be able to delete a server. Indeed, idle
connections are by definition inactive and can be closed prior to delete
a server. This is the exact purpose of this patch.
Idle connections removal is implemented inside "delete server" handler,
once it has been determined that the server can be freely removed. A
simple loop is run to call conn_release() over each idle connections.
Takeover is also executed before conn_release() to ensure tasks/tasklets
or any other sensible elements are not deleted from a foreign thread.
This patch should reduce the occurence of rejected "delete server"
execution, especially when connection reuse is high.
A server can only be deleted if there is no elements which reference it.
This is taken care via srv_check_for_deletion(), most notably for active
and idle connections.
A special case occurs for connections directly managed by a session.
This is for so-called private connections, when using http-reuse never
or H2 + http-reuse safe for example. In this case. server does not
account these connections into its idle lists. This caused a bug as the
server is deleted despite the session still being able to access it.
To properly fix this, add a new referencing element into the server for
these session connections. A mt_list has been chosen for this. On
default http-reuse, private connections are typically not used so it
won't make any difference. If using H2 servers, or more generally when
dealing with private connections, insert/delete should typically occur
only once per session lifetime so impact on performance should be
minimal.
This should be backported up to 2.4. Note that srv_check_for_deletion()
was introduced in 3.0 dev tree. On backport, the extra condition in it
should be placed in cli_parse_delete_server() instead.
3.0-dev1 introduced a small regression with commit b4db3be86e ("BUG/MINOR:
server: fix server_find_by_name() usage during parsing"). By changing the
way servers are indexed and moving it into the server template loop, the
first one is no longer indexed because the loop starts at low+1 since it
focuses on duplication. Let's index the first one explicitly now.
This should not be backported, unless the commit above is backported.
Compilation on solaris fails because of usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.
This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.
Future plan: rename 'queue' in code base so define can be removed again.
Backporting: 2.9, 2.8
Contrary to static servers, dynamic servers does not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible to set a set of minimal values to have a correct behavior.
However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.
Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().
To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.
This patch could have unexpected effects on dynamic servers behavior as
it restored proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().
This should be backported up to 2.6, after a brief period of
observation.
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced naymore by a stream or a
connection. This is implemented in srv_check_for_deletion().
The various criteria specified were incomplete. This allows a server
instance to be deleted while still be referenced by a stream and a
connection.
This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.
==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
#0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
#1 0x63cb2568f465 in process_stream src/stream.c:1823
#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
#6 0x701539aa9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
#7 0x701539b26a3b (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().
Outside of this bug, one case which remains sensible is for SF_DIRECT
streams which referenced a server instance early in process_stream()
before connect_server(). This occurs with use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.
This must be backported up to 2.6.
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
Some cli_err(), cli_msg() or even ha_error() etc are missing the trailing
LF, which breaks the continuity of the CLI parsing: the extra LF that serves
to mark the end of the command is in fact taken as the missing LF and no
extra one is added.
This patch adds the missing LF on identified messages. It might be worth
trying to proceed in a more generic way with this, given the amount of
code that is possibly at risk.
Remove some QUIC definitions of members from server structure as the haproxy QUIC
stack does not support at all the server part (QUIC client) as this time.
Remove the statements in relation with their initializations.
This patch should be backported as far as 2.6 to save memory.
Since below commit, server_find_by_name() now search using
'used_server_id' proxy backend tree :
4bcfe30414
OPTIM: server: eb lookup for server_find_by_name()
This introduces a regression if server_find_by_name() is used via
check_config_validity() during post-parsing. Indeed, used_server_id tree
is populated at the same stage so it's possible to not found an existing
server. This can cause incorrect rejection of previously valid
configuration file.
To fix this, servers are now inserted in used_server_id tree during
parsing via parse_server(). This guarantees that server instances can be
retrieved during post parsing.
A known feature which uses server_find_by_name() during post parsing is
attach-srv tcp-rule used for reverse HTTP. Prior to the current fix, a
config was wrongly rejected if the rule was declared before the server
line.
This should not be backported unless the mentionned commit is.
snr_set_srv_down() (was formely known as snr_update_srv_status()), is
still too ambiguous because it's not clear whether we will be putting
the server under maintenance or not. This is mainly due to the fact that
the function behaves differently if has_no_ip is set or not.
By reviewing the function callers, it has now become clear that
snr_resolution_cb() is always calling the function with a valid resolution
so we only want to put the server under maintenance if we don't have a
valid IP address. On the other hand snr_resolution_error_cb() always
calls the function on error, with either no resolution (for SRV requests)
or with failing resolution (all cases except RSLV_STATUS_VALID), so in
this case we decide whether to put the server under maintenance case by
case (ie: expired? timeout?)
As a result, let's simplify snr_set_srv_down() so that it is only called
when the caller really thinks that the server should be put under
maintenance, which means always for snr_resolution_error_cb(), and only
if the resolution didn't yield usable ip for snr_resolution_cb().
RSLV_UPD_CNAME and RSLV_UPD_NAME_ERROR flags have now become useless since
3cf7f987 ("MINOR: dns: proper domain name validation when receiving DNS
response") as they are never set, but we forgot to remove them.
A leftover check was left by recent patch series about server
addr:svc_port propagation: a check on (msg) being set was performed
in srv_update_addr_port(), but msg is always set, so the check is not
needed and confuses coverity (See GH #2399)
As seen before, server's addr and svc_port should not be updated directly
during runtime, because even if the update is performed under the lock,
some competing threads might be reading ->addr and ->svc_port without
the lock because they simply cannot afford it.
To prevent races with such competing threads, server's addr and port
should only be updated using server_set_inetaddr() function or similar.
This patch depends on:
- "MINOR: server: ensure connection cleanup on server addr changes"
- "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event"
- "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic"
- "MEDIUM: server: make server_set_inetaddr() updater serializable"
- "MINOR: server/event_hdl: expose updater info through INETADDR event"
- "MINOR: server: add dns hint in server_inetaddr_updater struct"
- "MEDIUM: server/dns: clear RMAINT when addr resolves again"
While it could be backported in 2.9 with cd994407a ("BUG/MAJOR:
server/addr: fix a race during server addr:svc_port updates") to ensure
addr and svc_port reset performed by resolver's code comply with the
API taking care of pushing the update (and thus avoid any race), some
patch dependencies are quite sensitive so it's probably best to avoid
backporting for no good reason, or at least wait for it to be considered
stable to prevent any breakeages.
snr_update_srv_status() and srvrq_update_srv_status() will both set or
clear the server RMAINT state depending of the result of the current dns
resolution.
This used to work pretty well in the past, but now that addr:svc_port
changes are changed atomically through a dedicated task, the change is
performed asynchronously, so this can cause some flapping issues if the
server is put out of maintenance while the server's address is still
unassigned.
To prevent errors, the resolver's code is now only allowed to put the
server under maintenance but not to remove it from maintenance:
the decision to remove a server from maintenance is performed by the task
responsible for updating the server's addr: if the addr resolves again
thanks to a valid DNS resolution and the server was previously under
RMAINT, then it cleared from RMAINT state.
srvrq_update_srv_status() was renamed srvrq_set_srv_down(), since it is
only called to put the server in maintenance as a result of a failing
SRV entry.
snr_update_srv_status() was renamed srv_set_srv_down() and slightly
modified so that it only takes care of putting the server under
maintenance when needed.
The cli command "set server x/y addr" does not need to remove the RMAINT
flag anymore.
server_set_inetaddr() updater argument is a simple char * string
containing infos about the caller responsible for the update.
In this patch, we try to make this argument serializable, that is, make
it so that we can easily export it without having to keep the original
pointer passed by the caller or having to work with strings of variable
lengths.
This was a prerequisite for exposing more updater information through
SERVER_INETADDR event (upcoming patch).
Static strings were simply mapped to a fixed ID that can be converted back
to a string when needed using server_inetaddr_updater_by_to_str(). One
special case one made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater
since in this case the updater hint has to be generated from the
corresponding resolver id / nameserver id combination. This was achieved
by saving the nameserver id within the updater struct. Knowing that the
resolver id can be guessed from the server struct directly, it was not
exposed through the updater struct.
This patch depends on:
- "MINOR: resolvers: add unique numeric id to nameservers"
No functional change should be expected.
server_parse_addr_change_request() was completely replaced by the newer
srv_update_addr_port() function. Considering the function doesn't offer
useful features that srv_update_addr_port() couldn't do, we simply
remove the function.
Both functions are performing the similar tasks, except that the _port()
version is doing a bit more work.
In this patch, we add the server_set_inetaddr() function that works like
the srv_update_addr_port() but it takes parsed inputs instead of raw
strings as arguments.
Then, server_set_inetaddr() is used as underlying helper function for
both srv_update_addr() and srv_update_addr_port() to make them easier
to maintain.
Also, helper functions were added:
- server_set_inetaddr_warn() -> same as server_set_inetaddr() but report
a warning on updates.
- server_get_inetaddr() -> fills a struct server_inetaddr from srv
Since the feedback message generation part was slightly reworked, some
minor changes in the way addr:svc_port updates are reported in the logs
or cli messages should be expected (no loss of information though).
Previously, in srv_update_addr_port(), we forced connection cleanup on
server changes.
This was done in 6318d33ce ("BUG/MEDIUM: connections: force connections
cleanup on server changes").
However, there is no reason we shouldn't have done the same in
srv_update_addr() function, because the end goal is the same: perform
runtime changes on server's address.
The purge_conn hint propagated through the INETADDR server event was
simply there to keep the original behavior (only purge the connection
for events originating from srv_update_addr_port()), but to ensure the
address change is handled the same way for both code paths, we simply
ignore this hint.
server addr:svc_port updates during runtime might set or clear the
SRV_F_MAPPORTS flag. Unfortunately, the flag update is still directly
performed by srv_update_addr_port() function while the addr:svc_port
update is being scheduled for atomic update. Given that existing readers
don't take server's lock to read addr:svc_port, they also check the
SRV_F_MAPPORTS flag right after without the lock.
So we could cause the readers to incorrectly interpret the svc_port from
the server struct because the mapport information is not published
atomically, resulting in inconsistencies between svc_port / mapport flag.
(MAPPORTS flag causes svc_port to be used differently by the reader)
To fix this, we publish the mapport information within the INETADDR server
event and we let the task responsible for updating server's addr and port
position or clear the flag depending on the mapport hint.
This patch depends on:
- MINOR: server/event_hdl: add server_inetaddr struct to facilitate event data usage
- MINOR: server/event_hdl: update _srv_event_hdl_prepare_inetaddr prototype
This should be backported in 2.9 with 683b2ae01 ("MINOR: server/event_hdl:
add SERVER_INETADDR event")
Slightly change _srv_event_hdl_prepare_inetaddr() function prototype to
reduce the input arguments by learning some settings directly from the
server. Also taking this opportunity to make the function static inline
since it's relatively simple and not meant to be used directly.
4e5e2664 ("MINOR: proxy: add findserver_unique_id() and findserver_unique_name()")
added findserver_unique_id() and findserver_unique_name() functions that
were inspired from the historical findserver() function, so unfortunately
they don't perform well when used on large backend farms because they scan
the whole server list linearly.
I was about to provide a patch to optimize such functions when I stumbled
on Baptiste's work:
19a106d24 ("MINOR: server: server_find functions: id, name, best_match")
It turns out Baptiste already implemented helper functions to supersed
the unoptimized findserver() function (at least at runtime when servers
have been assigned their final IDs and inserted in the lookup trees): they
offer more matching options and rely on eb lookups so they are much more
suitable for fast queries. I don't know how I missed that, but they are a
perfect base for the server rid matching functions.
So in this patch, we essentially revert 4e5e2664 to provide the optimized
equivalent functions named server_find_by_id_unique() and
server_find_by_name_unique(), then we force existing findserver_unique_*()
callers to switch to the new functions.
This patch depends on:
- "OPTIM: server: eb lookup for server_find_by_name()"
This could be backported up to 2.8.
server_find_by_name() function was added in 19a106d24 ("MINOR: server:
server_find functions: id, name, best_match").
At that time, only the used_server_id proxy tree was available, thus the
name lookup was performed as a linear search.
However, used_server_name proxy tree was added in 84d6046a ("MINOR: proxy:
Add a "server by name" tree to proxy."), so we may safely rely on it to
perform server name lookups now. This will hopefully make the function
quite faster, especially when performing lookups in huge backend farms.