75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing the shared
stats directly in counters struct") took care of renaming
counters_fe_shared_init() but we forgot counters_be_shared_init().
Let's fix that for consistency
Between 3.2 and 3.3-dev we noticed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing accross multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.
More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.
We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.
As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.
counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.
Now we only copy the default server's settings if such a default server
exists, otherwise we only initialize it. At the moment it always exists.
The change is mostly performed in srv_settings_cpy() since that's where
each caller passes through, and there's no point duplicating that test
everywhere.
The server struct has gone huge over time (~3.8kB), and having a copy
of it in the defsrv section of the struct proxy costs a lot of RAM,
that is not needed anymore at run time.
This patch replaces this struct with a dynamically allocated one. The
field is allocated and initialized during alloc_new_proxy() and is
freed when the proxy is destroyed for now. But the goal will be to
support freeing it after parsing the section.
The test in srv_ssl_settings_cpy() comparing src to the server's proxy's
default server does work but it's a subtle trap. Indeed, no check is made
on srv->proxy to be valid, and this only works because the compiler is
comparing pointer offsets. During the boot, it's common to have NULL here
in srv->proxy and of course in this case srv does not match that value
which is NULL plus epsilon. But when trying to turn defsrv to a dynamic
pointer instead, then the compiler is forced to dereference this NULL
srv->proxy and dies during init.
Let's always add the null check for srv->proxy before the test to avoid
this situation.
No backport is needed since the problem cannot happen yet.
At a few places we're seeing some open-coding of the same function, likely
because it looks overkill for what it's supposed to do, due to extraneous
tests that are not needed (e.g. check of the backend's PR_CAP_BE etc).
Let's just remove all these superfluous tests and inline it so that it
feels more suitable for use everywhere it's needed.
This function doesn't just look at the name but also the ID when the
argument starts with a '#'. So the name is not correct and explains
why this function is not always used when the name only is needed,
and why the list-based findserver() is used instead. So let's just
call the function "server_find()", and rename its generation-id based
cousin "server_find_unique()".
Let's just use the tree-based lookup instead of walking through the list.
This function is used to find duplicates in "track" statements and a few
such places, so it's important not to waste too much time on large setups.
The server's "hostname_dn" is in Domain Name format, not a pure string, as
converted by resolv_str_to_dn_label(). It is made of lower-case string
components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03)org.
As such it must not be lowercased again in srv_state_srv_update(), because
1) it's useless on the name components since already done, and 2) because
it would replace component lengths 97 and above by 32-char shorter ones.
Granted, not many domain names have that large components so the risk is
very low but the operation is always wrong anyway. This was brought in
2.5 by commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers
and srv request or used SRV record").
In the same vein, let's fix the confusing strcasecmp() that are applied
to this binary format, and use memcmp() instead. Here there's basically
no risk to incorrectly match the wrong record, but that test alone is
confusing enough to provoke the existence of the bug above.
Finally let's update the component for that field to mention that it's
in this format and already lower cased.
Better not backport this, the risk of facing this bug is almost zero, and
every time we touch such files something breaks for bad reasons.
To properly support QUIC for dynamic servers, it is required to extend
add server CLI handler :
* ensure conformity between server address and proto
* automatically set proto to QUIC if not specified
* prepare_srv callback must be called to initialize required SSL context
Prior to this patch, crashes may occur when trying to use QUIC with
dynamic servers.
Also, destroy_srv callback must be called when a dynamic server is
deallocated. This ensures that there is no memory leak due to SSL
context.
No need to backport.
Make the server line parsing fail when a QUIC backend is configured if haproxy
is built to use the OpenSSL stack compatibility module. This latter does not
support the QUIC client part.
This patch adds the support for the RFC2385 (Protection of BGP Sessions via
the + TCP MD5 Signature Option) for the listeners and the servers. The
feature is only available on Linux. Keywords are not exposed otherwise.
By setting "tcp-md5sig <password>" option on a bind line, TCP segments of
all connections instantiated from the listening socket will be signed with a
16-byte MD5 digest. The same option can be set on a server line to protect
outgoing connections to the corresponding server.
The primary use case for this option is to allow BGP to protect itself
against the introduction of spoofed TCP segments into the connection
stream. But it can be useful for any very long-lived TCP connections.
A reg-test was added and it will be executed only on linux. All other
targets are excluded.
Since proxy and server struct already have an internal last_change
variable and we cannot merge it with the shared counter one, let's
rename the last_change counter to be more specific and prevent the
mixup between the two.
last_change counter is renamed to last_state_change, and unlike the
internal last_change, this one is a shared counter so it is expected
to be updated by other processes in our back.
However, when updating last_state_change counter, we use the value
of the server/proxy last_change as reference value.
Same motivation as previous commit, proxy last_change is "abused" because
it is used for 2 different purposes, one for stats, and the other one
for process-local internal use.
Let's add a separate proxy-only last_change variable for internal use,
and leave the last_change shared (and thread-grouped) counter for
statistics.
last_change server metric is used for 2 separate purposes. First it is
used to report last server state change date for stats and other related
metrics. But it is also used internally, including in sensitive paths,
such as lb related stuff to take decision or perform computations
(ie: in srv_dynamic_maxconn()).
Due to last_change counter now being split over thread groups since 16eb0fa
("MAJOR: counters: dispatch counters over thread groups"), reading the
aggregated value has a cost, and we cannot afford to consult last_change
value from srv_dynamic_maxconn() anymore. Moreover, since the value is
used to take decision for the current process we don't wan't the variable
to be updated by another process in our back.
To prevent performance regression and sharing issues, let's instead add a
separate srv->last_change value, which is not updated atomically (given how
rare the updates are), and only serves for places where the use of the
aggregated last_change counter/stats (split over thread groups) is too
costly.
16eb0fa ("MAJOR: counters: dispatch counters over thread groups")
introduced some bugs: as a result of improper copy paste during
COUNTERS_SHARED_LAST() macro introduction, some functions such as
srv_downtime() which used to make use of the server last_change variable
now use the proxy one, which doesn't make sense and will likely cause
unexpected logical errors/bugs.
Let's fix them all at once by properly pointing to the server last_change
variable when relevant.
No backport needed.
_srv_check_proxy_mode() is currently executed during server init (from
_srv_parse_init()), while it used to be fine for current checks, it
seems it occurs a bit too early to be usable for some checks that depend
on server keywords to be evaluated for instance.
As such, to make _srv_check_proxy_mode() more relevant and be extended
with additional checks in the future, let's call it later during server
finalization, once all server keywords were evaluated.
No change of behavior is expected
This "renegotiate" option can be set on SSL backends to allow secure
renegotiation. It is mostly useful with SSL libraries that disable
secure regotiation by default (such as AWS-LC).
The "no-renegotiate" one can be used the other way around, to disable
secure renegotation that could be allowed by default.
Those two options can be set via "ssl-default-server-options" as well.
As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2]
native mailers were deprecated and planned for removal in 3.3. Now is
the time to drop the legacy code for native mailers which is based on a
tcpcheck "hack" and cannot be maintained. Lua mailers should be used as
a drop in replacement. Indeed, "mailers" and associated config directives
are preserved because mailers config is exposed to Lua, which helps smoothing
the transition from native mailers to Lua based ones.
As a reminder, to keep mailers configuration working as before without
making changes to the config file, simply add the line below to the global
section:
lua-load examples/lua/mailers.lua
mailers.lua script (provided in the git repository, adjust path as needed)
may be customized by users familiar with Lua, by default it emulates the
behavior of the native (now removed) mailers.
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
Report an error during server configuration if QUIC is used by SSL is
not activiated via 'ssl' keyword. This is done in _srv_parse_finalize(),
which is both used by static and dynamic servers.
Note that contrary to listeners, an error is reported instead of a
warning, and SSL is not automatically activated if missing. This is
mainly due to the complex server configuration : _srv_parse_finalize()
is ideal to affect every servers, including dynamic entries. However, it
is executed after server SSL context allocation performed via
<prepare_srv> XPRT operation. A proper fix would be to move SSL ctx
alloc in _srv_parse_finalize(), but this may have unknown impact. Thus,
for now a simpler solution has been chosen.
Add ->quic_params new member to server struct.
Also set the ->xprt member of the server being initialized and initialize asap its
transport parameters from _srv_parse_init().
Mark QUIC address support for servers as experimental on the backend
side. Previously, it was allowed but wouldn't function as expected. As
QUIC backend support requires several changes, it is better to declare
it as experimental first.
QUIC is not implemented on the backend side. To prevent any issue, it is
better to reject any server configured which uses it. This is done via
_srv_parse_init() which is used both for static and dynamic servers.
This should be backported up to all stable versions.
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside "shared" struct sub member under
be_counters and fe_counters.
Now they are properly identified, they would greatly benefit from being
shared over thread groups to reduce the cost of atomic operations when
updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the stat
to be allocated lobal.nbtgroups times. When updating a counter without
concurrency, the first counter from the array may be updated.
To consult the shared counters (which requires aggregation of per-tgid
individual counters), some helper functions were added to counter.h to
ease code maintenance and avoid computing errors.
proxies, listeners and server shared counters are now managed via helpers
added in one of the previous commits.
When guid is not set (ie: when not yet assigned), shared counters pointer
is allocated using calloc() (local memory) and a flag is set on the shared
counters struct to know how to manipulate (and free it). Else if guid is
set, then it means that the counters may be shared so while for now we
don't actually use a shared memory location the API is ready for that.
The way it works, for proxies and servers (for which guid is not known
during creation), we first call counters_{fe,be}_shared_get with guid not
set, which results in local pointer being retrieved (as if we just
manually called calloc() to retrieve a pointer). Later (during postparsing)
if guid is set we try to upgrade the pointer from local to shared.
Lastly, since the memory location for some objects (proxies and servers
counters) may change from creation to postparsing, let's update
counters->last_change member directly under counters_{fe,be}_shared_get()
so we don't miss it.
No change of behavior is expected, this is only preparation work.
Shareable counters are not tagged as shared counters and are dynamically
allocated in separate memory area as a prerequisite for being stored
in shared memory area. For now, GUID and threads groups are not taken into
account, this is only a first step.
also we ensure all counters are now manipulated using atomic operations,
namely, "last_change" counter is now read from and written to using atomic
ops.
Despite the numerous changes caused by the counters being moved away from
counters struct, no change of behavior should be expected.
rename _srv_postparse() internal function to srv_init() function and group
srv_init_per_thr() plus idle conns list init inside it. This way we can
perform some simplifications as srv_init() performs multiple server
init steps after parsing.
SRV_F_CHECKED flag was added, it is automatically set when srv_init()
runs successfully. If the flag is already set and srv_init() is called
again, nothing is done. This permis to manually call srv_init() earlier
than the default POST_CHECK hook when needed without risking to do things
twice.
while new_server() takes the parent proxy as argument and even assigns
srv->proxy to the parent proxy, it didn't actually inserted the server
to the parent proxy server list on success.
The result is that sometimes we add the server to the list after
new_server() is called, and sometimes we don't.
This is really error-prone and because of that hooks such as
REGISTER_POST_SERVER_CHECK() which as run for all servers listed in
all proxies may not be relied upon for servers which are not actually
inserted in their parent proxy server list. Plus it feels very strange
to have a server that points to a proxy, but then the proxy doesn't know
about it because it cannot find it in its server list.
To prevent errors and make proxy->srv list reliable, we move the insertion
logic directly under new_server(). This requires to know if we are called
during parsing or during runtime to either insert or append the server to
the parent proxy list. For that we use PR_FL_CHECKED flag from the parent
proxy (if the flag is set, then the proxy was checked so we are past the
init phase, thus we assume we are called during runtime)
This implies that during startup if new_server() has to be cancelled on
error paths we need to call srv_detach() (which is now exposed in server.h)
before srv_drop().
The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should
not run reliably on all servers created using new_server() (without having
to manually loop on global servers_list)
init_srv_requeue() and init_srv_slowstart() functions are called after
initial server parsing via REGISTER_POST_SERVER_CHECK() hook, and they
are also manually called for dynamic server after the server is
initialized.
This may conflict with _srv_postparse() which is also registered via
REGISTER_POST_SERVER_CHECK() and called during dynamic server creation
To ensure functions don't conflict with each other, let's ensure they
are executed in proper order by calling init_srv_requeue and
init_srv_slowstart() from _srv_postparse() which now becomes the parent
function for server related postparsing stuff. No change of behavior is
expected.
Implement stress mode on "add server help". This ensures that the
command is fully reentrant on full output buffer.
For testing, it requires compilation with USE_STRESS and global setting
"stress-level 1".
Implement "help" as a sub-command for "add server" CLI. The objective is
to list all the keywords that are supported for dynamic servers. CLI IO
handler and add_srv_ctx are used to support reentrancy on full output
buffer.
Now that this command is implemented, the outdated keyword list on "add
server" from management documentation can be removed.
Extend "add server" to support an IO handler function named
cli_io_handler_add_server(). A context object is also defined whose
usage will depend on IO handler capabilities.
IO handler is skipped when "add server" is run in default mode, i.e. on
a dynamic server creation. Thus, currently IO handler is unneeded.
However, it will become useful to support sub-commands for "add server".
Note that return value of "add server" parser has been changed on server
creation success. Previously, it was used incorrectly to report if
server was inserted or not. In fact, parser return value is used by CLI
generic code to detect if command processing has been completed, or
should continue to the IO handler. Now, "add server" always returns 1 to
signal that CLI processing is completed. This is necessary to preserve
CLI output emitted by parser, even now that IO handler is defined for
the command. Previously, output was emitted in every situations due to
IO handler not defined. See below code snippet from cli.c for a better
overview :
if (kw->parse && kw->parse(args, payload, appctx, kw->private) != 0) {
ret = 1;
goto fail;
}
/* kw->parse could set its own io_handler or io_release handler */
if (!appctx->cli_ctx.io_handler) {
ret = 1;
goto fail;
}
appctx->st0 = CLI_ST_CALLBACK;
ret = 1;
goto end;
Last commit 7361515 ("BUG/MINOR: server: dont depend on proxy for server
cleanup in srv_drop()") introduced a regression because the lbprm
server_deinit is not evaluated anymore with dynamic servers, possibly
resulting in a memory leak.
To fix the issue, in addition to free_proxy(), the server deinit check
should be manually performed in cli_parse_delete_server() as well.
No backport needed.
In commit b5ee8bebfc ("MINOR: server: always call ssl->destroy_srv when
available"), we made it so srv_drop() doesn't depend on proxy to perform
server cleanup.
It turns out this is now mandatory, because during deinit, free_proxy()
can occur before the final srv_drop(). This is the case when using Lua
scripts for instance.
In 2a9436f96 ("MINOR: lbprm: Add method to deinit server and proxy") we
added a freeing check under srv_drop() that depends on the proxy.
Because of that UAF may occur during deinit when using a Lua script that
manipulate server objects.
To fix the issue, let's perform the lbprm server deinit logic under
free_proxy() directly, where the DEINIT server hooks are evaluated.
Also, to prevent similar bugs in the future, let's explicitly document
in srv_drop() that server cleanups should assume that the proxy may
already be freed.
No backport needed unless 2a9436f96 is.
commit 29b76cae4 ("BUG/MEDIUM: server/log: "mode log" after server
keyword causes crash") introduced some postparsing checks/tasks for
server
Initially they were mainly meant for "mode log" servers postparsing, but
we already have a check dedicated to "tcp/http" servers (ie: only tcp
proto supported)
However when dynamic servers are added they bypass _srv_postparse() since
the REGISTER_POST_SERVER_CHECK() is only executed for servers defined in
the configuration.
To ensure consistency between dynamic and static servers, and ensure no
post-check init routine is missed, let's manually invoke _srv_postparse()
after creating a dynamic server added via the cli.
Add a pointer to the server into the struct srv_per_tgroup, so that if
we only have access to that srv_per_tgroup, we can come back to the
corresponding server.
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.
The second part of this commit implements idle-ping support on H2 MUX. A
new inlined function conn_idle_ping() is defined to access connection
idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and
H2_CF_IDL_PING_SENT. The first one is set for idle connections via
h2c_update_timeout().
On h2_timeout_task() handler, if first flag is set, instead of releasing
the connection as before, the second flag is set and tasklet is
scheduled. As both flags are now set, h2_process_mux() will proceed to
PING emission. The timer has also been rearmed to the idle-ping value.
If a PING ACK is received before next timeout, connection timer is
refreshed. Else, the connection is released, as with timer expiration.
Also of importance, special care is needed when a backend connection is
going to idle. In this case, idle-ping timer must be rearmed. Thus a new
invokation of h2c_update_timeout() is performed on h2_detach().
This commit is a direct follow-up of the previous one. It defines a new
server keyword check-pool-conn-name. It is used as the default value for
the name parameter of idle connection hash generation.
Its behavior is similar to server keyword pool-conn-name, but reserved
for checks reuse. If check-pool-conn-name is set, it is used in priority
to match a connection for reuse. If unset, a fallback is performed on
check-sni.
Without check-reuse-pool, it is impossible to perform check on server
using @rhttp protocol. This is due to the inherent nature of the
protocol which does not implement an active connect method.
Thus, ensure that check-reuse-pool is always set when a reverse HTTP
server is declared. This reduces server configuration and should prevent
any omission. Note that it is still require to add "check" server
keyword so activate server checks.
Duplicate server check.reuse_pool boolean value in srv_settings_cpy().
This is necessary to ensure that check-reuse-pool value can be set via
default-server or server-template.
This does not need to be backported.
Add two new methods to lbprm, server_deinit() and proxy_deinit(),
in case something should be done at the lbprm level when
removing servers and proxies.
This function was marked as requiring thread isolation because its code
was extracted from cli_parse_delete_server() and was running under
isolation. But upon closer inspection, and using atomic loads to check
a few counters, it is actually safe to run without isolation, so let's
reflect that in its description.
However, it remains true that cli_parse_delete_server() continues to call
it under isolation.
Now that thanks to commit c880c32b16 ("MINOR: stream: decrement
srv->served after detaching from the list") we can trust srv->served,
let's use it and no longer loop on threads when checking if a server
still has streams attached to it. This will be much cheaper and will
result in keeping isolation for a shorter time in the "wait" command.