This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
log-proto <logproto>
The "log-proto" specifies the protocol used to forward event messages to
a server configured in a ring section. Possible values are "legacy"
and "octet-count" corresponding respectively to "Non-transparent-framing"
and "Octet counting" in rfc6587. "legacy" is the default.
Notes: a separated io_handler was created to avoid per messages test
and to prepare code to set different log protocols such as
request- response based ones.
Helper functions are used to dump bind, server or filter keywords. These
functions are used to report errors during the configuration parsing. To have a
coherent API, these functions are now prepared to handle a null pointer as
argument. If so, no action is performed and functions immediately return.
This patch should fix the issue #631. It is not a bug. There is no reason to
backport it.
srv_cleanup_connections() is supposed to be static, so mark it as so.
This patch should be backported where commit 6318d33ce6
("BUG/MEDIUM: connections: force connections cleanup on server changes")
will be backported, that is to say v1.9 to v2.1.
Fixes: 6318d33ce6 ("BUG/MEDIUM: connections: force connections cleanup
on server changes")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed our probe is regularly testing to add and remove
servers on a given backend such as:
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -
-> curl on the corresponding frontend: reply for server:31257
(notice the difference of weight)
# echo "set server be_foo/srv1 state maint" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' by 'stats socket command'
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -
-> curl on the corresponding frontend: reply from server:31257 (!)
Here we indeed would expect to get an anver from server:31256. The issue
is highly linked to the usage of `pool-purge-delay`, with a value which
is higher than the duration of the test, 10s in our case.
a git bisect between dev5 and dev6 seems to show commit
079cb9af22 ("MEDIUM: connections: Revamp the way idle connections are killed")
being the origin of this new behaviour.
So if I understand the later correctly, it seems that it was more a
matter of chance that we did not saw the issue earlier.
My patch proposes to force clean idle connections in the two following
cases:
- we set a (still running) server to maintenance
- we change the ip/port of a server
This commit should be backported to 2.1, 2.0, and 1.9.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
The variable 'ret' must only be declared When HAProxy is compiled with the SSL
support (more precisely SSL_CTRL_SET_TLSEXT_HOSTNAME must be defined).
No backport needed.
A shared tcp-check ruleset is now created to support agent checks. The following
sequence is used :
tcp-check send "%[var(check.agent_string)] log-format
tcp-check expect custom
The custom function to evaluate the expect rule does the same that it was done
to handle agent response when a custom check was used.
Parsing of following keywords have been moved in checks.c file : addr, check,
check-send-proxy, check-via-socks4, no-check, no-check-send-proxy, rise, fall,
inter, fastinter, downinter and port.
A global list to tcp-check ruleset can now be used to share common rulesets with
all backends without any duplication. It is mandatory to convert all specific
protocol checks (redis, pgsql...) to tcp-check healthchecks.
To do so, a flag is now attached to each tcp-check ruleset to know if it is a
shared ruleset or not. tcp-check rules defined in a backend are still directly
attached to the proxy and not shared. In addition a second flag is used to know
if the ruleset is inherited from the defaults section.
To allow reusing these blocks without consuming more memory, their list
should be static and share-able accross uses. The head of the list will
be shared as well.
It is thus necessary to extract the head of the rule list from the proxy
itself. Transform it into a pointer instead, that can be easily set to
an external dynamically allocated head.
The options and directives related to the configuration of checks in a backend
may be defined after the servers declarations. So, initialization of the check
of each server must not be performed during configuration parsing, because some
info may be missing. Instead, it must be done during the configuration validity
check.
Thus, callback functions are registered to be called for each server after the
config validity check, one for the server check and another one for the server
agent-check. In addition deinit callback functions are also registered to
release these checks.
This patch should be backported as far as 1.7. But per-server post_check
callback functions are only supported since the 2.1. And the initcall mechanism
does not exist before the 1.9. Finally, in 1.7, the code is totally
different. So the backport will be harder on older versions.
Error codes ERR_WARN and ERR_ALERT are used to signal that the error
given is of the corresponding level. All errors are displayed as ALERT
in the display_parser_err() function.
Differentiate the display level based on the error code. If both
ERR_WARN and ERR_ALERT are used, ERR_ALERT is given priority.
Documentation states that default settings for ssl server options can be set
using either ssl-default-server-options or default-server directives. In practice,
not all ssl server options can have default values, such as ssl-min-ver, ssl-max-ver,
etc..
This patch adds the missing ssl options in srv_ssl_settings_cpy() and srv_parse_ssl(),
making it possible to write configurations like the following examples, and have them
behave as expected.
global
ssl-default-server-options ssl-max-ver TLSv1.2
defaults
mode http
listen l1
bind 1.2.3.4:80
default-server ssl verify none
server s1 1.2.3.5:443
listen l2
bind 2.2.3.4:80
default-server ssl verify none ssl-max-ver TLSv1.3 ssl-min-ver TLSv1.2
server s1 1.2.3.6:443
This should be backported as far as 1.8.
This fixes issue #595.
Before supporting "server" line in "peers" section, such sections without
any local peer were removed from the configuration to get it validated.
This patch fixes the issue where a "server" line without address and port which
is a remote peer without address and port makes the configuration parsing fail.
When encoutering such cases we now ignore such lines remove them from the
configuration.
Thank you to Jérôme Magnin for having reported this bug.
Must be backported to 2.1 and 2.0.
The original algorithm always killed half the idle connections. This doesn't
take into account the way the load can change. Instead, we now kill half
of the exceeding connections (exceeding connection being the number of
used + idle connections past the last maximum used connections reached).
That way if we reach a peak, we will kill much less, and it'll slowly go back
down when there's less usage.
Revamp the server connection lists. We know have 3 lists :
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
This patch adds the `unique-id` option to `proxy-v2-options`. If this
option is set a unique ID will be generated based on the `unique-id-format`
while sending the proxy protocol v2 header and stored as the unique id for
the first stream of the connection.
This feature is meant to be used in `tcp` mode. It works on HTTP mode, but
might result in inconsistent unique IDs for the first request on a keep-alive
connection, because the unique ID for the first stream is generated earlier
than the others.
Now that we can send unique IDs in `tcp` mode the `%ID` log variable is made
available in TCP mode.
The isalnum(), isalpha(), isdigit() etc functions from ctype.h are
supposed to take an int in argument which must either reflect an
unsigned char or EOF. In practice on some platforms they're implemented
as macros referencing an array, and when passed a char, they either cause
a warning "array subscript has type 'char'" when lucky, or cause random
segfaults when unlucky. It's quite unconvenient by the way since none of
them may return true for negative values. The recent introduction of
cygwin to the list of regularly tested build platforms revealed a lot
of breakage there due to the same issues again.
So this patch addresses the problem all over the code at once. It adds
unsigned char casts to every valid use case, and also drops the unneeded
double cast to int that was sometimes added on top of it.
It may be backported by dropping irrelevant changes if that helps better
support uncommon platforms. It's unlikely to fix bugs on platforms which
would already not emit any warning though.
When an end pointer is passed, instead of complaining that a comma is
missing after a keyword, sample_parse_expr() will silently return the
pointer to the current location into this return pointer so that the
caller can continue its parsing. This will be used by more complex
expressions which embed sample expressions, and may even permit to
embed sample expressions into arguments of other expressions.
Most DNS servers provide A/AAAA records in the Additional section of a
response, which correspond to the SRV records from the Answer section:
;; QUESTION SECTION:
;_http._tcp.be1.domain.tld. IN SRV
;; ANSWER SECTION:
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A1.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A8.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A5.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A6.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A4.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A3.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A2.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A7.domain.tld.
;; ADDITIONAL SECTION:
A1.domain.tld. 3600 IN A 192.168.0.1
A8.domain.tld. 3600 IN A 192.168.0.8
A5.domain.tld. 3600 IN A 192.168.0.5
A6.domain.tld. 3600 IN A 192.168.0.6
A4.domain.tld. 3600 IN A 192.168.0.4
A3.domain.tld. 3600 IN A 192.168.0.3
A2.domain.tld. 3600 IN A 192.168.0.2
A7.domain.tld. 3600 IN A 192.168.0.7
SRV record support was introduced in HAProxy 1.8 and the first design
did not take into account the records from the Additional section.
Instead, a new resolution is associated to each server with its relevant
FQDN.
This behavior generates a lot of DNS requests (1 SRV + 1 per server
associated).
This patch aims at fixing this by:
- when a DNS response is validated, we associate A/AAAA records to
relevant SRV ones
- set a flag on associated servers to prevent them from running a DNS
resolution for said FADN
- update server IP address with information found in the Additional
section
If no relevant record can be found in the Additional section, then
HAProxy will failback to running a dedicated resolution for this server,
as it used to do.
This behavior is the one described in RFC 2782.
Since commit 980855bd95 ("BUG/MEDIUM: server: initialize the orphaned
conns lists and tasks at the end"), we no longer use err section.
This should fix github issue #438
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Issue #417 reports a possible memory leak in the state-file loading code.
There's one such place in the loop which corresponds to parsing errors
where the curreently allocated line is not freed when dropped. In any
case this is very minor in that no more than the file's length may be
lost in the worst case, considering that the whole file is kept anyway
in case of success. This fix addresses this.
It should be backported to 2.1.
The global state file tree isn't configured for unique keys, so if an
entry appears multiple times, e.g. due to a bogus script that concatenates
entries multiple times, this will needlessly eat memory. Let's just drop
duplicates.
This should be backported to 2.1.
Starting haproxy with a state file of 700k servers eats 11.2 GB of RAM
due to a mistake in the function that loads the strings into a tree: it
allocates a full buffer for each backend+server name instead of allocating
just the required string. By just fixing this we're down to 80 MB.
This should be backported to 2.1.
As reported in issue #408, "agent-addr" doesn't work on default-server
lines. This is due to the transcription of the old "addr" option in commit
6e5e0d8f9e ("MINOR: server: Make 'default-server' support 'addr' keyword.")
which correctly assigns it to the check.addr and agent.addr fields, but
which also copies the default check.addr into both the check's and the
agent's addr fields. Thus the default agent's address is never used.
This fix makes sure to copy the check from the check and the agent from
the agent. However it's worth noting that if "addr" is specified on the
server line, it will still overwrite both the check and the agent's
addresses.
This must be backported as far as 1.8.
It was noted in #48 that there are times when a configuration
may use the server-template directive with SRV records and
simultaneously want to control weights using an agent-check or
through the runtime api. This patch adds a new option
"ignore-weight" to the "resolve-opts" directive.
When specified, any weight indicated within an SRV record will
be ignored. This is for both initial resolution and ongoing
resolution.
fopen() can return NULL when state file is missing. This patch adds a
check of fopen() return value so we can skip processing in such case.
No backport needed.
Instead of using the same type for regular linked lists and "autolocked"
linked lists, use a separate type, "struct mt_list", for the autolocked one,
and introduce a set of macros, similar to the LIST_* macros, with the
MT_ prefix.
When we use the same entry for both regular list and autolocked list, as
is done for the "list" field in struct connection, we know have to explicitely
cast it to struct mt_list when using MT_ macros.
There were 221 places where a status message or an error message were built
to be returned on the CLI. All of them were replaced to use cli_err(),
cli_msg(), cli_dynerr() or cli_dynmsg() depending on what was expected.
This removed a lot of duplicated code because most of the times, 4 lines
are replaced by a single, safer one.
A problem involving server slowstart was reported by @max2k1 in issue #197.
The problem is that pendconn_grab_from_px() takes the proxy lock while
already under the server's lock while process_srv_queue() first takes the
proxy's lock then the server's lock.
While the latter seems more natural, it is fundamentally incompatible with
mayn other operations performed on servers, namely state change propagation,
where the proxy is only known after the server and cannot be locked around
the servers. Howwever reversing the lock in process_srv_queue() is trivial
and only the few functions related to dynamic cookies need to be adjusted
for this so that the proxy's lock is taken for each server operation. This
is possible because the proxy's server list is built once at boot time and
remains stable. So this is what this patch does.
The comments in the proxy and server structs were updated to mention this
rule that the server's lock may not be taken under the proxy's lock but
may enclose it.
Another approach could consist in using a second lock for the proxy's queue
which would be different from the regular proxy's lock, but given that the
operations above are rare and operate on small servers list, there is no
reason for overdesigning a solution.
This fix was successfully tested with 10000 servers in a backend where
adjusting the dyncookies in loops over the CLI didn't have a measurable
impact on the traffic.
The only workaround without the fix is to disable any occurrence of
"slowstart" on server lines, or to disable threads using "nbthread 1".
This must be backported as far as 1.8.
When we're purging idle connections, there's a race condition, when we're
removing the connection from the idle list, to add it to the list of
connections to free, if the thread owning the connection tries to free it
at the same time.
To fix this, simply add a per-thread lock, that has to be hold before
removing the connection from the idle list, and when, in conn_free(), we're
about to remove the connection from every list. That way, we know for sure
the connection will stay valid while we remove it from the idle list, to add
it to the list of connections to free.
This should happen rarely enough that it shouldn't have any impact on
performances.
This has not been reported yet, but could provoke random segfaults.
This should be backported to 2.0.
Server states can be recovered from either a "global" file (all backends)
or a "local" file (per backend).
The way the algorithm to parse the state file was first implemented was good
enough for a low number of backends and servers per backend.
Basically, for each backend the state file (global or local) is opened,
parsed entirely and for each line we check if it contains data related to
a server from the backend we're currently processing.
We must read the file entirely, just in case some lines for the current
backend are stored at the end of the file.
This does not scale at all!
This patch changes the behavior above for the "global" file only. Now,
the global file is read and parsed once and all lines it contains are
stored in a tree, for faster discovery.
This result in way much less fopen, fgets, and strcmp calls, which make
loading of very big state files very quick now.
Since h7da71293e431b5ebb3d6289a55b0102331788ee6as has been added, the
server name (srv->id in the code) is now unique per backend, which
means it can reliabely be used to identify a server recovered from the
server-state file.
This patch cleans up the parsing of server-state file and ensure we use
only the server name as a reliable key.
As reported in GH issue #109 and in discourse issue
https://discourse.haproxy.org/t/haproxy-returns-408-or-504-error-when-timeout-client-value-is-every-25d
the time parser doesn't error on overflows nor underflows. This is a
recurring problem which additionally has the bad taste of taking a long
time before hitting the user.
This patch makes parse_time_err() return special error codes for overflows
and underflows, and adds the control in the call places to report suitable
errors depending on the requested unit. In practice, underflows are almost
never returned as the parsing function takes care of rounding values up,
so this might possibly happen on 64-bit overflows returning exactly zero
after rounding though. It is not really possible to cut the patch into
pieces as it changes the function's API, hence all callers.
Tests were run on about every relevant part (cookie maxlife/maxidle,
server inter, stats timeout, timeout*, cli's set timeout command,
tcp-request/response inspect-delay).
Commit fb55365f9 ("MINOR: server: increase the default pool-purge-delay
to 5 seconds") did this but the setting placed in new_server() was
overwritten by srv_settings_cpy() from the default-server values preset
in init_default_instance(). Now let's put it at the right place.
The default used to be a very aggressive delay of 1 second before starting
to purge idle connections, but tests show that with bursty traffic it's a
bit short. Let's increase this to 5 seconds.
Have "socks4" and "check-via-socks4" server keyword added.
Implement handshake with SOCKS4 proxy server for tcp stream connection.
See issue #82.
I have the "SOCKS: A protocol for TCP proxy across firewalls" doc found
at "https://www.openssh.com/txt/socks4.protocol". Please reference to it.
[wt: for now connecting to the SOCKS4 proxy over unix sockets is not
supported, and mixing IPv4/IPv6 is discouraged; indeed, the control
layer is unique for a connection and will be used both for connecting
and for target address manipulation. As such it may for example report
incorrect destination addresses in logs if the proxy is reached over
IPv6]
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.
The following renames were done :
ENABLE_POLL -> USE_POLL
ENABLE_EPOLL -> USE_EPOLL
ENABLE_KQUEUE -> USE_KQUEUE
ENABLE_EVPORTS -> USE_EVPORTS
TPROXY -> USE_TPROXY
NETFILTER -> USE_NETFILTER
NEED_CRYPT_H -> USE_CRYPT_H
CONFIG_HAP_CRYPT -> USE_LIBCRYPT
CONFIG_HAP_NS -> DUSE_NS
CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
They were all check to comply with the advertised openssl version. Now
that libressl doesn't pretend to be a more recent openssl anymore, we
can simply rely on the regular openssl version tests without having to
deal with exceptions for libressl.
Most tests on OPENSSL_VERSION_NUMBER have become complex and break all
the time because this number is fake for some derivatives like LibreSSL.
This patch creates a new macro, HA_OPENSSL_VERSION_NUMBER, which will
carry the real openssl version defining the compatibility level, and
this version will be adjusted depending on the variants.
This implements support for the new API which relies on a call to
setsockopt().
On systems that support it (currently, only Linux >= 4.11), this enables
using TCP fast open when connecting to server.
Please note that you should use the retry-on "conn-failure", "empty-response"
and "response-timeout" keywords, or the request won't be able to be retried
on failure.
Co-authored-by: Olivier Houchard <ohouchard@haproxy.com>
When copying the settings for all servers when using server templates,
fix a typo, or we would never copy the length of the ALPN to be used for
checks.
This should be backported to 1.9.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case, where all client don't do keepalive, but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourself having lots of idling, unusable for new sessions, connections,
while we won't have any file descriptors available to create new connections.
To fix this, add 2 new global settings, "pool_low_ratio" and "pool_high_ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
Older compilers don't like to see "inline" placed after the type in a
function declaration, it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.
In addition, h1_release() and h2_release() have also been updated in the same
way.
Since LIST_DEL_LOCKED() and LIST_POP_LOCKED() now automatically reinitialize
the removed element, there's no need for keeping this LIST_INIT() call in the
idle connection code.
Instead of having one task per thread and per server that does clean the
idling connections, have only one global task for every servers.
That tasks parses all the servers that currently have idling connections,
and remove half of them, to put them in a per-thread list of connections
to kill. For each thread that does have connections to kill, wake a task
to do so, so that the cleaning will be done in the context of said thread.
Add a per-thread counter of idling connections, and use it to determine
how many connections we should kill after the timeout, instead of using
the global counter, or we're likely to just kill most of the connections.
This should be backported to 1.9.
This also depends on the nbthread count, so it must only be performed after
parsing the whole config file. As a side effect, this removes some code
duplication between servers and server-templates.
This must be backported to 1.9.
The idle conns lists are sized according to the number of threads. As such
they cannot be initialized during the parsing since nbthread can be set
later, as revealed by this simple config which randomly crashes when used.
Let's do this at the end instead.
listen proxy
bind :4445
mode http
timeout client 10s
timeout server 10s
timeout connect 10s
http-reuse always
server s1 127.0.0.1:8000
global
nbthread 8
This fix must be backported to 1.9 and 1.8.
Some servers may wish to limit the total number of requests they execute
over a connection because some of their components might leak resources.
In HTTP/1 it was easy, they just had to emit a "connection: close" header
field with the last response. In HTTP/2, it's less easy because the info
is not always shared with the component dealing with the H2 protocol and
it could be harder to advertise a GOAWAY with a stream limit.
This patch provides a solution to this by adding a new "max-reuse" parameter
to the server keyword. This parameter indicates how many times an idle
connection may be reused for new requests. The information is made available
and the underlying muxes will be able to use it at will.
This patch should be backported to 1.9.
The keyword parser doesn't check the value range, but supported values are
-1 and positive values, thus we should check it.
This can be backported to 1.9.
When we load health values from a server state file, make sure what we assign
to srv->check.health actually matches the state we restore.
This should be backported as far as 1.6.
In order to address the mailers issues, we needed to store the proxy
into the checks struct, which was done by commit c98aa1f18 ("MINOR:
checks: Store the proxy in checks."). However this one did it only for
the health checks and not for the agent checks, resulting in an immediate
crash when the agent is enabled on a random config like this one :
listen agent
bind :8000
server s1 255.255.255.255:1 agent-check agent-port 1
Thanks to Seri Kim for reporting it and providing a reproducer in
issue #20. This fix must be backported to 1.9.
Make "bind" keywork be supported in "peers" sections.
All "bind" settings are supported on this line.
Add "default-bind" option to parse the binding options excepted the bind address.
Do not parse anymore the bind address for local peers on "server" lines.
Do not use anymore list_for_each_entry() to set the "peers" section
listener parameters because there is only one listener by "peers" section.
May be backported to 1.5 and newer.
With this patch "default-server" lines are supported in "peers" sections
to setup the default settings of peers which are from now setup
when parsing both "peer" and "server" lines.
May be backported to 1.5 and newer.
Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.
This should be backported to 1.9.
When initializing server-template all of the servers after the first
have srv->idle_orphan_conns initialized within server_template_init()
The first server does not have this initialized and when http-reuse
is active this causes a segmentation fault when accessed from
srv_add_to_idle_list(). This patch removes the check for
srv->tmpl_info.prefix within server_finalize_init() and allows
the first server within a server-template to have srv->idle_orphan_conns
properly initialized.
This should be backported to 1.9.
Add a way to configure the ALPN used by check, with a new "check-alpn"
keyword. By default, the checks will use the server ALPN, but it may not
be convenient, for instance because the server may use HTTP/2, while checks
are unable to do HTTP/2 yet.
Instead of the old "idle-timeout" mechanism, add a new option,
"pool-purge-delay", that sets the delay before purging idle connections.
Each time the delay happens, we destroy half of the idle connections.
Add a new command, "pool-max-conn" that sets the maximum number of connections
waiting in the orphan idling connections list (as activated with idle-timeout).
Using "-1" means unlimited. Using pools is now dependant on this.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
Currently a mux may be forced on a bind or server line by specifying the
"proto" keyword. The problem is that the mux may depend on the proxy's
mode, which is not known when parsing this keyword, so a wrong mux could
be picked.
Let's simply update the mux entry while checking its validity. We do have
the name and the side, we only need to see if a better mux fits based on
the proxy's mode. It also requires to remove the side check while parsing
the "proto" keyword since a wrong mux could be picked.
This way it becomes possible to declare multiple muxes with the same
protocol names and different sides or modes.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
There are as many ways to build the globalfilepathlen variable as branches
in the if/then/else, creating lots of confusion. Address the most obvious
parts, but some polishing definitely is still needed.
OpenSSL released support for TLSv1.3. It also added a separate function
SSL_CTX_set_ciphersuites that is used to set the ciphers used in the
TLS 1.3 handshake. This change adds support for that new configuration
option by adding a ciphersuites configuration variable that works
essentially the same as the existing ciphers setting.
Note that it should likely be backported to 1.8 in order to ease usage
of the now released openssl-1.1.1.
This patch ensures that a DNS resolution may be launched before
setting a server FQDN via the CLI. Especially, it checks that
resolvers was set.
A LEVEL 4 reg testing file is provided.
Thanks to Lukas Tribus for having reported this issue.
Must be backported to 1.8.
Server state file has no indication that a server is currently managed
by a DNS SRV resolution.
And thus, both feature (DNS SRV resolution and server state), when used
together, does not provide the expected behavior: a smooth experience...
This patch introduce the "SRV record name" in the server state file and
loads and applies it if found and wherever required.
This patch applies to haproxy-dev branch only. For backport, a specific patch
is provided for 1.8.
thread_isolate() is currently being called with the server lock held.
This is not acceptable because it prevents other threads from reaching
the rendez-vous point. Now that the LB algos are thread-safe, let's get
rid of this call.
No backport is nedeed.
The server-specific CLI commands "set weight", "set maxconn",
"disable agent", "enable agent", "disable health", "enable health",
"disable server" and "enable server" were not protected against
concurrent accesses. Now they take the server lock around the
sensitive part.
This patch must be backported to 1.8.
At the moment it's totally unclear while reading the server's code which
functions require to be called with the server lock held and which ones
grab it and cannot be called this way. This commit simply inventories
all of them to indicate what is detected depending on how these functions
use the struct server. Only functions used at runtime were checked, those
dedicated to config parsing were skipped. Doing so already has uncovered
a few bugs on some CLI actions.
Commit 3ff577e ("MAJOR: server: make server state changes synchronous again")
reintroduced synchronous server state changes. However, during the previous
change from synchronous to asynchronous, the server state propagation was
placed at the end of the function to ease the code changes, and the commit
above didn't put it back at its place. This has resulted in propagated
states to be incomplete. For example, making a server leave maintenance
would make it up but would leave its tracking servers down because they
see their tracked server is still down.
Let's just move the status update right to its place. It also adds the
benefit of reporting state changes in the order they appear and not in
reverse.
No backport is needed.
We'll need trees to manage the queues by priorities. This change replaces
the list with a tree based on a single key. It's effectively a list but
allows us to get rid of the list management right now.
Now we try to synchronously push updates as they come using the new rdv
point, so that the call to the server update function from the main poll
loop is not needed anymore.
It further reduces the apparent latency in the health checks as the response
time almost always appears as 0 ms, resulting in a slightly higher check rate
of ~1960 conn/s. Despite this, the CPU consumption has slightly dropped again
to ~32% for the same test.
The only trick is that the checks code is built with a bit of recursivity
because srv_update_status() calls server_recalc_eweight(), and the latter
needs to signal srv_update_status() in case of updates. Thus we added an
extra argument to this function to indicate whether or not it must
propagate updates (no if it comes from srv_update_status).
The current sync point causes some important stress when a high number
of threads is in use on a config with lots of checks, because it wakes
up all threads every time a server state changes.
A config like the following can easily saturate a 4-core machine reaching
only 750 checks per second out of the ~2000 configured :
global
nbthread 4
defaults
mode http
timeout connect 5s
timeout client 5s
timeout server 5s
frontend srv
bind :8001 process 1/1
redirect location / if { method OPTIONS } { rand(100) ge 50 }
stats uri /
backend chk
option httpchk
server-template srv 1-100 127.0.0.1:8001 check rise 1 fall 1 inter 50
The reason is that the random on the fake server causes the responses
to randomly match an HTTP check, and results in a lot of up/down events
that are broadcasted to all threads. It's worth noting that the CPU usage
already dropped by about 60% between 1.8 and 1.9 just due to the scheduler
updates, but the sync point remains expensive.
In addition, it's visible on the stats page that a lot of requests end up
with an L7TOUT status in ~60ms. With smaller timeouts, it's even L4TOUT
around 20-25ms.
By not using THREAD_WANT_SYNC() anymore and only calling the server updates
under thread_isolate(), we can avoid all these wakeups. The CPU usage on
the same config drops to around 44% on the same machine, with all checks
being delivered at ~1900 checks per second, and the stats page shows no
more timeouts, even at 10 ms check interval. The difference is mainly
caused by the fact that there's no more need to wait for a thread to wake
up from poll() before starting to process check results.
Commit 64cc49c ("MAJOR: servers: propagate server status changes
asynchronously.") heavily changed the way the server states are
updated since they became asynchronous. During this change, some
code was lost, which is used to shut down some sessions from a
backup server and to pick pending connections from a proxy once
a server is turned back from maintenance to ready state. The
effect is that when temporarily disabling a server, connections
stay in the backend's queue, and when re-enabling it, they are
not picked and they expire in the backend's queue. Now they're
properly picked again.
This fix must be backported to 1.8.
When parsing the configuration, if "server", "default-server" or
"server-template" are found in a frontend, we first warn that it will be
ignored, only to be considered a fatal error later. Be true to our word, and
just ignore it.
This should be backported to 1.8 and 1.7.
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.
This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
When creating a state file using "show servers state" an empty field is
created in the srv_addr column if the server is from the socket family
AF_UNIX. This leads to a warning on start up when using
"load-server-state-from-file". This patch defaults srv_addr to "-" if
the socket family is not covered.
This patch should be backported to 1.8.
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.
Per-command support will need to be added to take advantage of this
feature.
Signed-off-by: Aurélien Nephtali <aurelien.nephtali@corp.ovh.com>
This patch add option crc32c (PP2_TYPE_CRC32C) to proxy protocol v2.
It compute the checksum of proxy protocol v2 header as describe in
"doc/proxy-protocol.txt".
This patch implement proxy protocol v2 options related to crypto information:
ssl-cipher (PP2_SUBTYPE_SSL_CIPHER), cert-sig (PP2_SUBTYPE_SSL_SIG_ALG) and
cert-key (PP2_SUBTYPE_SSL_KEY_ALG).
Proxy protocol v2 can transport many optional informations. To avoid
send-proxy-v2-* explosion, this patch introduce proxy-v2-options parameter
and will allow to write: "send-proxy-v2 proxy-v2-options ssl,cert-cn".
Because of a typo (HA_SPIN_LOCK instead of HA_SPIN_UNLOCK), there is a deadlock
in srv_set_stopping and srv_set_admin_flag when there is at least one trackers.
This patch must be backported in 1.8.
Especially with server-templates, it can happen servers starts with a
placeholder IP, in the disabled state. In this case, we don't want to report
that the same cookie was generated for multiple servers. So defer the test
until the server is enabled.
This should be backported to 1.8.
Setting a server in maint mode, the required next_state was not set
before calling the 'lb_down' function and so the system state was never
commited.
This patch should be backported in 1.8
The new admin state was not correctly commited in this case.
Checks were fully disabled but the server was not marked in MAINT state.
It results with a server definitely stucked on the DOWN state.
This patch should be backported on haproxy 1.8
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.
[wt: some of these are definitely fixes that are worth backporting]
Clang reports this warning :
src/server.c:872:14: warning: address of array 'check->desc' will
always evaluate to 'true' [-Wpointer-bool-conversion]
Indeed, check->desc used to be a pointer to a dynamically allocated area
a long time ago and is now an array. Let's remove the useless test.
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
snr_check_ip_callback() may be called with the server lock, so don't attempt
to lock it again, instead, make sure the callers always have the lock before
calling it.
Commit c3680ec ("MINOR: add severity information to cli feedback messages")
introduced a severity level to CLI messages, but one of them was missed
on "set server addr". No backport is needed.
The "set server <srv> check-port" CLI handler forgot to return after
detecting an error on the port number, and still proceeds with the action.
This needs to be backported to 1.7.
srv_set_fqdn() may be called with the DNS lock already held, but tries to
lock it anyway. So, add a new parameter to let it know if it was already
locked or not;
This list is used to save changes on the servers state. So when serveral threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has be moved in the sync-point. So this is useless
to lock it at this step because the sync-point is a protected area by iteself.
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a uniq patch. Here are all changes made on the DNS.
First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connection are kept alive. So there is no needs
anymore to reopen them after HAProxy fork. Connections used to send DNS queries
will be opened on demand.
Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. wait and curr list of requesters, for a DNS
resolution, have been replaced by a uniq list. And Finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.
srv_set_fqdn function has been simplified. Now, there is only 1 way to set the
server's FQDN, independently it is done by the CLI or when a SRV record is
resolved.
The static DNS resolutions pool has been replaced by a dynamoc pool. The part
has been modified by Baptiste Assmann.
The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected. Especially
hold.valid. The default frequency to wake up a resolvers is now configurable
using "timeout resolve" parameter.
Now, as documented, as long as invalid repsonses are received, we really wait
all name servers responses before retrying.
As far as possible, resources allocated during DNS configuration parsing are
releases when HAProxy is shutdown.
Beside all these changes, the code has been cleaned to ease code review and the
doc has been updated.
Don't forget to allocate tmptrash before using it, and free it once we're
done.
[wt: introduced by commit 64cc49cf ("MAJOR: servers: propagate server
status changes asynchronously"), no backport needed]
Fix regression introduced by commit:
'MAJOR: servers: propagate server status changes asynchronously.'
The building of the log line was re-worked to be done at the
postponed point without lack of data.
[wt: this only affects 1.8-dev, no backport needed]
For HTTP/2 we'll need some buffer-only equivalent functions to some of
the ones applying to channels and still squatting the bi_* / bo_*
namespace. Since these names have kept being misleading for quite some
time now and are really getting annoying, it's time to rename them. This
commit will use "ci/co" as the prefix (for "channel in", "channel out")
instead of "bi/bo". The following ones were renamed :
bi_getblk_nc, bi_getline_nc, bi_putblk, bi_putchr,
bo_getblk, bo_getblk_nc, bo_getline, bo_getline_nc, bo_inject,
bi_putchk, bi_putstr, bo_getchr, bo_skip, bi_swpbuf
In order to prepare multi-thread development, code was re-worked
to propagate changes asynchronoulsy.
Servers with pending status changes are registered in a list
and this one is processed and emptied only once 'run poll' loop.
Operational status changes are performed before administrative
status changes.
In a case of multiple operational status change or admin status
change in the same 'run poll' loop iteration, those changes are
merged to reach only the targeted status.
Leaving the maintenance state and if the server remains in stopping mode due
to a tracked one:
- We mistakenly try to grab some pending conns and shutdown backup sessions.
- The proxy down time and last change were also mistakenly updated
snr_resolution_cb can be called with <nameserver> parameter set to NULL. So we
must check it before using it. This is done most of time, except when we deal
with invalid DNS response.
This reverts commit 19e8aa58f7.
It causes some trouble reported by Manu :
listen tls
[...]
server bla 127.0.0.1:8080
[ALERT] 248/130258 (21960) : parsing [/etc/haproxy/test.cfg:53] : 'server bla' : no method found to resolve address '(null)'
[ALERT] 248/130258 (21960) : Failed to initialize server(s) addr.
According to Nenad :
"It's not a good way to fix the issue we were experiencing
before. It will need a bigger rewrite, because the logic in
srv_iterate_initaddr needs to be changed."
Historically the DNS was the only way of updating the server IP dynamically
and the init-addr processing and state file load required the server to have
an FQDN defined. Given that we can now update the IP through the socket as
well and also can have different init-addr values (like IP and 'none') - this
requirement needs to be removed.
This patch should be backported to 1.7.
The server state and weight was reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are commited to be propagated to the
LB stack.
In further dev related to multi-thread, the commit
will be handled into a sync point.
Pending values are named using the prefix 'next_'
Current values used by the LB stack are named 'cur_'
This patch fixes a bug where some servers managed by SRV record query
types never ever recover from a "no resolution" status.
The problem is due to a wrong function called when breaking the
server/resolution (A/AAAA) relationship: this is performed when a server's SRV
record disappear from the SRV response.
The function srv_set_fqdn() is used to update a server's fqdn and set
accordingly its DNS resolution.
Current implementation prevents a server whose update is triggered by a
SRV record from being linked to an existing resolution in the cache (if
applicable).
This patch aims at fixing this.
Make it so for each server, instead of specifying a hostname, one can use
a SRV label.
When doing so, haproxy will first resolve the SRV label, then use the
resulting hostnames, as well as port and weight (priority is ignored right
now), to each server using the SRV label.
It is resolved periodically, and any server disappearing from the SRV records
will be removed, and any server appearing will be added, assuming there're
free servers in haproxy.
As DNS servers may not return all IPs in one answer, we want to cache the
previous entries. Those entries are removed when considered obsolete, which
happens when the IP hasn't been returned by the DNS server for a time
defined in the "hold obsolete" parameter of the resolver section. The default
is 30s.
This patch is a major upgrade of the internal run-time DNS resolver in
HAProxy and it brings the following 2 main changes:
1. DNS resolution task
Up to now, DNS resolution was triggered by the health check task.
From now, DNS resolution task is autonomous. It is started by HAProxy
right after the scheduler is available and it is woken either when a
network IO occurs for one of its nameserver or when a timeout is
matched.
From now, this means we can enable DNS resolution for a server without
enabling health checking.
2. Introduction of a dns_requester structure
Up to now, DNS resolution was purposely made for resolving server
hostnames.
The idea, is to ensure that any HAProxy internal object should be able
to trigger a DNS resolution. For this purpose, 2 things has to be done:
- clean up the DNS code from the server structure (this was already
quite clean actually) and clean up the server's callbacks from
manipulating too much DNS resolution
- create an agnostic structure which allows linking a DNS resolution
and a requester of any type (using obj_type enum)
3. Manage requesters through queues
Up to now, there was an uniq relationship between a resolution and it's
owner (aka the requester now). It's a shame, because in some cases,
multiple objects may share the same hostname and may benefit from a
resolution being performed by a third party.
This patch introduces the notion of queues, which are basically lists of
either currently running resolution or waiting ones.
The resolutions are now available as a pool, which belongs to the resolvers.
The pool has has a default size of 64 resolutions per resolvers and is
allocated at configuration parsing.
Introduction of a DNS response LRU cache in HAProxy.
When a positive response is received from a DNS server, HAProxy stores
it in the struct resolution and then also populates a LRU cache with the
response.
For now, the key in the cache is a XXHASH64 of the hostname in the
domain name format concatened to the query type in string format.
Prior this patch, the DNS responses were stored in a pre-allocated
memory area (allocated at HAProxy's startup).
The problem is that this memory is erased for each new DNS responses
received and processed.
This patch removes the global memory allocation (which was not thread
safe by the way) and introduces a storage of the dns response in the
struct
resolution.
The memory in the struct resolution is also reserved at start up and is
thread safe, since each resolution structure will have its own memory
area.
For now, we simply store the response and use it atomically per
response per server.
In the process of breaking links between dns_* functions and other
structures (mainly server and a bit of resolution), the function
dns_get_ip_from_response needs to be reworked: it now can call
"callback" functions based on resolution's owner type to allow modifying
the way the response is processed.
For now, main purpose of the callback function is to check that an IP
address is not already affected to an element of the same type.
For now, only server type has a callback.