This converter tranform a integer to its binary representation in the network
byte order. Integer are already automatically converted to binary during sample
expression evaluation. But because samples own 8-bytes integers, the conversion
produces 8 bytes. the htonl converter do the same but for 4-bytes integer.
A global list to tcp-check ruleset can now be used to share common rulesets with
all backends without any duplication. It is mandatory to convert all specific
protocol checks (redis, pgsql...) to tcp-check healthchecks.
To do so, a flag is now attached to each tcp-check ruleset to know if it is a
shared ruleset or not. tcp-check rules defined in a backend are still directly
attached to the proxy and not shared. In addition a second flag is used to know
if the ruleset is inherited from the defaults section.
When a log-format string is parsed, if a sample fetch is found, the flag LW_REQ
is systematically added on the proxy. Unfortunately, this produce a warning
during HAProxy start-up when a log-format string is used for a tcp-check send
rule. Now this flag is only added if the parsed sample fetch depends on HTTP
information.
When a log-format string is evaluated, there is no reason to process sample
fetches only when a stream is defined. Several sample fetches are available
outside the stream scope. All others should handle calls without stream. This
patch is mandatory to support log-format string in tcp-check rules.
An extra parameter for tcp-check send rules can be specified to handle the
string or the hexa string as a log-format one. Using "log-format" option,
instead of considering the data to send as raw data, it is parsed as a
log-format string. Thus it is possible to call sample fetches to customize data
sent to a server. Of course, because we have no stream attached to healthchecks,
not all sample fetches are available. So be careful.
tcp-check set-var(check.port) int(8000)
tcp-check set-var(check.uri) str(/status)
tcp-check connect port var(check.port)
tcp-check send "GET %[check.uri] HTTP/1.0\r\n" log-format
tcp-check send "Host: %[srv_name]\r\n" log-format
tcp-check send "\r\n"
Since we have a session attached to tcp-check healthchecks, It is possible use
sample expression and variables. In addition, it is possible to add tcp-check
set-var rules to define custom variables. So, now, a sample expression can be
used to define the port to use to establish a connection for a tcp-check connect
rule. For instance:
tcp-check set-var(check.port) int(8888)
tcp-check connect port var(check.port)
With this option, it is now possible to use a specific address to open the
connection for a tcp-check connect rule. If the port option is also specified,
it is used in priority.
With this option, it is possible to open a connection from a tcp-check connect
rule using all parameter of the server line, like any other healthcheck. For
now, this parameter is exclusive with all other option for a tcp-check connect
rule.
With this option, it is possible to establish the connection opened by a
tcp-check connect rule using upstream socks4 proxy. Info from the socks4
parameter on the server are used.
Evaluate the registered action_ptr associated with each CHK_ACTION_KW rules from
a ruleset. Currently only the 'set-var' and 'unset-var' are parsed by the
tcp-check parser. Thus it is now possible to set or unset variables. It is
possible to use such rules before the first connect of the ruleset.
Register the custom action rules "set-var" and "unset-var", that will
call the parse_store() command upon parsing.
These rules are thus built and integrated to the tcp-check ruleset, but
have no further effect for the moment.
Add a dedicated vars scope for checks. This scope is considered as part of the
session scope for accounting purposes.
The scope can be addressed by a valid session, even embryonic. The stream is not
necessary.
The scope is initialized after the check session is created. All variables are
then pruned before the session is destroyed.
Create a session for each healthcheck relying on a tcp-check ruleset. When such
check is started, a session is allocated, which will be freed when the check
finishes. A dummy static frontend is used to create these sessions. This will be
useful to support variables and sample expression. This will also be used,
later, by HTTP healthchecks to rely on HTTP muxes.
The loop in tcpcheck_main() function is quite hard to understand. Depending
where we are in the loop, The current_step is the currentely executed rule or
the one to execute on the next call to tcpcheck_main(). When the check result is
reported, we rely on the rule pointed by last_started_step or the one pointed by
current_step. In addition, the loop does not use the common list_for_each_entry
macro and it is thus quite confusing.
So the loop has been totally rewritten and splitted to several functions to
simplify its reading and its understanding. Tcp-check rules are evaluated in
dedicated functions. And a common for_each loop is used and only one rule is
referenced, the current one.
After the configuration parsing, when its validity check, an implicit tcp-check
connect rule is added in front of the tcp-check ruleset if the first non-comment
rule is not a connect one. This implicit rule is flagged to use the default
check parameter.
This means now, all tcp-check rulesets begin with a connect and are never
empty. When tcp-check healthchecks are used, all connections are thus handled by
tcpcheck_main() function.
The keyword 'tcp-check' is now parsed in a dedicated callback function. Thus the
code to parse these rules is now located in checks.c. In addition, a deinit
function have been added to release proxy tcp-check rules, on error or when
HAProxy is stopped.
This patch is based on Gaetan Rivet work. It uses a callback function registerd
on the 'tcp-check' keyword instead, but the spirit is the same.
To allow reusing these blocks without consuming more memory, their list
should be static and share-able accross uses. The head of the list will
be shared as well.
It is thus necessary to extract the head of the rule list from the proxy
itself. Transform it into a pointer instead, that can be easily set to
an external dynamically allocated head.
Parse back-references in comments of tcp-check expect rules. If references are
made, capture groups in the match and replace references to it within the
comment when logging the error. Both text and binary regex can caputre groups
and reference them in the expect rule comment.
[Cf: I slightly updated the patch. exp_replace() function is used instead of a
custom one. And if the trash buffer is too small to contain the comment during
the substitution, the comment is ignored.]
The loop to find the id corresponding to the current rule in
tcpcheck_get_step_id() function has been simplified. And
tcpcheck_get_step_comment() function now only relies on the current rule to find
the rigth comment string. The step id is no longer used. To do so, we iterate
backward from the current step to find the first COMMENT rule immediately
preceedding the expect rule chain.
The rbinary match works similarly to the rstring match type, however the
received data is rewritten as hex-string before the match operation is
done.
This allows using regexes on binary content even with the POSIX regex
engine.
[Cf: I slightly updated the patch. mem2hex function was removed and dump_binary
is used instead.]
On the input buffer, it was mainly done to call regex_exec() function. But
regex_exec2() can be used instead. This way, it is no more required to add the
terminating null byte. For the output buffer, it was only done for debugging
purpose.
Allow declaring tcpcheck connect commands with a new parameter,
"linger". This option will configure the connection to avoid using an
RST segment to close, instead following the four-way termination
handshake. Some servers would otherwise log each healthcheck as
an error.
Some expect rules cannot be satisfied due to inherent ambiguity towards
the received data: in the absence of match, the current behavior is to
be forced to wait either the end of the connection or a buffer full,
whichever comes first. Only then does the matching diagnostic is
considered conclusive. For instance :
tcp-check connect
tcp-check expect !rstring "^error"
tcp-check expect string "valid"
This check will only succeed if the connection is closed by the server before
the check timeout. Otherwise the first expect rule will wait for more data until
"^error" regex matches or the check expires.
Allow the user to explicitly define an amount of data that will be
considered enough to determine the value of the check.
This allows succeeding on negative rstring rules, as previously
in valid condition no match happened, and the matching was repeated
until the end of the connection. This could timeout the check
while no error was happening.
[Cf: I slighly updated the patch. The parameter was renamed and the value is a
signed integer to support -1 as default value to ignore the parameter.]
Reduce copy of parsing portions that is common to all three types of
expect actions.
This reduces the amount of code, helping maintainability and reducing
future change spread.
Functionality is identical.
When receiving additional data while chaining multiple tcp-check expects,
previous inverse expects might have a different result with the new data. They
need to be evaluated again against the new data.
Add a pointer to the first inverse expect rule of the current expect chain
(possibly of length one) to each expect rule. When receiving new data, the
currently evaluated tcp-check rule is set back to this pointed rule.
Fonctionnaly speaking, it is a bug and it exists since the introduction of the
feature. But there is no way for now to hit it because when an expect rule does
not match, we wait for more data, independently on the inverse flag. The only
way to move to the following rule is to be sure no more data will be received.
This patch depends on the commit "MINOR: mini-clist: Add functions to iterate
backward on a list".
[Cf: I slightly updated the patch. First, it only concerns inverse expect
rule. Normal expect rules are not concerned. Then, I removed the BUG tag
because, for now, it is not possible to move to the following rule when the
current one does not match while more data can be received.]
TCP check expect matching strings or binary patterns are able to know
prior to applying their match function whether the available data is
already sufficient to attempt the match or not.
As such, on insufficient data the expect is postponed. This behavior
avoids unnecessary matches when the data could not possibly match.
When chaining expect, upon passing the previous and going onto the next
however, this length check is not done again. Then the match is done and
will necessarily fail, triggering a new wait for more data. The end
result is the same for a slightly higher cost.
Check received data length for all expects in a chain.
This bug exists since the introduction of the feature:
Fixes: 5ecb77f4c7 ("MEDIUM: checks: add send/expect tcp based check")
Version 1.5+ impacted.
The options and directives related to the configuration of checks in a backend
may be defined after the servers declarations. So, initialization of the check
of each server must not be performed during configuration parsing, because some
info may be missing. Instead, it must be done during the configuration validity
check.
Thus, callback functions are registered to be called for each server after the
config validity check, one for the server check and another one for the server
agent-check. In addition deinit callback functions are also registered to
release these checks.
This patch should be backported as far as 1.7. But per-server post_check
callback functions are only supported since the 2.1. And the initcall mechanism
does not exist before the 1.9. Finally, in 1.7, the code is totally
different. So the backport will be harder on older versions.
This options is used to force a non-SSL connection to check a SSL server or to
invert a check-ssl option inherited from the default section. The use_ssl field
in the check structure is used to know if a SSL connection must be used
(use_ssl=1) or not (use_ssl=0). The server configuration is used by default.
The problem is that we cannot distinguish the default case (no specific SSL
check option) and the case of an explicit non-SSL check. In both, use_ssl is set
to 0. So the server configuration is always used. For a SSL server, when
no-check-ssl option is set, the check is still performed using a SSL
configuration.
To fix the bug, instead of a boolean value (0=TCP, 1=SSL), we use a ternary value :
* 0 = use server config
* 1 = force SSL
* -1 = force non-SSL
The same is done for the server parameter. It is not really necessary for
now. But it is a good way to know is the server no-ssl option is set.
In addition, the PR_O_TCPCHK_SSL proxy option is no longer used to set use_ssl
to 1 for a check. Instead the flag is directly tested to prepare or destroy the
server SSL context.
This patch should be backported as far as 1.8.
Error codes ERR_WARN and ERR_ALERT are used to signal that the error
given is of the corresponding level. All errors are displayed as ALERT
in the display_parser_err() function.
Differentiate the display level based on the error code. If both
ERR_WARN and ERR_ALERT are used, ERR_ALERT is given priority.
The 'http-check send' directive have been added to add headers and optionnaly a
payload to the request sent during HTTP healthchecks. The request line may be
customized by the "option httpchk" directive but there was not official way to
add extra headers. An old trick consisted to hide these headers at the end of
the version string, on the "option httpchk" line. And it was impossible to add
an extra payload with an "http-check expect" directive because of the
"Connection: close" header appended to the request (See issue #16 for details).
So to make things official and fully support payload additions, the "http-check
send" directive have been added :
option httpchk POST /status HTTP/1.1
http-check send hdr Content-Type "application/json;charset=UTF-8" \
hdr X-test-1 value1 hdr X-test-2 value2 \
body "{id: 1, field: \"value\"}"
When a payload is defined, the Content-Length header is automatically added. So
chunk-encoded requests are not supported yet. For now, there is no special
validity checks on the extra headers.
This patch is inspired by Kiran Gavali's work. It should fix the issue #16 and
as far as possible, it may be backported, at least as far as 1.8.
Server address and port may change at runtime. So the address and port passed as
arguments and as environment variables when an external check is executed must
be updated. The current number of connections on the server was already updated
before executing the command. So the same mechanism is used for the server
address and port. But in addition, command arguments are also updated.
This patch must be backported to all stable versions. It should fix the
issue #577.
It is the intended behaviour. But because of a bug, the 500 error resulting of a
rewrite failure during http-after-response ruleset evaluation is also
rewritten. So if at this step, if there is also a rewrite error, the session is
closed and no error message is returned.
Instead, we must be sure to not evaluate the http-after-response rules on an
error message if it is was thrown because of a rewrite failure on a previous
error message.
It is a 2.2-dev2+ bug. No need to backport. This patch should fix the issue
pool_gc() causes quite some stress on the memory allocator because
it calls a lot of free() calls while other threads are using malloc().
In addition, pool_gc() needs to take care of possible locking because
it may be called from pool allocators. One way to avoid all this is to
use thread_isolate() to make sure the gc runs alone. By putting less
pressure on the pools and getting rid of the locks, it may even take
less time to complete.
The url_decode() function used by the url_dec converter and a few other
call points is ambiguous on its processing of the '+' character which
itself isn't stable in the spec. This one belongs to the reserved
characters for the query string but not for the path nor the scheme,
in which it must be left as-is. It's only in argument strings that
follow the application/x-www-form-urlencoded encoding that it must be
turned into a space, that is, in query strings and POST arguments.
The problem is that the function is used to process full URLs and
paths in various configs, and to process query strings from the stats
page for example.
This patch updates the function to differentiate the situation where
it's parsing a path and a query string. A new argument indicates if a
query string should be assumed, otherwise it's only assumed after seeing
a question mark.
The various locations in the code making use of this function were
updated to take care of this (most call places were using it to decode
POST arguments).
The url_dec converter is usually called on path or url samples, so it
needs to remain compatible with this and will default to parsing a path
and turning the '+' to a space only after a question mark. However in
situations where it would explicitly be extracted from a POST or a
query string, it now becomes possible to enforce the decoding by passing
a non-null value in argument.
It seems to be what was reported in issue #585. This fix may be
backported to older stable releases.
A typo resulted in '||' being used to concatenate trace flags, which will
only set flag of value '1' there. Noticed by clang 10 and reported in
issue #588.
No backport is needed, this trace was added in 2.2-dev.
When checking www-authenticate headers, we don't want to just accept
"NTLM" as value, because the server may send "HTLM <base64 value>". Instead,
just check that it starts with NTLM.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
Documentation states that default settings for ssl server options can be set
using either ssl-default-server-options or default-server directives. In practice,
not all ssl server options can have default values, such as ssl-min-ver, ssl-max-ver,
etc..
This patch adds the missing ssl options in srv_ssl_settings_cpy() and srv_parse_ssl(),
making it possible to write configurations like the following examples, and have them
behave as expected.
global
ssl-default-server-options ssl-max-ver TLSv1.2
defaults
mode http
listen l1
bind 1.2.3.4:80
default-server ssl verify none
server s1 1.2.3.5:443
listen l2
bind 2.2.3.4:80
default-server ssl verify none ssl-max-ver TLSv1.3 ssl-min-ver TLSv1.2
server s1 1.2.3.6:443
This should be backported as far as 1.8.
This fixes issue #595.
This option activate the feature introduce in commit 16739778:
"MINOR: ssl: skip self issued CA in cert chain for ssl_ctx".
The patch disable the feature per default.
When trying to insert a new certificate into a directory with "add ssl
crt-list", no check were done on the path of the new certificate.
To be more consistent with the HAProxy reload, when adding a file to
a crt-list, if this crt-list is a directory, the certificate will need
to have the directory in its path.
Allowing the use of SSL options and filters when adding a file in a
directory is not really consistent with the reload of HAProxy. Disable
the ability to use these options if one try to use them with a directory.
This patch adds the sysname, release, version and machine fields from
the uname results to the version output. It intentionally leaves out the
machine name, because it is usually not useful and users might not want to
expose their machine names for privacy reasons.
May be backported if it is considered useful for debugging.
If haproxy fails to start and emits an alert, then it can be useful
to have it also emit the version and the path used to load it. Some
users may be mistakenly launching the wrong binary due to a misconfigured
PATH variable and this will save them some troubleshooting time when it
reports that some keywords are not understood.
What we do here is that we *try* to extract the binary name from the
AUX vector on glibc, and we report this as a NOTICE tag before the
very first alert is emitted.
Some portability issues were met a few times in the past depending on
compiler versions, but this one was not reported in haproxy -vv output
while it's trivial to add it. This patch tries to be the most accurate
by explicitly reporting the clang version if detected, otherwise the
gcc version.
Since some systems switched to service managers which hide all warnings
by default, some users are not aware of some possibly important warnings
and get caught too late with errors that could have been detected earlier.
This patch adds a new global keyword, "zero-warning" and an equivalent
command-line option "-dW" to refuse to start in case any warning is
detected. It is recommended to use these with configurations that are
managed by humans in order to catch mistakes very early.
This helps quickly checking if the config produces any warning. For
this we reuse the "warned" bit field to add a new WARN_ANY bit that is
set by ha_warning(). The rest of the bit field was also cleaned from
unused bits.
Before supporting "server" line in "peers" section, such sections without
any local peer were removed from the configuration to get it validated.
This patch fixes the issue where a "server" line without address and port which
is a remote peer without address and port makes the configuration parsing fail.
When encoutering such cases we now ignore such lines remove them from the
configuration.
Thank you to Jrme Magnin for having reported this bug.
Must be backported to 2.1 and 2.0.
Commit 7f26391bc5 ("BUG/MINOR: connection: make sure to correctly tag
local PROXY connections") revealed that some implementations do not
properly ignore addresses in LOCAL connections (at least Dovecot was
spotted). More context information in the thread below:
https://www.mail-archive.com/haproxy@formilux.org/msg36890.html
The patch above was using LOCAL on top of local addresses in order to
minimize the risk of breakage but revealed worse than a clean fix. So
let's partially revert it and send pure LOCAL connections instead now.
After a bit of observation, this patch should be progressively backported
to stable branches. However if it reveals new breakage, the backport of
the patch above will have to be reverted from stable branches while other
products work on fixing their code based on the master branch.
When reading a crt-list file, the SSL options betweeen square brackets
are parsed, however the calling function sets the ssl_conf ptr to NULL
leading to all options being ignored, and a memory leak.
This is a remaining of the previous code which was forgotten.
This bug was introduced by 97b0810 ("MINOR: ssl: split the line parsing
of the crt-list").
Create a ckch_store_new() function which alloc and initialize a
ckch_store, allowing us to remove duplicated code and avoiding wrong
initialization in the future.
Bug introduced by d9d5d1b ("MINOR: ssl: free instances and SNIs with
ckch_inst_free()").
Upon an 'commit ssl cert' the HA_RWLOCK_WRUNLOCK of the SNI lock is done
with using the bind_conf pointer of the ckch_inst which was freed.
Fix the problem by using an intermediate variable to store the
bind_conf pointer.
Replace ckchs_free() by ckch_store_free() which frees the ckch_store but
now also all its ckch_inst with ckch_inst_free().
Also remove the "ckchs" naming since its confusing.
In 'commit ssl cert', instead of trying to regenerate a list of filters
from the SNIs, use the list provided by the crtlist_entry used to
generate the ckch_inst.
This list of filters doesn't need to be free'd anymore since they are
always reused from the crtlist_entry.
Use the refcount of the SSL_CTX' to free them instead of freeing them on
certains conditions. That way we can free the SSL_CTX everywhere its
pointer is used.
When deleting the previous SNI entries with 'set ssl cert', the old
SSL_CTX' were not free'd, which probably prevent the completion of the
free of the X509 in the old ckch_store, because of the refcounts in the
SSL library.
This bug was introduced by 150bfa8 ("MEDIUM: cli/ssl: handle the
creation of SSL_CTX in an IO handler").
Must be backported to 2.1.
The crtlist_load_cert_dir() caches the directory name without trailing
slashes when ssl_sock_load_cert_list_file() tries to lookup without
cleaning the trailing slashes.
This bug leads to creating the crtlist twice and prevents to remove
correctly a crtlist_entry since it exists in the serveral crtlists
created by accident.
Move the trailing slashes cleanup in ssl_sock_load_cert_list_file() to
fix the problem.
This bug was introduced by 6be66ec ("MINOR: ssl: directories are loaded
like crt-list")
Delete a certificate store from HAProxy and free its memory. The
certificate must be unused and removed from any crt-list or directory.
The deletion doesn't work with a certificate referenced directly with
the "crt" directive in the configuration.
Bundles are deprecated and can't be used with the crt-list command of
the CLI, improve the error output when trying to use them so the users
can disable them.
The cli_parse_del_crtlist() does unlock the ckch big lock, but it does
not lock it at the beginning of the function which is dangerous.
As a side effect it let the structures locked once it called the unlock.
This bug was introduced by 0a9b941 ("MINOR: ssl/cli: 'del ssl crt-list'
delete an entry")
Issue #574 reported an unclear error when trying to open a file with not
enough permission.
[ALERT] 096/032117 (835) : parsing [/etc/haproxy/haproxy.cfg:54] : 'bind :443' : error encountered while processing 'crt'.
[ALERT] 096/032117 (835) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
[ALERT] 096/032117 (835) : Fatal errors found in configuration.
Improve the error to give us more information:
[ALERT] 097/142030 (240089) : parsing [test.cfg:22] : 'bind :443' : cannot open the file 'kikyo.pem.rsa'.
[ALERT] 097/142030 (240089) : Error(s) found in configuration file : test.cfg
[ALERT] 097/142030 (240089) : Fatal errors found in configuration.
This patch could be backported in 2.1.
The dump and show ssl crt-list commands does the same thing, they dump
the content of a crt-list, but the 'show' displays an ID in the first
column. Delete the 'dump' command so it is replaced by the 'show' one.
The old 'show' command is replaced by an '-n' option to dump the ID.
And the ID which was a pointer is replaced by a line number and placed
after colons in the filename.
Example:
$ echo "show ssl crt-list -n kikyo.crt-list" | socat /tmp/sock1 -
# kikyo.crt-list
kikyo.pem.rsa:1 secure.domain.tld
kikyo.pem.ecdsa:2 secure.domain.tld
Delete an entry in a crt-list, this is done by iterating over the
ckch_inst in the crtlist_entry. For each ckch_inst the bind_conf lock is
held during the deletion of the sni_ctx in the SNI trees. Everything
is free'd.
If there is several entries with the same certificate, a line number
must be provided to chose with entry delete.
Initialize fcount to 0 when 'add ssl crt-list' does not contain any
filters. This bug can lead to trying to read some filters even if they
doesn't exist.
The HPACK header table is implemented as a wrapping list inside a contigous
area. Headers names and values are stored from right to left while indexes
are stored from left to right. When there's no more room to store a new one,
we wrap to the right again, or possibly defragment it if needed. The condition
do use the right part (called tailroom) or the left part (called headroom)
depends on the location of the last inserted header. After wrapping happens,
the code forces to stick to tailroom by pretending there's no more headroom,
so that the size fit test always fails. The problem is that nothing prevents
from storing a header with an empty name and empty value, resulting in a
total size of zero bytes, which satisfies the condition to use the headroom.
Doing this in a wrapped buffer results in changing the "front" header index
and causing miscalculations on the available size and the addresses of the
next headers. This may even allow to overwrite some parts of the index,
opening the possibility to perform arbitrary writes into a 32-bit relative
address space.
This patch fixes the issue by making sure the headroom is considered only
when the buffer does not wrap, instead of relying on the zero size. This
must be backported to all versions supporting H2, which is as far as 1.8.
Many thanks to Felix Wilhelm of Google Project Zero for responsibly
reporting this problem with a reproducer and a detailed analysis.
CVE-2020-11100 was assigned to this issue.
Add the support for filters and SSL options in the CLI command
"add ssl crt-list".
The feature was implemented by applying the same parser as the crt-list
file to the payload.
The new options are passed to the command as a payload with the same
format that is suppported by the crt-list file itself, so you can easily
copy a line from a file and push it via the CLI.
Example:
printf "add ssl crt-list localhost.crt-list <<\necdsa.pem [verify none allow-0rtt] localhost !www.test1.com\n\n" | socat /tmp/sock1 -
In order to reuse the crt-list line parsing in "add ssl crt-list",
the line parsing was extracted from crtlist_parse_file() to a new
function crtlist_parse_line().
This function fills a crtlist_entry structure with a bind_ssl_conf and
the list of filters.
We can't expect the DNS answer to always match the case we used for the
request, so we can't just use memcmp() to compare the DNS answer with what
we are expected.
Instead, introduce dns_hostname_cmp(), which compares each string in a
case-insensitive way.
This should fix github issue #566.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
This patch fixes#53 where it was noticed that when an active
server is set to DRAIN it no longer has the color blue reflected
within the stats page. This patch addresses that and adds the
color back to drain. It's to be noted that backup servers are
configured to have an orange color when they are draining.
Should be backported as far as 1.7.
The new 'add ssl crt-list' command allows to add a new crt-list entry in
a crt-list (or a directory since they are handled the same way).
The principle is basicaly the same as the "commit ssl cert" command with
the exception that it iterates on every bind_conf that uses the crt-list
and generates a ckch instance (ckch_inst) for each of them.
The IO handler yield every 10 bind_confs so HAProxy does not get stuck in
a too much time consuming generation if it needs to generate too much
SSL_CTX'.
This command does not handle the SNI filters and the SSL configuration
yet.
Example:
$ echo "new ssl cert foo.net.pem" | socat /tmp/sock1 -
New empty certificate store 'foo.net.pem'!
$ echo -e -n "set ssl cert foo.net.pem <<\n$(cat foo.net.pem)\n\n" | socat /tmp/sock1 -
Transaction created for certificate foo.net.pem!
$ echo "commit ssl cert foo.net.pem" | socat /tmp/sock1 -
Committing foo.net.pem
Success!
$ echo "add ssl crt-list one.crt-list foo.net.pem" | socat /tmp/sock1 -
Inserting certificate 'foo.net.pem' in crt-list 'one.crt-list'......
Success!
$ echo "show ssl crt-list one.crt-list" | socat /tmp/sock1 -
# one.crt-list
0x55d17d7be360 kikyo.pem.rsa [ssl-min-ver TLSv1.0 ssl-max-ver TLSv1.3]
0x55d17d82cb10 foo.net.pem
The crtlist_entry structure use a pointer to the store as key.
That's a problem with the dynamic update of a certificate over the CLI,
because it allocates a new ckch_store. So updating the pointers is
needed. To achieve that, a linked list of the crtlist_entry was added in
the ckch_store, so it's easy to iterate on this list to update the
pointers. Another solution would have been to rework the system so we
don't allocate a new ckch_store, but it requires a rework of the ckch
code.
When updating a ckch_store we may want to update its pointer in the
crtlist_entry which use it. To do this, we need the list of the entries
using the store.
The instances were wrongly inserted in the crtlist entries, all
instances of a crt-list were inserted in the last crt-list entry.
Which was kind of handy to free all instances upon error.
Now that it's done correctly, the error path was changed, it must
iterate on the entries and find the ckch_insts which were generated for
this bind_conf. To avoid wasting time, it stops the iteration once it
found the first unsuccessful generation.
In order to be able to add new certificate in a crt-list, we need the
list of bind_conf that uses this crt-list so we can create a ckch_inst
for each of them.
The original algorithm always killed half the idle connections. This doesn't
take into account the way the load can change. Instead, we now kill half
of the exceeding connections (exceeding connection being the number of
used + idle connections past the last maximum used connections reached).
That way if we reach a peak, we will kill much less, and it'll slowly go back
down when there's less usage.
Add a counter to know the current number of used connections, as well as the
max, this will be used later to refine the algorithm used to kill idle
connections, based on current usage.
With server-template was introduced the possibility to scale the
number of servers in a backend without needing a configuration change
and associated reload. On the other hand it became impractical to
write use-server rules for these servers as they would only accept
existing server labels as argument. This patch allows the use of
log-format notation to describe targets of a use-server rules, such
as in the example below:
listen test
bind *:1234
use-server %[hdr(srv)] if { hdr(srv) -m found }
use-server s1 if { path / }
server s1 127.0.0.1:18080
server s2 127.0.0.1:18081
If a use-server rule is applied because it was conditionned by an
ACL returning true, but the target of the use-server rule cannot be
resolved, no other use-server rule is evaluated and we fall back to
load balancing.
This feature was requested on the ML, and bumped with issue #563.
Add a sample fetch for the name of a bind. This can be useful to
take decisions when PROXY protocol is used and we can't rely on dst,
such as the sample config below.
defaults
mode http
listen bar
bind 127.0.0.1:1111
server s1 127.0.1.1:1234 send-proxy
listen foo
bind 127.0.1.1:1234 name foo accept-proxy
http-request return status 200 hdr dst %[dst] if { dst 127.0.1.1 }
First: self issued CA, aka root CA, is the enchor for chain validation,
no need to send it, client must have it. HAProxy can skip it in ssl_ctx.
Second: the main motivation to skip root CA in ssl_ctx is to be able to
provide it in the chain without drawback. Use case is to provide issuer
for ocsp without the need for .issuer and be able to share it in
issuers-chain-path. This concerns all certificates without intermediate
certificates. It's useless for BoringSSL, .issuer is ignored because ocsp
bits doesn't need it.
13a9232ebc introduced parsing of
Additionnal DNS response section to pick up IP address when available.
That said, this introduced a side effect for other query types (A and
AAAA) leading to consider those responses invalid when parsing the
Additional section.
This patch avoids this situation by ensuring the Additional section is
parsed only for SRV queries.
In h1_detach(), if our input buffer isn't empty, don't just subscribe(), we
may hold a new request, and there's nothing left to read. Instead, call
h1_process() directly, so that a new stream is created.
Failure to do so means if we received the new request to early, the
connecetion will just hang, as it happens when using svn.
When a "peers" section has not any local peer, it is removed of the list
of "peers" sections by check_config_validity(). But a stick-table which
refers to a "peers" section stores a pointer to this peers section.
These pointer must be reset to NULL value for each stick-table refering to
such a "peers" section to prevent stktable_init() to start the peers frontend
attached to the peers section dereferencing the invalid pointer.
Furthemore this patch stops the peers frontend as this is done for other
configurations invalidated by check_config_validity().
Thank you to Olivier D for having reported this issue with such a
simple configuration file which made haproxy crash when started with
-c option for configuration file validation.
defaults
mode http
peers mypeers
peer toto 127.0.0.1:1024
backend test
stick-table type ip size 10k expire 1h store http_req_rate(1h) peers mypeers
Must be backported to 2.1 and 2.0.
Fix an infinite loop which was added in an attempt to fix#558.
If the peers_fe is NULL, it will loop forever.
Must be backported with a2cfd7e as far as 1.8.
Tim reported that in master-worker mode, if a stick-table is declared
but not used in the configuration, its associated peers listener won't
bind.
This problem is due the fact that the master-worker and the daemon mode,
depend on the bind_proc field of the peers proxy to disable the listener.
Unfortunately the bind_proc is left to 0 if no stick-table were used in
the configuration, stopping the listener on all processes.
This fixes sets the bind_proc to the first process if it wasn't
initialized.
Should fix bug #558. Should be backported as far as 1.8.
SSL_CTX_set1_chain is used for openssl >= 1.0.2 and a loop with
SSL_CTX_add_extra_chain_cert for openssl < 1.0.2.
SSL_CTX_add_extra_chain_cert exist with openssl >= 1.0.2 and can be
used for all openssl version (is new name is SSL_CTX_add0_chain_cert).
This patch use SSL_CTX_add_extra_chain_cert to remove any #ifdef for
compatibilty. In addition sk_X509_num()/sk_X509_value() replace
sk_X509_shift() to extract CA from chain, as it is used in others part
of the code.
This bug was introduced by 85888573 "BUG/MEDIUM: ssl: chain must be
initialized with sk_X509_new_null()". No need to set find_chain with
sk_X509_new_null(), use find_chain conditionally to fix issue #516.
This bug was referenced by issue #559.
[wla: fix some alignment/indentation issue]
In run_thread_poll_loop() we test both for (global_tasks_mask & tid_bit)
and thread_has_tasks(), but the former is useless since this test is
already part of the latter.
Commit 4b3f27b ("BUG/MINOR: haproxy/threads: try to make all threads
leave together") improved the soft-stop synchronization but it left a
small race open because it looks at tasks_run_queue, which can drop
to zero then back to one while another thread picks the task from the
run queue to insert it into the tasklet_list. The risk is very low but
not null. In addition the condition didn't consider the possible presence
of signals in the queue.
This patch moves the stopping detection just after the "wake" calculation
which already takes care of the various queues' sizes and signals. It
avoids needlessly duplicating these tests.
The bug was discovered during a code review but will probably never be
observed. This fix may be backported to 2.1 and 2.0 along with the commit
above.
In the various muxes, add a comment documenting that once
srv_add_to_idle_list() got called, any thread may pick that conenction up,
so it is unsafe to access the mux context/the connection, the only thing we
can do is returning.
In h1_detach(), make sure we subscribe before we call
srv_add_to_idle_list(), not after. As soon as srv_add_to_idle_list() is
called, and it is put in an idle list, another thread can take it, and
we're no longer allowed to subscribe.
This fixes a race condition when another thread grabs a connection as soon
as it is put, the original owner would subscribe, and thus the new thread
would fail to do so, and to activate polling.
Fix a potential NULL dereference in "show ssl cert" when we can't
allocate the <out> trash buffer.
This patch creates a new label so we could jump without trying to do the
ci_putchk in this case.
This bug was introduced by ea987ed ("MINOR: ssl/cli: 'new ssl cert'
command"). 2.2 only.
This bug was referenced by issue #556.
In connect_server(), make sure we properly free a newly created connection
if we somehow fail, and it has not yet been attached to a conn_stream, or
it would lead to a memory leak.
This should appease coverity for backend.c, as reported in inssue #556.
This should be backported to 2.1, 2.0 and 1.9
Fix a memory leak that could happen upon a "show ssl cert" if notBefore:
or notAfter: failed to extract its ASN1 string.
Introduced by d4f946c ("MINOR: ssl/cli: 'show ssl cert' give information
on the certificates"). 2.2 only.
In `crtlist_dup_filters()` add the `1` to the number of elements instead of
the size of a single element.
This bug was introduced in commit 2954c478eb,
which is 2.2+. No backport needed.
In `ckch_inst_sni_ctx_to_sni_filters` use `calloc()` to allocate the filter
array. When the function fails to allocate memory for a single entry the
whole array will be `free()`d using free_sni_filters(). With the previous
`malloc()` the pointers for entries after the failing allocation could
possibly be a garbage value.
This bug was introduced in commit 38df1c8006,
which is 2.2+. No backport needed.
In conn_backend_get(), when we manage to get an idle connection from the
current thread's pool, don't forget to decrement the idle connection
counters, or we may end up not reusing connections when we could, and/or
killing connections when we shouldn't.
In connect_server(), if we notice we have more file descriptors opened than
we should, there's no reason not to close a connection just because we're
reusing one, so do it anyway.
In connect_server(), if we no longer have any idle connections for the
current thread, attempt to use the new "takeover" mux method to steal a
connection from another thread.
This should have no impact right now, given no mux implements it.
Make the "list" element a struct mt_list, and explicitely use
list_from_mt_list to get a struct list * where it is used as such, so that
mt_list_for_each_entry will be usable with it.
Implement a new function, fd_takeover(), that lets you become the thread
responsible for the fd. On architectures that do not have a double-width CAS,
use a global rwlock.
fd_set_running() was also changed to be able to compete with fd_takeover(),
either using a dooble-width CAS on both running_mask and thread_mask, or
by claiming a reader on the global rwlock. This extra operation should not
have any measurable impact on modern architectures where threading is
relevant.
Revamp the server connection lists. We know have 3 lists :
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
Make it so sessions are not responsible for connection anymore, except for
connections that are private, and thus can't be shared, otherwise, as soon
as a request is done, the session will just add the connection to the
orphan connections pool.
This will break http-reuse safe, but it is expected to be fixed later.
The CLI command "new ssl cert" allows one to create a new certificate
store in memory. It can be filed with "set ssl cert" and "commit ssl
cert".
This patch also made a small change in "show ssl cert" to handle an
empty certificate store.
Multi-certificate bundles are not supported since they will probably be
removed soon.
This feature alone is useless since there is no way to associate the
store to a crt-list yet.
Example:
$ echo "new ssl cert foobar.pem" | socat /tmp/sock1 -
New empty certificate store 'foobar.pem'!
$ printf "set ssl cert foobar.pem <<\n$(cat localhost.pem.rsa)\n\n" | socat /tmp/sock1 -
Transaction created for certificate foobar.pem!
$ echo "commit ssl cert foobar.pem" | socat /tmp/sock1 -
Committing foobar.pem
Success!
$ echo "show ssl cert foobar.pem" | socat /tmp/sock1 -
Filename: foobar.pem
[...]
The flush_lock was introduced, mostly to be sure that pool_gc() will never
dereference a pointer that has been free'd. __pool_get_first() was acquiring
the lock to, the fear was that otherwise that pointer could get free'd later,
and then pool_gc() would attempt to dereference it. However, that can not
happen, because the only functions that can free a pointer, when using
lockless pools, are pool_gc() and pool_flush(), and as long as those two
are mutually exclusive, nobody will be able to free the pointer while
pool_gc() attempts to access it.
So change the flush_lock to a spinlock, and don't bother acquire/release
it in __pool_get_first(), that way callers of __pool_get_first() won't have
to wait while the pool is flushed. The worst that can happen is we call
__pool_refill_alloc() while the pool is getting flushed, and memory can
get allocated just to be free'd.
This may help with github issue #552
This may be backported to 2.1, 2.0 and 1.9.
When running __signal_process_queue(), we ignore most signals. We can't,
however, ignore WDTSIG and DEBUGSIG, otherwise that thread may end up
waiting for another one that could hold a glibc lock, while the other thread
wait for this one to enter debug_handler().
So make sure WDTSIG and DEBUGSIG aren't ignored, if they are defined.
This probably explains the watchdog deadlock described in github issue
This should be backported to 2.1, 2.0 and 1.9.
Move the definition of WDTSIG and DEBUGSIG from wdt.c and debug.c into
types/signal.h, so that we can access them in another file.
We need those definition to avoid blocking those signals when running
__signal_process_queue().
This should be backported to 2.1, 2.0 and 1.9.
The behavior of calloc() when being passed `0` as `nelem` is implementation
defined. It may return a NULL pointer.
Avoid this issue by checking before allocating. While doing so adjust the local
integer variables that are used to refer to memory offsets to `size_t`.
This issue was introced in commit f91ac19299. This
patch should be backported together with that commit.
There is a memleak of the entry structure in crtlist_load_cert_dir(), in
the case we can't stat the file, or this is not a regular file. Let's
move the entry allocation so it's done after these tests.
Fix issue #551.
When tasklet were introduced, it has been decided not to provide the tasklet
to the callback, but NULL instead. While it may have been reasonable back
then, maybe to be able to differentiate a task from a tasklet from the
callback, it also means that we can't access the tasklet from the handler if
the context provided can't be trusted.
As no handler is shared between a task and a tasklet, and there are now
other means of distinguishing between task and tasklet, just pass the
tasklet pointer too.
This may be backported to 2.1, 2.0 and 1.9 if needed.
A memory leak happens in an error case when ckchs_load_cert_file()
returns NULL in crtlist_parse_file().
This bug was introduced by commit 2954c47 ("MEDIUM: ssl: allow crt-list caching")
This patch fixes bug #551.
In the struct fdtab, introduce a new mask, running_mask. Each thread should
add its bit before using the fd.
Use the running_mask instead of a lock, in fd_insert/fd_delete, we'll just
spin as long as the mask is non-zero, to be sure we access the data
exclusively.
fd_set_running_excl() spins until the mask is 0, fd_set_running() just
adds the thread bit, and fd_clr_running() removes it.
Implement 2 new commands on the CLI:
show ssl crt-list [<filename>]: Without a specified filename, display
the list of crt-lists used by the configuration. If a filename is
specified, it will displays the content of this crt-list, with a line
identifier at the beginning of each line. This output must not be used
as a crt-list file.
dump ssl crt-list <filename>: Dump the content of a crt-list, the output
can be used as a crt-list file.
Note: It currently displays the default ssl-min-ver and ssl-max-ver
which are potentialy not in the original file.
Don't bother trying to remove the connection from the idle list, as the
only connections the mux_pt handles are now the TCP-mode connections, and
those are never added to the idle list.
The agent's engine_id forgot to dup from trash, all engine_ids point to
the same address "&trash.area", the engine_id changed at run time and will
double free when release agents and trash.
This bug was introduced by the commit ee3bcddef ("MINOR: tools: add a generic
function to generate UUIDs").
No backport is needed, this is 2.2-dev.
The commit 6be66ec ("MINOR: ssl: directories are loaded like crt-list")
broke the directory loading of the certificates. The <crtlist> wasn't
filled by the crtlist_load_cert_dir() function. And the entries were
not correctly initialized. Leading to a segfault during startup.
Generate a directory cache with the crtlist and crtlist_entry structures.
With this new model, directories are a special case of the crt-lists.
A directory is a crt-list which allows only one occurence of each file,
without SSL configuration (ssl_bind_conf) and without filters.
The crtlist structure defines a crt-list in the HAProxy configuration.
It contains crtlist_entry structures which are the lines in a crt-list
file.
crt-list are now loaded in memory using crtlist and crtlist_entry
structures. The file is read only once. The generation algorithm changed
a little bit, new ckch instances are generated from the crtlist
structures, instead of being generated during the file loading.
The loading function was split in two, one that loads and caches the
crt-list and certificates, and one that looks for a crt-list and creates
the ckch instances.
Filters are also stored in crtlist_entry->filters as a char ** so we can
generate the sni_ctx again if needed. I won't be needed anymore to parse
the sni_ctx to do that.
A crtlist_entry stores the list of all ckch_inst that were generated
from this entry.
Pass a pointer to the struct ckch_inst to the ssl_sock_load_ckchs()
function so we can manipulate the ckch_inst from
ssl_sock_load_cert_list_file() and ssl_sock_load_cert().
This bug was introduced with peers.c code re-work (7d0ceeec80):
"struct peer" flags are mistakenly checked instead of
"struct peers" flags to check the resync status of the local peer.
The issue was reported here:
https://github.com/haproxy/haproxy/issues/545
This bug affects all branches >= 2.0 and should be backported.
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings as it's not limited to ignoring
syscalls returns only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
This message comes when we run:
haproxy -c -V -f /etc/haproxy/haproxy.cfg
[ALERT] 072/233727 (30865) : parsing [/etc/haproxy/haproxy.cfg:34] : The 'rspirep' directive is not supported anymore sionce HAProxy 2.1. Use 'http-response replace-header' instead.
[ALERT] 072/233727 (30865) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
[ALERT] 072/233727 (30865) : Fatal errors found in configuration.
These are mostly comments in the code. A few error messages were fixed
and are of low enough importance not to deserve a backport. Some regtests
were also fixed.
This adds the missing blank lines in `make_proxy_line_v2` and
`conn_recv_proxy`. It also adjusts the type of the temporary variable
used for the return value of `recv` to be `ssize_t` instead of `int`.
This patch adds the `unique-id` option to `proxy-v2-options`. If this
option is set a unique ID will be generated based on the `unique-id-format`
while sending the proxy protocol v2 header and stored as the unique id for
the first stream of the connection.
This feature is meant to be used in `tcp` mode. It works on HTTP mode, but
might result in inconsistent unique IDs for the first request on a keep-alive
connection, because the unique ID for the first stream is generated earlier
than the others.
Now that we can send unique IDs in `tcp` mode the `%ID` log variable is made
available in TCP mode.
There's a small issue with soft stop combined with the incoming
connection load balancing. A thread may dispatch a connection to
another one at the moment stopping=1 is set, and the second one could
stop by seeing (jobs - unstoppable_jobs) == 0 in run_poll_loop(),
without ever picking these connections from the queue. This is
visible in that it may occasionally cause a connection drop on
reload since no remaining thread will ever pick that connection
anymore.
In order to address this, this patch adds a stopping_thread_mask
variable by which threads acknowledge their willingness to stop
when their runqueue is empty. And all threads will only stop at
this moment, so that if finally some late work arrives in the
thread's queue, it still has a chance to process it.
This should be backported to 2.1 and 2.0.
When stopping there is a risk that other threads are already in the
process of stopping, so let's not add new work in their queue and
instead keep the incoming connection local.
This should be backported to 2.1 and 2.0.
Surprizingly the variable was never initialized, though on most
platforms it's zeroed at boot, and it is relatively harmless
anyway since in the worst case the bits are updated around poll().
This was introduced by commit 79321b95a8 and needs to be backported
as far as 1.9.
In pool_gc(), when we're not using lockless pool, always update free_list,
and read from it the next element to free. As we now unlock the pool while
we're freeing the item, another thread could have updated free_list in our
back. Not doing so could lead to segfaults when pool_gc() is called.
This should be backported to 2.1.
Don't assume the connection always has a valid session in "owner".
Instead, attempt to retrieve the session from the stream, and modify
the error snapshot code to not assume we always have a session, or the proxy
for the other end.
x86_64 and ARM64 do support the double-word atomic CAS. However on
ARM it must be done only on aligned data. The random generator makes
use of such double-word atomic CAS when available but didn't enforce
alignment, which causes ARM64 to crash early in the startup since
commit 52bf839 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG").
This commit just unconditionally aligns the arrays. It must be
backported to all branches where the commit above is backported
(likely till 2.0).
Remove the list of private connections from server, it has been largely
unused, we only inserted connections in it, but we would never actually
use it.
When a maximum memory setting is passed to haproxy and maxconn is not set
and ulimit-n is not set, it is expected that maxconn will be set to the
highest value permitted by this memory setting, possibly affecting the
FD limit.
When maxconn was changed to be deduced from the current process's FD limit,
the automatic setting above was partially lost because it now remains
limited to the current FD limit in addition to being limited to the
memory usage. For unprivileged processes it does not change anything,
but for privileged processes the difference is important. Indeed, the
previous behavior ensured that the new FD limit could be enforced on
the process as long as the user had the privilege to do so. Now this
does not happen anymore, and some people rely on this for automatic
sizing in VM environments.
This patch implements the ability to verify if the setting will be
enforceable on the process or not. First it computes maxconn based on
the memory limits alone, then checks if the process is willing to accept
them, otherwise tries again by respecting the process' hard limit.
Thanks to this we now have the best of the pre-2.0 behavior and the
current one, in that privileged users will be able to get as high a
maxconn as they need just based on the memory limit, while unprivileged
users will still get as high a setting as permitted by the intersection
of the memory limit and the process' FD limit.
Ideally, after some observation period, this patch along with the
previous one "MINOR: init: move the maxsock calculation code to
compute_ideal_maxsock()" should be backported to 2.1 and 2.0.
Thanks to Baptiste for raising the issue.
The maxsock value is currently derived from global.maxconn and a few other
settings, some of which also depend on global.maxconn. This makes it
difficult to check if a limit is already too high or not during the maxconn
automatic sizing.
Let's move this code into a new function, compute_ideal_maxsock() which now
takes a maxconn in argument. It performs the same operations and returns
the maxsock value if global.maxconn were to be set to that value. It now
replaces the previous code to compute maxsock.
When calling MT_LIST_DEL_SAFE(), give him the temporary pointer "tmpelt",
as that's what is expected. We want to be able to set that pointer to NULL,
to let other parts of the code know we deleted an element.
In order to store and cache the directory loading, the directory loading
was separated from ssl_sock_load_cert() and put in a new function
ssl_sock_load_cert_dir() to be more readable.
This patch only splits the function in two.
SI_TKILL is not necessarily defined on older systems and is used only
with the pthread_kill() call a few lines below, so it should also be
subject to the USE_THREAD condition.
Technically speaking the call was implemented in glibc 2.3 so we must
rely on this and not on __USE_GNU which is an internal define of glibc
to track use of GNU_SOURCE.
The splice() syscall has been supported in glibc since version 2.5 issued
in 2006 and is present on supported systems so there's no need for having
our own arch-specific syscall definitions anymore.
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which are found in field anymore. Let's
simply drop this and use epoll normally.
The accept4() syscall has been present for a while now, there is no more
reason for maintaining our own arch-specific syscall implementation for
systems lacking it in libc but having it in the kernel.
This was introduced 10 years ago to squeeze a few CPU cycles per syscall
on 32-bit x86 machines and was already quite old by then, requiring to
explicitly enable support for this in the kernel. We don't even know if
it still builds, let alone if it works at all on recent kernels! Let's
completely drop this now.
Since commit 244b070 ("MINOR: ssl/cli: support crt-list filters"),
HAProxy generates a list of filters based on the sni_ctx in memory.
However it's not always relevant, sometimes no filters were configured
and the CN/SAN in the new certificate are not the same.
This patch fixes the issue by using a flag filters in the ckch_inst, so
we are able to know if there were filters or not. In the late case it
uses the CN/SAN of the new certificate to generate the sni_ctx.
note: filters are still only used in the crt-list atm.
We currently have two UUID generation functions, one for the sample
fetch and the other one in the SPOE filter. Both were a bit complicated
since they were made to support random() implementations returning an
arbitrary number of bits, and were throwing away 33 bits every 64. Now
we don't need this anymore, so let's have a generic function consuming
64 bits at once and use it as appropriate.
The rand() sample fetch supports being limited to a certain range, but
it only uses 31 bits and scales them as requested, which means that when
the requested output range is larger than 31 bits, the least significant
one is not random and may even be constant.
Let's make use of the whole 32 bits now that we have access ot them.
In order to honor spread_checks we currently call rand() which is not
thread safe and which must never turn its internal state to zero. This
is not thread safe, let's use ha_random() instead. This is a complement
to commimt 52bf839394 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG") and may be backported with it.
For the random LB algorithm we need a random 32-bit hashing key that used
to be made of two calls to random(). Now we can simply perform a single
call to ha_random32() and get rid of the useless operations.
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initally tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second it's thread safe. Third it
is fast and more proven than any hack we could attempt to try to work
around the deficiencies of the various implementations around.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require to backport the build fixes for circular
include files dependecy anymore.
This reverts commit 1c306aa84d.
It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/
We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken up design of the portable API.
As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout daemon format raw
log-format "%[uuid] %pid"
redirect location /
It typically produces this log:
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
07764439-c24d-4e6f-a5a6-0138be59e7a8 30642
What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.
It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.
This replaces all uses of random() with ha_random() which uses the
thread-local state.
This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.
It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160 bits random
seed available, 32 of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows to have up to 4 billion different sequences
from the boot, which is not that bad.
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
In the same way than for the request, when a redirect rule is applied the
transction is aborted. This must be done returning HTTP_RULE_RES_ABRT from
http_res_get_intercept_rule() function.
No backport needed because on previous versions, the action return values are
not handled the same way.
Backend counters must be incremented only if a backend was already assigned to
the stream (when the stream exists). Otherwise, it means we are still on the
frontend side.
This patch may be backported as far as 1.6.
When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_STOP. ACT_RET_STOP is reserved to
stop the processing of the current ruleset.
No backport needed because on previous versions, the action return values are
not handled the same way.
When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.
This patch should be backported as far as 1.7.
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the compression filter can be initialized when the stream is created and
released with the stream. Concretly, .channel_start_analyze and
.channel_end_analyze callback functions are replaced by .attach and .detach
ones.
With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the compression filter.
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the cache filter can be initialized when the stream is created and released with
the stream. Concretly, .channel_start_analyze and .channel_end_analyze callback
functions are replaced by .attach and .detach ones.
With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the cache filter.
A typo was introduced by the commit c5bb5a0f2 ("BUG/MINOR: http-rules: Preserve
FLT_END analyzers on reject action").
This patch must be backported with the commit c5bb5a0f2.
When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.
This patch should be backported as far as 1.8.
When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_DONE. ACT_RET_DONE is reserved to
stop the processing on the current channel but some analysers may still be
active. When ACT_RET_ABRT is returned, all analysers are removed, except FLT_END
if it is set.
No backport needed because on previous verions, the action return value was not
handled the same way.
It is stated in the comment the return action returns ACT_RET_ABRT on
success. It it the right code to use to abort a transaction. ACT_RET_DONE must
be used when the message processing must be stopped. This does not means the
transaction is interrupted.
No backport needed.
The wake_time of a lua context is now always set to TICK_ETERNITY when the
context is initialized and when everytime the execution of the lua stack is
started. It is mandatory to not set arbitrary wake_time when an action yields.
No backport needed.
This flag was used in some internal functions to be sure the current stream is
able to handle HTTP content. It was introduced when the legacy HTTP code was
still there. Now, It is possible to rely on stream's flags to be sure we have an
HTX stream.
So the flag HLUA_TXN_HTTP_RDY can be removed. Everywhere it was tested, it is
replaced by a call to the IS_HTX_STRM() macro.
This patch is mandatory to allow the support of the filters written in lua.
In these functions, the lua txn was not used. So it can be removed from the
function argument list.
This patch is mandatory to allow the support of the filters written in lua.
In this function, the lua txn was only used to retrieve the stream. But it can
be retieve from the HTTP message, using its channel pointer. So, the lua txn can
be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
In this function, the lua txn was only used to test if the HTTP transaction is
defined. But it is always used in a context where it is true. So, the lua txn
can be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
The Lua function Channel.is_full() should not take care of the reserve because
it is not called from a producer (an applet for instance). From an action, it is
allowed to overwrite the buffer reserve.
This patch should be backported as far as 1.7. But it must be adapted for 1.8
and lower because there is no HTX on these versions.
When a lua action aborts a transaction calling txn:done() function, the action
must return ACT_RET_ABRT instead of ACT_RET_DONE. It is mandatory to
abort the message analysis.
This patch must be backported everywhere the commit 7716cdf45 ("MINOR: lua: Get
the action return code on the stack when an action finishes") was
backported. For now, no backport needed.
When an error occurred on the response side, request analysers must be reset. At
this stage, only AN_REQ_HTTP_XFER_BODY analyser remains, and possibly
AN_REQ_FLT_END, if at least one filter is attached to the stream. So it is safe
to remove the AN_REQ_HTTP_XFER_BODY analyser. An error was already handled and a
response was already returned to the client (or it was at least scheduled to be
sent). So there is no reason to continue to process the request payload. It may
cause some troubles for the filters because when an error occurred, data from
the request buffer are truncated.
This patch must be backported as far as 1.9, for the HTX part only. I don't know
if the legacy HTTP code is affected.
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the compression filter point of view, when we loop on the HTX blocks to
compress the response payload, we must start from the head of the HTX
message. To ease the loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
If a filter enable the data filtering, in TCP or in HTTP, but it does not
defined the corresponding callback function (so http_payload() or
tcp_payload()), it will be ignored. If all configured data filter do the same,
we must be sure to forward everything. Otherwise nothing will be forwarded at
all.
This patch must be forwarded as far as 1.9.
When the tcp or http payload is filtered, it is important to use the filter
offset to decude the amount of forwarded data because this offset may change
during the call to the callback function. So we should not rely on a local
variable defined before this call.
For now, existing HAproxy filters don't change this offset, so this bug may only
affect external filters.
This patch must be forwarded as far as 1.9.
The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is ouside of the HTX message, the returned
block is NULL.
This patch fixes PROXYv2 parsing when the payload of the TCP connection is
fused with the PROXYv2 header within a single recv() call.
Previously HAProxy ignored the PROXYv2 header length when attempting to
parse the TLV, possibly interpreting the first byte of the payload as a
TLV type.
This patch adds proper validation. It ensures that:
1. TLV parsing stops when the end of the PROXYv2 header is reached.
2. TLV lengths cannot exceed the PROXYv2 header length.
3. The PROXYv2 header ends together with the last TLV, not allowing for
"stray bytes" to be ignored.
A reg-test was added to ensure proper behavior.
This patch tries to find the sweat spot between a small and easily
backportable one, and a cleaner one that's more easily adaptable to
older versions, hence why it merges the "if" and "while" blocks which
causes a reindent of the whole block. It should be used as-is for
versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be
dropped for 2.0 and the block about CRC32C should be dropped for 1.8.
This bug was introduced when TLV parsing was added. This happened in commit
b3e54fe387. This commit was first released
with HAProxy 1.6-dev1.
A similar issue was fixed in commit 7209c204bd.
This patch must be backported to HAProxy 1.6+.
James Stroehmann reported something working as documented but that can
be considered as a regression in the way the automatic maxconn is
calculated from the process' limits :
https://www.mail-archive.com/haproxy@formilux.org/msg36523.html
The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.
Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.
Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
$ ulimit -n
1024
Before the change the limits would show like this:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 1023
Maxconn: 489
After the change the limits are now much better and more in line with
the default settings in earlier versions:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 4095
Maxconn: 2025
The difference becomes even more obvious when running moderately large
configs with hundreds of checked servers and hundreds of listeners:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
listen l
bind :10000-10300
server-template srv- 300 0.0.0.0 check disabled
Before After
Maxsock 1024 4096
Maxconn 189 1725
This issue is tagged as minor since a trivial config change fixes it,
but it would help new users to have it backported as far as 2.0.
pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occuring, which makes this a significant issue.
The old algorithm is O(n^2)
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)
The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)
Performance examples, startup time in seconds:
pat_refs old new
1000 0.02 0.01
10000 2.1 0.04
20000 12.3 0.07
30000 27.9 0.10
40000 52.5 0.14
50000 77.5 0.17
Please backport to 1.8, 2.0 and 2.1.
This command is used to produce an arbitrary amount of data on the
output. It can be used to test the CLI's state machine as well as
the internal parts related to applets an I/O. A typical test consists
in asking for all sizes from 0 to 16384:
$ (echo "prompt;expert-mode on";for i in {0..16384}; do
echo "debug dev write $i"; done) | socat - /tmp/sock1 | wc -c
134258738
A better test would consist in first waiting for the response before
sending a new request.
This command is not restricted to the admin since it's harmless.
There's a build error reported here:
c9c6cdbf9c/checks
It's just caused by an inconditional assignment of tmp_filter to
*sni_filter without having been initialized, though it's harmless because
this return pointer is not used when fcount is NULL, which is the only
case where this happens.
No backport is needed as this was brought today by commit 38df1c8006
("MINOR: ssl/cli: support crt-list filters").
It was only possible to go down from the ckch_store to the sni_ctx but
not to go up from the sni_ctx to the ckch_store.
To allow that, 2 pointers were added:
- a ckch_inst pointer in the struct sni_ctx
- a ckckh_store pointer in the struct ckch_inst
Generate a list of the previous filters when updating a certificate
which use filters in crt-list. Then pass this list to the function
generating the sni_ctx during the commit.
This feature allows the update of the crt-list certificates which uses
the filters with "set ssl cert".
This function could be probably replaced by creating a new
ckch_inst_new_load_store() function which take the previous sni_ctx list as
an argument instead of the char **sni_filter, avoiding the
allocation/copy during runtime for each filter. But since are still
handling the multi-cert bundles, it's better this way to avoid code
duplication.
When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.
No backport is needed, this is 2.2-dev.
This is another round of conversion from a blind tasklet_wakeup() to
a more careful subscribe(). It has significantly improved the number
of function calls per HTTP request (/?s=1k/t=20) :
before after
tasklet_wakeup: 3 2
conn_subscribe: 3 2
h1_iocb: 3 2
h1_process: 3 2
h1_parse_msg_hdrs: 4 3
h1_rcv_buf: 5 3
h1_send: 5 4
h1_subscribe: 2 1
h1_wake_stream_for_send: 5 4
http_wait_for_request: 2 1
process_stream: 3 2
si_cs_io_cb: 4 2
si_cs_process: 3 1
si_cs_rcv: 5 3
si_sync_send: 2 1
si_update_both: 2 1
stream_int_chk_rcv_conn: 3 2
stream_int_notify: 3 1
stream_release_buffers: 9 4
In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.
We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.
The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.
In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.
This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.
With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.
And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.
No backport is needed, this is 2.2-dev.
The approach was wrong. USE_DL is for the makefile to know if it's required
to append -ldl at link time. Some platforms do not need it (and in fact do
not have it) yet they have a working dladdr(). The real condition is related
to ELF. Given that due to Lua, all platforms that require -ldl already have
USE_DL set, let's replace USE_DL with __ELF__ here and consider that dladdr
is always needed on ELF, which basically is already the case.
resolve_sym_name() doesn't build when USE_DL is set on non-GNU platforms
because "Elf(W)" isn't defined. Since it's only used for dladdr1(), let's
refactor all this so that we can completely ifdef out that part on other
platforms. Now we have a separate function to perform the call depending
on the platform and it only returns the size when available.
Instead of special-casing the use of the symbol resolving to decide
whether to dump a partial or complete trace, let's simply start over
and dump everything when we reach the end after having found nothing.
It will be more robust against dirty traces as well.