Counters are currently stored into lowlevel nameservers struct but
most of them are resolving layer data and increased in the upper layer
So this patch renames the prototype used to allocate/dump them with prefix
'resolv' waiting for a clean split.
Some types are specific to resolver code and a renamed using
the 'resolv' prefix instead 'dns'.
-struct dns_query_item {
+struct resolv_query_item {
-struct dns_answer_item {
+struct resolv_answer_item {
-struct dns_response_packet {
+struct resolv_response {
Resolv callbacks are also updated to rely on counters and not on
nameservers.
"show stat domain dns" will now show the parent id (i.e. resolvers
section name).
When a server dns resolution is performed, there is no reason to set an
unconfigured check port with the server port. Because by default, if the
check port is not set, the server's one is used. Thus we can remove this
useless assignment. It is mandatory for next improvements.
V2 of this fix which includes a missing pointer initialization which was
causing a segfault in v1 (949a7f6459)
This bug happens when a service has multiple records on the same host
and the server provides the A/AAAA resolution in the response as AR
(Additional Records).
In such condition, the first occurence of the host will be taken from
the Additional section, while the second (and next ones) will be process
by an independent resolution task (like we used to do before 2.2).
This can lead to a situation where the "synchronisation" of the
resolution may diverge, like described in github issue #971.
Because of this behavior, HAProxy mixes various type of requests to
resolve the full list of servers: SRV+AR for all "first" occurences and
A/AAAA for all other occurences of an existing hostname.
IE: with the following type of response:
;; ANSWER SECTION:
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A2.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 86 A3.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A1.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 85 A3.tld.
;; ADDITIONAL SECTION:
A2.tld. 3600 IN A 192.168.0.2
A3.tld. 3600 IN A 192.168.0.3
A1.tld. 3600 IN A 192.168.0.1
A3.tld. 3600 IN A 192.168.0.3
the first A3 host is resolved using the Additional Section and the
second one through a dedicated A request.
When linking the SRV records to their respective Additional one, a
condition was missing (chek if said SRV record is already attached to an
Additional one), leading to stop processing SRV only when the target
SRV field matches the Additional record name. Hence only the first
occurence of a target was managed by an additional record.
This patch adds a condition in this loop to ensure the record being
parsed is not already linked to an Additional Record. If so, we can
carry on the parsing to find a possible next one with the same target
field value.
backport status: 2.2 and above
This reverts commit 949a7f6459.
The first part of the patch introduces a bug. When a dns answer item is
allocated, its <ar_item> is only initialized at the end of the parsing, when
the item is added in the answer list. Thus, we must not try to release it
during the parsing.
The second part is also probably buggy. It fixes the issue #971 but reverts
a fix for the issue #841 (see commit fb0884c8297 "BUG/MEDIUM: dns: Don't
store additional records in a linked-list"). So it must be at least
revalidated.
This revert fixes a segfault reported in a comment of the issue #971. It
must be backported as far as 2.2.
This bug happens when a service has multiple records on the same host
and the server provides the A/AAAA resolution in the response as AR
(Additional Records).
In such condition, the first occurence of the host will be taken from
the Additional section, while the second (and next ones) will be process
by an independent resolution task (like we used to do before 2.2).
This can lead to a situation where the "synchronisation" of the
resolution may diverge, like described in github issue #971.
Because of this behavior, HAProxy mixes various type of requests to
resolve the full list of servers: SRV+AR for all "first" occurences and
A/AAAA for all other occurences of an existing hostname.
IE: with the following type of response:
;; ANSWER SECTION:
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A2.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 86 A3.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A1.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 85 A3.tld.
;; ADDITIONAL SECTION:
A2.tld. 3600 IN A 192.168.0.2
A3.tld. 3600 IN A 192.168.0.3
A1.tld. 3600 IN A 192.168.0.1
A3.tld. 3600 IN A 192.168.0.3
the first A3 host is resolved using the Additional Section and the
second one through a dedicated A request.
When linking the SRV records to their respective Additional one, a
condition was missing (chek if said SRV record is already attached to an
Additional one), leading to stop processing SRV only when the target
SRV field matches the Additional record name. Hence only the first
occurence of a target was managed by an additional record.
This patch adds a condition in this loop to ensure the record being
parsed is not already linked to an Additional Record. If so, we can
carry on the parsing to find a possible next one with the same target
field value.
backport status: 2.2 and above
Use the new stats module API to integrate the dns counters in the
standard stats. This is done in order to avoid code duplication, keep
the code related to cli out of dns and use the full possibility of the
stats function, allowing to print dns stats in csv or json format.
When a SRV record for an already known server is processed, only the weight is
updated, if not configured to be ignored. It is a problem if the IP address
carried by the associated additional record changes. Because the server IP
address is never renewed.
To fix this bug, If there is an addition record attached to a SRV record, we
always try to set the IP address. If it is the same, no change is
performed. This way, IP changes are always handled.
This patch should fix the issue #841. It must be backported to 2.2.
A SRV record keeps a reference on the corresponding additional record, if
any. But this additional record is also inserted in a separate linked-list into
the dns response. The problems arise when obsolete additional records are
released. The additional records list is purged but the SRV records always
reference these objects, leading to an undefined behavior. Worst, this happens
very quickly because additional records are never renewed. Thus, once received,
an additional record will always expire.
Now, the addtional record are only associated to a SRV record or simply
ignored. And the last version is always used.
This patch helps to fix the issue #841. It must be backported to 2.2.
A regression was introduced by 13a9232ebc
when I added support for Additional section of the SRV responses..
Basically, when a server is managed through SRV records additional
section and it's disabled (because its associated Additional record has
disappeared), it never leaves its MAINT state and so never comes back to
production.
This patch updates the "snr_update_srv_status()" function to clear the
MAINT status when the server now has an IP address and also ensure this
function is called when parsing Additional records (and associating them
to new servers).
This can cause severe outage for people using HAProxy + consul (or any
other service registry) through DNS service discovery).
This should fix issue #793.
This should be backported to 2.2.
When an action is evaluated, flags are passed to know if it is the first call
(ACT_OPT_FIRST) and if it must be the last one (ACT_OPT_FINAL). For the
do-resolve DNS action, the ACT_OPT_FINAL flag must be handled because the
action may yield. It must never yield when this flag is set. Otherwise, it may
lead to a wakeup loop of the stream because the inspected-delay of a tcp-request
content ruleset was reached without stopping the rules evaluation.
This patch is related to the issue #222. It must be backported as far as 2.0.
Support for DNS Service Discovery by means of SRV records was enhanced with
commit 13a9232eb ("MEDIUM: dns: use Additional records from SRV responses")
to use the content of the answers Additional records when present.
If there are Authority records before the Additional records we mistakenly
treat that as an invalid response. To fix this, just ignore the Authority
section if it exist and skip to the Additional records.
As 13a9232eb was introduced during 2.2-dev, it must be backported to 2.2.
This is a fix for issue #778
The previous leak on do-resolve was particularly tricky to check due
to the important code repetition in dns_validate_dns_response() which
required careful examination of all return statements to check whether
they needed a pool_free() or not. Let's clean all this up using a common
leave point which releases the element itself. This also encourages
to properly set the current response to null right after freeing or
adding it so that it doesn't get added. 45 return and 22 pool_free()
were replaced by one of each.
When a DNS resolution is freed, the remaining items in .ar_list and .answer_list
are also released. It must be done to avoid a memory leak. And it is the last
chance to release these objects. I've honestly no idea if there is a better
place to release them earlier. But at least, there is no more leak.
This patch should solve the issue #222. It must be backported, at least, as far
as 2.0, and probably, with caution, as far as 1.8 or 1.7.
The do-resolve HTTP action, performing a DNS resolution of a sample expression
output, is not thread-safe at all. The resolver object used to do the resolution
must be locked when the action is executed or when the stream is released
because its curr or wait resolution lists and the requester list inside a
resolution are updated. It is also important to not wake up a released stream
(with a destroyed task).
Of course, because of this bug, various kind of crashes may be observed.
This patch should fix the issue #236. It must be backported as far as 2.0.
NetBSD apparently uses macros for tolower/toupper and complains about
the use of char for array subscripts. Let's properly cast all of them
to unsigned char where they are used.
This is needed to fix issue #729.
The set of files proto_udp.{c,h} were misleadingly named, as they do not
provide anything related to the UDP protocol but to datagram handling
instead, since currently all UDP processing is hard-coded where it's used
(dns, logs). They are to UDP what connection.{c,h} are to proto_tcp. This
was causing confusion about how to insert UDP socket management code,
so let's rename them right now to dgram.{c,h} which more accurately
matches what's inside since every function and type is already prefixed
with "dgram_".
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declaration of static inline functions. About a tens of C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.
This one still relies on applet.h and stick-table.h.
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h
Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
We can't expect the DNS answer to always match the case we used for the
request, so we can't just use memcmp() to compare the DNS answer with what
we are expected.
Instead, introduce dns_hostname_cmp(), which compares each string in a
case-insensitive way.
This should fix github issue #566.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
13a9232ebc introduced parsing of
Additionnal DNS response section to pick up IP address when available.
That said, this introduced a side effect for other query types (A and
AAAA) leading to consider those responses invalid when parsing the
Additional section.
This patch avoids this situation by ensuring the Additional section is
parsed only for SRV queries.
As per issue #435 a hostname with a trailing dot confuses our DNS code,
as for a zero length DNS label we emit a null-byte. This change makes us
ignore the zero length label instead.
Must be backported to 1.8.
When an end pointer is passed, instead of complaining that a comma is
missing after a keyword, sample_parse_expr() will silently return the
pointer to the current location into this return pointer so that the
caller can continue its parsing. This will be used by more complex
expressions which embed sample expressions, and may even permit to
embed sample expressions into arguments of other expressions.
<.arg.dns.dns_opts> field in the act_rule structure is now dynamically allocated
when a do-resolve rule is parsed. This drastically reduces the structure size.
hostname were limited to 62 char, which is not RFC1035 compliant;
- the parsing loop should stop when above max label char
- fix len label test where d[i] was wrongly used
- simplify the whole function to avoid using two extra char* variable
this should fix github issue #387
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
Acked-by: Baptiste <bedis9@gmail.com>
Most DNS servers provide A/AAAA records in the Additional section of a
response, which correspond to the SRV records from the Answer section:
;; QUESTION SECTION:
;_http._tcp.be1.domain.tld. IN SRV
;; ANSWER SECTION:
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A1.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A8.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A5.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A6.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A4.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A3.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A2.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A7.domain.tld.
;; ADDITIONAL SECTION:
A1.domain.tld. 3600 IN A 192.168.0.1
A8.domain.tld. 3600 IN A 192.168.0.8
A5.domain.tld. 3600 IN A 192.168.0.5
A6.domain.tld. 3600 IN A 192.168.0.6
A4.domain.tld. 3600 IN A 192.168.0.4
A3.domain.tld. 3600 IN A 192.168.0.3
A2.domain.tld. 3600 IN A 192.168.0.2
A7.domain.tld. 3600 IN A 192.168.0.7
SRV record support was introduced in HAProxy 1.8 and the first design
did not take into account the records from the Additional section.
Instead, a new resolution is associated to each server with its relevant
FQDN.
This behavior generates a lot of DNS requests (1 SRV + 1 per server
associated).
This patch aims at fixing this by:
- when a DNS response is validated, we associate A/AAAA records to
relevant SRV ones
- set a flag on associated servers to prevent them from running a DNS
resolution for said FADN
- update server IP address with information found in the Additional
section
If no relevant record can be found in the Additional section, then
HAProxy will failback to running a dedicated resolution for this server,
as it used to do.
This behavior is the one described in RFC 2782.
Some custom actions are just ignored and skipped when an error is encoutered. In
that case, we jump to the next rule. To do so, most of them use the return code
ACT_RET_ERR. Currently, for http rules and tcp content rules, it is not a
problem because this code is handled the same way than ACT_RET_CONT. But, it
means there is no way to handle the error as other actions. The custom actions
must handle the error and return ACT_RET_DONE. For instance, when http-request
rules are processed, an error when we try to replace a header value leads to a
bad request and an error 400 is returned to the client. But when we fail to
replace the URI, the error is silently ignored. This difference between the
custom actions and the others is an obstacle to write new custom actions.
So, in this first patch, ACT_RET_CONT is now returned from custom actions
instead of ACT_RET_ERR when an error is encoutered if it should be ignored. The
behavior remains the same but it is now possible to handle true errors using the
return code ACT_RET_ERR. Some actions will probably be reviewed to determine if
an error is fatal or not. Other patches will be pushed to trigger an error when
a custom action returns the ACT_RET_ERR code.
This patch is not tagged as a bug because it is just a design issue. But others
will depends on it. So be careful during backports, if so.
Left shifting of large signed values and negative values is undefined.
In a test script clang's ubsan rightfully complains:
> runtime error: left shift of 1934242336581872173 by 13 places cannot be represented in type 'int64_t' (aka 'long')
This bug was introduced in the initial version of the DNS resolver
in 325137d603. The fix must be backported
to HAProxy 1.6+.
In dns_send_query(), there's no point in first waking up the FD, to get
called back by the poller to send the request and sleep. Instead let's
simply send the request as soon as it's known and only subscribe to the
poller when the socket buffers are full and it's required to poll (i.e.
almost never).
This significantly reduces the number of calls to the poller. A large
config sees the number of epoll_ctl() calls reduced from 577 to 7 over
10 seconds, the number of recvfrom() from 1533 to 582 and the number of
sendto() from 369 to 162.
It also has the extra benefit of building each requests only once per
resolution and sending it to multiple resolvers instead of rebuilding
it for each and every resolver.
This will reduce the risk of seeing situations similar to bug #416 in
the future.
It was reported in bug #399 that the DNS sometimes enters endless loops
after hours working fine. The issue is caused by a lack of error
processing in the DNS's recv() path combined with an exclusive recv OR
send in the UDP layer, resulting in some errors causing CPU loops that
will never stop until the process is restarted.
The basic cause is that the FD_POLL_ERR and FD_POLL_HUP flags are sticky
on the FD, and contrary to a stream socket, receiving an error on a
datagram socket doesn't indicate that this socket cannot be used anymore.
Thus the Rx code must at least handle this situation and flush the error
otherwise it will constantly be reported. In theory this should not be a
big issue but in practise it is due to another bug in the UDP datagram
handler which prevents the send() callback from being called when Rx
readiness was reported, so the situation cannot go away. It happens way
more easily with threads enabled, so that there is no dead time between
the moment the FD is disabled and another recv() is called, such as in
the example below where the request was sent to a closed port on the
loopback provoking an ICMP unreachable to be sent back:
[pid 20888] 18:26:57.826408 sendto(29, ";\340\1\0\0\1\0\0\0\0\0\1\0031wt\2eu\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 35, 0, NULL, >
[pid 20893] 18:26:57.826566 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 ECONNREFUSED (Connection refused)
[pid 20889] 18:26:57.826601 recvfrom(29, 0x7f97c76182f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20892] 18:26:57.826630 recvfrom(29, 0x7f97c5cf02f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20891] 18:26:57.826684 recvfrom(29, 0x7f97c66162f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20895] 18:26:57.826716 recvfrom(29, 0x7f97bffda2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20894] 18:26:57.826747 recvfrom(29, 0x7f97c4cee2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20888] 18:26:58.419838 recvfrom(29, 0x7ffcc8712c20, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20893] 18:26:58.419900 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
(... hundreds before next sendto() ...)
This situation was handled by clearing HUP and ERR when recv()
returns <0.
A second case was handled, there was a control for a missing dgram
handler, but it does nothing, causing the FD to ring again if this
situation ever happens. After looking at the rest of the code, it
doesn't seem possible to face such a situation because these handlers
are registered during startup, but at least we need to handle it
properly.
A third case was handled, that's mainly a small optimization. With
threads and massive responses, due to the large lock around the loop,
it's likely that some threads will have seen fd_recv_ready() and will
wait at the lock(). But if they wait here, chances are that other
threads will have eliminated pending data and issued fd_cant_recv().
In this case, better re-check fd_recv_ready() before performing the
recv() call to avoid the huge amounts of syscalls that happen on
massively threaded setups.
This patch must be backported as far as 1.6 (the atomic AND just
needs to be turned to a regular AND).
`eb` being tested above, `res` cannot be null, so the condition is
not needed and introduces potential dead code.
also fix a typo in associated comment
This should fix issue #349
Reported-by: Илья Шипицин <chipitsine@gmail.com>
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
It was noted in #48 that there are times when a configuration
may use the server-template directive with SRV records and
simultaneously want to control weights using an agent-check or
through the runtime api. This patch adds a new option
"ignore-weight" to the "resolve-opts" directive.
When specified, any weight indicated within an SRV record will
be ignored. This is for both initial resolution and ongoing
resolution.
Documentation states that the interval between 2 DNS resolution is
driven by "timeout resolve <time>" directive.
From a code point of view, this was applied unless the latest status of
the resolution was VALID. In such case, "hold valid" was enforce.
This is a bug, because "hold" timers are not here to drive how often we
want to trigger a DNS resolution, but more how long we want to keep an
information if the status of the resolution itself as changed.
This avoid flapping and prevent shutting down an entire backend when a
DNS server is not answering.
This issue was reported by hamshiva in github issue #345.
Backport status: 1.8
As reported by David Birdsong on the ML, the HTTP action do-resolve does
not use the DNS cache.
Actually, the action is "registred" to the resolution for said name to
be resolved and wait until an other requester triggers the it. Once the
resolution is finished, then the action is updated with the result.
To trigger this, you must have a server with runtime DNS resolution
enabled and run a do-resolve action with the same fqdn AND they use the
same resolvers section.
This patch fixes this behavior by ensuring the resolution associated to
the action has a valid answer which is not considered as expired. If
those conditions are valid, then we can use it (it's the "cache").
Backport status: 2.0
Processing of SRV record weight was inaccurate and when a SRV record's
weight was set to 0, HAProxy enforced it to '1'.
This patch aims at fixing this without breaking compability with
previous behavior.
Backport status: 1.8 to 2.0
@davidmogar reported a github issue (#227) about problems with
do-resolve action when the request contains a body.
The variable was never populated in such case, despite tcpdump shows a
valid DNS response coming back.
The do-resolve action is a task in HAProxy and so it's waken by the
scheduler each time the scheduler think such task may have some work to
do.
When a simple HTTP request is sent, then the task is called, it sends
the DNS request, then the scheduler will wake up the task again later
once the DNS response is there.
Now, when the client send a PUT or a POST request (or any other type)
with a BODY, then the do-resolve action if first waken up once the
headers are processed. It sends the DNS request. Then, when the bytes
for the body are processed by HAProxy AND the DNS response has not yet
been received, then the action simply terminates and cleans up all the
data associated to this resolution...
This patch detect such behavior and if the action is now waken up while
a DNS resolution is in RUNNING state, then the action will tell the
scheduler to wake it up again later.
Backport status: 2.0 and above
There were 221 places where a status message or an error message were built
to be returned on the CLI. All of them were replaced to use cli_err(),
cli_msg(), cli_dynerr() or cli_dynmsg() depending on what was expected.
This removed a lot of duplicated code because most of the times, 4 lines
are replaced by a single, safer one.
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved in
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
The do-resolve action tests for a client connection to the stream and
tries to get the client's address, otherwise it refrains from performing
the resolution. This really makes no sense at all and looks like an
earlier attempt at resolving the client's address to test that the
code was working. Further, it prevents the action from being used
from other places such as an autonomous applet for example, even if
at the moment this use case does not exist.
This patch simply removes the irrelevant test.
This can be backported to 2.0.
The 'do-resolve' action is an http-request or tcp-request content action
which allows to run DNS resolution at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
The time the resolution is being performed, the request is on pause.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
statement (return 503 to prevent loops).
Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.
This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
In dns.c, dns_link_resolution(), each type of dns requester is managed
separately, that said, the callback function is affected globaly (and
points to server type callbacks only).
This design prevents the addition of new dns requester type and this
patch aims at fixing this limitation: now, the callback setting is done
directly into the portion of code dedicated to each requester type.
dns_requester structure can be allocated at run time when servers get
associated to DNS resolution (this happens when SRV records are used in
conjunction with service discovery).
Well, this memory allocation is safer if managed in an HAProxy pool,
furthermore with upcoming HTTP action which can perform DNS resolution
at runtime.
This patch moves the memory management of the dns_requester structure
into its own pool.
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error pathes to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
In dns_read_name() when dns name is used with compression and start position of
name is greater than 255 name read is incorrect and causes invalid dns error.
eg: 0xc11b c specifies name compression being used. 11b represent the start
position of name but currently we are using only 1b for start position.
This should be backported as far as 1.7.
A regression was introduced with efbbdf72 BUG: dns: Prevent out-of-bounds
read in dns_validate_dns_response() as it prevented from taking into account
the last byte of the payload. this patch aims at fixing it.
this must be backported in 1.8.
We need to make sure that the record length is not making us read
past the end of the data we received.
Before this patch we could for example read the 16 bytes
corresponding to an AAAA record from the non-initialized part of
the buffer, possibly accessing anything that was left on the stack,
or even past the end of the 8193-byte buffer, depending on the
value of accepted_payload_size.
To be backported to 1.8, probably also 1.7.
Some callers of dns_read_name() do not make sure that we can read
the first byte, holding the length of the next label, without going
past our buffer, so we need to make sure of that.
In addition, if the label is a compressed one we need to make sure
that we can read the following byte to compute the target offset.
To be backported to 1.8, probably also 1.7.
When a compressed pointer is encountered, dns_read_name() will call
itself with the pointed-to offset in the packet.
With a specially crafted packet, it was possible to trigger an
infinite-loop recursion by making the pointer points to itself.
While it would be possible to handle that particular case differently
by making sure that the target is different from the current offset,
it would still be possible to craft a packet with a very long chain
of valid pointers, always pointing backwards. To prevent a stack
exhaustion in that case, this patch restricts the number of recursive
calls to 100, which should be more than enough.
To be backported to 1.8, probably also 1.7.
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This allowed to remove 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and became static now.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.
Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
This flag is not enough to describe all blocking situations, as can be
seen in each case we remove it. The muxes has taught us that using multiple
blocking flags in parallel will be much easier, so let's start to do this
now. This patch only renames this flags in order to make next changes more
readable.
Like for the other checks, the type is being tested just before calling
objt_{server,dns_srvrq}() so let's use the unguarded version instead to
silence the warning.
On the Mailing list, Marcos Moreno reported that haproxy configuration
validation (through "haproxy -c cfgfile") does not detect when a
resolvers section does not exist for a server.
That said, this checking is done after HAProxy has started up.
The problem is that this can create production issue, since init
script can't detect the problem before starting / reloading HAProxy.
To fix this issue, this patch registers the function which validates DNS
configuration validity and run it right after configuration parsing is
finished (through cfg_register_postparser()).
Thanks to it, now "haproxy -c cfgfile" will fail when a server
points to a non-existing resolvers section (or any other validation made
by the function above).
Backport status: 1.8
By convenience or laziness we used to store dns_build_query()'s return code
into trash.data. The result checks applied there compare trash.data to -1
while it's now unsigned since commit 843b7cb ("MEDIUM: chunks: make the
chunk struct's fields match the buffer struct"). Let's clean this up
and test the result itself without storing it first.
No backport is needed.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.
This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
dns_get_ip_from_response() is used to compare the caller current IP to
the IP available in the records returned by the DNS server.
A scoring system is in place to get the best IP address available.
That said, in the current implementation, there are a couple of issues:
1. a comment does not match what the code does
2. the code does not match what the commet says (score value is not
incremented with '2')
This patch fixes both issues.
Backport status: 1.8
The comment was about "prefered network ip version" while it's actually
"prefered ip version" in the code.
Fixed
Backport status: 1.7 and 1.8
Be careful, this patch may not apply on 1.7, since the score was '4'
for this item at that time.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
When checks fail, the code tries to run a dns resolution, in case the IP
changed.
The old way of doing that was to check, in case the last dns resolution
hadn't expired yet, if there were an applicable IP, which should be useless,
because it has already be done when the resolution was first done, or to
run a new resolution.
Both are a locking nightmare, and lead to deadlocks, so instead, just wake the
resolvers task, that should do the trick.
This should be backported to 1.8.
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.
Per-command support will need to be added to take advantage of this
feature.
Signed-off-by: Aurlien Nephtali <aurelien.nephtali@corp.ovh.com>
issue was identified by cppcheck
[src/dns.c:2037] -> [src/dns.c:2041]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
Automatic downgrade of DNS accepted payload size may have undesired side
effect, which could make a backend with all servers DOWN.
After talking with Lukas on the ML, I realized this "feature" introduces
more issues that it fixes problem.
The "best" way to handle properly big responses will be to implement DNS
over TCP.
To be backported to 1.8.
fd_insert() is currently called just after setting the owner and iocb,
but proceeding like this prevents the operation from being atomic and
requires a lock to protect the maxfd computation in another thread from
meeting an incompletely initialized FD and computing a wrong maxfd.
Fortunately for now all fdtab[].owner are set before calling fd_insert(),
and the first lock in fd_insert() enforces a memory barrier so the code
is safe.
This patch moves the initialization of the owner and iocb to fd_insert()
so that the function will be able to properly arrange its operations and
remain safe even when modified to become lockless. There's no other change
beyond the internal API.
A SRV record weight can range from 0 to 65535, while haproxy weight goes
from 0 to 256, so we have to divide it by 256 before handing it to haproxy.
Also, a SRV record with a weight of 0 doesn't mean the server shouldn't be
used, so use a minimum weight of 1.
This should probably be backported to 1.8.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.
[wt: some of these are definitely fixes that are worth backporting]
snr_check_ip_callback() may be called with the server lock, so don't attempt
to lock it again, instead, make sure the callers always have the lock before
calling it.
srv_set_fqdn() may be called with the DNS lock already held, but tries to
lock it anyway. So, add a new parameter to let it know if it was already
locked or not;
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intension
when you create a new task, else you have a good reason to make a task sticky on
some threads. This is then the responsibility to the process callback to lock
what have to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a uniq patch. Here are all changes made on the DNS.
First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connection are kept alive. So there is no needs
anymore to reopen them after HAProxy fork. Connections used to send DNS queries
will be opened on demand.
Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. wait and curr list of requesters, for a DNS
resolution, have been replaced by a uniq list. And Finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.
srv_set_fqdn function has been simplified. Now, there is only 1 way to set the
server's FQDN, independently it is done by the CLI or when a SRV record is
resolved.
The static DNS resolutions pool has been replaced by a dynamoc pool. The part
has been modified by Baptiste Assmann.
The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected. Especially
hold.valid. The default frequency to wake up a resolvers is now configurable
using "timeout resolve" parameter.
Now, as documented, as long as invalid repsonses are received, we really wait
all name servers responses before retrying.
As far as possible, resources allocated during DNS configuration parsing are
releases when HAProxy is shutdown.
Beside all these changes, the code has been cleaned to ease code review and the
doc has been updated.
The cli command to show resolvers stats is in conflict with the command to show
proxies and servers stats. When you use the command "show stat resolvers [id]",
instead of printing stats about resolvers, you get the stats about all proxies
and servers.
Now, to avoid conflict, to print resolvers stats, you must use the following
command:
show resolvers [id]
This patch must be backported in 1.7.
For HTTP/2 we'll need some buffer-only equivalent functions to some of
the ones applying to channels and still squatting the bi_* / bo_*
namespace. Since these names have kept being misleading for quite some
time now and are really getting annoying, it's time to rename them. This
commit will use "ci/co" as the prefix (for "channel in", "channel out")
instead of "bi/bo". The following ones were renamed :
bi_getblk_nc, bi_getline_nc, bi_putblk, bi_putchr,
bo_getblk, bo_getblk_nc, bo_getline, bo_getline_nc, bo_inject,
bi_putchk, bi_putstr, bo_getchr, bo_skip, bi_swpbuf
This patch adds the ability to read from a wrapping memory area (ie:
buffers). The new functions are called "readv_<type>". The original
ones were renamed to start with "read_" to make the difference more
obvious between the read method and the returned type.
It's worth noting that the memory barrier in readv_bytes() is critical,
as otherwise gcc decides that it doesn't need the resulting data, but
even worse, removes the length checks in readv_u64() and happily
performs an out-of-bounds unaligned read using read_u64()! Such
"optimizations" are a bit borderline, especially when they impact
security like this...
Since the DNS layer split and the use of obj_type structure, we did not
updated propoerly the code used to compute the interval between 2
resolutions.
A nasty loop was then created when:
- resolver's hold.valid is shorter than servers' check.inter
- a valid response is available in the DNS cache
A task was woken up for a server's resolution. The servers pick up the IP
in the cache and returns without updating the 'last update' timestamp of
the resolution (which is normal...). Then the task is woken up again for
the same server.
The fix simply computes now properly the interval between 2 resolutions
and the cache is used properly while a new resolution is triggered if
the data is not fresh enough.
a reader pointer comparison to the end of the buffer was performed twice
while once is obviously enough.
backport status: this patch can be backported into HAProxy 1.6 (with some
modification. Please contact me)
For troubleshooting purpose, it may be important to know when a server
got its fqdn updated by a SRV record.
This patch makes HAProxy to report such events through stderr and logs.
RFC 6891 states that if a DNS client announces "big" payload size and
doesn't receive a response (because some equipments on the path may
block/drop UDP fragmented packets), then it should try asking for
smaller responses.
Following up DNS extension introduction, this patch aims at making the
computation of the maximum number of records in DNS response dynamic.
This computation is based on the announced payload size accepted by
HAProxy.
This patch fixes a bug where some servers managed by SRV record query
types never ever recover from a "no resolution" status.
The problem is due to a wrong function called when breaking the
server/resolution (A/AAAA) relationship: this is performed when a server's SRV
record disappear from the SRV response.
Edns extensions may be used to negotiate some settings between a DNS
client and a server.
For now we only use it to announce the maximum response payload size accpeted
by HAProxy.
This size can be set through a configuration parameter in the resolvers
section. If not set, it defaults to 512 bytes.
Current code implementation prevents multiple backends from relying on
the same SRV resolution. Actually, only the first backend which triggers
the resolution gets updated.
This patch makes HAProxy to process the whole list of the 'curr'
requesters to apply the changes everywhere (hence, the cache also applies
to SRV records...)
This function is particularly useful when debugging DNS resolution at
run time in HAProxy.
SRV records must be read differently, hence we have to update this
function.
DNS SRV records uses "dns name compression" to store the target name.
"dns compression" principle is simple. Let's take the name below:
3336633266663038.red.default.svc.cluster.local.
It can be stored "as is" in the response or it can be compressed like
this:
3336633266663038<POINTER>
and <POINTER> would point to the string
'.red.default.svc.cluster.local.' availble in the question section for
example.
This mechanism allows storing much more data in a single DNS response.
This means the flag "record->data_len" which stores the size of the
record (hence the whole string, uncompressed) can't be used to move the
pointer forward when reading responses. We must use the "offset" integer
which means the real number of bytes occupied by the target name.
If we don't do that, we can properly read the first SRV record, then we
loose alignment and we start reading unrelated data (still in the
response) leading to a false negative error treated as an "invalid"
response...
DNS response for SRV queries look like this:
- query dname looks like '_http._tcp.red.default.svc.cluster.local'
- answer record dname looks like
'3336633266663038.red.default.svc.cluster.local.'
Of course, it never matches... and it triggers many false positive in
the current code (which is suitable for A/AAAA/CNAME).
This patch simply ignores this dname matching in the case of SRV query
type.
First implementation of the DNS parser used to consider TRUNCATED
responses as errors and triggered a failover to an other query type
(usually A to AAAA or vice-versa).
When we query for SRV records, a TRUNCATED response still contains valid
records we can exploit, so we shouldn't trigger a failover in such case.
Note that we had to move the maching against the flag later in the
response parsing (actually, until we can read the query type....)
Make it so for each server, instead of specifying a hostname, one can use
a SRV label.
When doing so, haproxy will first resolve the SRV label, then use the
resulting hostnames, as well as port and weight (priority is ignored right
now), to each server using the SRV label.
It is resolved periodically, and any server disappearing from the SRV records
will be removed, and any server appearing will be added, assuming there're
free servers in haproxy.
As DNS servers may not return all IPs in one answer, we want to cache the
previous entries. Those entries are removed when considered obsolete, which
happens when the IP hasn't been returned by the DNS server for a time
defined in the "hold obsolete" parameter of the resolver section. The default
is 30s.
The commit 201c07f68 ("MAJOR/REORG: dns: DNS resolution task and
requester queues") introduces a warning during compilation:
src/dns.c: In function ‘dns_resolve_recv’:
src/dns.c:487:6: warning: ‘need_resend’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (need_resend) {
^
This patch initialize the variable and remove the comment about it.
This patch is a major upgrade of the internal run-time DNS resolver in
HAProxy and it brings the following 2 main changes:
1. DNS resolution task
Up to now, DNS resolution was triggered by the health check task.
From now, DNS resolution task is autonomous. It is started by HAProxy
right after the scheduler is available and it is woken either when a
network IO occurs for one of its nameserver or when a timeout is
matched.
From now, this means we can enable DNS resolution for a server without
enabling health checking.
2. Introduction of a dns_requester structure
Up to now, DNS resolution was purposely made for resolving server
hostnames.
The idea, is to ensure that any HAProxy internal object should be able
to trigger a DNS resolution. For this purpose, 2 things has to be done:
- clean up the DNS code from the server structure (this was already
quite clean actually) and clean up the server's callbacks from
manipulating too much DNS resolution
- create an agnostic structure which allows linking a DNS resolution
and a requester of any type (using obj_type enum)
3. Manage requesters through queues
Up to now, there was an uniq relationship between a resolution and it's
owner (aka the requester now). It's a shame, because in some cases,
multiple objects may share the same hostname and may benefit from a
resolution being performed by a third party.
This patch introduces the notion of queues, which are basically lists of
either currently running resolution or waiting ones.
The resolutions are now available as a pool, which belongs to the resolvers.
The pool has has a default size of 64 resolutions per resolvers and is
allocated at configuration parsing.
This patch introduces a bit of roundrobin in the records stored in our
local cache.
Purpose is to allow some kind of distribution of the IPs found in a
response.
Note that distribution properly applies only when the IP used by many
requesters disappear and is replaced by an other one.
ancount is the number of answers available in a DNS response.
Before this patch, HAProxy used to store the ancount found in the buffer
(sent by the DNS server).
Unfortunately, this is now inaccurate and does not correspond to the
number of records effectively stored in our local version of the
response. In Example, the CNAMEs are not stored.
This patch updates ancount field in to make it match what is effectively
stored in our version.
Introduction of a DNS response LRU cache in HAProxy.
When a positive response is received from a DNS server, HAProxy stores
it in the struct resolution and then also populates a LRU cache with the
response.
For now, the key in the cache is a XXHASH64 of the hostname in the
domain name format concatened to the query type in string format.
Prior this patch, the DNS responses were stored in a pre-allocated
memory area (allocated at HAProxy's startup).
The problem is that this memory is erased for each new DNS responses
received and processed.
This patch removes the global memory allocation (which was not thread
safe by the way) and introduces a storage of the dns response in the
struct
resolution.
The memory in the struct resolution is also reserved at start up and is
thread safe, since each resolution structure will have its own memory
area.
For now, we simply store the response and use it atomically per
response per server.