The function resolv_update_resolvers_timeout() always schedules a wakeup
of the process_resolvers() task based on the "timeout resolve" setting,
regardless of the presence of an ongoing resolution or not. This is causing
one wakeup every second by default even when there's no resolvers section
(due to the default one), and can even be worse: creating a section with
"timeout resolve 1" bombs the process with 1000 wakeups per second.
Let's condition the setting to the presence of a resolution to address
this.
This issue has been there forever, but it doesn't cause that much trouble,
and given how fragile and tricky this code is, it's probably wise to
refrain from backporting it until it's reported to really cause trouble.
For a server subject to SRV resolution, when the server's address is set,
its dynamic cookie, if any, and its server key are computed. Both are based
on the ip/port pair. However, this happens before the server's port is
set. Thus the port is equal to 0 at this stage. It is a problem if several
servers share the same IP but with different ports because they will share
the same dynamic cookie and the same server key, disturbing this way the
connection persistency and the session stickiness.
This patch must be backported as far as 2.2.
DNS resoltions may be triggered via a "do-resolve" action or when a connection
failure is experienced during a healthcheck. Cached valid responses are used, if
possible. But if the entry is expired or if there is no valid response, a new
reolution should be performed. However, an resolution is only performed if the
"resolve" timeout is expired. Thus, when this comes from a healthcheck, it means
no extra resolution is performed at all.
Now, when the resolution is performed for a server (SRV or SRVEQ) and no valid
response is found, the resolution timer is reset (last_resolution is set to
TICK_ETERNITY). Of course, it is only performed if no resolution is already
running.
Note that this feature was broken 5 years ago when the resolvers code was
refactored (67957bd59e).
This patch should fix the issue #1906. It affects all stable versions. However,
it is probably a good idea to not backport it too far (2.6, maybe 2.4) and with
some delay.
To avoid any UAF when a resolution is released, a mechanism was added to
abort a resolution and delayed the released at the end of the current
execution path. This mechanism depends on an hard assumption: Any reference
on an aborted resolution must be removed. So, when a resolution is aborted,
it is removed from the resolver lists and inserted into a death row list.
However, a resolution may still be referenced in the query_ids tree. It is
the tree containing all resolutions with a pending request. Because aborted
resolutions are released outside the resolvers lock, it is possible to
release a resolution on a side while a query ansswer is received and
processed on another one. Thus, it is still possible to have a UAF because
of this bug.
To fix the issue, when a resolution is aborted, it is removed from any list,
but it is also removed from the query_ids tree.
This patch should solve the issue #1862 and may be related to #1875. It must
be backported as far as 2.2.
The function appears like this in "show profiling tasks", so let's export
it:
function calls cpu_tot cpu_avg lat_tot lat_avg
main+0x1463f0 92 77.28us 839.0ns 2.018ms 21.93us <- wake_expired_tasks@src/task.c:429 task_drop_running
Shut the connect() warning of resolvers_finalize_config() when the
configuration was not emitted manually.
This shuts the warning for the "default" resolvers which is created
automatically for the httpclient.
Must be backported in 2.6.
When doing a re-exec, the master was creating a "default" resolvers,
which could result in a warning emitted because the "default" resolvers
of the configuration file is not available anymore.
Skip the creating of the "default" resolvers in wait mode, this is not
useful in the master.
Must be backported as far as 2.6.
Patch c31577f ("MEDIUM: resolvers: continue startup if network is
unavailable") was not working correctly. Indeed
resolvers_finalize_config() was returning a ERR type, but a postparser
is supposed to return 0 or 1.
The return value was never right, however it was only a problem since c31577f.
Must be backported in every stable branch.
When haproxy starts with a resolver section, and there is a default one
since 2.6 which use /etc/resolv.conf, it tries to do a connect() with the UDP
socket in order to check if the routes of the system allows to reach the
server.
This check is too much restrictive as it won't prevent any runtime
failure.
Relax the check by making it a warning instead of a fatal alert.
This must be backported in 2.6.
When the resolv.conf file is empty or there is no resolv.conf file, an
empty resolvers will be created, which emits a warning during the
postparsing step.
This patch fixes the problem by freeing the resolvers section if the
parsing failed or if the nameserver list is empty.
Must be backported in 2.6, the previous patch which introduces
resolvers_destroy() is also required.
Leonhard Wimmer reported an interesting bug in github issue #1742.
Servers in disabled proxies that are configured for resolution are still
subscribed to DNS resolutions, but the LB algos are not initialized at
all since the proxy is disabled, so when the server state changes,
attempts to update its status cause a crash when the server's weight
is recalculated via a divide by the proxy's total weight which is zero.
This should be backported to all versions. Beware that before 2.5 or
so, there's no PR_FL_DISABLED flag, instead px->disabled should be
used (2.3-2.4) or PR_STSTOPPED for older versions.
Thanks to Leonhard for his report and quick test!
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. Both the core functions and the ones in the
resolvers files were updated.
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
The new name mor eclearly indicates that a stream connector cannot make
any more progress because it needs room in the channel buffer, or that
it may be unblocked because the buffer now has more room available. The
testing function is sc_waiting_room(). This is mostly used by applets.
Note that the flags will change soon.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This applies the change so that the applet code stops using ci_putchk()
and friends everywhere possible, for the much saferapplet_put*() instead.
The change is mechanical but large. Two or three functions used to have no
appctx and a cs derived from the appctx instead, which was a reminiscence
of old times' stream_interface. These were simply changed to directly take
the appctx. No sensitive change was performed, and the old (more complex)
API is still usable when needed (e.g. the channel is already known).
The change touched roughly a hundred of locations, with no less than 124
lines removed.
It's worth noting that the stats applet, the oldest of the series, could
get a serious lifting, as it's still very channel-centric instead of
propagating the appctx along the chain. Given that this code doesn't
change often, there's no emergency to clean it up but it would look
better.
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
resolvers_deinit() function is called on error, during post-parsing stage,
or on deinit, when HAProxy is stopped. It releases all entities: resolvers,
resolutions and SRV requests. There is no reason to defer the resolutions
release by moving them in the death_row list because this function is
terminal. And it is in fact a bug. Resolutions must not be released at the
end of the function because resolvers were already freed. However some
resolutions may still be attached to a reolver. Thus, when we try to remove
it from the resolver's tree, in resolv_reset_resolution(), this resolver was
already released.
So now, resolution are immediately released. It means there is no more
reason to track this function. calls to
enter_resolver_code()/leave_resolver_code() have been removed.
This patch should fix the issue #1680 and may be related to #1485. It must
be backported as far as 2.2.
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) denaturate the underlying control layer, except for QUIC
which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.
Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
The command uses this state but _INIT immediately turns to _LIST, which
turns to _FIN at the end without doing anything in that state, thus the
only existing state is _LIST so we don't need to store a state. Let's
just get rid of it.
The command was using cli.p0/p1/p2 to select which section to dump, the
current section and the current ns. Let's instead have a locally defined
"show_resolvers_ctx" section for this.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing (both in the CLI and HTTP).
The change looks large but it's particularly mechanical. The context
initialization appears in stats.c and http_ana.c. The context is used
in stats.c and resolvers.c since "show stat resolvers" points there.
That's the reason why the definition moved to stats.h. "show info"
and "show stat" continue to share the same state definition for now.
Nothing else was modified.
The "show resolvers" command is bogus, it tries to implement a yielding
mechanism except that if it yields it restarts from the beginning, until
it manages to fill the buffer with only line breaks, and faces error -2
that lets it reach the final state and exit.
The risk is low since it requires about 50 name servers to reach that
state, but it's not impossible, especially when using multiple sections.
In addition, the extraneous line breaks, if sent over an interactive
connection, will desynchronize the commands and make the client believe
the end was reached after the first nameserver. This cannot be fixed
separately because that would turn this bug into an infinite loop since
it's the line feed that manages to fill the buffer and stop it.
The fix consists in saving the current resolvers section into ctx.cli.p1
and the current nameserver into ctx.cli.p2.
This should be backported, but that code moved a lot since it was
introduced and has always been bogus. It looks like it has mostly
stabilized in 2.4 with commit c943799c86 so the fix might be backportable
to 2.4 without too much effort.
Try to create a "default" resolvers section at startup, but does not
display any error nor warning. This section is initialized using the
/etc/resolv.conf of the system.
This is opportunistic and with no guarantee that it will work (but it should
on most systems).
This is useful for the httpclient as it allows to use the DNS resolver
without any configuration in most of the cases.
The function is called from the httpclient_pre_check() function to
ensure than we tried to create the section before trying to initiate the
httpclient. But it is also called from the resolvers.c to ensure the
section is created when the httpclient init was disabled.
Move the resolv.conf parser from the cfg_parse_resolvers so it could be
used separately.
Some changes were made in the memprintf in order to use a char **
instead of a char *. Also the variable is tested before each memprintf
so could skip them if no warnmsg nor errmsg were set.
Cleanup the alert and warning handling in the "parse-resolve-conf"
parser to use the errmsg and warnmsg variables and memprintf.
This will allow to split the parser and shut the alert/warning if
needed.
A config like the following:
global
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
resolvers unbound
nameserver unbound 127.0.0.1:53
will report the following leak when running a configuration check:
==241882== 6,991 (6,952 direct, 39 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 13
==241882== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==241882== by 0x25938D: cfg_parse_resolvers (resolvers.c:3193)
==241882== by 0x26A1E8: readcfgfile (cfgparse.c:2171)
==241882== by 0x156D72: init (haproxy.c:2016)
==241882== by 0x156D72: main (haproxy.c:3037)
because the `.px` member of `struct resolvers` is not freed.
The offending allocation was introduced in
c943799c86 which is a reorganization that
happened during development of 2.4.x. This fix can likely be backported without
issue to 2.4+ and is likely not needed for earlier versions as the leak happens
during deinit only.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Remaining flags and associated functions are move in the conn-stream
scope. These flags are added on the endpoint and not the conn-stream
itself. This way it will be possible to get them from the mux or the
applet. The functions to get or set these flags are renamed accordingly with
the "cs_" prefix and updated to manipualte a conn-stream instead of a
stream-interface.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible,the conn-stream is used. Of course, it remains many calls to
the stream-interface API.
The unsafe conn-stream API (__cs_*) is now used when we are sure the good
endpoint or application is attached to the conn-stream. This avoids compiler
warnings about possible null derefs. It also simplify the code and clear up
any ambiguity about manipulated entities.
Because appctx is now an endpoint of the conn-stream, there is no reason to
still have the stream-interface as appctx owner. Thus, the conn-stream is
now the appctx owner.
Thanks to previous changes, it is now possible to set an appctx as endpoint
for a conn-stream. This means the appctx is no longer linked to the
stream-interface but to the conn-stream. Thus, a pointer to the conn-stream
is explicitly stored in the stream-interface. The endpoint (connection or
appctx) can be retrieved via the conn-stream.
When a string is converted to a domain name label, the trailing dot must be
ignored. In resolv_str_to_dn_label(), there is a test to do so. However, the
trailing dot is not really ignored. The character itself is not copied but
the string index is still moved to the next char. Thus, this trailing dot is
counted in the length of the last encoded part of the domain name. Worst,
because the copy is skipped, a garbage character is included in the domain
name.
This patch should fix the issue #1528. It must be backported as far as 2.0.
When a response is validated, the query domain name is checked to be sure it
is the same than the one requested. When an error is reported, the wrong
goto label was used. Thus, the error was lost. Instead of
RSLV_RESP_WRONG_NAME, RSLV_RESP_INVALID was reported.
This bug was introduced by the commit c1699f8c1 ("MEDIUM: resolvers: No
longer store query items in a list into the response").
This patch should fix the issue #1473. No backport is needed.
When the response is parsed, query items are stored in a list, attached to
the parsed response (resolve_response).
First, there is one and only one query sent at a time. Thus, there is no
reason to use a list. There is a test to be sure there is only one query
item in the response. Then, the reference on this query item is only used to
validate the domain name is the one requested. So the query list can be
removed. We only expect one query item, no reason to loop on query records.
In addition, the query domain name is now immediately checked against the
resolution domain name. This way, the query item is only manipulated during
the response parsing.
When a new response is parsed, it is unexpected to have an old query item
still attached to the resolution. And indeed, when the response is parsed
and validated, the query item is detached and used for a last check on its
dname. However, this is only true for a valid response. If an error is
detected, the query is not detached. This leads to undefined behavior (most
probably a crash) on the next response because the first element in the
query list is referencing an old response.
This patch must be backported as far as 2.0.
This patch renames all dns extra counters and stats functions, types and
enums using the 'resolv' prefix/suffixes.
The dns extra counter domain id used on cli was replaced by "resolvers"
instead of "dns".
The typed extra counter prefix dumping resolvers domain "D." was
also renamed "N." because it points counters on a Nameserver.
This was done to finish the split between "resolver" and "dns" layers
and to avoid further misunderstanding when haproxy will handle dns
load balancing.
This should not be backported.
This patch add a union and struct into dns_counter struct to split
application specific counters.
The only current existing application is the resolver.c layer but
in futur we could handle different application such as dns load
balancing with others specific counters.
This patch should not be backported.
Before this patch the sent error counter was increased
for each targeted nameserver as soon as we were unable to build
the query message into the trash buffer. But this counter is here
to count sent errors at dns.c transport layer and this error is not
related to a nameserver.
This patch stops to increase those counters and sent a log message
to signal the trash buffer size is not large enough to build the query.
Note: This case should not happen except if trash size buffer was
customized to a very low value.
The function was also re-worked to return -1 in this error case
as it was specified in comment. This function is currently
called at multiple point in resolver.c but return code
is still not yet handled. So to advert the user of the malfunction
the log message was added.
This patch should be backported on all versions including the
layer split between dns.c and resolver.c (v >= 2.4)
The sent messages counter was increased at both resolver.c and dns.c
layers.
This patch let the dns.c layer count the sent messages since this
layer handle a retry if transport layer is not ready (EAGAIN on udp
or tcp session ring buffer full).
This patch should be backported on all versions using a split of those
layers for resolving (v >=2.4)
The kill list introduced in commit f766ec6b5 ("MEDIUM: resolvers: use a kill
list to preserve the list consistency") contains a bug. The deatch_row must
be initialized before calling resolv_process_responses() function. However,
this function is called for the dns code. The death_row is not visible from
the outside. So, it is possible to add a resolution in an uninitialized
death_row, leading to a crash.
But, with the current implementation, it is not possible to handle the
death_row in resolv_process_responses() function because, internally, the
kill list may be freed via a call to resolv_unlink_resolution(). At the end,
we are unable to determine all call chains to guarantee a safe use of the
kill list. It is a shameful observation, but unfortunatly true.
So, to make the fix simple, we track all calls to the public resolvers
api. A counter is incremented when we enter in the resolver code and
decremented when we leave it. This way, we are able to track the recursions
to init and release the kill list only once, at the edge.
Following functions are incrementing/decrementing the recurse counter:
* resolv_trigger_resolution()
* resolv_srvrq_expire_task()
* resolv_link_resolution()
* resolv_unlink_resolution()
* resolv_detach_from_resolution_answer_items()
* resolv_process_responses()
* process_resolvers()
* resolvers_finalize_config()
* resolv_action_do_resolve()
This patch should fix the issue #1404. It must be backported everywhere the
above commit was backported.
When a requester is unlink from a resolution, by reading the code, we can
have this call chain:
_resolv_unlink_resolution(srv->resolv_requester)
resolv_detach_from_resolution_answer_items(resolution, requester)
resolv_srvrq_cleanup_srv(srv)
_resolv_unlink_resolution(srv->resolv_requester)
A loop on the resolution answer items is performed inside
resolv_detach_from_resolution_answer_items(). But by reading the code, it
seems possible to recursively unlink the same requester.
To avoid any loop at this stage, the requester clean up must be performed
before the call to resolv_detach_from_resolution_answer_items(). This way,
the second call to _resolv_unlink_resolution() does nothing and returns
immediately because the requester was already detached from the resolution.
This patch is related to the issue #1404. It must be backported as far as
2.2.
At a few places we were still using protocol_by_family() instead of
the richer protocol_lookup(). The former is limited as it enforces
SOCK_STREAM and a stream protocol at the control layer. At least with
protocol_lookup() we don't have this limitationn. The values were still
set for now but later we can imagine making them configurable on the
fly.
In issue 1424 Coverity reports that the loop increment is unreachable,
which is true, the list_for_each_entry() was replaced with a for loop,
but it was already not needed and was instead used as a convenient
construct for a single iteration lookup. Let's get rid of all this
now and replace the loop with an "if" statement.
We're using an XXH32() on the record to insert it into or look it up from
the tree. This way we don't change the rest of the code, the comparisons
are still made on all fields and the next node is visited on mismatch. This
also allows to continue to use roundrobin between identical nodes.
Just doing this is sufficient to see the CPU usage go down from ~60-70% to
4% at ~2k DNS requests per second for farm with 300 servers. A larger
config with 12 backends of 2000 servers each shows ~8-9% CPU for 6-10000
DNS requests per second.
It would probably be possible to go further with multiple levels of indexing
but it's not worth it, and it's important to remember that tree nodes take
space (the struct answer_list went back from 576 to 600 bytes).
With SRV records, a huge amount of time is spent looking for records
by walking long lists. It is possible to reduce this by indexing values
in trees instead. However the whole code relies a lot on the list
ordering, and even implements some round-robin on it to distribute IP
addresses to servers.
This patch starts carefully by replacing the list with a an eb32 tree
that is still used like a list, with a constant key 0. Since ebtrees
preserve insertion order for duplicates, the tree walk visits the nodes
in the exact same order it did with the lists. This allows to implement
the required infrastructure without changing the behavior.
This one was used to indicate whether the callee had to follow particularly
safe code path when removing resolutions. Since the code now uses a kill
list, this is not needed anymore.
When scanning resolution.curr it's possible to try to free some
resolutions which will themselves result in freeing other ones. If
one of these other ones is exactly the next one in the list, the list
walk visits deleted nodes and causes memory corruption, double-frees
and so on. The approach taken using the "safe" argument to some
functions seems to work but it's extremely brittle as it is required
to carefully check all call paths from process_ressolvers() and pass
the argument to 1 there to refrain from deleting entries, so the bug
is very likely to come back after some tiny changes to this code.
A variant was tried, checking at various places that the current task
corresponds to process_resolvers() but this is also quite brittle even
though a bit less.
This patch uses another approach which consists in carefully unlinking
elements from the list and deferring their removal by placing it in a
kill list instead of deleting them synchronously. The real benefit here
is that the complexity only has to be placed where the complications
are.
A thread-local list is fed with elements to be deleted before scanning
the resolutions, and it's flushed at the end by picking the first one
until the list is empty. This way we never dereference the next element
and do not care about its presence or not in the list. One function,
resolv_unlink_resolution(), is exported and used outside, so it had to
be modified to use this list as well. Internal code has to use
_resolv_unlink_resolution() instead.
The code as it is uses crossed lists between many elements, and at
many places the code relies on list iterators or emptiness checks,
which does not work with only LIST_DELETE. Further, it is quite
difficult to place debugging code and checks in the current situation,
and gdb is helpless.
This code replaces all LIST_DELETE calls with LIST_DEL_INIT so that
it becomes possible to trust the lists.
This function allocates requesters by hand for each and every type. This
is complex and error-prone, and it doesn't even initialize the list part,
leaving dangling pointers that complicate debugging.
This patch introduces a new function resolv_get_requester() that either
returns the current pointer if valid or tries to allocate a new one and
links it to its destination. Then it makes use of it in the function
above to clean it up quite a bit. This allows to remove complicated but
unneeded tests.
Similar to the previous patch, the answer's list was only initialized the
first time it was added to a list, leading to bogus outdated pointer to
appear when debugging code is added around it to watch it. Let's make
sure it's always initialized upon allocation.
The query_list is physically stored in the struct resolution itself,
so we have a list that contains a list to items stored in itself (and
there is a single item). But the list is first initialized in
resolv_validate_dns_response(), while it's scanned in
resolv_process_responses() later after calling the former. First,
this results in crashes as soon as the code is instrumented a little
bit for debugging, as elements from a previous incarnation can appear.
But in addition to this, the presence of an element is checked by
verifying that the return of LIST_NEXT() is not NULL, while it may
never be NULL even for an empty list, resulting in bugs or crashes
if the number of responses does not match the list's contents. This
is easily triggered by testing for the list non-emptiness outside of
the function.
Let's make sure the list is always correct, i.e. it's initialized to
an empty list when the structure is allocated, elements are checked by
first verifying the list is not empty, they are deleted once checked,
and in any case at end so that there are no dangling pointers.
This should be backported, but only as long as the patch fits without
modifications, as adaptations can be risky there given that bugs tend
to hide each other.
Depending on the code that precedes the loop, gcc may emit this warning:
src/resolvers.c: In function 'resolv_process_responses':
src/resolvers.c:1009:11: warning: potential null pointer dereference [-Wnull-dereference]
1009 | if (query->type != DNS_RTYPE_SRV && flags & DNS_FLAG_TRUNCATED) {
| ~~~~~^~~~~~
However after carefully checking, r_res->header.qdcount it exclusively 1
when reaching this place, which forces the for() loop to enter for at
least one iteration, and <query> to be set. Thus there's no code path
leading to a null deref. It's possibly just because the assignment is
too far and the compiler cannot figure that the condition is always OK.
Let's just mark it to please the compiler.
This code is dangerous enough that we certainly don't want external code
to ever approach it, let's not export unnecessary functions like this one.
It was made static and a comment was added about its purpose.
There is a fundamental design bug in the resolvers code which is that
a list of active resolutions is being walked to try to delete outdated
entries, and that the code responsible for removing them also removes
other elements, including the next one which will be visited by the
list iterator. This randomly causes a use-after-free condition leading
to crashes, infinite loops and various other issues such as random memory
corruption.
A first fix for the memory fix for this was brought by commit 0efc0993e
("BUG/MEDIUM: resolvers: Don't release resolution from a requester
callbacks"). While preparing for more fixes, some code was factored by
commit 11c6c3965 ("MINOR: resolvers: Clean server in a dedicated function
when removing a SRV item"), which inadvertently passed "0" as the "safe"
argument all the time, missing one case of removal protection, instead
of always using "safe". This patch reintroduces the correct argument.
This must be backported with all fixes above.
Cc: Christopher Faulet <cfaulet@haproxy.com>
resolv_hostname_cmp() is bogus, it is applied on labels and not plain
names, but doesn't make any distinction between length prefixes and
characters, so it compares the labels lengths via tolower() as well.
The only reason for which it doesn't break is because labels cannot
be larger than 63 bytes, and that none of the common encoding systems
have upper case letters in the lower 63 bytes, that could be turned
into a different value via tolower().
Now that all labels are stored in lower case, we don't need to burn
CPU cycles in tolower() at run time and can use memcmp() instead of
resolv_hostname_cmp(). This results in a ~22% lower CPU usage on large
farms using SRV records:
before:
18.33% haproxy [.] resolv_validate_dns_response
10.58% haproxy [.] process_resolvers
10.28% haproxy [.] resolv_hostname_cmp
7.50% libc-2.30.so [.] tolower
46.69% total
after:
24.73% haproxy [.] resolv_validate_dns_response
7.78% libc-2.30.so [.] __memcmp_avx2_movbe
3.65% haproxy [.] process_resolvers
36.16% total
The whole code relies on performing case-insensitive comparison on
lookups, which is extremely inefficient.
Let's make sure that all labels to be looked up or sent are first
converted to lower case. Doing so is also the opportunity to eliminate
an inefficient memcpy() in resolv_dn_label_to_str() that essentially
runs over a few unaligned bytes at once. As a side note, that call
was dangerous because it relied on a sign-extended size taken from a
string that had to be sanitized first.
This is tagged medium because while this is 100% safe, it may cause
visible changes on the wire at the packet level and trigger bugs in
test programs.
A bug was introduced by commit previous bf9498a31 ("MINOR: resolvers:
fix the resolv_str_to_dn_label() API about trailing zero") as the code
is particularly contrived and hard to test. The output writes the last
char at [i+1] so the trailing zero and return value must be at i+1.
This will have to be backported where the patch above is backported
since it was needed for a fix.
These two fields are exclusive as they depend on the data type.
Let's move them into a union to save some precious bytes. This
reduces the struct resolv_answer_item size from 600 to 576 bytes.
The struct resolv_answer_item contains an address field of type
"sockaddr" which is only 16 bytes long, but which is used to store
either IPv4 or IPv6. Fortunately, the contents only overlap with
the "target" field that follows it and that is large enough to
absorb the extra bytes needed to store AAAA records. But this is
dangerous as just moving fields around could result in memory
corruption.
The fix uses a union and removes the casts that were used to hide
the problem.
Older versions need to be checked and possibly fixed. This needs
to be backported anyway.
This function suffers from the same API issue as its sibling that does the
opposite direction, it demands that the input string is zero-terminated
*and* that its length *including* the trailing zero is passed on input,
forcing callers to pass length + 1, and itself to use that length - 1
everywhere internally.
This patch addressess this. There is a single caller, which is the
location of the previous bug, so it should probably be backported at
least to keep the code consistent across versions. Note that the
function is called dns_dn_label_to_str() in 2.3 and earlier.
An off-by-one issue in buffer size calculation used to limit the output
of resolv_dn_label_to_str() to 254 instead of 255.
This must be backported to 2.0.
In issue #1411, @jjiang-stripe reports that do-resolve() sometimes seems
to be trying to resolve crap from random memory contents.
The issue is that action_prepare_for_resolution() tries to measure the
input string by itself using strlen(), while resolv_action_do_resolve()
directly passes it a pointer to the sample, omitting the known length.
Thus of course any other header present after the host in memory are
appended to the host value. It could theoretically crash if really
unlucky, with a buffer that does not contain any zero including in the
index at the end, and if the HTX buffer ends on an allocation boundary.
In practice it should be too low a probability to have ever been observed.
This patch modifies the action_prepare_for_resolution() function to take
the string length on with the host name on input and pass that down the
chain. This should be backported to 2.0 along with commit "MINOR:
resolvers: fix the resolv_str_to_dn_label() API about trailing zero".
This function is bogus at the API level: it demands that the input string
is zero-terminated *and* that its length *including* the trailing zero is
passed on input. While that already looks smelly, the trailing zero is
copied as-is, and is then explicitly replaced with a zero... Not only
all callers have to pass hostname_len+1 everywhere to work around this
absurdity, but this requirement causes a bug in the do-resolve() action
that passes random string lengths on input, and that will be fixed on a
subsequent patch.
Let's fix this API issue for now.
This patch will have to be backported, and in versions 2.3 and older,
the function is in dns.c and is called dns_str_to_dn_label().
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
When a server is configured with name-resolution, resolvers objects are
created with reference to this server. Thus the server is marked as non
purgeable to prevent its removal at runtime.
This does not need to be backport.
When we evaluate a DNS response item, it may be necessary to look for a
server with a hostname matching the item target into the named servers
tree. To do so, the item target is transformed to a lowercase string. It
must be a null-terminated string. Thus we must explicitly set the trailing
'\0' character.
For a specific resolution, the named servers tree contains all servers using
this resolution with a hostname loaded from a state file. Because of this
bug, same entry may be duplicated because we are unable to find the right
server, assigning this way the item to a free server slot.
This patch should fix the issue #1333. It must be backported as far as 2.2.
On A/AAAA resolution, for a given server, if a record is matching, we must
always attach the server to this record. Before it was only done if the
server IP was not the same than the record one. However, it is a problem if
the server IP was not set for a previous resolution. From the libc during
startup for instance. In this case, the server IP is not updated and the
server is not attached to any record. It remains in this state while a
matching record is found in the DNS response. It is especially a problem
when the resolution is used for server-templates.
This bug was introduced by the commit bd78c912f ("MEDIUM: resolvers: add a
ref on server to the used A/AAAA answer item").
This patch should solve the issue #1305. It must be backported to all
versions containing the above commit.
The commit dcac41806 ("BUG/MEDIUM: resolvers: Add a task on servers to check
SRV resolution status") introduced a type. In resolv_srvrq_expire_task()
function, the resolver's lock must be used instead of the resolver itself.
This patch must be backported with the patch above (at least as far as 2.2).
When a server relies on a SRV resolution, a task is created to clean it up
(fqdn/port and address) when the SRV resolution is considered as outdated
(based on the resolvers 'timeout' value). It is only possible if the server
inherits outdated info from a state file and is no longer selected to be
attached to a SRV item. Note that most of time, a server is attached to a
SRV item. Thus when the item becomes obsolete, the server is cleaned
up.
It is important to have such task to be sure the server will be free again
to have a chance to be resolved again with fresh information. Of course,
this patch is a workaround to solve a design issue. But there is no other
obvious way to fix it without rewritting all the resolvers part. And it must
be backportable.
This patch relies on following commits:
* MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
* MINOR: resolvers: Remove server from named_servers tree when removing a SRV item
All the series must be backported as far as 2.2 after some observation
period. Backports to 2.0 and 1.8 must be evaluated.
When a server is cleaned up because the corresponding SRV item is removed,
we always remove the server from the srvrq's name_servers tree. For now, it
is useless because, if a server was attached to a SRV item, it means it was
already removed from the tree. But it will be mandatory to fix a bug.
A dedicated function is now used to clean up servers when a SRV item becomes
obsolete or when a requester is removed from a resolution. This patch is
mandatory to fix a bug.
Lots of places iterating over nbproc or comparing with nbproc could be
simplified. Further, "bind-process" and "process" parsing that was
already limited to process 1 or "all" or "odd" resulted in a bind_proc
field that was either 0 or 1 during the init phase and later always 1.
All the checks for compatibilities were removed since it's not possible
anymore to run a frontend and a backend on different processes or to
have peers and stick-tables bound on different ones. This is the largest
part of this patch.
The bind_proc field was removed from both the proxy and the receiver
structs.
Since the "process" and "bind-process" directives are still parsed,
configs making use of correct values allowing process 1 will continue
to work.
This patch add a ref into servers to register them onto the
record answer item used to set their hostnames.
It also adds a head list into 'srvrq' to register servers free
to be affected to a SRV record.
A head of a tree is also added to srvrq to put servers which
present a hotname in server state file. To re-link them fastly
to the matching record as soon an item present the same name.
This results in better performances on SRV record response
parsing.
This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
This patch adds a head list into answer items on servers which use
this record to set their IPs. It makes lookup on duplicated ip faster and
allow to check immediatly if an item is still valid renewing the IP.
This results in better performances on A/AAAA resolutions.
This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
In case of SRV records, The answer item list was purged by the
error callback of the first requester which considers the error
could not be safely ignored. It makes this item list unavailable
for subsequent requesters even if they consider the error
could be ignored.
On A resolution or do_resolve action error, the answer items were
never trashed.
This patch re-work the error callbacks and the code to check the return code
If a callback return 1, we consider the error was ignored and
the answer item list must be kept. At the opposite, If all error callbacks
of all requesters of the same resolution returns 0 the list will be purged
This patch should be backported as far as 2.0.
Set "config :" as a prefix for the user messages context before starting
the configuration parsing. All following stderr output will be prefixed
by it.
As a consequence, remove extraneous prefix "config" already specified in
various ha_alert/warning/notice calls.
Define a new keyword flag KWF_MATCH_PREFIX. This is used to replace the
match_pfx field of action struct.
This has the benefit to have more explicit action declaration, and now
it is possible to quickly implement experimental actions.
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
This patch re-works configuration parsing, it removes the "server"
lines from "resolvers" sections introduced in commit 56fc5d9eb:
MEDIUM: resolvers: add supports of TCP nameservers in resolvers.
It also extends the nameserver lines to support stream server
addresses such as:
resolvers
nameserver localhost tcp@127.0.0.1:53
Doing so, a part of nameserver's init code was factorized in
function 'parse_resolvers' and removed from 'post_parse_resolvers'.
Create a new module init which contains code related to REGISTER_*
macros for initcalls. init.h is included in api.h to make init code
available to all modules.
It's a step to clean up a bit haproxy.c/global.h.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
The loop looking for existing ADD items to renew their last_seen must ignore
the items already renewed in the same loop. To do so, we rely on the
last_seen time. because it is now based on now_ms, it is safe.
Doing so avoid to match several time the same ADD item when the same IP
address is found in several ADD item. This reduces the number of extra DNS
resolutions.
This patch depends on "MINOR: resolvers: Use milliseconds for cached items
in resolver responses". Both may be backported as far as 2.2 if necessary.
The last time when an item was seen in a resolver responses is now stored in
milliseconds instead of seconds. This avoid some corner-cases at the
edges. This also simplifies time comparisons.
At startup, if a SRV resolution is set for a server, no DNS resolution is
created. We must wait the first SRV resolution to know if it must be
triggered. It is important to do so for two reasons.
First, during a "classical" startup, a server based on a SRV resolution has
no hostname. Thus the created DNS resolution is useless. Best waiting the
first SRV resolution. It is not really a bug at this stage, it is just
useless.
Second, in the same situation, if the server state is loaded from a file,
its hosname will be set a bit later. Thus, if there is no additionnal record
for this server, because there is already a DNS resolution, it inhibits any
new DNS resolution. But there is no hostname attached to the existing DNS
resolution. So no resolution is performed at all for this server.
To avoid any problem, it is fairly easier to handle this special case during
startup. But this means we must be prepared to have no "resolv_requester"
field for a server at runtime.
This patch must be backported as far as 2.2.
Another way to say it: "Safely unlink requester from a requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.
However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
When the server status must be updated from the result of a SRV resolution,
we can directly call srvrq_update_srv_state(). It is simpler and this avoid
a test on the server DNS resolution.
This patch is mandatory for the next commit. It also rely on "MINOR:
resolvers: Directly call srvrq_update_srv_state() when possible".
resolv_purge_resolution_answer_records() must be used to removed all answers
attached to a resolution. For now, it is only used when a resolution is
released.