haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-19 13:41:27 +02:00

Author	SHA1	Message	Date
Willy Tarreau	caff631bc0	CLEANUP: stats: rename all occurrences of stconn "cs" to "sc" Function arguments and local variables called "cs" were renamed to "sc" to avoid future confusion. Both the core functions and the ones in the resolvers files were updated.	2022-05-27 19:33:35 +02:00
Willy Tarreau	cb086c6de1	REORG: stconn: rename conn_stream.{c,h} to stconn.{c,h} There's no more reason for keeping the code and definitions in conn_stream, let's move all that to stconn. The alphabetical ordering of include files was adjusted.	2022-05-27 19:33:35 +02:00
Willy Tarreau	5edca2f0e1	REORG: rename cs_utils.h to sc_strm.h This file contains all the stream-connector functions that are specific to application layers of type stream. So let's name it accordingly so that it's easier to figure what's located there. The alphabetical ordering of include files was preserved.	2022-05-27 19:33:35 +02:00
Willy Tarreau	99615ed85d	CLEANUP: stconn: rename cs_rx_room_{blk,rdy} to sc_{need,have}_room() The new name more clearly indicates that a stream connector cannot make any more progress because it needs room in the channel buffer, or that it may be unblocked because the buffer now has more room available. The testing function is sc_waiting_room(). This is mostly used by applets. Note that the flags will change soon.	2022-05-27 19:33:35 +02:00
Willy Tarreau	8e7c6e6907	CLEANUP: stconn: rename cs_appctx() to sc_appctx() Nothing special, just s/cs/sc/, roughly 50-60 entries.	2022-05-27 19:33:34 +02:00
Willy Tarreau	40a9c32e3a	CLEANUP: stconn: rename cs_{i,o}{b,c} to sc_{i,o}{b,c} We're starting to propagate the stream connector's new name through the API. Most call places of these functions that retrieve the channel or its buffer are in applets. The local variable names are not changed in order to keep the changes small and reviewable. There were ~92 uses of cs_ic(), ~96 of cs_oc() (due to co_get() being less factorizable than ci_put), and ~5 accesses to the buffer itself.	2022-05-27 19:33:34 +02:00
Willy Tarreau	d0a06d52f4	CLEANUP: applet: use applet_put() everywhere possible This applies the change so that the applet code stops using ci_putchk() and friends everywhere possible, for the much safer applet_put() instead. The change is mechanical but large. Two or three functions used to have no appctx and a cs derived from the appctx instead, which was a reminiscence of old times' stream_interface. These were simply changed to directly take the appctx. No sensitive change was performed, and the old (more complex) API is still usable when needed (e.g. the channel is already known). The change touched roughly a hundred locations, with no less than 124 lines removed. It's worth noting that the stats applet, the oldest of the series, could get a serious lifting, as it's still very channel-centric instead of propagating the appctx along the chain. Given that this code doesn't change often, there's no emergency to clean it up but it would look better.	2022-05-27 19:33:34 +02:00
Willy Tarreau	4596fe20d9	CLEANUP: conn_stream: tree-wide rename to stconn (stream connector) This renames the "struct conn_stream" to "struct stconn" and updates the descriptions in all comments (and the rare help descriptions) to "stream connector" or "connector". This touches a lot of files but the change is minimal. The local variables were not even renamed, so there's still a lot of "cs" everywhere.	2022-05-27 19:33:34 +02:00
Christopher Faulet	4315d17d3f	BUG/MEDIUM: resolvers: Don't defer resolutions release in deinit function resolvers_deinit() function is called on error, during post-parsing stage, or on deinit, when HAProxy is stopped. It releases all entities: resolvers, resolutions and SRV requests. There is no reason to defer the resolutions release by moving them in the death_row list because this function is terminal. And it is in fact a bug. Resolutions must not be released at the end of the function because resolvers were already freed. However some resolutions may still be attached to a resolver. Thus, when we try to remove it from the resolver's tree, in resolv_reset_resolution(), this resolver was already released. So now, resolutions are immediately released. It means there is no more reason to track this function. Calls to enter_resolver_code()/leave_resolver_code() have been removed. This patch should fix the issue #1680 and may be related to #1485. It must be backported as far as 2.2.	2022-05-24 18:11:59 +02:00
Willy Tarreau	91b47263f7	MINOR: protocol: replace ctrl_type with xprt_type and clarify it There's been some great confusion between proto_type, ctrl_type and sock_type. It turns out that ctrl_type was improperly chosen because it's not the control layer that is of this or that type, but the transport layer, and it turns out that the transport layer doesn't (normally) denaturate the underlying control layer, except for QUIC which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM} set of values was used added to the confusion. Let's replace it with xprt_type which reuses the later introduced PROTO_TYPE_* values, and update the comments to explain which one works at what level.	2022-05-20 18:39:43 +02:00
Willy Tarreau	0698c80a58	CLEANUP: applet: remove the unneeded appctx->owner This one is the pointer to the conn_stream which is always in the endpoint that is always present in the appctx, thus it's not needed. This patch removes it and replaces it with appctx_cs() instead. A few occurrences that were using __cs_strm(appctx->owner) were moved directly to appctx_strm() which does the equivalent.	2022-05-13 14:28:48 +02:00
Willy Tarreau	12d5228a44	CLEANUP: resolvers/cli: remove the unneeded appctx->st2 from "show resolvers" The command uses this state but _INIT immediately turns to _LIST, which turns to _FIN at the end without doing anything in that state, thus the only existing state is _LIST so we don't need to store a state. Let's just get rid of it.	2022-05-06 18:13:36 +02:00
Willy Tarreau	db933d6fdd	CLEANUP: resolvers/cli: make "show resolvers" use a locally-defined context The command was using cli.p0/p1/p2 to select which section to dump, the current section and the current ns. Let's instead have a locally defined "show_resolvers_ctx" section for this.	2022-05-06 18:13:36 +02:00
Willy Tarreau	91cefcaba4	CLEANUP: stats/cli: take the "show stat" context definition out of the appctx This makes use of the generic command context allocation so that the appctx doesn't have to declare a specific one anymore. The context is created during parsing (both in the CLI and HTTP). The change looks large but it's particularly mechanical. The context initialization appears in stats.c and http_ana.c. The context is used in stats.c and resolvers.c since "show stat resolvers" points there. That's the reason why the definition moved to stats.h. "show info" and "show stat" continue to share the same state definition for now. Nothing else was modified.	2022-05-06 18:13:35 +02:00
Willy Tarreau	4e047e7d0e	BUG/MEDIUM: resolvers: make "show resolvers" properly yield The "show resolvers" command is bogus, it tries to implement a yielding mechanism except that if it yields it restarts from the beginning, until it manages to fill the buffer with only line breaks, and faces error -2 that lets it reach the final state and exit. The risk is low since it requires about 50 name servers to reach that state, but it's not impossible, especially when using multiple sections. In addition, the extraneous line breaks, if sent over an interactive connection, will desynchronize the commands and make the client believe the end was reached after the first nameserver. This cannot be fixed separately because that would turn this bug into an infinite loop since it's the line feed that manages to fill the buffer and stop it. The fix consists in saving the current resolvers section into ctx.cli.p1 and the current nameserver into ctx.cli.p2. This should be backported, but that code moved a lot since it was introduced and has always been bogus. It looks like it has mostly stabilized in 2.4 with commit c943799c86 so the fix might be backportable to 2.4 without too much effort.	2022-05-06 18:13:35 +02:00
William Lallemand	7867f63313	MEDIUM: resolvers: create a "default" resolvers section at startup Try to create a "default" resolvers section at startup, but do not display any error or warning. This section is initialized using the /etc/resolv.conf of the system. This is opportunistic and with no guarantee that it will work (but it should on most systems). This is useful for the httpclient as it allows to use the DNS resolver without any configuration in most of the cases. The function is called from the httpclient_pre_check() function to ensure that we tried to create the section before trying to initiate the httpclient. But it is also called from the resolvers.c to ensure the section is created when the httpclient init was disabled.	2022-05-06 17:02:15 +02:00
William Lallemand	e7f5776800	MINOR: resolvers: resolvers_new() create a resolvers with default values Split the creation of the resolve structure from the parser to resolvers_new();	2022-05-05 18:27:48 +02:00
William Lallemand	73edfe402e	MINOR: resolvers: move the resolv.conf parser in parse_resolv_conf() Move the resolv.conf parser from the cfg_parse_resolvers so it could be used separately. Some changes were made in the memprintf in order to use a char ** instead of a char *. Also the variable is tested before each memprintf so could skip them if no warnmsg nor errmsg were set.	2022-05-05 17:38:48 +02:00
William Lallemand	106bd29dd0	MINOR: resolvers: cleanup alert/warning in parse-resolve-conf Cleanup the alert and warning handling in the "parse-resolve-conf" parser to use the errmsg and warnmsg variables and memprintf. This will allow to split the parser and shut the alert/warning if needed.	2022-05-05 17:33:42 +02:00
Tim Duesterhus	0b7031b37d	BUG/MINOR: resolvers: Fix memory leak in resolvers_deinit() A config like the following: global stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners resolvers unbound nameserver unbound 127.0.0.1:53 will report the following leak when running a configuration check: ==241882== 6,991 (6,952 direct, 39 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 13 ==241882== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==241882== by 0x25938D: cfg_parse_resolvers (resolvers.c:3193) ==241882== by 0x26A1E8: readcfgfile (cfgparse.c:2171) ==241882== by 0x156D72: init (haproxy.c:2016) ==241882== by 0x156D72: main (haproxy.c:3037) because the `.px` member of `struct resolvers` is not freed. The offending allocation was introduced in c943799c865c04281454a7a54fd6c45c2b4d7e09 which is a reorganization that happened during development of 2.4.x. This fix can likely be backported without issue to 2.4+ and is likely not needed for earlier versions as the leak happens during deinit only.	2022-04-26 23:42:10 +02:00
Willy Tarreau	7e2e4f8401	CLEANUP: tree-wide: remove 25 occurrences of unneeded fcntl.h There were plenty of leftovers from old code that were never removed and that are not needed at all since these files do not use any definition depending on fcntl.h, let's drop them.	2022-04-26 10:59:48 +02:00
Christopher Faulet	6b0a0fb2f9	CLEANUP: tree-wide: Remove any ref to stream-interfaces Stream-interfaces are gone. Corresponding files can safely be removed. In addition, comments are updated accordingly.	2022-04-13 15:10:16 +02:00
Christopher Faulet	a0bdec350f	MEDIUM: stream-int/conn-stream: Move blocking flags from SI to CS Remaining flags and associated functions are moved to the conn-stream scope. These flags are added on the endpoint and not the conn-stream itself. This way it will be possible to get them from the mux or the applet. The functions to get or set these flags are renamed accordingly with the "cs_" prefix and updated to manipulate a conn-stream instead of a stream-interface.	2022-04-13 15:10:15 +02:00
Christopher Faulet	908628c4c0	MEDIUM: tree-wide: Use CS util functions instead of SI ones At many places, we now use the new CS functions to get a stream or a channel from a conn-stream instead of using the stream-interface API. It is the first step to reduce the scope of the stream-interfaces. The main change here is about the applet I/O callback functions. Before the refactoring, the stream-interface was the appctx owner. Thus, it was heavily used. Now, as far as possible, the conn-stream is used. Of course, there remain many calls to the stream-interface API.	2022-04-13 15:10:14 +02:00
Christopher Faulet	693b23bb10	MEDIUM: tree-wide: Use unsafe conn-stream API when it is relevant The unsafe conn-stream API (__cs_*) is now used when we are sure the good endpoint or application is attached to the conn-stream. This avoids compiler warnings about possible null derefs. It also simplifies the code and clears up any ambiguity about manipulated entities.	2022-02-28 17:13:36 +01:00
Christopher Faulet	86e1c3381b	MEDIUM: applet: Set the conn-stream as appctx owner instead of the stream-int Because appctx is now an endpoint of the conn-stream, there is no reason to still have the stream-interface as appctx owner. Thus, the conn-stream is now the appctx owner.	2022-02-24 11:00:02 +01:00
Christopher Faulet	13a35e5752	MAJOR: conn_stream/stream-int: move the appctx to the conn-stream Thanks to previous changes, it is now possible to set an appctx as endpoint for a conn-stream. This means the appctx is no longer linked to the stream-interface but to the conn-stream. Thus, a pointer to the conn-stream is explicitly stored in the stream-interface. The endpoint (connection or appctx) can be retrieved via the conn-stream.	2022-02-24 11:00:02 +01:00
Christopher Faulet	0a82cf4c16	BUG/MEDIUM: resolvers: Really ignore trailing dot in domain names When a string is converted to a domain name label, the trailing dot must be ignored. In resolv_str_to_dn_label(), there is a test to do so. However, the trailing dot is not really ignored. The character itself is not copied but the string index is still moved to the next char. Thus, this trailing dot is counted in the length of the last encoded part of the domain name. Worse, because the copy is skipped, a garbage character is included in the domain name. This patch should fix the issue #1528. It must be backported as far as 2.0.	2022-01-28 17:56:18 +01:00
Christopher Faulet	af93d2fd70	BUG/MINOR: resolvers: Don't overwrite the error for invalid query domain name When a response is validated, the query domain name is checked to be sure it is the same as the one requested. When an error is reported, the wrong goto label was used. Thus, the error was lost. Instead of RSLV_RESP_WRONG_NAME, RSLV_RESP_INVALID was reported. This bug was introduced by the commit c1699f8c1 ("MEDIUM: resolvers: No longer store query items in a list into the response"). This patch should fix the issue #1473. No backport is needed.	2021-12-02 10:05:04 +01:00
Christopher Faulet	c1699f8c1b	MEDIUM: resolvers: No longer store query items in a list into the response When the response is parsed, query items are stored in a list, attached to the parsed response (resolve_response). First, there is one and only one query sent at a time. Thus, there is no reason to use a list. There is a test to be sure there is only one query item in the response. Then, the reference on this query item is only used to validate the domain name is the one requested. So the query list can be removed. We only expect one query item, no reason to loop on query records. In addition, the query domain name is now immediately checked against the resolution domain name. This way, the query item is only manipulated during the response parsing.	2021-12-01 15:21:56 +01:00
Christopher Faulet	80b2e34b18	BUG/MEDIUM: resolvers: Detach query item on response error When a new response is parsed, it is unexpected to have an old query item still attached to the resolution. And indeed, when the response is parsed and validated, the query item is detached and used for a last check on its dname. However, this is only true for a valid response. If an error is detected, the query is not detached. This leads to undefined behavior (most probably a crash) on the next response because the first element in the query list is referencing an old response. This patch must be backported as far as 2.0.	2021-12-01 11:47:08 +01:00
Emeric Brun	f8642ee826	MEDIUM: resolvers: rename dns extra counters to resolvers extra counters This patch renames all dns extra counters and stats functions, types and enums using the 'resolv' prefix/suffixes. The dns extra counter domain id used on cli was replaced by "resolvers" instead of "dns". The typed extra counter prefix dumping resolvers domain "D." was also renamed "N." because it points counters on a Nameserver. This was done to finish the split between "resolver" and "dns" layers and to avoid further misunderstanding when haproxy will handle dns load balancing. This should not be backported.	2021-11-03 17:16:46 +01:00
Emeric Brun	d174f0e59a	MINOR: resolvers/dns: split dns and resolver counters in dns_counter struct This patch adds a union and struct into dns_counter struct to split application specific counters. The only current existing application is the resolver.c layer but in the future we could handle different applications such as dns load balancing with other specific counters. This patch should not be backported.	2021-11-03 17:16:46 +01:00
Emeric Brun	0161d32df2	BUG/MINOR: resolvers: throw log message if trash not large enough for query Before this patch the sent error counter was increased for each targeted nameserver as soon as we were unable to build the query message into the trash buffer. But this counter is here to count sent errors at dns.c transport layer and this error is not related to a nameserver. This patch stops increasing those counters and sends a log message to signal the trash buffer size is not large enough to build the query. Note: This case should not happen except if trash size buffer was customized to a very low value. The function was also re-worked to return -1 in this error case as it was specified in comment. This function is currently called at multiple points in resolver.c but the return code is still not yet handled. So to alert the user to the malfunction the log message was added. This patch should be backported on all versions including the layer split between dns.c and resolver.c (v >= 2.4)	2021-11-03 17:16:46 +01:00
Emeric Brun	c37caab21c	BUG/MINOR: resolvers: fix sent messages were counted twice The sent messages counter was increased at both resolver.c and dns.c layers. This patch lets the dns.c layer count the sent messages since this layer handles a retry if the transport layer is not ready (EAGAIN on udp or tcp session ring buffer full). This patch should be backported on all versions using a split of those layers for resolving (v >= 2.4)	2021-11-03 17:16:46 +01:00
Christopher Faulet	9ed1a0601d	BUG/MEDIUM: resolvers: Track api calls with a counter to free resolutions The kill list introduced in commit f766ec6b5 ("MEDIUM: resolvers: use a kill list to preserve the list consistency") contains a bug. The death_row must be initialized before calling resolv_process_responses() function. However, this function is called from the dns code. The death_row is not visible from the outside. So, it is possible to add a resolution in an uninitialized death_row, leading to a crash. But, with the current implementation, it is not possible to handle the death_row in resolv_process_responses() function because, internally, the kill list may be freed via a call to resolv_unlink_resolution(). At the end, we are unable to determine all call chains to guarantee a safe use of the kill list. It is a shameful observation, but unfortunately true. So, to make the fix simple, we track all calls to the public resolvers api. A counter is incremented when we enter in the resolver code and decremented when we leave it. This way, we are able to track the recursions to init and release the kill list only once, at the edge. Following functions are incrementing/decrementing the recurse counter: * resolv_trigger_resolution() * resolv_srvrq_expire_task() * resolv_link_resolution() * resolv_unlink_resolution() * resolv_detach_from_resolution_answer_items() * resolv_process_responses() * process_resolvers() * resolvers_finalize_config() * resolv_action_do_resolve() This patch should fix the issue #1404. It must be backported everywhere the above commit was backported.	2021-11-02 16:55:01 +01:00
Christopher Faulet	bce6db6c3c	BUG/MEDIUM: resolvers: Don't recursively perform requester unlink When a requester is unlinked from a resolution, by reading the code, we can have this call chain: _resolv_unlink_resolution(srv->resolv_requester) resolv_detach_from_resolution_answer_items(resolution, requester) resolv_srvrq_cleanup_srv(srv) _resolv_unlink_resolution(srv->resolv_requester) A loop on the resolution answer items is performed inside resolv_detach_from_resolution_answer_items(). But by reading the code, it seems possible to recursively unlink the same requester. To avoid any loop at this stage, the requester clean up must be performed before the call to resolv_detach_from_resolution_answer_items(). This way, the second call to _resolv_unlink_resolution() does nothing and returns immediately because the requester was already detached from the resolution. This patch is related to the issue #1404. It must be backported as far as 2.2.	2021-10-29 15:06:31 +02:00
Willy Tarreau	14e7f29e86	MINOR: protocols: replace protocol_by_family() with protocol_lookup() At a few places we were still using protocol_by_family() instead of the richer protocol_lookup(). The former is limited as it enforces SOCK_STREAM and a stream protocol at the control layer. At least with protocol_lookup() we don't have this limitation. The values were still set for now but later we can imagine making them configurable on the fly.	2021-10-27 17:41:07 +02:00
Willy Tarreau	dbb0bb59e3	CLEANUP: resolvers: get rid of single-iteration loop in resolv_get_ip_from_response() In issue 1424 Coverity reports that the loop increment is unreachable, which is true, the list_for_each_entry() was replaced with a for loop, but it was already not needed and was instead used as a convenient construct for a single iteration lookup. Let's get rid of all this now and replace the loop with an "if" statement.	2021-10-22 08:34:14 +02:00
Willy Tarreau	dcb696cd31	MEDIUM: resolvers: hash the records before inserting them into the tree We're using an XXH32() on the record to insert it into or look it up from the tree. This way we don't change the rest of the code, the comparisons are still made on all fields and the next node is visited on mismatch. This also allows to continue to use roundrobin between identical nodes. Just doing this is sufficient to see the CPU usage go down from ~60-70% to 4% at ~2k DNS requests per second for farm with 300 servers. A larger config with 12 backends of 2000 servers each shows ~8-9% CPU for 6-10000 DNS requests per second. It would probably be possible to go further with multiple levels of indexing but it's not worth it, and it's important to remember that tree nodes take space (the struct answer_list went back from 576 to 600 bytes).	2021-10-21 08:29:02 +02:00
Willy Tarreau	7893ae117f	MEDIUM: resolvers: replace the answer_list with a (flat) tree With SRV records, a huge amount of time is spent looking for records by walking long lists. It is possible to reduce this by indexing values in trees instead. However the whole code relies a lot on the list ordering, and even implements some round-robin on it to distribute IP addresses to servers. This patch starts carefully by replacing the list with an eb32 tree that is still used like a list, with a constant key 0. Since ebtrees preserve insertion order for duplicates, the tree walk visits the nodes in the exact same order it did with the lists. This allows to implement the required infrastructure without changing the behavior.	2021-10-21 08:02:08 +02:00
Willy Tarreau	6878f80427	MEDIUM: resolvers: remove the last occurrences of the "safe" argument This one was used to indicate whether the callee had to follow particularly safe code path when removing resolutions. Since the code now uses a kill list, this is not needed anymore.	2021-10-20 17:54:27 +02:00
Willy Tarreau	f766ec6b53	MEDIUM: resolvers: use a kill list to preserve the list consistency When scanning resolution.curr it's possible to try to free some resolutions which will themselves result in freeing other ones. If one of these other ones is exactly the next one in the list, the list walk visits deleted nodes and causes memory corruption, double-frees and so on. The approach taken using the "safe" argument to some functions seems to work but it's extremely brittle as it is required to carefully check all call paths from process_resolvers() and pass the argument to 1 there to refrain from deleting entries, so the bug is very likely to come back after some tiny changes to this code. A variant was tried, checking at various places that the current task corresponds to process_resolvers() but this is also quite brittle even though a bit less. This patch uses another approach which consists in carefully unlinking elements from the list and deferring their removal by placing it in a kill list instead of deleting them synchronously. The real benefit here is that the complexity only has to be placed where the complications are. A thread-local list is fed with elements to be deleted before scanning the resolutions, and it's flushed at the end by picking the first one until the list is empty. This way we never dereference the next element and do not care about its presence or not in the list. One function, resolv_unlink_resolution(), is exported and used outside, so it had to be modified to use this list as well. Internal code has to use _resolv_unlink_resolution() instead.	2021-10-20 17:54:22 +02:00
Willy Tarreau	aae7320b0d	CLEANUP: resolvers: replace all LIST_DELETE with LIST_DEL_INIT The code as it is uses crossed lists between many elements, and at many places the code relies on list iterators or emptiness checks, which does not work with only LIST_DELETE. Further, it is quite difficult to place debugging code and checks in the current situation, and gdb is helpless. This code replaces all LIST_DELETE calls with LIST_DEL_INIT so that it becomes possible to trust the lists.	2021-10-20 17:54:14 +02:00
Willy Tarreau	239675e4a9	CLEANUP: resolvers: simplify resolv_link_resolution() regarding requesters This function allocates requesters by hand for each and every type. This is complex and error-prone, and it doesn't even initialize the list part, leaving dangling pointers that complicate debugging. This patch introduces a new function resolv_get_requester() that either returns the current pointer if valid or tries to allocate a new one and links it to its destination. Then it makes use of it in the function above to clean it up quite a bit. This allows to remove complicated but unneeded tests.	2021-10-20 17:54:01 +02:00
Willy Tarreau	48664c048d	CLEANUP: always initialize the answer_list Similar to the previous patch, the answer's list was only initialized the first time it was added to a list, leading to bogus outdated pointer to appear when debugging code is added around it to watch it. Let's make sure it's always initialized upon allocation.	2021-10-20 17:53:54 +02:00
Willy Tarreau	25e010906a	BUG/MEDIUM: resolvers: always check a valid item in query_list The query_list is physically stored in the struct resolution itself, so we have a list that contains a list to items stored in itself (and there is a single item). But the list is first initialized in resolv_validate_dns_response(), while it's scanned in resolv_process_responses() later after calling the former. First, this results in crashes as soon as the code is instrumented a little bit for debugging, as elements from a previous incarnation can appear. But in addition to this, the presence of an element is checked by verifying that the return of LIST_NEXT() is not NULL, while it may never be NULL even for an empty list, resulting in bugs or crashes if the number of responses does not match the list's contents. This is easily triggered by testing for the list non-emptiness outside of the function. Let's make sure the list is always correct, i.e. it's initialized to an empty list when the structure is allocated, elements are checked by first verifying the list is not empty, they are deleted once checked, and in any case at end so that there are no dangling pointers. This should be backported, but only as long as the patch fits without modifications, as adaptations can be risky there given that bugs tend to hide each other.	2021-10-20 17:53:35 +02:00
Willy Tarreau	10c1a8c3bd	BUILD: resolvers: avoid a possible warning on null-deref Depending on the code that precedes the loop, gcc may emit this warning: src/resolvers.c: In function 'resolv_process_responses': src/resolvers.c:1009:11: warning: potential null pointer dereference [-Wnull-dereference] 1009 | if (query->type != DNS_RTYPE_SRV && flags & DNS_FLAG_TRUNCATED) { | ~~~~~^~~~~~ However after carefully checking, r_res->header.qdcount is exclusively 1 when reaching this place, which forces the for() loop to enter for at least one iteration, and <query> to be set. Thus there's no code path leading to a null deref. It's possibly just because the assignment is too far and the compiler cannot figure that the condition is always OK. Let's just mark it to please the compiler.	2021-10-20 17:53:35 +02:00
Willy Tarreau	2acc160c05	CLEANUP: resolvers: do not export resolv_purge_resolution_answer_records() This code is dangerous enough that we certainly don't want external code to ever approach it, let's not export unnecessary functions like this one. It was made static and a comment was added about its purpose.	2021-10-20 17:52:50 +02:00
Willy Tarreau	2a67aa0a51	BUG/MAJOR: resolvers: add other missing references during resolution removal There is a fundamental design bug in the resolvers code which is that a list of active resolutions is being walked to try to delete outdated entries, and that the code responsible for removing them also removes other elements, including the next one which will be visited by the list iterator. This randomly causes a use-after-free condition leading to crashes, infinite loops and various other issues such as random memory corruption. A first fix for this was brought by commit 0efc0993e ("BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks"). While preparing for more fixes, some code was factored by commit 11c6c3965 ("MINOR: resolvers: Clean server in a dedicated function when removing a SRV item"), which inadvertently passed "0" as the "safe" argument all the time, missing one case of removal protection, instead of always using "safe". This patch reintroduces the correct argument. This must be backported with all fixes above. Cc: Christopher Faulet <cfaulet@haproxy.com>	2021-10-20 17:52:36 +02:00
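
The kill list described in commit f766ec6b53, together with the call counter added by commit 9ed1a0601d, can be illustrated with a small standalone sketch. This is not HAProxy's code: the list helpers, struct resolution layout, kill_resolution(), process_resolutions(), new_resolution() and the freed_count counter are all hypothetical, and only the names death_row, enter_resolver_code() and leave_resolver_code() are borrowed from the commit messages above. Restarting the walk from the head after each kill is a simplification chosen for this sketch so that the iterator never follows a pointer of an element that was moved away.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list (hypothetical, not HAProxy's macros). */
struct list { struct list *n, *p; };

static void list_init(struct list *l) { l->n = l->p = l; }
static int list_is_empty(const struct list *l) { return l->n == l; }

static void list_append(struct list *head, struct list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

/* Unlink and re-initialize, so the element may be tested or unlinked again. */
static void list_del_init(struct list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    list_init(el);
}

struct resolution {
    struct list list;        /* first member: a list pointer is also a resolution pointer */
    int expired;             /* set when the scan must drop this entry */
    struct resolution *dep;  /* hypothetical: another entry that dies with this one */
};

static struct list death_row; /* kill list: unlinked entries waiting to be freed */
static int recurse;           /* depth of nested calls into the "resolver code" */
static int freed_count;       /* only here so the sketch can be checked */

static void enter_resolver_code(void)
{
    if (!recurse)
        list_init(&death_row); /* initialized once, at the outermost call */
    recurse++;
}

static void leave_resolver_code(void)
{
    assert(recurse > 0);
    if (--recurse)
        return; /* still nested: the outermost caller will flush */
    while (!list_is_empty(&death_row)) {
        /* always pick the first entry; never rely on a saved "next" pointer */
        struct resolution *res = (struct resolution *)death_row.n;
        list_del_init(&res->list);
        free(res);
        freed_count++;
    }
}

/* Unlink <res> (and what depends on it) and defer the free to the kill list. */
static void kill_resolution(struct resolution *res)
{
    struct resolution *dep = res->dep;

    res->dep = NULL; /* prevents endless recursion on dependency cycles */
    list_del_init(&res->list);
    list_append(&death_row, &res->list);
    if (dep)
        kill_resolution(dep);
}

/* Walk <curr> and drop expired entries without ever touching freed memory. */
static void process_resolutions(struct list *curr)
{
    struct list *it;

    enter_resolver_code();
    it = curr->n;
    while (it != curr) {
        struct resolution *res = (struct resolution *)it;
        if (res->expired) {
            kill_resolution(res);
            it = curr->n; /* list changed: restart from the head */
        } else {
            it = it->n;
        }
    }
    leave_resolver_code();
}

/* Allocate an entry and append it to <curr>. */
static struct resolution *new_resolution(struct list *curr, int expired)
{
    struct resolution *res = calloc(1, sizeof(*res));

    if (!res)
        abort();
    res->expired = expired;
    list_append(curr, &res->list);
    return res;
}
```

In this sketch, an expired entry whose dependent entry is the next one in the list does not disturb the walk: both are moved to death_row and are only freed when the outermost leave_resolver_code() runs.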

102 Commits