haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-09 08:37:04 +02:00

Author	SHA1	Message	Date
Christopher Faulet	908628c4c0	MEDIUM: tree-wide: Use CS util functions instead of SI ones In many places, we now use the new CS functions to get a stream or a channel from a conn-stream instead of using the stream-interface API. It is the first step to reduce the scope of the stream-interfaces. The main change here is about the applet I/O callback functions. Before the refactoring, the stream-interface was the appctx owner. Thus, it was heavily used. Now, as far as possible, the conn-stream is used. Of course, many calls to the stream-interface API remain.	2022-04-13 15:10:14 +02:00
Christopher Faulet	693b23bb10	MEDIUM: tree-wide: Use unsafe conn-stream API when it is relevant The unsafe conn-stream API (__cs_*) is now used when we are sure the right endpoint or application is attached to the conn-stream. This avoids compiler warnings about possible null derefs. It also simplifies the code and clears up any ambiguity about manipulated entities.	2022-02-28 17:13:36 +01:00
Christopher Faulet	86e1c3381b	MEDIUM: applet: Set the conn-stream as appctx owner instead of the stream-int Because appctx is now an endpoint of the conn-stream, there is no reason to still have the stream-interface as appctx owner. Thus, the conn-stream is now the appctx owner.	2022-02-24 11:00:02 +01:00
Christopher Faulet	13a35e5752	MAJOR: conn_stream/stream-int: move the appctx to the conn-stream Thanks to previous changes, it is now possible to set an appctx as endpoint for a conn-stream. This means the appctx is no longer linked to the stream-interface but to the conn-stream. Thus, a pointer to the conn-stream is explicitly stored in the stream-interface. The endpoint (connection or appctx) can be retrieved via the conn-stream.	2022-02-24 11:00:02 +01:00
Christopher Faulet	0a82cf4c16	BUG/MEDIUM: resolvers: Really ignore trailing dot in domain names When a string is converted to a domain name label, the trailing dot must be ignored. In resolv_str_to_dn_label(), there is a test to do so. However, the trailing dot is not really ignored. The character itself is not copied but the string index is still moved to the next char. Thus, this trailing dot is counted in the length of the last encoded part of the domain name. Worse, because the copy is skipped, a garbage character is included in the domain name. This patch should fix the issue #1528. It must be backported as far as 2.0.	2022-01-28 17:56:18 +01:00
Christopher Faulet	af93d2fd70	BUG/MINOR: resolvers: Don't overwrite the error for invalid query domain name When a response is validated, the query domain name is checked to be sure it is the same as the one requested. When an error is reported, the wrong goto label was used. Thus, the error was lost. Instead of RSLV_RESP_WRONG_NAME, RSLV_RESP_INVALID was reported. This bug was introduced by the commit `c1699f8c1` ("MEDIUM: resolvers: No longer store query items in a list into the response"). This patch should fix the issue #1473. No backport is needed.	2021-12-02 10:05:04 +01:00
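The trailing-dot handling described in 0a82cf4c16 can be illustrated with a minimal sketch. This is not HAProxy's `resolv_str_to_dn_label()`: the function name `str_to_dn_sketch`, its signature and its return convention are assumptions made for illustration. It strips a single trailing dot before encoding, so the dot can never be counted in the length of the last label.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the HAProxy function): encodes "www.example.org"
 * or "www.example.org." as length-prefixed labels "\3www\7example\3org\0".
 * Returns the number of bytes written, excluding the final zero, or -1 on
 * error (empty name, empty label, label longer than 63 bytes, or output
 * buffer too small).
 */
int str_to_dn_sketch(const char *str, size_t len, unsigned char *dn, size_t dn_len)
{
    size_t i, out = 0, start = 0;

    assert(str != NULL && dn != NULL);

    /* Drop a single trailing dot up front so that it is neither copied
     * nor counted in the length of the last label.
     */
    if (len > 0 && str[len - 1] == '.')
        len--;
    if (len == 0)
        return -1;

    for (i = 0; i <= len; i++) {
        size_t lab;

        if (i < len && str[i] != '.')
            continue;

        /* str[start..i) is one complete label */
        lab = i - start;
        if (lab == 0 || lab > 63 || out + 1 + lab + 1 > dn_len)
            return -1;

        dn[out++] = (unsigned char)lab;   /* length prefix */
        memcpy(dn + out, str + start, lab);
        out += lab;
        start = i + 1;
    }
    dn[out] = 0;                          /* root label terminator */
    return (int)out;
}
```

With this sketch, a name with and without the trailing dot encodes to the same bytes.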
Christopher Faulet	c1699f8c1b	MEDIUM: resolvers: No longer store query items in a list into the response When the response is parsed, query items are stored in a list, attached to the parsed response (resolve_response). First, there is one and only one query sent at a time. Thus, there is no reason to use a list. There is a test to be sure there is only one query item in the response. Then, the reference on this query item is only used to validate the domain name is the one requested. So the query list can be removed. We only expect one query item, no reason to loop on query records. In addition, the query domain name is now immediately checked against the resolution domain name. This way, the query item is only manipulated during the response parsing.	2021-12-01 15:21:56 +01:00
Christopher Faulet	80b2e34b18	BUG/MEDIUM: resolvers: Detach query item on response error When a new response is parsed, it is unexpected to have an old query item still attached to the resolution. And indeed, when the response is parsed and validated, the query item is detached and used for a last check on its dname. However, this is only true for a valid response. If an error is detected, the query is not detached. This leads to undefined behavior (most probably a crash) on the next response because the first element in the query list is referencing an old response. This patch must be backported as far as 2.0.	2021-12-01 11:47:08 +01:00
Emeric Brun	f8642ee826	MEDIUM: resolvers: rename dns extra counters to resolvers extra counters This patch renames all dns extra counters and stats functions, types and enums using the 'resolv' prefix/suffixes. The dns extra counter domain id used on cli was replaced by "resolvers" instead of "dns". The typed extra counter prefix dumping resolvers domain "D." was also renamed "N." because it points counters on a Nameserver. This was done to finish the split between "resolver" and "dns" layers and to avoid further misunderstanding when haproxy will handle dns load balancing. This should not be backported.	2021-11-03 17:16:46 +01:00
Emeric Brun	d174f0e59a	MINOR: resolvers/dns: split dns and resolver counters in dns_counter struct This patch adds a union and a struct to the dns_counter struct to split application-specific counters. The only existing application is currently the resolver.c layer, but in the future we could handle different applications, such as dns load balancing, with other specific counters. This patch should not be backported.	2021-11-03 17:16:46 +01:00
Emeric Brun	0161d32df2	BUG/MINOR: resolvers: throw log message if trash not large enough for query Before this patch, the sent error counter was increased for each targeted nameserver as soon as we were unable to build the query message into the trash buffer. But this counter is here to count sent errors at the dns.c transport layer, and this error is not related to a nameserver. This patch stops increasing those counters and sends a log message to signal that the trash buffer is not large enough to build the query. Note: This case should not happen unless the trash buffer size was customized to a very low value. The function was also reworked to return -1 in this error case, as specified in its comment. This function is currently called at multiple points in resolver.c but its return code is not handled yet. So the log message was added to warn the user of the malfunction. This patch should be backported on all versions including the layer split between dns.c and resolver.c (v >= 2.4)	2021-11-03 17:16:46 +01:00
Emeric Brun	c37caab21c	BUG/MINOR: resolvers: fix sent messages were counted twice The sent messages counter was increased at both resolver.c and dns.c layers. This patch lets the dns.c layer count the sent messages since this layer handles a retry if the transport layer is not ready (EAGAIN on udp or tcp session ring buffer full). This patch should be backported on all versions using a split of those layers for resolving (v >= 2.4)	2021-11-03 17:16:46 +01:00
Christopher Faulet	9ed1a0601d	BUG/MEDIUM: resolvers: Track api calls with a counter to free resolutions The kill list introduced in commit `f766ec6b5` ("MEDIUM: resolvers: use a kill list to preserve the list consistency") contains a bug. The death_row must be initialized before calling the resolv_process_responses() function. However, this function is called from the dns code. The death_row is not visible from the outside. So, it is possible to add a resolution in an uninitialized death_row, leading to a crash. But, with the current implementation, it is not possible to handle the death_row in the resolv_process_responses() function because, internally, the kill list may be freed via a call to resolv_unlink_resolution(). In the end, we are unable to determine all call chains to guarantee a safe use of the kill list. It is a shameful observation, but unfortunately true. So, to make the fix simple, we track all calls to the public resolvers api. A counter is incremented when we enter the resolver code and decremented when we leave it. This way, we are able to track the recursions to init and release the kill list only once, at the edge. The following functions increment/decrement the recurse counter: * resolv_trigger_resolution() * resolv_srvrq_expire_task() * resolv_link_resolution() * resolv_unlink_resolution() * resolv_detach_from_resolution_answer_items() * resolv_process_responses() * process_resolvers() * resolvers_finalize_config() * resolv_action_do_resolve() This patch should fix the issue #1404. It must be backported everywhere the above commit was backported.	2021-11-02 16:55:01 +01:00
Christopher Faulet	bce6db6c3c	BUG/MEDIUM: resolvers: Don't recursively perform requester unlink When a requester is unlinked from a resolution, by reading the code, we can have this call chain: _resolv_unlink_resolution(srv->resolv_requester) resolv_detach_from_resolution_answer_items(resolution, requester) resolv_srvrq_cleanup_srv(srv) _resolv_unlink_resolution(srv->resolv_requester) A loop on the resolution answer items is performed inside resolv_detach_from_resolution_answer_items(). But by reading the code, it seems possible to recursively unlink the same requester. To avoid any loop at this stage, the requester clean up must be performed before the call to resolv_detach_from_resolution_answer_items(). This way, the second call to _resolv_unlink_resolution() does nothing and returns immediately because the requester was already detached from the resolution.  This patch is related to the issue #1404. It must be backported as far as 2.2.	2021-10-29 15:06:31 +02:00
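The "init and release only once, at the edge" idea from 9ed1a0601d can be sketched as follows. All names here (`recurse`, `enter_api`, `leave_api`, the two counters standing in for the real init and flush work) are hypothetical and only illustrate the counting scheme; the actual HAProxy code may differ, for example by keeping this state per thread.

```c
#include <assert.h>

/* Hypothetical state, for illustration only. */
int recurse;        /* current nesting depth inside the public API */
int init_count;     /* number of times the kill list was initialized */
int release_count;  /* number of times the kill list was flushed */

/* Called on entry of every public function. */
void enter_api(void)
{
    if (recurse++ == 0)
        init_count++;       /* outermost entry: init the kill list here */
}

/* Called on exit of every public function. */
void leave_api(void)
{
    assert(recurse > 0);
    if (--recurse == 0)
        release_count++;    /* outermost exit: flush the kill list here */
}

/* Two hypothetical public functions, one calling the other. */
void inner_call(void)
{
    enter_api();
    leave_api();
}

void outer_call(void)
{
    enter_api();
    inner_call();
    inner_call();
    leave_api();
}
```

However deeply the calls nest, one call to `outer_call()` performs exactly one init and one release.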
Willy Tarreau	14e7f29e86	MINOR: protocols: replace protocol_by_family() with protocol_lookup() At a few places we were still using protocol_by_family() instead of the richer protocol_lookup(). The former is limited as it enforces SOCK_STREAM and a stream protocol at the control layer. At least with protocol_lookup() we don't have this limitation. The values were still set for now but later we can imagine making them configurable on the fly.	2021-10-27 17:41:07 +02:00
Willy Tarreau	dbb0bb59e3	CLEANUP: resolvers: get rid of single-iteration loop in resolv_get_ip_from_response() In issue 1424 Coverity reports that the loop increment is unreachable, which is true, the list_for_each_entry() was replaced with a for loop, but it was already not needed and was instead used as a convenient construct for a single iteration lookup. Let's get rid of all this now and replace the loop with an "if" statement.	2021-10-22 08:34:14 +02:00
Willy Tarreau	dcb696cd31	MEDIUM: resolvers: hash the records before inserting them into the tree We're using an XXH32() on the record to insert it into or look it up from the tree. This way we don't change the rest of the code, the comparisons are still made on all fields and the next node is visited on mismatch. This also allows to continue to use roundrobin between identical nodes. Just doing this is sufficient to see the CPU usage go down from ~60-70% to 4% at ~2k DNS requests per second for farm with 300 servers. A larger config with 12 backends of 2000 servers each shows ~8-9% CPU for 6-10000 DNS requests per second. It would probably be possible to go further with multiple levels of indexing but it's not worth it, and it's important to remember that tree nodes take space (the struct answer_list went back from 576 to 600 bytes).	2021-10-21 08:29:02 +02:00
Willy Tarreau	7893ae117f	MEDIUM: resolvers: replace the answer_list with a (flat) tree With SRV records, a huge amount of time is spent looking for records by walking long lists. It is possible to reduce this by indexing values in trees instead. However the whole code relies a lot on the list ordering, and even implements some round-robin on it to distribute IP addresses to servers. This patch starts carefully by replacing the list with an eb32 tree that is still used like a list, with a constant key 0. Since ebtrees preserve insertion order for duplicates, the tree walk visits the nodes in the exact same order it did with the lists. This allows to implement the required infrastructure without changing the behavior.	2021-10-21 08:02:08 +02:00
Willy Tarreau	6878f80427	MEDIUM: resolvers: remove the last occurrences of the "safe" argument This one was used to indicate whether the callee had to follow particularly safe code path when removing resolutions. Since the code now uses a kill list, this is not needed anymore.	2021-10-20 17:54:27 +02:00
Willy Tarreau	f766ec6b53	MEDIUM: resolvers: use a kill list to preserve the list consistency When scanning resolution.curr it's possible to try to free some resolutions which will themselves result in freeing other ones. If one of these other ones is exactly the next one in the list, the list walk visits deleted nodes and causes memory corruption, double-frees and so on. The approach taken using the "safe" argument to some functions seems to work but it's extremely brittle as it is required to carefully check all call paths from process_resolvers() and set the argument to 1 there to refrain from deleting entries, so the bug is very likely to come back after some tiny changes to this code. A variant was tried, checking at various places that the current task corresponds to process_resolvers() but this is also quite brittle even though a bit less. This patch uses another approach which consists in carefully unlinking elements from the list and deferring their removal by placing them in a kill list instead of deleting them synchronously. The real benefit here is that the complexity only has to be placed where the complications are. A thread-local list is fed with elements to be deleted before scanning the resolutions, and it's flushed at the end by picking the first one until the list is empty. This way we never dereference the next element and do not care about its presence or not in the list. One function, resolv_unlink_resolution(), is exported and used outside, so it had to be modified to use this list as well. Internal code has to use _resolv_unlink_resolution() instead.	2021-10-20 17:54:22 +02:00
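A simplified sketch of the deferred-removal approach described in f766ec6b53: elements are unlinked during the walk and parked on a kill list, which is flushed afterwards by always picking the first element. The `struct node` type and both function names are hypothetical. The sketch uses a plain singly linked list rather than HAProxy's list macros, and it does not model the thread-local storage or the cascading frees that motivated the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, for illustration only. */
struct node {
    struct node *next;
    int expired;
};

/* Walk the active list; unlink every expired node and push it on the kill
 * list instead of freeing it during the walk.
 */
void collect_expired(struct node **head, struct node **kill)
{
    struct node **pp = head;

    assert(head != NULL && kill != NULL);
    while (*pp) {
        struct node *n = *pp;

        if (n->expired) {
            *pp = n->next;     /* unlink from the active list */
            n->next = *kill;   /* defer the deletion */
            *kill = n;
        } else {
            pp = &n->next;
        }
    }
}

/* Flush the kill list by picking the first element until it is empty, so
 * a "next" pointer of a freed node is never dereferenced.
 * Returns the number of nodes freed.
 */
int flush_kill_list(struct node **kill)
{
    int freed = 0;

    assert(kill != NULL);
    while (*kill) {
        struct node *n = *kill;

        *kill = n->next;
        free(n);
        freed++;
    }
    return freed;
}
```

The point of the design is that the walk itself never frees anything, so the iterator cannot land on a deleted node.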
Willy Tarreau	aae7320b0d	CLEANUP: resolvers: replace all LIST_DELETE with LIST_DEL_INIT The code as it is uses crossed lists between many elements, and at many places the code relies on list iterators or emptiness checks, which does not work with only LIST_DELETE. Further, it is quite difficult to place debugging code and checks in the current situation, and gdb is helpless. This code replaces all LIST_DELETE calls with LIST_DEL_INIT so that it becomes possible to trust the lists.	2021-10-20 17:54:14 +02:00
Willy Tarreau	239675e4a9	CLEANUP: resolvers: simplify resolv_link_resolution() regarding requesters This function allocates requesters by hand for each and every type. This is complex and error-prone, and it doesn't even initialize the list part, leaving dangling pointers that complicate debugging. This patch introduces a new function resolv_get_requester() that either returns the current pointer if valid or tries to allocate a new one and links it to its destination. Then it makes use of it in the function above to clean it up quite a bit. This allows to remove complicated but unneeded tests.	2021-10-20 17:54:01 +02:00
Willy Tarreau	48664c048d	CLEANUP: always initialize the answer_list Similar to the previous patch, the answer's list was only initialized the first time it was added to a list, leading to bogus outdated pointer to appear when debugging code is added around it to watch it. Let's make sure it's always initialized upon allocation.	2021-10-20 17:53:54 +02:00
Willy Tarreau	25e010906a	BUG/MEDIUM: resolvers: always check a valid item in query_list The query_list is physically stored in the struct resolution itself, so we have a list that contains a list to items stored in itself (and there is a single item). But the list is first initialized in resolv_validate_dns_response(), while it's scanned in resolv_process_responses() later after calling the former. First, this results in crashes as soon as the code is instrumented a little bit for debugging, as elements from a previous incarnation can appear. But in addition to this, the presence of an element is checked by verifying that the return of LIST_NEXT() is not NULL, while it may never be NULL even for an empty list, resulting in bugs or crashes if the number of responses does not match the list's contents. This is easily triggered by testing for the list non-emptiness outside of the function. Let's make sure the list is always correct, i.e. it's initialized to an empty list when the structure is allocated, elements are checked by first verifying the list is not empty, they are deleted once checked, and in any case at end so that there are no dangling pointers. This should be backported, but only as long as the patch fits without modifications, as adaptations can be risky there given that bugs tend to hide each other.	2021-10-20 17:53:35 +02:00
Willy Tarreau	10c1a8c3bd	BUILD: resolvers: avoid a possible warning on null-deref Depending on the code that precedes the loop, gcc may emit this warning: src/resolvers.c: In function 'resolv_process_responses': src/resolvers.c:1009:11: warning: potential null pointer dereference [-Wnull-dereference] 1009 | if (query->type != DNS_RTYPE_SRV && flags & DNS_FLAG_TRUNCATED) { | ~~~~~^~~~~~ However after carefully checking, r_res->header.qdcount is exclusively 1 when reaching this place, which forces the for() loop to enter for at least one iteration, and <query> to be set. Thus there's no code path leading to a null deref. It's possibly just because the assignment is too far and the compiler cannot figure that the condition is always OK. Let's just mark it to please the compiler.	2021-10-20 17:53:35 +02:00
Willy Tarreau	2acc160c05	CLEANUP: resolvers: do not export resolv_purge_resolution_answer_records() This code is dangerous enough that we certainly don't want external code to ever approach it, let's not export unnecessary functions like this one. It was made static and a comment was added about its purpose.	2021-10-20 17:52:50 +02:00
Willy Tarreau	2a67aa0a51	BUG/MAJOR: resolvers: add other missing references during resolution removal There is a fundamental design bug in the resolvers code which is that a list of active resolutions is being walked to try to delete outdated entries, and that the code responsible for removing them also removes other elements, including the next one which will be visited by the list iterator. This randomly causes a use-after-free condition leading to crashes, infinite loops and various other issues such as random memory corruption. A first fix for this was brought by commit `0efc0993e` ("BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks"). While preparing for more fixes, some code was factored by commit `11c6c3965` ("MINOR: resolvers: Clean server in a dedicated function when removing a SRV item"), which inadvertently passed "0" as the "safe" argument all the time, missing one case of removal protection, instead of always using "safe". This patch reintroduces the correct argument. This must be backported with all fixes above. Cc: Christopher Faulet <cfaulet@haproxy.com>	2021-10-20 17:52:36 +02:00
Willy Tarreau	75cc65356f	MEDIUM: resolvers: replace bogus resolv_hostname_cmp() with memcmp() resolv_hostname_cmp() is bogus, it is applied on labels and not plain names, but doesn't make any distinction between length prefixes and characters, so it compares the labels lengths via tolower() as well. The only reason for which it doesn't break is because labels cannot be larger than 63 bytes, and that none of the common encoding systems have upper case letters in the lower 63 bytes, that could be turned into a different value via tolower(). Now that all labels are stored in lower case, we don't need to burn CPU cycles in tolower() at run time and can use memcmp() instead of resolv_hostname_cmp(). This results in a ~22% lower CPU usage on large farms using SRV records: before: 18.33% haproxy [.] resolv_validate_dns_response 10.58% haproxy [.] process_resolvers 10.28% haproxy [.] resolv_hostname_cmp 7.50% libc-2.30.so [.] tolower 46.69% total after: 24.73% haproxy [.] resolv_validate_dns_response 7.78% libc-2.30.so [.] __memcmp_avx2_movbe 3.65% haproxy [.] process_resolvers 36.16% total	2021-10-18 10:47:36 +02:00
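The reasoning in 75cc65356f relies on labels being lower-cased once, when the name is stored, so that later comparisons can be a plain `memcmp()`. The following sketch illustrates that split. The names `dn_lower_sketch` and `dn_equal_sketch` are hypothetical and do not exist in HAProxy. Note that only label characters go through `tolower()`, never the length prefixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lower-cases the characters of every label of a
 * length-prefixed DNS name in place, leaving the length bytes untouched.
 * Returns 0 on success, -1 if a label is longer than 63 bytes or runs
 * past the end of the buffer.
 */
int dn_lower_sketch(unsigned char *dn, size_t len)
{
    size_t i = 0;

    assert(dn != NULL);
    while (i < len && dn[i] != 0) {
        size_t lab = dn[i++];           /* length prefix, not a character */

        if (lab > 63 || lab > len - i)
            return -1;
        for (; lab > 0; lab--, i++)
            dn[i] = (unsigned char)tolower(dn[i]);
    }
    return 0;
}

/* Once both names were normalized when stored, comparing them is a plain
 * byte comparison with no per-character case folding at run time.
 */
int dn_equal_sketch(const unsigned char *a, const unsigned char *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}
```

The cost of case folding is paid once per stored name instead of once per comparison.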
Willy Tarreau	814889c28a	MEDIUM: resolvers: lower-case labels when converting from/to DNS names The whole code relies on performing case-insensitive comparison on lookups, which is extremely inefficient. Let's make sure that all labels to be looked up or sent are first converted to lower case. Doing so is also the opportunity to eliminate an inefficient memcpy() in resolv_dn_label_to_str() that essentially runs over a few unaligned bytes at once. As a side note, that call was dangerous because it relied on a sign-extended size taken from a string that had to be sanitized first. This is tagged medium because while this is 100% safe, it may cause visible changes on the wire at the packet level and trigger bugs in test programs.	2021-10-18 09:14:02 +02:00
Willy Tarreau	7b232f132d	BUG/MEDIUM: resolvers: fix truncated TLD consecutive to the API fix A bug was introduced by previous commit `bf9498a31` ("MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero") as the code is particularly contrived and hard to test. The output writes the last char at [i+1] so the trailing zero and return value must be at i+1. This will have to be backported where the patch above is backported since it was needed for a fix.	2021-10-15 08:09:25 +02:00
Willy Tarreau	cc8fd4c040	MINOR: resolvers: merge address and target into a union "data" These two fields are exclusive as they depend on the data type. Let's move them into a union to save some precious bytes. This reduces the struct resolv_answer_item size from 600 to 576 bytes.	2021-10-14 22:52:04 +02:00
Willy Tarreau	b4ca0195a9	BUG/MEDIUM: resolvers: use correct storage for the target address The struct resolv_answer_item contains an address field of type "sockaddr" which is only 16 bytes long, but which is used to store either IPv4 or IPv6. Fortunately, the contents only overlap with the "target" field that follows it and that is large enough to absorb the extra bytes needed to store AAAA records. But this is dangerous as just moving fields around could result in memory corruption. The fix uses a union and removes the casts that were used to hide the problem. Older versions need to be checked and possibly fixed. This needs to be backported anyway.	2021-10-14 22:44:51 +02:00
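The storage problem described in b4ca0195a9 (a 16-byte `sockaddr` used to hold an IPv6 address) can be illustrated with a small sketch. The union `addr_storage_sketch` and the function `store_aaaa` are hypothetical and are not the layout of HAProxy's struct resolv_answer_item. The sketch only shows that a union sized by its largest member removes the need for casts and for relying on the field that happens to follow.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical storage: large enough for either address family because a
 * union takes the size of its largest member.
 */
union addr_storage_sketch {
    struct sockaddr_in  in4;
    struct sockaddr_in6 in6;
};

/* Stores the 16 raw bytes of an AAAA record without any cast and without
 * writing past the end of the destination.
 */
void store_aaaa(union addr_storage_sketch *dst, const unsigned char raw[16])
{
    assert(dst != NULL && raw != NULL);
    memset(dst, 0, sizeof(*dst));
    dst->in6.sin6_family = AF_INET6;
    memcpy(&dst->in6.sin6_addr, raw, 16);
}
```

Moving fields around such a union cannot corrupt neighbouring members, which is the risk the commit message points out.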
Willy Tarreau	875ee704dd	MINOR: resolvers: fix the resolv_dn_label_to_str() API about trailing zero This function suffers from the same API issue as its sibling that does the opposite direction, it demands that the input string is zero-terminated and that its length including the trailing zero is passed on input, forcing callers to pass length + 1, and itself to use that length - 1 everywhere internally. This patch addresses this. There is a single caller, which is the location of the previous bug, so it should probably be backported at least to keep the code consistent across versions. Note that the function is called dns_dn_label_to_str() in 2.3 and earlier.	2021-10-14 21:24:18 +02:00
Willy Tarreau	85c15e6bff	BUG/MINOR: resolvers: do not reject host names of length 255 in SRV records An off-by-one issue in buffer size calculation used to limit the output of resolv_dn_label_to_str() to 254 instead of 255. This must be backported to 2.0.	2021-10-14 21:24:18 +02:00
Willy Tarreau	947ae125cc	BUG/MEDIUM: resolver: make sure to always use the correct hostname length In issue #1411, @jjiang-stripe reports that do-resolve() sometimes seems to be trying to resolve crap from random memory contents. The issue is that action_prepare_for_resolution() tries to measure the input string by itself using strlen(), while resolv_action_do_resolve() directly passes it a pointer to the sample, omitting the known length. Thus of course any other header present after the host in memory is appended to the host value. It could theoretically crash if really unlucky, with a buffer that does not contain any zero including in the index at the end, and if the HTX buffer ends on an allocation boundary. In practice it should be too low a probability to have ever been observed. This patch modifies the action_prepare_for_resolution() function to take the string length along with the host name on input and pass that down the chain. This should be backported to 2.0 along with commit "MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero".	2021-10-14 21:24:18 +02:00
Willy Tarreau	bf9498a31b	MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero This function is bogus at the API level: it demands that the input string is zero-terminated and that its length including the trailing zero is passed on input. While that already looks smelly, the trailing zero is copied as-is, and is then explicitly replaced with a zero... Not only all callers have to pass hostname_len+1 everywhere to work around this absurdity, but this requirement causes a bug in the do-resolve() action that passes random string lengths on input, and that will be fixed on a subsequent patch. Let's fix this API issue for now. This patch will have to be backported, and in versions 2.3 and older, the function is in dns.c and is called dns_str_to_dn_label().	2021-10-14 21:24:18 +02:00
Willy Tarreau	beeabf5314	MINOR: task: provide 3 task_new_* wrappers to simplify the API We'll need to improve the API to pass other arguments in the future, so let's start to adapt better to the current use cases. task_new() is used: - 18 times as task_new(tid_bit) - 18 times as task_new(MAX_THREADS_MASK) - 2 times with a single bit (in a loop) - 1 in the debug code that uses a mask This patch provides 3 new functions to achieve this: - task_new_here() to create a task on the calling thread - task_new_anywhere() to create a task to be run anywhere - task_new_on() to create a task to run on a specific thread The change is trivial and will allow us to later concentrate the required adaptations to these 3 functions only. It's still possible to call task_new() if needed but a comment was added to encourage the use of the new ones instead. The debug code was not changed and still uses it.	2021-10-01 18:36:29 +02:00
Amaury Denoyelle	dd56520cdf	BUG/MINOR: resolvers: mark servers with name-resolution as non purgeable When a server is configured with name-resolution, resolvers objects are created with reference to this server. Thus the server is marked as non purgeable to prevent its removal at runtime. This does not need to be backported.	2021-08-26 15:53:17 +02:00
Christopher Faulet	1f923391d1	BUG/MINOR: resolvers: Use a null-terminated string to lookup in servers tree When we evaluate a DNS response item, it may be necessary to look for a server with a hostname matching the item target into the named servers tree. To do so, the item target is transformed to a lowercase string. It must be a null-terminated string. Thus we must explicitly set the trailing '\0' character. For a specific resolution, the named servers tree contains all servers using this resolution with a hostname loaded from a state file. Because of this bug, same entry may be duplicated because we are unable to find the right server, assigning this way the item to a free server slot. This patch should fix the issue #1333. It must be backported as far as 2.2.	2021-07-22 15:03:25 +02:00
Christopher Faulet	d7bb23490c	BUG/MINOR: resolvers: Always attach server on matching record on resolution On A/AAAA resolution, for a given server, if a record is matching, we must always attach the server to this record. Before it was only done if the server IP was not the same as the record's. However, it is a problem if the server IP was not set by a previous resolution but, for instance, by the libc during startup. In this case, the server IP is not updated and the server is not attached to any record. It remains in this state while a matching record is found in the DNS response. It is especially a problem when the resolution is used for server-templates. This bug was introduced by the commit `bd78c912f` ("MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item"). This patch should solve the issue #1305. It must be backported to all versions containing the above commit.	2021-06-24 17:15:33 +02:00
Christopher Faulet	e886dd5c32	BUG/MINOR: resolvers: Use resolver's lock in resolv_srvrq_expire_task() The commit `dcac41806` ("BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status") introduced a typo. In resolv_srvrq_expire_task() function, the resolver's lock must be used instead of the resolver itself. This patch must be backported with the patch above (at least as far as 2.2).	2021-06-18 09:15:35 +02:00
Christopher Faulet	dcac418062	BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status When a server relies on a SRV resolution, a task is created to clean it up (fqdn/port and address) when the SRV resolution is considered as outdated (based on the resolvers 'timeout' value). It is only possible if the server inherits outdated info from a state file and is no longer selected to be attached to a SRV item. Note that most of the time, a server is attached to a SRV item. Thus when the item becomes obsolete, the server is cleaned up. It is important to have such a task to be sure the server will be free again to have a chance to be resolved again with fresh information. Of course, this patch is a workaround to solve a design issue. But there is no other obvious way to fix it without rewriting all the resolvers part. And it must be backportable. This patch relies on following commits: * MINOR: resolvers: Clean server in a dedicated function when removing a SRV item * MINOR: resolvers: Remove server from named_servers tree when removing a SRV item All the series must be backported as far as 2.2 after some observation period. Backports to 2.0 and 1.8 must be evaluated.	2021-06-17 16:52:35 +02:00
Christopher Faulet	73001ab6e3	MINOR: resolvers: Remove server from named_servers tree when removing a SRV item When a server is cleaned up because the corresponding SRV item is removed, we always remove the server from the srvrq's name_servers tree. For now, it is useless because, if a server was attached to a SRV item, it means it was already removed from the tree. But it will be mandatory to fix a bug.	2021-06-17 16:52:35 +02:00
Christopher Faulet	11c6c39656	MINOR: resolvers: Clean server in a dedicated function when removing a SRV item A dedicated function is now used to clean up servers when a SRV item becomes obsolete or when a requester is removed from a resolution. This patch is mandatory to fix a bug.	2021-06-17 16:52:35 +02:00
Willy Tarreau	72faef3866	MEDIUM: global: remove dead code from nbproc/bind_proc removal Lots of places iterating over nbproc or comparing with nbproc could be simplified. Further, "bind-process" and "process" parsing that was already limited to process 1 or "all" or "odd" resulted in a bind_proc field that was either 0 or 1 during the init phase and later always 1. All the checks for compatibilities were removed since it's not possible anymore to run a frontend and a backend on different processes or to have peers and stick-tables bound on different ones. This is the largest part of this patch. The bind_proc field was removed from both the proxy and the receiver structs. Since the "process" and "bind-process" directives are still parsed, configs making use of correct values allowing process 1 will continue to work.	2021-06-15 16:52:42 +02:00
Emeric Brun	3406766d57	MEDIUM: resolvers: add a ref between servers and srv request or used SRV record This patch adds a ref into servers to register them onto the record answer item used to set their hostnames. It also adds a head list into 'srvrq' to register servers free to be assigned to a SRV record. A head of a tree is also added to srvrq to store servers which present a hostname in the server state file, so that they can be quickly re-linked to the matching record as soon as an item presents the same name. This results in better performance on SRV record response parsing. This is an optimization but it could avoid triggering haproxy's internal watchdog in some circumstances. And for this reason it should be backported as far as we can (2.0 ?)	2021-06-11 16:16:16 +02:00
Emeric Brun	bd78c912fd	MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item This patch adds a head list into answer items on servers which use this record to set their IPs. It makes lookup on duplicated ip faster and allows to check immediately if an item is still valid renewing the IP. This results in better performance on A/AAAA resolutions. This is an optimization but it could avoid triggering haproxy's internal watchdog in some circumstances. And for this reason it should be backported as far as we can (2.0 ?)	2021-06-11 16:16:16 +02:00
Emeric Brun	12ca658dbe	BUG/MINOR: resolvers: answser item list was randomly purged or errors In case of SRV records, the answer item list was purged by the error callback of the first requester which considers the error could not be safely ignored. It makes this item list unavailable for subsequent requesters even if they consider the error could be ignored. On A resolution or do_resolve action error, the answer items were never trashed. This patch reworks the error callbacks and the code checking the return code. If a callback returns 1, we consider the error was ignored and the answer item list must be kept. Conversely, if all error callbacks of all requesters of the same resolution return 0, the list is purged. This patch should be backported as far as 2.0.	2021-06-11 16:16:16 +02:00
Amaury Denoyelle	111243003e	MINOR: errors: specify prefix "config" for parsing output Set "config :" as a prefix for the user messages context before starting the configuration parsing. All following stderr output will be prefixed by it. As a consequence, remove extraneous prefix "config" already specified in various ha_alert/warning/notice calls.	2021-06-07 17:19:16 +02:00
Willy Tarreau	ca14dd5537	BUILD: resolvers: include tools.h Many functions from tools.h are called there but it was inherited via others.	2021-05-08 12:59:47 +02:00
Amaury Denoyelle	e4a617c931	MINOR: action: replace match_pfx by a keyword flags field Define a new keyword flag KWF_MATCH_PREFIX. This is used to replace the match_pfx field of action struct. This has the benefit to have more explicit action declaration, and now it is possible to quickly implement experimental actions.	2021-05-07 14:35:01 +02:00
Willy Tarreau	b205bfdab7	CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages There were 102 CLI commands whose help were zig-zagging all along the dump making them unreadable. This patch realigns all these messages so that the command now uses up to 40 characters before the delimiting colon. About a third of the commands did not correctly list their arguments which were added after the first version, so they were all updated. Some abuses of the term "id" were fixed to use a more explanatory term. The "set ssl ocsp-response" command was not listed because it lacked a help message, this was fixed as well. The deprecated enable/disable commands for agent/health/server were prominently written as deprecated. Whenever possible, clearer explanations were provided.	2021-05-07 11:51:26 +02:00
Willy Tarreau	2b71810cb3	CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion The current "ADD" vs "ADDQ" is confusing because when thinking in terms of appending at the end of a list, "ADD" naturally comes to mind, but here it does the opposite, it inserts. Several times already it's been incorrectly used where ADDQ was expected, the latest of which was a fortunate accident explained in `6fa922562` ("CLEANUP: stream: explain why we queue the stream at the head of the server list"). Let's use more explicit (but slightly longer) names now: LIST_ADD -> LIST_INSERT LIST_ADDQ -> LIST_APPEND LIST_ADDED -> LIST_INLIST LIST_DEL -> LIST_DELETE The same is true for MT_LISTs, including their "TRY" variant. LIST_DEL_INIT keeps its short name to encourage to use it instead of the lazier LIST_DELETE which is often less safe. The change is large (~674 non-comment entries) but is mechanical enough to remain safe. No permutation was performed, so any out-of-tree code can easily map older names to new ones. The list doc was updated.	2021-04-21 09:20:17 +02:00
Emeric Brun	c8f3e45c6a	MEDIUM: resolvers: add support of tcp address on nameserver line. This patch re-works configuration parsing, it removes the "server" lines from "resolvers" sections introduced in commit `56fc5d9eb`: MEDIUM: resolvers: add supports of TCP nameservers in resolvers. It also extends the nameserver lines to support stream server addresses such as: resolvers nameserver localhost tcp@127.0.0.1:53 Doing so, a part of nameserver's init code was factorized in function 'parse_resolvers' and removed from 'post_parse_resolvers'.	2021-04-08 14:20:40 +02:00
Amaury Denoyelle	ce44482fe5	REORG: global: move initcall register code in a dedicated file Create a new module init which contains code related to REGISTER_* macros for initcalls. init.h is included in api.h to make init code available to all modules. It's a step to clean up a bit haproxy.c/global.h.	2021-03-26 15:28:33 +01:00
Willy Tarreau	70490ebb12	CLEANUP: resolvers: use pool_zalloc() in resolv_link_resolution() This one used to alloc then zero the area, let's have the allocator do it.	2021-03-22 23:19:28 +01:00
Amaury Denoyelle	30c0537f5a	REORG: server: use flags for parse_server Modify the API of parse_server function. Use flags to describe the type of the parsed server instead of discrete arguments. These flags can be used to specify if a server/default-server/server-template is parsed. Additional parameters are also specified (parsing of the address required, resolve of a name must be done immediately). It is now unneeded to use strcmp on args[0] in parse_server. Also, the calls to parse_server are more explicit thanks to the flags.	2021-03-18 15:37:05 +01:00
Christopher Faulet	db31b4486c	CLEANUP: resolvers: Perform unsafe loop on requester list when possible When answer list of a response is checked, it is useless to perform a safe loop on the requester list.	2021-03-12 17:42:47 +01:00
Christopher Faulet	e8674c7184	MINOR: resolvers: Don't try to match immediately renewed ADD items The loop looking for existing ADD items to renew their last_seen must ignore the items already renewed in the same loop. To do so, we rely on the last_seen time. Because it is now based on now_ms, it is safe. Doing so avoids matching the same ADD item several times when the same IP address is found in several ADD items. This reduces the number of extra DNS resolutions. This patch depends on "MINOR: resolvers: Use milliseconds for cached items in resolver responses". Both may be backported as far as 2.2 if necessary.	2021-03-12 17:42:45 +01:00
Christopher Faulet	55c1c4053f	MINOR: resolvers: Use milliseconds for cached items in resolver responses The last time an item was seen in a resolver response is now stored in milliseconds instead of seconds. This avoids some corner cases at the edges. This also simplifies time comparisons.	2021-03-12 17:41:28 +01:00
Christopher Faulet	d83a6df5cd	BUG/MEDIUM: resolvers: Skip DNS resolution at startup if SRV resolution is set At startup, if a SRV resolution is set for a server, no DNS resolution is created. We must wait for the first SRV resolution to know if it must be triggered. It is important to do so for two reasons. First, during a "classical" startup, a server based on a SRV resolution has no hostname. Thus the created DNS resolution is useless. It is better to wait for the first SRV resolution. It is not really a bug at this stage, it is just useless. Second, in the same situation, if the server state is loaded from a file, its hostname will be set a bit later. Thus, if there is no additional record for this server, because there is already a DNS resolution, it inhibits any new DNS resolution. But there is no hostname attached to the existing DNS resolution. So no resolution is performed at all for this server. To avoid any problem, it is far easier to handle this special case during startup. But this means we must be prepared to have no "resolv_requester" field for a server at runtime. This patch must be backported as far as 2.2.	2021-03-12 17:41:28 +01:00
Christopher Faulet	0efc0993ec	BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks Another way to say it: "Safely unlink requester from a requester callbacks". Requester callbacks must never try to unlink a requester from a resolution, for the current requester or another one. First, these callback functions are called in a loop on a requester list, not necessarily a safe one. Thus unlinking a resolution at this place may be unsafe. And it is useless to try to make these loops safe because all this stuff is placed in a loop on a resolution list. Unlinking a requester may lead to releasing a resolution if it is the last requester. However, the unlink is necessary because we cannot reset the server state (hostname and IP) with some pending DNS resolution on it. So, to work around this issue, we introduce the "safe" unlink. It is only performed from a requester callback. In this case, the unlink function never releases the resolution, it only resets it if necessary. And when a resolution is found with an empty requester list, it is released. This patch depends on the following commits : * MINOR: resolvers: Purge answer items when a SRV resolution triggers an error * MINOR: resolvers: Use a function to remove answers attached to a resolution * MINOR: resolvers: Directly call srvrq_update_srv_state() when possible * MINOR: resolvers: Add function to change the srv status based on SRV resolution All the series must be backported as far as 2.2. It fixes a regression introduced by the commit `b4badf720` ("BUG/MINOR: resolvers: new callback to properly handle SRV record errors").	2021-03-12 17:41:28 +01:00
Christopher Faulet	6b117aed49	MINOR: resolvers: Directly call srvrq_update_srv_state() when possible When the server status must be updated from the result of a SRV resolution, we can directly call srvrq_update_srv_state(). It is simpler and this avoids a test on the server DNS resolution. This patch is mandatory for the next commit. It also relies on "MINOR: resolvers: Directly call srvrq_update_srv_state() when possible".	2021-03-12 17:41:28 +01:00
Christopher Faulet	1dec5c7934	MINOR: resolvers: Use a function to remove answers attached to a resolution resolv_purge_resolution_answer_records() must be used to remove all answers attached to a resolution. For now, it is only used when a resolution is released.	2021-03-12 17:41:28 +01:00
Christopher Faulet	3e0600fbbf	BUG/MEDIUM: resolvers: Trigger a DNS resolution if an ADD item is obsolete When an ADD item attached to a SRV item is removed because it is obsolete, we must trigger a DNS resolution to be sure whether the hostname still resolves or not. There is no other way to be sure the entry is still valid. And we cannot set the server in RMAINT immediately, because a DNS server may be inconsistent and may stop adding some additional records. The opposite is also true. If a valid ADD item is still attached to a SRV item, any DNS resolution must be stopped. There is no reason to perform extra resolution in this case. This patch must be backported as far as 2.2.	2021-03-12 17:41:28 +01:00
Baptiste Assmann	6a8d11dc80	MINOR: resolvers: new function find_srvrq_answer_record() This function searches for a SRV answer item associated to a requester whose type is server. This is mainly useful to "link" a server to its SRV record when no additional records were found to configure the IP address. This patch is required by a bug fix.	2021-03-12 17:41:28 +01:00
Christopher Faulet	77f860699c	BUG/MEDIUM: resolvers: Fix the loop looking for an existing ADD item For each ADD item found in a SRV response, we try to find a corresponding ADD item already attached to an existing SRV item. If found, the ADD last_seen time is updated, otherwise we try to find a SRV item with no ADD to attach the new one. However, the loop is buggy. Instead of comparing 2 ADD items, it compares the new ADD item with the SRV item. Because of this bug, we are unable to renew the last_seen time of an existing ADD. This patch must be backported as far as 2.2.	2021-03-12 17:41:24 +01:00
Christopher Faulet	ab177ac1f3	BUG/MEDIUM: resolvers: Don't set an address-less server as UP When a server status is updated based on a SRV item, it is always set to UP, regardless of whether it has an IP address defined or not. For instance, if only a SRV item is received, with no additional record, only the server hostname is defined. We must wait to have an IP address to set the server as UP. This patch must be backported as far as 2.2.	2021-03-12 16:43:37 +01:00
Christopher Faulet	bca680ba90	BUG/MINOR: resolvers: Unlink DNS resolution to set RMAINT on SRV resolution When a server is set in RMAINT because of a SRV resolution failure, the server DNS resolution, if any, must be unlinked first. It is mandatory to handle the change in the context of a SRV resolution. This patch must be backported as far as 2.2.	2021-03-12 16:43:37 +01:00
Christopher Faulet	5037c06d91	Revert "BUG/MINOR: resolvers: Only renew TTL for SRV records with an additional record" This reverts commit `a331a1e8eb`. This commit fixes a real bug, but it also reveals some hidden bugs, mostly because of some design issues. Thus, in itself, it creates more problems than it solves. So revert it for now. All known bugs will be addressed in next commits. This patch should be backported as far as 2.2.	2021-03-12 16:43:37 +01:00
Willy Tarreau	144f84a09d	MEDIUM: task: extend the state field to 32 bits It's been too short for quite a while now and is now full. It's still time to extend it to 32-bits since we have room for this without wasting any space, so we now gained 16 new bits for future flags. The values were not reassigned just in case there would be a few hidden u16 or short somewhere in which these flags are placed (as it used to be the case with stream->pending_events). The patch is tagged MEDIUM because this required to update the task's process() prototype to use an int instead of a short, that's quite a bunch of places.	2021-03-05 08:30:08 +01:00
Willy Tarreau	61cfdf4fd8	CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x) This makes the code more readable and less prone to copy-paste errors. In addition, it allows to place some __builtin_constant_p() predicates to trigger a link-time error in case the compiler knows that the freed area is constant. It will also produce compile-time error if trying to free something that is not a regular pointer (e.g. a function). The DEBUG_MEM_STATS macro now also defines an instance for ha_free() so that all these calls can be checked. 178 occurrences were converted. The vast majority of them were handled by the following Coccinelle script, some slightly refined to better deal with "&*x" or with long lines: @ rule @ expression E; @@ - free(E); - E = NULL; + ha_free(&E); It was verified that the resulting code is the same, more or less a handful of cases where the compiler optimized slightly differently the temporary variable that holds the copy of the pointer. A non-negligible amount of {free(str);str=NULL;str_len=0;} are still present in the config part (mostly header names in proxies). These ones should also be cleaned for the same reasons, and probably be turned into ist strings.	2021-02-26 21:21:09 +01:00
Christopher Faulet	69beaa91d5	REORG: server: Export and rename some functions updating server info Some static functions are now exported and renamed to follow the same pattern of other exported functions. Here is the list : * update_server_fqdn: Renamed to srv_update_fqdn and exported * update_server_check_addr_port: renamed to srv_update_check_addr_port and exported * update_server_agent_addr_port: renamed to srv_update_agent_addr_port and exported * update_server_addr: renamed to srv_update_addr * update_server_addr_port: renamed to srv_update_addr_port * srv_prepare_for_resolution: exported This change is mandatory to move all functions dealing with the server-state files in a separate file.	2021-02-25 10:02:39 +01:00
Christopher Faulet	52d4d30109	BUG/MEDIUM: resolvers: Reset server address and port for obsolete SRV records When a SRV record expires, the ip/port assigned to the associated server are now removed. Otherwise, the server is stopped but keeps its ip/port while the server hostname is removed. It is confusing when the servers state is retrieved on the CLI and may be a problem if saved in a server-state file, because the reload may fail due to this inconsistency. Here is an example: * Declare a server template in a backend, using the resolver <dns> server-template test 2 _http._tcp.example.com resolvers dns check * 2 SRV records are announced with the corresponding additional records. Thus, 2 servers are filled. Here is the "show servers state" output : 2 frt 1 test1 192.168.1.1 2 64 0 1 2 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0 2 frt 2 test2 192.168.1.2 2 64 0 1 1 15 3 4 6 0 0 0 http2.example.com 8002 _http._tcp.example.com 0 0 - - 0 * Then, one additional record is removed (or a SRV record is removed, the result is the same). Here is the new "show servers state" output : 2 frt 1 test1 192.168.1.1 2 64 0 1 38 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0 2 frt 2 test2 192.168.1.2 0 96 0 1 19 15 3 0 14 0 0 0 - 8002 _http._tcp.example.com 0 0 - - 0 On reload, if a server-state file is used, this leads to undefined behaviors depending on the configuration. This patch should be backported as far as 2.0.	2021-02-24 21:58:45 +01:00
Baptiste Assmann	b4badf720c	BUG/MINOR: resolvers: new callback to properly handle SRV record errors When a SRV record was created, it used to register the regular server name resolution callbacks. That said, SRV records and regular server name resolution don't work the same way, especially regarding error management. This patch introduces a new callback to manage DNS errors related to the SRV queries. This fixes GitHub issue #50. Backport status: 2.3, 2.2, 2.1, 2.0	2021-02-24 21:58:45 +01:00
Christopher Faulet	a331a1e8eb	BUG/MINOR: resolvers: Only renew TTL for SRV records with an additional record If no additional record is associated to a SRV record, its TTL must not be renewed. Otherwise the entry never expires. Thus once announced a first time, the entry remains blocked on the same IP/port except if a new announce replaces the old one. Now, the TTL is updated if a SRV record is received while a matching existing one is found with an additional record or when a new additional record is assigned to an existing SRV record. This patch should be backported as far as 2.2.	2021-02-24 21:58:45 +01:00
Christopher Faulet	9c246a4b6c	BUG/MINOR: resolvers: Fix condition to release received ARs if not assigned At the end of resolv_validate_dns_response(), if a received additional record is not assigned to an existing server record, it is released. But the condition to do so is buggy. If "answer_record" (the received AR) is not assigned, "tmp_record" is not a valid record object. It is just a dummy record "representing" the head of the record list. Now, the condition is far cleaner. This patch must be backported as far as 2.2.	2021-02-24 21:58:45 +01:00
Emeric Brun	56fc5d9ebc	MEDIUM: resolvers: add support of TCP nameservers in resolvers. This patch introduces the new line "server" to set a TCP nameserver in a "resolvers" section: server <name> <address> [param*] Used to configure a DNS TCP or stream server. This supports all "server" parameters found in the 5.2 paragraph. Some of these parameters are irrelevant for DNS resolving. Note: currently 4 queries are pipelined on the same connections. A batch of idle connections is removed every 5 seconds. "maxconn" can be configured to limit the amount of those concurrent connections and TLS should also be usable if the server supports it. The current implementation limits pipelining to 4 queries. The name of the line in configuration is open to discussion and could be changed before the next release.	2021-02-13 10:03:46 +01:00
Emeric Brun	c943799c86	MEDIUM: resolvers/dns: split dns.c into dns.c and resolvers.c This patch splits the current dns.c into two files: The first, dns.c, contains code related to DNS message exchange over UDP and, in the future, over TCP. We try to remove dependencies on resolving to make it usable by other stuff such as DNS load balancing. The new resolvers.c inherits the code specific to the actual resolvers. Note: It was really difficult to obtain a clean diff due to the amount of moved code. Note2: Counters and stuff related to stats are not cleanly separated because currently counters for both layers are merged and hard to separate for now.	2021-02-13 10:03:46 +01:00


129 Commits