haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-11 01:26:58 +02:00

Author	SHA1	Message	Date
Willy Tarreau	8a96216847	MEDIUM: sock-inet: re-check IPv6 connectivity every 30s IPv6 connectivity might start off (e.g. network not fully up when haproxy starts), so for features like resolvers, it would be nice to periodically recheck. With this change, instead of having the resolvers code rely on a variable indicating connectivity, it will now call a function that will check for how long a connectivity check hasn't been run, and will perform a new one if needed. The age was set to 30s which seems reasonable considering that the DNS will cache results anyway. There's no saving in spacing it more since the syscall is very cheap (just a connect() without any packet being emitted). The variables remain exported so that we could present them in show info or anywhere else. This way, "dns-accept-family auto" will now stay up to date. Warning though, it does perform some caching so even with a refreshed IPv6 connectivity, an older record may be returned anyway.	2025-05-09 15:45:44 +02:00
Aurelien DARRAGON	bd48e26a74	CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases Since `91e785edc` ("MINOR: stream: Rely on a per-stream max connection retries value"), px->conn_retries may be ignored in the following cases: * proxy not part of a list which gets properly post-init (ie: main proxy list, log-forward list, sink list) * proxy lacking the CAP_FE capability Documenting such cases where the px->conn_retries is set but effectively ignored, so that we either remove ignored statements or fix them in the future if they are really needed. In fact all cases affected here are autonomous applets that already handle the retries themselves so the fact that `91e785edc` made ->conn_retries ineffective should not be a big deal anyway.	2025-04-29 21:21:19 +02:00
Willy Tarreau	40aceb7414	MINOR: resolvers: use the runtime IPv6 status instead of boot time one On systems where the network is not reachable at boot time (certain HA systems for example, or dynamically addressed test machines), we'll want to be able to periodically revalidate the IPv6 reachability status. The current code makes it complicated because it sets the config bits once for all at boot time. This commit changes this so that the config bits are not changed, but instead we rely on a static inline function that relies on sock_inet6_seems_reachable for every test (really cheap). This also removes the now unneeded resolvers late init code. This variable for now is still set at boot time but this will ease the transition later, as the resolvers code is now ready for this.	2025-04-25 09:32:05 +02:00
Willy Tarreau	69b051d1dc	MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6 Instead of always having to force IPv4 or IPv6, let's now also offer "auto" which will only enable IPv6 if the system has a default gateway for it. This means that properly configured dual-stack systems will default to "ipv4,ipv6" while those lacking a gateway will only use "ipv4". Note that no real connectivity test is performed, so firewalled systems may still get it wrong and might prefer to rely on a manual "ipv4" assignment.	2025-04-24 17:52:28 +02:00
Willy Tarreau	2c46c2c042	MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS In order to ease troubleshooting and testing, the new "-4" command line argument enforces queries and processing of "A" DNS records only, i.e. those representing IPv4 addresses. This can be useful when a host lacks end-to-end dual-stack connectivity. This overrides the global "dns-accept-family" directive and is equivalent to value "ipv4".	2025-04-24 17:52:28 +02:00
Willy Tarreau	940fa19ad8	MEDIUM: resolvers: add global "dns-accept-family" directive By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be influenced by the "resolve-prefer" keywords on server lines as well as the family argument to the "do-resolve" action, but that is only a preference, which does not block the other family from being used when it's alone. In some environments where dual-stack is not usable, stumbling on an unreachable IPv6-only DNS record can cause significant trouble as it will replace a previous IPv4 one which would possibly have continued to work till next request. The "dns-accept-family" global option permits to enforce usage of only one (or both) address families. The argument is a comma-delimited list of the following words: - "ipv4": query and accept IPv4 addresses ("A" records) - "ipv6": query and accept IPv6 addresses ("AAAA" records) When a single family is used, no request will be sent to resolvers for the other family, and any response for the other family will be ignored. The default value is "ipv4,ipv6", which effectively enables both families.	2025-04-24 17:52:28 +02:00
Aurelien DARRAGON	4194f756de	MEDIUM: tree-wide: avoid manually initializing proxies In this patch we try to use the proxy API init functions as much as possible to avoid code redundancy and prevent proxy initialization errors. As such, we prefer using alloc_new_proxy() and setup_new_proxy() instead of manually allocating the proxy pointer and performing the base init ourselves.	2025-04-10 22:10:31 +02:00
Christopher Faulet	892eb2bb2c	BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined A 'resolvers' section may be defined without any nameserver. In that case, we must take care to not dump corresponding Prometheus metrics. However there is an issue that could lead to a crash or a strange infinite loop because we are looping on an empty list and, at some point, we are dereferencing an invalid pointer. There is an issue because the loop on the nameservers of a resolvers section is performed via callback functions and not the standard list_for_each_entry macro. So we must take care to properly detect end of the list and empty lists for nameservers. But the fix is not so simple because resolvers sections with and without nameservers may be mixed. To fix the issue, in rslv_promex_start_ts() and rslv_promex_next_ts(), when the next resolvers section must be evaluated, a loop is now used to properly skip empty sections. This patch is related to #2831. Not sure it fixes it. It must be backported as far as 3.0.	2025-01-06 09:08:38 +01:00
Ilia Shipitsin	193c94a539	BUG/MINOR: resolvers: handle a possible strdup() failure This defect was found by the coccinelle script "unchecked-strdup.cocci". It can be backported to all supported branches.	2024-12-10 08:05:50 +01:00
Willy Tarreau	9c6ccb8dbb	MEDIUM: config: warn on unitless timeouts < 100 ms From time to time we face a configuration with very small timeouts which look accidental because there could be expectations that they're expressed in seconds and not milliseconds. This commit adds a check for non-nul unitless values smaller than 100 and emits a warning suggesting to append an explicit unit if that was the intent. Only the common timeouts, the server check intervals and the resolvers hold and timeout values were covered for now. All the code needs to be manually reviewed to verify if it supports emitting warnings. This may break some configs using "zero-warning", but greps in existing configs indicate that these are extremely rare and solely intentionally done during tests. At least even if a user leaves that after a test, it will be more obvious when reading 10ms that something's probably not correct.	2024-11-19 10:33:20 +01:00
Christopher Faulet	8f28dbeea9	BUG/MEDIUM: resolvers: Insert a non-executed resolution in front of the wait list When a resolver is woken up to process DNS resolutions, it is possible to trigger an infinite loop on the resolver's wait list because delayed resolutions are always reinserted at the end of this list. This leads the watchdog to kill the process. Re-inserting them in front of the list fixes the bug. When a resolver tries to send the queries for the resolutions in its wait list, it may be unable to proceed for a resolution. This may happen because the resolution must be skipped (no hostname to resolve, a resolution already in-progress) or when an error occurred. In that case, the resolution is re-inserted in the resolver's wait list to be retried later, on a next wakeup. However, the resolution is inserted at the end of the wait list. So it is immediately reevaluated, in the same execution loop, instead of being delayed. Most of the time, it is not an issue because the resolution is considered as not expired on the second run. But it is a problem when the internal time wraps and is equal to 0. In that case, the resolution expiration date is badly computed and it is always considered as expired. If two or more resolutions are in that state, the resolver loops forever on its wait list, until the process is killed by the watchdog. So we can argue that the way the resolution expiration date is computed must be fixed. And it would be true in a perfect world. However, the resolvers code is so crappy that it is hard to be sure to not introduce regressions. It is far easier to re-insert delayed resolutions in front of the wait list. This fixes the issue and at worst, these resolutions will be evaluated one time too many on the next wakeup and only if now_ms was equal to 0 on the prior wakeup. This patch should be backported to all stable versions. On 2.2, LIST_ADD() must be used instead of LIST_INSERT()	2024-11-13 10:53:27 +01:00
William Lallemand	5184f3fb30	BUG/MINOR: resolvers/mworker: missing default resolvers in mworker mode Since commit `fe75c1e12d` ("MEDIUM: startup: remove MODE_MWORKER_WAIT") the MODE_MWORKER_WAIT constant disappeared. The initialization of the default resolvers section was conditioned by this constant. The section must be created in mworker mode, but only in the worker not in the master. It was currently completely disabled in both the master and the worker which could break configuration using it, as well as the httpclient. No backport needed.	2024-10-17 12:17:23 +02:00
Valentine Krasnobaeva	fe75c1e12d	MEDIUM: startup: remove MODE_MWORKER_WAIT MODE_MWORKER_WAIT became redundant with MODE_MWORKER, due to moving master-worker fork in init(). This change allows the master to no longer perform a reexec just after forking in order to free additional memory. As after the fork in the master process we set 'master' variable, we can replace now MODE_MWORKER_WAIT in some 'if' statements by simple check of this 'master' variable. Let's also continue to get rid of HAPROXY_MWORKER_WAIT_ONLY environment variable, as it's no longer needed as well. In cfg_program_postparser(), which is used to check if cmdline is defined to launch a program, we completely remove the check of mode for now, because the master process does not parse the configuration for the moment. 'program' section parsing will be reintroduced in master later in the next commits.	2024-10-16 22:02:39 +02:00
Willy Tarreau	b325453c36	MINOR: proxy: use the global file names for conf->file Proxy file names are assigned a bit everywhere (resolvers, peers, cli, logs, proxy). All these elements were enumerated and now use copy_file_name(). The only ha_free() call was turned to drop_file_name(). As a bonus side effect, a 300k backend config saved 14 MB of RAM.	2024-09-19 15:38:19 +02:00
Aperence	a7b04e383a	MINOR: tools: extend str2sa_range to add an alt parameter Add a new parameter "alt" that will store whether this configuration uses an alternate protocol. This alt pointer will contain a value that can be transparently passed to protocol_lookup to obtain an appropriate protocol structure. This change is needed to allow for example the servers to know if they need to use an alternate protocol or not.	2024-08-30 18:53:49 +02:00
Willy Tarreau	d592ebdbeb	MEDIUM: socket: always properly use the sock_domain for requested families Now we make sure to always look up the protocol's domain for an address family. Previously we would use it as-is, which prevented from properly using custom addresses (which is when they differ). This removes some hard-coded tests such as in log.c where UNIX vs UDP was explicitly checked for example. It requires a bit of care, however, so as to properly pass value 1 in the 3rd arg of the protocol_lookup() for DGRAM stuff. Maybe one day we'll change these for defines or enums to limit mistakes.	2024-08-21 17:36:58 +02:00
Aurelien DARRAGON	d3d35f0fc6	BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2) Fix build warning on NetBSD by reapplying `f278eec37a` ("BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char"). This should fix issue #2551.	2024-07-18 13:29:52 +02:00
Christopher Faulet	91fe085943	BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section By default, there is always at least one resolver section, the default one, based on "/etc/resolv.conf" content. However, it is possible to have no resolver at all if the file is empty or if any error occurred. Errors are silently ignored at this stage. In that case, there was a bug in the Prometheus exporter leading to a crash because the resolver section list is empty. An invalid resolver entity was used. To fix the issue we must only take care to not dump resolvers metrics when there is no resolver. Thanks to Aurelien for having spotted the offending commit. This patch should fix the issue #2604. It must be backported to 3.0.	2024-06-12 08:55:52 +02:00
Aurelien DARRAGON	c16eba8183	BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error @boi4 reported in GH #2578 that since 3.0-dev1 for servers with address learned from A/AAAA records after a DNS flap server would be put out of maintenance with proper address but with invalid port (== 0), making it unusable and causing tcp checks to fail: [NOTICE] (1) : Loading success. [WARNING] (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. [ALERT] (8) : backend 'mybackend' has no server available! [WARNING] (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'. [WARNING] (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again). [WARNING] (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer. [WARNING] (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. @boi4 also mentioned that this used to work fine before. Willy suggested that this regression may have been introduced by `64c9c8e` ("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS") Turns out he was right! Indeed, in `64c9c8e` we systematically memset the whole server_inetaddr struct (which contains both the requested server's addr and port planned for atomic update) instead of only memsetting the addr part of the structure: except when SRV records are involved (SRV records provide both the address and the port unlike A or AAAA records), we must not reset the server's port upon DNS errors because the port may have been provided at config time and we don't want to lose its value. Big thanks to @boi4 for his well-documented issue that really helped us to pinpoint the bug right on time for the dev-13 release. 
No backport needed (unless `64c9c8e` gets backported).	2024-05-24 15:29:48 +02:00
Amaury Denoyelle	634cc2a5d8	MINOR: counters: move last_change into counters struct last_change was a member present in both proxy and server struct. It is used as an age statistic to report the last update of the object. Move last_change into fe_counters/be_counters. This is necessary to be able to manipulate it through generic stat column and report it into stats-file. Note that there is a change for proxy structure with now 2 different last_change values, on frontend and backend side. Special care was taken to ensure that the value is initialized only on the proxy side. The other value is set to 0 unless a listen proxy is instantiated. For the moment, only backend counter is reported in stats. However, with now two distinct values, stats could be extended to report it on both sides.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondence was done individually inside stats_fill_*_line() functions. This renders the process to define new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat column to replace name_desc usage is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain type of objects. This new type is also further extended to include counter offsets. This allows to define a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. New type "struct stat_col" is fully compatible with name_desc. This allows to gradually convert stats definition. The focus will be first for proxies counters to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Tim Duesterhus	cd5d62249f	CLEANUP: Reapply ist.cocci (3) This reapplies ist.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Willy Tarreau	ad31e53287	REORG: dns/ring: split the ring between the generic one and the DNS one A ring is used for the DNS code but slightly differently from the generic one, which prevents some important changes from being made to the generic code without breaking DNS. As the use cases differ, it's better to just split them apart for now and have the DNS code use its own ring that we rename dns_ring and let the generic code continue to live on its own. The unused parts such as CLI registration were dropped, resizing and allocation from a mapped area were dropped. dns_ring_detach_appctx() was kept despite not being used, so as to stay consistent with the comments that say it must be called, despite the DNS code explicitly mentioning that it skips it for now (i.e. this may change in the future). Hopefully after the generic rings are converted the DNS code can migrate back to them, though this is really not necessary.	2024-03-25 17:34:19 +00:00
Aurelien DARRAGON	59f08f65fd	CLEANUP: tree-wide: use proper ERR_* return values for PRE_CHECK fcts httpclient_precheck(), ssl_ocsp_update_precheck(), and resolvers_create_default() functions are registered through REGISTER_PRE_CHECK() macro to be called by haproxy during init from the pre_check_list list. When calling functions registered in pre_check_list, haproxy expects ERR_* return values. However those 3 functions currently use raw return values, so we better use explicit ERR_* macros to prevent breakage in the future if ERR_* values mapping were to change.	2024-03-07 11:48:08 +01:00
Ilya Shipitsin	96cd04f8db	CLEANUP: fix typo in naming for variable "unused" In resolvers.c:rslv_promex_next_ts() and in stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly called "unsued" instead of "unused". Let's fix this in a separate patch so that it can be omitted from backports if this causes build problems.	2024-03-05 11:50:34 +01:00
Christopher Faulet	868205943c	MAJOR: stats: Send stats dump over HTTP using zero-copy forwarding Just like for the cache applet, it is now possible to send response to the opposite side using the zero-copy forwarding. Internal functions were slightly updated but there is nothing special to say. Except the requested size during the nego stage is not exact.	2024-02-07 15:04:48 +01:00
Christopher Faulet	ca6f0ca82b	MEDIUM: promex/resolvers: Dump resolvers metrics via a promex module Just like for stick-tables, this patch adds a promex module to dump resolvers metrics. It adds the "resolver" scope and for now, it dumps following metrics: * haproxy_resolver_sent * haproxy_resolver_send_error * haproxy_resolver_valid * haproxy_resolver_update * haproxy_resolver_cname * haproxy_resolver_cname_error * haproxy_resolver_any_err * haproxy_resolver_nx * haproxy_resolver_timeout * haproxy_resolver_refused * haproxy_resolver_other * haproxy_resolver_invalid * haproxy_resolver_too_big * haproxy_resolver_outdated	2024-02-02 09:11:34 +01:00
Christopher Faulet	3246f863d6	MEDIUM: stats: Be able to access a specific field into a stats module It is now possible to selectively retrieve extra counters from stats modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to return a specific counter.	2024-02-01 12:00:53 +01:00
Aurelien DARRAGON	c5cace3100	BUG/MEDIUM: server/dns: perform svc_port updates atomically from SRV records This was the last missing bit from `cd994407a` ("BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates") Indeed, despite the fix, svc_port updates from resolvers were still directly performed on the server's struct. Now they make proper use of the server_set_inetaddr() function so the port change (+ optional addr change with AR) will be propagated atomically. This patch depends on: - "MINOR: server: ensure connection cleanup on server addr changes" - "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event" - "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic" - "MEDIUM: server: make server_set_inetaddr() updater serializable" - "MINOR: server/event_hdl: expose updater info through INETADDR event" - "MINOR: server: add dns hint in server_inetaddr_updater struct" - "MEDIUM: server/dns: clear RMAINT when addr resolves again" While it could be backported in 2.9 with `cd994407a` ("BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates") to ensure addr and svc_port updates performed by resolver's code comply with the API taking care of pushing the update (and thus avoid any race), some patch dependencies are quite sensitive so it's probably best to avoid backporting for no good reason, or at least wait for it to be considered stable to prevent any breakages	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	64c9c8ef39	BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS As seen before, server's addr and svc_port should not be updated directly during runtime, because even if the update is performed under the lock, some competing threads might be reading ->addr and ->svc_port without the lock because they simply cannot afford it. To prevent races with such competing threads, server's addr and port should only be updated using server_set_inetaddr() function or similar. This patch depends on: - "MINOR: server: ensure connection cleanup on server addr changes" - "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event" - "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic" - "MEDIUM: server: make server_set_inetaddr() updater serializable" - "MINOR: server/event_hdl: expose updater info through INETADDR event" - "MINOR: server: add dns hint in server_inetaddr_updater struct" - "MEDIUM: server/dns: clear RMAINT when addr resolves again" While it could be backported in 2.9 with `cd994407a` ("BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates") to ensure addr and svc_port reset performed by resolver's code comply with the API taking care of pushing the update (and thus avoid any race), some patch dependencies are quite sensitive so it's probably best to avoid backporting for no good reason, or at least wait for it to be considered stable to prevent any breakages.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	334ebfa1a2	MEDIUM: server/dns: clear RMAINT when addr resolves again snr_update_srv_status() and srvrq_update_srv_status() will both set or clear the server RMAINT state depending on the result of the current dns resolution. This used to work pretty well in the past, but now that addr:svc_port changes are changed atomically through a dedicated task, the change is performed asynchronously, so this can cause some flapping issues if the server is put out of maintenance while the server's address is still unassigned. To prevent errors, the resolver's code is now only allowed to put the server under maintenance but not to remove it from maintenance: the decision to remove a server from maintenance is performed by the task responsible for updating the server's addr: if the addr resolves again thanks to a valid DNS resolution and the server was previously under RMAINT, then it is cleared from RMAINT state. srvrq_update_srv_status() was renamed srvrq_set_srv_down(), since it is only called to put the server in maintenance as a result of a failing SRV entry. snr_update_srv_status() was renamed srv_set_srv_down() and slightly modified so that it only takes care of putting the server under maintenance when needed. The cli command "set server x/y addr" does not need to remove the RMAINT flag anymore.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	3ac79b504a	MEDIUM: server: make server_set_inetaddr() updater serializable server_set_inetaddr() updater argument is a simple char * string containing infos about the caller responsible for the update. In this patch, we try to make this argument serializable, that is, make it so that we can easily export it without having to keep the original pointer passed by the caller or having to work with strings of variable lengths. This was a prerequisite for exposing more updater information through SERVER_INETADDR event (upcoming patch). Static strings were simply mapped to a fixed ID that can be converted back to a string when needed using server_inetaddr_updater_by_to_str(). One special case was made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater since in this case the updater hint has to be generated from the corresponding resolver id / nameserver id combination. This was achieved by saving the nameserver id within the updater struct. Knowing that the resolver id can be guessed from the server struct directly, it was not exposed through the updater struct. This patch depends on: - "MINOR: resolvers: add unique numeric id to nameservers" No functional change should be expected.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	2f6120d6d4	MINOR: resolvers: add unique numeric id to nameservers When we want to avoid keeping pointers on a nameserver struct, it's not always convenient to refer to a nameserver using its text-based unique identifier since it's not limited in length thus it cannot be serialized and deserialized safely. To address this limitation, we add a new ->puid member in dns_nameserver struct which is a parent-unique numeric value that can be used to refer to the dns nameserver within its parent resolver context. To achieve this, we reused the resolver->nb_nameserver member that wasn't used. Each time we add a new nameserver to a resolver: we set ns->puid to the current number of nameservers within the resolver and we increment this number right away. Public helper function find_nameserver_by_resolvers_and_id() was added to help retrieve nameserver pointer from (resolver X nameserver puid) combination.	2023-12-21 14:22:27 +01:00
William Lallemand	0d2ebb53f7	BUG/MINOR: resolvers: default resolvers fails when network not configured Bug #1740 was opened again, this time a user is complaining about the "can't create socket for nameserver". This can happen if the resolv.conf file contains a class of address which was not configured on the machine, for example IPv6. The fix does the same as `b10b1196b` ("MINOR: resolvers: shut the warning when "default" resolvers is implicit"), and uses the "resolvers->conf.implicit" variable to emit the error. Though it is not needed to convert the explicit behavior with a ERR_WARN, because this is supposed to be an unrecoverable error, unlike the connect(). Should fix issue #1740. Must be backported where `b10b1196b` was backported. (as far as 2.6)	2023-12-18 15:50:07 +01:00
Aurelien DARRAGON	12582eb8e5	MINOR: tools: make str2sa_range() directly return type hints str2sa_range() already allows the caller to provide <proto> in order to get a pointer on the protocol matching with the string input thanks to `5fc9328a` ("MINOR: tools: make str2sa_range() directly return the protocol") However, as stated into the commit message, there is a trick: "we can fail to return a protocol in case the caller accepts an fqdn for use later. This is what servers do and in this case it is valid to return no protocol" In this case, we're unable to return protocol because the protocol lookup depends on both the [proto type + xprt type] and the [family type] to be known. While family type might not be directly resolved when fqdn is involved (because family type might be discovered using DNS queries), proto type and xprt type are already known. As such, the caller might be interested in knowing those address related hints even if the address family type is not yet resolved and thus the matching protocol cannot be looked up. Thus in this patch we add the optional net_addr_type (custom type) argument to str2sa_range to enable the caller to check the protocol type and transport type when the function succeeds.	2023-11-10 17:49:57 +01:00
Christopher Faulet	06e9c81bd0	MEDIUM: resolvers: Stop scheduling resolution during stopping stage When the process is stopping, the server resolutions are suspended. However the task is still periodically woken up for nothing. If there is a huge number of resolutions, it may lead to a noticeable CPU consumption for no reason. To avoid this extra CPU cost, we stop scheduling the resolution tasks during the stopping stage. Of course, it is only true for server resolutions. Dynamic ones, via do-resolve actions, are not concerned. These ones must still be triggered during stopping stage. Concretely, during the stopping stage, the resolvers task is no longer scheduled if there are no running resolutions. In this case, if a do-resolve action is evaluated, the task is woken up. This patch should partially solve the issue #2145.	2023-05-17 16:48:33 +02:00
Christopher Faulet	7b3d38a633	MEDIUM: tree-wide: Change sc API to specify required free space to progress sc_need_room() now takes the required free space to receive more data as parameter. All calls to this function are updated accordingly. For now, this value is set but not used. When we are waiting for a buffer, 0 is used. So we expect to be unblocked ASAP. However this must be reviewed because SC_FL_NEED_BUF is probably enough in this case and this flag is already set if the input buffer allocation fails.	2023-05-05 15:44:23 +02:00
Christopher Faulet	f4258bdf3b	MINOR: stats: Use the applet API to write data stats_putchk() is updated to use the applet API instead of the channel API to write data. To do so, the appctx is passed as parameter instead of the channel. This way, the applet does not need to take care to request more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:29 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never nul and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. 
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
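The wrap periods quoted in the commit above can be checked with a little integer arithmetic. The following is a minimal sketch, not HAProxy code: every `sketch_` name is hypothetical, and a 365-day year is assumed for simplicity. It also illustrates the "zero means not set yet" convention and the plain `date1<date2` comparison the message mentions.

```c
#include <stdint.h>

/* Whole years before a 64-bit nanosecond counter wraps
 * (assumption: 365-day years, so the result rounds down). */
static uint64_t sketch_ns_wrap_years(void)
{
	const uint64_t ns_per_year = 1000000000ULL * 86400ULL * 365ULL;

	return UINT64_MAX / ns_per_year;
}

/* Whole days before a 32-bit millisecond counter wraps. */
static uint32_t sketch_ms_wrap_days(void)
{
	const uint32_t ms_per_day = 86400000U;

	return UINT32_MAX / ms_per_day;
}

/* With a counter that is never zero in practice, zero can
 * serve as the "not set yet" marker for a timestamp. */
static int sketch_ts_isset(uint64_t ts_ns)
{
	return ts_ns != 0;
}

/* 64-bit nanosecond dates can be compared directly. */
static int sketch_ts_before(uint64_t date1_ns, uint64_t date2_ns)
{
	return date1_ns < date2_ns;
}
```

With these assumptions the 64-bit counter lasts 584 whole years (just under the 585 quoted above), while a 32-bit millisecond counter lasts 49 whole days, which is why only the latter needs its wrapping exercised shortly after boot.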
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Christopher Faulet	89aeabff5b	BUG/MINOR: resolvers: Use sc_need_room() to wait more room when dumping stats It was a cut/paste typo during stream-interface to conn-stream refactoring. sc_have_room() was used instead of sc_need_room(). This patch must be backported as far as 2.6.	2023-04-28 08:51:34 +02:00
Tim Duesterhus	1307cd42d2	CLEANUP: Stop checking the pointer before calling `ring_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { ring_free(e); - } @@ expression e; @@ - if (e) { ring_free(e); - } @@ expression e; @@ - if (e) ring_free(e); @@ expression e; @@ - if (e != NULL) ring_free(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	fe83f58906	CLEANUP: Stop checking the pointer before calling `task_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { task_destroy(e); - } @@ expression e; @@ - if (e) { task_destroy(e); - } @@ expression e; @@ - if (e) task_destroy(e); @@ expression e; @@ - if (e != NULL) task_destroy(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	c18e244515	CLEANUP: Stop checking the pointer before calling `pool_free()` Changes performed with this Coccinelle patch: @@ expression e; expression p; @@ - if (e != NULL) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) pool_free(p, e); @@ expression e; expression p; @@ - if (e != NULL) pool_free(p, e);	2023-04-23 00:28:25 +02:00
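The three Coccinelle cleanups above only make sense if the release functions tolerate a NULL argument themselves, the way `free()` does. The sketch below shows that pattern with a hypothetical `sketch_item_free()`; it is an illustration of the idiom and not the implementation of `ring_free()`, `task_destroy()` or `pool_free()`.

```c
#include <stdlib.h>

/* Hypothetical object type used only for this illustration. */
struct sketch_item {
	char *name;
};

/* Counts real releases so the behavior can be observed. */
static int sketch_free_calls;

/* Release an item. A NULL pointer is accepted and ignored, so
 * callers do not need their own "if (it)" guard around the call. */
static void sketch_item_free(struct sketch_item *it)
{
	if (!it)
		return;
	free(it->name);
	free(it);
	sketch_free_calls++;
}
```

Once the NULL check lives inside the function, a caller can write `sketch_item_free(it);` unconditionally, which is exactly the shape the semantic patches rewrite to.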
Christopher Faulet	5220a8c5c4	BUG/MEDIUM: resolvers: Force the connect timeout for DNS resolutions Timeouts for dynamic resolutions are not handled at the stream level but by the resolvers themselves. It means there is no connect, client and server timeouts defined on the internal proxy used by a resolver. While it is not an issue for DNS resolution over UDP, it can be a problem for resolution over TCP. New sessions are automatically created when required, and killed on excess. But only established connections are considered. Connecting ones are never killed. Because there is no connect timeout, we rely on the kernel to report a connection error. And this may be quite long. Because resolutions are periodically triggered, this may lead to an excess of unusable sessions in connecting state. This also prevents HAProxy from quickly exiting on soft-stop. It is annoying, especially because there is no reason to not set a connect timeout. So to mitigate the issue, we now use the "resolve" timeout as connect timeout for the internal proxy attached to a resolver. This patch should be backported as far as 2.4.	2023-04-11 08:19:06 +02:00
Christopher Faulet	142cc1b52a	BUG/MINOR: resolvers: Wakeup DNS idle task on stopping Thanks to previous commit ("BUG/MEDIUM: dns: Kill idle DNS sessions during stopping stage"), DNS idle sessions are killed during the stopping stage. But the task responsible for killing these sessions runs every 5 seconds. It means, when HAProxy is stopped, we can observe a delay before the process exits. To reduce this delay, when the resolvers task is executed, all DNS idle tasks are woken up. This patch must be backported as far as 2.6.	2023-04-11 08:19:06 +02:00
Christopher Faulet	52ec6f14c4	BUG/MEDIUM: resolvers: Properly stop server resolutions on soft-stop When HAProxy is stopping, the DNS resolutions must be stopped, except those triggered from a "do-resolve" action. To do so, the resolutions themselves cannot be destroyed because the current design is too complex. However, it is possible to mute the resolvers tasks. The same is already performed with the health-checks. On soft-stop, the tasks are still running periodically but nothing is performed. For the resolvers, when the process is stopping, before running a resolution, we check all the requesters attached to this resolution. If at least one requester is a stream or if there is a requester attached to a running proxy, a new resolution is triggered. Otherwise, we ignore the resolution. It will be evaluated again on the next wakeup. This way, "do-resolve" actions are still working during soft-stop but other resolutions are stopped. Of course, it may be seen as a feature and not a bug because it was never performed. But it is in fact not expected at all to still perform resolutions when HAProxy is stopping. In addition, a proxy option will be added to change this behavior. This patch partially fixes the issue #1874. It could be backported to 2.7 and maybe to 2.6. But no further.	2023-03-14 15:23:55 +01:00
Aurelien DARRAGON	e5958d0292	BUG/MEDIUM: stats: fix resolvers dump In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"), we forgot to apply the patch in resolvers.c which provides the stats_dump_resolvers() function that is involved when dumping with "resolvers" domain. As a consequence, resolvers dump was broken because stats_dump_one_line(), which is used in stats_dump_resolv_to_buffer(), implicitly uses trash_chunk from stats.c to prepare the dump, and stats_putchk() is then called with global trash (currently empty) as output data. Given that trash_dump variable is static and thus only available within stats.c we change stats_putchk() function prototype so that the function does not take the output buffer as an argument. Instead, stats_putchk() will implicitly use the local trash_dump variable declared in stats.c. It will also prevent further mixups between stats_dump_* functions and stats_putchk(). This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats")	2023-02-06 07:53:03 +01:00
Christopher Faulet	51dbb4cb79	BUG/MINOR: resolvers: Wait the resolution execution for a do_resolv action The do_resolv action triggers a resolution and must wait for the result. Concretely, if no cache entry is available, it creates a resolution and wakes up the resolvers task. Then it yields. When the action is recalled, if the resolution is still running, it yields again. However, if the resolution is not running, it does not check whether it was ever running. Thus, it is possible to ignore the resolution because the action was recalled before the resolvers task had a chance to be executed. If there is no result, the action must yield. This patch should fix the issue #1993. It must be backported as far as 2.0.	2023-01-11 10:31:42 +01:00
Christopher Faulet	819d48b14e	BUG/MEDIUM: resolvers: Use tick_first() to update the resolvers task timeout In resolv_update_resolvers_timeout(), the resolvers task timeout is updated by checking running and waiting resolutions. However, to find the next wakeup date, MIN() operator is used to compare ticks. Ticks must never be compared with such operators, tick helper functions must be used, to properly handle the TICK_ETERNITY value. In this case, tick_first() must be used instead of MIN(). It is an old bug but it is pretty visible since the commit `fdecaf6ae4` ("BUG/MINOR: resolvers: do not run the timeout task when there's no resolution"). Because of this bug, the resolvers task timeout may be set to TICK_ETERNITY, stopping periodic resolutions. This patch should solve the issue #1962. It must be backported to all stable versions.	2022-12-14 10:44:17 +01:00
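The failure mode described in the commit above is easy to reproduce in isolation. The sketch below is a simplified stand-alone model, not HAProxy's tick API: all `sketch_` and `SKETCH_` names are hypothetical, and it assumes that the "eternity" value is encoded as 0, so a plain minimum always picks it over any real expiration date.

```c
#include <assert.h>

/* Assumption for this sketch: 0 encodes "no timeout set". */
#define SKETCH_TICK_ETERNITY 0u

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Return the earliest of two ticks, ignoring unset ones. */
static unsigned int sketch_tick_first(unsigned int t1, unsigned int t2)
{
	if (t1 == SKETCH_TICK_ETERNITY)
		return t2;
	if (t2 == SKETCH_TICK_ETERNITY)
		return t1;
	/* The signed difference keeps the ordering valid when the
	 * tick counter wraps around. */
	return ((int)(t1 - t2) <= 0) ? t1 : t2;
}

/* One running resolution expiring at tick 1500, no waiting one. */
static void sketch_tick_demo(void)
{
	unsigned int running = 1500;
	unsigned int waiting = SKETCH_TICK_ETERNITY;

	/* A plain minimum selects "eternity": the task never wakes up. */
	assert(SKETCH_MIN(running, waiting) == SKETCH_TICK_ETERNITY);

	/* The tick-aware helper keeps the real expiration date. */
	assert(sketch_tick_first(running, waiting) == 1500);
}
```

The helper only returns the eternity value when both inputs are unset, which matches the intent of a task that should sleep forever only when there is nothing left to resolve.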

165 Commits