haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-27 06:31:23 +01:00

Author	SHA1	Message	Date
Aurelien DARRAGON	5c299dee5a	MEDIUM: stats: consider that shared stats pointers may be NULL This patch looks huge, but it has a very simple goal: protect all accesses to shared stats pointers (either reads or writes), because we now consider that these pointers may be NULL. The reason behind this is that despite all precautions taken to ensure the pointers shouldn't be NULL when not expected, there are still corner cases (ie: frontends stats used on a backend which has no FE cap and vice versa) where we could try to access a memory area which is not allocated. Willy stumbled on such cases while playing with the rings servers upon connection error, which eventually led to process crashes (since 3.3 when shared stats were implemented). Also, we may decide later that shared stats are optional and should be disabled on the proxy to save memory and CPU, and this patch is a step further towards that goal. So in essence, this patch ensures shared stats pointers are always initialized (including NULL), and adds necessary guards before shared stats pointers are de-referenced. Since we already had some checks for backends and listeners stats, and the pointer address retrieval should stay in cpu cache, let's hope that this patch doesn't impact stats performance much.	2025-09-18 16:49:51 +02:00
Willy Tarreau	fdf6fd5b45	MEDIUM: server: switch the host_dn member to cebis_tree This member is used to index the hostname_dn contents for DNS resolution. Let's replace it with a cebis_tree to save another 32 bytes (24 for the node + 8 by avoiding the duplication of the pointer). The struct server is now at 3904 bytes.	2025-09-16 09:23:46 +02:00
Christopher Faulet	f8f94ffc9c	BUG/MEDIUM: server: Use sni as pool connection name for SSL server only By default, for a given server, when no pool-conn-name is specified, the configured sni is used. However, this must only be done when SSL is in-use for the server. Of course, it is uncommon to have a sni expression for non-ssl server. But this may happen. In addition, the SSL may be disabled via the CLI. In that case, the pool-conn-name must be discarded if it was copied from the sni. And, we must of course take care to set it if the ssl is enabled. Finally, when the attach-srv action is checked, we now check the pool-conn-name expression. This patch should be backported as far as 3.0. It relies on "MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function" which should be backported too.	2025-09-05 15:56:08 +02:00
Aurelien DARRAGON	75e480d107	MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct Between 3.2 and 3.3-dev we noticed a noticeable performance regression due to stats handling. After bisecting, Willy found out that recent work to split stats computing across multiple thread groups (stats sharding) was responsible for that performance regression. We're looking at roughly 20% performance loss. More precisely, it is the added indirections, multiplied by the number of statistics that are updated for each request, which in the end causes a significant amount of time being spent resolving pointers. We noticed that the fe_counters_shared and be_counters_shared structures which are currently allocated in dedicated memory since a0dcab5c ("MAJOR: counters: add shared counters base infrastructure") are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters over thread groups") because they now essentially hold flags plus the per-thread group id pointer mapping, not the counters themselves. As such we decided to try merging fe_counters_shared and be_counters_shared in their parent structures. The cost is slight memory overhead for the parent structure, but it allows us to get rid of one pointer indirection. This patch alone yields visible performance gains and almost restores 3.2 stats performance. counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and now returns either failure or success instead of a pointer because we don't need to retrieve a shared pointer anymore, the function takes care of initializing the existing pointer.	2025-07-25 16:46:10 +02:00
Willy Tarreau	96da670cd7	MINOR: resolvers: do not duplicate the hostname_dn field The hostdn.key field in the server contains a pure copy of the hostname_dn since commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers and srv request or used SRV record") which wanted to lowercase it. Since it's not necessary, let's drop this useless copy. In addition, the return from strdup() was not tested, so it could theoretically crash the process under heavy memory contention.	2025-07-08 07:54:45 +02:00
Willy Tarreau	95cf518bfa	BUG/MINOR: resolvers: don't lower the case of binary DNS format The server's "hostname_dn" is in Domain Name format, not a pure string, as converted by resolv_str_to_dn_label(). It is made of lower-case string components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03>org. As such it must not be lowercased again in srv_state_srv_update(), because 1) it's useless on the name components since already done, and 2) because it would replace component lengths 97 and above by 32-char shorter ones. Granted, not many domain names have that large components so the risk is very low but the operation is always wrong anyway. This was brought in 2.5 by commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers and srv request or used SRV record"). In the same vein, let's fix the confusing strcasecmp() that are applied to this binary format, and use memcmp() instead. Here there's basically no risk to incorrectly match the wrong record, but that test alone is confusing enough to provoke the existence of the bug above. Finally let's update the component for that field to mention that it's in this format and already lower cased. Better not backport this, the risk of facing this bug is almost zero, and every time we touch such files something breaks for bad reasons.	2025-07-08 07:54:45 +02:00
Aurelien DARRAGON	4fcc9b5572	MINOR: counters: rename last_change counter to last_state_change Since proxy and server struct already have an internal last_change variable and we cannot merge it with the shared counter one, let's rename the last_change counter to be more specific and prevent the mixup between the two. last_change counter is renamed to last_state_change, and unlike the internal last_change, this one is a shared counter so it is expected to be updated by other processes behind our back. However, when updating last_state_change counter, we use the value of the server/proxy last_change as reference value.	2025-06-30 16:26:38 +02:00
Aurelien DARRAGON	01dfe17acf	MEDIUM: server: add and use a separate last_change variable for internal use last_change server metric is used for 2 separate purposes. First it is used to report last server state change date for stats and other related metrics. But it is also used internally, including in sensitive paths, such as lb related stuff to take decisions or perform computations (ie: in srv_dynamic_maxconn()). Due to last_change counter now being split over thread groups since 16eb0fa ("MAJOR: counters: dispatch counters over thread groups"), reading the aggregated value has a cost, and we cannot afford to consult last_change value from srv_dynamic_maxconn() anymore. Moreover, since the value is used to take decisions for the current process we don't want the variable to be updated by another process behind our back. To prevent performance regression and sharing issues, let's instead add a separate srv->last_change value, which is not updated atomically (given how rare the updates are), and only serves for places where the use of the aggregated last_change counter/stats (split over thread groups) is too costly.	2025-06-30 16:26:25 +02:00
Aurelien DARRAGON	16eb0fab31	MAJOR: counters: dispatch counters over thread groups Most fe and be counters are good candidates for being shared between processes. They are now grouped inside "shared" struct sub member under be_counters and fe_counters. Now that they are properly identified, they would greatly benefit from being shared over thread groups to reduce the cost of atomic operations when updating them. For this, we take the current tgid into account so each thread group only updates its own counters. For this to work, it is mandatory that the "shared" member from {fe,be}_counters is initialized AFTER global.nbtgroups is known, because each shared counter causes the stat to be allocated global.nbtgroups times. When updating a counter without concurrency, the first counter from the array may be updated. To consult the shared counters (which requires aggregation of per-tgid individual counters), some helper functions were added to counter.h to ease code maintenance and avoid computing errors.	2025-06-05 09:59:38 +02:00
Aurelien DARRAGON	a0dcab5c45	MAJOR: counters: add shared counters base infrastructure Shareable counters are now tagged as shared counters and are dynamically allocated in separate memory area as a prerequisite for being stored in shared memory area. For now, GUID and threads groups are not taken into account, this is only a first step. Also, we ensure all counters are now manipulated using atomic operations, namely, "last_change" counter is now read from and written to using atomic ops. Despite the numerous changes caused by the counters being moved away from counters struct, no change of behavior should be expected.	2025-06-05 09:58:58 +02:00
Christopher Faulet	647a290662	BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks "hold.timeout" was used as expiration date for srvrq_check tasks. But it is not accurate. The expiration date must be based on the resolution timeouts instead (resolve and retry). The purpose of srvrq_check task is to clean up the server resolution status when outdated info are inherited from the state file. Using "hold.timeout" is not accurate here because hold timeouts concern the resolution response items not the resolution status of servers. It may be set to a huge value or 0. The expiration date of these tasks must be based on the resolution timeouts instead. So now the ("timeout resolve" + resolve_retries * "timeout retry") value is used. This patch should fix the issue #2816. It must be backported to all stable versions.	2024-12-11 10:00:01 +01:00
Aurelien DARRAGON	85298189bf	BUG/MEDIUM: server: server stuck in maintenance after FQDN change Pierre Bonnat reported that SRV-based server-template recently stopped working properly. After reviewing the changes, it was found that the regression was caused by a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT") Indeed, HMAINT is not a regular maintenance flag. It was implemented in b418c122 a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT"). This flag is only set (and never removed) when the server FQDN is changed from its initial config-time value. This can happen with "set server fqdn" command as well as SRV records updates from the DNS. This flag should ideally belong to server flags, but it was stored under srv_admin enum because cur_admin is properly exported/imported via server state-file while regular server's flags are not. Due to a4d04c6, when a server FQDN changes, the server is considered in maintenance, and since the HMAINT flag is never removed, the server is stuck in maintenance. To fix the issue, we partially revert a4d04c6. But this latter commit is right on one point: HMAINT flag was way too confusing and mixed-up between regular MAINT flags, thus there's nothing to blame about a4d04c6 as it was error-prone anyway. To prevent such kind of bugs from happening again, let's rename HMAINT to something more explicit (SRV_ADMF_FQDN_CHANGED) and make it stand out under srv_admin enum so we're not tempted to mix it with regular maintenance flags anymore. Since a4d04c6 was set to be backported in all versions, this patch must be backported there as well.	2024-10-16 14:26:57 +02:00
Aurelien DARRAGON	d3d35f0fc6	BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2) Fix build warning on NetBSD by reapplying f278eec37a ("BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char"). This should fix issue #2551.	2024-07-18 13:29:52 +02:00
Amaury Denoyelle	634cc2a5d8	MINOR: counters: move last_change into counters struct last_change was a member present in both proxy and server struct. It is used as an age statistics to report the last update of the object. Move last_change into fe_counters/be_counters. This is necessary to be able to manipulate it through generic stat column and report it into stats-file. Note that there is a change for proxy structure with now 2 different last_change values, on frontend and backend side. Special care was taken to ensure that the value is initialized only on the proxy side. The other value is set to 0 unless a listen proxy is instantiated. For the moment, only backend counter is reported in stats. However, with now two distinct values, stats could be extended to report it on both side.	2024-05-02 10:55:25 +02:00
Marcos de Oliveira	462b54dee2	BUG/MINOR: server-state: Avoid warning on 'file not found' On a clean installation, users might want to use server-state-file and the recommended zero-warning option. This caused a problem if server-state-file was not found, as a warning was emitted, causing startup to fail. This will allow users to specify nonexistent server-state-file at first, and dump states to the file later. Fixes #2190 CF: Technically speaking, this patch can be backported to all stable versions. But it is better to do so to 2.8 only for now.	2023-07-21 15:08:27 +02:00
Marcos de Oliveira	122a903b94	BUG/MINOR: server-state: Ignore empty files Users might want to pre-create an empty file for later dumping server-states. This commit allows for that by emitting a notice in case the file is empty and a warning if the file is not empty but the version is unknown. Fix partially: #2190 CF: Technically speaking, this patch can be backported to all stable versions. But it is better to do so to 2.8 only for now.	2023-07-21 15:08:27 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Aurelien DARRAGON	1746b56e68	MINOR: server: change srv_op_st_chg_cause storage type This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type". While looking at current srv_op_st_chg_cause usage, it was clear that the struct needed some cleanup since some leftovers from asynchronous server state change updates were left behind and resulted in some useless code duplication, and making the whole thing harder to maintain. Two observations were made: - by tracking down srv_set_{running, stopped, stopping} usage, we can see that the <reason> argument is always a fixed statically allocated string. - check-related state change context (duration, status, code...) is not used anymore since srv_append_status() directly extracts the values from the server->check. This is pure legacy from when the state changes were applied asynchronously. To prevent code duplication, useless string copies and make the reason/cause more exportable, we store it as an enum now, and we provide srv_op_st_chg_cause() function to fetch the related description string. HEALTH and AGENT causes (check related) are now explicitly identified to make consumers like srv_append_op_chg_cause() able to fetch checks info from the server itself if they need to.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9c21ff0208	BUG/MINOR: server: don't use date when restoring last_change from state file When restoring from a state file: the server "Status" reports weird values on the html stats page: "5s UP" becomes -> "? UP" after the restore This is due to a bug in srv_state_srv_update(): when restoring the states from a state file, we rely on date.tv_sec to compute the process-relative server last_change timestamp. This is wrong because everywhere else we use now.tv_sec when dealing with last_change, for instance in srv_update_status(). date (which is Wall clock time) deviates from now (monotonic time) in the long run. They should not be mixed, and given that last_change is an internal time value, we should rely on now.tv_sec instead. last_change export through "show servers state" cli is safe since we export a delta and not the raw time value in dump_servers_state(): srv_time_since_last_change = now.tv_sec - srv->last_change -- While this bug affects all stable versions, it was revealed in 2.8 thanks to 28360dc ("MEDIUM: clock: force internal time to wrap early after boot") This is due to the fact that "now" immediately deviates from "date", whereas in the past they had the same value when starting. Thus prior to 2.8 the bug is trickier since it could take some time for date and now to deviate sufficiently for the issue to arise, and instead of reporting absurd values that are easy to spot it could just result in last_change becoming inconsistent over time. As such, the fix should be backported to all stable versions. [for 2.2 the patch needs to be applied manually since srv_state_srv_update() was named srv_update_state() and can be found in server.c instead of server_state.c]	2023-04-21 14:36:45 +02:00
Willy Tarreau	74bc991600	BUILD: server-state: avoid using not-so-portable isblank() Once in a while we get rid of this one. isblank() is missing on old C libraries and only matches two values, so let's just replace it. It was brought with this commit in 2.4: 0bf268e18 ("MINOR: server: Be more strict on the server-state line parsing") It may be backported though it's really not important.	2022-01-28 19:04:02 +01:00
Christopher Faulet	dfd10ab5ee	MINOR: proxy: Introduce proxy flags to replace disabled bitfield This change is required to support TCP/HTTP rules in defaults sections. The 'disabled' bitfield in the proxy structure, used to know if a proxy is disabled or stopped, is replaced a generic bitfield named 'flags'. PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and PR_FL_STOPPED respectively. In addition, everywhere there is a test to know if a proxy is disabled or stopped, there is now a bitwise AND operation on PR_FL_DISABLED and/or PR_FL_STOPPED flags.	2021-10-15 14:12:19 +02:00
Willy Tarreau	a8a72c68d5	CLEANUP: ssl/server: move ssl_sock_set_srv() to srv_set_ssl() in server.c This one has nothing to do with ssl_sock as it manipulates the struct server only. Let's move it to server.c and remove unneeded dependencies on ssl_sock.h. This further reduces by 10% the number of includes of opensslconf.h and by 0.5% the number of compiled lines.	2021-10-07 01:41:06 +02:00
Tim Duesterhus	d5fc8fcb86	CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h This solves setting XXH_INLINE_ALL in a cleaner way, because the imported header is not modified, easing future updates. see 6f7cc11e6dd0f01b437fba893da2edd2362660a2	2021-09-11 19:58:45 +02:00
Christopher Faulet	dcac418062	BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status When a server relies on a SRV resolution, a task is created to clean it up (fqdn/port and address) when the SRV resolution is considered as outdated (based on the resolvers 'timeout' value). It is only possible if the server inherits outdated info from a state file and is no longer selected to be attached to a SRV item. Note that most of the time, a server is attached to a SRV item. Thus when the item becomes obsolete, the server is cleaned up. It is important to have such a task to be sure the server will be free again to have a chance to be resolved again with fresh information. Of course, this patch is a workaround to solve a design issue. But there is no other obvious way to fix it without rewriting all the resolvers part. And it must be backportable. This patch relies on following commits: * MINOR: resolvers: Clean server in a dedicated function when removing a SRV item * MINOR: resolvers: Remove server from named_servers tree when removing a SRV item All the series must be backported as far as 2.2 after some observation period. Backports to 2.0 and 1.8 must be evaluated.	2021-06-17 16:52:35 +02:00
Christopher Faulet	85af93b8c7	BUG/MINOR: server-state: load SRV resolution only if params match the config When the state of a server is loaded, if there is no hostname defined for this server and if a fqdn and a server record are retrieved from the state file, it means the server should rely on a SRV resolution. But we must be sure the server is configured this way. A SRV resolution must be configured with the same SRV record. This part must be skipped if there is no SRV resolution configured for this server or if the SRV record used is not the same. This patch should be backported as far as 1.8 after some observation period.	2021-06-11 16:16:20 +02:00
Emeric Brun	3406766d57	MEDIUM: resolvers: add a ref between servers and srv request or used SRV record This patch adds a ref into servers to register them onto the record answer item used to set their hostnames. It also adds a list head into 'srvrq' to register servers free to be assigned to a SRV record. A tree head is also added to srvrq to hold servers which present a hostname in the server state file, so that they can be quickly re-linked to the matching record as soon as an item presents the same name. This results in better performance on SRV record response parsing. This is an optimization but it could avoid triggering haproxy's internal watchdog in some circumstances. And for this reason it should be backported as far as we can (2.0 ?)	2021-06-11 16:16:16 +02:00
Willy Tarreau	bf1ae1a4b1	BUILD: server-state: include tools.h from server_state.c Many functions from tools.h are called there without the file being included.	2021-05-08 13:08:34 +02:00
Willy Tarreau	47a30c456c	BUG/MINOR: server-state: use the argument, not the global state The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor apply_server_state() to make it more readable") also had a copy-paste error resulting in using global.server_state_file instead of the function's argument, which easily crashes with a conf having a state file in a backend and no global state file. In addition, let's simplify the code and get rid of strcpy() which almost certainly will break the build on OpenBSD. This was introduced in 2.4-dev10, no backport is needed.	2021-03-12 14:13:07 +01:00
Willy Tarreau	6d4173e622	BUG/MINOR: server-state: properly handle the case where the base is not set The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor apply_server_state() to make it more readable") made the global server_state_base be dereferenced before being checked, resulting in a crash on certain files. This happened in 2.4-dev10, no backport is needed.	2021-03-12 13:57:19 +01:00
Ilya Shipitsin	d7a988c14a	CLEANUP: assorted typo fixes in the code and comments This is 19th iteration of typo fixes	2021-03-05 21:22:47 +01:00
Christopher Faulet	6f69110191	BUG/MINOR: server-state: Don't load server-state file for disabled backends Recent changes on the server-state file loading have introduced a regression. HAProxy crashes if a backend with no server-state file is disabled in the configuration. Indeed, configuration of such backends is not finalized. Thus many fields are not defined. To fix the bug, disabled backends must be ignored. In addition a BUG_ON() has been added to verify the proxy mode regarding the server-state file. It must be specified (none, global or local) for enabled backends. No backport needed.	2021-03-04 16:49:10 +01:00
Christopher Faulet	456f45f301	MINOR: server-state: Don't load server-state file for serverless proxies Just a minor improvement. Proxies with no server are now ignored early. It may happen for listeners for instance.	2021-02-25 10:02:39 +01:00
Christopher Faulet	3e3d3be708	REORG: server-state: Move functions to deal with server-state in its own file All functions dealing with the server-state files are moved to server_state.c. srv_update_state() function was renamed to srv_state_srv_update().	2021-02-25 10:02:39 +01:00
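
Illustration for 16eb0fab31 ("MAJOR: counters: dispatch counters over thread groups"): the idea of letting each thread group update only its own slot and aggregating on read can be sketched roughly as below. This is not HAProxy's code. The names sharded_counter, sharded_counter_init(), sharded_counter_add(), sharded_counter_read() and MAX_TGROUPS are hypothetical, the fixed-size array stands in for the per-group allocation described in the commit, and the 1-based tgid is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_TGROUPS 16 /* illustrative bound, not an HAProxy constant */

/* One slot per thread group; writers never share a slot across groups. */
struct sharded_counter {
	unsigned int nb_tgroups;              /* must be known before init */
	_Atomic uint64_t per_tg[MAX_TGROUPS];
};

/* Must run once the number of thread groups is known. */
void sharded_counter_init(struct sharded_counter *c, unsigned int nb_tgroups)
{
	unsigned int i;

	assert(nb_tgroups >= 1 && nb_tgroups <= MAX_TGROUPS);
	c->nb_tgroups = nb_tgroups;
	for (i = 0; i < MAX_TGROUPS; i++)
		atomic_init(&c->per_tg[i], 0);
}

/* Update path: only the caller's own slot is touched (tgid assumed 1-based). */
void sharded_counter_add(struct sharded_counter *c, unsigned int tgid, uint64_t v)
{
	atomic_fetch_add_explicit(&c->per_tg[tgid - 1], v, memory_order_relaxed);
}

/* Read path: the reported value is the sum of all per-group slots. */
uint64_t sharded_counter_read(struct sharded_counter *c)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < c->nb_tgroups; i++)
		sum += atomic_load_explicit(&c->per_tg[i], memory_order_relaxed);
	return sum;
}
```

In this sketch the update stays cheap while every read walks all groups, which is the trade-off 01dfe17acf mentions when it avoids consulting the aggregated last_change from srv_dynamic_maxconn().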

34 Commits
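
Illustration for 95cf518bfa ("BUG/MINOR: resolvers: don't lower the case of binary DNS format"): the length-prefixed layout quoted in that message, and why such buffers are compared with memcmp() rather than a string function, can be sketched as follows. This is not HAProxy's resolv_str_to_dn_label(). The helpers dn_label_encode() and dn_label_equal() are hypothetical, and the trailing zero byte and the 63-byte label limit are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Encodes "www.haproxy.org" as <3>www<7>haproxy<3>org<0>.
 * Only the text bytes are lower-cased; length bytes are written as-is.
 * Returns the encoded size (terminator included) or -1 on error.
 */
int dn_label_encode(const char *str, unsigned char *dn, size_t dn_size)
{
	size_t out = 0;

	while (*str) {
		size_t len = strcspn(str, "."); /* length of the next component */
		size_t i;

		/* reject empty or oversized labels, keep room for the terminator */
		if (len == 0 || len > 63 || out + 1 + len + 1 > dn_size)
			return -1;
		dn[out++] = (unsigned char)len; /* binary length prefix */
		for (i = 0; i < len; i++)
			dn[out++] = (unsigned char)tolower((unsigned char)str[i]);
		str += len;
		if (*str == '.')
			str++;
	}
	if (out >= dn_size)
		return -1;
	dn[out++] = 0;
	return (int)out;
}

/* The buffers hold binary lengths, so they are compared byte for byte. */
int dn_label_equal(const unsigned char *a, int alen,
                   const unsigned char *b, int blen)
{
	return alen == blen && memcmp(a, b, (size_t)alen) == 0;
}
```

Since case is normalized once at encoding time, two names that differ only in case produce identical buffers, and no later pass needs to touch the length bytes.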