haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-09 16:47:18 +02:00

Author	SHA1	Message	Date
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
Aurelien DARRAGON	97a19517ff	MINOR: clock: always use atomic ops for global_now_ms global_now_ms is shared between threads so we must give a hint to the compiler that read/write operations should be performed atomically. Everywhere global_now_ms was used, atomic ops were used, except in clock_update_global_date() where a read was performed without using an atomic op. In practice it is not an issue because on most systems such reads should be atomic already, but to prevent any confusion or potential bug on exotic systems, let's use an explicit _HA_ATOMIC_LOAD there. This may be backported up to 2.8	2025-02-21 11:22:35 +01:00
Willy Tarreau	5a3735a155	BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY In clock ticks, 0 is TICK_ETERNITY. Long ago we used to make sure now_ms couldn't be zero so that it could be assigned to expiration timers, but it has long changed after functions like tick_add() were instrumented to make the check. The problem is that aside the rare few accidental direct assignments to expiration dates, it's also used to mark the beginning of an event that's later checked against TICK_ETERNITY to know if it has already struck. The problem in this case is that certain events may just be replaced or dropped just because they apparently never appeared. It's probably the case for stconn's "lra" and "fsb" fields, just like it is for all those involving tick_add_ifset(), like h2c->idle_start. The right approach would be to change the type of now_ms to something else that cannot take direct computations and that represents a timestamp, forcing to always use the conversion functions. The variables holding such timestamps would also be distinguished from intervals. At first glance we could have for timestamps: - 0 = never happened (for the past), eternity (for the future) - X = date and for intervals: - 0 = not set - X = interval However this requires significant changes. Instead for now, let's just make sure again that now_ms is never 0 by setting it to 1 when this happens (1 / 4 billion times, or 1ms every 49.7 days). This will need to be carefully backported to older versions. Note that with this patch backported, the previous ones fixing the zero date are not strictly needed.	2024-11-15 16:01:31 +01:00
Willy Tarreau	499e057644	MEDIUM: clock: don't compute before_poll when using monotonic clock There's no point keeping both clocks up to date; if the monotonic clock is ticking, let's just refrain from updating the wall clock one before polling since we won't use it. We still do it after polling however as we need a wall clock time to communicate with outside. This saves one gettimeofday() call per loop and two timeval comparisons.	2024-09-17 09:08:10 +02:00
Willy Tarreau	24496803d1	MEDIUM: clock: use the monotonic clock for idle time calculation By just keeping a copy of the last known value before entering polling, we can apply the same algorithm as we're currently using, except that it's now applied to the monotonic clock instead of the wall clock, when it's detected that it's ticking. This improves idle time calculation accuracy by making it independent on the wall clock.	2024-09-17 09:08:10 +02:00
Willy Tarreau	4150851ce5	MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time We already collect CLOCK_MONOTONIC when it's available when leaving the poller, but it's only used for profiling. The functions that return it set the value to zero when it's not available, so we can use that to detect if it works or not. The idea is that if the monotonic time is non-zero, it is ticking and usable, then we use it for now_ns, otherwise we use the corrected date. We continue to apply the now_offset to the returned value because it helps forcing an early time wrap-around. Proceeding like this presents two benefits: - on systems supporting this, the time is much more robust against time changes - when it works, it saves us from having to go through the time correction code, which is usually cheap, but better avoided anyway. Note that idle time calculation continues to rely on the wall-clock time.	2024-09-17 09:08:10 +02:00
Willy Tarreau	f793845f4a	MEDIUM: clock: collect the monotonic time in clock_local_update_date() Now we collect this clock in clock_local_update_date(), the closest from the poller, which is also used when busy-polling, and the value is set into the thread's curr_mono_time which did not exist before. Later, clock_leaving_poll() just sets the prev_mono_time value from the curr_ one instead of retrieving the time at this specific point. It also means that the monotonic time will now also cover the time needed to update the global time, which should be negligible. Note that we don't collect the CPU time in the clock_local_update_date() function even though it's tempting, because when doing busy-polling, it would be collected on each round while being useless. Doing so will make sure that the local time always knows the monotonic time when it is available.	2024-09-17 09:08:10 +02:00
Willy Tarreau	42e699903e	MINOR: clock: test all clock_gettime() return values Till now we were only using clock_gettime() for profiling, so if it would fail it was no big deal. We intend to use it as the main clock as well now, so we need to more reliably detect its absence or failure and gracefully fall back to other options. Without the test we would return anything present in the stack, which is neither clean nor easy to detect.	2024-09-17 09:08:10 +02:00
Willy Tarreau	adaba6f904	BUG/MINOR: clock: validate that now_offset still applies to the current date We want to make sure that now_offset is still valid for the current date: another thread could very well have updated it by detecting a backwards jump, and at the very same moment the time got fixed again, that we retrieve and add to the new offset, which results in a larger jump. Normally, for this to happen, it would mean that before_poll was also affected by the jump and was detected before and bounded within 2 seconds, resulting in max 2 seconds perturbations. Here we try to detect this situation and fall back to re-adjusting the offset instead. It's more of a strengthening of what's done by commit `e8b1ad4c2b` ("BUG/MEDIUM: clock: also update the date offset on time jumps") than a pure fix, in that the issue was not directly observed but it's visibly possible by reading the code, so this should be backported along with the patch above. This is related to issue GH #2704. Note that this could be simplified in terms of operations by migrating the deadlines to nanoseconds, but this was the path to least intrusive changes.	2024-09-12 19:09:19 +02:00
Willy Tarreau	af48e4cc6b	BUG/MINOR: clock: make time jump corrections a bit more accurate Since commit `e8b1ad4c2b` ("BUG/MEDIUM: clock: also update the date offset on time jumps") we try to update the now_offset based on the last known valid date. But if it's off compared to the global_now_ns date shared by other threads, we'll get the time off a little bit. When this happens, we should consider the most recent of these dates so that if the global date was already known to be more recent, we should use it and stick to it. This will avoid setting too large an offset that could in turn provoke a larger jump on another thread. This is related to issue GH #2704. This can be backported to other branches having the patch above.	2024-09-12 18:27:03 +02:00
Willy Tarreau	ef8d8215de	BUG/MEDIUM: clock: detect and cover jumps during execution After commit `e8b1ad4c2` ("BUG/MEDIUM: clock: also update the date offset on time jumps"), @firexinghe mentioned that the issue was still present in their case. In fact it depends on the load, which affects the probability that the time changes between two poll() calls vs that it changes during poll(). The time correction code used to only deal with the latter. But under load if it changes between two poll() calls, what happens then is that before_poll is off, and after returning from poll(), the date is within bounds defined by before_poll, so no correction is applied. After many tests, it turns out that the most reliable solution without using CLOCK_MONOTONIC is to prevent before_poll from being earlier than the previous after_poll (trivial), and to cover forward jumps, we need to enforce a margin. Given that the watchdog kills a looping task within 2 seconds and that no sane setup triggers it, it seems that 2 seconds remains a safe enough margin. This means that in the worst case, some forward jumps of up to 2 seconds will not be corrected, leading to an apparent fast time and low rates. But this is supposed to be an exceptional event anyway (typically an admin or crontab running ntpdate). For future versions, given that we now opportunistically call now_mono_time() before and after poll(), that returns zero if not supported, we could imagine relying on this one for the thread's local time when it's non-null.	2024-09-08 19:15:38 +02:00
Willy Tarreau	e8b1ad4c2b	BUG/MEDIUM: clock: also update the date offset on time jumps In GH issue #2704, @swimlessbird and @xanoxes reported problems handling time jumps. Indeed, since 2.7 with commit `4eaf85f5d9` ("MINOR: clock: do not update the global date too often") we refrain from updating the global offset in case it didn't change. But there's a catch: in case of a large time jump, if the poller was interrupted, the local time remains the same and we return immediately from there without updating the offset. It then becomes incorrect regarding the "date" value, and upon subsequent call to the poller, there's no way to detect a jump anymore so we apply the old, incorrect offset and the date becomes wrong. Worse, going back to the original time (then in the past), global_now_ns remains higher than the local time and neither get updated anymore. What is missing in practice is to immediately update the offset when detecting a time jump. In an ideal world, the offset would be updated upon every call, that's what was being done prior to commit above but it's extremely CPU intensive on large systems. However we can perfectly afford to update the offset every time we detect a time jump, as it's not as common. This needs to be backported as far as 2.8. Thanks to both participants above for providing very helpful details.	2024-09-04 16:55:43 +02:00
Ilia Shipitsin	a7cf2454dd	BUILD: clock: improve check for pthread_getcpuclockid() if _POSIX_THREAD_CPUTIME is greater than 0, pthread_getcpuclockid() is implemented. This should fix the build on Solaris 11. Reference: https://docs.oracle.com/cd/E88353_01/html/E37842/unistd-3head.html ML: https://www.mail-archive.com/haproxy@formilux.org/msg44915.html	2024-05-06 08:25:17 +02:00
Willy Tarreau	5345490b8e	MINOR: clock: provide a function to automatically adjust now_offset Right now there's no way to enforce a specific value of now_ms upon startup in order to compensate for the time it takes to load a config, specifically when dealing with the health check startup. For this we'd need to force the now_offset value to compensate for the last known value of the current date. This patch exposes a function to do exactly this.	2023-05-17 09:33:54 +02:00
Willy Tarreau	da4aa6905c	MINOR: clock: measure the total boot time Some huge configs take a significant amount of time to start and this can cause some trouble (e.g. health checks getting delayed and grouped, process not responding to the CLI etc). For example, some configs might start fast in certain environments and slowly in other ones just due to the use of a wrong DNS server that delays all libc's resolutions. Let's first start by measuring it by keeping a copy of the most recently known ready date, once before calling check_config_validity() and then refine it when leaving this function. A last call is finally performed just before deciding to split between master and worker processes, and it covers the whole boot. It's trivial to collect and even allows to get rid of a call to clock_update_date() in function check_config_validity() that was used in hope to better schedule future events.	2023-05-17 09:33:54 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
Aurelien DARRAGON	df188f145b	MINOR: clock: add now_cpu_time_fast() function Same as now_cpu_time(), but for fast queries (less accurate) Relies on now_cpu_time() and now_mono_time_fast() is used as a cache expiration hint to prevent now_cpu_time() from being called too often since it is known to be quite expensive. Depends on commit "MINOR: clock: add now_mono_time_fast() function"	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	07cbd8e074	MINOR: clock: add now_mono_time_fast() function Same as now_mono_time(), but for fast queries (less accurate) Relies on coarse clock source (also known as fast clock source on some systems). Fallback to now_mono_time() if coarse source is not supported on the system.	2023-04-19 11:03:31 +02:00
Willy Tarreau	fc458ec8aa	CLEANUP: tree-wide: remove strcpy() from constant strings These ones are generally harmless on modern compilers because the compiler checks them. While gcc optimizes them away without even referencing strcpy(), clang prefers to call strcpy(). Nevertheless they prevent from enabling stricter checks so better remove them altogether. They were all replaced by strlcpy2() and the size of the destination which is always known there.	2023-04-07 18:14:28 +02:00
Willy Tarreau	28360dc53f	MEDIUM: clock: force internal time to wrap early after boot GH issue #2034 clearly indicates yet another case of time roll-over that went badly. Issues that happen only once every 50 days are hard to detect and debug, and are usually reported more or less synchronized from multiple sources. This patch finally does what had long been planned but never done yet, which is to force the time to wrap early after boot so that any such remaining issue can be spotted quicker. The margin delay here is 20s (it may be changed by setting BOOT_TIME_WRAP_SEC to another value). This value seems sufficient to permit failed health checks to succeed and traffic to come in and possibly start to update some time stamps (accept dates in logs, freq counters, stick-tables expiration dates etc). It could theoretically be helpful to have this in 2.7, but as can be seen with the two patches below, we've already had incorrect use cases of the internal monotonic time when the wall-clock one was needed, so we could expect to detect other ones in the future. Note that this will not induce bugs, it will only make them happen much faster (i.e. no need to wait for 50 days before seeing them). If it were to eventually be backported, these two previous patches must also be backported: BUG/MINOR: clock: use distinct wall-clock and monotonic start dates BUG/MEDIUM: cache: use the correct time reference when comparing dates	2023-02-08 11:10:33 +01:00
Willy Tarreau	6093ba47c0	BUG/MINOR: clock: do not mix wall-clock and monotonic time in uptime calculation We've had a start date even before the internal monotonic clock existed, but once the monotonic clock was added, the start date was not updated to distinguish the wall clock time units and the internal monotonic time units. The distinction is important because both clocks do not necessarily progress at the same speed. The very rare occurrences of the wall-clock date are essentially for human consumption and communication with third parties (e.g. report the start date in "show info" for monitoring purposes). However currently this one is also used to measure the distance to "now" as being the process' uptime. This is actually not correct. It only works because for now the two dates are initialized at the exact same instant at boot but could still be wrong if the system's date shows a big jump backwards during startup for example. In addition the current situation prevents us from enforcing an arbitrary offset at boot to reveal some heisenbugs. This patch adds a new "start_time" at boot that is set from "now" and is used in uptime calculations. "start_date" instead is now set from "date" and will always reflect the system date for human consumption (e.g. in "show info"). This way we're now sure that any drift of the internal clock relative to the system date will not impact the reported uptime. This could possibly be backported though it's unlikely that anyone has ever noticed the problem.	2023-02-08 11:06:55 +01:00
Aurelien DARRAGON	16d6c0cb09	BUG/MEDIUM: wdt/clock: properly handle early task hangs In `ae053b30` - BUG/MEDIUM: wdt: don't trigger the watchdog when p is unitialized: wdt is not triggering until prev_cpu_time is initialized to prevent unexpected process termination. Unfortunately this is not enough, some tasks could start immediately after process startup, and in such cases prev_cpu_time could be uninitialized, because prev_cpu_time is set after the polling loop while process_runnable_tasks() is executed before the polling loop. It happens to be the case with lua tasks registered using register_task function from lua script. Those tasks are registered in early init stage of haproxy and they are scheduled to run before the first polling loop, leading to prev_cpu_time being uninitialized (equals 0) on the thread when the task is first executed. Because of this, if such tasks get stuck right away (e.g: blocking IO) the watchdog won't behave as expected and the thread will remain stuck indefinitely. (polling loop for the thread won't run at all as the thread is already stuck) To solve this, we're now making sure that prev_cpu_time is first set before any tasks are processed on the thread. This is done by setting initial prev_cpu_time value directly in clock_init_thread_date() Thanks to Abhijeet Rastogi for reporting this unexpected behavior. It could be backported in every stable versions. (everywhere `ae053b30` is, because both are related)	2022-11-14 19:14:53 +01:00
Ilya Shipitsin	4a689dad03	CLEANUP: assorted typo fixes in the code and comments This is 32nd iteration of typo fixes	2022-10-30 17:17:56 +01:00
Willy Tarreau	4eaf85f5d9	MINOR: clock: do not update the global date too often Tests with forced wakeups on a 24c/48t machine showed that we're capping at 7.3M loops/s, which means 6.6 microseconds of loop delay without having anything to do. This is caused by two factors: - the load and update of the now_offset variable - the update of the global_now variable What is happening is that threads are not running within the one-microsecond time precision provided by gettimeofday(), so each thread waking up sees a slightly different date and causes undesired updates to global_now. But worse, these undesired updates mean that we then have to adjust the now_offset to match that, and adds significant noise to this variable, which then needs to be updated upon each call. By only allowing slightly less precision we can completely eliminate that contention. Here we're ignoring the 5 lowest bits of the usec part, meaning that the global_now variable may be off by up to 31 us (16 on avg). The variable is only used to correct the time drift some threads might be observing in environments where CPU clocks are not synchronized, and it's used by freq counters. In both cases we don't need that level of precision and even one millisecond would be pretty fine. We're just 30 times better at almost no cost since the global_now and now_offset variables now only need to be updated 30000 times a second in the worst case, which is unnoticeable. After this change, the wakeup rate jumped from 7.3M/s to 66M/s, meaning that the loop delay went from 6.6us to 0.73us, that's a 9x improvement when under load! With real tasks we're seeing a boost from 28M to 52M wakeups/s. The clock_update_global_date() function now only takes 1.6%, it's good enough so that we don't need to go further.	2022-09-21 09:06:28 +02:00
Willy Tarreau	a700420671	MINOR: clock: split local and global date updates Pollers that support busy polling spend a lot of time (and cause contention) updating the global date when they're looping over themselves while it serves no purpose: what's needed is only an update on the local date to know when to stop looping. This patch splits clock_update_date() into a pair of local and global update functions, so that pollers can be easily improved.	2022-09-21 09:06:28 +02:00
Willy Tarreau	1e7f0d68b0	MINOR: clock: use ltid_bit in clock_report_idle() Since commit `cc7a11ee3` ("MINOR: threads: set the tid, ltid and their bit in thread_cfg") we ought not use (1UL << thr) to get the group mask for thread <thr>, but (ha_thread_info[thr].ltid_bit). clock_report_idle() needs this. This also implies not using all_threads_mask anymore but taking the mask from the tgroup since it becomes relative now.	2022-07-01 19:15:15 +02:00
Willy Tarreau	45c38e22bf	REORG: thread/clock: move the clock parts of thread_info to thread_ctx The "thread_info" name was initially chosen to store all info about threads but since we now have a separate per-thread context, there is no point keeping some of its elements in the thread_info struct. As such, this patch moves prev_cpu_time, prev_mono_time and idle_pct to thread_ctx, into the thread context, with the scheduler parts. Instead of accessing them via "ti->" we now access them via "th_ctx->", which makes more sense as they're totally dynamic, and will be required for future evolutions. There's no room problem for now, the structure still has 84 bytes available at the end.	2021-10-08 17:22:26 +02:00
Willy Tarreau	2169498941	MINOR: clock: move the clock_ids to clock.c This removes the knowledge of clockid_t from anywhere but clock.c, thus eliminating a source of includes burden. The unused clock_id field was removed from thread_info, and the definition setting of clockid_t was removed from compat.h. The most visible change is that the function now_cpu_time_thread() now takes the thread number instead of a tinfo pointer.	2021-10-08 17:22:26 +02:00
Willy Tarreau	6cb0c391e7	REORG: clock/wdt: move wdt timer initialization to clock.c The code that deals with timer creation for the WDT was moved to clock.c and is called with the few relevant arguments. This removes the need for awareness of clock_id from wdt.c and as such saves us from having to share it outside. The timer_t is also known only from both ends but not from the public API so that we don't have to create a fake timer_t anymore on systems which do not support it (e.g. macos).	2021-10-08 17:22:26 +02:00
Willy Tarreau	44c58da52f	REORG: clock: move the clock_id initialization to clock.c This was previously open-coded in run_thread_poll_loop(). Now that we have clock.c dedicated to such stuff, let's move the code there so that we don't need to keep such ifdefs nor to depend on the clock_id.	2021-10-08 17:22:26 +02:00
Willy Tarreau	2c6a998727	CLEANUP: clock: stop exporting before_poll and after_poll We don't need to export them anymore so let's make them static.	2021-10-08 17:22:26 +02:00
Willy Tarreau	20adfde9c8	MINOR: activity: get the run_time from the clock updates Instead of fiddling with before_poll and after_poll in activity_count_runtime(), the function is now called by clock_entering_poll() which passes it the number of microseconds spent working. This allows to remove all calls to activity_count_runtime() from the pollers.	2021-10-08 17:22:26 +02:00
Willy Tarreau	f9d5e1079c	REORG: clock: move the updates of cpu/mono time to clock.c The entering_poll/leaving_poll/measure_idle functions that were hard to classify and used to move to various locations have now been placed into clock.c since it's precisely about time-keeping. The functions were renamed to clock_*. The samp_time and idle_time values are now static since there is no reason for them to be read from outside.	2021-10-08 17:22:26 +02:00
Willy Tarreau	5554264f31	REORG: time: move time-keeping code and variables to clock.c There is currently a problem related to time keeping. We're mixing the functions to perform calculations with the os-dependent code needed to retrieve and adjust the local time. This patch extracts from time.{c,h} the parts that are solely dedicated to time keeping. These are the "now" or "before_poll" variables for example, as well as the various now_() functions that make use of gettimeofday() and clock_gettime() to retrieve the current time. The "tv_" functions moved there were also more appropriately renamed to "clock_*". Other parts used to compute stolen time are in other files, they will have to be picked next.	2021-10-08 17:22:26 +02:00

35 Commits
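
The workaround described in commit 5a3735a155 (keep now_ms away from TICK_ETERNITY) can be illustrated with a minimal sketch. Only two facts are taken from the commit message: the tick value 0 means TICK_ETERNITY, and now_ms is set to 1 when it would otherwise be 0. Everything else is an assumption made for illustration: the helper names ns_to_nonzero_ms() and event_has_struck() are hypothetical and are not HAProxy functions, and deriving the millisecond tick by dividing a nanosecond counter is a simplification of what clock.c really does.

```c
#include <stdint.h>

/* From the commit message: in clock ticks, 0 is TICK_ETERNITY. */
#define TICK_ETERNITY 0u

/* Hypothetical helper (not an HAProxy function): derive a 32-bit
 * millisecond tick from a 64-bit nanosecond counter and never return
 * the reserved value 0. The 32-bit tick wraps every 2^32 ms (about
 * 49.7 days), so 0 would show up for 1 ms per wrap; it is replaced
 * by 1 in that case.
 */
static inline uint32_t ns_to_nonzero_ms(uint64_t now_ns)
{
	uint32_t ms = (uint32_t)(now_ns / 1000000ULL);

	return (ms == TICK_ETERNITY) ? 1u : ms;
}

/* Hypothetical helper: a start timestamp that is later compared with
 * TICK_ETERNITY to know whether the event has already struck. This
 * only works if a real timestamp can never be 0.
 */
static inline int event_has_struck(uint32_t start_tick)
{
	return start_tick != TICK_ETERNITY;
}
```

With this guard, a timestamp taken exactly on the wrap boundary is recorded as 1 instead of 0, so a later comparison with TICK_ETERNITY does not mistake it for an event that never happened. The cost is a 1 ms error once per wrap, which matches the trade-off the commit message accepts.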