haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-27 14:41:28 +01:00

Author	SHA1	Message	Date
Aurelien DARRAGON	5c299dee5a	MEDIUM: stats: consider that shared stats pointers may be NULL This patch looks huge, but it has a very simple goal: protect all accesses to shared stats pointers (either reads or writes), because we now consider that these pointers may be NULL. The reason behind this is that, despite all precautions taken to ensure the pointers shouldn't be NULL when not expected, there are still corner cases (ie: frontend stats used on a backend with no FE cap and vice versa) where we could try to access a memory area which is not allocated. Willy stumbled on such cases while playing with the rings servers upon connection error, which eventually led to process crashes (since 3.3 when shared stats were implemented). Also, we may decide later that shared stats are optional and should be disabled on the proxy to save memory and CPU, and this patch is a step further towards that goal. So in essence, this patch ensures shared stats pointers are always initialized (including NULL), and adds necessary guards before shared stats pointers are de-referenced. Since we already had some checks for backends and listeners stats, and the pointer address retrieval should stay in cpu cache, let's hope that this patch doesn't impact stats performance much.	2025-09-18 16:49:51 +02:00
Willy Tarreau	91258fb9d8	MEDIUM: guid: switch guid to more compact cebuis_tree The current guid struct size is 56 bytes. Once reduced using compact trees, it goes down to 32 (almost half). We're not on a critical path and size matters here, so better switch to this. It's worth noting that the name part could also be stored in the guid_node at the end to save 8 extra bytes (no pointer needed anymore), however the purpose of this struct is to be embedded into other ones, which is not compatible with having a dynamic size. Affected struct sizes in bytes (before / after / diff): server 4032 / 4032 / 0*; proxy 3184 / 3160 / -24; listener 752 / 728 / -24. *: struct server is full of holes and padding (176 bytes) and is 64-byte aligned. Moving the guid_node elsewhere such as after sess_conn reduces it to 3968, or one less cache line. There's no point in moving anything now because forthcoming patches will arrange other parts.	2025-09-16 09:23:46 +02:00
Aurelien DARRAGON	b9ef55d56d	MINOR: stats-file: use explicit unsigned integer bitshift for user slots As reported in GH #3104, there remained a place where (1 << shift) was used to set or remove bits from the uint64_t users bitfield. It is incorrect and could lead to bugs for values > 32 bits. Instead, let's use 1ULL to ensure the operation remains 64-bit consistent. No backport needed.	2025-09-08 13:38:49 +02:00
Aurelien DARRAGON	f32bc8f0a4	MEDIUM: stats-file: add some BUG_ON() guards to ensure exported structs are not changed by accident Add two BUG_ON() in shm_stats_file_prepare() which will trigger if exported structures (shm_stats_file_hdr and shm_stats_file_object) change in size, because it means that they will become incompatible with older versions and thus precautions should be taken by the developer to ensure compatibility with older versions, or at least detect incompatible versions by changing the version number to prevent bugs resulting from inconsistent mapping between versions. The BUG_ON() may be safely adjusted then. Please note that it doesn't protect against accidental struct member re-ordering if the resulting struct size is equal.	2025-09-03 16:29:55 +02:00
Aurelien DARRAGON	cf2562cddf	OPTIM: stats-file: don't unnecessarily die hard on shm_stats_file_reuse_object() shm_stats_file_reuse_object() has a non negligible cost, especially if the shm file contains a lot of objects because the function scans the whole shm file to find available slots. During startup, if no existing objects could be mapped in the shm file, shm_stats_file_add_object() is called for each object (server, fe, be or listener) with a GUID set. On large configs it means shm_stats_file_add_object() could be called a lot of times in a row. With the current implementation, each shm_stats_file_add_object() call leverages shm_stats_file_reuse_object(), so the more objects are defined in the config, the slower the startup will be. To try to optimize startup time a bit with large configs, we don't systematically call shm_stats_file_reuse_object(), especially when we know that the previous attempt to reuse objects failed. In this case we add a small tempo between failed attempts to reuse objects because we assume the new attempt will probably fail anyway. (For slots to become available, either an old process has to clean its entries, or they have to time out which implies that the clock needs to be updated)	2025-09-03 16:28:41 +02:00
Aurelien DARRAGON	585ece4c92	MEDIUM: stats-file/counters: store and preload stats counters as shm file objects This is the last patch of the shm stats file series. In this patch we implement the logic to store and fetch shm stats objects and associate them to existing shared counters on the current process. Shm objects are stored in the same memory location as the shm stats file header. In fact they are stored right after it. All objects (struct shm_stats_file_object) have the same size (no matter their type), which allows for easy object traversal without having to check the object's type, and could permit the use of external tools to scan the SHM in the future. Each object stores a guid (of GUID_MAX_LEN+1 size) and tgid which allows to match corresponding shared counters indexes. Also, as stated before, each object stores the list of users making use of it. Objects are never released (the map can only grow), but unused objects (those for which no more users or active users are found in objects->users) are automatically recycled. Also, each object stores its type which defines how the object generic data member should be handled. Upon startup (or reload), haproxy first tries to scan existing shm to find objects that could be associated to frontends, backends, listeners or servers in the current config based on GUID. For associations that couldn't be made, haproxy will automatically create missing objects in the SHM during late startup. When haproxy matches with an existing object, it means the counter from an older process is preserved in the new process, so multiple processes temporarily share the same counter for as long as required for older processes to eventually exit.	2025-09-03 15:59:37 +02:00
Aurelien DARRAGON	ee17d20245	MINOR: stats-file: add process slot management for shm stats file Now that all processes tied to the same shm stats file share a common clock source, we introduce the process slot notion in this patch. Each living process registers itself in a map at a free index: each slot stores information about the process' PID and heartbeat. Each process is responsible for updating its heartbeat. A slot is considered "free" if the heartbeat was never set or if the heartbeat is expired (60 seconds of inactivity). The total number of slots is set to 64. This is on purpose because it allows to easily store the "users" of a given shm object using a 64-bit bitmask. Given that when haproxy is reloaded older processes are supposed to die eventually, it should be large enough (64 simultaneous processes) to be safe. If we manage to reach this limit someday, more slots could be added by splitting the "users" bitmask over multiple 64-bit variables.	2025-09-03 15:59:33 +02:00
Aurelien DARRAGON	443e657fd6	MEDIUM: stats-file: processes share the same clock source from shm-stats-file The use of the "shm-stats-file" directive now implies that all processes using the same file share a common clock source, this is required for consistency regarding time-related operations. The clock source is stored in the shm stats file header. When the directive is set, all processes share the same clock (global_now_ms and global_now_ns both point to variables in the map), this is required for time-based counters such as freq counters to work consistently. Since all processes manipulate the global clock with atomic operations exclusively during runtime, and don't systematically rely on it (thanks to local now_ms and now_ns), it is pretty much transparent.	2025-09-03 15:59:27 +02:00
Aurelien DARRAGON	c91d93ed1c	MINOR: stats-file: introduce shm-stats-file directive Add initial support for the "shm-stats-file" directive and associated "shm-stats-file-max-objects" directive. For now they are flagged as experimental directives. The shared memory file is automatically created by the first process. The file is created using open() so it is up to the user to provide a relevant path (either on a regular filesystem or ramfs for performance reasons). The directive takes only one argument which is the path of the shared memory file. It is passed as-is to open(). The maximum number of objects per thread-group (hard limit) that can be stored in the shm is defined by the "shm-stats-file-max-objects" directive and defaults to 2k, which means approximately 1mb max per thread group and should cover most setups. Upon initial creation, the main shm stats file header is provisioned with the version, which must remain the same to be compatible between processes. When the limit is reached (during startup) an error is reported by haproxy which invites the user to increase the "shm-stats-file-max-objects" if desired, but this means more memory will be allocated. Actual memory usage is low at start, because only the mmap (mapping) is provisioned with the maximum number of objects to avoid relocating the memory area during runtime, but the actual shared memory file is dynamically resized when objects are added (resized by following half power of 2 curve when new objects are added, see upcoming commits). For now only the file is created, further logic will be implemented in upcoming commits.	2025-09-03 15:59:22 +02:00
Aurelien DARRAGON	75e480d107	MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct Between 3.2 and 3.3-dev we noticed a performance regression due to stats handling. After bisecting, Willy found out that recent work to split stats computing across multiple thread groups (stats sharding) was responsible for that performance regression. We're looking at roughly 20% performance loss. More precisely, it is the added indirections, multiplied by the number of statistics that are updated for each request, which in the end causes a significant amount of time being spent resolving pointers. We noticed that the fe_counters_shared and be_counters_shared structures which are currently allocated in dedicated memory since a0dcab5c ("MAJOR: counters: add shared counters base infrastructure") are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters over thread groups") because they now essentially hold flags plus the per-thread group id pointer mapping, not the counters themselves. As such we decided to try merging fe_counters_shared and be_counters_shared in their parent structures. The cost is slight memory overhead for the parent structure, but it allows us to get rid of one pointer indirection. This patch alone yields visible performance gains and almost restores 3.2 stats performance. counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and now returns either failure or success instead of a pointer because we don't need to retrieve a shared pointer anymore, the function takes care of initializing the existing pointer.	2025-07-25 16:46:10 +02:00
Aurelien DARRAGON	16eb0fab31	MAJOR: counters: dispatch counters over thread groups Most fe and be counters are good candidates for being shared between processes. They are now grouped inside "shared" struct sub member under be_counters and fe_counters. Now that they are properly identified, they would greatly benefit from being shared over thread groups to reduce the cost of atomic operations when updating them. For this, we take the current tgid into account so each thread group only updates its own counters. For this to work, it is mandatory that the "shared" member from {fe,be}_counters is initialized AFTER global.nbtgroups is known, because each shared counter causes the stat to be allocated global.nbtgroups times. When updating a counter without concurrency, the first counter from the array may be updated. To consult the shared counters (which requires aggregation of per-tgid individual counters), some helper functions were added to counter.h to ease code maintenance and avoid computing errors.	2025-06-05 09:59:38 +02:00
Aurelien DARRAGON	a0dcab5c45	MAJOR: counters: add shared counters base infrastructure Shareable counters are now tagged as shared counters and are dynamically allocated in a separate memory area as a prerequisite for being stored in a shared memory area. For now, GUID and thread groups are not taken into account, this is only a first step. Also, we ensure all counters are now manipulated using atomic operations, namely, the "last_change" counter is now read from and written to using atomic ops. Despite the numerous changes caused by the counters being moved away from the counters struct, no change of behavior should be expected.	2025-06-05 09:58:58 +02:00
William Lallemand	e1fb9a47e1	MEDIUM: stats-file: silently ignore be/fe mismatch Most of the invalid or unknown fields in the stats-file parser are ignored silently, which is not the case of the frontend/backend mismatch on a guid, which is kind of strange. Since this is "documented" to be ignored in the reg-tests/stats/sample-stats-file file, let's also ignore this kind of line. This will allow to run the associated reg-test with -dW.	2024-11-19 16:44:51 +01:00
William Lallemand	f36caf7b81	MEDIUM: stats-file: explicitly ignore comments starting by // Explicitly ignore comments starting by // so they don't emit a warning.	2024-11-19 15:49:44 +01:00
William Lallemand	96f2736e99	MINOR: stats-file: add the filename in the warning Add the name of the stats-file in the warning so it's clear that the warning was provoked by the stats-file and not the config file.	2024-11-19 15:49:44 +01:00
Amaury Denoyelle	fbc3d46b9f	BUILD: stats: remove non portable getline() usage getline() was used to read stats-file. However, this function is not portable and may cause build issues on some systems. Replace it by standard fgets(). No need to backport.	2024-05-17 14:53:19 +02:00
Ilia Shipitsin	a65c6d3574	CLEANUP: assorted typo fixes in the code and comments This is the 42nd iteration of typo fixes	2024-05-03 09:01:36 +02:00
Amaury Denoyelle	e92ae8f0ba	MINOR: stats: support age in stats-file Extend generic stat column support to be able to fully support the age stats type. Several changes were required. On output, me_generate_field() has been updated to report the difference between the current tick and the stored value for the FN_AGE type. Also, if an age stat is hidden in show stats, -1 is returned instead of an empty metric, which is the value to mark an age as unset. On counters preload, load_ctr() was updated to handle FN_AGE. A similar subtraction is performed against the current tick value.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	fec2ae9b76	MINOR: stats: support rate in stats-file Implement support for the FN_RATE stat column in stats-file. For the output part, only a minimal change is required. Reuse the function read_freq_ctr() to print the same value in both stats output and stats-file dump. For counter preloading, define a new utility function preload_freq_ctr(). This can be used to initialize a freq-ctr type by preloading the previous period value. Reuse this function in load_ctr() during stats-file parsing. At the moment, no rate column is defined as generic. Thus, this commit does not have any functional change. This will be changed as soon as FN_RATE columns are converted to generic columns.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	4e9e841878	MINOR: stats: prepare stats-file support for values other than FN_COUNTER Currently, only FN_COUNTER are dumped and preloaded via a stats-file. Thus in several places we relied on the assumption that only FN_COUNTER are valid in stats-file context. New stats types will soon be implemented as they are also eligible for statistics reloading on process startup. Thus, prepare stats-file functions to remove any FN_COUNTER restriction. As one of these changes, generate_stat_tree() now uses stcol_is_generic() for stats name tree indexing before stats-file parsing. Also related to stats-file parsing, the individual counter preloading step has been extracted from line parsing into a dedicated new function load_ctr(). This will allow to extend it to support multiple mechanisms of counter preloading depending on the stats type.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	20bc42e697	BUG/MINOR: stats: replace objt_* by __objt_* macros Update parse_stat_line() used during stats-file parsing. For each line, the GUID is extracted first to access the object instance. obj_type() is then invoked to retrieve the correct object type. Replace objt_* by __objt_* macros to mark its result as safe and non NULL. This should fix the coverity report from github issue #2550. No need to backport.	2024-04-29 14:21:10 +02:00
Amaury Denoyelle	782be288ca	MINOR: stats: parse values from stats-file This patch implements parsing of counter value lines from stats-file. It reuses the domain context previously set by the last header line. Each value is separated by a ',' character, relative to the list of column names described by the header line. This is implemented via the static function parse_stat_line(). It first extracts a GUID and retrieves the object instance. Then each numerical value is parsed and object counters are updated. For the moment, only U64 counter metrics are supported. parse_stat_line() is called on each line until a new header line is found.	2024-04-26 11:34:02 +02:00
Amaury Denoyelle	374dc08611	MINOR: stats: parse header lines from stats-file This patch implements parsing of header lines from stats-file. A header line is defined as starting with the '#' character. It is directly followed by a domain name. For the moment, either 'fe' or 'be' is allowed. The following lines will contain counter values relative to the domain context until the next header line. This is implemented via the static function parse_header_line(). It first sets the domain context used during apply_stats_file(). A stats column array is generated to contain the order in which columns are stored. This will be reused to parse the values of the following lines. If an invalid line is found and no header was parsed, the stats-file is considered ill-formatted and parsing stops. This allows to immediately interrupt parsing if a garbage file was used without emitting a ton of warnings to the user.	2024-04-26 11:34:02 +02:00
Amaury Denoyelle	34ae7755b3	MINOR: stats: apply stats-file on process startup This commit is the first one of a series to implement preloading of haproxy counters via stats-file parsing. This patch defines a basic apply_stats_file() function. It implements line-by-line reading of a stats-file without any parsing for the moment. It is called automatically on process startup via init().	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	e74148fb7c	MEDIUM: stats: implement dump stats-file CLI Define a new CLI command "dump stats-file" with its handler cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump first the frontend and then the backend side. It reuses the common function stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the correct side. A new module stats-file.c is added to group functions specific to stats-file. It defines two main functions: * stats_dump_file_header() to generate the list of columns prefixed by the line context, either "#fe" or "#be" * stats_dump_fields_file() to generate each stat line. Objects without a GUID are skipped. Each stat entry is separated by a comma. For the moment, stats-file does not support statistics modules. As such, stats_dump_*_line() functions are updated to prevent looping over stats modules on stats-file output.	2024-04-26 10:20:57 +02:00
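
The stats-file line types described in commits e74148fb7c, 374dc08611, 782be288ca and f36caf7b81 can be illustrated with a rough sketch. This is not the code from stats-file.c: the names `line_kind`, `classify_line()` and `count_fields()` are hypothetical, and only the rules stated in the commit messages are assumed (a header starts with '#' directly followed by 'fe' or 'be', comments start with "//", values are comma-separated). Whatever follows the domain name on a header line is not checked here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one stats-file line. */
enum line_kind {
	LINE_COMMENT,    /* starts with "//", ignored without warning */
	LINE_HEADER_FE,  /* starts with "#fe" */
	LINE_HEADER_BE,  /* starts with "#be" */
	LINE_VALUES,     /* anything else that is not empty */
	LINE_INVALID     /* empty line or unknown '#' domain */
};

/* Classify <line> using only its first characters. This is a prefix
 * check: the rest of the line is not validated.
 */
static enum line_kind classify_line(const char *line)
{
	if (line[0] == '/' && line[1] == '/')
		return LINE_COMMENT;

	if (line[0] == '#') {
		if (strncmp(line + 1, "fe", 2) == 0)
			return LINE_HEADER_FE;
		if (strncmp(line + 1, "be", 2) == 0)
			return LINE_HEADER_BE;
		return LINE_INVALID;
	}

	if (line[0] == '\0' || line[0] == '\n')
		return LINE_INVALID;

	return LINE_VALUES;
}

/* Count the comma-separated fields of <line>; an empty string has none. */
static size_t count_fields(const char *line)
{
	size_t n = 1;

	if (*line == '\0')
		return 0;
	for (; *line; line++)
		if (*line == ',')
			n++;
	return n;
}
```

A caller would typically remember the domain of the last header line and compare the field count of each value line with the column count of that header before parsing individual values.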

25 Commits
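
As an illustration of commits ee17d20245 and b9ef55d56d above, the 64-slot "users" bitmask can be sketched as follows. This is a minimal example, not the haproxy implementation: the helper names (`users_set()`, `users_clear()`, `users_test()`) and the `MAX_SLOTS` macro are hypothetical, and only the 64-slot limit and the use of `1ULL` come from the commit messages. With a plain `1 << slot`, the shift is performed on an `int`, so slots in the upper half of the mask cannot be represented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the 64-slot limit is the one stated in ee17d20245. */
#define MAX_SLOTS 64

/* Return <users> with process slot <slot> marked as a user. */
static inline uint64_t users_set(uint64_t users, unsigned int slot)
{
	assert(slot < MAX_SLOTS);
	return users | (1ULL << slot);   /* 1ULL keeps the shift 64-bit wide */
}

/* Return <users> with process slot <slot> removed. */
static inline uint64_t users_clear(uint64_t users, unsigned int slot)
{
	assert(slot < MAX_SLOTS);
	return users & ~(1ULL << slot);
}

/* Return 1 if process slot <slot> is set in <users>, otherwise 0. */
static inline int users_test(uint64_t users, unsigned int slot)
{
	assert(slot < MAX_SLOTS);
	return (int)((users >> slot) & 1);
}
```

In this sketch an object whose mask is 0 has no users left, which matches the recycling condition described in commit 585ece4c92.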