31 Commits

Author SHA1 Message Date
Amaury Denoyelle
82907d5621 MINOR: stats: report BE unpublished status
A previous patch defines a new proxy status : unpublished backends. This
patch extends this by changing proxy status reported in stats. If
unpublished is set, an extra "(UNPUB)" is added to the field.

Also, HTML stats is also slightly updated. If a backend is up but
unpublished, its status will be reported in orange color.
2026-01-15 09:08:18 +01:00
Olivier Houchard
5495c88441 MEDIUM: counters: Dynamically allocate per-thread group counters
Instead of statically allocating the per-thread group counters,
based on the max number of thread groups available, allocate
them dynamically, based on the number of thread groups actually
used. That way we can increase the maximum number of thread
groups without using an unreasonable amount of memory.
2026-01-13 11:12:34 +01:00
Aurelien DARRAGON
a287841578 MINOR: stats-proxy: ensure future-proof FN_AGE manipulation in me_generate_field()
Commit ad1bdc33 ("BUG/MAJOR: stats-file: fix crash on non-x86 platform
caused by unaligned cast") revealed an ambiguity in me_generate_field()
around FN_AGE manipulation. For now FN_AGE can only be stored as u32 or
s32, but in the future we could also support 64bit FN_AGES, and the
current code assumes 32bits types and performs and explicit unsigned int
cast. Instead we group current 32 bits operations for FF_U32 and FF_S32
formats, and let room for potential future formats for FN_AGE.

Commit ad1bdc33 also suggested that the fix was temporary and the approach
must change, but after a code review it turns out the current approach
(generic types manipulation under me_generate_field()) is legit. The
introduction of shm-stats-file feature didn't change the logic which
was initially implemented in 3.0. It only extended it and since shared
stats are now spread over thread-groups since 3.3, the use of atomic
operations made typecasting errors more visible, and structure mapping
change from d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)") was in
fact the only change to blame for the crash on non-x86 platforms.

With ambiguities removed in me_generate_field(), let's hope we don't face
similar bugs in the future. Indeed, with generic counters, and more
specifically shared ones (which leverage atomic ops), great care must be
taken when changing their underlying types as me_generate_field() solely
relies on stat_col descriptor to know how to read the stat from a generic
pointer, so any breaking change must be reflected in that function as well

No backport needed.
2025-11-10 21:32:22 +01:00
Christopher Faulet
7d1787ba8e MINOR: sample/stats: Add "bytes" in req_{in,out} and res_{in,out} names
Number of bytes received or sent by a client or a server are now
saved. Sample fetches and stats fields to retrieve these informations are
renamed to add "bytes" in names to avoid any ambiguity with number of
requests and responses.
2025-11-07 14:09:48 +01:00
Christopher Faulet
4991a51208 MINOR: stats: Add stats about request and response bytes received and sent
In previous patches, these counters were added per frontend, backend, server
and listener. With this patch, these counters are reported on stats,
including promex.

Note that the stats file minor version was incremented by one because the
shm_stats_file_object struct size has changed.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
0084baa6ba MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li
bytes_in and bytes_out counters per frontend, backend, listener and server
were removed and we now rely on, respectively on, req_in and res_in
counters.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Willy Tarreau
ad1bdc3364 BUG/MAJOR: stats-file: fix crash on non-x86 platform caused by unaligned cast
Since commit d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)"), the
last_state_change field in the counters is a uint (to match how it's
reported). However, it happens that there are explicit casts in function
me_generate_field() to retrieve the value, and which cause crashes on
aarch64 and likely other non-x86 64-bit platforms due to atomically
reading an unaligned 64-bit value, and may even randomly crash other
64-bit platforms when reading past the end of the structure.

The fix for now adapts the cast to match the one used by the accessed
type (i.e. unsigned int), but the approach must change, as there's
nothing there which allows to figure whether or not the type is correct
by just reading the code. At minima a typeof() on a named field is
needed, but this requires more invasive changes, hence this temporary
fix.

No backport is needed, as stats-file is only in 3.3.
2025-11-03 07:33:11 +01:00
Amaury Denoyelle
fac1de935a MINOR: stats: display new curr_sess_idle_conns server counter
Add a new stats column in proxy stats to display server counter for
private idle connections. This counter has been introduced recently.

The value is displayed on CSV output on the last column before modules.
It is also displayed on HTLM page alongside other idle server counters.
2025-08-28 18:58:11 +02:00
Aurelien DARRAGON
75e480d107 MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct
Between 3.2 and 3.3-dev we noticed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing accross multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.

More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.

We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.

As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.

counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.
2025-07-25 16:46:10 +02:00
Aurelien DARRAGON
4fcc9b5572 MINOR: counters: rename last_change counter to last_state_change
Since proxy and server struct already have an internal last_change
variable and we cannot merge it with the shared counter one, let's
rename the last_change counter to be more specific and prevent the
mixup between the two.

last_change counter is renamed to last_state_change, and unlike the
internal last_change, this one is a shared counter so it is expected
to be updated by other processes in our back.

However, when updating last_state_change counter, we use the value
of the server/proxy last_change as reference value.
2025-06-30 16:26:38 +02:00
Aurelien DARRAGON
16eb0fab31 MAJOR: counters: dispatch counters over thread groups
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside "shared" struct sub member under
be_counters and fe_counters.

Now they are properly identified, they would greatly benefit from being
shared over thread groups to reduce the cost of atomic operations when
updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the stat
to be allocated lobal.nbtgroups times. When updating a counter without
concurrency, the first counter from the array may be updated.

To consult the shared counters (which requires aggregation of per-tgid
individual counters), some helper functions were added to counter.h to
ease code maintenance and avoid computing errors.
2025-06-05 09:59:38 +02:00
Aurelien DARRAGON
a0dcab5c45 MAJOR: counters: add shared counters base infrastructure
Shareable counters are not tagged as shared counters and are dynamically
allocated in separate memory area as a prerequisite for being stored
in shared memory area. For now, GUID and threads groups are not taken into
account, this is only a first step.

also we ensure all counters are now manipulated using atomic operations,
namely, "last_change" counter is now read from and written to using atomic
ops.

Despite the numerous changes caused by the counters being moved away from
counters struct, no change of behavior should be expected.
2025-06-05 09:58:58 +02:00
Aurelien DARRAGON
c7c017ec3c MINOR: stats: add ME_NEW_COMMON() helper
Split ME_NEW_* helper into COMMON part and specific part so it becomes
easier to add alternative helpers without code duplication.
2025-06-02 17:51:12 +02:00
Aurelien DARRAGON
d04843167c MINOR: stats: add stat_col flags
Add stat_col flags member to store .generic bit and prepare for upcoming
flags. No functional change expected.
2025-06-02 17:51:08 +02:00
Christopher Faulet
7244f16ac4 MINOR: promex: Add agent check status/code/duration metrics
In the Prometheus exporter, the last health check status is already exposed,
with its code and duration in seconds. The server status is also exposed.
But the information about the agent check are not available. It is not
really handy because when a server status is changed because of the agent,
it is not obvious by looking to the Prometheus metrics. Indeed, the server
may reported as DOWN for instance, while the health check status still
reports a success. Being able to get the agent status in that case could be
valuable.

So now, the last agent check status is exposed, with its code and duration
in seconds. Following metrics can be grabbe now:

  * haproxy_server_agent_status
  * haproxy_server_agent_code
  * haproxy_server_agent_duration_seconds

Note that unlike the other metrics, no per-backend aggregated metric is
exposed.

This patch is related to issue #2983.
2025-05-22 09:50:10 +02:00
Aurelien DARRAGON
dc95a3ed61 MINOR: promex: expose ST_I_PX_RATE (current_session_rate)
It has been requested to have the current_session_rate exposed at the
frontend level. For now only the per-process value was exposed
(ST_I_INF_SESS_RATE).

Thanks to the work done lately to merge promex and stat_cols_px[]
array, let's simply defined an .alt_name for the ST_I_PX_RATE metric in
order to have promex exposing it as current_session_rate for relevant
contexts.
2025-04-28 12:23:20 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Aurelien DARRAGON
276491dc22 MINOR: stats-proxy: add alt name info to stat_cols_px where relevant
For all metrics defined under promex_st_metrics array, add the
corresponding .alt_name field in the general purpose stat_cols_px
array.
2025-03-21 17:05:26 +01:00
Aurelien DARRAGON
7f9d8c1327 MINOR: stats-proxy: add alt_name field for ME_NEW_{FE,BE,PX} helpers
For now alt_name is systematically set to NULL. Thanks to this change we
may easily add an altname to existing metrics. Also by requiring explicit
value it offers more visibility for this field.
2025-03-21 17:05:19 +01:00
Aurelien DARRAGON
2ab82124ec MINOR: stats: explicitly add frontend cap for ST_I_PX_REQ_TOT
While being a generic metric, ST_I_PX_REQ_TOT is handled specifically for
the frontend case. But the frontend capability isn't set for that metric
It is actually quite misleading, because the capability may be checked
to see whether the metric is relevant for a given scope, yet it is
relevant for frontend scope.

In this patch we also add the frontend capability for the metric.
2025-03-20 11:42:43 +01:00
Aurelien DARRAGON
8aa8626d12 MINOR: stats: add .cap for some static metrics
Goal is to merge promex metrics definition into the main one.
Promex metrics will use the metric capability to know available scopes,
thus only metrics relevant for prometheus were updated.
2025-03-20 11:38:17 +01:00
Aurelien DARRAGON
3c1b00b127 MINOR: stats: add .generic explicit field in stat_col struct
Further extend logic implemented in 65624876 ("MINOR: stats: introduce a
more expressive stat definition method") and 4e9e8418 ("MINOR: stats:
prepare stats-file support for values other than FN_COUNTER"): we don't
rely anymore on the presence of the capability to know if the metric is
generic or not. This is because it prevents us from setting a capability
on static statistics. Yet it could be useful to set the capability even
on static metrics, thus we add a dedicated .generic bit to tell haproxy
that the metric is generic and can be handled automatically by the API.

Also, ME_NEW_* helpers are not explicitly associated to generic metric
definition (as it was already the case before) to avoid ambiguities.
It may change in the future as we may need to use the new definition
method to define static metrics (without the generic bit set). But for
now it isn't the case as this need definition was implemented for generic
metrics support in the first place. If we want to define static metrics
using the API, we could add a new set of helpers for instance.
2025-03-20 11:37:21 +01:00
Aurelien DARRAGON
8311be5ac6 BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
Performing a diff on stats output before vs after commit 66152526
("MEDIUM: stats: convert counters to new column definition") revealed
that some metrics were not properly ported to to the new API. Namely,
"lbtot", "cli_abrt" and "srv_abrt" are now exposed on frontend and
listeners while it was not the case before.

Also, "hrsp_other" is exposed even when "mode http" wasn't set on the
proxy.

In this patch we restore original behavior by fixing the capabilities
and hide settings.

As this could be considered as a minor regression (looking at the commit
message it doesn't seem intended), better tag this as a bug. It should be
backported in 3.0 with 66152526.
2025-03-13 11:49:18 +01:00
Olivier Houchard
583303c48b MINOR: proxies/servers: Calculate queueslength and use it.
For both proxies and servers, properly calculates queueslength, which is
the total number of element in each queues (as they currently are only
using one queue, it is equivalent to the number of element of that
queue), and use it instead of the queue's length.
2025-01-28 12:49:41 +01:00
Amaury Denoyelle
071ae8ce3d BUG/MEDIUM: stats/server: use watcher to track server during stats dump
If a server A is deleted while a stats dump is currently on it, deletion
is delayed thanks to reference counting. Server A is nonetheless removed
from the proxy list. However, this list is a single linked list. If the
next server B is deleted and freed immediately, server A would still
point to it. This problem has been solved by the prev_deleted list in
servers.

This model seems correct, but it is difficult to ensure completely its
validity. In particular, it implies when stats dump is resumed, server A
elements will be accessed despite the server being in a half-deleted
state.

Thus, it has been decided to completely ditch the refcount mechanism for
stats dump. Instead, use the watcher element to register every stats
dump currently tracking a server instance. Each time a server is deleted
on the CLI, each stats dump element which may points to it are updated
to access the next server instance, or NULL if this is the last server.
This ensures that a server which was deleted via CLI but not completely
freed is never accessed on stats dump resumption.

Currently, no race condition related to dynamic servers and stats dump
is known. However, as described above, the previous model is deemed too
fragile, as such this patch is labelled as bug-fix. It should be
backported up to 2.6, after a reasonable period of observation. It
relies on the following patch :
  MINOR: list: define a watcher type
2024-12-10 16:19:33 +01:00
Willy Tarreau
d24768ab44 MINOR: protocol: create abnsz socket address family
For now it's the same as abns. We'll need to modify sock_unix_addrcmp(),
and a few other ones to support effective path length when dealing with
the \0. Let's check with Tristan's patch for this (upcoming patch).

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:50 +01:00
Willy Tarreau
78ac312bbd MEDIUM: protocol: make abns a custom unix socket address family
This is a pre-requisite to adding the abnsz socket address family:

in this patch we make use of protocol API rework started by 732913f
("MINOR: protocol: properly assign the sock_domain and sock_family") in
order to implement a dedicated address family for ABNS sockets (based on
UNIX parent family).

Thanks to this, it will become trivial to implement a new ABNSZ (for abns
zero) family which is essentially the same as ABNS but with a slight
difference when it comes to path handling (ABNS uses the whole sun_path
length, while ABNSZ's path is zero terminated and evaluation stops at 0)

It was verified that this patch doesn't break reg-tests and behaves
properly (tests performed on the CLI with show sess and show fd).

Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no
distinction needs to be made, real_family() is used to fetch the proper
real family type to handle it properly.

Both stream and dgram were converted, so no functional change should be
expected for this "internal" rework, except that proto will be displayed
as "abns_{stream,dgram}" instead of "unix_{stream,dgram}".

Before ("show sess" output):
  0x64c35528aab0: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=21,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

After:
  0x619da7ad74c0: proto=abns_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=22,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:25 +01:00
Ilya Shipitsin
1f6e5f7a61 CLEANUP: assorted typo fixes in the code and comments
This is 43rd iteration of typo fixes
2024-09-03 17:49:21 +02:00
Christopher Faulet
6b9daec93d MINOR: stats-html: Display reuse ratio for spop connections
Now SPOP connections can be reused, it could be pretty useful to know the
reuse rate. The corresponding backend and server counters are already
incremented, but not displayed on the stats HTML page. Thanks to this patch,
it is now possible to get it, just like for HTTP proxies.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Amaury Denoyelle
53782b9ea5 MINOR: stats: extract proxy clear-counter in a dedicated function
Split code related to proxies list looping in cli_parse_clear_counters()
to a new dedicated function. This function is placed in the new module
stats-proxy.
2024-05-02 16:43:26 +02:00
Amaury Denoyelle
f0644d1bd7 REORG: stats: define stats-proxy source module
Create a new module stats-proxy. Move stats functions related to proxies
list looping in it. This allows to reduce stats source file dividing its
size by half.
2024-05-02 16:42:36 +02:00