The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
When a backend is in status DOWN and going UP it is currently displayed
as yellow ("active UP, going down") instead of orange ("active DOWN, going
UP"). This patches restyles the table rows to actually match the
legend.
This may be backported to any version, the issue appeared in 1.7-dev2
with commit 0c378efe8 ("MEDIUM: stats: compute the color code only in
the HTML form").
Remove static qualifier on stats_allocate_proxy_counters_internal. This
function will be used to allocate extra counters at runtime for dynamic
servers.
Refactoring performed with the following Coccinelle patch:
@@
char *s;
@@
(
- ist2(s, strlen(s))
+ ist(s)
|
- ist2(strdup(s), strlen(s))
+ ist(strdup(s))
)
Note that this replacement is safe even in the strdup() case, because `ist()`
will not call `strlen()` on a `NULL` pointer. Instead is inserts a length of
`0`, effectively resulting in `IST_NULL`.
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).
The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.
178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:
@ rule @
expression E;
@@
- free(E);
- E = NULL;
+ ha_free(&E);
It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.
A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
The nb_tasks counter was still global and gets incremented and decremented
for each task_new()/task_free(), and was read in process_runnable_tasks().
But it's only used for stats reporting, so doing this this often is
pointless and expensive. Let's move it to the task_per_thread struct and
have the stats sum it when needed.
This counter is solely used for reporting in the stats and is the hottest
thread contention point to date. Moving it to the scheduler and having a
separate one for the global run queue dramatically improves the performance,
showing a 12% boost on the request rate on 16 threads!
In addition, the thread debugging output which used to rely on rqueue_size
was not totally accurate as it would only report task counts. Now we can
return the exact thread's run queue length.
It is also interesting to note that there are still a few other task/tasklet
counters in the scheduler that are not efficiently updated because some cover
a single area and others cover multiple areas. It looks like having a distinct
counter for each of the following entries would help and would keep the code
a bit cleaner:
- global run queue (tree)
- per-thread run queue (tree)
- per-thread shared tasklets list
- per-thread local lists
Maybe even splitting the shared tasklets lists between pure tasklets and
tasks instead of having the whole and tasks would simplify the code because
there remain a number of places where several counters have to be updated.
move listen status to a helper, defining both status enum and string
definition.
this will be helpful to be reused in prometheus code. It also removes
this hard-to-read nested ternary.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to add support for listen stats in
prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
This patch splits current dns.c into two files:
The first dns.c contains code related to DNS message exchange over UDP
and in future other TCP. We try to remove depencies to resolving
to make it usable by other stuff as DNS load balancing.
The new resolvers.c inherit of the code specific to the actual
resolvers.
Note:
It was really difficult to obtain a clean diff dur to the amount
of moved code.
Note2:
Counters and stuff related to stats is not cleany separated because
currently counters for both layers are merged and hard to separate
for now.
Counters are currently stored into lowlevel nameservers struct but
most of them are resolving layer data and increased in the upper layer
So this patch renames the prototype used to allocate/dump them with prefix
'resolv' waiting for a clean split.
In order to unify prometheus and stats description, we need to remove
some field reference which are specific to stats implementation:
- `scur` in max current sessions (also reword current session)
- `rate` in max sessions
- `req_rate` in max requests
- `conn_rate` in max connections
Signed-off-by: William Dauchy <wdauchy@gmail.com>
In order to unify prometheus and stats description, we need to clarify
the description for pending connections.
- remove the BE reference in counters struct, as it is also used in
servers
- remove reference of `qcur` field in description as it is specific to
stats implemention
- try to reword cur and max pending connections description
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
The previous patch was pushed too quickly (399bf72f6 "BUG/MINOR: stats:
Remove a break preventing ST_F_QCUR to be set for servers"). It was not an
extra break but a misplaced break statement. Thus, now a break statement
must be added after filling the ST_F_MODE field in stats_fill_sv_stats().
No backport needed except if the above commit is backported.
There is an extra break statement wrongly placed in stats_fill_sv_stats()
function, just before filling the ST_F_QCUR field. It prevents this field to
be set to the right value for servers.
No backport needed except if commit 3a9a4992 ("MEDIUM: stats: allow to
select one field in `stats_fill_sv_stats`") is backported.
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
This patch follows what has already been done on frontend and backend
side.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the server.
A few things to note though:
- state require prior calculation, so I moved that to a sort of helper
`stats_fill_be_stats_computestate`.
- all ST_F*TIME fields requires some minor compute, so I moved it at te
beginning of the function under a condition.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
This patch follows what has already been done on frontend side.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the backend
A few things to note though:
- status and uweight field requires prior compute, so I moved that to a
sort of helper `stats_fill_be_stats_computesrv`.
- all ST_F*TIME fields requires some minor compute, so I moved it at te
beginning of the function under a condition.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
while working on backend/servers I realised I could have written that in
a better way and avoid one extra break. This is slightly improving
readiness.
also while being here, fix function declaration which was not 100%
accurate.
this patch does not change the behaviour of the code.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
In stats_fill_fe_stats(), some fields are conditionnal (ST_F_HRSP_* for
instance). But unlike unimplemented fields, for those fields, the <metric>
variable is used to fill the <stats> array, but it is not initialized. This
bug as no impact, because these fields are not used. But it is better to fix
it now to avoid future bugs.
To fix it, the metric is now defined and initialized into the for loop.
The bug was introduced by the commit 0ef54397 ("MEDIUM: stats: allow to
select one field in `stats_fill_fe_stats`"). No backport is needed except if
the above commit is backported. It fixes the issue #1063.
A regression was introduced by the commit 0ef54397b ("MEDIUM: stats: allow
to select one field in `stats_fill_fe_stats`"). stats_fill_fe_stats()
function fails on unimplemented metrics for frontends. However, not all
stats metrics are used by frontends. For instance ST_F_QCUR. As a
consequence, the frontends stats are always skipped.
To fix the bug, we just skip unimplemented metric for frontends. An error is
triggered only if a specific field is given and is unimplemented.
No backport is needed except if the above commit is backported.
use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the frontend.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
commit 5a982a71656ce885be4b1d4b90b8db31204788a1 ("MINOR:
contrib/prometheus-exporter: export build_info") is breaking lua
`core.get_info()`.
This patch makes sure build_info is correctly initialised in all cases.
Reviewed-by: William Dauchy <wdauchy@gmail.com>
Since ee63d4bd6 ("MEDIUM: stats: integrate static proxies stats in new
stats"), all dumped stats for a given domain, the default ones and the
modules ones, are merged in a signle array to dump them in a generic way.
For this purpose, the stat_l global variable is allocated at startup to
store a line of stats before the dump, i.e. all stats of an entity
(frontend, backend, listener, server or dns nameserver). But this variable
is not thread safe. If stats are retrieved concurrently by several clients
on different threads, the same variable is used. This leads to corrupted
stats output.
To fix the bug, the stat_l variable is now thread local.
This patch should probably solve issues #972 and #992. It must be backported
to 2.3.
Define a per-thread counters allocated with the greatest size of any
stat module counters. This variable is named trash_counters.
When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent
segfault.
This is useful for all the proxies used internally and not
belonging to the global proxy list. As these objects does not appears on
the stat report, it does not matter to use the dummy counters.
For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.
Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There is probably other parts that may be impacted (SPOE for example).
This bug was introduced in the current release and do not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
Register a new function on POST DEINIT to free stats fields/lines for
each domain.
This patch does not fix a critical bug but may be backported to 2.3.
The "weight" column on the stats page is somewhat confusing when using
slowstart becaue it reports the effective weight, without being really
explicit about it. In some situations the user-configured weight is more
relevant (especially with long slowstarts where it's important to know
if the configured weight is correct).
This adds a new uweight stat which reports a server's user-configured
weight, and in a backend it receives the sum of all servers' uweights.
In addition it adds the mention of "effective" in a few descriptions
for the "weight" column (help and doc).
As a result, the list of servers in a backend is now always scanned
when dumping the stats. But this is not a problem given that these
servers are already scanned anyway and for way heavier processing.
When dumping the stats page (or the CSV output), when many states are
mixed, it's hard to figure the number of up servers. But when showing
only the "up" servers or hiding the "maint" servers, there's no way to
know how many servers are configured, which is problematic when trying
to update server-templates.
What this patch does, for dumps in "up" or "no-maint" modes, is to add
after the backend's "UP" or "DOWN" state "(%d/%d)" indicating the number
of servers seen as UP to the total number of servers in the backend. As
such, seeing "UP (33/39)" immediately tells that there are 6 servers that
are not listed when using "up", or will let the client figure how many
servers are left once deducted the number of non-maintenance ones. It's
not done on default dumps so as not to disturb existing tools, which
already have all the information they need in the dump.
"no-maint" is a bit similar to "up" except that it will only hide
servers that are in maintenance (or disabled in the configuration), and
not those that are enabled but failed a check. One benefit here is to
significantly reduce the output of the "show stat" command when using
large server-templates containing entries that are not yet provisioned.
Note that the prometheus exporter also has such an option which does
the exact same.
We already had it on the HTTP interface but it was not accessible on the
CLI. It can be very convenient to hide servers which are down, do not
resolve, or are in maintenance.
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
Since v1.4 or so, it's almost not possible anymore to set this state. The
only exception is by using the CLI to change a frontend's maxconn setting
below its current usage. This case makes no sense, and for other cases it
doesn't make sense either because "full" is a vague concept when only
certain listeners are full and not all. Let's just remove this unused
state and make it clear that it's not reported. The "ready" or "open"
states will continue to be reported without being misleading as they
will be opposed to "stop".
Remove variable declaration inside a for-loop. This was introduced by my
patches serie of the implementation of dynamic stats. This is not
supported by older gcc, notably on the freebsd environment of the ci.
Use the new stats module API to integrate the dns counters in the
standard stats. This is done in order to avoid code duplication, keep
the code related to cli out of dns and use the full possibility of the
stats function, allowing to print dns stats in csv or json format.
Integrate the additional proxy stats on the html stats page. For each
module, a new column is displayed with the individual stats available as
a tooltip.
Add a boolean 'clearable' on stats module structure. If set, it forces
all the counters to be reset on 'clear counters' cli command. If not,
the counters are reset only when 'clear counters all' is used.
This is executed on startup with the registered statistics module. The
existing statistics have been merged in a list containing all
statistics for each domain. This is useful to print all available
statistics in a generic way.
Allocate extra counters for all proxies/servers/listeners instances.
These counters are allocated with the counters from the stats modules
registered on startup.
A stat module can be registered to quickly add new statistics on
haproxy. It must be attached to one of the available stats domain. The
register must be done using INITCALL on STG_REGISTER.
The stat module has a name which should be unique for each new module in
a domain. It also contains a statistics list with their name/desc and a
pointer to a function used to fill the stats from the module counters.
The module also provides the initial counters values used on
automatically allocated counters. The offset for these counters
are stored in the module structure.
Use the character '-' to mark the end of static statistics on proxy
domain. After this marker, the order of the fields is not guaranteed and
should be parsed with care.
This flag can be used to determine on what type of proxy object the
statistics should be relevant. It will be useful when adding dynamic
statistics. Currently, this flag is not used.