use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the frontend.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
If an HTTP protocol upgrade request with a payload is received, a
501-not-implemented error is now returned to the client. It is valid from
the RFC point of view but will be incompatible with the way the H2
websockets will be handled by HAProxy. And it is probably a very uncommon
way to do perform protocol upgrades.
With this patch, the H1 mux is now able to return 501-not-implemented errors
to client during the request parsing. However, no such errors are returned
for now.
The MUX_ES_NOTIMPL_ERR exit status is added to allow the multiplexers to
report errors about not implemented features. This will be used by the H1
mux to return 501-not-implemented errors.
Add the support for the 501-not-implemented status code with the
corresponding default message. The documentation is updated accordingly
because it is now part of status codes HAProxy may emit via an errorfile or
a deny/return HTTP action.
Just like the H1 muliplexer, when a new frontend H2 stream is created, the
rxbuf is xferred to the stream at the upper layer.
Originally, it is not a bug fix, but just an api standardization. And in
fact, it fixes a crash when a h2 stream is aborted after the request parsing
but before the first call to process_stream(). It crashes since the commit
8bebd2fe5 ("MEDIUM: http-ana: Don't process partial or empty request
anymore"). It is now totally unexpected to have an HTTP stream without a
valid request. But here the stream is unable to get the request because the
client connection was aborted. Passing it during the stream creation fixes
the bug. But the true problem is that the stream-interfaces are still
relying on the connection state while only the muxes should do so.
This fix is specific for 2.4. No backport needed.
When a tcpcheck ruleset uses multiple connections, the existing one must be
closed and destroyed before openning the new one. This part is handled in
the tcpcheck_main() function, when called from the wake callback function
(wake_srv_chk). But it is indeed a problem, because this function may be
called from the mux layer. This means a mux may call the wake callback
function of the data layer, which may release the connection and the mux. It
is easy to see how it is hazardous. And actually, depending on the
scheduling, it leads to crashes.
Thus, we must avoid to release the connection in the wake callback context,
and move this part in the check's process function instead. To do so, we
rely on the CHK_ST_CLOSE_CONN flags. When a connection must be replaced by a
new one, this flag is set on the check, in tcpcheck_main() function, and the
check's task is woken up. Then, the connection is really closed in
process_chk_conn() function.
This patch must be backported as far as 2.2, with some adaptations however
because the code is not exactly the same.
glibc < 2.10 requires _GNU_SOURCE in order to make use of strsignal(),
otherwise leading to SEGV at runtime.
$ make V=1 TARGET=linux-glibc-legacy USE_THREAD= USE_ACCEPT4=
..
src/mworker.c: In function 'mworker_catch_sigchld':
src/mworker.c:285: warning: implicit declaration of function 'strsignal'
src/mworker.c:285: warning: pointer/integer type mismatch in conditional expression
..
$ make V=1 reg-tests REGTESTS_TYPES=slow,default
..
###### Test case: reg-tests/mcli/mcli_start_progs.vtc ######
## test results in: "/tmp/haregtests-2021-01-19_15-18-07.n24989/vtc.29077.28f6153d"
---- h1 Bad exit status: 0x008b exit 0x0 signal 11 core 128
---- h1 Assert error in haproxy_wait(), src/vtc_haproxy.c line 792: Condition(*(&h->fds[1]) >= 0) not true. Errno=0 Success
..
$ gdb ./haproxy /tmp/core.0.haproxy.30270
..
Core was generated by `/root/haproxy/haproxy -d -W -S fd@8 -dM -f /tmp/haregtests-2021-01-19_15-18-07.'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
(gdb) bt
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
#1 0x00002aaaab354b69 in vfprintf () from /lib64/libc.so.6
#2 0x00002aaaab37788a in vsnprintf () from /lib64/libc.so.6
#3 0x00000000004a76a3 in memvprintf (out=0x7fffedc680a0, format=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", orig_args=0x7fffedc680d0)
at src/tools.c:3868
#4 0x00000000004bbd40 in print_message (label=0x58abed "ALERT", fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", argp=0x7fffedc680d0)
at src/log.c:1066
#5 0x00000000004bc07f in ha_alert (fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n") at src/log.c:1109
#6 0x0000000000534b7b in mworker_catch_sigchld (sh=<value optimized out>) at src/mworker.c:293
#7 0x0000000000556af3 in __signal_process_queue () at src/signal.c:88
#8 0x00000000004f6216 in signal_process_queue () at include/haproxy/signal.h:39
#9 run_poll_loop () at src/haproxy.c:2859
#10 0x00000000004f63b7 in run_thread_poll_loop (data=<value optimized out>) at src/haproxy.c:3028
#11 0x00000000004faaac in main (argc=<value optimized out>, argv=0x7fffedc68498) at src/haproxy.c:904
See: https://man7.org/linux/man-pages/man3/strsignal.3.html
Must be backported as far as 2.0.
If a subscriber's tasklet was called more than one million times, if
the ssl_ctx's connection doesn't match the current one, or if the
connection appears closed in one direction while the SSL stack is
still subscribed, the FD is reported as suspicious. The close cases
may occasionally trigger a false positive during very short and rare
windows. Similarly the 1M calls will trigger after 16GB are transferred
over a given connection. These are rare enough events to be reported as
suspicious.
A file descriptor which maps to a connection but has more than one
thread in its mask, or an FD handle that doesn't correspond to the FD,
or wiht no mux context, or an FD with no thread in its mask, or with
more than 1 million events is flagged as suspicious.
Now the show_fd helpers at the transport and mux levels return an integer
which indicates whether or not the inspected entry looks suspicious. When
an entry is reported as suspicious, "show fd" will suffix it with an
exclamation mark ('!') in the dump, that is supposed to help detecting
them.
For now, helpers were adjusted to adapt to the new API but none of them
reports any suspicious entry yet.
When dumping a live h1 stream, also take the opportunity for reporting
the subscriber including the event, tasklet, handler and context. Example:
3030 : st=0x21(R:rA W:Ra) ev=0x04(heOpi) [Lc] tmask=0x4 umask=0x0 owner=0x7f97805c1f70 iocb=0x65b847(sock_conn_iocb) back=1 cflg=0x00002300 sv=s1/recv mux=H1 ctx=0x7f97805c21b0 h1c.flg=0x80000200 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 h1s=0x7f97805c2380 h1s.flg=0x4010 .req.state=MSG_DATA .res.state=MSG_RPBEFORE .meth=POST status=0 .cs.flg=0x00000000 .cs.data=0x7f97805c1720 .subs=0x7f97805c1748(ev=1 tl=0x7f97805c1990 tl.calls=2 tl.ctx=0x7f97805c1720 tl.fct=si_cs_io_cb) xprt=RAW
In FD dumps it's often very important to figure what upper layer function
is going to be called. Let's export the few I/O callbacks that appear as
tasklet functions so that "show fd" can resolve them instead of printing
a pointer relative to main. For example:
1028 : st=0x21(R:rA W:Ra) ev=0x01(heopI) [lc] tmask=0x2 umask=0x2 owner=0x7f00b889b200 iocb=0x65b638(sock_conn_iocb) back=0 cflg=0x00001300 fe=recv mux=H2 ctx=0x7f00c8824de0 h2c.st0=FRH .err=0 .maxid=795 .lastid=-1 .flg=0x0000 .nbst=0 .nbcs=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=1 .dsi=795 .dbuf=0@(nil)+0/0 .msi=-1 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] xprt=SSL xprt_ctx=0x7f00c86d0750 xctx.st=0 .xprt=RAW .wait.ev=1 .subs=0x7f00c88252e0(ev=1 tl=0x7f00a07d1aa0 tl.calls=1047 tl.ctx=0x7f00c8824de0 tl.fct=h2_io_cb) .sent_early=0 .early_in=0
The SSL context contains a lot of important details that are currently
missing from debug outputs. Now that we detect ssl_sock, we can perform
some sanity checks, print the next xprt, the subscriber callback's context,
handler and number of calls. The process function is also resolved. This
now gives for example on an H2 connection:
1029 : st=0x21(R:rA W:Ra) ev=0x01(heopI) [lc] tmask=0x2 umask=0x2 owner=0x7fc714881700 iocb=0x65b528(sock_conn_iocb) back=0 cflg=0x00001300 fe=recv mux=H2 ctx=0x7fc734545e50 h2c.st0=FRH .err=0 .maxid=217 .lastid=-1 .flg=0x0000 .nbst=0 .nbcs=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=1 .dsi=217 .dbuf=0@(nil)+0/0 .msi=-1 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] xprt=SSL xprt_ctx=0x7fc73478f230 xctx.st=0 .xprt=RAW .wait.ev=1 .subs=0x7fc734546350(ev=1 tl=0x7fc7346702e0 tl.calls=278 tl.ctx=0x7fc734545e50 tl.fct=main-0x144efa) .sent_early=0 .early_in=0
Just like we did for the muxes, now the transport layers will have the
ability to provide helpers to report more detailed information about their
internal context. When the helper is not known, the pointer continues to
be dumped as-is if it's not NULL. This way a transport with no context nor
dump function will not add a useless "xprt_ctx=(nil)" but the pointer will
be emitted if valid or if a helper is defined.
These ones are definitely missing from some dumps, let's report them! We
print the xprt's name instead of its useless pointer, as well as its ctx
when xprt is not NULL.
Over time the code has uglified, casting fdt.owner as a struct connection
for about everything. Let's have a const struct connection* there and take
this opportunity for passing all fields as const as well.
Additionally a misplaced closing parenthesis on the output was fixed.
When 0c439d895 ("BUILD: tools: make resolve_sym_name() return a const")
was written, the pointer argument ought to have been turned to const for
more flexibility. Let's do it now.
That was causing confusing outputs like this one whenan H2S is known:
1030 : ... last_h2s=0x2ed8390 .id=775 .st=HCR.flg=0x4001 .rxbuf=...
^^^^
This was introduced by commit ab2ec4540 in 2.1-dev2 so the fix can be
backported as far as 2.1.
This counter could be hugely incremented by the peer task responsible of managing
peer synchronizations and reconnections, for instance when a peer is not reachable
there is a period where the appctx is not created. If we receive stick-table
updates before the peer session (appctx) is instantiated, we reach the code
responsible of incrementing the "new_conn" counter.
With this patch we increment this counter only when we really instantiate a new
peer session thanks to peer_session_create().
May be backported as far as 2.0.
As of commit 6ca89162dc881df8fecd7713ca1fe5dbaa66b315 this hash no longer is
required, because unknown encodings are not longer stored and known encodings
do not use the cache.
If a server varies on the accept-encoding header and it sends a response
with an encoding we do not know (see parse_encoding_value function), we
will not store it. This will prevent unexpected errors caused by
cache collisions that could happen in accept_encoding_hash_cmp.
commit 5a982a71656ce885be4b1d4b90b8db31204788a1 ("MINOR:
contrib/prometheus-exporter: export build_info") is breaking lua
`core.get_info()`.
This patch makes sure build_info is correctly initialised in all cases.
Reviewed-by: William Dauchy <wdauchy@gmail.com>
Previous commit da2b0844f ("MINOR: peers: Add traces for peer control
messages.") introduced a build warning on some compiler versions after
the removal of variable "peers" in peer_send_msgs() because variable
"s" was used only to assign this one, and variable "si" to assign "s".
Let's remove both to fix the warning. No backport is needed.
V2 of this fix which includes a missing pointer initialization which was
causing a segfault in v1 (949a7f64591458eb06c998acf409093ea991dc3a)
This bug happens when a service has multiple records on the same host
and the server provides the A/AAAA resolution in the response as AR
(Additional Records).
In such condition, the first occurence of the host will be taken from
the Additional section, while the second (and next ones) will be process
by an independent resolution task (like we used to do before 2.2).
This can lead to a situation where the "synchronisation" of the
resolution may diverge, like described in github issue #971.
Because of this behavior, HAProxy mixes various type of requests to
resolve the full list of servers: SRV+AR for all "first" occurences and
A/AAAA for all other occurences of an existing hostname.
IE: with the following type of response:
;; ANSWER SECTION:
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A2.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 86 A3.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A1.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 85 A3.tld.
;; ADDITIONAL SECTION:
A2.tld. 3600 IN A 192.168.0.2
A3.tld. 3600 IN A 192.168.0.3
A1.tld. 3600 IN A 192.168.0.1
A3.tld. 3600 IN A 192.168.0.3
the first A3 host is resolved using the Additional Section and the
second one through a dedicated A request.
When linking the SRV records to their respective Additional one, a
condition was missing (chek if said SRV record is already attached to an
Additional one), leading to stop processing SRV only when the target
SRV field matches the Additional record name. Hence only the first
occurence of a target was managed by an additional record.
This patch adds a condition in this loop to ensure the record being
parsed is not already linked to an Additional Record. If so, we can
carry on the parsing to find a possible next one with the same target
field value.
backport status: 2.2 and above
Display traces when sending/receiving peer control messages (synchronisation, heartbeat).
Add remaining traces when parsing malformed messages (acks, stick-table definitions)
or ignoring them.
Also add traces when releasing session or when reaching the PEER_SESS_ST_ERRPROTO
peer protocol state.
It's about the third time I get confused by these functions, half of
which manipulate the reference as a whole and those manipulating only
an entry. For me "pat_ref_commit" means committing the pattern reference,
not just an element, so let's rename it. A number of other ones should
really be renamed before 2.4 gets released :-/
There is no low level api to achieve same as Linux/FreeBSD, we rely
on CPUs available. Without this, the number of threads is just 1 for
Mac while having 8 cores in my M1.
Backporting to 2.1 should be enough if that's possible.
Signed-off-by: David CARLIER <devnexen@gmail.com>
An fatal error is now reported if a server is defined in a frontend
section. til now, a warning was just emitted and the server was ignored. The
warning was added in the 1.3.4 when the frontend/backend keywords were
introduced to allow a smooth transition and to not break existing
configs. It is old enough now to emit an fatal error in this case.
This patch is related to the issue #1043. It may be backported at least as
far as 2.2, and possibly to older versions. It relies on the previous commit
("MINOR: config: Add failifnotcap() to emit an alert on proxy capabilities").
The HAPROXY_CFGFILES env variable is built using a static trash chunk, via a
call to get_trash_chunk() function. This chunk is reserved during the whole
configuration parsing. It is far too large to guarantee it will not be
reused during the configuration parsing. And in fact, it happens in the lua
code since the commit f67442efd ("BUG/MINOR: lua: warn when registering
action, conv, sf, cli or applet multiple times"), when a lua script is
loaded.
To fix the bug, we now use a dynamic buffer instead. And we call memprintf()
function to handle both the allocation and the formatting. Allocation errors
at this stage are fatal.
This patch should fix the issue #1041. It must be backported as far as 2.0.
The strict-limits global option was introduced with commit 0fec3ab7b
("MINOR: init: always fail when setrlimit fails"). When used in
conjuction with master-worker, haproxy will not fail when a setrlimit
fails. This happens because we only exit() if master-worker isn't used.
This patch removes all tests for master-worker mode for all cases covered
by strict-limits scope.
This should be backported from 2.1 onward.
This should fix issue #1042.
Reviewed by William Dauchy <wdauchy@gmail.com>
If a server is defined in a frontend, thus a proxy without the backend
capability, the 'check' and 'agent-check' keywords are ignored. This way, no
check is performed on an ignored server. This avoids a segfault because some
part of the tcpchecks are not fully initialized (or released for frontends
during the post-check).
In addition, an test on the server's proxy capabilities is performed when
checks or agent-checks are initialized and nothing is performed for servers
attached to a non-backend proxy.
This patch should fix the issue #1043. It must be backported as far as 2.2.
If an errors occurs during the sample expression parsing, the alloced
sample_expr is not freed despite having its main pointer reset.
This fixes GitHub issue #1046.
It could be backported as far as 1.8.
This reverts commit 949a7f64591458eb06c998acf409093ea991dc3a.
The first part of the patch introduces a bug. When a dns answer item is
allocated, its <ar_item> is only initialized at the end of the parsing, when
the item is added in the answer list. Thus, we must not try to release it
during the parsing.
The second part is also probably buggy. It fixes the issue #971 but reverts
a fix for the issue #841 (see commit fb0884c8297 "BUG/MEDIUM: dns: Don't
store additional records in a linked-list"). So it must be at least
revalidated.
This revert fixes a segfault reported in a comment of the issue #971. It
must be backported as far as 2.2.
like it is done in other places, check the return value of
`alloc_trash_chunk` before using it. This was detected by coverity.
this patch fixes commit 591fc3a330005c289b4705fe4cb37c4eec9f9eed
("BUG/MINOR: sample: fix concat() converter's corruption with non-string
variables"
As a consequence, this patch should be backported as far as 2.0
this should fix github issue #1039
Signed-off-by: William Dauchy <wdauchy@gmail.com>
I was looking at writing a simple first test for prometheus but I
realised there is no proper way to exclude it if haproxy was not built
with prometheus plugin.
Today we have `REQUIRE_OPTIONS` in reg-tests which is based on `Feature
list` from `haproxy -vv`. Those options are coming from the Makefile
itself.
A plugin is build this way:
EXTRA_OBJS="contrib/prometheus-exporter/service-prometheus.o"
It does register service actions through `service_keywords_register`.
Those are listed through `list_services` in `haproxy -vv`.
To facilitate parsing, I slightly changed the output to a single line
and integrate it in regtests shell script so that we can now specify a
dependency while writing a reg-test for prometheus, e.g:
#REQUIRE_SERVICE=prometheus-exporter
#REQUIRE_SERVICES=prometheus-exporter,foo
There might be other ways to handle this, but that's the cleanest I
found; I understand people might be concerned by this output change in
`haproxy -vv` which goes from:
Available services :
foo
bar
to:
Available services : foo bar
Signed-off-by: William Dauchy <wdauchy@gmail.com>
- check functions are never called with a NULL args list, it is always
an array, so first check can be removed
- the expression parser guarantees that we can't have anything else,
because we mentioned json converter takes a mandatory string argument.
Thus test on `ARGT_STR` can be removed as well
- also add breaking line between enum and function declaration
In order to validate it, add a simple json test testing very simple
cases but can be improved in the future:
- default json converter without args
- json converter failing on error (utf8)
- json converter with error being removed (utf8s)
Signed-off-by: William Dauchy <wdauchy@gmail.com>
GitHub Issue #1037 Reported a memory leak in deinit() caused by an
allocation made in sa2str() that was stored in srv_set_addr_desc().
When destroying each server for a proxy in deinit, include freeing the
memory in the key of server->addr_node.
The leak was introduced in commit 92149f9a8 ("MEDIUM: stick-tables: Add
srvkey option to stick-table") which is not in any released version so
no backport is needed.
Cc: Tim Duesterhus <tim@bastelstu.be>
Patrick Hemmer reported that calling concat() with an integer variable
causes a %00 to appear at the beginning of the output. Looking at the
code, it's not surprising. The function uses get_trash_chunk() to get
one of the trashes, but can call casting functions which will also use
their trash in turn and will cycle back to ours, causing the trash to
be overwritten before being assigned to a sample.
By allocating the trash from a pool using alloc_trash_chunk(), we can
avoid this. However we must free it so the trash's contents must be
moved to a permanent trash buffer before returning. This is what's
achieved using smp_dup().
This should be backported as far as 2.0.