Commit Graph

16085 Commits

Author SHA1 Message Date
eaglegai
15c3d20e31 BUG/MINOR: ssl_sock: add check for ha_meth
in __ssl_sock_init, BIO_meth_new may failed and return NULL if
OPENSSL_zalloc failed.  in this case, ha_meth  will be NULL, and then
crash happens in  BIO_meth_set_write.  So, we add a check for ha_meth.
2023-05-26 12:07:43 +02:00
Patrick Hemmer
425d7ad89d MINOR: init: pre-allocate kernel data structures on init
The Linux kernel maintains data structures to track a processes' open file
descriptors, and it expands these structures as necessary when FD usage grows
(at every FD=2^X starting at 64). However when threading is in use, during
expansion the kernel will pause (observed up to 47ms) while it waits for thread
synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366).

This change addresses the issue and avoids the random pauses by opening the
maximum file descriptor during initialization, so that expansion will not occur
while processing traffic.
2023-05-26 09:28:18 +02:00
Christopher Faulet
535dd920df MINOR: compression: Improve the way Vary header is added
When a message is compressed, A "Vary" header is added with
"accept-encoding" value. However, a new header is always added, regardless
there is already a Vary header or not. In addition, if there is already a
Vary header, there is no check on values to be sure "accept-encoding" value
is not already there. So it is possible to have it twice.

To improve this part, we now test Vary header values and "accept-encoding"
is only added if it was not found. In addition, "accept-encoding" value is
appended to the last Vary header found, if any. Otherwise, a new header is
added.
2023-05-25 11:25:31 +02:00
Aurelien DARRAGON
1c07da4b48 BUG/MINOR: hlua: unsafe hlua_lua2smp() usage
Fixing hlua_lua2smp() usage in hlua's code since it was assumed that
hlua_lua2smp() makes a standalone smp out of lua data, but it is not
the case.

This is especially true when dealing with lua strings (string is
extracted using lua_tolstring() which returns a pointer to lua string
memory location that may be reclaimed by lua at any time when no longer
used from lua's point of view). Thus, smp generated by hlua_lua2smp() may
only be used from the lua context where the call was initially made, else
it should be explicitly duplicated before exporting it out of lua's
context to ensure safe (standalone) usage.

This should be backported to all stable versions.
2023-05-24 16:48:17 +02:00
Aurelien DARRAGON
a3624cb528 DOC: hlua: document hlua_lua2smp() function
Add some developer notes to hlua_lua2smp() function description since
it lacks some important infos, including a critical usage restriction.
2023-05-24 16:48:17 +02:00
Aurelien DARRAGON
0aaf6c45ca DOC: hlua: document hlua_lua2arg() function
Add some developer notes to hlua_lua2arg() function description since
it lacks some important infos, including an usage restriction.
2023-05-24 16:48:17 +02:00
Aurelien DARRAGON
e5c048a72d MINOR: hlua: hlua_arg2lua() may LJMP
Add LJMP hint to hlua_arg2lua() prototype since it relies on
functions (e.g.: lua_pushlstring()) which may raise lua memory errors.
2023-05-24 16:48:17 +02:00
Aurelien DARRAGON
4121772c50 MINOR: hlua: hlua_smp2lua() may LJMP
Add LJMP hint to hlua_smp2lua() prototype since it relies on
functions (e.g.: lua_pushstring()) which may raise lua memory errors.
2023-05-24 16:48:17 +02:00
Aurelien DARRAGON
742b1a8797 MINOR: hlua: hlua_smp2lua_str() may LJMP
Add LJMP hint to hlua_smp2lua_str() prototype since it relies on
functions (e.g.: lua_pushstring()) which may raise lua memory errors.
2023-05-24 16:48:17 +02:00
Frédéric Lécaille
12a815ad19 MINOR: quic: Add a counter for sent packets
Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection
level. Then, when the connection is released, the ->sent_pkt counter value
is added to the one for the listener.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
bdd64fd71d MINOR: quic: Add some counters at QUIC connection level
Add some statistical counters to quic_conn struct from quic_counters struct which
are used at listener level to handle them at QUIC connection level. This avoid
calling atomic functions. Furthermore this will be useful soon when a counter will
be added for the total number of packets which have been sent which will be very
often incremented.

Some counters were not added, espcially those which count the number of QUIC errors
by QUIC error types. Indeed such counters would be incremented most of the time
only one time at QUIC connection level.

Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level
statistical counters to the listener level statistical counters.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
464281af46 CLEANUP: quic: Useless tests in qc_rx_pkt_handle()
There is no reason to test <qc> nullity at the end of this function because it is
clearly not null, furthermore the trace handle the case where <qc> is null.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
ab3aa0ff22 CLEANUP: quic: Indentation fix quic_rx_pkt_retrieve_conn()
Add missing spaces.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
5fa633e22f MINOR: quic: Align "show quic" command help information
Align the "show quic" help information with all the others command help information.
Furthermore, makes this information match the management documentation.

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
35b63964a0 BUG/MINOR: quic: Missing Retry token length on receipt
quic_retry_token_check() must decipher the token sent to and received back from
clients. This token is made of the token format byte, the ODCID prefixed by its one byte
length, the timestamp of its creation, and terminated by an AEAD TAG followed
by the salt used to derive the secret to cipher the token.

So, the length of these data must be between
2 + QUIC_ODCID_MINLEN + sizeof(uint32_t) + QUIC_TLS_TAG_LEN + QUIC_RETRY_TOKEN_SALTLEN
and
2 + QUIC_CID_MAXLEN + sizeof(uint32_t) + QUIC_TLS_TAG_LEN + QUIC_RETRY_TOKEN_SALTLEN.

Must be backported to 2.7 and 2.6.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
6d6ddb2ce5 BUG/MINOR: quic: Wrong token length check (quic_generate_retry_token())
This bug would never occur because the buffer supplied to quic_generate_retry_token()
to build a Retry token is large enough to embed such a token. Anyway, this patch
fixes quic_generate_retry_token() implementation.

There were two errors: this is the ODCID which is added to the token. Furthermore
the timestamp was not taken into an account.

Must be backported to 2.6 and 2.7.
2023-05-24 16:30:11 +02:00
Frédéric Lécaille
aaf32f0c83 MINOR: quic: Add low level traces (addresses, DCID)
Add source and destination addresses to QUIC_EV_CONN_RCV trace event. This is
used by datagram/socket level functions (quic_sock.c).

Must be backported to 2.7.
2023-05-24 16:30:11 +02:00
Christopher Faulet
c2f1d0ee5e BUG/MEDIUM: mux-h2: Propagate termination flags when frontend SC is created
We must evaluate if EOS/EOI/ERR_PENDING/ERROR flags must be set on the SE
when the frontend SC is created because the rxbuf is transferred to the
steeam at this stage. It means the call to h2_rcv_buf() may be skipped on
some circumstances.

And indeed, it happens when HAproxy quickly replies, for instance because of
a deny rule. In this case, depending on the scheduling, the abort may block
the receive attempt from the SC. In this case if SE flags were not properly
set earlier, there is no way to terminate the request and the session may be
freezed.

For now, I can't explain why there is no timeout when this happens but it
remains an issue because here we should not rely on timeouts to close the
stream.

This patch relies on following commits:

    * MINOR: mux-h2: Add a function to propagate termination flags from h2s to SE
    * MINOR: mux-h2: Set H2_SF_ES_RCVD flag when decoding the HEADERS frame

The issue was encountered on the 2.8 but it seems the bug exists since the
2.4. But it is probably a good idea to only backport the series to 2.7 only
and wait for a bug report on earlier versions.

This patch should solve the issue #2147.
2023-05-24 16:06:11 +02:00
Christopher Faulet
531dd050ff MINOR: mux-h2: Add a function to propagate termination flags from h2s to SE
The function h2s_propagate_term_flags() was added to check the H2S state and
evaluate when EOI/EOS/ERR_PENDING/ERROR flags must be set on the SE. It is
not the only place where those flags are set. But it centralizes the synchro
between the H2 stream and the SC.

For now, this function is only used at the end of h2_rcv_buf(). But it will
be used to fix a bug.
2023-05-24 16:06:11 +02:00
Christopher Faulet
1a60a66306 MINOR: mux-h2: Set H2_SF_ES_RCVD flag when decoding the HEADERS frame
The flag H2_SF_ES_RCVD is set on the H2 stream when the ES flag is found in
a frame. On HEADERS frame, it was set in function processing the frame. It
is moved in the function decoding the frame. Fundamentally, this changes
nothing. But it will be useful to have this information earlier when a
client H2 stream is created.
2023-05-24 16:06:11 +02:00
Christopher Faulet
78b1eb2b04 BUG/MINOR: mux-h2: Check H2_SF_BODY_TUNNEL on H2S flags and not demux frame ones
In h2c_frt_stream_new(), H2_SF_BODY_TUNNEL flags was tested on demux frame
flags (h2c->dff) instead of the h2s flags.  By chance, it is a noop test
becasue H2_SF_BODY_TUNNEL value, once converted to an int8_t, is 0.

It is a 2.8-specific issue. No backport needed.
2023-05-24 16:06:11 +02:00
Amaury Denoyelle
152beeec34 MINOR: mux-quic: report error on stream-endpoint earlier
A RESET_STREAM is emitted in several occasions :
- protocol error during HTTP/3.0 parsing
- STOP_SENDING reception

In both cases, if a stream-endpoint is attached we must set its ERR
flag. This was correctly done but after some delay as it was only when
the RESET_STREAM was emitted. Change this to set the ERR flag as soon as
one of the upper cases has been encountered. This should help to release
faster streams in error.

This should be backported up to 2.7.
2023-05-24 14:46:52 +02:00
Amaury Denoyelle
37d78997ae MINOR: mux-quic: only set EOS on RESET_STREAM recv
A recent review was done to rationalize ERR/EOS/EOI flags on stream
endpoint. A common definition for both H1/H2/QUIC mux have been written
in the following documentation :
 ./doc/internals/stconn-close.txt

In QUIC it is possible to close each channels of a stream independently
with RESET_STREAM and STOP_SENDING frames. When a RESET_STREAM is
received, it indicates that the peer has ended its transmission in an
abnormal way. However, it is still ready to receive.

Previously, on RESET_STREAM reception, QUIC MUX set the ERR flag on
stream-endpoint. However, according to the QUIC mechanism, it should be
instead EOS but this was impossible due to a BUG_ON() which prevents EOS
without EOI or ERR. This BUG_ON was only present because this case was
never used before the introduction of QUIC. It was removed in a recent
commit which allows us to now properly set EOS alone on RESET_STREAM
reception.

In practice, this change allows to continue to send data even after
RESET_STREAM reception. However, currently browsers always emit it with
a STOP_SENDING as this is used to abort the whole H3 streams. In the end
this will result in a stream-endpoint with EOS and ERR_PENDING/ERR
flags.

This should be backported up to 2.7.
2023-05-24 14:39:17 +02:00
Amaury Denoyelle
8de35925f7 MINOR: mux-quic: set both EOI EOS for stream fin
A recent review was done to rationalize ERR/EOS/EOI flags on stream
endpoint. A common definition for both H1/H2/QUIC mux have been written
in the following documentation :
 ./doc/internals/stconn-close.txt

Always set EOS with EOI flag to conform to this specification. EOI is
set whenever the proper stream end has been encountered : with QUIC it
corresponds to a STREAM frame with FIN bit. At this step, RESET_STREAM
frames are ignored by QUIC MUX as allowed by RFC 9000. This means we can
always set EOS at the same time with EOI.

This should be backported up to 2.7.
2023-05-24 14:23:22 +02:00
Christopher Faulet
2437377445 MEDIUM: stconn/applet: Allow SF_SL_EOS flag alone
During the refactoring on SC/SE flags, it was stated that SE_FL_EOS flag
should not be set without on of SE_FL_EOI or SE_FL_ERROR flags. In fact, it
is a problem for the QUIC/H3 multiplexer. When a RST_STREAM frame is
received, it means no more data will be received from the peer. And this
happens before the end of the message (RST_STREAM frame received after the
end of the message are ignored). At this stage, it is a problem to report an
error because from the QUIC point of view, it is valid. Data may still be
sent to the peer. If an error is reported, this will stop the data sending
too.

In the same idea, the H1 mulitplexer reports an error when the message is
truncated because of a read0. But only an EOS flag should be reported in
this case, not an error. Fundamentally, it is important to distinguish
errors from shuts for reads because some cases are valid. For instance a H1
client can choose to stop uploading data if it received the server response.

So, relax tests on SE flags by removing BUG_ON_HOT() on SE_FL_EOS flag. For
now, the abort will be handled in the HTTP analyzers.
2023-05-23 15:52:35 +02:00
Amaury Denoyelle
aa39cc9f42 MINOR: quic: fix alignment of oneline show quic
Output of 'show quic' CLI in oneline mode was not correctly done. This
was caused both due to differing qc pointer size and ports length. Force
proper alignment by using maximum sizes as expected and complete with
blanks if needed.

This should be backported up to 2.7.
2023-05-22 14:18:02 +02:00
Amaury Denoyelle
7385ff3f0c BUG/MINOR: quic: handle Tx packet allocation failure properly
qc_prep_app_pkts() is responsible to built several new packets for
sending. It can fail due to memory allocation error. Before this patch,
the Tx buffer was released on error even if some packets were properly
generated.

With this patch, if an error happens on qc_prep_app_pkts(), we still try
to send already built packets if Tx buffer is not empty. The sending
loop is then interrupted and the Tx buffer is released with data
cleared.

This should be backported up to 2.7.
2023-05-22 14:18:02 +02:00
Amaury Denoyelle
f8fbb0b94e MINOR: quic: use WARN_ON for encrypt failures
It is expected that quic_packet_encrypt() and
quic_apply_header_protection() never fails as encryption is done in
place. This allows to remove their return value.

This is useful to simplify error handling on sending path. An error can
only be encountered on the first steps when allocating a new packet or
copying its frame content. After a clear packet is successfully built,
no error is expected on encryption.

However, it's still unclear if our assumption that in-place encryption
function never fail. As such, a WARN_ON() statement is used if an error
is detected at this stage. Currently, it's impossible to properly manage
this without data loss as this will leave partially unencrypted data in
the send buffer. If warning are reported a solution will have to be
implemented.

This should be backported up to 2.7.
2023-05-22 11:20:44 +02:00
Amaury Denoyelle
5eadc27623 MINOR: quic: remove return val of quic_aead_iv_build()
quic_aead_iv_build() should never fail unless we call it with buffers of
different size. This never happens in the code as every input buffers
are of size QUIC_TLS_IV_LEN.

Remove the return value and add a BUG_ON() to prevent future misusage.
This is especially useful to remove one error handling on the sending
patch via quic_packet_encrypt().

This should be backported up to 2.7.
2023-05-22 11:17:18 +02:00
Amaury Denoyelle
8d6d246dbc CLEANUP: mux-quic/h3: complete BUG_ON with comments
Complete each useful BUG_ON statements with a comment to explain its
purpose. Also convert BUG_ON_HOT to BUG_ON as they should not have a
big impact.

This should be backported up to 2.7.
2023-05-22 11:17:18 +02:00
Aurelien DARRAGON
b6a24a52a2 BUG/MINOR: debug: fix pointer check in debug_parse_cli_task()
Task pointer check in debug_parse_cli_task() computes the theoric end
address of provided task pointer to check if it is valid or not thanks to
may_access() helper function.

However, relative ending address is calculated by adding task size to 't'
pointer (which is a struct task pointer), thus it will result to incorrect
address since the compiler automatically translates 't + x' to
't + x * sizeof(*t)' internally (with sizeof(*t) != 1 here).

Solving the issue by using 'ptr' (which is the void * raw address) as
starting address to prevent automatic address scaling.

This was revealed by coverity, see GH #2157.

No backport is needed, unless 9867987 ("DEBUG: cli: add "debug dev task"
to show/wake/expire/kill tasks and tasklets") gets backported.
2023-05-17 16:49:17 +02:00
Aurelien DARRAGON
7428adaf0d BUG/MINOR: hlua: SET_SAFE_LJMP misuse in hlua_event_runner()
When hlua_event_runner() pauses the subscription (ie: if the consumer
can't keep up the pace), hlua_traceback() is used to get the current
lua trace (running context) to provide some info to the user.

However, as hlua_traceback() may raise an error (__LJMP) is set, it is
used within a SET_SAFE_LJMP() / RESET_SAFE_LJMP() combination to ensure
lua errors are properly handled and don't result in unexpected behavior.

But the current usage of SET_SAFE_LJMP() within the function is wrong
since hlua_traceback() will run a second time (unprotected) if the
first (protected) attempt fails. This is undefined behavior and could
even lead to crashes.

Hopefully it is very hard to trigger this code path, thus we can consider
this as a minor bug.

Also using this as an opportunity to enhance the message report to make
it more meaningful to the user.

This should fix GH #2159.

It is a 2.8 specific bug, no backport needed unless c84899c636
("MEDIUM: hlua/event_hdl: initial support for event handlers") gets
backported.
2023-05-17 16:48:40 +02:00
Christopher Faulet
06e9c81bd0 MEDIUM: resolvers: Stop scheduling resolution during stopping stage
When the process is stopping, the server resolutions are suspended. However
the task is still periodically woken up for nothing. If there is a huge
number of resolution, it may lead to a noticeable CPU consumption for no
reason.

To avoid this extra CPU cost, we stop to schedule the the resolution tasks
during the stopping stage. Of course, it is only true for server
resolutinos. Dynamic ones, via do-resolve actions, are not concerned. These
ones must still be triggered during stopping stage.

Concretly, during the stopping stage, the resolvers task is no longer
scheduled if there is no running resolutions. In this case, if a do-resolve
action is evaluated, the task is woken up.

This patch should partially solve the issue #2145.
2023-05-17 16:48:33 +02:00
Christopher Faulet
8bca3cc8c7 MEDIUM: checks: Stop scheduling healthchecks during stopping stage
When the process is stopping, the health-checks are suspended. However the
task is still periodically woken up for nothing. If there is a huge number
of health-checks and if they are woken up in same time, it may lead to a
noticeable CPU consumption for no reason.

To avoid this extra CPU cost, we stop to schedule the health-check tasks
when the proxy is disabled or stopped.

This patch should partially solve the issue #2145.
2023-05-17 14:57:10 +02:00
Christopher Faulet
c8a7bb16b7 CLEANUP: fcgi-app; Remove useless assignment to NULL
When the fcgi configuration is checked and fcgi rules are created, a useless
assignment to NULL is reported by Covertiy. Let's remove it.

This patch should fix the coverity report #2161.
2023-05-17 09:42:37 +02:00
Willy Tarreau
c7b9308f20 BUG/MINOR: clock: automatically adjust the internal clock with the boot time
This is a better and more general solution to the problem described in
this commit:

    BUG/MINOR: checks: postpone the startup of health checks by the boot time

Now we're updating the now_offset that is used to compute now_ms at the
few points where we update the ready date during boot. This ensures that
now_ms while being stable during all the boot process will be correct
and will start with the boot value right after the boot is finished. As
such the patch above is rolled back (we don't want to count the boot
time twice).

This must not be backported because it relies on the more flexible clock
architecture in 2.8.
2023-05-17 09:33:54 +02:00
Willy Tarreau
5345490b8e MINOR: clock: provide a function to automatically adjust now_offset
Right now there's no way to enforce a specific value of now_ms upon
startup in order to compensate for the time it takes to load a config,
specifically when dealing with the health check startup. For this we'd
need to force the now_offset value to compensate for the last known
value of the current date. This patch exposes a function to do exactly
this.
2023-05-17 09:33:54 +02:00
Willy Tarreau
8e978a094d BUG/MINOR: checks: postpone the startup of health checks by the boot time
When health checks are started at boot, now_ms could be off by the boot
time. In general it's not even noticeable, but with very large configs
taking up to one or even a few seconds to start, this can result in a
part of the servers' checks being scheduled slightly in the past. As
such all of them will start groupped, partially defeating the purpose of
the spread-checks setting. For example, this can cause a burst of
connections for the network, or an excess of CPU usage during SSL
handshakes, possibly even causing some timeouts to expire early.

Here in order to compensate for this, we simply add the known boot time
to the computed delay when scheduling the startup of checks. That's very
simple and particularly efficient. For example, a config with 5k servers
in 800 backends checked every 5 seconds, that was taking 3.8 seconds to
start used to show this distribution of health checks previously despite
the spread-checks 50:

   3690 08:59:25
    417 08:59:26
    213 08:59:27
     71 08:59:28
    428 08:59:29
    860 08:59:30
    918 08:59:31
    938 08:59:32
   1124 08:59:33
    904 08:59:34
    647 08:59:35
    890 08:59:36
    973 08:59:37
    856 08:59:38
    893 08:59:39
    154 08:59:40

Now with the fix it shows this:
    470 08:59:59
    929 09:00:00
    896 09:00:01
    937 09:00:02
    854 09:00:03
    827 09:00:04
    906 09:00:05
    863 09:00:06
    913 09:00:07
    873 09:00:08
    162 09:00:09

This should be backported to all supported versions. It depends on
this commit:

    MINOR: clock: measure the total boot time

For 2.8 where the internal clock is now totally independent on the human
one, an more generic fix will consist in simply updating now_ms to reflect
the startup time.
2023-05-17 09:33:54 +02:00
Willy Tarreau
5723b382ed MINOR: stats: report the boot time in "show info"
Just like we have the uptime in "show info", let's add the boot time.
It's trivial to collect as it's just the difference between the ready
date and the start date, and will allow users to monitor this element
in order to take action before it starts becoming problematic. Here
the boot time is reported in milliseconds, so this allows to even
observe sub-second anomalies in startup delays.
2023-05-17 09:33:54 +02:00
Willy Tarreau
da4aa6905c MINOR: clock: measure the total boot time
Some huge configs take a significant amount of time to start and this
can cause some trouble (e.g. health checks getting delayed and grouped,
process not responding to the CLI etc). For example, some configs might
start fast in certain environments and slowly in other ones just due to
the use of a wrong DNS server that delays all libc's resolutions. Let's
first start by measuring it by keeping a copy of the most recently known
ready date, once before calling check_config_validity() and then refine
it when leaving this function. A last call is finally performed just
before deciding to split between master and worker processes, and it covers
the whole boot. It's trivial to collect and even allows to get rid of a
call to clock_update_date() in function check_config_validity() that was
used in hope to better schedule future events.
2023-05-17 09:33:54 +02:00
Willy Tarreau
52fd879953 CLEANUP: stats: update the trash chunk where it's used
When integrating the number of warnings in "show info" in 2.8 with commit
3c4a297d2 ("MINOR: stats: report the total number of warnings issued"),
the update of the trash buffer used by the Tainted flag got displaced
lower. There's no harm for now util someone adds a new metric requiring
a call to chunk_newstr() and gets both values merged. Let's move the
call to its location now.
2023-05-17 09:33:54 +02:00
Christopher Faulet
cb76030356 CLEANUP: check; Remove some useless assignments to NULL
In process_chk_conn(), some assignments to NULL are useless and are reported
by Coverity as unused value. while it is harmless, these assignments can be
removed.

This patch should fix the coverity report #2158.
2023-05-17 09:28:23 +02:00
Aurelien DARRAGON
0d2f1acee6 BUG/MINOR: server: memory leak in _srv_update_status_op() on server DOWN
When server is transitionning from UP to DOWN, a log message is generated.
e.g.: "Server backend_name/server_name is DOWN")

However since f71e064 ("MEDIUM: server: split srv_update_status() in two
functions"), the allocated buffer tmptrash which is used to prepare the
log message is not freed after it has been used, resulting in a small
memory leak each time a server goes DOWN because of an operational
change.

This is a 2.8 specific bug, no backport needed unless the above commit
gets backported.
2023-05-17 09:21:01 +02:00
Aurelien DARRAGON
22d584a993 CLEANUP: server: remove useless tmptrash assigments in srv_update_status()
Within srv_update_status subfunctions _op() and _adm(), each time tmptrash
is freed, we assign it to NULL to ensure it will not be reused.

However, within those functions it is not very useful given that tmptrash
is never checked against NULL except upon allocation through
alloc_trash_chunk(), which happens everytime a new log message is
generated, sent, and then freed right away, so there are no code paths
that could lead to tmptrash being checked for reuse (tmptrash is
systematically overwritten since all log messages are independant from
each other).

This was raised by coverity, see GH #2162.
2023-05-17 09:21:01 +02:00
Christopher Faulet
2d5a5665fe BUG/MINOR: tcp-rules: Don't shortened the inspect-delay when EOI is set
A regression was introduced with the commit cb59e0bc3 ("BUG/MINOR:
tcp-rules: Stop content rules eval on read error and end-of-input").  We
should not shorten the inspect-delay when the EOI flag is set on the SC.

Idea of the inspect-delay is to wait a TCP rule is matching. It is only
interrupted if an error occurs, on abort or if the peer shuts down. It is
also interrupted if the buffer is full. This last case is a bit ambiguous
and discutable. It could be good to add ACLS, like "wait_complete" and
"wait_full" to do so. But for now, we only remove the test on SC_FL_EOI
flag.

This patch must be backported to all stable versions.
2023-05-17 09:21:01 +02:00
Willy Tarreau
b93758cec9 MINOR: checks: make sure spread-checks is used also at boot time
This makes use of spread-checks also for the startup of the check tasks.
This provides a smoother load on startup for uneven configurations which
tend to enable only *some* servers. Below is the connection distribution
per second of the SSL checks of a config with 5k servers spread over 800
backends, with a check inter of 5 seconds:

- default:
    682 08:00:50
    826 08:00:51
    773 08:00:52
   1016 08:00:53
    885 08:00:54
    889 08:00:55
    825 08:00:56
    773 08:00:57
   1016 08:00:58
    884 08:00:59
    888 08:01:00
    491 08:01:01

- with spread-checks 50:
    437 08:01:19
    866 08:01:20
    777 08:01:21
   1023 08:01:22
   1118 08:01:23
    923 08:01:24
    641 08:01:25
    859 08:01:26
    962 08:01:27
    860 08:01:28
    929 08:01:29
    909 08:01:30
    866 08:01:31
    849 08:01:32
    114 08:01:33

- with spread-checks 50 + this patch:
    680 08:01:55
    922 08:01:56
    962 08:01:57
    899 08:01:58
    819 08:01:59
    843 08:02:00
    916 08:02:01
    896 08:02:02
    886 08:02:03
    846 08:02:04
    903 08:02:05
    894 08:02:06
    178 08:02:07

The load is much smoother from the start, this can help initial health
checks succeed when many target the same overloaded server for example.
This could be backported as it should make border-line configs more
reliable across reloads.
2023-05-17 08:10:40 +02:00
Amaury Denoyelle
bf86d89ea6 BUG/MEDIUM: mux-quic: fix EOI for request without payload
When a full message is received for a stream, MUX is responsible to set
EOI flag. This was done through rcv_buf stream callback by checking if
QCS HTX buffer contained the EOM flag.

This is not correct for HTTP without body. In this case, QCS HTX buffer
is never used. Only a local HTX buffer is used to transfer headers just
as stream endpoint is created. As such, EOI is never transmitted to the
upper layer.

If the transfer occur without any issue, this does not seem to cause any
problem. However, in case the transfer is aborted, the stream is never
released which cause a memory leak and prevent the process soft-stop.

To fix this, also check if EOM is put by application layer during
headers conversion. If true, this is transferred through a new argument
to qc_attach_sc() MUX function which is responsible to set the EOI flag.

This issue was reproduced using h2load with hundred of connections.
h2load is interrupted with a SIGINT which causes streams to never be
closed on haproxy side.

This should be backported up to 2.6.
2023-05-16 17:53:45 +02:00
Amaury Denoyelle
1a2faef92f MINOR: mux-quic: uninline qc_attach_sc()
Uninline and move qc_attach_sc() function to implementation source file.
This will be useful for next commit to add traces in it.

This should be backported up to 2.7.
2023-05-16 17:53:45 +02:00
Amaury Denoyelle
3cb78140cf MINOR: mux-quic: properly report end-of-stream on recv
MUX is responsible to put EOS on stream when read channel is closed.
This happens if underlying connection is closed or a RESET_STREAM is
received. FIN STREAM is ignored in this case.

For connection closure, simply check for CO_FL_SOCK_RD_SH.

For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been
introduced. It is set when RESET_STREAM is received, unless we already
received all data. This is conform to QUIC RFC which allows to ignore a
RESET_STREAM in this case. During RESET_STREAM processing, input buffer
is emptied so EOS can be reported right away on recv_buf operation.

This should be backported up to 2.7.
2023-05-16 17:53:45 +02:00
Amaury Denoyelle
1649469be1 MINOR: mux-quic: add trace to stream rcv_buf operation
Add traces to render each stream transition more explicit. Also, move
ERR_PENDING to ERROR transition after other stream flags are set, as
with the MUX H2 implementation. This is purely a cosmetic change and it
should have no functional impact.

This should be backported up to 2.7.
2023-05-16 17:53:45 +02:00
Amaury Denoyelle
6133aba889 BUG/MINOR: h3: missing goto on buf alloc failure
The following patch introduced proper error management on buffer
allocation failure :
  0abde9dee6
  BUG/MINOR: mux-quic: properly handle buf alloc failure

However, when decoding an empty STREAM frame with just FIN bit set, this
was not done correctly. Indeed, there is a missing goto statement in
case of a NULL buffer check.

This was reported thanks to coverity analysis. This should fix github
issue #2163.

This must be backported up to 2.6.
2023-05-15 14:57:56 +02:00
Amaury Denoyelle
1611a7659b BUG/MINOR: mux-quic: handle properly Tx buf exhaustion
Since the following patch
  commit 6c501ed23b
  BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc
it is not possible to check if Tx buf allocation failed due to a
configured limit exhaustion or a simple memory failure.

This patch fixes it as the condition was inverted. Indeed, if buf_avail
is null, this means that the limit has been reached. On the contrary
case, this is a real memory alloc failure. This caused the flag
QC_CF_CONN_FULL to not be properly used and may have caused disruption
on transfer with several streams or large data.

This was detected due to an abnormal error QUIC MUX traces. Also change
in consequence trace for limit exhaustion to be more explicit.

This must be backported up to 2.6.
2023-05-15 14:06:21 +02:00
William Lallemand
6e0c39d7ac BUILD: ssl: ssl_c_r_dn fetches uses functiosn only available since 1.1.1
Fix the openssl build with older openssl version by disabling the new
ssl_c_r_dn fetch.

This also disable the ssl_client_samples.vtc file for OpenSSL version
older than 1.1.1
2023-05-15 12:07:52 +02:00
Willy Tarreau
d38d8c6ccb BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
Christopher found as part of the analysis of Tim's issue #1891 that commit
15a4733d5 ("BUG/MEDIUM: mux-h2: make use of http-request and keep-alive
timeouts") introduced in 2.6 incompletely addressed a timeout issue in the
H2 mux. The problem was that the http-keepalive and http-request timeouts
were not applied before it. With that commit they are now considered, but
if a GOAWAY is sent (or even attempted to be sent), then they are not used
anymore again, because the way the code is arranged consists in applying
the client-fin timeout (if set) to the current date, and falling back to
the client timeout, without considering the idle_start period. This means
that a config having a "timeout http-keepalive" would still not close the
connection quickly when facing a client that periodically sends PING,
PRIORITY or whatever other frame types.

In addition, after the GOAWAY was attempted to be sent, there was no check
for pending data in the output buffer, meaning that it would be possible
to truncate some responses in configs involving a very short client-fin
timeout.

Finally the spreading of the closures during the soft-stop brought in 2.6
by commit b5d968d9b ("MEDIUM: global: Add a "close-spread-time" option to
spread soft-stop on time window") didn't consider the particular case of
an idle "pre-connect" connection, which would also live long if a browser
failed to deliver a valid request for a long time.

All of this indicates that the conditions must be reworked so as not to
have that level of exclusion between conditions, but rather stick to the
rules from the doc that are already enforced on other muxes:
  - timeout client always applies if there are data pending, and is
    relative to each new I/O ;
  - timeout http-request applies before the first complete request and
    is relative to the entry in idle state ;
  - timeout http-keepalive applies between idle and the next complete
    request and is relative to the entry in idle state ;
  - timeout client-fin applies when in idle after a shut was sent (here
    the shut is the GOAWAY). The shut may only be considered as sent if
    the buffer is empty and the flags indicate that it was successfully
    sent (or failed) but not if it's still waiting for some room in the
    output buffer for example. This implies that this timeout may then
    lower the http-keepalive/http-request ones.

This is what this patch implements. Of course the client timeout still
applies as a fallback when all the ones above are not set or when their
conditions are not met.

It would seem reasoanble to backport this to 2.7 first, then only after
one or two releases to 2.6.
2023-05-15 12:01:20 +02:00
Abhijeet Rastogi
df97f472fa MINOR: ssl: add new sample ssl_c_r_dn
This patch addresses #1514, adds the ability to fetch DN of the root
ca that was in the chain when client certificate was verified during SSL
handshake.
2023-05-15 10:48:05 +02:00
William Lallemand
7f95469163 MEDIUM: proxy: stop emitting logs for internal proxies when stopping
The HTTPCLIENT and the OCSP-UPDATE proxies are internal proxies, we
don't need to display logs of them stopping during the stopping of the
process.

This patch checks if a proxy has the flag PR_CAP_INT so it doesn't
display annoying messages.
2023-05-15 10:38:09 +02:00
Christopher Faulet
6eb53b138d MINOR: stconn: Remove useless test on sedesc on detach to release the xref
When the SC is detached from the endpoint, the xref between the endpoints is
removed. At this stage, the sedesc cannot be undefined. So we can remove the
test on it.

This issue should fix the issue #2156. No backport needed.
2023-05-15 09:53:30 +02:00
William Lallemand
1601eebcd1 MEDIUM: mworker/cli: does not disconnect the master CLI upon error
In the proxy CLI analyzer, when pcli_parse_request returns -1, the
client was shut to prevent any problem with the master CLI.

This behavior is a little bit excessive and not handy at all in prompt
mode. For example one could have activated multiples mode, then have an
error which disconnect the CLI, and they would have to reconnect and
enter all the modes again.

This patch introduces the pcli_error() function, which only output an
error and flush the input buffer, instead of closing everything.

When encountering a parsing error, this function is used, and the prompt
is written again, without any disconnection.
2023-05-14 18:42:31 +02:00
William Lallemand
4adb4b9903 MEDIUM: session/ssl: return the SSL error string during a SSL handshake error
SSL hanshake error were unable to dump the OpenSSL error string by
default, to do so it was mandatory to configure a error-log-format with
the ssl_fc_err fetch.

This patch implements the session_build_err_string() function which creates
the error log to send during session_kill_embryonic(), a special case is
made with CO_ER_SSL_HANDSHAKE which is able to dump the error string
with ERR_error_string().

Before:
    <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure

After:
    <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure (error:0A000418:SSL routines::tlsv1 alert unknown ca)
2023-05-12 17:43:58 +02:00
Amaury Denoyelle
ee65efbfae BUG/MINOR: mux-quic: free task on qc_init() app ops failure
qc_init() is used to initialize a QUIC MUX instance. On failure, each
resources are released via a series of goto statements. There is one
issue if the app_ops.init callback fails. In this case, MUX task is not
freed.

This can cause a crash as the task is already scheduled. When the
handler will run, it will crash when trying to access qcc instance.

To fix this, properly destroy qcc task on fail_install_app_ops label.

The impact of this bug is minor as app_ops.init callback succeeds most
of the time. However, it may fail on allocation failure due to memory
exhaustion.

This may fix github issue #2154.

This must be backported up to 2.7.
2023-05-12 16:37:27 +02:00
Amaury Denoyelle
6c501ed23b BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc
qc_stream_buf_alloc() can fail for two reasons :
* limit of Tx buffer per connection reached
* allocation failure

The first case is properly treated. A flag QC_CF_CONN_FULL is set on the
connection to interrupt emission. It is cleared when a buffer became
available after in order ACK reception and the MUX tasklet is woken up.

The allocation failure was handled with the same mechanism which in this
case is not appropriate and could lead to a connection transfer freeze.
Instead, prefer to close the connection with a QUIC internal error code.

To differentiate the two causes, qc_stream_buf_alloc() API was changed
to return the number of available buffers to the caller.

This must be backported up to 2.6.
2023-05-12 16:26:20 +02:00
Amaury Denoyelle
50fe00650f BUG/MINOR: quic: do not alloc buf count on alloc failure
The total number of buffer per connection for sending is limited by a
configuration value. To ensure this, <stream_buf_count> quic_conn field
is incremented on qc_stream_buf_alloc().

qc_stream_buf_alloc() may fail if the buffer cannot be allocated. In
this case, <stream_buf_count> should not be incremented. To fix this,
simply move increment operation after buffer allocation.

The impact of this bug is low. However, if a connection suffers from
several buffer allocation failure, it may cause the <stream_buf_count>
to be incremented over the limit without being able to go back down.

This must be backported up to 2.6.
2023-05-12 15:55:41 +02:00
Amaury Denoyelle
d00b3093c9 BUG/MINOR: mux-quic: handle properly recv ncbuf alloc failure
The function qc_get_ncbuf() is used to allocate a ncbuf content.
Allocation failure was handled using a plain BUG_ON.

Fix this by a proper error management. This buffer is only used for
STREAM frame reception to support out-of-order offsets. When an
allocation failed, close the connection with a QUIC internal error code.

This should be backported up to 2.6.
2023-05-12 15:52:19 +02:00
Amaury Denoyelle
0abde9dee6 BUG/MINOR: mux-quic: properly handle buf alloc failure
A convenience function qc_get_buf() is implemented to centralize buffer
allocation on MUX and H3 layers. However, allocation failure was not
handled properly with a BUG_ON() statement.

Replace this by proper error management. On emission, streams is
temporarily skip over until the next qc_send() invocation. On reception,
H3 uses this function for HTX conversion; on alloc failure the
connection will be closed with QUIC internal error code.

This must be backported up to 2.6.
2023-05-12 15:51:15 +02:00
Amaury Denoyelle
93dd23cab4 MINOR: mux-quic: remove dedicated function to handle standalone FIN
Remove QUIC MUX function qcs_http_handle_standalone_fin(). The purpose
of this function was only used when receiving an empty STREAM frame with
FIN bit. Besides, it was called by each application protocol which could
have different approach and render the function purpose unclear.

Invocation of qcs_http_handle_standalone_fin() have been replaced by
explicit code in both H3 and HTTP/0.9 module. In the process, use
htx_set_eom() to reliably put EOM on the HTX message.

This should be backported up to 2.7, along with the previous patch which
introduced htx_set_eom().
2023-05-12 15:50:30 +02:00
Amaury Denoyelle
25cf19d5c8 MINOR: htx: add function to set EOM reliably
Implement a new HTX utility function htx_set_eom(). If the HTX message
is empty, it will first add a dummy EOT block. This is a small trick
needed to ensure readers will detect the HTX buffer as not empty and
retrieve the EOM flag.

Replace the H2 code related by a htx_set_eom() invocation. QUIC also has
the same code which will be replaced in the next commit.

This should be backported up to 2.7 before the related QUIC patch.
2023-05-12 15:29:28 +02:00
Frédéric Lécaille
76d502588d BUG/MINOR: quic: Wrong redispatch for external data on connection socket
It is possible to receive datagram from other connection on a dedicated
quic-conn socket. This is due to a race condition between bind() and
connect() system calls.

To handle this, an explicit check is done on each datagram. If the DCID
is not associated to the connection which owns the socket, the datagram
is redispatch as if it arrived on the listener socket.

This redispatch step was not properly done because the source address
specified for the redispatch function was incorrect. Instead of using
the datagram source address, we used the address of the socket
quic-conn which received the datagram due to the above race condition.
Fix this simply by using the address from the recvmsg() system call.

The impact of this bug is minor as redispatch on connection socket
should be really rare. However, when it happens it can lead to several
kinds of problems, like for example a connection initialized with an
incorrect peer address. It can also break the Retry token check as this
relies on the peer address.

In fact, Retry token check failure was the reason this bug was found.
When using h2load with thousands of clients, the counter of Retry token
failure was unusually high. With this patch, no failure is reported
anymore for Retry.

Must be backported to 2.7.
2023-05-12 14:48:30 +02:00
Aurelien DARRAGON
256d581fbd BUG/MINOR: log: fix memory error handling in parse_logsrv()
A check was missing in parse_logsrv() to make sure that malloc-dependent
variable is checked for non-NULL before using it.

If malloc fails, the function raises an error and stops, like it's already
done at a few other places within the function.

This partially fixes GH #2130.

It should be backported to every stable versions.
2023-05-12 09:45:30 +02:00
Aurelien DARRAGON
d4dba38ab1 BUG/MINOR: errors: handle malloc failure in usermsgs_put()
usermsgs_buf.size is set without first checking if previous malloc
attempt succeeded.

This could fool the buffer API into assuming that the buffer is
initialized, resulting in unsafe read/writes.

Guarding usermsgs_buf.size assignment with the malloc attempt result
to make the buffer initialization safe against malloc failures.

This partially fixes GH #2130.

It should be backported up to 2.6.
2023-05-12 09:45:30 +02:00
Aurelien DARRAGON
ceb13b5ed3 MINOR: ncbuf: missing malloc checks in standalone code
Some malloc resulsts were not checked in standalone ncbuf code.

As this is debug/test code, we don't need to explicitly handle memory
errors, we just add some BUG_ON() to ensure that memory is properly
allocated and prevent unexpected results.

This partially fixes issue GH #2130.

No backport needed.
2023-05-12 09:45:30 +02:00
Willy Tarreau
94df1b57ee BUILD: debug: fix build issue on 32-bit platforms in "debug dev task"
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") caused a build failure on 32-bit platforms when parsing
the task's pointer. Let's use strtoul() and not strtoll(). No backport is
needed, unless the commit above gets backported.
2023-05-12 04:40:06 +02:00
William Lallemand
e279f595ad MINOR: httpclient: allow to disable the DNS resolvers of the httpclient
httpclient.resolvers.disabled allow to disable completely the resolvers
of the httpclient, prevents the creation of the "default" resolvers
section, and does not insert the http do-resolve rule in the proxies.
2023-05-11 21:25:37 +02:00
Willy Tarreau
fe0ba0e9f9 MINOR: cli: make "show fd" identify QUIC connections and listeners
Now we can detect the listener associated with a QUIC listener and report
a bit more info (e.g. listening port and frontend name), and provide a bit
more info about connections as well, and filter on both front connections
and listeners using the "l" and "f" flags.
2023-05-11 17:20:39 +02:00
Willy Tarreau
ea07715ccf MINOR: master/cli: also implement the timed prompt on the master CLI
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:

  master> prompt timed
  [0:00:00:50] master> show proc
  #<PID>          <type>          <reloads>       <uptime>        <version>
  11940           master          1 [failed: 0]   0d00h02m10s     2.8-dev11-474c14-21
  # workers
  11955           worker          0               0d00h00m59s     2.8-dev11-474c14-21
  # old workers
  11942           worker          1               0d00h02m10s     2.8-dev11-474c14-21
  # programs

  [0:00:00:58] master> @!11955
  [0:00:01:03] 11955> @!11942
  [0:00:02:17] 11942> @
  [0:00:01:10] master>
2023-05-11 16:38:52 +02:00
Willy Tarreau
225555711f MINOR: cli: add an option to display the uptime in the CLI's prompt
Entering "prompt timed" toggles reporting of the process' uptime in
the prompt, which will report days, hours, minutes and seconds since
it was started. As discussed with Tim in issue #2145, this can be
convenient to roughly estimate the time between two outputs, as well
as detecting that a process failed to be reloaded for example.
2023-05-11 16:38:52 +02:00
Willy Tarreau
21d7125c92 BUG/MINOR: cli: don't complain about empty command on empty lines
There's something very irritating on the CLI, when just pressing ENTER,
it complains "Unknown command: ''..." and dumps all the help. This
action is often done to add a bit of clearance after a dump to visually
find delimitors later, but this stupid error makes it unusable. This
patch addresses this by just returning on empty command instead of trying
to look up a matching keyword. It will result in an empty line to mark
the end of the empty command and a prompt again.

It's probably not worth backporting this given that nobody seems to have
complained about it yet.
2023-05-11 16:38:52 +02:00
Aurelien DARRAGON
31b23aef38 CLEANUP: acl: discard prune_acl_cond() function
Thanks to previous commit, we have no more use for prune_acl_cond(),
let's remove it to prevent code duplication.
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
c610095258 MINOR: tree-wide: use free_acl_cond() where relevant
Now that we have free_acl_cond(cond) function that does cond prune then
frees cond, replace all occurences of this pattern:

   | prune_acl_cond(cond)
   | free(cond)

with:

   | free_acl_cond(cond)
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
cd9aff1321 CLEANUP: http_act: use http_free_redirect_rule() to clean redirect act
Since redirect rules now have a dedicated cleanup function, better use
it to prevent code duplication.
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
5313570605 BUG/MINOR: http_rules: fix errors paths in http_parse_redirect_rule()
http_parse_redirect_rule() doesn't perform enough checks around NULL
returning allocating functions.

Moreover, existing error paths don't perform cleanups. This could lead to
memory leaks.

Adding a few checks and a cleanup path to ensure memory errors are
properly handled and that no memory leaks occurs within the function
(already allocated structures are freed on error path).

It should partially fix GH #2130.

This patch depends on ("MINOR: proxy: add http_free_redirect_rule() function")

This could be backported up to 2.4. The patch is also relevant for
2.2 but "MINOR: proxy: add http_free_redirect_rule() function" would
need to be adapted first.

==

Backport notes:

-> For 2.2 only:

Replace:

    (strcmp(args[cur_arg], "drop-query") == 0)

with:

    (!strcmp(args[cur_arg],"drop-query"))

-> For 2.2 and 2.4:

Replace:

    "expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query', 'ignore-empty' or 'append-slash' (was '%s')",

with:

    "expects 'code', 'prefix', 'location', 'scheme', 'set-cookie', 'clear-cookie', 'drop-query' or 'append-slash' (was '%s')",
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
7abc9224a6 MINOR: proxy: add http_free_redirect_rule() function
Adding http_free_redirect_rule() function to free a single redirect rule
since it may be required to free rules outside of free_proxy() function.

This patch is required for an upcoming bugfix.

[for 2.2, free_proxy function did not exist (first seen in 2.4), thus
http_free_redirect_rule() needs to be deducted from haproxy.c deinit()
function if the patch is required]
2023-05-11 15:37:04 +02:00
Aurelien DARRAGON
8dfc2491d2 BUG/MINOR: proxy: missing free in free_proxy for redirect rules
cookie_str from struct redirect, which may be allocated through
http_parse_redirect_rule() function is not properly freed on proxy
cleanup within free_proxy().

This could be backported to all stable versions.

[for 2.2, free_proxy() did not exist so the fix needs to be performed
directly in deinit() function from haproxy.c]
2023-05-11 15:37:04 +02:00
Christopher Faulet
7542fb43d6 MINOR: stconn: Add a cross-reference between SE descriptor
A xref is added between the endpoint descriptors. It is created when the
server endpoint is attached to the SC and it is destroyed when an endpoint
is detached.

This xref is not used for now. But it will be useful to retrieve info about
an endpoint for the opposite side. It is also the warranty there is still a
endpoint attached on the other side.
2023-05-11 15:37:04 +02:00
Christopher Faulet
efebff35bb BUG/MEDIUM: mux-fcgi: Don't request more room if mux is waiting for more data
A mux must never report it is waiting for room in the channel buffer if this
buffer is empty. Because there is nothing the application layer can do to
unblock the situation. Indeed, when this happens, it means the mux is
waiting for data to progress. It typically happens when all headers are not
received.

In the FCGI mux, if some data remain in the RX buffer but the channel buffer
is empty, it does no longer report it is waiting for room.

This patch should fix the issue #2150. It must be backported as far as 2.6.
2023-05-11 15:37:04 +02:00
Christopher Faulet
a272c39330 BUG/MEDIUM: mux-fcgi: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR
When end-of-stream is reported by a FCGI stream, we must take care to also
report an error if end-of-input was not reported. Indeed, it is now
mandatory to set SE_FL_EOI or SE_FL_ERROR flags when SE_FL_EOS is set.

It is a 2.8-specific issue. No backport needed.
2023-05-11 15:37:04 +02:00
Willy Tarreau
4cfb0019e6 MINOR: stats: report the listener's protocol along with the address in stats
When "optioon socket-stats" is used in a frontend, its listeners have
their own stats and will appear in the stats page. And when the stats
page has "stats show-legends", then a tooltip appears on each such
socket with ip:port and ID. The problem is that since QUIC arrived, it
was not possible to distinguish the TCP listeners from the QUIC ones
because no protocol indication was mentioned. Now we add a "proto"
legend there with the protocol name, so we can see "tcp4" or "quic6"
and figure how the socket is bound.
2023-05-11 14:52:56 +02:00
Amaury Denoyelle
5f67b17a59 MEDIUM: mux-quic: adjust transport layer error handling
Following previous patch, error notification from quic_conn has been
adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR
on the connection instance when a fatal error is detected.

Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag
QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to
the local error flag and will abort most of the futur processing. To
ensure stream upper layer is also notified, qc_wake_some_streams()
called by qc_process() will put the stream on error if this new flag is
set.

This should be backported up to 2.7.
2023-05-11 14:12:48 +02:00
Amaury Denoyelle
b2e31d33f5 MEDIUM: quic: streamline error notification
When an error is detected at quic-conn layer, the upper MUX must be
notified. Previously, this was done relying on quic_conn flag
QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on
connection closure.

Adjust this mechanism to use an approach more similar to other transport
layers in haproxy. On error, connection flags are updated with
CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then
notified when the error happened instead of just before the closing. To
reflect this change, qc_notify_close() has been renamed qc_notify_err().
This function must now be explicitely called every time a new error
condition arises on the quic_conn layer.

To ensure MUX send is disabled on error, qc_send_mux() now checks
CO_FL_SOCK_WR_SH. If set, the function returns an error. This should
prevent the MUX from sending data on closing or draining state.

To complete this patch, MUX layer must now check for CO_FL_ERROR
explicitely. This will be the subject of the following commit.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
2ad41b8629 MINOR: mux-quic: simplify return path of qc_send()
Remove the unnecessary err label for qc_send(). Anyway, this label
cannot be used once some frames are sent because there is no cleanup
part for it.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
b35e32e43b MINOR: mux-quic: factorize send subscribing
Factorize code for send subscribing on the lower layer in a dedicated
function qcc_subscribe_send(). This allows to call the lower layer only
if not already subscribed and print a trace in this case. This should
help to understand when subscribing is really performed.

In the future, this function may be extended to avoid subscribing under
new conditions, such as connection already on error.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
04b2208aa0 MINOR: mux-quic: do not send STREAM frames if already subscribe
Do not built STREAM frames if MUX is already subscribed for sending on
lower layer. Indeed, this means that either socket currently encountered
a transient error or congestion window is full.

This change is an optimization which prevents to allocate and release a
series of STREAM frames for nothing under congestion.

Note that nothing is done for other frames (flow-control, RESET_STREAM
and STOP_SENDING). Indeed, these frames are not restricted by flow
control. However, this means that they will be allocated for nothing if
send is blocked on a transient error.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
2d5c3f5cd1 MINOR: mux-quic: add traces for stream wake
Add traces for when an upper layer stream is woken up by the MUX. This
should help to diagnose frozen stream issues.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
69670e88bd BUG/MINOR: mux-quic: no need to subscribe for detach streams
When detach is conducted by stream endpoint layer, a stream is either
freed or just flagged as detached if the transfer is not yet finished.
In the latter case, the stream will be finally freed via
qc_purge_streams() which is called periodically.

A subscribe was done on quic-conn layer if a stream cannot be freed via
qc_purge_streams() as this means FIN STREAM has not yet been sent.
However, this is unnecessary as either HTX EOM was not yet received and
we are waiting for the upper layer, or FIN stream is still in the buffer
but was not yet transmitted due to an incomplete transfer, in which case
a subscribe should have already been done.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
131f2d93e1 BUG/MINOR: mux-quic: do not free frame already released by quic-conn
MUX uses qc_send_mux() function to send frames list over a QUIC
connection. On network congestion, the lower layer will reject some
frames and it is the MUX responsibility to free them. There is another
category of error which are when the sendto() fails. In this case, the
lower layer will free the packet and its attached frames and the MUX
should not touch them.

This model was violated by MUX layer for RESET_STREAM and STOP_SENDING
emission. In this case, frames were freed every time by the MUX on
error. This causes a double free error which lead to a crash.

Fix this by always ensuring if frames were rejected by the lower layer
before freeing them on the MUX. This is done simply by checking if frame
list is not empty, as RESET_STREAM and STOP_SENDING are sent
individually.

This bug was never reproduced in production. Thus, it is labelled as
MINOR.

This must be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Amaury Denoyelle
3fd40935d9 BUG/MINOR: mux-quic: do not prevent shutw on error
Since recent modification of MUX error processing, shutw operation was
skipped for a connection reported as on error. However, this can caused
the stream layer to not be notified about error. The impact of this bug
is unknown but it may lead to stream never closed.

To fix this, simply skip over send operations when connection is on
error while keep notifying the stream layer.

This should be backported up to 2.7.
2023-05-11 14:04:51 +02:00
Willy Tarreau
9615102b01 MINOR: stats: report the number of times the global maxconn was reached
As discussed a few times over the years, it's quite difficult to know
how often we stop accepting connections because the global maxconn was
reached. This is not easy to know because when we reach the limit we
stop accepting but we don't know if incoming connections are pending,
so it's not possible to know how many were delayed just because of this.
However, an interesting equivalent metric consist in counting the number
of times an accepted incoming connection resulted in the limit being
reached. I.e. "we've accepted the last one for now". That doesn't imply
any other one got delayed but it's a factual indicator that something
might have been delayed. And by counting the number of such events, it
becomes easier to know whether some limits need to be adjusted because
they're reached often, or if it's exceptionally rare.

The metric is reported as a counter in show info and on the stats page
in the info section right next to "maxconn".
2023-05-11 13:51:31 +02:00
Willy Tarreau
3c4a297d2b MINOR: stats: report the total number of warnings issued
Now in "show info" we have a TotalWarnings field that reports the total
number of warnings issued since the process started. It's also reported
in the the stats page next to the uptime.
2023-05-11 12:02:21 +02:00
Frédéric Lécaille
0dd4fa58e6 BUG/MINOR: quic: Buggy acknowlegments of acknowlegments function
qc_treat_ack_of_ack() must remove ranges of acknowlegments from an ebtree which
have been acknowledged. This is done keeping track of the largest acknowledged
packet number which has been acknowledged and sent with an ack-eliciting packet.
But due to the data structure of the acknowledgement ranges used to build an ACK frame,
one must leave at least one range in such an ebtree which must at least contain
a unique one-element range with the largest acknowledged packet number as element.

This issue was revealed by @Tristan971 in GH #2140.

Must be backported in 2.7 and 2.6.
2023-05-11 10:33:23 +02:00
Aurelien DARRAGON
d7d507aa8a CLEANUP: hlua_fcn/queue: make queue:push() easier to read
Adding some spaces and code comments in queue:push() function to make
it easier to read.
2023-05-11 09:23:14 +02:00
Aurelien DARRAGON
c0af7cdba2 BUG/MINOR: hlua_fcn/queue: fix reference leak
When pushing a lua object through lua Queue class, a new reference is
created from the object so that it can be safely restored when needed.

Likewise, when popping an object from lua Queue class, the object is
restored at the top of the stack via its reference id.

However, once the object is restored the related queue entry is removed,
thus the object reference must be dropped to prevent reference leak.
2023-05-11 09:23:14 +02:00
Aurelien DARRAGON
bd8a94a759 BUG/MINOR: hlua_fcn/queue: fix broken pop_wait()
queue:pop_wait() was broken during late refactor prior to merge.
(Due to small modifications to ensure that pop() returns nil on empty
queue instead of nothing)

Because of this, pop_wait() currently behaves exactly as pop(), resulting
in 100% active CPU when used in a while loop.

Indeed, _hlua_queue_pop() should explicitly return 0 when the queue is
empty since pop_wait logic relies on this and the pushnil should be
handled directly in queue:pop() function instead.

Adding some comments as well to document this.
2023-05-11 09:23:14 +02:00
Christopher Faulet
0fda8d2c8e BUG/MEDIUM: filters: Don't deinit filters for disabled proxies during startup
During the startup stage, if a proxy was disabled in config, all filters
were released and removed. But it may be an issue if some info are shared
between filters of the same type. Resources may be released too early.

It happens with ACLs defined in SPOE configurations. Pattern expressions can
be shared between filters. To fix the issue, filters for disabled proxies
are no longer released during the startup stage but only when HAProxy is
stopped.

This commit depends on the previous one ("MINOR: spoe: Don't stop disabled
proxies"). Both must be backported to all stable versions.
2023-05-11 09:22:46 +02:00
Christopher Faulet
7f4ffad46e MINOR: spoe: Don't stop disabled proxies
SPOE register a signal handler to be able to stop SPOE applets ASAP during
soft-stop. Disabled proxies must be ignored at this staged because they are
not fully configured.

For now, it is useless but this change is mandatory to fix a bug.
2023-05-11 09:22:46 +02:00
Christopher Faulet
16e314150a BUILD: mjson: Fix warning about unused variables
clang 15 reports unused variables in src/mjson.c:

  src/mjson.c:196:21: fatal error: expected ';' at end of declaration
    int __maybe_unused n = 0;

and

  src/mjson.c:727:17: fatal error: variable 'n' set but not used [-Wunused-but-set-variable]
    int sign = 1, n = 0;

An issue was created on the project, but it was not fixed for now:

   https://github.com/cesanta/mjson/issues/51

So for now, to fix the build issue, these variables are declared as unused.
Of course, if there is any update on this library, be careful to review this
patch first to be sure it is always required.

This patch should fix the issue #1868. It be backported as far as 2.4.
2023-05-11 09:22:46 +02:00
Ilya Shipitsin
83f54b9aef CLEANUP: src/listener.c: remove redundant NULL check
fixes #2031

quoting Willy Tarreau:

"Originally the listeners were intended to work without a bind_conf
(e.g. for FTP processing) hence these tests, but over time the
bind_conf has become omnipresent"
2023-05-11 05:30:03 +02:00
Christopher Faulet
bd90a16564 MEDIUM: stream: Resync analyzers at the end of process_stream() on change
At the end of process_stream(), if there was any change on request/response
analyzers, we now trigger a resync. It is performed if any analyzer is added
but also removed. It should help to catch internal changes on a stream and
eventually avoid it to be frozen.

There is no reason to backport this patch. But it may be good to keep an eye
on it, just in case.
2023-05-10 16:45:36 +02:00
Christopher Faulet
b1368adcc7 BUG/MEDIUM: stream: Forward shutdowns when unhandled errors are caught
In process_stream(), after request and response analyzers evaluation,
unhandled errors are processed, if any. In this case, depending on the case,
remaining request or response analyzers may be removed, unlesse the last one
about end of filters. However, auto-close is not reenabled in same
time. Thus it is possible to not forward the shutdown for a side to the
other one while no analyzer is there to do so or at least to make evolved
the situation.

In theory, it is thus possible to freeze a stream if no wakeup happens. And
it seems possible because it explain a freeze we've oberseved.

This patch could be backported to every stable versions but only after a
period of observation and if it may match an unexplained bug. It should not
lead to any loop but at worst and eventually to truncated messages.
2023-05-10 16:45:36 +02:00
Willy Tarreau
862588a4b5 BUG/MINOR: config: make compression work again in defaults section
When commit ead43fe4f ("MEDIUM: compression: Make it so we can compress
requests as well.") added the test for the direction flags to select the
compression, it implicitly broke compression defined in defaults sections
because the flags from the default proxy were not recopied, hence the
compression was enabled but in no direction.

No backport is needed, that's 2.8 only.
2023-05-10 16:41:21 +02:00
Frédéric Lécaille
b971696296 BUG/MINOR: quic: Possible crash when dumping version information
->others member of tp_version_information structure pointed to a buffer in the
TLS stack used to parse the transport parameters. There is no garantee that this
buffer is available until the connection is released.

Do not dump the available versions selected by the client anymore, but displayed the
chosen one (selected by the client for this connection) and the negotiated one.

Must be backported to 2.7 and 2.6.
2023-05-10 13:26:37 +02:00
Amaury Denoyelle
da24bcfad3 BUG/MEDIUM: mux-quic: wakeup tasklet to close on error
A recent series of commit have been introduced to rework error
generation on QUIC MUX side. Now, all MUX/APP functions uses
qcc_set_error() to set the flag QC_CF_ERRL on error. Then, this flag is
converted to QC_CF_ERRL_DONE with a CONNECTION_CLOSE emission by
qc_send().

This has the advantage of centralizing the CONNECTION_CLOSE generation
in one place and reduces the link between MUX and quic-conn layer.
However, we must now ensure that every qcc_set_error() call is followed
by a QUIC MUX tasklet to invoke qc_send(). This was not the case, thus
when there is no active transfer, no CONNECTION_CLOSE frame is emitted
and the connection remains opened.

To fix this, add a tasklet_wakeup() directly in qcc_set_error(). This is
a brute force solution as this may be unneeded when already in the MUX
tasklet context. However, it is the simplest solution as it is too
tedious for the moment to list all qcc_set_error() invocation outside of
the tasklet.

This must be backported up to 2.7.
2023-05-09 18:42:34 +02:00
Amaury Denoyelle
58721f2192 BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE
A recent series of patch were introduced to streamline error generation
by QUIC MUX. However, a regression was introduced : every error
generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it
should be only for H3/QPACK errors.

Fix this by adding an argument <app> in qcc_set_error. When false, a
standard CONNECTION_CLOSE is used as error.

This bug was detected by QUIC tracker with the following tests
"stop_sending" and "server_flow_control" which requires a
CONNECTION_CLOSE frame.

This must be backported up to 2.7.
2023-05-09 18:42:34 +02:00
Christopher Faulet
a236c58223 BUG/MEDIUM: stats: Require more room if buffer is almost full
This was lost with commit f4258bdf3 ("MINOR: stats: Use the applet API to
write data"). When the buffer is almost full, the stats applet gives up.
When this happens, the applet must require more room. Otherwise, data in the
channel buffer are sent to the client but the applet is not woken up in
return.

It is a 2.8-specific bug, no backport needed.
2023-05-09 16:36:45 +02:00
William Lallemand
930afdf614 BUILD: ssl: buggy -Werror=dangling-pointer since gcc 13.0
GCC complains about swapping 2 heads list, one local and one global.

    gcc -Iinclude  -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -Werror     -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS -DUSE_EPOLL  -DUSE_NETFILTER -DUSE_POLL -DUSE_THREAD  -DUSE_BACKTRACE -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H  -DUSE_GETADDRINFO -DUSE_OPENSSL  -DUSE_SSL -DUSE_LUA -DUSE_ACCEPT4  -DUSE_ZLIB  -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT  -DUSE_MATH    -DUSE_SYSTEMD  -DUSE_PRCTL  -DUSE_THREAD_DUMP   -DUSE_QUIC   -DUSE_SHM_OPEN   -DUSE_PCRE -DUSE_PCRE_JIT   -I/github/home/opt/include -I/usr/include  -DCONFIG_HAPROXY_VERSION=\"2.8-dev8-7d23e8d1a6db\" -DCONFIG_HAPROXY_DATE=\"2023/04/24\" -c -o src/ssl_sample.o src/ssl_sample.c
    In file included from include/haproxy/pool.h:29,
                     from include/haproxy/chunk.h:31,
                     from include/haproxy/dynbuf.h:33,
                     from include/haproxy/channel.h:27,
                     from include/haproxy/applet.h:29,
                     from src/ssl_sock.c:47:
    src/ssl_sock.c: In function 'tlskeys_finalize_config':
    include/haproxy/list.h:48:88: error: storing the address of local variable 'tkr' in 'tlskeys_reference.p' [-Werror=dangling-pointer=]
       48 | #define LIST_INSERT(lh, el) ({ (el)->n = (lh)->n; (el)->n->p = (lh)->n = (el); (el)->p = (lh); (el); })
          |                                                                                ~~~~~~~~^~~~~~
    src/ssl_sock.c:1086:9: note: in expansion of macro 'LIST_INSERT'
     1086 |         LIST_INSERT(&tkr, &tlskeys_reference);
          |         ^~~~~~~~~~~
    compilation terminated due to -Wfatal-errors.

This appears with gcc 13.0.

The fix uses LIST_SPLICE() instead of inserting the head of the local
list in the global list.

Should fix issue #2136 .
2023-05-09 14:25:10 +02:00
Christopher Faulet
d6f0557deb BUG/MEDIUM: cache: Don't request more room than the max allowed
Since a recent change on the SC API, a producer must specify the amount of
free space it needs to progress when it is blocked. But, it must take care
to never exceed the maximum size allowed in the buffer. Otherwise, the
stream is freezed because it cannot reach the condition to unblock the
producer.

In this context, there is a bug in the cache applet when it fails to dump a
message. It may request more space than allowed. It happens when the cached
object is too big.

It is a 2.8-specific bug. No backport needed.
2023-05-09 11:53:28 +02:00
Frédéric Lécaille
7a01ff7921 BUG/MINOR: quic: Wrong key update cipher context initialization for encryption
As noticed by Miroslav, there was a typo in quic_tls_key_update() which lead
a cipher context for decryption to be initialized and used in place of a cipher
context for encryption. Surprisingly, this did not prevent the key update
from working. Perhaps this is due to the fact that the underlying cryptographic
algorithms used by QUIC are all symetric algorithms.

Also modify incorrect traces.

Must be backported in 2.6 and 2.7.
2023-05-09 11:03:26 +02:00
Frédéric Lécaille
a94612522d CLEANUP: quic: Typo fix for quic_connection_id pool
Remove a "n" extra letter.

Should be backported to 2.7.
2023-05-09 10:48:40 +02:00
Frédéric Lécaille
1bc6e318f0 CLEANUP: quic: Rename several <buf> variables in quic_frame.(c|h)
Most of the function in quic_frame.c and quic_frame.h manipulate <buf> buffer
position variables which have nothing to see with struct buffer variables.
Rename them to <pos>

Should be backported to 2.7.
2023-05-09 10:48:40 +02:00
Willy Tarreau
95e6c9999a BUILD: debug: do not check the isolated_thread variable in non-threaded builds
The build without thread support was broken by commit b30ced3d8 ("BUG/MINOR:
debug: fix incorrect profiling status reporting in show threads") because
it accesses the isolated_thread variable that is not defined when threads
are disabled. In fact both the test on harmless and this one make no sense
without threads, so let's comment out the block and mark the related
variables as unused.

This may have to be backported to 2.7 if the commit above is.
2023-05-07 15:02:30 +02:00
Willy Tarreau
dd9f921b3a CLEANUP: fix a few reported typos in code comments
These are only the few relevant changes among those reported here:

  https://github.com/haproxy/haproxy/actions/runs/4856148287/jobs/8655397661
2023-05-07 07:07:44 +02:00
Willy Tarreau
615c301db4 MINOR: config: allow cpu-map to take commas in lists of ranges
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
etended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.

Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.

This commit relaxes this rule an finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument in the parse_cpu_set() function. The doc was updated to
reflect this.
2023-05-05 18:41:52 +02:00
Amaury Denoyelle
2273af11e0 MINOR: quic: implement oneline format for "show quic"
Add a new output format "oneline" for "show quic" command. This prints
one connection per line with minimal information. The objective is to
have an equivalent of the netstat/ss tools with just enough information
to quickly find connection which are misbehaving.

A legend is printed on the first line to describe the field columns
starting with a dash character.

This should be backported up to 2.7.
2023-05-05 18:08:37 +02:00
Amaury Denoyelle
bc1f5fed72 MINOR: quic: add format argument for "show quic"
Add an extra optional argument for "show quic" to specify desired output
format. Its objective is to control the verbosity per connections. For
the moment, only "full" is supported, which is the already implemented
output with maximum information.

This should be backported up to 2.7.
2023-05-05 18:06:51 +02:00
Aurelien DARRAGON
86fb22c557 MINOR: hlua_fcn: add Queue class
Adding a new lua class: Queue.

This class provides a generic FIFO storage mechanism that may be shared
between multiple lua contexts to easily pass data between them, as stock
Lua doesn't provide easy methods for passing data between multiple
coroutines.

New Queue object may be obtained using core.queue()
(it works like core.concat() for a concat Class)

Lua documentation was updated (including some usage examples)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
40cd44f52c MINOR: hlua: declare hlua_gethlua() function
Declaring hlua_gethlua() function to make it usable from hlua_fcn.c.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
e0b16355ce CLEANUP: hlua: hlua_register_task() may longjmp
Adding __LJMP prefix to hlua_register_task() to indicate that the
function may longjmp when executed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
977688bd57 MINOR: server: fix message report when IDRAIN is set and MAINT is cleared
Remaining in drain mode after removing one of server admins flags leads
to this message being generated:

"Server name/backend is leaving forced drain but remains in drain mode."

However this is not necessarily true: the server might just be leaving
MAINT with the IDRAIN flag set, so the report is incorrect in this case.
(FDRAIN was not set so it cannot be cleared)

To prevent confusion around this message and to comply with the code
comment above it: we remove the "leaving forced drain" precision to
make the report suitable for multiple transitions.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
a2c5321045 BUG/MINOR: hlua: spinning loop in hlua_socket_handler()
Since 3157222 ("MEDIUM: hlua/applet: Use the sedesc to report and detect
end of processing"), hlua_socket_handler() might spin loop if the hlua
socket is destroyed and some data was left unconsumed in the applet.

Prior to the above commit, the stream was explicitly KILLED
(when ctx->die == 1) so the app couldn't spinloop on unconsumed data.
But since the refactor this is no longer the case.

To prevent unconsumed data from waking the applet indefinitely, we consume
pending data when either one of EOS|ERROR|SHR|SHW flags are set, as it is
done everywhere else this check is performed in the code. Hence it was
probably overlooked in the first place during the refacto.

This bug is 2.8 specific only, so no backport needed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
717a38d135 MINOR: hlua: expose proxy mailers
Proxy mailers, which are configured using "email-alert" directives
in proxy sections from the configuration, are now being exposed
directly in lua thanks to the proxy:get_mailers() method which
returns a class containing the various mailers settings if email
alerts are configured for the given proxy (else nil is returned).

Both the class and the proxy method were marked as LEGACY since this
feature relies on mailers configuration, and the goal here is to provide
the last missing bits of information to lua scripts in order to make
them capable of sending email alerts instead of relying on the soon-to-
be deprecated mailers implementation based on checks (see src/mailers.c)

Lua documentation was updated accordingly.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
5bed48fec8 MINOR: mailers/hlua: disable email sending from lua
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in haproxy configuration.

This will help for sending email directly from lua.
(prevent legacy email sending from intefering with lua)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
0bd53b2152 MINOR: hlua/event_hdl: expose SERVER_CHECK event
Exposing SERVER_CHECK event through the lua API.

New lua class named ServerEventCheck was added to provide additional
data for SERVER_CHECK event.

Lua documentation was updated accordingly.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
dcbc2d2cac MINOR: checks/event_hdl: SERVER_CHECK event
Adding a new event type: SERVER_CHECK.

This event is published when a server's check state ought to be reported.
(check status change or check result)

SERVER_CHECK event is provided as a server event with additional data
carrying relevant check's context such as check's result and health.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
948dd3ddfb MINOR: hlua: expose SERVER_ADMIN event
Exposing SERVER_ADMIN event in lua and updating the documentation.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
a163d65254 MINOR: server/event_hdl: add SERVER_ADMIN event
Adding a new SERVER event in the event_hdl API.

SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)

SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
c99f3adf10 MINOR: hlua: expose SERVER_STATE event
Exposing SERVER_STATE event in lua and updating the documentation.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
c249f6d964 OPTIM: server: publish UP/DOWN events from STATE change
Reuse cb_data from STATE event to publish UP and DOWN events.
This saves some CPU time since the event is only constructed
once to publish STATE, STATE+UP or STATE+DOWN depending on the
state change.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
e3eea29f48 MINOR: server/event_hdl: add SERVER_STATE event
Adding a new SERVER event in the event_hdl API.

SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)

SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
306a5fc987 MINOR: server/event_hdl: publish macro helper
add a macro helper to help publish server events to global and
per-server subscription list at once since all server events
support both subscription modes.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
fc84553df8 MINOR: hlua_fcn: add Proxy.get_srv_act() and Proxy.get_srv_bck()
Proxy.get_srv_act: number of active servers that are eligible for LB
Proxy.get_srv_bck: number of backup servers that are eligible for LB
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
fc759b4ac2 MINOR: hlua_fcn: add Server.get_pend_conn() and Server.get_cur_sess()
Server.get_pend_conn: number of pending connections to the server
Server.get_cur_sess: number of current sessions handled by the server

Lua documentation was updated accordingly.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
3889efa8e4 MINOR: hlua_fcn: add Server.get_proxy()
Server.get_proxy(): get the proxy to which the server belongs
(or nil if not available)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
4be36a1337 MINOR: hlua_fcn: add Server.get_trackers()
This function returns an array of servers who are currently tracking
the server.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
406511a2df MINOR: hlua_fcn: add Server.tracking()
This function returns the currently tracked server, if any.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
7a03dee36f MINOR: hlua_fcn: add Server.is_dynamic()
This function returns true if the current server is dynamic,
meaning that it was instantiated at runtime (ie: from the cli)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
c72051d53a MINOR: hlua_fcn: add Server.is_backup()
This function returns true if the current server is a backup server.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
862a0fe75a MINOR: hlua_fcn: fix Server.is_draining() return type
Adjusting Server.is_draining() return type from integer to boolean
to comply with the documentation.
2023-05-05 16:28:32 +02:00
Christopher Faulet
e7405d4124 MEDIUM: stconn: Check room needed to unblock opposite SC when data was sent
After a sending attempt, we check the opposite SC to see if it is waiting
for a minimum free space to receive more data. If the condition is
respected, it is unblocked. 0 is special case where the SC is
unconditionally unblocked.
2023-05-05 15:44:23 +02:00
Christopher Faulet
18b3309f38 MEDIUM: stconn: Check room needed to unblock SC on fast-forward
During fast-forward, if the SC is waiting for a minimum free space to
receive more data and some data was sent, it is only unblock is the
condition is respected. 0 is special case where the SC is unconditionally
unblocked.
2023-05-05 15:44:23 +02:00
Christopher Faulet
c184b11b1a MEDIUM: applet: Check room needed to unblock opposite SC when data was consumed
If the opposite SC is waiting for a minimum free space to receive more data,
it is only unblock is the condition is respected. 0 is a special cases where
the opposite SC is always unblocked.
2023-05-05 15:44:23 +02:00
Christopher Faulet
fab82bfd55 BUG/MEDIUM: stconn: Unblock SC from stream if there is enough room to progrees
At the end of process_stream(), in sc_update_rx(), the SC is now unblocked
if it was waiting for room and the free space in the input buffer is large
enough. This patch should fix an issue with the compression filter that can
leave the channel's buffer empty while the endpoint is waiting for room to
progress. Indeed, in this case, because the buffer is empty, there is no
send attempt and no other way to unblock the SE.

This commit depends on following commits:

  * MEDIUM: tree-wide: Change sc API to specify required free space to progress
  * MINOR: stconn: Add a field to specify the room needed by the SC to progress
  * MINOR: peers: Use the applet API to send message
  * MINOR: stats: Use the applet API to write data
  * MINOR: cli: Use applet API to write output message

It should fix a regression introduced with the commit 341a5783b
("BUG/MEDIUM: stconn: stop to enable/disable reads from streams via
si_update_rx").

It must be backported iff the commit above is also backported. It was not
backported yet and it is thus probably a good idea to not do so to avoid to
backport too many change..
2023-05-05 15:44:23 +02:00
Christopher Faulet
7b3d38a633 MEDIUM: tree-wide: Change sc API to specify required free space to progress
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
2023-05-05 15:44:23 +02:00
Christopher Faulet
9aed1124ed MINOR: stconn: Add a field to specify the room needed by the SC to progress
When the SC is blocked because it is waiting for room in the input buffer,
it will be responsible to specify the minimum free space required to
progress. In this commit, we only introduce the field in the stconn
structure that will be used to store this value. It is a signed value with
the following meaning:

  * -1: The SC is waiting for room but not based on the buffer state. It
        will be typically used during splicing when the pipe is full. In
        this case, only a successful send can unblock the SC.

  * >= 0; The minimum free space in the input buffer to unblock the SC. 0 is
          a special value to specify the SC must be unblocked ASAP, by the
          stream, at the end of process_stream() or when output data are
          consumed on the opposite side.
2023-05-05 15:41:30 +02:00
Christopher Faulet
7a48b72d39 MINOR: peers: Use the applet API to send message
The peers applet now use the applet API to send message instead of the
channel API. This way, it does not need to take care to request more room if
it fails to put data into the channel's buffer.
2023-05-05 15:41:30 +02:00
Christopher Faulet
f4258bdf3b MINOR: stats: Use the applet API to write data
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care to request more
room it it fails to put data into the channel's buffer.
2023-05-05 15:41:29 +02:00
Christopher Faulet
e8ee27b0fd MINOR: cli: Use applet API to write output message
Instead of using the channel API to to write output message from the CLI
applet, we use the applet API. This way, the applet does not need to take
care to request more room it it fails to put its message into the channel's
buffer.
2023-05-05 15:41:19 +02:00
William Lallemand
b6ae2aafde MINOR: ssl: allow to change the signature algorithm for client authentication
This commit introduces the keyword "client-sigalgs" for the bind line,
which does the same as "sigalgs" but for the client authentication.

"ssl-default-bind-client-sigalgs" allows to set the default parameter
for all the bind lines.

This patch should fix issue #2081.
2023-05-05 00:05:46 +02:00
William Lallemand
1d3c822300 MINOR: ssl: allow to change the server signature algorithm
This patch introduces the "sigalgs" keyword for the bind line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-bind-sigalgs" in
the default section.

This patch was originally written by Bruno Henc.
2023-05-04 22:43:18 +02:00
Willy Tarreau
e69919d1ba CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash()
The function isn't used anymore since each call place performs its
own loop. Let's get rid of it.
2023-05-04 19:19:04 +02:00
Willy Tarreau
009b5519e6 MINOR: debug: make "show threads" properly iterate over all threads
Previously it would re-dump all threads to the same trash if  the output
buffer was full, which it never was since the trash is of the same size.
Now it dumps one thread, copies it to the buffer and yields until it can
continue. Showing 256 threads works as expected.
2023-05-04 19:15:50 +02:00
Willy Tarreau
880d1684a7 MINOR: debug: write panic dump to stderr one thread at a time
Currently large setups cannot dump all their threads because they're
first dumped to the trash buffer, then copied to stderr. Here we can
now change this, instead we dump one thread at a time into the trash
and immediately send it to stderr. We also keep a copy into a local
trash chunk that's assigned to thread_dump_buffer so that a core file
still contains a copy of a large number of threads, which is generally
sufficient for the vast majority of situations.

It was verified that dumping 256 threads now produces ~55kB of output
and all of them are properly dumped.
2023-05-04 19:15:50 +02:00
Willy Tarreau
9a6ecbd590 MEDIUM: debug: simplify the thread dump mechanism
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
firsts wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.

What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.

That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.

For now we still have the buffer size limitation but it will be
addressed in future patches.
2023-05-04 19:15:44 +02:00
Christopher Faulet
34f81d5815 BUG/MINOR: mux-h2: Also expect data when waiting for a tunnel establishment
When a client H2 stream is waiting for a tunnel establishment, it must state
it expects data from server. It is the second fix that should fix
regressions of the commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from
server as long as request is unfinished")

It is a 2.8-specific bug. No backport needed.
2023-05-04 16:58:33 +02:00
Willy Tarreau
cb01f5daa7 BUG/MINOR: debug: do not emit empty lines in thread dumps
In 2.3, commit 471425f51 ("BUG/MINOR: debug: Don't dump the lua stack
if it is not initialized") introduced the possibility to emit an empty
line when there's no Lua info to dump. The problem is that doing this
on the CLI in "show threads" marks the end of the output, and it may
affect some external tools. We need to make sure that LFs are only
emitted if there's something on the line and that all lines properly
start with the prefix.

This may be backported as far as 2.0 since the commit above was
backported there.
2023-05-04 16:51:50 +02:00
Amaury Denoyelle
d4af04198b MINOR: mux-quic: close connection asap on local error
With the change for QUIC MUX local error API, the new flag QC_CF_ERRL is
now checked on qc_detach(). If set, qcs instance is freed even though
transfer is not finished. This should help to quickly release qcs and
eventually all MUX instance resources.

To further accelerate this, a specific check has been added in
qc_shutw(). It is skipped if local error flag is set to prevent noisy
reset stream invocation. In the same way, QUIC MUX is not rescheduled on
qc_recv_buf() operation if local error flag set.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
35542ce7bf MINOR: mux-quic: report local error on stream endpoint asap
If an error a detected at the MUX layer, all remaining stream endpoints
should be closed asap with error set. This is now done by checking for
QC_CF_ERRL flag on qc_wake_some_streams() and qc_send_buf(). To complete
this, qc_wake_some_streams() is called by qc_process() if needed.

This should help to quickly release streams as soon as a new error is
detected locally by the MUX or APP layer. This allows to in turn free
the MUX instance itself. Previously, error would not have been
automatically reported until the transport layer closure would occur
on CONNECTION_CLOSE emission.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
51f116d65e MINOR: mux-quic: adjust local error API
When a fatal error is detected by the QUIC MUX or H3 layer, the
connection should be closed with a CONNECTION_CLOSE with an error code
as the reason.

Previously, a direct call was used to the quic_conn layer to try to
close the connection. This API was adjusted to be more flexible. Now,
when an error is detected, the function qcc_set_error() is called. This
set the flag QC_CF_ERRL with the error code stored by the MUX. The
connection will be closed soon so most of the operations are not
conducted anymore. Connection is then finally closed during qc_send()
via quic_conn layer if QC_CF_ERRL is set. This will set the flag
QC_CF_ERRL_DONE which indicates that the MUX instance can be freed.

This model is cleaner and brings the following improvments :
- interaction with quic_conn layer for closure is centralized on a
  single function
- CO_FL_ERROR is not set anymore. This was incorrect as this should be
  reserved to errors reported by the transport layer to be similar with
  other haproxy components. As a consequence, qcc_is_dead() has been
  adjusted to check for QC_CF_ERRL_DONE to release the MUX instance.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
b8901d2c86 MINOR: mux-quic: wake up after recv only if avail data
When HTX content is transferred from qcs instance to upper stream
endpoint, a wakeup is conducted for MUX tasklet. However, this is only
necessary if demux was interrupted due to a full QCS HTX buffer.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
8d44bfaf0b MINOR: mux-quic: add trace event for local error
Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally
detected error when a CONNECTION_CLOSE should be emitted.

This should be backported up to 2.7.
2023-05-04 16:36:51 +02:00
Amaury Denoyelle
b737f95009 BUG/MINOR: mux-quic: prevent quic_conn error code to be overwritten
When MUX performs a graceful shutdown, quic_conn error code is set to a
"no error" code which depends on the application layer used. However,
this may overwrite a previous error code if quic_conn layer has detected
an error on its side.

In practice, this behavior has not been seen on production. In fact, it
may have undesirable effect only if this error code modification happens
between the quic_conn error detection and the emission of the
CONNECTION_CLOSE, so it should be pretty rare. However, there is still a
tiny possibility it may happen.

To prevent this, first check that quic_conn error code is not set before
setting it. Ideally, transport layer API should be adjusted to be able
to set this without fiddling with the quic_conn directly.

This should be backported up to 2.6.
2023-05-04 16:36:51 +02:00
Christopher Faulet
4403cdf653 BUG/MEDIUM: mux-h2: Properly handle end of request to expect data from server
The commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long
as request is unfinished") introduced a regression in the H2 multiplexer.
The end of the request is not systematically handled to state a H2 stream on
client side now expexts data from the server.

Indeed, while the client is uploading its request, the H2 stream warns it
does not expect data from the server. This way, no server timeout is applied
at this stage. When end of the request is detected, the H2 stream must state
it now expects the server response. This enables the server timeout.

However, it was only performed at one place while the end of the request can
be handled at different places. First, during a zero-copy in
h2_rcv_buf(). Then, when the SC is created with the full request. Because of
this bug, it is possible to totally disable the server timeout for H2
streams.

In h2_rcv_buf(), we now rely on h2s flags to detect the end of the request,
but only when the rxbuf was emptied.

It is a 2.8-specific bug. No backport needed.
2023-05-04 16:29:27 +02:00
Willy Tarreau
e5e62231d8 MINOR: debug: permit the "debug dev loop" to run under isolation
Sometimes it's convenient to test the effect of tasks running under
isolation, e.g. to validate the contents of the crash dumps. Let's
add an optional "isolated" keyword to "debug dev loop" for this.
2023-05-04 11:50:26 +02:00
Willy Tarreau
b30ced3d88 BUG/MINOR: debug: fix incorrect profiling status reporting in show threads
Thread dumps include a field "prof" for each thread that reports whether
task profiling is currently active or not. It turns out that in 2.7-dev1,
commit 680ed5f28 ("MINOR: task: move profiling bit to per-thread")
mistakenly replaced it with a check for the current thread's bit in the
thread dumps, which basically is the only place where another thread is
being watched. The same mistake was done a few lines later by confusing
threads_want_rdv_mask with the profiling mask. This mask disappeared
in 2.7-dev2 with commit 598cf3f22 ("MAJOR: threads: change thread_isolate
to support inter-group synchronization"), though instead we know the ID
of the isolated thread. This commit fixes this and now reports "isolated"
instead of "wantrdv".

This can be backported to 2.7.
2023-05-04 11:41:33 +02:00
Willy Tarreau
8b3e39e37b MINOR: activity: allow "show activity" to restart in the middle of a line
16kB buffers are not enough to dump 4096 threads with up to 10 bytes value
on each line. By storing the column number in the applet's context, we can
now restart from the last attempted column. This requires to dump all values
as they are produced, but it doesn't cost that much: a 4096-thread output
from a fesh process produces 300kB of output in ~8ms, or ~400us per call
(19*16kB), most of which are spent in vfprintf(). Given that we don't print
more than needed, it doesn't really change anything.

The main caveat is that when interrupted on such large lines, there's a
great possibility that the total or average on the first column doesn't
match anymore the sum or average of all dumped values. In order to avoid
this whenever possible (typically less than ~1500 threads), we first try
to dump entire lines and only proceed one column at a time when we have
to retry a failed dump. This is already the same for other stats that are
dumped in an interruptible way anyway and there's little that can be done
about it at this point (and not much immediately perceived benefit in
doing this with extreme accuracy for >1500 threads).
2023-05-03 17:26:11 +02:00
Willy Tarreau
6ed0b9885d MINOR: activity: allow "show activity" to restart dumping on any line
When using many threads, it's difficult to see the end of "show activity"
due to the numerous columns which fill the buffer. For example a dump of
a 256-thread, freshly booted process yields around 15kB.

Here by arranging the dump in a loop around a switch/case block where
each case checks the code line number against the current dump position,
we have a restartable counter for free with a granularity of the line of
code, without having to maintain a matching between states and specific
lines. It just requires to reset the trash buffer for each line and to
try to dump it after each line.

Now dumping 256 threads after a few seconds of traffic happily emits 20kB.
2023-05-03 17:24:54 +02:00
Willy Tarreau
8ee0d11cb8 MINOR: activity: iterate over all fields in a main loop for dumping
Now each line of "show activity" will iterate over n+2 fields, one for
the line header, one for the total, and one per thread. This will soon
allow us to save the current state in a restartable way.
2023-05-03 17:24:54 +02:00
Willy Tarreau
a465b21516 MINOR: activity: show the line header inside the SHOW_VAL macro
Doing so will allow us to drop the extra chunk_appendf() dedicated to
the line header and simplify iteration over restartable columns.
2023-05-03 17:24:54 +02:00
Willy Tarreau
5ddf9bea09 MINOR: activity: use a single macro to iterate over all fields
Instead of having SHOW_AVG() and SHOW_TOT(), let's just have SHOW_VAL()
which iterates over all values.
2023-05-03 17:24:54 +02:00
Willy Tarreau
ff508f12c6 BUILD: cli: fix build on Windows due to isalnum() implemented as a macro
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") broke the build on windows due to this:

  src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts]
    940 |  caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0",
        |                                                                      ^~~~~~~~~~~~~

It's classical on platforms which implement ctype.h as macros instead of
functions, let's cast it as uchar. No backport is needed.
2023-05-03 16:32:50 +02:00
William Lallemand
117c7fde06 BUG/MINOR: ssl/sample: x509_v_err_str converter output when not found
The x509_v_err_str converter now outputs the numerical value as a string
when the corresponding constant name was not found.

Must be backported as far as 2.7.
2023-05-03 15:19:38 +02:00
Willy Tarreau
9867987182 DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets
When analyzing certain types of bugs in field, sometimes it would be
nice to be able to wake up a task or tasklet to see how events progress
(e.g. to detect a missing wakeup condition), or expire or kill such a
task. This restricted command shows hte current state of a task or tasklet
and allows to manipulate it like this. However it must be used with extreme
care because while it does verify that the pointers are mapped, it cannot
know if they point to a real task, and performing such actions on something
not a task will easily lead to a crash. In addition, performing a "kill"
on a task has great chances of provoking a deferred crash due to a double
free and/or another kill that is not idempotent. Use with extreme care!
2023-05-03 11:47:44 +02:00
Willy Tarreau
dd01448953 MINOR: debug: clarify "debug dev stream" help message
The help message was insufficient to figure how to use it and specify
the stream pointer and changes to operate.
2023-05-03 11:47:44 +02:00
Willy Tarreau
65efd33c06 BUG/MINOR: stream/cli: fix stream age calculation in "show sess"
The "show sess" command displays the stream's age in synthetic form,
and also makes it appear in the long version (show sess all). But that
last one uses the wrong origin, it uses accept_date.tv_sec instead of
accept_ts (formerly known as tv_accept). This was introduced in 1.4.2
with the long format, with commit 66dc20a17 ("[MINOR] stats socket: add
show sess <id> to dump details about a session"), while the code that
split the two variables was introduced in 1.3.16 with commit b7f694f20
("[MEDIUM] implement a monotonic internal clock"). This problem was
revealed by recent change ad5a5f677 ("MEDIUM: tree-wide: replace timeval
with nanoseconds in tv_accept and tv_request") that made this value report
random garbage, and generally emphasized by the fact that in 2.8 the two
clocks have sufficiently large an offset for such mistakes to be noticeable
early.

Arguably a difference between date and accept_date could also make sense,
to indicate if the stream had been there for more than 49 days, but this
would introduce instabilities for most sockets (including negative times)
for extremely rare cases while the goal is essentially to see how much
longer than a configured timeout a stream has been there. And that's what
other locations (including the short form) provide.

This patch could be backported but most users will never notice. In case
of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.
2023-05-03 11:47:44 +02:00
William Lallemand
64a77e3ea5 MINOR: ssl: disable CRL checks with WolfSSL when no CRL file
WolfSSL is enabling by default the CRL checks even if a CRL file wasn't
provided. This patch resets the default X509_STORE flags so this is
not checked by default.
2023-05-02 18:30:11 +02:00
Tim Duesterhus
0ababda701 BUG/MINOR: stats: fix typo in TotalSplicedBytesOut field name
An additional `d` slipped in there.

This likely should not be backported, because scripts might rely on the
typoed name.

Public discussion on this topic here:
   https://www.mail-archive.com/haproxy@formilux.org/msg43359.html
2023-05-02 11:15:49 +02:00
Amaury Denoyelle
bc0adfa334 MINOR: proxy: factorize send rate measurement
Implement a new dedicated function increment_send_rate() which can be
call anywhere new bytes must be accounted for global total sent.
2023-04-28 16:53:44 +02:00
Amaury Denoyelle
1bcb695a05 MINOR: quic: use real sending rate measurement
Before this patch, global sending rate was measured on the QUIC lower
layer just after sendto(). This meant that all QUIC frames were
accounted for, including non STREAM frames and also retransmission.

To have a better reflection of the application data transferred, move
the incrementation into the MUX layer. This allows to account only for
STREAM frames payload on their first emission.

This should be backported up to 2.6.
2023-04-28 16:52:26 +02:00
Aleksandar Lazic
5529c9985e MINOR: sample: Add bc_rtt and bc_rttvar
This Patch adds fetch samples for backends round trip time.
2023-04-28 16:31:08 +02:00
Willy Tarreau
c05d30e9d8 MINOR: clock: replace the timeval start_time with start_time_ns
Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
2023-04-28 16:08:08 +02:00
Willy Tarreau
69530f59ae MEDIUM: clock: replace timeval "now" with integer "now_ns"
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".

All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really  is.

The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.

The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.

The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.

One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
2023-04-28 16:08:08 +02:00
Willy Tarreau
eed5da1037 MINOR: clock: do not use now.tv_sec anymore
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
2023-04-28 16:08:08 +02:00
Willy Tarreau
e8e4712771 MINOR: checks: use a nanosecond counters instead of timeval for checks->start
Now we store the checks start date as a nanosecond timestamps instead
of a timeval, this will simplify the operations with "now" in the near
future.
2023-04-28 16:08:08 +02:00
Willy Tarreau
b68d308aec MINOR: activity: use nanoseconds, not timeval to compute uptime
Now that we have the required functions, let's get rid of the timeval
in intermediary calculations.
2023-04-28 16:08:08 +02:00
Willy Tarreau
563efe62e9 MINOR: stats: use nanoseconds, not timeval to compute uptime
Now that we have the required functions, let's get rid of the timeval
in intermediary calculations.
2023-04-28 16:08:08 +02:00
Willy Tarreau
ad5a5f6779 MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
2023-04-28 16:08:08 +02:00
Willy Tarreau
aaebcae58b MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps
Various points were collected during a request/response and were stored
using timeval. Let's now switch them to nanosecond based timestamps.
2023-04-28 16:08:08 +02:00
Willy Tarreau
76d343d3d3 MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.

Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
2023-04-28 16:08:08 +02:00
Willy Tarreau
7222db7b84 BUG/MINOR: stats: report the correct start date in "show info"
The "show info" help for "Start_time_sec" says "Start time in seconds"
so it's definitely the start date in human format, not the internal one
that is solely used to compute uptime. Since commit 28360dc ("MEDIUM:
clock: force internal time to wrap early after boot"), both are split
apart since the start time takes into account the offset needed to cause
the early wraparound, so we must only use start_date here.

No backport is needed.
2023-04-28 16:08:08 +02:00
Christopher Faulet
2ebac6a320 BUG/MEDIUM: tcpcheck: Don't eval custom expect rule on an empty buffer
The commit a664aa6a6 ("BUG/MINOR: tcpcheck: Be able to expect an empty
response") instroduced a regression for expect rules relying on a custom
function. Indeed, there is no check on the buffer to be sure it is not empty
before calling the custom function. But some of these functions expect to
have data and don't perform any test on the buffer emptiness.

So instead of fixing all custom functions, we just don't eval them if the
buffer is empty.

This patch must be backported but only if the commit above was backported
first.
2023-04-28 15:01:10 +02:00
Christopher Faulet
89aeabff5b BUG/MINOR: resolvers: Use sc_need_room() to wait more room when dumping stats
It was a cut/paste typo during stream-interface to conn-stream
refactoring. sc_have_room() was used instead of sc_need_room().

This patch must be backported as far as 2.6.
2023-04-28 08:51:34 +02:00
Christopher Faulet
e99c43907c BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones
It is possible to start too many applets on sporadic burst of events after
an inactivity period. It is due to the way we estimate if a new applet must
be created or not. It is based on a frequency counter. We compare the events
processing rate against the number of events currently processed (in
progress or waiting to be processed). But we should also take care of the
number of idle applets.

We already track the number of idle applets, but it is global and not
per-thread. Thus we now also track the number of idle applets per-thread. It
is not a big deal because this fills a hole in the spoe_agent structure.
Thanks to this counter, we can refrain applets creation if there is enough
idle applets to handle currently processed events.

This patch should be backported to every stable versions.
2023-04-28 08:51:34 +02:00
Willy Tarreau
d2f61de8c2 BUG/MINOR: hlua: return wall-clock date, not internal date in core.now()
That's hopefully the last one affected by this. It was a bit trickier
because there's the promise in the doc that the date is monotonous, so
we continue to use now-start_time as the uptime value and add it to
start_date to get the current date. It was also emphasized by commit
28360dc ("MEDIUM: clock: force internal time to wrap early after boot"),
causing core.now() to return a date of Mar 20 on Apr 27. No backport is
needed.
2023-04-27 18:44:14 +02:00