haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-11 17:46:57 +02:00

Author	SHA1	Message	Date
Amaury Denoyelle	e05edf71df	MINOR: cfgparse: rename "rev@" prefix to "rhttp@" 'rev@' was used to specify a bind/server used with reverse HTTP transport. This notation was deemed not explicit enough. Rename it 'rhttp@' instead.	2023-10-20 14:44:37 +02:00
Amaury Denoyelle	9d4c7c1151	MINOR: server: convert @reverse to rev@ standard format Remove the recently introduced '@reverse' notation for HTTP reverse servers. Instead, reuse the 'rev@' prefix already defined for bind lines.	2023-10-20 14:44:37 +02:00
Amaury Denoyelle	3222047a14	MINOR: listener: add nbconn kw for reverse connect Previously, maxconn keyword was reused for a specific usage on reverse HTTP binds to specify the number of active connect to proceed. To avoid confusion, introduce a new dedicated keyword 'nbconn' which is specific to reverse HTTP bind. This new keyword is forbidden for non-reverse listener. A fatal error is emitted during config parsing if this rule is not respected. It's safe because it's also forbidden to mix standard and reverse addresses on the same bind line. Internally, nbconn value will be reassigned to 'maxconn' member of bind_conf structure. This ensures that listener layer will automatically reenable the preconnect task each time a connection is closed.	2023-10-20 14:44:37 +02:00
Amaury Denoyelle	37d7e52cc6	MINOR: cfgparse: forbid mixing reverse and standard listeners Reverse HTTP listeners are very specific and share only a very limited subset of keywords with other listeners. As such, it is probably meaningless to mix standard and reverse addresses on the same bind line. This patch emits a fatal error during configuration parsing if this is the case.	2023-10-20 14:44:37 +02:00
Christopher Faulet	60e7116be0	BUG/MEDIUM: peers: Fix synchro for huge number of tables The number of updates sent at once was limited to not loop too long to emit updates when the buffer size is huge or when the number of sync tables is huge. The limit can be configured and is set to 200 by default. However, this fix introduced a bug. It is impossible to synchronize two peers if the number of tables is higher than this limit. Thus by default, it is not possible to sync two peers if there are more than 200 tables to sync. Technically speaking, a teaching process is finished if we loop on all tables with no new update messages sent. Because we are limited at each call, the loop is split across several calls. However the restart point for the next loop is always the last table for which we emitted an update message. Thus with more tables than the limit, the loop never reaches the end point. Worse, in conjunction with the bug fixed by "BUG/MEDIUM: peers: Be sure to always refresh recconnect timer in sync task", it is possible to trigger the watchdog because the applets may be woken up in a loop and leave requesting more room while their buffers are empty. To fix the issue, restart conditions for a teaching loop were changed. If the teach process is interrupted, we now save the restart point, called stop_local_table. It is the last evaluated table on the previous loop. This restart point is reset when the teach process is finished. In addition, the updates_sent variable in peer_send_msgs() was renamed to updates to avoid ambiguities. Indeed, the variable is incremented, whether messages were sent or not. This patch must be backported as far as 2.6.	2023-10-20 14:32:12 +02:00
Willy Tarreau	3dd963b35f	BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again Stefan Behte reported that since commit `f279a2f14` ("BUG/MINOR: mux-h2: refresh the idle_timer when the mux is empty"), the http-request and http-keep-alive timeouts don't work anymore on H2. Before this patch, and since 3e448b9b64 ("BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout"), they would only be refreshed after stream frames were sent (HEADERS or DATA) but the patch above that adds more refresh points broke these so they don't expire anymore as long as there's some activity. We cannot just revert the fix since it also addressed an issue by which sometimes the timeout would trigger too early and provoke truncated responses. The right approach here is in fact to only refresh the idle timer when the mux buffer was flushed from any such stream frames. In order to achieve this, we're now setting a flag on the connection whenever we write a stream frame, and we consider that flag when deciding to refresh the buffer after it's emptied. This way we'll only clear that flag once the buffer is empty and there were stream data in it, not if there were no such stream data. In theory it remains possible to leave the flag on if some control data is appended after the buffer and it's never cleared, but in practice it's not a problem as a buffer will always get sent in large blocks when the window opens. Even a large buffer should be emptied once in a while as control frames will not fill it as much as data frames could. Given the patch above was backported as far as 2.6, this patch should also be backported as far as 2.6.	2023-10-18 17:17:58 +02:00
Willy Tarreau	91ed52976c	MINOR: dgram: allow to set rcv/sndbuf for dgram sockets as well tune.rcvbuf.client and tune.rcvbuf.server are not suitable for shared dgram sockets because they're per connection so their units are not the same. However, QUIC's listener and log servers are not connected and take per-thread or per-process traffic where a socket log buffer might be too small, causing undesirable packet losses and retransmits in the case of QUIC. This essentially manifests in listener mode with new connections taking a lot of time to set up under heavy traffic due to the small queues causing delays. Let's add a few new settings allowing to set these shared socket sizes on the frontend and backend side (which reminds that these are per-front/back and not per client/server hence not per connection).	2023-10-18 17:01:19 +02:00
Christopher Faulet	203211f4cb	REORG: stconn/muxes: Rename init step in fast-forwarding Instead of speaking of an initialisation stage for each data fast-forwarding, we now use the term "negotiate". Thus init_ff/init_fastfwd functions were renamed nego_ff/nego_fastfwd.	2023-10-18 12:46:55 +02:00
Christopher Faulet	023564b685	MINOR: global: Add an option to disable the zero-copy forwarding The zero-copy forwarding or the mux-to-mux forwarding is a way to fast-forward data without using the channels buffers. Data are transferred from a mux to the other one. The kernel splicing is an optimization of the zero-copy forwarding. But it can also use normal buffers (but not channels ones). This way, it could be possible to fast-forward data with muxes not supporting the kernel splicing (H2 and H3 muxes) but also with applets. However, this mode can introduce regressions or bugs in the future (just like the kernel splicing). Thus, it could be useful to disable this optimization. To do so, in configuration, the global tune setting 'tune.disable-zero-copy-forwarding' may be set in a global section or the '-dZ' command line parameter may be used to start HAProxy. Of course, this also disables the kernel splicing.	2023-10-17 18:51:13 +02:00
Christopher Faulet	322d660d08	MINOR: tree-wide: Only rely on co_data() to check channel emptyness Because channel_is_empty() function does now only check the channel's buffer, we can remove it and rely on co_data() instead. Of course, all tests must be inverted. channel_is_empty() is thus removed.	2023-10-17 18:51:13 +02:00
Christopher Faulet	20c463955d	MEDIUM: channel: don't look at iobuf to report an empty channel It is important to split channels and I/O buffers. When data are pushed in an I/O buffer, we consider them as forwarded. The channel never sees them. Fast-forwarded data are now handled in the SE only.	2023-10-17 18:51:13 +02:00
Christopher Faulet	2d80eb5b7a	MEDIUM: mux-h1: Add fast-forwarding support The H1 multiplexer now implements callback functions to produce and consume fast-forwarded data.	2023-10-17 18:51:13 +02:00
Christopher Faulet	91f1c5519a	MEDIUM: raw-sock: Specifiy amount of data to send via snd_pipe callback When data were sent using the kernel splicing, we tried to send all data with no restriction. Most of the time it is valid. However, because the payload representation may differ between the producer and the consumer, it is important to be able to specify how much data to send via the splicing. Of course, for performance reasons, it is important to maximize the amount of data sent via splicing at each call. However, in edge cases, this can now be limited.	2023-10-17 18:51:13 +02:00
Christopher Faulet	7ffb7624fe	MINOR: connection: Remove mux callbacks about splicing The kernel splicing support was totally removed while waiting for the mux-to-mux fast-forward implementation. So the corresponding mux callbacks can be removed now.	2023-10-17 18:51:13 +02:00
Christopher Faulet	8b89fe3d8f	MINOR: stconn: Temporarily remove kernel splicing support mux-to-mux fast-forwarding will be added. To avoid mixing it with the splicing and to simplify the commits, the kernel splicing support is removed from the stconn. CF_KERN_SPLICING flag is removed and the support is no longer tested in process_stream(). In the stconn part, rcv_pipe() callback function is no longer called. Reg-tests scripts testing the kernel splicing are temporarily marked as broken.	2023-10-17 18:51:13 +02:00
Christopher Faulet	242c6f0ded	MINOR: connection: Add new mux callbacks to perform data fast-forwarding To perform the mux-to-mux data fast-forwarding, 4 new callbacks were added into the mux_ops structure. 2 callbacks will be used from the stconn to fast-forward data. The 2 other callbacks will be used by the endpoint to request an iobuf from the opposite endpoint. * fastfwd() callback function is used by a producer to forward data * resume_fastfwd() callback function is used by a consumer if some data are blocked in the iobuf, to resume the data forwarding. * init_fastfwd() must be used by an endpoint (the producer one), inside the fastfwd() callback to request an iobuf from the opposite side (the consumer one). * done_fastfwd() must be used by an endpoint (the producer one) at the end of fastfwd() to notify the opposite endpoint (the consumer one) if data were forwarded or not. This API is still under development, so it may evolve, especially when the fast-forward is extended to applets. 2 helper functions were also added into the SE api to wrap init_fastfwd() and done_fastfwd() callback function of the underlying endpoint. For now, this API is unused and not implemented at all in muxes.	2023-10-17 18:51:13 +02:00
Christopher Faulet	1d68bebb70	MINOR: stconn: Extend iobuf to handle a buffer in addition to a pipe It is unused for now, but the iobuf structure now owns a pointer to a buffer. This buffer will be used to perform mux-to-mux fast-forwarding when splicing is not supported or unusable. This pointer should be filled by an endpoint to let the opposite one forward data. Extra fields, in addition to the buffer, are mandatory because the buffer may already contain some data. The ".offset" field may be used as the position to start to copy data. Finally, the amount of data copied in this buffer must be saved in ".data" field. Some flags are also added to prepare next changes. And helper stconn functions are updated to also count data in the buffer. For a first implementation, it is not planned to handle data in the buffer and in the pipe at the same time. But it will be possible to do so.	2023-10-17 18:51:13 +02:00
Christopher Faulet	e52519ac83	MINOR: stconn: Start to introduce mux-to-mux fast-forwarding notion Instead of talking about kernel splicing at stconn/sedesc level, we now try to talk about mux-to-mux fast-forwarding. To do so, 2 functions were added to know if there are fast-forwarded data and to retrieve this amount of data. Of course, for now, there is only data in a pipe. In addition, some flags were renamed to reflect this notion. Note the channel's documentation was not updated yet.	2023-10-17 18:51:13 +02:00
Christopher Faulet	8bee0dcd7d	MEDIUM: stconn/channel: Move pipes used for the splicing in the SE descriptors The pipes used to put data when the kernel splicing is in use are moved in the SE descriptors. For now, it is just a simple replacement but there is a major difference with the pipes in the channel. The data are now pushed in the consumer's pipe, whereas they were previously pushed in the producer's pipe. So it means the request data are now pushed in the pipe of the backend SE descriptor and response data are pushed in the pipe of the frontend SE descriptor. The idea is to hide the pipe from the channel/SC side and to be able to handle fast-forwarding in pipe but also in buffer. To do so, the pipe is inside a new entity, called iobuf. This entity will be extended.	2023-10-17 18:51:13 +02:00
Willy Tarreau	68d02e5fa9	BUG/MINOR: mux-h2: make up other blocked streams upon removal from list An interesting issue was met when testing the mux-to-mux forwarding code. In order to preserve fairness, in h2_snd_buf() if other streams are waiting in send_list or fctl_list, the stream that is attempting to send also goes to its list, and will be woken up by h2_process_mux() or h2_send() when some space is released. But on rare occasions, there are only a few (or even a single) streams waiting in this list, and these streams are just quickly removed because of a timeout or a quick h2_detach() that calls h2s_destroy(). In this case there's no event to wake up the other waiting stream in its list, and this will possibly resume processing after some client WINDOW_UPDATE frames or even new streams, so usually it doesn't last too long and it is not very noticeable, which is why it was left that long. In addition, measures have shown that in heavy network-bound benchmark, this exact situation happens on less than 1% of the streams (reached 4% with mux-mux). The fix here consists in replacing these LIST_DEL_INIT() calls on h2s->list with a function call that checks if other streams were queued to the send_list recently, and if so, which also tries to resume them by calling h2_resume_each_sending_h2s(). The detection of late additions is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is set when a stream is queued due to other streams being present, and which is cleared when this function is called. It is particularly difficult to reproduce this case which is particularly timing-dependent, but in a constrained environment, a test involving 32 conns of 20 streams each, all downloading a 10 MB object previously showed a limitation of 17 Gbps with lots of idle CPU time, and now filled the cable at 25 Gbps. This should be backported to all versions where it applies.	2023-10-17 16:43:44 +02:00
Aurelien DARRAGON	94d0f77deb	MINOR: server: introduce "log-bufsize" kw "log-bufsize" may now be used for a log server (in a log backend) to configure the bufsize of implicit ring associated to the server (which defaults to BUFSIZE).	2023-10-13 10:05:07 +02:00
Aurelien DARRAGON	b30bd7adba	MEDIUM: log/balance: support for the "hash" lb algorithm hash lb algorithm can be configured with the "log-balance hash <cnv_list>" directive. With this algorithm, the user specifies a converter list with <cnv_list>. The produced log message will be passed as-is to the provided converter list, and the resulting hash will be used to select the log server that will receive the log message.	2023-10-13 10:05:06 +02:00
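The hash-based selection described in the commit above can be pictured with a minimal Python sketch. This is not HAProxy code: the converter chain is modelled as a list of plain callables, `zlib.crc32` stands in for whatever hashing converter a user would configure, and picking the server with a modulo is an assumption made for illustration only. All names (`pick_log_server`, `crc32_cnv`, the server labels) are hypothetical.

```python
import zlib

def crc32_cnv(value):
    """Hypothetical stand-in for a hashing converter: bytes -> integer."""
    return zlib.crc32(value)

def pick_log_server(message, converters, servers):
    """Pass the log message as-is through the converter list, then use the
    resulting integer to choose a server (modulo is an assumption here)."""
    value = message
    for cnv in converters:
        value = cnv(value)
    return servers[value % len(servers)]

servers = ["log1", "log2", "log3"]      # hypothetical log servers
msg = b"<134>haproxy[1]: GET /index.html"

# The same message always maps to the same server.
print(pick_log_server(msg, [crc32_cnv], servers))
```

The property this illustrates is determinism: identical input to the converter list always yields the same target, which is what makes a hash algorithm useful for keeping related log lines together.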
Aurelien DARRAGON	7251344748	MINOR: sample: add sample_process_cnv() function split sample_process() in 2 parts in order to be able to only process the converter part of a sample expression from an existing input sample struct passed as parameter.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	a7563158f7	MINOR: lbprm: support for the "none" hash-type function Allow the use of the "none" hash-type function so that the key resulting from the sample expression is directly used as the hash. This can be useful to do the hashing manually using available hashing converters, or even custom ones, and then inform haproxy that it can directly rely on the sample expression result which is explicitly handled as an integer in this case.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	9a74a6cb17	MAJOR: log: introduce log backends Using "mode log" in a backend section turns the proxy in a log backend which can be used to log-balance logs between multiple log targets (udp or tcp servers) log backends can be used as regular log targets using the log directive with "backend@be_name" prefix, like so: \| log backend@mybackend local0 A log backend will distribute log messages to servers according to the log load-balancing algorithm that can be set using the "log-balance" option from the log backend section. For now, only the roundrobin algorithm is supported and set by default.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	e58a9b4baf	MINOR: sink: add sink_new_from_srv() function This helper function can be used to create a new sink from an existing server struct (and thus existing proxy as well), in order to spare some resources when possible.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	5c0d1c1a74	MEDIUM: sink: inherit from caller fmt in ring_write() when rings didn't set one implicit rings were automatically forced to the parent logger format, but this was done upon ring creation. This is quite restrictive because we might want to choose the desired format right before generating the log header (ie: when producing the log message), depending on the logger (log directive) that is responsible for the log message, and with current logic this is not possible. (To this day, we still have dedicated implicit ring per log directive, but this might change) In ring_write(), we check if the sink->fmt is specified: - defined: we use it since it is the most precise format (ie: for named rings) - undefined: then we fallback to the format from the logger With this change, implicit rings' format is now set to UNSPEC upon creation. This is safe because the log header building function automatically enforces the "raw" format when UNSPEC is set. And since logger->format also defaults to "raw", no change of default behavior should be expected.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	6dad0549a5	MEDIUM: log/sink: simplify log header handling Introduce log_header struct to easily pass log header data between functions and use that to simplify the logic around log header handling. While at it, some outdated comments were updated as well. No change in behavior should be expected.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	a9b185f34e	MEDIUM: log: introduce log target log targets were immediately embedded in logger struct (previously named logsrv) and could not be used outside of this context. In this patch, we're introducing log_target type with the associated helper functions so that it becomes possible to declare and use log targets outside of loggers scope.	2023-10-13 10:05:06 +02:00
Aurelien DARRAGON	18da35c123	MEDIUM: tree-wide: logsrv struct becomes logger When 'log' directive was implemented, the internal representation was named 'struct logsrv', because the 'log' directive would directly point to the log target, which used to be a (UDP) log server exclusively at that time, hence the name. But things have become more complex, since today 'log' directive can point to ring targets (implicit, or named) for example. Indeed, a 'log' directive does no longer reference the "final" server to which the log will be sent, but instead it describes which log API and parameters to use for transporting the log messages to the proper log destination. So now the term 'logsrv' is rather confusing and prevents us from introducing a new level of abstraction because they would be mixed with logsrv. So in order to better designate this 'log' directive, and make it more generic, we chose the word 'logger' which now replaces logsrv everywhere it was used in the code (including related comments). This is internal rewording, so no functional change should be expected on user-side.	2023-10-13 10:05:06 +02:00
Amaury Denoyelle	7d76ffb2a4	BUG/MINOR: quic: fix qc.cids access on quic-conn fail alloc CIDs tree is now allocated dynamically since the following commit : `276697438d` MINOR: quic: Use a pool for the connection ID tree. This can cause a crash if qc_new_conn() is interrupted due to an intermediary failed allocation. When freeing all connection members, free_quic_conn_cids() is used. However, this function does not support a NULL cids. To fix this, simply check whether cids is NULL during free_quic_conn_cids() prologue. This bug was reproduced using -dMfail. No need to backport.	2023-10-13 08:52:16 +02:00
Willy Tarreau	5798b5bb14	BUG/MAJOR: connection: make sure to always remove a connection from the tree Since commit `5afcb686b` ("MAJOR: connection: purge idle conn by last usage") in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag() in 2.8 by commit `3a7b539b1` ("BUG/MEDIUM: connection: Preserve flags when a conn is removed from an idle list") becomes misleading. Indeed, now both toremove_list and idle_list are shared by a union since the presence in these lists is mutually exclusive. However, in conn_get_idle_flag() we check for the presence in the toremove_list to decide whether or not to delete the connection from the tree. This test now fails because instead it sees the presence in the idle or safe list via the union, and concludes the element must not be removed. Thus the element remains in the tree and can be found later after the connection is released, causing crashes that Tristan reported in issue #2292. The following config is sufficient to reproduce it with 2 threads: defaults mode http timeout client 5s timeout server 5s timeout connect 1s listen front bind :8001 server next 127.0.0.1:8002 frontend next bind :8002 timeout http-keep-alive 1 http-request redirect location / Sending traffic with a few concurrent connections and some short timeouts suffices to instantly crash it after ~10k reqs: $ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/ With Amaury we analyzed the conditions in which the function is called in order to figure a better condition for the test and concluded that ->toremove_list is never filled there so we can safely remove that part from the test and just move the flag retrieval back to what it was prior to the 2.8 patch above. Note that the patch is not reverted though, as the parts that would drop the unexpected flags removal are unchanged. This patch must NOT be backported. The code in 2.8 works correctly, it's only the change in 2.9 that makes it misbehave.	2023-10-12 14:20:03 +02:00
Amaury Denoyelle	f59f8326f9	REORG: quic: cleanup traces definition Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h. Also consolidate the multiple trace_quic macro definitions into quic_trace.h. This forces all QUIC source files that rely on traces to include it while reducing the size of quic_conn.h.	2023-10-11 14:15:31 +02:00
Frédéric Lécaille	bd83b6effb	BUG/MINOR: quic: Avoid crashing with unsupported cryptographic algos This bug was detected when compiling haproxy against aws-lc TLS stack during QUIC interop runner tests. Some algorithms could be negotiated by haproxy through the TLS stack but not fully supported by haproxy QUIC implementation. This led tls_aead() to return NULL (same thing for tls_md(), tls_hp()). As the values returned by these functions were never checked, they could trigger segfaults. To fix this, one closes the connection as soon as possible with a handshake_failure(40) TLS alert. Note that as the TLS stack successfully negotiates an algorithm, it provides haproxy with CRYPTO data before entering ->set_encryption_secrets() callback. This is why this callback (ha_set_encryption_secrets() on haproxy side) is modified to release all the CRYPTO frames before triggering a CONNECTION_CLOSE with a TLS alert. This is done calling qc_release_pktns_frms() for all the packet number spaces. Modify some quic_tls_keys_hexdump to avoid crashes when the ->aead or ->hp EVP_CIPHER are NULL. Modify qc_release_pktns_frms() to do nothing if the packet number space passed as parameter is not initialized. This bug does not impact the QUIC TLS compatibility mode (USE_QUIC_OPENSSL_COMPAT). Thank you to @ilia-shipitsin for having reported this issue in GH #2309. Must be backported as far as 2.6.	2023-10-11 11:52:22 +02:00
William Lallemand	deed2b6077	BUILD: ssl: enable keylog for WolfSSL Enable the keylog feature when linking against a WolfSSL library which has the 'HAVE_SECRET_CALLBACK' define. Only supports <= TLSv1.2 secret dump.	2023-10-09 21:34:25 +02:00
William Lallemand	9a4c53d96c	CLEANUP: ssl: remove compat functions for openssl < 1.0.0 The openssl-compat.h file has some functions which were implemented in order to provide compatibility with openssl < 1.0.0. Most of them were to support the 0.9.8 version, but we don't support this version anymore. This patch removes the deprecated code from openssl-compat.h	2023-10-09 17:27:53 +02:00
William Lallemand	1918bcbc12	BUILD: ssl: enable keylog for awslc AWSLC supports SSL_CTX_set_keylog_callback(), this patch enables the build with the keylog feature for this library.	2023-10-09 16:17:30 +02:00
William Lallemand	4428ac4f70	BUILD: ssl: add 'secure_memcmp' converter for WolfSSL and awslc CRYPTO_memcmp is supported by both awslc and wolfssl, let's add the support for the 'secure_memcmp' converter into the build.	2023-10-09 15:44:50 +02:00
William Lallemand	bf426eecd7	BUILD: ssl: add 'ssl_c_r_dn' fetch for WolfSSL WolfSSL supports SSL_get0_verified_chain() so we can activate this feature.	2023-10-09 15:09:47 +02:00
William Lallemand	d75bc06bdc	BUILD: ssl: enable 'ciphersuites' for WolfSSL WolfSSL supports setting the 'ciphersuites', let's enable the keyword for it.	2023-10-09 14:56:43 +02:00
Willy Tarreau	1e3422e6b0	BUG/MEDIUM: actions: always apply a longest match on prefix lookup Many actions take arguments after a parenthesis. When this happens, they have to be tagged in the parser with KWF_MATCH_PREFIX so that a sub-word is sufficient (since by default the whole block including the parenthesis is taken). The problem with this is that the parser stops on the first match. This was OK years ago when there were very few actions, but over time new ones were added and many actions are the prefix of another one (e.g. "set-var" is the prefix of "set-var-fmt"). And what happens in this case is that the first word is picked. Most often that doesn't cause trouble because such similar-looking actions involve the same custom parser so actually the wrong selection of the first entry results in the correct parser to be used anyway and the error to be silently hidden. But it's getting worse when accidentally declaring prefixes in multiple files, because in this case it will solely depend on the object file link order: if the longest name appears first, it will be properly detected, but if it appears last, its other prefix will be detected and might very well not be related at all and use a distinct parser. And this is random enough to make some actions succeed or fail depending on the build options that affect the linkage order. Worse: what if a keyword is the prefix of another one, with a different parser but a compatible syntax ? It could seem to work by accident but not do the expected operations. The correct solution is to always look for the longest matching name. This way the correct keyword will always be matched and used and there will be no risk to randomly pick the wrong one anymore. This fix must be backported to the relevant stable releases.	2023-10-06 17:06:44 +02:00
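The difference between stopping on the first prefix match and always taking the longest one can be sketched in a few lines of Python. This is only an illustration of the lookup rule described in the commit above, not HAProxy's parser: the keyword list and the function names `first_match` and `longest_match` are hypothetical, and the list order stands in for an arbitrary object-file link order.

```python
# Hypothetical keyword table; its order mimics an arbitrary link order.
KEYWORDS = ["set-var", "set-var-fmt", "do-resolve"]

def first_match(word, keywords):
    """Old rule: return the first keyword that is a prefix of word."""
    for kw in keywords:
        if word.startswith(kw):
            return kw
    return None

def longest_match(word, keywords):
    """Fixed rule: return the longest keyword that is a prefix of word."""
    best = None
    for kw in keywords:
        if word.startswith(kw) and (best is None or len(kw) > len(best)):
            best = kw
    return best

action = "set-var-fmt(txn.host)"
print(first_match(action, KEYWORDS))    # set-var (depends on table order)
print(longest_match(action, KEYWORDS))  # set-var-fmt (order-independent)
```

Reversing `KEYWORDS` changes the result of `first_match` but not of `longest_match`, which mirrors the link-order dependency the commit message describes.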
Christopher Faulet	a633338b55	BUG/MEDIUM: stconn: Fix comparison sign in sc_need_room() sc_need_room() function may be called with a negative value. In this case, the intent is to be notified if any space was made in the channel buffer. In the function, we get the min between the requested room and the maximum possible room in the buffer, considering it may be an HTX buffer. However this max value is unsigned and leads to an unsigned comparison, casting the negative value to an unsigned value. Of course, in this case, this always leads to the wrong result. This bug seems to have no effect but it is hard to be sure. To fix the issue, we take care to respect the requested room sign by casting the max value to a signed integer. This patch must be backported to 2.8.	2023-10-06 15:34:31 +02:00
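The sign problem described in the commit above comes from C's usual arithmetic conversions: when a signed and an unsigned integer of the same rank are compared, the signed one is converted to unsigned first. The following Python sketch emulates that rule with a bit mask. It does not reproduce the actual sc_need_room() code; the 32-bit width, the function names and the sample buffer size are assumptions chosen for illustration.

```python
UINT_BITS = 32  # assumption: model a 32-bit unsigned int

def to_unsigned(value):
    """Emulate the C conversion of a signed int to unsigned int."""
    return value & ((1 << UINT_BITS) - 1)

def min_unsigned_compare(room, max_room):
    """Buggy pattern: the comparison is done on unsigned values,
    so a negative request looks like a huge positive one."""
    return room if to_unsigned(room) < to_unsigned(max_room) else max_room

def min_signed_compare(room, max_room):
    """Fixed pattern: the maximum is treated as signed, so the sign of
    the requested room is respected."""
    return room if room < max_room else max_room

print(min_unsigned_compare(-1, 16384))  # 16384: the negative request is lost
print(min_signed_compare(-1, 16384))    # -1: the negative request survives
```

For non-negative requests both functions agree, which is consistent with the commit's remark that the bug seemed to have no visible effect.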
Aurelien DARRAGON	205d480d9f	MINOR: sink: refine forward_px usage now forward_px only serves as a hint to know if a proxy was created specifically for the sink, in which case the sink is responsible for it. Everywhere forward_px was used in appctx context: get the parent proxy from the sft->srv instead. This permits to finally get rid of the double link dependency between sink and proxy.	2023-10-06 15:34:31 +02:00
Willy Tarreau	90fa2eaa15	MINOR: haproxy: permit to register features during boot The regtests are using the "feature()" predicate but this one can only rely on build-time options. It would be nice if some runtime-specific options could be detected at boot time so that regtests could more flexibly adapt to what is supported (capabilities, splicing, etc). Similarly, certain features that are currently enabled with USE_XXX could also be automatically detected at build time using ifdefs and would simplify the configuration, but then we'd lose the feature report in the feature list which is convenient for regtests. This patch makes sure that haproxy -vv shows the variable's contents and not the macro's contents, and adds a new hap_register_feature() to allow the code to register a new keyword.	2023-10-06 11:40:02 +02:00
Remi Tricot-Le Breton	a5e96425a2	MEDIUM: cache: Add "Origin" header to secondary cache key This patch adds a hash of the Origin header to the cache's secondary key. This makes it possible to store responses that have a "Vary: Origin" header in the cache when vary is enabled. This cannot be considered as a means to manage CORS requests though, it only processes the Origin header and hashes the presented value without any form of URI normalization. This need was expressed by Philipp Hossner in GitHub issue #251. Co-Authored-by: Philipp Hossner <philipp.hossner@posteo.de>	2023-10-05 10:53:54 +02:00
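The consequence of hashing the Origin value "without any form of URI normalization" can be shown with a small Python sketch. It is not the cache's implementation: SHA-256 and the helper name `origin_key_part` are assumptions used only to illustrate that equivalent origins written differently produce different secondary key parts.

```python
import hashlib

def origin_key_part(headers):
    """Hash the raw Origin header value as presented (no normalization).
    The hash function is an assumption made for this sketch."""
    origin = headers.get("origin", "")
    return hashlib.sha256(origin.encode("utf-8")).hexdigest()

a = origin_key_part({"origin": "https://example.com"})
b = origin_key_part({"origin": "https://example.com:443"})
c = origin_key_part({"origin": "https://example.com"})

print(a == c)  # True: identical header values share a cache entry
print(a == b)  # False: same origin in practice, but a different raw value
```

This is why the commit notes that the feature handles "Vary: Origin" but is not a means to manage CORS requests.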
William Lallemand	45174e4fdc	BUILD: quic: allow USE_QUIC to work with AWSLC This patch fixes the build with AWSLC and USE_QUIC=1, this is only meant to be able to build for now and it's not feature complete. The set_encryption_secrets callback has been split in set_read_secret and set_write_secret. Missing features: - 0RTT was disabled. - TLS1_3_CK_CHACHA20_POLY1305_SHA256, TLS1_3_CK_AES_128_CCM_SHA256 were disabled - clienthello callback is missing, certificate selection could be limited (RSA + ECDSA at the same time)	2023-10-04 16:55:19 +02:00
Christopher Faulet	f32e28eddc	MINOR: mux-h1: Add flags if outgoing msg contains a header about its payload If a "Content-length" or "Transfer-Encoding: chunked" header is found or inserted in an outgoing message, a specific flag is now set on the H1 stream. H1S_F_HAVE_CLEN is set for "Content-length" header and H1S_F_HAVE_CHNK for "Transfer-Encoding: chunked". This will be useful to properly format outgoing messages, even if one of these headers was removed by hand (with no update of the message meta-data).	2023-10-04 15:34:18 +02:00
Amaury Denoyelle	bd001ff346	MINOR: backend: refactor specific source address allocation Refactor alloc_bind_address() function which is used to allocate a sockaddr if a connection to a target server relies on a specific source address setting. The main objective of this change is to be able to use this function outside of backend module, namely for preconnections using a reverse server. As such, this function is now exported globally. For reverse connect, there is no stream instance. As such, the function parts which relied on it were reduced to the minimal. Now, stream is only used if a non-static address is configured which is useful for usesrc client\|clientip\|hdr_ip. These options have no sense for reverse connect so it should be safe to use the same function.	2023-10-03 17:49:12 +02:00
Amaury Denoyelle	2ac5d9a657	MINOR: quic: handle perm error on bind during runtime Improve handling of EACCES permission errors encountered when using QUIC connection socket at runtime: * First occurrence of the error on the process will generate a log warning. This should prevent users from using a privileged port without mandatory access rights. * Socket mode will automatically fall back to listener socket for the receiver instance. This requires duplicating the settings from the bind_conf to the receiver instance to support configurations with multiple addresses on the same bind line.	2023-10-03 16:52:02 +02:00
Amaury Denoyelle	3ef6df7387	MINOR: quic: define quic-socket bind setting Define a new bind option quic-socket: quic-socket [ connection | listener ] This new setting works in conjunction with the existing global configuration tune.quic.socket-owner and reuses the same semantics. The purpose of this setting is to allow disabling connection socket usage on listener instances individually. This will notably be useful to deactivate it when a fatal permission error is encountered on bind() at runtime.	2023-10-03 16:49:26 +02:00
Willy Tarreau	7c69c9b51f	BUG/MAJOR: plock: fix major bug in pl_take_w() introduced with EBO When EBO was brought to pl_take_w() by plock commit 60d750d ("plock: use EBO when waiting for readers to leave in take_w() and stow()"), a mistake was made: the mask against which the current value of the lock is tested excludes the first reader like in stow(), but it must not because it was just obtained via an ldadd() which means that it doesn't count itself. The problem this causes is that if there is exactly one reader when a writer grabs the lock, the writer will not wait for it to leave before starting its operations. The solution consists in checking for any reader in the IF. However the mask passed to pl_wait_unlock_*() must still exclude the lowest bit as it's verified after a subsequent load. Kudos to Remi Tricot-Le Breton for reporting and bisecting this issue with a reproducer. No backport is needed since this was brought in 2.9-dev3 with commit `8178a5211` ("MAJOR: threads/plock: update the embedded library again"). The code is now on par with plock commit ada70fe.	2023-10-03 08:28:12 +02:00
Amaury Denoyelle	337c71423f	MINOR: connection: define mux flag for reverse support Add a new MUX flag MX_FL_REVERSABLE. This value indicates that a MUX instance supports connection reversal. For the moment, only the HTTP/2 multiplexer is flagged with it. This makes it possible to dynamically check whether reversal can be completed during MUX installation. It will allow relaxing the configuration requirement for 'tcp-request session attach-srv', which currently cannot be mixed with non-HTTP/2 listener instances, even if used conditionally with an ACL.	2023-09-29 18:09:08 +02:00
Amaury Denoyelle	ac1164de7c	MINOR: connection: define error for reverse connect Define a new connection error code CO_ER_REVERSE. This will be used to report an issue which happens on a connection targeted for reversal before the reverse process is completed.	2023-09-29 18:08:26 +02:00
Emeric Brun	3c250cb847	Revert "BUG/MEDIUM: quic: missing check of dcid for init pkt including a token" This reverts commit `072e774939`. Doing h2load with h3 tests we notice this behavior: Client ---- INIT no token SCID = a , DCID = A ---> Server (1) Client <--- RETRY+TOKEN DCID = a, SCID = B ---- Server (2) Client ---- INIT+TOKEN SCID = a , DCID = B ---> Server (3) Client <--- INIT DCID = a, SCID = C ---- Server (4) Client ---- INIT+TOKEN SCID = a, DCID = C ---> Server (5) With (5) dropped by haproxy due to token validation. Indeed the previous patch adds the SCID of the sent retry packet to the AAD used for token ciphering. It was useful to validate that the next INIT packets including the token are sent by the client using the newly provided SCID for DCID as mentioned in RFC 9000. But this stateless information is lost on received INIT packets following the first outgoing INIT packet from the server because the client is also supposed to re-use a second time the latest received SCID for its new DCID. This will break the token validation on those last packets and they will be dropped by haproxy. It was discussed here: https://mailarchive.ietf.org/arch/msg/quic/7kXVvzhNCpgPk6FwtyPuIC6tRk0/ To summarize: it is not the role of the server to verify the re-use of retry's SCID for DCID in further client's INIT packets. The previous patch must be reverted in all versions where it was backported (supposed until 2.6)	2023-09-29 09:27:22 +02:00
Willy Tarreau	d956db6638	CLEANUP: stream: remove the now unused stream_dump() function It was superseded by strm_dump_to_buffer() which provides much more complete information and supports anonymizing.	2023-09-29 09:20:27 +02:00
Willy Tarreau	c185bc4656	MEDIUM: stream: now provide full stream dumps in case of loops When a stream is caught looping, we produce some output to help figure its internal state explaining why it's looping. The problem is that this debug output is quite old and the info it provides are quite insufficient to debug a modern process, and since such bugs happen only once or twice a year the situation doesn't improve. On the other hand the output of "show sess all" is extremely detailed and kept up to date with code evolutions since it's a heavily used debugging tool. This commit replaces the call to the totally outdated stream_dump() with a call to strm_dump_to_buffer(), and removes the filters dump since they are already emitted there, and it now produces much more exploitable output: [ALERT] (5936) : A bogus STREAM [0x7fa8dc02f660] is spinning at 5653514 calls per second and refuses to die, aborting now! Please report this error to developers: 0x7fa8dc02f660: [28/Sep/2023:09:53:08.811818] id=2 proto=tcpv4 source=127.0.0.1:58306 flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x133f220, pend_pos=(nil) waiting=0 epoch=0x1 frontend=public (id=2 mode=http), listener=? 
(id=1) addr=127.0.0.1:4080 backend=public (id=2 mode=http) addr=127.0.0.1:61932 server=s1 (id=1) addr=127.0.0.1:7443 task=0x7fa8dc02fa40 (state=0x01 nice=0 calls=5749559 rate=5653514 exp=3s tid=1(1/1) age=1s) txn=0x7fa8dc02fbf0 flags=0x3000 meth=1 status=-1 req.st=MSG_DONE rsp.st=MSG_RPBEFORE req.f=0x4c rsp.f=0x00 scf=0x7fa8dc02f5f0 flags=0x00000482 state=EST endp=CONN,0x7fa8dc02b4b0,0x05004001 sub=1 rex=58s wex=<NEVER> h1s=0x7fa8dc02b4b0 h1s.flg=0x100010 .sd.flg=0x5004001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE .meth=GET status=0 .sd.flg=0x05004001 .sc.flg=0x00000482 .sc.app=0x7fa8dc02f660 .subs=0x7fa8dc02f608(ev=1 tl=0x7fa8dc02fae0 tl.calls=0 tl.ctx=0x7fa8dc02f5f0 tl.fct=sc_conn_io_cb) h1c=0x7fa8dc0272d0 h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc0273f0 .exp=<NEVER> co0=0x7fa8dc027040 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x12840c0 flags=0x00000300 fd=32 fd.state=20 updt=0 fd.tmask=0x2 scb=0x7fa8dc02fb30 flags=0x00001411 state=EST endp=CONN,0x7fa8dc0300c0,0x05000001 sub=1 rex=58s wex=<NEVER> h1s=0x7fa8dc0300c0 h1s.flg=0x4010 .sd.flg=0x5000001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE .meth=GET status=0 .sd.flg=0x05000001 .sc.flg=0x00001411 .sc.app=0x7fa8dc02f660 .subs=0x7fa8dc02fb48(ev=1 tl=0x7fa8dc02feb0 tl.calls=2 tl.ctx=0x7fa8dc02fb30 tl.fct=sc_conn_io_cb) h1c=0x7fa8dc02ff00 h1c.flg=0x80000000 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc030020 .exp=<NEVER> co1=0x7fa8dc02fcd0 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x133f220 flags=0x10000300 fd=33 fd.state=10421 updt=0 fd.tmask=0x2 req=0x7fa8dc02f680 (f=0x1840000 an=0x8000 pipe=0 tofwd=0 total=79) an_exp=<NEVER> buf=0x7fa8dc02f688 data=(nil) o=0 p=0 i=0 size=0 htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0 res=0x7fa8dc02f6d0 (f=0x80000000 an=0x1400000 pipe=0 tofwd=0 total=0) an_exp=<NEVER> buf=0x7fa8dc02f6d8 data=(nil) o=0 p=0 i=0 size=0 htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0 call trace(10): \| 0x59f2b7 [0f 
0b 0f 1f 80 00 00 00]: stream_dump_and_crash+0x1f7/0x2bf \| 0x5a0d71 [e9 af e6 ff ff ba 40 00]: process_stream+0x19f1/0x3a56 \| 0x68d7bb [49 89 c7 4d 85 ff 74 77]: run_tasks_from_lists+0x3ab/0x924 \| 0x68e0b4 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x374/0x6d6 \| 0x656f67 [83 3d f2 75 84 00 01 0f]: run_poll_loop+0x127/0x5a8 \| 0x6575d7 [48 8b 1d 42 50 5c 00 48]: main+0x1b22f7 \| 0x7fa8e0f35e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45 \| 0x7fa8e0e5a4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a Note that the output is subject to the global anon key so that IPs and object names can be anonymized if required. It could make sense to backport this and the few related previous patches next time such an issue is reported.	2023-09-29 09:20:27 +02:00
Willy Tarreau	5743eeea88	MINOR: stream: make stream_dump() always multi-line There used to be two working modes for this function, a single-line one and a multi-line one, the difference being made on the "eol" argument which could contain either a space or an LF (and with the prefix being adjusted accordingly). Let's get rid of the single-line mode as it's what limits the output contents because it's difficult to produce exploitable structured data this way. It was only used in the rare case of spinning streams and applets and these are the ones lacking info. Now a spinning stream produces: [ALERT] (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers: strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1 txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0 rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000 scf=0x24af280,EST,482 scb=0x24af430,EST,1411 af=(nil),0 sab=(nil),0 cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33) cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32) filters={} call trace(11): (...)	2023-09-29 09:20:27 +02:00
Willy Tarreau	48b2233d36	CLEANUP: freq_ctr: make all freq_ctr readers take a const Since 2.4-dev18 with commit `b4476c6a8` ("CLEANUP: freq_ctr: make arguments of freq_ctr_total() const"), most of the freq_ctr readers should be fine with a const, except that they were not updated to reflect this and they continue to force variable on some functions that call them. Let's update this. This could even be backported if needed.	2023-09-29 09:20:27 +02:00
Vladimir Vdovin	f8b81f6eb7	MINOR: support for http-request set-timeout client Added set-timeout for frontend side of session, so it can be used to set custom per-client timeouts if needed. Added cur_client_timeout to fetch client timeout samples.	2023-09-28 08:49:22 +02:00
Amaury Denoyelle	b9bb3b932c	MINOR: proto_reverse_connect: emit log for preconnect Add reporting using send_log() for preconnect operation. This is minimal to ensure we understand the current status of listener in active reverse connect. To limit logging quantity, only important transitions are considered. This requires implementing a minimal state machine as a new field in receiver structure. Here are the logs produced: * Initiating : first time preconnect is enabled on a listener * Error : last preconnect attempt interrupted on a connection error * Reaching maxconn : all necessary connections were reversed and are operational on a listener	2023-09-22 17:21:53 +02:00
Amaury Denoyelle	1f43fb71be	MINOR: proto_reverse_connect: refactor preconnect failure When a connection is freed during preconnect before reversal, the error must be notified to the listener to remove any connection reference and rearm a new preconnect attempt. Currently, this can occur through 2 code paths: * conn_free() called directly by H2 mux * error during conn_create_mux(). For this case, connection is flagged with CO_FL_ERROR and reverse_connect task is woken up. The process task handler is then responsible to call conn_free() for such connection. Duplicated steps were done both in conn_free() and process task handler. These are now removed. To facilitate code maintenance, dedicated operations have been centralized in a new function rev_notify_preconn_err() which is called by conn_free().	2023-09-22 16:43:36 +02:00
Emeric Brun	27b2fd2e06	MINOR: quic: handle external extra CIDs generator. This patch adds the ability to externalize and customize the code of the computation of extra CIDs after the first one was derived from the ODCID. This is to prepare interoperability with extra components such as different QUIC proxies or routers for instance. To do so, the patch defines two function callbacks: - the first one to compute a 64-bit hash from the first generated CID (itself continues to be derived from ODCID). Resulting hash is stored into the 'quic_conn' and 64 bits is chosen large enough to be able to store an entire haproxy's CID. - the second callback re-uses the previously computed hash to derive an extra CID using the custom algorithm. If not set haproxy will continue to choose a randomized CID value. Those two functions also have the 'cluster_secret' passed as an argument: this way, it is usable for obfuscation or ciphering.	2023-09-22 10:32:14 +02:00
Aurelien DARRAGON	acb7d8a89c	MINOR: pattern: fix pat_{parse,match}_ip() function comments Function comments were outdated, probably because they have not been updated during the previous refactors. Fixing comments to better reflect the current behavior. This may be backported up to 2.2, or even 2.0 by slightly adapting the patch (in 2.0, such functions are documented in proto/pattern.h)	2023-09-21 09:50:55 +02:00
Willy Tarreau	cbbee15462	CLEANUP: ring: rename the ring lock "RING_LOCK" instead of "LOGSRV_LOCK" The ring lock was initially mostly used for the logs and used to inherit its name in lock stats. Now that it's exclusively used by rings, let's rename it accordingly.	2023-09-20 21:38:33 +02:00
Willy Tarreau	cec8b42cb3	MEDIUM: logs: atomically check and update the log sample index The log server lock is pretty visible in perf top when using log samples because it's taken for each server in turn while trying to validate and update the log server's index. Let's change this for a CAS, since we have the index and the range at hand now. This allows us to remove the logsrv lock. The test on 4 servers now shows a 3.7 times improvement thanks to much lower contention. Without log sampling a test producing 4.4M logs/s delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel. After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would previously drop to 1.13M log/s with 37 CPUs used and 75% spent in process_send_log(). Now with this change, 4.25M logs/s are emitted, using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput improvement for a 30% global CPU usage reduction, but in practice it mostly shows that the performance drop caused by having samples is much less noticeable (each of the 4 servers has its index updated for each log). Note that in order to even avoid incrementing an index for each log srv that is consulted, it would be more convenient to have a single index per frontend and apply the modulus on each log server in turn to see if the range has to be updated. It would then only perform one write per range switch. However the place where this is done doesn't have access to a frontend, so some changes would need to be performed for this, and it would require to update the current range independently in each logsrv, which is not necessarily easier since we don't know yet if we can commit it.	2023-09-20 21:38:33 +02:00
Willy Tarreau	e00470378b	MINOR: logs: use a single index to store the current range and index By using a single long long to store both the current range and the next index, we'll make it possible to perform atomic operations instead of locking. Let's only regroup them for now under a new "curr_rg_idx". The upper word is the range, the lower is the index.	2023-09-20 21:38:33 +02:00
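The packing described in e00470378b (range in the upper word, index in the lower word of one 64-bit value) can be sketched in Python as follows. The helper names are hypothetical and the 32/32 bit split is an assumption drawn from "a single long long"; the real field is the C member "curr_rg_idx".

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # assumed width of each "word"


def pack_rg_idx(rng: int, idx: int) -> int:
    """Pack the range into the upper 32 bits and the index into the lower 32."""
    return ((rng & MASK32) << 32) | (idx & MASK32)


def unpack_rg_idx(value: int) -> Tuple[int, int]:
    """Split a packed 64-bit value back into (range, index)."""
    return (value >> 32) & MASK32, value & MASK32
```

Keeping both fields in one machine word is what later allows a single compare-and-swap to validate and update them together instead of taking a lock.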
Willy Tarreau	3f1284560f	MINOR: log: remove the unused curr_idx in struct smp_log_range This index is useless because it only serves to know when the global index reached the end, while the global one already knows it. Let's just drop it and perform the test on the global range. It was verified with the following config that the first server continues to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the 4th one 4/10: log 127.0.0.1:10001 sample 1:10 local0 log 127.0.0.1:10002 sample 2,5:10 local0 log 127.0.0.1:10003 sample 3,7,9:10 local0 log 127.0.0.1:10004 sample 4,6,8,10:10 local0	2023-09-20 21:38:33 +02:00
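The 1/10, 2/10, 3/10, 4/10 split verified in 3f1284560f can be reproduced with a simplified Python model. The function names are hypothetical, and the model only handles the comma-separated position lists shown in that configuration; it does not mirror the C code in process_send_log().

```python
from collections import Counter


def parse_sample(spec: str):
    """Parse 'a,b,c:size' into (set of 1-based positions, size)."""
    positions, size = spec.split(":")
    return {int(p) for p in positions.split(",")}, int(size)


def distribute(samples: dict, n_logs: int) -> Counter:
    """Count how many of n_logs each log server would receive.

    Simplified model: every server sees a 1-based index that wraps at
    'size' and takes the log when the index is one of its positions.
    """
    parsed = {srv: parse_sample(spec) for srv, spec in samples.items()}
    counts = Counter()
    for n in range(n_logs):
        for srv, (positions, size) in parsed.items():
            if (n % size) + 1 in positions:
                counts[srv] += 1
    return counts
```

Feeding it the four "sample" specifications from the commit and 1000 logs yields 100, 200, 300 and 400 logs per server respectively.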
Willy Tarreau	4351364700	MINOR: logs: clarify the check of the log range The test of the log range is not very clear, in part due to the reuse of the "curr_idx" name that happens at two levels. The call to in_smp_log_range() applies to the smp_info's index to which 1 is added: it verifies that the next index is still within the current range. Let's just have a local variable "next_index" in process_send_log() that gets assigned the next index (current+1) and compare it to the current range's boundaries. This makes the test much clearer. We can then simply remove in_smp_log_range() that's no longer needed.	2023-09-20 21:38:33 +02:00
Willy Tarreau	6cbb5a057b	Revert "MAJOR: import: update mt_list to support exponential back-off" This reverts commit `c618ed5ff4`. The list iterator is broken. As found by Fred, running QUIC single-threaded shows that only the first connection is accepted because the accepter relies on the element being initialized once detached (which is expected and matches what MT_LIST_DELETE_SAFE() used to do before). However while doing this in the quic_sock code seems to work, doing it inside the macro shows total breakage and the unit test doesn't work anymore (random crashes). Thus it looks like the fix is not trivial, let's roll this back for the time it will take to fix the loop.	2023-09-15 17:13:43 +02:00
Willy Tarreau	e3b2704e26	BUG/MINOR: freq_ctr: fix possible negative rate with the scaled API In 1.9 with commit `627505d36` ("MINOR: freq_ctr: add swrate_add_scaled() to work with large samples") we got the ability to indicate when adding some values that they represent a number of samples. However there is an issue in the calculation which is that the number of samples that is added to the sum before the division in order to avoid fading away too fast, is multiplied by the scale. The problem it causes is that this is done in the negative part of the expression, and that as soon as the sum of old_sum and v*s is too small (e.g. zero), we end up with a negative value of -s. This is visible in "show pools" which occasionally reports a very large value on "needed_avg" since 2.9, though the bug has been there for longer. Indeed in 2.9 since they're hashed in buckets, it suffices that any thread got one such error once for the sum to be wrong. One possible impact is memory usage not shrinking after a short burst due to pools refraining from releasing objects, believing they don't have enough. This must be backported to all versions. Note that the opportunistic version can be dropped before 2.8.	2023-09-14 11:09:07 +02:00
Willy Tarreau	c618ed5ff4	MAJOR: import: update mt_list to support exponential back-off The new mt_list code supports exponential back-off on conflict, which is important for use cases where there is contention on a large number of threads. The API evolved a little bit and required some updates: - mt_list_for_each_entry_safe() is now in upper case to explicitly show that it is a macro, and only uses the back element, doesn't require a secondary pointer for deletes anymore. - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has to set the list iterator to NULL so that it is not re-inserted into the list and the list is spliced there. One must be careful because it was usually performed before freeing the element. Now instead the element must be nulled before the continue/break. - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been unclear. They were replaced by mt_list_cut_around() and mt_list_connect_elem() which more explicitly detach the element and reconnect it into the list. - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is in list.h. It may however possibly benefit from being upstreamed. This required tiny adaptations to event_hdl.c and quic_sock.c. The test case was updated and the API doc added. Note that in order to keep include files small, the struct mt_list definition remains in list-t.h (part of the internal API) and was ifdef'd out in mt_list.h. A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with 80 threads shows a drastic reduction of CPU usage thanks to this and the refined memory barriers. Please note that the CPU usage on OpenSSL 3.0.9 is significantly higher due to the excessive use of atomic ops by openssl, but 3.1 is only slightly above 1.1.1 though: - before: 35 Gbps, 3.5 Mpps, 7800% CPU - after: 41 Gbps, 4.2 Mpps, 2900% CPU	2023-09-13 11:50:33 +02:00
Frédéric Lécaille	84757e32e6	BUG/MEDIUM: quic: quic_cc_conn ->cntrs counters unreachable This bug arrived with this commit in 2.9-dev3: MEDIUM: quic: Allow the quic_conn memory to be asap released. When sending packets from quic_cc_conn_io_cb(), e.g. when the quic_conn object has been released and replaced by a lighter one (quic_cc_conn), some counters may have to be incremented. But they were not reachable because they were not shared between the quic_conn and quic_cc_conn structs. To fix this, one only has to move the ->cntrs counters from quic_conn to the QUIC_CONN_COMMON struct, which is shared between both quic_conn and quic_cc_conn. Thank you to Tristan for having reported this in GH #2247. No need to backport.	2023-09-12 18:13:36 +02:00
Willy Tarreau	efc46dede9	DEBUG: pools: inspect pools on fatal error and dump information found It's a bit frustrating sometimes to see pool checks catch a bug but not provide exploitable information without a core. Here we're adding a function "pool_inspect_item()" which is called just before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and which will display the error type, the pool's pointer and name, and will try to check if the item's tag matches the pool, and if not, will iterate over all pools to see if one would be a better candidate, then will try to figure the last known caller and possibly other likely candidates if the pool's tag is not sufficiently trusted. This typically helps better diagnose corruption in use-after-free scenarios, or freeing to a pool that differs from the one the object was allocated from, and will also indicate calling points that may help figure where an object was last released or allocated. The info is printed on stderr just before the backtrace. For example, the recent off-by-one test in the PPv2 changes would have produced the following output in vtest logs: * h1 debug\|FATAL: pool inconsistency detected in thread 1: tag mismatch on free(). * h1 debug\| caller: 0x62bb87 (conn_free+0x147/0x3c5) * h1 debug\| pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1) * h1 debug\|Tag does not match. Possible origin pool(s): * h1 debug\| tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1) * h1 debug\|Recorded caller if pool 'pp_tlv_128': *** h1 debug\| @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24) A mismatch in the allocated/released pool is already visible, and the callers confirm it once resolved, where the allocator indeed allocates from pp_tlv_128 and conn_free() releases to pp_tlv_256: $ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d' 0x000000000062bb87: conn_free at connection.c:568 0x000000000062c76d: conn_recv_proxy at connection.c:1177	2023-09-11 15:46:14 +02:00
Willy Tarreau	f6bee5a50b	DEBUG: pools: make pool_check_pattern() take a pointer to the pool This will be useful to report detailed bug traces.	2023-09-11 15:19:49 +02:00
Willy Tarreau	e92e96b00f	DEBUG: pools: pass the caller pointer to the check functions and macros In preparation for more detailed pool error reports, let's pass the caller pointers to the check functions. This will be useful to produce messages indicating where the issue happened.	2023-09-11 15:19:49 +02:00
Willy Tarreau	baf2070421	DEBUG: pools: always record the caller for uncached allocs as well When recording the caller of a pool_alloc(), we currently store it only when the object comes from the cache and never when it comes from the heap. There's no valid reason for this except that the caller's pointer was not passed to pool_alloc_nocache(), so it used to set NULL there. Let's just pass it down the chain.	2023-09-11 15:19:49 +02:00
Willy Tarreau	4a18d9e560	REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c These ones were still in cfgparse.c but they're not specific to the config at all and may actually be used even when parsing cpu list entries in /sys. Better move them where they can be reused.	2023-09-08 16:25:19 +02:00
Willy Tarreau	5119109e3f	MINOR: cpuset: dynamically allocate cpu_map cpu_map is 8.2kB/entry and there's one such entry per group, that's ~520kB total. In addition, the init code is still in haproxy.c enclosed in ifdefs. Let's make this a dynamically allocated array in the cpuset code and remove that init code. Later we may even consider reallocating it once the number of threads and groups is known, in order to shrink it a little bit, as the typical setup with a single group will only need 8.2kB, thus saving half a MB of RAM. This would require that the upper bound is placed in a variable though.	2023-09-08 16:25:19 +02:00
Willy Tarreau	1f2433fb6a	MINOR: tools: add function read_line_to_trash() to read a line of a file This function takes on input a printf format for the file name, making it particularly suitable for /proc or /sys entries which take a lot of numbers. It also automatically trims the trailing CR and/or LF chars.	2023-09-08 16:25:19 +02:00
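The behaviour described for read_line_to_trash() in 1f2433fb6a (printf-style file name, first line only, trailing CR/LF trimmed) can be sketched in Python. The helper below is hypothetical and illustrates the idea only; it is not a port of the C function, and its error handling (letting exceptions propagate) is an assumption.

```python
def read_line(path_fmt: str, *args) -> str:
    """Read the first line of a file whose name is built printf-style.

    Hypothetical helper: trailing CR and/or LF characters are removed.
    """
    path = path_fmt % args
    # newline="" keeps the original line ending so it can be trimmed here.
    with open(path, "r", newline="") as f:
        line = f.readline()
    return line.rstrip("\r\n")
```

A call such as `read_line("/sys/devices/system/cpu/cpu%d/online", 3)` shows why a format string is convenient for /proc and /sys entries that embed numbers; that particular path is only an example and may not exist on every system.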
Frédéric Lécaille	e3e218b98e	CLEANUP: quic: Remove useless free_quic_tx_pkts() function. This function is defined but no longer used since this commit: BUG/MAJOR: quic: Really ignore malformed ACK frames.	2023-09-08 10:17:25 +02:00
Frédéric Lécaille	292dfdd78d	BUG/MINOR: quic: Wrong cluster secret initialization The function generate_random_cluster_secret() which initializes the cluster secret when not supplied by configuration is buggy. There is a 1/256 chance that the cluster secret string is empty. To fix this, the cluster secret is stored as the first 128 bits of its own SHA1 (160 bits) digest, if defined by configuration. If this is not the case, it is initialized with a 128-bit random value. Thus the cluster secret is always initialized. As a result, several tests are from now on useless. This patch removes such tests (if(global.cluster_secret)) in the QUIC code part and at parsing time: no need to check that a cluster secret was initialized with "quic-force-retry" option. Must be backported as far as 2.6.	2023-09-08 09:50:58 +02:00
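The derivation described in 292dfdd78d can be sketched in Python: the first 128 bits of the SHA1 digest when a secret is configured, 128 random bits otherwise. The function name `derive_cluster_secret` and the UTF-8 encoding of the configured string are assumptions for illustration, not the C code.

```python
import hashlib
import os


def derive_cluster_secret(configured=None) -> bytes:
    """Return a 16-byte (128-bit) cluster secret.

    configured: the secret string from the configuration, or None.
    """
    if configured is not None:
        # First 128 bits of the 160-bit SHA1 digest of the configured value.
        return hashlib.sha1(configured.encode("utf-8")).digest()[:16]
    # No configured secret: fall back to 128 random bits.
    return os.urandom(16)
```

Either branch always returns 16 bytes, which is why callers no longer need to test whether a secret was set.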
William Lallemand	15e591b6e0	MINOR: ssl: add support for 'curves' keyword on server lines This patch implements the 'curves' keyword on server lines as well as the 'ssl-default-server-curves' keyword in the global section. It also adds the keyword on the server line in the ssl_curves reg-test. These keywords allow the configuration of the curves list for a server.	2023-09-07 23:29:10 +02:00
Willy Tarreau	28ff1a5d56	MINOR: tasks/stats: report the number of niced tasks in "show info" We currently know the number of tasks in the run queue that are niced, and we don't expose it. It's too bad because it can give a hint about what share of the load is relevant. For example if one runs a Lua script that was purposely reniced, or if a stats page or the CLI is hammered with slow operations, seeing them appear there can help identify what part of the load is not caused by the traffic, and improve monitoring systems or autoscalers.	2023-09-06 17:44:44 +02:00
Remi Tricot-Le Breton	e03d060aa3	MINOR: cache: Change hash function in default normalizer used in case of "vary" When building the secondary signature for cache entries when vary is enabled, the referer part of the signature was a simple crc32 of the first referer header. This patch changes it to a 64-bit hash based on the xxhash algorithm with a random seed built during init. This will prevent "malicious" hash collisions between entries of the cache.	2023-09-06 16:11:31 +02:00
Aurelien DARRAGON	d9b81e5b49	MEDIUM: log/sink: make logsrv postparsing more generic We previously had postparsing logic but only for logsrv sinks, but now we need to make this operation on logsrv directly instead of sinks to prepare for additional postparsing logic that is not sink-specific. To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs() to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents. Then, we split postresolve_logsrv_list() so that the sink-only logic stays in sink.c (sink_resolve_logsrv_buffer() function), and the "generic" target part stays in log.c as resolve_logsrv(). Error messages formatting was preserved as far as possible but some slight variations are to be expected. As for the functional aspect, no change should be expected.	2023-09-06 16:06:39 +02:00
Aurelien DARRAGON	969e212c66	MINOR: log: add dup_logsrv() helper function Ease code maintenance by introducing dup_logsrv() helper function to properly duplicate an existing logsrv struct.	2023-09-06 16:06:39 +02:00
Aurelien DARRAGON	d499485aa9	MINOR: sink: simplify post_sink_resolve function Simplify post_sink_resolve() function to reduce code duplication and make it easier to maintain.	2023-09-06 16:06:39 +02:00
Aurelien DARRAGON	5b295ff409	MINOR: ring: add a function to compute max ring payload Add a helper function to the ring API to compute the maximum payload length that could fit into the ring based on ring size.	2023-09-06 16:06:39 +02:00
Christopher Faulet	3ec156f027	BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer All applets only check the -1 error value (need room) for applet_put* functions while the underlying functions may also return -2 if the input is closed or -3 if the data length is invalid. It means applets already handle other cases by their own. The API should be fixed but for now, to ease backports, we only fix applet_put* functions to always return -1 on error. This way, at least for the applets point of view, the API is consistent. This patch should be backported to 2.8. Probably not further. Except if we suspect it could fix a bug.	2023-09-06 09:29:27 +02:00
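The API normalization described in 3ec156f027 can be sketched in Python: whatever negative code the underlying function reports, the applet-facing wrapper returns -1. The names below are hypothetical; only the meaning of -1, -2 and -3 comes from the commit message.

```python
NEED_ROOM = -1     # not enough room in the buffer
INPUT_CLOSED = -2  # the input is closed
INVALID_LEN = -3   # the data length is invalid


def applet_put(low_level_put, data: bytes) -> int:
    """Call the underlying put function and flatten every error to -1.

    low_level_put is a hypothetical callable returning a non-negative
    value on success or one of the negative codes above.
    """
    ret = low_level_put(data)
    return -1 if ret < 0 else ret
```

Since applets only ever tested for -1, collapsing the other codes makes the wrapper consistent with how it is actually used.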
Frédéric Lécaille	fb4294be55	BUG/MINOR: quic: Wrong RTT computation (srtt and rtt_var) Due to the fact that several variable values (rtt_var, srtt) were stored as multiples of their real values, some calculations were less accurate than expected. Stop storing 4*rtt_var values, and 8*srtt values. Adjust all the impacted statements. Must be backported as far as 2.6.	2023-09-05 17:14:51 +02:00
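For reference, the unscaled estimator from RFC 9002 section 5.3 that srtt and rtt_var are meant to follow can be sketched in Python. The sketch ignores the ack-delay adjustment and min_rtt handling, uses floats rather than the integers of the C code, and the function name `update_rtt` is hypothetical.

```python
def update_rtt(srtt, rttvar, latest_rtt):
    """Return the new (srtt, rttvar) after one RTT sample.

    Pass srtt=None for the first sample of a connection.
    """
    if srtt is None:
        # First sample: srtt is the sample, rttvar is half of it.
        return latest_rtt, latest_rtt / 2
    # rttvar is updated first, using the previous srtt.
    rttvar = 0.75 * rttvar + 0.25 * abs(srtt - latest_rtt)
    srtt = 0.875 * srtt + 0.125 * latest_rtt
    return srtt, rttvar
```

Storing 8 times srtt and 4 times rtt_var lets these updates be done with shifts, but every reader then has to remember to scale back, which is the source of inaccuracy the commit removes.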
William Lallemand	d90d3bf894	MINOR: global: export the display_version() symbol Export the display_version() function which can be used elsewhere than in haproxy.c	2023-09-05 15:24:39 +02:00
Willy Tarreau	86854dd032	MEDIUM: threads: detect excessive thread counts vs cpu-map This detects when there are more threads bound via cpu-map than CPUs enabled in cpu-map, or when there are more total threads than the total number of CPUs available at boot (for unbound threads) and configured for bound threads. In this case, a warning is emitted to explain the problems it will cause, and explaining how to address the situation. Note that some configurations will not be detected as faulty because the algorithmic complexity to resolve all arrangements grows in O(N!). This means that having 3 threads on 2 CPUs and one thread on 2 CPUs will not be detected as it's 4 threads for 4 CPUs. But at least configs such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning since they're valid.	2023-09-04 19:39:17 +02:00
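The bound-thread part of the check in 86854dd032 can be approximated in Python by comparing the number of bound threads with the size of the union of their CPU sets. This is a simplified reading of the commit message with a hypothetical function name, not the C implementation.

```python
def too_many_bound_threads(cpu_map: dict) -> bool:
    """Return True when more threads are bound than CPUs are enabled.

    cpu_map maps a thread name to the set of CPUs it may run on.
    """
    all_cpus = set().union(*cpu_map.values())
    return len(cpu_map) > len(all_cpus)
```

With the commit's example T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4), four threads share four CPUs and nothing is reported. Three threads on CPUs {0,1} plus one thread on {2,3} also pass, which is the undetected case the commit mentions.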
Willy Tarreau	8357f950cb	MEDIUM: threads: detect incomplete CPU bindings It's very easy to mess up with some cpu-map directives and to leave some thread unbound. Let's add a test that checks that either all threads are bound or none are bound, but that we do not face the intermediary situation where some are pinned and others are left wandering around, possibly on the same CPUs as bound ones. Note that this should not be backported, or maybe turned into a notice only, as it appears that it will easily catch invalid configs and that may break updates for some users.	2023-09-04 19:39:17 +02:00
Willy Tarreau	e65f54cf96	MINOR: cpuset: centralize a reliable bound cpu detection Till now the CPUs that were bound were only retrieved in thread_cpus_enabled() in order to count the number of CPUs allowed, and it relied on arch-specific code. Let's slightly arrange this into ha_cpuset_detect_bound() that reuses the ha_cpuset struct and the accompanying code. This makes the code much clearer without having to carry along some arch-specific stuff out of this area. Note that the macos-specific code used in thread.c to only count online CPUs but not retrieve a mask, so for now we can't infer anything from it and can't implement it. In addition and more importantly, this function is reliable in that it will only return a value when the detection is accurate, and will not return incomplete sets on operating systems where we don't have an exact list, such as online CPUs.	2023-09-04 19:39:17 +02:00
Willy Tarreau	d3ecc67a01	MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets This operation was not implemented and will be needed later.	2023-09-04 19:39:17 +02:00
Willy Tarreau	eb10567254	MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set This function will be convenient to test for the presence of a given CPU in a set.	2023-09-04 19:39:17 +02:00
Willy Tarreau	17a7baca07	BUILD: bug: make BUG_ON() void to avoid a rare warning When building without threads, the recently introduced BUG_ON(tid != 0) turns to a constant expression that evaluates to 0 and that is not used, resulting in this warning: src/connection.c: In function 'conn_free': src/connection.c:584:3: warning: statement with no effect [-Wunused-value] This is because the whole thing is declared as an expression for clarity. Make it return void to avoid this. No backport is needed.	2023-09-04 19:38:51 +02:00
Andrew Hopkins	b3f94f8b3b	BUILD: ssl: Build with new cryptographic library AWS-LC This adds a new option for the Makefile USE_OPENSSL_AWSLC, and updates the documentation with instructions to use HAProxy with AWS-LC. Update the type of the OCSP callback retrieved with SSL_CTX_get_tlsext_status_cb with the actual type for libcrypto versions greater than 1.0.2. This doesn't affect OpenSSL which casts the callback to void* in SSL_CTX_ctrl.	2023-09-04 18:19:18 +02:00
Christopher Faulet	b50a471adb	BUG/MEDIUM: stconn: Don't block sends if there is a pending shutdown For the same reason as the previous patch, we must not block the sends when there is a pending shutdown. In other words, we must consider the sends are allowed when there is a pending shutdown. This patch must slowly be backported as far as 2.2. It should partially fix issue #2249.	2023-09-01 14:18:26 +02:00
Willy Tarreau	844a3bc25b	MEDIUM: checks: implement a queue in order to limit concurrent checks The progressive adoption of OpenSSL 3 and its abysmal handshake performance has started to reveal situations where it simply isn't possible anymore to successfully run health checks on many servers, because between the moment all the checks are started and the moment the handshake finally completes, the timeout has expired! This also has consequences on production traffic which gets significantly delayed as well, all that for lots of checks. While it's possible to increase the check delays, it doesn't solve everything as checks still take a huge amount of time to converge in such conditions. Here we take a different approach by permitting to enforce the maximum concurrent checks per thread limitation and implementing an ordered queue. Thanks to this, if a thread about to start a check has reached its limit, it will add the check at the end of a queue and it will be processed once another check is finished. This proves to be extremely efficient, with all checks completing in a reasonable amount of time and not being disturbed by the rest of the traffic from other checks. They're just cycling slower, but at the speed the machine can handle. One must understand however that if some complex checks perform multiple exchanges, they will take a check slot for all the required duration. This is why the limit is not enforced by default. Tests on SSL show that a limit of 5-50 checks per thread on local servers gives excellent results already, so that could be a good starting point.	2023-09-01 14:00:04 +02:00
Willy Tarreau	cfc0bceeb5	MEDIUM: checks: search more aggressively for another thread on overload When the current check is overloaded (more running checks than the configured limit), we'll try more aggressively to find another thread. Instead of just opportunistically looking for one half as loaded, now if the current thread has more than 1% more active checks than another one, or has more than a configured limit of concurrent running checks, it will search for a more suitable thread among 3 other random ones in order to migrate the check there. The number of migrations remains very low (~1%) and the checks load very fair across all threads (~1% as well). The new parameter is called tune.max-checks-per-thread.	2023-09-01 08:26:06 +02:00
Willy Tarreau	00de9e0804	MINOR: checks: maintain counters of active checks per thread Let's keep two check counters per thread: - one for "active" checks, i.e. checks that are no more sleeping and are assigned to the thread. These include sleeping and running checks ; - one for "running" checks, i.e. those which are currently executing on the thread. By doing so, we'll be able to spread the health checks load a bit better and refrain from sending too many at once per thread. The counters are atomic since a migration increments the target thread's active counter. These numbers are reported in "show activity", which allows to check per thread and globally how many checks are currently pending and running on the system. Ideally, we should only consider checks in the process of establishing a connection since that's really the expensive part (particularly with OpenSSL 3.0). But the inner layers are really not suitable to doing this. However knowing the number of active checks is already a good enough hint.	2023-09-01 08:26:06 +02:00
Willy Tarreau	3b7942a1c9	MINOR: check/activity: collect some per-thread check activity stats We now count the number of times a check was started on each thread and the number of times a check was adopted. This helps understand better what is observed regarding checks.	2023-09-01 08:26:06 +02:00
Willy Tarreau	e03d05c6ce	MINOR: check: remember when we migrate a check The goal here is to explicitly mark that a check was migrated so that we don't do it again. This will allow us to perform other actions on the target thread while still knowing that we don't want to be migrated again. The new READY bit combine with SLEEPING to form 4 possible states: SLP RDY State Description 0 0 - (reserved) 0 1 RUNNING Check is bound to current thread and running 1 0 SLEEPING Check is sleeping, not bound to a thread 1 1 MIGRATING Check is migrating to another thread Thus we set READY upon migration, and check for it before migrating, this is sufficient to prevent a second migration. To make things a bit clearer, the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are adjacent.	2023-09-01 08:26:06 +02:00
Willy Tarreau	7163f95b43	MINOR: checks: start the checks in sleeping state The CHK_ST_SLEEPING state was introduced by commit `d114f4a68` ("MEDIUM: checks: spread the checks load over random threads") to indicate that a check was not currently bound to a thread and that it could easily be migrated to any other thread. However it did not start the checks in this state, meaning that they were not redispatchable on startup. Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0) the cost of setting up new connections is so high that some threads may experience connection timeouts on startup. In this case it's better if they can transfer their excess load to other idle threads. By just marking the check as sleeping upon startup, we can do this and significantly reduce the number of failed initial checks.	2023-09-01 08:26:06 +02:00
Willy Tarreau	52b260bae4	MINOR: server/ssl: maintain an index of the last known valid SSL session When a thread creates a new session for a server, if none was known yet, we assign the thread id (hence the reused_sess index) to a shared variable so that other threads will later be able to find it when they don't have one yet. For now we only set and clear the pointer upon session creation, we do not yet pick it. Note that we could have done it per thread-group, so as to avoid any cross-thread exchanges, but it's anticipated that this is essentially used during startup, at a moment where the cost of inter-thread contention is very low compared to the ability to restart at full speed, which explains why instead we store a single entry.	2023-08-31 08:50:01 +02:00
Willy Tarreau	607041dec3	MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session The goal will be to permit a thread to update its session while having it shared with other threads. For now we only place the lock and arrange the code around it so that this is quite light. For now only the owner thread uses this lock so there is no contention. Note that there is a subtlety in the openssl API regarding i2s_SSL_SESSION() in that it fills the area pointed to by its argument with a dump of the session and returns a size that's equal to the previously allocated one. As such, it does modify the shared area even if that's not obvious at first glance.	2023-08-31 08:50:01 +02:00
Alexander Stephan	ece0d1ab49	MINOR: sample: Refactor fc_pp_authority by wrapping the generic TLV fetch We already have a call that can retrieve a TLV with any value. Therefore, the fetch logic is redundant and can be simplified by simply calling the generic fetch with the correct TLV ID set as an argument.	2023-08-29 15:31:51 +02:00
Alexander Stephan	fecc573da1	MEDIUM: connection: Generic, list-based allocation and look-up of PPv2 TLVs In order to be able to implement fetches in the future that allow retrieval of any TLVs, a new generic data structure for TLVs is introduced. Existing TLV fetches for PP2_TYPE_AUTHORITY and PP2_TYPE_UNIQUE_ID are migrated to use this new data structure. TLV related pools are updated to not rely on type, but only on size. Pools accommodate the TLV list element with their associated value. For now, two pools for 128 B and 256 B values are introduced. More fine-grained solutions are possible in the future, if necessary.	2023-08-29 15:15:47 +02:00
Alexander Stephan	c9d47652d2	CLEANUP/MINOR: connection: Improve consistency of PPv2 related constants This patch improves readability by scoping HAProxy related PPv2 constants with a 'HA' prefix. Besides, a new constant for the length of a CRC32C TLV is introduced. The length is derived from the PPv2 spec, so 32 bits.	2023-08-29 15:15:47 +02:00
Willy Tarreau	bd84387beb	MEDIUM: capabilities: enable support for Linux capabilities For a while there has been the constraint of having to run as root for transparent proxying, and we're starting to see some cases where QUIC is not running in socket-per-connection mode due to the missing capability that would be needed to bind a privileged port. It's not realistic to ask all QUIC users on port 443 to run as root, so instead let's provide a basic support for capabilities at least on Linux. The ones currently supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The mechanism was made OS-specific with a dedicated file because it really is. It can be easily refined later for other OSes if needed. A new keyword "setcap" is added to the global section, to enumerate the capabilities that must be kept when switching from root to non-root. This is ignored in other situations though. HAProxy has to be built with USE_LINUX_CAP=1 for this to be supported, which is enabled by default for linux-glibc, linux-glibc-legacy and linux-musl. A good way to test this is to start haproxy with such a config: global uid 1000 setcap cap_net_bind_service frontend test mode http timeout client 3s bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt and run it under "sudo strace -e trace=bind,setuid", then connecting there from an H3 client. The bind() syscall must succeed despite the user id having been switched.	2023-08-29 11:11:50 +02:00
William Lallemand	e7d9082315	BUG/MINOR: ssl/cli: can't find ".crt" files when replacing a certificate Bug was introduced by commit 26654 ("MINOR: ssl: add "crt" in the cert_exts array"). When looking for a .crt directly in the cert_exts array, the ssl_sock_load_pem_into_ckch() function will be called with an argument which does not have its ".crt" extension anymore. If "ssl-load-extra-del-ext" is used this is not a problem since we try to add the ".crt" when doing the lookup in the tree. However when directly using a ".crt" without this option, it will fail looking for the file in the tree. The fix removes the "crt" entry from the array since it does not seem to be really useful without a rework of all the lookups. Should fix issue #2265 Must be backported as far as 2.6.	2023-08-28 18:20:39 +02:00
Willy Tarreau	892d04733f	BUILD: import: guard plock.h against multiple inclusion Surprisingly there's no include guard in plock.h though there is one in atomic-ops.h. Let's add one, since otherwise we cannot risk including the file multiple times.	2023-08-26 17:28:08 +02:00
Amaury Denoyelle	5afcb686b9	MAJOR: connection: purge idle conn by last usage Backend idle connections are purged on a recurring basis during the process lifetime. An estimated number of needed connections is calculated and the excess is removed periodically. Before this patch, purge was done directly using the idle then the safe connection tree of a server instance. This has the major drawback of ignoring any specific order, so it may remove functional connections while leaving ones which will fail on the next reuse. The problem can be worse when using criteria to differentiate idle connections such as the SSL SNI. In this case, purge may remove connections with a high reuse rate while leaving connections whose criteria never matched once, thus drastically reducing the reuse rate. To improve this, introduce an alternative storage for idle connections, used in parallel with the idle/safe trees. Now, each connection inserted in one of these trees is also inserted in the new list at `srv_per_thread.idle_conn_list`. This guarantees that recently used connections are present at the end of the list. During the purge, use this list instead of the idle/safe trees. Remove first the connections at the front of the list, which were not reused recently. This will ensure that connections that are frequently reused are not purged and should increase the reuse rate, particularly if distinct idle connection criteria are in use.	2023-08-25 15:57:48 +02:00
Amaury Denoyelle	61fc9568fb	MINOR: server: move idle tree insert in a dedicated function Define a new function _srv_add_idle(). This is a simple wrapper to insert a connection in the server idle tree. This is reserved for simple usage and requires holding the idle_conns lock. In most cases, srv_add_to_idle_list() should be used. This patch does not have any functional change. However, it will help with the next patch as idle connections will always be inserted in a list as secondary storage along with idle/safe trees.	2023-08-25 15:57:48 +02:00
Amaury Denoyelle	77ac8eb4a6	MINOR: connection: simplify removal of idle conns from their trees Small change of API for conn_delete_from_tree(). Now the connection instance is taken as argument instead of its inner node. No functional change introduced with this commit. This slightly simplifies invocation of conn_delete_from_tree(). The most useful change is that this function will be extended in the next patch to be able to remove the connection from its new idle list at the same time as from its idle tree.	2023-08-25 15:57:48 +02:00
Frédéric Lécaille	81815a9a83	MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock. Replace ->lock type of pat_ref struct by HA_RWLOCK_T. Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK() (resp. HA_RWLOCK_WRUNLOCK()) when a write access is required. There is only one read access which is needed. This is in the "show map" command callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK). Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.	2023-08-25 15:42:03 +02:00
Frédéric Lécaille	745d1a269b	MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs) Store a pointer to the expression (struct pattern_expr) into the data structure used to chain/store the map element references (struct pat_ref_elt), e.g. the struct pattern_tree when stored into an ebtree or struct pattern_list when chained to a list. Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map and to look for the <elt> element passed as parameter to retrieve the sample data to be parsed. Indeed, thanks to the pointer added above to each pattern tree node or list element, they can all be inspected directly from the <elt> passed as parameter and its ->tree_head and ->list_head members: the pattern tree nodes are stored into elt->tree_head, and the pattern list elements are chained to the elt->list_head list. This inspection was also the job of pattern_find_smp() which is no longer useful. This patch removes the code of this function.	2023-08-25 15:41:59 +02:00
Frédéric Lécaille	0844bed7d3	MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs) Organize references to pattern elements of a map (struct pat_ref_elt) into an ebtree: - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its element (pat_ref_elt struct), - modify the code to insert these nodes into their ebtrees each time they are allocated. This is done in pat_ref_append(). Note that the ->head member (struct list) of the map (struct pat_ref) could have been removed. It was not, because it is still necessary to dump the map contents from the CLI in the order the map elements have been inserted. This patch also modifies http_action_set_map() which is the callback at least used by the "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt() is no longer ignored, but reused if not NULL by pat_ref_set() as the first element to look up from. The latter is also modified to use the ebtree attached to the map in place of the ->head list attached to each map element (pat_ref_elt struct). Also modify pat_ref_find_elt() to make it use the ->eb_root map ebtree added to the map by this patch in place of inspecting all the elements with a strcmp() call.	2023-08-25 15:41:56 +02:00
Amaury Denoyelle	5053e89142	MEDIUM: h2: prevent stream opening before connection reverse completed HTTP/2 demux must be handled with care for active reverse connection. Until accept has been completed, it should be forbidden to handle HEADERS frame as session is not yet ready to handle streams. To implement this, use the flag H2_CF_DEM_TOOMANY which blocks demux process. This flag is automatically set just after conn_reverse() invocation. The flag is removed on rev_accept_conn() callback via a new H2 ctl enum. H2 tasklet is woken up to restart demux process. As a side-effect, reporting in H2 mux may be blocked as demux functions are used to convert error status at the connection level with CO_FL_ERROR. To ensure error is reported for a reverse connection, check h2c_is_dead() specifically for this case in h2_wake(). This change also has its own side-effect : h2c_is_dead() conditions have been adjusted to always exclude !h2c->conn->owner condition which is always true for reverse connection or else H2 mux may kill them unexpectedly.	2023-08-24 17:03:08 +02:00
Amaury Denoyelle	47f502df5e	MEDIUM: proto_reverse_connect: bootstrap active reverse connection Implement active reverse connection initialization. This is done through a new task stored in the receiver structure. This task is instantiated via bind callback and first woken up via enable callback. Task handler is separated into two halves. On the first step, a new connection is allocated and stored in <pend_conn> member of the receiver. This new client connection will proceed to connect using the server instance referenced in the bind_conf. When connect has successfully been executed and HTTP/2 connection is ready for exchange after SETTINGS, reverse_connect task is woken up. As <pend_conn> is still set, the second half is executed, which only executes listener_accept(). This will in turn execute accept_conn callback which is defined to return the pending connection. The task is automatically requeued inside accept_conn callback if bind maxconn is not yet reached. This allows to specify how many connections should be opened. Each connection is instantiated and reversed serially one by one until maxconn is reached. conn_free() has been modified to handle failure if a reverse connection fails before being accepted. In this case, no session exists to notify about the failure. Instead, reverse_connect task is requeued with a 1 second delay, giving time to fix a possible network issue. This will allow to attempt a new connection reverse. Note that for the moment connection rebinding after accept is disabled for simplicity. Extra operations are required to migrate an existing connection and its stack to a new thread which will be implemented later.	2023-08-24 17:03:06 +02:00
Amaury Denoyelle	0747e493a0	MINOR: proto_reverse_connect: parse rev@ addresses for bind Implement parsing for "rev@" addresses on bind line. On config parsing, server name is stored on the bind_conf. Several new callbacks are defined on reverse_connect protocol to complete parsing. listen callback is used to retrieve the server instance from the bind_conf server name. If found, the server instance is stored on the receiver. Checks are implemented to ensure HTTP/2 protocol only is used by the server.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	008e8f67ee	MINOR: connection: extend conn_reverse() for active reverse Implement active reverse support inside conn_reverse(). This is used to transfer the connection from the backend to the frontend side. A new flag is defined CO_FL_REVERSED which is set just after this transition. This will be used to identify connections which were reversed but not yet accepted.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	5db6dde058	MINOR: proto: define dedicated protocol for active reverse connect A new protocol named "reverse_connect" is created. This will be used to instantiate connections that are opened by a reverse bind. For the moment, only a minimal set of callbacks are defined with no real work. This will be extended along the next patches.	2023-08-24 17:02:37 +02:00
Amaury Denoyelle	1723e21af2	MINOR: connection: use attach-srv name as SNI reuse parameter on reverse On connection passive reverse from frontend to backend, its hash node is calculated to be able to select it from the idle server pool. If attach-srv rule defined an associated name, reuse it as the value for SNI prehash. This change allows a client to select a reverse connection by its name by configuring its server line with a SNI to permit this.	2023-08-24 17:02:34 +02:00
Amaury Denoyelle	0b3758e18f	MINOR: tcp-act: define optional arg name for attach-srv Add an optional argument 'name' for attach-srv rule. This contains an expression which will be used as an identifier inside the server idle pool after reversal. To match this connection for a future transfer through the server, the SNI server parameter must match this name. If no name is defined, match will only occur with an empty SNI value. For the moment, only the parsing step is implemented. An extra check is added to ensure that the reverse server uses SSL with a SNI. Indeed, if name is defined but server does not use a SNI, connections will never be selected for reuse after reversal due to a hash mismatch.	2023-08-24 15:28:38 +02:00
Amaury Denoyelle	58cb76d7e1	MINOR: tcp-act: parse 'tcp-request attach-srv' session rule Create a new tcp-request session rule 'attach-srv'. The parsing handler is used to extract the server targeted with the notation 'backend/server'. The server instance is stored in the act_rule instance under the new union variant 'attach_srv'. Extra checks are implemented in parsing to ensure attach-srv is only used for proxy in HTTP mode and with listeners/server with no explicit protocol reference or HTTP/2 only. The action handler itself is really simple. It assigns the stored server instance to the 'reverse' member of the connection instance. It will be used in a future patch to implement passive reverse-connect.	2023-08-24 15:02:32 +02:00
Amaury Denoyelle	6e428dfaf2	MINOR: backend: only allow reuse for reverse server A reverse server relies solely on its pool of idle connection to transfer requests which will be populated through a new tcp-request rule 'attach-srv'. Several changes are required on connect_server() to implement this. First, reuse mode is forced to always for this type of server. Then, if no idle connection is found, the request will be aborted. This results with a 503 HTTP error code, similarly to when no server is available.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	e6223a3188	MINOR: server: define reverse-connect server Implement reverse-connect server. This server type cannot instantiate its own connection on transfer. Instead, it can only reuse connection from its idle pool. These connections will be populated using the future 'tcp-request session attach-srv' rule. A reverse-connect has no address. Instead, it uses a new custom server notation with '@' character prefix. For the moment, only '@reverse' is defined. An extra check is implemented to ensure server is used in a HTTP proxy.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	4fb538d4b6	MEDIUM: h2: reverse connection after SETTINGS reception Reverse connection after SETTINGS reception if it was set as reversible. This operation is done in a new function h2_conn_reverse(). It regroups common changes which are needed for both reversal directions : H2_CF_IS_BACK is set or unset and timeouts are inverted. For the moment, only passive reverse is fully implemented. Once done, the connection instance is directly inserted in its targeted server pool. It can then be used immediately for future transfers using this server.	2023-08-24 14:49:03 +02:00
Amaury Denoyelle	1f76b8ae07	MEDIUM: connection: implement passive reverse Define a new method conn_reverse(). This method is used to reverse a connection from frontend to backend or vice-versa depending on its initial status. For the moment, only passive reverse is implemented. This covers the transition from frontend to backend side. The connection is detached from its owner session which can then be freed. Then the connection is linked to the server instance. This applies only to passive connections on the frontend, to transfer them to the backend side, and requires freeing the connection's session after detaching the connection from it.	2023-08-24 14:44:33 +02:00
Amaury Denoyelle	fbe35afaa4	MINOR: proxy: simplify parsing 'backend/server' Several CLI handlers use a server argument specified with the format '<backend>/<server>'. The parsing of this argument is done in two steps, first splitting the string with '/' delimiter and then using get_backend_server() to retrieve the server instance. Refactor these code sections with the following changes : * splitting is reimplemented using ist API * get_backend_server() is removed. Instead use the already existing proxy_be_by_name() then server_find_by_name() which contains duplicated code with the now removed function. No functional change occurs with this commit. However, it will be useful to add new configuration options reusing the same '<backend>/<server>' for reverse connect.	2023-08-24 14:44:33 +02:00
Willy Tarreau	9b47ed1a93	IMPORT: xxhash: update xxHash to version 0.8.2 Peter Varkoly reported a build issue on ppc64le in xxhash.h. Our version (0.8.1) was the last one 9 months ago, and since then this specific issue was addressed in 0.8.2, so let's apply the maintenance update. This should be backported to 2.8 and 2.7.	2023-08-24 12:01:06 +02:00
Amaury Denoyelle	cd97ba147c	BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1 Compilation is broken due to missing __pl_wait_unlock_long() definition when building with PLOCK_DISABLE_EBO=1. This has been introduced since the following commit which activates the inlining version of pl_wait_unlock_long() : commit `071d689a51` MINOR: threads: inline the wait function for pthread_rwlock emulation Add an extra check on PLOCK_DISABLE_EBO before choosing the inline or default version of pl_wait_unlock_long() to fix this.	2023-08-17 11:16:54 +02:00
Willy Tarreau	78fa54863d	MINOR: atomic: make sure to always relax after a failed CAS There were a few places left where we forgot to call __ha_cpu_relax() after a failed CAS, in the HA_ATOMIC_UPDATE_{MIN,MAX} macros, and in a few sync_* API macros (the same as above plus HA_ATOMIC_CAS and HA_ATOMIC_XCHG). Let's add them now. This could have been a cause of contention, particularly with process_stream() calling stream_update_time_stats() which uses 8 of them in a call (4 for the server, 4 for the proxy). This may be a possible explanation for the high CPU consumption reported in GH issue #2251. This should be backported at least to 2.6 as it's harmless.	2023-08-17 09:09:20 +02:00
Willy Tarreau	071d689a51	MINOR: threads: inline the wait function for pthread_rwlock emulation When using pthread_rwlock emulation, contention is reported on pl_wait_unlock_long(). This is really not convenient to analyse what is happening. Now plock supports inlining the wait call for just the lorw functions by enabling PLOCK_LORW_INLINE_WAIT. Let's do this so that now the wait time will be precisely reported as either pthread_rwlock_rdlock() or pthread_rwlock_wrlock() depending on the contended function, but no more on pl_wait_unlock_long(), which will still be reported for all other locks.	2023-08-17 00:09:05 +02:00
Willy Tarreau	e56275378f	IMPORT: lorw: support inlining the wait call Now when PLOCK_LORW_INLINE_WAIT is defined, the pl_wait_unlock_long() calls in pl_lorw_rdlock() and pl_lorw_wrlock() will be inlined so that all the CPU time is accounted for in the calling function. This is plock upstream commit c993f81d581732a6eb8fe3033f21970420d21e5e.	2023-08-17 00:09:05 +02:00
Willy Tarreau	66dcc0550e	IMPORT: plock: always expose the inline version of the lock wait function Doing so will allow to expose the time spent in certain highly contended functions, which can be desirable for more accurate CPU profiling. For example this could be done in locking functions that are already not inlined so that they are the ones being reported as those consuming the CPU instead of just pl_wait_unlock_long(). This is plock upstream commit 7505c2e2c8c4aa0ab8f52a2288e1334ae6412be4.	2023-08-17 00:09:05 +02:00
Willy Tarreau	c6b98f05d2	IMPORT: plock: also support inlining the int code Commit 9db830b ("plock: support inlining exponential backoff code") added an option to support inlining of the wait code for longs but forgot to do it for ints. Let's do it now. This is plock upstream commit b1f9f0d252fa40577d11cfb2bc0a809d6960a297.	2023-08-17 00:09:05 +02:00
Willy Tarreau	7bf829ace1	MAJOR: pools: move the shared pool's free_list over multiple buckets This aims at further reducing the contention on the free_list when using global pools. The free_list pointer now appears for each bucket, and both the alloc and the release code skip to a next bucket when ending on a contended entry. The default entry used for allocations and releases depends on the thread ID so that locality is preserved as much as possible under low contention. It would be nice to improve the situation to make sure that releases to the shared pools don't consider the first entry's pointer but only an argument that would be passed and that would correspond to the bucket in the thread's cache. This would reduce computations and make sure that the shared cache only contains items whose pointers match the same bucket. This was not yet done. One possibility could be to keep the same splitting in the local cache. With this change, an h2load test with 5 * 160 conns & 40 streams on 80 threads that was limited to 368k RPS with the shared cache jumped to 3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets and 5.5M RPS for 64 buckets.	2023-08-12 19:04:34 +02:00
Willy Tarreau	8a0b5f783b	MINOR: pools: move the failed allocation counter over a few buckets The failed allocation counter cannot depend on a pointer, but since it's a perpetually increasing counter and not a gauge, we don't care where it's incremented. Thus instead we're hashing on the TID. There's no contention there anyway, but it's better not to waste the room in the pool's heads and to move that with the other counters.	2023-08-12 19:04:34 +02:00
Willy Tarreau	da6999f839	MEDIUM: pools: move the needed_avg counter over a few buckets That's the same principle as for ->allocated and ->used. Here we return the sum of the raw values, so the result still needs to be fed to swrate_avg(). It also means that we now use the local ->used instead of the global one for the calculations and do not need to call pool_used() anymore on fast paths. The number of samples should likely be divided by the number of buckets, but that's not done yet (better observe first). A function pool_needed_avg() was added to report aggregated values for the "show pools" command. With this change, an h2load made of 5 * 160 conn * 40 streams on 80 threads rose from 1.5M RPS to 6.7M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	9e5eb586b1	MEDIUM: pools: move the used counter over a few buckets That's the same principle as for ->allocated. The small difference here is that it's no longer possible to decrement ->used in batches when releasing clusters from the cache to the shared cache, so the counter has to be decremented for each of them. But as it provides less contention and it's done only during forced eviction, it shouldn't be a problem. A function "pool_used()" was added to return the sum of the entries. It's used by pool_alloc_nocache() and pool_free_nocache() which need to count the number of used entries. It's not a problem since such operations are done when picking/releasing objects to/from the OS, but it is a reminder that the number of buckets should remain small. With this change, an h2load test made of 5 * 160 conn * 40 streams on 80 threads raised from 812k RPS to 1.5M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	cdb711e42b	MEDIUM: pools: spread the allocated counter over a few buckets The ->used counter is one of the most stressed, and it heavily depends on the ->allocated one, so let's first move ->allocated to a few buckets. A function "pool_allocated()" was added to return the sum of the entries. It's important not to abuse it as it does iterate, so everywhere it's possible to avoid it by keeping a local counter, it's better. Currently it's used for limited pools which need to make sure they do not allocate too many objects. That's an acceptable tradeoff to save CPU on large machines at the expense of spending a little bit more on small ones which normally are not under load.	2023-08-12 19:04:34 +02:00
Willy Tarreau	06885aaea7	MINOR: pools: introduce the use of multiple buckets On many threads and without the shared cache, there can be extreme contention on the ->allocated counter, the ->free_list pointer, and the ->used counter. It's possible to limit this contention by spreading the counters a little bit over multiple entries, that are summed up when a consultation is needed. The criterion used to spread the values cannot be related to the thread ID due to migrations, since we need to keep consistent stats (allocated vs used). Instead we'll just hash the pointer, it provides an index that does the job and that is consistent for the object. When having just a few entries (16 here as it showed almost identical performance between global and non-global pools) even iterations should be short enough during measurements to not be a problem. A pair of functions designed to ease pointer hash bucket calculation were added, with one of them doing it for thread IDs because allocation failures will be associated with a thread and not a pointer. For now this patch only brings in the relevant parts of the infrastructure, the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512 threads or more are supported, 5 bits when 128 or more are supported, 4 bits when 16 or more are supported, otherwise 3 bits for small setups. The array in the pool_head and the two utility functions are already added. It should have no measurable impact beyond inflating the pool_head structure.	2023-08-12 19:04:34 +02:00
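The bucket idea described in this commit, spreading a hot counter over several entries indexed by a hash of the object's pointer and summing them on demand, can be sketched as below. This is an illustration under stated assumptions, not the HAProxy implementation: `bucketed_counter`, `bucket_of()` and the `counter_*()` helpers are hypothetical names, the hash is a generic multiplicative one rather than HAProxy's `ptr_hash()`, and the bucket count is fixed at 16.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BUCKET_BITS 4                       /* assumption: 16 buckets */
#define NB_BUCKETS  (1u << BUCKET_BITS)

/* One counter per bucket. A real implementation would likely pad each
 * entry to a cache line to avoid false sharing between buckets.
 */
struct bucketed_counter {
	atomic_ulong bucket[NB_BUCKETS];
};

/* Hypothetical pointer hash: multiply by a large odd constant and keep
 * the top BUCKET_BITS bits. The same pointer always maps to the same
 * bucket, whichever thread calls it.
 */
static inline unsigned int bucket_of(const void *ptr)
{
	uint64_t x = (uint64_t)(uintptr_t)ptr * 0x9E3779B97F4A7C15ULL;

	return (unsigned int)(x >> (64 - BUCKET_BITS));
}

/* Account for <obj> in the bucket derived from its address */
static inline void counter_inc(struct bucketed_counter *c, const void *obj)
{
	atomic_fetch_add_explicit(&c->bucket[bucket_of(obj)], 1,
	                          memory_order_relaxed);
}

/* Undo the accounting for <obj>; it lands in the same bucket as the inc */
static inline void counter_dec(struct bucketed_counter *c, const void *obj)
{
	atomic_fetch_sub_explicit(&c->bucket[bucket_of(obj)], 1,
	                          memory_order_relaxed);
}

/* Sum all buckets; this iterates, so it should stay off fast paths */
static inline unsigned long counter_sum(struct bucketed_counter *c)
{
	unsigned long sum = 0;

	for (unsigned int i = 0; i < NB_BUCKETS; i++)
		sum += atomic_load_explicit(&c->bucket[i], memory_order_relaxed);
	return sum;
}
```

Hashing the pointer rather than the thread ID means an object freed by a different thread than the one that allocated it still decrements the bucket it incremented, which keeps per-bucket values consistent.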
Willy Tarreau	29ad61fb00	OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated The pool's allocation counter doesn't strictly require to be updated from these functions, it may more efficiently be done in the caller (even out of a loop for pool_flush() and pool_gc()), and doing so will also help us spread the counters over an array later. The functions were renamed _noinc and _nodec to make sure we catch any possible user in an external patch. If needed, the original functions may easily be reimplemented in an inline function.	2023-08-12 19:04:34 +02:00
Willy Tarreau	f0d188f6ed	OPTIM: tools: improve hash distribution using a better prime seed During tests it was noticed that the current hash is not that good on 4- and 5- bit hashes. About 7.5% of all the 32-bit primes were tested as candidates for the hash function, by submitting them 128 arrangements of N pointers among 40k extracted from haproxy's pools, and the average fill rates for 1- to 12- bit hashes were measured and compared. It was clear that some values do not provide great hashes and other ones are way more resistant. The current value is not bad at all but delivers 42.6% unique 2-bit outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some values did perform significantly better, among which 0xacd1be85 which does 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3% 10-bit. The reverse value used in the ptr2_hash() was really underperforming and was replaced with 0x9d28e4e9 which does 49.6%, 40.4%, 42.6%, 39.1%, and 37.2% respectively. This should slightly improve the accuracy of the task and memory profiling, and will be useful for pools.	2023-08-12 19:04:34 +02:00
Willy Tarreau	58946d44f8	MINOR: tools: improve ptr hash distribution on 64 bits When testing the pointer hash on 64-bit real pointers (map entries), it appeared that the shift by 33 bits that hoped to compensate for the 3 nul LSB degrades the hash, and the centering is more optimal on 31-(bits+1)/2. This makes sense since the topmost bit of the multiplicator is 31, so for an input of 1 bit and 1 bit of output we would always get zero. With the formula adjusted this way, we can get up to ~15% more unique entries at 10 bits and ~24% more at 11 bits.	2023-08-12 19:04:34 +02:00
Willy Tarreau	ab6cb5dea0	MINOR: tools: make ptr_hash() support 0-bit outputs When dealing with macro-based size definitions, it is useful to be able to hash pointers on zero bits so that the macro automatically returns a constant 0. For now it only supports 1-32. Let's just add this special case. It's automatically optimized out by the compiler since the function is inlined.	2023-08-12 19:04:34 +02:00
Willy Tarreau	59c347c15e	BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP LONGBITS was defined long ago with old compilers that didn't provide the word size. It's still present as being referenced in various places in the code, but we must not use it to define other macros that may be evaluated at pre-processing time since it contains sizeof() and casts that are not compatible with preprocessor conditions. Let's switch MAX_THREADS_PER_GROUP to __WORDSIZE so that we can condition blocks of code on it if needed. LONGBITS should really be removed by now, given that we don't support compilers not providing __WORDSIZE anymore (gcc < 4.2).	2023-08-12 19:04:34 +02:00
Willy Tarreau	9e52c35de4	CLEANUP: stick-table: slightly reorder the stktable struct By moving the config-time stuff after the updt_lock, we can plug some holes without interfering with it. This allows us to get back to the 768-bytes struct. The performance was not affected at all.	2023-08-11 19:03:35 +02:00
Willy Tarreau	9c6248560e	MINOR: stick-table: move the update lock into its own cache line The read-lock contention observed on the update lock while turning it into an upgradable lock were due to false sharing with the nearby updates. Simply moving the lock alone into its own cache line is sufficient to almost double the performance again, raising from 2355 to 4480k RPS with very low contention: Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 743422995452 lost Overhead Shared Object Symbol 15.88% haproxy [.] stktable_lookup_key 5.94% haproxy [.] ebmb_lookup 5.69% haproxy [.] http_wait_for_request 3.66% haproxy [.] stktable_touch_with_exp 2.62% [kernel] [k] _raw_spin_unlock_irqrestore 1.86% haproxy [.] http_action_return 1.79% haproxy [.] stream_process_counters 1.78% [kernel] [k] skb_release_data 1.77% haproxy [.] process_stream Unfortunately, trying to move the line anywhere else didn't work, despite the remaining holes, because this structure is not quite clean. This adds 64 bytes to a struct that was already 768 long, so it's now 832. It's possible to repack it a little bit and regain these bytes by removing the THREAD_ALIGN before "keys" because we rarely use the config stuff, but that's a bit unsafe.	2023-08-11 19:03:35 +02:00
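The effect of giving a lock its own cache line can be illustrated with C11 alignment specifiers. The sketch below is not the real `struct stktable`: `demo_table` and its fields are hypothetical, and a 64-byte cache line is assumed.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Hypothetical, heavily reduced table layout. Aligning both the lock and
 * the field that follows it guarantees that nothing else shares the
 * lock's cache line, at the cost of padding.
 */
struct demo_table {
	unsigned long lookups;                        /* read-mostly data */
	alignas(CACHE_LINE) unsigned long updt_lock;  /* starts a fresh line */
	alignas(CACHE_LINE) unsigned long update;     /* next hot field, next line */
};

/* Compile-time check that the lock really owns a full line */
_Static_assert(offsetof(struct demo_table, update) -
               offsetof(struct demo_table, updt_lock) == CACHE_LINE,
               "updt_lock must sit alone in its cache line");
```

The padding is what makes such a structure grow by one cache line, the same trade-off the commit message describes for the 768-byte struct.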
Willy Tarreau	87e072eea5	MEDIUM: stick-table: use a distinct lock for the updates tree Updating an entry in the updates tree is currently performed under the table's write lock, which causes huge contention with other accesses such as lookups and free. Aside the updates tree, the update, localupdate and commitupdate variables, nothing is manipulated, so let's create a distinct lock (updt_lock) to protect these together to remove this contention. It required to add an extra lock in the few places where we delete the update (though only if we're really going to delete it) to protect the tree. This is very convenient because now peer_send_teachmsgs() only needs to take this read lock, and there is very little contention left on the stick-table. With this alone, the performance jumped from 614k to 1140k/s on a 80-thread machine with a peers section! Stick-table updates with no peers however now has to stand two locks and slightly regressed from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the significant unlocking of the peers updates and considered totally acceptable.	2023-08-11 19:03:35 +02:00
Willy Tarreau	cc10fce9c2	MINOR: stick-table: better organize the struct stktable The structure currently mixes R/O and R/W fields, let's organize them by access type, focusing mainly on splitting the updates from the rest so that peers activity does not affect the rest. For now it doesn't bring any benefit but it paves the way for splitting the lock.	2023-08-11 19:03:35 +02:00
Willy Tarreau	7968fe3889	MEDIUM: stick-table: change the ref_cnt atomically Due to the ts->ref_cnt being manipulated and checked inside wrlocks, we continue to have it updated under plenty of read locks, which have an important cost on many-thread machines. This patch turns them all to atomic ops and carefully moves them outside of locks every time this is possible: - the ref_cnt is incremented before write-unlocking on creation otherwise the element could vanish before we can do it - the ref_cnt is decremented after write-locking on release - for all other cases it's updated out of locks since it's guaranteed by the sequence that it cannot vanish - checks are done before locking every time it's used to decide whether we're going to release the element (saves several write locks) - expiration tests are just done using atomic loads, since there's no particular ordering constraint there, we just want consistent values. For Lua, the loop that is used to dump stick-tables could switch to read locks only, but this was not done. For peers, the loop that builds updates in peer_send_teachmsgs is extremely expensive in write locks and it doesn't seem this is really needed since the only updated variables are last_pushed and commitupdate, the first one being on the shared table (thus not used by other threads) and the commitupdate could likely be changed using a CAS. Thus all of this could theoretically move under a read lock, but that was not done here. On a 80-thread machine with a peers section enabled, the request rate increased from 415 to 520k rps.	2023-08-11 19:03:35 +02:00
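Two of the rules listed in this commit, updating the reference count with atomic operations and checking it before taking the write lock, can be sketched as follows. This is a simplified single-file model, not HAProxy code: `demo_entry`, `demo_stktable` and the helper names are hypothetical, and the write lock is replaced by a plain counter so that the number of lock acquisitions can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_entry {
	atomic_uint ref_cnt;   /* number of users currently holding the entry */
	bool in_tree;          /* true while the entry is still indexed */
};

struct demo_stktable {
	unsigned int wrlocks_taken;  /* counts simulated write-lock acquisitions */
};

/* Take a reference without any lock */
static inline void entry_get(struct demo_entry *e)
{
	atomic_fetch_add(&e->ref_cnt, 1);
}

/* Drop a reference without any lock */
static inline void entry_put(struct demo_entry *e)
{
	atomic_fetch_sub(&e->ref_cnt, 1);
}

/* Try to release <e>. Returns true if it was removed. */
static bool entry_try_release(struct demo_stktable *t, struct demo_entry *e)
{
	bool released = false;

	/* cheap check first: skip the write lock if still referenced */
	if (atomic_load_explicit(&e->ref_cnt, memory_order_relaxed) != 0)
		return false;

	t->wrlocks_taken++;                  /* stand-in for taking the write lock */
	if (atomic_load(&e->ref_cnt) == 0) { /* re-check now that we "hold" it */
		e->in_tree = false;
		released = true;
	}
	/* the write lock would be dropped here */
	return released;
}
```

The early return is where the saving comes from: a still-referenced entry never causes a write-lock acquisition.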
Willy Tarreau	8178a5211c	MAJOR: threads/plock: update the embedded library again This updates the local copy of the plock library to benefit from finer memory ordering, EBO on more operations such as when take_w() and stow() wait for readers to leave and refined EBO, especially on common operations such as attempts to upgrade R to S, and avoids a counter-productive prior read in rtos() and take_r(). These changes have shown a 5% increase on regular operations on ARM, a 33% performance increase on ARM on stick-tables and 2% on x86, and a 14% and 4% improvements on peers updates respectively on ARM and x86. The availability of relaxed operations will probably be useful for stats counters which are still extremely expensive to update. The following plock commits were included in this update: 9db830b plock: support inlining exponential backoff code 008d3c2 plock: make the rtos upgrade faster 2f76dde atomic: clean up the generic xchg() 3c6919b atomic: make sure that the no-return macros do not return a value 97c2bb7 atomic: make the fallback bts use the pointed type for the shift f4c1880 atomic: also implement the missing pl_btr() 8329b82 atomic: guard all generic definitions to make it easier to provide specific ones 7c5cb62 atomic: use C11 atomics when available 96afaf9 atomic: prefer the C11 definitions in general f3ec7a6 atomic: implement load/store/atomic barriers 8bdbd1e atomic: add atomic load/stores 0f604c0 atomic: add more _noret operations 3fe35db atomic: remove the (void) cast from the C11 operations 3b08a7c atomic: allow to define the fallback _noret variants 28deb22 atomic: make x86 arithmetic operations the _noret variants 8061fe2 atomic: handle modern compilers that support returning flags b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>) 59817ca atomic: add memory order variants for most operations a40774f plock: explicitly make use of the pl_*_noret operations 6f1861b plock: switch to pl_sub_noret_lax() for cancellation c013980 plock: use pl_ldadd{_lax,_acq,} instead of pl_xadd() 382eea3 plock: use a release ordering when dropping the lock 60d750d plock: use EBO when waiting for readers to leave in take_w() and stow() fc01c4f plock: improve EBO a little bit 1ef6390 plock: switch to CAS + XADD for pl_take_r()	2023-08-11 19:03:35 +02:00
Frédéric Lécaille	b0e32c6263	BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands ->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept by the remaining QUIC connection object (struct quic_cc_conn) after having release the previous one (struct quic_conn) to allow "show fd/sess" commands to be functional without causing haproxy crashes. No need to backport.	2023-08-11 11:21:31 +02:00
Willy Tarreau	d93a00861d	MINOR: h2: pass accept-invalid-http-request down the request parser We're adding a new argument "relaxed" to h2_make_htx_request() so that we can control its level of acceptance of certain invalid requests at the proxy level with "option accept-invalid-http-request". The goal will be to add deactivable checks that are still desirable to have by default. For now no test is subject to it.	2023-08-08 19:10:54 +02:00
Willy Tarreau	30f58f4217	MINOR: http: add new function http_path_has_forbidden_char() As its name implies, this function checks if a path component has any forbidden characters starting at the designated location. The goal is to seek from the result of a successful ist_find_range() for more precise chars. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure we're not too strict at this point.	2023-08-08 19:10:54 +02:00
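The character set named in this commit (0x00-0x1F, 0x20 and 0x23) can be checked with a loop like the one below. This is a standalone sketch whose signature is an assumption: the hypothetical `path_has_forbidden_char()` takes a plain pointer and length, whereas the real `http_path_has_forbidden_char()` works on HAProxy's `ist` strings and starts from a position already located by `ist_find_range()`.

```c
#include <stddef.h>

/* Returns 1 if any of the <len> bytes at <path> is a control character
 * (0x00-0x1F), a space (0x20) or a '#' (0x23), otherwise 0.
 */
static int path_has_forbidden_char(const char *path, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)path[i];

		/* 0x00-0x1F and 0x20 are contiguous, so one comparison covers both */
		if (c <= 0x20 || c == 0x23)
			return 1;
	}
	return 0;
}
```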
Willy Tarreau	197668de97	MINOR: ist: add new function ist_find_range() to find a character range This looks up the character range <min>..<max> in the input string and returns a pointer to the first one found. It's essentially the equivalent of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals with a range.	2023-08-08 19:10:54 +02:00
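A byte-at-a-time version of such a range search is sketched below. It only illustrates the lookup semantics: the hypothetical `find_range()` takes a plain pointer and length instead of an `ist`, assumes `min <= max`, and does not attempt the 32- or 64-bit-at-once scanning that the commit mentions.

```c
#include <stddef.h>

/* Returns a pointer to the first byte of <s> (length <len>) whose value
 * lies in <min>..<max> inclusive, or NULL if none is found.
 * Assumes min <= max.
 */
static const char *find_range(const char *s, size_t len,
                              unsigned char min, unsigned char max)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		/* unsigned wrap-around turns the two-sided test
		 * min <= c <= max into a single comparison
		 */
		if ((unsigned char)(c - min) <= (unsigned char)(max - min))
			return s + i;
	}
	return NULL;
}
```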
Willy Tarreau	d4069f3cee	REORG: http: move has_forbidden_char() from h2.c to http.h This function is not H2 specific but rather generic to HTTP. We'll need it in H3 soon, so let's move it to HTTP and rename it to http_header_has_forbidden_char().	2023-08-08 19:02:24 +02:00
Frédéric Lécaille	9f7cfb0a56	MEDIUM: quic: Allow the quic_conn memory to be asap released. When the connection enters the "connection closing" state after having sent a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory may be freed from quic_conn objects (QUIC connection). This is done by allocating a reduced-size object which keeps enough information to handle the remaining incoming packets for the connection in "connection closing" state, and to keep resending the previous datagram with CONNECTION_CLOSE frames inside which has already been sent. Define a new quic_cc_conn struct which represents the connection object after entering the "connection close" state and after having released the quic_conn connection object. Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects. Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object (the connection before entering "connection close" state), and new quic_cc_conn struct object (the connection after entering "connection close"). So, all the members inside QUIC_CONN_COMMON may be indifferently dereferenced from a quic_conn struct or a quic_cc_conn struct pointer. Implement qc_new_cc_conn() function to allocate such connections in "connection close" state. This function is responsible for copying the required information from the original connection (quic_conn) to the remaining connection (quic_cc_conn). Among other initializations, it redefines the QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received packets and resends the datagram with the CONNECTION_CLOSE frame which has already been sent when entering "connection close" state. qc_cc_idle_timer_task() only releases the remaining quic_cc_conn struct object. Modify quic_conn_release() to allocate quic_cc_conn struct objects from the original connection passed as argument. It does nothing if this original connection is not in closing state, or if the idle timer has already expired. Implement quic_release_cc_conn() to release a "connection close" connection. It is called when its timer expires or if an error occurred when sending a packet from this connection when the peer is no longer reachable.	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	276697438d	MINOR: quic: Use a pool for the connection ID tree. Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects. Replace ->cids member of quic_conn objects by pointer to "quic_cids" and adapt the code consequently. Nothing special.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	dc9b8e1f27	MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer. Add a new pool <pool_head_quic_cc_buf> for buffers used when building datagrams with CONNECTION_CLOSE frames inside, with QUIC_MIN_CC_PKTSIZE(128) as minimum size. Add ->cc_buf_area to quic_conn struct to store such buffers. Add ->cc_dgram_len to store the size of the "connection close" datagrams and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member value. Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate a struct "quic_cc_buf" buffer when the connection needs an immediate close or a buffer struct if not. Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such "quic_cc_buf" buffer when an immediate close is required.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	f7ab5918d1	MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct member to bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep member respectively). They are moved before being defined into a bytes anonymous struct common to a future struct to be defined. Consequently adapt the code.	2023-08-07 18:57:45 +02:00
Frédéric Lécaille	a45f90dd4e	MINOR: quic: Amplification limit handling sanitization. Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit is respected when it returns false (0), i.e. when the connection is not validated. Implement quic_may_send_bytes() which returns the number of bytes which may be sent when the connection has not already been validated and call this function in several places when this is the case (after having called quic_peer_validated_addr()). Furthermore, this patch improves the code maintainability. Some patches to come will have to rename ->[rt]x.bytes quic_conn struct members.	2023-08-07 18:57:45 +02:00
Frédéric Lécaille	1f40b6c9fe	CLEANUP: quic: Remove quic_path_room(). This function is definitely no longer needed or used.	2023-08-07 18:57:45 +02:00
Amaury Denoyelle	559482c11e	MINOR: h3: abort request if not completed before full response A HTTP server may provide a complete response even prior to receiving the full request. In this case, RFC 9114 allows the server to abort read with a STOP_SENDING with error code H3_NO_ERROR. This scenario was notably reproduced with haproxy and an inactive server. If the client sends a POST request, haproxy may provide a full HTTP 503 response before the end of the full request.	2023-08-04 16:17:16 +02:00
Christopher Faulet	8670bb42c2	CLEANUP: stconn: Move comment about sedesc fields on the field line Fields of sedesc structure were documented in the comment about the structure itself. It was not really convenient, hard to read, hard to update. So comments about the fields are moved on the corresponding field line, as usual.	2023-08-04 14:32:57 +02:00
Christopher Faulet	ef2b15998c	BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used There is a mechanism in the H1 and H2 multiplexers to skip the payload when a response is returned to the client when it must not contain any payload (response to a HEAD request or a 204/304 response). However, this does not work when the splicing is used. The H2 multiplexer does not support the splicing, so there is no issue. But with the mux-h1, when data are sent using the kernel splicing, the mux on the server side is not aware the client side should skip the payload. And once the data are put in a pipe, there is no way to stop the sending. It is a defect of the current design. It will be easier to deal with this case when the mux-to-mux forwarding is implemented. But for now, to fix the issue, we should add an HTX flag on the start-line to pass the info from the client side to the server side and be able to disable the splicing if necessary. The associated reg-test was improved to be sure it does not fail when the splicing is configured. This patch should be backported as far as 2.4.	2023-08-02 12:05:05 +02:00
Patrick Hemmer	7fccccccea	MINOR: acl: add acl() sample fetch This provides a sample fetch which returns the evaluation result of the conjunction of named ACLs.	2023-08-01 10:49:06 +02:00
Patrick Hemmer	00e00fb424	REORG: cfgparse: extract curproxy as a global variable This extracts curproxy from cfg_parse_listen so that it can be referenced by keywords that need the context of the proxy they are being used within.	2023-08-01 10:48:28 +02:00
Patrick Hemmer	997a31dbdf	CLEANUP: acl: remove cache_idx from acl struct It isn't used and never has been.	2023-08-01 10:48:05 +02:00
Frédéric Lécaille	c156c5bda6	MINOR: quic: Move the QUIC frame pool to its proper location pool_head_quic_frame QUIC frame pool definition is moved from quic_conn-t.h to quic_frame-t.h. Its declaration is moved from quic_conn.c to quic_frame.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	fa58f67787	CLEANUP: quic: quic_conn struct cleanup Remove the no longer used QUIC_TX_RING_BUFSZ macro. Remove several no longer used quic_conn struct members.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	444c1a4113	MINOR: quic: Split QUIC connection code into three parts Move the TX part of the code to quic_tx.c. Add quic_tx-t.h and quic_tx.h headers for this TX part code. The definition of quic_tx_packet struct has been moved from quic_conn-t.h to quic_tx-t.h. Same thing for the RX part: Move the RX part of the code to quic_rx.c. Add quic_rx-t.h and quic_rx.h headers for this RX part code. The definition of quic_rx_packet struct has been moved from quic_conn-t.h to quic_rx-t.h.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	2fe50a01ca	CLEANUP: quic: Defined but no longer used function (quic_get_tls_enc_levels()) This function is no longer used since this commit: MEDIUM: quic: Handshake I/O handler rework. Let's remove it!	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	7008f16d57	MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements Extract the code in relation with the QUIC acknowledgements from quic_conn.c to quic_ack.c to accelerate the compilation of quic_conn.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	f454b78fa9	MINOR: quic: Add new "QUIC over SSL" C module. Move the code which directly calls the functions of the OpenSSL QUIC API into quic_ssl.c new C file. Some code have been extracted from qc_conn_finalize() to implement only the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c. qc_conn_finalize() has also been exported to be used from this new quic_ssl.c C module.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	57237f68ad	MINOR: quic: Move TLS related code to quic_tls.c quic_tls_key_update() and quic_tls_rotate_keys() are QUIC TLS functions. Let's move them to their proper location: quic_tls.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	953e67abb6	MINOR: quic: Export QUIC CLI code from quic_conn.c To accelerate the compilation of quic_conn.c file, export the code in relation with the QUIC CLI from quic_conn.c to quic_cli.c.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	6334f4f6c5	MINOR: quic: Export QUIC traces code from quic_conn.c To accelerate the compilation of quic_conn.c file, export the code in relation with the traces from quic_conn.c to quic_trace.c. Also add some headers (quic_trace-t.h and quic_trace.h).	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	f32201abb0	MINOR: quic: Add "limited-quic" new tuning setting This setting which may be used into a "global" section, enables the QUIC listener bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect when haproxy is compiled against a TLS stack with QUIC support, typically quictls.	2023-07-21 19:19:27 +02:00
Frédéric Lécaille	2fd67c558a	MINOR: quic: Missing encoded transport parameters for QUIC OpenSSL wrapper This wrapper needs to have an access to an encoded version of the local transport parameter (to be sent to the peer). They are provided to the TLS stack thanks to qc_ssl_compat_add_tps_cb() callback. These encoded transport parameters were attached to the QUIC connection but removed by this commit to save memory: MINOR: quic: Stop storing the TX encoded transport parameters This patch restores these transport parameters and attaches them again to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper is compiled. Implement qc_set_quic_transport_params() to encode the transport parameters for a connection and to set them into the stack and make this function work for both the OpenSSL wrapper or any other TLS stack with QUIC support. Its uses the encoded version of the transport parameters attached to the connection when compiled for the OpenSSL wrapper, or local parameters when compiled with TLS stack with QUIC support. These parameters are passed to quic_transport_params_encode() and SSL_set_quic_transport_params() as before this patch.	2023-07-21 17:27:40 +02:00
Frédéric Lécaille	7978493c2e	MINOR: quic: Add a quic_openssl_compat struct to quic_conn struct Add quic_openssl_compat struct to the quic_conn struct to support the QUIC OpenSSL wrapper feature.	2023-07-21 15:54:31 +02:00
Frédéric Lécaille	e3991e03cc	MINOR: quic: Export some KDF functions (QUIC-TLS) quic_hkdf_expand() and quic_hkdf_expand_label() must be used by the QUIC OpenSSL wrapper.	2023-07-21 15:53:41 +02:00
Frédéric Lécaille	780133548c	MINOR: quic: Include QUIC openssl wrapper header from TLS stacks compatibility header Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the compilation of the QUIC openssl wrapper for TLS stacks is enabled with USE_QUIC_OPENSSLCOMPAT.	2023-07-21 15:53:40 +02:00
Frédéric Lécaille	1b03f8016d	MINOR: quic: QUIC openssl wrapper implementation Highly inspired from nginx openssl wrapper code. This wrapper implements this list of functions: SSL_set_quic_method(), SSL_quic_read_level(), SSL_quic_write_level(), SSL_set_quic_transport_params(), SSL_provide_quic_data(), SSL_process_quic_post_handshake() and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls to support QUIC from OpenSSL. So, its aim is to support QUIC from a standard OpenSSL stack without QUIC support. It relies on the OpenSSL keylog feature to retrieve the secrets derived by the OpenSSL stack during a handshake and to pass them to the ->set_encryption_secrets() callback as this is done by quictls. It makes use of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages only on the receipt path. Some of them must be passed to the ->add_handshake_data() callback as this is done with quictls to be sent to the peer as CRYPTO data. quic_tls_compat_msg_callback() callback also sends the received TLS alert with ->send_alert() callback. AES 128-bits with CCM mode is not supported at this time. It is often disabled by the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites", the wrapper will send a TLS alert (Handshake failure) if this algorithm is negotiated between the client and the server. 0rtt is also not supported by this wrapper.	2023-07-21 15:53:40 +02:00
Frédéric Lécaille	72619bda4c	MINOR: quic: add trace about pktns packet/frames releasing Add useful traces which have already helped in debugging issues.	2023-07-21 14:31:42 +02:00
Frédéric Lécaille	0645e56a6e	MINOR: quic: Add traces for qc_frm_free() Useful to diagnose memory leak issues in relation with the QUIC frame objects.	2023-07-21 14:30:35 +02:00
Frédéric Lécaille	cf2368a3d5	MEDIUM: quic: Packet building rework. The aim of this patch is to allow the building of QUIC datagrams with as much as packets with different encryption levels inside during handshake. At this time, this is possible only for at most two encryption levels. That said, most of the time, a server only needs to use two encryption levels by datagram, except during retransmissions. Modify qc_prep_pkts(), the function responsible of building datagrams, to pass a list of encryption levels as parameter in place of two encryption levels. This function is also used when retransmitting datagrams. In this case this is a customized/flexible list of encryption level which is passed to this function. Add ->retrans new member to quic_enc_level struct, to be used as attach point to list of encryption level used only during retransmission, and ->retrans_frms new member which is a pointer to a list of frames to be retransmitted.	2023-07-21 14:30:35 +02:00
Frédéric Lécaille	2b8510d722	MINOR: quic: Release asap the negotiated Initial TLS context. This context may be released at the same time as the Initial TLS context. This is done calling quic_tls_ctx_secs_free() and pool_free() in two code locations. Implement quic_nictx_free() to do that.	2023-07-21 14:27:10 +02:00
Frédéric Lécaille	90a63ae4fa	MINOR: quic: Dynamic allocation for negotiated Initial TLS cipher context. Shorten ->negotiated_ictx quic_conn struct member (->nictx). This variable is used during version negotiation. Indeed, a connection may have to support several QUIC versions of packets during the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated QUIC version. This patch allows a connection to dynamically allocate this TLS cipher context. Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context object. Modify qc_new_conn() to initialize ->nictx to NULL value. quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context. Modify it to do nothing if it is called with a NULL TLS cipher context. Modify qc_conn_finalize() to allocate ->nictx just before initializing its secrets. qc_conn_finalize() allocates ->nictx only if needed (if a new QUIC version was negotiated). Modify qc_conn_release() which releases a QUIC connection (quic_conn struct) to release ->nictx TLS cipher context.	2023-07-21 14:27:10 +02:00
Frédéric Lécaille	642dba8c22	MINOR: quic: Stop storing the TX encoded transport parameters There is no need to keep an encoded version of the QUIC listener transport parameters attached to the connection. Remove ->enc_params and ->enc_params_len members of quic_conn struct. Use variables to build the encoded transport parameter local to ha_quic_set_encryption_secrets() before they are passed to SSL_set_quic_transport_params(). Modify qc_ssl_sess_init() prototype. It was expected to be used with the encoded transport parameters as passed parameter, but they were not used. Cleanup this function.	2023-07-21 14:27:10 +02:00
Patrick Hemmer	57926fe8a3	MINOR: peers: add peers keyword registration This adds support for registering keywords in the 'peers' section.	2023-07-20 18:12:44 +02:00
Willy Tarreau	6ecabb3f35	CLEANUP: config: make parse_cpu_set() return documented values parse_cpu_set() stopped returning the undocumented -1 which was a leftover from an earlier attempt, changed from ulong to int since it only returns a success/failure and no more a mask. Thus it must not return -1 and its callers must only test for != 0, as is documented.	2023-07-20 11:01:09 +02:00
Willy Tarreau	f54d8c6457	CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map This field used to store the cpumap of the first thread in a group, and was used till 2.4 to hold some default settings, after which it was no longer used. Let's just drop it.	2023-07-20 11:01:09 +02:00
Willy Tarreau	151f9a2808	BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct We're currently having a problem with the porting of cpu_map from processes to thread-groups as it happened in 2.7 with commit `5b09341c0` ("MEDIUM: cpu-map: replace the process number with the thread group number"), though it seems that it has deeper roots even in 2.0 and that it was progressively made wrong over time. The issue stems from the way the per-process and per-thread cpu-sets were employed over time. Originally only processes were supported. Then threads were added after an optional "/" and it was documented that "cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified in 2.5 by commit `317804d28` ("DOC: update references to process numbers in cpu-map and bind-process")). The reality is different: when processes were still supported, setting "cpu-map 1" would apply the mask to the process itself (and only when run in the background, which is not documented either and is also a bug for another fix), and would be combined with any possible per-thread mask when calculating the threads' affinity, possibly resulting in empty sets. However, "cpu-map 1/all" would only set the mask for the threads and not the process. As such the following: "cpu-map 1 odd" followed by "cpu-map 1/1-8 even" would leave no CPU, while doing: "cpu-map 1/all odd" followed by "cpu-map 1/1-8 even" would allow all CPUs. While such configs are very unlikely to ever be met (which is why this bug is tagged minor), this is becoming quite more visible while testing automatic CPU binding during 2.9 development because due to this bug it's much more common to end up with incorrect bindings. This patch fixes it by simply removing the .proc entry from cpu_map and always setting all threads' maps. The process is no longer arbitrarily bound to the group 1's mask, but in case threads are disabled, we'll use thread 1's mask since it contains the configured CPUs. 
This fix should be backported at least to 2.6, but no need to insist if it resists as it's easier to break cpu-map than to fix an unlikely issue.	2023-07-20 11:01:09 +02:00
Willy Tarreau	7134417613	MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found Since we'll soon want to adjust the "thread-groups" degree of freedom based on the presence of cpu-map, we first need to be able to detect if cpu-map was used. This function scans all cpu-map sets to detect if any is present, and returns true accordingly.	2023-07-20 11:01:09 +02:00
Aurelien DARRAGON	2e7d3d2e5c	BUG/MINOR: hlua: hlua_yieldk ctx argument should support pointers lua_yieldk ctx argument is of type lua_KContext which is typedefed to intptr_t when available so it can be used to store pointers. But the wrapper function hlua_yieldk() passes it as a regular int so it breaks that promise. Change hlua_yieldk() prototype so that ctx argument is of type lua_KContext. This bug had no functional impact because ctx argument is not being actively used so far. This may be backported to all stable versions anyway.	2023-07-17 07:42:47 +02:00
Emeric Brun	49ddd87d41	CLEANUP: quic: remove useless parameter 'key' from quic_packet_encrypt Parameter 'key' was not used in this function. This patch removes it from the prototype of the function. This patch could be backported until v2.6.	2023-07-12 14:33:03 +02:00
Emeric Brun	cadb232e93	BUG/MEDIUM: quic: timestamp shared in token was using internal time clock The internal tick clock was used to export the timestamp in the token on retry packets. When doing this in cluster mode, the nodes don't understand the timestamps from tokens generated by others. This patch reworks this using the real current date (wall-clock time). Timestamps are also now expressed in seconds instead of milliseconds. This patch should be backported until v2.6	2023-07-12 14:32:01 +02:00
Aurelien DARRAGON	b6e2d62fb3	MINOR: sink/api: pass explicit maxlen parameter to sink_write() sink_write() currently relies on sink->maxlen to know when to stop writing a given payload. But it could be useful to pass a smaller, explicit value to sink_write() to stop before the ring maxlen, for instance if the ring is shared between multiple feeders. sink_write() now takes an optional maxlen parameter: if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen is smaller than ring->maxlen, else only ring->maxlen will be considered. [for haproxy <= 2.7, patch must be applied by hand: that is: __sink_write() and sink_write() should be patched to take maxlen into account and function calls to sink_write() should use 0 as second argument to keep original behavior]	2023-07-10 18:28:08 +02:00
Aurelien DARRAGON	4f0e0f5a65	MEDIUM: sample: introduce 'same' output type Thierry Fournier reported an annoying side-effect when using the debug() converter. Consider the following examples: [1] http-request set-var(txn.test) bool(true),ipmask(24) [2] http-request redirect location /match if { bool(true),ipmask(32) } When starting haproxy with [1] example we get: config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied. With [2], we get: config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'. Now consider the following examples which are based on [1] and [2] but with the debug() sample conv inserted in-between those incompatible sample types: [1] http-request set-var(txn.test) bool(true),debug,ipmask(24) [2] http-request redirect location /match if { bool(true),debug,ipmask(32) } According to the documentation, "it is safe to insert the debug converter anywhere in a chain, even with non-printable sample types". Thus we don't expect any side-effect from using it within a chain. However in current implementation, because of debug() returning SMP_T_ANY type which is a pseudo type (only resolved at runtime), the sample compatibility checks performed at parsing time are completely ineffective. (haproxy will start and no warning will be emitted) The undesirable effect of this is that debug() prevents haproxy from warning you about impossible type conversions, hiding potential errors in the configuration that could result in unexpected evaluation or logic while serving live traffic. We better let haproxy warn you about this kind of error when it has the chance. With our previous examples, this could cause some inconveniences. Let's say for example that you are testing a configuration prior to deploying it. 
When testing the config you might want to use debug() converter from time to time to check how the conversion chain behaves. Now after deploying the exact same conf, you might want to remove those testing debug() statements since they are no longer relevant, but removing them could "break" the config and suddenly prevent haproxy from starting upon reload/restart. (this is the expected behavior, but it comes a bit too late because of debug() hiding errors during tests.) To fix this, we introduce a new output type for sample expressions: SMP_T_SAME - may only be used as "expected" out_type (parsing time) for sample converters. As its name implies, it is a way for the developers to indicate that the resulting converter's output type is guaranteed to match the type of the sample that is presented on the converter's input side. (converter may alter data content, but data type must not be changed) What it does is that it tells haproxy that if switching to the converter (by looking at the converter's input only, since out_type is SAME) is conversion-free, then the converter type can safely be ignored for type compatibility checks within the chain. debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows it to fully comply with the doc in the sense that it does not impact the conversion chain when inserted between sample items. Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	a635a1779a	MEDIUM: sample: add missing ADDR=>? compatibility matrix entries SMP_T_ADDR support was added in `b805f71` ("MEDIUM: sample: let the cast functions set their output type"). According to the above commit, it is made clear that the ADDR type is a pseudo/generic type that may be used for compatibility checks but that cannot be emitted from a fetch or converter. With that in mind, all conversions from ADDR to other types were explicitly disabled in the compatibility matrix. But later, when map_*_ip functions were updated in `b2f8f08` ("MINOR: map: The map can return IPv4 and IPv6"), we started using ADDR as "expected" output type for converters. This still complies with the original description from `b805f71`, because it is used as the theoretical output type, and is never emitted from the converters themselves (only "real" types such as IPV4 or IPV6 are actually being emitted at runtime). But this introduced an ambiguity as well as a few bugs, because some compatibility checks are being performed at config parse time, and thus rely on the expected output type to check if the conversion from current element to the next element in the chain is theoretically supported. However, because the compatibility matrix doesn't support ADDR to other types it is currently impossible to use map_*_ip converters in the middle of a chain (the only supported usage is when map_*_ip converters are at the very end of the chain). 
To illustrate this, consider the following examples: acl test str(ok),map_str_ip(test.map) -m found # this will work acl test str(ok),map_str_ip(test.map),ipmask(24) -m found # this will raise an error Likewise, stktable_compatible_sample() check for stick tables also relies on out_type[table_type] compatibility check, so map_*_ip cannot be used with stick tables at the moment: backend st_test stick-table type string size 1m expire 10m store http_req_rate(10m) frontend fe bind localhost:8080 mode http http-request track-sc0 str(test),map_str_ip(test.map) table st_test # raises an error To fix this, and prevent future usage of ADDR as expected output type (for either fetches or converters) from introducing new bugs, the ADDR=>? part of the matrix should follow the ANY type logic. That is, ADDR, which is a pseudo-type, must be compatible with itself, and where IPV4 and IPV6 both support a backward conversion to a given type, ADDR must support it as well. It is done by setting the relevant ADDR entries to c_pseudo() in the compatibility matrix to indicate that the operation is theoretically supported (c_pseudo() will never be executed because ADDR should not be emitted: this only serves as a hint for compatibility checks at parse time). This is what's being done in this commit, thanks to this the broken examples documented above should work as expected now, and future usage of ADDR as out_type should not cause any issue.	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	30cd137d3f	MINOR: sample: introduce c_pseudo() conv function This function is used for ANY=>!ANY conversions in the compatibility matrix to help differentiate between real NOOP (c_none) and pseudo conversions that are theoretically supported at config parse time but can never occur at runtime. That is, to make explicit the fact that actual related runtime operations (e.g.: ANY->IPV4) are not NOOP since they might require some conversion to be performed depending on the input type. When checking the conf we don't know the effective out types so cast[pseudo type][pseudo type] is allowed in the compatibility matrix, but at runtime we only expect cast[real type][(real type || pseudo type)] because fetches and converters may not emit pseudo types, thus using c_none() everywhere was too ambiguous. The process will crash if c_pseudo() is invoked to help catch bugs: crashing here means that a pseudo type has been encountered on a converter's input at runtime (because it was emitted earlier in the chain), which is not supported and results from a broken sample fetch or converter implementation. (pseudo types may only be used as out_type in sample definitions for compatibility checks at parsing time)	2023-07-03 16:32:01 +02:00
Aurelien DARRAGON	58bbe41cb8	MEDIUM: acl/sample: unify sample conv parsing in a single function Both sample_parse_expr() and parse_acl_expr() implement some code logic to parse sample conv list after respective fetch or acl keyword. (Seems like the acl one was inspired by the sample one historically) But there is clearly code duplication between the two functions, making them hard to maintain. Hopefully, the parsing logic between them has stayed pretty much the same, thus the sample conv parsing part may be moved in a dedicated helper parsing function. This is what's being done in this commit, we're adding the new function sample_parse_expr_cnv() which does a single thing: parse the converters that are listed right after a sample fetch keyword and inject them into an already existing sample expression. Both sample_parse_expr() and parse_acl_expr() were adapted to now make use of this specific parsing function and duplicated code parts were cleaned up. Although sample_parse_expr() remains quite complicated (numerous function arguments due to contextual parsing data) the first goal was to get rid of code duplication without impacting the current behavior, with the added benefit that it may allow further code cleanups / simplification in the future.	2023-07-03 16:32:01 +02:00
Frédéric Lécaille	7f3c1bef37	MINOR: quic: Drop packet with type for discarded packet number space. This patch allows the low level packet parser to drop packets whose type maps to a discarded packet number space. Furthermore, this prevents it from reallocating new encryption levels and packet number spaces already released/discarded. When a packet number space is discarded, it MUST NOT be reallocated. As the packet number space discarding is done as soon as the type of the received packet is known, some packet number space discarding checks may be safely removed from qc_try_rm_hp() and qc_qel_may_rm_hp(), which are called after having parsed the packet header and its type.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	b97de9dc21	MINOR: quic: Move the packet number space status at quic_conn level As the packet number spaces and encryption levels are dynamically allocated, the information about the packet number space discarded status must be kept somewhere else than in these objects. quic_tls_discard_keys() is no longer useful. Modify quic_pktns_discard() to do the same job: flag the quic_conn object as having discarded the packet number space. Implement quic_tls_pktns_is_disarded() to check if a packet number space is discarded. Note that the Application data packet number space is never discarded.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	f7749968d6	CLEANUP: quic: Remove two useless pools at low QUIC connection level Both "quic_tx_ring" and "quic_rx_crypto_frm" pools are no longer used. Should be backported as far as 2.6.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	a5c1a3b774	MINOR: quic: Reduce the maximum length of TLS secrets The maximum length of the secrets derived by the TLS stack is 384 bits. This reduces the size of the objects provided by the "quic_tls_secret" pool by 16 bytes. Should be backported as far as 2.6	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	3097be92f1	MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels Replace ->els static array of encryption levels by 4 pointers into the QUIC connection object, quic_conn struct. ->iel denotes the Initial encryption level, ->eel the Early-Data encryption level, ->hel the Handshake encryption level and ->ael the Application Data encryption level. Add ->qel_list to this structure to list the encryption levels after having been allocated. Modify accordingly the encryption level object itself (quic_enc_level struct) so that it might be added to ->qel_list QUIC connection list of encryption levels. Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption level object. It is used to initialize the pointers newly added to the quic_conn structure. It also takes a packet number space pointer address as argument to initialize it if not already initialized. Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init() which is called by qc_enc_level_alloc() to allocate an encryption level object. Implement 2 new helper functions: - ssl_to_qel_addr() to match a pointer address to a quic_encryption level attached to a quic_conn object with a TLS encryption level enum value; - qc_quic_enc_level() to match a pointer to a quic_encryption level attached to a quic_conn object with an internal encryption level enum value. These functions are useful to be called from ->set_encryption_secrets() and ->add_handshake_data() TLS stack callbacks which take a TLS encryption enum as argument (enum ssl_encryption_level_t). Replace all the uses of the qc->els[] array element values by one of the newly added ->[ieha]el quic_conn struct member values.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	25a7b15144	MINOR: quic: Add a pool for the QUIC TLS encryption levels Very simple patch to define and declare a pool for the QUIC TLS encryptions levels. It will be used to dynamically allocate such objects to be attached to the QUIC connection object (quic_conn struct) and to remove from quic_conn struct the static array of encryption levels (see ->els[]).	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	7d9f12998d	CLEANUP: quic: Remove qc_list_all_rx_pkts() defined but not used This function is not used. May be safely removed.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	6635aa6a0a	MEDIUM: quic: Dynamic allocations of packet number spaces Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces. Remove the static array of packet number spaces at QUIC connection level (struct quic_conn) and add three new members to quic_conn struct as pointers to quic_pktns struct, one per packet number space as follows: ->ipktns for Initial packet number space, ->hpktns for Handshake packet number space and ->apktns for Application packet number space. Also add a ->pktns_list new member (struct list) to quic_conn struct to attach the list of the packet number spaces allocated for the QUIC connection. Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers from TLS stack encryption levels. Modify quic_pktns_init() to initialize these members. Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to allocate the packet numbers and initialize the encryption level. Implement quic_pktns_release() which takes pointers to pointers to packet number space objects to release the memory allocated for a packet number space attached to a QUIC connection and reset their address values. Modify qc_new_conn() to allocate only the Initial packet number space and Initial encryption level. Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list list attached to a QUIC connection in place of a static array of packet number spaces. Replace at several locations the use of elements of an array of packet number spaces by one of the three pointers to packet number spaces	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	ef39a74f4a	MINOR: quic: Move packet number space related functions Move packet number space related functions from quic_conn.h to quic_tls.h. Should be backported as far as 2.6 to ease future backports to come.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	411b6f73b7	MINOR: quic: Implement a packet number space identification function Implement quic_pktns_char() to identify a packet number space from a quic_conn object. Useful only for traces.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	dc6b339733	MINOR: quic: Move QUIC encryption level structure definition haproxy/quic_tls-t.h is the correct place to quic_enc_level structure definition. Should be backported as far as 2.6 to ease any further backport to come.	2023-06-30 16:20:55 +02:00
Frédéric Lécaille	6593ec6f5e	MINOR: quic: Move QUIC TLS encryption level related code (quic_conn_enc_level_init()) quic_conn_enc_level_init() location is definitively in QUIC TLS API source file: src/quic_tls.c.	2023-06-30 16:20:55 +02:00
Willy Tarreau	90d18e2006	IMPORT: slz: implement a synchronous flush() operation In some cases it may be desirable for latency reasons to forcefully flush the queue even if it results in suboptimal compression. In our case the queue might contain up to almost 4 bytes, which need an EOB and a switch to literal mode, followed by 4 bytes to encode an empty message. This means that each call can add 5 extra bytes in the output stream. And the flush may also result in the header being produced for the first time, which can amount to 2 or 10 bytes (zlib or gzip). In the worst case, a total of 19 bytes may be emitted at once upon a flush with 31 pending bits and a gzip header. This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.	2023-06-30 16:12:36 +02:00
William Lallemand	593c895eed	MINOR: ssl: allow to change the client-sigalgs on server lines This patch introduces the "client-sigalgs" keyword for the server line, which allows configuring the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-server-client-sigalgs" in the global section.	2023-06-29 14:11:46 +02:00
William Lallemand	717f0ad995	MINOR: ssl: allow to change the server signature algorithm on server lines This patch introduces the "sigalgs" keyword for the server line, which allows configuring the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-server-sigalgs" in the global section.	2023-06-29 13:40:18 +02:00
Frédéric Lécaille	c2bab72d32	BUG/MINOR: quic: Missing TLS secret context initialization This bug arrived with this commit: MINOR: quic: Remove pool_zalloc() from qc_new_conn() Missing initialization of largest packet number received during a keyupdate phase. This prevented the keyupdate feature from working and made the keyupdate interop tests to fail for all the clients. Furthermore, ->flags from quic_tls_ctx was also not initialized. This could also impact the keyupdate feature at least. No backport needed.	2023-06-19 19:05:45 +02:00
Frédéric Lécaille	ddc616933c	MINOR: quic: Remove pool_zalloc() from qc_new_conn() qc_new_conn() is used to initialize QUIC connections with quic_conn struct objects. This function calls quic_conn_release() when it fails to initialize a connection. quic_conn_release() is also called to release the memory allocated by a QUIC connection. Replace pool_zalloc() by pool_alloc() in this function and initialize all quic_conn struct members which are referenced by quic_conn_release() to prevent use of non initialized variables in this function. The ebtrees, the lists attached to quic_conn struct must be initialized. The tasks must be reset to their NULL default values to be safely destroyed by task_destroy(). This is also the case for all the TLS cipher contexts of the encryption levels (struct quic_enc_level) and those for the keyupdate. The packet number spaces (struct quic_pktns) must also be initialized. ->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update() from dereferencing this pointer. ->latest_rtt member of quic_loss struct must also be initialized. This is done by quic_loss_init() called by quic_path_init().	2023-06-16 16:55:58 +02:00
Frédéric Lécaille	d66896036a	BUG/MINOR: quic: Missing initialization (packet number space probing) ->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init(). This bug never occurred because all quic_pktns structs are attached to quic_conn structs which are always pool_zalloc()'ed. Must be backported as far as 2.6.	2023-06-14 11:35:22 +02:00
Aurelien DARRAGON	b7f8af3ca9	BUG/MINOR: proxy/server: free default-server on deinit proxy default-server is a specific type of server that is not allocated using new_server(): it is directly stored within the parent proxy structure. However, since it may contain some default config options that may be inherited by regular servers, it is also subject to dynamic members (strings, structures..) that needs to be deallocated when the parent proxy is cleaned up. Unfortunately, srv_drop() may not be used directly from p->defsrv since this function is meant to be used on regular servers only (those created using new_server()). To circumvent this, we're splitting srv_drop() to make a new function called srv_free_params() that takes care of the member cleaning which originally takes place in srv_drop(). This function is exposed through server.h, so it may be called from outside server.c. Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy() prevents any memory leaks due to dynamic parameters allocated when parsing a default-server line from a proxy section. This partially fixes GH #2173 and may be backported to 2.8. [While it could also be relevant for other stable versions, the patch won't apply due to architectural changes / name changes between 2.4 => 2.6 and then 2.6 => 2.8. Considering this is a minor fix that only makes memory analyzers happy during deinit paths (at least for <= 2.8), it might not be worth the trouble to backport them any further?]	2023-06-06 15:15:17 +02:00
Willy Tarreau	4ad1c9635a	BUG/MINOR: stream: do not use client-fin/server-fin with HTX Historically the client-fin and server-fin timeouts were made to allow a connection closure to be effective quickly if the last data were sent down a socket and the client didn't close, something that can happen when the peer's FIN is lost and retransmits are blocked by a firewall for example. This made complete sense in 1.5 for TCP and HTTP in close mode. But nowadays with muxes, it's not done at the right layer anymore and even the description doesn't match what is being done, because what happens is that the stream will abort the whole transfer after it's done sending to the mux and this timeout expires. We've seen in GH issue 2095 that this can happen with very short timeout values, and while this didn't trigger often before, now that the muxes (h2 & quic) properly report an end of stream before even the first sc_conn_sync_recv(), it seems that it can happen more often, and have two undesirable effects: - logging a timeout when that's not the case - aborting the request channel, hence the server-side conn, possibly before it had a chance to be put back to the idle list, causing this connection to be closed and not reusable. Unfortunately for TCP (mux_pt) this remains necessary because the mux doesn't have a timeout task. So here we're adding tests to only do this through an HTX mux. But to be really clean we should in fact completely drop all of this and implement these timeouts in the mux itself. This needs to be backported to 2.8 where the issue was discovered, and maybe carefully to older versions, though that is not sure at all. In any case, using a higher timeout or removing client-fin in HTTP proxies is sufficient to make the issue disappear.	2023-06-02 16:33:40 +02:00
Willy Tarreau	ae0f8be011	MINOR: stats: protect against future stats fields omissions As seen in commits `33a4461fa` ("BUG/MINOR: stats: Fix Lua's `get_stats` function") and `a46b142e8` ("BUG/MINOR: Missing stat_field_names (since `f21d17bb`)") it seems frequent to omit to update stats_fields[] when adding a new ST_F_xxx entry. This breaks Lua's get_stats() and shows a "(null)" in the header of "show stat", but that one is not detectable to the naked eye anymore. Let's add a reminder above the enum declaration about this, and a small reg tests checking for the absence of "(null)". It was verified to fail before the last patch above.	2023-06-02 08:39:53 +02:00
Willy Tarreau	cb6a35fdc1	[RELEASE] Released version 2.9-dev0 Released version 2.9-dev0 with the following main changes : - MINOR: version: mention that it's development again	2023-05-31 16:29:19 +02:00
Willy Tarreau	9dc8308a67	MINOR: version: mention that it's development again This essentially reverts `b9b6e94474`.	2023-05-31 16:28:34 +02:00
Willy Tarreau	b9b6e94474	MINOR: version: mention that it's LTS now. The version will be maintained up to around Q2 2028. Let's also update the INSTALL file to mention this.	2023-05-31 16:23:56 +02:00
Amaury Denoyelle	d68f8b5a4a	CLEANUP: mux-quic: rename internal functions This patch is similar to the previous one but for QUIC mux functions used inside the mux code itself or application layer. Replace all occurrences of qc_* prefix by qcc_* or qcs_*. This should help to better differentiate code between quic_conn and MUX. This should be backported up to 2.7.	2023-05-30 15:45:55 +02:00
Amaury Denoyelle	6d6ee0dc0b	MINOR: quic: fix stats naming for flow control BLOCKED frames There was a misnaming in stats counter for *_BLOCKED frames in regard to QUIC rfc convention. This patch fixes it to prevent future ambiguity : - STREAMS_BLOCKED -> STREAM_DATA_BLOCKED - STREAMS_DATA_BLOCKED_BIDI -> STREAMS_BLOCKED_BIDI - STREAMS_DATA_BLOCKED_UNI -> STREAMS_BLOCKED_UNI This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Amaury Denoyelle	087c5f041b	MINOR: mux-quic: remove nb_streams from qcc Remove nb_streams field from qcc. It was not used outside of a BUG_ON() statement to ensure we never have a negative count of streams. However this is already checked with other fields. This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Amaury Denoyelle	7b41dfd834	CLEANUP: mux-quic: remove unneeded fields in qcc Remove fields from qcc structure which are unused. This should be backported up to 2.7.	2023-05-26 17:17:00 +02:00
Patrick Hemmer	425d7ad89d	MINOR: init: pre-allocate kernel data structures on init The Linux kernel maintains data structures to track a process's open file descriptors, and it expands these structures as necessary when FD usage grows (at every FD=2^X starting at 64). However when threading is in use, during expansion the kernel will pause (observed up to 47ms) while it waits for thread synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366). This change addresses the issue and avoids the random pauses by opening the maximum file descriptor during initialization, so that expansion will not occur while processing traffic.	2023-05-26 09:28:18 +02:00
Willy Tarreau	b298882acc	BUILD: compiler: systematically set USE_OBSOLETE_LINKER with TCC TCC silently ignores the weak and section attributes, which ruins the initcalls. Technically we're exactly in the same situation as with an obsolete linker. Let's just automatically set the flag if TCC is detected, this avoids surprises where the program compiles but does not start. No backport is needed.	2023-05-24 21:37:06 +02:00
Willy Tarreau	eced142aa8	BUILD: ist: use the literal declaration for ist_lc/ist_uc under TCC TCC doesn't know about __attribute__((weak)), it silently ignores it. We could add a "static" modifier there in this case but we already have an alternate portable mode that is based on a slightly larger literal for obsolete linkers (and non-ELF systems) which choke on weak. Let's just add the test for tcc there and use it in this case. No backport is needed.	2023-05-24 21:33:34 +02:00
Willy Tarreau	4e8720ab78	BUILD: ist: do not put a cast in an array declaration TCC is upset by the declaration looking like: const unsigned char ist_lc[256] __attribute__((weak)) = ((const unsigned char[256]){ ... }); It was written like this because it's expanded from the _IST_LC macro but it's never used as-is, it's only used from ist_lc, which should be the one containing the cast so that the macro only contains the list of bytes that can be used in both places. And this assigns more consistent roles to the lower and upper case macro/variable now, one is typed and the other one not. No backport is needed.	2023-05-24 21:27:39 +02:00
Frédéric Lécaille	12a815ad19	MINOR: quic: Add a counter for sent packets Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection level. Then, when the connection is released, the ->sent_pkt counter value is added to the one for the listener. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Frédéric Lécaille	bdd64fd71d	MINOR: quic: Add some counters at QUIC connection level Add some statistical counters to quic_conn struct from quic_counters struct which are used at listener level to handle them at QUIC connection level. This avoids calling atomic functions. Furthermore this will be useful soon when a counter will be added for the total number of packets which have been sent which will be very often incremented. Some counters were not added, especially those which count the number of QUIC errors by QUIC error types. Indeed such counters would be incremented most of the time only one time at QUIC connection level. Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level statistical counters to the listener level statistical counters. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Willy Tarreau	1e1c28873c	BUILD: makefile: fix build issue on GNU make < 3.82 Thierry Fournier reported a build breakage with the ubiquitous make 3.81, LDFLAGS were ignored. This is caused by the declaration of the collect_opt_flags macro that is defined with an "=" sign, something that only appeared in 3.82 and that is not necessary. With it removed, the build now works fine at least from 3.80 to 4.3. No backport is needed since this makefile cleanup appeared in 2.8.	2023-05-24 15:51:03 +02:00
Ilya Shipitsin	97c344dae0	BUILD: quic: re-enable chacha20_poly1305 for libressl this reverts `d2be9d4c48` LibreSSL implements EVP_chacha20_poly1305() with EVP_CIPHER for every released version starting with 3.6.0	2023-05-23 19:20:36 +02:00
Willy Tarreau	b7209d42d9	MEDIUM: stconn: make the SE_FL_ERR_PENDING to ERROR transition systematic During a code audit of the various situations that promote ERR_PENDING to ERROR, it appeared that: - all muxes use se_fl_set_error() to set it, which chooses either based on EOI/EOS presence ; - EOI/EOS that arrive late after ERR_PENDING were not systematically upgraded to ERROR This results in confusion about how such ERROR or ERR_PENDING ought to be handled, which is not quite desirable. This patch adds a test to se_fl_set() to detect if we're setting EOI or EOS while ERR_PENDING is present, or the other way around so that any sequence of EOI/EOS <-> ERR_PENDING results in ERROR being set. This way there will no longer be possible situations where ERROR is missing while the other ones are set.	2023-05-23 16:17:04 +02:00
Amaury Denoyelle	5eadc27623	MINOR: quic: remove return val of quic_aead_iv_build() quic_aead_iv_build() should never fail unless we call it with buffers of different size. This never happens in the code as all input buffers are of size QUIC_TLS_IV_LEN. Remove the return value and add a BUG_ON() to prevent future misuse. This is especially useful to remove one error handling on the sending path via quic_packet_encrypt(). This should be backported up to 2.7.	2023-05-22 11:17:18 +02:00
Willy Tarreau	5345490b8e	MINOR: clock: provide a function to automatically adjust now_offset Right now there's no way to enforce a specific value of now_ms upon startup in order to compensate for the time it takes to load a config, specifically when dealing with the health check startup. For this we'd need to force the now_offset value to compensate for the last known value of the current date. This patch exposes a function to do exactly this.	2023-05-17 09:33:54 +02:00
Willy Tarreau	5723b382ed	MINOR: stats: report the boot time in "show info" Just like we have the uptime in "show info", let's add the boot time. It's trivial to collect as it's just the difference between the ready date and the start date, and will allow users to monitor this element in order to take action before it starts becoming problematic. Here the boot time is reported in milliseconds, so this allows to even observe sub-second anomalies in startup delays.	2023-05-17 09:33:54 +02:00
Willy Tarreau	da4aa6905c	MINOR: clock: measure the total boot time Some huge configs take a significant amount of time to start and this can cause some trouble (e.g. health checks getting delayed and grouped, process not responding to the CLI etc). For example, some configs might start fast in certain environments and slowly in other ones just due to the use of a wrong DNS server that delays all libc's resolutions. Let's first start by measuring it by keeping a copy of the most recently known ready date, once before calling check_config_validity() and then refine it when leaving this function. A last call is finally performed just before deciding to split between master and worker processes, and it covers the whole boot. It's trivial to collect and even allows to get rid of a call to clock_update_date() in function check_config_validity() that was used in hope to better schedule future events.	2023-05-17 09:33:54 +02:00
Amaury Denoyelle	1a2faef92f	MINOR: mux-quic: uninline qc_attach_sc() Uninline and move qc_attach_sc() function to implementation source file. This will be useful for next commit to add traces in it. This should be backported up to 2.7.	2023-05-16 17:53:45 +02:00
Amaury Denoyelle	3cb78140cf	MINOR: mux-quic: properly report end-of-stream on recv MUX is responsible to put EOS on stream when read channel is closed. This happens if underlying connection is closed or a RESET_STREAM is received. FIN STREAM is ignored in this case. For connection closure, simply check for CO_FL_SOCK_RD_SH. For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been introduced. It is set when RESET_STREAM is received, unless we already received all data. This conforms to the QUIC RFC which allows to ignore a RESET_STREAM in this case. During RESET_STREAM processing, input buffer is emptied so EOS can be reported right away on recv_buf operation. This should be backported up to 2.7.	2023-05-16 17:53:45 +02:00
William Lallemand	d0c363486c	BUILD: ssl: get0_verified chain is available on libreSSL Define HAVE_SSL_get0_verified_chain when it's using libreSSL >= 3.3.6.	2023-05-15 15:16:15 +02:00
William Lallemand	6e0c39d7ac	BUILD: ssl: ssl_c_r_dn fetch uses functions only available since 1.1.1 Fix the openssl build with older openssl version by disabling the new ssl_c_r_dn fetch. This also disables the ssl_client_samples.vtc file for OpenSSL version older than 1.1.1	2023-05-15 12:07:52 +02:00
Abhijeet Rastogi	df97f472fa	MINOR: ssl: add new sample ssl_c_r_dn This patch addresses #1514, adds the ability to fetch DN of the root ca that was in the chain when client certificate was verified during SSL handshake.	2023-05-15 10:48:05 +02:00
Amaury Denoyelle	6c501ed23b	BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc qc_stream_buf_alloc() can fail for two reasons : * limit of Tx buffer per connection reached * allocation failure The first case is properly treated. A flag QC_CF_CONN_FULL is set on the connection to interrupt emission. It is cleared when a buffer became available after in order ACK reception and the MUX tasklet is woken up. The allocation failure was handled with the same mechanism which in this case is not appropriate and could lead to a connection transfer freeze. Instead, prefer to close the connection with a QUIC internal error code. To differentiate the two causes, qc_stream_buf_alloc() API was changed to return the number of available buffers to the caller. This must be backported up to 2.6.	2023-05-12 16:26:20 +02:00
Amaury Denoyelle	93dd23cab4	MINOR: mux-quic: remove dedicated function to handle standalone FIN Remove QUIC MUX function qcs_http_handle_standalone_fin(). The purpose of this function was only used when receiving an empty STREAM frame with FIN bit. Besides, it was called by each application protocol which could have different approach and render the function purpose unclear. Invocation of qcs_http_handle_standalone_fin() have been replaced by explicit code in both H3 and HTTP/0.9 module. In the process, use htx_set_eom() to reliably put EOM on the HTX message. This should be backported up to 2.7, along with the previous patch which introduced htx_set_eom().	2023-05-12 15:50:30 +02:00
Amaury Denoyelle	25cf19d5c8	MINOR: htx: add function to set EOM reliably Implement a new HTX utility function htx_set_eom(). If the HTX message is empty, it will first add a dummy EOT block. This is a small trick needed to ensure readers will detect the HTX buffer as not empty and retrieve the EOM flag. Replace the H2 code related by a htx_set_eom() invocation. QUIC also has the same code which will be replaced in the next commit. This should be backported up to 2.7 before the related QUIC patch.	2023-05-12 15:29:28 +02:00
Willy Tarreau	ea07715ccf	MINOR: master/cli: also implement the timed prompt on the master CLI This provides more consistency between the master and the worker. When "prompt timed" is passed on the master, the timed mode is toggled. When enabled, for a master it will show the master process' uptime, and for a worker it will show this worker's uptime. Example: master> prompt timed [0:00:00:50] master> show proc #<PID> <type> <reloads> <uptime> <version> 11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21 # workers 11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21 # old workers 11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21 # programs [0:00:00:58] master> @!11955 [0:00:01:03] 11955> @!11942 [0:00:02:17] 11942> @ [0:00:01:10] master>	2023-05-11 16:38:52 +02:00
Willy Tarreau	225555711f	MINOR: cli: add an option to display the uptime in the CLI's prompt Entering "prompt timed" toggles reporting of the process' uptime in the prompt, which will report days, hours, minutes and seconds since it was started. As discussed with Tim in issue #2145, this can be convenient to roughly estimate the time between two outputs, as well as detecting that a process failed to be reloaded for example.	2023-05-11 16:38:52 +02:00
Aurelien DARRAGON	31b23aef38	CLEANUP: acl: discard prune_acl_cond() function Thanks to previous commit, we have no more use for prune_acl_cond(), let's remove it to prevent code duplication.	2023-05-11 15:37:04 +02:00
Aurelien DARRAGON	7abc9224a6	MINOR: proxy: add http_free_redirect_rule() function Adding http_free_redirect_rule() function to free a single redirect rule since it may be required to free rules outside of free_proxy() function. This patch is required for an upcoming bugfix. [for 2.2, free_proxy function did not exist (first seen in 2.4), thus http_free_redirect_rule() needs to be deducted from haproxy.c deinit() function if the patch is required]	2023-05-11 15:37:04 +02:00
Christopher Faulet	7542fb43d6	MINOR: stconn: Add a cross-reference between SE descriptor A xref is added between the endpoint descriptors. It is created when the server endpoint is attached to the SC and it is destroyed when an endpoint is detached. This xref is not used for now. But it will be useful to retrieve info about an endpoint for the opposite side. It also guarantees that an endpoint is still attached on the other side.	2023-05-11 15:37:04 +02:00
Willy Tarreau	4cfb0019e6	MINOR: stats: report the listener's protocol along with the address in stats When "option socket-stats" is used in a frontend, its listeners have their own stats and will appear in the stats page. And when the stats page has "stats show-legends", then a tooltip appears on each such socket with ip:port and ID. The problem is that since QUIC arrived, it was not possible to distinguish the TCP listeners from the QUIC ones because no protocol indication was mentioned. Now we add a "proto" legend there with the protocol name, so we can see "tcp4" or "quic6" and figure how the socket is bound.	2023-05-11 14:52:56 +02:00
Amaury Denoyelle	5f67b17a59	MEDIUM: mux-quic: adjust transport layer error handling Following previous patch, error notification from quic_conn has been adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR on the connection instance when a fatal error is detected. Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to the local error flag and will abort most of the future processing. To ensure stream upper layer is also notified, qc_wake_some_streams() called by qc_process() will put the stream on error if this new flag is set. This should be backported up to 2.7.	2023-05-11 14:12:48 +02:00
Amaury Denoyelle	b2e31d33f5	MEDIUM: quic: streamline error notification When an error is detected at quic-conn layer, the upper MUX must be notified. Previously, this was done relying on quic_conn flag QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on connection closure. Adjust this mechanism to use an approach more similar to other transport layers in haproxy. On error, connection flags are updated with CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then notified when the error happened instead of just before the closing. To reflect this change, qc_notify_close() has been renamed qc_notify_err(). This function must now be explicitly called every time a new error condition arises on the quic_conn layer. To ensure MUX send is disabled on error, qc_send_mux() now checks CO_FL_SOCK_WR_SH. If set, the function returns an error. This should prevent the MUX from sending data on closing or draining state. To complete this patch, MUX layer must now check for CO_FL_ERROR explicitly. This will be the subject of the following commit. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Amaury Denoyelle	2d5c3f5cd1	MINOR: mux-quic: add traces for stream wake Add traces for when an upper layer stream is woken up by the MUX. This should help to diagnose frozen stream issues. This should be backported up to 2.7.	2023-05-11 14:04:51 +02:00
Willy Tarreau	9615102b01	MINOR: stats: report the number of times the global maxconn was reached As discussed a few times over the years, it's quite difficult to know how often we stop accepting connections because the global maxconn was reached. This is not easy to know because when we reach the limit we stop accepting but we don't know if incoming connections are pending, so it's not possible to know how many were delayed just because of this. However, an interesting equivalent metric consist in counting the number of times an accepted incoming connection resulted in the limit being reached. I.e. "we've accepted the last one for now". That doesn't imply any other one got delayed but it's a factual indicator that something might have been delayed. And by counting the number of such events, it becomes easier to know whether some limits need to be adjusted because they're reached often, or if it's exceptionally rare. The metric is reported as a counter in show info and on the stats page in the info section right next to "maxconn".	2023-05-11 13:51:31 +02:00
Willy Tarreau	3c4a297d2b	MINOR: stats: report the total number of warnings issued Now in "show info" we have a TotalWarnings field that reports the total number of warnings issued since the process started. It's also reported in the stats page next to the uptime.	2023-05-11 12:02:21 +02:00
Willy Tarreau	29dcc5e559	DEBUG: list: add DEBUG_LIST to purposely corrupt list heads after delete LIST_DELETE doesn't affect the previous pointers of the stored element. This can sometimes hide bugs when such a pointer is reused by accident in a LIST_NEXT() or equivalent after having been detached for example, or if another LIST_DELETE is performed again, something that LIST_DEL_INIT() is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly detached list element with two invalid pointers that will cause a crash in case of accidental misuse. It's not enabled by default.	2023-05-11 11:33:35 +02:00
Frédéric Lécaille	b971696296	BUG/MINOR: quic: Possible crash when dumping version information ->others member of tp_version_information structure pointed to a buffer in the TLS stack used to parse the transport parameters. There is no guarantee that this buffer is available until the connection is released. Do not dump the available versions selected by the client anymore, but display the chosen one (selected by the client for this connection) and the negotiated one. Must be backported to 2.7 and 2.6.	2023-05-10 13:26:37 +02:00
Amaury Denoyelle	58721f2192	BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE A recent series of patches was introduced to streamline error generation by QUIC MUX. However, a regression was introduced : every error generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it should be only for H3/QPACK errors. Fix this by adding an argument <app> in qcc_set_error. When false, a standard CONNECTION_CLOSE is used as error. This bug was detected by QUIC tracker with the following tests "stop_sending" and "server_flow_control" which require a CONNECTION_CLOSE frame. This must be backported up to 2.7.	2023-05-09 18:42:34 +02:00
Christopher Faulet	557146ccc8	DOC: stconn: Update comments about ABRT/SHUT for stconn structure The comment for the stconn structure was still referencing the SHUTR/SHUTW flags. These flags were replaced and we now use ABRT/SHUT flags in comments. The comment itself was slightly updated to be accurate.	2023-05-09 16:36:45 +02:00
Christopher Faulet	e59f7583ee	MEDIUM: stconn: Be sure to always be able to unblock a SC that needs room When sc_need_room() is called, the caller cannot request more free space than a minimum value to be sure it is always possible to unblock it. It is a safety guard to avoid freezing any SC on a NEED_ROOM condition. At worst it will lead to some wakeups in excess at the edge. To keep things simple, the following minimum is used: (global.tune.bufsize - global.tune.maxrewrite - sizeof(struct htx))	2023-05-09 11:53:28 +02:00
Frédéric Lécaille	1bc6e318f0	CLEANUP: quic: Rename several <buf> variables in quic_frame.(c\|h) Most of the function in quic_frame.c and quic_frame.h manipulate <buf> buffer position variables which have nothing to see with struct buffer variables. Rename them to <pos> Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Frédéric Lécaille	d19a02a40e	CLEANUP: quic: No more used q_buf structure This definition is no more used. Should be backported to 2.7.	2023-05-09 10:48:40 +02:00
Willy Tarreau	652d1712dd	BUILD: quic: fix build warning when threads are disabled Commit `e83f937cc` ("MEDIUM: quic: use a global CID trees list") uses a local variable "tree" used only for locks, but when threads are disabled it spews a warning about this unused variable.	2023-05-07 15:06:22 +02:00
Willy Tarreau	dd9f921b3a	CLEANUP: fix a few reported typos in code comments These are only the few relevant changes among those reported here: https://github.com/haproxy/haproxy/actions/runs/4856148287/jobs/8655397661	2023-05-07 07:07:44 +02:00
Willy Tarreau	615c301db4	MINOR: config: allow cpu-map to take commas in lists of ranges The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was extended in 2.4 with commit `a80823543` ("MINOR: cfgparse: support the comma separator on parse_cpu_set") to support commas between ranges. But since it was quite late in the development cycle, by then it was decided not to add a last-minute surprise and not to magically support commas in cpu-map, hence the "comma_allowed" argument. Since then we know that it was not the best choice, because the comma is silently ignored in the cpu-map syntax, causing all sorts of surprises in the field with threads running on a single node for example. In addition it's quite common to copy-paste a taskset line and put it directly into the haproxy configuration. This commit relaxes this rule and finally allows cpu-map to support commas between ranges. It simply consists in removing the comma_allowed argument in the parse_cpu_set() function. The doc was updated to reflect this.	2023-05-05 18:41:52 +02:00
Aurelien DARRAGON	fc4ec0d653	MINOR: hlua: declare hlua_yieldk() function Declaring hlua_yieldk() function to make it usable from hlua_fcn.c.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	40cd44f52c	MINOR: hlua: declare hlua_gethlua() function Declaring hlua_gethlua() function to make it usable from hlua_fcn.c.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	34c86760fa	MINOR: hlua: declare hlua_{ref,pushref,unref} functions Declaring hlua_{ref,pushref,unref} functions to make them usable from hlua_fcn.c to simplify reference handling.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	5bed48fec8	MINOR: mailers/hlua: disable email sending from lua Exposing a new hlua function, available from body or init contexts, that forcefully disables the sending of email alerts even if the mailers are defined in haproxy configuration. This will help for sending email directly from lua. (prevent legacy email sending from interfering with lua)	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	dcbc2d2cac	MINOR: checks/event_hdl: SERVER_CHECK event Adding a new event type: SERVER_CHECK. This event is published when a server's check state ought to be reported. (check status change or check result) SERVER_CHECK event is provided as a server event with additional data carrying relevant check's context such as check's result and health.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	a163d65254	MINOR: server/event_hdl: add SERVER_ADMIN event Adding a new SERVER event in the event_hdl API. SERVER_ADMIN is implemented as an advanced server event. It is published each time the administrative state changes. (when s->cur_admin changes) SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that provides additional info related to the admin state change, but can be casted as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	e3eea29f48	MINOR: server/event_hdl: add SERVER_STATE event Adding a new SERVER event in the event_hdl API. SERVER_STATE is implemented as an advanced server event. It is published each time the server's effective state changes. (when s->cur_state changes) SERVER_STATE data is an event_hdl_cb_data_server_state struct that provides additional info related to the server state change, but can be casted as a regular event_hdl_cb_data_server struct if additional info is not needed.	2023-05-05 16:28:32 +02:00
Aurelien DARRAGON	3889efa8e4	MINOR: hlua_fcn: add Server.get_proxy() Server.get_proxy(): get the proxy to which the server belongs (or nil if not available)	2023-05-05 16:28:32 +02:00
Christopher Faulet	7b3d38a633	MEDIUM: tree-wide: Change sc API to specify required free space to progress sc_need_room() now takes the required free space to receive more data as parameter. All calls to this function are updated accordingly. For now, this value is set but not used. When we are waiting for a buffer, 0 is used. So we expect to be unblocked ASAP. However this must be reviewed because SC_FL_NEED_BUF is probably enough in this case and this flag is already set if the input buffer allocation fails.	2023-05-05 15:44:23 +02:00
Christopher Faulet	9aed1124ed	MINOR: stconn: Add a field to specify the room needed by the SC to progress When the SC is blocked because it is waiting for room in the input buffer, it will be responsible to specify the minimum free space required to progress. In this commit, we only introduce the field in the stconn structure that will be used to store this value. It is a signed value with the following meaning: * -1: The SC is waiting for room but not based on the buffer state. It will be typically used during splicing when the pipe is full. In this case, only a successful send can unblock the SC. * >= 0; The minimum free space in the input buffer to unblock the SC. 0 is a special value to specify the SC must be unblocked ASAP, by the stream, at the end of process_stream() or when output data are consumed on the opposite side.	2023-05-05 15:41:30 +02:00
Christopher Faulet	f4258bdf3b	MINOR: stats: Use the applet API to write data stats_putchk() is updated to use the applet API instead of the channel API to write data. To do so, the appctx is passed as parameter instead of the channel. This way, the applet does not need to take care to request more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:29 +02:00
William Lallemand	b6ae2aafde	MINOR: ssl: allow to change the signature algorithm for client authentication This commit introduces the keyword "client-sigalgs" for the bind line, which does the same as "sigalgs" but for the client authentication. "ssl-default-bind-client-sigalgs" allows to set the default parameter for all the bind lines. This patch should fix issue #2081.	2023-05-05 00:05:46 +02:00
William Lallemand	1d3c822300	MINOR: ssl: allow to change the server signature algorithm This patch introduces the "sigalgs" keyword for the bind line, which allows to configure the list of server signature algorithms negotiated during the handshake. Also available as "ssl-default-bind-sigalgs" in the default section. This patch was originally written by Bruno Henc.	2023-05-04 22:43:18 +02:00
Willy Tarreau	e69919d1ba	CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash() The function isn't used anymore since each call place performs its own loop. Let's get rid of it.	2023-05-04 19:19:04 +02:00
Willy Tarreau	9a6ecbd590	MEDIUM: debug: simplify the thread dump mechanism The thread dump mechanism that is used by "show threads" and by the panic dump is overly complicated due to an initial misdesign. It first wakes all threads, then serializes their dumps, then releases them, while taking extreme care not to face colliding dumps. In fact this is not what we need and it reached a limit where big machines cannot dump all their threads anymore due to buffer size limitations. What is needed instead is to be able to dump one thread, and to let the requester iterate on all threads. That's what this patch does. It adds the thread_dump_buffer to the struct thread_ctx so that the requester offers the buffer to the thread that is about to be dumped. This buffer also serves as a lock. A thread at rest has a NULL, a valid pointer indicates the thread is using it, and 0x1 (NULL+1) is used by the dumped thread to tell the requester it's done. This makes sure that a given thread is dumped once at a time. In addition to this, the calling thread decides whether it accesses the thread by itself or via the debug signal handler, in order to get a backtrace. This is much saner because the calling thread is free to do whatever it wants with the buffer after each thread is dumped, and there is no dependency between threads, once they've dumped, they're free to continue (and possibly to dump for another requester if needed). Finally, when the THREAD_DUMP feature is disabled and the debug signal is not used, the requester accesses the thread by itself like before. For now we still have the buffer size limitation but it will be addressed in future patches.	2023-05-04 19:15:44 +02:00
Aurelien DARRAGON	e910909556	BUG/MINOR: time: fix NS_TO_TV macro NS_TO_TV helper was implemented in `591fa59` ("MINOR: time: add conversions to/from nanosecond timestamps") Due to NS_TO_TV being implemented as a macro and not a function, we must take extra care when manipulating user input. In current implementation, 't' argument is not isolated within the macro. Because of this, NS_TO_TV(1 + 1) will expand to: ((const struct timeval){ .tv_sec = 1 + 1 / 1000000000ULL, .tv_usec = (1 + 1 % 1000000000ULL) / 1000U }) Instead of: ((const struct timeval){ .tv_sec = 2 / 1000000000ULL, .tv_usec = (2 % 1000000000ULL) / 1000U }) As such, NS_TO_TV usage in hlua_now() is currently incorrect and this results in unexpected values being passed to lua. In this patch, we're adding an extra parenthesis around 't' in NS_TO_TV() macro to make it safe against such usages. (that is: ensure proper argument expansion as if NS_TO_TV was implemented as a function) This is a 2.8 specific bug, no backport needed.	2023-05-04 18:09:50 +02:00
Amaury Denoyelle	51f116d65e	MINOR: mux-quic: adjust local error API When a fatal error is detected by the QUIC MUX or H3 layer, the connection should be closed with a CONNECTION_CLOSE with an error code as the reason. Previously, a direct call was used to the quic_conn layer to try to close the connection. This API was adjusted to be more flexible. Now, when an error is detected, the function qcc_set_error() is called. This sets the flag QC_CF_ERRL with the error code stored by the MUX. The connection will be closed soon so most of the operations are not conducted anymore. Connection is then finally closed during qc_send() via quic_conn layer if QC_CF_ERRL is set. This will set the flag QC_CF_ERRL_DONE which indicates that the MUX instance can be freed. This model is cleaner and brings the following improvements : - interaction with quic_conn layer for closure is centralized on a single function - CO_FL_ERROR is not set anymore. This was incorrect as this should be reserved to errors reported by the transport layer to be similar with other haproxy components. As a consequence, qcc_is_dead() has been adjusted to check for QC_CF_ERRL_DONE to release the MUX instance. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	8d44bfaf0b	MINOR: mux-quic: add trace event for local error Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally detected error when a CONNECTION_CLOSE should be emitted. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	bc0adfa334	MINOR: proxy: factorize send rate measurement Implement a new dedicated function increment_send_rate() which can be called anywhere new bytes must be accounted for in the global total sent.	2023-04-28 16:53:44 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One open question concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now. 
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
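The "wraps every 585 years" figure in 69530f59ae follows from simple arithmetic on a 64-bit nanosecond counter. The sketch below checks it at compile time. It assumes Julian years of 365.25 days, and the names NS_PER_JULIAN_YEAR, ns_wrap_years() and ns_ts_is_set() are hypothetical, not HAProxy identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* nanoseconds in one Julian year: 1e9 ns/s * 86400 s/day * 365.25 days */
#define NS_PER_JULIAN_YEAR (1000000000ULL * 86400ULL * 36525ULL / 100ULL)

/* 2^64 ns is about 584.5 years, i.e. 584 complete years before wrapping */
static_assert(UINT64_MAX / NS_PER_JULIAN_YEAR == 584,
              "a 64-bit nanosecond counter lasts roughly 585 years");

/* number of whole years a 64-bit nanosecond counter runs before wrapping */
static inline uint64_t ns_wrap_years(void)
{
	return UINT64_MAX / NS_PER_JULIAN_YEAR;
}

/* a counter that never wraps and never reads zero lets zero mean "not set yet" */
static inline int ns_ts_is_set(uint64_t ts)
{
	return ts != 0;
}
```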
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Willy Tarreau	e8e4712771	MINOR: checks: use a nanosecond counter instead of timeval for checks->start Now we store the checks start date as a nanosecond timestamp instead of a timeval, this will simplify the operations with "now" in the near future.	2023-04-28 16:08:08 +02:00
Willy Tarreau	ad5a5f6779	MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request Let's get rid of timeval in storage of internal timestamps so that they are no longer mistaken for wall clock time. These were exclusively used subtracted from each other or to/from "now" after being converted to ns, so this patch removes the tv_to_ns() conversion to use them natively. Two occurrences of tv_isge() were turned to a regular wrapping subtract.	2023-04-28 16:08:08 +02:00
Willy Tarreau	aaebcae58b	MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps Various points were collected during a request/response and were stored using timeval. Let's now switch them to nanosecond based timestamps.	2023-04-28 16:08:08 +02:00
Willy Tarreau	76d343d3d3	MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract Instead of operating on {sec, usec} now we convert both operands to ns then subtract them and convert to ms. This is a first step towards dropping timeval from these timestamps. Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at all and could be removed.	2023-04-28 16:08:08 +02:00
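The "linear subtract" from 76d343d3d3 can be sketched as follows: both operands are converted to nanoseconds, subtracted, and the difference is scaled to milliseconds. The helper names tv_to_ns_sketch() and elapsed_ms_sketch() are hypothetical and only illustrate the idea; they are not the functions added to HAProxy.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* convert a {sec, usec} pair to a flat nanosecond count */
static inline uint64_t tv_to_ns_sketch(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * 1000000000ULL + (uint64_t)tv->tv_usec * 1000ULL;
}

/* elapsed milliseconds between two timevals using a single subtraction */
static inline uint64_t elapsed_ms_sketch(const struct timeval *from, const struct timeval *to)
{
	return (tv_to_ns_sketch(to) - tv_to_ns_sketch(from)) / 1000000ULL;
}

static inline void elapsed_ms_demo(void)
{
	struct timeval from = { .tv_sec = 1, .tv_usec = 900000 }; /* 1.9 s */
	struct timeval to   = { .tv_sec = 2, .tv_usec = 100000 }; /* 2.1 s */

	/* no borrow handling between the sec and usec fields is needed */
	assert(elapsed_ms_sketch(&from, &to) == 200);
}
```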
Willy Tarreau	591fa59da7	MINOR: time: add conversions to/from nanosecond timestamps In order to ease the transition away from the timeval used in internal timestamps, let's first create a few functions and macro to return a counter from a timeval and conversely, as well as ease the conversions to/from ns/us/ms/sec to save the user from having to count zeroes and to think about appending ULL in conversions.	2023-04-28 16:08:08 +02:00
Christopher Faulet	81951f264e	BUG/MINOR: stconn: Fix SC flags with same value SC_FL_SND_NEVERWAIT and SC_FL_SND_EXP_MORE flags have the same value. It is not critical because these flags are only used to know if MSG_MORE flag must be set on a send(). No backport needed.	2023-04-28 08:51:34 +02:00
Christopher Faulet	e99c43907c	BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones It is possible to start too many applets on a sporadic burst of events after an inactivity period. It is due to the way we estimate if a new applet must be created or not. It is based on a frequency counter. We compare the events processing rate against the number of events currently processed (in progress or waiting to be processed). But we should also take care of the number of idle applets. We already track the number of idle applets, but it is global and not per-thread. Thus we now also track the number of idle applets per-thread. It is not a big deal because this fills a hole in the spoe_agent structure. Thanks to this counter, we can refrain from creating applets if there are enough idle applets to handle currently processed events. This patch should be backported to all stable versions.	2023-04-28 08:51:34 +02:00
Amaury Denoyelle	d6646dddcc	MINOR: quic: finalize affinity change as soon as possible During accept, a quic-conn is rebound to a new thread. This process is done in two steps: * first on the original thread via qc_set_tid_affinity() * then on the newly assigned thread via qc_finalize_affinity_rebind() Most quic_conn operations (I/O tasklet, task and quic_conn FD socket read) are reactivated only after the second step. However, there is a possibility that datagrams are handled before it via quic_dgram_parse() when using listener sockets. This does not seem to cause any issue but this may cause unexpected behavior in the future. To simplify this, qc_finalize_affinity_rebind() will be called both by qc_xprt_start() and quic_dgram_parse(). Only one invocation will be performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	24962dd178	BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length Some HTX responses may not always contain an EOM block. For example this is the case if content-length header is missing from the HTTP server response. Stream termination is thus signaled to QUIC mux via shutw callback. However, this is interpreted unconditionally as an early close by the mux with a RESET_STREAM emission. Most of the time, QUIC clients report this as an error. To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for a qcs instance. If true, shutw will never be used to emit a RESET_STREAM. Instead, the stream will be closed properly with a FIN STREAM frame. If all data were already transferred, an empty STREAM frame is sent. This fix may help with the GitHub issue #2004 where the Chrome browser stops using QUIC after receiving RESET_STREAM frames. This issue was reported by Vladimir Zakharychev. Thanks to him for his help and testing. It was also reproduced locally using httpterm with the query string "/?s=1k&b=0&C=1". This should be backported up to 2.7.	2023-04-26 17:50:09 +02:00
Willy Tarreau	543e2544ca	DEBUG: crash using an invalid opcode on aarch64 instead of an invalid access On aarch64 there's also a guaranteed invalid instruction, called UDF, and which even supports an optional 16-bit immediate operand: https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UDF--Permanently-Undefined-?lang=en It's conveniently encoded as 4 zeroes (when the operand is zero). It's unclear when support for it was added into GAS, if at all; even a not-so-old 2.27 doesn't know about it. Let's byte-encode it. Tested on an A72 and works as expected.	2023-04-25 19:53:39 +02:00
Willy Tarreau	77787ec9bc	DEBUG: crash using an invalid opcode on x86/x86_64 instead of an invalid access BUG_ON() calls currently trigger a segfault. This is more convenient than abort() as it doesn't rely on any function call nor signal handler and never causes non-unwindable stacks when opening cores. But it adds quite some confusion in bug reports which are rightfully tagged "segv" and do not instantly allow to distinguish real segv (e.g. null derefs) from code asserts. Some CPU architectures offer various crashing methods. On x86 we have INT3 (0xCC), which stops into the debugger, and UD0/UD1/UD2. INT3 looks appealing but for whatever reason (maybe signal handling somewhere) it loses the last call point in the stack, making backtraces unusable. UD2 has the merit of being only 2 bytes and causing an invalid instruction, which almost never happens normally, so it's easily distinguishable. Here it was defined as a macro so that the line number in the core matches the one where the BUG_ON() macro is called, and the debugger shows the last frame exactly at its calling point. E.g. when calling "debug dev bug": Program terminated with signal SIGILL, Illegal instruction. 
#0 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408 408 BUG_ON(one > zero); [Current thread is 1 (Thread 0x7f7a660cc1c0 (LWP 14238))] (gdb) bt #0 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408 #1 debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:402 #2 0x000000000061a69f in cli_parse_request (appctx=appctx@entry=0x181c0160) at src/cli.c:832 #3 0x000000000061af86 in cli_io_handler (appctx=0x181c0160) at src/cli.c:1035 #4 0x00000000006ca2f2 in task_run_applet (t=0x181c0290, context=0x181c0160, state=<optimized out>) at src/applet.c:449	2023-04-25 18:51:10 +02:00
Amaury Denoyelle	d5f03cd576	CLEANUP: quic: rename frame variables Rename all frame variables with the suffix _frm. This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:22 +02:00
Amaury Denoyelle	888c5f283a	CLEANUP: quic: rename frame types with an explicit prefix Each frame type used in quic_frame union has been renamed with the following prefix "qf_". This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:03 +02:00
Willy Tarreau	7310164b2c	MINOR: listener: add a new global tune.listener.default-shards setting This new setting accepts "by-process", "by-group" and "by-thread" and will dictate how listeners will be sharded by default when nothing is specified. While the default remains "by-process", "by-group" should be much more efficient with many threads, while not changing anything for single-group setups.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f1003ea7fa	MINOR: protocol: perform a live check for SO_REUSEPORT support When testing if a protocol supports SO_REUSEPORT, we're now able to verify if the OS does really support it. While it may be supported at build time, it may possibly have been blocked in a container for example so we'd rather know what it's like.	2023-04-23 09:46:15 +02:00
Willy Tarreau	b073573c10	MINOR: sock: add a function to check for SO_REUSEPORT support at runtime The new function _sock_supports_reuseport() will be used to check if a protocol type supports SO_REUSEPORT or not. This will be useful to verify that shards can really work.	2023-04-23 09:46:15 +02:00
Willy Tarreau	8a5e6f4cca	MINOR: protocol: add a function to check if some features are supported The new function protocol_supports_flag() checks the protocol flags to verify if some features are supported, but will support being extended to refine the tests. Let's use it to check for REUSEPORT.	2023-04-23 09:46:15 +02:00
Willy Tarreau	785b89f551	MINOR: protocol: move the global reuseport flag to the protocols Some protocols support SO_REUSEPORT and others do not. Some have such a limitation in the kernel, and others in haproxy itself (e.g. sock_unix cannot support multiple bindings since each one will unbind the previous one). Also it's really protocol-dependent and not just family-dependent because on Linux for some time it was supported for TCP and not UDP. Let's move the definition to the protocols instead. Now it's preset in tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset. The enabled() config condition test validates IPv4 (generally sufficient), and -dR / noreuseport apply to all protocols at once.	2023-04-23 09:46:15 +02:00
Willy Tarreau	65df7e028d	MINOR: protocol: add a flags field to store info about protocols We'll use these flags to know if some protocols are supported, and if so, with what options/extensions. Reuseport will move there for example. Two functions were added to globally set/clear a flag.	2023-04-23 09:46:15 +02:00
Willy Tarreau	da0d2cb698	MINOR: proxy: make proxy_type_str() recognize peers sections Now proxy_type_str() will emit "peers section" when the mode is set to peers, so as to ease sharing more code between peers and proxies.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f6a8444f55	REORG: listener: move the bind_conf's thread setup code to listener.c What used to be only two lines to apply a mask in a loop in check_config_validity() grew into a 130-line block that performs deeply listener-specific operations that do not have their place there anymore. In addition it's worth noting that the peers code still doesn't support shards nor being bound to more than one group, which is a second reason for moving that code to its own function. Nothing was changed except recreating the missing variables from the bind_conf itself (the fe only).	2023-04-23 09:46:15 +02:00
Willy Tarreau	4c538df28c	CLEANUP: protocol: move the nb_receivers to plug a hole in protocol This field forces an unaligned hole between two list heads. Let's move it up where it will be more easily combined with other fields. In addition, turn it to unsigned while it's still not used.	2023-04-23 09:46:15 +02:00
Willy Tarreau	798d6b4124	CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam There's a two-byte hole in proto_fam after sock_family, let's move the l3_addrlen there as a ushort. Note that contrary to what the comment says, it's still not used by hash algorithms though it could.	2023-04-23 09:46:15 +02:00
Willy Tarreau	df4051cd58	BUILD: proto_tcp: export the correct names for proto_tcpv[46] The exported names were not correct (missing the 'v').	2023-04-23 09:46:15 +02:00
Willy Tarreau	968a4f34fc	BUILD: sock_inet: forward-declare struct receiver Including sock_inet.h without receiver-t.h causes build failures due to struct receiver not being defined. Let's just forward-declare it.	2023-04-23 09:46:15 +02:00
Ilya Shipitsin	ccf8012f28	CLEANUP: assorted typo fixes in the code and comments This is the 36th iteration of typo fixes	2023-04-23 09:44:53 +02:00
Tim Duesterhus	3a8c63d48d	MINOR: Make `tasklet_free()` safe to be called with `NULL` Make this freeing function safe, like other freeing functions are as discussed in GitHub issue #2126.	2023-04-23 00:28:25 +02:00
Willy Tarreau	ff18504d73	MINOR: listener: make sure to avoid ABA updates in per-thread index One limitation of the current thread index mechanism is that if the values are assigned multiple times to the same thread and the index loops, it can match again the old value, which will not prevent a competing thread from finishing its CAS and assigning traffic to a thread that's not the optimal one. The probability is low but the solution is simple enough and consists in implementing an update counter in the high bits of the index to force a mismatch in this case (assuming we don't try to cover for extremely unlikely cases where the update counter loops while the index remains equal). So let's do that. In order to improve the situation a little bit, we now set the index to a ulong so that in 32 bits we have 8 bits of counter and in 64 bits we have 40 bits.	2023-04-21 17:41:26 +02:00
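The ABA protection described in ff18504d73 can be illustrated with a small C11 sketch. It assumes a 24-bit index in the low bits of an unsigned long, which is consistent with the 8-bit and 40-bit counter sizes quoted in the message, but the names IDX_BITS, IDX_MASK and idx_try_update() are hypothetical and the real listener code may be organized differently.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed layout: the low 24 bits hold the thread index, the remaining
 * high bits (8 on 32-bit, 40 on 64-bit platforms) hold an update counter.
 */
#define IDX_BITS 24u
#define IDX_MASK ((1ul << IDX_BITS) - 1ul)

/* Try to replace the word we observed earlier (<seen>) with <new_idx>.
 * The counter in the high bits is bumped on every successful update, so a
 * competitor holding an older snapshot fails its CAS even if the index
 * part has looped back to the value it saw. Returns non-zero on success.
 */
static inline int idx_try_update(_Atomic unsigned long *word,
                                 unsigned long seen, unsigned long new_idx)
{
	unsigned long counter = (seen >> IDX_BITS) + 1ul;

	assert(new_idx <= IDX_MASK);
	return atomic_compare_exchange_strong(word, &seen,
	                                      (counter << IDX_BITS) | new_idx);
}
```

As the commit message notes, this does not cover the extremely unlikely case where the counter itself loops while the index remains equal.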
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded than another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
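A minimal sketch of the merged index from e6f5ab5afa is shown below. Which half of the 32-bit word holds the head and which holds the tail is an assumption made here for illustration, as are the helper names q_pack() and q_len() and the requirement that the ring size be a power of two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: head in the upper 16 bits, tail in the lower 16 bits */
static inline uint32_t q_pack(uint16_t head, uint16_t tail)
{
	return ((uint32_t)head << 16) | tail;
}

/* Number of queued entries, computed from a single atomic load so that
 * head and tail always belong to the same snapshot. <size> is the ring
 * size and must be a power of two no larger than 65536.
 */
static inline unsigned int q_len(_Atomic uint32_t *idx, unsigned int size)
{
	uint32_t v = atomic_load(idx);
	uint16_t head = (uint16_t)(v >> 16);
	uint16_t tail = (uint16_t)(v & 0xffffu);

	assert(size != 0 && size <= 65536u && (size & (size - 1)) == 0);
	return (uint16_t)(tail - head) % size;
}
```

With two separate variables, a reader could pair a fresh head with an outdated tail; reading one packed word removes that window.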
Willy Tarreau	e4c36aa8a1	MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped The purpose of this new flag will be to mark that some listeners duplicate their reference's FD instead of trying to setup a completely new listener from scratch. This will be used when multiple groups want to listen to the same socket, via multiple FDs.	2023-04-21 17:41:26 +02:00
Willy Tarreau	aae1810b4d	MINOR: receiver: add a struct shard_info to store info about each shard In order to create multiple receivers for one multi-group shard, we'll need some more info about the shard. Here we store: - the number of groups (= number of receivers) - the number of threads (will be used for accept LB) - pointer to the reference rx (to get the FD and to find all threads) - pointers to the other members (to iterate over all threads) For now since there's only one group per shard it remains simple. The listener deletion code already takes care of removing the current member from its shards list and moving others' reference to the last one if it was their reference (so as to avoid o(n^2) updates during ordered deletes). Since the vast majority of setups will not use multi-group shards, we try to save memory usage by only allocating the shard_info when it is needed, so the principle here is that a receiver shard_info==NULL is alone and doesn't share its socket with another group. Various approaches were considered and tests show that the management of the listeners during boot makes it easier to just attach to or detach from a shard_info and automatically allocate it if it does not exist, which is what is being done here. For now the attach code is not called, but detach is already called on delete.	2023-04-21 17:41:26 +02:00
Willy Tarreau	84fe1f479b	MINOR: listener: support another thread dispatch mode: "fair" This new algorithm for rebalancing incoming connections to multiple threads is simpler and instead of considering the threads load, it will only cycle through all of them, offering a fair share of the traffic to each thread. It may be well suited for short-lived connections but is also convenient for very large thread counts where it's not always certain that the least loaded thread will always be found.	2023-04-21 17:41:26 +02:00
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Willy Tarreau	77d37b07b1	MINOR: quic: support migrating the listener as well When migrating a quic_conn to another thread, we may need to also switch the listener if the thread belongs to another group. When this happens, the freshly created connection will already have the target listener, so let's just pick it from the connection and use it in qc_set_tid_affinity(). Note that it will be the caller's responsibility to guarantee this.	2023-04-21 17:41:26 +02:00
Aurelien DARRAGON	76e255520f	MINOR: server: pass adm and op cause to srv_update_status() Operational and administrative state change causes are not propagated through srv_update_status(), instead they are directly consumed within the function to provide additional info during the call when required. Thus, there is no valid reason for keeping adm and op causes within server struct. We are wasting space and keeping unneeded complexity. We now explicitly pass change type (operational or administrative) and associated cause to srv_update_status() so that no extra storage is needed since those values are only relevant from srv_update_status().	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	1746b56e68	MINOR: server: change srv_op_st_chg_cause storage type This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type". While looking at current srv_op_st_chg_cause usage, it was clear that the struct needed some cleanup since some leftovers from asynchronous server state change updates were left behind and resulted in some useless code duplication, and made the whole thing harder to maintain. Two observations were made: - by tracking down srv_set_{running, stopped, stopping} usage, we can see that the <reason> argument is always a fixed statically allocated string. - check-related state change context (duration, status, code...) is not used anymore since srv_append_status() directly extracts the values from the server->check. This is pure legacy from when the state changes were applied asynchronously. To prevent code duplication, useless string copies and make the reason/cause more exportable, we store it as an enum now, and we provide srv_op_st_chg_cause() function to fetch the related description string. HEALTH and AGENT causes (check related) are now explicitly identified to make consumers like srv_append_op_chg_cause() able to fetch checks info from the server itself if they need to.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	f3b48a808e	MINOR: server: srv_append_status refacto srv_append_status() has become a swiss-knife function over time. It is used from server code and also from checks code, with various inputs and distinct code paths, making it very hard to guess the actual behavior of the function (resulting string output). To simplify the logic behind it, we're dividing it in multiple contextual functions that take simple inputs and do explicit things, making them more predictable and easier to maintain.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	9b1ccd7325	MINOR: server: change adm_st_chg_cause storage type Even though it doesn't look like it at first glance, this is more like a cleanup than an actual code improvement: Given that srv->adm_st_chg_cause has been used to exclusively store static strings ever since it was implemented, we make the choice to store it as an enum instead of a fixed-size string within server struct. This will allow to save some space in server struct, and will make it more easily exportable (ie: event handlers) because of the reduced memory footprint during handling and the ability to later get the corresponding human-readable message when it's explicitly needed.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	e9314fb7a7	MINOR: event_hdl: provide event->when for advanced handlers For advanced async handlers only (Registered using EVENT_HDL_ASYNC_TASK() macro): event->when is provided as a struct timeval and fetched from 'date' haproxy global variable. Thanks to 'when', related event consumers will be able to timestamp events, even if they don't work in real-time or near real-time. Indeed, unlike sync or normal async handlers, advanced async handlers could purposely delay the consumption of pending events, which means that the date wouldn't be accurate if computed directly from within the handler.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	ebf58e991a	MINOR: event_hdl: dynamically allocated event data members Add the ability to provide a cleanup function for event data passed via the publishing function. One use case could be the need to provide valid pointers in the safe section of the data struct. Cleanup function will be automatically called with data (or copy of data) as argument when all handlers consumed the event, which provides an easy way to release some memory or decrement refcounts to resources that were provided through the data struct. data in itself may not be freed by the cleanup function, it is handled by the API. This would allow passing large (allocated) data blocks through the data struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA size limit. To do so, when publishing an event, where we would currently do: struct event_hdl_cb_data_new_family event_data; /* safe data, available from both sync and async contexts; may not use pointers to short-living resources */ event_data.safe.my_custom_data = x; /* unsafe data, only available from sync contexts */ event_data.unsafe.my_unsafe_data = y; /* once data is prepared, we can publish the event */ event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA(&event_data)); We could do: struct event_hdl_cb_data_new_family event_data; /* safe data, available from both sync and async contexts; may not use pointers to short-living resources, unless EVENT_HDL_CB_DATA_DM is used to ensure pointer consistency (ie: refcount) */ event_data.safe.my_custom_static_data = x; event_data.safe.my_custom_dynamic_data = malloc(1); /* unsafe data, only available from sync contexts */ event_data.unsafe.my_unsafe_data = y; /* once data is prepared, we can publish the event */ event_hdl_publish(NULL, EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1, EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup)); With data_new_family_cleanup func which would look like this: void data_new_family_cleanup(const void *data) { const struct event_hdl_cb_data_new_family *event_data = data; /* some data members require specific cleanup once the event is consumed */ free(event_data->safe.my_custom_dynamic_data); /* don't ever free data! it is not ours */ } Not sure if this feature will become relevant in the future, so I prefer not to mention it in the doc for now. But given that the implementation is trivial and does not put a burden on the existing API, it's a good thing to have it there, just in case.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	147691fd83	CLEANUP: event_hdl: fix comment typo about _sync assertion Fixing a comment relative to EVENT_HDL_ASSERT_SYNC macro where a typo was made and the comment was lacking some context.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	363ef4daa7	CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA EVENT_HDL_CB_DATA macro comments were not updated during the API refactor, fixing that.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	8273bfc639	BUG/MINOR: event_hdl: don't waste 1 event subtype slot ESUB_INDEX(n) index macro is used exclusively with n > 0. Fixing it so that it starts numbering at 1 instead of 2. This way, we don't waste a subtype slot in event_hdl_sub_type struct, and we comply with the structure comments about max supported event subtypes (currently set at 16). If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	a63f4903c9	MINOR: server/event_hdl: prepare for upcoming refactors This commit does nothing that ought to be mentioned, except that it adds missing comments and slightly moves some function calls out of "sensitive" code in preparation of some server code refactors.	2023-04-21 14:36:45 +02:00
Aurelien DARRAGON	d714213862	MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server Expose proxy_uuid variable in event_hdl_cb_data_server struct to overcome proxy_name fixed length limitation. proxy_uuid may be used by the handler to perform proxy lookups. This should be preferred over lookups relying on proxy_name. (proxy_name is suitable for printing / logging purposes but not for ID lookups since it has a maximum fixed length)	2023-04-21 14:36:45 +02:00
Frédéric Lécaille	0ed94032b2	MINOR: quic: Do not allocate too many ack ranges Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32). Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Frédéric Lécaille	4b2627beae	BUG/MINOR: quic: Stop removing ACK ranges when building packets Since this commit: BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit. There are more chances that ack ranges may be removed from their trees when building a packet. It is preferable to impose a limit to these trees. This will be the subject of an upcoming commit. For now, it is sufficient to stop deleting ack ranges from their trees. Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were there to do that. Make qc_frm_len() support ACK frames and call it to ensure an ACK frame may be added to a packet before building it. Must be backported to 2.6 and 2.7.	2023-04-19 11:36:54 +02:00
Aurelien DARRAGON	2a9764baae	CLEANUP: hlua: avoid confusion between internal timers and tick based timers Not all hlua "time" variables use the same time logic. hlua->wake_time relies on ticks since it's meant to be used in conjunction with task scheduling. Thus, it should be stored as a signed int and manipulated using the tick api. Adding a few comments about that to prevent mixups with hlua internal timer api which doesn't rely on the ticks api.	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	da9503ca9a	MEDIUM: hlua: reliable timeout detection For non yieldable lua handlers (converters, fetches or yield incompatible lua functions), current timeout detection relies on now_ms thread local variable. But within non-yieldable contexts, now_ms won't be updated if not by us (because we're momentarily stuck in lua context so we won't re-enter the polling loop, which is responsible for clock updates). To circumvent this, clock_update_date(0, 1) was manually performed right before now_ms is being read for the timeout checks. But this fails to work consistently, because if no other concurrent threads periodically run clock_update_global_date(), which does happen if we're the only active thread (nbthread=1 or low traffic), our clock_update_date() call won't reliably update our local now_ms variable. Moreover, clock_update_date() is not the right tool for this anyway, as it was initially meant to be used from the polling context. Using it could have negative impact on other threads relying on now_ms to be stable. (because clock_update_date() performs global clock update from time to time) -> Introducing hlua multipurpose timer, which is internally based on now_cpu_time_fast() that provides per-thread consistent clock readings. Thanks to this new hlua timer API, hlua timeout logic is less error-prone and more robust. This allows the timeout detection to work as expected for both yieldable and non-yieldable lua handlers. This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function". While this could theoretically be backported to all stable versions, it is advisable to avoid backports unless we're confident enough since it could cause slight behavior changes (timing related) in existing setups.	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	df188f145b	MINOR: clock: add now_cpu_time_fast() function Same as now_cpu_time(), but for fast queries (less accurate) Relies on now_cpu_time() and now_mono_time_fast() is used as a cache expiration hint to prevent now_cpu_time() from being called too often since it is known to be quite expensive. Depends on commit "MINOR: clock: add now_mono_time_fast() function"	2023-04-19 11:03:31 +02:00
Aurelien DARRAGON	07cbd8e074	MINOR: clock: add now_mono_time_fast() function Same as now_mono_time(), but for fast queries (less accurate) Relies on coarse clock source (also known as fast clock source on some systems). Fallback to now_mono_time() if coarse source is not supported on the system.	2023-04-19 11:03:31 +02:00
Amaury Denoyelle	0783a7b08e	MINOR: listener: remove unneeded local accept flag Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC protocol before thread rebinding was supported by the quic_conn layer. This should be backported up to 2.7 after the previous patch has also been taken.	2023-04-18 17:09:34 +02:00
Amaury Denoyelle	739de3f119	MINOR: quic: properly finalize thread rebinding When a quic_conn instance is rebound to a new thread, its tasks and tasklet are destroyed and new ones created. Its socket is also migrated to a new thread, which stops reception on it. To properly reactivate a quic_conn after rebind, wake up its tasks and tasklet if they were active before thread rebind. Also reactivate reading on the socket FD. These operations are implemented in a new function qc_finalize_affinity_rebind(). This should be backported up to 2.7 after a period of observation.	2023-04-18 17:09:02 +02:00
Amaury Denoyelle	25174d51ef	MEDIUM: quic: implement thread affinity rebinding Implement a new function qc_set_tid_affinity(). This function is responsible for rebinding a quic_conn instance to a new thread. This operation consists mostly of releasing existing tasks and tasklet and allocating new instances on the new thread. If the quic_conn uses its own socket, it is also migrated to the new thread. The migration is finally completed by updating the CID TID to the new thread. After this step, the connection is thus accessible to the new thread and cannot be accessed anymore on the old one without risking a race condition. To ensure rebinding is either done completely or not at all, tasks and tasklet are pre-allocated before all operations. If this fails, an error is returned and rebinding is not done. To destroy the older tasklet, its context is set to NULL before wake up. In I/O callbacks, a new function qc_process() is used to check context and free the tasklet if NULL. The thread rebinding can cause a race condition if the older thread quic_dghdlrs::dgrams list contains datagrams for the connection after rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always checks if the packet CID is still associated to the current thread or not. In the latter case, no connection is returned and the new thread is returned to allow the datagram to be redispatched to the new thread in a thread-safe way. This should be backported up to 2.7 after a period of observation.	2023-04-18 17:08:34 +02:00
Amaury Denoyelle	1304d19dee	MINOR: quic: delay post handshake frames after accept When QUIC handshake is completed on our side, some frames are prepared to be sent : * HANDSHAKE_DONE * several NEW_CONNECTION_ID with CIDs allocated This step was previously executed in quic_conn_io_cb() directly after CRYPTO frames parsing. This patch delays it to be completed after accept. Special care has been taken to ensure it is still functional with 0-RTT activated. For the moment, this patch should have no impact. However, when quic_conn thread migration on accept will be implemented, it will be easier to remap only one CID to the new thread. New CIDs will be allocated after migration on the new thread. This should be backported up to 2.7 after a period of observation.	2023-04-18 17:08:28 +02:00
Amaury Denoyelle	a66e04338e	MINOR: protocol: define new callback set_affinity Define a new protocol callback set_affinity. This function is used during listener_accept() to notify about a rebind on a new thread just before pushing the connection on the selected thread queue. If the callback fails, accept is done locally. This change will be useful for protocols with state allocated before accept is done. For the moment, only QUIC protocol is concerned. This will allow to rebind the quic_conn to a new thread depending on its load. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:52 +02:00
Amaury Denoyelle	1e959ad522	MINOR: quic: remove TID encoding in CID CIDs were moved from a per-thread list to a global list instance. The TID encoding is thus no longer needed. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:31 +02:00
Amaury Denoyelle	e83f937cc1	MEDIUM: quic: use a global CID trees list Previously, quic_connection_id were stored in a per-thread tree list. Datagrams were first dispatched to the correct thread using the encoded TID before a tree lookup was done. Remove these trees and replace them with a global trees list of 256 entries. A CID uses the list index corresponding to its first byte. On datagram dispatch, the CID is looked up in its tree and the TID is retrieved using the new member quic_connection_id.tid. As such, a read-write lock protects each list instance. With 256 entries, it is expected that contention should be reduced. A new structure quic_cid_tree serves as a tree container associated with its read-write lock. An API is implemented to ensure lock safety for insert/lookup/delete operations. This patch is a step forward to be able to break the affinity between a CID and a TID encoded thread. This is required to be able to migrate a quic_conn after accept to select a thread based on its load. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:17 +02:00
Amaury Denoyelle	66947283ba	MINOR: quic: remove TID ref from quic_conn Remove <tid> member in quic_conn. This is moved to quic_connection_id instance. For the moment, this change has no impact. Indeed, qc.tid reference could easily be replaced by tid as all of this work was already done on the connection thread. However, it is planned to support quic_conn thread migration in the future, so removal of qc.tid will simplify this. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Amaury Denoyelle	c2a9264f34	MINOR: quic: adjust quic CID derive API ODCID are never stored in the CID tree. Instead, we store our generated CID which is directly derived from the CID using a hash function. This operation is done via quic_derive_cid(). Previously, generated CID was returned as a 64-bits integer. However, this is cumbersome to convert as an array of bytes which is the most common CID representation. Adjust this by modifying return type to a quic_cid struct. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Amaury Denoyelle	1a5cc19cec	MINOR: quic: adjust Rx packet type parsing qc_parse_hd_form() is the function used to parse the first byte of a packet and return its type and version. Its API has been simplified with the following changes : * extra out parameters are removed (long_header and version). All information is now stored directly in the quic_rx_packet instance * a new dummy version is declared in quic_versions array with a 0 number code. This can be used to match Version negotiation packets. * a new default packet type is defined QUIC_PACKET_TYPE_UNKNOWN to be used as an initial value. Also, the function has been exported to an include file. This will be useful to be able to reuse it in quic-sock to parse the first packet of a datagram. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Amaury Denoyelle	591e7981d9	CLEANUP: quic: rename quic_connection_id vars Two different structs exist for QUIC connection ID : * quic_connection_id which represents a full CID with its sequence number * quic_cid which is just a buffer with a length. It is contained in the above structure. To better differentiate them, rename all quic_connection_id variable instances to "conn_id" by contrast to "cid" which is used for quic_cid. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Amaury Denoyelle	90e5027e46	CLEANUP: quic: remove unused scid_node Remove unused scid_node member for quic_conn structure. It was prepared for QUIC backend support. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Amaury Denoyelle	22a368ce58	CLEANUP: quic: remove unused QUIC_LOCK label QUIC_LOCK label is never used. Indeed, lock usage is minimal on QUIC as every connection is pinned to its own thread. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Christopher Faulet	ca5309a9a3	MINOR: stconn: Add a flag to report EOS at the stream-connector level SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will be used to distinguish end of stream reported by the endpoint, via the SE_FL_EOS flag, and the abort triggered by the stream, via the SC_FL_ABRT_DONE flag. In this patch, the flag is defined and is systematically tested everywhere SC_FL_ABRT_DONE is tested. It should be safe because it is never set.	2023-04-17 17:41:28 +02:00
Christopher Faulet	a1d14a7c7f	MINOR: stconn: Add a flag to ack endpoint errors at SC level The flag SC_FL_ERROR is added to ack errors on the endpoint. When the SE_FL_ERROR flag is detected on the SE descriptor, the corresponding flag is set on the SC. The idea is to avoid, as far as possible, manipulating the SE descriptor in upper layers and to know when an error in the endpoint is handled by the SC. For now, this flag is only set and cleared but never tested.	2023-04-14 17:05:53 +02:00
Christopher Faulet	b2b1c3a6ea	MINOR: channel/stconn: Replace sc_shutw() by sc_shutdown() All references to a shutw are replaced by a shutdown. So sc_shutw() is renamed sc_shutdown(). SC app ops functions are renamed accordingly.	2023-04-14 15:02:57 +02:00
Christopher Faulet	208c712b40	MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE Here again, it is just a flag renaming. In SC flags, there is no longer shutdown for writes but shutdowns.	2023-04-14 15:01:21 +02:00
Christopher Faulet	cfc11c0eae	MINOR: channel/stconn: Replace sc_shutr() by sc_abort() All references to a shutr are replaced by an abort. So sc_shutr() is renamed sc_abort(). SC app ops functions are renamed accordingly.	2023-04-14 14:54:35 +02:00
Christopher Faulet	0c370eee6d	MINOR: stconn: Rename SC_FL_SHUTR in SC_FL_ABRT_DONE Here again, it is just a flag renaming. In SC flags, there is no longer shutdown for reads but aborts. For now this flag is set when a read0 is detected. It is of course not accurate. This will be changed later.	2023-04-14 14:51:22 +02:00
Christopher Faulet	df7cd710a8	MINOR: channel/stconn: Replace channel_shutw_now() by sc_schedule_shutdown() After the flag renaming, it is now the turn for the channel function to be renamed and moved in the SC scope. channel_shutw_now() is replaced by sc_schedule_shutdown(). The request channel is replaced by the front SC and the response is replaced by the back SC.	2023-04-14 14:49:45 +02:00
Christopher Faulet	e38534cbd0	MINOR: stconn: Rename SC_FL_SHUTW_NOW in SC_FL_SHUT_WANTED Because shutdowns for reads are now considered as aborts, the shutdowns for writes can now be considered as shutdowns. Here it is just a flag renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.	2023-04-14 14:46:07 +02:00
Christopher Faulet	12762f09a5	MINOR: channel/stconn: Replace channel_shutr_now() by sc_schedule_abort() After the flag renaming, it is now the turn for the channel function to be renamed and moved in the SC scope. channel_shutr_now() is replaced by sc_schedule_abort(). The request channel is replaced by the front SC and the response is replaced by the back SC.	2023-04-14 14:08:49 +02:00
Christopher Faulet	573ead1e68	MINOR: stconn: Rename SC_FL_SHUTR_NOW in SC_FL_ABRT_WANTED It is the first step to transform shutdown for reads for the upper layer into aborts. This patch is quite simple, it is just a flag renaming.	2023-04-14 14:06:01 +02:00
Christopher Faulet	7eb837df4a	MINOR: stream: Introduce stream_abort() to abort on both sides in same time The function stream_abort() should now be called when an abort is performed on both channels at the same time.	2023-04-14 14:04:59 +02:00
Christopher Faulet	3db538ac2f	MINOR: channel: Forwad close to other side on abort Most of calls to channel_abort() are associated to a call to channel_auto_close(). Others are in areas where the auto close is the default. So, it is now systematically enabled when an abort is performed on a channel, as part of channel_abort() function.	2023-04-14 13:56:28 +02:00
Christopher Faulet	dbad8ec787	MINOR: stream: Uninline and export sess_set_term_flags() function This function will be used to set termination flags on TCP streams from outside of process_stream(). Thus, it must be uninlined and exported.	2023-04-14 12:13:09 +02:00
Frédéric Lécaille	fad0e6cf73	MINOR: quic: Add packet loss and maximum cc window to "show quic" Add the number of lost packets and the maximum congestion control window computed by the algorithms to "show quic". Same thing for the traces of existing congestion control algorithms. Must be backported to 2.7 and 2.6.	2023-04-13 19:20:08 +02:00
Willy Tarreau	d30e82b9f0	MINOR: receiver: reserve special values for "shards" Instead of artificially setting the shards count to MAX_THREAD when "by-thread" is used, let's reserve special values for symbolic names so that we can add more in the future. For now we use value -1 for "by-thread", which requires to turn the type to signed int but it was already used as such everywhere anyway.	2023-04-13 17:12:50 +02:00
Amaury Denoyelle	53fc98c3bc	MINOR: fd: implement fd_migrate_on() to migrate on a non-local thread fd_migrate_on() can be used to migrate an existing FD to any thread, even one belonging to a different group from the current one and from the caller's. All that is needed is to make sure the FD is still valid when the operation is performed (which is the case when such operations happen). This is potentially slightly expensive since it locks the tgid during the delicate operation, but it is normally performed only from an owning thread to offer the FD to another one (e.g. reassign a better thread upon accept()).	2023-04-13 16:57:51 +02:00
Willy Tarreau	7b44c26e13	MINOR: fd: add a lock bit with the tgid In order to permit to migrate FDs from one thread group to another, we'll need to be able to set a TGID that is compatible with no other thread group. Either we use a special value or we dedicate a special bit. Given that we already have way more bits than needed, let's just sacrifice the topmost one to serve as a lock bit, indicating the tgid is not valid anymore. This will make all fd_grab_tgid() fail to grab it. The new fd_lock_tgid() function now tries to assign a locked tgid to an idle FD, and fd_unlock_tgid() simply drops the lock bit, revealing the target tgid. For now it's still unused so it must not have any effect.	2023-04-13 16:57:51 +02:00
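The lock-bit idea from 7b44c26e13 can be illustrated with C11 atomics. This is only a sketch: the `sk_*` names are hypothetical, and the word layout (group ID in the low 15 bits, lock bit at 0x8000, reference count above it) is an assumption chosen for the example rather than a statement about HAProxy's actual field layout.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SK_TGID_FIELD 0xffffU  /* tgid field, lock bit included (assumed layout) */
#define SK_TGID_LOCK  0x8000U  /* topmost bit of the field: tgid not valid */
#define SK_REFC_ONE   0x10000U /* one reference, stored above the tgid field */

/* Take a reference only if the word carries exactly <tgid> with the lock
 * bit clear. A locked word never matches, so every grab fails meanwhile.
 * Returns 1 on success, 0 otherwise.
 */
static int sk_grab_tgid(_Atomic uint32_t *word, uint32_t tgid)
{
	uint32_t old = atomic_load(word);

	do {
		if ((old & SK_TGID_FIELD) != tgid)
			return 0;
	} while (!atomic_compare_exchange_weak(word, &old, old + SK_REFC_ONE));
	return 1;
}

/* Release a reference taken with sk_grab_tgid(). */
static void sk_drop_tgid(_Atomic uint32_t *word)
{
	atomic_fetch_sub(word, SK_REFC_ONE);
}

/* Try to assign a locked <tgid> to an idle word (no reference held and
 * not already locked). Returns 1 on success, 0 otherwise.
 */
static int sk_lock_tgid(_Atomic uint32_t *word, uint32_t tgid)
{
	uint32_t old = atomic_load(word);

	do {
		if ((old >> 16) != 0 || (old & SK_TGID_LOCK))
			return 0;
	} while (!atomic_compare_exchange_weak(word, &old, tgid | SK_TGID_LOCK));
	return 1;
}

/* Drop the lock bit, revealing the target tgid to other threads. */
static void sk_unlock_tgid(_Atomic uint32_t *word)
{
	atomic_fetch_and(word, ~SK_TGID_LOCK);
}
```

While the lock bit is set, the stored value matches no real group ID, which is what makes the transition from one group to another safe to observe from any thread.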
Willy Tarreau	4d882bd800	MINOR: fd: optimize fd_claim_tgid() for use in fd_insert() fd_claim_tgid() uses a CAS to set the desired TGID on the FD. It's only called from fd_insert() where in the vast majority of cases, the tgid and refcount are zero before the call. However the loop was optimized for the case where it was equal to the desired TGID, systematically causing one extra round in the loop there. Better start assuming a zero value.	2023-04-13 16:57:51 +02:00
Willy Tarreau	97da942ba6	MINOR: thread: keep a bitmask of enabled groups in thread_set We're only checking for 0, 1, or >1 groups enabled there, and we'll soon need to be more precise and know quickly which groups are non-empty. Let's just replace the count with a mask of enabled groups. This will allow to quickly spot the presence of any such group in a set.	2023-04-13 16:57:51 +02:00
William Lallemand	3f210970bf	BUG/MINOR: stick_table: alert when type len has incorrect characters Alert when the len argument of a stick table type contains incorrect characters. Replace atol by strtol. Could be backported to every maintained version.	2023-04-13 14:46:08 +02:00
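The difference between atol and strtol that motivates 3f210970bf is that only strtol reports where parsing stopped, so trailing garbage can be detected. The helper below is a hypothetical sketch of such a check, not the code from the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal length argument.
 * Returns 0 and stores the value in <out> on success, or -1 when the
 * argument is empty, out of range, or followed by unexpected characters.
 */
static int sketch_parse_len(const char *arg, long *out)
{
	char *end;
	long val;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno == ERANGE)
		return -1;          /* value does not fit in a long */
	if (end == arg || *end != '\0')
		return -1;          /* no digits, or trailing characters */

	*out = val;
	return 0;
}
```

By contrast, `atol("32abc")` silently returns 32, so a typo in the configuration would go unnoticed.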
Willy Tarreau	7f2b3f9431	BUILD: bug.h: add a warning in the base API when unsafe functions are used Once in a while we introduce an sprintf() or strncat() function by accident. These ones are particularly dangerous and must never ever be used because the only way to use them safely is at least as complicated if not more, than their safe counterparts. By redefining a few of these functions with an attribute_warning() we can deliver a message to the developer who is tempted to use them. This commit does it for strcat(), strcpy(), strncat(), sprintf(), vsprintf(). More could come later if needed, such as strtok() and maybe a few others, but these are less common.	2023-04-07 18:21:36 +02:00
Willy Tarreau	d499127148	MINOR: compiler: define a __attribute__warning() macro __attribute__((deprecated)) is convenient to discourage from using something deprecated, but gcc >= 4.3 provides __attribute__((warning(x))) that allows to display a specific warning if something is used. This is particularly convenient to give indications when some API parts need to be adapted. Let's just define it as a macro that falls back to the older deprecated attribute when not available. It's supported on clang 14 as well but works differently and errors out when redefined (while the main purpose precisely is to add such a redefinition). Thus instead on clang we use deprecated(msg) which is OK. See https://github.com/llvm/llvm-project/issues/56519	2023-04-07 18:14:28 +02:00
Willy Tarreau	988e19c607	BUILD: compiler: fix __equals_1() on older compilers It appeared that __has_attribute() doesn't work on gcc 4.4 and older because the concatenation of __has_attribute##x isn't resolved as a one before being passed to __equals_1() which immediately concatenates it to comma_for_one. We first need to pass it through an extra layer to resolve this name to a value. The new version was tested with gcc 4.2 to 11.3. This may be backported though it's pretty minor.	2023-04-07 18:14:28 +02:00
Olivier Houchard	ead43fe4f2	MEDIUM: compression: Make it so we can compress requests as well. Add code so that compression can be used for requests as well. New compression keywords are introduced : "direction" that specifies what we want to compress. Valid values are "request", "response", or "both". "type-req" and "type-res" define content-type to be compressed for requests and responses, respectively. "type" is kept as an alias for "type-res" for backward compatibility. "algo-req" specifies the compression algorithm to be used for requests. Only one algorithm can be provided. "algo-res" provides the list of algorithms that can be used to compress responses. "algo" is kept as an alias for "algo-res" for backward compatibility.	2023-04-07 00:49:17 +02:00
Olivier Houchard	dea25f51b6	MINOR: compression: Count separately request and response compression Duplicate the compression counters, so that we have separate counters for request and response compression.	2023-04-07 00:47:04 +02:00
Olivier Houchard	db573e9c58	MINOR: compression: Store algo and type for both request and response Make provision for being able to store both compression algorithms and content-types to compress for both requests and responses. For now only the responses one are used.	2023-04-07 00:46:59 +02:00
Olivier Houchard	dfc11da561	MINOR: compression: Prepare compression code for request compression Make provision for storing the compression algorithm and the compression context twice, one for requests, and the other for responses. Only the response ones are used for now.	2023-04-07 00:46:55 +02:00
Olivier Houchard	3ce0f01b81	MINOR: compression: Make compression offload a flag Turn compression offload into a flag in struct comp, instead of using an int just for it.	2023-04-07 00:46:45 +02:00
Christopher Faulet	6bb26d41fe	BUG/MUNOR: http-ana: Use an unsigned integer for http_msg flags In the commit `2954bcc1e` (BUG/MINOR: http-ana: Don't switch message to DATA when waiting for payload), the HTTP message flags were extended and don't fit anymore in an unsigned char. So, we must use an unsigned integer now. It is not a big deal because there was already a 6-byte hole in the structure, just after the flags. Now, there is a 3-byte hole before. This patch should fix the issue #2105. It is 2.8-specific, no backport needed.	2023-04-06 08:58:45 +02:00
Amaury Denoyelle	15adc4cc4e	MINOR: quic: remove address concatenation to ODCID Previously, ODCID were concatenated with the client address. This was done to prevent a collision between two endpoints which used the same ODCID. Thanks to the two previous patches, first connection generated CID is now directly derived from the client ODCID using a hash function which uses the client source address for the same purpose. Thus, it is now unneeded to concatenate client address to <odcid> quic-conn member. This change allows to simplify the quic_cid structure management and reduce its size which is important as it is embedded several times in various structures such as quic_conn and quic_rx_packet. This should be backported up to 2.7.	2023-04-05 11:09:57 +02:00
Amaury Denoyelle	2c98209c1c	MINOR: quic: remove ODCID dedicated tree First connection CID generation has been altered. It is now directly derived from client ODCID since previous commit : commit `162baaff7a` MINOR: quic: derive first DCID from client ODCID This patch removes the ODCID tree which is now unneeded. On connection lookup via CID, if a DCID is not found the hash derivation is performed for an INITIAL/0-RTT packet only. In case a client has used an ODCID multiple times, this will allow to retrieve our generated DCID in the CID tree without storing the ODCID node. The impact of these two combined patches is that they may slightly improve haproxy memory footprint by removing a tree node from quic_conn structure. The cpu calculation induced by hash derivation should be performed only a few times per connection as the client will start to use our generated CID as soon as it has received it. This should be backported up to 2.7.	2023-04-05 11:07:01 +02:00
Christopher Faulet	ffcffa8e93	MINOR: http-ana: Add a HTTP_MSGF flag to state the Expect header was checked HTTP_MSGF_EXPECT_CHECKED is now set on the request message to know the "Expect: " header was already handled, if any. The flag is set from the moment we try to handle the header to send a "100-continue" response, whether it was found or not. This way, when we are waiting for the request payload, thanks to this flag, we try to handle the "Expect: " header only once. Before it was performed by changing the message state from BODY to DATA. But this has some side effects and it is not accurate. So, it is better to rely on a flag to do so.	2023-04-05 10:33:32 +02:00
Aurelien DARRAGON	c84899c636	MEDIUM: hlua/event_hdl: initial support for event handlers Now that the event handler API is pretty mature, we can expose it in the lua API. Introducing the core.event_sub(<event_types>, <cb>) lua function that takes an array of event types <event_types> as well as a callback function <cb> as argument. The function returns a subscription <sub> on success. Subscription <sub> allows you to manage the subscription from anywhere in the script. To this day only the sub->unsub method is implemented. The following event types are currently supported: - "SERVER_ADD": when a server is added - "SERVER_DEL": when a server is removed from haproxy - "SERVER_DOWN": server states goes from up to down - "SERVER_UP": server states goes from down to up As for the <cb> function: it will be called when one of the registered event types occur. The function will be called with 3 arguments: cb(<event>,<data>,<sub>) <event>: event type (string) that triggered the function. (could be any of the types used in <event_types> when registering the subscription) <data>: data associated with the event (specific to each event family). For "SERVER_" family events, server details such as server name/id/proxy will be provided. If the server still exists (not yet deleted), a reference to the live server is provided to spare you from an additional lookup if you need to have direct access to the server from lua. <sub> refers to the subscription. In case you need to manage it from within an event handler. (It refers to the same subscription as the one returned from core.event_sub()) Subscriptions are per-thread: the thread that will be handling the event is the one who performed the subscription using core.event_sub() function.
Each thread treats events sequentially, it means that if you have, let's say SERVER_UP, then SERVER_DOWN in a short timelapse, then your cb function will first be called with SERVER_UP, and once you're done handling the event, your function will be called again with SERVER_DOWN. This is to ensure event consistency when it comes to logging / triggering logic from lua. Your lua cb function may yield if needed, but you are advised to process the event as fast as possible to prevent the event queue from growing. To prevent abuses, if the event queue for the current subscription goes over 100 unconsumed events, the subscription will pause itself automatically for as long as it takes for your handler to catch up. This would lead to events being missed, so a warning will be emitted in the logs to inform you about that. This is not something you want to let happen too often, it may indicate that you subscribed to an event that is occurring too frequently and/or that your callback function is too slow to keep up the pace and you should review it. If you want to do some parallel processing because your callback functions are slow: you might want to create subtasks from lua using core.register_task() from within your callback function to perform the heavy job in a dedicated task and allow remaining events to be processed more quickly. Please check the lua documentation for more information.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	4e5e26641d	MINOR: proxy: add findserver_unique_id() and findserver_unique_name() Adding alternative findserver() functions to be able to perform an unique match based on name or puid and by leveraging revision id (rid) to make sure the function won't match with a new server reusing the same name or puid of the "potentially deleted" server we were initially looking for. For example, if you were in the position of finding a server based on a given name provided to you by a different context: Since dynamic servers were implemented, between the time the name was picked and the time you will perform the findserver() call some dynamic server deletion/additions could've been performed in the mean time. In such cases, findserver() could return a new server that re-uses the name of a previously deleted server. Depending on your needs, it could be perfectly fine, but there are some cases where you want to lookup the original server that was provided to you (if it still exists).	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	f751a97a11	MINOR: event_hdl: pause/resume for subscriptions While working on event handling from lua, the need for a pause/resume function to temporarily disable a subscription was raised. We solve this by introducing the EHDL_SUB_F_PAUSED flag for subscriptions. The flag is set via _pause() and cleared via _resume(), and it is checked prior to notifying the subscription in publish function. Pause and Resume functions are also available via lookups for identified subscriptions. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	b4b7320a6a	MINOR: event_hdl: add event_hdl_async_equeue_size() function Use event_hdl_async_equeue_size() in advanced async task handler to get the near real-time event queue size. By near real-time, you should understand that the queue size is not updated during element insertion/removal, but shortly before insertion and shortly after removal, so the size should reflect the approximate queue size at a given time but should definitely not be used as a unique source of truth. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	9e98a27d6a	MINOR: event_hdl: add event_hdl_async_equeue_isempty() function Add event_hdl_async_equeue_isempty() to check if the event queue is empty from an advanced async task handler. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	b289fd1420	MINOR: event_hdl: normal tasks support for advanced async mode advanced async mode (EVENT_HDL_ASYNC_TASK) provided full support for custom tasklets registration. Due to the similarities between tasks and tasklets, it may be useful to use the advanced mode with an existing task (not a tasklet). While the API did not explicitly disallow this usage, things would get bad if we try to wakeup a task using tasklet_wakeup() for notifying the task about new events. To make the API support both custom tasks and tasklets, we use the TASK_IS_TASKLET() macro to call the proper waking function depending on the task's type: - For tasklets: we use tasklet_wakeup() - For tasks: we use task_wakeup() If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	ef6ca67176	BUG/MEDIUM: event_hdl: clean soft-stop handling soft-stop was not explicitly handled in event_hdl API. Because of this, event_hdl was causing some leaks on deinit paths. Moreover, a task responsible for handling events could require some additional cleanups (ie: advanced async task), and as the task was not protected against abort when soft-stopping, such cleanup could not be performed unless the task itself implements the required protections, which is not optimal. Consider this new approach: 'jobs' global variable is incremented whenever an async subscription is created to prevent the related task from being aborted before the task acknowledges the final END event. Once the END event is acknowledged and freed by the task, the 'jobs' variable is decremented, and the deinit process may continue (including the abortion of remaining tasks not guarded by the 'jobs' variable). To do this, a new global mt_list is required: known_event_hdl_sub_list This list tracks the known (initialized) subscription lists within the process. sub_lists are automatically added to the "known" list when calling event_hdl_sub_list_init(), and are removed from the list with event_hdl_sub_list_destroy(). This allows us to implement a global thread-safe event_hdl deinit() function that is automatically called on soft-stop thanks to signal(0). When event_hdl deinit() is initiated, we simply iterate against the known subscription lists to destroy them. event_hdl_subscribe_ptr() was slightly modified to make sure that a sub_list may not accept new subscriptions once it is destroyed (removed from the known list) This can occur between the time the soft-stop is initiated (signal(0)) and haproxy actually enters in the deinit() function (once tasks are either finished or aborted and other threads already joined). 
It is safe to destroy() the subscription list multiple times as long as the pointer is still valid (ie: first on soft-stop when handling the '0' signal, then from regular deinit() path): the function does nothing if the subscription list is already removed. We partially reverted "BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe" since we can use parent mt_list locking instead of a dedicated lock to make the check against duplicate subscription ID. (insert_lock is not useful anymore) The check in itself is not changed, only the locking method. sizeof(event_hdl_sub_list) slightly increases: from 24 bytes to 32 bytes due to the additional mt_list struct within it. With that said, having thread-safe list to store known subscription lists is a good thing: it could help to implement additional management logic for subscription lists and could be useful to add some stats or debugging tools in the future. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	3a81e997ac	MINOR: event_hdl: global sublist management clarification event_hdl_sub_list_init() and event_hdl_sub_list_destroy() don't expect to be called with a NULL argument (to use global subscription list implicitly), simply because the global subscription list init and destroy is internally managed. Adding BUG_ON() to detect such invalid usages, and updating some comments to prevent confusion around these functions. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Aurelien DARRAGON	d514ca45c6	BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe List insertion in event_hdl_subscribe() was not thread-safe when dealing with unique identifiers. Indeed, in this case the list insertion is conditional (we check for a duplicate, then we insert). And while we're using mt lists for this, the whole operation is not atomic: there is a race between the check and the insertion. This could lead to the same ID being registered multiple times with concurrent calls to event_hdl_subscribe() on the same ID. To fix this, we add 'insert_lock' dedicated lock in the subscription list struct. The lock's cost is nearly 0 since it is only used when registering identified subscriptions and the lock window is very short: we only guard the duplicate check and the list insertion to make the conditional insertion "atomic" within a given subscription list. This is the only place where we need the lock: as soon as the item is properly inserted we're out of trouble because all other operations on the list are already thread-safe thanks to mt lists. A new lock hint is introduced: LOCK_EHDL which is dedicated to event_hdl The patch may seem quite large since we had to rework the logic around the subscribe function and switch from simple mt_list to a dedicated struct wrapping both the mt_list and the insert_lock for the event_hdl_sub_list type. (sizeof(event_hdl_sub_list) is now 24 instead of 16) However, all the changes are internal: we don't break the API. If `68e692da0` ("MINOR: event_hdl: add event handler base api") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
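The race described in the entry above (check for a duplicate, then insert) can be illustrated with a minimal sketch. It assumes a plain singly linked list and a C11 `atomic_flag` spinlock; `struct sub`, `struct sub_list` and `sub_list_insert_unique()` are hypothetical names for illustration, not HAProxy's event_hdl types, and unlike mt lists the other list operations here are not thread-safe.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; not HAProxy's real event_hdl structures. */
struct sub {
	uint64_t id;
	struct sub *next;
};

struct sub_list {
	struct sub *head;
	atomic_flag insert_lock; /* guards only "check for duplicate, then insert" */
};

/* Returns 1 if <s> was inserted, 0 if its ID was already registered. */
static int sub_list_insert_unique(struct sub_list *l, struct sub *s)
{
	struct sub *cur;
	int inserted = 1;

	/* Spin until the lock is ours: the critical section is very short. */
	while (atomic_flag_test_and_set_explicit(&l->insert_lock, memory_order_acquire))
		;

	/* Duplicate check and insertion happen under the same lock, so two
	 * concurrent callers cannot both pass the check for the same ID.
	 */
	for (cur = l->head; cur; cur = cur->next) {
		if (cur->id == s->id) {
			inserted = 0;
			break;
		}
	}
	if (inserted) {
		s->next = l->head;
		l->head = s;
	}

	atomic_flag_clear_explicit(&l->insert_lock, memory_order_release);
	return inserted;
}
```

The lock window covers only the duplicate check and the insertion, which matches the reasoning in the commit message about the lock's cost being close to zero.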
Aurelien DARRAGON	53eb6aecce	BUG/MINOR: event_hdl: fix rid storage type rid is stored as a uint32_t within struct server, but it was stored as a signed int within the server event data struct. Switching from signed int to uint32_t in event_hdl_cb_data_server struct to make sure it won't overflow. If `129ecf441` ("MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events") is being backported, then this commit should be backported with it.	2023-04-05 08:58:17 +02:00
Thierry Fournier	1edf36a369	MEDIUM: hlua_fcn: dynamic server iteration and indexing This patch proposes to enumerate servers using internal HAProxy list. Also, remove the flag SRV_F_NON_PURGEABLE which makes the server non purgeable each time Lua uses the server. Removing reg-tests/cli_delete_server_lua.vtc since this test is no longer relevant (we don't set the SRV_F_NON_PURGEABLE flag anymore) and we already have a more generic test: reg-tests/server/cli_delete_server.vtc Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>	2023-04-05 08:58:16 +02:00
Thierry Fournier	467913c84e	MEDIUM: hlua: Dynamic list of frontend/backend in Lua When HAProxy is loaded with a lot of frontends/backends (tested with 300k), it is slow to start and it uses a lot of memory just for indexing backends in the Lua tables. This patch uses the internal frontend/backend index of HAProxy in place of Lua table. HAProxy startup is now quicker as each frontend/backend object is created on demand and not at init. This has to come with some cost: the execution of Lua will be a little bit slower.	2023-04-05 08:58:16 +02:00
Thierry Fournier	599f2311a8	MINOR: hlua: Fix two functions that return nothing useful Two Lua init functions seem to return something useful, but it is not the case. The function "hlua_concat_init" seems to return a failure status, but the function never fails. The function "hlua_fcn_reg_core_fcn" seems to return a number of elements in the stack, but it is not the case.	2023-04-05 08:58:16 +02:00
Aurelien DARRAGON	f175b08bfb	BUG/MINOR: server/del: fix srv->next pointer consistency We recently discovered a bug which affects dynamic server deletion: When a server is deleted, it is removed from the "visible" server list. But as we've seen in previous commit ("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by someone who keeps a reference on it (waiting for the final srv_drop()). Throughout this transient state, server ptr is still valid (may be dereferenced) and the flag SRV_F_DELETED is set. However, as the server is not part of server list anymore, we have an issue: srv->next pointer won't be updated anymore as the only place where we perform such update is in cli_parse_delete_server() by iterating over the "visible" server list. Because of this, we cannot guarantee that a server with the SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing to a fully removed (already freed) server. This problem can be easily demonstrated with server dumping in the stats: server list dumping is performed in stats_dump_proxy_to_buffer() The function can be interrupted and resumed later by design. ie: output buffer is full: partial dump and finish the dump after the flush This is implemented by calling srv_take() on the server being dumped, and only releasing it when we're done with it using srv_drop(). (drop can be delayed after function resume if buffer is full) While the function design seems OK, it works with the assumption that srv->next will still be valid after the function resumes, which is not true. (especially if multiple servers are being removed in between the 2 dumping attempts) In practice, this did not cause any crash yet (at least this was not reported so far), because server dumping is so fast that it is very unlikely that multiple server deletions make their way between 2 dumping attempts in most setups. 
But still, this is a problem that we need to address because some upcoming work might depend on this assumption as well and for the moment it is not safe at all. ======================================================================== Here is a quick reproducer: With this patch, we're creating a large deletion window of 3s as soon as we reach a server named "t2" while iterating over the list. This will give us plenty of time to perform multiple deletions before the function is resumed. \| diff --git a/src/stats.c b/src/stats.c \| index 84a4f9b6e..15e49b4cd 100644 \| --- a/src/stats.c \| +++ b/src/stats.c \| @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn sc, struct htx htx, \| * Temporarily increment its refcount to prevent its \| * anticipated cleaning. Call free_server to release it. \| / \| + struct server orig = ctx->obj2; \| for (; ctx->obj2 != NULL; \| ctx->obj2 = srv_drop(sv)) { \| \| sv = ctx->obj2; \| + printf("sv = %s\n", sv->id); \| srv_take(sv); \| + if (!strcmp("t2", sv->id) && orig == px->srv) { \| + printf("deletion window: 3s\n"); \| + thread_idle_now(); \| + thread_harmless_now(); \| + sleep(3); \| + thread_harmless_end(); \| + \| + thread_idle_end(); \| + \| + goto full; /* simulate full buffer / \| + } \| \| if (htx) { \| if (htx_almost_full(htx)) \| @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx appctx) \| struct channel res = sc_ic(sc); \| struct htx req_htx, res_htx; \| \| + printf("http dump\n"); \| / only proxy stats are available via http / \| ctx->domain = STATS_DOMAIN_PROXY; \| Ok, we're ready, now we start haproxy with the following conf: global stats socket /tmp/ha.sock mode 660 level admin expose-fd listeners thread 1-1 nbthread 2 frontend stats mode http bind :8081 thread 2-2 stats enable stats uri / backend farm server t1 127.0.0.1:1899 disabled server t2 127.0.0.1:18999 disabled server t3 127.0.0.1:18998 disabled server t4 127.0.0.1:18997 disabled And finally, we execute the following 
script: curl localhost:8081/stats& sleep .2 echo "del server farm/t2" \| nc -U /tmp/ha.sock echo "del server farm/t3" \| nc -U /tmp/ha.sock This should be enough to reveal the issue, I easily manage to consistently crash haproxy with the following reproducer: http dump sv = t1 http dump sv = t1 sv = t2 deletion window = 3s [NOTICE] (2940566) : Server deleted. [NOTICE] (2940566) : Server deleted. http dump sv = t2 sv = ��U [1] 2940566 segmentation fault (core dumped) ./haproxy -f ttt.conf ======================================================================== To fix this, we add prev_deleted mt_list in server struct. For a given "visible" server, this list will contain the pending "deleted" servers references that point to it using their 'next' ptr. This way, whenever this "visible" server is going to be deleted via cli_parse_delete_server() it will check for servers in its 'prev_deleted' list and update their 'next' pointer so that they no longer point to it, and then it will push them in its 'next->prev_deleted' list to transfer the update responsibility to the next 'visible' server (if next != NULL). Then, following the same logic, the server about to be removed in cli_parse_delete_server() will push itself as well into its 'next->prev_deleted' list (if next != NULL) so that it may still use its 'next' ptr for the time it is in transient removal state. In srv_drop(), right before the server is finally freed, we make sure to remove it from the 'next->prev_deleted' list so that 'next' won't try to perform the pointers update for this server anymore. This has to be done atomically to prevent 'next' srv from accessing a purged server. As a result: for a valid server, either deleted or not, 'next' ptr will always point to a non deleted (ie: visible) server. With the proposed fix, and several removal combinations (including unordered cli_parse_delete_server() and srv_drop() calls), I cannot reproduce the crash anymore. 
Example tricky removal sequence that is now properly handled: sv list: t1,t2,t3,t4,t5,t6 ops: take(t2) del(t4) del(t3) del(t5) drop(t3) drop(t4) drop(t5) drop(t2)	2023-04-05 08:58:16 +02:00
Aurelien DARRAGON	75b9d1c041	MINOR: server: add SRV_F_DELETED flag Set the SRV_F_DELETED flag when server is removed from the cli. When removing a server from the cli (in cli_parse_delete_server()), we update the "visible" server list so that the removed server is no longer part of the list. However, despite the server being removed from "visible" server list, one could still access the server data from a valid ptr (ie: srv_take()) Deleted flag helps detecting when a server is in transient removal state: that is, removed from the list, thus not visible but not yet purged from memory.	2023-04-05 08:58:16 +02:00
Christopher Faulet	7faac7cf34	MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly At many places, we simplify the tests on SHUT flags to remove calls to chn_prod() or chn_cons() function because the corresponding SC is available.	2023-04-05 08:57:06 +02:00
Christopher Faulet	87633c3a11	MEDIUM: tree-wide: Move flags about shut from the channel to the SC The purpose of this patch is only a one-to-one replacement, as far as possible. CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the stream-connector. CF_ prefix is replaced by SC_FL_ one. Of course, it is not so simple because at many places, we were testing if a channel was shut for reads and writes at the same time. To do the same, shut for reads must be tested on one side on the SC and shut for writes on the other side on the opposite SC. A special care was taken with process_stream(). flags of SCs must be saved to be able to detect changes, just like for the channels.	2023-04-05 08:57:06 +02:00
Christopher Faulet	904763f562	MINOR: stconn/channel: Move CF_EOI into the SC and rename it The channel flag CF_EOI is renamed to SC_FL_EOI and moved into the stream-connector.	2023-04-05 08:57:06 +02:00
Christopher Faulet	84d3ef982c	MINOR: stconn/channel: Move CF_EXPECT_MORE into the SC and rename it The channel flag CF_EXPECT_MORE is renamed to SC_FL_SND_EXP_MORE and moved into the stream-connector.	2023-04-05 08:57:05 +02:00
Christopher Faulet	68ef218a72	MINOR: stconn/channel: Move CF_NEVER_WAIT into the SC and rename it The channel flag CF_NEVER_WAIT is renamed to SC_FL_SND_NEVERWAIT and moved into the stream-connector.	2023-04-05 08:57:05 +02:00
Christopher Faulet	5c281d58ea	MINOR: stconn/channel: Move CF_SEND_DONTWAIT into the SC and rename it The channel flag CF_SEND_DONTWAIT is renamed to SC_FL_SND_ASAP and moved into the stream-connector.	2023-04-05 08:57:05 +02:00
Christopher Faulet	9a790f63ed	MINOR: stconn/channel: Move CF_READ_DONTWAIT into the SC and rename it The channel flag CF_READ_DONTWAIT is renamed to SC_FL_RCV_ONCE and moved into the stream-connector.	2023-04-05 08:57:05 +02:00
Christopher Faulet	26e0935681	MEDIUM: applet/trace: Register a new trace source with its events Traces are now supported for applets. The first argument is always the appctx. This will help to debug applets.	2023-04-05 08:46:06 +02:00
Christopher Faulet	a5915eb1dd	MINOR: applet: Uninline appctx_free() This function is uninlined and moved to src/applet.c. It is mandatory to add traces for applets.	2023-04-05 08:46:06 +02:00
Remi Tricot-Le Breton	26e1432436	BUG/MINOR: ssl: Undefined reference when building with OPENSSL_NO_DEPRECATED If OPENSSL_NO_DEPRECATED is set, we get a 'error: ‘RSA_PKCS1_PADDING’ undeclared' when building jwt.c. The symbol is not deprecated, we are just missing an include. This was raised in GitHub issue #2098. It does not need to be backported.	2023-04-03 11:46:54 +02:00
Frédéric Lécaille	7d6270a845	BUG/MAJOR: quic: Congestion algorithms states shared between the connection This very old bug has been there since the first implementation of the newreno congestion algorithm. It was a very bad idea to put a state variable into quic_cc_algo struct, which only defines the congestion control algorithm used by a QUIC listener, typically its type and its callbacks. This bug could lead to crashes since BUG_ON() calls have been added to each algorithm implementation. This was revealed by interop tests, but not very often, as several connections were rarely run at the same time during these tests. Hopefully this was also reported by Tristan in GH #2095. Move the congestion algorithm state to the correct structures which are private to a connection (see cubic and nr structs). Must be backported to 2.7 and 2.6.	2023-04-02 13:10:13 +02:00
Ilya Shipitsin	07be66d21b	CLEANUP: assorted typo fixes in the code and comments This is 35th iteration of typo fixes	2023-04-01 18:33:40 +02:00
Frédéric Lécaille	db4bc6b4f3	MINOR: quic: Add a fake congestion control algorithm named "nocc" This algorithm does nothing except initializing the congestion control window to a fixed value. Very smart! Modify the QUIC congestion control configuration parser to support this new algorithm. The congestion control algorithm must be set as follows: quic-cc-algo nocc-<cc window size(KB)> For instance if "nocc-15" is provided as quic-cc-algo keyword value, this will set a fixed window of 15KB.	2023-03-31 17:09:03 +02:00
Frédéric Lécaille	d721571d26	MEDIUM: quic: Ack delay implementation Reuse the idle timeout task to delay the acknowledgments. The time of the idle timer expiration is from now on stored in ->idle_expire. The one to trigger the acknowledgements is stored in ->ack_expire. Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection as having had its acknowledgement timer triggered. Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and allow the connection to send packets when the ack timer has fired. It is possible that acks are sent before the ack timer has triggered. In this case it is cancelled only if ACK frames are really sent. The idle timer expiration must be set again when the ack timer has been triggered or when it is cancelled. Must be backported to 2.7.	2023-03-31 13:41:17 +02:00
Frédéric Lécaille	8f991948f5	MINOR: quic: Traces adjustments at proto level. Dump variables displayed by TRACE_ENTER() or TRACE_LEAVE() by calls to TRACE_PROTO(). No more variables are displayed by the two former macros. From now on, this information is accessible from proto level. Add new calls to TRACE_PROTO() at important locations in relation with QUIC transport protocol. When relevant, try to prefix such traces with TX or RX keyword to identify the concerned subpart (transmission or reception) of the protocol. Must be backported to 2.7.	2023-03-31 09:54:59 +02:00
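As an illustration of the "nocc-<size>" syntax described in the entry above, here is a minimal parsing sketch. `parse_nocc_window()` is a hypothetical helper written for this example; it is not the parser added by the commit, and how the real parser validates or bounds the value is not stated in the message.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses "nocc-<n>" and stores <n> (a size in KB)
 * into *kb. Returns 1 on success, 0 on malformed input.
 */
static int parse_nocc_window(const char *arg, unsigned long *kb)
{
	static const char prefix[] = "nocc-";
	const char *num;
	char *end;
	unsigned long val;

	if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
		return 0;

	num = arg + sizeof(prefix) - 1;
	if (*num < '0' || *num > '9')
		return 0; /* reject empty, signed or space-prefixed values */

	errno = 0;
	val = strtoul(num, &end, 10);
	if (errno || *end != '\0')
		return 0; /* overflow or trailing garbage */

	*kb = val;
	return 1;
}
```

With this sketch, "nocc-15" yields a value of 15, to be interpreted as a 15KB window as in the commit message's example.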
Frédéric Lécaille	acc9cfdf79	MINOR: quic: Adjustments for generic control congestion traces Display the elapsed time since packets were sent in place of the timestamp which do not bring easy to read information. Must be backported to 2.7.	2023-03-31 09:54:59 +02:00
Frédéric Lécaille	d7243318c4	BUG/MINOR: quic: Wrong use of now_ms timestamps (cubic algo) As now_ms may wrap, one must use the ticks API to protect the cubic congestion control algorithm implementation from side effects due to this. Furthermore to make the cubic congestion control algorithm more readable and easy to maintain, adding a new state ("in recovery period" QUIC_CC_ST_RP new enum) helps in reaching this goal. Implement quic_cc_cubic_rp_cb() which is the callback for this new state. Must be backported to 2.7 and 2.6.	2023-03-31 09:54:59 +02:00
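The wrapping problem mentioned in the entry above can be shown with a generic sketch of a wrap-safe comparison on a 32-bit millisecond counter. The function names are hypothetical and this is a common idiom, not necessarily how the ticks API is implemented. It assumes the two timestamps are less than 2^31 ms apart and relies on the usual two's complement conversion to `int32_t`.

```c
#include <stdint.h>

/* Naive check: gives a wrong answer once the deadline has wrapped past
 * zero while <now> has not yet.
 */
static int expired_naive(uint32_t now, uint32_t deadline)
{
	return now >= deadline;
}

/* Wrap-safe check: the unsigned subtraction wraps modulo 2^32, and the
 * sign of the result tells which timestamp comes first.
 */
static int expired_wrap_safe(uint32_t now, uint32_t deadline)
{
	return (int32_t)(deadline - now) <= 0;
}
```

For example, with `now = 0xFFFFFFF0` and a deadline 32 ms later (which wraps to `0x10`), the naive check reports the deadline as already expired while the wrap-safe one does not.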
Aurelien DARRAGON	7e64d8720e	BUG/MINOR: backend: make be_usable_srv() consistent when stopping When a proxy enters the STOPPED state, it will no longer accept new connections. However, it doesn't mean that it's completely inactive yet: it will still be able to handle already pending / keep-alive connections, thus finishing ongoing work before effectively stopping. be_usable_srv(), which is used by nbsrv converter and sample fetch, will return 0 if the proxy is either stopped or disabled. nbsrv behaves this way since it was originally implemented in `b7e7c4720` ("MINOR: Add nbsrv sample converter"). (Since then, multiple refactors were performed around this area, but the current implementation still follows the same logic) It was found that if nbsrv is used in a proxy section to perform routing logic, unexpected decisions are being made when nbsrv is used on a proxy with STOPPED state, since in-flight requests will suffer from nbsrv returning 0 instead of the current number of usable servers which may still process existing connections. For instance, this can happen during process soft-stop, or even when stopping the proxy from the cli / lua. To fix this: we now make sure be_usable_srv() always returns the current number of usable servers, unless the proxy is explicitly disabled (from the config, not at runtime) This could be backported up to 2.6. For older versions, the need for a backport should be evaluated first. -- Note for 2.4: proxy flags did not exist, it was implemented with fd10ab5e ("MINOR: proxy: Introduce proxy flags to replace disabled bitfield") For 2.2: STOPPED and DISABLED states were not separated, so we have no easy way to apply the fix anyway.	2023-03-31 07:45:08 +02:00
Martin DOLEZ	110e4a8733	MINOR: http_fetch: add case insensitive support for smp_fetch_url_param This commit adds a new argument to smp_fetch_url_param that makes the parameter key comparison case-insensitive. Several levels of callers were modified to pass this info.	2023-03-30 14:11:10 +02:00
Aurelien DARRAGON	2c5b9ded9b	CLEANUP: proxy: remove stop_time related dead code Since `eb77824` ("MEDIUM: proxy: remove the deprecated "grace" keyword"), stop_time is never set, so the related code in manage_proxy() is not relevant anymore. Removing code that refers to p->stop_time, since it was probably overlooked.	2023-03-28 20:26:47 +02:00
Frédéric Lécaille	c425e03b28	BUG/MINOR: quic: Missing STREAM frame type updated This patch follows this commit which was not sufficient: BUG/MINOR: quic: Missing STREAM frame data pointer updates Indeed, after updating the ->offset field, the bit which informs the frame builder of its presence must be systematically set. This bug was revealed by the following BUG_ON() from quic_build_stream_frame() : bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515 This should fix the last crash that occurred on github issue #2074. Must be backported to 2.6 and 2.7.	2023-03-27 16:01:44 +02:00
Willy Tarreau	1751db140a	MINOR: pools: report a replaced memory allocator instead of just malloc_trim() Instead of reporting the inaccurate "malloc_trim() support" on -vv, let's report the case where the memory allocator was actively replaced from the one used at build time, as this is the corner case we want to be cautious about. We also put a tainted bit when this happens so that it's possible to detect it at run time (e.g. the user might have inherited it from an environment variable during a reload operation). The now unused is_trim_enabled() function was finally dropped.	2023-03-22 18:05:02 +01:00
Willy Tarreau	7aee683541	MINOR: pools: export trim_all_pools() This way it will be usable from outside instead of malloc_trim().	2023-03-22 17:30:28 +01:00
Willy Tarreau	eaba76b02d	MINOR: pools: intercept malloc_trim() instead of trying to plug holes As reported by Miroslav in commit `d8a97d8f6` ("BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used") there are still occasional cases where it's discovered that malloc_trim() is being used without its suitability being checked first. This is a problem when using another incompatible allocator. But there's a class of use cases we'll never be able to cover, it's dynamic libraries loaded from Lua. In order to address this more reliably, we now define our own malloc_trim() that calls the previous one after checking that the feature is supported and that the allocator is the expected one. This way child libraries that would call it will also be safe. The function is intentionally left defined all the time so that it will be possible to clean up some code that uses it by removing ifdefs.	2023-03-22 17:30:28 +01:00
Amaury Denoyelle	1d0ed1a2e9	BUG/MINOR: trace: fix hardcoded level for TRACE_PRINTF Level argument was ignored by TRACE_PRINTF due to a hardcoded value of TRACE_LEVEL_DEVELOPER inside the macro. This must be backported up to 2.6.	2023-03-22 15:31:55 +01:00
Miroslav Zagorac	d8a97d8f60	BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used In the event that HAProxy is linked with the jemalloc library, it is still shown that malloc_trim() is enabled when executing "haproxy -vv": .. Support for malloc_trim() is enabled. .. It's not so much a problem as it is that malloc_trim() is called in the pat_ref_purge_range() function without any checking. This was solved by setting the using_default_allocator variable to the correct value in the detect_allocator() function and before calling malloc_trim() it is checked whether the function should be called.	2023-03-22 14:14:50 +01:00
Willy Tarreau	0de1e6180a	BUILD: thread: implement thread_harmless_end_sig() for threadless builds Building without thread support was broken in 2.8-dev2 with commit `7e70bfc8c` ("MINOR: threads: add a thread_harmless_end() version that doesn't wait") that forgot to define the function for the threadless cases. No backport is needed.	2023-03-22 10:40:06 +01:00
Willy Tarreau	69869e6354	MINOR: dynbuf: set POOL_F_NO_FAIL on buffer allocation b_alloc() is used to allocate a buffer. We can provoke fault injection based on forced memory allocation failures using -dMfail on the command line, but we know that the buffer_wait list is a bit weak and doesn't always recover well. As such, submitting buffer allocation to such a treatment seriously limits the usefulness of -dMfail which cannot really be used for other purposes. Let's just disable it for buffers for now.	2023-03-21 09:15:13 +01:00
Willy Tarreau	ac78c4fd9d	MINOR: ssl-sock: pass the CO_SFL_MSG_MORE info down the stack Despite having replaced the SSL BIOs to use our own raw_sock layer, we still didn't exploit the CO_SFL_MSG_MORE flag which is pretty useful to avoid sending incomplete packets. It's particularly important for SSL since the extra overhead almost guarantees that each send() will be followed by an incomplete (and often odd-sided) segment. We already have an xprt_st set of flags to pass info to the various layers, so let's just add a new one, SSL_SOCK_SEND_MORE, that is set or cleared during ssl_sock_from_buf() to transfer the knowledge of CO_SFL_MSG_MORE. This way we can recover this information and pass it to raw_sock. This alone is sufficient to increase by ~5-10% the H2 bandwidth over SSL when multiple streams are used in parallel.	2023-03-17 16:43:51 +01:00
Frédéric Lécaille	ca07979b97	BUG/MINOR: quic: Missing STREAM frame data pointer updates This patch follows this one which was not sufficient: "BUG/MINOR: quic: Missing STREAM frame length updates" Indeed, it is not sufficient to update the ->len and ->offset member of a STREAM frame to move it forward. The data pointer must also be updated. This is not done by the STREAM frame builder. Must be backported to 2.6 and 2.7.	2023-03-17 09:21:18 +01:00
Willy Tarreau	9824f8c890	MINOR: buffer: add br_single() to check if a buffer ring has more than one buf It's cheaper and cleaner than using br_count()==1 given that it just compares two indexes, and that a ring having a single buffer is in a special case where it is between empty and used up-to-1. In other words it's not congested.	2023-03-16 18:45:46 +01:00
Willy Tarreau	e5a26eb2de	MINOR: buffer: add br_count() to return the number of allocated bufs We have no way to know how many buffers are currently allocated in a buffer ring. Let's add br_count() for this.	2023-03-16 18:45:46 +01:00
Christopher Faulet	3a7b539b12	BUG/MEDIUM: connection: Preserve flags when a conn is removed from an idle list The commit `5e1b0e7bf` ("BUG/MEDIUM: connection: Clear flags when a conn is removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and CO_FL_IDLE_LIST flags are used when the connection is released to properly decrement used/idle connection counters. If a connection is idle, these flags must be preserved till the connection is really released. It may be removed from the list but not immediately released. If these flags are lost when it is finally released, the current number of used connections is erroneously decremented. It means this counter may become negative and the counter tracking the number of idle connections is not decremented, suggesting a leak. So, the above commit is reverted and instead we improve a bit the way to detect an idle connection. The function conn_get_idle_flag() must now be used to know if a connection is in an idle list. It returns the connection flag corresponding to the idle list if the connection is idle (CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection is scheduled to be removed, 0 is also returned, regardless of the connection flags. This new function is used when the connection is temporarily removed from the list to be used, mainly in muxes. This patch should fix #2078 and #2057. It must be backported as far as 2.2.	2023-03-16 15:34:20 +01:00
Remi Tricot-Le Breton	a6c0a59e9a	MINOR: ssl: Use ocsp update task for "update ssl ocsp-response" command Instead of having a dedicated httpclient instance and its own code decorrelated from the actual auto update one, the "update ssl ocsp-response" will now use the update task in order to perform updates. Since the cli command allows to update responses that were never included in the auto update tree, a new flag was added to the certificate_ocsp structure so that the said entry can be inserted into the tree "by hand" and it won't be reinserted back into the tree after the update process is performed. The 'update_once' flag "stole" a bit from the 'fail_count' counter since it is the one less likely to reach UINT_MAX among the ocsp counters of the certificate_ocsp structure. This new logic required that every certificate_ocsp entry contained all the ocsp-related information at all time since entries that are not supposed to be configured automatically can still be updated through the cli. The logic of the ssl_sock_load_ocsp was changed accordingly.	2023-03-14 11:07:32 +01:00
Willy Tarreau	8f6da64641	MINOR: quic_sock: un-statify quic_conn_sock_fd_iocb() This one is printed as the iocb in the "show fd" output, and arguably this wasn't very convenient as-is: 293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90) Let's unstatify it and export it so that the symbol can now be resolved from the various points that need it.	2023-03-10 14:30:01 +01:00
William Lallemand	2078d4b1f7	BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value In environments where SYSTEM_MAXCONN is defined when compiling, the master will use this value instead of the original minimal value which was set to 100. When this happens, the master process could allocate RAM excessively since it does not need to have a high maxconn. (For example if SYSTEM_MAXCONN was set to 100000 or more) This patch fixes the issue by using the new define MASTER_MAXCONN which defines a default maxconn of 100 for the master process. Must be backported as far as 2.5.	2023-03-09 14:28:44 +01:00
Willy Tarreau	cd8914bc52	BUG/MAJOR: fd/threads: close a race on closing connections after takeover As mentioned in commit `237e6a0d6` ("BUG/MAJOR: fd/thread: fix race between updates and closing FD"), a race was found during stress tests involving heavy backend connection reuse with many competing closes. Here the problem is complex. The analysis in commit `f69fea64e` ("MAJOR: fd: get rid of the DWCAS when setting the running_mask") that removed the DWCAS in 2.5 overlooked a few races. First, a takeover from thread1 could happen just after fd_update_events() in thread2 validates it holds the tmask bit in the CAS loop. Since thread1 releases running_mask after the operation, thread2 will succeed the CAS and both will believe the FD is theirs. This does explain the occasional crashes seen with h1_io_cb() being called on a bad context, or sock_conn_iocb() seeing conn->subs vanish after checking it. This issue can be addressed using a DWCAS in both fd_takeover() and fd_update_events() as it was before the patch above but this is not portable to all archs and is not easy to adapt for those lacking it, due to some operations still happening only on individual masks after the thread groups were added. Second, the checks after fd_clr_running() for the current thread being the last one is not sufficient: at the exact moment the operation completes, another thread may also set and drop the running bit and see itself as alone, and both can call _fd_close_orphan() in parallel. In order to prevent this from happening, we cannot rely on the absence of others, we need an explicit flag indicating that the FD must be closed. One approach that was attempted consisted in playing with the thread_mask but that was not reliable since it could still match between the late deletion and the early insertion that follows. Instead, a new FD flag was added, FD_MUST_CLOSE, that exactly indicates that the call to _fd_delete_orphan() must be done. 
It is set by fd_delete(), and atomically cleared by the first one which checks it, and which is the only one to call _fd_delete_orphan(). With both points addressed, there's no more visible race left: - takeover() only happens under the connection list's lock and cannot compete with fd_delete() since fd_delete() must first remove the connection from the list before deleting the FD. That's also why it doesn't need to call _fd_delete_orphan() when dropping its running bit. - takeover() sets its running bit then atomically replaces the thread mask, so that until that's done, it doesn't validate the condition to end the synchronization loop in fd_update_events(). Once it's OK, the previous thread's bit is lost, and this is checked for in fd_update_events() - fd_update_events() can compete with fd_delete() at various places which are explained above. Since fd_delete() clears the thread mask after setting its running bit and after setting the FD_MUST_CLOSE bit, the synchronization loop guarantees that the thread mask is seen before going further, and that once it's seen, the FD_MUST_CLOSE flag is already present. - fd_delete() may start while fd_update_events() has already started, but fd_delete() must hold a bit in thread_mask before starting, and that is checked by the first test in fd_update_events() before setting the running_mask. - the poller's _update_fd() will not compete against _fd_delete_orphan() nor fd_insert() thanks to the fd_grab_tgid() that's always done before updating the polled_mask, and guarantees that we never pretend that a polled_mask has a bit before the FD is added. The issue is very hard to reproduce and is extremely time-sensitive. Some tests were required with a 1-ms timeout with request rates closely matching 1 kHz per server, though certain tests sometimes benefitted from saturation. 
It was found that adding the following slowdown at a few key places helped a lot and managed to trigger the bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread setup: { volatile int i = 10000; while (i--); } Particularly, placing it at key places where only one of running_mask or thread_mask is set and not the other one yet (e.g. after the synchronization loop in fd_update_events or after dropping the running bit) did yield great results. Many thanks to Olivier Houchard for this expert help analysing these races and reviewing candidate fixes. The patch must be backported to 2.5. Note that 2.6 does not have tgid in FDs, and that it requires a change of output on fd_clr_running() as we need the previous bit. This is provided by carefully backporting commit `d6e1987612` ("MINOR: fd: make fd_clr_running() return the previous value instead"). Tests have shown that the lack of tgid is a showstopper for 2.6 and that unless a better workaround is found, it could still be preferable to backport the minimum pieces required for fd_grab_tgid() to 2.6 so that it stays stable long.	2023-03-09 14:01:48 +01:00
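The "set once, cleared by exactly one thread" behaviour described for FD_MUST_CLOSE can be illustrated with a minimal sketch using C11 atomics. This is not HAProxy's code: `MUST_CLOSE`, `mark_must_close()` and `claim_close()` are hypothetical names, and the real flag lives in the FD state alongside other bits.

```c
#include <stdatomic.h>

/* Hypothetical flag value, standing in for FD_MUST_CLOSE. */
#define MUST_CLOSE 0x1u

/* Marks the descriptor as needing a close (the role fd_delete() plays in the commit). */
static void mark_must_close(atomic_uint *state)
{
	atomic_fetch_or(state, MUST_CLOSE);
}

/* Atomically clears the flag. Returns 1 only for the single caller that
 * found it set, so only that caller may go on to close the descriptor.
 */
static int claim_close(atomic_uint *state)
{
	unsigned int prev = atomic_fetch_and(state, ~MUST_CLOSE);

	return (prev & MUST_CLOSE) != 0;
}
```

Because the fetch-and-clear is a single atomic operation, two threads dropping their running bit at the same moment cannot both observe the flag as set, which is the property the commit relies on instead of "nobody else is running".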
Frédéric Lécaille	cc101cd2aa	BUG/MINOR: quic: Wrong RETIRE_CONNECTION_ID sequence number check This bug arrived with this commit: b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX) and was revealed by h3 interop tests with clients like s2n-quic and quic-go as noticed by Amaury. Indeed, one must check that the CID matching the sequence number provided by a received RETIRE_CONNECTION_ID frame does not match the DCID of the packet. Remove the useless ->curr_cid_seq_num member from the quic_conn struct. The sequence number lookup must be done in qc_handle_retire_connection_id_frm() to check the validity of the RETIRE_CONNECTION_ID frame. This function returns the CID to be retired in the <cid_to_retire> variable passed as parameter if the frame is valid and if the CID was not already retired. Must be backported to 2.7.	2023-03-08 14:53:12 +01:00
Amaury Denoyelle	5907fede87	MEDIUM: quic: release closing connections on stopping Since the following commit : commit `fb375574f9` MINOR: quic: mark quic-conn as jobs on socket allocation quic-conn instances are marked as jobs. This prevents the haproxy process from stopping while a transfer is in progress. To avoid delaying process termination, idle connections are woken up through their MUX instances so that they can be released immediately. However, there is no mechanism to wake up quic connections left in closing or draining state. This means that haproxy process termination is delayed until the timer of every closing quic connection has expired. To improve this, a new function quic_handle_stopping() is called when the haproxy process is stopping. It simply wakes up the idle timer task of all connections in the global closing list. These connections will thus be released immediately so as not to delay haproxy process stopping. This should be backported up to 2.7.	2023-03-08 14:41:28 +01:00
Amaury Denoyelle	efed86c973	MINOR: quic: create a global list dedicated for closing QUIC conns When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters respectively the closing or draining state. These states are a loose equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the connection is maintained for the duration of a timer to handle packet reordering or loss. A new global list has been defined for QUIC connections in closing/draining state inside the thread_ctx structure. Each time a connection enters one of these states, it is moved from the default global list to the new closing list. The objective of this patch is to quickly filter connections in closing/draining state. Most notably, this will be used to wake up these connections and avoid haproxy process stopping being delayed by them. A dedicated function qc_detach_th_ctx_list() has been implemented to transfer a quic-conn from one list instance to the other. This takes care of back-references attached to a quic-conn instance in case of a running "show quic". This should be backported up to 2.7.	2023-03-08 14:39:48 +01:00
Frédéric Lécaille	5e3201ea77	MINOR: quic: Add transport parameters to "show quic" Modify quic_transport_params_dump() and the other functions used to dump transport parameter values from TRACE() to make their output more compact. Add a call to quic_transport_params_dump() to dump the transport parameters from the "show quic" CLI command. Must be backported to 2.7.	2023-03-08 08:50:54 +01:00
Frédéric Lécaille	ece86e64c4	MINOR: quic: Add spin bit support Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag. Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection as soon as a packet number could be deciphered. Modify quic_build_packet_short_header() to set the spin bit when building a short packet header. Validated by quic-tracker spin bit test. Must be backported to 2.7.	2023-03-08 08:50:54 +01:00
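As an illustration of the spin bit handling described above, here is a minimal server-side sketch following the rule from RFC 9000 (section 17.4): the server reflects the spin bit of the packet that raises the largest packet number seen. The structure and function names (`spin_state`, `spin_on_rx()`, `spin_on_tx()`) are hypothetical and are not the ones used by qc_handle_spin_bit() or quic_build_packet_short_header().

```c
#include <stdint.h>

/* Spin bit position in the first byte of a short header packet (RFC 9000). */
#define SHORT_HDR_SPIN_BIT 0x20

struct spin_state {
	uint64_t largest_pn; /* highest packet number seen so far */
	int have_pn;         /* 0 until a first packet has been processed */
	int spin;            /* value to reflect in outgoing short headers */
};

/* Server side: latch the peer's spin bit only when the packet raises the
 * largest packet number, so reordered packets do not flip the value.
 */
static void spin_on_rx(struct spin_state *s, uint64_t pn, uint8_t first_byte)
{
	if (!s->have_pn || pn > s->largest_pn) {
		s->largest_pn = pn;
		s->have_pn = 1;
		s->spin = !!(first_byte & SHORT_HDR_SPIN_BIT);
	}
}

/* Apply the current spin value to the first byte of an outgoing short header. */
static uint8_t spin_on_tx(const struct spin_state *s, uint8_t first_byte)
{
	if (s->spin)
		return first_byte | SHORT_HDR_SPIN_BIT;
	return first_byte & (uint8_t)~SHORT_HDR_SPIN_BIT;
}
```

The packet number is only known once header protection has been removed, which matches the commit's remark that the flag is updated "as soon as a packet number could be deciphered".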
Frédéric Lécaille	8ac8a8778d	MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX) Add ->curr_cid_seq_num new quic_conn struct member to store the connection ID sequence number currently used by the connection. Implement qc_handle_retire_connection_id_frm() to handle this RX frame. Implement qc_retire_connection_seq_num() to remove a connection ID from its sequence number. Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID frame from a CID. Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle the case of the RETIRE_CONNECTION_ID frame. Must be backported to 2.7.	2023-03-08 08:50:54 +01:00
Frédéric Lécaille	b4c5471425	MINOR: quic: Store the next connection IDs sequence number in the connection Add ->next_cid_seq_num new member to quic_conn struct to store the next sequence number to be used to allocate a connection ID. It is initialized to 0 from qc_new_conn() which initializes a connection. Modify new_quic_cid() to use this variable each time it is called, without giving the caller the possibility to pass the sequence number for the connection ID to be allocated. Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num when building NEW_CONNECTION_ID frames after the handshake has been completed. Limit the number of connection IDs provided to the peer to the minimum between 4 and the value it sent with the active_connection_id_limit transport parameter. This includes the connection ID used by the connection to send these new connection IDs. Must be backported to 2.7.	2023-03-08 08:50:54 +01:00
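The connection ID limit above amounts to simple arithmetic, sketched below. The helper names and the `MAX_CIDS_PER_CONN` constant are hypothetical; subtracting one for the CID already in use is one reading of "this includes the connection ID used by the connection", not a statement about the actual implementation.

```c
#include <stdint.h>

/* Cap taken from the commit message; the constant name is hypothetical. */
#define MAX_CIDS_PER_CONN 4

/* Total CIDs the peer may hold: min(4, active_connection_id_limit). */
static uint64_t cid_budget(uint64_t peer_limit)
{
	return peer_limit < MAX_CIDS_PER_CONN ? peer_limit : MAX_CIDS_PER_CONN;
}

/* Number of NEW_CONNECTION_ID frames to emit after the handshake,
 * assuming the CID already in use counts against the budget.
 */
static uint64_t cids_to_announce(uint64_t peer_limit)
{
	uint64_t budget = cid_budget(peer_limit);

	return budget > 0 ? budget - 1 : 0;
}
```

For example, a peer advertising active_connection_id_limit = 8 would receive 3 new connection IDs under this reading, and a peer advertising 2 would receive 1.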
Frédéric Lécaille	51a7caf921	MINOR: quic: Add traces about QUIC TLS key update Dump the secret used to derive the next one during a key update initiated by the client, and dump the resulting new secret and the new key and IV to be used to decrypt Application level packets. Also add a trace when the key update is supposed to be initiated on haproxy side. This has already helped in diagnosing an issue revealed by the key update interop test with xquic as client. Must be backported to 2.7.	2023-03-03 19:12:26 +01:00
Amaury Denoyelle	c8a0efbda8	BUG/MEDIUM: quic: properly handle duplicated STREAM frames When a STREAM frame is re-emitted, it will point to the same stream buffer as the original one. If an ACK is received for either one of these frames, the underlying buffer may be freed. Thus, if the second frame is declared as lost and scheduled for retransmission, we must ensure that the underlying buffer is still allocated or interrupt the retransmission. Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid a tree lookup each time a STREAM frame is re-emitted, a lost STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST. In most cases, this code is functional. However, there are several potential issues which may cause a segfault : - when explicitly probing with a STREAM frame, the frame won't be flagged as lost - when splitting a STREAM frame during retransmission, the flag is not copied To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted to a <dup> field in quic_stream structure. This field is now properly copied when splitting a STREAM frame. Also, as this is now an inner quic_frame field, it will be copied automatically on qc_frm_dup() invocation thus ensuring that it will be set on probing. 
This issue was encountered randomly with the following backtrace : #0 __memmove_avx512_unaligned_erms () #1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>, #2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620, #3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8, #4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>, #5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10, #6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30, #7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184 #8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310, This should fix github issue #2051. This should be backported up to 2.6.	2023-03-03 15:08:02 +01:00
Remi Tricot-Le Breton	86d1e0b163	BUG/MINOR: ssl: Fix ocsp-update when using "add ssl crt-list" When adding a new certificate through the CLI and appending it to a crt-list with the 'ocsp-update' option set, the new certificate would not be added to the OCSP response update list. The only thing that was missing was the copy of the ocsp_update mode from the ssl_bind_conf into the ckch_store's object. An extra wakeup of the update task also needed to happen in case the newly inserted entry needs to be updated before the next wakeup of the task. This patch does not need to be backported.	2023-03-02 15:57:56 +01:00
Remi Tricot-Le Breton	5843237993	MINOR: ssl: Add global options to modify ocsp update min/max delay The minimum and maximum delays between two automatic updates of a given OCSP response can now be set via global options. This makes it possible to limit the update rate of OCSP responses for configurations that use many frontend certificates with the ocsp-update option set if the updates are deemed too costly.	2023-03-02 15:37:23 +01:00
Remi Tricot-Le Breton	07b7c15bce	MINOR: ssl: Reorder struct certificate_ocsp members Simply swapping the 'refcount' and 'response' members fills two 4-byte holes in the structure.	2023-03-02 15:37:20 +01:00
Remi Tricot-Le Breton	0c96ee48b4	MINOR: ssl: Add certificate's path to certificate_ocsp structure In order to have some information about the frontend certificate when dumping the contents of the ocsp update tree from the cli, we could either keep a reference to a ckch_store in the certificate_ocsp structure, which might cause some dangling reference problems, or simply copy the path to the certificate in the ocsp response structure. This latter solution was chosen because of its simplicity.	2023-03-02 15:37:15 +01:00
Remi Tricot-Le Breton	ad6cba83a4	MINOR: ssl: Store specific ocsp update errors in response and update ctx Those new specific error codes will make it easier to know what went wrong during an OCSP update process. They will be used in future sample fetches as well as for debugging (via the cli or future traces).	2023-03-02 15:37:12 +01:00
Remi Tricot-Le Breton	9e94df3e55	MINOR: ssl: Add ocsp update success/failure counters Those counters will be used for debugging purposes and will be dumped via a cli command.	2023-03-02 15:37:11 +01:00
Amaury Denoyelle	e0fe118dad	MINOR: quic: implement qc_notify_send() Implement qc_notify_send(). This function is responsible for notifying the upper layer subscribed on SUB_RETRY_SEND if sending conditions are back to normal. For the moment, this patch has no functional change as only congestion window room is checked before notifying the upper layer. However, this will be extended when poller subscription of the socket on sendto() error is implemented. qc_notify_send() will thus be responsible for ensuring that all conditions are met before waking up the upper layer. This should be backported up to 2.7.	2023-03-01 14:29:16 +01:00
Amaury Denoyelle	1febc2d316	MEDIUM: quic: improve fatal error handling on send Send is conducted through qc_send_ppkts() for a QUIC connection. There are two types of errors which can be encountered on sendto() or affiliated syscalls : * transient error. In this case, sending is simulated with the remaining data and the retransmission process is used to have the opportunity to retry emission * fatal error. If this happens, the connection should be closed as soon as possible. This is done via qc_kill_conn() function. Until this patch, only ECONNREFUSED errno was considered as fatal. Modify the QUIC send API to be able to differentiate transient and fatal errors more easily. This is done by fixing the return value of the sendto() wrapper qc_snd_buf() : * on fatal error, a negative error code is returned. This is now the case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS and EBADF. * on a transient error, 0 is returned. This is the case for the listed errno values above and also if a partial send has been conducted by the kernel. * on success, the return value of sendto() syscall is returned. This commit will be useful to be able to handle transient errors with a quic-conn owned socket. In this case, the socket should be subscribed to the poller and no simulated send will be conducted. This commit allows errno management to be confined in the quic-sock module which is a nice cleanup. On a final note, EBADF should be considered as fatal. This will be the subject of a future commit. This should be backported up to 2.7.	2023-02-28 10:51:25 +01:00
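The return-value convention described for the sendto() wrapper can be sketched as a small classification function. This is an illustration only: `classify_send()` is a hypothetical helper, not qc_snd_buf() itself, and returning `-err` as the negative error code is an assumption (the commit only states that a negative code is returned).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Maps a sendto() outcome to the convention described above:
 * <0 fatal error, 0 transient error or partial send, >0 bytes sent.
 * <ret> is the syscall return value, <len> the requested length and
 * <err> the errno value captured right after the call.
 */
static ssize_t classify_send(ssize_t ret, size_t len, int err)
{
	if (ret < 0) {
		if (err == EAGAIN || err == EWOULDBLOCK || err == ENOTCONN ||
		    err == EINPROGRESS || err == EBADF)
			return 0;    /* transient: rely on retransmission */
		return -err;         /* fatal: the connection should be killed */
	}
	if ((size_t)ret != len)
		return 0;            /* partial send is treated as transient */
	return ret;                  /* full send: report sendto()'s value */
}
```

Keeping the errno inspection in one place is what lets callers react to only three outcomes, which is the cleanup the commit message highlights.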
Willy Tarreau	7b8aac4439	MINOR: tinfo: make thread_set functions return nth group/mask instead of first thread_set_first_group() and thread_set_first_tmask() were modified and renamed to instead return the number and mask of the nth group. Passing zero continues to return the first one, but it will be more convenient to use this way when building shards.	2023-02-28 10:28:47 +01:00
Willy Tarreau	fea8c19119	CLEANUP: listener: only store conn counts for local threads The listeners have a thr_conn[] array indexed on the thread number that is used during connection redispatching to know what threads are the least loaded. Since we introduced thread groups, and based on the fact that a listener may only belong to one group, there's no point storing counters for all threads, we just need to store them for all threads in the group. Doing so reduces the struct listener from 1500 to 632 bytes. This may be backported to 2.7 to save a bit of resources.	2023-02-28 10:28:47 +01:00
Christopher Faulet	85eabfbf67	MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished As for the H1 and H2 stream, the QUIC stream now states it does not expect data from the server as long as the request is unfinished. The aim is the same. We must be sure to not trigger a read timeout on server side if the client is still uploading data. From the moment the end of the request is received and forwarded to upper layer, the QUIC stream reports it expects to receive data from the opposite endpoint. This re-enables read timeout on the server side.	2023-02-27 17:45:45 +01:00
Christopher Faulet	8aabc8ebfd	MINOR: stconn: Report a send activity when endpoint is willing to consume data When the endpoint (applet or mux) is now willing to consume data while it said it wouldn't, a send activity is reported. Indeed, the writes were blocked because of the endpoint. It is now ready to consume outgoing data. So a send activity must be reported to reset the corresponding timers. Concretely, when the flag SE_FL_WONT_CONSUME is removed, a send activity is reported.	2023-02-27 17:45:45 +01:00
Willy Tarreau	a2a3d5dd25	CLEANUP: ring: remove the now unused ring's offset Since the previous patch, the ring's offset is not used anymore. The haring utility remains backward-compatible since it can trust the buffer element that's at the beginning of the map and which still contains all the valid data.	2023-02-24 09:26:30 +01:00
Aurelien DARRAGON	d3ffba4512	MINOR: listener: pause_listener() becomes suspend_listener() We are simply renaming pause_listener() to suspend_listener() to prevent confusion around listener pausing. A suspended listener can be in two differents valid states: - LI_PAUSED: the listener is effectively paused, it will unpause on resume_listener() - LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED state, so it was unbound to satisfy the suspend request, it will correcly re-bind on resume_listener() Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in suspend_listener() and unmark them in resume_listener(). We're also adding li_suspend proxy variable to track the number of currently suspended listeners: That is, the number of listeners that were suspended through suspend_listener() and that are either in LI_PAUSED or LI_ASSIGNED state. Counter is increased on successful suspend in suspend_listener() and it is decreased on successful resume in resume_listener() -- Backport notes: -> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state" was not backported: Replace this: \| /* PROXY_LOCK is require \| proxy_cond_resume(px); By this: \| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id); \| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id); -> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was custom patched: Replace this: \|@@ -253,6 +253,7 @@ struct listener { \| \| /* listener flags (16 bits) / \| #define LI_F_FINALIZED 0x0001 / listener made it to the READY\|\|LIMITED\|\|FULL state at least once, may be suspended/resumed safely / \|+#define LI_F_SUSPENDED 0x0002 / listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state / \| \| / Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of \| * success, or a combination of ERR_* flags if an error is encountered. 
The By this: \|@@ -222,6 +222,7 @@ struct li_per_thread { \| \| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic / \| #define LI_F_FINALIZED 0x00000002 / listener made it to the READY\|\|LIMITED\|\|FULL state at least once, may be suspended/resumed safely / \|+#define LI_F_SUSPENDED 0x00000004 / listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state / \| \| / The listener will be directly referenced by the fdtab[] which holds its \| * socket. The listener provides the protocol-specific accept() function to	2023-02-23 15:05:05 +01:00
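The bookkeeping described for suspend_listener() and resume_listener() (the LI_F_SUSPENDED flag plus the li_suspend counter) can be sketched as follows. The types are heavily simplified stand-ins, the `can_pause` field and the `_sketch` function names are hypothetical, and locking as well as the actual pause/unbind work are left out.

```c
/* Simplified subset of listener states, for illustration only. */
enum li_state { LI_ASSIGNED, LI_PAUSED, LI_READY };

#define LI_F_SUSPENDED 0x0002

struct listener {
	enum li_state state;
	unsigned int flags;
	int can_pause;        /* hypothetical: protocol supports LI_PAUSED */
};

struct proxy {
	int li_suspend;       /* number of currently suspended listeners */
};

/* Suspend: pause when supported, otherwise fall back to the unbound state. */
static void suspend_listener_sketch(struct proxy *px, struct listener *l)
{
	if (l->flags & LI_F_SUSPENDED)
		return;       /* already counted, do not count twice */
	l->state = l->can_pause ? LI_PAUSED : LI_ASSIGNED;
	l->flags |= LI_F_SUSPENDED;
	px->li_suspend++;
}

/* Resume: only listeners marked as suspended are brought back and uncounted. */
static void resume_listener_sketch(struct proxy *px, struct listener *l)
{
	if (!(l->flags & LI_F_SUSPENDED))
		return;
	l->state = LI_READY;
	l->flags &= ~LI_F_SUSPENDED;
	px->li_suspend--;
}
```

The flag is what makes the counter reliable: without it, a listener sitting in LI_ASSIGNED because it was never started could not be told apart from one that was unbound to satisfy a suspend request.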
Aurelien DARRAGON	2370599f96	MINOR: listener: make sure we don't pause/resume bypassed listeners Some listeners are kept in LI_ASSIGNED state but are not supposed to be started since they were bypassed on initial startup (eg: in protocol_bind_all() or in enable_listener()...) Introduce the LI_F_FINALIZED flag: when the variable is non zero it means that the listener made it past the LI_LISTEN state (finalized) at least once so we can safely pause / resume. This way we won't risk starting a previously bypassed listener which never made it that far and thus was not expected to be lazy-started by accident. As listener_pause() and listener_resume() are currently partially broken, such unexpected lazy-start won't happen. But we're trying to restore pause() and resume() behavior so this patch will be required before going any further. We had to re-introduce listeners 'flags' struct member since it was recently moved into bind_conf struct. But here we do have a legitimate need for these listener-only flags. This should only be backported if explicitly required by another commit. -- Backport notes: -> 2.4 and 2.5: The 2-bytes hole we're using in the current patch does not apply, let's use the 4-byte hole located under the 'option' field. 
Replace this: \|@@ -226,7 +226,8 @@ struct li_per_thread { \| struct listener { \| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER / \| enum li_state state; / state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL / \|- / 2-byte hole here / \|+ uint16_t flags; / listener flags: LI_F_* / \| int luid; / listener universally unique ID, used for SNMP / \| int nbconn; / current number of connections on this listener / \| unsigned int thr_idx; / thread indexes for queue distribution : (t2<<16)+t1 / By this: \|@@ -209,6 +209,8 @@ struct listener { \| short int nice; / nice value to assign to the instantiated tasks / \| int luid; / listener universally unique ID, used for SNMP / \| int options; / socket options : LI_O_* / \|+ uint16_t flags; / listener flags: LI_F_* / \|+ / 2-bytes hole here / \| __decl_thread(HA_RWLOCK_T lock); \| \| struct fe_counters counters; /* statistics counters / -> 2.4 only: We need to adjust some contextual lines. Replace this: \|@@ -477,7 +478,7 @@ int pause_listener(struct listener l, int lpx, int lli) \| if (!lli) \| HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock); \| \|- if (l->state <= LI_PAUSED) \|+ if (!(l->flags & LI_F_FINALIZED) \|\| l->state <= LI_PAUSED) \| goto end; \| \| if (l->rx.proto->suspend) By this: \|@@ -477,7 +478,7 @@ int pause_listener(struct listener l, int lpx, int lli) \| !(proc_mask(l->rx.settings->bind_proc) & pid_bit)) \| goto end; \| \|- if (l->state <= LI_PAUSED) \|+ if (!(l->flags & LI_F_FINALIZED) \|\| l->state <= LI_PAUSED) \| goto end; \| \| if (l->rx.proto->suspend) And this: \|@@ -535,7 +536,7 @@ int resume_listener(struct listener l, int lpx, int lli) \| if (MT_LIST_INLIST(&l->wait_queue)) \| goto end; \| \|- if (l->state == LI_READY) \|+ if (!(l->flags & LI_F_FINALIZED) \|\| l->state == LI_READY) \| goto end; \| \| if (l->rx.proto->resume) By this: \|@@ -535,7 +536,7 @@ int resume_listener(struct listener l, int lpx, int lli) \| !(proc_mask(l->rx.settings->bind_proc) & pid_bit)) \| goto end; \| \|- if 
(l->state == LI_READY) \|+ if (!(l->flags & LI_F_FINALIZED) \|\| l->state == LI_READY) \| goto end; \| \| if (l->rx.proto->resume) -> 2.6 and 2.7 only: struct listener 'flags' member still exists, let's use it. Remove this from the current patch: \|@@ -226,7 +226,8 @@ struct li_per_thread { \| struct listener { \| enum obj_type obj_type; / object type = OBJ_TYPE_LISTENER / \| enum li_state state; / state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL / \|- / 2-byte hole here / \|+ uint16_t flags; / listener flags: LI_F_* / \| int luid; / listener universally unique ID, used for SNMP / \| int nbconn; / current number of connections on this listener / \| unsigned int thr_idx; / thread indexes for queue distribution : (t2<<16)+t1 / Then, replace this: \|@@ -251,6 +250,9 @@ struct listener { \| EXTRA_COUNTERS(extra_counters); \| }; \| \|+/ listener flags (16 bits) / \|+#define LI_F_FINALIZED 0x0001 / listener made it to the READY\|\|LIMITED\|\|FULL state at least once, may be suspended/resumed safely / \|+ \| / Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of \| * success, or a combination of ERR_* flags if an error is encountered. The \| * function pointer can be NULL if not implemented. The function also has an By this: \|@@ -221,6 +221,7 @@ struct li_per_thread { \| }; \| \| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic / \|+#define LI_F_FINALIZED 0x00000002 / listener made it to the READY\|\|LIMITED\|\|FULL state at least once, may be suspended/resumed safely / \| \| / The listener will be directly referenced by the fdtab[] which holds its \| * socket. The listener provides the protocol-specific accept() function to	2023-02-23 15:05:05 +01:00
Aurelien DARRAGON	bcad7e6319	MINOR: listener: add relax_listener() function There is a need for a small difference between resuming and relaxing a listener. When resuming, we expect that the listener may completely resume, this includes unpausing or rebinding if required. Resuming a listener is a best-effort operation: no matter the current state, try our best to bring the listener up to the LI_READY state. There are some cases where we only want to "relax" listeners that were previously restricted using limit_listener() or listener_full() functions. Here we don't want to resuscitate listeners, we're simply interested in cancelling out the previous restriction. To this day, listener_resume() on an unbound listener is broken, which is why the need for this wasn't felt yet. But we're trying to restore historical listener_resume() behavior, so we'd better prepare for this by introducing an explicit relax_listener() function that only does what is expected in such cases. This commit depends on: - "MINOR: listener/api: add lli hint to listener functions"	2023-02-23 15:05:05 +01:00
Aurelien DARRAGON	4059e094db	MINOR: listener/api: add lli hint to listener functions Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions. All these functions implicitely take the listener lock when they are called: It could be useful to be able to call them while already holding the lock, so we're adding lli hint to make them take the lock only when it is missing. This should only be backported if explicitly required by another commit -- -> 2.4 and 2.5 common backport notes: These 2 commits need to be backported first: - `187396e34` "CLEANUP: listener: function comment typo in stop_listener()" - `a57786e87` "BUG/MINOR: listener: null pointer dereference suspected by coverity" -> 2.4 special backport notes: In addition to the previously mentionned dependencies, the patch needs to be slightly adapted to match the corresponding contextual lines: Replace this: \|@@ -471,7 +474,8 @@ int pause_listener(struct listener l, int lpx) \| if (!lpx && px) \| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock); \| \|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock); \|+ if (!lli) \|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock); \| \| if (l->state <= LI_PAUSED) \| goto end; By this: \|@@ -471,7 +474,8 @@ int pause_listener(struct listener l, int lpx) \| if (!lpx && px) \| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock); \| \|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock); \|+ if (!lli) \|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock); \| \| if ((global.mode & (MODE_DAEMON \| MODE_MWORKER)) && \| !(proc_mask(l->rx.settings->bind_proc) & pid_bit)) Replace this: \|@@ -169,7 +169,7 @@ void protocol_stop_now(void) \| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock); \| list_for_each_entry(proto, &protocols, list) { \| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list) \|- stop_listener(listener, 0, 1); \|+ stop_listener(listener, 0, 1, 0); \| } \| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock); \| } By this: \|@@ -169,7 +169,7 @@ void protocol_stop_now(void) \| HA_SPIN_LOCK(PROTO_LOCK, 
&proto_lock); \| list_for_each_entry(proto, &protocols, list) { \| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list) \| if (!listener->bind_conf->frontend->grace) \|- stop_listener(listener, 0, 1); \|+ stop_listener(listener, 0, 1, 0); \| } \| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock); Replace this: \|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy p) \| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock); \| \| list_for_each_entry(l, &p->conf.listeners, by_fe) \|- stop_listener(l, 1, 0); \|+ stop_listener(l, 1, 0, 0); \| \| if (!(p->flags & (PR_FL_DISABLED\|PR_FL_STOPPED)) && !p->li_ready) { \| / might be just a backend / By this: \|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy p) \| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock); \| \| list_for_each_entry(l, &p->conf.listeners, by_fe) \|- stop_listener(l, 1, 0); \|+ stop_listener(l, 1, 0, 0); \| \| if (!p->disabled && !p->li_ready) { \| /* might be just a backend */	2023-02-23 15:05:05 +01:00
Christopher Faulet	2bf99123ef	MINOR: stconn: Add functions to set/clear SE_FL_EXP_NO_DATA flag from endpoint se_expect_data() and se_expect_no_data() should be used from the endpoint to inform upper layer it expects data or not from the opposite endpoint.	2023-02-23 13:44:32 +01:00
Christopher Faulet	be5cc766b0	MINOR: stconn: Remove half-closed timeout The half-closed timeout is now directly retrieved from the proxy settings. There is no longer usage for the .hcto field in the stconn structure. So let's remove it.	2023-02-22 15:59:16 +01:00
Christopher Faulet	bcdcfad3ff	MINOR: stconn: Set half-close timeout using proxy settings We now directly use the proxy settings to set the half-close timeout of a stream-connector. The function sc_set_hcto() must be used to do so. This timeout is only set when a shutw is performed. So it is not really a big deal to use a dedicated function to do so.	2023-02-22 15:59:16 +01:00
Christopher Faulet	15315d6c0a	CLEANUP: stconn: Remove old read and write expiration dates Old read and write expiration dates are no longer used. Thus we can safely remove them.	2023-02-22 15:59:16 +01:00
Christopher Faulet	b374ba563a	MAJOR: stream: Use SE descriptor date to detect read/write timeouts We stop using the channel's expiration dates to detect read and write timeouts on the channels. We now rely on the stream-endpoint descriptor to do so. All the stuff is handled in process_stream(). The stream relies on 2 helper functions to know if the receives or sends may expire: sc_rcv_may_expire() and sc_snd_may_expire().	2023-02-22 15:57:16 +01:00
Christopher Faulet	2ca4cc1936	MINOR: applet/stconn: Add a SE flag to specify an endpoint does not expect data An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any data from the opposite endpoint. This way, the stream will be able to disable any read timeout on the opposite endpoint. Applets should use applet_expect_no_data() and applet_expect_data() functions to set or clear the flag. For now, only dns and sink forwarder applets are concerned.	2023-02-22 15:56:28 +01:00
Christopher Faulet	4c13568b49	MEDIUM: stconn: Add two date to track successful reads and blocked sends The stream endpoint descriptor now owns two date, lra (last read activity) and fsb (first send blocked). The first one is updated every time a read activity is reported, including data received from the endpoint, successful connect, end of input and shutdown for reads. A read activity is also reported when receives are unblocked. It will be used to detect read timeouts. The other one is updated when no data can be sent to the endpoint and reset when some data are sent. It is the date of the first send blocked by the endpoint. It will be used to detect write timeouts. Helper functions are added to report read/send activity and to retrieve lra/fsb date.	2023-02-22 14:52:15 +01:00
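The role of the two dates can be illustrated with a minimal sketch. All names here are hypothetical (`sedesc_dates` and the helpers are not the functions added by the commit), dates are plain millisecond counters, and a separate `send_blocked` field is used instead of a special "unset" tick value to keep the example self-contained.

```c
#include <stdint.h>

struct sedesc_dates {
	uint32_t lra;      /* last read activity, in ms */
	uint32_t fsb;      /* date of the first blocked send, in ms */
	int send_blocked;  /* non-zero while <fsb> is meaningful */
};

/* Any read activity (data, connect, end of input...) refreshes <lra>. */
static void report_read_activity(struct sedesc_dates *d, uint32_t now)
{
	d->lra = now;
}

/* Only the first blocked send is recorded; later ones keep the same date. */
static void report_blocked_send(struct sedesc_dates *d, uint32_t now)
{
	if (!d->send_blocked) {
		d->fsb = now;
		d->send_blocked = 1;
	}
}

/* Sending some data resets the blocked-send date. */
static void report_send_activity(struct sedesc_dates *d)
{
	d->send_blocked = 0;
}

/* A read timeout fires when nothing was read for <timeout> ms. */
static int read_expired(const struct sedesc_dates *d, uint32_t now, uint32_t timeout)
{
	return now - d->lra >= timeout;
}

/* A write timeout fires when sends have been blocked for <timeout> ms. */
static int write_expired(const struct sedesc_dates *d, uint32_t now, uint32_t timeout)
{
	return d->send_blocked && now - d->fsb >= timeout;
}
```

Recording the first blocked send rather than the last one is what lets a write timeout measure how long the endpoint has refused data, regardless of how many send attempts happened in between.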
Christopher Faulet	5aaacfbccd	MEDIUM: stconn: Replace read and write timeouts by a unique I/O timeout Read and write timeouts (.rto and .wto) are now replaced by an unique timeout, call .ioto. Since the recent refactoring on channel's timeouts, both use the same value, the client timeout on client side and the server timeout on the server side. Thus, this part may be simplified. Now it represents the I/O timeout.	2023-02-22 14:52:15 +01:00
Christopher Faulet	f8413cba2a	MEDIUM: channel/stconn: Move rex/wex timer from the channel to the sedesc These timers are related to the I/O. Thus it is logical to move them into the SE descriptor. The patch is a bit huge but it is just a replacement. However it is error-prone. From the stconn or the stream, helper functions are used to get, set or reset these timers. This simplify the timers manipulations.	2023-02-22 14:52:15 +01:00
Christopher Faulet	ed7e66fe1a	MINOR: channel/stconn: Move rto/wto from the channel to the stconn Read and write timeouts concerns the I/O. Thus, it is logical to move it into the stconn. At the end, the stream is responsible to detect the timeouts. So it is logcial to have these values in the stconn and not in the SE descriptor. But it may change depending on the recfactoring. So, now: * scf->rto is used instead of req->rto * scf->wto is used instead of res->wto * scb->rto is used instead of res->rto * scb->wto is used instead of req->wto	2023-02-22 14:52:15 +01:00
Christopher Faulet	2e56a73459	MAJOR: channel: Remove flags to report READ or WRITE errors This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write errors and SE_FL_ERROR for read or unrecoverable errors. When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a read event and a write event are reported to be sure the stream will properly process the error. At the stream-connector level, it is similar. When an error is reported during a send, a write event is triggered. On the read side, nothing more is performed because an error at this stage is enough to wake the stream up. A major change is brought with this patch. We stop checking the flags of the opposite channel to report an abort or a timeout. It also means that when a read or write error is reported on one side, we no longer update the other side. Thus a read error on the server side no longer leads to a write error on the client side. This should ease error reporting.	2023-02-22 14:52:15 +01:00
Christopher Faulet	81fdeb8ce2	MEDIUM: channel: Remove CF_READ_NOEXP flag This flag was introduced in 1.3 to fix a design issue. It has been untouched since then but there is no reason to still have this trick. Note it could be good to review what happens in HTTP when the server is waiting for the end of the request. It could be good to be sure a client timeout is always reported.	2023-02-22 14:52:14 +01:00
Aurelien DARRAGON	3ffbf3896d	BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy In `bb581423b` ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient"), a new logic was implemented to make sure that when a lua ctx is destroyed, related httpclients are correctly destroyed too, to prevent such httpclients from being resuscitated on a destroyed lua ctx. This was implemented by adding a list of httpclients within the lua ctx, and a new function, hlua_httpclient_destroy_all(), that is called under hlua_ctx_destroy() and runs through the httpclients list in the lua context to properly terminate them. This was done with the assumption that no concurrent Lua garbage collection cycles could occur on the same resources, which seems OK since the "lua" context is about to be freed and is not explicitly being used by other threads. But when 'lua-load' is used, the main lua stack is shared between multiple OS threads, which means that all lua ctx in the process are linked to the same parent stack. Yet it seems that lua GC, which can be triggered automatically under lua_resume() or manually through lua_gc(), does not limit itself to the "coroutine" stack (the stack referenced in lua->T) when performing the cleanup, but is able to perform some cleanup on the main stack plus coroutines stacks that were created under the same main stack (via lua_newthread()) as well. This can be explained by the fact that lua_newthread() coroutines are not meant to be thread-safe by design. Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author) It did not cause other issues so far because most of the time when using 'lua-load', the global lua lock is taken when performing critical operations that are known to interfere with the main stack. But here in hlua_httpclient_destroy_all(), we don't run under the global lock. 
Now that we properly understand the issue, the fix is pretty trivial: We could simply guard the hlua_httpclient_destroy_all() under the global lua lock, this would work but it could increase the contention over the global lock. Instead, we switched 'lua->hc_list' which was introduced with `bb581423b` from simple list to mt_list so that concurrent accesses between hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled. The issue was reported by @Mark11122 on Github #2037. This must be backported with `bb581423b` ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient") as far as 2.5.	2023-02-22 11:44:22 +01:00
Willy Tarreau	27629a7d65	MINOR: compiler: add a TOSTR() macro to turn a value into a string Pretty often we have to emit a value (setting, limit etc) in an error message, and this value is known at compile-time, and just doing this forces to use a printf format such as "%d". Let's have a simple macro to turn any other macro or value into a string that can be concatenated with the rest of the string around. This simplifies error messages production on the CLI for example.	2023-02-22 09:10:53 +01:00
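The stringification idea described in the TOSTR() commit above can be sketched as follows. This is a minimal illustration and not the macro as defined in HAProxy: the names `EX_TOSTR`, `EX_TOSTR_`, `EX_MAX_TABLES` and `ex_limit_msg()` are hypothetical. It relies on the standard two-level trick, where the outer macro lets its argument expand before the inner one applies the `#` operator.

```c
/* Hypothetical names for illustration only; not HAProxy's definitions. */

/* Inner macro: turns its argument into a string literal as-is. */
#define EX_TOSTR_(x) #x
/* Outer macro: expands the argument first, then stringifies the result. */
#define EX_TOSTR(x)  EX_TOSTR_(x)

/* A compile-time limit that we want to show in an error message. */
#define EX_MAX_TABLES 200

/* The limit is concatenated into the literal at compile time, so no
 * printf-style "%d" format is needed to emit it.
 */
static const char *ex_limit_msg(void)
{
	return "at most " EX_TOSTR(EX_MAX_TABLES) " tables are supported";
}
```

Without the extra level of indirection, `EX_TOSTR_(EX_MAX_TABLES)` would produce the string "EX_MAX_TABLES" instead of "200".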
Remi Tricot-Le Breton	879debeecb	BUG/MINOR: cache: Cache response even if request has "no-cache" directive Since commit `cc9bf2e5f` "MEDIUM: cache: Change caching conditions" responses that do not have an explicit expiration time are not cached anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag instead of the TX_CACHEABLE one. The effect this had is that a cacheable response that corresponded to a request having a "Cache-Control: no-cache" for instance would not be cached. Contrary to what was said in the other commit message, the "checkcache" option should not be impacted by the use of the TX_CACHEABLE flag instead of the TX_CACHE_IGNORE one. The response is indeed considered as not cacheable if it has no expiration time, regardless of the presence of a cookie in the response. This should fix GitHub issue #2048. This patch can be backported up to branch 2.4.	2023-02-21 18:35:41 +01:00
Christopher Faulet	c13f3028e8	MINOR: cfgcond: Implement enabled condition expression Implement a way to test if some options are enabled at run-time. For now, following options may be detected: POLL, EPOLL, KQUEUE, EVPORTS, SPLICE, GETADDRINFO, REUSEPORT, FAST-FORWARD, SERVER-SSL-VERIFY-NONE These options are those that can be disabled on the command line. This way it is possible, from a reg-test for instance, to know if a feature is supported or not : feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"	2023-02-21 11:44:55 +01:00
Christopher Faulet	a1fdad784b	MINOR: cfgcond: Implement strstr condition expression Implement a way to match a substring in a string. The strstr expression can now be used to do so.	2023-02-21 11:44:55 +01:00
Christopher Faulet	2f7c82bfdf	BUG/MINOR: haproxy: Fix option to disable the fast-forward The option was renamed so that it can only disable the fast-forward. First, there is no reason to enable it because it is the default behavior. Second, the previous form introduced a bug because there was no way to be sure the command line had precedence over the configuration. So, the option is now named "tune.disable-fast-forward" and does not support any argument. And of course, the command line option "-dF" now has precedence over the configuration. No backport needed.	2023-02-21 11:44:55 +01:00
Amaury Denoyelle	77ed63106d	MEDIUM: quic: trigger fast connection closing on process stopping With previous commit, quic-conn are now handled as jobs to prevent the termination of haproxy process. This ensures that QUIC connections are closed when all data are acknowledged by the client and there are no more active streams. The quic-conn layer emits a CONNECTION_CLOSE once the MUX has been released and all streams are acknowledged. Then, the timer is scheduled to definitively free the connection after the idle timeout period. This allows late-arriving packets to be handled. Adjust this procedure to deactivate this timer when process stopping is in progress. In this case, quic-conn timer is set to expire immediately to free the quic-conn instance as soon as possible. This allows the haproxy process to be closed quickly. This should be backported up to 2.7.	2023-02-20 11:20:18 +01:00
Amaury Denoyelle	eb7d320d25	MINOR: mux-quic: implement client-fin timeout Implement client-fin timeout for MUX quic. This timeout is used once an applicative layer shutdown has been called. In HTTP/3, this corresponds to the emission of a GOAWAY. This should be backported up to 2.7.	2023-02-20 11:20:18 +01:00
Amaury Denoyelle	b30247b16c	MINOR: mux-quic: define qc_shutdown() Factorize shutdown operation in a dedicated function qc_shutdown(). This will allow to call it from multiple places. A new flag QC_CF_APP_SHUT is also defined to ensure it will only be executed once even if called multiple times per connection. This commit will be useful to properly support haproxy soft stop. This should be backported up to 2.7.	2023-02-20 11:18:58 +01:00
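The "execute only once" guard described in the qc_shutdown() commit above can be sketched as below. This is a simplified illustration under stated assumptions, not the mux-quic code: `struct ex_conn`, `EX_CF_APP_SHUT` and `ex_shutdown()` are hypothetical stand-ins, and the counter merely represents the application-layer shutdown (for HTTP/3, the emission of a GOAWAY).

```c
/* Hypothetical flag and types for illustration; not the mux-quic ones. */
#define EX_CF_APP_SHUT 0x00000001u

struct ex_conn {
	unsigned int flags;   /* connection-level flags */
	int shutdown_calls;   /* counts how often the real work was done */
};

/* Performs the application-layer shutdown at most once per connection,
 * no matter how many code paths end up calling it.
 */
static void ex_shutdown(struct ex_conn *conn)
{
	if (conn->flags & EX_CF_APP_SHUT)
		return;               /* already done, nothing to repeat */

	conn->shutdown_calls++;       /* stand-in for emitting a GOAWAY */
	conn->flags |= EX_CF_APP_SHUT;
}
```

Keeping the guard inside the function means callers such as a timeout handler and a soft-stop path do not need to coordinate with each other.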
Frédéric Lécaille	2f531116ed	MINOR: quic: Add traces to qc_kill_conn() Very minor modification to help in debugging issues. Must be backported to 2.7.	2023-02-17 17:36:30 +01:00
Frédéric Lécaille	a2c62c3141	MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt The send*() syscalls that trigger the reception of such ICMP packets fail with ECONNREFUSED as errno. From udp(7): ECONNREFUSED No receiver was associated with the destination address. This might be caused by a previous packet sent over the socket. We must kill the underlying connection as soon as possible. Must be backported to 2.7.	2023-02-17 17:36:30 +01:00
Frédéric Lécaille	75c8ad5490	MINOR: quic: Move code to wakeup the timer task to avoid anti-amplication deadlock This code was there because the timer task was not running on the same thread as the one which parses the QUIC packets. Now that this is no longer the case, we can wake up this task directly. Must be backported to 2.7.	2023-02-17 17:36:30 +01:00
Frédéric Lécaille	1dbeb35f80	MINOR: quic: Add new traces about by connection RX buffer handling Move quic_rx_pkts_del() out of quic_conn.h to make it benefit from the TRACE API. Add traces which already helped in diagnosing an issue encountered with ngtcp2, which sent too many 1RTT packets before the handshake completion. This has been fixed here after a discussion with Tatsuhiro on the QUIC dev Slack: https://github.com/ngtcp2/ngtcp2/pull/663 Must be backported to 2.7.	2023-02-17 17:36:30 +01:00
Amaury Denoyelle	14037bf26f	MINOR: h3: add traces on decode_qcs callback Add traces inside h3_decode_qcs(). Every error path now has its dedicated trace, which should simplify debugging. Each early return has been converted to a goto invocation. To complete the demux tracing, demux frame type and length are now printed using the h3s instance whenever it is possible on trace invocation. A new internal value H3_FT_UNINIT is used as a frame type to mark demuxing as inactive. This should be backported up to 2.7.	2023-02-17 17:31:52 +01:00
Amaury Denoyelle	381d8137e3	MINOR: h3/hq-interop: handle no data in decode_qcs() with FIN set Properly handle a STREAM frame with no data but the FIN bit set at the application layer. H3 and hq-interop decode_qcs() callbacks have been adjusted to not return early in this case. If the FIN bit is accepted, an HTX EOM must be inserted for the upper stream layer. If the FIN is rejected because the stream cannot be closed, a proper CONNECTION_CLOSE error will be triggered. A new utility function qcs_http_handle_standalone_fin() has been implemented in the qmux_http module. This allows the HTX EOM to be simply added to the qcs HTX buffer. If the HTX buffer is empty, an EOT is first added to ensure it will be transmitted above. This commit will allow FIN notification through an empty STREAM frame to be properly handled. However, it is not sufficient as currently qcc_recv() skips the decode_qcs() invocation when the offset is already received. This will be fixed in the next commit. This should be backported up to 2.6 along with the next patch.	2023-02-17 16:25:00 +01:00
Willy Tarreau	3e820a1056	MINOR: threads: add flags to know if a thread is started and/or running Several times during debugging it has been difficult to find a way to reliably indicate if a thread had been started and if it was still running. It's really not easy because the elements we look at are not necessarily reliable (e.g. harmless bit or idle bit might not reflect what we think during a signal). And such notions can be subjective anyway. Here we define two thread flags, TH_FL_STARTED which is set as soon as a thread enters run_thread_poll_loop() and drops the idle bit, and another one, TH_FL_IN_LOOP, which is set when entering run_poll_loop() and cleared when leaving it. This should help init/deinit code know whether it's called from a non-initialized thread (i.e. tid must not be trusted), or shared functions know if they're being called from a running thread or from init/deinit code outside of the polling loop.	2023-02-17 16:01:34 +01:00
Christopher Faulet	d4eaa8af6b	MINOR: global: Add an option to disable the data fast-forward The new global option "tune.fast-forward" can be set to "off" to disable the data fast-forward. It is a debug option, thus it is internally marked as experimental. The directive "expose-experimental-directives" must be set first to use this one. By default, the data fast-forward is enabled. It could be useful to force the stream to be woken up when data are received, to be sure everything works fine in this case. The data fast-forward is an optimization. Everything must work without it. But some code may rely on the fact that the stream will not be woken up. With this option, it is possible to spot some hidden bugs.	2023-02-17 10:17:02 +01:00
William Lallemand	44979ad680	BUG/MINOR: config: crt-list keywords mistaken for bind ssl keywords This patch fixes an issue in the "-dK" keywords dumper, which was mistakenly displaying the "crt-list" keywords for "bind ssl" keywords. The patch fixes the issue by dumping the "crt-list" keywords in its own section, and dumping the "bind" keywords which are in the "SSL" scope with a "bind ssl" prefix. This commit depends on the previous "MINOR: ssl: rename confusing ssl_bind_kws" commit. Must be backported in 2.6. Diff of the `./haproxy -dKall -q -c -f /dev/null` output before and after the patch in 2.8-dev4: \| @@ -190,30 +190,9 @@ listen \| use-fcgi-app \| bind <addr> accept-netscaler-cip +1 \| bind <addr> accept-proxy \| - bind <addr> allow-0rtt \| - bind <addr> alpn +1 \| bind <addr> backlog +1 \| - bind <addr> ca-file +1 \| - bind <addr> ca-ignore-err +1 \| - bind <addr> ca-sign-file +1 \| - bind <addr> ca-sign-pass +1 \| - bind <addr> ca-verify-file +1 \| - bind <addr> ciphers +1 \| - bind <addr> ciphersuites +1 \| - bind <addr> crl-file +1 \| - bind <addr> crt +1 \| - bind <addr> crt-ignore-err +1 \| - bind <addr> crt-list +1 \| - bind <addr> curves +1 \| bind <addr> defer-accept \| - bind <addr> ecdhe +1 \| bind <addr> expose-fd +1 \| - bind <addr> force-sslv3 \| - bind <addr> force-tlsv10 \| - bind <addr> force-tlsv11 \| - bind <addr> force-tlsv12 \| - bind <addr> force-tlsv13 \| - bind <addr> generate-certificates \| bind <addr> gid +1 \| bind <addr> group +1 \| bind <addr> id +1 \| @@ -225,48 +204,52 @@ listen \| bind <addr> name +1 \| bind <addr> namespace +1 \| bind <addr> nice +1 \| - bind <addr> no-ca-names \| - bind <addr> no-sslv3 \| - bind <addr> no-tls-tickets \| - bind <addr> no-tlsv10 \| - bind <addr> no-tlsv11 \| - bind <addr> no-tlsv12 \| - bind <addr> no-tlsv13 \| - bind <addr> npn +1 \| - bind <addr> prefer-client-ciphers \| bind <addr> process +1 \| bind <addr> proto +1 \| bind <addr> severity-output +1 \| bind <addr> shards +1 \| - bind 
<addr> ssl \| - bind <addr> ssl-max-ver +1 \| - bind <addr> ssl-min-ver +1 \| - bind <addr> strict-sni \| bind <addr> tcp-ut +1 \| bind <addr> tfo \| bind <addr> thread +1 \| - bind <addr> tls-ticket-keys +1 \| bind <addr> transparent \| bind <addr> uid +1 \| bind <addr> user +1 \| bind <addr> v4v6 \| bind <addr> v6only \| - bind <addr> verify +1 \| bind <addr> ssl allow-0rtt \| bind <addr> ssl alpn +1 \| bind <addr> ssl ca-file +1 \| + bind <addr> ssl ca-ignore-err +1 \| + bind <addr> ssl ca-sign-file +1 \| + bind <addr> ssl ca-sign-pass +1 \| bind <addr> ssl ca-verify-file +1 \| bind <addr> ssl ciphers +1 \| bind <addr> ssl ciphersuites +1 \| bind <addr> ssl crl-file +1 \| + bind <addr> ssl crt +1 \| + bind <addr> ssl crt-ignore-err +1 \| + bind <addr> ssl crt-list +1 \| bind <addr> ssl curves +1 \| bind <addr> ssl ecdhe +1 \| + bind <addr> ssl force-sslv3 \| + bind <addr> ssl force-tlsv10 \| + bind <addr> ssl force-tlsv11 \| + bind <addr> ssl force-tlsv12 \| + bind <addr> ssl force-tlsv13 \| + bind <addr> ssl generate-certificates \| bind <addr> ssl no-ca-names \| + bind <addr> ssl no-sslv3 \| + bind <addr> ssl no-tls-tickets \| + bind <addr> ssl no-tlsv10 \| + bind <addr> ssl no-tlsv11 \| + bind <addr> ssl no-tlsv12 \| + bind <addr> ssl no-tlsv13 \| bind <addr> ssl npn +1 \| - bind <addr> ssl ocsp-update +1 \| + bind <addr> ssl prefer-client-ciphers \| bind <addr> ssl ssl-max-ver +1 \| bind <addr> ssl ssl-min-ver +1 \| + bind <addr> ssl strict-sni \| + bind <addr> ssl tls-ticket-keys +1 \| bind <addr> ssl verify +1 \| server <name> <addr> addr +1 \| server <name> <addr> agent-addr +1 \| @@ -591,6 +574,23 @@ listen \| http-after-response unset-var* \| userlist \| peers \| +crt-list \| + allow-0rtt \| + alpn +1 \| + ca-file +1 \| + ca-verify-file +1 \| + ciphers +1 \| + ciphersuites +1 \| + crl-file +1 \| + curves +1 \| + ecdhe +1 \| + no-ca-names \| + npn +1 \| + ocsp-update +1 \| + ssl-max-ver +1 \| + ssl-min-ver +1 \| + verify +1 \| # List of registered CLI 
keywords: \| @!<pid> [MASTER] \| @<relative pid> [MASTER]	2023-02-16 16:14:37 +01:00
William Lallemand	af67806651	MINOR: ssl: rename confusing ssl_bind_kws The ssl_bind_kw structure is exclusively used for crt-list keyword, it must be named otherwise to remove the confusion. The structure was renamed ssl_crtlist_kws.	2023-02-16 16:03:45 +01:00
Amaury Denoyelle	15c74702d5	MINOR: quic: implement a basic "show quic" CLI handler Implement a basic "show quic" CLI handler. This command will be useful to display various information on all the active QUIC frontend connections. This work is heavily inspired by "show sess". Most notably, a global list of quic_conn has been introduced to be able to loop over them. This list is stored per thread in ha_thread_ctx. Also add three CLI handlers for "show quic" in order to allocate and free the command context. The dump handler runs on thread isolation. Each quic_conn is referenced using a back-ref to handle deletion during handler yielding. For the moment, only a list of raw quic_conn pointers is displayed. The handler will be completed over time with more information as needed. This should be backported up to 2.7.	2023-02-09 18:11:00 +01:00
Aurelien DARRAGON	3e7a0bb70b	MINOR: cfgparse/server: move (min/max)conn postparsing logic into dedicated function In check_config_validity() function, we performed some consistency checks to adjust minconn/maxconn attributes for each declared server. We move this logic into a dedicated function named srv_minmax_conn_apply() to be able to perform those checks later in the process life when needed (ie: dynamic servers)	2023-02-08 14:48:21 +01:00
William Lallemand	a14686d096	MINOR: ssl/ocsp: add a function to check the OCSP update configuration Deduplicate the code which checks the OCSP update in the ckch_store and in the crtlist_entry. Also, jump immediately to error handling when ERR_FATAL is caught.	2023-02-08 11:40:31 +01:00
William Lallemand	b4b9caa65f	BUILD: ssl/ocsp: ssl_ocsp-t.h depends on ssl_sock-t.h ssl_ocsp-t.h uses SSL_SOCK_NUM_KEYTYPES which is defined in ssl_sock-t.h. No backport needed.	2023-02-08 11:31:03 +01:00
Willy Tarreau	28360dc53f	MEDIUM: clock: force internal time to wrap early after boot GH issue #2034 clearly indicates yet another case of time roll-over that went badly. Issues that happen only once every 50 days are hard to detect and debug, and are usually reported more or less synchronized from multiple sources. This patch finally does what had long been planned but never done yet, which is to force the time to wrap early after boot so that any such remaining issue can be spotted quicker. The margin delay here is 20s (it may be changed by setting BOOT_TIME_WRAP_SEC to another value). This value seems sufficient to permit failed health checks to succeed and traffic to come in and possibly start to update some time stamps (accept dates in logs, freq counters, stick-tables expiration dates etc). It could theoretically be helpful to have this in 2.7, but as can be seen with the two patches below, we've already had incorrect use cases of the internal monotonic time when the wall-clock one was needed, so we could expect to detect other ones in the future. Note that this will not induce bugs, it will only make them happen much faster (i.e. no need to wait for 50 days before seeing them). If it were to eventually be backported, these two previous patches must also be backported: BUG/MINOR: clock: use distinct wall-clock and monotonic start dates BUG/MEDIUM: cache: use the correct time reference when comparing dates	2023-02-08 11:10:33 +01:00
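To illustrate why forcing an early wrap exposes roll-over bugs, here is a minimal sketch assuming a 32-bit millisecond counter, which wraps roughly every 49.7 days. It is not HAProxy's clock code: `ex_boot_offset_ms()` and `ex_tick_is_before()` are hypothetical helpers, and the 20-second margin only mirrors the default mentioned in the commit message above.

```c
#include <stdint.h>

/* Hypothetical helper: initial value of a 32-bit millisecond counter such
 * that it wraps to zero wrap_delay_sec seconds after boot.
 */
static inline uint32_t ex_boot_offset_ms(uint32_t wrap_delay_sec)
{
	return (uint32_t)0 - wrap_delay_sec * 1000u;
}

/* Wrap-safe ordering test: looks at the signed distance between the two
 * ticks instead of comparing their raw values. Valid as long as both
 * ticks are less than 2^31 ms apart. The unsigned-to-signed conversion
 * behaves as expected on two's complement platforms.
 */
static inline int ex_tick_is_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

With a 20 s margin, a timestamp taken at boot and one taken 30 s later sit on opposite sides of the wrap: a naive `later > start` comparison gives the wrong answer almost immediately, while the signed-distance test still orders them correctly.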
Willy Tarreau	6093ba47c0	BUG/MINOR: clock: do not mix wall-clock and monotonic time in uptime calculation We've had a start date even before the internal monotonic clock existed, but once the monotonic clock was added, the start date was not updated to distinguish the wall clock time units and the internal monotonic time units. The distinction is important because both clocks do not necessarily progress at the same speed. The very rare occurrences of the wall-clock date are essentially for human consumption and communication with third parties (e.g. report the start date in "show info" for monitoring purposes). However currently this one is also used to measure the distance to "now" as being the process' uptime. This is actually not correct. It only works because for now the two dates are initialized at the exact same instant at boot but could still be wrong if the system's date shows a big jump backwards during startup for example. In addition the current situation prevents us from enforcing an arbitrary offset at boot to reveal some heisenbugs. This patch adds a new "start_time" at boot that is set from "now" and is used in uptime calculations. "start_date" instead is now set from "date" and will always reflect the system date for human consumption (e.g. in "show info"). This way we're now sure that any drift of the internal clock relative to the system date will not impact the reported uptime. This could possibly be backported though it's unlikely that anyone has ever noticed the problem.	2023-02-08 11:06:55 +01:00
Frédéric Lécaille	b7a406ac34	MINOR: quic: Update version_information transport parameter to draft-14 This is necessary to make our stack negotiate the QUIC versions with clients. (See https://author-tools.ietf.org/iddiff?url1=draft-ietf-quic-version-negotiation-13&url2=draft-ietf-quic-version-negotiation-14&difftype=--html) Must be backported to 2.7.	2023-02-06 11:54:07 +01:00
Aurelien DARRAGON	e5958d0292	BUG/MEDIUM: stats: fix resolvers dump In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"), we forgot to apply the patch in resolvers.c which provides the stats_dump_resolvers() function that is involved when dumping with "resolvers" domain. As a consequence, resolvers dump was broken because stats_dump_one_line(), which is used in stats_dump_resolv_to_buffer(), implicitly uses trash_chunk from stats.c to prepare the dump, and stats_putchk() is then called with global trash (currently empty) as output data. Given that trash_dump variable is static and thus only available within stats.c we change stats_putchk() function prototype so that the function does not take the output buffer as an argument. Instead, stats_putchk() will implicitly use the local trash_dump variable declared in stats.c. It will also prevent further mixups between stats_dump_* functions and stats_putchk(). This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats")	2023-02-06 07:53:03 +01:00
Willy Tarreau	f2988e1447	CLEANUP: listener/thread: remove now unused bind_conf's bind_tgroup/bind_thread Not needed anymore since last commit, let's get rid of it.	2023-02-03 18:00:21 +01:00
Willy Tarreau	f0de8cacc4	MEDIUM: listener/config: make the "thread" parser rely on thread_sets Instead of reading and storing a single group and a single mask for a "thread" directive on a bind line, we now store the complete range in a thread set that's stored in the bind_conf. The bind_parse_thread() function now just calls parse_thread_set() to complete the current set, which starts empty, and thread_resolve_group_mask() was updated to support retrieving thread group numbers or absolute thread numbers directly from the pre-filled thread_set, and continue to feed bind_tgroup and bind_thread. The CLI parsers which were pre-initialized to set the bind_tgroup to 1 cannot do it anymore as it would prevent one from restricting the thread set. Instead check_config_validity() now detects the CLI frontend and passes the info down to thread_resolve_group_mask() that will automatically use only the group 1's threads for these listeners. The same is done for the peers listeners for now. At this step it's already possible to start with all previous valid configs as well as extended ones supporting comma-delimited thread sets. In addition the parser already accepts large ranges spanning multiple groups, but since the underlying listeners infrastructure is not ready, for now we're maintaining a specific check against this at the higher level of the config validity check. The patch is a bit large because thread resolution is performed in multiple steps, so we need to adjust all of them at once to preserve functional and technical consistency.	2023-02-03 18:00:21 +01:00
Willy Tarreau	bef43dfa60	MINOR: thread: add a simple thread_set API The purpose is to be able to store large thread sets, defined by ranges that may cross group boundaries, as well as define lists of groups and masks. The thread_set struct implements the storage, and the parser is in parse_thread_set(), with a focus on "bind" lines, but not only.	2023-02-03 18:00:21 +01:00
Willy Tarreau	9e2682afed	MINOR: listener: remove the now useless LI_F_QUIC_LISTENER flag This flag is only used to tag a QUIC listener, which we now know by its bind_conf's xprt as well. It's only used to decide whether or not to perform an extra initialization step on the listener. Let's drop it as well as the flags field. With the various fields and options moved, the listener struct reduced by 48 bytes total.	2023-02-03 18:00:20 +01:00
Willy Tarreau	b25634d23e	CLEANUP: listener: remove the now unused options field All options that made sense were moved to the bind_conf, and remaining ones were removed. This field isn't used at all anymore. The thr_idx field was moved there to plug the hole.	2023-02-03 18:00:20 +01:00
Willy Tarreau	4c1d3a953d	MINOR: listener: get rid of LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES are only set from the proxy based on the presence or absence of tcp_req l4/l5 rules. It's basically as cheap to check the list as it is to check the flag, except that there is no need to maintain a copy. Let's get rid of them, and this may ease addition of more dynamic stuff later.	2023-02-03 18:00:20 +01:00
Willy Tarreau	1714680cec	MINOR: listener: move LI_O_UNLIMITED and LI_O_NOSTOP to bind_conf These two flags are entirely for internal use and are even per proxy in practice since they're used for peers and CLI to indicate (for the first one) that the listener(s) are not subject to connection limits, and for the second that the listener(s) should not be stopped on soft-stop. No need to keep them in the listeners, let's move them to the bind_conf under names BC_O_UNLIMITED and BC_O_NOSTOP.	2023-02-03 18:00:20 +01:00
Willy Tarreau	f1b4730f7d	MINOR: listener: move the ACC_PROXY and ACC_CIP options to bind_conf These are only set per bind line and used when creating a session, so we can move them to the bind_conf under the names BC_O_ACC_PROXY and BC_O_ACC_CIP respectively.	2023-02-03 18:00:20 +01:00
Willy Tarreau	c492f1b17f	MINOR: listener: move TCP_FO to bind_conf It's set per bind line ("tfo") and only used in tcp_bind_listener() so there's no point keeping the address family tests, let's just store the flag in the bind_conf under the name BC_O_TCP_FO.	2023-02-03 18:00:20 +01:00
Willy Tarreau	d9b4d21248	MINOR: listener: move the DEF_ACCEPT option to the bind_conf This option is set per bind line, and was only stored when the address family is AF_INET4 or AF_INET6. That's pointless since it's used only in tcp_bind_listener() which is only used for such families as well, so it can now be moved to the bind_conf under the name BC_O_DEF_ACCEPT.	2023-02-03 18:00:20 +01:00
Willy Tarreau	9bdcf42922	MINOR: listener: move the NOQUICKACK option to the bind_conf It solely depends on the bind line so let's move it there under the name BC_O_NOQUICKACK.	2023-02-03 18:00:20 +01:00
Willy Tarreau	cfb7c2f515	MINOR: listener: move the NOLINGER option to the bind_conf It's currently declared per-frontend, though it would make sense to support it per-line but in no case per-listener. Let's move the option to a bind_conf option BC_O_NOLINGER.	2023-02-03 18:00:20 +01:00
Willy Tarreau	7dbd4187dc	MINOR: listener: move the nice field to the bind_conf This is another bind line setting which can move to the bind_conf. Note that it leaves a 2-byte hole in the listener struct.	2023-02-03 18:00:20 +01:00
Willy Tarreau	d5983cef80	MINOR: listener: remove the useless ->default_target field This field is used by stream_new() to optionally set the applet the stream will connect to for simple proxies like the CLI for example. But it has never been configurable to anything and is always strictly equal to the frontend's ->default_target. Let's just drop it and make stream_new() only use the frontend's. It makes more sense anyway as we don't want the proxy to work differently based on the "bind" line. This idea was brought in 1.6 hoping that the h2 implementation would use applets for decoding (which was dropped after the very first attempt in 1.8).	2023-02-03 18:00:20 +01:00
Willy Tarreau	3083615410	MINOR: listener: move the ->accept callback to the bind_conf The accept callback directly derives from the upper layer, generally it's session_accept_fd(). As such it's also defined per bind line so it makes sense to move it there.	2023-02-03 18:00:20 +01:00
Willy Tarreau	758c69d951	MINOR: listener: move the maxconn parameter to the bind_conf The maxconn is set per bind line so let's move it there. This might possibly even slightly reduce inter-thread contention since this one is read-mostly and it was stored next to nbconn which changes for each connection setup or teardown.	2023-02-03 18:00:20 +01:00
Willy Tarreau	1920f897d8	MINOR: listener: move the backlog setting from listener to bind_conf The backlog setting is also defined by the bind_conf, so let's move it there.	2023-02-03 18:00:20 +01:00
Willy Tarreau	882f2485a1	MINOR: listener: move maxaccept from listener to bind_conf Like for previous values, maxaccept is really per-bind_conf, so let's move it there. Some frontends (peers, log) set it to 1 so the assignment was slightly moved.	2023-02-03 18:00:20 +01:00
Willy Tarreau	ee378165fb	MINOR: listener: move maxseg and tcp_ut to bind_conf These two arguments were only set and only used with tcpv4/tcpv6. Let's just store them into the bind_conf instead of duplicating them for all listeners since they're fixed per "bind" line.	2023-02-03 18:00:20 +01:00
Willy Tarreau	7866e8e50d	MEDIUM: listener: move the analysers mask to the bind_conf When bind_conf were created, some elements such as the analysers mask ought to have moved there but that wasn't the case. Now that it's getting clearer that bind_conf provides all binding parameters and the listener is essentially a listener on an address, it's starting to get really confusing to keep such parameters in the listener, so let's move the mask to the bind_conf. We also take this opportunity for pre-setting the mask to the frontend's upon initialization. Now several loops have one less argument to take care of.	2023-02-03 18:00:20 +01:00
Frédéric Lécaille	0aa79953c9	BUG/MINOR: quic: Unchecked source connection ID The SCID (source connection ID) used by a peer (client or server) is sent into the long header of a QUIC packet in clear. But it is also sent into the transport parameters (initial_source_connection_id). As the latter are encrypted into the packet, one must check that these two pieces of information do not differ due to a packet header corruption. Furthermore, as such a connection is unusable, it must be killed and must stop processing RX/TX packets as soon as possible. Implement qc_kill_con() to flag a connection as unusable and to kill it asap by waking up the idle timer task to release the connection. Add a check to quic_transport_params_store() to detect that the SCIDs do not match and make it call qc_kill_con(). Add several tests about connections to be killed at several critical locations, especially in the TLS stack callback to receive CRYPTO data from or derive secrets, and before preparing packets after having received others. Must be backported to 2.6 and 2.7.	2023-02-03 17:55:55 +01:00
Frédéric Lécaille	af25a69c8b	MEDIUM: quic: Remove qc_conn_finalize() from the ClientHello TLS callbacks It is a bad idea to make the TLS ClientHello callback call qc_conn_finalize(). If the latter fails, this would generate a TLS alert and make the connection send packets whereas it is not functional. But the job of qc_conn_finalize() was to install the transport parameters sent by the QUIC listener. This installation cannot be done at any time. It must be done after having possibly negotiated the QUIC version and before sending the first Handshake packets. It seems the best moment to do that is when the Handshake TX secrets are derived. This has been found inspecting the ngtcp2 code. Calling SSL_set_quic_transport_params() too late would make the ServerHello be sent without the transport parameters. The code for the connection update which was done from qc_conn_finalize() has been moved to quic_transport_params_store(). So, this update is done as soon as possible. Add QUIC_FL_CONN_TX_TP_RECEIVED to flag the connection as having received the peer transport parameters. Indeed this is required when the ClientHello message is split between packets. Add QUIC_FL_CONN_FINALIZED to protect the connection from calling qc_conn_finalize() more than one time. The latter is called only when the connection has received the transport parameters and after returning from SSL_do_handshake(), which is the function which triggers the TLS ClientHello callback call. Remove the calls to qc_conn_finalize() from the TLS ClientHello callbacks. Must be backported to 2.6 and 2.7.	2023-02-03 17:55:55 +01:00
Frédéric Lécaille	9969adbcdc	MINOR: stats: add by HTTP version cumulated number of sessions and requests Add cum_sess_ver[] new array of counters to count the number of cumulated HTTP sessions by version (h1, h2 or h3). Implement proxy_inc_fe_cum_sess_ver_ctr() to increment these counters. This function is called each time an HTTP mux is correctly initialized. QUIC must first verify that the application operations for the mux are for h3 before calling proxy_inc_fe_cum_sess_ver_ctr(). ST_F_SESS_OTHER stat field for the cumulated number of sessions other than HTTP sessions is deduced from the ->cum_sess_ver counter (for all the sessions, not only HTTP sessions) from which the HTTP sessions counters are subtracted. Add cum_req[] new array of counters to count the number of cumulated HTTP requests by version and of requests other than HTTP requests. This new member replaces ->cum_req. Modify proxy_inc_fe_req_ctr() which increments these counters to pass an HTTP version, the special value 0 meaning "other than an HTTP request". This is the case for instance for syslog.c from which proxy_inc_fe_req_ctr() is called with 0 as version parameter. The computation of the ST_F_REQ_TOT stat field for the cumulated number of requests is modified to count the sum of all the cum_req[] counters. As this patch is useful for QUIC, it must be backported to 2.7.	2023-02-03 17:55:49 +01:00
Willy Tarreau	1ea5f410ff	CLEANUP: quic: no need for atomics on packet refcnt This is a leftover from the implementation's history, but the quic_rx_packet and quic_tx_packet ref counts were still atomically updated. It was found in perf top that the cost of the atomic inc in quic_tx_packet_refinc() alone was responsible for 1% of the CPU usage at 135 Gbps. Given that packets are only processed on their assigned thread we don't need that anymore and this can be replaced with regular non-atomic operations. Doing this alone has reduced the CPU usage of qc_do_build_pkt() from 3.6 to 2.5% and increased the overall bit rate by about 1%.	2023-02-03 13:39:20 +01:00
Amaury Denoyelle	24d5b72ca9	MINOR: quic: add config for retransmit limit Define a new configuration option "tune.quic.max-frame-loss". This is used to specify the limit for which a single frame instance can be detected as lost. If exceeded, the connection is closed. This should be backported up to 2.7.	2023-02-03 11:56:46 +01:00
Amaury Denoyelle	e4abb1f2da	MEDIUM: quic: implement a retransmit limit per frame Add a <loss_count> new field in quic_frame structure. This field is set to 0 and incremented each time a sent packet is declared lost. If <loss_count> reaches a hard-coded limit, the connection is deemed as failing and is closed immediately with a CONNECTION_CLOSE using INTERNAL_ERROR. By default, the limit is set to 10. This should ensure that overall memory usage is limited if a peer behaves incorrectly. This should be backported up to 2.7.	2023-02-03 11:56:42 +01:00
Amaury Denoyelle	57b3eaa793	MINOR: quic: refactor frame deallocation Define a new function qc_frm_free() to handle frame deallocation. New BUG_ON() statements ensure that the deallocated frame is not referenced by another frame. To support this, all LIST_DELETE() have been replaced by LIST_DEL_INIT(). This should enforce that frame deallocation is robust. As a complement, qc_frm_unref() has been moved into the quic_frame module. This is justified as it is a utility function related to frame deallocation. This allows it to be used in quic_pktns_tx_pkts_release() before calling qc_frm_free(). This should be backported up to 2.7.	2023-02-03 11:55:41 +01:00
Amaury Denoyelle	40c24f1a10	MINOR: quic: define new functions for frame alloc Define two utility functions for quic_frame allocation : * qc_frm_alloc() is used to allocate a new frame * qc_frm_dup() is used to allocate a new frame by duplicating an existing one These functions are useful to centralize quic_frame initialization. Note that pool_zalloc() is replaced by a proper pool_alloc() + explicit initialization code. This commit will simplify implementation of the per frame retransmission limitation. Indeed, a new counter will be added in quic_frame structure which must be initialized to 0. This should be backported up to 2.7.	2023-02-03 10:44:26 +01:00
Amaury Denoyelle	2216b0866e	MINOR: quic: remove fin from quic_stream frame type A dedicated <fin> field was used in quic_stream structure. However, this info is already encoded in the frame type field as specified by QUIC protocol. In fact, only code for packet reception used the <fin> field. On the sending side, we only checked for the FIN bit. To align both sides, remove the <fin> field and only use the FIN bit. This should be backported up to 2.7.	2023-02-03 09:46:55 +01:00
Amaury Denoyelle	1e340ba6bc	MINOR: mux-quic/h3: define stream close callback Define a new qcc_app_ops callback named close(). This will be used to notify app-layer about the closure of a stream by the remote peer. Its main usage is to ensure that the closure is allowed by the application protocol specification. For the moment, close is not implemented by H3 layer. However, this function will be mandatory to properly reject a STOP_SENDING on the control stream and preventing a later crash. As such, this commit must be backported with the next one on 2.6. This is related to github issue #2006.	2023-01-30 15:56:25 +01:00
Aurelien DARRAGON	b2e2ec51b3	MEDIUM: proxy/http_ext: implement dynamic http_ext proxy http-only options implemented in http_ext were statically stored within proxy struct. We're making some changes so that http_ext are now stored in dynamically allocated structs. http_ext related structs are only allocated when needed to save some space whenever possible, and they are automatically freed upon proxy deletion. Related PX_O_HTTP{7239,XFF,XOT} option flags were removed because we're now considering an http_ext option as 'active' if it is allocated (ptr is not NULL). A few checks (and BUG_ON) were added to make these changes safe because it adds some (acceptable) complexity to the previous design. Also, proxy.http was renamed to proxy.http_ext to make things more explicit.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	9ded834adc	OPTIM: http_ext/7239: introduce c_mode to save some space forwarded header option (rfc7239) deals with sample expressions in two steps: first a sample expression string is extracted from the config file and later in startup sequence this string is converted into the resulting sample_expr. We need to perform these two steps because we cannot compile the expr too early in the parsing sequence. (or we would miss some context) Because of this, we have two distinct structure members (expr and expr_s) for each 7239 field supporting sample expressions. This is not cool, because we're bloating the http forwarded config structure, and thus, bloating proxy config structure. To address this, we now merge both expr and expr_s members inside a single union to regain some space. This forces us to perform some additional logic to make sure to use the proper structure member at different parsing steps. Thanks to this, we're also able to free/release related config hints and sample expression strings as soon as the sample expression compilation is finished.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	f958341610	MINOR: proxy: move 'originalto' option to http_ext Just like forwarded (7239) header and forwardfor header, move parsing, logic and management of 'originalto' option into http_ext dedicated class. We're only doing this to standardize proxy http options management. Existing behavior remains untouched.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	730b9836a6	MINOR: proxy: move 'forwardfor' option to http_ext Just like forwarded (7239) header, move parsing, logic and management of 'forwardfor' option into http_ext dedicated class. We're only doing this to standardize proxy http options management. Existing behavior remains untouched.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	b2bb9257d2	MINOR: proxy/http_ext: introduce proxy forwarded option Introducing http_ext class for http extension related work that doesn't fit into existing http classes. HTTP extension "forwarded", introduced with 7239 RFC is now supported by haproxy. The option supports various modes from simple to complex usages involving custom sample expressions. Examples : # Those servers want the ip address and protocol of the client request # Resulting header would look like this: # forwarded: proto=http;for=127.0.0.1 backend www_default mode http option forwarded #equivalent to: option forwarded proto for # Those servers want the requested host and hashed client ip address # as well as client source port (you should use seed for xxh32 if ensuring # ip privacy is a concern) # Resulting header would look like this: # forwarded: host="haproxy.org";for="_000000007F2F367E:60138" backend www_host mode http option forwarded host for-expr src,xxh32,hex for_port # Those servers want custom data in host, for and by parameters # Resulting header would look like this: # forwarded: host="host.com";by=_haproxy;for="[::1]:10" backend www_custom mode http option forwarded host-expr str(host.com) by-expr str(_haproxy) for for_port-expr int(10) # Those servers want random 'for' obfuscated identifiers for request # tracing purposes while protecting sensitive IP information # Resulting header would look like this: # forwarded: for=_000000002B1F4D63 backend www_for_hide mode http option forwarded for-expr rand,hex By default (no argument provided), forwarded option will try to mimic x-forward-for common setups (source client ip address + source protocol) The option is not available for frontends. no option forwarded is supported. 
More info about 7239 RFC here: https://www.rfc-editor.org/rfc/rfc7239.html More info about the feature in doc/configuration.txt This should address feature request GH #575 Depends on: - "MINOR: http_htx: add http_append_header() to append value to header" - "MINOR: sample: add ARGC_OPT" - "MINOR: proxy: introduce http only options"	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	832e9f4119	MINOR: proxy: introduce http only options This commit is inoffensive but will allow some code refactoring in existing proxy http options. Newly created http related proxy options will also benefit from this.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	5f7f5fe76a	MINOR: sample: add ARGC_OPT Add ARGC_OPT enum to provide more context for upcoming sample parse errors involving proxy "option" config directives.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	38ebffaf10	MINOR: http_htx: add http_prepend_header() to prepend value to header Just like http_append_header(), but this time to insert new value before an existing one. If the header already contains one or multiple values, ',' is automatically inserted after the new value.	2023-01-27 15:18:59 +01:00
Aurelien DARRAGON	a5a8552cab	MINOR: http_htx: add http_append_header() to append value to header Calling this function as an alternative to http_replace_header_value() to append a new value to existing header instead of replacing the whole header content. If the header already contains one or multiple values: a ',' is automatically appended before the new value. This function is not meant for prepending (providing empty ctx value), in which case we should consider implementing dedicated prepend alternative function.	2023-01-27 15:18:59 +01:00
Willy Tarreau	271c440392	MINOR: h2: add h2_phdr_to_ist() to make ISTs from pseudo headers Till now pseudo headers were passed as const strings, but having them as ISTs will be more convenient for traces. This doesn't change anything for strings which are derived from them (and being constants they're still zero-terminated).	2023-01-26 15:49:43 +01:00
Willy Tarreau	b8b243ac6a	MINOR: trace: add the long awaited TRACE_PRINTF() TRACE_PRINTF() can be used to produce arbitrary trace contents at any trace level. It uses the exact same arguments as other TRACE_* macros, but here they are mandatory since they are followed by the format-string, though they may be filled with zeroes. The reason for the arguments is to match tracking or filtering and not pollute other non-inspected objects. It will probably be used inside loops, in which case there are three points to be careful about: - output atomicity is only per-message, so competing threads may see their messages interleaved. As such, it is recommended that the caller places a recognizable unique context at the beginning of the message such as a connection pointer. - iterating over arrays or lists for all requests could be very expensive. In order to avoid this it is best to condition the call via TRACE_ENABLED() with the same arguments, which will return the same decision. - messages longer than TRACE_MAX_MSG-1 (1023 by default) will be truncated. For example, in order to dump the list of HTTP headers between hpack and h2: if (outlen > 0 && TRACE_ENABLED(TRACE_LEVEL_DEVELOPER, H2_EV_RX_FRAME|H2_EV_RX_HDR, h2c->conn, 0, 0, 0)) { int i; for (i = 0; list[i].n.len; i++) TRACE_PRINTF(TRACE_LEVEL_DEVELOPER, H2_EV_RX_FRAME|H2_EV_RX_HDR, h2c->conn, 0, 0, 0, "h2c=%p hdr[%d]=%s:%s", h2c, i, list[i].n.ptr, list[i].v.ptr); } In addition, a lower-level TRACE_PRINTF_LOC() macro is provided, that takes two extra arguments, the caller's location and the caller's function name. This will allow emitting composite traces from central functions on behalf of another one.	2023-01-26 15:49:43 +01:00
Willy Tarreau	4b36d5e8de	MINOR: trace: add a trace_no_cb() dummy callback for when to use no callback By default, passing a NULL cb to the trace functions will result in the source's default one being used. For some cases we won't want to use any callback at all, not even the default one. Let's define a trace_no_cb() function for this, that does absolutely nothing.	2023-01-26 15:49:43 +01:00
Willy Tarreau	8f9a9704bb	MINOR: trace: add a TRACE_ENABLED() macro to determine if a trace is active Sometimes it would be necessary to prepare some messages, pre-process some blocks or maybe duplicate some contents before they vanish for the purpose of tracing them. However we don't want to do that for everything that is submitted to the traces, it's important to do it only for what will really be traced. The __trace() function has all the knowledge for this, to the point of even checking the lockon pointers. This commit splits the function in two, one with the trace decision logic, and the other one for the trace production. The first one is now usable through wrappers such as _trace_enabled() and TRACE_ENABLED() which will indicate whether traces are going to be produced for the current source, level, event mask, parameters and tracking.	2023-01-26 15:49:43 +01:00
Willy Tarreau	80f36b2ac2	CLEANUP: trace: remove the QUIC-specific ifdefs There are ifdefs at several places to only define TRC_ARGS_QCON when QUIC is defined, but nothing prevents this code from building without. Let's just remove those ifdefs, the single "if" they avoid is not worth the extra maintenance burden.	2023-01-26 15:49:43 +01:00
Amaury Denoyelle	71fd03632f	MINOR: mux-quic/h3: send SETTINGS as soon as transport is ready As specified by HTTP3 RFC, SETTINGS frame should be sent as soon as possible. Before this patch, this was only done on the first qc_send() invocation. This significantly delays SETTINGS emission until the first H3 response is ready to be transferred. This patch fixes this by ensuring SETTINGS is emitted when MUX-QUIC is being set up. As a side point, the return value of the finalize operation is checked. This means that an error during SETTINGS emission will cause the connection init to fail. This should be backported up to 2.7.	2023-01-25 16:01:55 +01:00
Willy Tarreau	7e70bfc8cb	MINOR: threads: add a thread_harmless_end() version that doesn't wait thread_harmless_end() needs to wait for rdv_requests to disappear so that we're certain to respect a harmless promise that possibly allowed another thread to proceed under isolation. But this doesn't work in a signal handler because a thread could be interrupted by the debug handler while already waiting for isolation and with rdv_request>0. As such this function could cause a deadlock in such a signal handler. Let's implement a specific variant for this, thread_harmless_end_sig(), that just resets the thread's bit and doesn't wait. It must of course not be used past a check point that would allow the isolation requester to return and see the thread as temporarily harmless then turning back on its promise. This will be needed to fix a race in the debug handler.	2023-01-19 19:22:17 +01:00
Willy Tarreau	b2f38c13d1	BUG/MINOR: thread: always reload threads_enabled in loops A few loops waiting for threads to synchronize such as thread_isolate() rightfully filter the thread masks via the threads_enabled field that contains the list of enabled threads. However, they don't use an atomic load on it. Before 2.7, the equivalent variables were marked as volatile and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and the risk that the compiler keeps them in a register inside a loop is not null at all. In practice when ha_thread_relax() calls sched_yield() or an x86 PAUSE instruction, it could be verified that the variable is always reloaded. If these are avoided (e.g. architecture providing neither solution), it's visible in asm code that the variables are not reloaded. In this case, if a thread exits just between the moment the two values are read, the loop could spin forever. This patch adds the required _HA_ATOMIC_LOAD() on the relevant threads_enabled fields. It must be backported to 2.7.	2023-01-19 19:22:17 +01:00
Amaury Denoyelle	7d78eff889	MINOR: h3: extend function for QUIC varint encoding Slightly adjust b_quic_enc_int(). This function is used to encode an integer as a QUIC varint in a struct buffer. A new parameter is added to the function API to specify the width of the encoded integer. By default, 0 should be used to ensure that the minimum space is used. Other valid values are 1, 2, 4 or 8. An error is reported if the width is not large enough. This new parameter will be useful when buffer space is reserved prior to encoding an unknown integer value. The maximum size of 8 bytes will be reserved and some data can be put after. When finally encoding the integer, the width can be requested to be 8 bytes. With this new parameter, a small refactoring of the function has been conducted to remove some useless internal variables. This should be backported up to 2.7. It will be mostly useful to implement H3 trailers encoding.	2023-01-19 15:09:01 +01:00
Remi Tricot-Le Breton	bb35e1f5aa	BUG/MINOR: ssl: Fix compilation with OpenSSL 1.0.2 (missing ECDSA_SIG_set0) This function was introduced in OpenSSL 1.1.0. Prior to that, the ECDSA_SIG structure was public. This function was used in commit `5a8f02ae` "BUG/MEDIUM: jwt: Properly process ecdsa signatures (concatenated R and S params)". This patch needs to be backported up to branch 2.5 alongside commit `5a8f02ae`.	2023-01-19 11:13:51 +01:00
William Lallemand	2edc6d0301	Revert "BUILD: ssl: add ECDSA_SIG_set0() for openssl < 1.1 or libressl < 2.7" This reverts commit `d65791e26c`. Conflict with the patch which was originally written and lacks the BN_clear_free() and the NULL check.	2023-01-19 11:13:24 +01:00
Willy Tarreau	d65791e26c	BUILD: ssl: add ECDSA_SIG_set0() for openssl < 1.1 or libressl < 2.7 Commit `5a8f02ae6` ("BUG/MEDIUM: jwt: Properly process ecdsa signatures (concatenated R and S params)") makes use of ECDSA_SIG_set0() which only appeared in openssl-1.1.0 and libressl 2.7, and breaks the build before. Let's just do what it minimally does (only assigns the two fields to the destination). This will need to be backported where the commit above is, likely 2.5.	2023-01-19 10:57:00 +01:00
Frédéric Lécaille	21c4c9b854	MINOR: quic: Replace v2 draft definitions by those of the final 2 version This should finalize the support for the QUIC version 2. Must be backported to 2.7.	2023-01-17 16:35:20 +01:00
Frédéric Lécaille	12a0317fed	MINOR: quic: Add "no-quic" global option Add "no-quic" to "global" section to disable the use of QUIC transport protocol by all configured QUIC listeners. These are listeners with QUIC addresses on their "bind" lines. Internally, the socket addresses binding is skipped by protocol_bind_all() for receivers with <proto_quic4> or <proto_quic6> as protocol (see protocol struct). Add information about "no-quic" global option to the documentation. Must be backported to 2.7.	2023-01-17 16:35:20 +01:00
Willy Tarreau	e77f4306ba	BUG/MEDIUM: stconn: also consider SE_FL_EOI to switch to SE_FL_ERROR In se_fl_set_error() we used to switch to SE_FL_ERROR only when there is already SE_FL_EOS indicating that the read side is closed. But that is not sufficient, we need to consider all cases where no more reads will be performed on the connection, and as such also include SE_FL_EOI. Without this, some aborted connections during a transfer sometimes only stop after the timeout, because the ERR_PENDING is never promoted to ERROR. This must be backported to 2.7 and requires previous patch "CLEANUP: stconn: always use se_fl_set_error() to set the pending error".	2023-01-17 16:27:35 +01:00
Christopher Faulet	2e47e3a1cf	MINOR: htx: Add an HTX value for the extra field if payload length is unknown When the payload length cannot be determined, the htx extra field is set to the magical value ULLONG_MAX. It is not obvious. Thus a dedicated HTX value is now used. Now, HTX_UNKOWN_PAYLOAD_LENGTH must be used in this case, instead of ULLONG_MAX.	2023-01-13 11:51:11 +01:00
Christopher Faulet	4da82395d8	CLEANUP: http-ana: Remove HTTP_MSG_ERROR state This state is now unused. Thus it can be removed.	2023-01-13 11:22:13 +01:00
Christopher Faulet	71236dedb9	MINOR: http-ana: Add a function to set HTTP termination flags There is already a function to set termination flags but it is not well suited for HTTP streams. So a function, dedicated to the HTTP analysis, was added. This way, this new function will be called for HTTP analysers on error. And if the error is not caught at this stage, the generic function will still be called from process_stream(). Here, by default a PRXCOND error is reported and depending on the stream state, the reason will be set accordingly: * If the backend SC is in INI state, SF_FINST_T is reported on tarpit and SF_FINST_R otherwise. * SF_FINST_Q if the server connection is queued * SF_FINST_C in any connection attempt state (REQ/TAR/ASS/CONN/CER/RDY). Except for applets, a SF_FINST_R is reported. * Once the server connection is established, SF_FINST_H is reported while HTTP_MSG_DATA state on the response side. * SF_FINST_L is reported if the response is in HTTP_MSG_DONE state or higher and a client error/timeout was reported. * Otherwise SF_FINST_D is reported.	2023-01-13 09:45:23 +01:00
Willy Tarreau	6be8d09a61	OPTIM: global: move byte counts out of global and per-thread During multiple tests we've already noticed that shared stats counters have become a real bottleneck under large thread counts. With QUIC it's pretty visible, with qc_snd_buf() taking 2.5% of the CPU on a 48-thread machine at only 25 Gbps, and this CPU is entirely spent in the atomic increment of the byte count and byte rate. It's also visible in H1/H2 but slightly less since we're working with larger buffers, hence less frequent updates. These counters are exclusively used to report the byte count in "show info" and the byte rate in the stats. Let's move them to the thread_ctx struct and make the stats reader just collect each thread's stats when requested. That's way more efficient than competing on a single cache line. After this, qc_snd_buf has totally disappeared from the perf profile and tests made in h1 show roughly 1% performance increase on small objects.	2023-01-12 16:37:45 +01:00
Amaury Denoyelle	0a1154afb5	MINOR: mux-quic: use send-list for STOP_SENDING/RESET_STREAM emission When a STOP_SENDING or RESET_STREAM must be sent, its corresponding qcs is inserted into <qcc.send_list> via qcc_reset_stream() or qcc_abort_stream_read(). This allows removing the iteration on the full qcs tree in qc_send(). Instead, STOP_SENDING and RESET_STREAM emission is done in the loop over <qcc.send_list> as with STREAM frames. This should slightly improve performance, most notably when a large number of streams are opened. This must be backported up to 2.7.	2023-01-10 17:49:50 +01:00
Amaury Denoyelle	f9b03265f0	MEDIUM: h3: send SETTINGS before STREAM frames Complete qcc_send_stream() function to allow to specify if the stream should be handled in priority. Internally this will insert the qcs instance in front of <qcc.send_list> to be able to treat it before other streams. This functionality is useful when some QUIC streams should be sent before others. Most notably, this is used to guarantee that H3 SETTINGS is done first via the control stream. This must be backported up to 2.7.	2023-01-10 17:49:50 +01:00
Amaury Denoyelle	20f2a425ff	MAJOR: mux-quic: rework stream sending prioritization Implement a mechanism to register streams ready to send data in new STREAM frames. Internally, this is implemented with a new list <qcc.send_list> which contains qcs instances. A qcs can be registered safely using the new function qcc_send_stream(). This is done automatically in qc_send_buf() which covers most cases. Also, application layer is free to use it for internal usage streams. This is currently the case for H3 control stream with SETTINGS sending. The main point of this patch is to handle stream sending fairly. This is in stark contrast with previous code where streams with lower ID were always prioritized. This could cause other streams to be indefinitely blocked behind a stream which has a lot of data to transfer. Now, streams are handled in an order scheduled by se_desc layer. This commit is the first one of a series which will bring other improvements which also rely on the send_list implementation. This must be backported up to 2.7 when deemed sufficiently stable.	2023-01-10 17:49:50 +01:00
Christopher Faulet	da89e9b95b	MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown for writes (CF_SHUTW) is detected. However, any write error leads to an immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is set.	2023-01-09 18:41:08 +01:00
Christopher Faulet	4b490b7517	MINOR: channel: Stop to test CF_READ_ERROR flag if CF_SHUTR is enough When a read error (CF_READ_ERROR) is reported, a shutdown for reads is always performed (CF_SHUTR). Thus, there is no reason to check if CF_READ_ERROR is set if CF_SHUTR is also checked.	2023-01-09 18:41:08 +01:00
Christopher Faulet	2357718217	MEDIUM: channel: Remove CF_READ_ATTACHED and report CF_READ_EVENT instead CF_READ_ATTACHED flag is only used in input events for stream analyzers, CF_MASK_ANALYSER. A read event can be reported instead and this flag can be removed. We must only take care to report a read event when the client connection is upgraded from TCP to HTTP.	2023-01-09 18:41:08 +01:00
Christopher Faulet	049fbcd36a	MINOR: channel: Remove CF_ANA_TIMEOUT and report CF_READ_EVENT instead It appears CF_ANA_TIMEOUT is a flag only used in CF_MASK_ANALYSER. All analyzer timeouts rely on the analysis expiration date (chn->analyse_exp). Worse, once set, this flag is never removed. Thus this flag can be removed and replaced by a read event (CF_READ_EVENT).	2023-01-09 18:41:08 +01:00
Christopher Faulet	a63f8f379f	MINOR: channel: Remove CF_WRITE_ACTIVITY Thanks to previous changes, CF_WRITE_ACTIVITY flags can be removed. Everywhere it was used, its value is now directly used (CF_WRITE_EVENT|CF_WRITE_ERROR).	2023-01-09 18:41:08 +01:00
Christopher Faulet	33e03cec5f	MINOR: channel: Remove CF_READ_ACTIVITY Thanks to previous changes, CF_READ_ACTIVITY flags can be removed. Everywhere it was used, its value is now directly used (CF_READ_EVENT|CF_READ_ERROR).	2023-01-09 18:41:08 +01:00
Christopher Faulet	d898841530	MEDIUM: channel: Use CF_WRITE_EVENT instead of CF_WRITE_PARTIAL Just like CF_READ_PARTIAL, CF_WRITE_PARTIAL is now merged with CF_WRITE_EVENT. There is a subtlety in sc_notify(). The "connect" event (formerly CF_WRITE_NULL) is now detected with (CF_WRITE_EVENT + sc->state < SC_ST_EST).	2023-01-09 18:41:08 +01:00
Christopher Faulet	285f7616ee	MEDIUM: channel: Use CF_READ_EVENT instead of CF_READ_PARTIAL CF_READ_PARTIAL flag is now merged with CF_READ_EVENT. It means CF_READ_EVENT is set when a read0 is received (formerly CF_READ_NULL) or when data are received (formerly CF_READ_ACTIVITY). There is nothing special here, except conditions to wake the stream up in sc_notify(). Indeed, the test was slightly changed to reflect the recent change. The read0 event is now formalized by (CF_READ_EVENT + CF_SHUTR).	2023-01-09 18:41:08 +01:00
Christopher Faulet	b96f2aa380	REORG: channel: Rename CF_WRITE_NULL to CF_WRITE_EVENT As for CF_READ_NULL, it appears CF_WRITE_NULL and other write events on a channel are mainly used to wake up the stream and may be replaced by one write event. In this patch, we introduce CF_WRITE_EVENT flag as a replacement for CF_WRITE_NULL. There is no breaking change for now, it is just a rename. Gradually, other write events will be merged with this one.	2023-01-09 18:41:08 +01:00
Christopher Faulet	6e1bbc446b	REORG: channel: Rename CF_READ_NULL to CF_READ_EVENT CF_READ_NULL flag is not really useful and used. It is a transient event used to wake up the stream. As we will see, all read events on a channel may be reduced to a single one and are all used to wake up the stream. In this patch, we introduce CF_READ_EVENT flag as a replacement to CF_READ_NULL. There is no breaking change for now, it is just a rename. Gradually, other read events will be merged with this one.	2023-01-09 18:41:08 +01:00
Willy Tarreau	5a72d03a58	MINOR: stick-table: implement the sc-add-gpc() action This action increments the General Purpose Counter at the index <idx> of the array associated to the sticky counter designated by <sc-id> by the value of either integer <int> or the integer evaluation of expression <expr>. Integers and expressions are limited to unsigned 32-bit values. If an error occurs, this action silently fails and the actions evaluation continues. <idx> is an integer between 0 and 99 and <sc-id> is an integer between 0 and 2. It also silently fails if there is no GPC stored at this index. The entry in the table is refreshed even if the value is zero. The 'gpc_rate' is automatically adjusted to reflect the average growth rate of the gpc value. The main use of this action is to count scores or total volumes (e.g. estimated danger per source IP reported by the server or a WAF, total uploaded bytes, etc).	2023-01-07 09:11:22 +01:00
Willy Tarreau	6c0117168e	MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters The number of stick-counter entries usable by track-sc rules is currently set at build time. There is no good value for this since the vast majority of users don't need any, most need only a few and rare users need more. Adding more counters for everyone increases memory and CPU usages for no reason. This patch moves the per-session and per-stream arrays to a pool of a size defined at boot time. This way it becomes possible to set the number of entries at boot time via a new global setting "tune.stick-counters" that sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR value still applies, or 3 if not set, as before. It is also possible to lower the value to 0 to save a bit of memory if not used at all. Note that a few low-level sample-fetch functions had to be protected due to the ability to use sample-fetches in the global section to set some variables.	2023-01-06 18:08:49 +01:00
Christopher Faulet	61aded057d	BUG/MAJOR: buf: Fix copy of wrapping output data when a buffer is realigned There is a bug in b_slow_realign() function when wrapping output data are copied in the swap buffer. block1 and block2 sizes are inverted. Thus blocks with a wrong size are copied. It leads to data mixing if the first block is in reality larger than the second one or to a copy of data outside the buffer if the first block is smaller than the second one. The bug was introduced when the buffer API was refactored in 1.9. It was found by a code review and seems never to have been triggered in almost 5 years. However, we cannot exclude it is responsible for some unresolved bugs. This patch should fix issue #1978. It must be backported as far as 2.0.	2023-01-05 09:34:49 +01:00
Willy Tarreau	6e70a3986c	BUILD: makefile: only consider settings from enabled options Due to the previous SSL exception we couldn't restrict the collected CFLAGS/LDFLAGS to those of enabled options, so all of them were considered if set. The problem is that it would prevent simply disabling a build option without unsetting its xxx_CFLAGS or _LDFLAGS values if those had incompatible values (e.g. -lfoo). Now that only existing options are listed in collect_opts_flags, we can safely check that the option is set and only consider its settings in this case. Thus OT_LDFLAGS will not be used if USE_OT is not set for example.	2022-12-23 17:01:55 +01:00
Willy Tarreau	6a2cd33509	BUILD: makefile: remove the special case of the SSL option By creating USE_SSL and enabling it when USE_OPENSSL is set, we can get rid of the special case that was made with it regarding cflags collect and when resetting options. The option doesn't need to be manually set, though in the future it might prove useful if other non-openssl API are supported.	2022-12-23 16:53:35 +01:00
Willy Tarreau	2b8d0978f3	BUILD: makefile: make all OpenSSL variants use the same settings It's getting complicated to configure includes and lib dirs for OpenSSL API variants such as WolfSSL, because some settings are common and others are specific but carry a prefix that doesn't match the USE_* rule scheme. This patch simplifies everything by considering that all SSL libs will use SSL_INC, SSL_LIB, SSL_CFLAGS and SSL_LDFLAGS. That's much more convenient. This works thanks to the settings collector which explicitly checks the SSL_* settings. When USE_OPENSSL_WOLFSSL is set, then USE_OPENSSL is implied, so that there's no need to duplicate maintenance effort.	2022-12-23 16:53:35 +01:00
Willy Tarreau	8fa2f49f24	BUILD: makefile: add a function to collect all options' CFLAGS/LDFLAGS The new function collect_opts_flags now scans all USE_* options defined in use_opts and appends the corresponding _CFLAGS and _LDFLAGS to OPTIONS_{C,LD}FLAGS respectively. This will be useful to get rid of all the individual concatenations to these variables.	2022-12-23 16:53:35 +01:00
Willy Tarreau	b14e89e322	BUILD: makefile: initialize all build options' variables at once A lot of _SRC, _INC, _LIB etc variables are set and expected to be initialized to an empty string by default. However, an in-depth review of all of them showed that WOLFSSL_{INC,LIB}, SSL_{INC,LIB}, LUA_{INC,LIB}, and maybe others were not always initialized and could sometimes leak from the environment and as such cause strange build issues when running from cascaded scripts that had exported them. The approach taken here consists in iterating over all USE_* options and unsetting any _SRC, _INC, _LIB, _CFLAGS and _LDFLAGS that follows the same name. For the few options whose variable names don't exactly match the build option (SSL & WOLFSSL), these ones are specifically added to the list. The few that were explicitly cleared in their own sections were just removed since not needed anymore. Note that an "undefine" command appeared in GNU make 3.82 but since we support older ones we can only initialize the variables to an empty string here. It's not a problem in practice. We're now certain that these variables are empty wherever they are used, and that it is possible to just append to them, or use them as-is.	2022-12-23 16:53:35 +01:00
Willy Tarreau	848362f2d2	BUILD: makefile: sort the features list The features list that appears in -vv appears in a random order, which always makes it a pain to look for certain features. Let's sort it.	2022-12-23 16:53:35 +01:00
Willy Tarreau	69e7b7f677	BUILD: makefile: move common options-oriented macros to include/make/options.mk Some macros and functions are barely understandable and are only used to iterate over known options from the use_opts list. Better assign them a name and move them into a dedicated file to clean the makefile a little bit. Now at least "use_opts" only appears once, where it is defined. This also allowed to completely remove the BUILD_FEATURES macro that caused some confusion until previous commit.	2022-12-23 16:53:35 +01:00
Amaury Denoyelle	663e872e3a	MEDIUM: mux-quic: implement STOP_SENDING emission Implement STOP_SENDING. This is divided in two main functions : * qcc_abort_stream_read() which can be used by application protocol to request for a STOP_SENDING. This sets the flag QC_SF_READ_ABORTED. * qcs_send_reset() is a static function called after the preceding one. It will send a STOP_SENDING via qcc_send(). QC_SF_READ_ABORTED flag is now properly used : if activated on a stream during qcc_recv(), <qcc.app_ops.decode_qcs> callback is skipped. Also, abort reading on unknown unidirectional remote stream is now fully supported with the emission of a STOP_SENDING as specified by RFC 9000. This commit is part of implementing H3 errors at the stream level. This will allow the H3 layer to request the peer to close its endpoint for an error on a stream. This should be backported up to 2.7.	2022-12-22 16:38:16 +01:00
Amaury Denoyelle	5854fc08cc	MINOR: mux-quic: handle RESET_STREAM reception Implement RESET_STREAM reception by mux-quic. On reception, qcs instance will be marked as remotely closed and its Rx buffer released. The stream layer will be flagged on error if still attached. This commit is part of implementing H3 errors at the stream level. Indeed, on H3 stream errors, STOP_SENDING + RESET_STREAM should be emitted. The STOP_SENDING will in turn generate a RESET_STREAM by the remote peer which will be handled thanks to this patch. This should be backported up to 2.7.	2022-12-22 16:38:04 +01:00
Amaury Denoyelle	a473f196f1	MEDIUM: mux-quic: implement shutw Implement mux_ops shutw operation for QUIC mux. A RESET_STREAM is emitted unless the stream is already closed due to all data or RESET_STREAM already transmitted. This operation is notably useful when upper stream layer wants to close the connection early due to an error. This was tested by using an HTTP server which listens with PROXY protocol support. The corresponding server line on haproxy configuration deliberately does not specify send-proxy. This causes the server to close abruptly the connection. Without this patch, nothing was done on the QUIC stream which was kept open until the whole connection is closed. Now, a proper RESET_STREAM is emitted to report the error. This should be backported up to 2.7.	2022-12-22 16:22:39 +01:00
William Lallemand	be6a873096	BUG/MINOR: httpclient/log: free of invalid ptr with httpclient_log_format free_proxy() must check if the ptr is not httpclient_log_format before trying to free p->conf.logformat_string. No backport needed.	2022-12-22 15:39:31 +01:00
Christopher Faulet	c960a3b60f	BUG/MINOR: pool/stats: Use ullong to report total pool usage in bytes in stats The same change was already performed for the cli. The stats applet and the prometheus exporter are also concerned. Both use the stats API and rely on pool functions to get total pool usage in bytes. pool_total_allocated() and pool_total_used() must return 64-bit unsigned integers to avoid any wrapping around 4G. This may be backported to all versions.	2022-12-22 13:46:21 +01:00
Remi Tricot-Le Breton	c8d814ed63	MINOR: ssl: Move OCSP code to a dedicated source file This is a simple cleanup that moves OCSP related code to a dedicated file instead of interlacing it in some pure ssl connection code.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	6477bbd78d	MEDIUM: ssl: Add ocsp update task main function This patch contains the main function of the ocsp auto update mechanism as well as an init and destroy function of the task used for this. The task is not created in this patch but in a later one. The function has two distinct parts and the branching to one or the other is completely based on the fact that the cur_ocsp pointer of the ssl_ocsp_task_ctx member is set. If the pointer is not set, we need to look at the first item of the update tree and see if it needs to be updated. If it does not we simply wait until the time is right and leave the task asleep. If it does need to be updated, we simply build and send the corresponding ocsp request thanks to the http_client. The task is then sent to sleep with an expire time set to infinity. The http_client will wake it back up once the response is received (or a timeout occurs). Just note that during this whole process the certificate_ocsp object corresponding to the entry being updated is taken out of the update tree and only stored in the ssl_ocsp_task_ctx context. Once the task is woken up by the http_client, it branches on the response processing part of the function which basically checks that the response is valid and inserts it into the ocsp_response tree. The task then goes back to sleep until another entry needs to be updated.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	fb2b9988e8	MINOR: ssl: Store 'ocsp-update' mode in the ckch_data and check for inconsistencies The 'ocsp-update' option is parsed at the same time as all the other bind line options but it does not actually have anything to do with the bind line since it concerns the frontend certificate instead. For that reason, we should have a means to identify inconsistencies in the configuration and raise an error when a given certificate has two different ocsp-update modes specified in one or more crt-lists. The simplest way to do it is to store the ocsp update mode directly in the ckch and not only in the ssl_bind_conf.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	03c5ffff8e	MINOR: ssl: Add crt-list ocsp-update option This option will define how the ocsp update mechanism behaves. The option can either be set to 'on' or 'off' and can only be specified in a crt-list entry so that we ensure that it concerns a single certificate. The 'off' mode is the default one and corresponds to the old behavior (no automatic update). When the option is set to 'on', we will try to get an ocsp response whenever an ocsp uri can be found in the frontend's certificate. The only limitation of this mode is that the certificate's issuer will have to be known in order for the OCSP certid to be built. This patch only adds the parsing of the option. The full functionality will come in a later commit.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	cc346678dc	MEDIUM: ssl: Add ocsp_certid in ckch structure and discard ocsp buffer early The ocsp_response member of the cert_key_and_chain structure is only used temporarily. During a standard init process where an ocsp response is provided, this ocsp file is first copied into the ocsp_response buffer without any ocsp-related parsing (see ssl_sock_load_ocsp_response_from_file), and then the contents are actually interpreted and inserted into the actual ocsp tree (cert_ocsp_tree) later in the process (see ssl_sock_load_ocsp). If the response was deemed valid, it is then copied into the actual ocsp_response structure's 'response' field (see ssl_sock_load_ocsp_response). From this point, the ocsp_response field of the cert_key_and_chain object could be discarded since actual ocsp operations will be based on the certificate_ocsp object. The only remaining runtime use of the ckch's ocsp_response field was in the CLI, and more precisely in the 'show ssl cert' mechanism. This constraint could be removed by adding an OCSP_CERTID directly in the ckch because the buffer was only used to get this id. This patch then adds the OCSP_CERTID pointer in the ckch, it clears the ocsp_response buffer early and simplifies the ckch_store_build_certid function.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	c0b4058e7e	MINOR: ssl: Add helper function that checks the validity of an OCSP response This helper function will check that an OCSP response is valid, meaning that the proper "Content-Type: application/ocsp-response" header is present and the data itself is a proper OCSP_RESPONSE that can be checked thanks to the issuer certificate.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	e09d2ae598	MINOR: ssl: Add OCSP request helper function This function creates the url and body that will be used to build a proper OCSP request for a given certid (following section A.1 of RFC6960).	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	47a4f1239d	MINOR: ssl: Add helper function that extracts an OCSP URI from a certificate This function extracts the first OCSP URI (if any) contained in a certificate. It only takes the first of potentially multiple URIs.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	95e7cf1ddf	MINOR: httpclient: Make the CLI flags public for future use Those flags used by the http_client in its CLI function might come to use for OCSP updates that will strongly rely on the http client.	2022-12-21 11:21:07 +01:00
Remi Tricot-Le Breton	2b96364b35	MINOR: ssl: Add a lock to the OCSP response tree The tree that contains OCSP responses is never locked despite being used at runtime for OCSP stapling as well as the CLI through "set ssl cert" and "set ssl ocsp-response" commands. Everything works though because the certificate_ocsp structure is refcounted and the tree's entries are cleaned up when SSL_CTXs are destroyed (thanks to an ex_data entry in which the certificate_ocsp pointer is stored). This new lock will come to use when the OCSP auto update mechanism is fully implemented because this new feature will be based on another tree that stores the same certificate_ocsp members and updates their contents periodically.	2022-12-21 11:21:07 +01:00
Willy Tarreau	eed7826529	BUG/MEDIUM: quic: properly take shards into account on bind lines Shards were completely forgotten in commit `f5a0c8abf` ("MEDIUM: quic: respect the threads assigned to a bind line"). The thread mask is taken from the bind_conf, but since shards were introduced in 2.5, the per-listener mask is held by the receiver and can be smaller than the bind_conf's mask. The effect here is that the traffic is not distributed to the appropriate thread. At first glance it's not dramatic since it remains one of the threads eligible by the bind_conf, but it still means that in some contexts such as "shards by-thread", some concurrency may persist on listeners while they're expected to be alone. One identified impact is that it requires more rxbufs than necessary, but there may possibly be other not yet identified side effects. This must be backported to 2.7 and everywhere the commit above is backported.	2022-12-21 09:27:26 +01:00
Amaury Denoyelle	15337fd808	BUG/MEDIUM: mux-quic: fix double delete from qcc.opening_list qcs instances for bidirectional streams are inserted in <qcc.opening_list>. It is removed from the list once a full HTTP request has been parsed. This is required to implement http-request timeout. In case a stream is deleted before receiving full HTTP request, it also must be removed from <qcc.opening_list>. This was not the case on first implementation but has been fixed by the following patch : `641a65ff3c` BUG/MINOR: mux-quic: remove qcs from opening-list on free This means that now a stream can be deleted from the list in two different functions. Sadly, as LIST_DELETE was used in both cases, nothing prevented a double-deletion from the list, even though LIST_INLIST was used. Both calls are replaced with LIST_DEL_INIT which is idempotent. This bug causes memory corruption which results in most cases in a segfault, most of the time outside of mux-quic code itself. It has been found first by gabrieltz who reported it on the github issue #1903. Big thanks to him for his testing. This bug also causes failures on several 'M' transfer testcase of QUIC interop-runner. The s2n-quic client is particularly useful in this case as segfaults were most of the time triggered on the LIST_DELETE operation itself. This is probably due to its encapsulation of the HEADERS frame with fin bit delayed in a following empty STREAM frame. This must be backported wherever the above patch is, up to 2.6.	2022-12-21 08:58:04 +01:00
Willy Tarreau	e327b4a73e	MINOR: freq_ctr: add opportunistic versions of swrate_add() Some uses of swrate_add() only consist in getting a rough estimate of a frequency. There are cases where speed matters more than accuracy (e.g. pools). For such use cases, let's just stop looping on the CAS, if the update fails, another thread is already providing input, and it's not dramatic to lose the race. All these functions are now suffixed with "_opportunistic".	2022-12-20 14:51:12 +01:00
Willy Tarreau	284cfc67b8	MINOR: pool: make the thread-local hot cache size configurable Till now it was only possible to change the thread local hot cache size at build time using CONFIG_HAP_POOL_CACHE_SIZE. But along benchmarks it was sometimes noticed a huge contention in the lower level memory allocators indicating that larger caches could be beneficial, especially on machines with large L2 CPUs. Given that the checks against this value were no longer on a hot path, there was no reason for continuing to force it to be tuned at build time. So this patch allows to set it by tune.memory-hot-size. It's worth noting that during the boot phase the value remains zero so that it's possible to know if the value was set or not, which opens the possibility that we try to automatically adjust it based on the per-cpu L2 cache size or the use of certain protocols (none of this is done yet).	2022-12-20 14:51:12 +01:00
Willy Tarreau	4dd33d9c32	OPTIM: pool: split the read_mostly from read_write parts in pool_head Performance profiling on a 48-thread machine showed a lot of time spent in pool_free(), precisely at the point where pool->limit was retrieved. And the reason is simple. Some parts of the pool_head are heavily updated only when facing a cache miss ("allocated", "used", "needed_avg"), while others are always accessed (limit, flags, size). The fact that both entries were stored into the same cache line makes it very difficult for each thread to access these precious info even when working with its own cache. By just splitting the fields apart, a test on QUIC (which stresses pools a lot) more than doubled performance from 42 Gbps to 96 Gbps! Given that the patch only reorders fields and addresses such a significant contention, it should be backported to 2.7 and 2.6.	2022-12-20 14:51:12 +01:00
William Lallemand	46bea1c616	BUILD: peers: peers-t.h depends on stick-table-t.h peers-t.h uses "struct stktable" as well as STKTABLE_DATA_TYPES which are defined in stick-table-t.h. It works by accident because stick-table-t.h was always included before, but it could provoke build issues with EXTRA code. To be backported as far as 2.2.	2022-12-16 15:51:44 +01:00
Aurelien DARRAGON	5594184190	MINOR: stats: introduce stats field ctx Add a new value in stats ctx: field. Implement field support in line dumping parent functions stats_print_proxy_field_json() and stats_dump_proxy_to_buffer(). This will allow child dumping functions to support partial line dumping when needed. ie: when dumping buffer is exhausted: do a partial send and wait for a new buffer to finish the dump. Thanks to field ctx, the function can start dumping where it left off on previous (unterminated) invocation.	2022-12-15 16:53:49 +01:00
Amaury Denoyelle	15f3cc4b38	MINOR: http: extract content-length parsing from H2 Extract function h2_parse_cont_len_header() in the generic HTTP module. This allows to reuse it for all HTTP/x parsers. The function is now available as http_parse_cont_len_header(). Most notably, this will be reused in the next bugfix for the H3 parser. This is necessary to check that content-length header matches the length of DATA frames. Thus, it must be backported to 2.6.	2022-12-14 11:34:18 +01:00
Amaury Denoyelle	dbf6ad470b	BUG/MINOR: quic: properly handle alloc failure in qc_new_conn() qc_new_conn() is used to allocate a quic_conn instance and its various internal members. If one allocation fails, quic_conn_release() is used to cleanup things. For the moment, pool_zalloc() is used which ensures that all content is null. However, some members must be initialized to special values to be able to use quic_conn_release() safely. This is the case for quic_conn lists and its tasklet. Also, some quic_conn internal allocation functions were doing their own cleanup on failure without reset to NULL. This caused an issue with quic_conn_release() which also frees these members. To fix this, these functions now only return an error without cleanup. It is the caller's responsibility to free the allocated content, which is done via quic_conn_release(). Without this patch, allocation failure in qc_new_conn() would often result in segfault. This was reproduced easily using fail-alloc at 10%. This should be backported up to 2.6.	2022-12-12 11:44:34 +01:00
Cedric Paillet	e06e31ea3b	MINOR: promex: introduce haproxy_backend_agg_check_status This patch introduces haproxy_backend_agg_check_status metric as we wanted in `42d7c402d` but with the right data source. This patch could be backported as far as 2.4.	2022-12-09 10:54:48 +01:00
Cedric Paillet	7d6644e689	BUG/MINOR: promex: create haproxy_backend_agg_server_status haproxy_backend_agg_server_check_status currently aggregates haproxy_server_status instead of haproxy_server_check_status. We deprecate this and create a new one, haproxy_backend_agg_server_status to clarify what it really does. This patch could be backported as far as 2.4.	2022-12-09 10:54:27 +01:00
Willy Tarreau	9192d20f02	MINOR: pools: make DEBUG_UAF a runtime setting Since the massive pools cleanup that happened in 2.6, the pools architecture was made quite more hierarchical and many alternate code blocks could be moved to runtime flags set by -dM. One of them had not been converted by then, DEBUG_UAF. It's not much more difficult actually, since it only acts on a pair of functions indirection on the slow path (OS-level allocator) and a default setting for the cache activation. This patch adds the "uaf" setting to the options permitted in -dM so that it now becomes possible to set or unset UAF at boot time without recompiling. This is particularly convenient, because every 3 months on average, developers ask a user to recompile haproxy with DEBUG_UAF to understand a bug. Now it will not be needed anymore, instead the user will only have to disable pools and enable uaf using -dMuaf. Note that -dMuaf only disables previously enabled pools, but it remains possible to re-enable caching by specifying the cache after, like -dMuaf,cache. A few tests with this mode show that it can be an interesting combination which catches significantly less UAF but will do so with much less overhead, so it might be compatible with some high-traffic deployments. The change is very small and isolated. It could be helpful to backport this at least to 2.7 once confirmed not to cause build issues on exotic systems, and even to 2.6 a bit later as this has proven to be useful over time, and could be even more if it did not require a rebuild. If a backport is desired, the following patches are needed as well: CLEANUP: pools: move the write before free to the uaf-only function CLEANUP: pool: only include pool-os from pool.c not pool.h REORG: pool: move all the OS specific code to pool-os.h CLEANUP: pools: get rid of CONFIG_HAP_POOLS DEBUG: pool: show a few examples in -dMhelp	2022-12-08 18:54:59 +01:00
Willy Tarreau	4da51bd190	CLEANUP: pools: get rid of CONFIG_HAP_POOLS This one was set in defaults.h only when neither DEBUG_NO_POOLS nor DEBUG_UAF were set. This was not the most convenient location to look for it, and it was only used in pool.c to decide on the initial value of POOL_DBG_NO_CACHE. Let's just use DEBUG_NO_POOLS || DEBUG_UAF directly on this flag and get rid of the intermediary condition. This also has the benefit of removing a double inversion, which is always nice for understanding.	2022-12-08 17:45:08 +01:00
Willy Tarreau	a95636682d	REORG: pool: move all the OS specific code to pool-os.h Till now pool-os used to contain a mapping from pool_{alloc,free}_area() to pool_{alloc,free}_area_uaf() in case of DEBUG_UAF, or the regular malloc-based function. And the _uaf() functions were in pool.c. But since 2.4 with the first cleanup of the pools, there has been no more calls to pool_{alloc,free}_area() from anywhere but pool.c, from exactly one place each. As such, there's no more need to keep _uaf() apart in pool.c, we can inline it into pool-os.h and leave all the OS stuff there, with pool.c calling either based on DEBUG_UAF. This is cleaner with less round trips between both files and easier to find.	2022-12-08 17:32:57 +01:00
Willy Tarreau	76a97a98ca	CLEANUP: pool: only include pool-os from pool.c not pool.h There's no need for the low-level pool functions to be known from all callers anymore, they're only used by pool.c. Let's reduce the amount of header files processed.	2022-12-08 17:32:40 +01:00
Willy Tarreau	5ab3c61932	BUILD: atomic: atomic.h may need compiler.h on ARMv8.2-a We get a build error in ncbuf.c when building for ARMv8.2-a because ncbuf has minimal includes and among them bug.h which includes atomic.h. Atomic.h may use "forceinline" without including compiler.h, hence the build error. It was verified that adding it doesn't inflate the total headers. Since all other C files include api.h which already covers this, there's no real need to backport this. The issue was already there in 2.3 though.	2022-12-08 08:36:24 +01:00
Aurelien DARRAGON	7d541a91ec	BUG/MINOR: checks: restore legacy on-error fastinter behavior With previous commit, `9e080bf` ("BUG/MINOR: checks: make sure fastinter is used even on forced transitions"), on-error mark-down|sudden-death|fail-check are now working as expected. However, on-error fastinter remains broken because srv_getinter(), used in the above commit to check the expiration date, won't return fastinter interval if server health is maxed out (which is the case with on-error fastinter mode). To fix this, we introduce a check flag named CHK_ST_FASTINTER. This flag is set when on-error is triggered. This way we can force srv_getinter() to return fastinter interval whenever the flag is set. The flag is automatically cleared as soon as the new check task expiry is recalculated in process_chk_conn(). This restores original behavior prior to `d114f4a` ("MEDIUM: checks: spread the checks load over random threads"). It must be backported to 2.7 along with the aforementioned commits.	2022-12-07 17:03:55 +01:00
Ilya Shipitsin	5fa29b8a74	CLEANUP: assorted typo fixes in the code and comments This is the 34th iteration of typo fixes	2022-12-07 09:08:18 +01:00
Aurelien DARRAGON	22f82f81e5	MINOR: server/event_hdl: add support for SERVER_UP and SERVER_DOWN events We're using srv_update_status() as the only event source for UP/DOWN server events in an attempt to simplify the support for these 2 events. It seems srv_update_status() is the common path for server state changes anyway. Tested with server state updated from various sources: - the cli - server-state file (maybe we could disable this or at least don't publish in global event queue in the future if it ends in slower startup for setups relying on huge server state files) - dns records (ie: srv template) (again, could be fine-tuned to only publish in server specific subscriber list and no longer in global subscription list if mass dns updates tend to slow down srv_update_status()) - normal checks and observe checks (HCHK_STATUS_HANA) (same as above, if checks related state update storms are expected) - lua scripts - html stats page (admin mode)	2022-12-06 10:22:07 +01:00
Aurelien DARRAGON	129ecf441f	MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events Basic support for ADD and DEL server events is added through this commit: SERVER_ADD is published on dynamic server addition through cli. SERVER_DEL is published on dynamic server deletion through cli. This work depends on: "MINOR: event_hdl: add event handler base api" "MINOR: server: add srv->rid (revision id) value"	2022-12-06 10:22:07 +01:00
Aurelien DARRAGON	745ce8e8ad	MINOR: stats: add server revision id support Make use of the new srv->rid value in stats. Stat is referred as ST_F_SRID, it is now used in stats_fill_sv_stats function in order to be included in csv and json stats dumps. Moreover, "rid: $value" will be displayed next to server puid in html stats page if "stats show-legend" is specified in the stats frontend. (mouse hovering tooltip) Depends on the following commit: "MINOR: server: add srv->rid (revision id) value"	2022-12-06 10:22:06 +01:00
Aurelien DARRAGON	61e3894dfe	MINOR: server: add srv->rid (revision id) value With current design, we could not distinguish between previously existing deleted server and a new server reusing the deleted server name/id. This can cause some confusion when auditing stats/events/logs, because the new server will look similar to the old one. To address this, we're adding a new value in server structure: rid. The rid (revision id) value is an unsigned 32-bit value that is set upon server creation. Value is derived from a global counter that starts at 0 and is incremented each time one or multiple server deletions are followed by a server addition (meaning that old name/id reuse could occur). Thanks to this revision id, it is now easy to tell whether the server we're looking at is the same as before or if it has been deleted and re-added in the meantime. (combining server name/id + server revision id yields a process-wide unique identifier)	2022-12-06 10:22:06 +01:00
Amaury Denoyelle	d3083c9df9	MINOR: quic: reconnect quic-conn socket on address migration UDP addresses may change over time for a QUIC connection. When using quic-conn owned socket, we have to detect address change to break the bind/connect association on the socket. For the moment, on change detected, QUIC connection socket is closed and a new one is opened. In the future, we may improve this by trying to keep the original socket and reexecute only bind/connect syscalls. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	7c9fdd9c3a	MEDIUM: quic: move receive out of FD handler to quic-conn io-cb This change is the second part for reception on QUIC connection socket. All operations inside the FD handler has been delayed to quic-conn tasklet via the new function qc_rcv_buf(). With this change, buffer management on reception has been simplified. It is now possible to use a local buffer inside qc_rcv_buf() instead of quic_receiver_buf(). This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	5b41486b7f	MEDIUM: quic: use quic-conn socket for reception Try to use the quic-conn socket for reception if it is allocated. For this, the socket is inserted in the fdtab. This will call the new handler quic_conn_io_cb() which is responsible to process the recv() system call. It will reuse datagram dispatch for simplicity. However, this is guaranteed to be called on the quic-conn thread, so it will be more efficient to use a dedicated buffer. This will be implemented in another commit. This patch should improve performance by reducing contention on the receiver socket. However, more gain can be obtained when the datagram dispatch operation will be skipped. Older quic_sock_fd_iocb() is renamed to quic_lstnr_sock_fd_iocb() to emphasize its usage for the receiver socket. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	40909dfec5	MINOR: quic: allocate a socket per quic-conn Allocate quic-conn owned socket if possible. This requires that this is activated in haproxy configuration. Also, this is done only if local address is known so it depends on the support of IP_PKTINFO. For the moment this socket is not used. This causes QUIC support to be broken as received datagram are not read. This commit will be completed by a following patch to support recv operation on the newly allocated socket. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	75839a44e7	MINOR: quic: startup detect for quic-conn owned socket support To be able to use individual sockets for QUIC connections, we rely on the OS network stack which must support UDP sockets binding on the same local address. Add detection code for this feature, executed on startup. When the first QUIC listener socket is bound, a test socket is created and bound on the same address. If the bind call fails, we consider that it's impossible to use individual sockets for QUIC connections. A new global option GTUNE_QUIC_SOCK_PER_CONN is defined. If startup detection fails, this value is reset from global options. For the moment, there is no code to activate the option: this will be in a follow-up patch with the introduction of a new configuration option. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	eec0b3c1bd	MINOR: quic: detect connection migration Detect connection migration attempted by the client. This is done by comparing addresses stored in quic-conn with src/dest addresses of the UDP datagram. A new function qc_handle_conn_migration() has been added. For the moment, no operation is conducted and the function will be completed during connection migration implementation. The only notable thing is the increment of a new counter "quic_conn_migration_done". This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	21e611dc89	MINOR: tools: add port for ipcmp as optional criteria Complete ipcmp() function with a new argument <check_port>. If this argument is true, the function will compare port values besides IP addresses and return true only if both are identical. This commit will simplify QUIC connection migration detection. As such, it should be backported to 2.7.	2022-12-02 14:45:43 +01:00
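A minimal sketch of an address comparison with an optional port check, in the spirit of the `<check_port>` argument described above. It is not HAProxy's `ipcmp()`: the helper names `addr_equal()` and `make_v4()`, and the 1/0 return convention, are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if both addresses match, 0 otherwise.
 * When check_port is non-zero, the ports must be identical as well.
 */
static int addr_equal(const struct sockaddr_storage *a,
                      const struct sockaddr_storage *b, int check_port)
{
    if (a->ss_family != b->ss_family)
        return 0;

    if (a->ss_family == AF_INET) {
        const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
        const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

        if (a4->sin_addr.s_addr != b4->sin_addr.s_addr)
            return 0;
        return !check_port || a4->sin_port == b4->sin_port;
    }

    if (a->ss_family == AF_INET6) {
        const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

        if (memcmp(&a6->sin6_addr, &b6->sin6_addr, sizeof(a6->sin6_addr)) != 0)
            return 0;
        return !check_port || a6->sin6_port == b6->sin6_port;
    }

    return 0; /* unsupported family: never considered equal */
}

/* Hypothetical helper building an IPv4 address from host-order values. */
static struct sockaddr_storage make_v4(uint32_t ip, uint16_t port)
{
    struct sockaddr_storage ss;
    struct sockaddr_in *in = (struct sockaddr_in *)&ss;

    memset(&ss, 0, sizeof(ss));
    in->sin_family = AF_INET;
    in->sin_addr.s_addr = htonl(ip);
    in->sin_port = htons(port);
    return ss;
}
```

For migration detection, comparing with the port check enabled also catches a client that keeps its IP address but changes its source port.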
Amaury Denoyelle	8687b63c69	MINOR: quic: extract datagram parsing code Extract individual datagram parsing code outside of datagrams list loop in quic_lstnr_dghdlr(). This is moved in a new function named quic_dgram_parse(). To complete this change, quic_lstnr_dghdlr() has been moved into quic_sock source file : it belongs to QUIC socket lower layer and is directly called by quic_sock_fd_iocb(). This commit will ease implementation of quic-conn owned socket. New function quic_dgram_parse() will be easily usable after a receive operation done on quic-conn IO-cb. This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	518c98f150	MINOR: quic: remove qc from quic_rx_packet quic_rx_packet struct had a reference to the quic_conn instance. This is useless as qc instance is always passed through function argument. In fact, pkt.qc is used only in qc_pkt_decrypt() on key update, even though qc is also passed as argument. Simplify this by removing qc field from quic_rx_packet structure definition. Also clean up qc_pkt_decrypt() documentation and interface to align it with other quic-conn related functions. This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
William Lallemand	52ddd99940	MEDIUM: ssl: rename the struct "cert_key_and_chain" to "ckch_data" Rename the structure "cert_key_and_chain" to "ckch_data" in order to avoid confusion with the store, which is often called "ckchs". The "cert_key_and_chain ckch" were renamed "ckch_data data", so we now have store->data instead of ckchs->ckch. Marked medium because it changes the API.	2022-12-02 11:48:30 +01:00
Aurelien DARRAGON	68e692da02	MINOR: event_hdl: add event handler base api Adding base code to provide subscribe/publish API for internal events processing. event_hdl provides two complementary APIs, both are implemented in src/event_hdl.c and include/haproxy/event_hdl{-t.h,.h}: One API targeting developers that want to register event handlers that will be notified on specific events. (SUBSCRIBE) One API targeting developers that want to notify registered handlers about an event. (PUBLISH) This feature is being considered to address the following scenarios: - mailers code refactoring (getting rid of deprecated tcp-check ruleset implementation) - server events from lua code (registering user defined lua function that is executed with relevant data when a server is dynamically added/removed or on server state change) - providing a stable and easy to use API for upcoming developments that rely on specific events to perform actions. (e.g: ressource cleanup when a server is deleted from haproxy) At this time though, we don't have much use cases in mind in addition to server events handling, but the API is aimed at being multipurpose so that new event families, with their own particularities, can be easily implemented afterwards (and hopefully) without requiring breaking changes to the API. Moreover, you should know that the API was not designed to cope well with high rate event publishing. Mostly because publishing means iterating over unsorted subscriber list. So it won't scale well as subscriber list increases, but it is intended in order to keep the code simple and versatile. Instead, it is assumed that events implemented using this API should be periodic events, and that events related to critical io/networking processing should be handled using dedicated facilities anyway. (After all, this is meant to be a general purpose event API) Apart from being easily extensible, one of the main goals of this API is to make subscriber code as simple and safe as possible. 
This is done by offering multiple event handling modes: - SYNC mode: publishing code directly leverages handler code (callback function) and handler code has direct access to "live" event data (pointers mostly, along with lock hints/context so that accessing data pointers can be done properly) - normal ASYNC mode: handler is executed in a backward compatible way with sync mode, so that it is easy to switch from and to SYNC/ASYNC mode. Only here the handler has access to "offline" event data, and not "live" data (ptrs) so that data consistency is guaranteed. By offline, you should understand "snapshot" of relevant data at the time of the event, so that the handler can consume it later (even if the associated resource is not valid anymore) - advanced ASYNC mode: same as normal ASYNC mode, but here handler is not a function that is executed with event data passed as argument: handler is a user defined tasklet that is notified when event occurs. The tasklet may consume pending events and associated data through its own message queue. ASYNC mode should be considered first if you don't rely on live event data and you want to make sure that your code has the lowest impact possible on publisher code. (ie: you don't want to break stuff) Internal API documentation will follow: You will find more details about the notions we roughly approached here.	2022-12-02 09:40:52 +01:00
Willy Tarreau	eaded987ee	[RELEASE] Released version 2.8-dev0 Released version 2.8-dev0 with the following main changes : - MINOR: version: mention that it's development again	2022-12-01 15:25:34 +01:00
Willy Tarreau	989c55dc2f	MINOR: version: mention that it's development again This essentially reverts `d705b85a4a`.	2022-12-01 15:24:10 +01:00
Willy Tarreau	d705b85a4a	MINOR: version: mention that it's stable now This version will be maintained up to around Q1 2024. The INSTALL file also mentions it.	2022-12-01 15:15:24 +01:00
Stefan Eissing	b82296c10e	BUILD: quic: allow build with USE_QUIC and USE_OPENSSL_WOLFSSL WolfSSL does not implement the TLS1_3_CK_AES_128_CCM_SHA256 cipher as well as the SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and SSL_ERROR_WANT_CLIENT_HELLO_CB error codes. This patch disables them for WolfSSL. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-30 17:38:27 +01:00
Ilya Shipitsin	6f86eaae4f	CLEANUP: assorted typo fixes in the code and comments This is 33rd iteration of typo fixes	2022-11-30 14:02:36 +01:00
Willy Tarreau	d5cae6a0c7	MINOR: stick-table: change the API of the function used to calculate the shard The function used to calculate the shard number currently requires a stktable_key on input for this. Unfortunately, it happens that peers currently miss this calculation and they do not provide stktable_key at all, instead they're open-coding all the low-level stick-table work (hence why it's missing). Thus we'll need to be able to calculate the shard number in keys coming from peers as well but the current API does not make it possible. This commit addresses this by inverting the order where the length and the shard number are used. Now the low-level function is independent of stksess and stktable_key, it takes a table, pointer and length and does all the job. The upper function takes care of the type and key to get its length, and is for use only from stick-table code. This doesn't change anything except that the low-level one will be usable from outside (hence why it's exported now).	2022-11-29 18:06:42 +01:00
Amaury Denoyelle	d64a26f023	CLEANUP: ncbuf: inline small functions ncbuf API relies on lot of small functions. Mark these functions as inline to reduce call invocations and facilitate compiler optimizations to reduce code size. This should be backported up to 2.6.	2022-11-29 15:14:39 +01:00
Willy Tarreau	56460ee52a	MINOR: stick-table: store a per-table hash seed and use it Instead of using memcpy() to concatenate the table's name to the key when allocating an stksess, let's compute once for all a per-table seed at boot time and use it to calculate the key's hash. This saves two memcpy() and the usage of a chunk, it's always nice in a fast path. When tested under extreme conditions with a 80-byte long table name, it showed a 1% performance increase.	2022-11-28 18:58:06 +01:00
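The idea of a per-table seed computed once and reused for every key can be sketched like this. FNV-1a is used here only because it is short and self-contained; it is a stand-in, not the hash HAProxy uses, and `struct table`, `table_init()` and `key_hash()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a over a byte range, continuing from state <h>. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL; /* 64-bit FNV prime */
    }
    return h;
}

/* Hypothetical table descriptor holding the precomputed seed. */
struct table {
    const char *name;
    uint64_t hash_seed;
};

/* Done once at boot: derive the seed from the table's name. */
static void table_init(struct table *t, const char *name)
{
    t->name = name;
    t->hash_seed = fnv1a(0xcbf29ce484222325ULL, name, strlen(name));
}

/* Fast path: hash only the key bytes, starting from the table's seed. */
static uint64_t key_hash(const struct table *t, const void *key, size_t len)
{
    return fnv1a(t->hash_seed, key, len);
}
```

With a streaming hash such as this one, starting from the seed gives the same value as hashing the name followed by the key, so the concatenation buffer and its copies are no longer needed on each lookup.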
Willy Tarreau	63b5b33ba8	CLEANUP: stick-table: fill alignment holes in the stktable struct There were two 32-bit holes in the stktable struct surrounding 32-bit words, so let's just reorder them a little bit to address the issue.	2022-11-28 18:49:55 +01:00
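The effect of such a reordering can be illustrated with two toy structs. These are not the real stktable fields; the layout comments assume a typical 64-bit ABI where pointers are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy layout with 32-bit words isolated between pointers. */
struct with_holes {
    void *a;
    uint32_t x;   /* usually followed by a 4-byte hole on 64-bit ABIs */
    void *b;
    uint32_t y;   /* usually followed by 4 bytes of tail padding */
};

/* Same members, with the two 32-bit words placed next to each other. */
struct reordered {
    void *a;
    void *b;
    uint32_t x;
    uint32_t y;   /* shares one 8-byte slot with x on 64-bit ABIs */
};
```

Comparing `sizeof(struct with_holes)` with `sizeof(struct reordered)`, or printing `offsetof()` for each member, shows where the padding went; a tool such as pahole reports the same information for real structures.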
William Lallemand	0a2d63236c	BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it can't. This patch fixes the issue by allowing to shut the error when the SSL configuration for the httpclient is not explicit. Must be backported in 2.6.	2022-11-24 19:14:19 +01:00
Uriah Pollock	3cbf09ed64	MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode This adds a USE_OPENSSL_WOLFSSL option, wolfSSL must be used with the OpenSSL compatibility layer. This must be used with USE_OPENSSL=1. WolfSSL build options: ./configure --prefix=/opt/wolfssl --enable-haproxy HAProxy build options: USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib' Using at least the commit 54466b6 ("Merge pull request #5810 from Uriah-wolfSSL/haproxy-integration") from WolfSSL. (2022-11-23). This is still to be improved, reg-tests are not supported yet, and more tests are to be done. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-24 11:29:03 +01:00
Uriah Pollock	79320cb074	BUILD: quic: use openssl-compat.h instead of openssl/ssl.h Replace the include of openssl/ssl.h by openssl-compat.h. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-24 11:29:03 +01:00
Willy Tarreau	946d370d22	BUILD: flags: really restrict the cases where flags are exposed A number of internal flags started to be exposed to external programs at the location of their definition since commit `77acaf5af` ("MINOR: flags: add a new file to host flag dumping macros"). This allowed the "flags" utility to decode many more of them and always correctly. The condition to expose them was to rely on the preliminary definition of EOF that indicates that stdio is already included. But this was a wrong approach. It only guarantees that snprintf() can safely be used but still causes large functions to be built. But stdio is often included before some of these includes, so these heavy inline functions actually have to be compiled in many cases. The result is that the build time significantly increased, especially with fast compilers like gcc -O0 which took +50% or TCC which took +100%! This patch addresses the problem by instead relying on an explicit macro HA_EXPOSE_FLAGS that the calling program must explicitly define before including these files. flags.c does this and that's all. The previous build time is now restored with a speed up of 20 to 50% depending on the build options.	2022-11-24 08:32:27 +01:00
Willy Tarreau	08093cc0fa	CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h These includes brought by commit `9c76637ff` ("MINOR: anon: add new macros and functions to anonymize contents") resulted in an increase of exactly 20% of the number of lines to build. These includes are not needed there, only tools.c needs xxhash.h.	2022-11-24 08:30:48 +01:00
Willy Tarreau	4d46638540	BUILD: compiler: include compiler's definitions before ours Building with TCC caused a warning on __attribute__() being redefined, because we do define it on compilers that don't have it, but we didn't include the compiler's definitions first to leave it a chance to expose its definitions. The correct way to do this would be to include sys/cdefs.h but we currently don't include it explicitly and a few reports on the net mention some platforms where it could be missing by default. Let's use inttypes.h instead, it always causes it (or its equivalent) to be included and we know it's present on supported platforms since we already depend on it. No backport is needed.	2022-11-24 08:30:48 +01:00
Willy Tarreau	fc50b9dd14	BUG/MAJOR: sched: protect task during removal from wait queue The issue addressed by commit `fbb934da9` ("BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task") is still present when thread groups are enabled, but this time it lies in the scheduler. What happens is that a task configured to run anywhere might already have been queued into one group's wait queue. When updating a stick table entry, sometimes the task will have to be dequeued and requeued. For this a lock is taken on the current thread group's wait queue lock, but while this is necessary for the queuing, it's not sufficient for dequeuing since another thread might be in the process of expiring this task under its own group's lock which is different. This is easy to test using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread groups. The process crashes almost instantly under heavy traffic. One approach could consist in storing the group number the task was queued under in its descriptor (we don't need 32 bits to store the thread id, it's possible to use one short for the tid and another one for the tgrp). Sadly, no safe way to do this was figured, because the race remains at the moment the thread group number is checked, as it might be in the process of being changed by another thread. It seems that a working approach could consist in always having it associated with one group, and only allowing to change it under this group's lock, so that any code trying to change it would have to iterately read it and lock its group until the value matches, confirming it really holds the correct lock. But this seems a bit complicated, particularly with wait_expired_tasks() which already uses upgradable locks to switch from read state to a write state. Given that the shared tasks are not that common (stick-table expirations, rate-limited listeners, maybe resolvers), it doesn't seem worth the extra complexity for now. 
This patch takes a simpler and safer approach consisting in switching back to a single wq_lock, but still keeping separate wait queues. Given that shared wait queues are almost always empty and that otherwise they're scanned under a read lock, the contention remains manageable and most of the time the lock doesn't even need to be taken since such tasks are not present in a group's queue. In essence, this patch reverts half of the aforementioned patch. This was tested and confirmed to work fine, without observing any performance degradation under any workload. The performance with 8 groups on an EPYC 74F3 and 3 tables remains twice the one of a single group, with the contention remaining on the table's lock first. No backport is needed.	2022-11-22 09:10:08 +01:00
Willy Tarreau	c21a187ec0	MINOR: server/idle: make the next_takeover index per-tgroup In order to evenly pick idle connections from other threads, there is a "next_takeover" index in the server, that is incremented each time a connection is picked from another thread, and indicates which one to start from next time. With thread groups this doesn't work well because the index is the same regardless of the group, and if a group has more threads than another, there's even a risk to reintroduce an imbalance. This patch introduces a new per-tgroup storage in servers which, for now, only contains an instance of this next_takeover index. This way each thread will now only manipulate the index specific to its own group, and the takeover will become fair again. More entries may come soon.	2022-11-21 19:21:07 +01:00
Willy Tarreau	9dc231a6b2	BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns In 2.2, some idle conns usage metrics were added by commit `cf612a045` ("MINOR: servers: Add a counter for the number of currently used connections."), which mentioned that the operation doesn't need to be atomic since we're not seeking exact values. This is true but at least we should use atomic stores to make sure not to cause invalid values to appear on archs that wouldn't guarantee atomicity when writing an int, such as writing two 16-bit words. This is pretty unlikely on our targets but better keep the code safe against this. This may be backported as far as 2.2.	2022-11-21 19:21:07 +01:00
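A rough sketch of the "inexact but never torn" update described above, written with C11 atomics as a stand-in for HAProxy's own atomic macros. The names `max_used` and `update_max_used()` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical metric: highest number of used connections seen so far. */
static _Atomic unsigned int max_used = 0;

/* The check-then-store sequence is not atomic as a whole, so a concurrent
 * update may be lost and the value is only approximate. Each store is
 * atomic though, so readers never observe a half-written value.
 */
static void update_max_used(unsigned int cur)
{
    if (atomic_load_explicit(&max_used, memory_order_relaxed) < cur)
        atomic_store_explicit(&max_used, cur, memory_order_relaxed);
}
```

An exact maximum would need a compare-and-swap loop; the commit message explains that exact values are not the goal for this metric.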
Willy Tarreau	2fba08faec	MINOR: cli/pools: add sorting capabilities to "show pools" The "show pools" command is used a lot for debugging but didn't get much love over the years. This patch brings new capabilities: - sorting the output by pool names to ease their finding ("byname"). - sorting the output by reverse item size to spot the biggest ones ("bysize") - sorting the output by reverse number of allocated bytes ("byusage") The last one (byusage) also omits displaying the ones with zero allocation. In addition, an optional max number of output entries may be passed so as to dump only the N most relevant ones.	2022-11-21 10:14:52 +01:00
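The three sort orders and the optional entry limit can be sketched with `qsort()`. The structure and function names below (`struct pool_info`, `cmp_byname()`, `cmp_bysize()`, `cmp_byusage()`, `select_by_usage()`) are hypothetical and only mirror the behavior listed in the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of one pool. */
struct pool_info {
    const char *name;
    size_t item_size;    /* size of one item */
    size_t alloc_bytes;  /* bytes currently allocated */
};

/* "byname": ascending pool name. */
static int cmp_byname(const void *a, const void *b)
{
    const struct pool_info *pa = a, *pb = b;
    return strcmp(pa->name, pb->name);
}

/* "bysize": largest item size first. */
static int cmp_bysize(const void *a, const void *b)
{
    const struct pool_info *pa = a, *pb = b;
    return (pa->item_size < pb->item_size) - (pa->item_size > pb->item_size);
}

/* "byusage": largest number of allocated bytes first. */
static int cmp_byusage(const void *a, const void *b)
{
    const struct pool_info *pa = a, *pb = b;
    return (pa->alloc_bytes < pb->alloc_bytes) - (pa->alloc_bytes > pb->alloc_bytes);
}

/* Sort by usage and return how many leading entries should be shown:
 * pools with zero allocation are skipped, and <max> caps the count
 * (0 means no limit).
 */
static size_t select_by_usage(struct pool_info *pools, size_t n, size_t max)
{
    size_t shown = 0;

    qsort(pools, n, sizeof(*pools), cmp_byusage);
    while (shown < n && pools[shown].alloc_bytes > 0 && (max == 0 || shown < max))
        shown++;
    return shown;
}
```

Because the array is sorted in descending usage order, the zero-allocation pools all end up at the tail, so stopping at the first one is enough to omit them.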
Ilya Shipitsin	ace3da8dd4	CLEANUP: quic: replace "choosen" with "chosen" all over the code Some variables were set as "choosen" instead of "chosen", this is dedicated spelling fix	2022-11-21 09:22:28 +01:00
Frédéric Lécaille	74b5f7b31b	BUG/MAJOR: quic: Crash after discarding packet number spaces This previous patch was not sufficient to prevent haproxy from crashing when some Handshake packets had to be inspected before being possibly retransmitted: "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets" This patch introduced another issue: access to packets which have been released because still attached to others (in the same datagram). This was the case for instance when discarding the Initial packet number space before inspecting an Handshake packet in the same datagram through its ->prev or member in our case. This patch implements quic_tx_packet_dgram_detach() which detaches a packet from the adjacent ones in the same datagram to be called when ackwowledging a packet (as done in the previous commit) and when releasing its memory. This was, we are sure the released packets will not be accessed during retransmissions. Thank you to @gabrieltz for having reported this issue in GH #1903. Must be backported to 2.6.	2022-11-20 18:35:46 +01:00
Frédéric Lécaille	814645f42f	BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets As revealed by some traces provided by @gabrieltz in GH #1903 issue, there are clients (chrome I guess) which acknowledge only one packet among others in the same datagram. This is the case for the first datagram sent by a QUIC haproxy listener, made of an Initial packet followed by a Handshake one. In this identified case, this is the Handshake packet only which is acknowledged. But if the client is able to respond with a Handshake packet (ACK frame) this is because it has successfully parsed the Initial packet. So, why not also acknowledge it? AFAIK, this is mandatory. On our side, when retransmitting this datagram, the Handshake packet was accessed from the Initial packet after having been released. Anyway. There is an issue on our side. Obviously, we must not expect an implementation to respect the RFC especially when it wants to build an attack ;) With this simple patch for each TX packet we send, we also set the previous one in addition to the next one. When a packet is acknowledged, we detach the previous one and the next one in the same datagram from this packet, so that it cannot be resent when resending these packets (the previous one, in our case). Thank you to @gabrieltz for having reported this issue. Must be backported to 2.6.	2022-11-19 04:56:55 +01:00
Christopher Faulet	037e3f8735	MINOR: cfgparse: Always check the section position In diag mode, the section position is checked and a warning is emitted if a global section is defined after any non-global one. Now, this check is always performed. But the warning is still only emitted in diag mode. In addition, the result of this check is now stored in a global variable, to be used from anywhere. The aim of this patch is to be able to restrict usage of some global directives to the very first global sections. It will be useful to avoid undefined behaviors. Indeed, some config parts may depend on global settings and it is a problem if these settings are changed after.	2022-11-18 16:03:45 +01:00
Christopher Faulet	62138aab3e	MINOR: mux-h1: Rely on a H1S flag to know a WS key was found or not h1_process_mux() is written to allow partial headers formatting. For now, all headers are forwarded in one time. But it is still good to keep this ability at the H1 mux level. So we must rely on a H1S flag instead of a local variable to know a WebSocket key was found in headers to be able to generate a key if necessary. There is no reason to backport this patch.	2022-11-17 14:33:15 +01:00
Christopher Faulet	ab79b321d6	MEDIUM: mux-fcgi: Introduce flags to deal with connection read/write errors Similarly to the H1 and H2 multiplexers, FCGI_CF_ERR_PENDING is now used to report an error when we try to send data and FCGI_CF_ERROR to report an error when we try to read data. In other functions, we rely on these flags instead of connection ones. Only FCGI_CF_ERROR is considered as a final error. FCGI_CF_ERR_PENDING does not block receive attempts. In addition, the FCGI_CF_EOS flag was added. We rely on it to test if a read0 was received or not.	2022-11-17 14:33:15 +01:00
Christopher Faulet	68ee7845cf	CLEANUP: mux-h2: Remove unused fields in h2c structures Some fields in h2c structures are not used: .mfl, .mft and .mff. Just remove them. .msi field is also removed. It is tested but never set, except when a H2 connection is initialized. It also means h2c_mux_busy() function is useless because it always returns 0 (.msi is always -1). And thus, by transitivity, H2_CF_DEM_MBUSY is also useless because it is never set. So .msi field, h2c_mux_busy() function and H2C_MUX_BUSY flag are removed.	2022-11-17 14:33:15 +01:00
Christopher Faulet	ff7925dce0	MEDIUM: mux-h2: Introduce flags to deal with connection read/write errors Similarly to the H1 multiplexer, H2_CF_ERR_PENDING is now used to report an error when we try to send data and H2_CF_ERROR to report an error when we try to read data. In other functions, we rely on these flags instead of connection ones. Only H2_CF_ERROR is considered as a final error. H2_CF_ERR_PENDING does not block receive attempts. In addition, we rely on the H2_CF_RCVD_SHUT flag to test if a read0 was received or not.	2022-11-17 14:33:15 +01:00
Christopher Faulet	31da34d1e7	MEDIUM: mux-h1: Don't report a final error when a message is aborted When the H1 connection is aborted, we no longer set a final error. To do so, the flag H1C_F_ABORTED was added. For now, it is only set when an error is detected on the H1 stream. Idea is to use ERR_PENDING/ERROR for upgoing errors and ABRT_PENDING/ABRTED for downgoing errors.	2022-11-17 14:33:15 +01:00
Christopher Faulet	b3de5e5084	CLEANUP: mux-h1: Reorder H1 connection flags to avoid holes	2022-11-17 14:33:15 +01:00
Christopher Faulet	fc473a6453	MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for reads read0 is now handled with a H1 connection flag (H1C_F_EOS). Corresponding flag was removed on the H1 stream and we fully rely on the SE descriptor at the stream level. Concretely, it means we rely on the H1 connection flags instead of the connection one. H1C_F_EOS is only set in h1_recv() or h1_rcv_pipe() after a read if a read0 was detected.	2022-11-17 14:33:15 +01:00
Christopher Faulet	bef8900cd6	MINOR: mux-h1: Add flag on H1 stream to deal with internal errors A new error is added on H1 stream to deal with internal errors. For now, this error is only reported when we fail to create a stream-connector. This way, the error is reported at the H1 stream level and not the H1 connection level.	2022-11-17 14:33:14 +01:00
Christopher Faulet	56a499475f	CLEANUP: mux-h1: Rename H1C_F_ERR_PENDING into H1C_F_ABRT_PENDING H1C_F_ERR_PENDING flags will be used to refactor error handling at the H1 connection level. It will be used to notify error during sends. Thus, the flag used to notify that an error must be sent before closing the connection is now named H1C_F_ABRT_PENDING. This introduces a naming convention: ERROR must be used to notify the upper layer of an event at the lower ones while ABORT must be used in the opposite direction.	2022-11-17 14:33:14 +01:00
Christopher Faulet	4e72b172d7	MEDIUM: mux-h1: Handle H1C states via its state field instead of H1C_F_ST_* The H1 connection state is now handled in a dedicated state. H1C_F_ST_* flags are removed. All states are now exclusive. It is easier to know the H1 connection states. It is alive, or usable, if it is not CLOSING or CLOSED. It is CLOSING if it should be closed ASAP but a stream is still attached and/or the output buffer is not empty. CLOSED is used when the H1 connection is ready to be closed. Other states are quite easy to understand. There are no special changes in the H1 connection behavior, except in h1_send(). When a CLOSING connection is CLOSED, the function now reports an activity. In addition, when an embryonic H1 stream is aborted, it is destroyed. This way, the H1 connection can be switched to CLOSED state.	2022-11-17 14:33:14 +01:00
Christopher Faulet	ef93be2a7b	MINOR: mux-h1: Add a dedicated enum to deal with H1 connection state The H1 connection state will be handled in a dedicated field. To do so, h1_cs enum was added. The different states are more or less equivalent to H1C_F_ST_* flags: * H1_CS_IDLE <=> H1C_F_ST_IDLE * H1_CS_EMBRYONIC <=> H1C_F_ST_EMBRYONIC * H1_CS_UPGRADING <=> H1C_F_ST_ATTACHED && !H1C_F_ST_READY * H1_CS_RUNNING <=> H1C_F_ST_ATTACHED && H1C_F_ST_READY * H1_CS_CLOSING <=> H1C_F_ST_SHUTDOWN && (H1C_F_ST_ATTACHED || b_data(&h1c->ibuf)) * H1_CS_CLOSED <=> H1C_F_ST_SHUTDOWN && !H1C_F_ST_ATTACHED && !b_data(&h1c->ibuf) In addition, in this patch, the h1_is_alive() and h1_close() functions are added. The first one will be used to know if a H1 connection is alive or not. The second one will be used to set the connection in CLOSING or CLOSED state, depending on the output buffer state and if there is still a H1 stream or not. For now, the H1 connection state is not used.	2022-11-17 14:33:14 +01:00
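The flag-to-state equivalences listed in the commit above can be written as a small decision function. The enum and flag names come from the commit message, but the flag bit values, the helper names `h1_cs_from_flags()` and `cs_is_alive()`, and the order in which the conditions are tested are assumptions of this sketch.

```c
/* Flag bit values are hypothetical; only the names come from the commit. */
#define H1C_F_ST_IDLE      0x01
#define H1C_F_ST_EMBRYONIC 0x02
#define H1C_F_ST_ATTACHED  0x04
#define H1C_F_ST_READY     0x08
#define H1C_F_ST_SHUTDOWN  0x10

enum h1_cs {
    H1_CS_IDLE,
    H1_CS_EMBRYONIC,
    H1_CS_UPGRADING,
    H1_CS_RUNNING,
    H1_CS_CLOSING,
    H1_CS_CLOSED,
};

/* Hypothetical helper: map the old flag combination to one exclusive state.
 * <buf_has_data> stands for the b_data() test in the commit message.
 * SHUTDOWN is tested first because it may be combined with ATTACHED.
 */
static enum h1_cs h1_cs_from_flags(unsigned int flags, int buf_has_data)
{
    if (flags & H1C_F_ST_SHUTDOWN) {
        if ((flags & H1C_F_ST_ATTACHED) || buf_has_data)
            return H1_CS_CLOSING;
        return H1_CS_CLOSED;
    }
    if (flags & H1C_F_ST_ATTACHED)
        return (flags & H1C_F_ST_READY) ? H1_CS_RUNNING : H1_CS_UPGRADING;
    if (flags & H1C_F_ST_EMBRYONIC)
        return H1_CS_EMBRYONIC;
    return H1_CS_IDLE;
}

/* Hypothetical helper: a connection is alive if neither CLOSING nor CLOSED. */
static int cs_is_alive(enum h1_cs st)
{
    return st != H1_CS_CLOSING && st != H1_CS_CLOSED;
}
```

A single exclusive state replaces several flag tests with one comparison, which is the readability gain the follow-up commit describes.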
Christopher Faulet	71abc0cfd5	CLEANUP: mux-h1: Rename H1C_F_ST_ERROR and H1C_F_ST_SILENT_SHUT flags _ST_ part is removed from these 2 flags because they don't reflect a state. In addition, the H1 connection state will be handled in a dedicated enum.	2022-11-17 14:33:14 +01:00
Christopher Faulet	7fcbcc0e4c	CLEANUP: mux-h1; Rename H1S_F_ERROR flag into H1S_F_ERROR_MASK In fact, H1S_F_ERROR is not a flag but a mask. So rename it to make it clear.	2022-11-17 14:33:14 +01:00
Willy Tarreau	2fd6dbfb0d	BUILD: makefile: move the compiler option detection stuff to compiler.mk There's quite a large barely readable functions block in the makefile dedicated to compiler option support. It provides no value here and makes it harder to find user-configurable stuff, so let's move it to include/make/compiler.mk to keep the makefile a bit cleaner. It's better to keep the options themselves in the makefile however.	2022-11-17 10:56:35 +01:00
Willy Tarreau	8b5a998c9c	BUILD: makefile: use $(cmd_MAKE) in quiet mode It's better to see "make" entering a subdir than seeing nothing, so let's use a command name for make. Since make 3.81, "+" needs to be prepended in front of the command to pass the job server to the subdir.	2022-11-17 10:56:35 +01:00
Willy Tarreau	8dd672523f	BUILD: makefile: move default verbosity settings to include/make/verbose.mk The $(Q), $(V), $(cmd_xx) handling needs to be reused in sub-project makefiles and it's a pain to maintain inside the main makefile. Let's just move that into a new subdir include/make/ with a dedicated file "verbose.mk". It slightly cleans up the makefile in addition.	2022-11-17 10:56:35 +01:00
Willy Tarreau	a58af5b0a1	MINOR: dynbuf: switch allocation and release to macros to better track users When building with DEBUG_MEM_STATS, we only see b_alloc() and b_free() as users of the "buffer" pool, because all call places rely on these more convenient functions. It's annoying because it makes it very hard to see which parts of the code are consuming buffers. By switching the b_alloc() and b_free() inline functions to macros, we can now finally track the users of struct buffer, e.g: mux_h1.c:513 P_FREE size: 1275002880 calls: 38910 size/call: 32768 buffer mux_h1.c:498 P_ALLOC size: 1912438784 calls: 58363 size/call: 32768 buffer stream.c:763 P_FREE size: 4121493504 calls: 125778 size/call: 32768 buffer stream.c:759 P_FREE size: 2061697024 calls: 62918 size/call: 32768 buffer stream.c:742 P_ALLOC size: 3341123584 calls: 101963 size/call: 32768 buffer stream.c:632 P_FREE size: 1275068416 calls: 38912 size/call: 32768 buffer stream.c:631 P_FREE size: 637435904 calls: 19453 size/call: 32768 buffer channel.h:850 P_ALLOC size: 4116480000 calls: 125625 size/call: 32768 buffer channel.h:850 P_ALLOC size: 720896 calls: 22 size/call: 32768 buffer dynbuf.c:55 P_FREE size: 65536 calls: 2 size/call: 32768 buffer Let's do this since it doesn't change anything for the output code (beyond adding the call places). Interestingly the code even got slightly smaller now.	2022-11-16 11:44:26 +01:00
Willy Tarreau	f7c475df5c	MINOR: pool/debug: create a new pool_alloc_flag() macro This macro just serves as an intermediary for __pool_alloc() and forwards the flag. When DEBUG_MEM_STATS is set, it will be used to collect all pool allocations including those which need to pass an explicit flag. It's now used by b_alloc() which previously couldn't be tracked by DEBUG_MEM_STATS, causing some free() calls to have no corresponding allocations.	2022-11-16 11:44:26 +01:00
Willy Tarreau	91d31c9e1c	OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's key Similarly to the previous patch, it's better to keep a local copy of the new node's key instead of accessing it every time. This slightly reduces the code's size in the descent and further improves the load time to 7.45s.	2022-11-15 09:37:09 +01:00
Willy Tarreau	bf13e53964	OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's pfx looking at a perf profile while loading a conf with a huge map, it appeared that there was a hot spot on the access to the new node's prefix, which is unexpectedly being reloaded for each visited node during the tree descent. Better keep a copy of it because with large trees that don't fit into the L3 cache the memory bandwidth is scarce. Doing so reduces the load time from 8.0 to 7.5 seconds.	2022-11-15 09:37:09 +01:00
Willy Tarreau	e98d385819	MINOR: deinit: add a "quick-exit" option to bypass the deinit step Once in a while we spot a bug in the deinit code that is complex, especially when it has to deal with incomplete initializations, and the ability to bypass this step has regularly been raised. In addition for fast-reloading setups it could theoretically save some time. Tests have shown that very large configs can barely save ~100-150ms by skipping the deinit step. However the ability not to crash if a bug is encountered can occasionally help. This patch adds an option to do exactly this. It's obviously not enabled by default and the documentation discourages from using it, but this might be useful in the future.	2022-11-15 09:37:09 +01:00
Willy Tarreau	6342714052	CLEANUP: stick-table: remove the unused table->exp_next The ->exp_next field of the stick-table was probably useful in 1.5 but it currently only carries a copy of what the future value of the table's task's expire value will be, while it's systematically copied over there immediately after being assigned. As such it provides exactly a local variable. Let's remove it, as it costs atomic operations.	2022-11-14 18:20:38 +01:00
Remi Tricot-Le Breton	e239e4938d	BUG/MINOR: ssl: Fix potential overflow Coverity raised a potential overflow issue in these new functions that work on unsigned long long objects. They were added in commit `9b25982` "BUG/MEDIUM: ssl: Verify error codes can exceed 63". This patch needs to be backported alongside `9b25982`.	2022-11-14 15:30:54 +01:00
Willy Tarreau	7ed0597ce8	BUILD: sample: use __fallthrough in smp_is_rw() and smp_dup() This avoids three build warnings when preprocessing happens before compiling with gcc >= 7.	2022-11-14 11:14:02 +01:00
Willy Tarreau	1f344c0f30	BUILD: compiler: define a __fallthrough statement for switch/case When the code is preprocessed first and compiled later, such as when built under distcc, a lot of fallthrough warnings are emitted because the preprocessor has already stripped the comments. As an alternative, a "fallthrough" attribute was added with the same compilers as those which started to emit those warnings. However it's not portable to older compilers. Let's just define a __fallthrough statement that corresponds to this attribute on supported compilers and only switches to the classical empty do {} while (0) on other ones. This way the code will support being cleaned up using __fallthrough.	2022-11-14 11:14:02 +01:00
Willy Tarreau	2b080f713f	BUILD: compiler: add a default definition for __has_attribute() It happens that gcc since 5.x has this macro which is only mentioned once in the doc, associated with __builtin_has_attribute(). Clang had it at least since 3.0. In addition it validates #ifdef when present, so it's easy to detect it. Here we're providing a fallback to another macro __has_attribute_<name> so that it's possible to define that macro to the value 1 for older compilers when the attribute is supported.	2022-11-14 11:14:02 +01:00
Willy Tarreau	08e09f0b3c	BUILD: compiler: add a macro to detect if another one is set and equals 1 In order to simplify compiler-specific checks, we'll need to check if some attributes exist. In order to ease declarations, we'll only focus on those that exist and will set them to 1. Let's first add a macro aimed at doing this. Passed a macro name in argument, it will return 1 if the macro is defined and equals 1, otherwise it will return 0. This is based on the concatenation of the macro's value with a name to form the name of a macro which contains one comma, resulting in some other macros arguments being shifted by one when the macro is defined. As such it's only a matter of pushing both a 1 and a 0 and picking the correct argument to see the desired one. It was verified to work since at least gcc-3.4 so it should be portable enough.	2022-11-14 11:14:02 +01:00
Willy Tarreau	71de04134e	IMPORT: slz: define and use a __fallthrough statement for switch/case When the code is preprocessed first and compiled later, such as when built under distcc, the "fall through" comments are dropped and warnings are emitted. Let's use the alternative "fallthrough" attribute instead, that is supported by versions of gcc and clang that also produce this warning. This is libslz upstream commit 0fdf8ae218f3ecb0b7f22afd1a6b35a4f94053e2	2022-11-14 11:14:02 +01:00
Dridi Boukelmoune	4bd53c397c	IMPORT: slz: mention the potential header in slz_finish() There may be 2 or 10 bytes sent respectively for zlib and gzip. This is libslz upstream commit de1cac155ac730ba0491a6c866a510760c01fa9b	2022-11-14 11:14:02 +01:00
Willy Tarreau	eab4256a9c	IMPORT: xxhash: update xxHash to version 0.8.1 This is the latest released version and a minor update on top of the current one (0.8.0). It addresses a few build issues (some for which patches were already backported), and particularly the fallthrough issue by using an attribute instead of a comment.	2022-11-14 11:14:02 +01:00
Willy Tarreau	eedcea8b90	BUILD: debug: remove unnecessary quotes in HA_WEAK() calls HA_WEAK() is supposed to take a symbol in argument, not a string, since the asm statements it produces already quote the argument. Having it quoted twice doesn't work on older compilers and was the only reason why DEBUG_MEM_STATS didn't work on older compilers.	2022-11-14 11:12:49 +01:00
Amaury Denoyelle	24e9961a8f	MINOR: cli: define usermsgs print context CLI 'add server' handler relies on usermsgs_ctx to display errors from internal functions on CLI output. This may also be extended to other handlers. However, to not clutter stderr from other contexts, usermsgs_ctx must be reset when it is not needed anymore. This operation cannot be conducted in the CLI parse handler as display is conducted after it. To achieve this, define new CLI states CLI_ST_PRINT_UMSG / CLI_ST_PRINT_UMSGERR. Their principle is nearly identical to that of the states for dynamic message printing.	2022-11-10 16:42:47 +01:00
Amaury Denoyelle	56f50a03b7	CLEANUP: cli: rename dynamic error printing state Rename CLI_ST_PRINT_FREE to CLI_ST_PRINT_DYNERR. Most notably, this highlights that this is reserved to error printing. This is done to ensure consistency between CLI_ST_PRINT/CLI_ST_PRINT_DYN and CLI_ST_PRINT_ERR/CLI_ST_PRINT_DYNERR. The name is also consistent with the function cli_dynerr() which activates it.	2022-11-10 16:42:47 +01:00
William Lallemand	960fb74cae	MEDIUM: ssl: {ca,crt}-ignore-err can now use error constant name The ca-ignore-err and crt-ignore-err directives are now able to use the openssl X509_V_ERR constant names instead of the numerical values. This allows a configuration to survive an OpenSSL upgrade, because the numerical ID can change between versions. For example X509_V_ERR_INVALID_CA was 24 in OpenSSL 1 and is 79 in OpenSSL 3. The list of errors must be updated when a new major OpenSSL version is released.	2022-11-10 13:28:37 +01:00
Remi Tricot-Le Breton	9b25982716	BUG/MEDIUM: ssl: Verify error codes can exceed 63 The CRT and CA verify error codes were stored in 6 bits each in the xprt_st field of the ssl_sock_ctx meaning that only error code up to 63 could be stored. Likewise, the ca-ignore-err and crt-ignore-err options relied on two unsigned long longs that were used as bitfields for all the ignored error codes. On the latest OpenSSL1.1.1 and with OpenSSLv3 and newer, verify errors have exceeded this value so these two storages must be increased. The error codes will now be stored on 7 bits each and the ignore-err bitfields are replaced by a big enough array and dedicated bit get and set functions. It can be backported on all stable branches. [wla: let it be tested a little while before backport] Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-10 11:45:48 +01:00
Ilya Shipitsin	4a689dad03	CLEANUP: assorted typo fixes in the code and comments This is 32nd iteration of typo fixes	2022-10-30 17:17:56 +01:00
Amaury Denoyelle	735b44f5df	MINOR: quic: add counter for interrupted reception Add a new counter "quic_rxbuf_full". It is incremented each time quic_sock_fd_iocb() is interrupted on full buffer. This should help to debug github issue #1903. It is suspected that QUIC receiver buffers are full which in turn cause quic_sock_fd_iocb() to be called repeatedly resulting in a high CPU consumption.	2022-10-27 18:35:42 +02:00
Amaury Denoyelle	bbb1c68508	BUG/MINOR: quic: fix subscribe operation Subscribing was not properly designed between quic-conn and quic MUX layers. Align this with other haproxy components: <subs> field is moved from the MUX to the quic-conn structure. All mentions of qcc MUX are cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe(). Thanks to this change, ACK reception notification has been simplified. It's now unnecessary to check for the MUX existence before waking it. Instead, if <subs> quic-conn field is set, just wake up the upper layer tasklet without mentioning MUX. This should probably be extended to other parts of the quic-conn code. This should be backported up to 2.6.	2022-10-26 18:18:26 +02:00
Frédéric Lécaille	36d1565640	MINOR: peers: Support for peer shards Add "shards" new keyword for "peers" section to configure the number of peer shards attached to such sections. This impacts all the stick-tables attached to the section. Add "shard" new "server" parameter to configure the peers which participate in all the stick-tables contents distribution. Each peer receives the stick-tables updates only for keys with this shard value as distribution hash. The "shard" value is stored in ->shard new server struct member. cfg_parse_peers() which is the function which is called to parse all the lines of a "peers" section is modified to parse the "shards" parameter stored in ->nb_shards new peers struct member. Add srv_parse_shard() new callback into server.c to parse the "shard" parameter. Implement stksess_getkey_hash() to compute the distribution hash for a stick-table key as the 64-bit xxhash of the key concatenated to the stick-table name. This function is called by stksess_setkey_shard(), itself called by the already implemented function which creates a new stick-table key (stksess_new()). Add ->idlen new stktable struct member to store the stick-table name length to not have to compute it each time a stick-table key hash is computed.	2022-10-24 10:55:53 +02:00
Amaury Denoyelle	7941ead3aa	MINOR: quic: display unknown error sendto counter on stat page This patch completes the previous incomplete commit. The new counter sendto_err_unknown is now displayed on stats page/CLI show stats. This is related to github issue #1903. This should be backported up to 2.6.	2022-10-24 10:52:59 +02:00
Amaury Denoyelle	1d9f170edd	MINOR: quic: do not crash on unhandled sendto error Remove ABORT_NOW() statement on unhandled sendto error. Instead use a dedicated counter sendto_err_unknown to report these cases. If we detect increment of this counter, strace can be used to detect errno value : $ strace -p $(pidof haproxy) -f -e trace=sendto -Z This should be backported up to 2.6. This should help to debug github issue #1903.	2022-10-24 10:18:44 +02:00
Amaury Denoyelle	176174f7e4	BUG/MINOR: mux-quic: complete flow-control for uni streams Max stream data was not enforced and respected for local/remote uni streams. Previously, qcs instances incorrectly reused the limit defined for bidirectional ones. This is now fixed. Two fields are added in the qcc connection structure: * value for local flow control to enforce on remote uni streams * value for remote flow control to respect on local uni streams These two values can be reused to properly initialize the msd field of a qcs instance in qcs_new(). The rest of the code is similar. This must be backported up to 2.6.	2022-10-21 17:31:18 +02:00
Aurelien DARRAGON	e951c3435c	MINOR: list: adding MT_LIST_APPEND_LOCKED macro adding a new mt macro: MT_LIST_APPEND_LOCKED. This macro may be used to append an item to an existing list, like MT_LIST_APPEND. But here the item will be forced into locked/busy state prior to appending, so that it is already referenced in the list while still preventing concurrent accesses until we decide to unlock it. The macro returns a struct mt_list "np", that is needed at unlock time using regular MT_LIST_UNLOCK_ELT macro.	2022-10-21 16:26:27 +02:00
Aurelien DARRAGON	18c284c126	DOC/MINOR: list: fixing MT_LIST_LOCK_ELT macro documentation MT_LIST_LOCK_ELT macro was documented with an ambiguous usage restriction, implying that concurrent list deletion was not supported. But it seems that either the code has evolved, or the comment is wrong because the locking behavior implemented here is exactly the same one used in MT_LIST_DELETE, and no such restriction is described for MT_LIST_DELETE. I made some tests to make sure concurrent MT_LIST_DELETE (or deletion from mt_list_for_each_entry_safe) don't cause unexpected results. At the present time, this macro is not used, this fix only targets upcoming developments that might rely on this. No backport needed.	2022-10-21 16:26:27 +02:00
Aurelien DARRAGON	bcaa401646	MINOR: list: fixing typo in MT_LIST_LOCK_ELT A minor typo was made in MT_LIST_LOCK_ELT, preventing haproxy from compiling if MT_LIST_LOCK_ELT is used in the code. Today, the macro is unused, and that's the reason why the typo has remained unnoticed for such a long time. Fixing it so it can be used in upcoming developments. No backport required.	2022-10-21 16:26:27 +02:00
William Lallemand	bb581423b3	BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient When the lua task finishes before the httpclients that are associated with it, there is a risk that an httpclient tries to task_wakeup() the lua task, which does not exist anymore. To fix this issue, the httpclients used in a lua task are stored in a list, and they are destroyed at the end of the lua task. Must be backported in 2.5 and 2.6.	2022-10-20 18:47:15 +02:00
Amaury Denoyelle	deb7c87f55	MINOR: quic: define first packet flag Received packets treatment has some difference regarding if this is the first one or not of the encapsulating datagram. Previously, this was set via a function argument. Simplify this by defining a new Rx packet flag named QUIC_FL_RX_PACKET_DGRAM_FIRST. This change does not have functional impact. It will simplify API when qc_lstnr_pkt_rcv() is broken into several functions : their number of arguments will be reduced thanks to this patch. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	845169da58	MINOR: quic: extend pn_offset field from quic_rx_packet pn_offset field was only set if header protection cannot be removed. Extend the usage of this field : it is now set every time on packet parsing in qc_lstnr_pkt_rcv(). This change helps to clean up API of Rx functions by removing unnecessary variables and function argument. This change has no functional impact. It is a part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of FD-owned socket patches. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Amaury Denoyelle	0eae57273b	MINOR: quic: add version field on quic_rx_packet Add a new field version on quic_rx_packet structure. This is set on header parsing in qc_lstnr_pkt_rcv() function. This change has no functional impact. It is a part of a refactoring series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of FD-owned socket patches. This should be backported up to 2.6.	2022-10-19 18:12:56 +02:00
Willy Tarreau	f5a0c8abf5	MEDIUM: quic: respect the threads assigned to a bind line Right now the QUIC thread mapping derives the thread ID from the CID by dividing by global.nbthread. This is a problem because this makes QUIC work on all threads and ignores the "thread" directive on the bind lines. In addition, only 8 bits are used, which is no more compatible with the up to 4096 threads we may have in a configuration. Let's modify it this way: - the CID now dedicates 12 bits to the thread ID - on output we continue to place the TID directly there. - on input, the value is extracted. If it corresponds to a valid thread number of the bind_conf, it's used as-is. - otherwise it's used as a rank within the current bind_conf's thread mask so that in the end we still get a valid thread ID for this bind_conf. The extraction function now requires a bind_conf in order to get the group and thread mask. It was better to use bind_confs now as the goal is to make them support multiple listeners sooner or later.	2022-10-13 18:08:05 +02:00
William Lallemand	eba6a54cd4	MINOR: logs: startup-logs can use a shm for logging the reload When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use an shm which is used to keep the logs when switching to mworker wait mode. This allows keeping the failed reload logs. When allocating the startup-logs at first start of the process, haproxy will do a shm_open with a unique path using the PID of the process; the file is unlinked immediately so we don't leave unwanted files behind. The fd resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD environment variable so it can be mmapped again when switching to wait mode. When forking children, the process copies the mmap to a mallocated ring so we never share the same memory section between the master and the workers. When switching to wait mode, the shm is not used anymore as it is also copied to a mallocated structure. This allows using the "show startup-logs" command over the master CLI, to get the logs of the latest startup or reload. This way the logs of the latest failed reload are also kept. This is only activated on the linux-glibc target for now.	2022-10-13 16:50:22 +02:00
William Lallemand	35df34223b	MINOR: buffers: split b_force_xfer() into b_cpy() and b_force_xfer() Split the b_force_xfer() into b_ncat() and b_force_xfer(). The previous b_force_xfer() implementation was basically a copy with a b_del on the src buffer. Keep this implementation to make b_ncat(), and just call b_ncat() + b_del() into b_force_xfer().	2022-10-13 16:45:28 +02:00
William Lallemand	9e4ead3095	MINOR: ring: ring_cast_from_area() cast from an allocated area Cast an unified ring + storage area to a ring from area, without reinitializing the data buffer. Reinitialize the waiters and the lock. It helps retrieving a previously allocated ring, from an mmap for example.	2022-10-13 16:45:28 +02:00
Amaury Denoyelle	1cba8d60f3	CLEANUP: quic: improve naming for rxbuf/datagrams handling QUIC datagrams are read from a random thread. They are then redispatched to the connection thread according to the first packet DCID. These operations are implemented through a special buffer designed to avoid locking. Refactor this code with the following changes : * <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also renamed to highlight its attach point to a receiver. * <quic_dgram> and <quic_receiver_buf> definition are moved to quic_sock-t.h. This helps to reduce the size of quic_conn-t.h. * <quic_dgram> list elements are renamed to highlight their attach point into a <quic_receiver_buf> and a <quic_dghdlr>. This should be backported up to 2.6.	2022-10-13 11:06:48 +02:00
Amaury Denoyelle	8c4d062d25	CLEANUP: quic: remove unused rxbufs member in receiver rxbuf is the structure used to store QUIC datagrams and redispatch them to the connection thread. Each receiver manages a list of rxbuf. This was stored both as an array and a mt_list. Currently, only mt_list is needed so remove <rxbufs> member from receiver structure. This should be backported up to 2.6.	2022-10-13 11:05:41 +02:00
Frédéric Lécaille	e1a49cfd4d	MINOR: quic: Split the secrets key allocation in two parts Implement quic_tls_secrets_keys_alloc()/quic_tls_secrets_keys_free() to allocate the memory for only one direction (RX or TX). Modify ha_quic_set_encryption_secrets() to call these functions for one of these directions (or both). So, from now on we can rely on the value of the secret keys to know if it was derived. Remove QUIC_FL_TLS_SECRETS_SET flag which is no more useful. Consequently, the secrets are dumped by the traces only if derived. Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	4aa7d8197a	BUG/MINOR: quic: Stalled 0RTT connections with big ClientHello TLS message This issue was reproduced with -Q picoquic client option to split a big ClientHello message into two Initial packets and haproxy as server without any knowledge of any previous 0RTT session (restarted after a first 0RTT session). The 0RTT received packets were removed from their queue when the second Initial packet was parsed, and the QUIC handshake state never progressed and remained at Initial state. To avoid such situations, after having treated some Initial packets we always check if there are 0RTT packets to parse and we never remove them from their queue. This will be done after the handshake is completed or upon idle timeout expiration. Also add more traces to be able to analyze the handshake progression. Tested with ngtcp2 and picoquic Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	9f9263ed13	MINOR: quic: Use a non-contiguous buffer for RX CRYPTO data Implement quic_get_ncbuf() to dynamically allocate a new ncbuf to be attached to any quic_cstream struct which needs such a buffer. Note that there is no quic_cstream for 0RTT encryption level. quic_free_ncbuf() is added to release the memory allocated for a non-contiguous buffer. Modify qc_handle_crypto_frm() to call this function and allocate an ncbuf for crypto data which are not received in order. The crypto data which are received in order are not buffered but provided to the TLS stack (calling qc_provide_cdata()). Modify qc_treat_rx_crypto_frms() which is called after having provided the in order received crypto data to the TLS stack to provide again the remaining crypto data which has been buffered, if possible (if they are in order). Each time buffered CRYPTO data were consumed, we try to release the memory allocated for the non-contiguous buffer (ncbuf). Also move rx.crypto.offset quic_enc_level struct member to rx.offset quic_cstream struct member. Must be backported to 2.6.	2022-10-13 10:12:03 +02:00
Frédéric Lécaille	7e3f7c47e9	MINOR: quic: New quic_cstream object implementation Add new quic_cstream struct definition to implement the CRYPTO data stream. This is a simplification of the qcs object (QUIC streams) for the CRYPTO data without any information about the flow control. They are not attached to any tree, but to a QUIC encryption level, one by encryption level except for the early data encryption level (for 0RTT). A stream descriptor is also allocated for each CRYPTO data stream. Must be backported to 2.6	2022-10-13 10:12:03 +02:00
Willy Tarreau	d114f4a68f	MEDIUM: checks: spread the checks load over random threads The CPU usage pattern was found to be high (5%) on a machine with 48 threads and only 100 servers checked every second. That was supposed to be only 100 connections per second, which should be very cheap. It was figured that due to the check tasks unbinding from any thread when going back to sleep, they're queued into the shared queue. Not only this requires to manipulate the global queue lock, but in addition it means that all threads have to check the global queue before going to sleep (hence take a lock again) to figure how long to sleep, and that they would all sleep only for the shortest amount of time to the next check, one would pick it and all other ones would go down to sleep waiting for the next check. That's perfectly visible in time-to-first-byte measurements. A quick test consisting in retrieving the stats page in CSV over a 48-thread process checking 200 servers every 2 seconds shows the following tail: percentile ttfb(ms) 99.98 2.43 99.985 5.72 99.99 32.96 99.995 82.176 99.996 82.944 99.9965 83.328 99.997 83.84 99.9975 84.288 99.998 85.12 99.9985 86.592 99.999 88 99.9995 89.728 99.9999 100.352 One solution could consist in forcefully binding checks to threads at boot time, but that's annoying, will cause trouble for dynamic servers and may cause some skew in the load depending on some server patterns. Instead here we take a different approach. A check remains bound to its thread for as long as possible, but upon every wakeup, the thread's load is compared with another random thread's load. If it's found that that other thread's load is less than half of the current one's, the task is bounced to that thread. In order to prevent that new thread from doing the same, we set a flag "CHK_ST_SLEEPING" that indicates that it just woke up and we're bouncing the task only on this condition.
Tests have shown that the initial load was very unfair before, with a few checks threads having a load of 15-20 and the vast majority having zero. With this modification, after two "inter" delays, the load is either zero or one everywhere when checks start. The same test shows a CPU usage that significantly drops, between 0.5 and 1%. The same latency tail measurement is much better, roughly 10 times smaller: percentile ttfb(ms) 99.98 1.647 99.985 1.773 99.99 4.912 99.995 8.76 99.996 8.88 99.9965 8.944 99.997 9.016 99.9975 9.104 99.998 9.224 99.9985 9.416 99.999 9.8 99.9995 10.04 99.9999 10.432 In fact one difference here is that many threads work while in the past they were waking up and going down to sleep after having perturbed the shared lock. Thus it is anticipated that this will scale way smoother than before. Under strace it's clearly visible that all threads are sleeping for the time it takes to relaunch a check, there's no more thundering herd wakeups. However it is also possible that in some rare cases such as very short check intervals smaller than a scheduler's timeslice (such as 4ms), some users might have benefited from the work being concentrated on fewer threads and would instead observe a small increase of apparent CPU usage due to more total threads waking up even if that's for less work each and less total work. That's visible with 200 servers at 4ms where show activity shows that a few threads were overloaded and others doing nothing. It's not a problem, though as in practice checks are not supposed to eat much CPU and to wake up fast enough to represent a significant load anyway, and the main issue they could have been causing (aside the global lock) is an increase in last-percentile latency.	2022-10-12 21:49:30 +02:00
Christopher Faulet	c8db114afc	MINOR: flags/mux-fcgi: Decode FCGI connection and stream flags The new functions fconn_show_flags() and fstrm_show_flags() decode the flags state into a string, and are used by dev/flags: $ /dev/flags/flags fconn 0x3100 fconn->flags = FCGI_CF_GET_VALUES | FCGI_CF_KEEP_CONN | FCGI_CF_MPXS_CONNS ./dev/flags/flags fstrm 0x3300 fstrm->flags = FCGI_SF_WANT_SHUTW | FCGI_SF_WANT_SHUTR | FCGI_SF_OUTGOING_DATA | FCGI_SF_BEGIN_SENT	2022-10-12 17:10:41 +02:00
Christopher Faulet	3965aa7494	REORG: mux-fcgi: Extract flags and enums into mux_fcgi-t.h The same was performed for the H2 and H1 multiplexers. FCGI connection and stream flags are moved in a dedicated header file. It will be mainly used to be able to decode mux-fcgi flags from the flags utility. In this patch, we move the flags and enums to mux_fcgi-t.h, as well as the two state decoding inline functions.	2022-10-12 17:10:37 +02:00
Willy Tarreau	dbae89e09c	MEDIUM: stick-table: always use atomic ops to requeue the table's task We're generalizing the change performed in previous commit "MEDIUM: stick-table: requeue the expiration task out of the exclusive lock" to stktable_requeue_exp() so that it can also be used by callers of __stktable_store(). At the moment there's still no visible change since it's still called under the write lock. However, the previous code in stktable_touch_with_exp() was updated to use this function.	2022-10-12 14:19:05 +02:00
Willy Tarreau	8d3c3336f9	MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock stream_store_counters() calls stksess_kill_if_expired() for each active counter. And this one takes an exclusive lock on the table before checking if it has any work to do (hint: it almost never has since it only wants to delete expired entries). However a lock is still needed for now to protect the ref_cnt, but we can do it atomically under the read lock. Let's change the mechanism. Now what we do is to check out of the lock if the entry is expired. If it is, we take the write lock, expire it, and decrement the refcount. Otherwise we just decrement the refcount under a read lock. With this change alone, the config based on 3 trackers without the previous patches saw a 2.6x improvement, but here it doesn't yet change anything because some heavy contention remains on the lookup part.	2022-10-12 14:19:05 +02:00
Willy Tarreau	9f5cb435b6	MINOR: stick-table: move the write lock inside stktable_touch_with_exp() Taking the write lock prior to entering that function is a problem because this function is full of conditions that most of the time can lead to eliminating the lock. This commit first moves the write lock inside the function and passes the extra argument required to implement stktable_touch_remote() and stktable_touch_local(). It also renames the function to remove the underscores since there's no other variant and it's exported under this name (probably an old rename that was not propagated). The code was stressed under 48 threads using 3 trackers on the same table. It already shows a tiny 3% improvement from 187k to 193k rps.	2022-10-12 14:19:05 +02:00
Willy Tarreau	76642223f0	MEDIUM: stick-table: switch the table lock to rwlock Right now a spinlock is used, but most accesses are for reads, so let's switch the lock to an rwlock and switch all accesses to exclusive locks for now. There should be no visible difference at this point.	2022-10-12 14:19:05 +02:00
Willy Tarreau	f6a42c3a37	MINOR: freq_ctr: use the thread's local time whenever possible Right now when dealing with freq_ctr updates, we're using the process-wide monotonic time, and accessing it is expensive since every thread needs to update it, so this adds some contention. However we don't need it all the time, the thread's local time is most of the time strictly equal to the global time, and may be off by one millisecond when the global time is switched to the next one by another thread, and in this case we don't want to use the local time because it would risk to cause a rotation of the counter. But that's precisely the condition we're already relying on for the slow path! What this patch does is to add a check for the period against the local time prior to anything else, and immediately return after updating the counter if still within the period, otherwise fall back to the existing code. Given that the function starts to inflate a bit, it was split between a very short inline part that does the hot path, and the slower fallback that's in a cold function. It was measured that on a 24-CPU machine it was called ~0.003% of the time. The resulting improvement sits between 2 and 3% at 500k req/s tracking an http_req_rate counter.	2022-10-12 14:19:05 +02:00
Willy Tarreau	b13044cc1a	MINOR: plock: support disabling exponential back-off The new macro PLOCK_DISABLE_EBO may be defined to disable exponential backoff. This can be useful to more easily spot functions that cause contention. In this case the CPU will be spent inside the functions themselves instead of the pl_wait_unlock_{long,int}() functions, making them easier to spot using "perf top" even if that causes a significant degradation of the thread scalability.	2022-10-12 14:19:05 +02:00
Willy Tarreau	cab054bbf9	CLEANUP: quic/receiver: remove the now unused tx_qring list The tx_qrings[] and tx_qring_list in the receiver are not used anymore since commit `f2476053f` ("MINOR: quic: replace custom buf on Tx by default struct buffer"), the only place where they're referenced was in quic_alloc_tx_rings_listener(), which by the way implies that these were not even freed on exit. Let's just remove them. This should be backported to 2.6 since the commit above also was.	2022-10-11 08:40:38 +02:00
Amaury Denoyelle	97ecc7a8ea	MEDIUM: quic: retrieve frontend destination address Retrieve the frontend destination address for a QUIC connection. This address is retrieved from the first received datagram and then stored in the associated quic-conn. This feature relies on IP_PKTINFO or affiliated flags support on the socket. This flag is set for each QUIC listener in sock_inet_bind_receiver(). To retrieve the destination address, recvfrom() has been replaced by recvmsg() syscall. This operation and parsing of msghdr structure has been extracted in a wrapper quic_recv(). This change is useful to finalize the implementation of 'dst' sample fetch. As such, quic_sock_get_dst() has been edited to return local address from the quic-conn. As a best effort, if local address is not available due to kernel non-support of IP_PKTINFO, address of the listener is returned instead. This should be backported up to 2.6.	2022-10-10 11:48:27 +02:00
Amaury Denoyelle	2ed840015f	MINOR: quic: limit usage of ssl_sock_ctx in favor of quic_conn Continue on the cleanup of QUIC stack and components. quic_conn uses internally a ssl_sock_ctx to handle mandatory TLS QUIC integration. However, this is merely as a convenience, and it is not equivalent to stackable ssl xprt layer in the context of HTTP1 or 2. To better emphasize this, ssl_sock_ctx usage in quic_conn has been removed wherever it is not necessary : namely in functions not related to TLS. quic_conn struct now contains its own wait_event for tasklet quic_conn_io_cb(). This should be backported up to 2.6.	2022-10-05 11:08:32 +02:00
Willy Tarreau	922a907926	MINOR: fd: add a new function to only raise RLIMIT_NOFILE In issue #1866 an issue was reported under docker, by which a user cannot lower the number of FD needed. It looks like a restriction imposed in this environment, but it results in an error while it ought not have to in the case of shrinking. This patch adds a new function raise_rlim_nofile() that takes the desired new setting, compares it to the current one, and only calls setrlimit() if one of the values in the new setting is larger than the older one. As such it will continue to emit warnings and errors in case of failure to raise the limit but will never shrink it. This patch is only preliminary to another one, but will have to be backported where relevant (likely only 2.6).	2022-10-04 08:38:47 +02:00
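The "raise only, never shrink" rule described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names `rlim_needs_raise()` and `raise_only_nofile()` are hypothetical, and only the standard `getrlimit()`/`setrlimit()` calls are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if at least one field of 'want' is
 * larger than the matching field of 'cur', 0 otherwise.
 */
static int rlim_needs_raise(const struct rlimit *cur, const struct rlimit *want)
{
	assert(cur != NULL && want != NULL);
	return want->rlim_cur > cur->rlim_cur || want->rlim_max > cur->rlim_max;
}

/* Hypothetical sketch of the behaviour described in the commit message:
 * read the current RLIMIT_NOFILE and call setrlimit() only when the new
 * setting is larger somewhere. A request that would only shrink the limit
 * is silently accepted (returns 0) and changes nothing. Returns -1 when
 * getrlimit() or setrlimit() fails.
 */
static int raise_only_nofile(const struct rlimit *want)
{
	struct rlimit cur;

	if (getrlimit(RLIMIT_NOFILE, &cur) != 0)
		return -1;
	if (!rlim_needs_raise(&cur, want))
		return 0;
	return setrlimit(RLIMIT_NOFILE, want);
}
```

With this shape, a caller in a restricted environment that asks for a smaller limit gets a success without any setrlimit() call, while a failed attempt to raise the limit is still reported.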
Amaury Denoyelle	92fa63f735	CLEANUP: quic: create a dedicated quic_conn module xprt_quic module was too large and did not reflect the true architecture by contrast to the other protocols in haproxy. Extract code related to XPRT layer and keep it under xprt_quic module. This code should only contain a simple API to communicate between QUIC lower layer and connection/MUX. The vast majority of the code has been moved into a new module named quic_conn. This module is responsible for the implementation of QUIC lower layer. Conceptually, it overlaps with TCP kernel implementation when comparing QUIC and HTTP1/2 stacks of haproxy. This should be backported up to 2.6.	2022-10-03 16:25:17 +02:00
Amaury Denoyelle	a2639383ec	CLEANUP: quic: remove duplicated varint code from xprt_quic.h There was some identical code between xprt_quic and quic_enc modules. This concerns helper on QUIC varint type. Keep only the version in quic_enc file : this should help to reduce dependency on xprt_quic module. Note that quic_max_int_by_size() has been removed and is replaced by the identical quic_max_int(). This should be backported up to 2.6.	2022-10-03 16:25:17 +02:00
Amaury Denoyelle	5c25dc5bfd	CLEANUP: quic: fix headers Clean up quic sources by adjusting headers list included depending on the actual dependency of each source file. On some occasions, xprt_quic.h was removed from included list. This is useful to help reducing the dependency on this single file and cleaning up QUIC haproxy architecture. This should be backported up to 2.6.	2022-10-03 16:25:17 +02:00
Amaury Denoyelle	f3c40f83fb	BUG/MINOR: quic: adjust quic_tls prototypes Two prototypes in quic_tls module were not identical to the actual function definition. * quic_tls_decrypt2() : the second argument const attribute is not present, to be able to use it with EVP_CIPHER_CTX_ctrl(). As a consequence of this change, token field of quic_rx_packet is now declared as non-const. * quic_tls_generate_retry_integrity_tag() : the second argument type differs between the two. Adjust this by fixing it as unsigned char to match EVP_EncryptUpdate() SSL function. This situation did not seem to have any visible effect. However, this is clearly an undefined behavior and should be treated as a bug. This should be backported up to 2.6.	2022-10-03 16:25:17 +02:00
Amaury Denoyelle	a19bb6f0b2	CLEANUP: quic: remove global var definition in quic_tls header Some variables related to QUIC TLS were defined in a header file : their definitions are now moved properly in the implementation file, with only declarations in the header. This should be backported up to 2.6.	2022-10-03 16:25:17 +02:00
Willy Tarreau	406efb96d1	BUG/MINOR: backend: only enforce turn-around state when not redispatching In github issue #1878, Bart Butler reported observing turn-around states (1 second pause) after connection retries going to different servers, while this ought not happen. In fact it does happen because back_handle_st_cer() enforces the TAR state for any algo that's not round-robin. This means that even leastconn has it, as well as hashes after the number of servers changed. Prior to doing that, the call to stream_choose_redispatch() has already had a chance to perform the correct choice and to check the algo and the number of retries left. So instead we should just let that function deal with the algo when needed (and focus on deterministic ones), and let the former just obey. Bart confirmed that the fixed version works as expected (no more delays during retries). This may be backported to older releases, though it doesn't seem very important. At least Bart would like to have it in 2.4 so let's go there for now after it has cooked a few weeks in 2.6.	2022-10-03 15:04:55 +02:00
Willy Tarreau	8522348482	BUG/MAJOR: conn-idle: fix hash indexing issues on idle conns Idle connections do not work on 32-bit machines due to an alignment issue causing the connection nodes to be indexed with their lower 32-bits set to zero and the higher 32 ones containing the 32 lower bits of the hash. The cause is the use of ebmb_node with an aligned data, as on this platform ebmb_node is only 32-bit aligned, leaving a hole before the following hash which is a uint64_t: $ pahole -C conn_hash_node ./haproxy struct conn_hash_node { struct ebmb_node node; /* 0 20 */ /* XXX 4 bytes hole, try to pack */ int64_t hash; /* 24 8 */ struct connection *conn; /* 32 4 */ /* size: 40, cachelines: 1, members: 3 */ /* sum members: 32, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 40 bytes */ }; Instead, eb64 nodes should be used when it comes to simply storing a 64-bit key, and that is what this patch does. For backports, a variant consisting in simply marking the "hash" member with a "packed" attribute on the struct also does the job (tested), and might be preferable if the fix is difficult to adapt. Only 2.6 and 2.5 are affected by this.	2022-10-03 12:06:36 +02:00
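The 4-byte hole reported by pahole above can be reproduced on any platform with a stand-in layout. This is only a sketch: `struct fake_node` is a hypothetical 20-byte, 4-byte-aligned placeholder for the 32-bit `ebmb_node`, and the 8-byte alignment of the hash is forced with `_Alignas` so the result does not depend on the host ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit ebmb_node: 20 bytes, 4-byte aligned. */
struct fake_node {
	uint32_t words[5];
};

/* Same shape as the struct shown in the commit message: a node followed
 * by a 64-bit hash whose alignment is forced to 8 bytes.
 */
struct padded_hash_node {
	struct fake_node node;
	_Alignas(8) uint64_t hash;
};

static_assert(sizeof(struct fake_node) == 20, "stand-in node must be 20 bytes");

/* Number of padding bytes the compiler inserts between the end of the
 * node and the start of the hash. A tree that reads its key from the
 * bytes immediately following the node would see these padding bytes
 * first, then only part of the hash.
 */
static size_t hole_before_hash(void)
{
	return offsetof(struct padded_hash_node, hash) - sizeof(struct fake_node);
}
```

Here `hole_before_hash()` evaluates to 4, matching the "4 bytes hole" line of the pahole output. Storing the 64-bit key inside the node itself, as the commit does by switching to eb64 nodes, removes the dependency on what follows the node in memory.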
Erwan Le Goas	d78693178c	MINOR: cli: correct commentary and replace 'set global-key' name Correct a commentary in include/haproxy/global-t.h and include/haproxy/tools.h Replace the CLI command 'set global-key <key>' by 'set anon global-key <key>' in order to find it easily when you don't remember it, the recommendation can guide you when you just type 'set anon'. No backport needed, except if anonymization mechanism is backported.	2022-09-29 10:53:15 +02:00
Erwan Le Goas	f30c5d7666	MINOR: config: Add option line when the configuration file is dumped Add an option to dump the line numbers of the configuration file when it's dumped. Other options can be easily added. Options are separated by ',' when typing the command line: './haproxy -dC[key],line -f [file]' No backport needed, except if anonymization mechanism is backported.	2022-09-29 10:53:15 +02:00
Erwan Le Goas	5eef1588a1	MINOR: tools: modify hash_ipanon in order to use it in cli Add a parameter hasport to return a simple hash or ipstring when ipstring has no port. Doesn't hash if scramble is null. Add option PA_O_PORT_RESOLVE to str2sa_range. Add a case UNIX. These modifications allow using hash_ipanon in cli section in order to dump the same anonymization of address in the configuration file and with CLI. No backport needed, except if anonymization mechanism is backported.	2022-09-29 10:53:14 +02:00
Willy Tarreau	56ac2cbf58	CLEANUP: list: fix again some style issues in the recent comments While reading the recent changes around mt_list_for_each_entry_safe() I noticed a spurious "q" at the beginning of a line introduced by commit `455843721` ("CLEANUP: list: Fix mt_list_for_each_entry_safe indentation") and that visually confusing multi-line comments missing the trailing '\' character were introduced by previous commit `60cffbaca` ("MINOR: list: documenting mt_list_for_each_entry_safe() macro"), which at first glance made the macro look broken. In addition, multi-line comments must end with a "*/" on its own line to instantly spot where it ends without having to read the whole line, like this: /* we know from the above that foo is always valid * here so it's safe to end the string: */ *(unsigned char *)foo = 0; Not like this: /* we know from the above that foo is always valid * here so it's safe to end the string: */ *(unsigned char *)foo = 0; Finally, macro's main comment mentioned the wrong macro name and types, and was randomly indented.	2022-09-27 08:04:08 +02:00
William Lallemand	0a0512f76d	MINOR: mworker/cli: the mcli_reload bind_conf only send the reload status Upon a reload with the master CLI, the FD of the master CLI session is received by the internal socketpair listener. This session is used to display the status of the reload and then will close.	2022-09-24 16:35:23 +02:00
William Lallemand	56f73b21a5	MINOR: mworker: stores the mcli_reload bind_conf Stores the mcli_reload bind_conf in order to identify it later.	2022-09-24 15:56:25 +02:00
William Lallemand	21623b5949	MINOR: mworker: mworker_cli_proxy_new_listener() returns a bind_conf mworker_cli_proxy_new_listener() now returns a bind_conf * or NULL upon failure.	2022-09-24 15:51:27 +02:00
Christopher Faulet	4558437211	CLEANUP: list: Fix mt_list_for_each_entry_safe indentation It makes the macro easier to read.	2022-09-21 16:02:40 +02:00
Aurelien DARRAGON	60cffbaca5	MINOR: list: documenting mt_list_for_each_entry_safe() macro - Adding some comments in mt_list_for_each_entry_safe() macro to make it somehow understandable. The macro is performing critical stuff but was not documented at all. Moreover, nested loops with conditional tricks are used, making it even harder to understand the steps performed in it. - Updating mt_list_for_each_entry_safe usage example. - Added a "FIXME:" comment in a specific condition that seems to never be reached even when deeply stress-testing mt_lists (using test_list binary provided in the repository).	2022-09-21 16:02:40 +02:00
Willy Tarreau	a700420671	MINOR: clock: split local and global date updates Pollers that support busy polling spend a lot of time (and cause contention) updating the global date when they're looping over themselves while it serves no purpose: what's needed is only an update on the local date to know when to stop looping. This patch splits clock_update_date() into a pair of local and global update functions, so that pollers can be easily improved.	2022-09-21 09:06:28 +02:00
Aurelien DARRAGON	ae1e14d65b	CLEANUP: tools: removing escape_chunk() function Func is not used anymore. See e3bde807d.	2022-09-20 16:25:30 +02:00
Aurelien DARRAGON	c5bff8e550	BUG/MINOR: log: improper behavior when escaping log data Patrick Hemmer reported an improper log behavior when using log-format to escape log data (+E option): Some bytes were truncated from the output: - escape_string() function now takes an extra parameter that allows the caller to specify input string stop pointer in case the input string is not guaranteed to be zero-terminated. - Minor checks were added into lf_text_len() to make sure dst string will not overflow. - lf_text_len() now makes proper use of escape_string() function. This should be backported as far as 1.8.	2022-09-20 16:25:30 +02:00
Amaury Denoyelle	0ed617ac2f	BUG/MEDIUM: mux-quic: properly trim HTX buffer on snd_buf reset MUX QUIC snd_buf operation will return early if a qcs instance is reset. In this case, HTX is left untouched and the callback returns the whole buffer size. This leads to an undefined behavior as the stream layer is notified about a transfer but does not see its HTX buffer emptied. In the end, the transfer may stall which will lead to a leak on session. To fix this, HTX buffer is now reset when snd_buf is short-circuited. This should fix the issue as now the stream layer can continue the transfer until its completion. This patch has already been tested by Tristan and is reported to solve the github issue #1801. This should be backported up to 2.6.	2022-09-20 15:35:33 +02:00
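The idea of escaping an input that is bounded by a stop pointer instead of a terminating zero, while never overflowing the destination, can be sketched as below. This is an illustration only: `escape_range()` is a hypothetical function, not haproxy's `escape_string()`, and the choice to escape control bytes, bytes of 0x7f and above, and the escape character itself as the escape character followed by two hex digits is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copy [src, src_stop) into [dst, dst_end), escaping
 * some bytes as <esc><hex><hex>. The input does not need to be
 * zero-terminated since 'src_stop' bounds it. One byte is always kept for
 * the trailing NUL, and a byte is never partially emitted: when the
 * remaining room is too small, copying stops. Returns a pointer to the
 * trailing NUL written in the destination.
 */
static char *escape_range(char *dst, char *dst_end, char esc,
                          const char *src, const char *src_stop)
{
	static const char hex[] = "0123456789ABCDEF";

	assert(dst != NULL && dst < dst_end);
	dst_end--; /* reserve room for the trailing NUL */

	while (src < src_stop) {
		unsigned char c = (unsigned char)*src;

		if (c < 0x20 || c >= 0x7f || c == (unsigned char)esc) {
			if (dst_end - dst < 3)
				break;
			*dst++ = esc;
			*dst++ = hex[c >> 4];
			*dst++ = hex[c & 0xf];
		} else {
			if (dst_end - dst < 1)
				break;
			*dst++ = (char)c;
		}
		src++;
	}
	*dst = '\0';
	return dst;
}
```

Passing the stop pointer explicitly means a caller holding a length-delimited field can escape it in place without first copying it into a zero-terminated buffer.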
Amaury Denoyelle	9534e59bb9	MINOR: mux-quic: refactor snd_buf Factorize common code between h3 and hq-interop snd_buf operation. This is inserted in MUX QUIC snd_buf own callback. The h3/hq-interop API has been adjusted to directly receive a HTX message instead of a plain buf. This led to extracting part of MUX QUIC snd_buf in qmux_http module. This should be backported up to 2.6.	2022-09-20 15:35:29 +02:00
Amaury Denoyelle	d80fbcaca2	REORG: mux-quic: export HTTP related function in a dedicated file Extract function dealing with HTX outside of MUX QUIC. For the moment, only rcv_buf stream operation is concerned. The main objective is to be able to support both TCP and HTTP proxy mode with a common base and add specialized modules on top of it. This should be backported up to 2.6.	2022-09-20 15:35:23 +02:00
Amaury Denoyelle	36d50bff22	REORG: mux-quic: extract traces in a dedicated source file QUIC MUX implements several APIs to interface with stream, quic-conn and app-ops layers. It is planned to better separate these roles, possibly by using several files. The first step is to extract QUIC MUX traces in a dedicated source file. This will allow to reuse traces in multiple files. The main objective is to be able to support both TCP and HTTP proxy mode with a common base and add specialized modules on top of it. This should be backported up to 2.6.	2022-09-20 15:35:09 +02:00
Amaury Denoyelle	afb7b9d8e5	BUG/MEDIUM: mux-quic: fix nb_hreq decrement nb_hreq is a counter on qcc for active HTTP requests. It is incremented for each qcs where a full HTTP request was received. It is decremented when the stream is closed locally : - on HTTP response fully transmitted - on stream reset A bug will occur if a stream is reset without having processed a full HTTP request. nb_hreq will be decremented whereas it was not incremented. This will lead to a crash when building with DEBUG_STRICT=2. If BUG_ON_HOT is not active, nb_hreq counter will wrap which may break the timeout logic for the connection. This bug was triggered on haproxy.org. It can be reproduced by simulating the reception of a STOP_SENDING frame instead of a STREAM one by patching qc_handle_strm_frm() : + if (quic_stream_is_bidi(strm_frm->id)) + qcc_recv_stop_sending(qc->qcc, strm_frm->id, 0); + //ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len, + // strm_frm->offset.key, strm_frm->fin, + // (char *)strm_frm->data); To fix this bug, a qcs is now flagged with a new QC_SF_HREQ_RECV. This is set when the full HTTP request is received. When the stream is closed locally, nb_hreq will be decremented only if this flag was set. This must be backported up to 2.6.	2022-09-19 12:12:21 +02:00
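The flag-guarded counter described in the fix above can be sketched with stand-in types. Everything here is hypothetical (`struct conn_ctx`, `struct strm`, `SF_HREQ_RECV` and the two functions only mimic the roles of qcc, qcs and QC_SF_HREQ_RECV); the point is only that the decrement happens if and only if the matching increment happened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SF_HREQ_RECV 0x1u /* hypothetical flag: a full request was counted */

struct conn_ctx {          /* stand-in for the connection-level context */
	uint64_t nb_hreq;  /* number of active HTTP requests */
};

struct strm {              /* stand-in for a stream */
	struct conn_ctx *cc;
	unsigned int flags;
};

/* Called once a full HTTP request was received on the stream: count it
 * and remember on the stream that it was counted.
 */
static void strm_request_received(struct strm *s)
{
	assert(s != NULL && s->cc != NULL);
	if (!(s->flags & SF_HREQ_RECV)) {
		s->flags |= SF_HREQ_RECV;
		s->cc->nb_hreq++;
	}
}

/* Called when the stream is closed locally (response fully sent or stream
 * reset): only decrement if this stream was counted, so that a reset
 * received before any full request cannot make the counter wrap.
 */
static void strm_close_local(struct strm *s)
{
	assert(s != NULL && s->cc != NULL);
	if (s->flags & SF_HREQ_RECV) {
		s->flags &= ~SF_HREQ_RECV;
		s->cc->nb_hreq--;
	}
}
```

Clearing the flag on close also makes a second close on the same stream harmless in this sketch.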
Erwan Le Goas	b0c0501516	MINOR: config: add command-line -dC to dump the configuration file This commit adds a new command line option -dC to dump the configuration file. An optional key may be appended to -dC in order to produce an anonymized dump using this key. The anonymizing process uses the same algorithm as the CLI so that the same key will produce the same hashes for the same identifiers. This way an admin may share an anonymized extract of a configuration to match against live dumps. Note that key 0 will not anonymize the output. However, in any case, the configuration is dumped after tokenizing, thus comments are lost.	2022-09-17 11:27:09 +02:00
Erwan Le Goas	54966dffda	MINOR: anon: store the anonymizing key in the CLI's appctx In order to allow users to dump internal states using a specific key without changing the global one, we're introducing a key in the CLI's appctx. This key is preloaded from the global one when "set anon on" is used (and if none exists, a random one is assigned). And the key can optionally be assigned manually for the whole CLI session. A "show anon" command was also added to show the anon state, and the current key if the user has sufficient permissions. In addition, a "debug dev hash" command was added to test the feature.	2022-09-17 11:27:09 +02:00
Erwan Le Goas	fad9da83da	MINOR: anon: store the anonymizing key in the global structure Add a uint32_t key in global to hash words with it. A new CLI command 'set global-key <key>' was added to change the global anonymizing key. The global may also be set in the configuration using the global "anonkey" directive. For now this key is not used.	2022-09-17 11:24:53 +02:00
Erwan Le Goas	9c76637fff	MINOR: anon: add new macros and functions to anonymize contents These macros and functions will be used to anonymize strings by producing a short hash. This will allow to match config elements against dump elements without revealing the original data. This will later be used to anonymize configuration parts and CLI commands output. For now only string, identifiers and addresses are supported, but the model is easily extensible.	2022-09-17 11:24:53 +02:00
Amaury Denoyelle	8d4ac48d3d	CLEANUP: mux-quic: remove stconn usage in h3/hq Small cleanup on snd_buf for application protocol layer. * do not export h3_snd_buf * replace stconn by a qcs argument. This is better as h3/hq-interop only uses the qcs instance. This should be backported up to 2.6.	2022-09-16 13:53:30 +02:00
Christopher Faulet	7c4b2ec09d	MINOR: flags/mux-h1: decode H1C and H1S flags The new functions h1c_show_flags() and h1s_show_flags() decode the flags state into a string, and are used by dev/flags: $ ./dev/flags/flags h1c 0x2200 h1c->flags = H1C_F_ST_READY | H1C_F_ST_ATTACHED $ ./dev/flags/flags h1s 0x190 h1s->flags = H1S_F_BODYLESS_RESP | H1S_F_NOT_FIRST | H1S_F_WANT_KAL	2022-09-15 11:01:59 +02:00
Christopher Faulet	18ad15f5c4	REORG: mux-h1: extract flags and enums into mux_h1-t.h The same was performed for the H2 multiplexer. H1C and H1S flags are moved in a dedicated header file. It will be mainly used to be able to decode mux-h1 flags from the flags utility. In this patch, we only move the flags to mux_h1-t.h.	2022-09-15 11:01:59 +02:00
Amaury Denoyelle	f8aaf8bdfa	BUG/MEDIUM: mux-quic: fix crash on early app-ops release H3 SETTINGS emission has recently been delayed. The idea is to send it with the first STREAM to reduce sendto syscall invocation. This was implemented in the following patch : `3dd79d378c` MINOR: h3: Send the h3 settings with others streams (requests) This patch works fine under nominal conditions. However, it will cause a crash if a HTTP/3 connection is released before having sent any data, for example when receiving an invalid first request. In this case, qc_release will first free qcc.app_ops HTTP/3 application protocol layer via release callback. Then qc_send is called to emit any closing frames built by app_ops release invocation. However, in qc_send, as no data has been sent, it will try to complete application layer protocol initialization, with a SETTINGS emission for HTTP/3. Thus, qcc.app_ops is reused, which is invalid as it has been just freed. This will cause a crash with h3_finalize in the call stack. This bug can be reproduced artificially by generating incomplete HTTP/3 requests. This will in time trigger http-request timeout without any data sent. This is done by editing qc_handle_strm_frm function. - ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len, + ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len - 1, strm_frm->offset.key, strm_frm->fin, (char *)strm_frm->data); To fix this, application layer closing API has been adjusted to be done in two-steps. A new shutdown callback is implemented : it is used by the HTTP/3 layer to generate GOAWAY frame in qc_release prologue. Application layer context qcc.app_ops is then freed later in qc_release via the release operation which is now only used to liberate app layer resources. This fixes the problem as the intermediary qc_send invocation will be able to reuse app_ops before it is freed. 
This patch fixes the crash, but it would be better to adjust H3 SETTINGS emission in case of early connection closing : in this case, there is no need to send it. This should be implemented in a future patch. This should fix the crash recently experienced by Tristan in github issue #1801. This must be backported up to 2.6.	2022-09-15 10:41:44 +02:00
William Lallemand	95fc737fc6	MEDIUM: quic: separate path for rx and tx with set_encryption_secrets With quicTLS the set_encryption_secrets callback is always called with the read_secret and the write_secret. However this is not the case with libreSSL, which uses the set_read_secret()/set_write_secret() mechanism. It still provides the set_encryption_secrets() callback, which is called with a NULL parameter for the write_secret during the read, and for the read_secret during the write. The exchange key was not designed in haproxy to be called separately for read and write, so this patch allows calls with read or write key to NULL.	2022-09-14 18:16:37 +02:00
William Lallemand	1c8f3b386d	MINOR: httpclient: export httpclient_create_proxy() Export httpclient_create_proxy() in http_client.h	2022-09-14 14:34:39 +02:00
William Lallemand	992ad62e3c	MEDIUM: httpclient: allow to use another proxy httpclient_new_from_proxy() is a variant of httpclient_new() which allows to create the requests from a different proxy. The proxy and its 2 servers are now stored in the httpclient structure. The proxy must have been created with httpclient_create_proxy() to be used. The httpclient_postcheck() callback will finish the initialization of all proxies created with PR_CAP_HTTPCLIENT.	2022-09-13 17:12:38 +02:00
William Lallemand	54aec5f678	MEDIUM: httpclient: httpclient_create_proxy() creates a proxy for httpclient httpclient_create_proxy() is a function which creates a proxy that could be used for the httpclient. It will allocate a proxy, a raw server and an ssl server. This patch moves most of the code from httpclient_precheck() into a generic function httpclient_create_proxy(). The proxy will have the PR_CAP_HTTPCLIENT capability. This could be used for specific httpclient instances that need different proxy settings.	2022-09-13 17:12:38 +02:00
Emeric Brun	d6e581de4b	BUG/MEDIUM: sink: bad init sequence on tcp sink from a ring. The init of tcp sink, particularly for SSL, was done too early in the code, during parsing, and this can cause a crash especially if nbthread was not configured. This was detected by William using ASAN on a new regtest on log forward. This patch adds the 'struct proxy' created for a sink to a list and this list is now submitted to the same init code as the main proxies list or the log_forward's proxies list. Doing this, we are assured to use the right init sequence. It also removes the init code for ssl from post section parsing. This patch should be backported as far as v2.2 Note: this fix uses 'goto' labels created by commit 'BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized' but this code didn't exist before v2.3 so this patch needs to be adapted for v2.2.	2022-09-13 17:03:30 +02:00
Willy Tarreau	439be5838d	MINOR: flags/mux-h2: decode H2C and H2S flags The new functions h2c_show_flags() and h2s_show_flags() decode the flags state into a string, and are used by dev/flags: $ ./dev/flags/flags h2c 0x0600 h2c->flags = H2_CF_DEM_IN_PROGRESS | H2_CF_DEM_SHORT_READ $ ./dev/flags/flags h2s 0x7003 h2s->flags = H2_SF_HEADERS_RCVD | H2_SF_OUTGOING_DATA | H2_SF_HEADERS_SENT | H2_SF_ES_SENT | H2_SF_ES_RCVD	2022-09-12 19:33:07 +02:00
Willy Tarreau	6c0fadfb7d	REORG: mux-h2: extract flags and enums into mux_h2-t.h Originally in 1.8 we wanted to have an independent mux that could possibly be disabled and would not impose dependencies on the outside. Everything would fit into a single C file and that was fine. Nowadays muxes are unavoidable, and not being able to easily inspect them from outside is sometimes a bit of a pain. In particular, the flags utility still cannot be used to decode their flags. As a first step towards this, this patch moves the flags and enums to mux_h2-t.h, as well as the two state decoding inline functions. It also dropped the H2_SS_*_BIT defines that nobody uses. The mux_h2.c file remains the only one to include that for now.	2022-09-12 19:33:07 +02:00
Willy Tarreau	799e5410b4	MINOR: flags/fd: decode FD flags states The new function is fd_show_flags() and it reports known FD flags: $ ./dev/flags/flags fd 0x000121 fd->flags = FD_POLL_IN | FD_EV_READY_W | FD_EV_ACTIVE_R	2022-09-12 19:33:07 +02:00
Willy Tarreau	62bde43779	BUILD: flags: fix the fallback macros for missing stdio The fallback macros for when stdio is not there didn't have the "..." and were causing build issues on platforms with stricter dependencies between includes.	2022-09-09 17:46:45 +02:00
Willy Tarreau	233c0a586d	BUILD: flags: fix build warning in some macros used by show_flags Some gcc versions seem to be upset by the use of enums as booleans, so OK, let's cast all of them as uint, that's no big deal.	2022-09-09 17:36:27 +02:00
Aurelien DARRAGON	d46f437de6	MINOR: proxy/listener: support for additional PAUSED state This patch is a prerequisite for #1626. Adding PAUSED state to the list of available proxy states. The flag is set when the proxy is paused at runtime (pause_listener()). It is cleared when the proxy is resumed (resume_listener()). It should be backported to 2.6, 2.5 and 2.4	2022-09-09 17:23:01 +02:00
Aurelien DARRAGON	001328873c	MINOR: listener: small API change A minor API change was performed in listener(.c/.h) to restore consistency between stop_listener() and (resume/pause)_listener() functions. LISTENER_LOCK was never locked prior to calling stop_listener(): lli variable hint is thus not useful anymore. Added PROXY_LOCK locking in (resume/pause)_listener() functions with related lpx variable hint (prerequisite for #1626). It should be backported to 2.6, 2.5 and 2.4	2022-09-09 17:23:01 +02:00
Willy Tarreau	6edae6ff48	MINOR: flags/http_ana: use flag dumping to show http msg states The function is hmsg_show_flags(). It shows the HTTP_MSGF_* flags.	2022-09-09 17:18:57 +02:00
Willy Tarreau	5349779e40	MINOR: flags/htx: use flag dumping to show htx and start-line flags The functions are respectively htx_show_flags() and hsl_show_flags().	2022-09-09 16:59:29 +02:00
Willy Tarreau	e2afad0af4	MINOR: flags/http_ana: use flag dumping for txn flags The new function is txn_show_flags(). It dumps the TXN flags as well as the client and server cookie types.	2022-09-09 16:52:09 +02:00
Willy Tarreau	92a2d3c02b	MINOR: flags/task: use flag dumping for task state The new function is task_show_state().	2022-09-09 16:52:09 +02:00
Willy Tarreau	e9d1283cc5	MINOR: flags/stream: use flag dumping for stream flags The new function is strm_show_flags(). It dumps the stream flags as well as the err type under SF_ERR_MASK and the final state under SF_FINST_MASK.	2022-09-09 16:52:09 +02:00
Willy Tarreau	f4cb98ce56	MINOR: flags/stream: use flag dumping for stream error type The new function is strm_et_show_flags(). Only the error type is handled at the moment, as a bit more complex logic is needed to mix the values and enums present in some fields.	2022-09-09 16:52:09 +02:00
Willy Tarreau	4bab7d81b6	MINOR: flags/stconn: use flag dumping for stconn and sedesc flags The two new functions are se_show_flags() and sc_show_flags(). Maybe something could be done for SC_ST_* values but as it's a small enum, a simple switch/case should work fine.	2022-09-09 16:52:08 +02:00
Willy Tarreau	9d9e101689	MINOR: flags/connection: use flag dumping for connection flags The new function is conn_show_flags(), it only deals with flags. Nothing is planned for connection error types at the moment.	2022-09-09 16:15:10 +02:00
Willy Tarreau	cdc9ddc8cf	MINOR: flags/channel: use flag dumping for channel flags and analysers The two new functions are chn_show_analysers() and chn_show_flags(). They work on an existing buffer so one was declared in flags.c for this purpose. File flags.c does not have to know about channel flags anymore.	2022-09-09 16:15:10 +02:00
Willy Tarreau	7a955b5d73	MINOR: flags: implement a macro used to dump enums inside masks Some of our flags have enums inside a mask. The new macro __APPEND_ENUM is able to deal with that by comparing the flag's value against an exact one under the mask. One needs to take care of eliminating the zero value though, otherwise delimiters will not always be properly placed (e.g. if some flags were dumped before and what remains is exactly zero). The bits of the mask are cleared only upon exact matches.	2022-09-09 16:15:10 +02:00
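The "enum inside a mask" case described above can be sketched as a plain function rather than a macro. This is not the `__APPEND_ENUM` macro itself: `append_enum()`, its parameters and the example names are hypothetical. The name is appended only when the field selected by the mask equals the given value exactly, and the bits of the mask are cleared only on such a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: 'buf' holds a NUL-terminated string being built,
 * 'len' is its total size. If (*flags & mask) equals 'value' exactly,
 * append 'name' (preceded by 'delim' when the buffer is not empty), clear
 * the mask bits in *flags and return 1. Otherwise leave everything
 * untouched and return 0. The output is silently truncated when the
 * buffer is too small.
 */
static int append_enum(char *buf, size_t len, const char *delim,
                       unsigned int *flags, unsigned int mask,
                       unsigned int value, const char *name)
{
	size_t used;

	assert(buf != NULL && flags != NULL && len > 0);
	if ((*flags & mask) != value)
		return 0;

	used = strlen(buf);
	if (used < len)
		snprintf(buf + used, len - used, "%s%s", used ? delim : "", name);
	*flags &= ~mask;
	return 1;
}
```

As the commit message notes, the zero value of such an enum needs special care: in this sketch a caller would simply not register a name for value 0, otherwise it would match whenever the field is empty.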
Willy Tarreau	77acaf5af5	MINOR: flags: add a new file to host flag dumping macros The "flags" utility is useful but painful to keep up to date. This commit aims at providing a low-maintenance solution to keep flags up to date, by proposing some macros that build a string from a set of flags in a way that requires the least possible verbosity. The idea will be to add an inline function dedicated to this just after the flags declaration, and enumerate the flags one is interested in, and that function will fill a string based on them. Placing this inside the type files allows both haproxy and external tools like "flags" to use it, but comes with a few constraints. First, the files will be slightly less readable if these functions are huge, so they need to stay as compact as possible. Second, the function will need snprintf() and we don't want to include stdio.h in type files as it proved to be particularly heavy and to cause definition headaches in the past. As such the file here only contains a macro enclosed in #ifdef EOF (that is defined in stdio), and provides an alternate empty one when no stdio is defined. This way it's the caller that has to include stdio first or it won't get anything back, and in practice the locations relying on this always have it. The macro has to be used in 3 steps: - prologue: dumps 0 and exits if the value is zero - flags: the macro can be recursively called and it will push the flag from bottom to top so that they appear in the same order as today without requiring to be declared the other way around - epilogue: dump remaining flags that were not identified The macro was arranged so that a single character can be used with no other argument to declare all flags at once. Example: #define _(n, ...) __APPEND_FLAG(buf, len, del, flg, n, #n, __VA_ARGS__) _(0); _(X_FLAG1, _(X_FLAG2, _(X_FLAG3, _(X_FLAG4)))); _(~0); #undef _ Existing files will have to be updated to rely on it, and more files could come soon.	2022-09-09 14:47:31 +02:00
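The three steps listed in this commit (prologue, flags, epilogue) can be sketched as below. This is a simplified stand-in, not the real `__APPEND_FLAG` macro: the flags are appended sequentially rather than through recursive macro calls, and `append_flag()`, `DUMP_FLAG()`, `x_show_flags()` and the `X_FLAG*` values are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flags, for illustration only. */
#define X_FLAG1 0x01u
#define X_FLAG2 0x02u
#define X_FLAG3 0x04u

/* Append <name> to <buf> if all bits of <flag> are set in <*remain>,
 * then clear these bits so that the epilogue only sees unknown ones.
 */
static void append_flag(char *buf, size_t len, const char *del,
                        unsigned int *remain, unsigned int flag,
                        const char *name)
{
    size_t used = strlen(buf);

    if (!flag || (*remain & flag) != flag)
        return;
    snprintf(buf + used, len - used, "%s%s", used ? del : "", name);
    *remain &= ~flag;
}

/* Fill <buf> with the names of the flags set in <flg>, separated by <del>. */
static const char *x_show_flags(char *buf, size_t len, const char *del,
                                unsigned int flg)
{
    unsigned int remain = flg;
    size_t used;

    assert(buf && len > 0);
    buf[0] = '\0';

    /* prologue: dump "0" and exit if the value is zero */
    if (!flg) {
        snprintf(buf, len, "0");
        return buf;
    }

    /* flags: the stringified name comes for free from the macro */
#define DUMP_FLAG(f) append_flag(buf, len, del, &remain, f, #f)
    DUMP_FLAG(X_FLAG1);
    DUMP_FLAG(X_FLAG2);
    DUMP_FLAG(X_FLAG3);
#undef DUMP_FLAG

    /* epilogue: dump the remaining bits that were not identified */
    if (remain) {
        used = strlen(buf);
        snprintf(buf + used, len - used, "%s%#x", used ? del : "", remain);
    }
    return buf;
}
```

Calling `x_show_flags(buf, sizeof(buf), " | ", X_FLAG1 | X_FLAG3)` fills the buffer with the two flag names joined by the delimiter, and any unknown bit ends up in hexadecimal at the end.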
Frédéric Lécaille	3dd79d378c	MINOR: h3: Send the h3 settings with others streams (requests) This is the ->finalize application callback which prepares the unidirectional STREAM frames for h3 settings and wakes up the mux I/O handler to send them. As haproxy is at the same time always waiting for the client request, this makes haproxy call sendto() to send only about 20 bytes of stream data. Furthermore, in case of heavy loss, this gives short h3 requests less chance to succeed. Drawback: as the mux currently sends its streams in ascending order of their IDs, stream 0 is always embedded before the unidirectional stream 3 for h3 settings. Nevertheless, as these settings may be lost and received after other h3 request streams, this is permitted by the RFC. Perhaps there is a better way to do this. This will have to be checked with Amaury. Must be backported to 2.6.	2022-09-08 18:04:58 +02:00
Frédéric Lécaille	bb995eafc7	BUG/MINOR: quic: Speed up the handshake completion only one time It is possible to speed up the handshake completion, but only once per connection, as mentioned in RFC 9002 "6.2.3. Speeding up Handshake Completion". Add a flag to prevent this process from being run several times (see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl). Must be backported to 2.6.	2022-09-08 18:04:58 +02:00
Willy Tarreau	3d4cdb198c	MEDIUM: tasks/activity: combine the called function with the caller Now instead of getting aggregate stats per called function, we have them per function AND per call place. The "byaddr" sort considers the function pointer first, then the call count, so that dominant callers of a given callee are instantly spotted. This allows to get sorted outputs like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup process_stream 9891131 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823934 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup other 101893 4.626s 45.40us 14.84s 145.7us quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- 
h1_release@src/mux_h1.c:1087 task_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup accept_queue_process 286 1.571ms 5.494us 72.55ms 253.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup process_resolvers 176 157.8us 896.0ns 7.835ms 44.52us <- wake_expired_tasks@src/task.c:429 task_drop_running qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup task_run_applet 24 33.79ms 1.408ms 5.838ms 243.3us <- sc_applet_create@src/stconn.c:489 appctx_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup srv_cleanup_toremove_conns 16 1.133ms 70.81us 5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup ... and calls sorted by address like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 23 32.73ms 1.423ms 5.837ms 253.8us <- sc_applet_create@src/stconn.c:489 appctx_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup accept_queue_process 285 1.566ms 5.495us 72.49ms 254.3us <- listener_accept@src/listener.c:1099 tasklet_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup process_resolvers 159 145.9us 917.0ns 7.823ms 49.20us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_toremove_conns 16 1.133ms 70.81us 
5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup process_stream 9891130 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823933 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup other 101892 4.626s 45.40us 14.84s 145.7us It is already visible that some tasks have very different costs depending on where they're called from (e.g. process_stream). The method used to wake them up is also shown. Applets are handled specially and shown as appctx_wakeup.	2022-09-08 16:21:22 +02:00
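Keeping stats per function AND per call place amounts to keying the activity table on the pair (callee, caller) instead of the callee alone. The sketch below shows one way this could look, assuming a small open-addressing hash table with an "other" fallback entry. All names (`ACT_HASH_BITS`, `struct act_entry`, `act_lookup()`, `act_account()`) are hypothetical, the hash is arbitrary, and the code is not thread-safe, unlike what a real scheduler would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Table size driven by a pair of macros (hypothetical names). */
#define ACT_HASH_BITS    8
#define ACT_HASH_BUCKETS (1u << ACT_HASH_BITS)

struct act_entry {
    const void *func;    /* called function */
    const void *caller;  /* place where the wakeup was issued */
    uint64_t calls;
    uint64_t cpu_ns;
    uint64_t lat_ns;
};

static struct act_entry act_tab[ACT_HASH_BUCKETS];
static struct act_entry act_other; /* catch-all once the table is full */

/* Find or create the entry for the (func, caller) pair. */
static struct act_entry *act_lookup(const void *func, const void *caller)
{
    uint64_t h;
    unsigned int idx, i;

    assert(func != NULL);
    h = ((uint64_t)(uintptr_t)func ^ ((uint64_t)(uintptr_t)caller << 1))
        * 0x9E3779B97F4A7C15ULL;
    idx = (unsigned int)(h >> (64 - ACT_HASH_BITS));

    for (i = 0; i < ACT_HASH_BUCKETS; i++) {
        struct act_entry *e = &act_tab[(idx + i) & (ACT_HASH_BUCKETS - 1)];

        if (e->func == func && e->caller == caller)
            return e;
        if (!e->func) { /* free slot: claim it for this pair */
            e->func = func;
            e->caller = caller;
            return e;
        }
    }
    return &act_other;
}

/* Account one call with its CPU time and wakeup latency. */
static void act_account(struct act_entry *e, uint64_t cpu_ns, uint64_t lat_ns)
{
    e->calls++;
    e->cpu_ns += cpu_ns;
    e->lat_ns += lat_ns;
}
```

Two wakeups of the same function from different places land in distinct entries, which is what makes per-caller cost differences such as those of process_stream visible.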
Willy Tarreau	a3423873fe	CLEANUP: activity: make the number of sched activity entries more configurable This removes all the hard-coded 8-bit and 256 entries to use a pair of macros instead so that we can more easily experiment with larger table sizes if needed.	2022-09-08 14:55:09 +02:00
Willy Tarreau	e0e6d81460	CLEANUP: task: move tid and wake_date into the common part There used to be one tid for tasklets and a thread_mask for tasks. Since 2.7, both tasks and tasklets now use a tid (albeit with a very slight semantic difference for the negative value), so in order to limit code duplication and to ease debugging, it makes sense to move tid into the common part. One limitation is that it will leave a hole in the structure, but we now have the wake_date that is always present and can move there as well to plug the hole. This results in something overall pretty clean (and cleaner than before), with the low-level stuff (state,tid,process,context) appearing first, then the caller stuff (caller,wake_date,calls,debug) next, and finally the type-specific stuff (rq/wq/expire/nice).	2022-09-08 14:30:38 +02:00
Willy Tarreau	2830d282e5	DEBUG: task: simplify the caller recording in DEBUG_TASK Instead of storing an index that's swapped at every call, let's use the two pointers as a shifting history. Now we have a permanent "caller" field that records the last caller, and an optional prev_caller in the debug section enabled by DEBUG_TASK that keeps a copy of the previous one. This way, not only is it much easier to follow what's happening during debugging, but it also saves 8 bytes in the struct task in debug mode and still keeps it under 2 cache lines in nominal mode, and this will finally be usable everywhere and later in profiling. The caller_idx was also used as a hint that the entry was freed, in order to detect wakeup-after-free. This was changed by setting caller to -1 instead and preserving its value in caller[1]. Finally, the operations were made atomic. That's not critical but since it's used for debugging and race conditions represent a significant part of the issues in multi-threaded mode, it seems wise to at least eliminate some possible factors of faulty analysis.	2022-09-08 14:30:38 +02:00
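The "shifting history" described in this commit can be pictured with the sketch below, which uses C11 atomics. It is only an illustration: `struct caller_loc`, `struct dbg_task`, `record_caller()` and `CALLER_FREED` are hypothetical names, and the two atomic operations are individually atomic but not atomic as a pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call-site descriptor. */
struct caller_loc {
    const char *func;
    const char *file;
    int line;
};

/* Hypothetical debug part of a task: last caller and the one before. */
struct dbg_task {
    _Atomic(const struct caller_loc *) caller;
    _Atomic(const struct caller_loc *) prev_caller;
};

/* Sentinel stored as the caller when the task is freed (the "-1" value). */
#define CALLER_FREED ((const struct caller_loc *)(uintptr_t)-1)

static void dbg_task_init(struct dbg_task *t)
{
    atomic_init(&t->caller, NULL);
    atomic_init(&t->prev_caller, NULL);
}

/* Record <c> as the last caller and shift the previous one into history. */
static void record_caller(struct dbg_task *t, const struct caller_loc *c)
{
    const struct caller_loc *old;

    assert(t);
    old = atomic_exchange(&t->caller, c);
    atomic_store(&t->prev_caller, old);
}
```

Recording `CALLER_FREED` on release keeps the last real caller in `prev_caller`, so a wakeup-after-free can be spotted by checking whether `caller` holds the sentinel.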
Willy Tarreau	8d71abf0cd	DEBUG: applet: instrument appctx_wakeup() to log the caller's location appctx_wakeup() relies on task_wakeup(), but since it calls it from a function, the calling place is always appctx_wakeup() itself, which is not very useful. Let's turn it to a macro so that we can log the location of the caller instead. As an example, the cli_io_handler() which used to be seen as this: (gdb) p appctx->t.debug.caller[0] $10 = { func = 0x9ffb78 <__func__.37996> "appctx_wakeup", file = 0x9b336a "include/haproxy/applet.h", line = 110, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } Now shows the more useful: (gdb) p appctx->t.debug.caller[0] $6 = { func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet", file = 0xa00320 "src/stconn.c", line = 996, what = 6 '\006', arg8 = 0 '\000', arg32 = 0 }	2022-09-08 14:30:38 +02:00
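The reason a macro reports a useful location while an inline function does not is that `__func__`, `__FILE__` and `__LINE__` are evaluated where the macro is expanded. A minimal sketch, assuming hypothetical names (`struct call_site`, `wakeup_from()`, `WAKEUP_HERE()`) rather than the real appctx_wakeup() code, and relying on GCC/Clang accepting `__func__` in a static initializer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-site descriptor. */
struct call_site {
    const char *func;
    const char *file;
    int line;
};

/* Last recorded call site, standing in for the task's debug field. */
static const struct call_site *last_site;

/* The real work happens in a function that receives the call site. */
static void wakeup_from(const struct call_site *site)
{
    assert(site);
    last_site = site;
}

/* The macro builds one static descriptor per expansion, so the recorded
 * location is the place where WAKEUP_HERE() appears, not the helper.
 */
#define WAKEUP_HERE() do {                                                   \
        static const struct call_site site_ = { __func__, __FILE__, __LINE__ }; \
        wakeup_from(&site_);                                                 \
    } while (0)

/* Example caller: its own name and line get recorded. */
static void demo_handler(void)
{
    WAKEUP_HERE();
}
```

After `demo_handler()` runs, `last_site->func` points to "demo_handler" instead of the name of the wakeup helper.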
Willy Tarreau	e08af9a0f4	DEBUG: task: use struct ha_caller instead of arrays of file:line This reduces the task struct by 8 bytes, reduces the code size a little bit by simplifying the calling convention (one argument dropped), and as a bonus provides the function name in the caller.	2022-09-08 14:30:38 +02:00
Willy Tarreau	d2b2ad902b	DEBUG: task: define a series of wakeup types for tasks and tasklets The WAKEUP_* values will be used to report how a task/tasklet was woken up, and task_wakeup_type_str() will report the associated function name.	2022-09-08 14:30:16 +02:00


7956 Commits