haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-11 01:26:58 +02:00

Author	SHA1	Message	Date
Frédéric Lécaille	a2e954a817	MINOR: quic: Make circular buffer internal buffers be variable-sized. From now on, thanks to this simple patch, we can use circular buffers with a variable-sized internal buffer.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	0e50e1b0b5	MINOR: quic: Add the packet type to quic_tx_packet struct This is required to build packets from the same function.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	db2a7649a9	MINOR: quic: Add a useful function to compute any frame length. This should be used by the function which builds packets to prevent it from failing. This is important when the packet numbers are consumed by several threads. The packet number is used to build and encrypt packets and must be incremented if and only if the packet it refers to has been successfully built.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	0ad0458a56	MINOR: quic: Replace quic_tx_frm struct by quic_frame struct These structures are similar. quic_tx_frm was there to try to reduce the size of such objects which embed a union for all the QUIC frames. Furthermore, this patch fixes the issue where quic_tx_frm objects were freed from the pool for quic_frame.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	c88df07bdd	MINOR: quic: Make ->tx.frms quic_pktns struct member be thread safe Replace this member which is a list struct by an mt_list struct.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	120ea6f169	MINOR: quic: Make qc_treat_rx_pkts() be thread safe. Make quic_rx_packet_ref(inc\|dec)() functions be thread safe. Make use of ->rx.crypto.frms_rwlock RW lock when manipulating RX frames from qc_treat_rx_crypto_frms(). Modify atomically several variables attached to RX part of quic_enc_level struct.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	98cdeb2f0c	MINOR: quic: Rename ->rx.rwlock of quic_enc_level struct to ->rx.pkts_rwlock As there are two RW locks in this structure, let's make the name of this lock more explicit.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	9054d1b564	MINOR: quic: Missing encryption level rx.crypto member initialization and lock. ->rx.crypto member of quic_enc_level struct was not initialized as this was done for all other members of this structure. This patch fixes this. Also adds a RW lock for the frame of this member.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	b095252a23	MINOR: Add function for TX packets reference counting Add two functions to increment or decrement a reference counter attached to the TX packet structure (struct quic_tx_packet). The packets are freed when their counters reach zero.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	c8d3f873e8	MINOR: quic: Remove old TX buffer implementation We use only ring buffers (struct qring) to prepare and send QUIC datagrams. We can safely remove the old buffering implementation which was not thread safe.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	c5b0c93c26	MINOR: quic: Make use of TX ring buffers to send QUIC packets We modify the functions responsible for building packets to put the latter in ring buffers (qc_build_hdshk_pkt() during the handshake step, and qc_build_phdshk_apkt() during the post-handshake step). These functions remove a ring buffer from its list to build as many datagrams as possible. Each datagram is prepended with two fields: the datagram length and the first packet in the datagram. We chain the packets belonging to the same datagram in a singly linked list to reach them from the first one: indeed we must modify some members of each packet when we really send them from send_ppkts(). This function is also modified to retrieve the datagrams from ring buffers.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	6b19764e3c	MINOR: quic: Initialize pointers to TX ring buffer list We initialize the pointer to the listener TX ring buffer list. Note that this is not done for QUIC clients as we do not fully support them: we only have to allocate the list and attach it to server struct I guess.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	48f8e1925b	MINOR: proto_quic: Allocate TX ring buffers for listeners We allocate an array of QUIC ring buffers, one per thread, and arrange them in an MT_LIST. Everything is allocated or nothing: we do not want to use an incomplete array of ring buffers to ensure that each thread may safely acquire one of these buffers.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	22cfdf8d0e	MINOR: quic: Add ring buffer definition (struct qring) for QUIC A ring buffer is made of a circular buffer (->cbuf) and must be arranged in an MT_LIST (->mt_list).	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	9621565b74	MINOR: net_helper: add functions for pointers Add two functions to read/write pointer values to/from vectors.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	c6bc185c18	MINOR: quic: Add a ring buffer implementation for QUIC This implementation is inspired by the Linux kernel circular buffer implementation (see include/linux/circ_buf.h). Such buffers may be used at the same time both by writer and reader (lock-free).	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	f3d078d22e	MINOR: quic: Make qc_lstnr_pkt_rcv() be thread safe. Make the main function of the I/O dgram handler, used to parse QUIC packets, thread safe. Its role is at least to create new incoming connections, which are added to two trees protected by the same RW lock. From now on the packets are fully parsed before possibly creating new connections.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	a11d0e26d4	MINOR: quic: Replace the RX unprotected packet list by a thread-safe one. This list is shared between the I/O dgram handler and the task responsible for processing the QUIC packets inside.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	c28aba2a8d	MINOR: quic: Replace the RX list of packets by a thread-safe one. This list is shared between the I/O dgram handler and the task responsible for processing the QUIC packets.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	a5fe49f44a	MINOR: quic: Move the connection state Move the connection state from quic_conn_ctx struct to quic_conn struct, which is the structure used to store the QUIC connection part information. This structure is initialized by the I/O dgram handler for each new connection to QUIC listeners. This is needed for the multithread support so as not to depend on the connection context potentially initialized by another thread.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	9fccace8b0	MINOR: quic: Add a lock for RX packets We must protect against concurrent accesses the tree which stores the QUIC packets received by the dgram I/O handler, these packets being also parsed by the xprt task.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	497fa78ad8	MINOR: quic: Derive the initial secrets asap Make qc_new_isecs() depend only on quic_conn struct initialization (no more dependency on connection struct initialization) to be able to run it as soon as the quic_conn struct is initialized (from the I/O handler) before running ->accept() quic proto callback.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	3d77fa754d	MINOR: quic: QUIC conn initialization from I/O handler Move the QUIC conn (struct quic_conn) initialization from quic_sock_accept_conn() to qc_lstnr_pkt_rcv() as this is done for the server part. Move the timer initialization to ->start xprt callback to ensure the connection context is initialized: this is done by the ->accept callback which may be run by another thread than the one for the I/O handler which also runs ->start.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	785c9c998a	MINOR: quic: Replace max_packet_size by max_udp_payload size. The name of the maximum packet size transport parameter was ambiguous and was replaced by maximum UDP payload size. Our code would also be ambiguous if it did not reflect this change.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	8648c7c995	MINOR: quic: Avoid header collisions Extract the QUIC varints encoding functions from xprt_quic.h to avoid header collisions.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	738397065c	MINOR: quic: Add a wrapper function to update transport parameters. This function calls quic_mux_transport_params_update() to update the related streams transport parameter of the mux. It exists only to avoid having to include mux_quic.h to update these parameters.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	7a668f5acb	MINOR: quic: Variable-length integer encoding/decoding into/from buffer struct. Add a function to encode a QUIC varint into a buffer struct. Same thing for the decoding part.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	b4672fb6f0	MINOR: qpack: Add QPACK compression. Implement QPACK used for HTTP header compression by h3.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	ccac11f35a	MINOR: h3: Add HTTP/3 definitions. Add all the definitions for HTTP/3 implementation.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	b8f60b3c66	MINOR: quic: Add a new definition to store STREAM frames. Add a new structure to store enough information about STREAM frames which must be stored before being delivered to the application layer, for any reason.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	65bc43434a	MINOR: quic: Attach QUIC mux connection object to QUIC connection. This adds a qcc struct for QUIC mux/demux connection layer to quic_conn struct at low level connection layer.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	2abe74f39c	MINOR: connection: Add callbacks definitions for QUIC. The flow control at stream level is organized by types (client bidi, server bidi, client uni, server uni). Adds at least a callback to retrieve the number of available streams by direction.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	dfbae766b2	MINOR: mux_quic: Add QUIC mux layer. This file has been derived from mux_h2.c removing all h2 parts. At QUIC mux layer, there must not be any reference to http. This will be the responsibility of the application layer (h3) to open streams handled by the mux.	2021-09-23 15:27:25 +02:00
Frédéric Lécaille	5aa4143d6c	MINOR: quic: Move transport parameters to anonymous struct. We move ->params transport parameters to ->rx.params. They are the transport parameters which will be sent to the peer, and used for the endpoint flow control. So, they will be used to receive packets from the peer (RX part). Also move ->rx_tps transport parameters to ->tx.params. They are the transport parameters which are sent by the peer, and used to respect its flow control limits. So, they will be used when sending packets to the peer (TX part).	2021-09-23 15:27:25 +02:00
Tim Duesterhus	ec4a8754da	CLEANUP: Apply xalloc_size.cocci This fixes a few locations with a hardcoded type within `sizeof()`.	2021-09-17 17:22:05 +02:00
Tim Duesterhus	b113b5ca24	CLEANUP: Apply ist.cocci This cleans up ist handling.	2021-09-17 17:22:05 +02:00
Willy Tarreau	81a76f4827	REORG: threads: move ha_get_pthread_id() to tinfo.h This solely manipulates the thread_info struct, it ought to be in tinfo.h, not in thread.h.	2021-09-17 16:08:34 +02:00
Willy Tarreau	e61244631a	MINOR: applet: remove the thread mask from appctx_new() appctx_new() is exclusively called with tid_bit and it only uses the mask to pass it to the accompanying task. There is no point requiring the caller to know about a mask there, nor is there any point in creating an applet outside of the context of its own thread anyway. Let's drop this and pass tid_bit to task_new() directly.	2021-09-17 16:08:34 +02:00
Amaury Denoyelle	7a8aff2688	BUILD: ist: prevent gcc11 maybe-uninitialized warning on istalloc A new warning is reported by gcc11 when using a pointer to uninitialized memory block for a function with a const pointer argument. The warning is triggered for istalloc, used by http_client.c / proxy.c / tcpcheck.c. This warning is reported because the uninitialized memory block allocated by malloc should not be passed to a const argument as in ist2. See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmaybe-uninitialized This should be backported up to 2.2.	2021-09-17 09:57:27 +02:00
Willy Tarreau	c2afb860f2	MINOR: pools: use mallinfo2() when available instead of mallinfo() Ilya reported in issue #1391 a build warning on Fedora about mallinfo() being deprecated in favor of mallinfo2() since glibc-2.33. Let's add support for it. This should be backported where the following commit is also backported: `157e39303` ("MINOR: pools: automatically disable malloc_trim() with external allocators").	2021-09-16 09:20:16 +02:00
Tim Duesterhus	8f1669b10f	CLEANUP: Remove prototype for non-existent thread_get_default_count() This is the only location of `thread_get_default_count` within the codebase.	2021-09-15 11:07:18 +02:00
Tim Duesterhus	992007ec78	CLEANUP: tree-wide: fix prototypes for functions taking no arguments. "f(void)" is the correct and preferred form for a function taking no argument, while some places use the older "f()". These were reported by clang's -Wmissing-prototypes, for example: src/cpuset.c:111:5: warning: no previous prototype for function 'ha_cpuset_size' [-Wmissing-prototypes] int ha_cpuset_size() include/haproxy/cpuset.h:42:5: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function int ha_cpuset_size(); ^ void This aggregate patch fixes this for the following functions: ha_backtrace_to_stderr(), ha_cpuset_size(), ha_panic(), ha_random64(), ha_thread_dump_all_to_trash(), get_exec_path(), check_config_validity(), mworker_child_nb(), mworker_cli_proxy_(create\|stop)(), mworker_cleantasks(), mworker_cleanlisteners(), mworker_ext_launch_all(), mworker_reload(), mworker_(env\|proc_list)_to_(proc_list\|env)(), mworker_(un\|)block_signals(), proxy_adjust_all_maxconn(), proxy_destroy_all_defaults(), get_tainted(), pool_total_(allocated\|used)(), thread_isolate(_full\|)(), thread(_sync\|)_release(), thread_harmless_till_end(), thread_cpu_mask_forced(), dequeue_all_listeners(), next_timer_expiry(), wake_expired_tasks(), process_runnable_tasks(), init_acl(), init_buffer(), (de\|)init_log_buffers(), (de\|)init_pollers(), fork_poller(), pool_destroy_all(), pool_evict_from_local_caches(), pool_total_failures(), dump_pools_to_trash(), cfg_run_diagnostics(), tv_init_(process\|thread)_date(), __signal_process_queue(), deinit_signals(), haproxy_unblock_signals()	2021-09-15 11:07:18 +02:00
Willy Tarreau	4f5485bfad	BUG/MINOR: compat: make sure __WORDSIZE is always defined -Wundef triggered on a MIPS-based musl build on __WORDSIZE that's used in ultoa_o() and some Lua initialization. The former will fail to convert integers larger than 1 billion to proper string in this case. Let's make sure this macro is defined and fall back to values determined from __SIZEOF_LONG__ otherwise. A cleaner long-term approach would consist in removing all remaining occurrences of this macro. This can be backported to all versions.	2021-09-15 10:32:12 +02:00
Willy Tarreau	8ab9419394	BUILD: threads: fix -Wundef for _POSIX_PRIORITY_SCHEDULING on libmusl Building with an old musl-based toolchain reported this warning: include/haproxy/thread.h: In function 'ha_thread_relax': include/haproxy/thread.h:256:5: warning: "_POSIX_PRIORITY_SCHEDULING" is not defined [-Wundef] #if _POSIX_PRIORITY_SCHEDULING ^ There were indeed two "#if" instead of "#ifdef" for this macro, let's fix them.	2021-09-15 10:32:12 +02:00
Willy Tarreau	8ac6597cbe	BUILD: compiler: fixed a missing test on defined(__GNUC__) This one could theoretically trigger -Wundef on non-gcc compatible compilers if DEBUG_USE_ABORT is not set.	2021-09-13 09:30:47 +02:00
Tim Duesterhus	cf6f574872	CLEANUP: Move XXH3 macro from haproxy/compat.h to haproxy/xxhash.h This moves all the xxhash functionality into a single location. see `d5fc8fcb86`	2021-09-11 20:37:50 +02:00
Tim Düsterhus	a8bfb4d135	CLEANUP: ebmbtree: Replace always-taken elseif by else `diff` is guaranteed to be less than 0, because the `if` handles the `>= 0` case. Found using GitHub's CodeQL scan in HAProxy's codebase.	2021-09-11 20:15:28 +02:00
Tim Duesterhus	d5fc8fcb86	CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h This solves setting XXH_INLINE_ALL in a cleaner way, because the imported header is not modified, easing future updates. see `6f7cc11e6d`	2021-09-11 19:58:45 +02:00
Christopher Faulet	f079f44096	MINOR: htx: Skip headers with no value when adding a header list to a message When the header list is added, after the message parsing, headers with no value are now ignored. It is not the same as headers with empty value fields. Only headers with a NULL pointer as value are skipped. This only happens if the header value is removed during the message parsing. Concretely, such headers are now ignored when htx_add_all_headers() is called. However, htx_add_header() is not affected by this change. Symmetrically, the same is true for trailers. It may be backported to 2.4 because of the previous fix ("BUG/MEDIUM: mux-h1: Remove "Upgrade:" header for requests with payload").	2021-09-10 10:35:53 +02:00
devnexen@gmail.com	ac5f634cb1	BUILD: fix dragonfly build again on __read_mostly It looks like some versions define it and others not. Better rely on the macro itself rather than checking for a particular OS.	2021-09-08 19:46:29 +02:00
Willy Tarreau	61ecf28389	OPTIM: vars: only takes the variables lock on shared entries There's no point taking the variables locks for sess/txn/req/res contexts since these ones always run inside the same thread anyway. This patch conditions the lock on the variable's scope to avoid flushing cache lines when not needed. This showed an improvement of ~5% on a 16-thread machine with 12 variables.	2021-09-08 15:44:45 +02:00
Willy Tarreau	dc72fbb8e8	MINOR: vars: centralize the lock/unlock into static inlines The goal is to centralize the variables locking in order to later simplify it.	2021-09-08 15:19:57 +02:00
Willy Tarreau	3a4bedccc6	MEDIUM: vars: replace the global name index with a hash The global table of known variables names can only grow and was designed for static names that are registered at boot. Nowadays it's possible to set dynamic variable names from Lua or from the CLI, which causes a real problem that was partially addressed in 2.2 with commit `4e172c93f` ("MEDIUM: lua: Add `ifexist` parameter to `set_var`"). Please see github issue #624 for more context. This patch simplifies all this by removing the need for a central registry of known names, and storing 64-bit hashes instead. This is highly sufficient given the low number of variables in each context. The hash is calculated using XXH64() which is bijective over the 64-bit space thus is guaranteed collision-free for 1..8 chars. Above that the risk remains around 1/2^64 per extra 8 chars so in practice this is highly sufficient for our usage. A random seed is used at boot to seed the hash so that it's not attackable from Lua for example. There's one particular nit though. The "ifexist" hack mentioned above is now limited to variables of scope "proc" only, and will only match variables that were already created or declared, but will now verify the scope as well. This may affect some bogus Lua scripts and SPOE agents which used to accidentally work because a similarly named variable used to exist in a different scope. These ones may need to be fixed to comply with the doc. Now we can sum up the situation as this one: - ephemeral variables (scopes sess, txn, req, res) will always be usable, regardless of any prior declaration. 
This effectively addresses the most problematic change from the commit above that in order to work well could have required some script auditing ; - process-wide variables (scope proc) that are mentioned in the configuration, referenced in a "register-var-names" SPOE directive, or created via "set-var" in the global section or the CLI, are permanent and will always accept to be set, with or without the "ifexist" restriction (SPOE uses this internally as well). - process-wide variables (scope proc) that are only created via a set-var() tcp/http action, via Lua's set_var() calls, or via an SPOE with the "force-set-var" directive, will not be permanent but will always accept to be replaced once they are created, even if "ifexist" is present - process-wide variables (scope proc) that do not exist will only support being created via the set-var() tcp/http action, Lua's set_var() calls without "ifexist", or an SPOE declared with "force-set-var". This means that non-proc variables do not care about "ifexist" nor prior declaration, and that using "ifexist" should most often be reliable in Lua and that SPOE should most often work without any prior declaration. It may be doable to turn "ifexist" to 1 by default in Lua to further ease the transition. Note: regtests were adjusted. Cc: Tim Düsterhus <tim@bastelstu.be>	2021-09-08 15:06:11 +02:00
Willy Tarreau	c1c88f4809	MEDIUM: vars: make var_clear() only reset VF_PERMANENT variables We certainly do not want that a permanent variable (one that is listed in the configuration) be erased by accident by an "unset-var" action. Let's make sure these ones are only reset to an empty sample, like at the moment of their initial registration. One trick is that the same function is used to purge the memory at the end and to delete, so we need to add an extra "force" argument to make the choice.	2021-09-08 15:06:11 +02:00
Willy Tarreau	3dc6dc3178	MINOR: vars: store flags into variables and add VF_PERMANENT In order to continue to honor the ifexist Lua option and prevent rogue SPOA agents from creating too many variables, we'll need to keep the ability to mark certain proc.* variables as permanent when they're known from the config file. Let's add a flag there for this. It's added to the variable when the variable is created with this flag set by the caller. Another approach could have been to use a distinct list or distinct scope but that sounds complicated and bug-prone.	2021-09-08 14:06:34 +02:00
Willy Tarreau	4994b57728	MINOR: vars: add a VF_CREATEONLY flag for creation Passing this flag to var_set() will result in the variable to only be created if it did not exist, otherwise nothing is done (it's not even updated). This will be used for pre-registering names.	2021-09-08 11:47:30 +02:00
Willy Tarreau	7978c5c422	MEDIUM: vars: make the ifexist variant of set-var only apply to the proc scope When setting variables, there are currently two variants, one which will always create the variable, and another one, "ifexist", which will only create or update a variable if a similarly named variable in any scope already existed before. The goal was to limit the risk of injecting random names in the proc scope, but it was achieved by making use of the somewhat limited name indexing model, which explains the scope-agnostic restriction. With this change, we're moving the check downwards in the chain, at the variable level, and only variables under the scope "proc" will be subject to the restriction. A new set of VF_* flags was added to adjust how variables are set, and VF_UPDATEONLY is used to mention this restriction. In this exact state of affairs, this is not completely exact, as if a similar name was not known in any scope, the variable will continue to be rejected like before, but this will change soon.	2021-09-08 11:47:06 +02:00
Willy Tarreau	b7bfcb3ff3	MINOR: vars: rename vars_init() to vars_init_head() The vars_init() name is particularly confusing as it does not initialize the variables code but the head of a list of variables passed in arguments. And we'll soon need to have proper initialization code, so let's rename it now.	2021-09-08 11:10:16 +02:00
Willy Tarreau	10080716bf	MINOR: proxy: add a global "grace" directive to postpone soft-stop In ticket #1348 some users expressed some concerns regarding the removal of the "grace" directive from the proxies. Their use case very closely mimics the original intent of the grace keyword, which is, let haproxy accept traffic for some time when stopping, while indicating an external LB that it's stopping. This is implemented here by starting a task whose expiration triggers the soft-stop for real. The global "stopping" variable is immediately set however. For example, this below will be sufficient to instantly notify an external check on port 9999 that the service is going down, while other services remain active for 10s: global grace 10s frontend ext-check bind :9999 monitor-uri /ext-check monitor fail if { stopping }	2021-09-07 17:34:29 +02:00
Willy Tarreau	3b69886f7d	BUG/MAJOR: htx: fix missing header name length check in htx_add_header/trailer Ori Hollander of JFrog Security reported that htx_add_header() and htx_add_trailer() were missing a length check on the header name. While this does not allow to overwrite any memory area, it results in bits of the header name length to slip into the header value length and may result in forging certain header names on the input. The sad thing here is that a FIXME comment was present suggesting to add the required length checks :-( The injected headers are visible to the HTTP internals and to the config rules, so haproxy will generally stay synchronized with the server. But there is one exception which is the content-length header field, because it is already deduplicated on the input, but before being indexed. As such, injecting a content-length header after the deduplication stage may be abused to present a different, shorter one on the other side and help build a request smuggling attack, or even maybe a response splitting attack. CVE-2021-40346 was assigned to this problem. As a mitigation measure, it is sufficient to verify that no more than one such header is present in any message, which is normally the case thanks to the duplicate checks: http-request deny if { req.hdr_cnt(content-length) gt 1 } http-response deny if { res.hdr_cnt(content-length) gt 1 } This must be backported to all HTX-enabled versions, hence as far as 2.0. In 2.3 and earlier, the functions are in src/htx.c instead. Many thanks to Ori for his work and his responsible report!	2021-09-03 16:15:29 +02:00
Willy Tarreau	3d5f19e04d	CLEANUP: htx: remove comments about "must be < 256 MB" Since commit "BUG/MINOR: config: reject configs using HTTP with bufsize >= 256 MB" we are now sure that it's not possible anymore to have an HTX block of a size 256 MB or more, even after concatenation thanks to the tests for len >= htx_free_data_space(). Let's remove these now obsolete comments. A BUG_ON() was added in htx_add_blk() to track any such exception if the conditions would change later, to complete the one that is performed on the start address that must remain within the buffer.	2021-09-03 16:15:29 +02:00
Willy Tarreau	e352b9dac7	MINOR: vars: make vars_get_by_* support an optional default value In preparation for support default values when fetching variables, we need to update the internal API to pass an extra argument to functions vars_get_by_{name,desc} to provide an optional default value. This patch does this and always passes NULL in this argument. var_to_smp() was extended to fall back to this value when available.	2021-09-03 12:08:54 +02:00
Willy Tarreau	9a621ae76d	MEDIUM: vars: add a new "set-var-fmt" action The set-var() action is convenient because it preserves the input type but it's a pain to deal with when trying to concatenate values. The most recurring example is when it's needed to build a variable composed of the source address and the source port. Usually it ends up like this: tcp-request session set-var(sess.port) src_port tcp-request session set-var(sess.addr) src,concat(":",sess.port) This is even worse when trying to aggregate multiple fields from stick-table data for example. Due to this a lot of users instead abuse headers from HTTP rules: http-request set-header(x-addr) %[src]:%[src_port] But this requires some careful cleanups to make sure they won't leak, and it's significantly more expensive to deal with. And generally speaking it's not clean. Plus it must be performed for each and every request, which is expensive for this common case of ip+port that doesn't change for the whole session. This patch addresses this limitation by implementing a new "set-var-fmt" action which performs the same work as "set-var" but takes a format string in argument instead of an expression. This way it becomes pretty simple to just write: tcp-request session set-var-fmt(sess.addr) %[src]:%[src_port] It is usable in all rulesets that already support the "set-var" action. It is not yet implemented for the global "set-var" directive (which already takes a string) and the CLI's "set var" command, which would definitely benefit from it but currently uses its own parser and engine, thus it must be reworked. The doc and regtests were updated.	2021-09-02 21:22:22 +02:00
Willy Tarreau	57467b8356	MINOR: sample: add missing ARGC_ entries For a long time we couldn't have arguments in expressions used in tcp-request, tcp-response etc rules. But now due to the variables it's possible, and their context in case of failure to resolve an argument (e.g. backend name not found) is not properly reported because there is no arg context values in ARGC_* to report them. Let's add a number of missing ones for tcp-request {connection, session,content}, tcp-response content, tcp-check, the config parser (for "set-var" in the global section) and the CLI parser (for "set-var" on the CLI).	2021-09-02 19:43:20 +02:00
Willy Tarreau	bc1223be79	MINOR: http-rules: add a new "ignore-empty" option to redirects. Sometimes it is convenient to remap large sets of URIs to new ones (e.g. after a site migration for example). This can be achieved using "http-request redirect" combined with maps, but one difficulty there is that non-matching entries will return an empty response. In order to avoid this, duplicating the operation as an ACL condition ending in "-m found" is possible but it becomes complex and error-prone while it's known that an empty URL is not valid in a location header. This patch addresses this by improving the redirect rules to be able to simply ignore the rule and skip to the next one if the result of the evaluation of the "location" expression is empty. However in order not to break existing setups, it requires a new "ignore-empty" keyword. There used to be an ACT_FLAG_FINAL on redirect rules that's used during the parsing to emit a warning if followed by another rule, so here we only set it if the option is not there. The http_apply_redirect_rule() function now returns a 3rd value to mention that it did nothing and that this was not an error, so that callers can just ignore the rule. The regular "redirect" rules were not modified however since this does not apply there. The map_redirect VTC was completed with such a test and updated to 2.5 and an example was added into the documentation.	2021-09-02 17:06:18 +02:00
Tim Duesterhus	abc6b31ab8	CLEANUP: Add missing include guard to signal.h Found using GitHub's CodeQL scan.	2021-09-01 21:39:19 +02:00
Willy Tarreau	87154e3010	BUG/MAJOR: queue: better protect a pendconn being picked from the proxy The locking in the dequeuing process was significantly improved by commit `49667c14b` ("MEDIUM: queue: take the proxy lock only during the px queue accesses") in that it tries hard to limit the time during which the proxy's queue lock is held to the strict minimum. Unfortunately it's not enough anymore, because we take up the task and manipulate a few pendconn elements after releasing the proxy's lock (while we're under the server's lock) but the task will not necessarily hold the server lock since it may not have successfully found one (e.g. timeout in the backend queue). As such, stream_free() calling pendconn_free() may release the pendconn immediately after the proxy's lock is released while the other thread currently proceeding with the dequeuing tries to wake up the owner's task and dies in task_wakeup(). One solution consists in releasing le proxy's lock later. But tests have shown that we'd have to sacrifice a significant share of the performance gained with the patch above (roughly a 20% loss). This patch takes another approach. It adds a "del_lock" to each pendconn struct, that allows to keep it referenced while the proxy's lock is being released. It's mostly a serialization lock like a refcount, just to maintain the pendconn alive till the task_wakeup() call is complete. This way we can continue to release the proxy's lock early while keeping this one. It had to be added to the few points where we're about to free a pendconn, namely in pendconn_dequeue() and pendconn_unlink(). This way we continue to release the proxy's lock very early and there is no performance degradation. This lock may only be held under the queue's lock to prevent lock inversion. No backport is needed since the patch above was merged in 2.5-dev only.	2021-08-31 18:37:13 +02:00
Remi Tricot-Le Breton	fe21fe76bd	MINOR: log: Add new "error-log-format" option This option can be used to define a specific log format that will be used in case of error, timeout, connection failure on a frontend... It will be used for any log line concerned by the log-separate-errors option. It will also replace the format of specific error messages described in section 8.2.6. If no "error-log-format" is defined, the legacy error messages are still emitted and the other error logs keep using the regular log-format.	2021-08-31 12:13:08 +02:00
Willy Tarreau	ea57a9b103	BUILD: ssl: next round of build warnings on LIBRESSL_VERSION_NUMBER Other build warnings were emitted on LIBRESSL_VERSION_NUMBER with -Wundef under openssl < 1.1. Related to GH issue #1369. Seems like some of them could be simplified a little bit.	2021-08-30 06:20:46 +02:00
Willy Tarreau	a01f8ce2d4	BUILD/MINOR: regex: avoid a build warning on USE_PCRE2 with -Wundef regex-t emits a warning on #elif USE_PCRE2 when built with -Wundef, let's just fix it. This was reported in GH issue #1369.	2021-08-28 12:49:58 +02:00
Willy Tarreau	6e5542e9f4	BUILD/MINOR: ssl: avoid a build warning on LIBRESSL_VERSION with -Wundef Openssl-compat emits a warning for the test on LIBRESSL_VERSION that might be undefined, if built with -Wundef. The fix is easy, let's do it. Related to GH issue #1369.	2021-08-28 12:06:51 +02:00
Willy Tarreau	33056436c7	BUILD/MINOR: defaults: eliminate warning on MAXHOSTNAMELEN with -Wundef As reported in GH issue #1369, there is a single case of #if with a possibly undefined value in defaults.h which is on MAXHOSTNAMELEN. Let's turn it to a #ifdef.	2021-08-28 12:05:32 +02:00
Willy Tarreau	cbdc74b4b3	BUG/MINOR: ebtree: remove dependency on incorrect macro for bits per long The code used to rely on BITS_PER_LONG to decide on the most efficient way to perform a 64-bit shift, but this macro is not defined (at best it's __BITS_PER_LONG) and it's likely that it's been like this since the early implementation of ebtrees designed on i386. Let's remove the test on this macro and rely on sizeof(long) instead, it also has the benefit of letting the compiler validate the two branches. This can be backported to all versions. Thanks to Ezequiel Garcia for reporting this one in issue #1369.	2021-08-28 11:55:53 +02:00
Willy Tarreau	fe456c581f	MINOR: time: add report_idle() to report process-wide idle time Before threads were introduced in 1.8, idle_pct used to be a global variable indicating the overall process idle time. Threads made it thread-local, meaning that its reporting in the stats made little sense, though this was not easy to spot. In 2.0, the idle_pct variable moved to the struct thread_info via commit `81036f273` ("MINOR: time: move the cpu, mono, and idle time to thread_info"). It made it more obvious that the idle_pct was per thread, and also allowed to more accurately measure it. But no more effort was made in that direction. This patch introduces a new report_idle() function that accurately averages the per-thread idle time over all running threads (i.e. it should remain valid even if some threads are paused or stopped), and makes use of it in the stats / "show info" reports. Sending traffic over only two connections of an 8-thread process would previously show this erratic CPU usage pattern: $ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done Idle_pct: 30 Idle_pct: 35 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 35 Idle_pct: 33 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Idle_pct: 100 Now it shows this more accurate measurement: $ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 Idle_pct: 83 This is not technically a bug but this lack of precision definitely affects some users who rely on the idle_pct measurement. This should at least be backported to 2.4, and might be to some older releases depending on users demand.	2021-08-28 11:18:10 +02:00
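The averaging described in the report_idle() commit can be illustrated with a small sketch. This is not HAProxy's implementation: the function name `avg_idle_pct`, the plain array of per-thread percentages and the 64-bit `running_mask` are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper, not HAProxy's report_idle(): averages the per-thread
 * idle percentages over the threads whose bit is set in running_mask.
 */
unsigned int avg_idle_pct(const unsigned int *idle_pct,
                          unsigned int nbthreads,
                          uint64_t running_mask)
{
    unsigned int total = 0, count = 0, thr;

    for (thr = 0; thr < nbthreads && thr < 64; thr++) {
        if (!(running_mask & (UINT64_C(1) << thr)))
            continue; /* skip paused or stopped threads */
        total += idle_pct[thr];
        count++;
    }
    /* with no running thread, report fully idle rather than divide by zero */
    return count ? total / count : 100;
}
```

With two threads at about 33% idle and six at 100%, the average over all eight threads is 666 / 8, which truncates to 83 and is consistent with the steady value shown in the commit message.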
Willy Tarreau	e365aa28d4	BUG/MINOR: time: fix idle time computation for long sleeps In 2.4 we extended the max poll time from 1s to 60s with commit `4f59d3861` ("MINOR: time: increase the minimum wakeup interval to 60s"). This had the consequence that the calculation of the idle time percentage may overflow during the multiply by 100 if the thread had slept 43s or more. Let's change this to a 64 bit computation. This will have no performance impact since this is done at most twice per second. This should fix github issue #1366. This must be backported to 2.4.	2021-08-27 23:36:20 +02:00
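The overflow mentioned in the idle time fix can be reproduced with a short sketch. It assumes that times are counted in microseconds in a 32-bit variable, which is inferred from the 43s threshold (2^32 / 100 is about 42.9 million) and not stated in the commit; both function names are hypothetical.

```c
#include <stdint.h>

/* 32-bit variant: idle_us * 100 wraps around once idle_us exceeds
 * 42949672 (about 42.9 seconds when counting microseconds).
 */
unsigned int idle_pct_32(uint32_t idle_us, uint32_t total_us)
{
    uint32_t scaled = (uint32_t)(idle_us * 100u); /* may wrap modulo 2^32 */

    return total_us ? scaled / total_us : 100;
}

/* 64-bit variant: the product is computed on 64 bits and cannot wrap */
unsigned int idle_pct_64(uint32_t idle_us, uint32_t total_us)
{
    return total_us ? (unsigned int)((uint64_t)idle_us * 100 / total_us) : 100;
}
```

For a thread that slept the whole 43 seconds, the 32-bit variant yields 0 while the 64-bit variant yields 100.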
Marcin Deranek	310a260e4a	MEDIUM: config: Deprecate tune.ssl.capture-cipherlist-size Deprecate tune.ssl.capture-cipherlist-size in favor of tune.ssl.capture-buffer-size which better describes the purpose of the setting.	2021-08-26 19:52:04 +02:00
Marcin Deranek	959a48c116	MINOR: sample: Expose SSL captures using new fetchers To be able to provide JA3 compatible TLS Fingerprints we need to expose all Client Hello captured data using fetchers. Patch provides new and modifies existing fetchers to add ability to filter out GREASE values: - ssl_fc_cipherlist_* - ssl_fc_ecformats_bin - ssl_fc_eclist_bin - ssl_fc_extlist_bin - ssl_fc_protocol_hello_id	2021-08-26 19:48:34 +02:00
Marcin Deranek	769fd2e447	MEDIUM: ssl: Capture more info from Client Hello When we set tune.ssl.capture-cipherlist-size to a non-zero value we are able to capture cipherlist supported by the client. To be able to provide JA3 compatible TLS fingerprinting we need to capture more information from Client Hello message: - SSL Version - SSL Extensions - Elliptic Curves - Elliptic Curve Point Formats This patch allows HAProxy to capture such information and store it for later use.	2021-08-26 19:48:33 +02:00
Willy Tarreau	906f7daed1	MINOR: compiler: implement an ONLY_ONCE() macro There are regularly places, especially in config analysis, where we need to report certain things (warnings or errors) only once, but where implementing a counter is sufficiently deterrent so that it's not done. Let's add a simple ONLY_ONCE() macro that implements a static variable (char) which is atomically turned on, and returns true if it's set for the first time. This uses fairly compact code, a single byte of BSS and is thread-safe. There are probably a number of places in the config parser where this could be used. It may also be used to implement a WARN_ON() similar to BUG_ON() but which would only warn once.	2021-08-26 16:35:00 +02:00
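A macro of the kind described for ONLY_ONCE() might look like the sketch below. It is an approximation, not the code from the commit: it relies on GNU C statement expressions and the GCC/Clang `__atomic_exchange_n` builtin, and the names `ONLY_ONCE_SKETCH` and `count_first_hits` are made up for the example.

```c
/* Sketch only: HAProxy's real macro may be written differently. Each place
 * where the macro is expanded gets its own static byte, which is atomically
 * set to 1; the expression is true only for the caller that saw 0.
 */
#define ONLY_ONCE_SKETCH() ({                                   \
        static char once_flag;                                  \
        !__atomic_exchange_n(&once_flag, 1, __ATOMIC_RELAXED);  \
})

/* Hypothetical caller: returns how many of <calls> iterations were "first" */
int count_first_hits(int calls)
{
    int hits = 0;

    while (calls-- > 0) {
        if (ONLY_ONCE_SKETCH())
            hits++; /* e.g. emit a warning here, only once per process */
    }
    return hits;
}
```

Because the flag is static, a second call to `count_first_hits()` finds it already set and reports no hit at all.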
Amaury Denoyelle	5cca48cba2	MINOR: server: define non purgeable server flag Define a flag to mark a server as non purgeable. This flag will be used for "delete server" CLI handler. All servers without this flag will be eligible to runtime suppression.	2021-08-25 15:53:54 +02:00
Amaury Denoyelle	bc2ebfa5a4	MEDIUM: server: extend refcount for all servers In a future patch, it will be possible to remove at runtime every server, both static and dynamic. This requires to extend the server refcount for all instances. First, refcount manipulation functions have been renamed to better express the API usage. * srv_refcount_use -> srv_take The refcount is always initialized to 1 on the server creation in new_server. It's also incremented for each check/agent configured on a server instance. * free_server -> srv_drop This decrements the refcount and if null, the server is freed, so code calling it must not use the server reference after it. As a bonus, this function now returns the next server instance. This is useful when calling on the server loop without having to save the next pointer before each invocation. In these functions, remove the checks that prevent refcount on non-dynamic servers. Each reference to "dynamic" in variable/function naming has been eliminated as well.	2021-08-25 15:53:54 +02:00
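The take/drop pattern described above can be sketched as follows. The structure and the `srv_sketch_*` names are hypothetical and much simpler than HAProxy's struct server; the atomic operations use GCC/Clang builtins as a stand-in for whatever the real code uses.

```c
#include <stdlib.h>

/* Hypothetical, simplified server: the real structure has many more fields */
struct srv_sketch {
    unsigned int refcount;      /* starts at 1 when the server is created */
    struct srv_sketch *next;    /* next server in the list */
};

struct srv_sketch *srv_sketch_new(struct srv_sketch *next)
{
    struct srv_sketch *srv = calloc(1, sizeof(*srv));

    if (srv) {
        srv->refcount = 1;
        srv->next = next;
    }
    return srv;
}

/* takes one extra reference, e.g. for a check attached to the server */
void srv_sketch_take(struct srv_sketch *srv)
{
    __atomic_add_fetch(&srv->refcount, 1, __ATOMIC_SEQ_CST);
}

/* drops one reference and frees the server when none is left; the next
 * pointer is read first because <srv> must not be touched after the free.
 */
struct srv_sketch *srv_sketch_drop(struct srv_sketch *srv)
{
    struct srv_sketch *next;

    if (!srv)
        return NULL;
    next = srv->next;
    if (__atomic_sub_fetch(&srv->refcount, 1, __ATOMIC_SEQ_CST) == 0)
        free(srv);
    return next;
}
```

Returning the next instance lets a teardown loop be written as `while (srv) srv = srv_sketch_drop(srv);` without saving the next pointer beforehand.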
Amaury Denoyelle	0a8d05d31c	BUG/MINOR: stats: use refcount to protect dynamic server on dump A dynamic server may be deleted at runtime at the same moment when the stats applet is pointing to it. Use the server refcount to prevent deletion in this case. This should be backported up to 2.4, with an observability period of 2 weeks. Note that it requires the dynamic server refcounting feature which has been implemented on 2.5; the following commits are required : - MINOR: server: implement a refcount for dynamic servers - BUG/MINOR: server: do not use refcount in free_server in stopping mode - MINOR: server: return the next srv instance on free_server	2021-08-25 15:53:43 +02:00
Amaury Denoyelle	f5c1e12e44	MINOR: server: return the next srv instance on free_server As a convenience, return the next server instance from servers list on free_server. This is particularly useful when using this function on the servers list without having to save the next pointer before calling it.	2021-08-25 15:29:19 +02:00
Ilya Shipitsin	ff0f278860	CLEANUP: assorted typo fixes in the code and comments This is 26th iteration of typo fixes	2021-08-25 05:13:31 +02:00
William Lallemand	3aeb3f9347	MINOR: cfgcond: implements openssl_version_atleast and openssl_version_before Implements a way of checking the running openssl version: If the OpenSSL support was not compiled within HAProxy it will return an error, so it's recommended to do an SSL feature check first: $ ./haproxy -cc 'feature(OPENSSL) && openssl_version_atleast(0.9.8zh) && openssl_version_before(3.0.0)' This will allow to select the SSL reg-tests more carefully.	2021-08-22 00:30:24 +02:00
William Lallemand	44d862d8d4	MINOR: ssl: add an openssl version string parser openssl_version_parser() parses a string in the OpenSSL version format which is documented here: https://www.openssl.org/docs/man1.1.1/man3/OPENSSL_VERSION_NUMBER.html The function returns an unsigned int that could be used for comparing openssl versions.	2021-08-21 23:44:02 +02:00
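To give an idea of what such a parser does, here is a minimal sketch that follows the MNNFFPPS layout from the OpenSSL 1.1.1 documentation linked above. It is not HAProxy's openssl_version_parser(): the name `parse_ossl_version_sketch` is made up, only plain release strings such as "1.1.1k" or "0.9.8zh" are handled (no "-dev" or "-beta" suffix), the status nibble is always set to 0xf, and the order of the patch letters is not validated.

```c
#include <ctype.h>
#include <stdlib.h>

/* Sketch only: returns the version packed as MNNFFPPS, or 0 when the
 * string is malformed.
 */
unsigned int parse_ossl_version_sketch(const char *s)
{
    unsigned long part[3];
    unsigned int patch = 0;
    char *end;
    int i;

    /* major.minor.fix, each a decimal number */
    for (i = 0; i < 3; i++) {
        if (!isdigit((unsigned char)*s))
            return 0;
        part[i] = strtoul(s, &end, 10);
        if (part[i] > (i == 0 ? 0xfUL : 0xffUL))
            return 0;
        s = end;
        if (i < 2) {
            if (*s != '.')
                return 0;
            s++;
        }
    }

    /* patch letters: 'a' = 1 ... 'z' = 26, then "za" = 27, "zb" = 28, ... */
    while (*s >= 'a' && *s <= 'z') {
        patch += (unsigned int)(*s - 'a' + 1);
        s++;
    }
    if (*s != '\0' || patch > 0xff)
        return 0;

    /* status nibble 0xf marks a release in this layout */
    return ((unsigned int)part[0] << 28) | ((unsigned int)part[1] << 20) |
           ((unsigned int)part[2] << 12) | (patch << 4) | 0xfu;
}
```

Packing the fields from most to least significant is what makes the result directly comparable with `<` and `>=`, which is the property a version check needs.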
William Lallemand	2a8fe8bb48	MINOR: httpclient: cleanup the include files Include the correct .h files in http_client.c and http_client.h. The api.h is needed in http_client.c and http_client-t.h is now included directly from http_client.h	2021-08-20 14:25:15 +02:00
Remi Tricot-Le Breton	f95c29546c	BUILD/MINOR: ssl: Fix compilation with OpenSSL 1.0.2 The X509_STORE_CTX_get0_cert did not exist yet on OpenSSL 1.0.2 and neither did X509_STORE_CTX_get0_chain, which was not actually needed since its get1 equivalent already existed.	2021-08-20 10:05:58 +02:00
Remi Tricot-Le Breton	74f6ab6e87	MEDIUM: ssl: Keep a reference to the client's certificate for use in logs Most of the SSL sample fetches related to the client certificate were based on the SSL_get_peer_certificate function which returns NULL when the verification process failed. This made it impossible to use those fetches in a log format since they would always be empty. The patch adds a reference to the X509 object representing the client certificate in the SSL structure and makes use of this reference in the fetches. The reference can only be obtained in ssl_sock_bind_verifycbk which means that in case of an SSL error occurring before the verification process ("no shared cipher" for instance, which happens while processing the Client Hello), we won't ever start the verification process and it will be impossible to get information about the client certificate. This patch also allows most of the ssl_c_XXX fetches to return a usable value in case of connection failure (because of a verification error for instance) by making the "conn->flags & CO_FL_WAIT_XPRT" test (which requires a connection to be established) less strict. Thanks to this patch, a log-format such as the following should return usable information in case of an error occurring during the verification process : log-format "DN=%{+Q}[ssl_c_s_dn] serial=%[ssl_c_serial,hex] \ hash=%[ssl_c_sha1,hex]" It should answer to GitHub issue #693.	2021-08-19 23:26:05 +02:00
William Lallemand	33b0d095cc	MINOR: httpclient: implement a simple HTTP Client API This commit implements a very simple HTTP Client API. A client can be operated by several functions: - httpclient_new(), httpclient_destroy(): create and destroy the struct httpclient instance. - httpclient_req_gen(): generate a complete HTX request using the absolute URL, the method and a list of headers. This request is complete and sets the HTX End of Message flag. This is limited to small requests that don't need a body. - httpclient_start() fill a sockaddr storage with an IP extracted from the URL (it cannot resolve an FQDN for now), start the applet. It also stores the ptr of the caller which could be an appctx or something else. - hc->ops contains a list of callbacks used by the HTTPClient, they should be filled manually after an httpclient_new(): * res_stline(): the client received a start line, its content will be stored in hc->res.vsn, hc->res.status, hc->res.reason * res_headers(): the client received headers, they are stored in hc->res.hdrs. * res_payload(): the client received some payload data, they are stored in the hc->res.buf buffer and could be extracted with the httpclient_res_xfer() function, which takes a destination buffer as a parameter * res_end(): this callback is called once we have finished receiving the response.	2021-08-18 17:36:32 +02:00
Willy Tarreau	d3d8d03d98	MINOR: http: add a new function http_validate_scheme() to validate a scheme While http_parse_scheme() extracts a scheme from a URI by extracting exactly the valid characters and stopping on delimiters, this new function performs the same on a fixed-size string.	2021-08-17 10:16:22 +02:00
Ilya Shipitsin	01881087fc	CLEANUP: assorted typo fixes in the code and comments This is 25th iteration of typo fixes	2021-08-16 12:37:59 +02:00
Christopher Faulet	df97ac4584	MEDIUM: filters/lua: Add HTTPMessage class to help HTTP filtering This new class exposes methods to manipulate HTTP messages from a filter written in lua. Like for the HTTP class, there is a bunch of methods to manipulate the message headers. But there are also methods to manipulate the message payload. This part is similar to what is available in the Channel class. Thus the payload can be duplicated, erased, modified or forwarded. For now, only DATA blocks can be retrieved and modified because the current API is limited. No HTTPMessage method is able to yield. Those manipulating the headers are always called on messages containing all the headers, so there is no reason to yield. Those manipulating the payload are called from the http_payload filters callback function where yielding is forbidden. When an HTTPMessage object is instantiated, the underlying Channel object can be retrieved via the ".channel" field. For now this class is not used because the HTTP filtering is not supported yet. It will be the purpose of another commit. There is no documentation for now.	2021-08-12 08:57:07 +02:00
Christopher Faulet	8c9e6bba0f	MINOR: lua: Add flags on the lua TXN to know the execution context A lua TXN can be created when a sample fetch, an action or a filter callback function is executed. A flag is now used to track the execution context. Respectively, HLUA_TXN_SMP_CTX, HLUA_TXN_ACT_CTX and HLUA_TXN_FLT_CTX. The filter flag is not used for now.	2021-08-12 08:57:07 +02:00
Christopher Faulet	1f43a3430e	MINOR: lua: Add a flag on lua context to know the yield capability at run time When a script is executed, a flag is used to allow it to yield. An error is returned if a lua function yields, explicitly or not. But there is no way to get this capability in C functions. So there is no way to choose to yield or not depending on this capability. To fill this gap, the flag HLUA_NOYIELD is introduced and added on the lua context if the current script execution is not authorized to yield. Macros to set, clear and test this flag are also added. This feature will be useful to fix some bugs in lua actions execution.	2021-08-12 08:57:07 +02:00
William Lallemand	8c29fa7454	MINOR: channel: remove an htx block from a channel co_htx_remove_blk() implements a way to remove an htx block from a channel buffer and update the channel output.	2021-08-12 00:51:59 +02:00
Amaury Denoyelle	7afa5c1843	MINOR: global: define MODE_STOPPING Define a new mode MODE_STOPPING. It is used to indicate that the process is in the stopping stage and no event loop runs anymore.	2021-08-09 17:51:55 +02:00
Amaury Denoyelle	b33a0abc0b	MEDIUM: check: implement check deletion for dynamic servers Implement a mechanism to free a started check on runtime for dynamic servers. A new function check_purge is created for this. The check task will be marked for deletion and scheduled to properly close connection elements and free the task/tasklet/buf_wait elements. This function will be useful to delete a dynamic server with checks.	2021-08-06 11:09:48 +02:00
Amaury Denoyelle	d6b7080cec	MINOR: server: implement a refcount for dynamic servers It is necessary to have a refcount mechanism on dynamic servers to be able to enable check support. Indeed, when deleting a dynamic server with check activated, the check will be asynchronously removed. This is mandatory to properly free the check resources in a thread-safe manner. The server instance must be kept alive for this.	2021-08-06 11:09:48 +02:00
Amaury Denoyelle	3c2ab1a0d4	MINOR: check: export check init functions Remove static qualifier on init_srv_check, init_srv_agent_check and start_check_task. These functions will be called in server.c for dynamic servers with checks.	2021-08-06 11:08:04 +02:00
Amaury Denoyelle	7b368339af	MEDIUM: task: implement tasklet kill Implement an equivalent of task_kill for tasklets. This function can be used to request a tasklet deletion in a thread-safe way. Currently this function is unused.	2021-08-06 11:07:48 +02:00
Christopher Faulet	434b8525ee	MINOR: spoe: Add a pointer on the filter config in the spoe_agent structure There was no way to access the SPOE filter configuration from the agent object. However it could be handy to have it. And in fact, this will be required to fix a bug.	2021-08-05 10:07:43 +02:00
Willy Tarreau	7b2ac29a92	CLEANUP: fd: remove the now unneeded fd_mig_lock This is not needed anymore since we don't use it when setting the running mask anymore.	2021-08-04 16:03:36 +02:00
Willy Tarreau	b201b1dab1	CLEANUP: fd: remove the now unused fd_set_running() It was inlined inside fd_update_events() since it relies on a loop that may return immediate failure codes.	2021-08-04 16:03:36 +02:00
Willy Tarreau	f69fea64e0	MAJOR: fd: get rid of the DWCAS when setting the running_mask Right now we're using a DWCAS to atomically set the running_mask while being constrained by the thread_mask. This DWCAS is annoying because we may seriously need it later when adding support for thread groups, for checking that the running_mask applies to the correct group. It turns out that the DWCAS is not strictly necessary because we never need it to set the thread_mask based on the running_mask, only the other way around. And in fact, the running_mask is always cleared alone, and the thread_mask is changed alone as well. The running_mask is only relevant to indicate a takeover when the thread_mask matches it. Any bit set in running and not present in thread_mask indicates a transition in progress. As such, it is possible to re-arrange this by using a regular CAS around a consistency check between running_mask and thread_mask in fd_update_events and by making a CAS on running_mask then an atomic store on the thread_mask in fd_takeover(). The only other case is fd_delete() but that one already sets the running_mask before clearing the thread_mask, which is compatible with the consistency check above. This change has happily survived 10 billion takeovers on a 16-thread machine at 800k requests/s. The fd-migration doc was updated to reflect this change.	2021-08-04 16:03:36 +02:00
Willy Tarreau	b1f29bc625	MINOR: activity/fd: remove the dead_fd counter This one is set whenever an FD is reported by a poller with a null owner, regardless of the thread_mask. It has become totally meaningless because it only indicates a migrated FD that was not yet reassigned to a thread, but as soon as a thread uses it, the status will change to skip_fd. Thus there is no reason to distinguish between the two, it adds more confusion than it helps. Let's simply drop it.	2021-08-04 16:03:36 +02:00
Willy Tarreau	88d1c5d3fb	MEDIUM: threads: add a stronger thread_isolate_full() call The current principle of running under isolation was made to access sensitive data while being certain that no other thread was using them in parallel, without necessarily having to place locks everywhere. The main use case are "show sess" and "show fd" which run over long chains of pointers. The thread_isolate() call relies on the "harmless" bit that indicates for a given thread that it's not currently doing such sensitive things, which is advertised using thread_harmless_now() and which ends using thread_harmless_end(), which also waits for possibly concurrent threads to complete their work if they took this opportunity for starting something tricky. As some system calls were notoriously slow (e.g. mmap()), a bunch of thread_harmless_now() / thread_harmless_end() were placed around them to let waiting threads do their work while such other threads were not able to modify memory contents. But this is not sufficient for performing memory modifications. One such example is the server deletion code. By modifying memory, it not only requires that other threads are not playing with it, but are not either in the process of touching it. The fact that a pool_alloc() or pool_free() on some structure may call thread_harmless_now() and let another thread start to release the same object's memory is not acceptable. This patch introduces the concept of "idle threads". Threads entering the polling loop are idle, as well as those that are waiting for all others to become idle via the new function thread_isolate_full(). Once thread_isolate_full() is granted, the thread is not idle anymore, and it is released using thread_release() just like regular isolation. Its users have to keep in mind that across this call nothing is granted as another thread might have performed shared memory modifications. But such users are extremely rare and are actually expecting this from their peers as well. 
Note that in case of backport, this patch depends on previous patch: MINOR: threads: make thread_release() not wait for other ones to complete	2021-08-04 14:49:36 +02:00
William Lallemand	8e765b86fd	MINOR: proxy: disabled takes a stopping and a disabled state This patch splits the disabled state of a proxy into a PR_DISABLED and a PR_STOPPED state. The first one is set when the proxy is disabled in the configuration file, and the second one is set upon a stop_proxy().	2021-08-03 14:17:45 +02:00
William Lallemand	56f1f75715	MINOR: log: rename 'dontloglegacyconnerr' to 'log-error-via-logformat' Rename the 'dontloglegacyconnerr' option to 'log-error-via-logformat' which is much more self-explanatory and readable. Note: only legacy keywords don't use hyphens, it is recommended to separate words with them in new keywords.	2021-08-02 10:42:42 +02:00
Willy Tarreau	99198546f6	MEDIUM: atomic: relax the load/store barriers on x86_64 The x86-tso model makes the load and store barriers unneeded for our usage as long as they perform at least a compiler barrier: the CPU will respect store ordering and store vs load ordering. It's thus safe to remove the lfence and sfence which are normally needed only to communicate with external devices. Let's keep the mfence though, to make sure that reads of same memory location after writes report the value from memory and not the one snooped from the write buffer for too long. An in-depth review of all use cases tends to indicate that this is okay in the rest of the code. Some parts could be cleaned up to use atomic stores and atomic loads instead of explicit barriers though. Doing this reliably increases the overall performance by about 2-2.5% on a 8c-16t Xeon thanks to less frequent flushes (it's likely that the biggest gain is in the MT lists which use them a lot, and that this results in less cache line flushes).	2021-08-01 17:34:06 +02:00
Willy Tarreau	cb0451146f	MEDIUM: atomic: simplify the atomic load/store/exchange operations The atomic_load/atomic_store/atomic_xchg operations were all forced to __ATOMIC_SEQ_CST, which results in explicit store or even full barriers even on x86-tso while we do not need them: we're not communicating with external devices for example and are only interested in respecting the proper ordering of loads and stores between each other. These ones being rarely used, the emitted code on x86 remains almost the same (barring a handful of locations). However they will allow to place correct barriers at other places where atomics are accessed a bit lightly. The patch is marked medium because we can never rule out the risk of some bugs on more relaxed platforms due to the rest of the code.	2021-08-01 17:34:06 +02:00
Willy Tarreau	55a0975b1e	BUG/MINOR: freq_ctr: use stricter barriers between updates and readings update_freq_ctr_period() was using relaxed atomics without using barriers, which usually works fine on x86 but not everywhere else. In addition, some values were read without being enclosed by barriers, allowing the compiler to possibly prefetch them a bit earlier. Finally, freq_ctr_total() was also reading these without enough barriers. Let's make explicit use of atomic loads and atomic stores to get rid of this situation. This required to slightly rearrange the freq_ctr_total() loop, which could possibly slightly improve performance under extreme contention by avoiding to reread all fields. A backport may be done to 2.4 if a problem is encountered, but last tests on arm64 with LSE didn't show any issue so this can possibly stay as-is.	2021-08-01 17:34:06 +02:00
Willy Tarreau	200bd50b73	MEDIUM: fd: rely more on fd_update_events() to detect changes This function already performs a number of checks prior to calling the IOCB, and detects the change of thread (FD migration). Half of the controls are still in each poller, and these pollers also maintain activity counters for various cases. Note that the unreliable test on thread_mask was removed so that only the one performed by fd_set_running() is now used, since this one is reliable. Let's centralize all that fd-specific logic into the function and make it return a status among: FD_UPDT_DONE, // update done, nothing else to be done FD_UPDT_DEAD, // FD was already dead, ignore it FD_UPDT_CLOSED, // FD was closed FD_UPDT_MIGRATED, // FD was migrated, ignore it now Some pollers already used to call it last and have nothing to do after it, regardless of the result. epoll has to delete the FD in case a migration is detected. Overall this removes more code than it adds.	2021-07-30 17:45:18 +02:00
Willy Tarreau	84c7922c52	REORG: fd: uninline fd_update_events() This function has become a monster (80 lines and 2/3 of a kB), it doesn't benefit from being static nor inline anymore, let's move it to fd.c.	2021-07-30 17:41:55 +02:00
Willy Tarreau	a199a17d72	MINOR: fd: update flags only once in fd_update_events() Since 2.4 with commit `f50906519` ("MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state") we can merge all flag updates at once in fd_update_events(). Previously this was performed in 1 to 3 steps, setting the polling state, then setting READY_R if in/err/hup, and setting READY_W if out/err. But since the commit above, all flags are stored together in the same structure field that is being updated with the new flags, thus we can simply update the flags altogether and avoid multiple atomic operations. This even removes the need for atomic ops for FDs that are not shared.	2021-07-30 17:41:55 +02:00
Willy Tarreau	d5402b8df8	BUG/MINOR: fd: protect fd state harder against a concurrent takeover There's a theoretical race (that we failed to trigger) in function fd_update_events(), which could strike on idle connections. The "locked" variable will most often be 0 as the FD is bound to the current thread only. Another thread could take it over once "locked" is set, change the thread and running masks. Then the first thread updates the FD's state non-atomically and possibly overwrites what the other thread was preparing. It still looks like the FD's state will ultimately converge though. The solution against this is to set the running flag earlier so that a takeover() attempt cannot succeed, or that the fd_set_running() attempt fails, indicating that nothing needs to be done on this FD. While this is sufficient for a simple fix to be backported, it leaves the FD actively polled in the calling thread, this will trigger a second wakeup which will notice the absence of tid_bit in the thread_mask, getting rid of it. A more elaborate solution would consist in calling fd_set_running() directly from the pollers before calling fd_update_events(), getting rid of the thread_mask test and letting the caller eliminate that FD from its list if needed. Interestingly, this code also proves to be suboptimal in that it sets the FD state twice instead of calculating the new state at once and always using a CAS to set it. This is a leftover of a simplification that went into 2.4 and which should be explored in a future patch. This may be backported as far as 2.2.	2021-07-30 14:54:19 +02:00
Willy Tarreau	6ed242ece6	BUG/MEDIUM: connection: close a rare race between idle conn close and takeover The takeover of idle conns between threads is particularly tricky, for two reasons: - there's no way to atomically synchronize kernel-side polling with userspace activity, so late events will always be reported for some FDs just migrated ; - upon error, an FD may be immediately reassigned to whatever other thread since it's process-wide. The current model uses the FD's thread_mask to figure if an FD still ought to be reported or not, and a per-thread idle connection queue from which eligible connections are atomically added/picked. I/Os coming from the bottom for such a connection must remove it from the list so that it's not elected. Same for timeout tasks and iocbs. And these last ones check their context under the idle_conn lock to judge if they're still allowed to run. One rare case was omitted: the wake() callback. This one is rare, it may serve to notify about finalized connect() calls that are not being polled, as well as unhandled shutdowns and errors. This callback was not protected till now because it wasn't seen as sensitive, but there exists a particular case where it may be called without protection in parallel to a takeover. This happens in the following sequence: - thread T1 wants to establish an outgoing connection - the connect() call returns EINPROGRESS - the poller adds it using epoll_ctl() - epoll_wait() reports it, connect() is done. The connection is not being marked as actively polled anymore but is still known from the poller. 
- the request is sent over that connection using send(), which queues to system buffers while data are being delivered - the scheduler switches to other tasks - the request is physically sent - the server responds - the stream is notified that send() succeeded, and makes progress, trying to recv() from that connection - the recv() succeeds, the response is delivered - the poller doesn't need to be touched (still no active polling) - the scheduler switches to other tasks - the server closes the connection - the poller on T1 is notified of the SHUTR and starts to call mux->wake() - another thread T2 takes over the connection - T2 continues to run inside wake() and releases the connection - T2 is just dereferencing it. - BAM. The most logical solution here is to surround the call to wake() with an atomic removal/insert of the connection from/into the idle conns lists. This way, wake() is guaranteed to run alone. Any other poller reporting the FD will not have its tid_bit in the thread_mask so it will not bother it. Another thread trying a takeover will not find this connection. A task or tasklet being woken up late will either be on the same thread, or be called on another one with a NULL context since it will be the consequence of previous successful takeover, and will be ignored. Note that the extra cost of a lock and tree access here have a low overhead which is totally amortized given that these ones roughly happen 1-2 times per connection at best. While it was possible to crash the process after 10-100k req using H2 and a hand-refined configuration achieving perfect synchronism between a long (20+) chain of proxies and a short timeout (1ms), now with that fix this never happens even after 10M requests. Many thanks to Olivier for proposing this solution and explaining why it works. This should be backported as far as 2.2 (when inter-thread takeover was introduced). The code in older versions will be found in conn_fd_handler(). 
A workaround consists in disabling inter-thread pool sharing using: tune.idle-pool.shared off	2021-07-30 08:34:38 +02:00
Remi Tricot-Le Breton	4a6328f066	MEDIUM: connection: Add option to disable legacy error log In case of connection failure, a dedicated error message is output, following the format described in section "Error log format" of the documentation. These messages cannot be configured through a log-format option. This patch adds a new option, "dontloglegacyconnerr", that disables those error logs when set, and "replaces" them by a regular log line that follows the configured log-format (thanks to a call to sess_log in session_kill_embryonic). The new fc_conn_err sample fetch allows to add the legacy error log information into a regular log format. This new option is unset by default so the logging logic will remain the same until this new option is used.	2021-07-29 15:40:45 +02:00
Remi Tricot-Le Breton	98b930d043	MINOR: ssl: Define a default https log format This patch adds a new httpslog option and a new HTTP over SSL log-format that expands the default HTTP format and adds SSL specific information.	2021-07-29 15:40:45 +02:00
Remi Tricot-Le Breton	7c6898ee49	MINOR: ssl: Add new ssl_fc_hsk_err sample fetch This new sample fetch, along with the ssl_fc_hsk_err_str fetch, contains the last SSL error of the error stack that occurred during the SSL handshake (from the frontend's perspective). The errors happening during the client's certificate verification will still be given by the ssl_c_err and ssl_c_ca_err fetches. This new fetch will only hold errors retrieved by the OpenSSL ERR_get_error function.	2021-07-29 15:40:45 +02:00
Remi Tricot-Le Breton	3d2093af9b	MINOR: connection: Add a connection error code sample fetch The fc_conn_err and fc_conn_err_str sample fetches give information about the problem that made the connection fail. This information would previously only have been given by the error log messages meaning that thanks to these fetches, the error log can now be included in a custom log format. The log strings were all found in the conn_err_code_str function.	2021-07-29 15:40:45 +02:00
Remi Tricot-Le Breton	0aa4130d65	BUG/MINOR: connection: Add missing error labels to conn_err_code_str The CO_ER_SSL_EARLY_FAILED and CO_ER_CIP_TIMEOUT connection error codes were missing in the conn_err_code_str switch which converts the error codes into string. This patch can be backported on all stable branches.	2021-07-29 15:40:45 +02:00
William Lallemand	6bb77b9c64	MINOR: proxy: rename PR_CAP_LUA to PR_CAP_INT This patch renames the proxy capability "LUA" to "INT" so it could be used for any internal proxy. Every proxy that is not user defined should use this flag.	2021-07-28 15:51:42 +02:00
David CARLIER	534197c721	BUILD/MINOR: memprof fix macOs build. this platform has a similar malloc_usable_size too.	2021-07-21 10:22:48 +02:00
Willy Tarreau	dc70c18ddc	BUG/MEDIUM: cfgcond: limit recursion level in the condition expression parser Oss-fuzz reports in issue 36328 that we can recurse too far by passing extremely deep expressions to the ".if" parser. I thought we were still limited to the 1024 chars per line, that would be highly sufficient, but we don't have any limit now :-/ Let's just pass a maximum recursion counter to the recursive parsers. It's decremented for each call and the expression fails if it reaches zero. On the most complex paths it can add 3 levels per parenthesis, so with a limit of 1024, that's roughly 343 nested sub-expressions that are supported in the worst case. That's more than sufficient, for just a few kB of RAM. No backport is needed.	2021-07-20 18:03:08 +02:00
Willy Tarreau	252412316e	MEDIUM: proxy: remove long-broken 'option http_proxy' This option had always been broken in HTX, which means that the first breakage appeared in 1.9, that it was broken by default in 2.0 and that no workaround existed starting with 2.1. The way this option works is particularly unfit to the rest of the configuration and to the internal architecture. It had some uses when it was introduced 14 years ago but nowadays it's possible to do much better and more reliable using a set of "http-request set-dst" and "http-request set-uri" rules, which additionally are compatible with DNS resolution (via do-resolve) and are not exclusive to normal load balancing. The "option-http_proxy" example config file was updated to reflect this. The option is still parsed so that an error message gives hints about what to look for.	2021-07-18 19:35:32 +02:00
Willy Tarreau	f1db20c473	BUG/MINOR: cfgcond: revisit the condition freeing mechanism to avoid a leak The cfg_free_cond_{term,and,expr}() functions used to take a pointer to the pointer to be freed in order to replace it with a NULL once done. But this doesn't cope well with freeing lists as it would require recursion which the current code tried to avoid. Let's just change the API to free the area and let the caller set the NULL. This leak was reported by oss-fuzz (issue 36265).	2021-07-17 18:46:30 +02:00
Willy Tarreau	316ea7ede5	MINOR: cfgcond: support terms made of parentheses around expressions Now it's possible to form a term using parentheses around an expression. This will soon allow to build more complex expressions. For now they're still pretty limited but parentheses do work.	2021-07-16 19:18:41 +02:00
Willy Tarreau	ca81887599	MINOR: cfgcond: insert an expression between the condition and the term Now evaluating a condition will rely on an expression (or an empty string), and this expression will support ORing a sub-expression with another optional expression. The sub-expression ANDs a term with another optional sub-expression. With this alone precedence between && and \|\| is respected, and the following expression: A && B && C \|\| D \|\| E && F \|\| G will naturally evaluate as: (A && B && C) \|\| D \|\| (E && F) \|\| G	2021-07-16 19:18:41 +02:00
Willy Tarreau	087b2d018f	MINOR: cfgcond: make the conditional term parser automatically allocate nodes It's not convenient to let the caller be responsible for node allocation, better have the leaf function do that and implement the accompanying free call. Now only a pointer is needed instead of a struct, and the leaf function makes sure to leave the situation in a consistent way.	2021-07-16 19:18:41 +02:00
Willy Tarreau	ca56d3d28b	MINOR: cfgcond: support negating conditional expressions Now preceding a config condition term with "!" will simply negate it. Example: .if !feature(OPENSSL) .alert "SSL support is mandatory" .endif	2021-07-16 19:18:41 +02:00
Willy Tarreau	f869095df9	MINOR: cfgcond: start to split the condition parser to introduce terms The purpose is to build a descendent parser that will split conditions into expressions made of terms. There are two phases, a parsing phase and an evaluation phase. Strictly speaking it's not required to cut that in two right now, but it's likely that in the future we won't want certain predicates to be evaluated during the parsing (e.g. file system checks or execution of some external commands). The cfg_eval_condition() function is now much simpler, it just tries to parse a single term, and if OK evaluates it, then returns the result. Errors are unchanged and may still be reported during parsing or evaluation. It's worth noting that some invalid expressions such as streq(a,b)zzz continue to parse correctly for now (what remains after the parenthesis is simply ignored as not necessary).	2021-07-16 19:18:41 +02:00
Willy Tarreau	66243b4273	REORG: config: move the condition preprocessing code to its own file The .if/.else/.endif and condition evaluation code is quite dirty and was dumped into cfgparse.c because it was easy. But it should be tidied quite a bit as it will need to evolve. Let's move all that to cfgcond.{c,h}.	2021-07-16 19:18:41 +02:00
Willy Tarreau	ab213a5b6f	MINOR: arg: add a free_args() function to free an args array make_arg_list() can create an array of arguments, some of which remain to be resolved, but all users had to deal with their own roll back on error. Let's add a free_args() function to release all the array's elements and let the caller deal with the array itself (sometimes it's allocated in the stack).	2021-07-16 19:18:41 +02:00
Amaury Denoyelle	669b620e5f	MINOR: srv: extract tracking server config function Extract the post-config tracking setup in a dedicated function srv_apply_track. This will be useful to implement track support for dynamic servers.	2021-07-16 10:08:55 +02:00
Willy Tarreau	4c6986a6bc	CLEANUP: applet: remove unused thread_mask Since 1.9 with commit `673867c35` ("MAJOR: applets: Use tasks, instead of rolling our own scheduler.") the thread_mask field of the appctx became unused, but the code hadn't been cleaned for this. The appctx has its own task and the task's thread_mask is the one to be displayed. It's worth noting that all calls to appctx_new() pass tid_bit as the thread_mask. This makes sense, and it could be convenient to decide that this becomes the norm and to simplify the API.	2021-07-13 18:20:34 +02:00
Amaury Denoyelle	befeae88e8	MINOR: mux_h2: define config to disable h2 websocket support Define a new global config statement named "h2-workaround-bogus-websocket-clients". This statement will disable the automatic announce of h2 websocket support as specified in the RFC8441. This can be used to overcome clients which fail to implement the relatively fresh RFC8441. Clients will in this case automatically downgrade to http/1.1 for the websocket tunnel if the haproxy configuration allows it. This feature is relatively simple and can be backported up to 2.4, which saw the introduction of h2 websocket support.	2021-07-12 10:41:45 +02:00
Amaury Denoyelle	c453f9547e	MINOR: http: use http uri parser for path Replace http_get_path by the http_uri_parser API. The new function is renamed http_parse_path. Replace duplicated code for scheme and authority parsing by invocations to http_parse_scheme/authority. If no scheme is found for an URI detected as an absolute-uri/authority, consider it to be an authority format : no path will be found. For an absolute-uri or absolute-path, use the remaining of the string as the path. A new http_uri_parser state is declared to mark the path parsing as done.	2021-07-08 17:11:17 +02:00
Amaury Denoyelle	69294b20ac	MINOR: http: use http uri parser for authority Replace http_get_authority by the http_uri_parser API. The new function is renamed http_parse_authority. Replace duplicated scheme parsing code by http_parse_scheme invocation. A new http_uri_parser state is declared to mark the authority parsing as done.	2021-07-08 17:11:17 +02:00
Amaury Denoyelle	8ac8cbfd72	MINOR: http: use http uri parser for scheme Replace http_get_scheme by the http_uri_parser API. The new function is renamed http_parse_scheme. A new http_uri_parser state is declared to mark the scheme parsing as completed.	2021-07-08 17:11:17 +02:00
Amaury Denoyelle	89c68c8117	MINOR: http: implement http uri parser Implement a http uri parser type. This type will be used as a context to parse the various elements of an uri. The goal of this series of patches is to factorize duplicated code between the http_get_scheme/authority/path functions. A simple parsing API is designed to be able to extract once each element of an HTTP URI in order. The functions will be renamed in the following patches to reflect the API change with the prefix http_parse_*. For the parser API, the http_uri_parser type must first be initialized before usage. It will register the URI to parse and detect its format according to the rfc 7230.	2021-07-08 17:08:57 +02:00
Amaury Denoyelle	4c0882b1b4	MEDIUM: http: implement scheme-based normalization Implement the scheme-based uri normalization as described in rfc3986 6.3.2. Its purpose is to remove the port of an uri if the default one is used according to the uri scheme : 80/http and 443/https. All other ports are not touched. This method uses an htx message as an input. It requires that the target URI is in absolute-form with a http/https scheme. This represents most of h2 requests except CONNECT. On the contrary, most of h1 requests won't be eligible as origin-form is the standard case. The normalization is first applied on the target URL of the start line. Then, it is conducted on every Host header present, assuming that they are equivalent to the target URL. This change will be notably useful to not confuse users who are accustomed to use the host for routing without specifying default ports. This problem was recently encountered with Firefox which specifies the 443 default port for http2 websocket Extended CONNECT.	2021-07-07 15:34:01 +02:00
Amaury Denoyelle	ef08811240	MINOR: http: implement http_get_scheme This method can be used to retrieve the scheme part of an uri, with the suffix '://'. It will be useful to implement scheme-based normalization.	2021-07-07 15:34:01 +02:00
Emeric Brun	4d7ada8f9e	MEDIUM: stick-table: add the new arrays of gpc and gpc_rate This patch adds the definition of two new array data_types: 'gpc': This is an array of 32bits General Purpose Counters. 'gpc_rate': This is an array of increment rates of General Purpose Counters. Like for all arrays, they are limited to 100 elements. This patch also adds actions and fetches to handle elements of those arrays. Note: As documented, those new actions and fetches won't apply to the legacy 'gpc0', 'gpc1', 'gpc0_rate' nor 'gpc1_rate'.	2021-07-06 07:24:42 +02:00
Emeric Brun	877b0b5a7b	MEDIUM: stick-table: add the new array of gpt data_type This patch adds the definition of a new array data_type 'gpt'. This is an array of 32bits General Purpose Tags. Like for all arrays, it is limited to 100 elements. This patch also adds actions and fetches to handle elements of this array. Note: As documented, those new actions and fetches won't apply to the legacy 'gpt0' data type.	2021-07-06 07:24:42 +02:00
Emeric Brun	90a9b676a8	MEDIUM: peers: handle arrays of std types in peers protocol This patch adds support of array data_types on the peer protocol. The table definition message will provide an additional parameter for array data-types: the number of elements of the array. In case of array of frqp it also provides a second parameter: the period used to compute freq counter. The array elements are std_type values linearly encoded in the update message. Note: if a remote peer announces an array data_type without parameters into the table definition message, all updates on this table will be ignored because we can not parse update messages consistently.	2021-07-06 07:24:42 +02:00
Emeric Brun	c64a2a307c	MEDIUM: stick-table: handle arrays of standard types into stick-tables This patch provides the code to handle arrays of some standard types (SINT, UINT, ULL and FRQP) in stick table. This way we could define new "array" data types. Note: the number of elements of an array was limited to 100 to put a limit and to ensure that an encoded update message will continue to fit into a buffer when the peer protocol will handle such data types.	2021-07-06 07:24:42 +02:00
Emeric Brun	0e3457b63a	MINOR: stick-table: make stktable_data_cast to use only std types This patch replaces all advanced data type aliases on stktable_data_cast calls by standard types. This way we could call the same stktable_data_cast regardless of the used advanced data type as long as they are using the same std type. It also removes all the advanced data type aliases.	2021-07-06 07:24:42 +02:00
David Carlier	bae4cb2790	BUILD/MEDIUM: tcp: set-mark support for OpenBSD set-mark support for this platform, for routing table purpose. Follow-up from `f7f53afcf9`, this time for OpenBSD.	2021-07-05 10:53:18 +02:00
David Carlier	f7f53afcf9	BUILD/MEDIUM: tcp: set-mark setting support for FreeBSD. This platform has a similar socket option from Linux's SO_MARK, marking a socket with an id for packet filter purpose, DTrace monitoring and so on.	2021-06-28 07:03:35 +02:00
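
The cfgcond commits listed above (ca81887599, 316ea7ede5, ca56d3d28b, dc70c18ddc) describe an expression that ORs sub-expressions, a sub-expression that ANDs terms, terms that may be negated with "!" or wrapped in parentheses, and a recursion counter that is decremented on each call and makes the expression fail when it reaches zero. The following is a minimal sketch of that idea only, not HAProxy's actual cfgcond code: the function names (`eval_term`, `eval_and`, `eval_expr`, `eval_condition`) are hypothetical, and the only supported predicates are the literals `0` and `1`. Unlike the behaviour mentioned in f869095df9, this sketch rejects trailing garbage.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical toy evaluator, NOT HAProxy's cfgcond implementation.
 * Grammar (assumed for illustration):
 *   expr := and  [ "||" expr ]
 *   and  := term [ "&&" and  ]
 *   term := "!" term | "(" expr ")" | "0" | "1"
 * Every function returns 1 (true), 0 (false) or -1 (error), and gives up
 * with -1 as soon as the remaining recursion budget <depth> is exhausted.
 */
static int eval_expr(const char **s, int depth);

static void skip_ws(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;
}

static int eval_term(const char **s, int depth)
{
    int v;

    if (depth <= 0)
        return -1;                      /* recursion limit reached */
    skip_ws(s);
    if (**s == '!') {                   /* negation of the following term */
        (*s)++;
        v = eval_term(s, depth - 1);
        return v < 0 ? -1 : !v;
    }
    if (**s == '(') {                   /* parenthesized expression */
        (*s)++;
        v = eval_expr(s, depth - 1);
        skip_ws(s);
        if (v < 0 || **s != ')')
            return -1;
        (*s)++;
        return v;
    }
    if (**s == '0' || **s == '1')       /* literal stands in for a predicate */
        return *(*s)++ - '0';
    return -1;
}

static int eval_and(const char **s, int depth)
{
    int l, r;

    if (depth <= 0)
        return -1;
    l = eval_term(s, depth - 1);
    if (l < 0)
        return -1;
    skip_ws(s);
    if (strncmp(*s, "&&", 2) != 0)
        return l;
    *s += 2;
    r = eval_and(s, depth - 1);         /* right side is parsed even if l == 0 */
    return r < 0 ? -1 : (l && r);
}

static int eval_expr(const char **s, int depth)
{
    int l, r;

    if (depth <= 0)
        return -1;
    l = eval_and(s, depth - 1);
    if (l < 0)
        return -1;
    skip_ws(s);
    if (strncmp(*s, "||", 2) != 0)
        return l;
    *s += 2;
    r = eval_expr(s, depth - 1);
    return r < 0 ? -1 : (l || r);
}

/* Evaluates <text> with at most <max_depth> nested calls. */
int eval_condition(const char *text, int max_depth)
{
    const char *s = text;
    int v = eval_expr(&s, max_depth);

    skip_ws(&s);
    return (v < 0 || *s != '\0') ? -1 : v;
}
```

Because "&&" is handled one level below "||", an input such as `1 && 1 && 0 || 0 || 1 && 1 || 0` groups as `(1 && 1 && 0) || 0 || (1 && 1) || 0` without any explicit precedence table. In this sketch each parenthesis costs three levels of the budget (expr, and, term), which mirrors the "3 levels per parenthesis" remark in dc70c18ddc.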

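Commit 4c0882b1b4 above describes scheme-based normalization: the port is removed from an absolute-form URI when it is the default one for the scheme (80 for http, 443 for https), and every other port is left untouched. The sketch below illustrates that rule on a plain C string. It is an assumption-laden illustration rather than HAProxy's code: the real change works on an htx message and also rewrites Host headers, whereas the hypothetical `strip_default_port()` helper here only edits a NUL-terminated URI in place and matches the scheme case-sensitively.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT HAProxy's implementation.
 * Removes ":80" from an "http://" URI or ":443" from an "https://" URI,
 * in place. Returns 1 if the URI was modified, 0 otherwise.
 */
int strip_default_port(char *uri)
{
    const char *def;
    char *auth, *end, *p, *colon = NULL;
    size_t plen;

    if (strncmp(uri, "http://", 7) == 0) {
        auth = uri + 7;
        def = "80";
    } else if (strncmp(uri, "https://", 8) == 0) {
        auth = uri + 8;
        def = "443";
    } else {
        return 0;                       /* origin-form or other scheme */
    }

    /* the authority stops at the first '/', '?' or '#', or at the end */
    end = auth + strcspn(auth, "/?#");

    /* look backwards for the port separator; stop on ']' (end of an IPv6
     * literal) or '@' (end of userinfo) so that their colons are ignored
     */
    for (p = end; p > auth; p--) {
        if (p[-1] == ':') {
            colon = p - 1;
            break;
        }
        if (p[-1] == ']' || p[-1] == '@')
            break;
    }
    if (!colon)
        return 0;                       /* no explicit port */

    plen = strlen(def);
    if ((size_t)(end - colon - 1) != plen ||
        strncmp(colon + 1, def, plen) != 0)
        return 0;                       /* non-default port: keep it */

    /* drop ":port" by shifting the rest of the URI, including the NUL */
    memmove(colon, end, strlen(end) + 1);
    return 1;
}
```

With this helper, `https://example.com:443/ws` becomes `https://example.com/ws`, while `http://example.com:8080/` and an origin-form target such as `/ws` are returned unchanged.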
5449 Commits