haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-10 17:17:06 +02:00

Author	SHA1	Message	Date
Ilya Shipitsin	f04a89c549	CLEANUP: remove unused function "ssl_sock_is_ckch_valid" "ssl_sock_is_ckch_valid" is not used anymore, let us remove it	2020-11-24 09:54:44 +01:00
Julien Pivotto	2de240a676	MINOR: stream: Add level 7 retries on http error 401, 403 Level-7 retries are only possible with a restricted number of HTTP return codes. While it is usually not safe to retry on 401 and 403, I came up with an authentication backend which was not synchronizing authentication of users. While not perfect, being allowed to also retry on those return codes is really helpful and acts as a hotfix until we can fix the backend. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-11-23 09:33:14 +01:00
Maciej Zdeb	ebdd4c55da	MINOR: http_act: Add -m flag for del-header name matching method This patch adds -m flag which allows to specify header name matching method when deleting headers from http request/response. Currently beg, end, sub, str and reg are supported. This is related to GitHub issue #909	2020-11-21 15:54:30 +01:00
Willy Tarreau	3aab17bd56	BUG/MAJOR: connection: reset conn->owner when detaching from session list Baptiste reported a new crash affecting 2.3 which can be triggered when using H2 on the backend, with http-reuse always and with tens of clients doing close only. There are a few combined cases which cause this to happen, but each time the issue is the same, an already freed session is dereferenced in session_unown_conn(). Two cases were identified to cause this: - a connection referencing a session as its owner, which is detached from the session's list and is destroyed after this session ends. The test on conn->owner before calling session_unown_conn() is not sufficient as the pointer is not null but is not valid anymore. - a connection that never goes idle and that gets killed from the mux, where session_free() is called first, then conn_free() calls session_unown_conn() which scans the just freed session for older connections. This one is only triggered with DEBUG_UAF The reason for this session to be present here is that it's needed during the connection setup, to be passed to conn_install_mux_be() to mux->init() as the owning session, but it's never deleted afterwards. Furthermore, even conn_session_free() doesn't delete this pointer after freeing the session that lies there. Both do definitely result in a use-after-free that's more easily triggered under DEBUG_UAF. This patch makes sure that the owner is always deleted after detaching or killing the session. However it is currently not possible to clear the owner right after a synchronous init because the proxy protocol apparently needs it (a reg test checks this), and if we leave it past the connection setup with the session not attached anywhere, it's hard to catch the right moment to detach it. This means that the session may remain in conn->owner as long as the connection has never been added to nor removed from the session's idle list. 
Given that this patch needs to remain simple enough to be backported, instead it adds a workaround in session_unown_conn() to detect that the element is already not attached anywhere. This fix absolutely requires previous patch "CLEANUP: connection: do not use conn->owner when the session is known" otherwise the situation will be even worse, as some places used to rely on conn->owner instead of the session. The fix could theoretically be backported as far as 1.8. However, the code in this area has significantly changed along versions and there are more risks of breaking working stuff than fixing real issues there. The issue was really woken up in two steps during 2.3-dev when slightly reworking the idle conns with commit `08016ab82` ("MEDIUM: connection: Add private connections synchronously in session server list") and when adding support for storing used H2 connections in the session and adding the necessary call to session_unown_conn() in the muxes. But the same test managed to crash 2.2 when built in DEBUG_UAF and patched like this, proving that we used to already leave dangling pointers behind us:
| diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h
| index f8f235c1a..dd30b5f80 100644
| --- a/include/haproxy/connection.h
| +++ b/include/haproxy/connection.h
| @@ -458,6 +458,10 @@ static inline void conn_free(struct connection *conn)
|                 sess->idle_conns--;
|                 session_unown_conn(sess, conn);
|         }
| +       else {
| +               struct session *sess = conn->owner;
| +               BUG_ON(sess && sess->origin != &conn->obj_type);
| +       }
|
|         sockaddr_free(&conn->src);
|         sockaddr_free(&conn->dst);
It's uncertain whether an existing code path there can lead to dereferencing conn->owner when it's bad, though certain suspicious memory corruption bugs make one think it's a likely candidate. The patch should not be hard to adapt there. Backports to 2.1 and older are left to the appreciation of the person doing the backport. 
A reproducer consists in this:

    global
        nbthread 1

    listen l
        bind :9000
        mode http
        http-reuse always
        server s 127.0.0.1:8999 proto h2

    frontend f
        bind :8999 proto h2
        mode http
        http-request return status 200

Then this will make it crash within 2-3 seconds:

    $ h1load -e -r 1 -c 10 http://0:9000/

If it does not, it might be that DEBUG_UAF was not used (it's harder then) and it might be useful to restart.	2020-11-21 15:29:22 +01:00
Willy Tarreau	38b4d2eb22	CLEANUP: connection: do not use conn->owner when the session is known At a few places we used to rely on conn->owner to retrieve the session while the session is already known. This is not correct because at some of these points the reason the connection's owner was still the session (instead of NULL) is a mistake. At one place a comparison is even made between the session and conn->owner assuming it's valid without checking if it's NULL. Let's clean this up to use the session all the time. Note that this will be needed for a forthcoming fix and will have to be backported.	2020-11-21 15:29:22 +01:00
Ilya Shipitsin	f34ed0b74c	BUILD: SSL: guard TLS13 ciphersuites with HAVE_SSL_CTX_SET_CIPHERSUITES HAVE_SSL_CTX_SET_CIPHERSUITES is a newly defined macro set in openssl-compat.h, which helps to identify SSL libs (currently OpenSSL-1.1.1 only) that support TLS13 ciphersuites manipulation on a TLS13 context	2020-11-21 11:04:36 +01:00
Ilya Shipitsin	bdec3ba796	BUILD: ssl: use SSL_MODE_ASYNC macro instead of OPENSSL_VERSION	2020-11-19 19:59:32 +01:00
William Dauchy	f63704488e	MEDIUM: cli/ssl: configure ssl on server at runtime in the context of a progressive backend migration, we want to be able to activate SSL on outgoing connections to the server at runtime without reloading. This patch adds a `set server ssl` command; in order to allow that: - add `srv_use_ssl` to `show servers state` command for compatibility, also update associated parsing - when using default-server ssl setting, and `no-ssl` on server line, init SSL ctx without activating it - when triggering ssl API, de/activate SSL connections as requested - clean ongoing connections as it is done for addr/port changes, without checking prior server state example config: backend be_foo default-server ssl server srv0 127.0.0.1:6011 weight 1 no-ssl show servers state: 5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1 where srv0 can switch to ssl later during the runtime: set server be_foo/srv0 ssl on 5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1 Also update existing tests and create a new one. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2020-11-18 17:22:28 +01:00
Christopher Faulet	83fefbcdff	MINOR: init: Fix the prototype for per-thread free callbacks Functions registered to release memory per-thread have no return value. But the registering function and the function pointer in per_thread_free_fct structure specify it should return an integer. This patch fixes it. This patch may be backported as far as 2.0.	2020-11-13 16:26:10 +01:00
Amaury Denoyelle	7f8f6cb926	BUG/MEDIUM: stats: prevent crash if counters not alloc with dummy one Define per-thread counters allocated with the greatest size of any stat module counters. This variable is named trash_counters. When using a proxy without allocated counters, return the trash counters from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent segfault. This is useful for all the proxies used internally and not belonging to the global proxy list. As these objects do not appear on the stat report, it does not matter to use the dummy counters. For this fix to be functional, the extra counters are explicitly initialized to NULL on proxy/server/listener init functions. Most notably, the crash has already been detected with the following vtc: - reg-tests/lua/txn_get_priv.vtc - reg-tests/peers/tls_basic_sync.vtc - reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc There are probably other parts that may be impacted (SPOE for example). This bug was introduced in the current release and does not need to be backported. The faulty commits are "MINOR: ssl: count client hello for stats" and "MINOR: ssl: add counters for ssl sessions".	2020-11-12 15:16:05 +01:00
Remi Tricot-Le Breton	cc9bf2e5fe	MEDIUM: cache: Change caching conditions Do not cache responses that do not have an explicit expiration time (s-maxage or max-age Cache-Control directives or Expires header) or a validator (ETag or Last-Modified headers) anymore, as suggested in RFC 7234#3. The TX_FLAG_IGNORE flag is used instead of the TX_FLAG_CACHEABLE so as not to change the behavior of the checkcache option.	2020-11-12 11:22:05 +01:00
Christopher Faulet	a66adf41ea	MINOR: http-htx: Add understandable errors for the errorfiles parsing No details are provided when an error occurs during the parsing of an errorfile. Thus it is a bit hard to diagnose where the problem is. Now, when it happens, an understandable error message is reported. This patch is not a bug fix in itself. But it will be required to change a fatal error into a warning in last stable releases. Thus it must be backported as far as 2.0.	2020-11-06 09:13:58 +01:00
Willy Tarreau	38d41996c1	MEDIUM: pattern: turn the pattern chaining to single-linked list It does not require heavy deletion from the expr anymore, so we can now turn this to a single-linked list since most of the time we want to delete all instances of a given pattern from the head. By doing so we save 32 bytes of memory per pattern. The pat_unlink_from_head() function was adjusted accordingly.	2020-11-05 19:27:09 +01:00
Willy Tarreau	94b9abe200	MINOR: pattern: add pat_ref_purge_older() to purge old entries This function will be usable to purge at most a specified number of old entries from a reference. Entries are declared old if their generation number is in the past compared to the one passed in argument. This will ease removal of early entries when new ones have been appended. We also call malloc_trim() when available, at the end of the series, because this is one place where there is a lot of memory to save. Reloads of 1M IP addresses used in an ACL made the process grow up to 1.7 GB RSS after 10 reloads and roughly stabilize there without this call, versus only 260 MB when the call is present. Sadly there is no direct equivalent for jemalloc, which stabilizes around 800MB-1GB.	2020-11-05 19:27:09 +01:00
Willy Tarreau	1a6857b9c1	MINOR: pattern: implement pat_ref_load() to load a pattern at a given generation pat_ref_load() basically combines pat_ref_append() and pat_ref_commit(). It's very similar to pat_ref_add() except that it also allows to set the generation ID and the line number. pat_ref_add() was modified to directly rely on it to avoid code duplication. Note that a previous declaration of pat_ref_load() was removed as it was just a leftover of an earlier incarnation of something possibly similar, so no existing functionality was changed here.	2020-11-05 19:27:09 +01:00
Willy Tarreau	0439e5eeb4	MINOR: pattern: add pat_ref_commit() to commit a previously inserted element This function will be used after a successful pat_ref_append() to propagate the pattern to all use places (including parsing and indexing). On failure, it will entirely roll back all insertions and free the pattern itself. It also preserves the generation number so that it is convenient for use in association with pat_ref_append(). pat_ref_add() was modified to rely on it instead of open-coding the insertion and roll-back.	2020-11-05 19:27:09 +01:00
Willy Tarreau	29947745b5	MINOR: pattern: store a generation number in the reference patterns Right now it's not possible to perform a safe reload because we don't know what patterns were recently added or were already present. This patch adds a generation counter to the reference patterns so that it is possible to know what generation of the reference they were loaded with. A reference now has two generations, the current one, used for all additions, and the next one, allocated to those wishing to update the contents. The generation wraps at 2^32 so comparisons must be made relative to the current position. The idea will be that upon full reload, the caller will first get a new generation ID, will insert all new patterns using it, will then switch the current ID to the new one, and will delete all entries older than the current ID. This has the benefit of supporting chunked updates that remain consistent and that won't block the whole process for ages like pat_ref_reload() currently does.	2020-11-05 19:27:09 +01:00
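Because the generation counter wraps at 2^32, "older than" cannot be a plain `<` test. A minimal sketch of a wrap-aware comparison follows; the helper name `gen_is_older()` and the demo function are hypothetical and are not taken from the HAProxy tree. It treats the two values as positions on a circle and considers `a` older than `b` when `b` is less than half the counter range ahead of `a`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not HAProxy's API): returns non-zero if generation
 * <a> is older than generation <b>, assuming both are 32-bit counters that
 * wrap and that they are never more than 2^31 apart.
 */
static inline int gen_is_older(uint32_t a, uint32_t b)
{
	/* the unsigned distance from a to b wraps naturally modulo 2^32 */
	return a != b && (uint32_t)(b - a) < 0x80000000u;
}

/* Small usage example, including the wrap-around case */
void gen_demo(void)
{
	assert(gen_is_older(5u, 6u));            /* plain case */
	assert(!gen_is_older(6u, 5u));           /* newer is not older */
	assert(gen_is_older(0xFFFFFFFFu, 0u));   /* 0 follows 0xFFFFFFFF */
	assert(!gen_is_older(0u, 0xFFFFFFFFu));  /* and not the other way */
}
```

With such a test, a reloader could insert entries under the next generation, switch the current generation to it, then purge every entry for which the helper reports "older than current".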
Willy Tarreau	1fd52f70e5	MINOR: pattern: introduce pat_ref_delete_by_ptr() to delete a valid reference Till now the only way to remove a known reference was via pat_ref_delete_by_id() which scans the whole list to find a matching pointer. Let's add pat_ref_delete_by_ptr() which takes a valid pointer. It can be called by the function above after the pointer is found, and can also be used to roll back a failed insertion much more efficiently.	2020-11-05 19:27:09 +01:00
Willy Tarreau	a98b2882ac	CLEANUP: pattern: remove pat_delete_fcts[] and pattern_head->delete() These ones are not used anymore, so let's remove them to remove a bit of the complexity. The ACL keyword's delete() function could be removed as well, though most keyword declarations are positional and we have a high risk of introducing a mistake here, so let's not touch the ACL part.	2020-11-05 19:27:09 +01:00
Willy Tarreau	f1c0892aa6	MINOR: pattern: remerge the list and tree deletion functions pat_del_tree_gen() was already chained onto pat_del_list_gen() to deal with remaining cases, so let's complete the merge and have a generic pattern deletion function acting on the reference and taking care of reliably removing all elements.	2020-11-05 19:27:09 +01:00
Willy Tarreau	78777ead32	MEDIUM: pattern: change the pat_del_* functions to delete from the references This is the next step in speeding up entry removal. Now we don't scan the whole lists or trees for elements pointing to the target reference, instead we start from the reference and delete all linked patterns. This simplifies some delete functions since we don't need anymore to delete multiple times from an expression since all nodes appear after the reference element. We can now have one generic list and one generic tree deletion function. This required the replacement of pattern_delete() with an open-coded version since we now need to lock all expressions first before proceeding. This means there is a high risk of lock inversion here but given that the expressions are always scanned in the same order from the same head, this must not happen. Now deleting first entries is instantaneous, and it's still slow to delete the last ones when looking up their ID since it still requires to look them up by a full scan, but it's already way faster than previously. Typically removing the last 10 IP from a 20M entries ACL with a full-scan each took less than 2 seconds. It would be technically possible to make use of indexed entries to speed up most lookups for removal by value (e.g. IP addresses) but that's for later.	2020-11-05 19:27:09 +01:00
Willy Tarreau	4bdd0a13d6	MEDIUM: pattern: link all final elements from the reference There is a data model issue in the current pattern design that makes pattern deletion extremely expensive: there's no direct way from a reference to access all indexed occurrences. As such, the only way to remove all indexed entries corresponding to a reference update is to scan all expressions' lists and trees to find a link to the reference. While this was possibly OK when map removal was not common and most maps were small, this is not conceivable anymore with GeoIP maps containing 10M+ entries and del-map operations that are triggered from http-request rulesets. This patch introduces two list heads from the pattern reference, one for the objects linked by lists and one for those linked by tree node. Ideally a single list would be enough but the linked elements are too unrelated to be distinguished at the moment, so we'll need two lists. However for the long term a single-linked list will suffice but for now it's not possible due to the way elements are removed from expressions. As such this patch adds 32 bytes of memory usage per reference plus 16 per indexed entry, but both will be cut in half later. The links are not yet used for deletion, this patch only ensures the list is always consistent.	2020-11-05 19:27:09 +01:00
Willy Tarreau	6d8a68914e	MINOR: pattern: make the delete and prune functions more generic Now we have a single prune() function to act on an expression, and one delete function for the lists and one for the trees. The presence of a pointer in the lists is enough to warrant a free, and we rely on the PAT_SF_REGFREE flag to decide whether to free using free() or regfree().	2020-11-05 19:27:09 +01:00
Willy Tarreau	9b5c8bbc89	MINOR: pattern: new sflag PAT_SF_REGFREE indicates regex_free() is needed Currently we have no way to know how to delete/prune a pattern in a generic way. A pattern doesn't contain its own type so we don't know what function to call. Tree nodes are roughly OK but not lists where regex are possible. Let's add one new bit for sflags at index time to indicate that regex_free() will be needed upon deletion. It's not used for now.	2020-11-05 19:27:08 +01:00
Willy Tarreau	3ee0de1b41	MINOR: pattern: move the update revision to the pat_ref, not the expression It's not possible to uniquely update a single expression without updating the pattern reference, I don't know why we've put the revision in the expression back then, given that it in fact provides an update for a full pattern. Let's move the revision into the reference's head instead.	2020-11-05 19:27:08 +01:00
Willy Tarreau	1d3c7003d9	MINOR: compat: automatically include malloc.h on glibc This is in order to access malloc_trim() which is convenient after clearing huge maps to reclaim memory. When this is detected, we also define HA_HAVE_MALLOC_TRIM.	2020-11-05 19:27:08 +01:00
Baptiste Assmann	e279ca6bbe	MINOR: sample: Add converters to parse MQTT messages This patch implements a couple of converters to validate and extract data from a MQTT (Message Queuing Telemetry Transport) message. The validation consists of a few checks as well as "packet size" validation. The extraction can get any field from the variable header and the payload. This is limited to CONNECT and CONNACK packet types only. All other messages are considered as invalid. It is not a problem for now because only the first packet on each side can be parsed (CONNECT for the client and CONNACK for the server). MQTT 3.1.1 and 5.0 are supported. Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>	2020-11-05 19:27:03 +01:00
Baptiste Assmann	e138dda1e0	MINOR: sample: Add converters to parse FIX messages This patch implements a couple of converters to validate and extract tag value from a FIX (Financial Information eXchange) message. The validation consists in a few checks such as mandatory fields and checksum computation. The extraction can get any tag value based on a tag string or tag id. This patch requires the istend() function. Thus it depends on "MINOR: ist: Add istend() function to return a pointer to the end of the string". Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>	2020-11-05 19:26:30 +01:00
Christopher Faulet	cf26623780	MINOR: ist: Add istend() function to return a pointer to the end of the string istend() is a shortcut to istptr() + istlen().	2020-11-05 19:25:12 +01:00
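The "pointer plus length" shortcut described above can be sketched as follows. The `struct ist_sketch` type and the `sk_*()` names are simplified stand-ins invented for this illustration; they only mirror the idea of `istptr()`, `istlen()` and `istend()` and are not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an indirect string: a pointer and a length,
 * with no requirement for a trailing NUL (hypothetical type).
 */
struct ist_sketch {
	const char *ptr;
	size_t len;
};

static inline const char *sk_ptr(struct ist_sketch s) { return s.ptr; }
static inline size_t sk_len(struct ist_sketch s) { return s.len; }

/* Returns a pointer to the first byte past the end of the string,
 * i.e. sk_ptr() + sk_len().
 */
static inline const char *sk_end(struct ist_sketch s)
{
	return sk_ptr(s) + sk_len(s);
}

/* Usage: iterate over the bytes of the string using the end pointer */
void ist_end_demo(void)
{
	struct ist_sketch s = { "8=FIX.4.4", 9 };
	size_t count = 0;

	for (const char *p = sk_ptr(s); p < sk_end(s); p++)
		count++;
	assert(count == 9);
	assert(sk_end(s) == s.ptr + 9);
}
```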
Willy Tarreau	1db5579bf8	[RELEASE] Released version 2.4-dev0 Released version 2.4-dev0 with the following main changes : - MINOR: version: it's development again. - DOC: mention in INSTALL that it's development again	2020-11-05 17:20:35 +01:00
Willy Tarreau	b9b2ac20f8	MINOR: version: it's development again. This reverts commit `0badabc381`.	2020-11-05 17:18:49 +01:00
Willy Tarreau	0badabc381	MINOR: version: mention that it's stable now This version will be maintained up to around Q1 2022.	2020-11-05 17:00:50 +01:00
Ilya Shipitsin	0aa8c29460	BUILD: ssl: use feature macros for detecting ec curves manipulation support Let us use SSL_CTX_set1_curves_list, defined by OpenSSL, as well as in openssl-compat when SSL_CTRL_SET_CURVES_LIST is present (BoringSSL), for feature detection instead of versions.	2020-11-05 15:08:41 +01:00
Willy Tarreau	5b8af1e30c	MINOR: ssl: define SSL_CTX_set1_curves_list to itself on BoringSSL OpenSSL 1.0.2 and onwards define SSL_CTX_set1_curves_list which is both a function and a macro. OpenSSL 1.0.2 to 1.1.0 define SSL_CTRL_SET_CURVES_LIST as a macro, which disappeared from 1.1.1. BoringSSL only has that one and not the former macro but it does have the function. Let's keep the test on the macro matching the function name by defining the macro to itself when needed.	2020-11-05 15:05:09 +01:00
Willy Tarreau	7e98e28eb0	MINOR: fd: add fd_want_recv_safe() This does the same as fd_want_recv() except that it does check for fd_updt[] to be allocated, as this may be called during early listener initialization. Previously we used to check fd_updt[] before calling fd_want_recv() but this is not correct since it does not update the FD flags. This method will be safer.	2020-11-04 14:22:42 +01:00
Willy Tarreau	9dd7f4fb4b	MINOR: debug: don't count free(NULL) in memstats The mem stats are pretty convenient to spot leaks, except that they count free(NULL) as 1, and the code does actually have quite a number of free(foo) guards where foo is NULL if the object was already freed. Let's just not count these ones so that the stats remain consistent. Now it's possible to compare the strdup()/malloc() and free() and verify they are consistent.	2020-11-03 16:46:48 +01:00
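The counting rule described above can be illustrated with a small wrapper pair. This is only a sketch under assumptions: the `sketch_stats` structure and the `counted_malloc()`/`counted_free()` names are hypothetical and do not reflect how the memstats code is actually organized.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters, one pair for the whole program */
struct mem_counters {
	size_t allocs;
	size_t frees;
};

static struct mem_counters sketch_stats;

/* Counts only successful allocations */
static void *counted_malloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		sketch_stats.allocs++;
	return p;
}

/* free(NULL) is a valid no-op, so it is not counted as a release */
static void counted_free(void *p)
{
	if (p)
		sketch_stats.frees++;
	free(p);
}

/* Usage: the counters stay balanced even with a free(NULL) in the mix */
void memstats_demo(void)
{
	void *p = counted_malloc(16);

	counted_free(p);
	counted_free(NULL);
	assert(sketch_stats.allocs == sketch_stats.frees);
}
```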
Ilya Shipitsin	04a5a440b8	BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of OpenSSL versions let us use HAVE_OPENSSL_KEYLOG for feature detection instead of versions	2020-11-03 14:54:15 +01:00
Willy Tarreau	b706a3b4e1	CLEANUP: pattern: remove unused entry "tree" in pattern.val This one might have disappeared since patterns were reworked, but the entry was not removed from the structure, let's do it now.	2020-11-02 11:32:05 +01:00
Willy Tarreau	6bedf151e1	MINOR: pattern: export pat_ref_push() Strangely this one was marked static inline within the file itself. Let's export it.	2020-10-31 13:13:48 +01:00
Willy Tarreau	f4edb72e0a	MINOR: pattern: make pat_ref_append() return the newly added element It's more convenient to return the element than to return just 0 or 1, as the next thing we'll want to do is to act on this element! In addition it was using variable arguments instead of consts, causing some reuse constraints which were also addressed. This doesn't change its use as a boolean, hence why call places were not modified.	2020-10-31 13:13:48 +01:00
Remi Tricot-Le Breton	bb4582cf71	MINOR: ist: Add a case insensitive istmatch function Add a helper function that checks if a string starts with another string while ignoring case.	2020-10-30 13:20:21 +01:00
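A case-insensitive "starts with" check of this kind can be sketched as below. The function name `prefix_match_nocase()` and its pointer/length signature are assumptions made for the illustration; the real helper works on `struct ist` values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns non-zero if the <slen>-byte string <s>
 * starts with the <plen>-byte string <p>, ignoring ASCII case.
 */
static int prefix_match_nocase(const char *s, size_t slen,
                               const char *p, size_t plen)
{
	size_t i;

	if (plen > slen)
		return 0;

	for (i = 0; i < plen; i++) {
		/* cast to unsigned char: tolower() is undefined on negative values */
		if (tolower((unsigned char)s[i]) != tolower((unsigned char)p[i]))
			return 0;
	}
	return 1;
}

/* Usage example on a Cache-Control directive */
void prefix_demo(void)
{
	assert(prefix_match_nocase("Max-Age=60", 10, "max-age", 7));
	assert(!prefix_match_nocase("max", 3, "max-age", 7));
}
```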
Willy Tarreau	bd71510024	MINOR: stats: report server's user-configured weight next to effective weight The "weight" column on the stats page is somewhat confusing when using slowstart because it reports the effective weight, without being really explicit about it. In some situations the user-configured weight is more relevant (especially with long slowstarts where it's important to know if the configured weight is correct). This adds a new uweight stat which reports a server's user-configured weight, and in a backend it receives the sum of all servers' uweights. In addition it adds the mention of "effective" in a few descriptions for the "weight" column (help and doc). As a result, the list of servers in a backend is now always scanned when dumping the stats. But this is not a problem given that these servers are already scanned anyway and for way heavier processing.	2020-10-23 22:47:30 +02:00
Willy Tarreau	3e32036701	MINOR: stats: also support a "no-maint" show stat modifier "no-maint" is a bit similar to "up" except that it will only hide servers that are in maintenance (or disabled in the configuration), and not those that are enabled but failed a check. One benefit here is to significantly reduce the output of the "show stat" command when using large server-templates containing entries that are not yet provisioned. Note that the prometheus exporter also has such an option which does the exact same.	2020-10-23 18:11:24 +02:00
Willy Tarreau	670119955b	Revert "OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued" This reverts commit `b7ba1d9011`. Actually this test had already been removed in the past by commit `fac0f645d` ("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"), but the condition to reproduce the bug mentioned there was not clear. Now after analysis and a certain dose of code cleanup, things start to appear more obvious. What happens is that if we check the presence of the node in the tree without taking the lock, we can see the NULL at the instant the node is being unlinked by another thread in pendconn_process_next_strm() as part of __pendconn_unlink_prx() or __pendconn_unlink_srv(). Till now there is no issue except that the pendconn is not removed from the queue during this operation and that the task is scheduled to be woken up by pendconn_process_next_strm() with the stream being added to the list of the server's active connections by __stream_add_srv_conn(). The first thread finishes faster and gets back to stream_free() faster than the second one sets the srv_conn on the stream, so stream_free() skips the s->srv_conn test and doesn't try to dequeue the freshly queued entry. At the very least a barrier would be needed there but we can't afford to free the stream while it's being queued. So there's no other solution than making sure that either __pendconn_unlink_prx() or pendconn_cond_unlink() get the entry but never both, which is why the lock is required around the test. A possible solution would be to set p->target before unlinking the entry and using it to complete the test. This would leave no dead period where the pendconn is not seen as attached. It is possible, yet extremely difficult, to reproduce this bug, which was first noticed in bug #880. 
Running 100 servers with maxconn 1 and maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads with DEBUG_UAF, with a traffic making the backend's queue oscillate around zero (typically using 250 connections with a local httpterm server) may rarely manage to trigger a use-after-free. No backport is needed.	2020-10-23 09:21:55 +02:00
Willy Tarreau	b7ba1d9011	OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued On connection error processing, we can see massive storms of calls to pendconn_cond_unlink() to release a possible place in the queue. For example, in issue #908, on average half of the threads are caught in this function via back_try_conn_req() consecutive to a synchronous error. However we wait until grabbing the lock to know if the pendconn is effectively in a queue, which is expensive for many cases. We know the transition may only happen from in-queue to out-of-queue so it's safe to first run a preliminary check to see if it's worth going further. This will allow to avoid the cost of locking for most requests. This should not change anything for those completing correctly as they're already run through pendconn_free() which doesn't call pendconn_cond_unlink() unless deemed necessary.	2020-10-22 17:32:28 +02:00
Willy Tarreau	ac66d6bafb	MINOR: proxy; replace the spinlock with an rwlock This is an anticipation of finer grained locking for the queues. For now all lock places take a write lock so that there is no difference at all with previous code.	2020-10-22 17:32:28 +02:00
Willy Tarreau	de785f04e1	MINOR: threads/debug: only report lock stats for used operations In addition to the previous simplification, most locks don't use the seek or read lock (e.g. spinlocks etc) so let's split the dump into distinct operations (write/seek/read) and only report those which were used. Now the output size is roughly divided by 5 compared to previous ones.	2020-10-22 17:32:28 +02:00
Willy Tarreau	23d3b00bdd	MINOR: threads/debug: only report used lock stats The lock stats are very verbose and more than half of them are used in a typical test, making it hard to spot the sought values. Let's simply report "not used" for those which have not been called at all.	2020-10-22 17:32:28 +02:00
Christopher Faulet	d6c48366b8	BUG/MINOR: http-ana: Don't send payload for internal responses to HEAD requests When an internal response is returned to a client, the message payload must be skipped if it is a reply to a HEAD request. The payload is removed from the HTX message just before the message forwarding. This bug has been around for a long time. It was already there in the pre-HTX versions. In legacy HTTP mode, internal errors are not parsed. So this bug cannot be easily fixed. Thus, this patch should only be backported in all HTX versions, as far as 2.0. However, the code has significantly changed in the 2.2. Thus in the 2.1 and 2.0, the patch must be entirely reworked.	2020-10-22 17:13:22 +02:00
Remi Tricot-Le Breton	6cb10384a3	MEDIUM: cache: Add support for 'If-None-Match' request header Partial support of conditional HTTP requests. This commit adds the support of the 'If-None-Match' header (see RFC 7232#3.2). When a client specifies a list of ETags through one or more 'If-None-Match' headers, they are all compared to the one that might have been stored in the corresponding http cache entry until one of them matches. If a match happens, a specific "304 Not Modified" response is sent instead of the cached data. This response has all the stored headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data is sent. Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is valid and also receives a "304 Not Modified" response (RFC 7234#4.3.2). This resolves a part of GitHub issue #821.	2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton	bcced09b91	MINOR: http: Add etag comparison function Add a function that compares two etags that might be of different types. If any of them is weak, the 'W/' prefix is discarded and a strict string comparison is performed. Co-authored-by: Tim Duesterhus <tim@bastelstu.be>	2020-10-22 16:06:20 +02:00
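The comparison rule described above (drop a leading `W/` from either side, then compare the remaining strings strictly) can be sketched as follows. The `etag_equal_sketch()` name and the use of NUL-terminated strings are assumptions for the illustration; the function added by the commit has its own name and operates on HAProxy's own string type.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns non-zero if the two entity tags are equal
 * once an optional weak indicator "W/" is discarded from each of them.
 */
static int etag_equal_sketch(const char *a, const char *b)
{
	if (strncmp(a, "W/", 2) == 0)
		a += 2;
	if (strncmp(b, "W/", 2) == 0)
		b += 2;

	/* strict, case-sensitive comparison of what remains, quotes included */
	return strcmp(a, b) == 0;
}

/* Usage example mixing weak and strong forms */
void etag_demo(void)
{
	assert(etag_equal_sketch("W/\"abc\"", "\"abc\""));
	assert(etag_equal_sketch("W/\"abc\"", "W/\"abc\""));
	assert(!etag_equal_sketch("\"abc\"", "\"abd\""));
}
```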
Tim Duesterhus	2493ee81d4	MINOR: http: Add `enum etag_type http_get_etag_type(const struct ist)` http_get_etag_type returns whether a given `etag` is a strong, weak, or invalid ETag.	2020-10-22 16:02:29 +02:00
William Lallemand	8e8581e242	MINOR: ssl: 'ssl-load-extra-del-ext' removes the certificate extension In issue #785, users are reporting that it's not convenient to load a ".crt.key" when the configuration contains a ".crt". This option allows to remove the extension of the certificate before trying to load any extra SSL file (.key, .ocsp, .sctl, .issuer etc.) The patch changes a little bit the way ssl_sock_load_files_into_ckch() looks for the file.	2020-10-20 18:25:46 +02:00
Christopher Faulet	96ddc8ab43	BUG/MEDIUM: connection: Never cleanup server lists when freeing private conns When a connection is released, depending on its state, it may be detached from the session and it may be removed from the server lists. The first case may happen for private or unsharable active connections. The second one should only be performed for idle or available connections. We never try to remove a connection from the server list if it is attached to a session. But it is also important to never try to remove a private connection from the server lists, even if it is not attached to a session. Otherwise, the curr_used_conn server counter is decremented once too often. This bug was introduced by the commit `04a24c5ea` ("MINOR: connection: don't check priv flag on free"). It is related to the issue #881. It only affects the 2.3, no backport is needed.	2020-10-19 17:19:10 +02:00
Willy Tarreau	69a7b8fc6c	CLEANUP: task: remove the unused and mishandled global_rqueue_size This counter is only updated and never used, and in addition it's done without any atomicity so it's very unlikely to be correct on multi-CPU systems! Let's just remove it since it's not used.	2020-10-19 14:08:13 +02:00
Willy Tarreau	e72a3f4489	CLEANUP: tree-wide: reorder a few structures to plug some holes around locks A few structures were slightly rearranged in order to plug some holes left around the locks. Sizes ranging from 8 to 32 bytes could be saved depending on the structures. No performance difference was noticed (none was expected there), though memory usage might be slightly reduced in some rare cases.	2020-10-19 14:08:13 +02:00
Willy Tarreau	8f1f177ed0	MINOR: threads: change lock_t to an unsigned int We don't need to waste the size of a long for the locks: with the plocks, even an unsigned short would offer enough room for up to 126 threads! Let's use an unsigned int which will be easier to place in certain structures and will more conveniently plug some holes, and Atomic ops are at least as fast on 32-bit as on 64-bit. This will not change anything for 32-bit platforms.	2020-10-19 14:08:13 +02:00
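The effect of shrinking the lock type can be illustrated with two structures that differ only in the lock's width. This is a sketch with hypothetical structure names. The exact sizes depend on the platform ABI: on a typical LP64 target the first structure is padded to 16 bytes and the second fits in 8, while on 32-bit targets both are the same size, as the commit notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures, for illustration only. */
struct with_long_lock {
    unsigned long lock;  /* 8 bytes on LP64, forces 8-byte alignment */
    unsigned int count;  /* followed by padding on LP64 */
};

struct with_int_lock {
    unsigned int lock;   /* 4 bytes, packs next to the counter */
    unsigned int count;
};

/* Number of bytes saved per instance by the narrower lock (may be 0). */
size_t lock_bytes_saved(void)
{
    assert(sizeof(struct with_long_lock) >= sizeof(struct with_int_lock));
    return sizeof(struct with_long_lock) - sizeof(struct with_int_lock);
}
```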
Willy Tarreau	3d18498645	CLEANUP: threads: don't register an initcall when not debugging It's a bit overkill to register an initcall to call a function to set a lock to zero when not debugging, let's just declare the lock as pre-initialized to zero.	2020-10-19 14:08:13 +02:00
Ilya Shipitsin	fcb69d768b	BUILD: ssl: make BoringSSL use its own version numbers BoringSSL is a fork of OpenSSL 1.1.0, however in 49e9f67d8b7cbeb3953b5548ad1009d15947a523 it has changed version to 1.1.1. Should fix issue #895. This must be backported to 2.2, 2.1, 2.0, 1.8	2020-10-19 11:34:37 +02:00
Willy Tarreau	cd10def825	MINOR: backend: replace the lbprm lock with an rwlock It was previously a spinlock, and it happens that a number of LB algos only lock it for lookups, without performing any modification. Let's first turn it to an rwlock and w-lock it everywhere. This is strictly identical. It was carefully checked that every HA_SPIN_LOCK() was turned to HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.	2020-10-17 18:51:41 +02:00
Willy Tarreau	61f799b8da	MINOR: threads: add the transitions to/from the seek state Since our locks are based on progressive locks, we support the upgradable seek lock that is compatible with readers and upgradable to a write lock. The main purpose is to take it while seeking down a tree for modification while other threads may seek the same tree for an input (e.g. compute the next event date). The newly supported operations are: HA_RWLOCK_SKLOCK(lbl,l) pl_take_s(l) /* N --> S */ HA_RWLOCK_SKTOWR(lbl,l) pl_stow(l) /* S --> W */ HA_RWLOCK_WRTOSK(lbl,l) pl_wtos(l) /* W --> S */ HA_RWLOCK_SKTORD(lbl,l) pl_stor(l) /* S --> R */ HA_RWLOCK_WRTORD(lbl,l) pl_wtor(l) /* W --> R */ HA_RWLOCK_SKUNLOCK(lbl,l) pl_drop_s(l) /* S --> N */ HA_RWLOCK_TRYSKLOCK(lbl,l) (!pl_try_s(l)) /* N -?> S */ HA_RWLOCK_TRYRDTOSK(lbl,l) (!pl_try_rtos(l)) /* R -?> S */ Existing code paths are left unaffected so this patch doesn't affect any running code.	2020-10-16 16:53:46 +02:00
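The transitions listed in this commit can be read as a small state table over four lock states: none (N), read (R), seek (S) and write (W). The sketch below only models which of the newly added transitions exist. It is not a lock implementation, the names `lk` and `lk_transition_listed` are hypothetical, and the pre-existing read and write lock operations are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of the lock states named in the commit message. */
enum lk { LK_N, LK_R, LK_S, LK_W }; /* none, read, seek, write */

/* Return true when from -> to is one of the transitions added by the
 * commit. The "try" variants (N -?> S, R -?> S) may fail at run time;
 * this table only says that the transition exists. */
bool lk_transition_listed(enum lk from, enum lk to)
{
    switch (from) {
    case LK_N: return to == LK_S;                             /* SKLOCK, TRYSKLOCK */
    case LK_S: return to == LK_W || to == LK_R || to == LK_N; /* SKTOWR, SKTORD, SKUNLOCK */
    case LK_W: return to == LK_S || to == LK_R;               /* WRTOSK, WRTORD */
    case LK_R: return to == LK_S;                             /* TRYRDTOSK */
    }
    return false;
}
```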
Willy Tarreau	8d5360ca7f	MINOR: threads: augment rwlock debugging stats to report seek lock stats We currently use only read and write lock operations with rwlocks, but ours also support upgradable seek locks for which we do not report any stats. Let's add them now when DEBUG_THREAD is enabled.	2020-10-16 16:51:49 +02:00
Willy Tarreau	233ad288cd	CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind() We don't need to specify the handler anymore since it's set in the receiver. Let's remove this argument from the function and clean up the remains of code that were still setting it.	2020-10-15 21:47:56 +02:00
Willy Tarreau	a74cb38e7c	MINOR: protocol: register the receiver's I/O handler and not the protocol's Now we define a new sock_accept_iocb() for socket-based stream protocols and use it as a wrapper for listener_accept() which now takes a listener and not an FD anymore. This will allow the receiver's I/O cb to be redefined during registration, and more specifically to get rid of the hard-coded hacks in protocol_bind_all() made for syslog. The previous ->accept() callback in the protocol was removed since it doesn't have anything to do with accept() anymore but is more generic. A few places where listener_accept() was compared against the FD's IO callback for debugging purposes on the CLI were updated.	2020-10-15 21:47:56 +02:00
Willy Tarreau	d2fb99f9d5	MINOR: protocol: add a default I/O callback and put it into the receiver For now we're still using the protocol's default accept() function as the I/O callback registered by the receiver into the poller. While this is usable for most TCP connections where a listener is needed, this is not suitable for UDP where a different handler is needed. Let's make this configurable in the receiver just like the upper layer is configurable for listeners. In order to ease stream protocols handling, the protocols will now provide a default I/O callback which will be preset into the receivers upon allocation so that almost none of them has to deal with it.	2020-10-15 21:47:56 +02:00
Willy Tarreau	f1dc9f2f17	MINOR: sock: implement sock_accept_conn() to accept a connection The socket-specific accept() code in listener_accept() has nothing to do there. Let's move it to sock.c where it can be significantly cleaned up. It will now directly return an accepted connection and provide a status code instead of letting listener_accept() deal with various errno values. Note that this doesn't support the sockpair specific code. The function is now responsible for dealing with its own receiver's polling state and calling fd_cant_recv() when facing EAGAIN. One tiny change from the previous implementation is that the connection's sockaddr is now allocated before trying accept(), which saves a memcpy() of the resulting address for each accept at the expense of a cheap pool_alloc/pool_free on the final accept returning EAGAIN. This still apparently slightly improves accept performance in microbenchmarks.	2020-10-15 21:47:56 +02:00
Willy Tarreau	1e509a7231	MINOR: protocol: add a new function accept_conn() This per-protocol function will be used to accept an incoming connection and return it as a struct connection*. As such the protocol stack's internal representation of a connection will not need to be handled by the listener code.	2020-10-15 21:47:56 +02:00
Willy Tarreau	7d053e4211	MINOR: sock: rename sock_accept_conn() to sock_accepting_conn() This call was introduced by commit `5ced3e887` ("MINOR: sock: add sock_accept_conn() to test a listening socket") but is actually quite confusing because it makes one think the socket will accept a connection (which is what we want to have in a new function) while it only tells whether it's configured to accept connections. Let's call it sock_accepting_conn() instead. The same change was applied to sockpair which had the same issue.	2020-10-15 21:47:56 +02:00
Willy Tarreau	65ed143841	MINOR: connection: add new error codes for accept_conn() accept_conn() will be used to accept an incoming connection and return it. It will have to deal with various error codes. The currently identified ones were created as CO_AC_*.	2020-10-15 21:47:56 +02:00
Willy Tarreau	83efc320aa	MEDIUM: listener: allocate the connection before queuing a new connection Till now we would keep a per-thread queue of pending incoming connections for which we would store: - the listener - the accepted FD - the source address - the source address' length And these elements were first used in session_accept_fd() running on the target thread to allocate a connection and duplicate them again. Doing this induces various problems. The first one is that session_accept_fd() may only run on file descriptors and cannot be reused for QUIC. The second issue is that it induces lots of memory copies and that the listener queue thrashes a lot of cache, consuming 64 bytes per entry. This patch changes this by allocating the connection before queueing it, and by only placing the connection's pointer into the queue. Indeed, the first two calls used to initialize the connection already store all the information above, which can be retrieved from the connection pointer alone. So we just have to pop one pointer from the target thread, and pass it to session_accept_fd() which only needs the FD for the final settings. This starts to make the accept path a bit more transport-agnostic, and saves memory and CPU cycles at the same time (1% connection rate increase was noticed with 4 threads). Thanks to dividing the accept-queue entry size from 64 to 8 bytes, its size could be increased from 256 to 1024 connections while still dividing the overall size by two. No single queue full condition was met. One minor drawback is that connection may be allocated from one thread's pool to be used into another one. But this already happens a lot with connection reuse so there is really nothing new here.	2020-10-15 21:47:56 +02:00
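The change described above amounts to queuing one pointer per pending connection instead of a 64-byte record. A minimal ring-buffer sketch of that idea follows. The names `conn_sketch`, `accq`, `accq_push` and `accq_pop` are hypothetical, and the code is single-threaded for clarity, whereas the real accept queue is filled by one thread and drained by another and therefore needs atomic operations.

```c
#include <assert.h>
#include <stddef.h>

#define ACCQ_SIZE 1024 /* entries, matching the size quoted in the commit */

/* Hypothetical stand-in for a connection: everything the target thread
 * needs (FD, source address, listener) is reachable from the pointer. */
struct conn_sketch {
    int fd;
};

struct accq {
    unsigned int head;                    /* next entry to pop */
    unsigned int tail;                    /* next free slot */
    struct conn_sketch *entry[ACCQ_SIZE]; /* one pointer per entry */
};

/* Queue a connection pointer; returns 0 when the ring is full. */
int accq_push(struct accq *q, struct conn_sketch *c)
{
    unsigned int next = (q->tail + 1) % ACCQ_SIZE;

    assert(c != NULL);
    if (next == q->head)
        return 0; /* full: one slot is kept empty to tell full from empty */
    q->entry[q->tail] = c;
    q->tail = next;
    return 1;
}

/* Pop the oldest queued connection, or NULL when the ring is empty. */
struct conn_sketch *accq_pop(struct accq *q)
{
    struct conn_sketch *c;

    if (q->head == q->tail)
        return NULL;
    c = q->entry[q->head];
    q->head = (q->head + 1) % ACCQ_SIZE;
    return c;
}
```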
Willy Tarreau	9b7587a6af	MINOR: connection: make sockaddr_alloc() take the address to be copied Roughly half of the calls to sockaddr_alloc() are made to copy an already known address. Let's optionally pass it in argument so that the function can handle the copy at the same time, this slightly simplifies its usage.	2020-10-15 21:47:56 +02:00
Willy Tarreau	0138f51f93	CLEANUP: fd: finally get rid of fd_done_recv() fd_done_recv() used to be useful with the FD cache because it used to allow to keep a file descriptor active in the poller without being marked as ready in the cache, saving it from ringing immediately, without incurring any system call. It was a way to make it yield to wait for new events leaving a bit of time for others. The only user left was the connection accepter (listener_accept()). We used to suspect that with the FD cache removal it had become totally useless since changing its readiness or not wouldn't change its status regarding the poller itself, which would be the only one deciding to report it again. Careful tests showed that it indeed has exactly zero effect nowadays, the syscall numbers are exactly the same with and without, including when enabling edge-triggered polling. Given that there's no more API available to manipulate it and that it was directly called as an optimization from listener_accept(), it's about time to remove it.	2020-10-15 21:47:56 +02:00
Willy Tarreau	e53e7ec9d9	CLEANUP: protocol: remove the ->drain() function No protocol defines it anymore. The last user used to be the monitor-net stuff that got partially broken already when the tcp_drain() function moved to conn_sock_drain() with commit `e215bba95` ("MINOR: connection: make conn_sock_drain() work for all socket families") in 1.9-dev2. A part of this will surely move back later when non-socket connections arrive with QUIC but better keep the API clean and implement what's needed in time instead.	2020-10-15 21:47:04 +02:00
Willy Tarreau	9e9919dd8b	MEDIUM: proxy: remove obsolete "monitor-net" As discussed here during 2.1-dev, "monitor-net" is totally obsolete: https://www.mail-archive.com/haproxy@formilux.org/msg35204.html It's fundamentally incompatible with usage of SSL, and imposes the presence of file descriptors with hard-coded syscalls directly in the generic accept path. It's very unlikely that anyone has used it in the last 10 years for anything beyond testing. In the worst case if anyone would depend on it, replacing it with "http-request return status 200 if ..." and "mode http" would certainly do the trick. The keyword is still detected as special by the config parser to help users update their configurations appropriately.	2020-10-15 21:47:04 +02:00
Willy Tarreau	77e0daef9f	MEDIUM: proxy: remove obsolete "mode health" As discussed here during 2.1-dev, "mode health" is totally obsolete: https://www.mail-archive.com/haproxy@formilux.org/msg35204.html It's fundamentally incompatible with usage of SSL, doesn't support source filtering, and imposes the presence of file descriptors with hard-coded syscalls directly in the generic accept path. It's very unlikely that anyone has used it in the last 10 years for anything beyond testing. In the worst case if anyone would depend on it, replacing it with "http-request return status 200" and "mode http" would certainly do the trick. The keyword is still detected as special by the config parser to help users update their configurations appropriately.	2020-10-15 21:47:04 +02:00
Amaury Denoyelle	04a24c5eaa	MINOR: connection: don't check priv flag on free Do not check CO_FL_PRIVATE flag to check if the connection is in session list on conn_free. This is necessary due to the future patches which add server connections in the session list even if not private, if the mux protocol is the subject of HOL blocking.	2020-10-15 15:19:34 +02:00
Amaury Denoyelle	3d3c0918dc	MINOR: mux/connection: add a new mux flag for HOL risk This flag is used to indicate if the mux protocol is subject to head-of-line blocking problem.	2020-10-15 15:19:34 +02:00
Amaury Denoyelle	c98df5fb44	MINOR: connection: improve list api usage Replace !LIST_ISEMPTY by LIST_ADDED and LIST_DEL+LIST_INIT by LIST_DEL_INIT for connection session list.	2020-10-15 15:19:34 +02:00
Amaury Denoyelle	9c13b62b47	BUG/MEDIUM: connection: fix srv idle count on conn takeover On server connection migration from one thread to another, the wrong idle thread-specific counter is decremented. This bug was introduced by commit `3d52f0f1f8` due to the factorization with srv_use_idle_conn. However, this statement is only executed from conn_backend_get. Extract the decrement from srv_use_idle_conn in conn_backend_get and use the correct thread-specific counter. Rename the function to srv_use_conn to better reflect its purpose as it is also used with a newly initialized connection not in the idle list. As a side change, the connection insertion to available list has also been extracted to conn_backend_get. This will be useful to be able to specify an alternative list for protocol subject to HOL risk that should not be shared between several clients. This bug is only present in this release and thus does not need a backport.	2020-10-15 15:19:34 +02:00
Willy Tarreau	29185140db	MINOR: protocol: make proto_tcp & proto_uxst report listening sockets Now we introduce a new .rx_listening() function to report if a receiver is actually a listening socket. The reason for this is to help detect shared sockets that might have been broken by sibling processes.	2020-10-13 18:15:33 +02:00
Willy Tarreau	5ced3e8879	MINOR: sock: add sock_accept_conn() to test a listening socket At several places we need to check if a socket is still valid and still willing to accept connections. Instead of open-coding this, each time, let's add a new function for this.	2020-10-13 18:15:33 +02:00
Frédéric Lécaille	3fc0fe05fd	MINOR: peers: heartbeat, collisions and handshake information for "show peers" command. This patch adds "coll" new counter and the heartbeat timer values to "show peers" command. It also adds the elapsed time since the last handshake to new "last_hdshk" new peer dump field.	2020-10-09 20:59:58 +02:00
Willy Tarreau	e03204c8e1	MEDIUM: listeners: implement protocol level ->suspend/resume() calls Now we have ->suspend() and ->resume() for listeners at the protocol level. This means that it now becomes possible for a protocol to redefine its own way to suspend and resume. The default functions are provided for TCP, UDP and unix, and they are pass-through to the receiver equivalent as it used to be till now. Nothing was defined for sockpair since it does not need to suspend/resume during reloads, hence it will succeed.	2020-10-09 18:44:37 +02:00
Willy Tarreau	7b2febde1d	MINOR: listeners: split do_unbind_listener() in two The inner part now goes into the protocol and is used to decide how to unbind a given protocol's listener. The existing code which is able to also unbind the receiver was provided as a default function that we currently use everywhere. Some complex listeners like QUIC will use this to decide how to unbind without impacting existing connections, possibly by setting up other incoming paths for the traffic.	2020-10-09 18:44:37 +02:00
Willy Tarreau	f58b8db47b	MEDIUM: receivers: add an rx_unbind() method in the protocols This is used as a generic way to unbind a receiver at the end of do_unbind_listener(). This allows to considerably simplify that function since we can now let the protocol perform the cleanup. The generic code was moved to sock.c, along with the conditional rx_disable() call. Now the code also supports that the ->disable() function of the protocol which acts on the listener performs the close itself and adjusts the RX_F_BOUND flag accordingly.	2020-10-09 18:44:36 +02:00
Willy Tarreau	18c20d28d7	MINOR: listeners: move the LI_O_MWORKER flag to the receiver This listener flag indicates whether the receiver part of the listener is specific to the master or to the workers. In practice it's only used by the master's CLI right now. It's used to know whether or not the FD must be closed before forking the workers. For this reason it's way more of a receiver's property than a listener's property, so let's move it there under the name RX_F_MWORKER. The rest of the code remains unchanged.	2020-10-09 18:43:05 +02:00
Willy Tarreau	75c98d166e	CLEANUP: listeners: remove the do_close argument to unbind_listener() And also remove it from its callers. This subtle distinction was added as sort of a hack for the seamless reload feature but is not needed anymore since do_close became unused with the previous commit ("MEDIUM: listener: let do_unbind_listener() decide whether to close or not"). This also removes the unbind_listener_no_close() function.	2020-10-09 18:41:56 +02:00
Willy Tarreau	02e8557e88	MINOR: protocol: add protocol_stop_now() to instant-stop listeners This will instantly stop all listeners except those which belong to a proxy configured with a grace time. This means that UDP listeners, and peers will also be stopped when called this way.	2020-10-09 18:29:04 +02:00
Willy Tarreau	acde152175	MEDIUM: proxy: centralize proxy status update and reporting There are multiple ways a proxy may switch to the disabled state, but now it's essentially once it loses its last listener. Instead of keeping duplicate code around and reporting the state change before actually seeing it, we now report it at the moment it's performed (from the last listener leaving) which allows to remove the message from all other places.	2020-10-09 18:29:04 +02:00
Willy Tarreau	a389c9e1e3	MEDIUM: proxy: add mode PR_MODE_PEERS to flag peers frontends For now we cannot easily distinguish a peers frontend from another one, which will be problematic to avoid reporting them when stopping their listeners. Let's add PR_MODE_PEERS for this. It's not supposed to cause any issue since all non-HTTP proxies are handled similarly now.	2020-10-09 18:28:21 +02:00
Willy Tarreau	caa7df1296	MINOR: listeners: add a new stop_listener() function This function will be used to definitely stop a listener (e.g. during a soft_stop). This is actually tricky because it may be called for a proxy or for a protocol, both of which require locks and already hold some. The function takes booleans indicating which ones are already held, hoping this will be enough. It's not well defined whether proto->disable() and proto->rx_disable() are supposed to be called with any lock held, and they are used from do_unbind_listener() with all these locks. Some back annotations ought to be added on this point. The proxy's listeners count is updated, and the proxy is marked as disabled and woken up after the last one is gone. Note that a listener in listen state is already not attached anymore since it was disabled.	2020-10-09 18:27:48 +02:00
Willy Tarreau	b4c083f5bf	MINOR: listeners: split delete_listener() in two versions We'll need an already locked variant of this function so let's make __delete_listener() which will be called with the protocol lock held and the listener's lock held.	2020-10-09 11:27:30 +02:00
Willy Tarreau	5ddf1ce9c4	MINOR: protocol: add a new pair of enable/disable methods for listeners These methods will be used to enable/disable accepting new connections so that listeners do not play with FD directly anymore. Since all the currently supported protocols work on socket for now, these are identical to the rx_enable/rx_disable functions. However they were not defined in sock.c since it's likely that some will quickly start to differ. At the moment they're not used. We have to take care of fd_updt before calling fd_{want,stop}_recv() because it's allocated fairly late in the boot process and some such functions may be called very early (e.g. to stop a disabled frontend's listeners).	2020-10-09 11:27:30 +02:00
Willy Tarreau	686fa3db50	MINOR: protocol: add a new pair of rx_enable/rx_disable methods These methods will be used to enable/disable rx at the receiver level so that callers don't play with FDs directly anymore. All our protocols use the generic ones from sock.c at the moment. For now they're not used.	2020-10-09 11:27:30 +02:00
Willy Tarreau	e70c7977f2	MINOR: sock: provide a set of generic enable/disable functions These will be used on receivers, to enable or disable receiving on a listener, which most of the time just consists in enabling/disabling the file descriptor. We have to take care of the existence of fd_updt to know if we may or not call fd_{want,stop}_recv() since it's not permitted in very early boot.	2020-10-09 11:27:30 +02:00
Willy Tarreau	58e6b71bb0	MINOR: protocol: implement an ->rx_resume() method This one undoes ->rx_suspend(), it tries to restore an operational socket. It was only implemented for TCP since it's the only one we support right now.	2020-10-09 11:27:30 +02:00
Willy Tarreau	cb66ea60cf	MINOR: protocol: replace ->pause(listener) with ->rx_suspend(receiver) The ->pause method is inappropriate since it doesn't exactly "pause" a listener but rather temporarily disables it so that it's not visible at all to let another process take its place. The term "suspend" is more suitable, since the "pause" is actually what we'll need to apply to the FULL and LIMITED states which really need to make a pause in the accept process. And it goes well with the use of the "resume" function that will also need to be made per-protocol. Let's rename the function and make it act on the receiver since it's already what it essentially does, hence the prefix "_rx" to make it more explicit. The protocol struct was a bit reordered because it was becoming a real mess between the parts related to the listeners and those for the receivers.	2020-10-09 11:27:30 +02:00
Willy Tarreau	d7f331c8b8	MINOR: protocol: rename the ->listeners field to ->receivers Since the listeners were split into receiver+listener, this field ought to have been renamed because it's confusing. It really links receivers and not listeners, as most of the time it's used via rx.proto_list! The nb_listeners field was updated accordingly.	2020-10-09 11:27:30 +02:00
Willy Tarreau	dae0692717	CLEANUP: listeners: remove the now unused enable_all_listeners() It's not used anymore since previous commit. The good thing is that no more listener function now directly acts on a protocol.	2020-10-09 11:27:30 +02:00
Willy Tarreau	078e1c7102	CLEANUP: protocol: remove the ->enable_all method It's not used anymore, now the listeners are enabled from protocol_enable_all().	2020-10-09 11:27:30 +02:00
Willy Tarreau	7834a3f70f	MINOR: listeners: export enable_listener() we'll soon call it from outside.	2020-10-09 11:27:30 +02:00
Willy Tarreau	d008009958	CLEANUP: listeners: remove unused disable_listener and disable_all_listeners These ones have never been called, they were referenced by the protocol's disable_all for some protocols but there are no traces of their use, so in addition to not being sure the code works, it has never been tested. Let's remove a bit of complexity starting from there.	2020-10-09 11:27:30 +02:00
Willy Tarreau	fb4ead8e8a	CLEANUP: protocol: remove the ->disable_all method This one has never been used, is only referenced by proto_uxst and proto_sockpair, and it's not even certain it works at all. Let's get rid of it.	2020-10-09 11:27:30 +02:00
Willy Tarreau	1accacbcc3	CLEANUP: proxy: remove the now unused pause_proxies() and resume_proxies() They're not used anymore, delete them before someone thinks about using them again!	2020-10-09 11:27:30 +02:00
Willy Tarreau	09819d1118	MINOR: protocol: introduce protocol_{pause,resume}_all() These two functions are used to pause and resume all listeners of all protocols. They use the standard listener functions for this so they're supposed to handle the situation gracefully regardless of the upper proxies' states, and they will report completion on proxies once the switch is performed. It might be nice to define a particular "failed" state for listeners that cannot resume and to count them on proxies in order to mention that they're definitely stuck. On the other hand, the current situation is retryable which is quite appreciable as well.	2020-10-09 11:27:30 +02:00
Willy Tarreau	337c835d16	MEDIUM: proxy: merge zombify_proxy() with stop_proxy() The two functions don't need to be distinguished anymore since they have all the necessary info to act as needed on their listeners. Let's just pass via stop_proxy() and make it check for each listener which one to close or not.	2020-10-09 11:27:30 +02:00
Willy Tarreau	43ba3cf2b5	MEDIUM: proxy: remove start_proxies() Its sole remaining purpose was to display "proxy foo started", which has little benefit and pollutes output for those with plenty of proxies. Let's remove it now. The VTCs were updated to reflect this, because many of them had explicit counts of dropped lines to match this message. This is tagged as MEDIUM because some users may be surprised by the loss of this quite old message.	2020-10-09 11:27:30 +02:00
Willy Tarreau	c3914d4fff	MEDIUM: proxy: replace proxy->state with proxy->disabled The remaining proxy states were only used to distinguish an enabled proxy from a disabled one. Due to the initialization order, both PR_STNEW and PR_STREADY were equivalent after startup, and they would only differ from PR_STSTOPPED when the proxy is disabled or shutdown (which is effectively another way to disable it). Now we just have a "disabled" field which allows to distinguish them. It's becoming obvious that start_proxies() is only used to print a greeting message now, that we'd rather get rid of. Probably that zombify_proxy() and stop_proxy() should be merged once their differences move to the right place.	2020-10-09 11:27:30 +02:00
Willy Tarreau	1ad64acf6c	CLEANUP: peers: don't use the PR_ST* states to mark enabled/disabled The enabled/disabled config options were stored into a "state" field that is an integer but contained only PR_STNEW or PR_STSTOPPED, which is a bit confusing, and causes a dependency with proxies. This was renamed to "disabled" and is used as a boolean. The field was also moved to the end of the struct to stop creating a hole and fill another one.	2020-10-09 11:27:30 +02:00
Willy Tarreau	f18d968830	MEDIUM: proxy: remove state PR_STPAUSED This state was used to mention that a proxy was in PAUSED state, as opposed to the READY state. This was causing some trouble because if a listener failed to resume (e.g. because its port was temporarily in use during the resume), it was not possible to retry the operation later. Now by checking the number of READY or PAUSED listeners instead, we can accurately know if something went bad and try to fix it again later. The case of the temporary port conflict during resume now works well: $ socat readline /tmp/sock1 prompt > disable frontend testme3 > disable frontend testme3 All sockets are already disabled. > enable frontend testme3 Failed to resume frontend, check logs for precise cause (port conflict?). > enable frontend testme3 > enable frontend testme3 All sockets are already enabled.	2020-10-09 11:27:30 +02:00
Willy Tarreau	a17c91b37f	MEDIUM: proxy: remove the PR_STERROR state This state is only set when a pause() fails but isn't even set when a resume() fails. And we cannot recover from this state. Instead, let's just count remaining ready listeners to decide to emit an error or not. It's more accurate and will better support new attempts if needed.	2020-10-09 11:27:30 +02:00
Willy Tarreau	6b3bf733dd	MEDIUM: proxy: remove the unused PR_STFULL state Since v1.4 or so, it's almost not possible anymore to set this state. The only exception is by using the CLI to change a frontend's maxconn setting below its current usage. This case makes no sense, and for other cases it doesn't make sense either because "full" is a vague concept when only certain listeners are full and not all. Let's just remove this unused state and make it clear that it's not reported. The "ready" or "open" states will continue to be reported without being misleading as they will be opposed to "stop".	2020-10-09 11:27:30 +02:00
Willy Tarreau	efc0eec4c1	MINOR: proxy: maintain per-state counters of listeners The proxy state tries to be synthetic but that doesn't work well with many listeners, especially for transition phases or after a failed pause/resume. In order to address this, we'll instead rely on counters of listeners in a given state for the 3 major states (ready, paused, listen) and a total counter. We'll now be able to determine a proxy's state by comparing these counters only.	2020-10-09 11:27:30 +02:00
Willy Tarreau	a37b244509	MINOR: listeners: introduce listener_set_state() This function is used as a wrapper to set a listener's state everywhere. We'll use it later to maintain some counters in a consistent state when switching state so it's capital that all state changes go through it. No functional change was made beyond calling the wrapper.	2020-10-09 11:27:30 +02:00
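The two commits above (per-state listener counters, and a single wrapper through which every state change goes) fit together as sketched below. All names here (`li_state`, `px_sketch`, `li_sketch`, `li_attach`, `li_set_state`, `px_is_paused`) are hypothetical simplifications, the state list is reduced to four values, and the "paused" rule is only one plausible way to derive a proxy state from the counters.

```c
#include <assert.h>

/* Hypothetical, reduced set of listener states. */
enum li_state { LI_INIT = 0, LI_LISTEN, LI_PAUSED, LI_READY, LI_STATES };

/* Hypothetical proxy: a total counter plus one counter per state. */
struct px_sketch {
    unsigned int li_all;
    unsigned int li_count[LI_STATES];
};

struct li_sketch {
    enum li_state state;
    struct px_sketch *px;
};

/* Attach a listener to its proxy in the initial state. */
void li_attach(struct li_sketch *li, struct px_sketch *px)
{
    li->state = LI_INIT;
    li->px = px;
    px->li_all++;
    px->li_count[LI_INIT]++;
}

/* The only place where a listener's state changes, so the proxy's
 * per-state counters always stay consistent with the listeners. */
void li_set_state(struct li_sketch *li, enum li_state st)
{
    assert(st < LI_STATES);
    li->px->li_count[li->state]--;
    li->px->li_count[st]++;
    li->state = st;
}

/* Assumed rule: a proxy is paused when no listener is ready and at
 * least one is paused. */
int px_is_paused(const struct px_sketch *px)
{
    return px->li_count[LI_READY] == 0 && px->li_count[LI_PAUSED] > 0;
}
```

Because the proxy state is computed from counters rather than stored, a listener that fails to resume simply leaves the counters in a mixed state, and the operation can be retried later.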
Willy Tarreau	c6dac6c7f5	MEDIUM: listeners: remove the now unused ZOMBIE state The zombie state is not used anymore by the listeners, because in the last two cases where it was tested it couldn't match as it was covered by the test on the process mask. Instead now the FD is either in the LISTEN state or the INIT state. This also avoids forcing the listener to be single-dimensional because actually belonging to another process isn't totally exclusive with the other states, which explains some of the difficulties requiring to check the proc_mask and the fd sometimes. So let's get rid of it now not to be tempted to reuse it. The doc on the listeners state was updated.	2020-10-09 11:27:29 +02:00
Emeric Brun	b0c331f71f	BUG/MINOR: proxy/log: frontend/backend and log forward names must differ This patch disallows using the same name for a log forward section and a frontend/backend section.	2020-10-08 08:53:26 +02:00
Emeric Brun	6d75616951	MINOR: channel: new getword and getchar functions on channel. This patch adds two new functions to get a char or a word from a channel.	2020-10-07 17:17:27 +02:00
Emeric Brun	2897644ae5	MINOR: stats: inc req counter on listeners. This patch enables count of requests for listeners if listener's counters are enabled.	2020-10-07 17:17:27 +02:00
Amaury Denoyelle	fbd0bc98fe	MINOR: dns/stats: integrate dns counters in stats Use the new stats module API to integrate the dns counters in the standard stats. This is done in order to avoid code duplication, keep the code related to cli out of dns and use the full possibility of the stats function, allowing to print dns stats in csv or json format.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	0b70a8a314	MINOR: stats: add config "stats show modules" By default, hide the extra statistics on the html page. Define a new flag STAT_SHMODULES which is activated if the config "stats show modules" is set.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	d3700a7fda	MINOR: stats: support clear counters for dynamic stats Add a boolean 'clearable' on stats module structure. If set, it forces all the counters to be reset on 'clear counters' cli command. If not, the counters are reset only when 'clear counters all' is used.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	ee63d4bd67	MEDIUM: stats: integrate static proxies stats in new stats This is executed on startup with the registered statistics module. The existing statistics have been merged in a list containing all statistics for each domain. This is useful to print all available statistics in a generic way. Allocate extra counters for all proxies/servers/listeners instances. These counters are allocated with the counters from the stats modules registered on startup.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	730c727ea3	MEDIUM: stats: add abstract type to store counters Implement a small API to easily add extra counters inside a structure instance. This will be used to implement dynamic statistics linked on every type of object as needed. The counters are stored in a dynamic array inside the relevant objects.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	58d395e0d6	MEDIUM: stats: define an API to register stat modules A stat module can be registered to quickly add new statistics on haproxy. It must be attached to one of the available stats domain. The register must be done using INITCALL on STG_REGISTER. The stat module has a name which should be unique for each new module in a domain. It also contains a statistics list with their name/desc and a pointer to a function used to fill the stats from the module counters. The module also provides the initial counters values used on automatically allocated counters. The offsets for these counters are stored in the module structure.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	72b16e5173	MINOR: stats: define additional flag px cap on domain This flag can be used to determine on what type of proxy object the statistics should be relevant. It will be useful when adding dynamic statistics. Currently, this flag is not used.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	072f97eddf	MINOR: stats: define the concept of domain for statistics The domain option will be used to have statistics attached to other objects than proxies/listeners/servers. At the moment, only the PROXY domain is available. Add an argument 'domain' on the 'show stats' cli command to specify the domain. Only 'domain proxy' is available now. If not specified, proxy will be considered the default domain. For HTML output, only proxy statistics will be displayed.	2020-10-05 12:02:14 +02:00
Amaury Denoyelle	da5b6d1cd9	MINOR: stats: hide px/sv/li fields in applet struct Use an opaque pointer to store proxy instance. Regroup server/listener as a single opaque pointer. This has the benefit to render the structure more evolutive to support statistics on other types of objects in the future. This patch is needed to extend stat support for components other than proxies objects. The prometheus module has been adapted for these changes.	2020-10-05 10:48:58 +02:00
Amaury Denoyelle	97323c9ed4	MINOR: stats: add stats size as a parameter for csv/json dump Render the stats size parametric in csv/json dump functions. This is needed for the future patch which provides dynamic stats. For now the static value ST_F_TOTAL_FIELDS is provided. Remove unused parameter px on stats_dump_one_line. This patch is needed to extend stat support to components other than proxies objects.	2020-10-05 09:06:10 +02:00
Amaury Denoyelle	3ca927e68f	REORG: stats: export some functions Un-mark stats_dump_one_line and stats_putchk as static and export them in the header file. These functions will be reusable by other components to print their statistics. This patch is needed to extend stat support to components other than proxies objects.	2020-10-05 09:06:10 +02:00
Amaury Denoyelle	cd3de50779	MINOR: counters: fix a typo in comment Wrong copy/paste comment, replace listeners/frontends by servers/backends This may be backported up to 1.7.	2020-10-05 09:05:57 +02:00
Willy Tarreau	fac0f645df	BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe A crash reported in github issue #880 looks impossible unless pendconn_cond_unlink() occasionally sees a null leaf_p when attempting to remove an entry, which seems to be confirmed by the reporter. What seems to be happening is that depending on compiler optimizations, this pointer can appear as null while pointers are moved if one of the node's parents is removed from or inserted into the tree. There's no explicit null of the pointer during these operations but those pointers are rewritten in multiple steps and nothing prevents this situation from happening, and there are no particular barrier nor atomic ops around this. This test was used to avoid unnecessary locking, for already deleted entries, but looking at the code it appears that pendconn_free() already resets s->pend_pos that's used as <p> there, and that the other call reasons are after an error where the connection will be dropped as well. So we don't save anything by doing this test, and make it unsafe. The older code used to check for list emptiness there and not inside pendconn_unlink(), which explains why the code has stayed there. Let's just remove this now. Thanks to @jaroslawr for reporting this issue in great detail and for testing the proposed fix. This should be backported to 1.8, where the test on LIST_ISEMPTY should be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).	2020-10-02 18:10:26 +02:00
Amaury Denoyelle	fa41cb6792	MINOR: tools: support for word expansion of environment in parse_line Allow the syntax "${...[*]}" to expand an environment variable containing several values separated by spaces as individual arguments. A new flag PARSE_OPT_WORD_EXPAND has been added to toggle this feature on parse_line invocation. In case of an invalid syntax, a new error PARSE_ERR_WRONG_EXPAND will be triggered. This feature has been asked on the github issue #165.	2020-10-01 17:24:14 +02:00
Willy Tarreau	3ca2365904	BUG/MEDIUM: h2: report frame bits only for handled types As part of his GREASE experiments on Chromium, Bence Béky reported in https://lists.w3.org/Archives/Public/ietf-http-wg/2020JulSep/0202.html and https://bugs.chromium.org/p/chromium/issues/detail?id=1127060 that a certain combination of frame type and frame flags was causing an error on app.slack.com. It turns out that it's haproxy that is causing this issue because the frame type is wrongly assumed to support padding, the frame flags indicate padding is present, and the frame is too short for this, resulting in an error. The reason why only some frame types are affected is due to the frame type being used in a bit shift to match against a mask, and where the 5 lower bits of the frame type only are used to compute the frame bit. If the resulting frame bit matches a DATA, HEADERS or PUSH_PROMISE frame bit, then padding support is assumed and the test is enforced, resulting in a PROTOCOL_ERROR or FRAME_SIZE_ERROR depending on the payload size. We must never match any such bit for unsupported frame types so let's add a check for this. This must be backported as far as 1.8. Thanks to Cooper Bethea for providing enough context to help narrow the issue down and to Bence Béky for creating a simple reproducer.	2020-09-18 08:05:03 +02:00
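The aliasing described in 3ca2365904 can be reproduced with a few lines of C. This is a minimal sketch and not haproxy's code: the function and macro names are hypothetical, and the explicit `& 0x1f` mask is an assumption that stands in for a shift whose count is effectively reduced to its 5 lower bits. The frame type values are those of RFC 7540.

```c
#include <assert.h>
#include <stdint.h>

/* HTTP/2 frame types from RFC 7540; 0x9 (CONTINUATION) is the last one defined there. */
enum { FT_DATA = 0x0, FT_HEADERS = 0x1, FT_PUSH_PROMISE = 0x5, FT_LAST_KNOWN = 0x9 };

/* Hypothetical mask of the frame types that may carry padding. */
#define PADDED_TYPES ((1U << FT_DATA) | (1U << FT_HEADERS) | (1U << FT_PUSH_PROMISE))

/* Flawed variant: only the 5 lower bits of the type select the bit, so an
 * unknown type such as 0x20 yields the same bit as DATA (0x0).
 */
static uint32_t frame_bit_naive(uint8_t type)
{
    return 1U << (type & 0x1f);
}

/* Checked variant: unsupported types never produce any bit at all. */
static uint32_t frame_bit_checked(uint8_t type)
{
    return type <= FT_LAST_KNOWN ? 1U << type : 0;
}

/* Usage example: a GREASE-like unknown type is mistaken for a padded frame
 * by the naive variant only.
 */
static void frame_bit_demo(void)
{
    uint8_t unknown = 0x20;

    assert(frame_bit_naive(unknown) & PADDED_TYPES);
    assert(!(frame_bit_checked(unknown) & PADDED_TYPES));
    assert(frame_bit_checked(FT_HEADERS) & PADDED_TYPES);
}
```

Returning an empty bit for unknown types means no mask test can succeed for them, which matches the commit's intent of never matching a frame bit for unsupported types.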
Willy Tarreau	2b5e0d8b6a	MEDIUM: proto_udp: replace last AF_CUST_UDP* with AF_INET* We don't need to cheat with the sock_domain anymore, we now always have the SOCK_DGRAM sock_type as a complementary selector. This patch restores the sock_domain to AF_INET* in the udp* protocols and removes all traces of the now unused AF_CUST_*.	2020-09-16 22:08:08 +02:00
Willy Tarreau	910c64da96	MEDIUM: protocol: store the socket and control type in the protocol array The protocol array used to be only indexed by socket family, which is very problematic with UDP (requiring an extra family) and with the forthcoming QUIC (also requiring an extra family), especially since that binds them to certain families, prevents them from supporting dgram UNIX sockets etc. In order to address this, we now start to register the protocols with more info, namely the socket type and the control type (either stream or dgram). This is sufficient for the protocols we have to deal with, but could also be extended further if multiple protocol variants were needed. But as is, it still fits nicely in an array, which is convenient for lookups that are instant.	2020-09-16 22:08:08 +02:00
Willy Tarreau	a54553f74f	MINOR: protocol: add the control layer type in the protocol struct This one will be needed to more accurately select a protocol. It may differ from the socket type for QUIC, which uses dgram at the socket layer and provides stream at the control layer. The upper level requests a control layer only so we need this field.	2020-09-16 22:08:08 +02:00
Willy Tarreau	65ec4e3ff7	MEDIUM: tools: make str2sa_range() check that the protocol has ->connect() Most callers of str2sa_range() need the protocol only to check that it provides a ->connect() method. It used to be used to verify that it's a stream protocol, but it might be a bit early to get rid of it. Let's keep the test for now but move it to str2sa_range() when the new flag PA_O_CONNECT is present. This way almost all call places could be cleaned from this. There's a strange test in the server address parsing code that rechecks the family from the socket which seems to be a duplicate of the previously removed tests. It will have to be rechecked.	2020-09-16 22:08:08 +02:00
Willy Tarreau	5fc9328aa2	MINOR: tools: make str2sa_range() directly return the protocol We'll need this so that it can return pointers to stacked protocol in the future (for QUIC). In addition this removes a lot of tests for protocol validity in the callers. Some of them were checked further apart, or after a call to str2listener() and they were simplified as well. There's still a trick, we can fail to return a protocol in case the caller accepts an fqdn for use later. This is what servers do and in this case it is valid to return no protocol. A typical example is: server foo localhost:1111	2020-09-16 22:08:08 +02:00
Willy Tarreau	9b3178df23	MINOR: listener: pass the chosen protocol to create_listeners() The function will need to use more than just a family, let's pass it the selected protocol. The caller will then be able to do all the fancy stuff required to pick the best protocol.	2020-09-16 22:08:08 +02:00
Willy Tarreau	aa333123f2	MINOR: cfgparse: add str2receiver() to parse dgram receivers This is at least temporary, as the migration at once is way too difficult. For now it still creates listeners but only allows DGRAM sockets. This aims at easing the split between listeners and receivers.	2020-09-16 22:08:08 +02:00
Willy Tarreau	a93e5c7fae	MINOR: tools: make str2sa_range() optionally return the fd If a file descriptor was passed, we can optionally return it. This will be useful for listening sockets which are both a pre-bound FD and a ready socket.	2020-09-16 22:08:08 +02:00
Willy Tarreau	909c23b086	MINOR: listener: remove the inherited arg to create_listener() This argument can now safely be determined from fd != -1, let's just drop it.	2020-09-16 22:08:08 +02:00
Willy Tarreau	328199348b	MINOR: tools: add several PA_O_* flags in str2sa_range() callers These flags indicate whether the call is made to fill a bind or a server line, or even just send/recv calls (like logs or dns). Some special cases are made for outgoing FDs (e.g. pipes for logs) or socket FDs (e.g external listeners), and there's a distinction between stream or dgram usage that's expected to significantly help str2sa_range() proceed appropriately with the input information. For now they are not used yet.	2020-09-16 22:08:08 +02:00
Willy Tarreau	809587635e	MINOR: tools: add several PA_O_PORT_* flags in str2sa_range() callers These flags indicate what is expected regarding port specifications. Some callers accept none, some need fixed ports, some have it mandatory, some support ranges, and some take an offset. Each possibility is reflected by an option. For now they are not exploited, but the goal is to instrument str2sa_range() to properly parse that.	2020-09-16 22:08:07 +02:00
Willy Tarreau	cd3a5591f6	MINOR: tools: make str2sa_range() take more options than just resolve We currently have an argument to require that the address is resolved but we'll soon add more, so let's turn it into a bit field. The old "resolve" boolean is now PA_O_RESOLVE.	2020-09-16 22:08:07 +02:00
Willy Tarreau	a5b325f92c	MINOR: protocol: add a real family for existing FDs At some places (log fd@XXX, bind fd@XXX) we support using an explicit file descriptor number, that is placed into the sockaddr for later use. The problem is that till now it was done with an AF_UNSPEC family, which is also used for other situations like missing info or rings (for logs). Let's create an "official" family AF_CUST_EXISTING_FD for this case so that we are certain the FD can be found in the address when it is set.	2020-09-16 22:08:07 +02:00
Willy Tarreau	1e984b73f0	CLEANUP: protocol: remove family-specific fields from struct protocol This removes the following fields from struct protocol that are now retrieved from the protocol family instead: .sock_family, .sock_addrlen, .l3_addrlen, .addrcmp, .bind, .get_src, .get_dst. This also removes the UDP-specific udp{,6}_get_{src,dst}() functions which were referenced but not used yet. Their goal was only to remap the original AF_INET* addresses to AF_CUST_UDP*. Note that .sock_domain is still there as it's used as a selector for the protocol struct to be used.	2020-09-16 22:08:07 +02:00
Willy Tarreau	f1f660978c	MINOR: protocol: retrieve the family-specific fields from the family We now take care of retrieving sock_family, l3_addrlen, bind(), addrcmp(), get_src() and get_dst() from the protocol family and not just the protocol itself. There are very few places, this was only seldom used. Interestingly, sock_inet.c used to rely on ->sock_family instead of ->sock_domain, and sock_unix.c used to hard-code PF_UNIX instead of using ->sock_domain. Also it appears obvious we have something wrong in the protocol selection algorithm because sock_domain is the one set to the custom protocols while it ought to be sock_family instead, which would avoid having to hard-code some conversions for UDP namely.	2020-09-16 22:08:07 +02:00
Willy Tarreau	b0254cb361	MINOR: protocol: add a new proto_fam structure for protocol families We need to specially handle protocol families which regroup common functions used for a given address family. These functions include bind(), addrcmp(), get_src() and get_dst() for now. Some fields are also added about the address family, socket domain (protocol family passed to the socket() syscall), and address length. These protocol families are referenced from the protocols but not yet used.	2020-09-16 22:08:07 +02:00
Willy Tarreau	62292b28a3	MEDIUM: sockpair: implement sockpair_bind_receiver() Note that for now we don't have a sockpair.c file to host that unusual family, so the new function was placed directly into proto_sockpair.c. It's no big deal given that this family is currently not shared with multiple protocols. The function does almost nothing but setting up the receiver. This is normal as the socket the FDs are passed onto are supposed to have been already created somewhere else, and the only usable identifier for such a socket pair is the receiving FD itself. The function was assigned to sockpair's ->bind() and is not used yet.	2020-09-16 22:08:07 +02:00
Willy Tarreau	1e0a860099	MEDIUM: sock_unix: implement sock_unix_bind_receiver() This function performs all the bind-related stuff for UNIX sockets that was previously done in uxst_bind_listener(). There is a very tiny difference however, which is that previously, in the unlikely event where listen() would fail, it was still possible to roll back the binding and rename the backup to the original socket. Now we have to rename it before returning, hence it will be done before calling listen(). However, this doesn't cover any particular use case since listen() has no reason to fail there (and the rollback is not done for inherited sockets), that was just done that way as a generic error processing path. The code is not used yet and is referenced in the uxst proto's ->bind().	2020-09-16 22:08:07 +02:00
Willy Tarreau	d69ce1ffbc	MEDIUM: sock_inet: implement sock_inet_bind_receiver() This function collects all the receiver-specific code from both tcp_bind_listener() and udp_bind_listener() in order to provide a more generic AF_INET/AF_INET6 socket binding function. For now the API is not very elegant because some info are still missing from the receiver while there's no ideal place to fill them except when calling ->listen() at the protocol level. It looks like some polishing code is needed in check_config_validity() or somewhere around this in order to finalize the receivers' setup. The main issue is that listeners and receivers are created before bind_conf options are parsed and that there's no finishing step to resolve some of them. The function currently sets up a receiver and subscribes it to the poller. In an ideal world we wouldn't subscribe it but let the caller do it after having finished to configure the L4 stuff. The problem is that the caller would then need to perform an fd_insert() call and to possibly set the exported flag on the FD while it's not its job. Maybe an improvement could be to have a separate sock_start_receiver() call in sock.c. For now the function is not used but it will soon be. It's already referenced as tcp and udp's ->bind().	2020-09-16 22:08:07 +02:00
Willy Tarreau	3e5c7ab7ce	MINOR: protocol: add a new ->bind() entry to bind the receiver This will be the function that must be used to bind the receiver. It solely depends on the address family but for now it's simpler to have it per protocol.	2020-09-16 22:08:07 +02:00
Willy Tarreau	b3580b19c8	MINOR: protocol: rename the ->bind field to ->listen The function currently is doing both the bind() and the listen(), so let's call it ->listen so that the bind() operation can move to another place.	2020-09-16 22:08:07 +02:00
Willy Tarreau	c049c0d5ad	MINOR: sock: make sock_find_compatible_fd() only take a receiver We don't need to have a listener anymore to find an fd, a receiver with its settings properly set is enough now.	2020-09-16 22:08:07 +02:00
Willy Tarreau	3fd3bdc836	MINOR: receiver: move the FOREIGN and V6ONLY options from listener to settings The new RX_O_FOREIGN, RX_O_V6ONLY and RX_O_V4V6 options are now set into the rx_settings part during the parsing, so that we don't need to adjust them in each and every listener anymore. We have to keep both v4v6 and v6only due to the precedence from v6only over v4v6.	2020-09-16 22:08:07 +02:00
Willy Tarreau	43046fa4f4	MINOR: listener: move the INHERITED flag down to the receiver It's the receiver's FD that's inherited from the parent process, not the listener's so the flag must move to the receiver so that appropriate actions can be taken.	2020-09-16 22:08:07 +02:00
Willy Tarreau	0b9150155e	MINOR: receiver: add a receiver-specific flag to indicate the socket is bound In order to split the receiver from the listener, we'll need to know that a socket is already bound and ready to receive. We used to do that via the LI_O_ASSIGNED state but that's not sufficient anymore since the receiver might not belong to a listener anymore. The new RX_F_BOUND flag is used for this.	2020-09-16 22:08:07 +02:00
Willy Tarreau	eef454224d	MINOR: receiver: link the receiver to its owner A receiver will have to pass a context to be installed into the fdtab for use by the handler. We need to set this into the receiver struct as the bind will happen long after the configuration.	2020-09-16 22:08:07 +02:00
Willy Tarreau	0fce6bce34	MINOR: receiver: link the receiver to its settings Just like listeners keep a pointer to their bind_conf, receivers now also have a pointer to their rx_settings. All those belonging to a listener are automatically initialized with a pointer to the bind_conf's settings.	2020-09-16 22:08:07 +02:00
Willy Tarreau	d45693d85c	REORG: listener: move the receiver part to a new file We'll soon add flags for the receivers, better add them to the final file, so it's time to move the definition to receiver-t.h. The struct receiver and rx_settings were placed there.	2020-09-16 22:08:07 +02:00
Willy Tarreau	b743661f04	REORG: listener: move the listener's proto to the receiver The receiver is the one which depends on the protocol while the listener relies on the receiver. Let's move the protocol there. Since there's also a list element to get back to the listener from the proto list, this list element (proto_list) was moved as well. For now when scanning protos, we still see listeners which are linked by their rx.proto_list part.	2020-09-16 22:08:05 +02:00
Willy Tarreau	38ba647f9f	REORG: listener: move the receiving FD to struct receiver The listening socket is represented by its file descriptor, which is generic to all receivers and not just listeners, so it must move to the rx struct. It's worth noting that in order to extend receivers and listeners to other protocols such as QUIC, we'll need other handles than file descriptors here, and that either a union or a cast to uintptr_t will have to be used. This was not done yet and the field was preserved under the name "fd" to avoid adding confusion.	2020-09-16 22:08:03 +02:00
Willy Tarreau	371590661e	REORG: listener: move the listening address to a struct receiver The address will be specific to the receiver so let's move it there.	2020-09-16 22:08:01 +02:00
Willy Tarreau	37d9d6721a	REORG: listener: create a new struct receiver In order to start to split the listeners into the listener part and the event receiver part, we introduce a new field "rx" into struct listener that will eventually become a separate struct receiver. This patch only adds the struct with an options field that the receivers will need.	2020-09-16 22:07:58 +02:00
Willy Tarreau	be56c1038f	MINOR: listener: move the network namespace to the struct settings The netns is common to all listeners/receivers and is used to bind the listening socket so it must be in the receiver settings and not in the listener. This removes yet another set of unnecessary loops.	2020-09-16 20:13:13 +02:00
Willy Tarreau	7e307215e8	MINOR: listener: move the interface to the struct settings The interface is common to all listeners/receivers and is used to bind the listening socket so it must be in the receiver settings and not in the listener. This removes some unnecessary loops.	2020-09-16 20:13:13 +02:00
Willy Tarreau	e26993c098	MINOR: listener: move bind_proc and bind_thread to struct settings As mentioned previously, these two fields come under the settings struct since they'll be used to bind receivers as well.	2020-09-16 20:13:13 +02:00
Willy Tarreau	6e459d7f92	MINOR: listener: create a new struct "settings" in bind_conf There currently is a large inconsistency in how binding parameters are split between bind_conf and listeners. It happens that for historical reasons some parameters are available at the listener level but cannot be configured per-listener but only for a bind_conf, and thus, need to be replicated. In addition, some of the bind_conf parameters are in fact for the listening socket itself while others are for the instantiated sockets. A previous attempt at splitting listeners into receivers failed because the boundary between all these settings is not well defined. This patch introduces a level of listening socket settings in the bind_conf, that will be detachable later. Such settings that are solely for the listening socket are: - unix socket permissions (used only during binding) - interface (used for binding) - network namespace (used for binding) - process mask and thread mask (used during startup) The rest seems to be used only to initialize the resulting sockets, or to control the accept rate. For now, only the unix params (bind_conf->ux) were moved there.	2020-09-16 20:13:13 +02:00
William Lallemand	70bf06e5f0	BUILD: fix build with openssl < 1.0.2 since bundle removal Bundle removal broke the build with openssl version < 1.0.2. Remove the #ifdef around SSL_SOCK_KEYTYPE_NAMES.	2020-09-16 18:10:00 +02:00
William Lallemand	e7eb1fec2f	CLEANUP: ssl: remove utility functions for bundle Remove the last utility functions for handling the multi-cert bundles and remove the multi-variable from the ckch structure. With this patch, the bundles are completely removed.	2020-09-16 16:28:26 +02:00
William Lallemand	bd8e6eda59	CLEANUP: ssl: remove test on "multi" variable in ckch functions Since the removal of the multi-certificates bundle support, this variable is not useful anymore, we can remove all tests for this variable and suppose that every ckch contains a single certificate.	2020-09-16 16:28:26 +02:00
Willy Tarreau	441b6c31e9	BUILD: connection: fix build on clang after the VAR_ARRAY cleanup Commit `4987a4744` ("CLEANUP: tree-wide: use VAR_ARRAY instead of [0] in various definitions") broke the build on clang due to the tlv field used to receive/send the proxy protocol. The problem is that struct tlv is included at the beginning of struct tlv_ssl, which doesn't make much sense. In fact the value[] array isn't really a var array but just an end of struct marker, and must really be an array of size zero.	2020-09-14 08:43:51 +02:00
Willy Tarreau	4987a47446	CLEANUP: tree-wide: use VAR_ARRAY instead of [0] in various definitions Surprisingly there were still a number of [0] definitions for variable sized arrays in certain structures all over the code. We need to use VAR_ARRAY instead of zero to accommodate various compilers' preferences, as zero was used only on old ones and tends to report errors on new ones.	2020-09-12 20:56:41 +02:00
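The portability concern behind 4987a47446 and the follow-up fix 441b6c31e9 can be illustrated as follows. This is a sketch under stated assumptions: the `VAR_ARRAY` definition below only approximates what such a macro may look like (a zero size for very old gcc, an empty size elsewhere) and is not copied from haproxy's headers, while `struct blob` and `blob_new()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape of the macro: old gcc wants "[0]", modern compilers want "[]". */
#ifndef VAR_ARRAY
#if defined(__GNUC__) && (__GNUC__ < 3)
#define VAR_ARRAY 0
#else
#define VAR_ARRAY
#endif
#endif

/* Hypothetical variable-sized object: a header followed by <len> bytes. */
struct blob {
    size_t len;
    char data[VAR_ARRAY]; /* must remain the last member of the struct */
};

/* Allocate the header and its payload in a single block, then copy the payload. */
static struct blob *blob_new(const void *src, size_t len)
{
    struct blob *b = malloc(sizeof(*b) + len);

    if (!b)
        return NULL;
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}

/* Usage example: build a 3-byte blob and read it back. */
static void blob_demo(void)
{
    struct blob *b = blob_new("abc", 3);

    assert(b != NULL);
    assert(b->len == 3 && memcmp(b->data, "abc", 3) == 0);
    free(b);
}
```

A struct ending in such an array cannot portably be placed at the beginning of another struct. That is the situation 441b6c31e9 describes for `struct tlv` inside `struct tlv_ssl`, where a size-zero end-of-struct marker was kept instead.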
Ilya Shipitsin	4a034f2212	BUILD: introduce possibility to define ABORT_NOW() conditionally code analysis tools recognize abort() better, so let us introduce such possibility	2020-09-12 13:11:27 +02:00
Willy Tarreau	00c363ba9d	REORG: tools: move PARSE_OPT_* from tools.h to tools-t.h These would better be placed into the low-level type files with other similar macros.	2020-09-11 11:27:22 +02:00
Willy Tarreau	76296dce68	BUILD: trace: always have an argument before variadic args in macros tcc supports variadic macros provided that there is always at least one argument, like older gcc versions. Thus we need to always keep one and define args as the remaining ones. It's not an issue at all and doesn't change the way to use them, just the internal definitions.	2020-09-10 09:35:54 +02:00
Willy Tarreau	d966f1497c	BUILD: intops: on x86_64, the bswap instruction is called bswapq Building with tcc fails on "bswap" which in fact ought to be called "bswapq". Let's rename it as gas doesn't care.	2020-09-10 09:31:50 +02:00
Willy Tarreau	f6afda6539	BUILD: compiler: workaround a glibc madness around __attribute__() For whatever reason, glibc decided that the __attribute__ keyword is the exclusive property of gcc, and redefines it to an empty macro on other compilers. Some non-gcc compilers also support it (possibly partially), tinycc is one of them. By doing this, glibc silently broke all constructors, resulting in code that arrives in main() with uninitialized variables. The solution we use here consists in undefining the macro on non-gcc compilers, and redefining it to itself in order to cause a conflict in the event the redefinition would happen afterwards. This visibly solved the problem.	2020-09-10 09:26:50 +02:00
Willy Tarreau	d9537f6082	BUILD: compiler: reserve the gcc version checks to the gcc compiler Some checks on __GNUC__ imply that if it's undefined it will match a low value but that's not always what we want, like for example in the VAR_ARRAY definition which is not needed on tcc. Let's always be explicit on these tests.	2020-09-10 08:35:28 +02:00
Christopher Faulet	5a89175ac8	BUG/MEDIUM: dns: Don't store additional records in a linked-list A SRV record keeps a reference on the corresponding additional record, if any. But this additional record is also inserted in a separate linked-list into the dns response. The problems arise when obsolete additional records are released. The additional records list is purged but the SRV records always reference these objects, leading to an undefined behavior. Worse, this happens very quickly because additional records are never renewed. Thus, once received, an additional record will always expire. Now, the additional records are only associated to a SRV record or simply ignored. And the last version is always used. This patch helps to fix the issue #841. It must be backported to 2.2.	2020-09-08 10:44:39 +02:00
Willy Tarreau	e91bff2134	MAJOR: init: start all listeners via protocols and not via proxies anymore Ever since the protocols were added in 1.3.13, listeners used to be started twice: - once by start_proxies(), which iterates over all proxies then all listeners ; - once by protocol_bind_all() which iterates over all protocols then all listeners ; It's a real mess because error reporting is not even consistent, and more importantly now that some protocols do not appear in regular proxies (peers, logs), there is no way to retry their binding should it fail on the last step. What this patch does is to make sure that listeners are exclusively started by protocols. The failure to start a listener now causes the emission of an error indicating the proxy's name (as it used to be the case per proxy), and retryable failures are silently ignored during all but last attempts. The start_proxies() function was kept solely for setting the proxy's state to READY and emitting the "Proxy started" message and log that some have likely got used to seeking in their logs.	2020-09-02 11:11:43 +02:00
Willy Tarreau	576a633868	CLEANUP: protocol: remove all ->bind_all() and ->unbind_all() functions These ones were not used anymore since the two previous patches, let's drop them.	2020-09-02 10:40:33 +02:00
Christopher Faulet	bde2c4c621	MINOR: http-htx: Handle an optional reason when replacing the response status When calling the http_replace_res_status() function, an optional reason may now be set. It is ignored if it points to NULL and the original reason is preserved. Only the response status is replaced. Otherwise both the status and the reason are replaced. It simplifies the API and most of time, avoids an extra call to http_replace_res_reason().	2020-09-01 10:55:36 +02:00
Christopher Faulet	b8ce505c6f	MINOR: http-htx: Add an option to eval query-string when the path is replaced The http_replace_req_path() function now takes a third argument to evaluate the query-string as part of the path or to preserve it. If <with_qs> is set, the query-string is replaced with the path. Otherwise, only the path is replaced. This patch is mandatory to fix issue #829. The next commit depends on it. So be careful during backports.	2020-09-01 10:55:14 +02:00
Willy Tarreau	9dbb6c43ce	MINOR: sock: distinguish dgram from stream types when retrieving old sockets For now we still don't retrieve dgram sockets, but the code must be able to distinguish them before we switch to receivers. This adds a new flag to the xfer_sock_list indicating that a socket is of type SOCK_DGRAM. The way to set the flag for now is by looking at the dummy address family which equals AF_CUST_UDP{4,6} in this case (given that other dgram sockets are not yet supported).	2020-08-28 19:26:39 +02:00
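For comparison, the kernel can also be asked directly for a socket's type. The commit above sets its flag from the dummy address family, so the following Python sketch shows a different technique (querying SO_TYPE), and the helper name is_dgram is purely illustrative.

```python
import socket


def is_dgram(sock):
    """Return True if the kernel reports this socket as SOCK_DGRAM."""
    sock_type = sock.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    return sock_type == socket.SOCK_DGRAM


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        print("tcp is dgram:", is_dgram(tcp))
        print("udp is dgram:", is_dgram(udp))
```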
Willy Tarreau	a2c17877b3	MINOR: sock: do not use LI_O_* in xfer_sock_list anymore We'll want to store more info there and some info that are not represented in listener options at the moment (such as dgram vs stream) so let's get rid of these and instead use a new set of options (SOCK_XFER_OPT_*).	2020-08-28 19:26:38 +02:00
Willy Tarreau	429617459d	REORG: sock: move get_old_sockets() from haproxy.c The new function was called sock_get_old_sockets() and was left as-is except a minimum amount of style lifting to make it more readable. It will never be awesome anyway since it's used very early in the boot sequence and needs to perform socket I/O without any external help.	2020-08-28 19:24:55 +02:00
Willy Tarreau	37bafdcbb1	MINOR: sock_inet: move the IPv4/v6 transparent mode code to sock_inet This code was highly redundant, existing for TCP clients, TCP servers and UDP servers. Let's move it to sock_inet where it belongs. The new functions are sock_inet4_make_foreign() and sock_inet6_make_foreign().	2020-08-28 18:51:36 +02:00
Willy Tarreau	2d34a710b1	MINOR: sock: implement sock_find_compatible_fd() This is essentially a merge from tcp_find_compatible_fd() and uxst_find_compatible_fd() that relies on a listener's address and compare function and still checks for other variations. For AF_INET6 it compares a few of the listener's bind options. A minor change for UNIX sockets is that transparent mode, interface and namespace used to be ignored when trying to pick a previous socket while now if they are changed, the socket will not be reused. This could be refined but it's still better this way as there is no more risk of using a differently bound socket by accident. Eventually we should not pass a listener there but a set of binding parameters (address, interface, namespace etc...) which ultimately will be grouped into a receiver. For now this still doesn't exist so let's stick to the listener to break dependencies in the rest of the code.	2020-08-28 18:51:36 +02:00
Willy Tarreau	a6473ede5c	MINOR: sock: add interface and namespace length to xfer_sock_list This will ease and speed up comparisons in FD lookups.	2020-08-28 18:51:36 +02:00
Willy Tarreau	063d47d136	REORG: listener: move xfer_sock_list to sock.{c,h}. This will be used for receivers as well thus it is not specific to listeners but to sockets.	2020-08-28 18:51:36 +02:00
Willy Tarreau	e5bdc51bb5	REORG: sock_inet: move default_tcp_maxseg from proto_tcp.c Let's determine it at boot time instead of doing it on first use. It also saves us from having to keep it thread local. It's been moved to the new sock_inet_prepare() function, and the variables were renamed to sock_inet_tcp_maxseg_default and sock_inet6_tcp_maxseg_default.	2020-08-28 18:51:36 +02:00
Willy Tarreau	d88e8c06ac	REORG: sock_inet: move v6only_default from proto_tcp.c to sock_inet.c The v6only_default variable is not specific to TCP but to AF_INET6, so let's move it to the right file. It's now immediately filled on startup during the PREPARE stage so that it doesn't have to be tested each time. The variable's name was changed to sock_inet6_v6only_default.	2020-08-28 18:51:36 +02:00
Willy Tarreau	25140cc573	REORG: inet: replace tcp_is_foreign() with sock_inet_is_foreign() The function now makes it clear that it's independent on the socket type and solely relies on the address family. Note that it supports both IPv4 and IPv6 as we don't seem to need it per-family.	2020-08-28 18:51:36 +02:00
Willy Tarreau	c5a94c936b	MINOR: sock_inet: implement sock_inet_get_dst() This one is common to the TCPv4 and UDPv4 code, it retrieves the destination address of a socket, taking care of the possibility that for an incoming connection the traffic was possibly redirected. The TCP and UDP definitions were updated to rely on it and remove duplicated code.	2020-08-28 18:51:36 +02:00
Willy Tarreau	f172558b27	MINOR: tcp/udp/unix: make use of proto->addrcmp() to compare addresses The new addrcmp() protocol member points to the function to be used to compare two addresses of the same family. When picking an FD from a previous process, we can now use the address specific address comparison functions instead of having to rely on a local implementation. This will help move that code to a more central place.	2020-08-28 18:51:36 +02:00
Willy Tarreau	0d06df6448	MINOR: sock: introduce sock_inet and sock_unix These files will regroup everything specific to AF_INET, AF_INET6 and AF_UNIX socket definitions and address management. Some code there might be agnostic to the socket type and could later move to af_xxxx.c but for now we only support regular sockets so no need to go too far. The files are quite poor at this step, they only contain the address comparison function for each address family.	2020-08-28 18:51:36 +02:00
Willy Tarreau	18b7df7a2b	REORG: sock: start to move some generic socket code to sock.c The new file sock.c will contain generic code for standard sockets relying on file descriptors. We currently have way too much duplication between proto_uxst, proto_tcp, proto_sockpair and proto_udp. For now only get_src, get_dst and sock_create_server_socket were moved, and are used where appropriate.	2020-08-28 18:51:36 +02:00
Willy Tarreau	478331dd93	CLEANUP: tcp: stop exporting smp_fetch_src() This is totally ugly, smp_fetch_src() is exported only so that stick_table.c can (ab)use it in the {sc,src}_* sample fetch functions. It could be argued that the sample could have been reconstructed there in place, but we don't even need to duplicate the code. We'd rather simply retrieve the "src" fetch's function from where it's used at init time and be done with it.	2020-08-28 18:51:36 +02:00
Willy Tarreau	bb1caff70f	MINOR: fd: add a new "exported" flag and use it for all regular listeners This new flag will be used to mark FDs that must be passed to any future process across the CLI's "_getsocks" command. The scheme here is quite complex and full of special cases: - FDs inherited from parent processes are not exported this way, as they are supposed to instead be passed by the master process itself across reloads. However such FDs ought never to be paused otherwise this would disrupt the socket in the parent process as well; - FDs resulting from a "bind" performed over a socket pair, which are in fact one side of a socket pair passed inside another control socket pair must not be passed either. Since all of them are used the same way, for now it's enough never to put this "exported" flag to FDs bound by the socketpair code. - FDs belonging to temporary listeners (e.g. a passive FTP data port) must not be passed either. Fortunately we don't have such FDs yet. - the rest of the listeners for now are made of TCP, UNIX stream, ABNS sockets and are exportable, so they get the flag. - UDP listeners were wrongly created as listeners and are not suitable here. Their FDs should be passed but for now they are not since the client doesn't even distinguish the SO_TYPE of the retrieved sockets. In addition, it's important to keep in mind that: - inherited FDs may never be closed in master process but may be closed in worker processes if the service is shut down (useless since still bound, but technically possible) ; - inherited FDs may not be disabled ; - exported FDs may be disabled because the caller will perform the subsequent listen() on them. However that might not work for all OSes - exported FDs may be closed, it just means the service was shut down from the worker, and will be rebound in the new process. This implies that we have to disable exported on close(). 
=> as such, contrary to an apparently obvious equivalence, the "exported" status doesn't imply anything regarding the ability to close a listener's FD or not.	2020-08-26 18:33:52 +02:00
Willy Tarreau	63d8b6009b	CLEANUP: fd: remove fd_remove() and rename fd_dodelete() to fd_delete() This essentially undoes what we did in fd.c in 1.8 to support seamless reload. Since we don't need to remove an fd anymore we can turn fd_delete() to the simple function it used to be.	2020-08-26 18:33:52 +02:00
Willy Tarreau	bf3b06b03d	MINOR: reload: determine the foreing binding status from the socket Let's not look at the listener options passed by the original process and determine from the socket itself whether it is configured for transparent mode or not. This is cleaner and safer, and doesn't rely on flag values that could possibly change between versions.	2020-08-26 10:33:02 +02:00
Shimi Gersner	5846c490ce	MEDIUM: ssl: Support certificate chaining for certificate generation haproxy supports generating SSL certificates based on SNI using a provided CA signing certificate. Because CA certificates may be signed by multiple CAs, in some scenarios, it is necessary for the server to attach the trust chain in addition to the generated certificate. The following patch adds the ability to serve the entire trust chain with the generated certificate. The chain is loaded from the provided `ca-sign-file` PEM file.	2020-08-25 16:36:06 +02:00
David Carlier	7adf8f35df	OPTIM: regex: PCRE2 use JIT match when JIT optimisation occured. When a regex has been successfully compiled by the JIT pass, it is better to use the related match function, which thankfully has the same signature, for better performance. Signed-off-by: David Carlier <devnexen@gmail.com>	2020-08-14 07:53:40 +02:00
Christopher Faulet	d25d926806	MINOR: lua: Add support for userlist as fetches and converters arguments It means now http_auth() and http_auth_group() sample fetches are now exported to the lua.	2020-08-07 14:27:54 +02:00
Christopher Faulet	e02fc4d0dd	MINOR: arg: Add an argument type to keep a reference on opaque data The ARGT_PTR argument type may now be used to keep a reference to opaque data in the argument array used by sample fetches and converters. It is a generic way to point on data. I guess it could be used for some other arguments, like proxy, server, map or stick-table.	2020-08-07 14:20:07 +02:00
Ilya Shipitsin	6b79f38a7a	CLEANUP: assorted typo fixes in the code and comments This is 12th iteration of typo fixes	2020-07-31 11:18:07 +02:00
Christopher Faulet	2747fbb7ac	MEDIUM: tcp-rules: Use a dedicated expiration date for tcp ruleset A dedicated expiration date is now used to apply the inspect-delay of the tcp-request or tcp-response rulesets. Before, the analyse expiration date was used but it may also be updated by the lua (at least). So a lua script may extend or reduce the inspect-delay by side effect. This is not expected. If it becomes necessary, a specific function will be added to do this. Because, for now, it is a bit confusing.	2020-07-30 09:31:09 +02:00
Christopher Faulet	810df06145	MEDIUM: htx: Add a flag on a HTX message when no more data are expected The HTX_FL_EOI flag must now be set on a HTX message when no more data are expected. Most of time, it must be set before adding the EOM block. Thus, if there is no space for the EOM, there is still an information to know all data were received and pushed in the HTX message. There is only an exception for the HTTP replies (deny, return...). For these messages, the flag is set after all blocks are pushed in the message, including the EOM block, because, on error, we remove all inserted data.	2020-07-22 16:43:32 +02:00
Willy Tarreau	f2452b3c70	MINOR: tasks/debug: add a BUG_ON() check to detect requeued task on free __task_free() cannot be called with a task still in the queue. This test adds a check which confirms there is no concurrency issue on such a case where a thread could requeue nor wakeup a task being freed.	2020-07-22 14:42:52 +02:00
Willy Tarreau	e5d79bccc0	MINOR: tasks/debug: add a few BUG_ON() to detect use of wrong timer queue This aims at catching calls to task_unlink_wq() performed by the wrong thread based on the shared status for the task, as well as calls to __task_queue() with the wrong timer queue being used based on the task's capabilities. This will at least help eliminate some hypothesis during debugging sessions when suspecting that a wrong thread has attempted to queue a task at the wrong place.	2020-07-22 14:42:52 +02:00
Willy Tarreau	2447bce554	MINOR: tasks/debug: make the thread affinity BUG_ON check a bit stricter The BUG_ON() test in task_queue() only tests for the case where we're queuing a task that doesn't run on the current thread. Let's refine it a bit further to catch all cases where the task does not run exactly on the current thread alone.	2020-07-22 14:22:38 +02:00
Emeric Brun	d3db3846c5	BUG/MEDIUM: resolve: fix init resolving for ring and peers section. Reported github issue #759 shows there is no name resolving on server lines for ring and peers sections. This patch introduces the resolving for those lines. This patch adds a boolean parameter to the parse_server function to specify if we want the function to perform an initial name resolving using libc. This boolean is forced to true in case of peers or ring section. The boolean is kept to false in case of classic servers (from backend/listen). This patch should be backported in branches where peers sections support 'server' lines.	2020-07-21 17:59:20 +02:00
Emeric Brun	45c457a629	MINOR: log: adds counters on received syslog messages. This patch adds a global counter of received syslog messages and this one is exported on CLI "show info" as "CumRecvLogs". This patch also updates internal conn counter and freq of the listener and the proxy for each received log message to prepare a further export on the "show stats".	2020-07-15 17:50:12 +02:00
Emeric Brun	12941c82d0	MEDIUM: log: adds log forwarding section. Log forwarding: It is possible to declare one or multiple log forwarding section, haproxy will forward all received log messages to a log servers list.

log-forward <name>
  Creates a new log forwarder proxy identified as <name>.

bind <addr> [param*]
  Used to configure a log udp listener to receive messages to forward. Only udp listeners are allowed, address must be prefixed using 'udp@', 'udp4@' or 'udp6@'. This supports for all "bind" parameters found in 5.1 paragraph but most of them are irrelevant for udp/syslog case.

log global
log <address> [len <length>] [format <format>] [sample <ranges>:<smp_size>] <facility> [<level> [<minlevel>]]
  Used to configure target log servers. See more details on proxies documentation. If no format specified, haproxy tries to keep the incoming log format. Configured facility is ignored, except if incoming message does not present a facility but one is mandatory on the outgoing format. If there is no timestamp available in the input format, but the field exists in output format, haproxy will use the local date.

Example:
    global
        log stderr format iso local7

    ring myring
        description "My local buffer"
        format rfc5424
        maxlen 1200
        size 32764
        timeout connect 5s
        timeout server 10s
        # syslog tcp server
        server mysyslogsrv 127.0.0.1:514 log-proto octet-count

    log-forward sylog-loadb
        bind udp4@127.0.0.1:1514
        # all messages on stderr
        log global
        # all messages on local tcp syslog server
        log ring@myring local0
        # load balance messages on 4 udp syslog servers
        log 127.0.0.1:10001 sample 1:4 local0
        log 127.0.0.1:10002 sample 2:4 local0
        log 127.0.0.1:10003 sample 3:4 local0
        log 127.0.0.1:10004 sample 4:4 local0	2020-07-15 17:50:12 +02:00
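The "sample <ranges>:<smp_size>" settings in the example above spread messages over four servers. The Python sketch below models one plausible reading of that option, namely that each message has a position from 1 to <smp_size> and a target receives it when the position falls inside one of its ranges. The function name, the data layout, and the use of a single shared message counter are assumptions made for illustration, not HAProxy code.

```python
from collections import Counter

# (address, ranges, smp_size) mirroring "sample 1:4" .. "sample 4:4" above.
TARGETS = [
    ("127.0.0.1:10001", [(1, 1)], 4),
    ("127.0.0.1:10002", [(2, 2)], 4),
    ("127.0.0.1:10003", [(3, 3)], 4),
    ("127.0.0.1:10004", [(4, 4)], 4),
]


def targets_for(msg_number, targets=TARGETS):
    """Return the targets that would receive message msg_number (1-based)."""
    chosen = []
    for address, ranges, smp_size in targets:
        # Position of this message inside the current sample window.
        pos = (msg_number - 1) % smp_size + 1
        if any(low <= pos <= high for low, high in ranges):
            chosen.append(address)
    return chosen


if __name__ == "__main__":
    counts = Counter()
    for n in range(1, 9):
        for address in targets_for(n):
            counts[address] += 1
    for address, total in sorted(counts.items()):
        print(address, total)
```

Under this reading, every message goes to exactly one of the four UDP servers, in round-robin order.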
Emeric Brun	54932b4408	MINOR: log: adds syslog udp message handler and parsing. This patch introduces a new fd handler used to parse syslog message on udp. The parsing function returns level, facility and metadata that can be immediately reused to forward message to a log server. This handler is enabled on udp listeners if proxy is internally set to mode PR_MODE_SYSLOG	2020-07-15 17:50:12 +02:00
Emeric Brun	546488559a	MEDIUM: log/sink: re-work and merge of build message API. This patch merges build message code between sink and log and introduces a new API based on struct ist array to prepare message header with zero copy, targeting the log forwarding feature. Log format 'iso' and 'timed' are now available on log lines. A new log format 'priority' is also added.	2020-07-15 17:50:12 +02:00
Emeric Brun	3835c0dcb5	MEDIUM: udp: adds minimal proto udp support for message listeners. This patch introduces proto_udp.c targeting a further support of log forwarding feature. This code was originally produced by Frederic Lecaille working on QUIC support and only minimal requirements for syslog support have been merged.	2020-07-15 17:50:12 +02:00
Christopher Faulet	aaa70852d9	MINOR: raw_sock: Report the number of bytes emitted using the splicing In the continuity of the commit `7cf0e4517` ("MINOR: raw_sock: report global traffic statistics"), we are now able to report the global number of bytes emitted using the splicing. It can be retrieved in "show info" output on the CLI. Note this counter is always declared, regardless the splicing support. This eases the integration with monitoring tools plugged on the CLI.	2020-07-15 14:08:14 +02:00
Christopher Faulet	0f9ff14b17	CLEANUP: connection: remove unused field idle_time from the connection struct Thanks to previous changes, this field is now unused.	2020-07-15 14:08:14 +02:00
Christopher Faulet	c6e7563b1a	MINOR: server: Factorize code to deal with connections removed from an idle list The srv_del_conn_from_list() function is now responsible to update the server counters and the connection flags when a connection is removed from an idle list (safe, idle or available). It is called when a connection is released or when a connection is set as private. This function also removes the connection from the idle list if necessary.	2020-07-15 14:08:14 +02:00
Christopher Faulet	3d52f0f1f8	MINOR: server: Factorize code to deal with reuse of server idle connections The srv_use_idle_conn() function is now responsible to update the server counters and the connection flags when an idle connection is reused. The same function is called when a new connection is created. This simplifies a bit the connect_server() function.	2020-07-15 14:08:14 +02:00
Christopher Faulet	15979619c4	MINOR: session: Take care to decrement idle_conns counter in session_unown_conn So conn_free() only calls session_unown_conn() if necessary. The details are now fully handled by session_unown_conn().	2020-07-15 14:08:14 +02:00
Christopher Faulet	236c93b108	MINOR: connection: Set the conncetion target during its initialisation When a new connection is created, its target is always set just after. So the connection target may be set when it is created instead, during its initialisation to be precise. It is the purpose of this patch. Now, conn_new() function is called with the connection target as parameter. The target is then passed to conn_init(). It means the target must be passed when cs_new() is called. In this case, the target is only used when the conn-stream is created with no connection. This only happens for tcpchecks for now.	2020-07-15 14:08:14 +02:00
Christopher Faulet	fcc3d8a1c0	MINOR: connection: Use a dedicated function to look for a session's connection The session_get_conn() must now be used to look for an available connection matching a specific target for a given session. This simplifies a bit the connect_server() function.	2020-07-15 14:08:14 +02:00
Christopher Faulet	08016ab82d	MEDIUM: connection: Add private connections synchronously in session server list When a connection is marked as private, it is now added in the session server list. We don't wait for a stream to be detached from the mux to do so. When the connection is created, this happens after the mux creation. Otherwise, it is performed when the connection is marked as private. To allow that, when a connection is created, the session is systematically set as the connection owner. Thus, a backend connection always has an owner during its creation. And a private connection always has an owner until its death. Note that outside the detach() callback, if the call to session_add_conn() failed, the error is ignored. In this situation, we retry to add the connection into the session server list in the detach() callback. If this fails at this step, the multiplexer is destroyed and the connection is closed.	2020-07-15 14:08:14 +02:00
Christopher Faulet	21ddc74e8a	MINOR: connection: Add a wrapper to mark a connection as private To set a connection as private, the conn_set_private() function must now be called. It sets the CO_FL_PRIVATE flags, but it also remove the connection from the available connection list, if necessary. For now, it never happens because only HTTP/1 connections may be set as private after their creation. And these connections are never inserted in the available connection list.	2020-07-15 14:08:14 +02:00
Willy Tarreau	a9d7b76f6a	MINOR: connection: use MT_LIST_ADDQ() to add connections to idle lists When a connection is added to an idle list, it's already detached and cannot be seen by two threads at once, so there's no point using TRY_ADDQ, there will never be any conflict. Let's just use the cheaper ADDQ.	2020-07-10 08:52:13 +02:00
Willy Tarreau	8689127816	MINOR: buffer: use MT_LIST_ADDQ() for buffer_wait lists additions The TRY_ADDQ there was not needed since the wait list is exclusively owned by the caller. There's a preliminary test on MT_LIST_ADDED() that might have been eliminated by keeping MT_LIST_TRY_ADDQ() but it would have required two more expensive writes before testing so better keep the test the way it is.	2020-07-10 08:52:13 +02:00
Willy Tarreau	de4db17dee	MINOR: lists: rename some MT_LIST operations to clarify them Initially when mt_lists were added, their purpose was to be used with the scheduler, where anyone may concurrently add the same tasklet, so it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their usage was extended and MT_LIST_ADD{,Q} started to be used on situations where the element to be added was exclusively owned by the one performing the operation so a conflict was impossible. This became more obvious with the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK. But this remains confusing and at many places it's not expected that an MT_LIST_ADD could possibly fail, and worse, at some places we start by initializing it before adding (and the test is superfluous) so let's rename them to something more conventional to denote the presence of the check or not: MT_LIST_ADD{,Q} : unconditional operation, the caller owns the element, and doesn't care about the element's current state (exactly like LIST_ADD) MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not already added or in the process of being added. This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe" anymore. This also means that in case of backport mistakes in the future causing this to be overlooked, the slower and safer functions will still be used by default. Note that the missing unchecked MT_LIST_ADD macro was added. The rest of the code will have to be reviewed so that a number of callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove the unneeded test.	2020-07-10 08:50:41 +02:00
MIZUTA Takeshi	b24bc0dfb6	MINOR: tcp: Support TCP keepalive parameters customization It is now possible to customize TCP keepalive parameters. These correspond to the socket options TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL and are valid for the defaults, listen, frontend and backend sections. This patch fixes GitHub issue #670.	2020-07-09 05:22:16 +02:00
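The three socket options named above can be exercised from user space. The following Python sketch shows the same kernel knobs on a plain TCP socket; the helper name apply_keepalive and its parameters are illustrative assumptions and do not correspond to HAProxy configuration keywords. Options that the platform's socket module does not expose are skipped.

```python
import socket


def apply_keepalive(sock, idle=None, interval=None, count=None):
    """Enable SO_KEEPALIVE and set the TCP keepalive parameters when available.

    Returns a dict mapping each option name that was applied to the value
    the kernel reports back for it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    wanted = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between two probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection drops
    )
    applied = {}
    for name, value in wanted:
        option = getattr(socket, name, None)
        if value is None or option is None:
            continue  # not requested, or not supported on this platform
        sock.setsockopt(socket.IPPROTO_TCP, option, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, option)
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(apply_keepalive(s, idle=30, interval=5, count=3))
```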
Willy Tarreau	3b8f9b7b88	BUG/MEDIUM: lists: add missing store barrier in MT_LIST_ADD/MT_LIST_ADDQ The torture test run for previous commit `787dc20` ("BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD()") finally broke again after 34M connections. It appeared that MT_LIST_ADD and MT_LIST_ADDQ were suffering from the same missing barrier when restoring the original pointers before giving up, when checking if the element was already added. This is indeed something which seldom happens with the shared scheduler, in case two threads simultaneously try to wake up the same tasklet. With a store barrier there after reverting the pointers, the torture test survived 750M connections on the NanoPI-Fire3, so it looks good this time. Probably that MT_LIST_BEHEAD should be added to test-list.c since it seems to be more sensitive to concurrent accesses with MT_LIST_ADDQ. It's worth noting that there is no barrier between the last two pointers update, while there is one in MT_LIST_POP and MT_LIST_BEHEAD, the latter having shown to be needed, but I cannot demonstrate why we would need one here. Given that the code seems solid here, let's stick to what is shown to work. This fix should be backported to 2.1, just for the sake of safety since the issue couldn't be triggered there, but it could change with the compiler or when backporting a fix for example.	2020-07-09 05:01:27 +02:00
Willy Tarreau	787dc20952	BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD() When running multi-threaded tests on my NanoPI-Fire3 (8 A53 cores), I managed to occasionally get either a bus error or a segfault in the scheduler, but only when running at a high connection rate (injecting on a tcp-request connection reject rule). The bug is rare and happens around once per million connections. I could never reproduce it with less than 4 threads nor on A72 cores. Haproxy 2.1.0 would also fail there but not 2.1.7. Every time the crash happened with the TL_URGENT task list corrupted, though it was not immediately after the LIST_SPLICE() call, indicating background activity survived the MT_LIST_BEHEAD() operation. This queue is where the shared runqueue is transferred, and the shared runqueue gets fast inter-thread tasklet wakeups from idle conn takeover and new connections. Comparing the MT_LIST_BEHEAD() and MT_LIST_DEL() implementations, it's quite obvious that a few barriers are missing from the former, and these will simply fail on weakly ordered caches. Two store barriers were added before the break() on failure, to match what is done on the normal path. Missing them almost always results in a segfault which is quite rare but consistent (after ~3M connections). The 3rd one before updating n->prev seems intuitively needed though I couldn't make the code fail without it. It's present in MT_LIST_DEL so better not be needlessly creative. The last one is the most important one, and seems to be the one that helps a concurrent MT_LIST_ADDQ() detect a late failure and try again. With this, the code survives at least 30M connections. Interestingly the exact same issue was addressed in 2.0-dev2 for MT_LIST_DEL with commit `690d2ad4d` ("BUG/MEDIUM: list: add missing store barriers when updating elements and head"). This fix must be backported to 2.1 as MT_LIST_BEHEAD() is also used there. 
It's only tagged as medium because it will only affect entry-level CPUs like Cortex A53 (x86 are not affected), and requires load levels that are very hard to achieve on such machines to trigger it. In practice it's unlikely anyone will ever hit it.	2020-07-08 19:45:50 +02:00
Willy Tarreau	e3cb9978c2	MINOR: version: back to development, update status message Update the status message and update INSTALL again.	2020-07-07 16:38:51 +02:00
Willy Tarreau	33205c23a7	[RELEASE] Released version 2.3-dev0 Released version 2.3-dev0 with the following main changes : - exact copy of 2.2.0	2020-07-07 16:35:28 +02:00
Willy Tarreau	44c47de81a	MINOR: version: mention that it's an LTS release now The new version is going to be LTS up to around Q2 2025.	2020-07-07 16:31:52 +02:00
William Lallemand	7d42ef5b22	WIP/MINOR: ssl: add sample fetches for keylog in frontend OpenSSL 1.1.1 provides a callback registering function SSL_CTX_set_keylog_callback, which allows one to receive a string containing the keys to deciphers TLSv1.3. Unfortunately it is not possible to store this data in binary form and we can only get this information using the callback. Which means that we need to store it until the connection is closed. This patches add 2 pools, the first one, pool_head_ssl_keylog is used to store a struct ssl_keylog which will be inserted as a ex_data in a SSL *. The second one is pool_head_ssl_keylog_str which will be used to store the hexadecimal strings. To enable the capture of the keys, you need to set "tune.ssl.keylog on" in your configuration. The following fetches were implemented: ssl_fc_client_early_traffic_secret, ssl_fc_client_handshake_traffic_secret, ssl_fc_server_handshake_traffic_secret, ssl_fc_client_traffic_secret_0, ssl_fc_server_traffic_secret_0, ssl_fc_exporter_secret, ssl_fc_early_exporter_secret	2020-07-06 19:08:03 +02:00
Ilya Shipitsin	46a030cdda	CLEANUP: assorted typo fixes in the code and comments This is 11th iteration of typo fixes	2020-07-06 14:34:32 +02:00
Willy Tarreau	b0be8ae2a8	CLEANUP: auth: fix useless self-include of auth-t.h Since recent include cleanups auth-t.h ended up including itself.	2020-07-05 21:32:47 +02:00
Willy Tarreau	0c439d8956	BUILD: tools: make resolve_sym_name() return a const Originally it was made to return a void* because some comparisons in the code where it was used required a lot of casts. But now we don't need that anymore. And having it non-const breaks the build on NetBSD 9 as reported in issue #728. So let's switch to const and adjust debug.c to accommodate this.	2020-07-05 20:26:04 +02:00
Olivier Houchard	a74bb7e26e	BUG/MEDIUM: connections: Let the xprt layer know a takeover happened. When we takeover a connection, let the xprt layer know. If it has its own tasklet, and it is already scheduled, then it has to be destroyed, otherwise it may run the new mux tasklet on the old thread. Note that we only do this for the ssl xprt for now, because the only other one that might wake the mux up is the handshake one, which is supposed to disappear before idle connections exist. No backport is needed, this is for 2.2.	2020-07-03 17:49:33 +02:00
Olivier Houchard	1662cdb0c6	BUG/MEDIUM: connections: Set the tid for the old tasklet on takeover. In the various takeover() methods, make sure we schedule the old tasklet on the old thread, as we don't want it to run on our own thread! This was causing a very rare crash when building with DEBUG_STRICT, seeing that either an FD's thread mask didn't match the thread ID in h1_io_cb(), or that stream_int_notify() would try to queue a task with the wrong tid_bit. In order to reproduce this, it is necessary to maintain many connections (typically 30k) at a high request rate flowing over H1+SSL between two proxies, the second of which would randomly reject ~1% of the incoming connection and randomly killing some idle ones using a very short client timeout. The request rate must be adjusted so that the CPUs are nearly saturated, but never reach 100%. It's easier to reproduce this by skipping local connections and always picking from other threads. The issue should happen in less than 20s otherwise it's necessary to restart to reset the idle connections lists. No backport is needed, takeover() is 2.2 only.	2020-07-03 17:49:23 +02:00
Willy Tarreau	43079e0731	MINOR: sched: split tasklet_wakeup() into tasklet_wakeup_on() tasklet_wakeup() only checks tl->tid to know whether the task is programmed to run on the current thread or on a specific thread. We'll have to ease this selection in a subsequent patch, preferably without modifying tl->tid, so let's have a new tasklet_wakeup_on() function to specify the thread number to run on. Note that the logic has not changed at all.	2020-07-03 17:19:47 +02:00
Emeric Brun	9f9b22c4f1	MINOR: log: add time second fraction field to rfc5424 log timestamp. This patch adds the time second fraction in microseconds as supported by the rfc.	2020-07-02 17:56:06 +02:00
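An RFC 5424 timestamp carrying a microsecond fraction has the shape shown by the Python sketch below. The helper name rfc5424_timestamp is hypothetical, and the sketch only illustrates the resulting format, not the C code added by the commit.

```python
from datetime import datetime, timedelta, timezone


def rfc5424_timestamp(moment):
    """Format a timezone-aware datetime with a six-digit second fraction."""
    if moment.tzinfo is None:
        raise ValueError("an explicit UTC offset is required")
    # isoformat() yields YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM for aware datetimes.
    return moment.isoformat(timespec="microseconds")


if __name__ == "__main__":
    cest = timezone(timedelta(hours=2))
    print(rfc5424_timestamp(datetime(2020, 7, 2, 17, 56, 6, 123456, tzinfo=cest)))
```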
Willy Tarreau	dab586c3a8	BUILD: debug: avoid build warnings with DEBUG_MEM_STATS Some libcs define strdup() as a macro and cause redefine warnings to be emitted, so let's first undefine all functions we redefine.	2020-07-02 10:25:01 +02:00
Dragan Dosen	1e3b16f74f	MINOR: log-format: allow to preserve spacing in log format strings Now it's possible to preserve spacing everywhere except in "log-format", "log-format-sd" and "unique-id-format" directives, where spaces are delimiters and are merged. That may be useful when the response payload is specified as a log format string by "lf-file" or "lf-string", or even for headers or anything else. In order to merge spaces, a new option LOG_OPT_MERGE_SPACES is applied exclusively on options passed to function parse_logformat_string(). This patch fixes an issue #701 ("http-request return log-format file evaluation altering spacing of ASCII output/art").	2020-07-02 10:11:44 +02:00
Willy Tarreau	a6026a0c92	MINOR: debug: add a new "debug dev memstats" command Now when building with -DDEBUG_MEM_STATS, some malloc/calloc/strdup/realloc stats are kept per file+line number and may be displayed and even reset on the CLI using "debug dev memstats". This allows to easily track potential leakers or abnormal usages.	2020-07-02 09:14:48 +02:00
Willy Tarreau	76cc699017	MINOR: config: add a new tune.idle-pool.shared global setting. Enables ('on') or disables ('off') sharing of idle connection pools between threads for a same server. The default is to share them between threads in order to minimize the number of persistent connections to a server, and to optimize the connection reuse rate. But to help with debugging or when suspecting a bug in HAProxy around connection reuse, it can be convenient to forcefully disable this idle pool sharing between multiple threads, and force this option to "off". The default is on. This could have been nice to have during the idle connections debugging, but it's not too late to add it!	2020-07-01 19:07:37 +02:00
Olivier Houchard	ff1d0929b8	MEDIUM: connections: Don't use a lock when moving connections to remove. Make it so we don't have to take a lock while moving a connection from the idle list to the toremove_list by taking advantage of the MT_LIST.	2020-07-01 17:09:19 +02:00
Olivier Houchard	f8f4c2ef60	CLEANUP: connections: rename the toremove_lock to takeover_lock This lock was misnamed and a bit confusing. It's only used for takeover so let's call it takeover_lock.	2020-07-01 17:09:10 +02:00
Olivier Houchard	bbee1f7e78	MINOR: list: Add MT_LIST_DEL_SAFE_NOINIT() and MT_LIST_ADDQ_NOCHECK() Add two new macros, MT_LIST_DEL_SAFE_NOINIT makes sure we remove the element from the list, without reinitializing its next and prev, and MT_LIST_ADDQ_NOCHECK is similar to MT_LIST_ADDQ(), except it doesn't check if the element is already in a list. The goal is to be able to move an element from a list we're currently parsing to another, keeping it locked in the meanwhile.	2020-07-01 17:04:00 +02:00
Willy Tarreau	eb8c2c69fa	MEDIUM: sched: implement task_kill() to kill a task task_kill() may be used by any thread to kill any task with less overhead than a regular wakeup. In order to achieve this, it bypasses the priority tree and inserts the task directly into the shared tasklets list, cast as a tasklet. The task_list_size is updated to make sure it is properly decremented after execution of this task. The task will thus be picked by process_runnable_tasks() after checking the tree and sent to the TL_URGENT list, where it will be processed and killed. If the task is bound to more than one thread, its first thread will be the one notified. If the task was already queued or running, nothing is done, only the flag is added so that it gets killed before or after execution. Of course it's the caller's responsibility to make sure any resources allocated by this task were already cleaned up or taken over.	2020-07-01 16:35:53 +02:00
Willy Tarreau	8a6049c268	MEDIUM: sched: create a new TASK_KILLED task flag This flag, when set, will be used to indicate that the task must die. At the moment this may only be placed by the task itself or by the scheduler when placing it into the TL_NORMAL queue.	2020-07-01 16:35:49 +02:00
Willy Tarreau	364f25a688	MINOR: backend: don't always takeover from the same threads The next thread walking algorithm in commit `566df309c` ("MEDIUM: connections: Attempt to get idle connections from other threads.") proved to be sufficient for most cases, but it still has some rough edges when threads are unevenly loaded. If one thread wakes up with 10 streams to process in a burst, it will mainly take over connections from the next one until it doesn't have anymore. This patch implements a rotating index that is stored into the server list and that any thread taking over a connection is responsible for updating. This way it starts mostly random and avoids always picking from the same place. This results in a smoother distribution overall and a slightly lower takeover rate.	2020-07-01 16:07:43 +02:00
Willy Tarreau	2f3f4d3441	MEDIUM: server: add a new pool-low-conn server setting The problem with the way idle connections currently work is that it's easy for a thread to steal all of its siblings' connections, then release them, then it's done by another one, etc. This happens even more easily due to scheduling latencies, or merged events inside the same pool loop, which, when dealing with a fast server responding in sub-millisecond delays, can really result in one thread being fully at work at a time. In such a case, we perform a huge amount of takeover() which consumes CPU and requires quite some locking, sometimes resulting in lower performance than expected. In order to fight against this problem, this patch introduces a new server setting "pool-low-conn", whose purpose is to dictate when it is allowed to steal connections from a sibling. As long as the number of idle connections remains at least as high as this value, it is permitted to take over another connection. When the idle connection count becomes lower, a thread may only use its own connections or create a new one. By proceeding like this even with a low number (typically 2*nbthreads), we quickly end up in a situation where all active threads have a few connections. It then becomes possible to connect to a server without bothering other threads the vast majority of the time, while still being able to use these connections when the number of available FDs becomes low. We also use this threshold instead of global.nbthread in the connection release logic, allowing to keep more extra connections if needed. A test performed with 10000 concurrent HTTP/1 connections, 16 threads and 210 servers with 1 millisecond of server response time showed the following numbers: haproxy 2.1.7: 185000 requests per second haproxy 2.2: 314000 requests per second haproxy 2.2 lowconn 32: 352000 requests per second The takeover rate goes down from 300k/s to 13k/s. 
The difference is further amplified as the response time shrinks.	2020-07-01 15:23:15 +02:00
Willy Tarreau	35e30c9670	BUG/MINOR: server: fix the connection release logic regarding nearly full conditions There was a logic bug in commit `ddfe0743d` ("MEDIUM: server: use the two thresholds for the connection release algorithm"): instead of keeping only our first idle connection when FDs become scarce, the condition was inverted resulting in enforcing this constraint unless FDs are scarce. This results in fewer idle connections than permitted being kept under normal conditions. No backport needed.	2020-07-01 14:14:29 +02:00
Willy Tarreau	daf8aa62a8	MINOR: pools: increase MAX_BASE_POOLS to 64 When not sharing pools (i.e. when building with -DDEBUG_DONT_SHARE_POOLS) we have about 47 pools right now, while MAX_BASE_POOLS is only 32, meaning that only the first 32 ones will benefit from a per-thread cache entry. This totally kills performance when pools are not shared (roughly -20%). Let's double the limit to gain some margin, and make it possible to set it as a build option. It might be useful to backport this to stable versions as they're likely to be affected as well.	2020-06-30 14:29:02 +02:00
Willy Tarreau	ddfe0743d8	MEDIUM: server: use the two thresholds for the connection release algorithm The algorithm improvement in `bdb86bd` ("MEDIUM: server: improve estimate of the need for idle connections") is still not enough because there's a hard limit between below and above the FD count, so it continues to end up with many killed connections. Here we're proceeding differently. Given that there are two configured limits, a low and a high one, what we do is that we drop connections when the high limit is reached (what's already done by the killing task anyway), when we're between the low and the high threshold, we only keep the connection if our idle entries are empty (with a preference for safe ones), and below the low threshold, we keep any connection so as to give them a chance of being reused or taken over by another thread. Proceeding like this results in much less dropped connections, we typically see a 99.3% reuse rate (76k conns for 10M requests over 200 servers and 4 threads, with 335k takeovers or 3%), and much less CPU usage variations because there are no more bursts to try to kill extra connections. It should be possible to further improve this by counting the number of threads exploiting a server and trying to optimize the amount of per-thread idle connections so that it is approximately balanced among the threads.	2020-06-29 21:54:38 +02:00
Willy Tarreau	e69282a03f	BUG/MINOR: server: always count one idle slot for current thread The idle server connection estimates brought in commit `bdb86bd` ("MEDIUM: server: improve estimate of the need for idle connections") were committed without the minimum of 1 idle conn needed for the current thread. The net effect is that there are bursts of dropped connections when the load varies because there's no provision for the last connection. No backport needed, this is 2.2-dev.	2020-06-29 21:54:38 +02:00
Willy Tarreau	d59946e673	Revert "BUG/MEDIUM: lists: Lock the element while we check if it is in a list." This reverts previous commit 347bbf79d20e1cff57075a8a378355dfac2475e2. The original code was correct. This patch resulted from a mistaken analysis and breaks the scheduler: ########################## Starting vtest ########################## Testing with haproxy version: 2.2-dev11-90b7d9-23 # top TEST reg-tests/lua/close_wait_lf.vtc TIMED OUT (kill -9) # top TEST reg-tests/lua/close_wait_lf.vtc FAILED (10.008) signal=9 1 tests failed, 0 tests skipped, 88 tests passed Program terminated with signal SIGABRT, Aborted. [Current thread is 1 (Thread 0x7fb0dac2c700 (LWP 11292))] (gdb) bt #0 0x00007fb0e7c143f8 in raise () from /lib64/libc.so.6 #1 0x00007fb0e7c15ffa in abort () from /lib64/libc.so.6 #2 0x000000000053f5d6 in ha_panic () at src/debug.c:269 #3 0x00000000005a6248 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:119 #4 <signal handler called> #5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351 #6 listener_accept (fd=<optimized out>) at src/listener.c:999 #7 0x00000000004262df in fd_update_events (evts=<optimized out>, fd=6) at include/haproxy/fd.h:418 #8 _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:251 #9 0x0000000000548d0f in run_poll_loop () at src/haproxy.c:2949 #10 0x000000000054908b in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3067 #11 0x00007fb0e902b684 in start_thread () from /lib64/libpthread.so.0 #12 0x00007fb0e7ce5eed in clone () from /lib64/libc.so.6 (gdb) up #5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351 351 if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) { If the commit above is ever backported, this one must be as well!	2020-06-29 21:54:37 +02:00
Olivier Houchard	347bbf79d2	BUG/MEDIUM: lists: Lock the element while we check if it is in a list. In MT_LIST_ADDQ() and MT_LIST_ADD() we can't just check if the element is already in a list, because there's a small race condition, it could be added between the time we checked, and the time we actually set its next and prev. So we have to lock it first. This should be backported to 2.1.	2020-06-29 19:59:06 +02:00
Willy Tarreau	a9fcecbdf3	MINOR: stats: add the estimated need of concurrent connections per server The max_used_conns value is used as an estimate of the needed number of connections on a server to know how many to keep open. But this one is not reported, making it hard to troubleshoot reuse issues. Let's export it in the sessions/current column.	2020-06-29 16:29:11 +02:00
Willy Tarreau	bdb86bdaab	MEDIUM: server: improve estimate of the need for idle connections Starting with commit `079cb9a` ("MEDIUM: connections: Revamp the way idle connections are killed") we started to improve the way to compute the need for idle connections. But the condition to keep a connection idle or drop it when releasing it was not updated. This often results in storms of close when certain thresholds are met, and long series of takeover() when there aren't enough connections left for a thread on a server. This patch tries to improve the situation this way: - it keeps an estimate of the number of connections needed for a server. This estimate is a copy of the max over previous purge period, or is a max of what is seen over current period; it differs from max_used_conns in that this one is a counter that's reset on each purge period ; - when releasing, if the number of current idle+used connections is lower than this last estimate, then we'll keep the connection; - when releasing, if the current thread's idle conns head is empty, and we don't exceed the estimate by the number of threads, then we'll keep the connection. - when cleaning up connections, we consider the max of the last two periods to avoid killing too many idle conns when facing bursty traffic. Thanks to this we can better converge towards a situation where, provided there are enough FDs, each active server keeps at least one idle connection per thread all the time, with a total number close to what was needed over the previous measurement period (as defined by pool-purge-delay). On tests with large numbers of concurrent connections (30k) and many servers (200), this has quite smoothed the CPU usage pattern, increased the reuse rate and roughly halved the takeover rate.	2020-06-29 16:29:10 +02:00
Willy Tarreau	b159132ea3	MINOR: activity: add per-thread statistics on FD takeover The FD takeover operation might have certain impacts explaining unexpected activities, so it's important to report such a counter there. We thus count the number of times a thread has stolen an FD from another thread.	2020-06-29 14:26:05 +02:00
Willy Tarreau	3bb617cfe0	MINOR: stats: add 3 new output values for the per-server idle conn state The servers have internal states describing the status of idle connections, unfortunately these were not exported in the stats. This patch adds the 3 following gauges: - idle_conn_cur : Current number of unsafe idle connections - safe_conn_cur : Current number of safe idle connections - used_conn_cur : Current number of connections in use	2020-06-29 14:26:05 +02:00
Willy Tarreau	20dc3cd4a6	MINOR: pools: move the LRU cache heads to thread_info The LRU cache head was an array of list, which causes false sharing between 4 to 8 threads in the same cache line. Let's move it to the thread_info structure instead. There's no need to do the same for the pool_cache[] array since it's already quite large (32 pointers each). By doing this the request rate increased by 1% on a 16-thread machine.	2020-06-29 10:36:37 +02:00
Willy Tarreau	c03d7632a5	CLEANUP: pool: only include the type files from types pool-t.h was mistakenly including the full-blown includes for threads, lists and api instead of the types, and as such, CONFIG_HAP_LOCAL_POOLS and CONFIG_HAP_LOCKLESS_POOLS were not visible everywhere.	2020-06-29 10:11:24 +02:00
Willy Tarreau	e4d1505c83	REORG: includes: create tinfo.h for the thread_info struct The thread_info struct is convenient to store various per-thread info without having to resort to a painful thread_local storage which is slow and painful to initialize. The problem is, by having this one in thread.h it's very difficult to add more entries there because everyone already includes thread.h so conversely thread.h cannot reference certain types. There's no point in having this there, instead let's create a new pair of files, tinfo{,-t}.h, which declare the structure. This way it will become possible to extend them with other includes and have certain files store their own types there.	2020-06-29 09:57:23 +02:00
Willy Tarreau	4d82bf5c2e	MINOR: connection: align toremove_{lock,connections} and cleanup into idle_conns We used to have 3 thread-based arrays for toremove_lock, idle_cleanup, and toremove_connections. The problem is that these items are small, and that this creates false sharing between threads since it's possible to pack up to 8-16 of these values into a single cache line. This can cause real damage where there is contention on the lock. This patch creates a new array of struct "idle_conns" that is aligned on a cache line and which contains all three members above. This way each thread has access to its variables without hindering the other ones. Just doing this increased the HTTP/1 request rate by 5% on a 16-thread machine. The definition was moved to connection.{c,h} since it appeared a more natural evolution of the ongoing changes given that there was already one of them declared in connection.h previously.	2020-06-28 10:52:36 +02:00
Willy Tarreau	d79422a0ff	BUG/MEDIUM: buffers: always allocate from the local cache first It looked strange to see pool_evict_from_cache() always very present on "perf top", but there was actually a reason to this: while b_free() uses pool_free() which properly disposes the buffer into the local cache and b_alloc_fast() allocates using pool_get_first() which considers the local cache, b_alloc_margin() does not consider the local cache as it only uses __pool_get_first() which only allocates from the shared pools. The impact is that basically everywhere a buffer is allocated (muxes, streams, applets), it's always picked from the shared pool (hence involves locking) and is released to the local one and makes it grow until it's required to trigger a flush using pool_evict_from_cache(). Buffer usage is thus not thread-local at all, and causes eviction of a lot of possibly useful objects from the local caches. Just fixing this results in a 10% request rate increase in an HTTP/1 test on a 16-thread machine. This bug was caused by recent commit `ed891fd` ("MEDIUM: memory: make local pools independent on lockless pools") merged into 2.2-dev9, so no backport is needed.	2020-06-28 10:45:35 +02:00
Willy Tarreau	4dc6c860b4	CLEANUP: buffers: remove unused buffer_wq_lock lock Commit `2104659` ("MEDIUM: buffer: remove the buffer_wq lock") removed usage of the lock but not the lock itself. It's totally unused, let's remove it.	2020-06-28 10:45:34 +02:00
Anthonin Bonnefoy	85048f80c9	MINOR: http: Add support for http 413 status Add 413 http "payload too large" status code. This will allow 413 to be used in deny_status and errorfile.	2020-06-26 11:30:02 +02:00
Ilya Shipitsin	47d17182f4	CLEANUP: assorted typo fixes in the code and comments This is 10th iteration of typo fixes	2020-06-26 11:27:28 +02:00
Ilya Shipitsin	f44d155515	BUILD: fix ssl_sample.c when building against BoringSSL BoringSSL does not have X509_get_X509_PUBKEY let our emulation level define that for BoringSSL as well Build log: src/ssl_sample.o: In function `smp_fetch_ssl_x_key_alg': /home/travis/build/haproxy/haproxy/src/ssl_sample.c:592: undefined reference to `X509_get_X509_PUBKEY' clang-7: error: linker command failed with exit code 1 (use -v to see invocation) Makefile:860: recipe for target 'haproxy' failed make: *** [haproxy] Error 1 travis-ci: https://travis-ci.com/github/haproxy/haproxy/jobs/351670996	2020-06-26 10:33:38 +02:00
Willy Tarreau	c54e5ad9cc	MINOR: cfgparse: sanitize the output a little bit With the rework of the config line parser, we've started to emit a dump of the initial line underlined by a caret character indicating the error location. But with extremely large lines it starts to take time and can even cause trouble to slow terminals (e.g. over ssh), and this becomes useless. In addition, control characters could be dumped as-is which is bad, especially when the input file is accidentally wrong (an executable). This patch adds a string sanitization function which isolates an area around the error position in order to report only that area if the string is too large. The limit was set to 80 characters, which will result in roughly 40 chars around the error being reported only, prefixed and suffixed with "..." as needed. In addition, non-printable characters in the line are now replaced with '?' so as not to corrupt the terminal. This way invalid variable names, unmatched quotes etc will be easier to spot. A typical output is now: [ALERT] 176/092336 (23852) : parsing [bad.cfg:8]: forbidden first char in environment variable name at position 811957: ...c$PATH$PATH$d(xlc`%?$PATH$PATH$dgc?T$%$P?AH?$PATH$PATH$d(?$PATH$PATH$dgc?%... ^	2020-06-25 09:43:27 +02:00
Willy Tarreau	e7723bddd7	MEDIUM: tasks: add a tune.sched.low-latency option Now that all tasklet queues are scanned at once by run_tasks_from_lists(), it becomes possible to always check for lower priority classes and jump back to them when they exist. This patch adds tune.sched.low-latency global setting to enable this behavior. What it does is stick to the lowest ranked priority list in which tasks are still present with an available budget, and leave the loop to refill the tasklet lists if the trees got new tasks or if new work arrived into the shared urgent queue. Doing so allows to cut the latency in half when running with extremely deep run queues (10k-100k), thus allowing forwarding of small and large objects to coexist better. It remains off by default since it does have a small impact on large traffic by default (shorter batches).	2020-06-24 12:21:26 +02:00
Willy Tarreau	59153fef86	MINOR: tasks: make run_tasks_from_lists() scan the queues itself Now process_runnable_tasks is responsible for calculating the budgets for each queue, dequeuing from the tree, and calling run_tasks_from_lists(). This latter one scans the queues, picking tasks there and respecting budgets. Note that its name was updated with a plural "s" for this reason.	2020-06-24 12:21:26 +02:00
Willy Tarreau	ba48d5c8f9	MINOR: tasks: pass the queue index to run_task_from_list() Instead of passing it a pointer to the queue, pass it the queue's index so that it can perform all the work around current_queue and tl_class_mask.	2020-06-24 12:21:26 +02:00
Willy Tarreau	49f90bf148	MINOR: tasks: add a mask of the queues with active tasklets It is neither convenient nor scalable to check each and every tasklet queue to figure whether it's empty or not while we often need to check them all at once. This patch introduces a tasklet class mask which gets a bit 1 set for each queue representing one class of service. A single test on the mask allows to figure whether there's still some work to be done. It will later be usable to better factor the runqueue code. Bits are set when tasklets are queued. They're cleared when queues are emptied. It is possible that a queue is empty but has a bit if a tasklet was added then removed, but this is not a problem as this is properly checked for in run_tasks_from_list().	2020-06-24 12:21:26 +02:00
Willy Tarreau	c0a08ba2df	MINOR: tasks: make current_queue an index instead of a pointer It will be convenient to have the tasklet queue number soon, better make current_queue an index rather than a pointer to the queue. When not currently running (e.g. from I/O), the index is -1.	2020-06-24 12:21:26 +02:00
William Lallemand	ee8530c65e	MINOR: ssl: free the crtlist and the ckch during the deinit() Add some functions to deinit the whole crtlist and ckch architecture. It will free all crtlist, crtlist_entry, ckch_store, ckch_inst and their associated SNI, ssl_conf and SSL_CTX. The SSL_CTX in the default_ctx and initial_ctx still needs to be free'd separately.	2020-06-23 20:07:50 +02:00
William Lallemand	7df5c2dc3c	BUG/MEDIUM: ssl: fix ssl_bind_conf double free Since commit `2954c47` ("MEDIUM: ssl: allow crt-list caching"), the ssl_bind_conf is allocated directly in the crt-list, and the crt-list can be shared between several bind_conf. The deinit() code wasn't changed to handle that. This patch fixes the issue by removing the free of the ssl_conf in ssl_sock_free_all_ctx(). It should be completed with a patch that free the ssl_conf and the crt-list. Fix issue #700.	2020-06-23 20:06:55 +02:00
Willy Tarreau	5bd73063ab	BUG/MEDIUM: task: be careful not to run too many tasks at TL_URGENT A test on large objects revealed a big performance loss from 2.1. The cause was found to be related to cache locality between scheduled operations that are batched using tasklets. It happens that we now have several layers of tasklets and that queuing all these operations leaves time to let memory objects cool down in the CPU cache, effectively resulting in halving the performance. A quick test consisting in putting most unknown tasklets into the BULK queue almost fixed the performance regression, but this is a wrong approach as it can also slow down some low-latency transfers or access to applets like the CLI. What this patch does instead is to queue unknown tasklets into the same queue as the current one when tasklet_wakeup() is itself called from a task/tasklet, otherwise it uses urgent for real I/O (when sched->current is NULL). This results in the called tasklet being woken up much sooner, often at the end of the current batch of tasklets. By doing so, a test on 2 cores 4 threads with 256 concurrent H1 conns transferring 16m objects with 256kB buffers jumped from 55 to 88 Gbps. It's even possible to go as high as 101 Gbps by evaluating the URGENT queue after the BULK one, though this was not done as considered dangerous for latency sensitive operations. This reinforces the importance of getting back the CPU transfer mechanisms based on tasklet_wakeup_after() to work at the tasklet level by supporting an immediate wakeup in certain cases. No backport is needed, this is strictly 2.2.	2020-06-23 16:45:28 +02:00
Willy Tarreau	116ef223d2	MINOR: task: add a new pointer to current tasklet queue In task_per_thread[] we now have current_queue which is a pointer to the current tasklet_list entry being evaluated. This will be used to know the class under which the current task/tasklet is currently running.	2020-06-23 16:35:38 +02:00
Willy Tarreau	38e8a1c7b8	MINOR: debug: add a new DEBUG_FD build option When DEBUG_FD is set at build time, we'll keep a counter of per-FD events in the fdtab. This counter is reported in "show fd" even for closed FDs if not zero. The purpose is to help spot situations where an apparently closed FD continues to be reported in loops, or where some events are dismissed.	2020-06-23 10:04:54 +02:00
Willy Tarreau	d1d005d7f6	MEDIUM: map: make the "clear map" operation yield As reported in issue #419, a "clear map" operation on a very large map can take a lot of time and freeze the entire process for several seconds. This patch makes sure that pat_ref_prune() can regularly yield after clearing some entries so that the rest of the process continues to work. The first part, the removal of the patterns, can take quite some time by itself in one run but it's still relatively fast. It may block for up to 100ms for 16M IP addresses in a tree typically. This change needed to declare an I/O handler for the clear operation so that we can get back to it after yielding. The second part can be much slower because it deconstructs the elements and its users, but it iterates progressively so we can yield less often here. The patch was tested with traffic in parallel soliciting the map being released and showed no problem. Some traffic will definitely notice an incomplete map but the filling is already not atomic anyway thus this is not different. It may be backported to stable versions once sufficiently tested for side effects, at least as far as 2.0 in order to avoid the watchdog triggering when the process is frozen there. For a better behaviour, all these prune_* functions should support yielding so that the callers have a chance to also yield in turn.	2020-06-19 16:57:51 +02:00
Willy Tarreau	bc52bec163	MEDIUM: fd: add experimental support for edge-triggered polling Some of the recent optimizations around the polling to save a few epoll_ctl() calls have shown that they could also cause some trouble. However, over time our code base has become totally asynchronous with I/Os always attempted from the upper layers and only retried at the bottom, making it look like we're getting closer to EPOLLET support. There are showstoppers there such as the listeners which cannot support this. But given that most of the epoll_ctl() dance comes from the connections, we can try to enable edge-triggered polling on connections. What this patch does is to add a new global tunable "tune.fd.edge-triggered", that makes fd_insert() automatically set an et_possible bit on the fd if the I/O callback is conn_fd_handler. When the epoll code sees an update for such an FD, it immediately registers it in both directions the first time and doesn't update it anymore. On a few tests it proved quite useful with a 14% request rate increase in a H2->H1 scenario, reducing the epoll_ctl() calls from 2 per request to 2 per connection. The option is obviously disabled by default as bugs are still expected, particularly around the subscribe() code where it is possible that some layers do not always re-attempt reading data after being woken up.	2020-06-19 14:21:46 +02:00
Dragan Dosen	13cd54c08b	MEDIUM: peers: add the "localpeer" global option localpeer <name> Sets the local instance's peer name. It will be ignored if the "-L" command line argument is specified or if used after "peers" section definitions. In such cases, a warning message will be emitted during the configuration parsing. This option will also set the HAPROXY_LOCALPEER environment variable. See also "-L" in the management guide and "peers" section in the configuration manual.	2020-06-19 11:37:30 +02:00
Dragan Dosen	4f01415d3b	MINOR: peers: do not use localpeer as an array anymore It is now dynamically allocated by using strdup().	2020-06-19 11:37:11 +02:00
Willy Tarreau	7af4fa9a48	MINOR: activity: rename the "stream" field to "stream_calls" This one was confusingly called, I thought it was the cumulated number of streams but it's the number of calls to process_stream(). Let's make this clearer.	2020-06-17 20:52:29 +02:00
Willy Tarreau	e406386542	MINOR: activity: rename confusing poll_* fields in the output We have poll_drop, poll_dead and poll_skip which are confusingly named like their poll_io and poll_exp counterparts except that they are not per poll() call but per-fd. This patch renames them to poll_drop_fd(), poll_dead_fd() and poll_skip_fd() for this reason.	2020-06-17 20:35:33 +02:00
Willy Tarreau	e545153c50	MINOR: activity: report the number of times poll() reports I/O The "show activity" output mentions a number of indicators to explain wake up reasons but doesn't have the number of times poll() sees some I/O. And given that multiple events can happen simultaneously, it's not always possible to deduce this metric by subtracting. This patch adds a new "poll_io" counter that allows one to see how often poll() returns with at least one active FD. This should help detect stuck events and measure various ratios of poll sub-metrics.	2020-06-17 20:25:18 +02:00
Willy Tarreau	c208a54ab2	DOC: fd: make it clear that some fields ordering must absolutely be respected fd_set_running() and fd_takeover() may both use a double-word CAS on the (running_mask, thread_mask) couple and as such they expect the fields to be exactly arranged like this. It's critical not to reorder them, so add a comment to avoid such a potential mistake later.	2020-06-17 19:58:37 +02:00
Willy Tarreau	4f72ec851c	CLEANUP: activity: remove unused counter fd_lock Since 2.1-dev2, with commit `305d5ab46` ("MAJOR: fd: Get rid of the fd cache.") we don't have the fd_lock anymore and as such its activity counter is always zero. Let's remove it from the struct and from "show activity" output, as there are already plenty of indicators to look at. The cache line comment in the struct activity was updated to reflect reality as it looks like another one already got removed in the past.	2020-06-17 19:15:51 +02:00
Willy Tarreau	6d4c81db96	MINOR: compiler: always define __has_feature() This macro is provided by clang but gcc lacks it. Not having it makes it painful to test features on both compilers. Better define it to zero when not available so that __has_feature(foo) never errors.	2020-06-16 19:13:24 +02:00
Willy Tarreau	c8d167bcfb	MINOR: tools: add a new configurable line parser, parse_line() This function takes on input a string to tokenize, an output storage (which may be the same) and a number of options indicating how to handle certain characters (single & double quote support, backslash support, end of line on '#', environment variables etc). On output it will provide a list of pointers to individual words after having possibly unescaped some character sequences, handled quotes and resolved environment variables, and it will also indicate a status made of: - a list of failures (overlap between src/dst, wrong quote etc) - the pointer to the first sequence in error - the required output length (a-la snprintf()). This allows a caller to freely unescape/unquote a string by using a pre-allocated temporary buffer and expand it as necessary. It takes extreme care to avoid expensive operations and intentionally does not use memmove() when removing escapes, hence the reason for the different input and output buffers. The goal is to use it as the basis for the config parser.	2020-06-16 16:27:26 +02:00
Willy Tarreau	853926a9ac	BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks As reported in issue #689, there is a subtle bug in the ebtree code used to compare memory blocks. It stems from the platform-dependent memcmp() implementation. Original implementations used to perform a byte-per-byte comparison and to stop at the first non-matching byte, as in this old example: https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html The ebtree code has been relying on this to detect the first non-matching byte when comparing keys. This is made so that a zero-terminated string can fail to match against a longer string. Over time, especially with large buses and SIMD instruction sets, multi-byte comparisons have appeared, making the processor fetch bytes past the first different byte, which could possibly be a trailing zero. This means that it's possible to read past the allocated area for a string if it was allocated by strdup(). This is not correct and definitely confuses address sanitizers. In real life the problem doesn't have visible consequences. Indeed, multi-byte comparisons are implemented so that aligned words are loaded (e.g. 512 bits at once to process a cache line at a time). So there is no way such a multi-byte access will cross a page boundary and end up reading from an unallocated zone. This is why it was never noticed before. This patch addresses this by implementing a one-byte-at-a-time memcmp() variant for ebtree, called eb_memcmp(). It's optimized for both small and long strings and guarantees to stop after the first non-matching byte. It only needs 5 instructions in the loop and was measured to be 3.2 times faster than glibc's AVX2-optimized memcmp() on short strings (1 to 257 bytes), since the latter comes with a significant setup cost. The break-even seems to be at 512 bytes where both versions perform equally, which is way longer than what's used in general here. 
This fix should be backported to stable versions and reintegrated into the ebtree code.	2020-06-16 11:30:33 +02:00
Willy Tarreau	f3ca5a0273	BUILD: haproxy: mark deinit_and_exit() as noreturn Commit `0a3b43d9c` ("MINOR: haproxy: Make use of deinit_and_exit() for clean exits") introduced this build warning: src/haproxy.c: In function 'main': src/haproxy.c:3775:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ This is because the new deinit_and_exit() is not marked as "noreturn" so depending on the optimizations, the noreturn attribute of exit() will either leak through it and silence the warning or not and confuse the compiler. Let's just add the attribute to fix this. No backport is needed, this is purely 2.2.	2020-06-15 18:43:46 +02:00
Willy Tarreau	bcefb85009	BUILD: atomic: add string.h for memcpy() on ARM64 As reported in issue #686, ARM64 build fails since the include files reorganization. This is caused by the lack of string.h while a memcpy() is present in __ha_cas_dw().	2020-06-14 08:08:13 +02:00
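
Note on commit 853926a9ac: the message describes a memcmp() variant that compares one byte at a time and stops at the first non-matching byte. The following is only a minimal sketch of that idea, not the actual eb_memcmp() from the ebtree code; the name bytewise_memcmp() is hypothetical and none of the optimizations mentioned in the commit are reproduced.

```c
#include <stddef.h>

/* Hypothetical helper (NOT HAProxy's eb_memcmp()): compares at most <len>
 * bytes, one byte at a time, and returns as soon as two bytes differ, so
 * nothing located after the first differing byte is ever read.
 */
static int bytewise_memcmp(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a;
	const unsigned char *q = b;
	size_t i;

	for (i = 0; i < len; i++) {
		if (p[i] != q[i])
			return (int)p[i] - (int)q[i];
	}
	return 0;
}
```

With this shape, comparing the zero-terminated string "a" against "abcdef" over 6 bytes stops at the second byte (the trailing zero of "a" differs from 'b'), so the bytes beyond the shorter string's allocation are not touched.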

4854 Commits
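
Note on commit 6d4c81db96: the message says __has_feature() is defined to zero when the compiler does not provide it, so that __has_feature(foo) never errors. A minimal sketch of that pattern follows; the exact form used in HAProxy's headers is not shown in this log, and built_with_asan() is a hypothetical helper added only to show a use of the macro.

```c
/* Fallback for compilers that do not provide __has_feature(): every
 * feature test then simply evaluates to 0 instead of breaking the build.
 */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical helper: returns 1 when the compiler reports the
 * address_sanitizer feature through __has_feature(), 0 otherwise.
 */
static int built_with_asan(void)
{
#if __has_feature(address_sanitizer)
	return 1;
#else
	return 0;
#endif
}
```

The same preprocessor test can then be written once and compiled by both clang and gcc without extra compiler-specific guards.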