This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.
The problem was that elements deleted inside an iterator were no longer
reset, so if they were recycled in this form, they could appear as busy
to the next user. This was trivially reproduced
with this:
$ cat quic-repro.cfg
global
stats socket /tmp/sock1 level admin
stats timeout 1h
limited-quic
frontend stats
mode http
bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
timeout client 5s
stats uri /
$ ./haproxy -db -f quic-repro.cfg &
$ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
=> hang
This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.
All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control, allowing the caller to selectively lock/unlock an
element, a link, the rest of the list, etc.
It was also found that plenty of places just want to free the current
element, or delete it without doing anything else with it, hence don't
need to reset its pointers (e.g. event_hdl). Finally it appeared obvious
that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.
The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).
Interestingly, the tests performed last year on QUIC forwarding, which
showed limited traffic with the original version and a higher bit rate
with the new one, couldn't be reproduced because the QUIC stack has since
gained in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.
Overall, the following changes are visible from the application code
(a short usage sketch follows the list):
- mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
=> MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
+ 1 back elem
- MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
=> just manually set the iterator to NULL instead.
For MT_LIST_FOR_EACH_ENTRY_LOCKED()
=> mt_list_unlock_self() (if element going to be reused) + NULL
- MT_LIST_LOCK_ELT => mt_list_lock_full()
- MT_LIST_UNLOCK_ELT => mt_list_unlock_full()
- l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT();
=> l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
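As an illustration, here is a rough usage sketch of the new iterator on a
hypothetical structure chained through a struct mt_list member (type and
function names below are invented for the example, and the exact macro
argument order should be checked against mt_list.h):

#include <stdlib.h>
#include <mt_list.h>   /* adjust the include path to the project layout */

struct item {
        struct mt_list list;    /* chaining element */
        /* ... payload ... */
};

/* All elements are deleted and freed without any prior check, so the
 * _UNLOCKED variant is enough: each element is presented unlocked and
 * setting the iterator to NULL drops it from the list.
 */
static void purge_all(struct mt_list *head)
{
        struct item *it;
        struct mt_list back;

        MT_LIST_FOR_EACH_ENTRY_UNLOCKED(it, head, list, back) {
                free(it);
                it = NULL;      /* mark the element as deleted */
        }
}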
The following patch fixes a race condition during server addr/port
update :
cd994407a9
BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates
The new update mechanism is implemented via an event update. It uses
thread isolation to guarantee that no other thread is accessing server
addr/port. Furthermore, to ensure the server instance is not deleted
just before the event handler runs, the server instance is looked up via
its ID in the proxy tree.
However, thread isolation is only entered after server lookup. This
leaves a tiny race condition as the thread will be marked as harmless
and a concurrent thread can delete the server in the meantime. This
causes server_atomic_sync() to manipulate a deleted server instance when
reinserting it in the used_server_addr backend tree. This can cause a
segfault
during this operation or possibly on a future used_server_addr tree
access.
This issue was detected by Criteo. Several backtraces were retrieved,
each related to server addr_node insert or delete operation, either in
srv_set_addr_desc(), or add/delete dynamic server handlers.
To fix this, simply extend the thread isolation section so that it
starts before the server lookup. This ensures that, once retrieved, the
server cannot be
deleted until its addr/port are updated. To ensure this issue won't
happen anymore, a new BUG_ON() is added in srv_set_addr_desc().
Also note that ebpt_delete() is now called every time in the delete handler
as this is a safe idempotent operation.
To reproduce these crashes, a script was executed to add then remove
different servers every second. In parallel, the following CLI command
was issued repeatedly without any delay to force multiple updates on
the servers' port:
set server <srv> addr 0.0.0.0 port $((1024 + RANDOM % 1024))
This must be backported at least up to 3.0. If the above-mentioned patch
has been selected for older versions, this commit must also be
backported to them.
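A simplified sketch of the corrected ordering (not a verbatim extract of
the handler, helper usage is approximate):

#include <haproxy/server.h>
#include <haproxy/thread.h>

static void server_addr_update_sketch(struct proxy *px, int srv_id)
{
        struct server *srv;

        /* enter isolation first: no concurrent "del server" can run
         * between the lookup and the addr/port update anymore.
         */
        thread_isolate();

        srv = server_find_by_id(px, srv_id);
        if (srv) {
                /* safe to update srv->addr / srv->svc_port and to
                 * reinsert the node into the used_server_addr tree
                 */
        }

        thread_release();
}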
This is a complementary patch to c16eba818 ("BUG/MEDIUM: server/dns:
preserve server's port upon resolution timeout or error").
Indeed, since c16eba818, the port is properly preserved, but unsetting
the server's address this way results in the server_atomic_sync()
function thinking that we're actually setting a new address rather than
unsetting the previous one, because the addr family is != AF_UNSPEC.
Upon DNS timeout, this could be observed:
[WARNING] (2588257) : Server http/s1 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (2588257) : Server http/s1 ('test1.localhost') is UP/READY (resolves again).
Notice that the server times out and then immediately resolves again. Of
course in this case the server's address was properly set to 0, meaning
that the server will not receive any traffic, but it is confusing and
could result in haproxy temporarily thinking that the server is actually
available while it's not.
To properly fix the issue and restore historical behavior, let's
explicitly set inetaddr's family to AF_UNSPEC after fetching the
original server's address.
It should be backported in 3.0 with c16eba818.
This is a follow-up for 7223296 ("BUG/MINOR: server: fix first server
template not being indexed").
Indeed, in 7223296 we added a new call to _srv_parse_set_id_from_prefix()
for the first server before handling additional ones. But we actually
overlooked the fact that _srv_parse_set_id_from_prefix() was already
performed at the end of _srv_parse_tmpl_init() for the same server.
Since _srv_parse_set_id_from_prefix() frees srv->id, this results in a
UAF when performing name lookups on the first server, because the
used_server_name node key still points to the freed string.
The early _srv_parse_set_id_from_prefix() call (added in 7223296) and
the original one perform the same task, except that the new one is
followed by name node insertion logic required for name lookups to work
properly. So let's simply get rid of the old one at the end of the
function.
The _srv_parse_set_id_from_prefix() call in the 'err:' label was also
removed since it is now useless as well starting with 7223296 and would
trigger the same
bug on error paths. Thanks to Amaury for noticing it.
This bug was discovered while trying to address GH issue #2620.
Thanks to @x-yuri for his detailed report (with working repro).
It should be backported in 3.0 with 7223296.
When a new "default-server" line is parsed, some resolver options are reset.
Thus previously defined default options cannot be inherited. There is no
reason to do so: first because other server options are inherited, and
then because not all resolver options are reset, which is inconsistent.
This patch should fix issue #2559. It should be backported to all stable
versions.
@boi4 reported in GH #2578 that since 3.0-dev1, for servers with an
address learned from A/AAAA records, after a DNS flap the server would be
put out of maintenance with the proper address but an invalid port (== 0),
making it
unusable and causing tcp checks to fail:
[NOTICE] (1) : Loading success.
[WARNING] (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (8) : backend 'mybackend' has no server available!
[WARNING] (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING] (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING] (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING] (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
@boi4 also mentioned that this used to work fine before.
Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")
Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.
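A minimal illustration of the intended behavior, using a simplified
stand-in for the real structure (actual member names may differ):

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

struct inetaddr_sketch {
        int family;              /* AF_INET, AF_INET6 or AF_UNSPEC */
        struct in6_addr addr;    /* large enough for v4 or v6 */
        unsigned int svc_port;   /* port requested at config time */
};

/* on A/AAAA resolution errors, only the address part must be cleared:
 * the configured port is kept so the server remains usable once the
 * name resolves again.
 */
static void unset_addr_keep_port(struct inetaddr_sketch *ia)
{
        /* the buggy code did: memset(ia, 0, sizeof(*ia)); */
        memset(&ia->addr, 0, sizeof(ia->addr));
        ia->family = AF_UNSPEC;
        /* ia->svc_port deliberately left untouched */
}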
Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.
No backport needed (unless 64c9c8e gets backported).
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.
This keyword uses a sample expression argument. It can thus reuse the
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensibility.
This patch only implements parsing. The argument is stored inside the new
server field <pool_conn_name> and the expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.
If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.
Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use the "sni" keyword
even without SSL on a server line, which has a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configurations where pool-conn-name and
SNI are used in parallel with different values.
Two functions exist for server SNI sample expression parsing. This is
confusing, so this commit aims at clarifying it.
Functions are renamed with the following identifiers. The first function
is named parse_srv_expr() and can be used during parsing. Besides
expression parsing, it also ensures sample fetch validity in the context
of a server line.
The second function is renamed _parse_srv_expr() and is used internally
by parse_srv_expr(). It only implements sample parsing without extra
checks. It is already used for server instantiation derived from
server-template, as checks were already performed. Also, it is now used
in the http-client code as its SNI is a fixed string.
Finally, both functions are generalized to remove any reference to SNI.
This will allow reusing them to parse other server keywords which use an
expression. This will be the case for the future keyword pool-conn-name.
Dynamically allocated server PROXY TLVs were not freed on server
release. This patch fixes this leak by extending srv_free_params().
Every server line with set-proxy-v2-tlv-fmt keyword is impacted.
For static servers, the issue is minimal as it will only cause a leak on
deinit(). However, it can be aggravated when performing multiple
removals of dynamic servers.
This should be backported up to 2.9.
Since the following commit, idle connections are cleared before a server
is deleted. This is better than blocking server deletion due to inactive
connections :
6e0afb2e27
MEDIUM: server: close idle conn on server deletion
A BUG_ON() has been added to ensure that the server idle conn counter is
null after these connections are removed. However, Willy managed to
trigger it easily by repeatedly and randomly deleting servers across a
single-threaded haproxy using a server-template with 1000 instances. In
parallel, an h1load client is executed to generate traffic.
This BUG_ON() reflected that some connections referencing the server
targeted for deletion remained, even though the server idle list was
empty. In fact, this is caused by connections scheduled for purging.
These connections are moved from the server idle list to a global
toremove_list while still being accounted for by the server.
A first approach could be to decrement the server idle counter while
moving a connection to the purge list. However, this is functionally
incorrect as these purgeable connections still reference the server and
it could cause a crash if they were cleared after it.
The correct fix for this issue is simply to remove every purgeable
connection before a server is deleted. This is implemented by this patch
by extending cli_parse_delete_server(). It could be enough to only remove
connections targeting the deleted server, but as these connections will
be purged anyway it is justified to clear the whole list.
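A rough sketch of the idea (list and field names approximate the real
ones, and the real code runs under the proper protections):

/* drain the connections already scheduled for purging so that none of
 * them still accounts for the server about to be deleted
 */
static void purge_scheduled_conns_sketch(void)
{
        struct connection *conn;
        int t;

        for (t = 0; t < global.nbthread; t++) {
                while ((conn = MT_LIST_POP(&idle_conns[t].toremove_conns,
                                           struct connection *, toremove_list)))
                        conn_release(conn);
        }
}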
This must not be backported, unless the above-mentioned patch is.
Convert FN_AGE columns in stat_cols_px[] to generic columns. These values will
be automatically used for dump/preload of a stats-file.
Remove the srv_lastsession() / be_lastsession() functions which are now
useless as last_sess is calculated via me_generate_field().
last_change was a member present in both proxy and server struct. It is
used as an age statistic to report the last update of the object.
Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.
Note that there is a change for the proxy structure which now has 2
different last_change values, on the frontend and backend sides. Special
care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only the backend counter is reported in stats. However, with now
two distinct values, stats could be extended to report it on both sides.
If the 'namespace' keyword is used in the backend server settings and/or
in the bind string, it means that the haproxy process will call setns()
to change its default namespace to the configured one and then create a
socket in this new namespace. The setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.
To avoid running haproxy as root, let's add the cap_sys_admin capability
in the same way as we already added support for some other network
capabilities.
As CAP_SYS_ADMIN belongs to the CAP_SYS_* capabilities group, let's add a
separate flag LSTCHK_SYSADM for it. This flag is set if the 'namespace'
keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect the process EUID/RUID and its Effective and Permitted capability sets.
If the system doesn't support Linux capabilities or 'cap_sys_admin' was
not set via 'setcap', but the 'namespace' keyword is present in the
configuration, we keep the previous strict behaviour: a process that has
changed its uid to a non-privileged user will terminate with an alert.
This alert invites the user to recheck the configuration.
In the case where haproxy starts and runs under a non-root user and
'cap_sys_admin' is not set, but the 'namespace' keyword is present, this
patch does not change the previous behaviour either. We still let the
user try the configuration, but we emit a warning that unexpected things,
like socket creation errors, may occur.
We observed that a dynamic server whose health check is down for longer
than the slowstart delay at startup doesn't trigger the warmup phase: it
receives full traffic immediately. This has been confirmed by checking
the haproxy UI, where the weight is immediately the full one (e.g. 75/75),
without any throttle applied. Further tests showed that it was similar if
it was in
maintenance, and even when entering a down or maintenance state after
being up.
Another issue is that if the server is down for less time than
slowstart, when it comes back up, it briefly has a much higher weight
than expected for a slowstart.
An easy way to reproduce is to do the following:
- Add a server with e.g. a 20s slowstart and a weight of 10 in config
file
- Put it in maintenance using CLI (set server be1/srv1 state maint)
- Wait more than 20s, enable it again (set server be1/srv1 state ready)
- Observe UI, weight will show 10/10 immediately.
If the server was down for less than 20s, you'd briefly see a weight and
throttle value that is inconsistent, e.g. a 50% throttle value and a
weight of 5 if the server comes back up after 10s, before going back to
6% after a second or two.
Code analysis shows that the logic in server_recalc_eweight() stops the
warmup task by setting the server's next state to SRV_ST_RUNNING if it
didn't change state for longer than the slowstart duration, regardless
of its current state. As a consequence, a server being down or disabled
for longer than the slowstart duration will never enter the warmup phase
when it comes up again.
Regarding the weight when the server comes back up, the issue is that even if
the server is down, we still compute its next weight as if it was up,
hence when it comes back up, it can briefly have a much higher weight
than expected during slowstart, until the warmup task is called again
after last_change is updated.
This patch aims to fix both issues.
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
log format expressions are broadly used within the code: once they are
parsed from input string, they are converted to a linked list of
logformat nodes.
We're starting to face some limitations because we're simply storing the
converted expression as a generic logformat_node list.
The first issue we're facing is that storing logformat expressions that
way doesn't allow us to add metadata alongside the list, which is part
of the prerequisites for implementing log-profiles.
Another issue with storing logformat expressions as generic lists of
logformat_node elements is that it's starting to become really hard to
tell when we rely on logformat expressions or not in the code given that
there isn't always a comment near the list declaration or manipulation
to indicate that it's relying on logformat expressions under the hood,
so this adds some complexity for code maintenance.
This patch looks quite impressive due to changes in a lot of header and
source files (since logformat expressions are broadly used), but it does
a simple thing: it defines the lf_expr structure which itself holds a
generic list of logformat nodes, and then declares some helpers to
manipulate lf_expr elements and fixes the code so that we now exclusively
manipulate logformat_node lists as lf_expr elements outside of log.c.
For now, lf_expr struct only contains the list of logformat nodes (no
additional metadata), but now that we have dedicated type and helpers,
doing so in the future won't be problematic at all and won't require
extensive code changes.
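For illustration, the new type boils down to something like this (member
and helper names are approximations, the real definitions live in the log
headers):

/* wrapper around the logformat node list, so that callers manipulate a
 * dedicated type and metadata can later be attached (e.g. for
 * log-profiles) without touching every user
 */
struct lf_expr {
        struct list nodes;      /* list of struct logformat_node */
        /* future metadata goes here */
};

static inline void lf_expr_init_sketch(struct lf_expr *expr)
{
        LIST_INIT(&expr->nodes);
}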
Since faa8c3e ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") the following configuration will cause haproxy to crash:
backend test1
mode http
balance hash int(1)
server s1 haproxy.org:80
This is because lbprm.update_server_eweight() method is now systematically
called in _srv_set_inetaddr_port() upon srv addr/port change (and with the
above config it happens during startup after initial dns resolution).
However, depending on the chosen lbprm algo, update_server_eweight function
may not be set (it is not a mandatory method, some lb implementations don't
define it).
Thus, using 'balance hash' with map-based hashing or 'balance sticky' will
cause a crash due to a NULL de-reference in _srv_set_inetaddr_port(). To
fix the issue, we first check that the update_server_eweight() method is
set before using it.
No backport needed unless faa8c3e ("MEDIUM: lb-chash: Deterministic node
hashes based on server address") gets backported.
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.
This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.
When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
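Conceptually, the key derivation follows the sketch below (the helpers
server_uses_addr_key(), server_addr_is_known() and hash_addr_and_port()
are hypothetical, only full_hash() and puid are existing names):

static unsigned int chash_server_key_sketch(const struct server *srv)
{
        /* when "hash-key addr" or "addr-port" is set and the address is
         * known, hash the address (and port) so that independent
         * instances agree; otherwise keep the historical puid-based key.
         */
        if (server_uses_addr_key(srv) && server_addr_is_known(srv))
                return full_hash(hash_addr_and_port(&srv->addr, srv->svc_port));
        return full_hash(srv->puid);
}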
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.
This patch is not considered a bugfix. However, it may backported if
needed as its impact seems minimal.
When adding a server dynamically, we observe that when a backend has a
dynamic persistence cookie, the new server has no cookie as we receive
the following HTTP header:
set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Whereas we were expecting to receive something like the following, which
is what we receive for a server added in the config file:
set-cookie: test-cookie=abcdef1234567890; path=/
After investigating the code path, it turns out srv_set_dyncookie() is
never called when adding a server through the CLI; it is only called when
parsing the config file
or using "set server bkd1/srv1 addr".
To fix this, call srv_set_dyncookie() inside cli_parse_add_server().
This patch must be backported up to 2.4.
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.
However, this principle is incompatible with the "enabled" keyword on
"add server". The newly created instance will be unreachable as the proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.
To avoid this unexpected state, simply forbid the "enabled" keyword for
dynamic servers. In the long term, it could be possible to re-authorize
it, but at least this requires calling srv_lb_propagate() on dynamic
server creation.
This should fix github issue #2497.
This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
Sebastien Gross reported that 'interface' keyword ('source' subargument)
is silently ignored when used from 'default-server' directive despite the
documentation implicitly stating that the keyword should be supported
there.
When support for 'source' keyword was added to 'default-server' directive
in dba97077 ("MINOR: server: Make 'default-server' support 'source'
keyword."), we properly duplicated the conn iface_name from the default-
server but we forgot to copy the conn iface_len which must be set as well
since it is used as setsockopt()'s 'optlen' argument in
tcp_connect_server().
It should be backported to all stable versions.
Willy reported that since 3ac79b504 ("MEDIUM: server:
make server_set_inetaddr() updater serializable"), haproxy fails to
compile on some older compilers such as gcc-4.4 with this kind of error:
src/server.c: In function 'snr_resolution_cb':
src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:1006: src/server.o] Error 1
This is due to referencing a member inside anonymous union from a compound
literal assignment. Apparently such use of anonymous union wasn't properly
supported back then on older compilers. To fix the issue, we give the
name "u" to the parent union and use this name to explicitly refer to the
union where
relevant in the code (only a few changes fortunately).
The fix itself was verified to restore build compatibility with gcc 4.4
(and even 4.2).
As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR:
server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please
consider backporting this patch too if 64c9c8ef3 happens to be backported
in 2.9.
This commit is similar to the following one:
65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116
MEDIUM: server: close idle conn before server deletion
This patch implements a similar logic, this time to close private idle
connections stored in sessions. The principle is identical to the above
commit : conn_release() is used on idle connections after a takeover to
ensure thread safety.
An extra change was required to be able to execute takeover on such
connections. Their original thread ID was unknown, contrary to
non-private connections which are stored in sharded lists. As such, a new
tid member has been added to the sess_priv_conns chaining element.
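The chaining element now roughly looks like this (member names and order
are approximate):

/* per-session bucket of private connections towards one target; the new
 * tid member records the owning thread so that a takeover can be
 * attempted from the thread performing the server deletion
 */
struct sess_priv_conns {
        void *target;            /* server the connections point to */
        struct list conn_list;   /* list of private connections */
        struct list sess_el;     /* attach point in the session */
        int tid;                 /* thread that owns these connections */
};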
To be able to delete a server, a number of preconditions must be
validated to ensure it is not in use anymore. Previously, if idle
connections were stored in the server, the deletion was cancelled. No
action was implemented to force idle connection closure; the only
solution was to wait for the periodic purging to complete.
This is an extra burden to be able to delete a server. Indeed, idle
connections are by definition inactive and can be closed prior to
deleting a server. This is the exact purpose of this patch.
Idle connections removal is implemented inside the "delete server"
handler, once it has been determined that the server can be freely
removed. A simple loop is run to call conn_release() over each idle
connection. Takeover is also executed before conn_release() to ensure
tasks/tasklets or any other sensitive elements are not deleted from a
foreign thread.
This patch should reduce the occurrence of rejected "delete server"
executions, especially when connection reuse is high.
A server can only be deleted if there are no elements which reference it.
This is taken care of via srv_check_for_deletion(), most notably for active
and idle connections.
A special case occurs for connections directly managed by a session.
This is for so-called private connections, when using http-reuse never
or H2 + http-reuse safe for example. In this case, the server does not
account for these connections in its idle lists. This caused a bug as the
server could be deleted despite the session still being able to access it.
To properly fix this, add a new referencing element into the server for
these session connections. A mt_list has been chosen for this. On
default http-reuse, private connections are typically not used so it
won't make any difference. If using H2 servers, or more generally when
dealing with private connections, insert/delete should typically occur
only once per session lifetime so impact on performance should be
minimal.
This should be backported up to 2.4. Note that srv_check_for_deletion()
was introduced in 3.0 dev tree. On backport, the extra condition in it
should be placed in cli_parse_delete_server() instead.
3.0-dev1 introduced a small regression with commit b4db3be86e ("BUG/MINOR:
server: fix server_find_by_name() usage during parsing"). By changing the
way servers are indexed and moving it into the server template loop, the
first one is no longer indexed because the loop starts at low+1 since it
focuses on duplication. Let's index the first one explicitly now.
This should not be backported, unless the commit above is backported.
Compilation on Solaris fails because of the usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.
This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.
Future plan: rename 'queue' in code base so define can be removed again.
Backporting: 2.9, 2.8
Contrary to static servers, dynamic servers do not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible for setting a minimal set of values to get a correct behavior.
However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.
Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().
To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.
This patch could have unexpected effects on dynamic server behavior as
it restores proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().
This should be backported up to 2.6, after a brief period of
observation.
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced anymore by a stream or a
connection. This is implemented in srv_check_for_deletion().
The various criteria specified were incomplete. This allowed a server
instance to be deleted while still being referenced by a stream and a
connection.
This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.
==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
#0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
#1 0x63cb2568f465 in process_stream src/stream.c:1823
#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
#6 0x701539aa9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
#7 0x701539b26a3b (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().
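In simplified form, the deletion check now has the following shape (not
the exact code, only the condition it implements):

static int srv_deletable_sketch(const struct server *srv)
{
        /* streams currently served, idle connections kept by the server,
         * and (the new check) connections still in use must all be zero
         */
        if (srv->served || srv->curr_idle_conns || srv->curr_used_conns)
                return 0;
        return 1;
}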
Outside of this bug, one case which remains sensitive is for SF_DIRECT
streams which referenced a server instance early in process_stream()
before connect_server(). This occurs with use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.
This must be backported up to 2.6.
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
Some cli_err(), cli_msg() or even ha_error() etc are missing the trailing
LF, which breaks the continuity of the CLI parsing: the extra LF that serves
to mark the end of the command is in fact taken as the missing LF and no
extra one is added.
This patch adds the missing LF on identified messages. It might be worth
trying to proceed in a more generic way with this, given the amount of
code that is possibly at risk.
Remove some QUIC member definitions from the server structure, as the
haproxy QUIC stack does not support the server part (QUIC client) at all
at this time. Remove the statements related to their initialization.
This patch should be backported as far as 2.6 to save memory.
Since the below commit, server_find_by_name() now searches using the
'used_server_id' proxy backend tree:
4bcfe30414
OPTIM: server: eb lookup for server_find_by_name()
This introduces a regression if server_find_by_name() is used via
check_config_validity() during post-parsing. Indeed, the used_server_id
tree is populated at the same stage, so it's possible to not find an
existing server. This can cause incorrect rejection of a previously valid
configuration file.
To fix this, servers are now inserted in used_server_id tree during
parsing via parse_server(). This guarantees that server instances can be
retrieved during post parsing.
A known feature which uses server_find_by_name() during post parsing is
attach-srv tcp-rule used for reverse HTTP. Prior to the current fix, a
config was wrongly rejected if the rule was declared before the server
line.
This should not be backported unless the mentioned commit is.
snr_set_srv_down() (formerly known as snr_update_srv_status()) is
still too ambiguous because it's not clear whether we will be putting
the server under maintenance or not. This is mainly due to the fact that
the function behaves differently if has_no_ip is set or not.
By reviewing the function callers, it has now become clear that
snr_resolution_cb() is always calling the function with a valid resolution
so we only want to put the server under maintenance if we don't have a
valid IP address. On the other hand snr_resolution_error_cb() always
calls the function on error, with either no resolution (for SRV requests)
or with failing resolution (all cases except RSLV_STATUS_VALID), so in
this case we decide whether to put the server under maintenance case by
case (ie: expired? timeout?)
As a result, let's simplify snr_set_srv_down() so that it is only called
when the caller really thinks that the server should be put under
maintenance, which means always for snr_resolution_error_cb(), and only
if the resolution didn't yield a usable IP for snr_resolution_cb().
RSLV_UPD_CNAME and RSLV_UPD_NAME_ERROR flags have now become useless since
3cf7f987 ("MINOR: dns: proper domain name validation when receiving DNS
response") as they are never set, but we forgot to remove them.
A leftover check remained from the recent patch series about server
addr:svc_port propagation: a check on (msg) being set was performed
in srv_update_addr_port(), but msg is always set, so the check is not
needed and confuses coverity (see GH #2399).
As seen before, server's addr and svc_port should not be updated directly
during runtime, because even if the update is performed under the lock,
some competing threads might be reading ->addr and ->svc_port without
the lock because they simply cannot afford it.
To prevent races with such competing threads, server's addr and port
should only be updated using server_set_inetaddr() function or similar.
This patch depends on:
- "MINOR: server: ensure connection cleanup on server addr changes"
- "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event"
- "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic"
- "MEDIUM: server: make server_set_inetaddr() updater serializable"
- "MINOR: server/event_hdl: expose updater info through INETADDR event"
- "MINOR: server: add dns hint in server_inetaddr_updater struct"
- "MEDIUM: server/dns: clear RMAINT when addr resolves again"
While it could be backported in 2.9 with cd994407a ("BUG/MAJOR:
server/addr: fix a race during server addr:svc_port updates") to ensure
addr and svc_port reset performed by resolver's code comply with the
API taking care of pushing the update (and thus avoid any race), some
patch dependencies are quite sensitive so it's probably best to avoid
backporting for no good reason, or at least wait for it to be considered
stable to prevent any breakages.
snr_update_srv_status() and srvrq_update_srv_status() will both set or
clear the server RMAINT state depending on the result of the current dns
resolution.
This used to work pretty well in the past, but now that addr:svc_port
changes are applied atomically through a dedicated task, the change is
performed asynchronously, so this can cause some flapping issues if the
server is put out of maintenance while the server's address is still
unassigned.
To prevent errors, the resolver's code is now only allowed to put the
server under maintenance but not to remove it from maintenance:
the decision to remove a server from maintenance is performed by the task
responsible for updating the server's addr: if the addr resolves again
thanks to a valid DNS resolution and the server was previously under
RMAINT, then it is cleared from the RMAINT state.
srvrq_update_srv_status() was renamed srvrq_set_srv_down(), since it is
only called to put the server in maintenance as a result of a failing
SRV entry.
snr_update_srv_status() was renamed snr_set_srv_down() and slightly
modified so that it only takes care of putting the server under
maintenance when needed.
The cli command "set server x/y addr" does not need to remove the RMAINT
flag anymore.
server_set_inetaddr() updater argument is a simple char * string
containing info about the caller responsible for the update.
In this patch, we try to make this argument serializable, that is, make
it so that we can easily export it without having to keep the original
pointer passed by the caller or having to work with strings of variable
lengths.
This was a prerequisite for exposing more updater information through
SERVER_INETADDR event (upcoming patch).
Static strings were simply mapped to a fixed ID that can be converted back
to a string when needed using server_inetaddr_updater_by_to_str(). One
special case was made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater
since in this case the updater hint has to be generated from the
corresponding resolver id / nameserver id combination. This was achieved
by saving the nameserver id within the updater struct. Knowing that the
resolver id can be guessed from the server struct directly, it was not
exposed through the updater struct.
This patch depends on:
- "MINOR: resolvers: add unique numeric id to nameservers"
No functional change should be expected.
server_parse_addr_change_request() was completely replaced by the newer
srv_update_addr_port() function. Considering the function doesn't offer
any useful feature that srv_update_addr_port() couldn't provide, we
simply remove it.
Both functions perform similar tasks, except that the _port()
version does a bit more work.
In this patch, we add the server_set_inetaddr() function that works like
the srv_update_addr_port() but it takes parsed inputs instead of raw
strings as arguments.
Then, server_set_inetaddr() is used as underlying helper function for
both srv_update_addr() and srv_update_addr_port() to make them easier
to maintain.
Also, helper functions were added:
- server_set_inetaddr_warn() -> same as server_set_inetaddr() but reports
a warning on updates.
- server_get_inetaddr() -> fills a struct server_inetaddr from srv
Since the feedback message generation part was slightly reworked, some
minor changes in the way addr:svc_port updates are reported in the logs
or cli messages should be expected (no loss of information though).
Previously, in srv_update_addr_port(), we forced connection cleanup on
server changes.
This was done in 6318d33ce ("BUG/MEDIUM: connections: force connections
cleanup on server changes").
However, there is no reason we shouldn't have done the same in
srv_update_addr() function, because the end goal is the same: perform
runtime changes on server's address.
The purge_conn hint propagated through the INETADDR server event was
simply there to keep the original behavior (only purge the connection
for events originating from srv_update_addr_port()), but to ensure the
address change is handled the same way for both code paths, we simply
ignore this hint.
server addr:svc_port updates during runtime might set or clear the
SRV_F_MAPPORTS flag. Unfortunately, the flag update is still directly
performed by srv_update_addr_port() function while the addr:svc_port
update is being scheduled for atomic update. Given that existing readers
don't take server's lock to read addr:svc_port, they also check the
SRV_F_MAPPORTS flag right after without the lock.
So we could cause the readers to incorrectly interpret the svc_port from
the server struct because the mapport information is not published
atomically, resulting in inconsistencies between svc_port / mapport flag.
(MAPPORTS flag causes svc_port to be used differently by the reader)
To fix this, we publish the mapport information within the INETADDR server
event and we let the task responsible for updating the server's addr and
port set or clear the flag depending on the mapport hint.
This patch depends on:
- MINOR: server/event_hdl: add server_inetaddr struct to facilitate event data usage
- MINOR: server/event_hdl: update _srv_event_hdl_prepare_inetaddr prototype
This should be backported in 2.9 with 683b2ae01 ("MINOR: server/event_hdl:
add SERVER_INETADDR event")
Slightly change _srv_event_hdl_prepare_inetaddr() function prototype to
reduce the input arguments by learning some settings directly from the
server. Also taking this opportunity to make the function static inline
since it's relatively simple and not meant to be used directly.
4e5e2664 ("MINOR: proxy: add findserver_unique_id() and findserver_unique_name()")
added findserver_unique_id() and findserver_unique_name() functions that
were inspired from the historical findserver() function, so unfortunately
they don't perform well when used on large backend farms because they scan
the whole server list linearly.
I was about to provide a patch to optimize such functions when I stumbled
on Baptiste's work:
19a106d24 ("MINOR: server: server_find functions: id, name, best_match")
It turns out Baptiste already implemented helper functions to supersede
the unoptimized findserver() function (at least at runtime when servers
have been assigned their final IDs and inserted in the lookup trees): they
offer more matching options and rely on eb lookups so they are much more
suitable for fast queries. I don't know how I missed that, but they are a
perfect base for the server rid matching functions.
So in this patch, we essentially revert 4e5e2664 to provide the optimized
equivalent functions named server_find_by_id_unique() and
server_find_by_name_unique(), then we force existing findserver_unique_*()
callers to switch to the new functions.
This patch depends on:
- "OPTIM: server: eb lookup for server_find_by_name()"
This could be backported up to 2.8.
server_find_by_name() function was added in 19a106d24 ("MINOR: server:
server_find functions: id, name, best_match").
At that time, only the used_server_id proxy tree was available, thus the
name lookup was performed as a linear search.
However, used_server_name proxy tree was added in 84d6046a ("MINOR: proxy:
Add a "server by name" tree to proxy."), so we may safely rely on it to
perform server name lookups now. This will hopefully make the function
much faster, especially when performing lookups in huge backend farms.
A regression was introduced by the commit c886fb58eb ("MINOR: server/ip:
centralize server ip updates"). The configured address family is lost when the
server address is initialized during startup, for the resolution based on
the libc or on the server state-file. Thus, "ipv4@" and "ipv6@" prefixes
are ignored.
To fix the bug, we take care to use the configured address family before calling
str2ip2() in srv_apply_lastaddr() and srv_apply_via_libc() functions.
This patch should fix the issue #2393. It must be backported to 2.9.
It is possible that a server's addr family is temporarily set to AF_UNSPEC
even if we're certain to be in INET context (ipv4, ipv6).
Indeed, as soon as IP address resolving is involved, srv->addr family will
be set to AF_UNSPEC when the resolution fails (could happen at anytime).
However, _srv_event_hdl_prepare_inetaddr() wrongly assumed that it would
only be called with AF_INET or AF_INET6 families. Because of that, the
function would handle an AF_UNSPEC address as an IPv6 address: not only
could we risk reading from an uninitialized area, but we would then
propagate false information when publishing the event.
In this patch we make sure to properly handle the AF_UNSPEC family in
both the "prev" and the "next" part for SERVER_INETADDR event and that
every members are explicitly initialized.
This bug was introduced by 6fde37e046 ("MINOR: server/event_hdl: add
SERVER_INETADDR event"), no backport needed.
On ubuntu 20.04 and 22.04 with gcc 9.4 and 11.4 respectively, we get
the following warning:
src/server.c: In function 'srv_update_addr_port':
src/server.c:4027:3: warning: 'new_port' may be used uninitialized in this function [-Wmaybe-uninitialized]
4027 | _srv_event_hdl_prepare_inetaddr(&cb_data.addr, &s->addr, s->svc_port,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4028 | ((ip_change) ? &sa : &s->addr),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4029 | ((port_change) ? new_port : s->svc_port),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4030 | 1);
| ~~
The warning is clearly wrong: port_change only changes from 0 to
anything else *after* new_port is assigned. Let's just preset new_port to
zero instead
of trying to play smart with the compiler.
In previous log backend implementation, we created a pseudo log target
for each declared log server, and we made the log target's address point
to the actual server address to save some time and prevent unnecessary
copies.
But this was done without knowing that when FQDN is involved (more broadly
when dns/resolution is involved), the "port" part of server addr should
not be relied upon, and we should explicitly use ->svc_port for that
purpose.
With that in mind and thanks to the previous commit, some changes were
required: we allocate a dedicated addr within the log target when the target
is in DGRAM mode. The addr is first initialized with known values and it
is then updated automatically by _srv_set_inetaddr() during runtime.
(the change is atomic so readers don't need to worry about it)
addr from server "log target" (INET/DGRAM mode) is made of the combination
of server's address (lacking the port part) and server's svc_port.
For inet families (IP4/IP6), it is expected that server's addr/port might
be updated at runtime from DNS, cli or lua for instance.
Such updates were performed under the server's lock.
Unfortunately, most readers such as backend.c or sink.c perform the read
without taking server's lock because they can't afford slowing down their
processing for a type of event which is normally rare. But this could
result in bad values being read for the server addr:svc_port tuple (ie:
during connection establishment) as a result of concurrent updates from
external components, which can obviously cause some undesirable effects.
Instead of slowing the readers down, as we consider server's addr changes
are relatively rare, we take another approach and try to update the
addr:port atomically by performing changes under full thread isolation
when a new change is requested. The changes are performed by a dedicated
task which takes care of isolating the current thread and doesn't depend
on other threads (independent code path) to protect against dead locks.
As such, server's addr:port changes will now be performed atomically, but
they will not be processed instantly, they will be translated to events
that the dedicated task will pick up from time to time to apply the
pending changes.
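The overall shape of the update path is sketched below (not the real
code; the callback follows the usual task signature):

/* writers queue the requested addr:svc_port change somewhere private and
 * wake this task; readers keep reading addr/svc_port without any lock
 */
static struct task *server_addr_sync_sketch(struct task *t, void *context,
                                            unsigned int state)
{
        struct server *srv = context;

        thread_isolate();       /* no reader can run during the switch */
        /* apply the pending change to srv->addr and srv->svc_port here,
         * atomically from the readers' point of view
         */
        thread_release();
        return t;
}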
This bug existed for a very long time and has never been reported so
far. It was discovered by reading the code during the implementation
of log backend ("mode log" in backends). As it involves changes in
sensitive areas as well as thread isolation, it is probably not
worth considering backporting it for now, unless it is proven that it
will help to solve bugs that are actually encountered in the field.
This patch depends on:
- 24da4d3 ("MINOR: tools: use const for read only pointers in ip{cmp,cpy}")
- c886fb5 ("MINOR: server/ip: centralize server ip updates")
- event_hdl API (which was first seen on 2.8) +
683b2ae ("MINOR: server/event_hdl: add SERVER_INETADDR event") +
BUG/MEDIUM: server/event_hdl: memory overrun in _srv_event_hdl_prepare_inetaddr() +
"MINOR: event_hdl: add global tunables"
Note that the patch may be reworked so that it doesn't depend on
event_hdl API for older versions, the approach would remain the same:
this would result in a larger patch due to the need to manually
implement a global queue of pending updates with its dedicated task
responsible for picking updates and committing them. An alternative
approach could consist in per-server, lock-protected, temporary
addr:svc_port storage dedicated to "updaters" where only the most
recent values would be kept. The sync task would then use them as
source values to atomically update the addr:svc_port members that the
runtime readers are actually using.
As reported in GH #2358, #2359, #2360, #2361 and #2362: ipv6 address
handling may cause memory overrun due to struct in6_addr being handled
as sockaddr_in6 which is larger. Moreover, the source variable wasn't
properly read since its raw value was used as a pointer instead of taking
the actual variable's address.
This bug was introduced by 6fde37e046
("MINOR: server/event_hdl: add SERVER_INETADDR event")
Unfortunately for us, gcc didn't catch this, and it actually used to
"work" by accident since the in6_addr struct is made of an array, so not
passing the pointer explicitly still resolved to the proper starting
address.
Hopefully this was caught by coverity so thanks to Ilya for that.
The fix is simple: we simply copy the whole in6_addr struct by accessing
it using a pointer and using the proper struct size for the copy.
In this patch we add the support for a new SERVER event in the
event_hdl API.
SERVER_INETADDR is implemented as an advanced server event.
It is published each time the server's ip address or port is
about to change. (ie: from the cli, dns, lua...)
SERVER_INETADDR data is an event_hdl_cb_data_server_inetaddr struct
that provides additional info related to the server inet addr change,
but can be cast as a regular event_hdl_cb_data_server struct if
additional info is not needed.
Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'.
This commits follows this by replacing various custom prefix by 'rhttp_'
to make the code uniform.
Note that the 'reverse_' prefix was kept in the connection module. This
is because if a new reversible protocol not based on HTTP is implemented,
it may be necessary to reuse the same connection functions, which are
protocol agnostic.
When a default-server directive is used in a defaults section, it's never
freed and the "defaults" proxy gets reset without freeing the fields from
that default-server. Normally there are no allocation there, except for
the config file location stored in srv->conf.file from an strdup() since
commit 9394a9444 ("REORG: server: move alert traces in parse_server")
that appeared in 2.4. In addition, if a "default-server" directive
appears multiple times in a defaults section, one more entry will be
leaked per call.
This commit addresses this by checking that we don't overwrite the file
upon multiple calls, and by clearing it when resetting the default proxy.
This should be backported to 2.4.
A backend connection is inserted in server idle list via
srv_add_to_idle_list(). This function has several conditions which may
cause the connection to be rejected instead.
One of these conditions is based on the current estimate of needed
connections for the server. If the count of stored idle connections has
already reached this estimate, the new connection is rejected. This is
in opposition with the purpose of reverse HTTP. On active reverse,
haproxy can instantiate several connections to properly serve the future
traffic. However, the opposite passive haproxy will have only a low
estimate of needed connections and will reject most of them.
To fix this, simply check CO_FL_REVERSED connection flag on
srv_add_to_idle_list(). If set, the connection is inserted without
checking for estimate count. Note that all other conditions are not
impacted, so it's still possible to reject a connection, for example if
process FD limit is reached.
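In simplified form, the relevant condition in srv_add_to_idle_list()
becomes something like this (field and helper names are illustrative, not
the exact ones):

/* reversed connections bypass the estimate-based limit; every other
 * condition (FD limit, etc.) still applies before this point
 */
static int may_store_idle_conn_sketch(struct connection *conn, struct server *srv)
{
        if (!(conn->flags & CO_FL_REVERSED) &&
            srv->curr_idle_conns >= srv_estimated_needed_conns(srv))
                return 0;       /* reject: enough idle connections already */
        return 1;
}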
This commit relies on a recent patch which changes the CO_FL_REVERSED
flag for connections after a passive reverse.
In commit 6f4bfed3a ("MINOR: server: Add parser support for
set-proxy-v2-tlv-fmt") a suspicious check for a NULL srv_tlv was placed
in the list_for_each_entry(), which should not be needed. In practice,
it's caused by the list head not being initialized, hence the first
element is NULL, as shown by Alexander's reproducer below which crashes
if the test in the loop is removed:
backend dummy
default-server send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)]
server dummy_server 127.0.0.1:2319
The right place to initialize this field is proxy_preset_defaults().
We'd really need a function to initialize a server :-/
The check in the loop was removed. No backport is needed.
This bug was introduced with 29b76ca ("BUG/MEDIUM: server/log: "mode log"
after server keyword causes crash ")
Indeed, we cannot safely rely on addr_proto being set when str2sa_range()
returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because
proto lookup might be bypassed when FQDN addresses are involved.
Unfortunately, the above patch wrongly assumed that proto would always
be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when
str2sa_range() was called), resulting in invalid postparsing checks being
performed, which could as well lead to crashes with log backends
("mode log" set) because some postparsing init was skipped as a result of
proto not being set and this wasn't expected later in the init code.
To fix this, we now make use of the previous patch to perform server's
address compatibility checks on hints that are always set when
str2sa_range() successfully returns.
For log backend, we're also adding a complementary test to check if the
address family is of expected type, else we report an error, plus we're
moving the postinit logic into the log API since _srv_check_proxy_mode() is
only meant to check proxy mode compatibility and we were abusing it.
This patch depends on:
- "MINOR: tools: make str2sa_range() directly return type hints"
No backport required unless 29b76ca gets backported.
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")
However, as stated into the commit message, there is a trick:
"we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this
case it is valid to return no protocol"
In this case, we're unable to return the protocol because the protocol
lookup requires both the [proto type + xprt type] and the [family type]
to be known.
While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.
Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
In commit 6f4bfed3a ("MINOR: server: Add parser support for
set-proxy-v2-tlv-fmt"), a few free() calls were made on an element in the
error path after it was detected to be NULL. This has no effect, however
there was one case of use-after-free at the end of srv_settings_cpy(),
caught by gcc, due to attempting to free the element after freeing its
holder.
No backport is needed.
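For reference, a generic illustration of the two patterns mentioned above
(not the actual HAProxy code):
#include <stdlib.h>

struct holder { char *elem; };

void broken_cleanup(struct holder *h)
{
    if (h->elem == NULL)
        free(h->elem); /* no effect: free(NULL) is defined to do nothing */

    free(h);           /* the holder is released here... */
    free(h->elem);     /* ...so this is a use-after-free read of h->elem */
}

void fixed_cleanup(struct holder *h)
{
    free(h->elem);     /* release the member first */
    free(h);           /* then its holder */
}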
This commit introduces a generic server-side parsing of type-value pair
arguments and allocation of a TLV list via a new keyword called
set-proxy-v2-tlv-fmt.
This makes it possible to 1) forward any TLV type with the help of
fc_pp_tlv, and 2) more generally, send out any TLV type and value via a
log format expression.
To have this fully working, the connection code will need to be updated
in a follow-up commit to actually respect the new server TLV list.
default-server support has also been implemented.
Remove some code duplication by introducing a basic helper function
to detach a server from its parent proxy. The function may safely be
called even if the server is not yet listed in the proxy, in which case
it does nothing. In delete_server(), we previously used some BUG_ON()
statements to ensure that the detach always succeeded, given that we
were certain that the server was in the proxy list because it was
retrieved through get_backend_server().
However this test is superfluous: we can safely assume that the operation
will always succeed if get_backend_server() returned != NULL (we're under
full thread isolation), and if it's not the case, then we have a bigger
API issue anyway.
In 304672320e ("MINOR: server: support keyword proto in 'add server' cli")
improper use of the conn_get_best_mux_entry() function was made:
First, the server's proxy mode was directly passed as the "proto_mode"
argument to conn_get_best_mux_entry(), but this is strictly invalid
because while there is some relationship between proto modes and proxy
modes, they don't use the same storage mechanism and cannot be used
interchangeably.
Because of this bug, conn_get_best_mux_entry() would not work at all for
TCP because PR_MODE_TCP equals 0, whereas PROTO_MODE_TCP equals 1.
Then another, less sensitive, bug remains:
as its name and description imply, conn_get_best_mux_entry() tries
its best to return something to the user, only using the keyword
(mux_proto) input as a hint to return the most relevant mux within the
list of muxes that are compatible with the proto_side and proto_mode
values.
This means that even if mux_proto cannot be found or is not available
with the current proto_side and proto_mode values,
conn_get_best_mux_entry() will most probably fall back to a more generic
mux.
However in cli_parse_add_server(), we directly check the result of
conn_get_best_mux_entry() and consider that it will return NULL if the
provided keyword hint for mux_proto cannot be found. This will result in
the function not raising errors as expected, because most of the time, if
the expected proto cannot be found, then we'll silently switch to the
fallback one, despite the user providing an explicit proto.
To fix that, we store the result of conn_get_best_mux_entry() to compare
the returned mux proto name with the one we're expecting to get, as it
is originally performed in cfgparse during initial server keyword parsing.
This patch depends on
- "MINOR: connection: add conn_pr_mode_to_proto_mode() helper func")
It must be backported up to 2.6.
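A rough sketch of both ideas, modeled on conn_pr_mode_to_proto_mode() and
the token comparison; the enum values and struct fields below are
simplified assumptions, not the real definitions:
#include <string.h>

enum pr_mode { PR_MODE_TCP = 0, PR_MODE_HTTP };                           /* proxy modes */
enum proto_mode { PROTO_MODE_NONE = 0, PROTO_MODE_TCP, PROTO_MODE_HTTP }; /* mux modes   */

/* explicit mapping instead of passing a proxy mode where a proto mode is
 * expected: PR_MODE_TCP being 0 would otherwise be silently misinterpreted
 */
enum proto_mode pr_mode_to_proto_mode(enum pr_mode m)
{
    return (m == PR_MODE_HTTP) ? PROTO_MODE_HTTP : PROTO_MODE_TCP;
}

struct mux_entry { const char *token; };

/* after the "best" mux lookup, check that it really is the one explicitly
 * requested, since the lookup may silently fall back to a generic mux
 */
int mux_matches_request(const struct mux_entry *best, const char *requested)
{
    return best && requested && strcmp(best->token, requested) == 0;
}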
In 9a74a6c ("MAJOR: log: introduce log backends"), a mistake was made:
it was assumed that the proxy mode was already known during server
keyword parsing in parse_server() function, but this is wrong.
Indeed, "mode log" can be declared late in the proxy section. Due to this,
a simple config like this will cause the process to crash:
|backend test
|
| server name 127.0.0.1:8080
| mode log
In order to fix this, we relax some checks in _srv_parse_init() and store
the address protocol from str2sa_range() in the server struct, then we
set up a postparsing function that is called after config parsing to
finish the server checks/initialization that depend on the proxy mode
being known. We achieve this by checking the PR_CAP_LB capability of
the parent proxy to know if we're in such a case where the effective
proxy mode is not yet known (it is assumed that other proxies, which are
implicit ones, don't offer this possibility and thus don't suffer from
this constraint).
If the capability is not found, we immediately perform the server checks
that depend on the proxy mode; otherwise the check is postponed and will
automatically be performed during postparsing thanks to the
REGISTER_POST_SERVER_CHECK() hook.
Note that we remove the SRV_PARSE_IN_LOG_BE flag because it was introduced
in the above commit and it is no longer relevant.
No backport needed unless 9a74a6c gets backported.
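A rough sketch of the resulting control flow, with simplified types and a
PR_CAP_LB stand-in bit; the deferred branch corresponds to the function
registered through the post-parsing hook, under those assumptions:
#define PR_CAP_LB 0x0001 /* stand-in value for the LB capability bit */

struct proxy_sketch { unsigned int cap; };
struct server_sketch { struct proxy_sketch *px; };

/* the checks that need the final proxy mode ("mode log" or not) */
int check_srv_against_proxy_mode(struct server_sketch *srv)
{
    (void)srv; /* ... address family / keyword compatibility checks ... */
    return 0;
}

int srv_parse_finalize(struct server_sketch *srv)
{
    if (!(srv->px->cap & PR_CAP_LB)) {
        /* implicit proxies: the mode is already fixed, check right away */
        return check_srv_against_proxy_mode(srv);
    }
    /* LB-capable proxies: "mode log" may still appear later in the section,
     * so the same check runs from the post-parsing hook instead */
    return 0;
}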
Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion into the available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.
Since the following commit, idle conns are stored in a list as secondary
storage to retrieve them in usage order :
5afcb686b9
MAJOR: connection: purge idle conn by last usage
The list usage has been extended wherever connection lookups are done
on both the idle and safe trees. This reduced the code size by replacing
two tree loops with a single list loop.
LIST_ELEM() is used in this context to retrieve the first idle list
element from the server list head. However, macro usage was wrong due to
an extra '&' operator which returns an invalid connection reference.
This will most of the time cause a crash in conn_delete_from_tree() or
affiliated functions.
This bug only occurs if the FD pool is exhausted and some idle
connections are selected to be killed.
It can be reproduced using the following config and h2load command :
$ h2load -t 8 -c 800 -m 10 -n 800 "http://127.0.0.1:21080/?s=10k"
global
maxconn 100
defaults
mode http
timeout connect 20s
timeout client 20s
timeout server 20s
listen li
bind :21080 proto h2
server nginx 127.99.0.1:30080 proto h1
This bug has been introduced by the above commit. Thus no need to
backport this fix.
Note that LIST_ELEM() macro usage was also slightly adjusted in
srv_migrate_conns_to_remove(). The function used the toremove_list
element instead of the idle_list connection list element. This is not a
bug as they are stored in the same union. However, the new code is
clearer as its intent is to move connections only from the idle_list
into the toremove_list mt-list.
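To illustrate the pitfall, here is a self-contained sketch that mimics the
container_of-style LIST_ELEM() macro with local definitions (not HAProxy's
headers):
#include <stdio.h>

struct list { struct list *n, *p; };

/* convert a pointer to an embedded struct list back to its container */
#define LIST_ELEM(lh, pt, el) ((pt)(((char *)(lh)) - ((size_t)&((pt)0)->el)))

struct connection { int fd; struct list idle_list; };

int main(void)
{
    struct connection c = { .fd = 42 };
    struct list head; /* stands for the per-thread idle_conn_list head */

    head.n = &c.idle_list; /* pretend c is the first element after the head */

    /* correct: pass the element pointer itself */
    struct connection *ok = LIST_ELEM(head.n, struct connection *, idle_list);

    /* buggy: the extra '&' passes the address of the head's 'n' field, so
     * the computed pointer lands on an unrelated memory area */
    struct connection *bad = LIST_ELEM(&head.n, struct connection *, idle_list);

    printf("ok->fd=%d (ok==&c: %d)\n", ok->fd, ok == &c);
    printf("bad==&c: %d\n", bad == &c);
    return 0;
}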
"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of the implicit ring associated with the server (which
defaults to BUFSIZE).
Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)
log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:
| log backend@mybackend local0
A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
In cli_parse_delete_server(), we take care of checking that the server is
in MAINT and that the cur_sess counter is set to 0, in the hope that no
connection/stream resources continue to point to the server, otherwise
we refuse to delete it.
As shown in GH #2298, this is not sufficient.
Indeed, when the server option "on-marked-down shutdown-sessions" is not
used, server streams are not purged when srv enters maintenance mode.
As such, there could be remaining streams that point to the server. To
detect this, a secondary check on srv->cur_sess counter was performed in
cli_parse_delete_server(). Unfortunately, there are some code paths that
could lead to cur_sess being decremented without the stream actually
being shut down. As such, if the delete_server cli command is handled
right after cur_sess has been decremented, with streams still pointing to
the server, we could face some nasty bugs where stream->srv_conn could
point to a garbage memory area, as described in the original github
report.
To make the check more reliable prior to deleting the server, we don't
rely exclusively on cur_sess and directly check that the server is not
used in any stream through the srv_has_stream() helper function.
Thanks to @capflam who found the root cause of the bug and greatly
helped to provide the fix.
This should be backported up to 2.6.
rdr_pfx was not being freed during server cleanup, leading to a small
memory leak when the "redir" argument was used on a server line (HTTP
only).
This should be backported to all stable versions.
[For 2.6 and 2.7: the free should be performed in srv_drop() directly.
For older versions: free in deinit() function near the free for the
cookie string]
This reverts commit c618ed5ff4.
The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However while doing this in the quic_sock code seems to work, doing it
inside the macro shows total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial,
let's roll this back for the time it will take to fix the loop.
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:
- mt_list_for_each_entry_safe() is now in upper case to explicitly
show that it is a macro, and it only uses the back element, no longer
requiring a secondary pointer for deletes.
- MT_LIST_DELETE_SAFE() doesn't exist anymore; instead one just has
to set the list iterator to NULL so that the element is not re-inserted
into the list and the list is spliced there (see the sketch after this
list). One must be careful because it was usually performed before
freeing the element; now the element pointer must be nulled before the
continue/break instead.
- MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
unclear. They were replaced by mt_list_cut_around() and
mt_list_connect_elem() which more explicitly detach the element
and reconnect it into the list.
- MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
in list.h. It may however possibly benefit from being upstreamed.
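A toy, single-threaded sketch of the caller-side pattern only; the macro
and list below are local stand-ins (no locking, no splicing), not the real
mt_list implementation:
#include <stdio.h>
#include <stddef.h>

struct mt_list { struct mt_list *next, *prev; };

#define MT_LIST_INIT(lh) ((lh)->next = (lh)->prev = (lh))
#define MT_LIST_ELEM(a, t, m) ((t)((char *)(a) - offsetof(__typeof__(*(t)0), m)))

static void unlink_node(struct mt_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    /* n->next is left untouched so that the iteration can go on */
}

/* simplified iterator: if the loop body sets <item> to NULL, the element
 * stays detached from the list; otherwise nothing changes
 */
#define MT_LIST_FOR_EACH_ENTRY_SAFE(item, lh, member, back)                      \
    for ((back) = (lh)->next;                                                    \
         (back) != (lh) && ((item) = MT_LIST_ELEM((back), __typeof__(item), member), 1); \
         ((item) == NULL ? unlink_node(back) : (void)0), (back) = (back)->next)

struct task { int id; struct mt_list list; };

int main(void)
{
    struct mt_list head, *back;
    struct task a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 }, *t;
    struct task *all[] = { &a, &b, &c };

    MT_LIST_INIT(&head);
    for (int i = 0; i < 3; i++) {            /* append a, b, c */
        all[i]->list.prev = head.prev;
        all[i]->list.next = &head;
        head.prev->next = &all[i]->list;
        head.prev = &all[i]->list;
    }

    MT_LIST_FOR_EACH_ENTRY_SAFE(t, &head, list, back) {
        if (t->id == 2) {
            /* old API: MT_LIST_DELETE_SAFE(t); new API: null the iterator
             * so the element is not put back, then continue/break/free it */
            t = NULL;
            continue;
        }
    }

    for (struct mt_list *n = head.next; n != &head; n = n->next)
        printf("still in list: %d\n", MT_LIST_ELEM(n, struct task *, list)->id);
    return 0;
}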
This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (part of the internal API) and was ifdef'd out in mt_list.h.
A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:
- before: 35 Gbps, 3.5 Mpps, 7800% CPU
- after: 41 Gbps, 4.2 Mpps, 2900% CPU
Backend idle connections are purged periodically during the process
lifetime: an estimated number of needed connections is calculated and
the excess is removed.
Before this patch, the purge was done directly using the idle tree and
then the safe connection tree of a server instance. This has a major
drawback: it takes no account of any specific order and may remove
functional connections while leaving ones which will fail on the next
reuse.
The problem can be worse when using criteria to differentiate idle
connections, such as the SSL SNI. In this case, the purge may remove
connections with a high reuse rate while leaving connections whose
criteria were never matched once, thus drastically reducing the reuse
rate.
To improve this, introduce an alternative storage for idle connections,
used in parallel with the idle/safe trees. Now, each connection inserted
in one of these trees is also inserted in the new list at
`srv_per_thread.idle_conn_list`. This guarantees that recently used
connections are present at the end of the list.
During the purge, use this list instead of the idle/safe trees, removing
first the connections at the front of the list, which were not reused
recently. This ensures that connections which are frequently reused are
not purged and should increase the reuse rate, particularly if distinct
idle connection criteria are in use.
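As a toy illustration of the ordering property (a plain array instead of
the real per-thread list, purely to show why purging from the front is
safe):
#include <stdio.h>

#define NB_IDLE 5

int main(void)
{
    /* idle connections ordered from least recently used (front) to most
     * recently used (back), like srv_per_thread.idle_conn_list */
    int conn_id[NB_IDLE] = { 11, 12, 13, 14, 15 };
    int estimate = 3;                /* estimated number of needed connections */
    int excess = NB_IDLE - estimate;

    for (int i = 0; i < NB_IDLE; i++) {
        if (i < excess)
            printf("purge conn %d (idle the longest)\n", conn_id[i]);
        else
            printf("keep conn %d (recently reused)\n", conn_id[i]);
    }
    return 0;
}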
Define a new function _srv_add_idle(). This is a simple wrapper to
insert a connection in the server idle tree. This is reserved for simple
usage and requires the idle_conns lock to be held. In most cases,
srv_add_to_idle_list() should be used.
This patch does not have any functional change. However, it will help
with the next patch as idle connection will be always inserted in a list
as secondary storage along with idle/safe trees.
Small change of API for conn_delete_from_tree(). Now the connection
instance is taken as argument instead of its inner node.
No functional change is introduced with this commit. This slightly
simplifies the invocation of conn_delete_from_tree(). The most useful
change is that this function will be extended in the next patch to be
able to
remove the connection from its new idle list at the same time as in its
idle tree.
Implement reverse-connect server. This server type cannot instantiate
its own connection on transfer. Instead, it can only reuse connections
from its idle pool. These connections will be populated using the future
'tcp-request session attach-srv' rule.
A reverse-connect has no address. Instead, it uses a new custom server
notation with '@' character prefix. For the moment, only '@reverse' is
defined. An extra check is implemented to ensure the server is used in
an HTTP proxy.
A connection contains extra elements which are only used for the backend
side. Regroup their allocation and deallocation in two new functions
named conn_backend_init() and conn_backend_deinit().
No functional change is introduced with this commit. The new functions
are reused in place of manual alloc/dealloc in conn_new() / conn_free().
This patch will be useful for reverse connect support with connection
conversion from backend to frontend side and vice-versa.
Several CLI handlers use a server argument specified with the format
'<backend>/<server>'. The parsing of this argument is done in two
steps: first splitting the string on the '/' delimiter, and then using
get_backend_server() to retrieve the server instance.
Refactor these code sections with the following changes :
* splitting is reimplemented using the ist API
* get_backend_server() is removed. Instead, the already existing
proxy_be_by_name() and then server_find_by_name() are used; the
removed function duplicated their code.
No functional change occurs with this commit. However, it will be useful
to add new configuration options reusing the same '<backend>/<server>'
format for reverse connect.
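A rough sketch of the split step using plain C string handling instead of
the ist API; the lookup helpers are only mentioned in a comment, not
mocked:
#include <stdio.h>
#include <string.h>

/* split "<backend>/<server>" into its two parts; returns 0 on success */
static int split_be_srv(const char *arg, char *be, size_t be_sz,
                        char *sv, size_t sv_sz)
{
    const char *slash = strchr(arg, '/');

    if (!slash || slash == arg || !slash[1])
        return -1; /* missing or empty component */

    snprintf(be, be_sz, "%.*s", (int)(slash - arg), arg);
    snprintf(sv, sv_sz, "%s", slash + 1);
    return 0;
}

int main(void)
{
    char be[64], sv[64];

    if (split_be_srv("mybackend/srv1", be, sizeof(be), sv, sizeof(sv)) == 0)
        printf("backend='%s' server='%s'\n", be, sv);
    /* the real code would then call proxy_be_by_name(be) and
     * server_find_by_name(px, sv) to resolve the instances */
    return 0;
}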
During startup, when the "none" method for "init-addr" is evaluated, a
warning is emitted if a resolution failure was previously encountered. The
documentation of the "none" method states it should be used to ignore server
resolution failures and let the server start in DOWN state. However,
because a warning may be emitted, it is not possible to start HAProxy with
"zero-warning" option.
The same is true when "-dr" command line option is used. It is counter
intuitive and, in a way, this contradict what is specified in the
documentation.
So instead, a notice message is now emitted. At the end, if "-dr" command
line option is used or if "none" method is explicitly used, it means the
admin is agree with server resolution failures. There is no reason to emit a
warning.
This patch should fix the issue #2176. It could be backported to all stable
versions but backporting to 2.8 is probably enough for now.
srv->rid default value is set in _srv_parse_init() after the server is
successfully allocated using new_server().
This is wrong because new_server() can be used independently so rid value
assignment would be skipped in this case.
Fortunately, new_server() allocates server data using calloc() so
srv->rid is already set to 0 in practice. But if calloc() were replaced
by malloc() or another memory allocating function that doesn't
zero-initialize srv members, this could lead to rid being uninitialized
in some cases.
This should be backported in 2.8 with 61e3894dfe ("MINOR: server: add
srv->rid (revision id) value")
When support for 'namespace' keyword was added for the 'default-server'
directive in 22f41a2 ("MINOR: server: Make 'default-server' support
'namespace' keyword."), we forgot to copy the attribute from the parent
to the newly created server.
This resulted in the 'namespace' keyword being parsed without errors when
used from a 'default-server' directive, but in practice the option was
simply ignored.
There's no need to duplicate the netns struct because it is stored in
a shared list, so copying the pointer does the job.
This patch partially fixes GH #2038 and should be backported to all
stable versions.
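A minimal sketch of the fix idea, with stand-in types: because the netns
entry is kept in a shared list, a plain pointer copy is enough.
struct netns_entry;                      /* opaque: kept in a shared list */

struct srv_sketch { struct netns_entry *netns; };

void srv_settings_cpy_sketch(struct srv_sketch *srv,
                             const struct srv_sketch *defaults)
{
    /* no deep copy: both servers reference the same shared entry */
    srv->netns = defaults->netns;
}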
proxy default-server is a specific type of server that is not allocated
using new_server(): it is directly stored within the parent proxy
structure. However, since it may contain some default config options that
may be inherited by regular servers, it is also subject to dynamic members
(strings, structures..) that need to be deallocated when the parent proxy
is cleaned up.
Unfortunately, srv_drop() may not be used directly from p->defsrv since
this function is meant to be used on regular servers only (those created
using new_server()).
To circumvent this, we're splitting srv_drop() to make a new function
called srv_free_params() that takes care of the member cleaning which
originally takes place in srv_drop(). This function is exposed through
server.h, so it may be called from outside server.c.
Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy()
prevents any memory leaks due to dynamic parameters allocated when
parsing a default-server line from a proxy section.
This partially fixes GH #2173 and may be backported to 2.8.
[While it could also be relevant for other stable versions, the patch
won't apply due to architectural changes / name changes between 2.4 => 2.6
and then 2.6 => 2.8. Considering this is a minor fix that only makes
memory analyzers happy during deinit paths (at least for <= 2.8), it
might not be worth the trouble to backport it any further?]
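A self-contained sketch of the split, with stand-in members (the real
struct server has many more dynamic fields):
#include <stdlib.h>

struct server { char *id; char *cookie; /* ... other dynamic members ... */ };
struct proxy { struct server defsrv;    /* default-server is embedded */ };

/* frees a server's dynamically allocated members only */
void srv_free_params(struct server *srv)
{
    free(srv->id);
    free(srv->cookie);
}

/* full drop, reserved for servers allocated with new_server() */
void srv_drop_sketch(struct server *srv)
{
    srv_free_params(srv);
    free(srv);
}

/* proxy cleanup: defsrv is embedded, so only its members may be released */
void free_proxy_sketch(struct proxy *p)
{
    srv_free_params(&p->defsrv);
    free(p);
}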
There are several misuses in peers sections that are not detected during the
configuration parsing and that could lead to undefined behaviors or crashes.
First, only one listener is expected for a peers section. If several bind
lines or local peer definitions are used, an error is triggered. However, if
multiple addresses are set on the same bind line, there is no error while
only the last listener is properly configured. On 2.8, there is no
crash but side effects are hardly predictable. On older versions,
HAProxy crashes if an unconfigured listener is used.
Then, there is no check on remote peer names. It is unexpected to have
the same name for several remote peers. There is now a test, performed
during post-parsing, to verify that all remote peer names are unique.
Finally, server parsing options for the peers sections are changed to be
sure a port is always defined, and not a port range or a port offset.
This patch fixes the issue #2066. It could be backported to all stable
versions.
When a server is transitioning from UP to DOWN, a log message is
generated, e.g.: "Server backend_name/server_name is DOWN".
However since f71e064 ("MEDIUM: server: split srv_update_status() in two
functions"), the allocated buffer tmptrash which is used to prepare the
log message is not freed after it has been used, resulting in a small
memory leak each time a server goes DOWN because of an operational
change.
This is a 2.8 specific bug, no backport needed unless the above commit
gets backported.
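A sketch of the pattern and its fix; alloc_trash_chunk() and
free_trash_chunk() are the real helper names, but they are re-implemented
here in a trivial way just to keep the example standalone:
#include <stdio.h>
#include <stdlib.h>

struct buffer { char *area; size_t size; };

static struct buffer *alloc_trash_chunk(void)
{
    struct buffer *b = calloc(1, sizeof(*b));

    if (b) {
        b->size = 1024;
        b->area = calloc(1, b->size);
    }
    return b;
}

static void free_trash_chunk(struct buffer *b)
{
    if (b) {
        free(b->area);
        free(b);
    }
}

void report_srv_down(const char *be, const char *sv)
{
    struct buffer *tmptrash = alloc_trash_chunk();

    if (!tmptrash)
        return;
    snprintf(tmptrash->area, tmptrash->size, "Server %s/%s is DOWN", be, sv);
    /* ... send the message ... */
    free_trash_chunk(tmptrash); /* the call that was missing, hence the leak */
}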
Within srv_update_status subfunctions _op() and _adm(), each time tmptrash
is freed, we assign it to NULL to ensure it will not be reused.
However, within those functions it is not very useful given that tmptrash
is never checked against NULL except upon allocation through
alloc_trash_chunk(), which happens every time a new log message is
generated, sent, and then freed right away, so there are no code paths
that could lead to tmptrash being checked for reuse (tmptrash is
systematically overwritten since all log messages are independent from
each other).
This was raised by coverity, see GH #2162.
Remaining in drain mode after removing one of the server's admin flags leads
to this message being generated:
"Server name/backend is leaving forced drain but remains in drain mode."
However this is not necessarily true: the server might just be leaving
MAINT with the IDRAIN flag set, so the report is incorrect in this case.
(FDRAIN was not set so it cannot be cleared)
To prevent confusion around this message and to comply with the code
comment above it: we remove the "leaving forced drain" precision to
make the report suitable for multiple transitions.
Adding a new event type: SERVER_CHECK.
This event is published when a server's check state ought to be reported.
(check status change or check result)
The SERVER_CHECK event is provided as a server event with additional
data carrying relevant check context such as the check's result and
health.
Adding a new SERVER event in the event_hdl API.
SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)
SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be cast as a regular event_hdl_cb_data_server struct if additional
info is not needed.
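The "can be cast" property comes from the common data being placed first;
a simplified stand-in layout (not the real event_hdl structures) looks
like this:
struct cb_data_server {
    int  puid;              /* common, always-present server info */
    char name[64];
};

struct cb_data_server_admin {
    struct cb_data_server server;   /* common part first, so a cast to the
                                     * basic type stays valid */
    int old_admin;                  /* extra admin-state context */
    int new_admin;
};

/* a handler only interested in the common info can treat the advanced
 * event data as the basic type */
int handler_puid(const void *cb_data)
{
    const struct cb_data_server *d = cb_data;

    return d->puid;
}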
Reuse cb_data from STATE event to publish UP and DOWN events.
This saves some CPU time since the event is only constructed
once to publish STATE, STATE+UP or STATE+DOWN depending on the
state change.
Adding a new SERVER event in the event_hdl API.
SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)
SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be cast as a regular event_hdl_cb_data_server struct if additional
info is not needed.
Add a macro helper to publish server events to the global and
per-server subscription lists at once, since all server events
support both subscription modes.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never null and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One open question concerns global_now_ms, which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or in just using global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems; the difference might come from
smaller ones. Better not to change anything for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
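A small standalone illustration of these properties; tv_to_ns() and
ns_to_sec() mirror the helper names used in the tree but are redefined
locally for the example:
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

static uint64_t tv_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL + (uint64_t)tv->tv_usec * 1000ULL;
}

static uint64_t ns_to_sec(uint64_t ns)
{
    return ns / 1000000000ULL;
}

int main(void)
{
    struct timeval tv;
    uint64_t now_ns, later_ns;

    gettimeofday(&tv, NULL);
    now_ns = tv_to_ns(&tv);
    later_ns = now_ns + 1000000ULL; /* one millisecond later */

    /* 2^64 ns is roughly 584 years, so the counter never wraps in practice,
     * 0 can mean "not set", and two dates compare with a plain operator */
    printf("seconds=%llu later>now=%d\n",
           (unsigned long long)ns_to_sec(now_ns), later_ns > now_ns);
    return 0;
}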
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.