haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-09 08:37:04 +02:00

Author	SHA1	Message	Date
Patrick Hemmer	57926fe8a3	MINOR: peers: add peers keyword registration This adds support for registering keywords in the 'peers' section.	2023-07-20 18:12:44 +02:00
Christopher Faulet	7b3d38a633	MEDIUM: tree-wide: Change sc API to specify required free space to progress sc_need_room() now takes the required free space to receive more data as parameter. All calls to this function are updated accordingly. For now, this value is set but not used. When we are waiting for a buffer, 0 is used. So we expect to be unblocked ASAP. However this must be reviewed because SC_FL_NEED_BUF is probably enough in this case and this flag is already set if the input buffer allocation fails.	2023-05-05 15:44:23 +02:00
Christopher Faulet	7a48b72d39	MINOR: peers: Use the applet API to send message The peers applet now uses the applet API to send messages instead of the channel API. This way, it does not need to take care of requesting more room if it fails to put data into the channel's buffer.	2023-05-05 15:41:30 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One open question concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Christopher Faulet	3d949010bc	MEDIUM: peers: Use the sedesc to report and detect end of processing Just like for other applets, we now use the SE descriptor instead of the channel to report error and end-of-stream. We must just be sure to consume request data when we are waiting for the applet to be released.	2023-04-05 08:57:05 +02:00
Christopher Faulet	9a790f63ed	MINOR: stconn/channel: Move CF_READ_DONTWAIT into the SC and rename it The channel flag CF_READ_DONTWAIT is renamed to SC_FL_RCV_ONCE and moved into the stream-connector.	2023-04-05 08:57:05 +02:00
Christopher Faulet	b08c5259eb	MINOR: stconn: Always report READ/WRITE event on shutr/shutw It was done by hand by callers when a shutdown for read or write was performed. It is now always handled by the functions performing the shutdown. This way the callers don't take care of it. This will avoid some bugs.	2023-02-22 15:59:16 +01:00
Willy Tarreau	03926129b0	BUG/MEDIUM: peers: make "show peers" more careful about partial initialization Since 2.6 with commit `34e4085f8` ("MEDIUM: peers: Balance applets across threads") the initialization of a peers appctx may be postponed with a wakeup, causing some partially initialized appctx to be visible. The "show peers" command used to only care about peers without appctx, but now it must also take care of those with no stconn, otherwise it can occasionally crash while dumping them. This fix must be backported to 2.6. Thanks to Patrick Hemmer for reporting the problem.	2023-01-12 17:09:34 +01:00
Christopher Faulet	6e1bbc446b	REORG: channel: Rename CF_READ_NULL to CF_READ_EVENT CF_READ_NULL flag is not really useful and used. It is a transient event used to wakeup the stream. As we will see, all read events on a channel may be resumed to only one and are all used to wake up the stream. In this patch, we introduce CF_READ_EVENT flag as a replacement to CF_READ_NULL. There is no breaking change for now, it is just a rename. Gradually, other read events will be merged with this one.	2023-01-09 18:41:08 +01:00
Aurelien DARRAGON	f648767a4e	MINOR: peers: unused code path in process_peer_sync In process_peer_sync: a check was performed to know whether the peers section handler should kill itself if the corresponding proxy was not started on the current process. This logic was initially implemented in early 1.6 development to prevent some issues when peers were used in conjunction with nbproc > 1: `f83d3fe00a` MEDIUM: init: stop any peers section not bound to the correct process `46dc1ca` MEDIUM: peers: unregister peers that were never started But later in 1.6 dev, a new commit was introduced: `47c8c029db` MEDIUM: init: completely deallocate unused peers With the latter, the check implemented in `46dc1ca` ("MEDIUM: peers: unregister peers that were never started") will never succeed: it is dead code. Since nbproc support has been dropped in 2.5, things have changed a bit: `f83d3fe00a` logic was moved in mworker_cleanlisteners, but as in `46dc1ca` : peers task is safely destroyed before peers_fe is set to NULL. Conversely, peers_fe is first set by init_peers_frontend() before peers task is scheduled by peers_init_sync() in check_config_validity(). Again, it is safe to say that we will never reach !peers->peers_fe in process_peer_sync(): this self-killing mechanism is not relevant anymore. -- To cut a long story short: I stumbled on this while tracking down current signal api usage. This led me to a signal_unregister_handler() call performed in the aforementioned dead code. To me this code was potentially unsafe because signal_unregister_handler() is not thread safe and here it was used within a task initialized via task_new_anywhere(). So I decided to check how bad this could be (i.e. conditions to be met for this code to run).. and here we are.	2022-12-07 18:26:53 +01:00
Willy Tarreau	4ede46be4e	BUG/MINOR: peers: always update the stksess shard number on incoming updates If shards are in use, we must fill the shard number on incoming updates, otherwise some entries are assigned shard number zero, and may be broadcast everywhere once updated, instead of being sent only to the peers having the same shard number. This fixes commit `36d156564` ("MINOR: peers: Support for peer shards"). No backport is needed.	2022-11-29 18:06:42 +01:00
Willy Tarreau	b12be7c1bb	CLEANUP: peers: factor out the key len calculation in received updates In peer_treat_updatemsg(), the lower layers of the stick-table code are reimplemented, and the key length is never really known for an entry being processed, it depends on the type being parsed and the moment where it's done. This makes it quite difficult to stuff some shard number calculation there. This patch adds a keylen local variable that is always set to the length of the current key depending on its type. It takes this opportunity for reducing redundant expressions involving this length and always using the new variable instead, limiting the risk of errors. Arguably that code would have been way simpler by creating a dummy stktable_key and passing it to stksess_new() as done anywhere else, but let's not change all that a few days before the release.	2022-11-29 18:06:42 +01:00
Willy Tarreau	d05aa38950	CLEANUP: peers: fix format string for status messages (int signedness) In issue #1939, Ilya mentions that cppchecks warned about use of "%d" to report the status state that's locally stored as an unsigned int. While technically valid, this will never cause any trouble since in the end what we store there are the applet's states (just a few enum values). Better use %u anyway to silence this warning.	2022-11-24 15:32:20 +01:00
Christopher Faulet	4cfdcbbd19	BUILD: peers: Remove unused variables Since `0909f62266` ("BUG/MEDIUM: peers: messages about unkown tables not correctly ignored"), the 'sc' variable is no longer used in peer_treat_updatemsg() and peer_treat_definemsg() functions. So, we must remove them to avoid compilation warning. This patch must be backported with the commit above.	2022-11-18 16:40:56 +01:00
Emeric Brun	0909f62266	BUG/MEDIUM: peers: messages about unkown tables not correctly ignored Table definition messages and update messages are not correctly ignored if the table is not configured on the local peer. This is a bug because, on receiving those messages, the parser returns an error and the upper layer considers that the state of the peer's connection was modified (as is done in the case of a protocol error) and immediately switches the state machine to process the new state. But even though the message is silently ignored, because the connection's state doesn't change and we continue to process the next message, some processing is skipped: for instance the ALIVE flag is not set on the peer's connection as it should be after receiving any valid message. This results in a shutdown of the connection when the timeout elapses, as if no message had been received during this delay. This patch fixes the behavior: those messages are now silently ignored and the upper layer continues the processing as it does for any valid message. This bug appeared with the peers code rework in 2.0, so the fix should be backported as far as this version.	2022-11-18 15:54:33 +01:00
Willy Tarreau	7910825409	BUILD: peers: use __fallthrough in peer_io_handler() This avoids 7 build warnings when preprocessing happens before compiling with gcc >= 7.	2022-11-14 11:14:02 +01:00
Ilya Shipitsin	4a689dad03	CLEANUP: assorted typo fixes in the code and comments This is 32nd iteration of typo fixes	2022-10-30 17:17:56 +01:00
Emeric Brun	ac556082e7	MINOR: peers: handle multiple resync requests using shards We considered the resync process finished once a full resync request ended with the reception of the "resync-finish" message. But in the case of "shards", each node declared with a "shard" has only a partial view of the table, so the resync process ended while the originating peer's tables contained only a "shard" of the full content. This patch allows the entire tables to be retrieved by requesting a resync from all the different "shards". To do so, we don't commit the end of a resync process on receiving a "resync-finish" if the node is part of a "shard"; we only flag this peer and all peers using the same shard as "notup2date", as if we had received a "resync-partial" message, and we re-schedule a resync request as is done when receiving a "resync-partial" message. This way, the peers flagged "notup2date" won't be addressed in the next resync request round, and the next resync request will be sent to a shard not yet requested. On receiving a "resync-finish" message we also check whether all peers using "shards" are flagged "notup2date". It means that all peers have been addressed and we can consider that the resync process is now finished. Note also that the "resync request" scheduler already handles a timeout: if we are not able to retrieve a full resync after a delay, the resync process is ended. This patch should be backported to all versions handling "shard" on peer lines.	2022-10-24 10:55:53 +02:00
Frédéric Lécaille	36d1565640	MINOR: peers: Support for peer shards Add "shards" new keyword for "peers" section to configure the number of peer shards attached to such sections. This impacts all the stick-tables attached to the section. Add "shard" new "server" parameter to configure the peers which participate in the distribution of all the stick-tables contents. Each peer receives the stick-tables updates only for keys with this shard value as distribution hash. The "shard" value is stored in ->shard new server struct member. cfg_parse_peers() which is the function which is called to parse all the lines of a "peers" section is modified to parse the "shards" parameter stored in ->nb_shards new peers struct member. Add srv_parse_shard() new callback into server.c to parse the "shard" parameter. Implement stksess_getkey_hash() to compute the distribution hash for a stick-table key as the 64-bits xxhash of the key concatenated to the stick-table name. This function is called by stksess_setkey_shard(), itself called by the already implemented function which creates a new stick-table key (stksess_new()). Add ->idlen new stktable struct member to store the stick-table name length to not have to compute it each time a stick-table key hash is computed.	2022-10-24 10:55:53 +02:00
Willy Tarreau	76642223f0	MEDIUM: stick-table: switch the table lock to rwlock Right now a spinlock is used, but most accesses are for reads, so let's switch the lock to an rwlock and switch all accesses to exclusive locks for now. There should be no visible difference at this point.	2022-10-12 14:19:05 +02:00
Christopher Faulet	b372f16d35	BUG/MEDIUM: peers: Don't start resync on reload if local peer is not up-to-date On a reload, if the previous resync was not finished, the freshly old worker must not try to start a new resync. Otherwise, it will compete with the older workers, slowing down or blocking the resync. Only an up-to-date worker must try to perform a local resync. This patch must be backported as far as 2.0 (and maybe to 1.8 too).	2022-08-29 11:38:02 +02:00
Christopher Faulet	19a82b9495	BUG/MEDIUM: peers: Don't use resync timer when local resync is in progress When a worker is stopped, the resync timer is used to limit in time the connection stage to the new worker to perform the local resync. However, this timer must be stopped when the resync is in progress and it must be re-armed if the resync is interrupted (for instance because of another reload). Otherwise, if the resync is a bit long, an old worker may be killed too early. This bug was introduced by the commit `160fff665` ("BUG/MEDIUM: peers: limit reconnect attempts of the old process on reload"). It must be backported as far as 2.0.	2022-08-29 11:38:02 +02:00
Christopher Faulet	13db4bdbc6	BUG/MEDIUM: peers: Add connect and server timeut to peers proxy Only the client timeout was set. Nothing prevents a peer applet from stalling during a connect or while waiting for a message from a remote peer. To avoid any issue, it is important to also set connection and server timeouts. The connect timeout is set to 1s and the server timeout is set to 5s. This patch must be backported to all supported versions.	2022-08-29 11:38:02 +02:00
Willy Tarreau	8bd146d8af	MEDIUM: peers: limit the number of updates sent at once As seen in GH issue #1770, peers synchronization do not cope well with very large buffers because by default the only two reasons for stopping the processing of updates is either that the end was reached or that the buffer is full. This can cause high latencies, and even rightfully trigger the watchdog when the operations are numerous and slowed down by competition on the stick-table lock. This patch introduces a limit to the number of messages one may send at once, which now defaults to 200, regardless of the buffer size. This means taking and releasing the lock up to 400 times in a row, which is costly enough to let some other parts work. After some observation this could be backported to 2.6. If so, however, previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal return condition from handler" and "BUG/MINOR: applet: make the call_rate only count the no-progress calls" must be backported otherwise the call rate might trigger the looping protection.	2022-08-23 20:19:11 +02:00
Christopher Faulet	642170a653	BUG/MINOR: peers: Use right channel flag to consider the peer as connected When a peer opens a new connection to another peer, it is considered as connected when the hello message is sent. To do so, the peer applet was relying on CF_WRITE_PARTIAL channel flag. However it is not the right flag to use. This one is a transient flag. Depending on the scheduling, this flag may be removed by the stream before the peer has a chance to see it. Instead, CF_WROTE_DATA flag must be checked. This patch is related to the issue #1799. It must be backported as far as 2.0.	2022-08-03 09:56:38 +02:00
Christopher Faulet	160fff665e	BUG/MEDIUM: peers: limit reconnect attempts of the old process on reload When peers are configured and HAProxy is reloaded or restarted, a synchronization is performed between the old process and the new one. To do so, the old process connects to the new one. If the synchronization fails, it retries. However, there is no delay and reconnect attempts are not bounded. Thus, it may loop for a while, consuming all the CPU. Of course, it is unexpected, but it is possible. For instance, if the local peer is misconfigured, an infinite loop can be observed if the connection succeeds but not the synchronization. This prevents the old process from exiting, except if "hard-stop-after" option is set. To fix the bug, the reconnect is delayed. The local peer already has an expiration date to delay the reconnects. But it was not used in stopping mode. So we use it now. Thanks to the previous fix, the reconnect timeout is shorter in this case (500ms against 5s in running mode). In addition, we also use the peers resync expiration date to avoid retrying indefinitely. It is accurate because the new process, on its side, uses this timeout to switch from a local resync to a remote resync. This patch depends on "MINOR: peers: Use a dedicated reconnect timeout when stopping the local peer". It fixes the issue #1799. It should be backported as far as 2.0.	2022-08-03 09:56:38 +02:00
Christopher Faulet	ab4b094055	MINOR: peers: Use a dedicated reconnect timeout when stopping the local peer When a process is stopped or reloaded, a dedicated reconnect timeout is now used. For now, this timeout is not used because the current code retries immediately to reconnect to perform the local synchronization with the new local peer, if any. This patch is required to fix the issue #1799. It should be backported as far as 2.0 with next fixes.	2022-08-03 09:56:38 +02:00
Willy Tarreau	29ffe26733	MAJOR: task: use t->tid instead of ffsl(t->thread_mask) to take the thread ID At several places we need to figure the ID of the first thread allowed to run a task. Till now this was performed using my_ffsl(t->thread_mask) but since we now have the thread ID stored into the task, let's use it instead. This is tagged major because it starts to assume that tid<0 is strictly equivalent to atleast2(thread_mask), and that as such, among the allowed threads are the current one.	2022-07-01 19:15:14 +02:00
Willy Tarreau	50e77b2b85	CLEANUP: peers/cli: make peers_dump_peer() take an appctx instead of an stconn By having the appctx in argument this function wouldn't have experienced the previous bug. Better do that now to avoid proliferation of awkward functions.	2022-05-31 08:55:54 +02:00
Willy Tarreau	fc5059958f	CLEANUP: peers/cli: stop misusing the appctx local variable In the context of a CLI command, it's particularly not welcome to use an "appctx" variable that is not the current one. In addition it was created for use at exactly 6 places in 2 lines. Let's just remove it and stick to peer->appctx which is used elsewhere in the function and is unambiguous.	2022-05-31 08:53:25 +02:00
Willy Tarreau	ccea010104	BUG/MEDIUM: peers/cli: fix "show peers" crash Commit `d0a06d52f` ("CLEANUP: applet: use applet_put*() everywhere possible") replaced most accesses to the conn_stream with simpler accesses to the appctx. Unfortunately, in all the CLI functions using an appctx, one makes an exception where the appctx is not the caller's but the one being inspected! When no peers connection is active, the early exit immediately crashes. No backport is needed.	2022-05-31 08:49:29 +02:00
Willy Tarreau	c12b321661	CLEANUP: applet: rename appctx_cs() to appctx_sc() It returns a stream connector, not a conn_stream anymore, so let's fix its name.	2022-05-27 19:33:35 +02:00
Willy Tarreau	da30490b9c	CLEANUP: peers: rename all occurrences of stconn "cs" to "sc" In the applet, function arguments and local variables called "cs" were renamed to "sc" to avoid future confusion.	2022-05-27 19:33:35 +02:00
Willy Tarreau	475e4636bc	CLEANUP: cli: rename all occurrences of stconn "cs" to "sc" Function arguments and local variables called "cs" were renamed to "sc" in the various keyword handlers.	2022-05-27 19:33:35 +02:00
Willy Tarreau	cb086c6de1	REORG: stconn: rename conn_stream.{c,h} to stconn.{c,h} There's no more reason for keeping the code and definitions in conn_stream, let's move all that to stconn. The alphabetical ordering of include files was adjusted.	2022-05-27 19:33:35 +02:00
Willy Tarreau	5edca2f0e1	REORG: rename cs_utils.h to sc_strm.h This file contains all the stream-connector functions that are specific to application layers of type stream. So let's name it accordingly so that it's easier to figure what's located there. The alphabetical ordering of include files was preserved.	2022-05-27 19:33:35 +02:00
Willy Tarreau	74568cf023	CLEANUP: stconn: rename final state manipulation functions from cs_* to sc_* This applies the following renaming. It's a bit large but pretty mechanical: cs_state -> sc_state (enum) cs_alloc_ibuf() -> sc_alloc_ibuf() cs_is_conn_error() -> sc_is_conn_error() cs_opposite() -> sc_opposite() cs_report_error() -> sc_report_error() cs_set_state() -> sc_set_state() cs_state_bit() -> sc_state_bit() cs_state_in() -> sc_state_in() cs_state_str() -> sc_state_str()	2022-05-27 19:33:35 +02:00
Willy Tarreau	f61dd19284	CLEANUP: stconn: rename cs_{shut,chk}* to sc_* This applies the following renaming: cs_shutr() -> sc_shutr() cs_shutw() -> sc_shutw() cs_chk_rcv() -> sc_chk_rcv() cs_chk_snd() -> sc_chk_snd() cs_must_kill_conn() -> sc_must_kill_conn()	2022-05-27 19:33:35 +02:00
Willy Tarreau	90e8b455b7	CLEANUP: stconn: rename cs_cant_get() to se_need_more_data() An equivalent applet_need_more_data() was added as well since that function is mostly used from applet code. It makes it much clearer that the applet is waiting for data from the stream layer.	2022-05-27 19:33:35 +02:00
Willy Tarreau	99615ed85d	CLEANUP: stconn: rename cs_rx_room_{blk,rdy} to sc_{need,have}_room() The new name more clearly indicates that a stream connector cannot make any more progress because it needs room in the channel buffer, or that it may be unblocked because the buffer now has more room available. The testing function is sc_waiting_room(). This is mostly used by applets. Note that the flags will change soon.	2022-05-27 19:33:35 +02:00
Willy Tarreau	ea27f48c5a	CLEANUP: stconn: rename cs_{check,strm,strm_task} to sc_strm_* These functions return the app-layer associated with an stconn, which is a check, a stream or a stream's task. They're used a lot to access channels, flags and for waking up tasks. Let's just name them appropriately for the stream connector.	2022-05-27 19:33:34 +02:00
Willy Tarreau	40a9c32e3a	CLEANUP: stconn: rename cs_{i,o}{b,c} to sc_{i,o}{b,c} We're starting to propagate the stream connector's new name through the API. Most call places of these functions that retrieve the channel or its buffer are in applets. The local variable names are not changed in order to keep the changes small and reviewable. There were ~92 uses of cs_ic(), ~96 of cs_oc() (due to co_get() being less factorizable than ci_put), and ~5 accesses to the buffer itself.	2022-05-27 19:33:34 +02:00
Willy Tarreau	d0a06d52f4	CLEANUP: applet: use applet_put*() everywhere possible This applies the change so that the applet code stops using ci_putchk() and friends everywhere possible, for the much safer applet_put*() instead. The change is mechanical but large. Two or three functions used to have no appctx and a cs derived from the appctx instead, which was a reminiscence of old times' stream_interface. These were simply changed to directly take the appctx. No sensitive change was performed, and the old (more complex) API is still usable when needed (e.g. the channel is already known). The change touched roughly a hundred locations, with no less than 124 lines removed. It's worth noting that the stats applet, the oldest of the series, could get a serious lifting, as it's still very channel-centric instead of propagating the appctx along the chain. Given that this code doesn't change often, there's no emergency to clean it up but it would look better.	2022-05-27 19:33:34 +02:00
Willy Tarreau	cb04166525	CLEANUP: stconn: tree-wide rename stream connector flags CS_FL_* to SC_FL_* This follows the natural naming. There are roughly 100 changes, all totally trivial.	2022-05-27 19:33:34 +02:00
Willy Tarreau	7cb9e6c6ba	CLEANUP: stream: rename "csf" and "csb" to "scf" and "scb" These are the stream connectors, let's give them consistent names. The patch is large (405 locations) but totally trivial.	2022-05-27 19:33:34 +02:00
Willy Tarreau	4596fe20d9	CLEANUP: conn_stream: tree-wide rename to stconn (stream connector) This renames the "struct conn_stream" to "struct stconn" and updates the descriptions in all comments (and the rare help descriptions) to "stream connector" or "connector". This touches a lot of files but the change is minimal. The local variables were not even renamed, so there's still a lot of "cs" everywhere.	2022-05-27 19:33:34 +02:00
Christopher Faulet	9e3c8d5512	CLEANUP: peers: Remove unreachable code in peer_session_create() An error label is now unreachable in peer_session_create(). This patch should fix the issue #1704.	2022-05-18 09:04:53 +02:00
Maciej Zdeb	34e4085f8a	MEDIUM: peers: Balance applets across threads When creating a new applet for peer outgoing connection, we check the load on each thread. Threads with least applet count are preferred. With this solution we avoid a situation when many outgoing connections run on the same thread causing significant load on single CPU core.	2022-05-17 16:13:22 +02:00
Maciej Zdeb	d01be2ab13	MINOR: peers: Track number of applets run by thread Maintain number of peers applets run on all threads. It will be used in next patch for least loaded thread selection.	2022-05-17 16:13:22 +02:00
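
The wrap period quoted in the message of commit 69530f59ae above can be checked with a little arithmetic. The sketch below is not HAProxy code: it is a standalone calculation, and the helper name `wrap_period_seconds` is hypothetical. It assumes that now_ns is an unsigned 64-bit nanosecond counter (as the commit message describes) and, purely for comparison, treats now_ms as an unsigned 32-bit millisecond counter.

```python
# Standalone arithmetic check, not taken from the HAProxy sources.
# Assumption: a "year" is a Julian year of 365.25 days.
SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def wrap_period_seconds(bits: int, ticks_per_second: int) -> float:
    """Seconds until an unsigned counter of the given width wraps around."""
    return (2 ** bits) / ticks_per_second


# 64-bit counter incremented once per nanosecond (the now_ns case).
now_ns_wrap_years = wrap_period_seconds(64, 1_000_000_000) / SECONDS_PER_YEAR

# 32-bit counter incremented once per millisecond (assumed layout of now_ms).
now_ms_wrap_days = wrap_period_seconds(32, 1_000) / SECONDS_PER_DAY

print(f"64-bit ns counter wraps after about {now_ns_wrap_years:.1f} years")
print(f"32-bit ms counter wraps after about {now_ms_wrap_days:.1f} days")
```

Under these assumptions the first figure comes out just under 585 years, consistent with the commit message, while a 32-bit millisecond counter wraps within weeks, which is why the message mentions deliberately forcing that wrap shortly after boot.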


442 Commits