507 Commits

Author SHA1 Message Date
Olivier Houchard
b3ad7b6371 MEDIUM: peers: Give up if we fail to take locks in hot path
In peer_send_msgs(), give up in order to retry later if we failed at
getting the update read lock.
Similarly, in __process_running_peer_sync(), give up and just reschedule
the task if we failed to get the peer lock. There is an heavy contention
on both those locks, so we could spend a lot of time trying to get them.
This helps getting peers perform better under heavy load.
2025-05-02 15:27:55 +02:00
Aurelien DARRAGON
bd48e26a74 CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases
Since 91e785edc ("MINOR: stream: Rely on a per-stream max connection
retries value"), px->conn_retries may be ignored in the following cases:

 * proxy not part of a list which gets properly post-init (ie: main proxy
   list, log-forward list, sink list)
 * proxy lacking the CAP_FE capability

Documenting such cases where the px->conn_retries is set but effectively
ignored, so that we either remove ignored statements or fix them in
the future if they are really needed. In fact all cases affected here are
automomous applets that already handle the retries themselves so the fact
that 91e785edc made ->conn_retries ineffective should not be a big deal
anyway.
2025-04-29 21:21:19 +02:00
Willy Tarreau
1af592c511 MINOR: stick-table: use a separate lock label for updates
Too many locks were sharing STK_TABLE_LOCK making it hard to analyze.
Let's split the already heavily used update lock.
2025-04-24 14:02:22 +02:00
Aurelien DARRAGON
4194f756de MEDIUM: tree-wide: avoid manually initializing proxies
In this patch we try to use the proxy API init functions as much as
possible to avoid code redundancy and prevent proxy initialization
errors. As such, we prefer using alloc_new_proxy() and setup_new_proxy()
instead of manually allocating the proxy pointer and performing the
base init ourselves.
2025-04-10 22:10:31 +02:00
Emeric Brun
b02b8453d1 BUG/MEDIUM: peers: prevent learning expiration too far in futur from unsync node
This patch sets the expire of the entry to the max value in
configuration if the value showed in the peer update message
is too far in futur.

This should be backported an all supported branches.
2025-04-03 11:26:29 +02:00
Emeric Brun
00461fbfbf BUG/MINOR: peers: fix expire learned from a peer not converted from ms to ticks
This is has now impact currently since MS_TO_TICKS macro does nothing
but it will prevent further bugs.
2025-04-03 11:26:21 +02:00
Aurelien DARRAGON
1f73d3524d MINOR: stktable: implement "recv-only" table option
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-values such are conn_cur
from a remote peer.

To do this, add stktable struct flags  which default to NONE and enable
the RECV_ONLY flag on the table then "recv-only" keyword is found in the
table declaration. Then, when in peer_treat_updatemsg(), when handling
table updates, don't ignore data updates for local-only values if the flag
is set.
2024-12-05 12:15:24 +01:00
Willy Tarreau
ed55ff878d BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be a reconnect programmed upon signal
receipt at the wrapping date not having a working timeout.

This should be backported where it applies.
2024-11-15 15:44:05 +01:00
Willy Tarreau
d24768ab44 MINOR: protocol: create abnsz socket address family
For now it's the same as abns. We'll need to modify sock_unix_addrcmp(),
and a few other ones to support effective path length when dealing with
the \0. Let's check with Tristan's patch for this (upcoming patch).

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:50 +01:00
Willy Tarreau
78ac312bbd MEDIUM: protocol: make abns a custom unix socket address family
This is a pre-requisite to adding the abnsz socket address family:

in this patch we make use of protocol API rework started by 732913f
("MINOR: protocol: properly assign the sock_domain and sock_family") in
order to implement a dedicated address family for ABNS sockets (based on
UNIX parent family).

Thanks to this, it will become trivial to implement a new ABNSZ (for abns
zero) family which is essentially the same as ABNS but with a slight
difference when it comes to path handling (ABNS uses the whole sun_path
length, while ABNSZ's path is zero terminated and evaluation stops at 0)

It was verified that this patch doesn't break reg-tests and behaves
properly (tests performed on the CLI with show sess and show fd).

Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no
distinction needs to be made, real_family() is used to fetch the proper
real family type to handle it properly.

Both stream and dgram were converted, so no functional change should be
expected for this "internal" rework, except that proto will be displayed
as "abns_{stream,dgram}" instead of "unix_{stream,dgram}".

Before ("show sess" output):
  0x64c35528aab0: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=21,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

After:
  0x619da7ad74c0: proto=abns_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=22,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:25 +01:00
Aurelien DARRAGON
1e0920f855 BUG/MINOR: peers: local entries updates may not be advertised after resync
Since commit 864ac3117 ("OPTIM: stick-tables: check the stksess without
taking the read lock"), when entries for a local table are learned from
another peer upon resynchro, and this is the only peer haproxy speaks to,
local updates on such entries are not advertised to the peer anymore,
until they eventually expire and can be recreated upon local updates.

This is due to the fact that ts->seen is always set to 0 when creating
new entry, and also when touch_remote is performed on the entry.

Indeed, while 864ac3117 attempts to avoid useless updates, it didn't
consider entries learned from a remote peer. Such entries are exclusively
learned in peer_treat_updatemsg(): once the entry is created (or updated)
with new data, touch_remote is used to commit the change. However, unlike
touch_local, entries committed using touch_remote will not be advertised
to the peer from which the entry was just learned (otherwise we would
enter a looping situation). Due to the above patch, once an entry is
learned from the (unique) remote peer, 'seen' will be stuck to 0 so it
will never be advertised for its whole lifetime.

Instead, when entries are learned from a peer, we should consider that
the peer that taught us the entry has seen it.

To do this, let's set seen=1 in peer_treat_updatemsg() after calling
touch_remote(). This way, if we happen to perform updates on this entry,
it will be properly advertized to relevant peers. This patch should not
affect the performance gain documented in 864ac3117 given that the test
scenario didn't involved entries learned by remote peers, but solely
locally created entries advertised to remote peers upon updates.

This should be backported in 3.0 with 864ac3117.
2024-09-16 14:06:39 +02:00
Christopher Faulet
78b8b60030 BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
When the peer applet is waiting for a synchronisation with the global sync
task, we must notify it won't consume data. Otherwise, if some data are
already waiting in the input buffer, the applet will be woken up in loop and
this wil trigger the watchdog. Once synchronized, the applet is woken up. In
that case, the peer applet must indicate it is going to consume data again.

This patch should fix the issue #2656. It must be backported to 3.0.
2024-08-02 08:42:29 +02:00
Christopher Faulet
3e2d1476e6 BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
For a given peer, the synchronization of the learn state is no longer
performed in the peer appctx. It is delayed to be handled by the peers sync
task. It means that for a given peer, it is possible to have finished to
learn and only handle it after the appctx release. So the synchronization
may happen on a peer without appctx.

This was not tested and an unconditionnal wakeup on the appctx could lead to
a crash because of a NULL-deref. It may be experienced by running
reg-tests/peers/tls_basic_sync.vtc script in loop. The fix is obivous. In
sync_peer_learn_state(), we must omit to wakeup the appctx if it was already
released.

This patch should fix issue #2629. It must be backported to 3.0.
2024-07-05 12:14:27 +02:00
Ilia Shipitsin
a65c6d3574 CLEANUP: assorted typo fixes in the code and comments
This is 42nd iteration of typo fixes
2024-05-03 09:01:36 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Christopher Faulet
4b1a7ea66c BUG/MINOR: peers: Don't wait for a remote resync if there no remote peer
When a resync is needed, a local resync is first tried and if it does not
work, a remote resync is tried. It happens when the worker is started for
instance. There is a timeout to wait for the local resync, except for the
first start. And if the local resync fails or times out, the same timeout
is applied to the remote resync. This one is always applied, even if there
is no remote peer.

On the other hand, on reload, if the old worker has never performed its
resync, it does not try to resync the new worker. And here there is an
issue. On the first reload, when there is no remote peer, we must wait for
the resync timeout expiration to have a chance to resync the new worker. If
the reload happens too early, there is no resync at all. Concretly, after a
fresh start, if a reload happens in the first 5 seconds, there is no resync
with the new worker. The issue only concerns the first reload and affects
the second worker.

To fix the issue, we must only skip the remote resync if there is no remote
peer. This way, on a fresh start, the worker is immediately considered as
resync. The local reynsc is skipped because it is the first worker and the
remote resync is skipped because there is no remote peer.

This patch must be backported to all stable versions.
2024-04-25 21:47:02 +02:00
Christopher Faulet
0243691de1 REORG: peers: Rename all occurrences to 'ps' variable
In loops on the peer list in the code, the 'ps' variable was used as a
shortcut for the peer session. However, if mays be confusing with the peers
section too. So, all occurrences to 'ps' variable were renamed to 'peer'.
2024-04-25 18:29:58 +02:00
Christopher Faulet
fff5f63e10 BUG/MEDIUM: peers: Use atomic operations on peers flags when necessary
Peers flags are mainly used from the sync task. At least, it is only updated
by the sync task. However, there is one place where a peer may read these
flags, when the message marking the end of a synchro is sent.

So to be sure the value retrieved at this place is consistent, we must use
an atomic operation to read it. And of course, from the sync task, atomic
operations must be used to update peers flags. However, from the sync task,
there is no reason to use atomic operations to read flags because they
cannot be update from somewhere eles.
2024-04-25 18:29:58 +02:00
Christopher Faulet
608e23c495 MINOR: peers: Use a static variable to wait a resync on reload
When a process is reloaded, the old process must performed a synchronisation
with the new process. To do so, the sync task notify the local peer to
proceed and waits. Internally, the sync task used PEERS_F_DONOTSTOP flag to
know it should wait. However, this flag was only set/unset in a single
function. There is no real reason to set a flag to do so. A static variable
set to 1 when the resync starts and to 0 when it is finished is enough.
2024-04-25 18:29:58 +02:00
Christopher Faulet
bdcfacdb78 MINOR: peers: Add comment on processing functions of the sync task
Just add a comment on __process_running_peer_sync() and
__process_stopping_peer_sync() functions.
2024-04-25 18:29:58 +02:00
Christopher Faulet
697bd69efc REORG: peers: Move peer and peers flags in the corresponding header file
PEER_F_* and PEERS_F_ * flags were moved to <peer-t.h> header file. It is
mandatory to decode them from "flags" dev tool.
2024-04-25 18:29:58 +02:00
Christopher Faulet
31f544209d MINOR: peers: Reorder and rename PEERS flags
Peers flags were renamed and reordered, mainly to move flags used for
debugging purpose at the end.

PEERS_F_RESYNC_LOCAL and PEERS_F_RESYNC_REMOTE were also renamed to
PEERS_F_RESYNC_LOCAL_FINISHED and PEERS_F_RESYNC_REMOTE_FINISHED to be clear
on the fact the operation is finished when the flag is set.
2024-04-25 18:29:58 +02:00
Christopher Faulet
17c4030aaa MINOR: peers: Reorder and slightly rename PEER flags
There are too many holes in peer flags. So let's reorder them. In addition,
PEER_F_RESYNC_REQUESTED flag was renamed to PEER_F_DBG_RESYNC_REQUESTED to
clearly state it is a flag set for debugging purpose.

Finally, PEER_TEACH_RESET was replaced by PEER_TEACH_FLAGS and the bitwise
complement operator is now used on lines updating the peer flags. It is a
far more common way to do (in HAProxy code at least) and less surprising.
2024-04-25 18:29:58 +02:00
Christopher Faulet
9934eebc19 MINOR: peers: Rename PEERS_F_TEACH_COMPLETE to PEERS_F_LOCAL_TEACH_COMPLETE
PEERS_F_TEACH_COMPLETE flag is only used for the old local peer to let the
sync task know it can stop waiting during a soft-stop. So it is less
confusing to rename this flag to clearly state it concerns local peer only.
2024-04-25 18:29:57 +02:00
Christopher Faulet
45f4698725 MINOR: peers: Start learning for local peer before receiving messages
A local peer assigned for leaning can immediately start to learn, without
sending any request. So we can do that first, before receiving
messages. This way, only PEER_LR_ST_PROCESSING state is evaluating when
received messages are processed.

In addition, when the resync request is sent, we are sure it is for a remote
peer.
2024-04-25 18:29:57 +02:00
Christopher Faulet
c904f7b440 MEDIUM: peers: Use true states for the learn state of a peer
Some flags were used to define the learn state of a peer. It was a bit
confusing, especially because the learn state of a peer is manipulated from
the peer applet but also from the sync task. It is harder to understand the
transitions if it is based on flags than if it is based a dedicated state
based on an enum. It is the purpose of this patch.

Now, we can define the following rules regarding this learn state:

  * A peer is assigned to learn by the sync task
  * The learn state is then changed by the peer itself to notify the
    learning is in progress and when it is finished.
  * Finally, when the peer finished to learn, the sync task must acknowledge
    it by unassigning the peer.
2024-04-25 18:29:57 +02:00
Christopher Faulet
ea9bd6d075 MEDIUM: peers: Use true states for the peer applets as seen from outside
This patch is a cleanup of the recent change about the relation between a
peer and the applet used to deal with I/O. Three flags was introduced to
reflect the peer applet state as seen from outside (from the sync task in
fact). Using flags instead of true states was in fact a bad idea. This work
but it is confusing. Especially because it was mixed with LEARN and TEACH
peer flags.

So, now, to make it clearer, we are now using a dedicated state for this
purpose. From the outside, the peer may be in one of the following state
with respects of its applet:

 * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED).

 * the peer applet was created with a validated connection from the protocol
   perspective. But the sync task must synchronized it with the peers
   section. It is in starting state (PEER_APP_ST_STARTING).

 * The starting starting was acknowledged by the sync task, the peer applet
   can start to process messages. It is in running state
   (PEER_APP_ST_RUNNING).

 * The last peer applet was released and the associated connection
   closed. But the sync task must synchronized it with the peers section. It
   is in stopping state (PEER_APP_ST_STOPPING).

Functionnaly speaking, there is no true change here. But it should be easier
to understand now.

In addition to these changes, __process_peer_state() function was renamed
sync_peer_app_state().
2024-04-25 18:29:57 +02:00
Christopher Faulet
229755d8f5 MEDIUM: peers: Simplify the peer flags dealing with the connection state
Recently, some peer flags were added to deal with the connection state
(PEER_F_ST_*). 3 states were added:

  * RELEASED: Set when we forced to shutdown the peer session and no new
    session was created yet.

  * CONNECTED: Set when the peer has established connection and validated it
    from the peer protocol point of view

  * ACCEPTED: Set when the peer has accepted a connection and validated it
    from the peer protocol point of view

However, management of these pseudo states is a bit confusing. And it
appears there is no reason to have 2 flags to express there is a validated
peer session. CONNECTED state was used for a peer session on the frontend
side while ACCEPTED state was used for a peer session on the backend side.

So, there is now only one "connected" state and we test if the applet was
created on the frontend or the backend side to decide what to do, in
addition to the fact the peer is local or remote.

It is a transitionnal patch. True states will be created to deal with all
this stuff and corresponding flags will be removed.

This patch depends on the commit "MINOR: applet: Add a function to know the
sidde where an applet was created".
2024-04-25 18:29:57 +02:00
Christopher Faulet
0c1ea46fe0 MINOR: peers: Remove unused PEERS_F_RESYNC_PROCESS flag
This flag is now set or unset but never tested. So we can safely remove it.
2024-04-25 18:29:57 +02:00
Christopher Faulet
e35293b2d3 BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished
When a learning process is finished, partially or not, the event must be
processed by the sync task. It is important for the peer applet to wait in
this case, especially if the same peer is teaching to another peer, to be
sure to send the right resync finished message (full or partial).

Thanks to the previous patch, we can set PEER_F_WAIT_SYNCTASK_ACK flag on
the peer when a PEER_MSG_CTRL_RESYNCPARTIAL or PEER_MSG_CTRL_RESYNCFINISHED
message is received to be sure to stop the processing. Of course, we must
also take care to wake the peer up after having acknowledged the learn
status from the sync task.

This patch depends on the commit "BUG/MEDIUM: peers: Wait for sync task ack
when a resynchro is finished". Both must be backported if commit 9425aeaffb
("BUG/MAJOR: peers: Update peers section state from a thread-safe manner")
is backported.
2024-04-25 18:29:57 +02:00
Christopher Faulet
12014587fa MINOR: peers: Use a peer flag to block the applet waiting ack of the sync task
Since recent fixes on peers, some changes on a peer must be acknowledged
by the sync task before letting the peer applet processing messages.
Blocking conditions was based on a combination of flags. It was
errorprone. So, this patch introduces PEER_F_WAIT_SYNCTASK_ACK peer flag for
this purpose. This flag is set by the peer when it must wait for an ack from
the sync task. This sync task, on its side, must remove it and wake the peer
up.
2024-04-25 18:29:57 +02:00
Christopher Faulet
f80f1635ec MINOR: peers: Don't set TEACH flags on a peer from the sync task
The TEACH flags only concerns the peer applet. There is no reason to set it
from the sync task. It is confusing. And at the end, after some
refactoring/fixes, setting these flags directly from the peer applet will
allow us to immediatly performing the corresponding teach processing, while
for now we must wait the sync task acknowledges the changes.
2024-04-25 18:29:57 +02:00
Christopher Faulet
6380fd5eb9 MINOR: peers: Remove unused PEERS_F_RESYNC_REQUESTED flag
This flag was used for debugging purpose to know a resync was requested at
least once in the process life. Since the last bunch of fixes about the
peers locking mechanism, this info is now set per-peer. There is no reason
to still have it on peers too. So, just remove it.
2024-04-25 18:29:57 +02:00
Christopher Faulet
2a902e3188 BUG/MEDIUM: peers: Reprocess peer state after all session shutdowns
When a session is shut down, the peer is switched in released state
(PEER_F_ST_RELEASED) and the sync task must process it to eventually
perform some clean up, in case the peer was assigned to learn.

However, this was only true when the session was shut down from the peer
applet itself. This was not performed when it was shut down from the sync
task. It is now fixed.
2024-04-25 18:29:57 +02:00
Christopher Faulet
3541c54481 BUG/MEDIUM: peers: Automatically start to learn on local peer
The previous fix (c0b2015aae "BUG/MEDIUM: peers: Don't set
PEERS_F_RESYNC_PROCESS flag on a peer") was made due to lack of knowledge on
the peers. A local peer, when assigned to learn, must start to learn
immediately without sending any request. This happens on reload.

Thus, in this case, the PEER_F_LEARN_PROCESS flag must be set with
PEER_F_LEARN_ASSIGN flag from the sync task.

This patch must only be backported if the above commit is backported.
2024-04-25 18:29:57 +02:00
Christopher Faulet
d43f0e7f5a BUG/MEDIUM: peers: Fix state transitions of a peer
The commit 9425aeaffb ("BUG/MAJOR: peers: Update peers section state from a
thread-safe manner") introduced regressions about state transitions of a
peer.

A peer may be in a connected, accepted or released state. Before, changes for
these states were performed synchronously. Since the commit above, changes
are mainly performed in the sync process task.

The first regression was about the released then accepted state transition,
called the renewed state. In reality the state was always crushed by the
accepted state. After some review, the state was just removed to always
perform the cleanup in the sync process task before acknowledging the
connected or accepted states.

Then, a wakeup of the peer applet was missing from the sync process task
after the ack of connected or accepted states, blocking the applet.

Finally, when a peer is in released, connected or accepted state, we must
take care to wait the sync process task wakeup before trying to receive or
send messages.

This patch must only be backported if the above commit is backported.
2024-04-19 17:08:22 +02:00
Christopher Faulet
c0b2015aae BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer
The bug was introduced by commit 9425aeaffb ("BUG/MAJOR: peers: Update peers
section state from a thread-safe manner"). A peers flags was set on a peer
by error. Just remove it.

This patch must only be backported if the above commit is backported.
2024-04-19 17:08:22 +02:00
Aurelien DARRAGON
81a8a2cae1 MINOR: peers: stop relying on srv->addr to find peer port
Now that peers entirely rely on peer->srv for connection settings, and
that it was confirmed that it works properly thanks to previous commit,
let's finish what we started in f6ae258 ("MINOR: peers: rely on srv->addr
and remove peer->addr") and stop using srv->addr to find out peers port
and instead rely on srv->svc_port as it's already done for other proxy
types.
2024-04-18 11:18:26 +02:00
Christopher Faulet
494bc03ff7 BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached
When a peer applet is pushing updates, we limit the number of update sent at
once via a global parameter to not spend too much time in the applet. On
interrupt, we claimed for more room to be woken up quickly. However, this
statement is only true if something was pushed in the buffer. Otherwise,
with an empty buffer, if the stream itself is not woken up, the applet
remains also blocked because there is no send activity on the other side to
unblock it.

In this case, instead of requesting more room, it is sufficient to state the
applet have more data to send.

This patch must be backported as far as 2.6.
2024-04-18 09:17:03 +02:00
Christopher Faulet
ffe0874cfb MINOR: peer: Restore previous peer flags value to ease debugging
The last fixes on the peers to improve the locking mechanism introduced new
peer flags and the value of some old flags was changed. This was done in the
commit 9b78e33837 ("MINOR: peers: Add 2 peer flags about the peer learn
status"). But, to ease the debugging of the peers team, old values are
restored.

This patch must be backported with the commit above.
2024-04-16 11:35:47 +02:00
Christopher Faulet
9075a7e32f MEDIUM: peers: Only lock one peer at a time in the sync process function
Thanks to all previous changes, it is now possible to stop locking all peers
at once in the resync process function. Peer are locked one after the
other. Wen a peer is locked, another one may be locked when all peer sharing
the same shard must be updated. Otherwise, at anytime, at most one peer is
locked. This should significantly improve the situation.

This patch depends on the following patchs:

 * BUG/MAJOR: peers: Update peers section state from a thread-safe manner
 * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
 * MINOR: peers: Add functions to commit peer changes from the resync task
 * MINOR: peers: sligthly adapt part processing the stopping signal
 * MINOR: peers: Add flags to report the peer state to the resync task
 * MINOR: peers: Add 2 peer flags about the peer learn status
 * MINOR: peers: Split resync process function to separate running/stopping states

It may be good to backport it to 2.9. All the seris should fix the issue #2470.
2024-04-16 10:29:21 +02:00
Christopher Faulet
9425aeaffb BUG/MAJOR: peers: Update peers section state from a thread-safe manner
It is the main part of this series. In the peer applet, only the peer flags
are updated. It is now the responsibility of the resync process function to
check changes on each peer to update the peers section state accordingly.

Concretly, changes on the connection state (accepted, connected, released or
renewed) are first reported at the peer level and then handled in
__process_peer_state() function.

In the same manner, when the learn status of a peer changes, the peers
section state is no longer updated immediately. The resync task is woken up
to deal with this changes.

Thanks to these changes, the peers should be now really thread-safe.

This patch relies on the following ones:

  * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
  * MINOR: peers: Add functions to commit peer changes from the resync task
  * MINOR: peers: sligthly adapt part processing the stopping signal
  * MINOR: peers: Add flags to report the peer state to the resync task
  * MINOR: peers: Add 2 peer flags about the peer learn status
  * MINOR: peers: Split resync process function to separate running/stopping states

No bug was reported about the thread-safety of peers. Only a performance
issue was encountered with a huge number of peers (> 50). So there is no
reason to backport all these patches further than 2.9.
2024-04-16 10:29:21 +02:00
Christopher Faulet
ef066fa186 BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
Flags on the peers section state must be updated from a thread-safe manner.
It is not true today. With this patch we take care PEERS_F_RESYNC_REQUESTED
flag is only set by the resync task. To do so, a peer flag is used. This
flag is only set once and never removed. It is juste used for debugging
purpose. So it is enough to set it on a peer and be sure to report it on the
peers section when the sync task is executed.

This patch relies on previous ones:

 * MINOR: peers: Add functions to commit peer changes from the resync task
 * MINOR: peers: sligthly adapt part processing the stopping signal
 * MINOR: peers: Add flags to report the peer state to the resync task
 * MINOR: peers: Add 2 peer flags about the peer learn status
 * MINOR: peers: Split resync process function to separate running/stopping states
2024-04-16 10:29:21 +02:00
Christopher Faulet
bdf1634883 MINOR: peers: Add functions to commit peer changes from the resync task
For now, nothing is done in these functions. It is only a patch to prepare
the huge part of the refactoring about the locking mechanism of the peers.
These functions will be responsible to check peers state and their learn
status to update the peers section flags accordingly.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4a16560315 MINOR: peers: sligthly adapt part processing the stopping signal
The signal and the PEERS_F_DONOTSTOP flag are now handled in the loop on peers
to force sessions shutdown. We will need to loop on all peers to update their
state. It is easier this way.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4ca8a00955 MINOR: peers: Add flags to report the peer state to the resync task
As the previous patch, this patch is also part of the refactoring of peer
locking mechanisme. Here we add flags to represent a transitional state for
a peer. It will be the resync task responsibility to update the peers state
accordingly.

A peer may be in 4 transitional states:

  * accepted : a connection was accepted from a peer
  * connected: a connection to a peer was established
  * release  : a peer session was released
  * renewed  : a peer session was released because it was replaced by a new
               one. Concretly, this will be equivalent to released+accepted

If none of these flags is set, it means the transition, if any, was
processed by the resync task, or no transition happened.
2024-04-16 10:29:21 +02:00
Christopher Faulet
9b78e33837 MINOR: peers: Add 2 peer flags about the peer learn status
PEER_F_LEARN_PROCESS and PEER_F_LEARN_FINISHED flags are added to help to
fix locking issue about peers. Indeed, a peer is able to update the peers
"section" state under its own lock. Because the resync task locks all peers
at once, there is no conflict at this level. But there is nothing to prevent
2 peers to update the peers state in same time. So it seems there is no real
issue here, but there is a theorical thread-safety issue here. And it means
the locking mechanism of the peers must be reviewed.

In this context, the 2 flags above will help to move all update of the peers
state in the scope of resync task. Each peer will be able to update its own
state and the resync task will be responsible to update the peers state
accordingly.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4078893049 MINOR: peers: Split resync process function to separate running/stopping states
The function responsible to deal with resynchro between all peers is now split
in two subfunctions. The first one is used when HAProxy is running while the
other one is used in soft-stop case.

This patch is required to be able to refactor locking mechanism of the peers.
2024-04-16 10:29:21 +02:00
Willy Tarreau
d8c2f5c586 BUG/MEDIUM: peers/trace: fix crash when listing event types
Sending "trace peers event" on the CLI crashes because the event list
in the peers is not finished. This was introduced in 2.4 by commit
d865935f32 ("MINOR: peers: Add traces to peer_treat_updatemsg().")
so this must be backported to 2.4.
2024-04-12 17:59:55 +02:00
Willy Tarreau
4c1480f13b MINOR: stick-tables: mark the seen stksess with a flag "seen"
Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while when under load or attack it's frequent that
the update has not yet been propagated and wouldn't require anything.

This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
A great care is taken to avoid writes whenever possible.
2024-04-03 17:34:47 +02:00