Commit Graph

496 Commits

Author SHA1 Message Date
Christopher Faulet
78b8b60030 BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
When the peer applet is waiting for a synchronisation with the global sync
task, we must notify it won't consume data. Otherwise, if some data are
already waiting in the input buffer, the applet will be woken up in loop and
this wil trigger the watchdog. Once synchronized, the applet is woken up. In
that case, the peer applet must indicate it is going to consume data again.

This patch should fix the issue #2656. It must be backported to 3.0.
2024-08-02 08:42:29 +02:00
Christopher Faulet
3e2d1476e6 BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
For a given peer, the synchronization of the learn state is no longer
performed in the peer appctx. It is delayed to be handled by the peers sync
task. It means that for a given peer, it is possible to have finished to
learn and only handle it after the appctx release. So the synchronization
may happen on a peer without appctx.

This was not tested and an unconditionnal wakeup on the appctx could lead to
a crash because of a NULL-deref. It may be experienced by running
reg-tests/peers/tls_basic_sync.vtc script in loop. The fix is obivous. In
sync_peer_learn_state(), we must omit to wakeup the appctx if it was already
released.

This patch should fix issue #2629. It must be backported to 3.0.
2024-07-05 12:14:27 +02:00
Ilia Shipitsin
a65c6d3574 CLEANUP: assorted typo fixes in the code and comments
This is 42nd iteration of typo fixes
2024-05-03 09:01:36 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Christopher Faulet
4b1a7ea66c BUG/MINOR: peers: Don't wait for a remote resync if there no remote peer
When a resync is needed, a local resync is first tried and if it does not
work, a remote resync is tried. It happens when the worker is started for
instance. There is a timeout to wait for the local resync, except for the
first start. And if the local resync fails or times out, the same timeout
is applied to the remote resync. This one is always applied, even if there
is no remote peer.

On the other hand, on reload, if the old worker has never performed its
resync, it does not try to resync the new worker. And here there is an
issue. On the first reload, when there is no remote peer, we must wait for
the resync timeout expiration to have a chance to resync the new worker. If
the reload happens too early, there is no resync at all. Concretly, after a
fresh start, if a reload happens in the first 5 seconds, there is no resync
with the new worker. The issue only concerns the first reload and affects
the second worker.

To fix the issue, we must only skip the remote resync if there is no remote
peer. This way, on a fresh start, the worker is immediately considered as
resync. The local reynsc is skipped because it is the first worker and the
remote resync is skipped because there is no remote peer.

This patch must be backported to all stable versions.
2024-04-25 21:47:02 +02:00
Christopher Faulet
0243691de1 REORG: peers: Rename all occurrences to 'ps' variable
In loops on the peer list in the code, the 'ps' variable was used as a
shortcut for the peer session. However, if mays be confusing with the peers
section too. So, all occurrences to 'ps' variable were renamed to 'peer'.
2024-04-25 18:29:58 +02:00
Christopher Faulet
fff5f63e10 BUG/MEDIUM: peers: Use atomic operations on peers flags when necessary
Peers flags are mainly used from the sync task. At least, it is only updated
by the sync task. However, there is one place where a peer may read these
flags, when the message marking the end of a synchro is sent.

So to be sure the value retrieved at this place is consistent, we must use
an atomic operation to read it. And of course, from the sync task, atomic
operations must be used to update peers flags. However, from the sync task,
there is no reason to use atomic operations to read flags because they
cannot be update from somewhere eles.
2024-04-25 18:29:58 +02:00
Christopher Faulet
608e23c495 MINOR: peers: Use a static variable to wait a resync on reload
When a process is reloaded, the old process must performed a synchronisation
with the new process. To do so, the sync task notify the local peer to
proceed and waits. Internally, the sync task used PEERS_F_DONOTSTOP flag to
know it should wait. However, this flag was only set/unset in a single
function. There is no real reason to set a flag to do so. A static variable
set to 1 when the resync starts and to 0 when it is finished is enough.
2024-04-25 18:29:58 +02:00
Christopher Faulet
bdcfacdb78 MINOR: peers: Add comment on processing functions of the sync task
Just add a comment on __process_running_peer_sync() and
__process_stopping_peer_sync() functions.
2024-04-25 18:29:58 +02:00
Christopher Faulet
697bd69efc REORG: peers: Move peer and peers flags in the corresponding header file
PEER_F_* and PEERS_F_ * flags were moved to <peer-t.h> header file. It is
mandatory to decode them from "flags" dev tool.
2024-04-25 18:29:58 +02:00
Christopher Faulet
31f544209d MINOR: peers: Reorder and rename PEERS flags
Peers flags were renamed and reordered, mainly to move flags used for
debugging purpose at the end.

PEERS_F_RESYNC_LOCAL and PEERS_F_RESYNC_REMOTE were also renamed to
PEERS_F_RESYNC_LOCAL_FINISHED and PEERS_F_RESYNC_REMOTE_FINISHED to be clear
on the fact the operation is finished when the flag is set.
2024-04-25 18:29:58 +02:00
Christopher Faulet
17c4030aaa MINOR: peers: Reorder and slightly rename PEER flags
There are too many holes in peer flags. So let's reorder them. In addition,
PEER_F_RESYNC_REQUESTED flag was renamed to PEER_F_DBG_RESYNC_REQUESTED to
clearly state it is a flag set for debugging purpose.

Finally, PEER_TEACH_RESET was replaced by PEER_TEACH_FLAGS and the bitwise
complement operator is now used on lines updating the peer flags. It is a
far more common way to do (in HAProxy code at least) and less surprising.
2024-04-25 18:29:58 +02:00
Christopher Faulet
9934eebc19 MINOR: peers: Rename PEERS_F_TEACH_COMPLETE to PEERS_F_LOCAL_TEACH_COMPLETE
PEERS_F_TEACH_COMPLETE flag is only used for the old local peer to let the
sync task know it can stop waiting during a soft-stop. So it is less
confusing to rename this flag to clearly state it concerns local peer only.
2024-04-25 18:29:57 +02:00
Christopher Faulet
45f4698725 MINOR: peers: Start learning for local peer before receiving messages
A local peer assigned for leaning can immediately start to learn, without
sending any request. So we can do that first, before receiving
messages. This way, only PEER_LR_ST_PROCESSING state is evaluating when
received messages are processed.

In addition, when the resync request is sent, we are sure it is for a remote
peer.
2024-04-25 18:29:57 +02:00
Christopher Faulet
c904f7b440 MEDIUM: peers: Use true states for the learn state of a peer
Some flags were used to define the learn state of a peer. It was a bit
confusing, especially because the learn state of a peer is manipulated from
the peer applet but also from the sync task. It is harder to understand the
transitions if it is based on flags than if it is based a dedicated state
based on an enum. It is the purpose of this patch.

Now, we can define the following rules regarding this learn state:

  * A peer is assigned to learn by the sync task
  * The learn state is then changed by the peer itself to notify the
    learning is in progress and when it is finished.
  * Finally, when the peer finished to learn, the sync task must acknowledge
    it by unassigning the peer.
2024-04-25 18:29:57 +02:00
Christopher Faulet
ea9bd6d075 MEDIUM: peers: Use true states for the peer applets as seen from outside
This patch is a cleanup of the recent change about the relation between a
peer and the applet used to deal with I/O. Three flags was introduced to
reflect the peer applet state as seen from outside (from the sync task in
fact). Using flags instead of true states was in fact a bad idea. This work
but it is confusing. Especially because it was mixed with LEARN and TEACH
peer flags.

So, now, to make it clearer, we are now using a dedicated state for this
purpose. From the outside, the peer may be in one of the following state
with respects of its applet:

 * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED).

 * the peer applet was created with a validated connection from the protocol
   perspective. But the sync task must synchronized it with the peers
   section. It is in starting state (PEER_APP_ST_STARTING).

 * The starting starting was acknowledged by the sync task, the peer applet
   can start to process messages. It is in running state
   (PEER_APP_ST_RUNNING).

 * The last peer applet was released and the associated connection
   closed. But the sync task must synchronized it with the peers section. It
   is in stopping state (PEER_APP_ST_STOPPING).

Functionnaly speaking, there is no true change here. But it should be easier
to understand now.

In addition to these changes, __process_peer_state() function was renamed
sync_peer_app_state().
2024-04-25 18:29:57 +02:00
Christopher Faulet
229755d8f5 MEDIUM: peers: Simplify the peer flags dealing with the connection state
Recently, some peer flags were added to deal with the connection state
(PEER_F_ST_*). 3 states were added:

  * RELEASED: Set when we forced to shutdown the peer session and no new
    session was created yet.

  * CONNECTED: Set when the peer has established connection and validated it
    from the peer protocol point of view

  * ACCEPTED: Set when the peer has accepted a connection and validated it
    from the peer protocol point of view

However, management of these pseudo states is a bit confusing. And it
appears there is no reason to have 2 flags to express there is a validated
peer session. CONNECTED state was used for a peer session on the frontend
side while ACCEPTED state was used for a peer session on the backend side.

So, there is now only one "connected" state and we test if the applet was
created on the frontend or the backend side to decide what to do, in
addition to the fact the peer is local or remote.

It is a transitionnal patch. True states will be created to deal with all
this stuff and corresponding flags will be removed.

This patch depends on the commit "MINOR: applet: Add a function to know the
sidde where an applet was created".
2024-04-25 18:29:57 +02:00
Christopher Faulet
0c1ea46fe0 MINOR: peers: Remove unused PEERS_F_RESYNC_PROCESS flag
This flag is now set or unset but never tested. So we can safely remove it.
2024-04-25 18:29:57 +02:00
Christopher Faulet
e35293b2d3 BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished
When a learning process is finished, partially or not, the event must be
processed by the sync task. It is important for the peer applet to wait in
this case, especially if the same peer is teaching to another peer, to be
sure to send the right resync finished message (full or partial).

Thanks to the previous patch, we can set PEER_F_WAIT_SYNCTASK_ACK flag on
the peer when a PEER_MSG_CTRL_RESYNCPARTIAL or PEER_MSG_CTRL_RESYNCFINISHED
message is received to be sure to stop the processing. Of course, we must
also take care to wake the peer up after having acknowledged the learn
status from the sync task.

This patch depends on the commit "BUG/MEDIUM: peers: Wait for sync task ack
when a resynchro is finished". Both must be backported if commit 9425aeaffb
("BUG/MAJOR: peers: Update peers section state from a thread-safe manner")
is backported.
2024-04-25 18:29:57 +02:00
Christopher Faulet
12014587fa MINOR: peers: Use a peer flag to block the applet waiting ack of the sync task
Since recent fixes on peers, some changes on a peer must be acknowledged
by the sync task before letting the peer applet processing messages.
Blocking conditions was based on a combination of flags. It was
errorprone. So, this patch introduces PEER_F_WAIT_SYNCTASK_ACK peer flag for
this purpose. This flag is set by the peer when it must wait for an ack from
the sync task. This sync task, on its side, must remove it and wake the peer
up.
2024-04-25 18:29:57 +02:00
Christopher Faulet
f80f1635ec MINOR: peers: Don't set TEACH flags on a peer from the sync task
The TEACH flags only concerns the peer applet. There is no reason to set it
from the sync task. It is confusing. And at the end, after some
refactoring/fixes, setting these flags directly from the peer applet will
allow us to immediatly performing the corresponding teach processing, while
for now we must wait the sync task acknowledges the changes.
2024-04-25 18:29:57 +02:00
Christopher Faulet
6380fd5eb9 MINOR: peers: Remove unused PEERS_F_RESYNC_REQUESTED flag
This flag was used for debugging purpose to know a resync was requested at
least once in the process life. Since the last bunch of fixes about the
peers locking mechanism, this info is now set per-peer. There is no reason
to still have it on peers too. So, just remove it.
2024-04-25 18:29:57 +02:00
Christopher Faulet
2a902e3188 BUG/MEDIUM: peers: Reprocess peer state after all session shutdowns
When a session is shut down, the peer is switched in released state
(PEER_F_ST_RELEASED) and the sync task must process it to eventually
perform some clean up, in case the peer was assigned to learn.

However, this was only true when the session was shut down from the peer
applet itself. This was not performed when it was shut down from the sync
task. It is now fixed.
2024-04-25 18:29:57 +02:00
Christopher Faulet
3541c54481 BUG/MEDIUM: peers: Automatically start to learn on local peer
The previous fix (c0b2015aae "BUG/MEDIUM: peers: Don't set
PEERS_F_RESYNC_PROCESS flag on a peer") was made due to lack of knowledge on
the peers. A local peer, when assigned to learn, must start to learn
immediately without sending any request. This happens on reload.

Thus, in this case, the PEER_F_LEARN_PROCESS flag must be set with
PEER_F_LEARN_ASSIGN flag from the sync task.

This patch must only be backported if the above commit is backported.
2024-04-25 18:29:57 +02:00
Christopher Faulet
d43f0e7f5a BUG/MEDIUM: peers: Fix state transitions of a peer
The commit 9425aeaffb ("BUG/MAJOR: peers: Update peers section state from a
thread-safe manner") introduced regressions about state transitions of a
peer.

A peer may be in a connected, accepted or released state. Before, changes for
these states were performed synchronously. Since the commit above, changes
are mainly performed in the sync process task.

The first regression was about the released then accepted state transition,
called the renewed state. In reality the state was always crushed by the
accepted state. After some review, the state was just removed to always
perform the cleanup in the sync process task before acknowledging the
connected or accepted states.

Then, a wakeup of the peer applet was missing from the sync process task
after the ack of connected or accepted states, blocking the applet.

Finally, when a peer is in released, connected or accepted state, we must
take care to wait the sync process task wakeup before trying to receive or
send messages.

This patch must only be backported if the above commit is backported.
2024-04-19 17:08:22 +02:00
Christopher Faulet
c0b2015aae BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer
The bug was introduced by commit 9425aeaffb ("BUG/MAJOR: peers: Update peers
section state from a thread-safe manner"). A peers flags was set on a peer
by error. Just remove it.

This patch must only be backported if the above commit is backported.
2024-04-19 17:08:22 +02:00
Aurelien DARRAGON
81a8a2cae1 MINOR: peers: stop relying on srv->addr to find peer port
Now that peers entirely rely on peer->srv for connection settings, and
that it was confirmed that it works properly thanks to previous commit,
let's finish what we started in f6ae258 ("MINOR: peers: rely on srv->addr
and remove peer->addr") and stop using srv->addr to find out peers port
and instead rely on srv->svc_port as it's already done for other proxy
types.
2024-04-18 11:18:26 +02:00
Christopher Faulet
494bc03ff7 BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached
When a peer applet is pushing updates, we limit the number of update sent at
once via a global parameter to not spend too much time in the applet. On
interrupt, we claimed for more room to be woken up quickly. However, this
statement is only true if something was pushed in the buffer. Otherwise,
with an empty buffer, if the stream itself is not woken up, the applet
remains also blocked because there is no send activity on the other side to
unblock it.

In this case, instead of requesting more room, it is sufficient to state the
applet have more data to send.

This patch must be backported as far as 2.6.
2024-04-18 09:17:03 +02:00
Christopher Faulet
ffe0874cfb MINOR: peer: Restore previous peer flags value to ease debugging
The last fixes on the peers to improve the locking mechanism introduced new
peer flags and the value of some old flags was changed. This was done in the
commit 9b78e33837 ("MINOR: peers: Add 2 peer flags about the peer learn
status"). But, to ease the debugging of the peers team, old values are
restored.

This patch must be backported with the commit above.
2024-04-16 11:35:47 +02:00
Christopher Faulet
9075a7e32f MEDIUM: peers: Only lock one peer at a time in the sync process function
Thanks to all previous changes, it is now possible to stop locking all peers
at once in the resync process function. Peer are locked one after the
other. Wen a peer is locked, another one may be locked when all peer sharing
the same shard must be updated. Otherwise, at anytime, at most one peer is
locked. This should significantly improve the situation.

This patch depends on the following patchs:

 * BUG/MAJOR: peers: Update peers section state from a thread-safe manner
 * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
 * MINOR: peers: Add functions to commit peer changes from the resync task
 * MINOR: peers: sligthly adapt part processing the stopping signal
 * MINOR: peers: Add flags to report the peer state to the resync task
 * MINOR: peers: Add 2 peer flags about the peer learn status
 * MINOR: peers: Split resync process function to separate running/stopping states

It may be good to backport it to 2.9. All the seris should fix the issue #2470.
2024-04-16 10:29:21 +02:00
Christopher Faulet
9425aeaffb BUG/MAJOR: peers: Update peers section state from a thread-safe manner
It is the main part of this series. In the peer applet, only the peer flags
are updated. It is now the responsibility of the resync process function to
check changes on each peer to update the peers section state accordingly.

Concretly, changes on the connection state (accepted, connected, released or
renewed) are first reported at the peer level and then handled in
__process_peer_state() function.

In the same manner, when the learn status of a peer changes, the peers
section state is no longer updated immediately. The resync task is woken up
to deal with this changes.

Thanks to these changes, the peers should be now really thread-safe.

This patch relies on the following ones:

  * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
  * MINOR: peers: Add functions to commit peer changes from the resync task
  * MINOR: peers: sligthly adapt part processing the stopping signal
  * MINOR: peers: Add flags to report the peer state to the resync task
  * MINOR: peers: Add 2 peer flags about the peer learn status
  * MINOR: peers: Split resync process function to separate running/stopping states

No bug was reported about the thread-safety of peers. Only a performance
issue was encountered with a huge number of peers (> 50). So there is no
reason to backport all these patches further than 2.9.
2024-04-16 10:29:21 +02:00
Christopher Faulet
ef066fa186 BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
Flags on the peers section state must be updated from a thread-safe manner.
It is not true today. With this patch we take care PEERS_F_RESYNC_REQUESTED
flag is only set by the resync task. To do so, a peer flag is used. This
flag is only set once and never removed. It is juste used for debugging
purpose. So it is enough to set it on a peer and be sure to report it on the
peers section when the sync task is executed.

This patch relies on previous ones:

 * MINOR: peers: Add functions to commit peer changes from the resync task
 * MINOR: peers: sligthly adapt part processing the stopping signal
 * MINOR: peers: Add flags to report the peer state to the resync task
 * MINOR: peers: Add 2 peer flags about the peer learn status
 * MINOR: peers: Split resync process function to separate running/stopping states
2024-04-16 10:29:21 +02:00
Christopher Faulet
bdf1634883 MINOR: peers: Add functions to commit peer changes from the resync task
For now, nothing is done in these functions. It is only a patch to prepare
the huge part of the refactoring about the locking mechanism of the peers.
These functions will be responsible to check peers state and their learn
status to update the peers section flags accordingly.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4a16560315 MINOR: peers: sligthly adapt part processing the stopping signal
The signal and the PEERS_F_DONOTSTOP flag are now handled in the loop on peers
to force sessions shutdown. We will need to loop on all peers to update their
state. It is easier this way.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4ca8a00955 MINOR: peers: Add flags to report the peer state to the resync task
As the previous patch, this patch is also part of the refactoring of peer
locking mechanisme. Here we add flags to represent a transitional state for
a peer. It will be the resync task responsibility to update the peers state
accordingly.

A peer may be in 4 transitional states:

  * accepted : a connection was accepted from a peer
  * connected: a connection to a peer was established
  * release  : a peer session was released
  * renewed  : a peer session was released because it was replaced by a new
               one. Concretly, this will be equivalent to released+accepted

If none of these flags is set, it means the transition, if any, was
processed by the resync task, or no transition happened.
2024-04-16 10:29:21 +02:00
Christopher Faulet
9b78e33837 MINOR: peers: Add 2 peer flags about the peer learn status
PEER_F_LEARN_PROCESS and PEER_F_LEARN_FINISHED flags are added to help to
fix locking issue about peers. Indeed, a peer is able to update the peers
"section" state under its own lock. Because the resync task locks all peers
at once, there is no conflict at this level. But there is nothing to prevent
2 peers to update the peers state in same time. So it seems there is no real
issue here, but there is a theorical thread-safety issue here. And it means
the locking mechanism of the peers must be reviewed.

In this context, the 2 flags above will help to move all update of the peers
state in the scope of resync task. Each peer will be able to update its own
state and the resync task will be responsible to update the peers state
accordingly.
2024-04-16 10:29:21 +02:00
Christopher Faulet
4078893049 MINOR: peers: Split resync process function to separate running/stopping states
The function responsible to deal with resynchro between all peers is now split
in two subfunctions. The first one is used when HAProxy is running while the
other one is used in soft-stop case.

This patch is required to be able to refactor locking mechanism of the peers.
2024-04-16 10:29:21 +02:00
Willy Tarreau
d8c2f5c586 BUG/MEDIUM: peers/trace: fix crash when listing event types
Sending "trace peers event" on the CLI crashes because the event list
in the peers is not finished. This was introduced in 2.4 by commit
d865935f32 ("MINOR: peers: Add traces to peer_treat_updatemsg().")
so this must be backported to 2.4.
2024-04-12 17:59:55 +02:00
Willy Tarreau
4c1480f13b MINOR: stick-tables: mark the seen stksess with a flag "seen"
Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while when under load or attack it's frequent that
the update has not yet been propagated and wouldn't require anything.

This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
A great care is taken to avoid writes whenever possible.
2024-04-03 17:34:47 +02:00
Willy Tarreau
6a2f09de1c OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs()
In peer_send_msg(), we take a lock before calling
peer_send_teach_process_msgs because of the check on the flags and update
indexes, and the function then drops it then takes it again just to resume
in the same situation, so that on return we can drop it again! Not only
this is absurd because it doubles the costs of taking the lock, it's also
totally inefficient because it takes a write lock while the only usage
that is done with it is to read the indexes! Let's drop the lock from
peer_send_teach_process_msgs() and move it explicitly in its only caller
around the condition, and turn it into a read lock only.
2024-04-03 09:34:08 +02:00
Willy Tarreau
ed45d13321 BUG/MEDIUM: stick-table: use the update lock when reading tables from peers
In 2.9, the stick-tables' locking was split between the lock used to
manipulate the contents (->lock) and the lock used to manipulate the
list of updates and the update indexes (->updt_lock). This was done
with commit 87e072eea5 ("MEDIUM: stick-table: use a distinct lock for
the updates tree"). However a part was overlooked in the peers code,
the parts that consult (and update) the indexes use the table's lock
instead of the update lock. It's surprising that it hasn't caused more
trouble. It's likely due to the fact that the tree nodes are not often
immediately freed and that their memory area remains connected to valid
nodes in the tree during peer_stksess_lookup(), while other parts only
check or update indexes, thus are not that critical.

This needs to be backported wherever the commit above is, thus logically
2.9.
2024-04-03 09:33:10 +02:00
Christopher Faulet
dcd917d972 MINOR: applet: Remove uselelss test on SE_FL_SHR/SHW flags
These both flags are set after releasing the applet, in
appctx_shut(). Concretly, it means the applet is shutdown for reads and
writes. Once set, the applet's I/O handler was no longer called. Tests on
these flags are useless. There is no chance to match them.
2024-02-14 14:22:36 +01:00
Aurelien DARRAGON
f6ae25858d MINOR: peers: rely on srv->addr and remove peer->addr
Similarly to the previous commit, we get rid of unused peer member.

peer->addr was only used to save a copy of the sever's addr at parsing
time. But instead of relying on an intermediate variable, we can actually
use server's address directly when initiating the peer session.

As with other streams created from server's settings (tcp/http, log, ring),
we should rely on srv->svc_port for the port part of the address. This
shouldn't change anything for peers since the address is fully resolved
at parsing time and runtime changes are not supported, but this should
help to make the code future-proof.
2023-12-21 14:22:27 +01:00
Christopher Faulet
a7777bbf79 BUG/MEDIUM: peers: fix partial message decoding
peer_recv_msg() may return because the message is incomplete without
checking if a shutdown is pending for the SC. The function relies on
co_getblk() to detect shutdowns. However, the message length decoding may be
interrupted if the multi-bytes integer is incomplete. In this case, the SC
is not check for shutdowns.

When this happens, this leads to an appctx spinning loop.

This patch should fix the issue #2373. It must be backported to 2.8.
2023-12-05 09:28:53 +01:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is 37th iteration of typo fixes
2023-11-23 16:23:14 +01:00
Aurelien DARRAGON
5158c0ff69 MEDIUM: stktable/peers: "write-to" local table on peer updates
In this patch, we add the possibility to declare on a table definition
("table" in peer section, or "stick-table" in proxy section) that we
want the remote/peer updates on that table to be pushed on a local
haproxy table in addition to the source table.

Consider this example:

  |peers mypeers
  |        peer local 127.0.0.1:3334
  |        peer clust 127.0.0.1:3333
  |        table t1.local type string size 10m store server_id,server_key expire 30s
  |        table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s

With this setup, we consider haproxy uses t1.local as cache/local table
for read and write operations, and that t1.clust is a remote table
containing datas processed from t1.local and similar tables from other
haproxy peers in a cluster setup. The t1.clust table will be used to
refresh the local/cache one via the "write-to" statement.

What will happen, is that every time haproxy will see entry updates for
the t1.clust table: it will overwrite t1.local table with fresh data and
will update the entry expiration timer. If t1.local entry doesn't exist
yet (key doesn't exist), it will automatically create it. Note that only
types that cannot be used for arithmetic ops will be handled, and this
to prevent processed values from the remote table from interfering with
computations based on values from the local table. (ie: prevent
cumulative counters from growing indefinitely).

"write-to" will only push supported types if they both exist in the source
and the target table. Be careful with server_id and server_key storage
because they are often declared implicitly when referencing a table in
sticking rules but it is required to declare them explicitly for them to
be pushed between a remote and a local table through "write-to" option.

Also note that the "write-to" target table should have the same type as
the source one, and that the key length should be strictly equal,
otherwise haproxy will raise an error due to the tables being
incompatibles. A table that is already being written to cannot be used
as a source table for a "write-to" target.

Thanks to this patch, it will now be possible to use sticking rules in
peer cluster context by using a local table as a local cache which
will be automatically refreshed by one or multiple remote table(s).

This commit depends on:
 - "MINOR: stktable: stktable_init() sets err_msg on error"
 - "MINOR: stktable: check if a type should be used as-is"
2023-11-03 17:30:30 +01:00
Christopher Faulet
60e7116be0 BUG/MEDIUM: peers: Fix synchro for huge number of tables
The number of updates sent at once was limited to not loop too long to emit
updates when the buffer size is huge or when the number of sync tables is
huge. The limit can be configured and is set to 200 by default. However,
this fix introduced a bug. It is impossible to syncrhonize two peers if the
number of tables is higher than this limit. Thus by default, it is not
possible to sync two peers if there are more than 200 tables to sync.

Technically speacking, a teaching process is finished if we loop on all tables
with no new update messages sent. Because we are limited at each call, the loop
is splitted on several calls. However the restart point for the next loop is
always the last table for which we emitted an update message. Thus with more
tables than the limit, the loop never reachs the end point.

Worse, in conjunction with the bug fixed by "BUG/MEDIUM: peers: Be sure to
always refresh recconnect timer in sync task", it is possible to trigger the
watchdog because the applets may be woken up in loop and leave requesting
more room while its buffer is empty.

To fix the issue, restart conditions for a teaching loop were changed. If
the teach process is interrupted, we now save the restart point, called
stop_local_table. It is the last evaluated table on the previous loop. This
restart point is reset when the teach process is finished.

In additionn, the updates_sent variable in peer_send_msgs() was renamed to
updates to avoid ambiguities. Indeed, the variable is incremented, whether
messages were sent or not.

This patch must be backported as far as 2.6.
2023-10-20 14:32:12 +02:00
Christopher Faulet
cebeab3d20 BUG/MEDIUM: peers: Be sure to always refresh recconnect timer in sync task
A sync task used to manage reconnect, sessions creation or shutdown and data
synchronization is responsible to refresh reconnect and heartbeat timers for
each remote peers and trigger applets wakeup. These timers are used to
refresh the sync task timeer itself. Thus it is important to take care to
always properly refresh them.

However, when there are some data to push, the reconnect timer is not
checked. It may be expired and not refreshed. In this case, an expired timer
may be used to the sync task, leading to a storm of wakeups. The sync task
is woken up in loop because its timer is in the past, waking up Peer applets
at each time.

To fix the issue, the peer's reconnect timer is now refresh to the default
reconnect timeout, if necessary, when there are some data to push.

This patch must be backported to all stable versions.
2023-10-19 15:26:43 +02:00
Willy Tarreau
45eeaad45f MEDIUM: peers: drop the stick-table lock before entering peer_send_teachmsgs()
The function drops the lock very early, and the only operations that
are performed on the entry code are updating the current peer's
last_local_table, which doesn't need to be protected. Thus it's
easier to drop the lock before entering the function and it further
limits its scope.

This has raised the peak RPS from 2050 to 2355k/s with a peers section on
the 80-core machine.
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. It required to add an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now has to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
2023-08-11 19:03:35 +02:00