Properly handle memory allocation failures, by checking the return value
for pool_alloc(), and if it fails, make sure that the caller will take
it into account.
The only use of pool_alloc() in fwlc is to allocate the tree elements in
order to properly queue the server into the ebtree, so if that
allocation fails, just schedule the requeue tasklet, which will try
again until it eventually succeeds.
This should be backported to 3.2.
This should fix github issue #3143.
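As a minimal sketch of the pattern described above (not the actual fwlc
code), the allocation result is checked and, on failure, the server is
left unqueued with a retry scheduled. The structures, the alloc_fn hook
and the requeue_pending flag are hypothetical stand-ins for pool_alloc()
and the requeue tasklet:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real structures */
struct tree_elt {
	unsigned int key;
};

struct server {
	struct tree_elt *tree_elt;  /* NULL while the server is not queued */
	bool requeue_pending;       /* stands in for waking the requeue tasklet */
};

/* Allocator signature, so that a failure can be simulated */
typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, to simulate memory exhaustion */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Try to queue <s> under <key>. Returns true on success. On allocation
 * failure nothing is dereferenced: the server stays unqueued and a
 * retry is requested instead.
 */
bool queue_server(struct server *s, unsigned int key, alloc_fn alloc)
{
	struct tree_elt *elt;

	assert(s && alloc);
	elt = alloc(sizeof(*elt));
	if (!elt) {
		s->requeue_pending = true;  /* try again later */
		return false;
	}
	elt->key = key;
	s->tree_elt = elt;
	s->requeue_pending = false;
	return true;
}
```

The caller only has to look at the return value, or at tree_elt being
NULL, to know that the server is not in the tree yet.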
Modify fwlc_srv_reposition() so that it does not assume that the server
was already queued, and so make it so it works even if s->tree_elt is
NULL.
While the server will usually be queued, there is an unlikely
possibility that when the server attempted to get queued when it got up,
it failed due to a memory allocation failure, and it just expects the
server_requeue tasklet to run to take care of that later.
This should be backported to 3.2.
This is part of an attempt to fix github issue #3143.
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:
sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)
81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.
It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.
It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
Always set unusable if we could not use a server, instead of doing it in
each branch.
This should be backported to 3.2 after e28e647fef43e5865c87f328832fec7794a423e5
is backported.
In fwlc_get_next_server(), if a server to avoid has been provided, and
we have to ignore it, don't forget to increase the number of unusable
servers, otherwise we may end up ignoring it over and over, never
switching to another server, in an infinite loop until the process gets
killed.
This hopefully fixes Github issues #3004 and #3014.
This should be backported to 3.2.
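The effect of the missing increment can be illustrated with a simplified
walk over a circular array (the real code walks tree elements and lists;
pick_server() and its fields are hypothetical). Every rejected server,
including the one to avoid, is counted, so the walk is bounded. Returning
NULL when nothing else is usable is a choice made for this sketch only:

```c
#include <assert.h>
#include <stddef.h>

struct server {
	int usable;  /* non-zero if the server can take a connection */
};

/* Pick the next usable server from a circular array of <nb> servers,
 * starting at index <start> and skipping <avoid> (may be NULL). Each
 * rejected server increments <unusable>, so the walk stops after at
 * most <nb> rejections. Returns NULL when no other server can be used.
 */
struct server *pick_server(struct server *srv, size_t nb, size_t start,
                           const struct server *avoid)
{
	size_t unusable = 0;
	size_t i = start;

	assert(srv && nb > 0);
	while (unusable < nb) {
		struct server *s = &srv[i % nb];

		if (s != avoid && s->usable)
			return s;
		unusable++;  /* also counted when s == avoid */
		i++;
	}
	return NULL;
}
```

Without the increment on the avoided server, a farm whose only usable
server is the avoided one would never reach the exit condition.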
In fwlc_srv_reposition(), set the server's tree_elt while we still hold
the lbprm read lock. While it was protected from concurrent
fwlc_srv_reposition() calls by the server's lb_lock, it was not from
dequeuing/requeuing that could occur if the server gets down/up or its
weight is changed, and that would lead to inconsistencies, and the
watchdog killing the process because it is stuck in an infinite loop in
fwlc_get_next_server().
This hopefully fixes github issue #2990.
This should be backported to 3.2.
On ARM with 80 cores and a single server, it's sometimes possible to see
a segfault in fwlc_get_next_server() around 600-700k RPS. It also happens,
though rarely, on x86 with 128 threads with the same config around 1M
rps. It turns out that in fwlc_get_next_server(), before calling
fwlc_srv_reposition(), we have to drop the lock and that one takes it
back again.
The problem is that anything can happen to our node during this time,
and it can be freed. Then when continuing our work, we later iterate
over it and its next to find a node with an acceptable key, and by
doing so we can visit either uninitialized memory or simply nodes that
are no longer in the tree.
A first attempt at fixing this consisted in artificially incrementing
the elements count before dropping the lock, but that turned out to be
even worse because other threads could loop forever on such an element
looking for an entry that does not exist. Maintaining a separate
refcount didn't work well either, and it required dealing with the
memory release while dropping it, which is really not convenient.
Here we're taking a different approach consisting in simply not
trusting this node anymore and going back to the beginning of the
loop, as is done at a few other places as well. This way we can
safely ignore the possibly released node, and the test runs reliably
both on the arm and the x86 platforms mentioned above. No performance
regression was observed either, likely because this operation is quite
rare.
No backport is needed since this appeared with the leastconn rework
in 3.2.
Since commit 9fe72bba3 ("MAJOR: leastconn; Revamp the way servers are
ordered."), there's no way to escape the loop visiting the mt_list heads
in fwlc_get_next_server if all servers in the list are saturated,
resulting in a watchdog panic. It can be reproduced with this config
and injecting with more than 2 concurrent conns:
    balance leastconn
    server s1 127.0.0.1:8000 maxconn 1
    server s2 127.0.0.1:8000 maxconn 1
Here we count the number of saturated servers that were encountered, and
escape the loop once the number of remaining servers exceeds the number
of saturated ones. No backport is needed since this arrived in 3.2.
Machines lacking CAS8B/DWCAS emit a warning in lb_fwlc.c when built
without threads, due to declaration ordering. Let's just move the variable
declaration into the block that uses it as a last variable. No
backport is needed.
It may happen that the server is going down, and fwlc_srv_reposition()
is still called, because streams still attached to the server are
being terminated.
So in fwlc_srv_reposition(), just do nothing if we've been removed from
the tree.
This should fix github issue #2919.
This should not be backported, unless commit
9fe72bba3cf3484577fa1ef00723de08df757996 is also backported.

For leastconn, servers used to just be stored in an ebtree.
Each server would be one node.
Change that so that nodes contain multiple mt_lists. Each list
will contain servers that share the same key (typically meaning
they have the same number of connections). Using mt_lists means
that as long as tree elements already exist, moving a server from
one tree element to another no longer requires the lbprm write
lock.
We use multiple mt_lists to reduce the contention when moving
a server from one tree element to another. A list in the new
element will be chosen randomly.
We no longer remove a tree element as soon as it no longer
contains any server. Instead, we keep a list of all elements,
and when we need a new element, we look at that list only if it
contains a number of elements already, otherwise we'll allocate
a new one. Keeping nodes in the tree ensures that we very
rarely have to take the lbprm write lock (as it only happens
when we're moving the server to a position for which no
element is currently in the tree).
The number of mt_lists used is defined as FWLC_NB_LISTS.
The number of tree elements we want to keep is defined as
FWLC_MIN_FREE_ENTRIES, both in defaults.h.
The values used were picked after experimentation, and
seem to be the best trade-off between performance and memory
usage.
Doing that gives a good performance boost when a lot of
servers are used.
With a configuration using 500 servers, before that patch,
about 830000 requests per second could be processed; with
that patch, about 1550000 requests per second are
processed, on a 64-core AMD, using 1200 concurrent connections.
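The layout described above can be sketched as follows. This is only an
illustration: plain singly-linked lists replace mt_lists, there is no
locking, the list index is passed by the caller instead of being chosen
randomly, and NB_LISTS is an arbitrary value rather than the real
FWLC_NB_LISTS:

```c
#include <assert.h>
#include <stddef.h>

#define NB_LISTS 4  /* illustrative only */

struct server {
	struct server *next;  /* link within one list */
	unsigned int key;
};

/* One tree element per key; servers sharing that key are spread over
 * several lists to reduce contention.
 */
struct tree_elt {
	unsigned int key;
	struct server *lists[NB_LISTS];
	size_t nb_servers;
};

/* Attach <s> to one of the lists of <elt> */
void elt_add(struct tree_elt *elt, struct server *s, size_t list_idx)
{
	assert(elt && s);
	list_idx %= NB_LISTS;
	s->key = elt->key;
	s->next = elt->lists[list_idx];
	elt->lists[list_idx] = s;
	elt->nb_servers++;
}

/* Detach and return the first server found, visiting lists in order */
struct server *elt_pop(struct tree_elt *elt)
{
	assert(elt);
	for (size_t i = 0; i < NB_LISTS; i++) {
		struct server *s = elt->lists[i];

		if (s) {
			elt->lists[i] = s->next;
			s->next = NULL;
			elt->nb_servers--;
			return s;
		}
	}
	return NULL;  /* empty element: kept for reuse, not freed */
}
```

Moving a server between two existing elements only touches list heads,
which is why the tree itself rarely needs to be modified.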
When leastconn is used under many threads, there can be a lot of
contention on leastconn, because the same node has to be moved around
all the time (when picking it and when releasing it). In GH issue #2861
it was noticed that 46 threads out of 64 were waiting on the same lock
in fwlc_srv_reposition().
In such a case, the accuracy of the server's key becomes quite irrelevant
because nobody cares if the same server is picked twice in a row and the
next one twice again.
While other approaches in the past considered using a floating key to
avoid moving the server each time (which was not compatible with the
round-robin rule for equal keys), here a more drastic solution is needed.
What we're doing instead is that we turn this lock into a trylock. If we
can grab it, we do the job. If we can't, then we just wake up a server's
tasklet dedicated to this. That tasklet will then try again slightly
later, knowing that during this short time frame, the server's position
in the queue is slightly inaccurate. Note that any thread touching the
same server will also reposition it and save that work for next time.
Also if multiple threads wake the tasklet up, then that's fine, their
calls will be merged and a single lock will be taken in the end.
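The trylock-then-defer idea can be sketched with C11 atomics. This is
an illustration under assumptions, not the HAProxy implementation: an
atomic_flag stands in for the server's lock and a boolean stands in for
the tasklet wakeup, and all names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct server {
	atomic_flag lock;          /* stand-in for the server's lock */
	atomic_bool wake_pending;  /* stand-in for the tasklet wakeup */
	unsigned int repositions;  /* number of repositions performed */
};

void server_init(struct server *s)
{
	assert(s);
	atomic_flag_clear(&s->lock);
	atomic_init(&s->wake_pending, false);
	s->repositions = 0;
}

/* Returns true if the work was done now, false if it was deferred */
bool try_reposition(struct server *s)
{
	assert(s);
	if (atomic_flag_test_and_set(&s->lock)) {
		/* lock busy: do not wait, just ask for a later retry.
		 * Several such requests collapse into a single one.
		 */
		atomic_store(&s->wake_pending, true);
		return false;
	}
	s->repositions++;  /* the actual repositioning would go here */
	atomic_flag_clear(&s->lock);
	return true;
}

/* What the deferred handler would do when it runs */
bool run_tasklet(struct server *s)
{
	assert(s);
	if (!atomic_exchange(&s->wake_pending, false))
		return false;  /* nothing was requested */
	return try_reposition(s);
}
```

Two failed attempts followed by one run_tasklet() call result in a
single reposition, which matches the merging behaviour described above.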
Testing this on a 24-core EPYC 74F3 showed a significant performance
boost from 382krps to 610krps. The share of fwlc_srv_reposition() in the
profile reported by perf top dropped from 43% to 2.5%:
Before:
Overhead Shared Object Symbol
43.46% haproxy-master-inlineebo [.] fwlc_srv_reposition
21.20% haproxy-master-inlineebo [.] fwlc_get_next_server
0.91% haproxy-master-inlineebo [.] process_stream
0.75% [kernel] [k] ice_napi_poll
0.51% [kernel] [k] tcp_recvmsg
0.50% [kernel] [k] ice_start_xmit
0.50% [kernel] [k] tcp_ack
After:
Overhead Shared Object Symbol
30.37% haproxy [.] fwlc_get_next_server
2.51% haproxy [.] fwlc_srv_reposition
1.91% haproxy [.] process_stream
1.46% [kernel] [k] ice_napi_poll
1.36% [kernel] [k] tcp_recvmsg
1.04% [kernel] [k] tcp_ack
1.00% [kernel] [k] skb_release_data
0.96% [kernel] [k] ice_start_xmit
0.91% haproxy [.] conn_backend_get
0.82% haproxy [.] connect_server
0.82% haproxy [.] run_tasks_from_lists
Tested on an Ampere Altra with 64 aarch64 cores dedicated to haproxy,
the gain is even more visible (3.6x):
Before: 311-323k rps, 3.16-3.25ms, 6400% CPU
Overhead Shared Object Symbol
55.69% haproxy-master [.] fwlc_srv_reposition
33.30% haproxy-master [.] fwlc_get_next_server
0.89% haproxy-master [.] process_stream
0.45% haproxy-master [.] h1_snd_buf
0.34% haproxy-master [.] run_tasks_from_lists
0.32% haproxy-master [.] connect_server
0.31% haproxy-master [.] conn_backend_get
0.31% haproxy-master [.] h1_headers_to_hdr_list
0.24% haproxy-master [.] srv_add_to_idle_list
0.23% haproxy-master [.] http_request_forward_body
0.22% haproxy-master [.] __pool_alloc
0.21% haproxy-master [.] http_wait_for_response
0.21% haproxy-master [.] h1_send
After: 1.21M rps, 0.842ms, 6400% CPU
Overhead Shared Object Symbol
17.44% haproxy [.] fwlc_get_next_server
6.33% haproxy [.] process_stream
4.40% haproxy [.] fwlc_srv_reposition
3.64% haproxy [.] conn_backend_get
2.75% haproxy [.] connect_server
2.71% haproxy [.] h1_snd_buf
2.66% haproxy [.] srv_add_to_idle_list
2.33% haproxy [.] run_tasks_from_lists
2.14% haproxy [.] h1_headers_to_hdr_list
1.56% haproxy [.] stream_set_backend
1.37% haproxy [.] http_request_forward_body
1.35% haproxy [.] http_wait_for_response
1.34% haproxy [.] h1_send
And at similar loads, the CPU usage considerably drops (3.55x), as
well as the response time (10x):
After: 320k rps, 0.322ms, 1800% CPU
Overhead Shared Object Symbol
7.62% haproxy [.] process_stream
4.64% haproxy [.] h1_headers_to_hdr_list
3.09% haproxy [.] h1_snd_buf
3.08% haproxy [.] h1_process_demux
2.22% haproxy [.] __pool_alloc
2.14% haproxy [.] connect_server
1.87% haproxy [.] h1_send
> 1.84% haproxy [.] fwlc_srv_reposition
1.84% haproxy [.] run_tasks_from_lists
1.77% haproxy [.] sock_conn_iocb
1.75% haproxy [.] srv_add_to_idle_list
1.66% haproxy [.] http_request_forward_body
1.65% haproxy [.] wake_expired_tasks
1.59% haproxy [.] h1_parse_msg_hdrs
1.51% haproxy [.] http_wait_for_response
> 1.50% haproxy [.] fwlc_get_next_server
The cost of fwlc_get_next_server() naturally increases as the server
count increases, but now has no visible effect on updates. The load
distribution remains unchanged compared to the previous approach,
the weight still being respected.
For further improvements to the fwlc algo, please consult github
issue #881 which centralizes everything related to this algorithm.
This callback will be used to reposition a server to its expected
position regardless of the fact that it was taken or dropped. It
will only be used by supporting LB algos. For now, only fwlc defines
it and assigns it to fwlc_srv_reposition(). At the moment it's not
used yet.
For both proxies and servers, properly calculate queueslength, which is
the total number of elements in all queues (as they currently use only
one queue, it is equivalent to the number of elements in that queue),
and use it instead of the queue's length.
An optimization was brought in commit 5064ab6a9 ("OPTIM: lb-leastconn:
do not unlink the server if it did not change") to avoid locking the
server just to discover it did not move. However a mistake was made
because the operation involves a divide by a value that is read
outside of its usual lock, which makes it possible for it to be zero at
the exact moment we read it if another thread takes the server down
under the lbprm lock, resulting in a divide by zero.
Therefore we must check that the value is not zero there.
This must be backported to 2.4.
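One possible way to guard such a division is sketched below. The weight
is loaded exactly once so that the test and the division see the same
value. WEIGHT_SCALE, the field names and the choice of dividing by 1
when the weight is zero are assumptions made for this example:

```c
#include <assert.h>
#include <stdatomic.h>

#define WEIGHT_SCALE 256u  /* illustrative scale factor */

struct server {
	atomic_uint eweight;  /* may be set to 0 by another thread */
	atomic_uint served;   /* connections currently served */
};

/* Compute the ordering key without holding the lock */
unsigned int compute_key(struct server *s)
{
	unsigned int w, served;

	assert(s);
	w = atomic_load(&s->eweight);  /* single read, reused below */
	served = atomic_load(&s->served);
	return served * WEIGHT_SCALE / (w ? w : 1);
}
```

Reading the weight twice (once for the test, once for the division)
would reopen the race, since the value may change between both reads.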
This essentially reverts commit 2b4370078 ("MINOR: lb/api: let callers
of take_conn/drop_conn tell if they have the lock") that was merged
during 2.4 before the various locks could be eliminated at the lower
layers. Passing that information complicates the cleanup of the queuing
code and it's become useless.
The leastconn and roundrobin functions mention that the server's lock
must be held while this is not true at all and it is not used either.
The "first" algo doesn't mention anything about the need for locking,
so let's mention that it uses the lbprm lock.
Due to the two-phase server reservation, there are 3 calls to
fwlc_srv_reposition() per request, one during assign_server() to reserve
the slot, one in connect_server() to commit it, and one in process_stream()
to release it. However only one of the first two will change the key, so
it's needlessly costly to take the lock, remove a server and insert it
again at the same place when we can already figure we ought not to even
have taken the lock.
Furthermore, even when the server needs to move, there can be quite some
contention on the lbprm lock forcing the thread to wait. During this time
the served and nbpend server values might have changed, just like the
lb_node.key itself. Thus we measure the values again under the lock
before updating the tree. Measurements have shown that under contention
with 16 servers and 16 threads, 50% of the updates can be avoided there.
This patch makes the function compute the new key and compare it to
the current one before deciding to move the entry (and does it again
under the lock for the second test).
This removes between 40 and 50% of the tree updates depending on the
thread contention and the number of servers. The performance gain due
to this (on 16 threads) was:
    16 servers: 415 krps -> 440 krps (+6%, contention on lbprm)
     4 servers: 554 krps -> 714 krps (+29%, little contention)
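The double check can be sketched as two steps: a lockless pre-check,
then a second evaluation once the lock is held. The lock itself is left
to the caller here, and the key formula, the scale of 256 and all names
are simplified assumptions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct server {
	atomic_uint load;           /* served + pending connections */
	unsigned int weight;        /* must be non-zero in this sketch */
	unsigned int key;           /* key the server currently sits under */
	unsigned int tree_updates;  /* counts real tree modifications */
};

/* Key the server should have according to its current load */
unsigned int wanted_key(struct server *s)
{
	assert(s && s->weight);
	return atomic_load(&s->load) * 256u / s->weight;
}

/* Lockless pre-check: is it worth taking the lock at all? */
bool needs_move(struct server *s)
{
	return wanted_key(s) != s->key;
}

/* To be called with the tree lock held. The key is computed again
 * because the load may have changed while waiting for the lock.
 */
bool move_locked(struct server *s)
{
	unsigned int new_key = wanted_key(s);

	if (new_key == s->key)
		return false;  /* nothing to do after all */
	s->key = new_key;  /* stands in for dequeue + requeue */
	s->tree_updates++;
	return true;
}
```

A caller would typically run needs_move() first and only then take the
lock and call move_locked().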
One point worth thinking about is that it's not logical to update the
tree 2-3 times per request while it's only read once. Half to 2/3 of
these updates are not needed. An experiment consisting in subscribing
the server to a list and letting the readers reinsert them on the fly
showed further degradation instead of an improvement.
A better approach would probably consist in avoiding writes to shared
cache lines by having the leastconn nodes distinct from the servers,
with one node per value, and having them hold an mt-list with all the
servers having that number of connections. The connection count tree
would then be read-mostly instead of facing heavy writes, and most
write operations would be performed on 1-3 list heads which are way
cheaper to migrate than a tree node, and do not require updating the
last two updated neighbors' cache lines.
The operations are only an insert and a delete into the LB tree, which
doesn't require the server's lock at all as the lbprm lock is already
held. Let's drop it. Just for the sake of cleanness, given that the
served and nbpend values used to be atomically updated, we'll use an
atomic load to read them.
The two algos defining these functions (first and leastconn) do not need the
server's lock. However it's already present in pendconn_process_next_strm()
so the API must be updated so that the functions may take it if needed and
that the callers indicate whether they already own it.
As such, the call places (backend.c and stream.c) now do not take it
anymore, queue.c was unchanged since it's already held, and both "first"
and "leastconn" were updated to take it if not already held.
A quick test on the "first" algo showed a jump from 432 to 565k rps by
just dropping the lock in stream.c!
Depending on the context, the current eweight or the next one must be used
to reposition a server in the tree. When the server state is updated, for
instance its weight, the next eweight must be used because it is not yet
committed. However, when the server is used, under normal conditions, the
current eweight must be used.
In fact, it is only a bug on 1.8. On newer versions, the changes on a
server are performed synchronously. But it is safer to rely on the right
eweight value to avoid any future bugs.
On 1.8, it is important to do so, because the server state is updated
and committed inside the rendez-vous point. Thus, the next server state may
be out of sync with the current state for a short time, while waiting for
all threads to join the rendez-vous point. It is especially a problem if the
next eweight is set to 0, because in that case it must not be used to
reposition the server in the tree, as this would lead to a divide by 0.
This patch must be backported as far as 1.8.
Leastconn has the nice property of being able to sort servers by their
current usage. It's really a shame to force all requests into the backend
queue when the algo would be able to also consider their current queue.
In order not to change existing behavior but extend it, this patch allows
leastconn to elect servers which are already full if they have an explicitly
configured maxqueue setting above zero and their queue hasn't reached that
threshold. This will significantly reduce the pressure in the backend queue
when queuing a lot with lots of servers.
A test on 8 threads with 100 servers configured with maxconn 1 jumped
from 165krps to 330krps with maxqueue 15 with this patch.
This partially undoes commit 82cd5c13a ("OPTIM: backend: skip LB when we
know the backend is full") but allows scaling much better even by setting
a single-digit maxqueue value. Some better heuristics could be used to
maintain the behavior of the bypass in the patch above, consisting in
keeping it if it's known that there is no server with a configured
maxqueue in the farm (or in the backend).
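The election rule described in this message can be written as a small
predicate. It is a sketch under assumptions: the field names are
illustrative, and a value of 0 is taken to mean "no limit" for maxconn
and "not explicitly configured" for maxqueue:

```c
#include <assert.h>
#include <stdbool.h>

struct server {
	unsigned int maxconn;   /* 0: no connection limit */
	unsigned int maxqueue;  /* 0: not explicitly configured */
	unsigned int served;    /* connections currently served */
	unsigned int queued;    /* requests waiting in the server's queue */
};

/* A server may be elected if it still has a free slot, or if it is
 * full but has an explicit maxqueue that its queue has not reached.
 */
bool server_electable(const struct server *s)
{
	assert(s);
	if (!s->maxconn || s->served < s->maxconn)
		return true;
	return s->maxqueue > 0 && s->queued < s->maxqueue;
}
```

With maxconn 1 and maxqueue 15, a full server stays electable until 15
requests are queued on it, instead of pushing everything to the backend
queue.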
When servers are queued into the leastconn tree, it's important to also
consider their queue length. There could be some servers with lots of
queued requests that we don't want to hammer with extra connections. In
order not to add extra stress to the LB algorithm, we don't update the
value when adding to the queue, only when updating the connection count
(i.e. picking from the queue or releasing a connection). This will be
sufficient to significantly improve the fairness in such situations.
This function doesn't change the tree, it only looks for the first
usable server, so let's do that under a read lock to limit the
situations like the ones described in issue #881 where finding a
usable server when dealing with lots of saturated ones can be
expensive. At least threads will now be able to look up in
parallel.
It's interesting to note that s->served is not incremented during the
server choice, nor is the server repositioned. So right now already,
nothing prevents multiple threads from picking the same server. This
will not cause a significant imbalance anyway given that the server
will automatically be repositioned at the right place, but this might
be something to improve in the future if it doesn't come with too high
a cost.
It also looks like the way a server's weight is updated could be
revisited so that the write lock gets tighter at the expense of a
short part of inconsistency between weights and servers still present
in the tree.
It was previously a spinlock, and it happens that a number of LB algos
only lock it for lookups, without performing any modification. Let's
first turn it to an rwlock and w-lock it everywhere. This is strictly
identical.
It was carefully checked that every HA_SPIN_LOCK() was turned to
HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to
HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other files include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in two, one which is the global struct definition and the
few macros and flags, and the rest containing the function prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.
Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.
The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
As discussed in issue #178, the change brought around 1.9-dev11 by commit
1eb6c55808 ("MINOR: lb: make the leastconn algorithm more accurate")
causes some harm in the situation it tried to improve. By always applying
the server's weight even for no connection, we end up always picking the
same servers for the first connections, so under a low load, if servers
only have either 0 or 1 connections, in practice the same servers will
always be picked.
This patch partially restores the original behaviour while still keeping
the spirit of the aforementioned patch. Now what is done is that servers
with no connections will always be picked first, regardless of their
weight, so they will effectively follow round-robin. Only servers with
one connection or more will see an accurate weight applied.
This patch was developed and tested by @malsumis and @jaroslawr who
reported the initial issue. It should be backported to 2.0 and 1.9.
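The resulting ordering can be sketched with a small key function. It is
not the actual code: WEIGHT_SCALE is an arbitrary constant, and counting
the connection about to be assigned (conns + 1) is an assumption about
what keeping "the spirit of the aforementioned patch" means here:

```c
#include <assert.h>

#define WEIGHT_SCALE 256u  /* illustrative scale factor */

/* Lower keys are picked first. Idle servers all share key 0 whatever
 * their weight, so they are served in round-robin order. Busy servers
 * get a key based on the load they would have once picked, divided by
 * their weight.
 */
unsigned int leastconn_key(unsigned int conns, unsigned int weight)
{
	assert(weight > 0);
	if (!conns)
		return 0;
	return (conns + 1) * WEIGHT_SCALE / weight;
}
```

For instance, two idle servers of weights 1 and 10 both get key 0, while
with one connection each the heavier server gets the lower key and is
preferred.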
In the function fwlc_srv_reposition(), the server's lb_tree is tested from
outside the lock. So it is possible to remove it after the test and then call
eb32_insert() in fwlc_queue_srv() with a NULL root pointer, which is
invalid. Moving the test in the scope of the lock fixes the bug.
This issue was reported on Github, issue #126.
This patch must be backported to 2.0, 1.9 and 1.8.
The leastconn algorithm queues available servers based on their weighted
current load. But this results in an inaccurate load balancing when weights
differ and the load is very low, because what matters is not the load before
picking the server but the load resulting from picking the server. At the
very least, it must be granted that servers with the highest weight are
always picked first when no server has any connection.
This patch addresses this by simply adding one to the current connections
count when queuing the server, since this is the load the server will have
once picked. This finally makes it possible to bridge the gap that existed
between the "leastconn" and the "first" algorithms.
Since commit 3ff577e ("MAJOR: server: make server state changes
synchronous again"), srv_update_status() calls the various maintenance
operations of the LB algorithms (->set_server_up, ->set_server_down,
->update_server_weight()). These ones are called with a single thread
guaranteed by the rendez-vous point, so the fact that they're lacking
some locks has no effect. However we'll need to remove the rendez-vous
point so we have to take care of properly locking all the LB algos.
The comments have been properly updated on the various functions to
mention their locking expectations. All these functions are called
with the server lock held, and all of them now support concurrent
calls by using the lbprm's lock.
This fix doesn't need to be backported at the moment, though if any
check-specific issue surfaced in 1.8, it could make sense to reuse it.
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables related to lb.
The only significant change is about lb_map. Because the servers' statuses are
updated in the sync-point, we can call the recalc_server_map() function
synchronously in the map_set_server_status_up/down functions.
The server state and weight were reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are committed to be propagated to the
LB stack.
In further dev related to multi-thread, the commit
will be handled into a sync point.
Pending values are named using the prefix 'next_'.
Current values used by the LB stack are named 'cur_'.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it into two variants:
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
Detecting that a server's status has changed is a bit messy, as well
as it is to commit the status changes. We'll have to add new conditions
soon and we'd better avoid multiplying the number of touched locations,
with the high risk of forgetting some of them.
This commit introduces:
- srv_lb_status_changed() to report if the status changed from the
  previously committed one;
- srv_lb_commit_status() to commit the current status.
These functions are now used by all load-balancing algorithms.
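The change-detection and commit pair can be sketched as below. The
structure and the set of fields compared are assumptions for the
example; the real functions may look at more than a state and a weight:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of a server's LB-related status */
struct server {
	int state, prev_state;
	unsigned int eweight, prev_eweight;
};

/* Report whether the status differs from the last committed one */
bool lb_status_changed(const struct server *s)
{
	assert(s);
	return s->state != s->prev_state || s->eweight != s->prev_eweight;
}

/* Record the current status as the committed one */
void lb_commit_status(struct server *s)
{
	assert(s);
	s->prev_state = s->state;
	s->prev_eweight = s->eweight;
}
```

Keeping both operations in one place means a new condition only has to
be added to these two helpers rather than to every LB algorithm.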
A crash was reported by Igor at owind when changing a server's weight
on the CLI. Lukas Tribus could reproduce a related bug where setting
a server's weight would result in the new weight being multiplied by
the initial one. The two bugs are the same.
The incorrect weight calculation results in the total farm weight being
larger than what was initially allocated, causing the map index to be out
of bounds on some hashes. It's easy to reproduce using "balance url_param"
with a variable param, or with "balance static-rr".
It appears that the calculation is made at many places and is not always
right and not always wrong the same way. Thus, this patch introduces a
new function "server_recalc_eweight()" which is dedicated to this task
of computing ->eweight from many other elements including uweight and
current time (for slowstart), and all users now switch to use this
function.
The patch is a bit large but the code was not trivially fixable in a way
that could guarantee this situation would not occur anymore. The fix is
much more readable and has been verified to work with all algorithms,
with both consistent and map-based hashes, and even with static-rr.
Slowstart was tested as well, just like enable/disable server.
The same bug is very likely present in 1.4 as well, so the patch will
probably need to be backported even though it will not apply as-is.
Thanks to Lukas and Igor for the information they provided to reproduce it.
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need to be broken in two parts, one for the contents and one for the
map position.