mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-08-06 23:27:04 +02:00
BUG/MEDIUM: master: force the thread count earlier
Christopher bisected that recent commit d0b73bca71 ("MEDIUM: listener: switch bind_thread from global to group-local") broke the master socket in that only the first out of the N initial connections would work, where N is the number of threads, after which they all work.

The cause is that the master socket was bound to multiple threads despite global.nbthread being 1 there: the bind_thread mask still included multiple threads, so the incoming connection load balancing would try to send incoming connections to non-existing threads.

What happened is that in 1.9 we forced "nbthread" to 1 in the master's poll loop with commit b3f2be338b ("MEDIUM: mworker: use the haproxy poll loop").

In 2.0, nbthread detection was enabled by default in commit 149ab779cc ("MAJOR: threads: enable one thread per CPU by default"). From this point on, the operation above is unsafe because everything during startup is performed with nbthread corresponding to the default value, then it changes to one when starting the polling loop. But by then we weren't using the wait mode except for reload errors, so even if it had happened nobody would have noticed.

In 2.5 with commit fab0fdce9 ("MEDIUM: mworker: reexec in waitpid mode after successful loading") we started to re-execute all the time, not just for errors, so as to release precious resources and to possibly spot bugs that were rarely exposed in this mode. By then the incoming connection LB was enforcing all_threads_mask on the listener's thread mask so that the incorrect value was being corrected while using it.

Finally in 2.7 commit d0b73bca71 ("MEDIUM: listener: switch bind_thread from global to group-local") replaced the all_threads_mask there with the listener's bind_thread, but that one was never adjusted by the starting master, whose thread group was filled to N threads by the automatic detection during early setup.

The best approach here is to set nbthread to 1 very early in init() when we're in the master in wait mode, so that we don't try to guess the best value and don't end up with incorrect bindings anymore. This patch does this and also sets nbtgroups to 1 in preparation for a possible future where this will also be automatically calculated.

There is no need to backport this patch since no other versions were affected, but if it were to be discovered that the incorrect bind mask on some of the master's FDs could be responsible for any trouble in older versions, then the backport should be safe (provided that nbtgroups is dropped of course).
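The failure mode described in this message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not HAProxy's code: `threads_mask()`, `pick_thread()` and `count_served()` are hypothetical helpers, the round-robin selection is a simplification of the real incoming connection load balancing, and the model only covers the first N connections (it does not reproduce the later recovery mentioned above).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_THREADS 64 /* hypothetical limit for this model only */

/* Mask with the lowest <nbthread> bits set, one bit per existing thread. */
static uint64_t threads_mask(int nbthread)
{
	assert(nbthread >= 1 && nbthread < MODEL_MAX_THREADS);
	return (UINT64_C(1) << nbthread) - 1;
}

/* Simplified round-robin: return the first thread at or after <start>
 * whose bit is set in <mask>, wrapping around; -1 if the mask is empty.
 */
static int pick_thread(uint64_t mask, int start)
{
	for (int i = 0; i < MODEL_MAX_THREADS; i++) {
		int t = (start + i) % MODEL_MAX_THREADS;

		if (mask & (UINT64_C(1) << t))
			return t;
	}
	return -1;
}

/* Count how many of the first <conns> connections are handed to a thread
 * that actually exists, i.e. whose number is below <nbthread>.
 */
static int count_served(uint64_t bind_thread, int nbthread, int conns)
{
	int served = 0;
	int next = 0;

	for (int c = 0; c < conns; c++) {
		int t = pick_thread(bind_thread, next);

		if (t < 0)
			break;
		if (t < nbthread)
			served++; /* an existing thread accepts the connection */
		next = t + 1;
	}
	return served;
}
```

In this model, a listener bound while 4 threads were auto-detected has a mask of `threads_mask(4)`. With only one running thread, `count_served(threads_mask(4), 1, 4)` returns 1, matching the "only the first out of N" symptom. Masking at use time, `count_served(threads_mask(4) & threads_mask(1), 1, 4)`, returns 4, which mirrors what enforcing all_threads_mask used to achieve. Forcing the thread count to 1 before the mask is computed, as this patch does, gives the same result without relying on a later correction: `count_served(threads_mask(1), 1, 4)` also returns 4.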
parent 38c53944cb
commit 53bfac8c63
@@ -833,8 +833,6 @@ static void mworker_loop()
 	mworker_catch_sigchld(NULL); /* ensure we clean the children in case
 				     some SIGCHLD were lost */
 
-	global.nbthread = 1;
-
 	jobs++; /* this is the "master" job, we want to take care of the
 		signals even if there is no listener so the poll loop don't
 		leave */
@@ -2076,6 +2074,16 @@ static void init(int argc, char **argv)
 		LIST_APPEND(&proc_list, &tmproc->list);
 	}
 
+	if (global.mode & MODE_MWORKER_WAIT) {
+		/* in exec mode, there's always exactly one thread. Failure to
+		 * set these ones now will result in nbthread being detected
+		 * automatically.
+		 */
+		global.nbtgroups = 1;
+		global.nbthread = 1;
+	}
+
 	if (global.mode & (MODE_MWORKER|MODE_MWORKER_WAIT)) {
 		struct wordlist *it, *c;
 