MINOR: listener: improve incoming traffic distribution

By picking two random values following the P2C algorithm, we occasionally
observe asymmetric loads on bursts of small session counts. This is typically
what makes h2load take a bit of time to reach 100% completion, because
if a thread gets two connections while the other ones only have one,
it takes twice the time to complete its work.

This patch proposes a modification of the p2c algorithm which seems
more suitable to this case: it mixes a rotating index with a random value.
This way, we're certain that all threads are consulted in turn, and at
the same time we're not forced to use the one whose turn it is.
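
As an illustration only, the selection described above can be sketched in a single-threaded form. The helper name pick_thread, its parameters and the load[] array are hypothetical and not part of the patch; the random value is passed in by the caller so that the behavior is reproducible.

```c
/* Hypothetical helper, not HAProxy code: choose a thread among <count>
 * (count >= 2). <rr_idx> is the rotating index, <load> holds the current
 * connection count of each thread, <rnd> is a caller-supplied random value.
 */
static int pick_thread(unsigned int *rr_idx, const int *load,
                       int count, unsigned int rnd)
{
    int t1 = (int)(*rr_idx % (unsigned int)count); /* round-robin candidate */
    int t2;

    /* advance the rotating index so every thread is consulted in turn */
    *rr_idx = (unsigned int)((t1 + 1) % count);

    /* second candidate: random offset in 1..count-1, hence never t1 */
    t2 = t1 + 1 + (int)(rnd % (unsigned int)(count - 1));
    if (t2 >= count)
        t2 -= count;

    /* keep the round-robin choice unless the random one is less loaded */
    return (load[t2] < load[t1]) ? t2 : t1;
}
```

With equal loads, successive calls simply rotate over the threads; the random candidate only wins when it currently holds fewer connections.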

This significantly increases the traffic rate. Now h2load shows faster
completion, and the average request rate on H2 and the TLS resume rate
both increase by a bit more than 5% compared to pure p2c.

The index was placed into the struct bind_conf because 1) it's faster
there and 2) it's the best place to optimally distribute traffic among a
group of listeners. It's the only runtime-modified element there and
it will be quite cache-hot.
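
Since the index is shared by all threads accepting on the same bind_conf, it has to be advanced atomically. A minimal sketch of such a wrap-around update follows, assuming C11 atomics in place of HAProxy's HA_ATOMIC_CAS macro. The name next_rr_index is hypothetical, and the equivalence between HA_ATOMIC_CAS and atomic_compare_exchange_weak is an assumption.

```c
#include <stdatomic.h>

/* Hypothetical helper, not HAProxy code: return the current value of the
 * shared rotating index and atomically advance it, wrapping at <count>.
 */
static unsigned int next_rr_index(_Atomic unsigned int *idx, unsigned int count)
{
    unsigned int cur = atomic_load(idx);
    unsigned int next;

    do {
        next = cur + 1;
        if (next >= count)
            next = 0;
        /* on failure, <cur> is refreshed with the value another thread
         * stored, and <next> is recomputed from it
         */
    } while (!atomic_compare_exchange_weak(idx, &cur, next));

    return cur;
}
```

Each caller gets a distinct slot even under contention, because a thread that loses the race retries from the value written by the winner.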
Willy Tarreau 2019-03-04 19:57:34 +01:00
parent b238b12e98
commit fc630bd373
2 changed files with 17 additions and 5 deletions

@@ -172,6 +172,7 @@ struct bind_conf {
 	unsigned long bind_thread; /* bitmask of threads allowed to use these listeners */
 	unsigned long thr_2, thr_4, thr_8, thr_16; /* intermediate values for bind_thread counting */
 	unsigned int thr_count; /* #threads bound */
+	unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
 	uint32_t ns_cip_magic; /* Excepted NetScaler Client IP magic number */
 	struct list by_fe; /* next binding for the same frontend, or NULL */
 	char *arg; /* argument passed to "bind" for better error reporting */

@@ -847,12 +847,23 @@ void listener_accept(int fd)
 		count = l->bind_conf->thr_count;
 		if (count > 1 && (global.tune.options & GTUNE_LISTENER_MQ)) {
 			struct accept_queue_ring *ring;
-			int r, t1, t2, q1, q2;
+			int t1, t2, q1, q2;
-			/* pick two small distinct random values and drop lower bits */
-			r = (random() >> 8) % ((count - 1) * count);
-			t2 = r / count; // 0..thr_count-2
-			t1 = r % count; // 0..thr_count-1
+			/* pick a first thread ID using a round robin index,
+			 * and a second thread ID using a random. The
+			 * connection will be assigned to the one with the
+			 * least connections. This provides fairness on short
+			 * connections (round robin) and on long ones (conn
+			 * count).
+			 */
+			t1 = l->bind_conf->thr_idx;
+			do {
+				t2 = t1 + 1;
+				if (t2 >= count)
+					t2 = 0;
+			} while (!HA_ATOMIC_CAS(&l->bind_conf->thr_idx, &t1, t2));
+			t2 = (random() >> 8) % (count - 1); // 0..thr_count-2
 			t2 += t1 + 1; // necessarily different from t1
 			if (t2 >= count)