MINOR: mux-h1: perform a graceful close at 75% glitches threshold

This avoids hitting the hard wall for connections with non-compliant
peers that are accumulating errors. We recycle the connection early
enough to allow the counter to be reset. Example below with a threshold
set to 100:

Before, 1% errors:
  $ h1load -H "Host : blah" -c 1 -n 10000000 0:4445
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     1039   103872        6763365   1038 1k03 103k 54M1 9.426u
           2     1     2128   212793       14086140   2127 1k08 108k 58M5 8.963u
           3     1     3215   321465       21392137   3214 1k08 108k 58M3 8.982u
           4     1     4307   430684       28735013   4306 1k09 109k 58M6 8.935u
           5     1     5390   538989       36016294   5389 1k08 108k 58M1 9.021u

After, no more errors:
  $ h1load -H "Host : blah" -c 1 -n 10000000 0:4445
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     1509   113161        7487809      0 1k50 113k 59M9 8.482u
           2     1     3002   225101       15114659      0 1k49 111k 60M9 8.582u
           3     1     4508   338045       22809911      0 1k50 112k 61M5 8.523u
           4     1     5971   447785       30286861      0 1k46 109k 59M7 8.772u
           5     1     7472   560335       37955271      0 1k49 112k 61M2 8.537u
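
As a rough cross-check of the figures above, dividing tot_req by tot_conn
gives the average number of requests served per connection. The helper
below is a hypothetical sketch, not part of h1load or HAProxy. It assumes
that each request carrying the malformed "Host : blah" header counts as
one glitch, which the commit message does not state explicitly.

```c
/* Hypothetical helper (not from h1load or HAProxy): average number of
 * requests per connection, rounded to the nearest integer. Returns 0
 * when no connection was made, to avoid dividing by zero.
 */
static unsigned long reqs_per_conn(unsigned long tot_req, unsigned long tot_conn)
{
	if (!tot_conn)
		return 0;
	return (tot_req + tot_conn / 2) / tot_conn;
}
```

With the first sample of each run, reqs_per_conn(103872, 1039) gives 100
and reqs_per_conn(113161, 1509) gives 75. Under the assumption above, this
is consistent with a hard kill at the threshold before the change and a
graceful close at 75% of it afterwards.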
Willy Tarreau 2025-12-20 16:48:15 +01:00
parent 05b457002b
commit 5904f8279b
2 changed files with 23 additions and 4 deletions


@@ -4211,6 +4211,10 @@ tune.h1.be.glitches-threshold <number>
 probably be in the hundreds or thousands to be effective without affecting
 slightly bogus servers. It is also possible to only kill connections when the
 CPU usage crosses a certain level, by using "tune.glitches.kill.cpu-usage".
+Note that a graceful close is attempted at 75% of the configured threshold by
+advertising a GOAWAY for a future stream. This ensures that a slightly faulty
+connection will stop being used after some time without risking to interrupt
+ongoing transfers.
 See also: tune.h1.fe.glitches-threshold, bc_glitches, and
 tune.glitches.kill.cpu-usage
@@ -4226,6 +4230,11 @@ tune.h1.fe.glitches-threshold <number>
 probably be in the hundreds or thousands to be effective without affecting
 slightly bogus clients. It is also possible to only kill connections when the
 CPU usage crosses a certain level, by using "tune.glitches.kill.cpu-usage".
+Note that a graceful close is attempted at 75% of the configured threshold by
+advertising a GOAWAY for a future stream. This ensures that a slightly non-
+compliant client will have the opportunity to create a new connection and
+continue to work unaffected without ever triggering the hard close thus
+risking to interrupt ongoing transfers.
 See also: tune.h1.be.glitches-threshold, fc_glitches, and
 tune.glitches.kill.cpu-usage


@@ -525,10 +525,20 @@ static inline int _h1_report_glitch(struct h1c *h1c, int increment)
 		h1_be_glitches_threshold : h1_fe_glitches_threshold;
 	h1c->glitches += increment;
-	if (thres && h1c->glitches >= thres &&
-	    (th_ctx->idle_pct <= global.tune.glitch_kill_maxidle)) {
-		h1c->flags |= H1C_F_ERROR;
-		return 1;
+	if (unlikely(thres && h1c->glitches >= (thres * 3 + 1) / 4)) {
+		/* at 75% of the threshold, we switch to close mode
+		 * to force clients to periodically reconnect.
+		 */
+		h1c->h1s->flags = (h1c->h1s->flags & ~H1S_F_WANT_MSK) | H1S_F_WANT_CLO;
+		/* at 100% of the threshold and excess of CPU usage we also
+		 * actively kill the connection.
+		 */
+		if (h1c->glitches >= thres &&
+		    (th_ctx->idle_pct <= global.tune.glitch_kill_maxidle)) {
+			h1c->flags |= H1C_F_ERROR;
+			return 1;
+		}
 	}
 	return 0;
 }
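
The two-stage decision in this hunk can be modelled in isolation. The
sketch below is a simplified stand-alone illustration, not HAProxy code:
the enum, the function names and the plain integer parameters are
hypothetical, with idle_pct and kill_maxidle standing in for
th_ctx->idle_pct and global.tune.glitch_kill_maxidle. It only mirrors the
comparisons visible in the diff and ignores the connection and stream
flags.

```c
/* Outcomes of reporting a glitch in this simplified model (hypothetical). */
enum glitch_action {
	GLITCH_NONE,       /* below the soft threshold: nothing to do */
	GLITCH_SOFT_CLOSE, /* soft threshold reached: switch to close mode */
	GLITCH_KILL        /* hard threshold reached and CPU busy: kill */
};

/* Soft threshold as computed in the diff: 75% of thres using integer
 * math, with a +1 bias so the result is never 0 for a non-zero thres.
 */
static unsigned int soft_threshold(unsigned int thres)
{
	return (thres * 3 + 1) / 4;
}

/* Mirrors the nested tests of the new code: a zero threshold disables
 * everything, the kill only happens at 100% of the threshold and when
 * the idle percentage is at or below the configured maximum.
 */
static enum glitch_action classify_glitches(unsigned int glitches,
                                            unsigned int thres,
                                            int idle_pct, int kill_maxidle)
{
	if (!thres || glitches < soft_threshold(thres))
		return GLITCH_NONE;
	if (glitches >= thres && idle_pct <= kill_maxidle)
		return GLITCH_KILL;
	return GLITCH_SOFT_CLOSE;
}
```

With a threshold of 100, soft_threshold() returns 75, so the model reports
a soft close from the 75th glitch onwards and a kill only from the 100th,
and only if the thread is busy enough. A connection that reaches 100
glitches on an idle thread stays in soft-close mode.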