haproxy/include/haproxy/check-t.h
Willy Tarreau d114f4a68f MEDIUM: checks: spread the checks load over random threads
The CPU usage pattern was found to be high (5%) on a machine with
48 threads and only 100 servers checked every second That was
supposed to be only 100 connections per second, which should be very
cheap. It was figured that due to the check tasks unbinding from any
thread when going back to sleep, they're queued into the shared queue.

Not only this requires to manipulate the global queue lock, but in
addition it means that all threads have to check the global queue
before going to sleep (hence take a lock again) to figure how long
to sleep, and that they would all sleep only for the shortest amount
of time to the next check, one would pick it and all other ones would
go down to sleep waiting for the next check.

That's perfectly visible in time-to-first-byte measurements. A quick
test consisting in retrieving the stats page in CSV over a 48-thread
process checking 200 servers every 2 seconds shows the following tail:

  percentile   ttfb(ms)
  99.98        2.43
  99.985       5.72
  99.99       32.96
  99.995     82.176
  99.996     82.944
  99.9965    83.328
  99.997      83.84
  99.9975    84.288
  99.998      85.12
  99.9985    86.592
  99.999         88
  99.9995    89.728
  99.9999   100.352

One solution could consist in forcefully binding checks to threads at
boot time, but that's annoying, will cause trouble for dynamic servers
and may cause some skew in the load depending on some server patterns.

Instead here we take a different approach. A check remains bound to its
thread for as long as possible, but upon every wakeup, the thread's load
is compared with another random thread's load. If it's found that that
other thread's load is less than half of the current one's, the task is
bounced to that thread. In order to prevent that new thread from doing
the same, we set a flag "CHK_ST_SLEEPING" that indicates that it just
woke up and we're bouncing the task only on this condition.

Tests have shown that the initial load was very unfair before, with a few
checks threads having a load of 15-20 and the vast majority having zero.
With this modification, after two "inter" delays, the load is either zero
or one everywhere when checks start. The same test shows a CPU usage that
significantly drops, between 0.5 and 1%. The same latency tail measurement
is much better, roughly 10 times smaller:

  percentile   ttfb(ms)
  99.98        1.647
  99.985       1.773
  99.99        4.912
  99.995        8.76
  99.996        8.88
  99.9965      8.944
  99.997       9.016
  99.9975      9.104
  99.998       9.224
  99.9985      9.416
  99.999         9.8
  99.9995      10.04
  99.9999     10.432

In fact one difference here is that many threads work while in the past
they were waking up and going down to sleep after having perturbated the
shared lock. Thus it is anticipated that this will scale way smoother
than before. Under strace it's clearly visible that all threads are
sleeping for the time it takes to relaunch a check, there's no more
thundering herd wakeups.

However it is also possible that in some rare cases such as very short
check intervals smaller than a scheduler's timeslice (such as 4ms),
some users might have benefited from the work being concentrated on
less threads and would instead observe a small increase of apparent
CPU usage due to more total threads waking up even if that's for less
work each and less total work. That's visible with 200 servers at 4ms
where show activity shows that a few threads were overloaded and others
doing nothing. It's not a problem, though as in practice checks are not
supposed to eat much CPU and to wake up fast enough to represent a
significant load anyway, and the main issue they could have been
causing (aside the global lock) is an increase last-percentile latency.
2022-10-12 21:49:30 +02:00

188 lines
7.7 KiB
C

/*
* include/haproxy/check-t.h
* Health-checks definitions, enums, macros and bitfields.
*
* Copyright 2008-2009 Krzysztof Piotr Oledzki <ole@ans.pl>
* Copyright (C) 2000-2020 Willy Tarreau - w@1wt.eu
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
*/
#ifndef _HAPROXY_CHECKS_T_H
#define _HAPROXY_CHECKS_T_H
#include <sys/time.h>
#include <import/ebtree-t.h>
#include <import/ist.h>
#include <haproxy/api-t.h>
#include <haproxy/buf-t.h>
#include <haproxy/connection-t.h>
#include <haproxy/dynbuf-t.h>
#include <haproxy/obj_type-t.h>
#include <haproxy/vars-t.h>
/* Please note: this file tends to commonly be part of circular dependencies,
* so it is important to keep its includes list to the minimum possible (i.e.
* only types whose size needs to be known). Since there are no function
* prototypes nor pointers here, forward declarations are not really necessary.
* This file oughtt to be split into multiple parts, at least regular checks vs
* tcp-checks.
*/
/* enum used by check->result. Must remain in this order, as some code uses
* result >= CHK_RES_PASSED to declare success.
*/
enum chk_result {
CHK_RES_UNKNOWN = 0, /* initialized to this by default */
CHK_RES_NEUTRAL, /* valid check but no status information */
CHK_RES_FAILED, /* check failed */
CHK_RES_PASSED, /* check succeeded and server is fully up again */
CHK_RES_CONDPASS, /* check reports the server doesn't want new sessions */
};
/* flags used by check->state */
#define CHK_ST_INPROGRESS 0x0001 /* a check is currently running */
#define CHK_ST_CONFIGURED 0x0002 /* this check is configured and may be enabled */
#define CHK_ST_ENABLED 0x0004 /* this check is currently administratively enabled */
#define CHK_ST_PAUSED 0x0008 /* checks are paused because of maintenance (health only) */
#define CHK_ST_AGENT 0x0010 /* check is an agent check (otherwise it's a health check) */
#define CHK_ST_PORT_MISS 0x0020 /* check can't be send because no port is configured to run it */
#define CHK_ST_IN_ALLOC 0x0040 /* check blocked waiting for input buffer allocation */
#define CHK_ST_OUT_ALLOC 0x0080 /* check blocked waiting for output buffer allocation */
#define CHK_ST_CLOSE_CONN 0x0100 /* check is waiting that the connection gets closed */
#define CHK_ST_PURGE 0x0200 /* check must be freed */
#define CHK_ST_SLEEPING 0x0400 /* check was sleeping */
/* check status */
enum healthcheck_status {
HCHK_STATUS_UNKNOWN = 0, /* Unknown */
HCHK_STATUS_INI, /* Initializing */
HCHK_STATUS_START, /* Check started - SPECIAL STATUS */
/* Below we have finished checks */
HCHK_STATUS_CHECKED, /* DUMMY STATUS */
HCHK_STATUS_HANA, /* Health analyze detected enough consecutive errors */
HCHK_STATUS_SOCKERR, /* Socket error */
HCHK_STATUS_L4OK, /* L4 check passed, for example tcp connect */
HCHK_STATUS_L4TOUT, /* L4 timeout */
HCHK_STATUS_L4CON, /* L4 connection problem, for example: */
/* "Connection refused" (tcp rst) or "No route to host" (icmp) */
HCHK_STATUS_L6OK, /* L6 check passed */
HCHK_STATUS_L6TOUT, /* L6 (SSL) timeout */
HCHK_STATUS_L6RSP, /* L6 invalid response - protocol error */
HCHK_STATUS_L7TOUT, /* L7 (HTTP/SMTP) timeout */
HCHK_STATUS_L7RSP, /* L7 invalid response - protocol error */
/* Below we have layer 5-7 data available */
HCHK_STATUS_L57DATA, /* DUMMY STATUS */
HCHK_STATUS_L7OKD, /* L7 check passed */
HCHK_STATUS_L7OKCD, /* L7 check conditionally passed */
HCHK_STATUS_L7STS, /* L7 response error, for example HTTP 5xx */
HCHK_STATUS_PROCERR, /* External process check failure */
HCHK_STATUS_PROCTOUT, /* External process check timeout */
HCHK_STATUS_PROCOK, /* External process check passed */
HCHK_STATUS_SIZE
};
/* health status for response tracking */
enum {
HANA_STATUS_UNKNOWN = 0,
HANA_STATUS_L4_OK, /* L4 successful connection */
HANA_STATUS_L4_ERR, /* L4 unsuccessful connection */
HANA_STATUS_HTTP_OK, /* Correct http response */
HANA_STATUS_HTTP_STS, /* Wrong http response, for example HTTP 5xx */
HANA_STATUS_HTTP_HDRRSP, /* Invalid http response (headers) */
HANA_STATUS_HTTP_RSP, /* Invalid http response */
HANA_STATUS_HTTP_READ_ERROR, /* Read error */
HANA_STATUS_HTTP_READ_TIMEOUT, /* Read timeout */
HANA_STATUS_HTTP_BROKEN_PIPE, /* Unexpected close from server */
HANA_STATUS_SIZE
};
enum {
HANA_ONERR_UNKNOWN = 0,
HANA_ONERR_FASTINTER, /* Force fastinter*/
HANA_ONERR_FAILCHK, /* Simulate a failed check */
HANA_ONERR_SUDDTH, /* Enters sudden death - one more failed check will mark this server down */
HANA_ONERR_MARKDWN, /* Mark this server down, now! */
};
enum {
HANA_ONMARKEDDOWN_NONE = 0,
HANA_ONMARKEDDOWN_SHUTDOWNSESSIONS, /* Shutdown peer sessions */
};
enum {
HANA_ONMARKEDUP_NONE = 0,
HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS, /* Shutdown peer sessions */
};
enum {
HANA_OBS_NONE = 0,
HANA_OBS_LAYER4, /* Observe L4 - for example tcp */
HANA_OBS_LAYER7, /* Observe L7 - for example http */
HANA_OBS_SIZE
};
struct tcpcheck_rule;
struct tcpcheck_rules;
struct check {
enum obj_type obj_type; /* object type == OBJ_TYPE_CHECK */
struct session *sess; /* Health check session. */
struct vars vars; /* Health check dynamic variables. */
struct xprt_ops *xprt; /* transport layer operations for health checks */
struct stconn *sc; /* stream connector used by health checks */
struct buffer bi, bo; /* input and output buffers to send/recv check */
struct buffer_wait buf_wait; /* Wait list for buffer allocation */
struct task *task; /* the task associated to the health check processing, NULL if disabled */
struct timeval start; /* last health check start time */
long duration; /* time in ms took to finish last health check */
short status, code; /* check result, check code */
unsigned short port; /* the port to use for the health checks */
char desc[HCHK_DESC_LEN]; /* health check description */
signed char use_ssl; /* use SSL for health checks (1: on, 0: server mode, -1: off) */
int send_proxy; /* send a PROXY protocol header with checks */
struct tcpcheck_rules *tcpcheck_rules; /* tcp-check send / expect rules */
struct tcpcheck_rule *current_step; /* current step when using tcpcheck */
int inter, fastinter, downinter; /* checks: time in milliseconds */
enum chk_result result; /* health-check result : CHK_RES_* */
int state; /* state of the check : CHK_ST_* */
int health; /* 0 to rise-1 = bad;
* rise to rise+fall-1 = good */
int rise, fall; /* time in iterations */
int type; /* Check type, one of PR_O2_*_CHK */
struct server *server; /* back-pointer to server */
struct proxy *proxy; /* proxy to be used */
char **argv; /* the arguments to use if running a process-based check */
char **envp; /* the environment to use if running a process-based check */
struct pid_list *curpid; /* entry in pid_list used for current process-based test, or -1 if not in test */
struct sockaddr_storage addr; /* the address to check */
char *sni; /* Server name */
char *alpn_str; /* ALPN to use for checks */
int alpn_len; /* ALPN string length */
const struct mux_proto_list *mux_proto; /* the mux to use for all outgoing connections (specified by the "proto" keyword) */
int via_socks4; /* check the connection via socks4 proxy */
};
#endif /* _HAPROXY_CHECKS_T_H */