10849 Commits

Author SHA1 Message Date
Amaury Denoyelle
293dcc400e MINOR: backend: compare conn hash for session conn reuse
Compare the connection hash when reusing a connection from the session.
This ensures that a private connection is reused only if it shares the
same set of parameters.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
1a58aca84e MINOR: connection: use the srv pointer for the srv conn hash
The pointer of the target server is used as a first parameter for the
server connection hash calculation. This prevents the hash from being null
when no specific parameters are present, and can serve as a simple defense
against an attacker trying to reuse a non-conforming connection.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
81c6f76d3e MINOR: connection: prepare hash calcul for server conns
This is preliminary work for the calculation of the backend connection
hash. A conn_hash_params structure is the input for the operation,
containing the various specific parameters of a connection.

The high bits of the hash will reflect the parameters present as input.
A set of macros is written to manipulate the connection hash and extract
the parameters/payload.
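
As an illustration of the layout described above, here is a minimal sketch
assuming a 64-bit hash whose high bits flag which parameters were present
and whose low bits carry the payload. The macro names, the number of flag
bits and the flag values are all hypothetical and are not taken from the
patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 4 high bits of flags, 60 low bits of payload. */
#define SKETCH_HASH_FLAG_BITS    4
#define SKETCH_HASH_PAYLOAD_BITS (64 - SKETCH_HASH_FLAG_BITS)
#define SKETCH_HASH_PAYLOAD_MASK ((UINT64_C(1) << SKETCH_HASH_PAYLOAD_BITS) - 1)

/* Hypothetical flags, one per parameter that may feed the hash. */
#define SKETCH_PRM_SNI   0x1u
#define SKETCH_PRM_DST   0x2u
#define SKETCH_PRM_SRC   0x4u
#define SKETCH_PRM_PROXY 0x8u

/* Combine a raw payload hash with the flags of the parameters present. */
static inline uint64_t sketch_hash_build(uint64_t payload, unsigned int flags)
{
	assert(flags < (1u << SKETCH_HASH_FLAG_BITS));
	return ((uint64_t)flags << SKETCH_HASH_PAYLOAD_BITS) |
	       (payload & SKETCH_HASH_PAYLOAD_MASK);
}

/* Extract the parameter flags stored in the high bits. */
static inline unsigned int sketch_hash_flags(uint64_t hash)
{
	return (unsigned int)(hash >> SKETCH_HASH_PAYLOAD_BITS);
}

/* Extract the payload stored in the low bits. */
static inline uint64_t sketch_hash_payload(uint64_t hash)
{
	return hash & SKETCH_HASH_PAYLOAD_MASK;
}
```

With such a layout, two connections built from different sets of
parameters can never share the same key, whatever their payloads are.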
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
aa890aef3d MINOR: backend: search conn in idle tree after safe on always reuse
With http-reuse always, if no matching safe connection is found, check
the idle tree for a matching one. This is needed because idle
connections can now be differentiated from each other.

If only the safe tree was checked because it was not empty, but it did not
contain a matching connection, we could miss a matching entry in the idle
tree.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
1399d695c0 MINOR: backend: search conn in idle/safe trees after available
If no matching connection is found in the available list, check the
idle/safe trees for a matching one. This is needed because idle
connections can now be differentiated from each other.

If only the available list was checked because it was not empty, but it
did not contain a matching connection, we could miss matching entries in
the idle or safe trees.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
f232cb3e9b MEDIUM: connection: replace idle conn lists by eb trees
The server idle/safe/available connection lists are replaced with
ebmb-trees. These store backend connections, with the new connection
hash field as the key. The hash is an 8-byte field, used to reflect
specific connection parameters.

This is preliminary work to be able to reuse connections with SNI,
explicit src/dst address or PROXY protocol.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
5c7086f6b0 MEDIUM: connection: protect idle conn lists with locks
This is preparatory work for connection reuse with sni/proxy
protocol/specific src-dst addresses.

Protect every access to idle conn lists with a lock. This is not
strictly needed at present because accesses to the lists are made with
atomic operations. However, to be able to reuse connections with specific
parameters, the list storage will be converted to eb-trees. As this
structure does not support atomic operations, it is mandatory to protect
it with a lock.

For this, the takeover lock is reused. Its role was to protect
connections during takeover. As it is now extended to general idle conns
usage, it is renamed to idle_conns_lock. A new lock section named
IDLE_CONNS_LOCK is also instantiated to isolate its impact on performance.
2021-02-12 12:33:04 +01:00
Amaury Denoyelle
a3bf62ec54 BUG/MINOR: backend: hold correctly lock when killing idle conn
The wrong lock seems to be held when trying to remove another thread's
connection if the max fd limit has been reached (the current thread's
lock is taken instead of the target thread's lock).

This could be backported up to 2.0.
2021-02-12 12:32:31 +01:00
Christopher Faulet
cd7126b396 CLEANUP: queue: Remove useless tests on p or pp in pendconn_process_next_strm()
This patch removes unnecessary tests on p or pp pointers in the
pendconn_process_next_strm() function. This should make cppcheck happy and
avoid false reports of null pointer dereference.

This patch should fix the issue #1036.
2021-02-11 11:48:36 +01:00
Ilya Shipitsin
a1e0f387c7 CLEANUP: remove unused variable assigned found by Coverity
this is pure cleanup, no need to backport

2116        if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
2117                /* if the payload pattern is at the end */
2118                s->pcli_flags |= PCLI_F_PAYLOAD;
    CID 1399833 (#1 of 1): Unused value (UNUSED_VALUE)assigned_value: Assigning value from reql to ret here, but that stored value is overwritten before it can be used.
2119                ret = reql;
2120        }

This patch fixes the issue #1048.
2021-02-11 11:48:36 +01:00
Christopher Faulet
4b524124db BUG/MINOR: tools: Fix a memory leak on error path in parse_dotted_uints()
When an invalid character is found during parsing in parse_dotted_uints()
function, the allocated array of uint must be released. This patch fixes a
memory leak on error path during the configuration parsing.
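
The pattern behind this fix can be sketched as follows. This is a
simplified, hypothetical parser (sketch_parse_dotted_uints() is not the
real function and performs no overflow check); it only shows the array
being released on every error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "1.22.333" into a newly allocated array of unsigned ints.
 * Returns 1 on success and fills *nums and *sz. Returns 0 on error,
 * in which case nothing is left allocated.
 */
static int sketch_parse_dotted_uints(const char *s, unsigned int **nums, size_t *sz)
{
	unsigned int *arr = NULL;
	size_t n = 0;

	assert(s && nums && sz);
	*nums = NULL;
	*sz = 0;

	while (1) {
		unsigned int v = 0;
		unsigned int *tmp;

		if (*s < '0' || *s > '9')
			goto fail;      /* each element needs at least one digit */
		while (*s >= '0' && *s <= '9')
			v = v * 10 + (unsigned int)(*s++ - '0');

		tmp = realloc(arr, (n + 1) * sizeof(*arr));
		if (!tmp)
			goto fail;
		arr = tmp;
		arr[n++] = v;

		if (*s == '\0')
			break;
		if (*s != '.')
			goto fail;      /* invalid character */
		s++;
	}
	*nums = arr;
	*sz = n;
	return 1;

fail:
	free(arr);              /* release the partially built array */
	return 0;
}
```

Routing every failure through a single label keeps the release in one
place, so a newly added error check cannot forget it.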

This patch should fix the issue #1106. It should be backported as far as
2.0. Note that, for 2.1 and 2.0, the function is in src/standard.c
2021-02-11 11:48:36 +01:00
Christopher Faulet
0aeaa290da CLEANUP: muxes: Remove useless calls to b_realign_if_empty()
In the H1, H2 and FCGI muxes, b_realign_if_empty() is called to reset the
head of an empty buffer before setting it to a specific value to permit the
zero-copy. Thus, we can remove the calls to b_realign_if_empty().
2021-02-11 11:48:36 +01:00
Christopher Faulet
368936703a MINOR: mux-h1: Be sure EOM flag is set when processing end of outgoing message
When a message is sent, an extra check is performed when the parser is
switched to the MSG_DONE state to be sure the EOM flag is really set. This
flag is quite new and replaces the EOM block. Thus, this test is a safeguard
pending a proper refactoring of the outgoing side.
2021-02-10 16:25:42 +01:00
Christopher Faulet
337243235f BUG/MEDIUM: mux-h2: Add EOT block when EOM flag is set on an empty HTX message
In the H2 mux, when an empty DATA frame is used to finish a message, just to
set the ES flag, we now only set the EOM flag on the HTX message. However,
if the HTX message is empty, this event will not be properly handled on the
other side because there is no effective data to handle. Thus, it is
interpreted as an abort by the H1 mux.

It is in part caused by the current H1 mux design but also because there is
no way to emit an empty HTX block (NOOP HTX block) or to wake up a mux for
send when there is no data to finish some internal processing.

Thus, for now, to work around this limitation, an EOT HTX block is added by
the H2 mux if an EOM flag is added on an empty HTX message. This case is only
possible when an empty DATA frame with the ES flag is received.

This fix is specific for 2.4. No backport needed.
2021-02-10 16:25:42 +01:00
Christopher Faulet
0a916d2aca BUG/MINOR: mux-h1: Don't blindly skip EOT block for non-chunked messages
In HTTP/2, we may have trailers for messages with a Content-length
header. Thus, when the H2 mux receives a HEADERS frame at the end of a
message, it always emits TLR and EOT HTX blocks. On the H1 mux, if this
happens, these blocks are just skipped because we cannot emit trailers for a
non-chunked message. But the EOT HTX block must not be blindly
ignored. Indeed, there is no longer an EOM HTX block to mark the end of the
message. Thus the EOT block, when found, is the end of the message. So we
must handle it to switch to the MSG_DONE state.

This fix is specific for 2.4. No backport needed.
2021-02-10 16:25:42 +01:00
Christopher Faulet
0d7e634631 BUG/MINOR: mux-h1: Fix data skipping for bodyless responses
When payload is received for a bodyless response, for instance a response to
a HEAD request, it is silently skipped. Unfortunately, when this happens,
the end of the message is not properly handled. The response remains in the
MSG_DATA state (or MSG_TRAILERS if the message is chunked). In addition,
when a zero-copy is possible, the data are not removed from the channel
buffer and the H1 connection is killed because an error is then triggered.

To fix the bug, the zero-copy is disabled for bodyless responses. It is not
a problem because there is no copy at all. And the last block (DATA or EOT)
is now properly handled.

This bug was introduced by the commit e5596bf53 ("MEDIUM: mux-h1: Don't emit
any payload for bodyless responses").

This fix is specific for 2.4. No backport needed.
2021-02-10 16:25:42 +01:00
Christopher Faulet
a22782b597 BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state
During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag must
always be set on the conn-stream if one of the following conditions is met:

  * It is a response or
  * It is a request but not a protocol upgrade nor a CONNECT.

For now, there is no test on the message type (request or response). Thus
the CS_FL_EOI flag is not set for a response with a "Connection: upgrade"
header but not a 101 response.
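
The condition above can be sketched as a small predicate. The structure
and its field names are hypothetical and only model the three facts the
rule depends on; they are not the mux's real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a message that reached MSG_DONE. */
struct sketch_msg {
	bool is_response;   /* response rather than request */
	bool is_upgrade;    /* message carries a protocol upgrade */
	bool is_connect;    /* CONNECT request */
};

/* Returns true when the end-of-input flag should be reported. */
static bool sketch_must_set_eoi(const struct sketch_msg *m)
{
	assert(m);
	if (m->is_response)
		return true;                        /* always for responses */
	return !m->is_upgrade && !m->is_connect;    /* plain requests only */
}
```

Testing the message type first is what lets a non-101 response carrying a
"Connection: upgrade" header still get the flag.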

This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1: Don't
set CS_FL_EOI too early for protocol upgrade requests"). It was backported
as far as 2.0. Thus, this patch must also be backported as far as 2.0.
2021-02-10 16:25:42 +01:00
Christopher Faulet
bf7175f9b6 BUG/MINOR: http-ana: Don't increment HTTP error counter on internal errors
If an internal error is reported by the mux during HTTP request parsing, the
HTTP error counter should not be incremented. It should only be incremented
on parsing error to reflect errors caused by clients.

This patch must be backported as far as 2.0. During the backport, the same
must be performed for 408-request-time-out errors.
2021-02-10 16:22:32 +01:00
Christopher Faulet
f4b7074784 BUG/MINOR: mux-h1: Don't increment HTTP error counter for 408/500/501 errors
The HTTP error counter reflects the number of errors caused by
clients. Thus, in the H1 mux, it should only be incremented on parsing errors.

This fix is specific for 2.4. No backport needed.
2021-02-10 16:22:32 +01:00
Willy Tarreau
826f3ab5e6 MINOR: stick-tables/counters: add http_fail_cnt and http_fail_rate data types
Historically we've been counting lots of client-triggered events in stick
tables to help detect misbehaving ones, but we've been missing the same on
the server side, and there have been repeated requests for being able to count
the server errors per URL in order to precisely monitor the quality of
service or even to avoid routing requests to certain dead services, which
is also called "circuit breaking" nowadays.

This commit introduces http_fail_cnt and http_fail_rate, which work like
http_err_cnt and http_err_rate in that they respectively count events and
their frequency, but they only consider server-side issues such as network
errors, unparsable and truncated responses, and 5xx status codes other
than 501 and 505 (since these ones are usually triggered by the client).
Note that retryable errors are purposely not accounted for, so that only
what the client really sees is considered.
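
The counting rule described above can be sketched as a predicate. The
enum and function names are hypothetical, and retries are assumed to have
been resolved already, so the input is the final outcome seen by the
client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one request as finally seen by the client. */
enum sketch_outcome {
	SKETCH_OK_RESPONSE,     /* a parsable, complete response was received */
	SKETCH_NET_ERROR,       /* network error while talking to the server */
	SKETCH_BAD_RESPONSE,    /* unparsable or truncated response */
};

/* Returns true when the event counts as a server-side failure. */
static bool sketch_is_http_fail(enum sketch_outcome o, int status)
{
	if (o != SKETCH_OK_RESPONSE)
		return true;
	assert(status >= 100);  /* a complete response always has a status */
	if (status < 500 || status > 599)
		return false;
	/* 501 and 505 are usually triggered by the client */
	return status != 501 && status != 505;
}
```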

With this it becomes very simple to put some protective measures in place
to perform a redirect or return an excuse page when the error rate goes
beyond a certain threshold for a given URL, and give more chances to the
server to recover from this condition. Typically it could look like this
to bypass a URL causing more than 10 failures per minute:

  stick-table type string len 80 size 4k expire 1m store http_fail_rate(1m)
  http-request track-sc0 base       # track host+path, ignore query string
  http-request return status 503 content-type text/html \
      lf-file excuse.html if { sc0_http_fail_rate gt 10 }

A more advanced mechanism using gpt0 could even implement high/low rates
to disable/enable the service.

Reg-test converteers_ref_cnt_never_dec.vtc was updated to test it.
2021-02-10 12:27:01 +01:00
Willy Tarreau
e4d247e217 BUG/MINOR: freq_ctr: fix a wrong delay calculation in next_event_delay()
The sleep time calculation in next_event_delay() was wrong because it
was dividing 999 by the number of pending events, and was directly
responsible for an observation made a long time ago that listeners
would eat all the CPU when hammered while globally rate-limited,
because the more events were queued, the less it would wait, and it would
ignore the configured frequency when computing the delay.

This was addressed in various ways in listeners through the switch to
the FULL state and the wakeup of manage_global_listener_queue() that
avoids this fast loop, but the calculation made there remained wrong
nevertheless. It's even visible with this patch that the accept
frequency is much more accurate at low values now; for example,
configuring a maxconnrate of 10 would give between 8.99 and 11.0 cps
before this patch and between 9.99 and 10.0 with it.
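
To make the problem concrete, here is a simplified sketch. The first
function mirrors the old behaviour as described above. The second one is
only an assumed frequency-aware variant, in which each event beyond the
current period's budget costs 1000/freq milliseconds; it is not claimed
to be the exact code of the fix. Both names are hypothetical and overflow
is ignored.

```c
#include <assert.h>

/* Old behaviour as described above: the delay shrinks as more events
 * are pending, and the configured frequency is ignored.
 */
static unsigned int sketch_delay_old(unsigned int pending)
{
	unsigned int wait;

	assert(pending > 0);
	wait = 999 / pending;
	return wait ? wait : 1;
}

/* Assumed frequency-aware variant: events beyond what one period allows
 * each cost 1000/freq milliseconds.
 */
static unsigned int sketch_delay_new(unsigned int pending, unsigned int freq)
{
	unsigned int wait;

	assert(freq > 0);
	if (pending < freq)
		return 0;       /* still within budget, no wait */
	wait = (pending - freq) * 1000 / freq;
	return wait ? wait : 1;
}
```

With 20 pending events and a frequency of 10, the first function waits
49 ms while the second waits 1000 ms, which is the time needed to absorb
the 10 excess events at 10 events per second.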

Better fix it now in case it's reused anywhere else and causes confusion
again. It may be backported but is probably not worth it.
2021-02-09 17:52:50 +01:00
William Lallemand
3ce6eedb37 MEDIUM: ssl: add a rwlock for SSL server session cache
When adding the server side support for certificate update over the CLI
we encountered a design problem with the SSL session cache which was not
locked.

Indeed, once a certificate is updated we need to flush the cache, but we
also need to ensure that the cache is not used during the update.
To prevent the use of the cache during an update, this patch introduces a
rwlock for the SSL server session cache.

In the SSL session part this patch only takes the lock for reading, even
when it writes. The reason is that in the session part there is one cache
storage per thread, so it is not a problem to write to the cache from
several threads. The problem only arises when the CLI (which could be on
any thread) tries to write to the cache while a session is trying to
access it. So there is a write lock in the CLI part to prevent
simultaneous access by a session and the CLI.

This patch also removes the thread_isolate attempt, which was eating too
much CPU time and was not protecting against the use of a freed pointer
in the session.
2021-02-09 09:43:44 +01:00
Ilya Shipitsin
7ff7747a17 BUILD: ssl: guard SSL_CTX_set_msg_callback with SSL_CTRL_SET_MSG_CALLBACK macro
Both SSL_CTX_set_msg_callback and SSL_CTRL_SET_MSG_CALLBACK have been
defined since ea262260469e49149cb10b25a87dfd6ad3fbb4ba, so we can safely
switch to that guard instead of the OpenSSL version.
2021-02-08 13:49:41 +01:00
William Dauchy
060ffc82d6 CLEANUP: tools: typo in strl2irc mention
`str2irc` does not exist

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-08 10:49:08 +01:00
William Dauchy
f4300902b9 CLEANUP: check: fix some typo in comments
A few obvious English typos in comments, some of which were introduced by
myself quite recently.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-08 10:49:08 +01:00
Ilya Shipitsin
acf84595a7 CLEANUP: assorted typo fixes in the code and comments
This is the 17th iteration of typo fixes.
2021-02-08 10:49:08 +01:00
Christopher Faulet
3d6e0e3e04 BUG/MINOR: mux-h1: Don't emit extra CRLF for empty chunked messages
Because of a buggy test when processing the EOH HTX block, an extra CRLF is
added for empty chunked messages. This bug was introduced by the commit
d1ac2b90c ("MAJOR: htx: Remove the EOM block type and use HTX_FL_EOM
instead").

This fix is specific for 2.4. No backport needed.
2021-02-08 09:43:36 +01:00
Ilya Shipitsin
f00cdb1856 BUILD: ssl: guard SSL_CTX_add_server_custom_ext with special macro
The special guard macro HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was defined
earlier exactly for guarding SSL_CTX_add_server_custom_ext, so let us use it
wherever appropriate.
2021-02-08 00:11:43 +01:00
Ilya Shipitsin
7bbf5866e0 BUILD: ssl: fix typo in HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT macro
HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was introduced in ec609098718b9c1cd803ca57442b2b98c9ba4a16,
however it was defined as HAVE_SL_CTX_ADD_SERVER_CUSTOM_EXT (missing "S").
Let us fix the typo.
2021-02-08 00:11:41 +01:00
Willy Tarreau
133aaa9f11 BUG/MEDIUM: mux-h2: do not quit the demux loop before setting END_REACHED
The demux loop could quit on missing data but the H2_CF_END_REACHED flag
would not be set in this case. This fixes a remaining situation where
previous commit f09612289 ("BUG/MEDIUM: mux-h2: handle remaining read0
cases") could not be sufficient and still leave CLOSE_WAIT. It's harder
to reproduce but was still observed in prod.

Now we quit via the end of the loop which already takes care of shutr.

This should be backported along with the patch above as far as 2.0.
2021-02-05 12:22:54 +01:00
Remi Tricot-Le Breton
25dd0ad123 BUG/MINOR: sock: Unclosed fd in case of connection allocation failure
If allocating a connection object failed right after a successful accept
on a listener, the new file descriptor was not properly closed.
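
The error path in question can be sketched as follows. The structure, the
function names and the injectable allocator are hypothetical; they only
exist to make the failure reproducible in a small example.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical connection object wrapping an accepted file descriptor. */
struct sketch_conn {
	int fd;
};

/* Wraps an already accepted fd into a connection object. <alloc> stands
 * for the allocator so that a failure can be simulated. On failure the
 * fd is closed here, since no other object owns it yet.
 */
static struct sketch_conn *sketch_conn_new(int fd, void *(*alloc)(size_t))
{
	struct sketch_conn *conn;

	assert(fd >= 0 && alloc);
	conn = alloc(sizeof(*conn));
	if (!conn) {
		close(fd);      /* otherwise the descriptor would leak */
		return NULL;
	}
	conn->fd = fd;
	return conn;
}

/* Allocator that always fails, to exercise the error path. */
static void *sketch_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```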

This fixes GitHub issue #905.
It can be backported to 2.3.
2021-02-05 12:14:51 +01:00
Christopher Faulet
1cdc028687 CLEANUP: http-htx: Set buffer area to NULL instead of malloc(0)
During error files conversion to HTX message, in http_str_to_htx(), if a
file is empty, the corresponding buffer's area is initialized with a
malloc(0) and its size is set to 0. There is no problem here. The behaviour
is totally defined. But it is not really intuitive. Instead, we can simply
set the area to NULL.

This patch should fix the issue #1022.
2021-02-05 11:51:44 +01:00
Willy Tarreau
f09612289f BUG/MEDIUM: mux-h2: handle remaining read0 cases
Commit 3d4631fec ("BUG/MEDIUM: mux-h2: fix read0 handling on partial
frames") tried to address an issue introduced in commit aade4edc1 where
read0 wasn't properly handled in the middle of a frame. But the fix was
incomplete for two reasons:

  - first, it would set H2_CF_RCVD_SHUT in h2_recv() after detecting
    a read0 but the condition was guarded by h2_recv_allowed() which
    explicitly excludes read0 ;

  - second, h2_process would only call h2_process_demux() when there
    were still data in the buffer, but closing after a short pause to
    leave a buffer empty wouldn't be caught in this case.

This patch fixes this by properly taking care of the received shutdown
and by also waking up h2_process_demux() on an empty buffer if the demux
is not blocked.

Given the patches above were tagged for backporting to 2.0, this one
should be as well.
2021-02-05 11:48:38 +01:00
Willy Tarreau
ed9892018c MINOR: cli/show_fd: report local and report ports when known
FD dumps are not always easy to match against netstat dumps, and often
require an lsof as a third dump. Let's emit the socket family, and the
local and remote ports when the FD is an IPv4/IPv6 socket, this will
significantly ease the matching.
2021-02-05 10:58:03 +01:00
Willy Tarreau
a84986ae4f BUG/MINOR: ssl: do not try to use early data if not configured
The CO_FL_EARLY_SSL_HS flag was unconditionally set on the connection,
resulting in SSL_read_early_data() always being used first in handshake
calculations. While this seems to work well (there are probably
fallback paths inside openssl), it's particularly confusing and makes
debugging quite complicated. It is possibly not optimal either.

This flag ought to be set only when early_data is configured on the bind
line. Apparently there used to be a good reason for doing it this way in
1.8 times, but it really does not make sense anymore. It may be OK to
backport this to 2.3 if this helps with troubleshooting, but better not
go too far as it's unlikely to fix any real issue while it could introduce
some in old versions.
2021-02-05 08:04:02 +01:00
Christopher Faulet
a8979a9b59 DOC: server: Add missing params in comment of the server state line parsing
srv_use_ssl and srv_check_port parameters were not mentioned in the comment
of the function parsing a server state line.
2021-02-04 14:00:43 +01:00
William Dauchy
4858fb2e18 MEDIUM: check: align agentaddr and agentport behaviour
In the same manner as for agentaddr, we now:
- permit to set agentport through `port` keyword, like it is the case
  for agentaddr through `addr`
- set the priority on `agent-port` keyword when used
- add a flag to be able to test when the value is set like for agentaddr

This makes the behaviour between `addr` and `port` more consistent.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-04 14:00:38 +01:00
William Dauchy
1c921cd748 BUG/MINOR: check: consitent way to set agentaddr
Small consistency problem with `addr` and `agent-addr` options:
for both options, the last one parsed is always used to set the
agent-check addr. Thus these two lines don't have the same behavior:

  server ... addr <addr1> agent-addr <addr2>
  server ... agent-addr <addr2> addr <addr1>

After this patch `agent-addr` will always be the priority option over
`addr`. It means we test the flag before setting agentaddr.
We also fix all the places where we did not set the flag to be coherent
everywhere.
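
The resulting priority rule can be sketched as below. The structure and
function names are hypothetical; the point is only that a flag recorded
when `agent-addr` is parsed makes the outcome independent of keyword
order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced agent-check settings. */
struct sketch_agent {
	const char *addr;       /* address used by the agent check */
	bool addr_explicit;     /* true once set through "agent-addr" */
};

/* Called when the "addr" keyword is parsed: only a fallback. */
static void sketch_set_addr(struct sketch_agent *a, const char *addr)
{
	assert(a && addr);
	if (!a->addr_explicit)
		a->addr = addr;
}

/* Called when the "agent-addr" keyword is parsed: always wins. */
static void sketch_set_agent_addr(struct sketch_agent *a, const char *addr)
{
	assert(a && addr);
	a->addr = addr;
	a->addr_explicit = true;
}
```

Applying the two setters in either order leaves the `agent-addr` value in
place, which matches the behavior expected from both server lines above.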

I was not really able to determine where this issue is coming from. So
it can probably be backported to all stable versions where the agent
is supported.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-04 13:55:04 +01:00
William Dauchy
fe03e7d045 MEDIUM: server: adding support for check_port in server state
We can currently change the check-port using the cli command `set server
check-port` but there is a consistency issue when using server state.
This patch aims to fix this problem but will be also a good preparation
work to get rid of checkport flag, so we are able to know when checkport
was set by config.

I am fully aware this does not move github #953 forward. I
however think this might be acceptable while waiting for a proper
solution, and it resolves the consistency problem faced with port settings.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-04 10:46:52 +01:00
William Dauchy
69f118d7b6 MEDIUM: check: remove checkport checkaddr flag
While trying to fix some consistency problem with the config file/cli
(e.g. check-port cli command does not set the flag), we realised
checkport flag was not necessarily needed. Indeed tcpcheck uses service
port as the last choice if check.port is zero. So we can assume if
check.port is zero, it means it was never set by the user, whether
through the cli or the config file. In the long term this will avoid
introducing a new consistency issue if we forget to set the flag.

In the same manner as for the checkport flag, we don't really need the
checkaddr flag. We can assume if checkaddr is not set, it means it was
never set by the user or config.
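
The fallback that makes the flag redundant can be sketched in one helper.
The function name is hypothetical; it only models "a zero check port means
the service port is used".

```c
#include <assert.h>

/* A zero check port means "never set": fall back to the service port. */
static unsigned int sketch_effective_check_port(unsigned int check_port,
                                                unsigned int srv_port)
{
	assert(check_port <= 65535 && srv_port <= 65535);
	return check_port ? check_port : srv_port;
}
```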

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-04 10:43:00 +01:00
Christopher Faulet
21ca3dfc3a MINOR: dns: Don't set the check port during a server dns resolution
When a server dns resolution is performed, there is no reason to set an
unconfigured check port with the server port. Because by default, if the
check port is not set, the server's one is used. Thus we can remove this
useless assignment. It is mandatory for next improvements.
2021-02-04 10:42:52 +01:00
Christopher Faulet
99497d7dba MINOR: server: Don't set the check port during the update from a state file
When the server state is loaded from a server-state file, there is no reason
to set an unconfigured check port with the server port. Because by default,
if the check port is not set, the server's one is used. Thus we can remove
this useless assignment. It is mandatory for next improvements.
2021-02-04 10:42:45 +01:00
William Dauchy
446db718cb BUG/MINOR: cli: fix set server addr/port coherency with health checks
While reading `update_server_addr_port` I found some things which
can be seen as incoherent. I hope I did not overlook anything:

- one comment is stating check's address should be updated if it uses
  the server one; however the condition checks if `SRV_F_CHECKADDR` is
  set; this flag is set when a check address is set; result is that we
  override the check address where I was not expecting it. In fact we
  don't need to update anything here as server addr is used when check
  addr is not set.
- same goes for check agent addr
- for port, it is a bit different, we update the check port if it is
  unset. This is harmless because we also use server port if check port
  is unset. However it creates some incoherency before/after using this
  command, as check port should stay unset throughout the life of the
  process unless it is set by `set server check-port` command.

It is quite hard to locate the origin of this issue but the function was
introduced in commit d458adcc52b74608e2fe6a2a95f09ce5e94932b7 ("MINOR:
new update_server_addr_port() function to change both server's ADDR and
service PORT"). I was however not able to determine whether this is due
to a change of behavior along the years. So this patch can potentially
be backported up to v1.8 but we must be careful while doing so, as the
code has changed a lot. That being said, the bug being not very
impacting I would be fine keeping it for 2.4 only.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-04 09:06:04 +01:00
William Lallemand
e0de0a6b32 MINOR: ssl/cli: flush the server session cache upon 'commit ssl cert'
Flush the SSL session cache when updating a certificate which is used on a
server line. This prevents connections from being established with a cached
session which was using the previous SSL_CTX.

This patch also replaces the ha_barrier with a thread_isolate() since there
are more operations to do. The reg-test was also updated to remove the
'no-ssl-reuse' keyword which is now unneeded.
2021-02-03 18:51:01 +01:00
Amaury Denoyelle
377d8786a7 BUG/MINOR: mux_h2: fix incorrect stat titles
Duplicate titles for the stats H2_ST_{OPEN,TOTAL}_{CONN,STREAM}. These
entries are used on csv for the heading.

This must be backported up to 2.3.

This fixes the github issue #1102.
2021-02-03 17:50:45 +01:00
Willy Tarreau
0630038e77 BUG/MEDIUM: ssl: check a connection's status before computing a handshake
As spotted in issue #822, we're having a problem with error detection in
the SSL layer. The problem is that on an overwhelmed machine, accepted
connections can start to pile up, each of them requiring a slow handshake,
and during all this time if the client aborts, the handshake will still be
calculated.

The error controls are properly placed, it's just that the SSL layer
reads records exactly of the advertised size, without having the ability
to encounter a pending connection error. As such if injecting many TLS
connections to a listener with a huge backlog, it's fairly possible to
meet this situation:

  12:50:48.236056 accept4(8, {sa_family=AF_INET, sin_port=htons(62794), sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NONBLOCK) = 1109
  12:50:48.236071 setsockopt(1109, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  (process other connections' handshakes)

  12:50:48.257270 getsockopt(1109, SOL_SOCKET, SO_ERROR, [ECONNRESET], [4]) = 0
  (proof that error was detectable there but this code was added for the PoC)

  12:50:48.257297 recvfrom(1109, "\26\3\1\2\0", 5, 0, NULL, NULL) = 5
  12:50:48.257310 recvfrom(1109, "\1\0\1\3"..., 512, 0, NULL, NULL) = 512

  (handshake calculation taking 700us)

  12:50:48.258004 sendto(1109, "\26\3\3\0z"..., 1421, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
  12:50:48.258036 close(1109)             = 0

The situation was amplified by the multi-queue accept code, as it resulted
in many incoming connections being accepted long before they could be
handled. Prior to this they would have been accepted and the handshake
immediately started, which would have resulted in most of the connections
waiting in the system's accept queue, and dying there when the client
aborted, thus the error would have been detected before even trying to
pass them to the handshake code.

As a result, with a listener running on a very large backlog, it's possible
to quickly accept tens of thousands of connections and waste time slowly
running their handshakes while they get replaced by other ones.

This patch adds an SO_ERROR check on the connection's FD before starting
the handshake. This is not pretty as it requires accessing the FD, but it
does the job.
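
Such a check can be sketched with the standard getsockopt() call. The
helper name is hypothetical and this is not the code of the patch; it
only shows how a pending socket error can be read before engaging into
an expensive operation.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 0 if no error is pending on the socket, otherwise the pending
 * errno value (or the errno of getsockopt() itself if the query failed).
 */
static int sketch_sock_pending_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	assert(len == sizeof(err));
	return err;
}
```

A caller would skip the handshake and report a connection error whenever
this returns a non-zero value, such as ECONNRESET in the trace above.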

Some improvements should be made over the long term so that the transport
layers can report extra information with their ->rcv_buf() call, or at the
very least, implement a ->get_conn_status() function to report various
flags such as shutr, shutw, error at various stages, allowing an upper
layer to inquire for the relevance of engaging into a long operation if
it's known the connection is not usable anymore. An even simpler step
could probably consist in implementing this in the control layer.

This patch is simple enough to be backported as far as 2.0.

Many thanks to @ngaugler for his numerous tests with detailed feedback.
2021-02-02 15:55:53 +01:00
William Lallemand
8695ce0bae BUG/MEDIUM: ssl/cli: abort ssl cert is freeing the old store
The "abort ssl cert" command is buggy and removes the current ckch store,
and instances, leading to SNI removal. It must only remove the new one.

This patch also adds a check in set_ssl_cert.vtc and
set_ssl_server_cert.vtc.

Must be backported as far as 2.2.
2021-02-01 17:58:21 +01:00
William Dauchy
19f7cfc8c3 MINOR: stats: improve max stats descriptions
In order to unify prometheus and stats description, we need to remove
some field references which are specific to the stats implementation:
- `scur` in max current sessions (also reword current session)
- `rate` in max sessions
- `req_rate` in max requests
- `conn_rate` in max connections

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-01 15:16:33 +01:00
William Dauchy
eedb9b13f4 MINOR: stats: improve pending connections description
In order to unify prometheus and stats description, we need to clarify
the description for pending connections.
- remove the BE reference in counters struct, as it is also used in
  servers
- remove reference of `qcur` field in description as it is specific to
  stats implementation
- try to reword cur and max pending connections description

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-01 15:16:33 +01:00
Christopher Faulet
7aa3271439 MINOR: checks: Add function to get the result code corresponding to a status
The function get_check_status_result() can now be used to get the result
code (CHK_RES_*) corresponding to a check status (HCHK_STATUS_*). It will be
used by the Prometheus exporter when reporting the check status of a server.
2021-02-01 15:16:33 +01:00