Because of a buggy tests when processing the EOH HTX block, an extra CRLF is
added for empty chunked messages. This bug was introduced by the commit
d1ac2b90c ("MAJOR: htx: Remove the EOM block type and use HTX_FL_EOM
instead").
This fix is specific for 2.4. No backport needed.
special guard macros HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was defined earlier
exactly for guarding SSL_CTX_add_server_custom_ext, let us use it wherever
appropriate
HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was introduced in ec609098718b9c1cd803ca57442b2b98c9ba4a16
however it was defined as HAVE_SL_CTX_ADD_SERVER_CUSTOM_EXT (missing "S")
let us fix typo
The demux loop could quit on missing data but the H2_CF_END_REACHED flag
would not be set in this case. This fixes a remaining situation where
previous commit f09612289 ("BUG/MEDIUM: mux-h2: handle remaining read0
cases") could not be sufficient and still leave CLOSE_WAIT. It's harder
to reproduce but was still observed in prod.
Now we quit via the end of the loop which already takes care of shutr.
This should be backported along with the patch above as far as 2.0.
If allocating a connection object failed right after a successful accept
on a listener, the new file descriptor was not properly closed.
This fixes GitHub issue #905.
It can be backported to 2.3.
During error files conversion to HTX message, in http_str_to_htx(), if a
file is empty, the corresponding buffer's area is initialized with a
malloc(0) and its size is set to 0. There is no problem here. The behaviour
is totally defined. But it is not really intuitive. Instead, we can simply
set the area to NULL.
This patch should fix the issue #1022.
Commit 3d4631fec ("BUG/MEDIUM: mux-h2: fix read0 handling on partial
frames") tried to address an issue introduced in commit aade4edc1 where
read0 wasn't properly handled in the middle of a frame. But the fix was
incomplete for two reasons:
- first, it would set H2_CF_RCVD_SHUT in h2_recv() after detecting
a read0 but the condition was guarded by h2_recv_allowed() which
explicitly excludes read0 ;
- second, h2_process would only call h2_process_demux() when there
were still data in the buffer, but closing after a short pause to
leave a buffer empty wouldn't be caught in this case.
This patch fixes this by properly taking care of the received shutdown
and by also waking up h2_process_demux() on an empty buffer if the demux
is not blocked.
Given the patches above were tagged for backporting to 2.0, this one
should be as well.
FD dumps are not always easy to match against netstat dumps, and often
require an lsof as a third dump. Let's emit the socket family, and the
local and remore ports when the FD is an IPv4/IPv6 socket, this will
significantly ease the matching.
The CO_FL_EARLY_SSL_HS flag was inconditionally set on the connection,
resulting in SSL_read_early_data() always being used first in handshake
calculations. While this seems to work well (probably that there are
fallback paths inside openssl), it's particularly confusing and makes
the debugging quite complicated. It possibly is not optimal by the way.
This flag ought to be set only when early_data is configured on the bind
line. Apparently there used to be a good reason for doing it this way in
1.8 times, but it really does not make sense anymore. It may be OK to
backport this to 2.3 if this helps with troubleshooting, but better not
go too far as it's unlikely to fix any real issue while it could introduce
some in old versions.
in the same manner of agentaddr, we now:
- permit to set agentport through `port` keyword, like it is the case
for agentaddr through `addr`
- set the priority on `agent-port` keyword when used
- add a flag to be able to test when the value is set like for agentaddr
it makes the behaviour between `addr` and `port` more consistent.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
small consistency problem with `addr` and `agent-addr` options:
for the both options, the last one parsed is always used to set the
agent-check addr. Thus these two lines don't have the same behavior:
server ... addr <addr1> agent-addr <addr2>
server ... agent-addr <addr2> addr <addr1>
After this patch `agent-addr` will always be the priority option over
`addr`. It means we test the flag before setting agentaddr.
We also fix all the places where we did not set the flag to be coherent
everywhere.
I was not really able to determine where this issue is coming from. So
it is probable we may backport it to all stable version where the agent
is supported.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
We can currently change the check-port using the cli command `set server
check-port` but there is a consistency issue when using server state.
This patch aims to fix this problem but will be also a good preparation
work to get rid of checkport flag, so we are able to know when checkport
was set by config.
I am fully aware this is not making github #953 moving forward, I
however think this might be acceptable while waiting for a proper
solution and resolve consistency problem faced with port settings.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
While trying to fix some consistency problem with the config file/cli
(e.g. check-port cli command does not set the flag), we realised
checkport flag was not necessarily needed. Indeed tcpcheck uses service
port as the last choice if check.port is zero. So we can assume if
check.port is zero, it means it was never set by the user, regardless if
it is by the cli or config file. In the longterm this will avoid to
introduce a new consistency issue if we forget to set the flag.
in the same manner of checkport flag, we don't really need checkaddr
flag. We can assume if checkaddr is not set, it means it was never set
by the user or config.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
When a server dns resolution is performed, there is no reason to set an
unconfigured check port with the server port. Because by default, if the
check port is not set, the server's one is used. Thus we can remove this
useless assignment. It is mandatory for next improvements.
When the server state is loaded from a server-state file, there is no reason
to set an unconfigured check port with the server port. Because by default,
if the check port is not set, the server's one is used. Thus we can remove
this useless assignment. It is mandatory for next improvements.
while reading `update_server_addr_port` I found out some things which
can be seen as incoherency. I hope I did not overlooked anything:
- one comment is stating check's address should be updated if it uses
the server one; however the condition checks if `SRV_F_CHECKADDR` is
set; this flag is set when a check address is set; result is that we
override the check address where I was not expecting it. In fact we
don't need to update anything here as server addr is used when check
addr is not set.
- same goes for check agent addr
- for port, it is a bit different, we update the check port if it is
unset. This is harmless because we also use server port if check port
is unset. However it creates some incoherency before/after using this
command, as check port should stay unset througout the life of the
process unless it is is set by `set server check-port` command.
quite hard to locate the origin of this this issue but the function was
introduced in commit d458adcc52b74608e2fe6a2a95f09ce5e94932b7 ("MINOR:
new update_server_addr_port() function to change both server's ADDR and
service PORT"). I was however not able to determine whether this is due
to a change of behavior along the years. So this patch can potentially
be backported up to v1.8 but we must be careful while doing so, as the
code has changed a lot. That being said, the bug being not very
impacting I would be fine keeping it for 2.4 only.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Flush the SSL session cache when updating a certificate which is used on a
server line. This prevent connections to be established with a cached
session which was using the previous SSL_CTX.
This patch also replace the ha_barrier with a thread_isolate() since there
are more operations to do. The reg-test was also updated to remove the
'no-ssl-reuse' keyword which is now uneeded.
Duplicate titles for the stats H2_ST_{OPEN,TOTAL}_{CONN,STREAM}. These
entries are used on csv for the heading.
This must be backported up to 2.3.
This fixes the github issue #1102.
As spotted in issue #822, we're having a problem with error detection in
the SSL layer. The problem is that on an overwhelmed machine, accepted
connections can start to pile up, each of them requiring a slow handshake,
and during all this time if the client aborts, the handshake will still be
calculated.
The error controls are properly placed, it's just that the SSL layer
reads records exactly of the advertised size, without having the ability
to encounter a pending connection error. As such if injecting many TLS
connections to a listener with a huge backlog, it's fairly possible to
meet this situation:
12:50:48.236056 accept4(8, {sa_family=AF_INET, sin_port=htons(62794), sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NONBLOCK) = 1109
12:50:48.236071 setsockopt(1109, SOL_TCP, TCP_NODELAY, [1], 4) = 0
(process other connections' handshakes)
12:50:48.257270 getsockopt(1109, SOL_SOCKET, SO_ERROR, [ECONNRESET], [4]) = 0
(proof that error was detectable there but this code was added for the PoC)
12:50:48.257297 recvfrom(1109, "\26\3\1\2\0", 5, 0, NULL, NULL) = 5
12:50:48.257310 recvfrom(1109, "\1\0\1\3"..., 512, 0, NULL, NULL) = 512
(handshake calculation taking 700us)
12:50:48.258004 sendto(1109, "\26\3\3\0z"..., 1421, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
12:50:48.258036 close(1109) = 0
The situation was amplified by the multi-queue accept code, as it resulted
in many incoming connections to be accepted long before they could be
handled. Prior to this they would have been accepted and the handshake
immediately started, which would have resulted in most of the connections
waiting in the the system's accept queue, and dying there when the client
aborted, thus the error would have been detected before even trying to
pass them to the handshake code.
As a result, with a listener running on a very large backlog, it's possible
to quickly accept tens of thousands of connections and waste time slowly
running their handshakes while they get replaced by other ones.
This patch adds an SO_ERROR check on the connection's FD before starting
the handshake. This is not pretty as it requires to access the FD, but it
does the job.
Some improvements should be made over the long term so that the transport
layers can report extra information with their ->rcv_buf() call, or at the
very least, implement a ->get_conn_status() function to report various
flags such as shutr, shutw, error at various stages, allowing an upper
layer to inquire for the relevance of engaging into a long operation if
it's known the connection is not usable anymore. An even simpler step
could probably consist in implementing this in the control layer.
This patch is simple enough to be backported as far as 2.0.
Many thanks to @ngaugler for his numerous tests with detailed feedback.
The "abort ssl cert" command is buggy and removes the current ckch store,
and instances, leading to SNI removal. It must only removes the new one.
This patch also adds a check in set_ssl_cert.vtc and
set_ssl_server_cert.vtc.
Must be backported as far as 2.2.
In order to unify prometheus and stats description, we need to remove
some field reference which are specific to stats implementation:
- `scur` in max current sessions (also reword current session)
- `rate` in max sessions
- `req_rate` in max requests
- `conn_rate` in max connections
Signed-off-by: William Dauchy <wdauchy@gmail.com>
In order to unify prometheus and stats description, we need to clarify
the description for pending connections.
- remove the BE reference in counters struct, as it is also used in
servers
- remove reference of `qcur` field in description as it is specific to
stats implemention
- try to reword cur and max pending connections description
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The function get_check_status_result() can now be used to get the result
code (CHK_RES_*) corresponding to a check status (HCHK_STATUS_*). It will be
used by the Prometheus exporter when reporting the check status of a server.
During the call to thread_isolate(), some other threads might have
performed some task_wakeup() which will have a call date past the
one we retrieved. It could be avoided by taking the current date
once we're alone but this would significantly affect the latency
measurements by adding the isolation time. Instead we're now only
accounting positive times, so that late wakeups normally appear
with a zero latency.
No backport is needed, this is 2.4.
In h2s_bck_make_req_headers() function, in the loop on the HTX blocks, the
most common blocks, the headers, are now handled in first, before the
start-line. The same change was already performed on the response HEADERS
frames. Thus the code is more consistent now.
When a HEADERS frame is sent, it is always when an HTX start-line block is
found. Thus, in h2s_bck_make_req_headers() and h2s_frt_make_resp_headers()
functions, it is useless to tests the start-line. Instead of being too
defensive, we use BUG_ON() now because it must not happen and must be
handled as a bug.
This patch should fix the issue #1086.
The list is always defined by definition. Thus there is no reason to test
it. There is also plenty of checks on arguments types while it is already
validated during the configuration parsing. But one thing at a time.
This patch should fix the issue #1087.
The sample fetch functions must always be called with a valid argument
list. When called by hand, if there is no argument to pass, empty_arg_list must
be used.
In the stick-table code, there are some calls to smp_fetch_src() with NULL as
argument list. It is changed to use empty_arg_list instead. It is not really a
bug because smp_fetch_src() does not use the argument list. But it is an API
bug.
This patch may be backported to all stable branches as a cleanup.
h1_process_output() function is never called with no data to send (count ==
0). Thus, the first test on count, at the beginning of the function is
useless and may be removed. This way, by reading the code, it is obvious the
<chn_htx> variable is always defined.
This patch should fix the issue #1085.
This handler can take quite some time as it deletes a large number of
entries under a lock, let's export it so that it's immediately visible
in "show profiling".
The check I/O handler, process_chk_conn and server_warmup are often
present in complex backtraces as they're impacted by locking or I/O
issues. Let's export them so that they resolve cleanly.
This finally adds the long-awaited solution to inspect the run queues
and figure what is eating the CPU or causing latencies. We can even see
the experienced latencies when profiling is enabled. Example on a
saturated process:
> show tasks
Running tasks: 14983 (4 threads)
function places % lat_tot lat_avg
process_stream 4948 33.0 5.840m 70.82ms
h1_io_cb 2535 16.9 - -
main+0x9e670 2508 16.7 2.930m 70.10ms
ssl_sock_io_cb 2499 16.6 - -
si_cs_io_cb 2493 16.6 - -
If a user enables profiling by hand, it makes sense to reset the stats
counters to provide fresh new measurements. Therefore it's worth using
this as the standard method to reset counters.
"show profiling" will now dump the stats collected by the scheduler if
profiling was previously enabled. This will immediately make it obvious
what functions are responsible for others' high latencies or which ones
are suffering from others, and should help spot issues like undesired
wakeups.
Example:
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
si_cs_io_cb 5569479 23.37s 4.196us - -
h1_io_cb 5558654 13.60s 2.446us - -
process_stream 250841 1.476s 5.882us 3.499s 13.95us
main+0x9e670 198 - - 5.526ms 27.91us
task_run_applet 17 1.509ms 88.77us 205.8us 12.11us
srv_cleanup_idle_connections 12 44.51us 3.708us 25.71us 2.142us
main+0x158c80 9 48.72us 5.413us - -
srv_cleanup_toremove_connections 5 165.1us 33.02us 123.6us 24.72us
Now when the profiling is enabled, the scheduler wlil update per-function
task-level statistics on number of calls, cpu usage and lateny, that could
later be checked using "show profiling". This will immediately make it
obvious what functions are responsible for others' high latencies or which
ones are suffering from others, and should help spot issues like undesired
wakeups. For now the stats are only collected but not reported (though they
are readable from sched_activity[] under gdb).
The new sched_activity structure will be used to collect task-level
activity based on the target function. The principle is to declare a
large enough array to make collisions rare (256 entries), and hash
the function pointer using a reduced XXH to decide where to store the
stats. On first computation an entry is definitely assigned to the
array and it's done atomically. A special entry (0) is used to store
collisions ("others"). The goal is to make it easy and inexpensive for
the scheduler code to use these to store #calls, cpu_time and lat_time
for each task.
In 2.0, commit d2d3348ac ("MINOR: activity: enable automatic profiling
turn on/off") introduced an automatic mode to enable/disable profiling.
The problem is that the automatic mode automatically changes to on/off,
which implied that the forced on/off modes aren't sticky anymore. It's
annoying when debugging because as soon as the load decreases, profiling
stops.
This makes a small change which ought to have been done first, which
consists in having two states for "auto" (auto-on, auto-off) to
distinguish them from the forced states. Setting to "auto" in the config
defaults to "auto-off" as before, and setting it on the CLI switches to
auto but keeps the current operating state.
This is simple enough to be backported to older releases if needed.
When reporting some values in debugging output we often need to have
some condensed, stable-length values. This function prints a duration
from nanosecond to years with at least 4 digits of accuracy using the
most suitable unit, always on 7 chars.
Do not consider reuse connection if available list is not allocated for
the target server. This will prevent a crash when using a standalone
server for an external purpose like socket_tcp/socket_ssl on hlua code.
For the idle/safe lists, they are considered allocated if
srv.max_idle_conns is not null.
Note that the hlua code is currently safe thanks to the additional
checks on proxy http mode and stream reuse policy not never. However,
this might not be sufficient for future code.
This patch should be backported in every branches containing the
following patch :
7f68d815af356fbe1b2e1080a88b9935581c91d2 (2.4 tree)
REORG: backend: simplify conn_backend_get
This reverts commit 62e8aaa1bd5ca96089eaa88487c700c4af4617f4.
While is works extremely well to address SSL handshake floods, it prevents
establishment of new connections during regular traffic above 50-60 Gbps,
because for an unknown reason the queue seems to have ~1.7 active tasks
per connection all the time, which makes no sense as these ought to be
waiting on subscribed events. It might uncover a deeper issue but at least
for now a different solution is needed. cf issue #822.
The test is trivial to run, just start a config with tune.runqueue-depth 10
and inject on 1GB objects with more than 10 connections. Try to connect to
the stats socket, it only works once, then the listeners are not dequeued.
In github issue #822, user @ngaugler reported some performance problems when
dealing with many concurrent SSL connections on restarts, after migrating
from 1.6 to 2.2, indicating a long time required to re-establish connections.
The Run_queue metric in the traces showed an abnormally high number of tasks
in the run queue, likely indicating we were accepting faster than we could
process. And this is indeed one of the differences between 1.6 and 2.2, the
accept I/O loop and the TLS handshakes are totally independent, so much that
they can even run on different threads. In 1.6 the SSL handshake was handled
almost immediately after the accept(), so this was limiting the input rate.
With large maxconn values, as long as there are incoming connections, new
I/Os are scheduled and many of them pass before the handshake, being tagged
for low latency processing.
The result is that handshakes get postponed, and are further postponed as
new connections are accepted. When they are finally able to be processed,
some of them fail as the client is gone, and the client had already queued
new ones. This causes an excess number of apparent connections and total
number of handshakes to be processed, just because we were accepting
connections on a temporarily saturated machine.
The solution is to temporarily pause new incoming connections when the
load already indicates that more tasks are already queued than will be
handled in a poll loop. The difficulty with this usually is to be able
to come back to re-enable the operation, but given that the metric is
the run queue, we just have to queue the global_listener_queue task so
that it gets picked by any thread once the run queues get flushed.
Before this patch, injecting with SSL reneg with 10000 concurrent
connections resulted in 350k tasks in the run queue, and a majority of
handshake timeouts noticed by the client. With the patch, the run queue
fluctuates between 1-3x runqueue-depth, the process is constantly busy, the
accept rate is maximized and clients observe no error anymore.
It would be desirable to backport this patch to 2.3 and 2.2 after some more
testing, provided the accept loop there is compatible.