In stats_dump_proxy_to_buffer() function, special care was taken when
dealing with servers dump.
Indeed, stats_dump_proxy_to_buffer() can be interrupted and resumed if
buffer space is not big enough to complete dump.
Thus, a reference is taken on the server being dumped in the hope that
the server will still be valid when the function resumes.
(to prevent the server from being freed in the meantime)
While this is now true thanks to:
- "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"
We still have an issue: when resuming, saved server reference is not
dropped.
This prevents the server from being freed when we no longer use it.
Moreover, as the saved server might now be deleted
(SRV_F_DELETED flag set), the current deleted server may still be dumped
in the stats and while this is not a bug, this could be misleading for
the user.
Let's add a px_st variable to detect if the stats_dump_proxy_to_buffer()
is being resumed at the STAT_PX_ST_SV stage: perform some housekeeping
to skip deleted servers and properly drop the reference on the saved
server.
This commit depends on:
- "MINOR: server: add SRV_F_DELETED flag"
- "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"
This should be backported up to 2.6
We recently discovered a bug which affects dynamic server deletion:
When a server is deleted, it is removed from the "visible" server list.
But as we've seen in previous commit
("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by
someone who keeps a reference on it (waiting for the final srv_drop()).
Throughout this transient state, server ptr is still valid (may be
dereferenced) and the flag SRV_F_DELETED is set.
However, as the server is not part of server list anymore, we have
an issue: srv->next pointer won't be updated anymore as the only place
where we perform such update is in cli_parse_delete_server() by
iterating over the "visible" server list.
Because of this, we cannot guarantee that a server with the
SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing
to a fully removed (already freed) server.
This problem can be easily demonstrated with server dumping in
the stats:
server list dumping is performed in stats_dump_proxy_to_buffer()
The function can be interrupted and resumed later by design.
ie: output buffer is full: partial dump and finish the dump after
the flush
This is implemented by calling srv_take() on the server being dumped,
and only releasing it when we're done with it using srv_drop().
(drop can be delayed after function resume if buffer is full)
While the function design seems OK, it works with the assumption that
srv->next will still be valid after the function resumes, which is
not true. (especially if multiple servers are being removed in between
the 2 dumping attempts)
In practice, this did not cause any crash yet (at least this was not
reported so far), because server dumping is so fast that it is very
unlikely that multiple server deletions make their way between 2
dumping attempts in most setups. But still, this is a problem that we
need to address because some upcoming work might depend on this
assumption as well and for the moment it is not safe at all.
========================================================================
Here is a quick reproducer:
With this patch, we're creating a large deletion window of 3s as soon
as we reach a server named "t2" while iterating over the list.
This will give us plenty of time to perform multiple deletions before
the function is resumed.
| diff --git a/src/stats.c b/src/stats.c
| index 84a4f9b6e..15e49b4cd 100644
| --- a/src/stats.c
| +++ b/src/stats.c
| @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn *sc, struct htx *htx,
| * Temporarily increment its refcount to prevent its
| * anticipated cleaning. Call free_server to release it.
| */
| + struct server *orig = ctx->obj2;
| for (; ctx->obj2 != NULL;
| ctx->obj2 = srv_drop(sv)) {
|
| sv = ctx->obj2;
| + printf("sv = %s\n", sv->id);
| srv_take(sv);
| + if (!strcmp("t2", sv->id) && orig == px->srv) {
| + printf("deletion window: 3s\n");
| + thread_idle_now();
| + thread_harmless_now();
| + sleep(3);
| + thread_harmless_end();
| +
| + thread_idle_end();
| +
| + goto full; /* simulate full buffer */
| + }
|
| if (htx) {
| if (htx_almost_full(htx))
| @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx *appctx)
| struct channel *res = sc_ic(sc);
| struct htx *req_htx, *res_htx;
|
| + printf("http dump\n");
| /* only proxy stats are available via http */
| ctx->domain = STATS_DOMAIN_PROXY;
|
Ok, we're ready, now we start haproxy with the following conf:
global
stats socket /tmp/ha.sock mode 660 level admin expose-fd listeners thread 1-1
nbthread 2
frontend stats
mode http
bind *:8081 thread 2-2
stats enable
stats uri /
backend farm
server t1 127.0.0.1:1899 disabled
server t2 127.0.0.1:18999 disabled
server t3 127.0.0.1:18998 disabled
server t4 127.0.0.1:18997 disabled
And finally, we execute the following script:
curl localhost:8081/stats&
sleep .2
echo "del server farm/t2" | nc -U /tmp/ha.sock
echo "del server farm/t3" | nc -U /tmp/ha.sock
This should be enough to reveal the issue, I easily manage to
consistently crash haproxy with the following reproducer:
http dump
sv = t1
http dump
sv = t1
sv = t2
deletion window = 3s
[NOTICE] (2940566) : Server deleted.
[NOTICE] (2940566) : Server deleted.
http dump
sv = t2
sv = �����U
[1] 2940566 segmentation fault (core dumped) ./haproxy -f ttt.conf
========================================================================
To fix this, we add prev_deleted mt_list in server struct.
For a given "visible" server, this list will contain the pending
"deleted" servers references that point to it using their 'next' ptr.
This way, whenever this "visible" server is going to be deleted via
cli_parse_delete_server() it will check for servers in its
'prev_deleted' list and update their 'next' pointer so that they no
longer point to it, and then it will push them in its
'next->prev_deleted' list to transfer the update responsibility to the
next 'visible' server (if next != NULL).
Then, following the same logic, the server about to be removed in
cli_parse_delete_server() will push itself as well into its
'next->prev_deleted' list (if next != NULL) so that it may still use its
'next' ptr for the time it is in transient removal state.
In srv_drop(), right before the server is finally freed, we make sure
to remove it from the 'next->prev_deleted' list so that 'next' won't
try to perform the pointers update for this server anymore.
This has to be done atomically to prevent 'next' srv from accessing a
purged server.
As a result:
for a valid server, either deleted or not, 'next' ptr will always
point to a non deleted (ie: visible) server.
With the proposed fix, and several removal combinations (including
unordered cli_parse_delete_server() and srv_drop() calls), I cannot
reproduce the crash anymore.
Example tricky removal sequence that is now properly handled:
sv list: t1,t2,t3,t4,t5,t6
ops:
take(t2)
del(t4)
del(t3)
del(t5)
drop(t3)
drop(t4)
drop(t5)
drop(t2)
Set the SRV_F_DELETED flag when server is removed from the cli.
When removing a server from the cli (in cli_parse_delete_server()),
we update the "visible" server list so that the removed server is no
longer part of the list.
However, despite the server being removed from "visible" server list,
one could still access the server data from a valid ptr (ie: srv_take())
Deleted flag helps detecting when a server is in transient removal
state: that is, removed from the list, thus not visible but not yet
purged from memory.
SE_FL_EOS flag must never be set on the SE descriptor without SE_FL_EOI or
SE_FL_ERROR. When a mux or an applet report an end of stream, it must be
able to state if it is the end of input too or if it is an error.
Because all this part was recently refactored, especially the applet part,
it is a bit sensitive. Thus a BUG_ON_HOT() is used and not a BUG_ON().
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. Here, the applet is a bit
refactored to handle SE descriptor EOS, EOI and ERROR flags
The state of the opposite SC is already tested to wait the connection is
established before sending messages. So, there is no reason to test it again
before looping on the ring buffer.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting the applet to be released.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.
Here, the refactoring only reports errors by setting SE_FL_ERROR flag.
There are 3 kinds of applet in lua: The co-sockets, the TCP services and the
HTTP services. The three are refactored to use the SE descriptor instead of
the channel to report error and end-of-stream.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting the applet to be released.
This patch is bit different than others because messages handling is
dispatched in several functions. But idea if the same.
It is now the dns turn to be refactored to use the SE descriptor instead of
the channel to report error and end-of-stream. We must just be sure to
consume request data when we are waiting the applet to be released.
The state of the opposite SC is already tested to wait the connection is
established before sending requests. So, there is no reason to test it again
before looping on the ring buffer.
It is the same kind of change than for the cache applet. Idea is to use the SE
desc instead of the channel or the SC to report end-of-input, end-of-stream and
errors.
Truncated commands are now reported on error. Other changes are the same than
for the cache applet. We now set SE_FL_EOS flag instead of calling cf_shutr()
and calls to cf_shutw are removed.
We now try, as far as possible, to rely on the SE descriptor to detect end
of processing. Idea is to no longer rely on the channel or the SC to do so.
First, we now set SE_FL_EOS instead of calling and cf_shutr() to report the
end of the stream. It happens when the response is fully sent (SE_FL_EOI is
already set in this case) or when an error is reported. In this last case,
SE_FL_ERROR is also set.
Thanks to this change, it is now possible to detect the applet must only
consume the request waiting for the upper layer releases it. So, if
SE_FL_EOS or SE_FL_ERROR are set, it means the reponse was fully
handled. And if SE_FL_SHR or SE_FL_SHW are set, it means the applet was
released by upper layer and is waiting to be freed.
Just like for end of input, the end of stream reported by the endpoint
(SE_FL_EOS flag) is now handled in sc_applet_process(). The idea is to have
applets acting as muxes by reporting events through the SE descriptor, as
far as possible.
Thanks to the previous patch, it is now possible for applets to not set the
CF_EOI flag on the channels. On this point, the applets get closer to the
muxes.
The end of input reported by the endpoint (SE_FL_EOI flag), is now handled
in sc_applet_process(). This function is always called after an applet was
called. So, the applets can now only report EOI on the SE descriptor and
have no reason to update the channel too.
EOS is now acknowledge at the end of sc_conn_recv(), even if an error was
encountered. There is no reason to not do so, especially because, if it not
performed here, it will be ack in sc_conn_process().
Note, it is still performed in sc_conn_process() because this function is
also the .wake callback function and can be directly called from the lower
layer.
On truncated message, a parsing error is still reported. But an error on the
SE descriptor is also reported. This will avoid any bugs in future. We are
know sure the SC is able to detect the error, independently on the HTTP
analyzers.
In h1_rcv_pipe(), only the end of stream was reported when a read0 was
detected. However, it is also important to report the end of input or an
error, depending on the message state. This patch does not fix any real
issue for now, but some others, specific to the 2.8, rely on it.
No backport needed.
In the PT multiplexer, the end of stream is also the end of input. Thus
we must report EOI to the stream-endpoint descriptor when the EOS is
reported. For now, it is a bit useless but it will be important to
disginguish an shutdown to an error to an abort.
To be sure to not report an EOI on an error, the errors are now handled
first.
In sc_conn_recv(), if the EOS is reported by the endpoint, it will always be
acknowledged by the SC and a read0 will be performed on the input
channel. Thus there is no reason to still test at the begining of the
function because there is already a test on CF_SHUTR.
When a response is consumed, result for co_getblk() is never checked. It
seems ok because amount of output data is always checked first. But There is
an issue when we try to get the first 2 bytes to read the message length. If
there is only one byte followed by a shutdown, the applet ignore the
shutdown and loop till the timeout to get more data.
So to avoid any issue and improve shutdown detection, the co_getblk() return
value is always tested. In addition, if there is not enough data, the applet
explicitly ask for more data by calling applet_need_more_data().
This patch relies on the previous one:
* BUG/MEDIUM: channel: Improve reports for shut in co_getblk()
Both should be backported as far as 2.4. On 2.5 and 2.4,
applet_need_more_data() must be replaced by si_rx_endp_more().
When co_getblk() is called with a length and an offset to 0, shutdown is
never reported. It may be an issue when the function is called to retrieve
all available output data, while there is no output data at all. And it
seems pretty annoying to handle this case in the caller.
Thus, now, in co_getblk(), -1 is returned when the channel is empty and a
shutdown was received.
There is no real reason to backport this patch alone. However, another fix
will rely on it.
There is a bug in a way the channels flags are checked to set clientfin or
serverfin timeout. Indeed, to set the clientfin timeout, the request channel
must be shut for reads (CF_SHUTR) or the response channel must be shut for
writes (CF_SHUTW). As the opposite, the serverfin timeout must be set when
the request channel is shut for writes (CF_SHUTW) or the response channel is
shut for reads (CF_SHUTR).
It is a 2.8-dev specific issue. No backport needed.
In the commut b08c5259e ("MINOR: stconn: Always report READ/WRITE event on
shutr/shutw"), a return statement was erroneously removed from
sc_app_shutr(). As a consequence, CF_SHUTR flags was never set. Fortunately,
it is the default .shutr callback function. Thus when a connection or an
applet is attached to the SC, another callback is used to performe a
shutdown for reads.
It is a 28-dev specific issue. No backport needed.
It is not possible to successfully match an empty response. However using
regex, it should be possible to reject response with any content. For
instance:
tcp-check expect !rstring ".+"
It may seem a be strange to do that, but it is possible, it is a valid
config. So it must work. Thanks to this patch, it is now really supported.
This patch may be backported as far as 2.2. But only if someone ask for it.
As timestamps based on now_ms values are used to compute the probing timeout,
they may wrap. So, use ticks API to compared them.
Must be backported to 2.7 and 2.6.
This this commit, this is ->idle_expire of quic_conn struct which must
be taken into an account to display the idel timer task expiration value:
MEDIUM: quic: Ack delay implementation
Furthermore, this value was always zero until now_ms has wrapped (20 s after
the start time) due to this commit:
MEDIUM: clock: force internal time to wrap early after boot
Do not rely on the value of now_ms compared to ->idle_expire to display the
difference but use ticks_remain() to compute it.
Must be backported to 2.7 where "show quic" has already been backported.
This bug arrived with this commit:
MEDIUM: quic: Ack delay implementation
It is possible that the idle timer task was already in the run queue when its
->expire field was updated calling qc_idle_timer_do_rearm(). To prevent this
task from running in this condition, one must check its ->expire field value
with this condition to run the task if its timer has really expired:
!tick_is_expired(t->expire, now_ms)
Furthermore, as this task may be directly woken up with a call to task_wakeup()
all, for instance by qc_kill_conn() to kill the connection, one must check this
task has really been woken up when it was in the wait queue and not by a direct
call to task_wakeup() thanks to this test:
(state & TASK_WOKEN_ANY) == TASK_WOKEN_TIMER
Again, when this condition is not fulfilled, the task must be run.
Must be backported where the commit mentionned above was backported.
As found in issue #2089, it's easy to mistakenly paste a colon in a
header name, or other chars (e.g. spaces) when quotes are in use, and
this causes all sort of trouble in field because such chars are rejected
by the peer.
Better try to detect these upfront. That's what we're doing here during
the parsing of the add-header/set-header/early-hint actions, where a
warning is emitted if a non-token character is found in a header name.
A special case is made for the colon at the beginning so that it remains
possible to place any future pseudo-headers that may appear. E.g:
[WARNING] (14388) : config : parsing [badchar.cfg:23] : header name 'X-Content-Type-Options:' contains forbidden character ':'.
This should be backported to 2.7, and ideally it should be turned to an
error in future versions.
As now_ms may be zero, these BUG_ON() could be triggered when its value has wrapped.
These call to BUG_ON() may be removed because the values they was supposed to
check are safely used by the ticks API.
Must be backported to 2.6 and 2.7.
If OPENSSL_NO_DEPRECATED is set, we get a 'error: ‘RSA_PKCS1_PADDING’
undeclared' when building jwt.c. The symbol is not deprecated, we are
just missing an include.
This was raised in GitHub issue #2098.
It does not need to be backported.