Most calls to channel_abort() are associated with a call to
channel_auto_close(). The others are in areas where auto close is the
default. So, auto close is now systematically enabled when an abort is
performed on a channel, as part of the channel_abort() function.
First, it is useless to abort both channels explicitly. For HTTP streams,
http_reply_and_close() is called. This function already takes care of
aborting processing. For TCP streams, we can rely on stream_retnclose().
To set termination flags, we can also rely on http_set_term_flags() for HTTP
streams and sess_set_term_flags() for TCP streams. Thus there is no reason
to handle them by hand.
In the end, the error handling after the filters' evaluation is now quite
simple.
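In spirit, the change amounts to the following (a minimal sketch based on
the historical channel flags; the exact in-tree flags may differ):

    static inline void channel_abort(struct channel *chn)
    {
        chn->flags |= (CF_SHUTR_NOW | CF_SHUTW_NOW); /* abort both directions */
        chn->flags &= ~CF_AUTO_CONNECT;
        chn->flags |= CF_AUTO_CLOSE;  /* previously a separate
                                       * channel_auto_close() call */
    }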
We erroneously thought that an attempt to receive data was not possible if
the SC was waiting for more room in the channel buffer. A BUG_ON() was added
to detect such bugs. And in fact, it is possible.
The regression was added in commit 341a5783b ("BUG/MEDIUM: stconn: stop to
enable/disable reads from streams via si_update_rx").
This patch should fix the issue #2115. It must be backported if the commit
above is backported.
A regression was introduced when the stream's timeouts were refactored.
Write timeouts are not tested in the right order. When the timeouts of the
front SC are handled, we must test the read timeout on the request channel
and the write timeout on the response channel. But the write timeout is
tested on the request channel instead. On the back SC, the same mix-up is
performed.
We must be careful to handle timeouts before checking channel flags. To
avoid any confusion, all timeouts are now handled first, on the front and
back SCs. Then the flags of both channels are tested.
It is a 2.8-specific issue. No backport needed.
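A hedged sketch of the resulting order in process_stream() (the helper name
and surrounding code are illustrative):

    /* 1. handle timeouts on both stream-connectors first */
    sc_check_timeouts(scf);   /* front SC */
    sc_check_timeouts(scb);   /* back SC */

    /* 2. only then examine the request/response channel flags */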
There is a bug at the beginning of process_stream(). The SE_FL_ERROR flag is
tested against the backend stream-connector's flags instead of its SE
descriptor's flags. It is an old typo, introduced when the stream-interfaces
were replaced by the conn-streams.
This patch must be backported as far as 2.6.
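The fix, in spirit (a hedged sketch; sc_ep_test() is the real helper testing
the SE descriptor's flags):

    /* before: SE_FL_ERROR tested against the SC's own flags */
    if (s->scb->flags & SE_FL_ERROR)      /* wrong flag set */
        ...

    /* after: tested against the SE descriptor's flags */
    if (sc_ep_test(s->scb, SE_FL_ERROR))
        ...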
This bug arrived with this commit:
MEDIUM: quic: Ack delay implementation
After having probed the Handshake packet number space, one must not select
the Application encryption level to continue trying to build packets, as is
done when the connection is not probing. Indeed, if the ACK timer has been
triggered in the meantime, the packet builder will try to build a packet at
the Application encryption level to acknowledge the received packet. But
there is very often no 01RTT packet to acknowledge when the connection is
probing before the handshake is completed. This triggers a BUG_ON() in
qc_do_build_pkt(), which checks that the tree of ACK ranges to be used is
not empty.
Thank you to @Tristan971 for having reported this issue in GH #2109.
Must be backported to 2.6 and 2.7.
qel->pktns->tx.pto_probe is set to 0 after a probing datagram has been
prepared. There is no reason to check this parameter. Furthermore,
it is always 0 when the connection does not probe the peer.
Must be backported to 2.6 and 2.7.
As reported by @Tristan971 in GH #2116, the congestion control window could
be zero due to an inversion in the code regarding the reduction factor to be
applied. On a new loss event, it must be applied to the slow start
threshold, and the window should never go below ->min_cwnd
(2*max_udp_payload_sz).
The same issue exists in both the NewReno and Cubic algorithms. Furthermore,
in NewReno, only the threshold was decremented.
Must be backported to 2.6 and 2.7.
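In pseudo-C, the corrected order of operations on a loss event looks like
this (variable names are illustrative, not the in-tree ones):

    /* the reduction factor applies to the slow start threshold... */
    ssthresh = cwnd * reduction_factor;
    /* ...and the window is floored, so it can never reach zero */
    cwnd = MAX(ssthresh, min_cwnd);   /* min_cwnd = 2 * max_udp_payload_sz */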
Add two missing checks to avoid subtracting a big value from a smaller one.
In such a case, the resulting wrapped huge value could be passed, as the
encoded size limit not to go below, to the function which removes the last
range of a tree of ACK ranges, cancelling the ACK range deletion. The
consequence could be that no ACKs were sent.
Must be backported to 2.6 and 2.7.
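A minimal sketch of the kind of guard added (names are illustrative):

    size_t limit;

    if (needed >= room)
        limit = 0;             /* subtraction would wrap: remove nothing */
    else
        limit = room - needed; /* safe unsigned subtraction */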
qc_may_build_pkt() has been modified several times without regard for the
conditions that the functions it is supposed to gate
(qc_build_pkt()/qc_do_build_pkt()) really use to finally send packets just
after having received others. This led to contradictions and possibly very
long loops sending empty packets (PADDING-only packets), because
qc_may_build_pkt() could allow qc_build_pkt()/qc_do_build_pkt() to build a
packet while the latter did nothing except send PADDING frames, because from
its point of view there was nothing to send.
From now on, it is the job of qc_may_build_pkt() to decide if there are
packets to send just after having received others AND to provide this
information to qc_build_pkt()/qc_do_build_pkt().
Note that the unique case where acknowledgements are completely ignored is
when the endpoint must probe. But that means sending at most two datagrams!
This commit also fixes the issue reported by Willy about very low throughput
performance when the client serialized its requests.
Must be backported to 2.7 and 2.6.
This should help in diagnosing issues.
Some adjustments had to be made to avoid dereferencing quic_conn objects
from TRACE_*() calls.
Must be backported to 2.7 and 2.6.
Do not ignore very short RTTs (less than 1ms) before computing the smoothed
RTT, which is initialized to an "infinite" value (UINT_MAX).
Must be backported to 2.7 and 2.6.
Add the number of lost packets and the maximum congestion control window
computed by the algorithms to "show quic".
Do the same for the traces of the existing congestion control algorithms.
Must be backported to 2.7 and 2.6.
In fd_update_events(), we loop until there's no bit in the running_mask
that is not in the thread_mask. The problem is that the thread sets its
running_mask bit before that loop, so if two threads do the same, and
a third one just closes the FD and sets the thread_mask to 0, then
running_mask will always be non-zero, and we will loop forever. This is
trivial to reproduce when using a DNS resolver that will just answer
"port unreachable", but could theoretically happen with other types of
file descriptors too.
To fix that, just don't bother looping if we're no longer in the
thread_mask, if that happens we know we won't have to take care of the
FD, anyway.
This should be backported to 2.7, 2.6 and 2.5.
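A hedged sketch of the fixed wait loop (simplified from fd_update_events();
running_mask/thread_mask are the real fdtab fields, the rest is
illustrative):

    while (fdtab[fd].running_mask & ~fdtab[fd].thread_mask) {
        if (!(fdtab[fd].thread_mask & my_thr_bit)) /* my_thr_bit: this
                                                    * thread's mask bit */
            return;  /* FD no longer ours: stop waiting */
        __ha_cpu_relax();
    }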
Setting "shards by-group" will create one shard per thread group. This
can often be a reasonable tradeoff between a single shard, which can be
suboptimal on CPUs with many cores, and too many shards, which will eat a
lot of file descriptors. It was shown to provide good results on a
224-thread machine, with a distribution that was even smoother than the
system's since here it can take into account the number of connections
per thread in the group. Depending on how popular it becomes, it could
even become the default setting in a future version.
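An illustrative configuration snippet (listener and port are arbitrary):

    frontend fe
        bind :8080 shards by-group   # one listener shard per thread group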
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use the value -1 for
"by-thread", which requires turning the type into a signed int, but it was
already used as such everywhere anyway.
fd_migrate_on() can be used to migrate an existing FD to any thread, even
one belonging to a different group from the current one and from the
caller's. All that is needed is to make sure the FD is still valid when
the operation is performed (which is the case when such operations happen).
This is potentially slightly expensive since it locks the tgid during the
delicate operation, but it is normally performed only from an owning
thread to offer the FD to another one (e.g. reassign a better thread upon
accept()).
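A hedged usage sketch (only fd_migrate_on() is the new API; the helper
picking the target thread is hypothetical):

    int new_tid = pick_better_thread(fd);  /* hypothetical load-based choice */

    fd_migrate_on(fd, new_tid);  /* works even across thread groups, as long
                                  * as the FD remains valid meanwhile */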
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will make
it possible to quickly spot the presence of any such group in a set.
Alert when the len argument of a stick table type contains incorrect
characters.
Replace atol with strtol.
Could be backported to all maintained versions.
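A minimal sketch of the stricter parsing (error handling is illustrative):

    char *endp;
    long len = strtol(arg, &endp, 10);

    if (*endp != '\0' || len <= 0)
        goto alert;  /* reject "10abc" and friends instead of silently
                      * accepting the leading digits as atol() would */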
It was missing from the output but is sometimes convenient to observe
and understand how incoming connections are distributed. The CPU usage
is reported as the instant measurement of 100-idle_pct for each thread,
and the average value is shown for the aggregated value.
This could be backported as it's helpful in certain troubleshooting
sessions.
As the ACK frames are not added to the packet list of ack-eliciting frames,
they could not be traced. But there is a flag to identify such packets.
Let's use it to add this information to the traces of TX packets.
Must be backported to 2.6 and 2.7.
Dump, at the proto level, the packet information when its header protection
was removed.
Remove the no longer used qpkt_trace variable.
Must be backported to 2.7 and 2.6.
It is possible that the handshake was not confirmed and there was no more
packet in flight to probe with. In this case the server must wait for
the client to be unblocked without probing any packet number space, contrary
to what was revealed by the interop tests as follows:
[01|quic|2|uic_loss.c:65] TX loss pktns : qc@0x7fac301cd390 pktns=I pp=0
[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=H pp=0 tole=-102ms
[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=01RTT pp=0 if=1054 tole=-1987ms
[01|quic|5|uic_loss.c:73] quic_loss_pktns(): leaving : qc@0x7fac301cd390
[01|quic|5|uic_loss.c:91] quic_pto_pktns(): entering : qc@0x7fac301cd390
[01|quic|3|ic_loss.c:121] TX PTO handshake not already completed : qc@0x7fac301cd390
[01|quic|2|ic_loss.c:141] TX PTO : qc@0x7fac301cd390 pktns=I pp=0 dur=83ms
[01|quic|5|ic_loss.c:142] quic_pto_pktns(): leaving : qc@0x7fac301cd390
[01|quic|3|c_conn.c:5179] needs to probe Initial packet number space : qc@0x7fac301cd390
This bug was not visible before this commit:
BUG/MINOR: quic: wake up MUX on probing only for 01RTT
This means that before it, one could do bad things (probing the 01RTT packet number
space before the handshake was confirmed).
Must be backported to 2.7 and 2.6.
When end-of-stream is reported by an H2 stream, we must take care to also
report an error if end-of-input was not reported. Indeed, it is now
mandatory to set the SE_FL_EOI or SE_FL_ERROR flag when SE_FL_EOS is set.
It is a 2.8-specific issue. No backport needed.
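In spirit, the fix looks like this (a hedged sketch; se_fl_test() and
se_fl_set() are the real SE flag helpers, the field access is simplified):

    /* EOS must never be reported alone anymore */
    if (!se_fl_test(h2s->sd, SE_FL_EOI))
        se_fl_set(h2s->sd, SE_FL_ERROR);
    se_fl_set(h2s->sd, SE_FL_EOS);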
When a TCP connection is first upgraded to H1 and then to H2, the
stream-connector created by the PT mux must be destroyed because the H2 mux
cannot inherit it. When this is performed, the SE_FL_EOS flag is set but
SE_FL_EOI must also be set. It is now required to never set SE_FL_EOS
without SE_FL_EOI or SE_FL_ERROR.
It is a 2.8-specific issue. No backport needed.
Timeouts for dynamic resolutions are not handled at the stream level but by
the resolvers themselves. It means there are no connect, client or server
timeouts defined on the internal proxy used by a resolver.
While this is not an issue for DNS resolution over UDP, it can be a problem
for resolution over TCP. New sessions are automatically created when
required, and killed on excess. But only established connections are
considered; connecting ones are never killed. Because there is no connect
timeout, we rely on the kernel to report a connection error, and this may
take quite a long time.
Because resolutions are periodically triggered, this may lead to an excess
of unusable sessions in connecting state. This also prevents HAProxy from
quickly exiting on soft-stop. It is annoying, especially because there is no
reason not to set a connect timeout.
So to mitigate the issue, we now use the "resolve" timeout as the connect
timeout for the internal proxy attached to a resolver.
This patch should be backported as far as 2.4.
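In spirit, the mitigation is a one-liner at the internal proxy's setup (the
field names follow the real structures, the exact spot is illustrative):

    /* give the resolver's internal proxy a connect timeout */
    p->timeout.connect = resolvers->timeout.resolve;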
Thanks to the previous commit ("BUG/MEDIUM: dns: Kill idle DNS sessions
during stopping stage"), DNS idle sessions are killed during the stopping
stage. But the task responsible for killing these sessions runs every 5
seconds. It means that when HAProxy is stopped, we can observe a delay
before the process exits.
To reduce this delay, when the resolvers task is executed, all DNS idle
tasks are woken up.
This patch must be backported as far as 2.6.
There is no server timeout for DNS sessions over TCP. It means an idle
session cannot kill itself. There is a task running periodically, every 5s,
to kill the excess of idle sessions. But the last one is never
killed. During the stopping stage, this is an issue since dynamic
resolutions are no longer performed (2ec6f14c "BUG/MEDIUM: resolvers:
Properly stop server resolutions on soft-stop").
Before the above commit, during the stopping stage, the DNS sessions were
killed when a resolution was triggered. Now, nothing kills these sessions.
This prevents the process from finishing on soft-stop.
To fix this bug, the task killing the excess of idle sessions now kills all
idle sessions during the stopping stage.
This patch must be backported as far as 2.6.
When the log applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the stats applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the http-client applet is executed while a shut is pending, the
remaining output data must always be consumed. Otherwise, this can prevent
the stream from exiting, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
When the cli applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
from exiting, leading to a spinning loop on the applet.
This patch should fix the issue #2107. It is 2.8-specific. No backport
needed.
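A hedged sketch of the drain pattern applied by these applet fixes (the
condition is simplified; co_skip()/co_data() are the real channel helpers):

    if (shut_is_pending)  /* simplified condition */
        co_skip(sc_oc(sc), co_data(sc_oc(sc)));  /* consume all remaining
                                                  * output data */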
An applet must never set the SE_FL_EOS flag without the SE_FL_EOI or
SE_FL_ERROR flag. Here, the SE_FL_EOI flag was missing for the "quit" and
"_getsocks" commands. Indeed, these commands are terminal.
This bug triggers a BUG_ON() recently added.
This patch is related to the issue #2107. It is 2.8-specific. No backport
needed.
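In spirit (a hedged sketch using the real applet flag helper):

    /* terminal command: report end-of-input along with end-of-stream */
    se_fl_set(appctx->sedesc, SE_FL_EOI | SE_FL_EOS);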
When calls to strcpy() were replaced with calls to strlcpy2(), one of them
was replaced incorrectly, and the source and size arguments were inverted.
Correct that.
This should fix issue #2110.
These ones are generally harmless on modern compilers because the
compiler checks them. While gcc optimizes them away without even
referencing strcpy(), clang prefers to call strcpy(). Nevertheless they
prevent enabling stricter checks, so better to remove them altogether.
They were all replaced by strlcpy2() with the size of the destination,
which is always known there.
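For reference, the intended call shape (dst/src are illustrative):

    /* copies at most size-1 bytes and always NUL-terminates */
    strlcpy2(dst, src, sizeof(dst));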
strcpy() is quite nasty but tolerable for copying constants; here, however,
it copies a variable path into a node, in a code path that's not trivial
to follow given that it takes the node as the result of a tree lookup.
Let's get rid of it and mention where the entry is retrieved.
As every time strncat() is used, it's wrong, and this one is no exception.
Users often think that the length applies to the destination, but it applies
to the source, which makes the function hard to use correctly. The bug had
no impact because the length was preallocated from the sum of all the
individual lengths as measured by strlen(), so there was no chance one of
them would change in between. But it could change in the future. Let's
fix it to use memcpy() instead for strings, or byte copies for delimiters.
No backport is needed, though it can be done if it helps to apply other
fixes.
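A hedged illustration of the pitfall and its replacement (buffer names and
lengths are illustrative):

    /* strncat(): n bounds the SOURCE, not the destination... */
    strncat(dst, src, n);   /* may still overflow dst */

    /* ...so a plain byte copy with a precomputed length is clearer */
    memcpy(dst + dst_len, src, src_len);
    dst_len += src_len;
    dst[dst_len] = 0;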
Add code so that compression can be used for requests as well.
New compression keywords are introduced (an illustrative configuration
example follows below):
"direction" specifies what we want to compress. Valid values are
"request", "response", or "both".
"type-req" and "type-res" define the content-types to be compressed for
requests and responses, respectively. "type" is kept as an alias for
"type-res" for backward compatibility.
"algo-req" specifies the compression algorithm to be used for requests.
Only one algorithm can be provided.
"algo-res" provides the list of algorithms that can be used to compress
responses. "algo" is kept as an alias for "algo-res" for backward
compatibility.
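A hedged configuration example using the keywords above (frontend, port,
algorithms and content-types are illustrative):

    frontend fe
        bind :8080
        compression direction both
        compression algo-req gzip
        compression algo-res gzip deflate
        compression type-req application/json
        compression type-res text/html text/plain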
Make provision for being able to store both the compression algorithms and
the content-types to compress, for both requests and responses. For now only
the response ones are used.
Make provision for storing the compression algorithm and the compression
context twice, once for requests and once for responses. Only the response
ones are used for now.
When a trace message for an applet is dumped, if the SC exists, the stream
always exists too. There is no way to attach an applet to a health-check.
So, we can use the unsafe version __sc_strm() to get the stream.
This patch is related to #2106. Not sure it will be enough for
Coverity. However, there is no bug here.
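In spirit (a hedged sketch):

    /* safe here: an applet's SC always belongs to a stream */
    struct stream *s = __sc_strm(sc);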
On startup/reload, startup_logs_init() will try to export the startup logs
shm file descriptor through the internal HAPROXY_STARTUPLOGS_FD env
variable. While memprintf() is used to prepare the string to be exported via
setenv(), str_fd argument (first argument passed to memprintf()) could
be non NULL as a result of HAPROXY_STARTUPLOGS_FD env variable being
already set.
Indeed: str_fd is already used earlier in the function to store the result
of getenv("HAPROXY_STARTUPLOGS_FD").
The issue here is that memprintf() is designed to free the 'out' argument
if out != NULL, and here we don't expect str_fd to be freed since it was
provided by getenv(); freeing it would result in a memory violation.
To prevent any invalid free, we must ensure that str_fd is set to NULL
prior to calling memprintf().
This must be backported to 2.7 with eba6a54cd4 ("MINOR: logs: startup-logs
can use a shm for logging the reload").
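In spirit, the fix looks like this (a hedged sketch; memprintf() and the env
variable are real, the surrounding code is simplified):

    str_fd = NULL;  /* must not point to getenv()'s storage anymore:
                     * memprintf() frees *out when it is non-NULL */
    memprintf(&str_fd, "%d", fd);
    setenv("HAPROXY_STARTUPLOGS_FD", str_fd, 1);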