Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we must no longer use (1UL << thr) to get the group mask
for thread <thr>, but (ha_thread_info[thr].ltid_bit) instead. wdt_handler()
needs this.
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we must no longer use (1UL << thr) to get the group mask
for thread <thr>, but (ha_thread_info[thr].ltid_bit) instead.
ha_thread_dump() needs this.
In order to replace the global "all_threads_mask" we'll need to have an
equivalent per group. Take this opportunity to call it threads_enabled
and to make clear which threads are counted there (in case we later allow
some to be stopped).
Now that the tgid is accessible from the thread, it's pointless to keep
it in the group, where it was only set but never used. However we'll soon
frequently need the mask corresponding to the group ID, and the risk of
getting it wrong (with the +1, or by shifting 1 instead of 1UL) is
significant, so let's store the tgid_bit there.
At several places we're dereferencing the thread group just to retrieve
the group number, and this will become even more frequent once we start
to use per-group contexts. Let's just add the tgid to the thread_info
struct to make this easier.
Every single place where sleeping_thread_mask was still used was to test
or set a single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.
The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.
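The difference can be sketched with C11 atomics (illustrative names and
flag values, not haproxy's internal macros):

    #include <stdatomic.h>

    #define TH_FL_NOTIFIED 0x2UL      /* illustrative flag value */

    atomic_ulong thread_flags;        /* stand-in for a thread's flags */

    /* two-step variant: the flags read first may be stale by the time
     * the OR is performed, so decisions based on <prev> are unreliable.
     */
    unsigned long wake_load_then_or(void)
    {
        unsigned long prev = atomic_load(&thread_flags);

        atomic_fetch_or(&thread_flags, TH_FL_NOTIFIED);
        return prev;
    }

    /* fetch-and-or: reading the previous value and setting the bit are
     * one indivisible operation, so <prev> can be trusted.
     */
    unsigned long wake_fetch_or(void)
    {
        return atomic_fetch_or(&thread_flags, TH_FL_NOTIFIED);
    }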
This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread():
frontend test
bind :8888 maxconn 10 thread 1-2
rate-limit sessions 5
Alternatively, disabling/enabling a frontend in a loop via the CLI also
broadcasts such events, but they're more difficult to observe since
this causes connection failures.
Right now when an inter-thread wakeup happens, we first check if the
thread was asleep, and if so we wake the poller up and remove its bit from
the sleeping mask. That's not very clean since the sleeping mask cannot be
entirely trusted: a thread that's about to wake up will already have
its sleeping bit removed.
This patch adds a new per-thread flag (TH_FL_NOTIFIED) to remember that a
thread was notified to wake up. It's cleared before checking the task lists
last, so that new wakeups can be considered again (since wake_thread() is
only used to notify about task wakeups and FD polling changes). This way
we do not need to modify a remote thread's sleeping mask anymore. As such
wake_thread() now only tests and sets the TH_FL_NOTIFIED flag but doesn't
clear sleeping anymore.
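A minimal sketch of the resulting logic, with simplified stand-ins for
haproxy's internal flags and helpers:

    #include <stdatomic.h>

    #define TH_FL_SLEEPING 0x1UL   /* illustrative flag values */
    #define TH_FL_NOTIFIED 0x2UL

    extern void wake_up_poller(void);  /* hypothetical helper (e.g. wakeup pipe) */

    /* only disturb the poller if the thread sleeps and was not already
     * notified; the sleeping bit itself is left for its owner to clear.
     */
    void wake_thread_sketch(atomic_ulong *flags)
    {
        unsigned long prev = atomic_fetch_or(flags, TH_FL_NOTIFIED);

        if ((prev & TH_FL_SLEEPING) && !(prev & TH_FL_NOTIFIED))
            wake_up_poller();
    }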
When enabling an FD that's only bound to another thread, instead of
always picking the first one, let's pick a random one. This is rarely
used (enabling a frontend, or a session rate-limiting period ending),
and reduces the chances that some obscure corner cases degenerate into
a poorly distributed load.
Till now, updt_fd_polling() used to check if all the target threads were
sleeping, and only then would wake an owning thread up. This causes
several problems, among which the need for the sleeping_thread_mask and
the fact that by the time we wake one thread up, the mask has already
changed.
This commit changes this by leaving it to wake_thread() to perform this
check on the selected thread, since wake_thread() is already capable of
doing this now. Concretely speaking, for updt_fd_polling() it will mean
performing one computation of an ffsl() before knowing the sleeping status
on a global FD state change (which is very rare and not important here,
as it basically happens after relaxing a rate-limit (i.e. once a second
at best) or after enabling a frontend from the CLI); thus we don't care.
When returning from the polling syscall, all pollers have a certain
dance to follow, made of wall clock updates, thread harmless updates,
idle time management and sleeping mask updates. Let's have a centralized
function to deal with all of this boring stuff: fd_leaving_poll(), and
make all the pollers use it.
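In substance the helper boils down to the following sequence (a sketch
reusing the names from this description; the exact calls and fields are
assumptions):

    static inline void fd_leaving_poll(int wait_time, int status)
    {
        clock_leaving_poll(wait_time, status);            /* wall clock + idle time */
        thread_harmless_end();                            /* leave the harmless period */
        _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);  /* not sleeping anymore */
    }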
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
Almost every call place of wake_thread() checks for sleeping threads and
clears the sleeping mask itself, while the function is solely used for
that purpose. Let's move the check and the clearing of the bit inside the
function
itself. Note that updt_fd_polling() still performs the check because its
rules are a bit different.
Now that the inter-task wakeups are cheap, there's no point in using
task_instant_wakeup() anymore when dequeueing tasks. The use of the
regular task_wakeup() is sufficient there and will preserve a better
fairness: the test that went from 40k to 570k RPS now gives 580k RPS
(down from 585k RPS with previous commit). This essentially reverts
commit 27fab1dcb ("MEDIUM: queue: use tasklet_instant_wakeup() to
wake tasks").
Since we don't mix tasks from different threads in the run queues
anymore, we don't need to use the eb32sc_ trees and we can switch
to the regular eb32 ones. This uses cheaper lookup and insert code,
and a 16-thread test on the queues shows a performance increase
from 570k RPS to 585k RPS.
This bit field used to be a per-thread cache of the result of the last
lookup of the presence of a task for each thread in the shared cache.
Since we now know that each thread has its own shared cache, a test of
emptiness is now sufficient to decide whether or not the shared tree
has a task for the current thread. Let's just remove this mask.
grq_total was only used to know how many tasks were being queued in the
global runqueue for stats purposes, and that was transferred to the per
thread rq_total counter once assigned. We don't need this anymore since
we know where they are, so let's just directly update rq_total and drop
that one.
Since we only use the shared runqueue to put tasks assigned to
known threads, let's move that runqueue to each of these threads. The
goal will be to arrange an N*(N-1) mesh instead of a central contention
point.
The global_rqueue_ticks had to be dropped (for good) since we'll now
use the per-thread rqueue_ticks counter for both trees.
A few points to note:
- the rq_lock still remains the global one for now so there should not
be any gain in doing this, but should this trigger any regression, it
is important to detect whether it's related to the lock or to the tree.
- there's no more reason for using the scope-based version of the ebtree
now, we could switch back to the regular eb32_tree.
- it's worth checking if we still need TASK_GLOBAL (probably only to
delete a task in one's own shared queue maybe).
The runqueue ticks counter is per-thread and wasn't initially meant to
be shared. We'll soon have to share it so let's make it atomic. It's
only updated when waking up a task, and no performance difference was
observed. It was moved in the thread_ctx struct so that it doesn't
pollute the local cache line when it's later updated by other threads.
TASK_SHARED_WQ was set upon task creation and never changed afterwards.
Thus if a task was created to run anywhere (e.g. a check or a Lua task),
all its timers would always pass through the shared timers queue with a
lock. Now we know that tid<0 indicates a shared task, so we can use that
to decide whether or not to use the shared queue. The task might be
migrated using task_set_affinity() but it's always dequeued first so
the check will still be valid.
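In substance the queue selection becomes a simple test (an illustrative
sketch; the exact field and variable names are assumptions):

    static struct eb_root *task_wq_for(const struct task *t)
    {
        if (t->tid < 0)
            return &timers;                     /* shared queue, lock required */
        return &ha_thread_ctx[t->tid].timers;   /* per-thread queue, no lock */
    }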
Not only this removes a flag that's difficult to keep synchronized with
the thread ID, but it should significantly lower the load on systems with
many checks. A quick test with 5000 servers and fast checks that were
saturating the CPU shows that the check rate increased by 20% (hence the
CPU usage dropped by 17%). It's worth noting that run_task_lists() almost
no longer appears in perf top now.
This removes the mask-based variant so that from now on the low-level
function becomes appctx_new_on() and it takes either a thread number or
a negative value for "any thread". This way we can use task_new_on() and
task_new_anywhere() instead of task_new() which will soon disappear.
At a few places the task's thread mask was used directly. Now we know
that it's always either one bit or all bits of all_threads_mask, so we
can replace it with either 1<<tid or all_threads_mask depending on what's
expected.
It's worth noting that the global_tasks_mask is still set this way and
that it's reaching its limits. Similarly, the task_new() API would deserve
an update to stop using a thread mask and use a thread number instead.
Similarly, task_set_affinity() should be updated to directly take a
thread number.
At this point the task's thread mask is not used anymore.
At several places we need to figure the ID of the first thread allowed
to run a task. Till now this was performed using my_ffsl(t->thread_mask)
but since we now have the thread ID stored into the task, let's use it
instead. This is tagged major because it starts to assume that tid<0 is
strictly equivalent to atleast2(thread_mask), and that as such, the
current thread is among the allowed ones.
The tasks currently rely on a mask but do not have an assigned thread ID,
contrary to tasklets. However, in practice they're either running on a
single thread or on any thread, so that it will be worth simplifying all
this in order to ease the transition to the thread groups.
This patch introduces a "tid" field in the task struct, that's either
the number of the thread the task is attached to, or a negative value
if the task is not bound to a thread (i.e. its mask is all_threads_mask).
The new ID is only set and updated but not used yet.
The thread mask will not be used anymore, instead the thread id only
is used. Interestingly it was already implemented in the parsing but
not used. The single/multi thread argument is not needed anymore since
it's sufficient to pass tid<0 to get a multi-threaded task/tasklet.
This is in preparation for the removal of the thread_mask in tasks as
only this debug code was using it!
Before 2.3, after an async crypto processing or on session close, the
engine's async file descriptors were removed from the fdtab but not
closed, because the engine created the file descriptor and is responsible
for closing it. In 2.3 the fd_remove() call was replaced by fd_stop_both()
which stops the polling but does not remove the fd from the fdtab, so the
fd remains in the fdtab indefinitely.
A simple replacement by fd_delete() is not a valid fix because fd_delete()
removes the fd from the fdtab but also closes the fd. The fd would then be
closed twice: by haproxy's core and by the engine itself.
Instead, let's set FD_DISOWN on the FD before calling fd_delete() which will
take care of not closing it.
This patch must be backported on branches >= 2.3, and it relies on this
previous patch:
MINOR: fd: add a new FD_DISOWN flag to prevent from closing a deleted FD
As mentioned in the patch above, a different flag will be needed in 2.3.
Some FDs might be offered to some external code (external libraries)
which will deal with them until they close them. As such we must not
close them upon fd_delete() but we need to delete them anyway so that
they do not appear anymore in the fdtab. This used to be handled by
fd_remove() before 2.3 but we don't have this anymore.
This patch introduces a new flag FD_DISOWN to let fd_delete() know that
the core doesn't own the fd and it must not be closed upon removal from
the fd_tab. This way it's totally unregistered from the poller but still
open.
This patch must be backported on branches >= 2.3 because it will be
needed to fix a bug affecting SSL async. It should be adapted on 2.3
because state flags were stored in a different way (via bits in the
structure).
Adjust the FIN signal on the Rx path for the application layer: ensure
that the receive buffer has no gap.
Without this extra condition, FIN was signalled as soon as the STREAM
frame with FIN was received, even if we were still waiting to receive
missing offsets.
This bug could have led to incomplete requests being read from the
application protocol. However, in practice this bug has very little
chance to happen as the application layer ensures that the demuxed frame
length is equivalent to the buffer data size. The only way for it to
happen is to receive the STREAM frame with FIN while the H3 demuxer is
still processing a frame which is not the last one of the stream.
This must be backported up to 2.6. The previous patch on ncbuf is
required for the newly defined function ncb_is_fragmented().
MINOR: ncbuf: implement ncb_is_fragmented()
Implement a new status function for ncbuf. It quickly reports whether a
buffer contains data in a fragmented way, i.e. with gaps in between or at
the start of the buffer.
To summarize, a buffer is considered as non-fragmented in the following
cases:
- a null or empty buffer
- a full buffer
- a buffer containing exactly one data block at the beginning, followed
by a gap until the end.
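To illustrate the rule on a simplified model of the buffer (the real
ncbuf encodes its blocks differently; this only shows the predicate):

    #include <stddef.h>

    /* simplified model: <first_blk> is the size of the contiguous data
     * block starting at offset 0, <data> the total amount stored.
     */
    struct ncb_model {
        size_t size;      /* buffer capacity */
        size_t first_blk; /* contiguous data at the very beginning */
        size_t data;      /* total data, possibly in several blocks */
    };

    /* the three non-fragmented cases (null/empty, full, single leading
     * block) all boil down to: the first block covers all stored data.
     */
    static int ncb_model_is_fragmented(const struct ncb_model *b)
    {
        return b->first_blk != b->data;
    }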
Since a previous refactoring, the application protocol layer is no longer
required to call qcs_consume(). This function is now automatically used
by the MUX itself.
First we add a loop around recvfrom() in the lowest level I/O handler
quic_sock_fd_iocb() to collect as many datagrams as possible during its
tasklet wakeup, with a limit: we recvfrom() at most "maxpollevents"
datagrams. Furthermore we add a local task list to the datagram handler
quic_lstnr_dghdlr() which is passed to the first datagram parser
qc_lstnr_pkt_rcv(). This latter parser only identifies the connection
associated to the datagrams, then wakes up the highest level packet parser
I/O handlers (quic_conn.*io_cb()) after it is done, thanks to the call to
tasklet_wakeup_after() which from now on replaces the call to
tasklet_wakeup(). This should drastically reduce the latency and the
chances to fill up the RX buffers at the QUIC connection level as reported
in GH #1737 by Tritan.
These modifications depend on this commit:
"MINOR: task: Add tasklet_wakeup_after()"
Must be backported to 2.6 with the previous commit.
Remove the call to qc_list_all_rx_pkts() which prints messages on stderr
during RX buffer overruns, and add a new counter for the number of packets
dropped because of such events.
Must be backported to 2.6
When the connection RX buffer is full, the received packets are dropped.
Some of them were not taken into account by the ->dropped_pkt counter.
This very simple patch has no impact at all on the packet handling workflow.
Must be backported to 2.6.
We want to be able to schedule a tasklet onto a thread after the current
tasklet is done. What we have to do is to insert this tasklet at the head
of the thread task list. Furthermore, we would like to serialize the
tasklets: they must be run in the same order as the order in which they
have been scheduled. This is implemented by passing a list of tasklets as
parameter (see the <head> parameters) which must be reused for subsequent
calls.
_tasklet_wakeup_after_on() is implemented to accomplish this job.
tasklet_wakeup_after_on() and tasklet_wakeup_after() are only wrapper
macros around _tasklet_wakeup_after_on(). tasklet_wakeup_after_on() does
exactly the same thing as _tasklet_wakeup_after_on() without having to
pass the filename and line number as parameters (useful when DEBUG_TASK
is enabled). tasklet_wakeup_after() also hides the usage of the thread
parameter, which is the <tl> tasklet's thread ID.
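A hypothetical usage sketch, assuming the <head> handling described above
(the exact prototypes may differ):

    /* wake tl1 then tl2 right after the current tasklet completes,
     * preserving their scheduling order. The <head> returned by each
     * call must be passed back to the next one from the same context.
     */
    struct list *head = NULL;

    head = tasklet_wakeup_after(head, tl1);
    head = tasklet_wakeup_after(head, tl2);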
Return the QPACK_DECOMPRESSION_FAILED error code when dealing with dynamic
table references. This is justified as for now haproxy does not implement
dynamic table support and advertises a zero-sized table.
The H3 calling function will thus reuse this code in a CONNECTION_CLOSE
frame, in conformance with the QPACK RFC.
This proper error management allows removing obsolete ABORT_NOW guards.
This is a complement to partial fix from commit
debaa04f9e
BUG/MINOR: qpack: abort on dynamic index field line decoding
The main objective is to fix a coverity report about the usage of an
uninitialized variable when receiving dynamic table references. These
references are invalid as for the moment haproxy advertises a 0-sized
dynamic table. An ABORT_NOW clause is present to catch this. A following
patch will clean this up in order to properly handle QPACK errors with
CONNECTION_CLOSE.
This should fix github issue #1753.
No need to backport as this was introduced in the current dev branch.
Emit a CONNECTION_CLOSE if HEADERS parsing function returns an error.
This is useful to remove previous ABORT_NOW guards.
For the moment, the whole connection is closed. In the future, it may be
justified to only reset the faulting stream in some cases. This requires
the implementation of RESET_STREAM emission.
The local variable 't' was renamed 'static_tbl'. Fix its name in the
qpack_debug_printf() statement which is activated only with QPACK_DEBUG
mode.
No need to backport as this was introduced in current dev branch.
This patch adds a filter to limit bandwidth at the stream level. Several
filters can be defined. A filter may limit incoming data (upload) or
outgoing data (download). The limit can be defined per-stream or shared via
a stick-table. For a given stream, the bandwidth limitation filters can be
enabled using the "set-bandwidth-limit" action.
A bandwidth limitation filter can be used indifferently on HTTP or TCP
streams. For HTTP streams, only the payload transfer is limited. The filter
is pretty simple for now. But it was designed to be extensible. The current
design tries, as far as possible, to never exceed the limit. There is no
burst.
In GH #1760 (which is marked as being a feature), there were compilation
errors on MacOS which could be reproduced on Linux when building 32-bit
code (-m32 gcc option). Most of them were due to mixing variable types in
the QUIC_MIN macro or to using the size_t type in place of the uint64_t
type.
Must be backported to 2.6.
This previous commit:
"BUG/MAJOR: Big RX dgrams leak when fulfilling a buffer"
partially fixed an RX dgram memleak. There was a missing break in the loop
which looks for the first datagram attached to an RX buffer dgrams list
which may be reused (because consumed by the connection thread). So when
several dgrams were consumed by the connection thread and are present in
the RX buffer list, some were leaked because never reused again: they were
removed from their list.
Furthermore, as commented in this patch, there is always at least one dgram
object attached to an RX dgrams list, except the first time we enter this
I/O handler function for this RX buffer. So, there is no need to use a loop
to look up and reuse the first datagram in an RX buffer dgrams list.
This issue was reproduced with the quiche client with plenty of POST requests
(100000 streams):
cargo run --bin quiche-client -- https://127.0.0.1:8080/helloworld.html
--no-verify -n 100000 --method POST --body /var/www/html/helloworld.html
and could be reproduced with GET requests.
This bug was reported by Tristan in GH #1749.
Must be backported to 2.6.
When entering the quic_sock_fd_iocb() I/O handler which is responsible
for recvfrom()ing datagrams, the first thing which is done is to try
to reuse a dgram object containing metadata about the received
datagrams which has been consumed by the connection thread.
If this object could not be used for any reason, then when we
"goto out" of this function, we must release the memory allocated
for this object, otherwise it will leak. Most of the time, this happened
when we filled up a buffer, as reported in GH #1749 by Tristan. This is why
we add a pool_free() call just before the out label. We mark <new_dgram>
as NULL when it could successfully be used.
Thanks to Tristan and Willy for their participation on this issue.
Must be backported to 2.6.
After having filled a buffer, then marked it as full, we must
consume the remaining space. But to do that, and not to erase the
already existing data, we must check that there is no remaining data
after the tail of the buffer (between the tail and the head).
This is done by adding a condition to test that adding the number of
bytes from the remaining contiguous space to the tail does not
pass the wrapping position in the buffer.
Must be backported to 2.6.
Sometimes using "debug dev memstats" can be frustrating because all
pool allocations are reported through pool-os.h and that's all.
But in practice there's nothing wrong with also intercepting pool_alloc,
pool_free and pool_zalloc and reporting their call counts and locations,
so that's what this patch does. It only uses an alternate set of macros
for these 3 calls when DEBUG_MEM_STATS is defined. The outputs are
reported as P_ALLOC (for both pool_alloc() and pool_zalloc()) and
P_FREE (for pool_free()).
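The interception principle looks like this (a simplified sketch, not the
exact haproxy macros):

    /* each call site gets its own static record thanks to the macro
     * expansion; a GCC statement expression forwards to the underlying
     * allocator. The real implementation also places the records in a
     * dedicated section so that "debug dev memstats" can enumerate them.
     */
    struct mem_stats {
        const char *file;
        int line;
        unsigned long calls;
    };

    #define pool_alloc(pool) ({                                              \
        static struct mem_stats _s = { .file = __FILE__, .line = __LINE__ }; \
        _s.calls++;                                                          \
        __pool_alloc(pool);   /* underlying function, name assumed */        \
    })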
A curious practice seems to have started long ago and contaminated various
code areas, consisting in appending "_pool" at the end of the name of a
given pool. That makes no sense as the name is only used to name the pool
in diags such as "show pools", and since names are truncated there, this
adds some confusion when analysing the dump outputs. Let's just clean all
of them at once. They were essentially in SSL and QUIC.
There's a subtle bug in stream_free() when releasing captures. The
pools may be NULL when no capture is defined, and the calls to
pool_free() are unconditional. The only reason why this doesn't
cause trouble is because the pointer to be freed is always NULL in
this case and we don't go further down the chain. That's particularly
ugly and it complicates debugging, so let's only call these ones when
the pointers are set.
There's no impact on running code, it only fools those trying to debug
pools manually. There's no need to backport it, though, unless it helps
for debugging sessions.
The freq_ctr_overshoot_period() function may be used to retrieve the
excess of events over the current period for a given frequency counter,
ignoring the history. It is a way to compare the "current rate" (the
number of events over the current period) to a given rate and estimate
the excess of events.
It may be used to safely add new events, especially at the beginning of
the current period for a frequency counter with a large period. This way,
it is possible to smoothly add events during the whole period without
quickly consuming all the quota at the beginning of the period and then
waiting for the next one to be able to add new events.
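The idea can be illustrated with a small standalone computation (names
and rounding are illustrative, not the exact haproxy code):

    /* <curr> events were counted <elapsed> ms into a period of <period>
     * ms (period > 0), while the target rate allows <freq> events per
     * full period. A positive result is the excess over what that rate
     * permits so far.
     */
    static int overshoot_sketch(unsigned int curr, unsigned int elapsed,
                                unsigned int period, unsigned int freq)
    {
        unsigned long long budget = (unsigned long long)freq * elapsed / period;

        return (int)(curr - (unsigned int)budget);
    }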
Because of the previous fix, if the HTTP parsing is performed when the
"method" sample fetch is called, we always rely on the string representation
of the request method.
Indeed, if no parsing was performed when the "method" sample fetch is
called, the transaction method is HTTP_METH_OTHER because it was just
initialized. However, without this patch, in this case, we always retrieve
the method by reading the request start-line.
Now, when the method is HTTP_METH_OTHER, we systematically try to parse the
request but the method is tested once again after the parsing to be able to
use the integer representation when possible.
This patch must be backported as far as 2.0.
This patch is required to fix the "method" sample fetch. But it makes
sense to initialize the method of an HTTP transaction to HTTP_METH_OTHER.
This way,
before the request parsing, the method is considered as unknown except if we
are able to retrieve the request start-line. It is especially important for
TCP streams.
About the "method" sample fetch, this patch is a way to be sure no random
method is returned when the sample fetch is used on a TCP stream before any
HTTP parsing.
This patch must be backported as far as 2.0.
This bug was revealed by the key_update QUIC tracker test. During this
test, after the handshake has succeeded, the client sends a 1-RTT packet
containing only a PING frame. On our side, we do not acknowledge it,
because we have no ack-eliciting packet to send. This is not correct. We
must acknowledge all the ack-eliciting packets unconditionally. But we
must not send too many ACK frames: only one every two packets since we
last sent an ACK frame. This is the test
(nb_aepkts_since_last_ack >= QUIC_MAX_RX_AEPKTS_SINCE_LAST_ACK) which
ensures this is the case. But this condition must be checked at the very
last moment, when we are building a packet and when an acknowledgment is
required. It is not up to qc_may_build_pkt() to do that. This boolean
function decides if we have packets to send. It must return "true" if
there are no more ack-eliciting packets to send and if an acknowledgement
is required.
We must also add a "force_ack" parameter to qc_build_pkt() to force the
acknowledgments during the handshakes (for each packet). If not, they
would be sent every two packets. This parameter must be passed to
qc_do_build_pkt() and be checked alongside the other conditions when
building a packet to decide whether or not to add an ACK frame to this
packet.
Must be backported to 2.6.
A bug was introduced by commit 9bf3a1f67e
"BUG/MINOR: ssl: Fix crash when no private key is found in pem".
If a private key is already contained in a pem file, we will still look
for a .key file and load its private key if it exists when we should
not.
This patch should be backported to all branches where the original fix
was backported (all the way to 2.2).
As reported in issue #1755, gcc-9.3 and 9.4 emit a "maybe-uninitialized"
warning in cli_io_handler_commit_cafile_crlfile() because it sees that
when the "path" variable is not set, we're jumping to the error label
inside the loop but cannot see that the new state will avoid the places
where the value is used. Thus it's a false positive but a difficult one.
Let's just preset the value to NULL to make it happy.
This was introduced in 2.7-dev by commit ddc8e1cf8 ("MINOR: ssl_ckch:
Simplify I/O handler to commit changes on CA/CRL entry"), thus no
backport is needed for now.
Sometimes we need to be able to signal one thread among a mask, without
caring much about which bit will be picked. At the moment we use ffsl()
for this, but it sometimes results in an imbalance at certain specific
places, where the first thread in a set is always the one selected.
Another approach would consist in using the rank finding function but it
requires a popcount and a setup phase, and possibly a modulo operation
depending on the popcount, which starts to be very expensive.
Here we take a different approach. The idea is that an input bit value
is passed, from 0 to LONGBITS-1, and that as much as possible we try to
pick the bit matching it if it is set. Otherwise we look at a mirror
position based on a decreasing power of two, and jump to the side
that still has bits left. In 6 iterations it ends up spotting one bit
among 64 and the operations are very cheap and optimizable. This method
has the benefit that we don't care where the holes are located in the
mask, thus it shows a good distribution of output bits based on the
input ones. A long-time test shows an average of 16 cycles, or ~4ns
per lookup at 3.8 GHz, which is about twice as fast as using the rank
finding function.
Just like for that one, the code was stored into tools.c since we don't
have a C file for intops.
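The approach can be sketched as follows (an illustrative reimplementation
assuming 64-bit longs, not the exact committed code):

    /* return the position of a set bit in <mask> (non-zero), trying to
     * honor the input position <pos> (0..63). At each step, if the half
     * of the current window holding <pos> contains no bit, <pos> is
     * mirrored into the other half, which necessarily contains one.
     */
    static unsigned int pick_bit_sketch(unsigned long mask, unsigned int pos)
    {
        unsigned int half;

        for (half = 32; half >= 1; half /= 2) {
            unsigned long side = ((1UL << half) - 1) << (pos & ~(half - 1));

            if (!(mask & side))
                pos ^= half;  /* jump to the mirror position */
        }
        return pos;
    }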
In bug #1751, it was reported that haproxy has been consuming too much
memory since the 2.4 version. This is because of a change in the master,
which completely loses its configuration in wait mode, and loses its
maxconn.
Without the maxconn, haproxy will try to compute one itself, and will
allocate RAM accordingly, too much in our case. This means the master
will have a too high maxconn and too much RAM allocated.
The patch fixes the issue by setting the maxconn to the default value
when re-executing the master in wait mode.
Must be backported as far as 2.5.
Implement quic_tp_version_info_dump() to dump such a transport parameter
(only the remote one). Call it from quic_transport_params_dump() which
dumps all the transport parameters.
Can be backported to 2.6 as it's useful for debugging.
All packets received during handshakes must be acknowledged asap. This
was not the case for received Handshake packets. At this time, this had
no impact because the client often has only one Handshake packet to send
and the last handshake packet to be sent on our side always embeds a
HANDSHAKE_DONE frame which leads the client to consider it has no more
handshake packets to send.
Add <force_ack> to qc_may_build_pkt() to force an ACK frame to be sent.
Set this parameter to 1 when sending packets from the Initial or Handshake
packet number spaces, 0 when sending only Application level packets.
Must be backported to 2.6.
The crash occurs when the same certificate which is used on both a
server line and a bind line is inserted in a crt-list via the CLI.
This is quite uncommon as using the same file for a client and a server
certificate does not make sense in a lot of environments.
This patch fixes the issue by skipping the insertion of the SNI when no
bind_conf is available in the ckch_inst.
Change the reg-test to reproduce this corner case.
Should fix issue #1748.
Must be backported as far as 2.2. (it was previously in ssl_sock.c)
Add an ABORT_NOW() clause if an indexed field line refers to the dynamic
table. This is required as the current haproxy QPACK implementation does
not support the dynamic table.
Note that this should not happen as haproxy explicitly advertises a
null-sized dynamic table to the other peer.
This should fix github issue #1753.
No need to backport as this was introduced by commit
b666c6b26e
MINOR: qpack: improve decoding function
Free the Rx packet in the datagram handler if the packet was not taken
in charge by a quic_conn instance. This is reflected by a null packet
refcount.
A packet can be rejected for a variety of reasons: for example, a failed
decryption, the absence of an Initial token leading to Retry emission, or
null datagram padding.
This patch should resolve the Rx packets memory leak observed via "show
pools" with the previous commit
2c31e12936
BUG/MINOR: quic: purge conn Rx packet list on release
This specific memory leak instance was reproduced using quiche client
which uses null datagram padding.
This should partially resolve github issue #1751.
It must be backported up to 2.6.
When releasing a quic_conn instance, free all remaining Rx packets in
quic_conn.rx.pkt_list. This partially fixes a memory leak on Rx packets
which can be observed after the establishment of several QUIC connections.
This should partially resolve github issue #1751.
It must be backported up to 2.6.
As reported by broxio in GH #1757, there was a duplicated field name
for "quic_streams_data_blocked_bidi", due to a copy and paste without
renaming, I guess.
Must be backported to 2.6.
This counter must be incremented only one time per connection and
decremented as soon as the handshake has failed or succeeded. This is a
gauge. Under certain conditions this counter could be decremented twice.
For instance after having received a TLS alert, then upon
SSL_do_handshake() failure.
To stop having to deal with all the current combinations which can lead to
such a situation (and the ones to come), add a connection flag to denote
if this counter has already been decremented for a connection. So, this
counter must be decremented only if this flag has not already been set.
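In substance (a sketch with hypothetical flag and counter names):

    /* whatever error or success path comes first, the gauge must be
     * decremented at most once per connection.
     */
    if (!(qc->flags & QUIC_FL_CONN_COUNTED_DOWN)) {   /* hypothetical flag */
        qc->flags |= QUIC_FL_CONN_COUNTED_DOWN;
        HA_ATOMIC_DEC(&prx_counters->half_open_conn); /* the gauge in question */
    }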
Must be backported up to 2.6.
Non-constant expressions were used to initialize constant variables,
leading to such compilation errors:
src/xprt_quic.c:66:3: error: initializer element is not a constant expression
.key_label_len = strlen(QUIC_HKDF_KEY_LABEL_V1),
Reproduced with CC=gcc-4.9 compilation option.
Fix this by using macros for each HKDF label.
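One constant-friendly way to express such lengths (a sketch; the label
value is the QUIC v1 one, the macro and field names are illustrative):

    #include <stddef.h>

    /* a string literal's size is a compile-time constant, unlike strlen() */
    #define QUIC_HKDF_KEY_LABEL_V1     "quic key"
    #define QUIC_HKDF_KEY_LABEL_V1_LEN (sizeof(QUIC_HKDF_KEY_LABEL_V1) - 1)

    static const struct {
        const char *key_label;
        size_t key_label_len;
    } kp = {
        .key_label     = QUIC_HKDF_KEY_LABEL_V1,
        .key_label_len = QUIC_HKDF_KEY_LABEL_V1_LEN, /* constant: fine for gcc-4.9 */
    };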
In order to better detect the danger caused by extra shared libraries
which replace some symbols, upon dlopen() we now compare a few critical
symbols such as malloc(), free(), and some OpenSSL symbols, to see if
the loaded library comes with its own version. If this happens, a
warning is emitted and TAINTED_REDEFINITION is set. This is important
because some external libs might be linked against different libraries
than the ones haproxy was linked with, and most often this will end
very badly (e.g. an OpenSSL object is allocated by haproxy and freed
by such libs).
Since the main source of dlopen() calls is the Lua lib, via a "require"
statement, it's worth trying to show a Lua call trace when detecting a
symbol redefinition during dlopen(). As such we emit a Lua backtrace if
Lua is detected as being in use.
Several bug reports were caused by shared libraries being loaded by other
libraries or some Lua code. Such libraries could define alternate symbols
or include dependencies to alternate versions of a library used by haproxy,
making it very hard to understand backtraces and analyze the issue.
Let's intercept dlopen() and set a new TAINTED_SHARED_LIBS flag when it
succeeds, so that "show info" makes it visible that some external libs
were added.
The interception is based on the definition of RTLD_DEFAULT or RTLD_NEXT
that were previously used to detect that dlsym() is usable, since we need
it as well. This should be sufficient to catch most situations.
This function may be used to try to show where some Lua code is currently
being executed. It tries hard to detect the initialization phase, both for
the global and the per-thread states, and for runtime states. This intends
to be used by error handlers to provide the users with indications about
what Lua code was being executed when the error triggered.
Calling hlua_traceback() sometimes reports empty entries looking like:
[C]: ?
These ones correspond to certain internal C functions of the Lua library,
but they do not provide any information and even complicate the
interpretation of the dump. Better just skip them.
The commit 731c8e6cf ("MINOR: stream: Simplify retries counter calculation")
introduced a regression. It broke the dontlog-normal option because the test
on the connection retries counter was not updated accordingly.
This patch should fix the issue #1754. It must be backported to 2.6.
On destructive connection upgrade, instead of using the new mux name to
abort the old stream, we can rely on the stream connector flags. If it is
detached after the upgrade, it means the stream will not be reused by the
new mux and it must be aborted.
This patch may be backported to 2.6.
When the protocol is changed for a client connection at the stream level
(from TCP to H1/H2), there are two cases. The stream may be reused or
not. The first case, when the stream is reused, works. The second one has
been buggy since the conn-stream refactoring and leads to a crash.
In this case, the new mux doesn't reuse the stream. It must be silently
aborted. However, its front stream connector is still referencing the
connection. So it must be detached. But this must be performed in two
stages, to be sure not to lose the context for the upgrade and to be able
to roll back on error. So now, before the upgrade, we prepare to detach
the stconn and it is finally detached if the upgrade succeeds. There is a
trick here: we pretend the stconn is detached but its state is preserved.
This patch must be backported to 2.6.
tasklet_kill() was introduced in 2.5-dev4 with commit 7b368339a
("MEDIUM: task: implement tasklet kill"), but a comparison error
there makes tasklets killed on thread 1 assigned to the killing
thread. Fortunately, the function was finally not used so there's
no harm right now, hence the minor tag, but this must be fixed and
backported in case a later fix relies on it.
This should be backported to 2.5.
Until now haproxy supported only the incompatible version negotiation
feature, which consists in sending a Version Negotiation packet after
having received a long packet without a compatible value in its version
field. This version value is the version used to build the current packet.
This patch does not modify this behavior.
This patch adds the support for the compatible version negotiation feature
which allows endpoints to negotiate the QUIC version to use for the
connection during the first flight of packets sent by the client (or after
the first flight). This is done thanks to the "version_information"
transport parameter sent by both endpoints.
To be short, the client offers a list of supported versions by preference
order. The server (or haproxy listener) chooses the first version it also
supports as the negotiated version.
This implementation has an impact on the transport parameters handling (in
both directions). Indeed, the server must send its version information,
but only after having received and parsed the client transport parameters.
So we cannot encode these parameters at the time we instantiate a new
connection.
Add QUIC_TP_DRAFT_VERSION_INFORMATION(0xff73db) new transport parameter.
Add tp_version_information new C struct to handle this new parameter.
Implement quic_transport_param_enc_version_info() (resp.
quic_transport_param_dec_version_info()) to encode (resp. decode) this
parameter.
Add qc_conn_finalize() which encodes the transport parameters and configure
the TLS stack to send them.
Add ->negotiated_ictx quic_conn C struct new member to store the Initial
QUIC TLS context for the negotiated version. The Initial secrets derivation
is version dependent.
Rename ->version to ->original_version and add ->negotiated_version to
this C struct to reflect the QUIC-VN RFC denomination.
Modify most of the QUIC TLS API functions to pass a version as parameter.
Export the QUIC version definitions to be reused at least from quic_tp.c
(transport parameters).
Move the token check after the QUIC connection lookup. As this is the
original version which is sent into a Retry packet, and because this
original version is stored into the connection, we must check the token
after having retrieved this connection.
Add packet version to traces.
See https://datatracker.ietf.org/doc/html/draft-ietf-quic-version-negotiation-08
for more information about this new feature.
It is not clear at all how to distinguish a QUIC draft version number
from a released one, nor which of these QUIC draft versions must use the
draft QUIC TLS extension.
According to the QUIC implementations which support the v2 draft, the TLS
extension (transport parameters) to be used is the released one
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS).
As the unique QUIC draft version we support is 0xff00001d, and as at this
time the unique version with 0xff as most significant byte is this latter,
which must use the draft TLS extension, we select the draft TLS extension
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS_DRAFT) only for versions with
0xff as most significant byte.
It is becoming difficult to handle the QUIC TLS related definitions
which arrive with a QUIC version (draft or not). So, here we add a
quic_version C struct which does not only define the QUIC version number,
but also the QUIC TLS definitions which depend on a QUIC version.
Modify consequently all the QUIC TLS API to reuse these definitions
through the new quic_version C struct.
Implement the quic_pkt_type() function which returns a packet type (0 up
to 3) depending on the QUIC version number.
Stop hardcoding the Retry packet first byte in send_retry(): this is no
longer possible because the packet type field depends on the QUIC version.
Also modify quic_build_packet_long_header() for the same reason: the
packet type depends on the QUIC version.
Add a quic_version C struct member to quic_conn C struct.
Modify qc_lstnr_pkt_rcv() to set this member asap.
Remove the version member from quic_rx_packet C struct: a packet is attached
asap to a connection (or dropped) which is the unique object which should
store the QUIC version.
Modify qc_pkt_is_supported_version() to return a supported quic_version C
struct from a version number.
Add Initial salt for QUIC v2 draft (initial_salt_v2_draft).
This is to prepare the support for the QUIC v2 version. The packet type
depends on the version. So, we must parse the version early enough,
before defining the type of each packet.
The nonce and keys used to cipher the Retry tag depend on the QUIC version.
Add these definitions for 0xff00001d (draft-29) and v2 QUIC version. At least
draft-29 is useful for QUIC tracker tests with "quic-force-retry" enabled
on haproxy side.
Validated with the -v 0xff00001d ngtcp2 option. Could not validate the
v2 nonce and key at this time because v2 is not supported yet.
Use the same version as the one received. This is safe because the
version is checked before anything else when sending a Version
Negotiation packet.
Must be backported to 2.6.
A pretty ugly mistake was recently introduced with an invalid goto
statement which prevents haproxy from compiling with QUIC support.
This must be backported on 2.6 as a complement to
60ef19f137
BUG/MINOR: h3/qpack: deal with too many headers
Adjust the decoding loop by using temporary ists for the header name and
value. The header is inserted at the end of an iteration, which guarantees
that we do not insert the name alone in the list in case of an error on
value decoding. This also helps the function readability by centralizing
the LIST insert operation.
The return value of the decoding function is also changed. Now on success
the number of headers inserted in the input list is returned. This change
has no impact as the success value is not used by the caller. This is
mainly done to have a behavior similar to the hpack decoding function.
This ensures that we never insert too many entries in a headers input
list. On the decoding side, a new error QPACK_ERR_TOO_LARGE is reported
in this case.
This prevents a crash if the number of headers on an H3 request or
response exceeds the tune.http.maxhdr config value. Previously, a crash
would occur in the QPACK decoding function.
Note that the process still crashes later with ABORT_NOW() because error
reporting on frame parsing is not implemented for now. It should be
treated with a RESET_STREAM frame in most cases.
This can be backported up to 2.6.
Post-base indices are not supported at the moment for decoding. They
should never be encountered as they are only used with a dynamic table.
However, haproxy deactivates support for the dynamic table via its
SETTINGS.
Use ABORT_NOW() if this situation happens anyway. This should help
debugging instead of silently failing without error reporting.
Complete QPACK decoding with a full implementation of the literal field
line with literal value representation. This change is mandatory to
support decoding of headers whose name is not present in the QPACK static
table. Previously, these headers were silently ignored and not transferred
to the backend request.
QPACK decoding should now be sufficient to deal with all real situations.
Only the post-base indices representation is not handled, but this should
not cause a problem as it is only used for the dynamic table whose size is
null as announced by the haproxy implementation.
The direct impact of this change is that it should now be possible to
use complex webapp through a QUIC frontend.
This must be backported up to 2.6.
Clean up the QPACK decoder API by removing dependencies on ncbuf and
MUX-QUIC. This reduces include statements. It will also help to
implement a standalone QPACK decoder.
Add comments on the decoding function to facilitate code analysis.
Also remove the qpack_debug_hexdump() which prints the whole left buffer
on each header parsing. With large HEADERS frame payload, QPACK traces
are complicated to debug with this statement.
If we've stopped consulting the local wait queue due to too many tasks
(max_processed <= 0), there's no point starting to lock the shared WQ,
check the first task's expiration date, upgrading the lock just to
refrain from doing the work because of the limit. All this does is
increase contention on an already contended system.
Note that there is still a fairness issue in this WQ dequeuing code. If
each thread is busy with expired tasks, no thread will dequeue the global
ones. In practice it doesn't make much sense and should quickly resorb,
but it could be nice to have an alternating flag indicating where to
start from on next call to improve this.
This macro was used both for binding and for lookups. When binding tasks
or FDs, using all_threads_mask instead is better as it will later be per
group. For lookups, ~0UL always does the job. Thus in practice the macro
was already almost not used anymore since the rest of the code could run
fine with a constant of all ones there.
In 1.9-dev1, commit 5bc9972ed ("BUG/MINOR: lua/threads: Make lua's tasks
sticky to the current thread") started to detect unconfigured Lua tasks
that could run on any thread, by comparing their thread mask with
MAX_THREADS_MASK.
The proper way to do it is to check for at least 2 threads in their mask
in fact. This is more reliable and allows to get rid of MAX_THREADS_MASK
there.
Each thread has its own local thread id and its own global thread id,
in addition to the masks corresponding to each. Once the global thread
ID can go beyond 64 it will not be possible to have a global thread ID
bit anymore, so better start to remove it and use only the local one
from the struct thread_info.
This simply replaces a call to task_new(1<<thr) with task_new_on(thr)
so that we can later isolate the changes required to add more thread
group stuff.
Instead of having a global mask of all the profiled threads, let's have
one flag per thread in each thread's flags. They are never accessed more
than one at a time and are better located inside the threads' contexts for
both performance and scalability.
LIST_ELEM macro was incorrectly used in the loop when purging
flow-control frames from qcc.lfctl.frms on MUX release. This caused a
segfault in qc_release() due to an invalid quic_frame pointer instance.
The occurrence of this bug seems fairly rare. To happen, some
flow-control frames must have been allocated but not yet sent just as
the MUX release is triggered.
I did not find a reproducer scenario. Instead, I artificially triggered
it by inserting a quic_frame in qcc.lfctl.frms just before purging it in
qc_release() using the following snippet.
struct quic_frame *frm;
frm = pool_zalloc(pool_head_quic_frame);
LIST_INIT(&frm->reflist);
frm->type = QUIC_FT_MAX_DATA;
frm->max_data.max_data = 0;
LIST_APPEND(&qcc->lfctl.frms, &frm->list);
This should fix github issue #1747.
This must be backported up to 2.6.
The CLI applet processes one request after another. Thus, when several
requests are pipelined, it is important to notify that it won't consume the
remaining outgoing data while it is processing a request. Otherwise, the
applet may be woken up in a loop. For instance, it may happen with the HTTP
client while we are waiting for the server response if a shutr is received.
This patch must be backported in all supported versions after an observation
period. But a massive refactoring was performed in 2.6. So, for the 2.5 and
below, the patch will have to be adapted. Note also that, AFAIK, the bug can
only be triggered by the HTTP client for now.
In the .chk_snd applet callback function, we must not wake up an applet if
the SE_FL_WONT_CONSUME flag is set. Indeed, if an applet explicitly
specifies it will not consume any outgoing data, it is useless to wake it
up when more data are sent. Note the applet may still be woken up for
another reason. In this case the SE_FL_WONT_CONSUME flag will be removed.
It is the applet's responsibility to set it again, if necessary.
This patch must be backported to 2.6 after an observation period. On earlier
versions, the bug probably exists too. However, because a massive
refactoring was performed in 2.6, the patch will have to be adapted and
carefully reviewed/tested if it is backported.
Since the conn-stream refactoring, from the time the health-check is in
progress, its stream-connector is always defined. So, some tests on it are
useless and can be removed.
This patch should fix the issue #1739.
When a TCP content ruleset is evaluated, we stop waiting for more data if
the inspect-delay is reached, if there is a read error or if we know no more
data will be received. This last point is only valid for ACLs. An action may
decide to yield for another reason. For instance, in the SPOE, the
"send-spoe-group" action yields while the agent response is not
received. Thus, now, an action call is final only when the inspect-delay is
reached or if there is a read error. But it is possible for an action to
yield if the buffer is full or if the CF_EOI flag is set.
This patch could be backported to all supported versions.
When the MUX transfers a big amount of data to the client, the transport
layer may reject some of them because of the congestion controller limit.
Frames built by the MUX are thus dropped, even though the streams'
transferred data are kept in buffers for future new frames.
Thus, the MUX is required to free rejected frames. This fixes a memory
leak which may grow with large data transfers.
It should be backported to 2.6 after it has been tested and validated.
Enforcing TX flow-control is not straightforward: it requires the usage
of several counters at the stream and connection levels, in part due to
the difficult sending API between the MUX and quic-conn layers.
To strengthen this part and ensure it behaves as expected, some
existing BUG_ON statements were adjusted and new ones were added. This
should help to catch errors as early as possible, as in the case of
github issue #1738.
The flow control enforced at connection level is incorrectly calculated.
There is a risk of exceeding the limit. In most cases, this results in a
segfault produced by a BUG_ON which is here to catch this kind of error.
If not compiled with DEBUG_STRICT, this should generate a connection
closed by the client due to the flow control overflow.
The problem is encountered when the transferred payload is big enough to
fill the transport congestion window. In this case, some data are rejected
by the transport layer and kept by the MUX to be re-emitted later. However,
these preserved data are not counted on the connection flow control when
resubmitted, which gradually amplifies the gap between the expected and
the real consumed flow control.
To fix this, handle the flow-control at the connection level in the same
way as at the stream level. A new field qcc.tx.offsets is incremented as
soon as data are transferred between stream TX buffers. The field
qcc.tx.sent_offsets is preserved to count bytes handled by the transport
layer and to stop the MUX transfer if the limit is reached.
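Schematically (the two field names come from this description; the limit
variable is an assumption standing for the peer's MAX_DATA value):

    /* cap what enters the stream TX buffers, not what the transport
     * happened to accept on a given emission attempt.
     */
    if (qcc->tx.offsets + to_send > max_data)
        to_send = max_data - qcc->tx.offsets;

    qcc->tx.offsets += to_send;   /* counted once, even if sent later */

    /* qcc->tx.sent_offsets is only increased when the transport layer
     * accepts data, and still serves to pause the MUX at the limit.
     */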
As already stated, this bug can occur during transfers with enough
emitted data, over multiple streams. When using a single stream, the
flow control at the stream level hides it.
The BUG_ON crash is reproduced systematically with the quiche client:
$ quiche-client --no-verify --http-version HTTP/3 -n 10000 https://127.0.0.1:20443/10K
This must be backported up to 2.6 when confirmed to work as expected.
This should fix github issue #1738.
This is the continuation of commit 5a0e7ca5d ("BUG/MINOR: cli/stats: add
missing trailing LF after JSON outputs"). There's also a "show info json"
command which was also missing the trailing LF. It's constructed exactly
like the "show stat json", in that it dumps a series of fields without any
LF. The difference however is that for the stats output, everything was
enclosed in an array which required an LF *after* the closing bracket,
while here there's no such array so we have to emit the LF after the loop.
That makes the two functions a bit inconsistent, it's quite annoying, but
making them better would require sending one LF per line in the stats
output, which is not particularly interesting.
Given that it took 5+ years to spot that this code wasn't working as
expected it doesn't seem worth investing much time trying to refactor
it to make it look cleaner at the risk of breaking other obscure parts.
Leonhard Wimmer reported an interesting bug in github issue #1742.
Servers in disabled proxies that are configured for resolution are still
subscribed to DNS resolutions, but the LB algos are not initialized at
all since the proxy is disabled, so when the server state changes,
attempts to update its status cause a crash when the server's weight
is recalculated via a divide by the proxy's total weight which is zero.
This should be backported to all versions. Beware that before 2.5 or
so, there's no PR_FL_DISABLED flag, instead px->disabled should be
used (2.3-2.4) or PR_STSTOPPED for older versions.
Thanks to Leonhard for his report and quick test!
Patrick Hemmer reported that we have a bug in the CLI commands
"show stat json" and "show schema json" in that they forget the trailing
LF that's required to mark the end of the response. This has been the
case since the introduction of the feature in 1.8-dev1 by commit 6f6bb380e
("MEDIUM: stats: Add show json schema"), so this fix may be backported to
all versions.
The function used to parse the SETTINGS frame is incorrect as it does not
stop at the frame length but continues to parse beyond it. In most cases,
it will result in a connection closed with error H3_FRAME_ERROR.
This bug can be reproduced with clients that send more than just a
SETTINGS frame on the H3 control stream. This is notably the case with
aioquic which emits a MAX_PUSH_ID after SETTINGS.
This bug has been introduced in the current dev release, by the
following patch
62eef85961
MINOR: mux-quic: simplify decode_qcs API
Thus, it does not need to be backported.
This changes the default from RFC 7540's 65535 (64k-1) to avoid some
degenerative WINDOW_UPDATE behaviors observed in the wild with clients
using 65536 as their buffer size: they have to complete each block with a
1-byte frame, which with some servers tends to degenerate into 1-byte WUs
causing more 1-byte frames to be sent until the transfer almost only uses
1-byte frames.
More details here: https://github.com/nghttp2/nghttp2/issues/1722
As mentioned in previous commit (MEDIUM: mux-h2: try to coalesce outgoing
WINDOW_UPDATE frames) the issue could not be reproduced with haproxy but
individual WU frames are sent so theoretically nothing prevents this from
happening. As such it should be backported as a workaround for already
deployed clients after watching for any possible side effect with rare
clients. As an added benefit, uploads from curl now use less DATA frames
(all are 16384 now). Note that the previous patch alone is sufficient to
stop the issue with curl in case this one would need to be reverted.
[wt: edited commit message, updated doc]
Glenn Strauss from Lighttpd reported a corner case affecting curl+lighttpd
that causes some uploads to degenerate to extremely suboptimal conditions
under certain circumstances, and noted that many other implementations
were possibly not safe against this degradation.
Glenn's detailed analysis is available here:
https://github.com/nghttp2/nghttp2/issues/1722
In short, curl uses a 65536 bytes buffer and the default stream window
is 65535, with 16384 bytes per frame. Curl will then send 3 frames of
16384 bytes followed by one of 16383, will wait for a window update to
send the last byte before recycling the buffer to read the next 64kB.
On each round like this, one extra single-byte frame will be sent, and
if ACKs for these single-byte frames are not aggregated, this will only
allow the client to send one extra byte at a time. At some point it is
possible (at least Glenn observed it) to have mostly 1-byte frames in
the transfer, resulting in huge CPU usage and a long transfer.
It was not possible to reproduce this with haproxy, even when playing
with frame sizes, buffer sizes nor window sizes. One reason seems to
be that we're using the same buffer size for the connection and the
stream and that the frame headers prevent the filling of the window
from happening on the same boundaries as on the sender. However it
does occasionally happen to see up to two 1-byte data frames in a row,
indicating that there's definitely room for improvement.
The WINDOW_UPDATE frames for the connection are sent at the end of the
demuxing, but the ones for the streams are currently sent immediately
after a DATA frame is processed, mostly for convenience. But we don't
need to proceed like this, we already have the counter of unacked bytes
in rcvd_s, so we can simply use that to decide when to send an ACK. It
must just be done before processing a new frame. The benefit is that
contiguous frames for the same stream will now only produce a single
WU, like for the connection. On complicated tests involving a client
that was limited to 100 Mbps transfers and a dummy Lua-based payload
consumer, it was possible to see the number of stream WU frames being
halved for a 100 MB transfer, which is already a nice saving anyway.
Glenn proposed a better workaround consisting in increasing the
default window size to 65536. This will be done in a separate patch
so that both can be studied independently in field and backported as
needed.
This patch is not very complicated and should be backportable. It just
needs to be tested in development first.
The BUG_ON() assertion checking for an incomplete SETTINGS frame is
incorrect. It should check if the frame length is greater, not smaller,
than the current buffer data. Anyway, this BUG_ON() is useless as h3_decode_qcs()
prevents parsing of an incomplete frame, except for H3 DATA. Remove it
to fix this bug.
This bug was introduced in the current dev tree by commit
62eef85961
MINOR: mux-quic: simplify decode_qcs API
Thus it does not need to be backported.
This fixes crashes which happen with DEBUG_STRICT=2. Most notably, this
is reproducible with clients that emit more than just a SETTINGS frame
on the H3 control stream. It can be reproduced with aioquic for example.
The health-check attached to an email alert has no type. This is
unexpected, and since 2.6 it is important because we rely on it to know
the application type in front of a connection at the stream-connector
level. Because the object type is not set, the SE descriptor is not properly
initialized, leading to a segfault when a connection to the SMTP server is
established.
This patch must be backported to 2.6 and may be backported as far as
2.0. However, it is only an issue for 2.6 and above.
There is no server for email alerts. So the trace messages must be adapted
to handle this case. Information related to the server is now skipped
for email alerts and an "[EMAIL]" prefix is used.
This patch must be backported as far as 2.4.
Email alerts are based on health-checks but with no server. Thus, in
the __trace() function, responsible for writing a trace message, we
must be prepared to have no server and thus no proxy.
This patch must be backported as far as 2.4.
Building QUIC with gcc-4.4 on el6 shows this error:
src/xprt_quic.c: In function 'qc_release_lost_pkts':
src/xprt_quic.c:1905: error: unknown field 'loss' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [src/xprt_quic.o] Error 1
make: *** Waiting for unfinished jobs....
Initializing an anonymous union member like this:
struct quic_cc_event ev = {
(...)
.loss.time_sent = newest_lost->time_sent,
(...)
};
generates an error with gcc-4.4 but not when initializing the
fields outside of the declaration.
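A minimal sketch of the workaround (using the fields from the snippet
above, with the other fields omitted):
    struct quic_cc_event ev = {
        /* non-union fields initialized in the declaration */
    };
    /* assign the anonymous union member after the declaration,
     * which gcc-4.4 accepts */
    ev.loss.time_sent = newest_lost->time_sent;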
Convert the return code to -1 when an error has been detected. This is
required since the API change on the return value from this patch:
1f21ebdd76
MINOR: mux-quic/h3: adjust demuxing function return values
Without this, QUIC MUX won't consider the call as an error and will try
to remove one byte from the buffer. This may cause a BUG_ON failure if
the buffer is empty at this stage.
This bug was introduced in the current dev tree. Does not need to be
backported.
Clean up the API used by decode_qcs() and the transcoder internal
functions. Parsing functions now return a ssize_t which represents the
number of consumed bytes or a negative error code. The total of
consumed bytes is returned via decode_qcs().
The API is now unified and cleaner. The MUX can thus simply use the
return value of decode_qcs() instead of subtracting the data bytes in
the buffer before and after the call. Transcoder functions are no
longer obliged to remove consumed bytes from the buffer, which was not
obvious.
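For illustration, a minimal self-contained sketch of the new return
convention (a hypothetical parser, not one of the actual haproxy
functions):
    #include <stddef.h>
    #include <sys/types.h>

    /* Parse one frame from <buf>/<len>. Return the number of consumed
     * bytes, or a negative value on error. The caller sums these
     * results and removes the total from its own buffer, as the MUX
     * now does with decode_qcs().
     */
    static ssize_t parse_frame(const unsigned char *buf, size_t len)
    {
        if (len < 2)
            return -1; /* malformed or incomplete input */
        return 2;      /* two bytes consumed */
    }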
Slightly modify the decode_qcs() function used by transcoders. The MUX
now gives a buffer instance on which each transcoder is free to work.
On return from the function, the MUX removes consumed data from its own
buffer.
This reduces the number of invocations of qcs_consume() at the end of a
full demuxing process. The API is also cleaner, with the transcoders no
longer responsible for calling it, which removes the risk of having the
input buffer freed if empty.
As a mirror to the qcc/qcs types, add a h3c pointer into the h3s
struct. This should help to clean up the H3 code and avoid using
qcs.qcc.ctx to retrieve the h3c instance.
smp_fc_http_major may be used to return the HTTP version as an integer
used on the frontend or backend side. Previously, the handler only
checked for version 2, with 1 as a fallback. Extend it to support
version 3 with the QUIC mux.
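As an illustration, a configuration sketch relying on the corresponding
sample fetch (assuming the standard fc_http_major fetch name; the
backend names are hypothetical):
    frontend fe
        bind quic4@:443 ssl crt cert.pem alpn h3
        # fc_http_major returns 3 for HTTP/3 over QUIC
        acl is_h3 fc_http_major 3
        use_backend h3-aware if is_h3
        default_backend legacy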
Commit d6c66f06a ("MINOR: ssl_ckch: Remove service context for "set ssl
crl-file" command") introduced a regression leading to a build error because
of a possible uninitialized value. It is now fixed.
This patch must be backported as far as 2.5.
A build error is reported about the path variable in the switch statement on
the commit type, in cli_io_handler_commit_cafile_crlfile() function. The
enum contains only 2 values, but a default clause has been added to return an
error to make GCC happy.
This patch must be backported as far as 2.5.
Commit 9a99e5478 ("BUG/MINOR: ssl_ckch: Dump CRL transaction only once if
show command yield") introduced a regression leading to a build error
because of a possible uninitialized value. It is now fixed.
This patch must be backported as far as 2.5.
Commit 5a2154bf7 ("BUG/MINOR: ssl_ckch: Dump CA transaction only once if
show command yield") introduced a regression leading to a build error
because of a possible uninitialized value. It is now fixed.
This patch must be backported as far as 2.5.
Commit 3e94f5d4b ("BUG/MINOR: ssl_ckch: Dump cert transaction only once if
show command yield") introduced a regression leading to a build error
because of a possible uninitialized value. It is now fixed.
This patch must be backported as far as 2.2.
The same type is used for CA and CRL entries. So, in commit_cert_ctx
structure, there is no reason to have different fields for the CA and CRL
entries.
The .next_ckchi_link field must be initialized to NULL instead of
.next_ckchi in the cli_parse_commit_crlfile() function. Only
'.next_ckchi_link' is used in the I/O handler.
This patch must be backported as far as 2.5 with some adaptations for
2.5.
When loaded SSL certificates are displayed via the "show ssl cert"
command, the in-progress transaction, if any, is also displayed.
However, if the command yields, the transaction is displayed again and
again.
To fix the issue, old_ckchs field is used to remember the transaction was
already displayed.
This patch must be backported as far as 2.2.
When loaded CA files are displayed via the "show ssl ca-file" command,
the in-progress transaction, if any, is also displayed. However, if the
command yields, the transaction is displayed again and again.
To fix the issue, old_cafile_entry field is used to remember the transaction
was already displayed.
This patch must be backported as far as 2.5.
When loaded CRL files are displayed via the "show ssl crl-file"
command, the in-progress transaction, if any, is also displayed.
However, if the command yields, the transaction is displayed again and
again.
To fix the issue, old_crlfile_entry field is used to remember the transaction
was already displayed.
This patch must be backported as far as 2.5.
Because of a typo (I guess), an unknown type is used for the old entry
in the show_crlfile_ctx structure. Because this field is unused, there
is no compilation error. But it must be a cafile_entry and not a
crlfile_entry.
Note this field is not used for now, but it will be.
This patch must be backported to 2.6.
Simplify cli_io_handler_commit_cafile_crlfile() handler function by
retrieving old and new entries at the beginning. In addition the path is
also retrieved at this stage. This removes several switch statements.
Note that the ctx was already validated by the corresponding parsing
function. Thus there is no reason to test the pointers.
While it is not a bug, this patch may help to fix issue #1731.
There is an enum to determine the entry type when changes are
committed on a CA/CRL entry. So use it in the service context instead of an
integer.
This patch may help to fix issue #1731.
Rewrite failures in http rules are reported as proxy errors (PRXCOND) in
logs. However, other rewrite errors are reported as internal errors. For
instance, it happens when we fail to add X-Forwarded-For header. It is not
consistent and it is confusing. So now, all rewrite failures are reported as
proxy errors.
This patch may be backported if necessary.
The 'httpclient' command does not properly handle full buffer cases.
When the response buffer is full, we exit to retry later. However, the
context flags are updated anyway. It means that when this happens, we
may lose part of the response.
So now, flags are preserved when we fail to push data into the response
buffer. In addition, instead of dumping one part per call, we now try to
dump as much data as possible.
Finally, when there is no more data, because everything was dumped or
because we are waiting for more data from the HTTP client, the applet is
updated accordingly by calling applet_have_no_more_data(). Otherwise, when
some data are blocked, applet_putchk() already takes care to update the SE
flags. So, it is useless to call sc_need_room().
This patch should fix the issue #1723. It must be backported as far as
2.5. But a massive refactoring was performed in 2.6. So, for the 2.5 and
below, the patch will have to be adapted.
Commit 534645d6 ("BUG/MEDIUM: httpclient: Fix loop consuming HTX blocks from
the response channel") introduced a regression. When the response is
consumed, the HTX header blocks are removed before duplicating them. Thus,
the first header block is always lost.
This patch must be backported as far as 2.5.
'add ssl crt-list' command is also concerned. This patch is similar to the
previous ones. Full buffer cases when we try to push the reply are not
properly handled. To fix the issue, the functions responsible to add a
crt-list entry were reworked.
First, the error message is now part of the service context. This way,
if we cannot push the error message in the response buffer, we may
retry later. To do so, a dedicated state was created (ADDCRT_ST_ERROR).
Then, the success message is also handled in a dedicated state
(ADDCRT_ST_SUCCESS). This way we are able to retry pushing it if
necessary. Finally, the dot displayed for each new instance is now
immediately pushed in the response buffer, and before the update. This
way, we are able to retry too if necessary.
This patch should fix the issue #1724. It must be backported as far as
2.2. But a massive refactoring was performed in 2.6. So, for the 2.5 and
below, the patch will have to be adapted.
'commit ssl crl-file' command is also concerned. This patch is similar to
the previous one. Full buffer cases when we try to push the reply are not
properly handled. To fix the issue, the functions responsible to commit CA
or CRL entry changes were reworked.
First, the error message is now part of the service context. This way,
if we cannot push the error message in the response buffer, we may
retry later. To do so, a dedicated state was created (CACRL_ST_ERROR).
Then, the success message is also handled in a dedicated state
(CACRL_ST_SUCCESS). This way we are able to retry pushing it if
necessary. Finally, the dot displayed for each updated CKCH instance is
now immediately pushed in the response buffer, and before the update.
This way, we are able to retry too if necessary.
This patch should fix the issue #1722. It must be backported as far as
2.5. But a massive refactoring was performed in 2.6. So, for the 2.5, the
patch will have to be adapted.
When changes on a certificate are committed, a trash buffer is used to
create the response. Once done, the message is copied into the response
buffer.
However, if the buffer is full, there is no way to retry and the message is
lost. The same issue may happen with the error message. It is a design issue
of cli_io_handler_commit_cert() function.
To fix it, the function was reworked. First, the error message is now
part of the service context. This way, if we cannot push the error
message in the response buffer, we may retry later. To do so, a
dedicated state was created (CERT_ST_ERROR). Then, the success message
is also handled in a dedicated state (CERT_ST_SUCCESS). This way we are
able to retry pushing it if necessary. Finally, the dot displayed for
each updated CKCH instance is now immediately pushed in the response
buffer, and before the update. This way, we are able to retry too if
necessary.
This patch should fix the issue #1725. It must be backported as far as
2.2. But massive refactoring was performed in 2.6. So, for the 2.5 and
below, the patch must be adapted.
When a CA or CRL entry is replaced (via 'set ssl ca-file' or 'set ssl
crl-file' commands), the path is duplicated and used to identify the ongoing
transaction. However, if the same command is repeated, the path is still
duplicated but the transaction is not changed and the duplicated path is not
released. Thus there is a memory leak.
By reviewing the code, it appears there is no reason to duplicate the
path. It is always the filename path of the old entry. So, a reference on it
is now used. This simplifies the code and this fixes the memory leak.
This patch must be backported as far as 2.5.
When a certificate entry is replaced (via 'set ssl cert' command), the path
is duplicated and used to identify the ongoing transaction. However, if the
same command is repeated, the path is still duplicated but the transaction
is not changed and the duplicated path is not released. Thus there is a
memory leak.
By reviewing the code, it appears there is no reason to duplicate the
path. It is always the path of the old entry. So, a reference on it is now
used. This simplifies the code and this fixes the memory leak.
This patch must be backported as far as 2.2.
When a CA or a CRL entry is being modified, we must take care not to
delete it because the corresponding ongoing transaction still
references it. If we do so, it leads to a null-deref, and a crash may
be experienced if changes are committed.
This patch must be backported as far as 2.5.
When a certificate entry is being modified, we must take care not to
delete it because the corresponding ongoing transaction still
references it. If we do so, it leads to a null-deref, and a crash may
be experienced if changes are committed.
This patch must be backported as far as 2.2.
On the CLI, if we fail to commit changes on a CA or a CRL entry, an error
message is returned. This error must be released.
This patch must be backported as far as 2.4.
On the CLI, if we fail to commit changes on a certificate entry, an error
message is returned. This error must be released.
This patch must be backported as far as 2.2.
When parsing QPACK encoder/decoder streams, h3_decode_qcs() displays an
error trace if they are empty. Change the return code used in QPACK code
to avoid this trace.
To be consistent with the MUX/H3 code, 0 is now used to indicate success.
Beyond this spurious error trace, this bug has no impact.
The MUX now provides a single API for both uni and bidirectional
streams. It is responsible for rejecting reception on a local unidirectional
stream with the error STREAM_STATE_ERROR. This is already implemented in
qcc_recv(). As such, remove this duplicated check from xprt_quic.c.
The H3 frame demuxing code is incorrect when receiving a STREAM frame
which contains only a new H3 frame header without its payload.
In this case, the check on frames bigger than the buffer size is
incorrect. This is because the buffer has been freed via
qcs_consume()/qc_free_ncbuf() as it was emptied after H3 frame header
parsing. This causes the connection to be incorrectly closed with
H3_EXCESSIVE_LOAD error.
This bug was reproduced with the xquic client on the interop and with the
command-line invocation :
$ ./interop_client -l d -k $SSLKEYLOGFILE -a <addr> -p <port> -D /tmp \
-A h3 -U https://<addr>:<port>/hello_world.txt
Note also that h3_is_frame_valid() invocation has been moved before the
new buffer size check. This ensures that first we check the frame
validity before returning from the function. It's also better
positioned as this is only needed when a new H3 frame header has been
parsed.
The H3 demuxing code was not fully correct. After parsing the H3 frame
header, the check between frame length and buffer data is wrong as we
compare a copy of the buffer made before the H3 header removal.
Fix this by improving the H3 demuxing code API. h3_decode_frm_header()
now uses a ncbuf instance, which prevents an unnecessary ncbuf/buffer
cast in h3_decode_qcs() and resolves this error.
This bug was not triggered at this moment. Its impact should be really
limited.
Replace ncb_blk_is_null() by ncb_is_null() as a prelude to ncb_data().
The result is the same : the function will return 0 if the buffer is
uninitialized. However, it is clearer to directly call ncb_is_null() to
reflect this.
There is no functional change with this commit.
Some server keywords are currently silently ignored in the peers
section, which is not good because it wastes time on user-side, trying
to make something work while it cannot by design.
With this patch we at least report a few of them (the most common ones),
which are init_addr, resolvers, check, agent-check. Others might follow.
This may be backported to 2.5 to encourage some cleaning of bogus configs.
When parsing a peers section, it's particularly difficult to tell the
difference between the local peer, which doesn't have any address, and
other peers which need one, and the error messages do not help because
with just:
peers foo
bind :8001
server foo 127.0.0.1:8001
server bar 127.0.0.2:8001
One can get such a confusing message when the local peer is "bar":
[peers.cfg:15] : 'server foo/bar' : unknown keyword '127.0.0.1:8001'.
It's not clear there why the other peer doesn't trigger an error.
With this commit we add a hint in the error message when no address
was expected. The error remains quite generic (since deep into the
server code) but at least the user gets a hint about why the keyword
wasn't understood:
[peers.cfg:15] : 'server foo/bar' : unknown keyword '127.0.0.1:8001'.
Hint: no address was expected for this server.
For some poor historical reasons, the name of a peers proxy used to be
set to the name of the local peer itself. That causes some confusion when
multiple sections are present because the same proxy name appears at
multiple places in "show peers", but since 2.5 where parsing errors include
the proxy name, a config like this one :
peers foo
server foobar blah
Would report this when the local peer name isn't "foobar":
'server (null)/foobar' : invalid address: 'blah' in 'blah'
And this when it is foobar:
'server foobar/foobar' : invalid address: 'blah' in 'blah'
This is wrong, confusing and not very practical. This commit addresses
all this by using the peers section's name when it's created. This now
allows reporting messages such as:
'server foo/foobar' : invalid address: 'blah' in 'blah'
This makes it clear that the section is called "foo" and the server
"foobar".
This may be backported to 2.5, though the patch may be simplified if
needed, by just adding the change at the output of init_peers_frontend().
By having the appctx in argument this function wouldn't have experienced
the previous bug. Better do that now to avoid proliferation of awkward
functions.
In the context of a CLI command, it's particularly not welcome to use
an "appctx" variable that is not the current one. In addition it was
created for use at exactly 6 places in 2 lines. Let's just remove it
and stick to peer->appctx which is used elsewhere in the function and
is unambiguous.
Commit d0a06d52f ("CLEANUP: applet: use applet_put*() everywhere possible")
replaced most accesses to the conn_stream with simpler accesses to the
appctx. Unfortunately, in all the CLI functions using an appctx, one
makes an exception where the appctx is not the caller's but the one being
inspected! When no peers connection is active, the early exit immediately
crashes.
No backport is needed.
stream.c and mux_fcgi.c may cause a warning for a possible NULL deref
at -Os, while that is not possible thanks to the previous test. Let's
just switch to __htx_get_head_blk() instead.
Remove an unneeded BUG_ON statement when find_http_meth() returns
HTTP_METH_OTHER.
This fix is necessary to support requests with unusual methods with
DEBUG_STRICT activated. This was detected when browsing with HTTP/3 over
a nextcloud instance which uses PROPFIND method for Webdav.
The prefix-integer encoding function was incomplete. It was not able to
deal correctly with values encoded on more than 2 bytes. The exact
maximum depends on the size of the prefix, but values greater than 254
were all impacted.
Most notably, this change is required to support header names/values of
sizeable length. Previously, their length was incorrectly encoded and
the client thus closed the connection with QPACK_DECOMPRESSION_ERROR.
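For reference, the function implements the prefix-integer encoding from
RFC 7541 section 5.1, which QPACK reuses. A self-contained sketch,
assuming the prefix bits start from a zeroed first byte (the real
encoder ORs the value into an existing byte):
    #include <stdint.h>
    #include <stddef.h>

    /* Encode <val> with an <n>-bit prefix into <out> and return the
     * number of bytes written. Values >= 2^n - 1 spill into a varint
     * continuation: this multi-byte case was the one mishandled.
     */
    static size_t encode_pfx_int(unsigned char *out, uint64_t val, int n)
    {
        size_t i = 0;
        const unsigned char max = (1u << n) - 1;

        if (val < max) {
            out[i++] = (unsigned char)val;
            return i;
        }
        out[i++] = max;
        val -= max;
        while (val >= 128) {
            out[i++] = (unsigned char)((val & 0x7f) | 0x80);
            val >>= 7;
        }
        out[i++] = (unsigned char)val;
        return i;
    }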
Replace the bogus call to b_data() with b_room() to check if there is enough
space left in the buffer before encoding a prefix integer.
At this moment, no real scenario was found to trigger a bug related to
this change. This is probably because the buffer always contains data
(field section line and status code) before calling
qpack_encode_prefix_integer() which prevents an occurrence of this bug.
If the client connection timeout has expired, the mux is released.
If the client decides to initiate a new request, we send a STOP_SENDING
frame. Then, the client endlessly sends a RESET_STREAM frame.
At this time, we simulate the fact that we support the RESET_STREAM frame
thanks to this ridiculously minimalistic patch.
If the client connection timeout has expired, the mux is released.
If the client decides to initiate a new request, we do not ack its
request. This leads the client to endlessly resend its request.
This patch makes a QUIC listener send a STOP_SENDING frame in such
a situation.
Add a new ->inc_err_cnt callback to the qcc_app_ops struct which can be
called from the xprt layer to increment the application-level error
code counters. It takes the application context as first parameter to
be generic and to support new QUIC applications to come.
Add h3_stats.c module with counters for all the frame types and error codes.
Rename "tune.quic.conn-buf-limit" to "tune.quic.frontend.conn-tx-buffers.limit"
to reflect the stream direction (TX) and the objects (frontends) which are
concerned.
Add new counters to count the number of dropped packets upon parsing
error, of lost sent packets, and of stateless reset packets sent.
Take the opportunity of this patch to rename CONN_OPENINGS to QUIC_ST_HALF_OPEN_CONN
(total number of half open connections) and QUIC_ST_HDSHK_FAILS to QUIC_ST_HDSHK_FAIL.
When we select the next encryption level in qc_treat_rx_pkts() we must
reset the local largest_pn variable if we do not want to reuse its
previous value for this encryption level. This bug could only happen
during the handshake step and had no visible impact because this
variable is only used during the header protection removal step, which
hopefully supports packet reordering.
It is becoming difficult to distinguish the default values for
transport parameters which come from the RFC from our implementation's
default values when not set by configuration (tunable parameters).
Add a comment to distinguish them.
Prefix these default values with QUIC_TP_DFLT_ to distinguish them from
QUIC_DFLT_* values, even if they are not numerous.
Furthermore, ->max_udp_payload_size must first be initialized to
QUIC_TP_DFLT_MAX_UDP_PAYLOAD_SIZE, especially for received values.
Add tunable "tune.quic.frontend.max_streams_bidi" setting for QUIC frontends
to set the "initial_max_streams_bidi" transport parameter.
Add some documentation for this new setting.
Add two tunable settings both for backends and frontends "max_idle_timeout"
QUIC transport parameter, "tune.quic.frontend.max-idle-timeout" and
"tune.quic.backend.max-idle-timeout" respectively.
cfg_parse_quic_time() has been implemented to parse a time value thanks
to parse_time_err(). It should be reused for any tunable time value to be
parsed.
The documentation was added for the frontend setting only.
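A hedged example of how these settings may look in a configuration
(keyword spellings are assumptions derived from the commit messages;
check the shipped documentation for the exact names):
    global
        tune.quic.frontend.max-streams-bidi 100
        tune.quic.frontend.max-idle-timeout 30s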
Add a quic_transport_params_dump() static inline function to dump a
quic_transport_parameters struct passed as parameter.
We use the trace API to dump these transport parameters both after they
have been initialized (RX/local) and after they have been received
(TX/remote).
Make the transport parameters code as standalone as possible, as it
consists only in encoding/decoding data into/from buffers.
This reduces the size of xprt_quic.h. Unfortunately, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API in
this module.
We do not want to count the out-of-packet padding as belonging to an
invalid packet, the first byte of a QUIC packet being never null. Some
browsers like Firefox proceed this way to add PADDING frames after an
Initial packet and increase the size of their Initial packets.
In muxes, the stream-endpoint descriptor of a stream is always defined. Thus,
in .show_fd callback functions, there is no reason to test it.
This patch should address the issue #1727.
Thanks to the recent refactoring, when the tcpcheck_main() function is
called, the stream-connector of the health-check is always defined.
There is no reason to still test it.
This patch should fix the issue #1721.
The only two places where it was used were to carefully preserve the
SE_FL_WILL_CONSUME flag (since others are irrelevant there and the
previous RXBLK* flags moved to the stconn). Now that the flag is
cleared by default there's no need to re-create a fresh one when
replacing the descriptor, so we can eliminate that remaining trick.
Now that data consumption from the endpoint is the default setting, we
can generalize the pre-clearing of the wont_consume flag, which is no
longer specific to applets. In practice it's not needed anymore to do
it, but since streams might be initiated from asynchronous applets,
these might have blocked their consumption side before creating the
stream, thus it's safer to preserve the clearing of the flag.
The stream endpoint descriptor that was named "endp" is now called "sd"
both in the fcgi_strm struct and in the few functions using this. The
name was also updated in the "show fd" output.
The stream endpoint descriptor that was named "endp" is now called "sd"
both in the h2s struct and in the few functions using this. The name
was also updated in the "show fd" output.
The stream endpoint descriptor that was named "endp" is now called "sd"
both in the h1s struct and in the few functions using this. The name
was also updated in the "show fd" output.
In ssl_action_wait_for_hs() the local variable called "cs" is just a
copy of s->scf that's only used once, so it can be removed. In addition
the check was removed as well since it's not possible to have a NULL SC
on a stream.
Function arguments and local variables called "cs" were renamed to
"sc" to avoid future confusion. There was also one place in traces
where "cs" used to display the stconn, which were turned to "sc".
Function arguments and local variables called "cs" were renamed to
"sc" to avoid future confusion. There were also 2 places in traces
where "cs" used to display the stconn, which were turned to "sc".
The "nb_cs" struct field and "h2_has_too_many_cs()" functions were
also renamed.
Function arguments and local variables called "cs" were renamed to
"sc" to avoid future confusion. There were also 2 places in traces
where "cs" used to display the stconn, which were turned to "sc".
h1s_upgrade_cs() and h1s_new_cs() were both renamed to _sc.
Function arguments and local variables called "cs" were renamed to
"sc" to avoid future confusion. There were also 3 places in debugging
traces where "cs" used to display the stconn, which were turned to "sc"
for similar reasons. The number of streams "nb_cs" was turned to "nb_sc".
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. Both the core functions and the ones in the
resolvers files were updated.
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The HTTP analyser and the backend functions
were all updated after being reviewed. Function stream_update_both_cs()
was renamed to stream_update_both_sc()
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The "nb_cs" stream-connector counter was
renamed to "nb_sc" and qc_attach_cs() was renamed to qc_attach_sc().
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The change is huge (~580 lines), so extreme
care was given not to change anything else.
The check struct had a "cs" field renamed to "sc", which also required
a tiny update to a few functions using it to distinguish a check from
a stream (log.c, payload.c, ssl_sample.c, tcp_sample.c, tcpcheck.c,
connection.c).
Function arguments and local variables called "cs" were renamed to "sc".
The presence of one "cs=" in the debugging traces was also turned to
"sc=" for consistency.
There's no more reason for keeping the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
QUIC was the last user of entities with "conn_stream" in their names,
though there's no more reason for this given that the pool names were
already pretty straightforward. The renaming does this:
qc_stream_desc: pool_head_quic_conn_stream -> pool_head_quic_stream_desc
qc_stream_buf: pool_head_quic_conn_stream_buf -> pool_head_quic_stream_buf
An equivalent applet_need_more_data() was added as well since that function
is mostly used from applet code. It makes it much clearer that the applet
is waiting for data from the stream layer.
These ones are essentially for the stream endpoint, let's give them a
name that matches the intent. Equivalent versions were provided in the
applet namespace to ease code legibility.
The following flags are not at all related to the endpoint but to the
connector itself:
- SE_FL_RXBLK_ROOM
- SE_FL_RXBLK_BUFF
- SE_FL_RXBLK_CHAN
As such they have no business staying in the endpoint descriptor and
they must move to the stream connector. They've also been renamed
accordingly to better match what they correspond to (the same name
as the function that sets them).
The rare occurrences of cs_rx_blocked() were replaced by an explicit
test on the list of flags. The reason is that cs_rx_blocked() used to
preserve some tests that are not needed at certain places since already
known. For the same reason SE_FL_RXBLK_ANY wasn't converted. As such it
will later be possible to carefully review these few locations and
eliminate the unneeded flags from the tests. No particular function
was made to test them since they're explicit enough.
It now looks like ci_putchk() and friends could very well place the flag
themselves on the connector when they detect a buffer full condition, as
this would significantly simplify the high-level API. But all usages must
first be reviewed before this simplification can be done. For now it
remains done by applet_put*() instead.
It's more explicit this way. The cs_rx_endp_ready() function could be
removed so that the flag is directly tested. In the future it should
be inverted and the few places where it's set (or preserved via
SE_FL_APP_MASK) could be dropped.
At plenty of places we combine multiple flag checks to determine if we
can receive (endp_ready, rx_blocked, cf_shutr etc). Let's group them
under a single function that is meant to replace existing tests.
Some tests were only checking the rxblk flags at the connection level,
so for now they were not converted, this requires a bit of auditing
first, and probably a test to determine whether or not to check for
cf_shutr (e.g. there is none if no stream is present).
The analysis of cs_rx_endp_more() showed that the purpose is for a stream
endpoint to inform the connector that it's ready to deliver more data to
that one, and conversely cs_rx_endp_done() that it's done delivering data
so it should not be bothered again for this.
This was modified two ways:
- the operation is no longer performed on the connector but on the
endpoint so that there is no more doubt when reading applet code
about what this rx refers to; it's the endpoint that has more or
no more data.
- an applet implementation is also provided and mostly used from
applet code since it saves the caller from having to access the
endpoint descriptor.
It's visible that the flag ought to be inverted because some places
have to set it by default for no reason.
These functions are used by the application layer to disable or enable
reading at the stream connector's level when the input buffer failed to
be allocated (or was finally allocated). The new names make things
clearer.
These functions were used by the channel to inform the lower layer
whether reading was acceptable or not. Usually this directly mimics
the CF_DONT_READ flag from the channel, which may be set when it's
desired not to buffer incoming data that will not be processed, or
that the buffer wants to be flushed before starting to read again,
or that bandwidth limiting might be enforced, etc. It's always a
policy reason, not a purely resource-based one.
The new name more clearly indicates that a stream connector cannot make
any more progress because it needs room in the channel buffer, or that
it may be unblocked because the buffer now has more room available. The
testing function is sc_waiting_room(). This is mostly used by applets.
Note that the flags will change soon.
This makes SE_FL_APPLET_NEED_CONN autonomous, in that we check for it
everywhere we have a relevant cs_rx_blocked(), so that the flag no
longer needs to be covered by cs_rx_blocked(). Indeed, this flag doesn't
really translate a receive blocking condition but rather a refusal to
wake up an applet that is waiting for a connection to finish to setup.
This also ensures we will not risk setting it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK, because the flag being
specific to the endpoint only and not to the connector, we don't
want to preserve it when replacing the endpoint.
It's possible that cs_chk_rcv() could later be further simplified if
we can demonstrate that the two tests in it can be merged.
This flag is exclusively used when a front applet needs to wait for the
other side to connect (or fail to). Let's give it a more explicit name
and remove the ambiguous function that was used only once.
This also ensures we will not risk setting it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK, because the flag being
specific to the endpoint only and not to the connector, we don't
want to preserve it when replacing the endpoint.
This flag is no longer needed, it was only set on shut read to be
tested by cs_rx_blocked() which is now properly tested for shutr as
well. The cs_rx_blk_shut() calls were removed. Interestingly this
allowed to remove a special case in the L7 retry code.
This also ensures we will not risk setting it back on a new endpoint
after cs_reset_endp() via SE_FL_APP_MASK.
One flag (RXBLK_SHUT) is always set with CF_SHUTR, so in order to remove
it, we first need to make sure we always check for CF_SHUTR where
cs_rx_blocked() is being used.
sc_is_send_allowed() is now used everywhere instead of the combination
of cs_tx_endp_ready() && !cs_tx_blocked(). There's no place where we
need them individually, thus it's simpler. The test was placed in
cs_utils as we'll complete it later.
First it applies to the stream endpoint and not the conn_stream, and
second it only tests and touches the flags so it makes sense to call
it se_fl_ like other functions which only manipulate the flags, as
it's just a special case of flags.
It returns an stconn from a connection and not the opposite, so the name
change was more appropriate. In addition it was moved to connection.h
which manipulates the connection stuff, and it happens that only
connection.c uses it.
The following functions which act on a connection-based stream connector
were renamed to sc_conn_* (~60 places):
cs_conn_drain_and_shut
cs_conn_process
cs_conn_read0
cs_conn_ready
cs_conn_recv
cs_conn_send
cs_conn_shut
cs_conn_shutr
cs_conn_shutw
The function doesn't return a pointer to the mux but to the mux stream
(h1s, h2s etc). Let's adjust its name to reflect this. It's rarely used,
so the name can be enlarged a bit. And of course s/cs/sc to accommodate
the updated name.
These functions return the app-layer associated with an stconn, which
is a check, a stream or a stream's task. They're used a lot to access
channels, flags and for waking up tasks. Let's just name them
appropriately for the stream connector.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This applies the change so that the applet code stops using ci_putchk()
and friends everywhere possible, in favor of the much safer applet_put*().
The change is mechanical but large. Two or three functions used to have no
appctx and a cs derived from the appctx instead, which was a reminiscence
of old times' stream_interface. These were simply changed to directly take
the appctx. No sensitive change was performed, and the old (more complex)
API is still usable when needed (e.g. the channel is already known).
The change touched roughly a hundred locations, with no less than 124
lines removed.
It's worth noting that the stats applet, the oldest of the series, could
get a serious lifting, as it's still very channel-centric instead of
propagating the appctx along the chain. Given that this code doesn't
change often, there's no emergency to clean it up but it would look
better.
For historical reasons (stream-interface and connections), we used to
require two independent fields for the application level callbacks and
the transport-level functions. Over time the distinction faded away so
much that the low-level functions became specific to the application
and conversely. For example, applets may only work with streams on top
since they rely on the channels, and the stream-level functions differ
between applets and connections. Right now the application level only
contains a wake() callback and the low-level ones contain the functions
that act at the lower level to perform the shutr/shutw and at the upper
level to notify about readability and writability. Let's just merge them
together into a single set and get rid of this confusing distinction.
Note that the check ops do not define any app-level function since these
are only called by streams.
This also follows the natural naming. There are roughly 238 changes, all
totally trivial. conn_stream-t.h has become completely void of any
"conn_stream" related stuff now (except its name).
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
Let's start to introduce the stream connector at the app_ops level.
This is entirely self-contained into conn_stream.c. The functions
were also updated to reflect the new name, and the comments were
updated.
Just like for the appctx, this is a pointer to a stream endpoint descriptor,
so let's make this explicit and not confuse it with the full endpoint. There
are very few changes thanks to the preliminary refactoring of the flags
manipulation.
Now at least it makes it obvious that it's the stream endpoint descriptor
and not an endpoint. There were few changes thanks to the previous refactor
of the flags.
After some discussion we found that the cs_endpoint was precisely the
descriptor for a stream endpoint, hence the naturally coming name,
stream endpoint descriptor.
This patch renames only the type everywhere and the new/init/free functions
to remain consistent with it. Future patches will address field names and
argument names in various code areas.
That's the "stream endpoint" pointer. Let's change it now while it's
not much spread. The function __cs_endp_target() wasn't yet renamed
because that will change more globally soon.
This changes all main uses of endp->flags to the se_fl_*() equivalent
by applying coccinelle script endp_flags.cocci. The se_fl_*() functions
themselves were manually excluded from the change, of course.
Note: 144 locations were touched, manually reviewed and found to be OK.
The script was applied with all includes:
spatch --in-place --recursive-includes -I include --sp-file $script $files
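For readers unfamiliar with semantic patches, a rule of this kind might
look as follows (an illustrative sketch, not the actual content of
endp_flags.cocci):
    @@
    struct cs_endpoint *endp;
    expression flag;
    @@
    (
    - endp->flags |= flag;
    + se_fl_set(endp, flag);
    |
    - endp->flags &= ~flag;
    + se_fl_clr(endp, flag);
    )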
This changes all main uses of cs->endp->flags to the sc_ep_*() equivalent
by applying coccinelle script cs_endp_flags.cocci.
Note: 143 locations were touched, manually reviewed and found to be OK,
except a single one that was adjusted in cs_reset_endp() where the flags
are read and filtered to be used as-is and not as a boolean, hence was
replaced with sc_ep_get() & $FLAGS.
The script was applied with all includes:
spatch --in-place --recursive-includes -I include --sp-file $script $files
This one is exclusively used by the connection, regardless its generic
name "ctx" is rather confusing. Let's make it a struct connection* and
call it "conn". This way there's no doubt about what it is and there's
no way it will be used by accident by being taken for something else.
This test in cs_update_rx() was introduced in 1.9 by commit b26a6f970
("MEDIUM: stream-int: make use of si_rx_chan_{rdy,blk} to control the
stream-int from the channel"), but by then already it was not needed
because the RX_WAIT_EP flag has never been part of RXBLK_ANY so there's
no point doing "flags & RXBLK_ANY & ~RX_WAIT_EP"; that part is already
complicated enough as it is.
Adjust the size of the sample buffer before we change the "area"
pointer. Otherwise, we end up not changing the size, because the area
pointer is already the same as "start" before we compute the difference
between the two.
This is similar to the change in b28430591d
but for the word converter instead of field.
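A self-contained illustration of the ordering issue (structure
simplified from haproxy's struct sample; names are illustrative):
    #include <stddef.h>

    struct str { char *area; size_t data; };

    /* Strip everything before <start>: the length must be adjusted
     * while ->area still points at the original beginning. Doing the
     * two assignments in the opposite order makes the subtraction a
     * no-op, which is the bug fixed here.
     */
    static void trim_front(struct str *s, char *start)
    {
        s->data -= start - s->area; /* adjust the size first */
        s->area  = start;           /* then move the pointer */
    }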
Fix a typo that led to using the wrong pointer when loading a
certificate, which led to always using the pem loader for every
parameter.
Use the cert_ext->load() pointer instead of cert_exts->load(), which
was the first element of the cert_exts[] array.
Enhance the error message with the field name.
Should fix issue #1716
Commit 2cb3be76b ("CLEANUP: init: address a coverity warning about
possible multiply overflow") was incomplete, two other locations were
present. This should address issue #1585.
Bring some improvements to the h3_parse_settings_frm() function. The
first one is the parsing, which now manipulates a buffer instead of a
plain char*. This is more to unify with other parsing functions than to
deal with data wrapping : it's unlikely to happen as SETTINGS is only
received as the first frame on the control stream.
Various errors are now properly reported as connection errors :
* on incomplete frame payload
* on duplicated settings in the same frame
* on receipt of a reserved setting
As specified by the HTTP/3 draft, an unknown unidirectional stream can
be aborted. To do this, use a new flag QC_SF_READ_ABORTED. When the MUX
detects this flag, the QCS instance is automatically freed.
Previously, such streams were instead automatically drained. By
aborting them, we save some useless memcpy instructions. On future data
reception, the QCS instance is not found in the tree and considered as
already closed. The frame payload is thus deleted without copying it.
Remove all unnecessary bits of code for H3 unidirectional streams. Most
notably, an individual tasklet is not required anymore for each stream.
This is useless since the merge of RX/TX uni streams handling with the
bidirectional streams code.
The whole QUIC stack is impacted by this change :
* at the quic-conn level, a single function is now used to handle uni
and bidirectional streams. It uses the qcc_recv() function from the MUX.
* at the MUX level, the qc_recv() io-handler function does not skip uni
streams anymore
* most changes are conducted at the app layer. Most notably, all
received data is handled by the decode_qcs operation.
Now that decode_qcs is the single app read function, the H3 layer can be
simplified. Uni streams parsing was extracted from h3_attach_ruqs() to
h3_decode_qcs().
h3_decode_qcs() is able to deal with all HTTP/3 frame types. It first
checks if the frame is valid for the H3 stream type. Most notably,
SETTINGS parsing was moved from h3_control_recv() into h3_decode_qcs().
This commit has some major benefits besides removing duplicated code.
Mainly, QUIC flow control is now enforced for uni streams as with bidi
streams. Also, an unknown frame received on the control stream does not set
an error : it is now silently ignored as required by the specification.
Some cleaning in H3 code is already done with this patch :
h3_control_recv() and h3_attach_ruqs() are removed as they are now
unused. A final patch should clean up the unneeded remaining bit.
Define a new function h3_parse_uni_stream_no_h3(). It can be used to
handle the payload of streams which do not convey H3 frames. This is
mainly useful for QPACK encoder/decoder streams. It can also be used
for a stream of unknown type, which should be drained without parsing
it.
This patch is useful to extract code into a dedicated function. It will
be simple to reuse it in h3_decode_qcs() when uni-stream reception is
unified with bidirectional streams, without using dedicated stream
tasklets.
Define a new function h3_is_frame_valid(). It returns whether a frame
is valid or not depending on the stream which received it.
For the moment, it is used in h3_decode_qcs() which only deals with
bidirectional streams. Soon, uni streams will use the same function,
rendering the frame type check useful.
Define a new function h3_init_uni_stream(). This can be used to read the
the stream type of a unidirectional stream. There is no functional
change compared to the previous code.
This patch will be useful to unify reception for uni streams with
bidirectional ones.
Define a new enum h3s_t. This is used to differentiate between the
different stream types used in a HTTP/3 connection, including the QPACK
encoder/decoder streams.
For the moment, only the bidirectional stream type is set. This patch
will be useful to unify reception of uni streams with bidirectional
ones.
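For illustration, a hedged sketch of what such an enum might look like,
together with the frame-validity check from the h3_is_frame_valid()
patch above (names and members are illustrative; the frame type codes
are the RFC 9114 ones):
    enum h3s_t {
        H3S_T_UNKNOWN,
        H3S_T_BIDI,      /* request stream */
        H3S_T_CTRL,      /* control stream */
        H3S_T_QPACK_DEC, /* QPACK decoder stream */
        H3S_T_QPACK_ENC, /* QPACK encoder stream */
    };

    /* Per RFC 9114, SETTINGS (0x4), GOAWAY (0x7) and MAX_PUSH_ID (0xd)
     * only appear on the control stream, while DATA (0x0) and HEADERS
     * (0x1) only appear on request streams.
     */
    static int is_frame_valid(enum h3s_t type, unsigned long long ftype)
    {
        switch (type) {
        case H3S_T_CTRL:
            return ftype == 0x4 || ftype == 0x7 || ftype == 0xd;
        case H3S_T_BIDI:
            return ftype == 0x0 || ftype == 0x1;
        default:
            return 0;
        }
    }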
Replace the h3_uqs type by qcs in stream callbacks. This change is done
in the context of unification between bidi and uni-streams. The h3_uqs
type will be unneeded when this is achieved.
Remove the unneeded skip over unidirectional streams in qc_send(). This
unifies sending for both uni and bidi streams.
In fact, the only local unidirectional stream in use for the moment is
the H3 control stream responsible for SETTINGS emission. The frame was
already properly generated in qcs.tx.buf, but not sent due to the
stream skip in qc_send(). Now, there is no need to ignore uni streams
so remove this condition.
This fixes the emission of H3 SETTINGS, which are now properly emitted.
Uni and bidi streams use the same set of functions for sending. One of
the most notable gains is that flow control is now enforced for uni
streams.
Emit a STREAM_STATE_ERROR connection error in two cases :
* if receiving data on a send-only stream
* if receiving data on a locally initiated stream not opened yet
For the moment the first case cannot be encountered as uni streams
reception does not use qcc_recv(). However, this will soon be
implemented with the unification between bidi and uni streams.
The whole frame payload must have been received to demux an H3 frame,
except for H3 DATA which can be fragmented into multiple HTX blocks.
If the frame is bigger than the buffer and is not a DATA frame, a
connection error is reported with error H3_EXCESSIVE_LOAD.
This should be completed in the future with the H3 settings to limit the
size of uncompressed header section.
This code is more generic : it can handle every H3 frame type. This is done
in order to be able to use h3_decode_qcs() to demux both uni and bidir
streams.
Similarly to sending, read operations are disabled when a
CONNECTION_CLOSE frame has been emitted.
Most notably, this prevents unneeded demuxing loops when the H3 layer
has issued an error and cannot process the buffer payload anymore.
Note that read is not prevented for unidirectional streams for the
moment. This will be supported soon with the unification of bidir and
uni streams treatment.
Complete quic-conn API for error reporting. A new parameter <app> is
defined in the function quic_set_connection_close(). This will transform
the frame into a CONNECTION_CLOSE_APP type.
This type of frame will be generated by the application layer, h3 or
hq-interop for the moment. A new function qcc_emit_cc_app() is exported
by the MUX layer for them.
The only change is that the H3_CF_SETTINGS_SENT flag if-condition is
replaced by a BUG_ON statement. This may help to catch multiple calls
to h3_control_send() instead of silently ignoring them.
h3_parse_settings_frm() read one byte past the end of the frame
payload. Fix the parsing code. In most cases, this has no impact as we
are inside an allocated buffer, but it could cause a segfault depending
on the buffer alignment.
The struct h3 represents the whole HTTP/3 connection. A new type h3s
was recently introduced to represent a single HTTP/3 stream. To
facilitate the analogy with other haproxy code, most notably in the
MUX, rename the h3 type to h3c.
Do not allocate a cs_endpoint for every QCS instance in qcs_new().
Instead, this is delayed to the qc_attach_cs() function.
In effect, with H3 as the app protocol, the cs_endpoint will be
allocated on HEADERS parsing. Thus, no cs_endpoint is allocated for H3
unidirectional streams which do not convey any HTTP data.
h3_b_dup() is used to obtain a ncbuf representation as a struct buffer.
The ncbuf can thus be marked as a const parameter. This allows
functions which already manipulate a const ncbuf to use it.
The previous fix:
BUG/MEDIUM: peers: fix segfault using multiple bind on peers
prevents declaring multiple listeners on a peers section, but if the
peers protocol is extended to support this we could raise the bug
again.
Indeed, after allocating a new listener and adding it to a list, the
code mistakenly re-configured the first element of the list instead of
the newly added one, and the last one finally remained uninitialized.
The previous fix ensures there is no more than one listener in this
list but this could be changed in the future.
This patch ensures we configure and initialize the newly added listener
instead of the first one in the list.
This patch could be backported until version 2.0 to complete
BUG/MEDIUM: peers: fix segfault using multiple bind on peers
If multiple "bind" lines were present on the "peers" section, multiple
listeners were added to a list but the code mistakenly initialize
the first member and this first listener was re-configured instead of
the newly created one. The last one remains uninitialized causing a null
dereference a soon a connection is received.
In addition, the 'peers' sections and protocol are not currently designed to
handle multiple listeners.
This patch check if there is already a listener configured on the 'peers'
section when we want to create a new one. This is rising an error if
a listener is already present showing the file and line in the error
message.
To keep the file and line number of the previous listener available
for the error message, the 'bind_conf_uniq_alloc' function was modified
to keep the file/line data the struct 'bind_conf' was firstly
allocated (previously it was updated each time the 'bind_conf' was
reused).
This patch should be backported until version 2.0
resolvers_deinit() function is called on error, during post-parsing stage,
or on deinit, when HAProxy is stopped. It releases all entities: resolvers,
resolutions and SRV requests. There is no reason to defer the resolutions
release by moving them in the death_row list because this function is
terminal. And it is in fact a bug. Resolutions must not be released at the
end of the function because resolvers were already freed. However some
resolutions may still be attached to a resolver. Thus, when we try to
remove one from the resolver's tree, in resolv_reset_resolution(), this
resolver was already released.
So now, resolutions are immediately released. It means there is no more
reason to track this function. Calls to
enter_resolver_code()/leave_resolver_code() have been removed.
This patch should fix the issue #1680 and may be related to #1485. It must
be backported as far as 2.2.
We used to support both RTSP and HTTP protocol version names with and
without accept-invalid-http-request, but since this is based on the
characters themselves, any protocol made of chars {0-9/.HPRST} was
possible and not others. Now that such non-standard protocols are
restricted to accept-invalid-http-request, there's no reason for not
allowing other letters. With this patch, characters {0-9./A-Z} are
permitted when the option is set.
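As a hedged illustration, such a non-standard protocol can be allowed
with the existing option named above (frontend name and port are
hypothetical):
    frontend rtsp-in
        bind :554
        option accept-invalid-http-request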
This patch hardens the verification of the HTTP/1.x version line
(i.e. the first line within an HTTP/1.x request) to verify that
the protocol name within the version actually reads "HTTP".
Previously, protocols that superficially resembled the wire format of
HTTP/1.x and had a 4-letter acronym as the protocol name, such as RTSP,
would pass this check.
This patch fixes GitHub issue #540; it must be backported to all
supported versions. The legacy, non-HTX parser is affected as well, so
a fix must be created for it too.
Note that such protocols can still be used when option
accept-invalid-http-request is set.
In issue #1585 Coverity suspects a risk of multiply overflow when
calculating the SSL cache size, though in practice the cache is
limited to 2^32 anyway thus it cannot really happen. Nevertheless,
casting the operation should be sufficient to avoid marking it as a
false positive.
This reverts commit 118b2cbf84.
This patch was useful mainly for the docker image of QUIC interop to
have traces on stdout.
A better solution has been found by integrating this patch directly
into the qns repository which is used to build the docker image. Thus,
this hack is not required anymore in the main repository.
The wake handler detects if the frontend is closed. This can happen if
the proxy has been disabled individually or even on process soft-stop.
Before this patch, in this condition QCS instances were freed before
being detached from the cs_endpoint. This clearly violates the haproxy
connection architecture and causes a BUG_ON statement crash in
cs_free().
To handle this properly, the cs_endpoint is notified by setting
RD_SH|WR_SH on the connection flags. The cs_endpoint will thus use the
detach operation which allows the QCS instance to be freed.
This code allows the soft-stop process to complete as soon as possible.
However, the client is not notified about the connection closing. It
should be done by emitting an H3 GOAWAY + CONNECTION_CLOSE. Sadly, this
is impossible at this stage because the listener sockets are closed, so
the quic-conn cannot use them to emit new frames. At this stage the client
will most probably detect connection closing on its idle timeout
expiration.
Thus, to completely support proxy closing/soft-stop, important
architecture changes are required in QUIC socket management. This is
also linked with the reload feature.
The given size must be the size of the destination buffer, not the size of the
(binary) address representation.
This fixes GitHub issue #1599.
The bug was introduced in 92149f9a82 which is in
2.4+. The fix must be backported there.
If QUIC support is enabled both branches of the ternary conditional are
identical, upsetting Coverity. Move the full conditional into the non-QUIC
preprocessor branch to make the code more clear.
This resolves GitHub issue #1710.
When we receive a CONNECTION_CLOSE frame, we should decrement this counter
if the handshake state was not successful and if we have not received
a TLS alert from the TLS stack.
When a bind line is configured without the "ssl" keyword, a warning is
emitted and a crash happens at runtime:
bind quic4@:4449 crt rsa+dh2048.pem alpn h3 allow-0rtt
[WARNING] (17867) : config : Proxy 'decrypt': A certificate was specified but SSL was not enabled on bind 'quic4@:4449' at [quic-mini.cfg:24] (use 'ssl').
Let's automatically turn SSL on when QUIC is detected, as it doesn't
exist without SSL anyway. It solves the runtime issue, and also makes
sure it is not possible to accidentally configure a quic listener with
no certificate since the error is detected via the SSL checks.
A warning is emitted in this case, to encourage the user to fix the
configuration so that it remains reviewable.
When no mux protocol is configured on a bind line with "proto", and the
transport layer is QUIC, right now mux_h1 is being used, leading to a
crash.
Now when the transport layer of the bind line is already known as being
QUIC, let's automatically try to configure the QUIC mux, so that users
do not have to enter "proto quic" all the time while it's the only
supported option. This means that the following line now works:
bind quic4@:4449 ssl crt rsa+dh2048.pem alpn h3 allow-0rtt
Till now, placing "proto h1" or "proto h2" on a "quic" bind or placing
"proto quic" on a TCP line would parse fine but would crash when traffic
arrived. The reason is that there's a strong binding between the QUIC
mux and QUIC transport and that they're not expected to be called with
other types at all.
Now that we have the mux's type and we know the type of the protocol used
on the bind conf, we can perform such checks. This now returns:
[ALERT] (16978) : config : frontend 'decrypt' : stream-based MUX protocol 'h2' is incompatible with framed transport of 'bind quic4@:4448' at [quic-mini.cfg:27].
[ALERT] (16978) : config : frontend 'decrypt' : frame-based MUX protocol 'quic' is incompatible with stream transport of 'bind :4448' at [quic-mini.cfg:29].
This config tightening is only tagged MINOR since such a config,
despite not reporting an error, cannot work at all; even if it breaks
experimental configs, they were just waiting for a single connection
to crash.
In order to be able to check compatibility between muxes and transport
layers, we'll need a new flag to tag muxes that work on framed transport
layers like QUIC. Only QUIC has this flag now.
We used to preset XPRT_SSL on bind_conf->xprt when parsing the "ssl"
keyword, which required to be careful about what QUIC could have set
before, and which makes it impossible to consider the whole line to
set all options.
Now that we have the BC_O_USE_SSL option on the bind_conf, it becomes
easier to set XPRT_SSL only once the bind_conf's args are parsed.
It used to be set when parsing the listeners' addresses but this comes
with some difficulties in that other places have to be careful not to
replace it (e.g. the "ssl" keyword parser).
Now that we know what protocols a bind_conf line relies on, we can set it
after having parsed the whole line.
Now that we have a function to parse all bind keywords, and that we
know what types of sock-level and xprt-level protocols a bind_conf
is using, it's easier to centralize the check for stream vs dgram
conflict by putting it directly at the end of the args parser. This
way it also works for peers, provides better precision in the report,
and will also allow validating transport layers. The check was even
extended to detect inconsistencies between xprt layers (which were not
covered before). It can even detect that there are two incompatible
"bind" lines in a single peers section.
Let's collect the set of xprt-level and sock-level dgram/stream protocols
seen on a bind line and store that in the bind_conf itself while they're
being parsed. This will make it much easier to detect incompatibilities
later than the current approach which consists in scanning all listeners
in post-parsing.
This now makes sure that both the peers' "bind" line and the regular one
will use the exact same parser with the exact same behavior. Note that
the parser applies after the address and that it could be factored
further, since the peers one still does quite a bit of duplicated work.
The "bind" parsing code was duplicated for the peers section and as a
result it wasn't kept updated, resulting in slightly different error
behavior (e.g. errors were not freed, warnings were emitted as alerts)
Let's first unify it into a new dedicated function that properly reports
and frees the error.
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) denaturate the underlying control layer, except for QUIC
which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.
Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
Just trying "quic4@:4433" with USE_QUIC not set results in such a cryptic
error:
[ALERT] (14610) : config : parsing [quic-mini.cfg:44] : 'bind' : unsupported protocol family 2 for address 'quic4@:4433'
Let's at least add the stream and datagram statuses to indicate what was
being looked for:
[ALERT] (15252) : config : parsing [quic-mini.cfg:44] : 'bind' : unsupported stream protocol for datagram family 2 address 'quic4@:4433'
Still not very pretty but gives a little bit more info.
In case the str2listener() parser reports a generic error with no message
when parsing the argument of a "bind" statement in a "peers" section, the
reported error indicates an invalid address on the empty arg. This has
existed since 2.0 with commit 355b2033e ("MINOR: cfgparse: SSL/TLS binding
in "peers" sections."), so this must be backported till 2.0.
As specified by the RFC, reception of different STREAM data for the same
offset must be treated as a connection error with error code
PROTOCOL_VIOLATION.
Use the ncbuf API to detect this case: the add operation fails with
NCB_RET_DATA_REJ when called in add mode NCB_ADD_COMPARE.
Send a CONNECTION_CLOSE on reception of a STREAM frame for a STREAM id
exceeding the maximum value enforced. Only implemented for bidirectional
streams for the moment.
Send a CONNECTION_CLOSE if the peer emits more data than authorized by
our flow-control. This is implemented for both stream and connection
level.
Fields have been added in qcc/qcs structures to differentiate received
offsets for limit enforcing with consumed offsets for sending of
MAX_DATA/MAX_STREAM_DATA frames.
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which requires closing the
whole connection.
On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable all future send operations.
We rely on the <conn_opening> stats counter and the tune.quic.retry_threshold
setting to dynamically start sending Retry packets. We continue to send such
packets when the "quic-force-retry" setting is set. The difference lies in how
we handle received tokens: we check them regardless of this setting because
the Retry could have been dynamically started. We must also send Retry packets
when we receive Initial packets without a token if the dynamic Retry threshold
was reached, but only for connections which are not currently opening, or in
other words for Initial packets without an already instantiated connection.
Indeed, we must not send Retry packets for all Initial packets without a
token. For instance, a client may have already sent an Initial packet without
receiving a Retry packet because the Retry feature was not started, then the
Retry starts upon exceeding the threshold value due to other connections, then
finally our client decides to send another Initial packet (to ACK Initial
CRYPTO data for instance). It does this without a token. So, for this already
existing connection we must not send a Retry packet.
This QUIC specific keyword may be used to set the threshold, in number of
connection openings, beyond which the QUIC Retry feature will be automatically
enabled. Its default value is 100.
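For example, assuming the keyword is spelled "tune.quic.retry-threshold"
(hypothetical here, the text above does not name it), one could arm the
dynamic Retry earlier:
    global
        tune.quic.retry-threshold 50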
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake succeeded (handshake completed
state) or failed, or when the connection timed out without reaching the
handshake completed state.
Move the code which finalizes the QUIC connection initialization after
qc_new_conn() has been called into this function, to benefit from its
error handling and release the memory allocated for QUIC connections
whose initialization could not be finalized.
Here is the format of a token:
- format (1 byte)
- ODCID (from 9 up to 21 bytes)
- creation timestamp (4 bytes)
- salt (16 bytes)
A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.
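A hypothetical C sketch of this wire layout (field names are purely
illustrative, not the actual haproxy identifiers):
    struct retry_token {
            unsigned char fmt;        /* format byte, cleartext */
            unsigned char odcid[21];  /* 9 to 21 bytes, ciphered */
            uint32_t timestamp;       /* creation time, ciphered */
            unsigned char salt[16];   /* cleartext AEAD salt */
    };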
The Retry token is ciphered after having derived a strong secret from the
cluster secret and generated the AEAD AAD, as well as a 16-byte long salt.
This salt is added to the token. Obviously it is not ciphered. The format
byte is not ciphered either.
The AAD is built by quic_generate_retry_token_aad() which concatenates the
version, the client SCID and the IP address and port. We had to implement
quic_saddr_cpy() to copy the IP address and port to the AAD buffer. Only the
Retry SCID is generated on our side to build a Retry packet; the other fields
come from the first packet received from the client. The client must reuse
this Retry SCID in response to our Retry packet, so we do not have to store
it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.
quic_retry_token_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives in an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the
token to store the ODCID into a local quic_cid struct variable. Finally this
ODCID may be stored into the transport parameters thanks to
qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_retry_token_check(). If this check fails, the received packet is
dropped without any further packet processing at this time.
This function does exactly the same thing as quic_tls_decrypt(), except that
it reuses its input buffer as the output buffer. This is needed to decrypt
the Retry token without modifying the packet buffer which contains this
token. Indeed, modifying it would prevent us from decrypting the packet
itself, as the token belongs to the AEAD AAD for the packet.
This function must be used to derive strong secrets from a non pseudo-random
secret (cluster-secret setting in our case) and an IV. First it calls
quic_hkdf_extract_and_expand() to do that for a temporary strong secret
(tmpkey), then makes two calls to quic_hkdf_expand() reusing this strong
temporary secret to derive the final strong secret and IV.
In issue #1563, Coverity reported a very interesting issue about a
possible UAF in the config parser if the config file ends with a
very large line followed by an empty one and the large one causes an
allocation failure.
The issue essentially is that we try to go on with the next line in case
of allocation error, while there's no point doing so. If we failed to
allocate memory to read one config line, the same may happen on the next
one, and blatantly dropping the line while trying to parse what follows
it makes no sense. In the best case, subsequent errors will be incorrect
due to this prior error (e.g. a large ACL definition with many patterns,
followed by a reference to this ACL).
Let's just immediately abort in such a condition where there's no recovery
possible.
This may be backported to all versions once the issue is confirmed to be
addressed.
Thanks to Ilya for the report.
If an unlisted errno is reported, abort the process. If a crash is
reported on this condition, we must determine if the error code is a
bug, should interrupt emission on the fd or if we can retry the syscall.
If sendto() returns an error, we should not retry the call and instead
break from the sending loop. An exception is made for EINTR which allows
the syscall to be retried immediately.
This bug caused an infinite loop, reproduced when the process was put in
the closing state by SIGUSR1 but there was still QUIC data emission left.
Instead of using the health-check to subscribe to read/write events, we now
rely on the conn-stream. Indeed, on the server side, the conn-stream's
endpoint is a multiplexer. Thus it seems appropriate to handle subscriptions
for read/write events the same way as for the streams. Of course, the I/O
callback function is not the same. We use srv_chk_io_cb() instead of
cs_conn_io_cb().
The event_srv_chk_io() function is renamed srv_chk_io_cb() to be consistent with
the I/O callback function of connections. In addition, this function is
exported. It will be required to use the conn-stream's subscriptions.
The connection is already closed by the health-check itself. Thus there is
no reason to duplicate this part in the .wake callback function. It is
enough to wake the health-check and wait.
The buffer wait list is used to deal with buffer allocation failure. But at
the end of a health-check, it must be reinitialized. There is no reason to
keep a buffer between two health-check runs. And in fact, the associated
flags, CHK_ST_IN_ALLOC and CHK_ST_OUT_ALLOC, are already cleared at the end
of a health-check.
This patch must be backported as far as 2.2. On 2.2, MT_LIST_ADDED and
MT_LIST_DEL must be used instead of LIST_INLIST and LIST_DEL_INIT.
When the line parsing fails because the outline buffer must be reallocated,
if the my_realloc2() call fails, the buffer size must be reset. Indeed, in
this case the current line is skipped, a fatal error is reported and we jump
to the next line. At this stage the outline buffer is NULL. If the buffer
size is not reset, the next call to parse_line() crashes because we try to
write into the buffer: we fail to detect that the outline buffer is too
small to copy any character.
To fix the issue, the outlinesize variable must be set to 0 when the outline
allocation fails.
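A minimal sketch of the fix (control flow simplified, label name
hypothetical):
    outline = my_realloc2(outline, outlinesize * 2);
    if (!outline) {
            outlinesize = 0;  /* buffer is gone, do not write to it later */
            goto next_line;   /* report the fatal error and skip the line */
    }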
This patch should fix the issue #1563. It must be backported as far as 2.2.
Release the QCS RX buffer if emptied after qcs_consume(). This improves
memory usage and avoids a QCS keeping an allocated buffer, particularly
when no data is received anymore. The buffer is automatically reallocated
if needed via qc_get_ncbuf().
qc_free_ncbuf() may now be used with a NCBUF_NULL buffer as parameter.
This is useful when using this function on a QCS with no allocated
buffer. This case was not reproduced for the moment, but it will soon
become more common as buffers will be released once emptied.
Also a call to offer_buffers() is added to conform with the dynamic
buffer management of haproxy.
This commit is similar to the previous one but deals with MAX_DATA for
connection-level data flow control. It uses the same function
qcc_consume_qcs() to update flow control level and generate a MAX_DATA
frame if needed.
Send MAX_STREAM_DATA frames when at least half of the allocated
flow-control credit has been demuxed and cleared. This is necessary to
support a QUIC STREAM with received data greater than a buffer.
Transcoders must use the new function qcc_consume_qcs() to empty the QCS
buffer. This allows monitoring the current flow-control level and
generating a MAX_STREAM_DATA frame if required. This frame will be emitted
via qc_io_cb().
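From a transcoder's point of view the call may look like this (a hedged
sketch; the exact prototype is assumed, not verified):
    /* tell the MUX that <bytes> were demuxed from the QCS RX buffer;
     * the MUX updates the flow-control level and may schedule a
     * MAX_STREAM_DATA frame for emission */
    qcc_consume_qcs(qcs, bytes);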
Adjust the mechanism for MAX_STREAMS_BIDI emission. When a bidirectional
stream is removed, current flow-control level is checked. If needed, a
MAX_STREAMS_BIDI frame is generated and inserted in a new list in the
QCS instance. The new frames will be emitted at the start of qc_send().
This has no impact on the current MAX_STREAMS_BIDI behavior. However,
this mechanism is more flexible and will allow MAX_STREAM_DATA/MAX_DATA
emission to be implemented quickly.
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible for calling qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.
This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
Previously, qc_io_cb() of mux-quic only dealt with TX. Add support for
RX in it. This is done through a new function qc_recv(qcc). It loops
over all QCS instances and calls qcc_decode_qcs(qcs).
This has no impact from the quic-conn layer as qcc_decode_qcs(qcs) is
called directly. However, this provides a resume point for when demux
is blocked on a full upper-layer HTX buffer.
Note that for the moment, only RX for bidirectional streams is managed
in qc_io_cb(). Unidirectional streams use their own mechanism for both
TX/RX. It should be unified in the near future in a refactoring.
Flag QCS if HTX buffer is full on demux. This will block all future
operations on QCS demux and should limit unnecessary decode_qcs() calls.
The flag is cleared on rcv_buf operation called by conn-stream.
Previously, the H3 demuxer refused to process the payload if the frame was
not entirely received and the QCS buffer was not full. This code was
duplicated from the H2 demuxer.
In H2, this is a justified optimization as only one frame at a time can
be demuxed. However, this is not the case in H3 with interleaved frames
in the lower layer QUIC STREAM frames.
This condition is now removed. The H3 demuxer will process the payload as
soon as possible. An exception is kept for HEADERS frames as the code is
not able to deal with partial HEADERS.
With this change, the H3 demuxer should consume less memory. To ensure
that we never receive a HEADERS frame bigger than the RX buffer, we should
use the H3 SETTINGS_MAX_FIELD_SECTION_SIZE setting.
First, fix some typos in comments inside the function. Second, rename some
variables to reduce confusion.
A special case has been inserted when advance is done inside a GAP block
and this block is the last of the buffer. In this case, the whole buffer
will be emptied, equivalent to a ncb_init() operation.
ncb_is_empty() was plainly incorrect as it directly dereferenced the
memory to read block offsets instead of using ncb_read_off(). The result
is undefined.
Also, BUG_ON() statement is wrong when the buffer starts with a data
block. In this case, ncb_head() is not the first gap offset but instead
just random data. The calculated sum in BUG_ON() statement has thus no
meaning and may cause an abort. Adjust this by reorganizing the whole
function. Only the first data block size is read. If and only if it is
not null, the first gap size is then checked.
ncb_is_full() has been rewritten to share the same model as
ncb_is_empty().
The quic-conn manages a buffer to store received QUIC packets. When the
buffer wraps, the gap is filled until the end with junk and packets can
be inserted at the start of the buffer.
On the other end, deletion is implemented via quic_rx_pkts_del().
Packets are removed one by one if their refcount is null. If junk is
found, the buffer is emptied until its wrap.
This seems to work in most cases but a bug was found in a particular
case : on insertion if buffer gap is not at the end of the buffer. In
this case, the gap was filled, which is useless as now the buffer is
full and the packet cannot be inserted. Worse, on deletion, when junk is
removed there is a risk of removing new packets. This can happen in the
following case :
1. buffer contig space is too small, junk is inserted in the middle of
it
2. on quic_rx_pkts_del() invocation, a packet is removed, but not the
next one because its refcount is still positive. When a new packet is
received, it will be stored after the junk.
3. on next quic_rx_pkts_del(), when junk is removed, all contig data is
cleared, with newer packets data too.
This will cause a transfer between a client and haproxy to be stalled.
This can be reproduced with big enough POST requests. I triggered it
with ngtcp2 and 10M of posted data.
Hopefully, the solution to this bug is simple. If the contiguous space is
not big enough to store a packet, but the space is not at the end of the
buffer, no junk is inserted and the packet is dropped as we cannot buffer
it. This ensures that junk is only present at the end of the buffer and
that when it is removed no packet data is purged with it.
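A rough sketch of the fixed insertion logic (helper names are
hypothetical, used only to illustrate the decision):
    if (contig_space < pkt_len) {
            if (space_is_buffer_tail(buf))             /* gap at the end? */
                    fill_with_junk(buf, contig_space); /* wrap and retry */
            else
                    return drop_packet(pkt);   /* never junk mid-buffer */
    }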
In the httpclient_applet_init() function, the ss_dst variable is always defined
before the call to sockaddr_alloc(). There is no reason to test it.
This patch should fix the issue #1706.
Labels used in goto statements were not in the right order. Thus if
there is an error during the appctx startup, it is possible to leak a task.
This patch should fix the issue #1703. No backport needed.
Even if `unique_id` and `s->unique_id` are identical it is a bit odd to
`isttest()` `unique_id` and then use `s->unique_id` in the call to `http_add_header()`.
This "issue" was introduced in a17e66289c,
because before that commit the function returned the length of the ID, as it
was not an ist.
When loading providers with 'ssl-provider' global options, this
ssl-provider-path option can be used to set the search path that is to
be used by openssl. It behaves the same way as the OPENSSL_MODULES
environment variable.
Negation and default modifiers are not supported for this option. However,
this was already tested earlier in cfg_parse_listen() function. Thus, when
"http-restrict-req-hdr-names" option is parsed, the keyword modifier is
always equal to KWM_STD. It is useless to test it again at this place.
This patch should solve the issue #1702.
The appctx is already the endpoint target. It is confusing to also use it to
set the endpoint context. So, never set the endpoint ctx when an appctx is
created or attached to an existing conn-stream.
When creating a new applet for a peer outgoing connection, we check
the load on each thread. Threads with the least applet count are
preferred.
With this solution we avoid a situation where many outgoing
connections run on the same thread, causing significant load on a
single CPU core.
It is now possible to start an appctx on a thread subset. Some controls were
added here and there. It is forbidden to start a backend appctx on a thread
other than the local one. If a frontend appctx is started on another thread
or a thread subset, the applet .init callback function must be defined. This
callback function is responsible for finalizing the appctx startup. It can be
performed synchronously. In this case, the appctx is started on the local
thread. It is not really useful but it is valid. Or it can be performed
asynchronously. In this case, .init callback function is called when the
appctx is woken up for the first time. When this happens, the appctx
affinity is set to the current thread to be able to start the session and
the stream.
In the same way as for the tasks, the applets API was changed to be able
to start a new appctx on a thread subset. For now the feature is
disabled. Only appctx_new_here() is working. But it will be possible to
start an appctx on a specific thread or a subset via a mask.
A .init callback function is defined for the peer_applet applet. This
function finishes the appctx startup by calling appctx_finalize_startup()
and it handles the stream customization.
A .init callback function is defined for the sink_forward_applet applet.
This function finishes the appctx startup by calling
appctx_finalize_startup() and it handles the stream customization.
A .init callback function is defined for the httpclient_applet applet. This
function finishes the appctx startup by calling appctx_finalize_startup()
and it handles the stream customization.
A .init callback function is defined for the update_applet applet. This
function finishes the appctx startup by calling appctx_finalize_startup()
and it handles the stream customization.
A .init callback function is defined for the spoe_applet applet. This
function finishes the spoe_appctx initialization. It also finishes the
appctx startup by calling appctx_finalize_startup() and it handles the
stream customization.
A .init callback function is defined for the dns_session_applet applet. This
function finishes the appctx startup by calling appctx_finalize_startup()
and it handles the stream customization.
appctx_free_on_early_error() must be used to release a freshly created
frontend appctx if an error occurred during the init stage. It takes care to
release the stream instead of the appctx if it exists. For a backend appctx,
it just calls appctx_free().
appctx_finalize_startup() may be used to finalize the frontend appctx
startup. It is responsible for creating the appctx's session and the
frontend conn-stream. On error, it is the caller's responsibility to
release the appctx. However, the session is released if it was created. On
success, if an error is encountered in the caller function, the stream must
be released instead of the appctx.
This function should ease the init stage when a new appctx is created.
It is just a helper function that calls the .init applet callback function,
if it exists. This will simplify a bit the init stage when a new applet is
started. For now, this callback function is only used when a new service is
started.
The session created for frontend applets is now totally owned by the
corresponding appctx. It means the appctx is now responsible for releasing
it. This removes the hack in stream_free() about frontend applets to make sure
to release the session.
Applets were moved to the same level as multiplexers. Thus, gradually,
applets code is changed to be less dependent on the stream. With this
commit, the frontend appctx are ready to own the session. It means a
frontend appctx will be responsible to release the session.
If no private key can be found in a bind line's certificate and
ssl-load-extra-files is set to none we end up trying to call
X509_check_private_key with a NULL key, which crashes.
This fix should be backported to all stable branches.
Found with -Wmissing-prototypes:
src/hlua_fcn.c:53:5: fatal error: no previous prototype for function 'hlua_checkboolean' [-Wmissing-prototypes]
int hlua_checkboolean(lua_State *L, int index)
^
src/hlua_fcn.c:53:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int hlua_checkboolean(lua_State *L, int index)
^
static
1 error generated.
Found with -Wmissing-prototypes:
src/ssl_utils.c:22:5: fatal error: no previous prototype for function 'cert_get_pkey_algo' [-Wmissing-prototypes]
int cert_get_pkey_algo(X509 *crt, struct buffer *out)
^
src/ssl_utils.c:22:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int cert_get_pkey_algo(X509 *crt, struct buffer *out)
^
static
1 error generated.
When HAProxy is linked to an OpenSSLv3 library, this option can be used
to load a provider during init. You can specify multiple ssl-provider
options, which will be loaded in the order they appear. This does not
prevent OpenSSL from parsing its own configuration file in which some
other providers might be specified.
A linked list of the providers loaded from the configuration file is
kept so that all those providers can be unloaded during cleanup. The
providers loaded directly by OpenSSL will be freed by OpenSSL.
This option can be used to define a default property query used when
fetching algorithms in OpenSSL providers. It follows the format
described in https://www.openssl.org/docs/man3.0/man7/property.html.
It is only available when haproxy is built with SSL support and linked
to OpenSSLv3 libraries.
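For instance, a hypothetical global section combining this option (assuming
it is spelled "ssl-propquery") with the provider keywords described above;
the provider name depends on what is installed:
    global
        ssl-provider-path /usr/lib/ossl-modules
        ssl-provider legacy
        ssl-propquery "?provider=legacy"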
The random generator initialization needs to be performed before the
chroot but not any earlier. If we want to add provider
configuration options to the configuration file, they need to be
processed before any call to a crypto-related OpenSSL function.
We can then delay the initialization until after the configuration file
is parsed and processed.
The "http-restrict-req-hdr-names" option can now be set to restrict allowed
characters in the request header names to the "[a-zA-Z0-9-]" charset.
The idea of this option is to not send header names with characters other
than alphanumerics or hyphens. It is especially important for FastCGI
applications because all those characters are converted to underscores.
For instance,
"X-Forwarded-For" and "X_Forwarded_For" are both converted to
"HTTP_X_FORWARDED_FOR". So, header names can be mixed up by FastCGI
applications. And some HAProxy rules may be bypassed by mangling header
names. In addition, some non-HTTP compliant servers may incorrectly handle
requests when header names contain characters outside the "[a-zA-Z0-9-]"
charset.
When this option is set, a policy must be specified:
* preserve: It disables the filtering. It is the default mode for HTTP
proxies with no FastCGI application configured.
* delete: It removes request headers with a name containing a character
outside the "[a-zA-Z0-9-]" charset. It is the default mode for
HTTP backends with a configured FastCGI application.
* reject: It rejects the request with a 403-Forbidden response if it
contains a header name with a character outside the
"[a-zA-Z0-9-]" charset.
The option is evaluated per-proxy and after http-request rules evaluation.
This patch may be backported to avoid any security issue with FastCGI
applications (so as far as 2.2).
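A configuration sketch for a FastCGI backend (proxy name is illustrative):
    backend fcgi-back
        option http-restrict-req-hdr-names reject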
Using -Wall reveals several warnings when building the ncbuf testing API.
One of them was about a signedness mismatch, the other one about an
incorrect print format.
ncbuf public API functions were not ready to deal with a NCBUF_NULL as
parameter. Strengthen these functions by handling it properly.
Most of the functions will consider the buffer as empty and silently
return. The only exception is ncb_init(buf) which cannot be called
with a NCBUF_NULL. It seems legitimate to consider this a bug and not
fail silently in this case.
The quic_rx_strm_frm type was used to buffer STREAM frames received out of
order. Now the MUX is able to deal directly with these frames and buffer
them inside its ncbuf.
Rx has been simplified since the conversion of the buffer to a ncbuf. The
old buffer can now be removed. The frms tree is also removed. It was
previously used to store out-of-order received STREAM frames. Now the
MUX is able to buffer them directly into the ncbuf.
This commit is the equivalent for uni-streams of previous commit
MEDIUM: mux-quic/h3/hq-interop: use ncbuf for bidir streams
All unidirectional stream data is now handled in the MUX Rx ncbuf. The
obsolete buffer is now unused and will be removed in the following
patches.
Add a ncbuf for data reception on qcs. Thanks to this, the MUX is able
to buffer all received frames directly into the buffer. Flow control
parameters will be used to ensure there is never an overflow.
This change will simplify Rx path with the future deletion of acked
frames tree previously used for frames out of order.
Coverity reports that the data block generated by ncb_blk_first() has its
sz_data field uninitialized. This has no real impact as this field is
meaningless for data blocks. Set it to 0 to hide the warning.
This should fix github issue #1695.
It is absolutely not possible to use the same "bind" line to listen to
both quic and tcp for example, because no single transport layer would
fit both modes and we'll need the type to choose one then to choose a
mux. Let's make sure this does not happen. This may be relaxed in the
future if we manage to instantiate transport layers on the fly, but the
SSL vs quic part might be tricky to handle.
The error message when mixing stream and dgram protocols in an
address speaks about sockets while it ought to speak about addresses,
let's fix this as in some contexts it can be a bit confusing.
Fred & Amaury found that I messed up with qc_detach() in commit 4201ab791
("CLEANUP: muxes: make mux->attach/detach take a conn_stream endpoint"),
causing a segv in this case with endp->cs == NULL being passed to
__cs_mux(). It obviously ought to have been endp->target like in other
muxes.
No backport needed.
Valerio Pachera explained [1] that external checks would benefit from
having a variable indicating if SSL is being used or not on the server
being checked, and the discussion drifted to also indicating the protocol
in use.
This patch adds two environment variables for external checks:
- HAPROXY_SERVER_SSL: equals "0" when SSL is not used, "1" when it is
- HAPROXY_SERVER_PROTO: contains one of the following words to describe
the protocol used with this server:
- "cli": the haproxy CLI. Normally not seen
- "syslog": this is a syslog TCP server
- "peers": this is a peers TCP server
- "h1": this is an HTTP/1.x server
- "h2": this is an HTTP/2 server
- "tcp": this is any other TCP server
The patch is very simple, and may be backported to recent versions if
needed. This closes github issue #1692.
[1] https://www.mail-archive.com/haproxy@formilux.org/msg42233.html
The two functions became exact copies since there's no more special case
for the appctx owner. Let's merge them into a single one, that simplifies
the code.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurrences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
The former takes a conn_stream still attached to a valid appctx,
which also complicates the termination of the applet. Instead, let's
pass the appctx which already points to the endpoint, this allows us
to properly detach the conn_stream before the call, which is cleaner
and safer.
The mux ->detach() function currently takes a conn_stream. This causes
an awkward situation where the caller cs_detach_endp() has to partially
mark it as released but not completely so that ->detach() finds its
endpoint and context, and it cannot be done later since it's possible
that ->detach() deletes the endpoint. As such the endpoint link between
the conn_stream and the mux's stream is in a transient situation while
we'd like it to be clean so that the mux's ->detach() code can call any
regular function it wants that knows the regular semantics of the
relation between the CS and the endpoint.
A better approach consists in slightly modifying the detach() API to
better match the reality, which is that the endpoint is detached but
still alive and that it's the only part the function is interested in.
As such, this patch modifies the function to take an endpoint there,
and by analogy (or simplicity) does the same for ->attach(), even
though it looks less important there since we're always attaching an
endpoint to a conn_stream anyway. It is possible that in the future
the API could evolve to use more endpoints that provide a bit more
flexibility in the API, but at this point we don't need to go further.
The principle that each mux stream should have an endpoint is not
guaranteed for closed streams that map to the dummy static streams.
Let's have a dummy endpoint for use with such streams. It only has
the DETACHED flag and a NULL conn_stream, and is referenced by all
the closed streams so that we can afford not to test strm->endp when
trying to access the flags or the CS.
The principle that each mux stream should have an endpoint is not
guaranteed for closed streams that map to the dummy static streams.
Let's have a dummy endpoint for use with such streams. It only has
the DETACHED flag and a NULL conn_stream, and is referenced by all
the closed streams so that we can afford not to test h2s->endp when
trying to access the flags or the CS.
There is always an endpoint link in a stream, and this endpoint link
contains a pointer to the conn_stream it's attached to, so the one in
the h1 stream is always a duplicate now. Let's always use endp->cs
instead and get rid of it.
Muxes and applets need to have both a pointer to the endpoint and to the
conn_stream. It would seem more natural that they only have a pointer to
the endpoint (that is always there) and that this one has an optional
pointer to the conn_stream. This would reduce the number of elements to
manipulate in lower level code. In addition, the conn_stream is not much
used from the lower layers (wake and exceptional events mostly).
The few applets that set CS_EP_EOI or CS_EP_ERROR used to set it on the
endpoint retrieved from the conn_stream while it's already available on
the appctx itself. Better use the appctx one to limit the unneeded
interactions between the two sides.
At a few places the endpoint pointer was retrieved from the conn_stream
while it's safer and more long-term proof to take it from the qcs. Let's
just do that.
At a few places the endpoint pointer was retrieved from the conn_stream
while it's safer and more long-term proof to take it from the fstrm.
Let's just do that.
At a few places the endpoint pointer was retrieved from the conn_stream
while it's safer and more long-term proof to take it from the context.
Let's just do that.
At a few places the endpoint pointer was retrieved from the conn_stream
while it's safer and more long-term proof to take it from the h2s. Let's
just do that.
At a few places the endpoint pointer was retrieved from the conn_stream
while it's safer and more long-term proof to take it from the h1s. Let's
just do that.
Wherever we need to report an error, we have an even easier access to
the endpoint than the conn_stream. Let's first adjust the API to use
the endpoint and rename the function accordingly to cs_ep_set_error().
Since 2.5, for security reasons, HTTP/1.0 GET/HEAD/DELETE requests with a
payload are rejected (See e136bd12a "MEDIUM: mux-h1: Reject HTTP/1.0
GET/HEAD/DELETE requests with a payload" for details). However it may be an
issue for old clients.
To avoid any compatibility issue with such clients,
"h1-accept-payload-with-any-method" global option was added. It must only be
set if there is a good reason to do so because it may lead to a request
smuggling attack on some servers or intermediaries.
This patch should solve the issue #1691. It may be backported to 2.5.
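Usage sketch:
    global
        h1-accept-payload-with-any-method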
In wdt_handler(), do not try to trigger the watchdog if prev_cpu_time
wasn't initialized.
This prevents an unexpected trigger of the watchdog when it wasn't
initialized yet. This case could happen in the master just after loading
the configuration. It would show a trace where the <diff> value is equal
to the <now> value, and the <poll> value would be 0.
For example:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x0 act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=0 now=6005541706 diff=6005541706
curr_task=0
Thanks to Christian Ruppert for reporting the problem.
Could be backported to all stable versions.
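The guard can be sketched as follows (structure simplified, label name
taken as an assumption):
    /* CPU time not yet initialized: do not compute a bogus <diff> */
    if (!prev_cpu_time)
            goto update_and_leave;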
The Lua API Channel.remove() and HTTPMessage.remove() expect 1 to 3
arguments (counting the manipulated object), with offset and length
being the 2nd and 3rd argument, respectively.
hlua_{channel,http_msg}_del_data() incorrectly gets the 3rd argument as
offset, and 4th (nonexistent) as length. hlua_http_msg_del_data() also
improperly checks arguments. This patch fixes argument handling in both.
Must be backported to 2.5.
Implement a series of unit test to validate ncbuf. This is written with
a main function which can be compiled independently using the following
command-line :
$ gcc -DSTANDALONE -lasan -I./include -o ncbuf src/ncbuf.c
The first part is used to test ncb_add()/ncb_advance(). After each
call a loop is done on the buffer blocks which should ensure that the
gap infos are correct.
The second part generates random offsets and inserts them until the
buffer is full. The buffer is then reset and all random offsets are
re-inserted in the reverse order : the buffer should be full once again.
The generated binary takes arguments to change the tests execution.
"usage: ncbuf [-r] [-s bufsize] [-h bufhead] [-p <delay_msec>]"
A new function ncb_advance() is implemented. This is used to advance the
buffer head pointer. This will consume the front data while forming a
new gap at the end for future data.
On success NCB_RET_OK is returned. The operation can be rejected if the
new gap formed at the front of the buffer would be too small.
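Usage sketch, given the return values described above (prototype assumed):
    /* consume <adv> bytes from the front of <buf> */
    if (ncb_advance(&buf, adv) != NCB_RET_OK) {
            /* rejected: the new front gap could not store its header */
    }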
Define three different ways to perform insertion. This configures how
overlapping data is treated.
- NCB_ADD_PRESERVE : in this mode, old data are kept during insertion.
- NCB_ADD_OVERWRT : new data will overwrite old ones.
- NCB_ADD_COMPARE : this mode adds a new test in check stage. The
overlapping old and new data must be identical or else the insertion
is not conducted. An error NCB_RET_DATA_REJ is used in this case.
The mode is specified with a new argument to ncb_add() function.
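Usage sketch of the compare mode (argument order assumed, not verified):
    enum ncb_ret ret = ncb_add(&buf, off, data, len, NCB_ADD_COMPARE);
    if (ret == NCB_RET_DATA_REJ) {
            /* overlapping bytes differ from what was already stored */
    }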
Implement a new function ncb_add() to insert data in ncbuf. This
operation is conducted in two stages. First, a simulation is run to
ensure that the insertion can proceed. If a gap is formed, either
before or after the new data, it must be big enough to store its header,
or else the insertion is aborted.
After this check stage, the insertion is conducted block by block with
the function pair ncb_fill_data_blk()/ncb_fill_gap_blk().
A new type ncb_ret is used as a return value. For the moment, only
success or gap-size error is used. It is planned to add new error types
in the future when insertion will be extended.
Relax the constraint for gap storage when this is the last block.
ncb_blk API functions will consider that if a gap is stored near the end
of the buffer, without the space to store its header, the gap covers
the buffer end entirely.
For these special cases, the gap size/data size are not written/read
inside the gap to prevent an overflow. Such a gap is designated in
functions as a "reduced gap" and will be flagged with the value
NCB_BK_F_FIN.
This should reduce rejections on future add operations when receiving
data in-order. Without reduced gap handling, an insertion would be
rejected if it only partially covers the last buffer bytes, which can be
a very common case.
Implement two new functions to report the total data stored across the
whole buffer and the data stored at a specific offset until the next gap
or the buffer end.
To facilitate implementation of these new functions and also future
add/delete operations, a new abstraction is introduced : ncb_blk. This
structure represents a block of either data or gap in the buffer. It
simplifies operation when moving forward in the buffer. The first buffer
block can be retrieved via ncb_blk_first(buf). The block at a specific
offset is accessed via ncb_blk_find(buf, off).
This abstraction is purely used in functions but not stored in the ncbuf
structure per se. This is necessary to keep the memory footprint
minimal.
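A hedged usage sketch of the two reporting functions (function names are
assumed from the description above):
    ncb_sz_t total = ncb_total_data(&buf); /* data across the whole buffer */
    ncb_sz_t here  = ncb_data(&buf, off);  /* data at <off> until next gap */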
Define the new type ncbuf. It can be used as a buffer with
non-contiguous data and wrapping support.
To reduce the memory footprint as much as possible, the sizes of data and
gaps are stored in the gaps themselves. This puts some limitations on
buffer usage. A reserved space is present just before the head to store
the size of the first data block. Also, add and delete operations will
be constrained to ensure minimal gap sizes are preserved.
The sizes stored in the gaps are represented by a custom type named
ncb_sz_t. This type is a typedef so it can easily be changed : this has a
direct impact on the maximum buffer size (MAX(ncb_sz_t) - sizeof(ncb_sz_t))
and the minimal gap sizes (sizeof(ncb_sz_t) * 2).
Currently, it is set to uint32_t.
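A sketch consistent with this description (the real definition may differ
slightly):
    typedef uint32_t ncb_sz_t;   /* currently 32-bit sizes */
    struct ncbuf {
            char     *area;      /* storage, preceded by a small reserve */
            ncb_sz_t  size;      /* usable size */
            ncb_sz_t  head;      /* offset of the oldest data */
    };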
Add send_stateless_reset() to send a stateless reset packet. It builds
a 1-RTT packet with quic_stateless_reset_token_cpy() to copy a stateless
reset token derived from the cluster secret with the destination
connection ID received as salt.
Also add the QUIC_EV_STATELESS_RST new trace event to at least have a
trace of the connections which are reset.
A server may send the stateless reset token associated with the current
connection in its transport parameters. So, let's copy it from
qc_lstnr_params_init().
The stateless reset token of a connection is generated from qc_new_conn() when
allocating the first connection ID. A QUIC server can copy it into its transport
parameters to allow the peer to reset the associated connection.
The latter is not easily reachable after having returned from qc_new_conn().
We want to be able to initialize the transport parameters from this function
which has access to all the information to do so.
Extract the code used to initialize the transport parameters from qc_lstnr_pkt_rcv()
and make it callable from qc_new_conn(). qc_lstnr_params_init() is implemented
to accomplish this task for a haproxy listener.
Modify qc_new_conn() to reduce its number of parameters.
The source address coming from Initial packets is also copied from qc_new_conn().
Add quic_stateless_reset_token_init() wrapper function around
quic_hkdf_extract_and_expand() function to derive the stateless reset tokens
attached to the connection IDs from "cluster-secret" configuration setting
and call it each time we instantiate a QUIC connection ID.
This function will have to call another one from quic_tls.[ch] soon.
As we do not want to include quic_tls.h from xprt_quic.h because
quic_tls.h already includes xprt_quic.h, let's move it into
xprt_quic.c.
This is a wrapper function around OpenSSL HKDF API functions to
use the "extract-then-expand" HKDF mode as defined by rfc5869.
This function will be used to derive stateless reset tokens
from secrets ("cluster-secret" conf. keyword) and CIDs (as salts).
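For reference, a minimal extract-then-expand derivation with the OpenSSL
EVP_PKEY API looks like this (error handling omitted; this is a sketch,
not the actual haproxy code):
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    EVP_PKEY_derive_init(ctx);
    EVP_PKEY_CTX_set_hkdf_md(ctx, EVP_sha256());
    EVP_PKEY_CTX_set1_hkdf_salt(ctx, cid, cidlen);   /* the CID as salt */
    EVP_PKEY_CTX_set1_hkdf_key(ctx, sec, seclen);    /* cluster secret */
    EVP_PKEY_CTX_add1_hkdf_info(ctx, label, lablen);
    EVP_PKEY_derive(ctx, token, &tokenlen);          /* extract+expand */
    EVP_PKEY_CTX_free(ctx);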
It could be useful to set an ASCII secret which could be used for different
purposes. For instance, it will be used to derive QUIC stateless reset tokens.
A new ->time_received member is added to quic_rx_packet to store the time
the packet is received. ->largest_time_received is added to the packet
number space structure to store this timestamp for the packet with a new
largest packet number to be acknowledged. A new flag,
QUIC_FL_PKTNS_NEW_LARGEST_PN, is added to mark a packet number space as
having to acknowledge a packet with a new largest packet number. In this
case, the packet number space ack delay must be recalculated.
Add the quic_compute_ack_delay_us() function to compute the ack delay from
the time a packet was received. It is used only when a packet with a new
largest packet number is received.
As we do not have any task to be woken up by the poller after a sendto()
error, we add a sendto() error counter to the quic_conn struct.
Dump its value from qc_send_ppkts().
There are two reasons we can reject the creation of an h2 stream on the
frontend:
- its creation would violate the MAX_CONCURRENT_STREAMS setting
- there's no more memory available
And on the backend it's almost the same except that the setting might
have been negotiated after trying to set up the stream.
Let's add traces for such a situation so that it's possible to know why
the stream was rejected (currently we only know that it was rejected).
It would be nice to backport this to the most recent versions.
When a client doesn't respect the h2 MAX_CONCURRENT_STREAMS setting, we
rightfully send RST_STREAM to it so that the client closes. But the
max_id is only updated on the successful path of h2c_handle_stream_new(),
which may be reentered for partial frames or CONTINUATION frames, and as
a result we don't increment it if an extraneous stream ID is rejected.
Normally it doesn't have any consequence. But on a POST it can have some
if the DATA frame immediately follows the faulty HEADERS frame: with
max_id not incremented, the stream remains in IDLE state, and the DATA
frame now lands in an invalid state from a protocol's perspective, which
must lead to a connection error instead of a stream error.
This can be tested by modifying the code to send an arbitrarily large
MAX_CONCURRENT_STREAM setting and using h2load to send more concurrent
streams than configured: with a GET, only a tiny fraction of them will
report an error (e.g. 101 streams for 100 accepted will result in ~1%
failure), but when sending data, most of the streams will be reported
as failed because the connection will be closed. By updating the max_id
earlier, the stream is now considered as closed when the DATA frame
arrives and it's silently discarded.
This must be backported to all versions but only if the code is exactly
the same. Under no circumstance may this ID be updated for a partial frame
(i.e. only update it before or just after calling h2c_frt_stream_new()).
This patch adds a lock on the struct dgram_conn to ensure
that another thread cannot trash an fd or alter its status
while the current thread is processing it for send/receive/connect
operations.
Starting with the 2.4 version this could cause a crash when a DNS
request is failing, setting the FD of the dgram structure to -1. If the
dgram structure is reused after that, a read access to fdtab[-1] is
attempted. The crash was only triggered when compiled with ASAN.
In previous versions the concurrency issue also exists but is less
likely to crash.
This patch must be backported until v2.4 and should be
adapted for v < 2.4.
... or how a bogus warning forces you to make tricky changes in your code
and fail on a length test condition! Fortunately the change went in a
direction that immediately broke, due to a missing "> sizeof(path)" that
had to be added to the already ugly condition.
This fixes recent commit 393e42ae5 ("BUILD: ssl: work around bogus warning
in gcc 12's -Wformat-truncation"). It may have to be backported if that
one is backported.
When building without threads, gcc 12 says that there's a null-deref in
_HA_ATOMIC_INC() called from listener_accept(). It's just that the code
was originally written in an attempt not to always have a proxy for a
listener and that there are two places where the pointer is tested before
being used, so the compiler concludes that the pointer might be null
hence that other places are null-derefs.
In practice the pointer cannot be null there (and never has been), but
since that code was initially built that way and it's only a matter of
adding a pair of braces to shut it up, let's respect that initial
attempt in case one day we need it.
This one was also reported by Ilya in issue #1513, though with threads
enabled in his case.
This may have to be backported if users complain about new breakage with
gcc-12.
As was first reported by Ilya in issue #1513, Gcc 12 incorrectly reports
a possible overflow from the concatenation of two strings whose size was
previously checked to fit:
src/ssl_crtlist.c: In function 'crtlist_parse_file':
src/ssl_crtlist.c:545:58: error: '%s' directive output may be truncated writing up to 4095 bytes into a region of size between 1 and 4096 [-Werror=format-truncation=]
545 | snprintf(path, sizeof(path), "%s/%s", global_ssl.crt_base, crt_path);
| ^~
src/ssl_crtlist.c:545:25: note: 'snprintf' output between 2 and 8192 bytes into a destination of size 4097
545 | snprintf(path, sizeof(path), "%s/%s", global_ssl.crt_base, crt_path);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It would be a bit concerning to disable -Wformat-truncation because it
might detect real programming mistakes at other places. The solution
adopted in this patch is absolutely ugly and error-prone, but it works,
it consists in integrating the snprintf() call in the error condition
and to test the result again. Let's hope a smarter compiler will not
warn that this test is absurd since guaranteed by the first condition...
This may have to be backported for those suffering from a compiler upgrade.
The CRL file CLI update code was strongly based off the CA one and some
copy-paste issues were then introduced.
This patch fixes GitHub issue #1685.
It should be backported to 2.5.
The STAT_ST_* values have been abused by virtually every applet and CLI
keyword handler, and this must not continue as it's a source of bugs and
of overly complicated code.
This patch renames the states to STAT_STATE_*, and keeps the previous
enum while marking each entry as deprecated. This should be sufficient to
catch out-of-tree code that might rely on them and to let them know what
to do with that.
Now that the CLI's print context is alone in the appctx, it's possible
to refine the appctx's ctx layout so that the cli part matches exactly
a regular svcctx, and as such move the CLI context into an svcctx like
other applets. External code will still build and work because the
struct cli perfectly maps onto the struct cli_print_ctx that's located
into svc.storage. This is of course only to make a smooth transition
during 2.6 and will disappear immediately after.
A tiny change had to be applied to the opentracing addon which performs
direct accesses to the CLI's err pointer in its own print function. The
rest uses the standard cli_print_* which were the only ones that needed
a small change.
The whole "ctx.cli" struct could be tagged as deprecated so that any
possibly existing external code that relies on it will get build
warnings, and the comments in the struct are pretty clear about the
way to fix it, and the lack of future of this old API.
Just like for the TCP service, let's move the context away from
appctx.ctx. A new struct hlua_http_ctx was defined, reserved in
hlua_applet_http_init() and used everywhere else. Similarly, the
task dump code will no longer report decoded stack traces when
these services are involved. That may be solved later.
The use-service mechanism for Lua in TCP mode relies on the
hlua_tcp storage in appctx->ctx. We can move its definition to
hlua.c and simply use appctx_reserve_svcctx() to reserve and access
the storage. One tiny side effect is that the task dump used in panics
will no longer show the Lua call stack in its trace. For this a
better API is needed from the Lua code to expose a function that does
the job from an appctx.
The Lua cosockets were using appctx.ctx.hlua_cosocket. Let's move this
to a local definition of "hlua_csk_ctx" in hlua.c, which is allocated
from the appctx by hlua_socket_new(). There's a notable change which is
that, while previously the xref link with the peer was established with
the appctx, it's now in the hlua_csk_ctx. This one must then hold a
pointer to the appctx. The code was adjusted accordingly, and now that
part of the code doesn't use the appctx.ctx anymore.
The context was moved to a local definition in the cache code, and
there's nothing specific to the cache anymore in the appctx. The
struct is stored into the appctx's storage area via the svcctx.
This one has been misused for a while as well, it's time to deprecate it
since we don't use it anymore. It will be removed in 2.7 and for now is
only marked as deprecated. Since we need to guarantee that it's zeroed
before starting any applet or CLI command, it was moved into an anonymous
union where its sibling is not marked as deprecated so that we can
continue to initialize it without triggering a warning.
If you found this commit after a bisect session you initiated to figure
why you got some build warnings and don't know what to do, have a look
at the code that deals with the "show fd", "show sess" or "show servers"
commands, as it's supposed to be self-explanatory about the tiny changes
to apply to your code to port it. If you find APPLET_MAX_SVCCTX to be
too small for your use case, either kindly ask for a tiny extension
(and try to get your code merged), or just use a pool.
The httpclient already uses its own pointer and only used to store this
single pointer into the appctx.ctx field. Let's just move it to the
svcctx and remove this entry from the appctx union.
The httpclient's CLI uses ctx.cli.i0 for its flags and .p0 for the client
instance. Let's have a locally defined structure for this so that we don't
need the generic cli variables anymore.
Let's create a show_sock_ctx to store the bind_conf and the listener.
The entry is reserved when entering the I/O handler since there's no
parser here. That's fine because the function doesn't touch the area.
The code was a bit convoluted due to the use of a state machine around
st2 of which only the STAT_ST_LIST state was used, and to the test of
global.cli_fe inside the loop while it ought to be tested before
entering there. Let's get rid of this unneeded state and
simplify the code. There's no more need for ->st2 now. The code looks
more changed than it really is due to the reindent caused by the
removal of the switch statement, but "git show -b" shows what really
changed.
There is the variable to start from (or environ) and an option to stop
after dumping the first one, just like "show fd". Let's have a small
locally-defined context with these two fields.
The "show fd" command used to rely on cli.i0 for the fd, and st2 just
to decide whether to stop after the first value or not. It would have
been possible to use just a negative integer to dump a single
value, but it's as easy and more durable to declare a two-field struct
show_fd_ctx for this.
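A sketch of such a two-field struct (field names are a best guess from the
description above):
    struct show_fd_ctx {
        int fd;        // first FD to dump
        int show_one;  // stop after dumping the first one
    };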
The command uses a pointer to the current proxy being dumped, one to the
current server being dumped, an optional ID of the only proxy to dump
(which is in fact used as a boolean), and a flag indicating if we're
doing a "show servers conn" or a "show servers state". Let's move all
this to a struct show_srv_ctx.
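A possible layout, inferred from the description (names and exact types
are assumptions):
    struct show_srv_ctx {
        struct proxy *px;   // current proxy being dumped
        struct server *sv;  // current server being dumped
        int only_pxid;      // only dump this proxy (used as a boolean)
        int show_conn;      // false for "state", true for "conn"
    };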
The command uses a pointer to a cache instance and the next key to dump,
they were in cli.p0/i0 respectively, let's move them to a struct
show_cache_ctx.
The command uses this state but _INIT immediately turns to _LIST, which
turns to _FIN at the end without doing anything in that state, thus the
only existing state is _LIST so we don't need to store a state. Let's
just get rid of it.
The command was using cli.p0/p1/p2 to select which section to dump, the
current section and the current ns. Let's instead have a locally defined
"show_resolvers_ctx" struct for this.
The ring code was using ctx.cli.i0/p0/o0 to store its context during CLI
dumps via "show events" or "show errors". Let's use a locally defined
context and drop that.
The ring watch flags (wait, seek end) were dangerously passed via ctx.cli.i0
from "show buf" in sink.c:cli_parse_show_events(), or implicitly reset in
"show errors". That's very unconvenient, difficult to follow, and prone to
short-term breakage.
Let's pass an extra argument to ring_attach_cli() to take these flags, now
defined in ring-t.h as RING_WF_*, and let the function set them itself
where appropriate (still ctx.cli.i0 for now).
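The resulting prototype likely looks like this (the exact argument types
are assumptions):
    // <flags> carries the RING_WF_* watch flags (wait, seek end)
    int ring_attach_cli(struct ring *ring, struct appctx *appctx, uint flags);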
The command only requires storing an int, but it will be useful later
to have a struct to pass extra info such as an "all" flag to dump all
FDs. The new context is now a struct dev_fd_ctx stored in svcctx.
Instead of having a struct that contains a single pointer in the appctx
context, let's directly use the generic context pointer and get rid of
the now unused sft.ptr entry.
Several steps are used during the addition of a crtlist in order to yield
during long operations, and states are used for this. Let's just not use the
st2 anymore and place the state inside the add_crtlist_ctx struct instead.
These commands were using cli.i0/p0/p1 and in a not very clean way since
they use the same parser but with different types depending on the I/O
handler. Given there was no explanation about what the variables were
supposed to be, they were named based on best guess and placed into a
new "show_crtlist_ctx" structure.
These two commands use distinct parse/release functions but a common
I/O handler, thus they need to keep the same context. It was created
under the name "commit_cacrlfile_ctx" and holds a large part of the
pointers (6) and the ca_type field that helps distinguish between
the two commands for the I/O handler. It looks like some of these
fields could have been merged since apparently the CA part only
uses *cafile* and the CRL part *crlfile*, while both old and new
are of type cafile_entry and set only for each type. This could
probably even simplify some parts of the code that tries to use
the correct field.
These fields were the last ones to be migrated thus the appctx's
ssl context could finally be removed.
Just like for "set ssl cafile", the command doesn't really need this
context which doesn't outlive the parsing function but it was there
for a purpose so it's maintained. Only 3 fields were used from the
appctx's ssl context: old_crlfile_entry, new_crlfile_entry, and path.
These ones were reinstantiated into a new "set_crlfile_ctx" struct.
It could have been merged with the one used in "set cafile" if the
fields had been renamed since cafile and crlfile are of the same
type (probably one of them ought to be renamed?).
None of these fields could be dropped as they are still shared with
other commands.
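A sketch of the struct as described (the type of <path> is an assumption;
both entries are of type cafile_entry as noted above):
    struct set_crlfile_ctx {
        struct cafile_entry *old_crlfile_entry;
        struct cafile_entry *new_crlfile_entry;
        char *path;
    };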
Just like for "set ssl cert", the command doesn't really need this
context which doesn't outlive the parsing function but it was there
for a purpose so it's maintained. Only 3 fields were used from the
appctx's ssl context: old_cafile_entry, new_cafile_entry, and path.
These ones were reinstantiated into a new "set_cafile_ctx" struct.
None of them could be dropped as they are still shared with other
commands.
The command doesn't really need any storage since there's only a parser,
but since it used this context, there might have been plans for extension,
so better continue with a persistent one. Only old_ckchs, new_ckchs, and
path were being used from the appctx's ssl context. These ones moved to
the local definition, and the two former ones were removed from the appctx
since not used anymore.
This command only really uses old_ckchs, new_ckchs and next_ckchi
from the appctx's ssl context. The new structure "commit_cert_ctx"
only has these 3 fields, though none could be removed from the shared
ssl context since they're still used by other commands.
This command only really uses old_ckchs, cur_ckchs and the index
in which the transaction was stored. The new structure "show_cert_ctx"
only has these 3 fields, and the now unused "cur_ckchs" and "index"
could be removed from the shared ssl context.
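A sketch of the 3-field struct (the ckch_store type of the two pointers
is an assumption):
    struct show_cert_ctx {
        struct ckch_store *old_ckchs;  // ongoing transaction, if any
        struct ckch_store *cur_ckchs;  // restart point for the dump
        int transaction;               // index the transaction was stored at
    };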
Now this command doesn't share any context anymore with "show cafile"
nor with the other commands. The previous "cur_cafile_entry" field from
the applet's ssl context was removed as not used anymore. Everything was
moved to show_crlfile_ctx which only has 3 fields.
Saying that the layout and usage of the various variables in the ssl
applet context is a mess would be an understatement. It's very hard
to know what command uses what fields, even after having moved away
from the mix of cli and ssl.
Let's extract the parts used by "show cafile" into their own structure.
Only the "show_all" field would be removed from the ssl ctx, the other
fields are still shared with other commands.
This context is used by CLI keywords registered by Lua. We can take
it out of the appctx and use the generic command context allocation so
that the appctx doesn't have to declare a specific one anymore. The
context is created during parsing.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing (both in the CLI and HTTP).
The change looks large but it's particularly mechanical. The context
initialization appears in stats.c and http_ana.c. The context is used
in stats.c and resolvers.c since "show stat resolvers" points there.
That's the reason why the definition moved to stats.h. "show info"
and "show stat" continue to share the same state definition for now.
Nothing else was modified.
Historically the CLI was a second access to the stats and we've continued
to initialize only the stats part when initializing the CLI. Let's make
sure we do that on the whole ctx instead. It's probably not needed at
all anymore, but better to stay on the safe side.
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
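For example (state names are hypothetical, only the 4-state shape and the
END/FIN merge are implied above):
    enum {
        STATE_INIT = 0,  // preparation
        STATE_HEAD,      // dump the header
        STATE_LIST,      // dump the entries
        STATE_DONE,      // END and FIN, which were aliases, merged here
    };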
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2, which deserves to be
addressed in a separate commit.
There's no point checking the state before deciding to detach the backref
on "show map", it should always be done if the list is not empty. Note
that being empty guarantees that it's not linked into the list, and
conversely not being empty guarantees that it's in the list, hence the
test doesn't need to be performed under the lock.
The "show map"/"clear map" code used to rely on the cli's i0/i1 fields
to store the generation numbers to work with. That's particularly dirty
because it's done at places where ctx.map is also manipulated while they
are part of the same union, and the reason why this didn't cause trouble
is because cli.i0/i1 are at offset 216/224 while the map parts end at
204, so luckily there was no overlap.
Let's add these fields to the map context.
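A sketch of the two added fields (names are a best guess):
    struct show_map_ctx {
        // ... existing map dump fields ...
        uint curr_gen;  // generation being dumped
        uint prev_gen;  // previous generation (e.g. what "clear" removes)
    };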
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. Many commands, including pure parsers, use this
context but that's not a problem as it's designed to be used this way.
Due to this, many lines are changed but that's in fact a replacement of
"appctx->ctx.map" with "ctx->". Note that the code also uses st2 which
deserves being addressed in separate commit.
This state is pointless now, it just serves to initialize the initial
table pointer while this can be done more easily in the parser, so let's
do that and drop that state.
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2, which deserves to be
addressed in a separate commit.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
The code still has room for improvement, such as in the "flags" field
where bits are hard-coded, but they weren't modified.
The state was a constant, let's remove what remains of the switch/case.
The code from the "case" statement was only reindented as can be checked
with "git show -b".
This state is only an alias for "thr >= global.nbthread" which is the
sole condition that indicates the end and reaches it. Let's just drop
this state. There's only the STATE_LIST that's left.
This state was only used to preset the list element. Now that we can
guarantee that the context can be properly preset during the parsing
we don't need this state anymore. The first pointer has to be set to
point to the first stream during the initial call which is detected
by the pointer not yet being set (null). Thanks to this we can also
remove one state check on the abort path.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
Till now, appctx_new() used to allocate an entry from the pool and
pass it through appctx_init() to initialize a debatable part of it that
no longer corresponded to the comments, and fill other fields. It's
hard to say what is fully initialized and what is not.
Let's get rid of that, always zero the whole structure (appctx are
not that big anyway, even with the cache there's no difference in
performance), and initialize what remains. This is cleaner and more
resistant to new field additions.
The appctx_init() function was removed.
Instead of using existing fields and having to put keyword-specific
contexts in the applet definition, let's have the appctx provide a
generic storage area that's currently large enough for existing CLI
commands and small applets, and a function to allocate that storage.
The function will be responsible for verifying that the requested size
fits in the area so that the caller doesn't need to add specific checks;
it is validated during development as this size is static and will
not change at runtime. In addition the caller doesn't even need to
free() the area since it's part of an existing context. For the
caller's convenience, a context pointer "svcctx" for the command is
also provided so that the allocated area can be placed there (or
possibly any other one in case a larger area is needed).
The struct's layout has been temporarily complicated by adding one
level of anonymous union on top of the "ctx" one. This will allow us
to preserve "ctx" during 2.6 for compatibility with possible external
code and get rid of it in 2.7. This explains why the diff extends to
the whole "ctx" union, but a "git show -b" shows that only one extra
layer was added. In order to make both the svcctx pointer and its
storage accessible without further enlarging the appctx structure,
both svcctx and the storage share the same storage as the ctx part.
This is done by having them placed in the union with a protected
overlapping area for svcctx, for which a shadow member is also
present in the storage area:
    union {
        void *svcctx;              // variable accessed by services
        struct {
            void *shadow;          // shadow of svcctx
            char storage[];        // where most services store their data
        };
        union {                    // older commands store here and ignore svcctx
            ...
        } ctx;
    };
I.e. new applications will use appctx->svcctx while older ones will be
able to continue to use appctx->ctx.*
The whole area (including the svcctx pointer itself) is zeroed before any
applet is initialized and before a CLI keyword processor's first invocation,
as existing keyword processors rely on this guarantee; this effectively
makes CLI keywords behave like applets.
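A minimal usage sketch, with a hypothetical context type (the call form of
appctx_reserve_svcctx() is inferred from its description earlier):
    struct my_svc_ctx {
        int state;       // command-private state
        void *restart;   // restart point when yielding
    };

    struct my_svc_ctx *ctx = appctx_reserve_svcctx(appctx, sizeof(*ctx));
    // the area was already zeroed before the first invocation, and it is
    // never freed explicitly since it lives inside the appctx itself
    ctx->state = 0;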
The io_handler in "add ssl crt_list" is built around a "while" loop that
only makes forward progress and that doesn't handle its final state as
it's not supposed to be called again once reached. This makes the code
confusing because its construct implies an infinite loop for such a
state (or any other unhandled one). Let's just remove that unneeded loop.
The "show ssl cert" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. Amazingly, despite the use of both p0 and i0 to store
respectively a pointer to the current ckchs and a transaction id, there
was no overlap with the other pointers used during these operations,
but should these fields be reordered or slightly updated this will break.
Comments were added above the faulty functions to indicate which fields
they are using.
This needs to be backported to 2.5.
The "show ssl crl-file" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. It's fortunate that the p1 pointer in use is located
before the first one used (it overlaps with old_cafile_entry). But
should these fields be reordered or slightly updated this will break.
This needs to be backported to 2.5.
The "show ssl ca-file <name>" command mixes some generic pointers from
the "ctx.cli" struct and context-specific ones from "ctx.ssl" while both
are in a union. The i0 integer used to store the current ca_index overlaps
with new_crlfile_entry which is thus harmless for now but is at the mercy
of any reordering or addition of these fields. Let's add dedicated fields
into the ssl structure for this.
Comments were added on top of the affected functions to indicate what they
use.
This needs to be backported to 2.5.
The "show ca-file" and "show crl-file" commands mix some generic pointers
from the "ctx.cli" struct and context-specific ones from "ctx.ssl" while
both are in a union. It's fortunate that the p0 pointer in use is located
immediately before the first one used (it overlaps with next_ckchi_link,
and old_cafile_entry is safe). But should these fields be reordered or
slightly updated this will break.
Comments were added on top of the affected functions to indicate what they
use.
This needs to be backported to 2.5.
When "show map" initializes itself, it first takes the reference to the
starting point under a lock, then releases it before switching to state
STATE_LIST, and takes the lock again. The problem is that it is possible
for another thread to remove the first element during this unlock/lock
sequence, and make the list run anywhere. This is of course extremely
unlikely but not impossible.
Let's initialize the pointer in the STATE_LIST part under the same lock,
which is simpler and more reliable.
This should be backported to all versions.
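In pseudo-form, the fix looks like this (lock and list primitives are shown
for illustration, the exact names are assumptions):
    HA_SPIN_LOCK(PATREF_LOCK, &ref->lock);
    if (!ctx->elt)  // first call: pick the starting element under the lock
        ctx->elt = LIST_NEXT(&ref->head, struct pat_ref_elt *, list);
    // ... walk and dump entries, possibly yielding ...
    HA_SPIN_UNLOCK(PATREF_LOCK, &ref->lock);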
In case of write error in "show map", the backref is detached but
the list wasn't locked when this is done. The risk is very low but
it may happen that two concurrent "show map" commands, one of which
fails, or one "show map" failing while the same entry is being updated,
could cause a crash.
This should be backported to all stable versions.
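A hedged sketch of the fix, relying on the invariant noted above for
"show map" (an empty backref list means it's not linked; primitive names
are assumptions):
    if (!LIST_ISEMPTY(&ctx->bref.users)) {
        HA_SPIN_LOCK(PATREF_LOCK, &ref->lock);
        LIST_DELETE(&ctx->bref.users);
        LIST_INIT(&ctx->bref.users);
        HA_SPIN_UNLOCK(PATREF_LOCK, &ref->lock);
    }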
Commit e7f74623e ("MINOR: stats: don't output internal proxies (PR_CAP_INT)")
in 2.5 ensured we don't dump internal proxies on the stats page, but the
same is needed for "show backend", as since the addition of the HTTP client
it now appears there:
$ socat /tmp/sock1 - <<< "show backend"
# name
<HTTPCLIENT>
This needs to be backported to 2.5.
This command was introduced in 1.8 with commit eceddf722 ("MEDIUM: cli:
'show cli sockets' list the CLI sockets") but its yielding doesn't work.
Each time it enters, it restarts from the last bind_conf but enumerates
all listening sockets again, thus it loops forever. The risk that it
happens in the field is low but it easily triggers on port ranges after
400-500 sockets depending on the length of their addresses:
global
stats socket /tmp/sock1 level admin
stats socket 192.168.8.176:30000-31000 level operator
$ socat /tmp/sock1 - <<< "show cli sockets"
(...)
ipv4@192.168.8.176:30426 operator all
ipv4@192.168.8.176:30427 operator all
ipv4@192.168.8.176:30428 operator all
ipv4@192.168.8.176:30000 operator all
ipv4@192.168.8.176:30001 operator all
ipv4@192.168.8.176:30002 operator all
^C
This patch adds the minimally needed restart point for the listener so
that it can easily be backported. Some more cleanup is needed though.
The "show resolvers" command is bogus, it tries to implement a yielding
mechanism except that if it yields it restarts from the beginning, until
it manages to fill the buffer with only line breaks, and faces error -2
that lets it reach the final state and exit.
The risk is low since it requires about 50 name servers to reach that
state, but it's not impossible, especially when using multiple sections.
In addition, the extraneous line breaks, if sent over an interactive
connection, will desynchronize the commands and make the client believe
the end was reached after the first nameserver. This cannot be fixed
separately because that would turn this bug into an infinite loop since
it's the line feed that manages to fill the buffer and stop it.
The fix consists in saving the current resolvers section into ctx.cli.p1
and the current nameserver into ctx.cli.p2.
This should be backported, but that code moved a lot since it was
introduced and has always been bogus. It looks like it has mostly
stabilized in 2.4 with commit c943799c86 so the fix might be backportable
to 2.4 without too much effort.
Try to create a "default" resolvers section at startup, without displaying
any error or warning. This section is initialized using the system's
/etc/resolv.conf.
This is opportunistic and with no guarantee that it will work (but it should
on most systems).
This is useful for the httpclient as it allows the DNS resolver to be
used without any configuration in most cases.
The function is called from the httpclient_pre_check() function to
ensure that we tried to create the section before trying to initiate the
httpclient. But it is also called from the resolvers.c to ensure the
section is created when the httpclient init was disabled.
Release the expression used by the set-{src,dst}[-port] actions so we
keep valgrind happy upon exit or during a "haproxy -c" run.
Could be backported to every supported version.