By moving the config-time stuff after the updt_lock, we can plug some
holes without interfering with it. This allows us to get back to the
768-bytes struct. The performance was not affected at all.
The read-lock contention observed on the update lock while turning it
into an upgradable lock were due to false sharing with the nearby
updates. Simply moving the lock alone into its own cache line is
sufficient to almost double the performance again, raising from 2355
to 4480k RPS with very low contention:
Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 743422995452 lost
Overhead Shared Object Symbol
15.88% haproxy [.] stktable_lookup_key
5.94% haproxy [.] ebmb_lookup
5.69% haproxy [.] http_wait_for_request
3.66% haproxy [.] stktable_touch_with_exp
2.62% [kernel] [k] _raw_spin_unlock_irqrestore
1.86% haproxy [.] http_action_return
1.79% haproxy [.] stream_process_counters
1.78% [kernel] [k] skb_release_data
1.77% haproxy [.] process_stream
Unfortunately, trying to move the line anywhere else didn't work,
despite the remaining holes, because this structure is not quite
clean. This adds 64 bytes to a struct that was already 768 long,
so it's now 832. It's possible to repack it a little bit and regain
these bytes by removing the THREAD_ALIGN before "keys" because we
rarely use the config stuff, but that's a bit unsafe.
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. It required to add an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.
With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now has to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
The structure currently mixes R/O and R/W fields, let's organize them
by access type, focusing mainly on splitting the updates from the rest
so that peers activity does not affect the rest. For now it doesn't
bring any benefit but it paves the way for splitting the lock.
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.
This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
- the ref_cnt is incremented before write-unlocking on creation otherwise
the element could vanish before we can do it
- the ref_cnt is decremented after write-locking on release
- for all other cases it's updated out of locks since it's guaranteed by
the sequence that it cannot vanish
- checks are done before locking every time it's used to decide
whether we're going to release the element (saves several write locks)
- expiration tests are just done using atomic loads, since there's no
particular ordering constraint there, we just want consistent values.
For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.
For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.
On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
This updates the local copy of the plock library to benefit from finer
memory ordering, EBO on more operations such as when take_w() and stow()
wait for readers to leave and refined EBO, especially on common operation
such as attempts to upgade R to S, and avoids a counter-productive prior
read in rtos() and take_r().
These changes have shown a 5% increase on regular operations on ARM,
a 33% performance increase on ARM on stick-tables and 2% on x86, and
a 14% and 4% improvements on peers updates respectively on ARM and x86.
The availability of relaxed operations will probably be useful for stats
counters which are still extremely expensive to update.
The following plock commits were included in this update:
9db830b plock: support inlining exponential backoff code
008d3c2 plock: make the rtos upgrade faster
2f76dde atomic: clean up the generic xchg()
3c6919b atomic: make sure that the no-return macros do not return a value
97c2bb7 atomic: make the fallback bts use the pointed type for the shift
f4c1880 atomic: also implement the missing pl_btr()
8329b82 atomic: guard all generic definitions to make it easier to provide specific ones
7c5cb62 atomic: use C11 atomics when available
96afaf9 atomic: prefer the C11 definitions in general
f3ec7a6 atomic: implement load/store/atomic barriers
8bdbd1e atomic: add atomic load/stores
0f604c0 atomic: add more _noret operations
3fe35db atomic: remove the (void) cast from the C11 operations
3b08a7c atomic: allow to define the fallback _noret variants
28deb22 atomic: make x86 arithmetic operations the _noret variants
8061fe2 atomic: handle modern compilers that support returning flags
b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>)
59817ca atomic: add memory order variants for most operations
a40774f plock: explicitly make use of the pl_*_noret operations
6f1861b plock: switch to pl_sub_noret_lax() for cancellation
c013980 plock: use pl_ldadd{_lax,_acq,} instead of pl_xadd()
382eea3 plock: use a release ordering when dropping the lock
60d750d plock: use EBO when waiting for readers to leave in take_w() and stow()
fc01c4f plock: improve EBO a little bit
1ef6390 plock: switch to CAS + XADD for pl_take_r()
->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept
by the remaining QUIC connection object (struct quic_cc_conn) after having
release the previous one (struct quic_conn) to allow "show fd/sess" commands
to be functional without causing haproxy crashes.
No need to backport.
We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal
will be to add deactivable checks that are still desirable to have by
default. For now no test is subject to it.
As its name implies, this function checks if a path component has any
forbidden headers starting at the designated location. The goal is to
seek from the result of a successful ist_find_range() for more precise
chars. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure
we're not too strict at this point.
This looks up the character range <min>..<max> in the input string and
returns a pointer to the first one found. It's essentially the equivalent
of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals
with a range.
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
When the connection enters the "connection closing" state after having sent
a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory
may be freed from quic_conn objects (QUIC connection). This is done allocating
a reduced sized object which keeps enough information to handle the remaining
incoming packets for the connection in "connection closing" state, and to
continue to send again the previous datagram with CONNECTION_CLOSE frames inside
which has already been sent.
Define a new quic_cc_conn struct which represents the connection object after
entering the "connection close" state and after having release the quic_conn
connection object.
Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects.
Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object
(the connection before entering "connection close" state), and new quic_cc_conn
struct object (the connection after entering "connection close"). So, all the
members inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.
Implement qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible of copying the
required information from the original connection (quic_conn) to the remaining
connection (quic_cc_conn). Among others initialization, it redefined the
QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task
to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received and
resend the datagram which CONNECTION_CLOSE frame which has already been sent
when entering "connection close" state. qc_cc_idle_timer_task() only releases
the remaining quic_cc_conn struct object.
Modify quic_conn_release() to allocate quic_cc_conn struct objects from the
original connection passed as argument. It does nothing if this original
connection is not in closing state, or if the idle timer has already expired.
Implement quic_release_cc_conn() to release a "connection close" connection.
It is called when its timer expires or if an error occured when sending
a packet from this connection when the peer is no more reachable.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace ->cids member of quic_conn objects by pointer to "quic_cids" and
adapt the code consequently. Nothing special.
Add a new pool <pool_head_quic_cc_buf> for buffer used when building datagram
wich CONNECTION_CLOSE frames inside with QUIC_MIN_CC_PKTSIZE(128) as minimum
size.
Add ->cc_buf_area to quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close" datagrams
and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member
value.
Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate
a struct "quic_cc_buf" buffer when the connection needs an immediate close
or a buffer struct if not.
Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such
"quic_cc_buf" buffer when an immediate close is required.
Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct member to
bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep member respectively).
They are moved before being defined into a bytes anonoymous struct common to
a future struct to be defined.
Consequently adapt the code.
Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit
is respected when it return false(0), i.e. when the connection is not validated.
Implement quic_may_send_bytes() which returns the number of bytes which may be
sent when the connection has not already been validated and call this functions
at several places when this is the case (after having called
quic_peer_validated_addr()).
Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
A HTTP server may provide a complete response even prior receiving the
full request. In this case, RFC 9114 allows the server to abort read
with a STOP_SENDING with error code H3_NO_ERROR.
This scenario was notably reproduced with haproxy and an inactive
server. If the client send a POST request, haproxy may provide a full
HTTP 503 response before the end of the full request.
Fields of sedesc structure were documented in the comment about the
structure itself. It was not really convenient, hard to read, hard to
update. So comments about the fields are moved on the corresponding field
line, as usual.
There is a mechanisme in the H1 and H2 multiplexer to skip the payload when
a response is returned to the client when it must not contain any payload
(response to a HEAD request or a 204/304 response). However, this does not
work when the splicing is used. The H2 multiplexer does not support the
splicing, so there is no issue. But with the mux-h1, when data are sent
using the kernel splicing, the mux on the server side is not aware the
client side should skip the payload. And once the data are put in a pipe,
there is no way to stop the sending.
It is a defect of the current design. This will be easier to deal with this
case when the mux-to-mux forwarding will be implemented. But for now, to fix
the issue, we should add an HTX flag on the start-line to pass the info from
the client side to the server side and be able to disable the splicing in
necessary.
The associated reg-test was improved to be sure it does not fail when the
splicing is configured.
This patch should be backported as far as 2.4..
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part code.
The definition of quic_tx_packet struct has been move from quic_conn-t.h to
quic_tx-t.h.
Same thing for the TX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this TX part code.
The definition of quic_rx_packet struct has been move from quic_conn-t.h to
quic_rx-t.h.
Move the code which directly calls the functions of the OpenSSL QUIC API into
quic_ssl.c new C file.
Some code have been extracted from qc_conn_finalize() to implement only
the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c.
qc_conn_finalize() has also been exported to be used from this new quic_ssl.c
C module.
To accelerate the compilation of quic_conn.c file, export the code in relation
with the traces from quic_conn.c to quic_trace.c.
Also add some headers (quic_trace-t.h and quic_trace.h).
This setting which may be used into a "global" section, enables the QUIC listener
bindings when haproxy is compiled with the OpenSSL wrapper. It has no effect
when haproxy is compiled against a TLS stack with QUIC support, typically quictls.
This wrapper needs to have an access to an encoded version of the local transport
parameter (to be sent to the peer). They are provided to the TLS stack thanks to
qc_ssl_compat_add_tps_cb() callback.
These encoded transport parameters were attached to the QUIC connection but
removed by this commit to save memory:
MINOR: quic: Stop storing the TX encoded transport parameters
This patch restores these transport parameters and attaches them again
to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper
is compiled.
Implement qc_set_quic_transport_params() to encode the transport parameters
for a connection and to set them into the stack and make this function work
for both the OpenSSL wrapper or any other TLS stack with QUIC support. Its uses
the encoded version of the transport parameters attached to the connection
when compiled for the OpenSSL wrapper, or local parameters when compiled
with TLS stack with QUIC support. These parameters are passed to
quic_transport_params_encode() and SSL_set_quic_transport_params() as before
this patch.
Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the
compilation of the QUIC openssl wrapper for TLS stacks is enabled with
USE_QUIC_OPENSSLCOMPAT.
Highly inspired from nginx openssl wrapper code.
This wrapper implement this list of functions:
SSL_set_quic_method(),
SSL_quic_read_level(),
SSL_quic_write_level(),
SSL_set_quic_transport_params(),
SSL_provide_quic_data(),
SSL_process_quic_post_handshake()
and SSL_QUIC_METHOD QUIC specific bio method which are also implemented by quictls
to support QUIC from OpenSSL. So, its aims is to support QUIC from a standard OpenSSL
stack without QUIC support. It relies on the OpenSSL keylog feature to retreive
the secrets derived by the OpenSSL stack during a handshake and to pass them to
the ->set_encryption_secrets() callback as this is done by quictls. It makes
usage of a callback (quic_tls_compat_msg_callback()) to handle some TLS messages
only on the receipt path. Some of them must be passed to the ->add_handshake_data()
callback as this is done with quictls to be sent to the peer as CRYPTO data.
quic_tls_compat_msg_callback() callback also sends the received TLS alert with
->send_alert() callback.
AES 128-bits with CCM mode is not supported at this time. It is often disabled by
the OpenSSL stack, but as it can be enabled by "ssl-default-bind-ciphersuites",
the wrapper will send a TLS alerts (Handhshake failure) if this algorithm is
negotiated between the client and the server.
0rtt is also not supported by this wrapper.
The aim of this patch is to allow the building of QUIC datagrams with
as much as packets with different encryption levels inside during handshake.
At this time, this is possible only for at most two encryption levels.
That said, most of the time, a server only needs to use two encryption levels
by datagram, except during retransmissions.
Modify qc_prep_pkts(), the function responsible of building datagrams, to pass
a list of encryption levels as parameter in place of two encryption levels. This
function is also used when retransmitting datagrams. In this case this is a
customized/flexible list of encryption level which is passed to this function.
Add ->retrans new member to quic_enc_level struct, to be used as attach point
to list of encryption level used only during retransmission, and ->retrans_frms
new member which is a pointer to a list of frames to be retransmitted.
This context may be released at the same time as the Initial TLS context.
This is done calling quic_tls_ctx_secs_free() and pool_free() in two code locations.
Implement quic_nictx_free() to do that.
Shorten ->negotiated_ictx quic_conn struct member (->nictx).
This variable is used during version negotiation. Indeed, a connection
may have to support support several QUIC versions of paquets during
the handshake. ->nictx is the QUIC TLS cipher context used for the negotiated
QUIC version.
This patch allows a connection to dynamically allocate this TLS cipher context.
Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher context object.
Modify qc_new_conn() to initialize ->nictx to NULL value.
quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS cipher context.
Modify it to do nothing if it is called with a NULL TLS cipher context.
Modify to allocate ->nictx from qc_conn_finalize() just before initializing
its secrets. qc_conn_finalize() allocates -nictx only if needed (if a new QUIC
version was negotiated).
Modify qc_conn_release() which release a QUIC connection (quic_conn struct) to
release ->nictx TLS cipher context.
There is no need to keep an encoded version of the QUIC listener transport
parameters attache to the connection.
Remove ->enc_params and ->enc_params_len member of quic_conn struct.
Use variables to build the encoded transport parameter local to
ha_quic_set_encryption_secrets() before they are passed to
SSL_set_quic_transport_params().
Modify qc_ssl_sess_init() prototype. It was expected to be used with
the encoded transport parameters as passed parameter, but they were not
used. Cleanup this function.
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, changed from ulong to int since
it only returns a success/failure and no more a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made worng over time.
The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").
The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:
cpu-map 1 odd
cpu-map 1/1-8 even
would leave no CPU while doing:
cpu-map 1/all odd
cpu-map 1/1-8 even
would allow all CPUs.
While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.
This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.
This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
lua_yieldk ctx argument is of type lua_KContext which is typedefed to
intptr_t when available so it can be used to store pointers.
But the wrapper function hlua_yieldk() passes it as a regular it so it
breaks that promise.
Changing hlua_yieldk() prototype so that ctx argument is of type
lua_KContext.
This bug had no functional impact because ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
The internal tick clock was used to export the timestamp int the token
on retry packets. Doing this in cluster mode the nodes don't
understand the timestamp from tokens generated by others.
This patch re-work this using the the real current date (wall-clock time).
Timestamp are also now considered in secondes instead of milleseconds.
This patch should be backported until v2.6
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.
But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.
sink_write() now takes an optional maxlen parameter:
if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
is smaller than ring->maxlen, else only ring->maxlen will be considered.
[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
Thierry Fournier reported an annoying side-effect when using the debug()
converter.
Consider the following examples:
[1] http-request set-var(txn.test) bool(true),ipmask(24)
[2] http-request redirect location /match if { bool(true),ipmask(32) }
When starting haproxy with [1] example we get:
config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied.
With [2], we get:
config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'.
Now consider the following examples which are based on [1] and [2]
but with the debug() sample conv inserted in-between those incompatible
sample types:
[1*] http-request set-var(txn.test) bool(true),debug,ipmask(24)
[2*] http-request redirect location /match if { bool(true),debug,ipmask(32) }
According to the documentation, "it is safe to insert the debug converter
anywhere in a chain, even with non-printable sample types".
Thus we don't expect any side-effect from using it within a chain.
However in current implementation, because of debug() returning SMP_T_ANY
type which is a pseudo type (only resolved at runtime), the sample
compatibility checks performed at parsing time are completely uneffective.
(haproxy will start and no warning will be emitted)
The indesirable effect of this is that debug() prevents haproxy from
warning you about impossible type conversions, hiding potential errors
in the configuration that could result to unexpected evaluation or logic
while serving live traffic. We better let haproxy warn you about this kind
of errors when it has the chance.
With our previous examples, this could cause some inconveniences. Let's
say for example that you are testing a configuration prior to deploying
it. When testing the config you might want to use debug() converter from
time to time to check how the conversion chain behaves.
Now after deploying the exact same conf, you might want to remove those
testing debug() statements since they are no longer relevant.. but
removing them could "break" the config and suddenly prevent haproxy from
starting upon reload/restart. (this is the expected behavior, but it
comes a bit too late because of debug() hiding errors during tests..)
To fix this, we introduce a new output type for sample expressions:
SMP_T_SAME - may only be used as "expected" out_type (parsing time)
for sample converters.
As it name implies, it is a way for the developpers to indicate that the
resulting converter's output type is guaranteed to match the type of the
sample that is presented on the converter's input side.
(converter may alter data content, but data type must not be changed)
What it does is that it tells haproxy that if switching to the converter
(by looking at the converter's input only, since outype is SAME) is
conversion-free, then the converter type can safely be ignored for type
compatibility checks within the chain.
debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows
it to fully comply with the doc in the sense that it does not impact the
conversion chain when inserted between sample items.
Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>
SMP_T_ADDR support was added in b805f71 ("MEDIUM: sample: let the cast
functions set their output type").
According to the above commit, it is made clear that the ADDR type is
a pseudo/generic type that may be used for compatibility checks but
that cannot be emitted from a fetch or converter.
With that in mind, all conversions from ADDR to other types were
explicitly disabled in the compatibility matrix.
But later, when map_*_ip functions were updated in b2f8f08 ("MINOR: map:
The map can return IPv4 and IPv6"), we started using ADDR as "expected"
output type for converters. This still complies with the original
description from b805f71, because it is used as the theoric output
type, and is never emitted from the converters themselves (only "real"
types such as IPV4 or IPV6 are actually being emitted at runtime).
But this introduced an ambiguity as well as a few bugs, because some
compatibility checks are being performed at config parse time, and thus
rely on the expected output type to check if the conversion from current
element to the next element in the chain is theorically supported.
However, because the compatibility matrix doesn't support ADDR to other
types it is currently impossible to use map_*_ip converters in the middle
of a chain (the only supported usage is when map_*_ip converters are
at the very end of the chain).
To illustrate this, consider the following examples:
acl test str(ok),map_str_ip(test.map) -m found # this will work
acl test str(ok),map_str_ip(test.map),ipmask(24) -m found # this will raise an error
Likewise, stktable_compatible_sample() check for stick tables also relies
on out_type[table_type] compatibility check, so map_*_ip cannot be used
with sticktables at the moment:
backend st_test
stick-table type string size 1m expire 10m store http_req_rate(10m)
frontend fe
bind localhost:8080
mode http
http-request track-sc0 str(test),map_str_ip(test.map) table st_test # raises an error
To fix this, and prevent future usage of ADDR as expected output type
(for either fetches or converters) from introducing new bugs, the
ADDR=>? part of the matrix should follow the ANY type logic.
That is, ADDR, which is a pseudo-type, must be compatible with itself,
and where IPV4 and IPV6 both support a backward conversion to a given
type, ADDR must support it as well. It is done by setting the relevant
ADDR entries to c_pseudo() in the compatibility matrix to indicate that
the operation is theorically supported (c_pseudo() will never be executed
because ADDR should not be emitted: this only serves as a hint for
compatibility checks at parse time).
This is what's being done in this commit, thanks to this the broken
examples documented above should work as expected now, and future
usage of ADDR as out_type should not cause any issue.
This function is used for ANY=>!ANY conversions in the compatibility
matrix to help differentiate between real NOOP (c_none) and pseudo
conversions that are theorically supported at config parse time but can
never occur at runtime,. That is, to explicit the fact that actual related
runtime operations (e.g.: ANY->IPV4) are not NOOP since they might require
some conversion to be performed depending on the input type.
When checking the conf we don't know the effective out types so
cast[pseudo type][pseudo type] is allowed in the compatibility matrix,
but at runtime we only expect cast[real type][(real type || pseudo type)]
because fetches and converters may not emit pseudo types, thus using
c_none() everywhere was too ambiguous.
The process will crash if c_pseudo() is invoked to help catch bugs:
crashing here means that a pseudo type has been encountered on a
converter's input at runtime (because it was emitted earlier in the
chain), which is not supported and results from a broken sample fetch
or converter implementation. (pseudo types may only be used as out_type
in sample definitions for compatibility checks at parsing time)
Both sample_parse_expr() and parse_acl_expr() implement some code
logic to parse sample conv list after respective fetch or acl keyword.
(Seems like the acl one was inspired by the sample one historically)
But there is clearly code duplication between the two functions, making
them hard to maintain.
Hopefully, the parsing logic between them has stayed pretty much the
same, thus the sample conv parsing part may be moved in a dedicated
helper parsing function.
This is what's being done in this commit, we're adding the new function
sample_parse_expr_cnv() which does a single thing: parse the converters
that are listed right after a sample fetch keyword and inject them into
an already existing sample expression.
Both sample_parse_expr() and parse_acl_expr() were adapted to now make
use of this specific parsing function and duplicated code parts were
cleaned up.
Although sample_parse_expr() remains quite complicated (numerous function
arguments due to contextual parsing data) the first goal was to get rid of
code duplication without impacting the current behavior, with the added
benefit that it may allow further code cleanups / simplification in the
future.
This patch allows the low level packet parser to drop packets with type for discarded
packet number spaces. Furthermore, this prevents it from reallocating new encryption
levels and packet number spaces already released/discarded. When a packet number space
is discarded, it MUST NOT be reallocated.
As the packet number space discarding is done asap the type of packet received is
known, some packet number space discarding check may be safely removed from qc_try_rm_hp()
and qc_qel_may_rm_hp() which are called after having parse the packet header, and
is type.
As the packet number spaces and encryption level are dynamically allocated,
the information about the packet number space discarded status must be kept
somewhere else than in these objects.
quic_tls_discard_keys() is no more useful.
Modify quic_pktns_discard() to do the same job: flag the quic_conn object
has having discarded packet number space.
Implement quic_tls_pktns_is_disarded() to check if a packet number space is
discarded. Note the Application data packet number space is never discarded.
The maximum length of the secrets derived by the TLS stack is 384 bits.
This reduces the size of the objects provided by the "quic_tls_secret" pool by
16 bytes.
Should be backported as far as 2.6
Replace ->els static array of encryption levels by 4 pointers into the QUIC
connection object, quic_conn struct.
->iel denotes the Initial encryption level,
->eel the Early-Data encryption level,
->hel the Handshaske encryption level and
->ael the Application Data encryption level.
Add ->qel_list to this structure to list the encryption levels after having been
allocated. Modify consequently the encryption level object itself (quic_enc_level
struct) so that it might be added to ->qel_list QUIC connection list of
encryption levels.
Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption
level object. It is used to initialized the pointer newly added to the quic_conn
structure. It also takes a packet number space pointer address as argument to
initialize it if not already initialized.
Modify quic_tls_ctx_reset() to call it from quic_conn_enc_level_init() which is
called by qc_enc_level_alloc() to allocate an encryption level object.
Implement 2 new helper functions:
- ssl_to_qel_addr() to match and pointer address to a quic_encryption level
attached to a quic_conn object with a TLS encryption level enum value;
- qc_quic_enc_level() to match a pointer to a quic_encryption level attached
to a quic_conn object with an internal encryption level enum value.
This functions are useful to be called from ->set_encryption_secrets() and
->add_handshake_data() TLS stack called which takes a TLS encryption enum
as argument (enum ssl_encryption_level_t).
Replace all the use of the qc->els[] array element values by one of the newly
added ->[ieha]el quic_conn struct member values.
Very simple patch to define and declare a pool for the QUIC TLS encryptions levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove from quic_conn struct the
static array of encryption levels (see ->els[]).
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to quic_conn struc as pointers to quic_pktns
struct, one by packet number space as follows:
->ipktns for Initial packet number space,
->hpktns for Handshake packet number space and
->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers
from TLS stack encryption levels.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to
allocate the packet numbers and initialize the encryption level.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.
Modify qc_new_conn() to allocation only the Initial packet number space and
Initial encryption level.
Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.
Replace at several locations the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces
haproxy/quic_tls-t.h is the correct place to quic_enc_level structure
definition.
Should be backported as far as 2.6 to ease any further backport to come.
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the ouput
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.
This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
This patch introduces the "client-sigalgs" keyword for the server line,
which allows to configure the list of server signature algorithms
negociated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
This patch introduces the "sigalgs" keyword for the server line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
This bug arrived with this commit:
MINOR: quic: Remove pool_zalloc() from qc_new_conn()
Missing initialization of largest packet number received during a keyupdate phase.
This prevented the keyupdate feature from working and made the keyupdate interop
tests to fail for all the clients.
Furthermore, ->flags from quic_tls_ctx was also not initialized. This could
also impact the keyupdate feature at least.
No backport needed.
qc_new_conn() is ued to initialize QUIC connections with quic_conn struct objects.
This function calls quic_conn_release() when it fails to initialize a connection.
quic_conn_release() is also called to release the memory allocated by a QUIC
connection.
Replace pool_zalloc() by pool_alloc() in this function and initialize
all quic_conn struct members which are referenced by quic_conn_release() to
prevent use of non initialized variables in this fonction.
The ebtrees, the lists attached to quic_conn struct must be initialized.
The tasks must be reset to their NULL default values to be safely destroyed
by task_destroy(). This is all the case for all the TLS cipher contexts
of the encryption levels (struct quic_enc_level) and those for the keyupdate.
The packet number spaces (struct quic_pktns) must also be initialized.
->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update()
from dereferencing this pointer.
->latest_rtt member of quic_loss struct must also be initialized. This is done
by quic_loss_init() called by quic_path_init().
->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init().
This bug never occured because all quic_pktns structs are attached to quic_conn
structs which are always pool_zalloc()'ed.
Must be backported as far as 2.6.
proxy default-server is a specific type of server that is not allocated
using new_server(): it is directly stored within the parent proxy
structure. However, since it may contain some default config options that
may be inherited by regular servers, it is also subject to dynamic members
(strings, structures..) that needs to be deallocated when the parent proxy
is cleaned up.
Unfortunately, srv_drop() may not be used directly from p->defsrv since
this function is meant to be used on regular servers only (those created
using new_server()).
To circumvent this, we're splitting srv_drop() to make a new function
called srv_free_params() that takes care of the member cleaning which
originally takes place in srv_drop(). This function is exposed through
server.h, so it may be called from outside server.c.
Thanks to this, calling srv_free_params(&p->defsrv) from free_proxy()
prevents any memory leaks due to dynamic parameters allocated when
parsing a default-server line from a proxy section.
This partially fixes GH #2173 and may be backported to 2.8.
[While it could also be relevant for other stable versions, the patch
won't apply due to architectural changes / name changes between 2.4 => 2.6
and then 2.6 => 2.8. Considering this is a minor fix that only makes
memory analyzers happy during deinit paths (at least for <= 2.8), it might
not be worth the trouble to backport them any further?]
Historically the client-fin and server-fin timeouts were made to allow
a connection closure to be effective quickly if the last data were sent
down a socket and the client didn't close, something that can happen
when the peer's FIN is lost and retransmits are blocked by a firewall
for example. This made complete sense in 1.5 for TCP and HTTP in close
mode. But nowadays with muxes, it's not done at the right layer anymore
and even the description doesn't match what is being done, because what
happens is that the stream will abort the whole transfer after it's done
sending to the mux and this timeout expires.
We've seen in GH issue 2095 that this can happen with very short timeout
values, and while this didn't trigger often before, now that the muxes
(h2 & quic) properly report an end of stream before even the first
sc_conn_sync_recv(), it seems that it can happen more often, and have
two undesirable effects:
- logging a timeout when that's not the case
- aborting the request channel, hence the server-side conn, possibly
before it had a chance to be put back to the idle list, causing
this connection to be closed and not reusable.
Unfortunately for TCP (mux_pt) this remains necessary because the mux
doesn't have a timeout task. So here we're adding tests to only do
this through an HTX mux. But to be really clean we should in fact
completely drop all of this and implement these timeouts in the mux
itself.
This needs to be backported to 2.8 where the issue was discovered,
and maybe carefully to older versions, though that is not sure at
all. In any case, using a higher timeout or removing client-fin in
HTTP proxies is sufficient to make the issue disappear.
As seen in commits 33a4461fa ("BUG/MINOR: stats: Fix Lua's `get_stats`
function") and a46b142e8 ("BUG/MINOR: Missing stat_field_names (since
f21d17bb)") it seems frequent to omit to update stats_fields[] when
adding a new ST_F_xxx entry. This breaks Lua's get_stats() and shows
a "(null)" in the header of "show stat", but that one is not detectable
to the naked eye anymore.
Let's add a reminder above the enum declaration about this, and a small
reg tests checking for the absence of "(null)". It was verified to fail
before the last patch above.
This patch is similar to the previous one but for QUIC mux functions
used inside the mux code itself or application layer. Replace all
occurences of qc_* prefix by qcc_* or qcs_*. This should help to better
differentiate code between quic_conn and MUX.
This should be backported up to 2.7.
There was a misnaming in stats counter for *_BLOCKED frames in regard to
QUIC rfc convention. This patch fixes it to prevent future ambiguity :
- STREAMS_BLOCKED -> STREAM_DATA_BLOCKED
- STREAMS_DATA_BLOCKED_BIDI -> STREAMS_BLOCKED_BIDI
- STREAMS_DATA_BLOCKED_UNI -> STREAMS_BLOCKED_UNI
This should be backported up to 2.7.
Remove nb_streams field from qcc. It was not used outside of a BUG_ON()
statement to ensure we never have a negative count of streams. However
this is already checked with other fields.
This should be backported up to 2.7.
The Linux kernel maintains data structures to track a processes' open file
descriptors, and it expands these structures as necessary when FD usage grows
(at every FD=2^X starting at 64). However when threading is in use, during
expansion the kernel will pause (observed up to 47ms) while it waits for thread
synchronization (see https://bugzilla.kernel.org/show_bug.cgi?id=217366).
This change addresses the issue and avoids the random pauses by opening the
maximum file descriptor during initialization, so that expansion will not occur
while processing traffic.
TCC silently ignores the weak and section attributes, which ruins the
initcalls. Technically we're exactly in the same situation as with an
obsolete linker. Let's just automatically set the flag if TCC is
detected, this avoids surprises where the program compiles but does
not start.
No backport is needed.
TCC doesn't knoow about __attribute__((weak)), it silently ignores it.
We could add a "static" modifier there in this case but we already have
an alternate portable mode that is based on a slightly larger literal
for obsolete linkers (and non-ELF systems) which choke on weak. Let's
just add the test for tcc there and use it in this case.
No backport is needed.
TCC is upset by the declaration looking like:
const unsigned char ist_lc[256] __attribute__((weak)) = ((const unsigned char[256]){ ... });
It was written like this because it's expanded from the _IST_LC macro
but it's never used as-is, it's only used from ist_lc, which should be
the one containing the cast so that the macro only contains the list of
bytes that can be used in both places. And this assigns more consistent
roles to the lower and upper case macro/variable now, one is typed and
the other one not. No backport is needed.
Add ->sent_pkt counter to quic_conn struct to count the packet at QUIC connection
level. Then, when the connection is released, the ->sent_pkt counter value
is added to the one for the listener.
Must be backported to 2.7.
Add some statistical counters to quic_conn struct from quic_counters struct which
are used at listener level to handle them at QUIC connection level. This avoid
calling atomic functions. Furthermore this will be useful soon when a counter will
be added for the total number of packets which have been sent which will be very
often incremented.
Some counters were not added, espcially those which count the number of QUIC errors
by QUIC error types. Indeed such counters would be incremented most of the time
only one time at QUIC connection level.
Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level
statistical counters to the listener level statistical counters.
Must be backported to 2.7.
Thierry Fournier reported a build breakage with the ubiquitous make
3.81, LDFLAGS were ignored. This is caused by the declaration of the
collect_opt_flags macro that is defined with an "=" sign, something
that only appeared in 3.82 and that is not necessary. With it removed,
the build now works fine at least from 3.80 to 4.3.
No backport is needed since this makefile cleanup appeared in 2.8.
During a code audit of the various situations that promote ERR_PENDING to
ERROR, it appeared that:
- all muxes use se_fl_set_error() to set it, which chooses either based
on EOI/EOS presence ;
- EOI/EOS that arrive late after ERR_PENDING were not systematically
upgraded to ERROR
This results in confusion about how such ERROR or ERR_PENDING ought to
be handled, which is not quite desirable.
This patch adds a test to se_fl_set() to detect if we're setting EOI or
EOS while ERR_PENDING is present, or the other way around so that any
sequence of EOI/EOS <-> ERR_PENDING results in ERROR being set. This
way there will no longer be possible situations where ERROR is missing
while the other ones are set.
quic_aead_iv_build() should never fail unless we call it with buffers of
different size. This never happens in the code as every input buffers
are of size QUIC_TLS_IV_LEN.
Remove the return value and add a BUG_ON() to prevent future misusage.
This is especially useful to remove one error handling on the sending
patch via quic_packet_encrypt().
This should be backported up to 2.7.
Right now there's no way to enforce a specific value of now_ms upon
startup in order to compensate for the time it takes to load a config,
specifically when dealing with the health check startup. For this we'd
need to force the now_offset value to compensate for the last known
value of the current date. This patch exposes a function to do exactly
this.
Just like we have the uptime in "show info", let's add the boot time.
It's trivial to collect as it's just the difference between the ready
date and the start date, and will allow users to monitor this element
in order to take action before it starts becoming problematic. Here
the boot time is reported in milliseconds, so this allows to even
observe sub-second anomalies in startup delays.
Some huge configs take a significant amount of time to start and this
can cause some trouble (e.g. health checks getting delayed and grouped,
process not responding to the CLI etc). For example, some configs might
start fast in certain environments and slowly in other ones just due to
the use of a wrong DNS server that delays all libc's resolutions. Let's
first start by measuring it by keeping a copy of the most recently known
ready date, once before calling check_config_validity() and then refine
it when leaving this function. A last call is finally performed just
before deciding to split between master and worker processes, and it covers
the whole boot. It's trivial to collect and even allows to get rid of a
call to clock_update_date() in function check_config_validity() that was
used in hope to better schedule future events.
Uninline and move qc_attach_sc() function to implementation source file.
This will be useful for next commit to add traces in it.
This should be backported up to 2.7.
MUX is responsible to put EOS on stream when read channel is closed.
This happens if underlying connection is closed or a RESET_STREAM is
received. FIN STREAM is ignored in this case.
For connection closure, simply check for CO_FL_SOCK_RD_SH.
For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been
introduced. It is set when RESET_STREAM is received, unless we already
received all data. This is conform to QUIC RFC which allows to ignore a
RESET_STREAM in this case. During RESET_STREAM processing, input buffer
is emptied so EOS can be reported right away on recv_buf operation.
This should be backported up to 2.7.
Fix the openssl build with older openssl version by disabling the new
ssl_c_r_dn fetch.
This also disable the ssl_client_samples.vtc file for OpenSSL version
older than 1.1.1
This patch addresses #1514, adds the ability to fetch DN of the root
ca that was in the chain when client certificate was verified during SSL
handshake.
qc_stream_buf_alloc() can fail for two reasons :
* limit of Tx buffer per connection reached
* allocation failure
The first case is properly treated. A flag QC_CF_CONN_FULL is set on the
connection to interrupt emission. It is cleared when a buffer became
available after in order ACK reception and the MUX tasklet is woken up.
The allocation failure was handled with the same mechanism which in this
case is not appropriate and could lead to a connection transfer freeze.
Instead, prefer to close the connection with a QUIC internal error code.
To differentiate the two causes, qc_stream_buf_alloc() API was changed
to return the number of available buffers to the caller.
This must be backported up to 2.6.
Remove QUIC MUX function qcs_http_handle_standalone_fin(). The purpose
of this function was only used when receiving an empty STREAM frame with
FIN bit. Besides, it was called by each application protocol which could
have different approach and render the function purpose unclear.
Invocation of qcs_http_handle_standalone_fin() have been replaced by
explicit code in both H3 and HTTP/0.9 module. In the process, use
htx_set_eom() to reliably put EOM on the HTX message.
This should be backported up to 2.7, along with the previous patch which
introduced htx_set_eom().
Implement a new HTX utility function htx_set_eom(). If the HTX message
is empty, it will first add a dummy EOT block. This is a small trick
needed to ensure readers will detect the HTX buffer as not empty and
retrieve the EOM flag.
Replace the H2 code related by a htx_set_eom() invocation. QUIC also has
the same code which will be replaced in the next commit.
This should be backported up to 2.7 before the related QUIC patch.
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:
master> prompt timed
[0:00:00:50] master> show proc
#<PID> <type> <reloads> <uptime> <version>
11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21
# workers
11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21
# old workers
11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21
# programs
[0:00:00:58] master> @!11955
[0:00:01:03] 11955> @!11942
[0:00:02:17] 11942> @
[0:00:01:10] master>
Entering "prompt timed" toggles reporting of the process' uptime in
the prompt, which will report days, hours, minutes and seconds since
it was started. As discussed with Tim in issue #2145, this can be
convenient to roughly estimate the time between two outputs, as well
as detecting that a process failed to be reloaded for example.
Adding http_free_redirect_rule() function to free a single redirect rule
since it may be required to free rules outside of free_proxy() function.
This patch is required for an upcoming bugfix.
[for 2.2, free_proxy function did not exist (first seen in 2.4), thus
http_free_redirect_rule() needs to be deducted from haproxy.c deinit()
function if the patch is required]
A xref is added between the endpoint descriptors. It is created when the
server endpoint is attached to the SC and it is destroyed when an endpoint
is detached.
This xref is not used for now. But it will be useful to retrieve info about
an endpoint for the opposite side. It is also the warranty there is still a
endpoint attached on the other side.
When "optioon socket-stats" is used in a frontend, its listeners have
their own stats and will appear in the stats page. And when the stats
page has "stats show-legends", then a tooltip appears on each such
socket with ip:port and ID. The problem is that since QUIC arrived, it
was not possible to distinguish the TCP listeners from the QUIC ones
because no protocol indication was mentioned. Now we add a "proto"
legend there with the protocol name, so we can see "tcp4" or "quic6"
and figure how the socket is bound.
Following previous patch, error notification from quic_conn has been
adjusted to rely on standard connection flags. Most notably, CO_FL_ERROR
on the connection instance when a fatal error is detected.
Check for CO_FL_ERROR is implemented by qc_send(). If set the new flag
QC_CF_ERR_CONN will be set for the MUX instance. This flag is similar to
the local error flag and will abort most of the futur processing. To
ensure stream upper layer is also notified, qc_wake_some_streams()
called by qc_process() will put the stream on error if this new flag is
set.
This should be backported up to 2.7.
When an error is detected at quic-conn layer, the upper MUX must be
notified. Previously, this was done relying on quic_conn flag
QUIC_FL_CONN_NOTIFY_CLOSE set and the MUX wake callback called on
connection closure.
Adjust this mechanism to use an approach more similar to other transport
layers in haproxy. On error, connection flags are updated with
CO_FL_ERROR, CO_FL_SOCK_RD_SH and CO_FL_SOCK_WR_SH. The MUX is then
notified when the error happened instead of just before the closing. To
reflect this change, qc_notify_close() has been renamed qc_notify_err().
This function must now be explicitely called every time a new error
condition arises on the quic_conn layer.
To ensure MUX send is disabled on error, qc_send_mux() now checks
CO_FL_SOCK_WR_SH. If set, the function returns an error. This should
prevent the MUX from sending data on closing or draining state.
To complete this patch, MUX layer must now check for CO_FL_ERROR
explicitely. This will be the subject of the following commit.
This should be backported up to 2.7.
Add traces for when an upper layer stream is woken up by the MUX. This
should help to diagnose frozen stream issues.
This should be backported up to 2.7.
As discussed a few times over the years, it's quite difficult to know
how often we stop accepting connections because the global maxconn was
reached. This is not easy to know because when we reach the limit we
stop accepting but we don't know if incoming connections are pending,
so it's not possible to know how many were delayed just because of this.
However, an interesting equivalent metric consist in counting the number
of times an accepted incoming connection resulted in the limit being
reached. I.e. "we've accepted the last one for now". That doesn't imply
any other one got delayed but it's a factual indicator that something
might have been delayed. And by counting the number of such events, it
becomes easier to know whether some limits need to be adjusted because
they're reached often, or if it's exceptionally rare.
The metric is reported as a counter in show info and on the stats page
in the info section right next to "maxconn".
Now in "show info" we have a TotalWarnings field that reports the total
number of warnings issued since the process started. It's also reported
in the the stats page next to the uptime.
LIST_DELETE doesn't affect the previous pointers of the stored element.
This can sometimes hide bugs when such a pointer is reused by accident
in a LIST_NEXT() or equivalent after having been detached for example, or
ia another LIST_DELETE is performed again, something that LIST_DEL_INIT()
is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly
detached list element with two invalid pointers that will cause a crash
in case of accidental misuse. It's not enabled by default.
->others member of tp_version_information structure pointed to a buffer in the
TLS stack used to parse the transport parameters. There is no garantee that this
buffer is available until the connection is released.
Do not dump the available versions selected by the client anymore, but displayed the
chosen one (selected by the client for this connection) and the negotiated one.
Must be backported to 2.7 and 2.6.
A recent series of patch were introduced to streamline error generation
by QUIC MUX. However, a regression was introduced : every error
generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it
should be only for H3/QPACK errors.
Fix this by adding an argument <app> in qcc_set_error. When false, a
standard CONNECTION_CLOSE is used as error.
This bug was detected by QUIC tracker with the following tests
"stop_sending" and "server_flow_control" which requires a
CONNECTION_CLOSE frame.
This must be backported up to 2.7.
The comment for the stconn structure was still referencing the SHUTR/SHUTW
flags. These flags were replaced and we now use ABRT/SHUT flags in
comments. The comment itself was slightly updated to be accurate.
When sc_need_room() is called, the caller cannot request more free space
than a minimum value to be sure it is always possible to unblock it. it is a
safety guard to not freeze any SC on NEED_ROOM condition. At worse it will
lead to some wakeups un excess at the edge.
To keep things simple, the following minimum is used:
(global.tune.bufsize - global.tune.maxrewrite - sizeof(struct htx))
Most of the function in quic_frame.c and quic_frame.h manipulate <buf> buffer
position variables which have nothing to see with struct buffer variables.
Rename them to <pos>
Should be backported to 2.7.
Commit e83f937cc ("MEDIUM: quic: use a global CID trees list") uses a
local variable "tree" used only for locks, but when threads are disabled
it spews a warning about this unused variable.
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
etended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.
Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.
This commit relaxes this rule an finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument in the parse_cpu_set() function. The doc was updated to
reflect this.
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in haproxy configuration.
This will help for sending email directly from lua.
(prevent legacy email sending from intefering with lua)
Adding a new event type: SERVER_CHECK.
This event is published when a server's check state ought to be reported.
(check status change or check result)
SERVER_CHECK event is provided as a server event with additional data
carrying relevant check's context such as check's result and health.
Adding a new SERVER event in the event_hdl API.
SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)
SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
Adding a new SERVER event in the event_hdl API.
SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)
SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
When the SC is blocked because it is waiting for room in the input buffer,
it will be responsible to specify the minimum free space required to
progress. In this commit, we only introduce the field in the stconn
structure that will be used to store this value. It is a signed value with
the following meaning:
* -1: The SC is waiting for room but not based on the buffer state. It
will be typically used during splicing when the pipe is full. In
this case, only a successful send can unblock the SC.
* >= 0; The minimum free space in the input buffer to unblock the SC. 0 is
a special value to specify the SC must be unblocked ASAP, by the
stream, at the end of process_stream() or when output data are
consumed on the opposite side.
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care to request more
room it it fails to put data into the channel's buffer.
This commit introduces the keyword "client-sigalgs" for the bind line,
which does the same as "sigalgs" but for the client authentication.
"ssl-default-bind-client-sigalgs" allows to set the default parameter
for all the bind lines.
This patch should fix issue #2081.
This patch introduces the "sigalgs" keyword for the bind line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-bind-sigalgs" in
the default section.
This patch was originally written by Bruno Henc.
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
firsts wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.
What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.
That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.
For now we still have the buffer size limitation but it will be
addressed in future patches.
NS_TO_TV helper was implemented in 591fa59 ("MINOR: time: add conversions
to/from nanosecond timestamps")
Due to NS_TO_TV being implemented as a macro and not a function, we must
take extra care when manipulating user input.
In current implementation, 't' argument is not isolated within the macro.
Because of this, NS_TO_TV(1 + 1) will expand to:
((const struct timeval){ .tv_sec = 1 + 1 / 1000000000ULL, .tv_usec = (1 + 1 % 1000000000ULL) / 1000U })
Instead of:
((const struct timeval){ .tv_sec = 2 / 1000000000ULL, .tv_usec = (2 % 1000000000ULL) / 1000U })
As such, NS_TO_TV usage in hlua_now() is currently incorrect and this
results in unexpected values being passed to lua.
In this patch, we're adding an extra parenthesis around 't' in NS_TO_TV()
macro to make it safe against such usages. (that is: ensure proper
argument expansion as if NS_TO_TV was implemented as a function)
This is a 2.8 specific bug, no backport needed.
When a fatal error is detected by the QUIC MUX or H3 layer, the
connection should be closed with a CONNECTION_CLOSE with an error code
as the reason.
Previously, a direct call was used to the quic_conn layer to try to
close the connection. This API was adjusted to be more flexible. Now,
when an error is detected, the function qcc_set_error() is called. This
set the flag QC_CF_ERRL with the error code stored by the MUX. The
connection will be closed soon so most of the operations are not
conducted anymore. Connection is then finally closed during qc_send()
via quic_conn layer if QC_CF_ERRL is set. This will set the flag
QC_CF_ERRL_DONE which indicates that the MUX instance can be freed.
This model is cleaner and brings the following improvments :
- interaction with quic_conn layer for closure is centralized on a
single function
- CO_FL_ERROR is not set anymore. This was incorrect as this should be
reserved to errors reported by the transport layer to be similar with
other haproxy components. As a consequence, qcc_is_dead() has been
adjusted to check for QC_CF_ERRL_DONE to release the MUX instance.
This should be backported up to 2.7.
Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally
detected error when a CONNECTION_CLOSE should be emitted.
This should be backported up to 2.7.
Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.