An interesting issue was encountered when testing the mux-to-mux
forwarding code.
In order to preserve fairness, in h2_snd_buf() if other streams are waiting
in send_list or fctl_list, the stream that is attempting to send also goes
to its list, and will be woken up by h2_process_mux() or h2_send() when
some space is released. But on rare occasions, only a few (or even a
single) streams are waiting in this list, and these streams are quickly
removed because of a timeout or a quick h2_detach() that calls
h2s_destroy(). In this case there is no event left to wake up the other
stream waiting in its list, and processing will possibly only resume
after some client WINDOW_UPDATE frames or even new streams, so usually
it doesn't last too long and is not very noticeable, which is why it
was left unaddressed for so long. In addition, measurements have shown
that in a heavy network-bound benchmark, this exact situation happens
on less than 1% of the streams (it reached 4% with mux-to-mux).
The fix here consists in replacing these LIST_DEL_INIT() calls on
h2s->list with a function call that checks whether other streams were
recently queued to the send_list, and if so, tries to resume them by
calling h2_resume_each_sending_h2s(). The detection of late additions
is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is
set when a stream is queued due to other streams being present, and
which is cleared when this new function is called.
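As a rough illustration, the replacement helper could look like the
sketch below; apart from H2_CF_WAIT_INLIST and
h2_resume_each_sending_h2s(), the name and exact signature are
hypothetical:

    /* Hypothetical sketch: detach a stream from its send list and, if
     * other streams were queued meanwhile, try to resume them so that
     * no sender is left waiting without an event.
     */
    static inline void h2_remove_from_list(struct h2s *h2s)
    {
        struct h2c *h2c = h2s->h2c;

        LIST_DEL_INIT(&h2s->list);
        if (h2c->flags & H2_CF_WAIT_INLIST) {
            h2c->flags &= ~H2_CF_WAIT_INLIST;
            h2_resume_each_sending_h2s(h2c);
        }
    }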
This case, which is highly timing-dependent, is particularly difficult
to reproduce, but in a constrained environment, a test involving 32
conns of 20 streams each, all downloading a 10 MB object, previously
showed a limitation of 17 Gbps with lots of idle CPU time, and now
fills the cable at 25 Gbps.
This should be backported to all versions where it applies.
"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of implicit ring associated to the server (which
defaults to BUFSIZE).
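For example (a hedged illustration; the server address and size are
arbitrary):

    backend mylog
        mode log
        server s1 udp@127.0.0.1:514 log-bufsize 65536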
The hash LB algorithm can be configured with the "log-balance hash
<cnv_list>" directive. With this algorithm, the user specifies a
converter list with <cnv_list>.
The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.
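For instance, the following illustrative setup hashes the whole log
message with the crc32 converter to pick a server:

    backend mylog
        mode log
        log-balance hash crc32
        server s1 udp@127.0.0.1:514
        server s2 udp@127.0.0.1:515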
Split sample_process() in two parts in order to be able to process only
the converter part of a sample expression, from an existing input
sample struct passed as parameter.
Allow the use of the "none" hash-type function so that the key
resulting from the sample expression is directly used as the hash.
This can be useful to do the hashing manually using the available
hashing converters, or even custom ones, and then inform haproxy that
it can directly rely on the sample expression result, which is
explicitly handled as an integer in this case.
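A hedged illustration (the exact directive layout may differ):

    backend mylog
        mode log
        # the converter list already yields the final integer key
        log-balance hash crc32
        hash-type map-based none
        server s1 udp@127.0.0.1:514
        server s2 udp@127.0.0.1:515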
Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)
Log backends can be used as regular log targets using the log directive
with the "backend@be_name" prefix, like so:
| log backend@mybackend local0
A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
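Putting it together, a minimal (illustrative) setup could be:

    backend mylog
        mode log
        log-balance roundrobin
        server s1 udp@10.0.0.1:514
        server s2 udp@10.0.0.2:514

    frontend fe
        bind :8080
        log backend@mylog local0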
This helper function can be used to create a new sink from an existing
server struct (and thus an existing proxy as well), in order to spare
some resources when possible.
Implicit rings were automatically forced to the parent logger's format,
but this was done upon ring creation.
This is quite restrictive, because we might want to choose the desired
format right before generating the log header (ie: when producing the
log message), depending on the logger (log directive) that is
responsible for the log message, and with the current logic this is not
possible. (To this day, we still have a dedicated implicit ring per log
directive, but this might change.)
In ring_write(), we check if the sink->fmt is specified:
- defined: we use it since it is the most precise format
(ie: for named rings)
- undefined: we fall back to the format from the logger
With this change, implicit rings' format is now set to UNSPEC upon
creation. This is safe because the log header building function
automatically enforces the "raw" format when UNSPEC is set. And since
logger->format also defaults to "raw", no change of default behavior
should be expected.
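The selection boils down to something like this simplified sketch (the
enum spelling is an assumption):

    /* Sketch: pick the most precise format available for the header.
     * A named ring carries its own format; an implicit ring stays at
     * UNSPEC and inherits the logger's format ("raw" by default).
     */
    enum log_fmt fmt = sink->fmt;

    if (fmt == LOG_FORMAT_UNSPEC)
        fmt = logger->format;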
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.
While at it, some outdated comments were updated as well.
No change in behavior should be expected.
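A minimal sketch of such a structure (the field list is an assumption
based on the description above):

    /* Illustrative only: groups everything needed to build a log
     * header so it can be passed around as a single argument.
     */
    struct log_header {
        enum log_fmt format;  /* rfc3164, rfc5424, raw, ... */
        int level;            /* syslog severity */
        int facility;         /* syslog facility */
        struct ist *metadata; /* prepared header metadata fields */
    };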
Log targets were immediately embedded in the logger struct (previously
named logsrv) and could not be used outside of this context.
In this patch, we're introducing the log_target type with the
associated helper functions, so that it becomes possible to declare and
use log targets outside of the loggers' scope.
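Such a standalone type might look like this (a hedged sketch; all
fields are assumptions):

    /* Illustrative: a log target describes where and how to ship a
     * message, independently from the 'log' directive referencing it.
     */
    struct log_target {
        struct sockaddr_storage *addr; /* target address, if relevant */
        union {
            struct sink *sink;         /* resolved ring target */
            char *ring_name;           /* ring name before resolution */
        };
        uint16_t flags;
        enum log_tgt type;             /* e.g. dgram socket or buffer */
    };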
When the 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.
But things have become more complex, since today the 'log' directive
can point to ring targets (implicit, or named) for example.
Indeed, a 'log' directive no longer references the "final" server to
which the log will be sent; instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.
So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction, because the new entities would
be mixed up with logsrv.
So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).
This is internal rewording, so no functional change should be expected
on user-side.
The CIDs tree is now allocated dynamically since the following commit:
276697438d50456f92487c990f20c4d726dfdb96
MINOR: quic: Use a pool for the connection ID tree.
This could cause a crash if qc_new_conn() is interrupted due to an
intermediary failed allocation. When freeing all connection members,
free_quic_conn_cids() is used. However, this function does not support
a NULL cids.
To fix this, simply check for a NULL cids in the free_quic_conn_cids()
prologue.
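In other words, something along these lines (a sketch, not the exact
patch):

    /* Sketch: bail out early when the CID tree was never allocated,
     * e.g. after qc_new_conn() failed midway through its allocations.
     */
    static void free_quic_conn_cids(struct quic_conn *qc)
    {
        if (!qc->cids)
            return;
        /* ... walk the tree and release each CID as before ... */
    }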
This bug was reproduced using -dMfail.
No need to backport.
Since commit 5afcb686b ("MAJOR: connection: purge idle conn by last usage")
in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag()
in 2.8 by commit 3a7b539b1 ("BUG/MEDIUM: connection: Preserve flags when a
conn is removed from an idle list") becomes misleading. Indeed, now both
toremove_list and idle_list are shared by a union since the presence in
these lists is mutually exclusive. However, in conn_get_idle_flag() we
check for the presence in the toremove_list to decide whether or not to
delete the connection from the tree. This test now fails because instead
it sees the presence in the idle or safe list via the union, and concludes
the element must not be removed. Thus the element remains in the tree and
can be found later after the connection is released, causing crashes that
Tristan reported in issue #2292.
The following config is sufficient to reproduce it with 2 threads:
    defaults
        mode http
        timeout client 5s
        timeout server 5s
        timeout connect 1s

    listen front
        bind :8001
        server next 127.0.0.1:8002

    frontend next
        bind :8002
        timeout http-keep-alive 1
        http-request redirect location /
Sending traffic with a few concurrent connections and some short timeouts
suffices to instantly crash it after ~10k reqs:
$ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/
With Amaury we analyzed the conditions in which the function is called
in order to figure out a better condition for the test, and concluded
that ->toremove_list is never filled there, so we can safely remove
that part from the test and just move the flag retrieval back to what
it was prior
to the 2.8 patch above. Note that the patch is not reverted though, as
the parts that would drop the unexpected flags removal are unchanged.
This patch must NOT be backported. The code in 2.8 works correctly, it's
only the change in 2.9 that makes it misbehave.
Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h.
Also remove the multiple definitions of the trace_quic macro by moving
it into quic_trace.h. This forces all QUIC source files that rely on
traces to include it, while reducing the size of quic_conn.h.
This bug was detected when compiling haproxy against the aws-lc TLS
stack during QUIC interop runner tests. Some algorithms could be
negotiated by haproxy through the TLS stack but not fully supported by
haproxy's QUIC implementation. This led tls_aead() to return NULL (same
thing for tls_md() and tls_hp()). As these functions' return values
were never checked, they could trigger segfaults.
To fix this, the connection is closed as soon as possible with a
handshake_failure(40) TLS alert. Note that as the TLS stack
successfully negotiates an algorithm, it provides haproxy with CRYPTO
data before entering the ->set_encryption_secrets() callback. This is
why this callback (ha_set_encryption_secrets() on haproxy's side) is
modified to release all the CRYPTO frames before triggering a
CONNECTION_CLOSE with a TLS alert. This is done by calling
qc_release_pktns_frms() for all the packet number spaces.
Also modify some quic_tls_keys_hexdump calls to avoid crashes when the
->aead or ->hp EVP_CIPHER are NULL.
Modify qc_release_pktns_frms() to do nothing if the packet number space
passed as parameter is not initialized.
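The added validation amounts to something like this simplified sketch
(the alert-setting helper's name is an assumption):

    /* Sketch: refuse the handshake when the negotiated cipher suite
     * is not fully supported by the QUIC implementation.
     */
    if (!tls_aead(cipher) || !tls_md(cipher) || !tls_hp(cipher)) {
        quic_set_tls_alert(qc, SSL_AD_HANDSHAKE_FAILURE); /* alert 40 */
        return 0;
    }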
This bug does not impact the QUIC TLS compatibility mode
(USE_QUIC_OPENSSL_COMPAT).
Thank you to @ilia-shipitsin for having reported this issue in GH #2309.
Must be backported as far as 2.6.
The openssl-compat.h file has some functions which were implemented in
order to provide compatibility with openssl < 1.0.0. Most of them were
there to support the 0.9.8 version, but we don't support this version
anymore.
This patch removes the deprecated code from openssl-compat.h
Many actions take arguments after a parenthesis. When this happens, they
have to be tagged in the parser with KWF_MATCH_PREFIX so that a sub-word
is sufficient (since by default the whole block including the parenthesis
is taken).
The problem with this is that the parser stops on the first match. This
was OK years ago when there were very few actions, but over time new ones
were added and many actions are the prefix of another one (e.g. "set-var"
is the prefix of "set-var-fmt"). And what happens in this case is that the
first word is picked. Most often that doesn't cause trouble, because
such similar-looking actions involve the same custom parser, so the
wrong selection of the first entry results in the correct parser being
used anyway and the error remains silently hidden.
But it's getting worse when accidentally declaring prefixes in multiple
files, because in this case it will solely depend on the object file link
order: if the longest name appears first, it will be properly detected,
but if it appears last, its other prefix will be detected and might very
well not be related at all and use a distinct parser. And this is random
enough to make some actions succeed or fail depending on the build options
that affect the linkage order. Worse: what if a keyword is the prefix
of another one, with a different parser but a compatible syntax? It
could seem to work by accident but not do the expected operations.
The correct solution is to always look for the longest matching name.
This way the correct keyword will always be matched and used, and there
will be no risk of randomly picking the wrong one anymore.
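Conceptually, the lookup becomes something like this sketch
(simplified; variable names are illustrative):

    /* Sketch: scan the whole keyword list and keep the longest match
     * instead of stopping on the first one, so that e.g. "set-var-fmt"
     * can no longer be shadowed by "set-var".
     */
    const struct action_kw *best = NULL;
    size_t best_len = 0;

    for (kw = kw_list; kw->kw; kw++) {
        size_t len = strlen(kw->kw);

        if ((kw->flags & KWF_MATCH_PREFIX) &&
            strncmp(word, kw->kw, len) == 0 && len > best_len) {
            best = kw;
            best_len = len;
        }
    }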
This fix must be backported to the relevant stable releases.
The sc_need_room() function may be called with a negative value. In
this case, the intent is to be notified if any space is made in the
channel buffer. In the function, we take the min between the requested
room and the maximum possible room in the buffer, considering it may be
an HTX buffer. However this max value is unsigned and leads to an
unsigned comparison, casting the negative value to an unsigned one. Of
course, in this case, this always leads to the wrong result. This bug
seems to have no effect, but it is hard to be sure.
To fix the issue, we take care to respect the requested room sign by casting
the max value to a signed integer.
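The comparison thus becomes something like this (an illustrative
sketch; the helper and variable names are assumptions):

    /* Sketch: 'room' may be negative, meaning "wake me up as soon as
     * any space is made". Cast the unsigned maximum to a signed value
     * so the min preserves the negative request instead of wrapping.
     */
    ssize_t max_room = (ssize_t)channel_recv_max(chn);
    ssize_t needed   = MIN(room, max_room);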
This patch must be backported to 2.8.
Now forward_px only serves as a hint to know if a proxy was created
specifically for the sink, in which case the sink is responsible for
it. Everywhere forward_px was used in appctx context, we now get the
parent proxy from sft->srv instead.
This finally permits getting rid of the double link dependency between
sink and proxy.
The regtests are using the "feature()" predicate but this one can only
rely on build-time options. It would be nice if some runtime-specific
options could be detected at boot time so that regtests could more
flexibly adapt to what is supported (capabilities, splicing, etc).
Similarly, certain features that are currently enabled with USE_XXX
could also be automatically detected at build time using ifdefs and
would simplify the configuration, but then we'd lose the feature
report in the feature list which is convenient for regtests.
This patch makes sure that haproxy -vv shows the variable's contents
and not the macro's contents, and adds a new hap_register_feature()
to allow the code to register a new keyword.
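Registration could then look like this (a hedged example; the guard and
feature name are illustrative):

    /* Sketch: advertise a detected capability in the features list so
     * that regtests can check it via the feature() predicate.
     */
    #if defined(USE_LINUX_SPLICE)
        hap_register_feature("LINUX_SPLICE");
    #endif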
This patch adds a hash of the Origin header to the cache's secondary
key. This makes it possible to store and manage responses that have a
"Vary: Origin" header in the cache when vary support is enabled.
This cannot be considered as a means to manage CORS requests though; it
only processes the Origin header and hashes the presented value without
any form of URI normalization.
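For reference, vary processing is enabled in the cache section (an
illustrative config):

    cache mycache
        total-max-size 64
        process-vary on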
This need was expressed by Philipp Hossner in GitHub issue #251.
Co-Authored-by: Philipp Hossner <philipp.hossner@posteo.de>
This patch fixes the build with AWSLC and USE_QUIC=1. For now it is
only meant to make the build work, and it's not feature complete.
The set_encryption_secrets callback has been split in set_read_secret
and set_write_secret.
Missing features:
- 0RTT was disabled.
- TLS1_3_CK_CHACHA20_POLY1305_SHA256, TLS1_3_CK_AES_128_CCM_SHA256 were disabled
- clienthello callback is missing, certificate selection could be
limited (RSA + ECDSA at the same time)
If a "Content-length" or "Transfer-Encoding; chunked" headers is found or
inserted in an outgoing message, a specific flag is now set on the H1
stream. H1S_F_HAVE_CLEN is set for "Content-length" header and
H1S_F_HAVE_CHNK for "Transfer-Encoding: chunked".
This will be useful to properly format outgoing messages, even if one of
these headers was removed by hand (with no update of the message meta-data).
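While scanning the outgoing headers, the flagging could be sketched as
follows (simplified; only the flag names come from the patch, the rest
is assumed):

    /* Sketch: remember which framing headers are already present so
     * the outgoing message can be formatted accordingly later.
     */
    if (isteqi(n, ist("content-length")))
        h1s->flags |= H1S_F_HAVE_CLEN;
    else if (isteqi(n, ist("transfer-encoding")) &&
             isteqi(v, ist("chunked")))
        h1s->flags |= H1S_F_HAVE_CHNK;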
Refactor alloc_bind_address() function which is used to allocate a
sockaddr if a connection to a target server relies on a specific source
address setting.
The main objective of this change is to be able to use this function
outside of backend module, namely for preconnections using a reverse
server. As such, this function is now exported globally.
For reverse connect, there is no stream instance. As such, the function
parts which relied on it were reduced to the minimum. Now, the stream
is only used if a non-static address is configured, which is useful for
usesrc client|clientip|hdr_ip. These options make no sense for reverse
connect, so it should be safe to use the same function.
Improve the EACCES permission errors encountered when using the QUIC
connection socket at runtime:
* The first occurrence of the error on the process will generate a log
  warning. This should help users notice they are using a privileged
  port without the mandatory access rights.
* Socket mode will automatically fall back to listener socket for the
  receiver instance. This requires duplicating the settings from the
  bind_conf to the receiver instance to support configurations with
  multiple addresses on the same bind line.
Define a new bind option quic-socket:
quic-socket [ connection | listener ]
This new setting works in conjunction with the existing global setting
tune.quic.socket-owner and reuses the same semantics.
The purpose of this setting is to allow disabling connection socket
usage on listener instances individually. This will notably be useful
when it needs to be deactivated after encountering a fatal permission
error on bind() at runtime.
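For example (an illustrative bind line; the address, certificate path
and ALPN are arbitrary):

    frontend quic_fe
        bind quic4@0.0.0.0:443 ssl crt /etc/haproxy/cert.pem alpn h3 quic-socket listener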
When EBO was brought to pl_take_w() by plock commit 60d750d ("plock: use
EBO when waiting for readers to leave in take_w() and stow()"), a mistake
was made: the mask against which the current value of the lock is tested
excludes the first reader like in stow(), but it must not because it was
just obtained via an ldadd() which means that it doesn't count itself.
The problem this causes is that if there is exactly one reader when a
writer grabs the lock, the writer will not wait for it to leave before
starting its operations.
The solution consists in checking for any reader in the IF. However the
mask passed to pl_wait_unlock_*() must still exclude the lowest bit as
it's verified after a subsequent load.
Kudos to Remi Tricot-Le Breton for reporting and bisecting this issue
with a reproducer.
No backport is needed since this was brought in 2.9-dev3 with commit
8178a5211 ("MAJOR: threads/plock: update the embedded library again").
The code is now on par with plock commit ada70fe.
Add a new MUX flag MX_FL_REVERSABLE. This value is used to indicate
that the MUX instance supports connection reversal. For the moment,
only the HTTP/2 multiplexer is flagged with it.
This allows to dynamically check whether reversal can be completed
during MUX installation. This will allow relaxing the config-writing
requirement for 'tcp-request session attach-srv', which currently
cannot be mixed with non-http/2 listener instances, even if used
conditionally with an ACL.
Define a new connection error code, CO_ER_REVERSE. This will be used to
report an issue which happens on a connection targeted for reversal
before the reverse process is completed.
This reverts commit 072e77493961a06b89f853f4ab2bbf0e9cf3eff7.
While running h2load with h3 tests, we noticed this behavior:
Client ---- INIT no token SCID = a , DCID = A ---> Server (1)
Client <--- RETRY+TOKEN DCID = a, SCID = B ---- Server (2)
Client ---- INIT+TOKEN SCID = a , DCID = B ---> Server (3)
Client <--- INIT DCID = a, SCID = C ---- Server (4)
Client ---- INIT+TOKEN SCID = a, DCID = C ---> Server (5)
With (5) dropped by haproxy due to token validation.
Indeed the previous patch added the SCID of the RETRY packet sent to
the AAD used for token ciphering. It was useful to validate that the
next INIT packets including the token are sent by the client using the
newly provided SCID as DCID, as mentioned in RFC 9000.
But this stateless information is lost on received INIT packets
following the first outgoing INIT packet from the server, because the
client is also supposed to re-use a second time the latest received
SCID for its new DCID. This breaks the token validation on those last
packets and they get dropped by haproxy.
It was discussed there:
https://mailarchive.ietf.org/arch/msg/quic/7kXVvzhNCpgPk6FwtyPuIC6tRk0/
To summarize: it is not the role of the server to verify the re-use of
the RETRY's SCID as DCID in further client INIT packets.
The previous patch must be reverted in all versions where it was
backported (supposedly as far as 2.6).
When a stream is caught looping, we produce some output to help figure
out its internal state and explain why it's looping. The problem is
that this debug output is quite old and the info it provides is quite
insufficient to debug a modern process, and since such bugs happen only
once or twice a year the situation doesn't improve.
On the other hand the output of "show sess all" is extremely detailed
and kept up to date with code evolutions since it's a heavily used
debugging tool.
This commit replaces the call to the totally outdated stream_dump() with
a call to strm_dump_to_buffer(), and removes the filters dump since they
are already emitted there, and it now produces much more exploitable
output:
[ALERT] (5936) : A bogus STREAM [0x7fa8dc02f660] is spinning at 5653514 calls per second and refuses to die, aborting now! Please report this error to developers:
0x7fa8dc02f660: [28/Sep/2023:09:53:08.811818] id=2 proto=tcpv4 source=127.0.0.1:58306
flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x133f220, pend_pos=(nil) waiting=0 epoch=0x1
frontend=public (id=2 mode=http), listener=? (id=1) addr=127.0.0.1:4080
backend=public (id=2 mode=http) addr=127.0.0.1:61932
server=s1 (id=1) addr=127.0.0.1:7443
task=0x7fa8dc02fa40 (state=0x01 nice=0 calls=5749559 rate=5653514 exp=3s tid=1(1/1) age=1s)
txn=0x7fa8dc02fbf0 flags=0x3000 meth=1 status=-1 req.st=MSG_DONE rsp.st=MSG_RPBEFORE req.f=0x4c rsp.f=0x00
scf=0x7fa8dc02f5f0 flags=0x00000482 state=EST endp=CONN,0x7fa8dc02b4b0,0x05004001 sub=1 rex=58s wex=<NEVER>
h1s=0x7fa8dc02b4b0 h1s.flg=0x100010 .sd.flg=0x5004001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
.meth=GET status=0 .sd.flg=0x05004001 .sc.flg=0x00000482 .sc.app=0x7fa8dc02f660
.subs=0x7fa8dc02f608(ev=1 tl=0x7fa8dc02fae0 tl.calls=0 tl.ctx=0x7fa8dc02f5f0 tl.fct=sc_conn_io_cb)
h1c=0x7fa8dc0272d0 h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc0273f0 .exp=<NEVER>
co0=0x7fa8dc027040 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x12840c0
flags=0x00000300 fd=32 fd.state=20 updt=0 fd.tmask=0x2
scb=0x7fa8dc02fb30 flags=0x00001411 state=EST endp=CONN,0x7fa8dc0300c0,0x05000001 sub=1 rex=58s wex=<NEVER>
h1s=0x7fa8dc0300c0 h1s.flg=0x4010 .sd.flg=0x5000001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
.meth=GET status=0 .sd.flg=0x05000001 .sc.flg=0x00001411 .sc.app=0x7fa8dc02f660
.subs=0x7fa8dc02fb48(ev=1 tl=0x7fa8dc02feb0 tl.calls=2 tl.ctx=0x7fa8dc02fb30 tl.fct=sc_conn_io_cb)
h1c=0x7fa8dc02ff00 h1c.flg=0x80000000 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc030020 .exp=<NEVER>
co1=0x7fa8dc02fcd0 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x133f220
flags=0x10000300 fd=33 fd.state=10421 updt=0 fd.tmask=0x2
req=0x7fa8dc02f680 (f=0x1840000 an=0x8000 pipe=0 tofwd=0 total=79)
an_exp=<NEVER> buf=0x7fa8dc02f688 data=(nil) o=0 p=0 i=0 size=0
htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
res=0x7fa8dc02f6d0 (f=0x80000000 an=0x1400000 pipe=0 tofwd=0 total=0)
an_exp=<NEVER> buf=0x7fa8dc02f6d8 data=(nil) o=0 p=0 i=0 size=0
htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
call trace(10):
| 0x59f2b7 [0f 0b 0f 1f 80 00 00 00]: stream_dump_and_crash+0x1f7/0x2bf
| 0x5a0d71 [e9 af e6 ff ff ba 40 00]: process_stream+0x19f1/0x3a56
| 0x68d7bb [49 89 c7 4d 85 ff 74 77]: run_tasks_from_lists+0x3ab/0x924
| 0x68e0b4 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x374/0x6d6
| 0x656f67 [83 3d f2 75 84 00 01 0f]: run_poll_loop+0x127/0x5a8
| 0x6575d7 [48 8b 1d 42 50 5c 00 48]: main+0x1b22f7
| 0x7fa8e0f35e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
| 0x7fa8e0e5a4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a
Note that the output is subject to the global anon key so that IPs and
object names can be anonymized if required. It could make sense to
backport this and the few related previous patches next time such an
issue is reported.
There used to be two working modes for this function, a single-line one
and a multi-line one, the difference being made on the "eol" argument
which could contain either a space or an LF (and with the prefix being
adjusted accordingly). Let's get rid of the single-line mode as it's
what limits the output contents because it's difficult to produce
exploitable structured data this way. It was only used in the rare case
of spinning streams and applets and these are the ones lacking info. Now
a spinning stream produces:
[ALERT] (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers:
strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1
txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0
rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000
scf=0x24af280,EST,482 scb=0x24af430,EST,1411
af=(nil),0 sab=(nil),0
cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33)
cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32)
filters={}
call trace(11):
(...)
Since 2.4-dev18 with commit b4476c6a8 ("CLEANUP: freq_ctr: make
arguments of freq_ctr_total() const"), most of the freq_ctr readers
should be fine with a const, except that they were not updated to
reflect this and they continued to force non-const arguments on some
functions that call them. Let's update this. This could even be
backported if needed.
Added "set-timeout" for the frontend side of the session, so it can be
used to set custom per-client timeouts if needed. Also added the
"cur_client_timeout" sample fetch to retrieve the current client
timeout.
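For example (an illustrative config; the condition and value are
arbitrary):

    frontend fe
        bind :8080
        # give clients hitting the upload path a larger timeout
        http-request set-timeout client 30s if { path_beg /upload }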
Add reporting using send_log() for the preconnect operation. This is
minimal, to ensure we understand the current status of a listener in
active reverse connect.
To limit the logging quantity, only important transitions are
considered. This requires implementing a minimal state machine as a new
field in the receiver structure.
Here are the logs produced:
* Initiating : first time preconnect is enabled on a listener
* Error : last preconnect attempt interrupted on a connection error
* Reaching maxconn : all necessary connections were reversed and are
operational on a listener
When a connection is freed during preconnect before reversal, the error
must be notified to the listener to remove any connection reference and
rearm a new preconnect attempt. Currently, this can occur through 2
code paths:
* conn_free() called directly by H2 mux
* error during conn_create_mux(). For this case, connection is flagged
with CO_FL_ERROR and reverse_connect task is woken up. The process
task handler is then responsible to call conn_free() for such
connection.
Duplicated steps were done both in conn_free() and the process task
handler. These are now removed. To facilitate code maintenance, the
dedicated operations have been centralized in a new function,
rev_notify_preconn_err(), which is called by conn_free().
This patch adds the ability to externalize and customize the code
of the computation of extra CIDs after the first one was derived from
the ODCID.
This is to prepare interoperability with extra components such as
different QUIC proxies or routers for instance.
To do so, the patch defines two function callbacks:
- the first one computes a 64-bit hash from the first generated CID
  (itself still derived from the ODCID). The resulting hash is stored
  into the 'quic_conn'; 64 bits is chosen large enough to be able to
  store an entire haproxy CID.
- the second one re-uses the previously computed hash to derive an
  extra CID using the custom algorithm. If not set, haproxy will
  continue to choose a randomized CID value.
Both functions are also passed the 'cluster_secret' as an argument:
this way, it is usable for obfuscation or ciphering.
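The two hooks could be typed along these lines (hypothetical
signatures, for illustration only):

    /* Illustrative callback types: the first derives a 64-bit hash
     * from the first CID, the second derives extra CIDs from it.
     */
    typedef uint64_t (*cid_hash_cb)(const unsigned char *cid,
                                    size_t cid_len,
                                    const unsigned char *secret,
                                    size_t secret_len);
    typedef int (*cid_derive_cb)(unsigned char *out, size_t out_len,
                                 uint64_t hash,
                                 const unsigned char *secret,
                                 size_t secret_len);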
Function comments were outdated, probably because they have not been
updated during the previous refactors.
Fix the comments to better reflect the current behavior.
This may be backported up to 2.2, or even 2.0 by slightly adapting the
patch (in 2.0, such functions are documented in proto/pattern.h)
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
The log server lock is pretty visible in perf top when using log samples
because it's taken for each server in turn while trying to validate and
update the log server's index. Let's change this for a CAS, since we have
the index and the range at hand now. This allows us to remove the logsrv
lock.
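The update then becomes a lock-free retry loop along these lines (a
simplified sketch; the structure and field names are assumptions):

    /* Sketch: atomically bump the sample index without taking the
     * logsrv lock. If the CAS fails, another thread won the race;
     * 'curr' is reloaded and the computation is retried.
     */
    uint32_t curr, next;

    curr = HA_ATOMIC_LOAD(&smp_info->curr_idx);
    do {
        next = (curr + 1) % smp_info->smp_sz;
    } while (!HA_ATOMIC_CAS(&smp_info->curr_idx, &curr, next));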
The test on 4 servers now shows a 3.7 times improvement thanks to much
lower contention. Without log sampling a test producing 4.4M logs/s
delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel.
After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would
previously drop to 1.13M log/s with 37 CPUs used and 75% spent in
process_send_log(). Now with this change, 4.25M logs/s are emitted,
using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput
improvement for a 30% global CPU usage reduction, but in practice it
mostly shows that the performance drop caused by having samples is much
less noticeable (each of the 4 servers has its index updated for each
log).
Note that in order to avoid even incrementing an index for each log srv
that is consulted, it would be more convenient to have a single index
per frontend and apply the modulus on each log server in turn to see if
the range has to be updated. It would then only perform one write per
range switch. However the place where this is done doesn't have access
to a frontend, so some changes would need to be performed for this, and
it would require updating the current range independently in each
logsrv, which is not necessarily easier since we don't know yet if we
can commit it.
By using a single long long to store both the current range and the
next index, we'll make it possible to perform atomic operations instead
of locking. Let's only regroup them for now under a new "curr_rg_idx".
The upper word is the range, the lower is the index.
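Conceptually the packing looks like this (an illustrative sketch):

    /* Sketch: pack/unpack the current range (upper word) and index
     * (lower word) so both can later be manipulated atomically.
     */
    static inline uint64_t pack_rg_idx(uint32_t rg, uint32_t idx)
    {
        return ((uint64_t)rg << 32) | idx;
    }

    static inline uint32_t rg_of(uint64_t v)  { return v >> 32; }
    static inline uint32_t idx_of(uint64_t v) { return (uint32_t)v; }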
This index is useless because it only serves to know when the global
index reached the end, while the global one already knows it. Let's
just drop it and perform the test on the global range.
It was verified with the following config that the first server continues
to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the
4th one 4/10:
    log 127.0.0.1:10001 sample 1:10 local0
    log 127.0.0.1:10002 sample 2,5:10 local0
    log 127.0.0.1:10003 sample 3,7,9:10 local0
    log 127.0.0.1:10004 sample 4,6,8,10:10 local0
The test of the log range is not very clear, in part due to the
reuse of the "curr_idx" name that happens at two levels. The call
to in_smp_log_range() applies to the smp_info's index to which 1 is
added: it verifies that the next index is still within the current
range.
Let's just have a local variable "next_index" in process_send_log()
that gets assigned the next index (current+1) and compare it to the
current range's boundaries. This makes the test much clearer. We can
then simply remove in_smp_log_range() that's no longer needed.
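The resulting test is roughly this (simplified; the range field names
are assumptions):

    /* Sketch: compute the next index once and compare it against the
     * current range's boundaries, replacing in_smp_log_range().
     */
    uint32_t next_index = curr_idx + 1;
    int in_range = (next_index >= curr_rg->low &&
                    next_index <= curr_rg->high);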
This reverts commit c618ed5ff41ce29454e784c610b23bad0ea21f4f.
The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However, while doing this in the quic_sock code seems to work, doing it
inside the macro shows total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial;
let's roll this back for the time it will take to fix the loop.