session_add_conn() uses three argument : connection and session
instances, plus a void pointer labelled as target. Typically, it
represents the server, but can also be a backend instance (for example
on dispatch).
In fact, this argument is redundant as <target> is already a member of
the connection. This commit simplifies session_add_conn() by removing
it. A BUG_ON() on target is extended to ensure it is never NULL.
On stream detach on backend side, connection is inserted in the proper
server/session list to be able to reuse it later. If insertion fails and
the connection is idle, the connection can be removed immediately.
If this occurs on a QUIC connection, QUIC MUX implements graceful
shutdown to ensure the server is notified of the closure. However, the
connection instance is not freed. Change this to ensure that both
shutdown and release is performed.
This is preparation work for shared counters between co-processes. As
co-processes will need to share a common date. global_now_ms will be used
for that as it will point to the shm when sharing is enabled.
Thus in this patch we turn global_now_ms into a pointer (and adjust the
places where it is written to and read from, hopefully atomic operations
through pointer are already used so the change is trivial)
For now global_now_ms points to process-local _global_now_ms which is a
fallback for when sharing through the shm is not enabled.
75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing the shared
stats directly in counters struct") took care of renaming
counters_fe_shared_init() but we forgot counters_be_shared_init().
Let's fix that for consistency
As discussed in GH #3051, default-path is not taken into account when
loading files using lua-load-per-thread. In fact, the initial
hlua_load_state() (performed on first thread which parses the config)
is successful, but other threads run hlua_load_state() later based
on config hints which were saved by the first thread, and those config
hints only contain the file path provided on the lua-load-per-thread
config line, not the absolute one. Indeed, `default-path` directive
changes the current working directory only for the thread parsing the
configuration.
To fix the issue, when storing config hints under hlua_load_per_thread()
we now make sure to save the absolute file path for `lua-load-per-thread'
argument.
Thanks to GH user @zhanhb for having reported the issue
It may be backported to all stable versions.
Implement traces for the ACME protocol.
-dt acme:data:complete will dump every input and output buffers,
including decoded buffers before being converted to JWS.
It will also dump certificates in the traces.
-dt acme:user:complete will only dump the state of the task handler.
Following c24de07 ("OPTIM: stats: store fast sharded counters pointers
at session and stream level") some crashes were observed in
connect_server():
#0 0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
2101 _HA_ATOMIC_INC(&s->sv_tgcounters->connect);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 pcre-8.32-17.el7.x86_64
(gdb) bt
#0 0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
#1 0x00000000007baff8 in back_try_conn_req (s=0x65117b0) at src/backend.c:2378
#2 0x00000000006c0e9f in process_stream (t=0x650f180, context=0x65117b0, state=8196) at src/stream.c:2366
#3 0x0000000000bd3e51 in run_tasks_from_lists (budgets=0x7ffd592752e0) at src/task.c:655
#4 0x0000000000bd49ef in process_runnable_tasks () at src/task.c:889
#5 0x0000000000851169 in run_poll_loop () at src/haproxy.c:2834
#6 0x0000000000851865 in run_thread_poll_loop (data=0x1a03580 <ha_thread_info>) at src/haproxy.c:3050
#7 0x0000000000852a53 in main (argc=7, argv=0x7ffd592755f8) at src/haproxy.c:3637
Here the crash occurs during the atomic inc of a sv_tgcounters metric from
the stream pointer, which tells us the pointer is likely garbage.
In fact, we assign s->sv_tgcounters each time the stream target is set to
a valid server. For that we use stream_set_srv_target() helper which does
assigment for us. By reviewing the code, in turns out we forgot to call
stream_set_srv_target() in pendconn_dequeue(), where the stream target
is set to the server who picked the pendconn.
Let's fix the bug by using stream_set_srv_target() there.
No backport needed unless c24de07 is.
Following commit 75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing
the shared stats directly in counters struct"), in order to minimize the
impact of the recent sharded counters work, we try to push things a bit
further in this patch by storing and using "fast" pointers at the session
and stream levels when available to avoid costly indirections and
systematic "tgid" resolution (which can not be cached by the CPU due to
its THREAD-local nature).
Indeed, we know that a session/stream is tied to a given CPU, thanks to
this we know that the tgid for a given session/stream will never change.
Given that, we are able to store sharded frontend and listener counters
pointer at the session level (namely sess->fe_tgcounters and
sess->li_tgcounters), and once the backend and the server are selected,
we are also able to store backend and server sharded counters
pointer at the stream level (namely s->be_tgcounters and s->sv_tgcounters)
Everywhere we rely on these counters and the stream or session context is
available, we use the fast pointers it instead of the indirect pointers
path to make the pointer resolution a bit faster.
This optimization proved to bring a few percents back, and together with
the previous 75e480d10 commit we now fixed the performance regression (we
are back to back with 3.2 stats performance)
Since commit 7293eb68 ("MEDIUM: peers: use server as stream target") peer
session target always point to server in order to benefit from existing
server transport options.
Thanks to that, it is no longer necessary to have peer_session_target()
helper function, because all it does is return the pointer to the
server object. Let's get rid of that
Between 3.2 and 3.3-dev we noticed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing accross multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.
More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.
We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.
As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.
counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.
Since ccc43412 ("OPTIM: log: use thread local lf_buildctx to stop pushing
it on the stack"), recursively calling sess_build_logline_orig(), which
may for instance happen when leveraging %ID (or unique-id fetch) for the
first time, would lead to undefined behavior because the parent
sess_build_logline_orig() build context was shared between recursive calls
(only one build ctx per thread to avoid pushing it on the stack for each
call)
In short, the parent build ctx would be altered by the recursive calls,
which is obviously not expected and could result in log formatting errors.
To fix the issue but still avoid polluting the stack with large lf_buildctx
struct, let's move the static 256 bytes build buffer out of the buildctx
so that the buildctx is now stored in the stack again (each function
invokation has its own dedicated build ctx). On the other hand, it's
acceptable to have only 1 256 bytes build buffer per thread because the
build buffer is not involved in recursives calls (unlike the build ctx)
Thanks to Willy and Vincent Gramer for spotting the bug and providing
useful repro.
It should be backported in 3.0 with ccc43412.
To motivate developers to support the new applets API, a warning is now
emitted when a legacy applet is spawned. To not flood users, this warning is
only emitted once per legacy applet. To do so, the applet flag
APPLET_FL_WARNED was added. It is set when the warning is emitted.
Note that test and set on this flag are not performed via atomic operations.
So it is possible to have more than one warning for a given applet if it is
spawned in same time on several threads. At worrst, there is one warning per
thread.
A new field was added in the applet structure to be able to set flags on the
applets The first one is related to the new API. APPLET_FL_NEW_API is set
for applets based on the new API. It was set on all HAProxy's applets.
Thanks to this patch, the peer applet is now using its own buffers. .rcv_buf
and .snd_buf callback functions are now defined to use the default raw
functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
Thanks to this patch, the sink applets is now using their own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
raw functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
Thanks to this patch, the log applet is now using its own buffers. .rcv_buf
and .snd_buf callback functions are now defined to use the default raw
functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
Thanks to this patch, the http-client applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
HTX functions. Parts to receive and send data have also been updated to use
the applet API and to remove any dependencies on the stream-connectors and
the channels.
In the CLI I/O handler interacting with the HTTP client, in HTX mode, after
a dump of the HTX message, data must be removed. Instead of removng all
blocks one by one, we can call htx_reset() because all the message must be
flushed.
In the CLI I/O handler interacting with the HTTP client, we must not try to
push raw headers in HTX mode, because there is no raw data in this
mode. This prevent the HTX dump at the end of the I/O handle.
It is a 3.3-specific issue. No backport needed.
The first HTX block of a response must be a start-line. There is no reason
to wait for something else. And if there are output data in the response
channel buffer, it means we must found the start-line.
There is no reason to yield after sending the request headers, except if the
request was fully sent. If there is a payload, it is better to send it as
well. However, when the whole request was sent, we can leave the I/O handler.
Thanks to this patch, the dns_session applet is now using its own
buffers. .rcv_buf and .snd_buf callback functions are now defined to use the
default raw functions. Functions to receive and send data have also been
updated to use the applet API and to remove any dependencies on the
stream-connectors and the channels.
The issue was introduced by commit 27236f221 ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers"). In this patch, to delay the
reconnection, a timer is used on the appctx when it is created. This
postpones the appctx initialization. However, once initialized, the
expiration time of the underlying task is not reset. So, it is always
considered as expired and the appctx is woken up in loop.
The fix is quite simple. In dns_session_init(), the expiration time of the
appctx's task is alwaus set to TICK_ETERNITY.
This patch must be backported everywhere the commit above was backported. So
as far as 2.8 for now but possibly to all stable versions.
Thanks to this patch, the lua cosocket applet is now using its own
buffers. .rcv_buf and .snd_buf callback functions are now defined to use the
default raw functions. Functions to receive and send data have also been
updated to use the applet API and to remove any dependencies on the
stream-connectors and the channels.
It is a fix similar to the previous one ("BUG/MEDIUM: hlua: Report to SC
when data were consumed on a lua socket"), but for the write side. The
writer must notify the cosocket it needs more space in the request buffer to
produce more data by calling sc_need_room(). Otherwise, there is nothing to
prevent to wake the cosocket applet up again and again.
This patch must be backported as far as 2.8, and maybe to 2.6 too.
The lua cosocket are quite strange. There is an applet used to handle the
connection and writer and readers subscribed on it to write or read
data. Writers and readers are tasks woken up by the cosocket applet when
data can be consumed or produced, depending on the channels buffers
state. Then the cosocket applet is woken up by writers and readers when read
or write events were performed.
It means the cosocket applet has only few information on what was produced
or consumed. It is the writers and readers responsibility to notify any
blocking. Among other things, the readers must take care to notify the
stream on top of the cosocket applet that some data was consumed. Otherwise,
it may remain blocked, waiting for a write event (a write event from the
stream point of view is a read event from the cosocket point of view).
Thie patch must be backported as far as 2.8, and maybe to 2.6 too.
Thanks to this patch, the lua HTTP applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
HTX functions. Functions to receive and send data have also been updated to
use the applet API and to remove any dependencies on the stream-connectors
and the channels.
hlua_http_get_headers() function was using the HTTP message from the stream
TXN to retrieve headers from a message. However, this will be an issue to
update the lua HTTP applet to use its own buffers. Indeed, in that case,
information from the channels will be unavailable. So now,
hlua_http_get_headers() is now using a buffer containing an HTX message. It
is just an API change bacause, internally, the function was already
manipulation an HTX message.
When a lua HTTP applet is created, a "request" object is created, filled
with the request information (method, path, headers...), to be able to
easily retrieve these information from the script. However, this was done
when thee appctx was created, retrieving the info from the request channel.
To be ale to update the applet to use its own buffer, it is now performed on
the first applet run. Indead, when the applet is created, the info are not
forwarded yet and should not be accessed. Note that for now, information are
still retrieved from the channel.
Thanks to this patch, the lua TCP applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
raw functions. Other changes are quite light. Mainly, end of stream and
errors are reported on the appctx instead of the stream-endpoint descriptor.
When an HTTP applet tries to retrieve data, the request headers are still in
the buffer. But, instead of being silently removed, their size is removed
from the amount of data retrieved. When the request payload is fully
retrieved, it is not an issue. But it is a problem when a length is
specified. The data are shorten from the headers size.
So now, we take care to silently remove headers.
This patch must be backported to all stable versions.
The curve name to curve id mapping table was built out of multiple
internal tables found in openssl sources, namely the 'nid_to_group'
table found in 'ssl/t1_lib.c' which maps openssl specific NIDs to public
IANA curve identifiers. In this table, there were two instances of
EVP_PKEY_XXX ids being used while all the other ones are NID_XXX
identifiers.
Since the two EVP_PKEY are actually equal to their NID equivalent in
'include/openssl/evp.h' we can use NIDs all along for better coherence.
Allow the "processing" status in the challenge object when requesting
to do the challenge, in addition to "pending".
According to RFC 8555 https://datatracker.ietf.org/doc/html/rfc8555/#section-7.1.6
Challenge objects are created in the "pending" state. They
transition to the "processing" state when the client responds to the
challenge (see Section 7.5.1)
However some CA could respond with a "processing" state without ever
transitioning to "pending".
Must be backported to 3.2.
If a backend connection is private, it should not be reused outside of
its original attached session. As such, on stream detach operation, such
connection is never inserted into server idle/avail list. Instead, it is
stored directly on the session.
The purpose of this commit is to implement proper handling of private
backend connections via QUIC multiplexer.
QUIC connection graceful closure is performed in two steps. First, the
application layer is closed. In the context of HTTP/3, this is done with
a GOAWAY frame emission, which forbids opening of new streams. Then the
whole connection is terminated via CONNECTION_CLOSE which is the final
emitted frame.
This commit ensures that when app layer is shut for a backend
connection, this connection is removed from either idle or avail server
tree. The objective is to prevent stream layer to try to reuse a
connection if no new stream can be attached on it.
New BUG_ON checks are inserted in qmux_strm_attach() and h3_attach() to
ensure that this assertion is always true.
Implement support for QUIC connection reuse on the backend side. The
main change is done during detach stream operation. If a connection is
idle, it is inserted in the server list. Else, it is stored in the
server avail tree if there is room for more streams.
For non idle connection, qmux_avail_streams() is reused to detect that
stream flow-control limit is not yet reached. If this is the case, the
connection is not inserted in the avail tree, so it cannot be reuse,
even if flow-control is unblocked later by the peer. This latter point
could be improved in the future.
Note that support for QUIC private connections is still missing. Reuse
code will evolved to fully support this case.
Add a new <sess> member into QCS structure. It is used to store the
parent session of the stream on attach operation. This is only done for
backend side.
This new member will become necessary when connection reuse will be
implemented. <owner> member of connection is not suitable as it could be
set to NULL, notably after a session_add_conn() failure.
Also, a single BE conn can be shared along different session instance,
in particular when using aggressive/always reuse mode. Thus it is
necessary to linked each QCS instance with its session.
For now, QUIC glitch limit counter is only available on the frontend
side. Thus, disable incrementation on the backend side for now. Also,
session is only available as conn <owner> reliably on the frontend side,
so session_add_glitch_ctr() operation is also securised.
qcc_refresh_timeout() is the function called on QUIC MUX activity. Its
purpose is to update the timeout by selecting the correct value
depending on the connection state.
Prior to this patch, backend connections were mostly ignored by the
function. However, the default server timeout was selecting as a
fallback. This is incompatible with backend connections reuse.
This patch fixes timeout applied on backend connections. Only values
specific to frontend which are http-request and http-keep-alive timeouts
are now ignored for a backend connection. Also, fallback timeout is only
used for frontend connections.
This patch ensures that an idle backend connection won't be deleted due
to server timeout. This is necessary for proper connection reuse which
will be implemented in a future patch.
This commit is a small reorganization of condition used into
qcc_refresh_timeout(). Its objective is to render the code more logical
before the next patch which will ensure that timeout is properly set for
backend connections.
If a connection remains on a proxy currently disabled or stopped, a
special spread timeout is set if active close is configured. For QUIC
MUX, this is set via qcc_refresh_timeout() as with all other timeout
values.
Fix this closing timeout setting : it is now used as an override to any
other timeout that may have been chosen if calculated spread time is
lower than the previously selected value. This is done for backend
connections as well.
This should be backported up to 2.6 after a period of observation.
When no stream is attached, mux layer is responsible to maintain a
timeout. The first criteria is to apply client/server timeout if there
is still data waiting for emission.
Previously, <hreq> qcc member was used to determine this state. However,
this only covers bidirectional streams. Fix this by testing if
<send_list> is empty or not. This is enough to take into account both
bidi and uni streams.
Theorically, this should be backported to every stable versions.
However, send-list is not available on 2.6 and there is no alternative
to quickly determine if there is waiting output data. Thus, it's better
to backport it up to 2.8 only.
The requests that checked the status of the challenge and the retrieval
of the certificate were done using a GET.
This is working with letsencrypt and other CA providers, but it might
not work everywhere. RFC 8555 specifies that only the directory and
newNonce resources MUST work with a GET requests, but everything else
must use POST-as-GET.
Must be backported to 3.2.
"log-steps" was already ignored if directly defined in a backend section,
however, when defined in a defaults section it was inherited to all
proxies no matter their capability (ie: including backends).
As configurations often contain more backends than frontends, this would
result in wasted memory given that the log-steps setting is only
considered on frontends.
Let's fix that by preventing the inheritance from defaults section to
anything else than frontends. Also adjust the documentation to mention
that the setting in not relevant for backends.
Due to the introduction of smallbuf usage for HTTP/3 headers emission,
ret variable may be used uninitialized if buffer allocation fails due to
not enough room in QUIC connection window.
Fix this by setting ret value to 0.
Function variable declaration are also adjusted so that the pattern is
similar to h3_resp_headers_send(). Finally, outbuf buffer is also
removed as it is now unused.
No need to backport.