It is now possible to set the stream log level from a "tcp-request content"
or "tcp-response content" ruleset. To do so, the action parsing is moved into
stream.c and the action evaluation is handled in a dedicated function.
This patch should fix issue #1306. It may be backported as far as 2.2 if
necessary.
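For example, assuming the action keyword follows the HTTP rulesets and is
named "set-log-level" (a hedged sketch, not taken from this patch):

    frontend fe_tcp
        mode tcp
        bind :2000
        tcp-request inspect-delay 5s
        tcp-request content set-log-level silent if { src 10.0.0.0/8 }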
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
This reverts commit c83e45e9b0.
The recent changes since 5304669e1 ("MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn") opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
This essentially reverts commit 2b4370078 ("MINOR: lb/api: let callers
of take_conn/drop_conn tell if they have the lock") that was merged
during 2.4 before the various locks could be eliminated at the lower
layers. Passing that information complicates the cleanup of the queuing
code and it's become useless.
The server_parse_maxconn_change_request function takes the server lock.
However, this function can be called via agent-checks or lua code which
already hold it. This bug has been introduced by the following commit :
    commit 79a88ba3d0
    BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
This commit tried to fix another deadlock which can occur because
previously server_parse_maxconn_change_request required the server lock
to be held. However, it may internally call process_srv_queue which also
takes the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.
This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
server_parse_maxconn_change_request must again be called with the
server lock.
- to counter the deadlock fixed by the above commit, process_srv_queue
now takes an argument to render the server locking optional if the
caller already held it. This is only used by
server_parse_maxconn_change_request.
The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.
This reverts commit 5b82cc5b5c. The purpose of
this commit was to fully handle L7 retries in the HTTP analysers and stop
dealing with the L7 buffer in si_cs_send()/si_cs_recv(). It is of course
cleaner this way. But there is a huge drawback. The L7 buffer is reserved
from the time the request analysis is finished until the moment the response
is received. For a small request, the analysis is finished before the
connection to the server is established. Thus the L7 buffer would be kept for
queued sessions while it is not mandatory.
So, for now, the commit is reverted to go back to the less expensive
solution. This patch must be backported to 2.4.
There were 102 CLI commands whose help messages zig-zagged all along the dump,
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage its use instead of the
lazier LIST_DELETE, which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
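For out-of-tree code, the mapping is mechanical. A minimal sketch assuming the
usual struct list API (LIST_HEAD_INIT and a hypothetical struct item):

    struct item { struct list list; int v; };
    struct list head = LIST_HEAD_INIT(head);
    struct item it = { .list = LIST_HEAD_INIT(it.list) };

    LIST_APPEND(&head, &it.list);   /* formerly LIST_ADDQ: append at the tail */
    if (LIST_INLIST(&it.list))      /* formerly LIST_ADDED: element in a list? */
            LIST_DELETE(&it.list);  /* formerly LIST_DEL: unlink without reinit */
    LIST_INSERT(&head, &it.list);   /* formerly LIST_ADD: insert at the head */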
Both structures are identical except for the name of the field marking the
start of the period and its description. Let's call them all freq_ctr and the
period's start "curr_tick" which is generic.
This is only a temporary change and fields are expected to remain
the same with no code change (verified).
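A rough sketch of the unified structure (field names other than curr_tick are
indicative only):

    struct freq_ctr {
            unsigned int curr_tick; /* start of the current period */
            unsigned int curr_ctr;  /* events seen in the current period */
            unsigned int prev_ctr;  /* events seen in the previous period */
    };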
This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1)
or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and
HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.
The fetch_and_xxx variant is often missing for add/sub/and/or. In fact
it was only provided for ADD under the name XADD, which corresponds to
the x86 instruction name. But for destructive operations like AND and
OR it is missed even more, since without it there is no way to know the
value before modifying it.
This patch explicitly adds HA_ATOMIC_FETCH_{OR,AND,ADD,SUB} which
cover these standard operations, and renames XADD to FETCH_ADD (there
were only 6 call places).
In the future, backport of fixes involving such operations could simply
remap FETCH_ADD(x) to XADD(x), FETCH_SUB(x) to XADD(-x), and for the
OR/AND if needed, these could possibly be done using BTS/BTR.
It's worth noting that xchg could have been renamed to fetch_and_store()
but xchg already has well understood semantics and it wasn't needed to
go further.
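Usage sketch of the new macros on hypothetical variables (same semantics as
the compiler's __atomic_fetch_* builtins, i.e. the value before the change is
returned):

    unsigned int flags = 0, count = 0, old;

    old = HA_ATOMIC_FETCH_ADD(&count, 1);     /* count before the addition */
    old = HA_ATOMIC_FETCH_OR(&flags, 0x4);    /* flags before setting the bit */
    old = HA_ATOMIC_FETCH_AND(&flags, ~0x4u); /* flags before clearing the bit */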
It is now possible to perform HTTP upgrades on a TCP stream from the
frontend side. To do so, a tcp-request content rule must be defined with the
switch-mode action, specifying the mode (for now, only http is supported)
and optionally the proto (h1 or h2).
This way it becomes possible to set HTTP directives on a TCP frontend which
will only be evaluated if an upgrade is performed. This new way to perform
HTTP upgrades should progressively replace the old one, which consists in
routing the request to an HTTP backend. And it should also be a good start to
remove all HTTP processing from tcp-request content rules.
This action is terminal, it stops the ruleset evaluation. It is only
available on proxies with the frontend capability.
The configuration manual has been updated accordingly.
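A hedged configuration sketch (frontend name and ACL are illustrative only):

    frontend fe_tcp
        mode tcp
        bind :8000
        tcp-request inspect-delay 5s
        tcp-request content switch-mode http proto h1 if { req.proto_http }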
The code responsible for performing an HTTP upgrade from a TCP stream is moved
into a dedicated function, stream_set_http_mode().
The stream_set_backend() function is slightly updated, especially to
correctly set the request analysers.
Now allocation and initialization of HTTP transactions are performed in a
unique function. Historically, there were two functions because the same TXN
was reset for K/A connections in the legacy HTTP mode. Now, in HTX, K/A
connections are handled at the mux level. A new stream, and thus a new TXN,
is created for each request. In addition, the function responsible for ending
the TXN is now also responsible for releasing it.
So, now, http_create_txn() and http_destroy_txn() must be used to create and
destroy an HTTP transaction.
It is just a small cleanup. AN_REQ_FLT_HTTP_HDRS and AN_RES_FLT_HTTP_HDRS
analysers are now set in HTTP analysers at the same place
AN_REQ_HTTP_XFER_BODY and AN_RES_HTTP_XFER_BODY are set.
We now use the stream instead of the proxy to know if we are processing HTTP
data or not. If the stream is an HTX stream, it means we are dealing with
HTTP data. It is more accurate than the proxy mode because when an HTTP
upgrade is performed, the proxy is not changed and only the stream may be
used.
Note that it was not a problem to rely on the proxy because HTTP upgrades
could only happen when an HTTP backend was set. But, we will add support
for HTTP upgrades on the frontend side, after the tcp-request rules
evaluation. In this context, we cannot rely on the proxy mode.
Always set frontend HTTP analysers when an HTX stream is created. It is only
useful in case of a destructive HTTP upgrade (TCP>H2) because the frontend is
a TCP proxy.
In fact, to be strict, we must only set these analysers when the upgrade is
performed before setting the backend (it is not supported yet, but this
patch is required to do so), in the frontend part. If the upgrade happens
when the backend is set, it means the HTTP processing is just the backend's
business. But there is no way to make the difference when a stream is
created, at least for now.
When a TCP stream is upgraded to H2 stream, a destructive upgrade is
performed. It means the TCP stream is silently released while a new one is
created. It is of course more complicated but it is what we observe from the
stream point of view.
That was performed by returning an error when the backend was set. It is
neither really elegant nor accurate. So now, instead of returning an error
from stream_set_backend() in case of destructive HTTP upgrades, the TCP
stream processing is aborted and no error is reported. However, the result
is more or less the same.
The function's purpose used to be to fail a buffer allocation if that
allocation wouldn't result in leaving some buffers available. Thus,
some allocations could succeed and others fail for the sole purpose of
trying to provide 2 buffers at once to process_stream(). But things
have changed a lot with 1.7 breaking the promise that process_stream()
would always succeed with only two buffers, and later the thread-local
pool caches that keep certain buffers available that are not accounted
for in the global pool so that local allocators cannot guess anything
from the number of currently available pools.
Let's just replace all last uses of b_alloc_margin() with b_alloc() once
for all.
Another way to say it: "Safely unlink requesters from requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list which is not necessarily safe, so unlinking at this
place may be unsafe. And it is useless to try to make these loops safe because
all this stuff is placed in a loop on a resolution list: unlinking a requester
may lead to releasing a resolution if it is the last requester.
However, the unlink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only resets it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remain confusing due to
this naming. They are now called "strm", which does not even affect
indenting nor alignment.
It's been too short for quite a while now and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because this required to update the task's
process() prototype to use an int instead of a short, that's quite a
bunch of places.
Using abort() occasionally results in an unexploitable core due to issues
rewinding the stack. Let's use ABORT_NOW() which, in addition to crashing
much closer to the call point, also has the benefit of showing the call
trace.
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows placing some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce a compile-time error if trying to
free something that is not a regular pointer (e.g. a function).
The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.
178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:
@ rule @
expression E;
@@
- free(E);
- E = NULL;
+ ha_free(&E);
It was verified that the resulting code is the same, except for a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.
A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
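Conceptually the helper boils down to the following sketch (the real one also
carries the __builtin_constant_p checks and the DEBUG_MEM_STATS instance
mentioned above):

    /* sketch: free *ptr and reset it so a stale pointer cannot be reused */
    #define ha_free(x) do { free(*(x)); *(x) = NULL; } while (0)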
The lock was still used exclusively to deal with the concurrency between
the "show sess" release handler and a stream_new() or stream_free() on
another thread. All other accesses made by "show sess" are already done
under thread isolation. The release handler only requires to unlink its
node when stopping in the middle of a dump (error, timeout etc). Let's
just isolate the thread to deal with this case so that it's compatible
with the dump conditions, and remove all remaining locking on the streams.
This effectively kills the streams lock. The measured gain here is around
1.6% with 4 threads (374krps -> 380k).
The global streams list is exclusively used for "show sess", to look up
a stream to shut down, and for the hard-stop. Having all of them in a
single list is extremely expensive in terms of locking when using threads,
with performance losses as high as 7% having been observed just due to
this.
This patch makes the list per-thread, since there's no need to have a
global one in this situation. All call places just iterate over all
threads. The most "invasive" change was in "show sess" where the end
of list needs to go back to the beginning of next thread's list until
the last thread is seen. For now the lock was maintained to keep the
code auditable but a next commit should get rid of it.
The observed performance gain here with only 4 threads is already 7%
(350krps -> 374krps).
Instead of placing the current stream at the end of the stream list when
issuing a "show sess" on the CLI as was done in 2.2 with commit c6e7a1b8e
("MINOR: cli: make "show sess" stop at the last known session"), now we
compare the listed stream's epoch with the dumping stream's and stop on
more recent ones.
This way we're certain to always only dump known streams at the moment we
issue the dump command without having to modify the list. In theory we
could miss some streams if more than 2^31 "show sess" requests are issued
while an old stream remains present, but that's 68 years at 1 "show sess"
per second and it's unlikely we'll keep a process, let alone a stream, that
long.
It could be verified that the count of dumped streams still matches the
one before this change.
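A conceptual sketch of the comparison (the epoch field name is assumed here):

    /* stop dumping on streams created after the "show sess" command was
     * issued; the signed difference copes with the 32-bit wrap-around.
     */
    if ((int)(curr_strm->stream_epoch - dump_strm->stream_epoch) > 0)
            break;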
The "show sess" CLI command currently lists all streams and needs to
stop at a given position to avoid dumping forever. Since 2.2 with
commit c6e7a1b8e ("MINOR: cli: make "show sess" stop at the last known
session"), a hack consists in unlinking the stream running the applet
and linking it again at the current end of the list, in order to serve
as a delimiter. But this forces the stream list to be global, which
affects scalability.
This patch introduces an epoch, which is a global 32-bit counter that
is incremented by the "show sess" command, and which is copied by newly
created streams. This way any stream can know whether any other one is
newer or older than itself.
For now it's only stored and not exploited.
There's no locking around the lookup of a stream nor its shutdown
when issuing "shutdown sessions" over the CLI so the risk of crashing
the process is particularly high.
Let's use a thread_isolate() there which is suitable for this task, and
there are not that many alternatives.
This must be backported to 1.8.
Historically this function would try to wake the most accurate number of
process_stream() waiters. But since the introduction of filters which could
also require buffers (e.g. for compression), things started not to be as
accurate anymore. Nowadays muxes and transport layers also use buffers, so
the runqueue size has nothing to do anymore with the number of supposed
users to come.
In addition to this, the threshold was compared to the number of free buffers
calculated as allocated minus used, but this didn't work anymore with local
pools since these counts are not updated upon alloc/free!
Let's clean this up and pass the number of released buffers instead, and
consider that each waiter successfully called counts as one buffer. This
is not rocket science and will not suddenly fix everything, but at least
it cannot be as wrong as it is today.
This could have been marked as a bug given that the current situation is
totally broken regarding this, but this probably doesn't completely fix
it, it only goes in a better direction. It is possible however that it
makes sense in the future to backport this as part of a larger series if
the situation significantly improves.
The buffer wait queue used to be global historically but this does not
make any sense anymore given that the most common use case is to have
thread-local pools. Thus there's no point waking up waiters of other
threads after releasing an entry, as they won't benefit from it.
Let's move the queue head to the thread_info structure and use
ti->buffer_wq from now on.
The two algos defining these functions (first and leastconn) do not need the
server's lock. However it's already present in pendconn_process_next_strm()
so the API must be updated so that the functions may take it if needed and
that the callers indicate whether they already own it.
As such, the call places (backend.c and stream.c) now do not take it
anymore, queue.c was unchanged since it's already held, and both "first"
and "leastconn" were updated to take it if not already held.
A quick test on the "first" algo showed a jump from 432 to 565k rps by
just dropping the lock in stream.c!
The remaining contention on the server lock solely comes from
sess_change_server() which takes the lock to add and remove a
stream from the server's actconn list. This is both expensive
and pointless since we have mt-lists, and this list is only
used by the CLI's "shutdown server sessions" command!
Let's migrate to an mt-list and remove the need for this costly
lock. By doing so, the request rate increased by ~1.8%.
This patch splits the current dns.c into two files:
The first, dns.c, contains code related to DNS message exchange over UDP
and, in the future, also TCP. We try to remove dependencies on resolving
to make it usable by other features such as DNS load balancing.
The new resolvers.c inherits the code specific to the actual
resolvers.
Note:
It was really difficult to obtain a clean diff due to the amount
of moved code.
Note2:
Counters and stuff related to stats are not cleanly separated because
currently counters for both layers are merged and hard to separate
for now.
The code dealing with the copy of requests into the L7-buffer and the
retransmits during L7 retries has been moved into the HTTP analysers. The copy
is now performed in the REQ_HTTP_XFER_BODY analyser and the L7 retries are
performed in the RES_WAIT_HTTP analyser. This way, si_cs_recv() and
si_cs_send() don't have to care about it anymore. It is much more natural to
deal with L7 retries in HTTP analysers.
TCP to H1 upgrades are buggy for now. When such an upgrade is performed, a
crash is experienced. The bug is the result of the recent H1 mux
refactoring, and more specifically because of the commit c4bfa59f1 ("MAJOR:
mux-h1: Create the client stream as later as possible"). Indeed, now the H1
mux is responsible for creating the frontend conn-stream once the request
headers are fully received. Thus the TCP to H1 upgrade is a problem because
the frontend conn-stream already exists.
To fix the bug, we must keep this conn-stream and the associated stream and
use them in the H1 mux. To do so, the upgrade will be performed in two
steps. First, the mux is upgraded from mux-pt to mux-h1. Then, the mux-h1
performs the stream upgrade, once the request headers are fully received and
parsed. To do so, stream_upgrade_from_cs() must be used. This function sets
the SF_HTX flag to switch the stream to HTX mode, it removes the SF_IGNORE
flag and eventually it fills the request channel with some input data.
This patch is required to fix the TCP to H1 upgrades and is intimately
linked with the next commits.
When a TCP to H2 upgrade is performed, the SF_IGNORE flag is set on the
stream before killing it. This happens when a TCP/SSL client connection is
routed to an HTTP backend and the h2 ALPN is detected. The SF_IGNORE flag was
added for this purpose, to skip some processing when the stream is aborted
before a mux upgrade. Some counter updates were skipped this way. But some
others are still updated.
Now, all counter updates at the end of process_stream(), before releasing
the stream, are ignored if the SF_IGNORE flag is set. Note this stream is
aborted because we switch from a mono-stream to a multi-stream
multiplexer. It works differently for TCP to H1 upgrades.
This patch should be backported as far as 2.0 after some observation period.
I was looking at writing a simple first test for prometheus but I
realised there is no proper way to exclude it if haproxy was not built
with the prometheus plugin.
Today we have `REQUIRE_OPTIONS` in reg-tests which is based on `Feature
list` from `haproxy -vv`. Those options are coming from the Makefile
itself.
A plugin is built this way:
EXTRA_OBJS="contrib/prometheus-exporter/service-prometheus.o"
It does register service actions through `service_keywords_register`.
Those are listed through `list_services` in `haproxy -vv`.
To facilitate parsing, I slightly changed the output to a single line
and integrated it in the regtests shell script so that we can now specify a
dependency while writing a reg-test for prometheus, e.g.:
#REQUIRE_SERVICE=prometheus-exporter
#REQUIRE_SERVICES=prometheus-exporter,foo
There might be other ways to handle this, but that's the cleanest I
found; I understand people might be concerned by this output change in
`haproxy -vv` which goes from:
Available services :
foo
bar
to:
Available services : foo bar
Signed-off-by: William Dauchy <wdauchy@gmail.com>
This allows using the address of the server rather than the name of the
server for keeping track of servers in a backend for stickiness.
The peers code was also extended to support feeding the dictionary using
this key instead of the name.
Fixes #814
Add cur_server_timeout and cur_tunnel_timeout.
These sample fetches return the current timeout value for a stream. This
is useful to retrieve the value of a timeout which was changed via a
set-timeout rule.
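For example (a sketch; backend, path and header names are made up):

    backend be_app
        timeout server 30s
        http-request set-timeout server 5s if { path_beg /health }
        http-response set-header X-Cur-Server-Timeout %[cur_server_timeout]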
Prepare the possibility to register sample fetches on the stream.
This commit is necessary to implement sample fetches to retrieve the
current timeout values.
Allow the modification of the tunnel timeout on the stream side.
Use a new field in the stream for the tunnel timeout. It is initialized
with the tunnel timeout from the backend unless it has already been set by a
set-timeout tunnel rule.
Allow the modification of the timeout server value on the stream side.
Do not apply the default backend server timeout in back_establish if it
is already defined. This is the case if a set-timeout server rule has
been executed.
At the end of stream_new(), once the input buffer is transferred to the request
channel, it must not be used anymore. The previous patch (16df178b6 "BUG/MEDIUM:
stream: Xfer the input buffer to a fully created stream") was pushed too quickly.
No backport needed.
The input buffer passed as argument to create a new stream must not be
transferred when the request channel is initialized because the channel
flags are not set at this stage. In addition, the API is a bit confusing
regarding the buffer owner when an error occurred. The caller remains the
owner, but reading the code it is not obvious.
So, first of all, to avoid any ambiguities, comments are added on the
calling chain to make it clear. The buffer owner is the caller if any error
occurred. And the ownership is transferred to the stream on success.
Then, to make things simple, the ownership is transferred at the end of
stream_new(), in case of success. And the input buffer is updated to point
to BUF_NULL. Thus, in all cases, if the caller tries to release it by calling
b_free() on it, it is not a problem. Of course, it remains the caller's
responsibility to release it on error.
The patch fixes a bug introduced by the commit 26256f86e ("MINOR: stream:
Pass an optional input buffer when a stream is created"). No backport is
needed.
The cumulative numbers of http requests, http errors, bytes received and
sent, and their respective rates for tracked counters are now updated using
specific stream-independent functions. These functions are used by the
stream but the aim is to allow the session to do so too. For now, there is
no reason to perform these updates from the session, except from the mux-h2
maybe. But, the mux-h1, on the frontend side, will be able to return some
errors to the client, before the stream creation. In this case, it will be
mandatory to update counters tracked at the session level.
It is now possible to set the buffer used by the request channel when
a stream is created. It may be useful if input data were already received,
instead of waiting for the first call to the mux rcv_buf() callback. This change
is mandatory to support H1 connections with no stream attached.
For now, the multiplexers don't pass any buffer. BUF_NULL is thus used to
call stream_create_from_cs().
This information is only provided by the mux-h1. But, thanks to previous
patches, we can get it from the session directly. There is no need to retrieve
it from the mux anymore.
Since the idle duration provided by the session is always up-to-date, there
is no longer any reason to rely on the multiplexer's cs_info to set it on the
stream.
This function simply calls action_lookup() on the private service_keywords,
to look up a service name. This will be used to detect double registration
of a same service from Lua.
This will be needed by a next patch to fix a bug and will have to be
backported.
The server lock must be held when the server_take_conn() and server_drop_conn()
lbprm callback functions are called. It is a documented prerequisite but it is
not always respected. It only affects the leastconn and fas lb algorithms. Others
don't use these callback functions.
A race condition on the next pending effective weight (next_eweight) may be
encountered with the leastconn lb algorithm. An agent check may set it to 0
while fwlc_srv_reposition() is called. The server is locked during the
next_eweight update. But because the server lock is not acquired when
fwlc_srv_reposition() is called, we may use it to recompute the server key,
leading to a division by 0.
This patch must be backported as far as 1.8.
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
The receiver is the one which depends on the protocol while the listener
relies on the receiver. Let's move the protocol there. Since there's also
a list element to get back to the listener from the proto list, this list
element (proto_list) was moved as well. For now when scanning protos, we
still see listeners which are linked by their rx.proto_list part.
A dedicated expiration date is now used to apply the inspect-delay of the
tcp-request or tcp-response rulesets. Before, the analyse expiration date was
used but it may also be updated by the lua (at least). So a lua script may
extend or reduce the inspect-delay by side effect. This is not expected. If it
becomes necessary, a specific function will be added to do this. Because, for
now, it is a bit confusing.
The do-resolve HTTP action, performing a DNS resolution of a sample expression
output, is not thread-safe at all. The resolver object used to do the resolution
must be locked when the action is executed or when the stream is released,
because its curr or wait resolution lists and the requester list inside a
resolution are updated. It is also important to not wake up a released stream
(with a destroyed task).
Of course, because of this bug, various kinds of crashes may be observed.
This patch should fix the issue #236. It must be backported as far as 2.0.
The TRY_ADDQ there was not needed since the wait list is exclusively
owned by the caller. There's a preliminary test on MT_LIST_ADDED()
that might have been eliminated by keeping MT_LIST_TRY_ADDQ() but
it would have required two more expensive writes before testing so
better keep the test the way it is.
Initially when mt_lists were added, their purpose was to be used with
the scheduler, where anyone may concurrently add the same tasklet, so
it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their
usage was extended and MT_LIST_ADD{,Q} started to be used on situations
where the element to be added was exclusively owned by the one performing
the operation so a conflict was impossible. This became more obvious with
the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK.
But this remains confusing and at many places it's not expected that
an MT_LIST_ADD could possibly fail, and worse, at some places we start
by initializing it before adding (and the test is superfluous) so let's
rename them to something more conventional to denote the presence of the
check or not:
    MT_LIST_ADD{,Q}    : unconditional operation, the caller owns the
                         element, and doesn't care about the element's
                         current state (exactly like LIST_ADD)
    MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not
                         already added or in the process of being added.
This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe"
anymore. This also means that in case of backport mistakes in the
future causing this to be overlooked, the slower and safer functions
will still be used by default.
Note that the missing unchecked MT_LIST_ADD macro was added.
The rest of the code will have to be reviewed so that a number of
callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove
the unneeded test.
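In short (a sketch with a hypothetical element and queue; the TRY variant is
assumed to return non-zero only when the element was actually added):

    /* element exclusively owned by the caller: unconditional append */
    MT_LIST_ADDQ(&queue, &el->mt_list);

    /* element possibly already queued (e.g. a tasklet): checked variant */
    if (!MT_LIST_TRY_ADDQ(&queue, &el->mt_list))
            return; /* already added or being added by another thread */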
"show sess" and particularly "show sess all" can be very slow when dumping
lots of information, and while dumping, new sessions might appear, making
the output really endless. When threads are used, this causes a double
problem:
- all threads are paused during the dump, so an overly long dump degrades
the quality of service ;
- since all threads are paused, more events get postponed, possibly
resulting in more streams to be dumped on next invocation of the dump
function.
This patch addresses this long-lasting issue by doing something simple:
the CLI's stream is moved to the end of the streams list, serving as an
identifiable marker to end the dump, because all entries past it were
added after the command was entered. As a result, the CLI's stream always
appears as the last one.
It may make sense to backport this to stable branches where dumping live
streams is difficult as well.
This one was confusingly called, I thought it was the cumulated number
of streams but it's the number of calls to process_stream(). Let's make
this clearer.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved to the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprising since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
The files were moved almost as-is, just dropping arg-t and auth-t from
acl-t but keeping arg-t in acl.h. It was useful to revisit the call places
since a handful of files used to continue to include acl.h while they did
not need it at all. Struct stream was only made a forward declaration
since not otherwise needed.
The stktable_types[] array declaration was moved to the main file as
it had nothing to do in the types. A few declarations were reordered
in the types file so that defines were before the structs. Thread-t
was added since there are a few __decl_thread(). The loss of peers.h
revealed that cfgparse-listen needed it.
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declarations of static inline functions. About ten C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in two: one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.
This one still relies on applet.h and stick-table.h.
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h
Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
The pretty confusing "buffer.h" was in fact not the place to look for
the definition of "struct buffer" but the one responsible for dynamic
buffer allocation. As such it defines the struct buffer_wait and the
few functions to allocate a buffer or wait for one.
This patch moves it renaming it to dynbuf.h. The type definition was
moved to its own file since it's included in a number of other structs.
Doing this cleanup revealed that a significant number of files used to
rely on this one to inherit struct buffer through it but didn't need
anything from this file at all.
This moves types/activity.h to haproxy/activity-t.h and
proto/activity.h to haproxy/activity.h.
The macros defining the bit field values for the profiling variable
were moved to the type file to be more future-proof.
Now the file is ready to be stored into its final destination. A few
minor reorderings were performed to keep the file properly organized,
making the various sections more visible (cache & lockless).
In addition and to stay consistent, memory.c was renamed to pool.c.
types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h
was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other
change was applied.
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).
All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").
For consistency, src/hathreads.c was also renamed thread.c.
A number of files were updated to only include thread-t which is the one
they really needed.
Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.
Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.
The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
Only allow L7 retries when using HTTP, it only really makes sense for HTTP
anyway, and as the L7 retries code assumes the message will be HTX, it will
crash when used with mode TCP.
This should fix github issue #627.
This should be backported to 2.1 and 2.0.
Now we very rarely catch spinning streams, and whenever we catch one it
seems a filter is involved, but we currently report no info about them.
Let's print the list of enabled filters on the stream with such a crash
to help with the reports. A typical output will now look like this:
[ALERT] 121/165908 (1110) : A bogus STREAM [0x7fcaf4016a60] is spinning at 2 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fcaf4016a60 src=127.0.0.1 fe=l1 be=l1 dst=<CACHE> rqf=6dc42000 rqa=48000 rpf=a0040223 rpa=24000000 sif=EST,10008 sib=DIS,80110 af=(nil),0 csf=0x7fcaf4023c00,10c000 ab=0x7fcaf40235f0,4 csb=(nil),0 cof=0x7fcaf4016610,1300:H1(0x7fcaf4016840)/RAW((nil))/tcpv4(29) cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={0x7fcaf4016fb0="cache store filter", 0x7fcaf4017080="compression filter"}]
This may be backported to 2.0.
With server-template was introduced the possibility to scale the
number of servers in a backend without needing a configuration change
and associated reload. On the other hand it became impractical to
write use-server rules for these servers as they would only accept
existing server labels as argument. This patch allows the use of
log-format notation to describe targets of a use-server rules, such
as in the example below:
    listen test
        bind *:1234
        use-server %[hdr(srv)] if { hdr(srv) -m found }
        use-server s1 if { path / }
        server s1 127.0.0.1:18080
        server s2 127.0.0.1:18081
If a use-server rule is applied because it was conditioned by an
ACL returning true, but the target of the use-server rule cannot be
resolved, no other use-server rule is evaluated and we fall back to
load balancing.
This feature was requested on the ML, and bumped with issue #563.
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings as it's not limited to ignoring
syscalls returns only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.
This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.
This lock was only needed to protect the buffer_wq list, but now we have
the mt_list for this. This patch simply turns the buffer_wq list to an
mt_list and gets rid of the lock.
It's worth noting that the whole buffer_wait thing still looks totally
wrong especially in a threaded context: the wakeup_cb() callback is
called synchronously from any thread and may end up calling some
connection code that was not expected to run on a given thread. The
whole thing should probably be reworked to use tasklets instead and be
a bit more centralized.
This counter is already incremented when a new request is received (or if an
error occurred while waiting for it). So it must not be incremented when the
stream is terminated, at the end of process_stream(). This bug was introduced by the commit
cff0f739e ("MINOR: counters: Review conditions to increment counters from
analysers").
No backport needed.
In process_stream(), when a client or a server abort is handled, the
corresponding listener's counter is incremented. But, we must be sure to have a
listener attached to the session. This bug was introduced by the commit
cff0f739e5.
Thanks to Fred for reporting the bug.
No need to backport this patch, except if commit cff0f739e5 is backported.
The flags in the act_flag enum have been renamed act_opt. It means ACT_OPT
prefix is used instead of ACT_FLAG. The purpose of this patch is to reserve the
action flags for the actions configuration.
Now, for these counters, the following rules are followed to know if it must be
incremented or not:
* if it exists for a frontend, the counter is incremented
* if stats must be collected for the session's listener, if the counter exists
for this listener, it is incremented
* if the backend is already assigned, if the counter exists for this backend,
it is incremented
* if a server is attached to the stream, if the counter exists for this
server, it is incremented
These are not hardcoded rules. Some counters are still handled in a different
way. But many counters are incremented this way now.
For more than a decade we've kept all the sess_update_st_*() functions
in stream.c while they're only there to work in relation with what is
currently being done in backend.c (srv_redispatch_connect, connect_server,
etc). Let's move all this pollution over there and take this opportunity
to try to find slightly less confusing names for these old functions
whose role is only to handle transitions from one specific stream-int
state:
sess_update_st_rdy_tcp() -> back_handle_st_rdy()
sess_update_st_con_tcp() -> back_handle_st_con()
sess_update_st_cer() -> back_handle_st_cer()
sess_update_stream_int() -> back_try_conn_req()
sess_prepare_conn_req() -> back_handle_st_req()
sess_establish() -> back_establish()
The last one remained in stream.c because it's more or less a completion
function which does all the initialization expected on a connection
success or failure, can set analysers and emit logs.
The other ones could possibly slightly benefit from being modified to
take a stream-int instead since it's really what they're working with,
but it's unimportant here.
In process_sticking_rules() we only want to apply the first store-request
rule for a given table, but when doing so we need to make sure we only
count actual store-request rules when we list the sticking rules.
Failure to do so leads to not being able to write store-request and match
sticking rules in any order as a match rule after a store-request rule
will be ignored.
The following configuration reproduces the issue:
    global
        stats socket /tmp/foobar
    defaults
        mode http
    frontend in
        bind *:8080
        default_backend bar
    backend bar
        server s1 127.0.0.1:21212
        server s2 127.0.0.1:21211
        stick store-request req.hdr(foo)
        stick match req.hdr(foo)
        stick-table type string size 10
    listen foo
        bind *:21212
        bind *:21211
        http-request deny deny_status 200 if { dst_port 21212 }
        http-request deny
This patch fixes issue #448 and should be backported as far as 1.6.
Building on a 32-bit platform produces these warnings in trace code:
src/stream.c: In function 'strm_trace':
src/stream.c:226:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " req=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
^
src/stream.c:229:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " res=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
^
src/mux_fcgi.c: In function 'fcgi_trace':
src/mux_fcgi.c:443:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " - VAL=%lu", *val);
^
src/mux_h1.c: In function 'h1_trace':
src/mux_h1.c:290:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " - VAL=%lu", *val);
^
Let's just cast the type to long. This should be backported to 2.1.
For backends and servers, some average times over the last 1024 connections are
already calculated. For the moment, the averages for the time passed in the
queue, the connect time, the response time (for HTTP sessions only) and the total
time are calculated. Now, in addition, the maximum time observed for these
values is also stored.
In addition, these new counters are cleared like all other max values with the
CLI command "clear counters".
This patch is related to #272.
The "shutdown session server" command used to open-code the list traversal
while there's already a function for this: srv_shutdown_streams(). Better
use it.
We need to call vars_init() when the list is empty otherwise we
can't use variables in the response scope. This regression was
introduced by cda7f3f5 (MINOR: stream: don't prune variables if
the list is empty).
The following config reproduces the issue:
    defaults
        mode http
    frontend in
        bind *:11223
        http-request set-var(req.foo) str("foo") if { path /bar }
        http-request set-header bar %[var(req.foo)] if { var(req.foo) -m found }
        http-response set-var(res.bar) str("bar")
        http-response set-header foo %[var(res.bar)] if { var(res.bar) -m found }
        use_backend out
    backend out
        server s1 127.0.0.1:11224
    listen back
        bind *:11224
        http-request deny deny_status 200
> GET /ba HTTP/1.1
> Host: localhost:11223
> User-Agent: curl/7.66.0
> Accept: */*
>
< HTTP/1.0 200 OK
< Cache-Control: no-cache
< Content-Type: text/html
> GET /bar HTTP/1.1
> Host: localhost:11223
> User-Agent: curl/7.66.0
> Accept: */*
>
< HTTP/1.0 200 OK
< Cache-Control: no-cache
< Content-Type: text/html
< foo: bar
This must be backported as far as 1.9.
All TCP and HTTP captures are stored in 2 arrays, one for the request and
another for the response. In HAProxy 1.5, these arrays are part of the HTTP
transaction and thus are released during its cleanup. Because in this version,
the transaction is part of the stream (in 1.5, streams are still called
sessions), the cleanup is always performed, for HTTP and TCP streams.
In HAProxy 1.6, the HTTP transaction was moved out from the stream and is now
dynamically allocated only when required (because of an HTTP proxy or an HTTP
sample fetch). In addition, still in 1.6, the captures arrays were moved from
the HTTP transaction to the stream. This way, it is still possible to capture
elements from TCP rules for a full TCP stream. Unfortunately, the release is
still exclusively performed during the HTTP transaction cleanup. Thus, for a TCP
stream where the HTTP transaction is not required, the TCP captures, if any, are
never released.
Now, all captures are released when the stream is freed. This fixes the memory
leak for TCP streams. For streams with an HTTP transaction, the captures are now
released when the transaction is reset and not systematically during its
cleanup.
This patch must be backported as far as 1.6.
Runtime traces are now supported for the streams, only if compiled with
debug. process_stream() is covered as well as TCP/HTTP analyzers and filters.
In traces, the first argument is always a stream. So it is easy to get the info
about the channels and the stream-interfaces. The second argument, when defined,
is always an HTTP transaction. And the third one is an HTTP message. The trace
message is adapted to report HTTP info when possible.
Despite the addition of the mux layer, no change has been made on how to enable
TCP splicing in process_stream(). We still check if the transport layer on both
sides supports splicing, but we don't check the muxes' support. So it is
possible to start to splice data with an unencrypted H2 connection on one side
and an H1 connection on the other. This leads to a freeze of the stream until a
client or server timeout is reached.
This patch fixes a part of issue #356. It must be backported as far as 1.8.
the option "http-send-name-header" is an eyesore. It was responsible of several
bugs because it is handled after the message analysis. With the HTX
representation, the situation is cleaner because no rewind on forwarded data is
required. But it remains ugly.
With recent changes in HAProxy, we have the opportunity to make it fairly
better. The message formatting in now done in the HTTP multiplexers. So it seems
to be the right place to handle this option. Now, the server name is added by
the HTTP multiplexers (h1, h2 and fcgi).
The "cache" entry was still present in the fdtab struct and it was
reported in "show sess". Removing it broke the cache-line alignment
on 64-bit machines which is important for threads, so it was fixed
by adding an attribute(aligned()) when threads are in use. Doing it
only in this case allows 32-bit thread-less platforms to see the
struct fit into 32 bytes.
There were 221 places where a status message or an error message were built
to be returned on the CLI. All of them were replaced to use cli_err(),
cli_msg(), cli_dynerr() or cli_dynmsg() depending on what was expected.
This removed a lot of duplicated code because most of the time, 4 lines
are replaced by a single, safer one.
Since the last commit there's no point anymore in having two variants of the
same function, let's switch to b_free() only. __b_drop() was renamed to
__b_free() for obvious consistency reasons.
In stream_set_backend(), if we have a TCP stream, and we want to upgrade it
to H2, instead of attempting to reuse the stream, just destroy the
conn_stream, make sure we don't log anything about the stream, and pretend
we failed setting the backend, so that the stream will get destroyed.
New streams will then be created by the mux, as if the connection just
happened.
This fixes a crash when upgrading from TCP to H2, as the H2 mux totally
ignored the conn_stream provided by the upgrade, as reported in github
issue #196.
This should be backported to 2.0.
In sess_established(), don't immediately switch the backend stream_interface
to SI_ST_DIS if we only got a SHUTR. We may still have something to send,
e.g. if the request is a POST, and we should be switched to SI_ST_DIS later
when the shutw will happen.
This should be backported to 2.0 and 1.9.
The purpose will be to store the target address there and not to
allocate a connection just for this anymore. For now it's only placed
in the struct, a few fields were moved to plug some holes, and the
entry is freed on release (never allocated yet for now). This must
have no impact. Note that in order to fit, the store_count which
previously was an int was turned into a short, which is way more
than enough given that the hard-coded limit is 8.
No allocation is needed there. Some extra checks were added in the
stream dump code to make sure the source address is effectively valid
(it always is but it doesn't cost much to be certain).
The stream output requires retrieving connection sources and
destinations. The previous call involving conn_get_{to,from}_addr()
was missing a status check which has now been integrated with the
new call since these places already handle connection errors there.
The same code parts were reused for "show peers" and were modified
similarly.
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved to
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because it is now unused, just like the unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
All streams created for an HTTP proxy must now use the HTX internal
representation. So, it is no longer necessary to test the flag PR_O2_USE_HTX. It
means a stream is an HTX stream if the frontend is an HTTP proxy or if the
frontend multiplexer, if any, sets the flag MX_FL_HTX.
Since the legacy HTTP mode is disabled, old HTTP analyzers do nothing but call
those of the HTX. So, it is safe to directly call HTX analyzers from
process_stream().
<head> and <tail> fields are now signed 32-bits integers. For an empty HTX
message, these fields are set to -1. So the field <used> is now useless and can
safely be removed. To know if an HTX message is empty or not, we just compare
<head> against -1 (it also works with <tail>). The function htx_nbblks() has
been added to get the number of used blocks.
Move the logic to decide if we redispatch to a new server from
sess_update_st_cer() to a new inline function, stream_choose_redispatch(), and
use it in do_l7_retry() instead of just setting the state to SI_ST_REQ.
That way, when using L7 retries, we won't redispatch the request to another
server except if "option redispatch" is used.
This should be backported to 2.0.
It's really confusing to call it a task because it's a tasklet and used
in places where tasks and tasklets are used together. Let's rename it
to tasklet to remove this confusion.
These flags are not used by analysers, only by the shut* functions, and
they were covered by CF_MASK_STATIC only because in the past the shut
functions were in the middle of the analysers. But here they are causing
excess loop backs which provide no value and increase processing cost.
Ideally the CF_MASK_STATIC bitfield should be revisited, but doing this
alone is enough to reduce by 30% the number of calls to si_sync_send().
In process_stream() we detect a number of conditions to decide to loop
back to the analysers. Some of them are excessive in that they perform
a strict comparison instead of filtering on the flags relevant to the
analysers as is done at other places, resulting in excess wakeups. One
of the effect is that after a successful WRITE_PARTIAL, a second send is
not possible, resulting in the loss of WRITE_PARTIAL, causing another
wakeup! Let's apply the same mask and verify the flags correctly.
The "goto redo" at the end of process_stream() to make the states converge
is still a big source of problems and mostly stems from the very late call
to the send() functions, whose results need to be considered, while it's
being done in si_update_both() when leaving.
This patch extracts the si_sync_send() calls from si_update_both(), and
places them at the relevant places in process_stream(), which are just
after the amount of data to forward is updated and before the shutw()
calls (which were also moved). The stream-interface resynchronization
needs to go slightly upper to take into account the transition from CON
to RDY that will happen consecutive to some successful send(), and that's
all.
By doing so we can now get rid of this loop and have si_update_both()
called only to update the stream interface and channel when leaving the
function, as it was initially designed to work.
It is worth noting that a number of the remaining conditions to perform
a goto resync_XXX still seem suboptimal and would benefit from being
refined to perform less resynchronization. But what matters at this stage
is that the code remains valid and efficient.
Till now when a wakeup happens after a connection is attempted, we go
through sess_update_st_con_tcp() to deal with the various possible events,
then to sess_update_st_cer() to deal with a possible error detected by the
former, or to sess_establish() to complete the connection validation. There
are multiple issues in the way this is handled, which have accumulated over
time. One of them is that any spurious wakeup during SI_ST_CON would validate
the READ_ATTACHED flag and wake the analysers up. Another one is that nobody
feels responsible for clearing SI_FL_EXP if it happened at the same time as
a success (and it is present in all reports of loops to date). And another
issue is that aborts cannot happen after a clean connection setup with no
data transfer (since CF_WRITE_NULL is part of CF_WRITE_ACTIVITY). Last, the
flags cleanup work was hackish, added here and there to please the next
function (typically what had to be done in commit 7a3367cca to work around
the url_param+reuse issue by moving READ_ATTACHED to CON).
This patch performs a significant lift up of this setup code. First, it
makes sure that the state handlers are the ones responsible for the cleanup
of the stuff they rely on. Typically sess_establish() will clean up the
SI_FL_EXP flag because if we decided to validate the connection it means
that we want to ignore this late timeout. Second, it splits the CON and
RDY state handlers because the former only has to deal with failures,
timeouts and non-events, while the latter has to deal with partial or
total successes. Third, everything related to connection success was
moved to sess_establish() since it's the only safe place to do so, and
this function is also called at a few places to deal with synchronous
connections, which are not seen by intermediary state handlers.
The code was made a bit more robust, for example by making sure we
always set SI_FL_NOLINGER when aborting a connection so that we don't
have any risk to leave a connection in SHUTW state in case it was
validated late. The useless return codes of some of these functions
were dropped so that callers only rely on the stream-int's state now
(which was already partially the case anyway).
The code is now a bit cleaner, could be further improved (and functions
renamed) but given the sensitivity of this part, better limit changes to
strictly necessary. It passes all reg tests.
The main reason for all the trouble we're facing with stream interface
error or timeout reports during the connection phase is that we currently
can't make the difference between a connection attempt and a validated
connection attempt. It is problematic because we tend to switch early
to SI_ST_EST but can't always do what we want in this state since it's
supposed to be set when we don't need to visit sess_establish() again.
This patch introduces a new state between SI_ST_CON and SI_ST_EST, which
is SI_ST_RDY. It indicates that we've verified that the connection is
ready. It's a transient state, like SI_ST_DIS, that cannot persist when
leaving process_stream(). For now it is not set, only verified in various
tests where SI_ST_CON was used or SI_ST_EST depending on the cases.
The stream-int state diagram was minimally updated to reflect the new
state, though it is largely obsolete and would need to be seriously
updated.
The stream interface state checks involving ranges were replaced with
checks on a set of states, already revealing some issues. No issue was
fixed, all was replaced in a one-to-one mapping for easier control. Some
checks involving a strict difference were also replaced with fields to
be clearer. At this stage, the result must be strictly equivalent. A few
tests were also turned to their bit-field equivalent for better readability
or in preparation for upcoming changes.
The test performed in the SPOE filter was swapped so that the closed and
error states are evicted first and that the established vs conn state is
tested second.
The test for the send-name-header field used to cover all states between
SI_ST_CON and SI_ST_CLO, which include SI_ST_CER and SI_ST_DIS. Trying to
send a header in these states makes no sense at all, so let's fix this.
This should have no visible impact so no backport is needed.
With this patch we modify the stickiness server targets lookup behavior.
First we look up these server targets by their names, then fall back to looking
them up by their IDs if they are not found. We also insert a dictionary entry
for the name of each server target and store the address of this entry in the
underlying stick-table.
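As an illustration (names and addresses are arbitrary), with a classic
stickiness setup such as the one below, the learned entries are now matched
against the server names "app1"/"app2" first, and only then against their IDs:

  backend app
      balance roundrobin
      stick-table type ip size 100k expire 30m
      stick on src
      server app1 192.168.0.11:80 check
      server app2 192.168.0.12:80 check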
During 1.9 development cycle a shortcut was made in process_stream() to
update the analysers immediately after an I/O event detected on the send()
path while leaving the function. In order to prevent this from being abused
by a single stream stealing all the CPU, the loop didn't cover the initial
recv() call, so that events ultimately converge.
This has caused a number of issues over time because the conditions to
decide to loop are a bit tricky. For example the CF_READ_PARTIAL flag is
not immediately removed from rqf_last and may appear for a long time at
this point, sometimes causing some loops to last long.
Another unexpected side effect is that all analysers are called again with
no data to process, just because CF_WRITE_PARTIAL is present. We cannot get
rid of this event, even though it is rarely useful, because some analysers might
wait for some data to leave a buffer before proceeding. With a full loop,
this event would have been merged with a subsequent recv() allowing analysers
to do something more useful than just ack an event they don't care about.
While during early 1.9-dev it was very important to be kind with the
scheduler, nowadays it's lock-free for local tasks so this optimization
is much less interesting for I/Os, especially if we factor in
the trouble it causes.
This patch thus removes the use of the loop for regular I/Os and instead
performs a task_wakeup() with an I/O event so that the task will be
scheduled after all other ones and will have a chance to perform another
recv() and possibly to gather more I/O events to be processed at once.
Synchronous errors and transitions to SI_ST_DIS however are still handled
by the loop.
Doing so significantly reduces the average number of calls to analysers
(those are typically halved when compression is enabled in legacy mode),
and as a side benefit, has increased the H1 performance by about 1%.
The head of an HTX message is heavily used whereas the wrap position is only
used when a block is added or removed. So it is more logical to store the head
position in the HTX message instead of the wrap one. The wrap position can be
easily deduced. To get it, the new function htx_get_wrap() may be used.
It was not as efficient as the watchdog in that it would only trigger
after the problem had resolved itself, and still required a huge margin
to make sure we didn't trigger for an invalid reason. This used to leave
little indication about the cause. Better use the watchdog now and
improve it if needed.
The detector of unkillable tasks remains active though.
This function dumps a lot of information about a stream into the provided
buffer. It is now used by stream_dump_and_crash() and will be used by the
debugger as well.
When we receive a read0, and we're still in SI_ST_CON state (so on an
outgoing connection), don't immediately switch to SI_ST_DIS, otherwise we would
never call sess_establish(), and so the analysers would never run.
Instead, let sess_establish() handle that case, and switch to SI_ST_DIS if
we already have CF_SHUTR on the channel.
This should be backported to 1.9.
The test consisted in checking that there was always a timeout on a
stream's task and was only enabled when built in development mode,
but 1) it is never tested and 2) if it had been tested it would have
been noticed that it triggers a bit too easily on the CLI. Let's get
rid of this old one.
This makes sure that the stream is not visible from its own task just
before starting to free some of its components. This way we have the
guarantee that a stream found in a task list is totally valid and can
safely be dereferenced.
Add a new action for http-request, disable-l7-retry, that can be used to
disable any attempt at retrying the request (see retry-on) if it fails for any
reason other than a connection failure.
This is useful for example to make sure POST requests aren't retried.
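For example, inside a backend already using "retry-on", a rule like the
following (illustrative) one prevents POST requests from ever being retried:

  retry-on conn-failure response-timeout
  http-request disable-l7-retry if METH_POST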
A backend stream-interface attached to a reused connection remains in the state
SI_ST_CON until some data are sent to validate the connection. But when the
url_param algorithm is used to balance connections, no data are sent while the
connection is not established. So it is a chicken and egg situation.
To solve the problem, if no error is detected and when the request channel is
waiting for the connect(), we mark the read side as attached on the response
channel as soon as possible and we wake the request channel up once. This
happens in 2 places. The first one is right after the connect(), when the
stream-interface is still in state SI_ST_CON, in the function
sess_update_st_con_tcp(). The second one is when an applet is used instead of a
real connection to a server, in the function sess_prepare_conn_req(). In fact,
it is done when the backend stream-interface is set to the state SI_ST_EST.
This patch must be backported to 1.9.
When running in HTX mode, if we sent the request, but failed to get the
answer, either because the server just closed its socket, because we hit a
server timeout, or because we got a 404, 408, 425, 500, 501, 502, 503 or 504 error,
attempt to retry the request, exactly as if we just failed to connect to
the server.
To do so, add a new backend keyword, "retry-on".
It accepts a list of keywords, which can be "none" (never retry),
"conn-failure" (we failed to connect, or to do the SSL handshake),
"empty-response" (the server closed the connection without answering),
"response-timeout" (we timed out while waiting for the server response),
or "404", "408", "425", "500", "501", "502", "503" and "504".
The default is "conn-failure".
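An illustrative backend using it could look like this (the server address and
the selected events are arbitrary):

  backend app
      retries 3
      retry-on conn-failure empty-response response-timeout 503 504
      server app1 192.168.0.11:8080 check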
In sess_update_st_con_tcp(), if we have an error on the stream_interface
because we tried to send early_data but failed, don't flag the request
channel as CF_WRITE_ERROR, or we will never reach the analyser that sends
back the 425 response.
This should be backported to 1.9.
On some occasions we've had loops happening when processing actions
(e.g. a yield not being well understood) resulting in analysers being
called in loops until the analysis timeout without incrementing the
stream's call count, thus this type of bug cannot be caught by the
current protection system.
What this patch proposes is to start to measure the time spent in analysers
when profiling is enabled on the thread, in order to detect if a stream is
really misbehaving. In this case we measured the consumed CPU time, not the
wall clock time, so as not to be affected by possible noisy neighbours
sharing the same CPU. When more than 100ms are spent in an analyser, we
trigger the stream_dump_and_crash() function to report the anomaly.
The choice of 100ms comes from the fact that regular calls only take around
1 microsecond and it seems reasonable to accept a degradation factor of
100000, which covers very slow machines such as home gateways running on
sub-ghz processors, with extremely heavy configurations. Some complete
tests show that even this common bogus map_regm() entry supposedly designed
to extract a port from an IP:port entry does not trigger the timeout (25 ms
evaluation time for a 4kB header, exercise left to the reader to spot the
mistake) :
([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}):([0-9]{0,5}) \5
However this one purposely designed to kill haproxy definitely dies as it
manages to completely freeze the whole process for more than one second
on a 4 GHz CPU for only 120 bytes in :
(.{0,20})(.{0,20})(.{0,20})(.{0,20})(.{0,20})b \1
This protection will definitely help during the code stabilization period
and may possibly be left enabled later depending on reported issues or not.
If you've noticed that your workload is affected by this patch, please
report it as you have very likely found a bug. And in the mean time you
can turn profiling off to disable it.
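As a reminder, task profiling is controlled from the global section; an
illustrative snippet to enable it (set it to "off" to disable the check
described above):

  global
      profiling.tasks on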
If a stream is caught spinning over itself at more than 100000 loops per
second and for more than one second, the process will be aborted and the
offender reported on the console and logs. Typical figures usually are just
a few tens to hundreds per second over a very short time so there is a huge
margin here. Using even higher values could also work but there is the risk
of not being able to catch offenders if multiple ones start to bug at the
same time and share the load. This code should ideally be disabled for
stable releases, though in theory nothing should ever trigger it.
During 1.9 development (and even a bit after) we've started to face a
significant number of situations where streams were abusively spinning
due to an uncaught error flag or complex conditions that couldn't be
correctly identified. Sometimes streams wake appctx up and conversely
as well. More importantly when this happens the only fix is to restart.
This patch adds a new function to report a serious error, some relevant
info and to crash the process using abort() so that a core dump is
available. The purpose will be for this function to be called in various
situations where the process is unfixable. It will help detect these
issues much earlier during development and may even help fixing test
platforms which are able to automatically restart when such a condition
happens, though this is not the primary purpose.
This patch only provides the function and doesn't use it yet.
Very similarly to previous commit doing the same for streams, we now
measure and report an appctx's call rate. This will help catch applets
which do not consume all their data and/or which do not properly report
that they're waiting for something else. Some of them like peers might
theoretically be able to exhibit some occasional peaks when teaching a
full table to a nearby peer (e.g. the new replacement process), but
nothing close to what a bogus service can do so there is no risk of
confusion.
Quite a few times some bugs have made a stream task incorrectly
handle a complex combination of events, which was often reported as
"100% CPU", and was usually caused by the event not being properly
identified and flushed, and the stream's handler called in loops.
This patch adds a call rate counter to the stream struct. It's not
huge, it's really inexpensive (especially compared to the rest of the
processing function) and will easily help spot such tasks in "show sess"
output, possibly even allowing to kill them.
A future patch should probably consist in alerting when they're above a
certain threshold, possibly sending a dump and killing them. Some options
could also consist in aborting in order to get an analyzable core dump
and let a service manager restart a fresh new process.
A regression was introduced with the commit c9aecc8ff ("BUG/MEDIUM: stream:
Don't request a server connection if a shutw was scheduled"). Among other things,
it breaks the CLI when the shutr on the client side is handled with the client
data. Depending on the flag CF_SHUTW_NOW to not establish the server connection
when an error is detected on the client side is not the right way to fix the bug,
because this flag may be set without any error on the client side.
So instead, we abort the request where the error is handled and only when the
backend stream-interface is in the state SI_ST_INI. This way, there is no
ambiguity on the reason why the abort occurred. The stream-interface is also
switched to the state SI_ST_CLO.
This patch must be backported to 1.9. If the commit c9aecc8ff is backported to
previous versions, this one MUST also be backported. Otherwise, it MAY be
backported to older versions than 1.9 with caution.
Fix some missing initializations which came with commit 333939c (MINOR: action:
new '(http-request|tcp-request content) do-resolve' action). The DNS contexts of
streams which were allocated were not initialized by stream_new(). This led to
accesses to non-allocated memory when freeing these contexts with stream_free().
The 'do-resolve' action is an http-request or tcp-request content action
which allows DNS resolution to be performed at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
While the resolution is being performed, the request is paused.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
outcome (e.g. return a 503 to prevent loops).
Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.
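As an illustration only (the resolvers section, variable name and addresses are
made up; please check the documentation for the exact syntax), a setup
resolving the Host header and rejecting unresolved names could look like:

  resolvers mydns
      nameserver dns1 192.168.0.53:53

  frontend fe_main
      bind :8080
      http-request do-resolve(txn.dstip,mydns,ipv4) hdr(Host),lower
      http-request deny deny_status 503 unless { var(txn.dstip) -m found }
      http-request set-dst var(txn.dstip)
      default_backend be_main

  backend be_main
      # the address is taken from the destination set above
      server dyn 0.0.0.0:0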
This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
If a shutdown for writes was performed on the client side (CF_SHUTW is set on
the request channel) while the server connection is still unestablished (the
stream-int is in the state SI_ST_INI), then it is aborted. It must also be
aborted when the shutdown for write is pending (only CF_SHUTW_NOW is
set). Otherwise, some errors on the request channel can be ignored, leaving the
stream in an undefined state.
This patch must be backported to 1.9. It may probably be backported to all
supported versions, but it is unclear if the bug is visible in older versions
than 1.9. So it is probably safer to wait for bug reports on these versions to
backport this patch.
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error paths to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
The flag SF_HTX has been added to know when a stream uses the HTX or not. It is
set when an HTX stream is created. There are 2 conditions to set it. The first
one is when the HTTP frontend enables the HTX. The second one is when the attached
conn_stream uses an HTX multiplexer.
In process_stream(), only try again when there's the SI_FL_ERR flag and we're
in a connected state, otherwise we can loop forever.
It used to work because si_update_both() bogusly removed the SI_FL_ERR flag,
and it would never be set at this point. Now it does, so take that into
account.
Many, many thanks to Maciej Zdeb for reporting the problem, and helping
investigating it.
This should be backported to 1.9.
In commit d7704b534, we introduced an expiration flag on the stream interface,
which is used for the connect, the queue and the turn around. Because the
turn around state isn't an error, the flag was reset in process_stream(), and
later in commit cff6411f9 when introducing the SI_FL_ERR flag, the cleanup
of the flag at this place was erroneously generalized.
To fix this, the SI_FL_EXP flag is only cleared at the end of the turn around
state, and nobody should clear the stream interface flags anymore.
This should be backported to 1.9, it has no known impact on older versions.
As si_update_both() sets prev_state to state for each stream_interface, if
we want to check it changed, copy it before calling si_update_both().
This should be backported to 1.9.
Don't unconditionally remove the SI_FL_ERR flag in si_update_both(), which
is called at the end of process_stream(). Doing so was a bug that was there
since the flag was introduced, because we were always setting si->flags to
SI_FL_NONE, however we don't want to lose that one, except if we will retry
connecting, so only remove it in sess_update_st_cer().
This should be backported to 1.9.
Because the flag CF_SHUTR is no longer set by the H2 multiplexer to mark the end
of the message, we can rely on it again to detect aborts. There is no longer any need
to make a check on the flag SI_FL_CLEAN_ABRT when the option abortonclose is
enabled. So, this option should work as before for h2 clients.
This patch must be backported to 1.9 with the previous EOI patches.
In addition to stats and cache applets, there are also HTTP applet services
declared in an http-request rule. All these applets are now handled the same
way. Among other things, the header Expect is handled at the same place for all
these applets.
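Such a service is typically declared with an "http-request use-service" rule;
for example (the service name is only illustrative and depends on what was
built in):

  frontend stats-fe
      bind :8405
      http-request use-service prometheus-exporter if { path /metrics }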
It's never easy to guess what services are built in. We currently have
the prometheus exporter in contrib/ which is the only extension for now.
Let's enumerate all available ones just like we do for filters and pollers.
gcc 6+ complains about a possible null-deref here due to the test in
objt_server() :
if (objt_server(s->target))
HA_ATOMIC_ADD(&objt_server(s->target)->counters.retries, 1);
Let's simply change it to __objt_server(). This can be backported to
1.9 and 1.8.
Commit 32211a1 ("BUG/MEDIUM: stream: Don't forget to free
s->unique_id in stream_free().") addressed a memory leak but in
exchange may cause a double-free due to the fact that after freeing
s->unique_id it doesn't null it and then calls http_end_txn()
which frees it again. Thus the process quickly crashes at runtime.
This fix must be backported to all stable branches where the
aforementioned patch was backported.
In stream_free(), free s->unique_id. We may still have one, because it's
allocated in log.c::strm_log() no matter what, even if it's a TCP connection
and thus it won't get free'd by http_end_txn().
Failure to do so leads to a memory leak.
This should probably be backported to all maintained branches.
In 1.5-dev13, a bug was introduced by commit e3224e870 ("BUG/MINOR:
session: ensure that we don't retry connection if some data were sent").
If a connection error is reported after some data were sent (and lost),
we used to accidentally mark the front connection as being in error instead
of only the back one because the two direction flags were applied to the
same channel. This case is extremely rare with raw connections but can
happen a bit more often with multiplexed streams. This will result in
the error not being correctly reported to the client.
This patch can be backported to all supported versions.
When a connection reuse fails, we must not wait before retrying, as most
likely the issue is related to the reused connection and not to the server
itself.
This should be backported to 1.9, though it depends on previous patches
dealing with SI_ST_CON for connection reuse.
Before the first send() attempt, we should be in SI_ST_CON, not
SI_ST_EST, since we have not yet attempted to send and we are
allowed to retry. This is particularly important with complex
outgoing muxes which can fail during the first send attempt (e.g.
failed stream ID allocation).
It only requires that sess_update_st_con_tcp() knows about this
possibility, as we must not forcefully close a reused connection
when facing an error in this case, this will be handled later.
This may be backported to 1.9 with care after some observation period.
We currently detect a number of situations where we have to immediately
deal with a state change, but we failed to consider the case of the
synchronous error reported on the stream-interface. We definitely do not
want to have to wait for a timeout to handle this one, especially at the
beginning of the connection when it can lead to an immediate retry.
This should be backported to 1.9.
The "show sess all" command didn't allow to detect whether compression
is in use for a given stream, which is sometimes annoying. Let's add a
few more info about the HTTP messages, namely the flags, body len, chunk
len and the "next" pointer.
The "waiting" flag indicates if the stream is waiting for some memory,
and was placed on the same output line as the txn for ease of reading.
But since 1.6 the txn is not part of the stream anymore so this output
was placed under a condition, resulting in "waiting" to appear only
when a txn is present. Let's move it upper, closer to the stream's
flags to fix this.
This may safely be backported though it has little value for older
versions.
Commit b9af88151 ("MINOR: stream/htx: Add info about the HTX structs in
"show sess all" command") accidently forgot the flags on the request
path, it was only on the response path.
It makes sense to backport this to 1.9 so that both outputs are the same.
This one used to rely on a few spin locks around lists manipulations
only but 1) there were still a few races (e.g. when aborting, or
between STAT_ST_INIT and STAT_ST_LIST), and 2) after last commit
which dumps htx info it became obvious that dereferencing the buffer
contents is not safe at all.
This patch uses the thread isolation from the rendez-vous point
instead, to guarantee that nothing moves during the dump. It may
make the dump a bit slower but it will be 100% safe.
This fix must be backported to 1.9, and possibly to 1.8 which likely
suffers from the short races above, even though they're extremely
hard to trigger.
For HTX streams, info about the HTX structure is now dumped for the request and
the response channels in "show sess all" command.
The patch may be backported to 1.9.
As long-time changes have accumulated over time, the exported functions
of the stream-interface were almost all prefixed "si_<something>" while
most private ones (mostly callbacks) were called "stream_int_<something>".
There were still a few confusing exceptions, which were addressed to
follow this scheme :
- stream_sock_read0(), only used internally, was renamed stream_int_read0()
and made static
- stream_int_notify() is only private and was made static
- stream_int_{check_timeouts,report_error,retnclose,register_handler,update}
were renamed si_<something>.
Now it is clearer when checking one of these if it risks to be used outside
or not.
Before setting the infinite forward, we first forward all remaining input data
from the channel. Of course for HTX streams, this must be done using the amount
of data in the HTX message not in the channel (which appears as full because of
the HTX).
When the connection failed, we don't really want to close the conn_stream,
as we're probably about to retry, so just make sure the file descriptor is
closed.
These flags haven't been used for a while. SF_TUNNEL was reintroduced
by commit d62b98c6e ("MINOR: stream: don't set backend's nor response
analysers on SF_TUNNEL") to handle the two-level streams needed to
deal with the first model for H2, and was not removed after this model
was abandoned. SF_INITIALIZED was only set. SF_CONN_TAR was never
referenced at all.
All the HTX definition is self-contained and doesn't really depend on
anything external since it's mostly a protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
The CLI proxy was not handling payload. To do that, we needed to keep a
connection active on a server and to transfer each new line over that
connection until we receive an empty line.
The CLI proxy now handles the payload in the same way the CLI does.
Examples:
$ echo -e "@1;add map #-1 <<\n$(cat data)\n" | socat /tmp/master-socket -
$ socat /tmp/master-socket readline
prompt
master> @1
25130> add map #-1 <<
+ test test
+ test2 test2
+ test3 test3
+
25130>
To ease the fast forwarding and the infinite forwarding on HTX proxies, 2
functions have been added to let the channel be almost aware of the way data are
stored in its buffer. By calling these functions instead of legacy ones, we are
sure to forward the right amount of data.
These potential null-deref warnings are emitted on gcc 7 and above
when threads are disabled due to the use of objt_server() after an
existing validity test. Let's switch to __objt_server() since we
know the pointer is valid, it will not confuse the compiler.
Some of these may be backported to 1.8.
We can reach sess_update_st_con_tcp() while we still have a connection
attached, so take that into account, and free the connection, instead of
assuming it's always a conn_stream.
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
are no more users of potentially uninitialized pools now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
This patch replaces a number of __decl_hathread() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock() which
automatically registers the lock for initialization during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
When ending a stream, if the origin is an appctx, the appctx will have been
destroyed already, but it does not destroy the session. So later, when we
try to destroy the session, we try to dereference sess->origin and die
trying.
Fix this by explicitly setting sess->origin to NULL before calling
session_free().
If an ALPN (or a NPN) was chosen for a server, defer choosing the mux until
after the SSL handshake is done, and the ALPN/NPN has been negotiated, so
that we know which mux to pick.
As we will now no longer try to subscribe to recv/send events before the
connection is established, there's no need to reactivate polling on the fd
when retrying a connection. It will be activated later on subscribe.
Right now we measure for each task the cumulated time spent waiting for
the CPU and using it. The timestamp uses a 64-bit integer to report a
nanosecond-level date. This is only enabled when "profiling.tasks" is
enabled, and consumes less than 1% extra CPU on x86_64 when enabled.
The cumulated processing time and wait time are reported in "show sess".
The task's counters are also reset when an HTTP transaction is reset
since the HTTP part pretends to restart on a fresh new stream. This
will make sure we always report correct numbers for each request in
the logs.
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking all the
circular dependency.
Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.
No code was changed, stuff was just moved around.
When a server is down, the channel's data must not be consumed. This is
required to allow redispatch and connection retry. So now, we wait for
the connection to be marked as connected, with the flag CO_FL_CONNECTED,
before starting to consume channel's data. In the mux, this event is
tracked with the flag H1C_F_CS_WAIT_CONN.
Now, the connection mode is detected in the mux and not in HTX analyzers
anymore. Keep-alive connections are now managed by the mux. A new stream is
created for each transaction. This removes the most important part of the
synchronization between channels and the HTTP transaction cleanup. These changes
only affect the HTX part (proto_htx.c). Legacy HTTP analyzers remain untouched
for now.
On the client side, the mux is responsible for creating new streams when a new
request starts. It is also responsible for parsing and updating the "Connection:"
header of the response. On the server side, the mux is responsible for parsing
and updating the "Connection:" header of the request. Muxes on each side are
independent. For now, there is no connection pool on the server side, so it
always closes the server connection.
Instead of trying to receive as soon as the connection is created, and to
eventually have to transfer subscription if we move connections, wait
until the connection is established before attempting to recv.
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
The stream interface used to conflate a missing buffer and lack of
buffer space into SI_FL_WAIT_ROOM but this causes difficulties as
these cannot be checked at the same moment and are not resolved at
the same moment either. Now we instead mark the buffer as presumably
available using si_rx_buff_rdy() and mark it as unavailable+requested
using si_rx_buff_blk().
The call to si_alloc_buf() was moved after si_stop_put(). This makes
sure that the SI_FL_RX_WAIT_EP flag is cleared on allocation failure so
that the function is called again if the callee fails to do its work.
This flag is not enough to describe all blocking situations, as can be
seen in each case we remove it. The muxes have taught us that using multiple
blocking flags in parallel will be much easier, so let's start to do this
now. This patch only renames these flags in order to make the next changes more
readable.
There are still some unwelcome synchronous calls to si_cs_recv() in
process_stream(). Let's have a new function si_sync_recv() to perform
a synchronous receive call on a stream interface regardless of the type
of its endpoint, and move these calls there. For now it only implements
conn_streams since it doesn't seem useful to support applets there. The
function implements an extra check for the stream interface to be in an
established state before attempting anything.
In commit f26c26c ("BUG/MEDIUM: stream-int: change the way buffer room
is requested by a stream-int") we used to call si_want_put() at the
end of sess_update_st_con_tcp(), when switching to SI_ST_EST state.
But this is incorrect as there are a few other situations where we
can switch to this state, such as in si_connect() where a connection
reuse is detected, or when directly calling an applet (in which case
that was already covered anyway). For now it doesn't have any side
effect but it could impact connection reuse after the stream-int
changes by stalling an immediately reused connection.
Let's move this flag change to sess_establish() instead, which is the
only place which is always called exactly once on connection setup.
No backport is needed, this is purely 1.9.
Subsequent to the recent stream-int updates, we started to consider that
SI_FL_WANT_PUT needs to be set when receipt is enabled, but this is wrong
and results in 100% CPU when an HTTP client stays idle after a keep-alive
request because the stream-int has nothing to provide and nothing to send.
In fact just like for applets this flag should reflect the continuation
of an attempt. So it's si_cs_recv() which should set the flag, and clear
it if it has nothing more to provide. This function is called the first
time in process_stream(), and called again during transfers, so it will
always be up to date during stream_int_update() and stream_int_notify().
As a special case, it should also be set when a connection switches to
the established state. And we should absolutely refrain from calling
si_cs_recv() to re-enable reading, normally just setting this flag
(from within the stream-int's handler or prior to calling si_chk_rcv())
is expected to be OK.
A corner case remains where it was observed that in stream_int_notify() we
can sometimes be called with an empty output channel with SI_FL_WAIT_ROOM
and no CF_WRITE_PARTIAL, so there's no way to detect that we should
re-enable receiving. It's easy to also take care of this condition
there for the time it takes to figure if this situation is expected
or not.
Now it becomes more obvious that relying on a single flag to request
room (or on two flags to arbiter activity) is not workable given the
autonomy of both sides. The mux_h2 has taught us that blocking flags
are much more reliable, require far fewer conditions and are much easier
to deal with. That's probably something to consider quickly in this
area.
No backport is needed.
It's far from being clean, but at least it allows to resync both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while applets are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.
The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
This patch implements analysers for parsing the CLI and extra features
for the master's CLI.
For each command (sent alone, or separated by ; or \n) the request
analyser will determine to which server it should send the request.
The 'mode cli' proxy is able to parse a prefix for each command which is
used to select the apropriate server. The prefix start by @ and is
followed by "master", the PID preceded by ! or the relative PID. (e.g.
@master, @1, @!1234). The servers are not round-robined anymore.
The command is sent with a SHUTW which forces the server to close the
connection after sending its response. However the proxy allows a
keepalive connection on the client side and does not close.
The response analyser does not do much, it only reinits the
connection when it receives a close from the server, and forwards the
response. It does not analyze the response data.
The only guarantee of the end of the response is the close from the
server; we can't rely on the double \n since it's not sent by every
command.
This could be reimplemented later as a filter.
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.
This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
The rationale here is that we should never need to try to send() at the
beginning of process_stream() because :
- if something was pending, it's very unlikely that it was unblocked
and not sent just between the last poll() and the wakeup instant.
- if something pending was recently sent, then we don't have anything
to send anymore.
So at first glance it doesn't seem like there could be any valid case
where trying to send before entering the function brings any benefit.
If a buffer allocation failed, we have SI_FL_WAIT_ROOM set and c_size(buf)
being zero. It's the only moment where we have a new opportunity to try to
allocate this buffer. However we don't want to waste our time trying this
if both are non-null since it indicates missing room without any changed
condition.
The vars_prune() and vars_init() functions involve locking while most of
the time there is no variable at all in streams nor sessions. Let's check
for emptiness before calling these functions. Simply doing this has
increased the multithreaded performance from 1.5 to 5% depending on the
workload.
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.
Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).
This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
Make sure we call tasklet_free() only after si_release_endpoint(), when the
unsubscribe() method has been called, so that we're sure the mux won't
attempt to access the tasklet.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and unactivated once we got the event
and we woke the related task.
When retrying to connect to a server, because the previous connection failed,
make sure if we subscribed to the previous connection, the polling flags will
be true for the new fd.
No backport is needed.
In case pool_alloc() fails in stream_new(), we try to detach the stream
from the list before it has been added, dereferencing a NULL. In order
to fix it, simply move the LIST_DEL call upwards.
This must be backported to 1.8.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will be never called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.
This is 1.9-specific and shouldn't be backported.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions' prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
At the end of process_stream(), we wake the task if there's something in
the input buffer, after attempting a recv. However this is wrong, and we should
only do so if we received new data. Just check the CF_READ_PARTIAL flag.
This is 1.9-specific and should not be backported.
Instead of using si_cs_io_cb() in process_stream() use si_cs_send/si_cs_recv
instead, as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus timeout would never get triggered.
Instead of waiting for the connection layer to let us know we can read,
attempt to receive as soon as process_stream() is called, and subscribe
to receive events if we can't receive yet.
Now, except for idle connections, the recv(), send() and wake() methods are
no more, all the lower layers do is waking tasklet for anybody waiting
for I/O events.
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake where it belongs, which
is the session.
However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).
So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.
This patch will need to be backported to 1.8.
The request counter is incremented when creating a new stream and when
resetting a stream, preparing for a new request. Unfortunately during
the thread migration this was missed, leading to non-atomic increments
in case threads are in use. The most visible side effect is that two
requests may have the same ID from time to time in the logs. However
the SPOE also uses this ID to route responses back to the stream so it
may also lead to occasional spurious SPOE timeouts.
Note that it still doesn't guarantee temporal unicity in the stream
identifiers since a long and a short connection could technically use
the same ID. The likeliness that this happens at the same time is almost
null (roughly threads*runqueue_depth/2^32 that it happens in the same
poll loop), but it will have to be addressed later anyway.
This patch must be backported to 1.8 with the other one it relies on
("MINOR: thread: implement HA_ATOMIC_XADD()").
When freeing the stream, make sure we remove the stream interfaces from the
wait lists, in case it was in there.
This is 1.9-specific, no backport is needed.
Instead of just using the conn_stream wait_list, give the stream_interface
its own. When the conn_stream will have its own buffers, the stream_interface
may have to wait on it.
This adds the set-priority-class and set-priority-offset actions to
http-request and tcp-request content. At this point they are not used
yet, which is the purpose of the next commit, but all the logic to
set and clear the values is there.
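For example (expressions and conditions are purely illustrative), once these
actions take effect in the queues they may be used like this:

  frontend fe_main
      bind :8080
      http-request set-priority-class int(-5) if { path_beg /api/ }
      http-request set-priority-offset int(100) if { path_end .jpg }
      default_backend app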
The current name is misleading as it implies a queue size, but the value
instead indicates a position in the queue.
The value is only the queue size at the exact moment the element is enqueued.
Soon we will gain the ability to insert anywhere into the queue, upon which
clarity of the name is more important.
We'll soon need to rely on the pendconn position at the time of dequeuing
to figure out the position a stream took in the queue. Usually it's not a
problem since pendconn_free() is called once the connection starts, but
it will make a difference for failed dequeues (eg: queue timeout reached).
Thus it's important to call pendconn_free() before logging in cases where we
are not certain whether it was already performed, and to call pendconn_unlink()
once we know the pendconn will not be used, so that we collect the queue
state as accurately as possible. As a benefit it will also make the
server's and backend's queue counts more accurate in these cases.
Now pendconn_free() takes a stream, checks that pend_pos is set, clears
it, and uses pendconn_unlink() to complete the job. It's cleaner and
centralizes all the bookkeeping work in pendconn_unlink() only and
ensures that there's a single place where the stream's position in the
queue is manipulated.
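A hedged sketch of the resulting split (simplified types; the function names
follow the text above but the bodies are purely illustrative):
#include <stddef.h>
struct pendconn { int linked; /* ... */ };
struct stream   { struct pendconn *pend_pos; };
/* does all the bookkeeping: detaches the pendconn from the server or
 * backend queue and updates the queue counters (illustrative body) */
static void pendconn_unlink(struct pendconn *p)
{
    p->linked = 0;
}
/* thin wrapper: only the stream's own position is cleared here, all the
 * queue accounting stays centralized in pendconn_unlink() */
static void pendconn_free(struct stream *s)
{
    if (!s->pend_pos)
        return;
    pendconn_unlink(s->pend_pos);
    s->pend_pos = NULL;
}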
Some fragments of the old buffers API remained in debug messages here and
there.
This was caused by the recent buffer API changes, no backport is needed.
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed so that they take
a size_t instead of an int, in order to match the unsigned length here.
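For illustration, this is roughly what the rename amounts to at the
declaration level (a sketch, not the literal HAProxy definitions):
#include <stddef.h>
/* before: the chunk carried its own field names and signed lengths */
struct old_chunk {
    char *str;    /* beginning of the data */
    int   size;   /* allocated size        */
    int   len;    /* amount of data        */
};
/* after: same content, but names and types now match struct buffer */
struct new_chunk {
    char  *area;  /* beginning of the data */
    size_t size;  /* allocated size        */
    size_t data;  /* amount of data        */
};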
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.
The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.
The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also make it possible to
detach unused checks later.
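A hedged sketch of these conventions (simplified header; the real struct
also tracks the head offset and the amount of data):
#include <stdbool.h>
#include <stddef.h>
struct buf {
    size_t  size;   /* allocated size, 0 when not allocated */
    char   *area;   /* storage area, may live anywhere      */
};
/* per the description above: size==0/area==NULL is the (default) empty
 * buffer, size==0/area==(void *)1 marks a buffer that is wanted but not
 * yet allocated */
static bool buf_is_empty(const struct buf *b)
{
    return b->size == 0 && b->area == NULL;
}
static bool buf_is_wanted(const struct buf *b)
{
    return b->size == 0 && b->area == (char *)1;
}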
The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the other ones.
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
Passing unsigned ints everywhere is painful, and will cause some headache
later when we'll want to integrate better with struct ist which already
uses size_t. Let's switch buffers to use size_t instead.
Since applets are now part of the main scheduler, it's useful to report
their nice value and the number of calls to the applet handler, to see
where the CPU is spent.
There's no real reason to have a specific scheduler for applets anymore, so
nuke it and just use tasks. This comes with some benefits, the first one
being that applets cannot induce high latencies anymore since they share
nice values with other tasks. Later it will be possible to configure the
applets' nice value. The second benefit is that the applet scheduler was
not very thread-friendly, having a big lock around it in anticipation of this
change. Thus applet-intensive workloads should now scale much better with
threads.
Some more improvement is possible now : some applets also use a task to
handle timers and timeouts. These could now be simplified to use only one
task.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments: the task itself, the context, and the state,
which were previously retrieved from the task. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
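In practice a task callback now looks roughly like this (a sketch with
simplified types; the return convention comment is illustrative):
struct task;   /* opaque here */
/* the callback receives the task, its context and its state directly
 * instead of reading them from the task; this is what lets the scheduler
 * update the task atomically while the handler runs on copied values */
typedef struct task *(*task_handler)(struct task *t, void *context,
                                     unsigned short state);
static struct task *process_example(struct task *t, void *context,
                                    unsigned short state)
{
    (void)context;
    (void)state;
    return t;   /* the task is still alive; NULL would mean it was freed */
}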
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the command parsers.
Per-command support will need to be added to take advantage of this
feature.
Signed-off-by: Aurélien Nephtali <aurelien.nephtali@corp.ovh.com>
The issue was identified by cppcheck:
[src/map.c:372] -> [src/map.c:376]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/map.c:433] -> [src/map.c:437]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/map.c:555] -> [src/map.c:559]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/stream.c:3264] -> [src/stream.c:3268]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
Signed-off-by: Ilya Shipitsin <chipitsine@gmail.com>
The management of the server and proxy queues was not thread-safe at
all. First, the accesses to <strm>->pend_pos were not protected. So it was
possible to release it on one thread (for instance because the stream is
released) and to use it at the same time on another one (because we redispatch
pending connections for a server). Then, accesses to the stream's information
(flags and target) from anywhere else are forbidden. To be safe, the stream's
state must always be updated in the context of process_stream.
So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream's responsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its <pend_pos> field.
However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal, and in some situations it could be
much slower in multi-threaded mode than in single-threaded mode. The problem
is that, when we try to dequeue pending connections, we process them from the
oldest to the newest, regardless of the threads' affinity. So we need to wait
for the other threads to wake up to really process them. If threads are
blocked in the poller, this adds a significant latency. This problem happens
when maxconn values are very low.
This patch must be backported in 1.8.
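A hedged sketch of this "unlink then wake the stream" split (toy types and
helpers, not the real queue code):
struct stream;
struct pendconn {
    struct stream *strm;
    int            linked;   /* still attached to a server/proxy queue? */
    /* a lock protects this structure in the real implementation */
};
static void task_wakeup_stream(struct stream *s) { (void)s; }  /* stand-in */
/* the dequeuing side only unlinks the pendconn and wakes the stream up;
 * the stream itself, from process_stream(), then really dequeues or
 * releases its own <pend_pos> */
static void pendconn_process_next(struct pendconn *p)
{
    p->linked = 0;
    task_wakeup_stream(p->strm);
}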
An fd cache entry might be removed and then re-added at the end of the list
while another thread is parsing it; if that happens, we may miss fd cache
entries. To avoid that, add a new field in the struct fdtab, "added_mask",
which contains a mask of the potentially affected threads; when its bit is
set, the corresponding thread will set its bit in fd_cache_mask, to avoid
waiting in poll while it may have more work to do.
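A hedged sketch of the mechanism (toy globals and C11 atomics; the real code
uses HAProxy's atomic macros on the per-fd table entries):
#include <stdatomic.h>
static _Atomic unsigned long fd_cache_mask;  /* threads with cache work left */
struct fd_entry {
    _Atomic unsigned long added_mask;  /* threads that may have missed it */
};
/* writer side: an entry is re-added while other threads may be scanning */
static void fd_entry_readded(struct fd_entry *e, unsigned long affected)
{
    atomic_fetch_or(&e->added_mask, affected);
}
/* reader side: if our bit is set, keep our bit in fd_cache_mask so we do
 * not go to sleep in poll() while entries may remain to be processed */
static void fd_entry_check(struct fd_entry *e, unsigned long tid_bit)
{
    if (atomic_fetch_and(&e->added_mask, ~tid_bit) & tid_bit)
        atomic_fetch_or(&fd_cache_mask, tid_bit);
}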
Create a local, per-thread fd cache for file descriptors that only belong
to one thread, and make the global fd cache mostly lockless, as we can get
a lot of contention on the fd cache lock.
Since the fd update tables are per-thread, we need to have a bit per
thread to indicate whether an update exists, otherwise this can lead
to lost update events every time multiple threads want to update the
same FD. In practice *for now*, it only happens at start time when
listeners are enabled and ask for polling after facing their first
EAGAIN. But since the pollers are still shared, a lost event is still
recovered by a neighbor thread. This will not reliably work anymore
with per-thread pollers, where it has been observed a few times that a
single-threaded listener would not always accept incoming connections upon
startup.
It's worth noting that during this code review it appeared that the
"new" flag in the fdtab isn't used anymore.
This fix should be backported to 1.8.
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.
This should be backported to 1.8 to help collect more detailed bug reports.
James Mc Bride reported an interesting case affecting all versions since
at least 1.5 : if a client aborts a connection on an empty buffer at the
exact moment a server redispatch happens, the CF_SHUTW_NOW flag on the
channel is immediately turned into CF_SHUTW, which is not caught by
check_req_may_abort(), leading the redispatch to be performed anyway
with the channel marked as shut in both directions while the stream
interface correctly establishes. This situation makes no sense.
Ultimately the transfer times out and the server-side stream interface
remains in EST state while the client is in CLO state, and this case
doesn't correspond to anything we can handle in process_stream, leading
to poll() being woken up all the time without any progress being made.
And the session cannot even be killed from the CLI.
So we must ensure that check_req_may_abort() also considers the case
where the channel is already closed, which is what this patch does.
Thanks to James for providing detailed captures that made it possible to
diagnose the problem.
This fix must be backported to all maintained versions.
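The gist of that fix, sketched with toy flag values (CF_SHUTW_NOW and
CF_SHUTW are the real channel flag names, the rest is illustrative):
#include <stdbool.h>
#define CF_SHUTW_NOW  0x1   /* toy bit values, not the real ones */
#define CF_SHUTW      0x2
struct channel { unsigned int flags; };
/* before the fix only the pending shutdown was considered, so an abort
 * already turned into CF_SHUTW slipped through; now both cases are caught */
static bool req_may_abort(const struct channel *req)
{
    return (req->flags & (CF_SHUTW_NOW | CF_SHUTW)) != 0;
}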
The H2 mux can cleanly report an error when a client closes, which is not
the case for the pass-through mux which only reports shutr. That was the
reason why "option abortonclose" was created since there was no way to
distinguish a clean shutdown after sending the request from an abort.
The problem is that in case of H2, the streams are always shut read after
the request is complete (when the END_STREAM flag is received), and that
when this lands on a backend configured with "option abortonclose", this
aborts the request. Disabling abortonclose is not always an option when
H1 and H2 have to coexist.
This patch makes use of the newly introduced mux capabilities reported
via the stream interface's SI_FL_CLEAN_ABRT indicating that the mux is
safe and that there is no need to turn a clean shutread into an abort.
This way abortonclose has no effect on requests initiated from an H2
mux.
This patch as well as these 3 previous ones need to be backported to
1.8 :
- BUG/MINOR: h2: properly report a stream error on RST_STREAM
- MINOR: mux: add flags to describe a mux's capabilities
- MINOR: stream-int: set flag SI_FL_CLEAN_ABRT when mux supports clean aborts
By copying the info in the stream interface that the mux cleanly reports
aborts, we'll have the ability to check this flag wherever needed regardless
of the presence of a mux or not.
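A hedged sketch of how this flag is then consumed by the abortonclose logic
described above (toy flag value and struct; SI_FL_CLEAN_ABRT is the real flag
name):
#include <stdbool.h>
#define SI_FL_CLEAN_ABRT  0x1   /* toy bit value */
struct stream_interface { unsigned int flags; };
/* with abortonclose, a read shutdown from the client normally aborts the
 * request; when the mux can cleanly report real aborts, a mere shutr after
 * a complete H2 request must not be treated as one */
static bool shutr_means_abort(const struct stream_interface *si, bool abortonclose)
{
    if (!abortonclose)
        return false;
    return !(si->flags & SI_FL_CLEAN_ABRT);
}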
Commit 3e13cba ("MEDIUM: session: make use of the connection's destroy
callback") ensured that connections could be autonomous to destroy the
session they initiated, but it didn't take care of doing the same for
applets. Such applets are used for peers, Lua and SPOE outgoing
connections. In this case, once the stream ends, it closes everything
and nothing takes care of releasing the session. The problem is not
immediately obvious since the only visible effect is that older
processes will not quit on reload after having leaked one such session.
For now we check in stream_free() if the session's origin is the applet
we're releasing, and then free the session as well. Something more
uniform should probably be done once we manage to unify applets and
connections a bit more.
This fix needs to be backported to 1.8. Thanks to Emmanuel Hocdet for
reporting the problem.
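A hedged sketch of the check added in stream_free() (simplified types; the
real code goes through HAProxy's obj_type accessors and session_free()):
#include <stdlib.h>
struct appctx;                           /* the applet context */
struct session { void *origin; };        /* simplified          */
static void session_release(struct session *s) { free(s); }   /* stand-in */
/* if the session's origin is the very applet we are releasing, nothing
 * else will ever free that session, so do it here */
static void release_applet_session(struct session *sess, struct appctx *appctx)
{
    if (sess->origin == (void *)appctx) {
        sess->origin = NULL;
        session_release(sess);
    }
}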
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
The cache exhibited a bug in process_stream() where upon abort it is
possible to switch the stream-int's state to SI_ST_CLO without calling
si_release_endpoint(), resulting in a possibly missing ->release() for
the applet.
It should affect all other applets as well (eg: lua, spoe, peers) and
should carefully be backported to stable branches after some observation
period.
When the stats code was moved to an applet, it wasn't completely
cleaned of its usage of the HTTP transaction and it used to store
the HTTP status in txn->status and to set the HTTP request date to
<now> from within the applet. This is totally wrong because the
applet is seen as a server from the HTTP engine, which parses its
response, so the http_txn must not be touched there.
This was made visible by the cache which would always exhibit a
negative TR log, indicating that nowhere in the code we took care of
setting s->logs.tv_request while the code above used to continue to
hide this. Another side effect of this issue is that under load, if
the call to the stats applet risks being delayed, the reported t_queue can
appear negative by being below tv_request-tv_accept.
This patch removes the assignment of tv_request and txn->status from
the applet code and instead sets the tv_request if still unset when
connecting to the applet. This ensures that all applets report correct
request timers now.