Instead of having each caller of appctx_get_buf() think about setting
the blocking flag, better have the function do it, since it's already
handling the queue anyway. This way we're sure that both are consistent.
If a response varies on anything other than accept-encoding (origin or
referer) but still contains an 'Encoding' header, the cached responses
were never sent back.
This is because of the 'set_secondary_key_encoding' call that always
filled the accept-encoding part of the secondary signature with the
response's actual encoding, regardless of whether the response varies on
this or not. This meant that the accept-encoding part of the signature
could be non-null in the cached entry which made the
'get_secondary_entry' calls in 'http_action_req_cache_use' always fail
because in those cases the request's secondary signature always had a
null accept-encoding part.
This patch can be backported up to branch 2.4.
From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
so caches can be identified when looking at HAProxy process memory
mapping.
The most possible error is lack of kernel support, as a result
we ignore it, if the naming fails the mapping of memory context
ought to still occur.
Gcc before 7 does really not like direct operations on cast pointers
such as "((struct a*)b)->c += d;". It turns our that we have exactly
that construct in 3.0 since commit 5baa9ea168 ("MEDIUM: cache: Save
body size of cached objects and track it on delivery").
It's generally sufficient to use an intermediary variable such as :
"({ (struct a*) _ = b; _; })->c +=d;" but that's ugly. Fortunately
DISGUISE() implicitly does something very similar and works fine, so
let's use that.
No backport is needed.
When cache and stats applets were changed to use their own buffers, a change
was also performed to no longer access the stream from the I/O
handller. Among other things, the HTTP start-line of the request is now
retrieved to get the method. But, when these changes were brought, the inbuf
buffer allocation failures were not handled.
It is of course not so common. But if this happens, a crash may be
experienced. To fix the issue, we now check for inbuf allocation failures
before accessing it.
No backported needed.
Some compilers report this on the cache:
src/cache.c:235: warning: 'release_entry_locked' declared inline after being called
src/cache.c:235: warning: previous declaration of 'release_entry_locked' was here
And indeed, the function is first declared non-inline and later inline.
Let's just set the inline status from the beginning. It's not really
needed to backport this.
Global options to disable for zero-copy forwarding are now tested outside
callbacks responsible to perform the forwarding itself. It is cleaner this
way because we don't try at all zero-copy forwarding if at least one side
does not support it. It is equivalent to what was performed before, but it
is simplier this way.
To fix a bug, a flag to announce the capabitlity to support the zero-copy
forwarding on the consumer side will be added on the SE descriptor. So the
old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer
side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare addition of the new
flag, the bitfield is a bit reordered.
These both flags are set after releasing the applet, in
appctx_shut(). Concretly, it means the applet is shutdown for reads and
writes. Once set, the applet's I/O handler was no longer called. Tests on
these flags are useless. There is no chance to match them.
An issue was introduced when zero-copy forwarding was added to the stats and
cache applets. There is no test to be sure the upper layer is ready to use
the zero-copy forwarding. So these applets refuse to deliver the response
into the applet's output buffer if the zero-copy forwarding is supported by
the opposite endpoint. It is especially an issue when a filter, like the
compression, is in-use on the response channel.
Because of this bug, the response is not delivered and the applet is woken
up in loop to produce data.
To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when
the zero-copy forwarding is in-use. We rely on this flag to not fill the
outbuf in the applet's I/O handler.
No backport needed.
At the beginning of the 3.0-dev cycle, the zero-copy forwarding support was
added only for the cache applet with an option to disable it. This was a
hack, waiting for a better integration with applets. It is now possible to
implement the zero-copy forwarding for any applets. So the specific option
for the cache applet was renamed to be used for all applets. And this option
is now also checked for the stats applet.
Concretely, 'tune.cache.zero-copy-forwarding' was renamed to
'tune.applet.zero-copy-forwarding'.
This field was introduced when the first implementation of the zero-copy
forwarding was added. It is now useless. However, we must still save the
body-size of the object in the cache.
Default .rcv_buf and .snd_buf functions that applets can use are now
specialized to manipulate raw buffers or HTX buffers.
Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf()
while HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf().
Note that the appctx is now directly passed to these functions instead of
the SC.
The cache applet will be refactored to use its own buffer. Thus, for now,
the zero-copy forwarding support is removed and it will be reintrocuded
later.
When a cache is "cold" and multiple clients simultaneously try to access
the same resource we must forward all the requests to the server. Next,
every "duplicated" response will be processed in http_action_store_cache
and we will try to cache every one of them regardless of whether this
response was already cached. In order to avoid having multiple entries
for a same primary key, the logic is then to first delete any
preexisting entry from the cache tree before storing the current one.
The actual previous response content will not be deleted yet though
because if the corresponding row is detached from the "avail" list it
might still be used by a cache applet if it actually performed a lookup
in the cache tree before the new response could be received.
This all means that we can end up using a valid row that references a
cache_entry that was already removed from the cache tree. This does not
pose any problem in regular caches (no 'vary' mechanism enabled) because
the applet only works on the data and not the 'cache_entry' information,
but in the "vary" context, when calling 'http_cache_applet_release' we
might call 'delete_entry' on the given entry which in turn tries to
iterate over all the secondary entries to find the right one in which
the secondary entry counter can be updated. We would then call
eb32_next_dup on an entry that was not in the tree anymore which ended
up crashing.
This crash was introduced by "48f81ec09 : MAJOR: cache: Delay cache
entry delete in reserve_hot function" which added the call to
"release_entry" in "http_cache_applet_release" that ended up crashing.
This issue was raised in GitHub #2417.
This patch must be backported to branch 2.9.
tune.cache.zero-copy-forwarding parameter can now be used to enable or
disable the zero-copy fast-forwarding for the cache applet only. It is
enabled ('on') by default. It can be disabled by setting the parameter to
'off'.
It is now possible to directly forward data to the opposite side from the
cache applet. To do so, dedicated functions were added to fast-forward the
payload part of the cached objects. Of course headers and trailers are still
sent via the channel's buffer, using the HTX.
When an object is delivered from the cache, once the applet reaches the
HTX_CACHE_DATA state, it declares it can fast-forward data. From this point,
all data are directly transferred to the oppposite side.
We now save the body size of cached objets in the cache entry strucutre. In
addition, the cache applet tracks the body part already sent.
This will be mandatory to add support of endpoint-to-endpoint
fast-forwarding in the cache applet.
When a stream is interrupted by the client before the full answer is
stored in the cache, we end up with an incomplete entry in the cache
that cannot be overwritten until it "naturally" expires. In such a case,
we call the cache filter's cache_store_strm_deinit callback without ever
calling cache_store_http_end which means that the 'complete' flag is
never set on the concerned cache_entry.
This patch adds a check on the 'complete' flag in the strm_deinit
callback and removes the entry from the cache if it is incomplete.
A way to exhibit this bug is to try to get the same "big" response on
multiple clients at the same time thanks to h2load for instance, and to
interrupt the client side before the answer can be fully stored in the
cache.
This patch can be backported up to 2.4 but it will need some rework
starting with branch 2.8 because of the latest cache changes.
pre-c99 compilers will fail to build the cache since commit 48f81ec09
("MAJOR: cache: Delay cache entry delete in reserve_hot function") due
to an int declaration in the for loop. No backport is needed.
Some functions are built on the fact that the cache lock must be already
taken by the caller. This patch adds this information in the functions'
descriptions.
This global variable was used to avoid using locks on shared_contexts in
the unlikely case of nbthread==1. Since the locks do not do anything
when USE_THREAD is not defined, it will be more beneficial to simply
remove this variable and the systematic test on its value in the shared
context locking functions.
A reference counter on the cache_entry was added in a previous commit.
Its value is atomically increased and decreased via the retain_entry and
release_entry functions.
This is needed because of the latest cache and shared_context
modifications that introduced two separate locks instead of the
preexisting single shctx_lock one.
With the new logic, we have two main blocks competing for the two locks:
- the one in the http_action_req_cache_use that performs a lookup in the
cache tree (locked by the cache lock) and then tries to remove the
corresponding blocks from the shared_context's 'avail' list until the
response is sent to the client by the cache applet,
- the shctx_row_reserve_hot that traverses the 'avail' list and gives
them back to the caller, while removing previous row heads from the
cache tree
Those two blocks require the two locks but one of them would take the
cache lock first, and the other one the shctx_lock first, which would
end in a deadlock without the current patch.
The way this conflict is resolved in this patch is by ensuring that at
least one of those uses works without taking the two locks at the same
time.
The solution found was to keep taking the two locks in the cache_use
case. We first lock the cache to lookup for an entry and we then take
the shctx lock as well to detach the corresponding blocks from the
'avail' list. The subtlety is that between the cache lookup and the
actual locking of the shctx, another thread might have called the
reserve_hot function in which we only take the shctx lock.
In this function we traverse the 'avail' list to remove blocks that are
then given to the caller. If one of those blocks corresponds to a
previous row head, we call the 'free_blocks' callback that used to
delete the cache entry from the tree.
We now avoid deleting directly the cache entries in reserve_hot and we
rather set the cache entries 'complete' param to 0 so that no other
thread tries to work with this entry. This way, when we release the
shctx lock in reserve_hot, the first thread that had performed the cache
lookup and had found an entry that we just gave to another thread will
see that the 'complete' field is 0 and it won't try to work with this
response.
The actual removal of entries from the cache tree will now be performed
in the new 'reserve_finish' callback called at the end of the
shctx_row_reserve_hot function. It will iterate on all the row head that
were inserted in a dedicated list in the 'free_block' callback and
perform the actual delete.
Descend the shctx_lock calls into the shctx_row_reserve_hot so that the
cases when we don't need to lock anything (enough space in the current
row or not enough space in the 'avail' list) do not take the lock at
all.
In sh_ssl_sess_new_cb the lock had to be descended into
sh_ssl_sess_store in order not to cover the shctx_row_reserve_hot call
anymore.
Add a reference counter on the cache_entry. Its value will be atomically
increased and decreased via the retain_entry and release_entry
functions.
The release_entry function has two distinct versions,
release_entry_locked and release_entry_unlocked that should be called
when the cache lock is already taken in write mode or not
(respectively). In the unlocked case the cache lock will only be taken
in write mode on the last reference of the entry (before calling
delete_entry). This allows to limit the amount of times when we need to
take the cache lock during a release operation.
Since a lock on the cache tree was added in the latest cache changes, we
do not need to use the shared_context's lock to lock more than pure
shared_context related data anymore. This already existing lock will now
only cover the 'avail' list from the shared_context. It can then be
changed to a rwlock instead of a spinlock because we might want to only
run through the avail list sometimes.
Apart form changing the type of the shctx lock, the main modification
introduced by this patch is to limit the amount of code covered by the
shctx lock. This lock does not need to cover any code strictly related
to the cache tree anymore.
After the latest changes in the cache/shared_context mechanism, the
cache and shared_context logic were decorrelated and in some unlikely
cases we might end up using the "show cache" command while some regular
cache processing is occurring (a response being stored in the cache for
instance). In such a case, because we used the same 'trash' buffer in
those two contexts, we could end up with the contents of a response in
the ouput of the "show cache" command.
This patch fixes this problem by allocating a dedicated trash for the
CLI command.
The "hot" list stored in a shared_context was used to keep a reference
to shared blocks that were currently being used and were thus removed
from the available list (so that they don't get reused for another cache
response). This 'hot' list does not ever need to be shared across
threads since every one of them only works on their current row.
The main need behind this 'hot' list was to detach the corresponding
blocks from the 'avail' list and to have a known list root when calling
list_for_each_entry_from in shctx_row_data_append (for instance).
Since we actually never need to iterate over all members of the 'hot'
list, we can remove it and replace the inc_hot/dec_hot logic by a
detach/reattach one.
When looking for a valid entry in the cache tree in
http_action_req_cache_use, we do not need to delete an expired entry at
once because even if an expired entry exists, since the request will be
forwarded to the server, then the expired entry will be overwritten when
the updated response is seen. We can then use a simpler rdlock during
cache_use operation.
Any lookup in the cache tree done through entry_exist or
secondary_entry_exist functions could end up deleting the corresponding
entry if it is expired which prevents from using a rdlock on code paths
that would just perform a lookup on the tree (in
http_action_req_cache_use for instance).
Adding a 'delete_expired' boolean as a parameter allows for "pure"
lookups and thus it will allow to perform operations on the tree that
simply require a rdlock instead of a "heavier" wrlock.
The "show cache" CLI command iterates over all the entries of the cache
tree and it used this opportunity to remove expired entries from the
cache. This behavior was completely undocumented and does not seem that
necessary. By removing it we can take the cache lock in read mode only
which limits the impact on the other threads.
Every use of the cache tree was covered by the shctx lock even when no
operations were performed on the shared_context lists (avail and hot).
This patch adds a dedicated RW lock for the cache so that blocks of code
that work on the cache tree only can use this lock instead of the
superseding shctx one. This is useful for operations during which the
concerned blocks are already in the hot list.
When the two locks need to be taken at the same time, in
http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one
must be taken first.
A new parameter needed to be added to the shared_context's free_block
callback prototype so that cache_free_block can take the cache lock and
release it afterwards.
This patch add a hash of the Origin header to the cache's secondary key.
This enables to manage store responses that have a "Vary: Origin" header
in the cache when vary is enabled.
This cannot be considered as a means to manage CORS requests though, it
only processes the Origin header and hashes the presented value without
any form of URI normalization.
This need was expressed by Philipp Hossner in GitHub issue #251.
Co-Authored-by: Philipp Hossner <philipp.hossner@posteo.de>
When building the secondary signature for cache entries when vary is
enabled, the referer part of the signature was a simple crc32 of the
first referer header.
This patch changes it to a 64bits hash based of xxhash algorithm with a
random seed built during init. This will prevent "malicious" hash
collisions between entries of the cache.
Since a recent change on the SC API, a producer must specify the amount of
free space it needs to progress when it is blocked. But, it must take care
to never exceed the maximum size allowed in the buffer. Otherwise, the
stream is freezed because it cannot reach the condition to unblock the
producer.
In this context, there is a bug in the cache applet when it fails to dump a
message. It may request more space than allowed. It happens when the cached
object is too big.
It is a 2.8-specific bug. No backport needed.
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
We now try, as far as possible, to rely on the SE descriptor to detect end
of processing. Idea is to no longer rely on the channel or the SC to do so.
First, we now set SE_FL_EOS instead of calling and cf_shutr() to report the
end of the stream. It happens when the response is fully sent (SE_FL_EOI is
already set in this case) or when an error is reported. In this last case,
SE_FL_ERROR is also set.
Thanks to this change, it is now possible to detect the applet must only
consume the request waiting for the upper layer releases it. So, if
SE_FL_EOS or SE_FL_ERROR are set, it means the reponse was fully
handled. And if SE_FL_SHR or SE_FL_SHW are set, it means the applet was
released by upper layer and is waiting to be freed.
Thanks to the previous patch, it is now possible for applets to not set the
CF_EOI flag on the channels. On this point, the applets get closer to the
muxes.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers don't take care of it. This will avoid some
bugs.
Before looking for a secondary cache entry for a given request we
checked that the first entry was complete, which might prevent us from
using a valid entry if the first one with the same primary key is not
full yet.
Likewise, if the primary entry is complete but not the secondary entry
we try to use, we might end up using a partial entry from the cache as
a response.
This bug was raised in GitHub #2048.
It can be backported up to branch 2.4.
Since commit cc9bf2e5f "MEDIUM: cache: Change caching conditions"
responses that do not have an explicit expiration time are not cached
anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag
instead of the TX_CACHEABLE one. The effect this had is that a cacheable
response that corresponded to a request having a "Cache-Control:
no-cache" for instance would not be cached.
Contrary to what was said in the other commit message, the "checkcache"
option should not be impacted by the use of the TX_CACHEABLE flag
instead of the TX_CACHE_IGNORE one. The response is indeed considered as
not cacheable if it has no expiration time, regardless of the presence
of a cookie in the response.
This should fix GitHub issue #2048.
This patch can be backported up to branch 2.4.
The cache makes use of dates advertised by external components, such
as "last-modified" or "date". As such these are wall-clock dates, and
not internal dates. However, all comparisons are mistakenly made based
on the internal monotonic date which is designed to drift from the wall
clock one in order to catch up with stolen time (which can sometimes be
intense in VMs). As such after some run time some objects may fail to
validate or fail to expire depending on the direction of the drift. This
is particularly visible when applying an offset to the internal time to
force it to wrap soon after startup, as it will be shifted up to 49.7
days in the future depending on the current date; what happens in this
case is that the reg-test "cache_expires.vtc" fails on the 3rd test by
returning stale contents from the cache at the date of this commit.
It is really important that all external dates are compared against
"date" and not "now" for this reason.
This fix needs to be backported to all versions.
CF_READ_NULL flag is not really useful and used. It is a transient event
used to wakeup the stream. As we will see, all read events on a channel may
be resumed to only one and are all used to wake up the stream.
In this patch, we introduce CF_READ_EVENT flag as a replacement to
CF_READ_NULL. There is no breaking change for now, it is just a
rename. Gradually, other read events will be merged with this one.