This one was added by commit 53216e7db ("MEDIUM: connections: Don't
directly mess with the polling from the upper layers.") after the
removal of the conditional cs_want_send() call. But after analysis
it turned out that it's not needed since the si_cs_send() call will
either succeed or subscribe.
Calling si_cs_send() alone is always dangerous because it can result
in the loss of an event if it manages to empty the buffer. Indeed, in
this case it's critical to call si_chk_rcv() on the opposite stream-int.
Given that si_cs_process() takes care of all this, let's call it instead.
All this code could possibly be refined soon to avoid redoing the whole
stream_int_notify() and do it only after a send(), but at the moment it's
not important.
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.
This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
The rationale here is that we should never need to try to send() at the
beginning of process_stream() because :
- if something was pending, it's very unlikely that it was unblocked
and not sent just between the last poll() and the wakeup instant.
- if something pending was recently sent, then we don't have anything
to send anymore.
So at first glance it doesn't seem like there could be any valid case
where trying to send before entering the function brings any benefit.
If a buffer allocation failed, we have SI_FL_WAIT_ROOM set and c_size(buf)
being zero. It's the only moment where we have a new opportunity to try to
allocate this buffer. However we don't want to waste our time trying this
if both are non-null since it indicates missing room without any changed
condition.
Well that's only 3 places (applet.c, stream_interface.c, hlua.c). This
ensures we always clear SI_FL_WAIT_ROOM before setting it on failure,
so that it is granted that SI_FL_WAIT_ROOM always indicates a lack of
room for doing an operation, including the inability to allocate a
buffer for this.
The vars_prune() and vars_init() functions involve locking while most of
the time there is no variable at all in streams nor sessions. Let's check
for emptiness before calling these functions. Simply doing this has
increased the multithreaded performance from 1.5 to 5% depending on the
workload.
While "option prefer-last-server" only applies to non-deterministic load
balancing algorithms, 401/407 responses actually caused haproxy to prefer
the last server unconditionally.
As this breaks deterministic load balancing algorithms like uri, this
patch applies the same condition here.
Should be backported to 1.8 (together with "BUG/MINOR: only mark
connections private if NTLM is detected").
Instead of marking all connections that see a 401/407 response private
(for connection reuse), this patch detects a RFC4559/NTLM authentication
scheme and restricts the private setting to those connections.
This is so we can reuse connections with 401/407 responses with
deterministic load balancing algorithms later (which requires another fix).
This fixes the problem reported here by Elliot Barlas :
https://discourse.haproxy.org/t/unable-to-configure-load-balancing-per-request-over-persistent-connection/3144
Should be backported to 1.8.
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.
Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).
This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues especially
when we compare it with others unsigned value (content length).
With this patch we check that shctx_init() does not returns 0.
This is possible if the maxblocks argument, which is passed as an
int, is negative due to an implicit conversion.
Must be backported to 1.8.
With this patch we support cache size larger than 2047 (MB) and prevent haproxy from crashing when "total-max-size" is parsed as negative values by atoi().
The limit at parsing time is 4095 MB (UINT_MAX >> 20).
May be backported to 1.8.
This patch adds "max-object-size" option to the cache to limit
the size in bytes of the HTTP objects to be cached. When not provided,
the maximum size of an HTTP object is a 256th of the cache size.
This patch makes the capable of storing HTTP objects larger than a buffer.
It makes usage of the "block by block shared object allocation" new shctx API.
A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.
This patch makes shctx capable of storing objects in several parts,
each parts being made of several blocks. There is no more need to
walk through until reaching the end of a row to append new blocks.
A new pointer to a struct shared_block member, named last_reserved,
has been added to struct shared_block so that to memorize the last block which was
reserved by shctx_row_reserve_hot(). Same thing about "last_append" pointer which
is used to memorize the last block used by shctx_row_data_append() to store the data.
Fred reported a random crash related to the pools. This was introduced
by commit e18db9e98 ("MEDIUM: pools: implement a thread-local cache for
pool entries") because the minimum pool item size should have been
increased to 32 bytes to accommodate the 2 double-linked lists.
No backport is needed.
This option makes a proxy use only HTX-compatible muxes instead of the
HTTP-compatible ones for HTTP modes. It must be set on both ends, this
is checked at parsing time.
With the previous connection model, when we purposely decided to stop
receiving in order to avoid polling after a complete request was received
for example, it was needed to set SI_FL_WAIT_ROOM to prevent receive
polling from being re-armed. Now with the new subscription-based model
there is no such thing anymore and there is noone to remove this flag
either. Thus if a request takes more than one packet to come in or
spans over too many packets, this flag will cause it to wait forever.
Let's simply remove this flag now.
This patch should not be backported since older versions still need
that this flag is set here to stop receiving.
We wake up all the streams waiting to send data when we have space available
in the mux buffer. Doing so means we probably wake way too many streams,
because after a few the buffer will probably be full instead. So keep a
list of all the streams that are about to send data, and if we detect that
the buffer is full, unschedule the tasks and put the streams back to the
send_list.
Make sure we call tasklet_free() only after si_release_endpoint(), when the
unsubscribe() method has been called, so that we're sure the mux won't
attempt to access the taslet.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and unactivated once we got the event
and we woke the related task.
In h2_recv(), return 1 if we have data available, or if h2_recv_allowed()
failed, to be sure h2_process() is called.
Also don't subscribe if our buffer is full.
When retrying to connect to a server, because the previous connection failed,
make sure if we subscribed to the previous connection, the polling flags will
be true for the new fd.
No backport is needed.
When we're closing a stream, is there's no stream left and a goaway was sent,
close the connection, there's no reason to keep it open.
[wt: it's likely that this is needed in 1.8 as well, though it's unclear
how to trigger this issue, some tests are needed]
Similary to what's been done in 7a6ad88b02,
take into account that free_list that free_list is a void **, and so use
a void ** too when attempting to do a CAS.
The purpose is to detect if threads or processes are competing for the
same CPU. This can happen when threads are incorrectly bound, or after a
reload if the previous process still has an important activity. With
threads this situation is problematic because a preempted thread holding
a lock will block other ones waiting for this lock to be released.
A first attempt consisted in measuring the cumulated lost time more
precisely but the system's scheduler is smart enough to try to limit the
thread preemption rate by mostly context switching during poll()'s blank
periods, so most of the time lost is not seen. In essence this is good
because it means a thread is not preempted with a lock held, and even
regarding the rendez-vous point it cannot prevent the other ones from
making progress. But still it happens tens to hundreds of times per
second that a thread might be preempted, so it's still possible to detect
that the situation is happening, thus it's interesting to measure and
report its frequency.
Each time we enter the poller, we check the CPU time spent working and
see if we've lost time doing something else. To limit false positives,
we're only interested in losses of 500 microseconds or more (i.e. half
a clock tick on a 1 kHz system). If so, it indicates that some time was
stolen by another thread or process. Note that we purposely store some
sub-millisecond counters so that under heavy traffic with a 1 kHz clock,
it's still possible to measure something without being subject to the
risk of rounding errors (i.e. if exactly 1 ms is stolen it's possible
that the time difference could often be slightly lower).
This counter of lost CPU time slots time is reported in "show activity"
in numbers of milliseconds of CPU lost per second, per 15s, and total
over the process' life. By definition, the per-second counter cannot
report values larger than 1000 per thread per second and the 15s one
will be limited to 15000/s in the worst case, but it's possible that
peak values exceed such thresholds after long pauses.
The calls to HA_ATOMIC_CAS() on the lockfree version of the pool allocator
were mistakenly done on (void*) for the old value instead of (void **).
While this has no impact on "recent" gcc, it does have one for gcc < 4.7
since the CAS was open coded and it's not possible to assign a temporary
variable of type "void".
No backport is needed, this only affects 1.9.
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work arond a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
There are as many ways to build the globalfilepathlen variable as branches
in the if/then/else, creating lots of confusion. Address the most obvious
parts, but some polishing definitely is still needed.
These ones are on error paths that are properly handled by luaL_error()
which does a longjmp() but the compiler cannot know it. By adding an
__unreachable() statement in WILL_LJMP(), there is no ambiguity anymore.
This may be backported to 1.8 but these previous patches are needed first :
- BUILD: compiler: add a new statement "__unreachable()"
- MINOR: lua: all functions calling lua_yieldk() may return
- BUILD: lua: silence some compiler warnings about potential null derefs (#2)
There was a mistake when tagging functions which always use longjmp and
those which may use it in that all those supposed to call lua_yieldk()
may return without calling longjmp. Thus they must not use WILL_LJMP()
but MAY_LJMP(). It has zero impact on the code emitted as such, but
prevents other fixes from being properly implemented : this was the
cause of the previous failure with the __unreachable() calls.
This may be backported to older versions. It may or may not apply
well depending on the context, though the change simply consists in
replacing "WILL_LJMP(hlua_yieldk" with "MAY_LJMP(hlua_yieldk", and
same with the single call to lua_yieldk() in hlua_yieldk().
Here we make sure that appctx is always taken from the unchecked value
since we know it's an appctx, which explains why it's immediately
dereferenced. A missing test was added to ensure that task_new() does
not return a NULL.
This may be backported to 1.8.
This reverts commit f1ffb39b61.
It breaks Lua causing some timeouts. Removing the __unreachable() statement
from WILL_LJMP() fixes it. It's very strange and unclear whether it's an
issue with WILL_LJMP() not fullfilling its promise of not returning, if
the code emitted with __unreachable() gets broken, or anything else. Let's
revert this for now.
There is a bug in this function used to release other threads. It leaves
the current thread marked as harmless. If after this another thread does
a thread_isolate(), but before the first one reaches poll(), the second
thread will believe it's alone while it's not.
This must be backported to 1.8 since the rendez-vous point was merged
into 1.8.14.
Each thread now keeps the last ~512 kB of freed objects into a local
cache. There are some heuristics involved so that a specific pool cannot
use more than 1/8 of the total cache in number of objects. Tests have
shown that 512 kB is an optimal size on a 24-thread test running on a
dual-socket machine, resulting in an overall 7.5% performance increase
and a cache miss ratio reducing from 19.2 to 17.7%. Anyway it seems
pointless to keep more than an L2 cache, which probably explains why
sizes between 256 and 512 kB are optimal.
Cached objects appear in two lists, one per pool and one LRU to help
with fair eviction. Currently there is no way to check each thread's
cache state nor to flush it. This cache cannot be disabled and is
enabled as soon as the lockless pools are enabled (i.e.: threads are
enabled, no pool debugging is in use and the CPU supports a double word
CAS).
For caching it will be convenient to have indexes associated with pools,
without having to dereference the pool itself. One solution could consist
in replacing all pool pointers with integers but this would limit the
number of allocatable pools. Instead here we allocate the 32 first pools
from a pre-allocated array whose base address is known so that it's trivial
to convert a pool to an index in this array. Pools that cannot fit there
will be allocated normally.
Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.
It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.
Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
The vast majority of FDs are only seen by one thread. Currently the lock
on FDs costs a lot because it's touched often, though there should be very
little contention. This patch ensures that the lock is only grabbed if the
FD is shared by more than one thread, since otherwise the situation is safe.
Doing so resulted in a 15% performance boost on a 12-threads test.
peers_init_sync() doesn't check task_new()'s return value and doesn't
return any result to indicate success or failure. Let's make it return
an int and check it from the caller.
This can be backported as far as 1.6.
Gcc reports a potential null-deref error in the stick-table init code.
While not critical there, it's trivial to fix. This check has been
missing since 1.4 so this fix can be backported to all supported versions.
This null-deref cannot happen either as there necesarily is a listener
where this function is called. Let's use __objt_listener() to address
this.
This may be backported to 1.8.
Gcc 6.4 detects a potential null-deref warning in smp_fetch_ssl_fc_cl_str().
This one is not real since already addressed a few lines above. Let's use
__objt_conn() instead of objt_conn() to avoid the extra test that confuses
it.
This could be backported to 1.8.
These ones are on error paths that are properly handled by luaL_error()
which does a longjmp() but the compiler cannot know it. By adding an
__unreachable() statement in WILL_LJMP(), there is no ambiguity anymore.
This may be backported to 1.8 but the previous patch (BUILD: compiler:
add a new statement "__unreachable()") is needed for this.
In case pool_alloc() fails in stream_new(), we try to detach the stream
from the list before it has been added, dereferencing a NULL. In order
to fix it, simply move the LIST_DEL call upwards.
This must be backported to 1.8.
The listeners with the LI_O_INHERITED flag were deleted but not unbound
which is a problem since we have a polling in the master.
This patch unbind every listeners which are not require for the master,
but does not close the FD of those that have a LI_O_INHERITED flag.
We will need to know if a mux was created for a front or a back
connection and once it's established it's much harder, so let's
introduce H2_CF_IS_BACK for this.
For backend connections we'll have to initialize streams but not allocate
conn_streams since they'll already be there. Thus this patch splits the
h2c_stream_new() function into one dedicated to allocation of a new stream
and another one supposed to attach this stream to an existing frontend
connection.
Till now in order to figure the timeouts, we used to retrieve the proxy
from the session's owner, but the new API provides it so it's better to
simply take it from the caller at init time. We take this opportunity to
store the pointer to the proxy into the h2 connection so that we can
reuse it later when needed.
The init function was split into the mux init and the front init, but it
appears that most of the code will be common between the two sides when
implementing the backend init. Thus let's simply make this a unique
h2_init() function.
h2_snd_buf() must not accept to send data if the preface was not yet
received nor sent. At the moment it doesn't happen but it can with
server-side H2.
At a few places we check these states to detect if a stream has valid
data/errcode or is one of the two dummy streams (idle or closed). It
will become problematic for outgoing streams as it will not be possible
to report errors for example since the stream will switch from IDLE
state only after sending a HEADERS frame.
There is a safer solution consisting in checking the stream ID, which
may only be zero in the dummy streams. This patch changes the test to
only rely on the stream ID.
At many places in muxes we'll have to add tests to check if the
connection is front or back before deciding to log. Instead let's
centralize this test in sess_log() to simply do nothing when sess=NULL.
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoid some boring
filtering in the mux H1.
Instead of using offsets relating to the parsed buffer to store start line
infos, we now use indirect strings. So now, these infos remain valid only if the
origin buffer remains untouched. But it's not a real problem because this union
is used during the parsing and never stored to a later use.
When headers parsing ends, a pseudo header with an empty name and an empty value
is added to the array of parsed headers to mark its end. It is convenient to
loop on this array, but not really useful if we want remove the last header or
add a new one, because we don't really know where is the last CRLF (the empty
line ending the headers block). So now, instead the name of this pseudo header
points on this last CRLF. Its length is still 0 and its value is still empty, so
loops on the array remains unchanged.
Since keep-alive mode is the default mode, the passive close has disappeared,
and in the code, httpclose and forceclose options are handled the same way:
connections with the client and the server are closed as soon as the request and
the response are received and missing "Connection: close" header is added in
each direction.
So to make things clearer, forceclose is now an alias for httpclose. And
httpclose is explicitly an active close. So the old passive close does not exist
anymore. Internally, the flag PR_O_HTTP_PCL has been removed and PR_O_HTTP_FCL
has been replaced by PR_O_HTTP_CLO. In HTTP analyzers, the checks done to find
the right mode to use, depending on proxies options and "Connection: " header
value, have been simplified.
This should only be a cleanup and no changes are expected.
This option is frontends specific, so there is no reason to support it on
backends. So now, it is ignored if it is set on a backend and a warning is
emitted during the startup. The change is quite trivial, but the commit is
tagged as MEDIUM because it is a small breakage with previous versions and
configurations using this options could emit a warning now.
This option is backends specific, so there is no reason to support it on
frontends. So now, it is ignored if it is set on a frontend and a warning is
emitted during the startup. The change is quite trivial, but the commit is
tagged as MEDIUM because it is a small breakage with previous versions and
configurations using this options could emit a warning now.
To ease the refactoring, the function "http_header_add_tail" have been
remove. Now, "http_header_add_tail2" is always used. And the function
"capture_headers" have been renamed into "http_capture_headers". Finally, some
functions have been exported.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will be never called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.
This is 1.9-specific and shouldn't be backported.
This bug appeared only if nbthread > 1. Handling the pipe with the
master, multiple threads of the same worker could process the deinit().
In addition, deinit() was called while some other threads were still
performing some tasks.
This patch assign the handler of the pipe with master to only the first
thread and removes the call to deinit() before exiting with an error.
This patch should be backported in v1.8.
If we can't send data for a stream because of its flow control, make sure
not to put it in the send_list, until the flow control lets it send again.
This is specific to 1.9, and should not be backported.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, Add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
As we don't know how subscriptions are handled, we can't just assume we can
use LIST_DEL() to unsubscribe, so introduce a new method to mux and connections
to do so.
CurSslConns inc/dec operations are not threadsafe. The unsigned CurSslConns
counter can wrap to a negative value. So we could notice connection rejects
because of MaxSslConns limit artificially exceeded.
CumSslConns inc operation are also not threadsafe so we could miss
some connections and show inconsistenties values compared to CumConns.
This fix should be backported to v1.8.
The run queue is designed to perform a single tree lookup and to
use multiple passes to eb32sc_next(). The scheduler rework took a
conservative approach first but this is not needed anymore and it
increases the processing cost of process_runnable_tasks() and even
the time during which the RQ lock is held if the global queue is
heavily loaded. Let's simply move the initial lookup to the entry
of the loop like the previous scheduler used to do. This has reduced
by a factor of 5.5 the number of calls to eb32sc_lookup_get() there.
In the past this conditional had multiple conditionals which is why the
additional parentheses were needed. The conditional was simplified but
the duplicate parentheses were not cleaned up.
OpenSSL released support for TLSv1.3. It also added a separate function
SSL_CTX_set_ciphersuites that is used to set the ciphers used in the
TLS 1.3 handshake. This change adds support for that new configuration
option by adding a ciphersuites configuration variable that works
essentially the same as the existing ciphers setting.
Note that it should likely be backported to 1.8 in order to ease usage
of the now released openssl-1.1.1.
In ci_insert_line2() and b_rep_blk(), we can't afford to wrap, so don't use
b_tail() to check if we do, use __b_tail() instead.
This should be backported to previous versions.
For generate-certificates, X509V3_EXT_conf is used but it's an old API
call: X509V3_EXT_nconf must be preferred. Openssl compatibility is ok
because it's inside #ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME, introduce 5
years after X509V3_EXT_nconf.
Commit 8ae735da0 ("MEDIUM: mux_h2: Revamp the send path when blocking.")
added a tasklet allocation in h2_stream_new(), however the error exit path
fails to reset h2s in case the tasklet cannot be allocated, resulting in
the h2s pointer to be returned as valid to the caller. Let's readjust the
exit path to always return NULL on error and to always log as well (since
there is no reason for not logging on such important errors).
No backport is needed, this is strictly 1.9-dev.
Since commit 7505f94f9 ("MEDIUM: h2: Don't use a wake() method anymore."),
the H2 mux's init() calls h2_process(). But this last one may detect an
early error and call h2_release(), destroying the connection, and return
-1. At this point we're screwed because the caller will still dereference
the connection for various things ranging from the configuration of the
proxy protocol header to the retries. We could simply return -1 here upon
failure but that's not enough since the stream layer really needs to keep
its connection structure allocated (to clean it up in session_kill_embryonic
or for example because it holds the destination address to reconnect to
when the connection goes to the backend). Thus the correct solution here is
to only schedule a wakeup of the I/O callback so that the init succeeds,
and that the connection is only handled later.
No backport is needed, this is 1.9-specific.
The return value from conn_install_mux() was not checked, so if an
inconsistency happens in the code, or a memory allocation fails while
initializing the mux, we can crash while using an uninitialized mux.
In practice the code inconsistency does not really happen since we
cannot configure such a situation, except during development, but
the out of memory condition could definitely happen.
This should be backported to 1.8 (the code is a bit different there,
there are two calls to conn_install_mux()).
The prototypes of functions find_hdr_value_end(), extract_cookie_value()
and http_header_match2() were still in proto_http.h while some of them
don't exist anymore and the others were just moved. Let's remove them.
In addition, da.c was updated to use http_extract_cookie_value() which
is the correct one.
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions's prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
The current proto_http.c file is huge and contains different processing
domains making it very difficult to work on an alternative representation.
This commit moves some parts to other files :
- ACL registration code => http_acl.c
This code only creates some ACL mappings and doesn't know anything
about HTTP nor about the representation. This code could even have
moved to acl.c but it was not worth polluting it again.
- HTTP sample conversion => http_conv.c
This code doesn't depend on the internal representation but definitely
manipulates some HTTP elements, such as dates. It also has access to
captures.
- HTTP sample fetching => http_fetch.c
This code does depend entirely on the internal representation but is
totally independent on the analysers. Placing it into a different
file will ease the transition to the new representation and the
creation of a wrapper if required. An include file was created due
to CHECK_HTTP_MESSAGE_FIRST() being used at various places.
- HTTP action registration => http_act.c
This code doesn't directly interact with the messages nor the
transaction but it does so via some exported http functions like
http_replace_req_line() or http_set_status() so it will be easier
to change only this after the conversion.
- a few very generic parts were found and moved to http.{c,h} as
relevant.
It is worth noting that the functions moved to these new files are not
referenced anywhere outside of the files and are only called as registered
callbacks, so these files do not even require associated include files.
found by coverity.
[wt: this bug was introduced by commit 404d978 ("MINOR: add ALPN
information to send-proxy-v2"). It might be triggered by a health
check on a server using ppv2 or by an applet making use of such a
server, if at all configurable].
This needs to be backported to 1.8.
This ads support for accessing stick tables from Lua. The supported
operations are reading general table info, lookup by string/IP key, and
dumping the table.
Similar to "show table", a data filter is available during dump, and as
an improvement over "show table" it's possible to use up to 4 filter
expressions instead of just one (with implicit AND clause binding the
expressions). Dumping with/without filters can take a long time for
large tables, and should be used sparingly.
At the eand of process_stream(), we wake the task if there's something in
the input buffer, after attempting a recv. However this is wrong, and we should
only do so if we received new data. Just check the CF_READ_PARTIAL flag.
This is 1.9-specific and should not be backported.
Instead of using si_cs_io_cb() in process_stream() use si_cs_send/si_cs_recv
instead, as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus timeout would never get triggered.
With recent modifications on the buffers API, when a buffer is released (calling
b_free), we replace it by BUF_NULL where the area pointer is NULL. So many
operations, like b_peek, must be avoided on a released or not allocated
buffer. These changes were mainly made in the commit c9fa048 ("MAJOR: buffer:
finalize buffer detachment").
Since this commit, HAProxy can crash during the body parsing of chunked HTTP
messages because there is no check on the channel's buffer in HTTP analyzers
(http_request_forward_body and http_response_forward_body) nor in H1 functions
reponsible to parse chunked content (h1_skip_chunk_crlf & co). If a stream is
woken up after all input data were forwarded, its input channel's buffer is
released (so set to BUF_NULL). In this case, if we resume the parsing of a
chunk, HAProxy crashes.
To fix this issue, we just skip the parsing of chunks if there is no input data
for the corresponding channel. This is only done if the message state is
strickly lower to HTTP_MSG_ENDING.
Tim Düsterhus found using afl-fuzz that some parts of the HPACK decoder
use incorrect bounds checking which do not catch negative values after
a type cast. The first culprit is hpack_valid_idx() which takes a signed
int and is fed with an unsigned one, but a few others are affected as
well due to being designed to work with an uint16_t as in the table
header, thus not being able to detect the high offset bits, though they
are not exposed if hpack_valid_idx() is fixed.
The impact is that the HPACK decoder can be crashed by an out-of-bounds
read. The only work-around without this patch is to disable H2 in the
configuration.
CVE-2018-14645 was assigned to this bug.
This patch addresses all of these issues at once. It must be backported
to 1.8.
An invalid null-deref warning is emitted because cmsg is not checked,
though it definitely is valid given the test performed 10 lines above,
but the compiler cannot necessarily guess this. Adding a null test to
the problematic condition is enough to get rid of it and cheap enough.
Like for the other checks, the type is being tested just before calling
objt_{server,dns_srvrq}() so let's use the unguarded version instead to
silence the warning.
When building with -Wnull-dereferences, gcc sees some cases where a
pointer is dereferenced after a check may set it to null. While all of
these are already guarded by either a preliminary test or the code's
construction (eg: listeners code being called only on listeners), it
cannot be blamed for not "seeing" this, so better use the unguarded
calls everywhere this happens, particularly after checks. This is a
step towards building with -Wextra.
Theorically nothing would prevent a front applet form connecting to a stats
socket, and if a "getsock" command was issued, it would cause a crash. Right
now nothing in the code does this so in its current form there is no impact.
It may or may not be backported to 1.8.
In h1_headers_to_hdr_list, when an incomplete message is parsed, all updates
must be skipped until the end of the message is found. Then the parsing is
restarted from the beginning. But not all updates were skipped, leading to
invalid rewritting or segfault.
No backport is needed.
A null pointer assignment was missing after free() in function
pat_ref_reload() which can lead to segfault.
This bug was introduced in commit b5997f7 ("MAJOR: threads/map: Make
acls/maps thread safe").
Must be backported to 1.8.
Just like we used to do in proto_http, we now check that each and every
occurrence of the content-length header field and each of its values are
exactly identical, and we normalize the header to return the last value
of the first header with spaces trimmed.
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.
For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.
Last, if such a header is found, we delete all content-length header
fields found in the message.
The new function h1_parse_connection_header() is called when facing a
connection header in the generic parser, and it will set up to 3 bits
in h1m->flags indicating if at least one "close", "keep-alive" or "upgrade"
tokens was seen.
This will be needed for the mux to know how to process the Connection
header, and will save it from having to re-parse the request line since
it's captured on the fly.
While it was possible to consider the status before parsing response
headers, it's wrong to do it for request headers and could lead to
random behaviours due to this status matching other fields instead.
Additionnally there is little to no value in doing this for each and
every new header field. It's much better to reset the content-length
at once in the callerwhen seeing such statuses (which currently is only
the H2 mux).
No backport is needed, this is purely 1.9.
The h2 parser has this specificity that if it cannot send the headers
frame resulting from the headers it just parsed, it needs to drop it
and parse it again later. Since commit 8852850 ("MEDIUM: h1: let the
caller pass the initial parser's state"), when this happens the parser
remains in the data state and the headers are not parsed again next
time, resulting in a parse error. Let's reset the parser on exit there.
No backport is needed.
If we're detaching the conn_stream, and it was subscribed to be waken up
when more data was available to receive, unsubscribe it.
No backport is needed.
Empty both send_list and fctl_list when destroying the h2 context, so that
if we're freeing the stream after, it doesn't try to remove itself from the
now-deleted list.
No backport is needed.
Till now it was very difficult for a mux to know what proxy it was
working for. Let's pass the proxy when the mux is instanciated at
init() time. It's not yet used but the H1 mux will definitely need
it, just like the H2 mux when dealing with backend connections.
The h1 parser used to systematically turn header field names to lower
case because it was designed for H2. Let's add a flag which is off by
default to condition this behaviour so that when using it from an H1
parser it will not affect the message.
The original H1 request parsing code was reintroduced into the generic
H1 parser so that it can be used regardless of the direction. If the
parser is interrupted and restarts, it makes use of the H1_MF_RESP
flag to decide whether to re-parse a request or a response. While
parsing the request, the method is decoded and set into the start line
structure.
The HTTP status is not relevant to the H1 message but to the H2 stream
itself. It used to be placed there by pure convenience but better move
it before it's too hard to remove.
This state was only a delimiter between headers and body but it now
causes more harm than good because it requires someone to change it.
Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let
it set the right next state so that h1m->state constantly matches
what is expected afterwards.
While it was not needed in the H2 mux which was reading full H1 messages
from the channel, it is mandatory for the H1 mux reading contents from
outside to be able to restart on a message. The problem is that the
headers are indexed on the fly, and it's not fun to have to store
everything between calls.
The solution here is to complete the first pass doing a partial restart,
and only once the end of message was found, to start over it again at
once, filling entries. This way there is a bounded number of passes on
the contents and no need to store an intermediary result anymore. Later
this principle could even be used to decide to completely drop an output
buffer to save memory.
This will allow the parser to fill some extra fields like the method or
status without having to store them permanently in the HTTP message. At
this point however the parser cannot restart from an interrupted read.
Till now the H1 parser made for H2 used to be lenient on invalid header
field names because they were supposed to be produced by haproxy. Now
instead we'll rely on err_pos to know how to act (ie: -2 == must block).
There's no reason to have the two sides in H1 format since we only use
one at a time (the response at the moment). While completely removing
the request declaration, let's rename the response to "h1m" to clarify
that it's the unique h1 message there.
This is the *parsing* state of an HTTP/1 message. Currently the h1_state
is composite as it's made both of parsing and control (100SENT, BODY,
DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state
that can be used by H1 parsers. For now it's equivalent to h1_state.
Instead of waiting for the connection layer to let us know we can read,
attempt to receive as soon as process_stream() is called, and subscribe
to receive events if we can't receive yet.
Now, except for idle connections, the recv(), send() and wake() methods are
no more, all the lower layers do is waking tasklet for anybody waiting
for I/O events.
Change fctl_list and send_list to be lists of struct wait_list, and nuke
send_wait_list, as it's now redundant.
Make the code responsible for shutr/shutw subscribe to those lists.
Instead of having our wake() method called each time a fd event happens,
just subscribe to recv/send events, and get our tasklet called when that
happens. If any recv/send was possible, the equivalent of what h2_wake_cb()
will be done.
Let the connection layer know we're always interested in getting more data,
so that we get scheduled as soon as data is available, instead of relying
on the wake() method.
Make h2_recv() and h2_send() return 1 if data has been sent/received, or 0
if it did not. That way the caller will be able to know if more work may
have to be done.
Remove the recv() method from mux and conn_stream.
The goal is to always receive from the upper layers, instead of waiting
for the connection later. For now, recv() is still called from the wake()
method, but that should change soon.
For struct connection, struct conn_stream, and for the h2 mux, add 2 new
lists, one that handles waiters for recv, and one that handles waiters for
recv and send. That way we can ask to subscribe for either recv or send.
Resetting the polling flags at the end of conn_fd_handler() shouldn't be
needed anymore, and it will create problem when we won't handle send/recv
from conn_fd_handler() anymore.
Cyril Bonté reported that commit f9cc07c25b broke the build without
thread.
We don't need to initialise tid = 0 in mworker_loop, so we could
completely remove it.
Christopher noticed that the CS_FL_EOS to CS_FL_REOS conversion was
incomplete : when the connectionis closed, we mark the streams with EOS
instead of REOS, causing the loss of any possibly pending data. At the
moment it's not an issue since H2 is used only with a client, but with
servers it could be a real problem if servers close the connection right
after sending their response.
This patch should be backported to 1.8.
This patch ensures that a DNS resolution may be launched before
setting a server FQDN via the CLI. Especially, it checks that
resolvers was set.
A LEVEL 4 reg testing file is provided.
Thanks to Lukas Tribus for having reported this issue.
Must be backported to 1.8.
This protocol is based on the uxst one, but it uses socketpair and FD
passing insteads of a connect()/accept().
The "sockpair@" prefix has been implemented for both bind and server
keywords.
When HAProxy wants to connect through a sockpair@, it creates 2 new
sockets using the socketpair() syscall and pass one of the socket
through the FD specified on the server line.
On the bind side, haproxy will receive the FD, and will use it like it
was the FD of an accept() syscall.
This protocol was designed for internal communication within HAProxy
between the master and the workers, but it's possible to use it
externaly with a wrapper and pass the FD through environment variabls.
It's possible to have several protocols per family which is a problem
with the current way the protocols are stored.
This allows to register a new protocol in HAProxy which is not a
protocol in the strict socket definition. It will be used to register a
SOCK_STREAM protocol using socketpair().
In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.
This should be backported to 1.8.
The following functions only deal with header field values and are agnostic
to the HTTP version so they were moved to http.c :
http_header_match2(), find_hdr_value_end(), find_cookie_value_end(),
extract_cookie_value(), parse_qvalue(), http_find_url_param_pos(),
http_find_next_url_param().
Those lacking the "http_" prefix were modified to have it.
There are 3 tables in proto_http which are used exclusively by logs :
hdr_encode_map[], url_encode_map[] and http_encode_map[]. They indicate
what characters are safe to be emitted in logs depending on the part of
the message where they are placed. Let's move this to log.c, as well as
its initialization. It's worth noting that the rfc5424 map was already
initialized there.
These error codes and messages are agnostic to the version, even if
they are represented as HTTP/1.0 messages. Ultimately they will have
to be transformed into internal HTTP messages to be used everywhere.
The HTTP/1.1 100 Continue message was turned to an IST and the local
copy in the Lua code was removed.
This function is purely HTTP once http_txn is put aside. So the original
one was renamed to http_txn_get_path() and it extracts the relevant offsets
from the txn to pass them to http_get_path(). One benefit of the new version
is that it returns the length at the same time so that allowed to slightly
simplify http_get_path_from_string() which had to look up the end pointer
previously and which is not needed anymore.
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.
This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.
Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
We need to clean the FDs registered manually in the poller to avoid FD
leaking during a reload of the master.
This patch call the per thread deinit function which close the thread
waker pipe.
In order to communicate with the workers, the master pipe has been
replaced by a socketpair() per worker.
The goal is to use these sockets as stats sockets and be able to access
them from the master.
When reloading, the master serialize the information of the workers and
put them in a environment variable. Once the master has been reexecuted
it unserialize that information and it is capable of closing the FDs of
the leaving children.
The master now use a poll loop, which should be initialized even in wait
mode. We need to init some variables if we didn't success to load the
configuration file.
If haproxy failed to load its configuration, the process is reexecuted
and it did not init the poller. So we must not try to deinit the poller
before the exec().
With the new way of handling the signals in the master worker, we are
are not staying in a waitpid() loop. Which means that we need to catch the
SIGCHLD signals to call waitpid().
The problem is when the master is reloading, this signal is neither
registered nor blocked so we lost all signals between the restart and
the call to mworker_loop().
This patch blocks the SIGCHLD signals before the reloading and ensure
it's not unblocked before the master registered the SIGCHLD handler.
In order to reorganize the code of the master worker, the mworker_wait()
function which was the main function was split. This function was
handling a wait() loop, but it does not need it anymore since the code
will use the poll loop of haproxy instead.
The function was split in several functions:
- mworker_catch_sigterm() which is a signal handler for SIGTERM ans
SIGUSR1 that sends the signals to the workers
- mworker_catch_sigchld() which is the code handling the leaving of a
child
- mworker_catch_sighup which basically call the mworker_restart()
function
- mworker_loop() which is the function calling the main poll loop in the
master
Instead of having a separate area for the captured data, we now have a
contigous block made of the descriptor and the data. At the moment, since
the area is dynamically allocated, we can adjust its size to what is
needed, but the idea is to quickly switch to a pool and an LRU list.
Now upon error we dynamically allocate the snapshot instead of overwriting
it. This way there is no more memory wasted in the proxy to hold the two
error snapshot descriptors. Also an appreciable side effect of this is that
the proxy's lock is only taken during the pointer swap, no more while copying
the buffer's contents. This saves 480 bytes of memory per proxy.
The proxy's lock it held while filling the error but not while dumping
it, so it's possible to dereference pointers being replaced, typically
server pointers. The risk is very low and unlikely but not inexistent.
Since "show errors" is rarely used in parallel, let's simply grab the
proxy's lock while dumping. Ideally we should use an R/W lock here but
it will not make any difference.
This patch must be backported to 1.8, but the code is in proto_http.c
there, though mostly similar.
Now that we have a generic error capture function, let's simplify
http_capture_bad_message() to make use of it. At this point the API
is not changed at all, but it could be further simplified.
This function now captures an error regardless of its side and protocol.
The caller must pass a number of elements and may pass a protocol-specific
structure and a callback to display it. Later this function may deal with
more advanced allocation techniques to avoid allocating as many buffers
as proxies.
The HTTP dumps are now configurable in the code : "show errors" now
calls a protocol-specific function to emit the decoded output. For
now only HTTP is implemented.
The output of "show errors" was slightly reordered to split the HTTP part
in a single chunk_appendf() call. The useless buffer total input was
replaced to report the buffer's start offset, which is the offset in the
stream of the first input byte (thus not counting output). Also it was
the opportunity to stop calling the stream "session".
The idea will be to make the error snapshot feature accessible to other
protocols than just HTTP. This patch only introduces an "http_snapshot"
structure and renames a few fields to make things more explicit. The
HTTP part was installed inside a union so that we can easily add more
protocols in the future.
The snapshots have the ability to restart a partial dump and they use
the stream ID as the restart point. Since it's purely HTTP, let's use
the event ID instead.
Let's use an atomic increment for the error snapshot, as we'd rather
not assign the same ID to two errors happening in parallel. It's very
unlikely that it will ever happen though.
This patch must be backported to 1.8 with the other one it relies on
("MINOR: thread: implement HA_ATOMIC_XADD()").
On the Mailing list, Marcos Moreno reported that haproxy configuration
validation (through "haproxy -c cfgfile") does not detect when a
resolvers section does not exist for a server.
That said, this checking is done after HAProxy has started up.
The problem is that this can create production issue, since init
script can't detect the problem before starting / reloading HAProxy.
To fix this issue, this patch registers the function which validates DNS
configuration validity and run it right after configuration parsing is
finished (through cfg_register_postparser()).
Thanks to it, now "haproxy -c cfgfile" will fail when a server
points to a non-existing resolvers section (or any other validation made
by the function above).
Backport status: 1.8
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
Commit 5e74b0b ("MEDIUM: h1: port to new buffer API.") introduced a
minor bug by which a buffer's head could stay shifted by the amount
of removed CRLF if it started with empty lines. This would cause the
second request (or response) not to work until it would receive a few
extra characters. This most only impacts requests sent by hand though.
This is purely 1.9, no backport is needed.
The h2 mux currently lacks some basic transparency. Some errors cause the
connection to be aborted but they couldn't be reported. With this patch,
almost all situations where an error will cause a stream or connection to
be aborted without the ability for an existing stream to report it will be
reported in the logs. This at least provides a solution to monitor the
activity and abnormal traffic.
The new function sess_log() only needs a session to emit a log. It will
ignore the parts that depend on the stream. It is usable to emit a log
to report early errors in muxes. These ones will typically mention
"<BADREQ>" for the request and 0 for the HTTP status code.
Till now it was impossible to emit logs from the lower layers only because
a stream was mandatory. From now on it will at least be possible to emit a
log to report a bad request or some timings for example. When the stream
is null, sess_build_logline() will use default values and will extract the
timing information from the session just like stream_new() does, so the
resulting log line is perfectly valid.
The termination state will indicate a proxy error during the request phase
since it is the only realistic use for such a call with no stream.
When s==NULL we don't have any assigned request counter. Ideally we
should proceed exactly like when a stream is initialized and assign
a unique value. For now we only place it into a local variable.
We'll soon support s==NULL so let's use an intermediary variable for the
logs structure. For now it only points to s->logs but will support a local
variable as an alternative later.
The current build_logline() can only be used with valid streams, which
means it is not suitable for use from muxes. We start by moving it into
another more generic function which takes the session as an argument,
to avoid complexifying all the internal API for jsut a few use cases.
This new function is not supposed to be called directly from outside so
we'll be able to instrument it to support several calling conventions.
For now the behaviour and conditions remain unchanged.
While parsing a headers frame, if the frame is wrapped in the buffer
and needs to be unwrapped, it will be duplicated before being processed.
But if it contains certain combinations of invalid flags, the parser
returns without releasing the temporary buffer leading to a memory
leak.
This fix needs to be backported to 1.8.
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake where it belongs, which
is the session.
However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).
So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.
This patch will need to be backported to 1.8.
The request counter is incremented when creating a new stream and when
resetting a stream, preparing for a new request. Unfortunately during
the thread migration this was missed, leading to non-atomic increments
in case threads are in use. The most visible side effect is that two
requests may have the same ID from time to time in the logs. However
the SPOE also uses this ID to route responses back to the stream so it
may also lead to occasional spurious SPOE timeouts.
Note that it still doesn't guarantee temporal unicity in the stream
identifiers since a long and a short connection could technically use
the same ID. The likeliness that this happens at the same time is almost
null (roughly threads*runqueue_depth/2^32 that it happens in the same
poll loop), but it will have to be addressed later anyway.
This patch must be backported to 1.8 with the other one it relies on
("MINOR: thread: implement HA_ATOMIC_XADD()").
With openssl >= 1.1.1 and boringssl multi-cert is natively supported.
ECDSA/RSA selection is done and work correctly with TLS >= v1.2.
TLS < v1.2 have no TLSEXT_TYPE_signature_algorithms extension: ECC
certificate can't be selected, and handshake fail if no RSA cert
is present. Safe ECC certificate selection without client announcement
can be very tricky (browser compatibilty). The safer approach is to
select ECDSA certificate if no other certificate matches, like it is
with openssl < 1.1.1: certificate selection is only done via the SNI.
Thanks to Lukas Tribus for reporting this and analysing the problem.
This patch should be backported to 1.8
Server state file has no indication that a server is currently managed
by a DNS SRV resolution.
And thus, both feature (DNS SRV resolution and server state), when used
together, does not provide the expected behavior: a smooth experience...
This patch introduce the "SRV record name" in the server state file and
loads and applies it if found and wherever required.
This patch applies to haproxy-dev branch only. For backport, a specific patch
is provided for 1.8.
If SET_SAFE_LJMP returns 0, the spinlock is already unlocked, and lua_atpanic
is already set back to hlua_panic_safe, so there's no need to call
RESET_SAFE_LJMP.
This should be MFC'd into 1.8.
When calling ->prepare_srv() callback for SSL server which
depends on global "nbthread" value, this latter was not already parsed,
so equal to 1 default value. This lead to bad memory accesses.
Thank you to Pieter (PiBa-NL) for having reported this issue and
for having provided a very helpful reg testing file to reproduce
this issue (reg-test/lua/b00002.*).
Must be backported to 1.8.
Call si_cs_send() at the beginning of si_cs_wake_cb(), instead of from
stream_int_notify-), so that if we get a connection error while trying to
send, the stream_interface will get SI_FL_ERR, the associated task will
be woken up, and the connection will be properly destroyed.
No backport needed.
Instead of calling __event_srv_chk_w, call wake_srv_chk(), which will then
either call tcpcheck_main() or __event_srv_chk_w().
Also make tcpcheck_main() subscribe if it can't send.
In hlua_applet_tcp_fct(), drain the output buffer when the applet is done
running, every time we're called.
Overwise, there's a race condition, and the output buffer could be filled
after the applet ran, and as it is never cleared, the stream interface
will never be destroyed.
This should be backported to 1.8 and 1.7.
This adds the sample fetch 'be_conn_free([<backend>])'. This sample fetch
provides the total number of unused connections across available servers in the
specified backend.
Previously LUA code would maintain the transaction state between http
requests, resulting in things like txn:get_priv() retrieving data from
a previous request. This addresses the issue by ensuring the LUA state
is reset between requests.
Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
mux_pt_wake() calls data->wake() which can return -1 indicating that the
connection was just destroyed. We need to check for this condition and
immediately exit in this case otherwise we dereference a just freed
connection. Note that this mainly happens on idle connections between
two HTTP requests. It can have random implications between requests as
it may lead a wrong connection's polling to be re-enabled or disabled
for example, especially with threads.
This patch must be backported to 1.8.
HTTP LUA applet callback should not update the date on which the HTTP client requests
arrive. This was done just after the LUA applet has completed its job.
This patch simply removes the affected statement. The same fixe has been applied
to TCP LUA applet callback.
To reproduce this issue, as reported by Patrick Hemmer, implement an HTTP LUA applet
which sleeps a bit before replying:
core.register_service("foo", "http", function(applet)
core.msleep(100)
applet:set_status(200)
applet:start_response()
end)
This had as a consequence to log %TR field with approximatively the same value as
the LUA sleep time.
Thank you to Patrick Hemmer for having reported this issue.
Must be backported to 1.8, 1.7 and 1.6.
This patch improves the previous fix by implementing the socket draining
code directly in conn_sock_drain() so that it always applies regardless
of the protocol's family. Thus it gets rid of tcp_drain().
Right now conn_sock_drain() calls the protocol's ->drain() function if
it exists, otherwise it simply tries to disable polling for receiving
on the connection. This doesn't work well anymore since we've implemented
the muxes in 1.8, and it has a side effect with keep-alive backend
connections established over unix sockets. What happens is that if
during the idle time after a request, a connection reports some data,
si_idle_conn_null_cb() is called, which will call conn_sock_drain().
This one sees there's no drain() on unix sockets and will simply disable
polling for data on the connection. But it doesn't do anything on the
conn_stream. Thus while leaving the conn_fd_handler, the mux's polling
is updated and recomputed based on the conn_stream's polling state,
which is still enabled, and nothing changes, so we see the process
use 100% CPU in this case because the FD remains active in the cache.
There are several issues that need to be addressed here. The first and
most important is that we cannot expect some protocols to simply stop
reading data when asked to drain pending data. So this patch make the
unix sockets rely on tcp_drain() since the functions are the same. This
solution is appropriate for backporting, but a better one is desired for
the long term. The second issue is that si_idle_conn_null_cb() shouldn't
drain the connection but the conn_stream.
At the moment we don't have any way to drain a conn_stream, though a flag
on rcv_buf() will do it well. Until we support muxes on the server side
it is not a problem so this part can be addressed later.
This fix must be backported to 1.8.
Since commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields
match the buffer struct") a chunk length is unsigned so we can remove
negative size checks.
By convenience or laziness we used to store base64dec()'s return code
into trash.data and to compare it against 0 to check for conversion
failure, but it's now unsigned since commit 843b7cb ("MEDIUM: chunks:
make the chunk struct's fields match the buffer struct"). Let's clean
this up and test the result itself without storing it first.
No backport is needed.
Cyril Bonté discovered that the proxy protocol randomly fails since
commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields match
the buffer struct"). This is because we used to store recv()'s return
code into trash.data which is now unsigned, so it never compares as
negative against 0. Let's clean this up and test the result itself
without storing it first.
No backport is needed.
By convenience or laziness we used to store exp_replace()'s return code
into str->data. The result checks applied there compare str->data to -1
while it's now unsigned since commit 843b7cb ("MEDIUM: chunks: make the
chunk struct's fields match the buffer struct"). Let's clean this up
and test the result itself without storing it first.
No backport is needed.
By convenience or laziness we used to store dns_build_query()'s return code
into trash.data. The result checks applied there compare trash.data to -1
while it's now unsigned since commit 843b7cb ("MEDIUM: chunks: make the
chunk struct's fields match the buffer struct"). Let's clean this up
and test the result itself without storing it first.
No backport is needed.
By convenience or laziness we used to store url_decode()'s return code
into smp->data.u.str.data. The result checks applied there compare it
to 0 while it's now unsigned since commit 843b7cb ("MEDIUM: chunks: make
the chunk struct's fields match the buffer struct "). Let's clean this up
and test the result itself without storing it first.
No backport is needed.
By convenience or laziness we used to store exp_replace()'s return code
into trash.data. The result checks applied there compare trash.data to -1
while it's now unsigned since commit 843b7cb ("MEDIUM: chunks: make the
chunk struct's fields match the buffer struct "). Let's clean this up
and test the result itself without storing it first.
No backport is needed.
Since commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields
match the buffer struct") a chunk length is unsigned so we can't reliably
store -1 and check for negative values in the caller. Only one such
location was found in proto_http's http-request auth rules (which cannot
realistically fail).
No backport is needed.
thread_isolate() is currently being called with the server lock held.
This is not acceptable because it prevents other threads from reaching
the rendez-vous point. Now that the LB algos are thread-safe, let's get
rid of this call.
No backport is nedeed.
Since commit 3ff577e ("MAJOR: server: make server state changes
synchronous again"), srv_update_status() calls the various maintenance
operations of the LB algorithms (->set_server_up, ->set_server_down,
->update_server_weight()). These ones are called with a single thread
guaranteed by the rendez-vous point, so the fact that they're lacking
some locks has no effect. However we'll need to remove the rendez-vous
point so we have to take care of properly locking all the LB algos.
The comments have been properly updated on the various functions to
mention their locking expectations. All these functions are called
with the server lock held, and all of them now support concurrent
calls by using the lbprm's lock.
This fix doesn't need to be backported at the moment, though if any
check-specific issue surfaced in 1.8, it could make sense to reuse it.
Since commit 3ff577e ("MAJOR: server: make server state changes
synchronous again"), srv_update_status() is called with the server
lock held. It calls (among others) pendconn_redistribute() which used
to take this lock, causing CPU loops by default, or crashes if build
with -DDEBUG_THREAD. Since this function is not called from any other
place anymore, it doesn't require the lock on its own so let's simply
drop it from there.
No backport is needed, this is 1.9-specific.
If we subscribed to send, and the callback is called, call the wake callback
after, so that process_stream() may be woken up if needed.
This is 1.9-specific, no backport is needed.
It is possible that the conn_stream gets detached from the stream_interface,
and as it subscribed to the wait list, si_cs_io_cb() gets called anyway,
so make sure we have a conn_stream before attempting to send more data.
This is 1.9-specific, no backport is needed.
When freeing the stream, make sure we remove the stream interfaces from the
wait lists, in case it was in there.
This is 1.9-specific, no backport is needed.
The server-specific CLI commands "set weight", "set maxconn",
"disable agent", "enable agent", "disable health", "enable health",
"disable server" and "enable server" were not protected against
concurrent accesses. Now they take the server lock around the
sensitive part.
This patch must be backported to 1.8.
At the moment it's totally unclear while reading the server's code which
functions require to be called with the server lock held and which ones
grab it and cannot be called this way. This commit simply inventories
all of them to indicate what is detected depending on how these functions
use the struct server. Only functions used at runtime were checked, those
dedicated to config parsing were skipped. Doing so already has uncovered
a few bugs on some CLI actions.
The proxy-related commands like "{enable|disable|shutdown} frontend",
"{enable|disable} dynamic-cookie", "set dynamic-cookie-key" were not
protected against concurrent accesses making their use dangerous with
threads.
This patch must be backported to 1.8.
Commit 3ff577e ("MAJOR: server: make server state changes synchronous again")
reintroduced synchronous server state changes. However, during the previous
change from synchronous to asynchronous, the server state propagation was
placed at the end of the function to ease the code changes, and the commit
above didn't put it back at its place. This has resulted in propagated
states to be incomplete. For example, making a server leave maintenance
would make it up but would leave its tracking servers down because they
see their tracked server is still down.
Let's just move the status update right to its place. It also adds the
benefit of reporting state changes in the order they appear and not in
reverse.
No backport is needed.
Since commit #56cc12509, haproxy accepts double values for timeouts. The
value is then converted to milliseconds before being rounded up and cast
to int. The issue is that to round up the value, a constant value of 0.5
is added to it, but too early in the conversion, resulting in an
additional 500ms to the value. We are talking about a precision of 1ms,
so we can safely get rid of this rounding trick and adjust resulting
timeouts equal to 0 to a minimum of 1ms.
This patch is specific to the 1.9 branch and doesn't require to be
backported.
Sachin Shetty reported that socket timeouts set in LUA code have no effect.
Indeed, connect timeout is never modified and is always set to its default,
set to 5 seconds. Currently, this patch will apply the specified timeout
value to the connect timeout.
For the read and write timeouts, the issue is that the timeout is updated but
the expiration dates were not updated.
This patch should be backported up to the 1.6 branch.
Instead of checking if nbthreads == 1, just and thread_mask with
all_threads_mask to know if we're supposed to add the task to the local or
the global runqueue.
In most cases, "TLSv1.x" naming is used across and documentation, lazy
people tend to grep too much and may not find what they are looking for.
Fixing people is hard.
Due to a cascade of get_trash_chunk calls the sample is
corrupted when we want to read it.
The fix consist to use a temporary chunk to copy the sample
value and use it.
[wt: for 1.8 and older, a backport was successfully tested here :
https://www.mail-archive.com/haproxy@formilux.org/msg30694.html]
If the dh parameter is not found, the openssl's error global
stack was not correctly cleared causing unpredictable error
during the following parsing (chain cert parsing for instance).
This patch should be backported in 1.8 (and perhaps 1.7)
If there was an issue loading a keytype's part of a bundle, the bundle
was implicitly ignored without errors.
This patch should be backported in 1.8 (and perhaps 1.7)
Instead of just using the conn_stream wait_list, give the stream_interface
its own. When the conn_stream will have its own buffers, the stream_interface
may have to wait on it.
Instead of using si_cs_send() as a task handler, define a new function,
si_cs_io_cb(), and give si_cs_send() its original prototype. Right now
si_cs_io_cb() just handles send, but later it'll handle recv() too.
Empty connection is reported as handshake error
even if dont-log-null is specified.
This bug affect is a regression du to:
BUILD: ssl: fix to build (again) with boringssl
New openssl 1.1.1 defines OPENSSL_NO_HEARTBEATS as boring ssl
so the test was replaced by OPENSSL_IS_BORINGSSL
This fix should be backported on 1.8
The priority values are used when connections are queued to determine
which connections should be served first. The lowest priority class is
served first. When multiple requests from the same class are found, the
earliest (according to queue_time + offset) is served first. The queue
offsets can span over roughly 17 minutes after which the offsets will
wrap around. This allows up to 8 minutes spent in the queue with no
reordering.
This adds the set-priority-class and set-priority-offset actions to
http-request and tcp-request content. At this point they are not used
yet, which is the purpose of the next commit, but all the logic to
set and clear the values is there.
We'll need trees to manage the queues by priorities. This change replaces
the list with a tree based on a single key. It's effectively a list but
allows us to get rid of the list management right now.
We store the queue index in the stream and check it on dequeueing to
figure how many entries were processed in between. This way we'll be
able to count the elements that may later be added before ours.
The current name is misleading as it implies a queue size, but the value
instead indicates a position in the queue.
The value is only the queue size at the exact moment the element is enqueued.
Soon we will gain the ability to insert anywhere into the queue, upon which
clarity of the name is more important.
We'll soon need to rely on the pendconn position at the time of dequeuing
to figure the position a stream took in the queue. Usually it's not a
problem since pendconn_free() is called once the connection starts, but
it will make a difference for failed dequeues (eg: queue timeout reached).
Thus it's important to call pendconn_free() before logging in cases we are
not certain whether it was already performed, and to call pendconn_unlink()
after we know the pendconn will not be used so that we collect the queue
state as accurately as possible. As a benefit it will also make the
server's and backend's queues count more accurate in these cases.
To do so, mux choices are split to handle incoming and outgoing connections in a
different way. The protocol specified on the bind/server line is used in
priority. Then, for frontend connections, the ALPN is retrieved and used to
choose the best mux. For backend connection, there is no ALPN. Finaly, if no
protocol is specified and no protocol matches the ALPN, we fall back on a
default mux, choosing in priority the first mux with exactly the same mode.
It is mandatory to be sure to process data blocked in the RX buffer of the
conn_stream while the shutr/read0 was already processed. The stream interface
doesn't need to rely on this flags because it already tests CS_FL_EOS.
Now we try to synchronously push updates as they come using the new rdv
point, so that the call to the server update function from the main poll
loop is not needed anymore.
It further reduces the apparent latency in the health checks as the response
time almost always appears as 0 ms, resulting in a slightly higher check rate
of ~1960 conn/s. Despite this, the CPU consumption has slightly dropped again
to ~32% for the same test.
The only trick is that the checks code is built with a bit of recursivity
because srv_update_status() calls server_recalc_eweight(), and the latter
needs to signal srv_update_status() in case of updates. Thus we added an
extra argument to this function to indicate whether or not it must
propagate updates (no if it comes from srv_update_status).
This partially reverts commit d8fd2af ("BUG/MEDIUM: threads: Use the sync
point to check active jobs and exit") which used to address an issue in
the way the sync point used to check for present threads, which was later
addressed by commit ddb6c16 ("BUG/MEDIUM: threads: Fix the exit condition
of the thread barrier"). Thus there is no need anymore to use the sync
point for exiting and we can completely remove this call in the main loop.
The current sync point causes some important stress when a high number
of threads is in use on a config with lots of checks, because it wakes
up all threads every time a server state changes.
A config like the following can easily saturate a 4-core machine reaching
only 750 checks per second out of the ~2000 configured :
global
nbthread 4
defaults
mode http
timeout connect 5s
timeout client 5s
timeout server 5s
frontend srv
bind :8001 process 1/1
redirect location / if { method OPTIONS } { rand(100) ge 50 }
stats uri /
backend chk
option httpchk
server-template srv 1-100 127.0.0.1:8001 check rise 1 fall 1 inter 50
The reason is that the random on the fake server causes the responses
to randomly match an HTTP check, and results in a lot of up/down events
that are broadcasted to all threads. It's worth noting that the CPU usage
already dropped by about 60% between 1.8 and 1.9 just due to the scheduler
updates, but the sync point remains expensive.
In addition, it's visible on the stats page that a lot of requests end up
with an L7TOUT status in ~60ms. With smaller timeouts, it's even L4TOUT
around 20-25ms.
By not using THREAD_WANT_SYNC() anymore and only calling the server updates
under thread_isolate(), we can avoid all these wakeups. The CPU usage on
the same config drops to around 44% on the same machine, with all checks
being delivered at ~1900 checks per second, and the stats page shows no
more timeouts, even at 10 ms check interval. The difference is mainly
caused by the fact that there's no more need to wait for a thread to wake
up from poll() before starting to process check results.
Multiplexers are not necessarily associated to an ALPN. ALPN is a TLS extension,
so it is not always defined or used. Instead, we now rather speak of
multiplexer's protocols. So in this patch, there are no significative changes,
some structures and functions are just renamed.
Now, a multiplexer can specify if it can be install on incoming connections
(ALPN_SIDE_FE), on outgoing connections (ALPN_SIDE_BE) or both
(ALPN_SIDE_BOTH). These flags are compatible with proxies' ones.
The comment above the change remains true. We assume there is always 1
conn_stream per outgoing connectionq. Today, it is always true because H2 is not
supported yet for server connections.
This function is generic and is able to automatically transfer data from a
buffer to the conn_stream's tx buffer. It does this automatically if the mux
doesn't define another snd_buf() function.
It cannot yet be used as-is with the conn_stream's txbuf without risking to
lose data on close since conn_streams need to be orphaned for this.
This is a partial revert of the commit deccd1116 ("MEDIUM: mux: make
mux->snd_buf() take the byte count in argument"). It is a requirement to do
zero-copy transfers. This will be mandatory when the TX buffer of the
conn_stream will be used.
So, now, data are consumed by mux->snd_buf() and not only sent. So it needs to
update the buffer state. On its side, the caller must be aware the buffer can be
replaced y an empty or unallocated one.
As a side effet of this change, the function co_set_data() is now only responsible
to update the channel set, by update ->output field.
When switching back from a backup to an active server, the backup server
currently continues to drain the proxy's connections, which is a problem
because it's not expected to be able to pick them.
This patch ensures that a backup server will only pick backend connections
if there is no active server and it is the selected backup server or all
backup servers are supposed to be used.
This issue seems to have existed forever, so this fix should be backported
to all stable versions.
Commit 64cc49c ("MAJOR: servers: propagate server status changes
asynchronously.") heavily changed the way the server states are
updated since they became asynchronous. During this change, some
code was lost, which is used to shut down some sessions from a
backup server and to pick pending connections from a proxy once
a server is turned back from maintenance to ready state. The
effect is that when temporarily disabling a server, connections
stay in the backend's queue, and when re-enabling it, they are
not picked and they expire in the backend's queue. Now they're
properly picked again.
This fix must be backported to 1.8.
In commit 0c026f4 ("MINOR: threads: add more consistency between certain
variables in no-thread case"), we ensured that we don't have all_threads_mask
zeroed anymore. But one test was missed for the write() to the sync pipe.
This results in a situation where when running single-threaded, once a
server status changes, a wake-up message is written to the pipe and never
consumed, showing a 100% CPU usage.
No backport is needed.
Released version 1.9-dev1 with the following main changes :
- BUG/MEDIUM: kqueue: Don't bother closing the kqueue after fork.
- DOC: cache: update sections and fix some typos
- BUILD/MINOR: deviceatlas: enable thread support
- BUG/MEDIUM: tcp-check: Don't lock the server in tcpcheck_main
- BUG/MEDIUM: ssl: don't allocate shctx several time
- BUG/MEDIUM: cache: bad computation of the remaining size
- BUILD: checks: don't include server.h
- BUG/MEDIUM: stream: fix session leak on applet-initiated connections
- BUILD/MINOR: haproxy : FreeBSD/cpu affinity needs pthread_np header
- BUILD/MINOR: Makefile : enabling USE_CPU_AFFINITY
- BUG/MINOR: ssl: CO_FL_EARLY_DATA removal is managed by stream
- BUG/MEDIUM: threads/peers: decrement, not increment jobs on quitting
- BUG/MEDIUM: h2: don't report an error after parsing a 100-continue response
- BUG/MEDIUM: peers: fix some track counter rules dont register entries for sync.
- BUG/MAJOR: thread/peers: fix deadlock on peers sync.
- BUILD/MINOR: haproxy: compiling config cpu parsing handling when needed
- MINOR: config: report when "monitor fail" rules are misplaced
- BUG/MINOR: mworker: fix validity check for the pipe FDs
- BUG/MINOR: mworker: detach from tty when in daemon mode
- MINOR: threads: Fix pthread_setaffinity_np on FreeBSD.
- BUG/MAJOR: thread: Be sure to request a sync between threads only once at a time
- BUILD: Fix LDFLAGS vs. LIBS re linking order in various makefiles
- BUG/MEDIUM: checks: Be sure we have a mux if we created a cs.
- BUG/MINOR: hpack: fix debugging output of pseudo header names
- BUG/MINOR: hpack: must reject huffman literals padded with more than 7 bits
- BUG/MINOR: hpack: reject invalid header index
- BUG/MINOR: hpack: dynamic table size updates are only allowed before headers
- BUG/MAJOR: h2: correctly check the request length when building an H1 request
- BUG/MINOR: h2: immediately close if receiving GOAWAY after the last stream
- BUG/MINOR: h2: try to abort closed streams as soon as possible
- BUG/MINOR: h2: ":path" must not be empty
- BUG/MINOR: h2: fix a typo causing PING/ACK to be responded to
- BUG/MINOR: h2: the TE header if present may only contain trailers
- BUG/MEDIUM: h2: enforce the per-connection stream limit
- BUG/MINOR: h2: do not accept SETTINGS_ENABLE_PUSH other than 0 or 1
- BUG/MINOR: h2: reject incorrect stream dependencies on HEADERS frame
- BUG/MINOR: h2: properly check PRIORITY frames
- BUG/MINOR: h2: reject response pseudo-headers from requests
- BUG/MEDIUM: h2: remove connection-specific headers from request
- BUG/MEDIUM: h2: do not accept upper case letters in request header names
- BUG/MINOR: h2: use the H2_F_DATA_* macros for DATA frames
- BUG/MINOR: action: Don't check http capture rules when no id is defined
- BUG/MAJOR: hpack: don't pretend large headers fit in empty table
- BUG/MINOR: ssl: support tune.ssl.cachesize 0 again
- BUG/MEDIUM: mworker: also close peers sockets in the master
- BUG/MEDIUM: ssl engines: Fix async engines fds were not considered to fix fd limit automatically.
- BUG/MEDIUM: checks: a down server going to maint remains definitely stucked on down state.
- BUG/MEDIUM: peers: set NOLINGER on the outgoing stream interface
- BUG/MEDIUM: h2: fix handling of end of stream again
- MINOR: mworker: Update messages referencing exit-on-failure
- MINOR: mworker: Improve wording in `void mworker_wait()`
- CONTRIB: halog: Add help text for -s switch in halog program
- BUG/MEDIUM: email-alert: don't set server check status from a email-alert task
- BUG/MEDIUM: threads/vars: Fix deadlock in register_name
- MINOR: systemd: remove comment about HAPROXY_STATS_SOCKET
- DOC: notifications: add precisions about thread usage
- BUG/MEDIUM: lua/notification: memory leak
- MINOR: conn_stream: add new flag CS_FL_RCV_MORE to indicate pending data
- BUG/MEDIUM: stream-int: always set SI_FL_WAIT_ROOM on CS_FL_RCV_MORE
- BUG/MEDIUM: h2: automatically set CS_FL_RCV_MORE when the output buffer is full
- BUG/MEDIUM: h2: enable recv polling whenever demuxing is possible
- BUG/MEDIUM: h2: work around a connection API limitation
- BUG/MEDIUM: h2: debug incoming traffic in h2_wake()
- MINOR: h2: store the demux padding length in the h2c struct
- BUG/MEDIUM: h2: support uploading partial DATA frames
- MINOR: h2: don't demand that a DATA frame is complete before processing it
- BUG/MEDIUM: h2: don't switch the state to HREM before end of DATA frame
- BUG/MEDIUM: h2: don't close after the first DATA frame on tunnelled responses
- BUG/MEDIUM: http: don't disable lingering on requests with tunnelled responses
- BUG/MEDIUM: h2: fix stream limit enforcement
- BUG/MINOR: stream-int: don't try to receive again after receiving an EOS
- MINOR: sample: add len converter
- BUG: MAJOR: lb_map: server map calculation broken
- BUG: MINOR: http: don't check http-request capture id when len is provided
- MINOR: sample: rename the "len" converter to "length"
- BUG/MEDIUM: mworker: Set FD_CLOEXEC flag on log fd
- DOC/MINOR: intro: typo, wording, formatting fixes
- MINOR: netscaler: respect syntax
- MINOR: netscaler: remove the use of cip_magic only used once
- MINOR: netscaler: rename cip_len to clarify its uage
- BUG/MEDIUM: netscaler: use the appropriate IPv6 header size
- BUG/MAJOR: netscaler: address truncated CIP header detection
- MINOR: netscaler: check in one-shot if buffer is large enough for IP and TCP header
- MEDIUM: netscaler: do not analyze original IP packet size
- MEDIUM: netscaler: add support for standard NetScaler CIP protocol
- MINOR: spoe: add force-set-var option in spoe-agent configuration
- CONTRIB: iprange: Fix compiler warning in iprange.c
- CONTRIB: halog: Fix compiler warnings in halog.c
- BUG/MINOR: h2: properly report a stream error on RST_STREAM
- MINOR: mux: add flags to describe a mux's capabilities
- MINOR: stream-int: set flag SI_FL_CLEAN_ABRT when mux supports clean aborts
- BUG/MEDIUM: stream: don't consider abortonclose on muxes which close cleanly
- BUG/MEDIUM: checks: a server passed in maint state was not forced down.
- BUG/MEDIUM: lua: fix crash when using bogus mode in register_service()
- MINOR: http: adjust the list of supposedly cacheable methods
- MINOR: http: update the list of cacheable status codes as per RFC7231
- MINOR: http: start to compute the transaction's cacheability from the request
- BUG/MINOR: http: do not ignore cache-control: public
- BUG/MINOR: http: properly detect max-age=0 and s-maxage=0 in responses
- BUG/MINOR: cache: do not force the TX_CACHEABLE flag before checking cacheability
- MINOR: http: add a function to check request's cache-control header field
- BUG/MEDIUM: cache: do not try to retrieve host-less requests from the cache
- BUG/MEDIUM: cache: replace old object on store
- BUG/MEDIUM: cache: respect the request cache-control header
- BUG/MEDIUM: cache: don't cache the response on no-cache="set-cookie"
- BUG/MAJOR: connection: refine the situations where we don't send shutw()
- BUG/MEDIUM: checks: properly set servers to stopping state on 404
- BUG/MEDIUM: h2: properly handle and report some stream errors
- BUG/MEDIUM: h2: improve handling of frames received on closed streams
- DOC/MINOR: configuration: typo, formatting fixes
- BUG/MEDIUM: h2: ensure we always know the stream before sending a reset
- BUG/MEDIUM: mworker: don't close stdio several time
- MINOR: don't close stdio anymore
- BUG/MEDIUM: http: don't automatically forward request close
- BUG/MAJOR: hpack: don't return direct references to the dynamic headers table
- MINOR: h2: add a function to report pseudo-header names
- DEBUG: hpack: make hpack_dht_dump() expose the output file
- DEBUG: hpack: add more traces to the hpack decoder
- CONTRIB: hpack: add an hpack decoder
- MEDIUM: h2: prepare a graceful shutdown when the frontend is stopped
- BUG/MEDIUM: h2: properly handle the END_STREAM flag on empty DATA frames
- BUILD: ssl: silence a warning when building without NPN nor ALPN support
- CLEANUP: rbtree: remove
- BUG/MEDIUM: ssl: cache doesn't release shctx blocks
- BUG/MINOR: lua: Fix default value for pattern in Socket.receive
- DOC: lua: Fix typos in comments of hlua_socket_receive
- BUG/MEDIUM: lua: Fix IPv6 with separate port support for Socket.connect
- BUG/MINOR: lua: Fix return value of Socket.settimeout
- MINOR: dns: Handle SRV record weight correctly.
- BUG/MEDIUM: mworker: execvp failure depending on argv[0]
- MINOR: hathreads: add support for gcc < 4.7
- BUILD/MINOR: ancient gcc versions atomic fix
- BUG/MEDIUM: stream: properly handle client aborts during redispatch
- MINOR: spoe: add register-var-names directive in spoe-agent configuration
- MINOR: spoe: Don't queue a SPOE context if nothing is sent
- DOC: clarify the scope of ssl_fc_is_resumed
- CONTRIB: debug: fix a few flags definitions
- BUG/MINOR: poll: too large size allocation for FD events
- MINOR: sample: add date_us sample
- BUG/MEDIUM: peers: fix expire date wasn't updated if entry is modified remotely.
- MINOR: servers: Don't report duplicate dyncookies for disabled servers.
- MINOR: global/threads: move cpu_map at the end of the global struct
- MINOR: threads: add a MAX_THREADS define instead of LONGBITS
- MINOR: global: add some global activity counters to help debugging
- MINOR: threads/fd: Use a bitfield to know if there are FDs for a thread in the FD cache
- BUG/MEDIUM: threads/polling: Use fd_cache_mask instead of fd_cache_num
- BUG/MEDIUM: fd: maintain a per-thread update mask
- MINOR: fd: add a bitmask to indicate that an FD is known by the poller
- BUG/MEDIUM: epoll/threads: use one epoll_fd per thread
- BUG/MEDIUM: kqueue/threads: use one kqueue_fd per thread
- BUG/MEDIUM: threads/mworker: fix a race on startup
- BUG/MINOR: mworker: only write to pidfile if it exists
- MINOR: threads: Fix build when we're not compiling with threads.
- BUG/MINOR: threads: always set an owner to the thread_sync pipe
- BUG/MEDIUM: threads/server: Fix deadlock in srv_set_stopping/srv_set_admin_flag
- BUG/MEDIUM: checks: Don't try to release undefined conn_stream when a check is freed
- BUG/MINOR: kqueue/threads: Don't forget to close kqueue_fd[tid] on each thread
- MINOR: threads: Use __decl_hathreads instead of #ifdef/#endif
- BUILD: epoll/threads: Add test on MAX_THREADS to avoid warnings when complied without threads
- BUILD: kqueue/threads: Add test on MAX_THREADS to avoid warnings when complied without threads
- CLEANUP: sample: Fix comment encoding of sample.c
- CLEANUP: sample: Fix outdated comment about sample casts functions
- BUG/MINOR: sample: Fix output type of c_ipv62ip
- CLEANUP: Fix typo in ARGT_MSK6 comment
- CLEANUP: standard: Use len2mask4 in str2mask
- MINOR: standard: Add str2mask6 function
- MINOR: config: Add support for ARGT_MSK6
- MEDIUM: sample: Add IPv6 support to the ipmask converter
- MINOR: config: Enable tracking of up to MAX_SESS_STKCTR stick counters.
- BUG/MINOR: cli: use global.maxsock and not maxfd to list all FDs
- MINOR: polling: make epoll and kqueue not depend on maxfd anymore
- MINOR: fd: don't report maxfd in alert messages
- MEDIUM: polling: start to move maxfd computation to the pollers
- CLEANUP: fd/threads: remove the now unused fdtab_lock
- MINOR: poll: more accurately compute the new maxfd in the loop
- CLEANUP: fd: remove the unused "new" field
- MINOR: fd: move the hap_fd_{clr,set,isset} functions to fd.h
- MEDIUM: select: make use of hap_fd_* functions
- MEDIUM: fd: use atomic ops for hap_fd_{clr,set} and remove poll_lock
- MEDIUM: select: don't use the old FD state anymore
- MEDIUM: poll: don't use the old FD state anymore
- MINOR: fd: pass the iocb and owner to fd_insert()
- BUG/MINOR: threads: Update labels array because of changes in lock_label enum
- MINOR: stick-tables: Adds support for new "gpc1" and "gpc1_rate" counters.
- BUG/MINOR: epoll/threads: only call epoll_ctl(DEL) on polled FDs
- DOC: don't suggest using http-server-close
- MINOR: introduce proxy-v2-options for send-proxy-v2
- BUG/MEDIUM: spoe: Always try to receive or send the frame to detect shutdowns
- BUG/MEDIUM: spoe: Allow producer to read and to forward shutdown on request side
- MINOR: spoe: Remove check on min_applets number when a SPOE context is queued
- MINOR: spoe: Always link a SPOE context with the applet processing it
- MINOR: spoe: Replace sending_rate by a frequency counter
- MINOR: spoe: Count the number of frames waiting for an ack for each applet
- MEDIUM: spoe: Use an ebtree to manage idle applets
- MINOR: spoa_example: Count the number of frames processed by each worker
- MINOR: spoe: Add max-waiting-frames directive in spoe-agent configuration
- MINOR: init: make stdout unbuffered
- MINOR: early data: Don't rely on CO_FL_EARLY_DATA to wake up streams.
- MINOR: early data: Never remove the CO_FL_EARLY_DATA flag.
- MINOR: compiler: introduce offsetoff().
- MINOR: threads: Introduce double-width CAS on x86_64 and arm.
- MINOR: threads: add test and set/reset operations
- MINOR: pools/threads: Implement lockless memory pools.
- MAJOR: fd/threads: Make the fdcache mostly lockless.
- MEDIUM: fd/threads: Make sure we don't miss a fd cache entry.
- MAJOR: fd: compute the new fd polling state out of the fd lock
- MINOR: epoll: get rid of the now useless fd_compute_new_polled_status()
- MINOR: kqueue: get rid of the now useless fd_compute_new_polled_status()
- MINOR: poll: get rid of the now useless fd_compute_new_polled_status()
- MINOR: select: get rid of the now useless fd_compute_new_polled_status()
- CLEANUP: fd: remove the now unused fd_compute_new_polled_status() function
- MEDIUM: fd: make updt_fd_polling() use atomics
- MEDIUM: poller: use atomic ops to update the fdtab mask
- MINOR: fd: move the fd_{add_to,rm_from}_fdlist functions to fd.c
- BUG/MINOR: fd/threads: properly dereference fdcache as volatile
- MINOR: fd: remove the unneeded last CAS when adding an fd to the list
- MINOR: fd: reorder fd_add_to_fd_list()
- BUG/MINOR: time/threads: ensure the adjusted time is always correct
- BUG/MEDIUM: standard: Fix memory leak in str2ip2()
- MINOR: init: emit warning when -sf/-sd cannot parse argument
- BUILD: fd/threads: fix breakage build breakage without threads
- DOC: Describe routing impact of using interface keyword on bind lines
- DOC: Mention -Ws in the list of available options
- BUG/MINOR: config: don't emit a warning when global stats is incompletely configured
- BUG/MINOR: fd/threads: properly lock the FD before adding it to the fd cache.
- BUG/MEDIUM: threads: fix the double CAS implementation for ARMv7
- BUG/MEDIUM: ssl: Don't always treat SSL_ERROR_SYSCALL as unrecovarable.
- BUILD/MINOR: memory: stdint is needed for uintptr_t
- BUG/MINOR: init: Add missing brackets in the code parsing -sf/-st
- DOC: lua: new prototype for function "register_action()"
- DOC: cfgparse: Warn on option (tcp|http)log in backend
- BUG/MINOR: ssl/threads: Make management of the TLS ticket keys files thread-safe
- MINOR: sample: add a new "concat" converter
- BUG/MEDIUM: ssl: Shutdown the connection for reading on SSL_ERROR_SYSCALL
- BUG/MEDIUM: http: Switch the HTTP response in tunnel mode as earlier as possible
- BUG/MEDIUM: ssl/sample: ssl_bc_* fetch keywords are broken.
- MINOR: ssl/sample: adds ssl_bc_is_resumed fetch keyword.
- CLEANUP: cfgparse: Remove unused label end
- CLEANUP: spoe: Remove unused label retry
- CLEANUP: h2: Remove unused labels from mux_h2.c
- CLEANUP: pools: Remove unused end label in memory.h
- CLEANUP: standard: Fix typo in IPv6 mask example
- BUG/MINOR: pools/threads: don't ignore DEBUG_UAF on double-word CAS capable archs
- BUG/MINOR: debug/pools: properly handle out-of-memory when building with DEBUG_UAF
- MINOR: debug/pools: make DEBUG_UAF also detect underflows
- MINOR: stats: display the number of threads in the statistics.
- BUG/MINOR: h2: Set the target of dbuf_wait to h2c
- BUG/MEDIUM: h2: always consume any trailing data after end of output buffers
- BUG/MEDIUM: buffer: Fix the wrapping case in bo_putblk
- BUG/MEDIUM: buffer: Fix the wrapping case in bi_putblk
- BUG/MEDIUM: spoe: Remove idle applets from idle list when HAProxy is stopping
- Revert "BUG/MINOR: send-proxy-v2: string size must include ('\0')"
- MINOR: ssl: extract full pkey info in load_certificate
- MINOR: ssl: add ssl_sock_get_pkey_algo function
- MINOR: ssl: add ssl_sock_get_cert_sig function
- MINOR: connection: add proxy-v2-options ssl-cipher,cert-sig,cert-key
- MINOR: connection: add proxy-v2-options authority
- MINOR: systemd: Add section for SystemD sandboxing to unit file
- MINOR: systemd: Add SystemD's Protect*= options to the unit file
- MINOR: systemd: Add SystemD's SystemCallFilter option to the unit file
- CLEANUP: h2: rename misleading h2c_stream_close() to h2s_close()
- MINOR: h2: provide and use h2s_detach() and h2s_free()
- MEDIUM: h2: use a single buffer allocator
- MINOR/BUILD: fix Lua build on Mac OS X
- BUILD/MINOR: fix Lua build on Mac OS X (again)
- BUG/MINOR: session: Fix tcp-request session failure if handshake.
- CLEANUP: .gitignore: Ignore binaries from the contrib directory
- BUG/MINOR: unix: Don't mess up when removing the socket from the xfer_sock_list.
- DOC: buffers: clarify the purpose of the <from> pointer in offer_buffers()
- BUG/MEDIUM: h2: also arm the h2 timeout when sending
- BUG/MINOR: cli: Fix a crash when passing a negative or too large value to "show fd"
- CLEANUP: ssl: Remove a duplicated #include
- CLEANUP: cli: Remove a leftover debug message
- BUG/MINOR: cli: Fix a typo in the 'set rate-limit' usage
- BUG/MEDIUM: fix a 100% cpu usage with cpu-map and nbthread/nbproc
- BUG/MINOR: force-persist and ignore-persist only apply to backends
- BUG/MEDIUM: threads/unix: Fix a deadlock when a listener is temporarily disabled
- BUG/MAJOR: threads/queue: Fix thread-safety issues on the queues management
- BUG/MINOR: dns: don't downgrade DNS accepted payload size automatically
- TESTS: Add a testcase for multi-port + multi-server listener issue
- CLEANUP: dns: remove duplicate code in src/dns.c
- BUG/MINOR: seemless reload: Fix crash when an interface is specified.
- BUG/MINOR: cli: Ensure all command outputs end with a LF
- BUG/MINOR: cli: Fix a crash when sending a command with too many arguments
- BUILD: ssl: Fix build with OpenSSL without NPN capability
- BUG/MINOR: spoa-example: unexpected behavior for more than 127 args
- BUG/MINOR: lua: return bad error messages
- CLEANUP: lua/syntax: lua is a name and not an acronym
- BUG/MEDIUM: tcp-check: single connect rule can't detect DOWN servers
- BUG/MINOR: tcp-check: use the server's service port as a fallback
- BUG/MEDIUM: threads/queue: wake up other threads upon dequeue
- MINOR: log: stop emitting alerts when it's not possible to write on the socket
- BUILD/BUG: enable -fno-strict-overflow by default
- BUG/MEDIUM: fd/threads: ensure the fdcache_mask always reflects the cache contents
- DOC: log: more than 2 log servers are allowed
- MINOR: hash: add new function hash_crc32c
- MINOR: proxy-v2-options: add crc32c
- MINOR: accept-proxy: support proxy protocol v2 CRC32c checksum
- REORG: compact "struct server"
- MINOR: samples: add crc32c converter
- BUG/MEDIUM: h2: properly account for DATA padding in flow control
- BUG/MINOR: h2: ensure we can never send an RST_STREAM in response to an RST_STREAM
- BUG/MINOR: listener: Don't decrease actconn twice when a new session is rejected
- CLEANUP: map, stream: remove duplicate code in src/map.c, src/stream.c
- BUG/MINOR: lua: the function returns anything
- BUG/MINOR: lua funtion hlua_socket_settimeout don't check negative values
- CLEANUP: lua: typo fix in comments
- BUILD/MINOR: fix build when USE_THREAD is not defined
- MINOR: lua: allow socket api settimeout to accept integers, float, and doubles
- BUG/MINOR: hpack: fix harmless use of uninitialized value in hpack_dht_insert
- MINOR: cli/threads: make "show fd" report thread_sync_io_handler instead of "unknown"
- MINOR: cli: make "show fd" report the mux and mux_ctx pointers when available
- BUILD/MINOR: cli: fix a build warning introduced by last commit
- BUG/MAJOR: h2: remove orphaned streams from the send list before closing
- MINOR: h2: always call h2s_detach() in h2_detach()
- MINOR: h2: fuse h2s_detach() and h2s_free() into h2s_destroy()
- BUG/MEDIUM: h2/threads: never release the task outside of the task handler
- BUG/MEDIUM: h2: don't consider pending data on detach if connection is in error
- BUILD/MINOR: threads: always export thread_sync_io_handler()
- MINOR: mux: add a "show_fd" function to dump debugging information for "show fd"
- MINOR: h2: implement a basic "show_fd" function
- MINOR: cli: report cache indexes in "show fd"
- BUG/MINOR: h2: remove accidental debug code introduced with show_fd function
- BUG/MEDIUM: h2: always add a stream to the send or fctl list when blocked
- BUG/MINOR: checks: check the conn_stream's readiness and not the connection
- BUG/MINOR: fd: Don't clear the update_mask in fd_insert.
- BUG/MINOR: email-alert: Set the mailer port during alert initialization
- BUG/MINOR: cache: fix "show cache" output
- BUG/MAJOR: cache: fix random crashes caused by incorrect delete() on non-first blocks
- BUG/MINOR: spoe: Initialize variables used during conf parsing before any check
- BUG/MINOR: spoe: Don't release the context buffer in .check_timeouts callbaclk
- BUG/MINOR: spoe: Register the variable to set when an error occurred
- BUG/MINOR: spoe: Don't forget to decrement fpa when a processing is interrupted
- MINOR: spoe: Add metrics in to know time spent in the SPOE
- MINOR: spoe: Add options to store processing times in variables
- MINOR: log: move 'log' keyword parsing in dedicated function
- MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries
- MINOR: spoe: Add loggers dedicated to the SPOE agent
- MINOR: spoe: Add support for option dontlog-normal in the SPOE agent section
- MINOR: spoe: use agent's logger to log SPOE messages
- MINOR: spoe: Add counters to log info about SPOE agents
- BUG/MAJOR: cache: always initialize newly created objects
- MINOR: servers: Support alphanumeric characters for the server templates names
- BUG/MEDIUM: threads: Fix the max/min calculation because of name clashes
- BUG/MEDIUM: connection: Make sure we have a mux before calling detach().
- BUG/MINOR: http: Return an error in proxy mode when url2sa fails
- MINOR: proxy: Add fe_defbe fetcher
- MINOR: config: Warn if resolvers has no nameservers
- BUG/MINOR: cli: Guard against NULL messages when using CLI_ST_PRINT_FREE
- MINOR: cli: Ensure the CLI always outputs an error when it should
- MEDIUM: sample: Extend functionality for field/word converters
- MINOR: export localpeer as an environment variable
- BUG/MEDIUM: kqueue: When adding new events, provide an output to get errors.
- BUILD: sample: avoid build warning in sample.c
- BUG/CRITICAL: h2: fix incorrect frame length check
- DOC: lua: update the links to the config and Lua API
- BUG/MINOR: pattern: Add a missing HA_SPIN_INIT() in pat_ref_newid()
- BUG/MAJOR: channel: Fix crash when trying to read from a closed socket
- BUG/MINOR: log: t_idle (%Ti) is not set for some requests
- BUG/MEDIUM: lua: Fix segmentation fault if a Lua task exits
- MINOR: h2: detect presence of CONNECT and/or content-length
- BUG/MEDIUM: h2: implement missing support for chunked encoded uploads
- BUG/MINOR: spoe: Fix counters update when processing is interrupted
- BUG/MINOR: spoe: Fix parsing of dontlog-normal option
- MEDIUM: cli: Add payload support
- MINOR: map: Add payload support to "add map"
- MINOR: ssl: Add payload support to "set ssl ocsp-response"
- BUG/MINOR: lua/threads: Make lua's tasks sticky to the current thread
- MINOR: sample: Add strcmp sample converter
- MINOR: http: Add support for 421 Misdirected Request
- BUG/MINOR: config: disable http-reuse on TCP proxies
- MINOR: ssl: disable SSL sample fetches when unsupported
- MINOR: ssl: add fetch 'ssl_fc_session_key' and 'ssl_bc_session_key'
- BUG/MINOR: checks: Fix check->health computation for flapping servers
- BUG/MEDIUM: threads: Fix the sync point for more than 32 threads
- BUG/MINOR, BUG/MINOR: lua: Put tasks to sleep when waiting for data
- MINOR: backend: implement random-based load balancing
- DOC/MINOR: clean up LUA documentation re: servers & array/table.
- MINOR: lua: Add server name & puid to LUA Server class.
- MINOR: lua: add get_maxconn and set_maxconn to LUA Server class.
- BUG/MINOR: map: correctly track reference to the last ref_elt being dumped
- BUG/MEDIUM: task: Don't free a task that is about to be run.
- MINOR: fd: Make the lockless fd list work with multiple lists.
- BUG/MEDIUM: pollers: Use a global list for fd shared between threads.
- MINOR: pollers: move polled_mask outside of struct fdtab.
- BUG/MINOR: lua: schedule socket task upon lua connect()
- BUG/MINOR: lua: ensure large proxy IDs can be represented
- BUG/MEDIUM: pollers/kqueue: use incremented position in event list
- BUG/MINOR: cli: don't stop cli_gen_usage_msg() when kw->usage == NULL
- BUG/MEDIUM: http: don't always abort transfers on CF_SHUTR
- BUG/MEDIUM: ssl: properly protect SSL cert generation
- BUG/MINOR: lua: Socket.send threw runtime error: 'close' needs 1 arguments.
- BUG/MINOR: spoe: Mistake in error message about SPOE configuration
- BUG/MEDIUM: spoe: Flags are not encoded in network order
- CLEANUP: spoe: Remove unused variables the agent structure
- DOC: spoe: fix a typo
- BUG/MEDIUM: contrib/mod_defender: Use network order to encode/decode flags
- BUG/MEDIUM: contrib/modsecurity: Use network order to encode/decode flags
- DOC: add some description of the pending rework of the buffer structure
- BUG/MINOR: ssl/lua: prevent lua from affecting automatic maxconn computation
- MINOR: lua: Improve error message
- BUG/MEDIUM: cache: don't cache when an Authorization header is present
- MINOR: ssl: set SSL_OP_PRIORITIZE_CHACHA
- BUG/MEDIUM: dns: Delay the attempt to run a DNS resolution on check failure.
- BUG/BUILD: threads: unbreak build without threads
- BUG/MEDIUM: servers: Add srv_addr default placeholder to the state file
- BUG/MEDIUM: lua/socket: Length required read doesn't work
- MINOR: tasks: Change the task API so that the callback takes 3 arguments.
- MAJOR: tasks: Create a per-thread runqueue.
- MAJOR: tasks: Introduce tasklets.
- MINOR: tasks: Make the number of tasks to run at once configurable.
- MAJOR: applets: Use tasks, instead of rolling our own scheduler.
- BUG/MEDIUM: stick-tables: Decrement ref_cnt in table_* converters
- MINOR: http: Log warning if (add|set)-header fails
- DOC: management: add the new wrew stats column
- MINOR: stats: also report the failed header rewrites warnings on the stats page
- BUG/MEDIUM: tasks: Don't forget to increase/decrease tasks_run_queue.
- BUG/MEDIUM: task: Don't forget to decrement max_processed after each task.
- MINOR: task: Also consider the task list size when getting global tasks.
- MINOR: dns: Implement `parse-resolv-conf` directive
- BUG/MEDIUM: spoe: Return an error when the wrong ACK is received in sync mode
- MINOR: task/notification: Is notifications registered ?
- BUG/MEDIUM: lua/socket: wrong scheduling for sockets
- BUG/MAJOR: lua: Dead lock with sockets
- BUG/MEDIUM: lua/socket: Notification error
- BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock
- BUG/MEDIUM: lua/socket: Buffer error, may segfault
- DOC: contrib/modsecurity: few typo fixes
- DOC: SPOE.txt: fix a typo
- MAJOR: spoe: upgrade the SPOP version to 2.0 and remove the support for 1.0
- BUG/MINOR: contrib/spoa_example: Don't reset the status code during disconnect
- BUG/MINOR: contrib/mod_defender: Don't reset the status code during disconnect
- BUG/MINOR: contrib/modsecurity: Don't reset the status code during disconnect
- BUG/MINOR: contrib/mod_defender: update pointer on the end of the frame
- BUG/MINOR: contrib/modsecurity: update pointer on the end of the frame
- MINOR: task: Fix a compiler warning by adding a cast.
- MINOR: stats: also report the nice and number of calls for applets
- MINOR: applet: assign the same nice value to a new appctx as its owner task
- MINOR: task: Fix compiler warning.
- BUG/MEDIUM: tasks: Use the local runqueue when building without threads.
- MINOR: tasks: Don't define rqueue if we're building without threads.
- BUG/MINOR: unix: Make sure we can transfer abns sockets on seamless reload.
- MINOR: lua: Increase debug information
- BUG/MEDIUM: threads: handle signal queue only in thread 0
- BUG/MINOR: don't ignore SIG{BUS,FPE,ILL,SEGV} during signal processing
- BUG/MINOR: signals: ha_sigmask macro for multithreading
- BUG/MAJOR: map: fix a segfault when using http-request set-map
- DOC: regression testing: Add a short starting guide.
- MINOR: tasks: Make sure we correctly init and deinit a tasklet.
- BUG/MINOR: tasklets: Just make sure we don't pass a tasklet to the handler.
- BUG/MINOR: lua: Segfaults with wrong usage of types.
- BUG/MAJOR: ssl: Random crash with cipherlist capture
- BUG/MAJOR: ssl: OpenSSL context is stored in non-reserved memory slot
- BUG/MEDIUM: ssl: do not store pkinfo with SSL_set_ex_data
- MINOR: tests: First regression testing file.
- MINOR: reg-tests: Add reg-tests/README file.
- MINOR: reg-tests: Add a few regression testing files.
- DOC: Add new REGTEST tag info about reg testing.
- BUG/MEDIUM: fd: Don't modify the update_mask in fd_dodelete().
- MINOR: Some spelling cleanup in the comments.
- BUG/MEDIUM: threads: Use the sync point to check active jobs and exit
- MINOR: threads: Be sure to remove threads from all_threads_mask on exit
- REGTEST/MINOR: Wrong URI in a reg test for SSL/TLS.
- REGTEST/MINOR: Set HAPROXY_PROGRAM default value.
- REGTEST/MINOR: Add levels to reg-tests target.
- BUG/MAJOR: Stick-tables crash with segfault when the key is not in the stick-table
- BUG/BUILD: threads: unbreak build without threads
- BUG/MAJOR: stick_table: Complete incomplete SEGV fix
- MINOR: stick-tables: make stktable_release() do nothing on NULL
- BUG/MEDIUM: lua: possible CLOSE-WAIT state with '\n' headers
- MINOR: startup: change session/process group settings
- MINOR: systemd: consider exit status 143 as successful
- REGTEST/MINOR: Wrong URI syntax.
- CLEANUP: dns: remove obsolete macro DNS_MAX_IP_REC
- CLEANUP: dns: inacurate comment about prefered IP score
- MINOR: dns: fix wrong score computation in dns_get_ip_from_response
- MINOR: dns: new DNS options to allow/prevent IP address duplication
- REGTEST/MINOR: Unexpected curl URL globling.
- BUG/MINOR: ssl: properly ref-count the tls_keys entries
- MINOR: h2: keep a count of the number of conn_streams attached to the mux
- BUG/MEDIUM: h2: don't accept new streams if conn_streams are still in excess
- MINOR: h2: add the mux and demux buffer lengths on "show fd"
- BUG/MEDIUM: h2: never leave pending data in the output buffer on close
- BUG/MEDIUM: h2: make sure the last stream closes the connection after a timeout
- MINOR: tasklet: Set process to NULL.
- MINOR: buffer: implement a new file for low-level buffer manipulation functions
- MINOR: buffer: switch buffer sizes and offsets to size_t
- MINOR: buffer: add a few basic functions for the new API
- MINOR: buffer: Introduce b_sub(), b_add(), and bo_add()
- MINOR: buffer: Add b_set_data().
- MINOR: buffer: introduce b_realign_if_empty()
- MINOR: compression: pass the channel to http_compression_buffer_end()
- MINOR: channel: add a few basic functions for the new buffer API
- MINOR: channel/buffer: use c_realign_if_empty() instead of buffer_realign()
- MINOR: channel/buffer: replace buffer_slow_realign() with channel_slow_realign() and b_slow_realign()
- MEDIUM: channel: make channel_slow_realign() take a swap buffer
- MINOR: h2: use b_slow_realign() with the trash as a swap buffer
- MINOR: buffer: remove buffer_slow_realign() and the swap_buffer allocation code
- MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
- MINOR: buffer: replace calls to buffer_space_wraps() with b_space_wraps()
- MINOR: buffer: remove bi_getblk() and bi_getblk_nc()
- MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
- MINOR: buffer: remove bi_ptr()
- MINOR: buffer: remove bo_ptr()
- MINOR: buffer: remove bo_end()
- MINOR: buffer: remove bi_end()
- MINOR: buffer: remove bo_contig_data()
- MINOR: buffer: merge b{i,o}_contig_space()
- MINOR: buffer: replace bo_getblk() with direction agnostic b_getblk()
- MINOR: buffer: replace bo_getblk_nc() with b_getblk_nc() which takes an offset
- MINOR: buffer: replace bi_del() and bo_del() with b_del()
- MINOR: buffer: convert most b_ptr() calls to c_ptr()
- MINOR: h1: make h1_measure_trailers() take the byte count in argument
- MINOR: h2: clarify the fact that the send functions are unsigned
- MEDIUM: h2: prevent the various mux encoders from modifying the buffer
- MINOR: h1: make h1_skip_chunk_crlf() not depend on b_ptr() anymore
- MINOR: h1: make h1_parse_chunk_size() not depend on b_ptr() anymore
- MINOR: h1: make h1_measure_trailers() use an offset and a count
- MEDIUM: h2: do not use buf->o anymore inside h2_snd_buf's loop
- MEDIUM: h2: don't use b_ptr() nor b_end() anymore
- MINOR: buffer: get rid of b_end() and b_to_end()
- MINOR: buffer: make b_getblk_nc() take const pointers
- MINOR: buffer: make b_getblk_nc() take size_t for the block sizes
- MEDIUM: connection: make xprt->snd_buf() take the byte count in argument
- MEDIUM: mux: make mux->snd_buf() take the byte count in argument
- MEDIUM: connection: make xprt->rcv_buf() use size_t for the count
- MEDIUM: mux: make mux->rcv_buf() take a size_t for the count
- MINOR: connection: add a flags argument to rcv_buf()
- MINOR: connection: add a new receive flag : CO_RFL_BUF_WET
- MINOR: buffer: get rid of b_ptr() and convert its last users
- MINOR: buffer: use b_room() to determine available space in a buffer
- MINOR: buffer: replace buffer_not_empty() with b_data() or c_data()
- MINOR: buffer: replace buffer_empty() with b_empty() or c_empty()
- MINOR: buffer: make bo_putchar() use b_tail()
- MINOR: buffer: replace buffer_full() with channel_full()
- MINOR: buffer: replace bi_space_for_replace() with ci_space_for_replace()
- MINOR: buffer: replace buffer_pending() with ci_data()
- MINOR: buffer: replace buffer_flush() with c_adv(chn, ci_data(chn))
- MINOR: buffer: use c_head() instead of buffer_wrap_sub(c->buf, p-o)
- MINOR: buffer: use b_orig() to replace most references to b->data
- MINOR: buffer: Use b_add()/bo_add() instead of accessing b->i/b->o.
- MINOR: channel: remove almost all references to buf->i and buf->o
- MINOR: channel: Add co_set_data().
- MEDIUM: channel: adapt to the new buffer API
- MINOR: checks: adapt to the new buffer API
- MEDIUM: h2: update to the new buffer API
- MINOR: buffer: remove unused bo_add()
- MEDIUM: spoe: use the new buffer API for the SPOE buffer
- MINOR: stats: adapt to the new buffers API
- MINOR: cli: use the new buffer API
- MINOR: cache: use the new buffer API
- MINOR: stream-int: use the new buffer API
- MINOR: stream: use wrappers instead of directly manipulating buffers
- MINOR: backend: use new buffer API
- MEDIUM: http: use wrappers instead of directly manipulating buffers states
- MINOR: filters: convert to the new buffer API
- MINOR: payload: convert to the new buffer API
- MEDIUM: h1: port to new buffer API.
- MINOR: flt_trace: adapt to the new buffer API
- MEDIUM: compression: start to move to the new buffer API
- MINOR: lua: use the wrappers instead of directly manipulating buffer states
- MINOR: buffer: convert part bo_putblk() and bi_putblk() to the new API
- MINOR: buffer: adapt buffer_slow_realign() and buffer_dump() to the new API
- MAJOR: start to change buffer API
- MINOR: buffer: remove the check for output on b_del()
- MINOR: buffer: b_set_data() doesn't truncate output data anymore
- MINOR: buffer: rename the "data" field to "area"
- MEDIUM: buffers: move "output" from struct buffer to struct channel
- MINOR: buffer: replace bi_fast_delete() with b_del()
- MINOR: buffer: replace b{i,o}_put* with b_put*
- MINOR: buffer: add a new file for ist + buffer manipulation functions
- MINOR: checks: use b_putist() instead of b_putstr()
- MINOR: buffers: remove b_putstr()
- CLEANUP: buffer: minor cleanups to buffer.h
- MINOR: buffers/channel: replace buffer_insert_line2() with ci_insert_line2()
- MINOR: buffer: replace buffer_replace2() with b_rep_blk()
- MINOR: buffer: rename the data length member to '->data'
- MAJOR: buffer: finalize buffer detachment
- MEDIUM: chunks: make the chunk struct's fields match the buffer struct
- MAJOR: chunks: replace struct chunk with struct buffer
- DOC: buffers: document the new buffers API
- DOC: buffers: remove obsolete docs about buffers
- MINOR: tasklets: Don't attempt to add a tasklet in the list twice.
- MINOR: connections/mux: Add a new "subscribe" method.
- MEDIUM: connections/mux: Revamp the send direction.
- MINOR: connection: simplify subscription by adding a registration function
- BUG/MINOR: http: Set brackets for the unlikely macro at the right place
- BUG/MINOR: build: Fix compilation with debug mode enabled
- BUILD: Generate sha256 checksums in publish-release
- MINOR: debug: Add check for CO_FL_WILL_UPDATE
- MINOR: debug: Add checks for conn_stream flags
- MINOR: ist: Add the function isteqi
- BUG/MEDIUM: threads: Fix the exit condition of the thread barrier
- BUG/MEDIUM: mux_h2: Call h2_send() before updating polling.
- MINOR: buffers: simplify b_contig_space()
- MINOR: buffers: split b_putblk() into __b_putblk()
- MINOR: buffers: add b_xfer() to transfer data between buffers
- DOC: add some design notes about the new layering model
- MINOR: conn_stream: add a new CS_FL_REOS flag
- MINOR: conn_stream: add an rx buffer to the conn_stream
- MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
- MEDIUM: stream-int: automatically call si_cs_recv_cb() if the cs has data on wake()
- MINOR: h2: make each H2 stream support an intermediary input buffer
- MEDIUM: h2: make h2_frt_decode_headers() use an intermediary buffer
- MEDIUM: h2: make h2_frt_transfer_data() copy via an intermediary buffer
- MEDIUM: h2: centralize transfer of decoded frames in h2_rcv_buf()
- MEDIUM: h2: move headers and data frame decoding to their respective parsers
- MEDIUM: buffers: make b_xfer() automatically swap buffers when possible
- MEDIUM: h2: perform a single call to the data layer in demux()
- MEDIUM: h2: don't call data_cb->recv() anymore
- MINOR: h2: make use of CS_FL_REOS to indicate that end of stream was seen
- MEDIUM: h2: use the default conn_stream's receive function
- DOC: add more design feedback on the new layering model
- MINOR: h2: add the error code and the max/last stream IDs to "show fd"
- BUG/MEDIUM: stream-int: don't immediately enable reading when the buffer was reportedly full
- BUG/MEDIUM: stats: don't ask for more data as long as we're responding
- BUG/MINOR: servers: Don't make "server" in a frontend fatal.
- BUG/MEDIUM: tasks: make sure we pick all tasks in the run queue
- BUG/MEDIUM: tasks: Decrement rqueue_size at the right time.
- BUG/MEDIUM: tasks: use atomic ops for active_tasks_mask
- BUG/MEDIUM: tasks: Make sure there's no task left before considering inactive.
- MINOR: signal: don't pass the signal number anymore as the wakeup reason
- MINOR: tasks: extend the state bits from 8 to 16 and remove the reason
- MINOR: tasks: Add a flag that tells if we're in the global runqueue.
- BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size.
- MINOR: queue: centralize dequeuing code a bit better
- MEDIUM: queue: make pendconn_free() work on the stream instead
- DOC: queue: document the expected locking model for the server's queue
- MINOR: queue: make sure pendconn->strm->pend_pos is always valid
- MINOR: queue: use a distinct variable for the assigned server and the queue
- MINOR: queue: implement pendconn queue locking functions
- MEDIUM: queue: get rid of the pendconn lock
- MINOR: tasks: Make active_tasks_mask volatile.
- MINOR: tasks: Make global_tasks_mask volatile.
- MINOR: pollers: Add a way to wake a thread sleeping in the poller.
- MINOR: threads/queue: Get rid of THREAD_WANT_SYNC in the queue code.
- BUG/MEDIUM: threads/sync: use sched_yield when available
- MINOR: ssl: BoringSSL matches OpenSSL 1.1.0
- BUG/MEDIUM: h2: prevent orphaned streams from blocking a connection forever
- BUG/MINOR: config: stick-table is not supported in defaults section
- BUILD/MINOR: threads: unbreak build with threads disabled
- BUG/MINOR: threads: Handle nbthread == MAX_THREADS.
- BUG/MEDIUM: threads: properly fix nbthreads == MAX_THREADS
- MINOR: threads: move "nbthread" parsing to hathreads.c
- BUG/MEDIUM: threads: unbreak "bind" referencing an incorrect thread number
- MEDIUM: proxy_protocol: Convert IPs to v6 when protocols are mixed
- BUILD/MINOR: compiler: fix offsetof() on older compilers
- SCRIPTS: git-show-backports: add missing quotes to "echo"
- MINOR: threads: add more consistency between certain variables in no-thread case
- MEDIUM: hathreads: implement a more flexible rendez-vous point
- BUG/MEDIUM: cli: make "show fd" thread-safe
The "show fd" command was implemented as a debugging aid but it's not
thread safe. Its features have grown, it can now dump some mux-specific
parts and is being used in production to capture some useful debugging
traces. But it will quickly crash the process when used during an H2 load
test for example, especially when haproxy is built with the DEBUG_UAF
option. It cannot afford not to be thread safe anymore. Let's make use
of the new rendez-vous point using thread_isolate() / thread_release()
to ensure that the data being dumped are not changing under us. The dump
becomes slightly slower under load but now it's safe.
This should be backported to 1.8 along with the rendez-vous point code
once considered stable enough.
The current synchronization point enforces certain restrictions which
are hard to workaround in certain areas of the code. The fact that the
critical code can only be called from the sync point itself is a problem
for some callback-driven parts. The "show fd" command for example is
fragile regarding this.
Also it is expensive in terms of CPU usage because it wakes every other
thread just to be sure all of them join to the rendez-vous point. It's a
problem because the sleeping threads would not need to be woken up just
to know they're doing nothing.
Here we implement a different approach. We keep track of harmless threads,
which are defined as those either doing nothing, or doing harmless things.
The rendez-vous is used "for others" as a way for a thread to isolate itself.
A thread then requests to be alone using thread_isolate() when approaching
the dangerous area, and then waits until all other threads are either doing
the same or are doing something harmless (typically polling). The function
only returns once the thread is guaranteed to be alone, and the critical
section is terminated using thread_release().
When threads are disabled, some variables such as tid and tid_bit are
still checked everywhere, the MAX_THREADS_MASK macro is ~0UL while
MAX_THREADS is 1, and the all_threads_mask variable is replaced with a
macro forced to zero. The compiler cannot optimize away all this code
involving checks on tid and tid_bit, and we end up in special cases
where all_threads_mask has to be specifically tested for being zero or
not. It is not even certain the code paths are always equivalent when
testing without threads and with nbthread 1.
Let's change this to make sure we always present a single thread when
threads are disabled, and have the relevant values declared as constants
so that the compiler can optimize all the tests away. Now we have
MAX_THREADS_MASK set to 1, all_threads_mask set to 1, tid set to zero
and tid_bit set to 1. Doing just this has removed 4 kB of code in the
no-thread case.
A few checks for all_threads_mask==0 have been removed since it never
happens anymore.
http-request set-src possibly creates a situation where src and dst
are from different address families. Convert both addresses to IPv6
to avoid a PROXY UNKNOWN.
This patch should be backported to haproxy 1.8.
The "process" directive on "bind" lines supports process references and
thread references. No check is performed on the thread number validity,
so that if a listener is only bound to non-existent threads, the traffic
will never be processed. It easily happens when setting one bind line per
thread with an incorrect (or reduced) thread count. No warning appears
and some random connections are never served. It also happens when setting
thread references with threads support disabled at build time.
This patch makes use of the all_threads_mask variable to detect if some
referenced threads don't exist, to emit a warning and fix this.
This patch needs to be backported to 1.8, just like the previous one which
it depends on (MINOR: threads: move "nbthread" parsing to hathreads.c).
The purpose is to make sure that all variables which directly depend
on this nbthread argument are set at the right moment. For now only
all_threads_mask needs to be set. It used to be set while calling
thread_sync_init() which is called too late for certain checks. The
same function handles threads and non-threads, which removes the need
for some thread-specific knowledge from cfgparse.c.
While moving Olivier's patch for nbthread==MAX_THREADS in commit
3e12304 ("BUG/MINOR: threads: Handle nbthread == MAX_THREADS.") to
hathreads.c, I missed one place resulting in the computed thread mask
being used as the thread count, which is worse than the initial bug.
Let's fix it properly this time.
This fix must be backported to 1.8 just like the other one.
If nbthread is MAX_THREADS, the shift operation needed to compute
all_threads_mask fails in thread_sync_init(). Instead pass a number
of threads to this function and let it compute the mask without
overflowing.
This should be backported to 1.8.
Depending on the optimization level, gcc may complain that wake_thread()
uses an invalid array index for poller_wr_pipe[] when called from
__task_wakeup(). Normally the condition to get there never happens,
but it's simpler to ifdef out this part of the code which is only
used to wake other threads up. No backport is needed, this was brought
by the recent introduction of the ability to wake a sleeping thread.
Thierry discovered that the following config crashes haproxy while
parsing the config (it's probably the smallest crasher) :
defaults
stick-table type ip size 1M
And indeed it does because it looks for the current proxy's name which it
does not have as it's the default one. This affects all versions since 1.6.
This fix must be backported to all versions back to 1.6.
Some h2 connections remaining in CLOSE_WAIT state forever have been
reported for a while. Thanks to detailed captures provided by Milan
Petruzelka, the sequence where this happens became clearer :
1) multiple streams compete for the mux and are queued in the send_list
2) at this point the mux has to emit a GOAWAY for any reason (for
example because it received a bad message)
3) the streams are woken up, notified about the error
4) h2_detach() is called for each of them
5) the CS they are detached from the H2S
6) since the streams are marked as blocked for some room, they are
orphaned and nothing more is done on them.
7) at this point, any activity on the connection goes through h2_wake()
which sees the conneciton in ERROR2 state, tries again to release
the streams, cannot, and stops polling (thus even connection errors
cannot be detected anymore).
=> from this point, no more events can be received on the connection, and
the streams remain orphaned forever.
This patch makes sure that we never return without doing anything once
an error was met. It has to act both on the h2_detach() side (for h2
streams being detached after the error was emitted) and on the h2_wake()
side (for errors reported after h2s have already been orphaned).
Many thanks to Milan Petruzelka and Janusz Dziemidowicz for their
awesome work on this issue, collecting traces and testing patches,
and to Olivier Doucet for extra testing and confirming the fix.
This fix must be backported to 1.8.
There is a corner case with the sync point which can significantly
degrade performance. The reason is that it forces all threads to busy
spin there, and that if there are less CPUs available than threads,
this busy activity from some threads will force others to wait longer
in epoll() or to simply be scheduled out while doing something else,
and will increase the time needed to reach the sync point.
Given that the sync point is not expected to be stressed *that* much,
better call sched_yield() while waiting there to release the CPU and
offer it to waiting threads.
On a simple test with 4 threads bound to two cores using "maxconn 1"
on the server line, the performance was erratic before the recent
scheduler changes (between 40 and 200 conn/s with hundreds of ms
response time), and it jumped to 7200 with 12ms response time with
this fix applied.
It should be backported to 1.8 since 1.8 is affected as well.
Now that we can wake one thread sleeping in the poller, we don't have to
use THREAD_WANT_SYNC any more.
This gives a significant performance boost on highly contended accesses
(servers with maxconn 1), showing a jump from 21k to 31k conn/s on a
test involving 8 threads.
Add a new pipe, one per thread, so that we can write on it to wake a thread
sleeping in a poller, and use it to wake threads supposed to take care of a
task, if they are all sleeping.
This lock was necessary to manipulate the pendconn element between
concurrent places, but was causing great difficulties in the list walk
by having to iterate over multiple entries instead of being able to
safely pick the first one (in fact the first element was always the
right one but the locking model was hard to prove).
Here since we know we can always rely on the queue's locks, we take
the queue's lock every time we need to modify the element. In practice
it was already the case everywhere except in pendconn_dequeue() which
only works on an element that was already detached. This function had
to be protected against the risk of meeting an incompletely detached
element (which could be unlinked but not yet assigned). By taking the
queue lock around the LIST_ISEMPTY test, it's enough to ensure that a
concurrent thread either didn't begin or had completed the operation.
The true benefit really is in pendconn_process_next_strm() where we
can again safely work with the first element of each queue. This will
significantly simplify next updates to this code.
The new pendconn_queue_lock() and pendconn_queue_unlock() functions are
made to make it more convenient to lock or unlock the pendconn queue
either at the proxy or the server depending on pendconn->srv. This way
it is possible to remove the open-coding of these locks at various places.
These ones have been used in pendconn_unlink() and pendconn_add(), thus
significantly simplifying the logic there.
The pendconn struct uses ->px and ->srv to designate where the element is
queued. There is something confusing regarding threads though, because we
have to lock the appropriate queue before inserting/removing elements, and
this queue may only be determined by looking at ->srv (if it's not NULL
it's the server, otherwise use the proxy). But pendconn_grab_from_px() and
pendconn_process_next_strm() both assign this ->srv field, making it
complicated to know what queue to lock before manipulating the element,
which is exactly why we have the pendconn_lock in the first place.
This commit introduces pendconn->target which is the target server that
the two aforementioned functions will set when assigning the server.
Thanks to this, the server pointer may always be relied on to determine
what queue to use.
pendconn_add() used to assign strm->pend_pos very late, after unlocking
the queue, so that a watching thread could see a random value in
pendconn->strm->pend_pos even while holding the lock on the element and
the queue itself. While there's currently nothing wrong with this, it
costs nothing to arrange it and will simplify code analysis later.
Now pendconn_free() takes a stream, checks that pend_pos is set, clears
it, and uses pendconn_unlink() to complete the job. It's cleaner and
centralizes all the bookkeeping work in pendconn_unlink() only and
ensures that there's a single place where the stream's position in the
queue is manipulated.
For now the pendconns may be dequeued at two places :
- pendconn_unlink(), which operates on a locked queue
- pendconn_free(), which operates on an unlocked queue and frees
everything.
Some changes are coming to the queue and we'll need to be able to be a
bit stricter regarding the places where we dequeue to keep the accounting
accurate. This first step renames the locked function __pendconn_unlink()
as it's for use by those aware of it, and introduces a new general purpose
pendconn_unlink() function which automatically grabs the necessary locks
before calling the former, and pendconn_cond_unlink() which additionally
checks the pointer and the presence in the queue.
As __task_wakeup() is responsible for increasing
rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for
decreasing it, as process_runnable_tasks() isn't the only one that removes
tasks from runqueues.
This is never used and would even be wrong since the reasons are ORed
so two signals would be turned into a third value, just like if any
other reason was used at the same time.
We may remove the thread's bit in active_tasks_mask despite tasks for that
thread still being present in the global runqueue. To fix that, introduce
global_tasks_mask, and set the correspnding bits when we add a task to the
runqueue.
We need to decrement requeue_size when we remove a task form rqueue_local,
not when we remove if from the task list, or we'd also decrement it for any
tasklet, that was never in the rqueue in the first place.
Commit 09eeb76 ("BUG/MEDIUM: tasks: Don't forget to increase/decrease
tasks_run_queue.") addressed a count issue in the run queue and uncovered
another issue with the way the tasks are dequeued from the global run
queue. The number of tasks to pick is computed using an integral divide,
which results in up to nbthread-1 tasks never being run. The fix simply
consists in getting rid of the divide and checking the task count in the
loop.
No backport is needed, this is 1.9-specific.
When parsing the configuration, if "server", "default-server" or
"server-template" are found in a frontend, we first warn that it will be
ignored, only to be considered a fatal error later. Be true to our word, and
just ignore it.
This should be backported to 1.8 and 1.7.
The stats applet is still a bit hackish. It uses the HTTP txn to parse
the POST contents. Due to this it pretends not having parsed the request
from the buffer so that the HTTP parser continues to work fine on these
data. This comes with a side effect : the request lies pending in the
channel's buffer, and because of this, stream_int_update_applet() always
wakes the applet up. It's very visible when retrieving a large stats page
over a slow link as haproxy eats 100% of the CPU waiting for the data to
leave.
While the proper long term solution definitely is to consume these data
and parse the body from the applet, changing this is not suitable for a
fix.
What this patch does instead is to disable request polling as long as there
are pending data in the response buffer. Given that for almost all cases,
the applet remains busy sending data, this is at least enough to ensure
that we don't wake up for the pending request data while we're waiting for
the client to receive these data. Now a 5k backend stats page is dumped at
1% CPU over a 10 Mbps link instead of 100%, using 1500 epoll_wait() calls
instead of 80000.
Note that the previous fix (BUG/MEDIUM: stream-int: don't immediately
enable reading when the buffer was reportedly full) is necessary for the
effects of the fix to be noticed since both bugs have the exact same
effect.
This fix must be backported at least as far as 1.5.
There is a long-time issue which affects some applets, at least the stats
applet. If a large stats page is read over a slow link, regularly the
channel's buffer contains too many response data to allow another round
of ci_putblk() to copy a new message. In this case the applet calls
si_applet_cant_put() to mention that it failed to emit data into the
channel's buffer, and wants to be called only once some room is made.
The problem is that stream_int_update(), which is called from
process_stream(), will clear this flag whenever it sees there's some
spare room in the channel's buffer. It causes the applet to be woken
again immediately. This is very visible when reading a large stats
page over a slow link, because in this case haproxy will run at 100%
CPU and strace shows mostly epoll_wait(0). It is very likely that some
other applets like CLI, Lua, peers or SPOE have also been affected but
that the effect were less noticeable because it was mixed with traffic.
Ideally stream_int_update() should not touch these flags, but changing
this would require a very careful auditing of all users. Instead here
what we do is that we respect the flag if the channel still has output
data. This way the flag will automatically disappear once the buffer is
empty, and the applet function will be called only when input data
remains, if at all.
This patch alone is not enough to observe the behaviour change on the
stats page because another bug takes over, addressed by next patch
(BUG/MEDIUM: stats: don't ask for more data as long as we're responding).
When both are applied, dumping stats for 5k backends over a 10 Mbps link
take 1% CPU instead of 100%, with 1.5k epoll_wait() calls instead of 80k.
This fix should be backported at least as far as 1.5.
Instead of calling the data layer from each individual frame processing
function, we now call it from demux. This requires to know the h2s that
was created inside h2c_frt_handle_headers(), which is why the pointer is
now returned. This results in a small performance boost from 58k to 60k
POST requests/s compared to -master, thanks to half the number of
si_cs_recv_cb() calls and 66% calls to si_cs_wake_cb().
It's interesting to note that all calls to data_cb->recv() are now always
immediately followed by a call to data_cb->wake(). The next step should
be to let the ->wake handler perform the recv() call itself. For this it
will be useful to have some info on the CS to indicate whether or not it
is ready to be read (ie: contains a non-empty input buffer).
We still call the parser but it should soon not be needed anymore. The
decode functions don't need the buffer nor the max size anymore. They
must also not touch the CS_FL_EOS or CS_FL_RCV_MORE flags either, so
this is done within h2_rcv_buf() after transmission.
The "flags" argument to h2_frt_decode_headers() and h2_frt_transfer_data()
has been removed since it's not used anymore.
The purpose here is also to ensure we can split the lower from the top
layers. The way the CS_FL_MSG_MORE flag is set was updated so that it's
set or cleared upon exit depending on the buffer's remaining contents.
The purpose is to decode to a temporary buffer and then to copy this buffer
to the caller. This double-copy definitely has an impact on performance, the
test code goes down from 220k to 140k req/s, but this memcpy() will disappear
soon.
The test on CO_RFL_BUF_WET has become irrelevant now since we only use
the cs' rxbuf, so we cannot be blocked by "output" data that has to be
forwarded first. Thus instead we don't start until the rxbuf is empty
(it will be drained from any input data once the stream processes it).
The purpose is to decode to a temporary buffer and then to copy this buffer
to the caller upon request to avoid having to process frames on the fly
when called from the higher level. For now the buffer is only initialized
on stream creation via cs_new() and allocated if the buffer_wait's callback
is called.
If the cs has data pending or shutdown and the input channel is still
waiting for reads, let's simply call the recv() function from the wake()
callback. This will allow the lower layers to simply wake the upper one
up without having to consider the recv() nor anything else.
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
In thread_sync_barrier, we exit when all threads have set their own bit in the
barrier mask. It is done by comparing it to all_threads_mask. But we must not
use a simple equality to do so, becaue all_threads_mask may change. Since commit
ba86c6c25 ("MINOR: threads: Be sure to remove threads from all_threads_mask on
exit"), when a thread exit, its bit is removed from all_threads_mask. Instead,
we must use a bitwise AND to test is all bits of all_threads_mask are set.
This also requires that all_threads_mask is set to volatile if we want to
catch changes.
This patch must be backported in 1.8.
It remained some fragments of the old buffers API in debug messages, here and
there.
This was caused by the recent buffer API changes, no backport is needed.
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
Add a new "subscribe" method for connection, conn_stream and mux, so that
upper layer can subscribe to them, to be called when the event happens.
Right now, the only event implemented is "SUB_CAN_SEND", where the upper
layer can register to be called back when it is possible to send data.
The connection and conn_stream got a new "send_wait_list" entry, which
required to move a few struct members around to maintain an efficient
cache alignment (and actually this slightly improved performance).
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.
The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.
The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.
The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
This one is more generic and designed to work on a random block. It
may later get a b_rep_ist() variant since many strings are already
available as (ptr,len).
There was no point keeping that function in the buffer part since it's
exclusively used by HTTP at the channel level, since it also automatically
appends the CRLF. This further cleans up the buffer code.
The new file istbuf.h links the indirect strings (ist) with the buffers.
The purpose is to encourage addition of more standard buffer manipulation
functions that rely on this in order to improve the overall ease of use
along all the code. Just like ist.h and buf.h, this new file is not
expected to depend on anything beyond these two files.
A few functions were added and/or converted from buffer.h :
- b_isteq() : indicates if a buffer and a string match
- b_isteat() : consumes a string from the buffer if it matches
- b_istput() : appends a small string to a buffer (all or none)
- b_putist() : appends part of a large string to a buffer
The equivalent functions were removed from buffer.h and changed at the
various call places.
The two variants now do exactly the same (appending at the tail of the
buffer) so let's not keep the distinction between these classes of
functions and have generic ones for this. It's also worth noting that
b{i,o}_putchk() wasn't used at all and was removed.
There's no distinction between in and out data now. The latter covers
the needs of the former and supports wrapping. The extra cost is
negligible given the locations where it's used.
Since we never access this field directly anymore, but only through the
channel's wrappers, it can now move to the channel. The buffers are now
completely free from the distinction between input and output data.
Since we use "_data" for the amount of data at many places, as opposed to
"_space" for the amount of space, let's rename the "data" field to "area"
so that we can reuse "data" later for the amount of data in the buffer
(currently called "len" despite not being contigous).
This is intentionally the minimal and safest set of changes, some cleanups
area still required. These changes are quite tricky and cannot be
independantly tested, so it's important to keep this patch as bisectable
as possible.
buf_empty and buf_wanted were changed and are now exactly similar since
there's no <p> member in the structure anymore. Given that no test is
ever made in the code to check that buf == &buf_wanted, it may be possible
that we don't need to have two anymore, unless some buf_empty tests have
precedence. This will have to be investigated.
A significant part of this commit affects the HTTP compression code,
which used to deeply manipulate the input and output buffers without
any reasonable solution for a better abstraction. For this reason, if
any regression is met and designates this patch as the culprit, it is
important to run tests which specifically involve compression or which
definitely don't use it in order to spot the issue.
Cc: Olivier Houchard <ohouchard@haproxy.com>
This replaces chn->buf->p with ci_head(chn), chn->buf->o with co_data(chn)
and chn->buf->i with ci_data(chn). This is in order to help porting to the
new buffer API.
This part is tricky, it passes a channel where we used to have a buffer,
in order to reduce the API changes during the big switch. This way all
the channel's wrappers to distinguish between input and output are
available. It also makes sense given that the compression applies on
a channel since it's in the forwarding path.
The parser now uses the channel exclusively to access the data. In order
to avoid the cost of indirection, a local variable "input" was added to
the function that replaces buf->p. Given that this part is on the critical
path, it will have to be tested again for any visible performance loss.
This is aimed at easing the transition to the new API. There are a few places
which deserve some simplifications afterwards because ci_head() is called
often and may be placed into a local pointer.
A few locations still accessing ->i and ->o directly were changed to
use ci_data() and co_data() respectively. A call to b_del() was replaced
with co_set_data() in si_cs_send() so that ->o will is automatically be
decremented after the migration.
The buffer is not used as a forwarding buffer so we can simply map ->i
to ->len and ->p to b_head(). It *seems* that p is never modified, so
that we could even always use b_orig(). This needs to be rechecked.
There is no more distinction between ->i and ->o for the mux's buffers,
we always use b_data() to know the buffer's length since only one side
is used for each direction.
The code exclusively used ->i for data received and ->o for data sent. Now
it always uses b_data(), b_head() and b_tail() so that there is no more
distinction between ->i and ->o.
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
We used to have variations around buffer_total_space() and
size-buffer_len() or size-b_data(). Let's simplify all this. buffer_len()
was also removed as not used anymore.
With this flag we introduce the notion of "dry" vs "wet" buffers : some
demultiplexers like the H2 mux require as much room as possible for some
operations that are not retryable like decoding a headers frame. For this
they need to know if the buffer is congested with data scheduled for
leaving soon or not. Since the new API will not provide this information
in the buffer itself, the caller must indicate it. We never need to know
the amount of such data, just the fact that the buffer is not in its
optimal condition to be used for receipt. This "CO_RFL_BUF_WET" flag is
used to mention that such outgoing data are still pending in the buffer
and that a sensitive receiver should better let it "dry" before using it.
The mux and transport rcv_buf() now takes a "flags" argument, just like
the snd_buf() one or like the equivalent syscall lower part. The upper
layers will use this to pass some information such as indicating whether
the buffer is free from outgoing data or if the lower layer may allocate
the buffer itself.
It also returns a size_t. This is in order to clean the API. Note
that the H2 mux still uses some ints in the functions called from
h2_rcv_buf(), though it's not really a problem given that H2 frames
are smaller. It may deserve a general cleanup later though.
Just like we have a size_t for xprt->snd_buf(), we adjust to use size_t
for rcv_buf()'s count argument and return value. It also removes the
ambiguity related to the possibility to see a negative value there.
This way the mux doesn't need to modify the buffer's metadata anymore
nor to know the output's size. The mux->snd_buf() function now takes a
const buffer and it's up to the caller to update the buffer's state.
The return type was updated to return a size_t to comply with the count
argument.
This way the senders don't need to modify the buffer's metadata anymore
nor to know about the output's split point. This way the functions can
take a const buffer and it's clearer who's in charge of updating the
buffer after a send. That's why the buffer realignment is now performed
by the caller of the transport's snd_buf() functions.
The return type was updated to return a size_t to comply with the count
argument.
Now that there are no more users requiring to modify the buffer anymore,
switch these ones to const char and const buffer. This will make it more
obvious next time send functions are tempted to modify the buffer's output
count. Minor adaptations were necessary at a few call places which were
using char due to the function's previous prototype.
The few places where they were still used were replaced with b_peek() and
b_wrap() respectively. The parts making use of ->i and ->o should now be
convertible to the new API.
Functions h2s_frt_make_resp_headers() and h2s_frt_make_resp_data() used
to modify the buffer's output data count. This is problematic for the
buffer's rework as we don't want to rely on this anymore. This commit
modifies these functions to take an offset (relative to the buffer's
head) and a maximum byte count. Thus h2_snd_buf() now calls them with
buf->o and takes care of removing deleted data itself. The send functions
now almost support being passed const buffers (except for the data part
which is still embedded).
There's no more error return combined with the send output, though
the comments were misleading. Let's fix this as well as the functions'
prototypes. h2_snd_buf()'s return value wasn't changed yet since it
has to match the ->snd_buf prototype.
Till now the callers had to know which one to call for specific use cases.
Let's fuse them now since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, just combine the two tests
by first removing output data then only input.
This will be important so that we can parse a buffer without touching it.
Now we indicate where from the buffer's head we plan to start to copy, and
for how many bytes. This will be used by send functions to loop at the end
of the buffer without having to update the buffer's output byte count.
This new functoin limits itself to the amount of data available in the
buffer and doesn't care about the direction anymore. It's only called
from co_getblk() which already checks that no more than the available
output bytes is requested.