In case a header frame carrying trailers just fits into the HTX buffer
but leaves no room for the EOM block, we used to return the same code
as the one indicating we're missing data. This could would result in
such frames causing timeouts instead of immediate clean aborts. Now
they are properly reported as stream errors (since the frame was
decoded and the compression context is still synchronized).
This must be backported to 1.9.
When sending trailers, we may face an empty HTX trailers block or even
have to discard some of the headers there and be left with nothing to
send. RFC7540 forbids sending of empty HEADERS frames, so in this case
we turn to DATA frames (which is possible since after other DATA).
The code used to only check the input frame's contents to decide whether
or not to switch to a DATA frame, it didn't consider the possibility that
the frame only used to contain headers discarded later, thus it could still
emit an empty HEADERS frame in such a case. This patch makes sure that the
output frame size is checked instead to take the decision.
This patch must be backported to 1.9. In practice this situation is never
encountered since the discarded headers have really nothing to do in a
trailers block.
Now we atomically allocate the my_regex struct within function
regex_comp() and compile the regex or free both in case of failure. The
pointer to the allocated my_regex struct is returned directly. The
my_regex* argument to regex_comp() is removed.
Function regex_free() was modified so that it systematically frees the
my_regex entry. The function does nothing when called with a NULL as
argument (like free()). It will avoid existing risk of not properly
freeing the initialized area.
Other structures are also updated in order to be compatible (the ones
related to Lua and action rules).
This commit "MINOR: stick-table: Add prefixes to stick-table names"
prepended the "peers" section name to stick-table names declared in such "peers"
sections followed by a '/' character. This is not this name which must be sent
over the network to avoid collisions with stick-table name declared as backends.
As the '/' character is forbidden as first character of a backend name, we prefix
the stick-table names declared in peers sections only with a '/' character.
With such declarations:
peers mypeers
table t1
backend t1
stick-table ... peers mypeers
at peer protocol level, "t1" declared as stick-table in "mypeers" section is different
of "t1" stick-table declared as backend.
In src/peers.c, only two modifications were required: use ->nid stktable struct
member in place of ->id in peer_prepare_switchmsg() to prepare the stick-table
definition messages. Same thing in peer_treat_definemsg() to treat a stick-table
definition messages.
With this patch we add a prefix to stick-table names declared in "peers" sections
concatenating the "peers" section name followed by a '/' character with
the stick-table name. Consequently, "peers" sections have their own
namespace for their stick-tables. Obviously, these stick-table names are not the
ones which should be sent over the network. So these configurations must be
compatible and should make A and B peers communicate with peers protocol:
# haproxy A config, old way stick-table declerations
peers mypeers
peer A ...
peer B ...
backend t1
stick-table type string size 10m store gpc0 peers mypeers
# haproxy B config, new way stick-table declerations
peers mypeers
peer A ...
peer B ...
table t1 type string size store gpc0 10m
This "network" name is stored in ->nid new field of stktable struct. The "local"
stktable-name is still stored in ->id.
Add a list of proxies for all the stick-tables (->proxies_list struct stktable
member) so that to be able to compute the process bindings of the peers after having
parsed the configuration file.
The proxies are added to the stick-tables they reference when parsing
stick-tables lines in proxy sections, when checking the actions in
check_trk_action() and when resolving samples args for stick-tables
without checking is they are duplicates. We check only there is no loop.
Then, after having parsed everything, we add the proxy bindings to the
peers frontend bindings with stick-tables they reference.
This patch adds the support for the "table" line parsing in "peers" sections
to declare stick-table in such sections. This also prevents the user from having
to declare dummy backends sections with a unique stick-table inside.
Even if still supported, this usage will become deprecated.
To do so, the ->table member of proxy struct which is a stktable struct is replaced
by a pointer to a stktable struct allocated at parsing time in src/cfgparse-listen.c
for the dummy stick-table backends and in src/cfgparse.c for "peers" sections.
This has an impact on the code for stick-table sample converters and on the stickiness
rules parsers which first store the name of the dummy before resolving the rules.
This patch replaces proxy_tbl_by_name() calls by stktable_find_by_name() calls
to lookup for stick-tables stored in "stktable_by_name" ebtree at parsing time.
There is only one remaining place where proxy_tbl_by_name() is used: src/hlua.c.
At several places in the code we relied on the fact that ->size member of stick-table
was equal to zero to consider the stick-table was present by not configured,
this do not make sense anymore as ->table member of struct proxyis fow now on a pointer.
These tests are replaced by a test on ->table value itself.
In "peers" section we do not have to temporary store the name of the section the
stick-table are attached to because this name is obviously already known just after
having entered this "peers" section.
About the CLI stick-table I/O handler, the pointer to proxy struct is replaced by
a pointer to a stktable struct.
With this patch we move the code responsible of parsing "stick-table"
lines to implement parse_stick_table() function in src/stick-tabble.c
so that to be able to parse "stick-table" elsewhere than in proxy sections.
We have have also added a conf struct to stktable struct to store the filename
and the line in the file the stick-table has been parsed to help in
diagnosing and displaying any configuration issue.
This implements support for the new API which relies on a call to
setsockopt().
On systems that support it (currently, only Linux >= 4.11), this enables
using TCP fast open when connecting to server.
Please note that you should use the retry-on "conn-failure", "empty-response"
and "response-timeout" keywords, or the request won't be able to be retried
on failure.
Co-authored-by: Olivier Houchard <ohouchard@haproxy.com>
The connect() method had 2 arguments, "data", that tells if there's pending
data to be sent, and "delack" that tells if we have to use a delayed ack
inconditionally, or if the backend is configured with tcp-smart-connect.
Turn that into one argument, "flags".
That way it'll be easier to provide more informations to connect() without
adding extra arguments.
Libressl doesn't yet provide early data, so don't put the CO_FL_EARLY_SSL_HS
on the connection if we're building with libressl, or the handshake will
never be done.
SSL_SESSION_get0_id_context is introduced in LibreSSL-2.7.0
async operations are not supported by LibreSSL
early data is not supported by LibreSSL
packet_length is removed from SSL struct in LibreSSL
I was about to partly revert 294d0f08b3d100fcae0e71c26d4f9f93d26e3569,
because there were no 'X' for 'appsession' in the keyword matrix until
I checked the blame, realizing that the feature does not exist any more.
Clearly the documentation is confusing here, the removal note is only
listed *below* the old documentation and the supported sections still
show 'backend' and 'listen'.
It's been 3.5 years and 4 releases (1.6, 1.7, 1.8 and 1.9), I guess
this can be removed from the documentation of future versions.
If logs were emitted before creating the threads, then the dataptr pointer
keeps a copy of the end of the log header. Then after the threads are
created, the headers are reallocated for each thread. However the end
pointer was not reset until the end of the first second, which may result
in logs emitted by multiple threads during the first second to be mangled,
or possibly in some cases to use a memory area that was reused for something
else. The fix simply consists in reinitializing the end pointers immediately
when the threads are created.
This fix must be backported to 1.9 and 1.8.
The server warmup task is used when a server uses the "slowstart"
parameter. This task affects the server's weight and maxconn, and may
dequeue pending connections from the queue. This must be done under
the server's lock, which was not the case.
This must be backported to 1.9 and 1.8.
It happens that the retries stats use their own counter and are not
derived from the stream interface, so we need to update it as well
when performing an L7 retry.
No backport is needed.
Add a way to retry requests if we got a junk response from the server, ie
an incomplete response, or something that is not valid HTTP.
To do so, one can use the new "junk-response" keyword for retry-on.
Add a new keyword for retry-on, 0rtt-rejected. If set, we will try to
replay requests for which we sent early data that got rejected by the
server.
If that option is set, we will attempt to use 0rtt if "allow-0rtt" is set
on the server line even if the client didn't send early data.
When running in HTX mode, if we sent the request, but failed to get the
answer, either because the server just closed its socket, we hit a server
timeout, or we get a 404, 408, 425, 500, 501, 502, 503 or 504 error,
attempt to retry the request, exactly as if we just failed to connect to
the server.
To do so, add a new backend keyword, "retry-on".
It accepts a list of keywords, which can be "none" (never retry),
"conn-failure" (we failed to connect, or to do the SSL handshake),
"empty-response" (the server closed the connection without answering),
"response-timeout" (we timed out while waiting for the server response),
or "404", "408", "425", "500", "501", "502", "503" and "504".
The default is "conn-failure".
In sess_update_st_con_tcp(), if we have an error on the stream_interface
because we tried to send early_data but failed, don't flag the request
channel as CF_WRITE_ERROR, or we will never reach the analyser that sends
back the 425 response.
This should be backported to 1.9.
Currently the thread array is a local variable inside a function block
and there is no access to it from outside, which often complicates
debugging. Let's make it global and export it. Also the allocation
return is now checked.
It's still obscure how we managed to initialize an array of integers
with values always equal to the index, just to retrieve the value
from an opaque pointer to the index instead of directly using it! I
suspect it's a leftover from the very early threading experiments.
This commit gets rid of this and simply passes the thread ID as the
argument to run_thread_poll_loop(), thus significantly simplifying the
few call places and removing the need to allocate then free an array
of identity.
When we initially experimented with threads and processes support, we
needed to implement arrays of threads per process for cpu-map, but this
is not needed anymore since we support either threads or processes.
Let's simply make the thread-based cpu-map per thread and not per
thread and per process since that's not used anymore. Doing so reduces
the global struct from 33kB to 1.5kB.
When for some reason the session is not the owner of the connection anymore,
make sure we remove CO_FL_SESS_IDLE, even if we're about to call
conn->mux->destroy(), as the destroy may not destroy the connection
immediately if it's still in use.
This should be backported to 1.9.
u
The allocated regex is not freed properly and can cause a memory leak,
eg. when patterns are updated via CLI socket.
This patch should be backported to all supported versions.
When using the "use_backend" configuration directive, the configuration
file name stored as rule->file was not freed in some situations. This
was introduced in commit 4ed1c95 ("MINOR: http/conf: store the
use_backend configuration file and line for logs").
This patch should be backported to 1.9, 1.8 and 1.7.
In ha_ssl_write() and ha_ssl_read(), don't pretend we can retry a read/write
if we got a shutr/shutw, or we will never properly shutdown the connection.
When copying the settings for all servers when using server templates,
fix a typo, or we would never copy the length of the ALPN to be used for
checks.
This should be backported to 1.9.
This patch only removes a useless calculation on listener->maxaccept when nbproc
is set to 1. Indeed, the following formula has no effet in such case:
listener->maxaccept = (listener->maxaccept + nbproc - 1) / nbproc;
This patch may be backported as far as 1.5.
Only -1 and positive integers from 0 to INT_MAX are accepted. An error is
triggered during the config parsing for any other values.
This patch may be backported to all supported versions.
There is a bug when global.tune.maxaccept is set to -1 (no limit). It is pretty
visible with one process (nbproc sets to 1). The functions listener_accept() and
accept_queue_process() don't expect to handle negative maxaccept values. So
instead of accepting incoming connections without any limit, none are never
accepted and HAProxy loop infinitly in the scheduler.
When there are 2 or more processes, the bug is a bit more subtile. The limit for
a listener is set to 1. So only one connection is accepted at a time by a given
listener. This happens because the listener's maxaccept value is an unsigned
integer. In check_config_validity(), it is first set to UINT_MAX (-1 casted in
an unsigned integer), and then some calculations on it leads to an integer
overflow.
To fix the bug, the listener's maxaccept value is now a signed integer. So, if a
negative value is set for global.tune.maxaccept, we keep it untouched for the
listener and no calculation is made on it. Then, in the listener code, this
signed value is casted to a unsigned one. It simplifies all tests instead of
dealing with negative values. So, it limits the number of connections accepted
at a time to UINT_MAX at most. But, honestly, it not an issue.
This patch must be backported to 1.9 and 1.8.
It's not logical to report context switch rates per thread in show activity
because everything else is a counter and it's not even possible to compare
values. Let's only report counts. Further, this simplifies the scheduler's
code.
A previous commit 8d85aa44d ("BUG/MAJOR: map: fix segfault during
'show map/acl' on cli.") was provided to address a concurrency issue
between "show acl" and "clear acl" on the CLI. Sadly the code placed
there was copy-pasted without changing the element type (which was
struct stream in the original code) and not tested since the crash
is still present.
The reproducer is simple : load a large ACL file (e.g. geolocation
addresses), issue "show acl #0" in loops in one window and issue a
"clear acl #0" in the other one, haproxy crashes.
This fix was also tested with threads enabled and looks good since
the locking seems to work correctly in these areas though. It will
have to be backported as far as 1.6 since the commit above went
that far as well...
This patch implements the sampling and load-balancing of log servers configured
with "sample" new keyword implemented by this commit:
'MINOR: log: Add "sample" new keyword to "log" lines'.
As the list of ranges used to sample the log to balance is ordered, we only
have to maintain ->curr_idx member of smp_info struct which is the index of
the sample and check if it belongs or not to the current range to decide if we
must send it to the log server or not.
This patch implements the parsing of "sample" new optional keyword for "log" lines
to be able to sample and balance the load of log messages between serveral log
destinations declared by "log" lines. This keyword must be followed by a list of
comma seperated ranges of indexes numbered from 1 to define the samples to be used
to balance the load of logs to send. This "sample" keyword must be used on "log" lines
obviously before the remaining optional ones without keyword. The list of ranges
must be followed by a colon character to separate it from the log sampling size.
With such following configuration declarations:
log stderr local0
log 127.0.0.1:10001 sample 2-3,8-11:11 local0
log 127.0.0.2:10002 sample 5:5 local0
in addition to being sent to stderr, about the second "log" line, every 11 logs
the logs #2 up to #3 would be sent to 127.0.0.1:10001, then #8 up tp #11 four
logs would be sent to the same log server and so on periodically. Logs would be
sent to 127.0.0.2:100002 every 5 logs.
It is also possible to define the size of the sample with a value different of
the maximum of the high limits of the ranges, for instance as follows:
log 127.0.0.1:10001 sample 2-3,8-11:15 local0
as before the two logs #2 and #3 would be sent to 127.0.0.1:10001, then #8
up tp #11 logs, but in this case here, this would be done periodically every 15
messages.
Also note that the ranges must not overlap each others. This is to ease the
way the logs are periodically sent.
This simplifies the API and hide the details in the sample. This way, only
string and binary are aware of these info, because other types cannot be
partially encoded.
This patch may be backported to 1.9 and 1.8.
Fragmented arg will do fetch at every encode time, each fetch may get
different result if SMP_F_MAY_CHANGE, for example res.payload, but
the length already encoded in first fragment of the frame, that will
cause SPOA decode failed and waste resources.
This patch must be backported to 1.9 and 1.8.
The function stream_inc_be_http_req_ctr() is called at the beginning of the
analysers AN_REQ_HTTP_PROCESS_FE/BE. It as an effect only on the backend. But we
must be careful to call it only once. If the processing of HTTP rules is
interrupted in the middle, when the analyser is resumed, we must not call it
again. Otherwise, the tracked counters of the backend are incremented several
times.
This bug was reported in github. See issue #74.
This fix should be backported as far as 1.6.
In h2c_decode_headers(), now that we support CONTINUATION frames, we
try to defragment all pending frames at once before processing them.
However if the first is exactly full and the second cannot be parsed,
we don't detect the problem and we wait for the next part forever due
to an incorrect check on exit; we must abort the processing as soon as
the current frame remains full after defragmentation as in this case
there is no way to make forward progress.
Thanks to Yves Lafon for providing traces exhibiting the problem.
This must be backported to 1.9.
On some occasions we've had loops happening when processing actions
(e.g. a yield not being well understood) resulting in analysers being
called in loops until the analysis timeout without incrementing the
stream's call count, thus this type of bug cannot be caught by the
current protection system.
What this patch proposes is to start to measure the time spent in analysers
when profiling is enabled on the thread, in order to detect if a stream is
really misbehaving. In this case we measured the consumed CPU time, not the
wall clock time, so as not to be affected by possible noisy neighbours
sharing the same CPU. When more than 100ms are spent in an analyser, we
trigger the stream_dump_and_crash() function to report the anomaly.
The choice of 100ms comes from the fact that regular calls only take around
1 microsecond and it seems reasonable to accept a degradation factor of
100000, which covers very slow machines such as home gateways running on
sub-ghz processors, with extremely heavy configurations. Some complete
tests show that even this common bogus map_regm() entry supposedly designed
to extract a port from an IP:port entry does not trigger the timeout (25 ms
evaluation time for a 4kB header, exercise left to the reader to spot the
mistake) :
([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}):([0-9]{0,5}) \5
However this one purposely designed to kill haproxy definitely dies as it
manages to completely freeze the whole process for more than one second
on a 4 GHz CPU for only 120 bytes in :
(.{0,20})(.{0,20})(.{0,20})(.{0,20})(.{0,20})b \1
This protection will definitely help during the code stabilization period
and may possibly be left enabled later depending on reported issues or not.
If you've noticed that your workload is affected by this patch, please
report it as you have very likely found a bug. And in the mean time you
can turn profiling off to disable it.
If a stream is caught spinning over itself at more than 100000 loops per
second and for more than one second, the process will be aborted and the
offender reported on the console and logs. Typical figures usually are just
a few tens to hundreds per second over a very short time so there is a huge
margin here. Using even higher values could also work but there is the risk
of not being able to catch offenders if multiple ones start to bug at the
same time and share the load. This code should ideally be disabled for
stable releases, though in theory nothing should ever trigger it.
If an appctx is caught spinning over itself at more than 100000 loops per
second and for more than one second, the process will be aborted and the
offender reported on the console and logs. Typical figures usually are just
a few tens to hundreds per second over a very short time so there is a huge
margin here. Using even higher values could also work but there is the risk
of not being able to catch offenders if multiple ones start to bug at the
same time and share the load. This code should ideally be disabled for
stable releases, though in theory nothing should ever trigger it.
During 1.9 development (and even a bit after) we've started to face a
significant number of situations where streams were abusively spinning
due to an uncaught error flag or complex conditions that couldn't be
correctly identified. Sometimes streams wake appctx up and conversely
as well. More importantly when this happens the only fix is to restart.
This patch adds a new function to report a serious error, some relevant
info and to crash the process using abort() so that a core dump is
available. The purpose will be for this function to be called in various
situations where the process is unfixable. It will help detect these
issues much earlier during development and may even help fixing test
platforms which are able to automatically restart when such a condition
happens, though this is not the primary purpose.
This patch only provides the function and doesn't use it yet.
The stream's call rate measurement was added by commit 2e9c1d296 ("MINOR:
stream: measure and report a stream's call rate in "show sess"") but it
forgot to reset it in case of HTTP keep-alive (legacy mode), resulting
in incorrect measurements.
No backport is needed, unless the patch above is backported.