This patch simply brings HAProxy internal regex system to the Lua API.
Lua doesn't embed regexes, now it inherits from the regexes compiled
with haproxy.
Allow to register a function which will be called after the
configuration file parsing, at the end of the check_config_validity().
It's useful fo checking dependencies between sections or for resolving
keywords, pointers or values.
This commit implements a post section callback. This callback will be
used at the end of a section parsing.
Every call to cfg_register_section must be modified to use the new
prototype:
int cfg_register_section(char *section_name,
int (*section_parser)(const char *, int, char **, int),
int (*post_section_parser)());
Calls to build_logline() are audited in order to use dynamic trash buffers
allocated by alloc_trash_chunk() instead of global trash buffers.
This is similar to commits 07a0fec ("BUG/MEDIUM: http: Prevent
replace-header from overwriting a buffer") and 0d94576 ("BUG/MEDIUM: http:
prevent redirect from overwriting a buffer").
This patch should be backported in 1.7, 1.6 and 1.5. It relies on commit
b686afd ("MINOR: chunks: implement a simple dynamic allocator for trash
buffers") for the trash allocator, which has to be backported as well.
When switching the check code to a non-permanent connection, the new code
forgot to free the connection if an error happened and was returned by
connect_conn_chk(), leading to the check never be ran again.
Now when ssl_sock_{to,from}_buf are called, if the connection doesn't
feature CO_FL_WILL_UPDATE, they will first retrieve the updated flags
using conn_refresh_polling_flags() before changing any flag, then call
conn_cond_update_sock_polling() before leaving, to commit such changes.
Now when raw_sock_{to,from}_{pipe,buf} are called, if the connection
doesn't feature CO_FL_WILL_UPDATE, they will first retrieve the updated
flags using conn_refresh_polling_flags() before changing any flag, then
call conn_cond_update_sock_polling() before leaving, to commit such
changes. Note that the only real call to one of the __conn_* functions
is in fact in conn_sock_read0() which is called from here.
In transport-layer functions (snd_buf/rcv_buf), it's very problematic
never to know if polling changes made to the connection will be propagated
or not. This has led to some conn_cond_update_polling() calls being placed
at a few places to cover both the cases where the function is called from
the upper layer and when it's called from the lower layer. With the arrival
of the MUX, this becomes even more complicated, as the upper layer will not
have to manipulate anything from the connection layer directly and will not
have to push such updates directly either. But the snd_buf functions will
need to see their updates committed when called from upper layers.
The solution here is to introduce a connection flag set by the connection
handler (and possibly any other similar place) indicating that the caller
is committed to applying such changes on return. This way, the called
functions will be able to apply such changes by themselves before leaving
when the flag is not set, and the upper layer will not have to care about
that anymore.
SSL records are 16kB max. When trying to send larger data chunks at once,
SSL_read() only processes 16kB and ssl_sock_from_buf() believes it means
the system buffers are full, which is not the case, contrary to raw_sock.
This is particularly noticeable with HTTP/2 when using a 64kB buffer with
multiple streams, as the mux buffer can start to fill up pretty quickly
in this situation, slowing down the data delivery.
We've been keep this test for a connection being established since 1.5-dev14
when the stream-interface was still accessing the FD directly. The test on
CO_FL_HANDSHAKE and L{4,6}_CONN is totally useless here, and can even be
counter-productive on pure TCP where it could prevent a request from being
sent on a connection still attempting to complete its establishment. And it
creates an abnormal dependency between the layers that will complicate the
implementation of the mux, so let's get rid of it now.
This is based on the git SHA1 implementation and optimized to do word
accesses rather than byte accesses, and to avoid unnecessary copies into
the context array.
in 32af203b75 ("REORG: cli: move ssl CLI functions to ssl_sock.c")
"set ssl tls-key" was accidentally replaced with "set ssl tls-keys"
(keys instead of key). This is undocumented and breaks upgrades from
1.6 to 1.7.
This patch restores "set ssl tls-key" and also registers a helptext.
This should be backported to 1.7.
Commit 872085ce "BUG/MINOR: ssl: ocsp response with 'revoked' status is correct"
introduce a regression. OCSP_single_get0_status can return -1 and haproxy must
generate an error in this case.
Thanks to Sander Hoentjen who have spotted the regression.
This patch should be backported in 1.7, 1.6 and 1.5 if the patch above is
backported.
BoringSSL switch OPENSSL_VERSION_NUMBER to 1.1.0 for compatibility.
Fix BoringSSL call and openssl-compat.h/#define occordingly.
This will not break openssl/libressl compat.
On x86_64, when gcc instruments functions and compiles at -O0, it saves
the function's return value in register rbx before calling the trace
callback. It provides a nice opportunity to display certain useful
values (flags, booleans etc) during trace sessions. It's absolutely
not guaranteed that it will always work but it provides a considerable
help when it does so it's worth activating it. When building on a
different architecture, the value 0 is always reported as the return
value. On x86_64 with optimizations (-O), the RBX register will not
necessarily match and random values will be reported, but since it's
not the primary target it's not a problem.
Now any call to trace() in the code will automatically appear interleaved
with the call sequence and timestamped in the trace file. They appear with
a '#' on the 3rd argument (caller's pointer) in order to make them easy to
spot. If the trace functionality is not used, a dmumy weak function is used
instead so that it doesn't require to recompile every time traces are
enabled/disabled.
The trace decoder knows how to deal with these messages, detects them and
indents them similarly to the currently traced function. This can be used
to print function arguments for example.
Note that we systematically flush the log when calling trace() to ensure we
never miss important events, so this may impact performance.
The trace() function uses the same format as printf() so it should be easy
to setup during debugging sessions.
Don't forget to allocate tmptrash before using it, and free it once we're
done.
[wt: introduced by commit 64cc49cf ("MAJOR: servers: propagate server
status changes asynchronously"), no backport needed]
ocsp_status can be 'good', 'revoked', or 'unknown'. 'revoked' status
is a correct status and should not be dropped.
In case of certificate with OCSP must-stapling extension, response with
'revoked' status must be provided as well as 'good' status.
This patch can be backported in 1.7, 1.6 and 1.5.
Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.
These flags are not exactly for the data layer, they instead indicate
what is expected from the transport layer. Since we're going to split
the connection between the transport and the data layers to insert a
mux layer, it's important to have a clear idea of what each layer does.
All function conn_data_* used to manipulate these flags were renamed to
conn_xprt_*.
The HTTP/2->HTTP/1 gateway will need to process HTTP/1 responses. We
cannot sanely rely on the HTTP/1 txn to parse a response because :
1) responses generated by haproxy such as error messages, redirects,
stats or Lua are neither parsed nor indexed ; this could be
addressed over the long term but will take time.
2) the http txn is useless to parse the body : the states present there
are only meaningful to received bytes (ie next bytes to parse) and
not at all to sent bytes. Thus chunks cannot be followed at all.
Even when implementing this later, it's unsure whether it will be
possible when dealing with compression.
So using the HTTP txn is now out of the equation and the only remaining
solution is to call an HTTP/1 message parser. We already have one, it was
slightly modified to avoid keeping states by benefitting from the fact
that the response was produced by haproxy and this is entirely available.
It assumes the following rules are true, or that incuring an extra cost
to work around them is acceptable :
- the response buffer is read-write and supports modifications in place
- headers sent through / by haproxy are not folded. Folding is still
implemented by replacing CR/LF/tabs/spaces with spaces if encountered
- HTTP/0.9 responses are never sent by haproxy and have never been
supported at all
- haproxy will not send partial responses, the whole headers block will
be sent at once ; this means that we don't need to keep expensive
states and can afford to restart the parsing from the beginning when
facing a partial response ;
- response is contiguous (does not wrap). This was already the case
with the original parser and ensures we can safely dereference all
fields with (ptr,len)
The parser replaces all of the http_msg fields that were necessary with
local variables. The parser is not called on an http_msg but on a string
with a start and an end. The HTTP/1 states were reused for ease of use,
though the request-specific ones have not been implemented for now. The
error position and error state are supported and optional ; these ones
may be used later for bug hunting.
The parser issues the list of all the headers into a caller-allocated
array of struct ist.
The content-length/transfer-encoding header are checked and the relevant
info fed the h1 message state (flags + body_len).
The chunk crlf parser used to depend on the channel and on the HTTP
message, eventhough it's not really needed. Let's remove this dependency
so that it can be used within the H2 to H1 gateway.
As part of this small API change, it was renamed to h1_skip_chunk_crlf()
to mention that it doesn't depend on http_msg anymore.
The chunk parser used to depend on the channel and on the HTTP message
but it's not really needed as they're only used to retrieve the buffer
as well as to return the number of bytes parsed and the chunk size.
Here instead we pass the (few) relevant information in arguments so that
the function may be reused without a channel nor an HTTP message (ie
from the H2 to H1 gateway).
As part of this API change, it was renamed to h1_parse_chunk_size() to
mention that it doesn't depend on http_msg anymore.
Functions http_parse_chunk_size(), http_skip_chunk_crlf() and
http_forward_trailers() were moved to h1.h and h1.c respectively so
that they can be called from outside. The parts that were inline
remained inline as it's critical for performance (+41% perf
difference reported in an earlier test). For now the "http_" prefix
remains in their name since they still depend on the http_msg type.
Certain types and enums are very specific to the HTTP/1 parser, and we'll
need to share them with the HTTP/2 to HTTP/1 translation code. Let's move
them to h1.c/h1.h. Those with very few occurrences or only used locally
were renamed to explicitly mention the relevant HTTP version :
enum ht_state -> h1_state.
http_msg_state_str -> h1_msg_state_str
HTTP_FLG_* -> H1_FLG_*
http_char_classes -> h1_char_classes
Others like HTTP_IS_*, HTTP_MSG_* are left to be done later.
Fix regression introduced by commit:
'MAJOR: servers: propagate server status changes asynchronously.'
The building of the log line was re-worked to be done at the
postponed point without lack of data.
[wt: this only affects 1.8-dev, no backport needed]
There's no point having the channel marked writable as these functions
only extract data from the channel. The code was retrieved from their
ci/co ancestors.
For HTTP/2 we'll need some buffer-only equivalent functions to some of
the ones applying to channels and still squatting the bi_* / bo_*
namespace. Since these names have kept being misleading for quite some
time now and are really getting annoying, it's time to rename them. This
commit will use "ci/co" as the prefix (for "channel in", "channel out")
instead of "bi/bo". The following ones were renamed :
bi_getblk_nc, bi_getline_nc, bi_putblk, bi_putchr,
bo_getblk, bo_getblk_nc, bo_getline, bo_getline_nc, bo_inject,
bi_putchk, bi_putstr, bo_getchr, bo_skip, bi_swpbuf
Since commit 'MAJOR: task: task scheduler rework'
0194897e54. LUA's
scheduling tasks are freezing.
A running task should not handle the scheduling itself
but let the task scheduler to handle it based on the
'expire' field.
[wt: no backport needed]
Clear MaxSslRate, SslFrontendMaxKeyRate and SslBackendMaxKeyRate when
clear counters is used, it was probably forgotten when those counters were
added.
[wt: this can probably be backported as far as 1.5 in dumpstats.c]
When the server weight is rised using the CLI, extra nodes have to be
allocated, or the weight will be effectively the same as the original one.
[wt: given that the doc made no explicit mention about this limitation,
this patch could even be backported as it fixes an unexpected behaviour]
Since around 1.5-dev12, we've been setting MSG_MORE on send() on various
conditions, including the fact that SHUTW_NOW is present, but we don't
check that it's accompanied with AUTO_CLOSE. The result is that on requests
immediately followed by a close (where AUTO_CLOSE is not set), the request
gets delayed in the TCP stack before being sent to the server. This is
visible with the H2 code where the end-of-stream flag is set on requests,
but probably happens when a POLL_HUP is detected along with the request.
The (lack of) presence of option abortonclose has no effect here since we
never send the SHUTW along with the request.
This fix can be backported to 1.7, 1.6 and 1.5.
The hour part of the timezone offset was multiplied by 60 instead of
3600, resulting in an inaccurate expiry. This bug was introduced in
1.6-dev1 by commit 4f3c87a ("BUG/MEDIUM: ssl: Fix to not serve expired
OCSP responses."), so this fix must be backported into 1.7 and 1.6.
In order to prepare multi-thread development, code was re-worked
to propagate changes asynchronoulsy.
Servers with pending status changes are registered in a list
and this one is processed and emptied only once 'run poll' loop.
Operational status changes are performed before administrative
status changes.
In a case of multiple operational status change or admin status
change in the same 'run poll' loop iteration, those changes are
merged to reach only the targeted status.
When using haproxy in front of distccd, it's possible to provide significant
improvements by only connecting when the preprocessing is completed, and by
selecting different farms depending on the payload size. This patch provides
two new sample fetch functions :
distcc_param(<token>[,<occ>]) : integer
distcc_body(<token>[,<occ>]) : binary
srv_queue([<backend>/]<server>) : integer
Returns an integer value corresponding to the number of connections currently
pending in the designated server's queue. If <backend> is omitted, then the
server is looked up in the current backend. It can sometimes be used together
with the "use-server" directive to force to use a known faster server when it
is not much loaded. See also the "srv_conn", "avg_queue" and "queue" sample
fetch methods.
Commit bcb86ab ("MINOR: session: add a streams field to the session
struct") added this list of streams that is not needed anymore. Let's
get rid of it now.
The warning appears when building with 51Degrees release that uses a new
Hash Trie algorithm (release version 3.2.12.12):
src/51d.c: In function init_51degrees:
src/51d.c:566:2: warning: enumeration value DATA_SET_INIT_STATUS_TOO_MANY_OPEN_FILES not handled in switch [-Wswitch]
switch (_51d_dataset_status) {
^
This patch can be backported in 1.7.
When
1) HAProxy configured to enable splice on both directions
2) After some high load, there are 2 input channels with their socket buffer
being non-empty and pipe being full at the same time, sitting in `fd_cache`
without any other fds.
The 2 channels will repeatedly be stopped for receiving (pipe full) and waken
for receiving (data in socket), thus getting out and in of `fd_cache`, making
their fd swapping location in `fd_cache`.
There is a `if (entry < fd_cache_num && fd_cache[entry] != fd) continue;`
statement in `fd_process_cached_events` to prevent frequent polling, but since
the only 2 fds are constantly swapping location, `fd_cache[entry] != fd` will
always hold true, thus HAProxy can't make any progress.
The root cause of the issue is dual :
- there is a single fd_cache, for next events and for the ones being
processed, while using two distinct arrays would avoid the problem.
- the write side of the stream interface wakes the read side up even
when it couldn't write, and this one really is a bug.
Due to CF_WRITE_PARTIAL not being cleared during fast forwarding, a failed
send() attempt will still cause ->chk_rcv() to be called on the other side,
re-creating an entry for its connection fd in the cache, causing the same
sequence to be repeated indefinitely without any opportunity to make progress.
CF_WRITE_PARTIAL used to be used for what is present in these tests : check
if a recent write operation was performed. It's part of the CF_WRITE_ACTIVITY
set and is tested to check if timeouts need to be updated. It's also used to
detect if a failed connect() may be retried.
What this patch does is use CF_WROTE_DATA() to check for a successful write
for connection retransmits, and to clear CF_WRITE_PARTIAL before preparing
to send in stream_int_notify(). This way, timeouts are still updated each
time a write succeeds, but chk_rcv() won't be called anymore after a failed
write.
It seems the fix is required all the way down to 1.5.
Without this patch, the only workaround at this point is to disable splicing
in at least one direction. Strictly speaking, splicing is not absolutely
required, as regular forwarding could theorically cause the issue to happen
if the timing is appropriate, but in practice it appears impossible to
reproduce it without splicing, and even with splicing it may vary.
The following config manages to reproduce it after a few attempts (haproxy
going 100% CPU and having to be killed) :
global
maxpipes 50000
maxconn 10000
listen srv1
option splice-request
option splice-response
bind :8001
server s1 127.0.0.1:8002
server$ tcploop 8002 L N20 A R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000
client$ tcploop 8001 N20 C T S1000000 R10 J
url_dec sample converter uses url_decode function to decode an URL. This
function fails by returning -1 when an invalid character is found. But the
sample converter never checked the return value and it used it as length for the
decoded string. Because it always succeeded, the invalid sample (with a string
length set to -1) could be used by other sample fetches or sample converters,
leading to undefined behavior like segfault.
The fix is pretty simple, url_dec sample converter just needs to return an error
when url_decode fails.
This patch must be backported in 1.7 and 1.6.
I misplaced the "if (!fdt.owner)" test so it can occasionally crash
when dumping an fd that's already been closed but still appears in
the table. It's not critical since this was not pushed into any
release nor backported though.
Health check currently cheat, they allocate a connection upon startup and never
release it, it's only recycled. The problem with doing this is that this code
is preventing the connection code from evolving towards multiplexing.
This code ensures that it's safe for the checks to run without a connection
all the time. Given that the code heavily relies on CO_FL_ERROR to signal
check errors, it is not trivial but in practice this is the principle adopted
here :
- the connection is not allocated anymore on startup
- new checks are not supposed to have a connection, so an attempt is made
to allocate this connection in the check task's context. If it fails,
the check is aborted on a resource error, and the rare code on this path
verifying the connection was adjusted to check for its existence (in
practice, avoid to close it)
- returning checks necessarily have a valid connection (which may possibly
be closed).
- a "tcp-check connect" rule tries to allocate a new connection before
releasing the previous one (but after closing it), so that if it fails,
it still keeps the previous connection in a closed state. This ensures
a connection is always valid here
Now it works well on all tested cases (regular and TCP checks, even with
multiple reconnections), including when the connection is forced to NULL or
randomly allocated.
The tcp-checks are very fragile. They can modify a connection's FD by
closing and reopening a socket without informing the connection layer,
which may then possibly touch the wrong fd. Given that the events are
only cleared and that the fd is just created, there should be no visible
side effect because the old fd is deleted so even if its flags get cleared
they were already, and the new fd already has them cleared as well so it's
a NOP. Regardless, this is too fragile and will not resist to threads.
In order to address this situation, this patch makes tcpcheck_main()
indicate if it closed a connection and report it to wake_srv_chk(), which
will then report it to the connection's fd handler so that it refrains
from updating the connection polling and the fd. Instead the connection
polling status is updated in the wake() function.
When tcp-checks are in use, a connection starts to be created, then it's
destroyed so that tcp-check can recreate its own. Now we directly move
to tcpcheck_main() when it's detected that tcp-check is in use.
The following config :
backend tcp9000
option tcp-check
tcp-check comment "this is a comment"
tcp-check connect port 10000
server srv 127.0.0.1:9000 check inter 1s
will result in a connection being first made to port 9000 then immediately
destroyed and re-created on port 10000, because the first rule is a comment
and doesn't match the test for the first rule being a connect(). It's
mostly harmless (unless the server really must not receive empty
connections) and the workaround simply consists in removing the comment.
Let's proceed like in other places where we simply skip leading comments.
A new function was made to make this lookup les boring. The fix should be
backported to 1.7 and 1.6.
Since this connection is not used at all anymore, do not allocate it.
It was verified that check successes and failures (both synchronous
and asynchronous) continue to be properly reported.
Upon fork() error, a first report is immediately made by connect_proc_chk()
via set_server_check_status(), then process_chk_proc() detects the error
code and makes up a dummy connection error to call chk_report_conn_err(),
which tries to retrieve the errno code from the connection, fails, then
saves the status message from the check, fails all "if" tests on its path
related to the connection then resets the check's state to the current one
with the current status message. All this useless chain is the only reason
why process checks require a connection! Let's simply get rid of this second
useless call.
The external process check code abused a little bit from copy-pasting to
the point of making think it requires a connection... The initialization
code only returns SF_ERR_NONE and SF_ERR_RESOURCE, so the other one can
be folded there. The code now only uses the connection to report the
error status.
Amazingly, this function takes a connection to report an error and is used
by process checks, placing a hard dependency between the connection and the
check preventing the mux from being completely implemented. Let's first get
rid of this.
A config containing "stats socket /path/to/socket mode admin" used to
silently start and be unusable (mode 0, level user) because the "mode"
parser doesn't take care of non-digits. Now it properly reports :
[ALERT] 276/144303 (7019) : parsing [ext-check.cfg:4] : 'stats socket' : ''mode' : missing or invalid mode 'admin' (octal integer expected)'
This can probably be backported to 1.7, 1.6 and 1.5, though reporting
parsing errors in very old versions probably isn't a good idea if the
feature was left unused for years.
This function can destroy a socket and create a new one, resulting in a
change of FD on the connection between recv() and send() for example,
which is absolutely not permitted, and can result in various funny games
like polling not being properly updated (or with the flags from a previous
fd) etc.
Let's only call this from the wake() callback which is more tolerant.
Ideally the operations should be made even more reliable by returning
a specific value to indicate that the connection was released and that
another one was created. But this is hasardous for stable releases as
it may reveal other issues.
This fix should be backported to 1.7 and 1.6.
In the rare case where the "tcp-check send" directive is the last one in
the list, it leaves the loop without sending the data. Fortunately, the
polling is still enabled on output, resulting in the connection handler
calling back to send what remains, but this is ugly and not very reliable.
This may be backported to 1.7 and 1.6.
While porting the connection to use the mux layer, it appeared that
tcp-checks wouldn't receive anymore because the polling is not enabled
before attempting to call xprt->rcv_buf() nor xprt->snd_buf(), and it
is illegal to call these functions with polling disabled as they
directly manipulate the FD state, resulting in an inconsistency where
the FD is enabled and the connection's polling flags disabled.
Till now it happened to work only because when recv() fails on EAGAIN
it calls fd_cant_recv() which enables polling while signaling the
failure, so that next time the message is received. But the connection's
polling is never enabled, and any tiny change resulting in a call to
conn_data_update_polling() immediately disables reading again.
It's likely that this problem already happens on some corner cases
such as multi-packet responses. It definitely breaks as soon as the
response buffer is full but we don't support consuming more than one
response buffer.
This fix should be backported to 1.7 and 1.6. In order to check for the
proper behaviour, this tcp-check must work and clearly show an SSH
banner in recvfrom() as observed under strace, otherwise it's broken :
tcp-check connect port 22
tcp-check expect rstring SSH
tcp-check send blah
This used to be needed to know whether there was a check in progress a
long time ago (before tcp_checks) but this is not true anymore and even
becomes wrong after the check is reused as conn_init() initializes it
to DEAD_FD_MAGIC.
A regression has been introduced in commit
00005ce5a1: the port being changed is the
one from 'cli_conn->addr.from' instead of 'cli_conn->addr.to'.
This patch fixes the regression.
Backport status: should be backported to HAProxy 1.7 and above.
Leaving the maintenance state and if the server remains in stopping mode due
to a tracked one:
- We mistakenly try to grab some pending conns and shutdown backup sessions.
- The proxy down time and last change were also mistakenly updated
This patch adds the ability to read from a wrapping memory area (ie:
buffers). The new functions are called "readv_<type>". The original
ones were renamed to start with "read_" to make the difference more
obvious between the read method and the returned type.
It's worth noting that the memory barrier in readv_bytes() is critical,
as otherwise gcc decides that it doesn't need the resulting data, but
even worse, removes the length checks in readv_u64() and happily
performs an out-of-bounds unaligned read using read_u64()! Such
"optimizations" are a bit borderline, especially when they impact
security like this...
snr_resolution_cb can be called with <nameserver> parameter set to NULL. So we
must check it before using it. This is done most of time, except when we deal
with invalid DNS response.
The check was totally messed up. In the worse case, it led to a crash, when
res.comp_algo sample fetch was retrieved on uncompressed response (with the
compression enabled).
This patch must be backported in 1.7.
There are several places where we see feconn++, feconn--, totalconn++ and
an increment on the frontend's number of connections and connection rate.
This is done exactly once per session in each direction, so better take
care of this counter in the session and simplify the callers. At least it
ensures a better symmetry. It also ensures consistency as till now the
lua/spoe/peers frontend didn't have these counters properly set, which can
be useful at least for troubleshooting.
session_accept_fd() may either successfully complete a session creation,
or defer it to conn_complete_session() depending of whether a handshake
remains to be performed or not. The problem is that all the code after
the handshake was duplicated between the two functions.
This patch make session_accept_fd() synchronously call
conn_complete_session() to finish the session creation. It is only needed
to check if the session's task has to be released or not at the end, which
is fairly minimal. This way there is now a single place where the sessions
are created.
Commit 8e3c6ce ("MEDIUM: connection: get rid of data->init() which was
not for data") simplified conn_complete_session() but introduced a
confusing check which cannot happen on CO_FL_HANDSHAKE. Make it clear
that this call is final and will either succeed and complete the
session or fail.
Instead of duplicating some sensitive listener-specific code in the
session and in the stream code, let's call listener_release() when
releasing a connection attached to a listener.
Each user of a session increments/decrements the jobs variable at its
own place, resulting in a real mess and inconsistencies between them.
Let's have session_new() increment jobs and session_free() decrement
it.
Some places call delete_listener() then decrement the number of
listeners and jobs. At least one other place calls delete_listener()
without doing so, but since it's in deinit(), it's harmless and cannot
risk to cause zombie processes to survive. Given that the number of
listeners and jobs is incremented when creating the listeners, it's
much more logical to symmetrically decrement them when deleting such
listeners.
This function is used to create a series of listeners for a specific
address and a port range. It automatically calls the matching protocol
handlers to add them to the relevant lists. This way cfgparse doesn't
need to manipulate listeners anymore. As an added bonus, the memory
allocation is checked.
Since everything is self contained in proto_uxst.c there's no need to
export anything. The same should be done for proto_tcp.c but the file
contains other stuff that's not related to the TCP protocol itself
and which should first be moved somewhere else.
cfgparse has no business directly calling each individual protocol's 'add'
function to create a listener. Now that they're all registered, better
perform a protocol lookup on the family and have a standard ->add method
for all of them.
It's a shame that cfgparse() has to make special cases of each protocol
just to cast the port to the target address family. Let's pass the port
in argument to the function. The unix listener simply ignores it.
It's pointless to read it on each and every accept(), as we only need
it for reporting in debugging mode a few lines later. Let's move this
part to the relevant block.
Since v1.7 it's pointless to reference a listener when greating a session
for an outgoing connection, it only complicates the code. SPOE and Lua were
cleaned up in 1.8-dev1 but the peers code was forgotten. This patch fixes
this by not assigning such a listener for outgoing connections. It also has
the extra benefit of not discounting the outgoing connections from the number
of allowed incoming connections (the code currently adds a safety marging of
3 extra connections to take care of this).
Adds cli commands to change at runtime whether informational messages
are prepended with severity level or not, with support for numeric and
worded severity in line with syslog severity level.
Adds stats socket config keyword severity-output to set default behavior
per socket on startup.
These notification management function and structs are generic and
it will be better to move in common parts.
The notification management functions and structs have names
containing some "lua" references because it was written for
the Lua. This patch removes also these references.
When we try to access to other proxy context, we must check
its existence because haproxy can kill it between the creation
and the usage.
This patch should be backported in 1.6 and 1.7
A previous fix was made to prevent the connection to a server if a redirect was
performed during the request processing when we wait to keep the client
connection alive. This fix introduced a pernicious bug. If a client closes its
connection immediately after sending a request, it is possible to keep stream
alive infinitely. This happens when the connection closure is caught when the
request is received, before the request parsing.
To be more specific, this happens because the close event is not "forwarded",
first because of the call to "channel_dont_connect" in the function
"http_apply_redirect_rule", then because we want to keep the client connection
alive, we explicitly call "channel_dont_close" in the function
"http_request_forward_body".
So, to fix the bug, instead of blocking the server connection, we force its
shutdown. This will force the stream to re-evaluate all connexions states. So it
will detect the client has closed its connection.
This patch must be backported in 1.7.
smp_fetch_ssl_fc_cl_str as very limited usage (only work with openssl == 1.0.2
compiled with the option enable-ssl-trace). It use internal cipher.algorithm_ssl
attribut and SSL_CIPHER_standard_name (available with ssl-trace).
This patch implement this (debug) function in a standard way. It used common
SSL_CIPHER_get_name to display cipher name. It work with openssl >= 1.0.2
and boringssl.
This reverts commit 19e8aa58f7.
It causes some trouble reported by Manu :
listen tls
[...]
server bla 127.0.0.1:8080
[ALERT] 248/130258 (21960) : parsing [/etc/haproxy/test.cfg:53] : 'server bla' : no method found to resolve address '(null)'
[ALERT] 248/130258 (21960) : Failed to initialize server(s) addr.
According to Nenad :
"It's not a good way to fix the issue we were experiencing
before. It will need a bigger rewrite, because the logic in
srv_iterate_initaddr needs to be changed."
Historically the DNS was the only way of updating the server IP dynamically
and the init-addr processing and state file load required the server to have
an FQDN defined. Given that we can now update the IP through the socket as
well and also can have different init-addr values (like IP and 'none') - this
requirement needs to be removed.
This patch should be backported to 1.7.
Since commit 5be2f35 ("MAJOR: polling: centralize calls to I/O callbacks")
that came into 1.6-dev1, each poller deals with its own events and decides
to signal ability to receive or send on a file descriptor based on the
active events on the file descriptor.
The commit above was incorrectly done for the epoll code. Instead of
checking the active events on the fd, it checks for the new events. In
general these ones are the same for POLL_IN and POLL_OUT since they
are always cleared prior to being computed, but it is possible that
POLL_HUP and POLL_ERR were initially reported and are not reported
again (especially for HUP). This could happen for example if POLL_HUP
and POLL_IN were received together, the pending data exactly correspond
to a full buffer which is read at once, preventing the POLL_HUP from
being dealt with in the same call, and on the next call only POLL_OUT
is reported (eg: to emit some response or peers protocol ACKs). In this
case fd_may_recv() will not be enabled anymore and the close event will
be missed.
It seems quite hard to trigger this case, though it might explain some
of the rare missed close events that were detected in the past on the
peers.
This fix needs to be backported to 1.6 and 1.7.
The server state and weight was reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are commited to be propagated to the
LB stack.
In further dev related to multi-thread, the commit
will be handled into a sync point.
Pending values are named using the prefix 'next_'
Current values used by the LB stack are named 'cur_'
This string is used in sample fetches so it is safe to use a preallocated trash
chunk instead of a buffer dynamically allocated during HAProxy startup.
First, this variable does not need to be publicly exposed because it is only
used by stick_table functions. So we declare it as a global static in
stick_table.c file. Then, it is useless to use a pointer. Using a plain struct
variable avoids any dynamic allocation.
swap_buffer is a global variable only used by buffer_slow_realign. So it has
been moved from global.h to buffer.c and it is allocated by init_buffer
function. deinit_buffer function has been added to release it. It is also used
to destroy the buffers' pool.
During the configuration parsing, log buffers are reallocated when
global.max_syslog_len is updated. This can be done serveral time. So, instead of
doing it serveral time, we do it only once after the configuration parsing.
Now, we use init_log_buffers and deinit_log_buffers to, respectively, initialize
and deinitialize log buffers used for syslog messages.
These functions have been introduced to be used by threads, to deal with
thread-local log buffers.
Trash buffers are reallocated when "tune.bufsize" parameter is changed. Here, we
just move the realloc after the configuration parsing.
Given that the config parser doesn't rely on the trash size, it should be
harmless.
Now, we use init_trash_buffers and deinit_trash_buffers to, respectively,
initialize and deinitialize trash buffers (trash, trash_buf1 and trash_buf2).
These functions have been introduced to be used by threads, to deal with
thread-local trash buffers.
Unfortunatly, a regression bug was introduced in the commit 1486b0ab
("BUG/MEDIUM: http: Switch HTTP responses in TUNNEL mode when body length is
undefined"). HTTP responses with undefined body length are blocked until timeout
when the compression is enabled. This bug was fixed in commit 69744d92
("BUG/MEDIUM: http: Fix blocked HTTP/1.0 responses when compression is
enabled").
The bug is still the same. We do not forward response data because we are
waiting for the synchronization between the HTTP request and the response.
To fix the bug, conditions to infinitly forward channel data has been slightly
relaxed. Now, it is done if there is no more analyzer registered on the channel
or if _FLT_END analyzer is still there but without the flag CF_FLT_ANALYZE. This
last condition is only possible when a channel is waiting the end of the other
side. So, fundamentally, it means that no one is analyzing the channel
anymore. This is a transitional state during a sync phase.
This patch must be backported in 1.7.
Patch "MINOR: ssl: support ssl-min-ver and ssl-max-ver with crt-list"
introduce ssl_methods in struct ssl_bind_conf. struct bind_conf have now
ssl_methods and ssl_conf.ssl_methods (unused). It's error-prone. This patch
remove the duplicate structure to avoid any confusion.
After careful inspection, this flag is set at exactly two places :
- once in the health-check receive callback after receipt of a
response
- once in the stream interface's shutw() code where CF_SHUTW is
always set on chn->flags
The flag was checked in the checks before deciding to send data, but
when it is set, the wake() callback immediately closes the connection
so the CO_FL_SOCK_WR_SH flag is also set.
The flag was also checked in si_conn_send(), but checking the channel's
flag instead is enough and even reveals that one check involving it
could never match.
So it's time to remove this flag and replace its check with a check of
CF_SHUTW in the stream interface. This way each layer is responsible
for its shutdown, this will ease insertion of the mux layer.
This flag is both confusing and wrong. It is supposed to report the
fact that the data layer has received a shutdown, but in fact this is
reported by CO_FL_SOCK_RD_SH which is set by the transport layer after
this condition is detected. The only case where the flag above is set
is in the stream interface where CF_SHUTR is also set on the receiving
channel.
In addition, it was checked in the health checks code (while never set)
and was always test jointly with CO_FL_SOCK_RD_SH everywhere, except in
conn_data_read0_pending() which incorrectly doesn't match the second
time it's called and is fortunately protected by an extra check on
(ic->flags & CF_SHUTR).
This patch gets rid of the flag completely. Now conn_data_read0_pending()
accurately reports the fact that the transport layer has detected the end
of the stream, regardless of the fact that this state was already consumed,
and the stream interface watches ic->flags&CF_SHUTR to know if the channel
was already closed by the upper layer (which it already used to do).
The now unused conn_data_read0() function was removed.
The session may need to enforce a timeout when waiting for a handshake.
Till now we used a trick to avoid allocating a pointer, we used to set
the connection's owner to the task and set the task's context to the
session, so that it was possible to circle between all of them. The
problem is that we'll really need to pass the pointer to the session
to the upper layers during initialization and that the only place to
store it is conn->owner, which is squatted for this trick.
So this patch moves the struct task* into the session where it should
always have been and ensures conn->owner points to the session until
the data layer is properly initialized.
Historically listeners used to have a handler depending on the upper
layer. But now it's exclusively process_stream() and nothing uses it
anymore so it can safely be removed.
Currently a task is allocated in session_new() and serves two purposes :
- either the handshake is complete and it is offered to the stream via
the second arg of stream_new()
- or the handshake is not complete and it's diverted to be used as a
timeout handler for the embryonic session and repurposed once we land
into conn_complete_session()
Furthermore, the task's process() function was taken from the listener's
handler in conn_complete_session() prior to being replaced by a call to
stream_new(). This will become a serious mess with the mux.
Since it's impossible to have a stream without a task, this patch removes
the second arg from stream_new() and make this function allocate its own
task. In session_accept_fd(), we now only allocate the task if needed for
the embryonic session and delete it later.
The ->init() callback of the connection's data layer was only used to
complete the session's initialisation since sessions and streams were
split apart in 1.6. The problem is that it creates a big confusion in
the layers' roles as the session has to register a dummy data layer
when waiting for a handshake to complete, then hand it off to the
stream which will replace it.
The real need is to notify that the transport has finished initializing.
This should enable a better splitting between these layers.
This patch thus introduces a connection-specific callback called
xprt_done_cb() which informs about handshake successes or failures. With
this, data->init() can disappear, CO_FL_INIT_DATA as well, and we don't
need to register a dummy data->wake() callback to be notified of errors.
The stream interface chk_snd() code checks if the connection has already
subscribed to write events in order to avoid attempting a useless write()
which will fail. But it used to check both the CO_FL_CURR_WR_ENA and the
CO_FL_DATA_WR_ENA flags, while the former may only be present without the
latterif either the other side just disabled writing did not synchronize
yet (which is harmless) or if it's currently performing a handshake, which
is being checked by the next condition and will be better dealt with by
properly subscribing to the data events.
This code was added back in 1.5-dev20 to limit the number of useless calls
to splice() but both flags were checked at once while only CO_FL_DATA_WR_ENA
was needed. This bug seems to have no impact other than making code changes
more painful. This fix may be backported down to 1.5 though is unlikely to
be needed there.
Till now connections used to rely exclusively on file descriptors. It
was planned in the past that alternative solutions would be implemented,
leading to member "union t" presenting sock.fd only for now.
With QUIC, the connection will need to continue to exist but will not
rely on a file descriptor but a connection ID.
So this patch introduces a "connection handle" which is either a file
descriptor or a connection ID, to replace the existing "union t". We've
now removed the intermediate "struct sock" which was never used. There
is no functional change at all, though the struct connection was inflated
by 32 bits on 64-bit platforms due to alignment.
Haproxy doesn't need this anymore, we're wasting cycles checking for
a Connection header in order to add "Connection: close" only in the
1.1 case so that haproxy sees it and removes it. All tests were run
in 1.0 and 1.1, with/without the request header, and in the various
keep-alive/close modes, with/without compression, and everything works
fine. It's worth noting that this header was inherited from the stats
applet and that the same cleanup probably ought to be done there as
well.
In the HTTP applet, we have to parse the response headers provided by
the application and to produce a response. strcasecmp() is expensive,
and chunk_append() even more as it uses a format string.
Here we check the string length before calling strcasecmp(), which
results in strcasecmp() being called only on the relevant header in
practise due to very few collisions on the name lengths, effectively
dividing the number of calls by 3, and we replace chunk_appendf()
with memcpy() as we already know the string lengths.
Doing just this makes the "hello-world" applet 5% faster, reaching
41400 requests/s on a core i5-3320M.
Commit 4850e51 ("BUG/MAJOR: lua: Do not force the HTTP analysers in
use-services") fixed a bug in how services are used in Lua, but this
fix broke the ability for Lua services to support keep-alive.
The cause is that we branch to a service while we have not yet set the
body analysers on the request nor the response, and when we start to
deal with the response we don't have any request analyser anymore. This
leads the response forward engine to detect an error and abort. It's
very likely that this also causes some random truncation of responses
though this has not been observed during the tests.
The root cause is not the Lua part in fact, the commit above was correct,
the problem is the implementation of the "use-service" action. When done
in an HTTP request, it bypasses the load balancing decisions and the
connect() phase. These ones are normally the ones preparing the request
analysers to parse the body when keep-alive is set. This should be dealt
with in the main process_use_service() function in fact.
That's what this patch does. If process_use_service() is called from the
http-request rule set, it enables the XFER_BODY analyser on the request
(since the same is always set on the response). Note that it's exactly
what is being done on the stats page which properly supports keep-alive
and compression.
This fix must be backported to 1.7 and 1.6 as the breakage appeared in 1.6.3.
The header's value was parsed with atoi() then compared against -1,
meaning that all the unparsable stuff returning zero was not considered
and that all multiples of 2^32 + 0xFFFFFFFF would continue to emit a
chunk.
Now instead we parse the value using a long long, only accept positive
values and consider all unparsable values as incorrect and switch to
either close or chunked encoding. This is more in line with what a
client (including haproxy's parser) would expect.
This may be backported as a cleanup to stable versions, though it's
really unlikely that Lua applications are facing side effects of this.
The following Lua code causes emission of a final chunk after the body,
which is wrong :
core.register_service("send204", "http", function(applet)
applet:set_status(204)
applet:start_response()
end)
Indeed, responses with status codes 1xx, 204 and 304 do not contain any
body and the message ends immediately after the empty header (cf RFC7230)
so by emitting a 0<CR><LF> we're disturbing keep-alive responses. There's
a workaround against this for now which consists in always emitting
"Content-length: 0" but it may not be cool with 304 when clients use
the headers to update their cache.
This fix must be backported to stable versions back to 1.6.
Commit d1aa41f ("BUG/MAJOR: lua: properly dequeue hlua_applet_wakeup()
for new scheduler") tried to address the side effects of the scheduler
changes on Lua, but it was not enough. Having some Lua code send data
in chunks separated by one second each clearly shows busy polling being
done.
The issue was tracked down to hlua_applet_wakeup() being woken up on
timer expiration, and returning itself without clearing the timeout,
causing the task to be re-inserted with an expiration date in the past,
thus firing again. In the past it was not a problem, as returning NULL
was enough to clear the timer. Now we can't rely on this anymore so
it's important to clear this timeout.
No backport is needed, this issue is specific to 1.8-dev and results
from an incomplete fix in the commit above.
Since commit 9d8dbbc ("MINOR: dns: Maximum DNS udp payload set to 8192") it's
possible to specify a packet size, but passing too large a size or a negative
size is not detected and results in memset() being performed over a 2GB+ area
upon receipt of the first DNS response, causing runtime crashes.
We now check that the size is not smaller than the smallest packet which is
the DNS header size (12 bytes).
No backport is needed.
Since the DNS layer split and the use of obj_type structure, we did not
updated propoerly the code used to compute the interval between 2
resolutions.
A nasty loop was then created when:
- resolver's hold.valid is shorter than servers' check.inter
- a valid response is available in the DNS cache
A task was woken up for a server's resolution. The servers pick up the IP
in the cache and returns without updating the 'last update' timestamp of
the resolution (which is normal...). Then the task is woken up again for
the same server.
The fix simply computes now properly the interval between 2 resolutions
and the cache is used properly while a new resolution is triggered if
the data is not fresh enough.
a reader pointer comparison to the end of the buffer was performed twice
while once is obviously enough.
backport status: this patch can be backported into HAProxy 1.6 (with some
modification. Please contact me)
For troubleshooting purpose, it may be important to know when a server
got its fqdn updated by a SRV record.
This patch makes HAProxy to report such events through stderr and logs.
RFC 6891 states that if a DNS client announces "big" payload size and
doesn't receive a response (because some equipments on the path may
block/drop UDP fragmented packets), then it should try asking for
smaller responses.
Following up DNS extension introduction, this patch aims at making the
computation of the maximum number of records in DNS response dynamic.
This computation is based on the announced payload size accepted by
HAProxy.
This patch fixes a bug where some servers managed by SRV record query
types never ever recover from a "no resolution" status.
The problem is due to a wrong function called when breaking the
server/resolution (A/AAAA) relationship: this is performed when a server's SRV
record disappear from the SRV response.
Contrary to 64-bits libCs where size_t type size is 8, on systems with 32-bits
size of size_t is 4 (the size of a long) which does not equal to size of uint64_t type.
This was revealed by such GCC warnings on 32bits systems:
src/flt_spoe.c:2259:40: warning: passing argument 4 of spoe_decode_buffer from
incompatible pointer type
if (spoe_decode_buffer(&p, end, &str, &sz) == -1)
^
As the already existing code using spoe_decode_buffer() already use such pointers to
uint64_t, in place of pointer to size_t ;), most of this code is in contrib directory,
this simple patch modifies the prototype of spoe_decode_buffer() so that to use a
pointer to uint64_t in place of a pointer to size_t, uint64_t type being the type
finally required for decode_varint().
The two macros EXPECT_LF_HERE and EAT_AND_JUMP_OR_RETURN were exported
for use outside the HTTP parser. They now take extra arguments to avoid
implicit pointers and jump labels. These will be used to reimplement a
minimalist HTTP/1 parser in the H1->H2 gateway.
We now refrain from clearing a session's variables, counters, and from
releasing it as long as at least one stream references it. For now it
never happens but with H2 this will be mandatory to avoid double frees.
Now each stream is added to the session's list of streams, so that it
will be possible to know all the streams belonging to a session, and
to know if any stream is still attached to a sessoin.
The "hold obsolete" timer is used to prevent HAProxy from moving a server to
an other IP or from considering the server as DOWN if the IP currently
affected to this server has not been seen for this period of time in DNS
responses.
That said, historically, HAProxy used to update servers as soon as the IP
has disappeared from the response. Current default timeout break this
historical behavior and may change HAProxy's behavior when people will
upgrade to 1.8.
This patch changes the default value to 0 to keep backward compatibility.
Edns extensions may be used to negotiate some settings between a DNS
client and a server.
For now we only use it to announce the maximum response payload size accpeted
by HAProxy.
This size can be set through a configuration parameter in the resolvers
section. If not set, it defaults to 512 bytes.
The function srv_set_fqdn() is used to update a server's fqdn and set
accordingly its DNS resolution.
Current implementation prevents a server whose update is triggered by a
SRV record from being linked to an existing resolution in the cache (if
applicable).
This patch aims at fixing this.
Current code implementation prevents multiple backends from relying on
the same SRV resolution. Actually, only the first backend which triggers
the resolution gets updated.
This patch makes HAProxy to process the whole list of the 'curr'
requesters to apply the changes everywhere (hence, the cache also applies
to SRV records...)
This function is particularly useful when debugging DNS resolution at
run time in HAProxy.
SRV records must be read differently, hence we have to update this
function.
DNS SRV records uses "dns name compression" to store the target name.
"dns compression" principle is simple. Let's take the name below:
3336633266663038.red.default.svc.cluster.local.
It can be stored "as is" in the response or it can be compressed like
this:
3336633266663038<POINTER>
and <POINTER> would point to the string
'.red.default.svc.cluster.local.' availble in the question section for
example.
This mechanism allows storing much more data in a single DNS response.
This means the flag "record->data_len" which stores the size of the
record (hence the whole string, uncompressed) can't be used to move the
pointer forward when reading responses. We must use the "offset" integer
which means the real number of bytes occupied by the target name.
If we don't do that, we can properly read the first SRV record, then we
loose alignment and we start reading unrelated data (still in the
response) leading to a false negative error treated as an "invalid"
response...
DNS response for SRV queries look like this:
- query dname looks like '_http._tcp.red.default.svc.cluster.local'
- answer record dname looks like
'3336633266663038.red.default.svc.cluster.local.'
Of course, it never matches... and it triggers many false positive in
the current code (which is suitable for A/AAAA/CNAME).
This patch simply ignores this dname matching in the case of SRV query
type.
First implementation of the DNS parser used to consider TRUNCATED
responses as errors and triggered a failover to an other query type
(usually A to AAAA or vice-versa).
When we query for SRV records, a TRUNCATED response still contains valid
records we can exploit, so we shouldn't trigger a failover in such case.
Note that we had to move the maching against the flag later in the
response parsing (actually, until we can read the query type....)
Use a cpuset_t instead of assuming the cpu mask is an unsigned long.
This should fix setting the CPU affinity on FreeBSD >= 11.
This patch should be backported to stable releases.
I just noticed the raw socket constructor was called __ssl_sock_deinit,
which is a bit confusing, and wrong twice, so the attached patch renames it
to __raw_sock_init, which seems more correct.
stream_free() used to close the front connection by using s->sess->origin,
instead of using s->si[0].end. This is very visible in HTTP/2 where the
front connection is abusively closed and causes all sort of issues including
crashes caused by double closes due to the same origin being referenced many
times.
It's also suspected that it may have caused some of the early issues met
during the Lua development.
It's uncertain whether stable branches are affected. It might be worth
backporting it once it has been confirmed not to create new impacts.
As mentionned in commit cf4e496c9 ("BUG/MEDIUM: build without openssl broken"),
commit 872f9c213 ("MEDIUM: ssl: add basic support for OpenSSL crypto engine")
broke the build without openssl support. But the former did only fix it when
openssl is not enabled, but not when it's not installed on the system :
In file included from src/haproxy.c:112:
include/proto/ssl_sock.h:24:25: openssl/ssl.h: No such file or directory
In file included from src/haproxy.c:112:
include/proto/ssl_sock.h:45: error: syntax error before "SSL_CTX"
include/proto/ssl_sock.h:75: error: syntax error before '*' token
include/proto/ssl_sock.h:75: warning: type defaults to `int' in declaration of `ssl_sock_create_cert'
include/proto/ssl_sock.h:75: warning: data definition has no type or storage class
include/proto/ssl_sock.h:76: error: syntax error before '*' token
include/proto/ssl_sock.h:76: warning: type defaults to `int' in declaration of `ssl_sock_get_generated_cert'
include/proto/ssl_sock.h:76: warning: data definition has no type or storage class
include/proto/ssl_sock.h:77: error: syntax error before '*' token
Now we also surround the include with #ifdef USE_OPENSSL to fix this. No
backport is needed since openssl async engines were not backported.
Commit 48a8332a introduce SSL_CTX_get0_privatekey in openssl-compat.h but
SSL_CTX_get0_privatekey access internal structure and can't be a candidate
to openssl-compat.h. The workaround with openssl < 1.0.2 is to use SSL_new
then SSL_get_privatekey.
Recent commit 7a4a0ac ("MINOR: cli: add a new "show fd" command") introduced
a warning when building at -O2 and above. The compiler doesn't know if a
variable's value might have changed between two if blocks so warns that some
values might be used uninitialized, which is not the case. Let's simply
initialize them to shut the warning.
Make it so for each server, instead of specifying a hostname, one can use
a SRV label.
When doing so, haproxy will first resolve the SRV label, then use the
resulting hostnames, as well as port and weight (priority is ignored right
now), to each server using the SRV label.
It is resolved periodically, and any server disappearing from the SRV records
will be removed, and any server appearing will be added, assuming there're
free servers in haproxy.
As DNS servers may not return all IPs in one answer, we want to cache the
previous entries. Those entries are removed when considered obsolete, which
happens when the IP hasn't been returned by the DNS server for a time
defined in the "hold obsolete" parameter of the resolver section. The default
is 30s.
With strict-sni, ssl connection will fail if no certificate match. Have no
certificate in bind line, fail on all ssl connections. It's ok with the
behavior of strict-sni. When 'generate-certificates' is set 'strict-sni' is
never used. When 'strict-sni' is set, default_ctx is never used. Allow to start
without certificate only in this case.
Use case is to start haproxy with ssl before customer start to use certificates.
Typically with 'crt' on a empty directory and 'strict-sni' parameters.
Since the commit f6b37c67 ["BUG/MEDIUM: ssl: in bind line, ssl-options after
'crt' are ignored."], the certificates generation is broken.
To generate a certificate, we retrieved the private key of the default
certificate using the SSL object. But since the commit f6b37c67, the SSL object
is created with a dummy certificate (initial_ctx).
So to fix the bug, we use directly the default certificate in the bind_conf
structure. We use SSL_CTX_get0_privatekey function to do so. Because this
function does not exist for OpenSSL < 1.0.2 and for LibreSSL, it has been added
in openssl-compat.h with the right #ifdef.
This one dumps the fdtab for all active FDs with some quickly interpretable
characters to read the flags (like upper case=set, lower case=unset). It
can probably be improved to report fdupdt[] and/or fdinfo[] but at least it
provides a good start and allows to see how FDs are seen. When the fd owner
is a connection, its flags are also reported as it can help compare with the
polling status, and the target (fe/px/sv) as well. When it's a listener, the
listener's state is reported as well as the frontend it belongs to.
The logical operations were inverted so enable/disable operations did
the opposite.
The bug is present since 1.7 so the fix should be backported there.
Commits 2ab8867 ("MINOR: ssl: compare server certificate names to the
SNI on outgoing connections") and 96c7b8d ("BUG/MINOR: ssl: Fix check
against SNI during server certificate verification") made it possible
to check that the server's certificate matches the name presented in
the SNI field. While it solves a class of problems, it opens another
one which is that by failing such a connection, we'll retry it and put
more load on the server. It can be a real problem if a user can trigger
this issue, which is what will very often happen when the SNI is forwarded
from the client to the server.
This patch solves this by detecting that this very specific hostname
verification failed and that the hostname was provided using SNI, and
then it simply disables retries and the failure is immediate.
At the time of writing this patch, the previous patches were not backported
(yet), so no backport is needed for this one unless the aforementionned
patches are backported as well. This patch requires previous patches
"BUG/MINOR: ssl: make use of the name in SNI before verifyhost" and
"MINOR: ssl: add a new error code for wrong server certificates".
If a server presents an unexpected certificate to haproxy, that is, a
certificate that doesn't match the expected name as configured in
verifyhost or as requested using SNI, we want to store that precious
information. Fortunately we have access to the connection in the
verification callback so it's possible to store an error code there.
For this purpose we use CO_ER_SSL_MISMATCH_SNI (for when the cert name
didn't match the one requested using SNI) and CO_ER_SSL_MISMATCH for
when it doesn't match verifyhost.
Commit 2ab8867 ("MINOR: ssl: compare server certificate names to the SNI
on outgoing connections") introduced the ability to check server cert
names against the name provided with in the SNI, but verifyhost was kept
as a way to force the name to check against. This was a mistake, because :
- if an SNI is used, any static hostname in verifyhost will be wrong ;
worse, if it matches and doesn't match the SNI, the server presented
the wrong certificate ;
- there's no way to have a default name to check against for health
checks anymore because the point above mandates the removal of the
verifyhost directive
This patch reverses the ordering of the check : whenever SNI is used, the
name provided always has precedence (ie the server must always present a
certificate that matches the requested name). And if no SNI is provided,
then verifyhost is used, and will be configured to match the server's
default certificate name. This will work both when SNI is not used and
for health checks.
If the commit 2ab8867 is backported in 1.7 and/or 1.6, this one must be
backported too.
This patch fixes the commit 2ab8867 ("MINOR: ssl: compare server certificate
names to the SNI on outgoing connections")
When we check the certificate sent by a server, in the verify callback, we get
the SNI from the session (SSL_SESSION object). In OpenSSL, tlsext_hostname value
for this session is copied from the ssl connection (SSL object). But the copy is
done only if the "server_name" extension is found in the server hello
message. This means the server has found a certificate matching the client's
SNI.
When the server returns a default certificate not matching the client's SNI, it
doesn't set any "server_name" extension in the server hello message. So no SNI
is set on the SSL session and SSL_SESSION_get0_hostname always returns NULL.
To fix the problemn, we get the SNI directly from the SSL connection. It is
always defined with the value set by the client.
If the commit 2ab8867 is backported in 1.7 and/or 1.6, this one must be
backported too.
Note: it's worth mentionning that by making the SNI check work, we
introduce another problem by which failed SNI checks can cause
long connection retries on the server, and in certain cases the
SNI value used comes from the client. So this patch series must
not be backported until this issue is resolved.
Adis Nezirovic reports:
While playing with Lua API I've noticed that core.proxies attribute
doesn't return all the proxies, more precisely the ones with same names
(e.g. for frontend and backend with the same name it would only return
the latter one).
So, this patch fixes this problem without breaking the actual behaviour.
We have two case of proxies with frontend/backend capabilities:
The first case is the listen. This case is not a problem because the
proxy object process these two entities as only one and it is the
expected behavior. With these case the "proxies" list works fine.
The second case is the frontend and backend with the same name. i think
that this case is possible for compatibility with 'listen' declaration.
These two proxes with same name and different capabilities must not
processed with the same object (different statitics, differents orders).
In fact, one the the two object crush the other one whoch is no longer
accessible.
To fix this problem, this patch adds two lists which are "frontends" and
"backends", each of these list contains specialized proxy, but warning
the "listen" proxy are declare in each list.
By Adis Nezirovic:
This is just for convenience and uniformity, Proxy.servers/listeners
returns a table/hash of objects with names as keys, but for example when
I want to pass such object to some other Lua function I have to manually
copy the name (or wrap the object), since the object itself doesn't
expose name info.
This patch simply adds the proxy name as member of the proxy object.
The recent scheduler change broke the Lua co-sockets due to
hlua_applet_wakeup() returning NULL after waking the applet up. With the
previous scheduler, returning NULL was a way to do nothing on return.
With the new one it keeps TASK_RUNNING set, causing all new notifications
to end up into t->pending_state instead of t->state, and prevents the
task from being added into the run queue again, so and it's never woken
up anymore.
The applet keeps waking up, causing hlua_socket_handler() to do nothing
new, then si_applet_wake_cb() calling stream_int_notify() to try to wake
the task up, which it can't do due to the TASK_RUNNING flag, then decide
that since the associated task is not in the run queue, it needs to call
stream_int_update_applet() to propagate the update. This last one finds
that the applet needs to be woken up to deal with the last reported events
and calling appctx_wakeup() again. Previously, this situation didn't exist
because the task was always added in the run queue despite the TASK_RUNNING
flag.
By returning the task instead in hlua_applet_wakeup(), we can ensure its
flag is properly cleared and the task is requeued if needed or just sits
waiting for new events to happen.
This fix requires the previous ones ("BUG/MINOR: lua: always detach the
tcp/http tasks before freeing them") and MINOR: task: always preinitialize
the task's timeout in task_init().
Thanks to Thierry, Christopher and Emeric for the long head-scratching
session!
No backport is needed as the bug doesn't appear in older versions and
it's unsure whether we'll not break something by backporting it.
In hlua_{http,tcp}_applet_release(), a call to task_free() is performed
to release the task, but no task_delete() is made on these tasks. Till
now it wasn't much of a problem because this was normally not done with
the task in the run queue, and the task was never put into the wait queue
since it doesn't have any timer. But with threading it will become an
issue. And not having this already prevents another bug from being fixed.
Thanks to Christopher for spotting this one. A backport to 1.7 and 1.6 is
preferred for safety.
For known methods (GET,POST...), in samples, an enum is used instead of a chunk
to reference the method. So there is no needs to allocate memory when a variable
is stored with this kind of sample.
First, the type SMP_T_METH was not handled by smp_dup function. It was never
called with this kind of samples, so it's not really a problem. But, this could
be useful in future.
For all known HTTP methods (GET, POST...), there is no extra space allocated for
a sample of type SMP_T_METH. But for unkown methods, it uses a chunk. So, like
for strings, we duplicate data, using a trash chunk.
The get_addr() method of the Lua Server class incorrectly used
INET_ADDRSTRLEN for IPv6 addresses resulting in failing to convert
longer IPv6 addresses to strings.
This fix should be backported to 1.7.
The get_addr() method of the Lua Server class was using the
'sockaddr_storage addr' member to get the port value. HAProxy does not
store ports in this member as it uses a separate member, called
'svc_port'.
This fix should be backported to 1.7.
In commit "MINOR: http: Switch requests/responses in TUNNEL mode only by
checking txn flags", it is possible to have an infinite loop on HTTP_MSG_CLOSING
state.
Akhnin Nikita reported that Lua doesn't build on Solaris 10 because
the code uses timegm() to parse a date, which is not provided there.
The recommended way to implement timegm() is broken in the man page,
as it is based on a change of the TZ environment variable at run time
before calling the function (which is obviously not thread safe, and
terribly inefficient).
Here instead we rely on the new my_timegm() function, it should be
sufficient for all known use cases.
timegm() is not provided everywhere and the documentation on how to
replace it is bogus as it proposes an inefficient and non-thread safe
alternative.
Here we reimplement everything needed to compute the number of seconds
since Epoch based on the broken down fields in struct tm. It is only
guaranteed to return correct values for correct inputs. It was successfully
tested with all possible 32-bit values of time_t converted to struct tm
using gmtime() and back to time_t using the legacy timegm() and this
function, and both functions always produced the same result.
Thanks to Benot Garnier for an instructive discussion and detailed
explanations of the various time functions, leading to this solution.
The commit 5db33cbd "MEDIUM: ssl: ssl_methods implementation is reworked and
factored for min/max tlsxx" drop the case when ssl lib have removed SSLv3.
The commit 1e59fcc5 "BUG/MINOR: ssl: Be sure that SSLv3 connection methods
exist for openssl < 1.1.0" fix build but it's false because haproxy think
that ssl lib support SSLv3.
SSL_OP_NO_* are flags to set in ssl_options and is the way haproxy do the
link between ssl capabilities and haproxy configuration. (The mapping table
is done via methodVersions). SSL_OP_NO_* is set to 0 when ssl lib doesn't
support a new TLS version. Older version (like SSLv3) can be removed at
build or unsupported (like libressl). In all case OPENSSL_NO_SSL3 is define.
To keep the same logic, this patch alter SSL_OP_NO_SSLv3 to 0 when SSLv3 is
not supported by ssl lib (when OPENSSL_NO_SSL3 is define).
The previous patch ("MINOR: http: Rely on analyzers mask to end processing in
forward_body functions") contains a bug for keep-alive transactions.
For these transactions, AN_REQ_FLT_END and AN_RES_FLT_END analyzers must be
removed only when all outgoing data was forwarded.
Instead of relying on request or response state, we use "chn->analysers" mask as
all other analyzers. So now, http_resync_states does not return anything
anymore.
The debug message in http_resync_states has been improved.
When the body length of a HTTP response is undefined, the HTTP parser is blocked
in the body parsing. Before HAProxy 1.7, in this case, because
AN_RES_HTTP_XFER_BODY is never set, there is no visible effect. When the server
closes its connection to terminate the response, HAProxy catches it as a normal
closure. Since 1.7, we always set this analyzer to enter at least once in
http_response_forward_body. But, in the present case, when the server connection
is closed, http_response_forward_body is called one time too many. The response
is correctly sent to the client, but an error is catched and logged with "SD--"
flags.
To reproduce the bug, you can use the configuration "tests/test-fsm.cfg". The
tests 3 and 21 hit the bug.
Idea to fix the bug is to switch the response in TUNNEL mode without switching
the request. This is possible because of previous patches.
First, we need to detect responses with undefined body length during states
synchronization. Excluding tunnelled transactions, when the response length is
undefined, TX_CON_WANT_CLO is always set on the transaction. So, when states are
synchronized, if TX_CON_WANT_CLO is set, the response is switched in TUNNEL mode
and the request remains unchanged.
Then, in http_msg_forward_body, we add a specific check to switch the response
in DONE mode if the body length is undefined and if there is no data filter.
This patch depends on following previous commits:
* MINOR: http: Switch requests/responses in TUNNEL mode only by checking txn flags
* MINOR: http: Reorder/rewrite checks in http_resync_states
This patch must be backported in 1.7 with 2 previous ones.
Today, the only way to have a request or a response in HTTP_MSG_TUNNEL state is
to have the flag TX_CON_WANT_TUN set on the transaction. So this is a symmetric
state. Both the request and the response are switch in same time in this
state. This can be done only by checking transaction flags instead of relying on
the other side state. This is the purpose of this patch.
This way, if for any reason we need to switch only one side in TUNNEL mode, it
will be possible. And to prepare asymmetric cases, we check channel flags in
DONE _AND_ TUNNEL states.
WARNING: This patch will be used to fix a bug. The fix will be commited in a
very next commit. So if the fix is backported, this one must be backported too.
The previous patch removed the forced symmetry of the TUNNEL mode during the
state synchronization. Here, we take care to remove body analyzer only on the
channel in TUNNEL mode. In fact, today, this change has no effect because both
sides are switched in same time. But this way, with some changes, it will be
possible to keep body analyzer on a side (to finish the states synchronization)
with the other one in TUNNEL mode.
WARNING: This patch will be used to fix a bug. The fix will be commited in a
very next commit. So if the fix is backported, this one must be backported too.
We cannot perform garbage collection on unreferenced thread.
This memory is now free and another Lua process can use it for
other things.
HAProxy is monothread, so this bug doesn't cause crash.
This patch must be backported in 1.6 and 1.7
In some cases, the socket is misused. The user can open socket and never
close it, or open the socket and close it without sending data. This
causes resources leak on all resources associated to the stream (buffer,
spoe, ...)
This is caused by the stream_shutdown function which is called outside
of the stream execution process. Sometimes, the shtudown is required
while the stream is not started, so the cleanup is ignored.
This patch change the shutdown mode of the session. Now if the session is
no longer used and the Lua want to destroy it, it just set a destroy flag
and the session kill itself.
This patch should be backported in 1.6 and 1.7
When we destroy the Lua session, we manipulates Lua stack,
so errors can raises. It will be better to catch these errors.
This patch should be backported in 1.6 and 1.7
Just forgot of reset the safe mode. This have not consequences
the safe mode just set a pointer on fucntion which is called only
and initialises a longjmp.
Out of lua execution, this longjmp is never executed and the
function is never called.
This patch should be backported in 1.6 and 1.7
When several stick-tables were configured with several peers sections,
only a part of them could be synchronized: the ones attached to the last
parsed 'peers' section. This was due to the fact that, at least, the peer I/O handler
refered to the wrong peer section list, in fact always the same: the last one parsed.
The fact that the global peer section list was named "struct peers *peers"
lead to this issue. This variable name is dangerous ;).
So this patch renames global 'peers' variable to 'cfg_peers' to ensure that
no such wrong references are still in use, then all the functions wich used
old 'peers' variable have been modified to refer to the correct peer list.
Must be backported to 1.6 and 1.7.
In ssl_sock_to_buf(), when we face a small read, we used to consider it
as an indication for the end of incoming data, as is the case with plain
text. The problem is that here it's quite different, SSL records are
returned at once so doing so make us wake all the upper layers for each
and every record. Given that SSL records are 16kB by default, this is
rarely observed unless the protocol employs small records or the buffers
are increased. But with 64kB buffers while trying to deal with HTTP/2
frames, the exchanges are obviously suboptimal as there are two messages
per frame (one for the frame header and another one for the frame payload),
causing the H2 parser to be woken up half of the times without being able
to proceed :
try=65536 ret=45
try=65536 ret=16384
try=49152 ret=9
try=49143 ret=16384
try=32759 ret=9
try=32750 ret=16384
try=16366 ret=9
try=32795 ret=27
try=49161 ret=9
try=49152 ret=16384
try=49116 ret=9
try=49107 ret=16384
try=32723 ret=9
try=32714 ret=16384
try=16330 ret=9
try=32831 ret=63
try=49161 ret=9
try=49152 ret=16384
try=49080 ret=9
try=49071 ret=2181
With this change, the buffer can safely be filled with all pending frames
at once when they are available.
Only 100 was considered informational instead of all 1xx. This can be
a problem when facing a 102 ("progress") or with the upcoming 103 for
early hints. Let's properly handle all 1xx now, leaving a special case
for 101 which is used for the upgrade.
This fix should be backported to 1.7, 1.6 and 1.5. In 1.4 the code is
different but the backport should be made there as well.
With this patch additional information are added to stick-table definition
messages so that to make external application capable of learning peer
stick-table configurations. First stick-table entries duration is added
followed by the frequency counters type IDs and values.
May be backported to 1.7 and 1.6.
In the commit 2b553de5 ("BUG/MINOR: filters: Don't force the stream's wakeup
when we wait in flt_end_analyze"), we removed a task_wakeup in flt_end_analyze
to no consume too much CPU by looping in certain circumstances.
But this fix was too drastic. For Keep-Alive transactions, flt_end_analyze is
often called only for the response. Then the stream is paused until a timeout is
hitted or the next request is received. We need first let a chance to both
channels to call flt_end_analyze function. Then if a filter need to wait here,
it is its responsibility to wake up the stream when needed. To fix the bug, and
thanks to previous commits, we set the flag CF_WAKE_ONCE on channels to pretend
there is an activity. On the current channel, the flag will be removed without
any effect, but for the other side the analyzer will be called immediatly.
Thanks for Lukas Tribus for his detailed analysis of the bug.
This patch must be backported in 1.7 with the 2 previous ones:
* a94fda3 ("BUG/MINOR: http: Don't reset the transaction if there are still data to send")
* cdaea89 ("BUG/MINOR: stream: Don't forget to remove CF_WAKE_ONCE flag on response channel")
To reset an HTTP transaction, we need to be sure all data were sent, for the
request and the response. There are tests on request and response buffers for
that in http_resync_states function. But the return code was wrong. We must
return 0 to wait.
This patch must be backported in 1.7
This flag can be set on a channel to pretend there is activity on it. This is a
way to wake-up the corresponding stream and evaluate stream analyzers on the
channel. It is correctly handled on both channels but removed only on the
request channel.
This patch is flagged as a bug but for now, CF_WAKE_ONCE is never set on the
response channel.
When support for passing SNI to the server was added in 1.6-dev3, there
was no way to validate that the certificate presented by the server would
really match the name requested in the SNI, which is quite a problem as
it allows other (valid) certificates to be presented instead (when hitting
the wrong server or due to a man in the middle).
This patch adds the missing check against the value passed in the SNI.
The "verifyhost" value keeps precedence if set. If no SNI is used and
no verifyhost directive is specified, then the certificate name is not
checked (this is unchanged).
In order to extract the SNI value, it was necessary to make use of
SSL_SESSION_get0_hostname(), which appeared in openssl 1.1.0. This is
a trivial function which returns the value of s->tlsext_hostname, so
it was provided in the compat layer for older versions. After some
refinements from Emmanuel, it now builds with openssl 1.0.2, openssl
1.1.0 and boringssl. A test file was provided to ease testing all cases.
After some careful observation period it may make sense to backport
this to 1.7 and 1.6 as some users rightfully consider this limitation
as a bug.
Cc: Emmanuel Hocdet <manu@gandi.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
The pool used to log the uri was created with a size of 0 because the
configuration and 'tune.http.logurilen' were parsed too earlier.
The fix consist to postpone the pool_create as it is done for
cookie captures.
Regression introduced with 'MINOR: log: Add logurilen tunable'
The bug: Maps/ACLs using the same file/id can mistakenly inherit
their flags from the last declared one.
i.e.
$ cat haproxy.conf
listen mylistener
mode http
bind 0.0.0.0:8080
acl myacl1 url -i -f mine.acl
acl myacl2 url -f mine.acl
acl myacl3 url -i -f mine.acl
redirect location / if myacl2
$ cat mine.acl
foobar
Shows an unexpected redirect for request 'GET /FOObAR HTTP/1.0\n\n'.
This fix should be backported on mainline branches v1.6 and v1.7.
Introduced regression with 'MAJOR: applet scheduler rework' (1.8-dev only).
The fix consist to re-enable the appctx immediatly from the
applet wake cb if the process_stream is not pending in runqueue
and the applet want perform a put or a get and the WAIT_ROOM
flag was removed by stream_int_notify.
Instead of doing a malloc/free to each HTTP transaction to allocate the
compression state (when the HTTP compression is enabled), we use a memory pool.
This patch fixes an obvious memory leak in the compression filter. The
compression state (comp_state) is allocated when a HTTP transaction starts, in
channel_start_analyze callback, Whether we are able to compression the response
or not. So it must be released when the transaction ends, in channel_end_analyze
callback.
But there is a bug here. The state is released on the response side only. So, if
a transaction ends before the response is started, it is never released. This
happens when a connection is closed before the response is started.
To fix the bug, statistics about the HTTP compression are now updated in
http_end callback, when the response parsing ends. It happens only if no error
is encountered and when the response is compressed. So, it is safe to release
the compression state in channel_end_analyze callback, regardless the
channel's type.
This patch must be backported in 1.7.
The reference of the current map/acl element to dump could
be destroyed if map is updated from an 'http-request del-map'
configuration rule or throught a 'del map/acl' on CLI.
We use a 'back_refs' chaining element to fix this. As it
is done to dump sessions.
This patch needs also fix:
'BUG/MAJOR: cli: fix custom io_release was crushed by NULL.'
To clean the back_ref and avoid a crash on a further
del/clear map operation.
Those fixes should be backported on mainline branches 1.7 and 1.6.
This patch wont directly apply on 1.6.
Recently merged commit 0cfe388 ("MINOR: frontend: retrieve the ALPN name when
available") assumed that the connection is always known in frontend_accept()
which is not true for outgoing peers connections for example.
No backport needed.
In order to authorize call of appctx_wakeup on running task:
- from within the task handler itself.
- in futur, from another thread.
The appctx is considered paused as default after running the handler.
The handler should explicitly call appctx_wakeup to be re-called.
When the appctx_free is called on a running handler. The real
free is postponed at the end of the handler process.
It's more efficient this way, as it allows to flush a send buffer before
receiving data in the other one. This can lead to a slightly faster buffer
recycling, thus slightly less memory and a small performance increase by
using a hotter cache.
In order to implement hot-pluggable applets like we'll need for HTTP/2
which will speak a different protocol than the expected one, it will be
mandatory to be able to clear all analysers from the request and response
channel and/or to keep only the ones the applet initializer installed.
Unfortunately for now in sess_establish() we systematically place a number
of analysers inherited from the frontend, backend and some hard-coded ones.
This patch reuses the now unused SF_TUNNEL flag on the stream to indicate
we're dealing with a tunnel and don't want to add more analysers anymore.
It will be usable to install such a specific applet.
Ideally over the long term it might be nice to be able to set the mode on
the stream instead of the proxy so that we can decide to change a stream's
mode (eg: TCP, HTTP, HTTP/2) at run time. But it would require many more
changes for a gain which is not yet obvious.
This is used to retrieve the TLS ALPN information from a connection. We
also support a fallback to NPN if ALPN doesn't find anything or is not
available on the existing implementation. It happens that depending on
the library version, either one or the other is available. NPN was
present in openssl 1.0.1 (very common) while ALPN is in 1.0.2 and onwards
(still uncommon at the time of writing). Clients are used to send either
one or the other to ensure a smooth transition.
For HTTP/2 we'll have to choose the upper layer based on the
advertised protocol name here and we want to keep debugging,
so let's move debugging earlier.
It doesn't make sense that stream_new() doesn't sets the target nor
analysers and that the caller has to do it even if it doesn't know
about streams (eg: in session_accept_fd()). This causes trouble for
H2 where the applet handling the protocol cannot properly change
these information during its init phase.
Let's ensure it's always set and that the callers don't set it anymore.
Note: peers and lua don't use analysers and that's properly handled.
The task_wakeup was called on stream_new, but the task/stream
wasn't fully initialized yet. The task_wakeup must be called
explicitly by the caller once the task/stream is initialized.
In order to authorize call of task_wakeup on running task:
- from within the task handler itself.
- in futur, from another thread.
The lookups on runqueue and waitqueue are re-worked
to prepare multithread stuff.
If task_wakeup is called on a running task, the woken
message flags are savec in the 'pending_state' attribute of
the state. The real wakeup is postponed at the end of the handler
process and the woken messages are copied from pending_state
to the state attribute of the task.
It's important to note that this change will cause a very minor
(though measurable) performance loss but it is necessary to make
forward progress on a multi-threaded scheduler. Most users won't
ever notice.
Mathias Weiersmueller reported an interesting issue with logs which Lukas
diagnosed as dating back from commit 9b061e332 (1.5-dev9). When front
connection information (ip, port) are logged in TCP mode and the log is
emitted at the end of the connection (eg: because %B or any log tag
requiring LW_BYTES is set), the log is emitted after the connection is
closed, so the address and ports cannot be retrieved anymore.
It could be argued that we'd make a special case of these to immediatly
retrieve the source and destination addresses from the connection, but it
seems cleaner to simply pin the front connection, marking it "tracked" by
adding the LW_XPRT flag to mention that we'll need some of these elements
at the last moment. Only LW_FRTIP and LW_CLIP are affected. Note that after
this change, LW_FRTIP could simply be removed as it's not used anywhere.
Note that the problem doesn't happen when using %[src] or %[dst] since
all sample expressions set LW_XPRT.
This must be backported to 1.7, 1.6 and 1.5.
We cannot store more than 32K headers in the structure hdr_idx, because
internaly we use signed short integers. To avoid any bugs (due to an integers
overflow), a check has been added on tune.http.maxhdr to be sure to not set a
value greater than 32767 and lower than 1 (because this is a nonsense to set
this parameter to a value <= 0).
The documentation has been updated accordingly.
This patch can be backported in 1.7, 1.6 and 1.5.
When a peer task has sent a synchronization request to remote peers
its next expiration date was updated based on a resynchronization timeout
value which itself may have already expired leading the underlying
poller to wait for 0ms during a fraction of second (consuming high CPU
resources).
With this patch we update such peer task expiration dates only if
the resynchronization timeout is not already expired.
Thanks to Patrick Hemmer who reported an issue with nice traces
which helped in finding this one.
This patch may be backported to 1.7 and 1.6.
When starting the master worker with -sf or -st, the PIDs will be reused
on the next reload, which is a problem if new processes on the system
took those PIDs.
This patch ensures that we don't register old PIDs in the reload system
when launching the master worker.
Don't copy the -x argument anymore in copy_argv() since it's already
allocated in mworker_reload().
Make the copy_argv() more consistent when used with multiple arguments
to strip.
It prevents multiple -x on reload, which is not supported.
This patch fixes a segfault in the command line parser.
When haproxy is launched with -x with no argument and -x is the latest
option in argv it segfaults.
Use usage() insteads of exit() on error.
James Brown reported some cases where a race condition happens between
the old and the new processes resulting in the leaving process removing
a newly bound unix socket. Jeff gave all the details he observed here :
https://www.mail-archive.com/haproxy@formilux.org/msg25001.html
The unix socket removal was an attempt at an optimal cleanup, which
almost never works anyway since the process is supposed to be chrooted.
And in the rare cases where it works it occasionally creates trouble.
There was already a workaround in place to avoid removing this socket
when it's been inherited from a parent's file descriptor.
So let's finally kill this useless stuff now to definitely get rid of
this persistent problem.
This fix should be backported to all stable releases.
A peer session which has just been created upon reconnect timeout expirations,
could be right after shutdown (at peer session level) because the remote
side peer could also righ after have connected. In such a case the underlying
TCP session was still running (connect()/accept()) and finally left in CLOSE_WAIT
state after the remote side stopped writting (shutdown(SHUT_WR)).
Now on, with this patch we never shutdown such peer sessions wich have just
been created. We leave them connect to the remote peer which is already
connected and must shutdown its own peer session.
Thanks to Patric Hemmer and Yves Lafon at w3.org for reporting this issue,
and for having tested this patch on the field.
Thanks also to Willy and Yelp blogs which helped me a lot in fixing it
(see https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ and
https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.htmll).
A filter can choose to loop when a HTTP message is in the state
HTTP_MSG_ENDING. But the transaction is terminated with an error if the input is
closed (CF_SHUTR set on the channel). At this step, we have received all data,
so we can wait.
So now, we also check the parser state before leaving. This fix only affects
configs that use a filter that can wait in http_forward_data or http_end
callbacks, when all data were parsed.
For openssl 1.0.2, SSLv3_server_method and SSLv3_client_method are undefined if
OPENSSL_NO_SSL3_METHOD is set. So we must add a check on this macro before using
these functions.
For an ACL, we can load patterns from a map using the flag -M. For example:
acl test hdr(host) -M -f hosts.map
The file is parsed as a map et the ACL will be executed as expected. But the
reference flag is wrong. It is set to PAT_REF_ACL. So the map will never be
listed by a "show map" on the stat socket. Setting the reference flag to
PAT_REF_ACL|PAT_REF_MAP fixes the bug.
Jean Lubatti reported a crash on haproxy using a config involving cookies
and tarpit rules. It just happens that since 1.7-dev3 with commit 83a2c3d
("BUG/MINOR : allow to log cookie for tarpit and denied request"), function
manage_client_side_cookies() was called after erasing the request buffer in
case of a tarpit action. The problem is that this function must absolutely
not be called with an empty buffer since it moves parts of it. A typical
reproducer consists in sending :
"GET / HTTP/1.1\r\nCookie: S=1\r\n\r\n"
On such a config :
listen crash
bind :8001
mode http
reqitarpit .
cookie S insert indirect
server s1 127.0.0.1:8000 cookie 1
The fix simply consists in moving the call to the function before the call
to buffer_erase().
Many thanks to Jean for testing instrumented code and providing a usable
core.
This fix must be backported to all stable versions since the fix introducing
this bug was backported as well.
Commit cb11fd2 ("MEDIUM: mworker: wait mode on reload failure")
introduced a regression, when HAProxy is used in daemon mode, it exits 1
after forking its children.
HAProxy should exit(0), the exit(EXIT_FAILURE) was expected to be use
when the master fail in master-worker mode.
Thanks to Emmanuel Hocdet for reporting this bug. No backport needed.
The commit 201c07f68 ("MAJOR/REORG: dns: DNS resolution task and
requester queues") introduces a warning during compilation:
src/dns.c: In function ‘dns_resolve_recv’:
src/dns.c:487:6: warning: ‘need_resend’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (need_resend) {
^
This patch initialize the variable and remove the comment about it.
The commit 872f9c213 ("MEDIUM: ssl: add basic support for OpenSSL crypto
engine") broke the build without openssl support.
The ssl_free_dh() function is not defined when USE_OPENSSL is not
defined and leads to a compilation failure.
This patch modifies the way to re-enable the connection from the async fd
handler calling conn_update_sock_polling instead of the conn_fd_handler.
It also ensures that the polling is really stopped on the async fd.
The Openssl's ASYNC API does'nt support moving buffers on SSL_read/write
This patch disables the ASYNC mode dynamically when the handshake
is left and re-enables it on reneg.
This patch ensure that the ASYNC fd handlers won't be wake up
too early, disabling the event cache for this fd on connection close
and when a WANT_ASYNC is rised by Openssl.
The calls to SSL_read/SSL_write/SSL_do_handshake before rising a real read
event from the ASYNC fd, generated an EAGAIN followed by a context switch
for some engines, or a blocked read for the others.
On connection close it resulted in a too early call to SSL_free followed
by a segmentation fault.
SSL/TLS version can be changed per certificat if and only if openssl lib support
earlier callback on handshake and, of course, is implemented in haproxy. It's ok
for BoringSSL. For Openssl, version 1.1.1 have such callback and could support it.
This patch cleanup the usage of set_version func with a more suitable name:
ctx_set_version. It introduce ssl_set_version func (unused for the moment).
Very early in the connection rework process leading to v1.5-dev12, commit
56a77e5 ("MEDIUM: connection: complete the polling cleanups") marked the
end of use for this flag which since was never set anymore, but it continues
to be tested. Let's kill it now.
When dumping data at various places in the code, it's hard to figure
what is present where. To make this easier, this patch slightly modifies
debug_hexdump() to take a prefix string which is prepended in front of
each output line.
This patch is a major upgrade of the internal run-time DNS resolver in
HAProxy and it brings the following 2 main changes:
1. DNS resolution task
Up to now, DNS resolution was triggered by the health check task.
From now, DNS resolution task is autonomous. It is started by HAProxy
right after the scheduler is available and it is woken either when a
network IO occurs for one of its nameserver or when a timeout is
matched.
From now, this means we can enable DNS resolution for a server without
enabling health checking.
2. Introduction of a dns_requester structure
Up to now, DNS resolution was purposely made for resolving server
hostnames.
The idea, is to ensure that any HAProxy internal object should be able
to trigger a DNS resolution. For this purpose, 2 things has to be done:
- clean up the DNS code from the server structure (this was already
quite clean actually) and clean up the server's callbacks from
manipulating too much DNS resolution
- create an agnostic structure which allows linking a DNS resolution
and a requester of any type (using obj_type enum)
3. Manage requesters through queues
Up to now, there was an uniq relationship between a resolution and it's
owner (aka the requester now). It's a shame, because in some cases,
multiple objects may share the same hostname and may benefit from a
resolution being performed by a third party.
This patch introduces the notion of queues, which are basically lists of
either currently running resolution or waiting ones.
The resolutions are now available as a pool, which belongs to the resolvers.
The pool has has a default size of 64 resolutions per resolvers and is
allocated at configuration parsing.
This patch introduces a bit of roundrobin in the records stored in our
local cache.
Purpose is to allow some kind of distribution of the IPs found in a
response.
Note that distribution properly applies only when the IP used by many
requesters disappear and is replaced by an other one.
ancount is the number of answers available in a DNS response.
Before this patch, HAProxy used to store the ancount found in the buffer
(sent by the DNS server).
Unfortunately, this is now inaccurate and does not correspond to the
number of records effectively stored in our local version of the
response. In Example, the CNAMEs are not stored.
This patch updates ancount field in to make it match what is effectively
stored in our version.
Introduction of a DNS response LRU cache in HAProxy.
When a positive response is received from a DNS server, HAProxy stores
it in the struct resolution and then also populates a LRU cache with the
response.
For now, the key in the cache is a XXHASH64 of the hostname in the
domain name format concatened to the query type in string format.
Prior this patch, the DNS responses were stored in a pre-allocated
memory area (allocated at HAProxy's startup).
The problem is that this memory is erased for each new DNS responses
received and processed.
This patch removes the global memory allocation (which was not thread
safe by the way) and introduces a storage of the dns response in the
struct
resolution.
The memory in the struct resolution is also reserved at start up and is
thread safe, since each resolution structure will have its own memory
area.
For now, we simply store the response and use it atomically per
response per server.
In the process of breaking links between dns_* functions and other
structures (mainly server and a bit of resolution), the function
dns_get_ip_from_response needs to be reworked: it now can call
"callback" functions based on resolution's owner type to allow modifying
the way the response is processed.
For now, main purpose of the callback function is to check that an IP
address is not already affected to an element of the same type.
For now, only server type has a callback.
This patch introduces a some re-organisation around the DNS code in
HAProxy.
1. make the dns_* functions less dependent on 'struct server' and 'struct resolution'.
With this in mind, the following changes were performed:
- 'struct dns_options' has been removed from 'struct resolution' (well,
we might need it back at some point later, we'll see)
==> we'll use the 'struct dns_options' from the owner of the resolution
- dns_get_ip_from_response(): takes a 'struct dns_options' instead of
'struct resolution'
==> so the caller can pass its own dns options to get the most
appropriate IP from the response
- dns_process_resolve(): struct dns_option is deduced from new
resolution->requester_type parameter
2. add hostname_dn and hostname_dn_len into struct server
In order to avoid recomputing a server's hostname into its domain name
format (and use a trash buffer to store the result), it is safer to
compute it once at configuration parsing and to store it into the struct
server.
In the mean time, the struct resolution linked to the server doesn't
need anymore to store the hostname in domain name format. A simple
pointer to the server one will make the trick.
The function srv_alloc_dns_resolution() properly manages everything for
us: memory allocation, pointer updates, etc...
3. move resolvers pointer into struct server
This patch makes the pointer to struct dns_resolvers from struct
dns_resolution obsolete.
Purpose is to make the resolution as "neutral" as possible and since the
requester is already linked to the resolvers, then we don't need this
information anymore in the resolution itself.
A couple of new functions to allocate and free memory for a DNS
resolution structure. Main purpose is to to make the code related to DNS
more consistent.
They allocate or free memory for the structure itself. Later, if needed,
they should also allocate / free the buffers, etc, used by this structure.
They don't set/unset any parameters, this is the role of the caller.
This patch also implement calls to these function eveywhere it is
required.
The default len of request uri in log messages is 1024. In some use
cases, you need to keep the long trail of GET parameters. The only
way to increase this len is to recompile with DEFINE=-DREQURI_LEN=2048.
This commit introduces a tune.http.logurilen configuration directive,
allowing to tune this at runtime.
This patch ensure that the children will exit when the master quits,
even if the master didn't send any signal.
The master and the workers are connected through a pipe, when the pipe
closes the children leave.
This option exits every workers when one of the current workers die.
It allows you to monitor the master process in order to relaunch
everything on a failure.
For example it can be used with systemd and Restart=on-failure in a spec
file.
In master worker mode, you can't specify the stats socket where you get
your listeners FDs on a reload, because the command line of the re-exec
is launched by the master.
To solve the problem, when -x is found on the command line, its
parameter is rewritten on a reexec with the first stats socket with the
capability to send sockets. It tries to reuse the original parameter if
it has this capability.
In Master Worker mode, when the reloading of the configuration fail,
the process is exiting leaving the children without their father.
To handle this, we register an exit function with atexit(3), which is
reexecuting the binary in a special mode. This particular mode of
HAProxy don't reload the configuration, it only loops on wait().
The master-worker will reload itself on SIGUSR2/SIGHUP
It's inherited from the systemd wrapper, when the SIGUSR2 signal is
received, the master process will reexecute itself with the -sf flag
followed by the PIDs of the children.
In the systemd wrapper, the children were using a pipe to notify when
the config has been parsed and when the new process is ready. The goal
was to ensure that the process couldn't reload during the parsing of the
configuration, before signals were send to old process.
With the new mworker model, the master parses the configuration and is
aware of all the children. We don't need a pipe, but we need to block
those signals before the end of a reload, to ensure that the process
won't be killed during a reload.
The SIGUSR1 signal is forwarded to the children to soft-stop HAProxy.
The SIGTERM and SIGINT signals are forwarded to the children in order to
terminate them.
This commit remove the -Ds systemd mode in HAProxy in order to replace
it by a more generic master worker system. It aims to replace entirely
the systemd wrapper in the near future.
The master worker mode implements a new way of managing HAProxy
processes. The master is in charge of parsing the configuration
file and is responsible for spawning child processes.
The master worker mode can be invoked by using the -W flag. It can be
used either in background mode (-D) or foreground mode. When used in
background mode, the master will fork to daemonize.
In master worker background mode, chroot, setuid and setgid are done in
each child rather than in the master process, because the master process
will still need access to filesystem to reload the configuration.
This patch adds the support of a maximum of 32 engines
in async mode.
Some tests have been done using 2 engines simultaneously.
This patch also removes specific 'async' attribute from the connection
structure. All the code relies only on Openssl functions.
ssl-mode-async is a global configuration parameter which enables
asynchronous processing in OPENSSL for all SSL connections haproxy
handles. With SSL_MODE_ASYNC set, TLS I/O operations may indicate a
retry with SSL_ERROR_WANT_ASYNC with this mode set if an asynchronous
capable engine is used to perform cryptographic operations. Currently
async mode only supports one async-capable engine.
This is the latest version of the patchset which includes Emeric's
updates :
- improved async fd cleaning when openssl reports an fd to delete
- prevent conn_fd_handler from calling SSL_{read,write,handshake} until
the async fd is ready, as these operations are very slow and waste CPU
- postpone of SSL_free to ensure the async operation can complete and
does not cause a dereference a released SSL.
- proper removal of async fd from the fdtab and removal of the unused async
flag.
This patch adds the global 'ssl-engine' keyword. First arg is an engine
identifier followed by a list of default_algorithms the engine will
operate.
If the openssl version is too old, an error is reported when the option
is used.
When HAProxy is running with multiple processes and some listeners
arebound to processes, the unused sockets were not closed in the other
processes. The aim was to be able to send those listening sockets using
the -x option.
However to ensure the previous behavior which was to close those
sockets, we provided the "no-unused-socket" global option.
This patch changes this behavior, it will close unused sockets which are
not in the same process as an expose-fd socket, making the
"no-unused-socket" option useless.
The "no-unused-socket" option was removed in this patch.
This patch changes the stats socket rights for allowing the sending of
listening sockets.
The previous behavior was to allow any unix stats socket with admin
level to send sockets. It's not possible anymore, you have to set this
option to activate the socket sending.
Example:
stats socket /var/run/haproxy4.sock mode 666 expose-fd listeners level user process 4
The current level variable use only 2 bits for storing the 3 access
level (user, oper and admin).
This patch add a bitmask which allows to use the remaining bits for
other usage.
In the case of a Lua sample-fetch or converter doesn't return any
value, an acces outside the Lua stack can be performed. This patch
check the stack size before converting the top value to a HAProxy
internal sample.
A workaround consist to check that a value value is always returned
with sample fetches and converters.
This patch should be backported in the version 1.6 and 1.7
Add "b64dec" as a new converter which can be used to decode a base64
encoded string into its binary representation. It performs the inverse
operation of the "base64" converter.
Some DNS related network sockets were closed without unregistering their file
descriptors from their underlying kqueue event sets. This patch replaces calls to
close() by fd_delete() calls to that to delete such events attached to DNS
network sockets from the kqueue before closing the sockets.
The bug was introduced by commit 26c6eb8 ("BUG/MAJOR: dns: restart sockets
after fork()") which was backported in 1.7 so this fix has to be backported
there as well.
Thanks to Jim Pingle who reported it and indicated the faulty commit, and
to Lukas Tribus for the trace showing the bad file descriptor.
In haproxy < 1.8, no-sslv3/no-tlsv1x are ignored when force-sslv3/force-tlsv1x
is used (without warning). With this patch, no-sslv3/no-tlsv1x are ignored when
ssl-min-ver or ssl-max-ver is used (with warning).
When all SSL/TLS versions are disable: generate an error, not a warning.
example: ssl-min-ver TLSV1.3 (or force-tlsv13) with a openssl <= 1.1.0.
'ssl-min-ver' and 'ssl-max-ver' with argument SSLv3, TLSv1.0, TLSv1.1, TLSv1.2
or TLSv1.3 limit the SSL negotiation version to a continuous range. ssl-min-ver
and ssl-max-ver should be used in replacement of no-tls* and no-sslv3. Warning
and documentation are set accordingly.
Plan is to add min-tlsxx max-tlsxx configuration, more consistent than no-tlsxx.
Find the real min/max versions (openssl capabilities and haproxy configuration)
and generate warning with bad versions range.
'no-tlsxx' can generate 'holes':
"The list of protocols available can be further limited using the SSL_OP_NO_X
options of the SSL_CTX_set_options or SSL_set_options functions. Clients should
avoid creating 'holes' in the set of protocols they support, when disabling a
protocol, make sure that you also disable either all previous or all subsequent
protocol versions. In clients, when a protocol version is disabled without
disabling all previous protocol versions, the effect is to also disable all
subsequent protocol versions."
To not break compatibility, "holes" is authorized with warning, because openssl
1.1.0 and boringssl deal with it (keep the upper or lower range depending the
case and version).
Plan is to add min-tlsxx max-tlsxx configuration, more consistent than no-tlsxx.
This patch introduce internal min/max and replace force-tlsxx implementation.
SSL method configuration is store in 'struct tls_version_filter'.
SSL method configuration to openssl setting is abstract in 'methodVersions' table.
With openssl < 1.1.0, SSL_CTX_set_ssl_version is used for force (min == max).
With openssl >= 1.1.0, SSL_CTX_set_min/max_proto_version is used.
Plan is to add min-tlsxx max-tlsxx configuration, more consistent than no-tlsxx.
min-tlsxx and max-tlsxx can be overwrite on local definition. This directives
should be the only ones needed in default-server.
To simplify next patches (rework of tls versions settings with min/max) all
ssl/tls version settings relative to default-server are reverted first:
remove: 'sslv3', 'tls*', 'no-force-sslv3', 'no-force-tls*'.
remove from default-server: 'no-sslv3', 'no-tls*'.
Note:
. force-tlsxx == min-tlsxx + max-tlsxx : would be ok in default-server.
. no-tlsxx is keep for compatibility: should not be propagated to default-server.
James Brown reported that agent-check mistakenly sends the proxy
protocol header when it's configured. This is obviously wrong as
the agent is an independant servie and not a traffic port, let's
disable this.
This fix must be backported to 1.7 and possibly 1.6.
This patch adds a new stats socket command to modify server
FQDNs at run time.
Its syntax:
set server <backend>/<server> fqdn <FQDN>
This patch also adds FQDNs to server state file at the end
of each line for backward compatibility ("-" if not present).
This patch replaces the calls to TLSvX_X_client/server/_method
by the new TLS_client/server_method and it uses the new functions
SSL_set_min_proto_version and SSL_set_max_proto_version, setting them
at the wanted protocol version using 'force-' statements.
The sample fetch returns all headers including the last jump line.
The last jump line is used to determine if the block of headers is
truncated or not.
These encoding functions does general stuff and can be used in
other context than spoe. This patch moves the function spoe_encode_varint
and spoe_decode_varint from spoe to common. It also remove the prefix spoe.
These functions will be used for encoding values in new binary sample fetch.
in chash_get_server_hash, we find the nearest server entries both
before and after the request hash. If the next and prev entries both
point to the same server, the function would exit early and return that
server, to save work.
Before hash-balance-factor this was a valid optimization -- one of nsrv
and psrv would definitely be chosen, so if they are the same there's no
need to choose between them. But with hash-balance-factor it's possible
that adding another request to that server would overload it
(chash_server_is_eligible returns false) and we go further around the
ring. So it's not valid to return before checking for that.
This commit simply removes the early return, as it provides a minimal
savings even when it's correct.
The priv context is not cleaned when we set a new priv context.
This is caused by a stupid swap between two parameter of the
luaL_unref() function.
workaround: use set_priv only once when we process a stream.
This patch should be backported in version 1.7 and 1.6
This patch adds server_template_init() function used to initialize servers
from server templates. It is called just after having parsed a 'server-template'
line.
This patch makes backend sections support 'server-template' new keyword.
Such 'server-template' objects are parsed similarly to a 'server' object
by parse_server() function, but its first arguments are as follows:
server-template <ID prefix> <nb | range> <ip | fqdn>:<port> ...
The remaining arguments are the same as for 'server' lines.
With such server template declarations, servers may be allocated with IDs
built from <ID prefix> and <nb | range> arguments.
For instance declaring:
server-template foo 1-5 google.com:80 ...
or
server-template foo 5 google.com:80 ...
would be equivalent to declare:
server foo1 google.com:80 ...
server foo2 google.com:80 ...
server foo3 google.com:80 ...
server foo4 google.com:80 ...
server foo5 google.com:80 ...
This patch moves the code which is responsible of finalizing server initializations
after having fully parsed a 'server' line (health-check, agent check and SNI expression
initializations) from parse_server() to new functions.
This patch moves the code responsible of copying default server settings
to a new server instance from parse_server() function to new defsrv_*_cpy()
functions which may be used both during server lines parsing and during server
templates initializations to come.
These defsrv_*_cpy() do not make any reference to anything else than default
server settings.
'resolvers' setting was not duplicated from default server setting to
new server instances when parsing 'server' lines.
This fix is simple: strdup() default resolvers <id> string argument after
having allocated a new server when parsing 'server' lines.
This patch must be backported to 1.7 and 1.6.
This bug occurs when a redirect rule is applied during the request analysis on a
persistent connection, on a proxy without any server. This means, in a frontend
section or in a listen/backend section with no "server" line.
Because the transaction processing is shortened, no server can be selected to
perform the connection. So if we try to establish it, this fails and a 503 error
is returned, while a 3XX was already sent. So, in this case, HAProxy generates 2
replies and only the first one is expected.
Here is the configuration snippet to easily reproduce the problem:
listen www
bind :8080
mode http
timeout connect 5s
timeout client 3s
timeout server 6s
redirect location /
A simple HTTP/1.1 request without body will trigger the bug:
$ telnet 0 8080
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
GET / HTTP/1.1
HTTP/1.1 302 Found
Cache-Control: no-cache
Content-length: 0
Location: /
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
Connection closed by foreign host.
[wt: only 1.8-dev is impacted though the bug is present in older ones]
In server_parse_sni_expr(), we use the "proxy" global variable, when we
should probably be using "px" given as an argument.
It happens to work by accident right now, but may not in the future.
[wt: better backport it]
Overall we do have an issue with the severity of a number of errors. Most
fatal errors are reported with ERR_FATAL (which prevents startup) and not
ERR_ABORT (which stops parsing ASAP), but check_config_validity() is still
called on ERR_FATAL, and will most of the time report bogus errors. This
is what caused smp_resolve_args() to be called on a number of unparsable
ACLs, and it also is what reports incorrect ordering or unresolvable
section names when certain entries could not be properly parsed.
This patch stops this domino effect by simply aborting before trying to
further check and resolve the configuration when it's already know that
there are fatal errors.
A concrete example comes from this config :
userlist users :
user foo insecure-password bar
listen foo
bind :1234
mode htttp
timeout client 10S
timeout server 10s
timeout connect 10s
stats uri /stats
stats http-request auth unless { http_auth(users) }
http-request redirect location /index.html if { path / }
It contains a colon after the userlist name, a typo in the client timeout value,
another one in "mode http" which cause some other configuration elements not to
be properly handled.
Previously it would confusingly report :
[ALERT] 108/114851 (20224) : parsing [err-report.cfg:1] : 'userlist' cannot handle unexpected argument ':'.
[ALERT] 108/114851 (20224) : parsing [err-report.cfg:6] : unknown proxy mode 'htttp'.
[ALERT] 108/114851 (20224) : parsing [err-report.cfg:7] : unexpected character 'S' in 'timeout client'
[ALERT] 108/114851 (20224) : Error(s) found in configuration file : err-report.cfg
[ALERT] 108/114851 (20224) : parsing [err-report.cfg:11] : unable to find userlist 'users' referenced in arg 1 of ACL keyword 'http_auth' in proxy 'foo'.
[WARNING] 108/114851 (20224) : config : missing timeouts for proxy 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 108/114851 (20224) : config : 'stats' statement ignored for proxy 'foo' as it requires HTTP mode.
[WARNING] 108/114851 (20224) : config : 'http-request' rules ignored for proxy 'foo' as they require HTTP mode.
[ALERT] 108/114851 (20224) : Fatal errors found in configuration.
The "requires HTTP mode" errors are just pollution resulting from the
improper spelling of this mode earlier. The unresolved reference to the
userlist is caused by the extra colon on the declaration, and the warning
regarding the missing timeouts is caused by the wrong character.
Now it more accurately reports :
[ALERT] 108/114900 (20225) : parsing [err-report.cfg:1] : 'userlist' cannot handle unexpected argument ':'.
[ALERT] 108/114900 (20225) : parsing [err-report.cfg:6] : unknown proxy mode 'htttp'.
[ALERT] 108/114900 (20225) : parsing [err-report.cfg:7] : unexpected character 'S' in 'timeout client'
[ALERT] 108/114900 (20225) : Error(s) found in configuration file : err-report.cfg
[ALERT] 108/114900 (20225) : Fatal errors found in configuration.
Despite not really a fix, this patch should be backported at least to 1.7,
possibly even 1.6, and 1.5 since it hardens the config parser against
certain bad situations like the recently reported use-after-free and the
last null dereference.
Stephan Zeisberg reported another dirty abort case which can be triggered
with this simple config (where file "d" doesn't exist) :
backend b1
stats auth a:b
acl auth_ok http_auth(c) -f d
This issue was brought in 1.5-dev9 by commit 34db108 ("MAJOR: acl: make use
of the new argument parsing framework") when prune_acl_expr() started to
release arguments. The arg pointer is set to NULL but not its length.
Because of this, later in smp_resolve_args(), the argument is still seen
as valid (since only a test on the length is made as in all other places),
and the NULL pointer is dereferenced.
This patch properly clears the lengths to avoid such tests.
This fix needs to be backported to 1.7, 1.6, and 1.5.
Any valid keyword could not be parsed anymore if provided after 'source' keyword.
This was due to the fact that 'source' number of arguments is variable.
So, as its parser srv_parse_source() is the only one who may know how many arguments
was provided after 'source' keyword, it updates 'cur_arg' variable (the index
in the line of the current arg to be parsed), this is a good thing.
This variable is also incremented by one (to skip the 'source' keyword).
This patch disable this behavior.
Should have come with dba9707 commit.
'usesrc' setting is not permitted on 'server' lines if not provided after
'source' setting. This is now also the case on 'default-server' lines.
Without this patch parse_server() parser displayed that 'usersrc' is
an unknown keyword.
Should have come with dba9707 commit.
Make the systemd wrapper chech if HAPROXY_STATS_SOCKET if set.
If set, it will use it as an argument to the "-x" option, which makes
haproxy asks for any listening socket, on the stats socket, in order
to achieve reloads with no new connection lost.
When running with multiple process, if some proxies are just assigned
to some processes, the other processes will just close the file descriptors
for the listening sockets. However, we may still have to provide those
sockets when reloading, so instead we just try hard to pretend those proxies
are dead, while keeping the sockets opened.
A new global option, no-reused-socket", has been added, to restore the old
behavior of closing the sockets not bound to this process.
Try to reuse any socket from the old process, provided by the "-x" flag,
before binding a new one, assuming it is compatible.
"Compatible" here means same address and port, same namspace if any,
same interface if any, and that the following flags are the same :
LI_O_FOREIGN, LI_O_V6ONLY and LI_O_V4V6.
Also change tcp_bind_listener() to always enable/disable socket options,
instead of just doing so if it is in the configuration file, as the option
may have been removed, ie TCP_FASTOPEN may have been set in the old process,
and removed from the new configuration, so we have to disable it.
Add the "-x" flag, that takes a path to a unix socket as an argument. If
used, haproxy will connect to the socket, and asks to get all the
listening sockets from the old process. Any failure is fatal.
This is needed to get seamless reloads on linux.
Add a new command that will send all the listening sockets, via the
stats socket, and their properties.
This is a first step to workaround the linux problem when reloading
haproxy.
luaL_setstate() uses malloc() to initialize the first objects, and only
after this we replace the allocator. This creates trouble when replacing
the standard memory allocators during debugging sessions since the new
allocator is used to realloc() an area previously allocated using the
default malloc().
Lua provides lua_newstate() in addition to luaL_newstate(), which takes
an allocator for the initial malloc. This is exactly what we need, and
this patch does this and fixes the problem. The now useless call to
lua_setallocf() could be removed.
This has no impact outside of debugging sessions and there's no need to
backport this.
This reverts commit 266b1a8 ("MEDIUM: server: Inherit CLI weight changes and
agent-check weight responses") from Michal Idzikowski, which is still broken.
It stops propagating weights at the first error encountered, leaving servers
in a random state depending on what LB algorithms are used on other servers
tracking the one experiencing the weight change. It's unsure what the best
way to address this is, but we cannot leave the servers in an inconsistent
state between farms. For example :
backend site1
mode http
balance uri
hash-type consistent
server s1 127.0.0.1:8001 weight 10 track servers/s1
backend site2
mode http
balance uri
server s1 127.0.0.1:8001 weight 10 track servers/s1
backend site3
mode http
balance uri
hash-type consistent
server s1 127.0.0.1:8001 weight 10 track servers/s1
backend servers
server s1 127.0.0.1:8001 weight 10 check inter 1s
The weight change is applied on "servers/s1". It tries to propagate
to the servers tracking it, which are site1/s1, site2/s1 and site3/s1.
Let's say that "weight 50%" is requested. The servers are linked in
reverse-order, so the change is applied to "servers/s1", then to
"site3/s1", then to "site2/s1" and this one fails and rejects the
change. The change is aborted and never propagated to "site1/s1",
which keeps the server in a different state from "site3/s1". At the
very least, in case of error, the changes should probably be unrolled.
Also the error reported on the CLI (when changing from the CLI) simply says :
Backend is using a static LB algorithm and only accepts weights '0%' and '100%'.
Without more indications what the faulty backend is.
Let's revert this change for now, as initially feared it will definitely
cause more harm than good and at least needs to be revisited. It was never
backported to any stable branch so no backport is needed.
In case of error it's very difficult to properly unroll the list of
unresolved args because the error can appear on any argument, and all
of them share the same memory area, pointed to by one or multiple links
from the global args list. The problem is that till now the arguments
themselves were released and were not unlinked from the list, causing
all forms of corruption in deinit() when quitting on the error path if
an argument couldn't properly parse.
A few attempts at trying to selectively spot the appropriate list entries
to kill before releasing the shared area have only resulted in complicating
the code and pushing the issue further.
Here instead we use a simple conservative approach : prune_acl_expr()
only tries to free the argument array if none of the arguments were
unresolved, which means that none of them was added to the arg list.
It's unclear what a better approach would be. We could imagine that
args would point to their own location in the shared list but given
that this extra cost and complexity would be added exclusively in
order to cleanly release everything when we're exiting due to a config
parse error, this seems quite overkill.
This bug was noticed on 1.7 and likely affects 1.6 and 1.5, so the fix
should be backported. It's not easy to reproduce it, as the reproducers
randomly work depending on how memory is allocated. One way to do it is
to use parsable and non-parsable patterns on an ACL making use of args.
Big thanks to Stephan Zeisberg for reporting this problem with a working
reproducer.
If make_arg_list() fails to process an argument after having queued an
unresolvable one, it frees the allocated argument list but doesn't remove
the referenced args from the arg list. This causes a use after free or a
double free if the same location was reused, during the deinit phase upon
exit after reporting the error.
Since it's not easy to properly unlinked all elements, we only release the
args block if none of them was queued in the list.
When agent-check or CLI command executes relative weight change this patch
propagates it to tracking server allowing grouping many backends running on
same server underneath. Additionaly in case with many src IPs many backends
can have shared state checker, so there won't be unnecessary health checks.
[wt: Note: this will induce some behaviour change on some setups]
Take care of arg_list_clone() returning NULL in arg_list_add() since
the former does it too. It's only used during parsing so the impact
is very low.
Can be backported to 1.7, 1.6 and 1.5.
The error doesn't prevent checking for other errors after an invalid
character was detected in an ACL name. Better quit ASAP to avoid risking
to emit garbled and confusing error messages if something else fails on
the same line.
This should be backported to 1.7, 1.6 and 1.5.
AF_INET address family was always used to create sockets to connect
to name servers. This prevented any connection over IPv6 from working.
This fix must be backported to 1.7 and 1.6.
Commit 0ebb511 ("MINOR: tools: add a generic hexdump function for debugging")
introduced debug_hexdump() which is used to dump a memory area during
debugging sessions. This function can start at an unaligned offset and
uses a signed comparison to know where to start dumping from. But the
operation mixes signed and unsigned, making the test incorrect and causing
the following warnings to be emitted under Clang :
src/standard.c:3775:14: warning: comparison of unsigned expression >= 0 is
always true [-Wtautological-compare]
if (b + j >= 0 && b + j < len)
~~~~~ ^ ~
Make "j" signed instead. At the moment this function is not used at all
so there's no impact. Thanks to Dmitry Sivachenko for reporting it. No
backport is needed.
Commit 05ee213 ("MEDIUM: stats: Add JSON output option to show (info|stat)")
used to pass argument "uri" to the aforementionned function which doesn't
take any. It's probably a leftover from multiple iterations of the same
patchset. Spotted by Dmitry Sivachenko. No backport is needed.
IP*_BINDANY is not defined under this system thus it is
necessary to make those fields access since CONFIG_HAP_TRANSPARENT
is not defined.
[wt: problem introduced late in 1.8-dev. The same fix was also reported
by Steven Davidovitz]
Each time we generate a dynamic cookie, we try to make sure the same
cookie hasn't been generated for another server, it's very unlikely, but
it may happen.
We only have to check that for the servers in the same proxy, no, need to
check in others, plus the code was buggy and would always check in the
first proxy of the proxy list.
Released version 1.8-dev1 with the following main changes :
- BUG/MEDIUM: proxy: return "none" and "unknown" for unknown LB algos
- BUG/MINOR: stats: make field_str() return an empty string on NULL
- DOC: Spelling fixes
- BUG/MEDIUM: http: Fix tunnel mode when the CONNECT method is used
- BUG/MINOR: http: Keep the same behavior between 1.6 and 1.7 for tunneled txn
- BUG/MINOR: filters: Protect args in macros HAS_DATA_FILTERS and IS_DATA_FILTER
- BUG/MINOR: filters: Invert evaluation order of HTTP_XFER_BODY and XFER_DATA analyzers
- BUG/MINOR: http: Call XFER_DATA analyzer when HTTP txn is switched in tunnel mode
- BUG/MAJOR: stream: fix session abort on resource shortage
- OPTIM: stream-int: don't disable polling anymore on DONT_READ
- BUG/MINOR: cli: allow the backslash to be escaped on the CLI
- BUG/MEDIUM: cli: fix "show stat resolvers" and "show tls-keys"
- DOC: Fix map table's format
- DOC: Added 51Degrees conv and fetch functions to documentation.
- BUG/MINOR: http: don't send an extra CRLF after a Set-Cookie in a redirect
- DOC: mention that req_tot is for both frontends and backends
- BUG/MEDIUM: variables: some variable name can hide another ones
- MINOR: lua: Allow argument for actions
- BUILD: rearrange target files by build time
- CLEANUP: hlua: just indent functions
- MINOR: lua: give HAProxy variable access to the applets
- BUG/MINOR: stats: fix be/sessions/max output in html stats
- MINOR: proxy: Add fe_name/be_name fetchers next to existing fe_id/be_id
- DOC: lua: Documentation about some entry missing
- DOC: lua: Add documentation about variable manipulation from applet
- MINOR: Do not forward the header "Expect: 100-continue" when the option http-buffer-request is set
- DOC: Add undocumented argument of the trace filter
- DOC: Fix some typo in SPOE documentation
- MINOR: cli: Remove useless call to bi_putchk
- BUG/MINOR: cli: be sure to always warn the cli applet when input buffer is full
- MINOR: applet: Count number of (active) applets
- MINOR: task: Rename run_queue and run_queue_cur counters
- BUG/MEDIUM: stream: Save unprocessed events for a stream
- BUG/MAJOR: Fix how the list of entities waiting for a buffer is handled
- BUILD/MEDIUM: Fixing the build using LibreSSL
- BUG/MEDIUM: lua: In some case, the return of sample-fetches is ignored (2)
- SCRIPTS: git-show-backports: fix a harmless typo
- SCRIPTS: git-show-backports: add -H to use the hash of the commit message
- BUG/MINOR: stream-int: automatically release SI_FL_WAIT_DATA on SHUTW_NOW
- CLEANUP: applet/lua: create a dedicated ->fcn entry in hlua_cli context
- CLEANUP: applet/table: add an "action" entry in ->table context
- CLEANUP: applet: remove the now unused appctx->private field
- DOC: lua: documentation about time parser functions
- DOC: lua: improve links
- DOC: lua: section declared twice
- MEDIUM: cli: 'show cli sockets' list the CLI sockets
- BUG/MINOR: cli: "show cli sockets" wouldn't list all processes
- BUG/MINOR: cli: "show cli sockets" would always report process 64
- CLEANUP: lua: rename one of the lua appctx union
- BUG/MINOR: lua/cli: bad error message
- MEDIUM: lua: use memory pool for hlua struct in applets
- MINOR: lua/signals: Remove Lua part from signals.
- DOC: cli: show cli sockets
- MINOR: cli: automatically enable a CLI I/O handler when there's no parser
- CLEANUP: memory: remove the now unused cli_parse_show_pools() function
- CLEANUP: applet: group all CLI contexts together
- CLEANUP: stats: move a misplaced stats context initialization
- MINOR: cli: add two general purpose pointers and integers in the CLI struct
- MINOR: appctx/cli: remove the cli_socket entry from the appctx union
- MINOR: appctx/cli: remove the env entry from the appctx union
- MINOR: appctx/cli: remove the "be" entry from the appctx union
- MINOR: appctx/cli: remove the "dns" entry from the appctx union
- MINOR: appctx/cli: remove the "server_state" entry from the appctx union
- MINOR: appctx/cli: remove the "tlskeys" entry from the appctx union
- CONTRIB: tcploop: add limits.h to fix build issue with some compilers
- MINOR/DOC: lua: just precise one thing
- DOC: fix small typo in fe_id (backend instead of frontend)
- BUG/MINOR: Fix the sending function in Lua's cosocket
- BUG/MINOR: lua: memory leak executing tasks
- BUG/MINOR: lua: bad return code
- BUG/MINOR: lua: memleak when Lua/cli fails
- MEDIUM: lua: remove Lua struct from session, and allocate it with memory pools
- CLEANUP: haproxy: statify unexported functions
- MINOR: haproxy: add a registration for build options
- CLEANUP: wurfl: use the build options list to report it
- CLEANUP: 51d: use the build options list to report it
- CLEANUP: da: use the build options list to report it
- CLEANUP: namespaces: use the build options list to report it
- CLEANUP: tcp: use the build options list to report transparent modes
- CLEANUP: lua: use the build options list to report it
- CLEANUP: regex: use the build options list to report the regex type
- CLEANUP: ssl: use the build options list to report the SSL details
- CLEANUP: compression: use the build options list to report the algos
- CLEANUP: auth: use the build options list to report its support
- MINOR: haproxy: add a registration for post-check functions
- CLEANUP: checks: make use of the post-init registration to start checks
- CLEANUP: filters: use the function registration to initialize all proxies
- CLEANUP: wurfl: make use of the late init registration
- CLEANUP: 51d: make use of the late init registration
- CLEANUP: da: make use of the late init registration code
- MINOR: haproxy: add a registration for post-deinit functions
- CLEANUP: wurfl: register the deinit function via the dedicated list
- CLEANUP: 51d: register the deinitialization function
- CLEANUP: da: register the deinitialization function
- CLEANUP: wurfl: move global settings out of the global section
- CLEANUP: 51d: move global settings out of the global section
- CLEANUP: da: move global settings out of the global section
- MINOR: cfgparse: add two new functions to check arguments count
- MINOR: cfgparse: move parsing of "ca-base" and "crt-base" to ssl_sock
- MEDIUM: cfgparse: move all tune.ssl.* keywords to ssl_sock
- MEDIUM: cfgparse: move maxsslconn parsing to ssl_sock
- MINOR: cfgparse: move parsing of ssl-default-{bind,server}-ciphers to ssl_sock
- MEDIUM: cfgparse: move ssl-dh-param-file parsing to ssl_sock
- MEDIUM: compression: move the zlib-specific stuff from global.h to compression.c
- BUG/MEDIUM: ssl: properly reset the reused_sess during a forced handshake
- BUG/MEDIUM: ssl: avoid double free when releasing bind_confs
- BUG/MINOR: stats: fix be/sessions/current out in typed stats
- MINOR: tcp-rules: check that the listener exists before updating its counters
- MEDIUM: spoe: don't create a dummy listener for outgoing connections
- MINOR: listener: move the transport layer pointer to the bind_conf
- MEDIUM: move listener->frontend to bind_conf->frontend
- MEDIUM: ssl: remote the proxy argument from most functions
- MINOR: connection: add a new prepare_bind_conf() entry to xprt_ops
- MEDIUM: ssl_sock: implement ssl_sock_prepare_bind_conf()
- MINOR: connection: add a new destroy_bind_conf() entry to xprt_ops
- MINOR: ssl_sock: implement ssl_sock_destroy_bind_conf()
- MINOR: server: move the use_ssl field out of the ifdef USE_OPENSSL
- MINOR: connection: add a minimal transport layer registration system
- CLEANUP: connection: remove all direct references to raw_sock and ssl_sock
- CLEANUP: connection: unexport raw_sock and ssl_sock
- MINOR: connection: add new prepare_srv()/destroy_srv() entries to xprt_ops
- MINOR: ssl_sock: implement and use prepare_srv()/destroy_srv()
- CLEANUP: ssl: move tlskeys_finalize_config() to a post_check callback
- CLEANUP: ssl: move most ssl-specific global settings to ssl_sock.c
- BUG/MINOR: backend: nbsrv() should return 0 if backend is disabled
- BUG/MEDIUM: ssl: for a handshake when server-side SNI changes
- BUG/MINOR: systemd: potential zombie processes
- DOC: Add timings events schemas
- BUILD: lua: build failed on FreeBSD.
- MINOR: samples: add xx-hash functions
- MEDIUM: regex: pcre2 support
- BUG/MINOR: option prefer-last-server must be ignored in some case
- MINOR: stats: Support "select all" for backend actions
- BUG/MINOR: sample-fetches/stick-tables: bad type for the sample fetches sc*_get_gpt0
- BUG/MAJOR: channel: Fix the definition order of channel analyzers
- BUG/MINOR: http: report real parser state in error captures
- BUILD: scripts: automatically update the branch in version.h when releasing
- MINOR: tools: add a generic hexdump function for debugging
- BUG/MAJOR: http: fix risk of getting invalid reports of bad requests
- MINOR: http: custom status reason.
- MINOR: connection: add sample fetch "fc_rcvd_proxy"
- BUG/MINOR: config: emit a warning if http-reuse is enabled with incompatible options
- BUG/MINOR: tools: fix off-by-one in port size check
- BUG/MEDIUM: server: consider AF_UNSPEC as a valid address family
- MEDIUM: server: split the address and the port into two different fields
- MINOR: tools: make str2sa_range() return the port in a separate argument
- MINOR: server: take the destination port from the port field, not the addr
- MEDIUM: server: disable protocol validations when the server doesn't resolve
- BUG/MEDIUM: tools: do not force an unresolved address to AF_INET:0.0.0.0
- BUG/MINOR: ssl: EVP_PKEY must be freed after X509_get_pubkey usage
- BUG/MINOR: ssl: assert on SSL_set_shutdown with BoringSSL
- MINOR: Use "500 Internal Server Error" for 500 error/status code message.
- MINOR: proto_http.c 502 error txt typo.
- DOC: add deprecation notice to "block"
- MINOR: compression: fix -vv output without zlib/slz
- BUG/MINOR: Reset errno variable before calling strtol(3)
- MINOR: ssl: don't show prefer-server-ciphers output
- OPTIM/MINOR: config: Optimize fullconn automatic computation loading configuration
- BUG/MINOR: stream: Fix how backend-specific analyzers are set on a stream
- MAJOR: ssl: bind configuration per certificat
- MINOR: ssl: add curve suite for ECDHE negotiation
- MINOR: checks: Add agent-addr config directive
- MINOR: cli: Add possiblity to change agent config via CLI/socket
- MINOR: doc: Add docs for agent-addr configuration variable
- MINOR: doc: Add docs for agent-addr and agent-send CLI commands
- BUILD: ssl: fix to build (again) with boringssl
- BUILD: ssl: fix build on OpenSSL 1.0.0
- BUILD: ssl: silence a warning reported for ERR_remove_state()
- BUILD: ssl: eliminate warning with OpenSSL 1.1.0 regarding RAND_pseudo_bytes()
- BUILD: ssl: kill a build warning introduced by BoringSSL compatibility
- BUG/MEDIUM: tcp: don't poll for write when connect() succeeds
- BUG/MINOR: unix: fix connect's polling in case no data are scheduled
- MINOR: server: extend the flags to 32 bits
- BUG/MINOR: lua: Map.end are not reliable because "end" is a reserved keyword
- MINOR: dns: give ability to dns_init_resolvers() to close a socket when requested
- BUG/MAJOR: dns: restart sockets after fork()
- MINOR: chunks: implement a simple dynamic allocator for trash buffers
- BUG/MEDIUM: http: prevent redirect from overwriting a buffer
- BUG/MEDIUM: filters: Do not truncate HTTP response when body length is undefined
- BUG/MEDIUM: http: Prevent replace-header from overwriting a buffer
- BUG/MINOR: http: Return an error when a replace-header rule failed on the response
- BUG/MINOR: sendmail: The return of vsnprintf is not cleanly tested
- BUG/MAJOR: ssl: fix a regression in ssl_sock_shutw()
- BUG/MAJOR: lua segmentation fault when the request is like 'GET ?arg=val HTTP/1.1'
- BUG/MEDIUM: config: reject anything but "if" or "unless" after a use-backend rule
- MINOR: http: don't close when redirect location doesn't start with "/"
- MEDIUM: boringssl: support native multi-cert selection without bundling
- BUG/MEDIUM: ssl: fix verify/ca-file per certificate
- BUG/MEDIUM: ssl: switchctx should not return SSL_TLSEXT_ERR_ALERT_WARNING
- MINOR: ssl: removes SSL_CTX_set_ssl_version call and cleanup CTX creation.
- BUILD: ssl: fix build with -DOPENSSL_NO_DH
- MEDIUM: ssl: add new sample-fetch which captures the cipherlist
- MEDIUM: ssl: remove ssl-options from crt-list
- BUG/MEDIUM: ssl: in bind line, ssl-options after 'crt' are ignored.
- BUG/MINOR: ssl: fix cipherlist captures with sustainable SSL calls
- MINOR: ssl: improved cipherlist captures
- BUG/MINOR: spoe: Fix soft stop handler using a specific id for spoe filters
- BUG/MINOR: spoe: Fix parsing of arguments in spoe-message section
- MAJOR: spoe: Add support of pipelined and asynchronous exchanges with agents
- MINOR: spoe: Add support for pipelining/async capabilities in the SPOA example
- MINOR: spoe: Remove SPOE details from the appctx structure
- MINOR: spoe: Add status code in error variable instead of hardcoded value
- MINOR: spoe: Send a log message when an error occurred during event processing
- MINOR: spoe: Check the scope of sample fetches used in SPOE messages
- MEDIUM: spoe: Be sure to wakeup the good entity waiting for a buffer
- MINOR: spoe: Use the min of all known max_frame_size to encode messages
- MAJOR: spoe: Add support of payload fragmentation in NOTIFY frames
- MINOR: spoe: Add support for fragmentation capability in the SPOA example
- MAJOR: spoe: refactor the filter to clean up the code
- MINOR: spoe: Handle NOTIFY frames cancellation using ABORT bit in ACK frames
- REORG: spoe: Move struct and enum definitions in dedicated header file
- REORG: spoe: Move low-level encoding/decoding functions in dedicated header file
- MINOR: spoe: Improve implementation of the payload fragmentation
- MINOR: spoe: Add support of negation for options in SPOE configuration file
- MINOR: spoe: Add "pipelining" and "async" options in spoe-agent section
- MINOR: spoe: Rely on alertif_too_many_arg during configuration parsing
- MINOR: spoe: Add "send-frag-payload" option in spoe-agent section
- MINOR: spoe: Add "max-frame-size" statement in spoe-agent section
- DOC: spoe: Update SPOE documentation to reflect recent changes
- MINOR: config: warn when some HTTP rules are used in a TCP proxy
- BUG/MEDIUM: ssl: Clear OpenSSL error stack after trying to parse OCSP file
- BUG/MEDIUM: cli: Prevent double free in CLI ACL lookup
- BUG/MINOR: Fix "get map <map> <value>" CLI command
- MINOR: Add nbsrv sample converter
- CLEANUP: Replace repeated code to count usable servers with be_usable_srv()
- MINOR: Add hostname sample fetch
- CLEANUP: Remove comment that's no longer valid
- MEDIUM: http_error_message: txn->status / http_get_status_idx.
- MINOR: http-request tarpit deny_status.
- CLEANUP: http: make http_server_error() not set the status anymore
- MEDIUM: stats: Add JSON output option to show (info|stat)
- MEDIUM: stats: Add show json schema
- BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the data layer
- MINOR: server: Add dynamic session cookies.
- MINOR: cli: Let configure the dynamic cookies from the cli.
- BUG/MINOR: checks: attempt clean shutw for SSL check
- CONTRIB: tcploop: make it build on FreeBSD
- CONTRIB: tcploop: fix time format to silence build warnings
- CONTRIB: tcploop: report action 'K' (kill) in usage message
- CONTRIB: tcploop: fix connect's address length
- CONTRIB: tcploop: use the trash instead of NULL for recv()
- BUG/MEDIUM: listener: do not try to rebind another process' socket
- BUG/MEDIUM server: Fix crash when dynamic is defined, but not key is provided.
- CLEANUP: config: Typo in comment.
- BUG/MEDIUM: filters: Fix channels synchronization in flt_end_analyze
- TESTS: add a test configuration to stress handshake combinations
- BUG/MAJOR: stream-int: do not depend on connection flags to detect connection
- BUG/MEDIUM: connection: ensure to always report the end of handshakes
- MEDIUM: connection: don't test for CO_FL_WAKE_DATA
- CLEANUP: connection: completely remove CO_FL_WAKE_DATA
- BUG: payload: fix payload not retrieving arbitrary lengths
- BUILD: ssl: simplify SSL_CTX_set_ecdh_auto compatibility
- BUILD: ssl: fix OPENSSL_NO_SSL_TRACE for boringssl and libressl
- BUG/MAJOR: http: fix typo in http_apply_redirect_rule
- MINOR: doc: 2.4. Examples should be 2.5. Examples
- BUG/MEDIUM: stream: fix client-fin/server-fin handling
- MINOR: fd: add a new flag HAP_POLL_F_RDHUP to struct poller
- BUG/MINOR: raw_sock: always perfom the last recv if RDHUP is not available
- OPTIM: poll: enable support for POLLRDHUP
- MINOR: kqueue: exclusively rely on the kqueue returned status
- MEDIUM: kqueue: take care of EV_EOF to improve polling status accuracy
- MEDIUM: kqueue: only set FD_POLL_IN when there are pending data
- DOC/MINOR: Fix typos in proxy protocol doc
- DOC: Protocol doc: add checksum, TLV type ranges
- DOC: Protocol doc: add SSL TLVs, rename CHECKSUM
- DOC: Protocol doc: add noop TLV
- MEDIUM: global: add a 'hard-stop-after' option to cap the soft-stop time
- MINOR: dns: improve DNS response parsing to use as many available records as possible
- BUG/MINOR: cfgparse: loop in tracked servers lists not detected by check_config_validity().
- MINOR: server: irrelevant error message with 'default-server' config file keyword.
- MINOR: server: Make 'default-server' support 'backup' keyword.
- MINOR: server: Make 'default-server' support 'check-send-proxy' keyword.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'non-stick' keyword.
- MINOR: server: Make 'default-server' support 'send-proxy' and 'send-proxy-v2 keywords.
- MINOR: server: Make 'default-server' support 'check-ssl' keyword.
- MINOR: server: Make 'default-server' support 'force-sslv3' and 'force-tlsv1[0-2]' keywords.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'no-ssl*' and 'no-tlsv*' keywords.
- MINOR: server: Make 'default-server' support 'ssl' keyword.
- MINOR: server: Make 'default-server' support 'send-proxy-v2-ssl*' keywords.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'verify' keyword.
- MINOR: server: Make 'default-server' support 'verifyhost' setting.
- MINOR: server: Make 'default-server' support 'check' keyword.
- MINOR: server: Make 'default-server' support 'track' setting.
- MINOR: server: Make 'default-server' support 'ca-file', 'crl-file' and 'crt' settings.
- MINOR: server: Make 'default-server' support 'redir' keyword.
- MINOR: server: Make 'default-server' support 'observe' keyword.
- MINOR: server: Make 'default-server' support 'cookie' keyword.
- MINOR: server: Make 'default-server' support 'ciphers' keyword.
- MINOR: server: Make 'default-server' support 'tcp-ut' keyword.
- MINOR: server: Make 'default-server' support 'namespace' keyword.
- MINOR: server: Make 'default-server' support 'source' keyword.
- MINOR: server: Make 'default-server' support 'sni' keyword.
- MINOR: server: Make 'default-server' support 'addr' keyword.
- MINOR: server: Make 'default-server' support 'disabled' keyword.
- MINOR: server: Add 'no-agent-check' server keyword.
- DOC: server: Add docs for "server" and "default-server" new "no-*" and other settings.
- MINOR: doc: fix use-server example (imap vs mail)
- BUG/MEDIUM: tcp: don't require privileges to bind to device
- BUILD: make the release script use shortlog for the final changelog
- BUILD: scripts: fix typo in announce-release error message
- CLEANUP: time: curr_sec_ms doesn't need to be exported
- BUG/MEDIUM: server: Wrong server default CRT filenames initialization.
- BUG/MEDIUM: peers: fix buffer overflow control in intdecode.
- BUG/MEDIUM: buffers: Fix how input/output data are injected into buffers
- BUG/MINOR: http: Fix conditions to clean up a txn and to handle the next request
- CLEANUP: http: Remove channel_congested function
- CLEANUP: buffers: Remove buffer_bounce_realign function
- CLEANUP: buffers: Remove buffer_contig_area and buffer_work_area functions
- MINOR: http: remove useless check on HTTP_MSGF_XFER_LEN for the request
- MINOR: http: Add debug messages when HTTP body analyzers are called
- BUG/MEDIUM: http: Fix blocked HTTP/1.0 responses when compression is enabled
- BUG/MINOR: filters: Don't force the stream's wakeup when we wait in flt_end_analyze
- DOC: fix parenthesis and add missing "Example" tags
- DOC: update the contributing file
- DOC: log-format/tcplog/httplog update
- MINOR: config parsing: add warning when log-format/tcplog/httplog is overriden in "defaults" sections
In flt_end_analyze, we wait that the anlayze is finished for both the request
and the response. In this case, because of a task_wakeup, some streams can
consume too much CPU to do nothing. So now, this is the filter's responsibility
to know if this wakeup is needed.
This fix should be backported in 1.7.
When the compression filter is enabled, if a HTTP/1.0 response is received (no
content-length and no transfer-encoding), no data are forwarded to the client
because of a bug and the transaction is blocked indefinitly.
The bug comes from the fact we need to synchronize the end of the request and
the response because of the compression filter. This inhibits the infinite
forwarding of data. But for these responses, the compression is not
activated. So the response body is not analyzed. This leads to a deadlock.
The solution is to enable the analyze of the response body in all cases and
handle this one to enable the infinite forwarding. All other cases should
already by handled.
This fix should be backported to 1.7.
To finish a HTTP transaction and to start the new one, we check, among other
things, that there is enough space in the reponse buffer to eventually inject a
message during the parsing of the next request. Because these messages can reach
the maximum buffers size, it is mandatory to have an empty response
buffer. Remaining input data are trimmed during the txn cleanup (in
http_reset_txn), so we just need to check that the output data were flushed.
The current implementation depends on channel_congested, which does check the
reserved area is available. That's not of course good enough. There are other
tests on the reponse buffer is http_wait_for_request. But conditions to move on
are almost the same. So, we can imagine some scenarii where some output data
remaining in the reponse buffer during the request parsing prevent any messages
injection.
To fix this bug, we just wait that output data were flushed before cleaning up
the HTTP txn (ie. s->res.buf->o == 0). In addition, in http_reset_txn we realign
the response buffer (note the buffer is empty at this step).
Thanks to this changes, there is no more need to set CF_EXPECT_MORE on the
response channel in http_end_txn_clean_session. And more important, there is no
more need to check the response buffer state in http_wait_for_request. This
remove a workaround on response analysers to handle HTTP pipelining.
This patch can be backported in 1.7, 1.6 and 1.5.
The function buffer_contig_space is buggy and could lead to pernicious bugs
(never hitted until now, AFAIK). This function should return the number of bytes
that can be written into the buffer at once (without wrapping).
First, this function is used to inject input data (bi_putblk) and to inject
output data (bo_putblk and bo_inject). But there is no context. So it cannot
decide where contiguous space should placed. For input data, it should be after
bi_end(buf) (ie, buf->p + buf->i modulo wrapping calculation). For output data,
it should be after bo_end(buf) (ie, buf->p) and input data are assumed to not
exist (else there is no space at all).
Then, considering we need to inject input data, this function does not always
returns the right value. And when we need to inject output data, we must be sure
to have no input data at all (buf->i == 0), else the result can also be wrong
(but this is the caller responsibility, so everything should be fine here).
The buffer can be in 3 different states:
1) no wrapping
<---- o ----><----- i ----->
+------------+------------+-------------+------------+
| |oooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|
+------------+------------+-------------+------------+
^ <contig_space>
p ^ ^
l r
2) input wrapping
...---> <---- o ----><-------- i -------...
+-----+------------+------------+--------------------+
|iiiii|xxxxxxxxxxxx|oooooooooooo|iiiiiiiiiiiiiiiiiiii|
+-----+------------+------------+--------------------+
<contig_space> ^
^ ^ p
l r
3) output wrapping
...------ o ------><----- i -----> <----...
+------------------+-------------+------------+------+
|oooooooooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|oooooo|
+------------------+-------------+------------+------+
^ <contig_space>
p ^ ^
l r
buffer_contig_space returns (l - r). The cases 1 and 3 are correctly
handled. But for the second case, r is wrong. It points on the buffer's end
(buf->data + buf->size). It should be bo_end(buf) (ie, buf->p - buf->o).
To fix the bug, the function has been splitted. Now, bi_contig_space and
bo_contig_space should be used to know the contiguous space available to insert,
respectively, input data and output data. For bo_contig_space, input data are
assumed to not exist. And the right version is used, depending what we want to
do.
In addition, to clarify the buffer's API, buffer_realign does not return value
anymore. So it has the same API than buffer_slow_realign.
This patch can be backported in 1.7, 1.6 and 1.5.
A buffer overflow could happen if an integer is badly encoded in
the data part of a msg received from a peer. It should not happen
with authenticated peers (the handshake do not use this function).
This patch makes the code of the 'intdecode' function more robust.
It also adds some comments about the intencode function.
This bug affects versions >= 1.6.
This patch fixes a bug which came with 5e57643 commit where server
default CRT filenames were initialized to the same value as server
default CRL filenames.
Ankit Malp reported a bug that we've had since binding to devices was
implemented. Haproxy wrongly checks that the process stays privileged
after startup when a binding to a device is specified via the bind
keyword "interface". This is wrong, because after startup we're not
binding any socket anymore, and during startup if there's a permission
issue it will be immediately reported ("permission denied"). More
importantly there's no way around it as the process exits on startup
when facing such an option.
This fix should be backported to 1.7, 1.6 and 1.5.
This patch adds 'no-agent-check' setting supported both by 'default-server'
and 'server' directives to disable an agent check for a specific server which would
have 'agent-check' set as default value (inherited from 'default-server'
'agent-check' setting), or, on 'default-server' lines, to disable 'agent-check' setting
as default value for any further 'server' declarations.
For instance, provided this configuration:
default-server agent-check
server srv1
server srv2 no-agent-check
server srv3
default-server no-agent-check
server srv4
srv1 and srv3 would have an agent check enabled contrary to srv2 and srv4.
We do not allocate anymore anything when parsing 'default-server' 'agent-check'
setting.
Before this patch, only 'server' directives could support 'disabled' setting.
This patch makes also 'default-server' directives support this setting.
It is used to disable a list of servers declared after a 'defaut-server' directive.
'enabled' new keyword has been added, both supported as 'default-server' and
'server' setting, to enable again a list of servers (so, declared after a
'default-server enabled' directive) or to explicitly enable a specific server declared
after a 'default-server disabled' directive.
For instance provided this configuration:
default-server disabled
server srv1...
server srv2...
server srv3... enabled
server srv4... enabled
srv1 and srv2 are disabled and srv3 and srv4 enabled.
This is equivalent to this configuration:
default-server disabled
server srv1...
server srv2...
default-server enabled
server srv3...
server srv4...
even if it would have been preferable/shorter to declare:
server srv3...
server srv4...
default-server disabled
server srv1...
server srv2...
as 'enabled' is the default server state.
This patch makes 'default-server' support 'addr' setting.
The code which was responsible of parsing 'server' 'addr' setting
has moved from parse_server() to implement a new parser
callable both as 'default-server' and 'server' 'addr' setting parser.
Should not break anything.
This patch makes 'default-server' directives support 'sni' settings.
A field 'sni_expr' has been added to 'struct server' to temporary
stores SNI expressions as strings during both 'default-server' and 'server'
lines parsing. So, to duplicate SNI expressions from 'default-server' 'sni' setting
for new 'server' instances we only have to "strdup" these strings as this is
often done for most of the 'server' settings.
Then, sample expressions are computed calling sample_parse_expr() (only for 'server'
instances).
A new function has been added to produce the same error output as before in case
of any error during 'sni' settings parsing (display_parser_err()).
Should not break anything.
Before this patch, only 'server' directives could support 'source' setting.
This patch makes also 'default-server' directives support this setting.
To do so, we had to extract the code responsible of parsing 'source' setting
arguments from parse_server() function and make it callable both
as 'default-server' and 'server' 'source' setting parser. So, the code is mostly
the same as before except that before allocating anything for 'struct conn_src'
members, we must free the memory previously allocated.
Should not break anything.
Before this patch, 'cookie' setting was only supported by 'server' directives.
This patch makes 'default-server' directive also support 'cookie' setting.
Should not break anything.
Before this path, 'observe' setting was only supported by 'server' directives.
This patch makes 'default-server' directives also support 'observe' setting.
Should not break anything.
Before this patch only 'server' directives could support 'redir' setting.
This patch makes also 'default-server' directives support 'redir' setting.
Should not break anything.
Before this patch only 'server' directives could support 'track' setting.
This patch makes 'default-server' directives also support this setting.
Should not break anything.
Before this patch 'check' setting was only supported by 'server' directives.
This patch makes also 'default-server' directives support this setting.
A new 'no-check' keyword parser has been implemented to disable this setting both
in 'default-server' and 'server' directives.
Should not break anything.
This patch makes 'default-server' directive support 'verifyhost' setting.
Note: there was a little memory leak when several 'verifyhost' arguments were
supplied on the same 'server' line.
This patch makes 'default-server' directive support 'send-proxy-v2-ssl'
(resp. 'send-proxy-v2-ssl-cn') setting.
A new keyword 'no-send-proxy-v2-ssl' (resp. 'no-send-proxy-v2-ssl-cn') has been
added to disable 'send-proxy-v2-ssl' (resp. 'send-proxy-v2-ssl-cn') setting both
in 'server' and 'default-server' directives.
This patch makes 'default-server' directive support 'ssl' setting.
A new keyword 'no-ssl' has been added to disable this setting both
in 'server' and 'default-server' directives.
This patch makes 'default-server' directive support 'no-sslv3' (resp. 'no-ssl-reuse',
'no-tlsv10', 'no-tlsv11', 'no-tlsv12', and 'no-tls-tickets') setting.
New keywords 'sslv3' (resp. 'ssl-reuse', 'tlsv10', 'tlsv11', 'tlsv12', and
'tls-no-tickets') have been added to disable these settings both in 'server' and
'default-server' directives.
This patch makes 'default-server' directive support 'force-sslv3'
and 'force-tlsv1[0-2]' settings.
New keywords 'no-force-sslv3' (resp. 'no-tlsv1[0-2]') have been added
to disable 'force-sslv3' (resp. 'force-tlsv1[0-2]') setting both in 'server' and
'default-server' directives.
This patch makes 'default-server' directive support 'check-ssl' setting
to enable SSL for health checks.
A new keyword 'no-check-ssl' has been added to disable this setting both in
'server' and 'default-server' directives.
This patch makes 'default-server' directive support 'send-proxy'
(resp. 'send-proxy-v2') setting.
A new keyword 'no-send-proxy' (resp. 'no-send-proxy-v2') has been added
to disable 'send-proxy' (resp. 'send-proxy-v2') setting both in 'server' and
'default-server' directives.
This patch makes 'default-server' directive support 'non-stick' setting.
A new keyword 'stick' has been added so that to disable
'non-stick' setting both in 'server' and 'default-server' directives.
This patch makes 'default-server' directive support 'check-send-proxy' setting.
A new keyword 'no-check-send-proxy' has been added so that to disable
'check-send-proxy' setting both in 'server' and 'default-server' directives.
At this time, only 'server' supported 'backup' keyword.
This patch makes also 'default-server' directive support this keyword.
A new keyword 'no-backup' has been added so that to disable 'backup' setting
both in 'server' and 'default-server' directives.
For instance, provided the following sequence of directives:
default-server backup
server srv1
server srv2 no-backup
default-server no-backup
server srv3
server srv4 backup
srv1 and srv4 are declared as backup servers,
srv2 and srv3 are declared as non-backup servers.
There is no reason to emit such an error message:
"'default-server' expects <name> and <addr>[:<port>] as arguments."
if less than two arguments are provided on 'default-server' lines.
This is a 'server' specific error message.
There is a silly case where a loop is not detected in tracked servers lists:
when a server tracks itself.
Ex:
server srv1 127.0.0.1:8000 track srv1
Well, this never happens and this does not prevent haproxy from working.
But with this next following configuration:
server srv1 127.0.0.1:8000 track srv2
server srv2 127.0.0.1:8000 track srv2
server srv3 127.0.0.1:8000 track srv2
the code in charge of detecting such loops never returns (without any error message).
haproxy becomes stuck in an infinite loop because of this statement found
in check_config_validity():
for (loop = srv->track; loop && loop != newsrv; loop = loop->track);
Again, such a configuration is never accidentally used I guess.
This latter example seems silly, but as several 'default-server' directives may be used
in the same proxy section, and as 'default-server' settings are not resetted each a
new 'default-server' line is created, it will match the following configuration, in the future,
when 'track' setting will be supported by 'default-server':
default-server track srv3
server srv1 127.0.0.1:8000
server srv2 127.0.0.1:8000
.
.
.
default-server check
server srv3 127.0.0.1:8000
(cherry picked from commit 6528fc93d3c065fdac841f24e55cfe9674a67414)
A "weakness" exist in the first implementation of the parsing of the DNS
responses: HAProxy always choses the first IP available matching family
preference, or as a failover, the first IP.
It should be good enough, since most DNS servers do round robin on the
records they send back to clients.
That said, some servers does not do proper round robin, or we may be
unlucky too and deliver the same IP to all the servers sharing the same
hostname.
Let's take the simple configuration below:
backend bk
srv s1 www:80 check resolvers R
srv s2 www:80 check resolvers R
The DNS server configured with 2 IPs for 'www'.
If you're unlucky, then HAProxy may apply the same IP to both servers.
Current patch improves this situation by weighting the decision
algorithm to ensure we'll prefer use first an IP found in the response
which is not already affected to any server.
The new algorithm does not guarantee that the chosen IP is healthy,
neither a fair distribution of IPs amongst the servers in the farm,
etc...
It only guarantees that if the DNS server returns many records for a
hostname and that this hostname is being resolved by multiple servers in
the same backend, then we'll use as many records as possible.
If a server fails, HAProxy won't pick up an other record from the
response.
When SIGUSR1 is received, haproxy enters in soft-stop and quits when no
connection remains.
It can happen that the instance remains alive for a long time, depending
on timeouts and traffic. This option ensures that soft-stop won't run
for too long.
Example:
global
hard-stop-after 30s # Once in soft-stop, the instance will remain
# alive for at most 30 seconds.
kevent() always sets EV_EOF with EVFILT_READ to notify of a read shutdown
and EV_EOF with EVFILT_WRITE to notify of a write error. Let's check this
flag to properly update the FD's polled status (FD_POLL_HUP and FD_POLL_ERR
respectively).
It's worth noting that this one can be coupled with a regular read event
to notify about a pending read followed by a shutdown, but for now we only
use this to set the relevant flags (HUP and ERR).
The poller now exhibits the flag HAP_POLL_F_RDHUP to indicate this new
capability.
An improvement may consist in not setting FD_POLL_IN when the "data"
field is null since it normally only reflects the amount of pending
data.
After commit e852545 ("MEDIUM: polling: centralize polled events processing")
all pollers stopped to explicitly check the FD's polled status, except the
kqueue poller which is constructed a bit differently. It doesn't seem possible
to cause any issue such as missing an event however, but anyway it's better to
definitely get rid of this since the event filter already provides the event
direction.
On Linux since 2.6.17 poll() supports POLLRDHUP to notify of an upcoming
hangup after pending data. Making use of it allows us to avoid a useless
recv() after short responses on keep-alive connections. Note that we
automatically enable the feature once this flag has been met first in a
poll() status. Till now it was only enabled on epoll.
Curu Wong reported a case where haproxy used to send RST to a server
after receiving its FIN. The problem is caused by the fact that being
a server connection, its fd is marked with linger_risk=1, and that the
poller didn't report POLLRDHUP, making haproxy unaware of a pending
shutdown that came after the data, so it used to resort to nolinger
for closing.
However when pollers support RDHUP we're pretty certain whether or not
a shutdown comes after the data and we don't need to perform that extra
recv() call.
Similarly when we're dealing with an inbound connection we don't care
and don't want to perform this extra recv after a request for a very
unlikely case, as in any case we'll have to deal with the client-facing
TIME_WAIT socket.
So this patch ensures that only when it's known that there's no risk with
lingering data, as well as in cases where it's known that the poller would
have detected a pending RDHUP, we perform the fd_done_recv() otherwise we
continue, trying a subsequent recv() to try to detect a pending shutdown.
This effectively results in an extra recv() call for keep-alive sockets
connected to a server when POLLRDHUP isn't known to be supported, but
it's the only way to know whether they're still alive or closed.
This fix should be backported to 1.7, 1.6 and 1.5. It relies on the
previous patch bringing support for the HAP_POLL_F_RDHUP flag.
We'll need to differenciate between pollers which can report hangup at
the same time as read (POLL_RDHUP) from the other ones, because only
these ones may benefit from the fd_done_recv() optimization. Epoll has
had support for EPOLLRDHUP since Linux 2.6.17 and has always been used
this way in haproxy, so now we only set the flag once we've observed it
once in a response. It means that some initial requests may try to
perform a second recv() call, but after the first closed connection it
will be enough to know that the second call is not needed anymore.
Later we may extend these flags to designate event-triggered pollers.
A tcp half connection can cause 100% CPU on expiration.
First reproduced with this haproxy configuration :
global
tune.bufsize 10485760
defaults
timeout server-fin 90s
timeout client-fin 90s
backend node2
mode tcp
timeout server 900s
timeout connect 10s
server def 127.0.0.1:3333
frontend fe_api
mode tcp
timeout client 900s
bind :1990
use_backend node2
Ie timeout server-fin shorter than timeout server, the backend server
sends data, this package is left in the cache of haproxy, the backend
server continue sending fin package, haproxy recv fin package. this
time the session information is as follows:
time the session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=08 age=1s calls=3 rq[f=848000h,i=0,an=00h,rx=14m58s,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=,wx=14m58s,ax=] s0=[7,0h,fd=6,ex=]
s1=[7,18h,fd=7,ex=] exp=14m58s
rp has set the CF_SHUTR state, next, the client sends the fin package,
session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=08 age=38s calls=4 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=1m11s,wx=14m21s,ax=] s0=[7,0h,fd=6,ex=]
s1=[9,10h,fd=7,ex=] exp=1m11s
After waiting 90s, session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=04 age=4m11s calls=718074391 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=?,wx=10m49s,ax=] s0=[7,0h,fd=6,ex=]
s1=[9,10h,fd=7,ex=] exp=? run(nice=0)
cpu information:
6899 root 20 0 112224 21408 4260 R 100.0 0.7 3:04.96 haproxy
Buffering is set to ensure that there is data in the haproxy buffer, and haproxy
can receive the fin package, set the CF_SHUTR flag, If the CF_SHUTR flag has been
set, The following code does not clear the timeout message, causing cpu 100%:
stream.c:process_stream:
if (unlikely((res->flags & (CF_SHUTR|CF_READ_TIMEOUT)) == CF_READ_TIMEOUT)) {
if (si_b->flags & SI_FL_NOHALF)
si_b->flags |= SI_FL_NOLINGER;
si_shutr(si_b);
}
If you have closed the read, set the read timeout does not make sense.
With or without cf_shutr, read timeout is set:
if (tick_isset(s->be->timeout.serverfin)) {
res->rto = s->be->timeout.serverfin;
res->rex = tick_add(now_ms, res->rto);
}
After discussion on the mailing list, setting half-closed timeouts the
hard way here doesn't make sense. They should be set only at the moment
the shutdown() is performed. It will also solve a special case which was
already reported of some half-closed timeouts not working when the shutw()
is performed directly at the stream-interface layer (no analyser involved).
Since the stream interface layer cannot know the timeout values, we'll have
to store them directly in the stream interface so that they are used upon
shutw(). This patch does this, fixing the problem.
An easier reproducer to validate the fix is to keep the huge buffer and
shorten all timeouts, then call it under tcploop server and client, and
wait 3 seconds to see haproxy run at 100% CPU :
global
tune.bufsize 10485760
listen px
bind :1990
timeout client 90s
timeout server 90s
timeout connect 1s
timeout server-fin 3s
timeout client-fin 3s
server def 127.0.0.1:3333
$ tcploop 3333 L W N20 A P100 F P10000 &
$ tcploop 127.0.0.1:1990 C S10000000 F
Because of this typo, AN_RES_FLT_END was never called when a redirect rule is
applied on a keep-alive connection. In almost all cases, this bug has no
effect. But, it leads to a memory leak if a redirect is done on a http-response
rule when the HTTP compression is enabled.
This patch should be backported in 1.7.
SSL_CTX_set_ecdh_auto is declared (when present) with #define. A simple #ifdef
avoid to list all cases of ssllibs. It's a placebo in new ssllibs. It's ok with
openssl 1.0.1, 1.0.2, 1.1.0, libressl and boringssl.
Thanks to Piotr Kubaj for postponing and testing with libressl.
This fixes a regression introduced in d7bdcb874b, that removed the
ability to use req.payload(0,0) to read the whole buffer content. The
offending commit is present starting in version 1.6, so the patch
should be backported to versions 1.6 and 1.7.
This flag is always set when we end up here, for each and every data
layer (idle, stream-interface, checks), and continuing to test it
leaves a big risk of forgetting to set it as happened once already
before 1.5-dev13.
It could make sense to backport this into stable branches as part of
the connection flag fixes, after some cool down period.
Despite the previous commit working fine on all tests, it's still not
sufficient to completely address the problem. If the connection handler
is called with an event validating an L4 connection but some handshakes
remain (eg: accept-proxy), it will still wake the function up, which
will not report the activity, and will not detect a change once the
handshake it complete so it will not notify the ->wake() handler.
In fact the only reason why the ->wake() handler is still called here
is because after dropping the last handshake, we try to call ->recv()
and ->send() in turn and change the flags in order to detect a data
activity. But if for any reason the data layer is not interested in
reading nor writing, it will not get these events.
A cleaner way to address this is to call the ->wake() handler only
on definitive status changes (shut, error), on real data activity,
and on a complete connection setup, measured as CONNECTED with no
more handshake pending.
It could be argued that the handshake flags have to be made part of
the condition to set CO_FL_CONNECTED but that would currently break
a part of the health checks. Also a handshake could appear at any
moment even after a connection is established so we'd lose the
ability to detect a second end of handshake.
For now the situation around CO_FL_CONNECTED is not clean :
- session_accept() only sets CO_FL_CONNECTED if there's no pending
handshake ;
- conn_fd_handler() will set it once L4 and L6 are complete, which
will do what session_accept() above refrained from doing even if
an accept_proxy handshake is still pending ;
- ssl_sock_infocbk() and ssl_sock_handshake() consider that a
handshake performed with CO_FL_CONNECTED set is a renegociation ;
=> they should instead filter on CO_FL_WAIT_L6_CONN
- all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before
accepting to fetch information
=> they should also get rid of any pending handshake
- smp_fetch_fc_rcvd_proxy() uses !CO_FL_CONNECTED instead of
CO_FL_ACCEPT_PROXY
- health checks (standard and tcp-checks) don't check for HANDSHAKE
and may report a successful check based on CO_FL_CONNECTED while
not yet done (eg: send buffer full on send_proxy).
This patch aims at solving some of these side effects in a backportable
way before this is reworked in depth :
- we need to call ->wake() to report connection success, measure
connection time, notify that the data layer is ready and update
the data layer after activity ; this has to be done either if
we switch from pending {L4,L6}_CONN to nothing with no handshakes
left, or if we notice some handshakes were pending and are now
done.
- we document that CO_FL_CONNECTED exactly means "L4 connection
setup confirmed at least once, L6 connection setup confirmed
at least once or not necessary, all this regardless of any
possibly remaining handshakes or future L6 negociations".
This patch also renames CO_FL_CONN_STATUS to the more explicit
CO_FL_NOTIFY_DATA, and works around the previous flags trick consiting
in setting an impossible combination of flags to notify the data layer,
by simply clearing the current flags.
This fix should be backported to 1.7, 1.6 and 1.5.
Recent fix 7bf3fa3 ("BUG/MAJOR: connection: update CO_FL_CONNECTED before
calling the data layer") marked an end to a fragile situation where the
absence of CO_FL_{CONNECTED,L4,L6}* flags is used to mark the completion
of a connection setup. The problem is that by setting the CO_FL_CONNECTED
flag earlier, we can indeed call the ->wake() function from conn_fd_handler
but the stream-interface's wake function needs to see CO_FL_CONNECTED unset
to detect that a connection has just been established, so if there's no
pending data in the buffer, the connection times out. The other ->wake()
functions (health checks and idle connections) don't do this though.
So instead of trying to detect a subtle change in connection flags,
let's simply rely on the stream-interface's state and validate that the
connection is properly established and that handshakes are completed
before reporting the WRITE_NULL indicating that a pending connection was
just completed.
This patch passed all tests of handshake and non-handshake combinations,
with synchronous and asynchronous connect() and should be safe for backport
to 1.7, 1.6 and 1.5 when the fix above is already present.
When a filter is used, there are 2 channel's analyzers to surround all the
others, flt_start_analyze and flt_end_analyze. This is the good place to acquire
and release resources used by filters, when needed. In addition, the last one is
used to synchronize the both channels, especially for HTTP streams. We must wait
that the analyze is finished for the both channels for an HTTP transaction
before restarting it for the next one.
But this part was buggy, leading to unexpected behaviours. First, depending on
which channel ends first, the request or the response can be switch in a
"forward forever" mode. Then, the HTTP transaction can be cleaned up too early,
while a processing is still in progress on a channel.
To fix the bug, the flag CF_FLT_ANALYZE has been added. It is set on channels in
flt_start_analyze and is kept if at least one filter is still analyzing the
channel. So, we can trigger the channel syncrhonization if this flag was removed
on the both channels. In addition, the flag TX_WAIT_CLEANUP has been added on
the transaction to know if the transaction must be cleaned up or not during
channels syncrhonization. This way, we are sure to reset everything once all the
processings are finished.
This patch should be backported in 1.7.
When the "process" setting of a bind line limits the processes a
listening socket is enabled on, a "disable frontend" operation followed
by an "enable frontend" triggers a bug because all declared listeners
are attempted to be bound again regardless of their assigned processes.
This can at minima create new sockets not receiving traffic, and at worst
prevent from re-enabling a frontend if it's bound to a privileged port.
This bug was introduced by commit 1c4b814 ("MEDIUM: listener: support
rebinding during resume()") merged in 1.6-dev1, trying to perform the
bind() before checking the process list instead of after.
Just move the process check before the bind() operation to fix this.
This fix must be backported to 1.7 and 1.6.
Thanks to Pavlos for reporting this one.
Strict interpretation of TLS can cause SSL sessions to be thrown away
when the socket is shutdown without sending a "close notify", resulting
in each check to go through the complete handshake, eating more CPU on
the servers.
[wt: strictly speaking there's no guarantee that the close notify will
be delivered, it's only best effort, but that may be enough to ensure
that once at least one is received, next checks will be cheaper. This
should be backported to 1.7 and possibly 1.6]
This adds 3 new commands to the cli :
enable dynamic-cookie backend <backend> that enables dynamic cookies for a
specified backend
disable dynamic-cookie backend <backend> that disables dynamic cookies for a
specified backend
set dynamic-cookie-key backend <backend> that lets one change the dynamic
cookie secret key, for a specified backend.
This adds a new "dynamic" keyword for the cookie option. If set, a cookie
will be generated for each server (assuming one isn't already provided on
the "server" line), from the IP of the server, the TCP port, and a secret
key provided. To provide the secret key, a new keyword as been added,
"dynamic-cookie-key", for backends.
Example :
backend bk_web
balance roundrobin
dynamic-cookie-key "bla"
cookie WEBSRV insert dynamic
server s1 127.0.0.1:80 check
server s2 192.168.56.1:80 check
This is a first step to be able to dynamically add and remove servers,
without modifying the configuration file, and still have all the load
balancers redirect the traffic to the right server.
Provide a way to generate session cookies, based on the IP address of the
server, the TCP port, and a secret key provided.
Matthias Fechner reported a regression in 1.7.3 brought by the backport
of commit 819efbf ("BUG/MEDIUM: tcp: don't poll for write when connect()
succeeds"), causing some connections to fail to establish once in a while.
While this commit itself was a fix for a bad sequencing of connection
events, it in fact unveiled a much deeper bug going back to the connection
rework era in v1.5-dev12 : 8f8c92f ("MAJOR: connection: add a new
CO_FL_CONNECTED flag").
It's worth noting that in a lab reproducing a similar environment as
Matthias' about only 1 every 19000 connections exhibit this behaviour,
making the issue not so easy to observe. A trick to make the problem
more observable consists in disabling non-blocking mode on the socket
before calling connect() and re-enabling it later, so that connect()
always succeeds. Then it becomes 100% reproducible.
The problem is that this CO_FL_CONNECTED flag is tested after deciding to
call the data layer (typically the stream interface but might be a health
check as well), and that the decision to call the data layer relies on a
change of one of the flags covered by the CO_FL_CONN_STATE set, which is
made of CO_FL_CONNECTED among others.
Before the fix above, this bug couldn't appear with TCP but it could
appear with Unix sockets. Indeed, connect() was always considered
blocking so the CO_FL_WAIT_L4_CONN connection flag was always set, and
polling for write events was always enabled. This used to guarantee that
the conn_fd_handler() could detect a change among the CO_FL_CONN_STATE
flags.
Now with the fix above, if a connect() immediately succeeds for non-ssl
connection with send-proxy enabled, and no data in the buffer (thus TCP
mode only), the CO_FL_WAIT_L4_CONN flag is not set, the lack of data in
the buffer doesn't enable polling flags for the data layer, the
CO_FL_CONNECTED flag is not set due to send-proxy still being pending,
and once send-proxy is done, its completion doesn't cause the data layer
to be woken up due to the fact that CO_FL_CONNECT is still not present
and that the CO_FL_SEND_PROXY flag is not watched in CO_FL_CONN_STATE.
Then no progress is made when data are received from the client (and
attempted to be forwarded), because a CF_WRITE_NULL (or CF_WRITE_PARTIAL)
flag is needed for the stream-interface state to turn from SI_ST_CON to
SI_ST_EST, allowing ->chk_snd() to be called when new data arrive. And
the only way to set this flag is to call the data layer of course.
After the connect timeout, the connection gets killed and if in the mean
time some data have accumulated in the buffer, the retry will succeed.
This patch fixes this situation by simply placing the update of
CO_FL_CONNECTED where it should have been, before the check for a flag
change needed to wake up the data layer and not after.
This fix must be backported to 1.7, 1.6 and 1.5. Versions not having
the patch above are still affected for unix sockets.
Special thanks to Matthias Fechner who provided a very detailed bug
report with a bisection designating the faulty patch, and to Olivier
Houchard for providing full access to a pretty similar environment where
the issue could first be reproduced.
This may be used to output the JSON schema which describes the output of
show info json and show stats json.
The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.
e.g.:
$ echo "show schema json" | socat /var/run/haproxy.stat stdio | \
python -m json.tool
The implementation does not generate the schema. Some consideration could
be given to integrating the output of the schema with the output of
typed and json info and stats. In particular the types (u32, s64, etc...)
and tags.
A sample verification of show info json and show stats json using
the schema is as follows. It uses the jsonschema python module:
cat > jschema.py << __EOF__
import json
from jsonschema import validate
from jsonschema.validators import Draft3Validator
with open('schema.txt', 'r') as f:
schema = json.load(f)
Draft3Validator.check_schema(schema)
with open('instance.txt', 'r') as f:
instance = json.load(f)
validate(instance, schema, Draft3Validator)
__EOF__
$ echo "show schema json" | socat /var/run/haproxy.stat stdio > schema.txt
$ echo "show info json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py
$ echo "show stats json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py
Signed-off-by: Simon Horman <horms@verge.net.au>
Add a json parameter to show (info|stat) which will output information
in JSON format. A follow-up patch will add a JSON schema which describes
the format of the JSON output of these commands.
The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.
e.g.:
$ echo "show info json" | socat /var/run/haproxy.stat stdio | \
python -m json.tool
STAT_STARTED has bee added in order to track if show output has begun or
not. This is used in order to allow the JSON output routines to only insert
a "," between elements when needed. I would value any feedback on how this
might be done better.
Signed-off-by: Simon Horman <horms@verge.net.au>
Given that all call places except one had to set txn->status prior to
calling http_server_error(), it's simpler to make this function rely
on txn->status than have it store it from an argument.
This commit removes second argument(msgnum) from http_error_message and
changes http_error_message to use s->txn->status/http_get_status_idx for
mapping status code from 200..504 to HTTP_ERR_200..HTTP_ERR_504(enum).
This is needed for http-request tarpit deny_status commit.
It adds "hostname" as a new sample fetch. It does exactly the same as
"%H" in a log format except that it can be used outside of log formats.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
2 places were using an open-coded implementation of this function to count
available servers. Note that the avg_queue_size() fetch didn't check that
the proxy was in STOPPED state so it would possibly return a wrong server
count here but that wouldn't impact the returned value.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
This is like the nbsrv() sample fetch function except that it works as
a converter so it can count the number of available servers of a backend
name retrieved using a sample fetch or an environment variable.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
The said form of the CLI command didn't return anything since commit
ad8be61c7.
This fix needs to be backported to 1.7.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
The memory is released by cli_release_mlook, which also properly sets the
pointer to NULL. This was introduced with a big code reorganization
involving moving to the new keyword registration form in commit ad8be61c7.
This fix needs to be backported to 1.7.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
Invalid OCSP file (for example empty one that can be used to enable
OCSP response to be set dynamically later) causes errors that are
placed on OpenSSL error stack. Those errors are not cleared so
anything that checks this stack later will fail.
Following configuration:
bind :443 ssl crt crt1.pem crt crt2.pem
With following files:
crt1.pem
crt1.pem.ocsp - empty one
crt2.pem.rsa
crt2.pem.ecdsa
Will fail to load.
This patch should be backported to 1.7.
Surprizingly, http-request, http-response, block, redirect, and capture
rules did not cause a warning to be emitted when used in a TCP proxy, so
let's fix this.
This patch may be backported to older versions as it helps spotting
configuration issues.
As its named said, this statement customize the maximum allowed size for frames
exchanged between HAProxy and SPOAs. It should be greater than or equal to 256
and less than or equal to (tune.bufsize - 4) (4 bytes are reserved to the frame
length).
This option can be used to enable or to disable (prefixing the option line with
the "no" keyword) the sending of fragmented payload to agents. By default, this
option is enabled.
These options can be used to enable or to disable (prefixing the option line
with the "no" keyword), respectively, pipelined and asynchronous exchanged
between HAproxy and agents. By default, pipelining and async options are
enabled.
Now, when a payload is fragmented, the first frame must define the frame type
and the followings must use the special type SPOE_FRM_T_UNSET. This way, it is
easy to know if a fragment is the first one or not. Of course, all frames must
still share the same stream-id and frame-id.
Update SPOA example accordingly.
If an agent want to abort the processing a fragmented NOTIFY frame before
receiving all fragments, it can send an ACK frame at any time with ABORT bit set
(and of course, the FIN bit too).
Beside this change, SPOE_FRM_ERR_FRAMEID_NOTFOUND error flag has been added. It
is set when a unknown ACK frame is received.
Now, agents can announce the support for the "fragmentation" capability during
the HELLO handshake. HAProxy will never announce it because fragmented frame
decoding is not implemented yet. But it can send such kind of frames. So, if an
agent supports this capability, payloads exceeding the frame size will be
split. A fragemented payload consists of several frames with the FIN bit clear
and terminated by a single frame with the FIN bit set. All these frames must
share the same STREAM-ID and FRAME-ID.
Note that an unfragemnted payload consists of a single frame with the FIN bit
set. And HELLO and DISCONNECT frames cannot be fragmented. This means that only
NOTIFY frames can transport fragmented payload for now.
The max_frame_size value is negociated between HAProxy and SPOE agents during
the HELLO handshake. It is a per-connection value. Different SPOE agents can
choose to use different max_frame_size values. So, now, we keep the minimum of
all known max_frame_size. This minimum is updated when a new connection to a
SPOE agent is opened and when a connection is closed. We use this value as a
limit to encode messages in NOTIFY frames.
This happens when buffer allocation failed. In the SPOE context, buffers are
allocated by streams and SPOE applets at different time. First, by streams, when
messages need to be encoded before sending them in a NOTIFY frame. Then, by SPOE
applets, when a ACK frame is received.
The first case works as expected, we wake up the stream. But for the second one,
we must wake up the waiting SPOE applet.
Now, when option "set-on-error" is enabled, we set a status code representing
the error occurred instead of "true". For values under 256, it represents an
error coming from the engine. Below 256, it reports a SPOP error. In this case,
to retrieve the right SPOP status code, you must remove 256 to this value. Here
are possible values:
* 1: a timeout occurred during the event processing.
* 2: an error was triggered during the ressources allocation.
* 255: an unknown error occurred during the event processing.
* 256+N: a SPOP error occurred during the event processing.
Now, as for peers, we use an opaque pointer to store information related to the
SPOE filter in appctx structure. These information are now stored in a dedicated
structure (spoe_appctx) and allocated, using a pool, when the applet is created.
This removes the dependency between applets and the SPOE filter and avoids to
eventually inflate the appctx structure.
Now, HAProxy and agents can announce the support for "pipelining" and/or "async"
capabilities during the HELLO handshake. For now, HAProxy always announces the
support of both. In addition, in its HELLO frames. HAproxy adds the "engine-id"
key. It is a uniq string that identify a SPOE engine.
The "pipelining" capability is the ability for a peer to decouple NOTIFY and ACK
frames. This is a symmectical capability. To be used, it must be supported by
HAproxy and agents. Unlike HTTP pipelining, the ACK frames can be send in any
order, but always on the same TCP connection used for the corresponding NOTIFY
frame.
The "async" capability is similar to the pipelining, but here any TCP connection
established between HAProxy and the agent can be used to send ACK frames. if an
agent accepts connections from multiple HAProxy, it can use the "engine-id"
value to group TCP connections.
The array of pointers passed to sample_parse_expr was not really an array but a
pointer to pointer. So it can easily lead to a segfault during the configuration
parsing.
During a soft stop, we need to wakeup all SPOE applets to stop them. So we loop
on all proxies, and for each proxy, on all filters. But we must be sure to only
handle SPOE filters here. To do so, we use a specific id.
Use SSL_set_ex_data/SSL_get_ex_data standard API call to store capture.
We need to avoid internal structures/undocumented calls usage to try to
control the beast and limit painful compatibilities.
Bug introduced with "removes SSL_CTX_set_ssl_version call and cleanup CTX
creation": ssl_sock_new_ctx is called before all the bind line is parsed.
The fix consists of separating the use of default_ctx as the initialization
context of the SSL connection via bind_conf->initial_ctx. Initial_ctx contains
all the necessary parameters before performing the selection of the CTX:
default_ctx is processed as others ctx without unnecessary parameters.
This new sample-fetches captures the cipher list offer by the client
SSL connection during the client-hello phase. This is useful for
fingerprint the SSL connection.
BoringSSL doesn't support SSL_CTX_set_ssl_version. To remove this call, the
CTX creation is cleanup to clarify what is happening. SSL_CTX_new is used to
match the original behavior, in order: force-<method> according the method
version then the default method with no-<method> options.
OPENSSL_NO_SSL3 error message is now in force-sslv3 parsing (as force-tls*).
For CTX creation in bind environement, all CTX set related to the initial ctx
are aggregate to ssl_sock_new_ctx function for clarity.
Tests with crt-list have shown that server_method, options and mode are
linked to the initial CTX (default_ctx): all ssl-options are link to each
bind line and must be removed from crt-list.
Extract from RFC 6066:
"If the server understood the ClientHello extension but does not recognize
the server name, the server SHOULD take one of two actions: either abort the
handshake by sending a fatal-level unrecognized_name(112) alert or continue the
handshake. It is NOT RECOMMENDED to send a warning-level unrecognized_name(112)
alert, because the client's behavior in response to warning-level alerts is
unpredictable. If there is a mismatch between the server name used by the
client application and the server name of the credential chosen by the server,
this mismatch will become apparent when the client application performs the
server endpoint identification, at which point the client application will have
to decide whether to proceed with the communication."
Thanks Roberto Guimaraes for the bug repport, spotted with openssl-1.1.0.
This fix must be backported.
SSL verify and client_CA inherits from the initial ctx (default_ctx).
When a certificate is found, the SSL connection environment must be replaced by
the certificate configuration (via SSL_set_verify and SSL_set_client_CA_list).
This patch used boringssl's callback to analyse CLientHello before any
handshake to extract key signature capabilities.
Certificat with better signature (ECDSA before RSA) is choosed
transparenty, if client can support it. RSA and ECDSA certificates can
be declare in a row (without order). This makes it possible to set
different ssl and filter parameter with crt-list.
In 1.4-dev5 when we started to implement keep-alive, commit a9679ac
("[MINOR] http: make the conditional redirect support keep-alive")
added a specific check was added to support keep-alive on redirect
rules but only when the location would start with a "/" indicating
the client would come back to the same server.
But nowadays most applications put http:// or https:// in front of
each and every location, and continuing to perform a close there is
counter-efficient, especially when multiple objects are fetched at
once from a same origin which redirects them to the correct origin
(eg: after an http to https forced upgrade).
It's about time to get rid of this old trick as it causes more harm
than good at an era where persistent connections are omnipresent.
Special thanks to Ciprian Dorin Craciun for providing convincing
arguments with a pretty valid use case and proposing this draft
patch which addresses the issue he was facing.
This change although not exactly a bug fix should be backported
to 1.7 to adapt better to existing infrastructure.
Adrian Fitzpatrick reported that since commit f51658d ("MEDIUM: config:
relax use_backend check to make the condition optional"), typos like "of"
instead of "if" on use_backend rules are not properly detected. The reason
is that the parser only checks for "if" or "unless" otherwise it considers
there's no keyword, making the rule inconditional.
This patch fixes it. It may reveal some rare config bugs for some people,
but will not affect valid configurations.
This fix must be backported to 1.7, 1.6 and 1.5.
Error in the HTTP parser. The function http_get_path() can
return NULL and this case is not catched in the code. So, we
try to dereference NULL pointer, and a segfault occurs.
These two lines are useful to prevent the bug.
acl prevent_bug path_beg /
http-request deny if !prevent_bug
This bug fix should be backported in 1.6 and 1.7
Commit 405ff31 ("BUG/MINOR: ssl: assert on SSL_set_shutdown with BoringSSL")
introduced a regression causing some random crashes apparently due to
memory corruption. The issue is the use of SSL_CTX_set_quiet_shutdown()
instead of SSL_set_quiet_shutdown(), making it use a different structure
and causing the flag to be put who-knows-where.
Many thanks to Jarno Huuskonen who reported this bug early and who
bisected the issue to spot this patch. No backport is needed, this
is 1.8-specific.
Historically, http-response rules couldn't produce errors generating HTTP
responses during their evaluation. This possibility was "implicitly" added with
http-response redirect rules (51d861a4). But, at the time, replace-header rules
were kept untouched. When such a rule failed, the rules processing was just
stopped (like for an accept rule).
Conversely, when a replace-header rule fails on the request, it generates a HTTP
response (400 Bad Request).
With this patch, errors on replace-header rule are now handled in the same way
for HTTP requests and HTTP responses.
This patch should be backported in 1.7 and 1.6.
This is the same fix as which concerning the redirect rules (0d94576c).
The buffer used to expand the <replace-fmt> argument must be protected to
prevent it being overwritten during build_logline() execution (the function used
to expand the format string).
This patch should be backported in 1.7, 1.6 and 1.5. It relies on commit b686afd
("MINOR: chunks: implement a simple dynamic allocator for trash buffers") for
the trash allocator, which has to be backported as well.
Some users have experienced some troubles using the compression filter when the
HTTP response body length is undefined. They complained about receiving
truncated responses.
In fact, the bug can be triggered if there is at least one filter attached to
the stream but none registered to analyze the HTTP response body. In this case,
when the body length is undefined, data should be forwarded without any
parsing. But, because of a wrong check, we were starting to parse them. Because
it was not expected, the end of response was not correctly detected and the
response could be truncted. So now, we rely on HAS_DATA_FILTER macro instead of
HAS_FILTER one to choose to parse HTTP response body or not.
Furthermore, in http_response_forward_body, the test to not forward the server
closure to the client has been updated to reflect conditions listed in the
associated comment.
And finally, in http_msg_forward_body, when the body length is undefined, we
continue the parsing it until the server closes the connection without any on
filters. So filters can safely stop to filter data during their parsing.
This fix should be backported in 1.7
See 4b788f7d34
If we use the action "http-request redirect" with a Lua sample-fetch or
converter, and the Lua function calls one of the Lua log function, the
header name is corrupted, it contains an extract of the last loggued data.
This is due to an overwrite of the trash buffer, because his scope is not
respected in the "add-header" function. The scope of the trash buffer must
be limited to the function using it. The build_logline() function can
execute a lot of other function which can use the trash buffer.
This patch fix the usage of the trash buffer. It limits the scope of this
global buffer to the local function, we build first the header value using
build_logline, and after we store the header name.
Thanks Jesse Schulman for the bug repport.
This patch must be backported in 1.7, 1.6 and 1.5 version, and it relies
on commit b686afd ("MINOR: chunks: implement a simple dynamic allocator for
trash buffers") for the trash allocator, which has to be backported as well.
The trash buffers are becoming increasingly complex to deal with due to
the code's modularity allowing some functions to be chained and causing
the same chunk buffers to be used multiple times along the chain, possibly
corrupting each other. In fact the trash were designed from scratch for
explicitly not surviving a function call but string manipulation makes
this impossible most of the time while not fullfilling the need for
reliable temporary chunks.
Here we introduce the ability to allocate a temporary trash chunk which
is reserved, so that it will not conflict with the trash chunks other
functions use, and will even support reentrant calls (eg: build_logline).
For this, we create a new pool which is exactly the size of a usual chunk
buffer plus the size of the chunk struct so that these chunks when allocated
are exactly the same size as the ones returned by get_trash_buffer(). These
chunks may fail so the caller must check them, and the caller is also
responsible for freeing them.
The code focuses on minimal changes and ease of reliable backporting
because it will be needed in stable versions in order to support next
patch.
UDP sockets used to send DNS queries are created before fork happens and
this is a big problem because all the processes (in case of a
configuration starting multiple processes) share the same socket. Some
processes may consume responses dedicated to an other one, some servers
may be disabled, some IPs changed, etc...
As a workaround, this patch close the existing socket and create a new
one after the fork() has happened.
[wt: backport this to 1.7]
The function dns_init_resolvers() is used to initialize socket used to
send DNS queries.
This patch gives the function the ability to close a socket before
re-opening it.
[wt: this needs to be backported to 1.7 for next fix]
This patch change the names prefixing it by a "_". So "end" becomes "_end".
The backward compatibility with names without the prefix "_" is assured.
In other way, another the keyword "end" can be used like this: Map['end'].
Thanks Robin H. Johnson for the bug repport
This should be backported in version 1.6 and 1.7
There's a test after a successful synchronous connect() consisting
in waking the data layer up asap if there's no more handshake.
Unfortunately this test is run before setting the CO_FL_SEND_PROXY
flag and before the transport layer adds its own flags, so it can
indicate a willingness to send data while it's not the case and it
will have to be handled later.
This has no visible effect except a useless call to a function in
case of health checks making use of the proxy protocol for example.
Additionally a corner case where EALREADY was returned and considered
equivalent to EISCONN was fixed so that it's considered equivalent to
EINPROGRESS given that the connection is not complete yet. But this
code should never return on the first call anyway so it's mostly a
cleanup.
This fix should be backported to 1.7 and 1.6 at least to avoid
headaches during some debugging.
While testing a tcp_fastopen related change, it appeared that in the rare
case where connect() can immediately succeed, we still subscribe to write
notifications on the socket, causing the conn_fd_handler() to immediately
be called and a second call to connect() to be attempted to double-check
the connection.
In fact this issue had already been met with unix sockets (which often
respond immediately) and partially addressed but incorrect so another
patch will follow. But for TCP nothing was done.
The fix consists in removing the WAIT_L4_CONN flag if connect() succeeds
and to subscribe for writes only if some handshakes or L4_CONN are still
needed. In addition in order not to fail raw TCP health checks, we have
to continue to enable polling for data when nothing is scheduled for
leaving and the connection is already established, otherwise the caller
will never be notified.
This fix should be backported to 1.7 and 1.6.
A recent patch to support BoringSSL caused this warning to appear on
OpenSSL 1.1.0 :
src/ssl_sock.c:3062:4: warning: statement with no effect [-Wunused-value]
It's caused by SSL_CTX_set_ecdh_auto() which is now only a macro testing
that the last argument is zero, and the result is not used here. Let's
just kill it for both versions.
Tested with 0.9.8, 1.0.0, 1.0.1, 1.0.2, 1.1.0. This fix may be backported
to 1.7 if the boringssl fix is as well.
Limitations:
. disable force-ssl/tls (need more work)
should be set earlier with SSL_CTX_new (SSL_CTX_set_ssl_version is removed)
. disable generate-certificates (need more work)
introduce SSL_NO_GENERATE_CERTIFICATES to disable generate-certificates.
Cleanup some #ifdef and type related to boringssl env.
This change adds possibility to change agent-addr and agent-send directives
by CLI/socket. Now you can replace server's and their configuration without
reloading/restarting whole haproxy, so it's a step in no-reload/no-restart
direction.
Depends on #e9602af - agent-addr is implemented there.
Can be backported to 1.7.
This directive add possibility to set different address for agent-checks.
With this you can manage server status and weight from central place.
Can be backported to 1.7.
crt-list is extend to support ssl configuration. You can now have
such line in crt-list <file>:
mycert.pem [npn h2,http/1.1]
Support include "npn", "alpn", "verify", "ca_file", "crl_file",
"ecdhe", "ciphers" configuration and ssl options.
"crt-base" is also supported to fetch certificates.
When the stream's backend was defined, the request's analyzers flag was always
set to 0 if the stream had no listener. This bug was introduced with the filter
API but never triggered (I think so).
Because of the commit 5820a366, it is now possible to encountered it. For
example, this happens when the trace filter is enabled on a SPOE backend. The
fix is pretty trivial.
This fix must be backported to 1.7.
The previous version used an O(number of proxies)^2 algo to get the sum of
the number of maxconns of frontends which reference a backend at least once.
This new version adds the frontend's maxconn number to the backend's
struct proxy member 'tot_fe_maxconn' when the backend name is resolved
for switching rules or default_backend statment. At the end, the final
backend's fullconn is computed looping only one time for all on proxies O(n).
The load of a configuration using a large amount of backends (10 thousands)
without configured fullconn was reduced from several minutes to few seconds.
The output of whether prefer-server-ciphers is supported by OpenSSL
actually always show yes in 1.8, because SSL_OP_CIPHER_SERVER_PREFERENCE
is redefined before the actual check in src/ssl_sock.c, since it was
moved from here from src/haproxy.c.
Since this is not really relevant anymore as we don't support OpenSSL
< 0.9.7 anyway, this change just removes this output.
When haproxy is compiled without zlib or slz, the output of
haproxy -vv shows (null).
Make haproxy -vv output great again by providing the proper
information (which is what we did before).
This is for 1.8 only.
With BoringSSL:
SSL_set_shutdown: Assertion `(SSL_get_shutdown(ssl) & mode) == SSL_get_shutdown(ssl)' failed.
"SSL_set_shutdown causes ssl to behave as if the shutdown bitmask (see SSL_get_shutdown)
were mode. This may be used to skip sending or receiving close_notify in SSL_shutdown by
causing the implementation to believe the events already happened.
It is an error to use SSL_set_shutdown to unset a bit that has already been set.
Doing so will trigger an assert in debug builds and otherwise be ignored.
Use SSL_CTX_set_quiet_shutdown instead."
Change logic to not notify on SSL_shutdown when connection is not clean.
"X509_get_pubkey() attempts to decode the public key for certificate x.
If successful it returns the public key as an EVP_PKEY pointer with its
reference count incremented: this means the returned key must be freed
up after use."
This prevents DNS from resolving IPv6-only servers in 1.7. Note, this
patch depends on the previous series :
1. BUG/MINOR: tools: fix off-by-one in port size check
2. BUG/MEDIUM: server: consider AF_UNSPEC as a valid address family
3. MEDIUM: server: split the address and the port into two different fields
4. MINOR: tools: make str2sa_range() return the port in a separate argument
5. MINOR: server: take the destination port from the port field, not the addr
6. MEDIUM: server: disable protocol validations when the server doesn't resolve
This fix (hence the whole series) must be backported to 1.7.
When a server doesn't resolve we don't know the address family so we
can't perform the basic protocol validations. However we know that we'll
ultimately resolve to AF_INET4 or AF_INET6 so the controls are OK. It is
important to proceed like this otherwise it will not be possible to start
with unresolved addresses.
Next patch will cause the port to disappear from the address field when servers
do not resolve so we need to take it from the separate field provided by
str2sa_range().
Keeping the address and the port in the same field causes a lot of problems,
specifically on the DNS part where we're forced to cheat on the family to be
able to keep the port. This causes some issues such as some families not being
resolvable anymore.
This patch first moves the service port to a new field "svc_port" so that the
port field is never used anymore in the "addr" field (struct sockaddr_storage).
All call places were adapted (there aren't that many).
The DNS code is written so as to support AF_UNSPEC to decide on the
server family based on responses, but unfortunately snr_resolution_cb()
considers it as invalid causing a DNS storm to happen when a server
arrives with this family.
This situation is not supposed to happen as long as unresolved addresses
are forced to AF_INET, but this will change with the upcoming fixes and
it's possible that it's not granted already when changing an address on
the CLI.
This fix must be backported to 1.7 and 1.6.
port_to_str() checks that the port size is at least 5 characters instead
of at least 6. While in theory it could permit a buffer overflow, it's
harmless because all callers have at least 6 characters here.
This fix needs to be backported to 1.7, 1.6 and 1.5.
http-reuse should normally not be used in conjunction with the proxy
protocol or with "usesrc clientip". While there's nothing fundamentally
wrong with this, whenever these options are used, the server expects the
IP address to be the source address for all requests, which doesn't make
sense with http-reuse.
fc_rcvd_proxy : boolean
Returns true if the client initiated the connection with a PROXY protocol
header.
A flag is added on the struct connection if a PROXY header is successfully
parsed.
The older 'rsprep' directive allows modification of the status reason.
Extend 'http-response set-status' to take an optional string of the new
status reason.
http-response set-status 418 reason "I'm a coffeepot"
Matching updates in Lua code:
- AppletHTTP.set_status
- HTTP.res_set_status
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Commits 5f10ea3 ("OPTIM: http: improve parsing performance of long
URIs") and 0431f9d ("OPTIM: http: improve parsing performance of long
header lines") introduced a bug in the HTTP parser : when a partial
request is read, the first part ends up on a 8-bytes boundary (or 4-byte
on 32-bit machines), the end lies in the header field value part, and
the buffer used to contain a CR character exactly after the last block,
then the parser could be confused and read this CR character as being
part of the current request, then switch to a new state waiting for an
LF character. Then when the next part of the request appeared, it would
read the character following what was erroneously mistaken for a CR,
see that it is not an LF and fail on a bad request. In some cases, it
can even be worse and the header following the hole can be improperly
indexed causing all sort of unexpected behaviours like a content-length
being ignored or a header appended at the wrong position. The reason is
that there's no control of and of parsing just after breaking out of the
loop.
One way to reproduce it is with this config :
global
stats socket /tmp/sock1 mode 666 level admin
stats timeout 1d
frontend px
bind :8001
mode http
timeout client 10s
redirect location /
And sending requests this way :
$ tcploop 8001 C P S:"$(dd if=/dev/zero bs=16384 count=1 2>/dev/null | tr '\000' '\r')"
$ tcploop 8001 C P S:"$(dd if=/dev/zero bs=16384 count=1 2>/dev/null | tr '\000' '\r')"
$ tcploop 8001 C P \
S:"GET / HTTP/1.0\r\nX-padding: 0123456789.123456789.123456789.123456789.123456789.123456789.1234567" P \
S:"89.123456789\r\n\r\n" P
Then a "show errors" on the socket will report :
$ echo "show errors" | socat - /tmp/sock1
Total events captured on [04/Jan/2017:15:09:15.755] : 32
[04/Jan/2017:15:09:13.050] frontend px (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #31
src 127.0.0.1:59716, session #91, session flags 0x00000080
HTTP msg state 17, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00808002, out 0 bytes, total 111 bytes
pending 111 bytes, wrapping at 16384, error at position 107:
00000 GET / HTTP/1.0\r\n
00016 X-padding: 0123456789.123456789.123456789.123456789.123456789.12345678
00086+ 9.123456789.123456789\r\n
00109 \r\n
This fix must be backported to 1.7.
Many thanks to Aleksey Gordeev and Axel Reinhold for providing detailed
network captures and configurations exhibiting the issue.
debug_hexdump() prints to the requested output stream (typically stdout
or stderr) an hex dump of the blob passed in argument. This is useful
to help debug binary protocols.
Error captures almost always report a state 26 (MSG_ERROR) making it
very hard to know what the parser was expecting. The reason is that
we have to switch to MSG_ERROR to trigger the dump, and then during
the dump we capture the current state which is already MSG_ERROR. With
this change we now copy the current state into an err_state field that
will be reported as the faulty state.
This patch looks a bit large because the parser doesn't update the
current state until it runs out of data so the current state is never
known when jumping to ther error label! Thus the code had to be updated
to take copies of the current state before switching to MSG_ERROR based
on the switch/case values.
As a bonus, it now shows the current state in human-readable form and
not only in numeric form ; in the past it was not an issue since it was
always 26 (MSG_ERROR).
At least now we can get exploitable invalid request/response reports :
[05/Jan/2017:19:28:57.095] frontend f (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #1
src 127.0.0.1:39894, session #4, session flags 0x00000080
HTTP msg state MSG_RQURI(4), msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00908002, out 0 bytes, total 20 bytes
pending 20 bytes, wrapping at 16384, error at position 5:
00000 GET /\e HTTP/1.0\r\n
00017 \r\n
00019 \n
[05/Jan/2017:19:28:33.827] backend b (#3): invalid response
frontend f (#2), server s1 (#1), event #0
src 127.0.0.1:39718, session #0, session flags 0x000004ce
HTTP msg state MSG_HDR_NAME(17), msg flags 0x00000000, tx flags 0x08300000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x80008002, out 0 bytes, total 59 bytes
pending 59 bytes, wrapping at 16384, error at position 31:
00000 HTTP/1.1 200 OK\r\n
00017 Content-length : 10\r\n
00038 \r\n
00040 0a\r\n
00044 0123456789\r\n
00056 0\r\n
This should be backported to 1.7 and 1.6 at least to help with bug
reports.
It is important to defined analyzers (AN_REQ_* and AN_RES_*) in the same order
they are evaluated in process_stream. This order is really important because
during analyzers evaluation, we run them in the order of the lower bit to the
higher one. This way, when an analyzer adds/removes another one during its
evaluation, we know if it is located before or after it. So, when it adds an
analyzer which is located before it, we can switch to it immediately, even if it
has already been called once but removed since.
With the time, and introduction of new analyzers, this order was broken up. the
main problems come from the filter analyzers. We used values not related with
their evaluation order. Furthermore, we used same values for request and response
analyzers.
So, to fix the bug, filter analyzers have been splitted in 2 distinct lists to
have different analyzers for the request channel than those for the response
channel. And of course, we have moved them to the right place.
Some other analyzers have been reordered to respect the evaluation order:
* AN_REQ_HTTP_TARPIT has been moved just before AN_REQ_SRV_RULES
* AN_REQ_PRST_RDP_COOKIE has been moved just before AN_REQ_STICKING_RULES
* AN_RES_STORE_RULES has been moved just after AN_RES_WAIT_HTTP
Note today we have 29 analyzers, all stored into a 32 bits bitfield. So we can
still add 4 more analyzers before having a problem. A good way to fend off the
problem for a while could be to have a different bitfield for request and
response analyzers.
[wt: all of this must be backported to 1.7, and part of it must be backported
to 1.6 and 1.5]
The registered output type for the sample fetches sc*_get_gpt0
is a boolean, but the value returned is an integer.
This patch fixs the default type to SINT in place of BOOL.
This patch should be backported in 1.6 and 1.7
when using "option prefer-last-server", we may not always stay on
the same backend if option balance told us otherwise.
For example, backend may change in the following cases:
balance hdr()
balance rdp-cookie
balance source
balance uri
balance url_param
[wt: backport this to 1.7 and 1.6]
this adds a support of the newest pcre2 library,
more secure than its older sibling in a cost of a
more complex API.
It works pretty similarly to pcre's part to keep
the overall change smooth, except :
- we define the string class supported at compile time.
- after matching the ovec data is properly sized, althought
we do not take advantage of it here.
- the lack of jit support is treated less 'dramatically'
as pcre2_jit_compile in this case is 'no-op'.
In systemd mode (-Ds), the master haproxy process is waiting for each
child to exit in a specific order. If a process die when it's not his
turn, it will become a zombie process until every processes exit.
The master is now waiting for any process to exit in any order.
This patch should be backported to 1.7, 1.6 and 1.5.
Calling SSL_set_tlsext_host_name() on the current SSL ctx has no effect
if the session is being resumed because the hostname is already stored
in the session and is not advertised again in subsequent connections.
It's visible when enabling SNI and health checks at the same time because
checks do not send an SNI and regular traffic reuses the same connection,
resulting in no SNI being sent.
The only short-term solution is to reset the reused session when the
SNI changes compared to the previous one. It can make the server-side
performance suffer when SNIs are interleaved but it will work. A better
long-term solution would be to keep a small cache of a few contexts for
a few SNIs.
Now with SSL_set_session(ctx, NULL) it works. This needs to be double-
checked though. The man says that SSL_set_session() frees any previously
existing context. Some people report a bit of breakage when calling
SSL_set_session(NULL) on openssl 1.1.0a (freed session not reusable at
all though it's not an issue for now).
This needs to be backported to 1.7 and 1.6.
According to nbsrv() documentation this fetcher should return "an
integer value corresponding to the number of usable servers".
In case backend is disabled none of servers is usable, so I believe
fetcher should return 0.
This patch should be backported to 1.7, 1.6, 1.5.
Historically a lot of SSL global settings were stored into the global
struct, but we've reached a point where there are 3 ifdefs in it just
for this, and others in haproxy.c to initialize it.
This patch moves all the private fields to a new struct "global_ssl"
stored in ssl_sock.c. This includes :
char *crt_base;
char *ca_base;
char *listen_default_ciphers;
char *connect_default_ciphers;
int listen_default_ssloptions;
int connect_default_ssloptions;
int tune.sslprivatecache; /* Force to use a private session cache even if nbproc > 1 */
unsigned int tune.ssllifetime; /* SSL session lifetime in seconds */
unsigned int tune.ssl_max_record; /* SSL max record size */
unsigned int tune.ssl_default_dh_param; /* SSL maximum DH parameter size */
int tune.ssl_ctx_cache; /* max number of entries in the ssl_ctx cache. */
The "tune" part was removed (useless here) and the occasional "ssl"
prefixes were removed as well. Thus for example instead of
global.tune.ssl_default_dh_param
we now have :
global_ssl.default_dh_param
A few initializers were present in the constructor, they could be brought
back to the structure declaration.
A few other entries had to stay in global for now. They concern memory
calculationn (used in haproxy.c) and stats (used in stats.c).
The code is already much cleaner now, especially for global.h and haproxy.c
which become readable.
tlskeys_finalize_config() was the only reason for haproxy.c to still
require ifdef and includes for ssl_sock. This one fits perfectly well
in the late initializers so it was changed to be registered with
hap_register_post_check().
Now we can simply check the transport layer at run time and decide
whether or not to initialize or destroy these entries. This removes
other ifdefs and includes from cfgparse.c, haproxy.c and hlua.c.
Now we exclusively use xprt_get(XPRT_RAW) instead of &raw_sock or
xprt_get(XPRT_SSL) for &ssl_sock. This removes a bunch of #ifdef and
include spread over a number of location including backend, cfgparse,
checks, cli, hlua, log, server and session.
There are still a lot of #ifdef USE_OPENSSL in the code (still 43
occurences) because we never know if we can directly access ssl_sock
or not. This patch attacks the problem differently by providing a
way for transport layers to register themselves and for users to
retrieve the pointer. Unregistered transport layers will point to NULL
so it will be easy to check if SSL is registered or not. The mechanism
is very inexpensive as it relies on a two-entries array of pointers,
so the performance will not be affected.
Having it in the ifdef complicates certain operations which require
additional ifdefs just to access a member which could remain zero in
non-ssl cases. Let's move it out, it will not even increase the
struct size on 64-bit machines due to alignment.
Instead of hard-coding all SSL destruction in cfgparse.c and haproxy.c,
we now register this new function as the transport layer's destroy_bind_conf()
and call it only when defined. This removes some non-obvious SSL-specific
code and #ifdefs from cfgparse.c and haproxy.c
Instead of hard-coding all SSL preparation in cfgparse.c, we now register
this new function as the transport layer's prepare_bind_conf() and call it
only when definied. This removes some non-obvious SSL-specific code from
cfgparse.c as well as a #ifdef.
Most of the SSL functions used to have a proxy argument which was mostly
used to be able to emit clean errors using Alert(). First, many of them
were converted to memprintf() and don't require this pointer anymore.
Second, the rare which still need it also have either a bind_conf argument
or a server argument, both of which carry a pointer to the relevant proxy.
So let's now get rid of it, it needlessly complicates the API and certain
functions already have many arguments.
Historically, all listeners have a pointer to the frontend. But since
the introduction of SSL, we now have an intermediary layer called
bind_conf corresponding to a "bind" line. It makes no sense to have
the frontend on each listener given that it's the same for all
listeners belonging to a same bind_conf. Also certain parts like
SSL can only operate on bind_conf and need the frontend.
This patch fixes this by moving the frontend pointer from the listener
to the bind_conf. The extra indirection is quite cheap given and the
places were this is used are very scarce.
A mistake was made when the socket layer was cut into proto and
transport, the transport was attached to the listener while all
listeners in a single "bind" line always have exactly the same
transport. It doesn't seem obvious but this is the reason why there
are so many #ifdefs USE_OPENSSL in cfgparse : a lot of operations
have to be open-coded because cfgparse only manipulates bind_conf
and we don't have the information of the transport layer here.
Very little code makes use of the transport layer, mainly session
setup and log. These places can afford an extra pointer indirection
(the listener points to the bind_conf). This change is thus very small,
it saves a little bit of memory (8B per listener) and makes the code
more flexible.
The code currently creates a listener only to ensure that sess->li is
properly populated, and to retrieve the frontend (which is also available
directly from the session).
It turns out that the current infrastructure (for a large part) already
supports not having any listener on a session (since Lua does the same),
except for the following places which were not yet converted :
- session_count_new() : used by session_accept_fd, ie never for spoe
- session_accept_fd() : never used here, an applet initiates the session
- session_prepare_log_prefix() : embryonic sessions only, thus unused
- session_kill_embryonic() : same
- conn_complete_session() : same
- build_log_line() for fields %cp, %fp and %ft : unused here but may change
- http_wait_for_request() and subsequent functions : unused here
Thus for now it's as safe to run SPOE without listener as it is for Lua,
and this was an obstacle against some cleanups of the listener code. The
places above should be plugged so that it becomes save over the long term
as well.
An alternative in the future might be to create a dummy listener that
outgoing connections could use just to avoid keeping a null here.
The tcp rules may be applied to a TCP stream initiated by applets (spoe,
lua, peers, later H2). These ones do not necessarily have a valid listener
so we must verify the field is not null before updating the stats. For now
there's no way to trigger this bug because lua and peers don't have analysers,
h2 is not implemented and spoe has a dummy listener. But this threatens to
break at any instant.
"scur" was typed as "limit" (FO_CONFIG) and "config value" (FN_LIMIT).
The real types of "scur" are "metric" (FO_METRIC) and "gauge"
(FN_GAUGE). FO_METRIC and FN_GAUGE are the value 0.
ssl_sock functions don't mark pointers as NULL after freeing them. So
if a "bind" line specifies some SSL settings without the "ssl" keyword,
they will get freed at the end of check_config_validity(), then freed
a second time on exit. Simply mark the pointers as NULL to fix this.
This fix needs to be backported to 1.7 and 1.6.
We have a bug when SSL reuse is disabled on the server side : we reset
the context but do not set it to NULL, causing a multiple free of the
same entry. It seems like this bug cannot appear as-is with the current
code (or the conditions to get it are not obvious) but it did definitely
strike when trying to fix another bug with the SNI which forced a new
handshake.
This fix should be backported to 1.7, 1.6 and 1.5.
This finishes to clean up the zlib-specific parts. It also unbreaks recent
commit b97c6fb ("CLEANUP: compression: use the build options list to report
the algos") which broke USE_ZLIB due to MAXWBITS not being defined anymore
in haproxy.c.
The following keywords were still parsed in cfgparse and were moved
to ssl_sock to remove some #ifdefs :
"tune.ssl.cachesize", "tune.ssl.default-dh-param", "tune.ssl.force-private-cache",
"tune.ssl.lifetime", "tune.ssl.maxrecord", "tune.ssl.ssl-ctx-cache-size".
It's worth mentionning that some of them used to have incorrect sign
checks possibly resulting in some negative values being used. All of
them are now checked for being positive.
This removes 2 #ifdefs and makes the code much cleaner. The controls
are still there and the two parsers have been merged into a single
function ssl_parse_global_ca_crt_base().
It's worth noting that there's still a check to prevent a change when
the value was already specified. This test seems useless and possibly
counter-productive, it may have to be revisited later, but for now it
was implemented identically.
We already had alertif_too_many_args{,_idx}(), but these ones are
specifically designed for use in cfgparse. Outside of it we're
trying to avoid calling Alert() all the time so we need an
equivalent using a pointer to an error message.
These new functions called too_many_args{,_idx)() do exactly this.
They don't take the file name nor the line number which they have
no use for but instead they take an optional pointer to an error
message and the pointer to the error code is optional as well.
With (NULL, NULL) they'll simply check the validity and return a
verdict. They are quite convenient for use in isolated keyword
parsers.
These two new functions as well as the previous ones have all been
exported.
We replaced global.deviceatlas with global_deviceatlas since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in da.c. The file da.h was now
removed because it was only used to load dac.h, which is more easily
loaded directly from da.c. It provides another good example of how to
integrate code in the future without touching the core parts.
We replaced global._51degrees with global_51degrees since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in 51d.c. The file 51d.h was now
removed because it was only used to load 51Degrees.h, which is more easily
loaded from 51d.c. It provides a good example of how to integrate code in
the future without touching the core parts.
We replaced global.wurfl with global_wurfl since there's no need to store
all this into the global section. This removes the last #ifdefs, and now
the code is 100% self-contained in wurfl.c. It provides a good example of
how to integrate code in the future without touching the core parts.
deinit_51degrees() is not called anymore from haproxy.c, removing
2 #ifdefs and one include. The function was made static. The include
file still includes 51Degrees.h which is needed by global.h and 51d.c
so it was not touched beyond this last function removal.
By registering the deinit function we avoid another #ifdef in haproxy.c.
The ha_wurfl_deinit() function has been made static and unexported. Now
proto/wurfl.h is totally empty, the code being self-contained in wurfl.c,
so the useless .h has been removed.
The 3 device detection engines stop at the same place in deinit()
with the usual #ifdefs. Similar to the other functions we can have
some late deinitialization functions. These functions do not return
anything however so we have to use a different type.
Instead of having a #ifdef in the main init code we now use the registered
init functions. Doing so also enables error checking as errors were previously
reported as alerts but ignored. Also they were incorrect as the 'status'
variable was hidden by a second one and was always reporting DA_SYS (which
is apparently an error) in every case including the case where no file was
loaded. The init_deviceatlas() function was unexported since it's not used
outside of this place anymore.
This removes some #ifdefs from the main haproxy code path. Function
init_51degrees() now returns ERR_* instead of exit(1) on error, and
this function was made static and is not exported anymore.
This removes some #ifdefs from the main haproxy code path and enables
error checking. The current code only makes use of warnings even for
some errors that look serious. While this choice is questionnable, it
has been kept as-is, and only the return codes were adapted to ERR_WARN
to at least report that some warnings were emitted. ha_wurfl_init() was
unexported as it's not needed anymore.
Instead of calling the checks directly from the init code, we now
register the start_checks() function to be run at this point. This
also allows to unexport the check init function and to remove one
include from haproxy.c.
There's a significant amount of late initialization calls which are
performed after the point where we exit in check mode. These calls
are used to allocate resource and perform certain slow operations.
Let's have a way to register some functions which need to be called
there instead of having this multitude of #ifdef in the init path.
This removes 2 #ifdef, an include, an ugly construct and a wild "extern"
declaration from haproxy.c. The message indicating that compression is
*not* enabled is not there anymore.
Many extensions now report some build options to ease debugging, but
this is now being done at the expense of code maintainability. Let's
provide a registration function to do this so that we can start to
remove most of the #ifdefs from haproxy.c (18 currently just for a
single function).
If the memory allocator fails, it return a bad code, and the execution
continue. If the Lua/cli initializer fails, the allocated struct is not
released.
If the lua/cli fails during initialization, it returns an ok
status, an the execution continue. This will probably occur a
segfault.
Thiw patch should be backported in 1.7
This is a regression from the commit a73e59b690.
When data are sent from a cosocket, the action is done in the context of the
applet running a lua stack and not in the context of the applet owning the
cosocket. So we must take care, explicitly, that this last applet have a buffer
(the req buffer of the cosocket).
This patch must be backported in 1.7
In the case of applet, the Lua context is taken from session
when we get the private values. This patch just update comments
associated to this action because it is not obvious.
This one now migrates to the general purpose cli.p0 for the ref pointer,
cli.i0 for the dump_all flag and cli.i1 for the dump_keys_index. A few
comments were added.
The applet.h file doesn't depend on openssl anymore. It's worth noting
that the previous dependency was accidental and only used to work because
all files including this one used to have openssl included prior to
loading this file.
This one now migrates to the general purpose cli.p0 for the proxy pointer,
cli.p1 for the server pointer, and cli.i0 for the proxy's instance if only
one has to be dumped.
Most of the keywords don't need to have their own entry in the appctx
union, they just need to reuse some generic pointers like we've been
used to do in the appctx with st{0,1,2}. This patch adds p0, p1, i0, i1
and initializes them to zero before calling the parser. This way some
of the simplest existing keywords will be able to disappear from the
union.
It's worth noting that this is an extension to what was initially
attempted via the "private" member that I removed a few patches ago by
not understanding how it was supposed to be used. Here the fact that
we share the same union will force us to be stricter: the code either
uses the general purpose variables or it uses its own fields but not
both.
This is a leftover from the cleanup campaign, the stats scope was still
initialized by the CLI instead of being initialized by the stats keyword
parsers. This should probably be backported to 1.7 to make the code more
consistent.
Sometimes a registered keyword will not need any specific parsing nor
initialization, so it's annoying to have to write an empty parsing
function returning zero just for this.
This patch makes it possible to automatically call a keyword's I/O
handler of when the parsing function is not defined, while still allowing
a parser to set the I/O handler itself.
The signals system embedded in Lua can be tranformed in general purpose
signals code. To reach this goal, this path removes the Lua part of the
signals.
This is an easy job, because Lua is useles with signal. I change just two
prototypes.
The struct hlua size is 128 bytes. The size is the biggest of all the elements
of the union embedded in the appctx struct. With HTTP2, it is possible that this
appctx struct will be use many times for each connection, so the 128 bytes are
a little bit heavy for the global memory consomation.
This patch replace the embbeded hlua struct by a pointer and an associated memory
pool. Now, the memory for lua is allocated only if it is required.
[wt: the appctx is now down to 160 bytes]
Another small bug in "show cli sockets" made the last fix always report
process 64 due to a signedness issue in the shift operation when building
the mask.
Just like previous patch, this was the only other user of the "private"
field of the applet. It used to store a copy of the keyword's action.
Let's just put it into ->table->action and use it from there. It also
slightly simplifies the code by removing a few pointer to integer casts.
We have very few users of the appctx's private field which was introduced
prior to the split of the CLI. Unfortunately it was not removed after the
end. This commit simply introduces hlua_cli->fcn which is the pointer to
the Lua function that the Lua code used to store in this private pointer.
While developing an experimental applet performing only one read per full
line, it appeared that it would be woken up for the client's close, not
read all data (missing LF), then wait for a subsequent call, and would only
be woken up on client timeout to finish the read. The reason is that we
preset SI_FL_WAIT_DATA in the stream-interface's flags to avoid a fast loop,
but there's nothing which can remove this flag until there's a read operation.
We must definitely remove it in stream_int_notify() each time we're called
with CF_SHUTW_NOW because we know there will be no more subsequent read
and we don't want an applet which keeps the WANT_GET flag to block on this.
This fix should be backported to 1.7 and 1.6 though it's uncertain whether
cli, peers, lua or spoe really are affected there.
This problem is already detected here:
8dc7316a6f
Another case raises. Now HAProxy sends a final message (typically
with "http-request deny"). Once the the message is sent, the response
channel flags are not modified.
HAProxy executes a Lua sample-fecthes for building logs, and the
result is ignored because the response flag remains set to the value
HTTP_MSG_RPBEFORE. So the Lua function hlua_check_proto() want to
guarantee the valid state of the buffer and ask for aborting the
request.
The function check_proto() is not the good way to ensure request
consistency. The real question is not "Are the message valid ?", but
"Are the validity of message unchanged ?"
This patch memorize the parser state before entering int the Lua
code, and perform a check when it go out of the Lua code. If the parser
state change for down, the request is aborted because the HTTP message
is degraded.
This patch should be backported in version 1.6 and 1.7
Fixing the build using LibreSSL as OpenSSL implementation.
Currently, LibreSSL 2.4.4 provides the same API of OpenSSL 1.0.1x,
but it redefine the OpenSSL version number as 2.0.x, breaking all
checks with OpenSSL 1.1.x.
The patch solves the issue checking the definition of the symbol
LIBRESSL_VERSION_NUMBER when Openssl 1.1.x features are requested.
When an entity tries to get a buffer, if it cannot be allocted, for example
because the number of buffers which may be allocated per process is limited,
this entity is added in a list (called <buffer_wq>) and wait for an available
buffer.
Historically, the <buffer_wq> list was logically attached to streams because it
were the only entities likely to be added in it. Now, applets can also be
waiting for a free buffer. And with filters, we could imagine to have more other
entities waiting for a buffer. So it make sense to have a generic list.
Anyway, with the current design there is a bug. When an applet failed to get a
buffer, it will wait. But we add the stream attached to the applet in
<buffer_wq>, instead of the applet itself. So when a buffer is available, we
wake up the stream and not the waiting applet. So, it is possible to have
waiting applets and never awakened.
So, now, <buffer_wq> is independant from streams. And we really add the waiting
entity in <buffer_wq>. To be generic, the entity is responsible to define the
callback used to awaken it.
In addition, applets will still request an input buffer when they become
active. But they will not be sleeped anymore if no buffer are available. So this
is the responsibility to the applet I/O handler to check if this buffer is
allocated or not. This way, an applet can decide if this buffer is required or
not and can do additional processing if not.
[wt: backport to 1.7 and 1.6]
A stream can be awakened for different reasons. During its processing, it can be
early stopped if no buffer is available. In this situation, the reason why the
stream was awakened is lost, because we rely on the task state, which is reset
after each processing loop.
In many cases, that's not a big deal. But it can be useful to accumulate the
task states if the stream processing is interrupted, especially if some filters
need to be called.
To be clearer, here is an simple example:
1) A stream is awakened with the reason TASK_WOKEN_MSG.
2) Because no buffer is available, the processing is interrupted, the stream
is back to sleep. And the task state is reset.
3) Some buffers become available, so the stream is awakened with the reason
TASK_WOKEN_RES. At this step, the previous reason (TASK_WOKEN_MSG) is lost.
Now, the task states are saved for a stream and reset only when the stream
processing is not interrupted. The correspoing bitfield represents the pending
events for a stream. And we use this one instead of the task state during the
stream processing.
Note that TASK_WOKEN_TIMER and TASK_WOKEN_RES are always removed because these
events are always handled during the stream processing.
[wt: backport to 1.7 and 1.6]
<run_queue> is used to track the number of task in the run queue and
<run_queue_cur> is a copy used for the reporting purpose. These counters has
been renamed, respectively, <tasks_run_queue> and <tasks_run_queue_cur>. So the
naming is consistent between tasks and applets.
[wt: needed for next fixes, backport to 1.7 and 1.6]
As for tasks, 2 counters has been added to track :
* the total number of applets : nb_applets
* the number of active applets : applets_active_queue
[wt: needed for next fixes, to backport to 1.7 and 1.6]
[wt: while it could seem suspicious, the preceeding call to
dump_servers_state() indeed flushes the trash in case anything is
emitted. No backport needed though.]
When the option "http-buffer-request" is set, HAProxy send itself the
"HTTP/1.1 100 Continue" response in order to retrieve the post content.
When HAProxy forward the request, it send the body directly after the
headers. The header "Expect: 100-continue" was sent with the headers.
This header is useless because the body will be sent in all cases, and
the server reponse is not removed by haproxy.
This patch removes the header "Expect: 100-continue" if HAProxy sent it
itself.
These 2 patches add ability to fetch frontend/backend name in your
logic, so they can be used later to make routing decisions (fe_name) or
taking some actions based on backend which responded to request (be_name).
In our case we needed a fetcher to be able to extract information we
needed from frontend name.
(http|tcp)-(request|response) action cannot take arguments from the
configuration file. Arguments are useful for executing the action with
a special context.
This patch adds the possibility of passing arguments to an action. It
runs exactly like sample fetches and other Lua wrappers.
Note that this patch implements a 'TODO'.
The variable are compared only using text, the final '\0' (or the
string length) are not checked. So, the variable name "txn.internal"
matchs other one call "txn.int".
This patch fix this behavior
It must be backported ni 1.6 and 1.7
By investigating a keep-alive issue with CloudFlare, we[1] found that
when using the 'set-cookie' option in a redirect (302) HAproxy is adding
an extra `\r\n`.
Triggering rule :
`http-request redirect location / set-cookie Cookie=value if [...]`
Expected result :
```
HTTP/1.1 302 Found
Cache-Control: no-cache
Content-length: 0
Location: /
Set-Cookie: Cookie=value; path=/;
Connection: close
```
Actual result :
```
HTTP/1.1 302 Found
Cache-Control: no-cache
Content-length: 0
Location: /
Set-Cookie: Cookie=value; path=/;
Connection: close
```
This extra `\r\n` seems to be harmless with another HAproxy instance in
front of it (sanitizing) or when using a browser. But we confirm that
the CloudFlare NGINX implementation is not able to handle this. It
seems that both 'Content-length: 0' and extra carriage return broke RFC
(to be confirmed).
When looking into the code, this carriage-return was already present in
1.3.X versions but just before closing the connection which was ok I
think. Then, with 1.4.X the keep-alive feature was added and this piece
of code remains unchanged.
[1] all credit for the bug finding goes to CloudFlare Support Team
[wt: the bug was indeed present since the Set-Cookie was introduced
in 1.3.16, by commit 0140f25 ("[MINOR] redirect: add support for
"set-cookie" and "clear-cookie"") so backporting to all supported
versions is desired]
The recent CLI reorganization managed to break these two commands
by having their parser return 1 (indicating an end of processing)
instead of 0 to indicate new calls to the io handler were needed.
Namely the faulty commits are :
69e9644 ("REORG: cli: move show stat resolvers to dns.c")
32af203 ("REORG: cli: move ssl CLI functions to ssl_sock.c")
The fix is trivial and there is no other loss of functionality. Thanks
to Dragan Dosen for reporting the issue and the faulty commits. The
backport is needed in 1.7.
In 1.5-dev20, commit 48bcfda ("MEDIUM: dumpstat: make the CLI parser
understand the backslash as an escape char") introduced support for
backslash on the CLI, but it strips all backslashes in all arguments
instead of only unescaping them, making it impossible to pass a
backslash in an argument.
This will allow us to use a backslash in a command over the socket, eg.
"add acl #0 ABC\\XYZ".
[wt: this should be backported to 1.7 and 1.6]
Commit 5fddab0 ("OPTIM: stream_interface: disable reading when
CF_READ_DONTWAIT is set") improved the connection layer's efficiency
back in 1.5-dev13 by avoiding successive read attempts on an active
FD. But by disabling this on a polled FD, it causes an unpleasant
side effect which is that the FD that was subscribed to polling is
suddenly stopped and may need to be re-enabled once the kernel
starts to slow down on data eviction (eg: saturated server at the
other end, bursty traffic caused by too large maxpollevents).
This behaviour is observable with persistent connections when there
is a large enough connection count so that there's no data in the
early connection and polling is required, because there are then
up to 4 epoll_ctl() calls per request. It's important that the
server is slower than haproxy to cause some delays when reading
response.
The current connection layer as designed in 1.6 with the FD cache
doesn't require this trick anymore, though it still benefits from
it when it saves an FD from being uselessly polled. But compared
to the increased cost of enabling and disabling poll all the time,
it's still better to disable it. In some cases it's possible to
observe a performance increase as high as 30% by avoiding this
epoll_ctl() dance.
In the end we only want to disable it when the FD is speculatively
read and not when it's polled. For this we introduce a new function
__conn_data_done_recv() which is used to indicate that we're done
with recv() and not interested in new attempts. If/when we later
support event-triggered epoll, this function will have to change
a bit to do the same even in the polled case.
A quick test with keep-alive requests run on a dual-core / dual-
thread Atom shows a significant improvement :
single process, 0 bytes :
before: Requests per second: 12243.20 [#/sec] (mean)
after: Requests per second: 13354.54 [#/sec] (mean)
single process, 4k :
before: Requests per second: 9639.81 [#/sec] (mean)
after: Requests per second: 10991.89 [#/sec] (mean)
dual process, 0 bytes (unstable) :
before: Requests per second: 16900-19800 ~ 17600 [#/sec] (mean)
after: Requests per second: 18600-21400 ~ 20500 [#/sec] (mean)
In 1.6-dev2, commit 32990b5 ("MEDIUM: session: remove the task pointer
from the session") introduced a bug which can sometimes crash the process
on resource shortage. When stream_complete() returns -1, it has already
reattached the connection to the stream, then kill_mini_session() is
called and still expects to find the task in conn->owner. Note that
since this commit, the code has moved a bit and is now in stream_new()
but the problem remains the same.
Given that we already know the task around these places, let's simply
pass the task to kill_mini_session().
The conditions currently at risk are :
- failure to initialize filters for the new stream (lack of memory or
any filter returning < 0 on attach())
- failure to attach filters (any filter returning < 0 on stream_start())
- frontend's accept() returning < 0 (allocation failure)
This fix is needed in 1.7 and 1.6.
This allow a filter to start to analyze data in HTTP and to fallback in TCP when
data are tunneled.
[wt: backport desired in 1.7 - no impact right now but may impact the ability
to backport future fixes]
These 2 analyzers are responsible of the data forwarding in, respectively, HTTP
mode and TCP mode. Now, the analyzer responsible of the HTTP data forwarding is
called before the one responsible of the TCP data forwarding. This will allow
the filtering of tunneled data in HTTP.
[wt: backport desired in 1.7 - no impact right now but may impact the ability
to backport future fixes]
In HAProxy 1.6, When "http-tunnel" option is enabled, HTTP transactions are
tunneled as soon as possible after the headers parsing/forwarding. When the
transfer length of the response can be determined, this happens when all data
are forwarded. But for responses with an undetermined transfer length this
happens when headers are forwarded. This behavior is questionable, but this is
not the purpose of this fix...
In HAProxy 1.7, the first use-case works like in 1.6. But the second one not
because of the data filtering. HAProxy was always trying to forward data until
the server closes the connection. So the transaction was never switched in
tunnel mode. This is the expected behavior when there is a data filter. But in
the default case (no data filter), it should work like in 1.6.
This patch fixes the bug. We analyze response data until the server closes the
connection only when there is a data filter.
[wt: backport needed in 1.7]
When a 2xx response to a CONNECT request is returned, the connection must be
switched in tunnel mode immediatly after the headers, and Transfer-Encoding and
Content-Length headers must be ignored. So from the HTTP parser point of view,
there is no body.
The bug comes from the fact the flag HTTP_MSGF_XFER_LEN was not set on the
response (This flag means that the body size can be determined. In our case, it
can, it is 0). So, during data forwarding, the connection was never switched in
tunnel mode and we were blocked in a state where we were waiting that the
server closes the connection to ends the response.
Setting the flag HTTP_MSGF_XFER_LEN on the response fixed the bug.
The code of http_wait_for_response has been slightly updated to be more
readable.
[wt: 1.7-only, this is not needed in 1.6]
When a backend doesn't use any known LB algorithm, backend_lb_algo_str()
returns NULL. It used to cause "nil" to be printed in the stats dump
since version 1.4 but causes 1.7 to try to parse this NULL to encode
it as a CSV string, causing a crash on "show stat" in this case.
The only situation where this can happen is when "transparent" or
"dispatch" are used in a proxy, in which case the LB algorithm is
BE_LB_ALGO_NONE. Thus now we explicitly report "none" when this
situation is detected, and we preventively report "unknown" if any
unknown algorithm is detected, which may happen if such an algo is
added in the future and the function is not updated.
This fix must be backported to 1.7 and may be backported as far as
1.4, though it has less impact there.
Historically we used to have the stick counters processing put into
session.c which became stream.c. But a big part of it is now in
stick-table.c (eg: converters) but despite this we still have all
the sample fetch functions in stream.c
These parts do not depend on the stream anymore, so let's move the
remaining chunks to stick-table.c and have cleaner files.
What remains in stream.c is everything needed to attach/detach
trackers to the stream and to update the counters while the stream
is being processed.
There's no more reason to keep tcp rules processing inside proto_tcp.c
given that there is nothing in common there except these 3 letters : tcp.
The tcp rules are in fact connection, session and content processing rules.
Let's move them to "tcp-rules" and let them live their life there.
There are 8 functions each repeating what another does and adding one
extra test. We used to have some copy-paste issues in the past due to
this. Instead we now make them simply rely on the previous one and add
the final test. It's much better and much safer. The functions could
be moved to inlines but they're used at a few other locations only,
it didn't make much sense in the end.
We used to have 3 types of counters with a huge overlap :
- listener counters : stats collected for each bind line
- proxy counters : union of the frontend and backend counters
- server counters : stats collected per server
It happens that quite a good part was common between listeners and
proxies due to the frontend counters being updated at the two locations,
and that similarly the server and proxy counters were overlapping and
being updated together.
This patch cleans this up to propose only two types of counters :
- fe_counters: used by frontends and listeners, related to
incoming connections activity
- be_counters: used by backends and servers, related to outgoing
connections activity
This allowed to remove some non-sensical counters from both parts. For
frontends, the following entries were removed :
cum_lbconn, last_sess, nbpend_max, failed_conns, failed_resp,
retries, redispatches, q_time, c_time, d_time, t_time
For backends, this ones was removed : intercepted_req.
While doing this it was discovered that we used to incorrectly report
intercepted_req for backends in the HTML stats, which was always zero
since it's never updated.
Also it revealed a few inconsistencies (which were not fixed as they
are harmless). For example, backends count connections (cum_conn)
instead of sessions while servers count sessions and not connections.
Over the long term, some extra cleanups may be performed by having
some counters update functions touching both the server and backend
at the same time, as well as both the frontend and listener, to
ensure that all sides have all their stats properly filled. The stats
dump will also be able to factor the dump functions by counter types.
When dealing with many proxies, it's hard to spot response errors because
all internet-facing frontends constantly receive attacks. This patch now
makes it possible to demand that only request or response errors are dumped
by appending "request" or "reponse" to the show errors command.
The function log format emit its own error message using Alert(). This
patch replaces this behavior and uses the standard HAProxy error system
(with memprintf).
The benefits are:
- cleaning the log system
- the logformat can ignore the caller (actually the caller must set
a flag designing the caller function).
- Make the usage of the logformat function easy for future components.
For tokenizing a string, standard Lua recommends to use regexes.
The followinf example splits words:
for i in string.gmatch(example, "%S+") do
print(i)
end
This is a little bit overkill for simply split words. This patch
adds a tokenize function which quick and do not use regexes.
gcc 3.4.6 noticed a possibly unitialized variable in vars.c, and while it
cannot happen the way the function is used, it's surprizing that newer
versions did not report it.
This fix may be backported to 1.6.
This patch takes into account the return code of the parse_logformat_string()
function. Now the configuration parser will fail if the log_format is not
strict.
Until now, the function parse_logformat_string() never fails. It
send warnings when it parses bad format, and returns expression in
best effort.
This patch replaces warnings by alert and returns a fail code.
Maybe the warning mode is designed for a compatibility with old
configuration versions. If it is the case, now this compatibility
is broken.
[wt: no, the reason is that an alert must cause a startup failure,
but this will be OK with next patch]
The log-format function parse_logformat_string() takes file and line
for building parsing logs. These two parameters are embedded in the
struct proxy curproxy, which is the current parsing context.
This patch removes these two unused arguments.
This patch replace the successful return code from 0 to 1. The
error code is replaced from 1 to 0.
The return code of this function is actually unused, so this
patch cannot modify the behaviour.
This patch replaces the successful return code from 0 to 1. The
error code is replaced from -1 to 0.
The return code of this function is actually unused, so this
patch cannot modify the behaviour.
The caller must log location information, so this information is
provided two times in the log line. The error log is like this:
[ALERT] 327/011513 (14291) : parsing [o3.conf:38]: 'http-response
set-header': Sample fetch <method,json(rrr)> failed with : invalid
args in conv method 'json' : Unexpected input code type at file
'o3.conf', line 38. Allowed value are 'ascii', 'utf8', 'utf8s',
'utf8p' and 'utf8ps'.
This patch removes the second location indication, the the same error
becomes:
[ALERT] 327/011637 (14367) : parsing [o3.conf:38]: 'http-response
set-header': Sample fetch <method,json(rrr)> failed with : invalid
args in conv method 'json' : Unexpected input code type. Allowed
value are 'ascii', 'utf8', 'utf8s', 'utf8p' and 'utf8ps'.
stats_sock_parse_request() was renamed cli_parse_request(). It now takes
an appctx instead of a stream interface, and presets ->st2 to 0 so that
most handlers will not have to set it anymore. The io_handler is set by
default to the keyword's IO handler so that the parser can simply change
it without having to rewrite the new state.
All 4 rate-limit settings were handled at once given that exactly the
same checks are performed on them. In case of missing or incorrect
argument, the detailed supported options are printed with their use
case.
This was the last specific entry in the CLI parser, some additional
cleanup may still be done.
Also mention that "set server" is preferred now. Note that these
were the last enable/disable commands in cli.c. Also remove the
now unused expect_server_admin() function.
It could be argued that it's between server, stream and session but
at least due to the fact that it operates on streams, its best place
is in stream.c.
This way we don't have any more state specific to a given yieldable
command. The other commands should be easier to move as they only
involve a parser.
It really belongs to proto_http.c since it's a dump for HTTP request
and response errors. Note that it's possible that some parts do not
need to be exported anymore since it really is the only place where
errors are manipulated.
The table dump code was a horrible mess, with common parts interleaved
all the way to deal with the various actions (set/clear/show). A few
error messages were still incorrect, as the "set" operation did not
update them so they would still report "unknown action" (now fixed).
The action was now passed as a private argument to the CLI keyword
which itself is copied into the appctx private field. It's just an
int cast to a pointer.
Some minor issues were noticed while doing this, for example when dumping
an entry by key, if the key doesn't exist, nothing is printed, not even
the table's header. It's unclear whether this was intentional but it
doesn't really match what is done for data-based dumps. It was left
unchanged for now so that a later fix can be backported if needed.
Enum entries STAT_CLI_O_TAB, STAT_CLI_O_CLR and STAT_CLI_O_SET were
removed.
Move the "show info" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_info_to_buffer() function
is now static again. Note, we don't need proto_ssl anymore in cli.c.
Move the "show stat" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_stat_to_buffer() function
is now static again.
Move 'show sess' CLI functions to stream.c and use the cli keyword API
to register it on the CLI.
[wt: the choice of stream vs session makes sense because since 1.6 these
really are streams that we're dumping and not sessions anymore]
Several CLI commands require a frontend, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Several CLI commands require a server, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Move map and acl CLI functions to map.c and use the cli keyword API to
register actions on the CLI. Then remove the now unused individual
"add" and "del" keywords.
proto/dumpstats.h has been split in 4 files:
* proto/cli.h contains protypes for the CLI
* proto/stats.h contains prototypes for the stats
* types/cli.h contains definition for the CLI
* types/stats.h contains definition for the stats
When the CLI's timeout is reduced, nothing was done to take the task
up to update it. In the past it used to run inside process_stream()
so it used to be refreshed. This is not the case anymore since we have
the appctx so the task needs to be woken up in order to recompute the
new expiration date.
This fix needs to be backported to 1.6.
The "set maxconn frontend" statement on the CLI tries to dequeue possibly
pending requests, but due to a copy-paste error, they're dequeued on the
CLI's frontend instead of the one being changed.
The impact is very minor as it only means that possibly pending connections
will still have to wait for a previous one to complete before being accepted
when a limit is raised.
This fix has to be backported to 1.6 and 1.5.
In dumpstats.c we have get_conn_xprt_name() and get_conn_data_name() to
report the name of the data and transport layers used on a connection.
But when the name is not known, its pointer is reported instead. But the
static char used to report the pointer is too small as it doesn't leave
room for '0x'. Fortunately all subsystems are known so we never trigger
this case.
This fix needs to be backported to 1.6 and 1.5.
uint16_t instead of u_int16_t
None ISO fields of struct tm are not present, but
by zeroyfing it, on GNU and BSD systems tm_gmtoff
field will be set.
[wt: moved the memset into each of the date functions]
It defines the variable to set when an error occurred during an event
processing. It will only be set when an error occurred in the scope of the
transaction. As for all other variables define by the SPOE, it will be
prefixed. So, if your variable name is "error" and your prefix is "my_spoe_pfx",
the variable will be "txn.my_spoe_pfx.error".
When set, the variable is the boolean "true". Note that if "option
continue-on-error" is set, the variable is not automatically removed between
events processing.
"maxconnrate" is the maximum number of connections per second. The SPOE will
stop to open new connections if the maximum is reached and will wait to acquire
an existing one.
"maxerrrate" is the maximum number of errors per second. The SPOE will stop its
processing if the maximum is reached.
These options replace hardcoded macros MAX_NEW_SPOE_APPLETS and
MAX_NEW_SPOE_APPLET_ERRS. We use it to limit SPOE activity, especially when
servers are down..
By default, for a specific stream, when an abnormal/unexpected error occurs, the
SPOE is disabled for all the transaction. So if you have several events
configured, such error on an event will disabled all followings. For TCP
streams, this will disable the SPOE for the whole session. For HTTP streams,
this will disable it for the transaction (request and response).
To bypass this behaviour, you can set 'continue-on-error' option in 'spoe-agent'
section. With this option, only the current event will be ignored.
It is a way to set the maximum time to wait for a stream to process an event,
i.e to acquire a stream to talk with an agent, to encode all messages, to send
the NOTIFY frame, to receive the corrsponding acknowledgement and to process all
actions. It is applied on the stream that handle the client and the server
sessions.
When:
- A Lua action return data and close the channel. The request status
is set to HTTP_MSG_CLOSED for the request and HTTP_MSG_DONE for the
response.
- HAProxy sets the state HTTP_MSG_ERROR. I don't known why, because
there are many line which sets this state.
- A Lua sample-fetch is executed, typically for building the log
line.
- When the Lua sample fetch exits, a control of the data is
executed. If HAProxy is currently parsing the request, the request
is aborted in order to prevent a segfault or sending corrupted
data.
This ast control is executed comparing the state HTTP_MSG_BODY. When
this state is reached, the request is parsed and no error are
possible. When the state is < than HTTP_MSG_BODY, the parser is
running.
Unfortunately, the code HTTP_MSG_ERROR is just < HTTP_MSG_BODY. When
we are in error, we want to terminate the execution of Lua without
error.
This patch changes the comparaison level.
This patch must be backported in 1.6
Gernot Prner reported some constant leak of ref counts for stick tables
entries. It happens that this leak was not at all in the regular traffic
path but on the "show table" path. An extra ref count was taken during
the dump if the output had to be paused, and it was released upon clean
termination or an error detected in the I/O handler. But the release
handler didn't do it, while it used to properly do it for the sessions
dump.
This fix needs to be backported to 1.6.
Commit ef8f4fe ("BUG/MINOR: stick-table: handle out-of-memory condition
gracefully") unfortunately got trapped by a pointer operation. Replacing
ts = poll_alloc() + size;
with :
ts = poll_alloc();
ts += size;
Doesn't give the same result because pool_alloc() is void while ts is a
struct stksess*. So now we don't access the same places, which is visible
in certain stick-table scenarios causing a crash.
This must be backported to 1.6 and 1.5.
Setting an FD to -1 when closed isn't the most easily noticeable thing
to do when we're chasing accidental reuse of a stale file descriptor.
Instead set it to that large a negative value that it will overflow the
fdtab and provide an analysable core at the moment the issue happens.
Care was taken to ensure it doesn't overflow nor change sign on 32-bit
machines when multiplied by fdtab, and that it also remains negative for
the various checks that exist. The value equals 0xFDDEADFD which happens
to be easily spotted in a debugger.
The bug described in commit 568743a ("BUG/MEDIUM: stream-int: completely
detach connection on connect error") was not a stream-interface layer bug
but a connection layer bug. There was exactly one place in the code where
we could change a file descriptor's status without first checking whether
it is valid or not, it was in conn_stop_polling(). This one is called when
the polling status is changed after an update, and calls fd_stop_both even
if we had already closed the file descriptor :
1479388298.484240 ->->->->-> conn_fd_handler > conn_cond_update_polling
1479388298.484240 ->->->->->-> conn_cond_update_polling > conn_stop_polling
1479388298.484241 ->->->->->->-> conn_stop_polling > conn_ctrl_ready
1479388298.484241 conn_stop_polling < conn_ctrl_ready
1479388298.484241 ->->->->->->-> conn_stop_polling > fd_stop_both
1479388298.484242 ->->->->->->->-> fd_stop_both > fd_update_cache
1479388298.484242 ->->->->->->->->-> fd_update_cache > fd_release_cache_entry
1479388298.484242 fd_update_cache < fd_release_cache_entry
1479388298.484243 fd_stop_both < fd_update_cache
1479388298.484243 conn_stop_polling < fd_stop_both
1479388298.484243 conn_cond_update_polling < conn_stop_polling
1479388298.484243 conn_fd_handler < conn_cond_update_polling
The problem with the previous fix above is that it break the http_proxy mode
and possibly even some Lua parts and peers to a certain extent ; all outgoing
connections where the target address is initially copied into the outgoing
connection which experience a retry would use a random outgoing address after
the retry because closing and detaching the connection causes the target
address to be lost. This was attempted to be addressed by commit 0857d7a
("BUG/MAJOR: stream: properly mark the server address as unset on connect
retry") but it used to only solve the most visible effect and not the root
cause.
Prior to this fix, it was possible to cause this config to keep CLOSE_WAIT
for as long as it takes to expire a client or server timeout (note the
missing client timeout) :
listen test
mode http
bind :8002
server s1 127.0.0.1:8001
$ tcploop 8001 L0 W N20 A R P100 S:"HTTP/1.1 200 OK\r\nContent-length: 0\r\n\r\n" &
$ tcploop 8002 N200 C T W S:"GET / HTTP/1.0\r\n\r\n" O P10000 K
With this patch, these CLOSE_WAIT properly vanish when both processes leave.
This commit reverts the two fixes above and replaces them with the proper
fix in connection.h. It must be backported to 1.6 and 1.5. Thanks to
Robson Roberto Souza Peixoto for providing very detailed traces showing
some obvious inconsistencies leading to finding this bug.
This pointer will be used for storing private context. With this,
the same executed function can handle more than one keyword. This
will be very useful for creation Lua cli bindings.
The release function is called when the command is terminated (give
back the hand to the prompt) or when the session is broken (timeout
or client closed).
In case `pool_alloc2()` returns NULL, propagate the condition to the
caller. This could happen when limiting the amount of memory available
for HAProxy with `-m`.
[wt: backport to 1.6 and 1.5 needed]
Before this change, trash is being used to create certificate filename
to read in care Mutli-Cert are in used. But then ssl_sock_load_ocsp()
modify trash leading to potential wrong information given in later error
message.
This also blocks any further use of certificate filename for other
usage, like ongoing patch to support Certificate Transparency handling
in Multi-Cert bundle.
The unlikely macro doesn't take in acount the condition, but only one
variable.
Must be backported in 1.6
[wt: with gcc 3.x, unlikely(x) is defined as __builtin_expect((x) != 0, 0)
so the condition is wrong for negative numbers, which correspond to the
case where bi_getblk_nc() has reached the end of the buffer and the
channel is already closed. With gcc 4.x, the output is cast to unsigned long
so the <=0 will not match negative values either. This is only used in Lua
for now so that may explain why it hasn't hit yet]
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
SPOE makes possible the communication with external components to retrieve some
info using an in-house binary protocol, the Stream Processing Offload Protocol
(SPOP). In the long term, its aim is to allow any kind of offloading on the
streams. This first version, besides being experimental, won't do lot of
things. The most important today is to validate the protocol design and lay the
foundations of what will, one day, be a full offload engine for the stream
processing.
So, for now, the SPOE can offload the stream processing before "tcp-request
content", "tcp-response content", "http-request" and "http-response" rules. And
it only supports variables creation/suppression. But, in spite of these limited
features, we can easily imagine to implement a SSO solution, an ip reputation
service or an ip geolocation service.
Internally, the SPOE is implemented as a filter. So, to use it, you must use
following line in a proxy proxy section:
frontend my-front
...
filter spoe [engine <name>] config <file>
...
It uses its own configuration file to keep the HAProxy configuration clean. It
is also a easy way to disable it by commenting out the filter line.
See "doc/SPOE.txt" for all details about the SPOE configuration.
It does the opposite of 'set-var' action/converter. It is really useful for
per-process variables. But, it can be used for any scope.
The lua function 'unset_var' has also been added.
Now it is possible to use variables attached to a process. The scope name is
'proc'. These variables are released only when HAProxy is stopped.
'tune.vars.proc-max-size' directive has been added to confiure the maximum
amount of memory used by "proc" variables. And because memory accounting is
hierachical for variables, memory for "proc" vars includes memory for "sess"
vars.
This function, unsurprisingly, sets a variable value only if it already
exists. In other words, this function will succeed only if the variable was
found somewhere in the configuration during HAProxy startup.
It will be used by SPOE filter. So an agent will be able to set a value only for
existing variables. This prevents an agent to create a very large number of
unused variables to flood HAProxy and exhaust the memory reserved to variables..
This code has been moved from haproxy.c to sample.c and the function
release_sample_expr can now be called from anywhere to release a sample
expression. This function will be used by the stream processing offload engine
(SPOE).
A scope is a section name between square bracket, alone on its line, ie:
[scope-name]
...
The spaces at the beginning and at the end of the line are skipped. Comments at
the end of the line are also skipped.
When a scope is parsed, its name is saved in the global variable
cfg_scope. Initially, cfg_scope is NULL and it remains NULL until a valid scope
line is parsed.
This feature remains unused in the HAProxy configuration file and
undocumented. However, it will be used during SPOE configuration parsing.
This feature will be used by the stream processing offload engine (SPOE) to
parse dedicated configuration files without mixing HAProxy sections with SPOE
sections.
So, here we can back up all sections known by HAProxy, unregister all of them
and add new ones, dedicted to the SPOE. Once the SPOE configuration file parsed,
we can roll back all changes by restoring HAProxy sections.
Now, for TCP streams, backend filters are released when the stream is
destroyed. But, for HTTP streams, these filters are released when the
transaction analyze ends, in flt_end_analyze callback.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
The 'set-var' converter uses function smp_conv_store (vars.c). In this function,
we should use the first argument (index 0) to retrieve the variable name and its
scope. But because of a typo, we get the scope of the second argument (index
1). In this case, there is no second argument. So the scope used was always 0
(SCOPE_SESS), always setting the variable in the session scope.
So, due to this bug, this rules
tcp-request content accept if { src,set-var(txn.foo) -m found }
always set the variable 'sess.foo' instead of 'txn.foo'.
Now that it is possible to decide whether we prefer to use libc or the
state file to resolve the server's IP address and it is possible to change
a server's IP address at run time on the CLI, let's not restrict the reuse
of the address from the state file anymore to the DNS only.
The impact is that by default the state file will be considered first
(which matches its purpose) and only then the libc. This way any address
change performed at run time over the CLI will be preserved regardless
of DNS usage or not.
It is very common when validating a configuration out of production not to
have access to the same resolvers and to fail on server address resolution,
making it difficult to test a configuration. This option simply appends the
"none" method to the list of address resolution methods for all servers,
ensuring that even if the libc fails to resolve an address, the startup
sequence is not interrupted.
This will allow a server to automatically fall back to an explicit numeric
IP address when all other methods fail. The address is simply specified in
the address list.
Now that we have "init-addr none", it becomes possible to recover on
libc resolver's failures. Thus it's preferable not to alert nor fail
at the moment the libc is called, and instead process the failure at
the end of the list. This allows "none" to be set after libc to
provide a smooth fallback in case of resolver issues.
This new setting supports a comma-delimited list of methods used to
resolve the server's FQDN to an IP address. Currently supported methods
are "libc" (use the regular libc's resolver) and "last" (use the last
known valid address found in the state file).
The list is implemented in a 32-bit integer, because each init-addr
method only requires 3 bits. The last one must always be SRV_IADDR_END
(0), allowing to store up to 10 methods in a single 32 bit integer.
Note: the doc is provided at the end of this series.
The RMAINT state happens when a server doesn't get a valid DNS response
past the hold time. If the address is forced on the CLI, we must use it
and leave the RMAINT state.
WARNING: this is a MAJOR (and disruptive) change with previous HAProxy's
behavior: before, HAProxy never ever used to change a server administrative
status when the DNS resolution failed at run time.
This patch gives HAProxy the ability to change the administrative status
of a server to MAINT (RMAINT actually) when an error is encountered for
a period longer than its own allowed by the corresponding 'hold'
parameter.
IE if the configuration sets "hold nx 10s" and a server's hostname
points to a NX for more than 10s, then the server will be set to RMAINT,
hence in MAINTENANCE mode.
This adds new "hold" timers : nx, refused, timeout, other. This timers
will be used to tell HAProxy to keep an erroneous response as valid for
the corresponding period. For now they're only configured, not enforced.
It will be important to help debugging some DNS resolution issues to
know why a server was marked down, so let's make the function support
a 3rd argument with an indication of the reason. Passing NULL will keep
the message as-is.
The server's state is now "MAINT (resolution)" just like we also have
"MAINT (via x/y)" when servers are tracked. The HTML stats page reports
"resolution" in the checks field similarly to what is done for the "via"
entry.
It's important to report in the server state change logs that RMAINT was
cleared, as it's not the regular maintenance mode, it's specific to name
resolution, and it's important to report the new state (which can be DRAIN
or READY).
Server addresses are not resolved anymore upon the first pass so that we
don't fail if an address cannot be resolved by the libc. Instead they are
processed all at once after the configuration is fully loaded, by the new
function srv_init_addr(). This function only acts on the server's address
if this address uses an FQDN, which appears in server->hostname.
For now the function does two things, to followup with HAProxy's historical
default behavior:
1. apply server IP address found in server-state file if runtime DNS
resolution is enabled for this server
2. use the DNS resolver provided by the libc
If none of the 2 options above can find an IP address, then an error is
returned.
All of this will be needed to support the new server parameter "init-addr".
For now, the biggest user-visible change is that all server resolution errors
are dumped at once instead of causing a startup failure one by one.
Currently, the function which applies server states provided by the
"old" process is applied after configuration sanity check. This results
in the impossibility to check the validity of the state file during a
regular config check, implying a full start is required, which can be
a problem sometimes.
This patch moves the loading of server_state file before MODE_CHECK.
This will be needed to later postpone server address resolution. We need the
FQDN even when it doesn't resolve. The caller then needs to check if fqdn was
set when resolve is null to detect that the address couldn't be parsed and
needs later resolution.
Quite a lot of people have been complaining about option contstats not
working correctly anymore since about 1.4. The reason was that one reason
for the significant performance boost between 1.3 and 1.4 was the ability
to forward data between a server and a client without waking up the stream
manager. And we couldn't afford to force sessions to constantly wake it
up given that most of the people interested in contstats are also those
interested in high performance transmission.
An idea was experimented with in the past, consisting in limiting the
amount of transmissible data before waking it up, but it was not usable
on slow connections (eg: FTP over modem lines, RDP, SSH) as stats would
be updated too rarely if at all, so that idea was dropped.
During a discussion today another idea came up : ensure that stats are
updated once in a while, since it's the only thing that matters. It
happens that we have the request channel's analyse_exp timeout that is
used to wake the stream up after a configured delay, and that by
definition this timeout is not used when there's no more analyser
(otherwise the stream would wake up and the stats would be updated).
Thus here the idea is to reuse this timeout when there's no analyser
and set it to now+5 seconds so that a stream wakes up at least once
every 5 seconds to update its stats. It should be short enough to
provide smooth traffic graphs and to allow to debug outputs of "show
sess" more easily without inflicting too much load even for very large
number of concurrent connections.
This patch is simple enough and safe enough to be backportable to 1.6
if there is some demand.
In the last release a lot of the structures have become opaque for an
end user. This means the code using these needs to be changed to use the
proper functions to interact with these structures instead of trying to
manipulate them directly.
This does not fix any deprecations yet that are part of 1.1.0, it only
ensures that it can be compiled against that version and is still
compatible with older ones.
[wt: openssl-0.9.8 doesn't build with it, there are conflicts on certain
function prototypes which we declare as inline here and which are
defined differently there. But openssl-0.9.8 is not supported anymore
so probably it's OK to go without it for now and we'll see later if
some users still need it. Emeric has reviewed this change and didn't
spot anything obvious which requires special care. Let's try it for
real now]
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Right now there is an issue with the way the maintenance flags are
propagated upon startup. They are not propagate, just copied from the
tracked server. This implies that depending on the server's order, some
tracking servers may not be marked down. For example this configuration
does not work as expected :
server s1 1.1.1.1:8000 track s2
server s2 1.1.1.1:8000 track s3
server s3 1.1.1.1:8000 track s4
server s4 wtap:8000 check inter 1s disabled
It results in s1/s2 being up, and s3/s4 being down, while all of them
should be down.
The only clean way to process this is to run through all "root" servers
(those not tracking any other server), and to propagate their state down
to all their trackers. This is the same algorithm used to propagate the
state changes. It has to be done both to compute the IDRAIN flag and the
IMAINT flag. However, doing so requires that tracking servers are not
marked as inherited maintenance anymore while parsing the configuration
(and given that it is wrong, better drop it).
This fix also addresses another side effect of the bug above which is
that the IDRAIN/IMAINT flags are stored in the state files, and if
restored while the tracked server doesn't have the equivalent flag,
the servers may end up in a situation where it's impossible to remove
these flags. For example in the configuration above, after removing
"disabled" on server s4, the other servers would have remained down,
and not anymore with this fix. Similarly, the combination of IMAINT
or IDRAIN with their respective forced modes was not accepted on
reload, which is wrong as well.
This bug has been present at least since 1.5, maybe even 1.4 (it came
with tracking support). The fix needs to be backported there, though
the srv-state parts are irrelevant.
This commit relies on previous patch to silence warnings on startup.
We'll have to use srv_set_admin_flag() to propagate some server flags
during the startup, and we don't want the resulting actions to cause
warnings, logs nor e-mail alerts to be generated since we're just applying
the config or a state file. So let's condition these notifications to the
fact that we're starting.
CMAINT indicates that the server was *initially* disabled in the
configuration via the "disabled" keyword. FDRAIN indicates that the
server was switched to the DRAIN state from the CLI or the agent.
This it's perfectly valid to have both of them in the state file,
so the parser must not reject this combination.
This fix must be backported to 1.6.
There were seveal reports about the DRAIN state not being properly
restored upon reload.
It happens that the condition in the code does exactly the opposite
of what the comment says, and the comment is right so the code is
wrong.
It's worth noting that the conditions are complex here due to the 2
available methods to set the drain state (CLI/agent, and config's
weight). To paraphrase the updated comment in the code, there are
two possible reasons for FDRAIN to have been present :
- previous config weight was zero
- "set server b/s drain" was sent to the CLI
In the first case, we simply want to drop this drain state if the new
weight is not zero anymore, meaning the administrator has intentionally
turned the weight back to a positive value to enable the server again
after an operation. In the second case, the drain state was forced on
the CLI regardless of the config's weight so we don't want a change to
the config weight to lose this status. What this means is :
- if previous weight was 0 and new one is >0, drop the DRAIN state.
- if the previous weight was >0, keep it.
This fix must be backported to 1.6.
http_find_header2() relies on find_hdr_value_end() to find the comma
delimiting a header field value, which also properly handles double
quotes and backslashes within quotes. In fact double quotes are very
rare, and commas happen once every multiple characters, especially
with cookies where a full block can be found at once. So it makes
sense to optimize this function to speed up the lookup of the first
block before the quote.
This change increases the performance from 212k to 217k req/s when
requests contain a 1kB cookie (+2.5%). We don't care about going
back into the fast parser after the first quote, as it may
needlessly make the parser more complex for very marginal gains.
Searching the trailing space in long URIs takes some time. This can
happen especially on static files and some blogs. By skipping valid
character ranges by 32-bit blocks, it's possible to increase the
HTTP performance from 212k to 216k req/s on requests features a
100-character URI, which is an increase of 2%. This is done for
architectures supporting unaligned accesses (x86_64, x86, armv7a).
There's only a 32-bit version because URIs are rarely long and very
often short, so it's more efficient to limit the systematic overhead
than to try to optimize for the rarest requests.
A performance test with 1kB cookies was capping at 194k req/s. After
implementing multi-byte skipping, the performance increased to 212k req/s,
or 9.2% faster. This patch implements this for architectures supporting
unaligned accesses (x86_64, x86, armv7a). Maybe other architectures can
benefit from this but they were not tested yet.
We used to have 7 different character classes, each was 256 bytes long,
resulting in almost 2kB being used in the L1 cache. It's as cheap to
test a bit than to check the byte is not null, so let's store a 7-bit
composite value and check for the respective bits there instead.
The executable is now 4 kB smaller and the performance on small
objects increased by about 1% to 222k requests/second with a config
involving 4 http-request rules including 1 header lookup, one header
replacement, and 2 variable assignments.
ipcpy() is used to replace an IP address with another one, but it
doesn't preserve the original port so all callers have to do it
manually while it's trivial to do there. Better do it inside the
function.
Often we need to call str2ip2() on an address which already contains a
port without replacing it, so let's ensure we preserve it even if the
family changes.
Gabriele Cerami reported the the exit codes of the systemd-wrapper are
wrong. In short, it directly returns the output of the wait syscall's
status, which is a composite value made of error code an signal numbers.
In general it contains the signal number on the lower bits and the error
code on the higher bits, but exit() truncates it to the lowest 8 bits,
causing config validations to incorrectly report a success. Example :
$ ./haproxy-systemd-wrapper -c -f /dev/null
<7>haproxy-systemd-wrapper: executing /tmp/haproxy -c -f /dev/null -Ds
Configuration file has no error but will not start (no listener) => exit(2).
<5>haproxy-systemd-wrapper: exit, haproxy RC=512
$ echo $?
0
If the process is killed however, the signal number is directly reported
in the exit code.
Let's fix all this to ensure that the exit code matches what the shell does,
which means that codes 0..127 are for exit codes, codes 128..254 for signals,
and code 255 for unknown exit code. Now the return code is correct :
$ ./haproxy-systemd-wrapper -c -f /dev/null
<7>haproxy-systemd-wrapper: executing /tmp/haproxy -c -f /dev/null -Ds
Configuration file has no error but will not start (no listener) => exit(2).
<5>haproxy-systemd-wrapper: exit, haproxy RC=2
$ echo $?
2
$ ./haproxy-systemd-wrapper -f /tmp/cfg.conf
<7>haproxy-systemd-wrapper: executing /tmp/haproxy -f /dev/null -Ds
^C
<5>haproxy-systemd-wrapper: exit, haproxy RC=130
$ echo $?
130
This fix must be backported to 1.6 and 1.5.
There's no reason to use the stream anymore, only the appctx should be
used by a peer. This was a leftover from the migration to appctx and it
caused some confusion, so let's totally drop it now. Note that half of
the patch are just comment updates.
It was inherited from initial code but we must only manipulate the appctx
and never the stream, otherwise we always risk shooting ourselves in the
foot.
In case of resource allocation error, peer_session_create() frees
everything allocated and returns a pointer to the stream/session that
was put back into the free pool. This stream/session is then assigned
to ps->{stream,session} with no error control. This means that it is
perfectly possible to have a new stream or session being both used for
a regular communication and for a peer at the same time.
In fact it is the only way (for now) to explain a CLOSE_WAIT on peers
connections that was caught in this dump with the stream interface in
SI_ST_CON state while the error field proves the state ought to have
been SI_ST_DIS, very likely indicating two concurrent accesses on the
same area :
0x7dbd50: [31/Oct/2016:17:53:41.267510] id=0 proto=tcpv4
flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
frontend=myhost2 (id=4294967295 mode=tcp), listener=? (id=0)
backend=<NONE> (id=-1 mode=-) addr=127.0.0.1:41432
server=<NONE> (id=-1) addr=127.0.0.1:8521
task=0x7dbcd8 (state=0x08 nice=0 calls=2 exp=<NEVER> age=1m5s)
si[0]=0x7dbf48 (state=CLO flags=0x4040 endp0=APPCTX:0x7d99c8 exp=<NEVER>, et=0x000)
si[1]=0x7dbf68 (state=CON flags=0x50 endp1=CONN:0x7dc0b8 exp=<NEVER>, et=0x020)
app0=0x7d99c8 st0=11 st1=0 st2=0 applet=<PEER>
co1=0x7dc0b8 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x7fe62028a010
flags=0x0020b310 fd=7 fd.state=22 fd.cache=0 updt=0
req=0x7dbd60 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
buf=0x78a3c0 data=0x78a3d4 o=0 p=0 req.next=0 i=0 size=0
res=0x7dbda0 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
buf=0x78a3c0 data=0x78a3d4 o=0 p=0 rsp.next=0 i=0 size=0
Special thanks to Arnaud Gavara who provided lots of valuable input and
ran some validation testing on this patch.
This fix must be backported to 1.6 and 1.5. Note that in 1.5 the
session is not assigned from within the function so some extra checks
may be needed in the callers.
This part was missed when peers were ported to the new applet
infrastructure in 1.6, the main stream is woken up instead of the
appctx. This creates a race condition by which it is possible to
wake the stream at the wrong moment and miss an event. This bug
might be at least partially responsible for some of the CLOSE_WAIT
that were reported on peers session upon reload in version 1.6.
This fix must be backported to 1.6.
Greetings,
Was recently working with a stick table storing URL's and one had an
equals sign in it (e.g. 127.0.0.1/f=ab) which made it difficult to
easily split the key and value without a regex.
This patch will change it so that the key looks like
"key=127.0.0.1/f\=ab" instead of "key=127.0.0.1/f=ab".
Not very important given that there are ways to work around it.
Thanks,
- Chad
The consistent hash lookup is done as normal, then if balancing is
enabled, we progress through the hash ring until we find a server that
doesn't have "too much" load. In the case of equal weights for all
servers, the allowed number of requests for a server is either the
floor or the ceil of (num_requests * hash-balance-factor / num_servers);
with unequal weights things are somewhat more complicated, but the
spirit is the same -- a server should not be able to go too far above
(its relative weight times) the average load. Using the hash ring to
make the second/third/etc. choice maintains as much locality as
possible given the load limit.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
For active servers, this is the sum of the eweights of all active
servers before this one in the backend, and
[srv->cumulative_weight .. srv_cumulative_weight + srv_eweight) is a
space occupied by this server in the range [0 .. lbprm.tot_wact), and
likewise for backup servers with tot_wbck. This allows choosing a
server or a range of servers proportional to their weight, by simple
integer comparison.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
0 will mean no balancing occurs; otherwise it represents the ratio
between the highest-loaded server and the average load, times 100 (i.e.
a value of 150 means a 1.5x ratio), assuming equal weights.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
Pierre Cheynier found that there's a persistent issue with the systemd
wrapper. Too fast reloads can lead to certain old processes not being
signaled at all and continuing to run. The problem was tracked down as
a race between the startup and the signal processing : nothing prevents
the wrapper from starting new processes while others are still starting,
and the resulting pid file will only contain the latest pids in this
case. This can happen with large configs and/or when a lot of SSL
certificates are involved.
In order to solve this we want the wrapper to wait for the new processes
to complete their startup. But we also want to ensure it doesn't wait for
nothing in case of error.
The solution found here is to create a pipe between the wrapper and the
sub-processes. The wrapper waits on the pipe and the sub-processes are
expected to close this pipe once they completed their startup. That way
we don't queue up new processes until the previous ones have registered
their pids to the pid file. And if anything goes wrong, the wrapper is
immediately released. The only thing is that we need the sub-processes
to know the pipe's file descriptor. We pass it in an environment variable
called HAPROXY_WRAPPER_FD.
It was confirmed both by Pierre and myself that this completely solves
the "zombie" process issue so that only the new processes continue to
listen on the sockets.
It seems that in the future this stuff could be moved to the haproxy
master process, also getting rid of an environment variable.
This fix needs to be backported to 1.6 and 1.5.
It's important to know that a signal sent to the wrapper had no effect
because something failed during execve(). Ideally more info (strerror)
should be reported. It would be nice to backport this to 1.6 and 1.5.
The wrapper is not the best reliable thing in the universe, so start
by adding at least the minimum expected controls :-/
To be backported to 1.5 and 1.6.
Since signals are inherited, we must restore them before calling execve()
and intercept them again after a failed execve(). In order to cleanly deal
with the SIGUSR2/SIGHUP loops where we re-exec the wrapper, we ignore these
two signals during a re-exec, and restore them to defaults when spawning
haproxy.
This should be backported to 1.6 and 1.5.
When execv() fails to execute the haproxy executable, it's important to
return an error instead of pretending everything is cool. This fix should
be backported to 1.6 and 1.5 in order to improve the overall reliability
under systemd.
Today, the certificate are indexed int he SNI tree using their CN and the
list of thier AltNames. So, Some certificates have the same names in the
CN and one of the AltNames entries.
Typically Let's Encrypt duplicate the the DNS name in the CN and the
AltName.
This patch prevents the creation of identical entries in the trees. It
checks the same DNS name and the same SSL context.
If the same certificate is registered two time it will be duplicated.
This patch should be backported in the 1.6 and 1.5 version.
Add some debug trace when haproxy is configured in debug & verbose mode.
This is useful for openssl tests. Typically, the error "SSL handshake
failure" can be caused by a lot of protocol error. This patch details
the encountered error. For exemple:
OpenSSL error 0x1408a0c1: ssl3_get_client_hello: no shared cipher
Note that my compilator (gcc-4.7) refuse to considers the function
ssl_sock_dump_errors() as inline. The condition "if" ensure that the
content of the function is not executed in normal case. It should be
a pity to call a function just for testing its execution condition, so
I use the macro "forceinline".
This commit introduces "tcp-request session" rules. These are very
much like "tcp-request connection" rules except that they're processed
after the handshake, so it is possible to consider SSL information and
addresses rewritten by the proxy protocol header in actions. This is
particularly useful to track proxied sources as this was not possible
before, given that tcp-request content rules are processed after each
HTTP request. Similarly it is possible to assign the proxied source
address or the client's cert to a variable.
This is in order to make integration of tcp-request-session cleaner :
- tcp_exec_req_rules() was renamed tcp_exec_l4_rules()
- LI_O_TCP_RULES was renamed LI_O_TCP_L4_RULES
(LI_O_*'s horrible indent was also fixed and a provision was left
for L5 rules).
These are denied conns. Strangely this wasn't emitted while it used to be
available for a while. It corresponds to the number of connections blocked
by "tcp-request connection reject".
Thus the SMP_USE_HTTP_ANY dependency is incorrect, we have to depend on
SMP_USE_L5_CLI (the session). It's particularly important for session-wide
variables which are kept across HTTP requests. For now there is no impact
but it will make a difference with tcp-request session rules.
smp_fetch_var() may be called from everywhere since it just reads a
variable. It must ensure that the stream exists before trying to return
a stream-dependant variable. For now there is no impact but it will
cause trouble with tcp-request session rules.
When the tcp/http actions above were introduced in 1.7-dev4, we used to
proceed like this :
- set-src/set-dst would force the port to zero
- set-src-port/set-dst-port would not do anything if the address family is
neither AF_INET nor AF_INET6.
It was a stupid idea of mine to request this behaviour because it ensures
that these functions cannot be used in a wide number of situations. Because
of the first rule, it is necessary to save the source port one way or
another if only the address has to be changed (so you have to use an
variable). Due to the second rule, there's no way to set the source port
on a unix socket without first overwriting the address. And sometimes it's
really not convenient, especially when there's no way to guarantee that all
fields will properly be set.
In order to fix all this, this small change does the following :
- set-src/set-dst always preserve the original port even if the address
family changes. If the previous address family didn't have a port (eg:
AF_UNIX), then the port is set to zero ;
- set-src-port/set-dst-port always preserve the original address. If the
address doesn't have a port, then the family is forced to IPv4 and the
address to "0.0.0.0".
Thanks to this it now becomes possible to perform one action, the other or
both in any order.
To register a new cli keyword, you need to declare a cli_kw_list
structure in your source file:
static struct cli_kw_list cli_kws = {{ },{
{ { "test", "list", NULL }, "test list : do some tests on the cli", test_parsing, NULL },
{ { NULL }, NULL, NULL, NULL, NULL }
}};
And then register it:
cli_register_kw(&cli_kws);
The first field is an array of 5 elements, where you declare the
keywords combination which will match, it must be ended by a NULL
element.
The second field is used as a usage message, it will appear in the help
of the cli, you can set it to NULL if you don't want to show it, it's a
good idea if you want to overwrite some existing keywords.
The two last fields are callbacks.
The first one is used at parsing time, you can use it to parse the
arguments of your keywords and print small messages. The function must
return 1 in case of a failure, otherwise 0:
#include <proto/dumpstats.h>
static int test_parsing(char **args, struct appctx *appctx)
{
struct chunk out;
if (!*args[2]) {
appctx->ctx.cli.msg = "Error: the 3rd argument is mandatory !";
appctx->st0 = STAT_CLI_PRINT;
return 1;
}
chunk_reset(&trash);
chunk_printf(&trash, "arg[3]: %s\n", args[2]);
chunk_init(&out, NULL, 0);
chunk_dup(&out, &trash);
appctx->ctx.cli.err = out.str;
appctx->st0 = STAT_CLI_PRINT_FREE; /* print and free in the default cli_io_handler */
return 0;
}
The last field is the IO handler callback, it can be set to NULL if you
want to use the default cli_io_handler() otherwise you can write your
own. You can use the private pointer in the appctx if you need to store
a context or some data. stats_dump_sess_to_buffer() is a good example of
IO handler, IO handlers often use the appctx->st2 variable for the state
machine. The handler must return 0 in case it have to be recall later
otherwise 1.
During the stick-table teaching process which occurs at reloading/restart time,
expiration dates of stick-tables entries were not synchronized between peers.
This patch adds two new stick-table messages to provide such a synchronization feature.
As these new messages are not supported by older haproxy peers protocol versions,
this patch increments peers protol version, from 2.0 to 2.1, to help in detecting/supporting
such older peers protocol implementations so that new versions might still be able
to transparently communicate with a newer one.
[wt: technically speaking it would be nice to have this backported into 1.6
as some people who reload often are affected by this design limitation, but
it's not a totally transparent change that may make certain users feel
reluctant to upgrade older versions. Let's let it cook in 1.7 first and
decide later]
The fe_req_rate is similar to fe_sess_rate, but fetches the number
of HTTP requests per second instead of connections/sessions per second.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
dns_init_resolvers() tries to emit the current resolver's name in the
error message in case of out-of-memory condition. But it must not do
it when initializing the trash before even having such a resolver
otherwise the user is certain to get a dirty crash instead of the
error message. No backport is needed.
An apparent copy-paste error resulted in backend's avg connection time
to report the average queue time instead in the HTML dump. Backport to
1.6 is desired.
When the compression is enable on HTTP responses, the chunked data are copied in
a temporary buffer during the HTTP body parsing and then compressed when
everything is forwarded to the client. But the amout of data that can be copied
was not correctly calculated. In many cases, it worked, else on the edge when
the channel buffer was almost full.
[wt: bug introduced by b77c5c26 in 1.7-dev, no backport needed]
Enable IP_BIND_ADDRESS_NO_PORT on backend connections when the source
address is specified without port or port ranges. This is supported
since Linux 4.2/libc 2.23.
If the kernel supports it but the libc doesn't, we can define it at
build time:
make [...] DEFINE=-DIP_BIND_ADDRESS_NO_PORT=24
For more informations about this feature, see Linux commit 90c337da
With Linux officially introducing SO_REUSEPORT support in 3.9 and
its mainstream adoption we have seen more people running into strange
SO_REUSEPORT related issues (a process management issue turning into
hard to diagnose problems because the kernel load-balances between the
new and an obsolete haproxy instance).
Also some people simply want the guarantee that the bind fails when
the old process is still bound.
This change makes SO_REUSEPORT configurable, introducing the command
line argument "-dR" and the noreuseport configuration directive.
A backport to 1.6 should be considered.
pcre_version() returns the running PCRE release, not the release
haproxy was built with.
This simple string fix should be backported to supported releases,
as the output may be confusing.
The analyse of CNAME resolution and request's domain name was performed
twice:
- when validating the response buffer
- when loading the right IP address from the response
Now DNS response are properly loaded into a DNS response structure, we
do the domain name validation when loading/validating the response in
the DNS strcucture and later processing of this task is now useless.
backport: no
DNS servers don't return A or AAAA record if the query points to a CNAME
not resolving to the right type.
We know it because the last record of the response is a CNAME. We can
trigger a new query, switching to a new query type, handled by the layer
above.
New DNS response parser function which turn the DNS response from a
network buffer into a DNS structure, much easier for later analysis
by upper layer.
Memory is pre-allocated at start-up in a chunk dedicated to DNS
response store.
New error code to report a wrong number of queries in a DNS response.
Enrichment of the 'set server <b>/<s> addr' cli directive to allow changing
now a server's port.
The new syntax looks like:
set server <b>/<s> addr [port <port>]
This function can replace update_server_addr() where the need to change the
server's port as well as the IP address is required.
It performs some validation before performing each type of change.
Introduction of 3 new server flags to remember if some parameters were set
during configuration parsing.
* SRV_F_CHECKADDR: this server has a check addr configured
* SRV_F_CHECKPORT: this server has a check port configured
* SRV_F_AGENTADDR: this server has a agent addr configured
HAProxy used to deduce port used for health checks when parsing configuration
at startup time.
Because of this way of working, it makes it complicated to change the port at
run time.
The current patch changes this behavior and makes HAProxy to choose the
port used for health checking when preparing the check task itself.
A new type of error is introduced and reported when no port can be found.
There won't be any impact on performance, since the process to find out the
port value is made of a few 'if' statements.
This patch also introduces a new check state CHK_ST_PORT_MISS: this flag is
used to report an error in the case when HAProxy needs to establish a TCP
connection to a server, to perform a health check but no TCP ports can be
found for it.
And last, it also introduces a new stream termination condition:
SF_ERR_CHK_PORT. Purpose of this flag is to report an error in the event when
HAProxy has to run a health check but no port can be found to perform it.
SOL_IPV6 is not defined on OSX, breaking the compile. Also libcrypt is
not available for installation neither in Macports nor as a Brew recipe,
so we're disabling implicit dependancy.
Signed-off-by: Dinko Korunic <dinko.korunic@gmail.com>
Introduction of a new CLI command "set server <srv> check-port <port>' to
allow admins to change a server's health check port at run time.
This changes the equivalent of the configuration server parameter
called 'port'.
Today I was working on an auto-update script for some ACLs, and found
that I couldn't load ACL entries with a semi-colon in them no matter
how I tried to escape it.
As such, I wrote this patch (this one is for 1.7dev, but it applies to
1.5 the same with just line numbers changed), which seems to allow me
to execute a command such as "add acl /etc/foo.lst foo\;bar" over the
socket. It's worth noting that stats_sock_parse_request() already uses
the backslash to escape spaces in words so it makes sense to use it as
well to escape the semi-colon.
A typo resulting from a copy-paste in the original req.ssl_ver code will
make certain SSLv2 hello messages not properly detected. The bug has been
present since the code was added in 1.3.16. In 1.3 and 1.4, this code was
in proto_tcp.c. In 1.5-dev0, it moved to acl.c, then later to payload.c.
This bug was tagged "minor" because SSLv2 is outdated and this encoding
was rarely (if at all) used, the shorter form starting with 0x80 being
more common.
This fix needs to be backported to all currently maintained branches.
In dns_send_query(), ret was set to 0 but always reassigned before the
usage so this initialisation was useless.
The send_error variable was created, assigned to 0 but never used. So
this variable is just useless by itself. Removing it.
In dump_servers_state(), srv_time_since_last_change, bk_f_forced_id, srv_f_forced_id variables
were firstly set to zero and immediately reassigned to another value while never been accessed in between.
Sounds like a useless initiazation. So let's make only the useful allocation.
Bartosz Koninski reported that recent commit 568743a ("BUG/MEDIUM: stream-int:
completely detach connection on connect error") introduced a nasty side effect
during the connection retry, causing a reconnect attempt to be performed on a
random address, possibly another server from another backend as shown in his
reproducer.
The real reason is in fact that by calling si_release_endpoint() after a
failed connect attempt and not clearing the SN_ADDR_SET flag on the
stream, we indicate that we continue to trust the connection's current
address. So upon next connect attempt, a connection is picked from the
pool and the last target address is reused. It is not very easy to
reproduce it 100% reliably out of production traffic, but it's easier to
see when haproxy is started with -dM to force the pools to be filled with
junk, and where subsequent connections are not even attempted due to their
family being incorrect.
The correct fix consists in clearing the SN_ADDR_SET flag after calling
si_release_endpoint() during a retry. But it outlines a deeper issue which
is in fact that the target address is *stored* in the connection and its
validity is stored in the stream. Until we have a better solution to store
target addresses, it would be better to rely on the connection flags
(CO_FL_ADDR_TO_SET) for this purpose. But it also outlines the fact that
the same issue still exists in Lua sockets and in idle sockets, which
fortunately are not affected by this issue.
Thanks to Bartosz for providing all the elements needed to understand the
problem.
This fix needs to be backported to 1.6 and 1.5.
Trie now uses a dataset structure just like Pattern, so this has been
defined in includes/types/global.h for both Pattern and Trie where it
was just Pattern.
In src/51d.c all functions used by the Trie implementation which need a
dataset as an argument now use the global dataset. The
fiftyoneDegreesDestroy method has now been replaced with
fiftyoneDegreesDataSetFree which is common to Pattern and Trie. In
addition, two extra dataset init status' have been added to the switch
statement in init_51degrees.
Tq is the time between the instant the connection is accepted and a
complete valid request is received. This time includes the handshake
(SSL / Proxy-Protocol), the idle when the browser does preconnect and
the request reception.
This patch decomposes %Tq in 3 measurements names %Th, %Ti, and %TR
which returns respectively the handshake time, the idle time and the
duration of valid request reception. It also adds %Ta which reports
the request's active time, which is the total time without %Th nor %Ti.
It replaces %Tt as the total time, reporting accurate measurements for
HTTP persistent connections.
%Th is avalaible for TCP and HTTP sessions, %Ti, %TR and %Ta are only
avalaible for HTTP connections.
In addition to this, we have new timestamps %tr, %trg and %trl, which
log the date of start of receipt of the request, respectively in the
default format, in GMT time and in local time (by analogy with %t, %T
and %Tl). All of them are obviously only available for HTTP. These values
are more relevant as they more accurately represent the request date
without being skewed by a browser's preconnect nor a keep-alive idle
time.
The HTTP log format and the CLF log format have been modified to
use %tr, %TR, and %Ta respectively instead of %t, %Tq and %Tt. This
way the default log formats now produce the expected output for users
who don't want to manually fiddle with the log-format directive.
Example with the following log-format :
log-format "%ci:%cp [%tr] %ft %b/%s h=%Th/i=%Ti/R=%TR/w=%Tw/c=%Tc/r=%Tr/a=%Ta/t=%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
The request was sent by hand using "openssl s_client -connect" :
Aug 23 14:43:20 haproxy[25446]: 127.0.0.1:45636 [23/Aug/2016:14:43:20.221] test~ test/test h=6/i=2375/R=261/w=0/c=1/r=0/a=262/t=2643 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
=> 6 ms of SSL handshake, 2375 waiting before sending the first char (in
fact the time to type the first line), 261 ms before the end of the request,
no time spent in queue, 1 ms spend connecting to the server, immediate
response, total active time for this request = 262ms. Total time from accept
to close : 2643 ms.
The timing now decomposes like this :
first request 2nd request
|<-------------------------------->|<-------------- ...
t tr t tr ...
---|----|----|----|----|----|----|----|----|--
: Th Ti TR Tw Tc Tr Td : Ti ...
:<---- Tq ---->: :
:<-------------- Tt -------------->:
:<--------- Ta --------->:
Up to HAProxy 1.7-dev3, HAProxy used to use the first bind port from it's
local 'listen' section when no port is configured on the server.
IE, in the configuration below, the server port would be 25:
listen smtp
bind :25
server s1 1.0.0.1 check
This way of working is now obsolete and can be removed, furthermore it is not
documented!
This will make the possibility to change the server's port much easier.
The function ipcpy() simply duplicates the IP address found in one
struct sockaddr_storage into an other struct sockaddr_storage.
It also update the family on the destination structure.
Memory of destination structure must be allocated and cleared by the
caller.
Bryan Talbot reported a very interesting bug. The sc_trackers() sample
fetch seems to have escaped the sanitization that was performed during 1.5
to ensure all dereferences of stkctr_entry() were safe. Here if a tacker
is set on a backend and is then checked against a different backend where
the entry doesn't exist, stkctr_entry() returns NULL and this is dereferenced
to retrieve the ref count. Thanks to Bryan for his detailed bug report featuring
a working config and reproducer.
This fix must be backported to 1.6 and 1.5.
After pushing the last update related to a resync process, the teacher resets
the re-connection's origin to the saved one (pointer position when he receive
the resync request). But the last acknowledgement overwrites this pointer to
an inconsistent value. In peersv2, it results with empty table chunks
regularly pushed.
The fix consist to move the confirm code to assure that the confirm message is
always sent after the last acknowledgement related to the resync. And to reset
the re-connection's origin to the saved one when a confirm message is received.
This bug affects versions 1.6 and superior.
For older versions (peersv1), this inconsistent state should not generate side
effects because chunks are not used and a next acknowlegement message resets
the pointer in a valid state.
Adding on to Thierry's work (http://git.haproxy.org/?p=haproxy.git;h=6310bef5)
I have added a few more fetchers for counters based on the tcp_info struct
maintained by the kernel :
fc_unacked, fc_sacked, fc_retrans, fc_fackets, fc_lost,
fc_reordering
Two fields were not added because they're version-dependant :
fc_rcv_rtt, fc_total_retrans
The fields name depend on the operating system. FreeBSD and NetBSD prefix
all the field names with "__" so we have to rely on a few #ifdef for
portability.
Clang reports that this function is not used :
src/ev_poll.c:34:28: warning: unused function 'hap_fd_isset' [-Wunused-function]
static inline unsigned int hap_fd_isset(int fd, unsigned int *evts)
It's been true since the rework of the pollers in 1.5 and it's unlikely we'll
ever need it anymore, so better remove it now to provide clean builds.
This fix can be backported to 1.6 and 1.5 now.
This warning appears when building without any compression library :
src/compression.c:143:19: warning: unused function 'init_comp_ctx' [-Wunused-function]
static inline int init_comp_ctx(struct comp_ctx **comp_ctx)
^
src/compression.c:176:19: warning: unused function 'deinit_comp_ctx' [-Wunused-function]
static inline int deinit_comp_ctx(struct comp_ctx **comp_ctx)
No backport is needed.