When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by reseting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
locks have been added in pat_ref and pattern_expr structures to protect all
accesses to an instance of on of them. Moreover, a global lock has been added to
protect the LRU cache used for pattern matching.
Patterns are now duplicated after a successfull matching, to avoid modification
by other threads when the result is used.
Finally, the function reloading a pattern list has been modified to be
thread-safe.
First, OpenSSL is now initialized to be thread-safe. This is done by setting 2
callbacks. The first one is ssl_locking_function. It handles the locks and
unlocks. The second one is ssl_id_function. It returns the current thread
id. During the init step, we create as much as R/W locks as needed, ie the
number returned by CRYPTO_num_locks function.
Next, The reusable SSL session in the server context is now thread-local.
Shctx is now also initialized if HAProxy is started with several threads.
And finally, a global lock has been added to protect the LRU cache used to store
generated certificates. The function ssl_sock_get_generated_cert is now
deprecated because the retrieved certificate can be removed by another threads
in same time. Instead, a new function has been added,
ssl_sock_assign_generated_cert. It must be used to search a certificate in the
cache and set it immediatly if found.
A lock is used to protect accesses to a peer structure.
A the lock is taken in the applet handler when the peer is identified
and released living the applet handler.
In the scheduling task for peers section, the lock is taken for every
listed peer and released at the end of the process task function.
The peer 'force shutdown' function was also re-worked.
A global lock has been added to protect accesses to the list of active
applets. A process mask has also been added on each applet. Like for FDs and
tasks, it is used to know which threads are allowed to process an
applet. Because applets are, most of time, linked to a session, it should be
sticky on the same thread. But in all cases, it is the responsibility of the
applet handler to lock what have to be protected in the applet context.
This is done by passing the right stream's proxy (the frontend or the backend,
depending on the context) to lock the error snapshot used to store the error
info.
The stick table API was slightly reworked:
A global spin lock on stick table was added to perform lookup and
insert in a thread safe way. The handling of refcount on entries
is now handled directly by stick tables functions under protection
of this lock and was removed from the code of callers.
The "stktable_store" function is no more externalized and users should
now use "stktable_set_entry" in any case of insertion. This last one performs
a lookup followed by a store if not found. So the code using "stktable_store"
was re-worked.
Lookup, and set_entry functions automatically increase the refcount
of the returned/stored entry.
The function "sticktable_touch" was renamed "sticktable_touch_local"
and is now able to decrease the refcount if last arg is set to true. It
is allowing to release the entry without taking the lock twice.
A new function "sticktable_touch_remote" is now used to insert
entries coming from remote peers at the right place in the update tree.
The code of peer update was re-worked to use this new function.
This function is also able to decrease the refcount if wanted.
The function "stksess_kill" also handle a parameter to decrease
the refcount on the entry.
A read/write lock is added on each entry to protect the data content
updates of the entry.
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables releated to lb.
The only significant change is about lb_map. Because the servers status are
updated in the sync-point, we can call recalc_server_map function synchronously
in map_set_server_status_up/down function.
This list is used to save changes on the servers state. So when serveral threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has be moved in the sync-point. So this is useless
to lock it at this step because the sync-point is a protected area by iteself.
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intension
when you create a new task, else you have a good reason to make a task sticky on
some threads. This is then the responsibility to the process callback to lock
what have to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and poll information. In addition, a lock for each entry in the fdtab array has
been added to protect all accesses to a specific FD or its information.
For pollers, according to the poller, the way to manage the concurrency is
different. There is a poller loop on each thread. So the set of monitored FDs
may need to be protected. epoll and kqueue are thread-safe per-se, so there few
things to do to protect these pollers. This is not possible with select and
poll, so there is no sharing between the threads. The poller on each thread is
independant from others.
Finally, per-thread init/deinit functions are used for each pollers and for FD
part for manage thread-local ressources.
Now, you must be carefull when a FD is created during the HAProxy startup. All
update on the FD state must be made in the threads context and never before
their creation. This is mandatory because fd_updt array is thread-local and
initialized only for threads. Because there is no pollers for the main one, this
array remains uninitialized in this context. For this reason, listeners are now
enabled in run_thread_poll_loop function, just like the worker pipe.
log buffers and static variables used in log functions are now thread-local. So
there is no need to lock anything to log messages. Moreover, per-thread
init/deinit functions are now used to initialize these buffers.
A sync-point is a protected area where you have the warranty that no concurrency
access is possible. It is implementated as a thread barrier to enter in the
sync-point and another one to exit from it. Inside the sync-point, all threads
that must do some syncrhonous processing will be called one after the other
while all other threads will wait. All threads will then exit from the
sync-point at the same time.
A sync-point will be evaluated only when necessary because it is a costly
operation. To limit the waiting time of each threads, we must have a mechanism
to wakeup all threads. This is done with a pipe shared by all threads. By
writting in this pipe, we will interrupt all threads blocked on a poller. The
pipe is then flushed before exiting from the sync-point.
hap_register_per_thread_init and hap_register_per_thread_deinit functions has
been added to register functions to do, for each thread, respectively, some
initialization and deinitialization. These functions are added in the global
lists per_thread_init_list and per_thread_deinit_list.
These functions are called only when HAProxy is started with more than 1 thread
(global.nbthread > 1).
This file contains all functions and macros used to deal with concurrency in
HAProxy. It contains all high-level function to do atomic operation
(HA_ATOMIC_*). Note, for now, we rely on "__atomic" GCC builtins to do atomic
operation. So HAProxy can be compiled with the thread support iff these builtins
are available.
It also contains wrappers around plocks to use spin or read/write locks. These
wrappers are used to abstract the internal representation of the locking system
and to add information to help debugging, when compiled with suitable
options.
To add extra info on locks, you need to add DEBUG=-DDEBUG_THREAD or
DEBUG=-DDEBUG_FULL compilation option. In addition to timing info on locks, we
keep info on where a lock was acquired the last time (function name, file and
line). There are also the thread id and a flag to know if it is still locked or
not. This will be useful to debug deadlocks.
Now memprintf relies on memvprintf. This new function does exactly what
memprintf did before, but it must be called with a va_list instead of a variable
number of arguments. So there is no change for every functions using
memprintf. But it is now also possible to have same functionnality from any
function with variadic arguments.
Email alerts relies on checks to send emails. The link between a mailers section
and a proxy was resolved during the configuration parsing, But initialization was
done when the first alert is triggered. This implied memory allocations and
tasks creations. With this patch, everything is now initialized during the
configuration parsing. So when an alert is triggered, only the memory required
by this alert is dynamically allocated.
Moreover, alerts processing had a flaw. The task handler used to process alerts
to be sent to the same mailer, process_email_alert, was designed to give back
the control to the scheduler when an alert was sent. So there was a delay
between the sending of 2 consecutives alerts (the min of
"proxy->timeout.connect" and "mailer->timeout.mail"). To fix this problem, now,
we try to process as much queued alerts as possible when the task is woken up.
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a uniq patch. Here are all changes made on the DNS.
First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connection are kept alive. So there is no needs
anymore to reopen them after HAProxy fork. Connections used to send DNS queries
will be opened on demand.
Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. wait and curr list of requesters, for a DNS
resolution, have been replaced by a uniq list. And Finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.
srv_set_fqdn function has been simplified. Now, there is only 1 way to set the
server's FQDN, independently it is done by the CLI or when a SRV record is
resolved.
The static DNS resolutions pool has been replaced by a dynamoc pool. The part
has been modified by Baptiste Assmann.
The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected. Especially
hold.valid. The default frequency to wake up a resolvers is now configurable
using "timeout resolve" parameter.
Now, as documented, as long as invalid repsonses are received, we really wait
all name servers responses before retrying.
As far as possible, resources allocated during DNS configuration parsing are
releases when HAProxy is shutdown.
Beside all these changes, the code has been cleaned to ease code review and the
doc has been updated.
The messages processing is done using existing functions. So here, the main task
is to find the SPOE engine to use. To do so, we loop on all filter instances
attached to the stream. For each, we check if it is a SPOE filter and, if yes,
if its name is the one used to declare the "send-spoe-group" action.
We also take care to return an error if the action processing is interrupted by
HAProxy (because of a timeout or an error at the HAProxy level). This is done by
checking if the flag ACT_FLAG_FINAL is set.
The function spoe_send_group is the action_ptr callback ot
Because we can have messages chained by event or by group, we need to have a way
to know which kind of list we manipulate during the encoding. So 2 types of list
has been added, SPOE_MSGS_BY_EVENT and SPOE_MSGS_BY_GROUP. And the right type is
passed when spoe_encode_messages is called.
This action is used to trigger sending of a group of SPOE messages. To do so,
the SPOE engine used to send messages must be defined, as well as the SPOE group
to send. Of course, the SPOE engine must refer to an existing SPOE filter. If
not engine name is provided on the SPOE filter line, the SPOE agent name must be
used. For example:
http-request send-spoe-group my-engine some-group
This action is available for "tcp-request content", "tcp-response content",
"http-request" and "http-response" rulesets. It cannot be used for tcp
connection/session rulesets because actions for these rulesets cannot yield.
For now, the action keyword is parsed and checked. But it does nothing. Its
processing will be added in another patch.
For now, this section is only parsed. It should have the following format:
spoe-group <grp-name>
messages <msg-name> ...
And then SPOE groups must be referenced in spoe-agent section:
spoe-agnt <name>
...
groups <grp-name> ...
The purpose of these groups is to trigger messages sending from TCP or HTTP
rules, directly from HAProxy configuration, and not on specific event. This part
will be added in another patch.
It is important to note that a message belongs at most to a group.
The engine name is now kept in "spoe_config" struture. Because a SPOE filter can
be declared without engine name, we use the SPOE agent name by default. Then,
its uniqness is checked against all others SPOE engines configured for the same
proxy.
* TODO: Add documentation
Now, it is possible to conditionnaly send a SPOE message by adding an ACL-based
condition on the "event" line, in a "spoe-message" section. Here is the example
coming for the SPOE documentation:
spoe-message get-ip-reputation
args ip=src
event on-client-session if ! { src -f /etc/haproxy/whitelist.lst }
To avoid mixin with proxy's ACLs, each SPOE message has its private ACL list. It
possible to declare named ACLs in "spoe-message" section, using the same syntax
than for proxies. So we can rewrite the previous example to use a named ACL:
spoe-message get-ip-reputation
args ip=src
acl ip-whitelisted src -f /etc/haproxy/whitelist.lst
event on-client-session if ! ip-whitelisted
ACL-based conditions are executed in the context of the stream that handle the
client and the server connections.
It was painful not to have the status code available, especially when
it was computed. Let's store it and ensure we don't claim content-length
anymore on 1xx, only 0 body bytes.
This patch reorganize the shctx API in a generic storage API, separating
the shared SSL session handling from its core.
The shctx API only handles the generic data part, it does not know what
kind of data you use with it.
A shared_context is a storage structure allocated in a shared memory,
allowing its usage in a multithread or a multiprocess context.
The structure use 2 linked list, one containing the available blocks,
and another for the hot locked blocks. At initialization the available
list is filled with <maxblocks> blocks of size <blocksize>. An <extra>
space is initialized outside the list in case you need some specific
storage.
+-----------------------+--------+--------+--------+--------+----
| struct shared_context | extra | block1 | block2 | block3 | ...
+-----------------------+--------+--------+--------+--------+----
<-------- maxblocks --------->
* blocksize
The API allows to store content on several linked blocks. For example,
if you allocated blocks of 16 bytes, and you want to store an object of
60 bytes, the object will be allocated in a row of 4 blocks.
The API was made for LRU usage, each time you get an object, it pushes
the object at the end of the list. When it needs more space, it discards
The functions name have been renamed in a more logical way, the part
regarding shctx have been prefixed by shctx_ and the functions for the
shared ssl session cache have been prefixed by sh_ssl_sess_.
Move the ssl callback functions of the ssl shared session cache to
ssl_sock.c. The shctx functions still needs to be separated of the ssl
tree and data.
A bind_conf does contain a ssl_bind_conf, which already has a flag to know
if early data are activated, so use that, instead of adding a new flag in
the ssl_options field.
When compiled with Openssl >= 1.1.1, before attempting to do the handshake,
try to read any early data. If any early data is present, then we'll create
the session, read the data, and handle the request before we're doing the
handshake.
For this, we add a new connection flag, CO_FL_EARLY_SSL_HS, which is not
part of the CO_FL_HANDSHAKE set, allowing to proceed with a session even
before an SSL handshake is completed.
As early data do have security implication, we let the origin server know
the request comes from early data by adding the "Early-Data" header, as
specified in this draft from the HTTP working group :
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-replay
This patch simply brings HAProxy internal regex system to the Lua API.
Lua doesn't embed regexes, now it inherits from the regexes compiled
with haproxy.
Allow to register a function which will be called after the
configuration file parsing, at the end of the check_config_validity().
It's useful fo checking dependencies between sections or for resolving
keywords, pointers or values.
This commit implements a post section callback. This callback will be
used at the end of a section parsing.
Every call to cfg_register_section must be modified to use the new
prototype:
int cfg_register_section(char *section_name,
int (*section_parser)(const char *, int, char **, int),
int (*post_section_parser)());
We used to have bo_{get,put}_{chr,blk,str} to retrieve/send data to
the output area of a buffer, but not the equivalent ones for the input
area. This will be needed to copy uploaded data frames in HTTP/2.
This one may be called by upper layers (eg: si_shutw()) or lower layers
(si_shutw() as well during stream_int_notify()) so we want it to take
care of updating the connection's flags if it's not going to be done
by the caller.
In transport-layer functions (snd_buf/rcv_buf), it's very problematic
never to know if polling changes made to the connection will be propagated
or not. This has led to some conn_cond_update_polling() calls being placed
at a few places to cover both the cases where the function is called from
the upper layer and when it's called from the lower layer. With the arrival
of the MUX, this becomes even more complicated, as the upper layer will not
have to manipulate anything from the connection layer directly and will not
have to push such updates directly either. But the snd_buf functions will
need to see their updates committed when called from upper layers.
The solution here is to introduce a connection flag set by the connection
handler (and possibly any other similar place) indicating that the caller
is committed to applying such changes on return. This way, the called
functions will be able to apply such changes by themselves before leaving
when the flag is not set, and the upper layer will not have to care about
that anymore.
This flag is only used when reading using splicing for now, and is only
set when a pipe full condition is met, so we can simplify its reset
condition in conn_refresh_polling_flags so that it's cleared at the
same time as the other ones, only when the control layer is ready.
This flag could be used more, to mark that a buffer full condition was
met with any receive method in order to simplify polling management.
This should probably be revisited after 1.8.
This is based on the git SHA1 implementation and optimized to do word
accesses rather than byte accesses, and to avoid unnecessary copies into
the context array.
BoringSSL switch OPENSSL_VERSION_NUMBER to 1.1.0 for compatibility.
Fix BoringSSL call and openssl-compat.h/#define occordingly.
This will not break openssl/libressl compat.
Now any call to trace() in the code will automatically appear interleaved
with the call sequence and timestamped in the trace file. They appear with
a '#' on the 3rd argument (caller's pointer) in order to make them easy to
spot. If the trace functionality is not used, a dmumy weak function is used
instead so that it doesn't require to recompile every time traces are
enabled/disabled.
The trace decoder knows how to deal with these messages, detects them and
indents them similarly to the currently traced function. This can be used
to print function arguments for example.
Note that we systematically flush the log when calling trace() to ensure we
never miss important events, so this may impact performance.
The trace() function uses the same format as printf() so it should be easy
to setup during debugging sessions.
Now only conn_full_close() will be used. It will become more obvious
when the tracking is in place or not and will make it easier to
convert remaining call places to conn_streams.
Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.
These flags are not exactly for the data layer, they instead indicate
what is expected from the transport layer. Since we're going to split
the connection between the transport and the data layers to insert a
mux layer, it's important to have a clear idea of what each layer does.
All function conn_data_* used to manipulate these flags were renamed to
conn_xprt_*.
The HTTP/2->HTTP/1 gateway will need to process HTTP/1 responses. We
cannot sanely rely on the HTTP/1 txn to parse a response because :
1) responses generated by haproxy such as error messages, redirects,
stats or Lua are neither parsed nor indexed ; this could be
addressed over the long term but will take time.
2) the http txn is useless to parse the body : the states present there
are only meaningful to received bytes (ie next bytes to parse) and
not at all to sent bytes. Thus chunks cannot be followed at all.
Even when implementing this later, it's unsure whether it will be
possible when dealing with compression.
So using the HTTP txn is now out of the equation and the only remaining
solution is to call an HTTP/1 message parser. We already have one, it was
slightly modified to avoid keeping states by benefitting from the fact
that the response was produced by haproxy and this is entirely available.
It assumes the following rules are true, or that incuring an extra cost
to work around them is acceptable :
- the response buffer is read-write and supports modifications in place
- headers sent through / by haproxy are not folded. Folding is still
implemented by replacing CR/LF/tabs/spaces with spaces if encountered
- HTTP/0.9 responses are never sent by haproxy and have never been
supported at all
- haproxy will not send partial responses, the whole headers block will
be sent at once ; this means that we don't need to keep expensive
states and can afford to restart the parsing from the beginning when
facing a partial response ;
- response is contiguous (does not wrap). This was already the case
with the original parser and ensures we can safely dereference all
fields with (ptr,len)
The parser replaces all of the http_msg fields that were necessary with
local variables. The parser is not called on an http_msg but on a string
with a start and an end. The HTTP/1 states were reused for ease of use,
though the request-specific ones have not been implemented for now. The
error position and error state are supported and optional ; these ones
may be used later for bug hunting.
The parser issues the list of all the headers into a caller-allocated
array of struct ist.
The content-length/transfer-encoding header are checked and the relevant
info fed the h1 message state (flags + body_len).
This will be used initially by the hpack table and hopefully later by a
new native http processor. These headers are made of name and value, both
an immediate string (ie: pointer and length).
The chunk crlf parser used to depend on the channel and on the HTTP
message, eventhough it's not really needed. Let's remove this dependency
so that it can be used within the H2 to H1 gateway.
As part of this small API change, it was renamed to h1_skip_chunk_crlf()
to mention that it doesn't depend on http_msg anymore.
The chunk parser used to depend on the channel and on the HTTP message
but it's not really needed as they're only used to retrieve the buffer
as well as to return the number of bytes parsed and the chunk size.
Here instead we pass the (few) relevant information in arguments so that
the function may be reused without a channel nor an HTTP message (ie
from the H2 to H1 gateway).
As part of this API change, it was renamed to h1_parse_chunk_size() to
mention that it doesn't depend on http_msg anymore.
Functions http_parse_chunk_size(), http_skip_chunk_crlf() and
http_forward_trailers() were moved to h1.h and h1.c respectively so
that they can be called from outside. The parts that were inline
remained inline as it's critical for performance (+41% perf
difference reported in an earlier test). For now the "http_" prefix
remains in their name since they still depend on the http_msg type.
Certain types and enums are very specific to the HTTP/1 parser, and we'll
need to share them with the HTTP/2 to HTTP/1 translation code. Let's move
them to h1.c/h1.h. Those with very few occurrences or only used locally
were renamed to explicitly mention the relevant HTTP version :
enum ht_state -> h1_state.
http_msg_state_str -> h1_msg_state_str
HTTP_FLG_* -> H1_FLG_*
http_char_classes -> h1_char_classes
Others like HTTP_IS_*, HTTP_MSG_* are left to be done later.
Thus function returns the number of blocks. When a buffer is full and
properly aligned, buf->p loops back the beginning, and the test in the
code doesn't cover that specific case, so it returns two chunks, a full
one and an empty one. It's harmless but can sometimes have a small impact
on performance and definitely makes the code hard to debug.
Fix regression introduced by commit:
'MAJOR: servers: propagate server status changes asynchronously.'
The building of the log line was re-worked to be done at the
postponed point without lack of data.
[wt: this only affects 1.8-dev, no backport needed]
This function modifies the string to add a zero after the end, and returns
the start pointer. The purpose is to use it on strings extracted by parsers
from larger strings cut with delimiters that are not important and can be
destroyed. It allows any such string to be used with regular string
functions. It's also convenient to use with printf() to show data extracted
from writable areas.
There's no point having the channel marked writable as these functions
only extract data from the channel. The code was retrieved from their
ci/co ancestors.
For HTTP/2 we'll need some buffer-only equivalent functions to some of
the ones applying to channels and still squatting the bi_* / bo_*
namespace. Since these names have kept being misleading for quite some
time now and are really getting annoying, it's time to rename them. This
commit will use "ci/co" as the prefix (for "channel in", "channel out")
instead of "bi/bo". The following ones were renamed :
bi_getblk_nc, bi_getline_nc, bi_putblk, bi_putchr,
bo_getblk, bo_getblk_nc, bo_getline, bo_getline_nc, bo_inject,
bi_putchk, bi_putstr, bo_getchr, bo_skip, bi_swpbuf
This function returns true if the available buffer space wraps. This
will be used to detect if it's worth realigning a buffer when it lacks
contigous space.
bi_istput() injects the ist string into the input region of the buffer,
it will be used to feed small data chunks into the conn_stream. bo_istput()
does the same into the output region of the buffer, it will be used to send
data via the transport layer and assumes there's no input data.
In order to match known patterns in wrapping buffer, we'll introduce new
string manipulation functions for buffers. The new function b_isteq()
relies on an ist string for the pattern and compares it against any
location in the buffer relative to <p>. The second function bi_eat()
is specially designed to match input contents.
This simply reduces the amount of output data from the buffer after
they have been transferred, in a way that is more natural than by
fiddling with buf->o. b_del() was renamed to bi_del() to avoid any
ambiguity (it's not yet used).
Commit 36eb3a3 ("MINOR: tools: make my_htonll() more efficient on x86_64")
brought an incorrect asm statement missing the input constraints, causing
the input value not necessarily to be placed into the same register as the
output one, resulting in random output. It happens to work when building at
-O0 but not above. This was only detected in the HTTP/2 parser, but in
mainline it could only affect the integer to binary sample cast.
No backport is needed since this bug was only introduced in the development
branch.
In order to prepare multi-thread development, code was re-worked
to propagate changes asynchronoulsy.
Servers with pending status changes are registered in a list
and this one is processed and emptied only once 'run poll' loop.
Operational status changes are performed before administrative
status changes.
In a case of multiple operational status change or admin status
change in the same 'run poll' loop iteration, those changes are
merged to reach only the targeted status.
Commit bcb86ab ("MINOR: session: add a streams field to the session
struct") added this list of streams that is not needed anymore. Let's
get rid of it now.
After some tests, gcc 5.x produces better code with likely()
than without, contrary to gcc 4.x where it was better to disable
it. Let's re-enable it for 5 and above.
It's not possible to use strlen() in const arrays even with const
strings, but we can use sizeof-1 via a macro. Let's provide this in
the IST() macro, as it saves the developer from having to count the
characters.
After the removal of CO_FL_DATA_RD_SH and CO_FL_DATA_WR_SH, the
aggregate mask CO_FL_NOTIFY_DATA was not updated. It happens that
now CO_FL_NOTIFY_DATA and CO_FL_NOTIFY_DONE are similar, which may
reveal some overlap between the ->wake and ->xprt_done callbacks.
We'll see after the mux changes if both are still required.
These ones are the same as the previous ones but for 64 bit values.
We're using my_ntohll() and my_htonll() from standard.h for the byte
order conversion.
These ones are the equivalent of the read_* functions. They support
writing unaligned words, possibly wrapping, in host and network order.
The write_i*() functions were not implemented since the caller can
already use the unsigned version.
This patch adds the ability to read from a wrapping memory area (ie:
buffers). The new functions are called "readv_<type>". The original
ones were renamed to start with "read_" to make the difference more
obvious between the read method and the returned type.
It's worth noting that the memory barrier in readv_bytes() is critical,
as otherwise gcc decides that it doesn't need the resulting data, but
even worse, removes the length checks in readv_u64() and happily
performs an out-of-bounds unaligned read using read_u64()! Such
"optimizations" are a bit borderline, especially when they impact
security like this...
These ones return respectively the pointer to the end of the buffer and
the distance between b->p and the end. These will simplify a bit some
new code needed to parse directly from a wrapping buffer.
The current construct was made when developing on a 32-bit machine.
Having a simple bswap operation replaced with 2 bswap, 2 shift and
2 or is quite of a waste of precious cycles... Let's provide a trivial
asm-based implementation for x86_64.
Instead of duplicating some sensitive listener-specific code in the
session and in the stream code, let's call listener_release() when
releasing a connection attached to a listener.
Some places call delete_listener() then decrement the number of
listeners and jobs. At least one other place calls delete_listener()
without doing so, but since it's in deinit(), it's harmless and cannot
risk to cause zombie processes to survive. Given that the number of
listeners and jobs is incremented when creating the listeners, it's
much more logical to symmetrically decrement them when deleting such
listeners.
This function is used to create a series of listeners for a specific
address and a port range. It automatically calls the matching protocol
handlers to add them to the relevant lists. This way cfgparse doesn't
need to manipulate listeners anymore. As an added bonus, the memory
allocation is checked.
Since everything is self contained in proto_uxst.c there's no need to
export anything. The same should be done for proto_tcp.c but the file
contains other stuff that's not related to the TCP protocol itself
and which should first be moved somewhere else.
cfgparse has no business directly calling each individual protocol's 'add'
function to create a listener. Now that they're all registered, better
perform a protocol lookup on the family and have a standard ->add method
for all of them.
It's a shame that cfgparse() has to make special cases of each protocol
just to cast the port to the target address family. Let's pass the port
in argument to the function. The unix listener simply ignores it.
Adds cli commands to change at runtime whether informational messages
are prepended with severity level or not, with support for numeric and
worded severity in line with syslog severity level.
Adds stats socket config keyword severity-output to set default behavior
per socket on startup.
These notification management function and structs are generic and
it will be better to move in common parts.
The notification management functions and structs have names
containing some "lua" references because it was written for
the Lua. This patch removes also these references.
xref is used to create a relation between two elements.
Once an element is released, it breaks the relation. If the
relation is already broken, it frees the xref struct.
The pointer between two elements is a sort of refcount with
max value 1. The relation is only between two elements.
The pointer and the type of element a and b are conventional.
Note that xref is initialised from Lua files because Lua is
the only one user.
smp_fetch_ssl_fc_cl_str as very limited usage (only work with openssl == 1.0.2
compiled with the option enable-ssl-trace). It use internal cipher.algorithm_ssl
attribut and SSL_CIPHER_standard_name (available with ssl-trace).
This patch implement this (debug) function in a standard way. It used common
SSL_CIPHER_get_name to display cipher name. It work with openssl >= 1.0.2
and boringssl.
This function should be called by the poller to set FD_POLL_* flags on an FD and
update its state if needed. This function has been added to ease threads support
integration.
The server state and weight was reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are commited to be propagated to the
LB stack.
In further dev related to multi-thread, the commit
will be handled into a sync point.
Pending values are named using the prefix 'next_'
Current values used by the LB stack are named 'cur_'
This string is used in sample fetches so it is safe to use a preallocated trash
chunk instead of a buffer dynamically allocated during HAProxy startup.
First, this variable does not need to be publicly exposed because it is only
used by stick_table functions. So we declare it as a global static in
stick_table.c file. Then, it is useless to use a pointer. Using a plain struct
variable avoids any dynamic allocation.
swap_buffer is a global variable only used by buffer_slow_realign. So it has
been moved from global.h to buffer.c and it is allocated by init_buffer
function. deinit_buffer function has been added to release it. It is also used
to destroy the buffers' pool.
Now, we use init_log_buffers and deinit_log_buffers to, respectively, initialize
and deinitialize log buffers used for syslog messages.
These functions have been introduced to be used by threads, to deal with
thread-local log buffers.
Now, we use init_trash_buffers and deinit_trash_buffers to, respectively,
initialize and deinitialize trash buffers (trash, trash_buf1 and trash_buf2).
These functions have been introduced to be used by threads, to deal with
thread-local trash buffers.
Patch "MINOR: ssl: support ssl-min-ver and ssl-max-ver with crt-list"
introduce ssl_methods in struct ssl_bind_conf. struct bind_conf have now
ssl_methods and ssl_conf.ssl_methods (unused). It's error-prone. This patch
remove the duplicate structure to avoid any confusion.
After careful inspection, this flag is set at exactly two places :
- once in the health-check receive callback after receipt of a
response
- once in the stream interface's shutw() code where CF_SHUTW is
always set on chn->flags
The flag was checked in the checks before deciding to send data, but
when it is set, the wake() callback immediately closes the connection
so the CO_FL_SOCK_WR_SH flag is also set.
The flag was also checked in si_conn_send(), but checking the channel's
flag instead is enough and even reveals that one check involving it
could never match.
So it's time to remove this flag and replace its check with a check of
CF_SHUTW in the stream interface. This way each layer is responsible
for its shutdown, this will ease insertion of the mux layer.
This flag is both confusing and wrong. It is supposed to report the
fact that the data layer has received a shutdown, but in fact this is
reported by CO_FL_SOCK_RD_SH which is set by the transport layer after
this condition is detected. The only case where the flag above is set
is in the stream interface where CF_SHUTR is also set on the receiving
channel.
In addition, it was checked in the health checks code (while never set)
and was always test jointly with CO_FL_SOCK_RD_SH everywhere, except in
conn_data_read0_pending() which incorrectly doesn't match the second
time it's called and is fortunately protected by an extra check on
(ic->flags & CF_SHUTR).
This patch gets rid of the flag completely. Now conn_data_read0_pending()
accurately reports the fact that the transport layer has detected the end
of the stream, regardless of the fact that this state was already consumed,
and the stream interface watches ic->flags&CF_SHUTR to know if the channel
was already closed by the upper layer (which it already used to do).
The now unused conn_data_read0() function was removed.
The session may need to enforce a timeout when waiting for a handshake.
Till now we used a trick to avoid allocating a pointer, we used to set
the connection's owner to the task and set the task's context to the
session, so that it was possible to circle between all of them. The
problem is that we'll really need to pass the pointer to the session
to the upper layers during initialization and that the only place to
store it is conn->owner, which is squatted for this trick.
So this patch moves the struct task* into the session where it should
always have been and ensures conn->owner points to the session until
the data layer is properly initialized.
Historically listeners used to have a handler depending on the upper
layer. But now it's exclusively process_stream() and nothing uses it
anymore so it can safely be removed.
Currently a task is allocated in session_new() and serves two purposes :
- either the handshake is complete and it is offered to the stream via
the second arg of stream_new()
- or the handshake is not complete and it's diverted to be used as a
timeout handler for the embryonic session and repurposed once we land
into conn_complete_session()
Furthermore, the task's process() function was taken from the listener's
handler in conn_complete_session() prior to being replaced by a call to
stream_new(). This will become a serious mess with the mux.
Since it's impossible to have a stream without a task, this patch removes
the second arg from stream_new() and make this function allocate its own
task. In session_accept_fd(), we now only allocate the task if needed for
the embryonic session and delete it later.
The ->init() callback of the connection's data layer was only used to
complete the session's initialisation since sessions and streams were
split apart in 1.6. The problem is that it creates a big confusion in
the layers' roles as the session has to register a dummy data layer
when waiting for a handshake to complete, then hand it off to the
stream which will replace it.
The real need is to notify that the transport has finished initializing.
This should enable a better splitting between these layers.
This patch thus introduces a connection-specific callback called
xprt_done_cb() which informs about handshake successes or failures. With
this, data->init() can disappear, CO_FL_INIT_DATA as well, and we don't
need to register a dummy data->wake() callback to be notified of errors.
Till now connections used to rely exclusively on file descriptors. It
was planned in the past that alternative solutions would be implemented,
leading to member "union t" presenting sock.fd only for now.
With QUIC, the connection will need to continue to exist but will not
rely on a file descriptor but a connection ID.
So this patch introduces a "connection handle" which is either a file
descriptor or a connection ID, to replace the existing "union t". We've
now removed the intermediate "struct sock" which was never used. There
is no functional change at all, though the struct connection was inflated
by 32 bits on 64-bit platforms due to alignment.
Since commit 9d8dbbc ("MINOR: dns: Maximum DNS udp payload set to 8192") it's
possible to specify a packet size, but passing too large a size or a negative
size is not detected and results in memset() being performed over a 2GB+ area
upon receipt of the first DNS response, causing runtime crashes.
We now check that the size is not smaller than the smallest packet which is
the DNS header size (12 bytes).
No backport is needed.
Following up DNS extension introduction, this patch aims at making the
computation of the maximum number of records in DNS response dynamic.
This computation is based on the announced payload size accepted by
HAProxy.
This patch fixes a bug where some servers managed by SRV record query
types never ever recover from a "no resolution" status.
The problem is due to a wrong function called when breaking the
server/resolution (A/AAAA) relationship: this is performed when a server's SRV
record disappear from the SRV response.
Contrary to 64-bits libCs where size_t type size is 8, on systems with 32-bits
size of size_t is 4 (the size of a long) which does not equal to size of uint64_t type.
This was revealed by such GCC warnings on 32bits systems:
src/flt_spoe.c:2259:40: warning: passing argument 4 of spoe_decode_buffer from
incompatible pointer type
if (spoe_decode_buffer(&p, end, &str, &sz) == -1)
^
As the already existing code using spoe_decode_buffer() already use such pointers to
uint64_t, in place of pointer to size_t ;), most of this code is in contrib directory,
this simple patch modifies the prototype of spoe_decode_buffer() so that to use a
pointer to uint64_t in place of a pointer to size_t, uint64_t type being the type
finally required for decode_varint().
The two macros EXPECT_LF_HERE and EAT_AND_JUMP_OR_RETURN were exported
for use outside the HTTP parser. They now take extra arguments to avoid
implicit pointers and jump labels. These will be used to reimplement a
minimalist HTTP/1 parser in the H1->H2 gateway.
For HPACK we'll need to perform a lot of string manipulation between the
dynamic headers table and the output stream, and we need an efficient way
to deal with that, considering that the zero character is not an end of
string marker here. It turns out that gcc supports returning structs from
functions and is able to place up to two words directly in registers when
-freg-struct is used, which is the case by default on x86 and armv8. On
other architectures the caller reserves some stack space where the callee
can write, which is equivalent to passing a pointer to the return value.
So let's implement a few functions to deal with this as the resulting code
will be optimized on certain architectures where retrieving the length of
a string will simply consist in reading one of the two returned registers.
Extreme care was taken to ensure that the compiler gets maximum opportunities
to optimize out every bit of unused code. This is also the reason why no
call to regular string functions (such as strlen(), memcmp(), memcpy() etc)
were used. The code involving them is often larger than when they are open
coded. Given that strings are usually very small, especially when manipulating
headers, the time spent calling a function optimized for large vectors often
ends up being higher than the few cycles needed to count a few bytes.
An issue was met with __builtin_strlen() which can automatically convert
a constant string to its constant length. It doesn't accept NULLs and there
is no way to hide them using expressions as the check is made before the
optimizer is called. On gcc 4 and above, using an intermediary variable
is enough to hide it. On older versions, calls to ist() with an explicit
NULL argument will issue a warning. There is normally no reason to do this
but taking care of it the best possible still seems important.
Now each stream is added to the session's list of streams, so that it
will be possible to know all the streams belonging to a session, and
to know if any stream is still attached to a sessoin.
These two functions respectively copy a memory area onto the chunk, and
append the contents of a memory area over a chunk. They are convenient
to prepare binary output data to be sent and will be used for HTTP/2.
Edns extensions may be used to negotiate some settings between a DNS
client and a server.
For now we only use it to announce the maximum response payload size accpeted
by HAProxy.
This size can be set through a configuration parameter in the resolvers
section. If not set, it defaults to 512 bytes.
Commit 48a8332a introduce SSL_CTX_get0_privatekey in openssl-compat.h but
SSL_CTX_get0_privatekey access internal structure and can't be a candidate
to openssl-compat.h. The workaround with openssl < 1.0.2 is to use SSL_new
then SSL_get_privatekey.
Make it so for each server, instead of specifying a hostname, one can use
a SRV label.
When doing so, haproxy will first resolve the SRV label, then use the
resulting hostnames, as well as port and weight (priority is ignored right
now), to each server using the SRV label.
It is resolved periodically, and any server disappearing from the SRV records
will be removed, and any server appearing will be added, assuming there're
free servers in haproxy.
As DNS servers may not return all IPs in one answer, we want to cache the
previous entries. Those entries are removed when considered obsolete, which
happens when the IP hasn't been returned by the DNS server for a time
defined in the "hold obsolete" parameter of the resolver section. The default
is 30s.
Since the commit f6b37c67 ["BUG/MEDIUM: ssl: in bind line, ssl-options after
'crt' are ignored."], the certificates generation is broken.
To generate a certificate, we retrieved the private key of the default
certificate using the SSL object. But since the commit f6b37c67, the SSL object
is created with a dummy certificate (initial_ctx).
So to fix the bug, we use directly the default certificate in the bind_conf
structure. We use SSL_CTX_get0_privatekey function to do so. Because this
function does not exist for OpenSSL < 1.0.2 and for LibreSSL, it has been added
in openssl-compat.h with the right #ifdef.
If a server presents an unexpected certificate to haproxy, that is, a
certificate that doesn't match the expected name as configured in
verifyhost or as requested using SNI, we want to store that precious
information. Fortunately we have access to the connection in the
verification callback so it's possible to store an error code there.
For this purpose we use CO_ER_SSL_MISMATCH_SNI (for when the cert name
didn't match the one requested using SNI) and CO_ER_SSL_MISMATCH for
when it doesn't match verifyhost.
This patch fixes the commit 2ab8867 ("MINOR: ssl: compare server certificate
names to the SNI on outgoing connections")
When we check the certificate sent by a server, in the verify callback, we get
the SNI from the session (SSL_SESSION object). In OpenSSL, tlsext_hostname value
for this session is copied from the ssl connection (SSL object). But the copy is
done only if the "server_name" extension is found in the server hello
message. This means the server has found a certificate matching the client's
SNI.
When the server returns a default certificate not matching the client's SNI, it
doesn't set any "server_name" extension in the server hello message. So no SNI
is set on the SSL session and SSL_SESSION_get0_hostname always returns NULL.
To fix the problemn, we get the SNI directly from the SSL connection. It is
always defined with the value set by the client.
If the commit 2ab8867 is backported in 1.7 and/or 1.6, this one must be
backported too.
Note: it's worth mentionning that by making the SNI check work, we
introduce another problem by which failed SNI checks can cause
long connection retries on the server, and in certain cases the
SNI value used comes from the client. So this patch series must
not be backported until this issue is resolved.
task_init() is called exclusively by task_new() which is the only way
to create a task. Most callers set t->expire to TICK_ETERNITY, some set
it to another value and a few like Lua don't set it at all as they don't
need a timeout, causing random values to be used in case the task gets
queued.
Let's always set t->expire to TICK_ETERNITY in task_init() so that all
tasks are now initialized in a clean state.
This patch can be backported as it will definitely make the code more
robust (at least the Lua code, possibly other places).
timegm() is not provided everywhere and the documentation on how to
replace it is bogus as it proposes an inefficient and non-thread safe
alternative.
Here we reimplement everything needed to compute the number of seconds
since Epoch based on the broken down fields in struct tm. It is only
guaranteed to return correct values for correct inputs. It was successfully
tested with all possible 32-bit values of time_t converted to struct tm
using gmtime() and back to time_t using the legacy timegm() and this
function, and both functions always produced the same result.
Thanks to Benoît Garnier for an instructive discussion and detailed
explanations of the various time functions, leading to this solution.
In some cases, the socket is misused. The user can open socket and never
close it, or open the socket and close it without sending data. This
causes resources leak on all resources associated to the stream (buffer,
spoe, ...)
This is caused by the stream_shutdown function which is called outside
of the stream execution process. Sometimes, the shtudown is required
while the stream is not started, so the cleanup is ignored.
This patch change the shutdown mode of the session. Now if the session is
no longer used and the Lua want to destroy it, it just set a destroy flag
and the session kill itself.
This patch should be backported in 1.6 and 1.7
Functions hdr_idx_first_idx() and hdr_idx_first_pos() were missing a
"const" qualifier on their arguments which are not modified, causing
a warning in some experimental H2 code.
When several stick-tables were configured with several peers sections,
only a part of them could be synchronized: the ones attached to the last
parsed 'peers' section. This was due to the fact that, at least, the peer I/O handler
refered to the wrong peer section list, in fact always the same: the last one parsed.
The fact that the global peer section list was named "struct peers *peers"
lead to this issue. This variable name is dangerous ;).
So this patch renames global 'peers' variable to 'cfg_peers' to ensure that
no such wrong references are still in use, then all the functions wich used
old 'peers' variable have been modified to refer to the correct peer list.
Must be backported to 1.6 and 1.7.
When support for passing SNI to the server was added in 1.6-dev3, there
was no way to validate that the certificate presented by the server would
really match the name requested in the SNI, which is quite a problem as
it allows other (valid) certificates to be presented instead (when hitting
the wrong server or due to a man in the middle).
This patch adds the missing check against the value passed in the SNI.
The "verifyhost" value keeps precedence if set. If no SNI is used and
no verifyhost directive is specified, then the certificate name is not
checked (this is unchanged).
In order to extract the SNI value, it was necessary to make use of
SSL_SESSION_get0_hostname(), which appeared in openssl 1.1.0. This is
a trivial function which returns the value of s->tlsext_hostname, so
it was provided in the compat layer for older versions. After some
refinements from Emmanuel, it now builds with openssl 1.0.2, openssl
1.1.0 and boringssl. A test file was provided to ease testing all cases.
After some careful observation period it may make sense to backport
this to 1.7 and 1.6 as some users rightfully consider this limitation
as a bug.
Cc: Emmanuel Hocdet <manu@gandi.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
The bug: Maps/ACLs using the same file/id can mistakenly inherit
their flags from the last declared one.
i.e.
$ cat haproxy.conf
listen mylistener
mode http
bind 0.0.0.0:8080
acl myacl1 url -i -f mine.acl
acl myacl2 url -f mine.acl
acl myacl3 url -i -f mine.acl
redirect location / if myacl2
$ cat mine.acl
foobar
Shows an unexpected redirect for request 'GET /FOObAR HTTP/1.0\n\n'.
This fix should be backported on mainline branches v1.6 and v1.7.
The reference of the current map/acl element to dump could
be destroyed if map is updated from an 'http-request del-map'
configuration rule or throught a 'del map/acl' on CLI.
We use a 'back_refs' chaining element to fix this. As it
is done to dump sessions.
This patch needs also fix:
'BUG/MAJOR: cli: fix custom io_release was crushed by NULL.'
To clean the back_ref and avoid a crash on a further
del/clear map operation.
Those fixes should be backported on mainline branches 1.7 and 1.6.
This patch wont directly apply on 1.6.
In order to authorize call of appctx_wakeup on running task:
- from within the task handler itself.
- in futur, from another thread.
The appctx is considered paused as default after running the handler.
The handler should explicitly call appctx_wakeup to be re-called.
When the appctx_free is called on a running handler. The real
free is postponed at the end of the handler process.
This will be used to retrieve the ALPN negociated over SSL (or possibly
via the proxy protocol later). It's likely that this information should
be stored in the connection itself, but it requires adding an extra
pointer and an extra integer. Thus better rely on the transport layer
to pass this info for now.
In order to authorize call of task_wakeup on running task:
- from within the task handler itself.
- in futur, from another thread.
The lookups on runqueue and waitqueue are re-worked
to prepare multithread stuff.
If task_wakeup is called on a running task, the woken
message flags are savec in the 'pending_state' attribute of
the state. The real wakeup is postponed at the end of the handler
process and the woken messages are copied from pending_state
to the state attribute of the task.
It's important to note that this change will cause a very minor
(though measurable) performance loss but it is necessary to make
forward progress on a multi-threaded scheduler. Most users won't
ever notice.
Under certain circumstances, if a stream's task is first woken up
(eg: I/O event) then notified of the availability of a buffer it
was waiting for via stream_res_wakeup(), this second event is lost
because the flags are only merged after seeing that the task is
running. At the moment it seems that the TASK_WOKEN_RES event is
not explicitly checked for, but better fix this before getting
reports of lost events.
This fix removes this "task running" test which is properly
performed in task_wakeup(), while the flags are properly merged.
It must be backported to 1.7 and 1.6.
These functions was added in commit 637f8f2c ("BUG/MEDIUM: buffers: Fix how
input/output data are injected into buffers").
This patch fixes hidden bugs. When a buffer is full (buf->i + buf->o ==
buf->size), instead of returning 0, these functions can return buf->size. Today,
this never happens because callers already check if the buffer is full before
calling bi/bo_contig_space. But to avoid possible bugs if calling conditions
changed, we slightly refactored these functions.
SSL/TLS version can be changed per certificat if and only if openssl lib support
earlier callback on handshake and, of course, is implemented in haproxy. It's ok
for BoringSSL. For Openssl, version 1.1.1 have such callback and could support it.
Very early in the connection rework process leading to v1.5-dev12, commit
56a77e5 ("MEDIUM: connection: complete the polling cleanups") marked the
end of use for this flag which since was never set anymore, but it continues
to be tested. Let's kill it now.
When dumping data at various places in the code, it's hard to figure
what is present where. To make this easier, this patch slightly modifies
debug_hexdump() to take a prefix string which is prepended in front of
each output line.
This patch is a major upgrade of the internal run-time DNS resolver in
HAProxy and it brings the following 2 main changes:
1. DNS resolution task
Up to now, DNS resolution was triggered by the health check task.
From now, DNS resolution task is autonomous. It is started by HAProxy
right after the scheduler is available and it is woken either when a
network IO occurs for one of its nameserver or when a timeout is
matched.
From now, this means we can enable DNS resolution for a server without
enabling health checking.
2. Introduction of a dns_requester structure
Up to now, DNS resolution was purposely made for resolving server
hostnames.
The idea, is to ensure that any HAProxy internal object should be able
to trigger a DNS resolution. For this purpose, 2 things has to be done:
- clean up the DNS code from the server structure (this was already
quite clean actually) and clean up the server's callbacks from
manipulating too much DNS resolution
- create an agnostic structure which allows linking a DNS resolution
and a requester of any type (using obj_type enum)
3. Manage requesters through queues
Up to now, there was an uniq relationship between a resolution and it's
owner (aka the requester now). It's a shame, because in some cases,
multiple objects may share the same hostname and may benefit from a
resolution being performed by a third party.
This patch introduces the notion of queues, which are basically lists of
either currently running resolution or waiting ones.
The resolutions are now available as a pool, which belongs to the resolvers.
The pool has has a default size of 64 resolutions per resolvers and is
allocated at configuration parsing.
Introduction of a DNS response LRU cache in HAProxy.
When a positive response is received from a DNS server, HAProxy stores
it in the struct resolution and then also populates a LRU cache with the
response.
For now, the key in the cache is a XXHASH64 of the hostname in the
domain name format concatened to the query type in string format.
Prior this patch, the DNS responses were stored in a pre-allocated
memory area (allocated at HAProxy's startup).
The problem is that this memory is erased for each new DNS responses
received and processed.
This patch removes the global memory allocation (which was not thread
safe by the way) and introduces a storage of the dns response in the
struct
resolution.
The memory in the struct resolution is also reserved at start up and is
thread safe, since each resolution structure will have its own memory
area.
For now, we simply store the response and use it atomically per
response per server.
In the process of breaking links between dns_* functions and other
structures (mainly server and a bit of resolution), the function
dns_get_ip_from_response needs to be reworked: it now can call
"callback" functions based on resolution's owner type to allow modifying
the way the response is processed.
For now, main purpose of the callback function is to check that an IP
address is not already affected to an element of the same type.
For now, only server type has a callback.
This patch introduces a some re-organisation around the DNS code in
HAProxy.
1. make the dns_* functions less dependent on 'struct server' and 'struct resolution'.
With this in mind, the following changes were performed:
- 'struct dns_options' has been removed from 'struct resolution' (well,
we might need it back at some point later, we'll see)
==> we'll use the 'struct dns_options' from the owner of the resolution
- dns_get_ip_from_response(): takes a 'struct dns_options' instead of
'struct resolution'
==> so the caller can pass its own dns options to get the most
appropriate IP from the response
- dns_process_resolve(): struct dns_option is deduced from new
resolution->requester_type parameter
2. add hostname_dn and hostname_dn_len into struct server
In order to avoid recomputing a server's hostname into its domain name
format (and use a trash buffer to store the result), it is safer to
compute it once at configuration parsing and to store it into the struct
server.
In the mean time, the struct resolution linked to the server doesn't
need anymore to store the hostname in domain name format. A simple
pointer to the server one will make the trick.
The function srv_alloc_dns_resolution() properly manages everything for
us: memory allocation, pointer updates, etc...
3. move resolvers pointer into struct server
This patch makes the pointer to struct dns_resolvers from struct
dns_resolution obsolete.
Purpose is to make the resolution as "neutral" as possible and since the
requester is already linked to the resolvers, then we don't need this
information anymore in the resolution itself.
A couple of new functions to allocate and free memory for a DNS
resolution structure. Main purpose is to to make the code related to DNS
more consistent.
They allocate or free memory for the structure itself. Later, if needed,
they should also allocate / free the buffers, etc, used by this structure.
They don't set/unset any parameters, this is the role of the caller.
This patch also implement calls to these function eveywhere it is
required.
The default len of request uri in log messages is 1024. In some use
cases, you need to keep the long trail of GET parameters. The only
way to increase this len is to recompile with DEFINE=-DREQURI_LEN=2048.
This commit introduces a tune.http.logurilen configuration directive,
allowing to tune this at runtime.
This option exits every workers when one of the current workers die.
It allows you to monitor the master process in order to relaunch
everything on a failure.
For example it can be used with systemd and Restart=on-failure in a spec
file.
This commit remove the -Ds systemd mode in HAProxy in order to replace
it by a more generic master worker system. It aims to replace entirely
the systemd wrapper in the near future.
The master worker mode implements a new way of managing HAProxy
processes. The master is in charge of parsing the configuration
file and is responsible for spawning child processes.
The master worker mode can be invoked by using the -W flag. It can be
used either in background mode (-D) or foreground mode. When used in
background mode, the master will fork to daemonize.
In master worker background mode, chroot, setuid and setgid are done in
each child rather than in the master process, because the master process
will still need access to filesystem to reload the configuration.
This patch adds the support of a maximum of 32 engines
in async mode.
Some tests have been done using 2 engines simultaneously.
This patch also removes specific 'async' attribute from the connection
structure. All the code relies only on Openssl functions.
ssl-mode-async is a global configuration parameter which enables
asynchronous processing in OPENSSL for all SSL connections haproxy
handles. With SSL_MODE_ASYNC set, TLS I/O operations may indicate a
retry with SSL_ERROR_WANT_ASYNC with this mode set if an asynchronous
capable engine is used to perform cryptographic operations. Currently
async mode only supports one async-capable engine.
This is the latest version of the patchset which includes Emeric's
updates :
- improved async fd cleaning when openssl reports an fd to delete
- prevent conn_fd_handler from calling SSL_{read,write,handshake} until
the async fd is ready, as these operations are very slow and waste CPU
- postpone of SSL_free to ensure the async operation can complete and
does not cause a dereference a released SSL.
- proper removal of async fd from the fdtab and removal of the unused async
flag.
This patch adds the global 'ssl-engine' keyword. First arg is an engine
identifier followed by a list of default_algorithms the engine will
operate.
If the openssl version is too old, an error is reported when the option
is used.
This patch changes the stats socket rights for allowing the sending of
listening sockets.
The previous behavior was to allow any unix stats socket with admin
level to send sockets. It's not possible anymore, you have to set this
option to activate the socket sending.
Example:
stats socket /var/run/haproxy4.sock mode 666 expose-fd listeners level user process 4
The current level variable use only 2 bits for storing the 3 access
level (user, oper and admin).
This patch add a bitmask which allows to use the remaining bits for
other usage.
Plan is to add min-tlsxx max-tlsxx configuration, more consistent than no-tlsxx.
This patch introduce internal min/max and replace force-tlsxx implementation.
SSL method configuration is store in 'struct tls_version_filter'.
SSL method configuration to openssl setting is abstract in 'methodVersions' table.
With openssl < 1.1.0, SSL_CTX_set_ssl_version is used for force (min == max).
With openssl >= 1.1.0, SSL_CTX_set_min/max_proto_version is used.
This patch adds a new stats socket command to modify server
FQDNs at run time.
Its syntax:
set server <backend>/<server> fqdn <FQDN>
This patch also adds FQDNs to server state file at the end
of each line for backward compatibility ("-" if not present).
These encoding functions does general stuff and can be used in
other context than spoe. This patch moves the function spoe_encode_varint
and spoe_decode_varint from spoe to common. It also remove the prefix spoe.
These functions will be used for encoding values in new binary sample fetch.
When we include the header proto/spoe.h in other files in the same
project, the compilator claim that the symbol have multiple definitions:
src/flt_spoe.o: In function `spoe_encode_varint':
~/git/haproxy/include/proto/spoe.h:45: multiple definition of `spoe_encode_varint'
src/proto_http.o:~/git/haproxy/include/proto/spoe.h:45: first defined here
This patch makes backend sections support 'server-template' new keyword.
Such 'server-template' objects are parsed similarly to a 'server' object
by parse_server() function, but its first arguments are as follows:
server-template <ID prefix> <nb | range> <ip | fqdn>:<port> ...
The remaining arguments are the same as for 'server' lines.
With such server template declarations, servers may be allocated with IDs
built from <ID prefix> and <nb | range> arguments.
For instance declaring:
server-template foo 1-5 google.com:80 ...
or
server-template foo 5 google.com:80 ...
would be equivalent to declare:
server foo1 google.com:80 ...
server foo2 google.com:80 ...
server foo3 google.com:80 ...
server foo4 google.com:80 ...
server foo5 google.com:80 ...
When running with multiple process, if some proxies are just assigned
to some processes, the other processes will just close the file descriptors
for the listening sockets. However, we may still have to provide those
sockets when reloading, so instead we just try hard to pretend those proxies
are dead, while keeping the sockets opened.
A new global option, no-reused-socket", has been added, to restore the old
behavior of closing the sockets not bound to this process.
Add the "-x" flag, that takes a path to a unix socket as an argument. If
used, haproxy will connect to the socket, and asks to get all the
listening sockets from the old process. Any failure is fatal.
This is needed to get seamless reloads on linux.
Add a new command that will send all the listening sockets, via the
stats socket, and their properties.
This is a first step to workaround the linux problem when reloading
haproxy.
Released version 1.8-dev1 with the following main changes :
- BUG/MEDIUM: proxy: return "none" and "unknown" for unknown LB algos
- BUG/MINOR: stats: make field_str() return an empty string on NULL
- DOC: Spelling fixes
- BUG/MEDIUM: http: Fix tunnel mode when the CONNECT method is used
- BUG/MINOR: http: Keep the same behavior between 1.6 and 1.7 for tunneled txn
- BUG/MINOR: filters: Protect args in macros HAS_DATA_FILTERS and IS_DATA_FILTER
- BUG/MINOR: filters: Invert evaluation order of HTTP_XFER_BODY and XFER_DATA analyzers
- BUG/MINOR: http: Call XFER_DATA analyzer when HTTP txn is switched in tunnel mode
- BUG/MAJOR: stream: fix session abort on resource shortage
- OPTIM: stream-int: don't disable polling anymore on DONT_READ
- BUG/MINOR: cli: allow the backslash to be escaped on the CLI
- BUG/MEDIUM: cli: fix "show stat resolvers" and "show tls-keys"
- DOC: Fix map table's format
- DOC: Added 51Degrees conv and fetch functions to documentation.
- BUG/MINOR: http: don't send an extra CRLF after a Set-Cookie in a redirect
- DOC: mention that req_tot is for both frontends and backends
- BUG/MEDIUM: variables: some variable name can hide another ones
- MINOR: lua: Allow argument for actions
- BUILD: rearrange target files by build time
- CLEANUP: hlua: just indent functions
- MINOR: lua: give HAProxy variable access to the applets
- BUG/MINOR: stats: fix be/sessions/max output in html stats
- MINOR: proxy: Add fe_name/be_name fetchers next to existing fe_id/be_id
- DOC: lua: Documentation about some entry missing
- DOC: lua: Add documentation about variable manipulation from applet
- MINOR: Do not forward the header "Expect: 100-continue" when the option http-buffer-request is set
- DOC: Add undocumented argument of the trace filter
- DOC: Fix some typo in SPOE documentation
- MINOR: cli: Remove useless call to bi_putchk
- BUG/MINOR: cli: be sure to always warn the cli applet when input buffer is full
- MINOR: applet: Count number of (active) applets
- MINOR: task: Rename run_queue and run_queue_cur counters
- BUG/MEDIUM: stream: Save unprocessed events for a stream
- BUG/MAJOR: Fix how the list of entities waiting for a buffer is handled
- BUILD/MEDIUM: Fixing the build using LibreSSL
- BUG/MEDIUM: lua: In some case, the return of sample-fetches is ignored (2)
- SCRIPTS: git-show-backports: fix a harmless typo
- SCRIPTS: git-show-backports: add -H to use the hash of the commit message
- BUG/MINOR: stream-int: automatically release SI_FL_WAIT_DATA on SHUTW_NOW
- CLEANUP: applet/lua: create a dedicated ->fcn entry in hlua_cli context
- CLEANUP: applet/table: add an "action" entry in ->table context
- CLEANUP: applet: remove the now unused appctx->private field
- DOC: lua: documentation about time parser functions
- DOC: lua: improve links
- DOC: lua: section declared twice
- MEDIUM: cli: 'show cli sockets' list the CLI sockets
- BUG/MINOR: cli: "show cli sockets" wouldn't list all processes
- BUG/MINOR: cli: "show cli sockets" would always report process 64
- CLEANUP: lua: rename one of the lua appctx union
- BUG/MINOR: lua/cli: bad error message
- MEDIUM: lua: use memory pool for hlua struct in applets
- MINOR: lua/signals: Remove Lua part from signals.
- DOC: cli: show cli sockets
- MINOR: cli: automatically enable a CLI I/O handler when there's no parser
- CLEANUP: memory: remove the now unused cli_parse_show_pools() function
- CLEANUP: applet: group all CLI contexts together
- CLEANUP: stats: move a misplaced stats context initialization
- MINOR: cli: add two general purpose pointers and integers in the CLI struct
- MINOR: appctx/cli: remove the cli_socket entry from the appctx union
- MINOR: appctx/cli: remove the env entry from the appctx union
- MINOR: appctx/cli: remove the "be" entry from the appctx union
- MINOR: appctx/cli: remove the "dns" entry from the appctx union
- MINOR: appctx/cli: remove the "server_state" entry from the appctx union
- MINOR: appctx/cli: remove the "tlskeys" entry from the appctx union
- CONTRIB: tcploop: add limits.h to fix build issue with some compilers
- MINOR/DOC: lua: just precise one thing
- DOC: fix small typo in fe_id (backend instead of frontend)
- BUG/MINOR: Fix the sending function in Lua's cosocket
- BUG/MINOR: lua: memory leak executing tasks
- BUG/MINOR: lua: bad return code
- BUG/MINOR: lua: memleak when Lua/cli fails
- MEDIUM: lua: remove Lua struct from session, and allocate it with memory pools
- CLEANUP: haproxy: statify unexported functions
- MINOR: haproxy: add a registration for build options
- CLEANUP: wurfl: use the build options list to report it
- CLEANUP: 51d: use the build options list to report it
- CLEANUP: da: use the build options list to report it
- CLEANUP: namespaces: use the build options list to report it
- CLEANUP: tcp: use the build options list to report transparent modes
- CLEANUP: lua: use the build options list to report it
- CLEANUP: regex: use the build options list to report the regex type
- CLEANUP: ssl: use the build options list to report the SSL details
- CLEANUP: compression: use the build options list to report the algos
- CLEANUP: auth: use the build options list to report its support
- MINOR: haproxy: add a registration for post-check functions
- CLEANUP: checks: make use of the post-init registration to start checks
- CLEANUP: filters: use the function registration to initialize all proxies
- CLEANUP: wurfl: make use of the late init registration
- CLEANUP: 51d: make use of the late init registration
- CLEANUP: da: make use of the late init registration code
- MINOR: haproxy: add a registration for post-deinit functions
- CLEANUP: wurfl: register the deinit function via the dedicated list
- CLEANUP: 51d: register the deinitialization function
- CLEANUP: da: register the deinitialization function
- CLEANUP: wurfl: move global settings out of the global section
- CLEANUP: 51d: move global settings out of the global section
- CLEANUP: da: move global settings out of the global section
- MINOR: cfgparse: add two new functions to check arguments count
- MINOR: cfgparse: move parsing of "ca-base" and "crt-base" to ssl_sock
- MEDIUM: cfgparse: move all tune.ssl.* keywords to ssl_sock
- MEDIUM: cfgparse: move maxsslconn parsing to ssl_sock
- MINOR: cfgparse: move parsing of ssl-default-{bind,server}-ciphers to ssl_sock
- MEDIUM: cfgparse: move ssl-dh-param-file parsing to ssl_sock
- MEDIUM: compression: move the zlib-specific stuff from global.h to compression.c
- BUG/MEDIUM: ssl: properly reset the reused_sess during a forced handshake
- BUG/MEDIUM: ssl: avoid double free when releasing bind_confs
- BUG/MINOR: stats: fix be/sessions/current out in typed stats
- MINOR: tcp-rules: check that the listener exists before updating its counters
- MEDIUM: spoe: don't create a dummy listener for outgoing connections
- MINOR: listener: move the transport layer pointer to the bind_conf
- MEDIUM: move listener->frontend to bind_conf->frontend
- MEDIUM: ssl: remote the proxy argument from most functions
- MINOR: connection: add a new prepare_bind_conf() entry to xprt_ops
- MEDIUM: ssl_sock: implement ssl_sock_prepare_bind_conf()
- MINOR: connection: add a new destroy_bind_conf() entry to xprt_ops
- MINOR: ssl_sock: implement ssl_sock_destroy_bind_conf()
- MINOR: server: move the use_ssl field out of the ifdef USE_OPENSSL
- MINOR: connection: add a minimal transport layer registration system
- CLEANUP: connection: remove all direct references to raw_sock and ssl_sock
- CLEANUP: connection: unexport raw_sock and ssl_sock
- MINOR: connection: add new prepare_srv()/destroy_srv() entries to xprt_ops
- MINOR: ssl_sock: implement and use prepare_srv()/destroy_srv()
- CLEANUP: ssl: move tlskeys_finalize_config() to a post_check callback
- CLEANUP: ssl: move most ssl-specific global settings to ssl_sock.c
- BUG/MINOR: backend: nbsrv() should return 0 if backend is disabled
- BUG/MEDIUM: ssl: for a handshake when server-side SNI changes
- BUG/MINOR: systemd: potential zombie processes
- DOC: Add timings events schemas
- BUILD: lua: build failed on FreeBSD.
- MINOR: samples: add xx-hash functions
- MEDIUM: regex: pcre2 support
- BUG/MINOR: option prefer-last-server must be ignored in some case
- MINOR: stats: Support "select all" for backend actions
- BUG/MINOR: sample-fetches/stick-tables: bad type for the sample fetches sc*_get_gpt0
- BUG/MAJOR: channel: Fix the definition order of channel analyzers
- BUG/MINOR: http: report real parser state in error captures
- BUILD: scripts: automatically update the branch in version.h when releasing
- MINOR: tools: add a generic hexdump function for debugging
- BUG/MAJOR: http: fix risk of getting invalid reports of bad requests
- MINOR: http: custom status reason.
- MINOR: connection: add sample fetch "fc_rcvd_proxy"
- BUG/MINOR: config: emit a warning if http-reuse is enabled with incompatible options
- BUG/MINOR: tools: fix off-by-one in port size check
- BUG/MEDIUM: server: consider AF_UNSPEC as a valid address family
- MEDIUM: server: split the address and the port into two different fields
- MINOR: tools: make str2sa_range() return the port in a separate argument
- MINOR: server: take the destination port from the port field, not the addr
- MEDIUM: server: disable protocol validations when the server doesn't resolve
- BUG/MEDIUM: tools: do not force an unresolved address to AF_INET:0.0.0.0
- BUG/MINOR: ssl: EVP_PKEY must be freed after X509_get_pubkey usage
- BUG/MINOR: ssl: assert on SSL_set_shutdown with BoringSSL
- MINOR: Use "500 Internal Server Error" for 500 error/status code message.
- MINOR: proto_http.c 502 error txt typo.
- DOC: add deprecation notice to "block"
- MINOR: compression: fix -vv output without zlib/slz
- BUG/MINOR: Reset errno variable before calling strtol(3)
- MINOR: ssl: don't show prefer-server-ciphers output
- OPTIM/MINOR: config: Optimize fullconn automatic computation loading configuration
- BUG/MINOR: stream: Fix how backend-specific analyzers are set on a stream
- MAJOR: ssl: bind configuration per certificat
- MINOR: ssl: add curve suite for ECDHE negotiation
- MINOR: checks: Add agent-addr config directive
- MINOR: cli: Add possiblity to change agent config via CLI/socket
- MINOR: doc: Add docs for agent-addr configuration variable
- MINOR: doc: Add docs for agent-addr and agent-send CLI commands
- BUILD: ssl: fix to build (again) with boringssl
- BUILD: ssl: fix build on OpenSSL 1.0.0
- BUILD: ssl: silence a warning reported for ERR_remove_state()
- BUILD: ssl: eliminate warning with OpenSSL 1.1.0 regarding RAND_pseudo_bytes()
- BUILD: ssl: kill a build warning introduced by BoringSSL compatibility
- BUG/MEDIUM: tcp: don't poll for write when connect() succeeds
- BUG/MINOR: unix: fix connect's polling in case no data are scheduled
- MINOR: server: extend the flags to 32 bits
- BUG/MINOR: lua: Map.end are not reliable because "end" is a reserved keyword
- MINOR: dns: give ability to dns_init_resolvers() to close a socket when requested
- BUG/MAJOR: dns: restart sockets after fork()
- MINOR: chunks: implement a simple dynamic allocator for trash buffers
- BUG/MEDIUM: http: prevent redirect from overwriting a buffer
- BUG/MEDIUM: filters: Do not truncate HTTP response when body length is undefined
- BUG/MEDIUM: http: Prevent replace-header from overwriting a buffer
- BUG/MINOR: http: Return an error when a replace-header rule failed on the response
- BUG/MINOR: sendmail: The return of vsnprintf is not cleanly tested
- BUG/MAJOR: ssl: fix a regression in ssl_sock_shutw()
- BUG/MAJOR: lua segmentation fault when the request is like 'GET ?arg=val HTTP/1.1'
- BUG/MEDIUM: config: reject anything but "if" or "unless" after a use-backend rule
- MINOR: http: don't close when redirect location doesn't start with "/"
- MEDIUM: boringssl: support native multi-cert selection without bundling
- BUG/MEDIUM: ssl: fix verify/ca-file per certificate
- BUG/MEDIUM: ssl: switchctx should not return SSL_TLSEXT_ERR_ALERT_WARNING
- MINOR: ssl: removes SSL_CTX_set_ssl_version call and cleanup CTX creation.
- BUILD: ssl: fix build with -DOPENSSL_NO_DH
- MEDIUM: ssl: add new sample-fetch which captures the cipherlist
- MEDIUM: ssl: remove ssl-options from crt-list
- BUG/MEDIUM: ssl: in bind line, ssl-options after 'crt' are ignored.
- BUG/MINOR: ssl: fix cipherlist captures with sustainable SSL calls
- MINOR: ssl: improved cipherlist captures
- BUG/MINOR: spoe: Fix soft stop handler using a specific id for spoe filters
- BUG/MINOR: spoe: Fix parsing of arguments in spoe-message section
- MAJOR: spoe: Add support of pipelined and asynchronous exchanges with agents
- MINOR: spoe: Add support for pipelining/async capabilities in the SPOA example
- MINOR: spoe: Remove SPOE details from the appctx structure
- MINOR: spoe: Add status code in error variable instead of hardcoded value
- MINOR: spoe: Send a log message when an error occurred during event processing
- MINOR: spoe: Check the scope of sample fetches used in SPOE messages
- MEDIUM: spoe: Be sure to wakeup the good entity waiting for a buffer
- MINOR: spoe: Use the min of all known max_frame_size to encode messages
- MAJOR: spoe: Add support of payload fragmentation in NOTIFY frames
- MINOR: spoe: Add support for fragmentation capability in the SPOA example
- MAJOR: spoe: refactor the filter to clean up the code
- MINOR: spoe: Handle NOTIFY frames cancellation using ABORT bit in ACK frames
- REORG: spoe: Move struct and enum definitions in dedicated header file
- REORG: spoe: Move low-level encoding/decoding functions in dedicated header file
- MINOR: spoe: Improve implementation of the payload fragmentation
- MINOR: spoe: Add support of negation for options in SPOE configuration file
- MINOR: spoe: Add "pipelining" and "async" options in spoe-agent section
- MINOR: spoe: Rely on alertif_too_many_arg during configuration parsing
- MINOR: spoe: Add "send-frag-payload" option in spoe-agent section
- MINOR: spoe: Add "max-frame-size" statement in spoe-agent section
- DOC: spoe: Update SPOE documentation to reflect recent changes
- MINOR: config: warn when some HTTP rules are used in a TCP proxy
- BUG/MEDIUM: ssl: Clear OpenSSL error stack after trying to parse OCSP file
- BUG/MEDIUM: cli: Prevent double free in CLI ACL lookup
- BUG/MINOR: Fix "get map <map> <value>" CLI command
- MINOR: Add nbsrv sample converter
- CLEANUP: Replace repeated code to count usable servers with be_usable_srv()
- MINOR: Add hostname sample fetch
- CLEANUP: Remove comment that's no longer valid
- MEDIUM: http_error_message: txn->status / http_get_status_idx.
- MINOR: http-request tarpit deny_status.
- CLEANUP: http: make http_server_error() not set the status anymore
- MEDIUM: stats: Add JSON output option to show (info|stat)
- MEDIUM: stats: Add show json schema
- BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the data layer
- MINOR: server: Add dynamic session cookies.
- MINOR: cli: Let configure the dynamic cookies from the cli.
- BUG/MINOR: checks: attempt clean shutw for SSL check
- CONTRIB: tcploop: make it build on FreeBSD
- CONTRIB: tcploop: fix time format to silence build warnings
- CONTRIB: tcploop: report action 'K' (kill) in usage message
- CONTRIB: tcploop: fix connect's address length
- CONTRIB: tcploop: use the trash instead of NULL for recv()
- BUG/MEDIUM: listener: do not try to rebind another process' socket
- BUG/MEDIUM server: Fix crash when dynamic is defined, but not key is provided.
- CLEANUP: config: Typo in comment.
- BUG/MEDIUM: filters: Fix channels synchronization in flt_end_analyze
- TESTS: add a test configuration to stress handshake combinations
- BUG/MAJOR: stream-int: do not depend on connection flags to detect connection
- BUG/MEDIUM: connection: ensure to always report the end of handshakes
- MEDIUM: connection: don't test for CO_FL_WAKE_DATA
- CLEANUP: connection: completely remove CO_FL_WAKE_DATA
- BUG: payload: fix payload not retrieving arbitrary lengths
- BUILD: ssl: simplify SSL_CTX_set_ecdh_auto compatibility
- BUILD: ssl: fix OPENSSL_NO_SSL_TRACE for boringssl and libressl
- BUG/MAJOR: http: fix typo in http_apply_redirect_rule
- MINOR: doc: 2.4. Examples should be 2.5. Examples
- BUG/MEDIUM: stream: fix client-fin/server-fin handling
- MINOR: fd: add a new flag HAP_POLL_F_RDHUP to struct poller
- BUG/MINOR: raw_sock: always perfom the last recv if RDHUP is not available
- OPTIM: poll: enable support for POLLRDHUP
- MINOR: kqueue: exclusively rely on the kqueue returned status
- MEDIUM: kqueue: take care of EV_EOF to improve polling status accuracy
- MEDIUM: kqueue: only set FD_POLL_IN when there are pending data
- DOC/MINOR: Fix typos in proxy protocol doc
- DOC: Protocol doc: add checksum, TLV type ranges
- DOC: Protocol doc: add SSL TLVs, rename CHECKSUM
- DOC: Protocol doc: add noop TLV
- MEDIUM: global: add a 'hard-stop-after' option to cap the soft-stop time
- MINOR: dns: improve DNS response parsing to use as many available records as possible
- BUG/MINOR: cfgparse: loop in tracked servers lists not detected by check_config_validity().
- MINOR: server: irrelevant error message with 'default-server' config file keyword.
- MINOR: server: Make 'default-server' support 'backup' keyword.
- MINOR: server: Make 'default-server' support 'check-send-proxy' keyword.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'non-stick' keyword.
- MINOR: server: Make 'default-server' support 'send-proxy' and 'send-proxy-v2 keywords.
- MINOR: server: Make 'default-server' support 'check-ssl' keyword.
- MINOR: server: Make 'default-server' support 'force-sslv3' and 'force-tlsv1[0-2]' keywords.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'no-ssl*' and 'no-tlsv*' keywords.
- MINOR: server: Make 'default-server' support 'ssl' keyword.
- MINOR: server: Make 'default-server' support 'send-proxy-v2-ssl*' keywords.
- CLEANUP: server: code alignement.
- MINOR: server: Make 'default-server' support 'verify' keyword.
- MINOR: server: Make 'default-server' support 'verifyhost' setting.
- MINOR: server: Make 'default-server' support 'check' keyword.
- MINOR: server: Make 'default-server' support 'track' setting.
- MINOR: server: Make 'default-server' support 'ca-file', 'crl-file' and 'crt' settings.
- MINOR: server: Make 'default-server' support 'redir' keyword.
- MINOR: server: Make 'default-server' support 'observe' keyword.
- MINOR: server: Make 'default-server' support 'cookie' keyword.
- MINOR: server: Make 'default-server' support 'ciphers' keyword.
- MINOR: server: Make 'default-server' support 'tcp-ut' keyword.
- MINOR: server: Make 'default-server' support 'namespace' keyword.
- MINOR: server: Make 'default-server' support 'source' keyword.
- MINOR: server: Make 'default-server' support 'sni' keyword.
- MINOR: server: Make 'default-server' support 'addr' keyword.
- MINOR: server: Make 'default-server' support 'disabled' keyword.
- MINOR: server: Add 'no-agent-check' server keyword.
- DOC: server: Add docs for "server" and "default-server" new "no-*" and other settings.
- MINOR: doc: fix use-server example (imap vs mail)
- BUG/MEDIUM: tcp: don't require privileges to bind to device
- BUILD: make the release script use shortlog for the final changelog
- BUILD: scripts: fix typo in announce-release error message
- CLEANUP: time: curr_sec_ms doesn't need to be exported
- BUG/MEDIUM: server: Wrong server default CRT filenames initialization.
- BUG/MEDIUM: peers: fix buffer overflow control in intdecode.
- BUG/MEDIUM: buffers: Fix how input/output data are injected into buffers
- BUG/MINOR: http: Fix conditions to clean up a txn and to handle the next request
- CLEANUP: http: Remove channel_congested function
- CLEANUP: buffers: Remove buffer_bounce_realign function
- CLEANUP: buffers: Remove buffer_contig_area and buffer_work_area functions
- MINOR: http: remove useless check on HTTP_MSGF_XFER_LEN for the request
- MINOR: http: Add debug messages when HTTP body analyzers are called
- BUG/MEDIUM: http: Fix blocked HTTP/1.0 responses when compression is enabled
- BUG/MINOR: filters: Don't force the stream's wakeup when we wait in flt_end_analyze
- DOC: fix parenthesis and add missing "Example" tags
- DOC: update the contributing file
- DOC: log-format/tcplog/httplog update
- MINOR: config parsing: add warning when log-format/tcplog/httplog is overriden in "defaults" sections
The function buffer_contig_space is buggy and could lead to pernicious bugs
(never hitted until now, AFAIK). This function should return the number of bytes
that can be written into the buffer at once (without wrapping).
First, this function is used to inject input data (bi_putblk) and to inject
output data (bo_putblk and bo_inject). But there is no context. So it cannot
decide where contiguous space should placed. For input data, it should be after
bi_end(buf) (ie, buf->p + buf->i modulo wrapping calculation). For output data,
it should be after bo_end(buf) (ie, buf->p) and input data are assumed to not
exist (else there is no space at all).
Then, considering we need to inject input data, this function does not always
returns the right value. And when we need to inject output data, we must be sure
to have no input data at all (buf->i == 0), else the result can also be wrong
(but this is the caller responsibility, so everything should be fine here).
The buffer can be in 3 different states:
1) no wrapping
<---- o ----><----- i ----->
+------------+------------+-------------+------------+
| |oooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|
+------------+------------+-------------+------------+
^ <contig_space>
p ^ ^
l r
2) input wrapping
...---> <---- o ----><-------- i -------...
+-----+------------+------------+--------------------+
|iiiii|xxxxxxxxxxxx|oooooooooooo|iiiiiiiiiiiiiiiiiiii|
+-----+------------+------------+--------------------+
<contig_space> ^
^ ^ p
l r
3) output wrapping
...------ o ------><----- i -----> <----...
+------------------+-------------+------------+------+
|oooooooooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|oooooo|
+------------------+-------------+------------+------+
^ <contig_space>
p ^ ^
l r
buffer_contig_space returns (l - r). The cases 1 and 3 are correctly
handled. But for the second case, r is wrong. It points on the buffer's end
(buf->data + buf->size). It should be bo_end(buf) (ie, buf->p - buf->o).
To fix the bug, the function has been splitted. Now, bi_contig_space and
bo_contig_space should be used to know the contiguous space available to insert,
respectively, input data and output data. For bo_contig_space, input data are
assumed to not exist. And the right version is used, depending what we want to
do.
In addition, to clarify the buffer's API, buffer_realign does not return value
anymore. So it has the same API than buffer_slow_realign.
This patch can be backported in 1.7, 1.6 and 1.5.
This patch adds 'no-agent-check' setting supported both by 'default-server'
and 'server' directives to disable an agent check for a specific server which would
have 'agent-check' set as default value (inherited from 'default-server'
'agent-check' setting), or, on 'default-server' lines, to disable 'agent-check' setting
as default value for any further 'server' declarations.
For instance, provided this configuration:
default-server agent-check
server srv1
server srv2 no-agent-check
server srv3
default-server no-agent-check
server srv4
srv1 and srv3 would have an agent check enabled contrary to srv2 and srv4.
We do not allocate anymore anything when parsing 'default-server' 'agent-check'
setting.
This patch makes 'default-server' directives support 'sni' settings.
A field 'sni_expr' has been added to 'struct server' to temporary
stores SNI expressions as strings during both 'default-server' and 'server'
lines parsing. So, to duplicate SNI expressions from 'default-server' 'sni' setting
for new 'server' instances we only have to "strdup" these strings as this is
often done for most of the 'server' settings.
Then, sample expressions are computed calling sample_parse_expr() (only for 'server'
instances).
A new function has been added to produce the same error output as before in case
of any error during 'sni' settings parsing (display_parser_err()).
Should not break anything.
Before this patch 'check' setting was only supported by 'server' directives.
This patch makes also 'default-server' directives support this setting.
A new 'no-check' keyword parser has been implemented to disable this setting both
in 'default-server' and 'server' directives.
Should not break anything.
When SIGUSR1 is received, haproxy enters in soft-stop and quits when no
connection remains.
It can happen that the instance remains alive for a long time, depending
on timeouts and traffic. This option ensures that soft-stop won't run
for too long.
Example:
global
hard-stop-after 30s # Once in soft-stop, the instance will remain
# alive for at most 30 seconds.
We'll need to differenciate between pollers which can report hangup at
the same time as read (POLL_RDHUP) from the other ones, because only
these ones may benefit from the fd_done_recv() optimization. Epoll has
had support for EPOLLRDHUP since Linux 2.6.17 and has always been used
this way in haproxy, so now we only set the flag once we've observed it
once in a response. It means that some initial requests may try to
perform a second recv() call, but after the first closed connection it
will be enough to know that the second call is not needed anymore.
Later we may extend these flags to designate event-triggered pollers.
A tcp half connection can cause 100% CPU on expiration.
First reproduced with this haproxy configuration :
global
tune.bufsize 10485760
defaults
timeout server-fin 90s
timeout client-fin 90s
backend node2
mode tcp
timeout server 900s
timeout connect 10s
server def 127.0.0.1:3333
frontend fe_api
mode tcp
timeout client 900s
bind :1990
use_backend node2
Ie timeout server-fin shorter than timeout server, the backend server
sends data, this package is left in the cache of haproxy, the backend
server continue sending fin package, haproxy recv fin package. this
time the session information is as follows:
time the session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=08 age=1s calls=3 rq[f=848000h,i=0,an=00h,rx=14m58s,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=,wx=14m58s,ax=] s0=[7,0h,fd=6,ex=]
s1=[7,18h,fd=7,ex=] exp=14m58s
rp has set the CF_SHUTR state, next, the client sends the fin package,
session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=08 age=38s calls=4 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=1m11s,wx=14m21s,ax=] s0=[7,0h,fd=6,ex=]
s1=[9,10h,fd=7,ex=] exp=1m11s
After waiting 90s, session information is as follows:
0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
srv=def ts=04 age=4m11s calls=718074391 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
rp[f=8004c020h,i=0,an=00h,rx=?,wx=10m49s,ax=] s0=[7,0h,fd=6,ex=]
s1=[9,10h,fd=7,ex=] exp=? run(nice=0)
cpu information:
6899 root 20 0 112224 21408 4260 R 100.0 0.7 3:04.96 haproxy
Buffering is set to ensure that there is data in the haproxy buffer, and haproxy
can receive the fin package, set the CF_SHUTR flag, If the CF_SHUTR flag has been
set, The following code does not clear the timeout message, causing cpu 100%:
stream.c:process_stream:
if (unlikely((res->flags & (CF_SHUTR|CF_READ_TIMEOUT)) == CF_READ_TIMEOUT)) {
if (si_b->flags & SI_FL_NOHALF)
si_b->flags |= SI_FL_NOLINGER;
si_shutr(si_b);
}
If you have closed the read, set the read timeout does not make sense.
With or without cf_shutr, read timeout is set:
if (tick_isset(s->be->timeout.serverfin)) {
res->rto = s->be->timeout.serverfin;
res->rex = tick_add(now_ms, res->rto);
}
After discussion on the mailing list, setting half-closed timeouts the
hard way here doesn't make sense. They should be set only at the moment
the shutdown() is performed. It will also solve a special case which was
already reported of some half-closed timeouts not working when the shutw()
is performed directly at the stream-interface layer (no analyser involved).
Since the stream interface layer cannot know the timeout values, we'll have
to store them directly in the stream interface so that they are used upon
shutw(). This patch does this, fixing the problem.
An easier reproducer to validate the fix is to keep the huge buffer and
shorten all timeouts, then call it under tcploop server and client, and
wait 3 seconds to see haproxy run at 100% CPU :
global
tune.bufsize 10485760
listen px
bind :1990
timeout client 90s
timeout server 90s
timeout connect 1s
timeout server-fin 3s
timeout client-fin 3s
server def 127.0.0.1:3333
$ tcploop 3333 L W N20 A P100 F P10000 &
$ tcploop 127.0.0.1:1990 C S10000000 F
"sample-fetch which captures the cipherlist" patch introduce #define
do deal with trace functions only available in openssl > 1.0.2.
Add this #define to libressl and boringssl environment.
Thanks to Piotr Kubaj for postponing and testing with libressl.
SSL_CTX_set_ecdh_auto is declared (when present) with #define. A simple #ifdef
avoid to list all cases of ssllibs. It's a placebo in new ssllibs. It's ok with
openssl 1.0.1, 1.0.2, 1.1.0, libressl and boringssl.
Thanks to Piotr Kubaj for postponing and testing with libressl.
Despite the previous commit working fine on all tests, it's still not
sufficient to completely address the problem. If the connection handler
is called with an event validating an L4 connection but some handshakes
remain (eg: accept-proxy), it will still wake the function up, which
will not report the activity, and will not detect a change once the
handshake it complete so it will not notify the ->wake() handler.
In fact the only reason why the ->wake() handler is still called here
is because after dropping the last handshake, we try to call ->recv()
and ->send() in turn and change the flags in order to detect a data
activity. But if for any reason the data layer is not interested in
reading nor writing, it will not get these events.
A cleaner way to address this is to call the ->wake() handler only
on definitive status changes (shut, error), on real data activity,
and on a complete connection setup, measured as CONNECTED with no
more handshake pending.
It could be argued that the handshake flags have to be made part of
the condition to set CO_FL_CONNECTED but that would currently break
a part of the health checks. Also a handshake could appear at any
moment even after a connection is established so we'd lose the
ability to detect a second end of handshake.
For now the situation around CO_FL_CONNECTED is not clean :
- session_accept() only sets CO_FL_CONNECTED if there's no pending
handshake ;
- conn_fd_handler() will set it once L4 and L6 are complete, which
will do what session_accept() above refrained from doing even if
an accept_proxy handshake is still pending ;
- ssl_sock_infocbk() and ssl_sock_handshake() consider that a
handshake performed with CO_FL_CONNECTED set is a renegociation ;
=> they should instead filter on CO_FL_WAIT_L6_CONN
- all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before
accepting to fetch information
=> they should also get rid of any pending handshake
- smp_fetch_fc_rcvd_proxy() uses !CO_FL_CONNECTED instead of
CO_FL_ACCEPT_PROXY
- health checks (standard and tcp-checks) don't check for HANDSHAKE
and may report a successful check based on CO_FL_CONNECTED while
not yet done (eg: send buffer full on send_proxy).
This patch aims at solving some of these side effects in a backportable
way before this is reworked in depth :
- we need to call ->wake() to report connection success, measure
connection time, notify that the data layer is ready and update
the data layer after activity ; this has to be done either if
we switch from pending {L4,L6}_CONN to nothing with no handshakes
left, or if we notice some handshakes were pending and are now
done.
- we document that CO_FL_CONNECTED exactly means "L4 connection
setup confirmed at least once, L6 connection setup confirmed
at least once or not necessary, all this regardless of any
possibly remaining handshakes or future L6 negociations".
This patch also renames CO_FL_CONN_STATUS to the more explicit
CO_FL_NOTIFY_DATA, and works around the previous flags trick consiting
in setting an impossible combination of flags to notify the data layer,
by simply clearing the current flags.
This fix should be backported to 1.7, 1.6 and 1.5.
When a filter is used, there are 2 channel's analyzers to surround all the
others, flt_start_analyze and flt_end_analyze. This is the good place to acquire
and release resources used by filters, when needed. In addition, the last one is
used to synchronize the both channels, especially for HTTP streams. We must wait
that the analyze is finished for the both channels for an HTTP transaction
before restarting it for the next one.
But this part was buggy, leading to unexpected behaviours. First, depending on
which channel ends first, the request or the response can be switch in a
"forward forever" mode. Then, the HTTP transaction can be cleaned up too early,
while a processing is still in progress on a channel.
To fix the bug, the flag CF_FLT_ANALYZE has been added. It is set on channels in
flt_start_analyze and is kept if at least one filter is still analyzing the
channel. So, we can trigger the channel syncrhonization if this flag was removed
on the both channels. In addition, the flag TX_WAIT_CLEANUP has been added on
the transaction to know if the transaction must be cleaned up or not during
channels syncrhonization. This way, we are sure to reset everything once all the
processings are finished.
This patch should be backported in 1.7.
This adds 3 new commands to the cli :
enable dynamic-cookie backend <backend> that enables dynamic cookies for a
specified backend
disable dynamic-cookie backend <backend> that disables dynamic cookies for a
specified backend
set dynamic-cookie-key backend <backend> that lets one change the dynamic
cookie secret key, for a specified backend.
This adds a new "dynamic" keyword for the cookie option. If set, a cookie
will be generated for each server (assuming one isn't already provided on
the "server" line), from the IP of the server, the TCP port, and a secret
key provided. To provide the secret key, a new keyword as been added,
"dynamic-cookie-key", for backends.
Example :
backend bk_web
balance roundrobin
dynamic-cookie-key "bla"
cookie WEBSRV insert dynamic
server s1 127.0.0.1:80 check
server s2 192.168.56.1:80 check
This is a first step to be able to dynamically add and remove servers,
without modifying the configuration file, and still have all the load
balancers redirect the traffic to the right server.
Provide a way to generate session cookies, based on the IP address of the
server, the TCP port, and a secret key provided.
This may be used to output the JSON schema which describes the output of
show info json and show stats json.
The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.
e.g.:
$ echo "show schema json" | socat /var/run/haproxy.stat stdio | \
python -m json.tool
The implementation does not generate the schema. Some consideration could
be given to integrating the output of the schema with the output of
typed and json info and stats. In particular the types (u32, s64, etc...)
and tags.
A sample verification of show info json and show stats json using
the schema is as follows. It uses the jsonschema python module:
cat > jschema.py << __EOF__
import json
from jsonschema import validate
from jsonschema.validators import Draft3Validator
with open('schema.txt', 'r') as f:
schema = json.load(f)
Draft3Validator.check_schema(schema)
with open('instance.txt', 'r') as f:
instance = json.load(f)
validate(instance, schema, Draft3Validator)
__EOF__
$ echo "show schema json" | socat /var/run/haproxy.stat stdio > schema.txt
$ echo "show info json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py
$ echo "show stats json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py
Signed-off-by: Simon Horman <horms@verge.net.au>
Add a json parameter to show (info|stat) which will output information
in JSON format. A follow-up patch will add a JSON schema which describes
the format of the JSON output of these commands.
The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.
e.g.:
$ echo "show info json" | socat /var/run/haproxy.stat stdio | \
python -m json.tool
STAT_STARTED has bee added in order to track if show output has begun or
not. This is used in order to allow the JSON output routines to only insert
a "," between elements when needed. I would value any feedback on how this
might be done better.
Signed-off-by: Simon Horman <horms@verge.net.au>
This commit removes second argument(msgnum) from http_error_message and
changes http_error_message to use s->txn->status/http_get_status_idx for
mapping status code from 200..504 to HTTP_ERR_200..HTTP_ERR_504(enum).
This is needed for http-request tarpit deny_status commit.
This is like the nbsrv() sample fetch function except that it works as
a converter so it can count the number of available servers of a backend
name retrieved using a sample fetch or an environment variable.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
This option can be used to enable or to disable (prefixing the option line with
the "no" keyword) the sending of fragmented payload to agents. By default, this
option is enabled.
These options can be used to enable or to disable (prefixing the option line
with the "no" keyword), respectively, pipelined and asynchronous exchanged
between HAproxy and agents. By default, pipelining and async options are
enabled.
Now, when a payload is fragmented, the first frame must define the frame type
and the followings must use the special type SPOE_FRM_T_UNSET. This way, it is
easy to know if a fragment is the first one or not. Of course, all frames must
still share the same stream-id and frame-id.
Update SPOA example accordingly.
Now, as for peers, we use an opaque pointer to store information related to the
SPOE filter in appctx structure. These information are now stored in a dedicated
structure (spoe_appctx) and allocated, using a pool, when the applet is created.
This removes the dependency between applets and the SPOE filter and avoids to
eventually inflate the appctx structure.
Now, HAProxy and agents can announce the support for "pipelining" and/or "async"
capabilities during the HELLO handshake. For now, HAProxy always announces the
support of both. In addition, in its HELLO frames. HAproxy adds the "engine-id"
key. It is a uniq string that identify a SPOE engine.
The "pipelining" capability is the ability for a peer to decouple NOTIFY and ACK
frames. This is a symmectical capability. To be used, it must be supported by
HAproxy and agents. Unlike HTTP pipelining, the ACK frames can be send in any
order, but always on the same TCP connection used for the corresponding NOTIFY
frame.
The "async" capability is similar to the pipelining, but here any TCP connection
established between HAProxy and the agent can be used to send ACK frames. if an
agent accepts connections from multiple HAProxy, it can use the "engine-id"
value to group TCP connections.
Bug introduced with "removes SSL_CTX_set_ssl_version call and cleanup CTX
creation": ssl_sock_new_ctx is called before all the bind line is parsed.
The fix consists of separating the use of default_ctx as the initialization
context of the SSL connection via bind_conf->initial_ctx. Initial_ctx contains
all the necessary parameters before performing the selection of the CTX:
default_ctx is processed as others ctx without unnecessary parameters.
This patch used boringssl's callback to analyse CLientHello before any
handshake to extract key signature capabilities.
Certificat with better signature (ECDSA before RSA) is choosed
transparenty, if client can support it. RSA and ECDSA certificates can
be declare in a row (without order). This makes it possible to set
different ssl and filter parameter with crt-list.
The trash buffers are becoming increasingly complex to deal with due to
the code's modularity allowing some functions to be chained and causing
the same chunk buffers to be used multiple times along the chain, possibly
corrupting each other. In fact the trash were designed from scratch for
explicitly not surviving a function call but string manipulation makes
this impossible most of the time while not fullfilling the need for
reliable temporary chunks.
Here we introduce the ability to allocate a temporary trash chunk which
is reserved, so that it will not conflict with the trash chunks other
functions use, and will even support reentrant calls (eg: build_logline).
For this, we create a new pool which is exactly the size of a usual chunk
buffer plus the size of the chunk struct so that these chunks when allocated
are exactly the same size as the ones returned by get_trash_buffer(). These
chunks may fail so the caller must check them, and the caller is also
responsible for freeing them.
The code focuses on minimal changes and ease of reliable backporting
because it will be needed in stable versions in order to support next
patch.
The function dns_init_resolvers() is used to initialize socket used to
send DNS queries.
This patch gives the function the ability to close a socket before
re-opening it.
[wt: this needs to be backported to 1.7 for next fix]
Right now not only we're limited to 8 bits, but it's mentionned nowhere
and the limit was already reached. In addition, pp_opts (proxy protocol
options) were set to 32 bits while only 3 are needed. So let's swap
these two and group them together to avoid leaving two holes in the
structure, saving 64 bits on 64-bit machines.
A recent patch to support BoringSSL caused this warning to appear on
OpenSSL 1.1.0 :
src/ssl_sock.c:3062:4: warning: statement with no effect [-Wunused-value]
It's caused by SSL_CTX_set_ecdh_auto() which is now only a macro testing
that the last argument is zero, and the result is not used here. Let's
just kill it for both versions.
Tested with 0.9.8, 1.0.0, 1.0.1, 1.0.2, 1.1.0. This fix may be backported
to 1.7 if the boringssl fix is as well.
This function was deprecated in 1.1.0 causing this warning :
src/ssl_sock.c:551:3: warning: 'RAND_pseudo_bytes' is deprecated (declared at /opt/openssl-1.1.0/include/openssl/rand.h:47) [-Wdeprecated-declarations]
The man suggests to use RAND_bytes() instead. While the return codes
differ, it turns out that the function was already misused and was
relying on RAND_bytes() return code instead.
The patch was tested on 0.9.8, 1.0.0, 1.0.1, 1.0.2 and 1.1.0.
This fix must be backported to 1.7 and the return code check should
be backported to earlier versions if relevant.
In 1.0.0, this function was replaced with ERR_remove_thread_state().
As of openssl 1.1.0, both are now deprecated and do nothing at all.
Thus we simply make this call do nothing in 1.1.0 to silence the
warning.
The change was tested with 0.9.8, 1.0.0, 1.0.1, 1.0.2 and 1.1.0.
This kills the following warning on 1.1.0 :
src/ssl_sock.c:7266:9: warning: 'ERR_remove_state' is deprecated (declared at /dev/shm/openssl-1.1.0b/include/openssl/err.h:247) [-Wdeprecated-declarations]
This fix should be backported to 1.7.
After the code was ported to support 1.1.0, this one broke on 1.0.0 :
src/shctx.c:406: undefined reference to `SSL_SESSION_set1_id_context'
The function was indeed introduced only in 1.0.1. The build was validated
with 0.9.8, 1.0.0, 1.0.1, 1.0.2 and 1.1.0.
This fix must be backported to 1.7.
Limitations:
. disable force-ssl/tls (need more work)
should be set earlier with SSL_CTX_new (SSL_CTX_set_ssl_version is removed)
. disable generate-certificates (need more work)
introduce SSL_NO_GENERATE_CERTIFICATES to disable generate-certificates.
Cleanup some #ifdef and type related to boringssl env.
crt-list is extend to support ssl configuration. You can now have
such line in crt-list <file>:
mycert.pem [npn h2,http/1.1]
Support include "npn", "alpn", "verify", "ca_file", "crl_file",
"ecdhe", "ciphers" configuration and ssl options.
"crt-base" is also supported to fetch certificates.
The previous version used an O(number of proxies)^2 algo to get the sum of
the number of maxconns of frontends which reference a backend at least once.
This new version adds the frontend's maxconn number to the backend's
struct proxy member 'tot_fe_maxconn' when the backend name is resolved
for switching rules or default_backend statment. At the end, the final
backend's fullconn is computed looping only one time for all on proxies O(n).
The load of a configuration using a large amount of backends (10 thousands)
without configured fullconn was reduced from several minutes to few seconds.
Keeping the address and the port in the same field causes a lot of problems,
specifically on the DNS part where we're forced to cheat on the family to be
able to keep the port. This causes some issues such as some families not being
resolvable anymore.
This patch first moves the service port to a new field "svc_port" so that the
port field is never used anymore in the "addr" field (struct sockaddr_storage).
All call places were adapted (there aren't that many).
fc_rcvd_proxy : boolean
Returns true if the client initiated the connection with a PROXY protocol
header.
A flag is added on the struct connection if a PROXY header is successfully
parsed.
The older 'rsprep' directive allows modification of the status reason.
Extend 'http-response set-status' to take an optional string of the new
status reason.
http-response set-status 418 reason "I'm a coffeepot"
Matching updates in Lua code:
- AppletHTTP.set_status
- HTTP.res_set_status
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
debug_hexdump() prints to the requested output stream (typically stdout
or stderr) an hex dump of the blob passed in argument. This is useful
to help debug binary protocols.
Error captures almost always report a state 26 (MSG_ERROR) making it
very hard to know what the parser was expecting. The reason is that
we have to switch to MSG_ERROR to trigger the dump, and then during
the dump we capture the current state which is already MSG_ERROR. With
this change we now copy the current state into an err_state field that
will be reported as the faulty state.
This patch looks a bit large because the parser doesn't update the
current state until it runs out of data so the current state is never
known when jumping to ther error label! Thus the code had to be updated
to take copies of the current state before switching to MSG_ERROR based
on the switch/case values.
As a bonus, it now shows the current state in human-readable form and
not only in numeric form ; in the past it was not an issue since it was
always 26 (MSG_ERROR).
At least now we can get exploitable invalid request/response reports :
[05/Jan/2017:19:28:57.095] frontend f (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #1
src 127.0.0.1:39894, session #4, session flags 0x00000080
HTTP msg state MSG_RQURI(4), msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00908002, out 0 bytes, total 20 bytes
pending 20 bytes, wrapping at 16384, error at position 5:
00000 GET /\e HTTP/1.0\r\n
00017 \r\n
00019 \n
[05/Jan/2017:19:28:33.827] backend b (#3): invalid response
frontend f (#2), server s1 (#1), event #0
src 127.0.0.1:39718, session #0, session flags 0x000004ce
HTTP msg state MSG_HDR_NAME(17), msg flags 0x00000000, tx flags 0x08300000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x80008002, out 0 bytes, total 59 bytes
pending 59 bytes, wrapping at 16384, error at position 31:
00000 HTTP/1.1 200 OK\r\n
00017 Content-length : 10\r\n
00038 \r\n
00040 0a\r\n
00044 0123456789\r\n
00056 0\r\n
This should be backported to 1.7 and 1.6 at least to help with bug
reports.
It is important to defined analyzers (AN_REQ_* and AN_RES_*) in the same order
they are evaluated in process_stream. This order is really important because
during analyzers evaluation, we run them in the order of the lower bit to the
higher one. This way, when an analyzer adds/removes another one during its
evaluation, we know if it is located before or after it. So, when it adds an
analyzer which is located before it, we can switch to it immediately, even if it
has already been called once but removed since.
With the time, and introduction of new analyzers, this order was broken up. the
main problems come from the filter analyzers. We used values not related with
their evaluation order. Furthermore, we used same values for request and response
analyzers.
So, to fix the bug, filter analyzers have been splitted in 2 distinct lists to
have different analyzers for the request channel than those for the response
channel. And of course, we have moved them to the right place.
Some other analyzers have been reordered to respect the evaluation order:
* AN_REQ_HTTP_TARPIT has been moved just before AN_REQ_SRV_RULES
* AN_REQ_PRST_RDP_COOKIE has been moved just before AN_REQ_STICKING_RULES
* AN_RES_STORE_RULES has been moved just after AN_RES_WAIT_HTTP
Note today we have 29 analyzers, all stored into a 32 bits bitfield. So we can
still add 4 more analyzers before having a problem. A good way to fend off the
problem for a while could be to have a different bitfield for request and
response analyzers.
[wt: all of this must be backported to 1.7, and part of it must be backported
to 1.6 and 1.5]
this adds a support of the newest pcre2 library,
more secure than its older sibling in a cost of a
more complex API.
It works pretty similarly to pcre's part to keep
the overall change smooth, except :
- we define the string class supported at compile time.
- after matching the ovec data is properly sized, althought
we do not take advantage of it here.
- the lack of jit support is treated less 'dramatically'
as pcre2_jit_compile in this case is 'no-op'.
Historically a lot of SSL global settings were stored into the global
struct, but we've reached a point where there are 3 ifdefs in it just
for this, and others in haproxy.c to initialize it.
This patch moves all the private fields to a new struct "global_ssl"
stored in ssl_sock.c. This includes :
char *crt_base;
char *ca_base;
char *listen_default_ciphers;
char *connect_default_ciphers;
int listen_default_ssloptions;
int connect_default_ssloptions;
int tune.sslprivatecache; /* Force to use a private session cache even if nbproc > 1 */
unsigned int tune.ssllifetime; /* SSL session lifetime in seconds */
unsigned int tune.ssl_max_record; /* SSL max record size */
unsigned int tune.ssl_default_dh_param; /* SSL maximum DH parameter size */
int tune.ssl_ctx_cache; /* max number of entries in the ssl_ctx cache. */
The "tune" part was removed (useless here) and the occasional "ssl"
prefixes were removed as well. Thus for example instead of
global.tune.ssl_default_dh_param
we now have :
global_ssl.default_dh_param
A few initializers were present in the constructor, they could be brought
back to the structure declaration.
A few other entries had to stay in global for now. They concern memory
calculationn (used in haproxy.c) and stats (used in stats.c).
The code is already much cleaner now, especially for global.h and haproxy.c
which become readable.
tlskeys_finalize_config() was the only reason for haproxy.c to still
require ifdef and includes for ssl_sock. This one fits perfectly well
in the late initializers so it was changed to be registered with
hap_register_post_check().
There are still a lot of #ifdef USE_OPENSSL in the code (still 43
occurences) because we never know if we can directly access ssl_sock
or not. This patch attacks the problem differently by providing a
way for transport layers to register themselves and for users to
retrieve the pointer. Unregistered transport layers will point to NULL
so it will be easy to check if SSL is registered or not. The mechanism
is very inexpensive as it relies on a two-entries array of pointers,
so the performance will not be affected.
Having it in the ifdef complicates certain operations which require
additional ifdefs just to access a member which could remain zero in
non-ssl cases. Let's move it out, it will not even increase the
struct size on 64-bit machines due to alignment.
This one will be set by the transport layers which want to destroy
a bind_conf. It will typically be used by SSL to release certificates,
CAs and so on.
Instead of hard-coding all SSL preparation in cfgparse.c, we now register
this new function as the transport layer's prepare_bind_conf() and call it
only when definied. This removes some non-obvious SSL-specific code from
cfgparse.c as well as a #ifdef.
This one will be set by the transport layers which want to initialize
a bind_conf. It will typically be used by SSL to load certificates, CAs
and so on.
Most of the SSL functions used to have a proxy argument which was mostly
used to be able to emit clean errors using Alert(). First, many of them
were converted to memprintf() and don't require this pointer anymore.
Second, the rare which still need it also have either a bind_conf argument
or a server argument, both of which carry a pointer to the relevant proxy.
So let's now get rid of it, it needlessly complicates the API and certain
functions already have many arguments.
Historically, all listeners have a pointer to the frontend. But since
the introduction of SSL, we now have an intermediary layer called
bind_conf corresponding to a "bind" line. It makes no sense to have
the frontend on each listener given that it's the same for all
listeners belonging to a same bind_conf. Also certain parts like
SSL can only operate on bind_conf and need the frontend.
This patch fixes this by moving the frontend pointer from the listener
to the bind_conf. The extra indirection is quite cheap given and the
places were this is used are very scarce.
A mistake was made when the socket layer was cut into proto and
transport, the transport was attached to the listener while all
listeners in a single "bind" line always have exactly the same
transport. It doesn't seem obvious but this is the reason why there
are so many #ifdefs USE_OPENSSL in cfgparse : a lot of operations
have to be open-coded because cfgparse only manipulates bind_conf
and we don't have the information of the transport layer here.
Very little code makes use of the transport layer, mainly session
setup and log. These places can afford an extra pointer indirection
(the listener points to the bind_conf). This change is thus very small,
it saves a little bit of memory (8B per listener) and makes the code
more flexible.
This finishes to clean up the zlib-specific parts. It also unbreaks recent
commit b97c6fb ("CLEANUP: compression: use the build options list to report
the algos") which broke USE_ZLIB due to MAXWBITS not being defined anymore
in haproxy.c.
We already had alertif_too_many_args{,_idx}(), but these ones are
specifically designed for use in cfgparse. Outside of it we're
trying to avoid calling Alert() all the time so we need an
equivalent using a pointer to an error message.
These new functions called too_many_args{,_idx)() do exactly this.
They don't take the file name nor the line number which they have
no use for but instead they take an optional pointer to an error
message and the pointer to the error code is optional as well.
With (NULL, NULL) they'll simply check the validity and return a
verdict. They are quite convenient for use in isolated keyword
parsers.
These two new functions as well as the previous ones have all been
exported.
We replaced global.deviceatlas with global_deviceatlas since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in da.c. The file da.h was now
removed because it was only used to load dac.h, which is more easily
loaded directly from da.c. It provides another good example of how to
integrate code in the future without touching the core parts.
We replaced global._51degrees with global_51degrees since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in 51d.c. The file 51d.h was now
removed because it was only used to load 51Degrees.h, which is more easily
loaded from 51d.c. It provides a good example of how to integrate code in
the future without touching the core parts.
We replaced global.wurfl with global_wurfl since there's no need to store
all this into the global section. This removes the last #ifdefs, and now
the code is 100% self-contained in wurfl.c. It provides a good example of
how to integrate code in the future without touching the core parts.
deinit_51degrees() is not called anymore from haproxy.c, removing
2 #ifdefs and one include. The function was made static. The include
file still includes 51Degrees.h which is needed by global.h and 51d.c
so it was not touched beyond this last function removal.
By registering the deinit function we avoid another #ifdef in haproxy.c.
The ha_wurfl_deinit() function has been made static and unexported. Now
proto/wurfl.h is totally empty, the code being self-contained in wurfl.c,
so the useless .h has been removed.
The 3 device detection engines stop at the same place in deinit()
with the usual #ifdefs. Similar to the other functions we can have
some late deinitialization functions. These functions do not return
anything however so we have to use a different type.
Instead of having a #ifdef in the main init code we now use the registered
init functions. Doing so also enables error checking as errors were previously
reported as alerts but ignored. Also they were incorrect as the 'status'
variable was hidden by a second one and was always reporting DA_SYS (which
is apparently an error) in every case including the case where no file was
loaded. The init_deviceatlas() function was unexported since it's not used
outside of this place anymore.
This removes some #ifdefs from the main haproxy code path. Function
init_51degrees() now returns ERR_* instead of exit(1) on error, and
this function was made static and is not exported anymore.
This removes some #ifdefs from the main haproxy code path and enables
error checking. The current code only makes use of warnings even for
some errors that look serious. While this choice is questionnable, it
has been kept as-is, and only the return codes were adapted to ERR_WARN
to at least report that some warnings were emitted. ha_wurfl_init() was
unexported as it's not needed anymore.
Instead of calling the checks directly from the init code, we now
register the start_checks() function to be run at this point. This
also allows to unexport the check init function and to remove one
include from haproxy.c.
There's a significant amount of late initialization calls which are
performed after the point where we exit in check mode. These calls
are used to allocate resource and perform certain slow operations.
Let's have a way to register some functions which need to be called
there instead of having this multitude of #ifdef in the init path.
Many extensions now report some build options to ease debugging, but
this is now being done at the expense of code maintainability. Let's
provide a registration function to do this so that we can start to
remove most of the #ifdefs from haproxy.c (18 currently just for a
single function).
This one now migrates to the general purpose cli.p0 for the ref pointer,
cli.i0 for the dump_all flag and cli.i1 for the dump_keys_index. A few
comments were added.
The applet.h file doesn't depend on openssl anymore. It's worth noting
that the previous dependency was accidental and only used to work because
all files including this one used to have openssl included prior to
loading this file.
This one now migrates to the general purpose cli.p0 for the proxy pointer,
cli.p1 for the server pointer, and cli.i0 for the proxy's instance if only
one has to be dumped.
Most of the keywords don't need to have their own entry in the appctx
union, they just need to reuse some generic pointers like we've been
used to do in the appctx with st{0,1,2}. This patch adds p0, p1, i0, i1
and initializes them to zero before calling the parser. This way some
of the simplest existing keywords will be able to disappear from the
union.
It's worth noting that this is an extension to what was initially
attempted via the "private" member that I removed a few patches ago by
not understanding how it was supposed to be used. Here the fact that
we share the same union will force us to be stricter: the code either
uses the general purpose variables or it uses its own fields but not
both.
The appctx storage became a real mess along the years. It now contains
mostly CLI-specific parts that share the same storage as the "cli" part
which in fact only contains the fields needed to pass an error message
to the caller, and it also has room a few other regular applets which
may become more and more common.
This first patch moves the parts around in the union so that all
standard applet parts are grouped together and the CLI-specific ones
are grouped together. It also adds a few comments to indicate what
certain parts are used for since it's sometimes a bit confusing.
The struct hlua size is 128 bytes. The size is the biggest of all the elements
of the union embedded in the appctx struct. With HTTP2, it is possible that this
appctx struct will be use many times for each connection, so the 128 bytes are
a little bit heavy for the global memory consomation.
This patch replace the embbeded hlua struct by a pointer and an associated memory
pool. Now, the memory for lua is allocated only if it is required.
[wt: the appctx is now down to 160 bytes]
Just like previous patch, this was the only other user of the "private"
field of the applet. It used to store a copy of the keyword's action.
Let's just put it into ->table->action and use it from there. It also
slightly simplifies the code by removing a few pointer to integer casts.
We have very few users of the appctx's private field which was introduced
prior to the split of the CLI. Unfortunately it was not removed after the
end. This commit simply introduces hlua_cli->fcn which is the pointer to
the Lua function that the Lua code used to store in this private pointer.
This problem is already detected here:
8dc7316a6f
Another case raises. Now HAProxy sends a final message (typically
with "http-request deny"). Once the the message is sent, the response
channel flags are not modified.
HAProxy executes a Lua sample-fecthes for building logs, and the
result is ignored because the response flag remains set to the value
HTTP_MSG_RPBEFORE. So the Lua function hlua_check_proto() want to
guarantee the valid state of the buffer and ask for aborting the
request.
The function check_proto() is not the good way to ensure request
consistency. The real question is not "Are the message valid ?", but
"Are the validity of message unchanged ?"
This patch memorize the parser state before entering int the Lua
code, and perform a check when it go out of the Lua code. If the parser
state change for down, the request is aborted because the HTTP message
is degraded.
This patch should be backported in version 1.6 and 1.7
Fixing the build using LibreSSL as OpenSSL implementation.
Currently, LibreSSL 2.4.4 provides the same API of OpenSSL 1.0.1x,
but it redefine the OpenSSL version number as 2.0.x, breaking all
checks with OpenSSL 1.1.x.
The patch solves the issue checking the definition of the symbol
LIBRESSL_VERSION_NUMBER when Openssl 1.1.x features are requested.
When an entity tries to get a buffer, if it cannot be allocted, for example
because the number of buffers which may be allocated per process is limited,
this entity is added in a list (called <buffer_wq>) and wait for an available
buffer.
Historically, the <buffer_wq> list was logically attached to streams because it
were the only entities likely to be added in it. Now, applets can also be
waiting for a free buffer. And with filters, we could imagine to have more other
entities waiting for a buffer. So it make sense to have a generic list.
Anyway, with the current design there is a bug. When an applet failed to get a
buffer, it will wait. But we add the stream attached to the applet in
<buffer_wq>, instead of the applet itself. So when a buffer is available, we
wake up the stream and not the waiting applet. So, it is possible to have
waiting applets and never awakened.
So, now, <buffer_wq> is independant from streams. And we really add the waiting
entity in <buffer_wq>. To be generic, the entity is responsible to define the
callback used to awaken it.
In addition, applets will still request an input buffer when they become
active. But they will not be sleeped anymore if no buffer are available. So this
is the responsibility to the applet I/O handler to check if this buffer is
allocated or not. This way, an applet can decide if this buffer is required or
not and can do additional processing if not.
[wt: backport to 1.7 and 1.6]
A stream can be awakened for different reasons. During its processing, it can be
early stopped if no buffer is available. In this situation, the reason why the
stream was awakened is lost, because we rely on the task state, which is reset
after each processing loop.
In many cases, that's not a big deal. But it can be useful to accumulate the
task states if the stream processing is interrupted, especially if some filters
need to be called.
To be clearer, here is an simple example:
1) A stream is awakened with the reason TASK_WOKEN_MSG.
2) Because no buffer is available, the processing is interrupted, the stream
is back to sleep. And the task state is reset.
3) Some buffers become available, so the stream is awakened with the reason
TASK_WOKEN_RES. At this step, the previous reason (TASK_WOKEN_MSG) is lost.
Now, the task states are saved for a stream and reset only when the stream
processing is not interrupted. The correspoing bitfield represents the pending
events for a stream. And we use this one instead of the task state during the
stream processing.
Note that TASK_WOKEN_TIMER and TASK_WOKEN_RES are always removed because these
events are always handled during the stream processing.
[wt: backport to 1.7 and 1.6]
<run_queue> is used to track the number of task in the run queue and
<run_queue_cur> is a copy used for the reporting purpose. These counters has
been renamed, respectively, <tasks_run_queue> and <tasks_run_queue_cur>. So the
naming is consistent between tasks and applets.
[wt: needed for next fixes, backport to 1.7 and 1.6]
As for tasks, 2 counters has been added to track :
* the total number of applets : nb_applets
* the number of active applets : applets_active_queue
[wt: needed for next fixes, to backport to 1.7 and 1.6]
(http|tcp)-(request|response) action cannot take arguments from the
configuration file. Arguments are useful for executing the action with
a special context.
This patch adds the possibility of passing arguments to an action. It
runs exactly like sample fetches and other Lua wrappers.
Note that this patch implements a 'TODO'.
Commit 5fddab0 ("OPTIM: stream_interface: disable reading when
CF_READ_DONTWAIT is set") improved the connection layer's efficiency
back in 1.5-dev13 by avoiding successive read attempts on an active
FD. But by disabling this on a polled FD, it causes an unpleasant
side effect which is that the FD that was subscribed to polling is
suddenly stopped and may need to be re-enabled once the kernel
starts to slow down on data eviction (eg: saturated server at the
other end, bursty traffic caused by too large maxpollevents).
This behaviour is observable with persistent connections when there
is a large enough connection count so that there's no data in the
early connection and polling is required, because there are then
up to 4 epoll_ctl() calls per request. It's important that the
server is slower than haproxy to cause some delays when reading
response.
The current connection layer as designed in 1.6 with the FD cache
doesn't require this trick anymore, though it still benefits from
it when it saves an FD from being uselessly polled. But compared
to the increased cost of enabling and disabling poll all the time,
it's still better to disable it. In some cases it's possible to
observe a performance increase as high as 30% by avoiding this
epoll_ctl() dance.
In the end we only want to disable it when the FD is speculatively
read and not when it's polled. For this we introduce a new function
__conn_data_done_recv() which is used to indicate that we're done
with recv() and not interested in new attempts. If/when we later
support event-triggered epoll, this function will have to change
a bit to do the same even in the polled case.
A quick test with keep-alive requests run on a dual-core / dual-
thread Atom shows a significant improvement :
single process, 0 bytes :
before: Requests per second: 12243.20 [#/sec] (mean)
after: Requests per second: 13354.54 [#/sec] (mean)
single process, 4k :
before: Requests per second: 9639.81 [#/sec] (mean)
after: Requests per second: 10991.89 [#/sec] (mean)
dual process, 0 bytes (unstable) :
before: Requests per second: 16900-19800 ~ 17600 [#/sec] (mean)
after: Requests per second: 18600-21400 ~ 20500 [#/sec] (mean)
It already returns an empty string when the field is empty, but as a
preventive measure we should do the same when the string itself is a
NULL. While it is not supposed to happen, it will make the code more
resistant against failed allocations and unexpected results.
This fix should be backported to 1.7.
Historically we used to have the stick counters processing put into
session.c which became stream.c. But a big part of it is now in
stick-table.c (eg: converters) but despite this we still have all
the sample fetch functions in stream.c
These parts do not depend on the stream anymore, so let's move the
remaining chunks to stick-table.c and have cleaner files.
What remains in stream.c is everything needed to attach/detach
trackers to the stream and to update the counters while the stream
is being processed.
There's no more reason to keep tcp rules processing inside proto_tcp.c
given that there is nothing in common there except these 3 letters : tcp.
The tcp rules are in fact connection, session and content processing rules.
Let's move them to "tcp-rules" and let them live their life there.
We used to have 3 types of counters with a huge overlap :
- listener counters : stats collected for each bind line
- proxy counters : union of the frontend and backend counters
- server counters : stats collected per server
It happens that quite a good part was common between listeners and
proxies due to the frontend counters being updated at the two locations,
and that similarly the server and proxy counters were overlapping and
being updated together.
This patch cleans this up to propose only two types of counters :
- fe_counters: used by frontends and listeners, related to
incoming connections activity
- be_counters: used by backends and servers, related to outgoing
connections activity
This allowed to remove some non-sensical counters from both parts. For
frontends, the following entries were removed :
cum_lbconn, last_sess, nbpend_max, failed_conns, failed_resp,
retries, redispatches, q_time, c_time, d_time, t_time
For backends, this ones was removed : intercepted_req.
While doing this it was discovered that we used to incorrectly report
intercepted_req for backends in the HTML stats, which was always zero
since it's never updated.
Also it revealed a few inconsistencies (which were not fixed as they
are harmless). For example, backends count connections (cum_conn)
instead of sessions while servers count sessions and not connections.
Over the long term, some extra cleanups may be performed by having
some counters update functions touching both the server and backend
at the same time, as well as both the frontend and listener, to
ensure that all sides have all their stats properly filled. The stats
dump will also be able to factor the dump functions by counter types.
Reinhard Vicinus reported that the reported average response times cannot
be larger than 16s due to the double multiply being performed by
swrate_add() which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448.
One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.
There is in fact another way to avoid the overflow without using larger
integers, it consists in avoiding the multiply using the fact that
x*(n-1)/N = x-(x/N).
Now it becomes possible to store average values as large as 8.4 millions,
which is around 2h18mn.
Interestingly, this improvement also makes the code cheaper to execute
both on 32 and on 64 bit platforms :
Before :
00000000 <swrate_add>:
0: 8b 54 24 04 mov 0x4(%esp),%edx
4: 8b 0a mov (%edx),%ecx
6: 89 c8 mov %ecx,%eax
8: c1 e0 09 shl $0x9,%eax
b: 29 c8 sub %ecx,%eax
d: 8b 4c 24 0c mov 0xc(%esp),%ecx
11: c1 e8 09 shr $0x9,%eax
14: 01 c8 add %ecx,%eax
16: 89 02 mov %eax,(%edx)
After :
00000020 <swrate_add>:
20: 8b 4c 24 04 mov 0x4(%esp),%ecx
24: 8b 44 24 0c mov 0xc(%esp),%eax
28: 8b 11 mov (%ecx),%edx
2a: 01 d0 add %edx,%eax
2c: 81 c2 ff 01 00 00 add $0x1ff,%edx
32: c1 ea 09 shr $0x9,%edx
35: 29 d0 sub %edx,%eax
37: 89 01 mov %eax,(%ecx)
This fix may be backported to 1.6.
When dealing with many proxies, it's hard to spot response errors because
all internet-facing frontends constantly receive attacks. This patch now
makes it possible to demand that only request or response errors are dumped
by appending "request" or "reponse" to the show errors command.
The function log format emit its own error message using Alert(). This
patch replaces this behavior and uses the standard HAProxy error system
(with memprintf).
The benefits are:
- cleaning the log system
- the logformat can ignore the caller (actually the caller must set
a flag designing the caller function).
- Make the usage of the logformat function easy for future components.
Commit 1866d6d ("MEDIUM: ssl: Add support for OpenSSL 1.1.0")
introduced support for openssl 1.1.0 and temporarily broke 0.9.8.
In the end the port was not very hard given that the only cause of
build failures were functions supposedly absent from 0.9.8 that in
fact did exist.
Thus, adding a new #if to move these functions for versions older
than 0.9.8 was enough to fix the trouble. It received very light
testing, basically only an SSL bridge decrypting and re-encrypting
traffic, and checking that everything looks right. That said, the
functions specific to 0.9.8 here compared to 1.0.x are only
SSL_SESSION_set1_id_context(), EVP_PKEY_base_id(), and
X509_PUBKEY_get0_param().
Until now, the function parse_logformat_string() never fails. It
send warnings when it parses bad format, and returns expression in
best effort.
This patch replaces warnings by alert and returns a fail code.
Maybe the warning mode is designed for a compatibility with old
configuration versions. If it is the case, now this compatibility
is broken.
[wt: no, the reason is that an alert must cause a startup failure,
but this will be OK with next patch]
The log-format function parse_logformat_string() takes file and line
for building parsing logs. These two parameters are embedded in the
struct proxy curproxy, which is the current parsing context.
This patch removes these two unused arguments.
Remove export of the fucntion parse_logformat_var_args() and
parse_logformat_var(). These functions are a part of the
logformat parser, and this export is useless.
We get this when Lua is disabled, just a missing include.
In file included from src/queue.c:18:0:
include/proto/server.h:51:39: warning: 'struct appctx' declared inside parameter list [enabled by default]
This way we don't have any more state specific to a given yieldable
command. The other commands should be easier to move as they only
involve a parser.
It really belongs to proto_http.c since it's a dump for HTTP request
and response errors. Note that it's possible that some parts do not
need to be exported anymore since it really is the only place where
errors are manipulated.
The table dump code was a horrible mess, with common parts interleaved
all the way to deal with the various actions (set/clear/show). A few
error messages were still incorrect, as the "set" operation did not
update them so they would still report "unknown action" (now fixed).
The action was now passed as a private argument to the CLI keyword
which itself is copied into the appctx private field. It's just an
int cast to a pointer.
Some minor issues were noticed while doing this, for example when dumping
an entry by key, if the key doesn't exist, nothing is printed, not even
the table's header. It's unclear whether this was intentional but it
doesn't really match what is done for data-based dumps. It was left
unchanged for now so that a later fix can be backported if needed.
Enum entries STAT_CLI_O_TAB, STAT_CLI_O_CLR and STAT_CLI_O_SET were
removed.
Move the "show info" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_info_to_buffer() function
is now static again. Note, we don't need proto_ssl anymore in cli.c.
Move the "show stat" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_stat_to_buffer() function
is now static again.
Move 'show sess' CLI functions to stream.c and use the cli keyword API
to register it on the CLI.
[wt: the choice of stream vs session makes sense because since 1.6 these
really are streams that we're dumping and not sessions anymore]
Several CLI commands require a frontend, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Several CLI commands require a server, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Move map and acl CLI functions to map.c and use the cli keyword API to
register actions on the CLI. Then remove the now unused individual
"add" and "del" keywords.
proto/dumpstats.h has been split in 4 files:
* proto/cli.h contains protypes for the CLI
* proto/stats.h contains prototypes for the stats
* types/cli.h contains definition for the CLI
* types/stats.h contains definition for the stats
These functions will be needed by "show sess" on the CLI, let's make them
globally available. It's important to note that due to the fact that we
still do not set the data and transport layers' names in the structures,
we still have to rely on some exports just to match the pointers. This is
ugly but is preferable to adding many includes since the short-term goal
is to get rid of these tests by having proper names in place.
uint16_t instead of u_int16_t
None ISO fields of struct tm are not present, but
by zeroyfing it, on GNU and BSD systems tm_gmtoff
field will be set.
[wt: moved the memset into each of the date functions]
Setting an FD to -1 when closed isn't the most easily noticeable thing
to do when we're chasing accidental reuse of a stale file descriptor.
Instead set it to that large a negative value that it will overflow the
fdtab and provide an analysable core at the moment the issue happens.
Care was taken to ensure it doesn't overflow nor change sign on 32-bit
machines when multiplied by fdtab, and that it also remains negative for
the various checks that exist. The value equals 0xFDDEADFD which happens
to be easily spotted in a debugger.
The bug described in commit 568743a ("BUG/MEDIUM: stream-int: completely
detach connection on connect error") was not a stream-interface layer bug
but a connection layer bug. There was exactly one place in the code where
we could change a file descriptor's status without first checking whether
it is valid or not, it was in conn_stop_polling(). This one is called when
the polling status is changed after an update, and calls fd_stop_both even
if we had already closed the file descriptor :
1479388298.484240 ->->->->-> conn_fd_handler > conn_cond_update_polling
1479388298.484240 ->->->->->-> conn_cond_update_polling > conn_stop_polling
1479388298.484241 ->->->->->->-> conn_stop_polling > conn_ctrl_ready
1479388298.484241 conn_stop_polling < conn_ctrl_ready
1479388298.484241 ->->->->->->-> conn_stop_polling > fd_stop_both
1479388298.484242 ->->->->->->->-> fd_stop_both > fd_update_cache
1479388298.484242 ->->->->->->->->-> fd_update_cache > fd_release_cache_entry
1479388298.484242 fd_update_cache < fd_release_cache_entry
1479388298.484243 fd_stop_both < fd_update_cache
1479388298.484243 conn_stop_polling < fd_stop_both
1479388298.484243 conn_cond_update_polling < conn_stop_polling
1479388298.484243 conn_fd_handler < conn_cond_update_polling
The problem with the previous fix above is that it break the http_proxy mode
and possibly even some Lua parts and peers to a certain extent ; all outgoing
connections where the target address is initially copied into the outgoing
connection which experience a retry would use a random outgoing address after
the retry because closing and detaching the connection causes the target
address to be lost. This was attempted to be addressed by commit 0857d7a
("BUG/MAJOR: stream: properly mark the server address as unset on connect
retry") but it used to only solve the most visible effect and not the root
cause.
Prior to this fix, it was possible to cause this config to keep CLOSE_WAIT
for as long as it takes to expire a client or server timeout (note the
missing client timeout) :
listen test
mode http
bind :8002
server s1 127.0.0.1:8001
$ tcploop 8001 L0 W N20 A R P100 S:"HTTP/1.1 200 OK\r\nContent-length: 0\r\n\r\n" &
$ tcploop 8002 N200 C T W S:"GET / HTTP/1.0\r\n\r\n" O P10000 K
With this patch, these CLOSE_WAIT properly vanish when both processes leave.
This commit reverts the two fixes above and replaces them with the proper
fix in connection.h. It must be backported to 1.6 and 1.5. Thanks to
Robson Roberto Souza Peixoto for providing very detailed traces showing
some obvious inconsistencies leading to finding this bug.
This pointer will be used for storing private context. With this,
the same executed function can handle more than one keyword. This
will be very useful for creation Lua cli bindings.
The release function is called when the command is terminated (give
back the hand to the prompt) or when the session is broken (timeout
or client closed).
Commit d7c9196 ("MAJOR: filters: Add filters support") removed sample.h
from proto_http.h, but it has become necessary as of commit fd7edd3
("MINOR: Move http method enum from proto_http to sample") in order
to have HTTP_METH_*. Due to this, the "debug/flags" utility doesn't
build anymore.
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
SPOE makes possible the communication with external components to retrieve some
info using an in-house binary protocol, the Stream Processing Offload Protocol
(SPOP). In the long term, its aim is to allow any kind of offloading on the
streams. This first version, besides being experimental, won't do lot of
things. The most important today is to validate the protocol design and lay the
foundations of what will, one day, be a full offload engine for the stream
processing.
So, for now, the SPOE can offload the stream processing before "tcp-request
content", "tcp-response content", "http-request" and "http-response" rules. And
it only supports variables creation/suppression. But, in spite of these limited
features, we can easily imagine to implement a SSO solution, an ip reputation
service or an ip geolocation service.
Internally, the SPOE is implemented as a filter. So, to use it, you must use
following line in a proxy proxy section:
frontend my-front
...
filter spoe [engine <name>] config <file>
...
It uses its own configuration file to keep the HAProxy configuration clean. It
is also a easy way to disable it by commenting out the filter line.
See "doc/SPOE.txt" for all details about the SPOE configuration.
It does the opposite of 'set-var' action/converter. It is really useful for
per-process variables. But, it can be used for any scope.
The lua function 'unset_var' has also been added.
Now it is possible to use variables attached to a process. The scope name is
'proc'. These variables are released only when HAProxy is stopped.
'tune.vars.proc-max-size' directive has been added to confiure the maximum
amount of memory used by "proc" variables. And because memory accounting is
hierachical for variables, memory for "proc" vars includes memory for "sess"
vars.
This function, unsurprisingly, sets a variable value only if it already
exists. In other words, this function will succeed only if the variable was
found somewhere in the configuration during HAProxy startup.
It will be used by SPOE filter. So an agent will be able to set a value only for
existing variables. This prevents an agent to create a very large number of
unused variables to flood HAProxy and exhaust the memory reserved to variables..
This code has been moved from haproxy.c to sample.c and the function
release_sample_expr can now be called from anywhere to release a sample
expression. This function will be used by the stream processing offload engine
(SPOE).
A scope is a section name between square bracket, alone on its line, ie:
[scope-name]
...
The spaces at the beginning and at the end of the line are skipped. Comments at
the end of the line are also skipped.
When a scope is parsed, its name is saved in the global variable
cfg_scope. Initially, cfg_scope is NULL and it remains NULL until a valid scope
line is parsed.
This feature remains unused in the HAProxy configuration file and
undocumented. However, it will be used during SPOE configuration parsing.
This feature will be used by the stream processing offload engine (SPOE) to
parse dedicated configuration files without mixing HAProxy sections with SPOE
sections.
So, here we can back up all sections known by HAProxy, unregister all of them
and add new ones, dedicted to the SPOE. Once the SPOE configuration file parsed,
we can roll back all changes by restoring HAProxy sections.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
It is very common when validating a configuration out of production not to
have access to the same resolvers and to fail on server address resolution,
making it difficult to test a configuration. This option simply appends the
"none" method to the list of address resolution methods for all servers,
ensuring that even if the libc fails to resolve an address, the startup
sequence is not interrupted.
This will allow a server to automatically fall back to an explicit numeric
IP address when all other methods fail. The address is simply specified in
the address list.
This new setting supports a comma-delimited list of methods used to
resolve the server's FQDN to an IP address. Currently supported methods
are "libc" (use the regular libc's resolver) and "last" (use the last
known valid address found in the state file).
The list is implemented in a 32-bit integer, because each init-addr
method only requires 3 bits. The last one must always be SRV_IADDR_END
(0), allowing to store up to 10 methods in a single 32 bit integer.
Note: the doc is provided at the end of this series.
This adds new "hold" timers : nx, refused, timeout, other. This timers
will be used to tell HAProxy to keep an erroneous response as valid for
the corresponding period. For now they're only configured, not enforced.
It will be important to help debugging some DNS resolution issues to
know why a server was marked down, so let's make the function support
a 3rd argument with an indication of the reason. Passing NULL will keep
the message as-is.
This flag has to be set when an IP address resolution fails (either
using libc at start up or using HAProxy's runtime resolver). This will
automatically trigger the administrative status "MAINT", through the
global mask SRV_ADMF_MAINT.
Server addresses are not resolved anymore upon the first pass so that we
don't fail if an address cannot be resolved by the libc. Instead they are
processed all at once after the configuration is fully loaded, by the new
function srv_init_addr(). This function only acts on the server's address
if this address uses an FQDN, which appears in server->hostname.
For now the function does two things, to followup with HAProxy's historical
default behavior:
1. apply server IP address found in server-state file if runtime DNS
resolution is enabled for this server
2. use the DNS resolver provided by the libc
If none of the 2 options above can find an IP address, then an error is
returned.
All of this will be needed to support the new server parameter "init-addr".
For now, the biggest user-visible change is that all server resolution errors
are dumped at once instead of causing a startup failure one by one.
In the last release a lot of the structures have become opaque for an
end user. This means the code using these needs to be changed to use the
proper functions to interact with these structures instead of trying to
manipulate them directly.
This does not fix any deprecations yet that are part of 1.1.0, it only
ensures that it can be compiled against that version and is still
compatible with older ones.
[wt: openssl-0.9.8 doesn't build with it, there are conflicts on certain
function prototypes which we declare as inline here and which are
defined differently there. But openssl-0.9.8 is not supported anymore
so probably it's OK to go without it for now and we'll see later if
some users still need it. Emeric has reviewed this change and didn't
spot anything obvious which requires special care. Let's try it for
real now]
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Right now there is an issue with the way the maintenance flags are
propagated upon startup. They are not propagate, just copied from the
tracked server. This implies that depending on the server's order, some
tracking servers may not be marked down. For example this configuration
does not work as expected :
server s1 1.1.1.1:8000 track s2
server s2 1.1.1.1:8000 track s3
server s3 1.1.1.1:8000 track s4
server s4 wtap:8000 check inter 1s disabled
It results in s1/s2 being up, and s3/s4 being down, while all of them
should be down.
The only clean way to process this is to run through all "root" servers
(those not tracking any other server), and to propagate their state down
to all their trackers. This is the same algorithm used to propagate the
state changes. It has to be done both to compute the IDRAIN flag and the
IMAINT flag. However, doing so requires that tracking servers are not
marked as inherited maintenance anymore while parsing the configuration
(and given that it is wrong, better drop it).
This fix also addresses another side effect of the bug above which is
that the IDRAIN/IMAINT flags are stored in the state files, and if
restored while the tracked server doesn't have the equivalent flag,
the servers may end up in a situation where it's impossible to remove
these flags. For example in the configuration above, after removing
"disabled" on server s4, the other servers would have remained down,
and not anymore with this fix. Similarly, the combination of IMAINT
or IDRAIN with their respective forced modes was not accepted on
reload, which is wrong as well.
This bug has been present at least since 1.5, maybe even 1.4 (it came
with tracking support). The fix needs to be backported there, though
the srv-state parts are irrelevant.
This commit relies on previous patch to silence warnings on startup.
We used to have 7 different character classes, each was 256 bytes long,
resulting in almost 2kB being used in the L1 cache. It's as cheap to
test a bit than to check the byte is not null, so let's store a 7-bit
composite value and check for the respective bits there instead.
The executable is now 4 kB smaller and the performance on small
objects increased by about 1% to 222k requests/second with a config
involving 4 http-request rules including 1 header lookup, one header
replacement, and 2 variable assignments.
There's no reason to use the stream anymore, only the appctx should be
used by a peer. This was a leftover from the migration to appctx and it
caused some confusion, so let's totally drop it now. Note that half of
the patch are just comment updates.
For active servers, this is the sum of the eweights of all active
servers before this one in the backend, and
[srv->cumulative_weight .. srv_cumulative_weight + srv_eweight) is a
space occupied by this server in the range [0 .. lbprm.tot_wact), and
likewise for backup servers with tot_wbck. This allows choosing a
server or a range of servers proportional to their weight, by simple
integer comparison.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
0 will mean no balancing occurs; otherwise it represents the ratio
between the highest-loaded server and the average load, times 100 (i.e.
a value of 150 means a 1.5x ratio), assuming equal weights.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
This commit introduces "tcp-request session" rules. These are very
much like "tcp-request connection" rules except that they're processed
after the handshake, so it is possible to consider SSL information and
addresses rewritten by the proxy protocol header in actions. This is
particularly useful to track proxied sources as this was not possible
before, given that tcp-request content rules are processed after each
HTTP request. Similarly it is possible to assign the proxied source
address or the client's cert to a variable.
This is in order to make integration of tcp-request-session cleaner :
- tcp_exec_req_rules() was renamed tcp_exec_l4_rules()
- LI_O_TCP_RULES was renamed LI_O_TCP_L4_RULES
(LI_O_*'s horrible indent was also fixed and a provision was left
for L5 rules).
These are denied conns. Strangely this wasn't emitted while it used to be
available for a while. It corresponds to the number of connections blocked
by "tcp-request connection reject".
To register a new cli keyword, you need to declare a cli_kw_list
structure in your source file:
static struct cli_kw_list cli_kws = {{ },{
{ { "test", "list", NULL }, "test list : do some tests on the cli", test_parsing, NULL },
{ { NULL }, NULL, NULL, NULL, NULL }
}};
And then register it:
cli_register_kw(&cli_kws);
The first field is an array of 5 elements, where you declare the
keywords combination which will match, it must be ended by a NULL
element.
The second field is used as a usage message, it will appear in the help
of the cli, you can set it to NULL if you don't want to show it, it's a
good idea if you want to overwrite some existing keywords.
The two last fields are callbacks.
The first one is used at parsing time, you can use it to parse the
arguments of your keywords and print small messages. The function must
return 1 in case of a failure, otherwise 0:
#include <proto/dumpstats.h>
static int test_parsing(char **args, struct appctx *appctx)
{
struct chunk out;
if (!*args[2]) {
appctx->ctx.cli.msg = "Error: the 3rd argument is mandatory !";
appctx->st0 = STAT_CLI_PRINT;
return 1;
}
chunk_reset(&trash);
chunk_printf(&trash, "arg[3]: %s\n", args[2]);
chunk_init(&out, NULL, 0);
chunk_dup(&out, &trash);
appctx->ctx.cli.err = out.str;
appctx->st0 = STAT_CLI_PRINT_FREE; /* print and free in the default cli_io_handler */
return 0;
}
The last field is the IO handler callback, it can be set to NULL if you
want to use the default cli_io_handler() otherwise you can write your
own. You can use the private pointer in the appctx if you need to store
a context or some data. stats_dump_sess_to_buffer() is a good example of
IO handler, IO handlers often use the appctx->st2 variable for the state
machine. The handler must return 0 in case it have to be recall later
otherwise 1.
During the stick-table teaching process which occurs at reloading/restart time,
expiration dates of stick-tables entries were not synchronized between peers.
This patch adds two new stick-table messages to provide such a synchronization feature.
As these new messages are not supported by older haproxy peers protocol versions,
this patch increments peers protol version, from 2.0 to 2.1, to help in detecting/supporting
such older peers protocol implementations so that new versions might still be able
to transparently communicate with a newer one.
[wt: technically speaking it would be nice to have this backported into 1.6
as some people who reload often are affected by this design limitation, but
it's not a totally transparent change that may make certain users feel
reluctant to upgrade older versions. Let's let it cook in 1.7 first and
decide later]
With Linux officially introducing SO_REUSEPORT support in 3.9 and
its mainstream adoption we have seen more people running into strange
SO_REUSEPORT related issues (a process management issue turning into
hard to diagnose problems because the kernel load-balances between the
new and an obsolete haproxy instance).
Also some people simply want the guarantee that the bind fails when
the old process is still bound.
This change makes SO_REUSEPORT configurable, introducing the command
line argument "-dR" and the noreuseport configuration directive.
A backport to 1.6 should be considered.
To avoid issues when porting code to some architecture, we need to know
the endianess the structures are currently used.
This patch simply had a short notice before those structures to report
endianess and ease contributor's job.
New DNS response parser function which turn the DNS response from a
network buffer into a DNS structure, much easier for later analysis
by upper layer.
Memory is pre-allocated at start-up in a chunk dedicated to DNS
response store.
New error code to report a wrong number of queries in a DNS response.
struct dns_query_item: describes a DNS query record
struct dns_answer_item: describes a DNS answer record
struct dns_response_packet: describes a DNS response packet
DNS_MIN_RECORD_SIZE: minimal size of a DNS record
DNS_MAX_QUERY_RECORDS: maximum number of query records we allow.
For now, we send one DNS query per request.
DNS_MAX_ANSWER_RECORDS: maximum number of records we may found in a
response
WIP dns: new MAX values
Current implementation of HAProxy's DNS resolution expect only 512 bytes
of data in the response.
Update DNS_MAX_UDP_MESSAGE to match this.
Backport: can be backported to 1.6
This function can replace update_server_addr() where the need to change the
server's port as well as the IP address is required.
It performs some validation before performing each type of change.
Introduction of 3 new server flags to remember if some parameters were set
during configuration parsing.
* SRV_F_CHECKADDR: this server has a check addr configured
* SRV_F_CHECKPORT: this server has a check port configured
* SRV_F_AGENTADDR: this server has a agent addr configured
HAProxy used to deduce port used for health checks when parsing configuration
at startup time.
Because of this way of working, it makes it complicated to change the port at
run time.
The current patch changes this behavior and makes HAProxy to choose the
port used for health checking when preparing the check task itself.
A new type of error is introduced and reported when no port can be found.
There won't be any impact on performance, since the process to find out the
port value is made of a few 'if' statements.
This patch also introduces a new check state CHK_ST_PORT_MISS: this flag is
used to report an error in the case when HAProxy needs to establish a TCP
connection to a server, to perform a health check but no TCP ports can be
found for it.
And last, it also introduces a new stream termination condition:
SF_ERR_CHK_PORT. Purpose of this flag is to report an error in the event when
HAProxy has to run a health check but no port can be found to perform it.
Trie now uses a dataset structure just like Pattern, so this has been
defined in includes/types/global.h for both Pattern and Trie where it
was just Pattern.
In src/51d.c all functions used by the Trie implementation which need a
dataset as an argument now use the global dataset. The
fiftyoneDegreesDestroy method has now been replaced with
fiftyoneDegreesDataSetFree which is common to Pattern and Trie. In
addition, two extra dataset init status' have been added to the switch
statement in init_51degrees.
A few log format fields were declared but never used, so let's drop
them, the whole list is confusing enough already :
LOG_FMT_VARIABLE, LOG_FMT_T, LOG_FMT_CONN, LOG_FMT_QUEUES.
Tq is the time between the instant the connection is accepted and a
complete valid request is received. This time includes the handshake
(SSL / Proxy-Protocol), the idle when the browser does preconnect and
the request reception.
This patch decomposes %Tq in 3 measurements names %Th, %Ti, and %TR
which returns respectively the handshake time, the idle time and the
duration of valid request reception. It also adds %Ta which reports
the request's active time, which is the total time without %Th nor %Ti.
It replaces %Tt as the total time, reporting accurate measurements for
HTTP persistent connections.
%Th is avalaible for TCP and HTTP sessions, %Ti, %TR and %Ta are only
avalaible for HTTP connections.
In addition to this, we have new timestamps %tr, %trg and %trl, which
log the date of start of receipt of the request, respectively in the
default format, in GMT time and in local time (by analogy with %t, %T
and %Tl). All of them are obviously only available for HTTP. These values
are more relevant as they more accurately represent the request date
without being skewed by a browser's preconnect nor a keep-alive idle
time.
The HTTP log format and the CLF log format have been modified to
use %tr, %TR, and %Ta respectively instead of %t, %Tq and %Tt. This
way the default log formats now produce the expected output for users
who don't want to manually fiddle with the log-format directive.
Example with the following log-format :
log-format "%ci:%cp [%tr] %ft %b/%s h=%Th/i=%Ti/R=%TR/w=%Tw/c=%Tc/r=%Tr/a=%Ta/t=%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
The request was sent by hand using "openssl s_client -connect" :
Aug 23 14:43:20 haproxy[25446]: 127.0.0.1:45636 [23/Aug/2016:14:43:20.221] test~ test/test h=6/i=2375/R=261/w=0/c=1/r=0/a=262/t=2643 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
=> 6 ms of SSL handshake, 2375 waiting before sending the first char (in
fact the time to type the first line), 261 ms before the end of the request,
no time spent in queue, 1 ms spend connecting to the server, immediate
response, total active time for this request = 262ms. Total time from accept
to close : 2643 ms.
The timing now decomposes like this :
first request 2nd request
|<-------------------------------->|<-------------- ...
t tr t tr ...
---|----|----|----|----|----|----|----|----|--
: Th Ti TR Tw Tc Tr Td : Ti ...
:<---- Tq ---->: :
:<-------------- Tt -------------->:
:<--------- Ta --------->:
The function ipcpy() simply duplicates the IP address found in one
struct sockaddr_storage into an other struct sockaddr_storage.
It also update the family on the destination structure.
Memory of destination structure must be allocated and cleared by the
caller.
FreeBSD prefers to use IPPROTO_TCP over SOL_TCP, just like it does
with their *_IP counterparts. It's worth noting that there are a few
inconsistencies between SOL_TCP and IPPROTO_TCP in the code, eg on
TCP_QUICKACK. The two values are the same but it's worth applying
what implementations recommend.
No backport is needed, this was uncovered by the recent tcp_info stuff.
Recent commit 93b227d ("MINOR: listener: add the "accept-netscaler-cip"
option to the "bind" keyword") introduced an include of netinet/ip.h
which requires in_systm.h on OpenBSD. No backport is needed.
It is sometimes needed in application server environments to easily tell
if a source is local to the machine or a remote one, without necessarily
knowing all the local addresses (dhcp, vrrp, etc). Similarly in transparent
proxy configurations it is sometimes desired to tell the difference between
local and remote destination addresses.
This patch adds two new sample fetch functions for this :
dst_is_local : boolean
Returns true if the destination address of the incoming connection is local
to the system, or false if the address doesn't exist on the system, meaning
that it was intercepted in transparent mode. It can be useful to apply
certain rules by default to forwarded traffic and other rules to the traffic
targetting the real address of the machine. For example the stats page could
be delivered only on this address, or SSH access could be locally redirected.
Please note that the check involves a few system calls, so it's better to do
it only once per connection.
src_is_local : boolean
Returns true if the source address of the incoming connection is local to the
system, or false if the address doesn't exist on the system, meaning that it
comes from a remote machine. Note that UNIX addresses are considered local.
It can be useful to apply certain access restrictions based on where the
client comes from (eg: require auth or https for remote machines). Please
note that the check involves a few system calls, so it's better to do it only
once per connection.
At some places, smp_dup() is inappropriately called to ensure a modification
is possible while in fact we only need to ensure the sample may be modified
in place. Let's provide smp_is_rw() to check for this capability and
smp_make_rw() to perform the smp_dup() when it is not the case.
Note that smp_is_rw() will also try to add the trailing zero on strings when
needed if possible, to avoid a useless duplication.
These functions ensure that the designated sample is "safe for use",
which means that its size is known, its length is correct regarding its
size, and that strings are properly zero-terminated.
smp_is_safe() only checks (and optionally sets the trailing zero when
needed and possible). smp_make_safe() will call smp_dup() after
smp_is_safe() fails.
Vedran Furac reported a strange problem where the "base" sample fetch
would not always work for tracking purposes.
In fact, it happens that commit bc8c404 ("MAJOR: stick-tables: use sample
types in place of dedicated types") merged in 1.6 exposed a fundamental
bug related to the way samples use chunks as strings. The problem is that
chunks convey a base pointer, a length and an optional size, which may be
zero when unknown or when the chunk is allocated from a read-only location.
The sole purpose of this size is to know whether or not the chunk may be
appended new data. This size cause some semantics issue in the sample,
which has its own SMP_F_CONST flag to indicate read-only contents.
The problem was emphasized by the commit above because it made use of new
calls to smp_dup() to convert a sample to a table key. And since smp_dup()
would only check the SMP_F_CONST flag, it would happily return read-write
samples indicating size=0.
So some tests were added upon smp_dup() return to ensure that the actual
length is smaller than size, but this in fact made things even worse. For
example, the "sni" server directive does some bad stuff on many occasions
because it limits len to size-1 and effectively sets it to -1 and writes
the zero byte before the beginning of the string!
It is therefore obvious that smp_dup() needs to be modified to take this
nature of the chunks into account. It's not enough but is needed. The core
of the problem comes from the fact that smp_dup() is called for 5 distinct
needs which are not always fulfilled :
1) duplicate a sample to keep a copy of it during some operations
2) ensure that the sample is rewritable for a converter like upper()
3) ensure that the sample is terminated with a \0
4) set a correct size on the sample
5) grow the sample in case it was extracted from a partial chunk
Case 1 is not used for now, so we can ignore it. Case 2 indicates the wish
to modify the sample, so its R/O status must be removed if any, but there's
no implied requirement that the chunk becomes larger. Case 3 is used when
the sample has to be made compatible with libc's str* functions. There's no
need to make it R/W nor to duplicate it if it is already correct. Case 4
can happen when the sample's size is required (eg: before performing some
changes that must fit in the buffer). Case 5 is more or less similar but
will happen when the sample by be grown but we want to ensure we're not
bound by the current small size.
So the proposal is to have different functions for various operations. One
will ensure a sample is safe for use with str* functions. Another one will
ensure it may be rewritten in place. And smp_dup() will have to perform an
inconditional duplication to guarantee at least #5 above, and implicitly
all other ones.
This patch only modifies smp_dup() to make the duplication inconditional. It
is enough to fix both the "base" sample fetch and the "sni" server directive,
and all use cases in general though not always optimally. More patches will
follow to address them more optimally and even better than the current
situation (eg: avoid a dup just to add a \0 when possible).
The bug comes from an ambiguous design, so its roots are old. 1.6 is affected
and a backport is needed. In 1.5, the function already existed but was only
used by two converters modifying the data in place, so the bug has no effect
there.
Similar to "escape_chunk", this function tries to prefix all characters
tagged in the <map> with the <escape> character. The specified <string>
contains the input to be escaped.
This enables tracking of sticky counters from current response. The only
difference from "http-request track-sc" is the <key> sample expression
can only make use of samples in response (eg. res.*, status etc.) and
samples below Layer 6.
If an action wrapper stops the processing of the transaction
with a txn_done() function, the return code of the action is
"continue". So the continue can implies the processing of other
like adding headers. However, the HTTP content is flushed and
a segfault occurs.
This patchs add a flag indicating that the Lua code want to
stop the processing, ths flags is forwarded to the haproxy core,
and other actions are ignored.
Must be backported in 1.6
The function txn_done() ends a transaction. It does not make
sense to call this function from a lua sample-fetch wrapper,
because the role of a sample-fetch is not to terminate a
transaction.
This patch modify the role of the fucntion txn_done() if it
is called from a sample-fetch wrapper, now it just ends the
execution of the Lua code like the done() function.
Must be backported in 1.6
Alexander Lebedev reported that the response bit is set on SPARC when
DNS queries are sent. This has been tracked to the endianess issue, so
this patch makes the code portable.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
Alexander Lebedev reported that the DNS parser crashes in 1.6 with a bus
error on Sparc when it receives a response. This is obviously caused by
some alignment issues. The issue can also be reproduced on ARMv5 when
setting /proc/cpu/alignment to 4 (which helps debugging).
Two places cause this crash in turn, the first one is when the IP address
from the packet is compared to the current one, and the second place is
when the address is assigned because an unaligned address is passed to
update_server_addr().
This patch modifies these places to properly use memcpy() and memcmp()
to manipulate the unaligned data.
Nenad Merdanovic found another set of places specific to 1.7 in functions
in_net_ipv4() and in_net_ipv6(), which are used to compare networks. 1.6
has the functions but does not use them. There we perform a temporary copy
to a local variable to fix the problem. The type of the function's argument
is wrong since it's not necessarily aligned, so we change it for a const
void * instead.
This fix must be backported to 1.6. Note that in 1.6 the code is slightly
different, there's no rec[] array, the pointer is used directly from the
buffer.
Originally, tcphdr's source and dest from Linux were used to get the
source and port which led to a build issue on BSD oses.
To avoid side problems related to network then we just use an internal
struct as we need only those two fields.
When realloc fails to allocate memory, the original pointer is not
freed. Sometime people override the original pointer with the pointer
returned by realloc which is NULL in case of failure. This results
in a memory leak because the memory pointed by the original pointer
cannot be freed.
This configures the client-facing connection to receive a NetScaler
Client IP insertion protocol header before any byte is read from the
socket. This is equivalent to having the "accept-netscaler-cip" keyword
on the "bind" line, except that using the TCP rule allows the PROXY
protocol to be accepted only for certain IP address ranges using an ACL.
This is convenient when multiple layers of load balancers are passed
through by traffic coming from public hosts.
When NetScaler application switch is used as L3+ switch, informations
regarding the original IP and TCP headers are lost as a new TCP
connection is created between the NetScaler and the backend server.
NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.
Specifications and documentations from NetScaler:
https://support.citrix.com/article/CTX205670https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/
When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:
- CIP magic number : 4 bytes
Both sender and receiver have to agree on a magic number so that
they both handle the incoming data as a NetScaler Client IP insertion
packet.
- Header length : 4 bytes
Defines the length on the remaining data.
- IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
Contains the header of the last IP packet sent by the client during TCP
handshake.
- TCP header : >= 20 bytes
Contains the header of the last TCP packet sent by the client during TCP
handshake.
The previous dump algorithm was not trying to yield when the buffer is
full, it's not a problem with the TLS_TICKETS_NO which is 3 by default
but it can become one if the buffer size is lowered and if the
TLS_TICKETS_NO is increased.
The index of the latest ticket dumped is now stored to ensure we can
resume the dump after a yield.
The function stats_tlskeys_list() can meet an undefined behavior when
called with appctx->st2 == STAT_ST_LIST, indeed the ref pointer is used
uninitialized.
However this function was using NULL in appctx->ctx.tlskeys.ref as a
flag to dump every tickets from every references. A real flag
appctx->ctx.tlskeys.dump_all is now used for this behavior.
This patch delete the 'ref' variable and use appctx->ctx.tlskeys.ref
directly.
The 'set-src' action was not available for tcp actions The action code
has been converted into a function in proto_tcp.c to be used for both
'http-request' and 'tcp-request connection' actions.
Both http and tcp keywords are registered in proto_tcp.c
htonll()/ntohll() already exist on Solaris 11 with a different declaration,
causing a build error as reported by Jonathan Fisher. They used to exist on
OSX with a #define which allowed us to detect them. It was a bad idea to give
these functions a name subject to conflicts like this. Simply rename them
my_htonll()/my_ntohll() to definitely get rid of the conflict.
This patch must be backported to 1.6.
DNS requests (using the internal resolver) are corrupted since commit
e2f8497716 ("BUG/MINOR: dns: fix DNS header definition").
Fix it by defining the struct in network byte order, while complying
with RFC 2535, section 6.1.
First reported by Eduard Vopicka on discourse.
This must be backported to 1.6 (1.6.5 is affected).
Commit 108b1dd ("MEDIUM: http: configurable http result codes for
http-request deny") introduced in 1.6-dev2 was incomplete. It introduced
a new field "rule_deny_status" into struct http_txn, which is filled only
by actions "http-request deny" and "http-request tarpit". It's then used
in the deny code path to emit the proper error message, but is used
uninitialized when the deny comes from a "reqdeny" rule, causing random
behaviours ranging from returning a 200, an empty response, or crashing
the process. Often upon startup only 200 was returned but after the fields
are used the crash happens. This can be sped up using -dM.
There's no need at all for storing this status in the http_txn struct
anyway since it's used immediately after being set. Let's store it in
a temporary variable instead which is passed as an argument to function
http_req_get_intercept_rule().
As an extra benefit, removing it from struct http_txn reduced the size
of this struct by 8 bytes.
This fix must be backported to 1.6 where the bug was detected. Special
thanks to Falco Schmutz for his detailed report including an exploitable
core and a reproducer.
When compiled with GCC 6, the IP address specified for a frontend was
ignored and HAProxy was listening on all addresses instead. This is
caused by an incomplete copy of a "struct sockaddr_storage".
With the GNU Libc, "struct sockaddr_storage" is defined as this:
struct sockaddr_storage
{
sa_family_t ss_family;
unsigned long int __ss_align;
char __ss_padding[(128 - (2 * sizeof (unsigned long int)))];
};
Doing an aggregate copy (ss1 = ss2) is different than using memcpy():
only members of the aggregate have to be copied. Notably, padding can be
or not be copied. In GCC 6, some optimizations use this fact and if a
"struct sockaddr_storage" contains a "struct sockaddr_in", the port and
the address are part of the padding (between sa_family and __ss_align)
and can be not copied over.
Therefore, we replace any aggregate copy by a memcpy(). There is another
place using the same pattern. We also fix a function receiving a "struct
sockaddr_storage" by copy instead of by reference. Since it only needs a
read-only copy, the function is converted to request a reference.
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:
* channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
called many times for the same analyzer, once at each loop until the
analyzer finishes its processing. This callback is resumable, it returns a
negative value if an error occurs, 0 if it needs to wait, any other value
otherwise.
* channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
means when an analyzer finishes its processing. This callback is NOT
resumable, it returns a negative value if an error occurs, any other value
otherwise.
Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).
The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
Now, to call an analyzer in 'process_stream' function, we should use
FLT_ANALAYZE or ANALYZE macros, depending if this is a filterable analyzer or
not.
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
As suggested by Pavlos, it's too bad that we didn't have a %Td log
format tag given that there are a few mentions of Td corresponding
to the data transmission time already in the doc, so this is now done.
Just like the other specifiers, we report -1 if the connection failed
before reaching the data transmission state.
int list_append_word(struct list *li, const char *str, char **err)
Append a copy of string <str> (inside a wordlist) at the end of
the list <li>.
The caller is responsible for freeing the <err> and <str> copy memory
area using free().
On failure : return 0 and <err> filled with an error message.
Conforming to RFC 2535, section 6.1. This is not an important bug as
those fields don't seem to be set to something else than 0 and to be
checked on answers.
This is the same issue as "show servers state", where the result is incorrect
it the data can't fit in one buffer. The similar fix is applied, to restart
the data processing where it stopped as buffers are sent to the client.
This fix should be backported to haproxy 1.6
It was reported that the unix socket command "show servers state" returned an
empty response while "show servers state <backend>" worked.
In fact, both cases can reproduce the issue. It happens when the response can't
fit in one buffer.
The fix consists in processing the response in several steps, as it is done in
some others commands, by restarting where it was stopped after the buffer is
sent to the client.
This fix should be backported to haproxy 1.6
In 1.4-dev3, commit 31971e5 ("[MEDIUM] add support for infinite forwarding")
made it possible to configure the lower layer to forward data indefinitely
by setting the forward size to CHN_INFINITE_FORWARD (4GB-1). By then larger
chunk sizes were not supported so there was no confusion in the usage of the
function.
Since 1.5 we support 64-bit content-lengths and chunk sizes and the function
has grown to support 64-bit arguments, though it still limits a single pass
to 32-bit quantities (what fit in the channel's to_forward field). The issue
now becomes that a 4GB-1 content-length can be confused with infinite
forwarding (in fact it's 4GB-1+what was already in the buffer). It causes a
visible effect when transferring this exact size because the transfer rate
is lower than with other sizes due in part to the disabling of the Nagle
algorithm on the sendto() call.
In theory with keep-alive it should prevent a second request from being
processed after such a transfer, but since the analysers are still present,
the forwarding analyser properly counts down the remaining size to transfer
and ultimately the transaction gets correctly reset so there is no visible
effect.
Since the root cause of the issue is an API problem (lack of distinction
between a real valid length and a magic value), this patch modifies the API
to have a new dedicated function called channel_forward_forever() to program
a permanent forwarding. The existing function __channel_forward() was modified
to properly take care of the requested sizes and ensure it 1) never overflows
and 2) never reaches CHN_INFINITE_FORWARD by accident.
It is worth noting that the function used to have a bug causing a 2GB
forward to be scheduled if it was called with less data than what is present
in buf->i. Fortunately this bug couldn't be triggered with existing code.
This fix should be backported to 1.6 and 1.5. While it also theorically
affects 1.4, it's better not to backport it there, as the risk of breaking
large object transfers due to significant API differences is high, compared
to the fact that the largest supported objects (4GB-1) are just slower to
transfer.
Unfortunately, commit 169c470 ("BUG/MEDIUM: channel: fix miscalculation of
available buffer space (3rd try)") was still not enough to completely
address the issue. It fell into an integer comparison trap. Contrary to
expectations, chn->to_forward may also have the sign bit set when
forwarding regular data having a large content-length, resulting in
an incomplete check of the result and of the reserve because the with
to_forward very large, to_forward+o could become very small and also
the reserve could become positive again and make channel_recv_limit()
return a negative value.
One way to reproduce this situation is to transfer a large file (> 2GB)
with http-keep-alive or http-server-close, without splicing, and ensure
that the server uses content-length instead of chunks. The transfer
should stall very early after the first buffer has been transferred
to the client.
This fix now properly checks 1) for an overflow caused by summing o and
to_forward, and 2) for o+to_forward being smaller or larger than maxrw
before performing the subtract, so that all sensitive operations are
properly performed on 33-bit arithmetics.
The code was subjected again to a series of tests using inject+httpterm
scanning a wide range of object sizes (+10MB after each new request) :
$ printf "new page 1\nget 127.0.0.1:8002 / s=%%s0m\n" | \
inject64 -o 1 -u 1 -f /dev/stdin
With previous fix, the transfer would suddenly stop when reaching 2GB :
hits ^hits hits/s ^h/s bytes kB/s last errs tout htime sdht ptime
203 1 2 1 216816173354 2710202 3144892 0 0 685.0 0.0 685.0
205 2 2 2 219257283186 2706880 2441109 0 0 679.5 6.5 679.5
205 0 2 0 219257283186 2673836 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2641622 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2610174 0 0 0 0.0 0.0 0.0
Now it's fine even past 4 GB.
Many thanks to Vedran Furac for reporting this issue early with a common
access pattern helping to troubleshoot this.
This fix must be backported to 1.6 and 1.5 where the commit above was
already backported.
This function returns non-zero if the channel is congested with data in
transit waiting for leaving, indicating to the caller that it should wait
for the reserve to be released before starting to process new data in
case it needs the ability to append data. This is meant to be used while
waiting for a clean response buffer before processing a request.
Add opaque data between the filter keyword registrering and the parsing
function. This opaque data allow to use the same parser with differents
registered keywords. The opaque data is used for giving data which mainly
makes difference between the two keywords.
It will be used with Lua keywords registering.
This is very useful in complex architecture systems where HAproxy
is balancing DB connections for example. We want to keep the maxconn
high in order to avoid issues with queueing on the LB level when
there is slowness on another part of the system. Example is a case of
an architecture where each thread opens multiple DB connections, which
if get stuck in queue cause a snowball effect (old connections aren't
closed, new ones cannot be established). These connections are mostly
idle and the DB server has no problem handling thousands of them.
Allowing us to dynamically set maxconn depending on the backend usage
(LA, CPU, memory, etc.) enables us to have high maxconn for situations
like above, but lowering it in case there are real issues where the
backend servers become overloaded (cache issues, DB gets hit hard).
Latest fix 8a32106 ("BUG/MEDIUM: channel: fix miscalculation of available
buffer space (2nd try)") did happen to fix some observable issues but not
all of them in fact, some corner cases still remained and at least one user
reported a busy loop that appeared possible, though not easily reproducible
under experimental conditions.
The remaining issue is that we still consider min(i, to_fwd) as the number
of bytes in transit, but in fact <i> is not relevant here. Indeed, what
matters is that we can read everything we want at once provided that at
the end, <i> cannot be larger than <size-maxrw> (if it was not already).
This is visible in two cases :
- let's have i=o=max/2 and to_fwd=0. Then i+o >= max indicates that the
buffer is already full, while it is not since once <o> is forwarded,
some space remains.
- when to_fwd is much larger than i, it's obvious that we can fill the
buffer.
The only relevant part in fact is o + to_fwd. to_fwd will ensure that at
least this many bytes will be moved from <i> to <o> hence will leave the
buffer, whatever the number of rounds it takes.
Interestingly, the fix applied here ensures that channel_recv_max() will
now equal (size - maxrw - i + to_fwd), which is indeed what remains
available below maxrw after to_fwd bytes are forwarded from i to o and
leave the buffer.
Additionally, the latest fix made it possible to meet an integer overflow
that was not caught by the range test when forwarding in TCP or tunnel
mode due to to_forward being added to an existing value, causing the
buffer size to be limited when it should not have been, resulting in 2
to 3 recv() calls when a single one was enough. The first one was limited
to the unreserved buffer size, the second one to the size of the reserve
minus 1, and the last one to the last byte. Eg with a 2kB buffer :
recvfrom(22, "HTTP/1.1 200\r\nConnection: close\r"..., 1024, 0, NULL, NULL) = 1024
recvfrom(22, "23456789.123456789.123456789.123"..., 1023, 0, NULL, NULL) = 1023
recvfrom(22, "5", 1, 0, NULL, NULL) = 1
This bug is still present in 1.6 and 1.5 so the fix should be backported
there.
The condition to poll for receive as implemented in channel_may_recv()
is still incorrect. If buf->o is null and buf->i is slightly larger than
chn->to_forward and at least as large as buf->size - maxrewrite, then
reading will be disabled. It may slightly delay some data delivery by
having first to forward pending bytes, but may also cause some random
issues with analysers that wait for some data before starting to forward
what they correctly parsed. For instance, a body analyser may be prevented
from seeing the data that only fits in the reserve.
This bug may also prevent an applet's chk_rcv() function from being called
when part of a buffer is released. It is possible (though not verified)
that this participated to some peers frozen session issues some people
have been facing.
This fix should be backported to 1.6 and 1.5 to ensure better coherency
with channel_recv_limit().