This function was only used to call chunk_init_len() from another chunk,
which in the end consists in simply assigning the source chunk to the
destination chunk. Let's remove this indirection to make the code clearer.
Anyway it was the only place such a function was used.
This is 1.5-specific. It causes issues with transparent source binding involving
hdr_ip. We must not try to bind() to a foreign address when the family is not set,
and we must set the family when an address is set.
By default we disable TCP quick-acking on HTTP requests so that we
avoid sending a pure ACK immediately followed by the HTTP response.
However, if the client sends an incomplete request in a short packet,
its TCP stack might wait for this packet to be ACKed before sending
the rest of the request, delaying incoming requests by up to 40-200ms.
We can detect this undesirable situation when parsing the request :
- if an incomplete request is received
- if a full request is received and uses chunked encoding or advertises
a content-length larger than the data available in the buffer
In these situations, we re-enable TCP quick-ack if we had previously
disabled it.
Server Name Indication (SNI) is a TLS extension which makes a client
present the name of the server it is connecting to in the client hello.
It allows a transparent proxy to take a decision based on the beginning
of an SSL/TLS stream without deciphering it.
The new ACL "req_ssl_sni" matches the name extracted from the TLS
handshake against a list of names which may be loaded from a file if
needed.
When splice() returns EAGAIN, on old kernels it could be caused by a read
shutdown which was not detected. Due to this behaviour, we had to fall
back to recv(), which in turn says if it's a real EAGAIN or a shutdown.
Since this behaviour was fixed in 2.6.27.14, on more recent kernels we'd
prefer to avoid the fallback to recv() when possible. For this, we set a
variable the first time splice() detects a shutdown, to indicate that it
works. We can then rely on this variable to adjust our behaviour.
Doing this alone increased the overall performance by about 1% on medium
sized objects.
First, it's a waste not to call chk_snd() when spliced data are available,
because the pipe can almost always be transferred into the outgoing socket
buffers. Starting from now, when we splice data in, we immediately try to
send them. This results in less pipes used, and possibly less kernel memory
in use at once.
Second, if a pipe cannot be transferred into the outgoing socket buffers,
it means this buffer is full. There's no point trying again then, as space
will almost never be available, resulting in a useless syscall returning
EAGAIN.
Daniel Rankov reported that "option nolinger" is inefficient on backends.
The reason is that it is set on the file descriptor only, which does not
prevent haproxy from performing a clean shutdown() before closing. We must
set the flag on the stream_interface instead if we want an RST to be emitted
upon active close.
A number of primitives were missing for buffer management, and some
of them were particularly awkward to use. Specifically, the functions
used to compute free space could not always be used depending what was
wrapping in the buffers. Some documentation has been added about how
the buffers work and their properties. Some functions are still missing
such as a buffer replacement which would support wrapping buffers.
This patch settles the 2 loggers limitation.
Loggers are now stored in linked lists.
Using "global log", the global loggers list content is added at the end
of the current proxy list. Each "log" entries are added at the end of
the proxy list.
"no log" flush a logger list.
Up to now, if a cookie value was specified on a server when the proxy was
in TCP mode, it would cause a fatal error. Now we only report a warning,
since the cookie will be ignored. This makes it easier to generate configs
from scripts.
Ludovic Levesque reported and diagnosed an annoying bug. When a server is
configured to track another one and has a slowstart interval set, it's
assigned a minimal weight when the tracked server goes back up but keeps
this weight forever.
This is because the throttling during the warmup phase is only computed
in the health checking function.
After several attempts to resolve the issue, the only real solution is to
split the check processing task in two tasks, one for the checks and one
for the warmup. Each server with a slowstart setting has a warmum task
which is responsible for updating the server's weight after a down to up
transition. The task does not run in othe situations.
In the end, the fix is neither complex nor long and should be backported
to 1.4 since the issue was detected there first.
When reading the code, the "tracked" member of a server makes one
think the server is tracked while it's the opposite, it's a pointer
to the server being tracked. This is particularly true in constructs
such as :
if (srv->tracked) {
Since it's the second time I get caught misunderstanding it, let's
rename it to "track" to avoid the confusion.
Baptiste Assmann reported that a config where a non-existing peers
section is referenced by a stick-table causes a segfault after displaying
the error. This is caused by the freeing of the peers. Setting it to NULL
after displaying the error fixes the issue.
For a long time, the max number of headers was taken as a part of the buffer
size. Since the header size can be configured at runtime, it does not make
much sense anymore.
Nothing was making it necessary to have a static value, so let's turn this into
a tunable with a default value of 101 which equals what was previously used.
It makes no sense to have one pointer to the hdr_idx pool in each proxy
struct since these pools do not depend on the proxy. Let's have a common
pool instead as it is already the case for other types.
By default, pipes are the default size for the system. But sometimes when
using TCP splicing, it can improve performance to increase pipe sizes,
especially if it is suspected that pipes are not filled and that many
calls to splice() are performed. This has an impact on the kernel's
memory footprint, so this must not be changed if impacts are not understood.
Stream interfaces used to distinguish between client and server addresses
because they were previously of different types (sockaddr_storage for the
client, sockaddr_in for the server). This is not the case anymore, and this
distinction is confusing at best and has caused a number of regressions to
be introduced in the process of converting everything to full-ipv6. We can
now remove this and have a much cleaner code.
Nick Chalk reported that a connection to a server which has no port specified
used twice the port number. The reason is that the port number was taken from
the wrong part of the address, the client's destination address was used as the
base port instead of the server's configured address.
Thanks to Nick for his helpful diagnostic.
This patch introduces hdr_len, path_len and url_len for matching these
respective parts lengths against integers. This can be used to detect
abuse or empty headers.
Commit 588bd4 fixed header parsing so that trailing spaces were not part
of the returned string. Unfortunately, if a header only had spaces, the
last spaces were trimmed past the beginning of the value, causing a negative
length to be returned.
A quick code review shows that there should be no impact since the only
places where the vlen is used are either compared to a specific value or
with explicit contents (eg: digits).
This must be backported to 1.4.
These requests are mainly monitor requests, as well as stats requests when
the stats are processed by the frontend. Having this counter helps explain
the difference in number of sessions that is sometimes observed between a
frontend and a backend.
Passing -C <dir> causes haproxy to chdir to <dir> before loading
any file. The argument may be passed anywhere on the command line.
A typical use case is :
$ haproxy -C /etc/haproxy -f global.cfg -f haproxy.cfg
We now measure the work and idle times in order to report the idle
time in the stats. It's expected that we'll be able to use it at
other places later.
*_dom is mostly used for matching Host headers, and host headers may
include port numbers. To avoid having to create multiple rules with
and without :<port-number> in hdr_dom rules, change the *_dom
matching functions to also handle : as a delimiter.
Typically there are rules like this in haproxy.cfg:
acl is_foo hdr_dom(host) www.foo.com
Most clients send "Host: www.foo.com" in their HTTP header, but some
send "Host: www.foo.com:80" (which is allowed), and the above
rule will now work for those clients as well.
[Note: patch was edited before merge, any unexpected bug is mine /willy]
We already had the ability to kill a connection, but it was only
for the checks. Now we can do this for any session, and for this we
add a specific flag "K" to the logs.
The stats socket now allows the admin to disable, enable or shutdown a frontend.
This can be used when a bug is discovered in a configuration and it's desirable
to fix it but the rules in place don't allow to change a running config. Thus it
becomes possible to kill the frontend to release the port and start a new one in
a separate process.
This can also be used to temporarily make haproxy return TCP resets to incoming
requests to pretend the service is not bound. For instance, this may be useful
to quickly flush a very deep SYN backlog.
The frontend check and lookup code was factored with the "set maxconn" usage.
Upon an incoming soft restart request, we first pause all frontends and
peers. If the caller changes its mind and asks us to resume (eg: failed
binding), we must resume all the frontends and peers. Unfortunately the
peers were not resumed.
The code was arranged to avoid code duplication (which used to hide the
issue till now).
If a peers section has no peer named as the local peer, we must destroy
it, otherwise a NULL peer frontend remains in the lists and a segfault
can happen upon a soft restart.
We also now report the missing peer name in order to help troubleshooting.
Peers' frontends must have logging disabled by default, which was not
the case, so logs were randomly emitted upon restart, sometimes causing
a new process to fail to replace the old one.
This made sense a long time ago but since the maxconn is dynamically
computed from the tracking tables, it does not make any sense anymore
and will harm future changes.
The HTML page reports the current process connection rate, and the
"show info" command on the stats socket also reports the conn rate
limit and the max conn rate that was once reached.
Note that the max value can be cleared using "clear counters".
This one enforces a per-process connection rate limit, regardless of what
may be set per frontend. It can be a way to limit the CPU usage of a process
being severely attacked.
The side effect is that the global process connection rate is now measured
for each incoming connection, so it will be possible to report it.
This option permits to change the global maxconn setting within the
limit that was set by the initial value, which is now reported as the
hard maxconn value. This allows to immediately accept more concurrent
connections or to stop accepting new ones until the value passes below
the indicated setting.
The main use of this option is on systems where many haproxy instances
are loaded and admins need to re-adjust resource sharing at run time
to regain a bit of fairness between processes.
The way the unix socket is initialized is awkward. Some of the settings are put
in the sockets itself, other ones in the backend. And more importantly the
global.maxsock value is adjusted so that the stats socket evades the global
maxconn value. This complexifies maxsock computations for nothing, since the
stats socket is not supposed to receive hundreds of concurrent connections when
the global maxconn is very low. What is needed however is to ensure that there
are always connections left for the stats socket even when traffic sockets are
saturated, but this guarantee is not offered anymore by current code.
So as of now, the stats socket is subject to the global maxconn limitation just
as any other socket until a reservation mechanism is implemented.