The channel class permits manipulation of channels. A channel is
an FIFO buffer between the client and the server. This class provides
function to read, write, forward, destroy and alter data between
the input and the ouput of the buffer.
This patch adds the TCP I/O functionnality. The class implemented
provides the same functions than the "lua socket" project. This
make network compatibility with another LUA project. The documentation
is located here:
http://w3.impa.br/~diego/software/luasocket/tcp.html
This version of sleep is based on a coroutine. A sleeping
task is started and a signal is registered. This sleep version
must disapear to be replaced by a version using the internal
timers.
This patch adds the browsing of all the HAProxy fetches and
create associated LUA functions. The HAProxy internal fetches
can be used in LUA trough the class "TXN".
Note that the symbols "-", "+" and "." in the name of current
sample fetch are rewrited as "_" in LUA because ".", "-" and "+"
are operators.
This class of functions permit to access to all the functions
associated with the transaction like http header, HAProxy internal
fetches, etc ...
This patch puts the skeleton of this class. The class will be
enhanced later.
This system permits to execute some lua function after than HAProxy
complete his initialisation. These functions are executed between
the end of the configuration parsing and check and the begin of the
scheduler.
This system permits to send signals between lua tasks. A main lua stack can
register the signal in a coprocess. When the coprocess finish his job, it
send a signal, and the associated task is wakes. If the main lua execution
stack stop (with or without errors), the list or pending signals is purged.
This is the first step of the lua integration. We add the useful
files in the HAProxy project. These files contains the main
includes, the Makefile options and empty initialisation function.
Is is the LUA skeleton.
Later, the processing of some actions needs to be interrupted and resumed
later. This patch permit to resume the actions. The actions that needs
to run with the resume mode are not yet avalaible. It will be soon with
Lua patches. So the code added by this patch is untestable for the moment.
The list of "tcp_exec_req_rules" cannot resme because is called by the
unresumable function "accept_session".
This patch introduces an action keyword registration system for TCP
rulesets similar to what is available for HTTP rulesets. This sytem
will be useful with lua.
These modifications are done for resolving cross-dependent
includes in the upcoming LUA code.
<proto/channel.h> misses <types/channel.h>.
<types/acl.h> doesn't use <types/session.h> because the session
is already declared in the file as undefined pointer.
appsession.c misses <unistd.h> to use "write()".
Declare undefined pointer "struct session" for <types/proxy.h>
and <types/queue.h>. These includes dont need the detail of this
struct.
Some usages of the converters need to know the attached session. The Lua
needs the session for retrieving his running context. This patch adds
the "session" as an argument of the converters prototype.
This behavior is already existing for the "WAIT_HTTP" analyzer,
this patch just extends the system to any analyzer that would
be waked up on response activity.
When the destination IP is dynamically set, we can't use the "target"
to define the proto. This patch ensures that we always use the protocol
associated with the address family. The proto field was removed from
the server and check structs.
Until now, the TLS ticket keys couldn't have been configured and
shared between multiple instances or multiple servers running HAproxy.
The result was that if a request got a TLS ticket from one instance/server
and it hits another one afterwards, it will have to go through the full
SSL handshake and negotation.
This patch enables adding a ticket file to the bind line, which will be
used for all SSL contexts created from that bind line. We can use the
same file on all instances or servers to mitigate this issue and have
consistent TLS tickets assigned. Clients will no longer have to negotiate
every time they change the handling process.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
This option disables SSL session reuse when SSL is used to communicate with
the server. It will force the server to perform a full handshake for every
new connection. It's probably only useful for benchmarking, troubleshooting,
and for paranoid users.
This patch adds a new option which allows configuration of the maximum
log level of messages for which email alerts will be sent.
The default is alert which is more restrictive than
the current code which sends email alerts for all priorities.
That behaviour may be configured using the new configuration
option to set the maximum level to notice or greater.
email-alert level notice
Signed-off-by: Simon Horman <horms@verge.net.au>
On Linux since 2.6.37, it's possible to set the socket timeout for
pending outgoing data, with an accuracy of 1 millisecond. This is
pretty handy to deal with dead connections to clients and or servers.
For now we only implement it on the frontend side (bind line) so
that when a client disappears from the net, we're able to quickly
get rid of its connection and possibly release a server connection.
This can be useful with long-lived connections where an application
level timeout is not suited because long pauses are expected (remote
terminals, connection pools, etc).
Thanks to Thijs Houtenbos and John Eckersberg for the suggestion.
This currently does nothing beyond parsing the configuration
and storing in the proxy as there is no implementation of email alerts.
Signed-off-by: Simon Horman <horms@verge.net.au>
As mailer and mailers structures and allow parsing of
a mailers section into those structures.
These structures will subsequently be freed as it is
not yet possible to use reference them in the configuration.
Signed-off-by: Simon Horman <horms@verge.net.au>
The motivation for this is to make checks more independent of each
other to allow further reuse of their infrastructure.
For nowserver->check and server->agent still always use the same values
for the addr and proto fields so this patch should not introduce any
behavioural changes.
Signed-off-by: Simon Horman <horms@verge.net.au>
This commit implements the following new actions :
- "set-method" rewrites the request method with the result of the
evaluation of format string <fmt>. There should be very few valid reasons
for having to do so as this is more likely to break something than to fix
it.
- "set-path" rewrites the request path with the result of the evaluation of
format string <fmt>. The query string, if any, is left intact. If a
scheme and authority is found before the path, they are left intact as
well. If the request doesn't have a path ("*"), this one is replaced with
the format. This can be used to prepend a directory component in front of
a path for example. See also "set-query" and "set-uri".
Example :
# prepend the host name before the path
http-request set-path /%[hdr(host)]%[path]
- "set-query" rewrites the request's query string which appears after the
first question mark ("?") with the result of the evaluation of format
string <fmt>. The part prior to the question mark is left intact. If the
request doesn't contain a question mark and the new value is not empty,
then one is added at the end of the URI, followed by the new value. If
a question mark was present, it will never be removed even if the value
is empty. This can be used to add or remove parameters from the query
string. See also "set-query" and "set-uri".
Example :
# replace "%3D" with "=" in the query string
http-request set-query %[query,regsub(%3D,=,g)]
- "set-uri" rewrites the request URI with the result of the evaluation of
format string <fmt>. The scheme, authority, path and query string are all
replaced at once. This can be used to rewrite hosts in front of proxies,
or to perform complex modifications to the URI such as moving parts
between the path and the query string. See also "set-path" and
"set-query".
All of them are handled by the same parser and the same exec function,
which is why they're merged all together. For once, instead of adding
even more entries to the huge switch/case, we used the new facility to
register action keywords. A number of the existing ones should probably
move there as well.
This one will be used when a regex is expected. It is automatically
resolved after the parsing and compiled into a regex. Some optional
flags are supported in the type-specific flags that should be set by
the optional arg checker. One is used during the regex compilation :
ARGF_REG_ICASE to ignore case.
These flags are meant to be used by arg checkers to pass out-of-band
information related to some args. A typical use is to indicate how a
regex is expected to be compiled/matched based on other arguments.
These flags are initialized to zero by default and it is up to the args
checkers to set them if needed.
We'll soon need to add new argument types, and we don't use the current
limit of 7 arguments, so let's increase the arg type size to 5 bits and
reduce the arg count to 5 (3 max are used today).
This is in order to add new types. This patch does not change anything
else. Two remaining (harmless) occurrences of a count of 8 instead of 7
were fixed by this patch : empty_arg_list[] and the for() loop counting
args.
An SSL connection takes some memory when it exists and during handshakes.
We measured up to 16kB for an established endpoint, and up to 76 extra kB
during a handshake. The SSL layer stores these values into the global
struct during initialization. If other SSL libs are used, it's easy to
change these values. Anyway they'll only be used as gross estimates in
order to guess the max number of SSL conns that can be established when
memory is constrained and the limit is not set.
We'll need to know the number of SSL connections, their use and their
cost soon. In order to avoid getting tons of ifdefs everywhere, always
export SSL information in the global section. We add two flags to know
whether or not SSL is used in a frontend and in a backend.
This is equivalent to what was done in commit 48936af ("[MINOR] log:
ability to override the syslog tag") but this time instead of doing
this globally, it does it per proxy. The purpose is to be able to use
a separate log tag for various proxies (eg: make it easier to route
log messages depending on the customer).
commit 9ede66b0 introduced an environment variable (HAPROXY_SERVER_CURCONN) that
was supposed to be dynamically updated, but it was set only once, during its
initialization.
Most of the code provided in this previous patch has been rewritten in order to
easily update the environment variables without reallocating memory during each
check.
Now, HAPROXY_SERVER_CURCONN will contain the current number of connections on
the server at the time of the check.
This setting is used to limit memory usage without causing the alloc
failures caused by "-m". Unexpectedly, tests have shown a performance
boost of up to about 18% on HTTP traffic when limiting the number of
buffers to about 10% of the amount of concurrent connections.
tune.buffers.limit <number>
Sets a hard limit on the number of buffers which may be allocated per process.
The default value is zero which means unlimited. The minimum non-zero value
will always be greater than "tune.buffers.reserve" and should ideally always
be about twice as large. Forcing this value can be particularly useful to
limit the amount of memory a process may take, while retaining a sane
behaviour. When this limit is reached, sessions which need a buffer wait for
another one to be released by another session. Since buffers are dynamically
allocated and released, the waiting time is very short and not perceptible
provided that limits remain reasonable. In fact sometimes reducing the limit
may even increase performance by increasing the CPU cache's efficiency. Tests
have shown good results on average HTTP traffic with a limit to 1/10 of the
expected global maxconn setting, which also significantly reduces memory
usage. The memory savings come from the fact that a number of connections
will not allocate 2*tune.bufsize. It is best not to touch this value unless
advised to do so by an haproxy core developer.
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.
This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are be allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the servers's
connection in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection which will permit to release a request buffer.
Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :
- either there's an applet on the initiator side and we really need
the request buffer (since currently the applet is called in the
context of the session)
- or we have a connection and we really need the response buffer (in
order to support building and sending an error message back)
This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.
With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.
The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.
This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
rougly means 12M failures.
During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.
The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).
Below is an example of scenario which used to cause a deadlock previously :
- connection is received
- two buffers are allocated in process_session() then released
- one is allocated when receiving an HTTP request
- the second buffer is allocated then released in process_session()
for request parsing then connection establishment.
- poll() says we can send, so the request buffer is sent and released
- process session gets notified that the connection is now established
and allocates two buffers then releases them
- all other sessions do the same till one cannot get the request buffer
without hitting the margin
- and now the server responds. stream_interface allocates the response
buffer and manages to get it since it's higher priority being for a
response.
- but process_session() cannot allocate the request buffer anymore
=> We could end up with all buffers used by responses so that none may
be allocated for a request in process_session().
When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consists in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WAKE_RES, and can attempt its allocation again.
We decide to try to wake as many waiters as we release buffers so
that if we release 2 and two waiters need only once, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.
It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.
Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
This patch makes it possible to create binds and servers in separate
namespaces. This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).
The setup is something like this:
net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/
The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.
The network namespace setup on the haproxy node is something like this:
= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =
= 8< =
$ cat haproxy.cfg
frontend clients
bind 127.0.0.1:50022 namespace 1 transparent
default_backend scb
backend server
mode tcp
server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =
A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.
A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.
For more documentation please read the documentation included in the patch
itself.
Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
Lasse Birnbaum Jensen reported an issue when agent checks are used at the same
time as standard healthchecks when SSL is enabled on the server side.
The symptom is that agent checks try to communicate in SSL while it should
manage raw data. This happens because the transport layer is shared between all
kind of checks.
To fix the issue, the transport layer is now stored in each check type,
allowing to use SSL healthchecks when required, while an agent check should
always use the raw_sock implementation.
The fix must be backported to 1.5.
Adds global statements 'ssl-default-server-options' and
'ssl-default-bind-options' to force on 'server' and 'bind' lines
some ssl options.
Currently available options are 'no-sslv3', 'no-tlsv10', 'no-tlsv11',
'no-tlsv12', 'force-sslv3', 'force-tlsv10', 'force-tlsv11',
'force-tlsv12', and 'no-tls-tickets'.
Example:
global
ssl-default-server-options no-sslv3
ssl-default-bind-options no-sslv3
Commit bb2e669 ("BUG/MAJOR: http: correctly rewind the request body
after start of forwarding") was incorrect/incomplete. It used to rely on
CF_READ_ATTACHED to stop updating msg->sov once data start to leave the
buffer, but this is unreliable because since commit a6eebb3 ("[BUG]
session: clear BF_READ_ATTACHED before next I/O") merged in 1.5-dev1,
this flag is only ephemeral and is cleared once all analysers have
seen it. So we can start updating msg->sov again each time we pass
through this place with new data. With a sufficiently large amount of
data, it is possible to make msg->sov wrap and validate the if()
condition at the top, causing the buffer to advance by about 2GB and
crash the process.
Note that the offset cannot be controlled by the attacker because it is
a sum of millions of small random sizes depending on how many bytes were
read by the server and how many were left in the buffer, only because
of the speed difference between reading and writing. Also, nothing is
written, the invalid pointer resulting from this operation is only read.
Many thanks to James Dempsey for reporting this bug and to Chris Forbes for
narrowing down the faulty area enough to make its root cause analysable.
This fix must be backported to haproxy 1.5.
Sometimes it would be convenient to have a log counter so that from a log
server we know whether some logs were lost or not. The frontend's log counter
serves exactly this purpose. It's incremented each time a traffic log is
produced. If a log is disabled using "http-request set-log-level silent",
the counter will not be incremented. However, admin logs are not accounted
for. Also, if logs are filtered out before being sent to the server because
of a minimum level set on the log line, the counter will be increased anyway.
The counter is 32-bit, so it will wrap, but that's not an issue considering
that 4 billion logs are rarely in the same file, let alone close to each
other.
Just by moving a few struct members around, we can avoid 32-bit holes
between 64-bit pointers and shrink the struct size by 48 bytes. That's
not huge but that's for free, so let's do it.