The global tuning options right now only concern the polling mechanisms,
and they are not in the global struct itself. It's not very practical to
add other options so let's move them to the global struct and remove
types/polling.h which was not used for anything else.
Three new options have been added when CONFIG_HAP_LINUX_SPLICE is
set :
- splice-request
- splice-response
- splice-auto
They are used to enable splicing per frontend/backend. They are also
supported in defaults sections. The "splice-auto" option is meant to
automatically turn splice on for buffers marked as fast streamers.
This should save quite a bunch of file descriptors.
It was required to add a new "options2" field to the proxy structure
because the original "options" is full.
When global.maxpipes is not set, it is automatically adjusted to
the max of the sums of all frontend's and backend's maxconns for
those which have at least one splice option enabled.
"option transparent" was set and checked on frontends only while it
is purely a backend thing as it replaces the "balance" mode. For this
reason, it did only work in "listen" sections. This change will then
not affect the rare users of this option.
It is now possible to set or clear a cookie during a redirection. This
is useful for logout pages, or for protecting against some DoSes. Check
the documentation for the options supported by the "redirect" keyword.
(cherry-picked from commit 4af993822e880d8c932f4ad6920db4c9242b0981)
If "drop-query" is present on a "redirect" line using the "prefix" mode,
then the returned Location header will be the request URI without the
query-string. This may be used on some login/logout pages, or when it
must be decided to redirect the user to a non-secure server.
(cherry-picked from commit f2d361ccd73aa16538ce767c766362dd8f0a88fd)
was just looking through the source, and noticed this... :)
(cherry picked from commit 63b76be713784f487e8d0c859a85513642fe7bdc)
(cherry picked from commit a801db6c5ea750f93a3795dbb2e70c03e05bbef4)
Using an ACL-related keyword in the defaults section causes a
segfault during parsing because the list headers are not initialized.
We must initialize list headers for default instance and reject
keywords relying on ACLs.
(cherry picked from commit 1c90a6ec20946a713e9c93995a8e91ed3eeb9da4)
(cherry picked from commit eb8131b4e418b838b2d62d991d91d94482ba49de)
There is a problem when an instance is marked "disabled". Its ports are
still bound but will not be unbound upon termination. This causes processes
to accumulate during soft restarts, and might even cause failures to restart
new ones due to the inability to bind to the same port.
The ideal solution would be to bind all ports at the end of the configuration
parsing. An acceptable workaround is to unbind all listeners of disabled
proxies. This is what the current patch does.
(cherry picked from commit a944218e9c1d5ff1aca34609146389dc680335b7)
(cherry picked from commit 8cfebbb82b87345bade831920177077e7d25840a)
In order to achieve more generic accept() code, we can set the request
analysers at the listener registration time. It's better than doing it
during accept(), and allows more code reuse.
The following patch introduced a minor bug :
[MINOR] permit renaming of x-forwarded-for header
If "option forwardfor" is declared in a defaults section, the header name
is never set and we see an empty header name before the value. Also, the
header name was not reset between two defaults sections.
Because I needed it in my situation - here's a quick patch to
allow changing of the "x-forwarded-for" header by using a suboption to
"option forwardfor".
Suboption "header XYZ" will set the header from "x-forwarded-for" to "XYZ".
Default is still "x-forwarded-for" if the header value isn't defined.
Also the suboption 'except a.b.c.d/z' still works on the same line.
So it's now: option forwardfor [except a.b.c.d[/z]] [header XYZ]
When an ACL is referenced at a wrong place (eg: response during request, layer7
during layer4), try to indicate precisely the name and requirements of this ACL.
Only the first faulty ACL is returned. A small change consisting in iterating
that way may improve reports :
cap = ACL_USE_any_unexpected
while ((acl=cond_find_require(cond, cap))) {
warning()
cap &= ~acl->requires;
}
This will report the first ACL of each unsupported type. But doing so will
mangle the error reporting a lot, so we need to rework error reports first.
ACL now hold information on the availability of the data they rely
on. They can indicate which parts of the requests/responses they
require, and the rules parser may now report inconsistencies.
As an example, switching rules are now checked for response-specific
ACLs, though those are not still set. A warning is reported in case
of mismatch. ACLs keyword restrictions will now have to be specifically
set wherever a better control is expected.
The line number where an ACL condition is declared has been added to
the conditions in order to be able to report the faulty line number
during post-loading checks.
It should be stated as a rule that a C file should never
include types/xxx.h when proto/xxx.h exists, as it gives
less exposure to declaration conflicts (one of which was
caught and fixed here) and it complicates the file headers
for nothing.
Only types/global.h, types/capture.h and types/polling.h
have been found to be valid includes from C files.
Some people need to inspect contents of TCP requests before
deciding to forward a connection or not. A future extension
of this demand might consist in selecting a server farm
depending on the protocol detected in the request.
For this reason, a new state CL_STINSPECT has been added on
the client side. It is immediately entered upon accept() if
the statement "tcp-request inspect-delay <xxx>" is found in
the frontend configuration. Haproxy will then wait up to
this amount of time trying to find a matching ACL, and will
either accept or reject the connection depending on the
"tcp-request content <action> {if|unless}" rules, where
<action> is either "accept" or "reject".
Note that it only waits that long if no definitive verdict
can be found earlier. That generally implies calling a fetch()
function which does not have enough information to decode
some contents, or a match() function which only finds the
beginning of what it's looking for.
It is only at the ACL level that partial data may be processed
as such, because we need to distinguish between MISS and FAIL
*before* applying the term negation.
Thus it is enough to add "| ACL_PARTIAL" to the last argument
when calling acl_exec_cond() to indicate that we expect
ACL_PAT_MISS to be returned if some data is missing (for
fetch() or match()). This is the only case we may return
this value. For this reason, the ACL check in process_cli()
has become a lot simpler.
A new ACL "req_len" of type "int" has been added. Right now
it is already possible to drop requests which talk too early
(eg: for SMTP) or which don't talk at all (eg: HTTP/SSL).
Also, the acl fetch() functions have been extended in order
to permit reporting of missing data in case of fetch failure,
using the ACL_TEST_F_MAY_CHANGE flag.
The default behaviour is unchanged, and if no rule matches,
the request is accepted.
As a side effect, all layer 7 fetching functions have been
cleaned up so that they now check for the validity of the
layer 7 pointer before dereferencing it.
Any module which needs configuration keywords may now dynamically
register a keyword in a given section, and associate it with a
configuration parsing function using cfg_register_keywords() from
a constructor function. This makes the configuration parser more
modular because it is not required anymore to touch cfg_parse.c.
Example :
static int parse_global_blah(char **args, int section_type, struct proxy *curpx,
struct proxy *defpx, char *err, int errlen)
{
printf("parsing blah in global section\n");
return 0;
}
static int parse_listen_blah(char **args, int section_type, struct proxy *curpx,
struct proxy *defpx, char *err, int errlen)
{
printf("parsing blah in listen section\n");
if (*args[1]) {
snprintf(err, errlen, "missing arg for listen_blah!!!");
return -1;
}
return 0;
}
static struct cfg_kw_list cfg_kws = {{ },{
{ CFG_GLOBAL, "blah", parse_global_blah },
{ CFG_LISTEN, "blah", parse_listen_blah },
{ 0, NULL, NULL },
}};
__attribute__((constructor))
static void __module_init(void)
{
cfg_register_keywords(&cfg_kws);
}
This is the first attempt at moving all internal parts from
using struct timeval to integer ticks. Those provides simpler
and faster code due to simplified operations, and this change
also saved about 64 bytes per session.
A new header file has been added : include/common/ticks.h.
It is possible that some functions should finally not be inlined
because they're used quite a lot (eg: tick_first, tick_add_ifset
and tick_is_expired). More measurements are required in order to
decide whether this is interesting or not.
Some function and variable names are still subject to change for
a better overall logics.
The first implementation of the monotonic clock did not verify
forward jumps. The consequence is that a fast changing time may
expire a lot of tasks. While it does seem minor, in fact it is
problematic because most machines which boot with a wrong date
are in the past and suddenly see their time jump by several
years in the future.
The solution is to check if we spent more apparent time in
a poller than allowed (with a margin applied). The margin
is currently set to 1000 ms. It should be large enough for
any poll() to complete.
Tests with randomly jumping clock show that the result is quite
accurate (error less than 1 second at every change of more than
one second).
If the system date is set backwards while haproxy is running,
some scheduled events are delayed by the amount of time the
clock went backwards. This is particularly problematic on
systems where the date is set at boot, because it seldom
happens that health-checks do not get sent for a few hours.
Before switching to use clock_gettime() on systems which
provide it, we can at least ensure that the clock is not
going backwards and maintain two clocks : the "date" which
represents what the user wants to see (mostly for logs),
and an internal date stored in "now", used for scheduled
events.
The dequeuing logic was completely wrong. First, a task was assigned
to all servers to process the queue, but this task was never scheduled
and was only woken up on session free. Second, there was no reservation
of server entries when a task was assigned a server. This means that
as long as the task was not connected to the server, its presence was
not accounted for. This was causing trouble when detecting whether or
not a server had reached maxconn. Third, during a redispatch, a session
could lose its place at the server's and get blocked because another
session at the same moment would have stolen the entry. Fourth, the
redispatch option did not work when maxqueue was reached for a server,
and it was not possible to do so without indefinitely hanging a session.
The root cause of all those problems was the lack of pre-reservation of
connections at the server's, and the lack of tracking of servers during
a redispatch. Everything relied on combinations of flags which could
appear similarly in quite distinct situations.
This patch is a major rework but there was no other solution, as the
internal logic was deeply flawed. The resulting code is cleaner, more
understandable, uses less magics and is overall more robust.
As an added bonus, "option redispatch" now works when maxqueue has
been reached on a server.
A new "redirect" keyword adds the ability to send an HTTP 301/302/303
redirection to either an absolute location or to a prefix followed by
the original URI. The redirection is conditionned by ACL rules, so it
becomes very easy to move parts of a site to another site using this.
This work was almost entirely done at Exceliance by Emeric Brun.
A test-case has been added in the tests/ directory.
This patch allows to specify a domain used when inserting a cookie
providing a session stickiness. Usefull for example with wildcard domains.
The patch adds one new variable to the struct proxy: cookiedomain.
When set the domain is appended to a Set-Cookie header.
Domain name is validated using the new invalid_domainchar() function.
It is basically invalid_char() limited to [A-Za-z0-9_.-]. Yes, the test
is too trivial and does not cover all wrong situations, but the main
purpose is to detect most common mistakes, not intentional abuses.
The underscore ("_") character is not RFC-valid but as it is
often (mis)used so I decided to allow it.
The new "leastconn" LB algorithm selects the server which has the
least established or pending connections. The weights are considered,
so that a server with a weight of 20 will get twice as many connections
as the server with a weight of 10.
The algorithm respects the minconn/maxconn settings, as well as the
slowstart since it is a dynamic algorithm. It also correctly supports
backup servers (one and all).
It is generally suited for protocols with long sessions (such as remote
terminals and databases), as it will ensure that upon restart, a server
with no connection will take all new ones until its load is balanced
with others.
A test configuration has been added in order to ease regression testing.
This patch adds a possibility to set a persistent id for a proxy/server.
Now, even if some proxies/servers are inserted/deleted/moved, iids and
sids can be still used reliable.
Some people add servers with tricky names (BACKEND or FRONTEND for example).
So I also added one more field ('type') to distinguish between a
backend (0), frontend (1) and server (2) without complicated logic:
if name==BACKEND and sid==0 then type is BACKEND else type is SERVER,
etc for a FRONTEND. It also makes possible to have one frontend with more
than one IP (a patch coming soon) with independed stats - for example to
differs between remote and local traffic.
Finally, I added documentation about the CSV format.
This patch depends on '[MEDIUM] Implement "track [<backend>/]<server>"'
This patch implements ability to set the current state of one server
by tracking another one. It:
- adds two variables: *tracknext, *tracked to struct server
- implements findserver(), similar to findproxy()
- adds "track" keyword accepting both "proxy/server" and "server" (assuming current proxy)
- verifies if both checks and tracking is not enabled at the same time
- changes set_server_down() to notify tracking server
- creates set_server_up(), set_server_disabled(), set_server_enabled() by
moving the code from process_chk() and adding notifications
- changes stats to show a name of tracked server instead of Chk/Dwn/Dwntime(html)
or by adding new variable (csv)
Changes from the previuos version:
- it is possibile to track independently of the declaration order
- one extra comma bug is fixed
- new condition to check if there is no disable-on-404 inconsistency
The servers now support the "redir" keyword, making it possible to
return a 302 with the specified prefix in front of the request instead
of connecting to them. This is generally useful for multi-site load
balancing but may also serve in order to achieve very high traffic
rate.
The keyword has only been added to the config parser and to structures,
it's not used yet.
This patch adds two new variables: fastinter and downinter.
When server state is:
- non-transitionally UP -> inter (no change)
- transitionally UP (going down), unchecked or transitionally DOWN (going up) -> fastinter
- down -> downinter
It allows to set something like:
server sr6 127.0.51.61:80 cookie s6 check inter 10000 downinter 20000 fastinter 500 fall 3 weight 40
In the above example haproxy uses 10000ms between checks but as soon as
one check fails fastinter (500ms) is used. If server is down
downinter (20000) is used or fastinter (500ms) if one check pass.
Fastinter is also used when haproxy starts.
New "timeout.check" variable was added, if set haproxy uses it as an additional
read timeout, but only after a connection has been already established. I was
thinking about using "timeout.server" here but most people set this
with an addition reserve but still want checks to kick out laggy servers.
Please also note that in most cases check request is much simpler
and faster to handle than normal requests so this timeout should be smaller.
I also changed the timeout used for check connections establishing.
Changes from the previous version:
- use tv_isset() to check if the timeout is set,
- use min("timeout connect", "inter") but only if "timeout check" is set
as this min alone may be to short for full (connect + read) check,
- debug code (fprintf) commented/removed
- documentation
Compile tested only (sorry!) as I'm currently traveling but changes
are rather small and trivial.
Using some Linux kernel patches which add the IP_TRANSPARENT
SOL_IP option , it is possible to bind to a non-local address
on without having resort to any sort of NAT, thus causing no
performance degradation.
This is by far faster and cleaner than the previous CTTPROXY
method. The code has been slightly changed in order to remain
compatible with CTTPROXY as a fallback for the new method when
it does not work.
It is not needed anymore to specify the outgoing source address
for connect, it can remain 0.0.0.0.
Using some Linux kernel patches, it is possible to redirect non-local
traffic to local sockets when IP forwarding is enabled. In order to
enable this option, we introduce the "transparent" option keyword on
the "bind" command line. It will make the socket reachable by remote
sources even if the destination address does not belong to the machine.
In order to offer DoS protection, it may be required to lower the maximum
accepted time to receive a complete HTTP request without affecting the client
timeout. This helps protecting against established connections on which
nothing is sent. The client timeout cannot offer a good protection against
this abuse because it is an inactivity timeout, which means that if the
attacker sends one character every now and then, the timeout will not
trigger. With the HTTP request timeout, no matter what speed the client
types, the request will be aborted if it does not complete in time.
This new parameter makes it possible to override the default
number of consecutive incoming connections which can be
accepted on a socket. By default it is not limited on single
process mode, and limited to 8 in multi-process mode.
Add the "backlog" parameter to frontends, to give hints to
the system about the approximate listen backlog desired size.
In order to protect against SYN flood attacks, one solution is
to increase the system's SYN backlog size. Depending on the
system, sometimes it is just tunable via a system parameter,
sometimes it is not adjustable at all, and sometimes the system
relies on hints given by the application at the time of the
listen() syscall. By default, HAProxy passes the frontend's
maxconn value to the listen() syscall. On systems which can
make use of this value, it can sometimes be useful to be able
to specify a different value, hence this backlog parameter.
This patch adds a possibility to invert most of available options by
introducing the "no" keyword, available as an additional prefix.
If it is found arguments are shifted left and an additional flag (inv)
is set.
It allows to use all options from a current defaults section, except
the selected ones, for example:
-- cut here --
defaults
contimeout 4200
clitimeout 50000
srvtimeout 40000
option contstats
listen stats 1.2.3.4:80
no option contstats
-- cut here --
Currenly inversion works only with the "option" keyword.
The patch also moves last_checks calculation at the end of the readcfgfile()
function and changes "PR_O_FORCE_CLO | PR_O_HTTP_CLOSE" into "PR_O_FORCE_CLO"
in cfg_opts so it is possible to invert forceclose without breaking httpclose
(and vice versa) and to invert tcpsplice in one proxy but to keep a proper
last_checks value when tcpsplice is used in another proxy. Now, the code
checks for PR_O_FORCE_CLO everywhere it checks for PR_O_HTTP_CLOSE.
I also decided to depreciate "redisp" and "redispatch" keywords as it is IMHO
better to use "option redispatch" which can be inverted.
Some useful documentation were added and at the same time I sorted
(alfabetically) all valid options both in the code and the documentation.
The code in haproxy-1.3.13.1 only supports syslogging to an internet
address. The attached patch:
- Adds support for syslogging to a UNIX domain socket (e.g., /dev/log).
If the address field begins with '/' (absolute file path), then
AF_UNIX is used to construct the socket. Otherwise, AF_INET is used.
- Achieves clean single-source build on both Mac OS X and Linux
(sockaddr_in.sin_len and sockaddr_un.sun_len field aren't always present).
For handling sendto() failures in send_log(), it appears that the existing
code is fine (no need to close/recreate socket) for both UDP and UNIX-domain
syslog server. So I left things alone (did not close/recreate socket).
Closing/recreating socket after each failure would also work, but would lead
to increased amount of unnecessary socket creation/destruction if syslog is
temporarily unavailable for some reason (especially for verbose loggers).
Please consider this patch for inclusion into the upstream haproxy codebase.
A new "timeout" keyword replaces old "{con|cli|srv}timeout", and
provides the ability to independantly set the following timeouts :
- client
- tarpit
- queue
- connect
- server
- appsession
Additionally, the "clitimeout", "contimeout" and "srvtimeout" values
are supported but deprecated. No warning is emitted yet when they are
used since the option is very new.
Other timeouts should follow soon now.
Now the connect timeout, tarpit timeout and queue timeout are
distinct. In order to retain compatibility with older versions,
if either queue or tarpit is left unset both in the proxy and
in the default proxy, then it is inherited from the connect
timeout as before.