A suboptimal behaviour was appearing quite often with sepoll. When a
speculative write failed after a connect(), the socket was added to
the poll list using epoll_ctl(ADD). Then when epoll_wait() returned a
write event, the send() was performed and write event disabled, causing
it to get back to the spec list in order to be disabled later. But if
some new accept() did succeed in the same run, then fd_created was not
null, causing a new run of the spec list to happen. This run would then
detect the old event in STOP state and would remove it from the poll
list using epoll_ctl(DEL).
After this, process_session() enables reading on the FD, attempting
an speculative recv() which fails then adds it again using epoll_ctl(ADD)
to do it again. So the total sequence of syscalls looked like this :
connect(fd) = EAGAIN
send(fd) = EAGAIN
epoll_ctl(ADD(fd:OUT))
epoll_wait() = fd:OUT
send(fd) > 0
epoll_ctl(DEL(fd))
recv(fd) = EAGAIN
epoll_ctl(ADD(fd:IN))
recv(fd) > 0
In order to fix this stupid situation, we must compute the epoll_ctl()
parameters at the last moment, just before doing epoll_wait(). This is
what was done except that the spec events were processed just before doing
that without leaving time for the tasks to adjust the FDs if needed. This
is also the reason we have the re_poll_once label to try to catch new
events in case of a successful accept().
The new solution consists in doing the opposite :
- compute epoll_ctl()
- call epoll_wait()
- call spec events
This significantly reduces the number of iterations on the spec events
and avoids a huge number of epoll_ctl() ping/pongs. The new sequence
above simply becomes :
connect(fd) = EAGAIN
send(fd) = EAGAIN
epoll_ctl(ADD(fd:OUT))
epoll_wait() = fd:OUT
send(fd) > 0
epoll_ctl(MOD(fd:IN))
recv(fd) > 0
Also, there is no need to re-run the spec events after an accept() as
it will automatically be detected in the spec list after a return from
polled events.
The gains are important, with up to 4.5% global performance increase in
connection rate on HTTP with small objects. The code is less tricky and
does not need anymore to skip epoll_wait() every other call, nor to
track the number of FDs newly created.
Commit 5e205524 was a bit overzealous by inconditionally enabling
quick ack when a request is not yet in the buffer, because it also
does so when nothing has been received yet, causing a useless ACK
to be emitted.
Improve the situation by doing this only if the input buffer is
empty (indicating that nothing was sent by the client).
In case of keep-alive, an empty buffer means we already have a
response in flight which will serve as an ACK.
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.
Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
Tunnel timeouts are used when TCP connections are forwarded, or
when forwarding upgraded HTTP connections (WebSocket) as well as
CONNECT requests to proxies.
This timeout allows long-lived sessions to be supported without
having to set large timeouts to normal requests.
In sess_establish, once we've prepared everythin, we can call the socket layer
init function. We pass an argument for targets which have one (eg: servers). At
the moment, the existing socket layers don't have init functions, but SSL will
need one.
Up to now, if an outgoing connection had no data to send, the socket layer
had to perform a connect() again to check for establishment. This is not
acceptable for SSL, and will cause problems with socketpair(). Some socket
layers will also need an initializer before sending data (eg: SSL).
The solution consists in moving the connect() test to the protocol layer
(eg: TCP) and to make it hold the fd->write callback until the connection
is validated. At this point, it will switch the write callback to the
socket layer's write function. In fact we need to hold both read and write
callbacks to ensure the socket layer is never called before being initialized.
This intermediate callback is used only if there is a socket init function
or if there are no data to send.
The socket layer does not have any code to check for connection establishment
anymore, which makes sense.
Instead of hard-coding sock_raw in connect_server(), we set this socket
operation at config parsing time. Right now, only servers and peers have
it. Proxies are still hard-coded as sock_raw. This will be needed for
future work on SSL which requires a different socket layer.
Similarly to the previous patch, we don't need the socket-layer functions
outside of stream_interface. They could even move to a file dedicated to
applets, though that does not seem particularly useful at the moment.
Commit e164e7a removed get_src/get_dst setting in the stream interfaces but
forgot to set it in proto_tcp. Get the feature back because we need it for
logging, transparent mode, ACLs etc... We now rely on the stream interface
direction to know what syscall to use.
One benefit of doing it this way is that we don't use getsockopt() anymore
on outgoing stream interfaces nor on UNIX sockets.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
Cyril Bont reported that passing an invalid userlist name to
http_auth_group() caused haproxy to crash at load. This was due
to an attempt to use the unresolved userlist pointer later to
resolve auth groups since we report many errors before leaving
now.
This issue does not exist in earlier versions since they immediately
abort on the first error, so no backport is needed.
http_auth and http_auth_group used to share the same fetch function, while
they're doing very different things. The first one only checks whether the
supplied credentials are valid wrt a userlist while the second not only
checks this but also checks group ownership from a list of patterns.
Recent acl/pattern merge caused a simplification here by which the fetch
function would always return a boolean, so the group match was always fine
if the user:password was valid, regardless of the patterns provided with
the ACL.
The proper solution consists in splitting the function in two, depending
on what is desired.
It's also worth noting that check_user() would probably be split, one to
check user:password, and the other one to check for group ownership for
an already valid user:password combination. At this point it is not certain
if the group mask is still useful or not considering that the passwd check
is always made.
This bug was reported and diagnosed by Cyril Bont. It first appeared
in 1.5-dev9 so it does not need any backporting.
I introduced a regression in commit 19979e176e while reworking the admin
actions results.
"Unexpected result" was displayed even if the action was applied due to a
misplaced initialization. This small patch should fix it.
Note: no need to backport.
There is no more reason for the realign function being HTTP specific,
it only operates on a buffer now. Let's move it to buffers.c instead.
It's likely that buffer_bounce_realign is broken (not used), this will
have to be inspected. The function is worth rewriting as it can be
cheaper than buffer_slow_realign() to realign large wrapping buffers.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
Last memory poisonning patch immediately made this issue appear.
The unique_id field is released but not properly initialized. The
feature was introduced very recently, no backport is needed.
From time to time, some bugs are discovered that are caused by non-initialized
memory areas. It happens that most platforms return a zero-filled area upon
first malloc() thus hiding potential bugs. This patch also replaces malloc()
in pools with calloc() to ensure that all platforms exhibit the same behaviour
upon startup. In order to catch these bugs more easily, add a -dM command line
flag to enable memory poisonning. Optionally, passing -dM<byte> forces the
poisonning byte to <byte>.
Commit b22e55bc introduced send_proxy_ofs but forgot to initialize it,
which remained unnoticed since it's always at the same place in the
stream interface. On a machine with dirty RAM returned by malloc(),
some responses were holding a PROXY header, which normally is not
possible.
The problem goes away after properly initializing the field upon each
new session_accept().
This fix does not need to be backported except if any code makes use of
a backport of this feature.
A number of important information were missing from the error captures, so
let's improve them. Now we also log source port, session flags, transaction
flags, message flags, pending output bytes, expected buffer wrapping position,
total bytes transferred, message chunk length, and message body length.
As such, the output format has slightly evolved and the source address moved
to the third line :
[08/May/2012:11:14:36.341] frontend echo (#1): invalid request
backend echo (#1), server <NONE> (#-1), event #1
src 127.0.0.1:40616, session #4, session flags 0x00000000
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00909002, out 0 bytes, total 28 bytes
pending 28 bytes, wrapping at 8030, error at position 7:
00000 GET / /?t=20000 HTTP/1.1\r\n
00026 \r\n
[08/May/2012:11:13:13.426] backend echo (#1) : invalid response
frontend echo (#1), server local (#1), event #0
src 127.0.0.1:40615, session #1, session flags 0x0000044e
HTTP msg state 32, msg flags 0x0000000e, tx flags 0x08200000
HTTP chunk len 0 bytes, HTTP body len 20 bytes
buffer flags 0x00008002, out 81 bytes, total 92 bytes
pending 11 bytes, wrapping at 7949, error at position 9:
00000 Foo: bar\r\r\n
Since the beginning of buffer&msg changes, the error position (err_pos)
had not completely been converted and some offsets still appear wrong.
Now we ensure that everywhere msg->err_pos is relative to buf->p and
we always report buf->i bytes starting at buf->p in all error captures,
which ensures that err_pos is there.
This is not exactly a bug and is specific to latest changes so no backport
is needed.
Commit 81f2fb added support for wrapping buffer captures, but unfortunately
the code used to perform two memcpy() over the same destination, causing a
loss of the start of the buffer rendering some error snapshots unusable.
This bug is present in 1.4 too and must be backported.
These methods have been superseded by src and dst which support
multiple families. There is no point keeping them since they appeared
in a development version anyway.
For configurations using "src6", please use "src" instead. For "dst6",
use "dst" instead.
The previous sockstream_accept() function uses nothing from sockstream, and
is totally irrelevant to stream interfaces. Move this to the protocols.c
file which handles listeners and protocols, and call it listener_accept().
It now makes much more sense that the code dealing with listen() also handles
accept() and passes it to upper layers.
These operators are used regardless of the socket protocol family. Move
them to a "sock_ops" struct. ->read and ->write have been moved there too
as they have no reason to remain at the protocol level.
Make use of the new IPv6 pattern type so that acl_match_ip() knows how to
compare pattern and sample.
IPv6 may be entered in their usual form, with or without a netmask appended.
Only bit counts are accepted for IPv6 netmasks. In order to avoid any risk of
trouble with randomly resolved IP addresses, host names are never allowed in
IPv6 patterns.
HAProxy is also able to match IPv4 addresses with IPv6 addresses in the
following situations :
- tested address is IPv4, pattern address is IPv4, the match applies
in IPv4 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv6, the match applies
in IPv6 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv4, the match applies in IPv4
using the pattern's mask if the IPv6 address matches with 2002:IPV4::,
::IPV4 or ::ffff:IPV4, otherwise it fails.
- tested address is IPv4, pattern address is IPv6, the IPv4 address is first
converted to IPv6 by prefixing ::ffff: in front of it, then the match is
applied in IPv6 using the supplied IPv6 mask.
We cannot currently match IPv6 addresses in ACL simply because we don't
support types on the patterns. Let's introduce this notion. For now, we
rely on the SMP_TYPES though it doesn't seem like it will last forever
given that some types are not present there (eg: regex, meth). Still it
should be enough to support mixed matchings for most types.
We use the special impossible value SMP_TYPES for types that don't exist
in the SMP_T_* space.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
It's important to report the faulty argument position and to distinguish
between empty arguments and wrong ones.
Integers were not properly tested either, now their parsing has been improved
to report use of incorrect characters.
All parsing errors were known but impossible to return. Now by making use
of memprintf(), we're able to build meaningful error messages that the
caller can display.
It's easy to merge pattern and ACL fetches of cookies. It allows us
to remove two distinct fetch functions. The new function internally
uses an occurrence number to serve both purposes, but it didn't appear
worth exposing it outside so there is no keyword argument to set it.
However one of the benefits is that the "cookie" fetch for stick tables
now automatically adapts to requests and responses, so there is no more
need for set-cookie().
HTTP header fetch is now done using smp_fetch_hdr() for both ACLs and
patterns. This one also supports an occurrence number, making it possible
to specify explicit occurrences for ACLs and patterns.
This way, fetch functions will be able to tell if they're called for a single
request or as part of a loop. This is important for instance when we use
hdr(foo), because in an ACL this means that all hdr(foo) occurrences must
be checked while in a pattern it means only one of them (eg: last one).
pattern_fetch_rdp_cookie() is useless now since it only used to add controls
on top of smp_fetch_rdp_cookie() which have now been integrated into the
pattern subsystem. Let's remove it.
Pattern fetch functions currently check for unstable data and return 0
when SMP_F_MAY_CHANGE is set. Instead of doing this everywhere and having
to support specific fetch functions, better do that in pattern_process()
which is the one interested in having stable data.