The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.
The support is implicit on Linux 2.6.28 and above.
This is done by passing the default value to SSLCACHESIZE in sessions.
User can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions as openssl internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
Before connections were introduced, it was possible to connect an
external task to a stream interface. However it was left as an
exercise for the brave implementer to find how that ought to be
done.
The feature was broken since the introduction of connections and
was never fixed since due to lack of users. Better remove this dead
code now.
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
Some servers are not totally HTTP-compliant when it comes to parsing the
Connection header. This is particularly true with WebSocket where it happens
from time to time that a server doesn't support having a "close" token along
with the "Upgrade" token in the Connection header. This broken behaviour has
also been noticed on some clients though the problem is less frequent on the
response path.
Sometimes the workaround consists in enabling "option http-pretend-keepalive"
to leave the request Connection header untouched, but this is not always the
most convenient solution. This patch introduces a new solution : haproxy now
also looks for the "Upgrade" token in the Connection header and if it finds
it, then it refrains from adding any other token to the Connection header
(though "keep-alive" and "close" may still be removed if found). The same is
done for the response headers.
This way, WebSocket much with less changes even when facing non-compliant
clients or servers. At least it fixes the DISCONNECT issue that was seen
on the websocket.org test.
Note that haproxy does not change its internal mode, it just refrains from
adding new tokens to the connection header.
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
ev_sepoll already provides everything needed to manage FD events
by only manipulating the speculative I/O list. Nothing there is
sepoll-specific so move all this to fd.
This patch adds input and output rate calcutation on the HTTP compresion
feature.
Compression can be limited with a maximum rate value in kilobytes per
second. The rate is set with the global 'maxcomprate' option. You can
change this value dynamicaly with 'set rate-limit http-compression
global' on the UNIX socket.
At the moment sepoll is not 100% event-driven, because a call to fd_set()
on an event which is already being polled will not change its state.
This causes issues with OpenSSL because if some I/O processing is interrupted
after clearing the I/O event (eg: read all data from a socket, can't put it
all into the buffer), then there is no way to call the SSL_read() again once
the buffer releases some space.
The only real solution is to go 100% event-driven. The principle is to use
the spec list as an event cache and that each time an I/O event is reported
by epoll_wait(), this event is automatically scheduled for addition to the
spec list for future calls until the consumer explicitly asks for polling
or stopping.
Doing this is a bit tricky because sepoll used to provide a substantial
number of optimizations such as event merging. These optimizations have
been maintained : a dedicated update list is affected when events change,
but not the event list, so that updates may cancel themselves without any
side effect such as displacing events. A specific case was considered for
handling newly created FDs as soon as they are detected from within the
poll loop. This ensures that their read or write operation will always be
attempted as soon as possible, thus reducing the number of poll loops and
process_session wakeups. This is especially true for newly accepted fds
which immediately perform their first recv() call.
Two new flags were added to the fdtab[] struct to tag the fact that a file
descriptor already exists in the update list. One flag indicates that a
file descriptor is new and has just been created (fdtab[].new) and the other
one indicates that a file descriptor is already referenced by the update list
(fdtab[].updated). Even if the FD state changes during operations or if the
fd is closed and replaced, it's not an issue because the update flag remains
and is easily spotted during list walks. The flag must absolutely reflect the
presence of the fd in the update list in order to avoid overflowing the update
list with more events than there are distinct fds.
Note that this change also recovers the small performance loss introduced
by its connection counter-part and goes even beyond.
This is the first step of a series of changes aiming at making the
polling totally event-driven. This first change consists in only
remembering at the connection level whether an FD was enabled or not,
regardless of the fact it was being polled or cached. From now on, an
EAGAIN will always be considered as a change so that the pollers are
able to manage a cache and to flush it based on such events. One of
the noticeable effect is that conn_fd_handler() is called once more
per session (6 instead of 5 min) but other update functions are less
called.
Note that the performance loss caused by this change at the moment is
quite significant, around 2.5%, but the change is needed to have SSL
working correctly in all situations, even when data were read from the
socket and stored in the invisible cache, waiting for some room in the
channel's buffer.
With the global maxzlibmem option, you are able ton control the maximum
amount of RAM usable for HTTP compression.
A test is done before each zlib allocation, if the there isn't available
memory, the test fail and so the zlib initialization, so data won't be
compressed.
Don't use the zlib allocator anymore, 5 pools are used for the zlib
compression. Their sizes depends of the window size and the memLevel in
deflateInit2.
The window size and the memlevel of the zlib are now configurable using
global options tune.zlib.memlevel and tune.zlib.windowsize.
It affects the memory consumption of the zlib.
The build was dependent of the zlib.h header, regardless of the USE_ZLIB
option. The fix consists of several #ifdef in the source code.
It removes the overhead of the zstream structure in the session when you
don't use the option.
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. But looping again onto it,
we lose the information of the full buffers and perform one useless syscall.
Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.
1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exists as soon as system buffers are full. So no backport is needed.
Keys are copied from samples to stick_table_key. If a key is larger
than the stick_table_key, we have an overflow. In pratice it does not
happen because it requires :
1) a configuration with tune.bufsize larger than BUFSIZE (common)
2) a stick-table configured with keys strictly larger than buffers
3) extraction of data larger than BUFSIZE (eg: using payload())
Points 2 and 3 don't make any sense for a real world configuration. That
said the issue needs be fixed. The solution consists in allocating it the
same size as the global buffer size, just like the samples. This fixes the
issue.
Sample conversions rely on two alternative buffers which were previously
allocated as static bufs of size BUFSIZE. Now they're initialized to the
global buffer size. It was the same for HTTP authentication. Note that it
seems that none of them was prone to any mistake when dealing with the
buffer size, but better stay on the safe side by maintaining the old
assumption that a trash buffer is always "large enough".
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
This function's naming was misleading as it is used to append data
at the end of a string, causing some surprizes when used for the
first time!
Add a chunk_printf() function which does what its name suggests.
This is a first step in avoiding to constantly reinitialize chunks.
It replaces the old chunk_reset() which was not properly named as it
used to drop everything and was only used by chunk_destroy(). It has
been renamed chunk_drop().
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
This commit introduces HTTP compression using the zlib library.
http_response_forward_body has been modified to call the compression
functions.
This feature includes 3 algorithms: identity, gzip and deflate:
* identity: this is mostly for debugging, and it was useful for
developping the compression feature. With Content-Length in input, it
is making each chunk with the data available in the current buffer.
With chunks in input, it is rechunking, the output chunks will be
bigger or smaller depending of the size of the input chunk and the
size of the buffer. Identity does not apply any change on data.
* gzip: same as identity, but applying a gzip compression. The data
are deflated using the Z_NO_FLUSH flag in zlib. When there is no more
data in the input buffer, it flushes the data in the output buffer
(Z_SYNC_FLUSH). At the end of data, when it receives the last chunk in
input, or when there is no more data to read, it writes the end of
data with Z_FINISH and the ending chunk.
* deflate: same as gzip, but with deflate algorithm and zlib format.
Note that this algorithm has ambiguous support on many browsers and
no support at all from recent ones. It is strongly recommended not
to use it for anything else than experimentation.
You can't choose the compression ratio at the moment, it will be set to
Z_BEST_SPEED (1), as tests have shown very little benefit in terms of
compression ration when going above for HTML contents, at the cost of
a massive CPU impact.
Compression will be activated depending of the Accept-Encoding request
header. With identity, it does not take care of that header.
To build HAProxy with zlib support, use USE_ZLIB=1 in the make
parameters.
This work was initially started by David Du Colombier at Exceliance.
This state's name is confusing as it is only used with chunked encoding
and makes newcomers think it's also related to the content-length. Let's
call it CHUNK_CRLF to clear any doubt on this.
This tiny function was not inlined because initially not much used.
However it's been used un the chunk parser for a while and it became
one of the most CPU-cycle eater there. By inlining it, the chunk parser
speed was increased by 74 %. We're almost 3 times faster than original
with just the last 4 commits.
Most calls to channel_forward() are performed with short byte counts and
are already optimized in channel_forward() taking just a few instructions.
Thus it's a waste of CPU cycles to call a function for this, let's just
inline the short byte count case and fall back to the common one for
remaining situations.
Doing so has increased the chunked encoding parser's performance by 12% !
ACL and sample fetches use args list and it is really not convenient to
check for null args everywhere. Now for empty args we pass a constant
list of end of lists. It will allow us to remove many useless checks.
It's sometimes needed to be able to compare a zero-terminated string with a
chunk, so we now have two functions to do that, one strcmp() equivalent and
one strcasecmp() equivalent.
The ssl_npn match could not work by itself because clients do not use
the NPN extension unless the server advertises the protocols it supports.
Thanks to Simone Bordet for the explanations on how to get it right.
The struct target contains one int and one pointer, causing it to be
64-bit aligned on 64-bit platforms. By marking it "packed", we can
save 8 bytes in struct connection and as many in struct session on
such platforms.
This field was used to trace precisely where a session was terminated
but it did not survive code rearchitecture and was not used at all
anymore. Let's get rid of it.
Now that the buffer is moved out of the channel, it is possible to move
the pointer earlier in the struct and reorder some fields. This new
ordering improves overall performance by 2%, mainly saved in the HTTP
parsers and data transfers.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
These two new log-format tags report the SSL protocol version (%sslv) and the
SSL ciphers (%sslc) used for the connection with the client. For instance, to
append these information just after the client's IP/port address information
on an HTTP log line, use the following configuration :
log-format %Ci:%Cp\ %sslv:%sslc\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %st\ %B\ %cc\ \ %cs\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r
It will report a line such as the following one :
Oct 12 20:47:30 haproxy[9643]: 127.0.0.1:43602 TLSv1:AES-SHA [12/Oct/2012:20:47:30.303] stick2~ stick2/s1 7/0/12/0/19 200 145 - - ---- 0/0/0/0/0 0/0 "GET /?t=0 HTTP/1.0"
This flag will have to be set on log tags which require transport layer
information. They will prevent the conn_xprt_close() call from releasing
the transport layer too early.
When we start logging SSL information, we need the SSL struct to be
present even past the conn_xprt_close() call. In order to achieve this,
we should use refcounting on the connection and the transport layer. At
the moment it's not worth using plain refcounting as only the logs require
this, so instead of real refcounting we just use a flag which will be set
by the log subsystem when SSL data need to be logged.
What happens then is that the xprt->close() call is ignored and the
transport layer is closed again during session_free(), after the log
line is emitted.
When calling conn_xprt_close(), we always clear the transport pointer
so that all transport layers leave the connection in the same state after
a close. This will also make it safer and cheaper to call conn_xprt_close()
multiple times if needed.
Until now it was not possible to know from the logs whether the incoming
connection was made over SSL or not. In order to address this in the existing
log formats, a new log format %ft was introduced, to log the frontend's name
suffixed with its transport layer. The only transport layer in use right now
is '~' for SSL, so that existing log formats for non-SSL traffic are not
affected at all, and SSL log formats have the frontend's name suffixed with
'~'.
The TCP, HTTP and CLF log format now use %ft instead of %f. This does not
affect existing log formats which still make use of %f however.
It now becomes possible to verify the server's certificate using the "verify"
directive. This one only supports "none" and "required", as it does not make
much sense to also support "optional" here.
Just like with the "bind" lines, we'll switch the "server" line
parsing to keyword registration. The code is essentially the same
as for bind keywords, with minor changes such as support for the
default-server keywords and support for variable argument count.
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.
The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
Baptiste Assmann reported a bug causing a crash on recent versions when
sticking rules were set on layer 7 in a TCP proxy. The bug is easier to
reproduce with the "defer-accept" option on the "bind" line in order to
have some contents to parse when the connection is accepted. The issue
is that the acl_prefetch_http() function called from HTTP fetches relies
on hdr_idx to be preinitialized, which is not the case if there is no L7
ACL.
The solution consists in adding a new SMP_CAP_L7 flag to fetches to indicate
that they are expected to work on L7 data, so that the proxy knows that the
hdr_idx has to be initialized. This is already how ACL and HTTP mode are
handled.
The bug was present since 1.5-dev9.
These ones are used to set the default ciphers suite on "bind" lines and
"server" lines respectively, instead of using OpenSSL's defaults. These
are probably mainly useful for distro packagers.
When health checks are configured on a server which has the send-proxy
directive and no "port" nor "addr" settings, the health check connections
will automatically use the PROXY protocol. If "port" or "addr" are set,
the "check-send-proxy" directive may be used to force the protocol.
Since it's possible for the checks to use a different protocol or transport layer
than the prod traffic, we need to have them referenced in the server. The
SSL checks are not enabled yet, but the transport layers are completely used.
Till now the request was made in the trash and sent to the network at
once, and the response was read into a preallocated char[]. Now we
allocate a full buffer for both the request and the response, and make
use of it.
Some of the operations will probably be replaced later with buffer macros
but the point was to ensure we could migrate to use the data layers soon.
One nice improvement caused by this change is that requests are now formed
at the beginning of the check and may safely be sent in multiple chunks if
needed.
The health checks in the servers are becoming a real mess, move them
into their own subsection. We'll soon need to have a struct buffer to
replace the char * as well as check-specific protocol and transport
layers.
This callback sends a PROXY protocol line on the outgoing connection,
with the local and remote endpoint information. This is used for local
connections (eg: health checks) where the other end needs to have a
valid address and no connection is relayed.
It was previously in frontend.c but there is no reason for this anymore
considering that all the information involved is in the connection itself
only. Theorically this should be in the socket layer but we don't have
this yet.
We absolutely want to disable FD polling after an error is detected,
otherwise the data layer has to do it and it's far from being obvious
at these layers.
The way we did it was a bit tricky in conn_update_*_polling and
conn_*_polling_changes. However it has almost no impact on performance
and code size both for the fast and slow path.
We'll now be able to remove some flag updates in the stream interface.
Just like ->init(), ->wake() may now be used to return an error and
abort the connection. Currently this is not used but will be with
embryonic sessions.
We now check the connection flags for changes in order not to call the
data->wake callback when there is no activity. Activity means a change
on any of the CO_FL_*_SH, CO_FL_ERROR, CO_FL_CONNECTED, CO_FL_WAIT_CONN*
flags, as well as a call to data->recv or data->send.
The connection flags have progressively been added one after the other
and were not very well organized. Some of them are often used together
and a number of operations are performed on the DATA/SOCK ENA/POL flags.
Thus, they have been reorganized so that flags that work together are
close to each other (allows immediate operands on ARM) and that polling
changes can be detected with fewer operations using a simple shift and
xor. The handshakes are now the last ones so that it will be easier to
add new ones after without risking a collision. All activity-related
flags are also grouped together.
The generic data-layer init callback is now used after the transport
layer is complete and before calling the data layer recv/send callbacks.
This allows the session to switch from the embryonic session data layer
to the complete stream interface data layer, by making conn_session_complete()
the data layer's init callback.
It sill looks awkwards that the init() callback must be used opon error,
but except by adding yet another one, it does not seem to be mergeable
into another function (eg: it should probably not be merged with ->wake
to avoid unneeded calls during the handshake, though semantically that
would make sense).
Instead of calling conn_notify_si() from the connection handler, we
now call data->wake(), which will allow us to use a different callback
with health checks.
Note that we still rely on a flag in order to decide whether or not
to call this function. The reason is that with embryonic sessions,
the callback is already initialized to si_conn_cb without the flag,
and we can't call the SI notify function in the leave path before
the stream interface is initialized.
This issue should be addressed by involving a different data_cb for
embryonic sessions and for stream interfaces, that would be changed
during session_complete() for the final data_cb.
Now conn->data will designate the data layer which is the client for
the transport layer. In practice it's the stream interface and will
soon also be the health checks.
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.
In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.
The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.
A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
Each proxy contains a reference to the original config file and line
number where it was declared. The pointer used is just a reference to
the one passed to the function instead of being duplicated. The effect
is that it is not valid anymore at the end of the parsing and that all
proxies will be enumerated as coming from the same file on some late
configuration errors. This may happen for exmaple when reporting SSL
certificate issues.
By copying using strdup(), we avoid this issue.
1.4 has the same issue, though no report of the proxy file name is done
out of the config section. Anyway a backport is recommended to ease
post-mortem analysis.
Herv Commowick reported an issue : haproxy dies in a segfault during a
soft restart if it tries to pause a disabled proxy. This is because disabled
proxies have no management task so we must not wake the task up. This could
easily remain unnoticed since the old process was expected to go away, so
having it go away faster was not really troubling. However, with sync peers,
it is obvious that there is no peer sync during this reload.
This issue has been introduced in 1.5-dev7 with the removal of the
maintain_proxies() function. No backport is needed.
Disables the stateless session resumption (RFC 5077 TLS Ticket extension)
and force to use stateful session resumption.
Stateless session resumption is more expensive in CPU usage.
This is because "notlsv1" used to disable TLSv1.0 only and had no effect
on v1.1/v1.2. so better have an option for each version. This applies both
to "bind" and "server" statements.
It removes dependencies with futex or mutex but ssl performances decrease
using nbproc > 1 because switching process force session renegotiation.
This can be useful on small systems which never intend to run in multi-process
mode.
We don't needa to lock the memory when there is a single process. This can
make a difference on small systems where locking is much more expensive than
just a test.
Allow to ignore some verify errors and to let them pass the handshake.
Add option 'crt-ignore-err <list>'
Ignore verify errors at depth == 0 (client certificate)
<list> is string 'all' or a comma separated list of verify error IDs
(see http://www.openssl.org/docs/apps/verify.html)
Add option 'ca-ignore-err <list>'
Same as 'crt-ignore-err' for all depths > 0 (CA chain certs)
Ex ignore all errors on CA and expired or not-yet-valid errors
on client certificate:
bind 0.0.0.0:443 ssl crt crt.pem verify required
cafile ca.pem ca-ignore-err all crt-ignore-err 10,9
Add keyword 'verify' on bind:
'verify none': authentication disabled (default)
'verify optional': accept connection without certificate
and process a verify if the client sent a certificate
'verify required': reject connection without certificate
and process a verify if the client send a certificate
Add keyword 'cafile' on bind:
'cafile <path>' path to a client CA file used to verify.
'crlfile <path>' path to a client CRL file used to verify.
This will be needed to find the stream interface from the connection
once they're detached, but in the more immediate term, we'll need this
for health checks since they don't use a stream interface.
Now the stats socket is allocated when the 'stats socket' line is parsed,
and assigned using the standard str2listener(). This has two effects :
- more than one stats socket can now be declared
- stats socket now support protocols other than UNIX
The next step is to remove the duplicate bind config parsing.
Alex Markham reported and diagnosed a bug appearing on 1.5-dev11,
causing a crash on x86_64 when header hashing is used. The cause is
a missing (int) cast causing a negative offset to appear positive
and the resulting pointer to go out of bounds.
The crash is not possible anymore since 1.5-dev12 because a second
bug caused the negative sign to disappear so the pointer is always
within range but always wrong, so balance hdr() never works anymore.
This fix restores the correct behaviour and ensures the sign is
correct.
Unix permissions are per-bind configuration line and not per listener,
so let's concretize this in the way the config is stored. This avoids
some unneeded loops to set permissions on all listeners.
The access level is not part of the unix perms so it has been moved
away. Once we can use str2listener() to set all listener addresses,
we'll have a bind keyword parser for this one.
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
- frontends have a dual-linked list of bind_conf
- frontends have a dual-linked list of listeners
- bind_conf have a dual-linked list of listeners
- listeners have a pointer to their bind_conf
This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
When an unknown "bind" keyword is detected, dump the list of all
registered keywords. Unsupported default alternatives are also reported
as "not supported".
Registering new SSL bind keywords was not particularly handy as it required
many #ifdef in cfgparse.c. Now the code has moved to ssl_sock.c which calls
a register function for all the keywords.
Error reporting was also improved by this move, because the called functions
build an error message using memprintf(), which can span multiple lines if
needed, and each of these errors will be displayed indented in the context of
the bind line being processed. This is important when dealing with certificate
directories which can report multiple errors.
With the arrival of SSL, the "bind" keyword has received even more options,
all of which are processed in cfgparse in a cumbersome way. So it's time to
let modules register their own bind options. This is done very similarly to
the ACLs with a small difference in that we make the difference between an
unknown option and a known, unimplemented option.
Some settings need to be merged per-bind config line and are not necessarily
SSL-specific. It becomes quite inconvenient to have this ssl_conf SSL-specific,
so let's replace it with something more generic.
Bind parsers may return multiple errors, so let's make use of a new function
to re-indent multi-line error messages so that they're all reported in their
context.
A side effect of this change is that the "ssl" keyword on "bind" lines is now
just a boolean and that "crt" is needed to designate certificate files or
directories.
Note that much refcounting was needed to have the free() work correctly due to
the number of cert aliases which can make a context be shared by multiple names.
SSL config holds many parameters which are per bind line and not per
listener. Let's use a per-bind line config instead of having it
replicated for each listener.
At the moment we only do this for the SSL part but this should probably
evolved to handle more of the configuration and maybe even the state per
bind line.
SSL connections take a huge amount of memory, and unfortunately openssl
does not check malloc() returns and easily segfaults when too many
connections are used.
The only solution against this is to provide a global maxsslconn setting
to reject SSL connections above the limit in order to avoid reaching
unsafe limits.
I wrote a small path to add the SSL_OP_CIPHER_SERVER_PREFERENCE OpenSSL option
to frontend, if the 'prefer-server-ciphers' keyword is set.
Example :
bind 10.11.12.13 ssl /etc/haproxy/ssl/cert.pem ciphers RC4:HIGH:!aNULL:!MD5 prefer-server-ciphers
This option mitigate the effect of the BEAST Attack (as I understand), and it
equivalent to :
- Apache HTTPd SSLHonorCipherOrder option.
- Nginx ssl_prefer_server_ciphers option.
[WT: added a test for the support of the option]
This is aimed at disabling SSLv3 and TLSv1 respectively. SSLv2 is always
disabled. This can be used in some situations where one version looks more
suitable than the other.
This SSL session cache was developped at Exceliance and is the same that
was proposed for stunnel and stud. It makes use of a shared memory area
between the processes so that sessions can be handled by any process. It
is only useful when haproxy runs with nbproc > 1, but it does not hurt
performance at all with nbproc = 1. The aim is to totally replace OpenSSL's
internal cache.
The cache is optimized for Linux >= 2.6 and specifically for x86 platforms.
On Linux/x86, it makes use of futexes for inter-process locking, with some
x86 assembly for the locked instructions. On other architectures, GCC
builtins are used instead, which are available starting from gcc 4.1.
On other operating systems, the locks fall back to pthread mutexes so
libpthread is automatically linked. It is not recommended since pthreads
are much slower than futexes. The lib is only linked if SSL is enabled.
CVE-2009-3555 suggests that client-initiated renegociation should be
prevented in the middle of data. The workaround here consists in having
the SSL layer notify our callback about a handshake occurring, which in
turn causes the connection to be marked in the error state if it was
already considered established (which means if a previous handshake was
completed). The result is that the connection with the client is immediately
aborted and any pending data are dropped.
This option currently takes no option and simply turns SSL on for all
connections going to the server. It is likely that more options will
be needed in the future.
This data layer supports socket-to-buffer and buffer-to-socket operations.
No sock-to-pipe nor pipe-to-sock functions are provided, since splicing does
not provide any benefit with data transformation. At best it could save a
memcpy() and avoid keeping a buffer allocated but that does not seem very
useful.
An init function and a close function are provided because the SSL context
needs to be allocated/freed.
A data-layer shutw() function is also provided because upon successful
shutdown, we want to store the SSL context in the cache in order to reuse
it for future connections and avoid a new key generation.
The handshake function is directly called from the connection handler.
At this point it is not certain whether this will remain this way or
if a new ->handshake callback will be added to the data layer so that
the connection handler doesn't care about SSL.
The sock-to-buf and buf-to-sock functions are all capable of enabling
the SSL handshake at any time. This also implies polling in the opposite
direction to what was expected. The upper layers must take that into
account (it is OK right now with the stream interface).
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.
By removing its useless includes, a long dependency chain broke
but could easily be fixed.
The "spec" sub-struct was using 8 bytes for only 5 needed. There is no
reason to keep it as a struct, it doesn't bring any value. By flattening
it, we can merge the single byte with the next single byte, resulting in
an immediate saving of 4 bytes (20%). Interestingly, tests have shown a
steady performance gain of 0.6% after this change, which can possibly be
attributed to a more cache-line friendly struct.
These flags were added for TCP_CORK. They were only set at various places
but never checked by any user since TCP_CORK was replaced with MSG_MORE.
Simply get rid of this now.
I/O handlers now all use __conn_{sock,data}_{stop,poll,want}_* instead
of returning dummy flags. The code has become slightly simpler because
some tricks such as the MIN_RET_FOR_READ_LOOP are not needed anymore,
and the data handlers which switch to a handshake handler do not need
to disable themselves anymore.
Polling flags were set for data and sock layer, but while this does make
sense for the ENA flag, it does not for the POL flag which translates the
detection of an EAGAIN condition. So now we remove the {DATA,SOCK}_POL*
flags and instead introduce two new layer-independant flags (WANT_RD and
WANT_WR). These flags are only set when an EAGAIN is encountered so that
polling can be enabled.
In order for these flags to have any meaning they are not persistent and
have to be cleared by the connection handler before calling the I/O and
data callbacks. For this reason, changes detection has been slightly
improved. Instead of comparing the WANT_* flags with CURR_*_POL, we only
check if the ENA status changes, or if the polling appears, since we don't
want to detect the useless poll to ena transition. Tests show that this
has eliminated one useless call to __fd_clr().
Finally the conn_set_polling() function which was becoming complex and
required complex operations from the caller was split in two and replaced
its two only callers (conn_update_data_polling and conn_update_sock_polling).
The two functions are now much smaller due to the less complex conditions.
Note that it would be possible to re-merge them and only pass a mask but
this does not appear much interesting.
The PROXY protocol is now decoded in the connection before other
handshakes. This means that it may be extracted from a TCP stream
before SSL is decoded from this stream.
When an incoming connection request is accepted, a connection
structure is needed to store its state. However we don't want to
fully initialize a session until the data layer is about to be
ready.
As long as the connection is physically stored into the session,
it's not easy to split both allocations.
As such, we only initialize the minimum requirements of a session,
which results in what we call an embryonic session. Then once the
data layer is ready, we can complete the function's initialization.
Doing so avoids buffers allocation and ensures that a session only
sees ready connections.
The frontend's client timeout is used as the handshake timeout. It
is likely that another timeout will be used in the future.
SSL need to initialize the data layer before proceeding with data. At
the moment, this data layer is automatically initialized from itself,
which will not be possible once we extract connection from sessions
since we'll only create the data layer once the handshake is finished.
So let's have the application layer initialize the data layer before
using it.
Make it more obvious that this function does not depend on any knowledge
of the session. This is important to plan for TCP rules that can run on
connection without any initialized session yet.
The last uses of the stream interfaces were in tcp_connect_server() and
could easily and more appropriately be moved to its callers, si_connect()
and connect_server(), making a lot more sense.
Now the function should theorically be usable for health checks.
It also appears more obvious that the file is split into two distinct
parts :
- the protocol layer used at the connection level
- the tcp analysers executing tcp-* rules and their samples/acls.
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.
It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.
It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.
In concerns the following ones (new names) :
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)
Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.
The "buffer" memory pool was also renamed "channel".
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.
No code was changed, only a renaming was performed. The comments about
channel operations was updated.
These functions do not depend on the channel flags anymore thus they're
much better suited to be used on plain buffers. Move them from channel
to buffer.
This is similar to the recent removal of BF_OUT_EMPTY. This flag was very
problematic because it relies on permanently changing information such as the
to_forward value, so it had to be updated upon every change to the buffers.
Previous patch already got rid of its users.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
input and output is more reliable than checking the flag's replacement.
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).
There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.
Some parts of the sock_ops structure were only used by the stream
interface and have been moved into si_ops. Some of them were callbacks
to the stream interface from the connection and have been moved into
app_cp as they're the application seen from the connection (later,
health-checks will need to use them). The rest has moved to data_ops.
Normally at this point the connection could live without knowing about
stream interfaces at all.
The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions
which in turn are called by the stream interface's recv and send callbacks.
The presence of the rcv_pipe/snd_pipe functions is used to attest support
for splicing at the data layer. It looks like the stream-interface's
SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy
for the pointers above.
It also appears that we call chk_snd() from the recv callback and then
try to call it again in update_conn(). It is very likely that this last
function will progressively slip into the recv/send callbacks in order
to avoid duplicate check code.
The code works right now with and without splicing. Only raw_sock provides
support for it and it is automatically selected when the various splice
options are set. However it looks like splice-auto doesn't enable it, which
possibly means that the streamer detection code does not work anymore, or
that it's only called at a time where it's too late to enable splicing (in
process_session).
Similar to what was done on the receive path, the data layer now provides
only an snd_buf() callback that is iterated over by the stream interface's
si_conn_send_loop() function.
The data layer now has no knowledge about channels nor stream interfaces.
The splice() code still need to be ported as it currently is disabled.
The recv function is now generic and is usable to iterate any connection-to-buf
reading function from a stream interface. So let's move it to stream-interface.
This is the start of the stream connection iterator which calls the
data-layer reader. This still looks a bit tricky but is OK. Splicing
is not handled at all at the moment.
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
At the moment, the struct is still embedded into the struct channel, but
all the functions have been updated to use struct buffer only when possible,
otherwise struct channel. Some functions would likely need to be splitted
between a buffer-layer primitive and a channel-layer function.
Later the buffer should become a pointer in the struct buffer, but doing so
requires a few changes to the buffer allocation calls.
This is a massive rename. We'll then split channel and buffer.
This change needs a lot of cleanups. At many locations, the parameter
or variable is still called "buf" which will become ambiguous. Also,
the "struct channel" is still defined in buffers.h.
This function is used by the data layer when a zero has been read over a
connection. At the moment it only handles sockets and nothing else. Once
the complete split is done between buffers and stream interfaces, it should
become possible to work regardless on the connection type.
The connection send() callback is supposed to be generic for a
stream-interface, and consists in calling the lower layer snd_buf
function. Move this function to the stream interface and remove
the sock-raw and sock-ssl clones.
This callback is used to send data from the buffer to the socket. It is
the old write_loop() call of the data layer which is used both by the
->write() callback and the ->chk_snd() function. The reason for having
it as a pointer is that it's the only remaining part which causes the
write and chk_snd() functions to be different between raw and ssl.
sock_raw and sock_ssl use a pretty generic chk_rcv function, so let's move
this function to the stream_interface and remove specific functions. Later
we might have a single chk_rcv function.
We need to have a generic function to be called by upper layers when buffer
flags have been updated (the si->update function). At the moment, both sock_raw
and sock_ssl had their own which basically was a copy-paste. Since these
functions are only used to update stream interface flags, it is logical to
have them handled by the stream interface code.
This allowed us to remove the stream_interface-specific update function from
sock_raw and sock_ssl which now use the generic code.
The stream_sock_update_conn callback has also been more appropriately renamed
conn_notify_si() since it's meant to be called by lower layers to notify the
SI and possibly upper layers about incoming changes.
This is a second attempt at getting rid of FD_WAIT_*. Now the situation is
much better since native I/O handlers can directly manipulate the FD using
fd_{poll|want|stop}_* and the connection handlers manipulate connection-level
flags using the conn_{data|sock}_* equivalent.
Proceeding this way ensures that the connection flags always reflect the
reality even after data<->handshake switches.
Now the connection handler, the handshake callbacks and the I/O callbacks
make use of the connection-layer polling functions to enable or disable
polling on a file descriptor.
Some changes still need to be done to avoid using the FD_WAIT_* constants.
The conflicts we're facing with polling is that handshake handlers have
precedence over data handlers and may change the polling requirements
regardless of what is expected by the data layer. This causes issues
such as missed events.
The real need is to have three polling levels :
- the "current" one, which is effective at any moment
- the data one, which reflects what the data layer asks for
- the sock one, which reflects what the socket layer asks for
Depending on whether a handshake is in progress or not, either one of the
last two will replace the current one, and the change will be propagated
to the lower layers.
At the moment, the shutdown status is not considered, and only handshakes
are used to decide which layer to chose. This will probably change.
The old EV_FD_SET() macro was confusing, as it would enable receipt but there
was no way to indicate that EAGAIN was received, hence the recently added
FD_WAIT_* flags. They're not enough as we're still facing a conflict between
EV_FD_* and FD_WAIT_*. So let's offer I/O functions what they need to explicitly
request polling.
These functions have a more explicity meaning and will offer provisions
for explicit polling.
EV_FD_ISSET() has been left for now as it is still in use in checks.
Up to now, we had to use a shutr/shutw interface per data layer, which
basically means 3 distinct functions when we include SSL :
- generic stream_interface
- sock_raw
- sock_ssl
With this change, the code located in the stream_interface manages all the
stream_interface and buffer updates, and calls the data layer hooks when
needed.
At the moment, the socket layer hook had been implicitly considered as
being a regular socket, so the si_shut*() functions call the normal
shutdown() and EV_FD_CLR() functions on the fd if a socket layer is
defined. This may change in the future. The stream_int_shut*()
functions don't call EV_FD_CLR() so that they can later be embedded
in lower layers.
Thus, the si->data->shutr() is not called anymore and si->data->shutw()
is called to close the data layer only (eg: only for SSL).
Proceeding like this is very important because it's the only way to be
able not to rely on these functions when called from the connection
handlers, and call the data layers' instead.
These primitives were initially introduced so that callers were able to
conditionally set/disable polling on a file descriptor and check in return
what the state was. It's been long since we last had an "if" on this, and
all pollers' functions were the same for cond_* and their systematic
counter parts, except that this required a check and a specific return
value that are not always necessary.
So let's simplify the FD API by removing this now unused distinction and
by making all specific functions return void.
Handshakes is not called anymore from the data handlers, they're only
called from the connection handler when their flag is set.
Also, this move has uncovered an issue with the stream interface notifier :
it doesn't consider the FD_WAIT_* flags possibly set by the handshake
handlers. This will result in a stuck handshake when no data is in the
output buffer. In order to cover this, for now we'll perform the EV_FD_SET
in the SSL handshake function, but this needs to be addressed separately
from the stream interface operations.
This new flag is used to indicate that the connection was already
connected. It can be used by I/O handlers to know that a connection
has just completed. It is used by stream_sock_update_conn(), allowing
the sock_opt handlers not to manipulate the SI timeout nor the
BF_WRITE_NULL flag anymore.
The sock_ops I/O callbacks made use of an FD till now. This has become
inappropriate and the struct connection is much more useful. It also
fixes the race condition introduced by previous change.
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.
At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
Use a single tcp_connect_probe() instead of tcp_connect_write() and
tcp_connect_read(). We call this one only when no data layer function
have been processed, so this is a fallback to test for completion of
a connection attempt.
With this done, we don't have the need for any direct I/O callback
anymore.
The function still relies on ->write() to wake the stream interface up,
so it's not finished.
This handshake handler must be independant, so move it away from
proto_tcp. It has a dedicated connection flag. It is tested before
I/O handlers and automatically removes the CO_FL_WAIT_L4_CONN flag
upon success.
It also sets the BF_WRITE_NULL flag on the stream interface and
stops the SI timeout. However it does not perform the task_wakeup(),
and relies on the data handler to do so for now. The SI wakeup will
have to be moved elsewhere anyway.
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
In an attempt to get rid of fdtab[].state, and to move the relevant
parts to the connection struct, we remove the FD_STCLOSE state which
can easily be deduced from the <owner> pointer as there is a 1:1 match.
The correct spelling is "independent", not "independant". This patch
fixes the doc and the configuration parser to accept the correct form.
The config parser still allows the old naming for backwards compatibility.
This is used to enter values for stick tables. The most likely usage
is to set gpc0 for a specific IP address in order to block traffic
for abusers without having to reload. Since all data types are
supported, other usages are possible (eg: replace a users's assigned
server).
Source addresses of non-TCP families were not correctly handled by
tcp_src_to_stktable_key() as it forgot to return NULL and instead left
the previous value in the stick-table buffer.
This bug is 1.5-specific and was introduced by commit 4f92d320 in 1.5-dev6
so it does not need any backport.
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.
struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
It was not possible to kill remaining sessions from the admin interface,
which is annoying especially when switching to maintenance mode. Now it's
possible.
This implements the feature discussed in the earlier thread of killing
connections on backup servers when a non-backup server comes back up. For
example, you can use this to route to a mysql master & slave and ensure
clients don't stay on the slave after the master goes from down->up. I've done
some minimal testing and it seems to work.
[WT: added session flag & doc, moved the killing after logging the server UP,
and ensured that the new server is really usable]
When passing arguments to ACLs and samples, some types are stored as
strings then resolved later after config parsing is done. Upon exit,
the arguments need to be freed only if the string was not resolved
yet. At the moment we can encounter double free during deinit()
because some arguments (eg: userlists) are freed once as their own
type and once as a string.
The solution consists in adding an "unresolved" flag to the args to
say whether the value is still held in the <str> part or is final.
This could be debugged thanks to a useful bug report from Sander Klein.
httponly This option tells haproxy to add an "HttpOnly" cookie attribute
when a cookie is inserted. This attribute is used so that a
user agent doesn't share the cookie with non-HTTP components.
Please check RFC6265 for more information on this attribute.
secure This option tells haproxy to add a "Secure" cookie attribute when
a cookie is inserted. This attribute is used so that a user agent
never emits this cookie over non-secure channels, which means
that a cookie learned with this flag will be presented only over
SSL/TLS connections. Please check RFC6265 for more information on
this attribute.
This one was already taken care of in proxy_cfg_ensure_no_http(), so if a
cookie is presented in a TCP backend, we got two warnings.
This can be backported to 1.4 since it's been this way for 2 years (although not dramatic).
Cookies were mixed with many other options while they're not used as options.
Move them to a dedicated bitmask (ck_opts). This has released 7 flags in the
proxy options and leaves some room for new proxy flags.
Option httplog needs to be checked only once the proxy has been validated,
so that its final mode (tcp/http) can be used. Also we need to check for
httplog before checking the log format, so that we can report a warning
about this specific option and not about the format it implies.
Commit 13e66da introduced b_rew() but passes -adv which is an unsigned
quantity on 64-bit platforms, causing the buffer to advance in the wrong
direction.
No backport is needed.
This patch brings a new "whole" parameter to "balance uri" which makes
the hash work over the whole uri, not just the part before the query
string. Len and depth parameter are still honnored.
The reason for this new feature is explained below.
I have 3 backend servers, each accepting different form of HTTP queries:
http://backend1.server.tld/service1.php?q=...
http://backend1.server.tld/service2.php?q=...
http://backend2.server.tld/index.php?query=...&subquery=...
http://backend3.server.tld/image/49b8c0d9ff
Each backend server returns a different response based on either:
- the URI path (the left part of the URI before the question mark)
- the query string (the right part of the URI after the question mark)
- or the combination of both
I wanted to set up a common caching cluster (using 6 Squid servers, each
configured as reverse proxy for those 3 backends) and have HAProxy balance
the queries among the Squid servers based on URL. I also wanted to achieve
hight cache hit ration on each Squid server and send the same queries to
the same Squid servers. Initially I was considering using the 'balance uri'
algorithm, but that would not work as in case of backend2 all queries would
go to only one Squid server. The 'balance url_param' would not work either
as it would send the backend3 queries to only one Squid server.
So I thought the simplest solution would be to use 'balance uri', but to
calculate the hash based on the whole URI (URI path + query string),
instead of just the URI path.
The listener struct is now aware of the socket layer to use upon accept().
At the moment, only sock_raw is supported so this patch should not change
anything.
When the target is a client, it will be convenient to have a pointer to the
original listener so that we can retrieve some configuration information at
the stream interface level.
This function will be called later when splitting the shutdown in two
steps. It will be needed by SSL and for remote socket operations to
release unused contexts.
The state and the private pointer are not specific to the applets, since SSL
will require exactly both of them. Move them to the connection layer now and
rename them. We also now ensure that both are NULL on first call.
We start to move everything needed to manage a connection to a special
entity "struct connection". We have the data layer operations and the
control operations there. We'll also have more info in the future such
as file descriptors and applet contexts, so that in the end it becomes
detachable from the stream interface, which will allow connections to
be reused between sessions.
For now on, we start with minimal changes.
David Touzeau reported that haproxy dies when a server is checked and is
used in a farm with only "option transparent" and no LB algo. This is
because the LB params are NULL, the functions should be checked before
being called.
The same bug is present in 1.4 so this patch must be backported.
msg->som was zero before the body and was used to carry the beginning
of a chunk size for chunked-encoded messages, at a moment when msg->sol
is always zero.
Remove msg->som and replace it with msg->sol where needed.
This is a left-over from the buffer changes. Msg->sol is always null at the
end of the parsing, so we must not use it anymore to read headers or find
the beginning of a message. As a side effect, the dump of the request in
debug mode is working again because it was relying on msg->sol not being
null.
Maybe it will even be mergeable with another of the message pointers.
Calling the init() function in sess_establish was a bad idea, it is
too late to allow it to fail on lack of resource and does not help at
all. Remove it for now before it's used.
Before it was possible to resize the buffers using global.tune.bufsize,
the trash has always been the size of a buffer by design. Unfortunately,
the recent buffer sizing at runtime forgot to adjust the trash, resulting
in it being too short for content rewriting if buffers were enlarged from
the default value.
The bug was encountered in 1.4 so the fix must be backported there.
This flag indicates that we're not interested in keeping half-open
connections on a stream interface. It has the benefit of allowing
the socket layer to cause an immediate write close when detecting
an incoming read close. This releases resources much faster and
saves one syscall (either a shutdown or setsockopt).
This flag is only set by HTTP on the interface going to the server
since we don't want to continue pushing data there when it has
closed.
Another benefit is that it responds with a FIN to a server's FIN
instead of responding with an RST as it used to, which is much
cleaner.
Performance gains of 7.5% have been measured on HTTP connection
rate on empty objects.
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.
Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
This macro is usable like printf but sends messages to fd #-1, which has no
visible effect but is easy to spot in strace. This is very useful to put
tracers at many points during debugging sessions.
Tunnel timeouts are used when TCP connections are forwarded, or
when forwarding upgraded HTTP connections (WebSocket) as well as
CONNECT requests to proxies.
This timeout allows long-lived sessions to be supported without
having to set large timeouts to normal requests.
Instead of hard-coding sock_raw in connect_server(), we set this socket
operation at config parsing time. Right now, only servers and peers have
it. Proxies are still hard-coded as sock_raw. This will be needed for
future work on SSL which requires a different socket layer.
Similarly to the previous patch, we don't need the socket-layer functions
outside of stream_interface. They could even move to a file dedicated to
applets, though that does not seem particularly useful at the moment.
Commit e164e7a removed get_src/get_dst setting in the stream interfaces but
forgot to set it in proto_tcp. Get the feature back because we need it for
logging, transparent mode, ACLs etc... We now rely on the stream interface
direction to know what syscall to use.
One benefit of doing it this way is that we don't use getsockopt() anymore
on outgoing stream interfaces nor on UNIX sockets.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
There is no more reason for the realign function being HTTP specific,
it only operates on a buffer now. Let's move it to buffers.c instead.
It's likely that buffer_bounce_realign is broken (not used), this will
have to be inspected. The function is worth rewriting as it can be
cheaper than buffer_slow_realign() to realign large wrapping buffers.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
From time to time, some bugs are discovered that are caused by non-initialized
memory areas. It happens that most platforms return a zero-filled area upon
first malloc() thus hiding potential bugs. This patch also replaces malloc()
in pools with calloc() to ensure that all platforms exhibit the same behaviour
upon startup. In order to catch these bugs more easily, add a -dM command line
flag to enable memory poisonning. Optionally, passing -dM<byte> forces the
poisonning byte to <byte>.
A number of important information were missing from the error captures, so
let's improve them. Now we also log source port, session flags, transaction
flags, message flags, pending output bytes, expected buffer wrapping position,
total bytes transferred, message chunk length, and message body length.
As such, the output format has slightly evolved and the source address moved
to the third line :
[08/May/2012:11:14:36.341] frontend echo (#1): invalid request
backend echo (#1), server <NONE> (#-1), event #1
src 127.0.0.1:40616, session #4, session flags 0x00000000
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00909002, out 0 bytes, total 28 bytes
pending 28 bytes, wrapping at 8030, error at position 7:
00000 GET / /?t=20000 HTTP/1.1\r\n
00026 \r\n
[08/May/2012:11:13:13.426] backend echo (#1) : invalid response
frontend echo (#1), server local (#1), event #0
src 127.0.0.1:40615, session #1, session flags 0x0000044e
HTTP msg state 32, msg flags 0x0000000e, tx flags 0x08200000
HTTP chunk len 0 bytes, HTTP body len 20 bytes
buffer flags 0x00008002, out 81 bytes, total 92 bytes
pending 11 bytes, wrapping at 7949, error at position 9:
00000 Foo: bar\r\r\n
The previous sockstream_accept() function uses nothing from sockstream, and
is totally irrelevant to stream interfaces. Move this to the protocols.c
file which handles listeners and protocols, and call it listener_accept().
It now makes much more sense that the code dealing with listen() also handles
accept() and passes it to upper layers.
These operators are used regardless of the socket protocol family. Move
them to a "sock_ops" struct. ->read and ->write have been moved there too
as they have no reason to remain at the protocol level.
Make use of the new IPv6 pattern type so that acl_match_ip() knows how to
compare pattern and sample.
IPv6 may be entered in their usual form, with or without a netmask appended.
Only bit counts are accepted for IPv6 netmasks. In order to avoid any risk of
trouble with randomly resolved IP addresses, host names are never allowed in
IPv6 patterns.
HAProxy is also able to match IPv4 addresses with IPv6 addresses in the
following situations :
- tested address is IPv4, pattern address is IPv4, the match applies
in IPv4 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv6, the match applies
in IPv6 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv4, the match applies in IPv4
using the pattern's mask if the IPv6 address matches with 2002:IPV4::,
::IPV4 or ::ffff:IPV4, otherwise it fails.
- tested address is IPv4, pattern address is IPv6, the IPv4 address is first
converted to IPv6 by prefixing ::ffff: in front of it, then the match is
applied in IPv6 using the supplied IPv6 mask.
We cannot currently match IPv6 addresses in ACL simply because we don't
support types on the patterns. Let's introduce this notion. For now, we
rely on the SMP_TYPES though it doesn't seem like it will last forever
given that some types are not present there (eg: regex, meth). Still it
should be enough to support mixed matchings for most types.
We use the special impossible value SMP_TYPES for types that don't exist
in the SMP_T_* space.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
All parsing errors were known but impossible to return. Now by making use
of memprintf(), we're able to build meaningful error messages that the
caller can display.
HTTP header fetch is now done using smp_fetch_hdr() for both ACLs and
patterns. This one also supports an occurrence number, making it possible
to specify explicit occurrences for ACLs and patterns.
This way, fetch functions will be able to tell if they're called for a single
request or as part of a loop. This is important for instance when we use
hdr(foo), because in an ACL this means that all hdr(foo) occurrences must
be checked while in a pattern it means only one of them (eg: last one).
Patterns were using a bitmask to indicate if request or response was desired
in fetch functions and keywords. ACLs were using a bitmask in fetch keywords
and a single bit in fetch functions. ACLs were also using an ACL_PARTIAL bit
in fetch functions indicating that a non-final fetch was performed, which was
an abuse of the existing direction flag.
The change now consists in using :
- a capabilities field for fetch keywords => SMP_CAP_REQ/RES to indicate
if a keyword supports requests, responses, both, etc...
- an option field for fetch functions to indicate what the caller expects
(request/response, final/non-final)
The ACL_PARTIAL bit was reversed to get SMP_OPT_FINAL as it's more explicit
to know we're working on a final buffer than on a non-final one.
ACL_DIR_* were removed, as well as PATTERN_FETCH_*. L4 fetches were improved
to support being called on responses too since they're still available.
The <dir> field of all fetch functions was changed to <opt> which is now
unsigned.
The patch is large but mostly made of cosmetic changes to accomodate this, as
almost no logic change happened.
Having the args everywhere will make it easier to share fetch functions
between patterns and ACLs. The only place where we could have needed
the expr was in the http_prefetch function which can do well without.
Previously, both pattern, backend and persist_rdp_cookie would build fake
ACL expressions to fetch an RDP cookie by calling acl_fetch_rdp_cookie().
Now we switch roles. The RDP cookie fetch function is provided as a sample
fetch function that all others rely on, including ACL. The code is exactly
the same, only the args handling moved from expr->args to args. The code
was moved to proto_tcp.c, but probably that a dedicated file would be more
suited to content handling.
Now there is no more reference to union pattern_data. All pattern fetch and
conversion functions now make use of the common sample type. Note: none of
them adjust the type right now so it's important to do it next otherwise
we would risk sharing such functions with ACLs and seeing them fail.
This change is pretty minor. Struct pattern is only used for
pattern_process() now so changing it to use the common type is
quite obvious. It's worth noting that the last argument of
pattern_process() is never used so the function is self-sufficient.
Note that pattern_process() does not initialize the pattern at all
before calling fetch->process(), and that minimal initialization
will be required when we later change the argument for the sample.
These ones were either unused or improperly used. Some integers were marked
read-only, which does not make much sense. Buffers are not read-only, they're
"constant" in that they must be kept intact after any possible change.
This one is not needed anymore as we can return the data and its type in the
sample provided by the caller. ACLs now always return the proper type. BOOL
is already returned when the result is expected to be processed as a boolean.
temp_pattern has been unexported now.
The new sample types are necessary for the acl-pattern convergence.
These types are boolean and signed int. Some types were renamed for
less ambiguity (ip->ipv4, integer->uint).
The pattern type is ambiguous because a pattern is only a type and a data
part, and is normally used to match against samples. Currently, patterns
cannot hold information related to the life of the data which was extracted.
We don't want to overload patterns either, so let's add a new "sample" type
which will progressively supersede the acl_test and maybe the pattern at most
places. The sample shares similar information with patterns and also has flags
describing the data volatility and protection.
This flag was used to force a boolean match even if there was no pattern
to match. It was used only by http_auth() and designed only for this one.
It's easier and cleaner to make the fetch function perform the test and
report the boolean result as a few other functions already do. It simplifies
the acl_exec_cond() logic and will help merging ACLs and patterns.
This is used to validate that arguments are coherent. For instance,
payload_lv expects that the last arg (if any) is not more negative
than the sum of the first two. The error is reported if any.
We don't need the pattern-specific args parsers anymore, make use of the
common parser instead. We still need to improve this by adding a validation
function to report abnormal argument values or combinations. We don't report
precise parsing errors yet but this was not previously done either.
arg_i was almost unused, and since we migrated to use struct arg everywhere,
the rare cases where arg_i was needed could be replaced by switching to
arg->type = ARGT_STOP.
The types and minimal number of ACL keyword arguments are now stored in
their declaration. This will allow many more fantasies if some ACL use
several arguments or types.
Doing so required to rework all ACL keyword declarations to add two
parameters. So this was a good opportunity for a general cleanup and
to sort all entries in alphabetical order.
We still have two pending issues :
- parse_acl_expr() checks for errors but has no way to report them to
the user ;
- the types of some arguments are still not resolved and kept as strings
(eg: ARGT_FE/BE/TAB) for compatibility reasons, which must be resolved
in acl_find_targets()
The ACL parser now uses the argument parser to build a typed argument list.
Right now arguments are all strings and only one argument is supported since
this is what ACLs currently support.
make_arg_list() builds an array of typed arguments with their values,
that the caller describes how to parse. This will be used to support
multiple arguments for ACLs and patterns, which is currently problematic
and prevents ACLs and patterns from being merged. Up to 7 arguments types
may be enumerated in a single 32-bit word, including their number of
mandatory parts.
At the moment, these files are not used yet, they're only built. Note that
the 4-bit encoding for the type has left only one unused type!
This is more convenient and efficient than buf->p = b_ptr(buf, n);
It simply advances the buffer's pointer by <n> and trasfers that
amount of bytes from <in> to <out>. The BF_OUT_EMPTY flag is updated
accordingly.
A few occurrences of such computations in buffers.c and stream_sock.c
were updated to use b_adv(), which resulted in a small code shrink.
buffer_wrap_add was convenient for the migration but is not handy at all.
Let's have new wrappers that report input begin/end and output begin/end
instead.
It looks like we'll also need a b_adv(ofs) to advance a buffer's pointer.
buffer_ignore may only be used when the output of a buffer is empty,
but it's not granted it is always the case when sending HTTP error
responses. Better use buffer_cut_tail() instead, and use buffer_ignore
only on non-wrapping data.
msg->sol is now a relative pointer just like all other ones. There is no
more absolute references to the buffer outside the struct buffer itself.
Next two cleanups should include removing buffer references to functions
which already have an msg, and removal of wrapping detection in request
and response parsing which cannot wrap by definition.
ACLs and patterns only rely on a struct http_msg and don't know the pointer
to the actual data. struct http_msg will soon only hold relative references
so that's not possible. We need http_msg to hold a reference to the struct
buffer before having relative pointers everywhere.
It is likely that doing so will also result in opportunities to simplify
a number of functions arguments. The following functions are already
candidate :
http_buffer_heavy_realign
http_capture_bad_message
http_change_connection_header
http_forward_trailers
http_header_add_tail
http_header_add_tail2
http_msg_analyzer
http_parse_chunk_size
http_parse_connection_header
http_remove_header2
http_send_name_header
http_skip_chunk_crlf
http_upgrade_v09_to_v10
These offsets were relative to the buffer itself. Now they're relative to
the buffer's origin (buf->p) which normally corresponds to the start of
current message.
This saves a big dependency between the HTTP message struct and the buffers.
It appeared during this change that ->col is not used anymore (it will have
to be removed). Next step is to turn ->eol and ->sol from absolute to relative.
The buffer's pointer <lr> was only used by HTTP parsers which also use a
struct http_msg to keep track of the parser's state. We've reached a point
where it makes no sense to keep ->lr in the buffer, as the split between
buffer and msg is only arbitrary for historical reasons.
This change ensures that touching buffers will not impact HTTP messages
anymore, making the buffers more content-agnostic. However, it becomes
very important not to forget to update msg->next when some data get
forwarded or moved (and in general each time buf->p is updated).
The new pointer in http_msg becomes relative to buffer->p so that
parsing multiple messages becomes easier. It is possible that at one
point ->som and ->next will be merged.
Note: http_parse_reqline() and http_parse_stsline() have been temporarily
modified to know the message starting point in the buffer (->p).
This change gets rid of buf->r which is always equal to buf->p + buf->i.
It removed some wrapping detection at a number of places, but required addition
of new relative offset computations at other locations. A large number of places
can be simplified now with extreme care, since most of the time, either the
pointer has to be computed once or we need a difference between the old ->w and
old ->r to compute free space. The cleanup will probably happen with the rewrite
of the buffer_input_* and buffer_output_* functions anyway.
buf->lr still has to move to the struct http_msg and be relative to buf->p
for the rework to be complete.
This change introduces the buffer's base pointer, which is the limit between
incoming and outgoing data. It's the point where the parsing should start
from. A number of computations have already been greatly simplified, but
more simplifications are expected to come from the removal of buf->r.
The changes appear good and have revealed occasional improper use of some
pointers. It is possible that this patch has introduced bugs or revealed
some, although preliminary testings tend to indicate that everything still
works as it should.
We don't have buf->l anymore. We have buf->i for pending data and
the total length is retrieved by adding buf->o. Some computation
already become simpler.
Despite extreme care, bugs are not excluded.
It's worth noting that msg->err_pos as set by HTTP request/response
analysers becomes relative to pending data and not to the beginning
of the buffer. This has not been completed yet so differences might
occur when outgoing data are left in the buffer.
Too many flags are stored in the transaction structure. Some flags are
clearly message-specific and exist in two versions (request and response).
Move them to a new "flags" field in the http_message struct instead.
memprintf() is just like snprintf() except that it always returns a properly
sized allocated string that the caller is responsible for freeing. NULL is
returned on serious errors. It also supports stackable calls over the same
pointer since it offers support for automatically freeing a previous one :
memprintf(&err, "invalid argument: '%s'", arg);
...
memprintf(&err, "keyword parser said: <%s>", *err);
...
memprintf(&err, "line parser said: %s\n", *err);
...
free(*err);
It's very annoying that we have to deal with the crappy size_t and with ints
at some places because these ones don't mix well. Patch 6f61b2 changed the
chunk len to int but its size remains size_t and some functions are having
trouble being used by several callers depending on the type of their arguments.
Let's turn extract_cookie_value() to int for now on, and plan a massive cleanup
later to remove all size_t.
These callbacks are used to retrieve the source and destination address
of a socket. The address flags are not hold on the stream interface and
not on the session anymore. The addresses are collected when needed.
This still needs to be improved to store the IP and port separately so
that it is not needed to perform a getsockname() when only the IP address
is desired for outgoing traffic.