We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
- fetch functions will solely rely on the keyword string
- some other smp_fetch_* users do not have the pointer to the original
keyword and were forced to pass NULL.
So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.
The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.
The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.
One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.
A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
I left a mistake in my previous patch bringing the crt-list feature,
it breaks clients with no SNI support.
Also remove the useless wildp = NULL as per a previous discussion.
We were using "tune.ssl.maxrecord 2000" and discovered an interesting
problem: SSL data sent from the server to the client showed occasional
corruption of the payload data.
The root cause was:
When ssl_max_record is smaller than the requested send amount
the ring buffer wrapping wasn't properly adjusting the
number of bytes to send.
I solved this by selecting the initial size based on the number
of output bytes that can be sent without splitting _before_ checking
against ssl_max_record.
This new pattern fetch returns the client certificate's SHA-1 fingerprint
(i.e. SHA-1 hash of DER-encoded certificate) in a binary chunk.
This can be useful to pass it to a server in a header or to stick a client
to a server across multiple SSL connections.
Improve the crt-list file format to allow a rule to negate a certain SNI :
<crtfile> [[!]<snifilter> ...]
This can be useful when a domain supports a wildcard but you don't want to
deliver the wildcard cert for certain specific domains.
The ALPN extension is meant to replace the now deprecated NPN extension.
This patch implements support for it. It requires a version of openssl
with support for this extension. Patches are available here right now :
http://html5labs.interopbridges.com/media/167447/alpn_patches.zip
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.
The parsing is still performed by the ACL code though.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.
Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
fe61656b added the ability to load a list of certificates from a file,
but error control was incomplete and misleading, as some errors such
as missing files were not reported, and errors reported with Alert()
instead of memprintf() were inappropriate and mixed with upper errors.
Also, the code really supports a single SNI filter right now, so let's
correct it and the doc for that, leaving room for later change if needed.
It designates a list of PEM file with an optional list of SNI filter
per certificate, with the following format for each line :
<crtfile>[ <snifilter>]*
Wildcards are supported in the SNI filter. The certificates will be
presented to clients who provide a valid TLS Server Name Indication
field matching one of SNI filter. If no SNI filter is specified the
CN and alt subjects are used.
Add new tunable "tune.ssl.maxrecord".
Over SSL/TLS, the client can decipher the data only once it has received
a full record. With large records, it means that clients might have to
download up to 16kB of data before starting to process them. Limiting the
record size can improve page load times on browsers located over high
latency or low bandwidth networks. It is suggested to find optimal values
which fit into 1 or 2 TCP segments (generally 1448 bytes over Ethernet
with TCP timestamps enabled, or 1460 when timestamps are disabled), keeping
in mind that SSL/TLS add some overhead. Typical values of 1419 and 2859
gave good results during tests. Use "strace -e trace=write" to find the
best value.
This trick was first suggested by Mike Belshe :
http://www.belshe.com/2010/12/17/performance-and-the-tls-record-size/
Then requested again by Ilya Grigorik who provides some hints here :
http://ofps.oreilly.com/titles/9781449344764/_transport_layer_security_tls.html#ch04_00000101
Openssl needs to access /dev/urandom to initialize its internal random
number generator. It does so when it needs a random for the first time,
which fails if it is a handshake performed after the chroot(), causing
all SSL incoming connections to fail.
This fix consists in calling RAND_bytes() to produce a random before
the chroot, which will in turn open /dev/urandom before it's too late,
and avoid the issue.
If the random generator fails to work while processing the config,
haproxy now fails with an error instead of causing SSL connections to
fail at runtime.
This new option ensures that there is no possible fallback to a default
certificate if the client does not provide an SNI which is explicitly
handled by a certificate.
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
J. Maurice reported that ssllabs.com test affects unrelated
legitimate traffic and cause SSL errors and broken connections.
Sometimes openssl store read/write/handshake errors in a global stack. This
stack is not specific to the current session. Openssl API does not clean the
stack at the beginning of a new read/write. And the function used to retrieve
error ID after read/write, returns the generic error SSL_ERROR_SSL if the
global stack is not empty.
The fix consists in cleaning the errors stack after read/write/handshake
errors.
When using ca_ignore_err/crt_ignore_err, a connection to an untrusted
server raises an error which is ignored. But the next SSL_read() that
encounters EAGAIN raises the error again, breaking the connection.
Subsequent connections don't have this problem because the session has
been stored and is correctly reused without performing a verify again.
The solution consists in correctly flushing the SSL error stack when
ignoring the crt/ca error.
It's annoying that handshake handlers remove themselves from the
connection flags when they fail because there is no way to tell
which one fails. So now we only remove them when they succeed.
SSL_do_handshake is not appropriate for reneg, it's only appropriate at the
beginning of a connection. OpenSSL correctly handles renegs using the data
functions, so we use SSL_peek() here to make its state machine progress if
SSL_renegotiate_pending() says a reneg is pending.
SSL may decide to switch to a handshake in the middle of a transfer due to
a reneg. In this case we don't want to re-enable polling because data might
have been left pending in the buffer. We just want to switch immediately to
the handshake mode.
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
ssl_c_notbefore: start date of client cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_c_notafter: end date of client cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_f_notbefore: start date of frontend cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_f_notafter: end date of frontend cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_c_key_alg: algo used to encrypt the client's cert key (ex: rsaEncryption)
ssl_f_key_alg: algo used to encrypt the frontend's cert key (ex: rsaEncryption)
ssl_c_s_dn : client cert subject DN (string)
ssl_c_i_dn : client cert issuer DN (string)
ssl_f_s_dn : frontend cert subject DN (string)
ssl_f_i_dn : frontend cert issuer DN (string)
Return either the full DN without params, or just the DN entry (first param) or
its specific occurrence (second param).