If the ClientHello callback does not manage to find a correct QUIC transport
parameters extension, we immediately close the connection with
missing_extension(109) as TLS alert which is turned into 0x16d QUIC connection
error.
We set this TLS error when no application protocol could be negotiated
via the TLS callback concerned. It is converted as a QUIC CRYPTO_ERROR
error (0x178).
SSL counters were added with commit d0447a7c3 ("MINOR: ssl: add counters
for ssl sessions") in 2.4, but their updates were not atomic, so it's
likely that under significant loads they are not correct.
This needs to be backported to 2.4.
If we want to run quic-tracker against haproxy, we must at least
support the draft version of the TLS extension for the QUIC transport
parameters (0xffa5). quic-tracker QUIC version is draft-29 at this time.
We select this depending on the QUIC version. If draft, we select the
draft TLS extension.
Since commit c2aae74 ("MEDIUM: ssl: Handle early data with OpenSSL
1.1.1"), the codepath of the clientHello callback changed, letting an
unknown SNI escape with a 'return 1' instead of passing through the
abort label.
An error was still emitted because the frontend continued the handshake
with the initial_ctx, which can't be used to achieve an handshake.
However, it had the ugly side effect of letting the request pass in the
case of a TLS resume. Which could be surprising when combining strict-sni
with the removing of a crt-list entry over the CLI for example. (like
its done in the ssl/new_del_ssl_crlfile.vtc reg-test).
This patch switches the code path of the allow_early and abort label, so
the default code path is the abort one, letting the clientHello returns
the correct SSL_AD_UNRECOGNIZED_NAME in case of errors.
Which means the client will now receive:
OpenSSL error[0x14094458] ssl3_read_bytes: tlsv1 unrecognized name
Instead of:
OpenSSL error[0x14094410] ssl3_read_bytes: sslv3 alert handshake failure
Which was the error emitted before HAProxy 1.8.
This patch must be carrefuly backported as far as 1.8 once we validated
its impact.
When establishing an outboud connection, haproxy checks if the cached
TLS session has the same SNI as the connection we are trying to
resume.
This test was done by calling SSL_get_servername() which in TLSv1.2
returned the SNI. With TLSv1.3 this is not the case anymore and this
function returns NULL, which invalidates any outboud connection we are
trying to resume if it uses the sni keyword on its server line.
This patch fixes the problem by storing the SNI in the "reused_sess"
structure beside the session itself.
The ssl_sock_set_servername() now has a RWLOCK because this session
cache entry could be accessed by the CLI when trying to update a
certificate on the backend.
This fix must be backported in every maintained version, however the
RWLOCK only exists since version 2.4.
The else is not for boringSSL but for the lack of Client Hello callback.
Should have been changed in 1fc44d4 ("BUILD: ssl: guard Client Hello
callbacks with HAVE_SSL_CLIENT_HELLO_CB macro instead of openssl
version").
Could be backported in 2.4.
Remove the hardcoded initialization of h3 layer on mux init. Now the
ALPN is looked just after the SSL handshake. The app layer is then
installed if the ALPN negotiation returned a supported protocol.
This required to add a get_alpn on the ssl_quic layer which is just a
call to ssl_sock_get_alpn() from ssl_sock. This is mandatory to be able
to use conn_get_alpn().
These two counters were the only ones not in the global struct, while
the SSL freq counters or the req counts are already in it, this forces
stats.c to include ssl_sock just to know about them. Let's move them
over there with their friends. This reduces from 408 to 384 the number
of includes of opensslconf.h.
This one has nothing to do with ssl_sock as it manipulates the struct
server only. Let's move it to server.c and remove unneeded dependencies
on ssl_sock.h. This further reduces by 10% the number of includes of
opensslconf.h and by 0.5% the number of compiled lines.
This one doesn't use anything from an SSL context, it only checks the
type of the transport layer of a connection, thus it belongs to
connection.h. This is particularly visible due to all the ifdefs
around it in various call places.
In case of error while calling a SSL_read or SSL_write, the
SSL_get_error function is called in order to know more about the error
that happened. If the error code is SSL_ERROR_SSL or SSL_ERROR_SYSCALL,
the error queue might contain more information on the error. This error
code was not used until now. But we now need to store it in order for
backend error fetches to catch all handshake related errors.
The change was required because the previous backend fetch would not
have raised anything if the client's certificate was rejected by the
server (and the connection interrupted). This happens because starting
from TLS1.3, the 'Finished' state on the client is reached before its
certificate is sent to the server (see the "Protocol Overview" part of
RFC 8446). The only place where we can detect that the server rejected the
certificate is after the first SSL_read call after the SSL_do_handshake
function.
This patch then adds an extra ERR_peek_error after the SSL_read and
SSL_write calls in ssl_sock_to_buf and ssl_sock_from_buf. This means
that it could set an error code in the SSL context a long time after the
handshake is over, hence the change in the error fetches.
The ssl_bc_hsk_err sample fetch will need to raise more errors than only
handshake related ones hence its renaming to a more generic ssl_bc_err.
This patch is required because some handshake failures that should have
been caught by this fetch (verify error on the server side for instance)
were missed. This is caused by a change in TLS1.3 in which the
'Finished' state on the client is reached before its certificate is sent
(and verified) on the server side (see the "Protocol Overview" part of
RFC 8446).
This means that the SSL_do_handshake call is finished long before the
server can verify and potentially reject the client certificate.
The ssl_bc_hsk_err will then need to be expanded to catch other types of
errors.
This change is also applied to the frontend fetches (ssl_fc_hsk_err
becomes ssl_fc_err) and to their string counterparts.
In case of a connection error happening after the SSL handshake is
completed, the error code stored in the connection structure would not
always be set, hence having some connection failures being described as
successful in the fc_conn_err or bc_conn_err sample fetches.
The most common case in which it could happen is when the SSL server
rejects the client's certificate. The SSL_do_handshake call on the
client side would be sucessful because the client effectively sent its
client hello and certificate information to the server, but the next
call to SSL_read on the client side would raise an SSL_ERROR_SSL code
(through the SSL_get_error function) which is decribed in OpenSSL
documentation as a non-recoverable and fatal SSL error.
This patch ensures that in such a case, the connection's error code is
set to a special CO_ERR_SSL_FATAL value.
Set the streams transport parameters which could not be initialized because they
were not available during initializations. Indeed, the streams transport parameters
are provided by the peer during the handshake.
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.
see 6f7cc11e6dd0f01b437fba893da2edd2362660a2
When we set tune.ssl.capture-cipherlist-size to a non-zero value
we are able to capture cipherlist supported by the client. To be able to
provide JA3 compatible TLS fingerprinting we need to capture more
information from Client Hello message:
- SSL Version
- SSL Extensions
- Elliptic Curves
- Elliptic Curve Point Formats
This patch allows HAProxy to capture such information and store it for
later use.
The X509_STORE_CTX_get0_cert did not exist yet on OpenSSL 1.0.2 and
neither did X509_STORE_CTX_get0_chain, which was not actually needed
since its get1 equivalent already existed.
Most of the SSL sample fetches related to the client certificate were
based on the SSL_get_peer_certificate function which returns NULL when
the verification process failed. This made it impossible to use those
fetches in a log format since they would always be empty.
The patch adds a reference to the X509 object representing the client
certificate in the SSL structure and makes use of this reference in the
fetches.
The reference can only be obtained in ssl_sock_bind_verifycbk which
means that in case of an SSL error occurring before the verification
process ("no shared cipher" for instance, which happens while processing
the Client Hello), we won't ever start the verification process and it
will be impossible to get information about the client certificate.
This patch also allows most of the ssl_c_XXX fetches to return a usable
value in case of connection failure (because of a verification error for
instance) by making the "conn->flags & CO_FL_WAIT_XPRT" test (which
requires a connection to be established) less strict.
Thanks to this patch, a log-format such as the following should return
usable information in case of an error occurring during the verification
process :
log-format "DN=%{+Q}[ssl_c_s_dn] serial=%[ssl_c_serial,hex] \
hash=%[ssl_c_sha1,hex]"
It should answer to GitHub issue #693.
This new sample fetch along the ssl_fc_hsk_err_str fetch contain the
last SSL error of the error stack that occurred during the SSL
handshake (from the frontend's perspective). The errors happening during
the client's certificate verification will still be given by the
ssl_c_err and ssl_c_ca_err fetches. This new fetch will only hold errors
retrieved by the OpenSSL ERR_get_error function.
Use non-checked function to retrieve listener/server via obj_type. This
is done as a previous obj_type function ensure that the type is well
known and the instance is not NULL.
Incidentally, this should prevent the coverity report from the #1335
github issue which warns about a possible NULL dereference.
The function ssl_sock_load_srv_cert will be used at runtime for dynamic
servers. If the cert is not loaded on ckch tree, we try to access it
from the file-system.
Now this access operation is rendered optional by a new function
argument. It is only allowed at parsing time, but will be disabled for
dynamic servers at runtime.
Explicitly call ssl_initialize_random to initialize the random generator
in init() global function. If the initialization fails, the startup is
interrupted.
This commit is in preparation for support of ssl on dynamic servers. To
be able to activate ssl on dynamic servers, it is necessary to ensure
that the random generator is initialized on startup regardless of the
config. It cannot be called at runtime as access to /dev/urandom is
required.
This also has the effect to fix the previous non-consistent behavior.
Indeed, if bind or server in the config are using ssl, the
initialization function was called, and if it failed, the startup was
interrupted. Otherwise, the ssl initialization code could have been
called through the ssl server for lua, but this times without blocking
the startup on error. Or not called at all if lua was deactivated.
The global shctx lookups and misses was updated without using atomic
ops, so the stats available in "show info" are very likely off by a few
units over time. This should be backported as far as 1.8. Versions
without _HA_ATOMIC_INC() can use HA_ATOMIC_ADD(,1).
The ifdefs surrounding the "show ssl ocsp-response" functionality that
were supposed to disable the code with BoringSSL were built the wrong
way.
It does not need to be backported.
This patch adds the "show ssl ocsp-response [<id>]" CLI command. This
command can be used to display the IDs of the OCSP tree entries along
with details about the entries' certificate ID (issuer's name and key
hash + serial number), or to display the details of a single
ocsp-response if an ID is given. The details displayed in this latter
case are the ones shown by a "openssl ocsp -respin <ocsp-response>
-text" call.
The OCSP tree entry key is a serialized version of the OCSP_CERTID of
the entry which is stored in a buffer that can be at most 128 bytes.
Depending on the length of the serial number, the actual non-zero part
of the key can be smaller than 128 bytes and this new structure member
allows to know how many of the bytes are filled. It will be useful when
dumping the key (in a "show ssl cert <cert>" output for instance).
The wey the "Next Update" field of the OCSP response is converted into a
timestamp relies on the use of signed integers for the year and month so
if the calculated timestamp happens to overflow INT_MAX, it ends up
being seen as negative and the OCSP response being dwignored in
ssl_sock_ocsp_stapling_cbk (because of the "ocsp->expire < now.tv_sec"
test).
It could be backported to all stable branches.
Since commit 04a5a44 ("BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of
OpenSSL versions") the "tune.ssl.keylog" feature is broken because
HAVE_OPENSSL_KEYLOG does not exist.
Replace this by a HAVE_SSL_KEYLOG which is defined in openssl-compat.h.
Also add an error when not built with the right openssl version.
Must be backported as far as 2.3.
Initialize the parsing context when checking server config validity.
Adjust the log messages to remove redundant config file/line and server
name. Do a similar cleaning in prepare_srv from ssl_sock as this
function is called at the same stage.
This will standardize the stderr output on startup with the parse_server
function.
Set "config :" as a prefix for the user messages context before starting
the configuration parsing. All following stderr output will be prefixed
by it.
As a consequence, remove extraneous prefix "config" already specified in
various ha_alert/warning/notice calls.
Some changes in the OpenSSL syntax API broke this syntax:
#if SSL_OP_NO_TLSv1_3
OpenSSL made this change which broke our usage in commit f04bb0bce490de847ed0482b8ec9eabedd173852:
-# define SSL_OP_NO_TLSv1_3 (uint64_t)0x20000000
+#define SSL_OP_BIT(n) ((uint64_t)1 << (uint64_t)n)
+# define SSL_OP_NO_TLSv1_3 SSL_OP_BIT(29)
Which can't be evaluated by the preprocessor anymore.
This patch replace the test by an openssl version test.
This fix part of #1276 issue.
A memory allocation failure happening during ssl_init_single_engine
would have resulted in a crash. This function is only called during
init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
The CA/CRL hot update patches did not compile on some targets of the CI
(mainly gcc + ssl). This patch should fix almost all of them. It adds
missing variable initializations and return value checks to the
BIO_reset calls in show_crl_detail.
In order for the link between the cafile_entry and the default ckch
instance to be built, we need to give a pointer to the instance during
the ssl_sock_prepare_ctx call.
Each ca-file entry of the tree will now hold a list of the ckch
instances that use it so that we can iterate over them when updating the
ca-file via a cli command. Since the link between the SSL contexts and
the CA file tree entries is only built during the ssl_sock_prepare_ctx
function, which are called after all the ckch instances are created, we
need to add a little post processing after each ssl_sock_prepare_ctx
that builds the link between the corresponding ckch instance and CA file
tree entries.
In order to manage the ca-file and ca-verify-file options, any ckch
instance can be linked to multiple CA file tree entries and any CA file
entry can link multiple ckch instances. This is done thanks to a
dedicated list of ckch_inst references stored in the CA file tree
entries over which we can iterate (during an update for instance). We
avoid having one of those instances go stale by keeping a list of
references to those references in the instances.
When deleting a ckch_inst, we can then remove all the ckch_inst_link
instances that reference it, and when deleting a cafile_entry, we
iterate over the list of ckch_inst reference and clear the corresponding
entry in their own list of ckch_inst_link references.
This patch moves all the ssl_store related code to ssl_ckch.c since it
will mostly be used there once the CA file update CLI commands are all
implemented. It also makes the cafile_entry structure visible as well as
the cafile_tree.
This function is one of the few high-profile, unresolved ones in the memory
profile output, let's have it resolve to ease matching of SSL allocations,
which are not easy to follow.
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1)
or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and
HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.
Currently our atomic ops return a value but it's never known whether
the fetch is done before or after the operation, which causes some
confusion each time the value is desired. Let's create an explicit
variant of these operations suffixed with _FETCH to explicitly mention
that the fetch occurs after the operation, and make use of it at the
few call places.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
This should fix GitHub issue #1143.
It can be backported as far as 2.2.
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
It can be backported as far as 2.2.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.