When openssl_load_certificates() is called as a result of a USR2
signal, it has the effect of calling SSL_CTX_free() on the existing
SSL contexts.
But pointers to these contexts are borrowed by the ioa_engines,
where they are used for new connections.
The tls_mutex held while loading the certificates does not prevent this
use because it is released before despatching asynchronous events to each
ioa_engine asking them to pick up the new SSL context.
So there is a race if a new connection arrives shortly after
openssl_load_certificates() but before the tls_ctx_update_ev is handled.
This patch resolves the race using OpenSSL's own fine-grained locking.
The ioa_engines now 'copy' the SSL context (actually a refcounted copy).
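As a rough illustration of the refcounted 'copy' described above, here is a
minimal sketch that does not use OpenSSL at all. Every name in it (toy_ctx,
toy_ctx_up_ref, reload, engine_acquire, live_contexts) is hypothetical and
only models the idea: the reload path swaps the shared pointer and drops its
own reference, while an engine that took a reference earlier keeps a valid
object until it releases it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted SSL context. */
typedef struct toy_ctx {
    int refs;       /* protected by ref_lock */
    int generation; /* which certificate load produced this context */
} toy_ctx;

static pthread_mutex_t ref_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tls_mutex = PTHREAD_MUTEX_INITIALIZER;

int live_contexts = 0;      /* demonstration only: contexts not yet freed */
toy_ctx *global_ctx = NULL; /* slot written by the reload path */

toy_ctx *toy_ctx_new(int generation) {
    toy_ctx *c = malloc(sizeof(*c));
    if (c == NULL) return NULL;
    c->refs = 1; /* the creator holds the first reference */
    c->generation = generation;
    pthread_mutex_lock(&ref_lock);
    live_contexts++;
    pthread_mutex_unlock(&ref_lock);
    return c;
}

void toy_ctx_up_ref(toy_ctx *c) {
    pthread_mutex_lock(&ref_lock);
    c->refs++;
    pthread_mutex_unlock(&ref_lock);
}

/* Drops one reference; the memory is released only at zero. */
void toy_ctx_free(toy_ctx *c) {
    if (c == NULL) return;
    pthread_mutex_lock(&ref_lock);
    int remaining = --c->refs;
    if (remaining == 0) live_contexts--;
    pthread_mutex_unlock(&ref_lock);
    if (remaining == 0) free(c);
}

/* Reload path: publish a new context, then drop the old reference. */
void reload(int generation) {
    toy_ctx *fresh = toy_ctx_new(generation);
    if (fresh == NULL) return; /* keep the old context on failure */
    pthread_mutex_lock(&tls_mutex);
    toy_ctx *old = global_ctx;
    global_ctx = fresh;
    pthread_mutex_unlock(&tls_mutex);
    toy_ctx_free(old);
}

/* Engine path: take an owned reference instead of borrowing the pointer. */
toy_ctx *engine_acquire(void) {
    pthread_mutex_lock(&tls_mutex);
    toy_ctx *c = global_ctx;
    if (c != NULL) toy_ctx_up_ref(c);
    pthread_mutex_unlock(&tls_mutex);
    return c;
}
```

With real OpenSSL 1.1.0 or later, the analogous calls would be
SSL_CTX_up_ref() when an engine picks up the context and SSL_CTX_free()
when it lets go of the old one.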
When SSL certificates are renewed during runtime (via SIGUSR2),
e->dtls_ctx is replaced with a context based on the new certificate.
But the DTLS listener continues to operate on its own borrowed pointer
to the old context, which has since been freed.
This is clearly visible using valgrind, but the bug is subtle and not
always noticed at runtime, possibly due to some fortunate re-use of
memory.
At the point of SSL_new():
==28413== Thread 5:
==28413== Invalid read of size 8
==28413== at 0x4F6198F: SSL_new (in /lib/libssl.so.1.1)
==28413== by 0x137A72: dtls_server_input_handler (dtls_listener.c:291)
==28413== by 0x137A72: handle_udp_packet (dtls_listener.c:443)
==28413== by 0x138153: udp_server_input_handler (dtls_listener.c:728)
==28413== by 0x4FC499E: ??? (in /usr/lib/libevent_core-2.1.so.7.0.0)
==28413== by 0x4FC50AF: event_base_loop (in /usr/lib/libevent_core-2.1.so.7.0.0)
==28413== by 0x121F34: run_events (netengine.c:1579)
==28413== by 0x121F34: run_general_relay_thread (netengine.c:1707)
==28413== by 0x40517B6: start (pthread_create.c:195)
==28413== by 0x40538EF: ??? (clone.s:22)
==28413== Address 0x49a75e0 is 0 bytes inside a block of size 1,024 free'd
==28413== at 0x48A074F: free (vg_replace_malloc.c:540)
==28413== by 0x4F5F6F1: SSL_CTX_free (in /lib/libssl.so.1.1)
==28413== by 0x11CEC4: set_ctx (mainrelay.c:3104)
==28413== by 0x11D233: openssl_load_certificates (mainrelay.c:3173)
==28413== by 0x11D328: reload_ssl_certs (mainrelay.c:3190)
==28413== by 0x4FC4601: ??? (in /usr/lib/libevent_core-2.1.so.7.0.0)
==28413== by 0x4FC50AF: event_base_loop (in /usr/lib/libevent_core-2.1.so.7.0.0)
==28413== by 0x122582: run_events (netengine.c:1579)
==28413== by 0x122582: run_listener_server (netengine.c:1603)
==28413== by 0x110BB8: main (mainrelay.c:2536)
==28413== Block was alloc'd at
==28413== at 0x489F72A: malloc (vg_replace_malloc.c:309)
==28413== by 0x4DFA2C6: CRYPTO_zalloc (in /lib/libcrypto.so.1.1)
==28413== by 0x4F5F79E: SSL_CTX_new (in /lib/libssl.so.1.1)
==28413== by 0x11CA80: set_ctx (mainrelay.c:2875)
==28413== by 0x11D233: openssl_load_certificates (mainrelay.c:3173)
==28413== by 0x110A19: openssl_setup (mainrelay.c:3139)
==28413== by 0x110A19: main (mainrelay.c:2396)
==28413==
Multiple DTLS listener servers are created, and server->dtls_ctx is
the same object shared between them.
Set these callbacks once; the logical place to do so is the point where
the SSL context is created.
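A small sketch of the "set once, at creation" idea, independent of OpenSSL.
All names here (toy_shared_ctx, toy_cb, toy_listener, cb_set_count) are
hypothetical and stand in for whatever callbacks the real context carries:
the shared context installs its callback when it is initialised, and each
listener merely stores the pointer without touching the callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; stands in for callbacks kept on the context. */
typedef int (*toy_cb)(const char *peer);

typedef struct toy_shared_ctx {
    toy_cb cb;
    int cb_set_count; /* demonstration only: how often the callback was set */
} toy_shared_ctx;

typedef struct toy_listener {
    toy_shared_ctx *ctx; /* shared, not owned per listener */
} toy_listener;

int toy_default_cb(const char *peer) {
    return peer != NULL; /* trivial placeholder behaviour */
}

/* The callback is installed here, where the context is created. */
void toy_shared_ctx_init(toy_shared_ctx *c) {
    c->cb = toy_default_cb;
    c->cb_set_count = 1;
}

/* Listener setup only records the shared context. */
void toy_listener_init(toy_listener *l, toy_shared_ctx *c) {
    l->ctx = c;
}
```

However many listeners are created, the callback is written exactly once,
so no listener modifies a shared object that another thread may be reading.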