We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combine with SLEEPING to form 4 possible states:
SLP RDY State Description
0 0 - (reserved)
0 1 RUNNING Check is bound to current thread and running
1 0 SLEEPING Check is sleeping, not bound to a thread
1 1 MIGRATING Check is migrating to another thread
Thus we set READY upon migration, and check for it before migrating, this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are
adjacent.
When a check leaves the sleeping state, we must pin it to the thread that
is processing it. It's normally always the case after the first execution,
but initial checks that start assigned to any thread (-1) could be assigned
much later, causing problems with planned changes involving queuing. Thus
better do it early, so that all threads start properly pinned.
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.
Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
A small issue was introduced with commit d114f4a68 ("MEDIUM: checks:
spread the checks load over random threads"): when a check is bounced
to another thread, its expiration time is set to TICK_ETERNITY. This
makes it show as not expired upon first wakeup on the next thread,
thus being detected as "woke up too early" and being instantly
rescheduled. Only this after this next wakeup it will be properly
considered.
Several approaches were attempted to fix this. The best one seems to
consist in resetting t->expire and expired upon wakeup, and changing
the !expired test for !tick_is_expired() so that we don't trigger on
this case.
This needs to be backported to 2.7.
The per-thread SSL context in servers causes a burst of connection
renegotiations on startup, both for the forwarded traffic and for the
health checks. Health checks have been seen to continue to cause SSL
rekeying for several minutes after a restart on large thread-count
machines. The reason is that the context is exlusively per-thread
and that the more threads there are, the more likely it is for a new
connection to start on a thread that doesn't have such a context yet.
In order to improve this situation, this commit ensures that a thread
starting an SSL connection to a server without a session will first
look at the last session that was updated by another thread, and will
try to use it. In order to minimize the contention, we're using a read
lock here to protect the data, and the first-level index is an integer
containing the thread number, that is always valid and may always be
dereferenced. This way the session retrieval algorithm becomes quite
simple:
- if the last thread index is valid, then try to use the same session
under a read lock ;
- if any error happens, then atomically nuke the index so that other
threads don't use it and the next one to update a connection updates
it again
And for the ssl_sess_new_srv_cb(), we have this:
- update the entry under a write lock if the new session is valid,
otherwise kill it if the session is not valid;
- atomically update the index if it was 0 and the new one is valid,
otherwise atomically nuke it if the session failed.
Note that even if only the pointer is destroyed, the element will be
re-allocated by the next thread during the sess_new_srv_sb().
Right now a session is picked even if the SNI doesn't match, because
we don't know the SNI yet during ssl_sock_init(), but that's essentially
a matter of API, since connect_server() figures the SNI very early, then
calls conn_prepare() which calls ssl_sock_init(). Thus in the future we
could easily imaging storing a number of SNI-based contexts instead of
storing contexts per thread.
It could be worth backporting this to one LTS version after some
observation, though this is not strictly necessary. the current commit
depends on the following ones:
BUG/MINOR: ssl_sock: fix possible memory leak on OOM
MINOR: ssl_sock: avoid iterating realloc(+1) on stored context
DOC: ssl: add some comments about the non-obvious session allocation stuff
CLEANUP: ssl: keep a pointer to the server in ssl_sock_init()
MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
MINOR: server/ssl: maintain an index of the last known valid SSL session
MINOR: server/ssl: clear the shared good session index on failure
MEDIUM: server/ssl: pick another thread's session when we have none yet
If we fail to set the session using SSL_set_session(), we want to quickly
erase our index from the shared one so that any other thread with a valid
session replaces it.
When a thread creates a new session for a server, if none was known yet,
we assign the thread id (hence the reused_sess index) to a shared variable
so that other threads will later be able to find it when they don't have
one yet. For now we only set and clear the pointer upon session creation,
we do not yet pick it.
Note that we could have done it per thread-group, so as to avoid any
cross-thread exchanges, but it's anticipated that this is essentially
used during startup, at a moment where the cost of inter-thread contention
is very low compared to the ability to restart at full speed, which
explains why instead we store a single entry.
The goal will be to permit a thread to update its session while having
it shared with other threads. For now we only place the lock and arrange
the code around it so that this is quite light. For now only the owner
thread uses this lock so there is no contention.
Note that there is a subtlety in the openssl API regarding
i2s_SSL_SESSION() in that it fills the area pointed to by its argument
with a dump of the session and returns a size that's equal to the
previously allocated one. As such, it does modify the shared area even
if that's not obvious at first glance.
In ssl_sock_set_servername(), we're retrieving the current server name
from the current thread, hoping it will not have changed. This is a
bit dangerous as strictly speaking it's not easy to prove that no other
connection had to use one between the moment it was retrieved in
ssl_sock_init() and the moment it's being read here. In addition, this
forces us to maintain one session per thread while this is not the real
need, in practice we only need one session per SNI. And the current model
prevents us from sharing sessions between threads.
This had been done in 2.5 via commit e18d4e828 ("BUG/MEDIUM: ssl: backend
TLS resumption with sni and TLSv1.3"), but as analyzed with William, it
turns out that a saner approach consists in keeping the call to
SSL_get_servername() there and instead to always assign the SNI to the
current SSL context via SSL_set_tlsext_host_name() immediately when the
session is retreived. This way the session and SNI are consulted atomically
and the host name is only checked from the session and not from possibly
changing elements.
As a bonus the rdlock that was added by that commit could now be removed,
though it didn't cost much.
The SSL session allocation/reuse part is far from being trivial, and
there are some necessary tricks such as allocating then immediately
freeing that are required by the API due to internal refcount. All of
this is particularly hard to grasp, even with the scarce man pages.
Let's document a little bit what's granted and expected along this path
to help the reader later.
The SSL context storage in servers is per-thread, and the contents are
allocated for a length that is determined from the session. It turns out
that placing some traces there revealed that the realloc() that is called
to grow the area can be called multiple times in a row even for just
health checks, to grow the area by just one or two bytes. Given that
malloc() allocates in multiples of 8 or 16 anyway, let's round the
allocated size up to the nearest multiple of 8 to avoid this unneeded
operation.
The fetch logic is redundant and can be simplified by simply
calling the generic fetch with the correct TLV ID set as an
argument, similar to fc_pp_authority.
We already have a call that can retreive an TLV with any value.
Therefore, the fetch logic is redundant and can be simplified
by simply calling the generic fetch with the correct TLV ID
set as an argument.
Based on the new, generic allocation infrastructure, a new sample
fetch fc_pp_tlv is introduced. It is an abstraction for existing
PPv2 TLV sample fetches. It takes any valid TLV ID as argument and
returns the value as a string, similar to fc_pp_authority and
fc_pp_unique_id.
In order to be able to implement fetches in the future that allow
retrieval of any TLVs, a new generic data structure for TLVs is introduced.
Existing TLV fetches for PP2_TYPE_AUTHORITY and PP2_TYPE_UNIQUE_ID are
migrated to use this new data structure. TLV related pools are updated
to not rely on type, but only on size. Pools accomodate the TLV list
element with their associated value. For now, two pools for 128 B and
256 B values are introduced. More fine-grained solutions are possible
in the future, if necessary.
This patch improves readability by scoping HA proxy related PPv2 constants
with a 'HA" prefix. Besides, a new constant for the length of a CRC32C
TLV is introduced. The length is derived from the PPv2 spec, so 32 Bit.
For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
a basic support for capabilities at least on linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.
A new keyword "setcaps" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.
A good way to test this is to start haproxy with such a config:
global
uid 1000
setcap cap_net_bind_service
frontend test
mode http
timeout client 3s
bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt
and run it under "sudo strace -e trace=bind,setuid", then connecting
there from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
This option defaults to "connection" but is also dependent on the user
being allowed to bind the specified port. Since QUIC can easily run on
non-privileged ports, usually this is not a problem, but if bound to port
443 it will usually fail. Let's mention this.
If a stream is interrupted during its initialization by a panic signal
and tries to dump itself, it may cause a crash during the dump due to
scf and/or scb not being fully initialized. This may also happen while
releasing an endpoint to attach a new one. The effect is that instead
of dying on an abort, the process dies on a segv. This race is ultra-
rare but totally possible. E.g:
#0 se_fl_test (test=1, se=0x0) at include/haproxy/stconn.h:98
#1 sc_ep_test (test=1, sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:148
#2 sc_conn (sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:223
#3 stream_dump (buf=buf@entry=0x7ff9507e7678, s=0x7ff4c40c8800, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>, eol=eol@entry=10 '\n') at src/stream.c:2840
#4 0x000055996c493b42 in ha_task_dump (buf=buf@entry=0x7ff9507e7678, task=<optimized out>, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>) at src/debug.c:328
#5 0x000055996c493edb in ha_thread_dump_one (thr=thr@entry=18, from_signal=from_signal@entry=0) at src/debug.c:227
#6 0x000055996c493ff1 in ha_thread_dump (buf=buf@entry=0x7ff9507e7678, thr=thr@entry=18) at src/debug.c:270
#7 0x000055996c494257 in ha_panic () at src/debug.c:430
#8 ha_panic () at src/debug.c:411
(...)
#23 0x000055996c341fe8 in ssl_sock_close (conn=<optimized out>, xprt_ctx=0x7ff8dcae3880) at src/ssl_sock.c:6699
#24 0x000055996c397648 in conn_xprt_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:148
#25 conn_full_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:192
#26 h1_release (h1c=0x7ff8c297b3c0) at src/mux_h1.c:1074
#27 0x000055996c39c9f0 in h1_detach (sd=<optimized out>) at src/mux_h1.c:3502
#28 0x000055996c474de4 in sc_detach_endp (scp=scp@entry=0x7ff9507e3148) at src/stconn.c:375
#29 0x000055996c4752a5 in sc_reset_endp (sc=<optimized out>, sc@entry=0x7ff8d5cbd560) at src/stconn.c:475
Note that this cannot happen on "show sess" since a stream never leaves
process_stream in such an uninitialized state, thus it's really only the
crash dump that may cause this.
It should be backported to 2.8.
Bug was introduced by commit 26654 ("MINOR: ssl: add "crt" in the
cert_exts array").
When looking for a .crt directly in the cert_exts array, the
ssl_sock_load_pem_into_ckch() function will be called with a argument
which does not have its ".crt" extensions anymore.
If "ssl-load-extra-del-ext" is used this is not a problem since we try
to add the ".crt" when doing the lookup in the tree.
However when using directly a ".crt" without this option it will failed
looking for the file in the tree.
The fix removes the "crt" entry from the array since it does not seem to
be really useful without a rework of all the lookups.
Should fix issue #2265
Must be backported as far as 2.6.
In 2.9-dev4, commit 544c2f2d9 ("MINOR: pools: use EBO to wait for
unlock during pool_flush()") broke the thread-less build by calling
pl_wait_new_long() without explicitly including plock.h which is
normally included by thread.h when threads are enabled.
Surprisingly there's no include guard in plock.h though there is one in
atomic-ops.h. Let's add one, or we cannot risk including the file multiple
times.
If the connection is closed in h2_release(), which is indicated by ret<0, we
must not dereference conn anymore. This was introduced in 2.9-dev4 by commit
5053e8914 ("MEDIUM: h2: prevent stream opening before connection reverse
completed") and detected after a few hours of runtime thanks to running with
pool integrity checks and caller enabled. No backport is needed.
Released version 2.9-dev4 with the following main changes :
- DEV: flags/show-sess-to-flags: properly decode fd.state
- BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
- BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
- DOC: typo: fix sc-set-gpt references
- SCRIPTS: git-show-backports: automatic ref and base detection with -m
- REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (3)
- DOC: jwt: Add explicit list of supported algorithms
- BUILD: Makefile: add the USE_QUIC option to make help
- BUILD: Makefile: add USE_QUIC_OPENSSL_COMPAT to make help
- BUILD: Makefile: realigned USE_* options in make help
- DEV: makefile: fix POSIX compatibility for "range" target
- IMPORT: plock: also support inlining the int code
- IMPORT: plock: always expose the inline version of the lock wait function
- IMPORT: lorw: support inlining the wait call
- MINOR: threads: inline the wait function for pthread_rwlock emulation
- MINOR: atomic: make sure to always relax after a failed CAS
- MINOR: pools: use EBO to wait for unlock during pool_flush()
- BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1
- MINOR: quic+openssl_compat: Do not start without "limited-quic"
- MINOR: quic+openssl_compat: Emit an alert for "allow-0rtt" option
- BUG/MINOR: quic: allow-0rtt warning must only be emitted with quic bind
- BUG/MINOR: quic: ssl_quic_initial_ctx() uses error count not error code
- MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
- IMPORT: xxhash: update xxHash to version 0.8.2
- MINOR: proxy: simplify parsing 'backend/server'
- MINOR: connection: centralize init/deinit of backend elements
- MEDIUM: connection: implement passive reverse
- MEDIUM: h2: reverse connection after SETTINGS reception
- MINOR: server: define reverse-connect server
- MINOR: backend: only allow reuse for reverse server
- MINOR: tcp-act: parse 'tcp-request attach-srv' session rule
- REGTESTS: provide a reverse-server test
- MINOR: tcp-act: define optional arg name for attach-srv
- MINOR: connection: use attach-srv name as SNI reuse parameter on reverse
- REGTESTS: provide a reverse-server test with name argument
- MINOR: proto: define dedicated protocol for active reverse connect
- MINOR: connection: extend conn_reverse() for active reverse
- MINOR: proto_reverse_connect: parse rev@ addresses for bind
- MINOR: connection: prepare init code paths for active reverse
- MEDIUM: proto_reverse_connect: bootstrap active reverse connection
- MINOR: proto_reverse_connect: handle early error before reversal
- MEDIUM: h2: implement active connection reversal
- MEDIUM: h2: prevent stream opening before connection reverse completed
- REGTESTS: write a full reverse regtest
- BUG/MINOR: h2: fix reverse if no timeout defined
- CI: fedora: fix "dnf" invocation syntax
- BUG/MINOR: hlua_fcn: potentially unsafe stktable_data_ptr usage
- DOC: lua: fix Sphinx warning from core.get_var()
- DOC: lua: fix core.register_action typo
- BUG/MINOR: ssl_sock: fix possible memory leak on OOM
- MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
- MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
- MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct ->head list
- MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
- DOC: map/acl: Remove the comments about map/acl performance issue
- DOC: Explanation of be_name and be_id fetches
- MINOR: connection: simplify removal of idle conns from their trees
- MINOR: server: move idle tree insert in a dedicated function
- MAJOR: connection: purge idle conn by last usage
Backend idle connections are purged on a recurring occurence during the
process lifetime. An estimated number of needed connections is
calculated and the excess is removed periodically.
Before this patch, purge was done directly using the idle then the safe
connection tree of a server instance. This has a major drawback to take
no account of a specific ordre and it may removed functional connections
while leaving ones which will fail on the next reuse.
The problem can be worse when using criteria to differentiate idle
connections such as the SSL SNI. In this case, purge may remove
connections with a high rate of reusing while leaving connections with
criteria never matched once, thus reducing drastically the reuse rate.
To improve this, introduce an alternative storage for idle connection
used in parallel of the idle/safe trees. Now, each connection inserted
in one of this tree is also inserted in the new list at
`srv_per_thread.idle_conn_list`. This guarantees that recently used
connection is present at the end of the list.
During the purge, use this list instead of idle/safe trees. Remove first
connection in front of the list which were not reused recently. This
will ensure that connection that are frequently reused are not purged
and should increase the reuse rate, particularily if distinct idle
connection criterias are in used.
Define a new function _srv_add_idle(). This is a simple wrapper to
insert a connection in the server idle tree. This is reserved for simple
usage and require to idle_conns lock. In most cases,
srv_add_to_idle_list() should be used.
This patch does not have any functional change. However, it will help
with the next patch as idle connection will be always inserted in a list
as secondary storage along with idle/safe trees.
Small change of API for conn_delete_from_tree(). Now the connection
instance is taken as argument instead of its inner node.
No functional change introduced with this commit. This simplifies
slightly invocation of conn_delete_from_tree(). The most useful changes
is that this function will be extended in the next patch to be able to
remove the connection from its new idle list at the same time as in its
idle tree.
The be_name and be_id fetches contain data related to the current
backend and can be used in frontend responses. Yet, in cases where
no backend is used due to a local response or backend selection
failure, these fetches retain details of the current frontend.
This patch enhances the clarity of the values provided by these
fetches.
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
There is only one read access which is needed. This is in the "show map" command
callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
Store a pointer to the expression (struct pattern_expr) into the data structure
used to chain/store the map element references (struct pat_ref_elt) , e.g. the
struct pattern_tree when stored into an ebtree or struct pattern_list when
chained to a list.
Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map
and to look for the <elt> element passed as parameter to retrieve the sample data
to be parsed. Indeed, thanks to the pointer added above to each pattern tree nodes
or list elements, they all can be inspected directly from the <elt> passed as
parameter and its ->tree_head and ->list_head member: the pattern tree nodes are
stored into elt->tree_head, and the pattern list elements are chained to
elt->list_head list. This inspection was also the job of pattern_find_smp() which
is no more useful. This patch removes the code of this function.
Organize reference to pattern element of map (struct pat_ref_elt) into an ebtree:
- add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
element (pat_ref_elt struct),
- modify the code to insert these nodes into their ebtrees each time they are
allocated. This is done in pat_ref_append().
Note that ->head member (struct list) of map (struct pat_ref) is not removed
could have been removed. This is not the case because still necessary to dump
the map contents from the CLI in the order the map elememnts have been inserted.
This patch also modifies http_action_set_map() which is the callback at least
used by "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt()
is no more ignored, but reused if not NULL by pat_ref_set() as first element to
lookup from. This latter is also modified to use the ebtree attached to the map
in place of the ->head list attached to each map element (pat_ref_elt struct).
Also modify pat_ref_find_elt() to makes it use ->eb_root map ebtree added to the
map by this patch in place of inspecting all the elements with a strcmp() call.
That's the classical realloc() issue: if it returns NULL, the old area
is not freed but we erase the pointer. It was brought by commit e18d4e828
("BUG/MEDIUM: ssl: backend TLS resumption with sni and TLSv1.3"), and
should be backported where this commit was backported.
"converter" was used in place of "action" as a result of a copy-paste
error probably.
Also, rephrasing the "actions" keyword explanation to prevent confusion
between action name (which is the name of new action about to be created)
and action facilities where we want to expose the new action.
This could be backported to every stable versions.
Since f034139bc0 ("MINOR: lua: Allow reading "proc." scoped vars from
LUA core."), a new Sphinx warning is emitted when generating the lua doc:
"WARNING: Field list ends without a blank line; unexpected unindent."
This is due to a missing space after the line break continuation, sphinx
parser is very restrictive unfortunately!
Suppressing the warning and fixing the html output at the same time by
adding the missing space.
As reported by Coverity in GH #2253, stktable_data_ptr() usage in
hlua_stktable_dump() func is potentially unsafe because
stktable_data_ptr() may return NULL and the returned value is
dereferenced as-is without precautions.
In practise, this should not happen because some error checking was
already performed prior to calling stktable_data_ptr(). But since we're
using the safe stktable_data_ptr() function, all the error checking is
already done within the function, thus all we need to do is check ptr
against NULL instead to protect against NULL dereferences.
This should be backported in every stable versions.
h2c.task is not allocated in h2_init() if timeout client/server is not
defined depending on the connection side. This caused crash on
connection reverse due to systematic requeuing of h2c.task in
h2_conn_reverse().
To fix this, check h2c.task in h2_conn_reverse(). If old timeout was
undefined but new one is, h2c.task must be allocated as it was not in
h2_init(). On the opposite situation, if old timeout was defined and new
one is not, h2c.task is freed. In this case, or if neither timeout are
defined, skip the task requeuing.
This bug is easily reproduced by using reverse bind or server with
undefined timeout client/server depending on the connection reverse
direction.
This bug has been introduced by reverse connect support.
No need to backport it.
This test instantiates two haproxy instances :
* first one uses a reverse server with two bind pub and priv
* second one uses a reverse bind to initiate connection to priv endpoint
On startup, only first haproxy instance is up. A client send a request
to pub endpoint and should receive a HTTP 503 as no connection are
available on the reverse server.
Second haproxy instance is started. A delay of 3 seconds is inserted to
wait for the connection between the two LBs. Then a client retry the
request and this time should receive a HTTP 200 reusing the bootstrapped
connection.
HTTP/2 demux must be handled with care for active reverse connection.
Until accept has been completed, it should be forbidden to handle
HEADERS frame as session is not yet ready to handle streams.
To implement this, use the flag H2_CF_DEM_TOOMANY which blocks demux
process. This flag is automatically set just after conn_reverse()
invocation. The flag is removed on rev_accept_conn() callback via a new
H2 ctl enum. H2 tasklet is woken up to restart demux process.
As a side-effect, reporting in H2 mux may be blocked as demux functions
are used to convert error status at the connection level with
CO_FL_ERROR. To ensure error is reported for a reverse connection, check
h2c_is_dead() specifically for this case in h2_wake(). This change also
has its own side-effect : h2c_is_dead() conditions have been adjusted to
always exclude !h2c->conn->owner condition which is always true for
reverse connection or else H2 mux may kill them unexpectedly.
Implement active reverse on h2_conn_reverse().
Only minimal steps are done here : HTTP version session counters are
incremented on the listener instance. Also, the connection is inserted
in the mux_stopping_list to ensure it will be actively closed on process
shutdown/listener suspend.
An error can occured on a reverse connection before accept is completed.
In this case, no parent session can be notified. Instead, wake up the
receiver task on conn_create_mux().
As a counterpart to this, receiver task is extended to match CO_FL_ERROR
flag on pending connection. In this case, the onnection is freed. The
task is then requeued with a 1 second delay to start a new reverse
connection attempt.
Implement active reverse connection initialization. This is done through
a new task stored in the receiver structure. This task is instantiated
via bind callback and first woken up via enable callback.
Task handler is separated into two halves. On the first step, a new
connection is allocated and stored in <pend_conn> member of the
receiver. This new client connection will proceed to connect using the
server instance referenced in the bind_conf.
When connect has successfully been executed and HTTP/2 connection is
ready for exchange after SETTINGS, reverse_connect task is woken up. As
<pend_conn> is still set, the second halve is executed which only
execute listener_accept(). This will in turn execute accept_conn
callback which is defined to return the pending connection.
The task is automatically requeued inside accept_conn callback if bind
maxconn is not yet reached. This allows to specify how many connection
should be opened. Each connection is instantiated and reversed serially
one by one until maxconn is reached.
conn_free() has been modified to handle failure if a reverse connection
fails before being accepted. In this case, no session exists to notify
about the failure. Instead, reverse_connect task is requeud with a 1
second delay, giving time to fix a possible network issue. This will
allow to attempt a new connection reverse.
Note that for the moment connection rebinding after accept is disabled
for simplicity. Extra operations are required to migrate an existing
connection and its stack to a new thread which will be implemented
later.
When an active reverse connection is initialized, it has no stream-conn
attached to it contrary to other backend connections. This forces to add
extra check on stream existence in conn_create_mux() and h2_init().
There is also extra checks required for session_accept_fd() after
reverse and accept is done. This is because contrary to other frontend
connections, reversed connections have already initialized their mux and
transport layers. This forces us to skip the majority of
session_accept_fd() initialization part.
Finally, if session_accept_fd() is interrupted due to an early error, a
reverse connection cannot be freed directly or else mux will remain
alone. Instead, the mux destroy callback is used to free all connection
elements properly.
Implement parsing for "rev@" addresses on bind line. On config parsing,
server name is stored on the bind_conf.
Several new callbacks are defined on reverse_connect protocol to
complete parsing. listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure HTTP/2
protocol only is used by the server.