Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'.
This commit follows up by replacing the various custom prefixes with
'rhttp_' to make the code uniform.
Note that the 'reverse_' prefix was kept in the connection module. This is
because if a new reversible protocol not based on HTTP is implemented,
it may be necessary to reuse the same connection functions, which are
protocol agnostic.
"log-balance" directive was recently introduced to configure the
balancing algorithm to use when in a log backend. However, it is
confusing and it causes issues when used in default section.
In this patch, we take another approach: first we remove the
"log-balance" directive, and instead we rely on existing "balance"
directive to configure log load balancing in log backend.
Some algorithms such as roundrobin can be used as-is in a log backend,
while log-only algorithms are implemented as "log-$name" inside the
"balance" directive.
The documentation was updated accordingly.
If fewer connections than threads are established on a reverse-http gateway
and these servers have a non-null pool-min-conn, then conn_backend_get()
will refrain from picking available connections from other threads. But
this makes no sense for protocols for which there is no ->connect(),
since there's no way the current thread will manage to establish its own
connection. For such situations we should always accept to use another
thread's connection. That's precisely what this patch does.
Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion in available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.
Since the following commit, idle conns are stored in a list as secondary
storage to retrieve them in usage order :
5afcb686b9
MAJOR: connection: purge idle conn by last usage
The list usage has been extended wherever connection lookups are done
on both idle and safe trees. This reduced the code size by replacing
two tree loops with a single list loop.
LIST_ELEM() is used in this context to retrieve the first idle list
element from the server list head. However, macro usage was wrong due to
an extra '&' operator which returns an invalid connection reference.
Most of the time this will cause a crash in conn_delete_from_tree() or
affiliated functions.
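The kind of mistake described above can be illustrated with a minimal sketch. The list layout, the simplified LIST_ELEM()-style macro, struct conn and first_idle_conn() below are assumptions made for illustration only; they are not HAProxy's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified doubly-linked list head/element (illustrative layout). */
struct list {
	struct list *n; /* next */
	struct list *p; /* prev */
};

/* Simplified container-of style macro: from a pointer <lh> to the list
 * element embedded as member <el>, return the enclosing <type>.
 */
#define LIST_ELEM(lh, type, el) \
	((type *)((char *)(lh) - offsetof(type, el)))

/* Hypothetical connection carrying its idle list element. */
struct conn {
	int id;
	struct list idle_list;
};

/* Return the first connection attached to <head>, or NULL if empty. */
struct conn *first_idle_conn(struct list *head)
{
	assert(head != NULL);
	if (head->n == head)
		return NULL;
	/* Correct: pass the element pointer itself (head->n).
	 * Passing &head->n instead would hand over the address of the
	 * head's own "n" field and yield a bogus connection pointer.
	 */
	return LIST_ELEM(head->n, struct conn, idle_list);
}
```

With the extra '&', the subtraction is applied to an address inside the list head rather than to the element, which is why the resulting pointer does not reference a valid connection.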
This bug only occurs if the FD pool is exhausted and some idle
connections are selected to be killed.
It can be reproduced using the following config and h2load command :
  $ h2load -t 8 -c 800 -m 10 -n 800 "http://127.0.0.1:21080/?s=10k"
  global
    maxconn 100
  defaults
    mode http
    timeout connect 20s
    timeout client 20s
    timeout server 20s
  listen li
    bind :21080 proto h2
    server nginx 127.99.0.1:30080 proto h1
This bug has been introduced by the above commit. Thus no need to
backport this fix.
Note that LIST_ELEM() macro usage was slightly adjusted also in
srv_migrate_conns_to_remove(). The function used toremove_list instead
of idle_list connection list element. This is not a bug as they are
stored in the same union. However, the new code is clearer as it intends
to move connection from the idle_list only into the toremove_list
mt-list.
Idle connections are both stored in an idle/safe tree and in an idle
list. The list is used as a secondary storage to be able to retrieve
them by usage order.
If a connection is moved into the available tree, it must not be present
in the idle list. A BUG_ON() was written to check this but was placed at
the wrong code section. Fix this by removing the misplaced one and writing
new ones for avail_conns tree insertion and lookup.
The impact of this bug is minor as the misplaced BUG_ON() did not seem
to be triggered.
No need to backport.
Because the channel_is_empty() function now only checks the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.
channel_is_empty() is thus removed.
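The inversion of the tests can be sketched as follows. The struct channel below is a hypothetical stand-in that only models the amount of pending output data, and nothing_left_to_send() is an illustrative helper name, not an existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for a channel: only the amount of
 * pending output data is modeled here.
 */
struct channel {
	size_t output;
};

/* Return the amount of output data pending in <chn> (simplified). */
size_t co_data(const struct channel *chn)
{
	assert(chn != NULL);
	return chn->output;
}

/* Before: if (channel_is_empty(chn))   ->  after: if (!co_data(chn))
 * Before: if (!channel_is_empty(chn))  ->  after: if (co_data(chn))
 */
int nothing_left_to_send(const struct channel *chn)
{
	return !co_data(chn);
}
```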
hash lb algorithm can be configured with the "log-balance hash <cnv_list>"
directive. With this algorithm, the user specifies a converter list with
<cnv_list>.
The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.
Instead of systematically computing the avalanche hash right after the
gen_hash() call, do it inside the gen_hash() function directly to ensure
avalanche setting is always considered.
Allow the use of the "none" hash-type function so that the key resulting
from the sample expression is directly used as the hash.
This can be useful to do the hashing manually using available hashing
converters, or even custom ones, and then inform haproxy that it can
directly rely on the sample expression result, which is explicitly handled
as an integer in this case.
In this patch we add basic support for the random algorithm:
random algorithm picks a random server using the result of the
statistical_prng() function as if it was a hash key to then compute the
related server ID.
There is no support for the <draw> parameter (which is implemented for
tcp/http load-balancing), because we don't have the required metrics to
evaluate server's load in log backends for the moment. Plus it would add
more complexity to the __do_send_log_backend() function so we'll keep it
this way for now but this might be needed in the future.
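As a rough sketch of the idea described above, the snippet below draws a pseudo-random value and treats it as a hash key to derive a server ID. The xorshift generator is only a stand-in for statistical_prng(), and the plain modulo mapping as well as all function names are assumptions, not the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a fast statistical PRNG (xorshift32). The state must be
 * non-zero. This is not HAProxy's statistical_prng() implementation.
 */
uint32_t sketch_prng(uint32_t *state)
{
	uint32_t x = *state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Map a hash key onto one of <nbsrv> servers (assumed: plain modulo). */
uint32_t key_to_server_id(uint32_t key, uint32_t nbsrv)
{
	assert(nbsrv > 0);
	return key % nbsrv;
}

/* Pick a random server ID in [0, nbsrv). */
uint32_t pick_random_server(uint32_t *state, uint32_t nbsrv)
{
	return key_to_server_id(sketch_prng(state), nbsrv);
}
```

Since no load metric is involved, every draw is independent, which matches the absence of the <draw> parameter mentioned above.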
sticky algorithm always tries to send log messages to the first server in
the farm. The server will stay in front during queue and dequeue
operations (no other server can steal its place), unless it becomes
unavailable, in which case it will be replaced by another server from
the tree.
Using "mode log" in a backend section turns the proxy into a log backend
which can be used to load-balance logs between multiple log targets
(udp or tcp servers).
log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:
| log backend@mybackend local0
A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
Refactor alloc_bind_address() function which is used to allocate a
sockaddr if a connection to a target server relies on a specific source
address setting.
The main objective of this change is to be able to use this function
outside of backend module, namely for preconnections using a reverse
server. As such, this function is now exported globally.
For reverse connect, there is no stream instance. As such, the function
parts which relied on it were reduced to the minimal. Now, stream is
only used if a non-static address is configured which is useful for
usesrc client|clientip|hdr_ip. These options make no sense for reverse
connect so it should be safe to use the same function.
Backend idle connections are purged on a recurring basis during the
process lifetime. An estimated number of needed connections is
calculated and the excess is removed periodically.
Before this patch, purge was done directly using the idle then the safe
connection tree of a server instance. This has the major drawback of
not following any specific order: it may remove functional connections
while leaving ones which will fail on the next reuse.
The problem can be worse when using criteria to differentiate idle
connections such as the SSL SNI. In this case, purge may remove
connections with a high reuse rate while leaving connections whose
criteria never matched once, thus drastically reducing the reuse rate.
To improve this, introduce an alternative storage for idle connections,
used in parallel with the idle/safe trees. Now, each connection inserted
in one of these trees is also inserted in the new list at
`srv_per_thread.idle_conn_list`. This guarantees that recently used
connections are present at the end of the list.
During the purge, use this list instead of the idle/safe trees. Remove
first the connections at the front of the list, which were not reused
recently. This ensures that connections that are frequently reused are
not purged and should increase the reuse rate, particularly if distinct
idle connection criteria are in use.
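The usage-ordered list can be sketched as below: a connection is moved to the tail each time it is used, and the purge removes entries from the head. The list helpers, struct idle_conn and the function names are illustrative assumptions; the real code also has to keep the idle/safe trees in sync and to handle locking.

```c
#include <assert.h>
#include <stddef.h>

struct list {
	struct list *n, *p;
};

/* Hypothetical idle connection tracked in a usage-ordered list. */
struct idle_conn {
	int id;
	struct list idle_list;
};

void list_init(struct list *l)
{
	l->n = l->p = l;
}

/* Detach <el> from its list and leave it self-linked. */
void list_del(struct list *el)
{
	el->n->p = el->p;
	el->p->n = el->n;
	el->n = el->p = el;
}

/* Append <el> at the tail of <head>. */
void list_append(struct list *head, struct list *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

/* Mark <c> as just used: move it to the tail (most recent end). */
void idle_conn_touch(struct list *head, struct idle_conn *c)
{
	list_del(&c->idle_list);
	list_append(head, &c->idle_list);
}

/* Remove up to <excess> connections starting from the head, i.e. the
 * least recently used ones. Returns the number of removed entries.
 */
int idle_conn_purge(struct list *head, int excess)
{
	int removed = 0;

	assert(excess >= 0);
	while (removed < excess && head->n != head) {
		list_del(head->n);
		removed++;
	}
	return removed;
}
```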
Small change of API for conn_delete_from_tree(). Now the connection
instance is taken as argument instead of its inner node.
No functional change introduced with this commit. This simplifies
slightly the invocation of conn_delete_from_tree(). The most useful change
is that this function will be extended in the next patch to be able to
remove the connection from its new idle list at the same time as from its
idle tree.
A reverse server relies solely on its pool of idle connections to
transfer requests. This pool will be populated through a new tcp-request
rule 'attach-srv'.
Several changes are required on connect_server() to implement this.
First, reuse mode is forced to "always" for this type of server. Then, if
no idle connection is found, the request will be aborted. This results
in a 503 HTTP error code, similarly to when no server is available.
A connection contains extra elements which are only used for the backend
side. Regroup their allocation and deallocation in two new functions
named conn_backend_init() and conn_backend_deinit().
No functional change is introduced with this commit. The new functions
are reused in place of manual alloc/dealloc in conn_new() / conn_free().
This patch will be useful for reverse connect support with connection
conversion from backend to frontend side and vice-versa.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never null and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
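The wrapping period quoted above can be checked with a short computation, and the simplified date comparison can be sketched alongside it. Both helper names are hypothetical, and the year length of 365.25 days is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole years (365.25 days each) before a 64-bit nanosecond
 * counter wraps.
 */
uint64_t ns_counter_wrap_years(void)
{
	const uint64_t ns_per_sec = 1000000000ULL;
	const uint64_t sec_per_year = 31557600ULL; /* 365.25 * 86400 */

	return UINT64_MAX / ns_per_sec / sec_per_year;
}

/* With a counter that never wraps in practice, ordering two timestamps
 * is a plain comparison, and 0 can be kept to mean "not set yet".
 */
int ts_is_before(uint64_t ts1_ns, uint64_t ts2_ns)
{
	assert(ts1_ns != 0 && ts2_ns != 0); /* 0 is reserved for "unset" */
	return ts1_ns < ts2_ns;
}
```

The integer division yields 584 whole years (about 584.5 in total), which is consistent with the rounded figure of 585 years.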
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One open question concerns global_now_ms, which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or in just using global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems; the difference might come from
smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
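The conversion described above (both operands to ns, subtract, then convert to ms) can be sketched as follows. The sketch_ prefixed names are hypothetical and the code assumes non-negative timeval fields; it is not the actual helper set from the code base.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Convert a timeval to nanoseconds (assumes non-negative fields). */
uint64_t sketch_tv_to_ns(const struct timeval *tv)
{
	assert(tv->tv_sec >= 0 && tv->tv_usec >= 0);
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	       (uint64_t)tv->tv_usec * 1000ULL;
}

/* Milliseconds elapsed from <from> to <to>, truncated toward zero.
 * The subtraction is done on unsigned values and then reinterpreted as
 * signed, so a negative interval is reported as a negative result.
 */
int64_t sketch_elapsed_ms(const struct timeval *from, const struct timeval *to)
{
	int64_t diff_ns = (int64_t)(sketch_tv_to_ns(to) - sketch_tv_to_ns(from));

	return diff_ns / 1000000;
}
```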
When compiled in debug mode, HAProxy prints a debug message at the beginning
of assign_server(). It is pretty annoying and useless because, in debug
mode, we can activate stream traces. Thus, just remove it.
SE_FL_ERROR flag is no longer set when an error is detected during the
connection establishment. SC_FL_ERROR flag is set instead. So it is safe to
remove the test on SE_FL_ERROR to detect connection establishment errors.
We can now fully rely on SC_FL_ERROR flag from the stream. The first step is
to stop setting the SE_FL_ERROR flag. Only endpoints are responsible for
setting this flag. It was a design limitation. It is now fixed.
From the stream, when SE_FL_ERROR flag is tested, we now also test the
SC_FL_ERROR flag. The idea is to stop relying on the SE descriptor to detect
errors.
Because shutdowns for reads are now considered as aborts, the shutdowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connector. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because in many places, we were testing if a channel was shut for
reads and writes at the same time. To do the same, shut for reads must be
tested on one side on the SC and shut for writes on the other side on the
opposite SC. Special care was taken with process_stream(): flags of SCs must
be saved to be able to detect changes, just like for the channels.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers don't take care of it. This will avoid some
bugs.
This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on
SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write
errors and SE_FL_ERROR for read or unrecoverable errors.
When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a
read event and a write event are reported to be sure the stream will properly
process the error. At the stream-connector level, it is similar. When an error
is reported during a send, a write event is triggered. On the read side, nothing
more is performed because an error at this stage is enough to wake the stream
up.
A major change is brought with this patch. We stop checking flags of the
opposite channel to report abort or timeout. It also means that when a read
or write error is reported on a side, we no longer update the other side.
Thus a read error on the server side no longer leads to a write error on the
client side. This should ease error reporting.
CF_READ_PARTIAL flag is now merged with CF_READ_EVENT. It means
CF_READ_EVENT is set when a read0 is received (formerly CF_READ_NULL) or when
data are received (formerly CF_READ_ACTIVITY).
There is nothing special here, except conditions to wake the stream up in
sc_notify(). Indeed, the test was a bit changed to reflect recent
change. read0 event is now formalized by (CF_READ_EVENT + CF_SHUTR).
As for CF_READ_NULL, it appears CF_WRITE_NULL and other write events on a
channel are mainly used to wake up the stream and may be replaced by one
write event.
In this patch, we introduce CF_WRITE_EVENT flag as a replacement for
CF_WRITE_NULL. There is no breaking change for now, it is just a
rename. Gradually, other write events will be merged with this one.
In order to evenly pick idle connections from other threads, there is
a "next_takeover" index in the server, that is incremented each time
a connection is picked from another thread, and indicates which one to
start from next time.
With thread groups this doesn't work well because the index is the same
regardless of the group, and if a group has more threads than another,
there's even a risk to reintroduce an imbalance.
This patch introduces a new per-tgroup storage in servers which, for now,
only contains an instance of this next_takeover index. This way each
thread will now only manipulate the index specific to its own group, and
the takeover will become fair again. More entries may come soon.
In github issue #1878, Bart Butler reported observing turn-around states
(1 second pause) after connection retries going to different servers,
while this ought not happen.
In fact it does happen because back_handle_st_cer() enforces the TAR
state for any algo that's not round-robin. This means that even leastconn
has it, as well as hashes after the number of servers changed.
Prior to doing that, the call to stream_choose_redispatch() has already
had a chance to perform the correct choice and to check the algo and
the number of retries left. So instead we should just let that function
deal with the algo when needed (and focus on deterministic ones), and
let the former just obey. Bart confirmed that the fixed version works
as expected (no more delays during retries).
This may be backported to older releases, though it doesn't seem very
important. At least Bart would like to have it in 2.4 so let's go there
for now after it has cooked a few weeks in 2.6.
Idle connections do not work on 32-bit machines due to an alignment issue
causing the connection nodes to be indexed with their lower 32-bits set to
zero and the higher 32 ones containing the 32 lower bits of the hash. The
cause is the use of ebmb_node with an aligned data, as on this platform
ebmb_node is only 32-bit aligned, leaving a hole before the following hash
which is a uint64_t:
  $ pahole -C conn_hash_node ./haproxy
  struct conn_hash_node {
          struct ebmb_node           node;                 /*     0    20 */
          /* XXX 4 bytes hole, try to pack */
          int64_t                    hash;                 /*    24     8 */
          struct connection *        conn;                 /*    32     4 */
          /* size: 40, cachelines: 1, members: 3 */
          /* sum members: 32, holes: 1, sum holes: 4 */
          /* padding: 4 */
          /* last cacheline: 40 bytes */
  };
Instead, eb64 nodes should be used when it comes to simply storing a
64-bit key, and that is what this patch does.
For backports, a variant consisting in simply marking the "hash" member
with a "packed" attribute on the struct also does the job (tested), and
might be preferable if the fix is difficult to adapt. Only 2.6 and 2.5
are affected by this.
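The layout issue and the "packed" variant mentioned for backports can be reproduced with a small sketch. struct node20 is a hypothetical 20-byte, 32-bit aligned stand-in for the node as laid out on the affected platform; the packed attribute is a GCC/Clang extension, and the hole only appears where int64_t requires more than 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 20-byte node that is only 32-bit aligned. */
struct node20 {
	uint32_t words[5];
};

static_assert(sizeof(struct node20) == 20, "node20 must be 20 bytes");

/* Naturally aligned key: a hole may appear between node and hash. */
struct hash_node_plain {
	struct node20 node;
	int64_t hash;
};

/* Packed key (GCC/Clang attribute): hash immediately follows node. */
struct hash_node_packed {
	struct node20 node;
	int64_t hash __attribute__((packed));
};

/* Size of the hole between the end of node and the start of hash. */
size_t plain_hole(void)
{
	return offsetof(struct hash_node_plain, hash) - sizeof(struct node20);
}

size_t packed_hole(void)
{
	return offsetof(struct hash_node_packed, hash) - sizeof(struct node20);
}
```

Code that expects the key to start right after the node reads the hole instead of the first bytes of the hash in the plain layout, which matches the symptom described above.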
The "first req" rule consists in not delivering a connection's first
request to a connection that's not known for being safe so that we
don't deliver a broken page to a client if the server didn't intend to
keep it alive. That's what's used by "http-reuse safe" particularly.
But the reason this rule was created was precisely because haproxy was
not able to re-emit the request to the server in case of connection
breakage, which is precisely what l7 retries later brought. As such,
there's no reason for enforcing this rule when l7 retries are properly
enabled because such a blank page will trigger a retry and will not be
delivered to the client.
This patch simply checks that the l7 retries are enabled for the 3 cases
that can be triggered on a dead or dying connection (failure, empty, and
timeout), and if all 3 are enabled, then regular idle connections can be
reused.
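The check described above can be sketched as a simple mask test. The flag names and values below are hypothetical placeholders for the three retry conditions, and may_reuse_idle_conn() is an illustrative helper, not the function actually modified by the patch.

```c
#include <assert.h>

/* Hypothetical flag values for the three L7 retry conditions. */
#define RETRY_ON_CONN_FAILURE   0x01u
#define RETRY_ON_EMPTY_RESPONSE 0x02u
#define RETRY_ON_RESP_TIMEOUT   0x04u
#define RETRY_ON_ALL_DEAD_CONN  (RETRY_ON_CONN_FAILURE | \
                                 RETRY_ON_EMPTY_RESPONSE | \
                                 RETRY_ON_RESP_TIMEOUT)

static_assert(RETRY_ON_ALL_DEAD_CONN == 0x07u, "three distinct bits expected");

/* Return non-zero if a request may be sent over a regular idle
 * connection: either it is not the first request of its client
 * connection, or all three retry conditions are enabled so that a
 * failure on a dead connection is retried instead of being delivered.
 */
int may_reuse_idle_conn(unsigned int retry_flags, int not_first_req)
{
	if (not_first_req)
		return 1;
	return (retry_flags & RETRY_ON_ALL_DEAD_CONN) == RETRY_ON_ALL_DEAD_CONN;
}
```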
This could almost be marked as a bug fix because a lot of users relying
on l7 retries do not necessarily think about using http-reuse always due
to the recommendation against it in the doc, while the protection that
the safe mode offers is never used in that mode, and it forces the http
client not to reuse existing persistent connections since it never sets
the "not first" flag.
It could also be decided that the protection is not used either when
the origin is an applet, as in this case this is internal code that
we can decide to let handle the retry by itself (all info are still
present). But at least the httpclient will be happy with this alone.
It would make sense to backport this at least to 2.6 in order to let
the httpclient reuse connections, maybe to older releases if some
users report low reuse counts.
The connection retry counter is incremented too early when a connection
fails. In SC_ST_CER state, error handling must be performed before
incrementing the counter. Otherwise, we may consider that the maximum number
of connection attempts is reached while a last one is in fact possible.
This patch must be backported to 2.6.
If the loadbalancing is performed on the source IP address, an internal
error was returned on error. So for an applet on the client side (for
instance an SPOE applet) or for a client connected to a unix socket, an
internal error is returned.
However, when other LB algos fail, a fallback on round-robin is
performed. There is no reason not to do the same here.
This patch should fix the issue #1797. It must be backported to all
supported versions.
We don't want to pick idle connections from another thread group,
this would be very slow by forcing to share undesirable data.
This patch makes sure that we start seeking from the current thread
group's threads only and loops over that range exclusively.
It's worth noting that the next_takeover pointer remains per-server
and will bounce when multiple groups use it at the same time. But we
preserve the perturbation by applying a modulo when retrieving it,
so that when groups are of the same size (most common case), the
index will not even change. At this time it doesn't seem worth
storing one index per group in servers, but that might be an option
if any contention is detected later.
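A minimal sketch of the group-restricted scan is given below. It assumes that each group owns a contiguous range of thread IDs and that the shared index is reduced with a modulo on the group's thread count; struct tgroup_range and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical description of a thread group's contiguous thread range. */
struct tgroup_range {
	unsigned int first; /* ID of the group's first thread */
	unsigned int count; /* number of threads in the group */
};

/* Return the thread to start scanning from, given the shared per-server
 * <next_takeover> index. The modulo keeps the result inside the group.
 */
unsigned int takeover_start(const struct tgroup_range *tg,
                            unsigned int next_takeover)
{
	assert(tg->count > 0);
	return tg->first + next_takeover % tg->count;
}

/* Return the thread following <thr> within the group, wrapping around
 * to the group's first thread after the last one.
 */
unsigned int takeover_next(const struct tgroup_range *tg, unsigned int thr)
{
	assert(thr >= tg->first && thr < tg->first + tg->count);
	return tg->first + (thr - tg->first + 1) % tg->count;
}
```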
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The HTTP analyser and the backend functions
were all updated after being reviewed. Function stream_update_both_cs()
was renamed to stream_update_both_sc()
There's no more reason for keeping the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This also follows the natural naming. There are roughly 238 changes, all
totally trivial. conn_stream-t.h has become completely void of any
"conn_stream" related stuff now (except its name).
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
Just like for the appctx, this is a pointer to a stream endpoint descriptor,
so let's make this explicit and not confuse it with the full endpoint. There
are very few changes thanks to the preliminary refactoring of the flags
manipulation.
This changes all main uses of cs->endp->flags to the sc_ep_*() equivalent
by applying coccinelle script cs_endp_flags.cocci.
Note: 143 locations were touched, manually reviewed and found to be OK,
except a single one that was adjusted in cs_reset_endp() where the flags
are read and filtered to be used as-is and not as a boolean, hence was
replaced with sc_ep_get() & $FLAGS.
The script was applied with all includes:
spatch --in-place --recursive-includes -I include --sp-file $script $files
The mux ->detach() function currently takes a conn_stream. This causes
an awkward situation where the caller cs_detach_endp() has to partially
mark it as released but not completely so that ->detach() finds its
endpoint and context, and it cannot be done later since it's possible
that ->detach() deletes the endpoint. As such the endpoint link between
the conn_stream and the mux's stream is in a transient situation while
we'd like it to be clean so that the mux's ->detach() code can call any
regular function it wants that knows the regular semantics of the
relation between the CS and the endpoint.
A better approach consists in slightly modifying the detach() API to
better match the reality, which is that the endpoint is detached but
still alive and that it's the only part the function is interested in.
As such, this patch modifies the function to take an endpoint there,
and by analogy (or simplicity) does the same for ->attach(), even
though it looks less important there since we're always attaching an
endpoint to a conn_stream anyway. It is possible that in the future
the API could evolve to use more endpoints that provide a bit more
flexibility in the API, but at this point we don't need to go further.
This flag is no longer needed now that it must always match the presence
of a destination address on the backend conn_stream. Worse, before the
previous patch, if it were to be accidentally removed while the address is
present, it could result in a leak of that address since
alloc_dst_address() would first be called to flush it.
Its usage has a long history where addresses were stored in an area shared
with the connection, but as this is no longer the case, there's no reason
for putting this burden onto application-level code that should not focus
on setting obscure flags.
The only place where that made a small difference is in the dequeuing code
in case of queue redistribution, because previously the code would first
clear the flag, and only later when trying to deal with the queue, would
release the address. It's not even certain whether there would exist a
code path going to connect_server() without calling pendconn_dequeue()
first (e.g. retries on queue timeout maybe?).
Now the pendconn_dequeue() code will rely on SF_ASSIGNED to decide to
clear and release the address, since that flag is always set while in
a server's queue, and its clearance implies that we don't want to keep
the address. At least it remains consistent and there's no more risk of
leaking it.
These functions dynamically allocate a source or destination address but
start by clearing the previous one. There's a non-zero risk of leaking
addresses there in case of misuse. Better have them do nothing if the
address was already allocated.
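The "do nothing if already allocated" behaviour can be sketched as below. The sketch uses calloc()/free() and hypothetical sketch_ function names; the actual functions rely on the project's own allocator and have different signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Allocate <*addr> only if it is not already set. Returns 1 on success
 * (including when the address already existed), 0 on allocation failure.
 */
int sketch_alloc_addr(struct sockaddr_storage **addr)
{
	assert(addr != NULL);
	if (*addr)
		return 1; /* already allocated: keep it, never overwrite it */
	*addr = calloc(1, sizeof(**addr));
	return *addr != NULL;
}

/* Release <*addr> and reset the pointer so it can be allocated again. */
void sketch_free_addr(struct sockaddr_storage **addr)
{
	assert(addr != NULL);
	free(*addr);
	*addr = NULL;
}
```

Calling the allocation function twice in a row returns the same address, so a misplaced second call can no longer drop a pointer that is still owned.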
Only CS_EP_ERROR flag is now removed from the endpoint when a reset is
performed. When a new endpoint is allocated, flags are preserved. It is
the caller's responsibility to remove other flags, depending on its needs.
Concretely, during a connection retry or a L7 retry, we must preserve
flags. In tcpcheck and the CLI, we reset flags.
This patch is 2.6-specific. No backport needed.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Almost all of our hash-based LB algorithms are implemented as special
cases of something that can now be achieved using sample expressions,
and some of them have adopted some options to adapt their behavior in
ways that could also be achieved using converters.
There are users who want to hash other parameters that are combined
into variables, and who set headers from these values and use
"balance hdr(name)" for this.
Instead of constantly implementing specific options and having users
hack around when they want a real hash, let's implement a native hash
mode that applies to a standard sample expression. This way, any
fetchable element (including variables) may be used to construct the
hash, even modified by any converter if desired.
Since the idle connections management changed to use eb-trees instead of MT
lists, a lock must be acquired to manipulate a server's idle/safe/available
connection lists. However, there remains an unprotected use in
connect_server(), when a connection is removed from an idle list if the mux
has no more streams available. Thus it is possible to remove a connection
from an idle list on a thread, while another one is looking for an idle
connection. Of course, this may lead to a crash.
To fix the bug, we must take care to acquire the idle connections lock
first. The bug was introduced by the commit f232cb3e9 ("MEDIUM: connection:
replace idle conn lists by eb trees").
The patch must be backported as far as 2.4.
When an appctx is created on the server side, we now set the corresponding
conn-stream to ready state (CS_ST_RDY). When it happens, the backend
conn-stream is in CS_ST_INI state. It is not consistent to leave the
conn-stream in this state because it means it is possible to have a target
installed in CS_ST_INI state, while with a connection, the conn-stream is
switched to CS_ST_RDY or CS_ST_EST state.
It is especially ambiguous because we may be tempted to think there is no
endpoint attached to the conn-stream before the CS_ST_CON state. And it is
indeed the reason for a bug leading to a crash because a cs_detach_endp() is
performed if an abort is detected on the backend conn-stream in CS_ST_INI
state. With a mux or an appctx attached to the conn-stream, "->endp" field is
set to NULL. It is unexpected. The API will be changed to be sure it is not
possible. But it exposes a consistency issue with applets.
So, the conn-stream must not stay in CS_ST_INI state when an appctx is
attached. But there is no reason to set it in CS_ST_REQ. The conn-stream
must be set to CS_ST_RDY to handle applets and connections in the same
way. Note that if only the target is set but no appctx is created, the
backend conn-stream is switched from CS_ST_INI to CS_ST_REQ state to be able
to create the corresponding appctx. This part is unchanged.
This patch depends on the commit "MINOR: backend: Don't allow to change
backend applet".
The ambiguity exists on previous versions. But the issue is
2.6-specific. Thus, no backport is needed.
This part was inherited from haproxy-1.5. But for a while now (at least
since 1.8), the backend applet, once created, is no longer changed. Thus
there is no
reason to still check if the target has changed. And in fact, if it was
still possible, there would be a memory leak because the old applet would be
lost and never released.
There is no reason to backport this fix because the leak only exists on a
dead code path.
si_register_applet() and si_applet_release() are renamed
cs_register_applet() and cs_applet_release() and now manipulate a
conn-stream instead of a stream-interface.
si_shutr(), si_shutw(), si_chk_rcv() and si_chk_snd() are moved in the
conn-stream scope and renamed, respectively, cs_shutr(), cs_shutw(),
cs_chk_rcv(), cs_chk_snd() and manipulate a conn-stream instead of a
stream-interface.
To be able to move wait_event from the stream-interface to the conn-stream,
we must be prepared to handle errors when a mux is attached to a conn-stream.
Indeed, the wait_event's tasklet will be allocated when both a mux and a
stream are attached to a conn-stream. So, we must be prepared to handle
allocation errors.
si_connect() is moved in backend.c and renamed as do_connect_server(). In
addition, the function now manipulates a stream instead of a
stream-interface.
The stream-interface state (SI_ST_*) is now in the conn-stream. It is a
mechanical replacement for now. Nothing special. SI_ST_* and SI_SB_* were
renamed accordingly. Utils functions to manipulate these infos were moved
under the conn-stream scope.
But it could be good to keep in mind that this part should be
reworked. Indeed, at the CS level, we only need to know if it is ready to
receive or to send. The state of conn-stream from INI to EST is only used on
the server side. The client CS is immediately set to EST. Thus current
SI_ST_* states should probably be moved to the stream to reflect the server
connection state during the establishment stage.
Only the server side is concerned by the stream-interface error type. It is
useless to have an err_type field on the client side. So, it is now moved to
the stream. SI_ET_* are renamed STRM_ET_* and moved to the stream-t.h header
file.
The flag to get the source IP/port with getsockname() is now handled at the
stream level. Thus the SI_FL_SRC_ADDR stream-int flag is replaced by the
SF_SRC_ADDR stream flag.
Flags to disable lingering and half-close are now handled at the conn-stream
level. Thus SI_FL_NOLINGER and SI_FL_NOHALF stream-int flags are replaced by
CS_FL_NOLINGER and CS_FL_NOHALF conn-stream flags.
Instead of relying on the conn-stream error, via CS_FL_ERR flags, we now
directly use the error at the endpoint level with the flag CS_EP_ERROR. It
should be safe to do so. But we must be careful because it is still possible
that an error is processed too early. Anyway, a conn-stream always has a
valid endpoint, possibly detached from any mux or applet, but valid.
SI_FL_ERR is removed and replaced by CS_FL_ERROR. It is a transient patch
because the idea is to rely on the endpoint to handle errors at this
level. But if for any reason it is not possible, the stream-interface flags
will still be replaced.
The expiration date in the stream-interface was only used on the server side
to set the connect, queue or turn-around timeout. It was checked on the
frontend stream-interface, but never used concretely. So it was removed and
replaced by a connect expiration date in the stream itself. Thus, SI_FL_EXP
flag in stream-interfaces is replaced by a stream flag, SF_CONN_EXP.
The source and destination addresses at the applicative layer are moved from
the stream-interface to the conn-stream. This simplifies the code a bit and
it is a logical step toward removing the stream-interface.
The conn_retries counter was set to the max value and decremented at each
connection retry. Thus the counter reflected the number of retries left and
not the real number of retries. All calculations of redispatch or reporting
of the number of retries experienced were made by subtracting from the
configured retries, which was complicated and didn't bring any benefit.
Now, this counter is set to 0 and incremented at each retry. We know we've
reached the maximum allowed connection retries by comparing it to the
configured value. In all other cases, we directly use the counter.
This patch should address the feature request #1608.
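As an illustration only, the new counting scheme can be sketched as below.
The demo_* names and the reduced structure are hypothetical and are not the
actual HAProxy code; the sketch merely assumes a counter that starts at 0, is
incremented on each retry and is compared to the configured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; this is NOT the HAProxy stream. */
struct demo_stream {
	int conn_retries;	/* retries already performed, starts at 0 */
};

/* Record one more connection retry. */
void demo_count_retry(struct demo_stream *s)
{
	assert(s != NULL);
	s->conn_retries++;
}

/* The limit is reached once the counter reaches the configured value. */
bool demo_retries_exhausted(const struct demo_stream *s, int max_retries)
{
	assert(s != NULL);
	return s->conn_retries >= max_retries;
}

/* Retries experienced so far: the counter is reported as-is. */
int demo_retries_done(const struct demo_stream *s)
{
	assert(s != NULL);
	return s->conn_retries;
}
```

With this layout, reporting code reads the counter directly and only the
limit check needs the configured value, so no subtraction is involved.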
The conn_retries counter may be moved into the stream structure. It only
concerns the connection establishment. The frontend stream-interface does not
use it. So it is a logical change.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible, the conn-stream is used. Of course, many calls to the
stream-interface API remain.
All old flags CS_FL_* are now moved in the endpoint scope and renamed
CS_EP_* accordingly. It is a systematic replacement. There is no true change
except for the health-check and the endpoint reset. Here it is a bit special
because the same conn-stream is reused. Thus, we must handle endpoint
allocation errors. To do so, cs_reset_endp() has been adapted.
Thanks to this last change, it will now be possible to simplify the
multiplexer and probably the applets too. A review must also be performed to
remove some flags in the channel or the stream-interface. The HTX will
probably be simplified too. Finally, there is now some place in the
conn-stream to move info from the stream-interface.
The conn-stream endpoint is now shared between the conn-stream and the
applet or the multiplexer. If the mux or the applet is created first, it is
responsible to also create the endpoint and share it with the conn-stream.
If the conn-stream is created first, it is the opposite.
When the endpoint is only owned by an applet or a mux, it is called an
orphan endpoint (there is no conn-stream). When it is only owned by a
conn-stream, it is called a detached endpoint (there is no mux/applet).
The last entity that owns an endpoint is responsible to release it. When a
mux or an applet is detached from a conn-stream, the conn-stream
relinquishes the endpoint to recreate a new one. This way, the endpoint
state is never lost for the mux or the applet.
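The ownership rules above can be modelled with a rough sketch. Everything
below (the demo_* names, the two ownership booleans, the detach helper) is
hypothetical and only assumes the behaviour described in this message; it is
not the real endpoint implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a shared endpoint; fields are illustrative only. */
struct demo_endp {
	bool owned_by_cs;	/* a conn-stream references the endpoint */
	bool owned_by_target;	/* a mux or an applet references the endpoint */
};

struct demo_cs {
	struct demo_endp *endp;
};

enum demo_endp_kind { DEMO_SHARED, DEMO_ORPHAN, DEMO_DETACHED, DEMO_UNOWNED };

struct demo_endp *demo_endp_new(bool by_cs, bool by_target)
{
	struct demo_endp *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->owned_by_cs = by_cs;
	e->owned_by_target = by_target;
	return e;
}

/* Orphan: only the mux/applet owns it. Detached: only the conn-stream. */
enum demo_endp_kind demo_endp_classify(const struct demo_endp *e)
{
	assert(e != NULL);
	if (e->owned_by_cs && e->owned_by_target)
		return DEMO_SHARED;
	if (e->owned_by_target)
		return DEMO_ORPHAN;
	if (e->owned_by_cs)
		return DEMO_DETACHED;
	return DEMO_UNOWNED;
}

/* The last owner frees the endpoint. Returns true if it was freed. */
bool demo_endp_release_cs(struct demo_endp *e)
{
	assert(e != NULL);
	e->owned_by_cs = false;
	if (e->owned_by_target)
		return false;
	free(e);
	return true;
}

bool demo_endp_release_target(struct demo_endp *e)
{
	assert(e != NULL);
	e->owned_by_target = false;
	if (e->owned_by_cs)
		return false;
	free(e);
	return true;
}

/* Detach the mux/applet: the old endpoint is left to it as an orphan and
 * the conn-stream gets a fresh detached endpoint. Returns the old endpoint,
 * or NULL on allocation failure (the conn-stream is then left untouched).
 */
struct demo_endp *demo_cs_detach(struct demo_cs *cs)
{
	struct demo_endp *old, *fresh;

	assert(cs != NULL && cs->endp != NULL);
	fresh = demo_endp_new(true, false);
	if (!fresh)
		return NULL;
	old = cs->endp;
	old->owned_by_cs = false;
	cs->endp = fresh;
	return old;
}
```

In this sketch the mux or applet keeps the old endpoint, and thus its state,
after a detach, while the conn-stream continues with a new one.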
It is a transient commit to prepare next changes. Now, when a conn-stream is
created from an applet or a multiplexer, an endpoint is always provided. In
addition, the API to create a conn-stream was specialized to have one
function per type.
The next step will be to share the endpoint structure.
Some CS flags, only related to the endpoint, are moved into the endpoint
struct. More will probably be moved later. These ones are not critical. So it
is pretty safe to move them now and this will ease next changes.
This change is only significant for the multiplexer part. For the applets,
the context and the endpoint are the same. Thus, there is not much change. For
the multiplexer part, the connection was used to set the conn-stream
endpoint and the mux's stream was the context. But it is a bit strange
because once a mux is installed, it takes over the connection. In a
wonderful world, the connection should be totally hidden behind the mux. The
stream-interface and, to a lesser extent, the stream, still access the
connection because that was inherited from the pre-multiplexer era.
Now, the conn-stream endpoint is the mux's stream (an opaque entity for the
conn-stream) and the connection is the context. Dedicated functions have
been added to attach an applet or a mux to a conn-stream.
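A minimal sketch of the resulting layout, assuming a conn-stream that holds
two opaque pointers, could look like this. The demo_* types and functions are
hypothetical placeholders and do not reflect the real HAProxy API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real ones are much richer. */
struct demo_conn { int fd; };
struct demo_mux_stream { int id; };
struct demo_appctx { int id; };

struct demo_cs {
	void *endp;	/* opaque endpoint: the mux's stream or the appctx */
	void *ctx;	/* context: the connection for a mux, the appctx for an applet */
};

/* For a mux, the endpoint is the mux's stream and the connection is the context. */
void demo_cs_attach_mux(struct demo_cs *cs, struct demo_mux_stream *ms,
                        struct demo_conn *conn)
{
	assert(cs != NULL);
	cs->endp = ms;
	cs->ctx = conn;
}

/* For an applet, the endpoint and the context are the same object. */
void demo_cs_attach_applet(struct demo_cs *cs, struct demo_appctx *appctx)
{
	assert(cs != NULL);
	cs->endp = appctx;
	cs->ctx = appctx;
}
```

Having one function per target type keeps the conn-stream unaware of what
the endpoint really is, which matches the idea of hiding the connection
behind the mux.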
Thanks to all previous changes, it is now possible to move the
stream-interface into the conn-stream. To do so, some SI functions are
removed and their conn-stream counterparts are added. In addition, the
conn-stream is now responsible to create and release the
stream-interface. While the stream-interfaces were inlined in the stream
structure, there is now a pointer in the conn-stream. stream-interfaces are
now dynamically allocated. Thus a dedicated pool is added. It is a temporary
change because, at the end, the stream-interface structure will most
probably disappear.
To be able to move the stream-interface from the stream to the conn-stream,
all access to the SI is done via the conn-stream. This patch is limited to
the backend part.