HTTP health-checks are now internally based on tcp-checks. Of course all the
configuration parsing of the "http-check" keyword and the httpchk option has
been rewritten. But the main changes is that now, as for tcp-check ruleset, it
is possible to perform several send/expect sequences into the same
health-checks. Thus the connect rule is now also available from HTTP checks, jst
like set-var, unset-var and comment rules.
Because the request defined by the "option httpchk" line is used for the first
request only, it is now possible to set the method, the uri and the version on a
"http-check send" line.
Make it so sessions are not responsible for connection anymore, except for
connections that are private, and thus can't be shared, otherwise, as soon
as a request is done, the session will just add the connection to the
orphan connections pool.
This will break http-reuse safe, but it is expected to be fixed later.
As mentioned in commit c192b0ab95 ("MEDIUM: connection: remove
CO_FL_CONNECTED and only rely on CO_FL_WAIT_*"), there is a lack of
consistency on which flags are checked among L4/L6/HANDSHAKE depending
on the code areas. A number of sample fetch functions only check for
L4L6 to report MAY_CHANGE, some places only check for HANDSHAKE and
many check both L4L6 and HANDSHAKE.
This patch starts to make all of this more consistent by introducing a
new mask CO_FL_WAIT_XPRT which is the union of L4/L6/HANDSHAKE and
reports whether the transport layer is ready or not.
All inconsistent call places were updated to rely on this one each time
the goal was to check for the readiness of the transport layer.
Commit 477902bd2e ("MEDIUM: connections: Get ride of the xprt_done
callback.") broke the master CLI for a very obscure reason. It happens
that short requests immediately terminated by a shutdown are properly
received, CS_FL_EOS is correctly set, but in si_cs_recv(), we refrain
from setting CF_SHUTR on the channel because CO_FL_CONNECTED was not
yet set on the connection since we've not passed again through
conn_fd_handler() and it was not done in conn_complete_session(). While
commit a8a415d31a ("BUG/MEDIUM: connections: Set CO_FL_CONNECTED in
conn_complete_session()") fixed the issue, such accident may happen
again as the root cause is deeper and actually comes down to the fact
that CO_FL_CONNECTED is lazily set at various check points in the code
but not every time we drop one wait bit. It is not the first time we
face this situation.
Originally this flag was used to detect the transition between WAIT_*
and CONNECTED in order to call ->wake() from the FD handler. But since
at least 1.8-dev1 with commit 7bf3fa3c23 ("BUG/MAJOR: connection: update
CO_FL_CONNECTED before calling the data layer"), CO_FL_CONNECTED is
always synchronized against the two others before being checked. Moreover,
with the I/Os moved to tasklets, the decision to call the ->wake() function
is performed after the I/Os in si_cs_process() and equivalent, which don't
care about this transition either.
So in essence, checking for CO_FL_CONNECTED has become a lazy wait to
check for (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN), but that always
relies on someone else having synchronized it.
This patch addresses it once for all by killing this flag and only checking
the two others (for which a composite mask CO_FL_WAIT_L4L6 was added). This
revealed a number of inconsistencies that were purposely not addressed here
for the sake of bisectability:
- while most places do check both L4+L6 and HANDSHAKE at the same time,
some places like assign_server() or back_handle_st_con() and a few
sample fetches looking for proxy protocol do check for L4+L6 but
don't care about HANDSHAKE ; these ones will probably fail on TCP
request session rules if the handshake is not complete.
- some handshake handlers do validate that a connection is established
at L4 but didn't clear CO_FL_WAIT_L4_CONN
- the ->ctl method of mux_fcgi, mux_pt and mux_h1 only checks for L4+L6
before declaring the mux ready while the snd_buf function also checks
for the handshake's completion. Likely the former should validate the
handshake as well and we should get rid of these extra tests in snd_buf.
- raw_sock_from_buf() would directly set CO_FL_CONNECTED and would only
later clear CO_FL_WAIT_L4_CONN.
- xprt_handshake would set CO_FL_CONNECTED itself without actually
clearing CO_FL_WAIT_L4_CONN, which could apparently happen only if
waiting for a pure Rx handshake.
- most places in ssl_sock that were checking CO_FL_CONNECTED don't need
to include the L4 check as an L6 check is enough to decide whether to
wait for more info or not.
It also becomes obvious when reading the test in si_cs_recv() that caused
the failure mentioned above that once converted it doesn't make any sense
anymore: having CS_FL_EOS set while still waiting for L4 and L6 to complete
cannot happen since for CS_FL_EOS to be set, the other ones must have been
validated.
Some of these parts will still deserve further cleanup, and some of the
observations above may induce some backports of potential bug fixes once
totally analyzed in their context. The risk of breaking existing stuff
is too high to blindly backport everything.
We can't just assume conn_create_mux() will be called, and set CO_FL_CONNECTED,
conn_complete_session() might be call synchronously if we're not using SSL,
so ew haee no choice but to set CO_FL_CONNECTED in there. This should fix
the recent breakage of the mcli reg tests.
The xprt_done_cb callback was used to defer some connection initialization
until we're connected and the handshake are done. As it mostly consists of
creating the mux, instead of using the callback, introduce a conn_create_mux()
function, that will just call conn_complete_session() for frontend, and
create the mux for backend.
In h2_wake(), make sure we call the wake method of the stream_interface,
as we no longer wakeup the stream task.
In session_accept_fd() we can perform a synchronous call to
conn_complete_session() and if it succeeds the connection is accepted
and turned into a session. If it fails we take it as an error while it
is not, in this case, it's just that a tcp-request rule has decided to
reject the incoming connection. The problem with reporting such an event
as an error is that the failed status is passed down to the listener code
which decides to disable accept() for 100ms in order to leave some time
for transient issues to vanish, and that's not what we want to do here.
This fix must be backported as far as 1.7. In 1.7 the code is a bit
different as tcp_exec_l5_rules() is called directly from within
session_new_fd() and ret=0 must be assigned there.
Since previous commit a132e5efa9 ("BUG/MEDIUM: Make sure we leave the
session list in session_free().") it's pointless to delete the conn
element inside "if" blocks given that the second test is always true
as well. Let's simplify this with a single LIST_DEL_INIT() before the
test.
In session_free(), if we're about to destroy a connection that had no mux,
make sure we leave the session_list before calling conn_free(). Otherwise,
conn_free() would call session_unown_conn(), which would potentially free
the associated srv_list, but session_free() also frees it, so that would
lead to a double free, and random memory corruption.
This should be backported to 1.9 and 2.0.
Now that the architecture was changed so that attempts to receive/send data
always come from the upper layers, instead of them only trying to do so when
the lower layer let them know they could try, we can finally get rid of the
fd cache. We don't really need it anymore, and removing it gives us a small
performance boost.
This commit places calls to sockaddr_alloc() at the places where an address
is needed, and makes sure that the allocation is properly tested. This does
not add too many error paths since connection allocations are already in the
vicinity and share the same error paths. For the two cases where a
clear_addr() was called, instead the address was not allocated.
In session_accept_fd() we'll soon have to dynamically allocate the
address, or better, steal it from the caller and define a strict calling
convention regarding who's responsible for the freeing. In the simpler
session_prepare_log_prefix(), just add an attempt to retrieve the address
if not yet set and do not dereference it on failure.
If session_accept_fd() fails for a raw HTTP socket, we try to send an HTTP error
500. But we must not rely on error messages of the proxy or on the array
http_err_chunks because these are HTX messages. And it should be too expensive
to convert an HTX message to a raw message at this place. So instead, we send a
default HTTP error message from the array http_err_msgs.
This patch must be backported to 2.0 and 1.9.
If session_accept_fd() fails for a raw HTTP socket, we try to send an HTTP error
500. But, we must also take care it is an HTTP/1 connection. We cannot rely on
the mux at this stage, because the error, if any, happens before or during its
creation. So, instead, we check if the mux_proto is specified or not. Indeed,
the mux h1 cannot be forced on the bind line and there is no ALPN to choose
another mux on a raw socket. So if there is no mux_proto defined for a raw HTTP
socket, we are sure to have an HTTP/1 connection.
This patch must be backported to 2.0 and 1.9.
Add a new XPRT that is used when using non-SSL handshakes, such as proxy
protocol or Netscaler, instead of taking care of it in conn_fd_handler().
This XPRT is installed when any of those is used, and it removes itself once
the handshake is done.
This should allow us to remove the distinction between CO_FL_SOCK* and
CO_FL_XPRT*.
Add session flags, and add a new flag, SESS_FL_PREFER_LAST, to be set when
we use NTLM authentication, and we should reuse the last connection. This
should fix using NTLM with HTX. This totally replaces TX_PREFER_LAST.
This should be backported to 1.9.
Since previous commit it's not needed anymore to test a task pointer
before calling task_destory() so let's just remove these tests from
the various callers before they become confusing. The function's
arguments were also documented. The same should probably be done
with tasklet_free() which involves a test in roughly half of the
call places.
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error pathes to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.
In addition, h1_release() and h2_release() have also been updated in the same
way.
We were not checking p->feconn nor the global actconn soon enough. In
older versions this could result in a frontend accepting more connections
than allowed by its maxconn or the global maxconn, exactly N-1 extra
connections where N is the number of threads, provided each of these
threads were running a different listener. But with the lock removal,
it became worse, the excess could be the listener's maxconn multiplied
by the number of threads. Among the nasty side effect was that LI_FULL
could be removed while the limit was still over and in some cases the
polling on the socket was no re-enabled.
This commit takes care of updating and checking p->feconn and the global
actconn *before* processing the connection, so that the listener can be
turned off before accepting the socket if needed. This requires to move
some of the bookkeeping operations form session to listen, which totally
makes sense in this context.
Now the limits are properly respected, even if a listener's maxconn is
over a frontend's. This only applies on top of the listener lock removal
series and doesn't have to be backported.
If we're adding a connection to the server orphan idle list, don't forget
to remove the CO_FL_SESS_IDLE flag, or we will assume later it's still
attached to a session.
This should be backported to 1.9.
Instead of keeping track of the number of connections we're responsible for,
keep track of the number of connections we're responsible for that we are
currently considering idling (ie that we are not using, they may be in use
by other sessions), that way we can actually reuse connections when we have
more connections than the max configured.
When a session adds a connection to its connection list, we used to remove
connections for an another server if there were not enough room for our
server. This can't work, because those lists are now the list of connections
we're responsible for, not just the idle connections.
To fix this, allow for an unlimited number of servers, instead of using
an array, we're now using a linked list.
In session, don't keep an infinite number of connection that can idle.
Add a new frontend parameter, "max-session-srv-conns" to set a max number,
with a default value of 5.
Instead of the old "idle-timeout" mechanism, add a new option,
"pool-purge-delay", that sets the delay before purging idle connections.
Each time the delay happens, we destroy half of the idle connections.
Add a new command, "pool-max-conn" that sets the maximum number of connections
waiting in the orphan idling connections list (as activated with idle-timeout).
Using "-1" means unlimited. Using pools is now dependant on this.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
Instead of just storing the last connection in the session, store all of
the connections, for at most MAX_SRV_LIST (currently 5) targets.
That way we can do keepalive on more than 1 outgoing connection when the
client uses HTTP/2.
In session_free(), make sure the outgoing connection is not in the idle list
anymore, and it does no longer have an owner, so that it will properly be
destroyed and nobody will be able to access it.
When freeing the session, we may fail to free the outgoing connection,
because it still has streams attached. So remove ourself from the session
list, so that the connection doesn't try to access it later.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
If an ALPN (or a NPN) was chosen for a server, defer choosing the mux until
after the SSL handshake is done, and the ALPN/NPN has been negociated, so
that we know which mux to pick.
Do not destroy the connection when we're about to destroy a stream. This
prevents us from doing keepalive on server connections when the client is
using HTTP/2, as a new stream is created for each request.
Instead, the session is now responsible for destroying connections.
When reusing connections, the attach() mux method is now used to create a new
conn_stream.
Introduce a new field in session, "srv_conn", and a linked list of sessions
in the connection. It will be used later when we'll switch connections
from being managed by the stream, to being managed by the session.
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
- sessions, used to more accurately report timeouts in logs when seeing
TASK_WOKEN_TIMEOUT ;
- peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL
This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).
No backport is needed.
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.
This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.
Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake where it belongs, which
is the session.
However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).
So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.
This patch will need to be backported to 1.8.
To do so, mux choices are split to handle incoming and outgoing connections in a
different way. The protocol specified on the bind/server line is used in
priority. Then, for frontend connections, the ALPN is retrieved and used to
choose the best mux. For backend connection, there is no ALPN. Finaly, if no
protocol is specified and no protocol matches the ALPN, we fall back on a
default mux, choosing in priority the first mux with exactly the same mode.
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
When a freshly created session is rejected, for any reason, during the accept in
the function "session_accept_fd", the variable "actconn" is decreased twice. The
first time when the rejected session is released, then in the function
"listener_accpect", because of the failure. So it is possible to have an
negative value for actconn. Note that, in this case, we will also have a negatve
value for the current number of connections on the listener rejecting the
session (actconn and l->nbconn are in/decreased in same time).
It is easy to reproduce the bug with this small configuration:
global
stats socket /tmp/haproxy
listen test
bind *:12345
tcp-request connection reject if TRUE
A "show info" on the stat socket, after a connection attempt, will show a very
high value (the unsigned representation of -1).
To fix the bug, if the function "session_accept_fd" returns an error, it
decrements the right counters and "listener_accpect" leaves them untouched.
This patch must be backported in 1.8.
Some sample fetches check if session is established using
the flag CO_FL_CONNECTED. But in some cases, when a handshake
is performed this flag is set too late, after the process
of the tcp-request session rules.
This fix move the raising of the flag at the beginning of the
conn_complete_session function which processes the tcp-request
session rules.
This fix must be backported to 1.8 (and perhaps 1.7)
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
Now we don't remove the session when a stream dies, instead we
detach the stream and let the mux decide to release the connection
and call session_free() instead.