Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. The old code relied on the function http_reply_and_close() to
send 401/407 responses. Now, we push it in the response's buffer. So we take
care to drain the request's channel and to shutdown the response's channel for
the read.
Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. A header block is added to each early-hint rule, prefixed by
the start line if it is the first one. The response is terminated and forwarded
when the rules execution is stopped or when a rule of another type is applied.
the flags of the HTX start-line (HTX_SL_F_*) are mapped on ones of the HTTP
message (HTTP_MSGS_*). So we can easily retrieve info from the parsing in HTX
analyzers.
Instead, we now use the htx_sl coming from the HTX message. It avoids to have
too H1 specific code in version-agnostic parts. Of course, the concept of the
start-line is higly influenced by the H1, but the structure htx_sl can be
adapted, if necessary. And many things depend on a start-line during HTTP
analyzis. Using the structure htx_sl also avoid boring conversions between HTX
version and H1 version.
If there is no start-line, this offset is set to -1. Otherwise, it is the
relative address where the start-line is stored in the data block. When the
start-line is added, replaced or removed, this offset is updated accordingly. On
remove, if the start-line is no set and if the next block is a start-line, the
offset is updated. Finally, when an HTX structure is defragmented, the offset is
also updated accordingly.
The HTX start-line is now a struct. It will be easier to extend, if needed. Same
info can be found, of course. In addition it is now possible to set flags on
it. It will be used to set some infos about the message.
Some macros and functions have been added in proto/htx.h to help accessing
different parts of the start-line.
The function htx_add_data_before() can be used to add an HTX block before
another one. For instance, it could be used to add some data before the
end-of-message marker.
With the legacy representation, keep-alive outgoing connections are added in
private/idle/safe connections list when the transaction is cleaned up. But this
stage does not exist with the HTX representaion because a new stream, and
therefore a new transaction, is created for each request. So it is now handled
when the stream is detached from the connection.
In h1_snd_buf(), the data sending is done synchronously, as much as possible. So
if some data remains in the channel's buffer, because there was not enougth
place in the output buffer, it may be good the retry after a send because some
space may have been released when sending. Most of time the output buffer is
empty and all channel's data are consumed the first time. And if no data are
sent, we don't retry to do more. So the loop is just here to optimize edge cases
without any cost for all others.
After a call to snd_buf, if some data remain in the channel's buffer, this means
the system buffers are full or we are unable to fully consume an HTX block for
any reason. In the last case, we need to wakeup the stream to process more data
as soon as possible. We do it subscribing to send at the end of h1_snd_buf().
When trailers are parsed, we must add the corrresponsing HTX block and then we
must add the block end-of-message. But this last operation can failed because
there is not enough space the HTX message. This case was left aside till
now. Now, we stay in the state H1_MSG_TRAILERS with the warranty we will be able
to restart at the right stage.
For chunked messages, during output process, the mux is now able to write the
last empty chunk and empty trailers when corrsponding blocks have not been found
in the HTX message. It is handy for filters changing a not-chunked message into
a chunked one (like the compression filter).
In h1_set_srv_conn_mode(), we need to get the frontend proxy of a server
connection. untill now, we relied on the stream to get it. But it was a bit
dirty. The stream always exists at this stage but to get it, we also need to get
the stream-interface. Since the commit 7c6f8b146 ("MAJOR: connections: Detach
connections from streams."), the connection's owner is always the session, even
for outgoing connections. So now, we rely on the session to get the frontend
proxy in h1_set_srv_conn_mode().
Use the session instead of the stream to get
the frontend on the server connection
When the connection client is accepted, the info of the client conn_stream are
filled with the session info (accept_date, tv_accept and t_handshake). For all
other conn_streams, on client and server side, their info are filled using
global values (date and now).
In http_wait_for_response(), we wait that all outgoing data have really been
sent (from the channel's point of view) to start the processing of the
response. In fact, it is used to send all intermediate 10x responses. For now
the HTX api is not really handy when multiple messages are stored in the HTX
structure.
Because multiple HTTP messages can be stored in an HTX structure, it is
important to not forget to reset the H1 parser at the beginning of each
one. With the current version, this case only happens on the response, when
multiple HTTP-1XX responses are forwarded to the client (for instance
103-Early-Hints). So strickly speaking, it is the same message. But for now,
internally, each one is a standalone message. Note that it might change in a
future version of the HTX.
in h1_process_output(), before formatting the headers, we need to find and check
the "Connection: " header to update the connection mode. But, the context used
to do so was not correctly initialized. We must explicitly set ctx.value to NULL
to be sure to rescan the current header.
the function http_show_error_snapshot() must not use the trash buffer to append
the HTTP error description. Instead, it must use the <out> buffer, its first
argument. Note that concretely, this function always succeeds because <out> is
always the trash buffer.
When a section's parser is registered, it can also define a post section
callback, called at the end of the section parsing. But when 2 sections with the
same name followed each other, the transition between them was missed. This
induced 2 bugs. First, the call to the post section callback was skipped. Then,
the parsing of the second section was mixed with the first one.
This patch must be backported in 1.8.
http_proxy is special, because it creates its connection and conn_stream
earlier. So in assign_server(), check that the connection associated with
the conn_stream has a destination address set, and in connect_server(),
use the connection and the conn_stream already attached to the
stream_interface, instead of looking for a connection in the session, and
creating a new conn_stream.
Instead of parsing all the available connections owned by the session
each time we choose a server, even if prefer-last-server is not set,
just do it if prefer-last-server is used, and check if the server is usable,
before checking the connections.
When a transaction ends, if we want to do keepalive, and the connection we
used didn't have an owner, attach the connection to the session, so that we
don't have to destroy it, and we can reuse it later.
Instead of just storing the last connection in the session, store all of
the connections, for at most MAX_SRV_LIST (currently 5) targets.
That way we can do keepalive on more than 1 outgoing connection when the
client uses HTTP/2.
When creating a new outgoing H2 connection, put it in the idle list so that
it's immediately available for others to use, if http-reuse always is used.
While it is true the SSL code will do the right thing if the SSL handshake
is not done, we have other types of handshake to deal with (proxy protocol,
netscaler, ...). For those we definitively don't want to try to send data
before it's done. All handshakes but SSL will go through the mux_pt, so in
mux_pt_snd_buf, don't try to send while a handshake is pending.
In session_free(), make sure the outgoing connection is not in the idle list
anymore, and it does no longer have an owner, so that it will properly be
destroyed and nobody will be able to access it.
We can reach sess_update_st_con_tcp() while we still have a connection
attached, so take that into account, and free the connection, instead of
assuming it's always a conn_stream.
When freeing the session, we may fail to free the outgoing connection,
because it still has streams attached. So remove ourself from the session
list, so that the connection doesn't try to access it later.
In h2_recv(), return 1 if there's an error on the connection, not just if
there's a read0 pending, so that h2_process() can be called and act as a
janitor.
In si_cs_recv(), when there's an error on the connection or the conn_stream,
don't give up if CS_FL_RCV_MORE is set on the conn_stream, as it means there's
still data available.
In si_cs_send(), don't give up and subscribe if the connection is still
waiting for a SSL handshake. We will never be woken up once the handshake is
done if we're using HTTP/2. Instead, directly try to send data. When using
the mux_pt, if the handshake is not done yet, snd_buf() would return 0 and
we will subscribe anyway.
When we're deferring the mux choice until the ALPN is negociated, we
attach the connection to the stream_interface until it's done, so that we
can destroy it if something goes wrong and the stream is destroy.
Before calling si_attach_cs() to attach the conn_stream once we have it,
call si_detach_endpoint(), or is_attach_cs() would destroy the connection.
When we defer the mux choice until the ALPN is negociated, don't forget
to wake the stream once it's done, or it will never have the opportunity
to send data.
In ssl_sock_parse_clienthello(), the code considers that SSL Sessionid
size is '1', and then considers that the SSL cipher suite is availble
right after the session id size information.
This actually works in a single case, when the client does not send a
session id.
This patch fixes this issue by introducing the a propoer way to parse
the session id and move forward the cursor by the session id length when
required.
Need to be backported to 1.8.
In the mux_pt, when we're attaching a new conn_stream, don't forget to
unsubscribe from the connection. Failure to do so may lead to the mux_pt
freeing the connection while the conn_stream can still want to access it.
In h2_process_demux(), if we're demuxing multiple frames, and the previous
frame led to a stream getting closed, don't bogusly consider that an error,
and destroy the next stream, as there are valid cases where the stream could
be closed.
The CLOEXEC flag was set using a F_SETFL which can't work.
To set the CLOEXEC flag F_SETFD should be used, the problem is that it
needs a new call to fcntl() and it's on the path of every accept.
This flag was only needed in the case of the master, so the patch was
reverted and the flag set only in this case.
The bug was introduced by 0b3e849 ("MEDIUM: listeners: set O_CLOEXEC on
the accepted FDs").
No backport needed.
If the master was reloaded and there was a established connection to a
server, the FD resulting from the accept was leaking.
There was no CLOEXEC flag set on the FD of the socketpair created during
a connect call. This is specific to the socketpair in the master process
but it should be applied to every protocol in case we use them in the
master at some point.
No backport needed.
The previous fix da95fd90 ("BUILD/MINOR: ssl: fix build with non-alpn/
non-npn libssl") does fix the build in old OpenSSL release, but I
overlooked that the ctx is only freed when NPN is supported.
Fix this by moving the #endif to the proper place (this was broken in
c7566001 ("MINOR: server: Add "alpn" and "npn" keywords")).
Having a thread_local for the pool cache is messy as we need to
initialize all elements upon startup, but we can't until the threads
are created, and once created it's too late. For this reason, the
allocation code used to check for the pool's initialization, and
it was the release code which used to detect the first call and to
initialize the cache on the fly, which is not exactly optimal.
Now that we have initcalls, let's turn this into a per-thread array.
This array is initialized very early in the boot process (STG_PREPARE)
so that pools are always safe to use. This allows to remove the tests
from the alloc/free calls.
Doing just this has removed 2.5 kB of code on all cumulated pool_alloc()
and pool_free() paths.
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This allowed to remove 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and became static now.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
The new function create_pool_callback() takes 3 args including the
return pointer, and creates a pool with the specified name and size.
In case of allocation error, it emits an error message and returns.
The new macro REGISTER_POOL() registers a callback using this function
and will be usable to request some pools creation and guarantee that
the allocation will be checked. An even simpler approach is to use
DECLARE_POOL() and DECLARE_STATIC_POOL() which declare and register
the pool.
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.
Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
Most register_build_opts() calls use static strings. These ones were
replaced with a trivial REGISTER_BUILD_OPTS() statement adding the string
and its call to the STG_REGISTER section. A dedicated section could be
made for this if needed, but there are very few such calls for this to
be worth it. The calls made with computed strings however, like those
which retrieve OpenSSL's version or zlib's version, were moved to a
dedicated function to guarantee they are called late in the process.
For example, the SSL call probably requires that SSL_library_init()
has been called first.
This patch replaces a number of __decl_hathread() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock() which
automatically registers the lock for initialization in during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
This patch adds ha_spin_init() and ha_rwlock_init() which are used as
a callback to initialise locks at boot time. They perform exactly the
same as HA_SPIN_INIT() or HA_RWLOCK_INIT() but from within a real
function.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
We reintroduced some FDs leaking by using a poller and some listeners in
the master.
The master proxy needs to be stopped to avoid leaking its listeners, the
polling loop needs to be deinit, and the thread waker pipe need to be
closed too.
No backport needed.
Surprisingly, the compression pool was created at runtime on first use,
which is not very convenient, has performance and reliability impacts,
and even makes monitoring less easy. Let's move the pool creation at
startup time instead. This even removes the need for the spinlock in
case USE_ZLIB is not defined.
The tune.maxzlibmem setting was moved with commit 368780334 ("MEDIUM:
compression: move the zlib-specific stuff from global.h to compression.c")
but the preset value using DEFAULT_MAXZLIBMEM was incorrectly moved :
- the field is in "global" and not "global.tune"
- the trailing comma instead of semi-colon will make it either zero
(threads enabled), break (threads enabled with debugging), or cast
the memprintf's return pointer to int (threads disabled)
It simply proves that nobody ever used DEFAULT_MAXZLIBMEM since 1.8!
This needs to be backported to 1.8.
Valgrind reports:
==3389== Warning: invalid file descriptor -1 in syscall close()
Check for >= 0 before closing.
This bug was introduced in commit ce83b4a5dd
and is specific to 1.9. No backport needed.
In commit c7566001 ("MINOR: server: Add "alpn" and "npn" keywords") and
commit 201b9f4e ("MAJOR: connections: Defer mux creation for outgoing
connection if alpn is set"), the build was broken on older OpenSSL
releases.
Move the #ifdef's around so that we build again with older OpenSSL
releases (0.9.8 was tested).
Since the connection changes in 1.9, some breakage happened to the H2 mux
whose initial design was heavily relying on the fact that connection-level
functions were woken up after data were transferred to the stream layer.
We need to wake the demux up after receiving such data if the demux is
blocked. This at least allows to receive POSTs again. One issue remains,
it looks like the end of the uploaded data is silently discarded if the
server responds before the end of the transfer (H2 in half-closed(local)
state), which doesn't happen with 1.8.14 and nghttp as the client.
No backport is needed.
After the changes to the connection layer in 1.9, some wake up calls
need to be introduced to re-activate reading from the connection. One
such place is at the end of h2_process_demux(), otherwise processing
of input data stops after a few frames.
No backport is needed.
When we create a connection, if we have to defer the conn_stream and the
mux creation until we can decide it (ie until the SSL handshake is done, and
the ALPN is decided), store the connection in the stream_interface, so that
we're sure we can destroy it if needed.
When ending a stream, if the origin is an appctx, the appctx will have been
destroyed already, but it does not destroy the session. So later, when we
try to destroy the session, we try to dereference sess->origin and die
trying.
Fix this by explicitely setting sess->origin to NULL before calling
session_free().
The creation of the conn_stream for an outgoing connection has been delayed
a bit, and when using dispatch, a check was made to see if a conn_stream
was attached before the conn_stream was created, so remove the test, as
it's done later anyway, and create and install the conn_stream right away
when we don't have a server, as is done when we don't have an alpn/npn
defined.
In the various connect_server() functions, don't reset the connection flags,
as some may have been set before. The flags are initialized in conn_init(),
anyway.
If an ALPN (or a NPN) was chosen for a server, defer choosing the mux until
after the SSL handshake is done, and the ALPN/NPN has been negociated, so
that we know which mux to pick.
As we now will no longer try tro subscribe to recv/send events before the
connection is established, there's no need to reactivate polling on the fd
when retrying connection. It will be activated later on subscribe.
In some situations, especially when dealing with low latency on processors
supporting a variable frequency or when running inside virtual machines,
each time the process waits for an I/O using the poller, the processor
goes back to sleep or is offered to another VM for a long time, and it
causes excessively high latencies.
A solution to this provided by this patch is to enable busy polling using
a global option. When busy polling is enabled, the pollers never sleep and
loop over themselves waiting for an I/O event to happen or for a timeout
to occur. On multi-processor machines it can significantly overheat the
processor but it usually results in much lower latencies.
A typical test consisting in injecting traffic over a single connection at
a time over the loopback shows a bump from 4640 to 8540 connections per
second on forwarded connections, indicating a latency reduction of 98
microseconds for each connection, and a bump from 12500 to 21250 for
locally terminated connections (redirects), indicating a reduction of
33 microseconds.
It is only usable with epoll and kqueue because select() and poll()'s
API is not convenient for such usages, and the level of performance they
are used in doesn't benefit from this anyway.
The option, which obviously remains disabled by default, can be turned
on using "busy-polling" in the global section, and turned off later
using "no busy-polling". Its status is reported in "show info" to help
troubleshooting suspicious CPU spikes.
Fix some memory leak and a FD leak in the error path of the master proxy
initialisation. It's a really minor issue since the process is exiting
when taking those error paths.
Valgrind's memcheck reports memory leaks in cli.c, because
the out parameter of memprintf is not properly freed:
==31035== 11 bytes in 1 blocks are definitely lost in loss record 16 of 101
==31035== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4A3C72: my_realloc2 (standard.h:1364)
==31035== by 0x4A3C72: memvprintf (standard.c:3459)
==31035== by 0x4A3D93: memprintf (standard.c:3482)
==31035== by 0x4AF77E: mworker_cli_sockpair_new (cli.c:2324)
==31035== by 0x48E826: init (haproxy.c:1749)
==31035== by 0x408BBC: main (haproxy.c:2725)
==31035==
==31035== 11 bytes in 1 blocks are definitely lost in loss record 17 of 101
==31035== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4A3C72: my_realloc2 (standard.h:1364)
==31035== by 0x4A3C72: memvprintf (standard.c:3459)
==31035== by 0x4A3D93: memprintf (standard.c:3482)
==31035== by 0x4AF071: mworker_cli_proxy_create (cli.c:2172)
==31035== by 0x48EC89: init (haproxy.c:1760)
==31035== by 0x408BBC: main (haproxy.c:2725)
These leaks were introduced in commits
ce83b4a5dd and
8a02257d88
which are specific to haproxy 1.9 dev.
The "cpust_{tot,1s,15s}" fields used to report milliseconds but nothing
in the value's title made this explicit. Let's rename the field to report
"cpust_ms_{tot,1s,15s}" to more easily remind that the unit represents
milliseconds.
These sample fetch keywords report performance metrics about the task calling
them. They are useful to report in logs which requests consume too much CPU
time and what negative performane impact it has on other requests. Typically
logging cpu_ns_avg and lat_ns_avg will show culprits and victims.
Right now we measure for each task the cumulated time spent waiting for
the CPU and using it. The timestamp uses a 64-bit integer to report a
nanosecond-level date. This is only enabled when "profiling.tasks" is
enabled, and consumes less than 1% extra CPU on x86_64 when enabled.
The cumulated processing time and wait time are reported in "show sess".
The task's counters are also reset when an HTTP transaction is reset
since the HTTP part pretends to restart on a fresh new stream. This
will make sure we always report correct numbers for each request in
the logs.
This is a new global setting which enables or disables CPU profiling
per task. For now it only sets/resets the variable based on the global
option "profiling.tasks" and supports showing it as well as setting it
from the CLI using "show profiling" and "set profiling". The option will
be used by a future commit. It was done in a way which should ease future
addition of profiling options.
Since we know the time it takes to process everything between two poll()
calls, we can use this as the max latency measurement any task will
experience and average it.
This code does this, and reports in "show activity" the average of this
loop time over the last 1024 poll() loops, for each thread. It will vary
quickly at high loads and slowly under low to moderate loads, depending
on the rate at which poll() is called. The latency a task experiences
is expected to be half of this on average.
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking all the
circular dependency.
Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.
No code was changed, stuff was just moved around.
The signal_register_fct() does not remove the handlers assigned to a
signal, but add a new handler to a list.
We accidentality inherited the handlers of the main() function in the
master process which is a problem because they act on the proxies.
The side effect was to stop the MASTER proxy which handle the master CLI
on a SIGUSR1, and to display some debug info when doing a SIGHUP and a
SIGQUIT.
The new function signal_unregister() removes every handlers assigned to
a signal. Once the handler list of the signal is empty, the signal is
ignored with SIG_IGN.
In the output of 'show fd', the worker CLI's socketpair was still
handled by an "unknown" function. That can be really confusing during
debug. Fixed it by showing "mworker_accept_wrapper" instead.
The mworker waitpid mode (which is used when a reload failed to apply
the new configuration) was still using a specific initialisation path.
That's a problem since we use a polling loop in the master now, the
master proxy is not initialized and the master CLI is not activated.
This patch removes the initialisation code of the wait mode and
introduce the MODE_MWORKER_WAIT in order to use the same init path as
the MODE_MWORKER with some exceptions. It allows to use the master proxy
and the master CLI during the waitpid mode.