Because the HTX is still experimental, we must add special cases during the
configuration check to be sure it is not enabled on a proxy with incompatible
options. Here, for HTX proxies, when a mux protocol is specified on a bind line
or a server line, we must force the HTX mode (PROTO_MODE_HTX).
Concretely, H2 is the only mux protocol that can be forced. And it doesn't yet
support the HTX. So forcing the H2 on an HTX proxy will always fail.
In si_cs_send(), some checks are done the CS flags en the connection flags
before calling snd_buf(). But these checks are useless because they have already
been done earlier in the function. The harder to figure out is the flag
CO_FL_SOCK_WR_SH. So it is now tested with CF_SHUTW at the beginning.
In si_cs_send, as said in comments, snd_buf() should only be called if there is
no data in the pipe anymore. But actually, this condition was not respected.
For the same reason than for the commit b46784b1c ("MINOR: stream-int: Notify
caller when an error is reported after a rcv_pipe()"), we return 1 after the
call to rcv_buf() in si_cs_send() to notify the caller some processing may be
triggered.
This patch is not flagged as a bug because no strange behaviour was yet observed
without it. It is just a proactive fix to be consistent.
In si_cs_send(), when an error is found on the CS or the connection at the
beginning of the function, we return 1 to notify the caller some processing may
be triggered. So, it seems logical to do the same after the call to rcv_pipe().
This patch is not flagged as a bug because no strange behaviour was yet observed
without it. It is just a proactive fix to be consistent.
The request is eaten when the stats applet have finished to send its
response. It was removed from the channel's buffer, removing all HTX blocks till
the EOM. But the channel's output was not reset, leaving the request channel in
an undefined state.
When we start to parse a new message, if all headers have not been received,
nothing is copied in the channel's buffer. In this situation, we must not set
the flag CS_FL_RCV_MORE on the conn-stream. If we do so, the connection freezes
because there is no data to send that can reenable the reads
First of all, we need to be sure to keep the flag H1S_F_BUF_FLUSH on the H1S
reading data until all data was flushed from the buffer. Then we need to know
when the kernel splicing is in use and when it ends. This is handled with the
new flag H1S_F_SPLICED_DATA.
Then, we must subscribe to send when some data remain in the pipe after a
snd_pipe(). It is mandatory to wakeup the stream and avoid a freeze.
Finally, we must be sure to update the message state when we restart to use the
channel's buffer. Among other things, it is mandatory to swith the message from
DATA to DONE state when all data were sent using the kernel splicing.
Don't force the close on server side anymore. Since commit 7c6f8b146 ("MAJOR:
connections: Detach connections from streams"), it is possible to release a
stream without the underlying connection.
Because of this change, we must be sure to create a new stream to handle the
next HTTP transaction only on the client side. And we must be sure to correctly
handle the read0 event in h1_recv, to be sure to call h1_process().
It avoids a copy between the rxbuf and the channel's buffer. It means the
parsing is done in h1_rcv_buf(). So we need to have a stream to start the
parsing. This change should improve the overall performances. It also implies a
better split between the connection layer and the applicative layer. Now, on the
connection layer, only raw data are manipulated. Raw data received from the
socket are stored in ibuf and those sent are get from obuf. On the applicative
layer, data in ibuf are parsed and copied into the channel's buffer. And on the
other side, those structured data are formatted and copied into obuf.
James Brown reported that when an "accept-ranges" header field is sent
through haproxy and converted from HTTP/1.1 to H2, it's mis-encoded as
"accept-language". It happens that it's one of the few very common header
fields encoded using its index value and that this index value was misread
in the spec as 17 instead of 18, resulting in the wrong name being sent.
Thanks to Lukas for spotting the issue in the HPACK encoder itself.
This fix must be backported to 1.8.
This was the largest function of the whole file, taking a rough second
to build alone. Let's move it to a distinct file along with a few
dependencies. Doing so saved about 2 seconds on the total build time.
The config parser is the largest file to build and its build dominates
the total project's build time. Let's start to split it into multiple
smaller pieces by extracting the "global" section parser into a new
file called "cfgparse-global.c". This removes 1/4th of the file's build
time.
For now, the lua scripts are not compatible with the new HTX internal
representation of HTTP messages. Thus, for a given proxy, when the option
"http-use-htx" is enabled, an error is triggered if any lua's
action/service/sample-fetch/converter is also configured.
For now, the filters are not compatible with the new HTX internal representation
of HTTP messages. Thus, for a given proxy, when the option "http-use-htx" is
enabled, an error is triggered if any filter is also configured.
To do so, the stream is created as earlier as possible. It means, during the mux
creation for the first request, and for others, just at the end of the previous
transaction. Because all timeouts are handled by the strream, the mux's task is
now useless, so it is removed. Finally, to report errors, flags are set on the
HTX message. The HTX message is passed to the stream if there is some content to
analyse or if there is some error to handle.
All of this will probably be reworked later to handle errors and timeouts
directly in the mux. For now, it is the simpler way to handle all of this.
When a server is down, the channel's data must not be consumed. This is
required to allow redispatch and connection retry. So now, we wait for
the connection to be marked as connected, with the flag CO_FL_CONNECTED,
before starting to consume channel's data. In the mux, this event is
tracked with the flag H1C_F_CS_WAIT_CONN.
It does the same than smp_prefetch_http but for HTX messages. It can be called
from an HTTP proxy or a TCP proxy. For HTTP proxies, the parsing is handled by
the mux, so it does nothing but wait. For TCP proxies, it tries to parse an HTTP
message and to convert it in a temporary HTX message. Sample fetches will use
this temporary variable to do their job.
This version is simpler than the legacy one because the parsing is no more
handled by the analyzer. So now we just need to wait to have more data to move
on.
For now, the call to the stats applet is disabled for HTX messages. But HTX
versions of the function to check the request URI against the stats URI and the
fnuction to prepare the call to the stats applet have been added.
It is more or less the same than legacy version but adapted to be called from
HTX analyzers. In the legacy version of this function, we switch on the HTX code
when applicable.
It is more or less the same than legacy version but adapted to be called from
HTX analyzers. In the legacy version of this function, we switch on the HTX code
when applicable.
It is more or less the same than legacy versions but adapted to be called from
HTX analyzers. In the legacy versions of these functions, we switch on the HTX
code when applicable.
It is more or less the same than legacy versions but adapted to be called from
HTX analyzers. In the legacy versions of these functions, we switch on the HTX
code when applicable.
It is more or less the same than del_hdr_value but adapted to be called from HTX
analyzers. The main changes is that it takes pointers on the start and the end
of the header value.
The mux-h1 now parses and formats HTTP/1 messages using the HTX
representation. The HTX analyzers have been updated too. For now, only
htx_wait_for_{request/response} and http_{request/response}_forward_body have
been adapted. Others are disabled for now.
Now, the HTTP messages are parsed by the mux on a side and then, after analysis,
formatted on the other side. In the middle, in the stream, there is no more
parsing. Among other things, the version parsing is now handled by the
mux. During the data forwarding, depending the value of the "extra" field, we
are able to know if the body length is known or not and if yes, how many bytes
are still expected.
This file will host all functions to manipulate HTTP messages using the HTX
representation. Functions in this file will be able to be called from anywhere
and are mainly related to the HTTP semantics.
The internal representation of an HTTP message, called HTX, is a structured
representation, unlike the old one which is a raw representation of
messages. Idea is to have a version-agnostic representation of the HTTP
messages, which can be easily used by to handle HTTP/1, HTTP/2 and hopefully
QUIC messages, and communication from one of them to another.
In this patch, we add types to define the internal representation itself and the
main functions to manipulate them.
The mux relies on the flag CO_RFL_BUF_FLUSH during a call to h1_rcv_buf to know
if it needs to stop reads and to flush its internal buffers to use kernel tcp
splicing. It is the caller responsibility (here the SI) to know when it must
come back on buffered exchanges.
Now, the connection mode is detected in the mux and not in HTX analyzers
anymore. Keep-alive connections are now managed by the mux. A new stream is
created for each transaction. This removes the most important part of the
synchronization between channels and the HTTP transaction cleanup. These changes
only affect the HTX part (proto_htx.c). Legacy HTTP analyzers remain untouched
for now.
On the client-side, the mux is responsible to create new streams when a new
request starts. It is also responsible to parse and update the "Connection:"
header of the response. On the server-side, the mux is responsible to parse and
update the "Connection:" header of the request. Muxes on each side are
independent. For now, there is no connection pool on the server-side, so it
always close the server connection.
For now, it only parses and transfers data. There is no internal representation
yet. It means the stream still need to parse it too. So a message is parsed 3
times today: one time by each muxes (the client one and the server one) and
another time by the stream. This is of course inefficient. But don't worry, it
is only a transitionnal state. And this mux is optional for now.
BTW, headers and body parsing are now handled using same functions than the mux
H2. Request/Response synchronization is also handled. The mux's task is now used
to catch client/http-request timeouts. Others timeouts are still handled by the
stream. On the clien-side, the stream is created once headers are fully parsed
and body parsing starts only when heeaders are transferred to the stream (ie,
copied into channel buffer).
There is still some known limitations here and there. But, it works in the
common cases. Bad message are not captured and some logs are emitted when errors
occur, only if no stream are attached to the mux. Otherwise, data are
transferred and we let the stream handles errors itself.
For now, it is just an other kind of passthrough multiplexer, but with internal
buffers to be prepared to parse incoming messages and to format outgoing
ones. There is also a task attached to it to handle timeouts. However, because
it does not handle any timeout for now, this task is unused. And finally,
because it handles internal buffers, it also handles retries on recv/send. To
use this multiplexer, you must use the option "http-use-htx" both on the
frontend and the backend.
It does not support keep-alive and will freeze connections after the first
request/response.
For now, these analyzers are just copies of the legacy HTTP analyzers. But,
during the HTTP refactoring, it will be the main place where it will be
visible. And in legacy analyzers, the macro IS_HTX_STRM is used to know if the
HTX version should be called or not.
Note: the following commits were applied to proto_http.c after this patch
was developed and need to be studied to see if an adaptation to htx
is required :
fd9b68c BUG/MINOR: only mark connections private if NTLM is detected
The flag CS_FL_READ_PARTIAL can be set by the mux on the conn_stream to notify
the stream interface that some data were received. Is is used in si_cs_recv to
re-arm read timeout on the channel.
These 2 functions are pretty naive. They only split a start-line into its 3
substrings or a header line into its name and value. Spaces before and after
each part are skipped. No CRLF at the end are expected.
By setting the flag CO_RFL_KEEP_RSV when calling mux->rcv_buf, the
stream-interface notifies the mux it must keep some space to preserve the
buffer's reserve. This flag is only useful for multiplexers handling structured
data, because in such case, the stream-interface cannot know the real amount of
free space in the channel's buffer.
This file is empty for now. But it will be used to add new versions of the HTTP
analyzers based on the internal representation of HTTP messages (not implemented
yet but called HTX).
By setting the flag CO_RFL_BUF_FLUSH when calling mux->rcv_buf, the
stream-interface notifies the mux it should flush its buffers without reading
more data. This flag is set when the SI want to use the kernel TCP splicing to
forward data. Of course, the mux can respect it or not, depending on its
state. It's just an information.
Do not destroy the connection when we're about to destroy a stream. This
prevents us from doing keepalive on server connections when the client is
using HTTP/2, as a new stream is created for each request.
Instead, the session is now responsible for destroying connections.
When reusing connections, the attach() mux method is now used to create a new
conn_stream.
Introduce a new field in session, "srv_conn", and a linked list of sessions
in the connection. It will be used later when we'll switch connections
from being managed by the stream, to being managed by the session.
Add a new method for mux, avail_streams, that returns the number of streams
still available for a mux.
For the mux_pt, it'll return 1 if the connection is in idle, or 0. For
the H2 mux, it'll return the max number of streams allowed, minus the number
of streams currently in use.
In order to make the mux_pt able to handle idle connections, give it its
own context, where it'll stores the connection, the current conn_stream if
any, and a wait_event, so that it can subscribe to I/O events.
Add a new parameter to the detach() method, that gives the mux a hint
if it should destroy the connection or not when detaching a conn_stream.
If 1, then the mux_pt immediately destroys the connecion, if 0, then it
just subscribes to any read event. If a read happens, it will call
conn_sock_drain(), and if there's a connection error, it'll free the
connection, after removing it from the idle list.
Instead of trying to receive as soon as the connection is created, and to
eventually have to transfer subscription if we move connections, wait
until the connection is established before attempting to recv.
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
The channel can disable reading from the stream-interface using various
methods, such as :
- CF_DONT_READ
- !channel_may_recv()
- and possibly others
Till now this was done by mangling SI_FL_RX_WAIT_EP which is not
appropriate at all since it's not the stream interface which decides
whether it wants to deliver data or not. Some places were also wrongly
relying on SI_FL_RXBLK_ROOM since it was the only other alternative,
but it's not suitable for CF_DONT_READ.
Let's use the SI_FL_RXBLK_CHAN flag for this instead. It will properly
prevent the stream interface from being woken up and reads from
subscribing to more receipt without being accidently removed. It is
automatically reset if CF_DONT_READ is not set in stream_int_notify().
The code is not trivial because it splits the logic between everything
related to buffer contents (channel_is_empty(), CF_WRITE_PARTIAL, etc)
and buffer policy (CF_DONT_READ). Also it now needs to decide timeouts
based on any blocking flag and not just SI_FL_RXBLK_ROOM anymore.
It looks like this patch has caused a minor performance degradation on
connection rate, which possibly deserves being investigated deeper as
the test conditions are uncertain (e.g. slightly more subscribe calls?).
For a long time, stream_int_update() and stream_int_notify() used to only
conditionally call si_chk_rcv() based on state change detection. This
detection is not reliable and quite complex. With the new blocked flags
that si_chk_rcv() checks, it's much more reliable to always call the
function to take into account recent changes,and let it decide if it needs
to wake something up or not.
This also removes the calls to si_chk_rcv() that were performed in
si_update_both() since these ones are systematically performed in
stream_int_update() after updating the Rx flags.
Till now we were using si_done_put() upon shutr, but these flags could
be reset upon next activity. Now let's switch to SI_FL_RXBLK_SHUT which
doesn't go away. It's also set in stream_int_update() in case a shutr
condition is detected.
The now unused si_done_put() was removed.
A number of calls to si_cant_put() were used in fact to request being
called back once a buffer is available. These ones are not needed anymore
since si_alloc_ibuf() already sets the SI_FL_RXBLK_BUFF flag when called
in appctx context. Those called with a foreign stream-int are simply turned
to si_rx_buff_blk().
A number of si_cant_put() calls were still present to in fact indicate
that the end point is ready (thus should be turned to si_rx_endp_more()).
One other call in the Lua handler indicates that the endpoint wanted to
be blocked until some room is made in the Rx buffer in order to detect
that the connection happened, which is in fact an indication that it
wants to be called once the endpoint is ready, this is the default case
for an applet so this call was removed.
A useless call to si_cant_put() before appctx_wakeup() in the Lua
applet wakeup call was removed as well since the first thing that will
be done there will be to set end ENDP blocking flag.
If an applet reports being blocked due to any of the channel-side flags,
it's reportedly ready to deliver incoming data. It's better to do this
after the return from the applet handler so that applet developers don't
have to worry about details related to flags ordering.
Instead of first indicating that there's more data to read from the
conn_stream then re-adjusting this info along the function, we now
instead set the status according to the subscription status at the
end. It's easier, more accurate, and less sensitive to intermediary
changes.
This will soon allow to remove all the si_cant_put() calls that were
placed in the middle to force a subsequent callback and prevent the
function from subscribing to the mux layer.
The stream interface used to conflate a missing buffer and lack of
buffer space into SI_FL_WAIT_ROOM but this causes difficulties as
these cannot be checked at the same moment and are not resolved at
the same moment either. Now we instead mark the buffer as presumably
available using si_rx_buff_rdy() and mark it as unavailable+requested
using si_rx_buff_blk().
The call to si_alloc_buf() was moved after si_stop_put(). This makes
sure that the SI_FL_RX_WAIT_EP flag is cleared on allocation failure so
that the function is called again if the callee fails to do its work.
The SI_FL_WANT_PUT flag is used in an awkward way, sometimes it's
set by the stream-interface to mean "I have something to deliver",
sometimes it's cleared by the channel to say "I don't want you to
send what you have", and it has to be set back once CF_DONT_READ
is cleared. This will have to be split between SI_FL_RX_WAIT_EP
and SI_FL_RXBLK_CHAN. This patch only replaces all uses of the
flag with its natural (but negated) replacement SI_FL_RX_WAIT_EP.
The code is expected to be strictly equivalent. The now unused flag
was completely removed.
This flag is not enough to describe all blocking situations, as can be
seen in each case we remove it. The muxes has taught us that using multiple
blocking flags in parallel will be much easier, so let's start to do this
now. This patch only renames this flags in order to make next changes more
readable.
There currently is an optimization in stream_int_notify() consisting
in not trying to forward small bits of data if extra data remain to be
processed. The purpose is to avoid forwarding one chunk at a time if
multiple chunks are available to be parsed at once. It consists in
avoiding sending pending output data if there are still data to be
parsed in the channel's buffer, since process_stream() will have the
opportunity to deal with them all at once.
Not only this optimization is less useful with the new way the connections
work, but it even causes problems like lost events since WAIT_ROOM will
not be removed. And with HTX, it will never be able to update the input
buffer after the first read.
Let's relax the rules now, by always sending if we don't have the
CF_EXPECT_MORE flag (used to group writes), or if the buffer is
already full.
The function used to abuse the internals of mux_pt to retrieve a
conn_stream, which will not work anymore after the idle connection
changes. Let's make it rely on the more reliable cs_get_first()
instead.
This method is used to retrieve the first known good conn_stream from
the mux. It will be used to find the other end of a connection when
dealing with the proxy protocol for example.
Commit d4dd22d ("MINOR: h2: Let user of h2_recv() and h2_send() know xfer
has been done") changed the API without documenting the expected returned
values which appear to come out of nowhere in the code :-( Please don't
do that anymore! The description was recovered from the commit message.
To be used, error messages declared in a default section must be copied when the
parsing of a proxy section starts. But this was only done for frontends.
This patch may be backported to older versions.
There are still some unwelcome synchronous calls to si_cs_recv() in
process_stream(). Let's have a new function si_sync_recv() to perform
a synchronous receive call on a stream interface regardless of the type
of its endpoint, and move these calls there. For now it only implements
conn_streams since it doesn't seem useful to support applets there. The
function implements an extra check for the stream interface to be in an
established state before attempting anything.
In commit f26c26c ("BUG/MEDIUM: stream-int: change the way buffer room
is requested by a stream-int") we used to call si_want_put() at the
end of sess_update_st_con_tcp(), when switching to SI_ST_EST state.
But this is incorrect as there are a few other situations where we
can switch to this state, such as in si_connect() where a connection
reuse is detected, or when directly calling an applet (in which case
that was already covered anyway). For now it doesn't have any side
effect but it could impact connection reuse after the stream-int
changes by stalling an immediately reused connection.
Let's move this flag change to sess_establish() instead, which is the
only place which is always called exactly once on connection setup.
No backport is needed, this is purely 1.9.
In master-worker mode, the socketpair CLI listener of the worker is now
marked unstoppable, which allows to connect to the CLI of an old process
which is in a leaving state, allowing to debug it.
An unstoppable listener is a listener which won't be stop during a soft
stop. The unstoppable_jobs variable is incremented and the listener
won't prevent the process to leave properly.
It is not a good idea to use this feature (the LI_O_NOSTOP flag) with a
listener that need to be bind again on another process during a soft
reload.
This patch allows a process to properly quit when some jobs are still
active, this feature is handled by the unstoppable_jobs variable, which
must be atomically incremented.
During each new iteration of run_poll_loop() the break condition of the
loop is now (jobs - unstoppable_jobs) == 0.
The unique usage of this at the moment is to handle the socketpair CLI
of a the worker during the stopping of the process. During the soft
stop, we could mark the CLI listener as an unstoppable job and still
handle new connections till every other jobs are stopped.
The previous commit fedceaf33 ("MINOR: http: Regroup return statements of
http_req_get_intercept_rule at the end") partly fixes the problem. But not
entierly. Because HTTP 103 reponses were sent line by line it is possible to mix
them with others. For instance, an early-hint rule followed by a redirect rule
leaving the response buffer totally messed up. Furthermore, if we fail to add
the last CRLF to finish the HTTP 103 response because there is no more space in
the buffer, it leave the buffer with an unfinished and invalid message.
This patch fixes the bug by creating a fully formed HTTP 103 response before
trying to push it in the response buffer. If an error occurred during the copy
or if another response was already sent, the HTTP 103 response is
ignored. However, the last point should never happened because, for redirects
and authentication errors, we first try to copy any pending HTTP 103 response.
Instead of having multiple return statements spreaded here and there in middle
of the function, we just exit from the loop setting the right return code. It
let a chance to do some work before leaving the function. It is also less error
prone.
Instead of having multiple return statements spreaded here and there in middle
of the function, we just exit from the loop setting the right return code. It
let a chance to do some work before leaving the function. It is also less error
prone.
This patch fixes a bug introduced in the commit 6b952c810 ("REORG: http: move
http_get_path() to http.c"). In the reorg, the code responsible to skip the
version to only extract the path in the HTTP request was dropped.
No backport is needed, this only affects 1.9.
When splice() reports a pipe full condition, we go through the common
code used to release a possibly empty pipe (which we don't have) and which
immediately tries to allocate a buffer that will never be used. Further,
it may even subscribe to get this buffer if the resources are low. Let's
simply get out of this way if the pipe is full.
This fix could be backported to 1.8 though the code is a bit different
overthere.
Since we don't necessarily pass through conn_fd_handler() when reading,
conn_refresh_polling_flags() is not necessarily called when performing
a recv() operation, thus flags like CO_FL_WAIT_ROOM are not cleared.
It happens that si_cs_recv() checks CO_FL_WAIT_ROOM before deciding to
receive into a buffer, to see if the previous rcv_pipe() call failed by
lack of pipe room. The combined effect of these two statements is that
at the end of a file transmission, when there's too little data to
warrant the use of a pipe and the pipe is empty, we refrain from using
rcv_pipe() for the last few bytes, but since CO_FL_WAIT_ROOM is still
present, we don't use rcv_buf() either, and the connection remains
frozen in this state with si_cs_recv() called in loops.
In order to fix this we can simply manually clear CO_FL_WAIT_ROOM when
not using pipe so that the next check sees the result of the previous
operation and not an old one. We could equally call
cond_refresh_polling_flags() but that would be overkill and dangerous
given that it would manipulate the connection's flags under the mux.
By the way ideally the mux should report this flag into the connstream
for cleaner manipulation.
No backport is needed as this is only post 1.9-dev2.
As part of the changes that went into 1.9-dev2 regarding the polling
modifications, the changes consecutive to the removal of the wait_list
from the conn_streams (commit 71384551a) made si_cs_recv() occasionally
return without subscribing to receive events, causing spliced transfers
to randomly fail if the client was at least as fast as the server. This
may remain unnoticed on most deployments since servers are usually close
to haproxy with higher bandwidth than clients have, resulting in buffers
always being full.
In order to reproduce his effect, it is better to do it on the local
machine and to transfer very large objects (hundreds of gigs) over a
single connection, to see it suddenly stall after a few tens of gigs.
Now with this fix it's fine even after 3 TB over a single connection.
No backport is needed.
When we allocate struct stksess, we also allocate memory to store the
associated data before the struct itself.
As the data can be of different types, they can have different size. However,
we need the struct stksess to be properly aligned, as it can do 64bits
load/store (including atomic load/stores) on 64bits platforms, and some of
them doesn't support unaligned access.
So, when allocating the struct stksess, round the size up to the next
multiple of sizeof(void *), and make sure the struct stksess itself is
properly aligned.
Many thanks to Paul Martin for investigating and reporting that bug.
This should be backported to earlier releases.
When configuring the logs with a FD and using the master worker, the FD
was closed upon a reload because it was configured with CLOEXEC. It
leads to using the wrong FD for the logs and to close them. Which is
unfortunate since the master rely on the FD left opened during a reload.
The fix is to stop doing a CLOEXEC when the FD is inherited.
No backport needed.
This patch implements http_apply_early_hint_rule() function is responsible of
building HTTP 103 Early Hint responses each time a "early-hint" rule is matched.
This patch adds a "early_hint" struct to "arg" union of "act_rule" struct
and parse "early-hint" http-request keyword with it using the same
code as for "(add|set)-header" parser.
When namespaces are disabled, support is still reported because the file
is built with almost nothing in it but built anyway. Instead of extending
the scope of the numerous ifdefs in this file, better avoid building it
when namespaces are diabled. In this case we define my_socketat() as an
inline function mapping directly to socket(). The struct netns_entry
still needs to be defined because it's used by various other functions
in the code.
Splicing was in great part broken over the last few development version
due to the use of co_data() to detect if data are available in the channel.
But co_data() only looks at buffered data, not spliced data.
Channel_is_empty() takes care of both and should be used. With this,
splicing restarts to work but there are still a few cases where transfers
may stall.
No backport is needed.
Subsequent to the recent stream-int updates, we started to consider that
SI_FL_WANT_PUT needs to be set when receipt is enabled, but this is wrong
and results in 100% CPU when an HTTP client stays idle after a keep-alive
request because the stream-int has nothing to provide and nothing to send.
In fact just like for applets this flag should reflect the continuation
of an attempt. So it's si_cs_recv() which should set the flag, and clear
it if it has nothing more to provide. This function is called the first
time in process_stream()), and called again during transfers, so it will
always be up to date during stream_int_update() and stream_int_notify().
As a special case, it should also be set when a connection switches to
the established state. And we should absolutely refrain from calling
si_cs_recv() to re-enable reading, normally just setting this flag
(from within the stream-int's handler or prior to calling si_chk_rcv())
is expected to be OK.
A corner case remains where it was observed that in stream_int_notify() we
can sometimes be called with an empty output channel with SI_FL_WAIT_ROOM
and no CF_WRITE_PARTIAL, so there's no way to detect that we should
re-enable receiving. It's easy to also take care of this condition
there for the time it takes to figure if this situation is expected
or not.
Now it becomes more obvious that relying on a single flag to request
room (or on two flags to arbiter activity) is not workable given the
autonomy of both sides. The mux_h2 has taught us that blocking flags
are much more reliable, require much less condition and are much easier
to deal with. That's probably something to consider quickly in this
area.
No backport is needed.
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.
Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :
log stdout format short daemon # send everything to stdout
log stderr format short daemon notice # send important events to stderr
In certain situations it would be desirable to log to an existing file
descriptor, the most common case being a pipe between containers or
processes. The main issue with pipes is that using write() on them will
randomly truncate messages. But there is a trick. By using writev(), we
can atomically deliver or drop a message, which perfectly fits the
purpose. The only caveat is that large messages (4096 bytes on modern
operating systems) may be interleaved with messages from other processes
if using nbproc for example. In practice such messages are rare and most
of the time when users need such type of logging, the load is low enough
for a single process to be running so this is not really a problem.
This logging method thus uses unbuffered writev() calls and is uses more
CPU than if it used its own buffer with large writes at once, though this
is not a problem for moderate loads.
Logging to a file descriptor attached to a file also works with the side
effect that the process is significantly slowed down during disk accesses
and that it's not possible to rotate the file without restarting the
process. For this reason this option is not offered as a configuration
option, since it would confuse most users, but one could decide to
redirect haproxy's output to a file during debugging sessions. Two aliases
"stdout" and "stderr" are provided, but keep in mind that these are closed
by default in daemon mode.
When logging to a pipe or socket at a high enough rate, some logs will be
dropped and the number of dropped messages is reported in "show info".
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
The error messages used to say something along "socket logger 2 failed"
or "sendmsg logger 2 failed" which are confusing. Let's rephrase this
"sendmsg() failed for logger 2".
Building on 32 bit gives this :
src/cache.c: In function 'http_action_store_cache':
src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c: In function 'cache_channel_append_age_header':
src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :
#define CACHE_ENTRY_MAX_AGE 2147483648
Just appending "U" to mark it unsigned is enough to fix it. This only
affects 1.9, no backport is needed.
In 1.8, commit 45a66cc ("MEDIUM: config: ensure that tune.bufsize is at
least 16384 when using HTTP/2") tried to avoid an annoying issue making
H2 fail when haproxy is built with default buffer sizes smaller than 16kB,
which used to be the case for a very long time. Sadly, the test only sees
when NPN/ALPN exactly match "h2" and not when it's combined like
"h2,http/1.1" nor "http/1.1,h2". We can safely use strstr() there because
the string is prefixed by the token's length (0x02) which is unambiguous
as it cannot be part of any other token.
This fix should be backported to 1.8 as a safety guard against bad
configurations.
Before calling the mux to get incoming data, we get the amount of space
available at the input of the buffer. If there is no space, we don't try to read
more data. This is good enough when raw data are stored in the buffer. But this
info has no meaning when structured data are stored. Because with the HTTP
refactoring, such kind of data will be stored in buffers, it is a bit annoying.
So, to avoid any problems, we always call the mux. It is the mux's responsiblity
to notify the stream interface it needs more space to store more data. This must
be done by setting the flag CS_FL_RCV_MORE on the conn_stream.
This is exactly what we do in the pass-through mux when <count> is null.
This flag is set on the stream interface when we should wait for more space in
the channel's buffer to store more incoming data. This means we should wait some
outgoing data are sent before retrying to receive more data.
But in stream interface functions, at many places, instead of checking this
flag, we use the function channel_may_recv to know if we can (re)start
reading. This currently works but it is not really consistent. And, it works
because only raw data are stored in buffers. But it will be a problem when we
start to store structured data in buffers.
So to avoid any problems with futur implementations, we now rely only on
SI_FL_WAIT_ROOM. The function channel_may_recv can still be called, but only
when we are sure to handle raw data (for instance in functions ci_put*). To do
so, among other things, we must be sure to unset SI_FL_WAIT_ROOM and offer an
opportunity to call chk_rcv() on a stream interface when some data are sent
on the other end, which is now granted by the previous patch series.
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
It's far from being clean, but at least it allows to resync both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while appletx are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.
The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.
And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().
Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.
We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.
Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.
This flag should already be cleared before calling the *chk_rcv() functions.
Before adapting all call places, let's first make sure si_chk_rcv() clears
it before calling them so that these functions do not have to check it again
and so that they do not adjust it. This function will only call the lower
layers if the SI_FL_WANT_PUT flag is present so that the endpoint can decide
not to be called (as done with applets).
We now do this on the si_cs_recv() path so that we always have
SI_FL_WANT_PUT properly set when there's a need to receive and
SI_FL_WAIT_ROOM upon failure.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
The buffer allocation callback appctx_res_wakeup() used to rely on old
tricks to detect if a buffer was already granted to an appctx, namely
by checking the task's state. Not only this test is not valid anymore,
but it's inaccurate.
Let's solely on SI_FL_WAIT_ROOM that is now set on allocation failure by
the functions trying to allocate a buffer. The buffer is now allocated on
the fly and the flag removed so that the consistency between the two
remains granted. The patch also fixes minor issues such as the function
being improperly declared inline(!) and the fact that using appctx_wakeup()
sets the wakeup reason to TASK_WOKEN_OTHER while we try to use TASK_WOKEN_RES
when waking up consecutive to a ressource allocation such as a buffer.
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.
The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
When using the CLI proxy of the master and trying to access a worker
with the @ prefix, the worker just crash.
The commit 7216032 ("MEDIUM: mworker: leave when the master die")
reintroduced the old code of the pipe, which was not trying to access
the pointers before. The owner of the FD was modified to a different
value, this is a problem since we call listener_accept() in most cases
now from the mworker_accept_wrapper() and it casts the owner variable to
get the listener.
This patch fix the issue by setting back the previous owner of the FD.
Commit eafd8ebcf ("MEDIUM: stream-int: call si_cs_process() in
stream_int_update_conn") uncovered a sleeping bug. By calling
si_cs_process() within si_update(), we end up calling stream_int_notify().
We rely on it to update the stream-int before quitting as a hack, but
it happens to immediately wake the task up while the stream int's
state is still SI_ST_CON (during the connection establishment). The
observable effect is that an unreachable server causes haproxy to
use 100% CPU until the connection timeout strikes.
This patch fixes this by not causing the wake up for the SI_ST_CON
state. It would equally be possible to check for states higher than
SI_ST_EST as is done in other places, but for now better stay on the
safe side by covering the only issue that can be triggered. It's
suspected that this issue slightly affects older versions by causing
one extra call to process_stream() during the connection setup for
each activity change on the other side, but this should not have
any observable effect.
No backport is needed.
The process was aborting with nbthread > 1.
The mworker_pipe_register() could be called several time in multithread
mode, we don't want to abort() there.
It took me 17 minutes this morning to figure where si->wait_event was
set (it's in si_reset() which should now probably be renamed since it
doesn't just perform a reset anymore but also an allocation) and what
its task was assigned to (si_cs_io_cb() even for applets and empty SI).
This is too confusing and not intuitive enough, let's at least add a
few comments for now to help figure how this stuff works next time.
When the master die, the worker should exit too, this is achieved by
checking if the FD of the socketpair/pipe was closed between the master
and the worker.
In the former architecture of the master-worker, there was only a pipe
between the master and the workers, and it was easy to check an EOF on
the pipe FD to exit() the worker.
With the new architecture, we use a socketpair by process, and this
socketpair is also used to accept new connections with the
listener_accept() callback.
This accept callback can't handle the EOF and the exit of the process,
because it's very specific to the master worker. This is why we
transformed the mworker_pipe_handler() function in a wrapper which check
if there is an EOF and exit the process, and if not call
listener_accept() to achieve the accept.
The former behavior was to exit() the master process with the latest
status code known, which was the one of the last process to exit.
The problem is that the master process was not exiting with the status
code which provoked the exit-on-failure.
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect that some peers cannot be reached for example. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
When an haproxy process doesn't stop after a reload, it's because it
still has some active "jobs", which mainly are active sessions, listeners,
peers or other specific activities. Sometimes it's difficult to troubleshoot
the cause of these issues (which generally are the result of a bug) only
because some indicators are missing.
This patch add the number of listeners, the number of jobs, and the stopping
status to the output of "show info". This way it becomes a bit easier to try
to narrow down the cause of such an issue should it happen. A typical use
case is to connect to the CLI before reloading, then issuing the "show info"
command to see what happens. In the normal situation, stopping should equal
1, jobs should equal 1 (meaning only the CLI is still active) and listeners
should equal zero.
The patch is so trivial that it could make sense to backport it to 1.8 in
order to help with troubleshooting.
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
- sessions, used to more accurately report timeouts in logs when seeing
TASK_WOKEN_TIMEOUT ;
- peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL
This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).
No backport is needed.
This one causes some events to be lost. It has already been tested in
an experimental branch but was not merged until being certain it was
needed. Fred figured that requesting /?k=1&s=447392 from httpterm through
haproxy-master was enough to stall the transfer.
No backport is needed, this only affects 1.9-dev5.
It was reported here that authentication may fail when threads are
enabled :
https://bugzilla.redhat.com/show_bug.cgi?id=1643941
While I couldn't reproduce the issue, it's obvious that there is a
problem with the use of the non-reentrant crypt() function there.
On Linux systems there's crypt_r() but not on the vast majority of
other ones. Thus a first approach consists in placing a lock around
this crypt() call. Another patch may relax it when crypt_r() is
available.
This fix must be backported to 1.8. Thanks to Ryan O'Hara for the
quick notification.
A bug occurs when the CLI proxy of the master received a command which
is prefixed by some spaces but without a routing prefix (@).
In this case the pcli_parse_request() was returning a wrong number of
data to forward.
The response analyzer was called twice and the prompt displayed twice.
Commit 85b73e9 ("BUG/MEDIUM: stream: Make sure polling is right on retry.")
introduced a possible null dereference on the error path detected by gcc-7.
Let's simply assign srv_conn after checking the error and not before.
No backport is needed.
When the "path" sample fetch function is called without any path, the
function doesn't check that the request buffer is allocated. While this
doesn't happen with the request during processing, it can definitely
happen when mistakenly trying to reference a path from the response
since the request channel is not allocated anymore.
It's certain that this bug was emphasized by the buffer changes that
went in 1.9 and the HTTP refactoring, but at first glance, 1.8 doesn't
seem 100% safe either so it's possible that older version are affected
as well.
Thanks to PiBa-NL for reporting this bug with a reproducer.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.
During the storage of new HTTP objects we memorize ->eoh value and
the value of the "Age" header coming from the origin server.
These information may then be reused to return the cached HTTP objects
with a new "Age" header.
May be backported to 1.8.
This patch implements analysers for parsing the CLI and extra features
for the master's CLI.
For each command (sent alone, or separated by ; or \n) the request
analyser will determine to which server it should send the request.
The 'mode cli' proxy is able to parse a prefix for each command which is
used to select the apropriate server. The prefix start by @ and is
followed by "master", the PID preceded by ! or the relative PID. (e.g.
@master, @1, @!1234). The servers are not round-robined anymore.
The command is sent with a SHUTW which force the server to close the
connection after sending its response. However the proxy allows a
keepalive connection on the client side and does not close.
The response analyser does not do much stuff, it only reinits the
connection when it received a close from the server, and forward the
response. It does not analyze the response data.
The only guarantee of the end of the response is the close of the
server, we can't rely on the double \n since it's not send by every
command.
This could be reimplemented later as a filter.
Add a struct server pointer in the mworker_proc struct so we can easily
use it as a target for the mworker proxy.
pcli_prefix_to_pid() is used to find the right PID of the worker
when using a prefix in the CLI. (@master, @#<relative pid> , @<pid>)
pcli_pid_to_server() is used to find the right target server for the
CLI proxy.
The master process does not need all the keywords of the cli, add 2
flags to chose which keyword to use.
It might be useful to activate some of them in a debug mode later...
This patch introduces mworker_cli_proxy_new_listener() which allows the
creation of new listeners for the CLI proxy.
Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. It is allowed to create
multiple listeners with several -Sa.
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.
In the current state of the code, the proxy is only doing round robin on
the CLI of the workers. A CLI mode will be needed to know to which CLI
send the requests.
The init code of the mworker_proc structs has been moved before the
init of the listeners.
Each socketpair is now connected to a CLI within the workers, which
allows the master to access their CLI.
The inherited flag of the worker side socketpair is removed so the
socket can be closed in the master.
There's a call there to si_cs_send() while we're supposed to come from
si_cs_io_cb() which has just done it. But in fact we can also come here
as a lower layer callback from ->wake() after a connection is established.
Since most of the time we'll end up here with either no data in the buffer
or a blocked output, let's simply check if we're already susbcribed to send
events before calling si_cs_send().
stream_int_notify() is I/O agnostic and should not wake up the tasklet,
it's up to si_cs_process() to do that, just like si_applet_wake_cb()
does it for the applet.