Make "bind" keywork be supported in "peers" sections.
All "bind" settings are supported on this line.
Add "default-bind" option to parse the binding options excepted the bind address.
The bind address is no longer parsed for local peers on "server" lines.
Also stop using list_for_each_entry() to set the "peers" section
listener parameters, because there is only one listener per "peers" section.
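As an illustration, a hypothetical configuration mixing these keywords could
look like this (addresses and certificate path are made up):

    peers mypeers
        default-bind ssl crt /etc/haproxy/peers.pem
        bind 192.168.0.1:10000
        server hap1                       # local peer: no bind address on its "server" line
        server hap2 192.168.0.2:10000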
May be backported to 1.5 and newer.
This patch adds a pointer to a struct server to the peer structure; it
is initialized after a remote "peer" line has been parsed.
After all "peers" sections have been parsed, we run ->prepare_srv to initialize
all the SSL/TLS stuff of each remote peer (or server).
The remaining thing to do to completely support the peer protocol over SSL/TLS
is to make the "bind" keyword be supported in "peers" sections so that SSL/TLS
incoming connections to local peers work.
May be backported to 1.5 and newer.
With this patch "default-server" lines are supported in "peers" sections
to setup the default settings of peers which are from now setup
when parsing both "peer" and "server" lines.
May be backported to 1.5 and newer.
Even if this is not yet the case, we assume that the frontend of a "peers"
section may have already been initialized outside of a "peer" line, so we
separate the frontend initializations from the binding initializations.
May be backported to 1.5 and newer.
Use ->local "peers" struct member to flag a "peers" section frontend
has being initialized. This is to be able to initialize the frontend
of "peers" sections on lines different from "peer" lines.
May be backported to 1.5 and newer.
Create an init_peers_frontend() function to allocate and initialize
the frontend of "peers" sections (->peers_fe) so that it can be reused later.
May be backported to 1.5 and newer.
If a send succeeded, add the CO_FL_CONNECTED flag: the send may have been
called by the upper layers before we even realized we were connected, and we
may even read the response before we get the information, so si_cs_recv()
has to know if we were connected or not.
This should be backported to 1.9.
If an ALPN is set on the server line, then when we reach assign_tproxy_address,
the stream_interface's endpoint will be a connection, not a conn_stream,
so make sure assign_tproxy_address() handles both cases.
This should be backported to 1.9.
For HTX streams, the scope pointer is relative to the URI in the start-line. But
for streams using the legacy HTTP representation, the scope pointer is relative
to the beginning of output data in the channel's buffer. So we must be careful
to use the right one depending on whether HTX is used or not.
Because the start-line is used to get the scope pointer, it is important to keep
it after the parsing of POST parameters. So now, instead of removing blocks once
they are read in the function stats_process_http_post(), we just move to the
next one, leaving them in the HTX message.
Thanks to Pieter (PiBa-NL) for reporting this bug.
This patch must be backported to 1.9.
The code using the deprecated 'buf->p' has been updated to use
'ci_head(buf)' as described in section 5 of
'doc/internals/buffer-api.txt'.
A compile time switch on 'BUF_NULL' has also been added to enable the
same source code to be used with pre and post API change HAProxy.
This should be backported to 1.9, and is compatible with previous
versions.
The most significant change from 1.8 to >=1.9 is the buffer
data structure; the code now uses the new field, fixing along the way
a little hidden compilation warning.
This must be backported to 1.9.
When an argument <draws> is present, it must be an integer value one
or greater, indicating the number of draws before selecting the least
loaded of these servers. It was indeed demonstrated that picking the
least loaded of two servers is enough to significantly improve the
fairness of the algorithm, by always avoiding to pick the most loaded
server within a farm and getting rid of any bias that could be induced
by the unfair distribution of the consistent list. Higher values N will
take away N-1 of the highest loaded servers at the expense of performance.
With very high values, the algorithm will converge towards the leastconn's
result but much slower. The default value is 2, which generally shows very
good distribution and performance. This algorithm is also known as the
Power of Two Random Choices and is described here :
http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf
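For illustration, here is a self-contained C sketch of the draw loop described
above (the struct and its field are hypothetical, not HAProxy's internal types):

    #include <stdlib.h>

    struct server {
        int cur_load;    /* hypothetical: current number of active connections */
    };

    /* pick the least loaded among <draws> randomly drawn servers */
    static struct server *random_pick(struct server *srv, int nbsrv, int draws)
    {
        struct server *best = NULL;

        while (draws-- > 0) {
            struct server *s = &srv[rand() % nbsrv];

            if (!best || s->cur_load < best->cur_load)
                best = s;
        }
        return best;
    }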
Since all of them are exclusive, let's move them to an union instead
of eating memory with the sum of all of them. We're using a transparent
union to limit the code changes.
Doing so reduces the struct lbprm from 392 bytes to 372, and thanks
to these changes, the struct proxy is now down to 6480 bytes vs 6624
before the changes (144 bytes saved per proxy).
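A hedged sketch of the resulting layout (member names are only indicative):

    struct lbprm {
        union {                    /* algo-specific data: exclusive, so share storage */
            struct lb_map map;
            struct lb_fwrr fwrr;
            struct lb_chash chash;
            /* ... one member per LB algorithm ... */
        };
        /* ... fields common to all algorithms ... */
    };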
This one is a proxy option which can be inherited from defaults even
if the LB algo changes. Move it out of the lb_chash struct so that we
don't need to keep anything separate between these structs. This will
allow us to merge them into an union later. It even takes less room
now as it fills a hole and removes another one.
The algo-specific settings move from the proxy to the LB algo this way :
- uri_whole => arg_opt1
- uri_len_limit => arg_opt2
- uri_dirs_depth1 => arg_opt3
Some algorithms require a few extra options (up to 3). Let's provide
some room in lbprm to store them, and make sure they're passed from
defaults to backends.
These ones used to rely on separate variables called hh_name/hh_len
but they are exclusive with the former. Let's use the same variable
which becomes a generic argument name and length for the LB algorithm.
There are a few instances where the lookup algo is tested against
BE_LB_LKUP_CHTREE using a binary "AND" operation while this macro
is a value among a set, and not a bit. The test happens to work
because the value is exactly 4 and no bit overlaps with the other
possible values but this is a latent bug waiting for a new LB algo
to appear to strike. At the moment the only other algo sharing a bit
with it is the "first" algo which is never supported in the same code
places.
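The faulty and the fixed tests look like this (a sketch; BE_LB_LKUP stands for
the mask covering the whole lookup field):

    /* wrong: bit test on a value taken among a set */
    if (px->lbprm.algo & BE_LB_LKUP_CHTREE) { ... }

    /* right: compare the whole lookup field against the value */
    if ((px->lbprm.algo & BE_LB_LKUP) == BE_LB_LKUP_CHTREE) { ... }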
This fix should be backported to maintained versions for safety if it
passes easily, otherwise it's not important as it will not fix any
visible issue.
The "balance uri" options "whole", "len" and "depth" were not properly
inherited from the defaults sections. In addition, "whole" and "len"
were not even reset when parsing "uri", meaning that 2 subsequent
"balance uri" statements would not have the expected effect as the
options from the first one would remain for the second one.
This may be backported to all maintained versions.
At a few places in the code we used to rely on this variable to guess
what LB algo was in place. This is wrong because if the defaults section
presets "balance url_param foo" and a backend uses "balance roundrobin",
these locations will still see this url_param_name set and consider it.
The harm is limited, as this only causes the beginning of the request
body to be buffered. And in general this is a bad practice which prevents
us from cleaning the lbprm stuff. Let's explicitly check the LB algo
instead.
This may be backported to all currently maintained versions.
OpenSSL switched from aes128 to aes256 in May 2016 to compute
the tls ticket secrets used by default. But HAProxy still handled only
128-bit keys for both the tls key file and the CLI.
This patch permits the user to set aes256 keys through the CLI or
the key file (80 bytes encoded in base64) in the same way that
aes128 keys were handled (48 bytes encoded in base64):
- first 16 bytes for the key name
- next 16/32 bytes for the aes 128/256 bits key
- last 16/32 bytes for the 128/256 bits hmac
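In C terms, the decoded layouts follow directly from this description (struct
and field names below are hypothetical):

    struct tls_ticket_key_128 {
        unsigned char name[16];
        unsigned char aes_key[16];     /* aes 128 bits */
        unsigned char hmac_key[16];    /* hmac 128 bits */
    };                                 /* 48 bytes before base64 encoding */

    struct tls_ticket_key_256 {
        unsigned char name[16];
        unsigned char aes_key[32];     /* aes 256 bits */
        unsigned char hmac_key[32];    /* hmac 256 bits */
    };                                 /* 80 bytes before base64 encoding */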
Both sizes are now supported (but keys from the same file must be
of the same size and can be updated via the CLI only using a key of
the same size).
Note: This feature needs the fix "dec func ignores padding for output
size checking."
This patch fixes missing allocation checks when loading the tls key file
and avoids a memory leak in some error cases.
This patch should be backported to branches 1.9 and 1.8.
The decode function returns an error even if the output buffer is
large enough, because the padding was not considered. This
case was never met with the current code base.
In h1_process(), if we have no associated stream, and the connection got a
shutw, then destroy it: it is unusable and it may be our last chance to do
so.
This should be backported to 1.9.
When using a check to send email, avoid having an associated server, so that
we don't modify the server state if we fail to send an email.
Also revert back to initializing the check status to HCHK_STATUS_INI: now that
set_server_check_status() stops early if there's no server, we shouldn't
get into a mail loop anymore.
This should be backported to 1.9.
Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.
This should be backported to 1.9.
The cache uses the first 32 bits of the uri's hash as the key to reference
the object in the cache. It makes a special case of the value zero to mean
that the object is not in the cache anymore. The problem is that when an
object hashes as zero, it's still inserted but the eb32_delete() call is
skipped, resulting in the object still being chained in the memory area
while the block has been reclaimed and used for something else. Then when
objects which were chained below it (technically any object since zero is
at the root) are deleted, the walk through the upper object may encounter
corrupted values where valid pointers were expected.
But while this should statistically only happen once in 4 billion, the problem
gets worse when the cache-use conditions don't match the cache-store ones,
because cache-store runs with an uninitialized key, which can create objects
that will never be found by the lookup code, or worse, entries with a zero
key preventing eviction of the tree node and resulting in a crash. It's easy
to accidentally end up on such a config because the request rules generally
can't be used to decide on the response :
    http-request cache-use cache if { path_beg /images }
    http-response cache-store cache
In this test, mixing traffic with /images/$RANDOM and /foo/$RANDOM will
result in random keys being inserted, some of them possibly being zero,
and crashes will quickly happen.
The fix consists in 1) always initializing the transaction's cache_hash
to zero, and 2) never storing a response for which the hash has not been
calculated, as indicated by the value zero.
It is worth noting that objects hashing as value zero will never be cached,
but given that there's only one chance among 4 billion that this happens,
this is totally harmless.
This fix must be backported to 1.9 and 1.8.
When using early data, disable the OpenSSL anti-replay protection, and set
the max amount of early data we're ready to accept, based on the size of
buffers, or early data won't work with the released OpenSSL 1.1.1.
This should be backported to 1.8.
When initializing a server-template, all of the servers after the first
have srv->idle_orphan_conns initialized within server_template_init().
The first server does not have this initialized, and when http-reuse
is active this causes a segmentation fault when accessed from
srv_add_to_idle_list(). This patch removes the check for
srv->tmpl_info.prefix within server_finalize_init() and allows
the first server within a server-template to have srv->idle_orphan_conns
properly initialized.
This should be backported to 1.9.
In the function hlua_applet_htx_send_yield(), there already was a test to
respect the reserve but the wrong function was used to get the available space
for data in the HTX buffer. Instead of calling htx_free_space(), the function
htx_free_data_space() must be used. But in fact, there is no reason to bother
with that anymore because the function channel_htx_recv_max() has been added for
this purpose.
The result of this bug is that the call to htx_add_data() failed unexpectedly
while the amount of written data was incremented, leading the applet to think
all data was sent. To prevent any further bugs, a test has been added to yield if
we are not able to write data into the channel buffer.
This patch must be backported to 1.9.
Tim Düsterhus reported a possible crash in the H2 HEADERS frame decoder
when the PRIORITY flag is present. A check is missing to ensure the 5
extra bytes needed with this flag are actually part of the frame. As per
RFC7540#4.2, let's return a connection error with code FRAME_SIZE_ERROR.
Many thanks to Tim for responsibly reporting this issue with a working
config and reproducer. This issue was assigned CVE-2018-20615.
This fix must be backported to 1.9 and 1.8.
channel_truncate() is not aware of the underlying format of the messages. So if
there are some outgoing data in the channel when called, it does some unexpected
operations on the channel's buffer. So the HTX version, channel_htx_truncate(),
must be used. The same is true for channel_erase(). It resets the buffer but not
the HTX message. So channel_htx_erase() must be used instead. This patch is
flagged as a bug, but as far as we know, it was never hit.
This patch should be backported to 1.9. If so, following patch must be
backported too:
* MINOR: channel/htx: Add the HTX version of channel_truncate/erase
We need to check if any compression filter precedes the cache filter. This is
only possible when the compression is configured in the frontend while the cache
filter is configured on the backend (via a cache-store action or
explicitly). This case cannot be detected during HAProxy startup. So in such
cases, the cache is disabled.
The patch must be backported to 1.9.
On legacy HTTP streams, it is forbidden to use the compression with the
cache. When the compression filter is explicitly specified, the detection works
as expected and such configurations are rejected at startup. But it does not work
when the compression filter is implicitly defined. To fix the bug, the implicit
declaration of the compression filter is checked first, before calling the
.check() callback of each filter.
This patch should be backported to 1.9.
Since the commit 9666720c8 ("BUG/MEDIUM: compression: Use the right buffer
pointers to compress input data"), the compression can be done twice. The first
time on the frontend and the second time on the backend. This may happen when
the compression is configured in a defaults section.
To fix the bug, when the response is checked to know if it should be compressed
or not, if the flag HTTP_MSGF_COMPRESSING is set, the compression is not
performed. It means it is already handled by a previous compression filter.
Thanks to Pieter (PiBa-NL) for reporting this bug.
This patch must be backported to 1.9.
Now, h1_shutr() only does a read shutdown and tries to set the flag
H1C_F_CS_SHUTDOWN if a write shutdown was already performed. On its side,
h1_shutw(), if all conditions are met, does the same for the write shutdown. The
real connection close is done when the h1 mux is released, in h1_release().
The flag H1C_F_CS_SHUTW was renamed to H1C_F_CS_SHUTDOWN to be less ambiguous.
This patch may be backported to 1.9.
In h1_shutr(), to fully close the connection, we must be sure the write shutdown
was already performed on the connection. So we now rely on connection flags
instead of conn_stream flags. If CO_FL_SOCK_WR_SH is already set when h1_shutr()
is called, we can do a full connection close. Otherwise, we just do the read
shutdown.
Without this patch, it is possible to close the connection too early with some
outgoing data in the output buf.
This patch must be backported to 1.9.
As for the cache applet, this one must respect the reserve on HTX streams. This
patch is tagged as MINOR because it is unlikely to fully fill the channel's
buffer. Some tests are already done to avoid processing an almost full buffer.
This patch must be backported to 1.9.
It is only true for HTX streams. The legacy code relies on ci_putblk() which is
already aware of the reserve. It is mandatory not to fill the reserve, to let
other filters analyse data. It is especially true for the compression
filter. It needs at least 20 bytes of free space, plus at most 5 bytes per 32kB
block. So if the cache fully fills the channel's buffer, the compression will
not have enough space to do its job and it will block the data forwarding,
waiting for more free space. But if the buffer is fully filled with input data
(i.e. no outgoing data), the stream will be frozen infinitely.
This patch must be backported to 1.9. It depends on the following patches:
* BUG/MEDIUM: cache/htx: Respect the reserve when cached objects are served
from the cache
* MINOR: channel/htx: Add HTX version for some helper functions
When a task is created from a Lua context outside of the initialization,
the hlua_ctx_init() function can be called from a safe environment, so
we must not initialize the safe environment again. Since the support of
threads appeared, the safe environment sets a lock to ensure only one
Lua execution at a time. If we initialize a safe environment inside
another safe environment, we get a deadlock.
This patch adds support for an "already_safe" indicator which tells
whether the context is initialized from a safe Lua function or not.
Thanks to Flakebi for the report.
This patch must be backported to haproxy-1.9 and haproxy-1.8.
In the tcp actions case, argument n - 1 is returned. For example:
   http-request lua.script stuff
      displays "stuff" as the first arg
   tcp-request content lua.script stuff
      displays "lua.script" as the first arg
The action parser doesn't use the *cur_arg value.
Thanks to Andy Franks for the bug report.
This patch must be backported to haproxy-1.8 and haproxy-1.9.
The "show sess all" command didn't allow to detect whether compression
is in use for a given stream, which is sometimes annoying. Let's add a
few more info about the HTTP messages, namely the flags, body len, chunk
len and the "next" pointer.
The "waiting" flag indicates if the stream is waiting for some memory,
and was placed on the same output line as the txn for ease of reading.
But since 1.6 the txn is not part of the stream anymore so this output
was placed under a condition, resulting in "waiting" to appear only
when a txn is present. Let's move it upper, closer to the stream's
flags to fix this.
This may safely be backported though it has little value for older
versions.
Commit b9af88151 ("MINOR: stream/htx: Add info about the HTX structs in
"show sess all" command") accidently forgot the flags on the request
path, it was only on the response path.
It makes sense to backport this to 1.9 so that both outputs are the same.
While testing fixes, it's sometimes confusing to rebuild only one C file
(e.g. a mux) and not to have the correct commit ID reported in "haproxy -v"
nor on the stats page.
This patch adds a new "version.c" file which is always rebuilt. It's
very small and contains only 3 variables derived from the various
version strings. These variables are used instead of the macros at the
few places showing the version. This way the output version of the
running code is always correct for the parts that were rebuilt.
This one used to rely on a few spin locks around lists manipulations
only but 1) there were still a few races (e.g. when aborting, or
between STAT_ST_INIT and STAT_ST_LIST), and 2) after last commit
which dumps htx info it became obvious that dereferencing the buffer
contents is not safe at all.
This patch uses the thread isolation from the rendez-vous point
instead, to guarantee that nothing moves during the dump. It may
make the dump a bit slower but it will be 100% safe.
This fix must be backported to 1.9, and possibly to 1.8 which likely
suffers from the short races above, even though they're extremely
hard to trigger.
In connect_server(), if we're using a new connection, and we have to
initialize the mux right away, only do it so after si_connect() has been
called. si_connect() is responsible for initializing the xprt, and the
mux initialization may depend on the xprt being usable, as it may try to
receive data. Otherwise, the connection will be flagged as having an error,
and we will have to try to connect a second time.
This should be backported to 1.9.
In h1_init(), instead of calling h1_recv() directly, just wake the tasklet,
so that the receive will be done later.
h1_init() might be called from connect_server(), which is itself called
indirectly from process_stream(), and if the receive fails, we may call
si_cs_process(), which may destroy the channel buffers while process_stream()
still expects them to exist.
This should be backported to 1.9.
When a chunked object is served from the cache, if the trailers are not pushed
into the channel's buffer all at once, we still have to count them in the total
of bytes written into the buffer.
This patch must be backported to 1.9.
Since the commit 0f8fb6b7f ("MINOR: h1: make the H1 headers block parser able to
parse headers only"), when headers are not received in one time, a parsing error
is returned because the local state in the function h1_headers_to_hdr_list() was
not initialized with the previous one (in fact, it was not initialized at all).
So now, we start the parsing of headers with the state H1_MSG_HDR_FIRST when the
flag H1_MF_HDRS_ONLY is set. Otherwise, we always get it from the h1m.
This patch must be backported to 1.9.
For HTX streams, info about the HTX structure is now dumped for the request and
the response channels in "show sess all" command.
The patch may be backported to 1.9.
Now the H2 mux will parse and encode the HTX trailers blocks and send
the corresponding HEADERS frame. Since these blocks contain pure H1
trailers which may be fragmented on line boundaries, it first needs
to collect all of them, parse them using the H1 parser, build a list
and finally encode all of them at once once the EOM is met. Note that
this HEADERS frame always carries the end-of-headers and end-of-stream
flags.
This was tested using the helloworld examples from the grpc project,
as well as with the h2c tools. It doesn't seem possible at the moment
to test trailers using varnishtest though.
Currently the H1 headers parser works for either a request or a response
because it starts from the start line. It is also able to resume its
processing when it was interrupted, but in this case it doesn't update
the list.
Make it support a new flag, H1_MF_HDRS_ONLY so that the caller can
indicate it's only interested in the headers list and not the start
line. This will be convenient to parse H1 trailers.
We want to make sure we won't emit another empty DATA frame if we meet
HTX_BLK_EOM after an end of stream was already sent. For now it cannot
happen as long as HTX is respected, but with trailers it may become
ambiguous.
Recent commit 4710d20 ("BUG/MEDIUM: mux-h1: make HTX chunking
consistent with H2") tried to address chunking inconsistencies between
H1/HTX/H2 and has enforced it on every outgoing message carrying
H1_MF_XFER_LEN without H1_MF_CLEN nor H1_MF_CHNK. But it also does it
on requests, which is not appropriate since a request by default
doesn't have a message body unless explicitly mentioned. Also make
sure we only do this on HTTP/1.1 messages.
The problem is to guarantee the highest level of compatibility between
H1/H1, H1/H2, H2/H1 in each direction regarding the lack of content-
length. We have this truth table (a star '*' indicates which one can
pass trailers) :
H1 client -> H1 server :
  request:
    CL=0 TE=0 XL=1 -> CL=0 TE=0
    CL=0 TE=1 XL=1 -> CL=0 TE=1 *
    CL=1 TE=0 XL=1 -> CL=1 TE=0
    CL=1 TE=1 XL=1 -> CL=1 TE=1 *
  response:
    CL=0 TE=0 XL=0 -> CL=0 TE=0
    CL=0 TE=1 XL=1 -> CL=0 TE=1 *
    CL=1 TE=0 XL=1 -> CL=1 TE=0
    CL=1 TE=1 XL=1 -> CL=1 TE=1 *

H2 client -> H1 server : (H2 messages always carry XFER_LEN)
  request:
    CL=0 XL=1 -> CL=0 TE=0
    CL=1 XL=1 -> CL=1 TE=0
  response:
    CL=0 TE=0 XL=0 -> CL=0
    CL=0 TE=1 XL=1 -> CL=0 *
    CL=1 TE=0 XL=1 -> CL=1
    CL=1 TE=1 XL=1 -> CL=1 *

H1 client -> H2 server : (H2 messages always carry XFER_LEN)
  request:
    CL=0 TE=0 XL=1 -> CL=0
    CL=0 TE=1 XL=1 -> CL=0 *
    CL=1 TE=0 XL=1 -> CL=1
    CL=1 TE=1 XL=1 -> CL=1 *
  response:
    CL=0 XL=1 -> CL=0 TE=1 *
    CL=1 XL=1 -> CL=1 TE=0
For H1 client to H2 server, it will be possible to rely on the presence
of "TE: trailers" in the H1 request to automatically switch to chunks
in the response, and be able to pass trailers at the end. For now this
check is not implemented so an H2 response missing a content-length to
an H1 request will always have a transfer-encoding header added and
trailers will be forwarded if any.
This patch depends on previous commit "MINOR: mux-h1: parse the
content-length header on output and set H1_MF_CLEN" to work properly.
Since the aforementioned commit is scheduled for backport to 1.9 this
commit must also be backported to 1.9.
The H1_MF_CLEN flag is needed to figure whether a content-length header is
present or not when producing a request, so let's check it on output just
like we already check the transfer-encoding header.
This function is usable to transform a list of H2 header fields into an
HTX trailers block. It takes care of rejecting forbidden headers and
pseudo-headers when performing the conversion. It also emits the
trailing CRLF that is currently needed in the HTX trailers block.
When forwarding an H2 request to an H1 server, if the request doesn't
have a content-length header field, it is chunked. In this case it is
possible to send trailers to the server, which is what this patch does.
If the transfer is performed without chunking, then the trailers are
silently discarded.
This function is usable to transform a list of H2 header fields into an
H1 trailers block. It takes care of rejecting forbidden headers and
pseudo-headers when performing the conversion.
This is not exactly a bug but a long-time design limitation. We used not
to decode trailers in H2, resulting in broken connections each time a
trailer was sent, since it was impossible to keep the HPACK decompressor
synchronized. Now that the sequencing of operations permits it, we must
make sure to at least properly decode them.
What we try to do is to identify if a HEADERS frame was already seen and
use this indication to know if it's a headers or a trailers. For this,
h2c_decode_headers() checks if the stream indicates that a HEADERS frame
was already received. If so, it decodes it and emits the trailing
0 CRLF CRLF in case of H1, or the HTX_EOD + HTX_EOM blocks in case of HTX,
to terminate the data stream.
The trailers contents are still deleted for now but the request works, and
the connection remains synchronized and usable for subsequent streams.
The correctness may be tested using a simple config and h2spec :
h2spec -o 1000 -v -t -S -k -h 127.0.0.1 -p 4443 generic/4/4
This should definitely be backported to 1.9 given the low impact compared to
the benefit. However it cannot be backported to 1.8 since the operations cannot
be resumed. The following patches are also needed with this one :
MINOR: mux-h2: make h2c_decode_headers() return a status, not a count
MINOR: mux-h2: add a new dummy stream : h2_error_stream
MEDIUM: mux-h2: make h2c_decode_headers() support recoverable errors
BUG/MINOR: mux-h2: detect when the HTX EOM block cannot be added after headers
MINOR: mux-h2: check for too many streams only for idle streams
MINOR: mux-h2: set H2_SF_HEADERS_RCVD when a HEADERS frame was decoded
The HEADERS frame parser checks if we still have too many streams, but
this should only be done for idle streams, otherwise it would prevent
us from processing trailer frames.
In h2c_frt_handle_headers() and h2c_bck_handle_headers() we have an unused
error path made of the strm_err label, while send_rst is used to emit an
RST upon stream error after forcing the stream to h2_refused_stream. Let's
remove this unused strm_err block now.
In h2c_frt_handle_headers(), we test the stream for SS_ERROR just after
setting it to SS_OPEN, this makes no sense and creates confusion in the
error path. Remove this misleading test.
In case we receive a very large HEADERS frame which doesn't leave enough
room to place the EOM block after the decoded headers, we must fail the
stream. This test was missing, resulting in the loss of the EOM, possibly
leaving the stream waiting for a time-out.
Note that we also clear h2c->dfl here so that we don't attempt to clear
it twice when going back to the demux.
If this is backported to 1.9, it also requires that the following patches
are backported as well :
MINOR: mux-h2: make h2c_decode_headers() return a status, not a count
MINOR: mux-h2: add a new dummy stream : h2_error_stream
MEDIUM: mux-h2: make h2c_decode_headers() support recoverable errors
When a decoding error is recoverable, we should emit a stream error and
not a connection error. This patch does this by carefully checking the
connection state before deciding to send a connection error. If only the
stream is in error, an RST_STREAM is sent.
This function used to return a byte count for the output produced, or
zero on failure. Not only this value is not used differently than a
boolean, but it prevents us from returning stream errors when a frame
cannot be extracted because it's too large, or from parsing a frame
and producing nothing on output.
This patch modifies its API to return <0 on errors, 0 on inability to
proceed, or >0 on success, irrespective of the amount of output data.
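The new convention can be sketched as follows (the call arguments are made up,
only the return values matter):

    ret = h2c_decode_headers(h2c, &rxbuf, &flags);
    if (ret < 0) {
        /* error: a stream or connection error was reported */
    }
    else if (ret == 0) {
        /* could not proceed: e.g. not enough input data or output room */
    }
    else {
        /* success, regardless of the amount of output produced */
    }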
The mux h1 mainly depends on the stream to handle errors and timeouts. But
there is one unhandled case: if a timeout occurs when some outgoing data are
blocked in the output buffer, the stream is detached and the mux waits
infinitely for the data to be gone before closing the connection. To fix the
bug, a task has been added on the mux to handle connection timeouts. For now,
an expiration date is set only when some outgoing data are blocked. And if a
stream is still attached when the mux's task times out, an error flag is set
on the mux but the connection is not closed immediately. We assume the stream
will hit the same timeout just after.
This patch must be backported to 1.9.
In the function htx_end_request, the flag SI_FL_NOHALF must be set on the server
side once the request is in the state HTTP_MSG_DONE. But the response state was
checked first and the flag was only set when the response was also in the state
HTTP_MSG_DONE. Of course, this is not desirable.
This patch must be backported to 1.9.
For a long time, the expiration date of a stream has only been updated in
process_stream(). It is calculated, among others, using the channels' expiration
dates for reads and writes (.rex and .wex values). But these values are updated
by the stream-interface. So when this happens at the connection layer, the
update is only done if the stream's task is woken up. Otherwise, the stream
expiration date is not immediately updated. This leads to unexpected
behaviours. From time to time, users reported that the wrong timeout was hit or
the wrong termination state was reported. This is partly because of this
bug.
Recently, we observed some sessions blocked for a while when big objects are
served from the cache applet. It seems to only concern clients not reading the
response. Because delivered objects are big, not all data can be sent. And
because delivered objects are big, data are fast-forwarded (from the input to
the output with no stream wakeup). So in such a situation, the stream expiration
date is never updated and no timeout is hit. The session remains blocked
while the client remains connected.
This bug exists at least since HAProxy 1.5. But recent changes on the connection
layer make it more visible. It must be backported from 1.9 to 1.6. And with more
pain it should be backported to 1.5.
When transfering from H1 to H1, chunking is always indicated by the
presence of the Transfer-encoding header field. But when a message
comes from H2 there is no such header and only HTX_SL_F_XFER_LEN
ought to be relied on. This one will also result in H1_MF_XFER_LEN
being set, just like transfer-encoding, so let's always rely on
this latter flag to detect the need for chunking (when CLEN is not
here) and automatically add the transfer-encoding header if it was
not present, as reported by H1_MF_CHNK.
This must be backported to 1.9.
The H1 mux needs to store some information regarding the states that
were met (EOD, trailers, etc) for each direction but currently uses
only one set of flags. This results in failures when both the request
and the response use chunked-encoding because some elements are believed
to have been met already and a trailing 0 CRLF or just a CRLF may be
missing at the end.
The solution here consists in splitting these flags per direction, one
set for input processing and another set for output processing. Only
two flags were affected so this is not a big deal.
This needs to be backported to 1.9.
In h2c_decode_headers() we update the buffer's length according to the
amount of data produced (outlen). But in case of HTX this outlen value
is not a quantity, just an indicator of success, resulting in one extra byte
being added to the buffer and it temporarily showing .data > .size, which
is wrong. Fortunately this is overridden when leaving the function by
htx_to_buf() so the impact only exists in step-by-step debugging, but
it definitely needs to be fixed.
This must be backported to 1.9.
When dealing with a server's H2 response, we used to set the
end-of-stream flag on the conn_stream and the stream before parsing
the response, which is incorrect since we can fail to process this
response by lack of room, buffer or anything. The extent of this problem
is still limited to a few rare cases, but with trailers it will cause a
systematic failure.
This fix must be backported to 1.9.
This function handles response HEADERS frames, it is not responsible
for creating new streams thus it must not check if we've reached the
stream count limit, otherwise it could lead to some undesired pauses
which bring no benefit.
This must be backported to 1.9.
If we exit this function because some data are pending in the rxbuf, we
currently don't indicate any blocking flag, which will prevent the operation
from being attempted again. Let's set H2_CF_DEM_SFULL in this case to indicate
there's not enough room in the stream buffer so that the operation may be
attempted again once we make room. It seems that this issue cannot be
triggered right now but it definitely will with trailers.
This fix should be backported to 1.9 for completeness.
h2c_restart_reading() is used at various places to resume processing of
demux data, but this one refrains from doing so if the mux is already
subscribed for receiving. It just happens that even if some incoming
frame processing is interrupted, the mux is always subscribed for
receiving, so this condition alone is not enough, it must be combined
with the fact that the demux buffer is empty, otherwise some resume
events are lost. This typically happens when we refrain from processing
some incoming data due to missing room in the stream's rxbuf, and want
to resume in h2c_rcv_buf(). It will become even more visible with trailers
since these ones want to have an empty rxbuf before proceeding.
This must be backported to 1.9.
In h2c_decode_headers() we mistakenly check for H2_F_DATA_END_STREAM
while we should check for H2_F_HEADERS_END_STREAM. Both have the same
value (1) but better stick to the correct flag.
When an HTX structure is defragmented, it is possible to retrieve the new block
corresponding to an old one. This is useful to do a defrag during a loop on
blocks, to be sure to continue looping on the right block. But, instead of
returning the address of the new block in the HTX structure, the one in the
temporary structure used to do the defrag was returned, leading to unexpected
behaviours.
This patch must be backported to 1.9.
This bug exists in the HTX code and in the legacy one. When the body length is
unknown, the applet hangs. For the legacy code, it hangs because the end of the
cached object is not correctly handled and the applet is never recalled. For the
HTX code, only the beginning of the response (the 1st buffer) is sent, then the
applet hangs. To work in HTX, the fast forwarding must be correctly handled.
This patch must be backported to 1.9.
[cf: the patch adding the function channel_add_input must be backported with
this one. It does not exist in 1.8 because only responses with a C-L are cached.]
This way we are sure the channel state is always correctly updated, especially
the amount of data directly forwarded. For the stats applet, it is not a bug
because the fast forwarding is never used (the response is chunked and the HTX
extra field is always set to 0).
This patch must be backported to 1.9.
This function must be called when new incoming data are pushed into the channel's
buffer. It updates the channel state and takes care of the fast forwarding by
consuming the right amount of data and decrementing "->to_forward" accordingly
when necessary. In fact, this patch just moves a part of ci_putblk into a
dedicated function.
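A simplified sketch of the behaviour described above (the helper and flag names
follow the existing channel API, but this is not the exact implementation):

    static inline void channel_add_input(struct channel *chn, unsigned int len)
    {
        if (chn->to_forward) {
            unsigned long fwd = len;

            if (chn->to_forward != CHN_INFINITE_FORWARD) {
                if (fwd > chn->to_forward)
                    fwd = chn->to_forward;
                chn->to_forward -= fwd;
            }
            c_adv(chn, fwd);               /* schedule <fwd> bytes for forwarding */
        }
        chn->total += len;                 /* account for the new input data */
        chn->flags |= CF_READ_PARTIAL;     /* report that some data were received */
    }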
This patch must be backported to 1.9.
With the new ability to log to a terminal, it's convenient to be able
to use "log stdout" in a config file, except that it now results in
setting the terminal to non-blocking mode, breaking every utility
relying on stdin afterwards. Since the only reason for logging to a
terminal is to debug, do not set the FD to non-blocking mode when it's
a terminal.
This fix must be backported to 1.9.
Application-Layer Protocol Negotiation (ALPN, RFC7301) is a TLS
extension which allows a client to present a preference for which
protocols it wishes to connect to, when a single port supports multiple
application protocols.
It allows a transparent proxy to take a decision based on the beginning
of an SSL/TLS stream without deciphering it.
The new fetch "req.ssl_alpn" extracts the ALPN protocol names that may
be present in the ClientHello message.
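For example, a hypothetical use in a TCP frontend could look like this (backend
names are made up):

    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    use_backend bk_h2 if { req.ssl_alpn h2 }
    default_backend bk_default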
Instead of keeping track of the number of connections we're responsible for,
keep track of the number of connections we're responsible for that we are
currently considering idle (i.e. that we are not using, though they may be in use
by other sessions), that way we can actually reuse connections when we have
more connections than the max configured.
When connecting to a server, and reusing a connection, always attempt to give
the owner of the previous session one of its own connections, so that one
session won't be responsible for too many connections.
This should be backported to 1.9.
When creating a new outgoing connection, if we're using ALPN and waiting
for the handshake completion to choose the mux, and for some reason the
handshake failed, add the SI_FL_ERR flag to the stream_interface, so that
process_streams() knows the connection failed, and can attempt to retry,
instead of just hanging.
This should be backported to 1.9.
When a session adds a connection to its connection list, we used to remove
connections for another server if there were not enough room for our
server. This can't work, because those lists are now the list of connections
we're responsible for, not just the idle connections.
To fix this, allow for an unlimited number of servers, instead of using
an array, we're now using a linked list.
To access the first element of the list, correctly use LIST_ELEM(), or we
end up getting the head of the list, instead of getting the first connection.
This should be backported to 1.9.
In connect_server(), if we looked for a usable connection and failed to
find one, srv_conn won't be NULL at the end of list_for_each_entry(), but
will point to the head of a list, which is not a pointer to a struct
connection, so explicitly set it to NULL.
This should be backported to 1.9.
If, for some reason we failed to allocate a conn_stream when reusing an
existing connection, set srv_conn to NULL, so that we fail later, instead
of pretending all is right. This ends up giving a stream_interface with
no endpoint, and so the stream will never end.
This should be backported to 1.9.
In h2_detach(), don't add the connection to the idle list if nb_streams
is at the max. This can happen if we already closed that stream before, so
its slot became available and was used by another stream.
This should be backported to 1.9.
When we use htx and http-request auth rules, we need to send WWW-Authenticate
with a 401 and Proxy-Authenticate with a 407. We only sent Proxy-Authenticate
regardless of status, with htx enabled.
To be backported to 1.9.
In connect_server(), don't attempt to reuse the old connection if it's
targeting a different server than the one we're supposed to access, or
we will never be able to connect to a server if the first one we tried failed.
This should be backported to 1.9.
Now that the HEADERS frame decoding is retryable, we can safely try to
fold CONTINUATION frames into a HEADERS frame when the END_OF_HEADERS
flag is missing. In order to do this, h2c_decode_headers() moves the
frames payloads in-situ and leaves a hole that is plugged when leaving
the function. There is no limit to the number of CONTINUATION frames
handled this way provided that all of them fit into the buffer. The
error reported when meeting isolated CONTINUATION frames has now changed
from INTERNAL_ERROR to PROTOCOL_ERROR.
Now there is only one (unrelated) remaining failure in h2spec.
The H2 demux only checks for too many streams in h2c_frt_stream_new(),
then refuses to create a new stream and causes the connection to be
aborted by sending a GOAWAY frame. This will also happen if any error
happens during the stream creation (e.g. memory allocation).
RFC7540#5.1.2 says that attempts to create streams in excess should
instead be dealt with using an RST_STREAM frame conveying either the
PROTOCOL_ERROR or REFUSED_STREAM reason (the latter being usable only
if it is guaranteed that the stream was not processed). In theory it
should not happen for well behaving clients, though it may if we
configure a low enough h2.max_concurrent_streams limit. This error
however may definitely happen on memory shortage.
Previously it was not possible to use RST_STREAM due to the fact that
the HPACK decompressor would be desynchronized. But now we first decode
and only then try to allocate the stream, so the decompressor remains
synchronized regardless of policy or resources issues.
With this patch we enforce stream termination with RST_STREAM and
REFUSED_STREAM if this protocol violation happens, as well as if there
is a temporary condition like a memory allocation issue. It will allow
a client to recover cleanly.
This could possibly be backported to 1.9. Note that this requires that
these five previous patches are merged as well :
MINOR: h2: add a bit-based frame type representation
MEDIUM: mux-h2: remove padlen during headers phase
MEDIUM: mux-h2: decode HEADERS frames before allocating the stream
MINOR: mux-h2: make h2c_send_rst_stream() use the dummy stream's error code
MINOR: mux-h2: add a new dummy stream for the REFUSED_STREAM error code
This patch introduces a new dummy stream, h2_refused_stream, in CLOSED
status with the aforementioned error code. It will be usable to reject
unexpected extraneous streams.
We currently have 2 dummy streams allowing us to send an RST_STREAM
message with an error code matching this one. However h2c_send_rst_stream()
still enforces the STREAM_CLOSED error code for these dummy streams,
ignoring their respective errcode fields which however are properly
set.
Let's make the function always use the stream's error code. This will
allow to create other dummy streams for different codes.
It's hard to recover from a HEADERS frame decoding error after having
already created the stream, and it's not possible to recover from a
stream allocation error without dropping the connection since we can't
maintain the HPACK context, so let's decode it before allocating the
stream, into a temporary buffer that will then be offered to the newly
created stream.
Three types of frames may be padded : DATA, HEADERS and PUSH_PROMISE.
Currently, each of these independently deals with padding and needs to
wait for and skip the initial padlen byte. Not only this complicates
frame processing, but it makes it very hard to process CONTINUATION
frames after a padded HEADERS frame, and makes it complicated to perform
atomic calls to h2s_decode_headers(), which are needed if we want to be
able to maintain the HPACK decompressor's context even when dropping
streams.
This patch takes a different approach : the padding is checked when
parsing the frame header, the padlen byte is waited for and parsed,
and the dpl value is updated with this padlen value. This will allow
the frame parsers to decide to overwrite the padding if needed when
merging adjacent frames.
Since commit f210191 ("BUG/MEDIUM: h2: don't accept new streams if
conn_streams are still in excess") we're refraining from reading input
frames if we've reached the limit of number of CS. The problem is that
it prevents such situations from working fine. The initial purpose was
in fact to prevent from reading new HEADERS frames when this happens,
and causes some occasional transfer hiccups and pauses with large
concurrencies.
Given that we now properly reject extraneous streams before checking
this value, we can be sure never to have too many streams, and that
any higher value is only caused by a scheduling reason and will go
down after the scheduler calls the code.
This fix must be backported to 1.9 and possibly to 1.8. It may be
tested using h2spec this way with an h2spec config :
  while :; do
      h2spec -o 5 -v -t -S -k -h 127.0.0.1 -p 4443 http2/5.1.2
  done
We were returning a stream error of type PROTOCOL_ERROR on empty HEADERS
frames, but RFC7540#4.2 stipulates that we should instead return a
connection error of type FRAME_SIZE_ERROR.
This may be backported to 1.9 and 1.8 though it's unlikely to have any
real life effect.
Commit dc57236 ("BUG/MINOR: mux-h2: advertise a larger connection window
size") caused a WINDOW_UPDATE message to be sent early with the connection
to increase the connection's window size. It turns out that it causes some
minor trouble that needs to be worked around :
- varnishtest cannot transparently cope with the WU frames during the
handshake, forcing all tests to explicitly declare the handshake
sequence ;
- some vtc scripts randomly fail if the WU frame is sent after another
expected response frame, adding uncertainty to some tests ;
- h2spec doesn't correctly identify these connection-level WU frames,
  believing they are the responses to some purposely erroneous frames it
  sends, resulting in some errors being reported
None of these are a problem with real clients but they add some confusion
during troubleshooting.
Since the fix above was intended to increase the upload bandwidth, we
have another option which is to increase the window size with the first
WU frame sent for the connection. This way, no WU frame is sent until
one is really needed, and this first frame will adjust the window to
the maximum value. It will make the window increase slightly later, so
the client will experience the first round trip when uploading data,
but this should not be perceptible, and is not worth the extra hassle
needed to maintain our debugging abilities. As an extra bonus, a few
extra bytes are saved for each connection until the first attempt to
upload data.
This should possibly be backported to 1.9 and 1.8.
Add a way to configure the ALPN used by checks, with a new "check-alpn"
keyword. By default, the checks will use the server ALPN, but it may not
be convenient, for instance because the server may use HTTP/2, while checks
are unable to do HTTP/2 yet.
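A hypothetical server line using it could look like this (address and names
are made up):

    server srv1 192.168.0.10:443 ssl verify none alpn h2,http/1.1 check check-alpn http/1.1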
In some situations, if too short a frame header is received, we may leave
h2_process_demux() waking up the task again without checking that we were
already subscribed.
In order to avoid this once for all, let's introduce an h2_restart_reading()
function which performs the check and wakes the task up. This way we won't
needlessly wake the task up if it's already waiting for I/O.
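A minimal sketch of such a helper (reusing the wait_event fields described
elsewhere in this series; not the exact code):

    static void h2_restart_reading(const struct h2c *h2c)
    {
        if (h2c->wait_event.events & SUB_RETRY_RECV)
            return;     /* already waiting for I/O: the event will wake us up */
        tasklet_wakeup(h2c->wait_event.task);
    }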
Must be backported to 1.9.
In HTX, when the compression filter analyzes the EOM, it flushes the compression
context and adds the last block of compressed data. But this block can be
empty. In this case, we must ignore it.
In dns_read_name(), when a dns name is used with compression and the start
position of the name is greater than 255, the name read is incorrect and causes
an invalid dns error. E.g. with 0xc11b, "c" specifies that name compression is
being used and "11b" represents the start position of the name, but currently
we only use "1b" for the start position.
This should be backported as far as 1.7.
A regression was introduced with efbbdf72 "BUG: dns: Prevent out-of-bounds
read in dns_validate_dns_response()" as it prevented from taking into account
the last byte of the payload. This patch aims at fixing it.
This must be backported to 1.8.
In mux_h2_unsubscribe, don't forget to leave the sending_list if
SUB_CALL_UNSUBSCRIBE was set. SUB_CALL_UNSUBSCRIBE means we were about
to be woken up for writing, unless the mux was too full to get more data.
If there's an unsubscribe call in the meanwhile, we should leave the list,
or we may be put back in the send_list.
This should be backported to 1.9.
First, it's a pain to always have to think about updating this date,
second for a long time I've not been the only developer there, and third,
some users contact me hoping to get help that I can't deliver. It's about
time to redirect them to the main site where all the useful links should
be.
In h2_snd_buf(), if we couldn't send the data because of flow control, and
the connection got a shutr, then add CS_FL_ERROR (or CS_FL_ERR_PENDING). We
will never get any window update, so we will never be unlocked, anyway.
No backport is needed.
When dealing with early data we scan the list of stream to notify them.
We're not supposed to have h2s->cs == NULL here but it doesn't cost much
to make the scan more robust and verify it before notifying.
No backport is needed.
If we had no pending read, it could be complicated to report an
RST_STREAM to a sender since we used to only report it via the
rx side if subscribed. Similarly in h2_wake_some_streams() we
now try all methods, hoping to catch all possible events.
No backport is needed.
In order to report an error to the data layer, we have different ways
depending on the situation. At a lot of places it's open-coded and not
always correct. Let's create a new function h2s_alert() to handle this
task. It tries to wake on recv() first, then on send(), then using
wake().
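A sketch of the order it tries (the notify helpers are shown for illustration
only):

    static void h2s_alert(struct h2s *h2s)
    {
        if (h2s->recv_wait)
            h2s_notify_recv(h2s);               /* 1. wake a pending recv() */
        else if (h2s->send_wait)
            h2s_notify_send(h2s);               /* 2. else wake a pending send() */
        else if (h2s->cs && h2s->cs->data_cb->wake)
            h2s->cs->data_cb->wake(h2s->cs);    /* 3. else use the generic wake() */
    }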
In the mux h2, make sure we set CS_FL_ERR_PENDING and wake the recv task,
instead of setting CS_FL_ERROR, if CS_FL_EOS is not set, i.e. if there's
potentially still some data to be read.
We still have an issue with asynchronous errors, which is that while
they don't truncate reads anymore, they might be missed during a
send() attempt. This can happen for example when processing a request
followed by undesired data for which the stream doesn't try to receive,
while the send side experiences an error (transfer aborted by the client).
In this case we definitely want all send() attempts to fail as soon as
the error was reported, even if it's only pending. This way we leave an
opportunity to the stream interface to try to receive the last data
pending in the buffer but it cannot send anymore and knows that there
is an error when trying to do so.
Commit 8519357c ("BUG/MEDIUM: mux-h2: report asynchronous errors in
h2_wake_some_streams()") addressed an issue with synchronous errors
but forgot to fix the call places to also pass CS_FL_ERR_PENDING
instead of CS_FL_ERROR.
No backport is needed.
In h1_shutw() and h1_shutr(), don't attempt to shutdown() the connection
if we're using keepalive and the connection has no error, or we will close
the connection too soon.
As long-time changes have accumulated over time, the exported functions
of the stream-interface were almost all prefixed "si_<something>" while
most private ones (mostly callbacks) were called "stream_int_<something>".
There were still a few confusing exceptions, which were addressed to
follow this scheme :
- stream_sock_read0(), only used internally, was renamed stream_int_read0()
and made static
- stream_int_notify() is only private and was made static
- stream_int_{check_timeouts,report_error,retnclose,register_handler,update}
were renamed si_<something>.
Now it is clearer when checking one of these if it risks to be used outside
or not.
There was a reference to struct stream in conn_free() for the case
where we're freeing a connection that doesn't have a mux attached.
For now we know it's always a stream, and we only need to do it to
put a NULL in s->si[1].end.
Let's do it better by storing the pointer to si[1].end in the context
and specifying that this pointer is always nulled if the mux is null.
This way it allows a connection to detach itself from wherever it's
being used. Maybe we could even get rid of the condition on the mux.
We most often store the mux context there but it can also be something
else while setting up the connection. Better call it "ctx" and know
that it's the owner's context than misleadingly call it mux_ctx and
get caught doing suspicious tricks.
The SUB_CAN_SEND/SUB_CAN_RECV enum values have been confusing a few
times, especially when checking them on reading. After some discussion,
it appears that calling them SUB_RETRY_SEND/SUB_RETRY_RECV more
accurately reflects their purpose since these events may only appear
after a first attempt to perform the I/O operation has failed or was
not completed.
In addition the wait_reason field in struct wait_event which carries
them makes one think that a single reason may happen at once while
it is in fact a set of events. Since the struct is called wait_event
it makes sense that this field is called "events" to indicate it's the
list of events we're subscribed to.
Last, the values for SUB_RETRY_RECV/SEND were swapped so that value
1 corresponds to recv and 2 to send, as is done almost everywhere else
in the code and in the shutdown() call.
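After the renaming, the definitions boil down to this (a sketch consistent with
the description above):

    enum {
        SUB_RETRY_RECV = 0x00000001,  /* a recv() attempt failed, retry when possible */
        SUB_RETRY_SEND = 0x00000002,  /* a send() attempt failed, retry when possible */
    };

    struct wait_event {
        struct tasklet *task;   /* tasklet to wake up when an event happens */
        int events;             /* set of SUB_RETRY_* events subscribed to */
        /* ... */
    };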
In HTTP applets, the request's EOM was removed like other blocks when receive or
get_line was called from lua scripts. So it was impossible to stop receiving
data on successive calls when all the request body was already consumed,
blocking the applet infinitely.
Now, we never consume the EOM. So it is easy to interrupt receive/get_line
calls. In all cases, this block is consumed when the applet ends.
Before setting the infinite forward, we first forward all remaining input data
from the channel. Of course for HTX streams, this must be done using the amount
of data in the HTX message not in the channel (which appears as full because of
the HTX).
When producing an HTX message, we can't rely on the next-level H1 parser
to check and deduplicate the content-length header, so we have to do it
while parsing a message. The algorithm is the exact same as used for H1
messages.
There is an issue with some medium sized transfers occasionally not
shutting down at the end. Olivier tracked this to being caused by a
missing wakeup of process_stream(). What happens is that one of the
analysers sets CF_WAKE_WRITE to be woken up at the end of the transfer
to take note of the end of transaction, but a failed si_cs_send() at
the end of process_stream causes the call to be attempted again, with
CF_WAKE_WRITE lost. Then stream_int_notify() doesn't find any valid
condition to wake up process_stream(), and the stream stays there,
idling till the timeout.
In fact, CF_WAKE_WRITE has been designed for calling the analysers
to complete an operation without closing (keep-alive HTTP transfer
for instance). It only applies once the buffer is empty and there
is nothing left to be forwarded. In case the channel is closed, the
wakeup is already granted. So what we need here is to make sure to
wake process_stream() up in case the channel will not be closed and
it doesn't have anything left to be transferred. This is detected by
the lack of CF_AUTO_CLOSE and the emptiness of the buffer + to_forward
after a write activity. So now we take care of always waking the stream
up on end of transfers even if the analysers didn't subscribe to this
or if their subscription was lost.
CF_WAKE_WRITE should probably be killed now, though this first requires
careful inspection.
No backport is needed.
Cc: Olivier Houchard <ohouchard@haproxy.com>
Cc: Christopher Faulet <cfaulet@haproxy.com>
The error captures provided in HTX by the H1 mux would always report the
backend as the "other end". We need to assign the backend only on requests.
No backport is needed.
Today the demux only wakes a stream up after receiving some contents, but
not necessarily on close or error. Let's do it based on both error flags
and both EOS flags. With a bit of refinement we should be able to only do
it when the pending bits are there but not the static ones.
No backport is needed.
This function is called when dealing with a connection error or a GOAWAY
frame. It used to report a synchronous error instead of an asynchronous
error, which can lead to data truncation since whatever is still available
in the rxbuf will be ignored. Let's correctly use CS_FL_ERR_PENDING instead
and only fall back to CS_FL_ERROR if CS_FL_EOS was already delivered.
No backport is needed.
If EOS has already been reported on the conn_stream, there won't be
any read anymore to turn ERR_PENDING into ERROR, so we have to
report it directly.
No backport is needed.
It takes ages to proceed with "show fd" when there is sustained activity
because it uses the rendez-vous point for each and every file descriptor
in the loop. It's very common to see socat timeout there.
Instead of doing this, let's just isolate the function when entering the
loop. Its duration is limited by the number of FDs that may be emitted in
a single buffer anyway, so it's much lighter and responds much faster.
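A simplified sketch of the change:

    /* before: one rendez-vous point per fd; now: isolate once for the
     * whole dump, whose duration is bounded by the output buffer size */
    thread_isolate();
    for (fd = 0; fd < global.maxsock; fd++) {
        /* ... dump the state of one fd, stop when the buffer is full ... */
    }
    thread_release();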
The h2s pointer was used to scan fctl lists prior to being used to scan
the send list by ID, so it could appear non-null even though the list is
empty, resulting in misleading information on empty connections.
No backport is needed.
Most of the time when we issue "show fd" to dump a mux's state, it's
to figure why a transfer is frozen. Connection, stream and conn_stream
states are critical there. And most of the time when this happens there
is a single stream left in the H2 mux, so let's always dump the last
known stream on show fd, as most of the time it will be the one of
interest.
Cyril Bonté reported a bug in the way the cookie length is computed
when aggregating multiple cookies : the first cookie name was counted
as part of the value length, causing random contents to be placed there,
possibly leading to bad requests.
No backport is needed.
Commit 7505f94f9 ("MEDIUM: h2: Don't use a wake() method anymore.")
changed the conditions to restart demuxing so that this happens as soon
as something is read. But similarly to the previous fix, at an end of stream
we may be woken up with nothing to read but data still available in the
demux buffer, so we must also use this as a valid condition for demuxing.
No backport is needed, this is purely 1.9.
Commit 082f559d3 ("BUG/MEDIUM: h2: restart demuxing after releasing
buffer space") tried to address a situation where transfers could stall
after a read, but the condition was not completely covered : some stalls
may still happen at end of stream because there's nothing anymore to
receive and the last data lie in the demux buffer. Thus we must also
consider this state as a valid condition to restart demuxing.
No backport is needed.
Commit d94f877cd ("BUG/MINOR: mux_pt: Set CS_FL_WANT_ROOM when count is
zero in rcv_buf() callback") triggered a pending issue with this flag,
which is that it's cleared too late and sometimes causes some Rx
transfers to stall. We need to clear it before attempting to receive
otherwise we may risk to see an earlier copy of the flag.
Note that it should probably be defined that this flag could be purged
on each invocation of mux->rcv_buf(), which would make sense.
No backport is needed.
Add a new flag to conn_streams, CS_FL_ERR_PENDING. This is to be set instead
of CS_FL_ERR in case there's still more data to be read, so that we read all
the data before closing.
When count is zero in the function mux_pt_rcv_buf(), it means the channel's
buffer is full. So we need to set the CS_FL_WANT_ROOM on the
conn_stream. Otherwise, while the channel is full, we will try in a loop to
receive more data.
A bug was introduced when the buffers API was refactored, in the code
compressing wrapping input data: the pointer b_peek(in, 0) was used instead
of "b_orig(in)". b_peek(in, 0) is in fact the same as b_head(in).
Commit 19ed92b ("MINOR: hpack: optimize header encoding for short names")
introduced an error in the space computation for short names, as it removed
the length encoding from the count without replacing it with 1 (the minimum
byte). This results in the last byte of the area being occasionally
overwritten, which is immediately detected with -DDEBUG_MEMORY_POOLS as
the canary at the end gets overwritten.
No backport is needed.
h1_process_input() may occasionally be called with an empty input
buffer, and the code behind cannot deal with that, let's check the
condition at the beginning.
No backport is needed.
In h2_deferred_shut, if we're done sending the shutr/shutw, don't destroy
the h2s if it still has a conn_stream attached, or the conn_stream may try
to access it again.
The cache runs in an applet, so it delivers data into the input side
of the channel's buffer. Thus it must also abort feeding the buffer
as soon as CF_SHUTR is present, not just CF_SHUTW*, since these last
ones may only appear later. There doesn't seem to be an observable
side effect of this bug, the fix probably doesn't even need to be
backported.
The HTX-specific cache code uses HTX_CACHE_* states which overlap with
the legacy HTTP states. A typo in the error handling made the state
become HTTP_CACHE_END, which equals 3 and is the value for HTX_CACHE_EOD,
which explains why we were seeing a transition to trailers and memory
corruption.
No backport is needed.
When creating new conn_streams, always set the CS_FL_NOT_FIRST flag. We
don't really care about being the first request for HTTP/2, this only
really makes sense for HTTP/1, and that way we can reuse connections.
Add the newly created connection to the idle list as long as http-reuse != never, and
when completing a H2 request, add the connection to the safe list instead of
the idle list, if we have to add it at that point, that means we created
many streams so we know it's safe.
In session, don't keep an infinite number of connections that can idle.
Add a new frontend parameter, "max-session-srv-conns" to set a max number,
with a default value of 5.
Instead of trying to get the session from the connection, which is not
always there, and of course there could be multiple sessions per connection,
provide it with the init() and attach() methods, so that we know the
session for each outgoing stream.
In the mux_h1 and mux_h2, move the test to see if we should add the
connection in the idle list until after we destroyed the h1s/h2s, that way
later we'll be able to check if the connection has no stream at all, and if
it should be added to the server idling list.
Instead of the old "idle-timeout" mechanism, add a new option,
"pool-purge-delay", that sets the delay before purging idle connections.
Each time the delay happens, we destroy half of the idle connections.
Add a new command, "pool-max-conn" that sets the maximum number of connections
waiting in the orphan idling connections list (as activated with idle-timeout).
Using "-1" means unlimited. Using pools is now dependant on this.
Change the default for http-reuse from "never" to "safe", as it has been
the recommended setting for a few versions now and backend H2 makes little
sense without it.
Some warnings were removed from the config parser since it can dynamically
be disabled depending on the server's configuration, so there's no need to
disable it on a whole backend just for one server.
Caching the response with the compression enabled was totally broken. To fix the
problem, the compression must be done after caching the response. Otherwise the
cache would need to store both compressed and uncompressed objects for the same
resource. So, because this is not possible for now, it is forbidden to
declare the compression filter before the cache one. To ease the configuration,
both can be implicitly declared (without "filter" keyword). The compression will
automatically be inserted after the cache.
Then, to make it work this way, the compression filter has been slightly
modified. Now, the response headers are updated after http-response rules
evaluations, instead of before. So, if the response contains a "Content-length"
header, it will be kept with the response stored in the cache. So this cached
response will be able to be served to clients not supporting the compression at
all.
This bugfix concerns the thread deinit but affects the master process.
When the master process falls in wait mode (it fails to reload the
configuration), it launches the deinit_pollers_per_thread and close the
thread waker pipe. It closes rd (-1) and wr (0).
Closing an FD in the master can have several side effects and the
process will probably quit at some point.
In this case it assigns 0 to the socketpair of a worker during the next
correct reload, and then closes the socketpair once it falls in wait
mode again. The worker assumes that the master died and leaves.
Commit f8188c6 ("MEDIUM: threads/logs: Make logs thread-safe") made logs
thread-local but it also made the copy of the startup-logs thread-local,
meaning that when threads are configured, upon startup the list of startup
logs appears to be empty. Let's just remove the THREAD_LOCAL directive
there, as the check for the startup period is already present.
This fix should be backported to 1.8.
When refactoring the build option strings in 1.9, the thread support
was placed outside of the ifdef block resulting in threads always being
mentioned even if that was not true. Let's fix this and also mention
when threads are disabled to help troubleshooting.
Removing deprecated APIs is an optional part of OpenWrt's build system to
save some space on embedded devices.
Also added compatibility for LibreSSL.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
PiBa-NL reported an issue affecting logs when stdout is enabled at the
same time as an IP address. It does not affect FD and UNIX, but does
still affect multiple FDs. What happens is that the condition to detect
that the initialization was not made relies on the FD being -1, and in
this case the FD points to the *unique* FD used for AF_INET sockets, so
the configured socket used for outgoing logs over UDP gets overwritten
by the last configured FD. This is not appropriate, so instead we rely
on the sin_port part of the IPv4-mapped address to store the
initialization state for each FD.
This part deserves being significantly revamped, as IPv6 is still not
possible due to the way the FDs are managed, and inherited FDs are a
bit hackish.
Note that this patch relies on "MINOR: tools: preset the port of
fd-based "sockets" to zero" in order to operate properly.
No backport is needed.
Addresses made of a file descriptor store the file descriptor into the
address part of a sin_addr. Contrary to other address classes, there's
no way to figure later based on the FD if an initialization was done
(which is how logs initialize their FDs). The port part is currently
left with random data, so let's instead specifically set the port part
to zero when creating an FD, and let the code using it set whatever
info it needs there, typically an initialization state.
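An illustrative sketch of what creating such an address now looks like
(simplified, the exact code may differ):

    struct sockaddr_in *sin = (struct sockaddr_in *)addr;

    sin->sin_addr.s_addr = fd;  /* the fd is stored in the address part */
    sin->sin_port        = 0;   /* preset: "no initialization done yet" */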
PiBa-NL reported that since this commit f157384 ("MINOR: backend: count
the number of connect and reuse per server and per backend"), reg-test
connection/h00001 fails. Indeed it does, the server is not checked for
existence prior to updating its counter. It should also fail with
transparent mode.
Commit 84cca66 ("BUG/MEDIUM: htx: When performing zero-copy, start from
the right offset.") uncovered another issue which is that the send function
may occasionally be called without any block. It's important to check for
this case when computing the zero-copy offsets.
No backport is needed.
In sess_build_logline(), don't attempt to call sample_fetch_as_type()
if we don't have a stream.
It used never to happen in the past before commit 09bb27c ("MEDIUM: log:
make sess_build_logline() support being called with no stream"). But now
it can happen when sess_log() is called from the lower layers (i.e. the
H2 mux got garbage when it was expecting a preface frame), and it reveals
that some sample fetch functions and some converter functions which rely
on the stream don't test for its existence. For the sample fetch functions,
a durable solution is easy and would consist in adapting sample_process()
to verify the SMP_USE_* bits when the stream is not set. But for the
converters we don't have this option as they don't declare whether or not
they use a stream (which possibly is the real issue).
Thus for now let's disable sample_fetch_as_type() if a stream does not
exist, until it can be more accurately refined later.
If a reload was issued to the master process and failed, it is critical
that the admin sees it because it means that the saved configuration
does not work anymore and might not be usable after a full restart. For
this reason in this case we modify the "master" prompt to explicitly
indicate that a reload failed.
If the reload fails after the parsing of the configuration, the
mworker_proc structures are created for the processes it tried to
create.
The mworker_proc_list_to_env() function was exporting these uninitialized
structures in the "HAPROXY_PROCESSES" environment variable which was
leading to this kind of output in "show proc":
4294967295 worker [was: 1] 1 17879d 16h26m28s
When using zerocopy, start from the beginning of the data, not from the
beginning of the buffer: the buffer may have contained headers, so the data
won't start at its beginning.
This patch is a bit huge but nothing special here. Some functions have been
duplicated to support the HTX, some others have a switch inside to do so. So,
now, it is possible to create HTTP applets from HTX proxies. However, TCP
applets remain unsupported.
Functions from the Channel class are now forbidden for LUA scripts called from
HTTP proxies. These functions totally hijacked the HTTP parser, leaving it in an
undefined state. This patch is tagged as MAJOR because it could be seen as a
compatibility breakage. But a LUA script using one of these functions has a very
low probability to work correctly except by chance.
So, concretely, following functions are concerned: Channel.get, Channel.dup,
Channel.getline, Channel.set, Channel.append, Channel.send,
Channel.forward. Others remain available.
When deciding whether to scan the global run queue or not, we currently
check the configured threads number, and if it's 1 we skip the queue
since it's not supposed to be used. However when running with a master
process and multiple threads in the workers, the master will turn this
number back to 1 while some task wakeups might possibly have set bits
in the global tasks mask, thus causing active_tasks_mask to have one
bit permanently set, preventing the process from sleeping.
Instead of checking global.nbthread, let's check for the current
thread's bit in global_tasks_mask. First it will make this part of the
code more consistent, working like a test and set operation, it will
solve the issue with master+nbthread and as a bonus it will save a
lock/unlock for each scheduler call when the thread doesn't have a
task in the global run queue.
The tooltip in the HTML stats page was damaged by commit 1b0f85e47 ("MINOR:
stats: also report the failed header rewrites warnings on the stats page"),
due to the header rewrites counter being inserted at the wrong place and
taking the place of the other statuses.
This is only for 1.9, no backport is needed.
Sadly we didn't have the cumulated number of connections established to
servers till now, so let's now update it per backend and per-server and
report it in the stats. On the stats page it appears in the tooltip
when hovering over the total sessions count field.
The transport layer now respects buffer alignment, so we don't need to
cheat anymore pretending we have some data at the head, adjusting the
buffer's head is enough.
For a long time we've been realigning empty buffers in the transport
layers, where the I/Os were performed based on callbacks. Doing so is
optimal for higher data throughput but makes it trickier to optimize
unaligned data, where mux_h1/h2 have to claim some data are present
in the buffer to force unaligned accesses to skip the frame's header
or the chunk header.
We don't need to do this anymore since the I/O calls are now always
performed from top to bottom, so it's only the mux's responsibility
to realign an empty buffer if it wants to.
In practice it doesn't change anything, it's just a convention, and
it will allow the code to be simplified in a next patch.
The cconf variable was not initialized before the first two possible
error exits, where it is freed, resulting in random crashes instead
of displaying an error message if the cache ID was missing from the
filter declaration.
No backport is needed, this is exclusively 1.9.
The leastconn algorithm queues available servers based on their weighted
current load. But this results in an inaccurate load balancing when weights
differ and the load is very low, because what matters is not the load before
picking the server but the load resulting from picking the server. At the
very least, it must be granted that servers with the highest weight are
always picked first when no server has any connection.
This patch addresses this by simply adding one to the current connections
count when queuing the server, since this is the load the server will have
once picked. This finally allows to bridge the gap that existed between
the "leastconn" and the "first" algorithms.
In the mux detach function, when using HTX, take the connection over if
it no longer has an owner (ie because the session that was the owner left).
It is done for legacy code in proto_http.c, but not for HTX.
Also when using HTX, in H2, try to add the connection back to idle_conns if
it was not already (ie we used to use all the available streams, and we're
freeing one). That too was done in proto_http.c.
Before trying to add a connection to the idle list, make sure it doesn't
have the error, the shutr or the shutw flag. If any of them is present,
don't bother trying to recycle the connection, it's going to be destroyed
anyway.
A first patch was pushed to fix this bug if it happens during the headers
parsing. But it is also possible to hit the bug during the parsing of
chunks. For instance, if the server sends only part of the trailers, some data
remains unparsed. So if the server closes its connection without sending the end
of the response, we fall back again into an infinite loop.
The fix consists of 2 parts. First, we block the receive if a read0 or an error
is detected on the connection, independently of whether the input buffer is
empty or not. Then, the flags CS_FL_RCV_MORE and CS_FL_WANT_ROOM are always reset when
input data are processed. We set them again only when necessary.
Add a new method to mux, "reset", that is used to let the mux know the
connection attempt failed, and we're about to retry, so it just has to
reinit itself. Currently only the H1 mux needs it.
When the connection failed, we don't really want to close the conn_stream,
as we're probably about to retry, so just make sure the file descriptor is
closed.
In si_cs_recv(), report that we arrived at the end of stream only if we were
indeed connected; we don't want that if the connection failed and we're about
to retry.
CS_FL_EOS | CS_FL_REOS can be set by the mux if the connection failed, so make
sure we remove them before retrying to connect, or it may lead to a premature
close of the connection.
In the master CLI, the commands and the prefix were still parsed and
trimmed after the pattern payload. Don't parse anything but the end of a
line till we are in payload mode.
Put the search of the pattern after the trim so we can use correctly a
payload with a command which is prefixed by @.
Handle the CLI level in the master CLI. In order to do this, the master
CLI stores the level in the stream. Each command is prefixed by a
"user" or "operator" command before they are forwarded to the target
CLI.
The level can be configured in the haproxy program arguments with the
level keyword: -S /tmp/sock,level,admin -S /tmp/sock2,level,user.
Implement "show cli level" which show the level of the current CLI
session.
Implement "operator" and "user" which lower the permissions of the
current CLI session.
We need to make sure that the record length is not making us read
past the end of the data we received.
Before this patch we could for example read the 16 bytes
corresponding to an AAAA record from the non-initialized part of
the buffer, possibly accessing anything that was left on the stack,
or even past the end of the 8193-byte buffer, depending on the
value of accepted_payload_size.
To be backported to 1.8, probably also 1.7.
Some callers of dns_read_name() do not make sure that we can read
the first byte, holding the length of the next label, without going
past our buffer, so we need to make sure of that.
In addition, if the label is a compressed one we need to make sure
that we can read the following byte to compute the target offset.
To be backported to 1.8, probably also 1.7.
When a compressed pointer is encountered, dns_read_name() will call
itself with the pointed-to offset in the packet.
With a specially crafted packet, it was possible to trigger an
infinite-loop recursion by making the pointer point to itself.
While it would be possible to handle that particular case differently
by making sure that the target is different from the current offset,
it would still be possible to craft a packet with a very long chain
of valid pointers, always pointing backwards. To prevent a stack
exhaustion in that case, this patch restricts the number of recursive
calls to 100, which should be more than enough.
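An illustrative sketch of the guard (the names here are assumptions, the real
patch may differ in its details):

    /* dns_read_name(..., unsigned int depth): abort over-long chains */
    if (depth >= 100)
        return 0;   /* treat the packet as invalid */

    /* on a compression pointer, recurse with depth + 1 */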
To be backported to 1.8, probably also 1.7.
The commit 3815b227f ("MEDIUM: mux-h1: implement true zero-copy of DATA blocks")
broke the output of chunked messages. When the zero-copy was performed on such
messages, neither the chunk size nor the ending CRLF was emitted.
Now, the chunked envelope is added when necessary. We have at least the size of
the struct htx to emit it. So 40 bytes for now. It should be enough.
Change the output of the relative pid for the old processes, displays
"[was: X]" instead of just "X" which was confusing if you want to
connect to the CLI of an old PID.
H2 has a 9-byte frame header, and HTX has a 40-byte frame header.
By artificially advancing the Rx header and limiting the amount of
bytes read to protect the end of the buffer, we can make the data
payload perfectly aligned with HTX blocks and optimize the copy.
This is similar to what was done for the H1 mux : when the mux's buffer
is empty and the htx area contains exactly one data block of the same
size as the requested count, and all window and frame size conditions are
satisfied, then it's possible to simply swap the caller's buffer with the
mux's output buffer and adjust offsets and length to match the entire
DATA HTX block in the middle. An H2 frame header has to be prepended
before the block but this always fits in an HTX frame header.
In this case we perform a true zero-copy operation from end-to-end. This
is the situation that happens all the time with large files. When using
HTX over H2 over TLS, this brings a 3% extra performance gain. TLS remains
a limiting factor here but the copy definitely has a cost. Also since
haproxy can now use H2 in clear, the savings can be higher.
Due to the blocking factor being different on H1 and H2, we regularly end
up with tails of data blocks that leave room in the mux buffer, making
it tempting to copy the pending frame into the remaining room left, and
possibly realigning the output buffer.
Here we check if the output buffer contains data, and prefer to wait
if either the current frame doesn't fit or if it's larger than 1/4 of
the buffer. This way upon next call, either a zero copy, or a larger
and aligned copy will be performed, taking the whole chunk at once.
Doing so increases the H2 bandwidth by slightly more than 1% on large
objects.
By default H2 uses a 65535-byte window for the connection, and changing
it requires sending a WINDOW_UPDATE message. We only used to update the
window when receiving data, thus never increasing it further.
As reported by user klzgrad on the mailing list, this seriously limits
the upload bitrate, and will have an even higher impact on the backend
H2 connections to origin servers.
There is no technical reason for keeping this window so low, so let's
increase it to the maximum possible value (2G-1). We do this by
pretending we've already received that many data minus the maximum
data the client might already send (65535), so that an early
WINDOW_UPDATE message is sent right after the SETTINGS frame.
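A sketch of the announcement (h2c_send_window_update() is a hypothetical
helper name):

    /* right after the SETTINGS frame, on stream 0 (the connection) */
    int32_t incr = 0x7fffffff - 65535;   /* 2G-1 minus the default window */
    h2c_send_window_update(h2c, 0, incr);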
This should be backported to 1.8. This patch depends on previous
patch "BUG/MINOR: mux-h2: refrain from muxing during the preface".
The condition to refrain from processing the mux was insufficient as it
would only handle the outgoing connections. In essence it is not that much
of a problem since we don't have streams yet on an incoming connection. But
it prevents waiting for the end of the preface before sending an early
WINDOW_UPDATE message, thus causing the connections to fail in this case.
This must be backported to 1.8 with a few minor adaptations.
Since HTX casts the buffer to a struct and stores relative pointers at the
end, it is mandatory that its end is properly aligned. This patch enforces
a buffer size rounding up to the next multiple of two void*, thus 8 on
32-bit and 16 on 64-bit, to match what malloc() already does on the beginning
of the buffer. In practice it will never be really noticeable since default
sizes already are such multiples.
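The rounding itself, as a minimal sketch (where <size> stands for the
configured buffer size):

    size_t align = 2 * sizeof(void *);        /* 8 on 32-bit, 16 on 64-bit */
    size = (size + align - 1) & ~(align - 1);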
When the mux's buffer is empty and the htx area contains exactly one
data block of the same size as the requested count, then it's possible
to simply swap the caller's buffer with the mux's output buffer and
adjust offsets and length to match the entire DATA HTX block in the
middle. In this case we perform a true zero-copy operation from
end-to-end. This is the situation that happens all the time with large
files. With this change, the HTX bit rate performance catches up again
with the legacy mode (measured at 97%).
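At its heart, the zero-copy is a swap of the two buffers' storage areas,
which the new buffer API permits; a simplified illustration:

    /* <in> holds exactly one DATA block of <count> bytes, <out> is empty */
    void *area = out->area;
    out->area  = in->area;   /* the mux's output buffer now owns the data */
    in->area   = area;
    /* head and data are then adjusted so <out> exposes only the payload */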
These flags haven't been used for a while. SF_TUNNEL was reintroduced
by commit d62b98c6e ("MINOR: stream: don't set backend's nor response
analysers on SF_TUNNEL") to handle the two-level streams needed to
deal with the first model for H2, and was not removed after this model
was abandoned. SF_INITIALIZED was only set. SF_CONN_TAR was never
referenced at all.
Now that h1 and legacy HTTP are two distinct things, there's no need
to keep the legacy HTTP parsers in h1.c since they're only used by
the legacy code in proto_http.c, and h1.h doesn't need to include
hdr_idx anymore. This concerns the following functions :
- http_parse_reqline();
- http_parse_stsline();
- http_msg_analyzer();
- http_forward_trailers();
All of these were moved to http_msg.c.
Lots of HTTP code still uses struct http_msg. Not only this code is
still huge, but it's part of the legacy interface. Let's move most
of these functions to a separate file http_msg.c to make it more
visible which file relies on what. It's mostly symmetrical with
what is present in http_htx.c.
The function http_transform_header_str() which used to rely on two
function pointers to look up a header was simplified to rely on
two variants http_legacy_replace_{,full_}header(), making both
sides of the function much simpler.
No code was changed beyond these moves.
All the HTX definition is self-contained and doesn't really depend on
anything external since it's mostly a protocol definition. In addition, some
similar files (like h2) also placed in common/ used to rely on it,
making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
As for the compression filter, the cache filter must be explicitly declared
(using the filter keyword) if other filters than cache are used. It is mandatory
to explicitly define the filters order.
Documentation has been updated accordingly.
This is only true for HTX proxies. On legacy HTTP proxy, if the compression and
the cache are both enabled, an error during HAProxy startup is triggered.
With the HTX, now you can use both in any order. If the compression is defined
before the cache, then the responses will be stored compressed. If the
compression is defined after the cache, then the responses will be stored
uncompressed. So in the last case, when a response is served from the cache, it
will be compressed too, like any other response.
To do so, a dedicated configuration has been added on cache filters. Before, the
cache filter configuration pointed directly to the cache it used. Now, it is the
dedicated structure cache_flt_conf. Store and use rules also point to this
structure. It is linked to the cache the filter must use. It also contains a
flags field. This will allow us to define the behavior of a cache filter when a
response is stored in the cache or delivered from it.
And now, store and use rules use a common parsing function. So if it does not
already exist, a filter is always created for both kinds of rules. The cache
filters configuration is checked using their check callback. In the postparser
function, we only check the caches configuration. This removes the loop on all
proxies in the postparser function.
The cache is now able to store and resend HTX messages. When an HTX message is
stored in the cache, the headers are prefixed with their block's info (a
uint32_t) containing its type and its length. Data, on their side, are stored
without any prefix. Only the value is copied into the cache. 2 fields have been
added in the structure cache_entry, hdrs_len and data_len, to know the size, in
the cache, of the headers part and the data part. If the message is chunked, the
trailers are also copied, the same way as data. When the HTX message is
recreated in the cache applet, the trailers size is computed by removing the
headers length and the data length from the total object length.
Instead of calling register_data_filter() when the stream analysis starts, we now
call it when we are sure the response is cacheable. It is done in the
http_headers callback, just before the body analysis, and only if the headers
were already cached. And during the body analysis, if an error occurred or
if the response is too big, we unregister the cache immediately.
This patch may be backported in 1.8. It is not a bug but a significant
improvement.
It is not possible to mix the format of messages stored in a cache. So we reject
the configurations where a cache is used by both an HTX proxy and a legacy HTTP
proxy at the same time.
The CLI proxy was not handling payload. To do that, we needed to keep a
connection active on a server and to transfer each new line over that
connection until we receive an empty line.
The CLI proxy handles the payload in the same way the CLI does.
Examples:
$ echo -e "@1;add map #-1 <<\n$(cat data)\n" | socat /tmp/master-socket -
$ socat /tmp/master-socket readline
prompt
master> @1
25130> add map #-1 <<
+ test test
+ test2 test2
+ test3 test3
+
25130>
During a payload transfer, we need to wait for the data even when we are
not in interactive mode. Indeed, the data could be received progressively,
line by line, instead of in one recv.
Previously the CLI was doing a SHUTW just after the first line if it was
not in interactive mode. We now check if we are in payload mode to do
a SHUTW.
Should be backported in 1.8.
Rework the CLI proxy parser to look more like the CLI parser, corner
case and escaping are handled the same way.
The parser now splits the commands in words instead of just handling
the prefixes.
It's easier to compare words and arguments of a command this way and to
parse internal command that will be consumed directly by the CLI proxy.
There were a number of ugly setsockopt() calls spread all over
proto_http.c, proto_htx.c and hlua.c just to manipulate the front
connection's TOS, mark or TCP quick-ack. These ones entirely relied
on the connection, its existence, its control layer's presence, and
its addresses. Worse, inet_set_tos() was placed in proto_http.c,
exported and used from the two other ones, surrounded in #ifdefs.
This patch moves this code to connection.h and makes the other ones
rely on it without ifdefs.
This way we don't open-code the HPACK status codes anymore in the H2
code. Special care was taken not to cause any slowdown as this code is
very sensitive.
We'll need these functions from other inline functions, let's make them
accessible. len_to_bytes() was renamed to hpack_len_to_bytes() since it's
now exposed.
We used to have a series of well-known header fields that were looked
up, but most of them were not. The current model couldn't scale with
the addition of the new headers or pseudo-headers required to process
requests, resulting in their encoding being hard-coded in the caller.
This patch implements a quick lookup which retrieves any header from
the static table. A binary stream is made of header names prefixed by
lengths and indexes. These header names are sorted by length, then by
frequency, then by direction (preference for response), then by name,
and only the lowest index of each is stored in case of multiple
entries. A parallel length index table provides the index of the first
header for a given string. This allows to focus on the first few values
matching the same length.
Everything was made to limit the cache footprint. Interestingly, the
lookup ends up being slightly faster than the previous one, while
covering the 54 distinct headers instead of only 10.
A test with a curl request and a basic response showed that the request
size has dropped from 85 to 56 bytes and that the response size has
dropped from 197 to 170 bytes, thus we can now shave roughly 25-30 bytes
per message.
For unknown fields, since we know that most of them are less than 127
characters, we don't need to go through the loop and can instead directly
emit the one-byte length encoding. This increases the request rate by
approximately 0.5%.
memcpy() tends to be overkill to copy short strings, better use ist's
naive functions for this. This shows a consistent 1.2% performance
gain with h2load.
The len-to-bytes conversion can be slightly simplified and optimized
by hardcoding a tree lookup. Just doing this increases by 1% the
request rate on H2. It could be made almost branch-free by using
fls() but it looks overkill for most situations since most headers
are very short.
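A self-contained sketch of the sizing rule being optimized, from the HPACK
7-bit-prefix integer encoding (RFC 7541):

    /* number of bytes needed to encode a string length <len> */
    static inline int len_bytes(size_t len)
    {
        if (len < 127)
            return 1;                    /* fits in the prefix byte */
        len -= 127;
        if (len < 128)
            return 2;                    /* one continuation byte */
        return (len < 16384) ? 3 : 4;    /* enough for realistic headers */
    }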
In hpack_encode_header() there is a length check to verify that a literal
header name fits in the buffer, but there is an off-by-one in this length
check, which forgets the byte required to mark the encoding type (literal
without indexing). It should be harmless though as it cannot be triggered
since response headers passing through haproxy are limited by the reserve,
which is not the case of the output buffer.
This fix should be backported to 1.8.
Otherwise, after such replaces, the HTX message appears to wrap but the head
block address is not necessarily the first one. So adding new blocks will
override data of old ones.
If a server sends part of headers and then close its connection, the mux H1
remains blocked in an infinite loop trying to read more data to finish the
parsing of the message. The flag CS_FL_REOS is set on the conn_stream. But
because there are some data in the input buffer, CS_FL_EOS is never set.
To fix the bug, in h1_process_input, when CS_FL_REOS is set on the conn_stream,
we also set CS_FL_EOS if the input buffer is empty OR if the channel's buffer is
empty.
When a request is fully processed, no more data are parsed until the response is
totally processed and a new transaction starts. But during this time, the mux is
trying to read more data and subscribes to read. If requests are pipelined, we
start to receive the next requests which will stay in the input buffer, leading
to a loop consuming all the CPU. This loop ends when the transaction ends. To
avoid this loop, the flag H1C_F_IN_BUSY has been added. It is set when the
request is fully parsed and unset when the transaction ends. Once set on H1C, it
blocks the reads. So the mux never tries to receive more data in this state.
The condition to process the connection mode on outgoing messages without a
'Connection' header was wrong. It relied on the wrong H1M
state. H1_MSG_HDR_L2_LWS is only a possible state for messages with at least one
header. Now, to fix the bug, we just check the H1M state is not
H1_MSG_LAST_LF. So, we have the guarantee that the EOH was not processed yet.
Jerome reported that outgoing H2 failed for methods different from GET
or POST. It turns out that the HPACK encoding is performed by hand in
the outgoing headers encoding function and that the data length was not
incremented to cover the literal method value, resulting in a corrupted
HEADERS frame.
Admittedly this code should move to the generic HPACK code.
No backport is needed.
In connect_server(), don't attempt to reuse the conn_stream associated to
the stream_interface, if we already attempted a connection with it.
Using that conn_stream is only there for the cases where a connection and
a conn_stream was created ahead, mostly by http_proxy or by the LUA code.
If we already attempted to connect, that means we failed, and so we should
create a new connection.
No backport needed.
When the connection mode can be deduced from the HTTP version, we remove the
redundant connection header. So "keep-alive" connection header is removed from
HTTP/1.1 messages and "close" connection header is removed from HTTP/1.0
messages.
Gcc 7 warns about a potential null pointer deref that cannot happen
since the start line block is guaranteed to be present in the functions
where it's dereferenced. Let's mark it as already checked.
When we're using HTX, we don't have to generate chunk header/trailers, and
that ultimately leads to a crash when we try to access a buffer that
contains just chunk trailers.
This should not be backported.
A typo in the block type check makes this function fail all the time,
which has impact on anything rewriting a start line (set-uri, set-path
etc).
No backport needed.
This adds the sample fetch bc_http_major. It returns the backend connection's HTTP
version encoding, which may be 1 for HTTP/0.9 to HTTP/1.1 or 2 for HTTP/2.0. It is
based on the on-wire encoding, and not the version present in the request header.
In smp_dup(), don't consider a SMP_T_METH with an unknown method the same as
SMP_T_STR. The string and string length aren't stored at the same place.
This should be backported to 1.8.
The flag CS_FL_EOS can be set while no data was received. So the flag
CS_FL_RCV_MORE is not set. In this case, the read0 was never processed by the
stream interface. To be sure to process it, the test on CS_FL_RCV_MORE has been
moved after the one on CS_FL_EOS.
Now that we know that htx only contains lower case header names, there
is no need anymore for looking them up in a case-insensitive manner.
Note that http_find_header() still does it because header names to
compare against may come from everywhere there.
Since HTX stores header names in lower case already, we don't need to
do it again anymore. This increased H2 performance by 2.7% on quick
tests, now making H2 over HTX about 5.5% faster than H2 over H1.
HTTP/2 and above require header names to be lower cased, while HTTP/1
doesn't care. By making lower case the standard way to store header
names in HTX, we can significantly simplify all operations applying to
header names retrieved from HTX (including, but not limited to, lookups
and lower case checks which are not needed anymore).
As a side effect of replacing memcpy() with ist2bin_lc(), a small increase
of the request rate performance of about 0.5-1% was noticed on keep-alive
traffic, very likely due to memcpy() being overkill for tiny strings.
This trivial patch was marked medium because it may have a visible end-user
impact (e.g. non-HTTP compliant agent, etc).
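A self-contained sketch in the spirit of ist2bin_lc(): lower-case ASCII
letters while copying, in a single pass:

    static void copy_lc(char *dst, const char *src, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            char c = src[i];
            dst[i] = (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
        }
    }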
In the commit 6a2d33481 ("BUG/MEDIUM: h1: Set CS_FL_REOS if we had a read0."),
we set the flag CS_FL_REOS on the conn_stream when a read0 is detected. But we
must be sure to have a conn_stream first.
In h1_recv(), if we get a read0, let the conn_stream know by setting the
CS_FL_REOS flag, or it may never be aware we did hit EOS.
This should not be backported.
In h1_process(), don't release the connection if it is an outgoing connection
and we don't have an h1s associated; if that's the case, it is probably just
in a pool.
CS_FL_RCV_MORE is used in two cases, to let the conn_stream
know there may be more data available, and to let it know that
it needs more room. We can't easily differentiate between the
two, and that may lead to hangs, so split it into two flags,
CS_FL_RCV_MORE, that means there may be more data, and
CS_FL_WANT_ROOM, that means we need more room.
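A sketch of the split (the bit values here are illustrative only):

    #define CS_FL_RCV_MORE   0x00004000  /* more data may be available */
    #define CS_FL_WANT_ROOM  0x00008000  /* more room needed in the channel buffer */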
This should not be backported.
Commit 27f3fa5 ("BUG/MEDIUM: mworker: stop every tasks in the master")
used MAX_THREADS as a mask instead of MAX_THREADS_MASK to clean the
global run queue, and used rq_next (global variable) instead of next_rq.
Renamed next_rq as tmp_rq and next_wq as tmp_wq to avoid confusion.
No backport needed.
We used to wait for the other side to be connected, but the blocking
flags were inaccurate. It used to work fine almost by accident before
the stream interface changes. Now we use the new RXBLK_CONN flag to
explicitly subscribe to this event.
Thanks to Adis for reporting the issue, PiBaNL for the test case,
and Olivier for the diagnostic.
No backport is needed.
In connect_server(), if we already have a conn_stream, reuse it
instead of trying to create a new one. http_proxy and LUA both
manually create a conn_stream and a connection, and we want
to use it.
The offset was always wrong after an HTX defragmentation because the wrong
address was used and because the update could occur several times on the same
defragmentation.
The master is not supposed to run (at the moment) any task before the
polling loop, the created tasks should be run only in the workers but in
the master they should be disabled or removed.
No backport needed.
This avoids subscribing to send events just because some data remain in the
output buffer. If the output is closed or if an error occurred, there is no way
to send these data anyway, so it is safe to drain them.
Due to a thinko, I used sl_off as the start line index number but it's
not it, it's its offset. The first index is obtained using htx_get_head(),
and the start line is obtained using htx_get_sline(). This caused crashes
to happen when forwarding HTX traffic via the H2 mux once the HTX buffer
started to wrap.
No backport is needed.
In h1_process_output(), instead of waiting to have enough data to send to
consume a full block of data, we are now able to partially consume these blocks.
To ease the fast forwarding and the infinite forwarding on HTX proxies, 2
functions have been added to let the channel be almost aware of the way data are
stored in its buffer. By calling these functions instead of legacy ones, we are
sure to forward the right amount of data.
Now, the function htx_from_buf() will set the buffer's length to its size
automatically. In return, the caller should call htx_to_buf() at the end to be
sure to leave the buffer hosting the HTX message in the right state. Otherwise,
the caller can use the function htxbuf() to get the HTX message without any
update on the underlying buffer.
On the server side, we must test the request headers to deduce if we are able to do
keepalive or not. Otherwise, by default, the keepalive will be enabled on the
server's connection, whatever the client said.
After 8706c8131 ("BUG/MEDIUM: mux_pt: Always set CS_FL_RCV_MORE."), a
side effect caused failed receives to mark the buffer as missing room,
a flag that no other place can remove since it's empty. Ideally we need
a separate flag to mean "failed to deliver data by lack of room", but
in the mean time at the very least we must not mark as blocked an
empty buffer.
No backport is needed.
In order to properly deal with unaligned contents, the output data are
currently copied into a temporary buffer, to be copied into the mux's
output buffer at the end. The new buffer API allows several buffers to
share the same data area, so we're using this here to make the temporary
buffer point to the same area as the output buffer when that one is
empty. This is enough to avoid the copy at the end, only pointers and
lengths have to be adjusted. In addition the output buffer's head is
advanced by the HTX header size so that the remaining copy is aligned.
By doing this we improve the large object performance by an extra 10%,
which is 64% above the 1.9-dev9 state. It's worth noting that there are
no more calls to __memcpy_sse2_unaligned() now.
Since this code deals with various block types, it appears difficult to
adjust it to be smart enough to even avoid the first copy. However a
distinct approach could consist in trying to detect a single-block
HTX and jump to dedicated code in this case.
When transferring large objects, most calls are made between a full
buffer and an empty buffer. In this case there is a large opportunity
for performing zero-copy calls, with a few exceptions : the input data
must fit into the output buffer, and the data need to be properly
aligned and formatted to let the HTX header fit before and the HTX
block(s) fit after.
This patch does two things :
1) it makes sure that we prepare an empty input buffer before a recv()
call so that it appears as holding an HTX block at the front, which is
removed afterwards. This way the data received using recv() are placed
exactly at the target position in the input buffer for a later cast to
HTX.
2) when receiving data in h1_process_data(), if it appears that the input
buffer can be cast to an HTX buffer and the target buffer is empty,
then the buffers are swapped, an HTX block is prepended in front of the
data area, and the HTX block is appended to reference this data block.
In practice, this ensures that in most cases when transferring large files,
calls to h1_rcv_buf() are made using zero copy and a little bit of buffer
preparation (~40 bytes to be written).
Doing this adds an extra 13% performance boost on top of previous patch,
resulting in a total of 50% speed up on large transfers.
Just by using this buffer room estimation for the demux buffer, the large
object performance has increased by up to 33%. This is mostly due to less
recv() calls and unaligned copies.
When using the mux_pt, as we can't know if there's more data to be read,
always set CS_FL_RCV_MORE, and only remove it if we got an error or a shutr
and rcv_buf() returned 0.
If the ibuf only contains a small amount of data, realign it
before calling rcv_buf(), as it's probably going to be cheaper
to do so than to do 2 calls to recv().
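A hedged sketch of the realignment (the threshold and exact condition are
assumptions):

    /* small wrapped input: realigning once is cheaper than two recv() calls */
    if (b_data(&h1c->ibuf) && b_data(&h1c->ibuf) < 128 &&
        b_head_ofs(&h1c->ibuf))
        b_slow_realign(&h1c->ibuf, trash.area, b_data(&h1c->ibuf));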
It's incorrect to send more bytes than requested, because some filters
(e.g. compression) might intentionally hold on some blocks, so DATA
blocks must not be processed past the advertised byte count. It is not
the case for headers however.
No backport is needed.
If we're blocking on mux full, mux busy or whatever, we must get out of
the loop. In legacy mode this problem doesn't exist as we can normally
return 0 but here it's not a sufficient condition to stop sending, so
we must inspect the blocking flags as well.
No backport is needed.
The way htx_xfer_blks() was used is wrong, if we receive data, we must
report everything we found, not just the headers blocks. This was causing
the EOM to be postponed and some fast responses (or errors) to be incorrectly
delayed.
No backport is needed.
In h2_snd_buf(), when running with htx, make sure we return the amount of
data the caller specified, if we emptied the buffer, as it is what the
caller expects, and will lead him to properly consider the buffer to be
empty.
With the current design, there is always an H1 stream attached to the mux. So
after the conn_stream is detached, if we don't create a new H1 stream in
h1_process(), it is important to release the mux.
In h1_recv(), return 1 if we have data available, or if h1_recv_allowed()
failed, to be sure h1_process() is called. Also don't subscribe if our buffer
is full.
Don't always wake the tasklets subscribed to recv or send events as soon as
we had any I/O event, and don't call the wake() method if there were no
subscription, instead, wake the recv tasklet if we received data in h2_recv(),
and wake the send tasklet if we were able to send data in h2_send(), and the
buffer is not full anymore.
Only call the data_cb->wake() method if we get an error/a read 0, just in
case the stream was not subscribed to receive events.
Of course, the flag FLT_CFG_FL_HTX must be used and not
STRM_FLT_FL_HAS_FILTERS. "Fortunately", these 2 flags have the same value, so
everything worked as expected.
When reaching h2_shutr/h2_shutw, as we may have generated an empty frame,
a goaway or a rst, make sure we wake the I/O tasklet, or we may not send
what we just generated.
Also in h2_shutw(), don't forget to return if all went well, we don't want
to subscribe the h2s to wait events.
The previous code was only stopping the listeners in the master, not the
entire proxy.
Since we now have a polling loop in the master, there might be some side
effects, indeed some things that are still initialized. For example the
checks were still running.
When ssl_bc_alpn was meant to be added, a typo slipped in: as a result,
ssl_fc_alpn behaved as ssl_bc_alpn, and ssl_bc_alpn was not a valid keyword.
This patch aims at fixing this.
This only happens for connections using the h1 mux. We must be sure to force the
version to HTTP/1.1 when the version of the message is 1.1 or above. It is
important for H2 messages to not send an invalid version string (HTTP/2.0) to
peers.
These potential null-deref warnings are emitted on gcc 7 and above
when threads are disabled due to the use of objt_server() after an
existing validity test. Let's switch to __objt_server() since we
know the pointer is valid, it will not confuse the compiler.
Some of these may be backported to 1.8.
The loop trying to figure the best server is theoretically capable of
finishing the loop with best == NULL, causing the HA_ATOMIC_SUB()
to fail there. However for this to happen the list should be empty,
which is avoided at the beginning of the function. As it is, the
function still remains at risk so better address this now.
This patch should be backported to 1.8.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
Add a new method to muxes, "max_streams", that returns the max number of
streams the mux can handle. This will be used to know if a mux is in use
or not.
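As a prototype sketch (its exact placement in struct mux_ops is an
assumption):

    /* returns the maximum number of streams the mux can handle */
    int (*max_streams)(struct connection *conn);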
When creating a new stream, don't bother flagging a connection with
H2_CF_DEM_TOOMANY if we created the last available stream. We won't create
any other anyway, because h2_avail_streams() would return 0 available streams,
and as it is a blocking flag, it prevents us from reading data after.
The function now calls h2c_bck_handle_headers() or h2c_frt_handle_headers()
depending on the connection's side. The former doesn't create a new stream
but feeds an existing one. At this point it's possible to forward an H2
request to a backend server and retrieve the response headers.
This function does not really depend on the request, all it does is
also valid for H2 responses found on the backend side, so this patch
renames it and makes it call the appropriate decoder based on the
direction.
This creates an H2 HEADERS frame from an HTX request. The code is
very similar to the response encoding, so probably that in the future
we'll have to factor these functions differently. The HTX's start line
type is used to decide on the direction. We also purposely error out
when trying to encode an H2 request from an H1 message since it's not
implemented.
For now it reports an immediate error when trying to encode the request
since it doesn't parse as a response. We take care of sending the preface
and settings frame with the outgoing connection, and not to wait for a
preface during the H2_CS_PREFACE phase for outgoing connections.
For the backend we'll need to allocate streams as well. Let's do this
with h2c_bck_stream_new(). The stream ID allocator was split from it
so that the caller can decide whether or not to stay on the same
connection or create a new one. It possibly isn't the best way to do
this as once we're on the mux it's too late to give up creation of a
new stream. Another approach would possibly consist in detaching muxes
that reached their connection count limit before they can be reused.
Instead of choosing the stream id as soon as the stream is created, wait
until data is about to be sent. If we don't do that, the stream may send
data out of order, and so the stream 3 may send data before the stream 1,
and then when the stream 1 will try to send data, the other end will
consider that an error, as stream ids should always be increasing.
Cc: Olivier Houchard <ohouchard@haproxy.com>
We declare two configurations for the H2 mux. One supporting only
the frontend in HTTP mode and one supporting both sides in HTX mode.
This is only to ease development at this point. Trying to assign an h2
mux on the server side will still fail during h2_init() anyway instead
of at config parsing time.
Currently a mux may be forced on a bind or server line by specifying the
"proto" keyword. The problem is that the mux may depend on the proxy's
mode, which is not known when parsing this keyword, so a wrong mux could
be picked.
Let's simply update the mux entry while checking its validity. We do have
the name and the side, we only need to see if a better mux fits based on
the proxy's mode. It also requires to remove the side check while parsing
the "proto" keyword since a wrong mux could be picked.
This way it becomes possible to declare multiple muxes with the same
protocol names and different sides or modes.
If we decided to emit the end of stream flag on the H2 response headers
frame, we must remove the EOM block from the HTX stream, otherwise it
will lead to an extra DATA frame being sent with the ES flag and will
violate the protocol.
Wrong variable was used to know if we need to call the callback
post_section_parser() or not. We must use 'cs' and not 'pcs'.
This patch must be backported in 1.8 with the commit 7805e2b ("BUG/MINOR:
cfgparse: Fix transition between 2 sections with the same name").
This is used for uploads, we can now convert H2 DATA frames to HTX
DATA blocks. It's uncertain whether it's better to reuse the same
function or to split it in two at this point. For now the same
function was added with some paths specific to HTX. In this mode
we loop back to the same or next frame in order to try to complete
DATA blocks.
At the moment the way it's done is not optimal. We should aggregate multiple
blocks into a single DATA frame, and we should merge the ES flag with the
last one when we already know we've reached the end. For now and for an
easier tracking of the HTX stream, an individual empty DATA frame is sent
with the ES bit when EOM is met.
The DATA function is called for DATA, EOD and EOM since these states indicate
that a previous frame was already produced without the ES flag (typically a
headers frame or another DATA frame). Thus it makes sense to handle all these
blocks there.
There's still an uncertainty on the way the EOD and EOM HTX blocks must be
accounted for, as they're counted as one byte in the HTX stream, but if we
count that byte off when parsing these blocks, we end up sending too much
and desynchronizing the HTX stream. Maybe it hides an issue somewhere else.
At least it's possible to reliably retrieve payloads up to 1 GB over H2/HTX
now. It's still unclear why larger ones are interrupted at 1 GB.
When using HTX, we need a separate function to emit a headers frame.
The code is significantly different from the H1 to H2 conversion, though
it borrows some parts there. It looks like the part building the H2 frame
from the headers list could be factored out, however some of the logic
around dealing with end of stream or block sizes remains different.
With this patch it becomes possible to retrieve bodyless HTTP responses
using H2 over HTX.
When the proxy is configured to use HTX mode, the headers frames
will be converted to HTX header blocks instead of HTTP/1 messages.
This requires very little modifications to the existing function
so it appeared better to do it this way than to duplicate it.
Only the request headers are handled, responses are not processed
yet and data frames are not processed yet either. The return value
is inaccurate but this is not an issue since we're using it as a
boolean : data received or not.
Now h2_snd_buf() will check the proxy's mode to decide whether to use
HTX-specific send functions or legacy functions. In HTX mode, the HTX
blocks of the output buffer will be parsed and the related functions
will be called accordingly based on the block type, and unimplemented
blocks will be skipped. For now all blocks are skipped, this is only
helpful for debugging.
The function needs to be slightly adapted to transfer HTX blocks, since
it may face a full buffer on the receive path, thus it needs to transfer
HTX blocks between the two sides ignoring the <count> argument in this
mode.
The H2 mux will now be called for both HTTP and HTX modes. For now the
data transfer functions are not HTX-aware, so this will lead to problems
if used as-is, but it's convenient for development and debugging.
Till now we could only produce an HTTP/1 request from a list of H2
request headers. Now the new function h2_make_htx_request() does the
same but using the HTX encoding instead, while respecting the H2
semantics. The code is not much different from the first version,
only the encoding differs.
For now it's not used.
Functions analyzing request and response headers have been duplicated and
adapted to support HTX messages. The callback http_payload has been implemented
to handle the data compression itself. It loops on HTX blocks and replaces the
uncompressed value of each DATA block with the compressed one. Unlike the HTTP
legacy version, there is no chunk at all, so the HTX version is significantly
simpler.
The callback http_headers has been updated to dump HTX headers when the HTX
internal representation is in use. And the callback http_payload has been
implemented with its hexdump function.
If there are data filters registered on the stream, the function
flt_http_payload() is called before forwarding any data, and the function
flt_http_end() is called when all data are forwarded. While at least one data
filter remains registered on the stream, no fast forwarding is used.
First, to be called on HTX streams, a filter must explicitly be declared as
compatible by setting the flag STRM_FLT_FL_HAS_FILTERS on the filter's config at
HAProxy startup. This flag is checked when a filter implementation is attached
to a stream.
Then, some changes have been made to the HTTP callbacks. The callback
http_payload has been added to filter HTX data. It will be called on HTX
streams only. It replaces the callbacks http_data, http_chunk_trailers and
http_forward_data, which are called on legacy HTTP streams only and are now
marked as deprecated. The documentation (once updated) will give all the
information needed to implement this new callback. The other HTTP callbacks
will be called for both HTX and legacy HTTP streams, so it is the filter's
responsibility to know which kind of data it handles. The macro IS_HTX_STRM
should be used in such cases.
There is at least one noticeable change in the way data are forwarded. In HTX,
after the call to the callback http_headers, all the headers are considered as
forwarded. So, in http_payload, only the body and possibly the trailers will
be filtered.
First of all, a dedicated error snapshot, h1_snapshot, has been added. It
contains more or less the same info as http_snapshot, but adapted for H1
messages. Then, the function h1_capture_bad_message() has been added to capture
bad H1 messages. And finally, the function h1_show_error_snapshot() is used to
dump these errors. Only headers and data parsing errors are captured.
In h1_set_cli_conn_mode(), on the response path, if the response's connection
header is explicitly set to close and if the request is unfinished (state !=
DONE), then the client connection is marked as WANT_CLO.
Instead of looking for a connection header just after the start line to know if
we must process the conn_mode by hand or wait to parse the connection header,
we now delay this processing until the end of headers is reached. A flag is
used to know whether it was already done (or skipped) or not. This saves a
lookup on headers.
During startup, after the configuration parsing, all HTTP error messages
(errorloc, errorfile or default messages) are converted into HTX messages and
stored in dedicated buffers. We use it to return errors in the HTX analyzers
instead of using ugly OOB blocks.
Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. The old code relied on the function http_reply_and_close() to
send 401/407 responses. Now, we push it in the response's buffer. So we take
care to drain the request's channel and to shutdown the response's channel for
the read.
Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. A header block is added to each early-hint rule, prefixed by
the start line if it is the first one. The response is terminated and forwarded
when the rules execution is stopped or when a rule of another type is applied.
The flags of the HTX start-line (HTX_SL_F_*) are mapped to the ones of the HTTP
message (HTTP_MSGF_*). So we can easily retrieve info from the parsing in the
HTX analyzers.
Instead, we now use the htx_sl coming from the HTX message. It avoids having
too much H1-specific code in version-agnostic parts. Of course, the concept of
the start-line is highly influenced by H1, but the structure htx_sl can be
adapted, if necessary. And many things depend on a start-line during HTTP
analysis. Using the structure htx_sl also avoids tedious conversions between
the HTX version and the H1 version.
If there is no start-line, this offset is set to -1. Otherwise, it is the
relative address where the start-line is stored in the data block. When the
start-line is added, replaced or removed, this offset is updated accordingly. On
removal, if the start-line is not set and if the next block is a start-line, the
offset is updated. Finally, when an HTX structure is defragmented, the offset is
also updated accordingly.
The HTX start-line is now a struct. It will be easier to extend, if needed. The
same info can be found, of course. In addition it is now possible to set flags
on it. They will be used to carry some info about the message.
Some macros and functions have been added in proto/htx.h to help accessing
different parts of the start-line.
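As an illustration only (this is a hedged sketch of the idea, not the exact
layout of struct htx_sl in the source):

    struct htx_sl {
            unsigned int flags;     /* HTX_SL_F_* */
            unsigned int len[3];    /* length of each of the 3 parts */
            char         l[0];      /* the 3 parts, stored back to back */
    };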
The function htx_add_data_before() can be used to add an HTX block before
another one. For instance, it could be used to add some data before the
end-of-message marker.
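A hedged usage sketch (the signature is assumed here: the HTX message, a
reference block, and an ist payload):

    /* try to insert some data right before the end-of-message block */
    struct htx_blk *blk = htx_add_data_before(htx, eom_blk, ist("data"));

    if (!blk) {
            /* not enough room in the HTX message, retry later */
    }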
With the legacy representation, keep-alive outgoing connections are added in
the private/idle/safe connections list when the transaction is cleaned up. But
this stage does not exist with the HTX representation because a new stream, and
therefore a new transaction, is created for each request. So it is now handled
when the stream is detached from the connection.
In h1_snd_buf(), the data sending is done synchronously, as much as possible. So
if some data remain in the channel's buffer because there was not enough room
in the output buffer, it may be worth retrying after a send, because some space
may have been released while sending. Most of the time the output buffer is
empty and all the channel's data are consumed on the first pass. And if no data
are sent, we don't retry. So the loop is just here to optimize edge cases
without any cost for all the others.
After a call to snd_buf(), if some data remain in the channel's buffer, this
means the system buffers are full or we are unable to fully consume an HTX
block for some reason. In the latter case, we need to wake up the stream to
process more data as soon as possible. We do so by subscribing to send events
at the end of h1_snd_buf().
When trailers are parsed, we must add the corresponding HTX block and then we
must add the end-of-message block. But this last operation can fail because
there is not enough space in the HTX message. This case was left aside till
now. Now, we stay in the state H1_MSG_TRAILERS with the guarantee that we will
be able to restart at the right stage.
For chunked messages, during output processing, the mux is now able to write
the last empty chunk and empty trailers when the corresponding blocks have not
been found in the HTX message. It is handy for filters changing a non-chunked
message into a chunked one (like the compression filter).
In h1_set_srv_conn_mode(), we need to get the frontend proxy of a server
connection. Until now, we relied on the stream to get it. But it was a bit
dirty. The stream always exists at this stage but to get it, we also need to
get the stream-interface. Since the commit 7c6f8b146 ("MAJOR: connections:
Detach connections from streams."), the connection's owner is always the
session, even for outgoing connections. So now, we rely on the session to get
the frontend proxy in h1_set_srv_conn_mode().
Use the session instead of the stream to get the frontend on the server
connection.
When the client connection is accepted, the info of the client conn_stream is
filled with the session info (accept_date, tv_accept and t_handshake). For all
other conn_streams, on the client and server sides, their info is filled using
global values (date and now).
In http_wait_for_response(), we wait until all outgoing data have really been
sent (from the channel's point of view) before starting to process the
response. In fact, it is used to send all intermediate 10x responses. For now
the HTX API is not really handy when multiple messages are stored in the HTX
structure.
Because multiple HTTP messages can be stored in an HTX structure, it is
important not to forget to reset the H1 parser at the beginning of each
one. With the current version, this case only happens on the response, when
multiple HTTP-1XX responses are forwarded to the client (for instance
103-Early-Hints). So strictly speaking, it is the same message. But for now,
internally, each one is a standalone message. Note that it might change in a
future version of the HTX.
In h1_process_output(), before formatting the headers, we need to find and
check the "Connection: " header to update the connection mode. But the context
used to do so was not correctly initialized. We must explicitly set ctx.value
to NULL to be sure to rescan the current header.
The function http_show_error_snapshot() must not use the trash buffer to append
the HTTP error description. Instead, it must use the <out> buffer, its first
argument. Note that, concretely, this function always succeeds because <out> is
always the trash buffer.
When a section's parser is registered, it can also define a post section
callback, called at the end of the section parsing. But when 2 sections with
the same name follow each other, the transition between them was missed. This
induced 2 bugs. First, the call to the post section callback was skipped. Then,
the parsing of the second section was mixed with the first one.
This patch must be backported in 1.8.
http_proxy is special, because it creates its connection and conn_stream
earlier. So in assign_server(), check that the connection associated with
the conn_stream has a destination address set, and in connect_server(),
use the connection and the conn_stream already attached to the
stream_interface, instead of looking for a connection in the session, and
creating a new conn_stream.
Instead of parsing all the available connections owned by the session
each time we choose a server, even if prefer-last-server is not set,
just do it if prefer-last-server is used, and check if the server is usable,
before checking the connections.
When a transaction ends, if we want to do keepalive, and the connection we
used didn't have an owner, attach the connection to the session, so that we
don't have to destroy it, and we can reuse it later.
Instead of just storing the last connection in the session, store all of
the connections, for at most MAX_SRV_LIST (currently 5) targets.
That way we can do keepalive on more than 1 outgoing connection when the
client uses HTTP/2.
When creating a new outgoing H2 connection, put it in the idle list so that
it's immediately available for others to use, if http-reuse always is used.
While it is true the SSL code will do the right thing if the SSL handshake
is not done, we have other types of handshakes to deal with (proxy protocol,
netscaler, ...). For those we definitely don't want to try to send data
before they're done. All handshakes but SSL go through the mux_pt, so in
mux_pt_snd_buf(), don't try to send while a handshake is pending.
In session_free(), make sure the outgoing connection is not in the idle list
anymore and that it no longer has an owner, so that it will properly be
destroyed and nobody will be able to access it.
We can reach sess_update_st_con_tcp() while we still have a connection
attached, so take that into account, and free the connection, instead of
assuming it's always a conn_stream.
When freeing the session, we may fail to free the outgoing connection,
because it still has streams attached. So remove ourself from the session
list, so that the connection doesn't try to access it later.
In h2_recv(), return 1 if there's an error on the connection, not just if
there's a read0 pending, so that h2_process() can be called and act as a
janitor.
In si_cs_recv(), when there's an error on the connection or the conn_stream,
don't give up if CS_FL_RCV_MORE is set on the conn_stream, as it means there's
still data available.
In si_cs_send(), don't give up and subscribe if the connection is still
waiting for a SSL handshake. We will never be woken up once the handshake is
done if we're using HTTP/2. Instead, directly try to send data. When using
the mux_pt, if the handshake is not done yet, snd_buf() would return 0 and
we will subscribe anyway.
When we're deferring the mux choice until the ALPN is negotiated, we
attach the connection to the stream_interface until it's done, so that we
can destroy it if something goes wrong and the stream is destroyed.
Before calling si_attach_cs() to attach the conn_stream once we have it,
call si_detach_endpoint(), or si_attach_cs() would destroy the connection.
When we defer the mux choice until the ALPN is negotiated, don't forget
to wake the stream once it's done, or it will never have the opportunity
to send data.
In ssl_sock_parse_clienthello(), the code considered that the SSL session id
size is '1', and that the SSL cipher suites are available right after the
session id size information.
This actually works in a single case, when the client does not send a
session id.
This patch fixes this issue by introducing a proper way to parse the session
id, moving the cursor forward by the session id length when required.
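A minimal sketch of the corrected cursor handling (variable names are
illustrative, not the exact ones used in ssl_sock_parse_clienthello()):

    /* <p> points to the session id length byte, <end> to the record end */
    unsigned char sess_id_len = p[0];

    p += 1 + sess_id_len;   /* skip the length byte plus the session id */
    if (p + 2 > end)
            return;         /* truncated: no room for the cipher suites length */
    /* the cipher suites length field now starts at <p> */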
Needs to be backported to 1.8.
In the mux_pt, when we're attaching a new conn_stream, don't forget to
unsubscribe from the connection. Failure to do so may lead to the mux_pt
freeing the connection while the conn_stream can still want to access it.
In h2_process_demux(), if we're demuxing multiple frames and the previous
frame led to a stream getting closed, don't bogusly treat that as an error
and destroy the next stream, as there are valid cases where the stream could
be closed.
The CLOEXEC flag was set using F_SETFL, which can't work.
To set the CLOEXEC flag, F_SETFD should be used; the problem is that it
needs a new call to fcntl() and it's on the path of every accept.
This flag was only needed in the case of the master, so the patch was
reverted and the flag is now set only in this case.
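For the record, FD_CLOEXEC is a file descriptor flag, not a file status
flag, hence F_SETFD (generic illustration, not the exact haproxy code):

    /* wrong: F_SETFL only handles O_* status flags such as O_NONBLOCK */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

    /* right: descriptor flags are read/written with F_GETFD/F_SETFD */
    fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);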
The bug was introduced by 0b3e849 ("MEDIUM: listeners: set O_CLOEXEC on
the accepted FDs").
No backport needed.
If the master was reloaded and there was an established connection to a
server, the FD resulting from the accept was leaking.
There was no CLOEXEC flag set on the FD of the socketpair created during
a connect call. This is specific to the socketpair in the master process,
but it should be applied to every protocol in case we use them in the
master at some point.
No backport needed.
The previous fix da95fd90 ("BUILD/MINOR: ssl: fix build with non-alpn/
non-npn libssl") does fix the build with old OpenSSL releases, but I
overlooked that the ctx is only freed when NPN is supported.
Fix this by moving the #endif to the proper place (this was broken in
c7566001 ("MINOR: server: Add "alpn" and "npn" keywords")).
Having a thread_local for the pool cache is messy as we need to
initialize all elements upon startup, but we can't until the threads
are created, and once created it's too late. For this reason, the
allocation code used to check for the pool's initialization, and
it was the release code which used to detect the first call and to
initialize the cache on the fly, which is not exactly optimal.
Now that we have initcalls, let's turn this into a per-thread array.
This array is initialized very early in the boot process (STG_PREPARE)
so that pools are always safe to use. This allows the tests to be removed
from the alloc/free calls.
Doing just this has removed 2.5 kB of code on all cumulated pool_alloc()
and pool_free() paths.
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This allowed the removal of 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and is now static.
This commit replaces the explicit pool creations that are made in
constructors with pool registrations. Not only does this simplify the
pool declarations (they can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no longer any user of a potentially uninitialized pool.
It was also the opportunity to remove no less than 12 constructors
and 6 init functions.
The new function create_pool_callback() takes 3 args including the
return pointer, and creates a pool with the specified name and size.
In case of allocation error, it emits an error message and returns.
The new macro REGISTER_POOL() registers a callback using this function
and can be used to request the creation of a pool with the guarantee that
the allocation will be checked. An even simpler approach is to use
DECLARE_POOL() and DECLARE_STATIC_POOL(), which declare and register
the pool.
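A hedged example of the resulting declaration style (the pool name and
payload type are made up for the illustration):

    /* declares the pool pointer and registers its creation at boot */
    DECLARE_STATIC_POOL(pool_head_foo, "foo", sizeof(struct foo));

    /* later, at runtime, the pool is guaranteed to exist */
    struct foo *f = pool_alloc(pool_head_foo);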
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.
Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
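Typical usage then looks like this (the callback names are illustrative;
post-check callbacks are assumed to return an ERR_* code):

    static int  my_post_check()  { return 0; /* ERR_NONE */ }
    static void my_post_deinit() { /* release resources here */ }

    REGISTER_POST_CHECK(my_post_check);
    REGISTER_POST_DEINIT(my_post_deinit);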
Most register_build_opts() calls use static strings. These ones were
replaced with a trivial REGISTER_BUILD_OPTS() statement adding the string
and its call to the STG_REGISTER section. A dedicated section could be
made for this if needed, but there are very few such calls for this to
be worth it. The calls made with computed strings however, like those
which retrieve OpenSSL's version or zlib's version, were moved to a
dedicated function to guarantee they are called late in the process.
For example, the SSL call probably requires that SSL_library_init()
has been called first.
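For example, a static string is now registered with a single statement (the
string itself is made up for the illustration):

    REGISTER_BUILD_OPTS("Built with example feature support.");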
This patch replaces a number of __decl_hathreads() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock(), which
automatically register the lock for initialization during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
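A hedged before/after sketch (the lock name is illustrative):

    /* before: declaration plus an explicit init from a constructor */
    __decl_hathreads(HA_SPINLOCK_T foo_lock);
    /* ... HA_SPIN_INIT(&foo_lock) called somewhere at boot ... */

    /* after: one declaration, init registered at the STG_LOCK stage */
    __decl_spinlock(foo_lock);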
This patch adds ha_spin_init() and ha_rwlock_init(), which are used as
callbacks to initialise locks at boot time. They perform exactly the
same as HA_SPIN_INIT() or HA_RWLOCK_INIT() but from within a real
function.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common that they consume a single
pointer and return void. Doing this removes 26 constructors. The following
calls were addressed (a before/after sketch follows the list) :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
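A hedged before/after sketch for one of them (the keyword list name is
illustrative):

    /* before: executed from a constructor at load time */
    __attribute__((constructor))
    static void __cfgparse_init(void)
    {
            cfg_register_keywords(&cfg_kws);
    }

    /* after: recorded at build time, run at boot during STG_REGISTER */
    INITCALL1(STG_REGISTER, cfg_register_keywords, &cfg_kws);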
We reintroduced some FD leaks by using a poller and some listeners in
the master.
The master proxy needs to be stopped to avoid leaking its listeners, the
polling loop needs to be deinited, and the thread waker pipe needs to be
closed too.
No backport needed.
Surprisingly, the compression pool was created at runtime on first use,
which is not very convenient, has performance and reliability impacts,
and even makes monitoring less easy. Let's move the pool creation at
startup time instead. This even removes the need for the spinlock in
case USE_ZLIB is not defined.
The tune.maxzlibmem setting was moved with commit 368780334 ("MEDIUM:
compression: move the zlib-specific stuff from global.h to compression.c")
but the preset value using DEFAULT_MAXZLIBMEM was incorrectly moved :
- the field is in "global" and not "global.tune"
- the trailing comma instead of semi-colon will make it either zero
(threads enabled), break (threads enabled with debugging), or cast
the memprintf's return pointer to int (threads disabled)
It simply proves that nobody ever used DEFAULT_MAXZLIBMEM since 1.8!
This needs to be backported to 1.8.
Valgrind reports:
==3389== Warning: invalid file descriptor -1 in syscall close()
Check for >= 0 before closing.
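The guard is as simple as:

    if (fd >= 0)
            close(fd);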
This bug was introduced in commit ce83b4a5dd
and is specific to 1.9. No backport needed.
In commit c7566001 ("MINOR: server: Add "alpn" and "npn" keywords") and
commit 201b9f4e ("MAJOR: connections: Defer mux creation for outgoing
connection if alpn is set"), the build was broken on older OpenSSL
releases.
Move the #ifdef's around so that we build again with older OpenSSL
releases (0.9.8 was tested).
Since the connection changes in 1.9, some breakage happened to the H2 mux,
whose initial design relied heavily on the fact that connection-level
functions were woken up after data were transferred to the stream layer.
We need to wake the demux up after receiving such data if the demux is
blocked. This at least allows POSTs to be received again. One issue remains,
it looks like the end of the uploaded data is silently discarded if the
server responds before the end of the transfer (H2 in half-closed(local)
state), which doesn't happen with 1.8.14 and nghttp as the client.
No backport is needed.
After the changes to the connection layer in 1.9, some wake up calls
need to be introduced to re-activate reading from the connection. One
such place is at the end of h2_process_demux(), otherwise processing
of input data stops after a few frames.
No backport is needed.
When we create a connection, if we have to defer the conn_stream and the
mux creation until the choice can be made (ie until the SSL handshake is done
and the ALPN is decided), store the connection in the stream_interface, so
that we're sure we can destroy it if needed.
When ending a stream, if the origin is an appctx, the appctx will have been
destroyed already, but it does not destroy the session. So later, when we
try to destroy the session, we try to dereference sess->origin and die
trying.
Fix this by explicitly setting sess->origin to NULL before calling
session_free().
The creation of the conn_stream for an outgoing connection has been delayed
a bit, and when using dispatch, a check was made to see if a conn_stream
was attached before the conn_stream was created, so remove the test, as
it's done later anyway, and create and install the conn_stream right away
when we don't have a server, as is done when we don't have an alpn/npn
defined.
In the various connect_server() functions, don't reset the connection flags,
as some may have been set before. The flags are initialized in conn_init(),
anyway.
If an ALPN (or a NPN) was chosen for a server, defer choosing the mux until
after the SSL handshake is done, and the ALPN/NPN has been negociated, so
that we know which mux to pick.
As we will now no longer try to subscribe to recv/send events before the
connection is established, there's no need to reactivate polling on the fd
when retrying to connect. It will be activated later on subscribe.
In some situations, especially when dealing with low latency on processors
supporting a variable frequency or when running inside virtual machines,
each time the process waits for an I/O using the poller, the processor
goes back to sleep or is offered to another VM for a long time, and it
causes excessively high latencies.
A solution to this provided by this patch is to enable busy polling using
a global option. When busy polling is enabled, the pollers never sleep and
loop over themselves waiting for an I/O event to happen or for a timeout
to occur. On multi-processor machines it can significantly overheat the
processor but it usually results in much lower latencies.
A typical test consisting in injecting traffic over a single connection at
a time over the loopback shows a bump from 4640 to 8540 connections per
second on forwarded connections, indicating a latency reduction of 98
microseconds for each connection, and a bump from 12500 to 21250 for
locally terminated connections (redirects), indicating a reduction of
33 microseconds.
It is only usable with epoll and kqueue because select() and poll()'s
API is not convenient for such usages, and the level of performance they
are used in doesn't benefit from this anyway.
The option, which obviously remains disabled by default, can be turned
on using "busy-polling" in the global section, and turned off later
using "no busy-polling". Its status is reported in "show info" to help
troubleshooting suspicious CPU spikes.
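Configuration sketch:

    global
        busy-polling    # pollers spin instead of sleeping, lowering latency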
Fix some memory leaks and an FD leak in the error paths of the master proxy
initialisation. It's a really minor issue since the process is exiting
when taking those error paths.
Valgrind's memcheck reports memory leaks in cli.c, because
the out parameter of memprintf is not properly freed:
==31035== 11 bytes in 1 blocks are definitely lost in loss record 16 of 101
==31035== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4A3C72: my_realloc2 (standard.h:1364)
==31035== by 0x4A3C72: memvprintf (standard.c:3459)
==31035== by 0x4A3D93: memprintf (standard.c:3482)
==31035== by 0x4AF77E: mworker_cli_sockpair_new (cli.c:2324)
==31035== by 0x48E826: init (haproxy.c:1749)
==31035== by 0x408BBC: main (haproxy.c:2725)
==31035==
==31035== 11 bytes in 1 blocks are definitely lost in loss record 17 of 101
==31035== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31035== by 0x4A3C72: my_realloc2 (standard.h:1364)
==31035== by 0x4A3C72: memvprintf (standard.c:3459)
==31035== by 0x4A3D93: memprintf (standard.c:3482)
==31035== by 0x4AF071: mworker_cli_proxy_create (cli.c:2172)
==31035== by 0x48EC89: init (haproxy.c:1760)
==31035== by 0x408BBC: main (haproxy.c:2725)
These leaks were introduced in commits
ce83b4a5dd and
8a02257d88
which are specific to haproxy 1.9 dev.
The "cpust_{tot,1s,15s}" fields used to report milliseconds but nothing
in the value's title made this explicit. Let's rename the field to report
"cpust_ms_{tot,1s,15s}" to more easily remind that the unit represents
milliseconds.
These sample fetch keywords report performance metrics about the task calling
them. They are useful to report in logs which requests consume too much CPU
time and what negative performance impact it has on other requests. Typically
logging cpu_ns_avg and lat_ns_avg will show culprits and victims.
Right now we measure for each task the cumulated time spent waiting for
the CPU and using it. The timestamp uses a 64-bit integer to report a
nanosecond-level date. This is only enabled when "profiling.tasks" is
enabled, and consumes less than 1% extra CPU on x86_64 when enabled.
The cumulated processing time and wait time are reported in "show sess".
The task's counters are also reset when an HTTP transaction is reset
since the HTTP part pretends to restart on a fresh new stream. This
will make sure we always report correct numbers for each request in
the logs.
This is a new global setting which enables or disables CPU profiling
per task. For now it only sets/resets the variable based on the global
option "profiling.tasks" and supports showing it as well as setting it
from the CLI using "show profiling" and "set profiling". The option will
be used by a future commit. It was done in a way which should ease future
addition of profiling options.
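A hedged configuration sketch (the on/off argument syntax is assumed):

    global
        profiling.tasks on    # enable per-task CPU profiling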
Since we know the time it takes to process everything between two poll()
calls, we can use this as the max latency measurement any task will
experience and average it.
This code does this, and reports in "show activity" the average of this
loop time over the last 1024 poll() loops, for each thread. It will vary
quickly at high loads and slowly under low to moderate loads, depending
on the rate at which poll() is called. The latency a task experiences
is expected to be half of this on average.
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking the
circular dependencies.
Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.
No code was changed, stuff was just moved around.
The signal_register_fct() does not remove the handlers assigned to a
signal, but adds a new handler to a list.
We accidentally inherited the handlers of the main() function in the
master process, which is a problem because they act on the proxies.
The side effect was to stop the MASTER proxy, which handles the master CLI,
on a SIGUSR1, and to display some debug info when doing a SIGHUP or a
SIGQUIT.
The new function signal_unregister() removes every handler assigned to
a signal. Once the handler list of the signal is empty, the signal is
ignored with SIG_IGN.
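A hedged sketch of its use in the master (assuming the function simply takes
the signal number):

    /* drop the handlers inherited from main() in the master process */
    signal_unregister(SIGUSR1);
    signal_unregister(SIGHUP);
    signal_unregister(SIGQUIT);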
In the output of 'show fd', the worker CLI's socketpair was still
handled by an "unknown" function. That can be really confusing during
debugging. Fix it by showing "mworker_accept_wrapper" instead.
The mworker waitpid mode (which is used when a reload failed to apply
the new configuration) was still using a specific initialisation path.
That's a problem since we now use a polling loop in the master: the
master proxy was not initialized and the master CLI was not activated.
This patch removes the initialisation code of the wait mode and
introduces MODE_MWORKER_WAIT in order to use the same init path as
MODE_MWORKER with some exceptions. It allows the master proxy
and the master CLI to be used during the waitpid mode.
Because the HTX is still experimental, we must add special cases during the
configuration check to be sure it is not enabled on a proxy with incompatible
options. Here, for HTX proxies, when a mux protocol is specified on a bind line
or a server line, we must force the HTX mode (PROTO_MODE_HTX).
Concretely, H2 is the only mux protocol that can be forced. And it doesn't yet
support the HTX. So forcing the H2 on an HTX proxy will always fail.
In si_cs_send(), some checks are done on the CS flags and the connection flags
before calling snd_buf(). But these checks are useless because they have
already been done earlier in the function. The hardest one to figure out is
the flag CO_FL_SOCK_WR_SH. So it is now tested with CF_SHUTW at the beginning.
In si_cs_send(), as said in comments, snd_buf() should only be called once
there is no data left in the pipe. But actually, this condition was not
respected.
For the same reason as for commit b46784b1c ("MINOR: stream-int: Notify
caller when an error is reported after a rcv_pipe()"), we return 1 after the
call to rcv_buf() in si_cs_recv() to notify the caller that some processing
may be triggered.
This patch is not flagged as a bug because no strange behaviour was yet
observed without it. It is just a proactive fix to be consistent.
In si_cs_recv(), when an error is found on the CS or the connection at the
beginning of the function, we return 1 to notify the caller that some
processing may be triggered. So it seems logical to do the same after the
call to rcv_pipe().
This patch is not flagged as a bug because no strange behaviour was yet observed
without it. It is just a proactive fix to be consistent.
The request is eaten when the stats applet has finished sending its
response. It was removed from the channel's buffer, removing all HTX blocks
till the EOM. But the channel's output was not reset, leaving the request
channel in an undefined state.
When we start to parse a new message, if all headers have not been received,
nothing is copied in the channel's buffer. In this situation, we must not set
the flag CS_FL_RCV_MORE on the conn-stream. If we do so, the connection
freezes because there is no data to send that could re-enable the reads.
First of all, we need to be sure to keep the flag H1S_F_BUF_FLUSH on the H1S
reading data until all data have been flushed from the buffer. Then we need to
know when kernel splicing is in use and when it ends. This is handled with the
new flag H1S_F_SPLICED_DATA.
Then, we must subscribe to send events when some data remain in the pipe after
a snd_pipe(). It is mandatory to wake up the stream and avoid a freeze.
Finally, we must be sure to update the message state when we resume using the
channel's buffer. Among other things, it is mandatory to switch the message
from DATA to DONE state when all data have been sent using kernel splicing.
Don't force the close on server side anymore. Since commit 7c6f8b146 ("MAJOR:
connections: Detach connections from streams"), it is possible to release a
stream without the underlying connection.
Because of this change, we must be sure to create a new stream to handle the
next HTTP transaction only on the client side. And we must be sure to
correctly handle the read0 event in h1_recv(), to be sure h1_process() is
called.
It avoids a copy between the rxbuf and the channel's buffer. It means the
parsing is done in h1_rcv_buf(), so we need to have a stream to start the
parsing. This change should improve the overall performance. It also implies a
better split between the connection layer and the applicative layer. Now, on
the connection layer, only raw data are manipulated. Raw data received from
the socket are stored in ibuf and those sent are taken from obuf. On the
applicative layer, data in ibuf are parsed and copied into the channel's
buffer. And on the other side, structured data are formatted and copied into
obuf.
James Brown reported that when an "accept-ranges" header field is sent
through haproxy and converted from HTTP/1.1 to H2, it's mis-encoded as
"accept-language". It happens that it's one of the few very common header
fields encoded using its index value and that this index value was misread
in the spec as 17 instead of 18, resulting in the wrong name being sent.
Thanks to Lukas for spotting the issue in the HPACK encoder itself.
This fix must be backported to 1.8.
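For reference, the relevant entries of the HPACK static table (RFC 7541,
Appendix A) are:

    /* index 17: accept-language */
    /* index 18: accept-ranges   <- the index to emit for this field */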
This was the largest function of the whole file, taking a rough second
to build alone. Let's move it to a distinct file along with a few
dependencies. Doing so saved about 2 seconds on the total build time.
The config parser is the largest file to build and its build dominates
the total project's build time. Let's start to split it into multiple
smaller pieces by extracting the "global" section parser into a new
file called "cfgparse-global.c". This removes 1/4th of the file's build
time.
For now, the lua scripts are not compatible with the new HTX internal
representation of HTTP messages. Thus, for a given proxy, when the option
"http-use-htx" is enabled, an error is triggered if any lua
action/service/sample-fetch/converter is also configured.
For now, the filters are not compatible with the new HTX internal representation
of HTTP messages. Thus, for a given proxy, when the option "http-use-htx" is
enabled, an error is triggered if any filter is also configured.
To do so, the stream is created as early as possible. It means during the mux
creation for the first request and, for the others, just at the end of the
previous transaction. Because all timeouts are handled by the stream, the
mux's task is now useless, so it is removed. Finally, to report errors, flags
are set on the HTX message. The HTX message is passed to the stream if there
is some content to analyse or some error to handle.
All of this will probably be reworked later to handle errors and timeouts
directly in the mux. For now, it is the simplest way to handle all of this.
When a server is down, the channel's data must not be consumed. This is
required to allow redispatch and connection retry. So now, we wait for
the connection to be marked as connected, with the flag CO_FL_CONNECTED,
before starting to consume channel's data. In the mux, this event is
tracked with the flag H1C_F_CS_WAIT_CONN.
It does the same as smp_prefetch_http but for HTX messages. It can be called
from an HTTP proxy or a TCP proxy. For HTTP proxies, the parsing is handled by
the mux, so it does nothing but wait. For TCP proxies, it tries to parse an
HTTP message and to convert it into a temporary HTX message. Sample fetches
will use this temporary variable to do their job.
This version is simpler than the legacy one because the parsing is no longer
handled by the analyzer. So now we just need to wait for more data to move
on.
For now, the call to the stats applet is disabled for HTX messages. But HTX
versions of the function checking the request URI against the stats URI and
the function preparing the call to the stats applet have been added.
It is more or less the same as the legacy version but adapted to be called
from HTX analyzers. In the legacy version of this function, we switch on the
HTX code when applicable.
It is more or less the same as the legacy version but adapted to be called
from HTX analyzers. In the legacy version of this function, we switch on the
HTX code when applicable.
It is more or less the same as the legacy versions but adapted to be called
from HTX analyzers. In the legacy versions of these functions, we switch on
the HTX code when applicable.
It is more or less the same as the legacy versions but adapted to be called
from HTX analyzers. In the legacy versions of these functions, we switch on
the HTX code when applicable.
It is more or less the same as del_hdr_value but adapted to be called from HTX
analyzers. The main change is that it takes pointers to the start and the end
of the header value.
The mux-h1 now parses and formats HTTP/1 messages using the HTX
representation. The HTX analyzers have been updated too. For now, only
htx_wait_for_{request/response} and http_{request/response}_forward_body have
been adapted. Others are disabled for now.
Now, the HTTP messages are parsed by the mux on one side and then, after
analysis, formatted on the other side. In the middle, in the stream, there is
no more parsing. Among other things, the version parsing is now handled by the
mux. During the data forwarding, depending on the value of the "extra" field,
we are able to know if the body length is known or not and, if so, how many
bytes are still expected.
This file will host all functions to manipulate HTTP messages using the HTX
representation. Functions in this file will be able to be called from anywhere
and are mainly related to the HTTP semantics.
The internal representation of an HTTP message, called HTX, is a structured
representation, unlike the old one which is a raw representation of
messages. The idea is to have a version-agnostic representation of HTTP
messages, which can easily be used to handle HTTP/1, HTTP/2 and hopefully
QUIC messages, and communication from one of them to another.
In this patch, we add types to define the internal representation itself and the
main functions to manipulate them.
The mux relies on the flag CO_RFL_BUF_FLUSH during a call to h1_rcv_buf() to
know if it needs to stop reading and to flush its internal buffers to use
kernel TCP splicing. It is the caller's responsibility (here the SI) to know
when it must come back to buffered exchanges.
Now, the connection mode is detected in the mux and not in HTX analyzers
anymore. Keep-alive connections are now managed by the mux. A new stream is
created for each transaction. This removes the most important part of the
synchronization between channels and the HTTP transaction cleanup. These changes
only affect the HTX part (proto_htx.c). Legacy HTTP analyzers remain untouched
for now.
On the client side, the mux is responsible for creating new streams when a new
request starts. It is also responsible for parsing and updating the
"Connection:" header of the response. On the server side, the mux is
responsible for parsing and updating the "Connection:" header of the
request. Muxes on each side are independent. For now, there is no connection
pool on the server side, so it always closes the server connection.
For now, it only parses and transfers data. There is no internal representation
yet. It means the stream still needs to parse it too. So a message is parsed 3
times today: once by each mux (the client one and the server one) and another
time by the stream. This is of course inefficient. But don't worry, it is only
a transitional state. And this mux is optional for now.
BTW, headers and body parsing are now handled using the same functions as the
H2 mux. Request/response synchronization is also handled. The mux's task is
now used to catch client/http-request timeouts. Other timeouts are still
handled by the stream. On the client side, the stream is created once headers
are fully parsed, and body parsing starts only when headers are transferred to
the stream (ie, copied into the channel buffer).
There are still some known limitations here and there. But it works in the
common cases. Bad messages are not captured, and some logs are emitted when
errors occur, but only if no stream is attached to the mux. Otherwise, data
are transferred and we let the stream handle errors itself.
For now, it is just another kind of passthrough multiplexer, but with internal
buffers, to be prepared to parse incoming messages and to format outgoing
ones. There is also a task attached to it to handle timeouts. However, because
it does not handle any timeout for now, this task is unused. And finally,
because it handles internal buffers, it also handles retries on recv/send. To
use this multiplexer, you must use the option "http-use-htx" both on the
frontend and the backend.
It does not support keep-alive and will freeze connections after the first
request/response.
For now, these analyzers are just copies of the legacy HTTP analyzers. But,
during the HTTP refactoring, this will be the main place where the changes
will be visible. And in legacy analyzers, the macro IS_HTX_STRM is used to
know if the HTX version should be called or not.
Note: the following commits were applied to proto_http.c after this patch
was developed and need to be studied to see if an adaptation to htx
is required :
fd9b68c BUG/MINOR: only mark connections private if NTLM is detected
The flag CS_FL_READ_PARTIAL can be set by the mux on the conn_stream to notify
the stream interface that some data were received. It is used in si_cs_recv()
to re-arm the read timeout on the channel.
These 2 functions are pretty naive. They only split a start-line into its 3
substrings or a header line into its name and value. Spaces before and after
each part are skipped. No CRLF is expected at the end.
By setting the flag CO_RFL_KEEP_RSV when calling mux->rcv_buf, the
stream-interface notifies the mux that it must keep some space to preserve the
buffer's reserve. This flag is only useful for multiplexers handling
structured data, because in such a case, the stream-interface cannot know the
real amount of free space in the channel's buffer.
This file is empty for now. But it will be used to add new versions of the HTTP
analyzers based on the internal representation of HTTP messages (not implemented
yet but called HTX).
By setting the flag CO_RFL_BUF_FLUSH when calling mux->rcv_buf, the
stream-interface notifies the mux that it should flush its buffers without
reading more data. This flag is set when the SI wants to use kernel TCP
splicing to forward data. Of course, the mux can respect it or not, depending
on its state. It's just a hint.
Do not destroy the connection when we're about to destroy a stream. This
prevents us from doing keepalive on server connections when the client is
using HTTP/2, as a new stream is created for each request.
Instead, the session is now responsible for destroying connections.
When reusing connections, the attach() mux method is now used to create a new
conn_stream.
Introduce a new field in session, "srv_conn", and a linked list of sessions
in the connection. It will be used later when we'll switch connections
from being managed by the stream, to being managed by the session.
Add a new method for muxes, avail_streams, that returns the number of streams
still available for a mux.
For the mux_pt, it'll return 1 if the connection is idle, or 0 otherwise. For
the H2 mux, it'll return the max number of streams allowed, minus the number
of streams currently in use.
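A hedged sketch of the mux_pt variant (the context field names are
illustrative):

    /* a passthrough connection carries at most one stream */
    static int mux_pt_avail_streams(struct connection *conn)
    {
            struct mux_pt_ctx *ctx = conn->mux_ctx; /* hypothetical accessor */

            return ctx->cs ? 0 : 1;
    }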
In order to make the mux_pt able to handle idle connections, give it its
own context, where it'll store the connection, the current conn_stream if
any, and a wait_event, so that it can subscribe to I/O events.
Add a new parameter to the detach() method, that gives the mux a hint
whether it should destroy the connection or not when detaching a conn_stream.
If 1, then the mux_pt immediately destroys the connection; if 0, then it
just subscribes to any read event. If a read happens, it will call
conn_sock_drain(), and if there's a connection error, it'll free the
connection, after removing it from the idle list.
Instead of trying to receive as soon as the connection is created, and to
eventually have to transfer subscription if we move connections, wait
until the connection is established before attempting to recv.
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
The channel can disable reading from the stream-interface using various
methods, such as :
- CF_DONT_READ
- !channel_may_recv()
- and possibly others
Till now this was done by mangling SI_FL_RX_WAIT_EP which is not
appropriate at all since it's not the stream interface which decides
whether it wants to deliver data or not. Some places were also wrongly
relying on SI_FL_RXBLK_ROOM since it was the only other alternative,
but it's not suitable for CF_DONT_READ.
Let's use the SI_FL_RXBLK_CHAN flag for this instead. It will properly
prevent the stream interface from being woken up and from subscribing to
more receives, without being accidentally removed. It is
automatically reset if CF_DONT_READ is not set in stream_int_notify().
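A hedged sketch of the intent (the helper names follow the si_rx_*
convention introduced by this series):

    /* let the channel, not the stream-int, decide whether to receive */
    if ((chn->flags & CF_DONT_READ) || !channel_may_recv(chn))
            si_rx_chan_blk(si);     /* channel refuses input for now */
    else
            si_rx_chan_rdy(si);     /* channel accepts input again   */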
The code is not trivial because it splits the logic between everything
related to buffer contents (channel_is_empty(), CF_WRITE_PARTIAL, etc)
and buffer policy (CF_DONT_READ). Also it now needs to decide timeouts
based on any blocking flag and not just SI_FL_RXBLK_ROOM anymore.
It looks like this patch has caused a minor performance degradation on
connection rate, which possibly deserves being investigated more deeply,
as the test conditions are uncertain (e.g. slightly more subscribe calls?).
For a long time, stream_int_update() and stream_int_notify() used to only
conditionally call si_chk_rcv() based on state change detection. This
detection is not reliable and quite complex. With the new blocking flags
that si_chk_rcv() checks, it's much more reliable to always call the
function to take recent changes into account, and let it decide if it needs
to wake something up or not.
This also removes the calls to si_chk_rcv() that were performed in
si_update_both() since these ones are systematically performed in
stream_int_update() after updating the Rx flags.
Till now we were using si_done_put() upon shutr, but these flags could
be reset upon next activity. Now let's switch to SI_FL_RXBLK_SHUT which
doesn't go away. It's also set in stream_int_update() in case a shutr
condition is detected.
The now unused si_done_put() was removed.
A number of calls to si_cant_put() were used in fact to request being
called back once a buffer is available. These ones are not needed anymore
since si_alloc_ibuf() already sets the SI_FL_RXBLK_BUFF flag when called
in appctx context. Those called with a foreign stream-int are simply turned
into si_rx_buff_blk().
A number of si_cant_put() calls were still present to in fact indicate
that the endpoint is ready (thus should be turned into si_rx_endp_more()).
One other call, in the Lua handler, indicated that the endpoint wanted to
be blocked until some room is made in the Rx buffer in order to detect
that the connection happened; this is in fact an indication that it
wants to be called once the endpoint is ready, which is the default case
for an applet, so this call was removed.
A useless call to si_cant_put() before appctx_wakeup() in the Lua
applet wakeup call was removed as well, since the first thing that will
be done there will be to set the ENDP blocking flag.
If an applet reports being blocked due to any of the channel-side flags,
it's reportedly ready to deliver incoming data. It's better to do this
after the return from the applet handler so that applet developers don't
have to worry about details related to flags ordering.
Instead of first indicating that there's more data to read from the
conn_stream then re-adjusting this info along the function, we now
instead set the status according to the subscription status at the
end. It's easier, more accurate, and less sensitive to intermediary
changes.
This will soon allow to remove all the si_cant_put() calls that were
placed in the middle to force a subsequent callback and prevent the
function from subscribing to the mux layer.
The stream interface used to conflate a missing buffer and a lack of
buffer space into SI_FL_WAIT_ROOM, but this causes difficulties as
these cannot be checked at the same moment and are not resolved at
the same moment either. Now we instead mark the buffer as presumably
available using si_rx_buff_rdy() and mark it as unavailable+requested
using si_rx_buff_blk().
The call to si_alloc_buf() was moved after si_stop_put(). This makes
sure that the SI_FL_RX_WAIT_EP flag is cleared on allocation failure so
that the function is called again if the callee fails to do its work.
The SI_FL_WANT_PUT flag is used in an awkward way, sometimes it's
set by the stream-interface to mean "I have something to deliver",
sometimes it's cleared by the channel to say "I don't want you to
send what you have", and it has to be set back once CF_DONT_READ
is cleared. This will have to be split between SI_FL_RX_WAIT_EP
and SI_FL_RXBLK_CHAN. This patch only replaces all uses of the
flag with its natural (but negated) replacement SI_FL_RX_WAIT_EP.
The code is expected to be strictly equivalent. The now unused flag
was completely removed.
This flag is not enough to describe all blocking situations, as can be
seen in each case we remove it. The muxes have taught us that using multiple
blocking flags in parallel will be much easier, so let's start to do this
now. This patch only renames this flag in order to make the next changes
more readable.
There currently is an optimization in stream_int_notify() consisting
in not trying to forward small bits of data if extra data remain to be
processed. The purpose is to avoid forwarding one chunk at a time if
multiple chunks are available to be parsed at once. It consists in
avoiding sending pending output data if there are still data to be
parsed in the channel's buffer, since process_stream() will have the
opportunity to deal with them all at once.
Not only is this optimization less useful with the new way the connections
work, but it even causes problems like lost events, since WAIT_ROOM will
not be removed. And with HTX, it will never be able to update the input
buffer after the first read.
Let's relax the rules now, by always sending if we don't have the
CF_EXPECT_MORE flag (used to group writes), or if the buffer is
already full.
The function used to abuse the internals of mux_pt to retrieve a
conn_stream, which will not work anymore after the idle connection
changes. Let's make it rely on the more reliable cs_get_first()
instead.
This method is used to retrieve the first known good conn_stream from
the mux. It will be used to find the other end of a connection when
dealing with the proxy protocol for example.
Commit d4dd22d ("MINOR: h2: Let user of h2_recv() and h2_send() know xfer
has been done") changed the API without documenting the expected return
values, which appear to come out of nowhere in the code :-( Please don't
do that anymore! The description was recovered from the commit message.
To be used, error messages declared in a default section must be copied when the
parsing of a proxy section starts. But this was only done for frontends.
This patch may be backported to older versions.
There are still some unwelcome synchronous calls to si_cs_recv() in
process_stream(). Let's have a new function si_sync_recv() to perform
a synchronous receive call on a stream interface regardless of the type
of its endpoint, and move these calls there. For now it only implements
conn_streams since it doesn't seem useful to support applets there. The
function implements an extra check for the stream interface to be in an
established state before attempting anything.
In commit f26c26c ("BUG/MEDIUM: stream-int: change the way buffer room
is requested by a stream-int") we used to call si_want_put() at the
end of sess_update_st_con_tcp(), when switching to SI_ST_EST state.
But this is incorrect as there are a few other situations where we
can switch to this state, such as in si_connect() where a connection
reuse is detected, or when directly calling an applet (in which case
that was already covered anyway). For now it doesn't have any side
effect but it could impact connection reuse after the stream-int
changes by stalling an immediately reused connection.
Let's move this flag change to sess_establish() instead, which is the
only place which is always called exactly once on connection setup.
No backport is needed, this is purely 1.9.
In master-worker mode, the socketpair CLI listener of the worker is now
marked unstoppable, which allows connecting to the CLI of an old process
which is in a leaving state, allowing it to be debugged.
An unstoppable listener is a listener which won't be stopped during a soft
stop. The unstoppable_jobs variable is incremented and the listener
won't prevent the process from leaving properly.
It is not a good idea to use this feature (the LI_O_NOSTOP flag) with a
listener that needs to be bound again on another process during a soft
reload.
This patch allows a process to properly quit when some jobs are still
active; this feature is handled by the unstoppable_jobs variable, which
must be atomically incremented.
During each new iteration of run_poll_loop() the break condition of the
loop is now (jobs - unstoppable_jobs) == 0.
The unique usage of this at the moment is to handle the socketpair CLI
of a worker during the stopping of the process. During the soft
stop, we can mark the CLI listener as an unstoppable job and still
handle new connections till every other job is stopped.
The previous commit fedceaf33 ("MINOR: http: Regroup return statements of
http_req_get_intercept_rule at the end") partly fixes the problem, but not
entirely. Because HTTP 103 responses were sent line by line, it was possible to
mix them with others. For instance, an early-hint rule followed by a redirect
rule leaves the response buffer totally messed up. Furthermore, if we fail to
add the last CRLF to finish the HTTP 103 response because there is no more
space in the buffer, it leaves the buffer with an unfinished and invalid
message.
This patch fixes the bug by creating a fully formed HTTP 103 response before
trying to push it in the response buffer. If an error occurs during the copy
or if another response was already sent, the HTTP 103 response is
ignored. However, the latter should never happen because, for redirects
and authentication errors, we first try to copy any pending HTTP 103 response.
Instead of having multiple return statements spread here and there in the
middle of the function, we just exit from the loop after setting the right
return code. This leaves a chance to do some work before leaving the
function. It is also less error prone.
Instead of having multiple return statements spread here and there in the
middle of the function, we just exit from the loop after setting the right
return code. This leaves a chance to do some work before leaving the
function. It is also less error prone.
This patch fixes a bug introduced in the commit 6b952c810 ("REORG: http: move
http_get_path() to http.c"). In the reorg, the code responsible for skipping
the version to only extract the path from the HTTP request was dropped.
No backport is needed, this only affects 1.9.
When splice() reports a pipe full condition, we go through the common
code used to release a possibly empty pipe (which we don't have) and which
immediately tries to allocate a buffer that will never be used. Further,
it may even subscribe to get this buffer if the resources are low. Let's
simply get out of this way if the pipe is full.
This fix could be backported to 1.8 though the code is a bit different
over there.
Since we don't necessarily pass through conn_fd_handler() when reading,
conn_refresh_polling_flags() is not necessarily called when performing
a recv() operation, thus flags like CO_FL_WAIT_ROOM are not cleared.
It happens that si_cs_recv() checks CO_FL_WAIT_ROOM before deciding to
receive into a buffer, to see if the previous rcv_pipe() call failed by
lack of pipe room. The combined effect of these two statements is that
at the end of a file transmission, when there's too little data to
warrant the use of a pipe and the pipe is empty, we refrain from using
rcv_pipe() for the last few bytes, but since CO_FL_WAIT_ROOM is still
present, we don't use rcv_buf() either, and the connection remains
frozen in this state with si_cs_recv() called in loops.
In order to fix this we can simply manually clear CO_FL_WAIT_ROOM when
not using the pipe so that the next check sees the result of the previous
operation and not an old one. We could equally call
conn_refresh_polling_flags() but that would be overkill and dangerous
given that it would manipulate the connection's flags under the mux.
By the way, ideally the mux should report this flag into the conn_stream
for cleaner manipulation.
No backport is needed as this is only post 1.9-dev2.
As part of the changes that went into 1.9-dev2 regarding the polling
modifications, the changes consecutive to the removal of the wait_list
from the conn_streams (commit 71384551a) made si_cs_recv() occasionally
return without subscribing to receive events, causing spliced transfers
to randomly fail if the client was at least as fast as the server. This
may remain unnoticed on most deployments since servers are usually close
to haproxy with higher bandwidth than clients have, resulting in buffers
always being full.
In order to reproduce this effect, it is better to do it on the local
machine and to transfer very large objects (hundreds of gigs) over a
single connection, to see it suddenly stall after a few tens of gigs.
Now with this fix it's fine even after 3 TB over a single connection.
No backport is needed.
When we allocate struct stksess, we also allocate memory to store the
associated data before the struct itself.
As the data can be of different types, they can have different sizes. However,
we need the struct stksess to be properly aligned, as it can do 64-bit
loads/stores (including atomic loads/stores) on 64-bit platforms, and some of
them don't support unaligned accesses.
So, when allocating the struct stksess, round the size up to the next
multiple of sizeof(void *), and make sure the struct stksess itself is
properly aligned.
Many thanks to Paul Martin for investigating and reporting that bug.
This should be backported to earlier releases.
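As an illustration, the rounding boils down to something like this
(a minimal sketch, not the actual haproxy code):

    #include <stddef.h>

    /* Round <sz> up to the next multiple of sizeof(void *) so that the
     * struct stksess stored right after the data area stays aligned.
     */
    static inline size_t round_to_ptr_size(size_t sz)
    {
        return (sz + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    }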
When configuring the logs with an FD and using the master worker, the FD
was closed upon a reload because it was configured with CLOEXEC. This
leads to using the wrong FD for the logs and to closing them, which is
unfortunate since the master relies on the FD being left open during a
reload. The fix is to stop setting CLOEXEC when the FD is inherited.
No backport needed.
This patch implements the http_apply_early_hint_rule() function, responsible
for building HTTP 103 Early Hint responses each time an "early-hint" rule is
matched.
This patch adds an "early_hint" struct to the "arg" union of the "act_rule"
struct and parses the "early-hint" http-request keyword with it, using the
same code as for the "(add|set)-header" parser.
When namespaces are disabled, support is still reported because the file
is built with almost nothing in it but built anyway. Instead of extending
the scope of the numerous ifdefs in this file, better avoid building it
when namespaces are disabled. In this case we define my_socketat() as an
inline function mapping directly to socket(). The struct netns_entry
still needs to be defined because it's used by various other functions
in the code.
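For illustration, the fallback may be sketched like this (simplified
from the description above, not the exact code):

    #include <sys/socket.h>

    struct netns_entry; /* still defined elsewhere, unused here */

    /* with namespaces disabled, my_socketat() simply ignores the
     * namespace and maps directly to socket().
     */
    static inline int my_socketat(const struct netns_entry *ns,
                                  int domain, int type, int protocol)
    {
        return socket(domain, type, protocol);
    }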
Splicing was in great part broken over the last few development versions
due to the use of co_data() to detect if data are available in the channel.
But co_data() only looks at buffered data, not spliced data.
The channel_is_empty() function takes care of both and should be used. With
this, splicing starts to work again but there are still a few cases where
transfers may stall.
No backport is needed.
Subsequent to the recent stream-int updates, we started to consider that
SI_FL_WANT_PUT needs to be set when receipt is enabled, but this is wrong
and results in 100% CPU when an HTTP client stays idle after a keep-alive
request because the stream-int has nothing to provide and nothing to send.
In fact just like for applets this flag should reflect the continuation
of an attempt. So it's si_cs_recv() which should set the flag, and clear
it if it has nothing more to provide. This function is called the first
time in process_stream()), and called again during transfers, so it will
always be up to date during stream_int_update() and stream_int_notify().
As a special case, it should also be set when a connection switches to
the established state. And we should absolutely refrain from calling
si_cs_recv() to re-enable reading, normally just setting this flag
(from within the stream-int's handler or prior to calling si_chk_rcv())
is expected to be OK.
A corner case remains where it was observed that in stream_int_notify() we
can sometimes be called with an empty output channel with SI_FL_WAIT_ROOM
and no CF_WRITE_PARTIAL, so there's no way to detect that we should
re-enable receiving. It's easy to also take care of this condition
there for the time it takes to figure if this situation is expected
or not.
Now it becomes more obvious that relying on a single flag to request
room (or on two flags to arbitrate activity) is not workable given the
autonomy of both sides. The mux_h2 has taught us that blocking flags
are much more reliable, require far fewer conditions and are much easier
to deal with. That's probably something to consider quickly in this
area.
No backport is needed.
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
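Example (assuming this format's keyword is "raw") :
log stdout format raw daemon # send raw messages to stdout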
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.
Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :
log stdout format short daemon # send everything to stdout
log stderr format short daemon notice # send important events to stderr
In certain situations it would be desirable to log to an existing file
descriptor, the most common case being a pipe between containers or
processes. The main issue with pipes is that using write() on them will
randomly truncate messages. But there is a trick. By using writev(), we
can atomically deliver or drop a message, which perfectly fits the
purpose. The only caveat is that large messages (4096 bytes on modern
operating systems) may be interleaved with messages from other processes
if using nbproc for example. In practice such messages are rare and most
of the time when users need such type of logging, the load is low enough
for a single process to be running so this is not really a problem.
This logging method thus uses unbuffered writev() calls and uses more
CPU than if it used its own buffer with large writes at once, though this
is not a problem for moderate loads.
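The emission may be sketched as follows (illustrative only, the
function name is made up):

    #include <sys/types.h>
    #include <sys/uio.h>

    /* On a pipe, a single writev() of at most PIPE_BUF bytes is atomic:
     * the message is either fully delivered or dropped, never truncated.
     */
    static ssize_t fd_log_emit(int fd, const char *msg, size_t len)
    {
        struct iovec iov = { .iov_base = (void *)msg, .iov_len = len };
        return writev(fd, &iov, 1);
    }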
Logging to a file descriptor attached to a file also works with the side
effect that the process is significantly slowed down during disk accesses
and that it's not possible to rotate the file without restarting the
process. For this reason this option is not offered as a configuration
option, since it would confuse most users, but one could decide to
redirect haproxy's output to a file during debugging sessions. Two aliases
"stdout" and "stderr" are provided, but keep in mind that these are closed
by default in daemon mode.
When logging to a pipe or socket at a high enough rate, some logs will be
dropped and the number of dropped messages is reported in "show info".
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
The error messages used to say something along the lines of "socket logger 2 failed"
or "sendmsg logger 2 failed" which are confusing. Let's rephrase this
"sendmsg() failed for logger 2".
Building on 32 bit gives this :
src/cache.c: In function 'http_action_store_cache':
src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c: In function 'cache_channel_append_age_header':
src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :
#define CACHE_ENTRY_MAX_AGE 2147483648
Just appending "U" to mark it unsigned is enough to fix it. This only
affects 1.9, no backport is needed.
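The fixed definition thus becomes :
#define CACHE_ENTRY_MAX_AGE 2147483648U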
In 1.8, commit 45a66cc ("MEDIUM: config: ensure that tune.bufsize is at
least 16384 when using HTTP/2") tried to avoid an annoying issue making
H2 fail when haproxy is built with default buffer sizes smaller than 16kB,
which used to be the case for a very long time. Sadly, the test only sees
when NPN/ALPN exactly match "h2" and not when it's combined like
"h2,http/1.1" nor "http/1.1,h2". We can safely use strstr() there because
the string is prefixed by the token's length (0x02) which is unambiguous
as it cannot be part of any other token.
This fix should be backported to 1.8 as a safety guard against bad
configurations.
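The test may be sketched like this (illustrative, the helper name is
made up):

    #include <string.h>

    /* each token in the stored ALPN/NPN string is prefixed by its
     * one-byte length, so the byte 0x02 followed by "h2" can only
     * match the "h2" token itself.
     */
    static int alpn_contains_h2(const char *alpn)
    {
        return alpn && strstr(alpn, "\x02h2") != NULL;
    }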
Before calling the mux to get incoming data, we get the amount of space
available at the input of the buffer. If there is no space, we don't try to read
more data. This is good enough when raw data are stored in the buffer. But this
info has no meaning when structured data are stored. Because with the HTTP
refactoring such kinds of data will be stored in buffers, this is a bit
annoying. So, to avoid any problems, we always call the mux. It is the mux's
responsibility to notify the stream interface when it needs more space to
store more data. This must
be done by setting the flag CS_FL_RCV_MORE on the conn_stream.
This is exactly what we do in the pass-through mux when <count> is null.
This flag is set on the stream interface when we should wait for more space in
the channel's buffer to store more incoming data. This means we should wait
until some outgoing data are sent before retrying to receive more data.
But in stream interface functions, at many places, instead of checking this
flag, we use the function channel_may_recv to know if we can (re)start
reading. This currently works but it is not really consistent. And, it works
because only raw data are stored in buffers. But it will be a problem when we
start to store structured data in buffers.
So to avoid any problems with future implementations, we now rely only on
SI_FL_WAIT_ROOM. The function channel_may_recv can still be called, but only
when we are sure to handle raw data (for instance in functions ci_put*). To do
so, among other things, we must be sure to unset SI_FL_WAIT_ROOM and offer an
opportunity to call chk_rcv() on a stream interface when some data are sent
on the other end, which is now granted by the previous patch series.
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
It's far from being clean, but at least it allows to resync both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while applets are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.
The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.
And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().
Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.
We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.
Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.
This flag should already be cleared before calling the *chk_rcv() functions.
Before adapting all call places, let's first make sure si_chk_rcv() clears
it before calling them so that these functions do not have to check it again
and so that they do not adjust it. This function will only call the lower
layers if the SI_FL_WANT_PUT flag is present so that the endpoint can decide
not to be called (as done with applets).
We now do this on the si_cs_recv() path so that we always have
SI_FL_WANT_PUT properly set when there's a need to receive and
SI_FL_WAIT_ROOM upon failure.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
The buffer allocation callback appctx_res_wakeup() used to rely on old
tricks to detect if a buffer was already granted to an appctx, namely
by checking the task's state. Not only is this test not valid anymore,
it's also inaccurate.
Let's rely solely on SI_FL_WAIT_ROOM, which is now set on allocation failure by
the functions trying to allocate a buffer. The buffer is now allocated on
the fly and the flag removed so that the consistency between the two
remains granted. The patch also fixes minor issues such as the function
being improperly declared inline(!) and the fact that using appctx_wakeup()
sets the wakeup reason to TASK_WOKEN_OTHER while we try to use TASK_WOKEN_RES
when waking up following a resource allocation such as a buffer.
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.
The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
When using the CLI proxy of the master and trying to access a worker
with the @ prefix, the worker just crashes.
The commit 7216032 ("MEDIUM: mworker: leave when the master die")
reintroduced the old code of the pipe, which was not trying to access
the pointers before. The owner of the FD was modified to a different
value, which is a problem since we now call listener_accept() in most
cases from mworker_accept_wrapper(), and it casts the owner variable to
get the listener.
This patch fixes the issue by restoring the previous owner of the FD.
Commit eafd8ebcf ("MEDIUM: stream-int: call si_cs_process() in
stream_int_update_conn") uncovered a sleeping bug. By calling
si_cs_process() within si_update(), we end up calling stream_int_notify().
We rely on it to update the stream-int before quitting as a hack, but
it happens to immediately wake the task up while the stream int's
state is still SI_ST_CON (during the connection establishment). The
observable effect is that an unreachable server causes haproxy to
use 100% CPU until the connection timeout strikes.
This patch fixes this by not causing the wake up for the SI_ST_CON
state. It would equally be possible to check for states higher than
SI_ST_EST as is done in other places, but for now better stay on the
safe side by covering the only issue that can be triggered. It's
suspected that this issue slightly affects older versions by causing
one extra call to process_stream() during the connection setup for
each activity change on the other side, but this should not have
any observable effect.
No backport is needed.
The process was aborting with nbthread > 1.
The mworker_pipe_register() function could be called several times in
multithread mode, and we don't want to abort() there.
It took me 17 minutes this morning to figure out where si->wait_event was
set (it's in si_reset() which should now probably be renamed since it
doesn't just perform a reset anymore but also an allocation) and what
its task was assigned to (si_cs_io_cb() even for applets and empty SI).
This is too confusing and not intuitive enough, let's at least add a
few comments for now to help figure how this stuff works next time.
When the master dies, the worker should exit too. This is achieved by
checking if the FD of the socketpair/pipe was closed between the master
and the worker.
In the former architecture of the master-worker, there was only a pipe
between the master and the workers, and it was easy to check an EOF on
the pipe FD to exit() the worker.
With the new architecture, we use a socketpair by process, and this
socketpair is also used to accept new connections with the
listener_accept() callback.
This accept callback can't handle the EOF and the exit of the process,
because it's very specific to the master worker. This is why we
transformed the mworker_pipe_handler() function into a wrapper which
checks if there is an EOF and exits the process, and if not calls
listener_accept() to perform the accept.
The former behavior was to exit() the master process with the latest
status code known, which was the one of the last process to exit.
The problem is that the master process was not exiting with the status
code which provoked the exit-on-failure.
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect that some peers cannot be reached for example. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
When an haproxy process doesn't stop after a reload, it's because it
still has some active "jobs", which mainly are active sessions, listeners,
peers or other specific activities. Sometimes it's difficult to troubleshoot
the cause of these issues (which generally are the result of a bug) only
because some indicators are missing.
This patch adds the number of listeners, the number of jobs, and the stopping
status to the output of "show info". This way it becomes a bit easier to try
to narrow down the cause of such an issue should it happen. A typical use
case is to connect to the CLI before reloading, then issuing the "show info"
command to see what happens. In the normal situation, stopping should equal
1, jobs should equal 1 (meaning only the CLI is still active) and listeners
should equal zero.
The patch is so trivial that it could make sense to backport it to 1.8 in
order to help with troubleshooting.
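For example, after a reload with only the CLI still connected, "show
info" would report something like this (illustrative values):

    Stopping: 1
    Jobs: 1
    Listeners: 0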
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
- sessions, used to more accurately report timeouts in logs when seeing
TASK_WOKEN_TIMEOUT ;
- peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL
This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).
No backport is needed.
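For reference, with the new API a task callback has roughly this shape
(a sketch; argument names are illustrative):

    struct task *process_cb(struct task *t, void *context,
                            unsigned short state);

so the wakeup cause must now be taken from the <state> argument instead
of task->state.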
This one causes some events to be lost. It has already been tested in
an experimental branch but was not merged until being certain it was
needed. Fred figured that requesting /?k=1&s=447392 from httpterm through
haproxy-master was enough to stall the transfer.
No backport is needed, this only affects 1.9-dev5.
It was reported here that authentication may fail when threads are
enabled :
https://bugzilla.redhat.com/show_bug.cgi?id=1643941
While I couldn't reproduce the issue, it's obvious that there is a
problem with the use of the non-reentrant crypt() function there.
On Linux systems there's crypt_r() but not on the vast majority of
other ones. Thus a first approach consists in placing a lock around
this crypt() call. Another patch may relax it when crypt_r() is
available.
This fix must be backported to 1.8. Thanks to Ryan O'Hara for the
quick notification.
A bug occurs when the CLI proxy of the master receives a command which
is prefixed by some spaces but has no routing prefix (@).
In this case pcli_parse_request() was returning a wrong amount of
data to forward.
The response analyzer was called twice and the prompt displayed twice.
Commit 85b73e9 ("BUG/MEDIUM: stream: Make sure polling is right on retry.")
introduced a possible null dereference on the error path detected by gcc-7.
Let's simply assign srv_conn after checking the error and not before.
No backport is needed.
When the "path" sample fetch function is called without any path, the
function doesn't check that the request buffer is allocated. While this
doesn't happen with the request during processing, it can definitely
happen when mistakenly trying to reference a path from the response
since the request channel is not allocated anymore.
It's certain that this bug was emphasized by the buffer changes that
went in 1.9 and the HTTP refactoring, but at first glance, 1.8 doesn't
seem 100% safe either so it's possible that older version are affected
as well.
Thanks to PiBa-NL for reporting this bug with a reproducer.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.
During the storage of new HTTP objects we memorize the ->eoh value and
the value of the "Age" header coming from the origin server.
This information may then be reused to return the cached HTTP objects
with a new "Age" header.
May be backported to 1.8.
This patch implements analysers for parsing the CLI and extra features
for the master's CLI.
For each command (sent alone, or separated by ; or \n) the request
analyser will determine to which server it should send the request.
The 'mode cli' proxy is able to parse a prefix for each command which is
used to select the appropriate server. The prefix starts with @ and is
followed by "master", the PID preceded by ! or the relative PID (e.g.
@master, @1, @!1234). The servers are not round-robined anymore.
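An illustrative session (the socket path and commands are assumptions):

    $ socat readline /var/run/haproxy-master.sock
    @1 show info         # routed to the worker with relative PID 1
    @!1234 show info     # routed to the worker with PID 1234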
The command is sent with a SHUTW which forces the server to close the
connection after sending its response. However the proxy allows a
keepalive connection on the client side and does not close it.
The response analyser does not do much: it only reinits the
connection when it receives a close from the server, and forwards the
response. It does not analyze the response data.
The only guarantee of the end of the response is the close by the
server, since we can't rely on the double \n which is not sent by every
command.
This could be reimplemented later as a filter.
Add a struct server pointer in the mworker_proc struct so we can easily
use it as a target for the mworker proxy.
pcli_prefix_to_pid() is used to find the right PID of the worker
when using a prefix in the CLI. (@master, @#<relative pid> , @<pid>)
pcli_pid_to_server() is used to find the right target server for the
CLI proxy.
The master process does not need all the keywords of the cli, so add 2
flags to choose which keywords to use.
It might be useful to activate some of them in a debug mode later...
This patch introduces mworker_cli_proxy_new_listener() which allows the
creation of new listeners for the CLI proxy.
Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. It is allowed to create
multiple listeners with several -Sa.
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.
In the current state of the code, the proxy is only doing round robin on
the CLI of the workers. A CLI mode will be needed to know to which CLI
to send the requests.
The init code of the mworker_proc structs has been moved before the
init of the listeners.
Each socketpair is now connected to a CLI within the workers, which
allows the master to access their CLI.
The inherited flag of the worker side socketpair is removed so the
socket can be closed in the master.
There's a call there to si_cs_send() while we're supposed to come from
si_cs_io_cb() which has just done it. But in fact we can also come here
as a lower layer callback from ->wake() after a connection is established.
Since most of the time we'll end up here with either no data in the buffer
or a blocked output, let's simply check if we're already subscribed to send
events before calling si_cs_send().
stream_int_notify() is I/O agnostic and should not wake up the tasklet,
it's up to si_cs_process() to do that, just like si_applet_wake_cb()
does it for the applet.
This one was added by commit 53216e7db ("MEDIUM: connections: Don't
directly mess with the polling from the upper layers.") after the
removal of the conditional cs_want_send() call. But after analysis
it turned out that it's not needed since the si_cs_send() call will
either succeed or subscribe.
Calling si_cs_send() alone is always dangerous because it can result
in the loss of an event if it manages to empty the buffer. Indeed, in
this case it's critical to call si_chk_rcv() on the opposite stream-int.
Given that si_cs_process() takes care of all this, let's call it instead.
All this code could possibly be refined soon to avoid redoing the whole
stream_int_notify() and do it only after a send(), but at the moment it's
not important.
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.
This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
The rationale here is that we should never need to try to send() at the
beginning of process_stream() because :
- if something was pending, it's very unlikely that it was unblocked
and not sent just between the last poll() and the wakeup instant.
- if something pending was recently sent, then we don't have anything
to send anymore.
So at first glance it doesn't seem like there could be any valid case
where trying to send before entering the function brings any benefit.
If a buffer allocation failed, we have SI_FL_WAIT_ROOM set and c_size(buf)
being zero. It's the only moment where we have a new opportunity to try to
allocate this buffer. However we don't want to waste our time trying this
if both are non-null since it indicates missing room without any changed
condition.
Well that's only 3 places (applet.c, stream_interface.c, hlua.c). This
ensures we always clear SI_FL_WAIT_ROOM before setting it on failure,
so that it is granted that SI_FL_WAIT_ROOM always indicates a lack of
room for doing an operation, including the inability to allocate a
buffer for this.
The vars_prune() and vars_init() functions involve locking while most of
the time there is no variable at all in streams nor sessions. Let's check
for emptiness before calling these functions. Simply doing this has
increased the multithreaded performance from 1.5 to 5% depending on the
workload.
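The check is cheap; a sketch of the idea (the structure mirrors
haproxy's circular list, names are illustrative):

    struct list { struct list *n, *p; };

    /* a circular list is empty when its head points to itself; testing
     * this costs nothing compared to taking the variables lock.
     */
    static inline int vars_list_empty(const struct list *head)
    {
        return head->n == head;
    }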
While "option prefer-last-server" only applies to non-deterministic load
balancing algorithms, 401/407 responses actually caused haproxy to prefer
the last server unconditionally.
As this breaks deterministic load balancing algorithms like uri, this
patch applies the same condition here.
Should be backported to 1.8 (together with "BUG/MINOR: only mark
connections private if NTLM is detected").
Instead of marking all connections that see a 401/407 response private
(for connection reuse), this patch detects a RFC4559/NTLM authentication
scheme and restricts the private setting to those connections.
This is so we can reuse connections with 401/407 responses with
deterministic load balancing algorithms later (which requires another fix).
This fixes the problem reported here by Elliot Barlas :
https://discourse.haproxy.org/t/unable-to-configure-load-balancing-per-request-over-persistent-connection/3144
Should be backported to 1.8.
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT in commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.
Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).
This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues, especially
when we compare it with other unsigned values (content length).
With this patch we check that shctx_init() does not return 0.
This is possible if the maxblocks argument, which is passed as an
int, is negative due to an implicit conversion.
Must be backported to 1.8.
With this patch we support cache sizes larger than 2047 MB and prevent haproxy
from crashing when "total-max-size" is parsed as a negative value by atoi().
The limit at parsing time is 4095 MB (UINT_MAX >> 20).
May be backported to 1.8.
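The parsing may be sketched as follows (illustrative, not the actual
parser):

    #include <limits.h>
    #include <stdlib.h>

    /* sizes are given in MB and shifted left by 20 to get bytes, hence
     * the UINT_MAX >> 20 (4095) upper bound.
     */
    static int parse_cache_size_mb(const char *arg, unsigned int *mb)
    {
        char *end;
        unsigned long v = strtoul(arg, &end, 10);

        if (*end || v == 0 || v > (UINT_MAX >> 20))
            return -1;
        *mb = (unsigned int)v;
        return 0;
    }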
This patch adds "max-object-size" option to the cache to limit
the size in bytes of the HTTP objects to be cached. When not provided,
the maximum size of an HTTP object is a 256th of the cache size.
This patch makes the cache capable of storing HTTP objects larger than a
buffer. It makes use of the new "block by block shared object allocation"
shctx API.
A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.
This patch makes shctx capable of storing objects in several parts,
each part being made of several blocks. There is no longer any need to
walk through a row until reaching its end to append new blocks.
A new member pointing to a struct shared_block, named "last_reserved",
has been added to struct shared_block so as to memorize the last block which
was reserved by shctx_row_reserve_hot(). The same applies to the
"last_append" pointer, which is used to memorize the last block used by
shctx_row_data_append() to store the data.
Fred reported a random crash related to the pools. This was introduced
by commit e18db9e98 ("MEDIUM: pools: implement a thread-local cache for
pool entries") because the minimum pool item size should have been
increased to 32 bytes to accommodate the 2 double-linked lists.
No backport is needed.
This option makes a proxy use only HTX-compatible muxes instead of the
HTTP-compatible ones for HTTP modes. It must be set on both ends, this
is checked at parsing time.
With the previous connection model, when we purposely decided to stop
receiving in order to avoid polling after a complete request was received
for example, it was needed to set SI_FL_WAIT_ROOM to prevent receive
polling from being re-armed. Now with the new subscription-based model
there is no such thing anymore and there is no one to remove this flag
either. Thus if a request takes more than one packet to come in or
spans over too many packets, this flag will cause it to wait forever.
Let's simply remove this flag now.
This patch should not be backported since older versions still need
that this flag is set here to stop receiving.
We wake up all the streams waiting to send data when we have space available
in the mux buffer. Doing so means we probably wake way too many streams,
because after a few the buffer will probably be full instead. So keep a
list of all the streams that are about to send data, and if we detect that
the buffer is full, unschedule the tasks and put the streams back to the
send_list.
Make sure we call tasklet_free() only after si_release_endpoint(), when the
unsubscribe() method has been called, so that we're sure the mux won't
attempt to access the tasklet.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and deactivated once we got the event
and we woke the related task.
In h2_recv(), return 1 if we have data available, or if h2_recv_allowed()
failed, to be sure h2_process() is called.
Also don't subscribe if our buffer is full.
When retrying to connect to a server, because the previous connection failed,
make sure if we subscribed to the previous connection, the polling flags will
be true for the new fd.
No backport is needed.
When we're closing a stream, if there's no stream left and a goaway was sent,
close the connection, there's no reason to keep it open.
[wt: it's likely that this is needed in 1.8 as well, though it's unclear
how to trigger this issue, some tests are needed]
Similarly to what's been done in 7a6ad88b02, take into account that
free_list is a void **, and so use a void ** too when attempting to
do a CAS.
The purpose is to detect if threads or processes are competing for the
same CPU. This can happen when threads are incorrectly bound, or after a
reload if the previous process still has an important activity. With
threads this situation is problematic because a preempted thread holding
a lock will block other ones waiting for this lock to be released.
A first attempt consisted in measuring the cumulated lost time more
precisely but the system's scheduler is smart enough to try to limit the
thread preemption rate by mostly context switching during poll()'s blank
periods, so most of the time lost is not seen. In essence this is good
because it means a thread is not preempted with a lock held, and even
regarding the rendez-vous point it cannot prevent the other ones from
making progress. But still it happens tens to hundreds of times per
second that a thread might be preempted, so it's still possible to detect
that the situation is happening, thus it's interesting to measure and
report its frequency.
Each time we enter the poller, we check the CPU time spent working and
see if we've lost time doing something else. To limit false positives,
we're only interested in losses of 500 microseconds or more (i.e. half
a clock tick on a 1 kHz system). If so, it indicates that some time was
stolen by another thread or process. Note that we purposely store some
sub-millisecond counters so that under heavy traffic with a 1 kHz clock,
it's still possible to measure something without being subject to the
risk of rounding errors (i.e. if exactly 1 ms is stolen it's possible
that the time difference could often be slightly lower).
This counter of lost CPU time slots is reported in "show activity"
in numbers of milliseconds of CPU lost per second, per 15s, and total
over the process' life. By definition, the per-second counter cannot
report values larger than 1000 per thread per second and the 15s one
will be limited to 15000/s in the worst case, but it's possible that
peak values exceed such thresholds after long pauses.
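The measure may be sketched like this (illustrative helper, not the
actual code):

    #include <time.h>

    /* CPU time consumed by the calling thread; comparing its
     * progression with the wall-clock time spent around poll() reveals
     * time stolen by other threads or processes.
     */
    static inline unsigned long long thread_cpu_time_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }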
The calls to HA_ATOMIC_CAS() on the lockfree version of the pool allocator
were mistakenly done on (void*) for the old value instead of (void **).
While this has no impact on "recent" gcc, it does have one for gcc < 4.7
since the CAS was open coded and it's not possible to assign a temporary
variable of type "void".
No backport is needed, this only affects 1.9.
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work arond a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
There are as many ways to build the globalfilepathlen variable as branches
in the if/then/else, creating lots of confusion. Address the most obvious
parts, but some polishing definitely is still needed.
These ones are on error paths that are properly handled by luaL_error()
which does a longjmp() but the compiler cannot know it. By adding an
__unreachable() statement in WILL_LJMP(), there is no ambiguity anymore.
This may be backported to 1.8 but these previous patches are needed first :
- BUILD: compiler: add a new statement "__unreachable()"
- MINOR: lua: all functions calling lua_yieldk() may return
- BUILD: lua: silence some compiler warnings about potential null derefs (#2)
There was a mistake when tagging functions which always use longjmp and
those which may use it in that all those supposed to call lua_yieldk()
may return without calling longjmp. Thus they must not use WILL_LJMP()
but MAY_LJMP(). It has zero impact on the code emitted as such, but
prevents other fixes from being properly implemented : this was the
cause of the previous failure with the __unreachable() calls.
This may be backported to older versions. It may or may not apply
well depending on the context, though the change simply consists in
replacing "WILL_LJMP(hlua_yieldk" with "MAY_LJMP(hlua_yieldk", and
same with the single call to lua_yieldk() in hlua_yieldk().
Here we make sure that appctx is always taken from the unchecked value
since we know it's an appctx, which explains why it's immediately
dereferenced. A missing test was added to ensure that task_new() does
not return a NULL.
This may be backported to 1.8.
This reverts commit f1ffb39b61.
It breaks Lua causing some timeouts. Removing the __unreachable() statement
from WILL_LJMP() fixes it. It's very strange and unclear whether it's an
issue with WILL_LJMP() not fulfilling its promise of not returning, if
the code emitted with __unreachable() gets broken, or anything else. Let's
revert this for now.
There is a bug in this function used to release other threads. It leaves
the current thread marked as harmless. If after this another thread does
a thread_isolate(), but before the first one reaches poll(), the second
thread will believe it's alone while it's not.
This must be backported to 1.8 since the rendez-vous point was merged
into 1.8.14.
Each thread now keeps the last ~512 kB of freed objects into a local
cache. There are some heuristics involved so that a specific pool cannot
use more than 1/8 of the total cache in number of objects. Tests have
shown that 512 kB is an optimal size on a 24-thread test running on a
dual-socket machine, resulting in an overall 7.5% performance increase
and a cache miss ratio reducing from 19.2 to 17.7%. Anyway it seems
pointless to keep more than an L2 cache, which probably explains why
sizes between 256 and 512 kB are optimal.
Cached objects appear in two lists, one per pool and one LRU to help
with fair eviction. Currently there is no way to check each thread's
cache state nor to flush it. This cache cannot be disabled and is
enabled as soon as the lockless pools are enabled (i.e.: threads are
enabled, no pool debugging is in use and the CPU supports a double word
CAS).
For caching it will be convenient to have indexes associated with pools,
without having to dereference the pool itself. One solution could consist
in replacing all pool pointers with integers but this would limit the
number of allocatable pools. Instead here we allocate the 32 first pools
from a pre-allocated array whose base address is known so that it's trivial
to convert a pool to an index in this array. Pools that cannot fit there
will be allocated normally.
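The conversion to an index is then trivial pointer arithmetic, along
these lines (a sketch with made-up names):

    #include <sys/types.h>

    #define MAX_BASE_POOLS 32

    struct pool_head { int dummy; /* fields omitted in this sketch */ };

    static struct pool_head pool_base[MAX_BASE_POOLS];

    /* a pool taken from the static array gets its index from pointer
     * arithmetic; any other pool reports -1 (no index).
     */
    static inline ssize_t pool_get_index(const struct pool_head *pool)
    {
        size_t idx = (size_t)(pool - pool_base);
        return (idx < MAX_BASE_POOLS) ? (ssize_t)idx : -1;
    }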
Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.
It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.
Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
The vast majority of FDs are only seen by one thread. Currently the lock
on FDs costs a lot because it's touched often, though there should be very
little contention. This patch ensures that the lock is only grabbed if the
FD is shared by more than one thread, since otherwise the situation is safe.
Doing so resulted in a 15% performance boost on a 12-threads test.
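The "shared" test is a simple bit trick on the thread mask, e.g.
(illustrative):

    /* true when at least two bits are set in the mask, i.e. the FD is
     * used by more than one thread and the lock is actually needed.
     */
    static inline int fd_is_shared(unsigned long thread_mask)
    {
        return (thread_mask & (thread_mask - 1)) != 0;
    }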
peers_init_sync() doesn't check task_new()'s return value and doesn't
return any result to indicate success or failure. Let's make it return
an int and check it from the caller.
This can be backported as far as 1.6.
Gcc reports a potential null-deref error in the stick-table init code.
While not critical there, it's trivial to fix. This check has been
missing since 1.4 so this fix can be backported to all supported versions.
This null-deref cannot happen either as there necessarily is a listener
where this function is called. Let's use __objt_listener() to address
this.
This may be backported to 1.8.
Gcc 6.4 detects a potential null-deref warning in smp_fetch_ssl_fc_cl_str().
This one is not real since already addressed a few lines above. Let's use
__objt_conn() instead of objt_conn() to avoid the extra test that confuses
it.
This could be backported to 1.8.
These ones are on error paths that are properly handled by luaL_error()
which does a longjmp() but the compiler cannot know it. By adding an
__unreachable() statement in WILL_LJMP(), there is no ambiguity anymore.
This may be backported to 1.8 but the previous patch (BUILD: compiler:
add a new statement "__unreachable()") is needed for this.
In case pool_alloc() fails in stream_new(), we try to detach the stream
from the list before it has been added, dereferencing a NULL. In order
to fix it, simply move the LIST_DEL call upwards.
This must be backported to 1.8.
The listeners with the LI_O_INHERITED flag were deleted but not unbound,
which is a problem since we have polling in the master.
This patch unbinds every listener which is not required for the master,
but does not close the FD of those that have the LI_O_INHERITED flag.
We will need to know if a mux was created for a front or a back
connection and once it's established it's much harder, so let's
introduce H2_CF_IS_BACK for this.
For backend connections we'll have to initialize streams but not allocate
conn_streams since they'll already be there. Thus this patch splits the
h2c_stream_new() function into one dedicated to allocation of a new stream
and another one supposed to attach this stream to an existing frontend
connection.
Till now in order to figure the timeouts, we used to retrieve the proxy
from the session's owner, but the new API provides it so it's better to
simply take it from the caller at init time. We take this opportunity to
store the pointer to the proxy into the h2 connection so that we can
reuse it later when needed.
The init function was split into the mux init and the front init, but it
appears that most of the code will be common between the two sides when
implementing the backend init. Thus let's simply make this a unique
h2_init() function.
h2_snd_buf() must not accept to send data if the preface was not yet
received nor sent. At the moment it doesn't happen but it can with
server-side H2.
At a few places we check these states to detect if a stream has valid
data/errcode or is one of the two dummy streams (idle or closed). It
will become problematic for outgoing streams as it will not be possible
to report errors for example since the stream will switch from IDLE
state only after sending a HEADERS frame.
There is a safer solution consisting in checking the stream ID, which
may only be zero in the dummy streams. This patch changes the test to
only rely on the stream ID.
At many places in muxes we'll have to add tests to check if the
connection is front or back before deciding to log. Instead let's
centralize this test in sess_log() to simply do nothing when sess=NULL.
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoids some boring
filtering in the mux H1.
Instead of using offsets relative to the parsed buffer to store start line
infos, we now use indirect strings. So now, these infos remain valid only if
the origin buffer remains untouched. But it's not a real problem because this
union is used during the parsing and never stored for later use.
When headers parsing ends, a pseudo header with an empty name and an empty
value is added to the array of parsed headers to mark its end. It is
convenient to loop on this array, but not really useful if we want to remove
the last header or add a new one, because we don't really know where the last
CRLF is (the empty line ending the headers block). So now, the name of this
pseudo header points to this last CRLF instead. Its length is still 0 and its
value is still empty, so loops on the array remain unchanged.
Since keep-alive mode is the default mode, the passive close has disappeared,
and in the code, httpclose and forceclose options are handled the same way:
connections with the client and the server are closed as soon as the request and
the response are received, and a missing "Connection: close" header is added in
each direction.
So to make things clearer, forceclose is now an alias for httpclose. And
httpclose is explicitly an active close. So the old passive close does not exist
anymore. Internally, the flag PR_O_HTTP_PCL has been removed and PR_O_HTTP_FCL
has been replaced by PR_O_HTTP_CLO. In HTTP analyzers, the checks done to find
the right mode to use, depending on proxies options and "Connection: " header
value, have been simplified.
This should only be a cleanup and no changes are expected.
This option is frontends specific, so there is no reason to support it on
backends. So now, it is ignored if it is set on a backend and a warning is
emitted during the startup. The change is quite trivial, but the commit is
tagged as MEDIUM because it is a small breakage with previous versions and
configurations using this option could emit a warning now.
This option is backends specific, so there is no reason to support it on
frontends. So now, it is ignored if it is set on a frontend and a warning is
emitted during the startup. The change is quite trivial, but the commit is
tagged as MEDIUM because it is a small breakage with previous versions and
configurations using this option could emit a warning now.
To ease the refactoring, the function "http_header_add_tail" has been
removed. Now, "http_header_add_tail2" is always used. And the function
"capture_headers" has been renamed to "http_capture_headers". Finally, some
functions have been exported.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will be never called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.
This is 1.9-specific and shouldn't be backported.
This bug appeared only if nbthread > 1. Handling the pipe with the
master, multiple threads of the same worker could process the deinit().
In addition, deinit() was called while some other threads were still
performing some tasks.
This patch assigns the handler of the pipe with the master to only the first
thread and removes the call to deinit() before exiting with an error.
This patch should be backported in v1.8.
If we can't send data for a stream because of its flow control, make sure
not to put it in the send_list, until the flow control lets it send again.
This is specific to 1.9, and should not be backported.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
As we don't know how subscriptions are handled, we can't just assume we can
use LIST_DEL() to unsubscribe, so introduce a new method to mux and connections
to do so.
CurSslConns inc/dec operations are not thread-safe. The unsigned
CurSslConns counter can wrap to a negative value, so we could notice
connection rejects because the MaxSslConns limit was artificially
exceeded.
CumSslConns inc operations are also not thread-safe, so we could miss
some connections and show inconsistent values compared to CumConns.
This fix should be backported to 1.8.
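The shape of such a fix is to make the updates atomic, for instance with
HAProxy's HA_ATOMIC_* macros (a sketch; the variable names are
illustrative):
    /* on SSL connection creation */
    HA_ATOMIC_ADD(&sslconns, 1);
    HA_ATOMIC_ADD(&totalsslconns, 1);
    /* on release, instead of a racy sslconns-- */
    HA_ATOMIC_SUB(&sslconns, 1);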
The run queue is designed to perform a single tree lookup and to
use multiple passes to eb32sc_next(). The scheduler rework took a
conservative approach first but this is not needed anymore and it
increases the processing cost of process_runnable_tasks() and even
the time during which the RQ lock is held if the global queue is
heavily loaded. Let's simply move the initial lookup to the entry
of the loop like the previous scheduler used to do. This has reduced
by a factor of 5.5 the number of calls to eb32sc_lookup_get() there.
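Schematically, the loop entry now performs the single lookup itself (an
illustrative sketch, not the exact scheduler code):
    /* one tree lookup, then walk the run queue with eb32sc_next() */
    node = eb32sc_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK, tid_bit);
    while (node && max_processed--) {
        struct task *t = eb32sc_entry(node, struct task, rq);
        node = eb32sc_next(node, tid_bit);
        /* detach <t> from the run queue and execute it */
    }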
In the past this conditional contained multiple conditions, which is why
the additional parentheses were needed. The conditional was simplified but
the duplicate parentheses were not cleaned up.
OpenSSL released support for TLSv1.3. It also added a separate function
SSL_CTX_set_ciphersuites that is used to set the ciphers used in the
TLS 1.3 handshake. This change adds support for that new configuration
option by adding a ciphersuites configuration variable that works
essentially the same as the existing ciphers setting.
Note that it should likely be backported to 1.8 in order to ease usage
of the now released openssl-1.1.1.
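For instance, the new setting maps onto the new OpenSSL call roughly
like this (SSL_CTX_set_ciphersuites() is the real OpenSSL 1.1.1 API; the
surrounding names are illustrative):
    #if (OPENSSL_VERSION_NUMBER >= 0x10101000L)
        /* TLS 1.3 suites are configured separately from "ciphers" */
        if (conf_ciphersuites &&
            !SSL_CTX_set_ciphersuites(ctx, conf_ciphersuites))
            goto err; /* invalid suite in the colon-separated list */
    #endif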
In ci_insert_line2() and b_rep_blk(), we can't afford to wrap, so don't use
b_tail() to check if we do, use __b_tail() instead.
This should be backported to previous versions.
For generate-certificates, X509V3_EXT_conf is used but it's an old API
call: X509V3_EXT_nconf must be preferred. OpenSSL compatibility is fine
because it's inside #ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME, introduced 5
years after X509V3_EXT_nconf.
Commit 8ae735da0 ("MEDIUM: mux_h2: Revamp the send path when blocking.")
added a tasklet allocation in h2_stream_new(), however the error exit path
fails to reset h2s in case the tasklet cannot be allocated, resulting in
the h2s pointer being returned as valid to the caller. Let's readjust the
exit path to always return NULL on error and to always log as well (since
there is no reason for not logging on such important errors).
No backport is needed, this is strictly 1.9-dev.
Since commit 7505f94f9 ("MEDIUM: h2: Don't use a wake() method anymore."),
the H2 mux's init() calls h2_process(). But this last one may detect an
early error and call h2_release(), destroying the connection, and return
-1. At this point we're screwed because the caller will still dereference
the connection for various things ranging from the configuration of the
proxy protocol header to the retries. We could simply return -1 here upon
failure but that's not enough since the stream layer really needs to keep
its connection structure allocated (to clean it up in session_kill_embryonic
or for example because it holds the destination address to reconnect to
when the connection goes to the backend). Thus the correct solution here is
to only schedule a wakeup of the I/O callback so that the init succeeds,
and that the connection is only handled later.
No backport is needed, this is 1.9-specific.
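The end of init() thus becomes, schematically (field names are
assumptions based on the description above):
    /* do not call h2_process() from h2_init() anymore: just schedule
     * the I/O callback so the connection is handled on next wakeup */
    tasklet_wakeup(h2c->wait_event.task);
    return 0;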
The return value from conn_install_mux() was not checked, so if an
inconsistency happens in the code, or a memory allocation fails while
initializing the mux, we can crash while using an uninitialized mux.
In practice the code inconsistency does not really happen since we
cannot configure such a situation, except during development, but
the out of memory condition could definitely happen.
This should be backported to 1.8 (the code is a bit different there,
there are two calls to conn_install_mux()).
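The shape of the fix (a sketch; the exact arguments and cleanup path
differ between versions):
    if (conn_install_mux(conn, mux_ops, ctx) < 0) {
        /* the mux could not be initialized: fail the attempt rather
         * than running with an uninitialized mux */
        conn_full_close(conn);
        return SF_ERR_INTERNAL;
    }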
The prototypes of functions find_hdr_value_end(), extract_cookie_value()
and http_header_match2() were still in proto_http.h while some of them
don't exist anymore and the others were just moved. Let's remove them.
In addition, da.c was updated to use http_extract_cookie_value() which
is the correct one.
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions' prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
The current proto_http.c file is huge and contains different processing
domains making it very difficult to work on an alternative representation.
This commit moves some parts to other files :
- ACL registration code => http_acl.c
This code only creates some ACL mappings and doesn't know anything
about HTTP nor about the representation. This code could even have
moved to acl.c but it was not worth polluting it again.
- HTTP sample conversion => http_conv.c
This code doesn't depend on the internal representation but definitely
manipulates some HTTP elements, such as dates. It also has access to
captures.
- HTTP sample fetching => http_fetch.c
This code does depend entirely on the internal representation but is
totally independent of the analysers. Placing it into a different
file will ease the transition to the new representation and the
creation of a wrapper if required. An include file was created due
to CHECK_HTTP_MESSAGE_FIRST() being used at various places.
- HTTP action registration => http_act.c
This code doesn't directly interact with the messages nor the
transaction but it does so via some exported http functions like
http_replace_req_line() or http_set_status() so it will be easier
to change only this after the conversion.
- a few very generic parts were found and moved to http.{c,h} as
relevant.
It is worth noting that the functions moved to these new files are not
referenced anywhere outside of the files and are only called as registered
callbacks, so these files do not even require associated include files.
Found by Coverity.
[wt: this bug was introduced by commit 404d978 ("MINOR: add ALPN
information to send-proxy-v2"). It might be triggered by a health
check on a server using ppv2 or by an applet making use of such a
server, if at all configurable].
This needs to be backported to 1.8.
This adds support for accessing stick tables from Lua. The supported
operations are reading general table info, lookup by string/IP key, and
dumping the table.
Similar to "show table", a data filter is available during dump, and as
an improvement over "show table" it's possible to use up to 4 filter
expressions instead of just one (with implicit AND clause binding the
expressions). Dumping with/without filters can take a long time for
large tables, and should be used sparingly.
At the end of process_stream(), we wake the task if there's something in
the input buffer, after attempting a recv. However this is wrong, and we should
only do so if we received new data. Just check the CF_READ_PARTIAL flag.
This is 1.9-specific and should not be backported.
In process_stream(), use si_cs_send()/si_cs_recv() instead of
si_cs_io_cb(), as si_cs_io_cb() may lead to process_stream() being woken
up when it shouldn't be, and thus the timeout would never get triggered.
With recent modifications on the buffers API, when a buffer is released (calling
b_free), we replace it by BUF_NULL where the area pointer is NULL. So many
operations, like b_peek, must be avoided on a released or not allocated
buffer. These changes were mainly made in the commit c9fa048 ("MAJOR: buffer:
finalize buffer detachment").
Since this commit, HAProxy can crash during the body parsing of chunked HTTP
messages because there is no check on the channel's buffer in HTTP analyzers
(http_request_forward_body and http_response_forward_body) nor in H1 functions
responsible for parsing chunked content (h1_skip_chunk_crlf & co). If a stream is
woken up after all input data were forwarded, its input channel's buffer is
released (so set to BUF_NULL). In this case, if we resume the parsing of a
chunk, HAProxy crashes.
To fix this issue, we just skip the parsing of chunks if there is no input
data for the corresponding channel. This is only done if the message state is
strictly lower than HTTP_MSG_ENDING.
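The guard amounts to something like this (a sketch using the 1.9 channel
API):
    /* never resume chunk parsing on a released (BUF_NULL) buffer */
    if (msg->msg_state < HTTP_MSG_ENDING && !ci_data(chn))
        goto missing_data_or_waiting;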
Tim Düsterhus found using afl-fuzz that some parts of the HPACK decoder
use incorrect bounds checking which does not catch negative values after
a type cast. The first culprit is hpack_valid_idx() which takes a signed
int and is fed with an unsigned one, but a few others are affected as
well due to being designed to work with an uint16_t as in the table
header, thus not being able to detect the high offset bits, though they
are not exposed if hpack_valid_idx() is fixed.
The impact is that the HPACK decoder can be crashed by an out-of-bounds
read. The only work-around without this patch is to disable H2 in the
configuration.
CVE-2018-14645 was assigned to this bug.
This patch addresses all of these issues at once. It must be backported
to 1.8.
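The heart of the fix is to validate the index as a full-width unsigned
value, along these lines (a sketch; the constant and field names are
assumptions):
    /* values that previously went negative through an int/uint16_t
     * cast are now caught by the unsigned comparison */
    static inline int hpack_valid_idx(const struct hpack_dht *dht, uint32_t idx)
    {
        return idx < dht->used + HPACK_SHT_SIZE;
    }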
An invalid null-deref warning is emitted because cmsg is not checked. It
definitely is valid given the test performed 10 lines above, but the
compiler cannot necessarily guess this. Adding a null test to the
problematic condition is enough to get rid of the warning and cheap enough.
Like for the other checks, the type is being tested just before calling
objt_{server,dns_srvrq}() so let's use the unguarded version instead to
silence the warning.
When building with -Wnull-dereferences, gcc sees some cases where a
pointer is dereferenced after a check may set it to null. While all of
these are already guarded by either a preliminary test or the code's
construction (eg: listeners code being called only on listeners), it
cannot be blamed for not "seeing" this, so better use the unguarded
calls everywhere this happens, particularly after checks. This is a
step towards building with -Wextra.
Theoretically nothing would prevent a front applet from connecting to a stats
socket, and if a "getsock" command was issued, it would cause a crash. Right
now nothing in the code does this, so in its current form there is no impact.
It may or may not be backported to 1.8.
In h1_headers_to_hdr_list, when an incomplete message is parsed, all updates
must be skipped until the end of the message is found. Then the parsing is
restarted from the beginning. But not all updates were skipped, leading to
invalid rewriting or a segfault.
No backport is needed.
A null pointer assignment was missing after free() in function
pat_ref_reload(), which can lead to a segfault.
This bug was introduced in commit b5997f7 ("MAJOR: threads/map: Make
acls/maps thread safe").
Must be backported to 1.8.
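The fix follows the usual pattern (a sketch with an illustrative field
name):
    free(ref->display);
    ref->display = NULL; /* without this reset, a later access to the
                          * stale pointer segfaults */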
Just like we used to do in proto_http, we now check that each and every
occurrence of the content-length header field and each of its values are
exactly identical, and we normalize the header to return the last value
of the first header with spaces trimmed.
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.
For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.
Last, if such a header is found, we delete all content-length header
fields found in the message.
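Schematically the new handling looks like this (H1_MF_XFER_ENC is the
flag introduced here; the other names follow the usual h1m conventions
but are assumptions in this sketch):
    if (isteqi(name, ist("transfer-encoding"))) {
        h1m->flags |= H1_MF_XFER_ENC;
        /* presence alone cancels any content-length seen so far */
        h1m->flags &= ~H1_MF_CLEN;
        h1m->curr_len = h1m->body_len = 0;
        /* the dedicated parsing function also checks that the token
         * list ends with "chunked" and sets H1_MF_CHNK accordingly */
    }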
The new function h1_parse_connection_header() is called when facing a
connection header in the generic parser, and it will set up to 3 bits
in h1m->flags indicating whether at least one "close", "keep-alive" or
"upgrade" token was seen.
This will be needed for the mux to know how to process the Connection
header, and will save it from having to re-parse the request line since
it's captured on the fly.
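Its behaviour can be sketched as follows (the three flag names are
assumptions derived from the description above; strncasecmp() requires
<strings.h>):
    void h1_parse_connection_header(struct h1m *h1m, struct ist value)
    {
        const char *p = value.ptr, *end = value.ptr + value.len;
        while (p < end) {
            const char *tok;
            while (p < end && (*p == ' ' || *p == ','))
                p++;                          /* skip separators */
            tok = p;
            while (p < end && *p != ' ' && *p != ',')
                p++;                          /* delimit the token */
            if (p - tok == 5 && strncasecmp(tok, "close", 5) == 0)
                h1m->flags |= H1_MF_CONN_CLO;
            else if (p - tok == 10 && strncasecmp(tok, "keep-alive", 10) == 0)
                h1m->flags |= H1_MF_CONN_KAL;
            else if (p - tok == 7 && strncasecmp(tok, "upgrade", 7) == 0)
                h1m->flags |= H1_MF_CONN_UPG;
        }
    }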
While it was possible to consider the status before parsing response
headers, it's wrong to do it for request headers and could lead to
random behaviours due to this status matching other fields instead.
Additionally there is little to no value in doing this for each and
every new header field. It's much better to reset the content-length
at once in the caller when seeing such statuses (which currently is only
the H2 mux).
No backport is needed, this is purely 1.9.
The h2 parser has this specificity that if it cannot send the headers
frame resulting from the headers it just parsed, it needs to drop it
and parse it again later. Since commit 8852850 ("MEDIUM: h1: let the
caller pass the initial parser's state"), when this happens the parser
remains in the data state and the headers are not parsed again next
time, resulting in a parse error. Let's reset the parser on exit there.
No backport is needed.
If we're detaching the conn_stream, and it was subscribed to be woken up
when more data is available to receive, unsubscribe it.
No backport is needed.
Empty both send_list and fctl_list when destroying the h2 context, so that
if we're freeing the stream after, it doesn't try to remove itself from the
now-deleted list.
No backport is needed.
Till now it was very difficult for a mux to know what proxy it was
working for. Let's pass the proxy when the mux is instantiated at
init() time. It's not yet used but the H1 mux will definitely need
it, just like the H2 mux when dealing with backend connections.
The h1 parser used to systematically turn header field names to lower
case because it was designed for H2. Let's add a flag which is off by
default to condition this behaviour so that when using it from an H1
parser it will not affect the message.
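The resulting behaviour can be sketched as (the flag name is an
assumption; tolower() requires <ctype.h>):
    /* fold header-name case only when explicitly requested, e.g. by
     * the H2 mux which needs lower-cased field names */
    if (h1m->flags & H1_MF_TOLOWER)
        *ptr = tolower((unsigned char)*ptr);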
The original H1 request parsing code was reintroduced into the generic
H1 parser so that it can be used regardless of the direction. If the
parser is interrupted and restarts, it makes use of the H1_MF_RESP
flag to decide whether to re-parse a request or a response. While
parsing the request, the method is decoded and set into the start line
structure.
The HTTP status is not relevant to the H1 message but to the H2 stream
itself. It used to be placed there out of pure convenience, so better
move it out before it's too hard to remove.
This state was only a delimiter between headers and body but it now
causes more harm than good because it requires someone to change it.
Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let
it set the right next state so that h1m->state constantly matches
what is expected afterwards.
While it was not needed in the H2 mux which was reading full H1 messages
from the channel, it is mandatory for the H1 mux reading contents from
outside to be able to restart on a message. The problem is that the
headers are indexed on the fly, and it's not fun to have to store
everything between calls.
The solution here is to complete the first pass doing a partial restart,
and only once the end of the message is found, to start over again at
once, this time filling entries.
the contents and no need to store an intermediary result anymore. Later
this principle could even be used to decide to completely drop an output
buffer to save memory.
This will allow the parser to fill some extra fields like the method or
status without having to store them permanently in the HTTP message. At
this point however the parser cannot restart from an interrupted read.
Till now the H1 parser made for H2 used to be lenient on invalid header
field names because they were supposed to be produced by haproxy. Now
instead we'll rely on err_pos to know how to act (ie: -2 == must block).
There's no reason to have the two sides in H1 format since we only use
one at a time (the response at the moment). While completely removing
the request declaration, let's rename the response to "h1m" to clarify
that it's the unique h1 message there.
This is the *parsing* state of an HTTP/1 message. Currently the h1_state
is composite as it's made both of parsing and control (100SENT, BODY,
DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state
that can be used by H1 parsers. For now it's equivalent to h1_state.
Instead of waiting for the connection layer to let us know we can read,
attempt to receive as soon as process_stream() is called, and subscribe
to receive events if we can't receive yet.
Now, except for idle connections, the recv(), send() and wake() methods
are no more; all the lower layers do is wake tasklets for whoever is
waiting for I/O events.
Change fctl_list and send_list to be lists of struct wait_list, and nuke
send_wait_list, as it's now redundant.
Make the code responsible for shutr/shutw subscribe to those lists.
Instead of having our wake() method called each time an fd event happens,
just subscribe to recv/send events, and get our tasklet called when that
happens. If any recv/send was possible, the equivalent of what h2_wake_cb()
used to do will be done.
Let the connection layer know we're always interested in getting more data,
so that we get scheduled as soon as data is available, instead of relying
on the wake() method.
Make h2_recv() and h2_send() return 1 if data has been received/sent, or 0
otherwise. That way the caller will be able to know if more work may
have to be done.
Remove the recv() method from mux and conn_stream.
The goal is to always receive from the upper layers, instead of waiting
for the connection layer. For now, recv() is still called from the wake()
method, but that should change soon.
For struct connection, struct conn_stream, and for the h2 mux, add 2 new
lists, one that handles waiters for recv, and one that handles waiters for
recv and send. That way we can ask to subscribe for either recv or send.
Resetting the polling flags at the end of conn_fd_handler() shouldn't be
needed anymore, and it will create problem when we won't handle send/recv
from conn_fd_handler() anymore.
Cyril Bonté reported that commit f9cc07c25b broke the build without
threads.
We don't need to initialise tid to 0 in mworker_loop, so we can
completely remove it.
Christopher noticed that the CS_FL_EOS to CS_FL_REOS conversion was
incomplete: when the connection is closed, we mark the streams with EOS
instead of REOS, causing the loss of any possibly pending data. At the
moment it's not an issue since H2 is used only with a client, but with
servers it could be a real problem if servers close the connection right
after sending their response.
This patch should be backported to 1.8.
This patch ensures that a DNS resolution may be launched before
setting a server FQDN via the CLI. In particular, it checks that a
resolvers section was set.
A LEVEL 4 reg testing file is provided.
Thanks to Lukas Tribus for having reported this issue.
Must be backported to 1.8.
This protocol is based on the uxst one, but it uses socketpair and FD
passing instead of a connect()/accept().
The "sockpair@" prefix has been implemented for both bind and server
keywords.
When HAProxy wants to connect through a sockpair@, it creates 2 new
sockets using the socketpair() syscall and passes one of the sockets
through the FD specified on the server line.
On the bind side, haproxy will receive the FD and use it as if it were
the FD returned by an accept() syscall.
This protocol was designed for internal communication within HAProxy
between the master and the workers, but it's possible to use it
externally with a wrapper and pass the FD through environment variables.
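The underlying mechanism can be sketched like this (not HAProxy's actual
code, just standard socketpair() plus SCM_RIGHTS ancillary data):
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    /* send <fd> over the already-connected unix socket <sock> */
    static int send_fd(int sock, int fd)
    {
        char dummy = 0, cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }
The caller creates the pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv)
and hands sv[1] to the peer via send_fd().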
It's possible to have several protocols per family, which is a problem
with the current way the protocols are stored.
This allows registering a new protocol in HAProxy which is not a
protocol in the strict socket definition. It will be used to register a
SOCK_STREAM protocol using socketpair().
In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0; however, we should return changes instead, or all previous
changes will be lost.
This should be backported to 1.8.
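The one-liner amounts to (illustrative condition; the point is the
return value):
    /* fd not polled and not to be polled: keep the count gathered so
     * far instead of discarding it */
    if (!(en & FD_EV_POLLED_RW))
        return changes;   /* was: return 0; */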
The following functions only deal with header field values and are agnostic
to the HTTP version so they were moved to http.c :
http_header_match2(), find_hdr_value_end(), find_cookie_value_end(),
extract_cookie_value(), parse_qvalue(), http_find_url_param_pos(),
http_find_next_url_param().
Those lacking the "http_" prefix were modified to have it.
There are 3 tables in proto_http which are used exclusively by logs :
hdr_encode_map[], url_encode_map[] and http_encode_map[]. They indicate
what characters are safe to be emitted in logs depending on the part of
the message where they are placed. Let's move this to log.c, as well as
its initialization. It's worth noting that the rfc5424 map was already
initialized there.
These error codes and messages are agnostic to the version, even if
they are represented as HTTP/1.0 messages. Ultimately they will have
to be transformed into internal HTTP messages to be used everywhere.
The HTTP/1.1 100 Continue message was turned into an IST and the local
copy in the Lua code was removed.