Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2, which deserves to be
addressed in a separate commit.
There's no point checking the state before deciding to detach the backref
on "show map", it should always be done if the list is not empty. Note
that being empty guarantees that it's not linked into the list, and
conversely not being empty guarantees that it's in the list, hence the
test doesn't need to be performed under the lock.
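A rough sketch of the resulting logic, assuming HAProxy's usual list and
lock helpers, with "ctx" and "ref" as illustrative names for the map
context and the pattern reference:

    /* detach the backref if (and only if) it is linked; the emptiness
     * test itself can be done without the lock since a non-empty
     * element is necessarily in the list
     */
    if (!LIST_ISEMPTY(&ctx->bref.users)) {
        HA_SPIN_LOCK(PATREF_LOCK, &ref->lock);
        LIST_DEL_INIT(&ctx->bref.users);
        HA_SPIN_UNLOCK(PATREF_LOCK, &ref->lock);
    }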
The "show map"/"clear map" code used to rely on the cli's i0/i1 fields
to store the generation numbers to work with. That's particularly dirty
because it's done at places where ctx.map is also manipulated while they
are part of the same union, and the reason why this didn't cause trouble
is because cli.i0/i1 are at offset 216/224 while the map parts end at
204, so luckily there was no overlap.
Let's add these fields to the map context.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. Many commands, including pure parsers, use this
context but that's not a problem as it's designed to be used this way.
Due to this, many lines are changed but that's in fact a replacement of
"appctx->ctx.map" with "ctx->". Note that the code also uses st2 which
deserves being addressed in separate commit.
This state is pointless now, it just serves to initialize the initial
table pointer while this can be done more easily in the parser, so let's
do that and drop that state.
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2, which deserves to be
addressed in a separate commit.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
The code still has room for improvement, such as in the "flags" field
where bits are hard-coded, but they weren't modified.
The state was a constant, let's remove what remains of the switch/case.
The code from the "case" statement was only reindented as can be checked
with "git show -b".
This state is only an alias for "thr >= global.nbthread", which is the
sole condition that indicates the end has been reached. Let's just drop
this state; only STATE_LIST is left.
This state was only used to preset the list element. Now that we can
guarantee that the context can be properly preset during the parsing
we don't need this state anymore. The first pointer has to be set to
point to the first stream during the initial call which is detected
by the pointer not yet being set (null). Thanks to this we can also
remove one state check on the abort path.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
Till now, appctx_new() used to allocate an entry from the pool and
pass it through appctx_init() to initialize a debatable part of it that
didn't correspond anymore to the comments, and fill other fields. It's
hard to say what is fully initialized and what is not.
Let's get rid of that, always zero the whole structure at allocation time
(appctx are not that big anyway, and even with the cache there's no
difference in performance), and initialize what remains. This is cleaner
and more resistant to new field additions.
The appctx_init() function was removed.
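A minimal sketch of the resulting allocation path, assuming the usual
pool helpers (pool_zalloc() and pool_head_appctx); the fields assigned
afterwards are only illustrative:

    struct appctx *appctx;

    /* zero the whole structure so that any newly added field is
     * implicitly initialized
     */
    appctx = pool_zalloc(pool_head_appctx);
    if (!appctx)
        return NULL;
    appctx->obj_type = OBJ_TYPE_APPCTX;
    appctx->applet = applet;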
Instead of using existing fields and having to put keyword-specific
contexts in the applet definition, let's have the appctx provide a
generic storage area that's currently large enough for existing CLI
commands and small applets, and a function to allocate that storage.
The function will be responsible for verifying that the requested size
fits in the area so that the caller doesn't need to add specific checks;
it is validated during development as this size is static and will
not change at runtime. In addition the caller doesn't even need to
free() the area since it's part of an existing context. For the
caller's convenience, a context pointer "svcctx" for the command is
also provided so that the allocated area can be placed there (or
possibly any other one in case a larger area is needed).
The struct's layout has been temporarily complicated by adding one
level of anonymous union on top of the "ctx" one. This will allow us
to preserve "ctx" during 2.6 for compatibility with possible external
code and get rid of it in 2.7. This explains why the diff extends to
the whole "ctx" union, but a "git show -b" shows that only one extra
layer was added. In order to make both the svcctx pointer and its
storage accessible without further enlarging the appctx structure,
both svcctx and the storage share the same storage as the ctx part.
This is done by having them placed in the union with a protected
overlapping area for svcctx, for which a shadow member is also
present in the storage area:
    union {
        void *svcctx;           // variable accessed by services
        struct {
            void *shadow;       // shadow of svcctx
            char storage[];     // where most services store their data
        };
        union {                 // older commands store here and ignore svcctx
            ...
        } ctx;
    };
I.e. new applications will use appctx->svcctx while older ones will be
able to continue to use appctx->ctx.*
The whole area (including the svcctx pointer) is zeroed before any
applet is initialized and before the CLI keyword processor's first
invocation, as this is an important part of the existing keyword
processors, which makes CLI keywords effectively behave like applets.
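As a hedged illustration of how a CLI keyword may use this, assuming the
reservation helper is named applet_reserve_svcctx() and with a purely
hypothetical show_foo_ctx structure:

    /* hypothetical per-command context for a "show foo" keyword */
    struct show_foo_ctx {
        struct proxy *px;   /* restart point when the dump yields */
        int state;          /* keyword-private state */
    };

    static int cli_parse_show_foo(char **args, char *payload,
                                  struct appctx *appctx, void *private)
    {
        /* the helper validates that the size fits the static storage
         * and stores the resulting pointer into appctx->svcctx; the
         * area was already zeroed so no memset is needed
         */
        struct show_foo_ctx *ctx = applet_reserve_svcctx(appctx, sizeof(*ctx));

        ctx->px = NULL; /* set on the first iteration of the dump */
        return 0;
    }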
The io_handler in "add ssl crt_list" is built around a "while" loop that
only makes forward progress and that doesn't handle its final state as
it's not supposed to be called again once reached. This makes the code
confusing because its construct implies an infinite loop for such a
state (or any other unhandled one). Let's just remove that unneeded loop.
The "show ssl cert" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. Amazingly, despite the use of both p0 and i0 to store
respectively a pointer to the current ckchs and a transaction id, there
was no overlap with the other pointers used during these operations,
but should these fields be reordered or slightly updated this will break.
Comments were added above the faulty functions to indicate which fields
they are using.
This needs to be backported to 2.5.
The "show ssl crl-file" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. It's fortunate that the p1 pointer in use is located
before the first one used (it overlaps with old_cafile_entry). But
should these fields be reordered or slightly updated this will break.
This needs to be backported to 2.5.
The "show ssl ca-file <name>" command mixes some generic pointers from
the "ctx.cli" struct and context-specific ones from "ctx.ssl" while both
are in a union. The i0 integer used to store the current ca_index overlaps
with new_crlfile_entry which is thus harmless for now but is at the mercy
of any reordering or addition of these fields. Let's add dedicated fields
into the ssl structure for this.
Comments were added on top of the affected functions to indicate what they
use.
This needs to be backported to 2.5.
The "show ca-file" and "show crl-file" commands mix some generic pointers
from the "ctx.cli" struct and context-specific ones from "ctx.ssl" while
both are in a union. It's fortunate that the p0 pointer in use is located
immediately before the first one used (it overlaps with next_ckchi_link,
and old_cafile_entry is safe). But should these fields be reordered or
slightly updated this will break.
Comments were added on top of the affected functions to indicate what they
use.
This needs to be backported to 2.5.
When "show map" initializes itself, it first takes the reference to the
starting point under a lock, then releases it before switching to state
STATE_LIST, and takes the lock again. The problem is that it is possible
for another thread to remove the first element during this unlock/lock
sequence, and make the list run anywhere. This is of course extremely
unlikely but not impossible.
Let's initialize the pointer in the STATE_LIST part under the same lock,
which is simpler and more reliable.
This should be backported to all versions.
In case of write error in "show map", the backref is detached but
the list wasn't locked when this is done. The risk is very low but
it may happen that two concurrent "show map" one of which would fail
or one "show map" failing while the same entry is being updated could
cause a crash.
This should be backported to all stable versions.
Commit e7f74623e ("MINOR: stats: don't output internal proxies (PR_CAP_INT)")
in 2.5 ensured we don't dump internal proxies on the stats page, but the
same is needed for "show backend", as since the addition of the HTTP client
it now appears there:
$ socat /tmp/sock1 - <<< "show backend"
# name
<HTTPCLIENT>
This needs to be backported to 2.5.
This command was introduced in 1.8 with commit eceddf722 ("MEDIUM: cli:
'show cli sockets' list the CLI sockets") but its yielding doesn't work.
Each time it enters, it restarts from the last bind_conf but enumerates
all listening sockets again, thus it loops forever. The risk that it
happens in the field is low, but it easily triggers on port ranges after
400-500 sockets depending on the length of their addresses:
    global
        stats socket /tmp/sock1 level admin
        stats socket 192.168.8.176:30000-31000 level operator
$ socat /tmp/sock1 - <<< "show cli sockets"
(...)
ipv4@192.168.8.176:30426 operator all
ipv4@192.168.8.176:30427 operator all
ipv4@192.168.8.176:30428 operator all
ipv4@192.168.8.176:30000 operator all
ipv4@192.168.8.176:30001 operator all
ipv4@192.168.8.176:30002 operator all
^C
This patch adds the minimally needed restart point for the listener so
that it can easily be backported. Some more cleanup is needed though.
The "show resolvers" command is bogus, it tries to implement a yielding
mechanism except that if it yields it restarts from the beginning, until
it manages to fill the buffer with only line breaks, and faces error -2
that lets it reach the final state and exit.
The risk is low since it requires about 50 name servers to reach that
state, but it's not impossible, especially when using multiple sections.
In addition, the extraneous line breaks, if sent over an interactive
connection, will desynchronize the commands and make the client believe
the end was reached after the first nameserver. This cannot be fixed
separately because that would turn this bug into an infinite loop since
it's the line feed that manages to fill the buffer and stop it.
The fix consists in saving the current resolvers section into ctx.cli.p1
and the current nameserver into ctx.cli.p2.
This should be backported, but that code moved a lot since it was
introduced and has always been bogus. It looks like it has mostly
stabilized in 2.4 with commit c943799c86 so the fix might be backportable
to 2.4 without too much effort.
Try to create a "default" resolvers section at startup, without
displaying any error or warning. This section is initialized using the
/etc/resolv.conf of the system.
This is opportunistic and with no guarantee that it will work (but it should
on most systems).
This is useful for the httpclient as it allows to use the DNS resolver
without any configuration in most of the cases.
The function is called from the httpclient_pre_check() function to
ensure that we tried to create the section before trying to initiate the
httpclient. But it is also called from resolvers.c to ensure the
section is created when the httpclient init was disabled.
Release the expression used by the set-{src,dst}[-port] actions so we
keep valgrind happy upon an exit or a "haproxy -c".
Could be backported to every supported version.
Move the resolv.conf parser out of cfg_parse_resolvers() so it can be
used separately.
Some changes were made to the memprintf() calls in order to use a char **
instead of a char *. Also, the variable is tested before each memprintf()
so we can skip them if no warnmsg nor errmsg was set.
Clean up the alert and warning handling in the "parse-resolve-conf"
parser to use the errmsg and warnmsg variables and memprintf().
This will allow splitting the parser and silencing the alerts/warnings if
needed.
The commit 2eb5243e7 ("BUG/MEDIUM: mux-h1: Set outgoing message to DONE when
payload length is reached") introduced a regression. An internal error is
reported when we try to forward a message with trailers while the
content-length header was specified. Indeed, this case does not exist for H1
messages but it is possible in H2.
This patch should solve the issue #1684. It must be backported as far as
2.4.
This bug was already fixed at many places (stats, promex, lua) but the FCGI
multiplexer is also affected. When there is no content-length specified in
the response and when the END_REQUEST record is delayed, the response may be
truncated because an abort is erroneously detected. If the connection is not
closed because "keep-conn" option is set, the response is aborted at the end
of the server timeout.
This bug is a design issue with the HTX. It should be addressed, but such
a fix will probably not be backportable as far as 2.4. So, for now, the
only solution is to explicitly add an EOT block with the EOM flag in this
case.
This patch should fix the issue #1682. It must be backported as far as 2.4.
The commit a6c4a4834 ("BUG/MEDIUM: conn-stream: Don't erase endpoint flags
on reset") was too lax on reset. Only app layer flags must be preserved. On
reset, the endpoint is detached. Thus all flags set by the endpoint itself
or concerning its type must be removed.
Without this fix, we can experience crashes when a stream is released while
a server connection attempt failed. Indeed, in this case, the endpoint of
the backend conn-stream is reset, but the endpoint type is still set. Thus
when the stream is released, the endpoint is detached again.
This patch is 2.6-specific. No backport needed. This commit depends on the
previous one ("MINOR: conn-stream: Add mask from flags set by endpoint or
app layer").
The httpclient.resolvers.prefer global keyword allows to configure an
IPv4 or IPv6 preference when resolving. This could be useful in
environments where IPv6 is not supported.
By default the httpclient uses the resolvers section whose ID is
"default", the httpclient.resolvers.id global option allows to configure
another section to use.
The hard_error_ssl flag is set when the SSL configuration is explicitly
done for the httpclient.
If no configuration was made, the features are simply disabled and no
alert is emitted.
Cleanup the error handling in the initialization so we rely on the
ERR_CODE and use memprintf() to set the errmsg before printing it at the
end of the functions.
httpclient_set_dst() allows one to set the destination address instead
of using the one in the URL or resolving one from the host.
This function also supports other types of sockets like sockpair@, unix@,
or anything that could be used on a server line.
In order to still support this behavior, the address must be set on the
backend in this particular case because the frontend connection does not
support anything other than ipv4 or ipv6.
To allow the http-request set-dst to work for the httpclient DNS
resolving, some changes need to be done:
- The destination address needs to be set in the frontend (s->csf->dst)
  instead of the backend (s->csb->dst) to be able to use
  tcp_action_req_set_dst()
- SRV_F_MAPPORTS needs to be set on the proxy in order to allow the port
  change in alloc_dst_address()
The httpclient_resolve_init() adds http-request actions which do the
resolving using the Host header of the HTTP request.
The parse_http_req_cond function is directly used over an array of http
rules.
The do-resolve rule uses the "default" resolvers section. If this
section does not exist in the configuration, the resolving is disabled.
The httpclient DNS resolver will need a more efficient URL parser which
splits the URL into parts and does not try to resolve.
httpclient_spliturl uses the http_uri_parser in order to split the URL
into several ist.
The host part of the result is then processed by str2ip2(), which fails
if it is not an IP, allowing us to resolve the domain later.
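A short sketch of that check, with str2ip2()'s third argument disabling
resolution ("host" and the surrounding code are illustrative):

    struct sockaddr_storage ss;
    int is_ip;

    /* do not resolve: a NULL return means <host> is a domain name
     * that will be resolved later by the do-resolve action
     */
    is_ip = (str2ip2(host, &ss, 0) != NULL);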
We rely on the largest ID which was used to open streams to know if the
stream we received STREAM frames for is closed or not. If closed, we return
the same status as for a STREAM frame which was already received on an
open stream.
It is possible that we continue to receive retransmitted STREAM frames after
the mux has been released. We rely on the ->rx.streams[].nb_streams counter
to check whether the stream was closed. If not, at this time we drop the
packet. This is required to cope with retransmitted STREAM frames received
after the mux is released.
We add a counter for the number of streams which were opened or closed by the mux.
After the mux has been released, we can rely on this counter to know if the STREAM
frames are retransmitted ones or not.
That's similar to what was done for conn_streams and connections. The
flags were only set exactly when the relevant pointers were allocated,
so better test the pointer than the flag and stop setting the flag.
Just like for the conn_stream, now that these addresses are dynamically
allocated, there is no single case where the pointer is set without the
corresponding flag, and the flag is used as a permission to dereference
the pointer. Let's just replace the test of the flag with a test of the
pointer and remove all flag assignment. This makes the code clearer
(especially in "if" conditions) and saves the need for future code to
think about properly setting the flag after setting the pointer.
Some of the protocol-level ->connect() functions currently dereference
the connection's destination address while others test it and return an
error. There's normally no more non-bogus code path that calls such
functions without a valid destination address on the connection, so
let's unify these functions and just place a BUG_ON() there, and drop
the useless test that's supposed to return an internal error.
This flag is no longer needed now that it must always match the presence
of a destination address on the backend conn_stream. Worse, before the
previous patch, if it were to be accidentally removed while the address is
present, it
could result in a leak of that address since alloc_dst_address() would first
be called to flush it.
Its usage has a long history where addresses were stored in an area shared
with the connection, but as this is no longer the case, there's no reason
for putting this burden onto application-level code that should not focus
on setting obscure flags.
The only place where that made a small difference is in the dequeuing code
in case of queue redistribution, because previously the code would first
clear the flag, and only later when trying to deal with the queue, would
release the address. It's not even certain whether there would exist a
code path going to connect_server() without calling pendconn_dequeue()
first (e.g. retries on queue timeout maybe?).
Now the pendconn_dequeue() code will rely on SF_ASSIGNED to decide to
clear and release the address, since that flag is always set while in
a server's queue, and its clearance implies that we don't want to keep
the address. At least it remains consistent and there's no more risk of
leaking it.
These functions dynamically allocate a source or destination address but
start by clearing the previous one. There's a non-null risk of leaking
addresses there in case of misuse. Better have them do nothing if the
address was already allocated.
HTTP/3 implementation must ignore unknown frame types to support protocol
evolution. Clients can deliberately use unknown types to test that the
server is conformant: this principle is called greasing.
Quiche client uses greasing on H3 frame type with a zero length frame.
This reveals a bug in H3 parsing code which causes the transfer to be
interrupted. Fix this by removing the break statement on ret variable.
Now the parsing loop is only interrupted if input buffer is empty or the
demux is blocked.
This should fix http/3 freeze transfers with the quiche client. Thanks
to Lucas Pardue from Cloudflare for his report on the bug. Frédéric
Lecaille quickly found the source of the problem, which helped me write
this patch.
If the request channel buffer is full, H3 demuxing must be interrupted
on the stream until some read is performed. This condition is reported
if the HTX stream buffer qcs.rx.app_buf is full.
In this case, the qcs instance is marked with a new flag QC_SF_DEM_FULL.
This flag causes the H3 demuxing to be interrupted. It is cleared when
the HTX buffer is read by the conn-stream layer through the rcv_buf
operation.
When the flag is cleared, the MUX tasklet is woken up. However, as the MUX
iocb does not handle Rx for the moment, this is useless. It must be fixed
to prevent a possible freeze on POST transfers.
In practice, for the moment the HTX buffer is never full as the current
Rx code is limited by the quic-conn receive buffer size and the
incomplete flow-control implementation. So for now this patch is not
testable under the current conditions.
It seems this multiplier ended up in oblivion. Indeed, a multiplier must
be applied to the loss delay, expressed as an RTT multiplier: 9/8.
So, some packets were detected as lost too soon, leading them to be
retransmitted too early!
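For reference, this matches RFC 9002's time threshold; a sketch with
"srtt" and "latest_rtt" standing for the usual RTT estimates:

    /* kTimeThreshold from RFC 9002: 9/8 applied to the RTT estimate */
    loss_delay = MAX(srtt, latest_rtt) * 9 / 8;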
If the MUX can neither immediately handle nor buffer a STREAM frame, the
packet containing it must not be acknowledged. This is in conformance
with RFC 9000.
qcc_recv() return codes have been adjusted to differentiate an invalid
frame with an already fully received offset which must be acknowledged.
If a packet contains a STREAM frame but the MUX is not allocated, the
frame cannot be enqueued. According to RFC 9000, we must not
acknowledge the packet under this condition.
This may prevent a bug with Firefox which keeps refreshing the web page.
This issue had already been detected before the closing state
implementation: haproxy wasn't emitting CONNECTION_CLOSE and kept
acknowledging STREAM frames despite not handling them.
In the future, it might be necessary to respond with a CONNECTION_CLOSE
if the MUX has already been freed.
In issue #1468 it was reported that sometimes server-side connection
attempts were only validated after the "timeout connect" value, and
that would only happen with an H2 client. A long code analysis with the
output dumps showed only one possible call path: an I/O event on the
frontend while reading had just been disabled calls h2_wake(), which in
turn wakes cs_conn_io_cb(), which tries cs_conn_process() and cs_notify(),
which sees that the other side is not blocked (already in CS_ST_CON)
and tries cs_chk_snd() on it. But on that side the connection had just
finished to be set up and not yet woken the stream up, cs_notify()
would then call cs_conn_send() which succeeds and passes the connection
to CS_ST_RDY. The problem is that nothing new happened on the frontend
side so there's no reason to wake the stream up and the backend-side
conn_stream remains in CS_ST_RDY state with the stream never being
woken up.
Once the "timeout connect" strikes, process_stream() is woken up and
finds the connection finally setup, so it ignores the timeout and goes
on.
The number of conditions to meet to reproduce this is huge, which also
explains why the reporter says it's "occasional" and we were never able
to reproduce it in the lab. It needs at least reads to be disabled and
immediately re-enabled on the frontend side (e.g. buffer full) with an
I/O event reported before the poller had an opportunity to be disabled
but with no subscribe being reinstalled, so that sock_conn_iocb() has
no other choice but calling h2_wake(), and exactly at the same time
the backend connection must finish setting up so that it was not yet
reported by the poller, the data were sent and the polling for writes
disabled.
Several factors are to be considered here:
- h2_wake() should probably not call h2_wake_some_streams() for
ret >= 0 (common case), but only if some special event is reported
for at least one stream; that part is sensitive though as in the
past we managed to lose some rare cases (e.g. restart processing
after a pause), and such wakeups are extremely rare so we'd better
make that effort once in a while.
- letting a lazy forward attempt on the frontend confirm a backend
connection establishment is too smart to be reliable. That wasn't
in fact the intent and it's inherited from the very old code where
muxes didn't exist and where it was guaranteed that an event at this
layer would wake everyone up.
Here the best thing to do is to refrain from attempting to forward data
until the connection is confirmed. This will let the poller report the
connect() event to the backend side which will process it as it should
and does in all other cases.
Thanks to Jimmy Crutchfield for having reported useful traces and
tested patches.
This will have to be backported to all stable branches after some
observation. Before 2.6 the function is stream_int_chk_snd_conn(),
and the flag to remove is SI_SB_CON.
The use of co_set_data() should be strictly limited to setting the amount of
existing data to be transmitted. It ought not be used to decrement the
output after the data have left the buffer, because doing so involves
performing incorrect calculations using co_data() that still comprises data
that are not in the buffer anymore. Let's use c_rew() for this, which is
made exactly for this purpose, i.e. decrement c->output by as much as
requested.
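In short, assuming <sent> bytes just left the buffer, a sketch of the
two call styles:

    c_rew(chn, sent); /* correct: decrement c->output by <sent> */
    /* co_set_data(chn, co_data(chn) - sent);  incorrect: co_data()
     * still counts bytes which already left the buffer
     */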
When HTX blocks are transferred from the HTTP client context to the request
channel, via the htx_xfer_blks() function, the metadata must also be counted,
in addition to the data size. Otherwise, the expected payload size will not
be copied because the metadata of an HTX block (8 bytes) will be reserved.
And if the payload size is lower than 8 bytes, nothing will be copied. Thus
only a zero-copy will be able to copy the payload.
This issue is 2.6-specific, no backport is needed.
When the HTTP client consumes the response, it loops on the HTX message to
copy blocks content and it removes blocks by calling htx_remove_blk(). But
this function removes a block and returns the next one in the HTX
message. The result must be used instead of calling htx_get_next(). It is
especially important because the block used in the htx_get_next() loop was
just removed. It only works because the message is not defragmented during
the loop.
In addition, the loop on the response was simplified to iterate on blocks
instead of positions.
This patch must be backported to 2.5.
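A sketch of the corrected loop shape, using the HTX API mentioned above
(the copy itself is elided):

    struct htx_blk *blk = htx_get_head_blk(htx);

    while (blk) {
        /* ... copy this block's contents ... */
        blk = htx_remove_blk(htx, blk); /* returns the next block */
    }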
Only the CS_EP_ERROR flag is now removed from the endpoint when a reset is
performed. When a new endpoint is allocated, flags are preserved. It is
the caller's responsibility to remove other flags, depending on its needs.
Concretely, during a connection retry or an L7 retry, we must preserve
flags. In tcpcheck and the CLI, we reset flags.
This patch is 2.6-specific. No backport needed.
The SSL_SERVER_VERIFY_* constants were incorrectly set on the httpclient
server verify. The right constants are SSL_SOCK_VERIFY_* .
This could cause issues when using "httpclient-ssl-verify" or when the
SSL certificates can't be loaded.
No backport needed
This function is used to parse the QUIC packets carried by a UDP datagram.
When a correct packet could be found, the ->len RX packet structure value
is set to the packet length value. On the contrary, it is set to the remaining
number of bytes in the UDP datagram if no correct QUIC packet could be found.
So, there is no need to make this function return a status value. It allows
the caller to parse any QUIC packet carried by a UDP datagram without such
a status value.
Any client Initial packet carried in a datagram smaller than QUIC_INITIAL_PACKET_MINLEN(200)
bytes must be discarded. This does not mean we must discard the entire datagram.
So we must at least try to parse the packet length before dropping the packet
and return its length from qc_lstnr_pkt_rcv().
A crash is possible under such circumstances:
- the congestion window is drastically reduced to its minimal value
  when a quic listener is experiencing extreme packet loss;
- we enqueue several STREAM frames to be resent and some of them could not
  be transmitted;
- some of the still in-flight frames are acknowledged and trigger the
  stream memory releasing;
- when we come back to send the remaining STREAM frames, haproxy
  crashes when it tries to build them.
To fix this issue, we mark the STREAM frames as lost when detected as lost.
Then a lookup is performed for the stream the STREAM frames are attached
to before building them. They are released if the stream is no longer
available or the data range of the frame is consumed.
When we have to probe the peer, we must first try to send new data. This is
done here by waking up the mux after having set the maximum number of
datagrams to send to QUIC_MAX_NB_PTO_DGRAMS (2). Of course, this is only
the case if the mux was subscribed to SEND events.
This may happen in rare cases with extreme packet loss (30% for both TX and
RX) which leads the congestion window to decrease down to its minimal value
(two datagrams). Under such circumstances, no ack-eliciting frame can be
added to a packet by qc_build_frms(). In this case we must cancel the packet
building process if there is no ACK or probe (PING frame) to send.
This function must return a successful status as soon as it could build
a frame to be embedded in a packet. This behavior was broken by the last
modifications. This was due to a dangerous "ret = 1" statement inside
a loop. This statement must be reached only if we go out of a switch/case
after a "break" statement.
Add comments to mention this information.
When we are probing, we do not receive packets; furthermore all ACK frames
have already been sent. So it is useless to send ACKs when probing the peer.
This modification does not reset the flag which marks the connection as
requiring an ACK frame to be sent. If this is the case, this will be taken
into account after the probing process.
Make the two I/O handlers quic_conn_io_cb() and quic_conn_app_io_cb() call
qc_dgrams_retransmit() after probing retransmissions need was detected by
the timer task (qc_process_timer()).
We must modify qc_prep_pkts() to support QUIC_TLS_ENC_LEVEL_NONE as <next_tel>
parameter when called from qc_dgrams_retransmit().
When probing retransmissions with old data are needed for the connection, we
mark the packets as probing with old data to track them when acknowledged:
we do not resend frames with old data when lost.
Modify qc_send_app_pkt() to distinguish the case where it sends new data
from the case where it sends old data during probing retransmissions.
We add an <old_data> boolean parameter to this function to do so. The mux
never directly sends old data when probing retransmissions are needed by
the connection.
This function is used to requeue the TX frames from TX packets which have
been detected as lost. The modifications consist in avoiding resending
frames from frames duplicated to probe the peer: this is useless. Only the
original frames' loss must be taken into account because they are detected
as lost before the retransmitted frames. If the latter are also detected as
lost, other duplicated frames would have been retransmitted before their
loss detection.
qc_prep_fast_retrans() and qc_prep_hdshk_fast_retrans() are modified to
take two lists of frames as parameters. Two lists are needed for
qc_prep_hdshk_fast_retrans() to build datagrams with two packets during the
handshake. qc_prep_fast_retrans() needs two lists of frames to be used
to send two datagrams with one list per datagram.
We want to be able to resend frames from lists of frames during handshakes,
to resend datagrams with the same frames as during the first transmissions.
This drastically decreases the chances of frame fragmentation due to
variable lengths of the packet fields. Furthermore the frames were not
duplicated when retransmitted from one packet to another. This must be the
case only during packet loss detection.
qc_dup_pkt_frms() is there to duplicate the frames from an input list to an
output list. A distinction is made between STREAM frames and the other ones,
because we can rely on the "acknowledged offset", the aim of which is to
track the number of bytes which were acknowledged from TX STREAM frames.
qc_release_frm(), in addition to releasing the frame passed as parameter,
also marks the duplicate STREAM frames as acknowledged.
qc_send_hdshk_pkts() is the qc_send_app_pkts() counterpart to send
datagrams from at most two lists of frames, to be able to coalesce packets
from two different packet number spaces.
qc_dgrams_retransmit() is there to probe the peer with datagrams depending
on the needs of the packet number spaces, which must be flagged with
QUIC_FL_PKTNS_PROBE_NEEDED by the PTO timer task (qc_process_timer()).
Add the QUIC_FL_CONN_RETRANS_NEEDED connection flag definition to mark a
quic_conn struct as needing a retransmission.
Add QUIC_FL_PKTNS_PROBE_NEEDED to mark a packet number space as needing a
datagram probing.
Set these flags from process_timer() to trigger datagram probings.
Do not initiate datagram probing from any QUIC encryption level anymore.
This will be done from the I/O handlers (quic_conn_io_cb() during handshakes
and quic_conn_app_io_cb() after handshakes).
Add QUIC_FL_TX_PACKET_COALESCED flag to mark a TX packet as coalesced with others
to build a datagram.
Ensure we do not directly retransmit frames from such coalesced packets. They must
be retransmitted from their packet number spaces to avoid duplications.
We want to track the frames which have been duplicated during
retransmissions so as to avoid uselessly retransmitting frames which would
already have been acknowledged. The new ->origin member is there to store
the frame from which a copy was made; ->reflist is a list to store the
frames which are copies.
Also ensure all the frames are zeroed and that their ->reflist list member is
initialized.
Add QUIC_FL_TX_FRAME_ACKED flag definition to mark a TX frame as acknowledged.
Add a loop in the bidi STREAM function. This will repeatedly call
qcc_decode_qcs() and dequeue buffered frames.
This is useful when reception of more data is interrupted because the
MUX buffer was full. qcc_decode_qcs() has probably freed some space, so it
is useful to immediately retry reception of buffered frames from the qcs
tree.
This may fix occurrences of stalled Rx transfers with large payloads.
Note however that there is still room for improvement. The conn-stream
layer is not able at this moment to retrigger demuxing. This is because
the mux io-handler does not handle Rx: this may continue to cause
stalled transfers.
Previously, the h3 layer was not able to demux a DATA frame if it was not
fully received in the Rx buffer. This is an evident limitation as it
prevents demuxing a frame bigger than the buffer.
Improve h3_data_to_htx() to support partial frame demuxing. The demux
state is preserved in the new h3s fields: this is useful to keep the
current type and length of the demuxed frame.
Define a new structure h3s used to provide context for a H3 stream. This
structure is allocated and stored in the qcs thanks to previous commit
which provides app-layer context storage.
For now, h3s is empty. It will soon be completed to be able to support
stateful demuxing: this is required to be able to demux an incomplete
frame if the rx buffer is full.
Define 2 new callbacks for qcc_app_ops: attach and detach. They are
called when a qcs instance is respectively allocated and freed. If
implemented, they can allocate a custom context stored in the new
abstract field ctx of qcs.
For now, h3 and hq-interop do not use these new callbacks. They will
soon be implemented by the h3 layer to allocate a context used for
stateful demuxing.
This change is required to support the demuxing of H3 frames bigger than
a buffer.
Edit the functions used for HEADERS and DATA parsing. They now return
the number of bytes handled.
This change will help to demux H3 frames bigger than the buffer.
Improve the reception for STREAM frames. In qcc_recv(), if the frame is
bigger than the remaining space in rx buffer, do not reject it wholly.
Instead, copy as much data as possible. The rest of the data is
buffered.
This is necessary to handle H3 frames bigger than a buffer. The H3 code
does not demux until the frame is complete or the buffer is full.
Without this, the transfer on payload larger than the Rx buffer can
rapidly freeze.
Handle wrapping buffer in h3_data_to_htx(). If data is wrapping, first
copy the contiguous data, then copy the data in front of the buffer.
Note that h3_headers_to_htx() is not able to handle wrapping data. For
the moment, a BUG_ON was added as a reminder. This case never happened,
most probably because HEADERS is the first frame of the stream.
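A sketch of the wrapped copy, assuming <to_copy> bytes must be moved
from buffer <buf> into the HTX message:

    size_t contig = b_contig_data(buf, 0);

    if (to_copy > contig) {
        /* data wraps: copy the contiguous tail first, then the part
         * located at the front of the buffer
         */
        htx_add_data(htx, ist2(b_head(buf), contig));
        htx_add_data(htx, ist2(b_orig(buf), to_copy - contig));
    }
    else
        htx_add_data(htx, ist2(b_head(buf), to_copy));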
Always set the HTX flag HTX_SL_F_XFER_LEN for http/3. This is correct
because the size of H3 requests is always known thanks to the protocol
framing.
This may fix occurrences of incomplete POST requests when the client side
of the connection has been closed before.
Add new qcs fields to count the sum of bytes received for each stream.
This is necessary to enforce flow-control for reception on the peer.
For the moment, the implementation is partial. No MAX_STREAM_DATA or
FLOW_CONTROL_ERROR frames are emitted. BUG_ON statements are here as a
reminder.
This means that for the moment we do not support POST payloads greater
than the initial max-stream-data announced (256k currently).
At least, we now ensure that we never buffer a frame which overflows the
flow-control limit: this ensures that the memory consumption per stream
stays under control.
The qcs and its fields are not properly freed if the conn-stream allocation
fails in qcs_new(). Fix this by having a proper deinit code with a
dedicated label.
Comments were not properly updated since the splitting of functions for
stream emission. Also the "payload" argument has been renamed to "in" as it
better reflects the function's purpose.
This adds a deinit_idle_conns() function that's called on deinit to
release the per-thread idle connection management tasks. The global
task was already taken care of.
The freeing of pre-check callbacks was missing when this feature was
recently added with commit b53eb8790 ("MINOR: init: add the pre-check
callback"), let's do it to make valgrind happy.
Tim reported in issue #1676 that just like startup_logs, trash buffers
are not released on deinit since they're thread-local, but valgrind
notices it when quitting before creating threads like "-c -f ...". Let's
just subscribe the function to deinit in addition to threads' end.
The two "free(x);x=NULL;" in free_trash_buffers_per_thread() were
also simplified using ha_free().
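For reference, ha_free() takes the address of the pointer so it can both
free and reset it (illustrative):

    ha_free(&ptr); /* equivalent to: free(ptr); ptr = NULL; */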
Tim reported in issue #1676 that we don't release startup logs if we
warn during startup and quit before creating threads (e.g. -c -f ...).
Let's subscribe deinit_errors_buffers() to both thread's end and
deinit. That's OK since it uses both per-thread and global variables,
and is idempotent.
Low-footprint client machines may not have enough memory to download a
complete 16KB TLS record at once. With the new option the maximum
record size can be defined on the server side.
Note: before limiting the record size on the server side, a client should
consider using the TLS Maximum Fragment Length Negotiation Extension defined
in RFC 6066.
This patch fixes GitHub issue #1679.
In issue #1677, Tim reported that we don't correctly free some shared
pools on exit. What happens in fact is that pool_destroy() is meant to
be called once per pool *pointer*, as it decrements the use count for
each pass and only releases the pool when it reaches zero. But since
pool_destroy_all() iterates over the pools list, it visits each pool
only once and will not eliminate some of them, which thus remain in the
list.
In an ideal case, the function should loop over all pools for as long
as the list is not empty, but that's pointless as we know we're exiting,
so let's just set the users count to 1 before the call so that
pool_destroy() knows it can delete and release the entry.
This could be backported to all versions (memory.c in 2.0 and older) but
it's not a real problem in practice.
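A sketch of the resulting loop, close to what the fix does, with names
as used in the pools code:

    void pool_destroy_all()
    {
        struct pool_head *entry, *back;

        list_for_each_entry_safe(entry, back, &pools, list) {
            /* force a single pass to fully release the entry */
            entry->users = 1;
            pool_destroy(entry);
        }
    }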
We thought that we could get rid of some DISGUISE() with commit a80e4a354
("MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC") thanks to
the calls being in a function but that was without counting on Coverity.
Let's put it directly in the function since most if not all callers don't
care about this result.
It appears that it is safe to perform a clean deinit at this point, so
let's do this to exercise the deinit paths some more.
Running `valgrind --leak-check=full --show-leak-kinds=all ./haproxy -vv` with
this change reports:
==261864== HEAP SUMMARY:
==261864== in use at exit: 344 bytes in 11 blocks
==261864== total heap usage: 1,178 allocs, 1,167 frees, 1,102,089 bytes allocated
==261864==
==261864== 24 bytes in 1 blocks are still reachable in loss record 1 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x324BA6: hap_register_pre_check (init.c:92)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== 320 bytes in 10 blocks are still reachable in loss record 2 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x26E54E: cfg_register_postparser (cfgparse.c:4238)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== LEAK SUMMARY:
==261864== definitely lost: 0 bytes in 0 blocks
==261864== indirectly lost: 0 bytes in 0 blocks
==261864== possibly lost: 0 bytes in 0 blocks
==261864== still reachable: 344 bytes in 11 blocks
==261864== suppressed: 0 bytes in 0 blocks
which is looking pretty good.
A config like the following:
    global
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners

    resolvers unbound
        nameserver unbound 127.0.0.1:53
will report the following leak when running a configuration check:
==241882== 6,991 (6,952 direct, 39 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 13
==241882== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==241882== by 0x25938D: cfg_parse_resolvers (resolvers.c:3193)
==241882== by 0x26A1E8: readcfgfile (cfgparse.c:2171)
==241882== by 0x156D72: init (haproxy.c:2016)
==241882== by 0x156D72: main (haproxy.c:3037)
because the `.px` member of `struct resolvers` is not freed.
The offending allocation was introduced in
c943799c86 which is a reorganization that
happened during development of 2.4.x. This fix can likely be backported without
issue to 2.4+ and is likely not needed for earlier versions as the leak happens
during deinit only.
To make the deinit function a proper inverse of the init function we need to
free the `http_err_chunks`:
==252081== 311,296 bytes in 19 blocks are still reachable in loss record 50 of 50
==252081== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==252081== by 0x2727EE: http_str_to_htx (http_htx.c:914)
==252081== by 0x272E60: http_htx_init (http_htx.c:1059)
==252081== by 0x26AC87: check_config_validity (cfgparse.c:4170)
==252081== by 0x155DFE: init (haproxy.c:2120)
==252081== by 0x155DFE: main (haproxy.c:3037)
A memory leak was introduced when ignore-empty option was added to redirect
rules. If there is no location, when this option is set, the redirection is
aborted and the processing continues. But when this happened, the trash buffer
allocated to format the redirect response was not released.
The bug was introduced by commit bc1223be7 ("MINOR: http-rules: add a new
"ignore-empty" option to redirects.").
This patch should fix the issue #1675. It must be backported to 2.5.
If the "close-spread-time" option is set to "infinite", active
connection closing during a soft-stop can be disabled. The 'connection:
close' header or the GOAWAY frame will not be added anymore to the
server's response and active connections will only be closed once the
clients disconnect. Idle connections will no longer be closed all at once
when the soft-stop starts, and each idle connection will follow its
own timeout based on the multiple timeouts set in the configuration (as
is the case during regular execution).
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on 'MEDIUM: global:
Add a "close-spread-time" option to spread soft-stop on time window'.
HAProxy crashes when "show ssl ca-file" is called on a ca-file which
contains a lot of certificates (127 in our test with
/etc/ssl/certs/ca-certificates.crt).
The root cause is that the function does not yield when there is no
available space when writing the details, and we could write a lot.
To fix the issue, we try to put the data with ci_putchk() after every
show_cert_detail() and we yield if the ci_putchk() fails.
This patch also cleans up the code a little bit:
- the end label is now a real end with a return 1;
- i0 is used instead of (long)p1
- the ID is stored upon yield
Since the httpclient verify now has a fallback which disables SSL in
the httpclient without exiting haproxy at startup, we can safely
re-enable it by default.
It could still be disabled with "httpclient-ssl-verify none".
The cafile_tree was never freed upon deinit, making valgrind and ASAN
complain when haproxy quits.
This could be backported as far as 2.2 but it requires the
ssl_store_delete_cafile_entry() helper from
5daff3c8ab.
Emit a warning when the ca-file couldn't be loaded for the httpclient,
and disable the SSL of the httpclient.
We must never be in a case where the verify is disabled without any
configuration, so better disable the SSL completely.
Move the check on the scheme above the initialization of the applet so
we can abort before initializing the appctx.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
Instead of seeing each location manipulate the fcntl() themselves and
often forget to check previous flags, let's centralize the functions to
do this. It also allows to drop fcntl.h from most call places and will
ease the adoption of different OS-specific mechanisms if needed. Note
that the fd_set_nonblock() function purposely doesn't check the previous
flags as it's meant to be used on new FDs only.
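A minimal sketch of such helpers, matching the behavior described above
(assuming they are named fd_set_nonblock() and fd_set_cloexec()):

    #include <fcntl.h>

    /* for new FDs only: previous flags are deliberately not preserved */
    int fd_set_nonblock(int fd)
    {
        int ret = fcntl(fd, F_SETFL, O_NONBLOCK);

        return ret == -1 ? 0 : 1;
    }

    /* preserve previously set flags before adding FD_CLOEXEC */
    int fd_set_cloexec(int fd)
    {
        int flags, ret;

        flags = fcntl(fd, F_GETFD);
        flags |= FD_CLOEXEC;
        ret = fcntl(fd, F_SETFD, flags);
        return ret == -1 ? 0 : 1;
    }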
Despite what the 'close-spread-time' option is supposed to do, the
'connection: close' header was always added to HTTP responses during
soft-stops, even with a soft-stop window defined.
This patch adds the proper random based closing to HTTP connections
during a soft-stop (based on the time left in the soft close window).
It should be backported to 2.5 once 'MEDIUM: global: Add a
"close-spread-time" option to spread soft-stop on time window' is
backported as well.
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both or we never know if we risk to confuse some status codes with
plain errors.
There were a few entries; the most annoying ones are the switch/case
statements because they require adding the entry only when it differs, but
the other ones are really trivial.
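One way to write the portable test, as a sketch; the preprocessor guard
compiles the extra check out on systems where both constants are equal:

    if (errno == EAGAIN
    #if EWOULDBLOCK != EAGAIN
        || errno == EWOULDBLOCK
    #endif
       ) {
        /* transient error: retry later */
    }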
__comp_fetch_init() only presets the maxzlibmem, and only when both
USE_ZLIB and DEFAULT_MAXZLIBMEM are set. The intent is to preset a
default value to protect the system against excessive memory usage
when no setting is set by the user.
Nowadays the entry in the global struct is always there so there's no
point anymore in passing via a constructor to possibly set this value.
Let's go the cleaner way by always presetting DEFAULT_MAXZLIBMEM to 0
in defaults.h unless these conditions are met, and always assigning it
instead of pre-setting the entry to zero. This is more straightforward
and removes some ifdefs and the last constructor. In addition, now the
setting has a chance of being found.
The constructor present there could be replaced with an initcall.
This one is set at level STG_PREPARE because it also zeroes the
lock_stats, and it's a bit odd that it could possibly have been
scheduled to run after other constructors that might already
preset some of these locks by accident.
Transport layers (raw_sock, ssl_sock, xprt_handshake and xprt_quic)
were using 4 constructors and 2 destructors. The 4 constructors were
replaced with INITCALL and the destructors with REGISTER_POST_DEINIT()
so that we do not depend on this anymore.
Pollers are among the few remaining blocks still using constructors
to register themselves. That's not needed anymore since the introduction
of initcalls, so better switch them to initcalls.
On some systems, the hard limit for ulimit -n may be huge, in the order
of 1 billion, and using this to automatically compute maxconn doesn't
work as it requires way too much memory. Users tend to hard-code maxconn
but that's not convenient to manage deployments on heterogeneous systems,
nor when porting configs to developers' machines. The ulimit-n parameter
doesn't work either because it forces the limit. What most users seem to
want (and it makes sense) is to respect the system imposed limits up to
a certain value and cap this value. This is exactly what fd-hard-limit
does.
This addresses github issue #1622.
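An illustrative use (the value is an example only):

    global
        # respect the system's FD limit but never derive maxconn from
        # more than this many file descriptors
        fd-hard-limit 50000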
Almost all of our hash-based LB algorithms are implemented as special
cases of something that can now be achieved using sample expressions,
and some of them have adopted some options to adapt their behavior in
ways that could also be achieved using converters.
There are users who want to hash other parameters that are combined
into variables, and who set headers from these values and use
"balance hdr(name)" for this.
Instead of constantly implementing specific options and having users
hack around when they want a real hash, let's implement a native hash
mode that applies to a standard sample expression. This way, any
fetchable element (including variables) may be used to construct the
hash, even modified by any converter if desired.
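An illustrative configuration, hashing a header value through a
converter (names and values are examples only):

    backend app
        # any sample expression may be hashed, including variables
        # modified by converters
        balance hash req.hdr(x-user),lower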
Any type except bool can be cast to bin, while bool can be cast to string.
That's a bit inconsistent, and prevents a boolean from being used as
the entry of a hash function while any other type can. This is a
problem when passing via a variable where someone could use:
... set-var(txn.bar) always_false
to temporarily disable something, but this would result in an empty
hash output when later doing:
... var(txn.bar),sdbm
Instead of using c_int2bin() as is done for the string output, better
enforce a set of inputs of exactly 0 or 1, so that a poorly written sample
fetch function does not result in a difficult-to-debug hash output.
Surprisingly, while almost all calls to a sample cast function carefully
avoid calling c_none(), sample_fetch_as_type() makes no effort regarding
this, while by nature, the function is most often called with an expected
output type similar to the one of the expression. Let's add it to shorten
the most common call path.
The use_backend and use-server contexts were not enumerated in
smp_resolve_args, and while use-server doesn't currently take an
expression, at least use_backend supports that, and both entries ought
to be listed for completeness. Now an error in a use_backend rule
becomes more precise, from:
[ALERT] (12373) : config : parsing [use-srv.cfg:33]: unable to find
backend 'foo' referenced in arg 1 of sample fetch
keyword 'nbsrv' in proxy 'echo'.
to:
[ALERT] (12307) : config : parsing [use-srv.cfg:33]: unable to find
backend 'foo' referenced in arg 1 of sample fetch
keyword 'nbsrv' in use_backend expression in proxy
'echo'.
This may be backported though this is totally harmless.
Since commit dd7e6c6dc ("BUG/MINOR: http-rules: completely free incorrect
TCP rules on error") free_act_rule() is called on some error paths, and one
of them involves incomplete redirect rules that may cause a crash if the
rule wasn't yet initialized, as shown in this config snippet:
    frontend ft
        mode http
        bind *:8001
        http-request redirect location /%[always_false,sdbm]
Let's simply make release_http_redir() more robust against null redirect
rules.
No backport needed since it seems that the only way to trigger this was
the extra check above that was merged during 2.6-dev.
The function checking captures defined in "tcp-request content" rulesets didn't
use the right rule arguments. "arg.trk_ctr" was used instead of "arg.cap".
This patch must be backported as far as 2.2.
Since the 2.5, it is possible to define TCP/HTTP rulesets in defaults
sections. However, rules defining a capture in defaults sections were not
properly handled because they were not shared with the proxies inheriting
from the defaults section. This led to a crash when haproxy tried to store a
new capture.
So now, to fix the issue, when a new proxy is created, the list of captures
points to the list of its defaults section. It may be NULL or not. All new
captures are prepended to this list. It is not a problem to share the same
defaults section between several proxies, because it is not altered and we
take care to not release it when corresponding proxies are freed but only
when defaults proxies are freed. To do so, defaults proxies are now
unreferenced at the end of free_proxy() function instead of the beginning.
This patch should fix the issue #1674. It must be backported to 2.5.
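A sketch of the sharing scheme for request captures (the same applies to
response captures; field names follow struct proxy and struct cap_hdr):

    /* at proxy creation: start from the defaults' list (may be NULL) */
    curproxy->req_cap = defproxy->req_cap;

    /* when declaring a new capture: prepend, leaving the shared
     * defaults part untouched
     */
    hdr->next = curproxy->req_cap;
    curproxy->req_cap = hdr;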
Captures must only be defined in proxies with the frontend capabilities or
in defaults sections used by proxies with the frontend capabilities. Thus,
an extra check is added to be sure a defaults section defining a capture
will never be referenced by a backend.
Note that in this case, only named captures in "tcp-request content" or
"http-request" rules are possible. It is not possible in a defaults section
to declare a capture slot. Not yet at least.
This patch must be backported to 2.5. It is related to issue #1674.
When using qc_stream_desc_ack(), the stream instance may be freed if
there is no more data in its buffers. This also means that all frames
still stored waiting for ACK for this stream are freed via
qc_stream_desc_free().
This is particularly important in quic_stream_try_to_consume() where we
loop over the frames tree of the stream. A use-after-free is present in
case the stream has been freed, in the trace "stream consumed" which
dereferences the frame. Fix this by first checking if the stream has been
freed or not.
This bug was detected by using ASAN + quic traces enabled.
It's long been known that queues didn't scale with threads for various
reasons ranging from the cost of the queue lock to the cost of the
massive amount of inter-thread wakeups.
But some recent reports showing deplorable perfs with threads used at
100% CPU helped us notice that the two elements above add on top of
each other:
- with plenty of inter-thread wakeups, the scheduler takes a lot of
time to dequeue pending tasks from the shared queue;
- the lock held by the scheduler to do this slows down subsequent
task_wakeup() calls from the queue that are made under the
queue's lock;
- the queue's lock slows down addition of new requests to the queue
and adds up to the number of needed queue entries for steady
traffic.
But the cost of the shared queue has no reason for being paid because
it had already been paid when process_stream() added the request to
the queue. As such an instant wakeup is perfectly fit for this.
This is exactly what this patch does: it uses task_instant_wakeup()
to dequeue pending requests, which has the effect of not bloating the
shared queue, hence not requiring the global queue lock, which in turn
results in the wakeup to be much faster, and the queue lock to be much
shorter. In the end, a test with 4k concurrent connections that was
being limited to 40-80k requests/s before with 16 threads, some of
which were stuck at 100% CPU now reaches 570k req/s with 4% idle.
Given that it's been found that it was possible to trigger the watchdog
on the queue lock under extreme conditions, and that such conditions
could happen when users want to protect their servers during a DoS, it
would definitely make sense to backport it to the most recent releases
(2.5 and 2.4 seem like good candidates especially because their scheduler
is modern enough to receive the change above). If a backport is performed,
the following patch is needed:
MINOR: task: add a new task_instant_wakeup() function
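For illustration, a hedged sketch of the dequeuing pattern described above;
the pendconn/stream field names are assumptions, not copied from the patch:

    /* sketch: wake the stream's task directly on the local thread instead
     * of enqueueing it into the shared runqueue, which would require the
     * global runqueue lock and a later dequeue by the scheduler.
     */
    task_instant_wakeup(p->strm->task, TASK_WOKEN_RES);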
Remove CS_EP_EOS set erroneously on qc_rcv_buf().
This fixes POST with abortonclose. Previously, the request was preemptively
aborted by haproxy due to the incorrect EOS flag.
For the moment, EOS flag is not set anymore. It should be set to warn
about a premature close from the client.
Since the idle connections management changed to use eb-trees instead of MT
lists, a lock must be acquired to manipulate servers idle/safe/available
connection lists. However, it remains an unprotected use in
connect_server(), when a connection is removed from an idle list if the mux
has no more streams available. Thus it is possible to remove a connection
from an idle list on one thread while another one is looking for an idle
connection. Of course, this may lead to a crash.
To fix the bug, we must take care to acquire the idle connections lock
first. The bug was introduced by the commit f232cb3e9 ("MEDIUM: connection:
replace idle conn lists by eb trees").
The patch must be backported as far as 2.4.
Temporarily disable the SSL verify by default in the httpclient. The
initialization of the @system-ca during the init of the httpclient is a
problem in some cases.
The verify can be reactivated with "httpclient-ssl-verify required" in
the global section.
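For example, a minimal global section re-enabling the verification:

    global
        httpclient-ssl-verify required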
The httpclient HTTPS requests now enable the "verify required" option.
To achieve this, the "@system-ca" ca-file is configured in the
httpclient ssl server. Which means all the system CAs will be loaded at
haproxy startup.
Change the init order of the httpclient, a different init sequence is
required to allow a more complicated init.
The init is split in two parts:
- the first part is executed before config_check_validity(), which
allows creating proxies and doing more advanced things than STG_INIT, because
we might want to use stuff already initialized in haproxy (trash
buffers for example)
- the second part is executed after the config_check_validity(),
currently it is used for the log configuration.
This adds a call to function <fct> to the list of functions to be called at
the step just before the configuration validity checks. This is useful when you
need to create things as they would have been created during configuration
parsing, and when the initialization must continue during the configuration
check.
It could be used for example to generate a proxy with multiple servers using
the configuration parser itself. At this step the trash buffers are allocated.
Threads are not yet started so no protection is required. The function is
expected to return non-zero on success, or zero on failure. A failure will make
the process emit a succinct error message and immediately exit.
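As an illustration only, a hypothetical registration could look like this,
assuming the mechanism is exposed through a REGISTER_PRE_CHECK()-style macro
(the callback name is invented):

    /* sketch: hypothetical pre-check callback */
    static int my_precheck(void)
    {
        /* create proxies/servers here, as if parsed from the config */
        return 1;    /* non-zero on success, zero on failure */
    }
    REGISTER_PRE_CHECK(my_precheck);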
A conn-stream is never detached from an endpoint or an application alone,
except on a reset. Thus, to avoid any error, these functions are now
private. And cs_destroy() function is added to destroy a conn-stream. This
function is called when a stream is released, on the front and back
conn-streams, and when a health-check is finished.
When a client abort is detected with the server conn-stream in CS_ST_INI
state, there is no reason to detach the endpoint because we know there is no
endpoint attached to this conn-stream. This patch depends on the commit
"BUG/MEDIUM: conn-stream: Set back CS to RDY state when the appctx is
created".
When an appctx is created on the server side, we now set the corresponding
conn-stream to ready state (CS_ST_RDY). When it happens, the backend
conn-stream is in CS_ST_INI state. It is not consistent to leave the
conn-stream in this state because it means it is possible to have a target
installed in CS_ST_INI state, while with a connection, the conn-stream is
switched to CS_ST_RDY or CS_ST_EST state.
It is especially ambiguous because we may be tempted to think there is no
endpoint attached to the conn-stream before the CS_ST_CON state. And it is
indeed the reason for a bug leading to a crash because a cs_detach_endp() is
performed if an abort is detected on the backend conn-stream in CS_ST_INI
state. With a mux or an appctx attached to the conn-stream, the "->endp" field is
set to NULL. It is unexpected. The API will be changed to be sure it is not
possible. But it exposes a consistency issue with applets.
So, the conn-stream must not stay in CS_ST_INI state when an appctx is
attached. But there is no reason to set it in CS_ST_REQ. The conn-stream
must be set to CS_ST_RDY to handle applets and connections in the same
way. Note that if only the target is set but no appctx is created, the
backend conn-stream is switched from CS_ST_INI to CS_ST_REQ state to be able
to create the corresponding appctx. This part is unchanged.
This patch depends on the commit "MINOR: backend: Don't allow to change
backend applet".
The ambiguity exists on previous versions. But the issue is
2.6-specific. Thus, no backport is needed.
This part was inherited from haproxy-1.5. But for a while now (since at least 1.8),
the backend applet, once created, is no longer changed. Thus there is no
reason to still check if the target has changed. And in fact, if it was
still possible, there would be a memory leak because the old applet would be
lost and never released.
There is no reason to backport this fix because the leak only exists on a
dead code path.
When we want to serve a resource from the cache, if the applet creation
fails, the "cache-use" action must not yield. Otherwise, the stream will
hang. Instead, we now disable the cache. Thus the request may be served by
the server.
This patch must be backported as far as 1.8.
cs_applet_shut() now relies on CS_EP_SH* flags to perform the applet
shutdown. It means the applet release callback is called if neither the
CS_EP_SHR nor CS_EP_SHW flag is set. And it sets these flags, CS_EP_SHRR and
CS_EP_SHWN more specifically, before exiting.
This way, cs_applet_shut() is really the equivalent of cs_conn_shut().
This function does not release the applet but only calls the applet release
callback. It is equivalent to cs_conn_shut() but for applets. Thus the
function is renamed cs_applet_shut().
These functions don't close the connection but only perform shutdown for
reads and writes at the mux level. It is a bit ambiguous. Thus,
cs_conn_close() is renamed cs_conn_shut() and cs_conn_drain_and_close() is
renamed cs_conn_drain_and_shut(). Both functions rely on
cs_conn_shutw() and cs_conn_shutr().
Since the recent changes about the conn-streams, the stream dump in "show
sess all" command is a bit mangled. front and back conn-stream are now
properly displayed (csf and csb). In addition, when there is no backend
endpoint, "APPCTX" was always reported. Now, "NONE" is reported in this
case.
It is 2.6-specific. No backport needed.
Since previous patch
MINOR: mux-quic: split xfer and STREAM frames build
there is no way to report an error in qcs_xfer_data().
This should fix github issue #1669.
As anticipated in commit 211ea252d ("BUG/MINOR: logs: fix logsrv leaks
on clean exit"), there were indeed other corner cases that were not
properly covered. Setting the http client's ring_name to NULL makes the
sink lookup crash on startup in sink_find() with a config as simple as:
global
log ring@buf0 local0
The fields must be properly initialized (both config file name and
the ring_name). This only needs to be backported if/when the commit
above is backported.
Do not initialize mux task timeout if timeout client is set to 0 in the
configuration. Check for the task before queuing it in qc_io_cb() or
qc_detach().
This fixes a crash when timeout client is 0 or undefined.
Unsubscribe from lower layer on qc_release. This ensures that the lower
layer won't wake up a null tasklet after the MUX has been released and
may prevent a crash.
It is possible that the xprt layer has to process retransmitted STREAM frames after
the mux was released. In this case, there is no need to try to wake it up.
Starting from OpenSSLv3, providers are at the core of cryptography
functions. Depending on the provider used, the way the SSL
functionalities work could change. This new 'show ssl providers' CLI
command shows which providers were loaded by the SSL library.
This is required because the provider configuration is exclusively done
in the OpenSSL configuration file (/usr/local/ssl/openssl.cnf for
instance).
A new line is also added to the 'haproxy -vv' output containing the same
information.
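For example, the command can be issued on the stats socket (the output shape
below is illustrative, not copied from the patch):

    $ echo 'show ssl providers' | socat stdio /var/run/haproxy.sock
    Loaded providers :
        - default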
Complete the qc_send function. After having processed each qcs emission, it
will now retry sending on qcs where the transfer can continue. This is useful
when qc_stream_desc buffer is full and there is still data present in
qcs buf.
To implement this, each eligible qcs is inserted in a new list
<qcc.send_retry_list>. This is done on send notification from the
transport layer through qcc_streams_sent_done(). Retry emission until
send_retry_list is empty or the transport layer cannot accept more
data.
Several send operations are now called in two different places. Thus a
new _qc_send_qcs() function is defined to factorize the code.
This change should maximize the throughput during QUIC transfers.
MUX streams can now allocate multiple buffers for sending. quic-conn is
responsible for limiting the total count of allowed allocated buffers. A
counter is stored in the new field <stream_buf_count>.
For the moment, the value is hardcoded to 30.
On stream buffer allocation failure, the qcc MUX is flagged with
QC_CF_CONN_FULL. The MUX is then woken up as soon as a buffer is freed,
most notably on ACK reception.
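A hedged sketch of the accounting described (the snippet is schematic; only
stream_buf_count and QC_CF_CONN_FULL come from the text above):

    /* sketch: refuse a new Tx buffer once the per-connection budget is hit */
    if (qc->stream_buf_count >= 30) {    /* hardcoded limit for now */
        qcc->flags |= QC_CF_CONN_FULL;   /* MUX woken up on buffer free */
        return NULL;
    }
    qc->stream_buf_count++;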
Acknowledgment of STREAM frames has become more complex with the introduction
of stream multi-buffers. Two functions were executing roughly the same set
of instructions in xprt_quic.c.
To simplify this, move the code complexity in a new function
qc_stream_desc_ack(). It will handle offset calculation, removal of
data, freeing oldest buffer and freeing stream instance if required.
The qc_stream_desc API is cleaner now that the ambiguous
qc_stream_desc_free_buf() function has been removed.
Complete the qc_stream_desc type to support multiple buffers on
emission. The main objective is to increase the transfer throughput.
The MUX is now able to transfer more data without having to wait ACKs.
To implement this feature, a new type qc_stream_buf is declared. It
encapsulates a buffer with a list element. New functions are defined to
retrieve the current buffer, release it or allocate a new one. Each
buffer is kept in the qc_stream_desc list until all of its data is
acknowledged.
On the MUX side, a qcs uses the current stream buffer to transfer data.
Once the buffer is full, it is released and a new one will be allocated
on a future qc_send() invocation.
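A minimal sketch of the new type as described (a buffer plus a list element;
the exact layout in the patch may differ):

    struct qc_stream_buf {
        struct buffer buf;   /* data waiting for emission/acknowledgment */
        struct list list;    /* element in the qc_stream_desc buffer list */
    };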
Add a new member <qc> in the qc_stream_desc structure. This change is
possible since the previous patch, which added a quic-conn argument to
qc_stream_desc_new().
The purpose of this change is to simplify the future evolution of the
qc-stream-desc API. This will avoid repeating qc as an argument in various
functions which already use a qc_stream_desc.
Simplify the qcs/qc_stream_desc model. Each type now has its own tree
node, stored respectively in the qcc and quic-conn trees. It is still
necessary to mark the stream as detached by the MUX once all data is
transferred to the lower layer.
This might slightly improve performance on ACK management as now
only the lookup in quic-conn is necessary. On the other hand, the memory
size of the qcs structure is increased.
Regroup all type definitions and functions related to qc_stream_desc in
the source file src/quic_stream.c.
qc_stream_desc complexity will be increased with the development of Tx
multi-buffers. Having a dedicated module is useful to avoid mixing it with
pure transport/quic-conn code.
Split qcs_push_frame() in two functions.
The first one is qcs_xfer_data(). Its purpose is to transfer data from
qcs.tx.buf to qc_stream_desc buffer. The second function is named
qcs_build_stream_frm(). It generates a STREAM frame using qc_stream_desc
buffer as payload.
The trace events previously associated with qcs_push_frame() have also
been split in two to reflect the new code structure.
The purpose of this refactoring is first to better reflect how sending
is implemented. It will also simplify the implementation of Tx
multi-buffer per streams.
The DH parameters used for OpenSSL versions 1.1.1 and earlier were
changed. For OpenSSL 1.0.2 and LibreSSL the newly introduced
ssl_get_dh_by_nid function is not used since we keep the original
parameters.
DHE ciphers do not present a security risk if the key is big enough but
they are slow and mostly obsoleted by ECDHE. This patch removes any
default DH parameters. This will effectively disable all DHE ciphers
unless a global ssl-dh-param-file is defined, or
tune.ssl.default-dh-param is set, or a frontend has DH parameters
included in its PEM certificate. In this latter case, only the frontends
that have DH parameters will have DHE ciphers enabled.
Explicitly adding DHE ciphers in a "bind" line will not be enough to
actually enable DHE. We would still need to know which DH parameters to
use so one of the three conditions described above must be met.
This request was described in GitHub issue #1604.
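For example, one way to keep DHE usable is to provide parameters globally;
both directives below are existing global keywords mentioned above:

    global
        ssl-dh-param-file /etc/haproxy/dhparam.pem
        # or, alternatively:
        # tune.ssl.default-dh-param 2048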
RFC7919 defined sets of DH parameters supposedly strong enough to be
used safely. We will then use them when we can instead of our hard coded
ones (namely the ffdhe2048 and ffdhe4096 named groups).
The ffdhe2048 and ffdhe4096 named groups were integrated in OpenSSL
starting with version 1.1.1. Instead of duplicating those parameters in
haproxy for older versions of OpenSSL, we will keep using our own
parameters when they are not provided by the SSL library.
We will also need to keep our 1024 bits DH parameters since they are
considered not safe enough to have a dedicated named group in RFC7919
but we must still keep it for retrocompatibility with old Java clients.
This request was described in GitHub issue #1604.
MacOS can feed fc_rtt, fc_rttvar, fc_sacked, fc_lost and fc_retrans
so let's expose them on this platform.
Note that at the tcp(7) level, the API is slightly different, as
struct tcp_info is called tcp_connection_info and TCP_INFO is
called TCP_CONNECTION_INFO, so for convenience these ones were
defined to point to their equivalent. However there is a small
difference now in that tcpi_rtt is called tcpi_rttcur on this
platform, which forces us to make a special case for it before
other platforms.
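A minimal sketch of the kind of aliasing described for this platform (the
exact guard and definitions used by the patch may differ):

    #if defined(__APPLE__)
    #define TCP_INFO    TCP_CONNECTION_INFO
    #define tcp_info    tcp_connection_info
    #define tcpi_rtt    tcpi_rttcur   /* special-cased before other OSes */
    #endif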
While there is some overlap between what each OS provides in terms of
retrievable info, each set is not a real subset of another one and this
results in increasing complexity when trying to add support for new OSes.
Let's just condition each item to the OS that supports it. It's not pretty
but at least it will avoid a real mess later.
Note that fc_rtt and fc_rttvar are supported on any OS that has TCP_INFO,
not just linux/freebsd/netbsd, so we continue to expose them unconditionally.
If the response is compressed, we must update the HTX start-line flags and
the HTTP message flags. It is especially important if there is another
filter enabled. Otherwise, there is no way to know the C-L header was
removed and the T-E one was added, except by looping on headers.
This patch is related to the issue #1660. It must be backported as far as 2.0
(for HTX part only).
Instead of relying on the HTX start-line flags, it is better to rely on
http_msg flags to know if a content-length header can be added or not. In
addition, if the header is added, HTTP_MSGF_CNT_LEN flag must be added.
Because of this bug, an invalid message can be emitted when the response is
compressed because it may contain both C-L and T-E headers.
This patch should fix the issue #1660. It must be backported as far as 2.2.
A released qc_stream_desc is freed as soon as all its buffer content has
been acknowledged. However, it may still contain other frames waiting
for ACK pointing to deleted buffer content. This can happen on
retransmission.
When freeing a qc_stream_desc, free all its frames in acked_frms tree to
fix memory leak. This may also possibly fix a crash on retransmission.
Now, the frames are properly removed from a packet. This ensures we do
not retransmit a frame whose buffer is deallocated.
The issue only concerns the backend connection. The conn-stream is now owned
by the stream and persists during all the stream life. Thus we must not
crush it when the backend connection is released.
It is 2.6-specific. No backport is needed.
While it's often a pain to try to figure a UNIX socket address, the
server ones are reliable and may be emitted in the check provided
they are retrieved in time. We cannot rely on addr_to_str() because
it only reports "unix" since it may be used to log client addresses
or listener addresses (which are renamed).
The address length was extended to 256 chars to deal with long paths
as previously it was limited to INET6_ADDRSTRLEN+1.
This addresses github issue #101. There's no point backporting this,
external checks are almost never used.
During the config parsing we preset the server's address and port, but
that's pointless since it's replaced during each check in order to deal
with the possibility that the address was changed since.
Github issue #472 reports a problem with short client connections making
stick-table entries disappear. The problem is in fact totally different
and stems from the connection establishment step.
What happens is that the stick-table there has a single entry. The
"stick-on" directive is forced to purge an existing entry before being
able to create a new one. The new entry will be committed during the
call to process_store_rules() on the response path.
But if the client sends the FIN immediately after the connection is set
up (e.g. using nc -z) then the SHUTR is received and will cancel the
connection setup just after it starts. This cancellation will induce a
call to cs_shutw() which will in turn leave the server-side state in
ST_DIS. This transition from ST_CON to ST_DIS doesn't belong to the
list of handled transitions during the connection setup, so it will be
handled right after on the regular path, causing the connection to be
closed. Because of this, we never pass through back_establish() and
the backend's analysers are never set on the response channel, which
is why process_store_rules() is not called and the stick-tables entry
never committed.
The comment above the code that causes this transition clearly says
that the function is to be used after the connection is established
with the server, but there's no such protection, and we always have
the AUTO_CLOSE flag there (but there's hardly any available condition
to eliminate it).
This patch adds a test for the connection not being in ST_CON or for
option abortonclose being set. It's sufficient to do the job and it
should not cause issues.
One concern was that the transition could happen during cs_recv()
after the connection switches from CON to RDY then the read0 would
be taken into account and would cause DIS to appear, which is not
handled either. But that cannot happen because cs_recv() doesn't do
anything until it's in ST_EST state, hence the read0() cannot be
called from CON/RDY. Thus the transition from CON to DIS is only
possible in back_handle_st_con() and back_handle_st_rdy() both of
which are called when dealing with the transition already, or when
abortonclose is set and the client aborts before connect() succeeds.
It's possible that some further improvements could be made to detect
this specific transition but it doesn't seem like anything would have
to be added.
This issue was first reported on 2.1. The abortonclose area is very
sensitive so it would be wise to backport slowly, and probably no
further than 2.4.
Emit a CONNECTION_CLOSE if the app layer cannot be properly initialized
on qc_xprt_start. This forces the quic-conn to enter the closing state
before being closed.
Without this, quic-conn normal operations continue, despite the
app-layer reported as not initialized. This behavior is undefined, in
particular when handling STREAM frames.
Fix the return value used in quic-conn start callback for error. The
caller expects a negative value in this case.
Without this patch, the quic-conn and the connection stack are not
closed despite an initialization failure error, which is an undefined
behavior and may cause a crash in the end.
In quic_session_accept(), the connection is in charge of calling the
quic-conn start callback. If this callback fails for whatever reason,
there is a crash because of an explicit session_free.
This happens because the connection is now the owner of the session due
to the previous conn_complete_session call. It will automatically call
session_free. Fix this by skipping the explicit session_free invocation
on error.
In practice, this has currently never happened as there are only limited
failure cases for conn_xprt_start for QUIC.
Implement qc_destroy. This callback is used to quickly release all MUX
resources.
session_free uses this callback. Currently, it can only be called if
there was an error during connection initialization. If not defined, the
process crashes.
When an HTTP client is started on an HAProxy compiled without the SSL
support, an error is triggered when HTTPS is used. In this case, the freshly
created conn-stream is released. But this code is specific to the non-SSL
part. Thus it is moved in the right #if/#else section.
This patch should fix the issue #1655.
The commit 744451c7c ("BUG/MEDIUM: mux-h1: Properly detect full buffer cases
during message parsing") introduced a regression if trailers are not
received in one time. Indeed, in this case, nothing is appended in the
channel buffer, while there are some data in the input buffer. In this case,
we must not request more room from the upper layer, especially because the
channel buffer can be empty.
To fix the issue, on trailers parsing, we consider the H1 stream as
congested when the max size allowed is reached. Of course, the H1 stream is
also considered as congested if the trailers are too big and the channel
buffer is not empty.
This patch should fix the issue #1657. It must be backported as far as 2.0.
For all muxes, the function responsible to release a mux is always called
with a defined mux. Thus there is no reason to test if it is defined or not.
Note the patch may seem huge but it is just because of indentation changes.
Several muxes (h2, fcgi, quic) don't support the protocol upgrade. For these
muxes, there is no reason to have code to support it. Thus in the destroy
callback, there is now a BUG_ON() and the release function is simplified
because the connection is always owned by the mux.
Once a mux is initialized, the underlying connection always exists from its
point of view and it is never removed until the mux is released. It may be
owned by another mux during an upgrade. But the pointer remains set. Thus
there is no reason to test it in the destroy callback function.
This patch should fix the issue #1652.
The doc states that when timeout http-keep-alive is not set, timeout http-request
is used instead. As implemented in commit 15a4733d5 ("BUG/MEDIUM: mux-h2:
make use of http-request and keep-alive timeouts"), we use http-keep-alive
unconditionally between requests, with a fallback on client/server. Let's
make sure http-request is always used as a fallback for http-keep-alive
first.
This needs to be backported wherever the commit above is backported.
Thanks to Christian Ruppert for spotting this.
Commit 15a4733d5 ("BUG/MEDIUM: mux-h2: make use of http-request and
keep-alive timeouts") omitted to check the side of the connection, and
as a side effect, automatically enabled timeouts on idle backend
connections, which is totally contrary to the principle that they
must be autonomous.
This needs to be backported wherever the patch above is backported.
If the client does not send an ALPN, the SSL ALPN negotiation callback
is not called. However, the handshake is reported as successful. Check
just after SSL_do_handshake if an ALPN was negotiated. If not, emit a
CONNECTION_CLOSE with a TLS alert to close the connection.
This prevents a crash in qcc_install_app_ops() called with null as second
parameter value.
Instead of testing if a conn-stream exists or not, we rely on CS_EP_ORPHAN
endpoint flag. In addition, if possible, we access the endpoint from the
h1s. Finally, the endpoint flags are now reported in trace messages.
cs_free_cond() must now be used to remove a CS. cs_free() may be used on
error path to release a freshly allocated but unused CS. But in all other
cases cs_free_cond() must be used. This function takes care to release the
CS if it is possible (no app and detached from any endpoint).
In fact, this function is only used internally. From the outside,
cs_detach_* functions are used.
It is a partial revert of 54e85cbfc ("MAJOR: check: Use a persistent
conn-stream for health-checks"). But with the CS refactoring, the result is
cleaner now. A CS is allocated when a new health-check run is started. The
same CS is then used throughout the run. If there are several connections,
the endpoint is just reset. At the end of the run, the CS is released. It
means, in the tcp-check part, the CS is always defined.
process_stream() and all associated functions now manipulate conn-streams.
stream-interfaces are no longer used. In addition, functions that dump info
about a stream no longer print info about stream-interfaces.
cs_conn_io_cb(), cs_conn_sync_recv() and cs_conn_sync_send() are moved in
conn_stream.c. Associated functions are moved too (cs_notify, cs_conn_read0,
cs_conn_recv, cs_conn_send and cs_conn_process).
Remaining flags and associated functions are moved into the conn-stream
scope. These flags are added on the endpoint and not the conn-stream
itself. This way it will be possible to get them from the mux or the
applet. The functions to get or set these flags are renamed accordingly with
the "cs_" prefix and updated to manipualte a conn-stream instead of a
stream-interface.
si_conn_cb variable is renamed cs_data_conn_cb. In addition, its associated
functions are also renamed. si_cs_recv(), si_cs_send() and si_cs_process() are
renamed cs_conn_recv(), cs_conn_send() and cs_conn_process(). These functions are
updated to manipulate conn-streams instead of stream-interfaces.
data callbacks were only used for streams attached to a connection and
for health-checks. However there is a callback used by task_run_applet. So,
si_applet_wake_cb() is first renamed to cs_applet_process() and it is
defined as the data callback for streams attached to an applet. This way,
this part now manipulates a conn-stream instead of a stream-interface. In
addition, applets are no longer handled as an exception for this part.
si_update_both() is renamed stream_update_both_cs() and moved in stream.c.
The function is slightly changed to manipulate the stream instead of the front
and back conn-streams.
si_update_rx(), si_update_tx() and si_update() are renamed cs_update_rx(),
cs_update_tx() and cs_update() and updated to manipulate a conn-stream
instead of a stream-interface.
It is a transient commit. It should ease next changes about the conn-stream
refactoring. At the end these functions will be moved in the conn-stream
scope.
si_register_applet() and si_applet_release() are renamed
cs_register_applet() and cs_applet_release() and now manipulate a
conn-stream instead of a stream-interface.
si_shutr(), si_shutw(), si_chk_rcv() and si_chk_snd() are moved in the
conn-stream scope and renamed, respectively, cs_shutr(), cs_shutw(),
cs_chk_rcv(), cs_chk_snd() and manipulate a conn-stream instead of a
stream-interface.
Some conn-stream functions are only used when there is a connection. Thus,
they were renamed with the "cs_conn_" prefix. In addition, we expect to have a
connection, so a BUG_ON is added to be sure the functions are never called
in another context.
wait_event structure is moved in the conn-stream. The tasklet is only
created if the conn-stream is attached to a mux and released when the mux is
detached. This implies a subtle change. In stream_int_chk_rcv() function,
the wakeup of the tasklet was removed because there is no longer tasklet at
this stage (stream_int_chk_rcv() is a callback function of si_embedded_ops).
To be able to move wait_event from the stream-interface to the conn-stream,
we must be prepared to handle errors when a mux is attached to a conn-stream.
Indeed, the wait_event's tasklet will be allocated when a mux and a
stream are both attached to a conn-stream. So, we must be prepared to handle
allocation errors.
These flags only concerns the connection part. In addition, it is required
for a next commit, to avoid circular deps. Thus CS_SHR_* and CS_SHW_* were
renamed with the "CO_" prefix.
si_connect() is moved in backend.c and renamed as do_connect_server(). In
addition, the function now manipulates a stream instead of a
stream-interface.
si_retnclose() is used to send a reply to a client before closing. It has
no use on the server side, even though the function is generic. Thus, it is
renamed stream_retnclose() and moved into the stream scope. The function now
handles a stream and explicitly sends a message to the client.
The stream-interface state (SI_ST_*) is now in the conn-stream. It is a
mechanical replacement for now. Nothing special. SI_ST_* and SI_SB_* were
renamed accordingly. Utils functions to manipulate these infos were moved
under the conn-stream scope.
But it could be good to keep in mind that this part should be
reworked. Indeed, at the CS level, we only need to know if it is ready to
receive or to send. The state of conn-stream from INI to EST is only used on
the server side. The client CS is immediately set to EST. Thus current
SI_ST_* states should probably be moved to the stream to reflect the server
connection state during the establishment stage.
Only the server side is concerned by the stream-interface error type. It is
useless to have an err_type field on the client side. So, it is now moved to
the stream. SI_ET_* are renamed STRM_ET_* and moved into the stream-t.h header
file.
The previous connection state on the client side was only used for debugging
purpose to report client close. But this may be handled when the client
stream-interface is switched from SI_ST_DIS to SI_ST_CLO.
So, there only remains the previous connection state on the server side that
is used by the stream, in process_stream(), to be able to set the correct
termination flags. Thus, instead of keeping this info in the
stream-interface for only one side, the info is now stored in the stream
itself.
Flag to get the source ip/port with getsockname is now handled at the stream
level. Thus SI_FL_SRC_ADDR stream-int flag is replaced by SF_SRC_ADDR stream
flag.
Flag to consider a stream as independent is now handled at the conn-stream
level. Thus SI_FL_INDEP_STR stream-int flag is replaced by CS_FL_INDEP_STR
conn-stream flag.
Flag to not wake the stream up on I/O is now handled at the conn-stream
level. Thus SI_FL_DONT_WAKE stream-int flag is replaced by CS_FL_DONT_WAKE
conn-stream flag.
Flags to disable lingering and half-close are now handled at the conn-stream
level. Thus SI_FL_NOLINGER and SI_FL_NOHALF stream-int flags are replaced by
CS_FL_NOLINGER and CS_FL_NOHALF conn-stream flags.
Instead of setting a stream-interface flag to then set the corresponding
conn-stream endpoint flag, we now only rely on the conn-stream endpoint. Thus
SI_FL_KILL_CON is replaced by CS_EP_KILL_CONN.
In addition si_must_kill_conn() is replaced by cs_must_kill_conn().
Instead of relying on the conn-stream error, via CS_FL_ERR flags, we now
directly use the error at the endpoint level with the flag CS_EP_ERROR. It
should be safe to do so. But we must be careful because it is still possible
that an error is processed too early. Anyway, a conn-stream always has a
valid endpoint, possibly detached from any mux or applet, but valid.
SI_FL_ERR is removed and replaced by CS_FL_ERROR. It is a transient patch
because the idea is to rely on the endpoint to handle errors at this
level. But if for any reason it is not possible, the stream-interface flags
will still be replaced.
The expiration date in the stream-interface was only used on the server side
to set the connect, queue or turn-around timeout. It was checked on the
frontend stream-interface, but never used concretely. So it was removed and
replaced by a connect expiration date in the stream itself. Thus, SI_FL_EXP
flag in stream-interfaces is replaced by a stream flag, SF_CONN_EXP.
The source and destination addresses at the applicative layer are moved from
the stream-interface to the conn-stream. This simplifies a bit the code and
it is a logical step toward removing the stream-interface.
The conn_retries counter was set to the max value and decremented at each
connection retry. Thus the counter reflected the number of retries left and
not the real number of retries. All calculations of redispatch or reporting
of the number of retries experienced were made using subtractions from the
configured retries, which was complicated and didn't bring any benefit.
Now, this counter is set to 0 and incremented at each retry. We know we've
reached the maximum allowed connection retries by comparing it to the
configured value. In all other cases, we directly use the counter.
This patch should address the feature request #1608.
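A hedged before/after sketch of the counting change (the field names follow
the description above and are not copied from the patch):

    /* before: preset to the configured retries and decremented, so the
     * counter held the number of retries left
     */
    s->conn_retries = be->conn_retries;

    /* after: start from zero and increment on each retry */
    s->conn_retries++;
    if (s->conn_retries >= be->conn_retries)
        /* maximum allowed connection retries reached */;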
The conn_retries counter may be moved into the stream structure. It only
concerns the connection establishment. The frontend stream-interface does not
use it. So it is a logical change.
The L7 retries only concerns the stream when a server connection is
established. Thus instead of storing the L7 buffer into the
stream-interface, it may be moved to the stream. And because it is only
available for HTTP streams, it may be moved in the HTTP transaction.
Associated flags are also moved into the HTTP transaction.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible, the conn-stream is used. Of course, many calls to
the stream-interface API remain.
CS_FL_ISBACK is a new flag, set on backend conn-streams. We must just be
careful to preserve this flag when the endpoint is detached from the
conn-stream.
Instead of testing if a conn-stream exists or not, we rely on CS_EP_ORPHAN
endpoint flag. In addition, if possible, we access the endpoint from the
mux_pt context. Finally, the endpoint flags are now reported in trace
messages.
All old flags CS_FL_* are now moved in the endpoint scope and renamed
CS_EP_* accordingly. It is a systematic replacement. There is no true change
except for the health-check and the endpoint reset. Here it is a bit special
because the same conn-stream is reused. Thus, we must handle endpoint
allocation errors. To do so, cs_reset_endp() has been adapted.
Thanks to this last change, it will now be possible to simplify the
multiplexer and probably the applets too. A review must also be performed to
remove some flags in the channel or the stream-interface. The HTX will
probably be simplified too. Finally, there is now some place in the
conn-stream to move info from the stream-interface.
The conn-stream endpoint is now shared between the conn-stream and the
applet or the multiplexer. If the mux or the applet is created first, it is
responsible to also create the endpoint and share it with the conn-stream.
If the conn-stream is created first, it is the opposite.
When the endpoint is only owned by an applet or a mux, it is called an
orphan endpoint (there is no conn-stream). When it is only owned by a
conn-stream, it is called a detached endpoint (there is no mux/applet).
The last entity that owns an endpoint is responsible to release it. When a
mux or an applet is detached from a conn-stream, the conn-stream
relinquishes the endpoint and recreates a new one. This way, the endpoint
state is never lost for the mux or the applet.
It is a transient commit to prepare next changes. Now, when a conn-stream is
created from an applet or a multiplexer, an endpoint is always provided. In
addition, the API to create a conn-stream was specialized to have one
function per type.
The next step will be to share the endpoint structure.
It is a transient commit to prepare next changes. It is possible to pass a
pre-allocated endpoint to create a new conn-stream. If it is NULL, a new
endpoint is created, otherwise the existing one is used. There is no other
change at the conn-stream level.
In the applets, all conn-streams are created with no pre-allocated
endpoint. But for multiplexers, an endpoint is systematically created before
creating the conn-stream.
Some CS flags, only related to the endpoint, are moved into the endpoint
struct. More will probably moved later. Those ones are not critical. So it
is pretty safe to move them now and this will ease next changes.
Group the endpoint target of a conn-stream, its context and the associated
flags in a dedicated structure in the conn-stream. It is not inlined in the
conn-stream structure. There is a dedicated pool.
For now, there is no complexity. It is just an indirection to get the
endpoint or its context. But the purpose of this structure is to be able to
share a refcounted context between the mux and the conn-stream. This way, it
will be possible to preserve it when the mux is detached from the
conn-stream.
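A minimal sketch of the dedicated structure as described (names inferred
from the surrounding text, not copied from the patch):

    struct cs_endpoint {
        void *target;        /* mux stream (h1s, h2s...) or appctx */
        void *ctx;           /* endpoint context, e.g. the connection */
        unsigned int flags;  /* CS_EP_* */
    };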
The function cs_init() is only called by cs_new(). The conn-stream
initialization will be reviewed. It is easier to do it in cs_new() instead
of using a dedicated function. cs_new() is pretty simple, there is no reason
to split the code in this case.
This change is only significant for the multiplexer part. For the applets,
the context and the endpoint are the same. Thus, there is not much change. For
the multiplexer part, the connection was used to set the conn-stream
endpoint and the mux's stream was the context. But it is a bit strange
because once a mux is installed, it takes over the connection. In a
wonderful world, the connection should be totally hidden behind the mux. The
stream-interface and, in a lesser extent, the stream, still access the
connection because that was inherited from the pre-multiplexer era.
Now, the conn-stream endpoint is the mux's stream (an opaque entity for the
conn-stream) and the connection is the context. Dedicated functions have
been added to attach an applet or a mux to a conn-stream.
The appctx owner is now always a conn-stream. Thus, it can be set during the
appctx allocation. But, to do so, the conn-stream must be created first. It
is not a problem on the server side because the conn-stream is created with
the stream. On the client side, we must take care to create the conn-stream
first.
This change should ease other changes about the applets bootstrapping.
This patch is mandatory to invert the endpoint and the context in the
conn-stream. There is no common type (at least for now) for the entity
representing a mux (h1s, h2s...), thus we must set its type when the
endpoint is attached to a conn-stream. There are 2 types of conn-stream
endpoints: the mux (CS_FL_ENDP_MUX) and the applet (CS_FL_ENDP_APP).
For now there is not much change. Only the appctx is passed as argument when
the .init callback function is called. And it is not possible to yield at
this stage. It is not a problem because the feature is not used. Only the
Lua defines this callback function for the Lua TCP/HTTP services. The idea
is to be able to use it for all applets to initialize the appctx context.
cs_free() must not be called when we fail to allocate the conn-stream in
h1s_new_cs() function. This bug was introduced by the commit cda94accb
("MAJOR: stream/conn_stream: Move the stream-interface into the
conn-stream").
It is 2.6-specific, no backport is needed.
It was mentioned in issue #12 that expired entries would appear with a
negative expire delay in "show cache". Instead of listing them, let's
just evict them.
This could be backported to all versions since this was reported on
1.8 already.
It was reported in issue #13 that a GOAWAY frame was sent on timeout even
if no SETTINGS frame was sent. The approach imagined by then was to track
the fact that a SETTINGS frame was already sent to avoid this, but that's
already what is done through the state, though it doesn't stand due to the
fact that we switch the connection to the error state. Thus what we're
doing here is to set the GOAWAY_FAILED flag in h2c_error() before
switching to the ERROR state when the state indicates we've not yet sent
settings, and to refrain from sending anything from the h2c_send_goaway_error()
function for such states.
This could be backported to all versions where it applies well.
qcs by_id field has been replaced by a new field named "id". Adjust the
h3_debug_printf traces. This is the case since the introduction of the
qc_stream_desc type.
In issue #1184, cppcheck reports that an incorrect format "%d" was
used to print an unsigned in the debug code, though values are always
very small and this will never be an issue.
In issue #1184, cppcheck complains about some inconsistent printf
formats. At least the one in peer_prepare_hellomsg() that uses "%u"
for the int "min_ver" is wrong. Let's force other types to make it
happy, though constants cannot cause trouble.
cppcheck reports in issue #1184 a type mismatch between "%d" and the
unsigned int "misses" in the standalone debug code of lru.c. Let's
switch to "%u".
We used to check if the transport layer was ssl_sock to decide to log
"~" after a frontend's name. Now that QUIC is present, this doesn't work
anymore. Better rely on the transport layer's get_ssl_sock_ctx() method.
Coverity found in issue #1646 that I added a double-close bug in last
commit e4d09cedb ("MINOR: sock: check configured limits at the sock layer,
not the listener's") because the error path already closes the FD. No
backport needed.
In issue #1645, coverity suspects some dead code due to a pair of
remaining tests on "if (!ctx)". While all other functions test the
context earlier, these ones used to only test the connection and the
transport. It's still not very clear to me if there are certain error
cases that can lead to no SSL being initially set while the rest is
ready, and the SSL arriving later, but better preserve this original
construct by testing first the connection and only later the context.
First gcc, then now coverity report possible null derefs in situations
where we know these cannot happen since we call the functions in
contexts that guarantee the existence of the connection and the method
used. Let's introduce an unchecked version of the function for such
cases, just like we had to do with objt_*. This allows us to remove the
ALREADY_CHECKED() statements (which coverity doesn't see), and addresses
github issues #1643, #1644, #1647.
Some compilers see a possible null deref after conn_get_ssl_sock_ctx()
in ssl_sock_parse_heartbeat, which cannot happen there, so let's mark
it as safe. No backport needed.
It was supposed to be there, and probably was not placed there due to
historic limitations in listener_accept(), but now there does not seem
to be a remaining valid reason for keeping the quic_conn out of the
handle. In addition in new_quic_cli_conn() the handle->fd was incorrectly
set to the listener's FD.
By being able to return the ssl_sock_ctx, we're now enabling the whole
set of SSL sample fetch methods to work on the current SSL context of
the QUIC connection, as seen in the following test showing a request
forwarded to an HTTP/1 server with plenty of SSL headers filled:
00000001:decrypt.clireq[000f:ffffffff]: GET / HTTP/1.1
00000001:decrypt.clihdr[000f:ffffffff]: host: localhost
00000001:decrypt.clihdr[000f:ffffffff]: user-agent: nghttp3/ngtcp2 client
00000001:decrypt.clihdr[000f:ffffffff]: x-src: 127.0.0.1
00000001:decrypt.clihdr[000f:ffffffff]: x-dst: 127.0.0.4
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_serial: D16197E7D3E634E9
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_key_alg: rsaEncryption
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_sig_alg: RSA-SHA1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc: 1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_has_sni: 1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_sni: blah
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_alpn: h3
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_protocol: TLSv1.3
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_cipher: TLS_AES_256_GCM_SHA384
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_alg_keysize: 256
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_use_keysize: 256
00000001:decrypt.clihdr[000f:ffffffff]: x-forwarded-for: 127.0.0.1
The code is trivial, but this is marked as medium as there's always
the risk that some of the callable functions do not like being called
on such SSL contexts.
The SSL functions must not use conn->xprt_ctx anymore but find the context
by calling conn_get_ssl_sock_ctx(), which will properly pass through the
transport layers to retrieve the desired information. Otherwise when the
functions are called on a QUIC connection, they refuse to work for not
being called on the proper transport.
Historically there was a single way to have an SSL transport on a
connection, so detecting if the transport layer was SSL and a context
was present was sufficient to detect SSL. With QUIC, things have changed
because QUIC also relies on SSL, but the context is embedded inside the
quic_conn and the transport layer doesn't match expectations outside,
making it difficult to detect that SSL is in use over the connection.
The approach taken here to improve this consists in adding a new method
at the transport layer, get_ssl_sock_ctx(), to retrieve this often needed
ssl_sock_ctx, and to use this to detect the presence of SSL. This will
even allow some simplifications and cleanups to be made in the SSL code
itself, and QUIC will be able to provide one to export its ssl_sock_ctx.
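A hedged sketch of the method slot added to the transport layer (the exact
signature is an assumption):

    /* in struct xprt_ops: return the ssl_sock_ctx used by this transport,
     * or NULL if SSL is not in use on this connection
     */
    struct ssl_sock_ctx *(*get_ssl_sock_ctx)(struct connection *conn);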
These functions will allow the connection layer to retrieve a quic_conn's
source or destination when possible. The quic_conn holds the peer's address
but not the local one, and the sockets API doesn't always make that easy
for datagrams. Thus for frontend connections what we're doing here is to
retrieve the listener's address when the destination address is desired.
Now it finally becomes possible to fetch the source and destination using
"src" and "dst", and to pass an incoming connection's endpoints via the
proxy protocol.
The mux didn't have its flags nor name set, as seen in this output of
"haproxy -vv":
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
quic : mode=HTTP side=FE mux= flags=
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
This might have random impacts at certain points like forcing some
connections to close instead of aborting a stream, or not always
handling certain streams as fully HTX-compliant.
Certain functions cannot be called on an FD-less conn because they are
normally called as part of the protocol-specific setup/teardown sequence.
Better place a few BUG_ON() to make sure none of them is called in other
situations. If any of them would trigger in ambiguous conditions, it would
always be possible to replace it with an error.
Some syscalls at the TCP level act directly on the FD. Some of them
are used by TCP actions like set-tos, set-mark, silent-drop, others
try to retrieve TCP info, get the source or destination address. These
ones must not be called with an invalid FD coming from an FD-less
connection, so let's add the relevant tests for this. It's worth
noting that all these ones already have fall back plans (do nothing,
error, or switch to alternate implementation).
QUIC connections do not use a file descriptor, instead they use the
quic equivalent which is the quic_conn. A number of our historical
functions at the connection level continue to unconditionally touch
the file descriptor and this may have consequences once QUIC starts
to be used.
This patch adds a new flag on QUIC connections, CO_FL_FDLESS, to
mention that the connection doesn't have a file descriptor, hence the
FD-based API must never be used on them.
From now on it will be possible to instrument existing functions to
panic when this flag is present.
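For illustration, a minimal sketch of how an FD-touching helper could be
instrumented (the surrounding code is hypothetical):

    /* sketch: FD-based helpers must never run on FD-less connections */
    BUG_ON(conn->flags & CO_FL_FDLESS);
    fd = conn->handle.fd;   /* only valid when CO_FL_FDLESS is not set */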
listener_accept() used to continue to enforce the FD limits relative to
global.maxsock by itself while it's the last FD-specific test in the
whole file. This test has nothing to do there, it ought to be placed in
sock_accept_conn() which is the one in charge of FD allocation and tests.
Similar tests are already located there by the way. The only tiny
difference is that listener_accept() used to pause for one second when
this limit was reached, while other similar conditions were pausing only
100ms, so now the same 100ms will apply. But that's not important and
could even be considered as an improvement.
OpenSSL 3.0 warns that ERR_func_error_string() is deprecated. Using
ERR_peek_error_func() solves it instead, and this function was added to
the compat layer by commit 1effd9aa0 ("MINOR: ssl: Remove call to
ERR_func_error_string with OpenSSLv3").
The OpenSSL engine API is deprecated starting with OpenSSL 3.0.
In order to have a clean build this feature is now disabled by default.
It can be reactivated with USE_ENGINE=1 on the build line.
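For example:

    make TARGET=linux-glibc USE_OPENSSL=1 USE_ENGINE=1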
Shawn Heisey reported that the proxy's description was unreadable in dark
color scheme. This is because the text color is changed in the table but
not the cell's background.
This should be backported to 2.5.
In "haproxy -vv" we produce a list of available muxes with their
capabilities, but that list is often quite large for terminals due
to excess of spaces, so let's reduce them a bit to make the output
more readable.
The new 'close-spread-time' global option can be used to spread idle and
active HTTP connection closing after a SIGUSR1 signal is received. This
allows to limit bursts of reconnections when too many idle connections
are closed at once. Indeed, without this new mechanism, in case of
soft-stop, all the idle connections would be closed at once (after the
grace period is over), and all active HTTP connections would be closed
by appending a "Connection: close" header to the next response that goes
over it (or via a GOAWAY frame in case of HTTP2).
This patch adds the support of this new option for HTTP as well as HTTP2
connections. It works differently on active and idle connections.
On active connections, instead of sending systematically the GOAWAY
frame or adding the 'Connection: close' header like before once the
soft-stop has started, a random based on the remainder of the close
window is calculated, and depending on its result we could decide to
keep the connection alive. The random will be recalculated for any
subsequent request/response on this connection so the GOAWAY will still
end up being sent, but we might wait a few more round trips. This will
ensure that goaways are distributed along a longer time window than
before.
On idle connections, a random factor is used when determining the expire
field of the connection's task, which should naturally spread connection
closings on the time window (see h2c_update_timeout).
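For example, a minimal configuration spreading closings over 30 seconds
(the value is illustrative):

    global
        close-spread-time 30s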
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on "BUG/MEDIUM:
mux-h2: make use of http-request and keep-alive timeouts" which
refactorized the timeout management of HTTP2 connections.
We modify the key update feature implementation to support reusable cipher contexts
as this is done for the other cipher contexts for packet decryption and encryption.
To do so we attach a context to the quic_tls_kp struct and initialize it each time
the underlying secret key is updated. Same thing when we rotate the secret
keys: we rotate the contexts at the same time.
These settings are potentially cancelled by other setting initializations shared
with SSL sock bindings. This will have to be clarified when we will adapt the
QUIC bindings configuration.
Add ->ctx new member field to quic_tls_secrets struct to store the cipher context
for each QUIC TLS context TX/RX parts.
Add quic_tls_rx_ctx_init() and quic_tls_tx_ctx_init() functions to initialize
these cipher contexts for the RX and TX parts respectively.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (0RTT, Handshake,
1RTT).
Modify quic_tls_decrypt() and quic_tls_encrypt() to always use the same cipher
context without allocating it each time they are called.
When a QUIC connection is accepted, the wrong field is set from the
client's source address, it's the destination instead of the source.
No backport needed.
On qc_detach(), the qcs must clear the conn-stream context and set its
cs pointer to NULL. This prevents the qcs from pointing to a dangling
reference.
Without this, a SEGFAULT may occur in qc_wake_some_streams() when
accessing an already detached conn-stream instance through a qcs.
Here is the SEGFAULT observed on haproxy.org.
Program terminated with signal 11, Segmentation fault.
1234 else if (qcs->cs->data_cb->wake) {
(gdb) p qcs.cs.data_cb
$1 = (const struct data_cb *) 0x0
This can happen since the following patch:
commit fe035eca3a
MEDIUM: mux-quic: report errors on conn-streams
The stream mux buffering has been reworked since the introduction of the
struct qc_stream_desc. A qcs is now able to quickly release its buffer
to the quic-conn.
For replace-path, replace-pathq and replace-uri actions, we must take care
to not match on the selected element if it is not defined.
regex_exec_match2() function expects to be called with a defined
subject. However, if the request path is invalid or not found, the function
is called with a NULL subject, leading to a crash when compiled without the
PRCE/PCRE2 support.
For instance the following rule crashes HAProxy on a CONNECT request:
http-request replace-path /short/(.) /\1
This patch must be backported as far as 2.0.
url_enc() encodes an input string by calling encode_string(). To do so, it
adds a trailing '\0' to the sample string. However it never restores the
sample string at the end. It is a problem for const samples. The sample
string may be in the middle of a buffer. For instance, the HTTP headers
values are concerned.
However, instead of modifying the sample string, it is easier to rely on
encode_chunk() function. It does the same but on a buffer.
This patch must be backported as far as 2.2.
When compiled in debug mode, a BUG_ON triggers an error when the payload is
fully transferred from the http-client buffer to the request channel
buffer. In fact, when channel_add_input() is called, the request buffer is
empty. So an error is reported when those data are directly forwarded,
because we try to add some output data on a buffer with no data.
To fix the bug, we must be sure to call channel_add_input() after the data
transfer.
The bug was introduced by the commit ccc7ee45f ("MINOR: httpclient: enable
request buffering"). So, this patch must be backported if the above commit
is backported.
When a message is sent, we can switch its state to MSG_DONE if all the
announced payload was processed. This way, if the EOM flag is not set on the
message when the last expected data block is processed, the message can
still be set to MSG_DONE state.
This bug is related to the previous ones. There has been a design issue
with the HTX since 2.4. When the EOM HTX block was replaced by a flag, I
tried hard to be sure the flag is always set with the last HTX block of a
message. It works pretty well for all messages received from a client or a
server. But for internal messages, it is not so simple, because applets
cannot always properly handle the end of messages. So, there are some cases
where the EOM flag is set on an empty message.
As a workaround, for chunked messages, we can add an EOT HTX block. It does
the trick. But for messages with a content-length, there is no empty DATA
block. Thus, the only way to be sure the end of the message was reached in
this case is to detect it in the H1 multiplexer.
We already count the amount of data processed when the payload length is
announced. Thus, we must only switch the message to the DONE state when the
last bytes of the payload are received, or of course when the EOM flag is
received.
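In sketch form (H1_MF_CLEN and curr_len follow the H1 parser's usual
naming and are assumptions here):

    /* all announced payload bytes were processed: the message is done
     * even if no EOM flag was seen on the last data block
     */
    if ((h1m->flags & H1_MF_CLEN) && !h1m->curr_len)
        h1m->state = H1_MSG_DONE;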
This patch must be backported as far as 2.4.
In a lua HTTP applet, when the script is finished, we must be sure to not
set the EOM on an empty message. Otherwise, because there is no data to
send, the mux on the client side may miss the end of the message and
consider any shutdown as an abort.
See "UG/MEDIUM: stats: Be sure to never set EOM flag on an empty HTX
message" for details.
This patch must be backported as far as 2.4. On previous version there is
still the EOM HTX block.
During the last call to the stats I/O handler, it is possible to have
nothing to dump. It may happen for many reasons. For instance, all
remaining proxies are disabled or they don't match the specified scope. In
HTML or in JSON, it is not really an issue because there is a footer, so
there are still some data to push into the response channel buffer. In CSV,
it is a problem because there is no footer. It means it is possible to
finish the response with no payload at all in the HTX message. Thus, the
EOM flag may be added on an empty message. When this happens, a shutdown is
performed on an empty HTX message. Because there is nothing to send, the
mux on the client side is not notified that the message was properly
finished and interprets the shutdown as an abort.
The response is chunked, so an abort at this stage means the last CRLF is
never sent to the client. All data were sent but the message is invalid
because the response chunking is not finished. If the response is
compressed, because of a similar bug in the compression filter, the
compression is also aborted and the content is truncated because some data
are lost in the compression filter.
It is a design issue with the HTX. It must be addressed, and there is an
opportunity to do so with the recent conn-stream refactoring, but it must
be carefully evaluated first. In the meantime, and to also fix stable
versions, to work around the bug an end-of-trailers (EOT) HTX block is
systematically added at the end of the message when the EOM flag is set if
the HTX message is empty. This way, there are always some data to send when
the EOM flag is set.
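In sketch form (assuming the usual HTX helpers):

    /* never set EOM alone on an empty message: add an EOT block first so
     * the mux always has something to send along with the flag
     */
    if (htx_is_empty(htx))
        htx_add_endof(htx, HTX_BLK_EOT);
    htx->flags |= HTX_FL_EOM;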
Note that with a H2 client, it is only a problem when the response is
compressed.
This patch should fix issue #1478. It must be backported as far as
2.4. On previous versions there is still the EOM block.
In the FCGI app, when a full response is received, if there is no
content-length and no transfer-encoding header, a content-length header is
automatically added. This avoids, as far as possible, chunking the
response. This trick was added because, most of the time, scripts don't add
those headers.
But this should not be performed for responses to HEAD requests. Indeed, in
this case, there is no payload. If the payload size is not specified, we
must not add it by hand. Otherwise, a "content-length: 0" will always be
added while it is not the real payload size (unknown at this stage).
This patch should solve issue #1639. It must be backported as far as 2.2.
Define a new API to notify the MUX from the quic-conn when the
connection is about to be closed. This happens in the following cases:
- on idle timeout
- on CONNECTION_CLOSE emission or reception
The MUX wake callback is called on these conditions. The quic-conn
QUIC_FL_NOTIFY_CLOSE is set to only report once. On the MUX side,
connection flags CO_FL_SOCK_RD_SH|CO_FL_SOCK_WR_SH are set to interrupt
future emission/reception.
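For illustration, the notification on the quic-conn side could look like
this (a sketch; the exact surrounding checks are assumptions):

    /* report the imminent close to the MUX exactly once */
    if (!(qc->flags & QUIC_FL_NOTIFY_CLOSE) && qc->conn && qc->conn->mux) {
        qc->flags |= QUIC_FL_NOTIFY_CLOSE;
        /* the MUX wake callback then sets CO_FL_SOCK_RD_SH|CO_FL_SOCK_WR_SH */
        qc->conn->mux->wake(qc->conn);
    }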
This patch is the counterpart to
"MEDIUM: mux-quic: report CO_FL_ERROR on send".
Now the quic-conn is able to report its closing, which may be translated
by the MUX into a CO_FL_ERROR on the connection for the upper layer.
This allows the MUX to properly react to the QUIC closing mechanism for
both idle-timeout and closing/draining states.
Complete the error reporting. For each attached stream, if CO_FL_ERROR
is set, mark it with CS_FL_ERR_PENDING|CS_FL_ERROR. This will notify
the upper layer to trigger stream detach and release of the MUX.
This reporting is implemented in a new function qc_wake_some_streams(),
called by qc_wake(). This ensures that a lower-layer error is quickly
reported to the individual streams.
Mark the connection with CO_FL_ERROR on qc_send() if the socket Tx is
closed. This flag is used by the upper layer to order a close on the
MUX. This requires checking CO_FL_ERROR in qcc_is_dead() to proceed to
an immediate MUX free when set.
The qc_wake() callback has been completed. Most notably, it now calls
qc_send() to report a possible CO_FL_ERROR. This is useful because
qc_wake() is called by the quic-conn on imminent closing.
Note that for the moment the error flag can never be set because the
quic-conn does not report when the Tx socket is closed. This will be
implemented in a following patch.
Regroup all features related to sending in qc_send(). This will be
useful when qc_send() is called outside of the io-cb.
Flow-control frames generation is now automatically integrated in
qc_send().
Add a new app layer operation is_active. This can be used by the MUX to
check if the connection can be considered as active or not. This is used
inside qcc_is_dead as a first check.
For example on HTTP/3, if there is at least one bidirectional client
stream opened, the connection is active. This explicitly ignores the uni
streams used for control and qpack as they can never be closed during the
connection lifetime.
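For example, such a callback for H3 could look like this (the counter
layout is an assumption):

    /* the connection is active as long as at least one bidirectional
     * client stream is open; control/qpack uni streams are deliberately
     * ignored since they stay open for the whole connection lifetime
     */
    static int h3_is_active(const struct qcc *qcc, void *ctx)
    {
        return qcc->strms[QCS_CLT_BIDI].nb_streams > 0;
    }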
Improve timeout handling on the MUX. When releasing a stream, first
check if the connection can be considered as dead and should be freed
immediately. This allows resources to be freed faster when possible.
If the connection is still active, ensure there is no attached
conn-stream before scheduling the timeout. To do this, add a nb_cs field
in the qcc structure.
When freeing a quic-conn, the stream resources attached to it must be
cleared. This code is already implemented but the stream buffers were not
deallocated.
Fix this by using the function qc_stream_desc_free. This existing
function centralizes all operations to properly free all stream
elements, attached both to the MUX and the quic-conn.
This fixes a memory leak which can happen for each released connection.
This flag was used to notify the MUX about a CONNECTION_CLOSE frame
reception. It is now unused on the MUX side and can be removed. A new
mechanism to detect quic-conn closing will soon be implemented.
Rationalize the lifetime of the quic-conn with regard to the MUX. The
quic-conn must not be freed if the MUX is still allocated.
This simplifies the MUX code when accessing the quic-conn and removes
possible segfaults.
To implement this, if the quic-conn timer expires, the quic-conn is
released only if the MUX is not allocated. Else, the quic-conn is
flagged with QUIC_FL_CONN_EXP_TIMER. The MUX is then responsible for
calling quic_close() which will free the flagged quic-conn.
New packets received after sending a CONNECTION_CLOSE frame trigger a new
CONNECTION_CLOSE frame to be sent. Each time such a frame is sent we
increase the number of packets required to send another CONNECTION_CLOSE
frame.
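In sketch form (counter and helper names are assumptions), this gives an
exponentially spaced retransmission of the closing frame:

    /* answer at most every <nb_pkt_for_cc> received packets and double
     * that threshold each time, so a noisy peer cannot make us flood
     * with CONNECTION_CLOSE frames
     */
    if (++qc->nb_pkt_since_cc >= qc->nb_pkt_for_cc) {
        qc_send_connection_close(qc);   /* assumed helper */
        qc->nb_pkt_for_cc <<= 1;
        qc->nb_pkt_since_cc = 0;
    }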
Rearm the idle timer only once when sending a CONNECTION_CLOSE frame.
In case an error provokes the release of the applet, we will never call
the end callback of the httpclient.
In the case of a lua script, it would mean that the lua task will never
be woken up after a yield, leaving the lua script stuck forever.
Fix the issue by moving the callback from the end of the iohandler to
the applet release function.
Must be backported to 2.5.
The request buffering is required for doing l7 retries. The IO handler of
the httpclient needs to be reworked for that.
This patch changes the IO handler so it partially copies the data instead
of swapping buffers. This is needed because b_xfer() never works if the
destination buffer is not empty, which is the case when buffering.
There were empty lines in the output of "show ssl ca-file <cafile>" and
"show ssl crl-file <crlfile>" commands when an empty line should only
mark the end of the output. This patch adds a space to those lines.
This patch should be backported to 2.5.
ssl_store_load_locations_file() is using X509_get_default_cert_dir()
when using '@system-ca' as a parameter.
This function could return NULL if OpenSSL was built with X509_CERT_DIR
set to NULL; this is uncommon, but let's fix it.
No backport needed, 2.6 only.
Fix issue #1637.
This new converter is similar to the concat converter and can be used to
build new variables made of a succession of other variables, but the main
difference is that it checks whether adding a delimiter makes sense, as it
wouldn't if e.g. the current input sample is empty. That situation would
require 2 separate rules using the concat converter, where the first rule
would have to check if the current sample string is empty before adding a
delimiter. This resolves GitHub issue #1621.
The previous patch was accidentally breaking upon an error when iterating
through a CA directory. This is not the expected behavior; the function
must continue processing the other files after the warning.
This patch implements the ability to load a certificate directory with
the "ca-file" directive.
The X509_STORE_load_locations() API does not allow caching a directory
in memory at startup, it only references the directory to allow a lookup
of the files when needed. But that is not compatible with the way
HAProxy works, without any access to the filesystem.
The current implementation loads every ".pem", ".crt", ".cer", and
".crl" available in the directory which is what is done when using
c_rehash and X509_STORE_load_locations(). Those files are cached in the
same X509_STORE referenced by the directory name. When looking at "show ssl
ca-file", everything will be shown in the same entry.
This will eventually make it easier to load the system's CAs, which
could already be done with "ca-file /etc/ssl/certs" in the
configuration.
Loading failures intentionally emit a warning instead of an alert,
letting HAProxy start when one of the files can't be loaded.
Known limitations:
- There is a bug in "show ssl ca-file", once the buffer is full, the
iohandler is not called again to output the next entries.
- The CLI API is kind of limited with this, since it does not allow to
add or remove an entry in a particular ca-file. And with a lot of
CAs you can't push them all in a buffer. It probably needs an "add ssl
ca-file" like it's done with the crt-list.
Fix issue #1476.
As suggested in the comment, it's possible to read 32 bits at once in
big-endian order, and now we have the functions to do this, so let's
do it. This reduces the code on the fast path by 31 bytes on x86, and
more importantly performs single-operation 32-bit reads instead of
playing with shifts and additions.
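For illustration, such a helper can be written as follows (a standalone
sketch, not necessarily the exact function used here):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    /* single 32-bit big-endian read instead of four shifts and additions */
    static inline uint32_t read_be32(const void *p)
    {
        uint32_t v;

        memcpy(&v, p, sizeof(v)); /* safe even for unaligned input */
        return ntohl(v);          /* byte-swap on little-endian hosts */
    }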
As reported in issue #1635, there's a subtle sign change when shifting
a uint8_t value to the left, because integer promotion first turns any
smaller type into signed int *even if it was unsigned*. A warning was
reported about a uint8_t shifted left by 24 bits not fitting in an int
due to this.
It was verified that the emitted code didn't change, as expected, but
at least this allows silencing the code checkers. There's no need to
backport this.
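A standalone illustration of the promotion pitfall and the usual fix:

    #include <stdint.h>

    void demo(void)
    {
        uint8_t b = 0x80;

        /* <b> is first promoted to (signed) int, so the shift moves a
         * bit into the sign position; code checkers rightfully complain
         */
        uint32_t bad  = (uint32_t)(b << 24);

        /* widening to an unsigned type before shifting stays well-defined */
        uint32_t good = (uint32_t)b << 24;

        (void)bad; (void)good;
    }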
This one has been detected by valgrind:
==2179331== Conditional jump or move depends on uninitialised value(s)
==2179331== at 0x1B6EDE: qcs_notify_recv (mux_quic.c:201)
==2179331== by 0x1A17C5: qc_handle_uni_strm_frm (xprt_quic.c:2254)
==2179331== by 0x1A1982: qc_handle_strm_frm (xprt_quic.c:2286)
==2179331== by 0x1A2CDB: qc_parse_pkt_frms (xprt_quic.c:2550)
==2179331== by 0x1A6068: qc_treat_rx_pkts (xprt_quic.c:3463)
==2179331== by 0x1A6C3D: quic_conn_app_io_cb (xprt_quic.c:3589)
==2179331== by 0x3AA566: run_tasks_from_lists (task.c:580)
==2179331== by 0x3AB197: process_runnable_tasks (task.c:883)
==2179331== by 0x357E56: run_poll_loop (haproxy.c:2750)
==2179331== by 0x358366: run_thread_poll_loop (haproxy.c:2921)
==2179331== by 0x3598D2: main (haproxy.c:3538)
==2179331==
This should be useful to get an idea of the frames which could be built,
compared to the list of available frames, when building packets. Same
thing for the frames which could not be built because of a lack of room
in the TX buffer.
During a handshake, after having prepared a probe upon a PTO expiration from
process_timer(), we wake up the I/O handler to make it send probing packets.
This handler first treats incoming packets, which may trigger a fast
retransmission, leading to sending too many (duplicated) probing packets. In
this case we cancel the fast retransmission.
When discarding a packet number space, we at least reset the PTO backoff
counter. Doing this several times has an impact on the PTO duration
calculation. We must not discard a packet number space several times (this
is already the case for the handshake packet number space).
Before having a look at the next encryption level to build packets when
there are no more ack-eliciting frames to send, we must check that we no
longer have to probe from the current encryption level. Otherwise we only
send one datagram instead of two, giving less chance to recover from packet
loss.
Due to an erroneous interpretation of RFC 9000 (quic-transport), ACK frames
were only sent after having received two ack-eliciting packets.
This could trigger useless retransmissions for tail packets on the peer side.
From now on, we send ACK frames as soon as we have ACKs to send, in the same
packets as the ack-eliciting frames, and we also send ACK frames after
having received 2 ack-eliciting packets since the last time we sent an ACK
frame with other ack-eliciting frames.
As such variables are handled by the QUIC connection I/O handler, which
always runs on the same thread, there is no need to continue to use atomic
operations on them.
This bug came with this commit:
1fc5e16c4 MINOR: quic: More accurate immediately close
As mentioned in this commit, we do not want to derive any more secrets when
in closing state. But the flag which denotes that secrets were derived was
set anyway. Add a label at the right place to skip the secrets derivation
without setting this flag.
Over time we've tried hard to abstract connection errors from the upper
layers so that they're reported per stream and not per connection. As
early as 1.8-rc1, commit 4ff3b8964 ("MINOR: connection: make conn_stream
users also check for per-stream error flag") did precisely this, but
strangely only for Rx, not for Tx (probably because by then send errors
were not imagined to be reported that way).
And this lack of Tx error check was just revealed in 2.6 by recent commit
d1480cc8a ("BUG/MEDIUM: stream-int: do not rely on the connection error
once established") which causes wakeup loops between si_cs_send() failing
to send via mux_pt_snd_buf() and subscribing again via si_cs_io_cb(),
because the function now rightfully only checks for CS_FL_ERROR and not
CO_FL_ERROR.
As found by Amaury, this makes an aborted "show events -w" cause haproxy
to loop at 100% CPU.
This fix theoretically needs to be backported to all versions, though
it will be necessary and sufficient to backport it wherever 4ff3b8964
gets backported.
The list of streams was modified in 2.4 to become per-thread with commit
a698eb673 ("MINOR: streams: use one list per stream instead of a global
one"). However the change applied to cli_parse_shutdown_session() is
wrong, as it uses the nullity of the stream pointer to continue on next
threads, but this one is not null once the list_for_each_entry() loop
is finished, it points to the list's head again, so the loop doesn't
check other threads, and no message is printed either to say that the
stream was not found.
Instead we should check if the stream is equal to the requested pointer
since this is the condition to break out of the loop.
Thus must be backported to 2.4. Thanks to Maciej Zdeb for reporting this.
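A sketch of the corrected loop (list and helper names are assumptions):

    /* after list_for_each_entry() the cursor is never NULL, it wraps
     * back to the list head, so the correct test is pointer equality
     * with the searched stream
     */
    int found = 0;

    for (thr = 0; !found && thr < global.nbthread; thr++) {
        list_for_each_entry(strm, &streams[thr], list) {
            if (strm == ptr) {
                found = 1;
                break;
            }
        }
    }
    if (!found)
        return cli_err(appctx, "No such session (use 'show sess').\n");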
The new qc_stream_desc type has a tree node for storage. Thus, we can
remove the node from the qcs structure.
When initializing a new stream, it is stored into the qcc streams_by_id
tree. When the MUX releases it, it will be freed as soon as its buffer is
emptied. Before this, the quic-conn is responsible for storing it inside
its own streams_by_id tree.
Move the xprt-buf and ack related fields from qcs to the qc_stream_desc
structure. In exchange, qcs has a pointer to the low-level stream. For
each new qcs, a qc_stream_desc is automatically allocated.
This simplifies the transport layer by removing qcs/mux manipulation
during ACK frame parsing. An additional check is done to not notify the
MUX on sending if the stream is already released: this case may now
happen on retransmission.
To complete this change, the quic_stream frame now references the
qc_stream_desc instance instead of a qcs.
Currently, the mux qcs streams manage the Tx buffering, even after
sending it to the transport layer. Buffers are emptied when
acknowledgements are treated by the transport layer. This complicates the
MUX release and we may lose some data after the MUX is freed.
Change this paradigm by moving the buffering to the transport layer. For
this goal, a new type is implemented as a low-level stream at the
transport layer, as a counterpart of qcs mux instances. This structure
is called qc_stream_desc. This will allow freeing the qcs/qcc instances
without having to wait for acknowledgement reception.
For the moment, the quic-conn is responsible for storing the qc_stream_desc
in a new tree named streams_by_id. This will slightly change in the next
commits to remove the qcs node which has a similar purpose:
qc_stream_desc instances will be shared between the qcc MUX and the
quic-conn.
This patch only introduces the new type definition and the function to
manipulate it. The following commit will bring the rearchitecture in the
qcs structure.
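For illustration, the new structure could look like this (field names
are assumptions, only the roles matter):

    /* transport-level counterpart of a qcs, kept alive until all of its
     * emitted data has been acknowledged
     */
    struct qc_stream_desc {
        struct eb64_node by_id;  /* node in the quic-conn streams_by_id tree */
        struct buffer buf;       /* Tx data waiting for acknowledgement */
        uint64_t ack_offset;     /* last contiguously acknowledged offset */
        int release;             /* set once the MUX-side qcs is freed */
        void *ctx;               /* back-pointer to the qcs, if still there */
    };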
Remove qcs instances left during qcc MUX release. This can happen when
the MUX is closed before the completion of all the transfers, such as on
a timeout or process termination.
This fixes some memory leaks on the connection.
Implement the release app-ops ops for H3 layer. This is used to clean up
uni-directional streams and the h3 context.
This prevents a memory leak on H3 resources for each connection.
Define a new callback release inside qcc_app_ops. It is called when the
qcc MUX is freed via qc_release. This will allow implementing cleanup at
the app layer.
Regroup some cleaning operations inside a new function qcs_free. This
can be used for all streams, both through qcs_destroy and with
uni-directional streams.
The quic_stream frame stores the qcs instance. On ACK parsing, the qcs is
accessed to clear the stream buffer. This can cause a segfault if the
MUX or the qcs is already released.
Consider the following scenario:
1. a STREAM frame is generated by the MUX;
   the transport layer emits the frame with PKN=1;
   the upper layer has finished the transfer, so the related qcs is
   detached
2. the transport layer re-emits the frame with PKN=2 because its ACK was
   not received
3. the ACK for PKN=1 is received and the stream buffer is cleared;
   at this stage, the qcs may be freed by the MUX as it is detached
4. the ACK for PKN=2 is received;
   the qcs for the STREAM frame is dereferenced, which leads to a crash
To prevent this, the qcs is never accessed through the quic_stream during
ACK parsing. Instead, a lookup is done on the MUX streams tree. If the MUX
is already released, no lookup is done. These checks prevent a possible
segfault.
This change may have an impact on performance as we are now forced to use
a tree lookup operation. If this is the case, an alternative solution may
be to implement a refcount on qcs instances.
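For illustration, the ACK-side lookup could look like this (names are
assumptions):

    /* resolve the stream by its id at ACK time instead of trusting a
     * stored pointer which may now be dangling
     */
    struct eb64_node *node = NULL;

    if (qc->qcc)    /* MUX still allocated? */
        node = eb64_lookup(&qc->qcc->streams_by_id, strm_frm->id);
    if (!node)
        return;     /* stream already released: nothing to clear */
    qcs = eb64_entry(node, struct qcs, by_id);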
The CertCache.set() function allows to update an SSL certificate file
stored in the memory of the HAProxy process. This function does the same
as "set ssl cert" + "commit ssl cert" over the CLI.
This could be used to update the crt and key, as well as the OCSP, the
SCTL, and the OCSP issuer.
The implementation yields every 10 ckch instances, the same way the
"commit ssl cert" command does.
Simplify the "cert_exts" array which is used for the selection of the
parsing function depending on the extension.
It now uses a pointer to an array element instead of an index, which is
simpler for the declaration of the array.
This way also allows multiple extensions to use the same type.
Extract the code that replace the ckch_store and its dependencies into
the ckch_store_replace() function.
This function must be used under the global ckch lock.
It frees everything related to the old ckch_store.
Note that we cannot reuse dump_act_rules() because the output format
may be adjusted depending on the call place (this is also used from
haproxy -vv). The principle is the same however.
There are very few but they're registered from constructors, hence
in a random order. The scope had to be copied when retrieving the
next keyword. Note that this also has the effect of listing them
sorted in haproxy -vv.
Like for previous keyword classes, we're sorting the output. But this
time, as it's not trivial to do with multiple words, we instead proceed
like the help command: we sort them on their usage message when present,
and fall back to the first word of the command when there is no usage
message (e.g. the "help" command).
It's much more convenient to sort these keywords on output to detect
changes, and it's easy to do. The patch looks big but most of it is
only caused by an indent change in the loop, as "git diff -b" shows.
The output produced by dump_registered_keywords() really deserves to be
sorted in order to ease comparisons. The function now implements a tiny
sorting mechanism that's suitable for each two-level list, and makes
use of dump_act_rules() to dump rulesets. The code is not significantly
more complicated and some parts (e.g options) could even be factored.
The output is much more exploitable to detect differences now.
The new function dump_act_rules() now dumps the list of actions supported
by a ruleset. These actions are alphanumerically sorted first so that the
produced output is easy to compare.
When trying to sort sets of strings, it's often required to compare 3
strings to see if the chosen one fits well between the two others. That's
what this function does, in addition to being able to ignore extremities
when they're NULL (typically for the first iteration for example).
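A minimal sketch of such a 3-string ordering helper (the actual name and
semantics may differ slightly):

    #include <string.h>

    /* returns non-zero if <cur> sorts between <prev> and <next>; a NULL
     * extremity is ignored, acting as an open bound (e.g. on the first
     * iteration)
     */
    static int ordered3(const char *prev, const char *cur, const char *next)
    {
        if (prev && strcmp(prev, cur) > 0)
            return 0;
        if (next && strcmp(cur, next) > 0)
            return 0;
        return 1;
    }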
Similar to the sample fetch keywords, let's also list the converter
keywords. They're much simpler since there's no compatibility matrix.
Instead the input and output types are listed. This is called by
dump_registered_keywords() for the "cnv" keywords class.
New function smp_dump_fetch_kw lists registered sample fetch keywords
with their compatibility matrix, mandatory and optional argument types,
and output types. It's called from dump_registered_keywords() with class
"smp".
New function acl_dump_kwd() dumps the registered ACL keywords and their
sample-fetch equivalent to stdout. It's called by dump_registered_keywords()
for keyword class "acl".
New function cli_list_keywords() scans the list of registered CLI keywords
and dumps them on stdout. It's now called from dump_registered_keywords()
for the class "cli".
Some keywords are valid for the master, they'll be suffixed with
"[MASTER]". Others are valid for the worker, they'll have "[WORKER]".
Those accessible only in expert mode will show "[EXPERT]" and the
experimental ones will show "[EXPERIM]".
When no output stream is passed, stdout is used with one entry per line,
and this is called from dump_registered_keywords() when passed the class
"svc".
When passing a NULL output buffer the function will now dump to stdout
with a more compact format that is more suitable for machine processing.
An entry was added to dump_registered_keywords() to call it when the
keyword class "flt" is requested.