Avoid loss of precision when computing K cubic value.
Same issue when computing the congestion window value from cubic increase function
formula with possible integer varaiable wrap around.
Depends on this commit:
MINOR: quic: Code clarifications for QUIC CUBIC (RFC 9438)
Must be backported as far as 2.6.
The first version of our QUIC CUBIC implementation is confusing because relying on
TCP CUBIC linux kernel implementation and with references to RFC 8312 which is
obsoleted by RFC 9438 (August 2023) after our implementation. RFC 8312 is a little
bit hard to understand. RFC 9438 arrived with much more clarifications.
So, RFC 9438 is about "CUBIC for Fast Long-Distance Networks". Our implementation
for QUIC is not very well documented. As it was difficult to reread this
code, this patch adds only some comments at complicated locations and rename
some macros, variables without logic modifications at all.
So, the aim of this patch is to add first some comments and variables/macros renaming
to avoid embedding too much code modifications in the same big patch.
Some code modifications will come to adapt this CUBIC implementation to this new
RFC 9438.
Rename some macros:
CUBIC_BETA -> CUBIC_BETA_SCALED
CUBIC_C -> CUBIC_C_SCALED
CUBIC_BETA_SCALE_SHIFT -> CUBIC_SCALE_FACTOR_SHIFT (this is the scaling factor
which is used only for CUBIC_BETA_SCALED)
CUBIC_DIFF_TIME_LIMIT -> CUBIC_TIME_LIMIT
CUBIC_ONE_SCALED was added (scaled value of 1).
These cubic struct members were renamed:
->tcp_wnd -> ->W_est
->origin_point -> ->W_target
->epoch_start -> ->t_epoch
->remaining_tcp_inc -> remaining_W_est_inc
Local variables to quic_cubic_update() were renamed:
t -> elapsed_time
diff ->t
delta -> W_cubic_t
Add a grahpic curve about the CUBIC Increase function.
Add big copied & pasted RFC 9438 extracts in relation with the 3 different increase
function regions.
Same thing for the fast convergence.
Fix a typo about the reference to QUIC RFC 9002.
Must be backported as far as 2.6 to ease any further modifications to come.
This commit adds support for an optional second argument to BUG_ON(),
WARN_ON(), CHECK_IF(), that can be a constant string. When such an
argument is given, it will be printed on a second line after the
existing first message that contains the condition.
This can be used to provide more human-readable explanations about
what happened, such as "too low on memory" or "memory corruption
detected" that may help a user resolve the incident by themselves.
The ABORT_NOW() macro is not much used since we have BUG_ON(), but
there are situations where it makes sense, typically if the program
must always die regardless od DEBUG_STRICT, or if the condition must
always be evaluated (e.g. decompress something and check it).
It's not convenient not to have any hint about what happened there. But
providing too much info also results in wiping some registers, making
the trace less exploitable, so a compromise must be found.
What this patch does is to provide the support for an optional argument
to ABORT_NOW(). When an argument is passed (a string), then a message
will be emitted with the file name, line number, the message and a
trailing LF, before the stack dump and the crash. It should be used
reasonably, for example in functions that have multiple calls that need
to be more easily distinguished.
If we were to enable 'ocsp-update' on a certificate that does not have
an OCSP URI, we would exit ssl_sock_load_ocsp with a negative error code
which would raise a misleading error message ("<cert> has an OCSP URI
and OCSP auto-update is set to 'on' ..."). This patch simply fixes the
error message but an error is still raised.
This issue was raised in GitHub #2432.
It can be backported up to branch 2.8.
As seen in previous commit 59acb27001 ("BUILD: quic: Variable name typo
inside a BUG_ON()."), it can sometimes happen that with DEBUG forced
without DEBUG_STRICT, BUG_ON() statements are ignored. Sadly, it means
that typos there are not even build-tested.
This patch makes these statements reference sizeof(cond) to make sure
the condition is parsed. This doesn't result in any code being emitted,
but makes sure the expression is correct so that an issue such as the one
above will fail to build (which was verified).
This may be backported as it can help spot failed backports as well.
Since d480b7b ("MINOR: debug: make ABORT_NOW() store the caller's line
number when using abort"), building with 'DEBUG_USE_ABORT' fails with:
|In file included from include/haproxy/api.h:35,
| from include/haproxy/activity.h:26,
| from src/ev_poll.c:20:
|include/haproxy/thread.h: In function ‘ha_set_thread’:
|include/haproxy/bug.h:107:47: error: expected ‘;’ before ‘_with_line’
| 107 | #define ABORT_NOW() do { DUMP_TRACE(); abort()_with_line(__LINE__); } while (0)
| | ^~~~~~~~~~
|include/haproxy/bug.h:129:25: note: in expansion of macro ‘ABORT_NOW’
| 129 | ABORT_NOW(); \
| | ^~~~~~~~~
|include/haproxy/bug.h:123:9: note: in expansion of macro ‘__BUG_ON’
| 123 | __BUG_ON(cond, file, line, crash, pfx, sfx)
| | ^~~~~~~~
|include/haproxy/bug.h:174:30: note: in expansion of macro ‘_BUG_ON’
| 174 | # define BUG_ON(cond) _BUG_ON (cond, __FILE__, __LINE__, 3, "FATAL: bug ", "")
| | ^~~~~~~
|include/haproxy/thread.h:201:17: note: in expansion of macro ‘BUG_ON’
| 201 | BUG_ON(!thr->ltid_bit);
| | ^~~~~~
|compilation terminated due to -Wfatal-errors.
|make: *** [Makefile:1006: src/ev_poll.o] Error 1
This is because of a leftover: abort()_with_line(__LINE__);
^^
Fixing it by removing the extra parentheses after 'abort' since the
abort() call is now performed under abort_with_line() helper function.
This was raised by Ilya in GH #2440.
No backport is needed, unless the above commit gets backported.
Fetch will return true if the stream underwent a redispatch according to
"option redispatch" setting upon retries.
Documentation was added, and the "%rc" logformat alternative now mentions
the new fetch to properly emulate the logformat behavior.
This build issued was introduced by this previous commit which is a bugfix:
BUG/MINOR: quic: Wrong ack ranges handling when reaching the limit.
A BUG_ON() referenced <fist> variable in place of <first>.
Must be backported as far as 2.6 as the previous commit.
Acknowledgements ranges are used to build ACK frames. To avoid allocating too
much such objects, a limit was set to 32(QUIC_MAX_ACK_RANGES) by this commit:
MINOR: quic: Do not allocate too much ack ranges
But there is an inversion when removing the oldest range from its tree.
eb64_first() must be used in place of eb64_last(). Note that this patch
only does this modification in addition to rename <last> variable to <first>.
This bug leads such a h2load command to block when a request ends up not
being acknowledged by haproxy even if correctly served:
/opt/nghttp2/build/bin/h2load --alpn-list h3 -t 1 -c 1 -m 1 -n 100 \
https://127.0.0.1/?s=5m
There is a remaining question to be answered. In such a case, haproxy refuses to
reopen the stream, this is a good thing but should not haproxy ackownledge the
request (because correctly parsed again).
Note that to be easily reproduced, this setting had to be applied to the client
network interface:
tc qdisc add dev eth1 root netem delay 100ms 1s loss random
Must be backported as far as 2.6.
As noticed in this thread, some bogus configurations are not always easy
to spot: https://www.mail-archive.com/haproxy@formilux.org/msg44558.html
Here it was about config keywords being used in ACL patterns where strings
were expected, hence they're always valid.
Since we have the diag mode (-dD) we can perform some extra checks when
it's used, and emit them to suggest the user there might be an issue.
Here we detect a few common words (logic such as "and"/"or"/"||" etc),
C++/JS comments mistakenly used to try to isolate final args, and words
that have the exact name of a sample fetch or an ACL keyword. These checks
are only done in diag mode of course.
Final diags were added in 2.4 by commit 5a6926dcf ("MINOR: diag: create
cfgdiag module"), but it's called too late in the startup process,
because when "-c" is passed, the call is not made, while it's its primary
use case. Let's just move the call earlier.
Note that currently the check in this function is limited to verifying
unicity of server cookies in a backend, so it can be backported as far
as 2.4, but there is little value in insisting if it doesn't backport
easily.
Diag warnings were added in 2.4 by commit 7b01a8dbd ("MINOR: global:
define diagnostic mode of execution") but probably due to the split
function that checks for the mode, they did not reuse the emission of
the version string before the first warning, as was brought in 2.2 by
commit bebd21206 ("MINOR: init: report in "haproxy -c" whether there
were warnings or not"). The effet is that diag warnings are emitted
before the version string if there is no other warning nor error. Let's
just proceed like for the two other ones.
This can be backported to 2.4, though this is of very low importance.
Placing DO_NOT_FOLD() before abort() only works in -O2 but not in -Os which
continues to place only 5 calls to abort() in h3.o for call places. The
approach taken here is to replace abort() with a new function that wraps
it and stores the line number in the stack. This slightly increases the
code size (+0.1%) but when unwinding a crash, the line number remains
present now. This is a very low cost, especially if we consider that
DEBUG_USE_ABORT is almost only used by code coverage tools and occasional
debugging sessions.
As indicated in previous commit, we don't want calls to ha_crash_now()
to be merged, since it will make gdb return a wrong line number. This
was found to happen with gcc 4.7 and 4.8 in h3.c where 26 calls end up
as only 5 to 18 "ud2" instructions depending on optimizations. By
calling DO_NOT_FOLD() just before provoking the trap, we can reliably
avoid this folding problem. Note that this does not address the case
where abort() is used instead (DEBUG_USE_ABORT).
Modern compilers sometimes perform function tail merging and identical
code folding, which consist in merging identical occurrences of same
code paths, generally final ones (e.g. before a return, a jump or an
unreachable statement). In the case of ABORT_NOW(), it can happen that
the compiler merges all of them into a single one in a function,
defeating the purpose of the check which initially was to figure where
the bug occurred.
Here we're creating a DO_NO_FOLD() macro which makes use of the line
number and passes it as an integer argument to an empty asm() statement.
The effect is a code position dependency which prevents the compiler
from merging the code till that point (though it may still merge the
following code). In practice it's efficient at stopping the compilers
from merging calls to ha_crash_now(), which was the initial purpose.
It may also be used to force certain optimization constructs since it
gives more control to the developer.
First, checks on the resolver scope were added. Then, because of the recent
changes, the logs emitted by vtest are now too big and this makes the script
fails. So tests on NaN values are now performed on a smaller request. This
reduces enough the logs to pass.
In fact some checks were removed
It is now possible to filter the metrics on their name, by listing
explicitly metrics to dump or on the opposite to exclude only some metrics
from the dump. To do so, a comma-separated list of metrics must be
specified. If a name is preceded by a minus (-), the metric is excluded from
the dump. If at least one metric is specified to be explicitly dumped, all
metrics are no longer dumped, but only those explicitly listed.
The list is specified via one or more "metrics" parameters in the uri
query-string. For insance:
# Dumped all metrics, except "haproxy_server_check_status"
/metrics?metrics=-haproxy_server_check_status
# Only dump frontends, backends and servers status
/metrics?metrics=haproxy_frontend_status,haproxy_backend_status,haproxy_server_status
Included and Excluded metrics can be mixed. Only the intersection will be
dumped.
This patch should fix the issue #770.
It is easier this way, especially for promex modules. And because name and
description are now explicitly passed to this function, there is no reason
to still pass the metric, its type is enough. The function is easier to read
this way.
In Prometheus, a time series a stream of timestamped values belonging to the
same metric and the same set of labeled dimensions. Thus the exporter dump
time-series and not metrics.
Thus, promex_dump_metric(), promex_dump_metric_header() and
promex_metric_to_str() functions were renamed to replace "metric"
Just like for stick-tables, this patch adds a promex module to dump
resolvers metrics. It adds the "resolver" scope and for now, it dumps
folloowing metrics:
* haproxy_resolver_sent
* haproxy_resolver_send_error
* haproxy_resolver_valid
* haproxy_resolver_update
* haproxy_resolver_cname
* haproxy_resolver_cname_error
* haproxy_resolver_any_err
* haproxy_resolver_nx
* haproxy_resolver_timeout
* haproxy_resolver_refused
* haproxy_resolver_other
* haproxy_resolver_invalid
* haproxy_resolver_too_big
* haproxy_resolver_outdated
This patch adds a dump loop on the registered modules. It is very similar to
other dump loops. When a module registered, a implicit scope is created with
the module's name. It means a module name must be unique. It also means,
metrics dump of modules can be filtered via the "scope" parameter.
In this patch we add a registration mechanism for modules. To do so, a
module must defined the "promex_module" structure. The dump itself will be
based on 2 contexts. One for all the dump and another one for each metric
time-series. These contexts are used as restart points when the dump is
interrupted.
Modules must also implement 6 callback functions:
* start_metric_dump(): It is an optional callback function. If defined, it
is responsible to initialize the dump context use
as the first restart point.
* stop_metric_dump(): It is an optional callback function. If defined, it
is responsible to deinit the dump context.
* metric_info(): This one is mandatory. It returns the info about the
metric: name, type and flags and descrition.
* start_ts(): This one is mandatory, it initializes the context for a time
series for a given metric. This context is the second
restart point.
* next_ts(): This one is mandatory. It interates on time series for a
given metrics. It is also responsible to handle end of a
time series and deinit the context.
* fill_ts(): It fills info on the time series for a given metric : the
labels and the value.
In addition, a module must set its name and declare the number of metrics is
exposed.
Instead of using typed pointers to save the restart points we know use
generic pointers. 4 pointers can be saved now. This replaces the 5 typed
pointers used before. So, we save 8-bytes but it is also more generic and
this will be used by the promex modules.
It was not an issue since now, be a way to register modules on promex will
be added. Thus it is important to add some extra checks. Here, we take care
to never dump more than the max labels allowed.
The context of the promex applet was extended to support the dump of extra
counters. These counters are not dumped yet, but info to interrupt and
restart the dump are required. The stats module and the relative field
number for this module can now be saved.
In addition support for "extra-counters" parameter was added on the
query-string to dump these counters. Otherwise, no extra-counters are
dumped.
When a metric is dumped, it is now possible to specify a custom
description. We will add the support for extra counters. The list of these
counters is retrived dynamically. Thus the description must be dynamic
too. Note it was already possible to customize the metric name.
It is now possible to selectively retrieve extra counters from stats
modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to
return a specific counter.
The list of modules registered on the stats to expose extra counters is now
public. It is required to export these counters into the Prometheus
exporter.
set-bc-{mark,tos} actions are pretty similar to set-fc-{mark,tos} to set
mark/tos on packets sent from haproxy to server: set-bc-{mark,tos} actions
act on the whole backend/srv connection: from connect() to connection
teardown, thus they may only be used before the connection to the server
is instantiated, meaning that they are only relevant for request-oriented
rules such as tcp-request or http-request rules. For now their use is
limited to content request rules, because tos and mark informations are
stored directly within the stream, thus it is required that the stream
already exists.
stream flags are used in combination with dedicated stream struct members
variables to pass 'tos' and 'mark' informations so that they are correctly
considered during stream connection assignment logic (prior to connecting
to actually connecting to the server)
'tos' and 'mark' fd sockopts are taken into account in conn hash
parameters for connection reuse mechanism.
The documentation was updated accordingly.
In this patch we add the possibility to use sample expression as argument
for set-fc-{mark,tos} actions. To make it backward compatible with
previous behavior, during parsing we first try to parse the value as
as integer (decimal or hex notation), and then fallback to expr parsing
in case of failure.
The documentation was updated accordingly.
This is a complementary patch to "MINOR: tcp-act: Rename "set-{mark,tos}"
to "set-fc-{mark,tos}"", but for the Lua API.
set_mark and set_tos were kept as aliases for set_fc_mark and set_fc_tos
but they were marked as deprecated.
Using this opportunity to reorder set_mark and set_tos by alphabetical
order.
"set-mark" and "set-tos" only alter packets from haproxy to client
(frontend connection). Since we may add support for equivalent keywords
on server side, we rename them with an explicit name to prevent
confusions.
Thus, we rename:
- "set-mark" to "set-fc-mark"
- "set-tos" to "set-fc-tos"
"set-mark" and "set-tos" were kept as aliases (to "set-fc-mark" and
"set-fc-tos" respectively) for now to prevent config breakage, but they
have been marked as deprecated so they can be removed in future version.
Some CPU time is needlessly wasted in conn_calculate_hash(), because all
params are first copied into a temporary buffer before computing the
hash on the whole buffer. Instead, let's leverage the XXH progressive
hash update functions to avoid expensive memcpys.
0x00000008 bit for CO_FL_* flags is no more unused since 8cc3fc73f1
("MINOR: connection: update rhttp flags usage"). Removing the comment
that says otherwise.
A major reorganization of QUIC MUX sending has been implemented. Now
data transfer occur over a single QCS buffer. This has improve
performance but at the cost of restrictions on snd_buf. Indeed, buffer
instances are now shared from stream callback snd_buf up to quic-conn
layer.
As such, snd_buf cannot manipulate freely already present data buffer.
In particular, realign has been completely removed by the previous
patches.
This commit reintroduces a partial realign support. This is only done if
the buffer contains only unsent data, via a new MUX function
qcc_realign_stream_txbuf() which is called during snd_buf.
This commit is a direct follow-up on the major rearchitecture of send
buffering. This patch implements the proper handling of connection pool
buffer temporary exhaustion.
The first step is to be able to differentiate a fatal allocation error
from a temporary pool exhaustion. This is done via a new output argument
on qcc_get_stream_txbuf(). For a fatal error, application protocol layer
will schedule the immediate connection closing. For a pool exhaustion,
QCC is flagged with QC_CF_CONN_FULL and stream sending process is
interrupted. QCS instance is also registered in a new list
<qcc.buf_wait_list>.
A new connection buffer can become available when all ACKs are received
for an older buffer. This process is taken in charge by quic-conn layer.
It uses qcc_notify_buf() function to clear QC_CF_CONN_FULL and to wake
up every streams registered on buf_wait_list to resume sending process.
This commit is a direct follow-up on the major rearchitecture of send
buffering. It allows application protocol to react if current QCS
sending buffer space is too small. In this case, the buffer can be
released to the quic-conn layer. This allows to allocate a new QCS
buffer and retry HTX parsing, unless connection buffer pool is already
depleted.
A new function qcc_release_stream_txbuf() serves as API for app protocol
to release the QCS sending buffer. This operation fails if there is
unsent data in it. In this case, MUX has to keep it to finalize transfer
of unsent data to quic-conn layer. QCS is thus flagged with
QC_SF_BLK_MROOM to interrupt snd_buf operation.
When all data are sent to the quic-conn layer, QC_SF_BLK_MROOM is
cleared via qcc_streams_sent_done() and stream layer is woken up to
restart snd_buf.
Note that a new function qcc_stream_can_send() has been defined. It
allows app proto to check if sending is currently blocked for the
current QCS. For now, it checks QC_SF_BLK_MROOM flag. However, it will
be extended to other conditions with the following patches.