This now sets the cpu-policy to "first-usable-node" by default, so that
we preserve the current default behavior of binding to the first node
if nothing was forced. If a second node is found, global.nbthread is set
and the previous code will be skipped.
This is a reimplementation of the current default policy. It binds to
the first node having usable CPUs, if any, and drops CPUs from the
second and subsequent nodes.
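A minimal sketch of the policy described above, assuming a hypothetical flat array of CPU descriptors where each entry carries its NUMA node and a usable flag. The names `struct cpu_desc`, `node_id`, `usable` and `first_usable_node()` are illustrative only, not HAProxy's actual topology structures, and "first" is taken here to mean the lowest node ID that still has a usable CPU.

```c
#include <stddef.h>

/* Hypothetical per-CPU descriptor (not HAProxy's real topology layout). */
struct cpu_desc {
	int node_id;  /* NUMA node this CPU belongs to */
	int usable;   /* non-zero if the CPU may be bound to */
};

/* Keep only the CPUs of the first node having at least one usable CPU and
 * mark all CPUs from the other nodes as unusable. Returns the selected node
 * ID, or -1 if no CPU is usable at all.
 */
int first_usable_node(struct cpu_desc *cpus, size_t nb)
{
	int node = -1;
	size_t i;

	/* find the lowest node ID that has a usable CPU */
	for (i = 0; i < nb; i++) {
		if (cpus[i].usable && (node < 0 || cpus[i].node_id < node))
			node = cpus[i].node_id;
	}

	if (node < 0)
		return -1;

	/* drop CPUs belonging to the second and subsequent nodes */
	for (i = 0; i < nb; i++) {
		if (cpus[i].node_id != node)
			cpus[i].usable = 0;
	}
	return node;
}
```

Node 0 below has no usable CPU, so node 1 is retained and node 2 is dropped.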
These are processed after the topology is detected, and they make it
possible to restrict binding to, or evict, CPUs matching the indicated
hardware cluster number(s). This can be used to bind to only some
clusters, such as a CCX or cores of a given energy-efficiency class. For
this reason, we use the cluster's local ID here (local to the node).
These are processed after the topology is detected, and they make it
possible to restrict binding to, or evict, CPUs matching the indicated
hardware core number(s). This can be used to bind to only some clusters
as well as to evict efficient cores whose numbers are known.
These are processed after the topology is detected, and they make it
possible to restrict binding to, or evict, CPUs matching the indicated
hardware thread number(s). This can be used, for example, to reserve
even threads for HW IRQs and odd threads for haproxy, or to evict
efficient cores that only have thread #0.
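The restrict/evict logic shared by the cluster, core and thread filters above can be sketched as a mask test per CPU. This is a hypothetical illustration using the thread-number case: the descriptor, the `SET_ONLY`/`SET_DROP` naming and the bitmask encoding of thread numbers are assumptions, not HAProxy's actual implementation.

```c
#include <stddef.h>

/* Hypothetical descriptor: th_id is the hardware thread number in its core. */
struct cpu_th {
	int th_id;
	int usable;
};

/* SET_ONLY restricts binding to matching CPUs, SET_DROP evicts them. */
enum set_action { SET_ONLY, SET_DROP };

/* Apply a restriction or an eviction based on a bitmask of hardware thread
 * numbers (bit N set means thread #N matches). Returns the number of CPUs
 * that remain usable.
 */
size_t apply_thread_set(struct cpu_th *cpus, size_t nb, unsigned long mask,
                        enum set_action act)
{
	size_t i, left = 0;

	for (i = 0; i < nb; i++) {
		int th = cpus[i].th_id;
		int match = th >= 0 && th < (int)(8 * sizeof(mask)) &&
		            ((mask >> th) & 1UL);

		if ((act == SET_ONLY && !match) || (act == SET_DROP && match))
			cpus[i].usable = 0;
		left += cpus[i].usable != 0;
	}
	return left;
}
```

With mask `1UL << 1` and `SET_ONLY`, only odd threads (thread #1 of each 2-thread core) remain, which matches the "odd threads for haproxy" example.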
For now it's limited: it only supports "reset", to ask that any previous
"taskset" be ignored. The goal is to later add more actions that allow
symbolically defining sets of CPUs to bind to or to drop. This also
clears the cpu_mask_forced variable, which is used to detect that a
taskset was used.
During development, everything related to CPU binding and the CPU topology
is debugged using state dumps at various places, but it does make sense to
have a real command line option so that this remains usable in production
to help users figure out why some CPUs are not used by default. Let's add
"-dc" for this. Since the list of global.tune.options values is almost
full and does not 100% match this option, let's add a new "tune.debug"
field for this.
Lots of collected data and observations aggregated into a single commit
so as not to lose them. Some parts below come from several commit
messages and are incremental.
Add captures and analysis of the Intel 14900, where it's not easy to draw
the line between the desired P and E cores.
The 14900 raises some questions (imagine a dual-die variant in multi-socket).
That's the start of an algorithmic distribution of performance cores into
thread groups.
cpu-map currently conflicts a lot with the choices after auto-detection
but it doesn't have to. The problem is the inability to configure the
threads for the whole process like taskset does. By offering this ability
we can also start to designate groups of CPUs symbolically (package, die,
ccx, cores, smt).
It can also be useful to exploit the info from cpuinfo that is not
available in /sys, such as the model number. At least on Arm, higher
numbers indicate bigger cores and can be useful to distinguish cores
inside a cluster. It will not distinguish big from medium cores of the
same type (e.g. an A78 at 3.0 vs 2.4 GHz) but can still be effective at
identifying the efficient ones.
In short, info such as the cluster ID is not always reliable, and is
local to the package; the same goes for die_id. The die number is not
reported here but should definitely be used, with a higher priority than L3.
We're still missing a discriminant between the L3 and cluster number
in order to address heterogeneous CPUs (e.g. Intel 14900), though in
terms of locality that's currently done correctly.
CPU selection is also a topic of its own, and some thoughts were noted
regarding sorting by performance vs locality, so as never to mix CPUs
from different sockets due to sorting.
The proposed cpu-selection cannot work as-is, because it acts both on
restriction and preference, and these two are not actions but a sequence.
First restrictions must be enforced, and second the remaining CPUs are
sorted according to the preferred criterion, and a number of threads are
selected.
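The restrict-then-sort-then-select sequence above can be sketched in C as follows. Everything here is hypothetical: `struct cpu_ent`, the `perf` score and `select_cpus()` are illustrative names, the restriction step is assumed to have already cleared `usable` on excluded CPUs, and the sort key is performance only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical CPU entry; "perf" is an illustrative performance score. */
struct cpu_ent {
	int id;      /* OS CPU number */
	int usable;  /* result of the restriction step */
	int perf;    /* higher is faster */
};

/* Sort by decreasing performance, then by CPU number for a stable result. */
static int cmp_perf_desc(const void *a, const void *b)
{
	const struct cpu_ent *ca = a, *cb = b;

	if (ca->perf != cb->perf)
		return cb->perf > ca->perf ? 1 : -1;
	return ca->id - cb->id;
}

/* 1) restrictions were applied beforehand (usable == 0 means excluded);
 * 2) compact and sort the remaining CPUs by the preferred criterion;
 * 3) select at most <want> of them (the first entries of <cpus>).
 * Returns the number of CPUs selected.
 */
size_t select_cpus(struct cpu_ent *cpus, size_t nb, size_t want)
{
	size_t i, kept = 0;

	for (i = 0; i < nb; i++)
		if (cpus[i].usable)
			cpus[kept++] = cpus[i];

	qsort(cpus, kept, sizeof(*cpus), cmp_perf_desc);
	return kept < want ? kept : want;
}
```

Per the earlier note about never mixing inter-socket CPUs, a real comparator would likely sort by locality (package, die, L3) before performance; this sketch omits that for brevity.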
Currently we refine the OS-exposed cluster number but it's not correct
as we can end up with something poorly numbered. We need to respect the
LLC in any case so let's explain the approach.
This is a complementary patch to cf913c2f9 ("DOC: management: rename show
stats domain cli "dns" to "resolvers""). The doc still referred to the
legacy "dns" domain filter for the stat command. Let's rename those
occurrences to "resolvers".
It may be backported to all stable versions.
Historically, the log-forward proxy used to preserve the host field from
the input message as much as possible, and if the syslog host wasn't
provided (rfc5424 '-' or a bad rfc3164 or rfc5424 message), then
"localhost" or "-" would be used as the host when outputting the message
using rfc3164 or rfc5424.
We change that behavior (which corresponds to the "keep" host option), so
that log-forward now uses the "fill" strategy by default: if the host is
provided in the input message, it is preserved. However, if it is missing
and the IP address of the sender is available, we use it.
Following the previous patch, we now implement the logic for the host
option under the log-forward section. Possible strategies are:
  replace  If the input message already contains a value for the host
           field, we replace it with the source IP address of the
           sender.
           If the input message doesn't contain a value for the host
           field (i.e. '-' as input rfc5424 message or non-compliant
           rfc3164 or rfc5424 message), we use the source IP address
           of the sender as the host field.
  fill     If the input message already contains a value for the host
           field, we keep it.
           If the input message doesn't contain a value for the host
           field (i.e. '-' as input rfc5424 message or non-compliant
           rfc3164 or rfc5424 message), we use the source IP address
           of the sender as the host field.
  keep     If the input message already contains a value for the host
           field, we keep it.
           If the input message doesn't contain a value for the host
           field, we set it to localhost (rfc3164) or '-' (rfc5424).
           (This is the default)
  append   If the input message already contains a value for the host
           field, we append a comma followed by the IP address of the
           sender.
           If the input message doesn't contain a value for the host
           field, we use the source IP address of the sender.
The default value (unchanged) is the "keep" strategy. The "host" option is
only relevant with the rfc3164 or rfc5424 format on log targets. Also, if
the source address is not available (e.g. UNIX socket), the default
behavior prevails.
Documentation was updated.
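The strategies above form a small decision table. The C sketch below is one possible way to express it. The enum, `compute_host()` and the plain-string representation of the host are hypothetical and do not reflect HAProxy's internal log structures. Following the text, a missing source address (e.g. a UNIX socket) falls back to the "keep" behavior.

```c
#include <stdio.h>
#include <string.h>

/* The four strategies of the log-forward "host" option. */
enum host_opt { HOST_REPLACE, HOST_FILL, HOST_KEEP, HOST_APPEND };

/* Compute the host field to emit. <in_host> is the host parsed from the
 * input message (NULL, empty or "-" when absent), <src_ip> is the sender's
 * address or NULL when unavailable, <rfc5424> selects the placeholder used
 * by "keep". The result is written into <out> and <out> is returned.
 */
const char *compute_host(enum host_opt opt, const char *in_host,
                         const char *src_ip, int rfc5424,
                         char *out, size_t outlen)
{
	int has_host = in_host && *in_host && strcmp(in_host, "-") != 0;
	const char *missing = rfc5424 ? "-" : "localhost";

	/* no source address available: the default behavior prevails */
	if (!src_ip)
		opt = HOST_KEEP;

	switch (opt) {
	case HOST_REPLACE:
		snprintf(out, outlen, "%s", src_ip);
		break;
	case HOST_FILL:
		snprintf(out, outlen, "%s", has_host ? in_host : src_ip);
		break;
	case HOST_APPEND:
		if (has_host)
			snprintf(out, outlen, "%s,%s", in_host, src_ip);
		else
			snprintf(out, outlen, "%s", src_ip);
		break;
	case HOST_KEEP:
	default:
		snprintf(out, outlen, "%s", has_host ? in_host : missing);
		break;
	}
	return out;
}
```

For example, "append" with an input host of "web1" and a sender at 192.0.2.1 yields "web1,192.0.2.1".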
Released version 3.2-dev7 with the following main changes :
- BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR is applet is waiting for room
- BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK
- BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers
- BUG/MEDIUM: uxst: fix outgoing abns address family in connect()
- REGTESTS: fix reg-tests/server/abnsz.vtc
- BUG/MINOR: log: fix outgoing abns address family
- BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers
- MINOR: clock: always use atomic ops for global_now_ms
- CI: QUIC Interop: clean old docker images
- BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()
- BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()
- MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
- BUG/MEDIUM: stream: never allocate connection addresses from signal handler
- MINOR: freq_ctr: provide non-blocking read functions
- BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper
- MINOR: tools: use only opportunistic symbols resolution
- CLEANUP: task: move the barrier after clearing th_ctx->current
- MINOR: compression: Introduce minimum size
- BUG/MINOR: h2: always trim leading and trailing LWS in header values
- MINOR: tinfo: split the signal handler report flags into 3
- BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler
- OPTIM: connection: don't try to kill other threads' connection when !shared
- BUILD: add possibility to use different QuicTLS variants
- MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid().
- MINOR: fd: Add fd_lock_tgid_cur().
- MEDIUM: epoll: Make sure we can add a new event
- MINOR: pollers: Add a fixup_tgid_takeover() method.
- MEDIUM: pollers: Drop fd events after a takeover to another tgid.
- MEDIUM: connections: Allow taking over connections from other tgroups.
- MEDIUM: servers: Add strict-maxconn.
- BUG/MEDIUM: server: properly initialize PROXY v2 TLVs
- BUG/MINOR: server: fix the "server-template" prefix memory leak
- BUG/MINOR: h3: do not report transfer as aborted on preemptive response
- CLEANUP: h3: fix documentation of h3_rcv_buf()
- MINOR: hq-interop: properly handle incomplete request
- BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty
- MINOR: h1: permit to relax the websocket checks for missing mandatory headers
- BUG/MINOR: hq-interop: fix leak in case of rcv_buf early return
- BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send hedaer
- MINOR: jws: implement a JWK public key converter
- DEBUG: init: add a way to register functions for unit tests
- TESTS: add a unit test runner in the Makefile
- TESTS: jws: register a unittest for jwk
- CI: github: run make unit-tests on the CI
- TESTS: add config smoke checks in the unit tests
- MINOR: jws: conversion to NIST curves name
- CI: github: remove smoke tests from vtest.yml
- TESTS: ist: fix wrong array size
- TESTS: ist: use the exit code to return a verdict
- TESTS: ist: add a ist.sh to launch in make unit-tests
- CI: github: fix h2spec.config proxy names
- DEBUG: init: Add a macro to register unit tests
- MINOR: sample: allow custom date format in error-log-format
- CLEANUP: log: removing "log-balance" references
- BUG/MINOR: log: set proper smp size for balance log-hash
- MINOR: log: use __send_log() with exact payload length
- MEDIUM: log: postpone the decision to send or not log with empty messages
- MINOR: proxy: make pr_mode enum bitfield compatible
- MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper
- MINOR: log: add options eval for log-forward
- MINOR: log: detach prepare from parse message
- MINOR: log: add dont-parse-log and assume-rfc6587-ntf options
- BUG/MEIDUM: startup: return to initial cwd only after check_config_validity()
- TESTS: change the output of run-unittests.sh
- TESTS: unit-tests: store sh -x in a result file
- CI: github: show results of the Unit tests
- BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server
- BUG/MINOR: cfgparse/peers: properly handle ignored local peer case
- BUG/MINOR: server: dont return immediately from parse_server() when skipping checks
- MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines
- BUG/MINOR: stream: fix age calculation in "show sess" output
- MINOR: stream/cli: rework "show sess" to better consider optional arguments
- MINOR: stream/cli: make "show sess" support filtering on front/back/server
- TESTS: quic: create first quic unittest
- MINOR: h3/hq-interop: restore function for standalone FIN receive
- MINOR/OPTIM: mux-quic: do not allocate rxbuf on standalone FIN
- MINOR: mux-quic: refine reception of standalone STREAM FIN
- MINOR: mux-quic: define globally stream rxbuf size
- MINOR: mux-quic: define rxbuf wrapper
- MINOR: mux-quic: store QCS Rx buf in a single-entry tree
- MINOR: mux-quic: adjust Rx data consumption API
- MINOR: mux-quic: adapt return value of qcc_decode_qcs()
- MAJOR: mux-quic: support multiple QCS RX buffers
- MEDIUM: mux-quic: handle too short data splitted on multiple rxbuf
- MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc
- BUG/MINOR: cfgparse-tcp: relax namespace bind check
- MINOR: startup: adjust alert messages, when capabilities are missed
With "show sess", particularly "show sess all", we're often missing the
ability to inspect only streams attached to a frontend, backend or server.
Let's just add these filters to the command. Only one at a time may be set.
One typical use case could be to dump streams attached to a server after
issuing "shutdown sessions server XXX", for example to figure out why some
wouldn't stop.
The "show sess" CLI command parser is getting really annoying because
several options were added in an exclusive mode, as the single possible
argument. Recently some combinable options were added ("show-uri") but
the older ones were not yet adapted. Let's just make sure that the
various filters such as "older" and "age" now belong to the options,
and leave only <id>, "all", and "help" as the first ones. The doc was
updated and it's now easier to find these options.
This commit introduces the dont-parse-log option to disable log message
parsing, allowing raw log data to be forwarded without modification.
Also, it adds the assume-rfc6587-ntf option to frame log messages
using only non-transparent framing as per RFC 6587. This avoids
misparsing in certain cases (mainly with non-RFC-compliant messages).
The documentation is updated to include details on the new options and
their intended use cases.
This feature was discussed in GH #2856
At least one user would like to allow a standards-violating client to set
up WebSocket connections through haproxy to a standards-violating server that
accepts them. While this should of course never be done over the internet,
it can make sense in the datacenter between application components which do
not need to mask the data, so this typically falls into the situation of
what the "accept-unsafe-violations-in-http-request" option and the
"accept-unsafe-violations-in-http-response" option are made for.
See GH #2876 for more context.
This patch relaxes the test on the "Sec-Websocket-Key" header field in
the request, and on the "Sec-Websocket-Accept" header in the response,
when these respective options are set.
The doc was updated to reference this addition. This may be backported
to 3.1 but preferably not further.
Maxconn is a bit of a misnomer when it comes to servers, as it doesn't
control the maximum number of connections we establish to a server, but
the maximum number of simultaneous requests. So add "strict-maxconn",
which ensures that we never establish more connections than maxconn.
It extends the meaning of the "restricted" setting of
tune.takeover-other-tg-connections, as it will also attempt to get idle
connections from other thread groups if strict-maxconn is set.
Allow haproxy to take over idle connections from thread groups other
than our own. To control that, add a new tunable,
tune.takeover-other-tg-connections. It can have 3 values: "none", where
we won't attempt to get connections from the other thread groups (the
default); "restricted", where we will only try to get idle connections
from other thread groups when we're using reverse HTTP; and "full",
where we always try to get connections from other thread groups.
Unless there is a special need, it is advised to use "none" (or
"restricted" if using reverse HTTP), as using connections from other
thread groups may have a performance impact.
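The three tunable values, together with the strict-maxconn extension of "restricted" described earlier, reduce to a small predicate. This is a hypothetical sketch: the enum and `may_take_from_other_tg()` are illustrative names, not HAProxy functions.

```c
/* Values of tune.takeover-other-tg-connections, as described above. */
enum tg_takeover {
	TG_TAKEOVER_NONE,        /* never look at other thread groups (default) */
	TG_TAKEOVER_RESTRICTED,  /* only for reverse HTTP or strict-maxconn */
	TG_TAKEOVER_FULL,        /* always allowed */
};

/* Decide whether idle connections may be taken from other thread groups.
 * <is_rhttp>: the server uses reverse HTTP.
 * <strict_maxconn>: the server has strict-maxconn set.
 */
int may_take_from_other_tg(enum tg_takeover mode, int is_rhttp, int strict_maxconn)
{
	switch (mode) {
	case TG_TAKEOVER_FULL:
		return 1;
	case TG_TAKEOVER_RESTRICTED:
		return is_rhttp || strict_maxconn;
	case TG_TAKEOVER_NONE:
	default:
		return 0;
	}
}
```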
This is the introduction of "minsize-req" and "minsize-res".
These two options allow you to set the minimum payload size required for
compression to be applied.
This helps save CPU on both server and client sides when the payload does
not need to be compressed.
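The minimum-size check itself is a simple threshold comparison. The sketch below is hypothetical: `should_compress()` is an illustrative name, a `minsize` of 0 is assumed to mean "no limit", and payloads of unknown length (e.g. chunked) are assumed to remain eligible for compression. These are assumptions, not documented HAProxy behavior.

```c
#include <stdint.h>

/* Return non-zero if a payload of <len> bytes should be compressed given a
 * configured minimum <minsize> (assumed: 0 = no minimum). When the length is
 * not known in advance (<len_known> == 0), this sketch does not skip
 * compression.
 */
int should_compress(uint64_t len, int len_known, uint64_t minsize)
{
	if (!minsize || !len_known)
		return 1;
	return len >= minsize;
}
```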
Released version 3.2-dev6 with the following main changes :
- BUG/MEDIUM: debug: close a possible race between thread dump and panic()
- DEBUG: thread: report the spin lock counters as seek locks
- DEBUG: thread: make lock time computation more consistent
- DEBUG: thread: report the wait time buckets for lock classes
- DEBUG: thread: don't keep the redundant _locked counter
- DEBUG: thread: make lock_stat per operation instead of for all operations
- DEBUG: thread: reduce the struct lock_stat to store only 30 buckets
- MINOR: lbprm: add a new callback ->server_requeue to the lbprm
- MEDIUM: server: allocate a tasklet for asyncronous requeuing
- MAJOR: leastconn: postpone the server's repositioning under contention
- BUG/MINOR: quic: reserve length field for long header encoding
- BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
- MINOR: quic: simplify length calculation for STREAM/CRYPTO frames
- BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
- BUG/MINOR: mworker: post_section_parser for the last section in discovery
- CLEANUP: mworker: "program" section does not have a post_section_parser anymore
- MEDIUM: initcall: allow to register mutiple post_section_parser per section
- CI: cirrus-ci: bump FreeBSD image to 14-2
- DOC: initcall: name correctly REGISTER_CONFIG_POST_SECTION()
- REGTESTS: stop using truncated.vtc on freebsd
- MINOR: quic: refactor STREAM encoding and splitting
- MINOR: quic: refactor CRYPTO encoding and splitting
- BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
- BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
- BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
- MINOR: ssl/cli: display more filenames in 'show ssl cert'
- DOC: watchdog: document the sequence of the watchdog and panic
- MINOR: ssl: store the filenames resulting from a lookup in ckch_conf
- MINOR: startup: allow hap_register_feature() to enable a feature in the list
- MINOR: quic: support frame type as a varint
- BUG/MINOR: startup: leave at first post_section_parser which fails
- BUG/MINOR: startup: hap_register_feature() fix for partial feature name
- BUG/MEDIUM: cli: Be sure to drop all input data in END state
- BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
- BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
- BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
- MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing
- MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt()
- MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL
- MEDIUM: ssl/crtlist: "crt" keyword in frontend
- DOC: configuration: document the "crt" frontend keyword
- DEV: h2: add a Lua-based HTTP/2 connection tracer
- BUG/MINOR: quic: prevent crash on conn access after MUX init failure
- BUG/MINOR: mux-quic: prevent crash after MUX init failure
- DEV: h2: fix flags for the continuation frame
- REGTESTS: Fix truncated.vtc to send 0-CRLF
- BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
- Revert "REGTESTS: stop using truncated.vtc on freebsd"
- MINOR: mux-quic: define a QCC application state member
- MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler
- MINOR: mux-quic/h3: support temporary blocking on control stream sending
Each time we go into the watchdog and panic code, it's super hard to
figure out who calls what, since signals are involved to bounce between
threads. Let's document the main principles and sequences to ease the
journey next time.
Before this patch, REGISTER_CONFIG_SECTION() allowed registering exactly
one callback (<post>), called after the parsing of a section.
This was limiting because you couldn't register a post callback from
anywhere else in the code.
This patch introduces the new REGISTER_CONFIG_POST_SECTION() macro, which
allows registering a new post callback for a section keyword from anywhere.
This is done by allowing `struct cfg_section` entries that do not have a
`section_parser`, and then iterating over all cfg_section entries with a
post_section_parser for a keyword.
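The iteration described above can be sketched as follows. `struct cfg_section_sketch` is a simplified, hypothetical stand-in for `struct cfg_section`, and `run_post_section_parsers()` is an illustrative name. Stopping at the first failing callback mirrors the later fix "leave at first post_section_parser which fails". The `post_a`/`post_b` callbacks only exist to demonstrate the call order.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct cfg_section: one entry per registration. */
struct cfg_section_sketch {
	const char *section_name;
	int (*section_parser)(const char *line); /* NULL for post-only entries */
	int (*post_section_parser)(void);        /* may be NULL */
};

/* Call every post-section callback registered for <name>, in registration
 * order, stopping at the first one that fails. Returns 0 on success.
 */
int run_post_section_parsers(const struct cfg_section_sketch *tab, size_t nb,
                             const char *name)
{
	size_t i;
	int err;

	for (i = 0; i < nb; i++) {
		if (strcmp(tab[i].section_name, name) != 0 ||
		    !tab[i].post_section_parser)
			continue;
		err = tab[i].post_section_parser();
		if (err)
			return err;
	}
	return 0;
}

/* Example callbacks used to illustrate the call order. */
static int post_calls;
static int post_a(void) { post_calls++; return 0; }
static int post_b(void) { post_calls++; return 1; } /* simulates a failure */
```

With entries `global:post_a`, `peers:post_a`, `global:post_b`, `global:post_a`, running the "global" callbacks calls `post_a` then `post_b` and stops there.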
Released version 3.2-dev5 with the following main changes :
- BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES
- CLEANUP: ssl: rename ssl_sock_load_ca to ssl_sock_gencert_load_ca
- CLEANUP: ssl: move ssl_sock_gencert_load_ca declaration in ssl_gencert.h
- CLEANUP: tree-wide: define and use acl_match_cond() helper
- MINOR: epoll: permit to mask certain specific events
- MINOR: proxies: Add a per-thread group field to struct proxy.
- MINOR: Add fields to the per-thread group field in struct server.
- MINOR: proxies/servers: Calculate queueslength and use it.
- MEDIUM: servers/proxies: Switch to using per-tgroup queues.
- BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"
- MEDIUM: stream: Map task wake up reasons to dedicated stream events
- MEDIUM: stream: No longer use TASK_F_UEVT* to shut a stream down
- BUILD: tools: fix build on BSD by dropping the ETIME check
- MINOR: queues: use __ha_cpu_relax() on failed CAS.
- BUILD: queues: Use unsigned int when needed
- BUILD: ssl: allow to build without the renegotiation API of WolfSSL
- BUILD: ssl: more cleaner approach to WolfSSL without renegotiation
- BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
- MINOR: quic: remove references to burst in quic-cc-algo parsing
- MINOR: quic: allow BBR testing without pacing
- MINOR: quic: transform pacing settings into a global option
- MAJOR: quic: mark pacing as stable and enable it by default
- MINOR: quic: mark BBR as stable
- MINOR: quic: define quic_tune
- BUILD: quic: fix overflow in global tune
- DEBUG: fd: add a counter of takeovers of an FD since it was last opened
- MINOR: fd: add a generation number to file descriptors
- DEBUG: epoll: store and compare the FD's generation count with reported event
- MEDIUM: epoll: skip reports of stale file descriptors
- MINOR: mux-h1: Add masks to group H1S DEMUX and MUX errors
- BUG/MINOR: mux-h1: Only report a SE error on demux error
- MINOR: tevt: Add the termination events log's fundations
- MINOR: tevt/stconn: Add a termination events log in the SE descriptor
- MINOR: tevt/mux-h1: Report termination events for the H1C and H1S
- MINOR: tevt/mux-h2: Report termination events for the H2C
- MINOR: tevt/stream/stconn: Report termination events for stream and sc
- MINOR: tevt/conn: Report intercepted event for L4 rules
- MINOR: tevt/mux-h1/mux-h2: Add termination events log when dumping mux info
- MINOR: tevt/muxes: Add CTL and SCTL command to get the termination event logs
- MINOR: tevt/mux-pt: Add support for termination event logs
- MINOR: tevt/connection: Add dedicated termination events for lower locations
- MEDIUM: tevt/muxes: Add dedicated termination events for muxc/se locations
- MINOR: tevt/stconn: Be more accurate to report shutw events
- MEDIUM: tevt/stconn/stream: Add dedicated termination events for stream location
- MINOR: tevt: Don't duplicate termination event during reporting
- MINOR: tevt/applet: Add limited support for termination event logs for applets
- MINOR: tevt: Add a sample to get termination events for all locations
- MINOR: tevt: Improve function to convert a termination events log to string
- REORG: tevt/connection: Move enums at the end of the header file
- MINOR: tevt/dev: Add term_events tool
- MINOR: tevt/connection: Add support for POLL_HUP/POLL_ERR events
- MINOR: tevt/dev: Parse tuple of termination events
- BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
- DOC: htx: clarify <mark> parameter for htx_xfer_blks()
- BUILD: quic: remove GCC undefined error in qc_release_lost_pkts()
- MEDIUM: htx: prevent <mark> to copy incomplete headers in htx_xfer_blks()
- BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
- BUG/MINOR: tevt/http-ana: Remove badly placed event reports
- DEBUG: http-ana: Remove debug counters from HTTP analyzers
- DEBUG: mux-h1: Remove some debug counters
- BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
- MEDIUM: stream: interrupt costly rulesets after too many evaluations
- BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
- BUILD: ssl: remove a boringssl definition defined by recent boringssl libs
- BUG/MINOR: tevt/mux-h2: Set truncated receive/eos events at SE level on error
- BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
- BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
- BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
- MINOR: flt-spoe: Report end of input immediately after applet init
- MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
- MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
- MINOR: tevt/mux-spop: Report termination events for the SPOP connect/stream
- CLEANUP: mux-spop: Remove useless comments
- MINOR: mux-spop: Dump info about connections and streams in dedicated functions
- MINOR: mux-spop: Implement .show_sd callback function
- MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE
- BUG/MEDIUM: mux-fcgi: Propagate flags to SE in fcgi_strm_wake_one_stream
- MINOR: tevt/mux-fcgi: Report termination events for the FCGI connect/stream
- MINOR: mux-fcgi: Dump info about connections and streams in dedicated functions
- MINOR: mux-spop/mux-fcgi: Add support of the debug string for logs
- BUG/MINOR: cli: Don't set SE flags from the cli applet
- BUG/MINOR: cli: Fix memory leak on error for _getsocks command
- BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
- BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
- BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
- BUG/MINOR: flt-trace: Support only one name option
- MINOR: filters: Improve errors formating during filters parsing
- BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
- DOC: option redispatch should mention persist options
- BUG/MINOR: debug: make "debug dev sched" accept a negative TID
- BUG/MINOR: debug: make sure the "debug dev sched" tasks don't block stopping
- IMPORT: plock: export the uninlined version of the lock wait function
- IMPORT: plock: give higher precedence to W than S
- IMPORT: plock: lower the slope of the exponential back-off
- IMPORT: plock: use cpu_relax() for a shorter time in EBO
- Revert "IMPORT: plock: export the uninlined version of the lock wait function"
- BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
"option redispatch" remains vague about the cases in which a session
would persist; let's mention "option persist" and "force-persist" as
examples so that users don't conclude that persistence is the default.
Should be backported to stable branches.
It is not rare to see configurations with a large number of "tcp-request
content" or "http-request" rules, for instance. A large number of rules
combined with CPU-demanding actions (e.g. actions that work on content)
may create thread contention, as all the rules from a given ruleset are
evaluated within the same polling loop if the evaluation is not interrupted.
Thus, in this patch we add extra logic around the "tcp-request content",
"tcp-response content", "http-request" and "http-response" rulesets, so
that when a certain number of rules have been evaluated within a single
polling loop, we force the evaluating function to yield. The rule which
was about to be evaluated is saved, and the function resumes evaluating
rules from the saved pointer when it is called again (in the next polling loop).
We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so
that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG
is mandatory here because process_stream() expects TASK_WOKEN_MSG for
explicit analyzers re-evaluation.
The stream's rules_bcount attribute was added to count how many rules were
evaluated since the last interruption (yield). Also, the SF_RULE_FYIELD
flag was added to know that s->current_rule was assigned due to a forced
yield and not a regular yield.
By default, haproxy enforces a yield every 50 rules; this behavior can be
configured using the "tune.max-rules-at-once" global keyword.
There is a limitation though: for now, if the ACT_OPT_FINAL flag is set
on act_opts, we consider it is not safe to yield (as is already the
case for automatic yield). In this case, instead of yielding and taking
the risk of not being called back, we skip the yield and hope it will
not create contention. This is something we should ideally try to
improve in order to yield in all conditions.
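The forced-yield mechanism above can be sketched as a resumable loop. This is a simplified, hypothetical model: `struct stream_sketch`, `eval_rules()` and the `eval` callback are illustrative, `forced_yield` stands in for SF_RULE_FYIELD, and the caller is assumed to wake the task (TASK_WOKEN_MSG in haproxy) and call the function again after `EVAL_YIELD`.

```c
#include <stddef.h>

#define MAX_RULES_AT_ONCE 50 /* default mentioned above (tune.max-rules-at-once) */

/* Simplified stream state. */
struct stream_sketch {
	size_t current_rule;       /* next rule to evaluate (saved across yields) */
	unsigned int rules_bcount; /* rules evaluated since the last yield */
	int forced_yield;          /* plays the role of SF_RULE_FYIELD */
};

enum eval_ret { EVAL_DONE, EVAL_YIELD };

/* Evaluate rules starting from s->current_rule. <final> mirrors the
 * ACT_OPT_FINAL case where yielding is skipped. <eval> is a placeholder
 * for evaluating one rule.
 */
enum eval_ret eval_rules(struct stream_sketch *s, size_t nb_rules,
                         unsigned int max_at_once, int final,
                         void (*eval)(size_t rule))
{
	while (s->current_rule < nb_rules) {
		if (!final && s->rules_bcount >= max_at_once) {
			/* position is already saved in current_rule */
			s->rules_bcount = 0;
			s->forced_yield = 1;
			return EVAL_YIELD;
		}
		eval(s->current_rule);
		s->current_rule++;
		s->rules_bcount++;
	}
	s->forced_yield = 0;
	return EVAL_DONE;
}

/* Example rule evaluator that only counts invocations. */
static size_t evaluated;
static void count_rule(size_t rule) { (void)rule; evaluated++; }
```

With 120 rules and the default of 50, the ruleset completes over three calls (50, 50, then 20 rules), whereas with `final` set all 120 are evaluated at once.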
"term_events" is a sample fetch function that can be used to get
termination events for all locations in one call. The format is equivalent to:
{fc_term_events,fc_mux_term_events,fs.term_events,txn.term_events,bs.term_events,bc_mux_term_events,bc_term_events}
If no event was reported for a location, the field is empty. If the feature
is not supported yet, a dash ('-') is printed.
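The output format above can be sketched as a join of per-location strings. This is a hypothetical illustration: `format_term_events()` is not an HAProxy function, and each location's events are modeled as an opaque string, where NULL means "not supported" (printed as '-') and an empty string means "no event reported" (empty field).

```c
#include <stdio.h>
#include <string.h>

/* Append <s> to <out> at offset *used; returns -1 on truncation. */
static int append_str(char *out, size_t outlen, size_t *used,
                      const char *sep, const char *s)
{
	int n = snprintf(out + *used, outlen - *used, "%s%s", sep, s);

	if (n < 0 || (size_t)n >= outlen - *used)
		return -1;
	*used += (size_t)n;
	return 0;
}

/* Build "{loc0,loc1,...}" from <nb> per-location strings. Returns the
 * output length, or -1 if <out> is too small.
 */
int format_term_events(const char *const loc[], size_t nb, char *out, size_t outlen)
{
	size_t i, used = 0;

	if (outlen == 0 || append_str(out, outlen, &used, "", "{") < 0)
		return -1;
	for (i = 0; i < nb; i++) {
		if (append_str(out, outlen, &used, i ? "," : "",
		               loc[i] ? loc[i] : "-") < 0)
			return -1;
	}
	if (append_str(out, outlen, &used, "", "}") < 0)
		return -1;
	return (int)used;
}
```

For the seven locations listed above, inputs `"A", "", NULL, "x", "", "", "B"` produce `{A,,-,x,,,B}`.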
Pacing has recently been moved out of experimental status and is
activated by default. This is a mandatory requirement for BBR.
Furthermore, BBR is now considered stable. As such, remove its
experimental status with this commit.
Remove pacing experimental status, so it's not required anymore to use
expose-experimental-directives to enable it.
Along this change, pacing is now activated by default. As such, pacing
configuration is transformed into its final form. The global on/off
setting is turned into a disable setting without argument.
Pacing support was previously activated on each bind line individually,
via an optional argument of quic-cc-algo keyword. Remove this optional
argument and introduce a global setting to enable/disable pacing. Pacing
activation is still flagged as experimental.
One important change is that previously BBR usage automatically
activated pacing support. This is not the case anymore, so users should
now always explicitly activate pacing if BBR is selected. A new warning
message will be displayed if this is not the case.
Another consequence of this change is that the pacing_inter callback is
now always defined for every quic_cc_algo type. As such, the QUIC MUX uses
global.tune.options to determine if pacing is required.
This should be backported up to 3.1, after a period of observation.
Pacing is activated per bind line via an optional boolean argument of
quic-cc-algo keyword. Contrary to the default usage, pacing is
automatically activated when BBR is chosen. This is because this
algorithm is expected to run on top of pacing, else its behavior is
undefined.
Previously, the pacing argument was thus ignored when BBR was selected.
Change this to support explicitly deactivating pacing with it. This
could be useful to test BBR without pacing when debugging some issues.
This should be backported up to 3.1, after a period of observation.
The shutdown-backup-sessions action of the on-marked-up directive no longer
works since the stream_shutdown() function was modified to be async-safe.
When stream_shutdown() was modified to be async-safe, dedicated task events were
added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to
TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was
performed by process_stream() to shut the stream with the appropriate reason.
However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a
stream down because a preferred server became available, was not mapped in the
same way. So since commit b8e3b0a18d ("BUG/MEDIUM: stream: make
stream_shutdown() async-safe"), this action is ignored and does not work
anymore.
To fix the issue, and to be able to backport the fix, a third task event
was added: TASK_F_EVT3 is now mapped to SF_ERR_UP.
This patch should fix issue #2848. It must be backported as far as 2.6.
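The two-way mapping described above can be sketched as follows. The names and bit values are stand-ins chosen for illustration (the real SF_ERR_* codes and TASK_F_EVT* bits in haproxy differ), and the priority order used when several events are pending is an assumption of this sketch.

```c
/* Stand-ins for SF_ERR_DOWN, SF_ERR_KILLED and SF_ERR_UP. */
enum shut_reason { REASON_NONE, REASON_DOWN, REASON_KILLED, REASON_UP };

/* Stand-ins for TASK_F_EVT1..3; actual bit positions differ in haproxy. */
#define EVT1 0x01u
#define EVT2 0x02u
#define EVT3 0x04u

/* Async-safe side (stream_shutdown()): only record which event happened. */
unsigned int reason_to_event(enum shut_reason r)
{
	switch (r) {
	case REASON_DOWN:   return EVT1;
	case REASON_KILLED: return EVT2;
	case REASON_UP:     return EVT3; /* the mapping that was missing */
	default:            return 0;
	}
}

/* process_stream() side: recover the shutdown reason from pending events. */
enum shut_reason event_to_reason(unsigned int evts)
{
	if (evts & EVT1)
		return REASON_DOWN;
	if (evts & EVT2)
		return REASON_KILLED;
	if (evts & EVT3)
		return REASON_UP;
	return REASON_NONE;
}
```

Before the fix, the equivalent of `reason_to_event(REASON_UP)` had no event to map to, so the reverse mapping never saw it and the action was silently ignored.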
A few times in the past we've seen cases where epoll was caught reporting
a wrong event that caused trouble (e.g. spuriously reporting HUP or RDHUP
after a successful connect()). The new tune.epoll.mask-events directive
makes it possible to mask events such as ERR, HUP and RDHUP and convert
them to IN events that are processed by the regular receive path. This
should help better diagnose and troubleshoot issues such as this one, as
well as rule out such a cause when similar issues are reported:
  https://github.com/haproxy/haproxy/issues/2368
  https://www.spinics.net/lists/netdev/msg876470.html
It should be harmless to backport this if necessary.
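The conversion described above can be sketched with the real Linux epoll event flags from `<sys/epoll.h>` (so this only builds on Linux). The `mask_epoll_events()` function is hypothetical and is not HAProxy's poller code. It assumes the configured mask is stored as a set of `EPOLLERR`/`EPOLLHUP`/`EPOLLRDHUP` bits.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Rewrite the events reported by epoll_wait(): bits present in <mask>
 * (restricted to EPOLLERR, EPOLLHUP and EPOLLRDHUP) are removed and
 * replaced by EPOLLIN, so that the condition is handled by the regular
 * receive path instead of being reported as an error or hangup.
 */
uint32_t mask_epoll_events(uint32_t events, uint32_t mask)
{
	mask &= (uint32_t)(EPOLLERR | EPOLLHUP | EPOLLRDHUP);
	if (events & mask)
		events = (events & ~mask) | (uint32_t)EPOLLIN;
	return events;
}
```

Events outside the mask are left untouched, so masking only HUP still lets a genuine ERR report through.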
Released version 3.2-dev4 with the following main changes :
- BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()
- MINOR: stktable: add stkey_to_smp() helper
- MINOR: stktable: add stksess_getkey() helper
- MINOR: stktable: add sc[0-2]_key fetches
- BUG/MEDIUM: queues: Adjust the proxy counters when appropriate
- MINOR: trace: add help message for -dt argument
- MINOR: trace: ensure -dt priority over traces config section
- MINOR: trace: support all source alias on -dt
- BUG/MINOR: quic: reject NEW_TOKEN frames from clients
- MINOR: stktable: fix potential build issue in smp_to_stkey
- BUG/MEDIUM: stktable: fix missing lock on some table converters
- BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters
- MINOR: stktable: fix potential build issue in smp_to_stkey (2nd try)
- MINOR: stktable: add smp_fetch_stksess() helper function
- MEDIUM: stktable: split src-based key smp_fetch_sc functions
- MEDIUM: stktable: split sc_ and src_ fetch lookup logics
- MEDIUM: stktable: leverage smp_fetch_* helpers from sample conv
- DOC: config: unify sample conv|fetches optional arguments syntax
- DOC: config: stick-table converters support implicit <table> argument
- DOC: config: stick-table converter do accept ANY-typed input
- DOC: config: clarify return type for some stick-table converters
- DOC: config: refer to canonical sticktable converters for src_* fetches
- CLEANUP: stktable: move sample_conv_table_bytes_out_rate()
- MINOR: stktable: add table_{inc,clr}_gpc* converters
- BUG/MAJOR: quic: reject too large CRYPTO frames
- BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()
- BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro
- REORG: version: move the remaining BUILD_* stuff from haproxy.c to version.c
- BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours
- MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount
- BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null
- BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data
- CLEANUP: quic: remove unused prototype
- MINOR: quic: rename pacing_rate cb to pacing_inter
- BUG/MINOR: quic: do not increase congestion window if app limited
- MINOR: mux-quic: increment pacing retry counter on expired
- MEDIUM: quic: implement credit based pacing
- MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
- MEDIUM: quic: use dynamic credit for pacing
- MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
- MINOR: quic: adapt credit based pacing to BBR
- MINOR: tools: add errname to print errno macro name
- MINOR: debug: debug_parse_cli_show_dev: use errname
- MINOR: debug: show boot and runtime process settings in table
Major improvements have been introduced in pacing recently. Most
notably, QMUX schedules emission with a millisecond resolution, which
allows using a passive wait that is much more CPU friendly.
However, an issue remains with the pacing max credit. Unless BBR is
used, it is fixed to the value configured on the quic-cc-algo bind
statement. This is impractical: if too low, it may drastically
reduce performance due to the 1ms sleep resolution; if too high, some
clients will suffer from too much packet loss.
This commit fixes the issue by implementing a dynamic maximum credit
value based on the network conditions specific to each client.
The maximum value is calculated so that the current QMUX tasklet
context can emit enough data to cover the delay until the next tasklet
invocation. As such, avg_loop_us is used to detect the process load. If
it is too small, 1.5ms is used as the minimal value, to cover the extra
delay the system adds on top of a default 1ms sleep.
This should be backported up to 3.1.
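The computation can be sketched as below, assuming the per-connection pacing rate is expressed as an inter-datagram delay in nanoseconds. pacing_max_credit() and its inter_ns parameter are hypothetical; only the use of avg_loop_us and the 1.5ms floor come from the commit message. The idea is to allow, per wake-up, enough datagrams to keep the path busy until the tasklet is expected to run again.

```c
#include <stdint.h>

/* Floor on the expected delay between two tasklet runs (1.5ms). */
#define PACING_MIN_DELAY_US 1500

/* Hypothetical sketch: <avg_loop_us> is the average event loop duration in
 * microseconds, <inter_ns> the delay between two paced datagrams for this
 * connection. Returns the maximum number of datagrams (credit) that may be
 * emitted in a single burst. */
static uint32_t pacing_max_credit(uint64_t avg_loop_us, uint64_t inter_ns)
{
	uint64_t delay_us, credit;

	if (!inter_ns)
		return UINT32_MAX; /* no pacing rate known: do not limit bursts */

	/* expected time until the next invocation, never below 1.5ms */
	delay_us = avg_loop_us > PACING_MIN_DELAY_US ? avg_loop_us : PACING_MIN_DELAY_US;

	/* number of pacing intervals that fit in that delay */
	credit = delay_us * 1000 / inter_ns;
	if (!credit)
		credit = 1; /* always allow at least one datagram */

	return credit > UINT32_MAX ? UINT32_MAX : (uint32_t)credit;
}
```

For instance, with datagrams paced every 100us on a lightly loaded process (avg_loop_us below 1500), the limit is 1500us / 100us = 15 datagrams per wake-up; under heavier load, with avg_loop_us at 3000, it grows to 30.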
The pacing algorithm was revamped in the previous commit to implement a
credit based solution. This is a far more adaptive solution which, in
particular, allows catching up when the pause between paced emissions
was longer than expected.
This allows QMUX to remove the active loop based on tasklet wake-up.
Instead, a new task is used when emission should be paced. The main
advantage is that CPU usage is drastically reduced.
The new pacing task timer is reset each time qcc_io_send() is invoked.
The timer will be set only if the pacing engine reports that emission must
be interrupted. In this case the timer is set via qcc_wakeup_pacing() to
the delay reported by the congestion algorithm, or 1ms if that delay is too
short. At the end of qcc_io_cb(), the pacing task is queued if its timer has
been set.
Pacing task execution is simple enough: it immediately wakes up the QCC I/O
handler.
Note that decent performance requires a large enough burst to be
defined in the quic-cc-algo configuration. However, this value is
shared by all clients of a listener, which may cause too much loss under
some network conditions. This will be addressed in a future patch.
This should be backported up to 3.1.
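A minimal sketch of credit-based pacing driven by a timer follows. Everything in it is hypothetical (struct pacer, pacer_refill(), pacer_consume(), pacer_wait_ns()); it is not the QMUX implementation, only an illustration of the mechanism described here: elapsed time is turned into credit, each datagram consumes one unit, and when credit runs out the caller arms a timer instead of busy-looping, with a 1ms lower bound on the delay.

```c
#include <stdint.h>

#define MS_TO_NS(ms) ((uint64_t)(ms) * 1000000ULL)

/* Hypothetical per-connection pacer state. Timestamps come from a monotonic
 * clock, and inter_ns must be non-zero. */
struct pacer {
	uint64_t inter_ns;   /* delay between two datagrams (from the CC algorithm) */
	uint64_t last_ns;    /* reference time of the last credit refill */
	uint32_t credit;     /* datagrams that may be sent right now */
	uint32_t max_credit; /* burst limit */
};

/* Convert elapsed time into credit, capped at max_credit. The sub-interval
 * remainder is kept in last_ns so that no fraction of an interval is lost. */
static void pacer_refill(struct pacer *p, uint64_t now_ns)
{
	uint64_t gained = (now_ns - p->last_ns) / p->inter_ns;
	uint64_t total;

	if (!gained)
		return;
	total = p->credit + gained;
	p->credit = total > p->max_credit ? p->max_credit : (uint32_t)total;
	p->last_ns += gained * p->inter_ns;
}

/* Try to emit up to <want> datagrams; returns how many are allowed now. */
static uint32_t pacer_consume(struct pacer *p, uint64_t now_ns, uint32_t want)
{
	uint32_t n;

	pacer_refill(p, now_ns);
	n = want < p->credit ? want : p->credit;
	p->credit -= n;
	return n;
}

/* Once credit is exhausted, return the delay to arm the pacing timer with:
 * the time until the next credit unit, but never less than 1ms. */
static uint64_t pacer_wait_ns(const struct pacer *p, uint64_t now_ns)
{
	uint64_t next = p->last_ns + p->inter_ns;
	uint64_t delay = next > now_ns ? next - now_ns : 0;

	return delay < MS_TO_NS(1) ? MS_TO_NS(1) : delay;
}
```

Accumulating credit from elapsed time, rather than allowing a fixed number of datagrams per wake-up, is what lets the sender catch up after a longer-than-expected pause, bounded by max_credit.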
As discussed in GH #2423, there are some cases where src_{inc,clr}_gpc*
are not sufficient because we need to perform the lookup on a specific
key. Indeed, just like we did in e642916 ("MEDIUM: stktable: leverage
smp_fetch_* helpers from sample conv"), we can easily implement new
table converters based on existing fetches. This is what we do in
this patch.
Also, the doc was updated so that src_{inc,clr}_gpc* fetches now point to
their generic equivalents table_{inc,clr}_gpc*, since src_{inc,clr}_gpc*
are simply aliases.
This should fix GH #2423.
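To illustrate the aliasing relationship, here is a toy C model, not HAProxy's stick-table code: a tiny fixed-size table keyed by strings, with an array of general purpose counters per entry. table_inc_gpc() and table_clr_gpc() accept an arbitrary key, while src_inc_gpc() is the same call with the key set to the client's source address. All names, sizes and the string key type are assumptions made for the example. The clear operation returns the counter's previous value, as the existing *_clr_gpc* fetches are documented to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBL_SIZE 8 /* hypothetical table capacity */
#define GPC_NUM  4 /* hypothetical number of counters per entry */

struct entry {
	char key[64];
	int used;
	uint64_t gpc[GPC_NUM];
};

struct table {
	struct entry e[TBL_SIZE];
};

/* Look <key> up, creating the entry if there is room. NULL if full. */
static struct entry *table_get(struct table *t, const char *key)
{
	struct entry *free_slot = NULL;

	for (size_t i = 0; i < TBL_SIZE; i++) {
		if (t->e[i].used && strcmp(t->e[i].key, key) == 0)
			return &t->e[i];
		if (!t->e[i].used && !free_slot)
			free_slot = &t->e[i];
	}
	if (!free_slot)
		return NULL;
	memset(free_slot, 0, sizeof(*free_slot));
	strncpy(free_slot->key, key, sizeof(free_slot->key) - 1);
	free_slot->used = 1;
	return free_slot;
}

/* Generic form: increment counter <idx> for any key, return the new value
 * (0 if the index is invalid or the table is full). */
static uint64_t table_inc_gpc(struct table *t, const char *key, unsigned int idx)
{
	struct entry *ent;

	if (idx >= GPC_NUM || !(ent = table_get(t, key)))
		return 0;
	return ++ent->gpc[idx];
}

/* Generic form: clear counter <idx> for any key, return its previous value. */
static uint64_t table_clr_gpc(struct table *t, const char *key, unsigned int idx)
{
	struct entry *ent;
	uint64_t prev;

	if (idx >= GPC_NUM || !(ent = table_get(t, key)))
		return 0;
	prev = ent->gpc[idx];
	ent->gpc[idx] = 0;
	return prev;
}

/* Alias: the same operation with the key fixed to the source address. */
static uint64_t src_inc_gpc(struct table *t, const char *src_addr, unsigned int idx)
{
	return table_inc_gpc(t, src_addr, idx);
}
```

Both src_inc_gpc() and table_inc_gpc() update the same entry when given the same address, whereas table_inc_gpc() can also target any other key, which is the gap GH #2423 describes.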
When available, to prevent doc duplication, let's make src_* fetches
point to equivalent table_* converters, as they are in fact aliases
for src,table_* converters.
Some stick-table converters such as "table_gpt" erroneously suggest that
the returned type is a boolean while in fact it is integer type, as
properly documented for the sample fetch equivalents.
Since 2d17db58 ("MINOR: stick-table: change all stick-table converters'
inputs to SMP_T_ANY"), all stick-table converters accept ANY input
type as parameter, which means the key is no longer restricted to a
string representation of the input. However, the doc wasn't updated when
the change was made. Moreover, some converters document the updated behavior
while others don't, which is confusing, so let's fix that.
As with stick-table sample fetches, the <table> argument is not strictly
needed and defaults to the current proxy's stick-table when not provided.
Let's update the doc and prototypes to reflect the current behavior.
The most common (and seemingly proper) way to declare optional
arguments in sample fetch or converter prototypes is to enclose
them in square brackets, including the leading comma (because the
comma must be omitted if the argument is not provided). Also, when
multiple optional arguments are found, the same logic should be applied
recursively, e.g. "conv(<a>[,<b>[,<c>]])" for a hypothetical converter
taking one mandatory and two optional arguments.
In this patch we fix prototypes that include optional arguments and don't
follow this syntax. This improves readability and sets the norm for
upcoming sample fetches/converters.
As discussed in GH #1750, we were lacking a sample fetch to
retrieve the key from the currently tracked counter entry. To do so,
the sc_key fetch can now be used. It returns a sample with the correct
type (table key type) corresponding to the tracked counter entry (from
previous track-sc rules).
If no entry is currently tracked, it returns nothing.
It can be used with either the standard form "sc_key(<sc_number>)" or the
legacy forms "sc0_key", "sc1_key" and "sc2_key".
Documentation was updated.
Released version 3.2-dev3 with the following main changes :
- DOC: config: add missing "track-sc0" in action keywords matrix
- BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
- BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission
- BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()
- Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"
- BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission
- MINOR: mux-quic: add traces on sd attach
- BUG/MEDIUM: mux-quic: do not attach on already closed stream
- BUG/MINOR: compression: handle a possible strdup() failure
- BUG/MINOR: pool: handle a possible strdup() failure
- BUG/MINOR: cfgparse-tcp: handle a possible strdup() failure
- BUG/MINOR: log: Allow to use if/unless conditionals for do-log action
- MINOR: config: Alert about extra arguments for errorfile and errorloc
- BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()
- MINOR: mux-quic: change return value of qcs_attach_sc()
- BUG/MINOR: mux-quic: handle closure of uni-stream
- BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined
- BUG/MAJOR: ssl/ocsp: fix NULL conn object dereferencing to access QUIC TLS counters
- MEDIUM: errors: get rid of shm_open()
- BUILD: makefile: do not clean standalone binaries on a simple "make clean"
- BUILD: makefile: add a qinfo macro to pass info in quiet mode
- DEV: ncpu: add a simple utility to help with NUMA development
- DEV: ncpu: implement a wrapper mode
- DEV: ncpu: make the wrapper work both as a lib and executable
- BUG/MEDIUM: h1-htx: Properly handle bodyless messages
- MINOR: tools: add a few functions to simply check for a file's existence
In d54e8f8107 ("DOC: config: reorganize actions into their own section"),
"track-sc0" keyword was properly documented but the keyword was not placed
in the action keywords matrix alongside other track-sc* statements. It
was probably overlooked, so let's fix that.
Could be backported up to 2.9 with d54e8f8107.
Released version 3.2-dev2 with the following main changes :
- MINOR: build: define DEBUG_STRESS
- MINOR: applet: define applet_putchk_stress() alternative
- MINOR: stats: use stress mode to force reentrant dumps
- CI: scripts: add support for AWS-LC-FIPS in build-ssl.sh
- MINOR: ssl: add "FIPS" details in haproxy -vv
- MEDIUM: ssl: rename 'OpenSSL' by 'SSL library' in haproxy -vv
- CI: github: let's add an AWS-LC-FIPS job
- MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)
- BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)
- BUG/MINOR: quic: fix BBR max bandwidth oscillation issue.
- BUG/MINOR: quic: wrong bbr_target_inflight() implementation
- BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
- BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()
- BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
- MINOR: quic: reduce the private data size of QUIC cc algos
- CLEANUP: quic: remove a wrong comment about ->app_limited (drs)
- BUG/MINOR: quic: fix the wrong tracked recovery start time value
- BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)
- BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines
- REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings
- BUG/MINOR: quic: missing Startup accelerating probing bw states
- CLEANUP: quic: Rename some BBR functions in relation with bw probing
- REORG: startup: move global.maxconn calculations in limits.c
- REORG: startup: move code that applies limits to limits.c
- REORG: startup: move nofile limit checks in limits.c
- MINOR: ssl: add utils functions to extract X509 notAfter date
- MINOR: ssl/cli: allow to filter expired certificates with 'show ssl sni'
- MINOR: ssl/cli: add -A to the 'show ssl sni' command description
- BUG/MINOR: ssl/cli: 'show ssl cert' escape the first '*' of a filename
- BUG/MINOR: ssl/cli: 'show ssl crl-file' escape the first '*' of a filename
- BUG/MINOR: ssl/cli: 'show ssl ca-file' escape the first '*' of a filename
- BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date
- BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving
- BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
- CLEANUP: queues: Remove pendconn_grab_from_px().
- BUILD: debug: only dump/reset glitch counters when really defined
- MINOR: compiler: add a __has_builtin() macro to detect features more easily
- MINOR: compiler: rely on builtin detection for __builtin_unreachable()
- MINOR: compiler: add a new "ASSUME" macro to help the compiler
- MINOR: compiler: also enable __builtin_assume() for ASSUME()
- MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid
- MINOR: bug: make BUG_ON() fall back to ASSUME
- CLEANUP: cache: use ASSUME_NONNULL() instead of DISGUISE()
- CLEANUP: hlua: use ASSUME_NONNULL() instead of ALREADY_CHECKED()
- CLEANUP: htx: use ASSUME_NONNULL() to mark the start line as non-null
- CLEANUP: mux-fcgi: use ASSUME_NONNULL() to indicate that the first block exists
- CLEANUP: stats: use ASSUME_NONNULL() to indicate that the first block exists
- CLEANUP: quic: replace ALREADY_CHECKED() with ASSUME_NONNULL() at a few places
- CLEANUP: ssl-sock: drop two now unneeded ALREADY_CHECKED()
- BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing
- CLEANUP: mux-quic: remove unused qcc member send_retry_list
- MINOR: quic: add traces
- MINOR: mux-quic: refactor wait-for-handshake support
- MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption
- MEDIUM/OPTIM: mux-quic: implement purg_list
- MINOR: mux-quic: extract code to build STREAM frames list
- MINOR: mux-quic: split STREAM and RS/SS emission
- MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
- MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb
- MINOR: trace: implement tracing disabling API
- MINOR: mux-quic: hide traces when woken up on pacing only
- MINOR: ssl/cli: add a 'Uncommitted' status for 'show ssl' commands
- MINOR: ssl/ocsp: Add extra details in error logs when possible
- BUILD: ssl/ocsp: error: ‘%.*s’ directive argument is null
- MEDIUM: ssl/ocsp: OCSP response is expired with OCSP_MAX_RESPONSE_TIME_SKEW
- MINOR: ssl: improve HAVE_SSL_OCSP ifdef
- DOC: config: add example for server "track" keyword
- DOC: config: reorder "tune.lua.*" keywords by alphabetical order
- DOC: config: add "tune.lua.burst-timeout" to the list of global parameters
- MINOR: hlua: add option to preserve bool type from smp to lua
- REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool
- BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA
- CLEANUP: mux-quic: remove dead err label in qcc_build_frms()
- BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse
- MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"
- MINOR: ssl: change visibility of ssl_stats_module
- MINOR: ssl: rework the error management in the OCSP callback
- MEDIUM: ssl/ocsp: counters for OCSP stapling
- CI: limit aws-lc and libressl Quic Interop to "haproxy" only
- BUG/MEDIUM: queue: Make process_srv_queue return the number of streams
- CI: github: try to build the latest WolfSSL master weekly
- CI: github: activate ASAN on the WolfSSL weekly job
- BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"
- MINOR: stktable: add stktable_get_data_type_idx() helper function
- MINOR: stktable: support optional index for array types in {set, clear, show} table commands
- CI: scripts: allow to build wolfssl with --enable-debug
- CI: github: activate debug in wolfssl weekly build
- BUG/MEDIUM: queues: Strictly respect maxconn for outgoing connections
- MEDIUM: queue: Handle the race condition between queue and dequeue differently
- CLEANUP: Remove pendconn_must_try_again().
- BUILD: compat: add missing fcntl.h before defining F_SETPIPE_SZ
- BUILD: mworker: always initialize the saveptr of strtok_r()
- BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k
- BUG/MINOR: checks: handle a possible strdup() failure
- BUG/MINOR: listener: handle a possible strdup() failure
- BUG/MINOR: mux_h1: handle a possible strdup() failure
- BUG/MINOR: debug: handle a possible strdup() failure