Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Sharding by-group is exactly identical to by-process for a single
group, and will use the same number of file descriptors for more than
one group, while significantly lowering the kernel's locking overhead.
Now that all special listeners (cli, peers) are properly handled, and
that support for SO_REUSEPORT is detected at runtime per protocol, there
should be no more reason for now switching to by-group by default.
That's what this patch does. It does only this and nothing else so that
it's easy to revert, should any issue be raised.
Testing on an AMD EPYC 74F3 featuring 24 cores and 48 threads distributed
into 8 core complexes of 3 cores each, shows that configuring 8 groups
(one per CCX) is sufficient to simply double the forwarded connection
rate from 112k to 214k/s, reducing kernel locking from 71 to 55%.
This new setting accepts "by-process", "by-group" and "by-thread" and
will dictate how listeners will be sharded by default when nothing is
specified. While the default remains "by-process", "by-group" should be
much more efficient with many threads, while not changing anything for
single-group setups.
Some protocol support SO_REUSEPORT and others not. Some have such a
limitation in the kernel, and others in haproxy itself (e.g. sock_unix
cannot support multiple bindings since each one will unbind the previous
one). Also it's really protocol-dependent and not just family-dependent
because on Linux for some time it was supported for TCP and not UDP.
Let's move the definition to the protocols instead. Now it's preset in
tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset.
The enabled() config condition test validates IPv4 (generally sufficient),
and -dR / noreuseport all protocols at once.
This new algorithm for rebalancing incoming connections to multiple
threads is simpler and instead of considering the threads load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts where it's not always certain
that the least loaded thread will always be found.
Proxies belonging to the cfg_log_forward proxy list are not cleaned up
in haproxy deinit() function.
We add the missing cleanup directly in the main deinit() function since
no other specific function may be used for this.
This could be backported up to 2.4
When a ring section is configured, a new sink is created and forward_px
proxy may be allocated and assigned to the sink.
Such sink-related proxies are added to the sink_proxies_list and thus
don't belong to the main proxy list which is cleaned up in
haproxy deinit() function.
We don't have to manually clean up sink_proxies_list in the main deinit()
func:
sink API already provides the sink_deinit() function so we just add the
missing free_proxy(sink->forward_px) there.
This could be backported up to 2.4.
[in 2.4, commit b0281a49 ("MINOR: proxy: check if p is NULL in free_proxy()")
must be backported first]
Starting haproxy with -dL helps enumerate the list of libraries in use.
But sometimes in order to go further we'd like to see their address
ranges. This is already supported on the CLI's "show libs" but not on
the command line where it can sometimes help troubleshoot startup issues.
Let's dump them when in verbose mode. This way it doesn't change the
existing behavior for those trying to enumerate libs to produce an archive.
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)
This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.
Must be backported as far as 2.5.
Since the following commit :
commit fb375574f9
MINOR: quic: mark quic-conn as jobs on socket allocation
quic-conn instances are marked as jobs. This prevent haproxy process to
stop while there is transfer in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.
However, there is no mechanism to wake up quic connections left on
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connections timer has expired.
To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.
This should be backported up to 2.7.
This patch adds support from HAPROXY_BRANCH environment variable.
It can be useful is some resources are loaded from different
locations when migrating from one version to another.
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
In haproxy startup, all init error paths after the protocol bind step
cautiously call protocol_unbind_all() before exiting except one that was
conditional. We're not making an exception to the rule and we now properly
call protocol_unbind_all() as well.
No backport needed as this patch is unnoticeable.
HAPROXY_STARTUP_VERSION: contains the version used to start, in
master-worker mode this is the version which was used to start the
master, even after updating the binary and reloading.
This patch could be backported in every version since it is useful when
debugging.
The option was renamed to only permit to disable the fast-forward. First
there is no reason to enable it because it is the default behavior. Then it
introduced a bug because there is no way to be sure the command line has
precedence over the configuration this way. So, the option is now named
"tune.disable-fast-forward" and does not support any argument. And of
course, the commande line option "-dF" has now precedence over the
configuration.
No backport needed.
Since the recent changes on the clocks, now.tv_sec is not to be used
between processes because it's a clock which is local to the process and
does not contain a real unix timestamp. This patch fixes the issue by
using "data.tv_sec" which is the wall clock instead of "now.tv_sec'.
It prevents having incoherent timestamps.
It also introduces some checks on negatives values in order to never
displays a netative value if it was computed from a wrong value set by a
previous haproxy version.
It must be backported as far as 2.0.
Several times during debugging it has been difficult to find a way to
reliably indicate if a thread had been started and if it was still
running. It's really not easy because the elements we look at are not
necessarily reliable (e.g. harmless bit or idle bit might not reflect
what we think during a signal). And such notions can be subjective
anyway.
Here we define two thread flags, TH_FL_STARTED which is set as soon as
a thread enters run_thread_poll_loop() and drops the idle bit, and
another one, TH_FL_IN_LOOP, which is set when entering run_poll_loop()
and cleared when leaving it. This should help init/deinit code know
whether it's called from a non-initialized thread (i.e. tid must not
be trusted), or shared functions know if they're being called from a
running thread or from init/deinit code outside of the polling loop.
The -dF option can now be used to disable data fast-forward. It does the
same than the global option "tune.fast-forward off". Some reg-tests may rely
on this optim. To detect the feature and skip such script, the following
vtest command must be used:
feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"
Uptime calculation for master process was incorrect as it used
<start_date> as its timestamp base time. Fix this by using the scheduler
time <start_time> for this.
The impact of this bug is minor as timestamp base time is only used for
"show proc" CLI output. it was highlighted by the following commit.
which caused a negative value to be displayed for the master process
uptime on "show proc" output.
28360dc53f
MEDIUM: clock: force internal time to wrap early after boot
This should be backported up to 2.0.
We've had a start date even before the internal monotonic clock existed,
but once the monotonic clock was added, the start date was not updated
to distinguish the wall clock time units and the internal monotonic time
units. The distinction is important because both clocks do not necessarily
progress at the same speed. The very rare occurrences of the wall-clock
date are essentially for human consumption and communication with third
parties (e.g. report the start date in "show info" for monitoring
purposes). However currently this one is also used to measure the distance
to "now" as being the process' uptime. This is actually not correct. It
only works because for now the two dates are initialized at the exact
same instant at boot but could still be wrong if the system's date shows
a big jump backwards during startup for example. In addition the current
situation prevents us from enforcing an abritrary offset at boot to reveal
some heisenbugs.
This patch adds a new "start_time" at boot that is set from "now" and is
used in uptime calculations. "start_date" instead is now set from "date"
and will always reflect the system date for human consumption (e.g. in
"show info"). This way we're now sure that any drift of the internal
clock relative to the system date will not impact the reported uptime.
This could possibly be backported though it's unlikely that anyone has
ever noticed the problem.
Define a new configuration option "tune.quic.max-frame-loss". This is
used to specify the limit for which a single frame instance can be
detected as lost. If exceeded, the connection is closed.
This should be backported up to 2.7.
idle and harmless bits in the tgroup_ctx structure were not explicitly
set during boot.
| struct tgroup_ctx ha_tgroup_ctx[MAX_TGROUPS] = { };
As the structure is first statically initialized,
.threads_harmless and .threads_idle are automatically zero-
initialized by the compiler.
Unfortulately, this means that such threads are not considered idle
nor harmless by thread_isolate(_full)() functions until they enter
the polling loop (thread_harmless_now() and thread_idle_now() are
respectively called before entering the polling loop)
Because of this, any attempt to call thread_isolate() or thread_isolate_full()
during a startup phase with nbthreads >= 2 will cause thread_isolate to
loop until every secondary threads make it through their first polling loop.
If the startup phase is aborted during boot (ie: "-c" option to check the
configuration), secondary threads may be initialized but will never be started
(ie: they won't enter the polling loop), thus thread_isolate()
could would loop forever in such cases.
We can easily reveal the bug with this patch reproducer:
| diff --git a/src/haproxy.c b/src/haproxy.c
| index e91691658..0b733f6ee 100644
| --- a/src/haproxy.c
| +++ b/src/haproxy.c
| @@ -2317,6 +2317,10 @@ static void init(int argc, char **argv)
| if (pr || px) {
| /* At least one peer or one listener has been found */
| qfprintf(stdout, "Configuration file is valid\n");
| + printf("haproxy will loop...\n");
| + thread_isolate();
| + printf("we will never reach this\n");
| + thread_release();
| deinit_and_exit(0);
| }
| qfprintf(stdout, "Configuration file has no error but will not start (no listener) => exit(2).\n");
Now we start haproxy with a valid config:
$> haproxy -c -f valid.conf
Configuration file is valid
haproxy will loop...
^C
------------------------------------------------------------------------------
This did not cause any issue so far because no early deinit paths require
full thread isolation. But this may change when new features or requirements
are introduced, so we should fix this before it becomes a real issue.
To fix this, we explicitly assign .threads_harmless and .threads_idle
to .threads_enabled value in thread_map_to_groups() function during boot.
This is the proper place to do this since as long as .threads_enabled is not
explicitly set, its default value is also 0 (zero-initialized by the compiler)
code snippet from thread_isolate() function:
ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);
ulong th = _HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp].threads_harmless);
if ((th & te) == te)
break;
Thus thread_isolate(_full()) won't be looping forever in thread_isolate()
even if it were to be used before thread_map_to_groups() is executed.
No backport needed unless this is a requirement.
Since 2.6 we have a free_logsrv() function that is used to release log
servers. It must be called from deinit() instead of manually iterating
over the log servers, otherwise some parts of the structure are not
freed (namely the ring name), as reported by ASAN.
This should be backported to 2.6.
A few loops waiting for threads to synchronize such as thread_isolate()
rightfully filter the thread masks via the threads_enabled field that
contains the list of enabled threads. However, it doesn't use an atomic
load on it. Before 2.7, the equivalent variables were marked as volatile
and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and
the risk that the compiler keeps them in a register inside a loop is not
null at all. In practice when ha_thread_relax() calls sched_yield() or
an x86 PAUSE instruction, it could be verified that the variable is
always reloaded. If these are avoided (e.g. architecture providing
neither solution), it's visible in asm code that the variables are not
reloaded. In this case, if a thread exists just between the moment the
two values are read, the loop could spin forever.
This patch adds the required _HA_ATOMIC_LOAD() on the relevant
threads_enabled fields. It must be backported to 2.7.
Released version 2.8-dev1 with the following main changes :
- MEDIUM: 51d: add support for 51Degrees V4 with Hash algorithm
- MINOR: debug: support pool filtering on "debug dev memstats"
- MINOR: debug: add a balance of alloc - free at the end of the memstats dump
- LICENSE: wurfl: clarify the dummy library license.
- MINOR: event_hdl: add event handler base api
- DOC/MINOR: api: add documentation for event_hdl feature
- MEDIUM: ssl: rename the struct "cert_key_and_chain" to "ckch_data"
- MINOR: quic: remove qc from quic_rx_packet
- MINOR: quic: complete traces in qc_rx_pkt_handle()
- MINOR: quic: extract datagram parsing code
- MINOR: tools: add port for ipcmp as optional criteria
- MINOR: quic: detect connection migration
- MINOR: quic: ignore address migration during handshake
- MINOR: quic: startup detect for quic-conn owned socket support
- MINOR: quic: test IP_PKTINFO support for quic-conn owned socket
- MINOR: quic: define config option for socket per conn
- MINOR: quic: allocate a socket per quic-conn
- MINOR: quic: use connection socket for emission
- MEDIUM: quic: use quic-conn socket for reception
- MEDIUM: quic: move receive out of FD handler to quic-conn io-cb
- MINOR: mux-quic: rename duplicate function names
- MEDIUM: quic: requeue datagrams received on wrong socket
- MINOR: quic: reconnect quic-conn socket on address migration
- MINOR: quic: activate socket per conn by default
- BUG/MINOR: ssl: initialize SSL error before parsing
- BUG/MINOR: ssl: initialize WolfSSL before parsing
- BUG/MINOR: quic: fix fd leak on startup check quic-conn owned socket
- BUG/MEDIIM: stconn: Flush output data before forwarding close to write side
- MINOR: server: add srv->rid (revision id) value
- MINOR: stats: add server revision id support
- MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events
- MINOR: server/event_hdl: add support for SERVER_UP and SERVER_DOWN events
- BUG/MEDIUM: checks: do not reschedule a possibly running task on state change
- BUG/MINOR: checks: make sure fastinter is used even on forced transitions
- CLEANUP: assorted typo fixes in the code and comments
- MINOR: mworker: display an alert upon a wait-mode exit
- BUG/MEDIUM: mworker: fix segv in early failure of mworker mode with peers
- BUG/MEDIUM: mworker: create the mcli_reload socketpairs in case of upgrade
- BUG/MINOR: checks: restore legacy on-error fastinter behavior
- MINOR: check: use atomic for s->consecutive_errors
- MINOR: stats: properly handle ST_F_CHECK_DURATION metric
- MINOR: mworker: remove unused legacy code in mworker_cleanlisteners
- MINOR: peers: unused code path in process_peer_sync
- BUG/MINOR: init/threads: continue to limit default thread count to max per group
- CLEANUP: init: remove useless assignment of nbthread
- BUILD: atomic: atomic.h may need compiler.h on ARMv8.2-a
- BUILD: makefile/da: also clean Os/ in Device Atlas dummy lib dir
- BUG/MEDIUM: httpclient/lua: double LIST_DELETE on end of lua task
- CLEANUP: pools: move the write before free to the uaf-only function
- CLEANUP: pool: only include pool-os from pool.c not pool.h
- REORG: pool: move all the OS specific code to pool-os.h
- CLEANUP: pools: get rid of CONFIG_HAP_POOLS
- DEBUG: pool: show a few examples in -dMhelp
- MINOR: pools: make DEBUG_UAF a runtime setting
- BUG/MINOR: promex: create haproxy_backend_agg_server_status
- MINOR: promex: introduce haproxy_backend_agg_check_status
- DOC: promex: Add missing backend metrics
- BUG/MAJOR: fcgi: Fix uninitialized reserved bytes
- REGTESTS: fix the race conditions in iff.vtc
- CI: github: reintroduce openssl 1.1.1
- BUG/MINOR: quic: properly handle alloc failure in qc_new_conn()
- BUG/MINOR: quic: handle alloc failure on qc_new_conn() for owned socket
- CLEANUP: mux-quic: remove unused attribute on qcs_is_close_remote()
- BUG/MINOR: mux-quic: remove qcs from opening-list on free
- BUG/MINOR: mux-quic: handle properly alloc error in qcs_new()
- CI: github: split ssl lib selection based on git branch
- REGTESTS: startup: check maxconn computation
- BUG/MINOR: startup: don't use internal proxies to compute the maxconn
- REGTESTS: startup: change the expected maxconn to 11000
- CI: github: set ulimit -n to a greater value
- REGTESTS: startup: activate automatic_maxconn.vtc
- MINOR: sample: add param converter
- CLEANUP: ssl: remove check on srv->proxy
- BUG/MEDIUM: freq-ctr: Don't compute overshoot value for empty counters
- BUG/MEDIUM: resolvers: Use tick_first() to update the resolvers task timeout
- REGTESTS: startup: add alternatives values in automatic_maxconn.vtc
- BUG/MEDIUM: h3: reject request with invalid header name
- BUG/MEDIUM: h3: reject request with invalid pseudo header
- MINOR: http: extract content-length parsing from H2
- BUG/MEDIUM: h3: parse content-length and reject invalid messages
- CI: github: remove redundant ASAN loop
- CI: github: split matrix for development and stable branches
- BUG/MEDIUM: mux-h1: Don't release H1 stream upgraded from TCP on error
- BUG/MINOR: mux-h1: Fix test instead a BUG_ON() in h1_send_error()
- MINOR: http-htx: add BUG_ON to prevent API error on http_cookie_register
- BUG/MEDIUM: h3: fix cookie header parsing
- BUG/MINOR: h3: fix memleak on HEADERS parsing failure
- MINOR: h3: check return values of htx_add_* on headers parsing
- MINOR: ssl: Remove unneeded buffer allocation in show ocsp-response
- MINOR: ssl: Remove unnecessary alloc'ed trash chunk in show ocsp-response
- BUG/MINOR: ssl: Fix memory leak of find_chain in ssl_sock_load_cert_chain
- MINOR: stats: provide ctx for dumping functions
- MINOR: stats: introduce stats field ctx
- BUG/MINOR: stats: fix show stat json buffer limitation
- MINOR: stats: make show info json future-proof
- BUG/MINOR: quic: fix crash on PTO rearm if anti-amplification reset
- BUILD: 51d: fix build issue with recent compilers
- REGTESTS: startup: disable automatic_maxconn.vtc
- BUILD: peers: peers-t.h depends on stick-table-t.h
- BUG/MEDIUM: tests: use tmpdir to create UNIX socket
- BUG/MINOR: mux-h1: Report EOS on parsing/internal error for not running stream
- BUG/MINOR:: mux-h1: Never handle error at mux level for running connection
- BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats
- OPTIM: pool: split the read_mostly from read_write parts in pool_head
- MINOR: pool: make the thread-local hot cache size configurable
- MINOR: freq_ctr: add opportunistic versions of swrate_add()
- MINOR: pool: only use opportunistic versions of the swrate_add() functions
- REGTESTS: ssl: enable the ssl_reuse.vtc test for WolfSSL
- BUG/MEDIUM: mux-quic: fix double delete from qcc.opening_list
- BUG/MEDIUM: quic: properly take shards into account on bind lines
- BUG/MINOR: quic: do not allocate more rxbufs than necessary
- MINOR: ssl: Add a lock to the OCSP response tree
- MINOR: httpclient: Make the CLI flags public for future use
- MINOR: ssl: Add helper function that extracts an OCSP URI from a certificate
- MINOR: ssl: Add OCSP request helper function
- MINOR: ssl: Add helper function that checks the validity of an OCSP response
- MINOR: ssl: Add "update ssl ocsp-response" cli command
- MEDIUM: ssl: Add ocsp_certid in ckch structure and discard ocsp buffer early
- MINOR: ssl: Add ocsp_update_tree and helper functions
- MINOR: ssl: Add crt-list ocsp-update option
- MINOR: ssl: Store 'ocsp-update' mode in the ckch_data and check for inconsistencies
- MEDIUM: ssl: Insert ocsp responses in update tree when needed
- MEDIUM: ssl: Add ocsp update task main function
- MEDIUM: ssl: Start update task if at least one ocsp-update option is set to on
- DOC: ssl: Add documentation for ocsp-update option
- REGTESTS: ssl: Add tests for ocsp auto update mechanism
- MINOR: ssl: Move OCSP code to a dedicated source file
- BUG/MINOR: ssl/ocsp: check chunk_strcpy() in ssl_ocsp_get_uri_from_cert()
- CLEANUP: ssl/ocsp: add spaces around operators
- BUG/MEDIUM: mux-h2: Refuse interim responses with end-stream flag set
- BUG/MINOR: pool/stats: Use ullong to report total pool usage in bytes in stats
- BUG/MINOR: ssl/ocsp: httpclient blocked when doing a GET
- MINOR: httpclient: don't add body when istlen is empty
- MEDIUM: httpclient: change the default log format to skip duplicate proxy data
- BUG/MINOR: httpclient/log: free of invalid ptr with httpclient_log_format
- MEDIUM: mux-quic: implement shutw
- MINOR: mux-quic: do not count stream flow-control if already closed
- MINOR: mux-quic: handle RESET_STREAM reception
- MEDIUM: mux-quic: implement STOP_SENDING emission
- MINOR: h3: use stream error when needed instead of connection
- CI: github: enable github api authentication for OpenSSL tags read
- BUG/MINOR: mux-quic: ignore remote unidirectional stream close
- CI: github: use the GITHUB_TOKEN instead of a manually generated token
- BUILD: makefile: build the features list dynamically
- BUILD: makefile: move common options-oriented macros to include/make/options.mk
- BUILD: makefile: sort the features list
- BUILD: makefile: initialize all build options' variables at once
- BUILD: makefile: add a function to collect all options' CFLAGS/LDFLAGS
- BUILD: makefile: start to automatically collect CFLAGS/LDFLAGS
- BUILD: makefile: ensure that all USE_* handlers appear before CFLAGS are used
- BUILD: makefile: clean the wolfssl include and lib generation rules
- BUILD: makefile: make sure to also ignore SSL_INC when using wolfssl
- BUILD: makefile: reference libdl only once
- BUILD: makefile: make sure LUA_INC and LUA_LIB are always initialized
- BUILD: makefile: do not restrict Lua's prepend path to empty LUA_LIB_NAME
- BUILD: makefile: never force -latomic, set USE_LIBATOMIC instead
- BUILD: makefile: add an implicit USE_MATH variable for -lm
- BUILD: makefile: properly report USE_PCRE/USE_PCRE2 in features
- CLEANUP: makefile: properly indent ifeq/ifneq conditional blocks
- BUILD: makefile: rework 51D to split v3/v4
- BUILD: makefile: support LIBCRYPT_LDFLAGS
- BUILD: makefile: support RT_LDFLAGS
- BUILD: makefile: support THREAD_LDFLAGS
- BUILD: makefile: support BACKTRACE_LDFLAGS
- BUILD: makefile: support SYSTEMD_LDFLAGS
- BUILD: makefile: support ZLIB_CFLAGS and ZLIB_LDFLAGS
- BUILD: makefile: support ENGINE_CFLAGS
- BUILD: makefile: support OPENSSL_CFLAGS and OPENSSL_LDFLAGS
- BUILD: makefile: support WOLFSSL_CFLAGS and WOLFSSL_LDFLAGS
- BUILD: makefile: support LUA_CFLAGS and LUA_LDFLAGS
- BUILD: makefile: support DEVICEATLAS_CFLAGS and DEVICEATLAS_LDFLAGS
- BUILD: makefile: support PCRE[2]_CFLAGS and PCRE[2]_LDFLAGS
- BUILD: makefile: refactor support for 51DEGREES v3/v4
- BUILD: makefile: support WURFL_CFLAGS and WURFL_LDFLAGS
- BUILD: makefile: make all OpenSSL variants use the same settings
- BUILD: makefile: remove the special case of the SSL option
- BUILD: makefile: only consider settings from enabled options
- BUILD: makefile: also list per-option settings in 'make opts'
- BUG/MINOR: debug: don't mask the TH_FL_STUCK flag before dumping threads
- MINOR: cfgparse-ssl: avoid a possible crash on OOM in ssl_bind_parse_npn()
- BUG/MINOR: ssl: Missing goto in error path in ocsp update code
- BUG/MINOR: stick-table: report the correct action name in error message
- CI: Improve headline in matrix.py
- CI: Add in-memory cache for the latest OpenSSL/LibreSSL
- CI: Use proper `if` blocks instead of conditional expressions in matrix.py
- CI: Unify the `GITHUB_TOKEN` name across matrix.py and vtest.yml
- CI: Explicitly check environment variable against `None` in matrix.py
- CI: Reformat `matrix.py` using `black`
- MINOR: config: add environment variables for default log format
- REGTESTS: Remove REQUIRE_VERSION=1.9 from all tests
- REGTESTS: Remove REQUIRE_VERSION=2.0 from all tests
- REGTESTS: Remove tests with REQUIRE_VERSION_BELOW=1.9
- BUG/MINOR: http-fetch: Only fill txn status during prefetch if not already set
- BUG/MAJOR: buf: Fix copy of wrapping output data when a buffer is realigned
- DOC: config: fix alphabetical ordering of http-after-response rules
- MINOR: http-rules: Add missing actions in http-after-response ruleset
- DOC: config: remove duplicated "http-response sc-set-gpt0" directive
- BUG/MINOR: proxy: free orgto_hdr_name in free_proxy()
- REGTEST: fix the race conditions in json_query.vtc
- REGTEST: fix the race conditions in add_item.vtc
- REGTEST: fix the race conditions in digest.vtc
- REGTEST: fix the race conditions in hmac.vtc
- BUG/MINOR: fd: avoid bad tgid assertion in fd_delete() from deinit()
- BUG/MINOR: http: Memory leak of http redirect rules' format string
- MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters
- MINOR: stick-table: implement the sc-add-gpc() action
The number of stick-counter entries usable by track-sc rules is currently
set at build time. There is no good value for this since the vast majority
of users don't need any, most need only a few and rare users need more.
Adding more counters for everyone increases memory and CPU usages for no
reason.
This patch moves the per-session and per-stream arrays to a pool of a size
defined at boot time. This way it becomes possible to set the number of
entries at boot time via a new global setting "tune.stick-counters" that
sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR
value still applies, or 3 if not set, as before.
It is also possible to lower the value to 0 to save a bit of memory if
not used at all.
Note that a few low-level sample-fetch functions had to be protected due
to the ability to use sample-fetches in the global section to set some
variables.
This patch provides a convenient way to override the default TCP, HTTP
and HTTP log formats. Instead of having a look into the documentation
to figure out what is the appropriate default log format three new
environment variables can be used: HAPROXY_TCP_LOG_FMT,
HAPROXY_HTTP_LOG_FMT and HAPROXY_HTTPS_LOG_FMT. Their content are
substituted verbatim.
These variables are set before parsing the configuration and are unset
just after all configuration files are successful parsed.
Example:
# Instead of writing this long log-format line...
log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC \
%CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r \
lr=last_rule_file:last_rule_line"
# ..the HAPROXY_HTTP_LOG_FMT can be used to provide the default
# http log-format string
log-format "${HAPROXY_HTTP_LOG_FMT} lr=last_rule_file:last_rule_line"
Please note that nothing prevents users to unset the variables or
override their content in a global section.
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
Till now it was only possible to change the thread local hot cache size
at build time using CONFIG_HAP_POOL_CACHE_SIZE. But along benchmarks it
was sometimes noticed a huge contention in the lower level memory
allocators indicating that larger caches could be beneficial, especially
on machines with large L2 CPUs.
Given that the checks against this value was no longer on a hot path
anymore, there was no reason for continuing to force it to be tuned at
build time. So this patch allows to set it by tune.memory-hot-size.
It's worth noting that during the boot phase the value remains zero so
that it's possible to know if the value was set or not, which opens the
possibility that we try to automatically adjust it based on the per-cpu
L2 cache size or the use of certain protocols (none of this is done yet).
In ticket #1956, it was reported that an upgrade from 2.6 to 2.7 via a
reload would stop the master process.
When upgrading the binary, the new process is considered reexec and does
not try to creates the socketpair for the mcli_reload listener, then
tries to bind on -1 since the socket doesn't exit. The failure provokes
an exit() of the master.
This patch fixes the issue by trying to create the mcli_reload sockets
only when they don't exist, instead of creating them at first start.
This way we also avoid possible fd leak since we always try to use the
existing FDs first.
Must be backported in 2.7.
When the mworker wait mode fails it does an exit, but there is no
error message which says it exits.
Add a message which specify that the error is non-recoverable.
Could be backported in 2.7 and possibly earlier branch.
Activate QUIC connection socket to achieve the best performance. The
previous behavior can be reverted by tune.quic.socket-owner
configuration option.
This change is part of quic-conn owned socket implementation.
Contrary to its siblings patches, I suggest to not backport it to 2.7.
This should ensure that stable releases behavior is perserved. If a user
faces issues with QUIC performance on 2.7, he can nonetheless change the
default configuration.
This adds a USE_OPENSSL_WOLFSSL option, wolfSSL must be used with the
OpenSSL compatibility layer. This must be used with USE_OPENSSL=1.
WolfSSL build options:
./configure --prefix=/opt/wolfssl --enable-haproxy
HAProxy build options:
USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib'
Using at least the commit 54466b6 ("Merge pull request #5810 from
Uriah-wolfSSL/haproxy-integration") from WolfSSL. (2022-11-23).
This is still to be improved, reg-tests are not supported yet, and more
tests are to be done.
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
If no cluster-secret is defined by the user, a random one is silently
generated.
This ensures that at least QUIC Retry tokens are generated if abnormal
conditions are detected. However, it is advisable to specify it in the
configuration for tokens to be valid even after a reload or across LBs
instances in the same cluster.
This should be backported up to 2.6.
The SSL_load_error_strings function was marked as deprecated in OpenSSL
1.1.0 so compiling HAProxy with OPENSSL_NO_DEPRECATED set and a recent
OpenSSL library would fail.
The manpages say that this function was replaced by OPENSSL_init_crypto
and OPENSSL_init_ssl which are already called at start up by the SSL
lib. We do not seem to be in a case where explicit call of those
functions is required.
This patch fixes GitHub issue #1813.
It can be backported to 2.6.
Once in a while we spot a bug in the deinit code that is complex,
especially when it has to deal with incomplete initializations, and the
ability to bypass this step has regularly been raised. In addition for
fast-reloading setups it could theoretically save some time. Tests have
shown that very large configs can barely save ~100-150ms by skipping the
deinit step. However the ability not to crash if a bug is encountered can
occasionally help.
This patch adds an option to do exactly this. It's obviously not enabled
by default and the documentation discourages from using it, but this might
be useful in the future.
When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use
an shm which is used to keep the logs when switching to mworker wait
mode. This allows to keep the failed reload logs.
When allocating the startup-logs at first start of the process, haproxy
will do a shm_open with a unique path using the PID of the process, the
file is unlink immediatly so we don't let unwelcomed files be. The fd
resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD
environment variable so it can be mmap again when switching to wait
mode.
When forking children, the process is copying the mmap to a a mallocated
ring so we never share the same memory section between the master and
the workers. When switching to wait mode, the shm is not used anymore as
it is also copied to a mallocated structure.
This allow to use the "show startup-logs" command over the master CLI,
to get the logs of the latest startup or reload. This way the logs of
the latest failed reload are also kept.
This is only activated on the linux-glibc target for now.
As seen in issue #1866, some environments will not allow to change the
current FD limit, and actually we don't need to do it, we only do it as
a byproduct of adjusting the limit to the one that fits. Here we're
replacing calls to setrlimit() with calls to raise_rlim_nofile(), which
will avoid making the setrlimit() syscall in case the desired value is
lower than the current process' one.
This depends on previous commit "MINOR: fd: add a new function to only
raise RLIMIT_NOFILE" and may need to be backported to 2.6, possibly
earlier, depending on users' experience in such environments.
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.
Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contains a simple API to communicate between QUIC
lower layer and connection/MUX.
The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible to the implementation of QUIC
lower layer. Conceptually, it overlaps with TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.
This should be backported up to 2.6.
Add an option to dump the number lines of the configuration file when
it's dumped. Other options can be easily added. Options are separated
by ',' when tapping the command line:
'./haproxy -dC[key],line -f [file]'
No backport needed, except if anonymization mechanism is backported.
The environment variable HAPROXY_LOAD_SUCCESS stores "1" if it
successfully load the configuration and started, "0" otherwise.
The "_loadstatus" master CLI command displays either
"Loading failure!\n" or "Loading success.\n"
When using the "reload" command over the master CLI, all connections to
the master CLI were cut, this was unfortunate because it could have been
used to implement a synchronous reload command.
This patch implements an architecture to keep the connection alive after
the reload.
The master CLI is now equipped with a listener which uses a socketpair,
the 2 FDs of this socketpair are stored in the mworker_proc of the
master, which the master keeps via the environment variable.
ipc_fd[1] is used as a listener for the master CLI. During the "reload"
command, the CLI will send the FD of the current session over ipc_fd[0],
then the reload is achieved, so the master won't handle the recv of the
FD. Once reloaded, ipc_fd[1] receives the FD of the session, so the
connection is preserved. Of course it is a new context, so everything
like the "prompt mode" are lost.
Only the FD which performs the reload is kept.
This commit adds a new command line option -dC to dump the configuration
file. An optional key may be appended to -dC in order to produce an
anonymized dump using this key. The anonymizing process uses the same
algorithm as the CLI so that the same key will produce the same hashes
for the same identifiers. This way an admin may share an anonymized
extract of a configuration to match against live dumps. Note that key 0
will not anonymize the output. However, in any case, the configuration
is dumped after tokenizing, thus comments are lost.
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.
The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.
Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.
The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().
This fixes issue #1841.
Must be backported in every stable version.
Christopher bisected that recent commit d0b73bca71 ("MEDIUM: listener:
switch bind_thread from global to group-local") broke the master socket
in that only the first out of the Nth initial connections would work,
where N is the number of threads, after which they all work.
The cause is that the master socket was bound to multiple threads,
despite global.nbthread being 1 there, so the incoming connection load
balancing would try to send incoming connections to non-existing threads,
however the bind_thread mask would nonetheless include multiple threads.
What happened is that in 1.9 we forced "nbthread" to 1 in the master's poll
loop with commit b3f2be338b ("MEDIUM: mworker: use the haproxy poll loop").
In 2.0, nbthread detection was enabled by default in commit 149ab779cc
("MAJOR: threads: enable one thread per CPU by default"). From this point
on, the operation above is unsafe because everything during startup is
performed with nbthread corresponding to the default value, then it
changes to one when starting the polling loop. But by then we weren't
using the wait mode except for reload errors, so even if it would have
happened nobody would have noticed.
In 2.5 with commit fab0fdce9 ("MEDIUM: mworker: reexec in waitpid mode
after successful loading") we started to rexecute all the time, not just
for errors, so as to release precious resources and to possibly spot bugs
that were rarely exposed in this mode. By then the incoming connection LB
was enforcing all_threads_mask on the listener's thread mask so that the
incorrect value was being corrected while using it.
Finally in 2.7 commit d0b73bca71 ("MEDIUM: listener: switch bind_thread
from global to group-local") replaces the all_threads_mask there with
the listener's bind_thread, but that one was never adjusted by the
starting master, whose thread group was filled to N threads by the
automatic detection during early setup.
The best approach here is to set nbthread to 1 very early in init()
when we're in the master in wait mode, so that we don't try to guess
the best value and don't end up with incorrect bindings anymore. This
patch does this and also sets nbtgroups to 1 in preparation for a
possible future where this will also be automatically calculated.
There is no need to backport this patch since no other versions were
affected, but if it were to be discovered that the incorrect bind mask
on some of the master's FDs could be responsible for any trouble in
older versions, then the backport should be safe (provided that
nbtgroups is dropped of course).