37 Commits

Author SHA1 Message Date
William Lallemand
2078d4b1f7 BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)

This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.

Must be backported as far as 2.5.
2023-03-09 14:28:44 +01:00
Willy Tarreau
28360dc53f MEDIUM: clock: force internal time to wrap early after boot
GH issue #2034 clearly indicates yet another case of time roll-over
that went badly. Issues that happen only once every 50 days are hard
to detect and debug, and are usually reported more or less synchronized
from multiple sources. This patch finally does what had long been planned
but never done yet, which is to force the time to wrap early after boot
so that any such remaining issue can be spotted quicker. The margin delay
here is 20s (it may be changed by setting BOOT_TIME_WRAP_SEC to another
value). This value seems sufficient to permit failed health checks to
succeed and traffic to come in and possibly start to update some time
stamps (accept dates in logs, freq counters, stick-tables expiration
dates etc).

It could theoretically be helpful to have this in 2.7, but as can be
seen with the two patches below, we've already had incorrect use cases
of the internal monotonic time when the wall-clock one was needed, so
we could expect to detect other ones in the future. Note that this will
*not* induce bugs, it will only make them happen much faster (i.e. no
need to wait for 50 days before seeing them). If it were to eventually
be backported, these two previous patches must also be backported:

    BUG/MINOR: clock: use distinct wall-clock and monotonic start dates
    BUG/MEDIUM: cache: use the correct time reference when comparing dates
2023-02-08 11:10:33 +01:00
Willy Tarreau
b8b243ac6a MINOR: trace: add the long awaited TRACE_PRINTF()
TRACE_PRINTF() can be used to produce arbitrary trace contents at any
trace level. It uses the exact same arguments as other TRACE_* macros,
but here they are mandatory since they are followed by the format-string,
though they may be filled with zeroes. The reason for the arguments is to
match tracking or filtering and not pollute other non-inspected objects.

It will probably be used inside loops, in which case there are two points
to be careful about:
  - output atomicity is only per-message, so competing threads may see
    their messages interleaved. As such, it is recommended that the
    caller places a recognizable unique context at the beginning of the
    message such as a connection pointer.
  - iterating over arrays or lists for all requests could be very
    expensive. In order to avoid this it is best to condition the call
    via TRACE_ENABLED() with the same arguments, which will return the
    same decision.
  - messages longer than TRACE_MAX_MSG-1 (1023 by default) will be
    truncated.

For example, in order to dump the list of HTTP headers between hpack
and h2:

  if (outlen > 0 &&
      TRACE_ENABLED(TRACE_LEVEL_DEVELOPER,
      H2_EV_RX_FRAME|H2_EV_RX_HDR, h2c->conn, 0, 0, 0)) {
          int i;
          for (i = 0; list[i].n.len; i++)
              TRACE_PRINTF(TRACE_LEVEL_DEVELOPER, H2_EV_RX_FRAME|H2_EV_RX_HDR,
                           h2c->conn, 0, 0, 0, "h2c=%p hdr[%d]=%s:%s", h2c, i,
                           list[i].n.ptr, list[i].v.ptr);
  }

In addition, a lower-level TRACE_PRINTF_LOC() macro is provided, that takes
two extra arguments, the caller's location and the caller's function name.
This will allow to emit composite traces from central functions on the
behalf of another one.
2023-01-26 15:49:43 +01:00
Willy Tarreau
4da51bd190 CLEANUP: pools: get rid of CONFIG_HAP_POOLS
This one was set in defaults.h only when neither DEBUG_NO_POOLS nor
DEBUG_UAF were set. This was not the most convenient location to look
for it, and it was only used in pool.c to decide on the initial value
of POOL_DBG_NO_CACHE.

Let's just use DEBUG_NO_POOLS || DEBUG_UAF directly on this flag and
get rid of the intermediary condition. This also has the benefit of
removing a double inversion, which is always nice for understanding.
2022-12-08 17:45:08 +01:00
William Lallemand
eba6a54cd4 MINOR: logs: startup-logs can use a shm for logging the reload
When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use
an shm which is used to keep the logs when switching to mworker wait
mode. This allows to keep the failed reload logs.

When allocating the startup-logs at first start of the process, haproxy
will do a shm_open with a unique path using the PID of the process, the
file is unlink immediatly so we don't let unwelcomed files be. The fd
resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD
environment variable so it can be mmap again when switching to wait
mode.

When forking children, the process is copying the mmap to a a mallocated
ring so we never share the same memory section between the master and
the workers. When switching to wait mode, the shm is not used anymore as
it is also copied to a mallocated structure.

This allow to use the "show startup-logs" command over the master CLI,
to get the logs of the latest startup or reload. This way the logs of
the latest failed reload are also kept.

This is only activated on the linux-glibc target for now.
2022-10-13 16:50:22 +02:00
Willy Tarreau
693688e734 MINOR: config: automatically preset MAX_THREADS based on MAX_TGROUPS
MAX_THREADS was not changed when setting MAX_TGROUPS, which still limits
some possibilities. Let's preset it to 4 * LONGBITS when MAX_TGROUPS is
larger than 1, or LONGBITS when it's set to 1. This means that the new
default value is 256 threads.

The rationale behind this is that the main use of thread groups is
mostly to address NUMA issues and that we don't necessarily need large
thread counts when using many groups, and 256 threads is already plenty
even on quite large systems.

For now it's important not to go too far because some internal structs
are arrays of MAX_THREADS entries, for example accept_queue_ring, which
is around 8kB per thread. Such structures will need to become dynamic
before defaulting to large thread counts (at 4096 threads max the
accept queues would require 32 MB RAM alone).
2022-08-06 16:51:20 +02:00
Willy Tarreau
856d56d2d2 MINOR: config: change default MAX_TGROUPS to 16
This will allows nbtgroups > 1 to be declared in the config without
recompiling. The theoretical limit is 64, though we'd rather not push
it too far for now as some structures might be enlarged to be indexed
per group. Let's start with 16 groups max, allowing to experiment with
dual-socket machines suffering from up to 8 loosely coupled L3 caches.
It's a good start and doesn't engage us too far.
2022-07-15 21:51:48 +02:00
Willy Tarreau
3ccb14d60d MINOR: thread: get rid of MAX_THREADS_MASK
This macro was used both for binding and for lookups. When binding tasks
or FDs, using all_threads_mask instead is better as it will later be per
group. For lookups, ~0UL always does the job. Thus in practice the macro
was already almost not used anymore since the rest of the code could run
fine with a constant of all ones there.
2022-06-14 11:18:40 +02:00
Willy Tarreau
197715ae21 CLEANUP: compression: move the default setting of maxzlibmem to defaults
__comp_fetch_init() only presets the maxzlibmem, and only when both
USE_ZLIB and DEFAULT_MAXZLIBMEM are set. The intent is to preset a
default value to protect the system against excessive memory usage
when no setting is set by the user.

Nowadays the entry in the global struct is always there so there's no
point anymore in passing via a constructor to possibly set this value.
Let's go the cleaner way by always presetting DEFAULT_MAXZLIBMEM to 0
in defaults.h unless these conditions are met, and always assigning it
instead of pre-setting the entry to zero. This is more straightforward
and removes some ifdefs and the last constructor. In addition, now the
setting has a chance of being found.
2022-04-25 19:42:43 +02:00
Remi Tricot-Le Breton
1d6338ea96 MEDIUM: ssl: Disable DHE ciphers by default
DHE ciphers do not present a security risk if the key is big enough but
they are slow and mostly obsoleted by ECDHE. This patch removes any
default DH parameters. This will effectively disable all DHE ciphers
unless a global ssl-dh-param-file is defined, or
tune.ssl.default-dh-param is set, or a frontend has DH parameters
included in its PEM certificate. In this latter case, only the frontends
that have DH parameters will have DHE ciphers enabled.
Adding explicitely a DHE ciphers in a "bind" line will not be enough to
actually enable DHE. We would still need to know which DH parameters to
use so one of the three conditions described above must be met.

This request was described in GitHub issue #1604.
2022-04-20 17:30:55 +02:00
Willy Tarreau
b61fccdc3f CLEANUP: init: remove the ifdef on HAPROXY_MEMMAX
It's ugly, let's move it to defaults.h with all other ones and preset
it to zero if not defined.
2022-02-23 17:11:33 +01:00
Remi Tricot-Le Breton
55d7e782ee MINOR: ssl: Set default dh size to 2048
Starting from OpenSSLv3, we won't rely on the
SSL_CTX_set_tmp_dh_callback mechanism so we will need to know the DH
size we want to use during init. In order for the default DH param size
to be used when no RSA or DSA private key can be found for a given bind
line, we will need to know the default size we want to use (which was
not possible the way the code was built, since the global default dh
size was set too late.
2022-02-14 10:07:14 +01:00
Willy Tarreau
39fd546d4b MINOR: pools: enable pools with DEBUG_FAIL_ALLOC as well
During 2.4-dev, fault injection was enabled for cached pools with commit
207c09509 ("MINOR: pools: move the fault injector to __pool_alloc()"),
except that the condition for CONFIG_HAP_POOLS still depended on
DEBUG_FAIL_ALLOC not being set, which limits the usability to cases
where the define is set by hand. Let's remove it from the equation as
this is not a constraint anymore. While a bit old, there's no need to
backport this as it's only used during development.
2022-01-12 17:31:01 +01:00
Willy Tarreau
f5e94b2f47 OPTIM: pools: reduce local pool cache size to 512kB
Now that we support batched allocations/releases, it appears that we can
reach the same performance on H2 with shared pools and 256kB thread-local
cache as without shared pools, a fast allocator and 1MB thread-local cache.
With 512kB we're up to 10% faster on highly multiplexed H2 than without the
shared cache. This was tested on a 16-core ARM machine. Thus it's time to
slightly reduce the per-thread memory cost, which may also improve the
performance on machines with smaller L2 caches. It essentially reverts
commit f587003fe ("MINOR: pools: double the local pool cache size to 1 MB").
2022-01-02 19:52:15 +01:00
Willy Tarreau
43937e920f MEDIUM: pools: start to batch eviction from local caches
Since previous patch we can forcefully evict multiple objects from the
local cache, even when evicting basd on the LRU entries. Let's define
a compile-time configurable setting to batch releasing of objects. For
now we set this value to 8 items per round.

This is marked medium because eviction from the LRU will slightly change
in order to group the last items that are freed within a single cache
instead of accurately scanning only the oldest ones exactly in their
order of appearance. But this is required in order to evolve towards
batched removals.
2022-01-02 19:35:26 +01:00
Willy Tarreau
f9662848f2 MINOR: threads: introduce a minimalistic notion of thread-group
This creates a struct tgroup_info which knows the thread ID of the first
thread in a group, and the number of threads in it. For now there's only
one thread group supported in the configuration, but it may be forced to
other values for development purposes by defining MAX_TGROUPS, and it's
enabled even when threads are disabled and will need to remain accessible
during boot to keep a simple enough internal API.

For the purpose of easing the configurations which do not specify a thread
group, we're starting group numbering at 1 so that thread group 0 can be
"undefined" (i.e. for "bind" lines or when binding tasks).

The goal will be to later move there some global items that must be
made per-group.
2021-10-08 17:22:26 +02:00
Willy Tarreau
561958c17c CLEANUP: time: move a few configurable defines to defaults.h
TV_ETERNITY, TV_ETERNITY_MS and MAX_DELAY_MS may be configured and
ought to be in defaults.h so that they can be inherited from everywhere
without including time.h and could also be redefined if neede
(particularly for MAX_DELAY_MS).
2021-10-07 01:41:14 +02:00
Willy Tarreau
8de6dc9926 REORG: pools: move default settings to defaults.h
There's no reason CONFIG_HAP_POOLS and its opposite are located into
pools-t.h, it forces those that depend on them to inlcude the file.
Other similar options are normally dealt with in defaults.h, which is
part of the default API, so let's do that.
2021-09-28 19:31:16 +02:00
Tim Duesterhus
d5fc8fcb86 CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.

see 6f7cc11e6dd0f01b437fba893da2edd2362660a2
2021-09-11 19:58:45 +02:00
Willy Tarreau
33056436c7 BUILD/MINOR: defaults: eliminate warning on MAXHOSTNAMELEN with -Wundef
As reported in GH issue #1369, there is a single case of #if with a
possibly undefined value in defaults.h which is on MAXHOSTNAMELEN. Let's
turn it to a #ifdef.
2021-08-28 12:05:32 +02:00
Willy Tarreau
dc70c18ddc BUG/MEDIUM: cfgcond: limit recursion level in the condition expression parser
Oss-fuzz reports in issue 36328 that we can recurse too far by passing
extremely deep expressions to the ".if" parser. I thought we were still
limited to the 1024 chars per line, that would be highly sufficient, but
we don't have any limit now :-/

Let's just pass a maximum recursion counter to the recursive parsers.
It's decremented for each call and the expression fails if it reaches
zero. On the most complex paths it can add 3 levels per parenthesis,
so with a limit of 1024, that's roughly 343 nested sub-expressions that
are supported in the worst case. That's more than sufficient, for just
a few kB of RAM.

No backport is needed.
2021-07-20 18:03:08 +02:00
Willy Tarreau
06987f4238 CLEANUP: global: remove unused definition of MAX_PROCS
This one was forced to 1 and the only reference was a test to verify it
was comprised between 1 and LONGBITS.
2021-06-15 16:52:42 +02:00
Willy Tarreau
b63dbb7b2e MAJOR: config: remove parsing of the global "nbproc" directive
This one was deprecated in 2.3 and marked for removal in 2.5. It suffers
too many limitations compared to threads, and prevents some improvements
from being engaged. Instead of a bypassable startup error, there is now
a hard error.

The parsing code was removed, and very few obvious cases were as well.
The code is deeply rooted at certain places (e.g. "for" loops iterating
from 0 to nbproc) so it will not be that trivial to remove everywhere.
The "bind" and "bind-process" parsers will have to be adjusted, though
maybe not completely changed if we later want to support thread groups
for large NUMA machines. Some stats socket restrictions were removed,
and the doc was updated according to what was done. A few places in the
doc still refer to nbproc and will have to be revisited. The master-worker
code also refers to the process number to distinguish between master and
workers and will have to be carefully adjusted. The MAX_PROCS macro was
reset to 1, this will at least reduce the size of some remaining arrays.

Two regtests were dependieng on this directive, one with an explicit
"nbproc 1" and another one testing the master's CLI using nbproc 4.
Both were adapted.
2021-06-11 17:02:13 +02:00
Amaury Denoyelle
b56a7c89a8 MEDIUM: cfgparse: detect numa and set affinity if needed
On process startup, the CPU topology of the machine is inspected. If a
multi-socket CPU machine is detected, automatically define the process
affinity on the first node with active cpus. This is done to prevent an
impact on the overall performance of the process in case the topology of
the machine is unknown to the user.

This step is not executed in the following condition :
- a non-null nbthread statement is present
- a restrictive 'cpu-map' statement is present
- the process affinity is already restricted, for example via a taskset
  call

For the record, benchmarks were executed on a machine with 2 CPUs
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. In both clear and ssl
scenario, the performance were sub-optimal without the automatic
rebinding on a single node.
2021-04-23 16:06:49 +02:00
Willy Tarreau
f14c7570d6 CLEANUP: cli: rename MAX_STATS_ARGS to MAX_CLI_ARGS
This is the number of args accepted on a command received on the CLI,
is has long been totally independent of stats and should not carry
this misleading "stats" name anymore.
2021-03-13 10:59:23 +01:00
Willy Tarreau
060a761248 OPTIM: task: automatically adjust the default runqueue-depth to the threads
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread count configs. Testing various values
runqueue values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be drawn that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to optimally
configure, so let's do this by default now.
2021-03-10 11:15:34 +01:00
Willy Tarreau
f587003fe9 MINOR: pools: double the local pool cache size to 1 MB
The reason is that H2 can already require 32 16kB buffers for the mux
output at once, which will deplete the local cache. Thus it makes sense
to go further to leave some time to other connection to release theirs.
In addition, the L2 cache on modern CPUs is already 1 MB, so this change
is welcome in any case.
2021-03-05 08:30:08 +01:00
Willy Tarreau
66161326fd MINOR: listener: refine the default MAX_ACCEPT from 64 to 4
The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate for all
thread values, and that 3 and 5 come very close, as shown below (HTTP/1
connections forwarded per second at multi-accept 4 and 64):

 ac\thr|    1     2    4     8     16
 ------+------------------------------
      4|   80k  106k  168k  270k  336k
     64|   63k   89k  145k  230k  274k

Some tests were also conducted on SSL and absolutely no change was observed.

The value was placed into a define because it used to be spread all over the
code.

It might be useful at some point to backport this to 2.3 and 2.2 to help
those who observed some performance regressions from 1.6.
2021-02-19 16:02:04 +01:00
Willy Tarreau
4327d0ac00 MINOR: tasks: refine the default run queue depth
Since a lot of internal callbacks were turned to tasklets, the runqueue
depth had not been readjusted from the default 200 which was initially
used to favor batched processing. But nowadays it appears too large
already based on the following tests conducted on a 8c16t machine with
a simple config involving "balance leastconn" and one server. The setup
always involved the two threads of a same CPU core except for 1 thread,
and the client was running over 1000 concurrent H1 connections. The
number of requests per second is reported for each (runqueue-depth,
nbthread) couple:

 rq\thr|    1     2     4     8    16
 ------+------------------------------
     32|  120k  159k  276k  477k  698k
     40|  122k  160k  276k  478k  722k
     48|  121k  159k  274k  482k  720k
     64|  121k  160k  274k  469k  710k
    200|  114k  150k  247k  415k  613k  <-- default

It's possible to save up to about 18% performance by lowering the
default value to 40. One possible explanation to this is that checking
I/Os more frequently allows to flush buffers faster and to smooth the
I/O wait time over multiple operations instead of alternating phases
of processing, waiting for locks and waiting for new I/Os.

The total round trip time also fell from 1.62ms to 1.40ms on average,
among which at least 0.5ms is attributed to the testing tools since
this is the minimum attainable on the loopback.

After some observation it would be nice to backport this to 2.3 and
2.2 which observe similar improvements, since some users have already
observed some perf regressions between 1.6 and 2.2.
2021-02-19 16:01:55 +01:00
Dragan Dosen
6f7cc11e6d MEDIUM: xxhash: use the XXH_INLINE_ALL macro to inline all functions
This way we make all xxhash functions inline, with implementations being
directly included within xxhash.h.

Makefile is updated as well, since we don't need to compile and link
xxhash.o anymore.

Inlining should improve performance on small data inputs.
2020-12-23 06:39:21 +01:00
Willy Tarreau
ca8b069aa7 REORG: include: move MAX_THREADS to defaults.h
That's already where MAX_PROCS is set, and we already handle the case of
the default value so there is no reason for placing it in thread.h given
that most call places don't need the rest of the threads definitions. The
include was removed from global-t.h and activity.c.
2020-06-11 10:18:59 +02:00
Willy Tarreau
aeed4a85d6 REORG: include: move log.h to haproxy/log{,-t}.h
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.

Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.

A good approach would consist in splitting that file in 3 parts:
  - one for error output ("errors" ?).
  - one for log_format processing
  - and one for actual logging.
2020-06-11 10:18:58 +02:00
Willy Tarreau
ac13aeaa89 REORG: include: move auth.h to haproxy/auth{,-t}.h
The STATS_DEFAULT_REALM and STATS_DEFAULT_URI were moved to defaults.h.
It was required to include types/pattern.h and types/sample.h since they
are mentioned in function prototypes.

It would be wise to merge this with uri_auth.h later.
2020-06-11 10:18:57 +02:00
Willy Tarreau
0f6ffd652e REORG: include: move fd.h to haproxy/fd{,-t}.h
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h

Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
2020-06-11 10:18:57 +02:00
Willy Tarreau
fd4bffe7c0 REORG: include: move the base files from common/ to haproxy/
The files currently covered by api-t.h and api.h (compat, compiler,
defaults, initcall) are now located inside haproxy/.
2020-06-11 10:18:56 +02:00
Willy Tarreau
2dd0d4799e [CLEANUP] renamed include/haproxy to include/common 2006-06-29 17:53:05 +02:00
Willy Tarreau
baaee00406 [BIGMOVE] exploded the monolithic haproxy.c file into multiple files.
The files are now stored under :
  - include/haproxy for the generic includes
  - include/types.h for the structures needed within prototypes
  - include/proto.h for function prototypes and inline functions
  - src/*.c for the C files

Most include files are now covered by LGPL. A last move still needs
to be done to put inline functions under GPL and not LGPL.

Version has been set to 1.3.0 in the code but some control still
needs to be done before releasing.
2006-06-26 02:48:02 +02:00