Like many exposed network deamons, haproxy does normally not need to run
as root and strongly recommends against this, unless strictly necessary.
On some operating systems, capabilities even totally alleviate this need.
Lately, maybe due to a raise of containerization or automated config
generation or a bit of both, we've observed a resurgence of this bad
practice, possibly due to the fact that users are just not aware of the
conditions they're using their daemon.
Let's add a warning at boot when starting as root without having requested
it using "uid" or "user". And take this opportunity for warning the user
about the existence of capabilities when supported, and encouraging the
use of a chroot.
This is achieved by leaving global.uid set to -1 by default, allowing us
to detect if it was explicitly set or not.
add initial support for the "shm-stats-file" directive and
associated "shm-stats-file-max-objects" directive. For now they are
flagged as experimental directives.
The shared memory file is automatically created by the first process.
The file is created using open() so it is up to the user to provide
relevant path (either on regular filesystem or ramfs for performance
reasons). The directive takes only one argument which is path of the
shared memory file. It is passed as-is to open().
The maximum number of objects per thread-group (hard limit) that can be
stored in the shm is defined by "shm-stats-file-max-objects" directive,
Upon initial creation, the main shm stats file header is provisioned with
the version which must remains the same to be compatible between processes
and defaults to 2k. which means approximately 1mb max per thread group
and should cover most setups. When the limit is reached (during startup)
an error is reported by haproxy which invites the user to increase the
"shm-stats-file-max-objects" if desired, but this means more memory will
be allocated. Actual memory usage is low at start, because only the mmap
(mapping) is provisionned with the maximum number of objects to avoid
relocating the memory area during runtime, but the actual shared memory
file is dynamically resized when objects are added (resized by following
half power of 2 curve when new objects are added, see upcoming commits)
For now only the file is created, further logic will be implemented in
upcoming commits.
In issue #3013, an user observed a crash at startup of haproxy when
building statically and using the "user" global section.
This is a known problem of the glibc and the linker even warn about
this:
> warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
> warning: Using 'getpwnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Let's emit a warning when using user/group in this case.
Add a new global option, "noktls", as well as a command line option,
"-dT", to totally disable ktls usage, even if it is activated on servers
or binds in the configuration.
That makes it easier to quickly figure out if a problem is related to
ktls or not.
It was mentioned during the development of glitches that it would be
nice to support not killing misbehaving connections below a certain
CPU usage so that poor implementations that routinely misbehave without
impact are not killed. This is now possible by setting a CPU usage
threshold under which we don't kill them via this parameter. It defaults
to zero so that we continue to kill them by default.
When env variable name or value are not provided for setenv/presetenv it's not
clear from the old error message shown at stderr, what exactly is missed. User
needs to search in it's configuration.
Let's add more explicit error messages about these inconsistencies.
No need to be backported.
TCP_NOTSENT_LOWAT is very convenient as it indicates when to report
EAGAIN on the sending side. It takes a margin on top of the estimated
window, meaning that it's no longer needed to store too many data in
socket buffers. Instead there's just enough to fill the send window
and a little bit of margin to cover the scheduling time to restart
sending. Experiments on a 100ms network have shown a 10-fold reduction
in the memory used by socket buffers by just setting this value to
tune.bufsize, without noticing any performance degradation. Theoretically
the responsiveness on multiplexed protocols such as H2 should also be
improved.
These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' on their own lines. We could possibly check them
all once at boot but it doens't seem worth it.
This should be backported to 3.1.
Allow haproxy to take over idle connections from other thread groups
than our own. To control that, add a new tunable,
tune.takeover-other-tg-connections. It can have 3 values, "none", where
we won't attempt to get connections from the other thread group (the
default), "restricted", where we only will try to get idle connections
from other thread groups when we're using reverse HTTP, and "full",
where we always try to get connections from other thread groups.
Unless there is a special need, it is advised to use "none" (or
restricted if we're using reverse HTTP) as using connections from other
thread groups may have a performance impact.
It is not rare to see configurations with a large number of "tcp-request
content" or "http-request" rules for instance. A large number of rules
combined with cpu-demanding actions (e.g.: actions that work on content)
may create thread contention as all the rules from a given ruleset are
evaluated under the same polling loop if the evaluation is not interrupted
Thus, in this patch we add extra logic around "tcp-request content",
"tcp-response content", "http-request" and "http-response" rulesets, so
that when a certain number of rules are evaluated under the single polling
loop, we force the evaluating function to yield. As such, the rule which
was about to be evaluated is saved, and the function starts evaluating
rules from the save pointer when it returns (in the next polling loop).
We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so
that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG
is mandatory here because process_stream() expects TASK_WOKEN_MSG for
explicit analyzers re-evaluation.
rules_bcount stream's attribute was added to count how manu rules were
evaluated since last interruption (yield). Also, SF_RULE_FYIELD flag
was added to know that the s->current_rule was assigned due to forced
yield and not regular yield.
By default haproxy will enforce a yield every 50 rules, this behavior
can be configured using the "tune.max-rules-at-once" global keyword.
There is a limitation though: for now, if the ACT_OPT_FINAL flag is set
on act_opts, we consider it is not safe to yield (as it is already the
case for automatic yield). In this case instead of yielding an taking
the risk of not being called back, we skip the yield and hope it will
not create contention. This is something we should ideally try to
improve in order to yield in all conditions.
Define a new build mode DEBUG_STRESS. This will be used to stress some
code parts which cannot be reproduce easily with an alternative
suboptimal code.
First, a global <mode_stress> is set either to 1 or 0 depending on
DEBUG_STRESS compilation. A new global keyword "stress-level" is also
defined. It allows to specify a level from 0 to 9, to increase the
stress incurred on the code.
Helper macro STRESS_RUN* are defined for each stress level. This allows
to easily specify an instruction in default execution and a stress
counterpart if running on the corresponding stress level.
This commit prepares the parsing of localpeer keyword in MODE_DISCOVERY. We
need this, as HAPROXY_LOCALPEER environment variable could be checked in the
configuration in order to enable some backend or frontend settings.
So, let's at first add a dedicated parser for localpeer. At second, we no
longer need to check, if cfg_peers is valid pointer, as in MODE_DISCOVERY we
parse only the "global" section.
In addition, let's make the code of localpeer parser a little bit more
readable.
If directory provided as a "chroot" keyword argument does not exist or
inaccessible, this is reported only at the latest initialization stage, when
haproxy tries to perform chroot. Sometimes it's not very convenient, as the
process is already bound to listen sockets.
This was done explicitly in order not to break the case, when haproxy is
launched with "-c" option in some specific environment, where it's not possible
to create or to modify chroot directory, provided in the configuration.
So, let's add more checks for "chroot" directory during the parsing
stage and let's show diagnostic warnings, if this directory has become
non-accesible or was deleted. Like this, users, who wants to catch errors
related to misconfigured chroot before starting the process, can launch haproxy
with -dW and -dD. zero-warning mode will stop the process with error, if any
warning was emitted during initialization stage.
Let's add a dedicated parser for "chroot" keyword, as we add some more checks
for its argument in the next commit.
This reduces the size of cfg_parse_global().
Before this patch HAPROXY_LOCALPEER variable could be set in init_early(),
in init_args() and in cfg_parse_global(). In master-worker mode, if localpeer
keyword set in the global section, HAPROXY_LOCALPEER in the worker
environment is set to this keyword's value, but in the master environment it
still keeps the default, a localhost name. This is confusing.
To fix it, let's set HAPROXY_LOCALPEER only once, when a worker or process in a
standalone mode has finished to parse its configuration. And let's set this
variable only for the worker process or for the process in a standalone mode,
because the master doesn't need it.
HAPROXY_LOCALPEER takes the value saved in localpeer global variable, which is
always set by default in init_early() to the local hostname. Then, localpeer
could be reset in init_args (-L option) and in cfg_parse_global() (while
parsing "localpeer" keyword).
'Program' section is considered as deprecated now, see the commit 581c8a27d98c
("MEDIUM: mworker: depreciate the 'program' section"). So, the 'program'
section parser emits a warning every time since this commit, if its section is
presented. This makes impossible to launch the process in zero-warning mode.
After master-worker refactoring only the master process parses the 'program'
section. So, at first, in order to be able to start in zero-warning mode, we
need to parse in master process option, which allows deprecated keywords. Thus,
let's set in this commit KWF_DISCOVERY flag to
cfg_parse_global_non_std_directives parser, which parses
'expose-deprecated-directives' and 'expose-deprecated-directives' options.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, preventing from starting when set
e.g. to "64k". Let's make use of parse_size_err() on it so that units are
supported. This requires to turn it to uint as well, and to explicitly
limit its range to INT_MAX - 2*sizeof(void*), which was previously
partially handled as part of the sign check.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, and
since it's sometimes compared to an int, we limit its range to
0..INT_MAX.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allows to change the priority with setpriority().
tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is changed to the previous one after configuration parsing.
tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.
The feature was discussed in github issue #1919.
Example:
global
tune.renice.startup 15
tune.renice.runtime 0
setenv/resetenv/presetenv/unsetenv keywords should be parsed by master
process and by worker. As some other master parameters could be enabled in
conditional blocks (.if...endif). To achieve this let's tag '*env' keywords
with KWF_DISCOVERY flag.
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.
Global section parser parses the majority of keywords in its function, so
those keywords don't have any dedicated parsers yet. Only after this parsing
block cfg_parse_global() starts to call dedicated parsers for any other
discovered keywords, which were not found in the block.
As all keywords, which should be parsed in MODE_DISCOVERY have its own parser
funtions, we can skip this block with goto discovery_kw and start directly from
the part, where we call parsers from the keywords list. KWF_DISCOVERY flag helps
to call in MODE_DISCOVERY only the parsers, which we are needed at this mode.
All unknown keywords and garbage will be ignored at this stage.
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.
Some keyword parsers tagged with KWF_DISCOVERY (for example those, which parse
runtime modes, poller types, pidfile), should not be called twice when
the configuration will be read the second time after the discovery mode.
It's redundant and could trigger parser's errors in standalone mode. In
master-worker mode the worker process inherits parsed settings from the master.
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.
So, let's add here KWF_DISCOVERY flag to distinguish the keywords,
which should be parsed in "discovery" mode and which are needed for master
process, from all others. Keywords, that should be parsed in "discovery" mode
have its dedicated parser funtions. Let's tag these functions with
KWF_DISCOVERY flag in keywords list. Like this, only these keyword parsers
might be called during the first configuration read in discovery mode.
This command is pausing the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
the expose-experimental-directives to be set.
It is no longer supported to declare debug traces, via 'trace' directive, in
a global section. A 'traces' directive must be used instead. The syntax of
the 'trace' directive in these sections remains the same. But it is no
longer experimental.
The main reason for this change is to avoid to have a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of time, that meant to have
two global sections :
global
[...] # global settings
ring <name>
[...]
global
[...] # trace config
In addition, it will be possible to easily extend the traces section by
adding some new directives.
This commit prepares the config parser to support MODE_DISCOVERY and, thus,
refactored master-worker mode. The latter implies, that master process reads
only the 'DISCOVERY' tagged keywords from the global section and it must call
for this an appropriate keyword parser.
So, let's move the code, which parses *env keywords, from the global section
parser to its own keyword registered parser.
Keywords setenv and presetenv take 2 arguments: variable name and value.
So, the total number, that should be passed to alertif_too_many_args is 2
("setenv <name> <value>") instead of 3. For alertif_too_many_args the first
argument index is 0.
This should be backported in all stable versions.
Remove tune.fast-forward from common_kw_list. It was replaced by
'tune.disable-fast-forward' and it's no longer present in "if..else if.."
parser from cfg_parse_global(). Otherwise, it may be shown as the best-match
keyword for some tune options, which is now wrong.
Should be backported in versions 2.9 and 3.0.
Following the previous commits and in order to clean up cfg_parse_global let's
move unsupported keywords in the global list and let's add for them a dedicated
parser.
In order to clean up cfg_parse_global() and to add the support of the new
MODE_DISCOVERY in configuration parsing, let's move the keywords related to
tune options into the global keywords list and let's add for them two dedicated
parsers. Tune options keywords are sorted between two parsers in dependency of
parameters number, which a given tune option needs.
tune options parser is called by section parser and follows the common API, i.e.
it returns -1 on failure, 0 on success and 1 on recoverable error. In case of
recoverable error we've previously returned ERR_ALERT (0x10) and we have emitted
an alert message at startup. Section parser treats all rc > 0 as ERR_WARN. So in
case, if some tune option was set twice in the global section, tune
options parser will return 1 (in order to respect the common API), section
parser will treat this as ERR_WARN and a warning message will be emitted during
process startup instead of alert, as it was before.
Following the previous commit let's also move 'expose-*' keywords in the global
cfg_kws list and let's add for them a dedicated parser. This will simplify the
configuration parsing in the new MODE_DISCOVERY, which allows to read only the
keywords, needed at the early start of haproxy process (i.e. modes, pidfile,
chosen poller).
This commit cleans up cfg_parse_global() and prepares the config parser to
support MODE_DISCOVERY. This step is needed in early starting stage, just to
figura out in which mode the process was started, to set some necessary
parameteres needed for this mode and to continue the initialization
stage.
'pidfile' makes part of such common keywords, which are needed to be parsed
very early and which are used almost in all process modes (except the
foreground, '-d').
'pidfile' keyword parser is called by section parser and follows the common
API, i.e. it returns -1 on failure, 0 on success and 1 on recoverable error. In
case of recoverable error we've previously returned ERR_ALERT (0x10) and we have
emitted an alert message at startup. Section parser treats all rc > 0 as
ERR_WARN. So in case, if pidfile was already specified via command line, the
keyword parser will return 1 (in order to respect the common API), section
parser will treat this as ERR_WARN and a warning message will be emitted during
process startup instead of alert, as it was before.
In the case, when the given keyword was found in the global 'cfg_kws' list, we
go to 'out' label anyway, after testing rc returned by the keyword's parser. So
there is not a much gain if we perform 'goto out' jump specifically when rc > 0.
This patch fixes commits 118ac11ce
("MINOR: cfgparse-global: move mode's keywords in cfg_kw_list") and 83ff4db18
(MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list).
'common_kw_list' serves to show the best-match keyword in cfg_parse_global(), if
the given keyword was not parsed in "if..else if.." cases. cfg_parse_global()
is still used as a parser for some keywords from the global section.
Mode-specific and no<poller_name> keywords now have their own parsers. They no
longer take place in the "if..else if.." from cfg_parse_global() and they are
registered in the 'cfg_kws' list. So, there is no longer need to duplicate
them in the 'common_kw_list'. Otherwise, they will be shown twice in parser
error message.
This patch fixes the commit 118ac11ce
("cfgparse-global: move mode's keywords in cfg_kw_list"). Error message
delivered by keyword parser in **err is always shown with ha_alert() by the
caller cfg_parse_global(). The caller always supplies these alerts with the
filename and the line number.
This commit continues to clean up cfg_parse_global() and to prepare the
refactoring of master-worker mode. Master, after forking a worker, enters in
its wait polling loop to catch signals and to provide master CLI. So, some
poller types could be disabled for master process it as well.
This commit cleans up cfg_parse_global() and prepares the config parser for
master-worker mode refactoring, where daemon and master-worker fork() calls
will happen very early in init().
So, the config in such case should be read twice:
- at first: only some keywords in the global section for the mode discovery
and everything, which is related to master process by opportunity;
- at second: except the master process, all other keywords would be parsed;
Remove development leftover introduced by commit 15e9c7da6 ("MINOR: log:
add log-profile parsing logic").
Indeed, since "log-profile" section keyword is registered via
REGISTER_CONFIG_SECTION() macro, it is not relevant to declare it in
common_kw_list[] from cfgparse-global.c. All it does is that it could
confuse the user by suggesting him to use "log-profile" inside a global
section when trying to find a best match in cfg_parse_global().
This patch implements prerequisite log-profile struct and parser logic.
It has no effect during runtime for now.
Logformat expressions provided in log-profile "steps" are postchecked
during postparsing for each proxy "log" directive that makes use of a
given profile. (this allows to ensure that the logformat expressions
used in the profile are compatible with proxy using them)
This commit introduces a new global setting named
harden.reject_privileged_ports.{tcp|quic}. When active, communications
with clients which use privileged source ports are forbidden. Such
behavior is considered suspicious as it can be used as spoofing or
DNS/NTP amplication attack.
Value is configured per transport protocol. For each TCP and QUIC
distinct code locations are impacted by this setting. The first one is
in sock_accept_conn() which acts as a filter for all TCP based
communications just after accept() returns a new connection. The second
one is dedicated for QUIC communication in quic_recv(). In both cases,
if a privileged source port is used and setting is disabled, received
message is silently dropped.
By default, protection are disabled for both protocols. This is to be
able to backport it without breaking changes on stable release.
This should be backported as it is an interesting security feature yet
relatively simple to implement.
I just added a new setting to set the number of reserved buffer, to
discover we already had one... Let's move the parsing of this keyword
(tune.buffers.reserve) and tune.buffers.limit to dynbuf.c where they
should be.
This commit is the final to implement preloading of haproxy internal
counters via stats-file parsing.
Define a global keyword "stats-file". It allows to specify the path to
the stats-file which will be parsed on process startup.
Similarly to "expose-exprimental-directives" option, there is no a global
option to expose some deprecated directives. Idea is to have a way to silent
warnings about deprecated directives when there is no alternative solution.
Of course, deprecated directives covered by this option are not listed and
may change. It is only a best effort to let users upgrade smoothly.
Since 1de44da ("MINOR: ext-check: add an option to preserve environment
variables"), it is now possible to provide an extra argument to
"external-check" directive. This allows to support the "preserve-env"
option which differs from the default behavior.
However a mistake was made, because the config parser doesn't allow the
default configuration anymore: using external-check without argument will
trigger an error:
'external-check' only supports 'preserve-env' as an argument, found ''.
This is due to as small mistake in the code that make the check
systematically report an error if the first argument is not equal to
"preserve-env". The check was modified so that the error is only reported
if the argument is provided, so that the default behavior is restored.
This should fix GH #2380 and should be backported on 2.9 and potentially
further (anywhere 1de44da is, because a note about an optional backport
up to the 2.6 was left in the original commit message)
Zero-copy fast-forwading feature is a quite new and is a bit sensitive.
There is an option to disable it globally. However, all protocols have not
the same maturity. For instance, for the PT multiplexer, there is nothing
really new. The zero-copy fast-forwading is only another name for the kernel
splicing. However, for the QUIC/H3, it is pretty new, not really optimized
and it will evolved. And soon, the support will be added for the cache
applet.
In this context, it is usefull to be able to enable/disable zero-copy
fast-forwading per-protocol and applet. And when it is applicable, on sends
or receives separately. So, instead of having one flag to disable it
globally, there is now a dedicated bitfield, global.tune.no_zero_copy_fwd.