For now the function refrains from detecting the CPU topology when a
restrictive taskset or cpu-map was already performed on the process,
and it's documented as such, the reason being that until we're able
to automatically create groups, better not change user settings. But
we'll need to be able to detect bound CPUs and to process them as
desired by the user, so we now need to move that detection into the
function itself. It changes nothing to the logic, just gives more
freedom to the function.
The function is not convenient because it doesn't allow us to undo the
startup changes, and depending on where it's being used, we don't know
whether the values read have already been altered (this is not the case
right now but it's going to evolve).
Let's just compute the status during cpu_detect_usable() and set a
variable accordingly. This way we'll always read the init value, and
if needed we can even afford to reset it. Also, placing it in cpu_topo.c
limits cross-file dependencies (e.g. threads without affinity etc).
The cpuset files are normally used only for cpu manipulations. It happens
that the initial CPU binding detection was initially placed there since
there was no better place, but in practice, being OS-specific, it should
really be in cpu-topo. This simplifies cpuset which doesn't need to know
about the OS anymore.
Invalid (incomplete) "server" or "peer" lines under peers section are now
properly ignored. For completeness, in this patch we add some reports so
that the user knows that incomplete lines were ignored.
For an incomplete server line, since it is tolerated (see GH #565), we
only emit a diag warning.
For an incomplete peer line, we report a real warning, as it is not
expected to have a peer line without an address:port specified.
Also, 'newpeer == curpeers->local' check could be simplified since
we already have the 'local_peer' variable which tells us that the
parsed line refers to a local peer.
In 8ba10fea6 ("BUG/MINOR: peers: Incomplete peers sections should be
validated."), some checks were relaxed in parse_server(), and extra logic
was added in the peers section parser in an attempt to properly ignore
incomplete "server" or "peer" statement under peers section.
This was done in response to GH #565, the main intent was that haproxy
should already complain about incomplete peers section (ie: missing
localpeer).
However, 8ba10fea69 explicitly skipped the peer cleanup upon missing
srv association for local peers. This is wrong because later haproxy
code always assumes that peer->srv is valid. Indeed, we got reports
that the (invalid) config below would cause segmentation fault on
all stable versions:
global
localpeer 01JM0TEPAREK01FQQ439DDZXD8
peers my-table
peer 01JM0TEPAREK01FQQ439DDZXD8
listen dummy
bind localhost:8080
To fix the issue, instead of by-passing some cleanup for the local
peer, handle this case specifically by doing the regular peer cleanup
and reset some fields set on the curpeers and curpeers proxy because
of the invalid local peer (do as if the peer was not declared).
It should still comply with requirements from #565.
This patch should be backported to all stable versions.
In the "peers" section parser, right after parse_server() is called, we
used to check whether the curpeers->peers_fe->srv pointer was set or not
to know if parse_server() successfuly added a server to the peers proxy,
server that we can then associate to the new peer.
However the check is wrong, as curpeers->peers_fe->srv points to the
last added server, if a server was successfully added before the
failing one, we cannot detect that the last parse_server() didn't
add a server. This is known to cause bug with bad "peer"/"server"
statements.
To fix the issue, we save a pointer on the last known
curpeers->peers_fe->srv before parse_server() is called, and we then
compare the save with the pointer after parse_server(), if the value
didn't change, then parse_server() didn't add a server. This makes
the check consistent in all situations.
It should be backported to all stable versions.
In check_config_validity() we evaluate some sample fetch expressions
(log-format, server rules, etc). These expressions may use external files like
maps.
If some particular 'default-path' was set in the global section before, it's no
longer applied to resolve file pathes in check_config_validity(). parse_cfg()
at the end of config parsing switches back to the initial cwd.
This fixes the issue #2886.
This patch should be backported in all stable versions since 2.4.0, including
2.4.0.
When "peers" keyword is followed by more than one argument and it's the first
"peers" section in the config, cfg_parse_peers() detects it and exits with
"ERR_ALERT|ERR_FATAL" err_code.
So, upper layer parser, parse_cfg(), continues and parses the next keyword
"peer" and then he tries to check the global cfg_peers, which should contain
"my_cluster". The global cfg_peers is still NULL, because after alerting a user
in alertif_too_many_args, cfg_parse_peers() exited.
peers my_cluster __some_wrong_data__
peer haproxy1 1.1.1.1 1000
In order to fix this, let's add ERR_ABORT, if "peers" keyword is followed by
more than one argument. Like this parse_cfg() will stops immediately and
terminates haproxy with "too many args for peers my_cluster..." alert message.
It's more reliable, than add checks "if (cfg_peers !=NULL)" in "peer"
subparser, as we may have many "peers" sections.
peers my_another_cluster
peer haproxy1 1.1.1.2 1000
peers my_cluster __some_wrong_data__
peer haproxy1 1.1.1.1 1000
In addition, for the example above, parse_cfg() will parse all configuration
until the end and only then terminates haproxy with the alert
"too many args...". Peer haproxy1 will be wrongly associated with
my_another_cluster.
This fixes the issue #2872.
This should be backported in all stable versions.
Since we are now iterating on post_section_parser() for a same keyword,
we need to exit at the first ERR_ABORT.
The post_section_parser() is called when parsing a new section, but also
at the end of the file to be called for the last section.
The changes in 4de86bb ("MEDIUM: initcall: allow to register mutiple
post_section_parser per section") should have added tests on the
ERR_ABORT value.
Also pcs->post_section_parser() must be called instead of
cs->post_section_parser() because we could have a NULL ptr.
This bug does not affect anything since we don't use
REGISTER_CONFIG_POST_SECTION() yet.
Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only
one callback (<post>) called after the parsing of a section.
It was limitating because you couldn't register a post callback from anywhere
else in the code.
This patch introduces the new REGISTER_CONFIG_SECTION_POST() macros which allows
to register a new post callback for a section keyword from anywhere.
This patch introduces the feature by allowing `struct cfg_section` entries that
does not have a `section_parser`, and then iterating on all cfg_section with a
post_section_parser for a keyword.
Previous patch 2c270a05f ("BUG/MINOR: mworker: section ignored in
discovery after a post_section_parser") needs an adjustment for the last
section of the file.
Indeed the post_section_parser of the last section must not be called in
discovery mode.
Must be backported in 3.1.
When a new section is discovered, the post_section_parser of the
previous section is called. However in the new master-worker mode the
discovery mode will skip the post_section_parser. But instead of
trying to parse the current section keyword after that, it would skip
completely the current line.
This is a minor bug since there isn't a lot of section with
post_section_parser, and not a lot of section to parse in discovery
mode.
But this could be reproduced like this:
global
expose-deprecated-directives
resolvers res
parse-resolv-conf
program foo
command sleep 10
program bar
command sleep 10
Ths 'resolvers' section has a post_section_parser which will be ignored
in discovery mode with the consequence of ignoring the first program
section.
This must be backported in 3.1.
When a group is defined in a userlist section, only one 'users' option is
expected. But it was not tested. Thus it was possible to set several options
leading to a memory leak.
It is now tested, and it is not allowed to redefine the users option.
It was reported by Coverity in #2841: CID 1587771.
This patch could be backported to all stable versions.
A new global option was recently introduced to disable pacing. However,
the value used (1<<31) caused issue with some compiler as options field
used for storage is declared as int. Move pacing deactivation flag
outside into the newly defined quic_tune to fix this.
This should be backported up to 3.1 after a period of observation. Note
that it relied on the previous patch which defined new quic_tune type.
Pacing support was previously activated on each bind line individually,
via an optional argument of quic-cc-algo keyword. Remove this optional
argument and introduce a global setting to enable/disable pacing. Pacing
activation is still flagged as experimental.
One important change is that previously BBR usage automatically
activated pacing support. This is not the case anymore, so users should
now always explicitely activate pacing if BBR is selected. A new warning
message will be displayed if this is not the case.
Another consequence of this change is that now pacing_inter callback is
always defined for every quic_cc_algo types. As such, QUIC MUX uses
global.tune.options to determine if pacing is required.
This should be backported up to 3.1, after a period of observation.
Add a per-thread group field to struct proxy, that will contain a struct
queue, as well as a new field, "queueslength".
This is currently unused, so should change nothing.
Please note that proxy_init_per_thr() must now be called for each proxy
once the thread groups number is known.
proxy auth_uri struct was manually cleaned up during deinit, but the logic
behind was kind of akward because it was required to find out which ones
were shared or not. Instead, let's switch to a proper refcount mechanism
and free the auth_uri struct directly in proxy_free_common().
Each server is inserted in a global list named servers_list on
new_server(). This list is then only used to finalize servers
initialization after parsing.
On dynamic server creation, there is no issue as new_server() is under
thread isolation. However, when a server is deleted after its refcount
reached zero, srv_drop() removes it from servers_list without lock
protection. In the longterm, this can cause list corruption and crashes,
especially if multiple adjacent servers are removed in parallel.
To fix this, convert servers_list to a mt_list. This should not impact
performance as servers_list is not used during runtime outside of server
creation/deletion.
This should fix github issue #2733. Thanks to Chris Staite who first
found the issue here.
This must be backported up to 2.6.
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.
Programs are launched by master, thus only the master process needs its
configuration. Therefore, program section parser should be called only in
discovery mode, when master parses its configuration.
Program section has a post section parser. It should be called only in
discovery mode as well.
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.
So, in discovery mode, when we read the configuration the first time, we
parse for the moment only the "global" section. Unknown section names will be
ignored.
It is no longer supported to declare debug traces, via 'trace' directive, in
a global section. A 'traces' directive must be used instead. The syntax of
the 'trace' directive in these sections remains the same. But it is no
longer experimental.
The main reason for this change is to avoid to have a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of time, that meant to have
two global sections :
global
[...] # global settings
ring <name>
[...]
global
[...] # trace config
In addition, it will be possible to easily extend the traces section by
adding some new directives.
As discussed below, there are too many problems and limitations caused
by still supporting duplicate server names. That's already particularly
complicated and dissuasive to use since it requires these servers to
have explicit IDs to be accept. Let's now warn on any duplicate, even
with explicit IDs and remind that this will become forbidden in 3.3.
Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
Surprisingly, the duplicate server name detection has never made use
of the names tree, so lookups were still in O(N^2). It took 1 second
to validate 50k servers spread into 25 backends at 2k per backend.
By simply using the tree (and since the current server already is in
the tree), we just have to walk using ebpt_prev_dup to visit previous
servers with the same name. We can then detect which ones conflict
without having an ID set and error. The config check time is now 1/4
of the previous one for 2k servers per backend, and more importantly
it will make it simpler to check for any duplicates later.
Proxy file names are assigned a bit everywhere (resolvers, peers,
cli, logs, proxy). All these elements were enumerated and now use
copy_file_name(). The only ha_free() call was turned to drop_file_name().
As a bonus side effect, a 300k backend config saved 14 MB of RAM.
Add a new parameter "alt" that will store wether this configuration
use an alternate protocol.
This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.
This change is needed to allow for example the servers to know if it
need to use an alternate protocol or not.
load_cfg_in_mem() can continuously reallocate memory in order to load an
extremely large input from /dev/stdin, until it fails with ENOMEM, which means
that process has consumed all available RAM. In case of containers and
virtualized environments it's not very good.
So, in order to prevent this, let's introduce MAX_CFG_SIZE as 10MB, which will
limit the size of input supplied via /dev/stdin.
This commit fixes potential null ptr dereferences reported by coverity, see
more details about it in the issues #2676 and #2668.
'outline' ptr, which is initialized to NULL explicitly as a temporary buffer to
store split keywords may be in theory implicitly dereferenced in some corner
cases (which we haven't encountered yet with real world configurations) in
'if (!**args)'. parse_line() code, called before under some conditions
assigns: args[arg] = outline + outpos and outpos initial value is 0.
This helps to optimize a bit load_cfg_in_mem() and fixes the potential null ptr
dereference in fread() call. If (read_bytes + bytes_to_read) equals to initial
chunk_size (zero), realloc is never called, *cfg_content keeps its NULL value.
So, let's assure that initial number of bytes to read
(read_bytes + bytes_to_read) is stricly positive, when we enter into loop at
the first time.
As readcfgfile no longer opens configuration files and reads them with fgets,
but performs only the parsing of provided data, let's rename it to parse_cfg by
analogy with read_cfg in haproxy.c.
Let's call load_cfg_in_ram() helper for each configuration file to load it's
content in some area in memory. Adapt readcfgfile() parser function
respectively. In order to limit changes in its scope we give as an argument a
cfgfile structure, already filled in init_args() and in load_cfg_in_ram() with
file metadata and content.
Parser function (readcfgfile()) uses now fgets_from_mem() instead of standard
fgets from libc implementations.
SPOE filter parses its own configuration file, pointed by 'config' keyword in
the configuration already loaded in memory. So, let's allocate and fill for
this a supplementary cfgfile structure, which is not referenced in cfg_cfgfiles
list. This structure and the memory with content of SPOE filter configuration
are freed immediately in parse_spoe_flt(), when readcfgfile() returns.
HAProxy OpenTracing filter also uses its own configuration file. So, let's
follow the same logic as we do for SPOE filter.
Let's take in account the given file size, when its reported via stat.
It's very convenient for large configuration files, as this allows to
perform only the one memory allocation call for precisely needeed file size.
This also allows to perform only the one call to fread().
We need to provide to fread() file_stat.st_size + 1 to be able to grab EOF.
Like this it sets feof(f)=1 flag and this allows to exit from the loop
immediately, just after fread call.
If /dev/stdin or /dev/null is provided as a file, we continue to read the
configuration chunk by chunk, stat doesn't report the size.
list_append_word() helper was used before only to chain configuration file names
in a list. As now we start to use cfgfile structure which represents entire file
in memory and its metadata, let's adapt this helper to use this structure and
let's rename it to list_append_cfgfile().
Adapt functions, which process configuration files and directories to use
cfgfile structure and list_append_cfgfile() instead of wordlist.
The reuse "always" mode is forced for SPOP backends. For now, SPOP
connections cannot be idle, but once implemented, thanks to this patch, it
will be possible to reuse SPOP connections.
The related issue is #2502.
The SPOE was significantly lightened. It is now possible to refactor it to
use a dedicated multiplexer. The first step is to add a SPOP mode for
proxies. The corresponding multiplexer mode is also added.
For now, there is no SPOP multiplexer, so it is only declarative. But at the
end, the SPOP multiplexer will be automatically selected for servers inside
a SPOP backend.
The related issue is #2502.
As shown in GH #2608 and ("BUG/MEDIUM: proxy: fix email-alert invalid
free"), simply calling free_email_alert() from free_proxy() is not the
right thing to do.
In this patch, we reuse proxy->email_alert.set memory space to introduce
proxy->email_alert.flags in order to support 2 flags:
PR_EMAIL_ALERT_SET (to mimic proxy->email_alert.set) and
PR_EMAIL_ALERT_RESOLVED (set once init_email_alert() was called on the
proxy to resolve email_alert.mailer pointer).
Thanks to PR_EMAIL_ALERT_RESOLVED flag, free_email_alert() may now
properly handle the freeing of proxy email_alert settings: if the RESOLVED
flag is set, then it means the .email_alert.mailers.name parsing hint was
replaced by the actual mailers pointer, thus no free should be attempted.
No backport needed: as described in ("BUG/MEDIUM: proxy: fix email-alert
invalid free"), this historical leak is not sensitive as it cannot be
triggered during runtime.. thus given that the fix is not backport-
friendly, it's not worth the trouble.
In GH issue #2586 @Bbulatov reported a bug where the http-check
send-state flag is removed from options instead of options2 when
http-check is disabled. It only has an effect when this option is
set and http-check disabled, where it displays a warning indicating
this will be ignored. The option removed instead is srvtcpka when
this happens. It's likely that both options being so minor, nobody
ever faced it.
This can be backported to all versions.
On todays large systems, it's not always desired to run on all threads
for light loads, and usually users enforce nbthread to a lower value
(e.g. 8). The problem is that this is a fixed value, and moving such
configs to smaller machines continues to enforce the value and this
becomes extremely unproductive due to having more threads than CPUs.
This also happens quite a bit in VMs, containers, or cloud instances
of various sizes.
This commit introduces the thread-hard-limit setting that allows to only
set an upper bound to the number of threads without raising a lower value.
This means that using "thread-hard-limit 8" will make sure that no more
than 8 threads will be used when available, but it will remain two when
run on a dual-core machine.
Previously check_config_validity() had its own curproxy variable. This
resulted in the acl() sample fetch being unable to determine which
proxy was in use when used from within log-format statements. This
change addresses the issue by having the check_config_validity()
function use the global variable instead.
Some flags are defined during statistics generation and output. They use
the prefix STAT_* which is also used for other purposes. Rename them
with the new prefix STAT_F_* to differentiate them from the other
usages.
A dumb mistake was made in f6ae25858 ("MINOR: peers: rely on srv->addr
and remove peer->addr"). I completely overlooked the part where the bind
address settings are used as implicit server's address settings when the
peers are declared using the new bind+server config style (which is the
new recommended method to declare peers as it follows the same logic as
the one used in other proxy sections).
As such, the peers synchro fails to work between previous and new process
(localpeer mechanism) upon reload when declaring peers with way:
global
localpeer local
peers mypeers
bind 127.0.0.1:10001
server local
And one has to use the 'old' config style to make it work:
global
localpeer local
peers mypeers
peer local 127.0.0.1:10001
--
To fix the issue, let's explicitly set the server's addr:port
according to the bind's address settings (only the first listener is
considered) when local peer was declared using the 'bind+server' method.
No backport needed.
When sharded listeners were introdcued in 2.5 with commit 6dfbef4145
("MEDIUM: listener: add the "shards" bind keyword"), a point was
overlooked regarding how IDs are assigned to listeners: they are just
duplicated! This means that if a "option socket-stats" is set and a
shard is configured, or multiple thread groups are enabled, then a stats
dump will produce several lines with exactly the same socket name and ID.
This patch tries to address this by trying to assign consecutive numbers
to these sockets. The usual algo is maintained, but with a preference for
the next number in a shard. This will help users reserve ranges for each
socket, for example by using multiples of 100 or 1000 on each bind line,
leaving enough room for all shards to be assigned.
The mechanism however is quite tricky, because the configured listener
currently ends up being the last one of the shard. This helps insert them
before the current position without having to revisit them. But here it
causes a difficulty which is that we'd like to restart from the current
ID and assign new ones on top of it. What is done is that the number is
passed between shards and the current one is cleared (and removed from
the tree) so that we instead insert the new one. It's tricky because of
the situation which depends whether it's the listener that was already
assigned on the bind line or not. But overall, always removing the entry,
always adding the new one when the ID is not zero, and passing them from
the reference to the next one does the trick.
This may be backported to all versions till 2.6.
This commit is similar with the two previous ones. Its purpose is to add
GUID support on listeners. Due to bind_conf and listeners configuration,
some specifities were required.
Its possible to define several listeners on a single bind line, for
example by specifying multiple addresses. As such, it's impossible to
support a "guid" keyword on a bind line. The problem is exacerbated by
the cloning of listeners when sharding is used.
To resolve this, a new keyword "guid-prefix" is defined for bind lines.
It allows to specify a string which will be used as a prefix for
automatically generated GUID for each listeners attached to a bind_conf.
Automatic GUID listeners generation is implemented via a new function
bind_generate_guid(). It is called on post-parsing, after
bind_complete_thread_setup(). For each listeners on a bind_conf, a new
GUID is generated with bind_conf prefix and the index of the listener
relative to other listeners in the bind_conf. This last value is stored
in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted,
for example due to a non-unique value, an error is returned, startup is
interrupted with configuration rejected.
Currently, the way proxy-oriented logformat directives are handled is way
too complicated. Indeed, "log-format", "log-format-error", "log-format-sd"
and "unique-id-format" all rely on preparsing hints stored inside
proxy->conf member struct. Those preparsing hints include the original
string that should be compiled once the proxy parameters are known plus
the config file and line number where the string was found to generate
precise error messages in case of failure during the compiling process
that happens within check_config_validity().
Now that lf_expr API permits to compile a lf_expr struct that was
previously prepared (with original string and config hints), let's
leverage lf_expr_compile() from check_config_validity() and instead
of relying on individual proxy->conf hints for each logformat expression,
store string and config hints in the lf_expr struct directly and use
lf_expr helpers funcs to handle them when relevant (ie: original
logformat string freeing is now done at a central place inside
lf_expr_deinit(), which allows for some simplifications)
Doing so allows us to greatly simplify the preparsing logic for those 4
proxy directives, and to finally save some space in the proxy struct.
Also, since httpclient proxy has its "logformat" automatically compiled
in check_config_validity(), we now use the file hint from the logformat
expression struct to set an explicit name that will be reported in case
of error ("parsing [httpclient:0] : ...") and remove the extraneous check
in httpclient_precheck() (logformat was parsed twice previously..)