Implement parsing for "rev@" addresses on bind line. On config parsing,
server name is stored on the bind_conf.
Several new callbacks are defined on reverse_connect protocol to
complete parsing. listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure HTTP/2
protocol only is used by the server.
Implement reverse-connect server. This server type cannot instantiate
its own connection on transfer. Instead, it can only reuse connection
from its idle pool. These connections will be populated using the future
'tcp-request session attach-srv' rule.
A reverse-connect has no address. Instead, it uses a new custom server
notation with '@' character prefix. For the moment, only '@reverse' is
defined. An extra check is implemented to ensure server is used in a
HTTP proxy.
Add a check for limited-quic in check_config_validity() when compiled
with USE_QUIC_OPENSSL_COMPAT so that we prevent a config from starting
accidentally with limited QUIC support. If a QUIC listener is found
when using the compatibility mode and limited-quic is not set, an error
message is reported explaining that the SSL library is not compatible
and proposing the user to enable limited-quic if that's what they want,
and the startup fails.
This partially reverts commit 7c730803d ("MINOR: quic: Warning for
OpenSSL wrapper QUIC bindings without "limited-quic"") since a warning
was not sufficient.
Surprisingly, commit 00e00fb42 ("REORG: cfgparse: extract curproxy as a
global variable") caused a build breakage on the CI but not on two
developers' machines. It looks like it's dependent on the linker version
used. What happens is that flt_spoe.c already has a curproxy struct which
already is a copy of the one passed by the parser because it also needed
it to be exported, so they now conflict. Let's just drop this unused copy.
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, changed from ulong to int since
it only returns a success/failure and no more a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
As documented, the NUMA auto-detection is not supposed to be used when
the CPU affinity was set either by taskset (already checked) or by a
cpu-map directive. However this check was missing, so that configs
having cpu-map entries would still first bind to a single node. In
practice it has no impact on correct configs since bindings will be
replaced. However for those where the cpu-map directive are not
exhaustive it will have the impact of binding those threads to one node,
which disagrees with the doc (and makes future evolutions significantly
more complicated).
This could be backported to 2.4 where numa-cpu-mapping was added, though
if nobody encountered this by then maybe we should only focus on recent
versions that are more NUMA-friendly (e.g. 2.8 only). This patch depends
on this previous commit that brings the function we rely on:
MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
When a stick-table is defined within a peers section, the name is
prefixed with the peers section name. However when checking for
duplicate table names, the check was using the table name without
the prefix, and would thus never match.
Must be backported as far as 2.6.
There are several misuses in peers sections that are not detected during the
configuration parsing and that could lead to undefined behaviors or crashes.
First, only one listener is expected for a peers section. If several bind
lines or local peer definitions are used, an error is triggered. However, if
multiple addresses are set on the same bind line, there is no error while
only the last listener is properly configured. On the 2.8, there is no crash
but side effects are hardly predictable. On older version, HAProxy crashes
if an unconfigured listener is used.
Then, there is no check on remote peers name. It is unexpected to have same
name for several remote peers. There is now a test, performed during the
post-parsing, to verify all remote peer names are unique.
Finally, server parsing options for the peers sections are changed to be
sure a port is always defined, and not a port range or a port offset.
This patch fixes the issue #2066. It could be backported to all stable
versions.
Some huge configs take a significant amount of time to start and this
can cause some trouble (e.g. health checks getting delayed and grouped,
process not responding to the CLI etc). For example, some configs might
start fast in certain environments and slowly in other ones just due to
the use of a wrong DNS server that delays all libc's resolutions. Let's
first start by measuring it by keeping a copy of the most recently known
ready date, once before calling check_config_validity() and then refine
it when leaving this function. A last call is finally performed just
before deciding to split between master and worker processes, and it covers
the whole boot. It's trivial to collect and even allows to get rid of a
call to clock_update_date() in function check_config_validity() that was
used in hope to better schedule future events.
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
etended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.
Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.
This commit relaxes this rule an finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument in the parse_cpu_set() function. The doc was updated to
reflect this.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
The listeners in peers sections were still not handing the thread
groups fine. Shards were silently ignored and if a listener was bound
to more than one group, it would simply fail. Now we can call the
dedicated function to resolve all this and possibly create the missing
extra listeners.
bind_complete_thread_setup() was adjusted to use the proxy_type_str()
instead of writing "proxy" at the only place where this word was still
hard-coded so that we continue to speak about peers sections when
relevant.
What used to be only two lines to apply a mask in a loop in
check_config_validity() grew into a 130-line block that performs deeply
listener-specific operations that do not have their place there anymore.
In addition it's worth noting that the peers code still doesn't support
shards nor being bound to more than one group, which is a second reason
for moving that code to its own function. Nothing was changed except
recreating the missing variables from the bind_conf itself (the fe only).
In 2.6-dev1, NUMA topology detection was enabled on FreeBSD with commit
f5d48f8b3 ("MEDIUM: cfgparse: numa detect topology on FreeBSD."). But
it suffers from a minor bug which is that it forgets to check for the
number of domains and always emits a confusing warning indicating that
multiple sockets were found while it's not the case.
This can be backported to 2.6.
Now it's possible for a bind line to span multiple thread groups. When
this happens, the first one will become the reference and will be entirely
set up, and the subsequent ones will be duplicated from this reference,
so that they can be registered in distinct groups. The reference is
always setup and started first so it is always available when the other
ones are started.
The doc was updated to reflect this new possibility with its limitations
and impacts, and the differences with the "shards" option.
Commit 5003ac7fe ("MEDIUM: config: set useful ALPN defaults for HTTPS
and QUIC") revealed a build dependency bug: if QUIC is not enabled,
cfgparse doesn't have any dependency on the SSL stack, so the various
ifdefs that try to check special conditions such as rejecting an H2
config with too small a bufsize, are silently ignored. This was
detected because the default ALPN string was not set and caused the
alpn regtest to fail without QUIC support. Adding openssl-compat to
the list of includes seems to be sufficient to have what we need.
It's unclear when this dependency was broken, it seems that even 2.2
didn't have an explicit dependency on anything SSL-related, though it
could have been inherited through other files (as happens with QUIC
here). It would be safe to backport it to all stable branches. The
impact is very low anyway.
This commit makes sure that if three is no "alpn", "npn" nor "no-alpn"
setting on a "bind" line which corresponds to an HTTPS or QUIC frontend,
we automatically turn on "h2,http/1.1" as an ALPN default for an HTTP
listener, and "h3" for a QUIC listener. This simplifies the configuration
for end users since they won't have to explicitly configure the ALPN
string to enable H2, considering that at the time of writing, HTTP/1.1
represents less than 7% of the traffic on large infrastructures. The
doc and regtests were updated. For more info, refer to the following
thread:
https://www.mail-archive.com/haproxy@formilux.org/msg43410.html
It's possible to replace a previously set ALPN but not to disable ALPN
if it was previously set. The new "no-alpn" setting allows to disable
a previously set ALPN setting by preparing an empty one that will be
replaced and freed when the config is validated.
Setting "shards by-group" will create one shard per thread group. This
can often be a reasonable tradeoff between a single one that can be
suboptimal on CPUs with many cores, and too many that will eat a lot
of file descriptors. It was shown to provide good results on a 224
thread machine, with a distribution that was even smoother than the
system's since here it can take into account the number of connections
per thread in the group. Depending on how popular it becomes, it could
even become the default setting in a future version.
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will
allow to quickly spot the presence of any such group in a set.
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
The ssl_bind_kw structure is exclusively used for crt-list keyword, it
must be named otherwise to remove the confusion.
The structure was renamed ssl_crtlist_kws.
In check_config_validity() function, we performed some consistency checks to
adjust minconn/maxconn attributes for each declared server.
We move this logic into a dedicated function named srv_minmax_conn_apply()
to be able to perform those checks later in the process life when needed
(ie: dynamic servers)
Aurélien reported that the BUG_ON(!new_ts.nbgrp) added in 2.8-dev3 by
commit 50440457e ("MEDIUM: config: restrict shards, not bind_conf to one
group each") can trigger on some invalid configs where the thread_set on
the "bind" line couldn't be resolved. The reason is that we still enter
the parsing loop (as it was done previously) and we possibly have no
group to work on (which was the purpose of this assertion). There we
need to bypass all this block on such a condition.
No backport is needed.
Now that we're using thread_sets there's no need to restrict an entire
bind_conf to 1 group, the real concern being the FD, we can move that
restriction to the shard only. This means that as long as we have enough
shards and that they're properly aligned on group boundaries (i.e. shards
are an integer divider of the number of threads), we can support "bind"
lines spanning more than one group.
The check is still performed for shards to span more than one group,
and an error is emitted when this happens. But at least now it becomes
possible to have this:
global
nbthread 256
frontend foo
bind :1111 shards 4
bind :2222 shards by-thread
Let's now retrieve the first thread group and its mask from the
thread_set so that we don't need these fields in the bind_conf anymore.
For now we're still limited to the first group (like before) but that
allows to get rid of these fields and to make sure that there's nothing
"special" being done there anymore.
Instead of reading and storing a single group and a single mask for a
"thread" directive on a bind line, we now store the complete range in
a thread set that's stored in the bind_conf. The bind_parse_thread()
function now just calls parse_thread_set() to complete the current set,
which starts empty, and thread_resolve_group_mask() was updated to
support retrieving thread group numbers or absolute thread numbers
directly from the pre-filled thread_set, and continue to feed bind_tgroup
and bind_thread. The CLI parsers which were pre-initialized to set the
bind_tgroup to 1 cannot do it anymore as it would prevent one from
restricting the thread set. Instead check_config_validity() now detects
the CLI frontend and passes the info down to thread_resolve_group_mask()
that will automatically use only the group 1's threads for these
listeners. The same is done for the peers listeners for now.
At this step it's already possible to start with all previous valid
configs as well as extended ones supporting comma-delimited thread
sets. In addition the parser already accepts large ranges spanning
multiple groups, but since the underlying listeners infrastructure
is not read, for now we're maintaining a specific check against this
at the higher level of the config validity check.
The patch is a bit large because thread resolution is performed in
multiple steps, so we need to adjust all of them at once to preserve
functional and technical consistency.
During 2.5 development, a fallback was implemented for bind "thread"
directives that would not map to existing threads, with commit e3f4d7496
("MEDIUM: config: resolve relative threads on bind lines to absolute ones").
The approch consisted in remapping the threads to other ones. But now
that relative threads and not absolute threads are stored in this mask,
this case cannot happen anymore, and this confusing hack is not needed
anymore.
This flag is only used to tag a QUIC listener, which we now know by
its bind_conf's xprt as well. It's only used to decide whether or not
to perform an extra initialization step on the listener. Let's drop it
as well as the flags field.
With the various fields and options moved, the listener struct reduced
by 48 bytes total.
LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES are only set by from the proxy
based on the presence or absence of tcp_req l4/l5 rules. It's basically
as cheap to check the list as it is to check the flag, except that there
is no need to maintain a copy. Let's get rid of them, and this may ease
addition of more dynamic stuff later.
These two flags are entirely for internal use and are even per proxy
in practice since they're used for peers and CLI to indicate (for the
first one) that the listener(s) are not subject to connection limits,
and for the second that the listener(s) should not be stopped on
soft-stop. No need to keep them in the listeners, let's move them to
the bind_conf under names BC_O_UNLIMITED and BC_O_NOSTOP.
It's currently declared per-frontend, though it would make sense to
support it per-line but in no case per-listener. Let's move the option
to a bind_conf option BC_O_NOLINGER.
This field is used by stream_new() to optionally set the applet the
stream will connect to for simple proxies like the CLI for example.
But it has never been configurable to anything and is always strictly
equal to the frontend's ->default_target. Let's just drop it and make
stream_new() only use the frontend's. It makes more sense anyway as
we don't want the proxy to work differently based on the "bind" line.
This idea was brought in 1.6 hoping that the h2 implementation would
use applets for decoding (which was dropped after the very first
attempt in 1.8).
The accept callback directly derives from the upper layer, generally
it's session_accept_fd(). As such it's also defined per bind line
so it makes sense to move it there.
Like for previous values, maxaccept is really per-bind_conf, so let's
move it there. Some frontends (peers, log) set it to 1 so the assignment
was slightly moved.
When bind_conf were created, some elements such as the analysers mask
ought to have moved there but that wasn't the case. Now that it's
getting clearer that bind_conf provides all binding parameters and
the listener is essentially a listener on an address, it's starting
to get really confusing to keep such parameters in the listener, so
let's move the mask to the bind_conf. We also take this opportunity
for pre-setting the mask to the frontend's upon initalization. Now
several loops have one less argument to take care of.
This is a simple refactor to remove specific http_ext post-parsing treatment from
cfgparse.
Related work is now performed internally through check_http_ext_postconf()
function that is registered via REGISTER_POST_PROXY_CHECK() in http_ext.c.
proxy http-only options implemented in http_ext were statically stored
within proxy struct.
We're making some changes so that http_ext are now stored in a dynamically
allocated structs.
http_ext related structs are only allocated when needed to save some space
whenever possible, and they are automatically freed upon proxy deletion.
Related PX_O_HTTP{7239,XFF,XOT) option flags were removed because we're now
considering an http_ext option as 'active' if it is allocated (ptr is not NULL)
A few checks (and BUG_ON) were added to make these changes safe because
it adds some (acceptable) complexity to the previous design.
Also, proxy.http was renamed to proxy.http_ext to make things more explicit.
Just like forwarded (7239) header and forwardfor header, move parsing,
logic and management of 'originalto' option into http_ext dedicated class.
We're only doing this to standardize proxy http options management.
Existing behavior remains untouched.
Just like forwarded (7239) header, move parsing, logic and management
of 'forwardfor' option into http_ext dedicated class.
We're only doing this to standardize proxy http options management.
Existing behavior remains untouched.
Introducing http_ext class for http extension related work that
doesn't fit into existing http classes.
HTTP extension "forwarded", introduced with 7239 RFC is now supported
by haproxy.
The option supports various modes from simple to complex usages involving
custom sample expressions.
Examples :
# Those servers want the ip address and protocol of the client request
# Resulting header would look like this:
# forwarded: proto=http;for=127.0.0.1
backend www_default
mode http
option forwarded
#equivalent to: option forwarded proto for
# Those servers want the requested host and hashed client ip address
# as well as client source port (you should use seed for xxh32 if ensuring
# ip privacy is a concern)
# Resulting header would look like this:
# forwarded: host="haproxy.org";for="_000000007F2F367E:60138"
backend www_host
mode http
option forwarded host for-expr src,xxh32,hex for_port
# Those servers want custom data in host, for and by parameters
# Resulting header would look like this:
# forwarded: host="host.com";by=_haproxy;for="[::1]:10"
backend www_custom
mode http
option forwarded host-expr str(host.com) by-expr str(_haproxy) for for_port-expr int(10)
# Those servers want random 'for' obfuscated identifiers for request
# tracing purposes while protecting sensitive IP information
# Resulting header would look like this:
# forwarded: for=_000000002B1F4D63
backend www_for_hide
mode http
option forwarded for-expr rand,hex
By default (no argument provided), forwarded option will try to mimic
x-forward-for common setups (source client ip address + source protocol)
The option is not available for frontends.
no option forwarded is supported.
More info about 7239 RFC here: https://www.rfc-editor.org/rfc/rfc7239.html
More info about the feature in doc/configuration.txt
This should address feature request GH #575
Depends on:
- "MINOR: http_htx: add http_append_header() to append value to header"
- "MINOR: sample: add ARGC_OPT"
- "MINOR: proxy: introduce http only options"
The number of stick-counter entries usable by track-sc rules is currently
set at build time. There is no good value for this since the vast majority
of users don't need any, most need only a few and rare users need more.
Adding more counters for everyone increases memory and CPU usages for no
reason.
This patch moves the per-session and per-stream arrays to a pool of a size
defined at boot time. This way it becomes possible to set the number of
entries at boot time via a new global setting "tune.stick-counters" that
sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR
value still applies, or 3 if not set, as before.
It is also possible to lower the value to 0 to save a bit of memory if
not used at all.
Note that a few low-level sample-fetch functions had to be protected due
to the ability to use sample-fetches in the global section to set some
variables.