12588 Commits

Author SHA1 Message Date
Willy Tarreau
cc8fd4c040 MINOR: resolvers: merge address and target into a union "data"
These two fields are exclusive as they depend on the data type.
Let's move them into a union to save some precious bytes. This
reduces the struct resolv_answer_item size from 600 to 576 bytes.
2021-10-14 22:52:04 +02:00
Willy Tarreau
b4ca0195a9 BUG/MEDIUM: resolvers: use correct storage for the target address
The struct resolv_answer_item contains an address field of type
"sockaddr" which is only 16 bytes long, but which is used to store
either IPv4 or IPv6. Fortunately, the contents only overlap with
the "target" field that follows it and that is large enough to
absorb the extra bytes needed to store AAAA records. But this is
dangerous as just moving fields around could result in memory
corruption.

The fix uses a union and removes the casts that were used to hide
the problem.

Older versions need to be checked and possibly fixed. This needs
to be backported anyway.
2021-10-14 22:44:51 +02:00
Willy Tarreau
6dfbef4145 MEDIUM: listener: add the "shards" bind keyword
In multi-threaded mode, on operating systems supporting multiple listeners on
the same IP:port, this will automatically create this number of multiple
identical listeners for the same line, all bound to a fair share of the number
of the threads attached to this listener. This can sometimes be useful when
using very large thread counts where the in-kernel locking on a single socket
starts to cause a significant overhead. In this case the incoming traffic is
distributed over multiple sockets and the contention is reduced. Note that
doing this can easily increase the CPU usage by making more threads work a
little bit.

If the number of shards is higher than the number of available threads, it
will automatically be trimmed to the number of threads. A special value
"by-thread" will automatically assign one shard per thread.
2021-10-14 21:27:48 +02:00
Willy Tarreau
59a877dfd9 MINOR: listeners: add clone_listener() to duplicate listeners at boot time
This function's purpose will be to duplicate a listener in INIT state.
This will be used to ease declaration of listeners spanning multiple
groups, which will thus require multiple FDs hence multiple receivers.
2021-10-14 21:27:48 +02:00
Willy Tarreau
01cac3f721 MEDIUM: listeners: split the thread mask between receiver and bind_conf
With groups at some point we'll have to have distinct masks/groups in the
receiver and the bind_conf, because a single bind_conf might require to
instantiate multiple receivers (one per group).

Let's split the thread mask and group to have one for the bind_conf and
another one for the receiver while it remains easy to do. This will later
allow to use different storage for the bind_conf if needed (e.g. support
multiple groups).
2021-10-14 21:27:48 +02:00
Willy Tarreau
875ee704dd MINOR: resolvers: fix the resolv_dn_label_to_str() API about trailing zero
This function suffers from the same API issue as its sibling that does the
opposite direction, it demands that the input string is zero-terminated
*and* that its length *including* the trailing zero is passed on input,
forcing callers to pass length + 1, and itself to use that length - 1
everywhere internally.

This patch addressess this. There is a single caller, which is the
location of the previous bug, so it should probably be backported at
least to keep the code consistent across versions. Note that the
function is called dns_dn_label_to_str() in 2.3 and earlier.
2021-10-14 21:24:18 +02:00
Willy Tarreau
85c15e6bff BUG/MINOR: resolvers: do not reject host names of length 255 in SRV records
An off-by-one issue in buffer size calculation used to limit the output
of resolv_dn_label_to_str() to 254 instead of 255.

This must be backported to 2.0.
2021-10-14 21:24:18 +02:00
Willy Tarreau
947ae125cc BUG/MEDIUM: resolver: make sure to always use the correct hostname length
In issue #1411, @jjiang-stripe reports that do-resolve() sometimes seems
to be trying to resolve crap from random memory contents.

The issue is that action_prepare_for_resolution() tries to measure the
input string by itself using strlen(), while resolv_action_do_resolve()
directly passes it a pointer to the sample, omitting the known length.
Thus of course any other header present after the host in memory are
appended to the host value. It could theoretically crash if really
unlucky, with a buffer that does not contain any zero including in the
index at the end, and if the HTX buffer ends on an allocation boundary.
In practice it should be too low a probability to have ever been observed.

This patch modifies the action_prepare_for_resolution() function to take
the string length on with the host name on input and pass that down the
chain. This should be backported to 2.0 along with commit "MINOR:
resolvers: fix the resolv_str_to_dn_label() API about trailing zero".
2021-10-14 21:24:18 +02:00
Willy Tarreau
bf9498a31b MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero
This function is bogus at the API level: it demands that the input string
is zero-terminated *and* that its length *including* the trailing zero is
passed on input. While that already looks smelly, the trailing zero is
copied as-is, and is then explicitly replaced with a zero... Not only
all callers have to pass hostname_len+1 everywhere to work around this
absurdity, but this requirement causes a bug in the do-resolve() action
that passes random string lengths on input, and that will be fixed on a
subsequent patch.

Let's fix this API issue for now.

This patch will have to be backported, and in versions 2.3 and older,
the function is in dns.c and is called dns_str_to_dn_label().
2021-10-14 21:24:18 +02:00
Willy Tarreau
6823a3acee MINOR: protocol: uniformize protocol errors
Some protocols fail with "error blah [ip:port]" and other fail with
"[ip:port] error blah". All this already appears in a "starting" or
"binding" context after a proxy name. Let's choose a more universal
approach like below where the ip:port remains at the end of the line
prefixed with "for".

  [WARNING]  (18632) : Binding [binderr.cfg:10] for proxy http: cannot bind receiver to device 'eth2' (No such device) for [0.0.0.0:1080]
  [WARNING]  (18632) : Starting [binderr.cfg:10] for proxy http: cannot set MSS to 12 for [0.0.0.0:1080]
2021-10-14 21:22:52 +02:00
Willy Tarreau
37de553f1d MINOR: protocol: report the file and line number for binding/listening errors
Binding errors and late socket errors provide no information about
the file and line where the problem occurs. These are all done by
protocol_bind_all() and they only report "Starting proxy blah". Let's
change this a little bit so that:
  - the file name and line number of the faulty bind line is alwas mentioned
  - early binding errors are indicated with "Binding" instead of "Starting".

Now we can for example have this:
  [WARNING]  (18580) : Binding [binderr.cfg:10] for proxy http: cannot bind receiver to device 'eth2' (No such device) [0.0.0.0:1080]
2021-10-14 21:22:52 +02:00
Willy Tarreau
f78b52eb7d MINOR: inet: report the faulty interface name in "bind" errors
When a "bind ... interface foo" statement fails, let's report the
interface name in the error message to help locating it in the file.
2021-10-14 21:22:52 +02:00
Willy Tarreau
3cf05cb0b1 MINOR: proto_tcp: also report the attempted MSS values in error message
The MSS errors are the only ones not indicating what was attempted, let's
report the value that was tried, as it can help users spot them in the
config (particularly if a default value was used).
2021-10-14 21:22:52 +02:00
Bjoern Jacke
ed1748553a MINOR: proto_tcp: use chunk_appendf() to ouput socket setup errors
Right now only the last warning or error is reported from
tcp_bind_listener(), but it is useful to report all warnings and no only
the last one, so we now emit them delimited by commas. Previously we used
a fixed buffer of 100 bytes, which was too small to store more than one
message, so let's extend it.

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
2021-10-14 21:22:52 +02:00
Remi Tricot-Le Breton
130e142ee2 MEDIUM: jwt: Add jwt_verify converter to verify JWT integrity
This new converter takes a JSON Web Token, an algorithm (among the ones
specified for JWS tokens in RFC 7518) and a public key or a secret, and
it returns a verdict about the signature contained in the token. It does
not simply return a boolean because some specific error cases cas be
specified by returning an integer instead, such as unmanaged algorithms
or invalid tokens. This enables to distinguich malformed tokens from
tampered ones, that would be valid format-wise but would have a bad
signature.
This converter does not perform a full JWT validation as decribed in
section 7.2 of RFC 7519. For instance it does not ensure that the header
and payload parts of the token are completely valid JSON objects because
it would need a complete JSON parser. It only focuses on the signature
and checks that it matches the token's contents.
2021-10-14 16:38:14 +02:00
Remi Tricot-Le Breton
0a72f5ee7c MINOR: jwt: jwt_header_query and jwt_payload_query converters
Those converters allow to extract a JSON value out of a JSON Web Token's
header part or payload part (the two first dot-separated base64url
encoded parts of a JWS in the Compact Serialization format).
They act as a json_query call on the corresponding decoded subpart when
given parameters, and they return the decoded JSON subpart when no
parameter is given.
2021-10-14 16:38:13 +02:00
Remi Tricot-Le Breton
864089e0a6 MINOR: jwt: Insert public certificates into dedicated JWT tree
A JWT signed with the RSXXX or ESXXX algorithm (RSA or ECDSA) requires a
public certificate to be verified and to ensure it is valid. Those
certificates must not be read on disk at runtime so we need a caching
mechanism into which those certificates will be loaded during init.
This is done through a dedicated ebtree that is filled during
configuration parsing. The path to the public certificates will need to
be explicitely mentioned in the configuration so that certificates can
be loaded as early as possible.
This tree is different from the ckch one because ckch entries are much
bigger than the public certificates used in JWT validation process.
2021-10-14 16:38:12 +02:00
Remi Tricot-Le Breton
e0d3c00086 MINOR: jwt: JWT tokenizing helper function
This helper function splits a JWT under Compact Serialization format
(dot-separated base64-url encoded strings) into its different sub
strings. Since we do not want to manage more than JWS for now, which can
only have at most three subparts, any JWT that has strictly more than
two dots is considered invalid.
2021-10-14 16:38:10 +02:00
Remi Tricot-Le Breton
7feb361776 MINOR: jwt: Parse JWT alg field
The full list of possible algorithms used to create a JWS signature is
defined in section 3.1 of RFC7518. This patch adds a helper function
that converts the "alg" strings into an enum member.
2021-10-14 16:38:08 +02:00
Remi Tricot-Le Breton
f5dd337b12 MINOR: http: Add http_auth_bearer sample fetch
This fetch can be used to retrieve the data contained in an HTTP
Authorization header when the Bearer scheme is used. This is used when
transmitting JSON Web Tokens for instance.
2021-10-14 16:38:07 +02:00
William Lallemand
1d58b01316 MINOR: ssl: add ssl_fc_is_resumed to "option httpslog"
In order to trace which session were TLS resumed, add the
ssl_fc_is_resumed in the httpslog option.
2021-10-14 14:27:48 +02:00
Amaury Denoyelle
493bb1db10 MINOR: quic: handle CONNECTION_CLOSE frame
On receiving CONNECTION_CLOSE frame, the mux is flagged for immediate
connection close. A stream is closed even if there is data not ACKed
left if CONNECTION_CLOSE has been received.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
1e308ffc79 MINOR: mux: remove last occurences of qcc ring buffer
The mux tx buffers have been rewritten with buffers attached to qcs
instances. qc_buf_available and qc_get_buf functions are updated to
manipulates qcs. All occurences of the unused qcc ring buffer are
removed to ease the code maintenance.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
cae0791942 MEDIUM: mux-quic: defer stream shut if remaining tx data
Defer the shutting of a qcs if there is still data in its tx buffers. In
this case, the conn_stream is closed but the qcs is kept with a new flag
QC_SF_DETACH.

On ACK reception, the xprt wake up the shut_tl tasklet if the stream is
flagged with QC_SF_DETACH. This tasklet is responsible to free the qcs
and possibly the qcc when all bidirectional streams are removed.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
ac8ee25659 MINOR: mux-quic: implement standard method to detect if qcc is dead
For the moment, a quic connection is considered dead if it has no
bidirectional streams left on it. This test is implemented via
qcc_is_dead function. It can be reused to properly close the connection
when needed.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
4fc8b1cb17 CLEANUP: h3: remove dead code
Remove unused function. This will simplify code maintenance.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
a587136c6f MINOR: mux-quic: standardize h3 settings sending
Use same buffer management to send h3 settings as for streams. This
simplify the code maintenance with unused function removed.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
a543eb1f6f MEDIUM: h3: properly manage tx buffers for large data
Properly handle tx buffers management in h3 data sending. If there is
not enough contiguous space, the buffer is first realigned. If this is
not enough, the stream is flagged with QC_SF_BLK_MROOM waiting for the
buffer to be emptied.

If a frame on a stream is successfully pushed for sending, the stream is
called if it was flagged with QC_SF_BLK_MROOM.
2021-10-13 16:38:56 +02:00
Amaury Denoyelle
d3d97c6ae7 MEDIUM: mux-quic: rationalize tx buffers between qcc/qcs
Remove the tx mux ring buffers in qcs, which should be in the qcc. For
the moment, use a simple architecture with 2 simple tx buffers in the
qcs.

The first buffer is used by the h3 layer to prepare the data. The mux
send operation transfer these into the 2nd buffer named xprt_buf. This
buffer is only freed when an ACK has been received.

This architecture is functional but not optimal for two reasons :
- it won't limit the buffer usage by connection
- each transfer on a new stream requires an allocation
2021-10-13 16:38:56 +02:00
Remi Tricot-Le Breton
b01179aa92 MINOR: ssl: Add ssllib_name_startswith precondition
This new ssllib_name_startswith precondition check can be used to
distinguish application linked with OpenSSL from the ones linked with
other SSL libraries (LibreSSL or BoringSSL namely). This check takes a
string as input and returns 1 when the SSL library's name starts with
the given string. It is based on the OpenSSL_version function which
returns the same output as the "openssl version" command.
2021-10-13 11:28:08 +02:00
Tim Duesterhus
9e5e586e35 BUG/MINOR: lua: Fix lua error handling in hlua_config_prepend_path()
Set an `lua_atpanic()` handler before calling `hlua_prepend_path()` in
`hlua_config_prepend_path()`.

This prevents the process from abort()ing when `hlua_prepend_path()` fails
for some reason.

see GitHub Issue #1409

This is a very minor issue that can't happen in practice. No backport needed.
2021-10-12 11:28:57 +02:00
Christopher Faulet
8c67eceeca CLEANUP: stream: Properly indent current_rule line in "show sess all"
This line is not related to the response channel but to the stream. Thus it
must be indented at the same level as stream-interfaces, connections,
channels...
2021-10-12 11:27:24 +02:00
Christopher Faulet
d4762b8474 MINOR: stream: report the current filter in "show sess all" when known
Filters can block the stream on pre/post analysis for any reason and it can
be useful to report it in "show sess all". So now, a "current_filter" extra
line is reported for each channel if a filter is blocking the analysis. Note
that this does not catch the TCP/HTTP payload analysis because all
registered filters are always evaluated when more data are received.
2021-10-12 11:26:49 +02:00
Willy Tarreau
1274e10d5c MINOR: stream: report the current rule in "show sess all" when known
Sometimes an HTTP or TCP rule may take time to complete because it is
waiting for external data (e.g. "wait-for-body", "do-resolve"), and it
can be useful to report the action and the location of that rule in
"show sess all". Here for streams blocked on such a rule, there will
now be a "current_line" extra line reporting this. Note that this does
not catch rulesets which are re-evaluated from the start on each change
(e.g. tcp-request content waiting for changes) but only when a specific
rule is being paused.
2021-10-12 07:38:30 +02:00
Willy Tarreau
c9e4868510 MINOR: rules: add a file name and line number to act_rules
These ones are passed on rule creation for the sole purpose of being
reported in "show sess", which is not done yet. For now the entries
are allocated upon rule creation and freed in free_act_rules().
2021-10-12 07:38:30 +02:00
Willy Tarreau
d535f807bb MINOR: rules: add a new function new_act_rule() to allocate act_rules
Rules are currently allocated using calloc() by their caller, which does
not make it very convenient to pass more information such as the file
name and line number.

This patch introduces new_act_rule() which performs the malloc() and
already takes in argument the ruleset (ACT_F_*), the file name and the
line number. This saves the caller from having to assing ->from, and
will allow to improve the internal storage with more info.
2021-10-12 07:38:30 +02:00
Willy Tarreau
db2ab8218c MEDIUM: stick-table: never learn the "conn_cur" value from peers
There have been a large number of issues reported with conn_cur
synchronization because the concept is wrong. In an active-passive
setup, pushing the local connections count from the active node to
the passive one will result in the passive node to have a higher
counter than the real number of connections. Due to this, after a
switchover, it will never be able to close enough connections to
go down to zero. The same commonly happens on reloads since the new
process preloads its values from the old process, and if no connection
happens for a key after the value is learned, it is impossible to reset
the previous ones. In active-active setups it's a bit different, as the
number of connections reflects the number on the peer that pushed last.

This patch solves this by marking the "conn_cur" local and preventing
it from being learned from peers. It is still pushed, however, so that
any monitoring system that collects values from the peers will still
see it.

The patch is tiny and trivially backportable. While a change of behavior
in stable branches is never welcome, it remains possible to fix issues
if reports become frequent.
2021-10-08 17:53:12 +02:00
Willy Tarreau
e3f4d7496d MEDIUM: config: resolve relative threads on bind lines to absolute ones
Now threads ranges specified on bind lines will be turned to effective
ones that will lead to a usable thread mask and a group ID.
2021-10-08 17:22:26 +02:00
Willy Tarreau
627def9e50 MINOR: threads: add a new function to resolve config groups and masks
In the configuration sometimes we'll omit a thread group number to designate
a global thread number range, and sometimes we'll mention the group and
designate IDs within that group. The operation is more complex than it
seems due to the need to check for ranges spanning between multiple groups
and determining groups from threads from bit masks and remapping bit masks
between local/global.

This patch adds a function to perform this operation, it takes a group and
mask on input and updates them on output. It's designed to be used by "bind"
lines but will likely be usable at other places if needed.

For situations where specified threads do not exist in the group, we have
the choice in the code between silently fixing the thread set or failing
with a message. For now the better option seems to return an error, but if
it turns out to be an issue we can easily change that in the future. Note
that it should only happen with "x/even" when group x only has one thread.
2021-10-08 17:22:26 +02:00
Willy Tarreau
d57b9ff7af MEDIUM: listeners: support the definition of thread groups on bind lines
This extends the "thread" statement of bind lines to support an optional
thread group number. When unspecified (0) it's an absolute thread range,
and when specified it's one relative to the thread group. Masks are still
used so no more than 64 threads may be specified at once, and a single
group is possible. The directive is not used for now.
2021-10-08 17:22:26 +02:00
Willy Tarreau
a3870b7952 MINOR: debug: report the group and thread ID in the thread dumps
Now thread dumps will report the thread group number and the ID within
this group. Note that this is still quite limited because some masks
are calculated based on the thread in argument while they have to be
performed against a group-level thread ID.
2021-10-08 17:22:26 +02:00
Willy Tarreau
b90935c908 MINOR: threads: add the current group ID in thread-local "tgid" variable
This is the equivalent of "tid" for ease of access. In the future if we
make th_cfg a pure thread-local array (not a pointer), it may make sense
to move it there.
2021-10-08 17:22:26 +02:00
Willy Tarreau
43ab05b3da MEDIUM: threads: replace ha_set_tid() with ha_set_thread()
ha_set_tid() was randomly used either to explicitly set thread 0 or to
set any possibly incomplete thread during boot. Let's replace it with
a pointer to a valid thread or NULL for any thread. This allows us to
check that the designated threads are always valid, and to ignore the
thread 0's mapping when setting it to NULL, and always use group 0 with
it during boot.

The initialization code is also cleaner, as we don't pass ugly casts
of a thread ID to a pointer anymore.
2021-10-08 17:22:26 +02:00
Willy Tarreau
cc7a11ee3b MINOR: threads: set the tid, ltid and their bit in thread_cfg
This will be a convenient way to communicate the thread ID and its
local ID in the group, as well as their respective bits when creating
the threads or when only a pointer is given.
2021-10-08 17:22:26 +02:00
Willy Tarreau
6eee85f887 MINOR: threads: set the group ID and its bit in the thread group
This will ease the reporting of the current thread group ID when coming
from the thread itself, especially since it returns the visible ID,
starting at 1.
2021-10-08 17:22:26 +02:00
Willy Tarreau
e6806ebecc MEDIUM: threads: automatically assign threads to groups
This takes care of unassigned threads groups and places unassigned
threads there, in a more or less balanced way. Too sparse allocations
may still fail though. For now with a maximum group number fixed to 1
nothing can really fail.
2021-10-08 17:22:26 +02:00
Willy Tarreau
d04bc3ac21 MINOR: global: add a new "thread-group" directive
This registers a mapping of threads to groups by enumerating for each thread
what group it belongs to, and marking the group as assigned. It takes care of
checking for redefinitions, overlaps, and holes. It supports both individual
numbers and ranges. The thread group is referenced from the thread config.
2021-10-08 17:22:26 +02:00
Willy Tarreau
c33b969e35 MINOR: global: add a new "thread-groups" directive
This is used to configure the number of thread groups. For now it can
only be 1.
2021-10-08 17:22:26 +02:00
Willy Tarreau
f9662848f2 MINOR: threads: introduce a minimalistic notion of thread-group
This creates a struct tgroup_info which knows the thread ID of the first
thread in a group, and the number of threads in it. For now there's only
one thread group supported in the configuration, but it may be forced to
other values for development purposes by defining MAX_TGROUPS, and it's
enabled even when threads are disabled and will need to remain accessible
during boot to keep a simple enough internal API.

For the purpose of easing the configurations which do not specify a thread
group, we're starting group numbering at 1 so that thread group 0 can be
"undefined" (i.e. for "bind" lines or when binding tasks).

The goal will be to later move there some global items that must be
made per-group.
2021-10-08 17:22:26 +02:00
Willy Tarreau
6036342f58 MINOR: thread: make "ti" a const pointer and clean up thread_info a bit
We want to make sure that the current thread_info accessed via "ti" will
remain constant, so that we don't accidentally place new variable parts
there and so that the compiler knows that info retrieved from there is
not expected to have changed between two function calls.

Only a few init locations had to be adjusted to use the array and the
rest is unaffected.
2021-10-08 17:22:26 +02:00