Commit Graph

149 Commits

Author SHA1 Message Date
Aurelien DARRAGON
d3d35f0fc6 BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2)
Fix build warning on NetBSD by reapplying f278eec37a ("BUILD: tree-wide:
cast arguments to tolower/toupper to unsigned char").

This should fix issue #2551.
2024-07-18 13:29:52 +02:00
Christopher Faulet
91fe085943 BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
By default, there is always at least on resolver section, the default one,
based on "/etc/resolv.conf" content. However, it is possible to have no
resolver at all if the file is empty or if any error occurred. Errors are
silently ignored at this stage.

In that case, there was a bug in the Prometheus exporter leading to a crash
because the resolver section list is empty. An invalid resolver entity was
used. To fix the issue we must only take care to not dump resolvers metrics
when there is no resolver.

Thanks to Aurelien to have spotted the offending commit.

This patch should fix the issue #2604. It must be backported to 3.0.
2024-06-12 08:55:52 +02:00
Aurelien DARRAGON
c16eba8183 BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error
@boi4 reported in GH #2578 that since 3.0-dev1 for servers with address
learned from A/AAAA records after a DNS flap server would be put out of
maintenance with proper address but with invalid port (== 0), making it
unusable and causing tcp checks to fail:

[NOTICE]   (1) : Loading success.
[WARNING]  (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'mybackend' has no server available!
[WARNING]  (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING]  (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING]  (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING]  (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

@boi4 also mentioned that this used to work fine before.

Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")

Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.

Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.

No backport needed (unless 64c9c8e gets backported).
2024-05-24 15:29:48 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
65624876f2 MINOR: stats: introduce a more expressive stat definition method
Previously, statistics were simply defined as a list of name_desc, as
for example "stat_cols_px" for proxy stats. No notion of type was fixed
for each stat definition. This correspondance was done individually
inside stats_fill_*_line() functions. This renders the process to
define new statistics tedious.

Implement a more expressive stat definition method via a new API. A new
type "struct stat_col" for stat column to replace name_desc usage is
defined. It contains a field to store the stat nature and format. A
<cap> field is also defined to be able to define a proxy stat only for
certain type of objects.

This new type is also further extended to include counter offsets. This
allows to define a method to automatically generate a stat value field
from a "struct stat_col". This will be the subject of a future commit.

New type "struct stat_col" is fully compatible full name_desc. This
allows to gradually convert stats definition. The focus will be first
for proxies counters to implement statistics preservation on reload.
2024-04-26 10:20:57 +02:00
Tim Duesterhus
cd5d62249f CLEANUP: Reapply ist.cocci (3)
This reapplies ist.cocci across the whole src/ tree.
2024-04-02 07:27:33 +02:00
Willy Tarreau
ad31e53287 REORG: dns/ring: split the ring between the generic one and the DNS one
A ring is used for the DNS code but slightly differently from the generic
one, which prevents some important changes from being made to the generic
code without breaking DNS. As the use cases differ, it's better to just
split them apart for now and have the DNS code use its own ring that we
rename dns_ring and let the generic code continue to live on its own.

The unused parts such as CLI registration were dropped, resizing and
allocation from a mapped area were dropped. dns_ring_detach_appctx() was
kept despite not being used, so as to stay consistent with the comments
that say it must be called, despite the DNS code explicitly mentioning
that it skips it for now (i.e. this may change in the future).

Hopefully after the generic rings are converted the DNS code can migrate
back to them, though this is really not necessary.
2024-03-25 17:34:19 +00:00
Aurelien DARRAGON
59f08f65fd CLEANUP: tree-wide: use proper ERR_* return values for PRE_CHECK fcts
httpclient_precheck(), ssl_ocsp_update_precheck(), and
resolvers_create_default() functions are registered through
REGISTER_PRE_CHECK() macro to be called by haproxy during init from the
pre_check_list list. When calling functions registered in pre_check_list,
haproxy expects ERR_* return values. However those 3 functions currently
use raw return values, so we better use explicit ERR_* macros to prevent
breakage in the future if ERR_* values mapping were to change.
2024-03-07 11:48:08 +01:00
Ilya Shipitsin
96cd04f8db CLEANUP: fix typo in naming for variable "unused"
In resolvers.c:rslv_promex_next_ts() and in
stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly
called "unsued" instead of "unused". Let's fix this in a separate patch
so that it can be omitted from backports if this causes build problems.
2024-03-05 11:50:34 +01:00
Christopher Faulet
868205943c MAJOR: stats: Send stats dump over HTTP using zero-copy forwarding
Just like for the cache applet, it is now possible to send response to the
opposite side using the zero-copy forwarding. Internal functions were
slightly updated but there is nothing special to say. Except the requested
size during the nego stage is not exact.
2024-02-07 15:04:48 +01:00
Christopher Faulet
ca6f0ca82b MEDIUM: promex/resolvers: Dump resolvers metrics via a promex module
Just like for stick-tables, this patch adds a promex module to dump
resolvers metrics. It adds the "resolver" scope and for now, it dumps
folloowing metrics:

  * haproxy_resolver_sent
  * haproxy_resolver_send_error
  * haproxy_resolver_valid
  * haproxy_resolver_update
  * haproxy_resolver_cname
  * haproxy_resolver_cname_error
  * haproxy_resolver_any_err
  * haproxy_resolver_nx
  * haproxy_resolver_timeout
  * haproxy_resolver_refused
  * haproxy_resolver_other
  * haproxy_resolver_invalid
  * haproxy_resolver_too_big
  * haproxy_resolver_outdated
2024-02-02 09:11:34 +01:00
Christopher Faulet
3246f863d6 MEDIUM: stats: Be able to access a specific field into a stats module
It is now possible to selectively retrieve extra counters from stats
modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to
return a specific counter.
2024-02-01 12:00:53 +01:00
Aurelien DARRAGON
c5cace3100 BUG/MEDIUM: server/dns: perform svc_port updates atomically from SRV records
This was the last missing bit from cd994407a ("BUG/MAJOR: server/addr:
fix a race during server addr:svc_port updates")

Indeed, despite the fix, svc_port updates from resolvers were still
directly performed on the server's struct.

Now they make proper use of the server_set_inetaddr() function so the port
change (+ optional addr change with AR) will be propagated atomically.

This patch depends on:
 - "MINOR: server: ensure connection cleanup on server addr changes"
 - "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event"
 - "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic"
 - "MEDIUM: server: make server_set_inetaddr() updater serializable"
 - "MINOR: server/event_hdl: expose updater info through INETADDR event"
 - "MINOR: server: add dns hint in server_inetaddr_updater struct"
 - "MEDIUM: server/dns: clear RMAINT when addr resolves again"

While it could be backported in 2.9 with cd994407a ("BUG/MAJOR:
server/addr: fix a race during server addr:svc_port updates") to ensure
addr and svc_port updates performed by resolver's code comply with the
API taking care of pushing the update (and thus avoid any race), some
patch dependencies are quite sensitive so it's probably best to avoid
backporting for no good reason, or at least wait for it to be considered
stable to prevent any breakeages
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
64c9c8ef39 BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS
As seen before, server's addr and svc_port should not be updated directly
during runtime, because even if the update is performed under the lock,
some competing threads might be reading ->addr and ->svc_port without
the lock because they simply cannot afford it.

To prevent races with such competing threads, server's addr and port
should only be updated using server_set_inetaddr() function or similar.

This patch depends on:
 - "MINOR: server: ensure connection cleanup on server addr changes"
 - "CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event"
 - "MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic"
 - "MEDIUM: server: make server_set_inetaddr() updater serializable"
 - "MINOR: server/event_hdl: expose updater info through INETADDR event"
 - "MINOR: server: add dns hint in server_inetaddr_updater struct"
 - "MEDIUM: server/dns: clear RMAINT when addr resolves again"

While it could be backported in 2.9 with cd994407a ("BUG/MAJOR:
server/addr: fix a race during server addr:svc_port updates") to ensure
addr and svc_port reset performed by resolver's code comply with the
API taking care of pushing the update (and thus avoid any race), some
patch dependencies are quite sensitive so it's probably best to avoid
backporting for no good reason, or at least wait for it to be considered
stable to prevent any breakeages.
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
334ebfa1a2 MEDIUM: server/dns: clear RMAINT when addr resolves again
snr_update_srv_status() and srvrq_update_srv_status() will both set or
clear the server RMAINT state depending of the result of the current dns
resolution.

This used to work pretty well in the past, but now that addr:svc_port
changes are changed atomically through a dedicated task, the change is
performed asynchronously, so this can cause some flapping issues if the
server is put out of maintenance while the server's address is still
unassigned.

To prevent errors, the resolver's code is now only allowed to put the
server under maintenance but not to remove it from maintenance:

the decision to remove a server from maintenance is performed by the task
responsible for updating the server's addr: if the addr resolves again
thanks to a valid DNS resolution and the server was previously under
RMAINT, then it cleared from RMAINT state.

srvrq_update_srv_status() was renamed srvrq_set_srv_down(), since it is
only called to put the server in maintenance as a result of a failing
SRV entry.

snr_update_srv_status() was renamed srv_set_srv_down() and slightly
modified so that it only takes care of putting the server under
maintenance when needed.

The cli command "set server x/y addr" does not need to remove the RMAINT
flag anymore.
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
3ac79b504a MEDIUM: server: make server_set_inetaddr() updater serializable
server_set_inetaddr() updater argument is a simple char * string
containing infos about the caller responsible for the update.

In this patch, we try to make this argument serializable, that is, make
it so that we can easily export it without having to keep the original
pointer passed by the caller or having to work with strings of variable
lengths.

This was a prerequisite for exposing more updater information through
SERVER_INETADDR event (upcoming patch).

Static strings were simply mapped to a fixed ID that can be converted back
to a string when needed using server_inetaddr_updater_by_to_str(). One
special case one made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater
since in this case the updater hint has to be generated from the
corresponding resolver id / nameserver id combination. This was achieved
by saving the nameserver id within the updater struct. Knowing that the
resolver id can be guessed from the server struct directly, it was not
exposed through the updater struct.

This patch depends on:
 - "MINOR: resolvers: add unique numeric id to nameservers"

No functional change should be expected.
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
2f6120d6d4 MINOR: resolvers: add unique numeric id to nameservers
When we want to avoid keeping pointers on a nameserver struct, it's not
always convenient to refer as a nameserver using it's text-based unique
identifier since it's not limited in length thus it cannot be serialized
and deserialized safely.

To address this limitation, we add a new ->puid member in dns_nameserver
struct which is a parent-unique numeric value that can be used to refer
to the dns nameserver within its parent resolver context.

To achieve this, we reused the resolver->nb_nameserver member that wasn't
used. Each time we add a new nameserver to a resolver: we set ns->puid to
the current number of nameservers within the resolver and we increment
this number right away.

Public helper function find_nameserver_by_resolvers_and_id() was added to
help retrieve nameserver pointer from (resolver X nameserver puid)
combination.
2023-12-21 14:22:27 +01:00
William Lallemand
0d2ebb53f7 BUG/MINOR: resolvers: default resolvers fails when network not configured
Bug #1740 was opened again, this time a user is complaining about the
"can't create socket for nameserver". This can happen if the resolv.conf
file contains a class of address which was not configured on the
machine, for example IPv6.

The fix does the same as b10b1196b ("MINOR: resolvers: shut the warning
when "default" resolvers is implicit"), and uses the
"resolvers->conf.implicit" variable to emit the error.

Though it is not needed to convert the explicit behavior with a
ERR_WARN, because this is supposed to be an unrecoverable error, unlike
the connect().

Should fix issue #1740.

Must be backported were b10b1196b was backported. (as far as 2.6)
2023-12-18 15:50:07 +01:00
Aurelien DARRAGON
12582eb8e5 MINOR: tools: make str2sa_range() directly return type hints
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")

However, as stated into the commit message, there is a trick:
   "we can fail to return a protocol in case the caller
    accepts an fqdn for use later. This is what servers do and in this
    case it is valid to return no protocol"

In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.

While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.

Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
2023-11-10 17:49:57 +01:00
Christopher Faulet
06e9c81bd0 MEDIUM: resolvers: Stop scheduling resolution during stopping stage
When the process is stopping, the server resolutions are suspended. However
the task is still periodically woken up for nothing. If there is a huge
number of resolution, it may lead to a noticeable CPU consumption for no
reason.

To avoid this extra CPU cost, we stop to schedule the the resolution tasks
during the stopping stage. Of course, it is only true for server
resolutinos. Dynamic ones, via do-resolve actions, are not concerned. These
ones must still be triggered during stopping stage.

Concretly, during the stopping stage, the resolvers task is no longer
scheduled if there is no running resolutions. In this case, if a do-resolve
action is evaluated, the task is woken up.

This patch should partially solve the issue #2145.
2023-05-17 16:48:33 +02:00
Christopher Faulet
7b3d38a633 MEDIUM: tree-wide: Change sc API to specify required free space to progress
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
2023-05-05 15:44:23 +02:00
Christopher Faulet
f4258bdf3b MINOR: stats: Use the applet API to write data
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care to request more
room it it fails to put data into the channel's buffer.
2023-05-05 15:41:29 +02:00
Willy Tarreau
69530f59ae MEDIUM: clock: replace timeval "now" with integer "now_ns"
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".

All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really  is.

The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.

The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.

The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.

One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
2023-04-28 16:08:08 +02:00
Willy Tarreau
eed5da1037 MINOR: clock: do not use now.tv_sec anymore
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
2023-04-28 16:08:08 +02:00
Christopher Faulet
89aeabff5b BUG/MINOR: resolvers: Use sc_need_room() to wait more room when dumping stats
It was a cut/paste typo during stream-interface to conn-stream
refactoring. sc_have_room() was used instead of sc_need_room().

This patch must be backported as far as 2.6.
2023-04-28 08:51:34 +02:00
Tim Duesterhus
1307cd42d2 CLEANUP: Stop checking the pointer before calling ring_free()
Changes performed with this Coccinelle patch:

    @@
    expression e;
    @@

    - if (e != NULL) {
    	ring_free(e);
    - }

    @@
    expression e;
    @@

    - if (e) {
    	ring_free(e);
    - }

    @@
    expression e;
    @@

    - if (e)
    	ring_free(e);

    @@
    expression e;
    @@

    - if (e != NULL)
    	ring_free(e);
2023-04-23 00:28:25 +02:00
Tim Duesterhus
fe83f58906 CLEANUP: Stop checking the pointer before calling task_free()
Changes performed with this Coccinelle patch:

    @@
    expression e;
    @@

    - if (e != NULL) {
    	task_destroy(e);
    - }

    @@
    expression e;
    @@

    - if (e) {
    	task_destroy(e);
    - }

    @@
    expression e;
    @@

    - if (e)
    	task_destroy(e);

    @@
    expression e;
    @@

    - if (e != NULL)
    	task_destroy(e);
2023-04-23 00:28:25 +02:00
Tim Duesterhus
c18e244515 CLEANUP: Stop checking the pointer before calling pool_free()
Changes performed with this Coccinelle patch:

    @@
    expression e;
    expression p;
    @@

    - if (e != NULL) {
    	pool_free(p, e);
    - }

    @@
    expression e;
    expression p;
    @@

    - if (e) {
    	pool_free(p, e);
    - }

    @@
    expression e;
    expression p;
    @@

    - if (e)
    	pool_free(p, e);

    @@
    expression e;
    expression p;
    @@

    - if (e != NULL)
    	pool_free(p, e);
2023-04-23 00:28:25 +02:00
Christopher Faulet
5220a8c5c4 BUG/MEDIUM: resolvers: Force the connect timeout for DNS resolutions
Timeouts for dynamic resolutions are not handled at the stream level but by
the resolvers themself. It means there is no connect, client and server
timeouts defined on the internal proxy used by a resolver.

While it is not an issue for DNS resolution over UDP, it can be a problem
for resolution over TCP. New sessions are automatically created when
required, and killed on excess. But only established connections are
considered. Connecting ones are never killed. Because there is no conncet
timeout, we rely on the kernel to report a connection error. And this may be
quite long.

Because resolutions are periodically triggered, this may lead to an excess
of unusable sessions in connecting state. This also prevents HAProxy to
quickly exit on soft-stop. It is annoying, especially because there is no
reason to not set a connect timeout.

So to mitigate the issue, we now use the "resolve" timeout as connect
timeout for the internal proxy attached to a resolver.

This patch should be backported as far as 2.4.
2023-04-11 08:19:06 +02:00
Christopher Faulet
142cc1b52a BUG/MINOR: resolvers: Wakeup DNS idle task on stopping
Thanks to previous commit ("BUG/MEDIUM: dns: Kill idle DNS sessions during
stopping stage"), DNS idle sessions are killed on stopping staged. But the
task responsible to kill these sessions is running every 5 seconds. It
means, when HAProxy is stopped, we can observe a delay before the process
exits.

To reduce this delay, when the resolvers task is executed, all DNS idle
tasks are woken up.

This patch must be backported as far as 2.6.
2023-04-11 08:19:06 +02:00
Christopher Faulet
52ec6f14c4 BUG/MEDIUM: resolvers: Properly stop server resolutions on soft-stop
When HAproxy is stopping, the DNS resolutions must be stopped, except those
triggered from a "do-resolve" action. To do so, the resolutions themselves
cannot be destroyed, the current design is too complex. However, it is
possible to mute the resolvers tasks. The same is already performed with the
health-checks. On soft-stop, the tasks are still running periodically but
nothing if performed.

For the resolvers, when the process is stopping, before running a
resolution, we check all the requesters attached to this resolution. If t
least a request is a stream or if there is a requester attached to a running
proxy, a new resolution is triggered. Otherwise, we ignored the
resolution. It will be evaluated again on the next wakeup. This way,
"do-resolv" action are still working during soft-stop but other resoluation
are stopped.

Of course, it may be see as a feature and not a bug because it was never
performed. But it is in fact not expected at all to still performing
resolutions when HAProxy is stopping. In addution, a proxy option will be
added to change this behavior.

This patch partially fixes the issue #1874. It could be backported to 2.7
and maybe to 2.6. But no further.
2023-03-14 15:23:55 +01:00
Aurelien DARRAGON
e5958d0292 BUG/MEDIUM: stats: fix resolvers dump
In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"),
we forgot to apply the patch in resolvers.c which provides the
stats_dump_resolvers() function that is involved when dumping with "resolvers"
domain.

As a consequence, resolvers dump was broken because stats_dump_one_line(),
which is used in stats_dump_resolv_to_buffer(), implicitely uses trash_chunk
from stats.c to prepare the dump, and stats_putchk() is then called with
global trash (currently empty) as output data.

Given that trash_dump variable is static and thus only available within stats.c
we change stats_putchk() function prototype so that the function does not take
the output buffer as an argument. Instead, stats_putchk() will implicitly use
the local trash_dump variable declared in stats.c.

It will also prevent further mixups between stats_dump_* functions and
stats_putchk().

This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats")
2023-02-06 07:53:03 +01:00
Christopher Faulet
51dbb4cb79 BUG/MINOR: resolvers: Wait the resolution execution for a do_resolv action
The do_resolv action triggers a resolution and must wait for the
result. Concretely, if no cache entry is available, it creates a resolution
and wakes up the resolvers task. Then it yields. When the action is
recalled, if the resolution is still running, it yields again.

However, if the resolution is not running, it does not check it was
running. Thus, it is possible to ignore the resolution because the action
was recalled before the resolvers task had a chance to be executed. If there
is result, the action must yield.

This patch should fix the issue #1993. It must be backported as far as 2.0.
2023-01-11 10:31:42 +01:00
Christopher Faulet
819d48b14e BUG/MEDIUM: resolvers: Use tick_first() to update the resolvers task timeout
In resolv_update_resolvers_timeout(), the resolvers task timeout is updated
by checking running and waiting resolutions. However, to find the next
wakeup date, MIN() operator is used to compare ticks. Ticks must never be
compared with such operators, tick helper functions must be used, to
properly handled TICK_ETERNITY value. In this case, tick_first() must be
used instead of MIN().

It is an old bug but it is pretty visible since the commit fdecaf6ae4
("BUG/MINOR: resolvers: do not run the timeout task when there's no
resolution"). Because of this bug, the resolvers task timeout may be set to
TICK_ETERNITY, stopping periodic resolutions.

This patch should solve the issue #1962. It must be backported to all stable
versions.
2022-12-14 10:44:17 +01:00
Ilya Shipitsin
6f86eaae4f CLEANUP: assorted typo fixes in the code and comments
This is 33rd iteration of typo fixes
2022-11-30 14:02:36 +01:00
Willy Tarreau
fdecaf6ae4 BUG/MINOR: resolvers: do not run the timeout task when there's no resolution
The function resolv_update_resolvers_timeout() always schedules a wakeup
of the process_resolvers() task based on the "timeout resolve" setting,
regardless of the presence of an ongoing resolution or not. This is causing
one wakeup every second by default even when there's no resolvers section
(due to the default one), and can even be worse: creating a section with
"timeout resolve 1" bombs the process with 1000 wakeups per second.

Let's condition the setting to the presence of a resolution to address
this.

This issue has been there forever, but it doesn't cause that much trouble,
and given how fragile and tricky this code is, it's probably wise to
refrain from backporting it until it's reported to really cause trouble.
2022-11-21 19:21:07 +01:00
Christopher Faulet
2364b39984 BUG/MINOR: resolvers: Set port before IP address when processing SRV records
For a server subject to SRV resolution, when the server's address is set,
its dynamic cookie, if any, and its server key are computed. Both are based
on the ip/port pair. However, this happens before the server's port is
set. Thus the port is equal to 0 at this stage. It is a problem if several
servers share the same IP but with different ports because they will share
the same dynamic cookie and the same server key, disturbing this way the
connection persistency and the session stickiness.

This patch must be backported as far as 2.2.
2022-11-16 09:27:09 +01:00
Christopher Faulet
68a61b6321 BUG/MINOR: resolvers: Don't wait periodic resolution on healthcheck failure
DNS resoltions may be triggered via a "do-resolve" action or when a connection
failure is experienced during a healthcheck. Cached valid responses are used, if
possible. But if the entry is expired or if there is no valid response, a new
reolution should be performed. However, an resolution is only performed if the
"resolve" timeout is expired. Thus, when this comes from a healthcheck, it means
no extra resolution is performed at all.

Now, when the resolution is performed for a server (SRV or SRVEQ) and no valid
response is found, the resolution timer is reset (last_resolution is set to
TICK_ETERNITY). Of course, it is only performed if no resolution is already
running.

Note that this feature was broken 5 years ago when the resolvers code was
refactored (67957bd59e).

This patch should fix the issue #1906. It affects all stable versions. However,
it is probably a good idea to not backport it too far (2.6, maybe 2.4) and with
some delay.
2022-11-16 09:27:09 +01:00
Christopher Faulet
eaabf06031 BUG/MEDIUM: resolvers: Remove aborted resolutions from query_ids tree
To avoid any UAF when a resolution is released, a mechanism was added to
abort a resolution and delayed the released at the end of the current
execution path. This mechanism depends on an hard assumption: Any reference
on an aborted resolution must be removed. So, when a resolution is aborted,
it is removed from the resolver lists and inserted into a death row list.

However, a resolution may still be referenced in the query_ids tree. It is
the tree containing all resolutions with a pending request. Because aborted
resolutions are released outside the resolvers lock, it is possible to
release a resolution on a side while a query ansswer is received and
processed on another one. Thus, it is still possible to have a UAF because
of this bug.

To fix the issue, when a resolution is aborted, it is removed from any list,
but it is also removed from the query_ids tree.

This patch should solve the issue #1862 and may be related to #1875. It must
be backported as far as 2.2.
2022-09-27 11:18:17 +02:00
Willy Tarreau
0fbc16cfb9 DEBUG: resolvers: unstatify process_resolvers() to make it appear in profiling
The function appears like this in "show profiling tasks", so let's export
it:

  function       calls  cpu_tot  cpu_avg  lat_tot  lat_avg
  main+0x1463f0     92  77.28us  839.0ns  2.018ms  21.93us <- wake_expired_tasks@src/task.c:429 task_drop_running
2022-09-08 16:13:38 +02:00
William Lallemand
b10b1196b8 MINOR: resolvers: shut the warning when "default" resolvers is implicit
Shut the connect() warning of resolvers_finalize_config() when the
configuration was not emitted manually.

This shuts the warning for the "default" resolvers which is created
automatically for the httpclient.

Must be backported in 2.6.
2022-08-24 14:56:42 +02:00
William Lallemand
6020c4e44e BUG/MINOR: mworker: does not create the "default" resolvers in wait mode
When doing a re-exec, the master was creating a "default" resolvers,
which could result in a warning emitted because the "default" resolvers
of the configuration file is not available anymore.

Skip the creating of the "default" resolvers in wait mode, this is not
useful in the master.

Must be backported as far as 2.6.
2022-08-24 11:28:29 +02:00
William Lallemand
866b88bc95 BUG/MINOR: resolvers: return the correct value in resolvers_finalize_config()
Patch c31577f ("MEDIUM: resolvers: continue startup if network is
unavailable") was not working correctly. Indeed
resolvers_finalize_config() was returning a ERR type, but a postparser
is supposed to return 0 or 1.

The return value was never right, however it was only a problem since c31577f.

Must be backported in every stable branch.
2022-08-24 10:11:17 +02:00
William Lallemand
c31577f32e MEDIUM: resolvers: continue startup if network is unavailable
When haproxy starts with a resolver section, and there is a default one
since 2.6 which use /etc/resolv.conf, it tries to do a connect() with the UDP
socket in order to check if the routes of the system allows to reach the
server.

This check is too much restrictive as it won't prevent any runtime
failure.

Relax the check by making it a warning instead of a fatal alert.

This must be backported in 2.6.
2022-07-26 10:59:14 +02:00
William Lallemand
3bda80789c BUG/MINOR: resolvers: shut off the warning for the default resolvers
When the resolv.conf file is empty or there is no resolv.conf file, an
empty resolvers will be created, which emits a warning during the
postparsing step.

This patch fixes the problem by freeing the resolvers section if the
parsing failed or if the nameserver list is empty.

Must be backported in 2.6, the previous patch which introduces
resolvers_destroy() is also required.
2022-07-18 14:39:36 +02:00
William Lallemand
e606c84fee MINOR: resolvers: resolvers_destroy() deinit and free a resolver
Split the resolvers_deinit() function into resolvers_destroy() and
resolvers_deinit() in order to be able to free a unique resolvers
section.
2022-07-18 14:39:36 +02:00
Willy Tarreau
9b46fb4cca BUG/MINOR: server: do not enable DNS resolution on disabled proxies
Leonhard Wimmer reported an interesting bug in github issue #1742.
Servers in disabled proxies that are configured for resolution are still
subscribed to DNS resolutions, but the LB algos are not initialized at
all since the proxy is disabled, so when the server state changes,
attempts to update its status cause a crash when the server's weight
is recalculated via a divide by the proxy's total weight which is zero.

This should be backported to all versions. Beware that before 2.5 or
so, there's no PR_FL_DISABLED flag, instead px->disabled should be
used (2.3-2.4) or PR_STSTOPPED for older versions.

Thanks to Leonhard for his report and quick test!
2022-06-10 11:17:27 +02:00
Willy Tarreau
caff631bc0 CLEANUP: stats: rename all occurrences of stconn "cs" to "sc"
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. Both the core functions and the ones in the
resolvers files were updated.
2022-05-27 19:33:35 +02:00
Willy Tarreau
cb086c6de1 REORG: stconn: rename conn_stream.{c,h} to stconn.{c,h}
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
2022-05-27 19:33:35 +02:00
Willy Tarreau
5edca2f0e1 REORG: rename cs_utils.h to sc_strm.h
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.

The alphabetical ordering of include files was preserved.
2022-05-27 19:33:35 +02:00