Commit Graph

16085 Commits

Author SHA1 Message Date
Tim Duesterhus
b1ec21d259 CLEANUP: Stop checking the pointer before calling tasklet_free()
Changes performed with this Coccinelle patch:

    @@
    expression e;
    @@

    - if (e != NULL) {
    	tasklet_free(e);
    - }

    @@
    expression e;
    @@

    - if (e) {
    	tasklet_free(e);
    - }

    @@
    expression e;
    @@

    - if (e)
    	tasklet_free(e);

    @@
    expression e;
    @@

    - if (e != NULL)
    	tasklet_free(e);

See GitHub Issue #2126
2023-04-23 00:28:25 +02:00
Willy Tarreau
8adffaa899 MINOR: listener: always compare the local thread as well
By comparing the local thread's load with the least loaded thread's
load, we can further improve the fairness and at the same time also
improve locality since it allows a small ratio of connections not to
be migrated. This is visible on CPU usage with long connections on
very large thread counts (224) and high bandwidth (200G). The cost
of checking the local thread's load remains fairly low so there's no
reason not to do this. We continue to update the index if we select
the local thread, because it means that the two other threads were
both more loaded so we'd rather find better ones.
2023-04-21 17:41:26 +02:00
Willy Tarreau
ff18504d73 MINOR: listener: make sure to avoid ABA updates in per-thread index
One limitation of the current thread index mechanism is that if the
values are assigned multiple times to the same thread and the index
loops, it can match again the old value, which will not prevent a
competing thread from finishing its CAS and assigning traffic to a
thread that's not the optimal one. The probability is low but the
solution is simple enough and consists in implementing an update
counter in the high bits of the index to force a mismatch in this
case (assuming we don't try to cover for extremely unlikely cases
where the update counter loops while the index remains equal). So
let's do that. In order to improve the situation a little bit, we
now set the index to a ulong so that in 32 bits we have 8 bits of
counter and in 64 bits we have 40 bits.
2023-04-21 17:41:26 +02:00
Willy Tarreau
77e33509c8 MINOR: listener: resync with the thread index before heavy calculations
During heavy accept competition, the CAS will occasionally fail and
we'll have to go through all the calculation again. While the first
two loops look heavy, they're almost never taken so they're quite
cheap. However the rest of the operation is heavy because we have to
consult connection counts and queue indexes for other threads, so
better double-check if the index is still valid before continuing.
Tests show that it's more efficient do retry half-way like this.
2023-04-21 17:41:26 +02:00
Willy Tarreau
b657492680 MINOR: listener: use a common thr_idx from the reference listener
Instead of seeing each listener use its own thr_idx, let's use the same
for all those from a shard. It should provide more accurate and smoother
thread allocation.
2023-04-21 17:41:26 +02:00
Willy Tarreau
9d360604bd MEDIUM: listener: rework thread assignment to consider all groups
Till now threads were assigned in listener_accept() to other threads of
the same group only, using a single group mask. Now that we have all the
relevant info (array of listeners of the same shard), we can spread the
thr_idx to cover all assigned groups. The thread indexes now contain the
group number in their upper bits, and the indexes run over te whole list
of threads, all groups included.

One particular subtlety here is that switching to a thread from another
group also means switching the group, hence the listener. As such, when
changing the group we need to update the connection's owner to point to
the listener of the same shard that is bound to the target group.
2023-04-21 17:41:26 +02:00
Willy Tarreau
e6f5ab5afa MINOR: listener: make accept_queue index atomic
There has always been a race when checking the length of an accept queue
to determine which one is more loaded that another, because the head and
tail are read at two different moments. This is not required, we can merge
them as two 16 bit numbers inside a single 32-bit index that is always
accessed atomically. This way we read both values at once and always have
a consistent measurement.
2023-04-21 17:41:26 +02:00
Willy Tarreau
09b52d1c3d MEDIUM: config: permit to start a bind on multiple groups at once
Now it's possible for a bind line to span multiple thread groups. When
this happens, the first one will become the reference and will be entirely
set up, and the subsequent ones will be duplicated from this reference,
so that they can be registered in distinct groups. The reference is
always setup and started first so it is always available when the other
ones are started.

The doc was updated to reflect this new possibility with its limitations
and impacts, and the differences with the "shards" option.
2023-04-21 17:41:26 +02:00
Willy Tarreau
09e266e6f5 MINOR: proto: skip socket setup for duped FDs
It's not strictly necessary, but it's still better to avoid setting
up the same socket multiple times when it's being duplicated to a few
FDs. We don't change that for inherited ones however since they may
really need to be set up, so we only skip duplicated ones.
2023-04-21 17:41:26 +02:00
Willy Tarreau
0e1aaf4e78 MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP
The different protocol's ->bind() function will now check the receiver's
RX_F_MUST_DUP flag to decide whether to bind a fresh new listener from
scratch or reuse an existing one and just duplicate it. It turns out
that the existing code already supports reusing FDs since that was done
as part of the FD passing and inheriting mechanism. Here it's not much
different, we pass the FD of the reference receiver, it gets duplicated
and becomes the new receiver's FD.

These FDs are also marked RX_F_INHERITED so that they are not exported
and avoid being touched directly (only the reference should be touched).
2023-04-21 17:41:26 +02:00
Willy Tarreau
aae1810b4d MINOR: receiver: add a struct shard_info to store info about each shard
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
  - the number of groups (= number of receivers)
  - the number of threads (will be used for accept LB)
  - pointer to the reference rx (to get the FD and to find all threads)
  - pointers to the other members (to iterate over all threads)

For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid o(n^2) updates during
ordered deletes).

Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver shard_info==NULL is
alone and doesn't share its socket with another group.

Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.

For now the attach code is not called, but detach is already called
on delete.
2023-04-21 17:41:26 +02:00
Willy Tarreau
84fe1f479b MINOR: listener: support another thread dispatch mode: "fair"
This new algorithm for rebalancing incoming connections to multiple
threads is simpler and instead of considering the threads load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts where it's not always certain
that the least loaded thread will always be found.
2023-04-21 17:41:26 +02:00
Willy Tarreau
6a4d48b736 MINOR: quic_sock: index li->per_thr[] on local thread id, not global one
There's a li_per_thread array in each listener for use with QUIC
listeners. Since thread groups were introduced, this array can be
allocated too large because global.nbthread is allocated for each
listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP)
may be used by a single listener. This was because the global thread
ID is used as the index instead of the local ID (since a listener may
only be used by a single group). Let's just switch to local ID and
reduce the allocated size.
2023-04-21 17:41:26 +02:00
Willy Tarreau
77d37b07b1 MINOR: quic: support migrating the listener as well
When migrating a quic_conn to another thread, we may need to also
switch the listener if the thread belongs to another group. When
this happens, the freshly created connection will already have the
target listener, so let's just pick it from the connection and use
it in qc_set_tid_affinity(). Note that it will be the caller's
responsibility to guarantee this.
2023-04-21 17:41:26 +02:00
Aurelien DARRAGON
23f352f7d0 MINOR: server/event_hdl: prepare for server event data wrapper
Adding the possibility to publish an event using a struct wrapper
around existing SERVER events to provide additional contextual info.

Using the specific struct wrapper is not required: it is supported
to cast event data as a regular server event data struct so
that we don't break the existing API.

However, casting event data with a more explicit data type allows
to fetch event-only relevant hints.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
f71e0645c1 MEDIUM: server: split srv_update_status() in two functions
Considering that srv_update_status() is now synchronous again since
3ff577e1 ("MAJOR: server: make server state changes synchronous again"),
and that we can easily identify if the update is from an operational
or administrative context thanks to "MINOR: server: pass adm and op cause
to srv_update_status()".

And given that administrative and operational updates cannot be cumulated
(since srv_update_status() is called synchronously and independently for
admin updates and state/operational updates, and the function directly
consumes the changes).

We split srv_update_status() in 2 distinct parts:

Either <type> is 0, meaning the update is an operational update which
is handled by directly looking at cur_state and next_state to apply the
proper transition.
Also, the check to prevent operational state from being applied
if MAINT admin flag is set is no longer needed given that the calling
functions already ensure this (ie: srv_set_{running,stopping,stopped)

Or <type> is 1, meaning the update is an administrative update, where
cur_admin and next_admin are evaluated to apply the proper transition and
deduct the resulting server state (next_state is updated implicitly).

Once this is done, both operations share a common code path in
srv_update_status() to update proxy and servers stats if required.

Thanks to this change, the function's behavior is much more predictable,
it is not an all-in-one function anymore. Either we apply an operational
change, else it is an administrative change. That's it, we cannot mix
the 2 since both code paths are now properly separated.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
76e255520f MINOR: server: pass adm and op cause to srv_update_status()
Operational and administrative state change causes are not propagated
through srv_update_status(), instead they are directly consumed within
the function to provide additional info during the call when required.

Thus, there is no valid reason for keeping adm and op causes within
server struct. We are wasting space and keeping uneeded complexity.

We now exlicitly pass change type (operational or administrative) and
associated cause to srv_update_status() so that no extra storage is
needed since those values are only relevant from srv_update_status().
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
10518c0d59 CLEANUP: server: fix srv_set_{running, stopping, stopped} function comment
Fixing function comments for the server state changing function since they
still refer to asynchonous propagation of server state which is no longer
in play.
Moreover, there were some mixups between running/stopping.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
c54b98ac9a CLEANUP: server: remove unused variables in srv_update_status()
check and px local variable aliases are not very useful.
Let's remove them and use s->check and s->proxy instead.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
1746b56e68 MINOR: server: change srv_op_st_chg_cause storage type
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".

While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup since some leftovers from asynchronous server
state change updates were left behind and resulted in some useless code
duplication, and making the whole thing harder to maintain.

Two observations were made:

- by tracking down srv_set_{running, stopped, stopping} usage,
  we can see that the <reason> argument is always a fixed statically
  allocated string.
- check-related state change context (duration, status, code...) is
  not used anymore since srv_append_status() directly extracts the
  values from the server->check. This is pure legacy from when
  the state changes were applied asynchronously.

To prevent code duplication, useless string copies and make the reason/cause
more exportable, we store it as an enum now, and we provide
srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check related) are now explicitly identified to
make consumers like srv_append_op_chg_cause() able to fetch checks info
from the server itself if they need to.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
f3b48a808e MINOR: server: srv_append_status refacto
srv_append_status() has become a swiss-knife function over time.
It is used from server code and also from checks code, with various
inputs and distincts code paths, making it very hard to guess the
actual behavior of the function (resulting string output).

To simplify the logic behind it, we're dividing it in multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
9b1ccd7325 MINOR: server: change adm_st_chg_cause storage type
Even though it doesn't look like it at first glance, this is more like
a cleanup than an actual code improvement:

Given that srv->adm_st_chg_cause has been used to exclusively store
static strings ever since it was implemented, we make the choice to
store it as an enum instead of a fixed-size string within server
struct.

This will allow to save some space in server struct, and will make
it more easily exportable (ie: event handlers) because of the
reduced memory footprint during handling and the ability to later get
the corresponding human-readable message when it's explicitly needed.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
85b91375bf MINOR: server: propagate lb changes through srv_lb_propagate()
Now that we have a generic srv_lb_propagate(s) function, let's
use it each time we explicitly wan't to set the status down as
well.

Indeed, it is tricky to try to handle "down" case explicitly,
instead we use srv_lb_propagate() which will call the proper
function that will handle the new server state.

This will allow some code cleanup and will prevent any logic
error.

This commit depends on:
- "MINOR: server: propagate server state change to lb through single function"
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
8bbe643acc MINOR: server: propagate server state change to lb through single function
Use a dedicated helper function to propagate server state change to
lb algorithms, since it is performed at multiple places within
srv_update_status() function.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
5f80f8bbc5 MINOR: server: central update for server counters on state change
Based on "BUG/MINOR: server: don't miss server stats update on server
state transitions", we're also taking advantage of the new centralized
logic to update down_trans server counter directly from there instead
of multiple places.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
9c21ff0208 BUG/MINOR: server: don't use date when restoring last_change from state file
When restoring from a state file: the server "Status" reports weird values on
the html stats page:

	"5s UP" becomes -> "? UP" after the restore

This is due to a bug in srv_state_srv_update(): when restoring the states
from a state file, we rely on date.tv_sec to compute the process-relative
server last_change timestamp.

This is wrong because everywhere else we use now.tv_sec when dealing
with last_change, for instance in srv_update_status().

date (which is Wall clock time) deviates from now (monotonic time) in the
long run.

They should not be mixed, and given that last_change is an internal time value,
we should rely on now.tv_sec instead.

last_change export through "show servers state" cli is safe since we export
a delta and not the raw time value in dump_servers_state():

	srv_time_since_last_change = now.tv_sec - srv->last_change

--

While this bug affects all stable versions, it was revealed in 2.8 thanks
to 28360dc ("MEDIUM: clock: force internal time to wrap early after boot")
This is due to the fact that "now" immediately deviates from "date",
whereas in the past they had the same value when starting.

Thus prior to 2.8 the bug is trickier since it could take some time for
date and now to deviate sufficiently for the issue to arise, and instead
of reporting absurd values that are easy to spot it could just result in
last_change becoming inconsistent over time.

As such, the fix should be backported to all stable versions.
[for 2.2 the patch needs to be applied manually since
srv_state_srv_update() was named srv_update_state() and can be found in
server.c instead of server_state.c]
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
9f5853fa38 BUG/MINOR: server: don't miss server stats update on server state transitions
s->last_change and s->down_time updates were manually updated for each
effective server state change within srv_update_status().

This is rather error-prone, and as a result there were still some state
transitions that were not handled properly since at least 1.8.

ie:
- when transitionning from DRAIN to READY: downtime was updated
  (which is wrong since a server in DRAIN state should not be
   considered as DOWN)
- when transitionning from MAINT to READY: downtime was not updated
  (this can be easily seen in the html stats page)

To fix these all at once, and prevent similar bugs from being introduced,
we centralize the server last_change and down_time stats logic at the end
of srv_update_status():

If the server state changed during the call, then it means that
last_change must be updated, with a special case when changing from
STOPPED state which means the server was previously DOWN and thus
downtime should be updated.

This patch depends on:

- "MINOR: server: explicitly commit state change in srv_update_status()"

This could be backported to every stable versions.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
e80ddb18a8 BUG/MINOR: server: don't miss proxy stats update on server state transitions
backend "down" stats logic has been duplicated multiple times in
srv_update_status(), resulting in the logic now being error-prone.

For example, the following bugfix was needed to compensate for a
copy-paste introduced bug: d332f139
("BUG/MINOR: server: update last_change on maint->ready transitions too")

While the above patch works great, we actually forgot to update the
proxy downtime like it is done for other down->up transitions...
This is simply illustrating that the current design is error-prone,
it is very easy to miss something in this area.

To properly update the proxy downtime stats on the maint->ready transition,
to cleanup srv_update_status() and to prevent similar bugs from being
introduced in the future, proxy/backend stats update are now automatically
performed at the end of the server state change if needed.

Thus we can remove existing updates that were performed at various places
within the function, this simplifies things a bit.

This patch depends on:
- "MINOR: server: explicitly commit state change in srv_update_status()"

This could be backported to all stable versions.

Backport notes:

2.2:

Replace
        struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state)

by
        struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
22151c70bb MINOR: server: explicitly commit state change in srv_update_status()
As shown in 8f29829 ("BUG/MEDIUM: checks: a down server going to
maint remains definitely stucked on down state."), state changes
that don't result in explicit lb state change, require us to perform
an explicit server state commit to make sure the next state is
applied before returning from the function.

This is the case for server state changes that don't trigger lb logic
and only perform some logging.

This is quite error prone, we could easily forget a state change
combination that could result in next_state, next_admin or next_eweight
not being applied. (cur_state, cur_admin and cur_eweight would be left
with unexpected values)

To fix this, we explicitly call srv_lb_commit_status() at the end
of srv_update_status() to enforce the new values, even if they were
already applied. (when a state changes requires lb state update
an implicit commit is already performed)

Applying the state change multiple times is safe (since the next value
always points to the current value).

Backport notes:

2.2:

Replace
	struct task *srv_cleanup_toremove_conns(struct task *task, void *context, unsigned int state)

by
	struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state)
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
9a1df02ccb BUG/MINOR: server: incorrect report for tracking servers leaving drain
Report message for tracking servers completely leaving drain is wrong:

The check for "leaving drain .. via" never evaluates because the
condition !(s->next_admin & SRV_ADMF_FDRAIN) is always true in the
current block which is guarded by !(s->next_admin & SRV_ADMF_DRAIN).

For tracking servers that leave inherited drain mode, this results in the
following message being emitted:
  "Server x/b is UP (leaving forced drain)"

Instead of:
  "Server x/b is UP (leaving drain) via x/a"

To this fix: we check if FDRAIN is currently set, else it means that the
drain status is inherited from the tracked server (IDRAIN)

This regression was introduced with 64cc49cf ("MAJOR: servers: propagate server
status changes asynchronously."), thus it may be backported to every stable
versions.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
096b383e16 MINOR: hlua/event_hdl: timestamp for events
'when' optional argument is provided to lua event handlers.

It is an integer representing the number of seconds elapsed since Epoch
and may be used in conjunction with lua `os.date()` function to provide
a custom format string.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
e9314fb7a7 MINOR: event_hdl: provide event->when for advanced handlers
For advanced async handlers only
(Registered using EVENT_HDL_ASYNC_TASK() macro):

event->when is provided as a struct timeval and fetched from 'date'
haproxy global variable.

Thanks to 'when', related event consumers will be able to timestamp
events, even if they don't work in real-time or near real-time.
Indeed, unlike sync or normal async handlers, advanced async handlers
could purposely delay the consumption of pending events, which means
that the date wouldn't be accurate if computed directly from within
the handler.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
ebf58e991a MINOR: event_hdl: dynamically allocated event data members
Add the ability to provide a cleanup function for event data passed
via the publishing function.

One use case could be the need to provide valid pointers in the safe
section of the data struct.
Cleanup function will be automatically called with data (or copy of data)
as argument when all handlers consumed the event, which provides an easy
way to release some memory or decrement refcounts to ressources that were
provided through the data struct.
data in itself may not be freed by the cleanup function, it is handled
by the API.

This would allow passing large (allocated) data blocks through the data
struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA
size limit.

To do so, when publishing an event, where we would currently do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources
	 */
        event_data.safe.my_custom_data = x;

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA(&event_data));

We could do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources,
	 * unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
	 * consistency (ie: refcount)
	 */
        event_data.safe.my_custom_static_data = x;
	event_data.safe.my_custom_dynamic_data = malloc(1);

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));

With data_new_family_cleanup func which would look like this:

      void data_new_family_cleanup(const void *data)
      {
      	const struct event_hdl_cb_data_new_family *event_data = ptr;

	/* some data members require specific cleanup once the event
	 * is consumed
	 */
      	free(event_data.safe.my_custom_dynamic_data);
	/* don't ever free data! it is not ours */
      }

Not sure if this feature will become relevant in the future, so I prefer not
to mention it in the doc for now.

But given that the implementation is trivial and does not put a burden
on the existing API, it's a good thing to have it there, just in case.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
a63f4903c9 MINOR: server/event_hdl: prepare for upcoming refactors
This commit does nothing that ought to be mentioned, except that
it adds missing comments and slighty moves some function calls
out of "sensitive" code in preparation of some server code refactors.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
2f6a07dce8 MINOR: hlua/event_hdl: fix return type for hlua_event_hdl_cb_data_push_args
Changing hlua_event_hdl_cb_data_push_args() return type to void since it
does not return anything useful.
Also changing its name to hlua_event_hdl_cb_push_args() since it does more
than just pushing cb data argument (it also handles event type and mgmt).

Errors catched by the function are reported as lua errors.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
55f84c7cab MINOR: hlua/event_hdl: expose proxy_uuid variable in server events
Adding proxy_uuid to ServerEvent class.
proxy_uuid contains the uuid of the proxy to which the server belongs
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
3d9bf4e1a5 MINOR: hlua/event_hdl: rely on proxy_uuid instead of proxy_name for lookups
Since "MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server"
we may now use proxy_uuid variable to perform proxy lookups when
handling a server event.

It is more reliable since proxy_uuid isn't subject to any size limitation
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
d714213862 MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server
Expose proxy_uuid variable in event_hdl_cb_data_server struct to
overcome proxy_name fixed length limitation.

proxy_uuid may be used by the handler to perform proxy lookups.
This should be preferred over lookups relying proxy_name.
(proxy_name is suitable for printing / logging purposes but not for
ID lookups since it has a maximum fixed length)
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
0ddf052972 CLEANUP: server: fix update_status() function comment
srv_update_status() function comment says that the function "is designed
to be called asynchronously".

While this used to be true back then with 64cc49cf
("MAJOR: servers: propagate server status changes asynchronously.")

This is not true anymore since 3ff577e ("MAJOR: server: make server state changes
synchronous again")

Fixing the comment in order to better reflect current behavior.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
88687f0980 CLEANUP: errors: fix obsolete function comments
Since 9f903af5 ("MEDIUM: log: slightly refine the output format of
alerts/warnings/etc"), messages generated by ha_{alert,warning,notice}
don't embed date/time information anymore.

Updating some old function comments that kept saying otherwise.
2023-04-21 14:36:45 +02:00
Amaury Denoyelle
a65dd3a2c8 BUG/MINOR: quic: consume Rx datagram even on error
A BUG_ON crash can occur on qc_rcv_buf() if a Rx packet allocation
failed.

To fix this, datagram are marked as consumed even if a fatal error
occured during parsing. For the moment, only a Rx packet allocation
failure could provoke this. At this stage, it's unknown if the datagram
were partially parsed or not at all so it's better to discard it
completely.

This bug was detected using -dMfail argument.

This should be backported up to 2.7.
2023-04-20 14:49:32 +02:00
Amaury Denoyelle
d537ca79dc BUG/MINOR: quic: prevent crash on qc_new_conn() failure
Properly initialize el_th_ctx member first on qc_new_conn(). This
prevents a segfault if release should be called later due to memory
allocation failure in the function on qc_detach_th_ctx_list().

This should be backported up to 2.7.
2023-04-20 14:49:32 +02:00
Amaury Denoyelle
9bbfa72b67 BUG/MINOR: h3: fix crash on h3s alloc failure
Do not emit a CONNECTION_CLOSE on h3s allocation failure. Indeed, this
causes a crash as the calling function qcs_new() will also try to emit a
CONNECTION_CLOSE which triggers a BUG_ON() on qcc_emit_cc().

This was reproduced using -dMfail.

This should be backported up to 2.7.
2023-04-20 14:49:32 +02:00
Amaury Denoyelle
93d2ebe9f3 BUG/MINOR: mux-quic: properly handle STREAM frame alloc failure
Previously, if a STREAM frame cannot be allocated for emission, a crash
would occurs due to an ABORT_NOW() statement in _qc_send_qcs().

Replace this by proper error code handling. Each stream were sending
fails are removed temporarily from qcc::send_list to a list local to
_qc_send_qcs(). Once emission has been conducted for all streams,
reinsert failed stream to qcc::send_list. This avoids to reloop on
failed streams on the second while loop at the end of _qc_send_qcs().

This crash was reproduced using -dMfail.

This should be backported up to 2.6.
2023-04-20 14:49:32 +02:00
Amaury Denoyelle
ed820823f0 BUG/MINOR: mux-quic: fix crash with app ops install failure
On MUX initialization, the application layer is setup via
qcc_install_app_ops(). If this function fails MUX is deallocated and an
error is returned.

This code path causes a crash before connection has been registered
prior into the mux_stopping_data::list for stopping idle frontend conns.
To fix this, insert the connection later in qc_init() once no error can
occured.

The crash was seen on the process closing with SUGUSR1 with a segfault
on mux_stopping_process(). This was reproduced using -dMfail.

This regression was introduced by the following patch :
  commit b4d119f0c7
  BUG/MEDIUM: mux-quic: fix crash on H3 SETTINGS emission

This should be backported up to 2.7.
2023-04-20 14:49:32 +02:00
Frédéric Lécaille
d07421331f BUG/MINOR: quic: Wrong Retry token generation timestamp computing
Again a now_ms variable value used without the ticks API. It is used
to store the generation time of the Retry token to be received back
from the client.

Must be backported to 2.6 and 2.7.
2023-04-19 17:31:28 +02:00
Frédéric Lécaille
45662efb2f BUG/MINOR: quic: Unchecked buffer length when building the token
As server, an Initial does not contain a token but only the token length field
with zero as value. The remaining room was not checked before writting this field.

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Frédéric Lécaille
0ed94032b2 MINOR: quic: Do not allocate too much ack ranges
Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32).

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Frédéric Lécaille
4b2627beae BUG/MINOR: quic: Stop removing ACK ranges when building packets
Since this commit:

    BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.

There are more chances that ack ranges may be removed from their trees when
building a packet. It is preferable to impose a limit to these trees. This
will be the subject of the a next commit to come.

For now on, it is sufficient to stop deleting ack range from their trees.
Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were
there to do that.
Make qc_frm_len() support ACK frames and calls it to ensure an ACK frame
may be added to a packet before building it.

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Aurelien DARRAGON
8cd620b46f MINOR: hlua: safe coroutine.create()
Overriding global coroutine.create() function in order to link the
newly created subroutine with the parent hlua ctx.
(hlua_gethlua() function from a subroutine will return hlua ctx from the
hlua ctx on which the coroutine.create() was performed, instead of NULL)

Doing so allows hlua_hook() function to support being called from
subroutines created using coroutine.create() within user lua scripts.

That is: the related subroutine will be immune to the forced-yield,
but it will still be checked against hlua timeouts. If the subroutine
fails to yield or finish before the timeout, the related lua handler will
be aborted (instead of going rogue unnoticed like it would be the case prior
to this commit)
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
cf0f792490 MINOR: hlua: hook yield on known lua state
When forcing a yield attempt from hlua_hook(), we should perform it on
the known hlua state, not on a potential substate created using
coroutine.create() from an existing hlua state from lua script.

Indeed, only true hlua couroutines will properly handle the yield and
perform the required timeout checks when returning in hlua_ctx_resume().

So far, this was not a concern because hlua_gethlua() would return NULL
if hlua_hook() is not directly being called from a hlua coroutine anyway.

But with this we're trying to make hlua_hook() ready for being called
from a subcoroutine which inherits from a parent hlua ctx.
In this case, no yield attempt will be performed, we will simply check
for hlua timeouts.

Not doing so would result in the timeout checks not being performed since
hlua_ctx_resume() is completely bypassed when yielding from the subroutine,
resulting in a user-defined coroutine potentially going rogue unnoticed.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
2a9764baae CLEANUP: hlua: avoid confusion between internal timers and tick based timers
Not all hlua "time" variables use the same time logic.

hlua->wake_time relies on ticks since its meant to be used in conjunction
with task scheduling. Thus, it should be stored as a signed int and
manipulated using the tick api.
Adding a few comments about that to prevent mixups with hlua internal
timer api which doesn't rely on the ticks api.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
58e36e5b14 MEDIUM: hlua: introduce tune.lua.burst-timeout
The "burst" execution timeout applies to any Lua handler.
If the handler fails to finish or yield before timeout is reached,
handler will be aborted to prevent thread contention, to prevent
traffic from not being served for too long, and ultimately to prevent
the process from crashing because of the watchdog kicking in.

Default value is 1000ms.
Combined with forced-yield default value of 10000 lua instructions, it
should be high enough to prevent any existing script breakage, while
still being able to catch slow lua converters or sample fetches
doing thread contention and risking the process stability.

Setting value to 0 completely bypasses this check. (not recommended but
could be required to restore original behavior if this feature breaks
existing setups somehow...)

No backport needed, although it could be used to prevent watchdog crashes
due to poorly coded (slow/cpu consuming) lua sample fetches/converters.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
da9503ca9a MEDIUM: hlua: reliable timeout detection
For non yieldable lua handlers (converters, fetches or yield
incompatible lua functions), current timeout detection relies on now_ms
thread local variable.

But within non-yieldable contexts, now_ms won't be updated if not by us
(because we're momentarily stuck in lua context so we won't
re-enter the polling loop, which is responsible for clock updates).

To circumvent this, clock_update_date(0, 1) was manually performed right
before now_ms is being read for the timeout checks.

But this fails to work consistently, because if no other concurrent
threads periodically run clock_update_global_date(), which do happen if
we're the only active thread (nbthread=1 or low traffic), our
clock_update_date() call won't reliably update our local now_ms variable

Moreover, clock_update_date() is not the right tool for this anyway, as
it was initially meant to be used from the polling context.
Using it could have negative impact on other threads relying on now_ms
to be stable. (because clock_update_date() performs global clock update
from time to time)

-> Introducing hlua multipurpose timer, which is internally based on
now_cpu_time_fast() that provides per-thread consistent clock readings.

Thanks to this new hlua timer API, hlua timeout logic is less error-prone
and more robust.

This allows the timeout detection to work as expected for both yieldable
and non-yieldable lua handlers.

This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function"

While this could theorically be backported to all stable versions,
it is advisable to avoid backports unless we're confident enough
since it could cause slight behavior changes (timing related) in
existing setups.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
df188f145b MINOR: clock: add now_cpu_time_fast() function
Same as now_cpu_time(), but for fast queries (less accurate)
Relies on now_cpu_time() and now_mono_time_fast() is used
as a cache expiration hint to prevent now_cpu_time() from being
called too often since it is known to be quite expensive.

Depends on commit "MINOR: clock: add now_mono_time_fast() function"
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON
07cbd8e074 MINOR: clock: add now_mono_time_fast() function
Same as now_mono_time(), but for fast queries (less accurate)
Relies on coarse clock source (also known as fast clock source on
some systems).

Fallback to now_mono_time() if coarse source is not supported on the system.
2023-04-19 11:03:31 +02:00
Willy Tarreau
be336620b7 BUG/MINOR: cfgparse: make sure to include openssl-compat
Commit 5003ac7fe ("MEDIUM: config: set useful ALPN defaults for HTTPS
and QUIC") revealed a build dependency bug: if QUIC is not enabled,
cfgparse doesn't have any dependency on the SSL stack, so the various
ifdefs that try to check special conditions such as rejecting an H2
config with too small a bufsize, are silently ignored. This was
detected because the default ALPN string was not set and caused the
alpn regtest to fail without QUIC support. Adding openssl-compat to
the list of includes seems to be sufficient to have what we need.

It's unclear when this dependency was broken, it seems that even 2.2
didn't have an explicit dependency on anything SSL-related, though it
could have been inherited through other files (as happens with QUIC
here). It would be safe to backport it to all stable branches. The
impact is very low anyway.
2023-04-19 10:46:21 +02:00
Amaury Denoyelle
89e48ff92f BUG/MEDIUM: quic: prevent crash on Retry sending
The following commit introduced a regression :
  commit 1a5cc19cec
  MINOR: quic: adjust Rx packet type parsing

Since this commit, qv variable was left to NULL as version is stored
directly in quic_rx_packet instance. In most cases, this only causes
traces to skip version printing. However, qv is dereferenced when
sending a Retry which causes a segfault.

To fix this, simply remove qv variable and use pkt->version instead,
both for traces and send_retry() invocation.

This bug was detected thanks to QUIC interop runner. It can easily be
reproduced by using quic-force-retry on the bind line.

This must be backported up to 2.7.
2023-04-19 10:18:58 +02:00
Willy Tarreau
5003ac7fe9 MEDIUM: config: set useful ALPN defaults for HTTPS and QUIC
This commit makes sure that if three is no "alpn", "npn" nor "no-alpn"
setting on a "bind" line which corresponds to an HTTPS or QUIC frontend,
we automatically turn on "h2,http/1.1" as an ALPN default for an HTTP
listener, and "h3" for a QUIC listener. This simplifies the configuration
for end users since they won't have to explicitly configure the ALPN
string to enable H2, considering that at the time of writing, HTTP/1.1
represents less than 7% of the traffic on large infrastructures. The
doc and regtests were updated. For more info, refer to the following
thread:

  https://www.mail-archive.com/haproxy@formilux.org/msg43410.html
2023-04-19 09:52:20 +02:00
Willy Tarreau
de85de69ec MINOR: ssl_crtlist: dump "no-alpn" on "show crtlist" when "no-alpn" was set
Instead of dumping "alpn " better show "no-alpn" as configured.
2023-04-19 09:12:43 +02:00
Willy Tarreau
a2a095536a MINOR: ssl: do not set ALPN callback with the empty string
While it does not have any effect, it's better not to try to setup an
ALPN callback nor to try to lookup algorithms when the configured ALPN
string is empty as a result of "no-alpn" being used.
2023-04-19 09:12:43 +02:00
Willy Tarreau
158c18e85a MINOR: config: add "no-alpn" support for bind lines
It's possible to replace a previously set ALPN but not to disable ALPN
if it was previously set. The new "no-alpn" setting allows to disable
a previously set ALPN setting by preparing an empty one that will be
replaced and freed when the config is validated.
2023-04-19 08:38:06 +02:00
Christopher Faulet
d0c57d3d33 BUG/MEDIUM: stconn: Propagate error on the SC on sending path
On sending path, a pending error can be promoted to a terminal error at the
endpoint level (SE_FL_ERR_PENDING to SE_FL_ERROR). When this happens, we
must propagate the error on the SC to be able to handle it at the stream
level and eventually forward it to the other side.

Because of this bug, it is possible to freeze sessions, for instance on the
CLI.

It is a 2.8-specific issue. No backport needed.
2023-04-18 18:57:04 +02:00
Christopher Faulet
845f7c4708 CLEANUP: cli: Remove useless debug message in cli_io_handler()
When compiled in debug mode, HAProxy prints a debug message at the end of
the cli I/O handle. It is pretty annoying and useless because, we can active
applet traces. Thus, just remove it.
2023-04-18 18:57:04 +02:00
Christopher Faulet
cbfcb02e21 CLEANUP: backend: Remove useless debug message in assign_server()
When compiled in debug mode, HAProxy prints a debug message at the beginning
of assign_server(). It is pretty annoying and useless because, in debug
mode, we can active stream traces. Thus, just remove it.
2023-04-18 18:57:04 +02:00
Christopher Faulet
27c17d1ca5 BUG/MINOR: http-ana: Update analyzers on both sides when switching in TUNNEL mode
The commit 9704797fa ("BUG/MEDIUM: http-ana: Properly switch the request in
tunnel mode on upgrade") fixes the switch in TUNNEL mode, but only
partially. Because both channels are switch in TUNNEL mode in same time on
one side, the channel's analyzers on the opposite side are not updated
accordingly. This prevents the tunnel timeout to be applied.

So instead of updating both sides in same time, we only force the analysis
on the other side by setting CF_WAKE_ONCE flag when a channel is switched in
TUNNEL mode. In addition, we must take care to forward all data if there is
no DATAa TCP filters registered.

This patch is related to the issue #2125. It is 2.8-specific. No backport
needed.
2023-04-18 18:57:04 +02:00
Amaury Denoyelle
0783a7b08e MINOR: listener: remove unneeded local accept flag
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.

This should be backported up to 2.7 after the previous patch has also
been taken.
2023-04-18 17:09:34 +02:00
Amaury Denoyelle
1acbbca171 MAJOR: quic: support thread balancing on accept
Before this patch, QUIC protocol used a custom add_listener callback.
This was because a quic_conn instance was allocated before accept. Its
thread affinity was fixed and could not be changed after. The thread was
derived itself from the CID selected by the client which prevent an even
repartition of QUIC connections on multiple threads.

A series of patches was introduced with a lot of changes. The most
important ones :
* removal of affinity between an encoded CID and a thread
* possibility to rebind a quic_conn on a new thread

Thanks to this, it's possible to suppress the custom add_listener
callback. Accept is conducted for QUIC protocol as with the others. A
less loaded thread is selected on listener_accept() and the connection
stack is bind on it. This operation implies that quic_conn instance is
moved to the new thread using the set_affinity QUIC protocol callback.

To reactivate quic_conn instance after thread rebind,
qc_finalize_affinity_rebind() is called after accept on the new thread
by qc_xprt_start() through accept_queue_process() / session_accept_fd().

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:09:34 +02:00
Amaury Denoyelle
739de3f119 MINOR: quic: properly finalize thread rebinding
When a quic_conn instance is rebinded on a new thread its tasks and
tasklet are destroyed and new ones created. Its socket is also migrated
to a new thread which stop reception on it.

To properly reactivate a quic_conn after rebind, wake up its tasks and
tasklet if they were active before thread rebind. Also reactivate
reading on the socket FD. These operations are implemented on a new
function qc_finalize_affinity_rebind().

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:09:02 +02:00
Amaury Denoyelle
5f8704152a BUG/MINOR: quic: transform qc_set_timer() as a reentrant function
qc_set_timer() function is used to rearm the timer for loss detection
and probing. Previously, timer was always rearm when congestion window
was free due to a wrong interpretation of the RFC which mandates the
client to rearm the timer before handshake completion to avoid a
deadlock related to anti-amplification.

Fix this by removing this code from quic_pto_pktns(). This allows
qc_set_timer() to be reentrant and only activate the timer if needed.

The impact of this bug seems limited. It can probably caused the timer
task to be processed too frequently which could caused too frequent
probing.

This change will allow to reuse easily qc_set_timer() after quic_conn
thread migration. As such, the new timer task will be scheduled only if
needed.

This should be backported up to 2.6.
2023-04-18 17:09:02 +02:00
Amaury Denoyelle
25174d51ef MEDIUM: quic: implement thread affinity rebinding
Implement a new function qc_set_tid_affinity(). This function is
responsible to rebind a quic_conn instance to a new thread.

This operation consists mostly of releasing existing tasks and tasklet
and allocating new instances on the new thread. If the quic_conn uses
its owned socket, it is also migrated to the new thread. The migration
is finally completed with updated the CID TID to the new thread. After
this step, the connection is thus accessible to the new thread and
cannot be access anymore on the old one without risking race condition.

To ensure rebinding is either done completely or not at all, tasks and
tasklet are pre-allocated before all operations. If this fails, an error
is returned and rebiding is not done.

To destroy the older tasklet, its context is set to NULL before wake up.
In I/O callbacks, a new function qc_process() is used to check context
and free the tasklet if NULL.

The thread rebinding can cause a race condition if the older thread
quic_dghdlrs::dgrams list contains datagram for the connection after
rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always
check if the packet CID is still associated to the current thread or
not. In the latter case, no connection is returned and the new thread is
returned to allow to redispatch the datagram to the new thread in a
thread-safe way.

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:08:34 +02:00
Amaury Denoyelle
1304d19dee MINOR: quic: delay post handshake frames after accept
When QUIC handshake is completed on our side, some frames are prepared
to be sent :
* HANDSHAKE_DONE
* several NEW_CONNECTION_ID with CIDs allocated

This step was previously executed in quic_conn_io_cb() directly after
CRYPTO frames parsing. This patch delays it to be completed after
accept. Special care have been taken to ensure it is still functional
with 0-RTT activated.

For the moment, this patch should have no impact. However, when
quic_conn thread migration on accept will be implemented, it will be
easier to remap only one CID to the new thread. New CIDs will be
allocated after migration on the new thread.

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:08:28 +02:00
Amaury Denoyelle
a66e04338e MINOR: protocol: define new callback set_affinity
Define a new protocol callback set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection on the selected thread queue. If the
callback fails, accept is done locally.

This change will be useful for protocols with state allocated before
accept is done. For the moment, only QUIC protocol is concerned. This
will allow to rebind the quic_conn to a new thread depending on its
load.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:52 +02:00
Amaury Denoyelle
987812b190 MINOR: quic: do not proceed to accept for closing conn
Each quic_conn is inserted in an accept queue to allocate the upper
layers. This is done through a listener tasklet in
quic_sock_accept_conn().

This patch interrupts the accept process for a quic_conn in
closing/draining state. Indeed, this connection will soon be closed so
it's unnecessary to allocate a complete stack for it.

This patch will become necessary when thread migration is implemented.
Indeed, it won't be allowed to proceed to thread migration for a closing
quic_conn.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:48 +02:00
Amaury Denoyelle
f16ec344d5 MEDIUM: quic: handle conn bootstrap/handshake on a random thread
TID encoding in CID was removed by a recent change. It is now possible
to access to the <tid> member stored in quic_connection_id instance.

For unknown CID, a quick solution was to redispatch to the thread
corresponding to the first CID byte. This ensures that an identical CID
will always be handled by the same thread to avoid creating multiple
same connection. However, this forces an uneven load repartition which
can be critical for QUIC handshake operation.

To improve this, remove the above constraint. An unknown CID is now
handled by its receiving thread. However, this means that if multiple
packets are received with the same unknown CID, several threads will try
to allocate the same connection.

To prevent this race condition, CID insertion in global tree is now
conducted first before creating the connection. This is a thread-safe
operation which can only be executed by a single thread. The thread
which have inserted the CID will then proceed to quic_conn allocation.
Other threads won't be able to insert the same CID : this will stop the
treatment of the current packet which is redispatch to the now owning
thread.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:44 +02:00
Amaury Denoyelle
1e959ad522 MINOR: quic: remove TID encoding in CID
CIDs were moved from a per-thread list to a global list instance. The
TID-encoded is thus non needed anymore.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:31 +02:00
Amaury Denoyelle
e83f937cc1 MEDIUM: quic: use a global CID trees list
Previously, quic_connection_id were stored in a per-thread tree list.
Datagram were first dispatched to the correct thread using the encoded
TID before a tree lookup was done.

Remove these trees and replace it with a global trees list of 256
entries. A CID is using the list index corresponding to its first byte.
On datagram dispatch, CID is lookup on its tree and TID is retrieved
using new member quic_connection_id.tid. As such, a read-write lock
protects each list instances. With 256 entries, it is expected that
contention should be reduced.

A new structure quic_cid_tree served as a tree container associated with
its read-write lock. An API is implemented to ensure lock safety for
insert/lookup/delete operation.

This patch is a step forward to be able to break the affinity between a
CID and a TID encoded thread. This is required to be able to migrate a
quic_conn after accept to select thread based on their load.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:17 +02:00
Amaury Denoyelle
66947283ba MINOR: quic: remove TID ref from quic_conn
Remove <tid> member in quic_conn. This is moved to quic_connection_id
instance.

For the moment, this change has no impact. Indeed, qc.tid reference
could easily be replaced by tid as all of this work was already done on
the connection thread. However, it is planified to support quic_conn
thread migration in the future, so removal of qc.tid will simplify this.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
c2a9264f34 MINOR: quic: adjust quic CID derive API
ODCID are never stored in the CID tree. Instead, we store our generated
CID which is directly derived from the CID using a hash function. This
operation is done via quic_derive_cid().

Previously, generated CID was returned as a 64-bits integer. However,
this is cumbersome to convert as an array of bytes which is the most
common CID representation. Adjust this by modifying return type to a
quic_cid struct.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
1a5cc19cec MINOR: quic: adjust Rx packet type parsing
qc_parse_hd_form() is the function used to parse the first byte of a
packet and return its type and version. Its API has been simplified with
the following changes :
* extra out paremeters are removed (long_header and version). All infos
  are now stored directly in quic_rx_packet instance
* a new dummy version is declared in quic_versions array with a 0 number
  code. This can be used to match Version negotiation packets.
* a new default packet type is defined QUIC_PACKET_TYPE_UNKNOWN to be
  used as an initial value.

Also, the function has been exported to an include file. This will be
useful to be able to reuse on quic-sock to parse the first packet of a
datagram.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
6ac0fb0f13 MINOR: quic: remove uneeded tasklet_wakeup after accept
No need to explicitely wakeup quic-conn tasklet after accept is done.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
591e7981d9 CLEANUP: quic: rename quic_connection_id vars
Two different structs exists for QUIC connection ID :
* quic_connection_id which represents a full CID with its sequence
  number
* quic_cid which is just a buffer with a length. It is contained in the
  above structure.

To better differentiate them, rename all quic_connection_id variable
instances to "conn_id" by contrast to "cid" which is used for quic_cid.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
9b68b64572 CLEANUP: quic: remove unused qc param on stateless reset token
Remove quic_conn instance as first parameter of
quic_stateless_reset_token_init() and quic_stateless_reset_token_cpy()
functions. It was only used for trace purpose.

The main advantage is that it will be possible to allocate a QUIC CID
without a quic_conn instance using new_quic_cid() which is requires to
first check if a CID is existing before allocating a connection.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
90e5027e46 CLEANUP: quic: remove unused scid_node
Remove unused scid_node member for quic_conn structure. It was prepared
for QUIC backend support.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
22a368ce58 CLEANUP: quic: remove unused QUIC_LOCK label
QUIC_LOCK label is never used. Indeed, lock usage is minimal on QUIC as
every connection is pinned to its owned thread.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Amaury Denoyelle
c361937d51 BUG/MINOR: task: allow to use tasklet_wakeup_after with tid -1
Adjust BUG_ON() statement to allow tasklet_wakeup_after() for tasklets
with tid pinned to -1 (the current thread). This is similar to
tasklet_wakeup().

This should be backported up to 2.6.
2023-04-18 16:20:47 +02:00
Willy Tarreau
ca1027c22f MINOR: mux-h2: make the max number of concurrent streams configurable per side
For a long time the maximum number of concurrent streams was set once for
both sides (front and back) while the impacts are different. This commit
allows it to be configured separately for each side. The older settings
remains the fallback choice when other ones are not set.
2023-04-18 15:58:55 +02:00
Willy Tarreau
9d7abda787 MINOR: mux-h2: make the initial window size configurable per side
For a long time the initial window size (per-stream size) was set once
for both directions, frontend and backend, resulting in a tradeoff between
upload speed and download fairness. This commit allows it to be configured
separately for each side. The older settings remains the fallback choice
when other ones are not set.
2023-04-18 15:58:55 +02:00
Christopher Faulet
b36e512bd0 MINOR: stconn: Propagate EOS from an applet to the attached stream-connector
In the same way than for a stream-connector attached to a mux, an EOS is now
propagated from an applet to its stream-connector. To do so, sc_applet_eos()
function is added.
2023-04-17 17:41:28 +02:00
Christopher Faulet
1aec6c92cb MINOR: stconn: Propagate EOS from a mux to the attached stream-connector
Now there is a SC flag to state the endpoint has reported an end-of-stream,
it is possible to distinguish an EOS from an abort at the stream-connector
level.

sc_conn_read0() function is renamed to sc_conn_eos() and it propagates an
EOS by setting SC_FL_EOS instead of SC_FL_ABRT_DONE. It only concernes
stream-connectors attached to a mux.
2023-04-17 17:41:28 +02:00
Christopher Faulet
ca5309a9a3 MINOR: stconn: Add a flag to report EOS at the stream-connector level
SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will
be used to distinguish end of stream reported by the endoint, via the
SE_FL_EOS flag, and the abort triggered by the stream, via the
SC_FL_ABRT_DONE flag.

In this patch, the flag is defined and is systematically tested everywhere
SC_FL_ABRT_DONE is tested. It should be safe because it is never set.
2023-04-17 17:41:28 +02:00
Christopher Faulet
285aa40d35 BUG/MEDIUM: log: Properly handle client aborts in syslog applet
In the syslog applet, when there is no output data, nothing is performed and
the applet leaves by requesting more data. But it is an issue because a
client abort is only handled if it reported with the last bytes of the
message. If the abort occurs after the message was handled, it is ignored.
The session remains opened and inactive until the client timeout is being
triggered. It no such timeout is configured, given that the default maxconn
is 10, all slots can be quickly busy and make the applet unresponsive.

To fix the issue, the best is to always try to read a message when the I/O
handle is called. This way, the abort can be handled. And if there is no
data, we leave as usual.

This patch should fix the issue #2112. It must be backported as far as 2.4.
2023-04-17 16:50:30 +02:00
Christopher Faulet
9704797fa2 BUG/MEDIUM: http-ana: Properly switch the request in tunnel mode on upgrade
Since the commit f2b02cfd9 ("MAJOR: http-ana: Review error handling during
HTTP payload forwarding"), during the payload forwarding, we are analyzing a
side, we stop to test the opposite side. It means when the HTTP request
forwarding analyzer is called, we no longer check the response side and vice
versa.

Unfortunately, since then, the HTTP tunneling is broken after a protocol
upgrade. On the response is switch in TUNNEL mode. The request remains in
DONE state. As a consequence, data received from the server are forwarded to
the client but not data received from the client.

To fix the bug, when both sides are in DONE state, both are switched in same
time in TUNNEL mode if it was requested. It is performed in the same way in
http_end_request() and http_end_response().

This patch should fix the issue #2125. It is 2.8-specific. No backport
needed.
2023-04-17 16:17:35 +02:00
William Lallemand
a21ca74e83 MINOR: ssl: remove OpenSSL 1.0.2 mention into certificate loading error
Remove the mention to OpenSSL 1.0.2 in the certificate chain loading
error, which is not relevant.

Could be backported in 2.7.
2023-04-17 14:45:40 +02:00
Ilya Shipitsin
2ca01589a0 CLEANUP: use "offsetof" where appropriate
let's use the C library macro "offsetof"
2023-04-16 09:58:49 +02:00
Frédéric Lécaille
b5efe7901d BUG/MINOR: quic: Do not use ack delay during the handshakes
As revealed by GH #2120 opened by @Tristan971, there are cases where ACKs
have to be sent without packet to acknowledge because the ACK timer has
been triggered and the connection needs to probe the peer at the same time.
Indeed

Thank you to @Tristan971 for having reported this issue.

Must be backported to 2.6 and 2.7.
2023-04-14 21:09:13 +02:00
Christopher Faulet
75b954fea4 BUG/MINOR: stconn: Don't set SE_FL_ERROR at the end of sc_conn_send()
When I reworked my series, this code was first removed and reinserted by
error. So let's remove it again.
2023-04-14 17:32:44 +02:00
Christopher Faulet
25d9fe50f5 MEDIUM: stconn: Rely on SC flags to handle errors instead of SE flags
It is the last commit on this subject. we stop to use SE_FL_ERROR flag from
the SC, except at the I/O level. Otherwise, we rely on SC_FL_ERROR
flag. Now, there should be a real separation between SE flags and SC flags.
2023-04-14 17:05:54 +02:00
Christopher Faulet
e182a8e651 MEDIUM: stream: Stop to use SE flags to detect endpoint errors
Here again, we stop to use SE_FL_ERROR flag from process_stream() and
sub-functions and we fully rely on SC_FL_ERROR to do so.
2023-04-14 17:05:54 +02:00
Christopher Faulet
d7bac88427 MEDIUM: stream: Stop to use SE flags to detect read errors from analyzers
In the same way the previous commit, we stop to use SE_FL_ERROR flag from
analyzers and their sub-functions. We now fully rely on SC_FL_ERROR to do so.
2023-04-14 17:05:54 +02:00
Christopher Faulet
725170eee6 MEDIUM: backend: Stop to use SE flags to detect connection errors
SE_FL_ERROR flag is no longer set when an error is detected durign the
connection establishment. SC_FL_ERROR flag is set instead. So it is safe to
remove test on SE_FL_ERROR to detect connection establishment error.
2023-04-14 17:05:54 +02:00
Christopher Faulet
88d05a0f3b MEDIUM: tree-wide: Stop to set SE_FL_ERROR from upper layer
We can now fully rely on SC_FL_ERROR flag from the stream. The first step is
to stop to set the SE_FL_ERROR flag. Only endpoints are responsible to set
this flag. It was a design limitation. It is now fixed.
2023-04-14 17:05:54 +02:00
Christopher Faulet
ad46e52814 MINOR: tree-wide: Test SC_FL_ERROR with SE_FL_ERROR from upper layer
From the stream, when SE_FL_ERROR flag is tested, we now also test the
SC_FL_ERROR flag. Idea is to stop to rely on the SE descriptor to detect
errors.
2023-04-14 17:05:54 +02:00
Christopher Faulet
340021b89f MINOR: stream: Set SC_FL_ERROR on channels' buffer allocation error
Set SC_FL_ERROR flag when we fail to allocate a buffer for a stream.
2023-04-14 17:05:54 +02:00
Christopher Faulet
38656f406c MINOR: backend: Set SC_FL_ERROR on connection error
During connection establishement, if an error occurred, the SC_FL_ERROR flag
is now set. Concretely, it is set when SE_FL_ERROR is also set.
2023-04-14 17:05:53 +02:00
Christopher Faulet
a1d14a7c7f MINOR: stconn: Add a flag to ack endpoint errors at SC level
The flag SC_FL_ERROR is added to ack errors on the endpoint. When
SE_FL_ERROR flag is detected on the SE descriptor, the corresponding is set
on the SC. Idea is to avoid, as far as possible, to manipulated the SE
descriptor in upper layers and know when an error in the endpoint is handled
by the SC.

For now, this flag is only set and cleared but never tested.
2023-04-14 17:05:53 +02:00
Christopher Faulet
638fe6ab0f MINOR: stconn: Don't clear SE_FL_ERROR when endpoint is reset
There is no reason to remove this flag. When the SC endpoint is reset, it is
replaced by a new one. The old one is released. It was useful when the new
endpoint inherited some flags from the old one.  But it is no longer
performed. Thus there is no reason still unset this flag.
2023-04-14 17:05:53 +02:00
Christopher Faulet
e8bcef5f22 MEDIUM: stconn: Forbid applets with more to deliver if EOI was reached
When an applet is woken up, before calling its io_handler, we pretend it has
no more data to deliver. So, after the io_handler execution, it is a bug if
an applet states it has more data to deliver while the end of input is
reached.

So a BUG_ON() is added to be sure it never happens.
2023-04-14 17:05:53 +02:00
Christopher Faulet
56a2b608b0 MINOR: stconn: Stop to set SE_FL_ERROR on sending path
It is not the SC responsibility to report errors on the SE descriptor. It is
the endpoint responsibility. It must switch SE_FL_ERR_PENDING into
SE_FL_ERROR if the end of stream was detected. It can even be considered as
a bug if it is not done by he endpoint.

So now, on sending path, a BUG_ON() is added to abort if SE_FL_EOS and
SE_FL_ERR_PENDING flags are set but not SE_FL_ERROR. It is trully important
to handle this case in the endpoint to be able to properly shut the endpoint
down.
2023-04-14 17:05:53 +02:00
Christopher Faulet
d3bc340e7e BUG/MINOR: cli: Don't close when SE_FL_ERR_PENDING is set in cli analyzer
SE_FL_ERR_PENDING is used to report an error on the write side. But it is
not a terminal error. Some incoming data may still be available. In the cli
analyzers, it is important to not close the stream when this flag is
set. Otherwise the response to a command can be truncated. It is probably
hard to observe. But it remains a bug.

While this patch could be backported to 2.7, there is no real reason to do
so, except if someone reports a bug about truncated responses.
2023-04-14 16:49:04 +02:00
Christopher Faulet
214f1b5c16 MINOR: tree-wide: Replace several chn_prod() by the corresponding SC
At many places, call to chn_prod() can be easily replaced by the
corresponding SC. It is a bit easier to understand which side is
manipulated.
2023-04-14 15:06:04 +02:00
Christopher Faulet
64350bbf05 MINOR: tree-wide: Replace several chn_cons() by the corresponding SC
At many places, call to chn_cons() can be easily replaced by the
corresponding SC. It is a bit easier to understand which side is
manipulated.
2023-04-14 15:04:03 +02:00
Christopher Faulet
b2b1c3a6ea MINOR: channel/stconn: Replace sc_shutw() by sc_shutdown()
All reference to a shutw is replaced by an abort. So sc_shutw() is renamed
sc_shutdown(). SC app ops functions are renamed accordingly.
2023-04-14 15:02:57 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for writes but shutdowns.
2023-04-14 15:01:21 +02:00
Christopher Faulet
cfc11c0eae MINOR: channel/stconn: Replace sc_shutr() by sc_abort()
All reference to a shutr is replaced by an abort. So sc_shutr() is renamed
sc_abort(). SC app ops functions are renamed accordingly.
2023-04-14 14:54:35 +02:00
Christopher Faulet
0c370eee6d MINOR: stconn: Rename SC_FL_SHUTR in SC_FL_ABRT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for reads but aborts. For now this flag is set when a read0 is
detected. It is of couse not accurate. This will be changed later.
2023-04-14 14:51:22 +02:00
Christopher Faulet
df7cd710a8 MINOR: channel/stconn: Replace channel_shutw_now() by sc_schedule_shutdown()
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutw_now() is replaced by
sc_schedule_shutdown(). The request channel is replaced by the front SC and
the response is replace by the back SC.
2023-04-14 14:49:45 +02:00
Christopher Faulet
e38534cbd0 MINOR: stconn: Rename SC_FL_SHUTW_NOW in SC_FL_SHUT_WANTED
Because shutowns for reads are now considered as aborts, the shudowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
2023-04-14 14:46:07 +02:00
Christopher Faulet
12762f09a5 MINOR: channel/stconn: Replace channel_shutr_now() by sc_schedule_abort()
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replace by the back SC.
2023-04-14 14:08:49 +02:00
Christopher Faulet
573ead1e68 MINOR: stconn: Rename SC_FL_SHUTR_NOW in SC_FL_ABRT_WANTED
It is the first step to transform shutdown for reads for the upper layer
into aborts. This patch is quite simple, it is just a flag renaming.
2023-04-14 14:06:01 +02:00
Christopher Faulet
7eb837df4a MINOR: stream: Introduce stream_abort() to abort on both sides in same time
The function stream_abort() should now be called when an abort is performed
on the both channels in same time.
2023-04-14 14:04:59 +02:00
Christopher Faulet
3db538ac2f MINOR: channel: Forwad close to other side on abort
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
2023-04-14 13:56:28 +02:00
Christopher Faulet
0adffb62c1 MINOR: filters: Review and simplify errors handling
First, it is useless to abort the both channel explicitly. For HTTP streams,
http_reply_and_close() is called. This function already take care to abort
processing. For TCP streams, we can rely on stream_retnclose().

To set termination flags, we can also rely on http_set_term_flags() for HTTP
streams and sess_set_term_flags() for TCP streams. Thus no reason to handle
them by hand.

At the end, the error handling after filters evaluation is now quite simple.
2023-04-14 12:13:09 +02:00
Christopher Faulet
dbad8ec787 MINOR: stream: Uninline and export sess_set_term_flags() function
This function will be used to set termination flags on TCP streams from
outside of process_stream(). Thus, it must be uninlined and exported.
2023-04-14 12:13:09 +02:00
Christopher Faulet
95125886ee BUG/MEDIUM: stconn: Do nothing in sc_conn_recv() when the SC needs more room
We erroneously though that an attempt to receive data was not possible if the SC
was waiting for more room in the channel buffer. A BUG_ON() was added to detect
bugs. And in fact, it is possible.

The regression was added in commit 341a5783b ("BUG/MEDIUM: stconn: stop to
enable/disable reads from streams via si_update_rx").

This patch should fix the issue #2115. It must be backported if the commit
above is backported.
2023-04-14 12:13:09 +02:00
Christopher Faulet
915ba08b57 BUG/MEDIUM: stream: Report write timeouts before testing the flags
A regression was introduced when stream's timeouts were refactored. Write
timeouts are not testing is the right order. When timeous of the front SC
are handled, we must then test the read timeout on the request channel and
the write timeout on the response channel. But write timeout is tested on
the request channel instead. On the back SC, the same mix-up is performed.

We must be careful to handle timeouts before checking channel flags. To
avoid any confusions, all timeuts are handled first, on front and back SCs.
Then flags of the both channels are tested.

It is a 2.8-specific issue. No backport needed.
2023-04-14 12:13:09 +02:00
Christopher Faulet
925279ccf2 BUG/MINOR: stream: Fix test on SE_FL_ERROR on the wrong entity
There is a bug at begining of process_stream(). The SE_FL_ERROR flag is
tested against backend stream-connector's flags instead of its SE
descriptor's flags. It is an old typo, introduced when the stream-interfaces
were replaced by the conn-streams.

This patch must be backported as far as 2.6.
2023-04-14 12:13:09 +02:00
Frédéric Lécaille
895700bd32 BUG/MINOR: quic: Wrong Application encryption level selection when probing
This bug arrived with this commit:

    MEDIUM: quic: Ack delay implementation

After having probed the Handshake packet number space, one must not select the
Application encryption level to continue trying building packets as this is done
when the connection is not probing. Indeed, if the ACK timer has been triggered
in the meantime, the packet builder will try to build a packet at the Application
encryption level to acknowledge the received packet. But there is very often
no 01RTT packet to acknowledge when the connection is probing before the
handshake is completed. This triggers a BUG_ON() in qc_do_build_pkt() which
checks that the tree of ACK ranges to be used is not empty.

Thank you to @Tristan971 for having reported this issue in GH #2109.

Must be backported to 2.6 and 2.7.
2023-04-13 19:20:09 +02:00
Frédéric Lécaille
a576c1b0c6 MINOR: quic: Remove a useless test about probing in qc_prep_pkts()
qel->pktns->tx.pto_probe is set to 0 after having prepared a probing
datagram. There is no reason to check this parameter. Furthermore
it is always 0 when the connection does not probe the peer.

Must be backported to 2.6 and 2.7.
2023-04-13 19:20:09 +02:00
Frédéric Lécaille
91369cfcd0 MINOR: quic: Display the packet number space flags in traces
Display this information when the encryption level is also displayed.

Must be backported to 2.6 and 2.7.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
595251f22e BUG/MINOR: quic: SIGFPE in quic_cubic_update()
As reported by @Tristan971 in GH #2116, the congestion control window could be zero
due to an inversion in the code about the reduction factor to be applied.
On a new loss event, it must be applied to the slow start threshold and the window
should never be below ->min_cwnd (2*max_udp_payload_sz).

Same issue in both newReno and cubic algorithm. Furthermore in newReno, only the
threshold was decremented.

Must be backported to 2.6 and 2.7.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
9d68c6aaf6 BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.
Add two missing checks not to substract too big values from another too little
one. In this case the resulted wrapped huge values could be passed to the function
which has to remove the last range of a tree of ACK ranges as encoded limit size
not to go below, cancelling the ACK ranges deletion. The consequence could be that
no ACK were sent.

Must be backported to 2.6 and 2.7.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
45bf1a82f1 BUG/MEDIUM: quic: Code sanitization about acknowledgements requirements
qc_may_build_pkt() has been modified several times regardless of the conditions
the functions it is supposed to allow to send packets (qc_build_pkt()/qc_do_build_pkt())
really use to finally send packets just after having received others, leading
to contraditions and possible very long loops sending empty packets (PADDING only packets)
because qc_may_build_pkt() could allow qc_build_pkt()/qc_do_build_pkt to build packet,
and the latter did nothing except sending PADDING frames, because from its point
of view they had nothing to send.

For now on, this is the job of qc_may_build_pkt() to decide to if there is
packets to send just after having received others AND to provide this information
to the qc_build_pkt()/qc_do_build_pkt()

Note that the unique case where the acknowledgements are completely ignored is
when the endpoint must probe. But at least this is when sending at most two datagrams!

This commit also fixes the issue reported by Willy about a very low throughput
performance when the client serialized its requests.

Must be backported to 2.7 and 2.6.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
eb3e5171ed MINOR: quic: Add connection flags to traces
This should help in diagnosing issues.

Some adjustments have to be done to avoid deferencing a quic_conn objects from
TRACE_*() calls.

Must be backported to 2.7 and 2.6.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
809bd9fed1 BUG/MINOR: quic: Ignored less than 1ms RTTs
Do not ignore very short RTTs (less than 1ms) before computing the smoothed
RTT initializing it to an "infinite" value (UINT_MAX).

Must be backported to 2.7 and 2.6.
2023-04-13 19:20:08 +02:00
Frédéric Lécaille
fad0e6cf73 MINOR: quic: Add packet loss and maximum cc window to "show quic"
Add the number of packet losts and the maximum congestion control window computed
by the algorithms to "show quic".
Same thing for the traces of existent congestion control algorithms.

Must be backported to 2.7 and 2.6.
2023-04-13 19:20:08 +02:00
Olivier Houchard
f98a8c317e BUG/MEDIUM: fd: don't wait for tmask to stabilize if we're not in it.
In fd_update_events(), we loop until there's no bit in the running_mask
that is not in the thread_mask. Problem is, the thread sets its
running_mask bit before that loop, and so if 2 threads do the same, and
a 3rd one just closes the FD and sets the thread_mask to 0, then
running_mask will always be non-zero, and we will loop forever. This is
trivial to reproduce when using a DNS resolver that will just answer
"port unreachable", but could theoretically happen with other types of
file descriptors too.

To fix that, just don't bother looping if we're no longer in the
thread_mask, if that happens we know we won't have to take care of the
FD, anyway.

This should be backported to 2.7, 2.6 and 2.5.
2023-04-13 18:04:46 +02:00
Willy Tarreau
a07635ead5 MINOR: bind-conf: support a new shards value: "by-group"
Setting "shards by-group" will create one shard per thread group. This
can often be a reasonable tradeoff between a single one that can be
suboptimal on CPUs with many cores, and too many that will eat a lot
of file descriptors. It was shown to provide good results on a 224
thread machine, with a distribution that was even smoother than the
system's since here it can take into account the number of connections
per thread in the group. Depending on how popular it becomes, it could
even become the default setting in a future version.
2023-04-13 17:38:31 +02:00
Willy Tarreau
d30e82b9f0 MINOR: receiver: reserve special values for "shards"
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
2023-04-13 17:12:50 +02:00
Amaury Denoyelle
53fc98c3bc MINOR: fd: implement fd_migrate_on() to migrate on a non-local thread
fd_migrate_on() can be used to migrate an existing FD to any thread, even
one belonging to a different group from the current one and from the
caller's. All that is needed is to make sure the FD is still valid when
the operation is performed (which is the case when such operations happen).

This is potentially slightly expensive since it locks the tgid during the
delicate operation, but it is normally performed only from an owning
thread to offer the FD to another one (e.g. reassign a better thread upon
accept()).
2023-04-13 16:57:51 +02:00
Willy Tarreau
97da942ba6 MINOR: thread: keep a bitmask of enabled groups in thread_set
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will
allow to quickly spot the presence of any such group in a set.
2023-04-13 16:57:51 +02:00
William Lallemand
3f210970bf BUG/MINOR: stick_table: alert when type len has incorrect characters
Alert when the len argument of a stick table type contains incorrect
characters.

Replace atol by strtol.

Could be backported in every maintained versions.
2023-04-13 14:46:08 +02:00
Willy Tarreau
28f2a590f6 MINOR: activity: add a line reporting the average CPU usage to "show activity"
It was missing from the output but is sometimes convenient to observe
and understand how incoming connections are distributed. The CPU usage
is reported as the instant measurement of 100-idle_pct for each thread,
and the average value is shown for the aggregated value.

This could be backported as it's helpful in certain troublehsooting
sessions.
2023-04-12 08:42:52 +02:00
Frédéric Lécaille
6fd2576d5e MINOR: quic: Add a trace for packet with an ACK frame
As the ACK frames are not added to the packet list of ack-eliciting frames,
it could not be traced. But there is a flag to identify such packet.
Let's use it to add this information to the traces of TX packets.

Must be backported to 2.6 and 2.7.
2023-04-11 10:47:19 +02:00
Frédéric Lécaille
e47adca432 MINOR: quic: Dump more information at proto level when building packets
This should be helpful to debug issues at without too much traces.

Must be backported to 2.7 and 2.6.
2023-04-11 10:47:19 +02:00
Frédéric Lécaille
c0aaa07aa3 MINOR: quic: Modify qc_try_rm_hp() traces
Dump at proto level the packet information when its header protection was removed.
Remove no more use qpkt_trace variable.

Must be backported to 2.7 and 2.6.
2023-04-11 10:47:19 +02:00
Frédéric Lécaille
68737316ea BUG/MINOR: quic: Wrong packet number space probing before confirmed handshake
It is possible that the handshake was not confirmed and there was no more
packet in flight to probe with. It this case the server must wait for
the client to be unblocked without probing any packet number space contrary
to what was revealed by interop tests as follows:

	[01|quic|2|uic_loss.c:65] TX loss pktns : qc@0x7fac301cd390 pktns=I pp=0
	[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=H pp=0 tole=-102ms
	[01|quic|2|uic_loss.c:67] TX loss pktns : qc@0x7fac301cd390 pktns=01RTT pp=0 if=1054 tole=-1987ms
	[01|quic|5|uic_loss.c:73] quic_loss_pktns(): leaving : qc@0x7fac301cd390
	[01|quic|5|uic_loss.c:91] quic_pto_pktns(): entering : qc@0x7fac301cd390
	[01|quic|3|ic_loss.c:121] TX PTO handshake not already completed : qc@0x7fac301cd390
	[01|quic|2|ic_loss.c:141] TX PTO : qc@0x7fac301cd390 pktns=I pp=0 dur=83ms
	[01|quic|5|ic_loss.c:142] quic_pto_pktns(): leaving : qc@0x7fac301cd390
	[01|quic|3|c_conn.c:5179] needs to probe Initial packet number space : qc@0x7fac301cd390

This bug was not visible before this commit:
     BUG/MINOR: quic: wake up MUX on probing only for 01RTT

This means that before it, one could do bad things (probing the 01RTT packet number
space before the handshake was confirmed).

Must be backported to 2.7 and 2.6.
2023-04-11 10:47:19 +02:00
Frédéric Lécaille
2513b1dd7b MINOR: quic: Trace fix in quic_pto_pktns() (handshaske status)
The handshake must be confirmed before probing the 01RTT packet number space.

Must be backported to 2.7 and 2.6.
2023-04-11 10:47:19 +02:00
Christopher Faulet
c202c740b5 BUG/MEDIUM: mux-h2: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR
When end-of-stream is reported by a H2 stream, we must take care to also
report an error is end-of-input was not reported. Indeed, it is now
mandatory to set SE_FL_EOI or SE_FL_ERROR flags when SE_FL_EOS is set.

It is a 2.8-specific issue. No backport needed.
2023-04-11 08:59:10 +02:00
Christopher Faulet
c393c9e388 BUG/MEDIUM: mux-h1: Report EOI when a TCP connection is upgraded to H2
When TCP connection is first upgrade to H1 then to H2, the stream-connector,
created by the PT mux, must be destroyed because the H2 mux cannot inherit
from it. When it is performed, the SE_FL_EOS flag is set but SE_FL_EOI must
also be set. It is now required to never set SE_FL_EOS without SE_FL_EOI or
SE_FL_ERROR.

It is a 2.8-specific issue. No backport needed.
2023-04-11 08:45:18 +02:00
Christopher Faulet
f65cf3684d MINOR: hlua: Stop to check the SC state when executing a hlua cli command
This part has changed but it was already handled by the CLI applet. There is
no reason to performe this test when a hlua cli command is executed.
2023-04-11 08:19:06 +02:00
Christopher Faulet
5220a8c5c4 BUG/MEDIUM: resolvers: Force the connect timeout for DNS resolutions
Timeouts for dynamic resolutions are not handled at the stream level but by
the resolvers themself. It means there is no connect, client and server
timeouts defined on the internal proxy used by a resolver.

While it is not an issue for DNS resolution over UDP, it can be a problem
for resolution over TCP. New sessions are automatically created when
required, and killed on excess. But only established connections are
considered. Connecting ones are never killed. Because there is no conncet
timeout, we rely on the kernel to report a connection error. And this may be
quite long.

Because resolutions are periodically triggered, this may lead to an excess
of unusable sessions in connecting state. This also prevents HAProxy to
quickly exit on soft-stop. It is annoying, especially because there is no
reason to not set a connect timeout.

So to mitigate the issue, we now use the "resolve" timeout as connect
timeout for the internal proxy attached to a resolver.

This patch should be backported as far as 2.4.
2023-04-11 08:19:06 +02:00
Christopher Faulet
142cc1b52a BUG/MINOR: resolvers: Wakeup DNS idle task on stopping
Thanks to previous commit ("BUG/MEDIUM: dns: Kill idle DNS sessions during
stopping stage"), DNS idle sessions are killed on stopping staged. But the
task responsible to kill these sessions is running every 5 seconds. It
means, when HAProxy is stopped, we can observe a delay before the process
exits.

To reduce this delay, when the resolvers task is executed, all DNS idle
tasks are woken up.

This patch must be backported as far as 2.6.
2023-04-11 08:19:06 +02:00
Christopher Faulet
e0f4717727 BUG/MEDIUM: dns: Kill idle DNS sessions during stopping stage
There is no server timeout for DNS sessions over TCP. It means idle session
cannot be killed by itself. There is a task running peridically, every 5s,
to kill the excess of idle sessions. But the last one is never
killed. During the stopping stage, it is an issue since the dynamic
resolutions are no longer performed (2ec6f14c "BUG/MEDIUM: resolvers:
Properly stop server resolutions on soft-stop").

Before the above commit, during stopping stage, the DNS sessions were killed
when a resolution was triggered. Now, nothing kills these sessions. This
prevents the process to finish on soft-stop.

To fix this bug, the task killing excess of idle sessions now kill all idle
sessions on stopping stage.

This patch must be backported as far as 2.6.
2023-04-11 08:19:06 +02:00
Christopher Faulet
211452ef9a BUG/MEDIUM: log: Eat output data when waiting for appctx shutdown
When the log applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
to exit, leading to a spinning loop on the applet.

It is 2.8-specific. No backport needed.
2023-04-11 08:19:06 +02:00
Christopher Faulet
9837bd86dc BUG/MEDIUM: stats: Eat output data when waiting for appctx shutdown
When the stats applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
to exit, leading to a spinning loop on the applet.

It is 2.8-specific. No backport needed.
2023-04-11 07:43:26 +02:00
Christopher Faulet
1901c1bf5a BUG/MEDIUM: http-client: Eat output data when waiting for appctx shutdown
When the http-client applet is executed while a shut is pending, the
remaining output data must always be consumed. Otherwise, this can prevent
the stream to exit, leading to a spinning loop on the applet.

It is 2.8-specific. No backport needed.
2023-04-11 07:43:26 +02:00
Christopher Faulet
1fb97e47f0 BUG/MEDIUM: cli: Eat output data when waiting for appctx shutdown
When the cli applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
to exit, leading to a spinning loop on the applet.

This patch should fix the issue #2107. It is 2.8-specific. No backport
needed.
2023-04-11 07:43:26 +02:00
Christopher Faulet
33af99655e BUG/MEDIUM: cli: Set SE_FL_EOI flag for '_getsocks' and 'quit' commands
An applet must never set SE_FL_EOS flag without SE_FL_EOI or SE_FL_ERROR
flags. Here, SE_FL_EOI flag was missing for "quit" or "_getsocks"
commands. Indeed, these commands are terminal.

This bug triggers a BUG_ON() recently added.

This patch is related to the issue #2107. It is 2.8-specific. No backport
needed.
2023-04-11 07:43:26 +02:00
Olivier Houchard
0963b8a07f BUG/MEDIUM: listeners: Use the right parameters for strlcpy2().
When calls to strcpy() were replaced with calls to strlcpy2(), one of them
was replaced wrong, and the source and size were inverted. Correct that.

This should fix issue #2110.
2023-04-08 15:01:57 +02:00
Willy Tarreau
fc458ec8aa CLEANUP: tree-wide: remove strpcy() from constant strings
These ones are genenerally harmless on modern compilers because the
compiler checks them. While gcc optimizes them away without even
referencing strcpy(), clang prefers to call strcpy(). Nevertheless they
prevent from enabling stricter checks so better remove them altogether.
They were all replaced by strlcpy2() and the size of the destination
which is always known there.
2023-04-07 18:14:28 +02:00
Willy Tarreau
6d4c0c2ca2 CLEANUP: ocsp: do no use strpcy() to copy a path!
strcpy() is quite nasty but tolerable to copy constants, but here
it copies a variable path into a node in a code path that's not
trivial to follow given that it takes the node as the result of
a tree lookup. Let's get rid of it and mention where the entry
is retrieved.
2023-04-07 17:57:05 +02:00
Willy Tarreau
a0fa577070 CLEANUP: tcpcheck: remove the only occurrence of sprintf() in the code
There's a single sprintf() in the whole code, in the "option smtpchk"
parser in tcpcheck.c. Let's turn it to a safer snprintf().
2023-04-07 16:04:54 +02:00
Willy Tarreau
22450af22a BUG/MINOR: lua: remove incorrect usage of strncat()
As every time strncat() is used, it's wrong, and this one is no exception.
Users often think that the length applies to the destination except it
applies to the source and makes it hard to use correctly. The bug did not
have an impact because the length was preallocated from the sum of all
the individual lengths as measured by strlen() so there was no chance one
of them would change in between. But it could change in the future. Let's
fix it to use memcpy() instead for strings, or byte copies for delimiters.

No backport is needed, though it can be done if it helps to apply other
fixes.
2023-04-07 16:04:54 +02:00
Olivier Houchard
ead43fe4f2 MEDIUM: compression: Make it so we can compress requests as well.
Add code so that compression can be used for requests as well.
New compression keywords are introduced :
"direction" that specifies what we want to compress. Valid values are
"request", "response", or "both".
"type-req" and "type-res" define content-type to be compressed for
requests and responses, respectively. "type" is kept as an alias for
"type-res" for backward compatibilty.
"algo-req" specifies the compression algorithm to be used for requests.
Only one algorithm can be provided.
"algo-res" provides the list of algorithm that can be used to compress
responses. "algo" is kept as an alias for "algo-res" for backward
compatibility.
2023-04-07 00:49:17 +02:00
Olivier Houchard
dea25f51b6 MINOR: compression: Count separately request and response compression
Duplicate the compression counters, so that we have separate counters
for request and response compression.
2023-04-07 00:47:04 +02:00
Olivier Houchard
db573e9c58 MINOR: compression: Store algo and type for both request and response
Make provision for being able to store both compression algorithms and
content-types to compress for both requests and responses. For now only
the responses one are used.
2023-04-07 00:46:59 +02:00
Olivier Houchard
dfc11da561 MINOR: compression: Prepare compression code for request compression
Make provision for storing the compression algorithm and the compression
context twice, one for requests, and the other for responses. Only the
response ones are used for now.
2023-04-07 00:46:55 +02:00
Olivier Houchard
3ce0f01b81 MINOR: compression: Make compression offload a flag
Turn compression offload into a flag in struct comp, instead of using
an int just for it.
2023-04-07 00:46:45 +02:00
Christopher Faulet
8eeec38bfa MINOR: applet: Use unsafe version to get stream from SC in the trace function
When a trace message for an applet is dumped, if the SC exists, the stream
always exists too. There is no way to attached an applet to a health-check.
So, we can use the unsafe version __sc_strm() to get the stream.

This patch is related to #2106. Not sure it will be enough for
Coverity. However, there is no bug here.
2023-04-06 08:48:17 +02:00
Aurelien DARRAGON
b28ded19a4 BUG/MINOR: errors: invalid use of memprintf in startup_logs_init()
On startup/reload, startup_logs_init() will try to export startup logs shm
filedescriptor through the internal HAPROXY_STARTUPLOGS_FD env variable.

While memprintf() is used to prepare the string to be exported via
setenv(), str_fd argument (first argument passed to memprintf()) could
be non NULL as a result of HAPROXY_STARTUPLOGS_FD env variable being
already set.

Indeed: str_fd is already used earlier in the function to store the result
of getenv("HAPROXY_STARTUPLOGS_FD").

The issue here is that memprintf() is designed to free the 'out' argument
if out != NULL, and here we don't expect str_fd to be freed since it was
provided by getenv() and would result in memory violation.

To prevent any invalid free, we must ensure that str_fd is set to NULL
prior to calling memprintf().

This must be backported in 2.7 with eba6a54cd4 ("MINOR: logs: startup-logs
can use a shm for logging the reload")
2023-04-05 17:06:38 +02:00
William Lallemand
b4e651f12f BUG/MINOR: mworker: unset more internal variables from program section
People who use HAProxy as a process 1 in containers sometimes starts
other things from the program section. This is still not recommend as
the master process has minimal features regarding process management.

Environment variables are still inherited, even internal ones.

Since 2.7, it could provoke a crash when inheriting the
HAPROXY_STARTUPLOGS_FD variable.

Note: for future releases it should be better to clean the env and sets
a list of variable to be exported. We need to determine which variables
are used by users before.

Must be backported in 2.7.
2023-04-05 16:02:36 +02:00
Amaury Denoyelle
15adc4cc4e MINOR: quic: remove address concatenation to ODCID
Previously, ODCID were concatenated with the client address. This was
done to prevent a collision between two endpoints which used the same
ODCID.

Thanks to the two previous patches, first connection generated CID is
now directly derived from the client ODCID using a hash function which
uses the client source address from the same purpose. Thus, it is now
unneeded to concatenate client address to <odcid> quic-conn member.

This change allows to simplify the quic_cid structure management and
reduce its size which is important as it is embedded several times in
various structures such as quic_conn and quic_rx_packet.

This should be backported up to 2.7.
2023-04-05 11:09:57 +02:00
Amaury Denoyelle
2c98209c1c MINOR: quic: remove ODCID dedicated tree
First connection CID generation has been altered. It is now directly
derived from client ODCID since previous commit :
  commit 162baaff7a
  MINOR: quic: derive first DCID from client ODCID

This patch removes the ODCID tree which is now unneeded. On connection
lookup via CID, if a DCID is not found the hash derivation is performed
for an INITIAL/0-RTT packet only. In case a client has used multiple
times an ODCID, this will allow to retrieve our generated DCID in the
CID tree without storing the ODCID node.

The impact of this two combined patch is that it may improve slightly
haproxy memory footprint by removing a tree node from quic_conn
structure. The cpu calculation induced by hash derivation should only be
performed only a few times per connection as the client will start to
use our generated CID as soon as it received it.

This should be backported up to 2.7.
2023-04-05 11:07:01 +02:00
Amaury Denoyelle
162baaff7a MINOR: quic: derive first DCID from client ODCID
Change the generation of the first CID of a connection. It is directly
derived from the client ODCID using a 64-bits hash function. Client
address is added to avoid collision between clients which could use the
same ODCID.

For the moment, this change as no functional impact. However, it will be
directly used for the next commit to be able to remove the ODCID tree.

This should be backported up to 2.7.
2023-04-05 11:06:04 +02:00
Frédéric Lécaille
ce5c145df5 BUG/MINOR: quic: Possible crashes in qc_idle_timer_task()
This is due to this commit:

    MINOR: quic: Add trace to debug idle timer task issues

where has been added without having been tested at developer level.
<qc> was dereferenced after having been released by qc_conn_release().

Set qc to NULL value after having been released to forbid its dereferencing.
Add a check for qc->idle_timer_task in the traces added by the mentionned
commit above to prevent its dereferencing if NULL.
Take the opportunity of this patch to modify trace events from
QUIC_EV_CONN_SSLALERT to QUIC_EV_CONN_IDLE_TIMER.

Must be backported to 2.6 and 2.7.
2023-04-05 11:03:20 +02:00
Christopher Faulet
2954bcc1e8 BUG/MINOR: http-ana: Don't switch message to DATA when waiting for payload
The HTTP message must remains in BODY state during the analysis, to be able
to report accurate termination state in logs. It is also important to know
the HTTP analysis is still in progress. Thus, when we are waiting for the
message payload, the message is no longer switch to DATA state. This was
used to not process "Expect: " header at each evaluation. But thanks to the
previous patch, it is no long necessary.

This patch also fixes a bug in the lua filter api. Some functions must be
called during the message analysis and not during the payload forwarding. It
is not valid to try to manipulate headers during the forward stage because
headers are already forwarded. We rely on the message state to detect
errors. So the api was unusable if a "wait-for-body" action was used.

This patch shoud fix the issue #2093. It relies on the commit:

  * MINOR: http-ana: Add a HTTP_MSGF flag to state the Expect header was checked

Both must be backported as far as 2.5.
2023-04-05 10:53:20 +02:00
Christopher Faulet
ffcffa8e93 MINOR: http-ana: Add a HTTP_MSGF flag to state the Expect header was checked
HTTP_MSGF_EXPECT_CHECKED is now set on the request message to know the
"Expect: " header was already handled, if any. The flag is set from the
moment we try to handle the header to send a "100-continue" response,
whether it was found or not.

This way, when we are waiting for the request payload, thanks to this flag,
we only try to handle "Expect: " header only once. Before it was performed
by changing the message state from BODY to DATA. But this has some side
effects and it is no accurate. So, it is better to rely on a flag to do so.
2023-04-05 10:33:32 +02:00
Aurelien DARRAGON
223770ddca MINOR: hlua/event_hdl: per-server event subscription
Now that event_hdl api is properly implemented in hlua, we may add the
per-server event subscription in addition to the global event
subscription.

Per-server subscription allows to be notified for events related to
single server. It is useful to track a server UP/DOWN and DEL events.

It works exactly like core.event_sub() except that the subscription
will be performed within the server dedicated subscription list instead
of the global one.
The callback function will only be called for server events affecting
the server from which the subscription was performed.

Regarding the implementation, it is pretty trivial at this point, we add
more doc than code this time.

Usage examples have been added to the (lua) documentation.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
c84899c636 MEDIUM: hlua/event_hdl: initial support for event handlers
Now that the event handler API is pretty mature, we can expose it in
the lua API.

Introducing the core.event_sub(<event_types>, <cb>) lua function that
takes an array of event types <event_types> as well as a callback
function <cb> as argument.

The function returns a subscription <sub> on success.
Subscription <sub> allows you to manage the subscription from anywhere
in the script.
To this day only the sub->unsub method is implemented.

The following event types are currently supported:
  - "SERVER_ADD": when a server is added
  - "SERVER_DEL": when a server is removed from haproxy
  - "SERVER_DOWN": server states goes from up to down
  - "SERVER_UP": server states goes from down to up

As for the <cb> function: it will be called when one of the registered
event types occur. The function will be called with 3 arguments:
  cb(<event>,<data>,<sub>)

<event>: event type (string) that triggered the function.
(could be any of the types used in <event_types> when registering
the subscription)

<data>: data associated with the event (specific to each event family).

For "SERVER_" family events, server details such as server name/id/proxy
will be provided.
If the server still exists (not yet deleted), a reference to the live
server is provided to spare you from an additionnal lookup if you need
to have direct access to the server from lua.

<sub> refers to the subscription. In case you need to manage it from
within an event handler.
(It refers to the same subscription that the one returned from
core.event_sub())

Subscriptions are per-thread: the thread that will be handling the
event is the one who performed the subscription using
core.event_sub() function.

Each thread treats events sequentially, it means that if you have,
let's say SERVER_UP, then SERVER_DOWN in a short timelapse, then your
cb function will first be called with SERVER_UP, and once you're done
handling the event, your function will be called again with SERVER_DOWN.

This is to ensure event consitency when it comes to logging / triggering
logic from lua.

Your lua cb function may yield if needed, but you're pleased to process
the event as fast as possible to prevent the event queue from growing up

To prevent abuses, if the event queue for the current subscription goes
over 100 unconsumed events, the subscription will pause itself
automatically for as long as it takes for your handler to catch up.
This would lead to events being missed, so a warning will be emitted in
the logs to inform you about that. This is not something you want to let
happen too often, it may indicate that you subscribed to an event that
is occurring too frequently or/and that your callback function is too
slow to keep up the pace and you should review it.

If you want to do some parallel processing because your callback
functions are slow: you might want to create subtasks from lua using
core.register_task() from within your callback function to perform the
heavy job in a dedicated task and allow remaining events to be processed
more quickly.

Please check the lua documentation for more information.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
4e5e26641d MINOR: proxy: add findserver_unique_id() and findserver_unique_name()
Adding alternative findserver() functions to be able to perform an
unique match based on name or puid and by leveraging revision id (rid)
to make sure the function won't match with a new server reusing the
same name or puid of the "potentially deleted" server we were initially
looking for.

For example, if you were in the position of finding a server based on
a given name provided to you by a different context:

Since dynamic servers were implemented, between the time the name was
picked and the time you will perform the findserver() call some dynamic
server deletion/additions could've been performed in the mean time.

In such cases, findserver() could return a new server that re-uses the
name of a previously deleted server. Depending on your needs, it could
be perfectly fine, but there are some cases where you want to lookup
the original server that was provided to you (if it still exists).
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
f751a97a11 MINOR: event_hdl: pause/resume for subscriptions
While working on event handling from lua, the need for a pause/resume
function to temporarily disable a subscription was raised.

We solve this by introducing the EHDL_SUB_F_PAUSED flag for
subscriptions.

The flag is set via _pause() and cleared via _resume(), and it is
checked prior to notifying the subscription in publish function.

Pause and Resume functions are also available for via lookups for
identified subscriptions.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
b4b7320a6a MINOR: event_hdl: add event_hdl_async_equeue_size() function
Use event_hdl_async_equeue_size() in advanced async task handler to
get the near real-time event queue size.

By near real-time, you should understand that the queue size is not
updated during element insertion/removal, but shortly before insertion
and shortly after removal, so the size should reflect the approximate
queue size at a given time but should definitely not be used as a
unique source of truth.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
b289fd1420 MINOR: event_hdl: normal tasks support for advanced async mode
advanced async mode (EVENT_HDL_ASYNC_TASK) provided full support for
custom tasklets registration.

Due to the similarities between tasks and tasklets, it may be useful
to use the advanced mode with an existing task (not a tasklet).
While the API did not explicitly disallow this usage, things would
get bad if we try to wakeup a task using tasklet_wakeup() for notifying
the task about new events.

To make the API support both custom tasks and tasklets, we use the
TASK_IS_TASKLET() macro to call the proper waking function depending
on the task's type:

  - For tasklets: we use tasklet_wakeup()
  - For tasks: we use task_wakeup()

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
afcfc20e14 BUG/MEDIUM: event_hdl: fix async data refcount issue
In _event_hdl_publish(), when publishing an event to async handler(s),
async_data is allocated only once and then relies on a refcount
logic to reuse the same data block for multiple async event handlers.
(this allows to save significant amount of memory)

Because the refcount is first set to 0, there is a small race where
the consumers could consume async data (async data refcount reaching 0)
before publishing is actually over.
The consequence is that async data may be freed by one of the consumers
while we still rely on it within _event_hdl_publish().

This was discovered by chance when stress-testing the API with multiple
async handlers registered to the same event: some of the handlers were
notified about a new event for which the event data was already freed,
resulting in invalid reads and/or segfaults.

To fix this, we first set the refcount to 1, assuming that the
publish function relies on async_data until the publish is over.
At the end of the publish, the reference to the async data is dropped.

This way, async_data is either freed by _event_hdl_publish() itself
or by one of the consumers, depending on who is the last one relying
on it.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
ef6ca67176 BUG/MEDIUM: event_hdl: clean soft-stop handling
soft-stop was not explicitly handled in event_hdl API.

Because of this, event_hdl was causing some leaks on deinit paths.
Moreover, a task responsible for handling events could require some
additional cleanups (ie: advanced async task), and as the task was not
protected against abort when soft-stopping, such cleanup could not be
performed unless the task itself implements the required protections,
which is not optimal.

Consider this new approach:
 'jobs' global variable is incremented whenever an async subscription is
 created to prevent the related task from being aborted before the task
 acknowledges the final END event.

 Once the END event is acknowledged and freed by the task, the 'jobs'
 variable is decremented, and the deinit process may continue (including
 the abortion of remaining tasks not guarded by the 'jobs' variable).

To do this, a new global mt_list is required: known_event_hdl_sub_list
This list tracks the known (initialized) subscription lists within the
process.

sub_lists are automatically added to the "known" list when calling
event_hdl_sub_list_init(), and are removed from the list with
event_hdl_sub_list_destroy().

This allows us to implement a global thread-safe event_hdl deinit()
function that is automatically called on soft-stop thanks to signal(0).
When event_hdl deinit() is initiated, we simply iterate against the known
subscription lists to destroy them.

event_hdl_subscribe_ptr() was slightly modified to make sure that a sub_list
may not accept new subscriptions once it is destroyed (removed from the
known list)
This can occur between the time the soft-stop is initiated (signal(0)) and
haproxy actually enters in the deinit() function (once tasks are either
finished or aborted and other threads already joined).

It is safe to destroy() the subscription list multiple times as long
as the pointer is still valid (ie: first on soft-stop when handling
the '0' signal, then from regular deinit() path): the function does
nothing if the subscription list is already removed.

We partially reverted "BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe"
since we can use parent mt_list locking instead of a dedicated lock to make
the check gainst duplicate subscription ID.
(insert_lock is not useful anymore)

The check in itself is not changed, only the locking method.

sizeof(event_hdl_sub_list) slightly increases: from 24 bits to 32bits due
to the additional mt_list struct within it.

With that said, having thread-safe list to store known subscription lists
is a good thing: it could help to implement additional management
logic for subcription lists and could be useful to add some stats or
debugging tools in the future.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
3a81e997ac MINOR: event_hdl: global sublist management clarification
event_hdl_sub_list_init() and event_hdl_sub_list_destroy() don't expect
to be called with a NULL argument (to use global subscription list
implicitly), simply because the global subscription list init and
destroy is internally managed.

Adding BUG_ON() to detect such invalid usages, and updating some comments
to prevent confusion around these functions.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
d514ca45c6 BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe
List insertion in event_hdl_subscribe() was not thread-safe when dealing
with unique identifiers. Indeed, in this case the list insertion is
conditional (we check for a duplicate, then we insert). And while we're
using mt lists for this, the whole operation is not atomic: there is a
race between the check and the insertion.
This could lead to the same ID being registered multiple times with
concurrent calls to event_hdl_subscribe() on the same ID.

To fix this, we add 'insert_lock' dedicated lock in the subscription
list struct. The lock's cost is nearly 0 since it is only used when
registering identified subscriptions and the lock window is very short:
we only guard the duplicate check and the list insertion to make the
conditional insertion "atomic" within a given subscription list.
This is the only place where we need the lock: as soon as the item is
properly inserted we're out of trouble because all other operations on
the list are already thread-safe thanks to mt lists.

A new lock hint is introduced: LOCK_EHDL which is dedicated to event_hdl

The patch may seem quite large since we had to rework the logic around
the subscribe function and switch from simple mt_list to a dedicated
struct wrapping both the mt_list and the insert_lock for the
event_hdl_sub_list type.
(sizeof(event_hdl_sub_list) is now 24 instead of 16)

However, all the changes are internal: we don't break the API.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
b8038996e9 MINOR: hlua: support for optional arguments to core.register_task()
core.register_task(function) may now take up to 4 additional arguments
that will be passed as-is to the task function.
This could be convenient to spawn sub-tasks from existing functions
supporting core.register_task() without the need to use global
variables to pass some context to the newly created task function.

The new prototype is:

  core.register_task(function[, arg1[, arg2[, ...[, arg4]]]])

Implementation remains backward-compatible with existing scripts.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
94ee6632ee MINOR: hlua_fcn: add server->get_rid() method
Server revision ID was recently added to haproxy with 61e3894
("MINOR: server: add srv->rid (revision id) value")

Let's add it to the hlua server class.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
6b0b9bd39f BUG/MEDIUM: hlua: prevent deadlocks with main lua lock
Main lua lock is used at various places in the code.

Most of the time it is used from unprotected lua environments,
in which case the locking is mandatory.
But there are some cases where the lock is attempted from protected
lua environments, meaning that lock is already owned by the current
thread. Thus new locking attempt should be skipped to prevent any
deadlocks from occuring.

To address this, "already_safe" lock hint was implemented in
hlua_ctx_init() function with commit bf90ce1
("BUG/MEDIUM: lua: dead lock when Lua tasks are trigerred")

But this approach is not very safe, for 2 reasons:

First reason is that there are still some code paths that could lead
to deadlocks.
For instance, in register_task(), hlua_ctx_init() is called with
already_safe set to 1 to prevent deadlock from occuring.

But in case of task init failure, hlua_ctx_destroy() will be called
from the same environment (protected environment), and hlua_ctx_destroy()
does not offer the already_safe lock hint.. resulting in a deadlock.

Second reason is that already_safe hint is used to completely skip
SET_LJMP macros (which manipulates the lock internally), resulting
in some logics in the function being unprotected from lua aborts in
case of unexpected errors when manipulating the lua stack (the lock
does not protect against longjmps)

Instead of leaving the locking responsibility to the caller, which is
quite error prone since we must find out ourselves if we are or not in
a protected environment (and is not robust against code re-use),
we move the deadlock protection logic directly in hlua_lock() function.

Thanks to a thread-local lock hint, we can easily guess if the current
thread already owns the main lua lock, in which case the locking attempt
is skipped. The thread-local lock hint is implemented as a counter so
that the lock is properly dropped when the counter reaches 0.
(to match actual lock() and unlock() calls)

This commit depends on "MINOR: hlua: simplify lua locking"
It may be backported to every stable versions.
[prior to 2.5 lua filter API did not exist, filter-related parts
should be skipped]
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
e36f803b71 MINOR: hlua: simplify lua locking
The check on lua state==0 to know whether locking is required or not can
be performed in a locking wrapper to simplify things a bit and prevent
implementation errors.

Locking from hlua context should now be performed via hlua_lock(L) and
unlocking via hlua_unlock(L)
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
fde199dddc CLEANUP: hlua: use hlua_unref() instead of luaL_unref()
Replacing some luaL_unref(, LUA_REGISTRYINDEX) calls with hlua_unref()
which is simpler to use and more explicit.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
4fdf8b58f2 CLEANUP: hlua: use hlua_pushref() instead of lua_rawgeti()
Using hlua_pushref() everywhere temporary lua objects are involved.
(ie: hlua_checkfunction(), hlua_checktable...)
Those references are expected to be cleared using hlua_unref() when
they are no longer used.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
73d1a98d52 CLEANUP: hlua: use hlua_ref() instead of luaL_ref()
Using hlua_ref() everywhere temporary lua objects are involved.
Those references are expected to be cleared using hlua_unref()
when they are no longer used.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
55afbedfb4 BUG/MINOR: hlua: prevent function and table reference leaks on errors
Several error paths were leaking function or table references.
(Obtained through hlua_checkfunction() and hlua_checktable() functions)

Now we properly release the references thanks to hlua_unref() in
such cases.

This commit depends on "MINOR: hlua: add simple hlua reference handling API"

This could be backported in every stable versions although it is not
mandatory as such leaks only occur on rare error/warn paths.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
16d047b615 BUG/MINOR: hlua: fix reference leak in hlua_post_init_state()
hlua init function references were not released during
hlua_post_init_state().

Hopefully, this function is only used during startup so the resulting
leak is not a big deal.
Since each init lua function runs precisely once, it is safe to release
the ref as soon as the function is restored on the stack.

This could be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
be58d6683c BUG/MINOR: hlua: fix reference leak in core.register_task()
In core.register_task(): we take a reference to the function passed as
argument in order to push it in the new coroutine substack.
However, once pushed in the substack: the reference is not useful
anymore and should be cleared.
Currently, this is not the case in hlua_register_task().

Explicitly dropping the reference once the function is pushed to the
coroutine's stack to prevent any reference leak (which could contribute
to resource shortage)

This may be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
9ee0d04770 MINOR: hlua: fix return type for hlua_checkfunction() and hlua_checktable()
hlua_checktable() and hlua_checkfunction() both return the raw
value of luaL_ref() function call.
As luaL_ref() returns a signed int, both functions should return a signed
int as well to prevent any misuse of the returned reference value.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
f8f8a2b872 MINOR: hlua: add simple hlua reference handling API
We're doing this in an attempt to simplify temporary lua objects
references handling.

Adding the new hlua_unref() function to release lua object references
created using luaL_ref(, LUA_REGISTRYINDEX)
(ie: hlua_checkfunction() and hlua_checktable())

Failure to release unused object reference prevents the reference index
from being re-used and prevents the referred ressource from being garbage
collected.

Adding hlua_pushref(L, ref) to replace
lua_rawgeti(L, LUA_REGISTRYINDEX, ref)

Adding hlua_ref(L) to replace luaL_ref(L, LUA_REGISTRYINDEX)
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
60ab0f7d20 CLEANUP: hlua: fix conflicting comment in hlua_ctx_destroy()
The comment for the hlua_ctx_destroy() function states that the "lua"
struct is not freed.

This is not true anymore since 2c8b54e7 ("MEDIUM: lua: remove Lua struct
from session, and allocate it with memory pools")

Updating the function comment to properly report the actual behavior.

This could be backported in every stable versions with 2c8b54e7
("MEDIUM: lua: remove Lua struct from session, and allocate it with memory pools")
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
c4b2437037 MEDIUM: hlua_fcn/api: remove some old server and proxy attributes
Since ("MINOR: hlua_fcn: alternative to old proxy and server attributes"):
 - s->name(), s->puid() are superseded by s->get_name() and s->get_puid()
 - px->name(), px->uuid() are superseded by px->get_name() and
   px->get_uuid()

And considering this is now the proper way to retrieve proxy name/uuid
and server name/puid from lua:

We're now removing such legacy attributes, but for retro-compatibility
purposes we will be emulating them and warning the user for some time
before completely dropping their support.

To do this, we first remove old legacy code.
Then we move server and proxy methods out of the metatable to allow
direct elements access without systematically involving the "__index"
metamethod.

This allows us to involve the "__index" metamethod only when the requested
key is missing from the table.

Then we define relevant hlua_proxy_index and hlua_server_index functions
that will be used as the "__index" metamethod to respectively handle
"name, uuid" (proxy) or "name, puid" (server) keys, in which case we
warn the user about the need to use the new getter function instead the
legacy attribute (to prepare for the potential upcoming removal), and we
call the getter function to return the value as if the getter function
was directly called from the script.

Note: Using the legacy variables instead of the getter functions results
in a slight overhead due to the "__index" metamethod indirection, thus
it is recommended to switch to the getter functions right away.

With this commit we're also adding a deprecation notice about legacy
attributes.
2023-04-05 08:58:16 +02:00
Thierry Fournier
1edf36a369 MEDIUM: hlua_fcn: dynamic server iteration and indexing
This patch proposes to enumerate servers using internal HAProxy list.
Also, remove the flag SRV_F_NON_PURGEABLE which makes the server non
purgeable each time Lua uses the server.

Removing reg-tests/cli_delete_server_lua.vtc since this test is no
longer relevant (we don't set the SRV_F_NON_PURGEABLE flag anymore)
and we already have a more generic test:
  reg-tests/server/cli_delete_server.vtc

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2023-04-05 08:58:16 +02:00
Thierry Fournier
b0467730a0 MINOR: hlua_fcn: alternative to old proxy and server attributes
This patch adds new lua methods:
- "Proxy.get_uuid()"
- "Proxy.get_name()"
- "Server.get_puid()"
- "Server.get_name()"

These methods will be equivalent to their old analog Proxy.{uuid,name}
and Server.{puid,name} attributes, but this will be the new preferred
way to fetch such infos as it duplicates memory only when necessary and
thus reduce the overall lua Server/Proxy objects memory footprint.

Legacy attributes (now superseded by the explicit getters) are expected
to be removed some day.

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2023-04-05 08:58:16 +02:00
Thierry Fournier
467913c84e MEDIUM: hlua: Dynamic list of frontend/backend in Lua
When HAproxy is loaded with a lot of frontends/backends (tested with 300k),
it is slow to start and it uses a lot of memory just for indexing backends
in the lua tables.

This patch uses the internal frontend/backend index of HAProxy in place of
lua table.

HAProxy startup is now quicker as each frontend/backend object is created
on demand and not at init.
This has to come with some cost: the execution of Lua will be a little bit
slower.
2023-04-05 08:58:16 +02:00
Thierry Fournier
599f2311a8 MINOR: hlua: Fix two functions that return nothing useful
Two lua init function seems to return something useful, but it
is not the case. The function "hlua_concat_init" seems to return
a failure status, but the function never fails. The function
"hlua_fcn_reg_core_fcn" seems to return a number of elements in
the stack, but it is not the case.
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
87f52974ba BUG/MINOR: hlua: enforce proper running context for register_x functions
register_{init, converters, fetches, action, service, cli, filter} are
meant to run exclusively from body context according to the
documentation (unlike register_task which is designed to work from both
init and runtime contexts)

A quick code inspection confirms that only register_task implements
the required precautions to make it safe out of init context.

Trying to use those register_* functions from a runtime lua task will
lead to a program crash since they all assume that they are running from
the main lua context and with no concurrent runs:

    core.register_task(function()
      core.register_init(function()
      end)
    end)

When loaded from the config, the above example would segfault.

To prevent this undefined behavior, we now report an explicit error if
the user tries to use such functions outside of init/body context.

This should be backported in every stable versions.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
795441073c MINOR: hlua: properly handle hlua_process_task HLUA_E_ETMOUT
In hlua_process_task: when HLUA_E_ETMOUT was returned by
hlua_ctx_resume(), meaning that the lua task reached
tune.lua.task-timeout (default: none),
we logged "Lua task: unknown error." before stopping the task.

Now we properly handle HLUA_E_ETMOUT to report a meaningful error
message.
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
0ebd41ff50 BUG/MINOR: hlua: hook yield does not behave as expected
In function hlua_hook, a yieldk is performed when function is yieldable.

But the following code in that function seems to assume that the yield
never returns, which is not the case!

Moreover, Lua documentation says that in this situation the yieldk call
must immediately be followed by a return.

This patch adds a return statement after the yieldk call.
It also adds some comments and removes a needless lua_sethook call.

It could be backported to all stable versions, but it is not mandatory,
because even if it is undefined behavior this bug doesn't seem to
negatively affect lua 5.3/5.4 stacks.
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
32483ecaac MINOR: server: correctly free servers on deinit()
srv_drop() function is reponsible for freeing the server when the
refcount reaches 0.
There is one exception: when global.mode has the MODE_STOPPING flag set,
srv_drop() will ignore the refcount and free the server on first
invocation.

This logic has been implemented with 13f2e2ce ("BUG/MINOR: server: do
not use refcount in free_server in stopping mode") and back then doing
so was not a problem since dynamic server API was just implemented and
srv_take() and srv_drop() were not widely used.

Now that dynamic server API is starting to get more popular we cannot
afford to keep the current logic: some modules or lua scripts may hold
references to existing server and also do their cleanup in deinit phases

In this kind of situation, it would be easy to trigger double-frees
since every call to srv_drop() on a specific server will try to free it.

To fix this, we take a different approach and try to fix the issue at
the source: we now properly drop server references involved with
checks/agent_checks in deinit_srv_check() and deinit_srv_agent_check().

While this could theorically be backported up to 2.6, it is not very
relevant for now since srv_drop() usage in older versions is very
limited and we're only starting to face the issue in mid 2.8
developments. (ie: lua core updates)
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
b5ee8bebfc MINOR: server: always call ssl->destroy_srv when available
In srv_drop(), we only call the ssl->destroy_srv() method on
specific conditions.

But this has two downsides:

First, destroy_srv() is reponsible for freeing data that may have been
allocated in prepare_srv(), but not exclusively: it also frees
ssl-related parameters allocated when parsing a server entry, such as
ca-file for instance.

So this is quite error-prone, we could easily miss a condition where
some data needs to be deallocated using destroy_srv() even if
prepare_srv() was not used (since prepare_srv() is also conditional),
thus resulting in memory leaks.

Moreover, depending on srv->proxy to guard the check is probably not
a good idea here, since srv_drop() could be called in late de-init paths
in which related proxy could be freed already. srv_drop() should only
take care of freeing local server data without external logic.

Thankfully, destroy_srv() function performs the necessary checks to
ensure that a systematic call to the function won't result in invalid
reads or double frees.

No backport needed.
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
cca3355074 BUG/MINOR: log: free log forward proxies on deinit()
Proxies belonging to the cfg_log_forward proxy list are not cleaned up
in haproxy deinit() function.
We add the missing cleanup directly in the main deinit() function since
no other specific function may be used for this.

This could be backported up to 2.4
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
9b1d15f53a BUG/MINOR: sink: free forward_px on deinit()
When a ring section is configured, a new sink is created and forward_px
proxy may be allocated and assigned to the sink.
Such sink-related proxies are added to the sink_proxies_list and thus
don't belong to the main proxy list which is cleaned up in
haproxy deinit() function.

We don't have to manually clean up sink_proxies_list in the main deinit()
func:
sink API already provides the sink_deinit() function so we just add the
missing free_proxy(sink->forward_px) there.

This could be backported up to 2.4.
[in 2.4, commit b0281a49 ("MINOR: proxy: check if p is NULL in free_proxy()")
must be backported first]
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
99a8d0f5d8 BUG/MINOR: stats: properly handle server stats dumping resumption
In stats_dump_proxy_to_buffer() function, special care was taken when
dealing with servers dump.
Indeed, stats_dump_proxy_to_buffer() can be interrupted and resumed if
buffer space is not big enough to complete dump.
Thus, a reference is taken on the server being dumped in the hope that
the server will still be valid when the function resumes.
(to prevent the server from being freed in the meantime)

While this is now true thanks to:
-  "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"

We still have an issue: when resuming, saved server reference is not
dropped.
This prevents the server from being freed when we no longer use it.
Moreover, as the saved server might now be deleted
(SRV_F_DELETED flag set), the current deleted server may still be dumped
in the stats and while this is not a bug, this could be misleading for
the user.

Let's add a px_st variable to detect if the stats_dump_proxy_to_buffer()
is being resumed at the STAT_PX_ST_SV stage: perform some housekeeping
to skip deleted servers and properly drop the reference on the saved
server.

This commit depends on:
 - "MINOR: server: add SRV_F_DELETED flag"
 - "BUG/MINOR: server/del: fix legacy srv->next pointer consistency"

This should be backported up to 2.6
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
f175b08bfb BUG/MINOR: server/del: fix srv->next pointer consistency
We recently discovered a bug which affects dynamic server deletion:

When a server is deleted, it is removed from the "visible" server list.
But as we've seen in previous commit
("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by
someone who keeps a reference on it (waiting for the final srv_drop()).
Throughout this transient state, server ptr is still valid (may be
dereferenced) and the flag SRV_F_DELETED is set.

However, as the server is not part of server list anymore, we have
an issue: srv->next pointer won't be updated anymore as the only place
where we perform such update is in cli_parse_delete_server() by
iterating over the "visible" server list.

Because of this, we cannot guarantee that a server with the
SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing
to a fully removed (already freed) server.

This problem can be easily demonstrated with server dumping in
the stats:

server list dumping is performed in stats_dump_proxy_to_buffer()
The function can be interrupted and resumed later by design.
ie: output buffer is full: partial dump and finish the dump after
the flush

This is implemented by calling srv_take() on the server being dumped,
and only releasing it when we're done with it using srv_drop().
(drop can be delayed after function resume if buffer is full)

While the function design seems OK, it works with the assumption that
srv->next will still be valid after the function resumes, which is
not true. (especially if multiple servers are being removed in between
the 2 dumping attempts)

In practice, this did not cause any crash yet (at least this was not
reported so far), because server dumping is so fast that it is very
unlikely that multiple server deletions make their way between 2
dumping attempts in most setups. But still, this is a problem that we
need to address because some upcoming work might depend on this
assumption as well and for the moment it is not safe at all.

========================================================================

Here is a quick reproducer:

With this patch, we're creating a large deletion window of 3s as soon
as we reach a server named "t2" while iterating over the list.

This will give us plenty of time to perform multiple deletions before
the function is resumed.

    |  diff --git a/src/stats.c b/src/stats.c
    |  index 84a4f9b6e..15e49b4cd 100644
    |  --- a/src/stats.c
    |  +++ b/src/stats.c
    |  @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn *sc, struct htx *htx,
    |                    * Temporarily increment its refcount to prevent its
    |                    * anticipated cleaning. Call free_server to release it.
    |                    */
    |  +                struct server *orig = ctx->obj2;
    |                   for (; ctx->obj2 != NULL;
    |                          ctx->obj2 = srv_drop(sv)) {
    |
    |                           sv = ctx->obj2;
    |  +                        printf("sv = %s\n", sv->id);
    |                           srv_take(sv);
    |  +                        if (!strcmp("t2", sv->id) && orig == px->srv) {
    |  +                                printf("deletion window: 3s\n");
    |  +                                thread_idle_now();
    |  +                                thread_harmless_now();
    |  +                                sleep(3);
    |  +                                thread_harmless_end();
    |  +
    |  +                                thread_idle_end();
    |  +
    |  +                                goto full; /* simulate full buffer */
    |  +                        }
    |
    |                           if (htx) {
    |                                   if (htx_almost_full(htx))
    |  @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx *appctx)
    |           struct channel *res = sc_ic(sc);
    |           struct htx *req_htx, *res_htx;
    |
    |  +        printf("http dump\n");
    |           /* only proxy stats are available via http */
    |           ctx->domain = STATS_DOMAIN_PROXY;
    |

Ok, we're ready, now we start haproxy with the following conf:

       global
          stats socket /tmp/ha.sock  mode 660  level admin  expose-fd listeners thread 1-1
          nbthread 2

       frontend stats
           mode http
           bind *:8081 thread 2-2
           stats enable
           stats uri /

       backend farm
               server t1 127.0.0.1:1899  disabled
               server t2 127.0.0.1:18999 disabled
               server t3 127.0.0.1:18998 disabled
               server t4 127.0.0.1:18997 disabled

And finally, we execute the following script:

       curl localhost:8081/stats&
       sleep .2
       echo "del server farm/t2" | nc -U /tmp/ha.sock
       echo "del server farm/t3" | nc -U /tmp/ha.sock

This should be enough to reveal the issue, I easily manage to
consistently crash haproxy with the following reproducer:

    http dump
    sv = t1
    http dump
    sv = t1
    sv = t2
    deletion window = 3s
    [NOTICE]   (2940566) : Server deleted.
    [NOTICE]   (2940566) : Server deleted.
    http dump
    sv = t2
    sv = �����U
    [1]    2940566 segmentation fault (core dumped)  ./haproxy -f ttt.conf

========================================================================

To fix this, we add prev_deleted mt_list in server struct.
For a given "visible" server, this list will contain the pending
"deleted" servers references that point to it using their 'next' ptr.

This way, whenever this "visible" server is going to be deleted via
cli_parse_delete_server() it will check for servers in its
'prev_deleted' list and update their 'next' pointer so that they no
longer point to it, and then it will push them in its
'next->prev_deleted' list to transfer the update responsibility to the
next 'visible' server (if next != NULL).

Then, following the same logic, the server about to be removed in
cli_parse_delete_server() will push itself as well into its
'next->prev_deleted' list (if next != NULL) so that it may still use its
'next' ptr for the time it is in transient removal state.

In srv_drop(), right before the server is finally freed, we make sure
to remove it from the 'next->prev_deleted' list so that 'next' won't
try to perform the pointers update for this server anymore.
This has to be done atomically to prevent 'next' srv from accessing a
purged server.

As a result:
  for a valid server, either deleted or not, 'next' ptr will always
  point to a non deleted (ie: visible) server.

With the proposed fix, and several removal combinations (including
unordered cli_parse_delete_server() and srv_drop() calls), I cannot
reproduce the crash anymore.

Example tricky removal sequence that is now properly handled:

sv list: t1,t2,t3,t4,t5,t6

ops:
   take(t2)
   del(t4)
   del(t3)
   del(t5)
   drop(t3)
   drop(t4)
   drop(t5)
   drop(t2)
2023-04-05 08:58:16 +02:00
Aurelien DARRAGON
75b9d1c041 MINOR: server: add SRV_F_DELETED flag
Set the SRV_F_DELETED flag when server is removed from the cli.

When removing a server from the cli (in cli_parse_delete_server()),
we update the "visible" server list so that the removed server is no
longer part of the list.

However, despite the server being removed from "visible" server list,
one could still access the server data from a valid ptr (ie: srv_take())

Deleted flag helps detecting when a server is in transient removal
state: that is, removed from the list, thus not visible but not yet
purged from memory.
2023-04-05 08:58:16 +02:00
Christopher Faulet
8019f78326 MINOR: stconn/applet: Add BUG_ON_HOT() to be sure SE_FL_EOS is never set alone
SE_FL_EOS flag must never be set on the SE descriptor without SE_FL_EOI or
SE_FL_ERROR. When a mux or an applet report an end of stream, it must be
able to state if it is the end of input too or if it is an error.

Because all this part was recently refactored, especially the applet part,
it is a bit sensitive. Thus a BUG_ON_HOT() is used and not a BUG_ON().
2023-04-05 08:57:06 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to
chn_prod() or chn_cons() function because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
2023-04-05 08:57:06 +02:00
Christopher Faulet
904763f562 MINOR: stconn/channel: Move CF_EOI into the SC and rename it
The channel flag CF_EOI is renamed to SC_FL_EOI and moved into the
stream-connector.
2023-04-05 08:57:06 +02:00
Christopher Faulet
be08df8fb3 MEDIUM: http_client: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. Here, the applet is a bit
refactored to handle SE descriptor EOS, EOI and ERROR flags
2023-04-05 08:57:06 +02:00
Christopher Faulet
df15a5d1f3 MEDIUM: stats: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.
2023-04-05 08:57:06 +02:00
Christopher Faulet
a739dc22c5 MEDIUM: sink: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.
2023-04-05 08:57:06 +02:00
Christopher Faulet
4b866959d8 MINOR: sink: Remove the tests on the opposite SC state to process messages
The state of the opposite SC is already tested to wait the connection is
established before sending messages. So, there is no reason to test it again
before looping on the ring buffer.
2023-04-05 08:57:06 +02:00
Christopher Faulet
3d949010bc MEDIUM: peers: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting the applet to be released.
2023-04-05 08:57:05 +02:00
Christopher Faulet
22a88f06d4 MEDIUM: log: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.

Here, the refactoring only reports errors by setting SE_FL_ERROR flag.
2023-04-05 08:57:05 +02:00
Christopher Faulet
31572229ed MEDIUM: hlua/applet: Use the sedesc to report and detect end of processing
There are 3 kinds of applet in lua: The co-sockets, the TCP services and the
HTTP services. The three are refactored to use the SE descriptor instead of
the channel to report error and end-of-stream.
2023-04-05 08:57:05 +02:00
Christopher Faulet
d550d26a39 MEDIUM: spoe: Use the sedesc to report and detect end of processing
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting the applet to be released.

This patch is bit different than others because messages handling is
dispatched in several functions. But idea if the same.
2023-04-05 08:57:05 +02:00
Christopher Faulet
26769b0775 MEDIUM: dns: Use the sedesc to report and detect end of processing
It is now the dns turn to be refactored to use the SE descriptor instead of
the channel to report error and end-of-stream. We must just be sure to
consume request data when we are waiting the applet to be released.
2023-04-05 08:57:05 +02:00
Christopher Faulet
4d3283f44b MINOR: dns: Remove the test on the opposite SC state to send requests
The state of the opposite SC is already tested to wait the connection is
established before sending requests. So, there is no reason to test it again
before looping on the ring buffer.
2023-04-05 08:57:05 +02:00
Christopher Faulet
2fd0c7669d MEDIUM: cli: Use the sedesc to report and detect end of processing
It is the same kind of change than for the cache applet. Idea is to use the SE
desc instead of the channel or the SC to report end-of-input, end-of-stream and
errors.

Truncated commands are now reported on error. Other changes are the same than
for the cache applet. We now set SE_FL_EOS flag instead of calling cf_shutr()
and calls to cf_shutw are removed.
2023-04-05 08:57:05 +02:00
Christopher Faulet
f8130b2de2 MEDIUM: cache: Use the sedesc to report and detect end of processing
We now try, as far as possible, to rely on the SE descriptor to detect end
of processing. Idea is to no longer rely on the channel or the SC to do so.

First, we now set SE_FL_EOS instead of calling and cf_shutr() to report the
end of the stream. It happens when the response is fully sent (SE_FL_EOI is
already set in this case) or when an error is reported. In this last case,
SE_FL_ERROR is also set.

Thanks to this change, it is now possible to detect the applet must only
consume the request waiting for the upper layer releases it. So, if
SE_FL_EOS or SE_FL_ERROR are set, it means the reponse was fully
handled. And if SE_FL_SHR or SE_FL_SHW are set, it means the applet was
released by upper layer and is waiting to be freed.
2023-04-05 08:57:05 +02:00
Christopher Faulet
0ffc9d7be3 MINOR: stconn/applet: Handle EOS in the applet .wake callback function
Just like for end of input, the end of stream reported by the endpoint
(SE_FL_EOS flag) is now handled in sc_applet_process(). The idea is to have
applets acting as muxes by reporting events through the SE descriptor, as
far as possible.
2023-04-05 08:57:05 +02:00
Christopher Faulet
92297749e1 MINOR: applet: No longer set EOI on the SC
Thanks to the previous patch, it is now possible for applets to not set the
CF_EOI flag on the channels. On this point, the applets get closer to the
muxes.
2023-04-05 08:57:05 +02:00
Christopher Faulet
f8fbb6de66 MINOR: stconn/applet: Handle EOI in the applet .wake callback function
The end of input reported by the endpoint (SE_FL_EOI flag), is now handled
in sc_applet_process(). This function is always called after an applet was
called. So, the applets can now only report EOI on the SE descriptor and
have no reason to update the channel too.
2023-04-05 08:57:05 +02:00
Christopher Faulet
b208d8cd64 MINOR: stconn: Always ack EOS at the end of sc_conn_recv()
EOS is now acknowledge at the end of sc_conn_recv(), even if an error was
encountered. There is no reason to not do so, especially because, if it not
performed here, it will be ack in sc_conn_process().

Note, it is still performed in sc_conn_process() because this function is
also the .wake callback function and can be directly called from the lower
layer.
2023-04-05 08:57:05 +02:00
Christopher Faulet
e9bacf642d MINOR: mux-h1: Report an error to the SE descriptor on truncated message
On truncated message, a parsing error is still reported. But an error on the
SE descriptor is also reported. This will avoid any bugs in future. We are
know sure the SC is able to detect the error, independently on the HTTP
analyzers.
2023-04-05 08:57:05 +02:00
Christopher Faulet
88dd0b0d13 CLEANUP: mux-h1/mux-pt: Remove useless test on SE_FL_SHR/SE_FL_SHW flags
It is already performed by the called, sc_conn_shutr() and
sc_conn_shutw(). So there is no reason to still test these flags in the PT
and H1 muxes.
2023-04-05 08:57:05 +02:00
Christopher Faulet
147e18f9d8 BUG/MINOR: mux-h1: Properly report EOI/ERROR on read0 in h1_rcv_pipe()
In h1_rcv_pipe(), only the end of stream was reported when a read0 was
detected. However, it is also important to report the end of input or an
error, depending on the message state. This patch does not fix any real
issue for now, but some others, specific to the 2.8, rely on it.

No backport needed.
2023-04-05 08:57:05 +02:00
Christopher Faulet
872b01c984 MINOR: mux-pt: Report end-of-input with the end-of-stream after a read
In the PT multiplexer, the end of stream is also the end of input. Thus
we must report EOI to the stream-endpoint descriptor when the EOS is
reported. For now, it is a bit useless but it will be important to
disginguish an shutdown to an error to an abort.

To be sure to not report an EOI on an error, the errors are now handled
first.
2023-04-05 08:57:05 +02:00
Christopher Faulet
84d3ef982c MINOR: stconn/channel: Move CF_EXPECT_MORE into the SC and rename it
The channel flag CF_EXPECT_MORE is renamed to SC_FL_SND_EXP_MORE and moved
into the stream-connector.
2023-04-05 08:57:05 +02:00
Christopher Faulet
68ef218a72 MINOR: stconn/channel: Move CF_NEVER_WAIT into the SC and rename it
The channel flag CF_NEVER_WAIT is renamed to SC_FL_SND_NEVERWAIT and moved
into the stream-connector.
2023-04-05 08:57:05 +02:00
Christopher Faulet
5c281d58ea MINOR: stconn/channel: Move CF_SEND_DONTWAIT into the SC and rename it
The channel flag CF_SEND_DONTWAIT is renamed to SC_FL_SND_ASAP and moved
into the stream-connector.
2023-04-05 08:57:05 +02:00
Christopher Faulet
9a790f63ed MINOR: stconn/channel: Move CF_READ_DONTWAIT into the SC and rename it
The channel flag CF_READ_DONTWAIT is renamed to SC_FL_RCV_ONCE and moved
into the stream-connector.
2023-04-05 08:57:05 +02:00
Christopher Faulet
9bce9724ec MINOR: stconn: Remove unecessary test on SE_FL_EOS before receiving data
In sc_conn_recv(), if the EOS is reported by the endpoint, it will always be
acknowledged by the SC and a read0 will be performed on the input
channel. Thus there is no reason to still test at the begining of the
function because there is already a test on CF_SHUTR.
2023-04-05 08:57:05 +02:00
Christopher Faulet
28975e1e10 BUG/MEDIUM: dns: Properly handle error when a response consumed
When a response is consumed, result for co_getblk() is never checked. It
seems ok because amount of output data is always checked first. But There is
an issue when we try to get the first 2 bytes to read the message length. If
there is only one byte followed by a shutdown, the applet ignore the
shutdown and loop till the timeout to get more data.

So to avoid any issue and improve shutdown detection, the co_getblk() return
value is always tested. In addition, if there is not enough data, the applet
explicitly ask for more data by calling applet_need_more_data().

This patch relies on the previous one:

   * BUG/MEDIUM: channel: Improve reports for shut in co_getblk()

Both should be backported as far as 2.4. On 2.5 and 2.4,
applet_need_more_data() must be replaced by si_rx_endp_more().
2023-04-05 08:57:05 +02:00
Christopher Faulet
5f5c94617e BUG/MEDIUM: channel: Improve reports for shut in co_getblk()
When co_getblk() is called with a length and an offset to 0, shutdown is
never reported. It may be an issue when the function is called to retrieve
all available output data, while there is no output data at all. And it
seems pretty annoying to handle this case in the caller.

Thus, now, in co_getblk(), -1 is returned when the channel is empty and a
shutdown was received.

There is no real reason to backport this patch alone. However, another fix
will rely on it.
2023-04-05 08:57:05 +02:00
Christopher Faulet
2726624ee7 CLEANUP: stconn: Remove remaining debug messages
It is now possible to enable traces for applets. Thus we can remove annoying
debug messages (DPRINTF) to track calls to applets.
2023-04-05 08:57:05 +02:00
Christopher Faulet
26e0935681 MEDIUM: applet/trace: Register a new trace source with its events
Traces are now supported for applets. The first argument is always the
appctx. This will help to debug applets.
2023-04-05 08:46:06 +02:00
Christopher Faulet
a5915eb1dd MINOR: applet: Uninline appctx_free()
This functin is uninlined and move in src/applet.c. It is mandatory to add
traces for applets.
2023-04-05 08:46:06 +02:00
Christopher Faulet
947b2e5922 BUG/MINOR: stream: Fix test on channels flags to set clientfin/serverfin touts
There is a bug in a way the channels flags are checked to set clientfin or
serverfin timeout. Indeed, to set the clientfin timeout, the request channel
must be shut for reads (CF_SHUTR) or the response channel must be shut for
writes (CF_SHUTW). As the opposite, the serverfin timeout must be set when
the request channel is shut for writes (CF_SHUTW) or the response channel is
shut for reads (CF_SHUTR).

It is a 2.8-dev specific issue. No backport needed.
2023-04-05 08:46:06 +02:00
Christopher Faulet
c665bb5637 BUG/MEDIUM: stconn: Add a missing return statement in sc_app_shutr()
In the commut b08c5259e ("MINOR: stconn: Always report READ/WRITE event on
shutr/shutw"), a return statement was erroneously removed from
sc_app_shutr(). As a consequence, CF_SHUTR flags was never set. Fortunately,
it is the default .shutr callback function. Thus when a connection or an
applet is attached to the SC, another callback is used to performe a
shutdown for reads.

It is a 28-dev specific issue. No backport needed.
2023-04-05 08:46:06 +02:00
Christopher Faulet
a664aa6a68 BUG/MINOR: tcpcheck: Be able to expect an empty response
It is not possible to successfully match an empty response. However using
regex, it should be possible to reject response with any content. For
instance:

   tcp-check expect !rstring ".+"

It may seem a be strange to do that, but it is possible, it is a valid
config. So it must work. Thanks to this patch, it is now really supported.

This patch may be backported as far as 2.2. But only if someone ask for it.
2023-04-05 08:46:06 +02:00
Frédéric Lécaille
0222cc6366 BUG/MINOR: quic: Possible wrong PTO computing
As timestamps based on now_ms values are used to compute the probing timeout,
they may wrap. So, use ticks API to compared them.

Must be backported to 2.7 and 2.6.
2023-04-04 18:24:28 +02:00
Frédéric Lécaille
fdb1494985 BUILD: quic: 32bits compilation issue in cli_io_handler_dump_quic()
Replaced a %zu printf format by %llu for an uint64_t.

Must be backported to 2.7.
2023-04-04 18:24:28 +02:00
Frédéric Lécaille
92f4a7c614 BUG/MINOR: quic: Wrong idle timer expiration (during 20s)
This this commit, this is ->idle_expire of quic_conn struct which must
be taken into an account to display the idel timer task expiration value:

     MEDIUM: quic: Ack delay implementation

Furthermore, this value was always zero until now_ms has wrapped (20 s after
the start time) due to this commit:
     MEDIUM: clock: force internal time to wrap early after boot

Do not rely on the value of now_ms compared to ->idle_expire to display the
difference but use ticks_remain() to compute it.

Must be backported to 2.7 where "show quic" has already been backported.
2023-04-04 18:24:28 +02:00
Frédéric Lécaille
12eca3a727 BUG/MINOR: quic: Unexpected connection closures upon idle timer task execution
This bug arrived with this commit:

      MEDIUM: quic: Ack delay implementation

It is possible that the idle timer task was already in the run queue when its
->expire field was updated calling qc_idle_timer_do_rearm(). To prevent this
task from running in this condition, one must check its ->expire field value
with this condition to run the task if its timer has really expired:

	!tick_is_expired(t->expire, now_ms)

Furthermore, as this task may be directly woken up with a call to task_wakeup()
all, for instance by qc_kill_conn() to kill the connection, one must check this
task has really been woken up when it was in the wait queue and not by a direct
call to task_wakeup() thanks to this test:

	(state & TASK_WOKEN_ANY) == TASK_WOKEN_TIMER

Again, when this condition is not fulfilled, the task must be run.

Must be backported where the commit mentionned above was backported.
2023-04-04 18:24:28 +02:00
Frédéric Lécaille
495968ed51 MINOR: quic: Add trace to debug idle timer task issues
Add TRACE_PROTO() call where this is relevant to debug issues about
qc_idle_timer_task() issues.

Must be backported to 2.6 and 2.7.
2023-04-04 18:24:28 +02:00
Willy Tarreau
db12c0dd10 MINOR: http-act: emit a warning when a header field name contains forbidden chars
As found in issue #2089, it's easy to mistakenly paste a colon in a
header name, or other chars (e.g. spaces) when quotes are in use, and
this causes all sort of trouble in field because such chars are rejected
by the peer.

Better try to detect these upfront. That's what we're doing here during
the parsing of the add-header/set-header/early-hint actions, where a
warning is emitted if a non-token character is found in a header name.
A special case is made for the colon at the beginning so that it remains
possible to place any future pseudo-headers that may appear. E.g:

  [WARNING]  (14388) : config : parsing [badchar.cfg:23] : header name 'X-Content-Type-Options:' contains forbidden character ':'.

This should be backported to 2.7, and ideally it should be turned to an
error in future versions.
2023-04-04 05:38:01 +02:00
Frédéric Lécaille
c877bd4ea5 BUG/MINOR: quic: Remove useless BUG_ON() in newreno and cubic algo implementation
As now_ms may be zero, these BUG_ON() could be triggered when its value has wrapped.
These call to BUG_ON() may be removed because the values they was supposed to
check are safely used by the ticks API.

Must be backported to 2.6 and 2.7.
2023-04-03 13:15:56 +02:00
Frédéric Lécaille
7d6270a845 BUG/MAJOR: quic: Congestion algorithms states shared between the connection
This very old bug is there since the first implementation of newreno congestion
algorithm implementation. This was a very bad idea to put a state variable
into quic_cc_algo struct which only defines the congestion control algorithm used
by a QUIC listener, typically its type and its callbacks.
This bug could lead to crashes since BUG_ON() calls have been added to each algorithm
implementation. This was revealed by interop test, but not very often as there was
not very often several connections run at the time during these tests.

Hopefully this was also reported by Tristan in GH #2095.

Move the congestion algorithm state to the correct structures which are private
to a connection (see cubic and nr structs).

Must be backported to 2.7 and 2.6.
2023-04-02 13:10:13 +02:00
Frédéric Lécaille
de2ba8640b MINOR: quic: Add missing traces in cubic algorithm implementation
May be useful to debug.

Must be backported to 2.7 and 2.6.
2023-04-02 13:10:08 +02:00
Frédéric Lécaille
db54847212 BUG/MINOR: quic: Cubic congestion control window may wrap
Add a check to prevent the cubic congestion control from wrapping (very low risk)
in slow start callback.

Must be backported to 2.6 and 2.7.
2023-04-02 13:10:07 +02:00
Frédéric Lécaille
23b8eef05b BUG/MINOR: quic: Remaining useless statements in cubic slow start callback
When entering a recovery period, the algo state is set by quic_enter_recovery().
And that's it!. These two lines should have been removed with this commit:

     BUG/MINOR: quic: Wrong use of now_ms timestamps (cubic algo)

Take the opportunity of this patch to add a missing TRACE_LEAVE() call in
quic_cc_cubic_ca_cb().

Must be backported to 2.7 and 2.6.
2023-04-02 13:10:04 +02:00
Ilya Shipitsin
07be66d21b CLEANUP: assorted typo fixes in the code and comments
This is 35th iteration of typo fixes
2023-04-01 18:33:40 +02:00
Frédéric Lécaille
db4bc6b4f3 MINOR: quic: Add a fake congestion control algorithm named "nocc"
This algorithm does nothing except initializing the congestion control window
to a fixed value. Very smart!

Modify the QUIC congestion control configuration parser to support this new
algorithm. The congestion control algorithm must be set as follows:

     quic-cc-algo nocc-<cc window size(KB))

For instance if "nocc-15" is provided as quic-cc-algo keyword value, this
will set a fixed window of 15KB.
2023-03-31 17:09:03 +02:00
Willy Tarreau
1cb041a6ee MINOR: cli: support filtering on FD types in "show fd"
Depending on what we're debugging, some FDs can represent pollution in
the "show fd" output. Here we add a set of filters allowing to pick (or
exclude) any combination of listener, frontend conn, backend conn, pipes,
etc. "show fd l" will only list listening connections for example.
2023-03-31 16:35:53 +02:00
Frédéric Lécaille
5d5afe7900 BUG/MINOR: quic: Wrong rtt variance computing
In ->srtt quic_loss struct this is 8*srtt which is stored so that not to have to multiply/devide
it to compute the RTT variance (at least). This is where there was a bug in quic_loss_srtt_update():
each time ->srtt must be used, it must be devided by 8 or right shifted by 3.
This bug had a very bad impact for network with non negligeable packet loss.

Must be backported to 2.6 and 2.7.
2023-03-31 13:41:17 +02:00
Frédéric Lécaille
d721571d26 MEDIUM: quic: Ack delay implementation
Reuse the idle timeout task to delay the acknowledgments. The time of the
idle timer expiration is for now on stored in ->idle_expire. The one to
trigger the acknowledgements is stored in ->ack_expire.
Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection
as having its acknowledgement timer been triggered.
Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and
allows the connection to send packet when the ack timer has fired.
It is possible that acks are sent before the ack timer has triggered. In
this case it is cancelled only if ACK frames are really sent.
The idle timer expiration must be set again when the ack timer has been
triggered or when it is cancelled.

Must be backported to 2.7.
2023-03-31 13:41:17 +02:00
Frédéric Lécaille
8f991948f5 MINOR: quic: Traces adjustments at proto level.
Dump variables displayed by TRACE_ENTER() or TRACE_LEAVE() by calls to TRACE_PROTO().
No more variables are displayed by the two former macros. For now on, these information
are accessible from proto level.
Add new calls to TRACE_PROTO() at important locations in relation whith QUIC transport
protocol.
When relevant, try to prefix such traces with TX or RX keyword to identify the
concerned subpart (transmission or reception) of the protocol.

Must be backported to 2.7.
2023-03-31 09:54:59 +02:00
Frédéric Lécaille
01314b8b53 MINOR: quic: Implement cubic state trace callback
This callback was left as not implemented. It should at least display
the algorithm state, the control congestion window the slow start threshold
and the time of the current recovery period. Should be helpful to debug.

Must be backported to 2.7.
2023-03-31 09:54:59 +02:00
Frédéric Lécaille
deb978149a BUG/MINOR: quic: Missing max_idle_timeout initialization for the connection
This bug was revealed by handshakeloss interop tests (often with quiceh) where one
could see haproxy an Initial packet without TLS ClientHello message (only a padded
PING frame). In this case, as the ->max_idle_timeout was not initialized, the
connection was closed about three seconds later, and haproxy opened a new one with
a new source connection ID upon receipt of the next client Initial packet. As the
interop runner count the number of source connection ID used by the server to check
there were exactly 50 such IDs used by the server, it considered the test as failed.

So, the ->max_idle_timeout of the connection must be at least initialized
to the local "max_idle_timeout" transport parameter value to avoid such
a situation (closing connections too soon) until it is "negotiated" with the
client when receiving its TLS ClientHello message.

Must be backported to 2.7 and 2.6.
2023-03-31 09:54:59 +02:00
Frédéric Lécaille
8e6c6611e8 BUG/MINOR: quic: Wrong use of now_ms timestamps (newreno algo)
This patch is similar to the one for cubic algorithm:
    "BUG/MINOR: quic: Wrong use of timestamps with now_ms variable (cubic algo)"

As now_ms may wrap, one must use the ticks API to protect the cubic congestion
control algorithm implementation from side effects due to this.

Furthermore, to make the newreno congestion control algorithm more readable and easy
to maintain, add quic_cc_cubic_rp_cb() new callback for the "in recovery period"
state (QUIC_CC_ST_RP).

Must be backported to 2.7 and 2.6.
2023-03-31 09:54:59 +02:00
Frédéric Lécaille
a3772e1134 MINOR: quic: Add recovery related information to "show quic"
Add ->srtt, ->rtt_var, ->rtt_min and ->pto_count values from ->path->loss
struct to "show quic". Same thing for ->cwnd from ->path struct.

Also take the opportunity of this patch to dump the packet number
space information directly from ->pktns[] array in place of ->els[]
array. Indeed, ->els[QUIC_TLS_ENC_LEVEL_EARLY_DATA] and ->els[QUIC_TLS_ENC_LEVEL_APP]
have the same packet number space.

Must be backported to 2.7 where "show quic" implementation has alredy been
backported.
2023-03-31 09:54:59 +02:00
Frédéric Lécaille
d7243318c4 BUG/MINOR: quic: Wrong use of now_ms timestamps (cubic algo)
As now_ms may wrap, one must use the ticks API to protect the cubic congestion
control algorithm implementation from side effects due to this.

Furthermore to make the cubic congestion control algorithm more readable and easy
to maintain, adding a new state ("in recovery period" QUIC_CC_ST_RP new enum) helps
in reaching this goal. Implement quic_cc_cubic_rp_cb() which is the callback for
this new state.

Must be backported to 2.7 and 2.6.
2023-03-31 09:54:59 +02:00
Remi Tricot-Le Breton
6549f53fb6 BUG/MINOR: ssl: ssl-(min|max)-ver parameter not duplicated for bundles in crt-list
If a bundle is used in a crt-list, the ssl-min-ver and ssl-max-ver
options were not taken into account in entries other than the first one
because the corresponding fields in the ssl_bind_conf structure were not
copied in crtlist_dup_ssl_conf.

This should fix GitHub issue #2069.
This patch should be backported up to 2.4.
2023-03-31 09:11:51 +02:00
Remi Tricot-Le Breton
d32c8e3ccb BUG/MINOR: ssl: Fix potential leak in cli_parse_update_ocsp_response
In some extremely unlikely case (or even impossible for now), we might
exit cli_parse_update_ocsp_response without raising an error but with a
filled 'err' buffer. It was not properly free'd.

It does not need to be backported.
2023-03-31 09:10:36 +02:00
Remi Tricot-Le Breton
ae5187721f BUG/MINOR: ssl: Remove dead code in cli_parse_update_ocsp_response
This patch removes dead code from the cli_parse_update_ocsp_response
function. The 'end' label in only used in case of error so the check of
the 'errcode' variable and the errcode variable itself become useless.

This patch does not need to be backported.
It fixes GitHub issue #2077.
2023-03-31 09:08:28 +02:00
Aurelien DARRAGON
7f01f0a8ef BUG/MEDIUM: proxy/sktable: prevent watchdog trigger on soft-stop
During soft-stop, manage_proxy() (p->task) will try to purge
trashable (expired and not referenced) sticktable entries,
effectively releasing the process memory to leave some space
for new processes.

This is done by calling stktable_trash_oldest(), immediately
followed by a pool_gc() to give the memory back to the OS.

As already mentioned in dfe7925 ("BUG/MEDIUM: stick-table:
limit the time spent purging old entries"), calling
stktable_trash_oldest() with a huge batch can result in the function
spending too much time searching and purging entries, and ultimately
triggering the watchdog.

Lately, an internal issue was reported in which we could see
that the watchdog is being triggered in stktable_trash_oldest()
on soft-stop (thus initiated by manage_proxy())

According to the report, the crash seems to only occur since 5938021
("BUG/MEDIUM: stick-table: do not leave entries in end of window during purge")

This could be the result of stktable_trash_oldest() now working
as expected, and thus spending a large amount of time purging
entries when called with a large enough <to_batch>.

Instead of adding new checks in stktable_trash_oldest(), here we
chose to address the issue directly in manage_proxy().

Since the stktable_trash_oldest() function is called with
<to_batch> == <p->table->current>, it's pretty obvious that it could
cause some issues during soft-stop if a large table, assuming it is
full prior to the soft-stop, suddenly sees most of its entries
becoming trashable because of the soft-stop.

Moreover, we should note that the call to stktable_trash_oldest() is
immediately followed by a call to pool_gc():

We know for sure that pool_gc(), as it involves malloc_trim() on
glibc, is rather expensive, and the more memory to reclaim,
the longer the call.

We need to ensure that both stktable_trash_oldest() + consequent
pool_gc() call both theoretically fit in a single task execution window
to avoid contention, and thus prevent the watchdog from being triggered.

To do this, we now allocate a "budget" for each purging attempt.
budget is maxed out to 32K, it means that each sticktable cleanup
attempt will trash at most 32K entries.

32K value is quite arbitrary here, and might need to be adjusted or
even deducted from other parameters if this fails to properly address
the issue without introducing new side-effects.
The goal is to find a good balance between the max duration of each
cleanup batch and the frequency of (expensive) pool_gc() calls.

If most of the budget is actually spent trashing entries, then the task
will immediately be rescheduled to continue the purge.
This way, the purge is effectively batched over multiple task runs.

This may be slowly backported to all stable versions.
[Please note that this commit depends on 6e1fe25 ("MINOR: proxy/pool:
prevent unnecessary calls to pool_gc()")]
2023-03-31 07:05:08 +02:00
Martin DOLEZ
28c5f40ad6 MINOR: http_fetch: Add case-insensitive argument for url_param/urlp_val
This commit adds a new optional argument to smp_fetch_url_param
and smp_fetch_url_param_val that makes the parameter key comparison
case-insensitive.
Now users can retrieve URL parameters regardless of their case,
allowing to match parameters in case insensitive application.
Doc was updated.
2023-03-30 14:11:25 +02:00
Martin DOLEZ
110e4a8733 MINOR: http_fetch: add case insensitive support for smp_fetch_url_param
This commit adds a new argument to smp_fetch_url_param
that makes the parameter key comparison case-insensitive.
Several levels of callers were modified to pass this info.
2023-03-30 14:11:10 +02:00
Martin DOLEZ
1a9a994c11 MINOR: http_fetch: Add support for empty delim in url_param
In prevision of adding a third parameter to the url_param
sample-fetch function we need to make the second parameter optional.
User can now pass a empty 2nd argument to keep the default delimiter.
2023-03-30 14:10:59 +02:00
Aurelien DARRAGON
2c5b9ded9b CLEANUP: proxy: remove stop_time related dead code
Since eb77824 ("MEDIUM: proxy: remove the deprecated "grace" keyword"),
stop_time is never set, so the related code in manage_proxy() is not
relevant anymore.

Removing code that refers to p->stop_time, since it was probably
overlooked.
2023-03-28 20:26:47 +02:00
Aurelien DARRAGON
6e1fe253b7 MINOR: proxy/pool: prevent unnecessary calls to pool_gc()
Under certain soft-stopping conditions (ie: sticktable attached to proxy
and in-progress connections to the proxy that prevent haproxy from
exiting), manage_proxy() (p->task) will wake up every second to perform
a cleanup attempt on the proxy sticktable (to purge unused entries).

However, as reported by TimWolla in GH #2091, it was found that a
systematic call to pool_gc() could cause some CPU waste, mainly
because malloc_trim() (which is rather expensive) is being called
for each pool_gc() invocation.

As a result, such soft-stopping process could be spending a significant
amount of time in the malloc_trim->madvise() syscall for nothing.

Example "strace -c -f -p `pidof haproxy`" output (taken from
Tim's report):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 46.77    1.840549        3941       467         1 epoll_wait
 43.82    1.724708          13    128509           sched_yield
  8.82    0.346968          11     29696           madvise
  0.58    0.023011          24       951           clock_gettime
  0.01    0.000257          10        25         7 recvfrom
  0.00    0.000033          11         3           sendto
  0.00    0.000021          21         1           rt_sigreturn
  0.00    0.000021          21         1           timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00    3.935568          24    159653         8 total

To prevent this, we now only call pool_gc() when some memory is
really expected to be reclaimed as a direct result of the previous
stick table cleanup.
This is pretty straightforward since stktable_trash_oldest() returns
the number of trashed sticky sessions.

This may be backported to every stable versions.
2023-03-28 20:26:38 +02:00
Frédéric Lécaille
9c317b1d35 BUG/MINOR: quic: Missing padding in very short probe packets
This bug arrived with this commit:
   MINOR: quic: Send PING frames when probing Initial packet number space

This may happen when haproxy needs to probe the peer with very short packets
(only one PING frame). In this case, the packet must be padded. There was clearly
a case which was removed by the mentionned commit above. That said, there was
an extra byte which was added to the PADDING frame before the mentionned commit
above. This is no more the case with this patch.

Thank you to @tatsuhiro-t (ngtcp2 manager) for having reported this issue which
was revealed by the keyupdate test (on client side).

Must be backported to 2.7 and 2.6.
2023-03-28 18:26:57 +02:00
Christopher Faulet
21fb6bdab4 BUG/MEDIUM: mux-h2: Be able to detect connection error during handshake
When a backend H2 connection is waiting the connection is fully established,
nothing is sent. However, it remains useful to detect connection error at
this stage. It is especially important to release H2 connection on connect
error. Be able to set H2_CF_ERR_PENDiNG or H2_CF_ERROR flags when the
underlying connection is not fully established will exclude the H2C to be
inserted in a idle list in h2_detach().

Without this fix, an H2C in PREFACE state and relying on a connection in
error can be inserted in the safe list. Of course, it will be purged if not
reused. But in the mean time, it can be reused. When this happens, the
connection remains in error and nothing happens. At the end a connection
error is returned to the client. On low traffic, we can imagine a scenario
where this dead connection is the only idle connection. If it is always
reused before being purged, no connection to the server is possible.

In addition, h2c_is_dead() is updated to declare as dead any H2 connection
with a pending error if its state is PREFACE or SETTINGS1 (thus if no
SETTINGS frame was received yet).

This patch should fix the issue #2092. It must be backported as far as 2.6.
2023-03-28 14:52:42 +02:00
Christopher Faulet
41a454da0a BUG/MINOR: stats: Don't replace sc_shutr() by SE_FL_EOS flag yet
In commit c2c043ed4 ("BUG/MEDIUM: stats: Consume the request except when
parsing the POST payload"), a change about applet was pushed too early. The
applet must still call cf_shutr() when the response is fully sent. It is
planned to rely on SE_FL_EOS flag, just like connections. But it is not
possible for now.

However, at first glance, this bug has no visible effect.

It is 2.8-specific. No backport needed.
2023-03-28 14:36:05 +02:00
Tim Duesterhus
b39c24b29e BUG/MINOR: ssl: Stop leaking err in ssl_sock_load_ocsp()
Previously performing a config check of `.github/h2spec.config` would report a
20 byte leak as reported in GitHub Issue #2082.

The leak was introduced in a6c0a59e9a, which is
dev only. No backport needed.
2023-03-28 11:09:12 +02:00
Frédéric Lécaille
c425e03b28 BUG/MINOR: quic: Missing STREAM frame type updated
This patch follows this commit which was not sufficient:
  BUG/MINOR: quic: Missing STREAM frame data pointer updates

Indeed, after updating the ->offset field, the bit which informs the
frame builder of its presence must be systematically set.

This bug was revealed by the following BUG_ON() from
quic_build_stream_frame() :
  bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515

This should fix the last crash occured on github issue #2074.

Must be backported to 2.6 and 2.7.
2023-03-27 16:01:44 +02:00
Aurelien DARRAGON
821581c990 BUG/MINOR: applet/new: fix sedesc freeing logic
Since 465a6c8 ("BUG/MEDIUM: applet: only set appctx->sedesc on
successful allocation"), sedesc is attached to the appctx after the
task is successfully allocated.

If the task fails to allocate: current sedesc cleanup is performed
on appctx->sedesc which still points to NULL so sedesc won't be
freed.
This is fine when sedesc is provided as argument (!=NULL), but leads
to memory leaks if sedesc is allocated locally.

It was shown in GH #2086 that if sedesc != NULL when passed as
argument, it shouldn't be freed on error paths. This is what 465a6c8
was trying to address.

In an attempt to fix both issues at once, we do as Christopher
suggested: that is moving sedesc allocation attempt at the
end of the function, so that we don't have to free it in case
of error, thus removing the ambiguity.
(We won't risk freeing a sedesc that does not belong to us)

If we fail to allocate sedesc, then the task that was previously
created locally is simply destroyed.

This needs to be backported to 2.6 with 465a6c8 ("BUG/MEDIUM: applet:
only set appctx->sedesc on successful allocation")
[Copy pasting the original backport note from Willy:
In 2.6 the function is slightly
different and called appctx_new(), though the issue is exactly the
same.]
2023-03-24 14:38:53 +01:00
Christopher Faulet
551b896772 BUG/MEDIUM: mux-h1: Wakeup H1C on shutw if there is no I/O subscription
This old bug was revealed because of the commit 407210a34 ("BUG/MEDIUM:
stconn: Don't rearm the read expiration date if EOI was reached"). But it is
still possible to hit it if there is no server timeout. At first glance, the
2.8 is not affected. But the fix remains valid.

When a shutdown for writes if performed the H1 connection must be notified
to be released. If it is subscribed for any I/O events, it is not an
issue. But, if there is no subscription, no I/O event is reported to the H1
connection and it remains alive. If the message was already fully received,
nothing more happens.

On my side, I was able to trigger the bug by freezing the session. Some
users reported a spinning loop on process_stream(). Not sure how to trigger
the loop. To freeze the session, the client timeout must be reached while
the server response was already fully received. On old version (< 2.6), it
only happens if there is no server timeout.

To fix the issue, we must wake up the H1 connection on shutdown for writes
if there is no I/O subscription.

This patch must be backported as far as 2.0. It should fix the issue #2090
and #2088.
2023-03-24 14:38:35 +01:00
Christopher Faulet
c2c043ed43 BUG/MEDIUM: stats: Consume the request except when parsing the POST payload
The stats applet is designed to consume the request at the end, when it
finishes to send the response. And during the response forwarding, because
the request is not consumed, the applet states it will not consume
data. This avoid to wake the applet up in loop. When it finishes to send the
response, the request is consumed.

For POST requests, there is no issue because the response is small
enough. It is sent in one time and must be processed by HTTP analyzers. Thus
the forwarding is not performed by the applet itself. The applet is always
able to consume the request, regardless the payload length.

But for other requests, it may be an issue. If the response is too big to be
sent in one time and if the requests is not fully received when the response
headers are sent, the applet may be blocked infinitely, not consuming the
request. Indeed, in the case the applet will be switched in infinite forward
mode, the request will not be consumed immediately. At the end, the request
buffer is flushed. But if some data must still be received, the applet is not
woken up because it is still in a "not-consuming" mode.

So, to fix the issue, we must take care to re-enable data consuming when the
end of the response is reached.

This patch must be backported as far as 2.6.
2023-03-24 09:24:27 +01:00
Christopher Faulet
3aeb36681c BUG/MINOR: syslog: Request for more data if message was not fully received
In the syslog applet, when a message was not fully received, we must request
for more data by calling appctx_need_more_data() and not by setting
CF_READ_DONTWAIT flag on the request channel. Indeed, this flag is only used
to only try a read at once.

This patch could be backported as far as 2.4. On 2.5 and 2.4,
applet_need_more_data() must be replaced by si_cant_get().
2023-03-24 09:24:03 +01:00
Amaury Denoyelle
abbb5ad1f5 MINOR: mux-quic: close on frame alloc failure
Replace all BUG_ON() on frame allocation failure by a CONNECTION_CLOSE
sending with INTERNAL_ERROR code. This can happen for the following
cases :
* sending of MAX_STREAM_DATA
* sending of MAX_DATA
* sending of MAX_STREAMS_BIDI

In other cases (STREAM, STOP_SENDING, RESET_STREAM), an allocation
failure will only result in the current operation to be interrupted and
retried later. However, it may be desirable in the future to replace
this with a simpler CONNECTION_CLOSE emission to recover better under a
memory pressure issue.

This should be backported up to 2.7.
2023-03-23 14:39:49 +01:00
Amaury Denoyelle
c0c6b6d8c0 MINOR: mux-quic: close on qcs allocation failure
Emit a CONNECTION_CLOSE with INTERNAL_ERROR code each time qcs
allocation fails. This can happen in two cases :
* when creating a local stream through application layer
* when instantiating a remote stream through qcc_get_qcs()

In both cases, error paths are already in place to interrupt the current
operation and a CONNECTION_CLOSE will be emitted soon after.

This should be backported up to 2.7.
2023-03-23 14:39:49 +01:00
Amaury Denoyelle
e2213df9fe MINOR: mux-quic: ensure CONNECTION_CLOSE is scheduled once per conn
Add BUG_ON() statements to ensure qcc_emit_cc()/qcc_emit_cc_app() is not
called more than one time for each connection. This should improve code
resilience of MUX-QUIC and H3 and it will ensure that a scheduled
CONNECTION_CLOSE is not overwritten by another one with a different
error code.

This commit relies on the previous one to ensure all QUIC operations are
not conducted as soon as a CONNECTION_CLOSE has been prepared :
  commit d7fbf458f8a4c5b09cbf0da0208fbad70caaca33
  MINOR: mux-quic: interrupt most operations if CONNECTION_CLOSE scheduled

This should be backported up to 2.7.
2023-03-23 14:39:49 +01:00
Amaury Denoyelle
b47310d883 MINOR: mux-quic: interrupt qcc_recv*() operations if CC scheduled
Ensure that external MUX operations are interrupted if a
CONNECTION_CLOSE is scheduled. This was already the cases for some
functions. This is extended to the qcc_recv*() family for
MAX_STREAM_DATA, RESET_STREAM and STOP_SENDING.

Also, qcc_release_remote_stream() is skipped in qcs_destroy() if a
CONNECTION_CLOSE is already scheduled.

All of this will ensure we only proceed to minimal treatment as soon as
a CONNECTION_CLOSE is prepared. Indeed, all sending and receiving is
stopped as soon as a CONNECTION_CLOSE is emitted so only internal
cleanup code should be necessary at this stage.

This should prevent a registered CONNECTION_CLOSE error status to be
overwritten by an error in a follow-up treatment.

This should be backported up to 2.7.
2023-03-23 14:39:47 +01:00
Amaury Denoyelle
665817a91c BUG/MINOR: mux-quic: prevent CC status to be erased by shutdown
HTTP/3 graceful shutdown operation is used to emit a GOAWAY followed by
a CONNECTION_CLOSE with H3_NO_ERROR status. It is used for every
connection on release which means that if a CONNECTION_CLOSE was already
registered for a previous error, its status code is overwritten.

To fix this, skip shutdown operation if a CONNECTION_CLOSE is already
registered at the MUX level. This ensures that the correct error status
is reported to the peer.

This should be backported up to 2.6. Note that qc_shutdown() does not
exists on 2.6 so modification will have to be made directly in
qc_release() as followed :

diff --git a/src/mux_quic.c b/src/mux_quic.c
index 49df0dc418..3463222956 100644
--- a/src/mux_quic.c
+++ b/src/mux_quic.c
@@ -1766,19 +1766,21 @@ static void qc_release(struct qcc *qcc)

        TRACE_ENTER(QMUX_EV_QCC_END, conn);

-       if (qcc->app_ops && qcc->app_ops->shutdown) {
-               /* Application protocol with dedicated connection closing
-                * procedure.
-                */
-               qcc->app_ops->shutdown(qcc->ctx);
+       if (!(qcc->flags & QC_CF_CC_EMIT)) {
+               if (qcc->app_ops && qcc->app_ops->shutdown) {
+                       /* Application protocol with dedicated connection closing
+                        * procedure.
+                        */
+                       qcc->app_ops->shutdown(qcc->ctx);

-               /* useful if application protocol should emit some closing
-                * frames. For example HTTP/3 GOAWAY frame.
-                */
-               qc_send(qcc);
-       }
-       else {
-               qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0);
+                       /* useful if application protocol should emit some closing
+                        * frames. For example HTTP/3 GOAWAY frame.
+                        */
+                       qc_send(qcc);
+               }
+               else {
+                       qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0);
+               }
        }

        if (qcc->task) {
2023-03-23 14:38:06 +01:00
Amaury Denoyelle
5aa21c1748 BUG/MINOR: h3: properly handle incomplete remote uni stream type
A H3 unidirectional stream is always opened with its stream type first
encoded as a QUIC variable integer. If the STREAM frame contains some
data but not enough to decode this varint, haproxy would crash due to an
ABORT_NOW() statement.

To fix this, ensure we support an incomplete stream type. In this case,
h3_init_uni_stream() returns 0 and the buffer content is not cleared.
Stream decoding will resume when new data are received for this stream
which should be enough to decode the stream type varint.

This bug has never occured on production because standard H3 stream types
are small enough to be encoded on a single byte.

This should be backported up to 2.6.
2023-03-23 14:38:06 +01:00
Willy Tarreau
1751db140a MINOR: pools: report a replaced memory allocator instead of just malloc_trim()
Instead of reporting the inaccurate "malloc_trim() support" on -vv, let's
report the case where the memory allocator was actively replaced from the
one used at build time, as this is the corner case we want to be cautious
about. We also put a tainted bit when this happens so that it's possible
to detect it at run time (e.g. the user might have inherited it from an
environment variable during a reload operation).

The now unused is_trim_enabled() function was finally dropped.
2023-03-22 18:05:02 +01:00
Willy Tarreau
0c27ec5df7 BUG/MINOR: pools: restore detection of built-in allocator
The runtime detection of the default memory allocator was broken by
Commit d8a97d8f6 ("BUG/MINOR: illegal use of the malloc_trim() function
if jemalloc is used") due to a misunderstanding of its role. The purpose
is not to detect whether we're on non-jemalloc but whether or not the
allocator was changed from the one we booted with, in which case we must
be extra cautious and absolutely refrain from calling malloc_trim() and
its friends.

This was done only to drop the message saying that malloc_trim() is
supported, which will be totally removed in another commit, and could
possibly be removed even in older versions if this patch would get
backported since in the end it provides limited value.
2023-03-22 17:57:13 +01:00
Willy Tarreau
c3b297d5a4 MEDIUM: tools: further relax dlopen() checks too consider grouped symbols
There's a recurring issue regarding shared library loading from Lua. If
the imported library is linked with a different version of openssl but
doesn't use it, the check will trigger and emit a warning. In practise
it's not necessarily a problem as long as the API is the same, because
all symbols are replaced and the library will use the included ssl lib.

It's only a problem if the library comes with a different API because
the dynamic linker will only replace known symbols with ours, and not
all. Thus the loaded lib may call (via a static inline or a macro) a
few different symbols that will allocate or preinitialize structures,
and which will then pass them to the common symbols coming from a
different and incompatible lib, exactly what happens to users of Lua's
luaossl when building haproxy with quictls and without rebuilding
luaossl.

In order to better address this situation, we now define groups of
symbols that must always appear/disappear in a consistent way. It's OK
if they're all absent from either haproxy or the lib, it means that one
of them doesn't use them so there's no problem. But if any of them is
defined on any side, all of them must be in the exact same state on the
two sides. The symbols are represented using a bit in a mask, and the
mask of the group of other symbols they're related to. This allows to
check 64 symbols, this should be OK for a while. The first ones that
are tested for now are symbols whose combination differs between
openssl versions 1.0, 1.1, and 3.0 as well as quictls. Thus a difference
there will indicate upcoming trouble, but no error will mean that we're
running on a seemingly compatible API and that all symbols should be
replaced at once.

The same mechanism could possibly be used for pcre/pcre2, zlib and the
few other optional libs that may occasionally cause runtime issues when
used by dependencies, provided that discriminatory symbols are found to
distinguish them. But in practice such issues are pretty rare, mainly
because loading standard libs via Lua on a custom build of haproxy is
not pretty common.

In the event that further symbol compatibility issues would be reported
in the future, backporting this patch as well as the following series
might be an acceptable solution given that the scope of changes is very
narrow (the malloc stuff is needed so that the malloc/free tests can be
dropped):

  BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used
  MINOR: pools: make sure 'no-memory-trimming' is always used
  MINOR: pools: intercept malloc_trim() instead of trying to plug holes
  MEDIUM: pools: move the compat code from trim_all_pools() to malloc_trim()
  MINOR: pools: export trim_all_pools()
  MINOR: pattern: use trim_all_pools() instead of a conditional malloc_trim()
  MINOR: tools: relax dlopen() on malloc/free checks
2023-03-22 17:30:28 +01:00
Willy Tarreau
58912b8d92 MINOR: tools: relax dlopen() on malloc/free checks
Now that we can provide a safe malloc_trim() we don't need to detect
anymore that some dependencies use a different set of malloc/free
functions than ours because they will use the same as those we're
seeing, and we control their use of malloc_trim(). The comment about
the incompatibility with DEBUG_MEM_STATS is not true anymore either
since the feature relies on macros so we're now OK.

This will stop catching libraries linked against glibc's allocator
when haproxy is natively built with jemalloc. This was especially
annoying since dlopen() on a lib depending on jemalloc() tends to
fail on TLS issues.
2023-03-22 17:30:28 +01:00
Willy Tarreau
9b060f148e MINOR: pattern: use trim_all_pools() instead of a conditional malloc_trim()
First this will ensure that we serialize the threads and avoid severe
contention. Second it removes ugly ifdefs and conditions.
2023-03-22 17:30:28 +01:00
Willy Tarreau
7aee683541 MINOR: pools: export trim_all_pools()
This way it will be usable from outside instead of malloc_trim().
2023-03-22 17:30:28 +01:00
Willy Tarreau
4138f15182 MEDIUM: pools: move the compat code from trim_all_pools() to malloc_trim()
We already have some generic code in trim_all_pools() to implement the
equivalent of malloc_trim() on jemalloc and macos. Instead of keeping the
logic there, let's just move it to our own malloc_trim() implementation
so that we can unify the mechanism and the logic. Now any low-level code
calling malloc_trim() will either be disabled by haproxy's config if the
user decides to, or will be mapped to the equivalent mechanism if malloc()
was intercepted by a preloaded jemalloc.

Trim_all_pools() preserves the benefit of serializing threads (which we
must not impose to other libs which could come with their own threads).
It means that our own code should mostly use trim_all_pools() instead of
calling malloc_trim() directly.
2023-03-22 17:30:28 +01:00
Willy Tarreau
eaba76b02d MINOR: pools: intercept malloc_trim() instead of trying to plug holes
As reported by Miroslav in commit d8a97d8f6 ("BUG/MINOR: illegal use of
the malloc_trim() function if jemalloc is used") there are still occasional
cases where it's discovered that malloc_trim() is being used without its
suitability being checked first. This is a problem when using another
incompatible allocator. But there's a class of use cases we'll never be
able to cover, it's dynamic libraries loaded from Lua. In order to address
this more reliably, we now define our own malloc_trim() that calls the
previous one after checking that the feature is supported and that the
allocator is the expected one. This way child libraries that would call
it will also be safe.

The function is intentionally left defined all the time so that it will
be possible to clean up some code that uses it by removing ifdefs.
2023-03-22 17:30:28 +01:00
Willy Tarreau
4db0b0430d MINOR: pools: make sure 'no-memory-trimming' is always used
The global option 'no-memory-trimming' was added in 2.6 with commit
c4e56dc58 ("MINOR: pools: add a new global option "no-memory-trimming"")
but there were some cases left where it was not considered. Let's make
is_trim_enabled() also consider it.
2023-03-22 17:29:23 +01:00
Amaury Denoyelle
f4e7616e6c MINOR: mux-quic: add flow-control info to minimal trace level
Complete traces with information from qcc and qcs instances about
flow-control level. This should help to debug further issue on sending.

This must be backported up to 2.7.
2023-03-22 16:08:54 +01:00
Amaury Denoyelle
b7143a8781 MINOR: mux-quic: adjust trace level for MAX_DATA/MAX_STREAM_DATA recv
Change the trace from developer to data level whenever the flow control
limitation is updated following a MAX_DATA or MAX_STREAM_DATA reception.

This should be backported up to 2.7.
2023-03-22 16:08:54 +01:00
Amaury Denoyelle
1ec78ff421 MINOR: mux-quic: complete traces for qcs emission
Add traces for _qc_send_qcs() function. Most notably, traces have been
added each time a qc_stream_desc buffer allocation fails and when stream
or connection flow-level is reached. This should improve debugging for
emission issues.

This must be backported up to 2.7.
2023-03-22 16:08:54 +01:00
Amaury Denoyelle
178fbffda1 BUG/MEDIUM: mux-quic: release data from conn flow-control on qcs reset
Connection flow-control level calculation is a bit complicated. To
ensure it is never exceeded, each time a transfer occurs from a
qcs.tx.buf to its qc_stream_desc buffer it is accounted in
qcc.tx.offsets at the connection level. This value is not decremented
even if the corresponding STREAM frame is rejected by the quic-conn
layer as its emission will be retried later.

In normal cases this works as expected. However there is an issue if a
qcs instance is removed with prepared data left. In this case, its data
is still accounted in qcc.tx.offsets despite being removed which may
block other streams. This happens every time a qcs is reset with
remaining data which will be discarded in favor of a RESET_STREAM frame.

To fix this, if a stream has prepared data in qcc_reset_stream(), it is
decremented from qcc.tx.offsets. A BUG_ON() has been added to ensure
qcs_destroy() is never called for a stream with prepared data left.

This bug can cause two issues :
* transfer freeze as data unsent from closed streams still count on the
  connection flow-control limit and will block other streams. Note that
  this issue was not reproduced so it's unsure if this really happens
  without the following issue first.
* a crash on a BUG_ON() statement in qc_send() loop over
  qc_send_frames(). Streams may remained in the send list with nothing
  to send due to connection flow-control limit. However, limit is never
  reached through qcc_streams_sent_done() so QC_CF_BLK_MFCTL flag is not
  set which will allow the loop to continue.

The last case was reproduced after several minutes of testing using the
following command :

$ ngtcp2-client --exit-on-all-streams-close -t 0.1 -r 0.1 \
  --max-data=100K -n32 \
  127.0.0.1 20443 "https://127.0.0.1:20443/?s=1g" 2>/dev/null

This should fix github issues #2049 and #2074.
2023-03-22 16:08:54 +01:00
Miroslav Zagorac
d8a97d8f60 BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used
In the event that HAProxy is linked with the jemalloc library, it is still
shown that malloc_trim() is enabled when executing "haproxy -vv":
  ..
  Support for malloc_trim() is enabled.
  ..

It's not so much a problem as it is that malloc_trim() is called in the
pat_ref_purge_range() function without any checking.

This was solved by setting the using_default_allocator variable to the
correct value in the detect_allocator() function and before calling
malloc_trim() it is checked whether the function should be called.
2023-03-22 14:14:50 +01:00
Willy Tarreau
9ef2742a51 MINOR: debug: support dumping the libs addresses when running in verbose mode
Starting haproxy with -dL helps enumerate the list of libraries in use.
But sometimes in order to go further we'd like to see their address
ranges. This is already supported on the CLI's "show libs" but not on
the command line where it can sometimes help troubleshoot startup issues.
Let's dump them when in verbose mode. This way it doesn't change the
existing behavior for those trying to enumerate libs to produce an archive.
2023-03-22 11:43:15 +01:00
Willy Tarreau
1b536a11e7 BUILD: thread: silence a build warning when threads are disabled
When threads are disabled, the compiler complains that we might be
accessing tg->abs[] out of bounds since the array is of size 1. It
cannot know that the condition to do this is never met, and given
that it's not in a fast path, we can make it more obvious.
2023-03-22 10:40:06 +01:00
Amaury Denoyelle
8afe4b88c4 BUG/MINOR: quic: ignore congestion window on probing for MUX wakeup
qc_notify_send() is used to wake up the MUX layer for sending. This
function first ensures that all sending condition are met to avoid to
wake up the MUX for unnecessarily.

One of this condition is to check if there is room in the congestion
window. However, when probe packets must be sent due to a PTO
expiration, RFC 9002 explicitely mentions that the congestion window
must be ignored which was not the case prior to this patch.

This commit fixes this by first setting <pto_probe> of 01RTT packet
space before invoking qc_notify_send(). This ensures that congestion
window won't be checked anymore to wake up the MUX layer until probing
packets are sent.

This commit replaces the following one which was not sufficient :
  commit e25fce03eb
  BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing

This should be backported up to 2.7.
2023-03-21 14:52:02 +01:00
Amaury Denoyelle
2a19b6e564 BUG/MINOR: quic: wake up MUX on probing only for 01RTT
On PTO probe timeout expiration, a probe packet must be emitted.
quic_pto_pktns() is used to determine for which packet space the timer
has expired. However, if MUX is already subscribed for sending, it is
woken up without checking first if this happened for the 01RTT packet
space.

It is unsure that this is really a bug as in most cases, MUX is
established only after Initial and Handshake packet spaces are removed.
However, the situation is not se clear when 0-RTT is used. For this
reason, adjust the code to explicitely check for the 01RTT packet space
before waking up the MUX layer.

This should be backported up to 2.6. Note that qc_notify_send() does not
exists in 2.6 so it should be replaced by the explicit block checking
(qc->subs && qc->subs->events & SUB_RETRY_SEND).
2023-03-21 14:09:50 +01:00
Willy Tarreau
465a6c8506 BUG/MEDIUM: applet: only set appctx->sedesc on successful allocation
If appctx_new_on() fails to allocate a task, it will not remove the
freshly allocated sedesc from the appctx despite freeing it, causing
a UAF. Let's only assign appctx->sedesc upon success.

This needs to be backported to 2.6. In 2.6 the function is slightly
different and called appctx_new(), though the issue is exactly the
same.
2023-03-21 10:50:51 +01:00
Willy Tarreau
a220e59ad8 BUG/MEDIUM: mux-h1: properly destroy a partially allocated h1s
In h1c_frt_stream_new() and h1c_bck_stream_new(), if we fail to completely
initialize the freshly allocated h1s, typically because sc_attach_mux()
fails, we must use h1s_destroy() to de-initialize it. Otherwise it stays
attached to the h1c when released, causing use-after-free upon the next
wakeup. This can be triggered upon memory shortage.

This needs to be backported to 2.6.
2023-03-21 10:44:44 +01:00
Willy Tarreau
0c4348c982 MINOR: pools: preset the allocation failure rate to 1% with -dMfail
Using -dMfail alone does nothing unless tune.fail-alloc is set, which
renders it pretty useless as-is, and is not intuitive. Let's change
this so that the filure rate is preset to 1% when the option is set on
the command line. This allows to inject failures without having to edit
the configuration.
2023-03-21 09:26:55 +01:00
Willy Tarreau
7a8ca0a063 BUG/MINOR: stconn: fix sedesc memory leak on stream allocation failure
If we fail to allocate a new stream in sc_new_from_endp(), and the call
to sc_new() allocated the sedesc itself (which normally doesn't happen),
then it doesn't get released on the failure path. Let's explicitly
handle this case so that it's not overlooked and avoids some head
scratching sessions.

This may be backported to 2.6.
2023-03-20 19:58:38 +01:00
Willy Tarreau
e2f7946339 BUG/MEDIUM: stconn: don't set the type before allocation succeeds
There's an occasional crash that can be triggered in sc_detach_endp()
when calling conn->mux->detach() upon memory allocation error. The
problem in fact comes from sc_attach_mux(), which doesn't reset the
sc type flags upon tasklet allocation failure, leading to an attempt
at detaching an incompletely initialized stconn. Let's just attach
the sc after the tasklet allocation succeeds, not before.

This must be backported to 2.6.
2023-03-20 19:58:38 +01:00
Willy Tarreau
389ab0d4b4 BUG/MEDIUM: mux-h2: erase h2c->wait_event.tasklet on error path
On the allocation error path in h2_init() we may check if
h2c->wait_event.tasklet needs to be released but it has not yet been
zeroed. Let's do this before jumping to the freeing location.

This needs to be backported to all maintained versions.
2023-03-20 19:58:38 +01:00
Willy Tarreau
bcdc6cc15b BUG/MEDIUM: mux-h2: do not try to free an unallocated h2s->sd
In h2s_close() we may dereference h2s->sd to get the sc, but this
function may be called on allocation error paths, so we must check
for this specific condition. Let's also update the comment to make
it explicitly permitted.

This needs to be backported to 2.6.
2023-03-20 19:58:38 +01:00
Willy Tarreau
a45e7e81ec BUG/MEDIUM: stream: do not try to free a failed stream-conn
In stream_free() if we fail to allocate s->scb() we go to the path where
we try to free it, and it doesn't like being called with a null at all.
It's easily reproducible with -dMfail,no-cache and "tune.fail-alloc 10"
in the global section.

This must be backported to 2.6.
2023-03-20 19:58:38 +01:00
Frédéric Lécaille
e25fce03eb BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing
This bug arrived with this commit:
   "MINOR: quic: implement qc_notify_send()".
The ->tx.pto_probe variable was no more set when qc_processt_timer() the timer
task for the connection responsible of detecting packet loss and probing upon
PTO expiration leading to interrupted stream transfers. This was revealed by
blackhole interop failed tests where one could see that qc_process_timer()
was wakeup without traces as follows in the log file:
   "needs to probe 01RTT packet number space"

Must be backported to 2.7 and to 2.6 if the commit mentionned above
is backported to 2.6 in the meantime.
2023-03-20 17:50:36 +01:00
Frédéric Lécaille
c664e644eb MINOR: quic: Stop stressing the acknowledgments process (RX ACK frames)
The ACK frame range of packets were handled from the largest to the smallest
packet number, leading to big number of ebtree insertions when the packet are
handled in the inverse way they are sent. This was detected a long time ago
but left in the code to stress our implementation. It is time to be more
efficient and process the packet so that to avoid useless ebtree insertions.

Modify qc_ackrng_pkts() responsible of handling the acknowledged packets from an
ACK frame range of acknowledged packets.

Must be backported to 2.7.
2023-03-20 17:47:12 +01:00
Willy Tarreau
ac78c4fd9d MINOR: ssl-sock: pass the CO_SFL_MSG_MORE info down the stack
Despite having replaced the SSL BIOs to use our own raw_sock layer, we
still didn't exploit the CO_SFL_MSG_MORE flag which is pretty useful to
avoid sending incomplete packets. It's particularly important for SSL
since the extra overhead almost guarantees that each send() will be
followed by an incomplete (and often odd-sided) segment.

We already have an xprt_st set of flags to pass info to the various
layers, so let's just add a new one, SSL_SOCK_SEND_MORE, that is set
or cleared during ssl_sock_from_buf() to transfer the knowledge of
CO_SFL_MSG_MORE. This way we can recover this information and pass
it to raw_sock.

This alone is sufficient to increase by ~5-10% the H2 bandwidth over
SSL when multiple streams are used in parallel.
2023-03-17 16:43:51 +01:00
Willy Tarreau
464fa06e9a MINOR: mux-h2: set CO_SFL_MSG_MORE when sending multiple buffers
Traces show that sendto() rarely has MSG_MORE on H2 despite sending
multiple buffers. The reason is that the loop iterating over the buffer
ring doesn't have this info and doesn't pass it down.

But now we know how many buffers are left to be sent, so we know whether
or not the current buffer is the last one. As such we can set this flag
for all buffers but the last one.
2023-03-17 16:43:51 +01:00
Willy Tarreau
88718955f4 OPTIM: mux-h1: limit first read size to avoid wrapping
Before muxes were used, we used to refrain from reading past the
buffer's reserve. But with muxes which have their own buffer, this
rule was a bit forgotten, resulting in an extraneous read to be
performed just because the rx buffer cannot be entirely transferred
to the stream layer:

  sendto(12, "GET /?s=16k HTTP/1.1\r\nhost: 127."..., 84, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 84
  recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 16320, 0, NULL, NULL) = 16320
  recvfrom(12, ".123456789.12345", 16, 0, NULL, NULL) = 16
  recvfrom(12, "6789.123456789.12345678\n.1234567"..., 15244, 0, NULL, NULL) = 182
  recvfrom(12, 0x1e5d5d6, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Here the server sends 16kB of payload after a headers block, the mux reads
16320 into the ibuf, and the stream layer consumes 15360 from the first
h1_rcv_buf(), which leaves 960 into the buffer and releases a few indexes.
The buffer cannot be realigned due to these remaining data, and a subsequent
read is made on 16 bytes, then again on 182 bytes.

By avoiding to read too much on the first call, we can avoid needlessly
filling this buffer:

  recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 15360, 0, NULL, NULL) = 15360
  recvfrom(12, "456789.123456789.123456789.12345"..., 16220, 0, NULL, NULL) = 1158
  recvfrom(12, 0x1d52a3a, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

This is much more efficient and uses less RAM since the first buffer that
was emptied can now be released.

Note that a further improvement (tested) consists in reading even less
(typically 1kB) so that most of the data are transferred in zero-copy, and
are not read until process_stream() is scheduled. This patch doesn't do that
for now so that it can be backported without any obscure impact.
2023-03-17 16:43:51 +01:00
Willy Tarreau
f41dfc22b2 BUG/MAJOR: qpack: fix possible read out of bounds in static table
CertiK Skyfall Team reported that passing an index greater than
QPACK_SHT_SIZE in a qpack instruction referencing a literal field
name with name reference or and indexed field line will cause a
read out of bounds that may crash the process, and confirmed that
this fix addresses the issue.

This needs to be backported as far as 2.5.
2023-03-17 16:43:51 +01:00
Aurelien DARRAGON
e2907c7ee3 MINOR: stick-table: add sc-add-gpc() to http-after-response
sc-add-gpc() was implemented in 5a72d03 ("MINOR: stick-table:
implement the sc-add-gpc() action")

This new action was exposed everywhere sc-inc-gpc() is available,
except for http-after-response.
But there doesn't seem to be a technical constraint that prevents us from
exposing it in http-after-response.
It was probably overlooked, let's add it.

No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.
2023-03-17 13:09:09 +01:00
Frédéric Lécaille
ca07979b97 BUG/MINOR: quic: Missing STREAM frame data pointer updates
This patch follows this one which was not sufficient:
    "BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.

Must be backported to 2.6 and 2.7.
2023-03-17 09:21:18 +01:00
Willy Tarreau
14ea98af73 BUG/MINOR: mux-h2: set CO_SFL_STREAMER when sending lots of data
Emeric noticed that h2 bit-rate performance was always slightly lower
than h1 when the CPU is saturated. Strace showed that we were always
data in 2kB chunks, corresponding to the max_record size. What's
happening is that when this mechanism of dynamic record size was
introduced, the STREAMER flag at the stream level was relied upon.
Since all this was moved to the muxes, the flag has to be passed as
an argument to the snd_buf() function, but the mux h2 did not use it
despite a comment mentioning it, probably because before the multi-buf
it was not easy to figure the status of the buffer.

The solution here consists in checking if the mbuf is congested or not,
by checking if it has more than one buffer allocated. If so we set the
CO_SFL_STREAMER flag, otherwise we don't. This way moderate size
exchanges continue to be made over small chunks, but downloads will
be able to use the large ones.

While it could be backported to all supported versions, it would be
better to limit it to the last LTS, so let's do it for 2.7 and 2.6 only.
This patch requires previous commit "MINOR: buffer: add br_single() to
check if a buffer ring has more than one buf".
2023-03-16 18:45:46 +01:00
Willy Tarreau
93c5511af8 BUG/MEDIUM: mux-h2: only restart sending when mux buffer is decongested
During performance tests, Emeric faced a case where the wakeups of
sc_conn_io_cb() caused by h2_resume_each_sending_h2s() was multiplied
by 5-50 and a lot of CPU was being spent doing this for apparently no
reason.

The culprit is h2_send() not behaving well with congested buffers and
small SSL records. What happens when the output is congested is that
all buffers are full, and data are emitted in 2kB chunks, which are
sufficient to wake all streams up again to ask them to send data again,
something that will obviously only work for one of them at best, and
waste a lot of CPU in wakeups and memcpy() due to the small buffers.
When this happens, the performance can be divided by 2-2.5 on large
objects.

Here the chosen solution against this is to keep in mind that as long
as there are still at least two buffers in the ring after calling
xprt->snd_buf(), it means that the output is congested and there's
no point trying again, because these data will just be placed into
such buffers and will wait there. Instead we only mark the buffer
decongested once we're back to a single allocated buffer in the ring.

By doing so we preserve the ability to deal with large concurrent
bursts while not causing a thundering herd by waking all streams for
almost nothing.

This needs to be backported to 2.7 and 2.6. Other versions could
benefit from it as well but it's not strictly necessary, and we can
reconsider this option if some excess calls to sc_conn_io_cb() are
faced.

Note that this fix depends on this recent commit:
    MINOR: buffer: add br_single() to check if a buffer ring has more than one buf
2023-03-16 18:45:46 +01:00
Willy Tarreau
3fb2c6d5b4 BUG/MINOR: mux-h2: make sure the h2c task exists before refreshing it
When detaching a stream, if it's the last one and the mbuf is blocked,
we leave without freeing the stream yet. We also refresh the h2c task's
timeout, except that it's possible that there's no such task in case
there is no client timeout, causing a crash. The fix just consists in
doing this when the task exists.

This bug has always been there and is extremely hard to meet even
without a client timeout. This fix has to be backported to all
branches, but it's unlikely anyone has ever met it anyay.
2023-03-16 18:45:46 +01:00
Christopher Faulet
3a7b539b12 BUG/MEDIUM: connection: Preserve flags when a conn is removed from an idle list
The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. if a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. If means this counter may become negative and the
counters tracking the number of idle connecitons is not decremented,
suggesting a leak.

So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless the connection
flags.

This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.

This patch should fix #2078 and #2057. It must be backported as far as 2.2.
2023-03-16 15:34:20 +01:00
Frédéric Lécaille
fc546ab6a7 BUG/MINOR: quic: Missing STREAM frame length updates
Some STREAM frame lengths were not updated before being duplicated, built
of requeued contrary to their ack offsets. This leads haproxy to crash when
receiving acknowledgements for such frames with this thread #1 backtrace:

 Thread 1 (Thread 0x7211b6ffd640 (LWP 986141)):
 #0  ha_crash_now () at include/haproxy/bug.h:52
 No locals.
 #1  b_del (b=<optimized out>, del=<optimized out>) at include/haproxy/buf.h:436
 No locals.
 #2  qc_stream_desc_ack (stream=stream@entry=0x7211b6fd9bc8, offset=offset@entry=53176, len=len@entry=1122) at src/quic_stream.c:111

Thank you to @Tristan971 for having provided such traces which reveal this issue:

[04|quic|5|c_conn.c:1865] qc_requeue_nacked_pkt_tx_frms(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1179] qc_frm_unref(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1186] qc_frm_unref(): remove frame reference : qc@0x72119c22cfe0 frm@0x72118863d260 STREAM_F uni=0 fin=1 id=460 off=52957 len=1122 3244
[04|quic|5|_frame.c:1194] qc_frm_unref(): leaving : qc@0x72119c22cfe0
[04|quic|5|c_conn.c:1902] qc_requeue_nacked_pkt_tx_frms(): updated partially acked frame : qc@0x72119c22cfe0 frm@0x72119c472290 STREAM_F uni=0 fin=1 id=460 off=53176 len=1122

Note that haproxy has much more chance to crash if this frame is the last one
(fin bit set). But another condition must be fullfilled to update the ack offset.
A previous STREAM frame from the same stream with the same offset but with less
data must be acknowledged by the peer. This is the condition to update the ack offset.

For others frames without fin bit in the same conditions, I guess the stream may be
truncated because too much data are removed from the stream when they are
acknowledged.

Must be backported to 2.6 and 2.7.
2023-03-16 14:35:19 +01:00
Aurelien DARRAGON
819817fc5e BUG/MINOR: tcp_sample: fix a bug in fc_dst_port and fc_dst_is_local sample fetches
There is a bug in the smp_fetch_dport() function which affects the 'f' case,
also known as 'fc_dst_port' sample fetch.

conn_get_src() is used to retrieve the address prior to calling conn_dst().
But this is wrong: conn_get_dst() should be used instead.

Because of that, conn_dst() may return unexpected results since the dst
address is not guaranteed to be set depending on the conn state at the time
the sample fetch is used.

This was reported by Corin Langosch on the ML:

during his tests he noticed that using fc_dst_port in a log-format string
resulted in the correct value being printed in the logs but when he used it
in an ACL, the ACL did not evaluate properly.

This can be easily reproduced with the following test conf:
    |frontend test-http
    |  bind 127.0.0.1:8080
    |  mode http
    |
    |  acl test fc_dst_port eq 8080
    |  http-request return status 200 if test
    |  http-request return status 500 if !test

A request on 127.0.0.1:8080 should normally return 200 OK, but here it
will return a 500.

The same bug was also found in smp_fetch_dst_is_local() (fc_dst_is_local
sample fetch) by reading the code: the fix was applied twice.

This needs to be backported up to 2.5
[both sample fetches were introduced in 2.5 with 888cd70 ("MINOR:
tcp-sample: Add samples to get original info about client connection")]
2023-03-16 11:26:53 +01:00
Christopher Faulet
a4bd7602f3 BUG/MEDIUM: mux-h1: Don't block SE_FL_ERROR if EOS is not reported on H1C
When a connection error is encountered during a receive, the error is not
immediatly reported to the SE descriptor. We take care to process all
pending input data first. However, when the error is finally reported, a
fatal error is reported only if a read0 was also received. Otherwise, only a
pending error is reported.

With a raw socket, it is not an issue because shutdowns for read and write
are systematically reported too when a connection error is detected. So in
this case, the fatal error is always reported to the SE descriptor. But with
a SSL socket, in case of pure SSL error, only the connection error is
reported. This prevent the fatal error to go up. And because the connection
is in error, no more receive or send are preformed on the socket. The mux is
blocked till a timeout is triggered at the stream level, looping infinitly
to progress.

To fix the bug, during the demux stage, when there is no longer pending
data, the read error is reported to the SE descriptor, regardless the
shutdown for reads was received or not. No more data are expected, so it is
safe.

This patch should fix the issue #2046. It must be backported to 2.7.
2023-03-16 08:54:51 +01:00
Christopher Faulet
f19c639787 DEBUG: ssl-sock/show_fd: Display SSL error code
Like for connection error code, when FD are dumps, the ssl error code is now
displayed. This may help to diagnose why a connection error occurred.

This patch may be backported to help debugging.
2023-03-14 15:51:34 +01:00
Christopher Faulet
d52f2ad6ee DEBUG: cli/show_fd: Display connection error code
When FD are dumps, the connection error code is now displayed. This may help
to diagnose why a connection error occurred.

This patch may be backported to help debugging.
2023-03-14 15:48:07 +01:00
Christopher Faulet
52ec6f14c4 BUG/MEDIUM: resolvers: Properly stop server resolutions on soft-stop
When HAproxy is stopping, the DNS resolutions must be stopped, except those
triggered from a "do-resolve" action. To do so, the resolutions themselves
cannot be destroyed, the current design is too complex. However, it is
possible to mute the resolvers tasks. The same is already performed with the
health-checks. On soft-stop, the tasks are still running periodically but
nothing if performed.

For the resolvers, when the process is stopping, before running a
resolution, we check all the requesters attached to this resolution. If t
least a request is a stream or if there is a requester attached to a running
proxy, a new resolution is triggered. Otherwise, we ignored the
resolution. It will be evaluated again on the next wakeup. This way,
"do-resolv" action are still working during soft-stop but other resoluation
are stopped.

Of course, it may be see as a feature and not a bug because it was never
performed. But it is in fact not expected at all to still performing
resolutions when HAProxy is stopping. In addution, a proxy option will be
added to change this behavior.

This patch partially fixes the issue #1874. It could be backported to 2.7
and maybe to 2.6. But no further.
2023-03-14 15:23:55 +01:00
Christopher Faulet
48678e483f BUG/MEDIUM: proxy: properly stop backends on soft-stop
On soft-stop, we must properlu stop backends and not only proxies with at
least a listener. This is mandatory in order to stop the health checks. A
previous fix was provided to do so (ba29687bc1 "BUG/MEDIUM: proxy: properly
stop backends"). However, only stop_proxy() function was fixed. When HAproxy
is stopped, this function is no longer used. So the same kind of fix must be
done on do_soft_stop_now().

This patch partially fixes the issue #1874. It must be backported as far as
2.4.
2023-03-14 15:23:55 +01:00
Remi Tricot-Le Breton
7716f27736 MINOR: ssl: Add certificate path to 'show ssl ocsp-response' output
The ocsp-related CLI commands tend to work with OCSP_CERTIDs as well as
certificate paths so the path should also be added to the output of the
"show ssl ocsp-response" command when no certid or path is provided.
2023-03-14 11:07:32 +01:00
Remi Tricot-Le Breton
dafc068f12 MINOR: ssl: Accept certpath as param in "show ssl ocsp-response" CLI command
In order to increase usability, the "show ssl ocsp-response" also takes
a frontend certificate path as parameter. In such a case, it behaves the
same way as "show ssl cert foo.pem.ocsp".
2023-03-14 11:07:32 +01:00
Remi Tricot-Le Breton
f64a05979d BUG/MINOR: ssl: Fix double free in ocsp update deinit
If the last update before a deinit happens was successful, the pointer
to the httpclient in the ocsp update context was not reset while the
httpclient instance was already destroyed.
2023-03-14 11:07:32 +01:00
Remi Tricot-Le Breton
a6c0a59e9a MINOR: ssl: Use ocsp update task for "update ssl ocsp-response" command
Instead of having a dedicated httpclient instance and its own code
decorrelated from the actual auto update one, the "update ssl
ocsp-response" will now use the update task in order to perform updates.

Since the cli command allows to update responses that were never
included in the auto update tree, a new flag was added to the
certificate_ocsp structure so that the said entry can be inserted into
the tree "by hand" and it won't be reinserted back into the tree after
the update process is performed. The 'update_once' flag "stole" a bit
from the 'fail_count' counter since it is the one less likely to reach
UINT_MAX among the ocsp counters of the certificate_ocsp structure.

This new logic required that every certificate_ocsp entry contained all
the ocsp-related information at all time since entries that are not
supposed to be configured automatically can still be updated through the
cli. The logic of the ssl_sock_load_ocsp was changed accordingly.
2023-03-14 11:07:32 +01:00
Remi Tricot-Le Breton
c9bfe32b71 MINOR: ssl: Change the ocsp update log-format
The dedicated proxy used for OCSP auto update is renamed OCSP-UPDATE
which should be more explicit than the previous HC_OCSP name. The
reference to the underlying httpclient is simply kept in the
documentation.
The certid is removed from the log line since it is not really
comprehensible and is replaced by the path to the corresponding frontend
certificate.
2023-03-14 11:07:32 +01:00
Christopher Faulet
e5d02c3d46 BUG/MEDIUM: mux-pt: Set EOS on error on sending path if read0 was received
It is more a less a revert of the commit b65af26e1 ("MEDIUM: mux-pt: Don't
always set a final error on SE on the sending path"). The PT multiplexer is
so simple that an error on the sending path is terminal. Unlike other muxes,
there is no connection level here. However, instead of reporting an final
error by setting SE_FL_ERROR, we set SE_FL_EOS flag instead if a read0 was
received on the underlying connection. Concretely, it is always true with
the current design of the raw socket layer. But it is cleaner this way.

Without this patch, it is possible to block a TCP socket if a connection
error is triggered when data are sent (for instance a broken pipe) while the
upper stream does not expect to receive more data.

Note the patch above introduced a regression because errors handling at the
connection level is quite simple. All errors are final. But we must keep in
mind it may change. And if so, this will require to move back on a 2-step
errors handling in the mux-pt.

This patch must be backported to 2.7.
2023-03-13 11:22:13 +01:00
Willy Tarreau
8f6da64641 MINOR: quic_sock: un-statify quic_conn_sock_fd_iocb()
This one is printed as the iocb in the "show fd" output, and arguably
this wasn't very convenient as-is:
    293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90)

Let's unstatify it and export it so that the symbol can now be resolved
from the various points that need it.
2023-03-10 14:30:01 +01:00
Frédéric Lécaille
4377dbd756 BUG/MINOR: quic: Missing listener accept queue tasklet wakeups
This bug was revealed by h2load tests run as follows:

      h2load -t 4 --npn-list h3 -c 64 -m 16 -n 16384  -v https://127.0.0.1:4443/

This open (-c) 64 QUIC connections (-n) 16384 h3 requets from (-t) 4 threads, i.e.
256 requests by connection. Such tests could not always pass and often ended with
such results displays by h2load:

finished in 53.74s, 38.11 req/s, 493.78KB/s
requests: 16384 total, 2944 started, 2048 done, 2048 succeeded, 14336
failed, 14336 errored, 0 timeout
status codes: 2048 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 25.92MB (27174537) total, 102.00KB (104448) headers (space
savings 1.92%), 25.80MB (27053569) data
UDP datagram: 3883 sent, 24330 received
                     min         max         mean         sd        ± sd
time for request:    48.75ms    502.86ms    134.12ms     75.80ms    92.68%
time for connect:    20.94ms    331.24ms    189.59ms     84.81ms    59.38%
time to 1st byte:   394.36ms    417.01ms    406.72ms      9.14ms    75.00%
req/s           :       0.00      115.45       14.30       38.13    87.50%

The number of successful requests was always a multiple of 256.
Activating the traces also shew that some connections were blocked after having
successfully completed their handshakes due to the fact that the mux. The mux
is started upon the acceptation of the connection.

Under heavy load, some connections were never accepted. From the moment where
more than 4 (MAXACCEPT) connections were enqueued before a listener could be
woken up to accept at most 4 connections, the remaining connections were not
accepted ore lately at the second listener tasklet  wakeup.

Add a call to tasklet_wakeup() to the accept list tasklet of the listeners to
wake up it if there are remaining connections to accept after having called
listener_accept(). In this case the listener must not be removed of this
accept list, if not at the next call it will not accept anything more.

Must be backported to 2.7 and 2.6.
2023-03-10 14:05:24 +01:00
William Lallemand
2078d4b1f7 BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)

This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.

Must be backported as far as 2.5.
2023-03-09 14:28:44 +01:00
Willy Tarreau
bd3b44edff MINOR: debug: add random delay injection with "debug dev delay-inj"
The goal is to send signals to random threads at random instants so that
they spin for a random delay in a relax() loop, trying to give back the
CPU to another competing hardware thread, in hope that from time to time
this can trigger in critical areas and increase the chances to provoke a
latent concurrency bug. For now none were observed.

For example, this command starts 64 such tasks waking after random delays
of 0-1ms and delivering signals to trigger such loops on 3 random threads:

  for i in {1..64}; do
    socat - /tmp/sock1 <<< "expert-mode on;debug dev delay-inj 2 3"
  done

This command is only enabled when DEBUG_DEV is set at build time.
2023-03-09 14:01:58 +01:00
Willy Tarreau
cd8914bc52 BUG/MAJOR: fd/threads: close a race on closing connections after takeover
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.

Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.

First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.

Second, the checks after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_close_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().

With both points addressed, there's no more visible race left:

- takeover() only happens under the connection list's lock and cannot
  compete with fd_delete() since fd_delete() must first remove the
  connection from the list before deleting the FD. That's also why it
  doesn't need to call _fd_delete_orphan() when dropping its running
  bit.

- takeover() sets its running bit then atomically replaces the thread
  mask, so that until that's done, it doesn't validate the condition
  to end the synchonization loop in fd_update_events(). Once it's OK,
  the previous thread's bit is lost, and this is checked for in
  fd_update_events()

- fd_update_events() can compete with fd_delete() at various places
  which are explained above. Since fd_delete() clears the thread mask
  as after setting its running bit and after setting the FD_MUST_CLOSE
  bit, the synchronization loop guarantees that the thread mask is seen
  before going further, and that once it's seen, the FD_MUST_CLOSE flag
  is already present.

- fd_delete() may start while fd_update_events() has already started,
  but fd_delete() must hold a bit in thread_mask before starting, and
  that is checked by the first test in fd_update_events() before setting
  the running_mask.

- the poller's _update_fd() will not compete against _fd_delete_orphan()
  nor fd_insert() thanks to the fd_grab_tgid() that's always done before
  updating the polled_mask, and guarantees that we never pretend that a
  polled_mask has a bit before the FD is added.

The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:

    { volatile int i = 10000; while (i--); }

Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.

Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.

The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable long.
2023-03-09 14:01:48 +01:00
Willy Tarreau
cf0d0eedc7 BUG/MINOR: thread: report thread and group counts in the correct order
In case too many thread groups are needed for the threads, we emit
an error indicating the problem. Unfortunately the threads and groups
counts were reversed.

This can be backported to 2.6.
2023-03-09 11:40:56 +01:00
Willy Tarreau
f5b63277f4 BUG/MINOR: init: properly detect NUMA bindings on large systems
The NUMA detection code tries not to interfer with any taskset the user
could have specified in init scripts. For this it compares the number of
CPUs available with the number the process is bound to. However, the CPU
count is retrieved after being applied an upper bound of MAX_THREADS, so
if the machine has more than 64 CPUs, the comparison always fails and
makes haproxy think the user has already enforced a binding, and it does
not pin it anymore to a single NUMA node.

This can be verified by issuing:

  $ socat /path/to/sock - <<< "show info" | grep thread

On a dual 48-CPU machine it reports 64, implying that threads are allowed
to run on the second socket:

  Nbthread: 64

With this fix, the function properly reports 96, and the output shows 48,
indicating that a single NUMA node was used:

  Nbthread: 48

Of course nothing is changed when "no numa-cpu-mapping" is specified:

  Nbthread: 64

This can be backported to 2.4.
2023-03-09 10:17:37 +01:00
Frédéric Lécaille
be795ceb91 MINOR: quic: Do not stress the peer during retransmissions of lost packets
This issue was revealed by "Multiple streams" QUIC tracker test which very often
fails (locally) with a file of about 1Mbytes (x4 streams). The log of QUIC tracker
revealed that from its point of view, the 4 files were never all received entirely:

 "results" : {
      "stream_0_rec_closed" : true,
      "stream_0_rec_offset" : 1024250,
      "stream_0_snd_closed" : true,
      "stream_0_snd_offset" : 15,
      "stream_12_rec_closed" : false,
      "stream_12_rec_offset" : 72689,
      "stream_12_snd_closed" : true,
      "stream_12_snd_offset" : 15,
      "stream_4_rec_closed" : true,
      "stream_4_rec_offset" : 1024250,
      "stream_4_snd_closed" : true,
      "stream_4_snd_offset" : 15,
      "stream_8_rec_closed" : true,
      "stream_8_rec_offset" : 1024250,
      "stream_8_snd_closed" : true,
      "stream_8_snd_offset" : 15
   },

But this in contradiction with others QUIC tracker logs which confirms that haproxy
has really (re)sent the stream at the suspected offset(stream_12_rec_offset):

       1152085,
       "transport",
       "packet_received",
       {
          "frames" : [
             {
                "frame_type" : "stream",
                "length" : "155",
                "offset" : "72689",
                "stream_id" : "12"
             }
          ],
          "header" : {
             "dcid" : "a14479169ebb9dba",
             "dcil" : "8",
             "packet_number" : "466",
             "packet_size" : 190
          },
          "packet_type" : "1RTT"
       }

When detected as losts, the packets are enlisted, then their frames are
requeued in their packet number space by qc_requeue_nacked_pkt_tx_frms().
This was done using a local list which was spliced to the packet number
frame list. This had as bad effect to retransmit the frames in the inverse
order they have been sent. This is something the QUIC tracker go client
does not like at all!

Removing the frame splicing fixes this issue and allows haproxy to pass the
"Multiple streams" test.

Must be backported to 2.7.
2023-03-08 18:50:45 +01:00
Willy Tarreau
9b773ec118 CLEANUP: sock: always perform last connection updates before wakeup
Normally the task_wakeup() in sock_conn_io_cb() is expected to
happen on the same thread the FD is attached to. But due to the
way the code was arranged in the past (with synchronous callbacks)
we continue to update connections after the wakeup, which always
makes the reader have to think deeply whether it's possible or not
to call another thread there. Let's just move the tasklet_wakeup()
at the end to make sure there's no problem with that.
2023-03-08 16:07:32 +01:00
Willy Tarreau
677c006c5c MINOR: fd/cli: report the polling mask in "show fd"
It's missing and often needed when trying to debug a situation, let's
report the polling mask as well in "show fd".
2023-03-08 16:07:32 +01:00
Frédéric Lécaille
cc101cd2aa BUG/MINOR: quic: Wrong RETIRE_CONNECTION_ID sequence number check
This bug arrived with this commit:
     b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.

Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame, it returns the CID to be
retired into <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired

Must be backported to 2.7.
2023-03-08 14:53:12 +01:00
Amaury Denoyelle
5907fede87 MEDIUM: quic: release closing connections on stopping
Since the following commit :
  commit fb375574f9
  MINOR: quic: mark quic-conn as jobs on socket allocation

quic-conn instances are marked as jobs. This prevent haproxy process to
stop while there is transfer in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.

However, there is no mechanism to wake up quic connections left on
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connections timer has expired.

To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.

This should be backported up to 2.7.
2023-03-08 14:41:28 +01:00
Amaury Denoyelle
2d37629222 MINOR: quic: handle new closing list in show quic
A new global quic-conn list has been added by the previous patch. It will
contain every quic-conn in closing or draining state.

Thus, it is now easier to include or skip them on a "show quic" output :
when the default list on the current thread has been browsed entirely,
either we skip to the next thread or we look at the closing list on the
current thread.

This should be backported up to 2.7.
2023-03-08 14:39:48 +01:00
Amaury Denoyelle
efed86c973 MINOR: quic: create a global list dedicated for closing QUIC conns
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained during a certain timer to handle packet
reordering or loss.

A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters in one of this state, it will be moved from the
default global list to the new closing list.

The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.

A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attach to a quic-conn instance in case of a
running "show quic".

This should be backported up to 2.7.
2023-03-08 14:39:48 +01:00
Amaury Denoyelle
815c8ce210 MINOR: h3: add traces on h3_init_uni_stream() error paths
Complete traces on h3_init_uni_stream(). This ensures there is always a
dedicated trace for each error paths.

This should be backported up to 2.7.
2023-03-08 14:32:30 +01:00
Remi Tricot-Le Breton
447a38f387 MINOR: jwt: Add support for RSA-PSS signatures (PS256 algorithm)
This patch adds the support for the PS algorithms when verifying JWT
signatures (rsa-pss). It was not managed during the first implementation
and previously raised an "Unmanaged algorithm" error.
The tests use the same rsa signature as the plain rsa tests (RS256 ...)
and the implementation simply adds a call to
EVP_PKEY_CTX_set_rsa_padding in the function that manages rsa and ecdsa
signatures.
The signatures in the reg-test were built thanks to the PyJWT python
library once again.
2023-03-08 10:43:04 +01:00
Aurelien DARRAGON
bce0c0c37a BUG/MINOR: dns: fix ring offset calculation in dns_resolve_send()
With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") relative offset calculation was fixed in dns_session_io_handler()
and dns_process_req() functions.

But if we compare with the changes performed in the patch that introduced
the bug: d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail
instead of absolute"), we can see that dns_resolve_send() is missing from
the patch.

Applying both 737d10f + ("BUG/MINOR: dns: fix ring offset calculation on
first read") to dns_resolve_send() function.
With this last commit, we should be back at pre d9c7188 behavior.

No backport needed.
2023-03-08 08:57:13 +01:00
Aurelien DARRAGON
5a43db2c5d BUG/MINOR: dns: fix ring offset calculation on first read
With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") ring offset is now properly re-adjusted in dns_session_io_handler()
and dns_process_req().

But the previous patch does not cope well if the first read is performed
on a non-empty ring since relative ofs will be computed from ds->ofs=0 or
dss->ofs_req=0.
In this case: relative offset could become invalid since we mix up relative
offsets with absolute offsets.

To fix this, we apply the same logic performed in d9c7188 ("MEDIUM: ring:
make the offset relative to the head/tail instead of absolute") for the
cli_io_handler_show_ring() function: that is using b_peek_ofs(buf, 0) to
set the contextual offset instead of hard-coding it to 0.

This should be considered as a minor bugfix since this bug was discovered by
reading the code: 737d10f already survived a good amount of stress-tests as
shown in GH #2068.

No backport needed as 737d10f is not marked for backports.
2023-03-08 08:56:30 +01:00
Aurelien DARRAGON
2c98867187 BUG/MEDIUM: sink/forwarder: ensure ring offset is properly readjusted to head
Since d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail instead
of absolute"), ring offset calculation has changed: we don't rely on ring->ofs
absolute offset anymore.

But with the above patch, relative offset is not properly calculated in
sink_forward_oc_io_handler() and sink_forward_io_handler().

The issue here is the same as 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head") since dns and sink_forward share the same
ring logic:

When the ring is becoming full, ring_write() will try to regain some space to
insert new data by calling b_del() on older messages. Here b_del() moves
buffer's head under the hood, and since ring->ofs cannot be used to "correct"
the relative offset, both sink_forward_oc_io_handler() and
sink_forward_io_handler() start to get invalid offset.
At this point, we will suffer from ring data corruption resulting in unexpected
behavior or process crashes.

This can be easily demonstrated with the following test:

    |log-forward syslog
    |  dgram-bind 127.0.0.1:5114
    |  log ring@logbuffer local0
    |
    |ring logbuffer
    |  format rfc5424
    |  size 16384
    |  server logserver 127.0.0.1:5114

Haproxy will forward incoming logs on udp@127.0.0.1:5114 to
tcp@127.0.0.1:5114

Then use the following tcp server:
  nc -l -p 5114

With the following udp log sender:
    |while [ 1 ]
    |do
    |  logger --udp  --server 127.0.0.1 -P 5114 -p user.warn "Test 7"
    |done

Once the ring buffer is full (it takes less that a second to fill the 16k
buffer) haproxy starts to misbehave and the log forwarding stops.

We apply the same fix as in 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head").
Please note the ~0 case that is handled slightly differently in this patch:
this is required to properly start reading from a non-empty ring. This case
will be fixed in dns related code in the following patch.

This does not need to be backported as d9c7188 was not marked for backports.
2023-03-08 08:54:43 +01:00
Frédéric Lécaille
5e3201ea77 MINOR: quic: Add transport parameters to "show quic"
Modify quic_transport_params_dump() and others function relative to the
transport parameters value dump from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
ece86e64c4 MINOR: quic: Add spin bit support
Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.

Validated by quic-tracker spin bit test.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
433af7fad9 MINOR: quic: Useless TLS context allocations in qc_do_rm_hp()
These allocations are definitively useless.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
8ac8a8778d MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
Add ->curr_cid_seq_num new quic_conn struct frame to store the connection
ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
904caac3e4 MINOR: quic: Typo fix for ACK_ECN frame
Wrong name displayed by TRACE().

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
b4c5471425 MINOR: quic: Store the next connection IDs sequence number in the connection
Add ->next_cid_seq_num new member to quic_conn struct to store the next
connection ID to be used to alloacated a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.

Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the hanshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
this new connection IDs.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Frédéric Lécaille
4afbca611f MINOR: quic: Do not accept wrong active_connection_id_limit values
A peer must not send active_connection_id_limit values smaller than 2
which is also the minimum value when not sent.

Make the transport parameters decoding fail in this case.

Must be backported to 2.7.
2023-03-08 08:50:54 +01:00
Amaury Denoyelle
ebfafc212a BUG/MINOR: mux-quic: properly init STREAM frame as not duplicated
STREAM frame retransmission has been recently fixed. A new boolean field
<dup> was created for quic_stream frame type. It is set for duplicated
STREAM frame to ensure extra checks on the underlying buffer are
conducted before sending the frame. All of this has been implemented by
this commit :
  315a4f6ae5
  BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX

However, the above commit is incomplete. In the MUX code, when a new
STREAM frame is created, <dup> is left uninitialized. In most cases this
is harmless as it will only add extra unneeded checks before sending the
frame. So this is mainly a performance issue.

There is however one case where this bug will lead to a crash : when the
response consists only of an empty STREAM frame. In this case, the empty
frame will be silently removed as it is incorrectly assimilated to an
already acked frame range in qc_build_frms(). This can trigger a
BUG_ON() on the MUX code as a qcs instance is still in the send list
after qc_send_frames() invocation.

Note that this is extremely rare to have only an empty STREAM frame. It
was reproduced with HTTP/0.9 where no HTTP status line exists on an
empty body. I do not know if this is possible on HTTP/3 as a status line
should be present each time in a HEADERS frame.

Properly initialize <dup> field to 0 on each STREAM frames generated by
the QUIC MUX to fix this issue.

This crash may be linked to github issue #2049.

This should be backported up to 2.6.
2023-03-07 18:39:49 +01:00
Amaury Denoyelle
737d10fac1 BUG/MEDIUM: dns: ensure ring offset is properly reajusted to head
Since the below patch, ring offset calculation for readers has changed.
  commit d9c7188633
  MEDIUM: ring: make the offset relative to the head/tail instead of absolute

For readers, this requires to adjust their offsets to be relative to the
ring head each time read is resumed. Indeed, buffer head can change any
time a ring_write() is performed after older entries were purged.
This operation was not performed on the DNS code which causes the offset
to become invalid. In most cases, the following BUG_ON() was triggered :

  FATAL: bug condition "msg_len + ofs + cnt + 1 > b_data(buf)" matched
  at src/dns.c:522

Fix this by adjusting DNS reader offsets when entering
dns_session_io_handler() and dns_process_req().

This bug was reproduced by using a backend with 10 servers using SRV
record resolution on a single resolvers section. A BUG_ON() crash would
occur after less than 5 minutes of process execution.

This does not need to be backported as the above patch is not.

This should fix github issue #2068.
2023-03-07 15:51:58 +01:00
Willy Tarreau
237e6a0d65 BUG/MAJOR: fd/thread: fix race between updates and closing FD
While running some L7 retries tests, Christopher and I stumbled upon a
very strange behavior showing some occasional server timeouts when the
server closes keep-alive connections quickly. The issue can be
reproduced with the following config:

    global
        expose-experimental-directives
        #tune.fd.edge-triggered on   # can speed up the issue

    defaults
        mode http
        timeout client 5s
        timeout server 10s
        timeout connect 2s

    listen f
        bind :8001
        http-reuse always
        retry-on all-retryable-errors
        server next 127.0.0.1:8002

    frontend b
        bind :8002
        timeout http-keep-alive 1  # one ms
        redirect location /

Sending fast requests without reusing the client connection on port 8001
with a single connection and at least 3 threads on haproxy occasionally
shows some glitches pauses (below with timeout server 2s):

  $ taskset -c 2,3 h1load  -e -t 1 -r 1 -c 1 http://127.0.0.1:8001/
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     9794     9793         959714      0 9k79 9k79 7M67 42.94u
           2     1     9794     9793         959714      0 0.00 0.00 0.00    -
           3     1     9794     9793         959714      0 0.00 0.00 0.00    -
           4     0    16015    16015        1569470      0 6k22 6k22 4M87 522.9u
           5     0    18657    18656        1828190      2 2k63 2k63 2M06 39.22u

If this doesn't happen, limiting to a request rate close to 1/timeout
may help.

What is happening is that after several migrations, a late report
via fd_update_events() may detect that the thread is not welcome, and
will want to program an update so that the current thread's poller
disables its polling on it. It is allowed to do so because it used
fd_grab_tgid(). But what if _fd_delete_orphan() was just starting to
be called and already reset the update_mask ? We'll end up with a bit
present in the update mask, then _fd_delete_orphan() resets the tgid,
which will prevent the poller from consuming that update. The update
is not needed anymore since the FD was closed, but in this case nobody
will clear this bit until the same FD is reused again and cleared. And
as long as the thread's bit remains in the update_mask, no new updates
will be programmed for the next use of this FD on the same thread since
due to the bit being present, fd_nbupdt will not be changed. This is
what is causing this timeout.

The fix consists in making sure _fd_delete_orphan() waits for the
occasional watchers to leave, and to do this before clearing the
update_mask. This will be either fd_update_events() trying to check
its thread_mask, or the poller checking its updates, so that's pretty
short. But it definitely closes this race.

This fix is needed since the introduction of fd_grab_tgid(), hence 2.7.

Note that while testing the fix, another related issue concerning the
atomicity of running_mask vs thread_mask popped up and will have to be
fixed till 2.5 as part of another patch. It may make the tests for this
fix occasionally tigger a few BUG_ON() or face a null conn->subs in
sock_conn_iocb(), though these ones are much more difficult to trigger.
This is not caused by this fix.
2023-03-07 07:09:59 +01:00
Amaury Denoyelle
315a4f6ae5 BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX
The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client to open new
streams for this connection.

The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible to request its
immediate closure with a STOP_SENDING + RESET_STREAM.

This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
  commit 156a89aef8
  BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released

However, this patch introduces a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.

This commit was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces a crash with the following backtrace :
 #0  0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
 #1  eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
 #2  0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
 #3  0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
 #4  0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
 #5  0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
 #6  0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
 #7  0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
 #8  0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
 #9  0x00005555558a725b in process_runnable_tasks () at src/task.c:876
 #10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
 #11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
 #12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
 #13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6

This should fix github issue #2067.

This must be backported up to 2.6.
2023-03-06 13:39:40 +01:00
Frédéric Lécaille
ec93721fb0 MINOR: quic: Send PING frames when probing Initial packet number space
In very very rare cases, it is possible the Initial packet number space
must be probed even if it there is no more in flight CRYPTO frames.
In such cases, a PING frame is sent into an Initial packet. As this
packet is ack-eliciting, it must be padded by the server. qc_do_build_pkt()
is modified to do so.

Take the opportunity of this patch to modify the trace for TX frames to
easily distinguished them from other frame relative traces.

Must be backported to 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
a65b71f89f BUG/MINOR: quic: Missing detections of amplification limit reached
Mark the connection as limited by the anti-amplification limit when trying to
probe the peer.
Wakeup the connection PTO/dectection loss timer as soon as a datagram is
received. This was done only when the datagram was dropped.
This fixes deadlock issues revealed by some interop runner tests.

Must be backported to 2.7 and 2.6.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
e6359b649b BUG/MINOR: quic: Do not resend already acked frames
Some frames are marked as already acknowledged from duplicated packets
whose the original packet has been acknowledged. There is no need
to resend such packets or frames.

Implement qc_pkt_with_only_acked_frms() to detect packet with only
already acknowledged frames inside and use it from qc_prep_fast_retrans()
which selects the packet to be retransmitted.

Must be backported to 2.6 and 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
21564be4a2 BUG/MINOR: quic: Ensure not to retransmit packets with no ack-eliciting frames
Even if there is a check in callers of qc_prep_hdshk_fast_retrans() and
qc_prep_fast_retrans() to prevent retransmissions of packets with no ack-eliciting
frames, these two functions should pay attention not do to that especially if
someone decides to modify their implementations in the future.

Must be backported to 2.6 and 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
b3562a3815 BUG/MINOR: quic: Remove force_ack for Initial,Handshake packets
This is an old bug which arrived in this commit due to a misinterpretation
of the RFC I guess where the desired effect was to acknowledge all the
handshake packets:

    77ac6f566 BUG/MINOR: quic: Missing acknowledgments for trailing packets

This had as bad effect to acknowledge all the handshake packets even the
ones which are not ack-eliciting.

Must be backported to 2.7 and 2.6.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
51a7caf921 MINOR: quic: Add traces about QUIC TLS key update
Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulted new secret and the new key and iv to be used to
decryption Application level packets.

Also add a trace when the key update is supposed to be initiated on haproxy side.

This has already helped in diagnosing an issue evealed by the key update interop
test with xquic as client.

Must be backported to 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
720277843b BUG/MINOR: quic: v2 Initial packets decryption failed
v2 interop runner test revealed this bug as follows:

     [01|quic|4|c_conn.c:4087] new packet : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080 rel=H
     [01|quic|5|c_conn.c:1509] qc_pkt_decrypt(): entering : qc@0x7f62ec026e30
     [01|quic|0|c_conn.c:1553] quic_tls_decrypt() failed : qc@0x7f62ec026e30
     [01|quic|5|c_conn.c:1575] qc_pkt_decrypt(): leaving : qc@0x7f62ec026e30
     [01|quic|0|c_conn.c:4091] packet decryption failed -> dropped : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080

Only v2 Initial packets decryption received by the clients were impacted. There
is no issue to encrypt v2 Initial packets. This is due to the fact that when
negotiated the client may send two versions of Initial packets (currently v1,
then v2). The selection was done for the TX path but not on the RX path.

Implement qc_select_tls_ctx() to select the correct TLS cipher context for all
types of packets and call this function before removing the header protection
and before deciphering the packet.

Must be backported to 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
d30a04a4bb BUG/MINOR: quic: Ensure to be able to build datagrams to be retransmitted
When retransmitting datagrams with two coalesced packets inside, the second
packet was not taken into consideration when checking there is enough space
into the network for the datagram, especially when limited by the anti-amplification.

Must be backported to 2.6 and 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
ceb88b8f46 MINOR: quic: Add a BUG_ON_HOT() call for too small datagrams
This should be helpful to detect too small datagrams: datagrams
smaller than 1200 bytes, with Initial packets inside.

Must be backported to 2.7.
2023-03-03 19:12:26 +01:00
Frédéric Lécaille
69e7118fe9 BUG/MINOR: quic: Do not send too small datagrams (with Initial packets)
Before building a packet into a datagram, ensure there is sufficient space for at
least 1200 bytes. Also pad datagrams with only one ack-eliciting Initial packet
inside.

Must be backported to 2.7 and 2.6.
2023-03-03 19:12:26 +01:00
Aurelien DARRAGON
39254cac47 MINOR: http_ext: adding some documentation, forgot to inline function
Making http_7239_valid_obfsc() inline because it is only called by inline
functions.

Removing dead comment and documenting proxy_http_parse_{7239,xff,xot} functions.

No backport needed.
2023-03-03 18:22:59 +01:00
Amaury Denoyelle
dd3a33f863 BUG/MINOR: cli: fix CLI handler "set anon global-key" call
Anonymization mode has two CLI handlers "set anon <on|off>" and "set
anon global-key". The last one only requires admin level. However, as
cli_find_kw() is implemented, only the first handler will be retrieved
as they both start with the same prefix "set anon".

This has the effect to execute the wrong handler for "set anon
global-key" with an error message about an invalid keyword. To fix this,
handlers definition have been separated for both "set anon on" and "set
anon off" commands. This allows to have minimal changes while keeping
the same "set anon" prefix for each commands.

Also take this opportunity to fix a reference to a non-existing "set
global-key" CLI handler in the documentation.

This must be backported up to 2.7.
2023-03-03 18:05:58 +01:00
Amaury Denoyelle
c8a0efbda8 BUG/MEDIUM: quic: properly handle duplicated STREAM frames
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.

Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.

In most cases, this code is functional. However, there is several
potential issues which may cause a segfault :
- when explicitely probing with a STREAM frame, the frame won't be
  flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
  copied

To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.

This issue was encounted randomly with the following backtrace :
 #0  __memmove_avx512_unaligned_erms ()
 #1  0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
 #2  quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
 #3  0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
 #4  0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
 #5  0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
 #6  0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
 #7  qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
 #8  0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,

This should fix github issue #2051.

This should be backported up to 2.6.
2023-03-03 15:08:02 +01:00
Remi Tricot-Le Breton
8c20a74c90 BUG/MINOR: ssl: Use 'date' instead of 'now' in ocsp stapling callback
In the OCSP response callback, instead of using the actual date of the
system, the scheduler's 'now' timer is used when checking a response's
validity.

This patch can be backported to all stable versions.
2023-03-02 15:57:56 +01:00
Remi Tricot-Le Breton
56ab607c40 MINOR: ssl: Replace now.tv_sec with date.tv_sec in ocsp update task
Instead of relying on the scheduler's timer in the main ocsp update
task, we use the actual system's date.
2023-03-02 15:57:56 +01:00
Remi Tricot-Le Breton
86d1e0b163 BUG/MINOR: ssl: Fix ocsp-update when using "add ssl crt-list"
When adding a new certificate through the CLI and appending it to a
crt-list with the 'ocsp-update' option set, the new certificate would
not be added to the OCSP response update list.
The only thing that was missing was the copy of the ocsp_update mode
from the ssl_bind_conf into the ckch_store's object.
An extra wakeup of the update task also needed to happen in case the
newly inserted entry needs to be updated before the next wakeup of the
task.

This patch does not need to be backported.
2023-03-02 15:57:56 +01:00
Remi Tricot-Le Breton
ca0c84a509 MINOR: ssl: Add ocsp-update information to "show ssl crt-list"
The "show ssl crt-list <list>" CLI command did not manage the new
ocsp-update option yet.
2023-03-02 15:57:55 +01:00
Remi Tricot-Le Breton
5843237993 MINOR: ssl: Add global options to modify ocsp update min/max delay
The minimum and maximum delays between two automatic updates of a given
OCSP response can now be set via global options. It allows to limit the
update rate of OCSP responses for configurations that use many frontend
certificates with the ocsp-update option set if the updates are deemed
too costly.
2023-03-02 15:37:23 +01:00
Remi Tricot-Le Breton
9c4437d024 MINOR: ssl: Add way to dump ocsp response in base64
A new format option can be passed to the "show ssl ocsp-response" CLI
command to dump the contents of an OCSP response in base64. This is
needed because thanks to the new OCSP auto update mechanism, we could
end up using an OCSP response internally that was never provided by the
user.
2023-03-02 15:37:22 +01:00
Remi Tricot-Le Breton
7e1a62e2b4 MINOR: ssl: Increment OCSP update replay delay in case of failure
In case of successive OCSP update errors for a given OCSP response, the
retry delay will be multiplied by 2 for every new failure in order to
avoid retrying too often to update responses for which the responder is
unresponsive (for instance). The maximum delay will still be taken into
account so the OCSP update requests will wtill be sent at least every
hour.
2023-03-02 15:37:21 +01:00
Remi Tricot-Le Breton
b33fe2f4a2 MINOR: ssl: Use dedicated proxy and log-format for OCSP update
Instead of using the same proxy as other http client calls (through lua
for instance), the OCSP update will use a dedicated proxy which will
enable it to change the log format and log conditions (for instance).
This proxy will have the NOLOGNORM option and regular logging will be
managed by the update task itself because in order to dump information
related to OCSP updates, we need to control the moment when the logs are
emitted (instead or relying on the stream's life which is decorrelated
from the update itself).
The update task then calls sess_log directly, which uses a dedicated
ocsp logformat that fetches specific OCSP data. Sess_log was preferred
to the more low level app_log because it offers the strength of
"regular" sample fetches and allows to add generic information alongside
OCSP ones in the log line.
In case of connection error (unreachable server for instance), a regular
httpclient log line will also be emitted. This line will have some extra
HTTP related info that can't be provided by the ocsp update logging
mechanism.
2023-03-02 15:37:19 +01:00
Remi Tricot-Le Breton
d42c896216 MINOR: ssl: Add sample fetches related to OCSP update
This patch adds a series of sample fetches that rely on the specified
OCSP update context structure. They will then be of use only in the
context of an ongoing OCSP update.
They cannot be used directly in the configuration so they won't be made
public. They will be used in the OCSP update's specific log format which
should be emitted by the update task itself in a future patch.
2023-03-02 15:37:18 +01:00
Remi Tricot-Le Breton
d14fc51613 MINOR: ssl: Add 'show ssl ocsp-updates' CLI command
This command can be used to dump information about the entries contained
in the ocsp update tree. It will display one line per concerned OCSP
response and will contain the expected next update time as well as the
time of the last successful update, and the number of successful and
failed attempts.
2023-03-02 15:37:17 +01:00
Remi Tricot-Le Breton
0c96ee48b4 MINOR: ssl: Add certificate's path to certificate_ocsp structure
In order to have some information about the frontend certificate when
dumping the contents of the ocsp update tree from the cli, we could
either keep a reference to a ckch_store in the certificate_ocsp
structure, which might cause some dangling reference problems, or
simply copy the path to the certificate in the ocsp response structure.
This latter solution was chosen because of its simplicity.
2023-03-02 15:37:15 +01:00
Remi Tricot-Le Breton
ad6cba83a4 MINOR: ssl: Store specific ocsp update errors in response and update ctx
Those new specific error codes will enable to know a bit better what
went wrong during and OCSP update process. They will come to use in
future sample fetches as well as in debugging means (via the cli or
future traces).
2023-03-02 15:37:12 +01:00
Remi Tricot-Le Breton
9e94df3e55 MINOR: ssl: Add ocsp update success/failure counters
Those counters will be used for debugging purposes and will be dumped
via a cli command.
2023-03-02 15:37:11 +01:00
Remi Tricot-Le Breton
6de7b78c9f MINOR: ssl: Reinsert ocsp update entries later in case of unknown error
In case of allocation error during the construction of an OCSP request
for instance, we would have ended reinserting the ocsp entry at the same
place in the ocsp update tree which could potentially lead to an
"endless" loop of errors in ssl_ocsp_update_responses. In such a case,
entries are now reinserted further in the tree (1 minute later) in order
to avoid such a chain of alloc failure.
2023-03-02 15:37:10 +01:00
Remi Tricot-Le Breton
926f34bc36 MINOR: ssl: Destroy ocsp update http_client during cleanup
If a deinit is started while an OCSP update is in progress we might end
up with a dangling http_client instance that should be destroyed
properly.
2023-03-02 15:37:07 +01:00
Christopher Faulet
91ff709542 BUG/MINOR: mxu-h1: Report a parsing error on abort with pending data
When an abort is detected before all headers were received, and if there are
pending incoming data, we must report a parsing error instead of a
connection abort. This way it will be able to be handled as an invalid
message by HTTP analyzers instead of an early abort with no message.

It is especially important to be accurate on L7 retry. Indeed, without this
fix, this case will be handle by the "empty-response" retries policy while a
retry on "junk-response" is more accurate.

This patch must be backported to 2.7.
2023-03-01 17:35:16 +01:00
Christopher Faulet
c2fba3f77f BUG/MEDIUM: http-ana: Don't close request side when waiting for response
A recent fix (af124360e "BUG/MEDIUM: http-ana: Detect closed SC on opposite side
during body forwarding") was pushed to handle to sync a side when the opposite
one is in closing state. However, sometimes, the synchro is performed too early,
preventing a L7 retry to be performed.

Indeed, while the above fix is valid on the reponse side. On the request side,
if the response was not yet received, we must wait before closing.

So, to fix the fix, on the request side, we at least wait the response was
received before finishing the request analysis. Of course, if there is an error,
an abort or anything wrong on the server side, the response analyser should
handle it.

This patch is related to #2061. No backport needed.
2023-03-01 17:35:16 +01:00
Christopher Faulet
6f78ac5605 BUG/MINOR: http-ana: Do a L7 retry on read error if there is no response
A regression about "empty-response" L7 retry was introduced with the commit
dd6496f591 ("CLEANUP: http-ana: Remove useless if statement about L7
retries").

The if statetement was removed on a wrong assumption. Indeed, L7 retries on
status is now handled in the HTTP analysers. Thus, the stream-connector
(formely the conn-stream, and before again the stream-interface) no longer
report a read error to force a retry. But it is still possible to get a read
error with no response. In this case, we must perform a retry is
"empty-response" is enabled.

So the if statement is re-introduced, reverting the cleanup.

This patch should fix the issue #2061. It must be backported as far as 2.4.
2023-03-01 17:35:16 +01:00
Christopher Faulet
41ade746c7 BUG/MINOR: http-ana: Don't increment conn_retries counter before the L7 retry
When we are about to perform a L7 retry, we deal with the conn_retries
counter, to be sure we can retry. However, there is an issue here because
the counter is incremented before it is checked against the backend
limit. So, we can miss a connection retry.

Of course, we must invert both operation. The conn_retries counter must be
incremented after the check agains the backend limit.

This patch must be backported as far as 2.6.
2023-03-01 17:35:16 +01:00
Amaury Denoyelle
caa16549b8 MINOR: quic: notify on send ready
This patch completes the previous one with poller subscribe of quic-conn
owned socket on sendto() error. This ensures that mux-quic is notified
if waiting on sending when a transient sendto() error is cleared. As
such, qc_notify_send() is called directly inside socket I/O callback.

qc_notify_send() internal condition have been thus completed. This will
prevent to notify upper layer until all sending condition are fulfilled:
room in congestion window and no transient error on socket FD.

This should be backported up to 2.7.
2023-03-01 14:32:37 +01:00
Amaury Denoyelle
e1a0ee3cf6 MEDIUM: quic: implement poller subscribe on sendto error
On sendto() transient error, prior to this patch sending was simulated
and we relied on retransmission to retry sending. This could hurt
significantly the performance.

Thanks to quic-conn owned socket support, it is now possible to improve
this. On transient error, sending is interrupted and quic-conn socket FD
is subscribed on the poller for sending. When send is possible,
quic_conn_sock_fd_iocb() will be in charge of restart sending.

A consequence of this change is on the return value of qc_send_ppkts().
This function will now return 0 on transient error if quic-conn has its
owned socket. This is used to interrupt sending in the calling function.
The flag QUIC_FL_CONN_TO_KILL must be checked to differentiate a fatal
error from a transient one.

This should be backported up to 2.7.
2023-03-01 14:32:37 +01:00
Amaury Denoyelle
147862de61 MINOR: quic: purge txbuf before preparing new packets
Sending is implemented in two parts on quic-conn module. First, QUIC
packets are prepared in a buffer and then sendto() is called with this
buffer as input.

qc.tx.buf is used as the input buffer. It must always be empty before
starting to prepare new packets in it. Currently, this is guarantee by
the fact that either sendto() is completed, a fatal error is encountered
which prevent future send, or a transient error is encountered and we
rely on retransmission to send the remaining data.

This will change when poller subscribe of socket FD on sendto()
transient error will be implemented. In this case, qc.tx.buf will not be
emptied to resume sending when the transient error is cleared. To allow
the current sending process to work as expected, a new function
qc_purge_txbuf() is implemented. It will try to send remaining data
before preparing new packets for sending. If successful, txbuf will be
emptied and sending can continue. If not, sending will be interrupted.

This should be backported up to 2.7.
2023-03-01 14:29:16 +01:00
Amaury Denoyelle
e0fe118dad MINOR: quic: implement qc_notify_send()
Implement qc_notify_send(). This function is responsible to notify the
upper layer subscribed on SUB_RETRY_SEND if sending condition are back
to normal.

For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when poller subscribe of socket on sendto() error will
be implemented. qc_notify_send() will thus be responsible to ensure that
all condition are met before wake up the upper layer.

This should be backported up to 2.7.
2023-03-01 14:29:16 +01:00
Amaury Denoyelle
37333864ef MINOR: quic: simplify return path in send functions
This patch simply clean up return paths used in various send function of
quic-conn module. This will simplify the implementation of poller
subscribing on sendto() error which add another error handling path.

This should be backported up to 2.7.
2023-03-01 14:29:16 +01:00
Oto Valek
d1773e6881 BUG/MINOR: http-fetch: recognize IPv6 addresses in square brackets in req.hdr_ip()
If an IPv6 address is enclosed in square brackets [], trim them before
calling inet_pton().  This is to comply with RFC7239 6.1 and RFC3986 3.2.2.
2023-03-01 14:09:46 +01:00
Christopher Faulet
d48bfb6983 BUG/MINOR: http-check: Skip C-L header for empty body when it's not mandatory
The Content-Length header is always added into the request for an HTTP
health-check. However, when there is no payload, this header may be skipped
for OPTIONS, GET, HEAD and DELETE methods. In fact, it is a "SHOULD NOT" in
the RCF 9110 (#8.6).

It is not really an issue in itself but it seems to be an issue for AWS
ELB. It returns a 400-Bad-Request if a HEAD/GET request with no payload
contains a Content-Length header.

So, it is better to skip this header when possible.

This patch should fix the issue #2026. It could be backported as far as 2.2.
2023-02-28 18:51:27 +01:00
Christopher Faulet
0506d9de51 BUG/MINOR: http-check: Don't set HTX_SL_F_BODYLESS flag with a log-format body
When the HTTP request of a health-check is forged, we must not pretend there
is no payload, by setting HTX_SL_F_BODYLESS, if a log-format body was
configured.

Indeed, a test on the body length was used but it is only valid for a plain
string. For A log-format string, a list is used. Note it an bug with no
consequence for now.

This patch must be backported as far as 2.2.
2023-02-28 18:44:15 +01:00
Christopher Faulet
fb5fff19fe BUG/MINOR: mux-h1: Don't report an error on an early response close
If the response is closed before any data was received, we must not report
an error to the SE descriptor. It is important to be able to retry on an
empty response.

This patch should fix the issue #2061. It must be backported to 2.7.
2023-02-28 18:36:46 +01:00
Christopher Faulet
5e1b0e7bf8 BUG/MEDIUM: connection: Clear flags when a conn is removed from an idle list
When a connection is removed from the safe list or the idle list,
CO_FL_SAFE_LIST and CO_FL_IDLE_LIST flags must be cleared. It is performed
when the connection is reused. But not when it is moved into the
toremove_conns list. It may be an issue because the multiplexer owning the
connection may be woken up before the connection is really removed. If the
connection flags are not sanitized, it may think the connection is idle and
reinsert it in the corresponding list. From this point, we can imagine
several bugs. An UAF or a connection reused with an invalid state for
instance.

To avoid any issue, the connection flags are sanitized when an idle
connection is moved into the toremove_conns list. The same is performed at
right places in the multiplexers. Especially because the connection release
may be delayed (for h2 and fcgi connections).

This patch shoudld fix the issue #2057. It must carefully be backported as
far as 2.2. Especially on the 2.2 where the code is really different. But
some conflicts should be expected on the 2.4 too.
2023-02-28 18:36:29 +01:00
Amaury Denoyelle
4bdd069637 MINOR: quic: consider EBADF as critical on send()
EBADF on sendto() is considered as a fatal error. As such, it is removed
from the list of the transient errors. The connection will be killed
when encountered.

For the record, EBADF can be encountered on process termination with the
listener socket.

This should be backported up to 2.7.
2023-02-28 10:51:25 +01:00
Amaury Denoyelle
1febc2d316 MEDIUM: quic: improve fatal error handling on send
Send is conducted through qc_send_ppkts() for a QUIC connection. There
is two types of error which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
  data and retransmission process is used to have the opportunity to
  retry emission
* fatal error. If this happens, the connection should be closed as soon
  as possible. This is done via qc_kill_conn() function. Until this
  patch, only ECONNREFUSED errno was considered as fatal.

Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
  case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
  and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
  errno values above and also if a partial send has been conducted by
  the kernel.
* on success, the return value of sendto() syscall is returned.

This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.

This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.

On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.

This should be backported up to 2.7.
2023-02-28 10:51:25 +01:00
Willy Tarreau
7b8aac4439 MINOR: tinfo: make thread_set functions return nth group/mask instead of first
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
2023-02-28 10:28:47 +01:00
Willy Tarreau
fea8c19119 CLEANUP: listener: only store conn counts for local threads
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know what threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.

Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
2023-02-28 10:28:47 +01:00
Willy Tarreau
061754b249 BUG/MEDIUM: fd: make fd_delete() support being called from a different group
There's currently a problem affecting thread groups. Stopping a listener
from a different group than the one that runs this listener will trigger
the BUG_ON() in fd_delete(). This typically happens by issuing "disable
frontend f" on the CLI for the following config since the CLI runs on
group 1:

    global
        nbthread 2
        thread-groups 2
        stats socket /tmp/sock1 level admin

    frontend f
        mode http
        bind abns@frt-sock thread 2

This happens because abns sockets cannot be suspended so here this
requires a full stop.

A first approach would consist in isolating the caller during such rare
operations but it turns out that fd_delete() is not robust against even
such calling conditions, because it uses its own thread mask with an FD
that may be in a different group, and even though the threads would be
isolated and running_mask should be zero, we must not mix thread masks
from different groups like this.

A better solution consists in replacing the bug condition detection with
a self-protection. After all it's not trivial to figure all likely call
places, and forcing upper layers to protect the code is not clean if we
can do it at the bottom. Thus this is what is being done now. We detect
a thread group mismatch, and if so, we forcefully isolate ourselves and
entirely clean the socket. This has the merit of being much more robust
and easier to use (and harder to misuse). Given that such operations are
very rare (actually when they happen a crash follows), it's not a problem
to waste some time isolating the caller there.

This must be backported to 2.7, along with this previous patch:

  BUG/MINOR: fd: used the update list from the fd's group instead of tgid
2023-02-27 19:26:42 +01:00
Willy Tarreau
c0f6f5755b BUG/MINOR: fd: used the update list from the fd's group instead of tgid
In _fd_delete_orphan() we try to remove the FD from its update list
which is supposed to be the current thread group's. However the function
might be called from another group during stopping or under isolation,
so FD is not queued in the current group's update list but in its own
group's list. Let's retrieve the group from the FD instead of using
tgid.

This should have no impact on existing code since there is no code path
calling fd_delete() under thread isolation for now, and other cases are
blocked in fd_delete().

This must be backported to 2.7.
2023-02-27 19:26:41 +01:00
Christopher Faulet
85eabfbf67 MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished
As for the H1 and H2 stream, the QUIC stream now states it does not expect
data from the server as long as the request is unfinished. The aim is the
same. We must be sure to not trigger a read timeout on server side if the
client is still uploading data.

From the moment the end of the request is received and forwarded to upper
layer, the QUIC stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
2023-02-27 17:45:45 +01:00
Christopher Faulet
72722c04b0 MEDIUM: mux-h2: Don't expect data from server as long as request is unfinished
As for the H1 stream, the H2 stream now states it does not expect data from
the server as long as the request is unfinished. The aim is the same. We
must be sure to not trigger a read timeout on server side if the client is
still uploading data.

From the moment the end of the request is received and forwarded to upper
layer, the H2 stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
2023-02-27 17:45:45 +01:00
Christopher Faulet
f4b89f162a MEDIUM: mux-h1: Don't expect data from server as long as request is unfinished
On client side, as long as the request is unfinished, the H1 stream states
it does not expect data from the server. It does not mean the server must
not send its response but only it may wait to receive the whole request with
no risk to trigger a read timeout.

When the request is finished, the H1 stream reports it expects to receive
data from the opposite endpoint.

The purpose of this patch is to never report a server timeout on receive if
the client is still uploading data. This way, it is possible to have a
smaller server timeout than the client one.
2023-02-27 17:45:45 +01:00
Christopher Faulet
59b240c30c BUG/MEDIUM: stconn: Report a blocked send if some output data are not consumed
Instead of reporting a blocked send if nothing is send, we do it if some
output data remain blocked after a write attempts or after a call the the
applet's I/O handler. It is mandatory to properly handle write timeouts.

Indeed, if an endpoint is blocked for a while but it partially consumed
output data, no timeout is triggered. It is especially true for
connections. But the same may happen for applet, there is no reason.

Of course, if the endpoint decides to partially consume output data because
it must wait to move on for any reason, it should use the se/applet API
(se/applet_will_consume(), se/applet_wont_consume() and
se/applet_need_more_data()).

This bug was introduced during the channels timeouts refactoring. No
backport is needed.
2023-02-27 17:45:45 +01:00
Christopher Faulet
e758b5c703 MEDIUM: stream: Eventually handle stream timeouts when exiting process_stream()
When we exit from process_stream(), if the task is expired, we try to handle
the stream timeouts and we resync the stream-connectors. This avoids a
useless immediate wakeup. It is not really an issue, but it is a small
improvement in edge cases.
2023-02-27 17:45:45 +01:00
Christopher Faulet
85e568f594 MINOR: stream: Handle stream's timeouts in a dedicated function
This will be mandatory to be able to handle stream's timeouts before exiting
process_stream(). So, to not duplicate code, all this stuff is moved in a
dedicated function.
2023-02-27 17:45:45 +01:00
Christopher Faulet
3bbd2baab3 BUG/MINOR: stream: Remove BUG_ON about the task expiration in process_stream()
At the end of process_stream(), A BUG_ON was recently added to abort if we
leave the function with an expired task. However, it may happen if an event
prevents the timeout to be handled but nothing evolved. In this case, the
task expiration is not updated and we expect to catch the timeout on the
immediate task wakeup.

No backport needed.
2023-02-27 17:45:45 +01:00
Christopher Faulet
c9ec9bc834 BUG/MEDIUM: h1-htx: Never copy more than the max data allowed during parsing
A bug during H1 data parsing may lead to copy more data than the maximum
allowed. The bug is an overflow on this max threshold when it is lower than
the size of an htx_blk structure.

At first glance, it means it is possible to not respsect the buffer's
reserve. So it may lead to rewrite errors but it may also block any progress
on the stream if the compression is enabled. In this case, the channel
buffer appears as full and the compression must wait for space to
proceed. Outside of any bug, it is only possible when there are outgoing
data to forward, so the compression filter just waits. Because of this bug,
there is nothing to forward. The buffer is just full of input data. Thus
nothing move and the stream is infinitly blocked.

To fix the bug, we must be sure to be able to create an HTX block of 1 byte
without exceeding the maximum allowed.

This patch should fix the issue #2053. It must be backported as far as 2.5.
2023-02-27 17:45:45 +01:00
Aurelien DARRAGON
e51891a01d BUG/MEDIUM: fd: avoid infinite loops in fd_add_to_fd_list and fd_rm_from_fd_list
With 4d9888c ("CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros") some
"volatile" keywords were dropped at various assignment places in fd's code.

In fd_add_to_fd_list() and fd_add_to_fd_list(), because of the absence of
the "volatile" keyword: the compiler was able to perform some code
optimizations that prevented prev and next variables from being reloaded
between locking attempts (goto loop).

The result was that fd_add_to_fd_list() and fd_rm_from_fd_list() could enter in
infinite loops, preventing other threads from working further and ultimately
resulting in the watchdog being triggered as described in GH #2011.

To fix this, we made sure to re-audit 4d9888c in order to restore the required
memory barriers / compilers hints to prevent the compiler from mis-optimizing
the code around the fd's locks.
That is: using atomic loads to fetch the prev and next values, and restoring
the "volatile" cast for cur_list.ent variable assignment in fd_rm_from_fd_list()

Big thanks to @xanaxalan for his help and patience and to @wtarreau for his
findings and explanations in regard to compiler's optimizations.

This must be backported in 2.7 with 4d9888c ("CLEANUP: fd: get rid of the
__GET_{NEXT,PREV} macros")
2023-02-27 16:55:56 +01:00
Frédéric Lécaille
83540ed429 BUILD: thead: Fix several 32 bits compilation issues with uint64_t variables
Cast uint64_t as ullong and difference between two uint64_t as llong.
2023-02-24 09:56:50 +01:00
Sébaastien Gross
2a1bcf1a59 MINOR: config: add HAPROXY_BRANCH environment variable
This patch adds support from HAPROXY_BRANCH environment variable.
It can be useful is some resources are loaded from different
locations when migrating from one version to another.

Signed-off-by: Sébastien Gross <sgross@haproxy.com>
2023-02-24 09:45:44 +01:00
Willy Tarreau
a2a3d5dd25 CLEANUP: ring: remove the now unused ring's offset
Since the previous patch, the ring's offset is not used anymore. The
haring utility remains backward-compatible since it can trust the
buffer element that's at the beginning of the map and which still
contains all the valid data.
2023-02-24 09:26:30 +01:00
Willy Tarreau
d9c7188633 MEDIUM: ring: make the offset relative to the head/tail instead of absolute
The ring's offset currently contains a perpetually growing custor which
is the number of bytes written from the start. It's used by readers to
know where to (re)start reading from. It was made absolute because both
the head and the tail can change during writes and we needed a fixed
position to know where the reader was attached. But this is complicated,
error-prone, and limits the ability to reduce the lock's coverage. In
fact what is needed is to know where the reader is currently waiting, if
at all. And this location is exactly where it stored its count, so the
absolute position in the buffer (the seek offset from the first storage
byte) does represent exactly this, as it doesn't move (we don't realign
the buffer), and is stable regardless of how head/tail changes with writes.

This patch modifies this so that the application code now uses this
representation instead. The most noticeable change is the initialization,
where we've kept ~0 as a marker to go to the end, and it's now set to
the tail offset instead of trying to resolve the current write offset
against the current ring's position.

The offset was also used at the end of the consuming loop, to detect
if a new write had happened between the lock being released and taken
again, so as to wake the consumer(s) up again. For this we used to
take a copy of the ring->ofs before unlocking and comparing with the
new value read in the next lock. Since it's not possible to write past
the current reader's location, there's no risk of complete rollover, so
it's sufficient to check if the tail has changed.

Note that the change also has an impact on the haring consumer which
needs to adapt as well. But that's good in fact because it will rely
on one less variable, and will use offsets relative to the buffer's
head, and the change remains backward-compatible.
2023-02-24 09:26:30 +01:00
Willy Tarreau
d0d85d2e36 BUG/MINOR: ring: do not realign ring contents on resize
If a ring is resized, we must not zero its head since the contents
are preserved in-situ. Till now it used to work because we only resize
during boot and we emit very few data (if at all) during boot. But this
can change in the future.

This can be backported to 2.2 though no older version should notice a
difference.
2023-02-24 09:26:30 +01:00
Frédéric Lécaille
b7a13be6cd BUILD: quic: 32-bits compilation issue with %zu in quic_rx_pkts_del()
This issue arrived with this commit:
    1dbeb35f8 MINOR: quic: Add new traces about by connection RX buffer handling

and revealed by the GH CI as follows:

   src/quic_conn.c: In function ‘quic_rx_pkts_del’:
    include/haproxy/trace.h:134:65: error: format ‘%zu’ expects argument of type ‘size_t’,
    but argument 6 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
    _msg_len = snprintf(_msg, sizeof(_msg), (fmt), ##args);

Replace all %zu printf integer format by %llu.

Must be backported to 2.7 where the previous is supposed to be backported.
2023-02-24 09:23:07 +01:00
Aurelien DARRAGON
28a6d48a60 MINOR: haproxy: always protocol unbind on startup error path
In haproxy startup, all init error paths after the protocol bind step
cautiously call protocol_unbind_all() before exiting except one that was
conditional. We're not making an exception to the rule and we now properly
call protocol_unbind_all() as well.

No backport needed as this patch is unnoticeable.
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
de63efba5a MINOR: proto_ux: ability to dump ABNS names in error messages
In sock_unix_bind_receiver(), uxst_bind_listener() and uxdg_bind_listener(),
properly dump ABNS socket names by leveraging sa2str() function which does the
hard work for us.

UNIX sockets are reported as is (unchanged) while ABNS UNIX sockets
are prefixed with 'abns@' to match the syntax used in config file.
(they where previously showing as empty strings because of the leading
NULL-byte that was not properly handled in this case)

This is only a minor debug improvement, however it could be useful to
backport it up to 2.4.
[for 2.4: you should replace "%s [%s]" by "%s for [%s]" for uxst and uxgd if
you wan't the patch to apply properly]
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
2338dba18d MEDIUM: proto_ux: properly suspend named UNIX listeners
When a listener is suspended, we expect that it may not process any data for
the time it is suspended.

Yet for named UNIX socket, as the suspend operation is a no-op at the proto
level, recv events on the socket may still be processed by the polling loop.

This is quite disturbing as someone may rely on a paused proxy being harmless,
which is true for all protos except for named UNIX sockets.

To fix this behavior, we explicitely disable io recv events when suspending a
named UNIX socket listener (we call disable() method on the listener).

The io recv events will automatically be restored when the listener is resumed
since the l->enable() method is called at the end of the resume() operation.

This could be backported up to 2.4 after a reasonable observation
period to make sure that this change doesn't cause unwanted side-effects.
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
2a7903bbb2 BUG/MINOR: sock_unix: match finalname with tempname in sock_unix_addrcmp()
In sock_unix_addrcmp(), named UNIX sockets paths are manually compared in
order to properly handle tempname paths (ending with ".XXXX.tmp") that result
from the 2-step bind implemented in sock_unix_bind_receiver().

However, this logic does not take into account "final" path names (without the
".XXXX.tmp" suffix).

Example:
  /tmp/test did not match with /tmp/test.1288.tmp prior to this patch

Indeed, depending on how the socket addr is retrieved, the same socket
could be designated either by its tempname or finalname.
socket addr is normally stored with its finalname within a receiver, but a
call to getsockname() on the same socket will return the tempname that was
used for the bind() call (sock_get_old_sockets() depends on getsockname()).

This causes sock_find_compatible_fd() to malfunction with named UNIX
sockets (ie: haproxy -x CLI option).

To fix this, we slightly modify the check around the temp suffix in
sock_unix_addrcmp(): we perform the suffix check even if one of the
paths is lacking the temp suffix (with proper precautions).

Now the function is able to match:
  - finalname x finalname
  - tempname  x tempname
  - finalname x tempname

That is: /tmp/test == /tmp/test.1288.tmp == /tmp/test.X.tmp

It should be backported up to 2.4
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
ca8a4b2966 BUG/MEDIUM: listener/proxy: fix listeners notify for proxy resume
In 58651b42f ("MEDIUM: listener/proxy: make the listeners notify about
proxy pause/resume") we introduced the logic for pause/resume notify using
li_ready for pause and li_paused for resume.

Unfortunately, relying on li_paused for resume doesn't work reliably if we
resume a listener which is only made of receivers that are completely stopped.
For example, this could happen with receivers that don't support the
LI_PAUSED state like ABNS sockets.

This is especially true since pause_listener() was renamed to
suspend_listener() to better reflect its actual behavior in
("MINOR: listener:  pause_listener() becomes suspend_listener())

To fix this, we now rely on the li_suspended state in resume_listener() to make
sure that suspend_listener() and resume_listener() notify messages are
consistent to each other:

"Proxy pause" is triggered when there are no more ready listeners.
"Proxy resume" is triggered when there are no more suspended listeners.

Also, we make use of the new PR_FL_PAUSED proxy flag to make sure we don't
report the same event twice.

This could be backported up to 2.4 after a reasonable observation
period to make sure that this change doesn't cause unwanted side-effects.

--
Backport notes:

This commit depends on:
 - "MINOR: listener: pause_listener() becomes suspend_listener()"

-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:

Replace this:

    |+       if (px && !(px->flags & PR_FL_PAUSED) && !px->li_ready) {
    |                /* PROXY_LOCK is required */
    |                proxy_cond_pause(px);
    |                ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);

By this:

    |+       if (px && !px->li_ready) {
    |                ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
    |                send_log(px, LOG_WARNING, "Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
    |        }

And this:

    |+       if (px && (px->flags & PR_FL_PAUSED) && !px->li_suspended) {
    |                /* PROXY_LOCK is required */
    |                proxy_cond_resume(px);
    |                ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);

By this:

    |+       if (px && !px->li_suspended) {
    |                ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
    |                send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
    |        }
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
d3ffba4512 MINOR: listener: pause_listener() becomes suspend_listener()
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.

A suspended listener can be in two differents valid states:
 - LI_PAUSED: the listener is effectively paused, it will unpause on
   resume_listener()
 - LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
   state, so it was unbound to satisfy the suspend request, it will
   correcly re-bind on resume_listener()

Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().

We're also adding li_suspend proxy variable to track the number of currently
suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.

Counter is increased on successful suspend in suspend_listener() and it is
decreased on successful resume in resume_listener()

--
Backport notes:

-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:

Replace this:

    |                /* PROXY_LOCK is require
    |                proxy_cond_resume(px);

By this:

    |                ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
    |                send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);

-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:

Replace this:

    |@@ -253,6 +253,7 @@ struct listener {
    |
    | /* listener flags (16 bits) */
    | #define LI_F_FINALIZED           0x0001  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |+#define LI_F_SUSPENDED           0x0002  /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
    |
    | /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
    |  * success, or a combination of ERR_* flags if an error is encountered. The

By this:

    |@@ -222,6 +222,7 @@ struct li_per_thread {
    |
    | #define LI_F_QUIC_LISTENER       0x00000001  /* listener uses proto quic */
    | #define LI_F_FINALIZED           0x00000002  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |+#define LI_F_SUSPENDED           0x00000004  /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
    |
    | /* The listener will be directly referenced by the fdtab[] which holds its
    |  * socket. The listener provides the protocol-specific accept() function to
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
046a75e131 BUG/MEDIUM: resume from LI_ASSIGNED in default_resume_listener()
Since fc974887c ("MEDIUM: protocol: explicitly start the receiver before
the listener"), resume from LI_ASSIGNED state does not work anymore.

This is because the binding part has been divided into 2 distinct steps
since: first bind(), then listen().

This new logic was properly implemented in startup sequence
through protocol_bind_all() but wasn't properly reported in
default_resume_listener() function.

Fixing default_resume_listener() to comply with the new logic.

This should help ABNS sockets to properly rebind in resume_listener()
after they have been stopped by pause_listener():
See Redmine:4475 for more context.

This commit depends on:
 - "MINOR: listener: workaround for closing a tiny race between resume_listener() and stopping"
 - "MINOR: listener: make sure we don't pause/resume bypassed listeners"

This could be backported up to 2.4 after a reasonable observation period to
make sure that this change doesn't cause unwanted side-effects.
2023-02-23 15:05:05 +01:00