Commit Graph

6744 Commits

Author SHA1 Message Date
Amaury Denoyelle
e0fe118dad MINOR: quic: implement qc_notify_send()
Implement qc_notify_send(). This function is responsible for notifying
the upper layer subscribed on SUB_RETRY_SEND once sending conditions are
back to normal.

For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended once socket subscription to the poller on sendto() error
is implemented. qc_notify_send() will thus be responsible for ensuring
that all conditions are met before waking up the upper layer.

This should be backported up to 2.7.
2023-03-01 14:29:16 +01:00
Amaury Denoyelle
1febc2d316 MEDIUM: quic: improve fatal error handling on send
Send is conducted through qc_send_ppkts() for a QUIC connection. There
are two types of errors which can be encountered on sendto() or
affiliated syscalls:
* transient error. In this case, sending is simulated with the remaining
  data and retransmission process is used to have the opportunity to
  retry emission
* fatal error. If this happens, the connection should be closed as soon
  as possible. This is done via qc_kill_conn() function. Until this
  patch, only ECONNREFUSED errno was considered as fatal.

Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
  case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
  and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
  errno values above and also if a partial send has been conducted by
  the kernel.
* on success, the return value of sendto() syscall is returned.
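
As a rough illustration of the return convention above, the errno
classification could be sketched as follows. This is not the actual
qc_snd_buf() code: classify_send() and its parameters are hypothetical
names chosen for this example, and only the errno values listed above
are taken from this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper (not the real qc_snd_buf()): maps the outcome of a
 * sendto()-like call to the convention described above.
 *   ret : value returned by the syscall
 *   err : errno captured right after the call (only used if ret < 0)
 *   len : number of bytes the caller asked to send
 * Returns <0 on fatal error, 0 on transient error or partial send, and
 * the number of bytes sent on success.
 */
static ssize_t classify_send(ssize_t ret, int err, size_t len)
{
    if (ret < 0) {
        assert(err > 0);
        switch (err) {
        case EAGAIN:
#if EWOULDBLOCK != EAGAIN
        case EWOULDBLOCK:
#endif
        case ENOTCONN:
        case EINPROGRESS:
        case EBADF:
            return 0;       /* transient: the caller may retry later */
        default:
            return -err;    /* fatal: the connection should be killed */
        }
    }
    if ((size_t)ret != len)
        return 0;           /* partial send, handled as transient */
    return ret;             /* full send */
}
```

With such a helper, the caller only has to test the sign of the result
and never needs to look at errno itself.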

This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.

This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.

On a final note, EBADF should be considered as fatal. This will be the
subject of a future commit.

This should be backported up to 2.7.
2023-02-28 10:51:25 +01:00
Willy Tarreau
7b8aac4439 MINOR: tinfo: make thread_set functions return nth group/mask instead of first
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
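
A minimal sketch of the "nth group" lookup, assuming a thread set is
stored as one thread mask per group. The structure layout, the sk_ names
and the 1-based group numbering are assumptions made for this example,
not the actual tinfo definitions.

```c
#include <assert.h>

#define SK_MAX_TGROUPS 16

/* Hypothetical thread set: one mask of threads per group, 0 = unused. */
struct sk_thread_set {
    unsigned long rel[SK_MAX_TGROUPS];
};

/* Returns the number (starting at 1) of the nth non-empty group, with
 * n == 0 designating the first one, or 0 if there is no such group.
 */
static int sk_thread_set_nth_group(const struct sk_thread_set *ts, int n)
{
    int i;

    assert(ts && n >= 0);
    for (i = 0; i < SK_MAX_TGROUPS; i++) {
        if (ts->rel[i] && !n--)
            return i + 1;
    }
    return 0;
}

/* Returns the thread mask of the nth non-empty group, or 0 if none. */
static unsigned long sk_thread_set_nth_tmask(const struct sk_thread_set *ts, int n)
{
    int grp = sk_thread_set_nth_group(ts, n);

    return grp ? ts->rel[grp - 1] : 0;
}
```

Iterating with n = 0, 1, 2... until 0 is returned visits every group of
the set, which is the kind of loop shard building would need.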
2023-02-28 10:28:47 +01:00
Willy Tarreau
fea8c19119 CLEANUP: listener: only store conn counts for local threads
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know what threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.

Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
2023-02-28 10:28:47 +01:00
Christopher Faulet
85eabfbf67 MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished
As for the H1 and H2 stream, the QUIC stream now states it does not expect
data from the server as long as the request is unfinished. The aim is the
same. We must be sure to not trigger a read timeout on server side if the
client is still uploading data.

From the moment the end of the request is received and forwarded to upper
layer, the QUIC stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
2023-02-27 17:45:45 +01:00
Christopher Faulet
8aabc8ebfd MINOR: stconn: Report a send activity when endpoint is willing to consume data
When the endpoint (applet or mux) is now willing to consume data while it
said it wouldn't, a send activity is reported. Indeed, writes were
blocked because of the endpoint. It is now ready to consume outgoing
data. So a send activity must be reported to reset corresponding timers.

Concretely, when the flag SE_FL_WONT_CONSUME is removed, a send activity is
reported.
2023-02-27 17:45:45 +01:00
Willy Tarreau
a2a3d5dd25 CLEANUP: ring: remove the now unused ring's offset
Since the previous patch, the ring's offset is not used anymore. The
haring utility remains backward-compatible since it can trust the
buffer element that's at the beginning of the map and which still
contains all the valid data.
2023-02-24 09:26:30 +01:00
Aurelien DARRAGON
d3ffba4512 MINOR: listener: pause_listener() becomes suspend_listener()
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.

A suspended listener can be in two different valid states:
 - LI_PAUSED: the listener is effectively paused, it will unpause on
   resume_listener()
 - LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
   state, so it was unbound to satisfy the suspend request, it will
   correctly re-bind on resume_listener()

Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().

We're also adding li_suspend proxy variable to track the number of currently
suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.

Counter is increased on successful suspend in suspend_listener() and it is
decreased on successful resume in resume_listener()
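
The flag and counter bookkeeping described above can be sketched as
follows. This is a simplified model, not the real listener code: the sk_
types, the supports_pause field and the return values are assumptions
made for this example, and locking is left out.

```c
#include <assert.h>

/* Hypothetical, reduced set of listener states. */
enum sk_state { SK_ASSIGNED, SK_PAUSED, SK_READY };

#define SK_F_SUSPENDED 0x0002

struct sk_proxy {
    int li_suspend;          /* number of currently suspended listeners */
};

struct sk_listener {
    enum sk_state state;
    unsigned int flags;
    int supports_pause;      /* assumption: set if the PAUSED state is usable */
    struct sk_proxy *px;
};

/* Suspends the listener: pause it when supported, otherwise unbind it. */
static int sk_suspend(struct sk_listener *l)
{
    assert(l && l->px);
    if (l->flags & SK_F_SUSPENDED)
        return 1;            /* already suspended, nothing to do */
    l->state = l->supports_pause ? SK_PAUSED : SK_ASSIGNED;
    l->flags |= SK_F_SUSPENDED;
    l->px->li_suspend++;
    return 1;
}

/* Resumes a previously suspended listener (unpause or re-bind). */
static int sk_resume(struct sk_listener *l)
{
    assert(l && l->px);
    if (!(l->flags & SK_F_SUSPENDED))
        return 1;            /* was not suspended */
    l->state = SK_READY;
    l->flags &= ~SK_F_SUSPENDED;
    l->px->li_suspend--;
    return 1;
}
```

Guarding both functions with the flag keeps the counter balanced even
when suspend or resume is requested several times in a row.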

--
Backport notes:

-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:

Replace this:

    |                /* PROXY_LOCK is require
    |                proxy_cond_resume(px);

By this:

    |                ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
    |                send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);

-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:

Replace this:

    |@@ -253,6 +253,7 @@ struct listener {
    |
    | /* listener flags (16 bits) */
    | #define LI_F_FINALIZED           0x0001  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |+#define LI_F_SUSPENDED           0x0002  /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
    |
    | /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
    |  * success, or a combination of ERR_* flags if an error is encountered. The

By this:

    |@@ -222,6 +222,7 @@ struct li_per_thread {
    |
    | #define LI_F_QUIC_LISTENER       0x00000001  /* listener uses proto quic */
    | #define LI_F_FINALIZED           0x00000002  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |+#define LI_F_SUSPENDED           0x00000004  /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
    |
    | /* The listener will be directly referenced by the fdtab[] which holds its
    |  * socket. The listener provides the protocol-specific accept() function to
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
2370599f96 MINOR: listener: make sure we don't pause/resume bypassed listeners
Some listeners are kept in LI_ASSIGNED state but are not supposed to be
started since they were bypassed on initial startup (eg: in protocol_bind_all()
or in enable_listener()...)

Introduce the LI_F_FINALIZED flag: when the variable is non
zero it means that the listener made it past the LI_LISTEN state (finalized)
at least once so we can safely pause / resume. This way we won't risk starting
a previously bypassed listener which never made it that far and thus was not
expected to be lazy-started by accident.
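
A minimal sketch of this guard, mirroring the conditions visible in the
hunks quoted in the backport notes below. The sk_ names and the reduced
state ordering are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical, simplified state ordering (lowest to highest). */
enum sk_li_state {
    SK_LI_NEW, SK_LI_INIT, SK_LI_ASSIGNED, SK_LI_PAUSED,
    SK_LI_LISTEN, SK_LI_READY
};

#define SK_LI_F_FINALIZED 0x0001

struct sk_listener {
    enum sk_li_state state;
    unsigned short flags;
};

/* Records that the listener made it past the LISTEN state at least once. */
static void sk_mark_finalized(struct sk_listener *l)
{
    assert(l);
    if (l->state > SK_LI_LISTEN)
        l->flags |= SK_LI_F_FINALIZED;
}

/* A listener may only be paused if it was finalized and is not paused yet. */
static int sk_may_pause(const struct sk_listener *l)
{
    return (l->flags & SK_LI_F_FINALIZED) && l->state > SK_LI_PAUSED;
}

/* A listener may only be resumed if it was finalized and is not ready yet. */
static int sk_may_resume(const struct sk_listener *l)
{
    return (l->flags & SK_LI_F_FINALIZED) && l->state != SK_LI_READY;
}
```

A listener that was bypassed at startup stays in the ASSIGNED state
without the flag, so both checks reject it and it is never started by a
later resume.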

As listener_pause() and listener_resume() are currently partially broken, such
unexpected lazy-start won't happen. But we're trying to restore pause() and
resume() behavior so this patch will be required before going any further.

We had to re-introduce listeners 'flags' struct member since it was recently
moved into bind_conf struct. But here we do have a legitimate need for these
listener-only flags.

This should only be backported if explicitly required by another commit.
--
Backport notes:

-> 2.4 and 2.5:

The 2-bytes hole we're using in the current patch does not apply, let's
use the 4-byte hole located under the 'option' field.

Replace this:

    |@@ -226,7 +226,8 @@ struct li_per_thread {
    | struct listener {
    |        enum obj_type obj_type;         /* object type = OBJ_TYPE_LISTENER */
    |        enum li_state state;            /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
    |-       /* 2-byte hole here */
    |+       uint16_t flags;                 /* listener flags: LI_F_* */
    |        int luid;                       /* listener universally unique ID, used for SNMP */
    |        int nbconn;                     /* current number of connections on this listener */
    |        unsigned int thr_idx;           /* thread indexes for queue distribution : (t2<<16)+t1 */

By this:

    |@@ -209,6 +209,8 @@ struct listener {
    |        short int nice;                 /* nice value to assign to the instantiated tasks */
    |        int luid;                       /* listener universally unique ID, used for SNMP */
    |        int options;                    /* socket options : LI_O_* */
    |+       uint16_t flags;                 /* listener flags: LI_F_* */
    |+       /* 2-bytes hole here */
    |        __decl_thread(HA_RWLOCK_T lock);
    |
    |        struct fe_counters *counters;   /* statistics counters */

-> 2.4 only:
We need to adjust some contextual lines.
Replace this:

    |@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
    |        if (!lli)
    |                HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
    |
    |-       if (l->state <= LI_PAUSED)
    |+       if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
    |                goto end;
    |
    |        if (l->rx.proto->suspend)

By this:

    |@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
    |            !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
    |                goto end;
    |
    |-       if (l->state <= LI_PAUSED)
    |+       if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
    |                goto end;
    |
    |        if (l->rx.proto->suspend)

And this:

    |@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
    |        if (MT_LIST_INLIST(&l->wait_queue))
    |                goto end;
    |
    |-       if (l->state == LI_READY)
    |+       if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
    |                goto end;
    |
    |        if (l->rx.proto->resume)

By this:

    |@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
    |            !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
    |                goto end;
    |
    |-       if (l->state == LI_READY)
    |+       if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
    |                goto end;
    |
    |        if (l->rx.proto->resume)

-> 2.6 and 2.7 only:

struct listener 'flags' member still exists, let's use it.

Remove this from the current patch:

    |@@ -226,7 +226,8 @@ struct li_per_thread {
    | struct listener {
    |        enum obj_type obj_type;         /* object type = OBJ_TYPE_LISTENER */
    |        enum li_state state;            /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
    |-       /* 2-byte hole here */
    |+       uint16_t flags;                 /* listener flags: LI_F_* */
    |        int luid;                       /* listener universally unique ID, used for SNMP */
    |        int nbconn;                     /* current number of connections on this listener */
    |        unsigned int thr_idx;           /* thread indexes for queue distribution : (t2<<16)+t1 */

Then, replace this:

    |@@ -251,6 +250,9 @@ struct listener {
    |        EXTRA_COUNTERS(extra_counters);
    | };
    |
    |+/* listener flags (16 bits) */
    |+#define LI_F_FINALIZED           0x0001  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |+
    | /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
    |  * success, or a combination of ERR_* flags if an error is encountered. The
    |  * function pointer can be NULL if not implemented. The function also has an

By this:

    |@@ -221,6 +221,7 @@ struct li_per_thread {
    | };
    |
    | #define LI_F_QUIC_LISTENER       0x00000001  /* listener uses proto quic */
    |+#define LI_F_FINALIZED           0x00000002  /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
    |
    | /* The listener will be directly referenced by the fdtab[] which holds its
    |  * socket. The listener provides the protocol-specific accept() function to
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
bcad7e6319 MINOR: listener: add relax_listener() function
There is a need for a small difference between resuming and relaxing
a listener.

When resuming, we expect that the listener may completely resume, this includes
unpausing or rebinding if required.
Resuming a listener is a best-effort operation: no matter the current state,
try our best to bring the listener up to the LI_READY state.

There are some cases where we only want to "relax" listeners that were
previously restricted using limit_listener() or listener_full() functions.
Here we don't want to resuscitate listeners, we're simply interested in
cancelling out the previous restriction.

To this day, listener_resume() on an unbound listener is broken, that's why
the need for this wasn't felt yet.

But we're trying to restore historical listener_resume() behavior, so we better
prepare for this by introducing an explicit relax_listener() function that
only does what is expected in such cases.

This commit depends on:
 - "MINOR: listener/api: add lli hint to listener functions"
2023-02-23 15:05:05 +01:00
Aurelien DARRAGON
4059e094db MINOR: listener/api: add lli hint to listener functions
Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions.
All these functions implicitly take the listener lock when they are called.
It could be useful to be able to call them while already holding the lock, so
we're adding the lli hint to make them take the lock only when it is missing.
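
The lock hint pattern can be sketched as below. A plain pthread mutex
stands in for the listener lock, and sk_stop_listener() as well as the
structure are hypothetical names used only for this example.

```c
#include <assert.h>
#include <pthread.h>

struct sk_listener {
    pthread_mutex_t lock;
    int stopped;
};

/* <lli> is non-zero when the caller already holds l->lock. In that case
 * the function must neither take nor release the lock itself.
 */
static void sk_stop_listener(struct sk_listener *l, int lli)
{
    assert(l);
    if (!lli)
        pthread_mutex_lock(&l->lock);

    l->stopped = 1;          /* work done under the listener lock */

    if (!lli)
        pthread_mutex_unlock(&l->lock);
}
```

A caller which already holds the lock passes 1 and keeps ownership of
the lock after the call, while other callers pass 0 and let the function
handle locking as before.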

This should only be backported if explicitly required by another commit
--

-> 2.4 and 2.5 common backport notes:

These 2 commits need to be backported first:
 - 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
 - a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
   coverity"

-> 2.4 special backport notes:

In addition to the previously mentioned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:

Replace this:

    |@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
    |        if (!lpx && px)
    |                HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
    |
    |-       HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
    |+       if (!lli)
    |+               HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
    |
    |        if (l->state <= LI_PAUSED)
    |                goto end;

By this:

    |@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
    |        if (!lpx && px)
    |                HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
    |
    |-       HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
    |+       if (!lli)
    |+               HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
    |
    |        if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
    |            !(proc_mask(l->rx.settings->bind_proc) & pid_bit))

Replace this:

    |@@ -169,7 +169,7 @@ void protocol_stop_now(void)
    |        HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
    |        list_for_each_entry(proto, &protocols, list) {
    |                list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
    |-                       stop_listener(listener, 0, 1);
    |+                       stop_listener(listener, 0, 1, 0);
    |        }
    |        HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
    | }

By this:

    |@@ -169,7 +169,7 @@ void protocol_stop_now(void)
    |        HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
    |        list_for_each_entry(proto, &protocols, list) {
    |                list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
    |                        if (!listener->bind_conf->frontend->grace)
    |-                               stop_listener(listener, 0, 1);
    |+                               stop_listener(listener, 0, 1, 0);
    |        }
    |        HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);

Replace this:

    |@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
    |        HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
    |
    |        list_for_each_entry(l, &p->conf.listeners, by_fe)
    |-               stop_listener(l, 1, 0);
    |+               stop_listener(l, 1, 0, 0);
    |
    |        if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
    |                /* might be just a backend */

By this:

    |@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
    |        HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
    |
    |        list_for_each_entry(l, &p->conf.listeners, by_fe)
    |-               stop_listener(l, 1, 0);
    |+               stop_listener(l, 1, 0, 0);
    |
    |        if (!p->disabled && !p->li_ready) {
    |                /* might be just a backend */
2023-02-23 15:05:05 +01:00
Christopher Faulet
2bf99123ef MINOR: stconn: Add functions to set/clear SE_FL_EXP_NO_DATA flag from endpoint
se_expect_data() and se_expect_no_data() should be used from the endpoint to
inform upper layer it expects data or not from the opposite endpoint.
2023-02-23 13:44:32 +01:00
Christopher Faulet
be5cc766b0 MINOR: stconn: Remove half-closed timeout
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer any use for the .hcto field in the stconn
structure. So let's remove it.
2023-02-22 15:59:16 +01:00
Christopher Faulet
bcdcfad3ff MINOR: stconn: Set half-close timeout using proxy settings
We now directly use the proxy settings to set the half-close timeout of a
stream-connector. The function sc_set_hcto() must be used to do so. This
timeout is only set when a shutw is performed. So it is not really a big
deal to use a dedicated function to do so.
2023-02-22 15:59:16 +01:00
Christopher Faulet
15315d6c0a CLEANUP: stconn: Remove old read and write expiration dates
Old read and write expiration dates are no longer used. Thus we can safely
remove them.
2023-02-22 15:59:16 +01:00
Christopher Faulet
b374ba563a MAJOR: stream: Use SE descriptor date to detect read/write timeouts
We stop using the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. Everything is handled in process_stream().

The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().
2023-02-22 15:57:16 +01:00
Christopher Faulet
2ca4cc1936 MINOR: applet/stconn: Add a SE flag to specify an endpoint does not expect data
An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any
data from the opposite endpoint. This way, the stream will be able to
disable any read timeout on the opposite endpoint. Applets should use
applet_expect_no_data() and applet_expect_data() functions to set or clear
the flag. For now, only dns and sink forwarder applets are concerned.
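
A minimal sketch of how such a flag could be set, cleared and consulted.
The sk_ names, the flag value and the check performed on the stream side
are assumptions made for this example, not the actual stconn API.

```c
#include <assert.h>

#define SK_FL_EXP_NO_DATA 0x0001   /* hypothetical flag value */

struct sk_sedesc {
    unsigned int flags;
};

/* The endpoint states that it expects no data from the opposite side. */
static void sk_expect_no_data(struct sk_sedesc *se)
{
    assert(se);
    se->flags |= SK_FL_EXP_NO_DATA;
}

/* The endpoint states that it expects data again. */
static void sk_expect_data(struct sk_sedesc *se)
{
    assert(se);
    se->flags &= ~SK_FL_EXP_NO_DATA;
}

/* Stream side: a read timeout on the opposite endpoint only makes sense
 * while <se> still expects data from it.
 */
static int sk_opposite_read_timeout_enabled(const struct sk_sedesc *se)
{
    assert(se);
    return !(se->flags & SK_FL_EXP_NO_DATA);
}
```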
2023-02-22 15:56:28 +01:00
Christopher Faulet
4c13568b49 MEDIUM: stconn: Add two date to track successful reads and blocked sends
The stream endpoint descriptor now owns two dates, lra (last read activity) and
fsb (first send blocked).

The first one is updated every time a read activity is reported, including data
received from the endpoint, successful connect, end of input and shutdown for
reads. A read activity is also reported when receives are unblocked. It will be
used to detect read timeouts.

The other one is updated when no data can be sent to the endpoint and reset
when some data are sent. It is the date of the first send blocked by the
endpoint. It will be used to detect write timeouts.

Helper functions are added to report read/send activity and to retrieve lra/fsb
date.
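
The way these two dates evolve can be sketched as follows. Dates are
modeled as millisecond counters where 0 means "not set"; this
representation, the sk_ names and the expiration checks are assumptions
made for this example only.

```c
#include <assert.h>

struct sk_sedesc {
    unsigned int lra;   /* date of the last read activity, 0 if unset */
    unsigned int fsb;   /* date of the first blocked send, 0 if unset */
};

/* Any read activity refreshes lra. */
static void sk_report_read_activity(struct sk_sedesc *se, unsigned int now)
{
    assert(se && now != 0);
    se->lra = now;
}

/* Only the first blocked send is recorded, later ones keep the date. */
static void sk_report_blocked_send(struct sk_sedesc *se, unsigned int now)
{
    assert(se && now != 0);
    if (!se->fsb)
        se->fsb = now;
}

/* Sending some data resets fsb. */
static void sk_report_send_activity(struct sk_sedesc *se)
{
    assert(se);
    se->fsb = 0;
}

/* Read timeout: nothing was read for at least <timeout> ms. */
static int sk_rcv_expired(const struct sk_sedesc *se, unsigned int now, unsigned int timeout)
{
    return se->lra && now - se->lra >= timeout;
}

/* Write timeout: sends have been blocked for at least <timeout> ms. */
static int sk_snd_expired(const struct sk_sedesc *se, unsigned int now, unsigned int timeout)
{
    return se->fsb && now - se->fsb >= timeout;
}
```

Keeping the date of the first blocked send, rather than the last one,
ensures that repeated failed attempts do not push the write timeout
further away.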
2023-02-22 14:52:15 +01:00
Christopher Faulet
5aaacfbccd MEDIUM: stconn: Replace read and write timeouts by a unique I/O timeout
Read and write timeouts (.rto and .wto) are now replaced by a unique
timeout, called .ioto. Since the recent refactoring on channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
2023-02-22 14:52:15 +01:00
Christopher Faulet
f8413cba2a MEDIUM: channel/stconn: Move rex/wex timer from the channel to the sedesc
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.

From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplifies timer manipulation.
2023-02-22 14:52:15 +01:00
Christopher Faulet
ed7e66fe1a MINOR: channel/stconn: Move rto/wto from the channel to the stconn
Read and write timeouts concern the I/O. Thus, it is logical to move them into
the stconn. In the end, the stream is responsible for detecting the timeouts. So
it is logical to have these values in the stconn and not in the SE
descriptor. But it may change depending on the refactoring.

So, now:
  * scf->rto is used instead of req->rto
  * scf->wto is used instead of res->wto
  * scb->rto is used instead of res->rto
  * scb->wto is used instead of req->wto
2023-02-22 14:52:15 +01:00
Christopher Faulet
2e56a73459 MAJOR: channel: Remove flags to report READ or WRITE errors
This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on
SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write
errors and SE_FL_ERROR for read or unrecoverable errors.

When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a
read event and a write event are reported to be sure the stream will properly
process the error. At the stream-connector level, it is similar. When an error
is reported during a send, a write event is triggered. On the read side, nothing
more is performed because an error at this stage is enough to wake the stream
up.

A major change is brought with this patch. We stop checking flags of the
opposite channel to report abort or timeout. It also means that when a read or
write error is reported on a side, we no longer update the other side. Thus
a read error on the server side no longer leads to a write error on the
client side. This should ease error reporting.
2023-02-22 14:52:15 +01:00
Christopher Faulet
81fdeb8ce2 MEDIUM: channel: Remove CF_READ_NOEXP flag
This flag was introduced in 1.3 to fix a design issue. It has been untouched since
then but there is no reason to still have this trick. Note it could be good
to review what happens in HTTP when the server is waiting for the end of the
request. It could be good to be sure a client timeout is always reported.
2023-02-22 14:52:14 +01:00
Aurelien DARRAGON
3ffbf3896d BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx is destroyed, related httpclients are correctly destroyed too
to prevent such httpclients from being resuscitated on a destroyed lua ctx.

This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.

This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same resources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.

But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.

This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)

It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.

Now that we properly understand the issue, the fix is pretty trivial:

We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.

Instead, we switched 'lua->hc_list' which was introduced with bb581423b
from simple list to mt_list so that concurrent accesses between
hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled.

The issue was reported by @Mark11122 on Github #2037.

This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
2023-02-22 11:44:22 +01:00
Willy Tarreau
27629a7d65 MINOR: compiler: add a TOSTR() macro to turn a value into a string
Pretty often we have to emit a value (setting, limit etc) in an error
message, and this value is known at compile-time, yet emitting it
forces the use of a printf format such as "%d". Let's have a simple macro
to turn any other macro or value into a string that can be concatenated
with the rest of the string around. This simplifies the production of
error messages on the CLI for example.
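
Such a macro usually relies on the classical two-level stringification
trick, sketched below. The SK_ names and the limit are made up for this
example, and the actual definition of TOSTR() in the tree may differ.

```c
#include <assert.h>
#include <stddef.h>

/* One level only: stringifies the argument as written, so a macro name
 * is NOT expanded.
 */
#define SK_STR(x)   #x

/* Two levels: the argument is macro-expanded first, then stringified. */
#define SK_TOSTR(x) SK_STR(x)

#define SK_MAX_ITEMS 64      /* hypothetical compile-time limit */

/* Returns NULL if <count> is acceptable, otherwise a constant error
 * message embedding the limit, built without any printf-like format.
 */
static const char *sk_check_items(int count)
{
    assert(count >= 0);
    if (count > SK_MAX_ITEMS)
        return "too many items (max " SK_TOSTR(SK_MAX_ITEMS) ")";
    return NULL;
}
```

Here SK_TOSTR(SK_MAX_ITEMS) produces the literal "64" which the compiler
concatenates with its neighbours, whereas SK_STR(SK_MAX_ITEMS) would
produce "SK_MAX_ITEMS".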
2023-02-22 09:10:53 +01:00
Remi Tricot-Le Breton
879debeecb BUG/MINOR: cache: Cache response even if request has "no-cache" directive
Since commit cc9bf2e5f "MEDIUM: cache: Change caching conditions"
responses that do not have an explicit expiration time are not cached
anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag
instead of the TX_CACHEABLE one. The effect this had is that a cacheable
response that corresponded to a request having a "Cache-Control:
no-cache" for instance would not be cached.
Contrary to what was said in the other commit message, the "checkcache"
option should not be impacted by the use of the TX_CACHEABLE flag
instead of the TX_CACHE_IGNORE one. The response is indeed considered as
not cacheable if it has no expiration time, regardless of the presence
of a cookie in the response.

This should fix GitHub issue #2048.
This patch can be backported up to branch 2.4.
2023-02-21 18:35:41 +01:00
Christopher Faulet
c13f3028e8 MINOR: cfgcond: Implement enabled condition expression
Implement a way to test if some options are enabled at run-time. For now,
the following options may be detected:

  POLL, EPOLL, KQUEUE, EVPORTS, SPLICE, GETADDRINFO, REUSEPORT,
  FAST-FORWARD, SERVER-SSL-VERIFY-NONE

These options are those that can be disabled on the command line. This way
it is possible, from a reg-test for instance, to know if a feature is
supported or not :

  feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"
2023-02-21 11:44:55 +01:00
Christopher Faulet
a1fdad784b MINOR: cfgcond: Implement strstr condition expression
Implement a way to match a substring in a string. The strstr expression can
now be used to do so.
2023-02-21 11:44:55 +01:00
Christopher Faulet
2f7c82bfdf BUG/MINOR: haproxy: Fix option to disable the fast-forward
The option was renamed to only permit to disable the fast-forward. First
there is no reason to enable it because it is the default behavior. Then it
introduced a bug because there is no way to be sure the command line has
precedence over the configuration this way. So, the option is now named
"tune.disable-fast-forward" and does not support any argument. And of
course, the command line option "-dF" now has precedence over the
configuration.

No backport needed.
2023-02-21 11:44:55 +01:00
Amaury Denoyelle
77ed63106d MEDIUM: quic: trigger fast connection closing on process stopping
With the previous commit, quic-conns are now handled as jobs to prevent the
termination of the haproxy process. This ensures that QUIC connections are
closed when all data are acknowledged by the client and there are no more
active streams.

The quic-conn layer emits a CONNECTION_CLOSE once the MUX has been
released and all streams are acknowledged. Then, the timer is scheduled
to definitively free the connection after the idle timeout period. This
allows late-arriving packets to be handled.

Adjust this procedure to deactivate this timer when process stopping is
in progress. In this case, quic-conn timer is set to expire immediately
to free the quic-conn instance as soon as possible. This allows the
haproxy process to be closed quickly.

This should be backported up to 2.7.
2023-02-20 11:20:18 +01:00
Amaury Denoyelle
eb7d320d25 MINOR: mux-quic: implement client-fin timeout
Implement client-fin timeout for MUX quic. This timeout is used once an
applicative layer shutdown has been called. In HTTP/3, this corresponds
to the emission of a GOAWAY.

This should be backported up to 2.7.
2023-02-20 11:20:18 +01:00
Amaury Denoyelle
b30247b16c MINOR: mux-quic: define qc_shutdown()
Factorize shutdown operation in a dedicated function qc_shutdown(). This
will allow it to be called from multiple places. A new flag QC_CF_APP_SHUT
is also defined to ensure it will only be executed once even if called
multiple times per connection.
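
The "execute once" guarantee brought by such a flag can be sketched as
below. The structure, the flag value and the counter standing in for the
application layer shutdown are assumptions made for this example, not
the actual mux-quic code.

```c
#include <assert.h>

#define SK_CF_APP_SHUT 0x0001    /* hypothetical flag value */

struct sk_qcc {
    unsigned int flags;
    int app_shut_calls;          /* counts application layer shutdowns */
};

/* Performs the application layer shutdown at most once per connection.
 * Returns 1 if the shutdown was performed, 0 if it was already done.
 */
static int sk_shutdown(struct sk_qcc *qcc)
{
    assert(qcc);
    if (qcc->flags & SK_CF_APP_SHUT)
        return 0;

    qcc->app_shut_calls++;       /* e.g. where a GOAWAY would be emitted */
    qcc->flags |= SK_CF_APP_SHUT;
    return 1;
}
```

Callers on different code paths can then invoke the function without
having to know whether another path already did.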

This commit will be useful to properly support haproxy soft stop.
This should be backported up to 2.7.
2023-02-20 11:18:58 +01:00
Frédéric Lécaille
2f531116ed MINOR: quic: Add traces to qc_kill_conn()
Very minor modification to help in debugging issues.

Must be backported to 2.7.
2023-02-17 17:36:30 +01:00
Frédéric Lécaille
a2c62c3141 MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt
The send*() syscalls which trigger the reception of such ICMP packets
fail with ECONNREFUSED as errno.

  man(7) udp
  ECONNREFUSED
      No receiver was associated with the destination address.
      This might be caused by a previous packet sent over the socket.

We must kill the underlying connection as soon as possible.

Must be backported to 2.7.
2023-02-17 17:36:30 +01:00
Frédéric Lécaille
75c8ad5490 MINOR: quic: Move code to wakeup the timer task to avoid anti-amplication deadlock
This code was there because the timer task was not running on the same thread
as the one which parses the QUIC packets. Now that this is no longer the case,
we can wake up this task directly.

Must be backported to 2.7.
2023-02-17 17:36:30 +01:00
Frédéric Lécaille
1dbeb35f80 MINOR: quic: Add new traces about by connection RX buffer handling
Move quic_rx_pkts_del() out of quic_conn.h to make it benefit from the TRACE API.
Add traces which have already helped in diagnosing an issue encountered with
ngtcp2, which sent too many 1RTT packets before the handshake completion. This
has been fixed here after having discussed with Tasuhiro on QUIC dev slack:

https://github.com/ngtcp2/ngtcp2/pull/663

Must be backported to 2.7.
2023-02-17 17:36:30 +01:00
Amaury Denoyelle
14037bf26f MINOR: h3: add traces on decode_qcs callback
Add traces inside h3_decode_qcs(). Every error path now has its
dedicated trace, which should simplify debugging. Each early return has
been converted to a goto invocation.

To complete the demux tracing, demux frame type and length are now
printed using the h3s instance whenever it is possible on trace
invocation. A new internal value H3_FT_UNINIT is used as a frame type to
mark demuxing as inactive.

This should be backported up to 2.7.
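
The conversion of early returns into goto invocations, each with its own
trace, can be sketched as follows. This is not the h3_decode_qcs() code:
the decoder, its one-byte length format and the fprintf() calls standing in
for the TRACE API are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical decoder: the first byte announces the payload length.
 * Returns that length, or -1 on error.
 */
static int decode_len_sketch(const unsigned char *buf, size_t len)
{
    if (!buf) {
        fprintf(stderr, "trace: no buffer\n");
        goto err;
    }
    if (len < 1) {
        fprintf(stderr, "trace: missing length byte\n");
        goto err;
    }
    if ((size_t)buf[0] > len - 1) {
        fprintf(stderr, "trace: truncated payload\n");
        goto err;
    }

    return buf[0];

 err:
    /* Single exit for every failure: one place for a final trace. */
    fprintf(stderr, "trace: decode failed\n");
    return -1;
}
```

Each failure emits a specific message before jumping to the common label,
so the log shows both which check failed and that the function left through
its error path.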
2023-02-17 17:31:52 +01:00
Amaury Denoyelle
381d8137e3 MINOR: h3/hq-interop: handle no data in decode_qcs() with FIN set
Properly handle a STREAM frame with no data but the FIN bit set at the
application layer. H3 and hq-interop decode_qcs() callback have been
adjusted to not return early in this case.

If the FIN bit is accepted, an HTX EOM must be inserted for the upper
stream layer. If the FIN is rejected because the stream cannot be
closed, a proper CONNECTION_CLOSE error will be triggered.

A new utility function qcs_http_handle_standalone_fin() has been
implemented in the qmux_http module. It simply adds the HTX EOM to the
qcs HTX buffer. If the HTX buffer is empty, an EOT is first added
to ensure it will be transmitted above.

This commit will allow FIN notification through an empty STREAM frame to
be handled properly. However, it is not sufficient as currently qcc_recv()
skips the decode_qcs() invocation when the offset is already received. This
will be fixed in the next commit.

This should be backported up to 2.6 along with the next patch.
2023-02-17 16:25:00 +01:00
Willy Tarreau
3e820a1056 MINOR: threads: add flags to know if a thread is started and/or running
Several times during debugging it has been difficult to find a way to
reliably indicate if a thread had been started and if it was still
running. It's really not easy because the elements we look at are not
necessarily reliable (e.g. harmless bit or idle bit might not reflect
what we think during a signal). And such notions can be subjective
anyway.

Here we define two thread flags, TH_FL_STARTED which is set as soon as
a thread enters run_thread_poll_loop() and drops the idle bit, and
another one, TH_FL_IN_LOOP, which is set when entering run_poll_loop()
and cleared when leaving it. This should help init/deinit code know
whether it's called from a non-initialized thread (i.e. tid must not
be trusted), or shared functions know if they're being called from a
running thread or from init/deinit code outside of the polling loop.
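
A minimal sketch of how the two flags could be maintained and queried. The
flag names come from the commit message; the values, the reduced context
structure and the helper names are hypothetical, and plain (non-atomic)
operations are used for brevity, which is an assumption that only holds
when a thread touches its own flags.

```c
#include <stdint.h>

/* Flag names from the commit message; the values are hypothetical. */
#define TH_FL_STARTED 0x01u
#define TH_FL_IN_LOOP 0x02u

/* Hypothetical, reduced per-thread context. */
struct thread_ctx_sketch {
    uint32_t flags;
};

/* Called when the thread enters its main function. */
static void thread_enter_sketch(struct thread_ctx_sketch *th)
{
    th->flags |= TH_FL_STARTED;
}

/* Called when entering the polling loop. */
static void poll_loop_enter_sketch(struct thread_ctx_sketch *th)
{
    th->flags |= TH_FL_IN_LOOP;
}

/* Called when leaving the polling loop. */
static void poll_loop_leave_sketch(struct thread_ctx_sketch *th)
{
    th->flags &= ~TH_FL_IN_LOOP;
}

/* Nonzero if the thread has been started. */
static int thread_started_sketch(const struct thread_ctx_sketch *th)
{
    return (th->flags & TH_FL_STARTED) != 0;
}

/* Nonzero if the thread is currently inside the polling loop. */
static int thread_in_loop_sketch(const struct thread_ctx_sketch *th)
{
    return (th->flags & TH_FL_IN_LOOP) != 0;
}
```

Shared code can then check thread_in_loop_sketch() to tell a call from a
running thread apart from a call made by init/deinit code.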
2023-02-17 16:01:34 +01:00
Christopher Faulet
d4eaa8af6b MINOR: global: Add an option to disable the data fast-forward
The new global option "tune.fast-forward" can be set to "off" to disable the
data fast-forward. It is a debug option, thus it is internally marked as
experimental. The directive "expose-experimental-directives" must be set
first to use this one. By default, the data fast-forward is enabled.

It could be useful to force the stream to be woken up when data are
received, to be sure everything works fine in this case. The data
fast-forward is an optimization. It must work without it. But some code may
rely on the fact that the stream will not be woken up. With this option, it
is possible to spot some hidden bugs.
2023-02-17 10:17:02 +01:00
William Lallemand
44979ad680 BUG/MINOR: config: crt-list keywords mistaken for bind ssl keywords
This patch fixes an issue in the "-dK" keywords dumper, which was
mistakenly displaying the "crt-list" keywords for "bind ssl" keywords.

The patch fixes the issue by dumping the "crt-list" keywords in its own
section, and dumping the "bind" keywords which are in the "SSL" scope
with a "bind ssl" prefix.

This commit depends on the previous "MINOR: ssl: rename confusing
ssl_bind_kws" commit.

Must be backported in 2.6.

Diff of the `./haproxy -dKall -q -c -f /dev/null` output before and
after the patch in 2.8-dev4:

     | @@ -190,30 +190,9 @@ listen
     |  	use-fcgi-app
     |  	bind <addr> accept-netscaler-cip +1
     |  	bind <addr> accept-proxy
     | -	bind <addr> allow-0rtt
     | -	bind <addr> alpn +1
     |  	bind <addr> backlog +1
     | -	bind <addr> ca-file +1
     | -	bind <addr> ca-ignore-err +1
     | -	bind <addr> ca-sign-file +1
     | -	bind <addr> ca-sign-pass +1
     | -	bind <addr> ca-verify-file +1
     | -	bind <addr> ciphers +1
     | -	bind <addr> ciphersuites +1
     | -	bind <addr> crl-file +1
     | -	bind <addr> crt +1
     | -	bind <addr> crt-ignore-err +1
     | -	bind <addr> crt-list +1
     | -	bind <addr> curves +1
     |  	bind <addr> defer-accept
     | -	bind <addr> ecdhe +1
     |  	bind <addr> expose-fd +1
     | -	bind <addr> force-sslv3
     | -	bind <addr> force-tlsv10
     | -	bind <addr> force-tlsv11
     | -	bind <addr> force-tlsv12
     | -	bind <addr> force-tlsv13
     | -	bind <addr> generate-certificates
     |  	bind <addr> gid +1
     |  	bind <addr> group +1
     |  	bind <addr> id +1
     | @@ -225,48 +204,52 @@ listen
     |  	bind <addr> name +1
     |  	bind <addr> namespace +1
     |  	bind <addr> nice +1
     | -	bind <addr> no-ca-names
     | -	bind <addr> no-sslv3
     | -	bind <addr> no-tls-tickets
     | -	bind <addr> no-tlsv10
     | -	bind <addr> no-tlsv11
     | -	bind <addr> no-tlsv12
     | -	bind <addr> no-tlsv13
     | -	bind <addr> npn +1
     | -	bind <addr> prefer-client-ciphers
     |  	bind <addr> process +1
     |  	bind <addr> proto +1
     |  	bind <addr> severity-output +1
     |  	bind <addr> shards +1
     | -	bind <addr> ssl
     | -	bind <addr> ssl-max-ver +1
     | -	bind <addr> ssl-min-ver +1
     | -	bind <addr> strict-sni
     |  	bind <addr> tcp-ut +1
     |  	bind <addr> tfo
     |  	bind <addr> thread +1
     | -	bind <addr> tls-ticket-keys +1
     |  	bind <addr> transparent
     |  	bind <addr> uid +1
     |  	bind <addr> user +1
     |  	bind <addr> v4v6
     |  	bind <addr> v6only
     | -	bind <addr> verify +1
     |  	bind <addr> ssl allow-0rtt
     |  	bind <addr> ssl alpn +1
     |  	bind <addr> ssl ca-file +1
     | +	bind <addr> ssl ca-ignore-err +1
     | +	bind <addr> ssl ca-sign-file +1
     | +	bind <addr> ssl ca-sign-pass +1
     |  	bind <addr> ssl ca-verify-file +1
     |  	bind <addr> ssl ciphers +1
     |  	bind <addr> ssl ciphersuites +1
     |  	bind <addr> ssl crl-file +1
     | +	bind <addr> ssl crt +1
     | +	bind <addr> ssl crt-ignore-err +1
     | +	bind <addr> ssl crt-list +1
     |  	bind <addr> ssl curves +1
     |  	bind <addr> ssl ecdhe +1
     | +	bind <addr> ssl force-sslv3
     | +	bind <addr> ssl force-tlsv10
     | +	bind <addr> ssl force-tlsv11
     | +	bind <addr> ssl force-tlsv12
     | +	bind <addr> ssl force-tlsv13
     | +	bind <addr> ssl generate-certificates
     |  	bind <addr> ssl no-ca-names
     | +	bind <addr> ssl no-sslv3
     | +	bind <addr> ssl no-tls-tickets
     | +	bind <addr> ssl no-tlsv10
     | +	bind <addr> ssl no-tlsv11
     | +	bind <addr> ssl no-tlsv12
     | +	bind <addr> ssl no-tlsv13
     |  	bind <addr> ssl npn +1
     | -	bind <addr> ssl ocsp-update +1
     | +	bind <addr> ssl prefer-client-ciphers
     |  	bind <addr> ssl ssl-max-ver +1
     |  	bind <addr> ssl ssl-min-ver +1
     | +	bind <addr> ssl strict-sni
     | +	bind <addr> ssl tls-ticket-keys +1
     |  	bind <addr> ssl verify +1
     |  	server <name> <addr> addr +1
     |  	server <name> <addr> agent-addr +1
     | @@ -591,6 +574,23 @@ listen
     |  	http-after-response unset-var*
     |  userlist
     |  peers
     | +crt-list
     | +	allow-0rtt
     | +	alpn +1
     | +	ca-file +1
     | +	ca-verify-file +1
     | +	ciphers +1
     | +	ciphersuites +1
     | +	crl-file +1
     | +	curves +1
     | +	ecdhe +1
     | +	no-ca-names
     | +	npn +1
     | +	ocsp-update +1
     | +	ssl-max-ver +1
     | +	ssl-min-ver +1
     | +	verify +1
     |  # List of registered CLI keywords:
     |  @!<pid> [MASTER]
     |  @<relative pid> [MASTER]
2023-02-16 16:14:37 +01:00
William Lallemand
af67806651 MINOR: ssl: rename confusing ssl_bind_kws
The ssl_bind_kw structure is exclusively used for crt-list keywords; it
must be renamed to remove the confusion.

The structure was renamed ssl_crtlist_kws.
2023-02-16 16:03:45 +01:00
Amaury Denoyelle
15c74702d5 MINOR: quic: implement a basic "show quic" CLI handler
Implement a basic "show quic" CLI handler. This command will be useful
to display various information on all the active QUIC frontend
connections.

This work is heavily inspired by "show sess". Most notably, a global
list of quic_conn has been introduced to be able to loop over them. This
list is stored per thread in ha_thread_ctx.

Also add three CLI handlers for "show quic" in order to allocate and
free the command context. The dump handler runs on thread isolation.
Each quic_conn is referenced using a back-ref to handle deletion during
handler yielding.

For the moment, only a list of raw quic_conn pointers is displayed. The
handler will be completed over time with more information as needed.

This should be backported up to 2.7.
2023-02-09 18:11:00 +01:00
Aurelien DARRAGON
3e7a0bb70b MINOR: cfgparse/server: move (min/max)conn postparsing logic into dedicated function
In check_config_validity() function, we performed some consistency checks to
adjust minconn/maxconn attributes for each declared server.

We move this logic into a dedicated function named srv_minmax_conn_apply()
to be able to perform those checks later in the process life when needed
(i.e. dynamic servers).
2023-02-08 14:48:21 +01:00
William Lallemand
a14686d096 MINOR: ssl/ocsp: add a function to check the OCSP update configuration
Deduplicate the code which checks the OCSP update in the ckch_store and
in the crtlist_entry.

Also, jump immediately to error handling when the ERR_FATAL is caught.
2023-02-08 11:40:31 +01:00
William Lallemand
b4b9caa65f BUILD: ssl/ocsp: ssl_ocsp-t.h depends on ssl_sock-t.h
ssl_ocsp-t.h uses SSL_SOCK_NUM_KEYTYPES which is defined in
ssl_sock-t.h.

No backport needed.
2023-02-08 11:31:03 +01:00
Willy Tarreau
28360dc53f MEDIUM: clock: force internal time to wrap early after boot
GH issue #2034 clearly indicates yet another case of time roll-over
that went badly. Issues that happen only once every 50 days are hard
to detect and debug, and are usually reported more or less synchronized
from multiple sources. This patch finally does what had long been planned
but never done yet, which is to force the time to wrap early after boot
so that any such remaining issue can be spotted quicker. The margin delay
here is 20s (it may be changed by setting BOOT_TIME_WRAP_SEC to another
value). This value seems sufficient to permit failed health checks to
succeed and traffic to come in and possibly start to update some time
stamps (accept dates in logs, freq counters, stick-tables expiration
dates etc).

It could theoretically be helpful to have this in 2.7, but as can be
seen with the two patches below, we've already had incorrect use cases
of the internal monotonic time when the wall-clock one was needed, so
we could expect to detect other ones in the future. Note that this will
*not* induce bugs, it will only make them happen much faster (i.e. no
need to wait for 50 days before seeing them). If it were to eventually
be backported, these two previous patches must also be backported:

    BUG/MINOR: clock: use distinct wall-clock and monotonic start dates
    BUG/MEDIUM: cache: use the correct time reference when comparing dates
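
The effect of forcing an early wrap can be sketched as follows, assuming the
internal time is a 32-bit millisecond counter (2^32 ms is about 49.7 days,
which matches the "50 days" above). Only BOOT_TIME_WRAP_SEC is named in the
commit message; the helpers are hypothetical.

```c
#include <stdint.h>

#define BOOT_TIME_WRAP_SEC 20 /* name from the commit message */

/* Offset applied at boot so that a 32-bit millisecond counter overflows
 * BOOT_TIME_WRAP_SEC seconds after startup.
 */
static uint32_t boot_offset_ms_sketch(void)
{
    return (uint32_t)0 - (uint32_t)(BOOT_TIME_WRAP_SEC * 1000);
}

/* Internal time for a given number of milliseconds elapsed since boot. */
static uint32_t now_ms_sketch(uint32_t elapsed_ms)
{
    return boot_offset_ms_sketch() + elapsed_ms;
}

/* Wrap-safe ordering: nonzero if tick a is strictly before tick b,
 * provided both are less than 2^31 ms apart.
 */
static int tick_is_lt_sketch(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

Code that compares two such ticks with a plain `<` gives the wrong answer
across the wrap, and with this offset that happens 20 seconds after boot
rather than weeks later. The signed-difference comparison stays correct.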
2023-02-08 11:10:33 +01:00
Willy Tarreau
6093ba47c0 BUG/MINOR: clock: do not mix wall-clock and monotonic time in uptime calculation
We've had a start date even before the internal monotonic clock existed,
but once the monotonic clock was added, the start date was not updated
to distinguish the wall clock time units and the internal monotonic time
units. The distinction is important because both clocks do not necessarily
progress at the same speed. The very rare occurrences of the wall-clock
date are essentially for human consumption and communication with third
parties (e.g. report the start date in "show info" for monitoring
purposes). However currently this one is also used to measure the distance
to "now" as being the process' uptime. This is actually not correct. It
only works because for now the two dates are initialized at the exact
same instant at boot but could still be wrong if the system's date shows
a big jump backwards during startup for example. In addition the current
situation prevents us from enforcing an arbitrary offset at boot to reveal
some heisenbugs.

This patch adds a new "start_time" at boot that is set from "now" and is
used in uptime calculations. "start_date" instead is now set from "date"
and will always reflect the system date for human consumption (e.g. in
"show info"). This way we're now sure that any drift of the internal
clock relative to the system date will not impact the reported uptime.

This could possibly be backported though it's unlikely that anyone has
ever noticed the problem.
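
A minimal sketch of the separation described above. The names start_time,
start_date, now and date come from the commit message; the structures, the
millisecond units and the helper functions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical snapshot of both clocks, in milliseconds. */
struct clocks_sketch {
    int64_t date_ms; /* wall clock ("date"), may jump */
    int64_t now_ms;  /* internal monotonic clock ("now") */
};

/* Values recorded once at boot. */
struct boot_sketch {
    int64_t start_date_ms; /* for human consumption only */
    int64_t start_time_ms; /* reference for uptime */
};

/* Records both references at boot, each from its own clock. */
static struct boot_sketch boot_record_sketch(struct clocks_sketch c)
{
    struct boot_sketch b;

    b.start_date_ms = c.date_ms;
    b.start_time_ms = c.now_ms;
    return b;
}

/* Uptime measured only on the monotonic clock. */
static int64_t uptime_ms_sketch(const struct boot_sketch *b,
                                struct clocks_sketch c)
{
    return c.now_ms - b->start_time_ms;
}

/* The mixed calculation: sensitive to wall-clock jumps. */
static int64_t naive_uptime_ms_sketch(const struct boot_sketch *b,
                                      struct clocks_sketch c)
{
    return c.date_ms - b->start_date_ms;
}
```

If the system date jumps backwards after boot, the mixed calculation can
even turn negative, while the monotonic one keeps reporting the elapsed
time.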
2023-02-08 11:06:55 +01:00
Frédéric Lécaille
b7a406ac34 MINOR: quic: Update version_information transport parameter to draft-14
This is necessary to make our stack negotiate the QUIC versions with clients.
(See https://author-tools.ietf.org/iddiff?url1=draft-ietf-quic-version-negotiation-13&url2=draft-ietf-quic-version-negotiation-14&difftype=--html)

Must be backported to 2.7.
2023-02-06 11:54:07 +01:00
Aurelien DARRAGON
e5958d0292 BUG/MEDIUM: stats: fix resolvers dump
In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"),
we forgot to apply the patch in resolvers.c which provides the
stats_dump_resolvers() function that is involved when dumping with "resolvers"
domain.

As a consequence, resolvers dump was broken because stats_dump_one_line(),
which is used in stats_dump_resolv_to_buffer(), implicitly uses trash_chunk
from stats.c to prepare the dump, and stats_putchk() is then called with
global trash (currently empty) as output data.

Given that trash_dump variable is static and thus only available within stats.c
we change stats_putchk() function prototype so that the function does not take
the output buffer as an argument. Instead, stats_putchk() will implicitly use
the local trash_dump variable declared in stats.c.

It will also prevent further mixups between stats_dump_* functions and
stats_putchk().

This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats")
2023-02-06 07:53:03 +01:00