Commit Graph

11307 Commits

Author SHA1 Message Date
Christopher Faulet
96bff76087 MINOR: actions: Regroup some info about HTTP rules in the same struct
Info used by HTTP rules manipulating the message itself are splitted in several
structures in the arg union. But it is possible to group all of them in a unique
struct. Now, <arg.http> is used by most of these rules, which contains:

  * <arg.http.i>   : an integer used as status code, nice/tos/mark/loglevel or
                     action id.
  * <arg.http.str> : an IST used as header name, reason string or auth realm.
  * <arg.http.fmt> : a log-format compatible expression
  * <arg.http.re>  : a regular expression used by replace rules
2020-01-20 15:18:45 +01:00
Christopher Faulet
58b3564fde MINOR: actions: Add a function pointer to release args used by actions
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
2020-01-20 15:18:45 +01:00
Christopher Faulet
95d169ba9a REGTEST: Fix format of set-uri HTTP request rule in h1or2_to_h1c.vtc
First, concat() is a converter, not a sample fetch. So use str() sample fetch
with no string and call concat on it. Then, the argument of the set-uri rule
must be a log format string. So it must be inside %[] to be evaluated.
2020-01-20 15:18:45 +01:00
Christopher Faulet
1aea50e1ff MEDIUM: http-rules: Enable the strict rewriting mode by default
Now, by default, when a rule performing a rewrite on an HTTP message fails, an
internal error is triggered. Before, the failure was ignored. But most of users
are not aware of this behavior. And it does not happen very often because the
buffer reserve space in large enough. So it may be surprising. Returning an
internal error makes the rewrite failure explicit. If it is acceptable to
silently ignore it, the strict rewriting mode can be disabled.
2020-01-20 15:18:45 +01:00
Christopher Faulet
46f95543c5 MINOR: http-rules: Add a rule to enable or disable the strict rewriting mode
It is now possible to explicitly instruct rewriting rules to be strict or not
towards errors. It means that in this mode, an internal error is trigger if a
rewrite rule fails. The HTTP action "strict-mode" can be used to enable or
disable the strict rewriting mode. It can be used in an http-request and an
http-response ruleset.

For now, by default the strict rewriting mode is disabled. Because it is the
current behavior. But it will be changed in another patch.
2020-01-20 15:18:45 +01:00
Christopher Faulet
e00d06c99f MINOR: http-rules: Handle all message rewrites the same way
In HTTP rules, error handling during a rewrite is now handle the same way for
all rules. First, allocation errors are reported as internal errors. Then, if
soft rewrites are allowed, rewrite errors are ignored and only the
failed_rewrites counter is incremented. Otherwise, when strict rewrites are
mandatory, interanl errors are returned.

For now, only soft rewrites are supported. Note also that the warning sent to
notify a rewrite failure was removed. It will be useless once the strict
rewrites will be possible.
2020-01-20 15:18:45 +01:00
Christopher Faulet
a00071e2e5 MINOR: http-ana: Add a txn flag to support soft/strict message rewrites
the HTTP_MSGF_SOFT_RW flag must now be set on the HTTP transaction to ignore
rewrite errors on a message, from HTTP rules. The mode is called the soft
rewrites. If thes flag is not set, strict rewrites are performed. In this mode,
if a rewrite error occurred, an internal error is reported.

For now, HTTP_MSGF_SOFT_RW is always set and there is no way to switch a
transaction in strict mode.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cff0f739e5 MINOR: counters: Review conditions to increment counters from analysers
Now, for these counters, the following rules are followed to know if it must be
incremented or not:

  * if it exists for a frontend, the counter is incremented
  * if stats must be collected for the session's listener, if the counter exists
    for this listener, it is incremented
  * if the backend is already assigned, if the counter exists for this backend,
    it is incremented
  * if a server is attached to the stream, if the counter exists for this
    server, it is incremented

It is not hardcoded rules. Some counters are still handled in a different
way. But many counters are incremented this way now.
2020-01-20 15:18:45 +01:00
Christopher Faulet
a08546bb5a MINOR: counters: Remove failed_secu counter and use denied_resp instead
The failed_secu counter is only used for the servers stats. It is used to report
the number of denied responses. On proxies, the same info is stored in the
denied_resp counter. So, it is more consistent to use the same field for
servers.
2020-01-20 15:18:45 +01:00
Christopher Faulet
e4a2c8d7e7 MINOR: contrib/prometheus-exporter: Export internal errors per proxy/server
The new ST_F_EINT stats field is now exported for each proxy/server.
2020-01-20 15:18:45 +01:00
Christopher Faulet
0159ee4032 MINOR: stats: Report internal errors in the proxies/listeners/servers stats
The stats field ST_F_EINT has been added to report internal errors encountered
per proxy, per listener and per server. It appears in the CLI export and on the
HTML stats page.
2020-01-20 15:18:45 +01:00
Christopher Faulet
74f67af8d4 MINOR: http-rules: Handle denied/aborted/invalid connections from HTTP rules
The new possible results for a custom action (deny/abort/invalid) are now handled
during HTTP rules evaluation. These codes are mapped on HTTP rules ones :

  * ACT_RET_DENY => HTTP_RULE_RES_DENY
  * ACT_RET_ABRT => HTTP_RULE_RES_ABRT
  * ACT_RET_INV  => HTTP_RULE_RES_BADREQ

For now, no custom action uses these new codes.
2020-01-20 15:18:45 +01:00
Christopher Faulet
282992e25f MINOR: tcp-rules: Handle denied/aborted/invalid connections from TCP rules
The new possible results for a custom action (deny/abort/invalid) are now
handled during TCP rules evaluation. For L4/L5 rules, the session is
rejected. For L7 rules, the right counter is incremented, then the connections
killed.

For now, no custom action uses these new codes.
2020-01-20 15:18:45 +01:00
Christopher Faulet
30a2a3724b MINOR: http-rules: Add more return codes to let custom actions act as normal ones
When HTTP/TCP rules are evaluated, especially HTTP ones, some results are
possible for normal actions and not for custom ones. So missing return codes
(ACT_RET_) have been added to let custom actions act as normal ones. Concretely
following codes have been added:

 * ACT_RET_DENY : deny the request/response. It must be handled by the caller
 * ACT_RET_ABRT : abort the request/response, handled by action itsleft.
 * ACT_RET_INV  : invalid request/response
2020-01-20 15:18:45 +01:00
Christopher Faulet
3a26beea18 MINOR: http-rules: Handle internal errors during HTTP rules evaluation
The HTTP_RULE_RES_ERROR code is now used by HTTP analyzers to handle internal
errors during HTTP rules evaluation. It is used instead of HTTP_RULE_RES_BADREQ,
used for invalid requests/responses. In addition, the SF_ERR_RESOURCE flag is
set on the stream when an allocation failure happens.

Note that the return value of http-response rules evaluation is now tested in
the same way than the result of http-request rules evaluation.
2020-01-20 15:18:45 +01:00
Christopher Faulet
4d90db5f4c MINOR: http-rules: Add a rule result to report internal error
Now, when HTTP rules are evaluated, HTTP_RULE_RES_ERROR must be returned when an
internal error is catched. It is a way to make the difference between a bad
request or a bad response and an error during its processing.
2020-01-20 15:18:45 +01:00
Christopher Faulet
b8a5371a32 MEDIUM: http-ana: Properly handle internal processing errors
Now, processing errors are properly handled. Instead of returning an error 400
or 502, depending where the error happens, an error 500 is now returned. And the
processing_errors counter is incremented. By default, when such error is
detected, the SF_ERR_INTERNAL stream error is used. When the error is caused by
an allocation failure, and when it is reasonnably possible, the SF_ERR_RESOURCE
stream error is used. Thanks to this patch, bad requests and bad responses
should be easier to detect.
2020-01-20 15:18:45 +01:00
Christopher Faulet
d4ce6c2957 MINOR: counters: Add a counter to report internal processing errors
This counter, named 'internal_errors', has been added in frontend and backend
counters. It should be used when a internal error is encountered, instead for
failed_req or failed_resp.
2020-01-20 15:18:45 +01:00
Christopher Faulet
28160e73dd MINOR: http-rules: Return an error when custom actions return ACT_RET_ERR
Thanks to the commit "MINOR: actions: Use ACT_RET_CONT code to ignore an error
from a custom action", it is now possible to trigger an error from a custom
action in http rules. Now, when a custom action returns the ACT_RET_ERR code
from an http-request rule, an error 400 is returned. And from an http-response
rule, an error 502 is returned.

Be careful if this patch is backported. The other mentioned patch must be
backported first.
2020-01-20 15:18:45 +01:00
Christopher Faulet
491ab5e2e5 MINOR: tcp-rules: Kill connections when custom actions return ACT_RET_ERR
Thanks to the commit "MINOR: actions: Use ACT_RET_CONT code to ignore an error
from a custom action", it is now possible to trigger an error from a custom
action in tcp-content rules. Now, when a custom action returns the ACT_RET_ERR
code, it has the same behavior than a reject rules, the connection is killed.

Be careful if this patch is backported. The other mentioned patch must be
backported first.
2020-01-20 15:18:45 +01:00
Christopher Faulet
13403761d5 MINOR: actions: Use ACT_RET_CONT code to ignore an error from a custom action
Some custom actions are just ignored and skipped when an error is encoutered. In
that case, we jump to the next rule. To do so, most of them use the return code
ACT_RET_ERR. Currently, for http rules and tcp content rules, it is not a
problem because this code is handled the same way than ACT_RET_CONT. But, it
means there is no way to handle the error as other actions. The custom actions
must handle the error and return ACT_RET_DONE. For instance, when http-request
rules are processed, an error when we try to replace a header value leads to a
bad request and an error 400 is returned to the client. But when we fail to
replace the URI, the error is silently ignored. This difference between the
custom actions and the others is an obstacle to write new custom actions.

So, in this first patch, ACT_RET_CONT is now returned from custom actions
instead of ACT_RET_ERR when an error is encoutered if it should be ignored. The
behavior remains the same but it is now possible to handle true errors using the
return code ACT_RET_ERR. Some actions will probably be reviewed to determine if
an error is fatal or not. Other patches will be pushed to trigger an error when
a custom action returns the ACT_RET_ERR code.

This patch is not tagged as a bug because it is just a design issue. But others
will depends on it. So be careful during backports, if so.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cb9106b3e3 MINOR: tcp-rules: Always set from which ruleset a rule comes from
The ruleset from which a TCP rule comes from (the <from> field in the act_rule
structure) is only set when a rule is created from a registered keyword and not
for all TCP rules. But this information may be useful to check the configuration
validity or during the rule evaluation. So now, we systematically set it.
2020-01-20 15:18:45 +01:00
Christopher Faulet
81e20177df MEDIUM: http-rules: Register an action keyword for all http rules
There are many specific http actions that don't use the action registration
mechanism (allow, deny, set-header...). Instead, the parsing of these actions is
inlined in the functions responsible to parse the http-request/http-response
rules. There is no reason to not register an action keyword for all these
actions. It it the purpose of this patch. The new functions responsible to parse
these http actions are defined in http_act.c
2020-01-20 15:18:45 +01:00
Christopher Faulet
28436e23d3 BUG/MINOR: stick-table: Use MAX_SESS_STKCTR as the max track ID during parsing
During the parsing of the sc-inc-gpc0, sc-inc-gpc1 and sc-inc-gpt1 actions, the
maximum stick table track ID allowed is tested against ACT_ACTION_TRK_SCMAX. It
is the action number and not the maximum number of stick counters. Instead,
MAX_SESS_STKCTR must be used.

This patch must be backported to all stable versions.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cb5501327c BUG/MINOR: http-rules: Remove buggy deinit functions for HTTP rules
Functions to deinitialize the HTTP rules are buggy. These functions does not
check the action name to release the right part in the arg union. Only few info
are released. For auth rules, the realm is released and there is no problem
here. But the regex <arg.hdr_add.re> is always unconditionally released. So it
is easy to make these functions crash. For instance, with the following rule
HAProxy crashes during the deinit :

      http-request set-map(/path/to/map) %[src] %[req.hdr(X-Value)]

For now, These functions are simply removed and we rely on the deinit function
used for TCP rules (renamed as deinit_act_rules()). This patch fixes the
bug. But arguments used by actions are not released at all, this part will be
addressed later.

This patch must be backported to all stable versions.
2020-01-20 15:18:45 +01:00
Christopher Faulet
1a3e0279c6 BUG/MINOR: http-ana/filters: Wait end of the http_end callback for all filters
Filters may define the "http_end" callback, called at the end of the analysis of
any HTTP messages. It is called at the end of the payload forwarding and it can
interrupt the stream processing. So we must be sure to not remove the XFER_BODY
analyzers while there is still at least filter in progress on this callback.

Unfortunatly, once the request and the response are borh in the DONE or the
TUNNEL mode, we consider the XFER_BODY analyzer has finished its processing on
both sides. So it is possible to prematurely interrupt the execution of the
filters "http_end" callback.

To fix this bug, we switch a message in the ENDING state. It is then switched in
DONE/TUNNEL mode only after the execution of the filters "http_end" callback.

This patch must be backported (and adapted) to 2.1, 2.0 and 1.9. The legacy HTTP
mode shoud probaly be fixed too.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cf403f32e4 MINOR: contrib/prometheus-exporter: Add heathcheck status/code in server metrics
ST_F_CHECK_STATUS and ST_F_CHECK_CODE are now part of exported server metrics:

  * haproxy_server_check_status
  * haproxy_server_check_code

The heathcheck status is an integer corresponding to HCHK_STATUS value.
2020-01-20 15:18:45 +01:00
Christopher Faulet
46230363af MINOR: mux-h1: Inherit send flags from the upper layer
Send flags (CO_SFL_*) used when xprt->snd_buf() is called, in h1_send(), are now
inherited from the upper layer, when h1_snd_buf() is called. First, the flag
CO_SFL_MSG_MORE is no more set if the output buffer is full, but only if the
stream-interface decides to set it. It has more info to do it than the
mux. Then, the flag CO_SFL_STREAMER is now also handled this way. It was just
ignored till now.
2020-01-20 15:18:45 +01:00
Christopher Faulet
d47941d6ac DOC: Add a section to document the internal sample fetches
The section 7.3.7. is now dedicated to internal sample fetches. For now, only
HTX sample fetches are referenced in this section. But it should contain the
documentation of all sample fetches reserved to an internal use, for debugging
or testing purposes.
2020-01-20 15:18:45 +01:00
Christopher Faulet
8178e4006c MINOR: http-htx: Make 'internal.htx_blk_data' return a binary string
This internal sample fetch now returns a binary string (SMP_T_BIN) instead of a
character string.
2020-01-20 15:18:45 +01:00
Christopher Faulet
c5db14c5d4 MINOR: http-htx: Rename 'internal.htx_blk.val' to 'internal.htx_blk.data'
Use a more explicit name for this internal sample fetch.
2020-01-20 15:18:45 +01:00
Christopher Faulet
01f44456e6 MINOR: http-htx: Move htx sample fetches in the scope "internal"
HTX sample fetches are now prefixed by "internal." to explicitly reserve their
uses for debugging or testing purposes.
2020-01-20 15:18:45 +01:00
Ben51Degrees
6bf0672711 BUG/MINOR: 51d: Fix bug when HTX is enabled
When HTX is enabled, the sample flags were set too early. When matching for
multiple HTTP headers, the sample is fetched more than once, meaning that the
flags would need to be set again. Instead, the flags are now set last (just
before the outermost function returns). This could be further improved by
passing around the message without calling prefetch again.

This patch must be backported as far as 1.9. it should fix bug #450.
2020-01-20 14:01:52 +01:00
Tim Duesterhus
fcac33d0c1 BUG/MINOR: dns: Make dns_query_id_seed unsigned
Left shifting of large signed values and negative values is undefined.

In a test script clang's ubsan rightfully complains:

> runtime error: left shift of 1934242336581872173 by 13 places cannot be represented in type 'int64_t' (aka 'long')

This bug was introduced in the initial version of the DNS resolver
in 325137d603. The fix must be backported
to HAProxy 1.6+.
2020-01-18 06:45:54 +01:00
Tim Duesterhus
d34b1ce5a2 BUG/MINOR: cache: Fix leak of cache name in error path
This issue was introduced in commit 99a17a2d91
which first appeared in tag v1.9-dev11. This bugfix should be backported
to HAProxy 1.9+.
2020-01-18 06:45:54 +01:00
Tim Duesterhus
6bd909b42f DOC: Fix copy and paste mistake in http-response replace-value doc
This fixes up commit 2252beb855.
2020-01-18 06:45:54 +01:00
Elliot Otchet
71f829767d MINOR: ssl: Add support for returning the dn samples from ssl_(c|f)_(i|s)_dn in LDAP v3 (RFC2253) format.
Modifies the existing sample extraction methods (smp_fetch_ssl_x_i_dn,
smp_fetch_ssl_x_s_dn) to accommodate a third argument that indicates the
DN should be returned in LDAP v3 format. When the third argument is
present, the new function (ssl_sock_get_dn_formatted) is called with
three parameters including the X509_NAME, a buffer containing the format
argument, and a buffer for the output.  If the supplied format matches
the supported format string (currently only "rfc2253" is supported), the
formatted value is extracted into the supplied output buffer using
OpenSSL's X509_NAME_print_ex and BIO_s_mem. 1 is returned when a dn
value is retrieved.  0 is returned when a value is not retrieved.

Argument validation is added to each of the related sample
configurations to ensure the third argument passed is either blank or
"rfc2253" using strcmp.  An error is returned if the third argument is
present with any other value.

Documentation was updated in configuration.txt and it was noted during
preliminary reviews that a CLEANUP patch should follow that adjusts the
documentation.  Currently, this patch and the existing documentation are
copied with some minor revisions for each sample configuration.  It
might be better to have one entry for all of the samples or entries for
each that reference back to a primary entry that explains the sample in
detail.

Special thanks to Chris, Willy, Tim and Aleks for the feedback.

Author: Elliot Otchet <degroens@yahoo.com>
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
2020-01-18 06:42:30 +01:00
Willy Tarreau
ee1a6fc943 MINOR: connection: make the last arg of subscribe() a struct wait_event*
The subscriber used to be passed as a "void *param" that was systematically
cast to a struct wait_event*. By now it appears clear that the subscribe()
call at every layer is well defined and always takes a pointer to an event
subscriber of type wait_event, so let's enforce this in the functions'
prototypes, remove the intermediary variables used to cast it and clean up
the comments to clarify what all these functions do in their context.
2020-01-17 18:30:37 +01:00
Willy Tarreau
8907e4ddb8 MEDIUM: mux-fcgi: merge recv_wait and send_wait event notifications
This is the last of the "recv_wait+send_wait merge" patches and is
functionally equivalent to previous commit "MEDIUM: mux-h2: merge
recv_wait and send_wait event notifications" but for FCGI this time.
The principle is pretty much the same, since the code is very similar.
We use a single wait_event for both recv and send and rely on the
subscribe flags to know the desired notifications.
2020-01-17 18:30:37 +01:00
Willy Tarreau
f96508aae6 MEDIUM: mux-h2: merge recv_wait and send_wait event notifications
This is the continuation of the recv+send event notifications merge
that was started. This patch is less trivial than the previous ones
because the existence of a send event subscription is also used to
decide to put a stream back into the send list.
2020-01-17 18:30:36 +01:00
Willy Tarreau
1b0d4d19fc MEDIUM: mux-h1: merge recv_wait and send_wait
This is the same principle as previous commit, but for the H1 mux this
time. The checks in the subscribe()/unsubscribe() calls were factored
and some BUG_ON() were added to detect unexpected cases.
h1_wake_for_recv() and h1_wake_for_send() needed to be refined to
consider the current subscription before deciding to wake up.
2020-01-17 18:30:36 +01:00
Willy Tarreau
113d52bfb4 MEDIUM: ssl: merge recv_wait and send_wait in ssl_sock
This is the same principle as previous commit, but for ssl_sock.
2020-01-17 18:30:36 +01:00
Willy Tarreau
ac6febd3ae MEDIUM: xprt: merge recv_wait and send_wait in xprt_handshake
This is the same principle as previous commit, but for xprt_handshake.
2020-01-17 18:30:36 +01:00
Willy Tarreau
7872d1fc15 MEDIUM: connection: merge the send_wait and recv_wait entries
In practice all callers use the same wait_event notification for any I/O
so instead of keeping specific code to handle them separately, let's merge
them and it will allow us to create new events later.
2020-01-17 18:30:36 +01:00
Willy Tarreau
062df2c23a MEDIUM: backend: move the connection finalization step to back_handle_st_con()
Currently there's still lots of code in conn_complete_server() that performs
one half of the connection setup, which is then checked and finalized in
back_handle_st_con(). There isn't a valid reason for this anymore, we can
simplify this and make sure that conn_complete_server() only wakes the stream
up to inform it about the fact the whole connection stack is set up so that
back_handle_st_con() finishes its job at the stream-int level.

It looks like the there could even be further simplified, but for now it
was moved straight out of conn_complete_server() with no modification.
2020-01-17 18:30:36 +01:00
Willy Tarreau
3a9312af8f REORG: stream/backend: move backend-specific stuff to backend.c
For more than a decade we've kept all the sess_update_st_*() functions
in stream.c while they're only there to work in relation with what is
currently being done in backend.c (srv_redispatch_connect, connect_server,
etc). Let's move all this pollution over there and take this opportunity
to try to find slightly less confusing names for these old functions
whose role is only to handle transitions from one specific stream-int
state:

  sess_update_st_rdy_tcp() -> back_handle_st_rdy()
  sess_update_st_con_tcp() -> back_handle_st_con()
  sess_update_st_cer()     -> back_handle_st_cer()
  sess_update_stream_int() -> back_try_conn_req()
  sess_prepare_conn_req()  -> back_handle_st_req()
  sess_establish()         -> back_establish()

The last one remained in stream.c because it's more or less a completion
function which does all the initialization expected on a connection
success or failure, can set analysers and emit logs.

The other ones could possibly slightly benefit from being modified to
take a stream-int instead since it's really what they're working with,
but it's unimportant here.
2020-01-17 18:30:36 +01:00
Willy Tarreau
7aad7039e4 MEDIUM: mux-fcgi: do not make an fstrm subscribe to itself on deferred shut
This is the port to FCGI of previous commit "MEDIUM: mux-h2: do not make
an h2s subscribe to itself on deferred shut".

The purpose is to avoid subscribing to the send_wait list when trying to
close, because we'll soon have to merge both recv and send lists. Basic
testing showed no difference (performance nor issues).
2020-01-17 18:30:36 +01:00
Willy Tarreau
5723f295d8 MEDIUM: mux-h2: do not make an h2s subscribe to itself on deferred shut
The logic handling the deferred shutdown is a bit complex because it
involves a wait_event struct in each h2s dedicated to subscribing to
itself when shutdowns are not immediately possible. This implies that
we will not be able to support a shutdown and a receive subscription
in the future when we merge all wait events.

Let's solely rely on the H2_SF_WANT_SHUT_{R,W} flags instead and have
an autonomous tasklet for this. This requires to add a few controls
in the code because now when waking up a stream we need to check if it
is for I/O or just a shut, but since sending and shutting are exclusive
it's not difficult.

One point worth noting is that further resources could be shaved off
by only allocating the tasklet when failing to shut, given that in the
vast majority of streams it will never be used. In fact the sole purpose
of the tasklet is to support calling this code from outside the H2 mux
context. Looking at the code, it seems that not too many adaptations
would be required to have the send_list walking code deal with sending
the shut bits itself and further simplify all this.
2020-01-17 18:30:36 +01:00
Willy Tarreau
f11be0ea1e MEDIUM: mux-fcgi: do not try to stop sending streams on blocked mux
This is essentially the same change as applied to mux-h2 in previous commit
"MEDIUM: mux-h2: do not try to stop sending streams on blocked mux". The
goal is to make sure we don't need to keep the item in the send_wait list
until it's executed so that we can later merge it with the recv_wait list.
No performance changes were observed.
2020-01-17 18:30:36 +01:00
Willy Tarreau
d9464167fa MEDIUM: mux-h2: do not try to stop sending streams on blocked mux
This partially reverts commit d846c267 ("MINOR: h2: Don't run tasks that
are waiting to send if mux in full"). This commit was introduced to
limit the start/stop overhead incurred by waking many streams to let
only a few work. But since commit 9c218e7521 ("MAJOR: mux-h2: switch
to next mux buffer on buffer full condition."), this situation occurs
way less (typically 2000 to 4000 times less often) and the benefits of
the patch above do not outweigh its shortcomings anymore. And commit
c7ce4e3e7f ("BUG/MEDIUM: mux-h2: don't stop sending when crossing a
buffer boundary") addressed a root cause of many unexpected sleeps and
wakeups.

The main problem it's causing is that it requires to keep the element
in the send_wait list until it's executed, leaving the entry in an
uncertain state, and significantly complicating the coexistence of this
list and the wait list dedicated to shutdown. Also it happens that this
call to tasklet_remove_from_task_list() will not be usable anymore once
we start to support streams on different threads. And finally, some of
the other streams that we remove might very well have managed to find
their way to the h2_snd_buf() with an unblocked condition as well so it
is possible that some of these removals were not welcome.

So this patch now makes sure that send_wait is immediately nulled when
the task is woken up, and that we don't have to play with it afterwards.
Since we don't need to stop the tasklets anymore, we don't need the
sending_list that we can remove.

However one very useful benefit of the sending_list was that it used to
provide the information about the fact that the stream already tried to
send and failed. This was an important factor to improve fairness because
late arrived streams should not be allowed to send if others are already
scheduled. So this patch introduces a new per-stream flag H2_SF_NOTIFIED
to distinguish such streams.

With this patch the fairness is preserved, and the ratio of aborted
h2_snd_buf() due to other streams already sending remains quite low
(~0.3-2.1% measured depending on object size, this is within
expectations for 100 independent streams).

If the contention issue the patch above used to address comes up again
in the future, a much better (though more complicated) solution would
be to switch to per-connection buffer pools to distribute between the
connection and the streams so that by default there are more buffers
available for the mux and the streams only have some when the mux's are
unused, i.e. it would push the memory pressure back to the data layer.

One observation made while developing this patch is that when dealing
with large objects we still spend a huge amount of time scanning the
send_list with tasks that are already woken up every time a send()
manages to purge a bit more data. Indeed, by removing the elements
from the list when H2_SF_NOTIFIED is set, the netowrk bandwidth on
1 MB objects fetched over 100 streams per connection increases by 38%.
This was not done here to preserve fairness but is worth studying (e.g.
by keeping a restart pointer on the list or just having a flag indicating
if an entry was added since last scan).
2020-01-17 18:30:36 +01:00