Commit Graph

4678 Commits

Author SHA1 Message Date
Christopher Faulet
afc02a4436 MINOR: muxes: Remove get_cs_info callback function now useless
This callback function was only defined by the mux-h1. But it has been
removed in the previous commit because it is unused now. So, we can do a
step forward removing the callback function from the mux definition and the
cs_info structure.
2020-12-04 14:41:48 +01:00
Christopher Faulet
d517396f8e MINOR: session: Add the idle duration field into the session
The idle duration between two streams is added to the session structure. It
is not necessarily pertinent on all protocols. In fact, it is only defined
for H1 connections. It is the duration between two H1 transactions. But the
.get_cs_info() callback function on the multiplexers only exists because
this duration is missing at the session level. So it is a simplification
opportunity for a really low cost.

To reduce the cost, a hole in the session structure is filled by moving
.srv_list field at the end of the structure.
2020-12-04 14:41:48 +01:00
Thierry Fournier
c749259dff MINOR: lua-thread: Store each function reference and init reference in array
The goal is to allow execution of one main lua state per thread.

The array introduces storage of one reference per thread, because each
lua state can have different reference id for a same function. A function
returns the preferred state id according to configuration and current
thread id.
2020-12-02 21:53:16 +01:00
Thierry Fournier
021d986ecc MINOR: lua-thread: Replace state_from by state_id
The goal is to allow execution of one main lua state per thread.

"state_from" is a pointer to the parent lua state. "state_id"
is the index of the parent state id in the reference lua states
array. "state_id" is better because the lock is a "== 0" test
which is quick than pointer comparison. In other way, the state_id
index could index other things the the Lua state concerned. I
think to the function references.
2020-12-02 21:53:16 +01:00
Thierry Fournier
62a22aa23f MINOR: lua-thread: Replace "struct hlua_function" allocation by dedicated function
The goal is to allow execution of one main lua state per thread.

This function will initialize the struct with other things than 0.
With this function helper, the initialization is centralized and
it prevents mistakes. This patch also keeps a reference to each
declared function in a list. It will be useful in next patches to
control consistency of declared references.
2020-12-02 21:53:16 +01:00
Thierry Fournier
75fc02956b MINOR: lua-thread: make hlua_ctx_init() get L from its caller
The goal is to allow execution of one main lua state per thread.

The function hlua_ctx_init() now gets the original lua state from
its caller. This allows the initialisation of lua_thread (coroutines)
from any master lua state.

The parent lua state is stored in the hlua struct.

This patch is a temporary transition, it will be modified later.
2020-12-02 21:53:16 +01:00
Thierry Fournier
ad5345fed7 MINOR: lua-thread: Replace embedded struct hlua_function by a pointer
The goal is to allow execution of one main lua state per thread.

Because this struct will be filled after the configuration parser, we
cannot copy the content. The actual state of the Haproxy code doesn't
justify this change, it is an update preparing next steps.
2020-12-02 21:53:16 +01:00
Thierry Fournier
a51a1fd174 MINOR: cli: add a function to look up a CLI service description
This function will be useful to check if the keyword is already registered.
Also add a define for the max number of args.

This will be needed by a next patch to fix a bug and will have to be
backported.
2020-12-02 09:45:18 +01:00
Thierry Fournier
87e539906b MINOR: actions: add a function returning a service pointer from its name
This function simply calls action_lookup() on the private service_keywords,
to look up a service name. This will be used to detect double registration
of a same service from Lua.

This will be needed by a next patch to fix a bug and will have to be
backported.
2020-12-02 09:45:18 +01:00
Thierry Fournier
7a71a6d9d2 MINOR: actions: Export actions lookup functions
These functions will be useful to check if a keyword is already registered.
This will be needed by a next patch to fix a bug, and will need to be
backported.
2020-12-02 09:45:18 +01:00
Willy Tarreau
a1f12746b1 MINOR: traces: add a new level "error" below the "user" level
Sometimes it would be nice to be able to only trace abnormal events such
as protocol errors. Let's add a new "error" level below the "user" level
for this. This will allow to add TRACE_ERROR() at various error points
and only see them.
2020-12-01 10:25:20 +01:00
Maciej Zdeb
fcdfd857b3 MINOR: log: Logging HTTP path only with %HPO
This patch adds a new logging variable '%HPO' for logging HTTP path only
(without query string) from relative or absolute URI.

For example:
log-format "hpo=%HPO hp=%HP hu=%HU hq=%HQ"

GET /r/1 HTTP/1.1
=>
hpo=/r/1 hp=/r/1 hu=/r/1 hq=

GET /r/2?q=2 HTTP/1.1
=>
hpo=/r/2 hp=/r/2 hu=/r/2?q=2 hq=?q=2

GET http://host/r/3 HTTP/1.1
=>
hpo=/r/3 hp=http://host/r/3 hu=http://host/r/3 hq=

GET http://host/r/4?q=4 HTTP/1.1
=>
hpo=/r/4 hp=http://host/r/4 hu=http://host/r/4?q=4 hq=?q=4
2020-12-01 09:32:44 +01:00
Emeric Brun
0237c4e3f5 BUG/MEDIUM: local log format regression.
Since 2.3 default local log format always adds hostame field.
This behavior change was due to log/sink re-work, because according
to rfc3164 the hostname field is mandatory.

This patch re-introduce a legacy "local" format which is analog
to rfc3164 but with hostname stripped. This is the new
default if logs are generated by haproxy.

To stay compliant with previous configurations, the option
"log-send-hostname" acts as if the default format is switched
to rfc3164.

This patch addresses the github issue #963

This patch should be backported in branches >= 2.3.
2020-12-01 06:58:42 +01:00
Willy Tarreau
4d6c594998 BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link
In issue #958 Ashley Penney reported intermittent crashes on AWS's ARM
nodes which would not happen on x86 nodes. After investigation it turned
out that the Neoverse N1 CPU cores used in the Graviton2 CPU are much
more aggressive than the usual Cortex A53/A72/A55 or any x86 regarding
memory ordering.

The issue that was triggered there is that if a tasklet_wakeup() call
is made on a tasklet scheduled to run on a foreign thread and that
tasklet is just being dequeued to be processed, there can be a race at
two places:
  - if MT_LIST_TRY_ADDQ() happens between MT_LIST_BEHEAD() and
    LIST_SPLICE_END_DETACHED() if the tasklet is alone in the list,
    because the emptiness tests matches ;

  - if MT_LIST_TRY_ADDQ() happens during LIST_DEL_INIT() in
    run_tasks_from_lists(), then depending on how LIST_DEL_INIT() ends
    up being implemented, it may even corrupt the adjacent nodes while
    they're being reused for the in-tree storage.

This issue was introduced in 2.2 when support for waking up remote
tasklets was added. Initially the attachment of a tasklet to a list
was enough to know its status and this used to be stable information.
Now it's not sufficient to rely on this anymore, thus we need to use
a different information.

This patch solves this by adding a new task flag, TASK_IN_LIST, which
is atomically set before attaching a tasklet to a list, and is only
removed after the tasklet is detached from a list. It is checked
by tasklet_wakeup_on() so that it may only be done while the tasklet
is out of any list, and is cleared during the state switch when calling
the tasklet. Note that the flag is not set for pure tasks as it's not
needed.

However this introduces a new special case: the function
tasklet_remove_from_tasklet_list() needs to keep both states in sync
and cannot check both the state and the attachment to a list at the
same time. This function is already limited to being used by the thread
owning the tasklet, so in this case the test remains reliable. However,
just like its predecessors, this function is wrong by design and it
should probably be replaced with a stricter one, a lazy one, or be
totally removed (it's only used in checks to avoid calling a possibly
scheduled event, and when freeing a tasklet). Regardless, for now the
function exists so the flag is removed only if the deletion could be
done, which covers all cases we're interested in regarding the insertion.
This removal is safe against a concurrent tasklet_wakeup_on() since
MT_LIST_DEL() guarantees the atomic test, and will ultimately clear
the flag only if the task could be deleted, so the flag will always
reflect the last state.

This should be carefully be backported as far as 2.2 after some
observation period. This patch depends on previous patch
"MINOR: task: remove __tasklet_remove_from_tasklet_list()".
2020-11-30 18:17:59 +01:00
Willy Tarreau
2da4c316c2 MINOR: task: remove __tasklet_remove_from_tasklet_list()
This function is only used at a single place directly within the
scheduler in run_tasks_from_lists() and it really ought not be called
by anything else, regardless of what its comment says. Let's delete
it, move the two lines directly into the call place, and take this
opportunity to factor the atomic decrement on tasks_run_queue. A comment
was added on the remaining one tasklet_remove_from_tasklet_list() to
mention the risks in using it.
2020-11-30 18:17:44 +01:00
Willy Tarreau
a868c2920b MINOR: task: remove tasklet_insert_into_tasklet_list()
This function is only called at a single place and adds more confusion
than it removes. It also makes one think it could be used outside of
the scheduler while it must absolutely not. Let's just move its two
lines to the call place, making the code more readable there. In
addition this clearly shows that the preliminary LIST_INIT() is
useless since the entry is immediately overwritten.
2020-11-30 18:17:44 +01:00
Olivier Houchard
1f05324cbe BUG/MEDIUM: lists: Lock the element while we check if it is in a list.
In MT_LIST_TRY_ADDQ() and MT_LIST_TRY_ADD() we can't just check if the
element is already in a list, because there's a small race condition, it
could be added  between the time we checked, and the time we actually set
its next and prev, so we have to lock it first.

This is required to address issue #958.

This should be backported to 2.3, 2.2 and 2.1.
2020-11-30 18:17:29 +01:00
Your Name
1e237d037b MINOR: plock: use an ARMv8 instruction barrier for the pause instruction
As suggested by @AGSaidi in issue #958, on ARMv8 its convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get it in 2 seconds. I
never happened again with it. In addition, the performance increased
by slightly more than 5% thanks to the reduced contention.

This should be backported as far as 2.2, possibly even 2.0.
2020-11-29 14:53:33 +01:00
Christopher Faulet
97b7bdfcf7 REORG: tcpcheck: Move check option parsing functions based on tcp-check
The parsing of the check options based on tcp-check rules (redis, spop,
smtp, http...) are moved aways from check.c. Now, these functions are placed
in tcpcheck.c. These functions are only related to the tcpcheck ruleset
configured on a proxy and not to the health-check attached to a server.
2020-11-27 10:30:23 +01:00
Christopher Faulet
bb9fb8b7f8 MINOR: config: Deprecate and ignore tune.chksize global option
This option is now ignored because I/O check buffers are now allocated using the
buffer pool. Thus, it is marked as deprecated in the documentation and ignored
during the configuration parsing. The field is also removed from the global
structure.

Because this option is ignored since a recent fix, backported as fare as 2.2,
this patch should be backported too. Especially because it updates the
documentation.
2020-11-27 10:30:23 +01:00
Christopher Faulet
b381a505c1 BUG/MAJOR: tcpcheck: Allocate input and output buffers from the buffer pool
Historically, the input and output buffers of a check are allocated by hand
during the startup, with a specific size (not necessarily the same than
other buffers). But since the recent refactoring of the checks to rely
exclusively on the tcp-checks and to use the underlying mux layer, this part
is totally buggy. Indeed, because these buffers are now passed to a mux,
they maybe be swapped if a zero-copy is possible. In fact, for now it is
only possible in h2_rcv_buf(). Thus the bug concretely only exists if a h2
health-check is performed. But, it is a latent bug for other muxes.

Another problem is the size of these buffers. because it may differ for the
other buffer size, it might be source of bugs.

Finally, for configurations with hundreds of thousands of servers, having 2
buffers per check always allocated may be an issue.

To fix the bug, we now allocate these buffers when required using the buffer
pool. Thus not-running checks don't waste memory and muxes may swap them if
possible. The only drawback is the check buffers have now always the same
size than buffers used by the streams. This deprecates indirectly the
"tune.chksize" global option.

In addition, the http-check regtest have been update to perform some h2
health-checks.

Many thanks to @VigneshSP94 for its help on this bug.

This patch should solve the issue #936. It relies on the commit "MINOR:
tcpcheck: Don't handle anymore in-progress send rules in tcpcheck_main".
Both must be backport as far as 2.2.

bla
2020-11-27 10:29:41 +01:00
Remi Tricot-Le Breton
3d08236cb3 MINOR: cache: Prepare helper functions for Vary support
The Vary functionality is based on a secondary key that needs to be
calculated for every request to which a server answers with a Vary
header. The Vary header, which can only be found in server responses,
determines which headers of the request need to be taken into account in
the secondary key. Since we do not want to have to store all the headers
of the request until we have the response, we will pre-calculate as many
sub-hashes as there are headers that we want to manage in a Vary
context. We will only focus on a subset of headers which are likely to
be mentioned in a Vary response (accept-encoding and referer for now).
Every managed header will have its own normalization function which is
in charge of transforming the header value into a core representation,
more robust to insignificant changes that could exist between multiple
clients. For instance, two accept-encoding values mentioning the same
encodings but in different orders should give the same hash.
This patch adds a function that parses a Vary header value and checks if
all the values belong to our supported subset. It also adds the
normalization functions for our two headers, as well as utility
functions that can prebuild a secondary key for a given request and
transform it into an actual secondary key after the vary signature is
determined from the response.
2020-11-24 16:52:57 +01:00
Christopher Faulet
401e6dbff3 BUG/MAJOR: filters: Always keep all offsets up to date during data filtering
When at least one data filter is registered on a channel, the offsets of all
filters must be kept up to date. For data filters but also for others. It is
safer to do it in that way. Indirectly, this patch fixes 2 hidden bugs
revealed by the commit 22fca1f2c ("BUG/MEDIUM: filters: Forward all filtered
data at the end of http filtering").

The first one, the worst of both, happens at the end of http filtering when
at least one data filtered is registered on the channel. We call the
http_end() callback function on the filters, when defined, to finish the
http filtering. But it is performed for all filters. Before the commit
22fca1f2c, the only risk was to call the http_end() callback function
unexpectedly on a filter. Now, we may have an overflow on the offset
variable, used at the end to forward all filtered data. Of course, from the
moment we forward an arbitrary huge amount of data, all kinds of bad things
may happen. So offset computation is performed for all filters and
http_end() callback function is called only for data filters.

The other one happens when a data filter alter the data of a channel, it
must update the offsets of all previous filters. But the offset of non-data
filters must be up to date, otherwise, here too we may have an integer
overflow.

Another way to fix these bugs is to always ignore non-data filters from the
offsets computation. But this patch is safer and probably easier to
maintain.

This patch must be backported in all versions where the above commit is. So
as far as 2.0.
2020-11-24 14:17:32 +01:00
Ilya Shipitsin
5bfe66366c BUILD: SSL: do not "update" BoringSSL version equivalent anymore
we have added all required fine guarding, no need to reduce
BoringSSL version back to 1.1.0 anymore, we do not depend on it
2020-11-24 09:54:44 +01:00
Ilya Shipitsin
f04a89c549 CLEANUP: remove unused function "ssl_sock_is_ckch_valid"
"ssl_sock_is_ckch_valid" is not used anymore, let us remove it
2020-11-24 09:54:44 +01:00
Julien Pivotto
2de240a676 MINOR: stream: Add level 7 retries on http error 401, 403
Level-7 retries are only possible with a restricted number of HTTP
return codes. While it is usually not safe to retry on 401 and 403, I
came up with an authentication backend which was not synchronizing
authentication of users. While not perfect, being allowed to also retry
on those return codes is really helpful and acts as a hotfix until we
can fix the backend.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-11-23 09:33:14 +01:00
Maciej Zdeb
ebdd4c55da MINOR: http_act: Add -m flag for del-header name matching method
This patch adds -m flag which allows to specify header name
matching method when deleting headers from http request/response.
Currently beg, end, sub, str and reg are supported.

This is related to GitHub issue #909
2020-11-21 15:54:30 +01:00
Willy Tarreau
3aab17bd56 BUG/MAJOR: connection: reset conn->owner when detaching from session list
Baptiste reported a new crash affecting 2.3 which can be triggered
when using H2 on the backend, with http-reuse always and with a tens
of clients doing close only. There are a few combined cases which cause
this to happen, but each time the issue is the same, an already freed
session is dereferenced in session_unown_conn().

Two cases were identified to cause this:
  - a connection referencing a session as its owner, which is detached
    from the session's list and is destroyed after this session ends.
    The test on conn->owner before calling session_unown_conn() is not
    sufficent as the pointer is not null but is not valid anymore.

  - a connection that never goes idle and that gets killed form the
    mux, where session_free() is called first, then conn_free() calls
    session_unown_conn() which scans the just freed session for older
    connections. This one is only triggered with DEBUG_UAF

The reason for this session to be present here is that it's needed during
the connection setup, to be passed to conn_install_mux_be() to mux->init()
as the owning session, but it's never deleted aftrewards. Furthermore, even
conn_session_free() doesn't delete this pointer after freeing the session
that lies there. Both do definitely result in a use-after-free that's more
easily triggered under DEBUG_UAF.

This patch makes sure that the owner is always deleted after detaching
or killing the session. However it is currently not possible to clear
the owner right after a synchronous init because the proxy protocol
apparently needs it (a reg test checks this), and if we leave it past
the connection setup with the session not attached anywhere, it's hard
to catch the right moment to detach it. This means that the session may
remain in conn->owner as long as the connection has never been added to
nor removed from the session's idle list. Given that this patch needs to
remain simple enough to be backported, instead it adds a workaround in
session_unown_conn() to detect that the element is already not attached
anywhere.

This fix absolutely requires previous patch "CLEANUP: connection: do not
use conn->owner when the session is known" otherwise the situation will
be even worse, as some places used to rely on conn->owner instead of the
session.

The fix could theorically be backported as far as 1.8. However, the code
in this area has significantly changed along versions and there are more
risks of breaking working stuff than fixing real issues there. The issue
was really woken up in two steps during 2.3-dev when slightly reworking
the idle conns with commit 08016ab82 ("MEDIUM: connection: Add private
connections synchronously in session server list") and when adding
support for storing used H2 connections in the session and adding the
necessary call to session_unown_conn() in the muxes. But the same test
managed to crash 2.2 when built in DEBUG_UAF and patched like this,
proving that we used to already leave dangling pointers behind us:

|  diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h
|  index f8f235c1a..dd30b5f80 100644
|  --- a/include/haproxy/connection.h
|  +++ b/include/haproxy/connection.h
|  @@ -458,6 +458,10 @@ static inline void conn_free(struct connection *conn)
|                          sess->idle_conns--;
|                  session_unown_conn(sess, conn);
|          }
|  +       else {
|  +               struct session *sess = conn->owner;
|  +               BUG_ON(sess && sess->origin != &conn->obj_type);
|  +       }
|
|          sockaddr_free(&conn->src);
|          sockaddr_free(&conn->dst);

It's uncertain whether an existing code path there can lead to dereferencing
conn->owner when it's bad, though certain suspicious memory corruption bugs
make one think it's a likely candidate. The patch should not be hard to
adapt there.

Backports to 2.1 and older are left to the appreciation of the person
doing the backport.

A reproducer consists in this:

  global
    nbthread 1

  listen l
    bind :9000
    mode http
    http-reuse always
    server s 127.0.0.1:8999 proto h2

  frontend f
    bind :8999 proto h2
    mode http
    http-request return status 200

Then this will make it crash within 2-3 seconds:

  $ h1load -e -r 1 -c 10 http://0:9000/

If it does not, it might be that DEBUG_UAF was not used (it's harder then)
and it might be useful to restart.
2020-11-21 15:29:22 +01:00
Willy Tarreau
38b4d2eb22 CLEANUP: connection: do not use conn->owner when the session is known
At a few places we used to rely on conn->owner to retrieve the session
while the session is already known. This is not correct because at some
of these points the reason the connection's owner was still the session
(instead of NULL) is a mistake. At one place a comparison is even made
between the session and conn->owner assuming it's valid without checking
if it's NULL. Let's clean this up to use the session all the time.

Note that this will be needed for a forthcoming fix and will have to be
backported.
2020-11-21 15:29:22 +01:00
Ilya Shipitsin
f34ed0b74c BUILD: SSL: guard TLS13 ciphersuites with HAVE_SSL_CTX_SET_CIPHERSUITES
HAVE_SSL_CTX_SET_CIPHERSUITES is newly defined macro set in openssl-compat.h,
which helps to identify ssl libs (currently OpenSSL-1.1.1 only) that supports
TLS13 cipersuites manipulation on TLS13 context
2020-11-21 11:04:36 +01:00
Ilya Shipitsin
bdec3ba796 BUILD: ssl: use SSL_MODE_ASYNC macro instead of OPENSSL_VERSION 2020-11-19 19:59:32 +01:00
William Dauchy
f63704488e MEDIUM: cli/ssl: configure ssl on server at runtime
in the context of a progressive backend migration, we want to be able to
activate SSL on outgoing connections to the server at runtime without
reloading.
This patch adds a `set server ssl` command; in order to allow that:

- add `srv_use_ssl` to `show servers state` command for compatibility,
  also update associated parsing
- when using default-server ssl setting, and `no-ssl` on server line,
  init SSL ctx without activating it
- when triggering ssl API, de/activate SSL connections as requested
- clean ongoing connections as it is done for addr/port changes, without
  checking prior server state

example config:

backend be_foo
  default-server ssl
  server srv0 127.0.0.1:6011 weight 1 no-ssl

show servers state:

  5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1

where srv0 can switch to ssl later during the runtime:

  set server be_foo/srv0 ssl on

  5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1

Also update existing tests and create a new one.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2020-11-18 17:22:28 +01:00
Christopher Faulet
83fefbcdff MINOR: init: Fix the prototype for per-thread free callbacks
Functions registered to release memory per-thread have no return value. But the
registering function and the function pointer in per_thread_free_fct structure
specify it should return an integer. This patch fixes it.

This patch may be backported as far as 2.0.
2020-11-13 16:26:10 +01:00
Amaury Denoyelle
7f8f6cb926 BUG/MEDIUM: stats: prevent crash if counters not alloc with dummy one
Define a per-thread counters allocated with the greatest size of any
stat module counters. This variable is named trash_counters.

When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent
segfault.

This is useful for all the proxies used internally and not
belonging to the global proxy list. As these objects does not appears on
the stat report, it does not matter to use the dummy counters.

For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.

Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There is probably other parts that may be impacted (SPOE for example).

This bug was introduced in the current release and do not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
2020-11-12 15:16:05 +01:00
Remi Tricot-Le Breton
cc9bf2e5fe MEDIUM: cache: Change caching conditions
Do not cache responses that do not have an explicit expiration time
(s-maxage or max-age Cache-Control directives or Expires header) or a
validator (ETag or Last-Modified headers) anymore, as suggested in
RFC 7234#3.
The TX_FLAG_IGNORE flag is used instead of the TX_FLAG_CACHEABLE so as
not to change the behavior of the checkcache option.
2020-11-12 11:22:05 +01:00
Christopher Faulet
a66adf41ea MINOR: http-htx: Add understandable errors for the errorfiles parsing
No details are provided when an error occurs during the parsing of an errorfile,
Thus it is a bit hard to diagnose where the problem is. Now, when it happens, an
understandable error message is reported.

This patch is not a bug fix in itself. But it will be required to change an
fatal error into a warning in last stable releases. Thus it must be backported
as far as 2.0.
2020-11-06 09:13:58 +01:00
Willy Tarreau
38d41996c1 MEDIUM: pattern: turn the pattern chaining to single-linked list
It does not require heavy deletion from the expr anymore, so we can now
turn this to a single-linked list since most of the time we want to delete
all instances of a given pattern from the head. By doing so we save 32 bytes
of memory per pattern. The pat_unlink_from_head() function was adjusted
accordingly.
2020-11-05 19:27:09 +01:00
Willy Tarreau
94b9abe200 MINOR: pattern: add pat_ref_purge_older() to purge old entries
This function will be usable to purge at most a specified number of old
entries from a reference. Entries are declared old if their generation
number is in the past compared to the one passed in argument. This will
ease removal of early entries when new ones have been appended.

We also call malloc_trim() when available, at the end of the series,
because this is one place where there is a lot of memory to save. Reloads
of 1M IP addresses used in an ACL made the process grow up to 1.7 GB RSS
after 10 reloads and roughly stabilize there without this call, versus
only 260 MB when the call is present. Sadly there is no direct equivalent
for jemalloc, which stabilizes around 800MB-1GB.
2020-11-05 19:27:09 +01:00
Willy Tarreau
1a6857b9c1 MINOR: pattern: implement pat_ref_load() to load a pattern at a given generation
pat_ref_load() basically combines pat_ref_append() and pat_ref_commit().
It's very similar to pat_ref_add() except that it also allows to set the
generation ID and the line number. pat_ref_add() was modified to directly
rely on it to avoid code duplication. Note that a previous declaration
of pat_ref_load() was removed as it was just a leftover of an earlier
incarnation of something possibly similar, so no existing functionality
was changed here.
2020-11-05 19:27:09 +01:00
Willy Tarreau
0439e5eeb4 MINOR: pattern: add pat_ref_commit() to commit a previously inserted element
This function will be used after a successful pat_ref_append() to propagate
the pattern to all use places (including parsing and indexing). On failure,
it will entirely roll back all insertions and free the pattern itself. It
also preserves the generation number so that it is convenient for use in
association with pat_ref_append(). pat_ref_add() was modified to rely on
it instead of open-coding the insertion and roll-back.
2020-11-05 19:27:09 +01:00
Willy Tarreau
29947745b5 MINOR: pattern: store a generation number in the reference patterns
Right now it's not possible to perform a safe reload because we don't
know what patterns were recently added or were already present. This
patch adds a generation counter to the reference patterns so that it
is possible to know what generation of the reference they were loaded
with. A reference now has two generations, the current one, used for
all additions, and the next one, allocated to those wishing to update
the contents. The generation wraps at 2^32 so comparisons must be made
relative to the current position.

The idea will be that upon full reload, the caller will first get a new
generation ID, will insert all new patterns using it, will then switch
the current ID to the new one, and will delete all entries older than
the current ID. This has the benefit of supporting chunked updates that
remain consistent and that won't block the whole process for ages like
pat_ref_reload() currently does.
2020-11-05 19:27:09 +01:00
Willy Tarreau
1fd52f70e5 MINOR: pattern: introduce pat_ref_delete_by_ptr() to delete a valid reference
Till now the only way to remove a known reference was via
pat_ref_delete_by_id() which scans the whole list to find a matching pointer.
Let's add pat_ref_delete_by_ptr() which takes a valid pointer. It can be
called by the function above after the pointer is found, and can also be
used to roll back a failed insertion much more efficiently.
2020-11-05 19:27:09 +01:00
Willy Tarreau
a98b2882ac CLEANUP: pattern: remove pat_delete_fcts[] and pattern_head->delete()
These ones are not used anymore, so let's remove them to remove a bit
of the complexity. The ACL keyword's delete() function could be removed
as well, though most keyword declarations are positional and we have a
high risk of introducing a mistake here, so let's not touch the ACL part.
2020-11-05 19:27:09 +01:00
Willy Tarreau
f1c0892aa6 MINOR: pattern: remerge the list and tree deletion functions
pat_del_tree_gen() was already chained onto pat_del_list_gen() to deal
with remaining cases, so let's complete the merge and have a generic
pattern deletion function acting on the reference and taking care of
reliably removing all elements.
2020-11-05 19:27:09 +01:00
Willy Tarreau
78777ead32 MEDIUM: pattern: change the pat_del_* functions to delete from the references
This is the next step in speeding up entry removal. Now we don't scan
the whole lists or trees for elements pointing to the target reference,
instead we start from the reference and delete all linked patterns.

This simplifies some delete functions since we don't need anymore to
delete multiple times from an expression since all nodes appear after
the reference element. We can now have one generic list and one generic
tree deletion function.

This required the replacement of pattern_delete() with an open-coded
version since we now need to lock all expressions first before proceeding.
This means there is a high risk of lock inversion here but given that the
expressions are always scanned in the same order from the same head, this
must not happen.

Now deleting first entries is instantaneous, and it's still slow to
delete the last ones when looking up their ID since it still requires
to look them up by a full scan, but it's already way faster than
previously. Typically removing the last 10 IP from a 20M entries ACL
with a full-scan each took less than 2 seconds.

It would be technically possible to make use of indexed entries to
speed up most lookups for removal by value (e.g. IP addresses) but
that's for later.
2020-11-05 19:27:09 +01:00
Willy Tarreau
4bdd0a13d6 MEDIUM: pattern: link all final elements from the reference
There is a data model issue in the current pattern design that makes
pattern deletion extremely expensive: there's no direct way from a
reference to access all indexed occurrences. As such, the only way
to remove all indexed entries corresponding to a reference update
is to scan all expressions's lists and trees to find a link to the
reference. While this was possibly OK when map removal was not
common and most maps were small, this is not conceivable anymore
with GeoIP maps containing 10M+ entries and del-map operations that
are triggered from http-request rulesets.

This patch introduces two list heads from the pattern reference, one
for the objects linked by lists and one for those linked by tree node.
Ideally a single list would be enough but the linked elements are too
much unrelated to be distinguished at the moment, so we'll need two
lists. However for the long term a single-linked list will suffice but
for now it's not possible due to the way elements are removed from
expressions. As such this patch adds 32 bytes of memory usage per
reference plus 16 per indexed entry, but both will be cut in half
later.

The links are not yet used for deletion, this patch only ensures the
list is always consistent.
2020-11-05 19:27:09 +01:00
Willy Tarreau
6d8a68914e MINOR: pattern: make the delete and prune functions more generic
Now we have a single prune() function to act on an expression, and one
delete function for the lists and one for the trees. The presence of a
pointer in the lists is enough to warrant a free, and we rely on the
PAT_SF_REGFREE flag to decide whether to free using free() or regfree().
2020-11-05 19:27:09 +01:00
Willy Tarreau
9b5c8bbc89 MINOR: pattern: new sflag PAT_SF_REGFREE indicates regex_free() is needed
Currently we have no way to know how to delete/prune a pattern in a
generic way. A pattern doesn't contain its own type so we don't know
what function to call. Tree nodes are roughly OK but not lists where
regex are possible. Let's add one new bit for sflags at index time to
indicate that regex_free() will be needed upon deletion. It's not used
for now.
2020-11-05 19:27:08 +01:00
Willy Tarreau
3ee0de1b41 MINOR: pattern: move the update revision to the pat_ref, not the expression
It's not possible to uniquely update a single expression without updating
the pattern reference, I don't know why we've put the revision in the
expression back then, given that it in fact provides an update for a
full pattern. Let's move the revision into the reference's head instead.
2020-11-05 19:27:08 +01:00
Willy Tarreau
1d3c7003d9 MINOR: compat: automatically include malloc.h on glibc
This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.
2020-11-05 19:27:08 +01:00
Baptiste Assmann
e279ca6bbe MINOR: sample: Add converts to parses MQTT messages
This patch implements a couple of converters to validate and extract data from a
MQTT (Message Queuing Telemetry Transport) message. The validation consists of a
few checks as well as "packet size" validation. The extraction can get any field
from the variable header and the payload.

This is limited to CONNECT and CONNACK packet types only. All other messages are
considered as invalid. It is not a problem for now because only the first packet
on each side can be parsed (CONNECT for the client and CONNACK for the server).

MQTT 3.1.1 and 5.0 are supported.

Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>
2020-11-05 19:27:03 +01:00
Baptiste Assmann
e138dda1e0 MINOR: sample: Add converters to parse FIX messages
This patch implements a couple of converters to validate and extract tag value
from a FIX (Financial Information eXchange) message. The validation consists in
a few checks such as mandatory fields and checksum computation. The extraction
can get any tag value based on a tag string or tag id.

This patch requires the istend() function. Thus it depends on "MINOR: ist: Add
istend() function to return a pointer to the end of the string".

Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>
2020-11-05 19:26:30 +01:00
Christopher Faulet
cf26623780 MINOR: ist: Add istend() function to return a pointer to the end of the string
istend() is a shortcut to istptr() + istlen().
2020-11-05 19:25:12 +01:00
Willy Tarreau
1db5579bf8 [RELEASE] Released version 2.4-dev0
Released version 2.4-dev0 with the following main changes :
    - MINOR: version: it's development again.
    - DOC: mention in INSTALL that it's development again
2020-11-05 17:20:35 +01:00
Willy Tarreau
b9b2ac20f8 MINOR: version: it's development again.
This reverts commit 0badabc381.
2020-11-05 17:18:49 +01:00
Willy Tarreau
0badabc381 MINOR: version: mention that it's stable now
This version will be maintained up to around Q1 2022.
2020-11-05 17:00:50 +01:00
Ilya Shipitsin
0aa8c29460 BUILD: ssl: use feature macros for detecting ec curves manipulation support
Let us use SSL_CTX_set1_curves_list, defined by OpenSSL, as well as in
openssl-compat when SSL_CTRL_SET_CURVES_LIST is present (BoringSSL),
for feature detection instead of versions.
2020-11-05 15:08:41 +01:00
Willy Tarreau
5b8af1e30c MINOR: ssl: define SSL_CTX_set1_curves_list to itself on BoringSSL
OpenSSL 1.0.2 and onwards define SSL_CTX_set1_curves_list which is both a
function and a macro. OpenSSL 1.0.2 to 1.1.0 define SSL_CTRL_SET_CURVES_LIST
as a macro, which disappeared from 1.1.1. BoringSSL only has that one and
not the former macro but it does have the function. Let's keep the test on
the macro matching the function name by defining the macro to itself when
needed.
2020-11-05 15:05:09 +01:00
Willy Tarreau
7e98e28eb0 MINOR: fd: add fd_want_recv_safe()
This does the same as fd_want_recv() except that it does check for
fd_updt[] to be allocated, as this may be called during early listener
initialization. Previously we used to check fd_updt[] before calling
fd_want_recv() but this is not correct since it does not update the
FD flags. This method will be safer.
2020-11-04 14:22:42 +01:00
Willy Tarreau
9dd7f4fb4b MINOR: debug: don't count free(NULL) in memstats
The mem stats are pretty convenient to spot leaks, except that they count
free(NULL) as 1, and the code does actually have quite a number of free(foo)
guards where foo is NULL if the object was already freed. Let's just not
count these ones so that the stats remain consistent. Now it's possible
to compare the strdup()/malloc() and free() and verify they are consistent.
2020-11-03 16:46:48 +01:00
Ilya Shipitsin
04a5a440b8 BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of OpenSSL versions
let us use HAVE_OPENSSL_KEYLOG for feature detection instead
of versions
2020-11-03 14:54:15 +01:00
Willy Tarreau
b706a3b4e1 CLEANUP: pattern: remove unused entry "tree" in pattern.val
This one might have disappeared since patterns were reworked, but the
entry was not removed from the structure, let's do it now.
2020-11-02 11:32:05 +01:00
Willy Tarreau
6bedf151e1 MINOR: pattern: export pat_ref_push()
Strangely this one was marked static inline within the file itself.
Let's export it.
2020-10-31 13:13:48 +01:00
Willy Tarreau
f4edb72e0a MINOR: pattern: make pat_ref_append() return the newly added element
It's more convenient to return the element than to return just 0 or 1,
as the next thing we'll want to do is to act on this element! In addition
it was using variable arguments instead of consts, causing some reuse
constraints which were also addressed. This doesn't change its use as
a boolean, hence why call places were not modified.
2020-10-31 13:13:48 +01:00
Remi Tricot-Le Breton
bb4582cf71 MINOR: ist: Add a case insensitive istmatch function
Add a helper function that checks if a string starts with another string
while ignoring case.
2020-10-30 13:20:21 +01:00
Willy Tarreau
bd71510024 MINOR: stats: report server's user-configured weight next to effective weight
The "weight" column on the stats page is somewhat confusing when using
slowstart becaue it reports the effective weight, without being really
explicit about it. In some situations the user-configured weight is more
relevant (especially with long slowstarts where it's important to know
if the configured weight is correct).

This adds a new uweight stat which reports a server's user-configured
weight, and in a backend it receives the sum of all servers' uweights.
In addition it adds the mention of "effective" in a few descriptions
for the "weight" column (help and doc).

As a result, the list of servers in a backend is now always scanned
when dumping the stats. But this is not a problem given that these
servers are already scanned anyway and for way heavier processing.
2020-10-23 22:47:30 +02:00
Willy Tarreau
3e32036701 MINOR: stats: also support a "no-maint" show stat modifier
"no-maint" is a bit similar to "up" except that it will only hide
servers that are in maintenance (or disabled in the configuration), and
not those that are enabled but failed a check. One benefit here is to
significantly reduce the output of the "show stat" command when using
large server-templates containing entries that are not yet provisioned.

Note that the prometheus exporter also has such an option which does
the exact same.
2020-10-23 18:11:24 +02:00
Willy Tarreau
670119955b Revert "OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued"
This reverts commit b7ba1d9011. Actually
this test had already been removed in the past by commit fac0f645d
("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"),
but the condition to reproduce the bug mentioned there was not clear.

Now after analysis and a certain dose of code cleanup, things start to
appear more obvious. what happens is that if we check the presence of
the node in the tree without taking the lock, we can see the NULL at
the instant the node is being unlinked by another thread in
pendconn_process_next_strm() as part of __pendconn_unlink_prx() or
__pendconn_unlink_srv(). Till now there is no issue except that the
pendconn is not removed from the queue during this operation and that
the task is scheduled to be woken up by pendconn_process_next_strm()
with the stream being added to the list of the server's active
connections by __stream_add_srv_conn(). The first thread finishes
faster and gets back to stream_free() faster than the second one
sets the srv_conn on the stream, so stream_free() skips the s->srv_conn
test and doesn't try to dequeue the freshly queued entry. At the
very least a barrier would be needed there but we can't afford to
free the stream while it's being queued. So there's no other solution
than making sure that either __pendconn_unlink_prx() or
pendconn_cond_unlink() get the entry but never both, which is why the
lock is required around the test. A possible solution would be to set
p->target before unlinking the entry and using it to complete the test.
This would leave no dead period where the pendconn is not seen as
attached.

It is possible, yet extremely difficult, to reproduce this bug, which
was first noticed in bug #880. Running 100 servers with maxconn 1 and
maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads
with DEBUG_UAF, with a traffic making the backend's queue oscillate
around zero (typically using 250 connections with a local httpterm
server) may rarely manage to trigger a use-after-free.

No backport is needed.
2020-10-23 09:21:55 +02:00
Willy Tarreau
b7ba1d9011 OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued
On connection error processing, we can see massive storms of calls to
pendconn_cond_unlink() to release a possible place in the queue. For
example, in issue #908, on average half of the threads are caught in
this function via back_try_conn_req() consecutive to a synchronous
error. However we wait until grabbing the lock to know if the pendconn
is effectively in a queue, which is expensive for many cases. We know
the transition may only happen from in-queue to out-of-queue so it's safe
to first run a preliminary check to see if it's worth going further. This
will allow to avoid the cost of locking for most requests. This should
not change anything for those completing correctly as they're already
run through pendconn_free() which doesn't call pendconn_cond_unlink()
unless deemed necessary.
2020-10-22 17:32:28 +02:00
Willy Tarreau
ac66d6bafb MINOR: proxy; replace the spinlock with an rwlock
This is an anticipation of finer grained locking for the queues. For now
all lock places take a write lock so that there is no difference at all
with previous code.
2020-10-22 17:32:28 +02:00
Willy Tarreau
de785f04e1 MINOR: threads/debug: only report lock stats for used operations
In addition to the previous simplification, most locks don't use the
seek or read lock (e.g. spinlocks etc) so let's split the dump into
distinct operations (write/seek/read) and only report those which
were used. Now the output size is roughly divided by 5 compared
to previous ones.
2020-10-22 17:32:28 +02:00
Willy Tarreau
23d3b00bdd MINOR: threads/debug: only report used lock stats
The lock stats are very verbose and more than half of them are used in
a typical test, making it hard to spot the sought values. Let's simply
report "not used" for those which have not been called at all.
2020-10-22 17:32:28 +02:00
Christopher Faulet
d6c48366b8 BUG/MINOR: http-ana: Don't send payload for internal responses to HEAD requests
When an internal response is returned to a client, the message payload must be
skipped if it is a reply to a HEAD request. The payload is removed from the HTX
message just before the message forwarding.

This bugs has been around for a long time. It was already there in the pre-HTX
versions. In legacy HTTP mode, internal errors are not parsed. So this bug
cannot be easily fixed. Thus, this patch should only be backported in all HTX
versions, as far as 2.0. However, the code has significantly changed in the
2.2. Thus in the 2.1 and 2.0, the patch must be entirely reworked.
2020-10-22 17:13:22 +02:00
Remi Tricot-Le Breton
6cb10384a3 MEDIUM: cache: Add support for 'If-None-Match' request header
Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7434#4.3.2).

This resolves a part of GitHub issue #821.
2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton
bcced09b91 MINOR: http: Add etag comparison function
Add a function that compares two etags that might be of different types.
If any of them is weak, the 'W/' prefix is discarded and a strict string
comparison is performed.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>
2020-10-22 16:06:20 +02:00
Tim Duesterhus
2493ee81d4 MINOR: http: Add enum etag_type http_get_etag_type(const struct ist)
http_get_etag_type returns whether a given `etag` is a strong, weak, or invalid
ETag.
2020-10-22 16:02:29 +02:00
William Lallemand
8e8581e242 MINOR: ssl: 'ssl-load-extra-del-ext' removes the certificate extension
In issue #785, users are reporting that it's not convenient to load a
".crt.key" when the configuration contains a ".crt".

This option allows to remove the extension of the certificate before
trying to load any extra SSL file (.key, .ocsp, .sctl, .issuer etc.)

The patch changes a little bit the way ssl_sock_load_files_into_ckch()
looks for the file.
2020-10-20 18:25:46 +02:00
Christopher Faulet
96ddc8ab43 BUG/MEDIUM: connection: Never cleanup server lists when freeing private conns
When a connection is released, depending on its state, it may be detached from
the session and it may be removed from the server lists. The first case may
happen for private or unsharable active connections. The second one should only
be performed for idle or available connections. We never try to remove a
connection from the server list if it is attached to a session. But it is also
important to never try to remove a private connecion from the server lists, even
if it is not attached to a session. Otherwise, the curr_used_conn server counter
is decremented once too often.

This bug was introduced by the commit 04a24c5ea ("MINOR: connection: don't check
priv flag on free"). It is related to the issue #881. It only affects the 2.3,
no backport is needed.
2020-10-19 17:19:10 +02:00
Willy Tarreau
69a7b8fc6c CLEANUP: task: remove the unused and mishandled global_rqueue_size
This counter is only updated and never used, and in addition it's done
without any atomicity so it's very unlikely to be correct on multi-CPU
systems! Let's just remove it since it's not used.
2020-10-19 14:08:13 +02:00
Willy Tarreau
e72a3f4489 CLEANUP: tree-wide: reorder a few structures to plug some holes around locks
A few structures were slightly rearranged in order to plug some holes
left around the locks. Sizes ranging from 8 to 32 bytes could be saved
depending on the structures. No performance difference was noticed (none
was expected there), though memory usage might be slightly reduced in
some rare cases.
2020-10-19 14:08:13 +02:00
Willy Tarreau
8f1f177ed0 MINOR: threads: change lock_t to an unsigned int
We don't need to waste the size of a long for the locks: with the plocks,
even an unsigned short would offer enough room for up to 126 threads! Let's
use an unsigned int which will be easier to place in certain structures
and will more conveniently plug some holes, and Atomic ops are at least
as fast on 32-bit as on 64-bit. This will not change anything for 32-bit
platforms.
2020-10-19 14:08:13 +02:00
Willy Tarreau
3d18498645 CLEANUP: threads: don't register an initcall when not debugging
It's a bit overkill to register an initcall to call a function to set
a lock to zero when not debugging, let's just declare the lock as
pre-initialized to zero.
2020-10-19 14:08:13 +02:00
Ilya Shipitsin
fcb69d768b BUILD: ssl: make BoringSSL use its own version numbers
BoringSSL is a fork of OpenSSL 1.1.0, however in
49e9f67d8b7cbeb3953b5548ad1009d15947a523 it has changed version to 1.1.1.

Should fix issue #895.

This must be backported to 2.2, 2.1, 2.0, 1.8
2020-10-19 11:34:37 +02:00
Willy Tarreau
cd10def825 MINOR: backend: replace the lbprm lock with an rwlock
It was previously a spinlock, and it happens that a number of LB algos
only lock it for lookups, without performing any modification. Let's
first turn it to an rwlock and w-lock it everywhere. This is strictly
identical.

It was carefully checked that every HA_SPIN_LOCK() was turned to
HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to
HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.
2020-10-17 18:51:41 +02:00
Willy Tarreau
61f799b8da MINOR: threads: add the transitions to/from the seek state
Since our locks are based on progressive locks, we support the upgradable
seek lock that is compatible with readers and upgradable to a write lock.
The main purpose is to take it while seeking down a tree for modification
while other threads may seek the same tree for an input (e.g. compute the
next event date).

The newly supported operations are:

  HA_RWLOCK_SKLOCK(lbl,l)         pl_take_s(l)      /* N --> S */
  HA_RWLOCK_SKTOWR(lbl,l)         pl_stow(l)        /* S --> W */
  HA_RWLOCK_WRTOSK(lbl,l)         pl_wtos(l)        /* W --> S */
  HA_RWLOCK_SKTORD(lbl,l)         pl_stor(l)        /* S --> R */
  HA_RWLOCK_WRTORD(lbl,l)         pl_wtor(l)        /* W --> R */
  HA_RWLOCK_SKUNLOCK(lbl,l)       pl_drop_s(l)      /* S --> N */
  HA_RWLOCK_TRYSKLOCK(lbl,l)      (!pl_try_s(l))    /* N -?> S */
  HA_RWLOCK_TRYRDTOSK(lbl,l)      (!pl_try_rtos(l)) /* R -?> S */

Existing code paths are left unaffected so this patch doesn't affect
any running code.
2020-10-16 16:53:46 +02:00
Willy Tarreau
8d5360ca7f MINOR: threads: augment rwlock debugging stats to report seek lock stats
We currently use only read and write lock operations with rwlocks, but
ours also support upgradable seek locks for which we do not report any
stats. Let's add them now when DEBUG_THREAD is enabled.
2020-10-16 16:51:49 +02:00
Willy Tarreau
233ad288cd CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind()
We don't need to specify the handler anymore since it's set in the
receiver. Let's remove this argument from the function and clean up
the remains of code that were still setting it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
a74cb38e7c MINOR: protocol: register the receiver's I/O handler and not the protocol's
Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.

The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.
2020-10-15 21:47:56 +02:00
Willy Tarreau
d2fb99f9d5 MINOR: protocol: add a default I/O callback and put it into the receiver
For now we're still using the protocol's default accept() function as
the I/O callback registered by the receiver into the poller. While
this is usable for most TCP connections where a listener is needed,
this is not suitable for UDP where a different handler is needed.

Let's make this configurable in the receiver just like the upper layer
is configurable for listeners. In order to ease stream protocols
handling, the protocols will now provide a default I/O callback
which will be preset into the receivers upon allocation so that
almost none of them has to deal with it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
f1dc9f2f17 MINOR: sock: implement sock_accept_conn() to accept a connection
The socket-specific accept() code in listener_accept() has nothing to
do there. Let's move it to sock.c where it can be significantly cleaned
up. It will now directly return an accepted connection and provide a
status code instead of letting listener_accept() deal with various errno
values. Note that this doesn't support the sockpair specific code.

The function is now responsible for dealing with its own receiver's
polling state and calling fd_cant_recv() when facing EAGAIN.

One tiny change from the previous implementation is that the connection's
sockaddr is now allocated before trying accept(), which saves a memcpy()
of the resulting address for each accept at the expense of a cheap
pool_alloc/pool_free on the final accept returning EAGAIN. This still
apparently slightly improves accept performance in microbencharks.
2020-10-15 21:47:56 +02:00
Willy Tarreau
1e509a7231 MINOR: protocol: add a new function accept_conn()
This per-protocol function will be used to accept an incoming
connection and return it as a struct connection*. As such the protocol
stack's internal representation of a connection will not need to be
handled by the listener code.
2020-10-15 21:47:56 +02:00
Willy Tarreau
7d053e4211 MINOR: sock: rename sock_accept_conn() to sock_accepting_conn()
This call was introduced by commit 5ced3e887 ("MINOR: sock: add
sock_accept_conn() to test a listening socket") but is actually quite
confusing because it makes one think the socket will accept a connection
(which is what we want to have in a new function) while it only tells
whether it's configured to accept connections. Let's call it
sock_accepting_conn() instead.

The same change was applied to sockpair which had the same issue.
2020-10-15 21:47:56 +02:00
Willy Tarreau
65ed143841 MINOR: connection: add new error codes for accept_conn()
accept_conn() will be used to accept an incoming connection and return it.
It will have to deal with various error codes. The currently identified
ones were created as CO_AC_*.
2020-10-15 21:47:56 +02:00
Willy Tarreau
83efc320aa MEDIUM: listener: allocate the connection before queuing a new connection
Till now we would keep a per-thread queue of pending incoming connections
for which we would store:
  - the listener
  - the accepted FD
  - the source address
  - the source address' length

And these elements were first used in session_accept_fd() running on the
target thread to allocate a connection and duplicate them again. Doing
this induces various problems. The first one is that session_accept_fd()
may only run on file descriptors and cannot be reused for QUIC. The second
issue is that it induces lots of memory copies and that the listerner
queue thrashes a lot of cache, consuming 64 bytes per entry.

This patch changes this by allocating the connection before queueing it,
and by only placing the connection's pointer into the queue. Indeed, the
first two calls used to initialize the connection already store all the
information above, which can be retrieved from the connection pointer
alone. So we just have to pop one pointer from the target thread, and
pass it to session_accept_fd() which only needs the FD for the final
settings.

This starts to make the accept path a bit more transport-agnostic, and
saves memory and CPU cycles at the same time (1% connection rate increase
was noticed with 4 threads). Thanks to dividing the accept-queue entry
size from 64 to 8 bytes, its size could be increased from 256 to 1024
connections while still dividing the overall size by two. No single
queue full condition was met.

One minor drawback is that connection may be allocated from one thread's
pool to be used into another one. But this already happens a lot with
connection reuse so there is really nothing new here.
2020-10-15 21:47:56 +02:00
Willy Tarreau
9b7587a6af MINOR: connection: make sockaddr_alloc() take the address to be copied
Roughly half of the calls to sockadr_alloc() are made to copy an already
known address. Let's optionally pass it in argument so that the function
can handle the copy at the same time, this slightly simplifies its usage.
2020-10-15 21:47:56 +02:00
Willy Tarreau
0138f51f93 CLEANUP: fd: finally get rid of fd_done_recv()
fd_done_recv() used to be useful with the FD cache because it used to
allow to keep a file descriptor active in the poller without being
marked as ready in the cache, saving it from ringing immediately,
without incurring any system call. It was a way to make it yield
to wait for new events leaving a bit of time for others. The only
user left was the connection accepter (listen_accept()). We used
to suspect that with the FD cache removal it had become totally
useless since changing its readiness or not wouldn't change its
status regarding the poller itself, which would be the only one
deciding to report it again.

Careful tests showed that it indeed has exactly zero effect nowadays,
the syscall numbers are exactly the same with and without, including
when enabling edge-triggered polling.

Given that there's no more API available to manipulate it and that it
was directly called as an optimization from listener_accept(), it's
about time to remove it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
e53e7ec9d9 CLEANUP: protocol: remove the ->drain() function
No protocol defines it anymore. The last user used to be the monitor-net
stuff that got partially broken already when the tcp_drain() function
moved to conn_sock_drain() with commit e215bba95 ("MINOR: connection:
make conn_sock_drain() work for all socket families") in 1.9-dev2.

A part of this will surely move back later when non-socket connections
arrive with QUIC but better keep the API clean and implement what's
needed in time instead.
2020-10-15 21:47:04 +02:00
Willy Tarreau
9e9919dd8b MEDIUM: proxy: remove obsolete "monitor-net"
As discussed here during 2.1-dev, "monitor-net" is totally obsolete:

   https://www.mail-archive.com/haproxy@formilux.org/msg35204.html

It's fundamentally incompatible with usage of SSL, and imposes the
presence of file descriptors with hard-coded syscalls directly in the
generic accept path.

It's very unlikely that anyone has used it in the last 10 years for
anything beyond testing. In the worst case if anyone would depend
on it, replacing it with "http-request return status 200 if ..." and
"mode http" would certainly do the trick.

The keyword is still detected as special by the config parser to help
users update their configurations appropriately.
2020-10-15 21:47:04 +02:00
Willy Tarreau
77e0daef9f MEDIUM: proxy: remove obsolete "mode health"
As discussed here during 2.1-dev, "mode health" is totally obsolete:

   https://www.mail-archive.com/haproxy@formilux.org/msg35204.html

It's fundamentally incompatible with usage of SSL, doesn't support
source filtering, and imposes the presence of file descriptors with
hard-coded syscalls directly in the generic accept path.

It's very unlikely that anyone has used it in the last 10 years for
anything beyond testing. In the worst case if anyone would depend
on it, replacing it with "http-request return status 200" and "mode
http" would certainly do the trick.

The keyword is still detected as special by the config parser to help
users update their configurations appropriately.
2020-10-15 21:47:04 +02:00
Amaury Denoyelle
04a24c5eaa MINOR: connection: don't check priv flag on free
Do not check CO_FL_PRIVATE flag to check if the connection is in session
list on conn_free. This is necessary due to the future patches which add
server connections in the session list even if not private, if the mux
protocol is the subject of HOL blocking.
2020-10-15 15:19:34 +02:00