10599 Commits

Author SHA1 Message Date
Willy Tarreau
d2d3348acb MINOR: activity: enable automatic profiling turn on/off
Instead of having to manually turn task profiling on/off in the
configuration, by default it will work in "auto" mode, which
automatically turns on on any thread experiencing sustained loop
latencies over one millisecond averaged over the last 1024 samples.

This may happen with configs using lots of regex (thing map_reg for
example, which is the lazy way to convert Apache's rewrite rules but
must not be abused), and such high latencies affect all the process
and the problem is most often intermittent (e.g. hitting a map which
is only used for certain host names).

Thus now by default, with profiling set to "auto", it remains off all
the time until something bad happens. This also helps better focus on
the issues when looking at the logs as well as in "show sess" output.
It automatically turns off when the average loop latency over the last
1024 calls goes below 990 microseconds (which typically takes a while
when in idle).

This patch could be backported to stable versions after a bit more
exposure, as it definitely improves observability and the ability to
quickly spot the culprit. In this case, previous patch ("MINOR:
activity: make the profiling status per thread and not global") must
also be taken.
2019-04-25 17:26:46 +02:00
Willy Tarreau
d9add3acc8 MINOR: activity: make the profiling status per thread and not global
In order to later support automatic profiling turn on/off, we need to
have it per-thread. We're keeping the global option to know whether to
turn it or on off, but the profiling status is now set per thread. We're
updating the status in activity_count_runtime() which is called before
entering poll(). The reason is that we'll extend this with run time
measurement when deciding to automatically turn it on or off.
2019-04-25 17:26:19 +02:00
Willy Tarreau
d636675137 BUG/MINOR: activity: always initialize the profiling variable
It happens it was only set if present in the configuration. It's
harmless anyway but can still cause doubts when comparing logs and
configurations so better correctly initialize it.

This should be backported to 1.9.
2019-04-25 17:26:19 +02:00
Willy Tarreau
22d63a24d9 MINOR: applet: measure and report an appctx's call rate in "show sess"
Very similarly to previous commit doing the same for streams, we now
measure and report an appctx's call rate. This will help catch applets
which do not consume all their data and/or which do not properly report
that they're waiting for something else. Some of them like peers might
theorically be able to exhibit some occasional peeks when teaching a
full table to a nearby peer (e.g. the new replacement process), but
nothing close to what a bogus service can do so there is no risk of
confusion.
2019-04-24 16:04:23 +02:00
Willy Tarreau
2e9c1d2960 MINOR: stream: measure and report a stream's call rate in "show sess"
Quite a few times some bugs have made a stream task incorrectly
handle a complex combination of events, which was often reported as
"100% CPU", and was usually caused by the event not being properly
identified and flushed, and the stream's handler called in loops.

This patch adds a call rate counter to the stream struct. It's not
huge, it's really inexpensive (especially compared to the rest of the
processing function) and will easily help spot such tasks in "show sess"
output, possibly even allowing to kill them.

A future patch should probably consist in alerting when they're above a
certain threshold, possibly sending a dump and killing them. Some options
could also consist in aborting in order to get an analyzable core dump
and let a service manager restart a fresh new process.
2019-04-24 16:04:23 +02:00
Willy Tarreau
0212fadd65 MINOR: tasks/activity: report the context switch and task wakeup rates
It's particularly useful to spot runaway tasks to see this. The context
switch rate covers all tasklet calls (tasks and I/O handlers) while the
task wakeups only covers tasks picked from the run queue to be executed.
High values there will indicate either an intense traffic or a bug that
mades a task go wild.
2019-04-24 16:04:23 +02:00
Willy Tarreau
69b5a7f1a3 CLEANUP: task: report calls as unsigned in show sess
The "show sess" output used signed ints to report the number of calls,
which is confusing for runaway tasks where the call count can turn
negative.
2019-04-24 16:04:23 +02:00
Christopher Faulet
4904058661 BUG/MINOR: htx: Exclude TCP proxies when the HTX mode is handled during startup
When tests are performed on the HTX mode during HAProxy startup, only HTTP
proxies are considered. It is important because, since the commit 1d2b586cd
("MAJOR: htx: Enable the HTX mode by default for all proxies"), the HTX is
enabled on all proxies by default. But for TCP proxies, it is "deactivated".

This patch must be backported to 1.9.
2019-04-24 15:40:02 +02:00
Willy Tarreau
274ba67862 BUG/MAJOR: lb/threads: fix AB/BA locking issue in round-robin LB
An occasional divide by zero in the round-robin scheduler was addressed
in commit 9df86f997 ("BUG/MAJOR: lb/threads: fix insufficient locking on
round-robin LB") by grabing the server's lock in fwrr_get_server_from_group().

But it happens that this is not the correct approach as it introduces a
case of AB/BA deadlock reported by Maksim Kupriianov. This happens when
a server weight changes from/to zero while another thread extracts this
server from the tree. The reason is that the functions used to manipulate
the state work under the server's lock and grab the LB lock while the ones
used in LB work under the LB lock and grab the server's lock when needed.

This commit mostly reverts the changes above and instead further completes
the locking analysis performed on this code to identify areas that really
need to be protected by the server's lock, since this is the only algorithm
which happens to have this requirement. This audit showed that in fact all
locations which require the server's lock are already protected by the LB
lock. This was not noticed the first time due to the server's lock being
taken instead and due to some functions misleadingly using atomic ops to
modify server fields which are under the LB lock protection (these ones
were now removed).

The change consists in not taking the server's lock anymore here, and
instead making sure that the aforementioned function which used to
suffer from the server's weight becoming zero only uses a copy of the
weight which was preliminary verified to be non-null (when the weight
is null, the server will be removed from the tree anyway so there is
no need to recalculate its position).

With this change, the code survived an injection at 200k req/s split
on two servers with weights changing 50 times a second.

This commit must be backported to 1.9 only.
2019-04-24 14:23:40 +02:00
Olivier Houchard
a28454ee21 BUG/MEDIUM: ssl: Return -1 on recv/send if we got EAGAIN.
In ha_ssl_read()/ha_ssl_write(), if we couldn't send/receive data because
we got EAGAIN, return -1 and not 0, as older SSL versions expect that.
This should fix the problems with OpenSSL < 1.1.0.
2019-04-24 12:06:08 +02:00
Christopher Faulet
371723b0c2 BUG/MINOR: spoe: Don't systematically wakeup SPOE stream in the applet handler
This can lead to wakeups in loop between the SPOE stream and the SPOE applets
waiting to receive agent messages (mainly AGENT-HELLO and AGENT-DISCONNECT).

This patch must be backported to 1.9 and 1.8.
2019-04-23 21:20:47 +02:00
Christopher Faulet
5e1a9d715e BUG/MEDIUM: stream: Fix the way early aborts on the client side are handled
A regression was introduced with the commit c9aecc8ff ("BUG/MEDIUM: stream:
Don't request a server connection if a shutw was scheduled"). Among other this,
it breaks the CLI when the shutr on the client side is handled with the client
data. To depend on the flag CF_SHUTW_NOW to not establish the server connection
when an error on the client side is detected is the right way to fix the bug,
because this flag may be set without any error on the client side.

So instead, we abort the request where the error is handled and only when the
backend stream-interface is in the state SI_ST_INI. This way, there is no
ambiguity on the reason why the abort accurred. The stream-interface is also
switched to the state SI_ST_CLO.

This patch must be backported to 1.9. If the commit c9aecc8ff is backported to
previous versions, this one MUST also be backported. Otherwise, it MAY be
backported to older versions that 1.9 with caution.
2019-04-23 21:20:47 +02:00
Frédéric Lécaille
bed883abe8 BUG/MAJOR: stream: Missing DNS context initializations.
Fix some missing initializations wich came with 333939c commit (MINOR: action:
new '(http-request|tcp-request content) do-resolve' action). The DNS contexts of
streams which were allocated were not initialized by stream_new(). This leaded to
accesses to non-allocated memory when freeing these contexts with stream_free().
2019-04-23 20:24:11 +02:00
Frédéric Lécaille
0bad840b4d MINOR: log: Extract some code to send syslog messages.
This patch extracts the code of __send_log() responsible of sending a syslog
message to a syslog destination represented as a logsrv struct to define
__do_send_log() function. __send_log() calls __do_send_log() for each syslog
destination of a proxy after having prepared some of its parameters.
2019-04-23 14:16:51 +02:00
Baptiste Assmann
333939c2ee MINOR: action: new '(http-request|tcp-request content) do-resolve' action
The 'do-resolve' action is an http-request or tcp-request content action
which allows to run DNS resolution at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
The time the resolution is being performed, the request is on pause.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
statement (return 503 to prevent loops).

Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.

This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
2019-04-23 11:41:52 +02:00
Baptiste Assmann
db4c8521ca MINOR: dns: move callback affection in dns_link_resolution()
In dns.c, dns_link_resolution(), each type of dns requester is managed
separately, that said, the callback function is affected globaly (and
points to server type callbacks only).
This design prevents the addition of new dns requester type and this
patch aims at fixing this limitation: now, the callback setting is done
directly into the portion of code dedicated to each requester type.
2019-04-23 11:34:11 +02:00
Baptiste Assmann
dfd35fd71a MINOR: dns: dns_requester structures are now in a memory pool
dns_requester structure can be allocated at run time when servers get
associated to DNS resolution (this happens when SRV records are used in
conjunction with service discovery).
Well, this memory allocation is safer if managed in an HAProxy pool,
furthermore with upcoming HTTP action which can perform DNS resolution
at runtime.

This patch moves the memory management of the dns_requester structure
into its own pool.
2019-04-23 11:33:48 +02:00
paulborile
7714b12604 MINOR: wurfl: enabled multithreading mode
Initially excluded multithreaded mode is completely supported (libwurfl is fully MT safe).
Internal tests now are run also with multithreading enabled.
2019-04-23 11:00:23 +02:00
paulborile
bad132c384 CLEANUP: wurfl: removed deprecated methods
last 2 major releases of libwurfl included a complete review of engine options with
the result of deprecating many features. The patch removes unecessary code and fixes
the documentation.
Can be backported on any version of haproxy.

[wt: must not be backported since it removes config keywords and would
 thus break existing configurations]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-04-23 11:00:23 +02:00
paulborile
59d50145dc BUILD: wurfl: build fix for 1.9/2.0 code base
This applies the required changes for the new buffer API that came in 1.9.
This patch must be backported to 1.9.
2019-04-23 11:00:23 +02:00
Willy Tarreau
b518823f1b MINOR: wurfl: indicate in haproxy -vv the wurfl version in use
It also explicitly mentions that the library is the dummy one when it
is detected.

We have this output now :

$ ./haproxy  -vv |grep -i wurfl
Built with WURFL support (dummy library version 1.11.2.100)
2019-04-23 11:00:23 +02:00
Willy Tarreau
b3cc9f2887 Revert "CLEANUP: wurfl: remove dead, broken and unmaintained code"
This reverts commit 8e5e1e7bf000f2603a085c76be118214d22c55b4.

The following patches will fix this code and may be backported.
2019-04-23 10:34:43 +02:00
Emeric Brun
d0e095c2aa MINOR: ssl/cli: async fd io-handlers printable on show fd
This patch exports the async fd iohandlers and make them printable
doing a 'show fd' on cli.
2019-04-19 17:27:01 +02:00
Christopher Faulet
46451d6e04 MINOR: gcc: Fix a silly gcc warning in connect_server()
Don't know why it happens now, but gcc seems to think srv_conn may be NULL when
a reused connection is removed from the orphan list. It happens when HAProxy is
compiled with -O2 with my gcc (8.3.1) on fedora 29... Changing a little how
reuse parameter is tested removes the warnings. So...

This patch may be backported to 1.9.
2019-04-19 15:53:23 +02:00
Christopher Faulet
f48552f2c1 BUG/MINOR: da: Get the request channel to call CHECK_HTTP_MESSAGE_FIRST()
Since the commit 89dc49935 ("BUG/MAJOR: http_fetch: Get the channel depending on
the keyword used"), the right channel must be passed as argument when the macro
CHECK_HTTP_MESSAGE_FIRST is called.

This patch must be backported to 1.9.
2019-04-19 15:53:23 +02:00
Christopher Faulet
2db9dac4c8 BUG/MINOR: 51d: Get the request channel to call CHECK_HTTP_MESSAGE_FIRST()
Since the commit 89dc49935 ("BUG/MAJOR: http_fetch: Get the channel depending on
the keyword used"), the right channel must be passed as argument when the macro
CHECK_HTTP_MESSAGE_FIRST is called.

This patch must be backported to 1.9.
2019-04-19 15:53:23 +02:00
Christopher Faulet
c54e4b053d BUG/MEDIUM: stream: Don't request a server connection if a shutw was scheduled
If a shutdown for writes was performed on the client side (CF_SHUTW is set on
the request channel) while the server connection is still unestablished (the
stream-int is in the state SI_ST_INI), then it is aborted. It must also be
aborted when the shudown for write is pending (only CF_SHUTW_NOW is
set). Otherwise, some errors on the request channel can be ignored, leaving the
stream in an undefined state.

This patch must be backported to 1.9. It may probably be backported to all
suported versions, but it is unclear if the bug is visbile for older versions
than 1.9. So it is probably safer to wait bug reports on these versions to
backport this patch.
2019-04-19 15:53:23 +02:00
Christopher Faulet
e84289e585 BUG/MEDIUM: thread/http: Add missing locks in set-map and add-acl HTTP rules
Locks are missing in the rules "http-request set-map" and "http-response
add-acl" when an acl or map update is performed. Pattern elements must be
locked.

This patch must be backported to 1.9 and 1.8. For the 1.8, the HTX part must be
ignored.
2019-04-19 15:53:23 +02:00
Baptiste Assmann
e1afd4fec6 MINOR: proto_tcp: tcp-request content: enable set-dst and set-dst-var
The set-dst and set dst-var are available at both 'tcp-request
connection' and 'http-request' but not at the layer in the middle.
This patch fixes this miss and enables both set-dst and set-dst-var at
'tcp-request content' layer.
2019-04-19 15:50:06 +02:00
Willy Tarreau
78c5eec949 BUG/MINOR: acl: properly detect pattern type SMP_T_ADDR
Since 1.6-dev4 with commit b2f8f087f ("MINOR: map: The map can return
IPv4 and IPv6"), maps can return both IPv4 and IPv6 addresses, which
is represented as SMP_T_ADDR at the output of the map converter. But
the ACL parser only checks for either SMP_T_IPV4 or SMP_T_IPV6 and
requires to see an explicit matching method specified. Given that it
uses the same pattern parser for both address families, it implicitly
is also compatible with SMP_T_ADDR, which ought to have been added
there.

This fix should be backported as far as 1.6.
2019-04-19 11:45:20 +02:00
Willy Tarreau
aa5801bcaa BUG/MEDIUM: maps: only try to parse the default value when it's present
Maps returning an IP address (e.g. map_str_ip) support an optional
default value which must be parsed. Unfortunately the parsing code does
not check for this argument's existence and uncondtionally tries to
resolve the argument whenever the output is of type address, resulting
in segfaults at parsing time when no such argument is provided. This
patch adds the appropriate check.

This fix may be backported as far as 1.6.
2019-04-19 11:35:22 +02:00
Olivier Houchard
88698d966d MEDIUM: connections: Add a way to control the number of idling connections.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case, where all client don't do keepalive, but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourself having lots of idling, unusable for new sessions, connections,
while we won't have any file descriptors available to create new connections.

To fix this, add 2 new global settings, "pool_low_ratio" and "pool_high_ratio".
pool-low-fd-ratio  is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
2019-04-18 19:52:03 +02:00
Olivier Houchard
7c49d2e213 MINOR: fd: Add a counter of used fds.
Add a new counter, ha_used_fds, that let us know how many file descriptors
we're currently using.
2019-04-18 19:19:59 +02:00
Emeric Brun
0bbec0fa34 MINOR: peers: adds counters on show peers about tasks calls.
This patch adds a counter of calls on the orchestator peers task
and a counter on the tasks linked to applet i/o handler for
each peer.

Those two counters are useful to detect if a peer sync is active
or frozen.

This patch is related to the commit:
  "MINOR: peers: Add a new command to the CLI for peers."
and should be backported with it.
2019-04-18 18:24:25 +02:00
Olivier Houchard
66a7b3302a BUILD/medium: ssl: Fix build with OpenSSL < 1.1.0
Make sure it builds with OpenSSL < 1.1.0, a lot of the BIO_get/set methods
were introduced with OpenSSL 1.1.0, so fallback with the old way of doing
things if needed.
2019-04-18 15:58:58 +02:00
Olivier Houchard
a8955d57ed MEDIUM: ssl: provide our own BIO.
Instead of letting the OpenSSL code handle the file descriptor directly,
provide a custom BIO, that will use the underlying XPRT to send/recv data.
This will let us implement QUIC later, and probably clean the upper layer,
if/when the SSL code provide its own subscribe code, so that the upper layers
won't have to care if we're still waiting for the handshake to complete or not.
2019-04-18 14:56:24 +02:00
Olivier Houchard
e179d0e88f MEDIUM: connections: Provide a xprt_ctx for each xprt method.
For most of the xprt methods, provide a xprt_ctx.  This will be useful later
when we'll want to be able to stack xprts.
The init() method now has to create and provide the said xprt_ctx if needed.
2019-04-18 14:56:24 +02:00
Olivier Houchard
df35784600 MEDIUM: ssl: provide its own subscribe/unsubscribe function.
In order to prepare for the possibility of using different kinds of xprt
with ssl, make the ssl code provide its own subscribe and unsubscribe
functions, right now it just calls conn_subscribe and conn_unsubsribe.
2019-04-18 14:56:24 +02:00
Olivier Houchard
7b5fd1ec26 MEDIUM: connections: Move some fields from struct connection to ssl_sock_ctx.
Move xprt_st, tmp_early_data and sent_early_data from struct connection to
struct ssl_sock_ctx, as they are only used in the SSL code.
2019-04-18 14:56:24 +02:00
Olivier Houchard
66ab498f26 MEDIUM: ssl: Give ssl_sock its own context.
Instead of using directly a SSL * as xprt_ctx, give ssl_sock its own context.
It's useless for now, but will be useful later when we'll want to be able to
stack xprts.
2019-04-18 14:56:24 +02:00
Olivier Houchard
ed1a6a0d8a MEDIUM: tasks: Use __ha_barrier_store after modifying global_tasks_mask.
Now that we no longer use atomic operations to update global_tasks_mask,
as it's always modified while holding the TASK_RQ_LOCK, we have to use
__ha_barrier_store() instead of __ha_barrier_atomic_store() to ensure
any modification of global_tasks_mask is seen before modifying
active_tasks_mask.

This should be backported to 1.9.
2019-04-18 14:14:10 +02:00
Willy Tarreau
d83b6c1ab3 BUG/MINOR: mworker: disable busy polling in the master process
When enabling busy polling, we don't want the master to use it, or it
wastes a dedicated processor to this!

Must be backported to 1.9.
2019-04-18 11:34:41 +02:00
Olivier Houchard
1cfac37b65 MEDIUM: tasks: Don't account a destroyed task as a runned task.
In process_runnable_tasks(), if the task we're about to run has been
destroyed, and should be free, don't account for it in the number of task
we ran. We're only allowed a maximum number of tasks to run per call to
process_runnable_tasks(), and freeing one shouldn't take the slot of a
valid task.
2019-04-18 10:11:13 +02:00
Olivier Houchard
3f795f76e8 MEDIUM: tasks: Merge task_delete() and task_free() into task_destroy().
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error pathes to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
2019-04-18 10:10:04 +02:00
Willy Tarreau
03dd029a5b CLEANUP: task: remain consistent when using the task's handler
A pointer "process" is assigned the task's handler in
process_runnable_tasks(), we have no reason to use t->process
right after it is assigned.
2019-04-17 22:32:27 +02:00
Olivier Houchard
51205a1958 BUG/MEDIUM: applets: Don't use task_in_rq().
When deciding if we want to wake the task of an applet up, don't give up
if task_in_rq returns 1, as there's a race condition and another thread
may run it. Instead, always attempt to task_wakeup(), at worst the task
is already in the run queue, and nothing will happen.
2019-04-17 19:30:23 +02:00
Olivier Houchard
0c7a4b6371 MINOR: tasks: Don't set the TASK_RUNNING flag when adding in the tasklet list.
Now that TASK_QUEUED is enforced, there's no need to set TASK_RUNNING when
removing the task from the runqueue to add it to the tasklet list. The flag
will only be set right before we run the task.
2019-04-17 19:28:01 +02:00
Olivier Houchard
de82aeaa26 BUG/MEDIUM: tasks: Make sure we modify global_tasks_mask with the rq_lock.
When modifying global_tasks_mask, make sure we hold the rq_lock, or we might
remove the bit while it has been re-set by somebody else, and we make not
be waked when needed.
2019-04-17 19:28:01 +02:00
Willy Tarreau
b038007ae8 BUG/MEDIUM: tasks: Make sure we set TASK_QUEUED before adding a task to the rq.
Make sure we set TASK_QUEUED in every case before adding the task to the
run queue. task_wakeup() now checks if either TASK_QUEUED or TASK_RUNNING
is set, and if neither is set, add TASK_QUEUED and effectively add the task
to the runqueue.
No longer use __task_wakeup() anywhere except in task_wakeup(), always use
task_wakeup() instead.
With the old code, process_runnable_task() may re-add a task in the runqueue
without setting the TASK_QUEUED flag, and there were race conditions that could
lead to a task having the TASK_QUEUED flag but not in the runqueue, thus
being unschedulable.

This should be backported to 1.9.
2019-04-17 19:28:01 +02:00
Christopher Faulet
46575cd392 BUG/MINOR: http_fetch/htx: Use HTX versions if the proxy enables the HTX mode
Because the HTX is now the default mode for all proxies (HTTP and TCP), it is
better to match on the proxy options to know if the HTX is enabled or not. This
way, if a TCP proxy explicitly disables the HTX mode, the legacy version of HTTP
fetches will be used.

No backport needed except if the patch activating the HTX by default for all
proxies is backported.
2019-04-17 15:12:27 +02:00