Commit Graph

7942 Commits

Author SHA1 Message Date
Olivier Houchard
58d87f31f7 BUG/MEDIUM: h2: Don't forget to set h2s->cs to NULL after having free'd cs.
In h2c_frt_stream_new, if we failed to create the stream for some reason,
don't forget to set h2s->cs to NULL before calling h2s_destroy(), otherwise
h2s_destroy() will call h2s_close(), which will attempt to access
h2s->cs->flags if it's non-NULL.

This should be backported to 1.9.
2019-05-29 16:45:13 +02:00
Olivier Houchard
250031e444 MEDIUM: sessions: Introduce session flags.
Add session flags, and add a new flag, SESS_FL_PREFER_LAST, to be set when
we use NTLM authentication, and we should reuse the last connection. This
should fix using NTLM with HTX. This totally replaces TX_PREFER_LAST.

This should be backported to 1.9.
2019-05-29 15:41:47 +02:00
Christopher Faulet
1146f975a9 BUG/MEDIUM: mux-h1: Don't skip the TCP splicing when there is no more data to read
When there is no more data to read (h1m->curr_len == 0 in the state
H1_MSG_DATA), we still call xprt->rcv_pipe() callback. It is important to update
connection's flags. Especially to remove the flag CO_FL_WAIT_ROOM. Otherwise,
the pipe remains marked as full, preventing the stream-interface to fallback on
rcv_buf(). So the connection may be freezed because no more data is received and
the mux H1 remains blocked in the state H1_MSG_DATA.

This patch must be backported to 1.9.
2019-05-29 15:32:14 +02:00
Willy Tarreau
1e928c074b MEDIUM: task: don't grab the WR lock just to check the WQ
When profiling locks, it appears that the WQ's lock has become the most
contended one, despite the WQ being split by thread. The reason is that
each thread takes the WQ lock before checking if it it does have something
to do. In practice the WQ almost only contains health checks and rare tasks
that can be scheduled anywhere, so this is a real waste of resources.

This patch proceeds differently. Now that the WQ's lock was turned to RW
lock, we proceed in 3 phases :
  1) locklessly check for the queue's emptiness

  2) take an R lock to retrieve the first element and check if it is
     expired. This way most visits are performed with an R lock to find
     and return the next expiration date.

  3) if one expiration is found, we perform the WR-locked lookup as
     usual.

As a result, on a one-minute test involving 8 threads and 64 streams at
1.3 million ctxsw/s, before this patch the lock profiler reported this :

    Stats about Lock TASK_WQ:
         # write lock  : 1125496
         # write unlock: 1125496 (0)
         # wait time for write     : 263.143 msec
         # wait time for write/lock: 233.802 nsec
         # read lock   : 0
         # read unlock : 0 (0)
         # wait time for read      : 0.000 msec
         # wait time for read/lock : 0.000 nsec

And after :

    Stats about Lock TASK_WQ:
         # write lock  : 173
         # write unlock: 173 (0)
         # wait time for write     : 0.018 msec
         # wait time for write/lock: 103.988 nsec
         # read lock   : 1072706
         # read unlock : 1072706 (0)
         # wait time for read      : 60.702 msec
         # wait time for read/lock : 56.588 nsec

Thus the contention was divided by 4.3.
2019-05-28 19:15:44 +02:00
Willy Tarreau
ef28dc11e3 MINOR: task: turn the WQ lock to an RW_LOCK
For now it's exclusively used as a write lock though, thus it remains
100% equivalent to the spinlock it replaces.
2019-05-28 19:15:44 +02:00
Willy Tarreau
186e96ece0 MEDIUM: buffers: relax the buffer lock a little bit
In lock profiles it's visible that there is a huge contention on the
buffer lock. The reason is that when offer_buffers() is called, it
systematically takes the lock before verifying if there is any
waiter. However doing so doesn't protect against races since a
waiter can happen just after we release the lock as well. Similarly
in h2 we take the lock every time an h2c is going to be released,
even without checking that the h2c belongs to a wait list. These
two have now been addressed by verifying non-emptiness of the list
prior to taking the lock.
2019-05-28 17:25:21 +02:00
Willy Tarreau
a8b2ce02b8 MINOR: activity: report the number of failed pool/buffer allocations
Haproxy is designed to be able to continue to run even under very low
memory conditions. However this can sometimes have a serious impact on
performance that it hard to diagnose. Let's report counters of failed
pool and buffer allocations per thread in show activity.
2019-05-28 17:25:21 +02:00
Willy Tarreau
2ae84e445d MEDIUM: poller: separate the wait time from the wake events
We have been abusing the do_poll()'s timeout for a while, making it zero
whenever there is some known activity. The problem this poses is that it
complicates activity diagnostic by incrementing the poll_exp field for
each known activity. It also requires extra computations that could be
avoided.

This change passes a "wake" argument to say that the poller must not
sleep. This simplifies the operations and allows one to differenciate
expirations from activity.
2019-05-28 17:25:21 +02:00
Willy Tarreau
d78d08f95b MINOR: activity: report totals and average separately
Some fields need to be averaged instead of summed (e.g. avg_poll_us)
when reported on the CLI. Let's have a distinct macro for this.
2019-05-28 17:25:21 +02:00
Willy Tarreau
a0211b864c MINOR: activity: write totals on the "show activity" output
Most of the time we find ourselves adding per-thread fields to observe
activity, so let's compute these on the fly and display them. Now the
output shows "field: total [ thr0 thr1 ... thrn ]".
2019-05-28 15:16:09 +02:00
Willy Tarreau
0350b90e31 MEDIUM: htx: make htx_add_data() never defragment the buffer
Now instead of trying to fit 100% of the input data into the output
buffer at the risk of defragmenting it, we put what fits into it only
and return the amount of bytes transferred. In a test, compared to the
previous commit, it increases the cached data rate from 44 Gbps to
55 Gbps and saves a lot in case of large buffers : with a 1 MB buffer,
uncached transfers jumped from 700 Mbps to 30 Gbps.
2019-05-28 14:48:59 +02:00
Willy Tarreau
0a7ef02074 MINOR: htx: make htx_add_data() return the transmitted byte count
In order to later allow htx_add_data() to transmit partial blocks and
avoid defragmenting the buffer, we'll need to return the number of bytes
consumed. This first modification makes the function do this and its
callers take this into account. At the moment the function still works
atomically so it returns either the block size or zero. However all
call places have been adapted to consider any value between zero and
the block size.
2019-05-28 14:48:59 +02:00
Willy Tarreau
d4908fa465 MINOR: htx: rename htx_append_blk_value() to htx_add_data_atonce()
This function is now dedicated to data blocks, and we'll soon need to
access it from outside in a rare few cases. Let's rename it and export
it.
2019-05-28 14:48:59 +02:00
Olivier Houchard
692c1d07f9 MINOR: ssl: Don't forget to call the close method of the underlying xprt.
In ssl_sock_close(), don't forget to call the underlying xprt's close method
if it exists. For now it's harmless not to do so, because the only available
layer is the raw socket, which doesn't have a close method, but that will
change when we implement QUIC.
2019-05-28 10:08:39 +02:00
Olivier Houchard
19afb274ad MINOR: ssl: Make sure the underlying xprt's init method doesn't fail.
In ssl_sock_init(), when initting the underlying xprt, check the return value,
and give up if it fails.
2019-05-28 10:08:28 +02:00
Willy Tarreau
11c90fbd92 BUG/MEDIUM: http: fix "http-request reject" when not final
When "http-request reject" was introduced in 1.8 with commit 53275e8b0
("MINOR: http: implement the "http-request reject" rule"), it was already
broken. The code mentions "it always returns ACT_RET_STOP" and obviously
a gross copy-paste made it ACT_RET_CONT. If the rule is the last one it
properly blocks, but if not the last one it gets ignored, as can be seen
with this simple configuration :

    frontend f1
        bind :8011
        mode http
        http-request reject
        http-request redirect location /

This trivial fix must be backported to 1.9 and 1.8. It is tracked by
github issue #107.
2019-05-28 08:26:17 +02:00
Christopher Faulet
39744f792d MINOR: htx: Remove support of pseudo headers because it is unused
The code to handle pseudo headers is unused and with no real value. So remove
it.
2019-05-28 07:42:33 +02:00
Christopher Faulet
ced39006a2 MINOR: htx: don't rely on htx_find_blk() anymore in the function htx_truncate()
the function htx_find_blk() is used by only one function, htx_truncate(). So
because this function does nothing very smart, we don't use it anymore. It will
be removed by another commit.
2019-05-28 07:42:33 +02:00
Christopher Faulet
0f6d6a9ab6 MINOR: htx: Optimize htx_drain() when all data are drained
Instead of looping on the HTX message to drain all data, the message is now
reset..
2019-05-28 07:42:33 +02:00
Christopher Faulet
ee847d45d0 MEDIUM: filters/htx: Filter body relatively to the first block
The filters filtering HTX body, in the callback http_payload, must now loop on
an HTX message starting from the first block position. The offset passed as
parameter is relative to this position and not the head one. It is mandatory
because once filtered, data are now forwarded using the function
channel_htx_fwd_payload(). So the first block position is always updated.
2019-05-28 07:42:33 +02:00
Christopher Faulet
16af60e540 MINOR: proto-htx: Use channel_htx_fwd_all() when unfiltered body are forwarded
So the first block position of the HTX message will always be updated
accordingly.
2019-05-28 07:42:33 +02:00
Christopher Faulet
8fa60e4613 MINOR: stats/htx: don't use the first block position but the head one
Applets must never rely on the first block position to consume an HTX
message. The head position must be used instead. For the request it is always
the start-line. At this stage, it is not a bug, because the first position of
the request is never changed by HTX analysers.
2019-05-28 07:42:33 +02:00
Christopher Faulet
29f1758285 MEDIUM: htx: Store the first block position instead of the start-line one
We don't store the start-line position anymore in the HTX message. Instead we
store the first block position to analyze. For now, it is almost the same. But
once all changes will be made on this part, this position will have to be used
by HTX analyzers, and only in the analysis context, to know where the analyse
should start.

When new blocks are added in an HTX message, if the first block position is not
defined, it is set. When the block pointed by it is removed, it is set to the
block following it. -1 remains the value to unset the position. the first block
position is unset when the HTX message is empty. It may also be unset on a
non-empty message, meaning every blocks were already analyzed.

From HTX analyzers point of view, this position is always set during headers
analysis. When they are waiting for a request or a response, if it is unset, it
means the analysis should wait. But once the analysis is started, and as long as
headers are not forwarded, it points to the message start-line.

As mentionned, outside the HTX analysis, no code must rely on the first block
position. So multiplexers and applets must always use the head position to start
a loop on an HTX message.
2019-05-28 07:42:33 +02:00
Christopher Faulet
ee1bd4b4f7 MINOR: proto-htx: Use channel_htx_fwd_headers() to forward 1xx responses
Instead of doing it by hand, we now call the dedicated function to do so.
2019-05-28 07:42:33 +02:00
Christopher Faulet
17fd8a261f MINOR: filters/htx: Use channel_htx_fwd_headers() after headers filtering
Instead of doing it by hand in the function flt_analyze_http_headers(), we now
call the dedicated function to do so.
2019-05-28 07:42:33 +02:00
Christopher Faulet
b75b5eaf26 MEDIUM: htx: 1xx messages are now part of the final reponses
1xx informational messages (all except 101) are now part of the HTTP reponse,
semantically speaking. These messages are not followed by an EOM anymore,
because a final reponse is always expected. All these parts can also be
transferred to the channel in same time, if possible. The HTX response analyzer
has been update to forward them in loop, as the legacy one.
2019-05-28 07:42:30 +02:00
Christopher Faulet
a61e97bcae MINOR: htx: Be sure to xfer all headers in one time in htx_xfer_blks()
In the function htx_xfer_blks(), we take care to transfer all headers in one
time. When the current block is a start-line, we check if there is enough space
to transfer all headers too. If not, and if the destination is empty, a parsing
error is reported on the source.

The H2 multiplexer is the only one to use this function. When a parsing error is
reported during the transfer, the flag CS_FL_EOI is also set on the conn_stream.
2019-05-28 07:42:12 +02:00
Christopher Faulet
a39d8ad086 MINOR: mux-h1: Set hdrs_bytes on the SL when an HTX message is produced 2019-05-28 07:42:12 +02:00
Christopher Faulet
33543e73a2 MINOR: h2/htx: Set hdrs_bytes on the SL when an HTX message is produced 2019-05-28 07:42:12 +02:00
Christopher Faulet
05c083ca8d MINOR: htx: Add a field to set the memory used by headers in the HTX start-line
The field hdrs_bytes has been added in the structure htx_sl. It should be used
to set how many bytes are help by all headers, from the start-line to the
corresponding EOH block. it must be set to -1 if it is unknown.
2019-05-28 07:42:12 +02:00
Christopher Faulet
2f6edc84a8 MINOR: mux-h2/htx: Support zero-copy when possible in h2_rcv_buf()
If the channel's buffer is empty and the message is small enough, we can swap
the H2S buffer with the channel one.
2019-05-28 07:42:12 +02:00
Christopher Faulet
9cdd5036f3 MINOR: stream-int: Don't use the flag CO_RFL_KEEP_RSV anymore in si_cs_recv()
Because the channel_recv_max() always return the right value, for HTX and legacy
streams, we don't need to set this flag. The multiplexer don't use it anymore.
2019-05-28 07:42:12 +02:00
Christopher Faulet
8a9ad4c0e8 MINOR: mux-h2: Use the count value received from the SI in h2_rcv_buf()
Now, the SI calls h2_rcv_buf() with the right count value. So we can rely on
it. Unlike the H1 multiplexer, it is fairly easier for the H2 multiplexer
because the HTX message already exists, we only transfer blocks from the H2S to
the channel. And this part is handled by htx_xfer_blks().
2019-05-28 07:42:12 +02:00
Christopher Faulet
30db3d737b MEDIUM: mux-h1: Use the count value received from the SI in h1_rcv_buf()
Now, the SI calls h1_rcv_buf() with the right count value. So we can rely on
it. During the parsing, we now really respect this value to be sure to never
exceed it. To do so, once headers are parsed, we should estimate the size of the
HTX message before copying data.
2019-05-28 07:42:12 +02:00
Christopher Faulet
156852b613 BUG/MINOR: htx: Change htx_xfer_blk() to also count metadata
This patch makes the function more accurate. Thanks to the function
htx_get_max_blksz(), the transfer of data has been simplified. Note that now the
total number of bytes copied (metadata + payload) is returned. This slighly
change how the function is used in the H2 multiplexer.
2019-05-28 07:42:12 +02:00
Christopher Faulet
a3f1550dfa MEDIUM: http/htx: Perform analysis relatively to the first block
The first block is the start-line, if defined. Otherwise it the head of the HTX
message. So now, during HTTP analysis, lookup are all done using the first block
instead of the head. Concretely, for now, it is the same because only one HTTP
message is stored at a time in an HTX message. 1xx informational messages are
handled separatly from the final reponse and from each other. But it will make
sense when the 1xx informational messages and the associated final reponse will
be stored in the same HTX message.
2019-05-28 07:42:12 +02:00
Christopher Faulet
7b7d507a5b MINOR: http/htx: Use sl_pos directly to replace the start-line
Since the HTX start-line is now referenced by position instead of by its payload
address, it is fairly easier to replace it. No need to search the rigth block to
find the start-line comparing the payloads address. It just enough to get the
block at the position sl_pos.
2019-05-28 07:42:12 +02:00
Christopher Faulet
297fbb45fe MINOR: htx: Replace the function http_find_stline() by http_get_stline()
Now, we only return the start-line. If not found, NULL is returned. No lookup is
performed and the HTX message is no more updated. It is now the caller
responsibility to update the position of the start-line to the right value. So
when it is not found, i.e sl_pos is set to -1, it means the last start-line has
been already processed and the next one has not been inserted yet.

It is mandatory to rely on this kind of warranty to store 1xx informational
responses and final reponse in the same HTX message.
2019-05-28 07:42:12 +02:00
Christopher Faulet
b77a1d26a4 MINOR: mux-h2/htx: Get the start-line from the head when HEADERS frame is built
in the H2 multiplexer, when a HEADERS frame is built before sending it, we have
the warranty the start-line is the head of the HTX message. It is safer to rely
on this fact than on the sl_pos value. For now, it's safe to use sl_pos in muxes
because HTTP 1xx messages are considered as full messages in HTX and only one
HTTP message can be stored at a time in HTX. But we are trying to handle 1xx
messages as a part of the reponse message. In this way, an HTTP reponse will be
the sum of all 1xx informational messages followed by the final response. So it
will be possible to have several start-line in the same HTX message. And the
sl_pos will point to the first unprocessed start-line from the analyzers point
of view.
2019-05-28 07:42:12 +02:00
Christopher Faulet
9c66b980fa MINOR: htx: Store start-line block's position instead of address of its payload
Nothing much to say. This change is just mandatory to consider 1xx informational
messages as part of a response.
2019-05-28 07:42:12 +02:00
Christopher Faulet
28f29c7eea MINOR: htx: Store the head position instead of the wrap one
The head of an HTX message is heavily used whereas the wrap position is only
used when a block is added or removed. So it is more logical to store the head
position in the HTX message instead of the wrap one. The wrap position can be
easily deduced. To get it, the new function htx_get_wrap() may be used.
2019-05-28 07:42:12 +02:00
Christopher Faulet
429b91d308 MINOR: htx: Remove the macro IS_HTX_SMP() and always use IS_HTX_STRM() instead
The macro IS_HTX_SMP() is only used at a place, in a context where the stream
always exists. So, we can remove it to use IS_HTX_STRM() instead.
2019-05-28 07:42:12 +02:00
Willy Tarreau
b01302f9ac MEDIUM: config: now alert when two servers have the same name
We've been emitting warnings for over 5 years (since 1.5-dev22) about
configs accidently carrying multiple servers with the same name in the
same backend, and this starts to cause some real trouble in dynamic
environments since it's still very difficult to accurately process
a state-file and we still can't transport a server's name over the
peers protocol because of this.

It's about time to force users to fix their configs if they still
hadn't given that there is zero technical justification for doing this,
beyond the "yyp" (or copy-paste accident) when editing the config.

The message remains as clear as before, indicating the file and lines
of the conflict so that the user can easily fix it.
2019-05-27 19:31:06 +02:00
Willy Tarreau
c3b5958255 BUG/MEDIUM: threads: fix double-word CAS on non-optimized 32-bit platforms
On armv7 haproxy doesn't work because of the fixes on the double-word
CAS. There are two issues. The first one is that the last argument in
case of dwcas is a pointer to the set of value and not a value ; the
second is that it's not enough to cast the data as (void*) since it will
be a single word. Let's fix this by using the pointers as an array of
long. This was tested on i386, armv7, x86_64 and aarch64 and it is now
fine. An alternate approach using a struct was attempted as well but it
used to produce less optimal code.

This fix must be backported to 1.9. This fixes github issue #105.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2019-05-27 17:40:59 +02:00
Willy Tarreau
bff005ae58 BUG/MEDIUM: queue: fix the tree walk in pendconn_redistribute.
In pendconn_redistribute() we scan the queue using eb32_next() on the
node we've just deleted, which is wrong since the node is not in the
tree anymore, and it could dereference one node that has already been
released by another thread. Note that we cannot use eb32_first() in the
loop here instead because we need to skip pendconns having SF_FORCE_PRST.
Instead, let's keep a copy of the next node before deleting it.

In addition, the pendconn retrieved there is wrong, it uses &node as
the pointer instead of node, resulting in very quick crashes when the
server list is scanned.

Fortunately this only happens when "option redispatch" is used in
conjunction with "maxconn" on server lines, "cookie" for the stickiness,
and when a server goes down with entries in its queue.

This bug was introduced by commit 0355dabd7 ("MINOR: queue: replace
the linked list with a tree") so the fix must be backported to 1.9.
2019-05-27 10:29:59 +02:00
Willy Tarreau
b6195ef2a6 BUG/MAJOR: lb/threads: make sure the avoided server is not full on second pass
In fwrr_get_next_server(), we optionally pass a server to avoid. It
usually points to the current server during a redispatch operation. If
this server is usable, an "avoided" pointer is set and we continue to
look for another server. If in the end no other server is found, then
we fall back to this avoided one, which is still better than nothing.

The problem that may arise with threads is that in the mean time, this
avoided server might have received extra connections and might not be
usable anymore. This causes it to be queued a second time in the "full"
list and the loop to search for a server again, ending up on this one
again and so on.

This patch makes sure that we break out of the loop when we have to
pick the avoided server. It's probably what the code intended to do
as the current break statement causes fwrr_update_position() and
fwrr_dequeue_srv() to be called again on the avoided server.

It must be backported to 1.9 and 1.8, and seems appropriate for older
versions though it's unclear what the impact of this bug might be
there since the race doesn't exist and we're left with the double
update of the server's position.
2019-05-27 10:29:59 +02:00
Willy Tarreau
d6a7850200 MINOR: cli/activity: add 3 general purpose counters in development mode
The unused fd_del and fd_skip were being abused during debugging sessions
as general purpose event counters. With their removal, let's officially
have dedicated counters for such use cases. These counters are called
"ctr0".."ctr2" and are listed at the end when DEBUG_DEV is set.
2019-05-27 07:03:38 +02:00
Willy Tarreau
394c9b4215 MINOR: cli/activity: remove "fd_del" and "fd_skip" from show activity
These variables are never set anymore and were always reported as zero.
2019-05-27 06:59:14 +02:00
Ilya Shipitsin
0590f44254 BUILD: ssl: fix latest LibreSSL reg-test error
starting with OpenSSL 1.0.0 recommended way to disable compression is
using SSL_OP_NO_COMPRESSION when creating context.

manipulations with SSL_COMP_get_compression_methods, sk_SSL_COMP_num
are only required for OpenSSL < 1.0.0
2019-05-26 21:26:02 +02:00
Willy Tarreau
08e2b41e81 BUILD: connections: shut up gcc about impossible out-of-bounds warning
Since commit 88698d9 ("MEDIUM: connections: Add a way to control the
number of idling connections.") when building without threads, gcc
complains that the operations made on the idle_orphan_conns[] list is
out of bounds, which is always false since 1) <i> can only equal zero,
and 2) given it's equal to <tid> we never even enter the loop. But as
usual it thinks it knows better, so let's mask the origin of this <i>
value to shut it up. Another solution consists in making <i> unsigned
and adding an explicit range check.
2019-05-26 11:54:20 +02:00
Willy Tarreau
9c218e7521 MAJOR: mux-h2: switch to next mux buffer on buffer full condition.
Now when we fail to send because the mux buffer is full, before giving
up and marking MFULL, we try to allocate another buffer in the mux's
ring to try again. Thanks to this (and provided there are enough buffers
allocated to the mux's ring), a single stream picked in the send_list
cannot steal all the mux's room at once. For this, we expand the ring
size to 31 buffers as it seems to be optimal on benchmarks since it
divides the number of context switches by 3. It will inflate each H2
conn's memory by 1 kB.

The bandwidth is now much more stable. Prior to this, it a test on
h2->h1 with very large objects (1 GB), a few tens of connections and
a few tens of streams per connection would show a varying performance
between 34 and 95 Gbps on 2 cores/4 threads, with h2_snd_buf() stopped
on a buffer full condition between 300000 and 600000 times per second.
Now the performance is constantly between 88 and 96 Gbps. Measures show
that buffer full conditions are met around only 159 times per second
in this case, or rougly 2000 to 4000 times less often.
2019-05-26 11:33:19 +02:00
Willy Tarreau
60f62682b1 MINOR: mux-h2: report the mbuf's head and tail in "show fd"
It's useful to know how the mbuf spans over the whole area and to have
access to the first and last ones, so let's dump just this.
2019-05-26 11:33:18 +02:00
Willy Tarreau
bcc4595e57 CLEANUP: mux-h2: consistently use a local variable for the mbuf
This makes the code more readable and reduces the calls to br_tail().
In addition, all calls to h2_get_buf() are now made via this local
variable, which should significantly help for retries.
2019-05-26 10:52:47 +02:00
Willy Tarreau
41c4d6a2c5 MEDIUM: mux-h2: make the send() function iterate over all mux buffers
Now send() uses a loop to iterate over all buffers to be sent. These
buffers are released and deleted from the vector once completely sent.
If any buffer gets released, offer_buffers() is called to wake up some
waiters.
2019-05-26 10:52:25 +02:00
Willy Tarreau
2e3c000c1c MINOR: mux-h2: introduce h2_release_mbuf() to release all buffers in the mbuf ring
This function iterates over all buffers in the mbuf ring to release all
of them from the head to the tail.
2019-05-26 10:51:25 +02:00
Willy Tarreau
662fafc02b MEDIUM: mux-h2: make the conditions to send based on mbuf, not just its tail
This is in preparation for iterating over lists. First we need to always
check the buffer's head and not its tail.
2019-05-26 10:50:50 +02:00
Willy Tarreau
5133096df2 MEDIUM: mux-h2: replace all occurrences of mbuf with a buffer ring
For now it's only one buffer long so the head and tails are always the
same, thus it doesn't change what used to work. In short, br_tail(h2c->mbuf)
was inserted everywhere we used to have h2c->mbuf.
2019-05-26 10:50:18 +02:00
Willy Tarreau
455d5681b6 MEDIUM: mux-h2: avoid doing expensive buffer realigns when not absolutely needed
Transferring large objects over H2 sometimes shows unexplained performance
variations. A long analysis resulted in the following discovery. Often the
mux buffer looks like this :

    [ empty_head |    data     | empty_tail ]

Typical numbers are (very common) :
  - empty_head = 31
  - empty_tail = 16  (total free=47)
  - data = 16337
  - size = 16384
  - data to copy: 43

The reason for these holes are the blocking factors that are not always
the same in and out (due to keeping 9 bytes for the frame size, or the
56 bytes corresponding to the HTX header). This can easily happen 10000
times a second if the network bandwidth permits it!

In this case, while copying a DATA frame we find that the buffer has its
free space wrapped so we decide to realign it to optimize the copy. It's
possible that this practice stems from the code used to emit headers,
which do not support fragmentation and which had no other option left.
But it comes with two problems :
  - we don't check if the data fits, which results in a memcpy for nothing
  - we can move huge amounts of data to just copy a small block.

This patch addresses this two ways :
  - first, by not forcing a data realignment if what we have to copy does
    not fit, as this is totally pointless ;

  - second, by refusing to move too large data blocks. The threshold was
    set to 1 kB, because it may make sense to move 1 kB of data to copy
    a 15 kB one at once, which will leave as a single 16 kB block, but
    it doesn't make sense to mvoe 15 kB to copy just 1 kB. In all cases
    the data would fit and would just be split into two blocks, which is
    not very expensive, hence the low limit to 1 kB

With such changes, realignments are very rare, they show up around once
every 15 seconds at 60 Gbps, and look like this, resulting in a much more
stable bit rate :

  buf=0x7fe6ec0c3510,h=16333,d=35,s=16384 room=16349 in=16337

This patch should be safe for backporting to 1.9 if some performance
issues are reported there.
2019-05-25 20:31:53 +02:00
Ilya Shipitsin
e242f3dfb8 BUG/MINOR: ssl_sock: Fix memory leak when disabling compression
according to manpage:

       sk_TYPE_zero() sets the number of elements in sk to zero. It
       does not free sk so after this call sk is still valid.

so we need to free all elements

[wt: seems like it has been there forever and should be backported
 to all stable branches]
2019-05-25 07:45:55 +02:00
Christopher Faulet
b8fd4c031c BUG/MINOR: htx: Remove a forgotten while loop in htx_defrag()
Fortunately, this loop does nothing. Otherwise it would have led to an infinite
loop. It was probably forgotten during a refactoring, in the early stage of the
HTX.

This patch must be backported to 1.9.
2019-05-24 09:11:10 +02:00
Christopher Faulet
f90c24d14c BUG/MEDIUM: proto-htx: Not forward too much data when 1xx reponses are handled
When an 1xx reponse is processed, we forward it immediatly. But another message
may already be in the channel's buffer, waiting to be processed. This may be
another 1xx reponse or the final one. So instead of forwarding everything, we
must take care to only forward the processed 1xx response.

This patch must be backported to 1.9.
2019-05-24 09:11:07 +02:00
Christopher Faulet
8e9e3ef15c BUG/MINOR: mux-h1: Report EOI instead EOS on parsing error or H2 upgrade
When a parsing error occurrs in the H1 multiplexer, we stop to copy HTX
blocks. So the error may be reported with an emtpy HTX message. For instance, if
the headers parsing failed. When it happens, the flag CS_FL_EOS is also set on
the conn_stream. But it is an error. Most of time, it is set on established
connections, so it is not really an issue. But if it happens when the server
connection is not fully established, the connection is shut down immediatly and
the stream-interface is switched from SI_ST_CON to SI_ST_DIS/CLO. So HTX
analyzers have no chance to catch the error.

Instead of setting CS_FL_EOS, it is fairly better to set CS_FL_EOI, which is the
right flag to use. The same is also done on H2 upgrade. As a side effet of this
fix, in the stream-interface code, we must now set the flag CF_READ_PARTIAL on
the channel when the flag CF_EOI is set. It is a warranty to wakeup the stream
when EOI is reported to the channel while no data are received.

This patch must be backported to 1.9.
2019-05-24 09:11:01 +02:00
Christopher Faulet
316934d3c9 BUG/MINOR: mux-h2: Count EOM in bytes sent when a HEADERS frame is formatted
In HTX, when a HEADERS frame is formatted before sending it to the client or the
server, If an EOM is found because there is no body, we must count it in the
number bytes sent.

This patch must be backported to 1.9.
2019-05-24 09:10:46 +02:00
Christopher Faulet
256b69a82d BUG/MINOR: lua: Set right direction and flags on new HTTP objects
When a LUA HTTP object is created using the current TXN object, it is important
to also set the right direction and flags, using ones from the TXN object.

This patch may be backported to all supported branches with the lua
support. But, it seems to have no impact for now.
2019-05-24 09:07:57 +02:00
Christopher Faulet
55ae8a64e4 BUG/MEDIUM: spoe: Don't use the SPOE applet after releasing it
In spoe_release_appctx(), the SPOE applet may be used after it was released to
get its exit status code. Of course, HAProxy crashes when this happens.

This patch must be backported to 1.9 and 1.8.
2019-05-24 09:07:30 +02:00
Christopher Faulet
08e6646460 BUG/MINOR: proto-htx: Try to keep connections alive on redirect
As fat as possible, we try to keep the connections alive on redirect. It's
possible when the request has no body or when the request parsing is finished.

No backport is needed.
2019-05-24 09:06:59 +02:00
Willy Tarreau
1713c03825 MINOR: stats: report the global output bit rate in human readable form
The stats page now reports the per-process output bit rate and applies
the usual conversions needed to turn the TCP payload rate to an Ethernet
bit rate in order to give a reasonably accurate estimate of how far from
interface saturation we are.
2019-05-23 12:31:51 +02:00
Willy Tarreau
7cf0e4517d MINOR: raw_sock: report global traffic statistics
Many times we've been missing per-process traffic statistics. While it
didn't make sense in multi-process mode, with threads it does. Thus we
now have a counter of bytes emitted by raw_sock, and a freq counter for
these as well. However, freq_ctr are limited to 32 bits, and given that
loads of 300 Gbps have already been reached over a loopback using
splicing, we need to downscale this a bit. Here we're storing 1/32 of
the byte rate, which gives a theorical limit of 128 GB/s or ~1 Tbps,
which is more than enough. Let's have fun re-reading this sentence in
2029 :-)  The values can be read in "show info" output on the CLI.
2019-05-23 11:45:38 +02:00
Willy Tarreau
bc1b820606 BUILD: watchdog: condition it to USE_RT
It's needed on Linux to have access to timerfd_*, and on FreeBSD this
lib is needed as well, though not enabled in our default build. We can
see later if it's OK to enable it, for now let's fix the build issues.
2019-05-23 10:20:55 +02:00
Willy Tarreau
02255b24df BUILD: watchdog: use si_value.sival_int, not si_int for the timer's value
Bah, the linux manpage suggests to use si_int but it's a fake, it's only
a define on sigval.sival_int where sigval is defined as si_value. Let's
use si_value.sival_int, at least it builds on both Linux and FreeBSD. It's
likely that this code will have to be limited to a small subset of OSes
if it causes difficulties like this.
2019-05-23 08:36:29 +02:00
Willy Tarreau
96d5195862 MEDIUM: config: deprecate the antique req* and rsp* commands
These commands don't follow the same flow as the rest of the commands,
each of them iterates over all header lines before switching to the
next directive. In addition they make no distinction between start
line and headers and can lead to unparsable rewrites which are very
difficult to deal with internally.

Most of them are still occasionally found in configurations, mainly
because of the usual "we've always done this way". By marking them
deprecated and emitting a warning and recommendation on first use of
each of them, we will raise users' awareness of users regarding the
cleaner, faster and more reliable alternatives.

Some use cases of "reqrep" still appear from time to time for URL
rewriting that is not so convenient with other rules. But at least
users facing this requirement will explain their use case so that we
can best serve them. Some discussion started on this subject in a
thread linked to from github issue #100.

The goal is to remove them in 2.1 since they require to reparse the
result before indexing it and we don't want this hack to live long.
The following directives were marked deprecated :

  -reqadd
  -reqallow
  -reqdel
  -reqdeny
  -reqiallow
  -reqidel
  -reqideny
  -reqipass
  -reqirep
  -reqitarpit
  -reqpass
  -reqrep
  -reqtarpit
  -rspadd
  -rspdel
  -rspdeny
  -rspidel
  -rspideny
  -rspirep
  -rsprep
2019-05-22 20:43:45 +02:00
Willy Tarreau
3844747536 CLEANUP: raw_sock: remove support for very old linux splice bug workaround
We've been dealing with a workaround for a bug in splice that used to
affect version 2.6.25 to 2.6.27.12 and which was fixed 10 years ago
in kernel versions which are not supported anymore. Given that people
who would use a kernel in such a range would face much more serious
stability and security issues, it's about time to get rid of this
workaround and of the ASSUME_SPLICE_WORKS build option used to disable
it.
2019-05-22 20:02:15 +02:00
Willy Tarreau
e5733234f6 CLEANUP: build: rename some build macros to use the USE_* ones
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.

The following renames were done :

 ENABLE_POLL -> USE_POLL
 ENABLE_EPOLL -> USE_EPOLL
 ENABLE_KQUEUE -> USE_KQUEUE
 ENABLE_EVPORTS -> USE_EVPORTS
 TPROXY -> USE_TPROXY
 NETFILTER -> USE_NETFILTER
 NEED_CRYPT_H -> USE_CRYPT_H
 CONFIG_HAP_CRYPT -> USE_LIBCRYPT
 CONFIG_HAP_NS -> DUSE_NS
 CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
 CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
 CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
2019-05-22 19:47:57 +02:00
Willy Tarreau
823bda0eb7 BUILD: time: remove the test on _POSIX_C_SOURCE
It seems it's not defined on FreeBSD while it's mentioned on Linux that
clock_gettime() can be detected using this. Given that we also have the
test for _POSIX_TIMERS>0 that should cover it well enough. If it breaks
on other systems, we'll see.

Report was here :
    https://github.com/haproxy/haproxy/runs/133866993
2019-05-22 19:14:59 +02:00
Willy Tarreau
082b62828d BUG/MEDIUM: init/threads: provide per-thread alloc/free function callbacks
We currently have the ability to register functions to be called early
on thread creation and at thread deinitialization. It turns out this is
not sufficient because certain such functions may use resources that are
being allocated by the other ones, thus creating a race condition depending
only on the linking order. For example the mworker needs to register a
file descriptor while the pollers will reallocate the fd_updt[] array.
Similarly logs and trashes may be used by some init functions while it's
unclear whether they have been deduplicated.

The same issue happens on deinit, if the fd_updt[] or trash is released
before some functions finish to use them, we'll get into trouble.

This patch creates a couple of early and late callbacks for per-thread
allocation/freeing of resources. A few init functions were moved there,
and the fd init code was split between the two (since it used to both
allocate and initialize at once). This way the init/deinit sequence is
expected to be safe now.

This patch should be backported to 1.9 as at least the trash/log issue
seems to be present. The run_thread_poll_loop() code is a bit different
there as the mworker is not a callback, but it will have no effect and
it's enough to drop the mworker changes.

This bug was reported by Ilya Shipitsin in github issue #104.
2019-05-22 14:59:08 +02:00
Willy Tarreau
aabbe6a3bb MINOR: WURFL: do not emit warnings when not configured
At the moment the WURFL module emits 3 lines of warnings upon startup
when it is not referenced in the configuration file, which is quite
confusing. Let's make sure to keep it silent when not configured, as
detected by the absence of the wurfl-data-file statement.
2019-05-22 14:01:22 +02:00
mbellomi
ae4fcf1e67 MINOR: WURFL: module version bump to 2.0
Make it version 2.0.
2019-05-22 12:06:42 +02:00
mbellomi
2c07700098 MEDIUM: WURFL: HTX awareness.
Now wurfl fetch process is fully  HTX aware.
2019-05-22 12:06:38 +02:00
mbellomi
9896981675 MINOR: WURFL: wurfl_get() and wurfl_get_all() now return an empty string if device detection fails 2019-05-22 12:06:38 +02:00
mbellomi
e9fedf560a MINOR: WURFL: removes heading wurfl-information-separator from wurfl-get-all() and wurfl-get() results 2019-05-22 12:06:38 +02:00
mbellomi
4304e30af1 MINOR: WURFL: shows log messages during module initialization
Now some useful startup information is logged to stderr. Previously they
were lost because logs were not yet enabled.
2019-05-22 12:06:34 +02:00
mbellomi
f9ea1e2fd4 MINOR: WURFL: fixed Engine load failed error when wurfl-information-list contains wurfl_root_id 2019-05-22 12:06:07 +02:00
mbellomi
d173e93aa7 BUG/MEDIUM: WURFL: segfault in wurfl-get() with missing info.
A segfault may happen in ha_wurfl_get() when dereferencing information not
present in wurfl-information-list. Check the node retrieved from the tree,
not its container.

This fix must be backported to 1.9.
2019-05-22 12:06:02 +02:00
Willy Tarreau
0a7a4fbbc8 CLEANUP: mux-h1: use "H1" and not "h1" as the mux's name
The mux's name is the only one reported in lower case in "show sess"
or "haproxy -vv" while the other ones are upper case, so it loses and
the other ones win :-)
2019-05-22 11:50:48 +02:00
Willy Tarreau
b106ce1c3d MINOR: stream: remove the cpu time detection from process_stream()
It was not as efficient as the watchdog in that it would only trigger
after the problem resolved by itself, and still required a huge margin
to make sure we didn't trigger for an invalid reason. This used to leave
little indication about the cause. Better use the watchdog now and
improve it if needed.

The detector of unkillable tasks remains active though.
2019-05-22 11:50:48 +02:00
Willy Tarreau
2bfefdbaef MAJOR: watchdog: implement a thread lockup detection mechanism
Since threads were introduced, we've naturally had a number of bugs
related to locking issues. In addition we've also got some issues
with corrupted lists in certain rare cases not necessarily involving
threads. Not only these events cause a lot of trouble to the production
as it is very hard to detect that the process is stuck in a loop and
doesn't deliver the service anymore, but it's often difficult (or too
late) to collect more debugging information.

The patch presented here implements a lockup detection mechanism, also
known as "watchdog". The principle is that (on systems supporting it),
each thread will have its own CPU timer which progresses as the thread
consumes CPU cycles, and when a deadline is met, a signal is delivered
(SIGALRM here since it doesn't interrupt gdb by default).

The thread handling this signal (which is not necessarily the one which
triggered the timer) figures the thread ID from the signal arguments and
checks if it's really stuck by looking at the time spent since last exit
from poll() and by checking that the thread's scheduler is still alive
(so that even when dealing with configuration issues resulting in insane
amount of tasks being called in turn, it is not possible to accidently
trigger it). Checking the scheduler's activity will usually result in a
second chance, thus doubling the detecting time.

In order not to incorrectly flag a thread as being the cause of the
lockup, the thread_harmless_mask is checked : a thread could very well
be spinning on itself waiting for all other threads to join (typically
what happens when issuing "show sess"). In this case, once all threads
but one (or two) have joined, all the innocent ones are marked harmless
and will not trigger the timer. Only the ones not reacting will.

The deadline is set to one second, which already appears impossible to
reach, especially since it's 1 second of CPU usage, not elapsed time
with the CPU being preempted by other threads/processes/hypervisor. In
practice due to the scheduler's health verification it takes up to two
seconds to decide to panic.

Once all conditions are met, the goal is to crash from the offending
thread. So if it's the current one, we call ha_panic() otherwise the
signal is bounced to the offending thread which deals with it. This
will result in all threads being woken up in turn to dump their context,
the whole state is emitted on stderr in hope that it can be logged, and
the process aborts, leaving a chance for a core to be dumped and for a
service manager to restart it.

An alternative mechanism could be implemented for systems unable to
wake up a thread once its CPU clock reaches a deadline (e.g. FreeBSD).
Instead of waking the timer each and every deadline, it is possible to
use a standard timer which is reset each time we leave poll(). Since
the signal handler rechecks the CPU consumption this will also work.
However a totally idle process may trigger it from time to time which
may or may not confuse some debugging sessions. The same is true for
alarm() which could be another option for systems not having such a
broad choice of timers (but it seems that in this case they will not
have per-thread CPU measurements available either).

The feature is currently implemented only when threads are enabled in
order to keep the code clean, since the main purpose is to detect and
address inter-thread deadlocks. But if it proves useful for other
situations this condition might be relaxed.
2019-05-22 11:50:48 +02:00
Willy Tarreau
e6a02fa65a MINOR: threads: add a "stuck" flag to the thread_info struct
This flag is constantly cleared by the scheduler and will be set by the
watchdog timer to detect stuck threads. It is also set by the "show
threads" command so that it is easy to spot if the situation has evolved
between two subsequent calls : if the first "show threads" shows no stuck
thread and the second one shows such a stuck thread, it indicates that
this thread didn't manage to make any forward progress since the previous
call, which is extremely suspicious.
2019-05-22 11:50:48 +02:00
Willy Tarreau
578ea8be55 MINOR: debug: dump streams when an applet, iocb or stream is known
Whenever we can retrieve a valid stream pointer, we now call stream_dump()
to get a detailed dump of the stream currently running on the processor.
This is used by "show threads" and by ha_panic().
2019-05-22 11:50:48 +02:00
Willy Tarreau
5484d58a17 MINOR: stream: introduce a stream_dump() function and use it in stream_dump_and_crash()
This function dumps a lot of information about a stream into the provided
buffer. It is now used by stream_dump_and_crash() and will be used by the
debugger as well.
2019-05-22 11:50:48 +02:00
Willy Tarreau
fade80d162 CLEANUP: debug: make use of ha_tkill() and remove ifdefs
This way we always signal the threads the same way.
2019-05-22 11:50:48 +02:00
Willy Tarreau
2beaaf7d46 MINOR: threads: implement ha_tkill() and ha_tkillall()
These functions are used respectively to signal one thread or all threads.
When multithreading is disabled, it's always the current thread which is
signaled.
2019-05-22 11:50:48 +02:00
Willy Tarreau
8b35ba54bc CLEANUP: debug: always report harmless/want_rdv even without threads
This way we have a more consistent output and we can remove annoying
ifdefs.
2019-05-22 11:50:48 +02:00
Willy Tarreau
05ed14cfc4 CLEANUP: threads: really move thread_info to hathreads.c
Commit 5a6e2245f ("REORG: threads: move the struct thread_info from
global.h to hathreads.h") didn't hold its promise well, as the thread_info
struct was still declared and initialized in haproxy.c in addition to being
in hathreads.c. Let's move it for real now.
2019-05-22 11:50:48 +02:00
Willy Tarreau
ddd8533f1b MINOR: debug: switch to SIGURG for thread dumps
The current choice of SIGPWR has the adverse effect of stopping gdb each
time it is triggered using "show threads" or example, which is not really
convenient. Let's switch to SIGURG instead, which we don't use either.
2019-05-22 11:50:48 +02:00
Tim Duesterhus
9b7a976cd6 BUG/MINOR: mworker: Fix memory leak of mworker_proc members
The struct mworker_proc is not uniformly freed everywhere, sometimes leading
to leaks of the `id` string (and possibly the other strings).

Introduce a mworker_free_child function instead of duplicating the freeing
logic everywhere to prevent this kind of issues.

This leak was reported in issue #96.

It looks like the leaks have been introduced in commit 9a1ee7ac31,
which is specific to 2.0-dev. Backporting `mworker_free_child` might be
helpful to ease backporting other fixes, though.
2019-05-22 11:29:18 +02:00
Willy Tarreau
f61782418c CLEANUP: time: refine the test on _POSIX_TIMERS
The clock_gettime() man page says we must check that _POSIX_TIMERS is
defined to a value greater than zero, not just that it's simply defined
so let's fix this right now.
2019-05-21 20:03:03 +02:00
Olivier Houchard
aacc405c1f BUG/MEDIUM: streams: Don't switch from SI_ST_CON to SI_ST_DIS on read0.
When we receive a read0, and we're still in SI_ST_CON state (so on an
outgoing conneciton), don't immediately switch to SI_ST_DIS, or, we would
never call sess_establish(), and so the analysers will never run.
Instead, let sess_establish() handle that case, and switch to SI_ST_DIS if
we already have CF_SHUTR on the channel.

This should be backported to 1.9.
2019-05-21 19:05:09 +02:00
Emmanuel Hocdet
0ba4f483d2 MAJOR: polling: add event ports support (Solaris)
Event ports are kqueue/epoll polling class for Solaris. Code is based
on https://github.com/joyent/haproxy-1.8/tree/joyent/dev-v1.8.8.
Event ports are available only on SunOS systems derived from
Solaris 10 and later (including illumos systems).
2019-05-21 15:16:45 +02:00
Willy Tarreau
663fda4c90 BUILD: threads: only assign the clock_id when supported
I took extreme care to always check for _POSIX_THREAD_CPUTIME before
manipulating clock_id, except at one place (run_thread_poll_loop) as
found by Manu, breaking Solaris. Now fixed, no backport needed.
2019-05-21 15:14:08 +02:00
Willy Tarreau
9c8800af3b MINOR: debug: report each thread's cpu usage in "show thread"
Now we can report each thread's CPU time, both at wake up (poll) and
retrieved while dumping (now), then the difference, which directly
indicates how long the thread has been running uninterrupted. A very
high value for the diff could indicate a deadlock, especially if it
happens between two threads. Note that it may occasionally happen
that a wrong value is displayed since nothing guarantees that the
date is read atomically.
2019-05-20 21:14:14 +02:00
Willy Tarreau
81036f2738 MINOR: time: move the cpu, mono, and idle time to thread_info
These ones are useful across all threads and would be better placed
in struct thread_info than thread-local. There are very few users.
2019-05-20 21:14:14 +02:00
Willy Tarreau
8323a375bc MINOR: threads: add a thread-local thread_info pointer "ti"
Since we're likely to access this thread_info struct more frequently in
the future, let's reserve the thread-local symbol to access it directly
and avoid always having to combine thread_info and tid. This pointer is
set when tid is set.
2019-05-20 21:14:12 +02:00
Willy Tarreau
624dcbf41e MINOR: threads: always place the clockid in the struct thread_info
It will be easier to deal with the internal API to always have it.
2019-05-20 21:13:01 +02:00
Willy Tarreau
5a6e2245fa REORG: threads: move the struct thread_info from global.h to hathreads.h
It doesn't make sense to keep this struct thread_info in global.h, it
causes difficulties to access its contents from hathreads.h, let's move
it to the threads where it ought to have been created.
2019-05-20 20:00:25 +02:00
Willy Tarreau
a9f9fc9e5b MINOR: debug: make ha_panic() report threads starting at 1
Internally they start at zero but everywhere (config, dumps) we show
them starting at 1, so let's fix the confusion.
2019-05-20 17:46:14 +02:00
Willy Tarreau
3710105945 MINOR: tools: provide a may_access() function and make dump_hex() use it
It's a bit too easy to crash by accident when using dump_hex() on any
area. Let's have a function to check if the memory may safely be read
first. This one abuses the stat() syscall checking if it returns EFAULT
or not, in which case it means we're not allowed to read from there. In
other situations it may return other codes or even a success if the
area pointed to by the file exists. It's important not to abuse it
though and as such it's tested only once per output line.
2019-05-20 16:59:37 +02:00
Willy Tarreau
6bdf3e9b11 MINOR: debug/cli: add some debugging commands for developers
When haproxy is built with DEBUG_DEV, the following commands are added
to the CLI :

  debug dev close <fd>        : close this file descriptor
  debug dev delay [ms]        : sleep this long
  debug dev exec  [cmd] ...   : show this command's output
  debug dev exit  [code]      : immediately exit the process
  debug dev hex   <addr> [len]: dump a memory area
  debug dev log   [msg] ...   : send this msg to global logs
  debug dev loop  [ms]        : loop this long
  debug dev panic             : immediately trigger a panic
  debug dev tkill [thr] [sig] : send signal to thread

These are essentially aimed at helping developers trigger certain
conditions and are expected to be complemented over time.
2019-05-20 16:59:30 +02:00
Willy Tarreau
56131ca58e MINOR: debug: implement ha_panic()
This function dumps all existing threads using the thread dump mechanism
then aborts. This will be used by the lockup detection and by debugging
tools.
2019-05-20 16:51:30 +02:00
Willy Tarreau
9fc5dcbd71 MINOR: tools: add dump_hex()
This is used to dump a memory area into a buffer for debugging purposes.
2019-05-20 16:51:30 +02:00
Willy Tarreau
da5a63f8f1 CLEANUP: stream: remove an obsolete debugging test
The test consisted in checking that there was always a timeout on a
stream's task and was only enabled when built in development mode,
but 1) it is never tested and 2) if it had been tested it would have
been noticed that it triggers a bit too easily on the CLI. Let's get
rid of this old one.
2019-05-20 16:19:40 +02:00
Willy Tarreau
91e6df01fa MINOR: threads: add each thread's clockid into the global thread_info
This is the per-thread CPU runtime clock, it will be used to measure
the CPU usage of each thread and by the lockup detection mechanism. It
must only be retrieved at the beginning of run_thread_poll_loop() since
the thread must already have been started for this. But it must be done
before performing any per-thread initcall so that all thread init
functions have access to the clock ID.

Note that it could make sense to always have this clockid available even
in non-threaded situations and place the process' clock there instead.
But it would add portability issues which are currently easy to deal
with by disabling threads so it may not be worth it for now.
2019-05-20 11:42:25 +02:00
Willy Tarreau
522cfbc1ea MINOR: init/threads: make the global threads an array of structs
This way we'll be able to store more per-thread information than just
the pthread pointer. The storage became an array of struct instead of
an allocated array since it's very small (typically 512 bytes) and not
worth the hassle of dealing with memory allocation on this. The array
was also renamed thread_info to make its intended usage more explicit.
2019-05-20 11:37:57 +02:00
Willy Tarreau
64a47b943c CLEANUP: memory: make the fault injection code use the OTHER_LOCK label
The mem_should_fail() function sets a lock while it's building its
messages, and when this was done there was no relevant label available
hence the confusing use of START_LOCK. Now OTHER_LOCK is available for
such use cases, so let's switch to this one instead as START_LOCK is
going to disappear.
2019-05-20 11:26:12 +02:00
Willy Tarreau
619a95f5ad MEDIUM: init/mworker: make the pipe register function a regular initcall
Now that we have the guarantee that init calls happen before any other
thread starts, we don't need anymore the workaround installed by commit
1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on startup") and we
can instead rely on a regular per-thread initcall for this function. It
will only be performed on worker thread #0, the other ones and the master
have nothing to do, just like in the original code that was only moved
to the function.
2019-05-20 11:26:12 +02:00
Willy Tarreau
3078e9f8e2 MINOR: threads/init: synchronize the threads startup
It's a bit dangerous to let threads initialize at different speeds on
startup. Some are still in their init functions while others area already
running. It was even subject to some race condition bugs like the one
fixed by commit 1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on
startup").

Here in order to secure all this, we take a very simplistic approach
consisting in using half of the rendez-vous point, which is made
exactly for this purpose : we first initialize the mask of the threads
requesting a rendez-vous to the mask of all threads, and we simply call
thread_release() once the init is complete. This guarantees that no
thread will go further than the initialization code during this time.

This could even safely be backported if any other issue related to an
init race was discovered in a stable release.
2019-05-20 11:26:12 +02:00
William Lallemand
7b302d8dd5 MINOR: init: setenv HAPROXY_CFGFILES
Set the HAPROXY_CFGFILES environment variable which contains the list of
configuration files used to start haproxy, separated by semicolon.
2019-05-20 11:21:00 +02:00
Willy Tarreau
c7091d89ae MEDIUM: debug/threads: implement an advanced thread dump system
The current "show threads" command was too limited as it was not possible
to dump other threads' detailed states (e.g. their tasks). This patch
goes further by using thread signals so that each thread can dump its
own state in turn into a shared buffer provided by the caller. Threads
are synchronized using a mechanism very similar to the rendez-vous point
and using this method, each thread can safely dump any of its contents
and the caller can finally report the aggregated ones from the buffer.

It is important to keep in mind that the list of signal-safe functions
is limited, so we take care of only using chunk_printf() to write to a
pre-allocated buffer.

This mechanism is enabled by USE_THREAD_DUMP and is enabled by default
on Linux 2.6.28+. On other platforms it falls back to the previous
solution using the loop and the less precise dump.
2019-05-17 17:16:20 +02:00
Willy Tarreau
0ad46fa6f5 MINOR: stream: detach the stream from its own task on stream_free()
This makes sure that the stream is not visible from its own task just
before starting to free some of its components. This way we have the
guarantee that a stream found in a task list is totally valid and can
safely be dereferenced.
2019-05-17 17:16:20 +02:00
Willy Tarreau
01f3489752 MINOR: task: put barriers after each write to curr_task
This one may be watched by signal handlers, we don't want the compiler
to optimize its assignment away at the end of the loop and leave some
wandering pointers there.
2019-05-17 17:16:20 +02:00
Willy Tarreau
38171daf21 MINOR: thread: implement ha_thread_relax()
At some places we're using a painful ifdef to decide whether to use
sched_yield() or pl_cpu_relax() to relax in loops, this is hardly
exportable. Let's move this to ha_thread_relax() instead and une
this one only.
2019-05-17 17:16:20 +02:00
Willy Tarreau
20db9115dc BUG/MINOR: debug: don't check the call date on tasklets
tasklets don't have a call date, so when a tasklet is cast into a task
and is present at the end of a page we run a risk of dereferencing
unmapped memory when dumping them in ha_task_dump(). This commit
simplifies the test and uses to distinct calls for tasklets and tasks.
No backport is needed.
2019-05-17 17:16:20 +02:00
Willy Tarreau
5cf64dd1bd MINOR: debug: make ha_thread_dump() and ha_task_dump() take a buffer
Instead of having them dump into the trash and initialize it, let's have
the caller initialize a buffer and pass it. This will be convenient to
dump multiple threads at once into a single buffer.
2019-05-17 17:16:20 +02:00
Willy Tarreau
14a1ab75d0 BUG/MINOR: debug: make ha_task_dump() actually dump the requested task
It used to only dump the current task, which isn't different for now
but the purpose clearly is to dump the requested task. No backport is
needed.
2019-05-17 17:16:20 +02:00
Willy Tarreau
231ec395c1 BUG/MINOR: debug: make ha_task_dump() always check the task before dumping it
For now it cannot happen since we're calling it from a task but it will
break with signals. No backport is needed.
2019-05-17 17:16:20 +02:00
Olivier Houchard
6db1699f77 BUG/MEDIUM: streams: Try to L7 retry before aborting the connection.
In htx_wait_for_response, in case of error, attempt a L7 retry before
aborting the connection if the TX_NOT_FIRST flag is set.
If we don't do that, then we wouldn't attempt L7 retries after the first
request, or if we use HTTP/2, as with HTTP/2 that flag is always set.
2019-05-17 15:49:21 +02:00
Olivier Houchard
ce1a0292bf BUG/MEDIUM: streams: Don't use CF_EOI to decide if the request is complete.
In si_cs_send(), don't check CF_EOI on the request channel to decide if the
request is complete and if we should save the buffer to eventually attempt
L7 retries. The flag may not be set yet, and it may too be set to early,
before we're done modifying the buffer. Instead, get the msg, and make sure
its state is HTTP_MSG_DONE.
That way we will store the request buffer when sending it even in H2.
2019-05-17 15:49:21 +02:00
Willy Tarreau
4e2b646d60 MINOR: cli/debug: add a thread dump function
The new function ha_thread_dump() will dump debugging info about all known
threads. The current thread will contain a bit more info. The long-term goal
is to make it possible to use it in signal handlers to improve the accuracy
of some dumps.

The function dumps its output into the trash so as it was trivial to add,
a new "show threads" command appeared on the CLI.
2019-05-16 18:06:45 +02:00
Willy Tarreau
58d9621fc8 MINOR: cli/activity: show the dumping thread ID starting at 1
Both the config and gdb report thread IDs starting at 1, so better do the
same in "show activity" to limit confusion. We also display the full
permitted range.

This could be backported to 1.9 since it was present there.
2019-05-16 18:02:03 +02:00
Tim Duesterhus
3506dae342 MEDIUM: Make 'resolution_pool_size' directive fatal
This directive never appeared in a stable release and instead was
introduced and deprecated within 1.8-dev. While it technically could
be outright removed we detect it and error out for good measure.
2019-05-16 18:02:03 +02:00
Tim Duesterhus
10c6c16cde MEDIUM: Make 'option forceclose' actually warn
It is deprecated since 315b39c391 (1.9-dev),
but only was deprecated in the docs.

Make it warn when being used and remove it from the docs.
2019-05-16 18:02:03 +02:00
Christopher Faulet
c1f40dd492 BUG/MINOR: http_fetch: Rely on the smp direction for "cookie()" and "hdr()"
A regression was introduced in the commit 89dc49935 ("BUG/MAJOR: http_fetch: Get
the channel depending on the keyword used") on the samples "cookie()" and
"hdr()". Unlike other samples manipulating the HTTP headers, these ones depend
on the sample direction. To fix the bug, these samples use now their own
functions. Depending on the sample direction, they call smp_fetch_cookie() and
smp_fetch_hdr() with the appropriate keyword.

Thanks to Yves Lafon to report this issue.

This patch must be backported wherever the commit 89dc49935 was backported. For
now, 1.9 and 1.8.
2019-05-16 11:31:28 +02:00
Olivier Houchard
35d116885d MINOR: connections: Use BUG_ON() to enforce rules in subscribe/unsubscribe.
It is not legal to subscribe if we're already subscribed, or to unsubscribe
if we did not subscribe, so instead of trying to handle those cases, just
assert that it's ok using the new BUG_ON() macro.
2019-05-14 18:18:25 +02:00
Olivier Houchard
00b8f7c60b MINOR: h1: Use BUG_ON() to enforce rules in subscribe/unsubscribe.
It is not legal to subscribe if we're already subscribed, or to unsubscribe
if we did not subscribe, so instead of trying to handle those cases, just
assert that it's ok using the new BUG_ON() macro.
2019-05-14 18:18:25 +02:00
Olivier Houchard
f8338151a3 MINOR: h2: Use BUG_ON() to enforce rules in subscribe/unsubscribe.
It is not legal to subscribe if we're already subscribed, or to unsubscribe
if we did not subscribe, so instead of trying to handle those cases, just
assert that it's ok using the new BUG_ON() macro.
2019-05-14 18:18:25 +02:00
Christopher Faulet
fa922f03a3 BUG/MEDIUM: mux-h2: Set EOI on the conn_stream during h2_rcv_buf()
Just like CS_FL_REOS previously, the CS_FL_EOI flag is abused as a proxy
for H2_SF_ES_RCVD. The problem is that this flag is consumed by the
application layer and is set immediately when an end of stream was met,
which is too early since the application must retrieve the rxbuf's
contents first. The effect is that some transfers are truncated (mostly
the first one of a connection in most tests).

The problem of mixing CS flags and H2S flags in the H2 mux is not new
(and is currently being addressed) but this specific one was emphasized
in commit 63768a63d ("MEDIUM: mux-h2: Don't mix the end of the message
with the end of stream") which was backported to 1.9. Note that other
flags, particularly CS_FL_REOS still need to be asynchronously reported,
though their impact seems more limited for now.

This patch makes sure that all internal uses of CS_FL_EOI are replaced
with a test on H2_SF_ES_RCVD (as there is a 1-to-1 equivalence) and that
CS_FL_EOI is only reported once the rxbuf is empty.

This should ideally be backported to 1.9 unless it causes too much
trouble due to the recent changes in this area, as 1.9 *seems* not
to be directly affected by this bug.
2019-05-14 15:47:57 +02:00
Willy Tarreau
99ad1b3e8c MINOR: mux-h2: stop relying on CS_FL_REOS
This flag was introduced early in 1.9 development (a3f7efe00) to report
the fact that the rxbuf that was present on the conn_stream was followed
by a shutr. Since then the rxbuf moved from the conn_stream to the h2s
(638b799b0) but the flag remained on the conn_stream. It is problematic
because some state transitions inside the mux depend on it, thus depend
on the CS, and as such have to test for its existence before proceeding.

This patch replaces the test on CS_FL_REOS with a test on the only
states that set this flag (H2_SS_CLOSED, H2_SS_HREM, H2_SS_ERROR).
The few places where the flag was set were removed (the flag is not
used by the data layer).
2019-05-14 15:47:57 +02:00
Willy Tarreau
4c688eb8d1 MINOR: mux-h2: add macros to check multiple stream states at once
At many places we need to test for several stream states at once, let's
have macros to make a bit mask from a state to ease this.
2019-05-14 15:47:57 +02:00
Willy Tarreau
f8fe3d63f0 CLEANUP: mux-h2: don't test for impossible CS_FL_REOS conditions
This flag is currently set when an incoming close was received, which
results in the stream being in either H2_SS_HREM, H2_SS_CLOSED, or
H2_SS_ERROR states, so let's remove the test for the OPEN and HLOC
cases.
2019-05-14 15:47:57 +02:00
Willy Tarreau
3cf69fe6b2 BUG/MINOR: mux-h2: make sure to honor KILL_CONN in do_shut{r,w}
If the stream closes and quits while there's no room in the mux buffer
to send an RST frame, next time it is attempted it will not lead to
the connection being closed because the conn_stream will have been
released and the KILL_CONN flag with it as well.

This patch reserves a new H2_SF_KILL_CONN flag that is copied from
the CS when calling shut{r,w} so that the stream remains autonomous
on this even when the conn_stream leaves.

This should ideally be backported to 1.9 though it depends on several
previous patches that may or may not be suitable for backporting. The
severity is very low so there's no need to insist in case of trouble.
2019-05-14 15:47:57 +02:00
Willy Tarreau
aebbe5ef72 MINOR: mux-h2: make h2s_wake_one_stream() not depend on temporary CS flags
In h2s_wake_one_stream() we used to rely on the temporary flags used to
adjust the CS to determine the new h2s state. This really is not convenient
and creates far too many dependencies. This commit just moves the same
condition to the places where the temporary flags were set so that we
don't have to rely on these anymore. Whether these are relevant or not
was not the subject of the operation, what matters was to make sure the
conditions to adjust the stream's state and the CS's flags remain the
same. Later it could be studied if these conditions are correct or not.
2019-05-14 15:47:57 +02:00
Willy Tarreau
13b6c2e8b3 MINOR: mux-h2: make h2s_wake_one_stream() the only function to deal with CS
h2s_wake_one_stream() has access to all the required elements to update
the connstream's flags and figure the necessary state transitions, so
let's move the conditions there from h2_wake_some_streams().
2019-05-14 15:47:57 +02:00
Willy Tarreau
234829111f MINOR: mux-h2: make h2_wake_some_streams() not depend on the CS flags
It's problematic to have to pass some CS flags to this function because
that forces some h2s state transistions to update them just in time
while some of them are supposed to only be updated during I/O operations.

As a first step this patch transfers the decision to pass CS_FL_ERR_PENDING
from the caller to the leaf function h2s_wake_one_stream(). It is easy
since this is the only flag passed there and it depends on the position of
the stream relative to the last_sid if it was set.
2019-05-14 15:47:57 +02:00
Willy Tarreau
c3b1183f57 MINOR: mux-h2: remove useless test on stream ID vs last in wake function
h2_wake_some_streams() first looks up streams whose IDs are greater than
or equal to last+1, then checks if the id is lower than or equal to last,
which by definition will never match. Let's remove this confusing leftover
from ancient code.
2019-05-14 15:47:57 +02:00
William Lallemand
920fc8bbe4 BUG/MINOR: mworker: use after free when the PID not assigned
Commit 4528611 ("MEDIUM: mworker: store the leaving state of a process")
introduced a bug in the mworker_env_to_proc_list() function.

This is very unlikely to occur since the PID should always be assigned.
It can probably happen if the environment variable is corrupted.

No backport needed.
2019-05-14 11:28:16 +02:00
Willy Tarreau
f983d00a1c BUG/MINOR: mux-h2: make the do_shut{r,w} functions more robust against retries
These functions may fail to emit an RST or an empty DATA frame because
the mux is full or busy. Then they subscribe the h2s and try again.
However when doing so, they will already have marked the error state on
the stream and will not pass anymore through the sequence resulting in
the failed frame to be attempted to be sent again nor to the close to
be done, instead they will return a success.

It is important to only leave when the stream is already closed, but
to go through the whole sequence otherwise.

This patch should ideally be backported to 1.9 though it's possible that
the lack of the WANT_SHUT* flags makes this difficult or dangerous. The
severity is low enough to avoid this in case of trouble.
2019-05-14 11:13:06 +02:00
Frédéric Lécaille
90a10aeb65 BUG/MINOR: log: Wrong log format initialization.
This patch fixes an issue introduced by 0bad840b commit
"MINOR: log: Extract some code to send syslog messages" which leaded
to wrong log format variable initializations at least for "short" and "raw" format.
This commit skipped the cases where even if passed to __do_send_log(), the
syslog tag and syslog pid string must not be used to format the log message
with "short" and "raw". This is done iniatilizing "tag_max" and "pid_max"
variables (the lengths of the tag and pid strings) to 0, then updating to them to
the length of the tag and pid strings passed as variables to __do_send_log()
depending on the log format and in every cases using this length for the iovec
variable used to send() the log.

This bug is specific to 2.0.
2019-05-14 11:12:00 +02:00
Willy Tarreau
8bdb5c9bb4 CLEANUP: connection: remove the handle field from the wait_event struct
It was only set and not consumed after the previous change. The reason
is that the task's context always contains the relevant information,
so there is no need for a second pointer.
2019-05-13 19:14:52 +02:00
Willy Tarreau
88bdba31fa CLEANUP: mux-h2: simply use h2s->flags instead of ret in h2_deferred_shut()
This one used to rely on the combined return statuses of the shutr/w
functions but now that we have the H2_SF_WANT_SHUT{R,W} flags we don't
need this anymore if we properly remove these flags after their operations
succeed. This is what this patch does.
2019-05-13 19:14:52 +02:00
Willy Tarreau
2c249ebc75 MINOR: mux-h2: add two H2S flags to report the need for shutr/shutw
Currently when a shutr/shutw fails due to lack of buffer space, we abuse
the wait_event's handle pointer to place up to two bits there in addition
to the original pointer. This pointer is not used for anything but this
and overall the intent becomes clearer with h2s flags than with these
two alien bits in the pointer, so let's use clean flags now.
2019-05-13 19:14:52 +02:00
Willy Tarreau
c234ae38f8 CLEANUP: mux-h2: use LIST_ADDED() instead of LIST_ISEMPTY() where relevant
Lots of places were using LIST_ISEMPTY() to detect if a stream belongs
to one of the send lists or to detect if a connection was already
waiting for a buffer or attached to an idle list. Since these ones are
not list heads but list elements, let's use LIST_ADDED() instead.
2019-05-13 19:14:52 +02:00