Commit Graph

2947 Commits

Author SHA1 Message Date
Willy Tarreau
f0cea1ee3f MINOR: tasks: extend the state bits from 8 to 16 and remove the reason
By removing the reason code for the wakeup we can gain 8 extra bits to
encode the task's state. The reason code was never used at all and is
wrong by design since subsequent calls will OR this value anyway. Let's
say it goodbye and leave the room for more precious bits. The woken bits
were moved to the higher byte so that the most important bits can stay
grouped together.
2018-07-26 16:13:00 +02:00
Willy Tarreau
7999bfbfd3 MEDIUM: buffers: make b_xfer() automatically swap buffers when possible
Whenever it's possible to avoid a copy, b_xfer() will simply swap the
buffer's heads without touching the data. This has brought the performance
back from 140 kH/s to 202 kH/s on the test case.
2018-07-20 19:21:43 +02:00
Willy Tarreau
11c9aa424e MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
2018-07-20 19:21:43 +02:00
Willy Tarreau
5e1cc5ea83 MINOR: conn_stream: add an rx buffer to the conn_stream
In order to reorganize the connection layers, recv() operations will
need to be retryable and to support partial transfers. This requires
an intermediary buffer to hold the data coming from the mux. After a
few attempts, it turns out that this buffer is best placed inside the
conn_stream itself. For now it's only set to buf_empty and it will be
up to the caller to allocate it if required.
2018-07-20 19:21:43 +02:00
Willy Tarreau
a3f7efe009 MINOR: conn_stream: add a new CS_FL_REOS flag
This flag indicates that the mux layer has already detected an end of
stream which will become CS_FL_EOS during a recv() once the rx buffer
is empty.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f148888d19 MINOR: buffers: add b_xfer() to transfer data between buffers
Instead of open-coding buffer-to-buffer transfers using blocks, let's
have a dedicated function for this. It also adjusts the buffer counts.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f7d0447376 MINOR: buffers: split b_putblk() into __b_putblk()
The latter function is more suited to operations that don't require any
check because the check has already been performed. It will be used by
other b_* functions.
2018-07-20 19:21:43 +02:00
Willy Tarreau
ab322d4fd4 MINOR: buffers: simplify b_contig_space()
This function is used a lot in block copies and is needlessly
complicated since it still uses pointer arithmetic. Let's fall
back to regular offsets and simplify it. This removed around
23 bytes from b_putblk() and it removed any conditional jump.
2018-07-20 19:21:43 +02:00
Christopher Faulet
ddb6c16576 BUG/MEDIUM: threads: Fix the exit condition of the thread barrier
In thread_sync_barrier, we exit when all threads have set their own bit in the
barrier mask. It is done by comparing it to all_threads_mask. But we must not
use a simple equality to do so, becaue all_threads_mask may change. Since commit
ba86c6c25 ("MINOR: threads: Be sure to remove threads from all_threads_mask on
exit"), when a thread exit, its bit is removed from all_threads_mask. Instead,
we must use a bitwise AND to test is all bits of all_threads_mask are set.

This also requires that all_threads_mask is set to volatile if we want to
catch changes.

This patch must be backported in 1.8.
2018-07-20 14:24:41 +02:00
Christopher Faulet
20761453fb MINOR: ist: Add the function isteqi
This new function does the same as isteq, but ignoring the case.
2018-07-20 13:39:30 +02:00
Willy Tarreau
8318885487 MINOR: connection: simplify subscription by adding a registration function
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
2018-07-19 18:31:07 +02:00
Olivier Houchard
910b2bc829 MEDIUM: connections/mux: Revamp the send direction.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
2018-07-19 18:31:07 +02:00
Olivier Houchard
6ff2039d13 MINOR: connections/mux: Add a new "subscribe" method.
Add a new "subscribe" method for connection, conn_stream and mux, so that
upper layer can subscribe to them, to be called when the event happens.
Right now, the only event implemented is "SUB_CAN_SEND", where the upper
layer can register to be called back when it is possible to send data.

The connection and conn_stream got a new "send_wait_list" entry, which
required to move a few struct members around to maintain an efficient
cache alignment (and actually this slightly improved performance).
2018-07-19 16:23:43 +02:00
Olivier Houchard
e17c2d3e57 MINOR: tasklets: Don't attempt to add a tasklet in the list twice.
Don't try to add a tasklet to the run queue if it's already in there, or we
might get an infinite loop.
2018-07-19 16:23:43 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
2018-07-19 16:23:43 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
bd1dba8a89 MINOR: buffer: rename the data length member to '->data'
It used to be called 'len' during the reorganisation but strictly speaking
it's not a length since it wraps. Also we already use '_data' as the suffix
to count available data, and data is also what we use to indicate the amount
of data in a pipe so let's improve consistency here. It was important to do
this in two operations because data used to be the name of the pointer to
the storage area.
2018-07-19 16:23:43 +02:00
Willy Tarreau
e3128024bf MINOR: buffer: replace buffer_replace2() with b_rep_blk()
This one is more generic and designed to work on a random block. It
may later get a b_rep_ist() variant since many strings are already
available as (ptr,len).
2018-07-19 16:23:43 +02:00
Willy Tarreau
4d893d440c MINOR: buffers/channel: replace buffer_insert_line2() with ci_insert_line2()
There was no point keeping that function in the buffer part since it's
exclusively used by HTTP at the channel level, since it also automatically
appends the CRLF. This further cleans up the buffer code.
2018-07-19 16:23:43 +02:00
Willy Tarreau
7b04cc4467 CLEANUP: buffer: minor cleanups to buffer.h
Remove a few unused functions and add some comments to split the file
parts in sections.
2018-07-19 16:23:43 +02:00
Willy Tarreau
911f7dd893 MINOR: buffers: remove b_putstr()
It's not needed anymore.
2018-07-19 16:23:43 +02:00
Willy Tarreau
ea1b06d5bb MINOR: buffer: add a new file for ist + buffer manipulation functions
The new file istbuf.h links the indirect strings (ist) with the buffers.
The purpose is to encourage addition of more standard buffer manipulation
functions that rely on this in order to improve the overall ease of use
along all the code. Just like ist.h and buf.h, this new file is not
expected to depend on anything beyond these two files.

A few functions were added and/or converted from buffer.h :
  - b_isteq()  : indicates if a buffer and a string match
  - b_isteat() : consumes a string from the buffer if it matches
  - b_istput() : appends a small string to a buffer (all or none)
  - b_putist() : appends part of a large string to a buffer

The equivalent functions were removed from buffer.h and changed at the
various call places.
2018-07-19 16:23:43 +02:00
Willy Tarreau
55372f646f MINOR: buffer: replace b{i,o}_put* with b_put*
The two variants now do exactly the same (appending at the tail of the
buffer) so let's not keep the distinction between these classes of
functions and have generic ones for this. It's also worth noting that
b{i,o}_putchk() wasn't used at all and was removed.
2018-07-19 16:23:43 +02:00
Willy Tarreau
72a100b386 MINOR: buffer: replace bi_fast_delete() with b_del()
There's no distinction between in and out data now. The latter covers
the needs of the former and supports wrapping. The extra cost is
negligible given the locations where it's used.
2018-07-19 16:23:43 +02:00
Olivier Houchard
08afac0fd7 MEDIUM: buffers: move "output" from struct buffer to struct channel
Since we never access this field directly anymore, but only through the
channel's wrappers, it can now move to the channel. The buffers are now
completely free from the distinction between input and output data.
2018-07-19 16:23:43 +02:00
Willy Tarreau
892f1dbe4f MINOR: buffer: rename the "data" field to "area"
Since we use "_data" for the amount of data at many places, as opposed to
"_space" for the amount of space, let's rename the "data" field to "area"
so that we can reuse "data" later for the amount of data in the buffer
(currently called "len" despite not being contigous).
2018-07-19 16:23:43 +02:00
Willy Tarreau
f6dfd88a92 MINOR: buffer: b_set_data() doesn't truncate output data anymore
b_set_data() is used :
  - in proto_http and hlua to trim input data (b_set_data(co_data()))
  - in SPOE to append data to a buffer while building a message

In no case will this truncate a buffer so we can safely remove the
test for len < b->output.
2018-07-19 16:23:43 +02:00
Willy Tarreau
abed1e7f34 MINOR: buffer: remove the check for output on b_del()
b_del() is used in :
  - mux_h2 with the demux buffer : always processes input data
  - checks with output data though output is not considered at all there
  - b_eat() which is not used anywhere
  - co_skip() where the len is always <= output

Thus the distinction for output data is not needed anymore and the
decrement can be made inconditionally in co_skip().
2018-07-19 16:23:43 +02:00
Willy Tarreau
d54a8ceb97 MAJOR: start to change buffer API
This is intentionally the minimal and safest set of changes, some cleanups
area still required. These changes are quite tricky and cannot be
independantly tested, so it's important to keep this patch as bisectable
as possible.

buf_empty and buf_wanted were changed and are now exactly similar since
there's no <p> member in the structure anymore. Given that no test is
ever made in the code to check that buf == &buf_wanted, it may be possible
that we don't need to have two anymore, unless some buf_empty tests have
precedence. This will have to be investigated.

A significant part of this commit affects the HTTP compression code,
which used to deeply manipulate the input and output buffers without
any reasonable solution for a better abstraction. For this reason, if
any regression is met and designates this patch as the culprit, it is
important to run tests which specifically involve compression or which
definitely don't use it in order to spot the issue.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2018-07-19 16:23:42 +02:00
Willy Tarreau
523cc5d506 MINOR: buffer: convert part bo_putblk() and bi_putblk() to the new API
These functions are pretty similar and will be merged at the end of the
migration. For now they still need to remain distinct.
2018-07-19 16:23:42 +02:00
Willy Tarreau
fdabbe243d MINOR: buffer: remove unused bo_add()
We don't need this function anymore.
2018-07-19 16:23:42 +02:00
Willy Tarreau
cd9e60db00 MEDIUM: channel: adapt to the new buffer API
Also, ci_swpbuf() was removed (unused).
2018-07-19 16:23:42 +02:00
Olivier Houchard
d4251a7e98 MINOR: channel: Add co_set_data().
Add a new function that lets one set the channel's output amount.
2018-07-19 16:23:42 +02:00
Willy Tarreau
3ee8344b7b MINOR: channel: remove almost all references to buf->i and buf->o
We use ci_data() and co_data() instead now everywhere we read these
values.
2018-07-19 16:23:42 +02:00
Willy Tarreau
591d445049 MINOR: buffer: use b_orig() to replace most references to b->data
This patch updates most users of b->data to use b_orig().
2018-07-19 16:23:42 +02:00
Willy Tarreau
50227f9b88 MINOR: buffer: use c_head() instead of buffer_wrap_sub(c->buf, p-o)
This way we don't need o anymore.
2018-07-19 16:23:42 +02:00
Willy Tarreau
144c5c4d21 MINOR: buffer: replace buffer_flush() with c_adv(chn, ci_data(chn))
It used to forward some input into output.
2018-07-19 16:23:41 +02:00
Willy Tarreau
5ba65521a3 MINOR: buffer: replace buffer_pending() with ci_data()
It used to return b->i for channels, which is what ci_data() does.
2018-07-19 16:23:41 +02:00
Willy Tarreau
3f6799975f MINOR: buffer: replace bi_space_for_replace() with ci_space_for_replace()
This one computes the size that can be overwritten over the input part
of the buffer, so it's channel-specific.
2018-07-19 16:23:41 +02:00
Willy Tarreau
2375233ef0 MINOR: buffer: replace buffer_full() with channel_full()
It's only used by channels since we need to know the amount of output
data.
2018-07-19 16:23:41 +02:00
Willy Tarreau
271e2a503d MINOR: buffer: make bo_putchar() use b_tail()
It's possible because we can't call bo_putchar() with i != 0.
2018-07-19 16:23:41 +02:00
Willy Tarreau
0c7ed5d264 MINOR: buffer: replace buffer_empty() with b_empty() or c_empty()
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
d760eecf61 MINOR: buffer: replace buffer_not_empty() with b_data() or c_data()
It's mostly for consistency as many places already use one of these instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
eac5259888 MINOR: buffer: use b_room() to determine available space in a buffer
We used to have variations around buffer_total_space() and
size-buffer_len() or size-b_data(). Let's simplify all this. buffer_len()
was also removed as not used anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
bc59f359dc MINOR: buffer: get rid of b_ptr() and convert its last users
Now the new API functions are being used everywhere, we can get rid
of b_ptr(). A few last users like bi_istput() and bo_istput() appear
to only differ by what part of the buffer they're increasing, but
that should quickly be merged.
2018-07-19 16:23:41 +02:00
Willy Tarreau
337ea57cfc MINOR: connection: add a new receive flag : CO_RFL_BUF_WET
With this flag we introduce the notion of "dry" vs "wet" buffers : some
demultiplexers like the H2 mux require as much room as possible for some
operations that are not retryable like decoding a headers frame. For this
they need to know if the buffer is congested with data scheduled for
leaving soon or not. Since the new API will not provide this information
in the buffer itself, the caller must indicate it. We never need to know
the amount of such data, just the fact that the buffer is not in its
optimal condition to be used for receipt. This "CO_RFL_BUF_WET" flag is
used to mention that such outgoing data are still pending in the buffer
and that a sensitive receiver should better let it "dry" before using it.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7f3225f251 MINOR: connection: add a flags argument to rcv_buf()
The mux and transport rcv_buf() now takes a "flags" argument, just like
the snd_buf() one or like the equivalent syscall lower part. The upper
layers will use this to pass some information such as indicating whether
the buffer is free from outgoing data or if the lower layer may allocate
the buffer itself.
2018-07-19 16:23:41 +02:00
Willy Tarreau
d9cf540457 MEDIUM: mux: make mux->rcv_buf() take a size_t for the count
It also returns a size_t. This is in order to clean the API. Note
that the H2 mux still uses some ints in the functions called from
h2_rcv_buf(), though it's not really a problem given that H2 frames
are smaller. It may deserve a general cleanup later though.
2018-07-19 16:23:41 +02:00
Willy Tarreau
bfc4d77ad3 MEDIUM: connection: make xprt->rcv_buf() use size_t for the count
Just like we have a size_t for xprt->snd_buf(), we adjust to use size_t
for rcv_buf()'s count argument and return value. It also removes the
ambiguity related to the possibility to see a negative value there.
2018-07-19 16:23:41 +02:00
Willy Tarreau
deccd1116d MEDIUM: mux: make mux->snd_buf() take the byte count in argument
This way the mux doesn't need to modify the buffer's metadata anymore
nor to know the output's size. The mux->snd_buf() function now takes a
const buffer and it's up to the caller to update the buffer's state.

The return type was updated to return a size_t to comply with the count
argument.
2018-07-19 16:23:41 +02:00
Willy Tarreau
787db9a6a4 MEDIUM: connection: make xprt->snd_buf() take the byte count in argument
This way the senders don't need to modify the buffer's metadata anymore
nor to know about the output's split point. This way the functions can
take a const buffer and it's clearer who's in charge of updating the
buffer after a send. That's why the buffer realignment is now performed
by the caller of the transport's snd_buf() functions.

The return type was updated to return a size_t to comply with the count
argument.
2018-07-19 16:23:41 +02:00
Willy Tarreau
55f3ce1c91 MINOR: buffer: make b_getblk_nc() take size_t for the block sizes
Till now we used to reimplement it using ints to limit external changes
but we must adjust it and the various users to switch to size_t.
2018-07-19 16:23:41 +02:00
Willy Tarreau
206ba834ef MINOR: buffer: make b_getblk_nc() take const pointers
Now that there are no more users requiring to modify the buffer anymore,
switch these ones to const char and const buffer. This will make it more
obvious next time send functions are tempted to modify the buffer's output
count. Minor adaptations were necessary at a few call places which were
using char due to the function's previous prototype.
2018-07-19 16:23:41 +02:00
Willy Tarreau
5d7d1bbd0e MINOR: buffer: get rid of b_end() and b_to_end()
These ones are not used anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
f40e68227b MINOR: h1: make h1_measure_trailers() use an offset and a count
This will be needed by the H2 encoder to restart after wrapping.
2018-07-19 16:23:41 +02:00
Willy Tarreau
84d6b7af87 MINOR: h1: make h1_parse_chunk_size() not depend on b_ptr() anymore
It's similar to the previous commit so that the function doesn't rely
on buf->p anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
c0973c6742 MINOR: h1: make h1_skip_chunk_crlf() not depend on b_ptr() anymore
It now takes offsets relative to the buffer's head. It's up to the
callers to add this offset which corresponds to the buffer's output
size.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7314be8e2c MINOR: h1: make h1_measure_trailers() take the byte count in argument
The principle is that it should not have to take this value from the
buffer itself anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
e5f12ce7f2 MINOR: buffer: replace bi_del() and bo_del() with b_del()
Till now the callers had to know which one to call for specific use cases.
Let's fuse them now since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, just combine the two tests
by first removing output data then only input.
2018-07-19 16:23:40 +02:00
Willy Tarreau
a1f78fb652 MINOR: buffer: replace bo_getblk_nc() with b_getblk_nc() which takes an offset
This will be important so that we can parse a buffer without touching it.
Now we indicate where from the buffer's head we plan to start to copy, and
for how many bytes. This will be used by send functions to loop at the end
of the buffer without having to update the buffer's output byte count.
2018-07-19 16:23:40 +02:00
Willy Tarreau
90ed3836db MINOR: buffer: replace bo_getblk() with direction agnostic b_getblk()
This new functoin limits itself to the amount of data available in the
buffer and doesn't care about the direction anymore. It's only called
from co_getblk() which already checks that no more than the available
output bytes is requested.
2018-07-19 16:23:40 +02:00
Willy Tarreau
e4d5a036ed MINOR: buffer: merge b{i,o}_contig_space()
These ones were merged into a single b_contig_space() that covers both
(the bo_ case was a simplified version of the other one). The function
doesn't use ->i nor ->o anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
0e11d59af6 MINOR: buffer: remove bo_contig_data()
The two call places now make use of b_contig_data(0) and check by
themselves that the returned size is no larger than the scheduled
output data.
2018-07-19 16:23:40 +02:00
Willy Tarreau
8f9c72d301 MINOR: buffer: remove bi_end()
It was replaced by ci_tail() when the channel is known, or b_tail() in
other cases.
2018-07-19 16:23:40 +02:00
Willy Tarreau
41e38ac0ee MINOR: buffer: remove bo_end()
It was replaced by either b_tail() when the buffer has no input data, or
b_peek(b, b->o).
2018-07-19 16:23:40 +02:00
Willy Tarreau
89faf5d7c3 MINOR: buffer: remove bo_ptr()
It was replaced by co_head() when a channel was known, otherwise b_head().
2018-07-19 16:23:40 +02:00
Willy Tarreau
dda2e41881 MINOR: buffer: remove bi_ptr()
It's now been replaced by b_head() when b->o is null, ci_head() when
the channel is known, or b_peek(b, b->o) in other situations.
2018-07-19 16:23:40 +02:00
Willy Tarreau
7194d3cc3b MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
2018-07-19 16:23:40 +02:00
Willy Tarreau
d55fe397a0 MINOR: buffer: remove bi_getblk() and bi_getblk_nc()
These ones were relying on bi_ptr() and are not used. They may be
reimplemented later in the channel if needed.
2018-07-19 16:23:40 +02:00
Willy Tarreau
aa7af7213d MINOR: buffer: replace calls to buffer_space_wraps() with b_space_wraps()
And remove the unused function.
2018-07-19 16:23:40 +02:00
Willy Tarreau
bcbd39370f MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
2018-07-19 16:23:40 +02:00
Willy Tarreau
c0a51c51b1 MINOR: buffer: remove buffer_slow_realign() and the swap_buffer allocation code
Since all call places can use the trash now, this is not needed anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
fd8d42f496 MEDIUM: channel: make channel_slow_realign() take a swap buffer
The few call places where it's used can use the trash as a swap buffer,
which is made for this exact purpose. This way we can rely on the
generic b_slow_realign() call.
2018-07-19 16:23:40 +02:00
Willy Tarreau
4cf1300e6a MINOR: channel/buffer: replace buffer_slow_realign() with channel_slow_realign() and b_slow_realign()
Where relevant, the channel version is used instead. The buffer version
was ported to be more generic and now takes a swap buffer and the output
byte count to know where to set the alignment point. The H2 mux still
uses buffer_slow_realign() with buf->o but it will change later.
2018-07-19 16:23:40 +02:00
Willy Tarreau
d5b343bf9e MINOR: channel/buffer: use c_realign_if_empty() instead of buffer_realign()
This patch removes buffer_realign() and replaces it with c_realign_if_empty()
instead.
2018-07-19 16:23:40 +02:00
Willy Tarreau
08d5ac8f27 MINOR: channel: add a few basic functions for the new buffer API
This adds :
  - c_orig()  : channel buffer's origin
  - c_size()  : channel buffer's size
  - c_wrap()  : channel buffer's wrapping location
  - c_data()  : channel buffer's total data count
  - c_room()  : room left in channel buffer's
  - c_empty() : true if channel buffer is empty
  - c_full()  : true if channel buffer is full

  - c_ptr()   : pointer to an offset relative to input data in the buffer
  - c_adv()   : advances the channel's buffer (bytes become part of output)
  - c_rew()   : rewinds the channel's buffer (output bytes not output anymore)
  - c_realign_if_empty() : realigns the buffer if it's empty

  - co_data() : # of output data
  - co_head() : beginning of output data
  - co_tail() : end of output data
  - ci_data() : # of input data
  - ci_head() : beginning of input data
  - ci_tail() : end of input data
  - ci_stop() : location after ci_tail()
  - ci_next() : pointer to next input byte

And for the ci_* / co_* functions above, the "__*" variants which disable
wrapping checks, and the "_ofs" variants which return an offset relative to
the buffer's origin instead.
2018-07-19 16:23:39 +02:00
Willy Tarreau
f17f19f1a7 MINOR: buffer: introduce b_realign_if_empty()
Many places deal with buffer realignment after data removal. The method
is always the same : if the buffer is empty, set its pointer to the origin.
Let's have a function for this so that we have less code to change with the
new API.
2018-07-19 16:23:39 +02:00
Olivier Houchard
a04e40d578 MINOR: buffer: Add b_set_data().
Add a new function that lets you set the amount of input in a buffer.
For now it extends/truncates b->i except if the total length is
below b->o in which case it clears i and adjusts o.
2018-07-19 16:23:39 +02:00
Olivier Houchard
09138ecc49 MINOR: buffer: Introduce b_sub(), b_add(), and bo_add()
Instead of doing b->i -= directly, introduce b_sub(), that does the job, to
make it easier to switch to the future API.

Also add b_add(), that increases b->i, instead of using it directly, and
bo_add(), that does increase b->o.
2018-07-19 16:23:39 +02:00
Willy Tarreau
bbc68df330 MINOR: buffer: add a few basic functions for the new API
Here's the list of newly introduced functions :

- b_data(), returning the total amount of data in the buffer (currently i+o)

- b_orig(), returning the origin of the storage area, that is, the place of
  position 0.

- b_wrap(), pointer to wrapping point (currently data+size)

- b_size(), returning the size of the buffer

- b_room(), returning the amount of bytes left available

- b_full(), returning true if the buffer is full, otherwise false

- b_stop(), pointer to end of data mark (currently p+i), used to compute
  distances or a stop pointer for a loop.

- b_peek(), this one will help make the transition to the new buffer model.
  It returns a pointer to a position in the buffer known from an offest
  relative to the beginning of the data in the buffer. Thus, we can replace
  the following occurrences :

     bo_ptr(b)     => b_peek(b, 0);
     bo_end(b)     => b_peek(b, b->o);
     bi_ptr(b)     => b_peek(b, b->o);
     bi_end(b)     => b_peek(b, b->i + b->o);
     b_ptr(b, ofs) => b_peek(b, b->o + ofs);

- b_head(), pointer to the beginning of data (currently bo_ptr())

- b_tail(), pointer to first free place (currently bi_ptr())

- b_next() / b_next_ofs(), pointer to the next byte, taking wrapping
  into account.

- b_dist(), returning the distance between two pointers belonging to a buffer

- b_reset(), which resets the buffer

- b_space_wraps(), indicating if the free space wraps around the buffer

- b_almost_full(), indicating if 3/4 or more of the buffer are used

Some of these are provided with the unchecked variants using the "__"
prefix, or with the "_ofs" suffix indicating they return a relative
position to the buffer's origin instead of a pointer.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2018-07-19 16:23:39 +02:00
Willy Tarreau
506a29ac6e MINOR: buffer: switch buffer sizes and offsets to size_t
Passing unsigned ints everywhere is painful, and will cause some headache
later when we'll want to integrate better with struct ist which already
uses size_t. Let's switch buffers to use size_t instead.
2018-07-19 16:23:39 +02:00
Willy Tarreau
41806d1c52 MINOR: buffer: implement a new file for low-level buffer manipulation functions
The buffer code currently depends on pools and other stuff and is not
really autonomous anymore. The rewrite of the new API is an opportunity
to clean this up. This patch creates a new file (buf.h) which does not
depend on other elements and which will only contain what is needed to
perform the most basic buffer operations. The new API will be introduced
in this file and the conversion will be finished once buffer.h is empty.

The definition of struct buffer was moved to this new file, using more
explicity stdint types for the sizes and offsets.

Most new functions will be implemented in two variants :

  __b_something() : unchecked variant, no wrapping is expected
  b_something() : wrapping-checked variant

This way callers will be able to select which one to use depending on
the use cases.
2018-07-19 16:23:39 +02:00
Olivier Houchard
9ddaf794a8 MINOR: tasklet: Set process to NULL.
Some consumers expect the process to be NULL when a tasklet it created, so
do so.
2018-07-19 16:23:08 +02:00
Willy Tarreau
17b4aa1adc BUG/MINOR: ssl: properly ref-count the tls_keys entries
Commit 200b0fa ("MEDIUM: Add support for updating TLS ticket keys via
socket") introduced support for updating TLS ticket keys from the CLI,
but missed a small corner case : if multiple bind lines reference the
same tls_keys file, the same reference is used (as expected), but during
the clean shutdown, it will lead to a double free when destroying the
bind_conf contexts since none of the lines knows if others still use
it. The impact is very low however, mostly a core and/or a message in
the system's log upon old process termination.

Let's introduce some basic refcounting to prevent this from happening,
so that only the last bind_conf frees it.

Thanks to Janusz Dziemidowicz and Thierry Fournier for both reporting
the same issue with an easy reproducer.

This fix needs to be backported from 1.6 to 1.8.
2018-07-18 08:59:50 +02:00
Baptiste Assmann
8e2d9430c0 MINOR: dns: new DNS options to allow/prevent IP address duplication
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.

This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
2018-07-12 17:56:44 +02:00
Dave Chiluk
8618a6a5e2 MINOR: Some spelling cleanup in the comments.
Signed-off-by: Dave Chiluk <chiluk+haproxy@indeed.com>
2018-06-21 20:43:52 +02:00
Olivier Houchard
dcd6f3a597 MINOR: tasks: Make sure we correctly init and deinit a tasklet.
Up until now, a tasklet couldn't be free'd while it was in the list, it is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT,
instead of setting the pointers to NULL.
2018-06-14 18:57:13 +02:00
William Lallemand
6e1796e85d BUG/MINOR: signals: ha_sigmask macro for multithreading
The behavior of sigprocmask in an multithreaded environment is
undefined.

The new macro ha_sigmask() calls either pthreads_sigmask() or
sigprocmask() if haproxy was built with thread support or not.

This should be backported to 1.8.
2018-06-08 18:24:53 +02:00
Olivier Houchard
b1ca58b245 MINOR: tasks: Don't define rqueue if we're building without threads.
To make sure we don't inadvertently insert task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
2018-06-06 16:35:12 +02:00
Olivier Houchard
e13ab8b3c6 BUG/MEDIUM: tasks: Use the local runqueue when building without threads.
When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in prcoess_runnable_tasks().
This should fix haproxy when built without threads.
2018-06-06 16:34:52 +02:00
Willy Tarreau
10d81b8757 MINOR: applet: assign the same nice value to a new appctx as its owner task
When an applet is created, let's assign it the same nice value as the task
of the stream which owns it. It ensures that fairness is properly propagated
to applets, and that the CLI can regain a low latency behaviour again. Huge
differences have been seen under extreme loads, with the CLI being called
every 200 microseconds instead of 11 milliseconds.
2018-06-05 11:18:21 +02:00
David Carlier
caa8a37ffe MINOR: task: Fix a compiler warning by adding a cast.
When calling HA_ATOMIC_CAS with a pointer as the target, the compiler
expects a pointer as the new value, so give it one by casting 0x1 to
(void *).
2018-06-04 17:43:12 +02:00
Thierry FOURNIER
9d5422a4b7 MINOR: task/notification: Is notifications registered ?
This function returns true is some notifications are registered.

This function is usefull for the following patch

   BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock

It should be backported in 1.6, 1.7 and 1.8
2018-05-31 10:58:41 +02:00
Olivier Houchard
09eeb7684d BUG/MEDIUM: tasks: Don't forget to increase/decrease tasks_run_queue.
Don't forget to increase tasks_run_queue when we're adding a task to the
tasklet list, and to decrease it when we remove a task from a runqueue,
or its value won't be accurate, and could lead to tasks not being executed
when put in the global run queue.

1.9-dev only, no backport is needed.
2018-05-28 15:20:55 +02:00
Tim Duesterhus
3fd1973d37 MINOR: http: Log warning if (add|set)-header fails
This patch adds a warning if an http-(request|reponse) (add|set)-header
rewrite fails to change the respective header in a request or response.

This usually happens when tune.maxrewrite is not sufficient to hold all
the headers that should be added.
2018-05-28 14:53:59 +02:00
Olivier Houchard
673867c357 MAJOR: applets: Use tasks, instead of rolling our own scheduler.
There's no real reason to have a specific scheduler for applets anymore, so
nuke it and just use tasks. This comes with some benefits, the first one
being that applets cannot induce high latencies anymore since they share
nice values with other tasks. Later it will be possible to configure the
applets' nice value. The second benefit is that the applet scheduler was
not very thread-friendly, having a big lock around it in prevision of this
change. Thus applet-intensive workloads should now scale much better with
threads.

Some more improvement is possible now : some applets also use a task to
handle timers and timeouts. These ones could now be simplified to use only
one task.
2018-05-26 20:03:30 +02:00
Olivier Houchard
1599b80360 MINOR: tasks: Make the number of tasks to run at once configurable.
Instead of hardcoding 200, make the number of tasks to be run configurable
using tune.runqueue-depth. 200 is still the default.
2018-05-26 20:03:24 +02:00
Olivier Houchard
b0bdae7b88 MAJOR: tasks: Introduce tasklets.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.

For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidently be confused as not in the queue.

Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
2018-05-26 20:03:19 +02:00
Olivier Houchard
f6e6dc12cd MAJOR: tasks: Create a per-thread runqueue.
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.

The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
2018-05-26 19:27:29 +02:00
Olivier Houchard
9f6af33222 MINOR: tasks: Change the task API so that the callback takes 3 arguments.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
2018-05-26 19:23:57 +02:00
Willy Tarreau
0cd82e883e BUG/BUILD: threads: unbreak build without threads
A few users reported that building without threads was accidently broken
after commit 6b96f72 ("BUG/MEDIUM: pollers: Use a global list for fd
shared between threads.") due to all_threads_mask not being defined.
It's OK to set it to zero as other code parts do when threads are
enabled but only one thread is used.

This needs to be backported to 1.8.
2018-05-23 19:54:43 +02:00
Thierry Fournier
d5b073cf1f MINOR: lua: Improve error message
The function hlua_ctx_resume return less text message and more error
code. These error code allow the caller to return appropriate
message to the user.
2018-05-22 18:57:46 +02:00
Christopher Faulet
68db0235fd CLEANUP: spoe: Remove unused variables the agent structure
applets_act and applets_idle were used for debugging purpose. Now, these values
are part of the agent's counters.
2018-05-18 15:04:46 +02:00
Olivier Houchard
cb92f5cae4 MINOR: pollers: move polled_mask outside of struct fdtab.
The polled_mask is only used in the pollers, and removing it from the
struct fdtab makes it fit in one 64B cacheline again, on a 64bits machine,
so make it a separate array.
2018-05-06 06:27:34 +02:00
Olivier Houchard
6b96f7289c BUG/MEDIUM: pollers: Use a global list for fd shared between threads.
With the old model, any fd shared by multiple threads, such as listeners
or dns sockets, would only be updated on one threads, so that could lead
to missed event, or spurious wakeups.
To avoid this, add a global list for fd that are shared, using the same
implementation as the fd cache, and only remove entries from this list
when every thread as updated its poller.

[wt: this will need to be backported to 1.8 but differently so this patch
 must not be backported as-is]
2018-05-06 06:27:09 +02:00
Olivier Houchard
6a2cf8752c MINOR: fd: Make the lockless fd list work with multiple lists.
Modify fd_add_to_fd_list() and fd_rm_from_fd_list() so that they take an
offset in the fdtab to the list entry, instead of hardcoding the fd cache,
so we can use them with other lists.
2018-05-06 06:25:49 +02:00
Olivier Houchard
9b36cb4a41 BUG/MEDIUM: task: Don't free a task that is about to be run.
While running a task, we may try to delete and free a task that is about to
be run, because it's part of the local tasks list, or because rq_next points
to it.
So flag any task that is in the local tasks list to be deleted, instead of
run, by setting t->process to NULL, and re-make rq_next a global,
thread-local variable, that is modified if we attempt to delete that task.

Many thanks to PiBa-NL for reporting this and analysing the problem.

This should be backported to 1.8.
2018-05-04 20:11:04 +02:00
Willy Tarreau
760e81d356 MINOR: backend: implement random-based load balancing
For large farms where servers are regularly added or removed, picking
a random server from the pool can ensure faster load transitions than
when using round-robin and less traffic surges on the newly added
servers than when using leastconn.

This commit introduces "balance random". It internally uses a random as
the key to the consistent hashing mechanism, thus all features available
in consistent hashing such as weights and bounded load via hash-balance-
factor are usable. It is extremely convenient because one common concern
when using random is what happens when a server is hammered a bit too
much. Here that can trivially be avoided, like in the configuration below :

    backend bk0
        balance random
        hash-balance-factor 110
        server-template s 1-100 127.0.0.1:8000 check inter 1s

Note that while "balance random" internally relies on a hash algorithm,
it holds the same properties as round-robin and as such is compatible with
reusing an existing server connection with "option prefer-last-server".
2018-05-03 07:20:40 +02:00
Tim Duesterhus
e2b10bf491 MINOR: http: Add support for 421 Misdirected Request
This makes haproxy aware of HTTP 421 Misdirected Request, which
is defined in RFC 7540, section 9.1.2.
2018-04-28 07:03:39 +02:00
Aurlien Nephtali
abbf607105 MEDIUM: cli: Add payload support
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.

Per-command support will need to be added to take advantage of this
feature.

Signed-off-by: Aurlien Nephtali <aurelien.nephtali@corp.ovh.com>
2018-04-26 14:19:33 +02:00
Willy Tarreau
174b06a572 MINOR: h2: detect presence of CONNECT and/or content-length
We'll need this in order to support uploading chunks. The h2 to h1
converter checks for the presence of the content-length header field
as well as the CONNECT method and returns these information to the
caller. The caller indicates whether or not a body is detected for
the message (presence of END_STREAM or not). No transfer-encoding
header is emitted yet.
2018-04-26 10:15:14 +02:00
Olivier Houchard
302f9ef055 BUG/MEDIUM: connection: Make sure we have a mux before calling detach().
In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.

This should be backported to 1.8.
2018-04-13 16:02:21 +02:00
Christopher Faulet
48aa13f286 BUG/MEDIUM: threads: Fix the max/min calculation because of name clashes
With gcc < 4.7, when HAProxy is built with threads, the macros
HA_ATOMIC_CAS/XCHG/STORE relies on the legacy __sync builtins. These macros
are slightly complicated than the versions relying on the '_atomic'
builtins. Internally, some local variables are defined, prefixed with '__' to
avoid name clashes with the caller.

On the other hand, the macros HA_ATOMIC_UPDATE_MIN/MAX call HA_ATOMIC_CAS. Some
local variables are also definied in these macros, following the same naming
rule as below. The problem is that '__new' variable is used in
HA_ATOMIC_MIN/_MAX and in HA_ATOMIC_CAS. Obviously, the behaviour is undefined
because '__new' in HA_ATOMIC_CAS is left uninitialized. Unfortunatly gcc fails
to detect this error.

To fix the problem, all internal variables to macros are now suffixed with name
of the macros to avoid clashes (for instance, '__new_cas' in HA_ATOMIC_CAS).

This patch must be backported in 1.8.
2018-04-10 11:07:56 +02:00
Christopher Faulet
caf2feca62 MINOR: spoe: Add counters to log info about SPOE agents
In addition to metrics about time spent in the SPOE, following counters have
been added:

  * applets : number of SPOE applets.
  * idles : number of idle applets.
  * nb_sending : number of streams waiting to send data.
  * nb_waiting : number of streams waiting for a ack.
  * nb_processed : number of events/groups processed by the SPOE (from the
                   stream point of view).
  * nb_errors : number of errors during the processing (from the stream point of
                view).

Log messages has been updated to report these counters. Following pattern has
been added at the end of the log message:

    ... <idles>/<applets> <nb_sending>/<nb_waiting> <nb_error>/<nb_processed>
2018-04-05 15:13:54 +02:00
Christopher Faulet
7250b8fb5c MINOR: spoe: Add loggers dedicated to the SPOE agent
Now it is possible to configure a logger in a spoe-agent section using a "log"
line, as for a proxy. "no log", "log global" and "log <address> ..." syntaxes
are supported.
2018-04-05 15:13:54 +02:00
Christopher Faulet
28ac099907 MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries
With "log global" line, the global list of loggers are copied into the proxy's
struct. The list coming from the default section is also copied when a frontend
or a backend section is parsed. So it is possible to have duplicate entries in
the proxy's list. For instance, with this following config, all messages will be
logged twice:

    global
        log 127.0.0.1 local0 debug
        daemon

    defaults
        mode   http
        log    global
        option httplog

    frontend front-http
        log global
        bind *:8888
        default_backend back-http

    backend back-http
        server www 127.0.0.1:8000
2018-04-05 15:13:54 +02:00
Christopher Faulet
4b0b79dd56 MINOR: log: move 'log' keyword parsing in dedicated function
Now, the function parse_logsrv should be used to parse a "log" line. This
function will update the list of loggers passed in argument. It can release all
log servers when "no log" line was parsed (by the caller) or it can parse "log
global" or "log <address> ... " lines. It takes care of checking the caller
context (global or not) to prohibit "log global" usage in the global section.
2018-04-05 15:13:54 +02:00
Christopher Faulet
36bda1cd4a MINOR: spoe: Add options to store processing times in variables
"set-process-time" and "set-total-time" options have been added to store
processing times in the transaction scope, at each event and group processing,
the current one and the total one. So it is possible to get them.

TODO: documentation
2018-04-05 15:13:54 +02:00
Christopher Faulet
b2dd1e034c MINOR: spoe: Add metrics in to know time spent in the SPOE
Following metrics are added for each event or group of messages processed in the
SPOE:

  * processing time: the delay to process the event or the group. From the
                     stream point of view, it is the latency added by the SPOE
                     processing.
  * request time : It is the encoding time. It includes ACLs processing, if
                   any. For fragmented frames, it is the sum of all fragments.
  * queue time : the delay before the request gets out the sending queue. For
                 fragmented frames, it is the sum of all fragments.
  * waiting time: the delay before the reponse is received. No fragmentation
                  supported here.
  * response time: the delay to process the response. No fragmentation supported
                   here.
  * total time: (unused for now). It is the sum of all events or groups
                processed by the SPOE for a specific threads.

Log messages has been updated. Before, only errors was logged (status_code !=
0). Now every processing is logged, following this format:

  SPOE: [AGENT] <TYPE:NAME> sid=STREAM-ID st=STATUC-CODE reqT/qT/wT/resT/pT

where:

  AGENT              is the agent name
  TYPE               is EVENT of GROUP
  NAME               is the event or the group name
  STREAM-ID          is an integer, the unique id of the stream
  STATUS_CODE        is the processing's status code
  reqT/qT/wT/resT/pT are delays descrive above

For all these delays, -1 means the processing was interrupted before the end. So
-1 for the queue time means the request was never dequeued. For fragmented
frames it is harder to know when the interruption happened.

For now, messages are logged using the same logger than the backend of the
stream which initiated the request.
2018-04-05 15:13:53 +02:00
Olivier Houchard
8ef1a6b0d8 BUG/MINOR: fd: Don't clear the update_mask in fd_insert.
Clearing the update_mask bit in fd_insert may lead to duplicate insertion
of fd in fd_updt, that could lead to a write past the end of the array.
Instead, make sure the update_mask bit is cleared by the pollers no matter
what.

This should be backported to 1.8.
[wt: warning: 1.8 doesn't have the lockless fdcache changes and will
 require some careful changes in the pollers]
2018-04-03 19:38:15 +02:00
Willy Tarreau
b011d8f4c4 MINOR: mux: add a "show_fd" function to dump debugging information for "show fd"
This function will be called from the CLI's "show fd" command to append some
extra mux-specific information that only the mux handler can decode. This is
supposed to help collect various hints about what is happening when facing
certain anomalies.
2018-03-30 14:41:19 +02:00
Willy Tarreau
4037a3f904 MINOR: cli/threads: make "show fd" report thread_sync_io_handler instead of "unknown"
The output was confusing when the sync point's dummy handler was shown.

This patch should be backported to 1.8 to help with troubleshooting.
2018-03-28 18:06:47 +02:00
Emmanuel Hocdet
4952985b71 REORG: compact "struct server"
Move use_ssl (bool value) in "struct server" hole.
2018-03-21 05:04:01 +01:00
Emmanuel Hocdet
4399c75f6c MINOR: proxy-v2-options: add crc32c
This patch add option crc32c (PP2_TYPE_CRC32C) to proxy protocol v2.
It compute the checksum of proxy protocol v2 header as describe in
"doc/proxy-protocol.txt".
2018-03-21 05:04:01 +01:00
Emmanuel Hocdet
6afd898988 MINOR: hash: add new function hash_crc32c
This function will be used to perform CRC32c computations. This is
required to compute proxy protocol v2 CRC32C tlv (PP2_TYPE_CRC32C).
2018-03-21 05:04:01 +01:00
Willy Tarreau
26fb5d8449 BUG/MEDIUM: fd/threads: ensure the fdcache_mask always reflects the cache contents
Commit 4815c8c ("MAJOR: fd/threads: Make the fdcache mostly lockless.")
made the fd cache lockless, but after a few iterations, a subtle part was
lost, consisting in setting the bit on the fd_cache_mask immediately when
adding an event. Now it was done only when the cache started to process
events, but the problem it causes is that fd_cache_mask isn't reliable
anymore as an indicator of presence of events to be processed with no
delay outside of fd_process_cached_events(). This results in some spurious
delays when processing inter-thread wakeups between tasks. Just restoring
the flag when the event is added is enough to fix the problem.

Kudos to Christopher for spotting this one!

No backport is needed as this is only in the development version.
2018-03-20 19:14:24 +01:00
Christopher Faulet
5cd4bbd7ab BUG/MAJOR: threads/queue: Fix thread-safety issues on the queues management
The management of the servers and the proxies queues was not thread-safe at
all. First, the accesses to <strm>->pend_pos were not protected. So it was
possible to release it on a thread (for instance because the stream is released)
and to use it in same time on another one (because we redispatch pending
connections for a server). Then, the accesses to stream's information (flags and
target) from anywhere is forbidden. To be safe, The stream's state must always
be updated in the context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream reponsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its <pend_pos> field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded one. The problem is that,
when we try to dequeue pending connections, we process it from the older one to
the newer one independently to the thread's affinity. So we need to wait the
other threads' wakeup to really process them. If threads are blocked in the
poller, this will add a significant latency. This problem happens when maxconn
values are very low.

This patch must be backported in 1.8.
2018-03-19 10:03:06 +01:00
Christopher Faulet
510c0d67ef BUG/MEDIUM: threads/unix: Fix a deadlock when a listener is temporarily disabled
When a listener is temporarily disabled, we start by locking it and then we call
.pause callback of the underlying protocol (tcp/unix). For TCP listeners, this
is not a problem. But listeners bound on an unix socket are in fact closed
instead. So .pause callback relies on unbind_listener function to do its job.

Unfortunatly, unbind_listener hold the listener's lock and then call an internal
function to unbind it. So, there is a deadlock here. This happens during a
reload. To fix the problemn, the function do_unbind_listener, which is lockless,
is now exported and is called when a listener bound on an unix socket is
temporarily disabled.

This patch must be backported in 1.8.
2018-03-16 11:19:07 +01:00
Willy Tarreau
c41b3e8dff DOC: buffers: clarify the purpose of the <from> pointer in offer_buffers()
This one is only used to compare pointers and NULL is permitted though
this is far from being clear.
2018-03-08 18:33:48 +01:00
Emmanuel Hocdet
253c3b7516 MINOR: connection: add proxy-v2-options authority
This patch add option PP2_TYPE_AUTHORITY to proxy protocol v2 when a TLS
connection was negotiated. In this case, authority corresponds to the sni.
2018-03-01 11:38:32 +01:00
Emmanuel Hocdet
fa8d0f1875 MINOR: connection: add proxy-v2-options ssl-cipher,cert-sig,cert-key
This patch implement proxy protocol v2 options related to crypto information:
ssl-cipher (PP2_SUBTYPE_SSL_CIPHER), cert-sig (PP2_SUBTYPE_SSL_SIG_ALG) and
cert-key (PP2_SUBTYPE_SSL_KEY_ALG).
2018-03-01 11:38:28 +01:00
Emmanuel Hocdet
283e004a85 MINOR: ssl: add ssl_sock_get_cert_sig function
ssl_sock_get_cert_sig can be used to report cert signature short name
to log and ppv2 (RSA-SHA256).
2018-03-01 11:34:08 +01:00
Emmanuel Hocdet
96b7834e98 MINOR: ssl: add ssl_sock_get_pkey_algo function
ssl_sock_get_pkey_algo can be used to report pkey algorithm to log
and ppv2 (RSA2048, EC256,...).
Extract pkey information is not free in ssl api (lock/alloc/free):
haproxy can use the pkey information computed in load_certificate.
Store and use this information in a SSL ex_data when available,
compute it if not (SSL multicert bundled and generated cert).
2018-03-01 11:34:05 +01:00
Emmanuel Hocdet
ddc090bc55 MINOR: ssl: extract full pkey info in load_certificate
Private key information is used in switchctx to implement native multicert
selection (ecdsa/rsa/anonymous). This patch extract and store full pkey
information: dsa type and pkey size in bits. This can be used for switchctx
or to report pkey informations in ppv2 and log.
2018-03-01 11:33:18 +01:00
Christopher Faulet
ca6ef50661 BUG/MEDIUM: buffer: Fix the wrapping case in bi_putblk
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skup data copied during
the first memcpy.

This patch must be backported to 1.8.
2018-02-27 15:45:03 +01:00
Christopher Faulet
b2b279464c BUG/MEDIUM: buffer: Fix the wrapping case in bo_putblk
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skip data copied during
the first memcpy.

This patch must be backported to 1.8, 1.7, 1.6 and 1.5.
2018-02-27 15:45:03 +01:00
Yves Lafon
95317289e9 MINOR: stats: display the number of threads in the statistics.
Add the nbthread global variable to the output, matching nbproc.

This may be backported to 1.8
2018-02-26 11:53:46 +01:00
Willy Tarreau
364d745106 MINOR: debug/pools: make DEBUG_UAF also detect underflows
Since we use padding before the allocated page, it's trivial to place
the allocated address there and see if it gets mangled once we release
it.

This may be backported to stable releases already using DEBUG_UAF.
2018-02-22 14:18:45 +01:00
Willy Tarreau
5a9cce4653 BUG/MINOR: debug/pools: properly handle out-of-memory when building with DEBUG_UAF
Commit 158fa75 ("MINOR: pools: implement DEBUG_UAF to detect use after free")
implemented pool use-after-free detection, but the mmap() return value isn't
properly checked, preventing the call to pool_alloc_area() from returning
NULL. So on out-of-memory a mangled pointer is returned, causing a crash on
the pool_alloc() site instead of forcing a GC. It doesn't affect regular
operations however, just complicates complex bug investigations.

This fix should be backported to 1.8 and to 1.7.
2018-02-22 14:18:45 +01:00
Willy Tarreau
f161d0f51e BUG/MINOR: pools/threads: don't ignore DEBUG_UAF on double-word CAS capable archs
Since commit cf975d4 ("MINOR: pools/threads: Implement lockless memory
pools."), we support lockless pools. However the parts dedicated to
detecting use-after-free are not present in this part, making DEBUG_UAF
useless in this situation.

The present patch sets a new define CONFIG_HAP_LOCKLESS_POOLS when such
a compatible architecture is detected, and when pool debugging is not
requested, then makes use of this everywhere in pools and buffers
functions. This way enabling DEBUG_UAF will automatically disable the
lockless version.

No backport is needed as this is purely 1.9-dev.
2018-02-22 14:18:45 +01:00
Tim Duesterhus
5e64286bab CLEANUP: standard: Fix typo in IPv6 mask example
IPv6 addresses with two double colons are invalid.

This typo was introduced in commit 471851713a.
2018-02-21 05:07:35 +01:00
Tim Duesterhus
05f6a43bd4 CLEANUP: pools: Remove unused end label in memory.h
This removes the end label from memory.h.

The labels are unused as of cf975d46bc
which is unreleased (and incidentally the first commit containing
those labels, thus they never have been used).
2018-02-20 08:30:13 +01:00
Christopher Faulet
16f45c87d5 BUG/MINOR: ssl/threads: Make management of the TLS ticket keys files thread-safe
A TLS ticket keys file can be updated on the CLI and used in same time. So we
need to protect it to be sure all accesses are thread-safe. Because updates are
infrequent, a R/W lock has been used.

This patch must be backported in 1.8
2018-02-19 14:15:38 +01:00
David Carlier
4ee76d0281 BUILD/MINOR: memory: stdint is needed for uintptr_t
stdint.h is needed on OpenBSD for uintptr_t type.
2018-02-19 07:58:50 +01:00
Willy Tarreau
41ccb194d1 BUG/MEDIUM: threads: fix the double CAS implementation for ARMv7
Commit f61f0cb ("MINOR: threads: Introduce double-width CAS on x86_64
and arm.") introduced the double CAS. But the ARMv7 version is bogus,
it uses the value of the pointers instead of dereferencing them. When
lucky, it simply doesn't build due to impossible registers combinations.
Otherwise it will immediately crash at run time when facing traffic.

No backport is needed, this bug was introduced in 1.9-dev.
2018-02-14 14:16:28 +01:00
Willy Tarreau
4cc67a2782 MINOR: fd: move the fd_{add_to,rm_from}_fdlist functions to fd.c
There's not point inlining these huge functions, better move them to real
functions in fd.c.
2018-02-05 17:19:40 +01:00
Willy Tarreau
4d84186337 MEDIUM: fd: make updt_fd_polling() use atomics
It only needed a test-and-set and an atomic increment so we can take it
out of the fd lock now.
2018-02-05 16:02:22 +01:00
Willy Tarreau
1b76a6d1a6 CLEANUP: fd: remove the now unused fd_compute_new_polled_status() function
It's not used anymore since the new state is calculated on the fly
during every update. Let's remove this function.
2018-02-05 16:02:22 +01:00
Willy Tarreau
7ac0e35f23 MAJOR: fd: compute the new fd polling state out of the fd lock
Each fd_{may|cant|stop|want}_{recv|send} function sets or resets a
single bit at once, then recomputes the need for updates, and then
the new cache state. Later, pollers will compute the new polling
state based on the resulting operations here. In fact the conditions
are so simple that they can be performed by a single "if", or sometimes
even optimized away.

This means that in practice a simple compare-and-swap operation if often
enough to set the new value inluding the new polling state, and that only
the cache and fdupdt have to be performed under the lock. Better, for the
most common operations (fd_may_{recv,send}, used by the pollers), a simple
atomic OR is needed.

This patch does this for the fd_* functions above and it doesn't yet
remove the now useless fd_compute_new_polling_status() because it's still
used by other pollers. A pure connection rate test shows a 1% performance
increase.
2018-02-05 16:02:22 +01:00