Commit Graph

1314 Commits

Author SHA1 Message Date
Olivier Houchard
9b03c0c9a7 MINOR: tasks: Make active_tasks_mask volatile.
To be sure we have the relevant informations, make active_tasks_mask volatile
2018-07-26 19:09:50 +02:00
Willy Tarreau
d0ad4a87f0 MEDIUM: queue: make pendconn_free() work on the stream instead
Now pendconn_free() takes a stream, checks that pend_pos is set, clears
it, and uses pendconn_unlink() to complete the job. It's cleaner and
centralizes all the bookkeeping work in pendconn_unlink() only and
ensures that there's a single place where the stream's position in the
queue is manipulated.
2018-07-26 17:32:51 +02:00
Willy Tarreau
9624faec86 MINOR: queue: centralize dequeuing code a bit better
For now the pendconns may be dequeued at two places :
  - pendconn_unlink(), which operates on a locked queue
  - pendconn_free(), which operates on an unlocked queue and frees
    everything.

Some changes are coming to the queue and we'll need to be able to be a
bit stricter regarding the places where we dequeue to keep the accounting
accurate. This first step renames the locked function __pendconn_unlink()
as it's for use by those aware of it, and introduces a new general purpose
pendconn_unlink() function which automatically grabs the necessary locks
before calling the former, and pendconn_cond_unlink() which additionally
checks the pointer and the presence in the queue.
2018-07-26 17:32:48 +02:00
Olivier Houchard
77551ee8a7 BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size.
As __task_wakeup() is responsible for increasing
rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for
decreasing it, as process_runnable_tasks() isn't the only one that removes
tasks from runqueues.
2018-07-26 16:33:29 +02:00
Olivier Houchard
76e45181b2 MINOR: tasks: Add a flag that tells if we're in the global runqueue.
How that we have bits available in task->state, add a flag that tells if we're
in the global runqueue or not.
2018-07-26 16:33:10 +02:00
Willy Tarreau
11c9aa424e MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
2018-07-20 19:21:43 +02:00
Willy Tarreau
5e1cc5ea83 MINOR: conn_stream: add an rx buffer to the conn_stream
In order to reorganize the connection layers, recv() operations will
need to be retryable and to support partial transfers. This requires
an intermediary buffer to hold the data coming from the mux. After a
few attempts, it turns out that this buffer is best placed inside the
conn_stream itself. For now it's only set to buf_empty and it will be
up to the caller to allocate it if required.
2018-07-20 19:21:43 +02:00
Willy Tarreau
8318885487 MINOR: connection: simplify subscription by adding a registration function
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
2018-07-19 18:31:07 +02:00
Olivier Houchard
910b2bc829 MEDIUM: connections/mux: Revamp the send direction.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
2018-07-19 18:31:07 +02:00
Olivier Houchard
6ff2039d13 MINOR: connections/mux: Add a new "subscribe" method.
Add a new "subscribe" method for connection, conn_stream and mux, so that
upper layer can subscribe to them, to be called when the event happens.
Right now, the only event implemented is "SUB_CAN_SEND", where the upper
layer can register to be called back when it is possible to send data.

The connection and conn_stream got a new "send_wait_list" entry, which
required to move a few struct members around to maintain an efficient
cache alignment (and actually this slightly improved performance).
2018-07-19 16:23:43 +02:00
Olivier Houchard
e17c2d3e57 MINOR: tasklets: Don't attempt to add a tasklet in the list twice.
Don't try to add a tasklet to the run queue if it's already in there, or we
might get an infinite loop.
2018-07-19 16:23:43 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
2018-07-19 16:23:43 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
bd1dba8a89 MINOR: buffer: rename the data length member to '->data'
It used to be called 'len' during the reorganisation but strictly speaking
it's not a length since it wraps. Also we already use '_data' as the suffix
to count available data, and data is also what we use to indicate the amount
of data in a pipe so let's improve consistency here. It was important to do
this in two operations because data used to be the name of the pointer to
the storage area.
2018-07-19 16:23:43 +02:00
Willy Tarreau
4d893d440c MINOR: buffers/channel: replace buffer_insert_line2() with ci_insert_line2()
There was no point keeping that function in the buffer part since it's
exclusively used by HTTP at the channel level, since it also automatically
appends the CRLF. This further cleans up the buffer code.
2018-07-19 16:23:43 +02:00
Olivier Houchard
08afac0fd7 MEDIUM: buffers: move "output" from struct buffer to struct channel
Since we never access this field directly anymore, but only through the
channel's wrappers, it can now move to the channel. The buffers are now
completely free from the distinction between input and output data.
2018-07-19 16:23:43 +02:00
Willy Tarreau
abed1e7f34 MINOR: buffer: remove the check for output on b_del()
b_del() is used in :
  - mux_h2 with the demux buffer : always processes input data
  - checks with output data though output is not considered at all there
  - b_eat() which is not used anywhere
  - co_skip() where the len is always <= output

Thus the distinction for output data is not needed anymore and the
decrement can be made inconditionally in co_skip().
2018-07-19 16:23:43 +02:00
Willy Tarreau
d54a8ceb97 MAJOR: start to change buffer API
This is intentionally the minimal and safest set of changes, some cleanups
area still required. These changes are quite tricky and cannot be
independantly tested, so it's important to keep this patch as bisectable
as possible.

buf_empty and buf_wanted were changed and are now exactly similar since
there's no <p> member in the structure anymore. Given that no test is
ever made in the code to check that buf == &buf_wanted, it may be possible
that we don't need to have two anymore, unless some buf_empty tests have
precedence. This will have to be investigated.

A significant part of this commit affects the HTTP compression code,
which used to deeply manipulate the input and output buffers without
any reasonable solution for a better abstraction. For this reason, if
any regression is met and designates this patch as the culprit, it is
important to run tests which specifically involve compression or which
definitely don't use it in order to spot the issue.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2018-07-19 16:23:42 +02:00
Willy Tarreau
cd9e60db00 MEDIUM: channel: adapt to the new buffer API
Also, ci_swpbuf() was removed (unused).
2018-07-19 16:23:42 +02:00
Olivier Houchard
d4251a7e98 MINOR: channel: Add co_set_data().
Add a new function that lets one set the channel's output amount.
2018-07-19 16:23:42 +02:00
Willy Tarreau
3ee8344b7b MINOR: channel: remove almost all references to buf->i and buf->o
We use ci_data() and co_data() instead now everywhere we read these
values.
2018-07-19 16:23:42 +02:00
Willy Tarreau
591d445049 MINOR: buffer: use b_orig() to replace most references to b->data
This patch updates most users of b->data to use b_orig().
2018-07-19 16:23:42 +02:00
Willy Tarreau
50227f9b88 MINOR: buffer: use c_head() instead of buffer_wrap_sub(c->buf, p-o)
This way we don't need o anymore.
2018-07-19 16:23:42 +02:00
Willy Tarreau
3f6799975f MINOR: buffer: replace bi_space_for_replace() with ci_space_for_replace()
This one computes the size that can be overwritten over the input part
of the buffer, so it's channel-specific.
2018-07-19 16:23:41 +02:00
Willy Tarreau
2375233ef0 MINOR: buffer: replace buffer_full() with channel_full()
It's only used by channels since we need to know the amount of output
data.
2018-07-19 16:23:41 +02:00
Willy Tarreau
0c7ed5d264 MINOR: buffer: replace buffer_empty() with b_empty() or c_empty()
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
55f3ce1c91 MINOR: buffer: make b_getblk_nc() take size_t for the block sizes
Till now we used to reimplement it using ints to limit external changes
but we must adjust it and the various users to switch to size_t.
2018-07-19 16:23:41 +02:00
Willy Tarreau
206ba834ef MINOR: buffer: make b_getblk_nc() take const pointers
Now that there are no more users requiring to modify the buffer anymore,
switch these ones to const char and const buffer. This will make it more
obvious next time send functions are tempted to modify the buffer's output
count. Minor adaptations were necessary at a few call places which were
using char due to the function's previous prototype.
2018-07-19 16:23:41 +02:00
Willy Tarreau
f40e68227b MINOR: h1: make h1_measure_trailers() use an offset and a count
This will be needed by the H2 encoder to restart after wrapping.
2018-07-19 16:23:41 +02:00
Willy Tarreau
84d6b7af87 MINOR: h1: make h1_parse_chunk_size() not depend on b_ptr() anymore
It's similar to the previous commit so that the function doesn't rely
on buf->p anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
c0973c6742 MINOR: h1: make h1_skip_chunk_crlf() not depend on b_ptr() anymore
It now takes offsets relative to the buffer's head. It's up to the
callers to add this offset which corresponds to the buffer's output
size.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7314be8e2c MINOR: h1: make h1_measure_trailers() take the byte count in argument
The principle is that it should not have to take this value from the
buffer itself anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
e5f12ce7f2 MINOR: buffer: replace bi_del() and bo_del() with b_del()
Till now the callers had to know which one to call for specific use cases.
Let's fuse them now since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, just combine the two tests
by first removing output data then only input.
2018-07-19 16:23:40 +02:00
Willy Tarreau
7194d3cc3b MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
2018-07-19 16:23:40 +02:00
Willy Tarreau
bcbd39370f MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
2018-07-19 16:23:40 +02:00
Willy Tarreau
fd8d42f496 MEDIUM: channel: make channel_slow_realign() take a swap buffer
The few call places where it's used can use the trash as a swap buffer,
which is made for this exact purpose. This way we can rely on the
generic b_slow_realign() call.
2018-07-19 16:23:40 +02:00
Willy Tarreau
4cf1300e6a MINOR: channel/buffer: replace buffer_slow_realign() with channel_slow_realign() and b_slow_realign()
Where relevant, the channel version is used instead. The buffer version
was ported to be more generic and now takes a swap buffer and the output
byte count to know where to set the alignment point. The H2 mux still
uses buffer_slow_realign() with buf->o but it will change later.
2018-07-19 16:23:40 +02:00
Willy Tarreau
08d5ac8f27 MINOR: channel: add a few basic functions for the new buffer API
This adds :
  - c_orig()  : channel buffer's origin
  - c_size()  : channel buffer's size
  - c_wrap()  : channel buffer's wrapping location
  - c_data()  : channel buffer's total data count
  - c_room()  : room left in channel buffer's
  - c_empty() : true if channel buffer is empty
  - c_full()  : true if channel buffer is full

  - c_ptr()   : pointer to an offset relative to input data in the buffer
  - c_adv()   : advances the channel's buffer (bytes become part of output)
  - c_rew()   : rewinds the channel's buffer (output bytes not output anymore)
  - c_realign_if_empty() : realigns the buffer if it's empty

  - co_data() : # of output data
  - co_head() : beginning of output data
  - co_tail() : end of output data
  - ci_data() : # of input data
  - ci_head() : beginning of input data
  - ci_tail() : end of input data
  - ci_stop() : location after ci_tail()
  - ci_next() : pointer to next input byte

And for the ci_* / co_* functions above, the "__*" variants which disable
wrapping checks, and the "_ofs" variants which return an offset relative to
the buffer's origin instead.
2018-07-19 16:23:39 +02:00
Olivier Houchard
9ddaf794a8 MINOR: tasklet: Set process to NULL.
Some consumers expect the process to be NULL when a tasklet it created, so
do so.
2018-07-19 16:23:08 +02:00
Olivier Houchard
dcd6f3a597 MINOR: tasks: Make sure we correctly init and deinit a tasklet.
Up until now, a tasklet couldn't be free'd while it was in the list, it is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT,
instead of setting the pointers to NULL.
2018-06-14 18:57:13 +02:00
Olivier Houchard
b1ca58b245 MINOR: tasks: Don't define rqueue if we're building without threads.
To make sure we don't inadvertently insert task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
2018-06-06 16:35:12 +02:00
Olivier Houchard
e13ab8b3c6 BUG/MEDIUM: tasks: Use the local runqueue when building without threads.
When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in prcoess_runnable_tasks().
This should fix haproxy when built without threads.
2018-06-06 16:34:52 +02:00
Willy Tarreau
10d81b8757 MINOR: applet: assign the same nice value to a new appctx as its owner task
When an applet is created, let's assign it the same nice value as the task
of the stream which owns it. It ensures that fairness is properly propagated
to applets, and that the CLI can regain a low latency behaviour again. Huge
differences have been seen under extreme loads, with the CLI being called
every 200 microseconds instead of 11 milliseconds.
2018-06-05 11:18:21 +02:00
David Carlier
caa8a37ffe MINOR: task: Fix a compiler warning by adding a cast.
When calling HA_ATOMIC_CAS with a pointer as the target, the compiler
expects a pointer as the new value, so give it one by casting 0x1 to
(void *).
2018-06-04 17:43:12 +02:00
Thierry FOURNIER
9d5422a4b7 MINOR: task/notification: Is notifications registered ?
This function returns true is some notifications are registered.

This function is usefull for the following patch

   BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock

It should be backported in 1.6, 1.7 and 1.8
2018-05-31 10:58:41 +02:00
Olivier Houchard
09eeb7684d BUG/MEDIUM: tasks: Don't forget to increase/decrease tasks_run_queue.
Don't forget to increase tasks_run_queue when we're adding a task to the
tasklet list, and to decrease it when we remove a task from a runqueue,
or its value won't be accurate, and could lead to tasks not being executed
when put in the global run queue.

1.9-dev only, no backport is needed.
2018-05-28 15:20:55 +02:00
Olivier Houchard
673867c357 MAJOR: applets: Use tasks, instead of rolling our own scheduler.
There's no real reason to have a specific scheduler for applets anymore, so
nuke it and just use tasks. This comes with some benefits, the first one
being that applets cannot induce high latencies anymore since they share
nice values with other tasks. Later it will be possible to configure the
applets' nice value. The second benefit is that the applet scheduler was
not very thread-friendly, having a big lock around it in prevision of this
change. Thus applet-intensive workloads should now scale much better with
threads.

Some more improvement is possible now : some applets also use a task to
handle timers and timeouts. These ones could now be simplified to use only
one task.
2018-05-26 20:03:30 +02:00
Olivier Houchard
b0bdae7b88 MAJOR: tasks: Introduce tasklets.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.

For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidently be confused as not in the queue.

Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
2018-05-26 20:03:19 +02:00
Olivier Houchard
f6e6dc12cd MAJOR: tasks: Create a per-thread runqueue.
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.

The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
2018-05-26 19:27:29 +02:00