Commit Graph

3452 Commits

Author SHA1 Message Date
Willy Tarreau
e01d11a75b BUILD: http: properly mark some struct as extern
http_known_methods, HTTP_100 and HTTP_103 were not declared extern and
as such were multiply defined since they were in http.h. There was
apparently no more side effect but it may depend on the platform and
the linker.

This needs to be backported to 1.9.
2019-03-29 21:00:22 +01:00
Willy Tarreau
a33d39a1b1 CLEANUP: task: only perform a LIST_DEL() when the list is not empty
In tasklet_free() we unconditionally perform a LIST_DEL() even when
the list is empty, let's move the LIST_DEL() inside the matching block.
2019-03-25 18:10:53 +01:00
Willy Tarreau
e73256fd2a BUG/MEDIUM: task/h2: add an idempotent task removal fucntion
Previous commit 3ea351368 ("BUG/MEDIUM: h2: Remove the tasklet from the
task list if unsubscribing.") uncovered an issue which needs to be
addressed in the scheduler's API. The function task_remove_from_task_list()
was initially designed to remove a task from the running tasklet list from
within the scheduler, and had to be used in h2 to abort pending I/O events.
However this function was not designed to be idempotent, occasionally
causing a double removal from the tasklet list, with the second doing
nothing but affecting the apparent tasks count and making haproxy use
100% CPU on some tests consisting in stopping the client during some
transfers. The h2_unsubscribe() function can sometimes be called upon
stream exit after an error where the tasklet was possibly already
removed, so it.

This patch does 2 things :
  - it renames task_remove_from_task_list() to
    __task_remove_from_tasklet_list() to discourage users from calling
    it. Also note the fix in the naming since it's a tasklet list and
    not a task list. This function is still uesd from the scheduler.
  - it adds a new, idempotent, task_remove_from_tasklet_list() function
    which does nothing if the task is already not in the tasklet list.

This patch will need to be backported where the commit above is backported.
2019-03-25 18:02:54 +01:00
Christopher Faulet
87a8f353f1 CLEANUP: muxes/stream-int: Remove flags CS_FL_READ_NULL and SI_FL_READ_NULL
Since the flag CF_SHUTR is no more set to mark the end of the message, these
flags become useless.

This patch should be backported to 1.9.
2019-03-25 06:55:23 +01:00
Christopher Faulet
297d3e2e0f MINOR: channel: Report EOI on the input channel if it was reached in the mux
The flag CF_EOI is now set on the input channel when the flag CS_FL_EOI is set
on the corresponding conn_stream. In addition, if a read activity is reported
when this flag is set, the stream is woken up.

This patch should be backported to 1.9.
2019-03-25 06:24:43 +01:00
Christopher Faulet
5311a9255d MINOR: connection: and new flag to mark end of input (EOI)
Since the begining, in the H2 multiplexer, when the end of a message is reached,
the flag CS_FL_(R)EOS is set on the conn_stream to notify the upper layer that
all data were received and consumed and there is no longer any expected. The
stream-interface converts it into a shutdown read. But it leads to some
ambiguities with the real shutr. Once it was reported at the end of the message,
there is no way to report it when the read0 is received. For this reason, aborts
after the message was fully received cannot be reported. And on the channel
side, it is hard to make the difference between a shutr because the end of the
message was reached and a shutr because of an abort.

For these reasons, there is now a flag to mark the end of the message. It is
called CS_FL_EOI (end-of-input) because it is only used on the receipt path.
This flag is only declared and not used yet.

This patch will be used by future bug fixes and will have to be backported
to 1.9.
2019-03-25 06:24:25 +01:00
Willy Tarreau
0f22299435 CLEANUP: cache: don't export http_cache_applet anymore
This one can become static since it's not used by http/htx anymore.
2019-03-19 09:58:35 +01:00
Christopher Faulet
3a78aa6e95 BUG/MINOR: stats: Fully consume large requests in the stats applet
In the stats applet (in HTX and legacy HTTP), after a response is fully sent to
a client, the request is consumed. It is done at the end, after all the response
was copied into the channel's buffer. But only outgoing data at time the applet
is called are consumed. Then the applet is closed. If a request with a huge body
is sent, an error is triggerred because a SHUTW is catched for an unfinisehd
request.

Now, we consume request data until the end. In fact, we don't try to shutdown
the request's channel for write anymore.

This patch must be backported to 1.9 after some observation period. It should
probably be backported in prior versions too. But honnestly, with refactoring
on the connection layer and the stream interface in 1.9, it is probably safer
to not do so.
2019-03-19 09:49:29 +01:00
Willy Tarreau
679bba13f7 MINOR: init: report the list of optionally available services
It's never easy to guess what services are built in. We currently have
the prometheus exporter in contrib/ which is the only extension for now.
Let's enumerate all available ones just like we do for filterr and pollers.
2019-03-19 08:08:10 +01:00
Christopher Faulet
203b2b0a5a MINOR: muxes: Report the Last read with a dedicated flag
For conveniance, in HTTP muxes (h1 and h2), the end of the stream and the end of
the message are reported the same way to the stream, by setting the flag
CS_FL_EOS. In the stream-interface, when CS_FL_EOS is detected, a shutdown for
read is reported on the channel side. This is historical. With the legacy HTTP
layer, because the parsing is done by the stream in HTTP analyzers, the EOS
really means a shutdown for read.

Most of time, for muxes h1 and h2, it works pretty well, especially because the
keep-alive is handled by the muxes. The stream is only used for one
transaction. So mixing EOS and EOM is good enough. But not everytime. For now,
client aborts are only reported if it happens before the end of the request. It
is an error and it is properly handled. But because the EOS was already
reported, client aborts after the end of the request are silently
ignored. Eventually an error can be reported when the response is sent to the
client, if the sending fails. Otherwise, if the server does not reply fast
enough, an error is reported when the server timeout is reached. It is the
expected behaviour, excpect when the option abortonclose is set. In this case,
we must report an error when the client aborts. But as said before, this event
can be ignored. So to be short, for now, the abortonclose is broken.

In fact, it is a design problem and we have to rethink all channel's flags and
probably the conn-stream ones too. It is important to split EOS and EOM to not
loose information anymore. But it is not a small job and the refactoring will be
far from straightforward.

So for now, temporary flags are introduced. When the last read is received, the
flag CS_FL_READ_NULL is set on the conn-stream. This way, we can set the flag
SI_FL_READ_NULL on the stream interface. Both flags are persistant. And to be
sure to wake the stream, the event CF_READ_NULL is reported. So the stream will
always have the chance to handle the last read.

This patch must be backported to 1.9 because it will be used by another patch to
fix the option abortonclose.
2019-03-18 15:50:23 +01:00
Christopher Faulet
2b9b6784b9 MINOR: stats: Move stuff about the stats status codes in stats files
The status codes definition (STAT_STATUS_*) and their string representation
stat_status_codes) have been moved in stats files. There is no reason to keep
them in proto_http files.
2019-03-15 14:34:59 +01:00
Christopher Faulet
3c2ecf75c8 MINOR: stats: Add the status code STAT_STATUS_IVAL to handle invalid requests
This patch must be backported to 1.9 because a bug fix depends on it.
2019-03-15 14:34:52 +01:00
Olivier Houchard
1d7f37a2cb BUG/MAJOR: tasks: Use the TASK_GLOBAL flag to know if we're in the global rq.
In task_unlink_rq, to decide if we should logk the global runqueue lock,
use the TASK_GLOBAL flag instead of relying on t->thread_mask being tid_bit,
as it could be so while still being in the global runqueue if another thread
woke that task for us.

This should be backported to 1.9.
2019-03-14 16:19:11 +01:00
Olivier Houchard
237985b228 MEDIUM: connections: Use _HA_ATOMIC_*
Use _HA_ATOMIC_ instead of HA_ATOMIC_ because we know we don't need barriers
2019-03-14 15:55:15 +01:00
Olivier Houchard
9f8d821a55 MEDIUM: list: Use _HA_ATOMIC_*
Use _HA_ATOMIC_ instead of HA_ATOMIC_ because we know we don't need barriers.
2019-03-14 15:55:15 +01:00
Olivier Houchard
17fbb4eb3f MEDIUM: list: Remove useless barriers.
Don't bother forcing a barrier after using HA_ATOMIC_XCHG if we're about
to check the returned value anyway.
2019-03-14 15:55:15 +01:00
Willy Tarreau
b0cef35b09 BUG/MEDIUM: list: fix incorrect pointer unlocking in LIST_DEL_LOCKED()
Injecting on a saturated listener started to exhibit some deadlocks
again between LIST_POP_LOCKED() and LIST_DEL_LOCKED(). Olivier found
it was due to a leftover from a previous debugging session. This patch
fixes it.

This will have to be backported if the other LIST_*_LOCKED() patches
are backported.
2019-03-13 14:15:54 +01:00
Willy Tarreau
df23c0ce45 MINOR: config: continue to rely on DEFAULT_MAXCONN to set the minimum maxconn
Some packages used to rely on DEFAULT_MAXCONN to set the default global
maxconn value to use regardless of the initial ulimit. The recent changes
made the lowest bound set to 100 so that it is compatible with almost any
environment. Now that DEFAULT_MAXCONN is not needed for anything else, we
can use it for the lowest bound set when maxconn is not configured. This
way it retains its original purpose of setting the default maxconn value
eventhough most of the time the effective value will be higher thanks to
the automatic computation based on "ulimit -n".
2019-03-13 10:10:49 +01:00
Willy Tarreau
ca783d4ee6 MINOR: config: remove obsolete use of DEFAULT_MAXCONN at various places
This entry was still set to 2000 but never used anymore. The only places
where it appeared was as an alias to SYSTEM_MAXCONN which forces it, so
let's turn these ones to SYSTEM_MAXCONN and remove the default value for
DEFAULT_MAXCONN. SYSTEM_MAXCONN still defines the upper bound however.
2019-03-13 10:10:25 +01:00
Olivier Houchard
20872763dd MEDIUM: memory: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:38 +01:00
Olivier Houchard
4c28328572 MEDIUM: task: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
aa4d71a7fe MEDIUM: server: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
11ecfd1c01 MEDIUM: proxy: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
d5f9b19196 MEDIUM: freq_ctr: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
d360879fb5 MEDIUM: fd: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
8beb27e9ce MEDIUM: xref: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
a2735340fb MEDIUM: applets: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
d2b5d16187 MEDIUM: various: Use __ha_barrier_atomic* when relevant.
When protecting data modified by atomic operations, use __ha_barrier_atomic*
to avoid unneeded barriers on x86.
2019-03-11 17:02:37 +01:00
Olivier Houchard
d0c3b8894a MINOR: threads: Add macros to do atomic operation with no memory barrier.
Add variants of the HA_ATOMIC* macros, prefixed with a _, that do the
atomic operation with no barrier generated by the compiler. It is expected
the developer adds barriers manually if needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
113537967c MEDIUM: threads: Use __ATOMIC_SEQ_CST when using the newer atomic API.
When using the new __atomic* API, ask the compiler to generate barriers.
A variant of those functions that don't generate barriers will be added later.
Before that, using HA_ATOMIC* would not generate any barrier, and some parts
of the code should be reviewed and missing barriers should be added.

This should probably be backported to 1.8 and 1.9.
2019-03-11 17:02:37 +01:00
Olivier Houchard
9abcf6ef9a MINOR: threads: Implement __ha_barrier_atomic*.
Implement __ha_barrier functions to be used when trying to protect data
modified by atomic operations (except when using HA_ATOMIC_STORE).
On intel, atomic operations either use the LOCK prefix and xchg, and both
atc as full barrier, so there's no need to add an extra barrier.
2019-03-11 17:02:37 +01:00
Olivier Houchard
92fce85d03 MINOR: fd: Remove debugging code.
Remove a debugging test, and call to abort, it's no longer needed.
2019-03-08 16:05:25 +01:00
Willy Tarreau
1e56c70cc9 OPTIM: task: limit the impact of memory barriers in taks_remove_from_task_list()
In this function we end up with successive locked operations then a
store barrier, and in addition the compiler has to emit less efficient
code due to a longer jump. There's no need for absolutely updating the
tasks_run_queue counter before clearing the task's leaf pointer, so
let's swap the two operations and benefit from a single barrier as much
as possible. This code is on the hot path and shows about half a percent
of improvement with 8 threads.
2019-03-07 18:44:12 +01:00
Willy Tarreau
0cf33176bd MINOR: listener: move thr_idx from the bind_conf to the listener
Tests show that it's slightly faster to have this field in the listener.
The cache walk patterns are under heavy stress and having only this field
written to in the bind_conf was wasting a cache line that was heavily
read. Let's move this close to the other entries already written to in
the listener. Warning, the position does have an impact on peak performance.
2019-03-07 14:08:26 +01:00
Willy Tarreau
9f1d4e7f7f CLEANUP: listener: remove old thread bit mapping
Now that the P2C algorithm for the accept queue is removed, we don't
need to map a number to a thread bit anymore, so let's remove all
these fields which are taking quite some space for no reason.
2019-03-07 13:59:04 +01:00
Willy Tarreau
d87a67f9bc MINOR: tools: implement my_flsl()
We already have my_ffsl() to find the lowest bit set in a word, and
this patch implements the search for the highest bit set in a word.
On x86 it uses the bsr instruction and on other architectures it
uses an efficient implementation.
2019-03-07 13:48:04 +01:00
Willy Tarreau
fc630bd373 MINOR: listener: improve incoming traffic distribution
By picking two randoms following the P2C algorithm, we seldom observe
asymmetric loads on bursts of small session counts. This is typically
what makes h2load take a bit of time to complete the last 100% because
if a thread gets two connections while the other ones only have one,
it takes twice the time to complete its work.

This patch proposes a modification of the p2c algorithm which seems
more suitable to this case : it mixes a rotating index with a random.
This way, we're certain that all threads are consulted in turn and at
the same time we're not forced to use the ones we're giving a chance.

This significantly increases the traffic rate. Now h2load shows faster
completion and the average request rates on H2 and the TLS resume rate
increases by a bit more than 5% compared to pure p2c.

The index was placed into the struct bind_conf because 1) it's faster
there and it's the best place to optimally distribute traffic among a
group of listeners. It's the only runtime-modified element there and
it will be quite cache-hot.
2019-03-07 13:48:04 +01:00
Willy Tarreau
b238b12e98 MINOR: task: use LIST_DEL_INIT() to remove a task from the queue
By using LIST_DEL_INIT() instead of LIST_DEL()+LIST_INIT() we manage
to bump the peak connection rate by no less than 3% on 8 threads.
The perf top profile shows much less contention in this area which
suffered from the second reload.
2019-03-07 11:45:44 +01:00
Willy Tarreau
c5bd311b2a MINOR: lists: add a LIST_DEL_INIT() macro
It turns out that we call LIST_DEL+LIST_INIT very frequently and that
the compiler doesn't know what pointers get modified in the e->n->p
and e->p->n dance, so when LIST_INIT() is called, it reloads these
pointers, which is quite a bit of a mess in terms of performance.

This patch adds LIST_DEL_INIT() to perform the two operations at once
using local temporary variables so that the compiler knows these
pointers are left unaffected.
2019-03-07 11:45:44 +01:00
Frdric Lcaille
5f33f85ce8 MINOR: sample: Extract some protocol buffers specific code.
We move the code responsible of parsing protocol buffers messages
inside gRPC messages from sample.c to include/proto/protocol_buffers.h
so that to reuse it to cascade "ungrpc" converter.
2019-03-06 15:36:02 +01:00
Frdric Lcaille
756d97f205 MINOR: sample: Rework gRPC converter code.
For now on, "ungrpc" may take a second optional argument to provide
the protocol buffers types used to encode the field value to be extracted.
When absent the field value is extracted as a binary sample which may then
followed by others converters like "hex" which takes binary as input sample.
When this second argument is a type which does not match the one found by "ungrpc",
this field is considered as not found even if present.

With this patch we also remove the useless "varint" and "svarint" converters.

Update the documentation about "ungrpc" converters.
2019-03-05 11:04:23 +01:00
Frdric Lcaille
7c93e88d0c MINOR: sample: Code factorization "ungrpc" converter.
Parsing protocol buffer fields always consists in skip the field
if the field is not found or store the field value if found.
So, with this patch we factorize a little bit the code for "ungrpc" converter.
2019-03-05 11:03:53 +01:00
Willy Tarreau
967de20a43 BUG/MEDIUM: list: fix again LIST_ADDQ_LOCKED
Well, that's becoming embarrassing. Now this fixes commit 4ef6801c
("BUG/MEDIUM: list: correct fix for LIST_POP_LOCKED's removal of last
element") which itself tried to fix commit 285192564. This fix only
works under low contention and was tested with the listener's queue.
With the idle conns it's obvious that it's still wrong since adding
more than one element to the list leaves a LLIST_BUSY pointer into
the list's head. This was visible when accumulating idle connections
in a server's list.

This new version of the fix almost goes back to the original code,
except that since then we addressed issues with expectedly idempotent
operations that were not. Now the code has been verified on paper again
and has survived 300 million connections spread over 4 threads.

This will have to be backported if the commit above is backported.
2019-03-04 14:09:22 +01:00
Willy Tarreau
bf6964007a MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max values
Let's keep a copy of these initial values. They will be useful to
compute automatic maxconn, as well as to restore proper limits when
doing an execve() on external checks.
2019-03-01 10:40:30 +01:00
Frdric Lcaille
645635da84 MINOR: peers: Add a message for heartbeat.
This patch implements peer heartbeat feature to prevent any haproxy peer
from reconnecting too often, consuming sockets for nothing.

To do so, we add PEER_MSG_CTRL_HEARTBEAT new message to PEER_MSG_CLASS_CONTROL peers
control class of messages. A ->heartbeat field is added to peer structs
to store the heatbeat timeout value which is handled by the same function as for ->reconnect
to control the session timeouts. A 2-bytes heartbeat message is sent every 3s when
no updates have to be sent. This way, the peer which receives such a message is sure
the remote peer is still alive. So, it resets the ->reconnect peer session
timeout to its initial value (5s). This prevents any reconnection to an
already connected alive peer.
2019-03-01 09:33:26 +01:00
Willy Tarreau
c8d5b95e6d MEDIUM: config: don't enforce a low frontend maxconn value anymore
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.

There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.

This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.

This comes with two benefits :
  1) a configuration missing "maxconn" in the defaults section will not
     limit itself to a magically hardcoded value but will scale up to the
     global maxconn ;

  2) when the global maxconn is not set and memory limits are used instead,
     the frontends' maxconn automatically adapts, and the backends' fullconn
     as well.
2019-02-28 17:05:32 +01:00
Willy Tarreau
e2711c7bd6 MINOR: listener: introduce listener_backlog() to report the backlog value
In an attempt to try to provide automatic maxconn settings, we need to
decorrelate a listner's backlog and maxconn so that these values can be
independent. This introduces a listener_backlog() function which retrieves
the backlog value from the listener's backlog, the frontend's, the
listener's maxconn, the frontend's or falls back to 1024. This
corresponds to what was done in cfgparse.c to force a value there except
the last fallback which was not set since the frontend's maxconn is always
known.
2019-02-28 17:05:29 +01:00
Willy Tarreau
4ef6801cd4 BUG/MEDIUM: list: correct fix for LIST_POP_LOCKED's removal of last element
As seen with Olivier, in the end the fix in commit 285192564 ("BUG/MEDIUM:
list: fix LIST_POP_LOCKED's removal of the last pointer") is wrong,
the code there was right but the bug was triggered by another bug in
LIST_ADDQ_LOCKED() which doesn't properly update the list's head by
inserting in the wrong order.

This will have to be backported if the commit above is backported.
2019-02-28 16:51:28 +01:00
Willy Tarreau
01abd02508 BUG/MEDIUM: listener: use a self-locked list for the dequeue lists
There is a very difficult to reproduce race in the listener's accept
code, which is much easier to reproduce once connection limits are
properly enforced. It's an ABBA lock issue :

  - the following functions take l->lock then lq_lock :
      disable_listener, pause_listener, listener_full, limit_listener,
      do_unbind_listener

  - the following ones take lq_lock then l->lock :
      resume_listener, dequeue_all_listener

This is because __resume_listener() only takes the listener's lock
and expects to be called with lq_lock held. The problem can easily
happen when listener_full() and limit_listener() are called a lot
while in parallel another thread releases sessions for the same
listener using listener_release() which in turn calls resume_listener().

This scenario is more prevalent in 2.0-dev since the removal of the
accept lock in listener_accept(). However in 1.9 and before, a different
but extremely unlikely scenario can happen :

      thread1                                  thread2
         ............................  enter listener_accept()
  limit_listener()
         ............................  long pause before taking the lock
  session_free()
    dequeue_all_listeners()
      lock(lq_lock) [1]
         ............................  try_lock(l->lock) [2]
      __resume_listener()
        spin_lock(l->lock) =>WAIT[2]
         ............................  accept()
                                       l->accept()
                                       nbconn==maxconn =>
                                         listener_full()
                                           state==LI_LIMITED =>
                                             lock(lq_lock) =>DEADLOCK[1]!

In practice it is almost impossible to trigger it because it requires
to limit both on the listener's maxconn and the frontend's rate limit,
at the same time, and to release the listener when the connection rate
goes below the limit between poll() returns the FD and the lock is
taken (a few nanoseconds). But maybe with threads competing on the
same core it has more chances to appear.

This patch removes the lq_lock and replaces it with a lockless queue
for the listener's wait queue (well, technically speaking a self-locked
queue) brought by commit a8434ec14 ("MINOR: lists: Implement locked
variations.") and its few subsequent fixes. This relieves us from the
need of the lq_lock and removes the deadlock. It also gets rid of the
distinction between __resume_listener() and resume_listener() since the
only difference was the lq_lock. All listener removals from the list
are now unconditional to avoid races on the state. It's worth noting
that the list used to never be initialized and that it used to work
only thanks to the state tests, so the initialization has now been
added.

This patch must carefully be backported to 1.9 and very likely 1.8.
It is mandatory to be careful about replacing all manipulations of
l->wait_queue, global.listener_queue and p->listener_queue.
2019-02-28 16:08:54 +01:00
Willy Tarreau
c912f94b57 MINOR: server: remove a few unneeded LIST_INIT calls after LIST_DEL_LOCKED
Since LIST_DEL_LOCKED() and LIST_POP_LOCKED() now automatically reinitialize
the removed element, there's no need for keeping this LIST_INIT() call in the
idle connection code.
2019-02-28 16:08:54 +01:00
Willy Tarreau
4c747e86cd MINOR: list: make the delete and pop operations idempotent
These operations previously used to return a "locked" element, which is
a constraint when multiple threads try to delete the same element, because
the second one will block indefinitely. Instead, let's make sure that both
LIST_DEL_LOCKED() and LIST_POP_LOCKED() always reinitialize the element
after deleting it. This ensures that the second thread will immediately
unblock and succeed with the removal. It also secures the pop vs delete
competition that may happen when trying to remove an element that's about
to be dequeued.
2019-02-28 16:03:29 +01:00
Willy Tarreau
690d2ad4d2 BUG/MEDIUM: list: add missing store barriers when updating elements and head
Commit a8434ec14 ("MINOR: lists: Implement locked variations.")
introduced locked lists which use the elements pointers as locks
for concurrent operations. Under heavy stress the lists occasionally
fail. The cause is a missing barrier at some points when updating
the list element and the head : nothing prevents the compiler (or
CPU) from updating the list head first before updating the element,
making another thread jump to a wrong location. This patch simply
adds the missing barriers before these two opeations.

This will have to be backported if the commit above is backported.
2019-02-28 15:59:31 +01:00
Willy Tarreau
285192564d BUG/MEDIUM: list: fix LIST_POP_LOCKED's removal of the last pointer
There was a typo making the last updated pointer be the pre-last element's
prev instead of the last's prev element. It didn't show up during early
tests because the contention is very rare on this one  and it's implicitly
recovered when updating the pointers to go to the next element, but it was
clearly visible in the listener_accept() tests by having all threads block
on LIST_POP_LOCKED() with n==p==LLIST_BUSY.

This will have to be backported if commit a8434ec14 ("MINOR: lists:
Implement locked variations.") is backported.
2019-02-28 15:59:31 +01:00
Willy Tarreau
bd20ad5874 BUG/MEDIUM: list: fix the rollback on addq in the locked liss
Commit a8434ec14 ("MINOR: lists: Implement locked variations.")
introduced locked lists which use the elements pointers as locks
for concurrent operations. A copy-paste typo in LIST_ADDQ_LOCKED()
causes corruption in the list in case the next pointer is already
held, as it restores the previous pointer into the next one. It
may impact the server pools.

This will have to be backported if the commit above is backported.
2019-02-28 15:10:15 +01:00
Willy Tarreau
149ab779cc MAJOR: threads: enable one thread per CPU by default
Threads have long matured by now, still for most users their usage is
not trivial. It's about time to enable them by default on platforms
where we know the number of CPUs bound. This patch does this, it counts
the number of CPUs the process is bound to upon startup, and enables as
many threads by default. Of course, "nbthread" still overrides this, but
if it's not set the default behaviour is to start one thread per CPU.

The default number of threads is reported in "haproxy -vv". Simply using
"taskset -c" is now enough to adjust this number of threads so that there
is no more need for playing with cpu-map. And thanks to the previous
patches on the listener, the vast majority of configurations will not
need to duplicate "bind" lines with the "process x/y" statement anymore
either, so a simple config will automatically adapt to the number of
processors available.
2019-02-27 14:51:50 +01:00
Willy Tarreau
7ac908bf8c MINOR: config: add global tune.listener.multi-queue setting
tune.listener.multi-queue { on | off }
  Enables ('on') or disables ('off') the listener's multi-queue accept which
  spreads the incoming traffic to all threads a "bind" line is allowed to run
  on instead of taking them for itself. This provides a smoother traffic
  distribution and scales much better, especially in environments where threads
  may be unevenly loaded due to external activity (network interrupts colliding
  with one thread for example). This option is enabled by default, but it may
  be forcefully disabled for troubleshooting or for situations where it is
  estimated that the operating system already provides a good enough
  distribution and connections are extremely short-lived.
2019-02-27 14:27:07 +01:00
Willy Tarreau
8a03408d81 MINOR: activity: add accept queue counters for pushed and overflows
It's important to monitor the accept queues to know if some incoming
connections had to be handled by their originating thread due to an
overflow. It's also important to be able to confirm thread fairness.
This patch adds "accq_pushed" to activity reporting, which reports
the number of connections that were successfully pushed into each
thread's queue, and "accq_full", which indicates the number of
connections that couldn't be pushed because the thread's queue was
full.
2019-02-27 14:27:07 +01:00
Willy Tarreau
1efafce61f MINOR: listener: implement multi-queue accept for threads
There is one point where we can migrate a connection to another thread
without taking risk, it's when we accept it : the new FD is not yet in
the fd cache and no task was created yet. It's still possible to assign
it a different thread than the one which accepted the connection. The
only requirement for this is to have one accept queue per thread and
their respective processing tasks that have to be woken up each time
an entry is added to the queue.

This is a multiple-producer, single-consumer model. Entries are added
at the queue's tail and the processing task is woken up. The consumer
picks entries at the head and processes them in order. The accept queue
contains the fd, the source address, and the listener. Each entry of
the accept queue was rounded up to 64 bytes (one cache line) to avoid
cache aliasing because tests have shown that otherwise performance
suffers a lot (5%). A test has shown that it's important to have at
least 256 entries for the rings, as at 128 it's still possible to fill
them often at high loads on small thread counts.

The processing task does almost nothing except calling the listener's
accept() function and updating the global session and SSL rate counters
just like listener_accept() does on synchronous calls.

At this point the accept queue is implemented but not used.
2019-02-27 14:27:07 +01:00
Willy Tarreau
b2b50a7784 MINOR: listener: pre-compute some thread counts per bind_conf
In order to quickly pick a thread ID when accepting a connection, we'll
need to know certain pre-computed values derived from the thread mask,
which are counts of bits per position multiples of 1, 2, 4, 8, 16 and
32. In practice it is sufficient to compute only the 4 first ones and
store them in the bind_conf. We update the count every time the
bind_thread value is adjusted.

The fields in the bind_conf struct have been moved around a little bit
to make it easier to group all thread bit values into the same cache
line.

The function used to return a thread number is bind_map_thread_id(),
and it maps a number between 0 and 31/63 to a thread ID between 0 and
31/63, starting from the left.
2019-02-27 14:27:07 +01:00
Willy Tarreau
f3241115e7 MINOR: tools: implement functions to look up the nth bit set in a mask
Function mask_find_rank_bit() returns the bit position in mask <m> of
the nth bit set of rank <r>, between 0 and LONGBITS-1 included, starting
from the left. For example ranks 0,1,2,3 for mask 0x55 will be 6, 4, 2
and 0 respectively. This algorithm is based on a popcount variant and
is described here : https://graphics.stanford.edu/~seander/bithacks.html.
2019-02-27 14:27:07 +01:00
Willy Tarreau
9e85318417 MINOR: listener: maintain a per-thread count of the number of connections on a listener
Having this information will help us improve thread-level distribution
of incoming traffic.
2019-02-27 14:27:07 +01:00
Willy Tarreau
a36b324777 MEDIUM: listener: keep a single thread-mask and warn on "process" misuse
Now that nbproc and nbthread are exclusive, we can still provide more
detailed explanations about what we've found in the config when a bind
line appears on multiple threads and processes at the same time, then
ignore the setting.

This patch reduces the listener's thread mask to a single mask instead
of an array of masks per process. Now we have only one thread mask and
one process mask per bind-conf. This removes ~504 bytes of RAM per
bind-conf and will simplify handling of thread masks.

If a "bind" line only refers to process numbers not found by its parent
frontend or not covered by the global nbproc directive, or to a thread
not covered by the global nbthread directive, a warning is emitted saying
what will be used instead.
2019-02-27 14:27:07 +01:00
Olivier Houchard
db64489aac BUG/MEDIUM: lists: Properly handle the case we're removing the first elt.
In LIST_DEL_LOCKED(), initialize p2 to NULL, and only attempt to set it back
to its previous value if we had a previous element, and thus p2 is non-NULL.
2019-02-26 18:47:59 +01:00
Olivier Houchard
9ea5d361ae MEDIUM: servers: Reorganize the way idle connections are cleaned.
Instead of having one task per thread and per server that does clean the
idling connections, have only one global task for every servers.
That tasks parses all the servers that currently have idling connections,
and remove half of them, to put them in a per-thread list of connections
to kill. For each thread that does have connections to kill, wake a task
to do so, so that the cleaning will be done in the context of said thread.
2019-02-26 18:17:32 +01:00
Olivier Houchard
7f1bc31fee MEDIUM: servers: Used a locked list for idle_orphan_conns.
Use the locked macros when manipulating idle_orphan_conns, so that other
threads can remove elements from it.
It will be useful later to avoid having a task per server and per thread to
cleanup the orphan list.
2019-02-26 18:17:32 +01:00
Olivier Houchard
a8434ec146 MINOR: lists: Implement locked variations.
Implement LIST_ADD_LOCKED(), LIST_ADDQ_LOCKED(), LIST_DEL_LOCKED() and
LIST_POP_LOCKED().

LIST_ADD_LOCKED, LIST_ADDQ_LOCKED and LIST_DEL_LOCKED work the same as
LIST_ADD, LIST_ADDQ and LIST_DEL, except before any manipulation it locks
the relevant elements of the list, so it's safe to manipulate the list
with multiple threads.
LIST_POP_LOCKED() removes the first element from the list, and returns its
data.
2019-02-26 18:17:32 +01:00
Frdric Lcaille
1fceee8316 MINOR: http_fetch: add "req.ungrpc" sample fetch for gRPC.
This patch implements "req.ungrpc" sample fetch method to decode and
parse a gRPC request. It takes only one argument: a protocol buffers
field number to identify the protocol buffers message number to be looked up.
This argument is a sort of path in dotted notation to the terminal field number
to be retrieved.

  ex:
    req.ungrpc(1.2.3.4)

This sample fetch catch the data in raw mode, without interpreting them.
Some protocol buffers specific converters may be used to convert the data
to the correct type.
2019-02-26 16:27:05 +01:00
Frdric Lcaille
3a463c92cf MINOR: arg: Add support for ARGT_PBUF_FNUM arg type.
This new argument type is used to parse Protocol Buffers field number
with dotted notation (e.g: 1.2.3.4).
2019-02-26 16:27:05 +01:00
Frdric Lcaille
3b71716685 MINOR: standard: Add a function to parse uints (dotted notation).
This function is useful to parse strings made of unsigned integers
and to allocate a C array of unsigned integers from there.
For instance this function allocates this array { 1, 2, 3, 4, } from
this string: "1.2.3.4".
2019-02-26 16:27:05 +01:00
Christopher Faulet
c6827d52c1 MINOR: channel/htx: Add function to skips output bytes from an HTX channel
It is the HTX version of co_skip(). Internally, It uses the function htx_drain().

It will be used by other commits to fix bugs, so it must be backported to 1.9.
2019-02-26 14:04:23 +01:00
Christopher Faulet
549822f0a1 MINOR: htx: Add function to drain data from an HTX message
The function htx_drain() can now be used to drain data from an HTX message.

It will be used by other commits to fix bugs, so it must be backported to 1.9.
2019-02-26 14:04:23 +01:00
Christopher Faulet
729b5b308c BUG/MINOR: channel: Set CF_WROTE_DATA when outgoing data are skipped
in co_skip(), the flag CF_WRITE_PARTIAL is set on the channel. The flag
CF_WROTE_DATA must also be set to notify the channel some data were sent.

This patch must be backported to 1.9.
2019-02-26 14:04:23 +01:00
Richard Russo
bc9d9844d5 BUG/MAJOR: fd/threads, task/threads: ensure all spin locks are unlocked
Calculate if the fd or task should be locked once, before locking, and
reuse the calculation when determing when to unlock.

Fixes a race condition added in 87d54a9a for fds, and b20aa9ee for tasks,
released in 1.9-dev4. When one thread modifies thread_mask to be a single
thread for a task or fd while a second thread has locked or is waiting on a
lock for that task or fd, the second thread will not unlock it.  For FDs,
this is observable when a listener is polled by multiple threads, and is
closed while those threads have events pending.  For tasks, this seems
possible, where task_set_affinity is called, but I did not observe it.

This must be backported to 1.9.
2019-02-25 16:16:36 +01:00
Willy Tarreau
2d7f81b809 MINOR: fd: add a new my_closefrom() function to close all FDs
This is a naive implementation of closefrom() which closes all FDs
starting from the one passed in argument. closefrom() is not provided
on all operating systems, and other versions will follow.
2019-02-21 22:19:17 +01:00
Olivier Houchard
f131481a0a BUG/MEDIUM: servers: Add a per-thread counter of idle connections.
Add a per-thread counter of idling connections, and use it to determine
how many connections we should kill after the timeout, instead of using
the global counter, or we're likely to just kill most of the connections.

This should be backported to 1.9.
2019-02-21 19:07:45 +01:00
Olivier Houchard
e737103173 BUG/MEDIUM: servers: Use atomic operations when handling curr_idle_conns.
Use atomic operations when dealing with srv->curr_idle_conns, as it's shared
between threads, otherwise we could get inconsistencies.

This should be backported to 1.9.
2019-02-21 19:07:19 +01:00
Christopher Faulet
0b46548a68 BUG/MEDIUM: h2/htx: Correctly handle interim responses when HTX is enabled
1xx responses does not work in HTTP2 when the HTX is enabled. First of all, when
a response is parsed, only one HEADERS frame is expected. So when an interim
response is received, the flag H2_SF_HEADERS_RCVD is set and the next HEADERS
frame (for another interim repsonse or the final one) is parsed as a trailers
one. Then when the response is sent, because an EOM block is found at the end of
the interim HTX response, the ES flag is added on the frame, closing too early
the stream. Here, it is a design problem of the HTX. Iterim responses are
considered as full messages, leading to some ambiguities when HTX messages are
processed. This will not be fixed now, but we need to keep it in mind for future
improvements.

To fix the parsing bug, the flag H2_MSGF_RSP_1XX is added when the response
headers are decoded. When this flag is set, an EOM block is added into the HTX
message, despite the fact that there is no ES flag on the frame. And we don't
set the flag H2_SF_HEADERS_RCVD on the corresponding H2S. So the next HEADERS
frame will not be parsed as a trailers one.

To fix the sending bug, the ES flag is not set on the frame when an interim
response is processed and the flag H2_SF_HEADERS_SENT is not set on the
corresponding H2S.

This patch must be backported to 1.9.
2019-02-19 16:26:14 +01:00
Olivier Houchard
9efa7b8ba8 BUILD/MEDIUM: initcall: Fix build on MacOS.
MacOS syntax for sections is a bit different, so implement it.
(see issue #42).

This should be backported to 1.9.
2019-02-15 14:32:35 +01:00
Frdric Lcaille
76d2cef0c2 BUG/MEDIUM: peers: Missing peer initializations.
Initialize ->srv peer field for all the peers, the local peer included.
Indeed, a haproxy process needs to connect to the local peer of a remote
process. Furthermore, when a "peer" or "server" line is parsed by parse_server()
the address must be copied to ->addr field of the peer object only if this address
has been also parsed by parse_server(). This is not the case if this address belongs
to the local peer and is provided on a "server" line.

After having parsed the "peer" or "server" lines of a peer
sections, the ->srv part of all the peer must be initialized for SSL, if
enabled. Same thing for the binding part.

Revert 1417f0b commit which is no more required.

No backport is needed, this is purely 2.0.
2019-02-12 19:49:22 +01:00
Ben51Degrees
4ddf59d070 MEDIUM: 51d: Enabled multi threaded operation in the 51Degrees module.
The existing threading flag in the 51Degrees API
(FIFTYONEDEGREES_NO_THREADING) has now been mapped to the HAProxy
threading flag (USE_THREAD), and the 51Degrees module code has been made
thread safe.
In Pattern, the cache is now locked with a spin lock from hathreads.h
using a new lable 'OTHER_LOCK'. The workset pool is now created with the
same size as the number of threads to avoid any time waiting on a
worket.
In Hash Trie, the global device offsets structure is only used in single
threaded operation. Multi threaded operation creates a new offsets
structure in each thread.
2019-02-08 21:29:23 +01:00
Willy Tarreau
1417f0b5dc BUG/MEDIUM: peers: check that p->srv actually exists before using p->srv->use_ssl
Commit 1055e687a ("MINOR: peers: Make outgoing connection to SSL/TLS
peers work.") introduced an "srv" field in the peers, which points to
the equivalent server to hold SSL settings. This one is not set when
the peer is local so we must always test it before testing p->srv->use_ssl
otherwise haproxy dies during reloads.

No backport is needed, this is purely 2.0.
2019-02-08 10:22:31 +01:00
Willy Tarreau
ff9c9140f4 MINOR: config: make MAX_PROCS configurable at build time
For some embedded systems, it's pointless to have 32- or even 64- large
arrays of processes when it's known that much fewer processes will be
used in the worst case. Let's introduce this MAX_PROCS define which
contains the highest number of processes allowed to run at once. It
still defaults to LONGBITS but may be lowered.
2019-02-07 15:10:19 +01:00
Willy Tarreau
980855bd95 BUG/MEDIUM: server: initialize the orphaned conns lists and tasks at the end
This also depends on the nbthread count, so it must only be performed after
parsing the whole config file. As a side effect, this removes some code
duplication between servers and server-templates.

This must be backported to 1.9.
2019-02-07 15:08:13 +01:00
Willy Tarreau
2415727a00 MINOR: global: add proc_mask() and thread_mask()
These two functions return either all_{proc,threads}_mask, or the argument.
This is used to default to all_proc_mask or all_threads_mask when not set
on bind_conf or proxies.
2019-02-04 05:09:15 +01:00
Willy Tarreau
a38a7175b1 MINOR: config: keep an all_proc_mask like we have all_threads_mask
This simplifies some mask comparisons at various places where
nbits(global.nbproc) was used.
2019-02-04 05:09:15 +01:00
Willy Tarreau
cafa56ecd6 MINOR: tools: improve the popcount() operation
We'll call popcount() more often so better use a parallel method
than an iterative one. One optimal design is proposed at the site
below. It requires a fast multiplication though, but even without
it will still be faster than the iterative one, and all relevant
64 bit platforms do have a multiply unit.

     https://graphics.stanford.edu/~seander/bithacks.html
2019-02-04 05:09:15 +01:00
Willy Tarreau
4ed84c96cf OPTIM: listener: optimize cache-line packing for struct listener
Some unused fields were placed early and some important ones were on
the second cache line. Let's move the proto_list and name closer to
the end of the structure to bring accept() and default_target() into
the first cache line.
2019-02-04 05:09:14 +01:00
Willy Tarreau
da9e939f3c CLEANUP: threads: fix misleading comment about all_threads_mask
This variable changed a bit after 1.8, it's never zero anymore.
2019-02-02 17:48:39 +01:00
Olivier Houchard
dc21ff778b MINOR: debug: Add an option that causes random allocation failures.
When compiling with DEBUG_FAIL_ALLOC, add a new option, tune.fail-alloc,
that gives the percentage of chances an allocation fails.
This is useful to check that allocation failures are always handled
gracefully.
2019-01-31 19:38:25 +01:00
Olivier Houchard
ff5dd74e25 MINOR: xref: Add missing barriers.
Add a few missing barriers in the xref code, it's unlikely to be a problem
for x86, but may be on architectures with weak memory ordering.
2019-01-31 19:38:25 +01:00
Willy Tarreau
00f18a36b6 BUG/MINOR: server: fix logic flaw in idle connection list management
With variable connection limits, it's not possible to accurately determine
whether the mux is still in use by comparing usage and max to be equal due
to the fact that one determines the capacity and the other one takes care
of the context. This can cause some connections to be dropped before they
reach their stream ID limit.

It seems it could also cause some connections to be terminated with
streams still alive if the limit was reduced to match the newly computed
avail_streams() value, though this cannot yet happen with existing muxes.

Instead let's switch to usage reports and simply check whether connections
are both unused and available before adding them to the idle list.

This should be backported to 1.9.
2019-01-31 19:38:25 +01:00
Willy Tarreau
51d0a7e54c MINOR: connstream: have a new flag CS_FL_KILL_CONN to kill a connection
This is the equivalent of SI_FL_KILL_CONN but for the connstreams. It
will be set by the stream-interface during the various shutdown
operations.
2019-01-31 19:38:25 +01:00
Willy Tarreau
0f9cd7b196 MINOR: stream-int: add a new flag to mention that we want the connection to be killed
The new flag SI_FL_KILL_CONN is now set by the rare actions which
deliberately want the whole connection (and not just the stream) to be
killed. This is only used for "tcp-request content reject",
"tcp-response content reject", "tcp-response content close" and
"http-request reject". The purpose is to desambiguate the close from
a regular shutdown. This will be used by the next patches.
2019-01-31 19:38:25 +01:00
Olivier Houchard
8788b4111c BUG/MEDIUM: connections: Don't forget to remove CO_FL_SESS_IDLE.
If we're adding a connection to the server orphan idle list, don't forget
to remove the CO_FL_SESS_IDLE flag, or we will assume later it's still
attached to a session.

This should be backported to 1.9.
2019-01-31 19:38:25 +01:00
Willy Tarreau
e5fcfbed5c MINOR: htx: never check for null htx pointer in htx_is_{,not_}empty()
The previous patch clarifies the fact that the htx pointer is never null
along all the code. This test for a null will never match, didn't catch
the pointer 1 before the fix for b_is_null(), but it confuses the compiler
letting it think that any dereferences made to this pointer after this
test could actually mean we're dereferencing a null. Let's now drop this
test. This saves us from having to add impossible tests everywhere to
avoid the warning.

This should be backported to 1.9 if the b_is_null() patch is backported.
2019-01-31 08:07:17 +01:00
Willy Tarreau
245d189cce DOC: htx: make it clear that htxbuf() and htx_from_buf() always return valid pointers
Update the comments above htxbuf() and htx_from_buf() to make it clear
that they always return valid htx pointers so that callers know they do
not have to test them. This is only true after the fix on b_is_null()
which was the only known corner case.

This should be backported to 1.9 if the b_is_null() patch is backported.
2019-01-31 08:07:17 +01:00
Olivier Houchard
203d735cac BUG/MEDIUM: buffer: Make sure b_is_null handles buffers waiting for allocation.
In b_is_null(), make sure we return 1 if the buffer is waiting for its
allocation, as users assume there's memory allocated if b_is_null() returns
0.

The indirect impact of not having this was that htxbuf() would not match
b_is_null() for a buffer waiting for an allocation, and would thus return
the value 1 for the htx pointer, causing various crashes under low memory
condition.

Note that this patch makes gcc versions 6 and above report two null-deref
warnings in proto_htx.c since htx_is_empty() continues to check for a null
pointer without knowing that this is protected by the test on b_is_null().
This is addressed by the following patches.

This should be backported to 1.9.
2019-01-31 08:07:17 +01:00
Willy Tarreau
9c84d8299a MINOR: h2: add a generic frame checker
The new function h2_frame_check() checks the protocol limits for the
received frame (length, ID, direction) and returns a verdict made of
a connection error code. The purpose is to be able to validate any
frame regardless of the state and the ability to call the frame handler,
and to emit a GOAWAY early in this case.
2019-01-30 19:37:20 +01:00
Willy Tarreau
13afcb7ab3 BUG/MINOR: task: fix possibly missed event in inter-thread wakeups
There's a very small but existing uncertainty window when waking another
thread up where it is possible for task_wakeup() not to wake the other
task up because it's still running while this once is in the process of
finishing and loses its TASK_RUNNING flag. In this case the wakeup will
be missed.

The problem is that we have a single flag to store 3 states, since the
transition from running to sleeping isn't atomic. Thus we need to have
another flag to cover this part. This patch introduces TASK_QUEUED to
mention that the task is already in the run queue, running or not. This
bit will be removed while TASK_RUNNING is kept once dequeued, and will
be used when removing TASK_RUNNING to check if the task has been requeued.

It might be possible to slightly improve this but the occurrence rate
is quite low and we don't really need to complexify the scheduler to
optimize for a rare case.

The impact with the current code is very low since we have few inter-
thread wakeups. Most of them are caused by checks killing sessions.

This must be backported to 1.9.
2019-01-28 15:03:04 +01:00
Willy Tarreau
f5809cde7a MINOR: threads: make MAX_THREADS configurable at build time
There's some value in being able to limit MAX_THREADS, either to save
precious resources in embedded environments, or to protect certain
deployments against accidently incorrect settings.

With this patch, if MAX_THREADS is defined at build time, it will be
used. However, given that LONGBITS is not a macro but is defined
according to sizeof(long), we can't check the value range at build
time and instead we need to perform the check at early boot time.
However, the compiler is able to optimize away the constant comparisons
and doesn't even emit the check code when values are correct.

The output message regarding threading support was improved to report
the number of threads.
2019-01-26 13:37:48 +01:00
Willy Tarreau
c9a82e48bf MINOR: cfgparse: make the process/thread parser support a maximum value
It was hard-wired to LONGBITS, let's make it configurable depending on the
context (threads, processes).
2019-01-26 13:25:14 +01:00
Willy Tarreau
4790f7c907 MEDIUM: h2: always parse and deduplicate the content-length header
The header used to be parsed only in HTX but not in legacy. And even in
HTX mode, the value was dropped. Let's always parse it and report the
parsed value back so that we'll be able to store it in the streams.
2019-01-24 19:07:26 +01:00
Willy Tarreau
bf66bd1b8b MEDIUM: stream-int: always mark pending outgoing SI_ST_CON
Before the first send() attempt, we should be in SI_ST_CON, not
SI_ST_EST, since we have not yet attempted to send and we are
allowed to retry. This is particularly important with complex
outgoing muxes which can fail during the first send attempt (e.g.
failed stream ID allocation).

It only requires that sess_update_st_con_tcp() knows about this
possibility, as we must not forcefully close a reused connection
when facing an error in this case, this will be handled later.

This may be backported to 1.9 with care after some observation period.
2019-01-24 19:06:43 +01:00
Willy Tarreau
9c538e01c2 MINOR: server: add a max-reuse parameter
Some servers may wish to limit the total number of requests they execute
over a connection because some of their components might leak resources.
In HTTP/1 it was easy, they just had to emit a "connection: close" header
field with the last response. In HTTP/2, it's less easy because the info
is not always shared with the component dealing with the H2 protocol and
it could be harder to advertise a GOAWAY with a stream limit.

This patch provides a solution to this by adding a new "max-reuse" parameter
to the server keyword. This parameter indicates how many times an idle
connection may be reused for new requests. The information is made available
and the underlying muxes will be able to use it at will.

This patch should be backported to 1.9.
2019-01-24 19:06:43 +01:00
Willy Tarreau
1e7d444eec BUG/MINOR: hpack: return a compression error on invalid table size updates
RFC7541#6.3 mandates that an error is reported when a dynamic table size
update announces a size larger than the one configured with settings. This
is tested by h2spec using test "hpack/6.3/1".

This must be backported to 1.9 and possibly 1.8 as well.
2019-01-24 15:27:06 +01:00
Willy Tarreau
71c3811589 MINOR: h2: declare new sets of frame types
This patch adds H2_FT_HDR_MASK to group all frame types carrying headers
information, and H2_FT_LATE_MASK to group frame types allowed to arrive
after a stream was closed.
2019-01-24 15:27:06 +01:00
Frdric Lcaille
355b2033ec MINOR: cfgparse: SSL/TLS binding in "peers" sections.
Make "bind" keywork be supported in "peers" sections.
All "bind" settings are supported on this line.
Add "default-bind" option to parse the binding options excepted the bind address.
Do not parse anymore the bind address for local peers on "server" lines.
Do not use anymore list_for_each_entry() to set the "peers" section
listener parameters because there is only one listener by "peers" section.

May be backported to 1.5 and newer.
2019-01-18 14:26:21 +01:00
Frdric Lcaille
1055e687a2 MINOR: peers: Make outgoing connection to SSL/TLS peers work.
This patch adds pointer to a struct server to peer structure which
is initialized after having parsed a remote "peer" line.

After having parsed all peers section we run ->prepare_srv to initialize
all SSL/TLS stuff of remote perr (or server).

Remaining thing to do to completely support peer protocol over SSL/TLS:
make "bind" keyword be supported in "peers" sections to make SSL/TLS
incoming connections to local peers work.

May be backported to 1.5 and newer.
2019-01-18 14:26:21 +01:00
Tim Duesterhus
8b87c01c4d BUG/MINOR: stick_table: Prevent conn_cur from underflowing
When using the peers feature a race condition could prevent
a connection from being properly counted. When this connection
exits it is being "uncounted" nonetheless, leading to a possible
underflow (-1) of the conn_curr stick table entry in the following
scenario :

  - Connect to peer A     (A=1, B=0)
  - Peer A sends 1 to B   (A=1, B=1)
  - Kill connection to A  (A=0, B=1)
  - Connect to peer B     (A=0, B=2)
  - Peer A sends 0 to B   (A=0, B=0)
  - Peer B sends 0/2 to A (A=?, B=0)
  - Kill connection to B  (A=?, B=-1)
  - Peer B sends -1 to A  (A=-1, B=-1)

This fix may be backported to all supported branches.
2019-01-15 15:34:49 +01:00
Willy Tarreau
0cac26cd88 MEDIUM: backend: move all LB algo parameters into an union
Since all of them are exclusive, let's move them to an union instead
of eating memory with the sum of all of them. We're using a transparent
union to limit the code changes.

Doing so reduces the struct lbprm from 392 bytes to 372, and thanks
to these changes, the struct proxy is now down to 6480 bytes vs 6624
before the changes (144 bytes saved per proxy).
2019-01-14 19:33:17 +01:00
Willy Tarreau
76e84f5091 MINOR: backend: move hash_balance_factor out of chash
This one is a proxy option which can be inherited from defaults even
if the LB algo changes. Move it out of the lb_chash struct so that we
don't need to keep anything separate between these structs. This will
allow us to merge them into an union later. It even takes less room
now as it fills a hole and removes another one.
2019-01-14 19:33:17 +01:00
Willy Tarreau
a9a7249966 MINOR: backend: remap the balance uri settings to lbprm.arg_opt{1,2,3}
The algo-specific settings move from the proxy to the LB algo this way :
  - uri_whole => arg_opt1
  - uri_len_limit => arg_opt2
  - uri_dirs_depth1 => arg_opt3
2019-01-14 19:33:17 +01:00
Willy Tarreau
9fed8586b5 MINOR: backend: make the header hash use arg_opt1 for use_domain_only
This is only a boolean extra arg. Let's map it to arg_opt1 and remove
hh_match_domain from struct proxy.
2019-01-14 19:33:17 +01:00
Willy Tarreau
20e68378f1 MINOR: backend: add new fields in lbprm to store more LB options
Some algorithms require a few extra options (up to 3). Let's provide
some room in lbprm to store them, and make sure they're passed from
defaults to backends.
2019-01-14 19:33:17 +01:00
Willy Tarreau
484ff07691 MINOR: backend: make headers and RDP cookie also use arg_str/len
These ones used to rely on separate variables called hh_name/hh_len
but they are exclusive with the former. Let's use the same variable
which becomes a generic argument name and length for the LB algorithm.
2019-01-14 19:33:17 +01:00
Willy Tarreau
4c03d1c9b6 MINOR: backend: move url_param_name/len to lbprm.arg_str/len
This one is exclusively used by LB parameters, when using URL param
hashing. Let's move it to the lbprm struct under a more generic name.
2019-01-14 19:33:17 +01:00
Emeric Brun
9e7547740c MINOR: ssl: add support of aes256 bits ticket keys on file and cli.
Openssl switched from aes128 to aes256 since may 2016  to compute
tls ticket secrets used by default. But Haproxy still handled only
128 bits keys for both tls key file and CLI.

This patch permit the user to set aes256 keys throught CLI or
the key file (80 bytes encoded in base64) in the same way that
aes128 keys were handled (48 bytes encoded in base64):
- first 16 bytes for the key name
- next 16/32 bytes for aes 128/256 key bits key
- last 16/32 bytes for hmac 128/256 bits

Both sizes are now supported (but keys from same file must be
of the same size and can but updated via CLI only using a key of
the same size).

Note: This feature need the fix "dec func ignores padding for output
size checking."
2019-01-14 19:32:58 +01:00
Olivier Houchard
c98aa1f182 MINOR: checks: Store the proxy in checks.
Instead of assuming we have a server, store the proxy directly in struct
check, and use it instead of s->server.
This should be a no-op for now, but will be useful later when we change
mail checks to avoid having a server.

This should be backported to 1.9.
2019-01-14 11:15:11 +01:00
Willy Tarreau
762475e1f9 BUG/MEDIUM: connection: properly unregister the mux on failed initialization
When mux->init() fails, session_free() will call it again to unregister
it while it was already done, resulting in null derefs or use-after-free.
This typically happens on out-of-memory conditions during H1 or H2 connection
or stream allocation.

This fix must be backported to 1.9.
2019-01-10 19:47:43 +01:00
Christopher Faulet
f7ed195ac8 MINOR: channel/htx: Add the HTX version of channel_truncate/erase
The function channel_htx_truncate() can now be used on HTX buffer to truncate
all incoming data, keeping outgoing one intact. This function relies on the
function channel_htx_erase() and htx_truncate().

This patch may be backported to 1.9. If so, the patch "MINOR: channel/htx: Add
the HTX version of channel_truncate()" must also be backported.
2019-01-08 12:06:55 +01:00
Christopher Faulet
00cf697215 MINOR: htx: Add a function to truncate all blocks after a specific offset
This function will be used to truncate all incoming data in a channel, keeping
outgoing ones.

This may be backported to 1.9.
2019-01-08 12:06:55 +01:00
Christopher Faulet
5811db0043 MINOR: channel/htx: Add HTX version for some helper functions
HTX versions for functions to test the free space in input against the reserve
have been added. Now, on HTX streams, following functions can be used:

  * channel_htx_may_recv
  * channel_htx_recv_limit
  * channel_htx_recv_max
  * channel_htx_full

This patch must be backported in 1.9 because it will be used by a futher patch
to fix a bug.
2019-01-07 16:32:05 +01:00
Christopher Faulet
8564c1f04b MINOR: htx: Add an helper function to get the max space usable for a block
This patch must be backported in 1.9 because it will be used by a futher patch
to fix a bug.
2019-01-07 16:32:02 +01:00
Willy Tarreau
909b9d852b BUILD: add a new file "version.c" to carry version updates
While testing fixes, it's sometimes confusing to rebuild only one C file
(e.g. a mux) and not to have the correct commit ID reported in "haproxy -v"
nor on the stats page.

This patch adds a new "version.c" file which is always rebuilt. It's
very small and contains only 3 variables derived from the various
version strings. These variables are used instead of the macros at the
few places showing the version. This way the output version of the
running code is always correct for the parts that were rebuilt.
2019-01-04 18:20:32 +01:00
Olivier Houchard
f1b11e2d16 MINOR: connections: Remove a stall comment.
Remove the comment that pretends 0x40000000 is unused, it's not true anymore.
2019-01-04 17:26:47 +01:00
Willy Tarreau
0f8fb6b7f9 MINOR: h1: make the H1 headers block parser able to parse headers only
Currently the H1 headers parser works for either a request or a response
because it starts from the start line. It is also able to resume its
processing when it was interrupted, but in this case it doesn't update
the list.

Make it support a new flag, H1_MF_HDRS_ONLY so that the caller can
indicate it's only interested in the headers list and not the start
line. This will be convenient to parse H1 trailers.
2019-01-04 10:48:03 +01:00
Willy Tarreau
1e1f27c5c1 MINOR: h2: add h2_make_htx_trailers to turn H2 headers to HTX trailers
This function is usable to transform a list of H2 header fields to a
HTX trailers block. It takes care of rejecting forbidden headers and
pseudo-headers when performing the conversion. It also emits the
trailing CRLF that is currently needed in the HTX trailers block.
2019-01-03 18:45:38 +01:00
Willy Tarreau
52610e905d MINOR: htx: add a new function to add a block without filling it
htx_add_blk_type_size() creates a block of a specified type and size
and returns it. The caller can then fill it.
2019-01-03 18:45:38 +01:00
Willy Tarreau
9d953e7572 MINOR: h2: add h2_make_h1_trailers to turn H2 headers to H1 trailers
This function is usable to transform a list of H2 header fields to a
H1 trailers block. It takes care of rejecting forbidden headers and
pseudo-headers when performing the conversion.
2019-01-03 18:45:38 +01:00
Willy Tarreau
59884a646c MINOR: lb: allow redispatch when using consistent hash
Redispatch traditionally only worked for cookie based persistence.

Adding redispatch support for consistent hash based persistence - also
update docs.

Reported by Oskar Stenman on discourse:
https://discourse.haproxy.org/t/balance-uri-consistent-hashing-redispatch-3-not-redispatching/3344

Should be backported to 1.8.

Cc: Lukas Tribus <lukas@ltri.eu>
2019-01-02 20:22:17 +01:00
Christopher Faulet
e64582929f MINOR: channel: Add the function channel_add_input
This function must be called when new incoming data are pushed in the channel's
buffer. It updates the channel state and take care of the fast forwarding by
consuming right amount of data and decrementing "->to_forward" accordingly when
necessary. In fact, this patch just moves a part of ci_putblk in a dedicated
function.

This patch must be backported to 1.9.
2019-01-02 20:12:44 +01:00
Olivier Houchard
a2dbeb22fc MEDIUM: sessions: Keep track of which connections are idle.
Instead of keeping track of the number of connections we're responsible for,
keep track of the number of connections we're responsible for that we are
currently considering idling (ie that we are not using, they may be in use
by other sessions), that way we can actually reuse connections when we have
more connections than the max configured.
2018-12-28 19:16:03 +01:00
Olivier Houchard
351411facd BUG/MAJOR: sessions: Use an unlimited number of servers for the conn list.
When a session adds a connection to its connection list, we used to remove
connections for an another server if there were not enough room for our
server. This can't work, because those lists are now the list of connections
we're responsible for, not just the idle connections.
To fix this, allow for an unlimited number of servers, instead of using
an array, we're now using a linked list.
2018-12-28 16:33:13 +01:00
Olivier Houchard
09e498f1a1 BUG/MEDIUM: tasks: Decrement tasks_run_queue in tasklet_free().
If the tasklet is in the list, don't forget to decrement tasks_run_queue
in tasklet_free().

This should be backported to 1.9.
2018-12-24 14:04:55 +01:00
Willy Tarreau
f48919aafb MINOR: buffers: add a new b_move() function
This function will be used to move parts of a buffer to another place
in the same buffer, even if the parts overlap. In order to keep things
under reasonable control, it only uses a length and absolute offsets
for the source and destination, and doesn't consider head nor data.
2018-12-24 11:45:00 +01:00
Willy Tarreau
deab244dc1 MINOR: h2: add a bit-based frame type representation
This will ease checks among sets of frames.
2018-12-24 11:45:00 +01:00
Willy Tarreau
fba74ea7b0 [RELEASE] Released version 2.0-dev0
Released version 2.0-dev0 with the following main changes :
    - BUG/MAJOR: connections: Close the connection before freeing it.
    - REGTEST: Require the option LUA to run lua tests
    - REGTEST: script: Process script arguments before everything else
    - REGTEST: script: Evaluate the varnishtest command to allow quoted parameters
    - REGTEST: script: Add the option --clean to remove previous log direcotries
    - REGTEST: script: Add the option --debug to show logs on standard ouput
    - REGTEST: script: Add the option --keep-logs to keep all log directories
    - REGTEST: script: Add the option --use-htx to enable the HTX in regtests
    - REGTEST: script: Print only errors in the results report
    - REGTEST: Add option to use HTX prefixed by the macro 'no-htx'
    - REGTEST: Make reg-tests target support argument.
    - REGTEST: Fix a typo about barrier type.
    - REGTEST: Be less Linux specific with a syslog regex.
    - REGTEST: Missing enclosing quotes for ${tmpdir} macro.
    - REGTEST: Exclude freebsd target for some reg tests.
    - BUG/MEDIUM: h2: Don't forget to quit the sending_list if SUB_CALL_UNSUBSCRIBE.
    - BUG/MEDIUM: mux-h2: Don't forget to quit the send list on error reports
    - BUG/MEDIUM: dns: Don't prevent reading the last byte of the payload in dns_validate_response()
    - BUG/MEDIUM: dns: overflowed dns name start position causing invalid dns error
    - BUG/MINOR: compression/htx: Don't compress responses with unknown body length
    - BUG/MINOR: compression/htx: Don't add the last block of data if it is empty
    - MEDIUM: mux_h1: Implement h1_show_fd.
    - REGTEST: script: Add support of alternatives in requited options list
    - REGTEST: Add a basic test for the compression
    - BUG/MEDIUM: mux-h2: don't needlessly wake up the demux on short frames
    - REGTEST: A basic test for "http-buffer-request"
    - BUG/MEDIUM: server: Also copy "check-sni" for server templates.
    - MINOR: ssl: Add ssl_sock_set_alpn().
    - MEDIUM: checks: Add check-alpn.
2018-12-22 11:20:35 +01:00
Olivier Houchard
921501443b MEDIUM: checks: Add check-alpn.
Add a way to configure the ALPN used by check, with a new "check-alpn"
keyword. By default, the checks will use the server ALPN, but it may not
be convenient, for instance because the server may use HTTP/2, while checks
are unable to do HTTP/2 yet.
2018-12-21 19:54:16 +01:00
Olivier Houchard
ab28a320aa MINOR: ssl: Add ssl_sock_set_alpn().
Add a new function, ssl_sock_set_alpn(), to be able to change the ALPN
for a connection, instead of relying of the one defined in the SSL_CTX.
2018-12-21 19:53:30 +01:00
Olivier Houchard
8ab8a6eee5 BUG/MAJOR: connections: Close the connection before freeing it.
In si_release_endpoint(), if the end point is a connection, because we don't
know which mux to use it, make sure we close the connection before freeing it,
or else, we'd have a fd left for polling, which would point to a now free'd
connection.

This should be backported to 1.9.
2018-12-20 06:03:14 +01:00
Willy Tarreau
e9f4301f0f MINOR: connection: add cs_set_error() to set the error bits
Depending on the CS_FL_EOS status, we either set CS_FL_ERR_PENDING
or CS_FL_ERROR at various places. Let's have a generic function to
do this.
2018-12-19 18:13:52 +01:00
Willy Tarreau
14bfe9af12 CLEANUP: stream-int: consistently call the si/stream_int functions
As long-time changes have accumulated over time, the exported functions
of the stream-interface were almost all prefixed "si_<something>" while
most private ones (mostly callbacks) were called "stream_int_<something>".
There were still a few confusing exceptions, which were addressed to
follow this shcme :
  - stream_sock_read0(), only used internally, was renamed stream_int_read0()
    and made static
  - stream_int_notify() is only private and was made static
  - stream_int_{check_timeouts,report_error,retnclose,register_handler,update}
    were renamed si_<something>.

Now it is clearer when checking one of these if it risks to be used outside
or not.
2018-12-19 15:25:43 +01:00
Willy Tarreau
94031d30d7 MINOR: connection: remove an unwelcome dependency on struct stream
There was a reference to struct stream in conn_free() for the case
where we're freeing a connection that doesn't have a mux attached.
For now we know it's always a stream, and we only need to do it to
put a NULL in s->si[1].end.

Let's do it better by storing the pointer to si[1].end in the context
and specifying that this pointer is always nulled if the mux is null.
This way it allows a connection to detach itself from wherever it's
being used. Maybe we could even get rid of the condition on the mux.
2018-12-19 14:36:29 +01:00
Willy Tarreau
3d2ee55ebd CLEANUP: connection: rename conn->mux_ctx to conn->ctx
We most often store the mux context there but it can also be something
else while setting up the connection. Better call it "ctx" and know
that it's the owner's context than misleadingly call it mux_ctx and
get caught doing suspicious tricks.
2018-12-19 14:13:07 +01:00
Willy Tarreau
4f6516d677 CLEANUP: connection: rename subscription events values and event field
The SUB_CAN_SEND/SUB_CAN_RECV enum values have been confusing a few
times, especially when checking them on reading. After some discussion,
it appears that calling them SUB_RETRY_SEND/SUB_RETRY_RECV more
accurately reflects their purpose since these events may only appear
after a first attempt to perform the I/O operation has failed or was
not completed.

In addition the wait_reason field in struct wait_event which carries
them makes one think that a single reason may happen at once while
it is in fact a set of events. Since the struct is called wait_event
it makes sense that this field is called "events" to indicate it's the
list of events we're subscribed to.

Last, the values for SUB_RETRY_RECV/SEND were swapped so that value
1 corresponds to recv and 2 to send, as is done almost everywhere else
in the code an in the shutdown() call.
2018-12-19 14:09:21 +01:00
Willy Tarreau
beefaee4f5 MEDIUM: h2: properly check and deduplicate the content-length header in HTX
When producing an HTX message, we can't rely on the next-level H1 parser
to check and deduplicate the content-length header, so we have to do it
while parsing a message. The algorithm is the exact same as used for H1
messages.
2018-12-19 13:08:08 +01:00
Willy Tarreau
d5e3c71208 MINOR: objtype: report a few missing types in names and base pointers
Types DNS_SRVRQ and CS were not referenced in the type to string
conversions, causing possibly misleading outputs in session dumps.
Now instead of showing "NONE" for unknown invalid types names, we
display "!INVAL!" to clear the confusion that may exist in case of
memory corruption for example.
2018-12-18 16:31:10 +01:00
Olivier Houchard
71748cb91b BUG/MEDIUM: connection: Add a new CS_FL_ERR_PENDING flag to conn_streams.
Add a new flag to conn_streams, CS_FL_ERR_PENDING. This is to be set instead
of CS_FL_ERR in case there's still more data to be read, so that we read all
the data before closing.
2018-12-17 21:54:14 +01:00
Willy Tarreau
bce4d8a37d MINOR: debug: make the ABORT_NOW macro use a volatile int
Similar to previous commit, let's make the macro use a volatile when
dereferencing NULL so that clang doesn't optimize it away.
2018-12-16 08:17:23 +01:00
Olivier Houchard
51e474136b MINOR: pools: Cast to volatile int * instead of int *.
When using DEBUG_MEMORY_POOLS, when we want to crash, instead of using
*(int *)0 = 0, use *(volatile int *)0 = 0, or clang will just translate it
to a nop, instead of dereferencing 0.
2018-12-16 08:15:16 +01:00