Commit Graph

3817 Commits

Author SHA1 Message Date
Willy Tarreau
bad0a381d3 MINOR: hpack: move the length computation and encoding functions to .h
We'll need these functions from other inline functions, let's make them
accessible. len_to_bytes() was renamed to hpack_len_to_bytes() since it's
now exposed.
2018-12-11 09:06:46 +01:00
Willy Tarreau
2df026fbce CLEANUP: hpack: no need to include chunk.h, only include buf.h
Chunk.h used to be needed to declare the struct chunk which we don't
use anymore, let's fall back to the lighter buf.h
2018-12-11 09:06:06 +01:00
Willy Tarreau
071d4b31ff MINOR: compiler: add a new macro ALREADY_CHECKED()
This macro may be used to block constant propagation that lets the compiler
detect a possible NULL dereference on a variable resulting from an explicit
assignment in an impossible check. Sometimes a function is called which does
safety checks and returns NULL if safe conditions are not met. The place
where it's called cannot hit this condition and dereferencing the pointer
without first checking it will make the compiler emit a warning about a
"potential null pointer dereference" which is hard to work around. This
macro "washes" the pointer and prevents the compiler from emitting tests
branching to undefined instructions. It may only be used when the developer
is absolutely certain that the conditions are guaranteed and that the
pointer passed in argument cannot be NULL by design.

A typical use case is a top-level function doing this :

     if (frame->type == HEADERS)
        parse_frame(frame);

Then parse_frame() does this :

    void parse_frame(struct frame *frame)
    {
        const char *frame_hdr;

        frame_hdr = frame_hdr_start(frame);
        if (*frame_hdr == FRAME_HDR_BEGIN)
            process_frame(frame);
    }

and :

    const char *frame_hdr_start(const struct frame *frame)
    {
        if (frame->type == HEADERS)
            return frame->data;
        else
            return NULL;
    }

Above parse_frame() is only called for frame->type == HEADERS so it will
never get a NULL in return from frame_hdr_start(). Thus it's always safe
to dereference *frame_hdr since the check was already performed above.
It's then safe to address it this way instead of inventing dummy error
code paths that may create real bugs :

    void parse_frame(struct frame *frame)
    {
        const char *frame_hdr;

        frame_hdr = frame_hdr_start(frame);
        ALREADY_CHECKED(frame_hdr);
        if (*frame_hdr == FRAME_HDR_BEGIN)
            process_frame(frame);
    }
2018-12-08 15:27:03 +01:00
Willy Tarreau
d6735d611e MEDIUM: ist: use local conversion arrays to case conversion
Calling tolower/toupper for each character is slow, a lookup into a
256-byte table is cheaper, especially for common characters used in
header field names which all fit into a cache line. Let's create these
two variables marked weak so that they're included only once.
2018-12-07 13:25:59 +01:00
Willy Tarreau
3f2d696d72 MINOR: ist: add functions to copy/uppercase/lowercase into a buffer or string
The ist functions were missing functions to copy an IST into a target
buffer, making some code have to resort to memcpy(), which tends to be
overkill for small strings, that the compiler cannot guess. In addition
sometimes there is a need to turn a string to lower or upper case so it
had to be overwritten after the operation.

This patch adds 6 functions to copy an ist to a buffer, as binary or as a
string (i.e. a zero is or is not appended), and optionally to apply a
lower case or upper case transformation on the fly.

A number of tests were performed to optimize the processing for small
strings. The loops are marked unlikely to dissuade the compilers from
over-optimizing them and switching to SIMD instructions. The lower case
or upper case transformations used to rely on external functions for
each character and to crappify the code due to clobbered registers,
which is not acceptable when we know that only a certain class of chars
has to be transformed, so the test was open-coded.
2018-12-07 13:25:59 +01:00
Olivier Houchard
d247be0620 BUG/MEDIUM: connections: Split CS_FL_RCV_MORE into 2 flags.
CS_FL_RCV_MORE is used in two cases, to let the conn_stream
know there may be more data available, and to let it know that
it needs more room. We can't easily differentiate between the
two, and that may leads to hangs, so split it into two flags,
CS_FL_RCV_MORE, that means there may be more data, and
CS_FL_WANT_ROOM, that means we need more room.

This should not be backported.
2018-12-06 16:36:05 +01:00
Willy Tarreau
adc7f3edd2 BUG/MEDIUM: stream-int: don't attempt to receive if the connection is not established
If we try to receive before the connection is established, we lose the
send event and are not woken up anymore once the connection is established.
This was diagnosed by Olivier.

No backport is needed.
2018-12-06 15:25:58 +01:00
Willy Tarreau
a3b62d374a MINOR: stream-int: add a new blocking condition on the remote connection
There are some situations where we need to wait for the other side to
be connected. None of the current blocking flags support this. It used
to work more or less by accident using the old flags. Let's add a new
flag to mention we're blocking on this, it's removed by si_chk_rcv()
when a connection is established. It should be enough for now.
2018-12-06 15:24:01 +01:00
William Lallemand
27f3fa56f5 BUG/MEDIUM: mworker: stop every tasks in the master
The master is not supposed to run (at the moment) any task before the
polling loop, the created tasks should be run only in the workers but in
the master they should be disabled or removed.

No backport needed.
2018-12-06 14:12:58 +01:00
Christopher Faulet
aa75b3d2d5 CLEANUP: htx: Fix indentation here and there in HTX files 2018-12-05 17:33:14 +01:00
Christopher Faulet
b2aedea142 MEDIUM: channel/htx: Add functions for forward HTX data
To ease the fast forwarding and the infinte forwarding on HTX proxies, 2
functions have been added to let the channel be almost aware of the way data are
stored in its buffer. By calling these functions instead of legacy ones, we are
sure to forward the right amount of data.
2018-12-05 17:29:30 +01:00
Christopher Faulet
27ba2dc6d6 MEDIUM: htx: Rework conversion from a buffer to an htx structure
Now, the function htx_from_buf() will set the buffer's length to its size
automatically. In return, the caller should call htx_to_buf() at the end to be
sure to leave the buffer hosting the HTX message in the right state. When the
caller can use the function htxbuf() to get the HTX message without any update
on the underlying buffer.
2018-12-05 17:10:16 +01:00
Willy Tarreau
3906e22f6f MINOR: htx: add buf_room_for_htx_data() to help optimize buffer transfers
The small HTX overhead is enough to make the system perform multiple
reads and unaligned memory copies. Here we provide a function whose
purpose is to reduce the apparent room in a buffer by the size of the
overhead for DATA blocks, which is the struct htx plus 2 blocks (one
for DATA, one for the end of message so that small blocks can fit at
once). The muxes using HTX will be encouraged to use this one instead
of b_room() to compute the available buffer room and avoid filling
their demux buf with more data than can fit at once into the HTX
buffer.
2018-12-05 10:57:42 +01:00
Willy Tarreau
8ae4235f94 MINOR: htx: make htx_from_buf() adjust the size only on new buffers
This one is used a lot during transfers, let's avoid resetting its
size when there are already data in the buffer since it implies the
size is correct.
2018-12-05 10:57:42 +01:00
Christopher Faulet
c59ff23804 MINOR: htx: Rename functions htx_*_to_str() to be H1 specific
"_to_h1" suffix is now used because these function produce H1 strings. It avoids
any ambiguity on the output format.
2018-12-04 05:51:37 +01:00
Joseph Herlant
75a323f04e CLEANUP: Fix a typo in the listener subsystem
Fixes a typo in the code comment of the listener subsystem.
2018-12-02 18:43:28 +01:00
Joseph Herlant
f69b807fa4 CLEANUP: Fix typos in the file descriptor subsystem
Fixes 2 typos in the code comment of the file descriptor subsystem.
2018-12-02 18:43:25 +01:00
Joseph Herlant
0b75e63dc5 CLEANUP: Fix a typo in the checks header file
Fixes a typo in the code comments of the checks header file.
2018-12-02 18:43:21 +01:00
Joseph Herlant
eeac3c722f CLEANUP: Fix a typo in the protocol header file
Fixes a typo in the code comments of the header file holding the general
protocol primitives.
2018-12-02 18:42:49 +01:00
Joseph Herlant
8a95a6e5ed CLEANUP: Fix a typo in the connection subsystem
Fixes a typo in the code comments of the connection subsystem.
2018-12-02 18:42:12 +01:00
Joseph Herlant
41abef77cb CLEANUP: Fix a typo in the mini-clist header
Fixes a typo in the code comments of the mini-clist header.
2018-12-02 18:38:15 +01:00
Joseph Herlant
30bc509c40 CLEANUP: Fix typos in the h1 subsystem
Fixes typos in the code comments of the h1 subsystem.
2018-12-02 18:38:02 +01:00
Joseph Herlant
be7619aaca CLEANUP: Fix typo in the chunk headers file
Fix a typo detected in the chunk.h header file's code comments.
2018-12-02 18:37:56 +01:00
Joseph Herlant
c42c0e9969 CLEANUP: fix typos in the htx subsystem
Fix typos detected in the code comments of the htx subsystem.
2018-12-02 18:37:50 +01:00
Olivier Houchard
0c18a6fe34 MEDIUM: servers: Add a way to keep idle connections alive.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
2018-12-02 18:16:53 +01:00
Olivier Houchard
8defe4b51a MINOR: mux: add a "max_streams" method.
Add a new method to muxes, "max_streams", that returns the max number of
streams the mux can handle. This will be used to know if a mux is in use
or not.
2018-12-02 17:48:32 +01:00
Olivier Houchard
f3e65b086d MINOR: connection: Fix a comment.
Connections can now have an owner for outgoing connections, so update
the comment tu reflect that.
2018-12-02 17:48:28 +01:00
Willy Tarreau
1329b5be71 MINOR: h2: add new functions to produce an HTX message from an H2 response
The new function h2_prepare_htx_stsline() produces an HTX response message
from an H2 response presented as a list of header fields.
2018-12-02 13:30:17 +01:00
Willy Tarreau
3fbea1d8d0 MINOR: server: the mux_proto entry in the server is const
Same as previous commit. We'll have to update this one soon, let's
avoid any cast and mark it const as it really is.
2018-12-02 13:12:16 +01:00
Willy Tarreau
5fc311c001 MINOR: connection: create conn_get_best_mux_entry()
We currently have conn_get_best_mux() to return the best mux for a
given protocol name, side and proxy mode. But we need the mux entry
as well in order to fix the bind_conf and servers at the end of the
config parsing. Let's split the function in two parts. It's worth
noting that the <conn> argument is never used anymore so this part
is eligible to some cleanup.
2018-12-02 13:12:16 +01:00
Willy Tarreau
a004ae3e66 MINOR: listener: the mux_proto entry in the bind_conf is const
We'll have to update this one soon, let's avoid any cast and mark it
const as it really is.
2018-12-02 13:12:15 +01:00
Willy Tarreau
6deb4129de MINOR: h2: implement H2->HTX request header frame transcoding
Till now we could only produce an HTTP/1 request from a list of H2
request headers. Now the new function h2_make_htx_request() does the
same but using the HTX encoding instead, while respecting the H2
semantics. The code is not much different from the first version,
only the encoding differs.

For now it's not used.
2018-12-01 17:38:32 +01:00
Christopher Faulet
75bc913d23 MAJOR: filters: Adapt filters API to be compatible with the HTX represenation
First, to be called on HTX streams, a filter must explicitly be declared as
compatible by setting the flag STRM_FLT_FL_HAS_FILTERS on the filter's config at
HAProxy startup. This flag is checked when a filter implementation is attached
to a stream.

Then, some changes have been made on HTTP callbacks. The callback http_payload
has been added to filter HTX data. It will be called on HTX streams only. It
replaces the callbacks http_data, http_chunk_trailers and http_forward_data,
called on legacy HTTP streams only and marked as deprecated. The documention
(once updated)) will give all information to implement this new callback. Other
HTTP callbacks will be called for HTX and HTTP legacy streams. So it is the
filter's responsibility to known which kind of data it handles. The macro
IS_HTX_STRM should be used in such cases.

There is at least a noticeable changes in the way data are forwarded. In HTX,
after the call to the callback http_headers, all the headers are considered as
forwarded. So, in http_payload, only the body and eventually the trailers will
be filtered.
2018-12-01 17:37:27 +01:00
Christopher Faulet
e44769b4fa MINOR: mux-h1: Capture bad H1 messages
First of all, an dedicated error snapshot, h1_snapshot, has been added. It
contains more or less the some info than http_snapshot but adapted for H1
messages. Then, the function h1_capture_bad_message() has been added to capture
bad H1 messages. And finally, the function h1_show_error_snapshot() is used to
dump these errors. Only Headers or data parsing are captured.
2018-12-01 17:37:27 +01:00
Christopher Faulet
a7b677cd0d MEDIUM: proto_htx: Convert all HTTP error messages into HTX
During startup, after the configuration parsing, all HTTP error messages
(errorloc, errorfile or default messages) are converted into HTX messages and
stored in dedicated buffers. We use it to return errors in the HTX analyzers
instead of using ugly OOB blocks.
2018-12-01 17:37:27 +01:00
Christopher Faulet
b2db4fa016 MINOR: htx: Add BODYLESS flags on the HTX start-line and the HTTP message
the flags HTX_SL_F_BODYLESS and HTTP_MSGF_BODYLESS have been added. These flags
are set when the corresponding HTTP message has no body at all.
2018-12-01 17:37:27 +01:00
Christopher Faulet
f1ba18d7b3 MEDIUM: htx: Don't rely on h1_sl anymore except during H1 header parsing
Instead, we now use the htx_sl coming from the HTX message. It avoids to have
too H1 specific code in version-agnostic parts. Of course, the concept of the
start-line is higly influenced by the H1, but the structure htx_sl can be
adapted, if necessary. And many things depend on a start-line during HTTP
analyzis. Using the structure htx_sl also avoid boring conversions between HTX
version and H1 version.
2018-12-01 17:37:27 +01:00
Christopher Faulet
54483df5ba MINOR: htx: Add the start-line offset for the HTX message in the HTX structure
If there is no start-line, this offset is set to -1. Otherwise, it is the
relative address where the start-line is stored in the data block. When the
start-line is added, replaced or removed, this offset is updated accordingly. On
remove, if the start-line is no set and if the next block is a start-line, the
offset is updated. Finally, when an HTX structure is defragmented, the offset is
also updated accordingly.
2018-12-01 17:37:27 +01:00
Christopher Faulet
570d1614fa MEDIUM: htx: Change htx_sl to be a struct instead of an union
The HTX start-line is now a struct. It will be easier to extend, if needed. Same
info can be found, of course. In addition it is now possible to set flags on
it. It will be used to set some infos about the message.

Some macros and functions have been added in proto/htx.h to help accessing
different parts of the start-line.
2018-12-01 17:37:27 +01:00
Christopher Faulet
14e88252f2 MINOR: htx: Add a function to find the HTX block corresponding to a data offset
The function htx_find_blk() returns the HTX block containing data with a given
offset, relatively to the beginning of the HTX message. It is a good way to skip
outgoing data and find the first HTX block not already processed.
2018-12-01 17:37:27 +01:00
Christopher Faulet
d16b0a7b2d MINOR: htx: Add function to iterate on an HTX message using HTX blocks
the functions htx_get_next() and htx_get_prev() are used to iterate on an HTX
message using blocks position. With htx_get_next_blk() and htx_get_prev_blk(),
it is possible to do the same, but with HTX blocks. Of course, internally, we
rely on position's versions to do so. But it is handy for callers to not take
care of the blocks position.
2018-12-01 17:37:27 +01:00
Christopher Faulet
24ed835129 MINOR: htx: Add function to add an HTX block just before another one
The function htx_add_data_before() can be used to add an HTX block before
another one. For instance, it could be used to add some data before the
end-of-message marker.
2018-12-01 17:37:27 +01:00
Christopher Faulet
3bc1b11dae MEDIUM: conn_stream: Add a way to get mux's info on a CS from the upper layer
Time to time, the need arises to get some info owned by the multiplexer about a
connection stream from the upper layer. Today we really need to get some dates
and durations specific to the conn_stream. It is only true for the mux H1 and
H2. Otherwise it will be impossible to have correct times reported in the logs.

To do so, the structure cs_info has been defined to provide all info we ever
need on a conn_stream from the upper layer. Of course, it is the first step. So
this structure will certainly envloved. But for now, only the bare minimum is
referenced. On the mux side, the callback get_cs_info() has been added in the
structure mux_ops. Multiplexers can now implement it, if necessary, to return a
pointer on a structure cs_info. And finally, the function si_get_cs_info()
should be used from the upper layer. If the stream interface is not attached to
a connection stream, this function returns NULL, likewise if the callback
get_cs_info() is not defined for the corresponding mux.
2018-12-01 17:37:27 +01:00
Willy Tarreau
c01ed9ff20 MINOR: htx: add a function to cut the beginning of a DATA block
htx_cut_data_blk() is used to cut the beginning of a DATA block after a
part of it was tranferred. It simply advances the address, reduces the
advertised length and updates the htx's total data count.
2018-12-01 17:36:59 +01:00
Willy Tarreau
d3c49d17dc BUG/MINOR: connection: report mux modes when HTX is supported
It looks like we forgot to report HTX when listing the muxes and their
respective protocols, leading to "NONE" being displayed. Let's report
"HTX" and "HTTP|HTX" since both will exist. Also fix a minor typo in
the output message.
2018-12-01 17:33:35 +01:00
Olivier Houchard
00cf70f28b MAJOR: sessions: Store multiple outgoing connections in the session.
Instead of just storing the last connection in the session, store all of
the connections, for at most MAX_SRV_LIST (currently 5) targets.
That way we can do keepalive on more than 1 outgoing connection when the
client uses HTTP/2.
2018-12-01 10:47:18 +01:00
William Lallemand
4b58c80ee2 REORG: mworker: declare master variable in global.h
This variable is used at several places, better declare it in global.h.
2018-11-27 19:34:00 +01:00
Willy Tarreau
7f0165e399 MEDIUM: memory: make the pool cache an array and not a thread_local
Having a thread_local for the pool cache is messy as we need to
initialize all elements upon startup, but we can't until the threads
are created, and once created it's too late. For this reason, the
allocation code used to check for the pool's initialization, and
it was the release code which used to detect the first call and to
initialize the cache on the fly, which is not exactly optimal.

Now that we have initcalls, let's turn this into a per-thread array.
This array is initialized very early in the boot process (STG_PREPARE)
so that pools are always safe to use. This allows to remove the tests
from the alloc/free calls.

Doing just this has removed 2.5 kB of code on all cumulated pool_alloc()
and pool_free() paths.
2018-11-26 19:50:32 +01:00
Willy Tarreau
b6b3df3ed3 MEDIUM: initcall: use initcalls for a few initialization functions
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
2018-11-26 19:50:32 +01:00
Willy Tarreau
2455cebe00 MEDIUM: memory: use pool_destroy_all() to destroy all pools on deinit()
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().

For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).

This allowed to remove 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and became static now.
2018-11-26 19:50:32 +01:00
Willy Tarreau
8ceae72d44 MEDIUM: init: use initcall for all fixed size pool creations
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.

It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
2018-11-26 19:50:32 +01:00
Willy Tarreau
7107c8b494 MINOR: memory: add a callback function to create a pool
The new function create_pool_callback() takes 3 args including the
return pointer, and creates a pool with the specified name and size.
In case of allocation error, it emits an error message and returns.

The new macro REGISTER_POOL() registers a callback using this function
and will be usable to request some pools creation and guarantee that
the allocation will be checked. An even simpler approach is to use
DECLARE_POOL() and DECLARE_STATIC_POOL() which declare and register
the pool.
2018-11-26 19:50:32 +01:00
Willy Tarreau
e655251e80 MINOR: initcall: use initcalls for section parsers
The two calls to cfg_register_section() and cfg_register_postparser()
are now supported by initcalls. This allowed to remove two other
constructors.
2018-11-26 19:50:32 +01:00
Willy Tarreau
172f5ce948 MINOR: initcall: use initcalls for most post_{check,deinit} and per_thread*
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.

Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
2018-11-26 19:50:32 +01:00
Willy Tarreau
8071338c78 MINOR: initcall: apply initcall to all register_build_opts() calls
Most register_build_opts() calls use static strings. These ones were
replaced with a trivial REGISTER_BUILD_OPTS() statement adding the string
and its call to the STG_REGISTER section. A dedicated section could be
made for this if needed, but there are very few such calls for this to
be worth it. The calls made with computed strings however, like those
which retrieve OpenSSL's version or zlib's version, were moved to a
dedicated function to guarantee they are called late in the process.
For example, the SSL call probably requires that SSL_library_init()
has been called first.
2018-11-26 19:50:32 +01:00
Willy Tarreau
90fa97b65e MINOR: threads: add new macros to declare self-initializing locks
Using __decl_spinlock(), __decl_rwlock(), __decl_aligned_spinlock()
and __decl_aligned_rwlock(), one can now simply declare a spinlock
or an rwlock which will automatically be initialized at boot time
by calling the ha_spin_init() or ha_rwlock_init() callback. The
"aligned" variants enforce a 64-byte alignment on the lock.
2018-11-26 19:50:32 +01:00
Willy Tarreau
a8ae77da61 MINOR: thread: provide a set of lock initialisers
This patch adds ha_spin_init() and ha_rwlock_init() which are used as
a callback to initialise locks at boot time. They perform exactly the
same as HA_SPIN_INIT() or HA_RWLOCK_INIT() but from within a real
function.
2018-11-26 19:50:32 +01:00
Willy Tarreau
d13a9281bd MINOR: initcall: introduce a way to register init functions to call at boot
We currently have to deal with multiple initialization stages in a way
that can be confusing, because certain parts rely on others having been
properly initialized. Most calls consist in adding lists to existing
lists, whose heads are initialized in the declaration so this is easy.
But some calls create new pools and require pools to be properly
initialized. Pools currently are thread-local and as such cannot be
pre-initialized, requiring run-time checks.

All this could be simplified by using multiple boot stages and allowing
functions to be registered at various stages.

One approach might be to use gcc's constructor priorities, but this
requires gcc >= 4.3 which eliminates a wide spectrum of working compilers,
and some versions of certain compilers (like clang 3.0) are known for
silently ignore these priorities.

Instead we can use our own init function registration mechanism. A first
attempt was made using register_function() calls in all constructors but
this made the code more painful.

This patch's approach is different. It creates sections containing
arrays of pointers to "initcall" descriptors. An initcall contains a
pointer to a function and an argument. Each section corresponds to a
specific initialization stage. Each module creates such descriptors
for various calls it requires. The main() function starts by scanning
each of these sections in turn to process these initcalls.

This will make it possible to remove many constructors from various
modules, by simply placing initcalls for the requested functions next
to the keyword lists that need to be called.

A first attempt was made by placing the initcalls directly into the
sections instead of creating an array of pointers, but it becomes
sensitive to the array's alignment which depends on the compiler and
the linker, so it seems too fragile.

For now we support 6 init stages :
  - STG_PREPARE  : preset variables, tables and list heads
  - STG_LOCK     : initialize spinlocks and rwlocks
  - STG_ALLOC    : allocate the required structures
  - STG_POOL     : create pools
  - STG_REGISTER : register static lists (keywords etc)
  - STG_INIT     : subsystems normal initialization

These ones are declared directly in the files where they are needed
using one of the INITCALL* macros, passing 0 to 3 pointers as
arguments.

The API should possibly be extended to support a return value to give
a status to the caller, and to support a unified API, possibly a bit
more flexibility in the arguments. In this case it might make sense to
support a set of macros to register functions having a different API
and to pass the function type in the initcall itself.

Special thanks to Olivier for showing how to scan sections as this is
not something particularly well documented and exactly what I've been
missing to achieve this.
2018-11-26 19:50:32 +01:00
Willy Tarreau
a7280a1ec2 BUILD: buffers: buf.h requires unistd to get ssize_t on libmusl
Building with musl and gcc-5.3 for MIPS returns this :

include/common/buf.h: In function 'b_dist':
include/common/buf.h:252:2: error: unknown type name 'ssize_t'
  ssize_t dist = to - from;
  ^
Including stdint or stddef is not sufficient there to get ssize_t,
unistd is needed as well. It's likely that other platforms will have
the same issue. This patch also addresses it in ist.h and memory.h.
2018-11-26 19:49:21 +01:00
Willy Tarreau
6689609090 BUILD: htx: fix fprintf format inconsistency on 32-bit platforms
Building on 32 bits gives this :

  include/proto/htx.h: In function 'htx_dump':
  include/proto/htx.h:443:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 8 has type 'uint64_t {aka long long unsigned int}' [-Wformat=]
         fprintf(stderr, "htx:%p [ size=%u - data=%u - used=%u - wrap=%s - extra=%lu]\n",
                         ^
In htx_dump(), fprintf() uses %lu but the value is an uint64_t so it
doesn't match on 32-bit. Let's cast this to unsigned long long and use
%llu instead.
2018-11-26 19:37:32 +01:00
Olivier Houchard
ee23b2a1e3 MEDIUM: servers: Store the connection in the SI until we have a mux.
When we create a connection, if we have to defer the conn_stream and the
mux creation until we can decide it (ie until the SSL handshake is done, and
the ALPN is decided), store the connection in the stream_interface, so that
we're sure we can destroy it if needed.
2018-11-23 19:11:14 +01:00
Olivier Houchard
201b9f4eb5 MAJOR: connections: Defer mux creation for outgoing connection if alpn is set.
If an ALPN (or a NPN) was chosen for a server, defer choosing the mux until
after the SSL handshake is done, and the ALPN/NPN has been negociated, so
that we know which mux to pick.
2018-11-22 19:52:23 +01:00
Olivier Houchard
c756600103 MINOR: server: Add "alpn" and "npn" keywords.
Add new keywords to "server" lines, alpn and npn.
If set, when connecting through SSL, those alpn/npn will be negociated
during the SSL handshake.
2018-11-22 19:50:08 +01:00
Willy Tarreau
beb859abce MINOR: polling: add an option to support busy polling
In some situations, especially when dealing with low latency on processors
supporting a variable frequency or when running inside virtual machines,
each time the process waits for an I/O using the poller, the processor
goes back to sleep or is offered to another VM for a long time, and it
causes excessively high latencies.

A solution to this provided by this patch is to enable busy polling using
a global option. When busy polling is enabled, the pollers never sleep and
loop over themselves waiting for an I/O event to happen or for a timeout
to occur. On multi-processor machines it can significantly overheat the
processor but it usually results in much lower latencies.

A typical test consisting in injecting traffic over a single connection at
a time over the loopback shows a bump from 4640 to 8540 connections per
second on forwarded connections, indicating a latency reduction of 98
microseconds for each connection, and a bump from 12500 to 21250 for
locally terminated connections (redirects), indicating a reduction of
33 microseconds.

It is only usable with epoll and kqueue because select() and poll()'s
API is not convenient for such usages, and the level of performance they
are used in doesn't benefit from this anyway.

The option, which obviously remains disabled by default, can be turned
on using "busy-polling" in the global section, and turned off later
using "no busy-polling". Its status is reported in "show info" to help
troubleshooting suspicious CPU spikes.
2018-11-22 19:47:30 +01:00
Willy Tarreau
48f8bc1368 MINOR: poller: move the call of tv_update_date() back to the pollers
The reason behind this will be to be able to compute a timeout when
busy polling.
2018-11-22 18:57:37 +01:00
Willy Tarreau
9efd7456e0 MEDIUM: tasks: collect per-task CPU time and latency
Right now we measure for each task the cumulated time spent waiting for
the CPU and using it. The timestamp uses a 64-bit integer to report a
nanosecond-level date. This is only enabled when "profiling.tasks" is
enabled, and consumes less than 1% extra CPU on x86_64 when enabled.
The cumulated processing time and wait time are reported in "show sess".

The task's counters are also reset when an HTTP transaction is reset
since the HTTP part pretends to restart on a fresh new stream. This
will make sure we always report correct numbers for each request in
the logs.
2018-11-22 15:44:21 +01:00
Willy Tarreau
75c62c2793 MINOR: activity: add configuration and CLI support for "profiling.tasks"
This is a new global setting which enables or disables CPU profiling
per task. For now it only sets/resets the variable based on the global
option "profiling.tasks" and supports showing it as well as setting it
from the CLI using "show profiling" and "set profiling". The option will
be used by a future commit. It was done in a way which should ease future
addition of profiling options.
2018-11-22 11:48:51 +01:00
Willy Tarreau
baba82fe70 MINOR: activity: report the average loop time in "show activity"
Since we know the time it takes to process everything between two poll()
calls, we can use this as the max latency measurement any task will
experience and average it.

This code does this, and reports in "show activity" the average of this
loop time over the last 1024 poll() loops, for each thread. It will vary
quickly at high loads and slowly under low to moderate loads, depending
on the rate at which poll() is called. The latency a task experiences
is expected to be half of this on average.
2018-11-22 11:48:41 +01:00
Willy Tarreau
609aad9e73 REORG: time/activity: move activity measurements to activity.{c,h}
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking all the
circular dependency.

Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.

No code was changed, stuff was just moved around.
2018-11-22 11:48:41 +01:00
Willy Tarreau
17306b905e MINOR: cli: add a few missing includes in proto/cli.h
Just found that proto/cli.h doesn't build if types/cli.h is not also
included by the caller, as it uses cli_kw_list is used in arguments.
But it's also true for a few other ones like mworker_proc, stream,
and channel, so let's fix this.
2018-11-22 11:47:53 +01:00
William Lallemand
31a1c1d5e7 MEDIUM: signal: signal_unregister() removes every handlers
The new function signal_unregister() removes every handlers assigned to
a signal. Once the handler list of the signal is empty, the signal is
ignored with SIG_IGN.
2018-11-22 11:42:51 +01:00
William Lallemand
db6bdfbf68 MINOR: cli: add mworker_accept_wrapper to 'show fd'
In the output of 'show fd', the worker CLI's socketpair was still
handled by an "unknown" function. That can be really confusing during
debug. Fixed it by showing "mworker_accept_wrapper" instead.
2018-11-22 11:42:51 +01:00
William Lallemand
9c56a22b20 MINOR: log: introduce ha_notice()
It's like ha_warning() or ha_alert() but with a NOTICE prefix.
2018-11-21 19:02:23 +01:00
William Lallemand
944e619b64 MEDIUM: mworker: wait mode use standard init code path
The mworker waitpid mode (which is used when a reload failed to apply
the new configuration) was still using a specific initialisation path.
That's a problem since we use a polling loop in the master now, the
master proxy is not initialized and the master CLI is not activated.

This patch removes the initialisation code of the wait mode and
introduce the MODE_MWORKER_WAIT in order to use the same init path as
the MODE_MWORKER with some exceptions. It allows to use the master proxy
and the master CLI during the waitpid mode.
2018-11-21 17:05:30 +01:00
William Lallemand
16dd1b3ead MINOR: cli: show master information in 'show proc'
Displays the master information in show proc.
2018-11-20 04:43:54 +01:00
William Lallemand
e368330128 MINOR: cli: displays uptime in show proc
Displays the uptime of the workers in `show proc`
2018-11-20 04:43:54 +01:00
Willy Tarreau
3a1f5fda10 REORG: config: extract the proxy parser into cfgparse-listen.c
This was the largest function of the whole file, taking a rough second
to build alone. Let's move it to a distinct file along with a few
dependencies. Doing so saved about 2 seconds on the total build time.
2018-11-19 06:47:09 +01:00
Willy Tarreau
36b9e222bb REORG: config: extract the global section parser into cfgparse-global
The config parser is the largest file to build and its build dominates
the total project's build time. Let's start to split it into multiple
smaller pieces by extracting the "global" section parser into a new
file called "cfgparse-global.c". This removes 1/4th of the file's build
time.
2018-11-19 06:41:57 +01:00
Joseph Herlant
32b8327266 CLEANUP: Fix typos in the standard subsystem
Fix typos in the code comments of the standard subsystem.
2018-11-18 22:26:42 +01:00
Joseph Herlant
f7f6031184 CLEANUP: Fix typos in the spoe subsystem
Fix typos in the code comments of the spoe subsystem.
2018-11-18 22:26:42 +01:00
Joseph Herlant
757f5ad73a CLEANUP: Fix typos in the sample subsystem
Fix some typos in the code comment of the sample subsystem.
2018-11-18 22:26:42 +01:00
Joseph Herlant
85b4059b82 CLEANUP: Fix typos in the log subsystem
Fix some misspells in the code comments of the log subsystem.
2018-11-18 22:26:42 +01:00
Joseph Herlant
b35ea68081 CLEANUP: Fix typos in the filters subsystem
Fix typos in the code comments of the filters subsystems.
2018-11-18 22:26:42 +01:00
Joseph Herlant
59dd295926 CLEANUP: fix typos in the proxy subsystem
Fix typos in the code comments of the proxy subsystem.
2018-11-18 22:23:15 +01:00
Joseph Herlant
5ba8025976 CLEANUP: fix typos in the proto_http subsystem
Fixes typos in the code comments of the proto_http subsystem.
2018-11-18 22:23:15 +01:00
Joseph Herlant
44466826b1 CLEANUP: fix a few typos in the comments of the server subsystem
A few misspells where detected in the server subsystem. This commit
fixes them.
2018-11-18 22:23:15 +01:00
Joseph Herlant
42cf6395c4 CLEANUP: Fix typos in the dns subsystem
Fix misspells in the code comments of the dns subsystem.
2018-11-18 22:23:15 +01:00
Christopher Faulet
ef453ed9b0 MINOR: http_fetch: Add smp_prefetch_htx
It does the same than smp_prefetch_http but for HTX messages. It can be called
from an HTTP proxy or a TCP proxy. For HTTP proxies, the parsing is handled by
the mux, so it does nothing but wait. For TCP proxies, it tries to parse an HTTP
message and to convert it in a temporary HTX message. Sample fetches will use
this temporary variable to do their job.
2018-11-18 22:09:00 +01:00
Christopher Faulet
fefc73da34 MINOR: proto_htx: Add functions htx_perform_server_redirect
It is more or less the same than legacy version but adapted to be called from
HTX analyzers. In the legacy version of this function, we switch on the HTX code
when applicable.
2018-11-18 22:08:58 +01:00
Christopher Faulet
64159df1fb MINOR: proto_htx: Add functions htx_send_name_header
It is more or less the same than legacy version but adapted to be called from
HTX analyzers. In the legacy version of this function, we switch on the HTX code
when applicable.
2018-11-18 22:08:58 +01:00
Christopher Faulet
25a02f65b1 MINOR: proto_htx: Add functions to check the cacheability of HTX messages
It is more or less the same than legacy versions but adapted to be called from
HTX analyzers. In the legacy versions of these functions, we switch on the HTX
code when applicable.
2018-11-18 22:08:58 +01:00
Christopher Faulet
8d8ac191a7 MINOR: proto_htx: Add functions htx_req_replace_stline and htx_res_set_status
It is more or less the same than legacy versions but adapted to be called from
HTX analyzers. In the legacy versions of these functions, we switch on the HTX
code when applicable.
2018-11-18 22:08:56 +01:00
Christopher Faulet
7233352fe4 MINOR: proto_htx: Add functions htx_transform_header and htx_transform_header_str
It is more or less the same than legacy versions but adapted to be called from
HTX analyzers.
2018-11-18 22:08:56 +01:00
Christopher Faulet
7ff1ceaa5e MINOR: http_htx: Add functions to retrieve a specific occurrence of a header
There are 2 functions. The first one considers any comma as a delimiter for
distinct values. The second one considers full-line headers.
2018-11-18 22:08:55 +01:00
Christopher Faulet
e010c80753 MINOR: http_htx: Add functions to replace part of the start-line 2018-11-18 22:08:54 +01:00
Christopher Faulet
0f226958b7 MINOR: proto_htx: Add some functions to handle HTX messages
More functions will come, but it is the minimum to switch HTX analyzers on the
HTX internal representation.
2018-11-18 22:08:54 +01:00
Christopher Faulet
47596d3787 MINOR: http_htx: Add functions to manipulate HTX messages in http_htx.c
This file will host all functions to manipulate HTTP messages using the HTX
representation. Functions in this file will be able to be called from anywhere
and are mainly related to the HTTP semantics.
2018-11-18 22:08:53 +01:00
Christopher Faulet
a3d2a16fad MEDIUM: htx: Add API to deal with the internal representation of HTTP messages
The internal representation of an HTTP message, called HTX, is a structured
representation, unlike the old one which is a raw representation of
messages. Idea is to have a version-agnostic representation of the HTTP
messages, which can be easily used by to handle HTTP/1, HTTP/2 and hopefully
QUIC messages, and communication from one of them to another.

In this patch, we add types to define the internal representation itself and the
main functions to manipulate them.
2018-11-18 22:08:53 +01:00
Christopher Faulet
f2824e6e10 MAJOR: mux-h1/proto_htx: Handle keep-alive connections in the mux
Now, the connection mode is detected in the mux and not in HTX analyzers
anymore. Keep-alive connections are now managed by the mux. A new stream is
created for each transaction. This removes the most important part of the
synchronization between channels and the HTTP transaction cleanup. These changes
only affect the HTX part (proto_htx.c). Legacy HTTP analyzers remain untouched
for now.

On the client-side, the mux is responsible to create new streams when a new
request starts. It is also responsible to parse and update the "Connection:"
header of the response. On the server-side, the mux is responsible to parse and
update the "Connection:" header of the request. Muxes on each side are
independent. For now, there is no connection pool on the server-side, so it
always close the server connection.
2018-11-18 22:02:42 +01:00
Christopher Faulet
e0768ebabc MEDIUM: proto_htx: Add HTX analyzers and use it when the mux H1 is used
For now, these analyzers are just copies of the legacy HTTP analyzers. But,
during the HTTP refactoring, it will be the main place where it will be
visible. And in legacy analyzers, the macro IS_HTX_STRM is used to know if the
HTX version should be called or not.

Note: the following commits were applied to proto_http.c after this patch
      was developed and need to be studied to see if an adaptation to htx
      is required :

  fd9b68c BUG/MINOR: only mark connections private if NTLM is detected
2018-11-18 21:45:50 +01:00
Christopher Faulet
1d5b85aba2 MINOR: http: Add macros to check if a stream uses the HTX representation
To prepare the refactoring of the code handling HTTP messages, these macros will
help to use HTX functions instead of legacy ones when the new HTX internal
representation is in use. To do so, for a given stream, we will check if its
frontend has the option PR_O2_USE_HTX. It is useless to test backend options
because it is not possible to mix the HTX representation and the legacy one
(i.e, having an HTX frontend and a legacy backend or vice versa).
2018-11-18 21:45:50 +01:00
Christopher Faulet
effc3750cc MINOR: conn_stream: Add a flag to notify the SI some data were received
The flag CS_FL_READ_PARTIAL can be set by the mux on the conn_stream to notify
the stream interface that some data were received. Is is used in si_cs_recv to
re-arm read timeout on the channel.
2018-11-18 21:45:49 +01:00
Christopher Faulet
27a3dc8fb2 MINOR: http: Call http_send_name_header with the stream instead of the txn
This is just a minor change to ease integrartion of the HTX.
2018-11-18 21:45:49 +01:00
Christopher Faulet
8277ca72b1 MINOR: http: Add standalone functions to parse a start-line or a header
These 2 functions are pretty naive. They only split a start-line into its 3
substrings or a header line into its name and value. Spaces before and after
each part are skipped. No CRLF at the end are expected.
2018-11-18 21:45:49 +01:00
Christopher Faulet
72d9125efb MINOR: conn_stream: Add a flag to notify the mux it must respect the reserve
By setting the flag CO_RFL_KEEP_RSV when calling mux->rcv_buf, the
stream-interface notifies the mux it must keep some space to preserve the
buffer's reserve. This flag is only useful for multiplexers handling structured
data, because in such case, the stream-interface cannot know the real amount of
free space in the channel's buffer.
2018-11-18 21:45:48 +01:00
Christopher Faulet
c6618d6835 MINOR: conn_stream: Add a flag to notify the mux it should flush its buffers
By setting the flag CO_RFL_BUF_FLUSH when calling mux->rcv_buf, the
stream-interface notifies the mux it should flush its buffers without reading
more data. This flag is set when the SI want to use the kernel TCP splicing to
forward data. Of course, the mux can respect it or not, depending on its
state. It's just an information.
2018-11-18 21:45:48 +01:00
Olivier Houchard
7c6f8b146d MAJOR: connections: Detach connections from streams.
Do not destroy the connection when we're about to destroy a stream. This
prevents us from doing keepalive on server connections when the client is
using HTTP/2, as a new stream is created for each request.
Instead, the session is now responsible for destroying connections.
When reusing connections, the attach() mux method is now used to create a new
conn_stream.
2018-11-18 21:45:45 +01:00
Olivier Houchard
131fd89d5a MINOR: sessions: Start to store the outgoing connection in sessions.
Introduce a new field in session, "srv_conn", and a linked list of sessions
in the connection. It will be used later when we'll switch connections
from being managed by the stream, to being managed by the session.
2018-11-18 21:44:56 +01:00
Olivier Houchard
060ed43361 MINOR: mux: Add a destroy() method.
Add a new method to muxes, destroy(), that is responsible for destroying
the mux and the associated connection, to be used for server connections.
2018-11-18 21:44:53 +01:00
Olivier Houchard
d540b36e8a MINOR: mux: Add a new "avail_streams" method.
Add a new method for mux, avail_streams, that returns the number of streams
still available for a mux.
For the mux_pt, it'll return 1 if the connection is in idle, or 0. For
the H2 mux, it'll return the max number of streams allowed, minus the number
of streams currently in use.
2018-11-18 21:44:06 +01:00
Willy Tarreau
db398435aa MINOR: stream-int: replace si_cant_put() with si_rx_room_{blk,rdy}()
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().

The now unused si_cant_put() function was removed.
2018-11-18 21:41:50 +01:00
Willy Tarreau
b26a6f9708 MEDIUM: stream-int: make use of si_rx_chan_{rdy,blk} to control the stream-int from the channel
The channel can disable reading from the stream-interface using various
methods, such as :
  - CF_DONT_READ
  - !channel_may_recv()
  - and possibly others

Till now this was done by mangling SI_FL_RX_WAIT_EP which is not
appropriate at all since it's not the stream interface which decides
whether it wants to deliver data or not. Some places were also wrongly
relying on SI_FL_RXBLK_ROOM since it was the only other alternative,
but it's not suitable for CF_DONT_READ.

Let's use the SI_FL_RXBLK_CHAN flag for this instead. It will properly
prevent the stream interface from being woken up and reads from
subscribing to more receipt without being accidently removed. It is
automatically reset if CF_DONT_READ is not set in stream_int_notify().

The code is not trivial because it splits the logic between everything
related to buffer contents (channel_is_empty(), CF_WRITE_PARTIAL, etc)
and buffer policy (CF_DONT_READ). Also it now needs to decide timeouts
based on any blocking flag and not just SI_FL_RXBLK_ROOM anymore.

It looks like this patch has caused a minor performance degradation on
connection rate, which possibly deserves being investigated deeper as
the test conditions are uncertain (e.g. slightly more subscribe calls?).
2018-11-18 21:41:49 +01:00
Willy Tarreau
abb5d4202f MEDIUM: stream-int: use si_rx_shut_blk() to indicate the SI is closed
Till now we were using si_done_put() upon shutr, but these flags could
be reset upon next activity. Now let's switch to SI_FL_RXBLK_SHUT which
doesn't go away. It's also set in stream_int_update() in case a shutr
condition is detected.

The now unused si_done_put() was removed.
2018-11-18 21:41:49 +01:00
Willy Tarreau
7f494d0c5e MINOR: stream-int: make si_sync_recv() simply check ENDP before si_cs_recv()
Instead of checking complex conditions to call si_cs_recv() upon first
call, let's simply use si_rx_endp_ready() now that si_cs_recv() reports
it accurately, and add si_rx_blocked() to cover any blocking situation.
2018-11-18 21:41:48 +01:00
Willy Tarreau
8bb2ffb831 MINOR: stream-int: replace si_{want,stop}_put() with si_rx_endp_{more,done}()
Here it's only a 1-to-1 replacement.
2018-11-18 21:41:47 +01:00
Willy Tarreau
8be7cd7b92 MEDIUM: stream-int: use si_rx_buff_{rdy,blk} to report buffer readiness
The stream interface used to conflate a missing buffer and lack of
buffer space into SI_FL_WAIT_ROOM but this causes difficulties as
these cannot be checked at the same moment and are not resolved at
the same moment either. Now we instead mark the buffer as presumably
available using si_rx_buff_rdy() and mark it as unavailable+requested
using si_rx_buff_blk().

The call to si_alloc_buf() was moved after si_stop_put(). This makes
sure that the SI_FL_RX_WAIT_EP flag is cleared on allocation failure so
that the function is called again if the callee fails to do its work.
2018-11-18 21:41:47 +01:00
Willy Tarreau
32742fdf45 MINOR: stream-int: use si_rx_blocked()/si_tx_blocked() to check readiness
This way we don't limit ourselves to random flags only and the code
is more readable and safer for the long term.
2018-11-18 21:41:46 +01:00
Willy Tarreau
05b9b64afb MINOR: stream-int: replace SI_FL_WANT_PUT with !SI_FL_RX_WAIT_EP
The SI_FL_WANT_PUT flag is used in an awkward way, sometimes it's
set by the stream-interface to mean "I have something to deliver",
sometimes it's cleared by the channel to say "I don't want you to
send what you have", and it has to be set back once CF_DONT_READ
is cleared. This will have to be split between SI_FL_RX_WAIT_EP
and SI_FL_RXBLK_CHAN. This patch only replaces all uses of the
flag with its natural (but negated) replacement SI_FL_RX_WAIT_EP.
The code is expected to be strictly equivalent. The now unused flag
was completely removed.
2018-11-18 21:41:46 +01:00
Willy Tarreau
78dcacef5c MINOR: stream-int: add new functions si_{rx,tx}_{blocked,endp_ready}()
The first ones are used to figure if a direction is blocked on the
stream interface for anything but the end point. The second ones are
used to detect if the end point is ready to receive/transmit. They
should be used instead of directly fiddling with the existing bits.
2018-11-18 21:41:46 +01:00
Willy Tarreau
94f7907d65 MINOR: stream-int: introduce new SI_FL_RXBLK flags
The plan is to have the following flags to describe why a stream interface
doesn't produce data :

    - SI_FL_RXBLK_CHAN : the channel doesn't want it to receive
    - SI_FL_RXBLK_BUFF : waiting for a buffer allocation to complete
    - SI_FL_RXBLK_ROOM : more room is required in the channel to receive
    - SI_FL_RXBLK_SHUT : input now closed, nothing new will come
    - SI_FL_RX_WAIT_EP : waiting for the endpoint to produce more data

Applets like the CLI which consume complete commands at once and produce
large chunks of responses will for example be able to stop being woken up
by clearing SI_FL_WANT_GET and setting SI_FL_RXBLK_ROOM when the rx buffer
is full. Once called they will unblock WANT_GET. The flags were moved
together in readable form with the Rx bits using 2 hex digits and still
have some room to do a similar operation on the Tx path later, with the
WAIT_EP flag being represented alone on a digit.
2018-11-18 21:41:45 +01:00
Willy Tarreau
d0f5bbcd64 MINOR: stream-int: rename SI_FL_WAIT_ROOM to SI_FL_RXBLK_ROOM
This flag is not enough to describe all blocking situations, as can be
seen in each case we remove it. The muxes has taught us that using multiple
blocking flags in parallel will be much easier, so let's start to do this
now. This patch only renames this flags in order to make next changes more
readable.
2018-11-18 21:41:45 +01:00
Willy Tarreau
a44e576f62 MINOR: stream-int: expand the flags to 32-bit
We used to have enough of 16 bits, with 3 still available but it's
not possible to add the rx/tx blocking bits there. Let's extend the
format to 32 bits and slightly reorder the fields to maintain the
struct size to 64 bytes. Nothing else was changed.
2018-11-18 21:41:45 +01:00
Willy Tarreau
fafd3984b9 MINOR: mux: implement a get_first_cs() method
This method is used to retrieve the first known good conn_stream from
the mux. It will be used to find the other end of a connection when
dealing with the proxy protocol for example.
2018-11-18 21:29:20 +01:00
Willy Tarreau
ade6478a8c MINOR: stream: move the conn_stream specific calls to the stream-int
There are still some unwelcome synchronous calls to si_cs_recv() in
process_stream(). Let's have a new function si_sync_recv() to perform
a synchronous receive call on a stream interface regardless of the type
of its endpoint, and move these calls there. For now it only implements
conn_streams since it doesn't seem useful to support applets there. The
function implements an extra check for the stream interface to be in an
established state before attempting anything.
2018-11-17 19:53:45 +01:00
William Lallemand
c59f9884d7 MEDIUM: listeners: support unstoppable listener
An unstoppable listener is a listener which won't be stop during a soft
stop. The unstoppable_jobs variable is incremented and the listener
won't prevent the process to leave properly.

It is not a good idea to use this feature (the LI_O_NOSTOP flag) with a
listener that need to be bind again on another process during a soft
reload.
2018-11-16 17:05:40 +01:00
William Lallemand
a719926cf8 MEDIUM: jobs: support unstoppable jobs for soft stop
This patch allows a process to properly quit when some jobs are still
active, this feature is handled by the unstoppable_jobs variable, which
must be atomically incremented.

During each new iteration of run_poll_loop() the break condition of the
loop is now (jobs - unstoppable_jobs) == 0.

The unique usage of this at the moment is to handle the socketpair CLI
of a the worker during the stopping of the process.  During the soft
stop, we could mark the CLI listener as an unstoppable job and still
handle new connections till every other jobs are stopped.
2018-11-16 17:05:40 +01:00
Frdric Lcaille
9ca51aa288 MINOR: http: Implement "early-hint" http request rules.
This patch implements http_apply_early_hint_rule() function is responsible of
building HTTP 103 Early Hint responses each time a "early-hint" rule is matched.
2018-11-12 21:08:55 +01:00
Frdric Lcaille
0ebbcb663c MINOR: http: Make new "early-hint" http-request action really be parsed.
This patch adds a "early_hint" struct to "arg" union of "act_rule" struct
and parse "early-hint" http-request keyword with it using the same
code as for "(add|set)-header" parser.
2018-11-12 21:08:55 +01:00
Frdric Lcaille
a985e3875b MINOR: http: Add new "early-hint" http-request action.
This patch adds the new "early-hint" action to "http-request" rules parser.
This action should be parsed the same way as "(add|set)-header" actions.
2018-11-12 21:08:55 +01:00
Willy Tarreau
7520e4ff57 MINOR: namespaces: don't build namespace.c if disabled
When namespaces are disabled, support is still reported because the file
is built with almost nothing in it but built anyway. Instead of extending
the scope of the numerous ifdefs in this file, better avoid building it
when namespaces are diabled. In this case we define my_socketat() as an
inline function mapping directly to socket(). The struct netns_entry
still needs to be defined because it's used by various other functions
in the code.
2018-11-12 19:15:15 +01:00
Willy Tarreau
c1b0645dac MEDIUM: log: add a new "raw" format
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
2018-11-12 18:37:55 +01:00
Willy Tarreau
e8746a08b2 MEDIUM: log: support a new "short" format
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.

Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :

    log stdout format short daemon          # send everything to stdout
    log stderr format short daemon notice   # send important events to stderr
2018-11-12 18:37:55 +01:00
Willy Tarreau
13ef773722 MINOR: log: report the number of dropped logs in the stats
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
2018-11-12 18:37:55 +01:00
Willy Tarreau
d0d40ebf5e CLEANUP: stream-int: remove the now unused si->update() function
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
2018-11-11 10:18:37 +01:00
Willy Tarreau
d14844a734 MINOR: stream-int: replace si_update() with si_update_both()
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.

That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.

The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
2018-11-11 10:18:37 +01:00
Willy Tarreau
8fe516f08a MEDIUM: stream-int: make si_chk_rcv() check that SI_FL_WAIT_ROOM is cleared
After careful inspection, it now seems OK to call si_chk_rcv() only when
SI_FL_WAIT_ROOM is cleared and SI_FL_WANT_PUT is set, since all identified
call places have already taken care of this.
2018-11-11 10:18:37 +01:00
Willy Tarreau
abf531caa0 MEDIUM: stream-int: always call si_chk_rcv() when we make room in the buffer
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.

The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.

And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().

Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
2018-11-11 10:18:37 +01:00
Willy Tarreau
1f9de21c38 MEDIUM: stream-int: make SI_FL_WANT_PUT reflect CF_DONT_READ
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.

We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.

Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.
2018-11-11 10:18:37 +01:00
Willy Tarreau
1bdb598a55 MINOR: stream-int: factor the SI_ST_EST state test into si_chk_rcv()
This test is made in each implementation of the function, better to
merge it.
2018-11-11 10:18:37 +01:00
Willy Tarreau
96aadd5c55 MEDIUM: stream-int: temporarily make si_chk_rcv() take care of SI_FL_WAIT_ROOM
This flag should already be cleared before calling the *chk_rcv() functions.
Before adapting all call places, let's first make sure si_chk_rcv() clears
it before calling them so that these functions do not have to check it again
and so that they do not adjust it. This function will only call the lower
layers if the SI_FL_WANT_PUT flag is present so that the endpoint can decide
not to be called (as done with applets).
2018-11-11 10:18:37 +01:00
Willy Tarreau
57f08bb63b MINOR: stream-int: make it clear that si_ops cannot be null
There was an ambiguity in which functions of the si_ops struct could be
null or not. only ->update doesn't exist in one of the si_ops (the
embedded one), all others are always defined. ->shutr and ->shutw were
never tested. However ->chk_rcv() and ->chk_snd() were tested, causing
confusion about the proper way to wake the other side up if undefined
(which never happens).

Let's update the comments to state these functions are mandatory and
remove the offending checks.
2018-11-11 10:18:37 +01:00
Willy Tarreau
af4f6f6d2f MINOR: stream-int: use si_cant_put() instead of setting SI_FL_WAIT_ROOM
We now do this on the si_cs_recv() path so that we always have
SI_FL_WANT_PUT properly set when there's a need to receive and
SI_FL_WAIT_ROOM upon failure.
2018-11-11 10:18:37 +01:00
Willy Tarreau
394970c297 MINOR: stream-int: add si_done_{get,put} to indicate that we won't do it anymore
This is useful on close or stream aborts as it saves us from having
to manipulate the (sometimes confusing) flags.
2018-11-11 10:18:37 +01:00
Willy Tarreau
0cd3bd628a MINOR: stream-int: rename si_applet_{want|stop|cant}_{get|put}
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
2018-11-11 10:18:37 +01:00
Willy Tarreau
21028b5e7f MEDIUM: appctx: check for allocation attempts in buffer allocation callbacks
The buffer allocation callback appctx_res_wakeup() used to rely on old
tricks to detect if a buffer was already granted to an appctx, namely
by checking the task's state. Not only this test is not valid anymore,
but it's inaccurate.

Let's solely on SI_FL_WAIT_ROOM that is now set on allocation failure by
the functions trying to allocate a buffer. The buffer is now allocated on
the fly and the flag removed so that the consistency between the two
remains granted. The patch also fixes minor issues such as the function
being improperly declared inline(!) and the fact that using appctx_wakeup()
sets the wakeup reason to TASK_WOKEN_OTHER while we try to use TASK_WOKEN_RES
when waking up consecutive to a ressource allocation such as a buffer.
2018-11-11 10:18:37 +01:00
Willy Tarreau
b882dd88cc MEDIUM: stream: implement stream_buf_available()
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.

The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
2018-11-11 10:18:37 +01:00
Willy Tarreau
2d372c2aa1 MINOR: stats: report the number of currently connected peers
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect that some peers cannot be reached for example. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
2018-11-05 17:15:21 +01:00
Willy Tarreau
199ad24661 MINOR: stats: report the number of active peers in "show info"
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
2018-11-05 17:15:21 +01:00
Willy Tarreau
00098ea034 MINOR: stats: report the number of active jobs and listeners in "show info"
When an haproxy process doesn't stop after a reload, it's because it
still has some active "jobs", which mainly are active sessions, listeners,
peers or other specific activities. Sometimes it's difficult to troubleshoot
the cause of these issues (which generally are the result of a bug) only
because some indicators are missing.

This patch add the number of listeners, the number of jobs, and the stopping
status to the output of "show info". This way it becomes a bit easier to try
to narrow down the cause of such an issue should it happen. A typical use
case is to connect to the CLI before reloading, then issuing the "show info"
command to see what happens. In the normal situation, stopping should equal
1, jobs should equal 1 (meaning only the CLI is still active) and listeners
should equal zero.

The patch is so trivial that it could make sense to backport it to 1.8 in
order to help with troubleshooting.
2018-11-05 17:15:21 +01:00
Willy Tarreau
4698adf68f MINOR: compat: automatically detect support for crypt_r()
glibc >= 2.2 and FreeBSD >= 12.0 support crypt_r(), let's detect this
and set a macro HA_HAVE_CRYPT_R for this.
2018-10-29 19:14:14 +01:00
Willy Tarreau
34d4b525a1 BUG/MEDIUM: auth/threads: use of crypt() is not thread-safe
It was reported here that authentication may fail when threads are
enabled :

    https://bugzilla.redhat.com/show_bug.cgi?id=1643941

While I couldn't reproduce the issue, it's obvious that there is a
problem with the use of the non-reentrant crypt() function there.
On Linux systems there's crypt_r() but not on the vast majority of
other ones. Thus a first approach consists in placing a lock around
this crypt() call. Another patch may relax it when crypt_r() is
available.

This fix must be backported to 1.8. Thanks to Ryan O'Hara for the
quick notification.
2018-10-29 18:06:02 +01:00
Willy Tarreau
ce487aab46 BUG/MEDIUM: tools: fix direction of my_ffsl()
Commit 27346b01a ("OPTIM: tools: optimize my_ffsl() for x86_64") optimized
my_ffsl() for intensive use cases in the scheduler, but as half of the times
I got it wrong so it counted bits the reverse way. It doesn't matter for the
scheduler nor fd cache but it broke cpu-map with threads which heavily relies
on proper ordering.

We should probably consider dropping support for gcc < 3.4 and switching
to builtins for these ones, though often they are as ambiguous.

No backport is needed.
2018-10-29 16:09:57 +01:00
Willy Tarreau
8e9f4531cb BUG/MINOR: memory: make the thread-local cache allocator set the debugging link
When building with DEBUG_MEMORY_POOLS, an element returned from the
cache would not have its pool link initialized unless it's allocated
using pool_alloc(). This is problematic for buffer allocators which
use pool_alloc_dirty(), as freeing this object will make the code
think it was allocated from another pool. This patch does two things :
  - make __pool_get_from_cache() set the link
  - remove the extra initialization from pool_alloc() since it's always
    done in either __pool_get_first() or __pool_refill_alloc()

This patch is marked MINOR since it only affects code explicitly built
for debugging. No backport is needed.
2018-10-28 20:12:31 +01:00
William Lallemand
90b1ca1ff5 MEDIUM: channel: reorder the channel analyzers for the cli
Reorder the channel analyzers so the CLI analyzers are defined before
the XFER_DATA ones.
2018-10-28 14:13:31 +01:00
William Lallemand
309dc9adec MEDIUM: mworker: stop the master proxy in the workers
The master proxy which handles the CLI should not be used or shown in
the stats of the workers. This proxy is now disabled after the fork.
2018-10-28 14:03:31 +01:00
William Lallemand
cf62f7e3cb MEDIUM: cli: implement 'mode cli' proxy analyzers
This patch implements analysers for parsing the CLI and extra features
for the master's CLI.

For each command (sent alone, or separated by ; or \n) the request
analyser will determine to which server it should send the request.

The 'mode cli' proxy is able to parse a prefix for each command which is
used to select the apropriate server. The prefix start by @ and is
followed by "master", the PID preceded by ! or the relative PID. (e.g.
@master, @1, @!1234). The servers are not round-robined anymore.

The command is sent with a SHUTW which force the server to close the
connection after sending its response. However the proxy allows a
keepalive connection on the client side and does not close.

The response analyser does not do much stuff, it only reinits the
connection when it received a close from the server, and forward the
response. It does not analyze the response data.
The only guarantee of the end of the response is the close of the
server, we can't rely on the double \n since it's not send by every
command.

This could be reimplemented later as a filter.
2018-10-28 14:03:06 +01:00
William Lallemand
291810d8f8 MEDIUM: mworker: find the server ptr using a CLI prefix
Add a struct server pointer in the mworker_proc struct so we can easily
use it as a target for the mworker proxy.

pcli_prefix_to_pid() is used to find the right PID of the worker
when using a prefix in the CLI. (@master, @#<relative pid> , @<pid>)

pcli_pid_to_server() is used to find the right target server for the
CLI proxy.
2018-10-28 13:51:39 +01:00
William Lallemand
14721be11f MEDIUM: cli: disable some keywords in the master
The master process does not need all the keywords of the cli, add 2
flags to chose which keyword to use.

It might be useful to activate some of them in a debug mode later...
2018-10-28 13:51:39 +01:00
William Lallemand
e736115d3a MEDIUM: mworker: create CLI listeners from argv[]
This patch introduces mworker_cli_proxy_new_listener() which allows the
creation of new listeners for the CLI proxy.

Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. It is allowed to create
multiple listeners with several -Sa.
2018-10-28 13:51:39 +01:00
William Lallemand
8a02257d88 MEDIUM: mworker: proxy for the master CLI
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.

In the current state of the code, the proxy is only doing round robin on
the CLI of the workers. A CLI mode will be needed to know to which CLI
send the requests.
2018-10-28 13:51:39 +01:00
William Lallemand
6e0db2fa99 MEDIUM: mworker: add proc_list in global.h
Add the process list in types/global.h so it could be accessed from
anywhere.
2018-10-28 13:51:39 +01:00
William Lallemand
313bfd18c1 MINOR: server: export new_server() function
The new_server() function will be useful to create a proxy for the
master-worker.
2018-10-28 13:51:38 +01:00
William Lallemand
7e1299bb3a REORG: mworker: move struct mworker_proc to global.h
Move the definition of the mworker_proc structure in types/global.h.
2018-10-28 13:51:38 +01:00
William Lallemand
ce83b4a5dd MEDIUM: mworker: each worker socketpair is a CLI listener
The init code of the mworker_proc structs has been moved before the
init of the listeners.

Each socketpair is now connected to a CLI within the workers, which
allows the master to access their CLI.

The inherited flag of the worker side socketpair is removed so the
socket can be closed in the master.
2018-10-28 13:51:38 +01:00
Willy Tarreau
85f890174a MEDIUM: stream-int: make si_update() synchronize flag changes before the I/O
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.

This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
2018-10-28 13:47:00 +01:00
Willy Tarreau
0979916d3b MINOR: stream-int: add si_alloc_ibuf() to ease input buffer allocation
This will supersed channel_alloc_buffer() while relying on it. It will
automatically adjust SI_FL_WAIT_ROOM on the stream-int depending on
success or failure to allocate this buffer.

It's worth noting that it could make sense to also set SI_FL_WANT_PUT
each time we do this to further simplify the code at user places such
as applets, but it would possibly not be easy to clean this flag
everywhere an rx operation stops.
2018-10-28 13:47:00 +01:00
Willy Tarreau
ede3d884fc MEDIUM: channel: merge back flags CF_WRITE_PARTIAL and CF_WRITE_EVENT
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.

Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).

This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
2018-10-26 08:32:57 +02:00
Ioannis Cherouvim
1ff7633dd7 CLEANUP: tools: fix misleading comment above function LIM2A
The function produces ASCII, but its comment was copied from U2H which
produces HTML.
2018-10-26 05:00:48 +02:00
Frdric Lcaille
b80bc273a3 MINOR: shctx: Change max. object size type to unsigned int.
This change is there to prevent implicit conversions when comparing
shctx maximum object sizes with other unsigned values.
2018-10-26 04:54:40 +02:00
Frdric Lcaille
b7838afe6f MINOR: shctx: Add a maximum object size parameter.
This patch adds a new parameter to shctx_init() function to be used to
limit the size of each shared object, -1 value meaning "no limit".
2018-10-24 04:39:44 +02:00
Frdric Lcaille
8df65ae5e2 MINOR: cache: Larger HTTP objects caching.
This patch makes the capable of storing HTTP objects larger than a buffer.
It makes usage of the "block by block shared object allocation" new shctx API.

A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.
2018-10-24 04:37:12 +02:00
Frdric Lcaille
0bec807e08 MINOR: shctx: Shared objects block by block allocation.
This patch makes shctx capable of storing objects in several parts,
each parts being made of several blocks. There is no more need to
walk through until reaching the end of a row to append new blocks.

A new pointer to a struct shared_block member, named last_reserved,
has been added to struct shared_block so that to memorize the last block which was
reserved by shctx_row_reserve_hot(). Same thing about "last_append" pointer which
is used to memorize the last block used by shctx_row_data_append() to store the data.
2018-10-24 04:35:53 +02:00
Willy Tarreau
68ad3a42f7 MINOR: proxy: add a new option "http-use-htx"
This option makes a proxy use only HTX-compatible muxes instead of the
HTTP-compatible ones for HTTP modes. It must be set on both ends, this
is checked at parsing time.
2018-10-23 10:22:36 +02:00
Christopher Faulet
55d6be7d83 MINOR: h1: Export some functions parsing the value of some HTTP headers
Functions parsing the value of "Connection:", "Transfer-encoding:" and
"Content-length:" headers are now exported to be used by the mux-h1.
2018-10-23 10:22:36 +02:00
Willy Tarreau
627505d36a MINOR: freq_ctr: add swrate_add_scaled() to work with large samples
Some samples representing time will cover more than one sample at once
if they are units of time per time. For this we'd need to have the
ability to loop over swrate_add() multiple times but that would be
inefficient. By developing the function elevated to power N, it's
visible that some coefficients quickly disappear and that those which
remain at the first order more or less compensate each other.

Thus a simplified version of this function was added to provide a single
value for a given number of samples. Tests with multiple values, window
sizes and sample sizes have shown that it is possible to make it remain
surprisingly accurate (typical error < 0.2% over various large window
and sample sizes, even samples representing up to 1/4 of the window).
2018-10-22 08:13:57 +02:00
Olivier Houchard
3f03ab5b15 MINOR: connection: Add a SUB_CALL_UNSUBSCRIBE event.
Add a SUB_CALL_UNSUBSCRIBE event, to let the caller know that the
unsubscribe method should be called before destroyin the object.
2018-10-21 06:00:04 +02:00
Olivier Houchard
53216e7db9 MEDIUM: connections: Don't directly mess with the polling from the upper layers.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and unactivated once we got the event
and we woke the related task.
2018-10-21 05:58:40 +02:00
Olivier Houchard
1fddc9b7bb BUG/MEDIUM: connections: Remove subscription if going in idle mode.
Make sure we don't have any subscription when the connection is going in
idle mode, otherwise there's a race condition when the connection is
reused, if there are still old subscriptions, new ones won't be done.

No backport is needed.
2018-10-21 05:55:20 +02:00
Olivier Houchard
62975a7740 BUG/MEDIUM: pools: Fix the usage of mmap()) with DEBUG_UAF.
When mapping memory with mmap(), we should use a fd of -1, not 0. 0 may
work on linux, but it doesn't work on FreeBSD, and probably other OSes.

It would be nice to backport this to 1.8 to help debugging there.
2018-10-21 05:43:33 +02:00
Willy Tarreau
4e7cc3381b BUILD: compiler: rename __unreachable() to my_unreachable()
Olivier reported that on FreeBSD __unreachable is already defined
and causes build warnings. Let's rename it then.
2018-10-20 17:45:48 +02:00
Willy Tarreau
7a6ad88b02 BUILD: memory: fix free_list pointer declaration again for atomic CAS
Commit ac6c880 ("BUILD: memory: fix pointer declaration for atomic CAS")
attemtped to fix a build warning affecting the lock-free version of the
pool allocator. But the fix tried to hide the cause instead of addressing
it, thus clang still complains about (void **) not matching (void ***).

The real solution is to declare free_list (void **) and not to use a cast.
Now this builds fine with gcc/clang with and without threads.

No backport is needed.
2018-10-20 17:37:38 +02:00
Willy Tarreau
ed72d82827 MEDIUM: time: measure the time stolen by other threads
The purpose is to detect if threads or processes are competing for the
same CPU. This can happen when threads are incorrectly bound, or after a
reload if the previous process still has an important activity. With
threads this situation is problematic because a preempted thread holding
a lock will block other ones waiting for this lock to be released.

A first attempt consisted in measuring the cumulated lost time more
precisely but the system's scheduler is smart enough to try to limit the
thread preemption rate by mostly context switching during poll()'s blank
periods, so most of the time lost is not seen. In essence this is good
because it means a thread is not preempted with a lock held, and even
regarding the rendez-vous point it cannot prevent the other ones from
making progress. But still it happens tens to hundreds of times per
second that a thread might be preempted, so it's still possible to detect
that the situation is happening, thus it's interesting to measure and
report its frequency.

Each time we enter the poller, we check the CPU time spent working and
see if we've lost time doing something else. To limit false positives,
we're only interested in losses of 500 microseconds or more (i.e. half
a clock tick on a 1 kHz system). If so, it indicates that some time was
stolen by another thread or process. Note that we purposely store some
sub-millisecond counters so that under heavy traffic with a 1 kHz clock,
it's still possible to measure something without being subject to the
risk of rounding errors (i.e. if exactly 1 ms is stolen it's possible
that the time difference could often be slightly lower).

This counter of lost CPU time slots time is reported in "show activity"
in numbers of milliseconds of CPU lost per second, per 15s, and total
over the process' life. By definition, the per-second counter cannot
report values larger than 1000 per thread per second and the 15s one
will be limited to 15000/s in the worst case, but it's possible that
peak values exceed such thresholds after long pauses.
2018-10-19 08:51:59 +02:00
Willy Tarreau
5ceeb15002 MINOR: time: add now_mono_time() and now_cpu_time()
These two functions retrieve respectively the monotonic clock time and
the per-thread CPU time when available on the platform, or return zero.
These syscalls may require to link with -lrt on certain libc, which is
enabled in the Makefile with USE_RT=1 (default on Linux systems).
2018-10-18 16:39:48 +02:00
Willy Tarreau
ac6c8805be BUILD: memory: fix pointer declaration for atomic CAS
The calls to HA_ATOMIC_CAS() on the lockfree version of the pool allocator
were mistakenly done on (void*) for the old value instead of (void **).
While this has no impact on "recent" gcc, it does have one for gcc < 4.7
since the CAS was open coded and it's not possible to assign a temporary
variable of type "void".

No backport is needed, this only affects 1.9.
2018-10-18 16:12:28 +02:00
Willy Tarreau
7e9c4ae4de MINOR: poller: move time and date computation out of the pollers
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
2018-10-17 19:59:43 +02:00
Willy Tarreau
f37ba94768 MINOR: fd: centralize poll timeout computation in compute_poll_timeout()
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work arond a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
2018-10-17 19:59:43 +02:00
Willy Tarreau
e18db9e984 MEDIUM: pools: implement a thread-local cache for pool entries
Each thread now keeps the last ~512 kB of freed objects into a local
cache. There are some heuristics involved so that a specific pool cannot
use more than 1/8 of the total cache in number of objects. Tests have
shown that 512 kB is an optimal size on a 24-thread test running on a
dual-socket machine, resulting in an overall 7.5% performance increase
and a cache miss ratio reducing from 19.2 to 17.7%. Anyway it seems
pointless to keep more than an L2 cache, which probably explains why
sizes between 256 and 512 kB are optimal.

Cached objects appear in two lists, one per pool and one LRU to help
with fair eviction. Currently there is no way to check each thread's
cache state nor to flush it. This cache cannot be disabled and is
enabled as soon as the lockless pools are enabled (i.e.: threads are
enabled, no pool debugging is in use and the CPU supports a double word
CAS).
2018-10-16 13:46:08 +02:00
Willy Tarreau
146794dc4f MINOR: pools: split pool_free() in the lockfree variant
This separates the validity tests from the code committing the object
to the pool, in order to ease insertion of the thread-local cache.
2018-10-16 10:29:28 +02:00
Willy Tarreau
0a93b6413f MINOR: pools: allocate most memory pools from an array
For caching it will be convenient to have indexes associated with pools,
without having to dereference the pool itself. One solution could consist
in replacing all pool pointers with integers but this would limit the
number of allocatable pools. Instead here we allocate the 32 first pools
from a pre-allocated array whose base address is known so that it's trivial
to convert a pool to an index in this array. Pools that cannot fit there
will be allocated normally.
2018-10-16 10:29:26 +02:00
Bertrand Jacquin
d5e4de8e5f DOC: Fix a few typos
these are mostly spelling mistakes, some of them might be candidate for
backporting as well.
2018-10-15 19:38:15 +02:00
Willy Tarreau
8d8747abe0 OPTIM: tasks: group all tree roots per cache line
Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.

It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
2018-10-15 19:06:13 +02:00
Willy Tarreau
b20aa9eef3 MAJOR: tasks: create per-thread wait queues
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.

Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
2018-10-15 19:04:40 +02:00
Willy Tarreau
87d54a9a6d MEDIUM: fd/threads: only grab the fd's lock if the FD has more than one thread
The vast majority of FDs are only seen by one thread. Currently the lock
on FDs costs a lot because it's touched often, though there should be very
little contention. This patch ensures that the lock is only grabbed if the
FD is shared by more than one thread, since otherwise the situation is safe.
Doing so resulted in a 15% performance boost on a 12-threads test.
2018-10-15 13:25:06 +02:00
Willy Tarreau
98d334bd94 MINOR: tools: add a new function atleast2() to test masks for more than 1 bit
For threads it's common to have to check if a mask contains more than
one bit set. Let's have this "atleast2()" function report this.
2018-10-15 13:25:06 +02:00
Willy Tarreau
d944344f01 BUILD: peers: check allocation error during peers_init_sync()
peers_init_sync() doesn't check task_new()'s return value and doesn't
return any result to indicate success or failure. Let's make it return
an int and check it from the caller.

This can be backported as far as 1.6.
2018-10-15 13:24:43 +02:00
Willy Tarreau
8d26f02e69 BUILD: compiler: add a new statement "__unreachable()"
This statement is used as a hint for the compiler so that it knows that
the location where it's placed cannot be reached. It will mostly be used
after longjmp() or equivalent statements that deal with error processing
and that the compiler doesn't know will not return on certain conditions,
so that it doesn't complain about null dereferences on error paths.
2018-10-15 13:24:43 +02:00
Willy Tarreau
c1f40b38a6 MINOR: chunk: add chunk_cpy() and chunk_cat()
Sometimes we need to concatenate constant chunks to existing ones, but
no function currently exists to do this easily, hence these two new ones.
2018-10-12 16:58:01 +02:00
Christopher Faulet
25da9e34f1 MINOR: h1: Add the flag H1_MF_NO_PHDR to not add pseudo-headers during parsing
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoid some boring
filtering in the mux H1.
2018-10-12 16:15:18 +02:00
Christopher Faulet
1dc2b49556 MINOR: h1: Change the union h1_sl to use indirect strings to store infos
Instead of using offsets relating to the parsed buffer to store start line
infos, we now use indirect strings. So now, these infos remain valid only if the
origin buffer remains untouched. But it's not a real problem because this union
is used during the parsing and never stored to a later use.
2018-10-12 16:14:57 +02:00
Christopher Faulet
08088e77c6 MINOR: conn-stream: Add CL_FL_NOT_FIRST flag
This flags will be used by multiplexers to warn a conn-stream (and, by
transitivity, a stream) it is not the first one created by the mux. It will help
mux H1 to handle keep-alive connections.
2018-10-12 16:09:26 +02:00
Christopher Faulet
315b39c391 MINOR: http: Use same flag for httpclose and forceclose options
Since keep-alive mode is the default mode, the passive close has disappeared,
and in the code, httpclose and forceclose options are handled the same way:
connections with the client and the server are closed as soon as the request and
the response are received and missing "Connection: close" header is added in
each direction.

So to make things clearer, forceclose is now an alias for httpclose. And
httpclose is explicitly an active close. So the old passive close does not exist
anymore. Internally, the flag PR_O_HTTP_PCL has been removed and PR_O_HTTP_FCL
has been replaced by PR_O_HTTP_CLO. In HTTP analyzers, the checks done to find
the right mode to use, depending on proxies options and "Connection: " header
value, have been simplified.

This should only be a cleanup and no changes are expected.
2018-10-12 16:07:56 +02:00
Christopher Faulet
10079f59b7 MINOR: http: Export some functions and do cleanup to prepare HTTP refactoring
To ease the refactoring, the function "http_header_add_tail" have been
remove. Now, "http_header_add_tail2" is always used. And the function
"capture_headers" have been renamed into "http_capture_headers". Finally, some
functions have been exported.
2018-10-12 16:00:45 +02:00
Christopher Faulet
702226c827 MINOR: stats: Add missing include
"proto/stats.h" must include "types/stats.h".
2018-10-12 16:00:32 +02:00
Christopher Faulet
7e266c7936 MINOR: http: Move comment about some HTTP macros in the right header file
HTTP_FLG_* and HTTP_IS_* were moved from "proto/proto_http.h" to "common/http.h"
but the associated comment was forgotten during the move.

This is 1.9-specific and should not be backported.
2018-10-12 16:00:24 +02:00
Olivier Houchard
4fdec7aafa BUG/MEDIUM: stream: Make sure to unsubscribe before si_release_endpoint.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will be never called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.

This is 1.9-specific and shouldn't be backported.
2018-10-11 17:16:43 +02:00
Olivier Houchard
fa8aa867b9 MEDIUM: connections: Change struct wait_list to wait_event.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, Add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
2018-10-11 15:34:39 +02:00
Olivier Houchard
83a0cd8a36 MINOR: connections: Introduce an unsubscribe method.
As we don't know how subscriptions are handled, we can't just assume we can
use LIST_DEL() to unsubscribe, so introduce a new method to mux and connections
to do so.
2018-10-11 15:34:21 +02:00
Willy Tarreau
27346b01aa OPTIM: tools: optimize my_ffsl() for x86_64
This call is now used quite a bit in the fd cache, to decide which cache
to add/remove the fd to/from, when waking up a task for a single thread
in __task_wakeup(), in fd_cant_recv() and in fd_process_cached_events(),
and we can replace it with a single instruction, removing ~30 instructions
and ~80 bytes from the inner loop of some of these functions.

In addition the test for zero value was replaced with a comment saying
that it is illegal and leads to an undefined behaviour. The code does
not make use of this useless case today.
2018-10-10 19:24:23 +02:00
Willy Tarreau
2325d8af93 BUG/MINOR: threads: move declaration of capabilities to config.h
In commit f161d0f51 ("BUG/MINOR: pools/threads: don't ignore DEBUG_UAF
on double-word CAS capable archs") I moved some defines and accidently
messed up with lockfree pools. The problem is that the HA_HAVE_CAS_DW
macro is not defined anymore where the CONFIG_HAP_LOCKLESS_POOLS macro
is set, so this fix implicitly disabled lockfree pools.

This patch fixes this by moving the capabilities definition to config.h
(probably that we'd benefit from having an "arch.h" file to declare the
capabilities offered by the architecture). In a test on a 12-core machine,
we used to measure 19s spent in the pool lock for 1M requests without
this patch, and 0 with it so that's definitely a net saving.

No backport is required, this is only for 1.9.
2018-10-10 18:29:23 +02:00
Dirkjan Bussink
c26c72d89b CLEANUP: h1: Fix debug warnings for h1 headers
The wrong method was used to debug the h1m state here. This fixes both
the signature of the h1m method and also fixes the invocation to be
correct.
2018-10-09 15:09:29 +02:00
Dirkjan Bussink
415150f764 MEDIUM: ssl: add support for ciphersuites option for TLSv1.3
OpenSSL released support for TLSv1.3. It also added a separate function
SSL_CTX_set_ciphersuites that is used to set the ciphers used in the
TLS 1.3 handshake. This change adds support for that new configuration
option by adding a ciphersuites configuration variable that works
essentially the same as the existing ciphers setting.

Note that it should likely be backported to 1.8 in order to ease usage
of the now released openssl-1.1.1.
2018-10-08 19:20:13 +02:00
Olivier Houchard
363c745569 BUG/MEDIUM: buffers: Make sure we don't wrap in ci_insert_line2/b_rep_blk.
In ci_insert_line2() and b_rep_blk(), we can't afford to wrap, so don't use
b_tail() to check if we do, use __b_tail() instead.

This should be backported to previous versions.
2018-10-08 16:11:54 +02:00
Emmanuel Hocdet
747ca61693 MINOR: ssl: generate-certificates for BoringSSL 2018-10-08 09:42:34 +02:00
Willy Tarreau
491cec20be CLEANUP: http: remove some leftovers from recent cleanups
The prototypes of functions find_hdr_value_end(), extract_cookie_value()
and http_header_match2() were still in proto_http.h while some of them
don't exist anymore and the others were just moved. Let's remove them.
In addition, da.c was updated to use http_extract_cookie_value() which
is the correct one.
2018-10-02 18:37:27 +02:00
Willy Tarreau
61c112aa5b REORG: http: move HTTP rules parsing to http_rules.c
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions's prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
2018-10-02 18:28:05 +02:00
Willy Tarreau
79e57336b5 REORG: http: move the code to different files
The current proto_http.c file is huge and contains different processing
domains making it very difficult to work on an alternative representation.
This commit moves some parts to other files :

  - ACL registration code => http_acl.c
    This code only creates some ACL mappings and doesn't know anything
    about HTTP nor about the representation. This code could even have
    moved to acl.c but it was not worth polluting it again.

  - HTTP sample conversion => http_conv.c
    This code doesn't depend on the internal representation but definitely
    manipulates some HTTP elements, such as dates. It also has access to
    captures.

  - HTTP sample fetching => http_fetch.c
    This code does depend entirely on the internal representation but is
    totally independent on the analysers. Placing it into a different
    file will ease the transition to the new representation and the
    creation of a wrapper if required. An include file was created due
    to CHECK_HTTP_MESSAGE_FIRST() being used at various places.

  - HTTP action registration => http_act.c
    This code doesn't directly interact with the messages nor the
    transaction but it does so via some exported http functions like
    http_replace_req_line() or http_set_status() so it will be easier
    to change only this after the conversion.

  - a few very generic parts were found and moved to http.{c,h} as
    relevant.

It is worth noting that the functions moved to these new files are not
referenced anywhere outside of the files and are only called as registered
callbacks, so these files do not even require associated include files.
2018-10-02 18:26:59 +02:00
Adis Nezirovic
8878f8eb3d MEDIUM: lua: Add stick table support for Lua.
This ads support for accessing stick tables from Lua. The supported
operations are reading general table info, lookup by string/IP key, and
dumping the table.

Similar to "show table", a data filter is available during dump, and as
an improvement over "show table" it's possible to use up to 4 filter
expressions instead of just one (with implicit AND clause binding the
expressions). Dumping with/without filters can take a long time for
large tables, and should be used sparingly.
2018-09-29 20:15:01 +02:00
Olivier Houchard
0e367bbb01 BUG/MEDIUM: process_stream: Don't use si_cs_io_cb() in process_stream().
Instead of using si_cs_io_cb() in process_stream()  use si_cs_send/si_cs_recv
instead, as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus timeout would never get triggered.
2018-09-26 14:21:54 +02:00
Willy Tarreau
7f2a44d319 BUG/CRITICAL: hpack: fix improper sign check on the header index value
Tim Dsterhus found using afl-fuzz that some parts of the HPACK decoder
use incorrect bounds checking which do not catch negative values after
a type cast. The first culprit is hpack_valid_idx() which takes a signed
int and is fed with an unsigned one, but a few others are affected as
well due to being designed to work with an uint16_t as in the table
header, thus not being able to detect the high offset bits, though they
are not exposed if hpack_valid_idx() is fixed.

The impact is that the HPACK decoder can be crashed by an out-of-bounds
read. The only work-around without this patch is to disable H2 in the
configuration.

CVE-2018-14645 was assigned to this bug.

This patch addresses all of these issues at once. It must be backported
to 1.8.
2018-09-20 11:45:56 +02:00
Willy Tarreau
55e0da664e BUILD: connection: silence a couple of null-deref build warnings at -Wextra
These ones don't need to be checked either.
2018-09-20 11:42:15 +02:00
Willy Tarreau
4ae4923c3e MINOR: stream-int: make si_appctx() never fail
Callers of si_appctx() always use the result without checking it because
they know by construction that it's valid. This results in unchecked null
pointer warnings at -Wextra, so let's remove this test and make it clear
that it's up to the caller to check validity first.
2018-09-20 11:42:15 +02:00
Willy Tarreau
babc15e8cf MINOR: stktable: provide an unchecked version of stktable_data_ptr()
stktable_data_ptr() currently performs null pointer checks but most
callers don't check the result since they know by construction that
it cannot be null. This causes valid warnings when building with
-Wextra which are worth addressing since it will result in better
code. Let's provide an unguarded version of this function for use
where the check is known to be useless and untested.
2018-09-20 11:42:15 +02:00
Willy Tarreau
4c0fcc2314 BUG/MINOR: tools: fix set_net_port() / set_host_port() on IPv4
These two functions were apparently written on the same model as their
parents when added by commit 11bcb6c4f ("[MEDIUM] IPv6 support for syslog")
except that they perform an assignment instead of a return, and as a
result fall through the next case where the assigned value may possibly
be partially overwritten. At least under Linux the port offset is the
same in both sockaddr_in and sockaddr_in6 so the value is written twice
without side effects.

This needs to be backported as far as 1.5.
2018-09-20 10:52:48 +02:00
Willy Tarreau
2557f6a3e2 MEDIUM: h1: better handle transfer-encoding vs content-length
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.

For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.

Last, if such a header is found, we delete all content-length header
fields found in the message.
2018-09-14 17:40:35 +02:00
Willy Tarreau
e2c418e94b MINOR: http: add http_hdr_del() to remove a header from a list
This one removes all occurrences of the specified header field name from
a complete list and returns the new count.
2018-09-14 17:40:35 +02:00
Christopher Faulet
c4e53f4ad7 MINOR: h1: Add H1_MF_XFER_LEN flag
This flag is usefull to handle cases where there is no body, regardless of CL or
TE headers (for instance, responses to HEAD requests). It will not be set by the
parser itself.
2018-09-14 16:02:40 +02:00
Willy Tarreau
98f5cf7a59 MINOR: h1: parse the Connection header field
The new function h1_parse_connection_header() is called when facing a
connection header in the generic parser, and it will set up to 3 bits
in h1m->flags indicating if at least one "close", "keep-alive" or "upgrade"
tokens was seen.
2018-09-13 14:52:31 +02:00
Willy Tarreau
ba5fbca33f MINOR: h1: report in the h1m struct if the HTTP version is 1.1 or above
This will be needed for the mux to know how to process the Connection
header, and will save it from having to re-parse the request line since
it's captured on the fly.
2018-09-13 14:34:09 +02:00
Willy Tarreau
175a2bb507 MINOR: connection: pass the proxy when creating a connection
Till now it was very difficult for a mux to know what proxy it was
working for. Let's pass the proxy when the mux is instanciated at
init() time. It's not yet used but the H1 mux will definitely need
it, just like the H2 mux when dealing with backend connections.
2018-09-12 17:39:22 +02:00
Willy Tarreau
eb528db60b MINOR: h1: add H1_MF_TOLOWER to decide when to turn header names to lower case
The h1 parser used to systematically turn header field names to lower
case because it was designed for H2. Let's add a flag which is off by
default to condition this behaviour so that when using it from an H1
parser it will not affect the message.
2018-09-12 17:38:26 +02:00
Willy Tarreau
11da5674c3 MINOR: h1: remove the HTTP status from the H1M struct
It has nothing to do there and is not used from there anymore, let's
get rid of it.
2018-09-12 17:38:25 +02:00
Willy Tarreau
001823c304 MEDIUM: h1: remove the useless H1_MSG_BODY state
This state was only a delimiter between headers and body but it now
causes more harm than good because it requires someone to change it.
Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let
it set the right next state so that h1m->state constantly matches
what is expected afterwards.
2018-09-12 17:38:25 +02:00
Willy Tarreau
a41393fc61 MEDIUM: h1: make the parser support a pointer to a start line
This will allow the parser to fill some extra fields like the method or
status without having to store them permanently in the HTTP message. At
this point however the parser cannot restart from an interrupted read.
2018-09-12 17:38:25 +02:00
Willy Tarreau
bbf3823f82 MINOR: h1: properly pre-initialize err_pos to -2
This way we maintain the old mechanism stating that -2 means we block
on errors, -1 means we only capture them, and a positive value indicates
the position of the first error.
2018-09-12 17:38:25 +02:00
Willy Tarreau
ccaf233741 MINOR: h1: add a message flag to indicate that a message carries a response
This flag is H1_MF_RESP. It will be used by the parser during restarts when
it supports requests.
2018-09-12 17:38:25 +02:00
Willy Tarreau
7f437ff81c MINOR: h1: provide a distinct init() function for request and response
h1m_init() used to handle response only since it was used by the H1
client code. Let's have one init per direction.
2018-09-12 17:38:25 +02:00
Willy Tarreau
acc295cab3 MINOR: h1: remove the unused states from h1m_state
States ERROR, 100_SENT, ENDING, CLOSE, CLOSING are not used at all for
the parsers. It's possible that a few others may disappear as well.
2018-09-12 17:38:25 +02:00
Willy Tarreau
b3b0152b6f MINOR: h1: add the restart offsets into struct h1m
Currently the only user of struct h1m is the h2 mux when it has to parse
an H1 message coming from the channel. Unfortunately this is not enough
to efficiently parse HTTP/1 messages like those coming from the network
as we don't want to restart from scratch at every byte received.

This patch reintroduces the "next" offset into the H1 message so that any
H1 parser can use it to restart when called with a state that is not the
initial state.
2018-09-12 17:38:25 +02:00
Willy Tarreau
801250e07d REORG: h1: create a new h1m_state
This is the *parsing* state of an HTTP/1 message. Currently the h1_state
is composite as it's made both of parsing and control (100SENT, BODY,
DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state
that can be used by H1 parsers. For now it's equivalent to h1_state.
2018-09-12 17:38:25 +02:00
Olivier Houchard
71384551fe MINOR: conn_streams: Remove wait_list from conn_streams.
The conn_streams won't be used for subscribing/waiting for I/O events, after
all, so just remove its wait_list, and send/recv/_wait_list.
2018-09-12 17:37:55 +02:00
Olivier Houchard
26e1a8f2bf MINOR: checks: Give checks their own wait_list.
Instead of (ab)using the conn_stream's wait_list, which should disappear,
give the checks their own wait_list.
2018-09-12 17:37:55 +02:00
Olivier Houchard
cb1f49ff93 MINOR: connections: Add a "handle" field to wait_list.
Add a new field to struct wait_list, "handle", that can be used by the
entity in charge of subscribing.
2018-09-12 17:37:55 +02:00
Olivier Houchard
af4021e680 MEDIUM: connections: Get rid of the recv() method.
Remove the recv() method from mux and conn_stream.
The goal is to always receive from the upper layers, instead of waiting
for the connection later. For now, recv() is still called from the wake()
method, but that should change soon.
2018-09-12 17:37:55 +02:00
Olivier Houchard
4cf7fb148f MEDIUM: connections/mux: Add a recv and a send+recv wait list.
For struct connection, struct conn_stream, and for the h2 mux, add 2 new
lists, one that handles waiters for recv, and one that handles waiters for
recv and send. That way we can ask to subscribe for either recv or send.
2018-09-12 17:37:55 +02:00
Olivier Houchard
931624a00b BUG/MEDIUM: tasks: Don't forget to decrement task_list_size in tasklet_free().
In tasklet_free(), if we're currently in the runnable task list, don't
forget to decrement taks_list_size, or it'll end up being to big, and we may
not process tasks in the global runqueue.
2018-09-12 17:37:55 +02:00
William Lallemand
2fe7dd0b2e MEDIUM: protocol: sockpair protocol
This protocol is based on the uxst one, but it uses socketpair and FD
passing insteads of a connect()/accept().

The "sockpair@" prefix has been implemented for both bind and server
keywords.

When HAProxy wants to connect through a sockpair@, it creates 2 new
sockets using the socketpair() syscall and pass one of the socket
through the FD specified on the server line.

On the bind side, haproxy will receive the FD, and will use it like it
was the FD of an accept() syscall.

This protocol was designed for internal communication within HAProxy
between the master and the workers, but it's possible to use it
externaly with a wrapper and pass the FD through environment variabls.
2018-09-12 07:20:17 +02:00
William Lallemand
2d3f8a411f MEDIUM: protocol: use a custom AF_MAX to help protocol parser
It's possible to have several protocols per family which is a problem
with the current way the protocols are stored.

This allows to register a new protocol in HAProxy which is not a
protocol in the strict socket definition. It will be used to register a
SOCK_STREAM protocol using socketpair().
2018-09-12 07:12:27 +02:00
Willy Tarreau
ab813a4b05 REORG: http: move some header value processing functions to http.c
The following functions only deal with header field values and are agnostic
to the HTTP version so they were moved to http.c :

http_header_match2(), find_hdr_value_end(), find_cookie_value_end(),
extract_cookie_value(), parse_qvalue(), http_find_url_param_pos(),
http_find_next_url_param().

Those lacking the "http_" prefix were modified to have it.
2018-09-11 10:30:25 +02:00
Willy Tarreau
04f1e2d202 REORG: http: move error codes production and processing to http.c
These error codes and messages are agnostic to the version, even if
they are represented as HTTP/1.0 messages. Ultimately they will have
to be transformed into internal HTTP messages to be used everywhere.

The HTTP/1.1 100 Continue message was turned to an IST and the local
copy in the Lua code was removed.
2018-09-11 10:30:25 +02:00
Willy Tarreau
6b952c8101 REORG: http: move http_get_path() to http.c
This function is purely HTTP once http_txn is put aside. So the original
one was renamed to http_txn_get_path() and it extracts the relevant offsets
from the txn to pass them to http_get_path(). One benefit of the new version
is that it returns the length at the same time so that allowed to slightly
simplify http_get_path_from_string() which had to look up the end pointer
previously and which is not needed anymore.
2018-09-11 10:30:25 +02:00
Willy Tarreau
35b51c6e5b REORG: http: move the HTTP semantics definitions to http.h/http.c
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.

This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.

Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
2018-09-11 10:30:25 +02:00
William Lallemand
e22f11ff47 MINOR: mworker: keep and clean the listeners
Keep the listeners that should be used in the master process and clean
them in the workers.
2018-09-11 10:23:24 +02:00
William Lallemand
d3801c1c21 MEDIUM: startup: unify signal init between daemon and mworker mode
The signals are now unblocked only once the configuration have been
parsed.
2018-09-11 10:21:58 +02:00
Willy Tarreau
4bc7d90d3b MEDIUM: snapshot: merge the captured data after the descriptor
Instead of having a separate area for the captured data, we now have a
contigous block made of the descriptor and the data. At the moment, since
the area is dynamically allocated, we can adjust its size to what is
needed, but the idea is to quickly switch to a pool and an LRU list.
2018-09-07 20:07:17 +02:00
Willy Tarreau
c55015ee5b MEDIUM: snapshots: dynamically allocate the snapshots
Now upon error we dynamically allocate the snapshot instead of overwriting
it. This way there is no more memory wasted in the proxy to hold the two
error snapshot descriptors. Also an appreciable side effect of this is that
the proxy's lock is only taken during the pointer swap, no more while copying
the buffer's contents. This saves 480 bytes of memory per proxy.
2018-09-07 19:59:58 +02:00
Willy Tarreau
fd9419d560 MINOR: http: remove the pointer to the error snapshot in http_capture_bad_message()
It's not needed anymore as we know the side thanks to the channel. This
will allow the proxy generic code to better manage the error snapshots.
2018-09-07 18:36:04 +02:00
Willy Tarreau
75fb65a51f MINOR: proxy: add a new generic proxy_capture_error()
This function now captures an error regardless of its side and protocol.
The caller must pass a number of elements and may pass a protocol-specific
structure and a callback to display it. Later this function may deal with
more advanced allocation techniques to avoid allocating as many buffers
as proxies.
2018-09-07 18:36:04 +02:00
Willy Tarreau
7ccdd8dad9 MEDIUM: snapshot: implement a show() callback and use it for HTTP
The HTTP dumps are now configurable in the code : "show errors" now
calls a protocol-specific function to emit the decoded output. For
now only HTTP is implemented.
2018-09-07 18:36:01 +02:00
Willy Tarreau
7480f323ff MINOR: snapshot: split the error snapshots into common and proto-specific parts
The idea will be to make the error snapshot feature accessible to other
protocols than just HTTP. This patch only introduces an "http_snapshot"
structure and renames a few fields to make things more explicit. The
HTTP part was installed inside a union so that we can easily add more
protocols in the future.
2018-09-07 16:13:45 +02:00
Willy Tarreau
5865a8fe69 MINOR: snapshot: restart on the event ID and not the stream ID
The snapshots have the ability to restart a partial dump and they use
the stream ID as the restart point. Since it's purely HTTP, let's use
the event ID instead.
2018-09-07 15:00:43 +02:00
Olivier Houchard
54620523e2 MINOR: log: One const should be enough.
"const const" doesn't bring much more constness, so only use one.
2018-09-06 18:52:15 +02:00
Willy Tarreau
57f8185625 MINOR: connection: add new function conn_is_back()
This function returns true if the connection is a backend connection
and false if it's a frontend connection.
2018-09-06 14:52:21 +02:00
Willy Tarreau
6ac98ac1be MINOR: connection: add new function conn_get_proxy()
This function returns the proxy associated to a connection. For front
connections it returns the frontend, and for back connections it
returns the backend. This will be used to retrieve some configuration
parameters from within a mux.
2018-09-06 11:48:44 +02:00
Willy Tarreau
be373150c7 MINOR: connection: make the initialization more consistent
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
2018-09-06 11:45:30 +02:00
Willy Tarreau
5383935856 MINOR: log: provide a function to emit a log for a session
The new function sess_log() only needs a session to emit a log. It will
ignore the parts that depend on the stream. It is usable to emit a log
to report early errors in muxes. These ones will typically mention
"<BADREQ>" for the request and 0 for the HTTP status code.
2018-09-06 09:43:41 +02:00
Willy Tarreau
26ffa8544d CLEANUP: log: make the low_level lf_{ip,port,text,text_len} functions take consts
These ones were abusively relying on variables making it hard to integrate
with const arguments.
2018-09-05 20:01:23 +02:00
Willy Tarreau
43c538eab6 MINOR: log: move the log code to sess_build_logline() to add extra arguments
The current build_logline() can only be used with valid streams, which
means it is not suitable for use from muxes. We start by moving it into
another more generic function which takes the session as an argument,
to avoid complexifying all the internal API for jsut a few use cases.
This new function is not supposed to be called directly from outside so
we'll be able to instrument it to support several calling conventions.

For now the behaviour and conditions remain unchanged.
2018-09-05 20:01:23 +02:00
Willy Tarreau
ec3750c590 BUG/MAJOR: buffer: fix incorrect check in __b_putblk()
This function was split in two at commit f7d0447 ("MINOR: buffers:
split b_putblk() into __b_putblk()") but it's wrong, the first half's
length is not adjusted to the requested size so it copies more than
desired.

This is purely 1.9-specific, no backport is needed.
2018-09-05 20:01:14 +02:00
Willy Tarreau
590a0514f2 BUG/MEDIUM: session: fix reporting of handshake processing time in the logs
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake where it belongs, which
is the session.

However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).

So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.

This patch will need to be backported to 1.8.
2018-09-05 16:30:23 +02:00
Willy Tarreau
9378df89f6 MINOR: thread: implement HA_ATOMIC_XADD()
We've been missing it several times and now we'll need it to increment
a request counter. Let's do it once for all.

This patch will need to be backported to 1.8 with the associated fix.
2018-09-05 16:30:17 +02:00
Willy Tarreau
f16cb41d19 MINOR: tools: make date2str_log() take some consts
The "tm" and "date" field are not modified, they can be const instead
of forcing their callers to use vars.
2018-09-05 16:30:11 +02:00
Baptiste Assmann
6d0f38f00d BUG/MEDIUM: dns/server: fix incomatibility between SRV resolution and server state file
Server state file has no indication that a server is currently managed
by a DNS SRV resolution.
And thus, both feature (DNS SRV resolution and server state), when used
together, does not provide the expected behavior: a smooth experience...

This patch introduce the "SRV record name" in the server state file and
loads and applies it if found and wherever required.

This patch applies to haproxy-dev branch only. For backport, a specific patch
is provided for 1.8.
2018-09-04 17:40:22 +02:00
Willy Tarreau
e215bba956 MINOR: connection: make conn_sock_drain() work for all socket families
This patch improves the previous fix by implementing the socket draining
code directly in conn_sock_drain() so that it always applies regardless
of the protocol's family. Thus it gets rid of tcp_drain().
2018-08-24 14:45:46 +02:00
Willy Tarreau
b509232eb8 MINOR: sample: remove impossible tests on negative smp->data.u.str.data
Since commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields
match the buffer struct") a chunk length is unsigned so we can remove
negative size checks.
2018-08-22 05:28:33 +02:00
Willy Tarreau
bba81563cf MINOR: chunk: remove impossible tests on negative chunk->data
Since commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields
match the buffer struct") a chunk length is unsigned so we can remove
negative size checks.
2018-08-22 05:28:32 +02:00
Willy Tarreau
1b13bfd646 BUG/MEDIUM: connection: don't forget to always delete the list's head
During a test it happened that a connection was deleted before the
stream it's attached to, resulting in a crash related to the fix
18a85fe ("BUG/MEDIUM: streams: Don't forget to remove the si from
the wait list.") during the LIST_DEL(). Make sure to always delete
the list's head in this case so that other elements can safely
detach later.

This is purely 1.9, no backport is needed.
2018-08-21 18:33:20 +02:00
Olivier Houchard
abedf5f6c3 BUG/MEDIUM: tasklets: Add the thread as active when waking a tasklet.
Set the flag for the current thread in active_threads_mask when waking a
tasklet, or we will never run it if no tasks are available.

This is 1.9-specific, no backport is needed.
2018-08-21 18:06:33 +02:00
Olivier Houchard
6aab737835 MINOR: fd cache: And the thread_mask with all_threads_mask.
When we choose to insert a fd in either the global or the local fd update list,
and the thread_mask against all_threads_mask before checking if it's tid_bit,
that way, if we run with nbthreads==1, we will always use the local list,
which is cheaper than the global one.
2018-08-17 14:50:47 +02:00
Olivier Houchard
8f0b4c66f5 MINOR: stream_interface: Give stream_interface its own wait_list.
Instead of just using the conn_stream wait_list, give the stream_interface
its own. When the conn_stream will have its own buffers, the stream_interface
may have to wait on it.
2018-08-16 17:29:54 +02:00
Olivier Houchard
91894cbf4c MINOR: stream_interface: Don't use si_cs_send() as a task handler.
Instead of using si_cs_send() as a task handler, define a new function,
si_cs_io_cb(), and give si_cs_send() its original prototype. Right now
si_cs_io_cb() just handles send, but later it'll handle recv() too.
2018-08-16 17:29:54 +02:00
Olivier Houchard
e1c6dbcd70 MINOR: connections/mux: Add the wait reason(s) to wait_list.
Add a new element to the wait_list, that let us know which event(s) we are
waiting on.
2018-08-16 17:29:53 +02:00
Olivier Houchard
5d18718c8f MINOR: tasks: Allow tasklet_wakeup() to wakeup a task.
Modify tasklet_wakeup() so that it handles a task as well, and inserts it
directly into the tasklet list, making it effectively a tasklet.
This should make future developments easier.
2018-08-16 17:29:53 +02:00
Olivier Houchard
ed0f207ef5 MINOR: connections: Get rid of txbuf.
Remove txbuf from conn_stream. It is not used yet, and its only user will
probably be the mux_h2, so it will be better suited in the struct h2s.
2018-08-16 17:29:51 +02:00
Olivier Houchard
638b799b09 MINOR: connections: Move rxbuf from the conn_stream to the h2s.
As the mux_h2 is the only user of rxbuf, move it to the struct h2s, instead
of conn_stream.
2018-08-16 17:28:11 +02:00
Olivier Houchard
511efeae7e MINOR: connections: Make rcv_buf mandatory and nuke cs_recv().
Reintroduce h2_rcv_buf(), right now it just does what cs_recv() did, but
should be modified later.
2018-08-16 17:23:44 +02:00
Patrick Hemmer
268a707a3d MEDIUM: add set-priority-class and set-priority-offset
This adds the set-priority-class and set-priority-offset actions to
http-request and tcp-request content. At this point they are not used
yet, which is the purpose of the next commit, but all the logic to
set and clear the values is there.
2018-08-10 15:06:31 +02:00
Patrick Hemmer
0355dabd7c MINOR: queue: replace the linked list with a tree
We'll need trees to manage the queues by priorities. This change replaces
the list with a tree based on a single key. It's effectively a list but
allows us to get rid of the list management right now.
2018-08-10 15:06:27 +02:00
Patrick Hemmer
da282f4a8f MINOR: queue: store the queue index in the stream when enqueuing
We store the queue index in the stream and check it on dequeueing to
figure how many entries were processed in between. This way we'll be
able to count the elements that may later be added before ours.
2018-08-10 15:06:25 +02:00
Patrick Hemmer
ffe5e8c638 MINOR: stream: rename {srv,prx}_queue_size to *_queue_pos
The current name is misleading as it implies a queue size, but the value
instead indicates a position in the queue.
The value is only the queue size at the exact moment the element is enqueued.
Soon we will gain the ability to insert anywhere into the queue, upon which
clarity of the name is more important.
2018-08-10 15:04:14 +02:00
Willy Tarreau
287527a176 BUG/MEDIUM: connection/mux: take care of serverless proxies
Commit 7ce0c89 ("MEDIUM: mux: Use the mux protocol specified on
bind/server lines") assumed a bit too strongly that we could only have
servers on the connect side :-) It segfaults under this config :

    defaults
        contimeout      5s
        clitimeout      5s
        srvtimeout      5s
        mode http

    listen test1
        bind :8001
        dispatch 127.0.0.1:8002

    frontend test2
        mode http
        bind :8002
        redirect location /

No backport needed.
2018-08-08 18:44:43 +02:00
Christopher Faulet
7ce0c891ab MEDIUM: mux: Use the mux protocol specified on bind/server lines
To do so, mux choices are split to handle incoming and outgoing connections in a
different way. The protocol specified on the bind/server line is used in
priority. Then, for frontend connections, the ALPN is retrieved and used to
choose the best mux. For backend connection, there is no ALPN. Finaly, if no
protocol is specified and no protocol matches the ALPN, we fall back on a
default mux, choosing in priority the first mux with exactly the same mode.
2018-08-08 10:42:08 +02:00
Christopher Faulet
8ed0a3e32a MINOR: mux/server: Add 'proto' keyword to force the multiplexer's protocol
For now, it is parsed but not used. Tests are done on it to check if the side
and the mode are compatible with the server's definition.
2018-08-08 10:42:08 +02:00
Christopher Faulet
a717b99284 MINOR: mux/frontend: Add 'proto' keyword to force the mux protocol
For now, it is parsed but not used. Tests are done on it to check if the side
and the mode are compatible with the proxy's definition.
2018-08-08 10:41:11 +02:00
Christopher Faulet
9c9ef03bf4 MINOR: mux: Improve the message with the list of existing mux protocols
Because there can be several default multiplexers (without name), they are now
reported with the name "<default>". And a message warns they cannot be
referenced with the "proto" keyword on a bind line or a server line.
2018-08-08 10:41:11 +02:00
Christopher Faulet
e15c6c48ef MINOR: mux: Change get_mux_proto to get an ist as parameter
It simplifies the API and ease comparisons with the multiplexers token (which is
an ist too).
2018-08-08 10:41:11 +02:00
Christopher Faulet
259e473ecc BUG/MINOR: threads: Remove the unexisting lock label "UPDATED_SERVERS_LOCK"
The update lock was removed by the commit 91c2826e1 ("CLEANUP: server: remove
the update list and the update lock"). But the lock label was not which makes
the compilation fail in debug mode.

pour vos modifications. Les lignes # commençant par '#' seront ignorées, et un
message vide abandonne la validation.  # # Sur la branche temp # Votre branche
est en avance sur 'origin/master' de 87 commits.  # (utilisez "git push" pour
publier vos commits locaux) # # Modifications qui seront validées : # modifié :
include/common/hathreads.h #
2018-08-08 10:41:11 +02:00
Willy Tarreau
91c2826e1d CLEANUP: server: remove the update list and the update lock
These ones are not more used, let's get rid of them.
2018-08-08 09:57:45 +02:00
Willy Tarreau
3ff577e165 MAJOR: server: make server state changes synchronous again
Now we try to synchronously push updates as they come using the new rdv
point, so that the call to the server update function from the main poll
loop is not needed anymore.

It further reduces the apparent latency in the health checks as the response
time almost always appears as 0 ms, resulting in a slightly higher check rate
of ~1960 conn/s. Despite this, the CPU consumption has slightly dropped again
to ~32% for the same test.

The only trick is that the checks code is built with a bit of recursivity
because srv_update_status() calls server_recalc_eweight(), and the latter
needs to signal srv_update_status() in case of updates. Thus we added an
extra argument to this function to indicate whether or not it must
propagate updates (no if it comes from srv_update_status).
2018-08-08 09:57:45 +02:00
Willy Tarreau
647c70b681 MINOR: threads: remove the previous synchronization point
It's not needed anymore as it is fully covered by the new rendez-vous
point. This also removes the pipe and its polling.
2018-08-08 09:57:45 +02:00
Christopher Faulet
98d9fe21e0 MINOR: mux: Print the list of existing mux protocols during HA startup
This is done in verbose/debug mode and when build options are reported.
2018-08-08 09:54:22 +02:00
Christopher Faulet
32f61c0421 MINOR: mux: Unlink ALPN and multiplexers to rather speak of mux protocols
Multiplexers are not necessarily associated to an ALPN. ALPN is a TLS extension,
so it is not always defined or used. Instead, we now rather speak of
multiplexer's protocols. So in this patch, there are no significative changes,
some structures and functions are just renamed.
2018-08-08 09:54:22 +02:00
Christopher Faulet
2d5292a412 MINOR: mux: Add info about the supported side in alpn_mux_list structure
Now, a multiplexer can specify if it can be install on incoming connections
(ALPN_SIDE_FE), on outgoing connections (ALPN_SIDE_BE) or both
(ALPN_SIDE_BOTH). These flags are compatible with proxies' ones.
2018-08-08 09:54:22 +02:00
Christopher Faulet
063f786553 MINOR: conn_stream: add cs_send() as a default snd_buf() function
This function is generic and is able to automatically transfer data from a
buffer to the conn_stream's tx buffer. It does this automatically if the mux
doesn't define another snd_buf() function.

It cannot yet be used as-is with the conn_stream's txbuf without risking to
lose data on close since conn_streams need to be orphaned for this.
2018-08-08 09:53:58 +02:00
Christopher Faulet
3c51802fb9 MINOR: conn_stream: add an tx buffer to the conn_stream
To be symmetrical with the recv() part, we no handle retryable and partial
transmission using a intermediary buffer in the conn_stream. For now it's only
set to BUF_NULL and never allocated nor used.

It cannot yet be used as-is without risking to lose data on close since
conn_streams need to be orphaned for this.
2018-08-08 09:53:01 +02:00
Christopher Faulet
d44a9b3627 MEDIUM: mux: Remove const on the buffer in mux->snd_buf()
This is a partial revert of the commit deccd1116 ("MEDIUM: mux: make
mux->snd_buf() take the byte count in argument"). It is a requirement to do
zero-copy transfers. This will be mandatory when the TX buffer of the
conn_stream will be used.

So, now, data are consumed by mux->snd_buf() and not only sent. So it needs to
update the buffer state. On its side, the caller must be aware the buffer can be
replaced y an empty or unallocated one.

As a side effet of this change, the function co_set_data() is now only responsible
to update the channel set, by update ->output field.
2018-08-07 14:36:52 +02:00
Christopher Faulet
ad4e1a4735 BUG/MINOR: buffers: Fix b_slow_realign when a buffer is realign without output
When b_slow_realign is called with the <output> parameter equal to 0, the
buffer's head, after the realign, must be set to 0. It was errornously set to
the buffer's size, because there was no test on the value of <output>.
2018-08-06 15:56:40 +02:00
Willy Tarreau
60b639ccbe MEDIUM: hathreads: implement a more flexible rendez-vous point
The current synchronization point enforces certain restrictions which
are hard to workaround in certain areas of the code. The fact that the
critical code can only be called from the sync point itself is a problem
for some callback-driven parts. The "show fd" command for example is
fragile regarding this.

Also it is expensive in terms of CPU usage because it wakes every other
thread just to be sure all of them join to the rendez-vous point. It's a
problem because the sleeping threads would not need to be woken up just
to know they're doing nothing.

Here we implement a different approach. We keep track of harmless threads,
which are defined as those either doing nothing, or doing harmless things.
The rendez-vous is used "for others" as a way for a thread to isolate itself.
A thread then requests to be alone using thread_isolate() when approaching
the dangerous area, and then waits until all other threads are either doing
the same or are doing something harmless (typically polling). The function
only returns once the thread is guaranteed to be alone, and the critical
section is terminated using thread_release().
2018-08-02 17:51:45 +02:00
Willy Tarreau
0c026f49e7 MINOR: threads: add more consistency between certain variables in no-thread case
When threads are disabled, some variables such as tid and tid_bit are
still checked everywhere, the MAX_THREADS_MASK macro is ~0UL while
MAX_THREADS is 1, and the all_threads_mask variable is replaced with a
macro forced to zero. The compiler cannot optimize away all this code
involving checks on tid and tid_bit, and we end up in special cases
where all_threads_mask has to be specifically tested for being zero or
not. It is not even certain the code paths are always equivalent when
testing without threads and with nbthread 1.

Let's change this to make sure we always present a single thread when
threads are disabled, and have the relevant values declared as constants
so that the compiler can optimize all the tests away. Now we have
MAX_THREADS_MASK set to 1, all_threads_mask set to 1, tid set to zero
and tid_bit set to 1. Doing just this has removed 4 kB of code in the
no-thread case.

A few checks for all_threads_mask==0 have been removed since it never
happens anymore.
2018-08-02 17:48:09 +02:00
Willy Tarreau
c03ea40763 BUILD/MINOR: compiler: fix offsetof() on older compilers
An offsetof() macro was introduced with commit 928fbfa ("MINOR: compiler:
introduce offsetoff().") with a fallback for older compilers. But this
breaks gcc 3.4 because __size_t and __uintptr_t are not defined there.
However size_t and uintptr_t are, so let's fix it this way. No backport
needed.
2018-07-30 11:49:35 +02:00
Willy Tarreau
0ccd32285f MINOR: threads: move "nbthread" parsing to hathreads.c
The purpose is to make sure that all variables which directly depend
on this nbthread argument are set at the right moment. For now only
all_threads_mask needs to be set. It used to be set while calling
thread_sync_init() which is called too late for certain checks. The
same function handles threads and non-threads, which removes the need
for some thread-specific knowledge from cfgparse.c.
2018-07-30 11:10:46 +02:00
Olivier Houchard
3e12304ae0 BUG/MINOR: threads: Handle nbthread == MAX_THREADS.
If nbthread is MAX_THREADS, the shift operation needed to compute
all_threads_mask fails in thread_sync_init(). Instead pass a number
of threads to this function and let it compute the mask without
overflowing.

This should be backported to 1.8.
2018-07-27 17:18:22 +02:00
Emmanuel Hocdet
ebabd8768a MINOR: ssl: BoringSSL matches OpenSSL 1.1.0
Since BoringSSL 3b2ff028, API now correctly match OpenSSL 1.1.0.
The patch revert part of haproxy 019f9b10: "Fix BoringSSL call and
openssl-compat.h/#define occordingly.".
This will not break openssl/libressl compat.
2018-07-27 09:43:40 +02:00
Olivier Houchard
79321b95a8 MINOR: pollers: Add a way to wake a thread sleeping in the poller.
Add a new pipe, one per thread, so that we can write on it to wake a thread
sleeping in a poller, and use it to wake threads supposed to take care of a
task, if they are all sleeping.
2018-07-26 19:09:50 +02:00
Olivier Houchard
9b03c0c9a7 MINOR: tasks: Make active_tasks_mask volatile.
To be sure we have the relevant informations, make active_tasks_mask volatile
2018-07-26 19:09:50 +02:00
Willy Tarreau
3201e4e428 MEDIUM: queue: get rid of the pendconn lock
This lock was necessary to manipulate the pendconn element between
concurrent places, but was causing great difficulties in the list walk
by having to iterate over multiple entries instead of being able to
safely pick the first one (in fact the first element was always the
right one but the locking model was hard to prove).

Here since we know we can always rely on the queue's locks, we take
the queue's lock every time we need to modify the element. In practice
it was already the case everywhere except in pendconn_dequeue() which
only works on an element that was already detached. This function had
to be protected against the risk of meeting an incompletely detached
element (which could be unlinked but not yet assigned). By taking the
queue lock around the LIST_ISEMPTY test, it's enough to ensure that a
concurrent thread either didn't begin or had completed the operation.

The true benefit really is in pendconn_process_next_strm() where we
can again safely work with the first element of each queue. This will
significantly simplify next updates to this code.
2018-07-26 17:32:51 +02:00
Willy Tarreau
88930dd364 MINOR: queue: use a distinct variable for the assigned server and the queue
The pendconn struct uses ->px and ->srv to designate where the element is
queued. There is something confusing regarding threads though, because we
have to lock the appropriate queue before inserting/removing elements, and
this queue may only be determined by looking at ->srv (if it's not NULL
it's the server, otherwise use the proxy). But pendconn_grab_from_px() and
pendconn_process_next_strm() both assign this ->srv field, making it
complicated to know what queue to lock before manipulating the element,
which is exactly why we have the pendconn_lock in the first place.

This commit introduces pendconn->target which is the target server that
the two aforementioned functions will set when assigning the server.
Thanks to this, the server pointer may always be relied on to determine
what queue to use.
2018-07-26 17:32:51 +02:00
Willy Tarreau
d0ad4a87f0 MEDIUM: queue: make pendconn_free() work on the stream instead
Now pendconn_free() takes a stream, checks that pend_pos is set, clears
it, and uses pendconn_unlink() to complete the job. It's cleaner and
centralizes all the bookkeeping work in pendconn_unlink() only and
ensures that there's a single place where the stream's position in the
queue is manipulated.
2018-07-26 17:32:51 +02:00
Willy Tarreau
9624faec86 MINOR: queue: centralize dequeuing code a bit better
For now the pendconns may be dequeued at two places :
  - pendconn_unlink(), which operates on a locked queue
  - pendconn_free(), which operates on an unlocked queue and frees
    everything.

Some changes are coming to the queue and we'll need to be able to be a
bit stricter regarding the places where we dequeue to keep the accounting
accurate. This first step renames the locked function __pendconn_unlink()
as it's for use by those aware of it, and introduces a new general purpose
pendconn_unlink() function which automatically grabs the necessary locks
before calling the former, and pendconn_cond_unlink() which additionally
checks the pointer and the presence in the queue.
2018-07-26 17:32:48 +02:00
Olivier Houchard
77551ee8a7 BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size.
As __task_wakeup() is responsible for increasing
rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for
decreasing it, as process_runnable_tasks() isn't the only one that removes
tasks from runqueues.
2018-07-26 16:33:29 +02:00
Olivier Houchard
76e45181b2 MINOR: tasks: Add a flag that tells if we're in the global runqueue.
How that we have bits available in task->state, add a flag that tells if we're
in the global runqueue or not.
2018-07-26 16:33:10 +02:00
Willy Tarreau
f0cea1ee3f MINOR: tasks: extend the state bits from 8 to 16 and remove the reason
By removing the reason code for the wakeup we can gain 8 extra bits to
encode the task's state. The reason code was never used at all and is
wrong by design since subsequent calls will OR this value anyway. Let's
say it goodbye and leave the room for more precious bits. The woken bits
were moved to the higher byte so that the most important bits can stay
grouped together.
2018-07-26 16:13:00 +02:00
Willy Tarreau
7999bfbfd3 MEDIUM: buffers: make b_xfer() automatically swap buffers when possible
Whenever it's possible to avoid a copy, b_xfer() will simply swap the
buffer's heads without touching the data. This has brought the performance
back from 140 kH/s to 202 kH/s on the test case.
2018-07-20 19:21:43 +02:00
Willy Tarreau
11c9aa424e MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
2018-07-20 19:21:43 +02:00
Willy Tarreau
5e1cc5ea83 MINOR: conn_stream: add an rx buffer to the conn_stream
In order to reorganize the connection layers, recv() operations will
need to be retryable and to support partial transfers. This requires
an intermediary buffer to hold the data coming from the mux. After a
few attempts, it turns out that this buffer is best placed inside the
conn_stream itself. For now it's only set to buf_empty and it will be
up to the caller to allocate it if required.
2018-07-20 19:21:43 +02:00
Willy Tarreau
a3f7efe009 MINOR: conn_stream: add a new CS_FL_REOS flag
This flag indicates that the mux layer has already detected an end of
stream which will become CS_FL_EOS during a recv() once the rx buffer
is empty.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f148888d19 MINOR: buffers: add b_xfer() to transfer data between buffers
Instead of open-coding buffer-to-buffer transfers using blocks, let's
have a dedicated function for this. It also adjusts the buffer counts.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f7d0447376 MINOR: buffers: split b_putblk() into __b_putblk()
The latter function is more suited to operations that don't require any
check because the check has already been performed. It will be used by
other b_* functions.
2018-07-20 19:21:43 +02:00
Willy Tarreau
ab322d4fd4 MINOR: buffers: simplify b_contig_space()
This function is used a lot in block copies and is needlessly
complicated since it still uses pointer arithmetic. Let's fall
back to regular offsets and simplify it. This removed around
23 bytes from b_putblk() and it removed any conditional jump.
2018-07-20 19:21:43 +02:00
Christopher Faulet
ddb6c16576 BUG/MEDIUM: threads: Fix the exit condition of the thread barrier
In thread_sync_barrier, we exit when all threads have set their own bit in the
barrier mask. It is done by comparing it to all_threads_mask. But we must not
use a simple equality to do so, becaue all_threads_mask may change. Since commit
ba86c6c25 ("MINOR: threads: Be sure to remove threads from all_threads_mask on
exit"), when a thread exit, its bit is removed from all_threads_mask. Instead,
we must use a bitwise AND to test is all bits of all_threads_mask are set.

This also requires that all_threads_mask is set to volatile if we want to
catch changes.

This patch must be backported in 1.8.
2018-07-20 14:24:41 +02:00
Christopher Faulet
20761453fb MINOR: ist: Add the function isteqi
This new function does the same as isteq, but ignoring the case.
2018-07-20 13:39:30 +02:00
Willy Tarreau
8318885487 MINOR: connection: simplify subscription by adding a registration function
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
2018-07-19 18:31:07 +02:00
Olivier Houchard
910b2bc829 MEDIUM: connections/mux: Revamp the send direction.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
2018-07-19 18:31:07 +02:00
Olivier Houchard
6ff2039d13 MINOR: connections/mux: Add a new "subscribe" method.
Add a new "subscribe" method for connection, conn_stream and mux, so that
upper layer can subscribe to them, to be called when the event happens.
Right now, the only event implemented is "SUB_CAN_SEND", where the upper
layer can register to be called back when it is possible to send data.

The connection and conn_stream got a new "send_wait_list" entry, which
required to move a few struct members around to maintain an efficient
cache alignment (and actually this slightly improved performance).
2018-07-19 16:23:43 +02:00
Olivier Houchard
e17c2d3e57 MINOR: tasklets: Don't attempt to add a tasklet in the list twice.
Don't try to add a tasklet to the run queue if it's already in there, or we
might get an infinite loop.
2018-07-19 16:23:43 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
2018-07-19 16:23:43 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
bd1dba8a89 MINOR: buffer: rename the data length member to '->data'
It used to be called 'len' during the reorganisation but strictly speaking
it's not a length since it wraps. Also we already use '_data' as the suffix
to count available data, and data is also what we use to indicate the amount
of data in a pipe so let's improve consistency here. It was important to do
this in two operations because data used to be the name of the pointer to
the storage area.
2018-07-19 16:23:43 +02:00
Willy Tarreau
e3128024bf MINOR: buffer: replace buffer_replace2() with b_rep_blk()
This one is more generic and designed to work on a random block. It
may later get a b_rep_ist() variant since many strings are already
available as (ptr,len).
2018-07-19 16:23:43 +02:00
Willy Tarreau
4d893d440c MINOR: buffers/channel: replace buffer_insert_line2() with ci_insert_line2()
There was no point keeping that function in the buffer part since it's
exclusively used by HTTP at the channel level, since it also automatically
appends the CRLF. This further cleans up the buffer code.
2018-07-19 16:23:43 +02:00
Willy Tarreau
7b04cc4467 CLEANUP: buffer: minor cleanups to buffer.h
Remove a few unused functions and add some comments to split the file
parts in sections.
2018-07-19 16:23:43 +02:00
Willy Tarreau
911f7dd893 MINOR: buffers: remove b_putstr()
It's not needed anymore.
2018-07-19 16:23:43 +02:00
Willy Tarreau
ea1b06d5bb MINOR: buffer: add a new file for ist + buffer manipulation functions
The new file istbuf.h links the indirect strings (ist) with the buffers.
The purpose is to encourage addition of more standard buffer manipulation
functions that rely on this in order to improve the overall ease of use
along all the code. Just like ist.h and buf.h, this new file is not
expected to depend on anything beyond these two files.

A few functions were added and/or converted from buffer.h :
  - b_isteq()  : indicates if a buffer and a string match
  - b_isteat() : consumes a string from the buffer if it matches
  - b_istput() : appends a small string to a buffer (all or none)
  - b_putist() : appends part of a large string to a buffer

The equivalent functions were removed from buffer.h and changed at the
various call places.
2018-07-19 16:23:43 +02:00
Willy Tarreau
55372f646f MINOR: buffer: replace b{i,o}_put* with b_put*
The two variants now do exactly the same (appending at the tail of the
buffer) so let's not keep the distinction between these classes of
functions and have generic ones for this. It's also worth noting that
b{i,o}_putchk() wasn't used at all and was removed.
2018-07-19 16:23:43 +02:00
Willy Tarreau
72a100b386 MINOR: buffer: replace bi_fast_delete() with b_del()
There's no distinction between in and out data now. The latter covers
the needs of the former and supports wrapping. The extra cost is
negligible given the locations where it's used.
2018-07-19 16:23:43 +02:00
Olivier Houchard
08afac0fd7 MEDIUM: buffers: move "output" from struct buffer to struct channel
Since we never access this field directly anymore, but only through the
channel's wrappers, it can now move to the channel. The buffers are now
completely free from the distinction between input and output data.
2018-07-19 16:23:43 +02:00
Willy Tarreau
892f1dbe4f MINOR: buffer: rename the "data" field to "area"
Since we use "_data" for the amount of data at many places, as opposed to
"_space" for the amount of space, let's rename the "data" field to "area"
so that we can reuse "data" later for the amount of data in the buffer
(currently called "len" despite not being contigous).
2018-07-19 16:23:43 +02:00
Willy Tarreau
f6dfd88a92 MINOR: buffer: b_set_data() doesn't truncate output data anymore
b_set_data() is used :
  - in proto_http and hlua to trim input data (b_set_data(co_data()))
  - in SPOE to append data to a buffer while building a message

In no case will this truncate a buffer so we can safely remove the
test for len < b->output.
2018-07-19 16:23:43 +02:00
Willy Tarreau
abed1e7f34 MINOR: buffer: remove the check for output on b_del()
b_del() is used in :
  - mux_h2 with the demux buffer : always processes input data
  - checks with output data though output is not considered at all there
  - b_eat() which is not used anywhere
  - co_skip() where the len is always <= output

Thus the distinction for output data is not needed anymore and the
decrement can be made inconditionally in co_skip().
2018-07-19 16:23:43 +02:00
Willy Tarreau
d54a8ceb97 MAJOR: start to change buffer API
This is intentionally the minimal and safest set of changes, some cleanups
area still required. These changes are quite tricky and cannot be
independantly tested, so it's important to keep this patch as bisectable
as possible.

buf_empty and buf_wanted were changed and are now exactly similar since
there's no <p> member in the structure anymore. Given that no test is
ever made in the code to check that buf == &buf_wanted, it may be possible
that we don't need to have two anymore, unless some buf_empty tests have
precedence. This will have to be investigated.

A significant part of this commit affects the HTTP compression code,
which used to deeply manipulate the input and output buffers without
any reasonable solution for a better abstraction. For this reason, if
any regression is met and designates this patch as the culprit, it is
important to run tests which specifically involve compression or which
definitely don't use it in order to spot the issue.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2018-07-19 16:23:42 +02:00
Willy Tarreau
523cc5d506 MINOR: buffer: convert part bo_putblk() and bi_putblk() to the new API
These functions are pretty similar and will be merged at the end of the
migration. For now they still need to remain distinct.
2018-07-19 16:23:42 +02:00
Willy Tarreau
fdabbe243d MINOR: buffer: remove unused bo_add()
We don't need this function anymore.
2018-07-19 16:23:42 +02:00
Willy Tarreau
cd9e60db00 MEDIUM: channel: adapt to the new buffer API
Also, ci_swpbuf() was removed (unused).
2018-07-19 16:23:42 +02:00
Olivier Houchard
d4251a7e98 MINOR: channel: Add co_set_data().
Add a new function that lets one set the channel's output amount.
2018-07-19 16:23:42 +02:00
Willy Tarreau
3ee8344b7b MINOR: channel: remove almost all references to buf->i and buf->o
We use ci_data() and co_data() instead now everywhere we read these
values.
2018-07-19 16:23:42 +02:00
Willy Tarreau
591d445049 MINOR: buffer: use b_orig() to replace most references to b->data
This patch updates most users of b->data to use b_orig().
2018-07-19 16:23:42 +02:00
Willy Tarreau
50227f9b88 MINOR: buffer: use c_head() instead of buffer_wrap_sub(c->buf, p-o)
This way we don't need o anymore.
2018-07-19 16:23:42 +02:00
Willy Tarreau
144c5c4d21 MINOR: buffer: replace buffer_flush() with c_adv(chn, ci_data(chn))
It used to forward some input into output.
2018-07-19 16:23:41 +02:00
Willy Tarreau
5ba65521a3 MINOR: buffer: replace buffer_pending() with ci_data()
It used to return b->i for channels, which is what ci_data() does.
2018-07-19 16:23:41 +02:00
Willy Tarreau
3f6799975f MINOR: buffer: replace bi_space_for_replace() with ci_space_for_replace()
This one computes the size that can be overwritten over the input part
of the buffer, so it's channel-specific.
2018-07-19 16:23:41 +02:00
Willy Tarreau
2375233ef0 MINOR: buffer: replace buffer_full() with channel_full()
It's only used by channels since we need to know the amount of output
data.
2018-07-19 16:23:41 +02:00
Willy Tarreau
271e2a503d MINOR: buffer: make bo_putchar() use b_tail()
It's possible because we can't call bo_putchar() with i != 0.
2018-07-19 16:23:41 +02:00
Willy Tarreau
0c7ed5d264 MINOR: buffer: replace buffer_empty() with b_empty() or c_empty()
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
d760eecf61 MINOR: buffer: replace buffer_not_empty() with b_data() or c_data()
It's mostly for consistency as many places already use one of these instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
eac5259888 MINOR: buffer: use b_room() to determine available space in a buffer
We used to have variations around buffer_total_space() and
size-buffer_len() or size-b_data(). Let's simplify all this. buffer_len()
was also removed as not used anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
bc59f359dc MINOR: buffer: get rid of b_ptr() and convert its last users
Now the new API functions are being used everywhere, we can get rid
of b_ptr(). A few last users like bi_istput() and bo_istput() appear
to only differ by what part of the buffer they're increasing, but
that should quickly be merged.
2018-07-19 16:23:41 +02:00
Willy Tarreau
337ea57cfc MINOR: connection: add a new receive flag : CO_RFL_BUF_WET
With this flag we introduce the notion of "dry" vs "wet" buffers : some
demultiplexers like the H2 mux require as much room as possible for some
operations that are not retryable like decoding a headers frame. For this
they need to know if the buffer is congested with data scheduled for
leaving soon or not. Since the new API will not provide this information
in the buffer itself, the caller must indicate it. We never need to know
the amount of such data, just the fact that the buffer is not in its
optimal condition to be used for receipt. This "CO_RFL_BUF_WET" flag is
used to mention that such outgoing data are still pending in the buffer
and that a sensitive receiver should better let it "dry" before using it.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7f3225f251 MINOR: connection: add a flags argument to rcv_buf()
The mux and transport rcv_buf() now takes a "flags" argument, just like
the snd_buf() one or like the equivalent syscall lower part. The upper
layers will use this to pass some information such as indicating whether
the buffer is free from outgoing data or if the lower layer may allocate
the buffer itself.
2018-07-19 16:23:41 +02:00
Willy Tarreau
d9cf540457 MEDIUM: mux: make mux->rcv_buf() take a size_t for the count
It also returns a size_t. This is in order to clean the API. Note
that the H2 mux still uses some ints in the functions called from
h2_rcv_buf(), though it's not really a problem given that H2 frames
are smaller. It may deserve a general cleanup later though.
2018-07-19 16:23:41 +02:00
Willy Tarreau
bfc4d77ad3 MEDIUM: connection: make xprt->rcv_buf() use size_t for the count
Just like we have a size_t for xprt->snd_buf(), we adjust to use size_t
for rcv_buf()'s count argument and return value. It also removes the
ambiguity related to the possibility to see a negative value there.
2018-07-19 16:23:41 +02:00
Willy Tarreau
deccd1116d MEDIUM: mux: make mux->snd_buf() take the byte count in argument
This way the mux doesn't need to modify the buffer's metadata anymore
nor to know the output's size. The mux->snd_buf() function now takes a
const buffer and it's up to the caller to update the buffer's state.

The return type was updated to return a size_t to comply with the count
argument.
2018-07-19 16:23:41 +02:00
Willy Tarreau
787db9a6a4 MEDIUM: connection: make xprt->snd_buf() take the byte count in argument
This way the senders don't need to modify the buffer's metadata anymore
nor to know about the output's split point. This way the functions can
take a const buffer and it's clearer who's in charge of updating the
buffer after a send. That's why the buffer realignment is now performed
by the caller of the transport's snd_buf() functions.

The return type was updated to return a size_t to comply with the count
argument.
2018-07-19 16:23:41 +02:00
Willy Tarreau
55f3ce1c91 MINOR: buffer: make b_getblk_nc() take size_t for the block sizes
Till now we used to reimplement it using ints to limit external changes
but we must adjust it and the various users to switch to size_t.
2018-07-19 16:23:41 +02:00
Willy Tarreau
206ba834ef MINOR: buffer: make b_getblk_nc() take const pointers
Now that there are no more users requiring to modify the buffer anymore,
switch these ones to const char and const buffer. This will make it more
obvious next time send functions are tempted to modify the buffer's output
count. Minor adaptations were necessary at a few call places which were
using char due to the function's previous prototype.
2018-07-19 16:23:41 +02:00
Willy Tarreau
5d7d1bbd0e MINOR: buffer: get rid of b_end() and b_to_end()
These ones are not used anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
f40e68227b MINOR: h1: make h1_measure_trailers() use an offset and a count
This will be needed by the H2 encoder to restart after wrapping.
2018-07-19 16:23:41 +02:00
Willy Tarreau
84d6b7af87 MINOR: h1: make h1_parse_chunk_size() not depend on b_ptr() anymore
It's similar to the previous commit so that the function doesn't rely
on buf->p anymore.
2018-07-19 16:23:41 +02:00
Willy Tarreau
c0973c6742 MINOR: h1: make h1_skip_chunk_crlf() not depend on b_ptr() anymore
It now takes offsets relative to the buffer's head. It's up to the
callers to add this offset which corresponds to the buffer's output
size.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7314be8e2c MINOR: h1: make h1_measure_trailers() take the byte count in argument
The principle is that it should not have to take this value from the
buffer itself anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
e5f12ce7f2 MINOR: buffer: replace bi_del() and bo_del() with b_del()
Till now the callers had to know which one to call for specific use cases.
Let's fuse them now since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, just combine the two tests
by first removing output data then only input.
2018-07-19 16:23:40 +02:00
Willy Tarreau
a1f78fb652 MINOR: buffer: replace bo_getblk_nc() with b_getblk_nc() which takes an offset
This will be important so that we can parse a buffer without touching it.
Now we indicate where from the buffer's head we plan to start to copy, and
for how many bytes. This will be used by send functions to loop at the end
of the buffer without having to update the buffer's output byte count.
2018-07-19 16:23:40 +02:00
Willy Tarreau
90ed3836db MINOR: buffer: replace bo_getblk() with direction agnostic b_getblk()
This new functoin limits itself to the amount of data available in the
buffer and doesn't care about the direction anymore. It's only called
from co_getblk() which already checks that no more than the available
output bytes is requested.
2018-07-19 16:23:40 +02:00
Willy Tarreau
e4d5a036ed MINOR: buffer: merge b{i,o}_contig_space()
These ones were merged into a single b_contig_space() that covers both
(the bo_ case was a simplified version of the other one). The function
doesn't use ->i nor ->o anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
0e11d59af6 MINOR: buffer: remove bo_contig_data()
The two call places now make use of b_contig_data(0) and check by
themselves that the returned size is no larger than the scheduled
output data.
2018-07-19 16:23:40 +02:00
Willy Tarreau
8f9c72d301 MINOR: buffer: remove bi_end()
It was replaced by ci_tail() when the channel is known, or b_tail() in
other cases.
2018-07-19 16:23:40 +02:00
Willy Tarreau
41e38ac0ee MINOR: buffer: remove bo_end()
It was replaced by either b_tail() when the buffer has no input data, or
b_peek(b, b->o).
2018-07-19 16:23:40 +02:00
Willy Tarreau
89faf5d7c3 MINOR: buffer: remove bo_ptr()
It was replaced by co_head() when a channel was known, otherwise b_head().
2018-07-19 16:23:40 +02:00
Willy Tarreau
dda2e41881 MINOR: buffer: remove bi_ptr()
It's now been replaced by b_head() when b->o is null, ci_head() when
the channel is known, or b_peek(b, b->o) in other situations.
2018-07-19 16:23:40 +02:00
Willy Tarreau
7194d3cc3b MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
2018-07-19 16:23:40 +02:00
Willy Tarreau
d55fe397a0 MINOR: buffer: remove bi_getblk() and bi_getblk_nc()
These ones were relying on bi_ptr() and are not used. They may be
reimplemented later in the channel if needed.
2018-07-19 16:23:40 +02:00
Willy Tarreau
aa7af7213d MINOR: buffer: replace calls to buffer_space_wraps() with b_space_wraps()
And remove the unused function.
2018-07-19 16:23:40 +02:00
Willy Tarreau
bcbd39370f MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
2018-07-19 16:23:40 +02:00
Willy Tarreau
c0a51c51b1 MINOR: buffer: remove buffer_slow_realign() and the swap_buffer allocation code
Since all call places can use the trash now, this is not needed anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
fd8d42f496 MEDIUM: channel: make channel_slow_realign() take a swap buffer
The few call places where it's used can use the trash as a swap buffer,
which is made for this exact purpose. This way we can rely on the
generic b_slow_realign() call.
2018-07-19 16:23:40 +02:00
Willy Tarreau
4cf1300e6a MINOR: channel/buffer: replace buffer_slow_realign() with channel_slow_realign() and b_slow_realign()
Where relevant, the channel version is used instead. The buffer version
was ported to be more generic and now takes a swap buffer and the output
byte count to know where to set the alignment point. The H2 mux still
uses buffer_slow_realign() with buf->o but it will change later.
2018-07-19 16:23:40 +02:00
Willy Tarreau
d5b343bf9e MINOR: channel/buffer: use c_realign_if_empty() instead of buffer_realign()
This patch removes buffer_realign() and replaces it with c_realign_if_empty()
instead.
2018-07-19 16:23:40 +02:00
Willy Tarreau
08d5ac8f27 MINOR: channel: add a few basic functions for the new buffer API
This adds :
  - c_orig()  : channel buffer's origin
  - c_size()  : channel buffer's size
  - c_wrap()  : channel buffer's wrapping location
  - c_data()  : channel buffer's total data count
  - c_room()  : room left in channel buffer's
  - c_empty() : true if channel buffer is empty
  - c_full()  : true if channel buffer is full

  - c_ptr()   : pointer to an offset relative to input data in the buffer
  - c_adv()   : advances the channel's buffer (bytes become part of output)
  - c_rew()   : rewinds the channel's buffer (output bytes not output anymore)
  - c_realign_if_empty() : realigns the buffer if it's empty

  - co_data() : # of output data
  - co_head() : beginning of output data
  - co_tail() : end of output data
  - ci_data() : # of input data
  - ci_head() : beginning of input data
  - ci_tail() : end of input data
  - ci_stop() : location after ci_tail()
  - ci_next() : pointer to next input byte

And for the ci_* / co_* functions above, the "__*" variants which disable
wrapping checks, and the "_ofs" variants which return an offset relative to
the buffer's origin instead.
2018-07-19 16:23:39 +02:00
Willy Tarreau
f17f19f1a7 MINOR: buffer: introduce b_realign_if_empty()
Many places deal with buffer realignment after data removal. The method
is always the same : if the buffer is empty, set its pointer to the origin.
Let's have a function for this so that we have less code to change with the
new API.
2018-07-19 16:23:39 +02:00
Olivier Houchard
a04e40d578 MINOR: buffer: Add b_set_data().
Add a new function that lets you set the amount of input in a buffer.
For now it extends/truncates b->i except if the total length is
below b->o in which case it clears i and adjusts o.
2018-07-19 16:23:39 +02:00
Olivier Houchard
09138ecc49 MINOR: buffer: Introduce b_sub(), b_add(), and bo_add()
Instead of doing b->i -= directly, introduce b_sub(), that does the job, to
make it easier to switch to the future API.

Also add b_add(), that increases b->i, instead of using it directly, and
bo_add(), that does increase b->o.
2018-07-19 16:23:39 +02:00
Willy Tarreau
bbc68df330 MINOR: buffer: add a few basic functions for the new API
Here's the list of newly introduced functions :

- b_data(), returning the total amount of data in the buffer (currently i+o)

- b_orig(), returning the origin of the storage area, that is, the place of
  position 0.

- b_wrap(), pointer to wrapping point (currently data+size)

- b_size(), returning the size of the buffer

- b_room(), returning the amount of bytes left available

- b_full(), returning true if the buffer is full, otherwise false

- b_stop(), pointer to end of data mark (currently p+i), used to compute
  distances or a stop pointer for a loop.

- b_peek(), this one will help make the transition to the new buffer model.
  It returns a pointer to a position in the buffer known from an offest
  relative to the beginning of the data in the buffer. Thus, we can replace
  the following occurrences :

     bo_ptr(b)     => b_peek(b, 0);
     bo_end(b)     => b_peek(b, b->o);
     bi_ptr(b)     => b_peek(b, b->o);
     bi_end(b)     => b_peek(b, b->i + b->o);
     b_ptr(b, ofs) => b_peek(b, b->o + ofs);

- b_head(), pointer to the beginning of data (currently bo_ptr())

- b_tail(), pointer to first free place (currently bi_ptr())

- b_next() / b_next_ofs(), pointer to the next byte, taking wrapping
  into account.

- b_dist(), returning the distance between two pointers belonging to a buffer

- b_reset(), which resets the buffer

- b_space_wraps(), indicating if the free space wraps around the buffer

- b_almost_full(), indicating if 3/4 or more of the buffer are used

Some of these are provided with the unchecked variants using the "__"
prefix, or with the "_ofs" suffix indicating they return a relative
position to the buffer's origin instead of a pointer.

Cc: Olivier Houchard <ohouchard@haproxy.com>
2018-07-19 16:23:39 +02:00
Willy Tarreau
506a29ac6e MINOR: buffer: switch buffer sizes and offsets to size_t
Passing unsigned ints everywhere is painful, and will cause some headache
later when we'll want to integrate better with struct ist which already
uses size_t. Let's switch buffers to use size_t instead.
2018-07-19 16:23:39 +02:00
Willy Tarreau
41806d1c52 MINOR: buffer: implement a new file for low-level buffer manipulation functions
The buffer code currently depends on pools and other stuff and is not
really autonomous anymore. The rewrite of the new API is an opportunity
to clean this up. This patch creates a new file (buf.h) which does not
depend on other elements and which will only contain what is needed to
perform the most basic buffer operations. The new API will be introduced
in this file and the conversion will be finished once buffer.h is empty.

The definition of struct buffer was moved to this new file, using more
explicity stdint types for the sizes and offsets.

Most new functions will be implemented in two variants :

  __b_something() : unchecked variant, no wrapping is expected
  b_something() : wrapping-checked variant

This way callers will be able to select which one to use depending on
the use cases.
2018-07-19 16:23:39 +02:00
Olivier Houchard
9ddaf794a8 MINOR: tasklet: Set process to NULL.
Some consumers expect the process to be NULL when a tasklet it created, so
do so.
2018-07-19 16:23:08 +02:00
Willy Tarreau
17b4aa1adc BUG/MINOR: ssl: properly ref-count the tls_keys entries
Commit 200b0fa ("MEDIUM: Add support for updating TLS ticket keys via
socket") introduced support for updating TLS ticket keys from the CLI,
but missed a small corner case : if multiple bind lines reference the
same tls_keys file, the same reference is used (as expected), but during
the clean shutdown, it will lead to a double free when destroying the
bind_conf contexts since none of the lines knows if others still use
it. The impact is very low however, mostly a core and/or a message in
the system's log upon old process termination.

Let's introduce some basic refcounting to prevent this from happening,
so that only the last bind_conf frees it.

Thanks to Janusz Dziemidowicz and Thierry Fournier for both reporting
the same issue with an easy reproducer.

This fix needs to be backported from 1.6 to 1.8.
2018-07-18 08:59:50 +02:00
Baptiste Assmann
8e2d9430c0 MINOR: dns: new DNS options to allow/prevent IP address duplication
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.

This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
2018-07-12 17:56:44 +02:00
Dave Chiluk
8618a6a5e2 MINOR: Some spelling cleanup in the comments.
Signed-off-by: Dave Chiluk <chiluk+haproxy@indeed.com>
2018-06-21 20:43:52 +02:00
Olivier Houchard
dcd6f3a597 MINOR: tasks: Make sure we correctly init and deinit a tasklet.
Up until now, a tasklet couldn't be free'd while it was in the list, it is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT,
instead of setting the pointers to NULL.
2018-06-14 18:57:13 +02:00
William Lallemand
6e1796e85d BUG/MINOR: signals: ha_sigmask macro for multithreading
The behavior of sigprocmask in an multithreaded environment is
undefined.

The new macro ha_sigmask() calls either pthreads_sigmask() or
sigprocmask() if haproxy was built with thread support or not.

This should be backported to 1.8.
2018-06-08 18:24:53 +02:00
Olivier Houchard
b1ca58b245 MINOR: tasks: Don't define rqueue if we're building without threads.
To make sure we don't inadvertently insert task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
2018-06-06 16:35:12 +02:00
Olivier Houchard
e13ab8b3c6 BUG/MEDIUM: tasks: Use the local runqueue when building without threads.
When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in prcoess_runnable_tasks().
This should fix haproxy when built without threads.
2018-06-06 16:34:52 +02:00
Willy Tarreau
10d81b8757 MINOR: applet: assign the same nice value to a new appctx as its owner task
When an applet is created, let's assign it the same nice value as the task
of the stream which owns it. It ensures that fairness is properly propagated
to applets, and that the CLI can regain a low latency behaviour again. Huge
differences have been seen under extreme loads, with the CLI being called
every 200 microseconds instead of 11 milliseconds.
2018-06-05 11:18:21 +02:00
David Carlier
caa8a37ffe MINOR: task: Fix a compiler warning by adding a cast.
When calling HA_ATOMIC_CAS with a pointer as the target, the compiler
expects a pointer as the new value, so give it one by casting 0x1 to
(void *).
2018-06-04 17:43:12 +02:00
Thierry FOURNIER
9d5422a4b7 MINOR: task/notification: Is notifications registered ?
This function returns true is some notifications are registered.

This function is usefull for the following patch

   BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock

It should be backported in 1.6, 1.7 and 1.8
2018-05-31 10:58:41 +02:00
Olivier Houchard
09eeb7684d BUG/MEDIUM: tasks: Don't forget to increase/decrease tasks_run_queue.
Don't forget to increase tasks_run_queue when we're adding a task to the
tasklet list, and to decrease it when we remove a task from a runqueue,
or its value won't be accurate, and could lead to tasks not being executed
when put in the global run queue.

1.9-dev only, no backport is needed.
2018-05-28 15:20:55 +02:00
Tim Duesterhus
3fd1973d37 MINOR: http: Log warning if (add|set)-header fails
This patch adds a warning if an http-(request|reponse) (add|set)-header
rewrite fails to change the respective header in a request or response.

This usually happens when tune.maxrewrite is not sufficient to hold all
the headers that should be added.
2018-05-28 14:53:59 +02:00
Olivier Houchard
673867c357 MAJOR: applets: Use tasks, instead of rolling our own scheduler.
There's no real reason to have a specific scheduler for applets anymore, so
nuke it and just use tasks. This comes with some benefits, the first one
being that applets cannot induce high latencies anymore since they share
nice values with other tasks. Later it will be possible to configure the
applets' nice value. The second benefit is that the applet scheduler was
not very thread-friendly, having a big lock around it in prevision of this
change. Thus applet-intensive workloads should now scale much better with
threads.

Some more improvement is possible now : some applets also use a task to
handle timers and timeouts. These ones could now be simplified to use only
one task.
2018-05-26 20:03:30 +02:00
Olivier Houchard
1599b80360 MINOR: tasks: Make the number of tasks to run at once configurable.
Instead of hardcoding 200, make the number of tasks to be run configurable
using tune.runqueue-depth. 200 is still the default.
2018-05-26 20:03:24 +02:00
Olivier Houchard
b0bdae7b88 MAJOR: tasks: Introduce tasklets.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.

For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidently be confused as not in the queue.

Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
2018-05-26 20:03:19 +02:00
Olivier Houchard
f6e6dc12cd MAJOR: tasks: Create a per-thread runqueue.
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.

The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
2018-05-26 19:27:29 +02:00
Olivier Houchard
9f6af33222 MINOR: tasks: Change the task API so that the callback takes 3 arguments.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
2018-05-26 19:23:57 +02:00
Willy Tarreau
0cd82e883e BUG/BUILD: threads: unbreak build without threads
A few users reported that building without threads was accidently broken
after commit 6b96f72 ("BUG/MEDIUM: pollers: Use a global list for fd
shared between threads.") due to all_threads_mask not being defined.
It's OK to set it to zero as other code parts do when threads are
enabled but only one thread is used.

This needs to be backported to 1.8.
2018-05-23 19:54:43 +02:00
Thierry Fournier
d5b073cf1f MINOR: lua: Improve error message
The function hlua_ctx_resume return less text message and more error
code. These error code allow the caller to return appropriate
message to the user.
2018-05-22 18:57:46 +02:00
Christopher Faulet
68db0235fd CLEANUP: spoe: Remove unused variables the agent structure
applets_act and applets_idle were used for debugging purpose. Now, these values
are part of the agent's counters.
2018-05-18 15:04:46 +02:00
Olivier Houchard
cb92f5cae4 MINOR: pollers: move polled_mask outside of struct fdtab.
The polled_mask is only used in the pollers, and removing it from the
struct fdtab makes it fit in one 64B cacheline again, on a 64bits machine,
so make it a separate array.
2018-05-06 06:27:34 +02:00
Olivier Houchard
6b96f7289c BUG/MEDIUM: pollers: Use a global list for fd shared between threads.
With the old model, any fd shared by multiple threads, such as listeners
or dns sockets, would only be updated on one threads, so that could lead
to missed event, or spurious wakeups.
To avoid this, add a global list for fd that are shared, using the same
implementation as the fd cache, and only remove entries from this list
when every thread as updated its poller.

[wt: this will need to be backported to 1.8 but differently so this patch
 must not be backported as-is]
2018-05-06 06:27:09 +02:00
Olivier Houchard
6a2cf8752c MINOR: fd: Make the lockless fd list work with multiple lists.
Modify fd_add_to_fd_list() and fd_rm_from_fd_list() so that they take an
offset in the fdtab to the list entry, instead of hardcoding the fd cache,
so we can use them with other lists.
2018-05-06 06:25:49 +02:00
Olivier Houchard
9b36cb4a41 BUG/MEDIUM: task: Don't free a task that is about to be run.
While running a task, we may try to delete and free a task that is about to
be run, because it's part of the local tasks list, or because rq_next points
to it.
So flag any task that is in the local tasks list to be deleted, instead of
run, by setting t->process to NULL, and re-make rq_next a global,
thread-local variable, that is modified if we attempt to delete that task.

Many thanks to PiBa-NL for reporting this and analysing the problem.

This should be backported to 1.8.
2018-05-04 20:11:04 +02:00
Willy Tarreau
760e81d356 MINOR: backend: implement random-based load balancing
For large farms where servers are regularly added or removed, picking
a random server from the pool can ensure faster load transitions than
when using round-robin and less traffic surges on the newly added
servers than when using leastconn.

This commit introduces "balance random". It internally uses a random as
the key to the consistent hashing mechanism, thus all features available
in consistent hashing such as weights and bounded load via hash-balance-
factor are usable. It is extremely convenient because one common concern
when using random is what happens when a server is hammered a bit too
much. Here that can trivially be avoided, like in the configuration below :

    backend bk0
        balance random
        hash-balance-factor 110
        server-template s 1-100 127.0.0.1:8000 check inter 1s

Note that while "balance random" internally relies on a hash algorithm,
it holds the same properties as round-robin and as such is compatible with
reusing an existing server connection with "option prefer-last-server".
2018-05-03 07:20:40 +02:00
Tim Duesterhus
e2b10bf491 MINOR: http: Add support for 421 Misdirected Request
This makes haproxy aware of HTTP 421 Misdirected Request, which
is defined in RFC 7540, section 9.1.2.
2018-04-28 07:03:39 +02:00
Aurlien Nephtali
abbf607105 MEDIUM: cli: Add payload support
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.

Per-command support will need to be added to take advantage of this
feature.

Signed-off-by: Aurlien Nephtali <aurelien.nephtali@corp.ovh.com>
2018-04-26 14:19:33 +02:00
Willy Tarreau
174b06a572 MINOR: h2: detect presence of CONNECT and/or content-length
We'll need this in order to support uploading chunks. The h2 to h1
converter checks for the presence of the content-length header field
as well as the CONNECT method and returns these information to the
caller. The caller indicates whether or not a body is detected for
the message (presence of END_STREAM or not). No transfer-encoding
header is emitted yet.
2018-04-26 10:15:14 +02:00
Olivier Houchard
302f9ef055 BUG/MEDIUM: connection: Make sure we have a mux before calling detach().
In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.

This should be backported to 1.8.
2018-04-13 16:02:21 +02:00
Christopher Faulet
48aa13f286 BUG/MEDIUM: threads: Fix the max/min calculation because of name clashes
With gcc < 4.7, when HAProxy is built with threads, the macros
HA_ATOMIC_CAS/XCHG/STORE relies on the legacy __sync builtins. These macros
are slightly complicated than the versions relying on the '_atomic'
builtins. Internally, some local variables are defined, prefixed with '__' to
avoid name clashes with the caller.

On the other hand, the macros HA_ATOMIC_UPDATE_MIN/MAX call HA_ATOMIC_CAS. Some
local variables are also definied in these macros, following the same naming
rule as below. The problem is that '__new' variable is used in
HA_ATOMIC_MIN/_MAX and in HA_ATOMIC_CAS. Obviously, the behaviour is undefined
because '__new' in HA_ATOMIC_CAS is left uninitialized. Unfortunatly gcc fails
to detect this error.

To fix the problem, all internal variables to macros are now suffixed with name
of the macros to avoid clashes (for instance, '__new_cas' in HA_ATOMIC_CAS).

This patch must be backported in 1.8.
2018-04-10 11:07:56 +02:00
Christopher Faulet
caf2feca62 MINOR: spoe: Add counters to log info about SPOE agents
In addition to metrics about time spent in the SPOE, following counters have
been added:

  * applets : number of SPOE applets.
  * idles : number of idle applets.
  * nb_sending : number of streams waiting to send data.
  * nb_waiting : number of streams waiting for a ack.
  * nb_processed : number of events/groups processed by the SPOE (from the
                   stream point of view).
  * nb_errors : number of errors during the processing (from the stream point of
                view).

Log messages has been updated to report these counters. Following pattern has
been added at the end of the log message:

    ... <idles>/<applets> <nb_sending>/<nb_waiting> <nb_error>/<nb_processed>
2018-04-05 15:13:54 +02:00
Christopher Faulet
7250b8fb5c MINOR: spoe: Add loggers dedicated to the SPOE agent
Now it is possible to configure a logger in a spoe-agent section using a "log"
line, as for a proxy. "no log", "log global" and "log <address> ..." syntaxes
are supported.
2018-04-05 15:13:54 +02:00
Christopher Faulet
28ac099907 MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries
With "log global" line, the global list of loggers are copied into the proxy's
struct. The list coming from the default section is also copied when a frontend
or a backend section is parsed. So it is possible to have duplicate entries in
the proxy's list. For instance, with this following config, all messages will be
logged twice:

    global
        log 127.0.0.1 local0 debug
        daemon

    defaults
        mode   http
        log    global
        option httplog

    frontend front-http
        log global
        bind *:8888
        default_backend back-http

    backend back-http
        server www 127.0.0.1:8000
2018-04-05 15:13:54 +02:00
Christopher Faulet
4b0b79dd56 MINOR: log: move 'log' keyword parsing in dedicated function
Now, the function parse_logsrv should be used to parse a "log" line. This
function will update the list of loggers passed in argument. It can release all
log servers when "no log" line was parsed (by the caller) or it can parse "log
global" or "log <address> ... " lines. It takes care of checking the caller
context (global or not) to prohibit "log global" usage in the global section.
2018-04-05 15:13:54 +02:00
Christopher Faulet
36bda1cd4a MINOR: spoe: Add options to store processing times in variables
"set-process-time" and "set-total-time" options have been added to store
processing times in the transaction scope, at each event and group processing,
the current one and the total one. So it is possible to get them.

TODO: documentation
2018-04-05 15:13:54 +02:00
Christopher Faulet
b2dd1e034c MINOR: spoe: Add metrics in to know time spent in the SPOE
Following metrics are added for each event or group of messages processed in the
SPOE:

  * processing time: the delay to process the event or the group. From the
                     stream point of view, it is the latency added by the SPOE
                     processing.
  * request time : It is the encoding time. It includes ACLs processing, if
                   any. For fragmented frames, it is the sum of all fragments.
  * queue time : the delay before the request gets out the sending queue. For
                 fragmented frames, it is the sum of all fragments.
  * waiting time: the delay before the reponse is received. No fragmentation
                  supported here.
  * response time: the delay to process the response. No fragmentation supported
                   here.
  * total time: (unused for now). It is the sum of all events or groups
                processed by the SPOE for a specific threads.

Log messages has been updated. Before, only errors was logged (status_code !=
0). Now every processing is logged, following this format:

  SPOE: [AGENT] <TYPE:NAME> sid=STREAM-ID st=STATUC-CODE reqT/qT/wT/resT/pT

where:

  AGENT              is the agent name
  TYPE               is EVENT of GROUP
  NAME               is the event or the group name
  STREAM-ID          is an integer, the unique id of the stream
  STATUS_CODE        is the processing's status code
  reqT/qT/wT/resT/pT are delays descrive above

For all these delays, -1 means the processing was interrupted before the end. So
-1 for the queue time means the request was never dequeued. For fragmented
frames it is harder to know when the interruption happened.

For now, messages are logged using the same logger than the backend of the
stream which initiated the request.
2018-04-05 15:13:53 +02:00
Olivier Houchard
8ef1a6b0d8 BUG/MINOR: fd: Don't clear the update_mask in fd_insert.
Clearing the update_mask bit in fd_insert may lead to duplicate insertion
of fd in fd_updt, that could lead to a write past the end of the array.
Instead, make sure the update_mask bit is cleared by the pollers no matter
what.

This should be backported to 1.8.
[wt: warning: 1.8 doesn't have the lockless fdcache changes and will
 require some careful changes in the pollers]
2018-04-03 19:38:15 +02:00
Willy Tarreau
b011d8f4c4 MINOR: mux: add a "show_fd" function to dump debugging information for "show fd"
This function will be called from the CLI's "show fd" command to append some
extra mux-specific information that only the mux handler can decode. This is
supposed to help collect various hints about what is happening when facing
certain anomalies.
2018-03-30 14:41:19 +02:00
Willy Tarreau
4037a3f904 MINOR: cli/threads: make "show fd" report thread_sync_io_handler instead of "unknown"
The output was confusing when the sync point's dummy handler was shown.

This patch should be backported to 1.8 to help with troubleshooting.
2018-03-28 18:06:47 +02:00
Emmanuel Hocdet
4952985b71 REORG: compact "struct server"
Move use_ssl (bool value) in "struct server" hole.
2018-03-21 05:04:01 +01:00
Emmanuel Hocdet
4399c75f6c MINOR: proxy-v2-options: add crc32c
This patch add option crc32c (PP2_TYPE_CRC32C) to proxy protocol v2.
It compute the checksum of proxy protocol v2 header as describe in
"doc/proxy-protocol.txt".
2018-03-21 05:04:01 +01:00
Emmanuel Hocdet
6afd898988 MINOR: hash: add new function hash_crc32c
This function will be used to perform CRC32c computations. This is
required to compute proxy protocol v2 CRC32C tlv (PP2_TYPE_CRC32C).
2018-03-21 05:04:01 +01:00
Willy Tarreau
26fb5d8449 BUG/MEDIUM: fd/threads: ensure the fdcache_mask always reflects the cache contents
Commit 4815c8c ("MAJOR: fd/threads: Make the fdcache mostly lockless.")
made the fd cache lockless, but after a few iterations, a subtle part was
lost, consisting in setting the bit on the fd_cache_mask immediately when
adding an event. Now it was done only when the cache started to process
events, but the problem it causes is that fd_cache_mask isn't reliable
anymore as an indicator of presence of events to be processed with no
delay outside of fd_process_cached_events(). This results in some spurious
delays when processing inter-thread wakeups between tasks. Just restoring
the flag when the event is added is enough to fix the problem.

Kudos to Christopher for spotting this one!

No backport is needed as this is only in the development version.
2018-03-20 19:14:24 +01:00
Christopher Faulet
5cd4bbd7ab BUG/MAJOR: threads/queue: Fix thread-safety issues on the queues management
The management of the servers and the proxies queues was not thread-safe at
all. First, the accesses to <strm>->pend_pos were not protected. So it was
possible to release it on a thread (for instance because the stream is released)
and to use it in same time on another one (because we redispatch pending
connections for a server). Then, the accesses to stream's information (flags and
target) from anywhere is forbidden. To be safe, The stream's state must always
be updated in the context of process_stream.

So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream reponsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its <pend_pos> field.

However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded one. The problem is that,
when we try to dequeue pending connections, we process it from the older one to
the newer one independently to the thread's affinity. So we need to wait the
other threads' wakeup to really process them. If threads are blocked in the
poller, this will add a significant latency. This problem happens when maxconn
values are very low.

This patch must be backported in 1.8.
2018-03-19 10:03:06 +01:00
Christopher Faulet
510c0d67ef BUG/MEDIUM: threads/unix: Fix a deadlock when a listener is temporarily disabled
When a listener is temporarily disabled, we start by locking it and then we call
.pause callback of the underlying protocol (tcp/unix). For TCP listeners, this
is not a problem. But listeners bound on an unix socket are in fact closed
instead. So .pause callback relies on unbind_listener function to do its job.

Unfortunatly, unbind_listener hold the listener's lock and then call an internal
function to unbind it. So, there is a deadlock here. This happens during a
reload. To fix the problemn, the function do_unbind_listener, which is lockless,
is now exported and is called when a listener bound on an unix socket is
temporarily disabled.

This patch must be backported in 1.8.
2018-03-16 11:19:07 +01:00
Willy Tarreau
c41b3e8dff DOC: buffers: clarify the purpose of the <from> pointer in offer_buffers()
This one is only used to compare pointers and NULL is permitted though
this is far from being clear.
2018-03-08 18:33:48 +01:00
Emmanuel Hocdet
253c3b7516 MINOR: connection: add proxy-v2-options authority
This patch add option PP2_TYPE_AUTHORITY to proxy protocol v2 when a TLS
connection was negotiated. In this case, authority corresponds to the sni.
2018-03-01 11:38:32 +01:00
Emmanuel Hocdet
fa8d0f1875 MINOR: connection: add proxy-v2-options ssl-cipher,cert-sig,cert-key
This patch implement proxy protocol v2 options related to crypto information:
ssl-cipher (PP2_SUBTYPE_SSL_CIPHER), cert-sig (PP2_SUBTYPE_SSL_SIG_ALG) and
cert-key (PP2_SUBTYPE_SSL_KEY_ALG).
2018-03-01 11:38:28 +01:00
Emmanuel Hocdet
283e004a85 MINOR: ssl: add ssl_sock_get_cert_sig function
ssl_sock_get_cert_sig can be used to report cert signature short name
to log and ppv2 (RSA-SHA256).
2018-03-01 11:34:08 +01:00
Emmanuel Hocdet
96b7834e98 MINOR: ssl: add ssl_sock_get_pkey_algo function
ssl_sock_get_pkey_algo can be used to report pkey algorithm to log
and ppv2 (RSA2048, EC256,...).
Extract pkey information is not free in ssl api (lock/alloc/free):
haproxy can use the pkey information computed in load_certificate.
Store and use this information in a SSL ex_data when available,
compute it if not (SSL multicert bundled and generated cert).
2018-03-01 11:34:05 +01:00
Emmanuel Hocdet
ddc090bc55 MINOR: ssl: extract full pkey info in load_certificate
Private key information is used in switchctx to implement native multicert
selection (ecdsa/rsa/anonymous). This patch extract and store full pkey
information: dsa type and pkey size in bits. This can be used for switchctx
or to report pkey informations in ppv2 and log.
2018-03-01 11:33:18 +01:00
Christopher Faulet
ca6ef50661 BUG/MEDIUM: buffer: Fix the wrapping case in bi_putblk
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skup data copied during
the first memcpy.

This patch must be backported to 1.8.
2018-02-27 15:45:03 +01:00
Christopher Faulet
b2b279464c BUG/MEDIUM: buffer: Fix the wrapping case in bo_putblk
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skip data copied during
the first memcpy.

This patch must be backported to 1.8, 1.7, 1.6 and 1.5.
2018-02-27 15:45:03 +01:00
Yves Lafon
95317289e9 MINOR: stats: display the number of threads in the statistics.
Add the nbthread global variable to the output, matching nbproc.

This may be backported to 1.8
2018-02-26 11:53:46 +01:00
Willy Tarreau
364d745106 MINOR: debug/pools: make DEBUG_UAF also detect underflows
Since we use padding before the allocated page, it's trivial to place
the allocated address there and see if it gets mangled once we release
it.

This may be backported to stable releases already using DEBUG_UAF.
2018-02-22 14:18:45 +01:00
Willy Tarreau
5a9cce4653 BUG/MINOR: debug/pools: properly handle out-of-memory when building with DEBUG_UAF
Commit 158fa75 ("MINOR: pools: implement DEBUG_UAF to detect use after free")
implemented pool use-after-free detection, but the mmap() return value isn't
properly checked, preventing the call to pool_alloc_area() from returning
NULL. So on out-of-memory a mangled pointer is returned, causing a crash on
the pool_alloc() site instead of forcing a GC. It doesn't affect regular
operations however, just complicates complex bug investigations.

This fix should be backported to 1.8 and to 1.7.
2018-02-22 14:18:45 +01:00
Willy Tarreau
f161d0f51e BUG/MINOR: pools/threads: don't ignore DEBUG_UAF on double-word CAS capable archs
Since commit cf975d4 ("MINOR: pools/threads: Implement lockless memory
pools."), we support lockless pools. However the parts dedicated to
detecting use-after-free are not present in this part, making DEBUG_UAF
useless in this situation.

The present patch sets a new define CONFIG_HAP_LOCKLESS_POOLS when such
a compatible architecture is detected, and when pool debugging is not
requested, then makes use of this everywhere in pools and buffers
functions. This way enabling DEBUG_UAF will automatically disable the
lockless version.

No backport is needed as this is purely 1.9-dev.
2018-02-22 14:18:45 +01:00
Tim Duesterhus
5e64286bab CLEANUP: standard: Fix typo in IPv6 mask example
IPv6 addresses with two double colons are invalid.

This typo was introduced in commit 471851713a.
2018-02-21 05:07:35 +01:00
Tim Duesterhus
05f6a43bd4 CLEANUP: pools: Remove unused end label in memory.h
This removes the end label from memory.h.

The labels are unused as of cf975d46bc
which is unreleased (and incidentally the first commit containing
those labels, thus they never have been used).
2018-02-20 08:30:13 +01:00
Christopher Faulet
16f45c87d5 BUG/MINOR: ssl/threads: Make management of the TLS ticket keys files thread-safe
A TLS ticket keys file can be updated on the CLI and used in same time. So we
need to protect it to be sure all accesses are thread-safe. Because updates are
infrequent, a R/W lock has been used.

This patch must be backported in 1.8
2018-02-19 14:15:38 +01:00
David Carlier
4ee76d0281 BUILD/MINOR: memory: stdint is needed for uintptr_t
stdint.h is needed on OpenBSD for uintptr_t type.
2018-02-19 07:58:50 +01:00
Willy Tarreau
41ccb194d1 BUG/MEDIUM: threads: fix the double CAS implementation for ARMv7
Commit f61f0cb ("MINOR: threads: Introduce double-width CAS on x86_64
and arm.") introduced the double CAS. But the ARMv7 version is bogus,
it uses the value of the pointers instead of dereferencing them. When
lucky, it simply doesn't build due to impossible registers combinations.
Otherwise it will immediately crash at run time when facing traffic.

No backport is needed, this bug was introduced in 1.9-dev.
2018-02-14 14:16:28 +01:00
Willy Tarreau
4cc67a2782 MINOR: fd: move the fd_{add_to,rm_from}_fdlist functions to fd.c
There's not point inlining these huge functions, better move them to real
functions in fd.c.
2018-02-05 17:19:40 +01:00
Willy Tarreau
4d84186337 MEDIUM: fd: make updt_fd_polling() use atomics
It only needed a test-and-set and an atomic increment so we can take it
out of the fd lock now.
2018-02-05 16:02:22 +01:00
Willy Tarreau
1b76a6d1a6 CLEANUP: fd: remove the now unused fd_compute_new_polled_status() function
It's not used anymore since the new state is calculated on the fly
during every update. Let's remove this function.
2018-02-05 16:02:22 +01:00
Willy Tarreau
7ac0e35f23 MAJOR: fd: compute the new fd polling state out of the fd lock
Each fd_{may|cant|stop|want}_{recv|send} function sets or resets a
single bit at once, then recomputes the need for updates, and then
the new cache state. Later, pollers will compute the new polling
state based on the resulting operations here. In fact the conditions
are so simple that they can be performed by a single "if", or sometimes
even optimized away.

This means that in practice a simple compare-and-swap operation if often
enough to set the new value inluding the new polling state, and that only
the cache and fdupdt have to be performed under the lock. Better, for the
most common operations (fd_may_{recv,send}, used by the pollers), a simple
atomic OR is needed.

This patch does this for the fd_* functions above and it doesn't yet
remove the now useless fd_compute_new_polling_status() because it's still
used by other pollers. A pure connection rate test shows a 1% performance
increase.
2018-02-05 16:02:22 +01:00
Olivier Houchard
1256836ebf MEDIUM: fd/threads: Make sure we don't miss a fd cache entry.
An fd cache entry might be removed and added at the end of the list, while
another thread is parsing it, if that happens, we may miss fd cache entries,
to avoid that, add a new field in the struct fdtab, "added_mask", which
contains a mask for potentially affected threads, if it is set, the
corresponding thread will set its bit in fd_cache_mask, to avoid waiting in
poll while it may have more work to do.
2018-02-05 16:02:22 +01:00
Olivier Houchard
4815c8cbfe MAJOR: fd/threads: Make the fdcache mostly lockless.
Create a local, per-thread, fdcache, for file descriptors that only belongs
to one thread, and make the global fd cache mostly lockless, as we can get
a lot of contention on the fd cache lock.
2018-02-05 16:02:22 +01:00
Olivier Houchard
cf975d46bc MINOR: pools/threads: Implement lockless memory pools.
On CPUs that support a double-width compare-and-swap, implement lockless
pools.
2018-02-05 16:02:22 +01:00
Willy Tarreau
5266b3e12d MINOR: threads: add test and set/reset operations
This just adds a set of naive bts/btr operations based on OR/AND. Later
it could rely on pl_bts/btr to use arch-specific versions if needed.
2018-02-05 14:24:50 +01:00
Olivier Houchard
f61f0cb95f MINOR: threads: Introduce double-width CAS on x86_64 and arm.
Introduce double-width compare-and-swap on arches that support it, right now
x86_64, arm, and aarch64.
Also introduce functions to do memory barriers.
2018-02-05 14:24:50 +01:00
Olivier Houchard
928fbfa8b7 MINOR: compiler: introduce offsetoff().
Add a offsetof() macro, if it is no there already.
2018-02-05 14:24:50 +01:00
Olivier Houchard
6fa63d9852 MINOR: early data: Don't rely on CO_FL_EARLY_DATA to wake up streams.
Instead of looking for CO_FL_EARLY_DATA to know if we have to try to wake
up a stream, because it is waiting for a SSL handshake, instead add a new
conn_stream flag, CS_FL_WAIT_FOR_HS. This way we don't have to rely on
CO_FL_EARLY_DATA, and we will only wake streams that are actually waiting.
2018-02-05 14:24:50 +01:00
Christopher Faulet
b077cdc012 MEDIUM: spoe: Use an ebtree to manage idle applets
Instead of using a list of applets with idle ones in front, we now use an
ebtree. Aapplets in the tree are idle by definition. And the key is the applet's
weight. When a new frame is queued, the first idle applet (with the lowest
weight) is woken up and its weight is increased by one. And when an applet sends
a frame to a SPOA, its weight is decremented by one.

This is empirical, but it should avoid to overuse a very few number of applets
and increase the balancing between idle applets.
2018-02-02 16:00:32 +01:00
Christopher Faulet
8f82b203d5 MINOR: spoe: Count the number of frames waiting for an ack for each applet
So it is easier to respect the max_fpa value. This is no more the maximum frames
processed by an applet at each loop but the maximum frames waiting for an ack
for a specific applet.

The function spoe_handle_processing_appctx has been rewritten accordingly.
2018-02-02 16:00:32 +01:00
Christopher Faulet
6f9ea4f87b MINOR: spoe: Replace sending_rate by a frequency counter
sending_rate was a counter used to evaluate the SPOE capacity to process
frames. Because it was not really accurrate, it has been replaced by a frequency
counter representing the number of frames handled by the SPOE per second. We
just check this counter is higher than the number of streams waiting for a
reply. If not, a new applet is created.
2018-02-02 16:00:32 +01:00
Christopher Faulet
fce747bbaa MINOR: spoe: Always link a SPOE context with the applet processing it
This was already done for fragmented frames. Now, this is true for all
frames.
2018-02-02 16:00:32 +01:00
Christopher Faulet
420977903b MINOR: spoe: Remove check on min_applets number when a SPOE context is queued
The calculation of a minimal number of active applets was really empirical and
finally useless. On heavy load, there are always many active applets (most of
time, more than the minimal required) and when the load is low, there is no
reason to keep unused applets opened.

Because of this change, the flag SPOE_APPCTX_FL_PERSIST is now unused. So it has
been removed.
2018-02-02 16:00:32 +01:00
Frdric Lcaille
6778b27542 MINOR: stick-tables: Adds support for new "gpc1" and "gpc1_rate" counters.
Implement exactly the same code as this has been done for "gpc0" and "gpc0_rate"
counters.
2018-01-31 09:40:05 +01:00
Christopher Faulet
f51bac2ba8 BUG/MINOR: threads: Update labels array because of changes in lock_label enum
Recent changes to the enum were not synchronized with the lock debugging
code. Now we use a switch/case instead of an array so that the compiler
throws a warning if there is any inconsistency.

To be backported to 1.8 (at least to add the START entry).
2018-01-30 14:35:24 +01:00
Willy Tarreau
a9786b6f04 MINOR: fd: pass the iocb and owner to fd_insert()
fd_insert() is currently called just after setting the owner and iocb,
but proceeding like this prevents the operation from being atomic and
requires a lock to protect the maxfd computation in another thread from
meeting an incompletely initialized FD and computing a wrong maxfd.
Fortunately for now all fdtab[].owner are set before calling fd_insert(),
and the first lock in fd_insert() enforces a memory barrier so the code
is safe.

This patch moves the initialization of the owner and iocb to fd_insert()
so that the function will be able to properly arrange its operations and
remain safe even when modified to become lockless. There's no other change
beyond the internal API.
2018-01-29 16:07:25 +01:00
Willy Tarreau
82b37d74d2 MEDIUM: fd: use atomic ops for hap_fd_{clr,set} and remove poll_lock
Now that we can use atomic ops to set/clear an fd occurrence in an
fd_set, we don't need the poll_lock anymore. Let's remove it.
2018-01-29 16:03:15 +01:00
Willy Tarreau
322e6c7e73 MINOR: fd: move the hap_fd_{clr,set,isset} functions to fd.h
These functions were created for poll() in 1.5-dev18 (commit 80da05a4) to
replace the previous FD_{CLR,SET,ISSET} that were shared with select()
because some libcs enforce a limit on FD_SET. But FD_SET doesn't seem
to be universally MT-safe, requiring locks in the select() code that
are not needed in the poll code. So let's move back to the initial
situation where we used to only use bit fields, since that has been in
use since day one without a problem, and let's use these hap_fd_*
functions instead of FD_*.

This patch only moves the functions to fd.h and revives hap_fd_isset()
that was recently removed to kill an "unused" warning.
2018-01-29 16:03:15 +01:00
Willy Tarreau
745c60eac6 CLEANUP: fd: remove the unused "new" field
This field has been unused since 1.6, it's only updated and never
tested. Let's remove it.
2018-01-29 16:02:59 +01:00
Willy Tarreau
f2b5c99b4c CLEANUP: fd/threads: remove the now unused fdtab_lock
It was only used to protect maxfd computation and is not needed
anymore.
2018-01-29 15:25:35 +01:00
Willy Tarreau
173d9951e2 MEDIUM: polling: start to move maxfd computation to the pollers
Since only select() and poll() still make use of maxfd, let's move
its computation right there in the pollers themselves, and only
during each fd update pass. The computation doesn't need a lock
anymore, only a few atomic ops. It will be accurate, be done much
less often and will not be required anymore in the FD's fast patch.

This provides a small performance increase of about 1% in connection
rate when using epoll since we get rid of this computation which was
performed under a lock.
2018-01-29 15:22:57 +01:00
Frdric Lcaille
a41d531e4e MINOR: config: Enable tracking of up to MAX_SESS_STKCTR stick counters.
This patch really adds support for up to MAX_SESS_STKCTR stick counters.
2018-01-29 13:53:56 +01:00
Tim Duesterhus
471851713a MINOR: standard: Add str2mask6 function
This new function mirrors the str2mask() function for IPv4 addresses.

This commit is in preparation to support ARGT_MSK6.
2018-01-25 22:25:40 +01:00
Tim Duesterhus
92bb034209 CLEANUP: Fix typo in ARGT_MSK6 comment
The incorrect comment was introduced in commit:
2ac5718dbd

v1.5-dev9 is the first tag containing this comment, the fix
should be backported to haproxy 1.5 and newer.
2018-01-25 22:25:40 +01:00
Willy Tarreau
1605c7ae61 BUG/MEDIUM: threads/mworker: fix a race on startup
Marc Fournier reported an interesting case when using threads with the
master-worker mode : sometimes, a listener would have its FD closed
during startup. Sometimes it could even be health checks seeing this.

What happens is that after the threads are created, and the pollers
enabled on each threads, the master-worker pipe is registered, and at
the same time a close() is performed on the write side of this pipe
since the children must not use it.

But since this is replicated in every thread, what happens is that the
first thread closes the pipe, thus releases the FD, and the next thread
starting a listener in parallel gets this FD reassigned. Then another
thread closes the FD again, which this time corresponds to the listener.
It can also happen with the health check sockets if they're started
early enough.

This patch splits the mworker_pipe_register() function in two, so that
the close() of the write side of the FD is performed very early after the
fork() and long before threads are created (we don't need to delay it
anyway). Only the pipe registration is done in the threaded code since
it is important that the pollers are properly allocated for this.
The mworker_pipe_register() function now takes care of registering the
pipe only once, and this is guaranteed by a new surrounding lock.

The call to protocol_enable_all() looks fragile in theory since it
scans the list of proxies and their listeners, though in practice
all threads scan the same list and take the same locks for each
listener so it's not possible that any of them escapes the process
and finishes before all listeners are started. And the operation is
idempotent.

This fix must be backported to 1.8. Thanks to Marc for providing very
detailed traces clearly showing the problem.
2018-01-23 19:18:57 +01:00
Willy Tarreau
c9c8378c2b MINOR: fd: add a bitmask to indicate that an FD is known by the poller
Some pollers like epoll() need to know if the fd is already known or
not in order to compute the operation to perform (add, mod, del). For
now this is performed based on the difference between the previous FD
state and the new state but this will not be usable anymore once threads
become responsible for their own polling.

Here we come with a different approach : a bitmask is stored with the
fd to indicate which pollers already know it, and the pollers will be
able to simply perform the add/mod/del operations based on this bit
combined with the new state.

This patch only adds the bitmask declaration and initialization, it
is it not yet used. It will be needed by the next two fixes and will
need to be backported to 1.8.
2018-01-23 15:42:57 +01:00
Willy Tarreau
ebc78d78a2 BUG/MEDIUM: fd: maintain a per-thread update mask
Since the fd update tables are per-thread, we need to have a bit per
thread to indicate whether an update exists, otherwise this can lead
to lost update events every time multiple threads want to update the
same FD. In practice *for now*, it only happens at start time when
listeners are enabled and ask for polling after facing their first
EAGAIN. But since the pollers are still shared, a lost event is still
recovered by a neighbor thread. This will not reliably work anymore
with per-thread pollers, where it has been observed a few times on
startup that a single-threaded listener would not always accept
incoming connections upon startup.

It's worth noting that during this code review it appeared that the
"new" flag in the fdtab isn't used anymore.

This fix should be backported to 1.8.
2018-01-23 15:41:19 +01:00
Christopher Faulet
69553fe62c MINOR: threads/fd: Use a bitfield to know if there are FDs for a thread in the FD cache
A bitfield has been added to know if there are some FDs processable by a
specific thread in the FD cache. When a FD is inserted in the FD cache, the bits
corresponding to its thread_mask are set. On each thread, the bitfield is
updated when the FD cache is processed. If there is no FD processed, the thread
is removed from the bitfield by unsetting its tid_bit.

Note that this bitfield is updated but not checked in
fd_process_cached_events. So, when this function is called, the FDs cache is
always processed.

[wt: should be backported to 1.8 as it will help fix a design limitation]
2018-01-23 15:39:10 +01:00
Willy Tarreau
d80cb4ee13 MINOR: global: add some global activity counters to help debugging
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.

To backport to 1.8 to help collect more detailed bug reports.
2018-01-23 15:38:33 +01:00
Willy Tarreau
421f02e738 MINOR: threads: add a MAX_THREADS define instead of LONGBITS
This one allows not to inflate some structures when threads are
disabled. Now struct global is 1.4 kB instead of 33 kB.

Should be backported to 1.8 for ease of backporting of upcoming
patches.
2018-01-23 15:28:20 +01:00
Willy Tarreau
f4571a027f MINOR: global/threads: move cpu_map at the end of the global struct
The "thread" part is 32kB long, better move it at the end of the
structure since it's only used during initialization, to keep the
rest grouped together.

Should be backported to 1.8 to ease backporting of upcoming patches,
no functional impact.
2018-01-23 15:27:52 +01:00
Christopher Faulet
336d3ef0e7 MINOR: spoe: add register-var-names directive in spoe-agent configuration
In addition to "option force-set-var", recently added, this directive can be
used to selectivelly register unknown variable names, without totally relaxing
their registration during the runtime, like "option force-set-var" does.

So there is no way for a malicious agent to exhaust memory by defining a too
high number of variable names. In other hand, you need to enumerate all
variable names. This could be painfull in some circumstances.

Remember, this directive is only usefull when the variable names are not
referenced anywhere in the HAProxy configuration or the SPOE one.

Thanks to Etienne Carrire for his help on this part.
2018-01-15 13:47:27 +01:00
David Carlier
ec5e84552a BUILD/MINOR: ancient gcc versions atomic fix
Commit 1a69af6d38 introduced code
for atomic prior to 4.7. Unfortunately clang uses as well those
constants which is misleading.
2018-01-11 15:31:07 +01:00
Willy Tarreau
1a69af6d38 MINOR: hathreads: add support for gcc < 4.7
Till now the use of __atomic_* gcc builtins required gcc >= 4.7. Since
some supported and quite common operating systems like CentOS 6 still
come with older versions (4.4) and the mapping to the older builtins
is reasonably simple, let's implement it.

This code is only used for gcc < 4.7. It has been quickly tested on a
machine using gcc 4.4.4 and provided expected results.

This patch should be backported to 1.8.
2018-01-10 07:51:56 +01:00
Olivier Houchard
2ec2db9725 MINOR: dns: Handle SRV record weight correctly.
A SRV record weight can range from 0 to 65535, while haproxy weight goes
from 0 to 256, so we have to divide it by 256 before handing it to haproxy.
Also, a SRV record with a weight of 0 doesn't mean the server shouldn't be
used, so use a minimum weight of 1.

This should probably be backported to 1.8.
2018-01-09 15:43:11 +01:00
Olivier Houchard
e2a34967a9 CLEANUP: rbtree: remove
Remove the rbtree implementation. It's not used, it's not even connected to
the build, and we probably have no use for it .
2018-01-05 10:56:32 +01:00
Willy Tarreau
3083276187 MINOR: h2: add a function to report pseudo-header names
For debugging we need to be able to dump pseudo headers when we know
their name, let's put this there as we already have the other way
around.
2017-12-30 17:17:07 +01:00
Willy Tarreau
a48c141f44 BUG/MAJOR: connection: refine the situations where we don't send shutw()
Since commit f9ce57e ("MEDIUM: connection: make conn_sock_shutw() aware
of lingering"), we refrain from performing the shutw() on the socket if
there is no lingering risk. But there is a problem with this in tunnel
and in TCP modes where a client is explicitly allowed to send a shutw
to the server, eventhough it it risky.

Not doing it creates this situation reported by Ricardo Fraile and
diagnosed by Christopher : a typical HTTP client (eg: curl) connecting
via the config below to an HTTP server would receive its response,
immediately close while the server remains in keep-alive mode. The
shutr() received by haproxy from the client is "propagated" to the
server side but not acted upon because fdtab[fd].linger_risk is set,
so we expect that the next close will immediately complete this
operation.

  listen proxy-tcp
    bind 127.0.0.1:8888
    mode tcp
    timeout connect 5s
    timeout server  10s
    timeout client  10s
    server server1 127.0.0.1:8000

But since the whole stream will not end until the server closes in
turn, the server doesn't close and haproxy expires on server timeout.
This problem has already struck by waking up an older bug and was
partially fixed with commit 8059351 ("BUG/MEDIUM: http: don't disable
lingering on requests with tunnelled responses") though it was not
enough.

The problem is that linger_risk is not suited here. In fact we need to
know whether or not it is desired to close normally or silently, and
whether or not a shutr() has already been received on this connection.

This is the approach this patch takes, and it solves the problem for
the various difficult modes (tcp, http-server-close, pretend-keepalive).

This fix needs to be backported to 1.8. Many thanks to Ricardo for
providing very detailed traces and configurations.
2017-12-22 18:54:05 +01:00
Willy Tarreau
0ad8e0dfea MINOR: http: add a function to check request's cache-control header field
The new function check_request_for_cacheability() is used to check if
a request may be served from the cache, and/or allows the response to
be stored into the cache. For this it checks the cache-control and
pragma header fields, and adjusts the existing TX_CACHEABLE and a new
TX_CACHE_IGNORE flags.

For now, just like its response side counterpart, it only checks the
first value of the header field. These functions should be reworked to
improve their parsers and validate all elements.
2017-12-22 17:56:17 +01:00
Willy Tarreau
984fca9363 MINOR: stream-int: set flag SI_FL_CLEAN_ABRT when mux supports clean aborts
By copying the info in the stream interface that the mux cleanly reports
aborts, we'll have the ability to check this flag wherever needed regardless
of the presence of a mux or not.
2017-12-20 16:56:32 +01:00
Willy Tarreau
28f1cb9da2 MINOR: mux: add flags to describe a mux's capabilities
This new field will be used to describe certain properties of some
muxes. For now we only add MX_FL_CLEAN_ABRT to indicate that a mux
is able to unambiguously report aborts using CS_FL_ERROR contrary
to others who may only report it via a read0. This will be used to
improve handling of the abortonclose option with H2. Other flags
may come later to report multiplexing capabilities or not, support
of client/server sides etc.
2017-12-20 16:31:30 +01:00
Etienne Carriere
aec8989e53 MINOR: spoe: add force-set-var option in spoe-agent configuration
For security reasons, the spoe filter was only able to change values of
existing variables. In specific cases (ex : with LUA code), the name of
variables are unknown at the configuration parsing phase.
The force-set-var option can be enabled to register all variables.
2017-12-20 08:55:18 +01:00
Willy Tarreau
3c8294b607 MINOR: conn_stream: add new flag CS_FL_RCV_MORE to indicate pending data
Due to the nature of multiplexed protocols, it will often happen that
some operations are only performed on full frames, preventing any partial
operation from being performed. HTTP/2 is one such example. The current
MUX API causes a problem here because the rcv_buf() function has no way
to let the stream layer know that some data could not be read due to a
lack of room in the buffer, but that data are definitely present. The
problem with this is that the stream layer might not know it needs to
call the function again after it has made some room. And if the frame
in the buffer is not followed by any other, nothing will move anymore.

This patch introduces a new conn_stream flag CS_FL_RCV_MORE whose purpose
is to indicate on the stream that more data than what was received are
already available for reading as soon as more room will be available in
the buffer.

This patch doesn't make use of this flag yet, it only declares it. It is
expected that other similar flags may come in the future, such as reports
of pending end of stream, errors or any such event that might save the
caller from having to poll, or simply let it know that it can take some
actions after having processed data.
2017-12-10 21:13:25 +01:00
Thierry FOURNIER
cb14688496 BUG/MEDIUM: lua/notification: memory leak
The thread patches adds refcount for notifications. The notifications are
used with the Lua cosocket. These refcount free the notifications when
the session is cleared. In the Lua task case, it not have sessions, so
the nofications are never cleraed.

This patch adds a garbage collector for signals. The garbage collector
just clean the notifications for which the end point is disconnected.

This patch should be backported in 1.8
2017-12-10 19:38:58 +01:00
Thierry FOURNIER
d5b79835f8 DOC: notifications: add precisions about thread usage
Precise the terms of use the notification functions.
2017-12-10 19:38:55 +01:00
Emeric Brun
ece0c334bd BUG/MEDIUM: ssl engines: Fix async engines fds were not considered to fix fd limit automatically.
The number of async fd is computed considering the maxconn, the number
of sides using ssl and the number of engines using async mode.

This patch should be backported on haproxy 1.8
2017-12-06 14:17:41 +01:00
Willy Tarreau
6c71e4696b BUG/MAJOR: hpack: don't pretend large headers fit in empty table
In hpack_dht_make_room(), we try to fulfill this rule form RFC7541#4.4 :

 "It is not an error to attempt to add an entry that is larger than the
  maximum size; an attempt to add an entry larger than the maximum size
  causes the table to be emptied of all existing entries and results in
  an empty table."

Unfortunately it is not consistent with the way it's used in
hpack_dht_insert() as this last one will consider a success as a
confirmation it can copy the header into the table, and a failure as
an indexing error. This results in the two following issues :
  - if a client sends too large a header into an empty table, this
    header may overflow the table. Fortunately, most clients send
    small headers like :authority first, and never mark headers that
    don't fit into the table as indexable since it is counter-productive ;

  - if a client sends too large a header into a populated table, the
    operation fails after the table is totally flushed and the request
    is not processed.

This patch fixes the two issues at once :
  - a header not fitting into an empty table is always a sign that it
    will never fit ;
  - not fitting into the table is not an error

Thanks to Yves Lafon for reporting detailed traces demonstrating this
issue. This fix must be backported to 1.8.
2017-12-04 18:06:51 +01:00
Willy Tarreau
d85ba4e092 BUG/MINOR: hpack: reject invalid header index
If the hpack decoder sees an invalid header index, it emits value
"### ERR ###" that was used during debugging instead of rejecting the
block. This is harmless, and was detected by h2spec.

To backport to 1.8.
2017-12-03 21:08:39 +01:00
Emeric Brun
0fed0b0a38 BUG/MEDIUM: peers: fix some track counter rules dont register entries for sync.
This BUG was introduced with:
'MEDIUM: threads/stick-tables: handle multithreads on stick tables'

The API was reviewed to handle stick table entry updates
asynchronously and the caller must now call a 'stkable_touch_*'
function each time the content of an entry is modified to
register the entry to be synced.

There was missing call to stktable_touch_* resulting in
not propagated entries to remote peers (or local one during reload)
2017-11-29 19:16:22 +01:00
Willy Tarreau
ec7464726f BUILD: checks: don't include server.h
server.h needs checks.h since it references the struct check, but depending
on the include order it will fail if check.h is included first due to this
one including server.h in turn while it doesn't need it.
2017-11-29 10:54:05 +01:00
Willy Tarreau
b306650c2a [RELEASE] Released version 1.9-dev0
Released version 1.9-dev0 with the following main changes :
    - BUG/MEDIUM: stream: don't automatically forward connect nor close
    - BUG/MAJOR: stream: ensure analysers are always called upon close
    - BUG/MINOR: stream-int: don't try to read again when CF_READ_DONTWAIT is set
    - MEDIUM: mworker: Add systemd `Type=notify` support
    - BUG/MEDIUM: cache: free callback to remove from tree
    - CLEANUP: cache: remove unused struct
    - MEDIUM: cache: enable the HTTP analysers
    - CLEANUP: cache: remove wrong comment
    - MINOR: threads/atomic: rename local variables in macros to avoid conflicts
    - MINOR: threads/plock: rename local variables in macros to avoid conflicts
    - MINOR: threads/atomic: implement pl_mb() in asm on x86
    - MINOR: threads/atomic: implement pl_bts() on non-x86
    - MINOR: threads/build: atomic: replace the few inlines with macros
    - BUILD: threads/plock: fix a build issue on Clang without optimization
    - BUILD: ebtree: don't redefine types u32/s32 in scope-aware trees
    - BUILD: compiler: add a new type modifier __maybe_unused
    - BUILD: h2: mark some inlined functions "unused"
    - BUILD: server: check->desc always exists
    - BUG/MEDIUM: h2: properly report connection errors in headers and data handlers
    - MEDIUM: h2: add a function to emit an HTTP/1 request from a headers list
    - MEDIUM: h2: change hpack_decode_headers() to only provide a list of headers
    - BUG/MEDIUM: h2: always reassemble the Cookie request header field
    - BUG/MINOR: systemd: ignore daemon mode
    - CONTRIB: spoa_example: allow to compile outside HAProxy.
    - CONTRIB: spoa_example: remove bref, wordlist, cond_wordlist
    - CONTRIB: spoa_example: remove last dependencies on type "sample"
    - CONTRIB: spoa_example: remove SPOE enums that are useless for clients
    - CLEANUP: cache: reorder includes
    - MEDIUM: shctx: use unsigned int for len and block_count
    - MEDIUM: cache: "show cache" on the cli
    - BUG/MEDIUM: cache: use key=0 as a condition for freeing
    - BUG/MEDIUM: cache: refcount forbids to free the objects
    - BUG/MEDIUM: cache fix cli_kws structure
    - BUG/MEDIUM: deinit: correctly deinitialize the proxy and global listener tasks
    - BUG/MINOR: ssl: Always start the handshake if we can't send early data.
    - MINOR: ssl: Don't disable early data handling if we could not write.
    - MINOR: pools: prepare functions to override malloc/free in pools
    - MINOR: pools: implement DEBUG_UAF to detect use after free
    - BUG/MEDIUM: threads/time: fix time drift correction
    - BUG/MEDIUM: threads/time: maintain a common time reference between all threads
    - MINOR: sample: Add "thread" sample fetch
    - BUG/MINOR: Use crt_base instead of ca_base when crt is parsed on a server line
    - BUG/MINOR: stream: fix tv_request calculation for applets
    - BUG/MAJOR: h2: always remove a stream from the send list before freeing it
    - BUG/MAJOR: threads/task: dequeue expired tasks under the WQ lock
    - MINOR: ssl: Handle reading early data after writing better.
    - MINOR: mux: Make sure every string is woken up after the handshake.
    - MEDIUM: cache: store sha1 for hashing the cache key
    - MINOR: http: implement the "http-request reject" rule
    - MINOR: h2: send RST_STREAM before GOAWAY on reject
    - MEDIUM: h2: don't gracefully close the connection anymore on Connection: close
    - MINOR: h2: make use of client-fin timeout after GOAWAY
    - MEDIUM: config: ensure that tune.bufsize is at least 16384 when using HTTP/2
    - MINOR: ssl: Handle early data with BoringSSL
    - BUG/MEDIUM: stream: always release the stream-interface on abort
    - BUG/MEDIUM: cache: free ressources in chn_end_analyze
    - MINOR: cache: move the refcount decrease in the applet release
    - BUG/MINOR: listener: Allow multiple "process" options on "bind" lines
    - MINOR: config: Support a range to specify processes in "cpu-map" parameter
    - MINOR: config: Slightly change how parse_process_number works
    - MINOR: config: Export parse_process_number and use it wherever it's applicable
    - MINOR: standard: Add my_ffsl function to get the position of the bit set to one
    - MINOR: config: Add auto-increment feature for cpu-map
    - MINOR: config: Support partial ranges in cpu-map directive
    - MINOR:: config: Remove thread-map directive
    - MINOR: config: Add the threads support in cpu-map directive
    - MINOR: config: Add threads support for "process" option on "bind" lines
    - MEDIUM: listener: Bind listeners on a thread subset if specified
    - CLEANUP: debug: Use DPRINTF instead of fprintf into #ifdef DEBUG_FULL/#endif
    - CLEANUP: log: Rename Alert/Warning in ha_alert/ha_warning
    - MINOR/CLEANUP: proxy: rename "proxy" to "proxies_list"
    - CLEANUP: pools: rename all pool functions and pointers to remove this "2"
    - DOC: update the roadmap file with the latest changes merged in 1.8
    - DOC: fix mangled version in peers protocol documentation
    - DOC: add initial peers protovol v2.0 documentation.
    - DOC: mention William as maintainer of the cache and master-worker
    - DOC: add Christopher and Emeric as maintainers of the threads
    - MINOR: cache: replace a fprint() by an abort()
    - MEDIUM: cache: max-age configuration keyword
    - DOC: explain HTTP2 timeout behavior
    - DOC: cache: configuration and management
    - MAJOR: mworker: exits the master on failure
    - BUG/MINOR: threads: don't drop "extern" on the lock in include files
    - MINOR: task: keep a pointer to the currently running task
    - MINOR: task: align the rq and wq locks
    - MINOR: fd: cache-align fdtab and fdcache locks
    - MINOR: buffers: cache-align buffer_wq_lock
    - CLEANUP: server: reorder some fields in struct server to save 40 bytes
    - CLEANUP: proxy: slightly reorder the struct proxy to reduce holes
    - CLEANUP: checks: remove 16 bytes of holes in struct check
    - CLEANUP: cache: more efficiently pack the struct cache
    - CLEANUP: fd: place the lock at the beginning of struct fdtab
    - CLEANUP: pools: align pools on a cache line
    - DOC: config: add a few bits about how to configure HTTP/2
    - BUG/MAJOR: threads/queue: avoid recursive locking in pendconn_get_next_strm()
    - BUILD: Makefile: reorder object files by size
2017-11-26 19:50:17 +01:00
Willy Tarreau
103e5663c8 BUG/MAJOR: threads/queue: avoid recursive locking in pendconn_get_next_strm()
pendconn_get_next_strm() is called from process_srv_queue() under the
server lock, and calls stream_add_srv_conn() with this lock held, while
the latter tries to take it again. This results in a deadlock when
a server's maxconn is reached and haproxy is built with thread support.
2017-11-26 18:50:30 +01:00
Willy Tarreau
1ca1b70cf9 CLEANUP: pools: align pools on a cache line
There are just a few pools, and they're stressed a lot, so it makes
sense to dedicate them a cache line to avoid contention and to place
the lock at the beginning.
2017-11-26 11:10:53 +01:00
Willy Tarreau
5809052ae1 CLEANUP: fd: place the lock at the beginning of struct fdtab
The struct is not cache line aligned but at least, every time the lock
will appear in the same cache line as the fd it will benefit from being
accessed first. This improves the performance by about 2% on fd-intensive
workloads with 4 threads.
2017-11-26 11:10:53 +01:00
Willy Tarreau
08eaa78739 CLEANUP: checks: remove 16 bytes of holes in struct check
These ones were easily recovered by swapping two members.
2017-11-26 11:10:52 +01:00
Willy Tarreau
a51108443e CLEANUP: proxy: slightly reorder the struct proxy to reduce holes
16 bytes were recovered from the struct doing minimal reordering.
2017-11-26 11:10:52 +01:00
Willy Tarreau
d7e33bbe2f CLEANUP: server: reorder some fields in struct server to save 40 bytes
In 1.8 many holes were introduced in struct server, so let's slightly
reorder a few fields to plug most of them. This saves 40 bytes in the
struct.
2017-11-26 11:10:52 +01:00
Willy Tarreau
8b94969054 MINOR: fd: cache-align fdtab and fdcache locks
These locks are highly contended, let's not make them share cache lines.
2017-11-26 11:10:51 +01:00
Willy Tarreau
53bae85b8e BUG/MINOR: threads: don't drop "extern" on the lock in include files
Commit 9dcf9b6 ("MINOR: threads: Use __decl_hathreads to declare locks")
accidently lost a few "extern" in certain lock declarations, possibly
causing certain entries to be declared at multiple places. Apparently
it hasn't caused any harm though.

The offending ones were :
  - fdtab_lock
  - fdcache_lock
  - poll_lock
  - buffer_wq_lock
2017-11-26 11:10:50 +01:00
William Lallemand
4cfede87a3 MAJOR: mworker: exits the master on failure
This patch changes the behavior of the master during the exit of a
worker.

When a worker exits with an error code, for example in the case of a
segfault, all workers are now killed and the master leaves.

If you don't want this behavior you can use the option
"master-worker no-exit-on-failure".
2017-11-24 22:48:27 +01:00
Willy Tarreau
bafbe01028 CLEANUP: pools: rename all pool functions and pointers to remove this "2"
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
2017-11-24 17:49:53 +01:00
Olivier Houchard
fbc74e8556 MINOR/CLEANUP: proxy: rename "proxy" to "proxies_list"
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.

[wt: some of these are definitely fixes that are worth backporting]
2017-11-24 17:21:27 +01:00
Christopher Faulet
767a84bcc0 CLEANUP: log: Rename Alert/Warning in ha_alert/ha_warning 2017-11-24 17:19:12 +01:00
Christopher Faulet
c644fa9bf5 MINOR: config: Add threads support for "process" option on "bind" lines
It is now possible on a "bind" line (or a "stats socket" line) to specify the
thread set allowed to process listener's connections. For instance:

    # HTTPS connections will be processed by all threads but the first and HTTP
    # connection will be processed on the first thread.
    bind *:80 process 1/1
    bind *:443 ssl crt mycert.pem process 1/2-
2017-11-24 15:38:50 +01:00
Christopher Faulet
cb6a94510d MINOR: config: Add the threads support in cpu-map directive
Now, it is possible to bind CPU at the thread level instead of the process level
by defining a thread set in "cpu-map" directives. Thus, its format is now:

  cpu-map [auto:]<process-set>[/<thread-set>] <cpu-set>...

where <process-set> and <thread-set> must follow the format:

  all | odd | even | number[-[number]]

Having a process range and a thread range in same time with the "auto:" prefix
is not supported. Only one range is supported, the other one must be a fixed
number. But it is allowed when there is no "auto:" prefix.

Because it is possible to define a mapping for a process and another for a
thread on this process, threads will be bound on the intersection of their
mapping and the one of the process on which they are attached. If the
intersection is null, no specific binding will be set for the threads.
2017-11-24 15:38:50 +01:00
Christopher Faulet
26028f6209 MINOR: config: Add auto-increment feature for cpu-map
The prefix "auto:" can be added before the process set to let HAProxy
automatically bind a process to a CPU by incrementing process and CPU sets. To
be valid, both sets must have the same size. No matter the declaration order of
the CPU sets, it will be bound from the lower to the higher bound.

  Examples:
      # all these lines bind the process 1 to the cpu 0, the process 2 to cpu 1
      #  and so on.
      cpu-map auto:1-4   0-3
      cpu-map auto:1-4   0-1 2-3
      cpu-map auto:1-4   3 2 1 0

      # bind each process to exaclty one CPU using all/odd/even keyword
      cpu-map auto:all   0-63
      cpu-map auto:even  0-31
      cpu-map auto:odd   32-63

      # invalid cpu-map because process and CPU sets have different sizes.
      cpu-map auto:1-4   0    # invalid
      cpu-map auto:1     0-3  # invalid
2017-11-24 15:38:49 +01:00
Christopher Faulet
ff8131861f MINOR: standard: Add my_ffsl function to get the position of the bit set to one 2017-11-24 15:38:49 +01:00
Christopher Faulet
f1f0c5f591 MINOR: config: Export parse_process_number and use it wherever it's applicable
This function is used when "bind-process" directive is parsed and when "process"
parameter on a "bind" or a "stats socket" line is parsed.
2017-11-24 15:38:49 +01:00
William Lallemand
f528fff46b MEDIUM: cache: store sha1 for hashing the cache key
The cache was relying on the txn->uri for creating its key, which was a
big problem when there was no log activated.

This patch does a sha1 of the host + uri, and stores it in the txn.
When a object is stored, the eb32node uses the first 32 bits of the hash
as a key, and the whole hash is stored in the cache entry.

During a lookup, the truncated hash is used, and when it matches an
entry we check the real sha1.
2017-11-23 20:20:04 +01:00
Olivier Houchard
90084a133d MINOR: ssl: Handle reading early data after writing better.
It can happen that we want to read early data, write some, and then continue
reading them.
To do so, we can't reuse tmp_early_data to store the amount of data sent,
so introduce a new member.
If we read early data, then ssl_sock_to_buf() is now the only responsible
for getting back to the handshake, to make sure we don't miss any early data.
2017-11-23 19:35:28 +01:00
Willy Tarreau
158fa75811 MINOR: pools: implement DEBUG_UAF to detect use after free
This code has been used successfully a few times in the past to detect
that a pool was used after being freed. Its main goal is to allocate a
full page for each object so that they are always released individually
and unmapped from memory. This way if any part of the code reference the
object after is was freed and before it is reallocated, a segv occurs at
the exact offending location. It does a few extra things such as writing
to the memory area before freeing to detect double-frees and free of
read-only areas, and placing the data at the end of the page instead of
the beginning so that out of bounds accesses are easier to spot. The
amount of memory used with this is huge (about 10 times the regular
usage) but it can be useful sometimes.
2017-11-22 19:43:57 +01:00
Willy Tarreau
f13322ede1 MINOR: pools: prepare functions to override malloc/free in pools
This will be useful to add some debugging capabilities. For now it
changes nothing.
2017-11-22 19:27:44 +01:00
William Lallemand
111bfef33c MEDIUM: shctx: use unsigned int for len and block_count
Allows bigger objects to be cached in the shctx, the first
implementation was only storing small ssl session, but we want to store
bigger HTTP response.
2017-11-21 21:35:04 +01:00
Willy Tarreau
59a10fb53d MEDIUM: h2: change hpack_decode_headers() to only provide a list of headers
The current H2 to H1 protocol conversion presents some issues which will
require to perform some processing on certain headers before writing them
so it's not possible to convert HPACK to H1 on the fly.

This commit modifies the headers decoding so that it now works in two
phases : hpack_decode_headers() only decodes the HPACK stream in the
HEADERS frame and puts the result into a list. Headers which require
storage (huffman-compressed or from the dynamic table) are stored in
a chunk allocated by the H2 demuxer. Then once the headers are properly
decoded into this list, h2_make_h1_request() is called with this list
to produce the HTTP/1.1 request into the destination buffer. The list
necessarily enforces a limit. Here we use 2*MAX_HTTP_HDR, which means
that we can have as many individual cookies as we have regular headers
if a client decides to break their cookies into multiple values. This
seams reasonable and will allow the H1 parser to decide whether it's
too much or not.

Thus the output stream is not produced on the fly anymore and this will
permit to deal with certain corner cases like reparing the Cookie header
(which for now is not done).

In order to limit header duplication and parsing, the known pseudo headers
continue to be passed by their index : the name element in the list then
has a NULL pointer and the value is the pseudo header's index. Given that
these ones represent about half of the incoming requests and need to be
found quickly, it maintains an acceptable level of performance.

The code was significantly reduced by doing this because the orignal code
had to deal with HPACK and H1 combinations (eg: index vs not indexed, etc)
and now the HPACK decoding is totally focused on the decompression, and
the H1 encoding doesn't have to deal with the issue of wrapping input for
example.

One bug was addressed here (though it couldn't happen at the moment). The
H2 demuxer used to detect a failure to write the request into the H1 buffer
and would then detect if the output buffer wraps, realign it and try again.
The problem by doing so was that the HPACK context was already modified and
not rewindable. Thus the size check is now performed first and a failure is
reported if it doesn't fit.
2017-11-21 21:13:36 +01:00
Willy Tarreau
f24ea8e45e MEDIUM: h2: add a function to emit an HTTP/1 request from a headers list
The current H2 to H1 protocol conversion presents some issues which will
require to perform some processing on certain headers before writing them
so it's not possible to convert HPACK to H1 on the fly.

Here we introduce a function which performs half of what hpack_decode_header()
used to do, which is to take a list of headers on input and emit the
corresponding request in HTTP/1.1 format. The code is the same and functions
were renamed to be prefixed with "h2" instead of "hpack", though it ends
up being simpler as the various HPACK-specific cases could be fused into
a single one (ie: add header).

Moving this part here makes a lot of sense as now this code is specific to
what is documented in HTTP/2 RFC 7540 and will be able to deal with special
cases related to H2 to H1 conversion enumerated in section 8.1.

Various error codes which were previously assigned to HPACK were never
used (aside being negative) and were all replaced by -1 with a comment
indicating what error was detected. The code could be further factored
thanks to this but this commit focuses on compatibility first.

This code is not yet used but builds fine.
2017-11-21 21:13:33 +01:00
Willy Tarreau
dbd25fc75a BUILD: compiler: add a new type modifier __maybe_unused
While gcc only emits warnings about unused static functions, Clang also
emits such a warning when the functions are inlined. This is a bit
annoying at certain places where functions are provided to manipulate
multiple data types and are not yet used. Let's have a type modifier
"__maybe_unused" which sets the "unused" attribute like the Linux kernel
does. It's elegant as it allows the code author to indicate that it knows
that this element might be unused. It works on variables as well, which
is convenient to remove ifdefs around local variables in certain functions,
but doesn't work on labels.
2017-11-20 21:27:27 +01:00
Willy Tarreau
2532bd2f81 BUILD: threads/plock: fix a build issue on Clang without optimization
[ plock commit 4c53fd3a0b2b1892817cebd0db012a52f4087850 ]

Pieter Baauw reported a build issue affecting haproxy after plock was
included. It happens that expressions of the form :

     if ((const) ? (expr1) : (expr2))
       do_something()

always produce code for both expr1 and expr2 on Clang when building
without optimization. The resulting asm code is even funny, basically
doing :

     mov reg, 1
     cmp reg, 1
     ...

This causes our sizeof() tests to fail to build because we purposely
dereference a fake function that reports the location and nature of the
inconsistency, but this fake function appears in the object code despite
all conditions being there to avoid it.

However the compiler is still smart enough to optimize away code doing

    if (const)
       do_something()

So we simply repeat the condition before do_something(), and the dummy
function is not referenced anymore unless really required.
2017-11-20 21:06:35 +01:00
Willy Tarreau
b5f271555e MINOR: threads/build: atomic: replace the few inlines with macros
[ plock commit 61e255286ae32e83e1a3174dd7c49eda99880a8b]

There are a few inlines such as pl_barrier() and pl_cpu_relax() which
are used a lot. Unfortunately, while building test code at -O0, inlining
is disabled and these ones are called a lot and show up a lot in any
profile, are traced into when single-stepping with a debugger, etc, thus
they are polluting the landscape. Since they're single-asm statements,
there is no reason for not turning them into macros.

The result becomes fairly visible here at -O0 :

  $ size latency.inline latency.macro
     text    data     bss     dec     hex filename
    11431     692     656   12779    31eb treelock.inline
    10967     692     656   12315    301b treelock.macro

And it was verified that regularly optimized code remains strictly identical.
2017-11-20 21:06:35 +01:00
Willy Tarreau
d0d8ba59d3 MINOR: threads/atomic: implement pl_bts() on non-x86
[ plock commit da17ba320aad3a8faf08e36fca604de9cad21fdd ]

This one was missing, it can be done using sync_fetch_and_or().
2017-11-20 21:06:03 +01:00
Willy Tarreau
01b8398b9e MINOR: threads/atomic: implement pl_mb() in asm on x86
[ plock commit 44081ea493dd78dab48076980e881748e9b33db5 ]

Older compilers (eg: gcc 3.4) don't provide __sync_synchronize() so let's
do it by hand on this platform.
2017-11-20 20:45:47 +01:00
Willy Tarreau
f7ba77eb80 MINOR: threads/plock: rename local variables in macros to avoid conflicts
[ plock commit b155d5c762fb9a9793911881f80e61faa6b0e889 ]

Local variables "l", "i" and "ret" were renamed "__pl_l", "__pl_i" and
"__pl_r" respectively, to limit the risk of conflicts with existing
variables in application code.
2017-11-20 20:45:43 +01:00
Willy Tarreau
98409e34ca MINOR: threads/atomic: rename local variables in macros to avoid conflicts
[ plock commit bfac5887ebabb8ef753b0351f162265767eb219b ]

Local variable "t" was renamed "__pl_t" to limit the risk of conflicts
with existing variables in application code.
2017-11-20 20:45:38 +01:00
William Lallemand
71bd11a1f3 MEDIUM: cache: enable the HTTP analysers
Enable the same analysers as the stats applet.
Allows keepalive and termination flags to work.
2017-11-20 19:22:27 +01:00
William Lallemand
44e259c0b7 CLEANUP: cache: remove unused struct
Remove unused structure which remain from old dev.
2017-11-20 19:22:27 +01:00
Tim Duesterhus
d6942c8297 MEDIUM: mworker: Add systemd Type=notify support
This patch adds support for `Type=notify` to the systemd unit.

Supporting `Type=notify` improves both starting as well as reloading
of the unit, because systemd will be let known when the action completed.

See this quote from `systemd.service(5)`:
> Note however that reloading a daemon by sending a signal (as with the
> example line above) is usually not a good choice, because this is an
> asynchronous operation and hence not suitable to order reloads of
> multiple services against each other. It is strongly recommended to
> set ExecReload= to a command that not only triggers a configuration
> reload of the daemon, but also synchronously waits for it to complete.

By making systemd aware of a reload in progress it is able to wait until
the reload actually succeeded.

This patch introduces both a new `USE_SYSTEMD` build option which controls
including the sd-daemon library as well as a `-Ws` runtime option which
runs haproxy in master-worker mode with systemd support.

When haproxy is running in master-worker mode with systemd support it will
send status messages to systemd using `sd_notify(3)` in the following cases:

- The master process forked off the worker processes (READY=1)
- The master process entered the `mworker_reload()` function (RELOADING=1)
- The master process received the SIGUSR1 or SIGTERM signal (STOPPING=1)

Change the unit file to specify `Type=notify` and replace master-worker
mode (`-W`) with master-worker mode with systemd support (`-Ws`).

Future evolutions of this feature could include making use of the `STATUS`
feature of `sd_notify()` to send information about the number of active
connections to systemd. This would require bidirectional communication
between the master and the workers and thus is left for future work.
2017-11-20 18:39:41 +01:00
Olivier Houchard
e6060c5d87 MINOR: SSL: Store the ASN1 representation of client sessions.
Instead of storing the SSL_SESSION pointer directly in the struct server,
store the ASN1 representation, otherwise, session resumption is broken with
TLS 1.3, when multiple outgoing connections want to use the same session.
2017-11-16 19:03:32 +01:00
Christopher Faulet
595d7b72a6 MINOR: applets: Use a bitfield to track applets activity per-thread
a bitfield has been added to know if there are runnable applets for a
thread. When an applet is woken up, the bits corresponding to its thread_mask
are set. When all active applets for a thread is get to be processed, the thread
is removed from active ones by unsetting its tid_bit from the bitfield.
2017-11-16 11:19:46 +01:00
Christopher Faulet
3911ee85df MINOR: tasks: Use a bitfield to track tasks activity per-thread
a bitfield has been added to know if there are runnable tasks for a thread. When
a task is woken up, the bits corresponding to its thread_mask are set. When all
tasks for a thread have been evaluated without any wakeup, the thread is removed
from active ones by unsetting its tid_bit from the bitfield.
2017-11-16 11:19:46 +01:00
William Lallemand
75ea0a06b0 BUG/MEDIUM: mworker: does not close inherited FD
At the end of the master initialisation, a call to protocol_unbind_all()
was made, in order to close all the FDs.

Unfortunately, this function closes the inherited FDs (fd@), upon reload
the master wasn't able to reload a configuration with those FDs.

The create_listeners() function now store a flag to specify if the fd
was inherited or not.

Replace the protocol_unbind_all() by  mworker_cleanlisteners() +
deinit_pollers()
2017-11-15 19:53:33 +01:00
Willy Tarreau
9c1e15d8cd MINOR: tools: emphasize the node being worked on in the tree dump
Now we can show in dotted red the node being removed or surrounded in red
a node having been inserted, and add a description on the graph related to
the operation in progress for example.
2017-11-15 19:43:05 +01:00
Willy Tarreau
ed3cda02ae MINOR: tools: add a function to dump a scope-aware tree to a file
It emits a dump in DOT format for graphing purposes during debugging
sessions. It's convenient to dump the run queue.
2017-11-15 16:07:15 +01:00
Christopher Faulet
99bca65f53 BUG/MEDIUM: standard: itao_str/idx and quote_str/idx must be thread-local
This bug has an impact on the stats applet and easily leads to a crash of HAProxy.

This is specific to threads, no backport is needed.
2017-11-14 18:11:57 +01:00
Christopher Faulet
e9a896e09e BUG/MINOR: threads: tid_bit must be a unsigned long
This is specific to threads, no backport is needed.
2017-11-14 18:11:28 +01:00
Christopher Faulet
fa5c812a6b BUG/MINOR: buffers: Fix b_alloc_margin to be "fonctionnaly" thread-safe
b_alloc_margin is, strickly speeking, thread-safe. It will not crash
HAproxy. But its contract is not respected anymore in a multithreaded
environment. In this function, we need to be sure to have <margin> buffers
available in the pool after the allocation. So to have this guarantee, we must
lock the memory pool during all the operation. This also means, we must call
internal and lockless memory functions (prefixed with '__').

For the record, this patch fixes a pernicious bug happens after a soft reload
where some streams can be blocked infinitly, waiting for a buffer in the
buffer_wq list. This happens because, during a soft reload, pool_gc2 is called,
making some calls to b_alloc_fast fail.

This is specific to threads, no backport is needed.
2017-11-13 11:42:48 +01:00
Christopher Faulet
9dcf9b6f03 MINOR: threads: Use __decl_hathreads to declare locks
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
2017-11-13 11:38:17 +01:00
Willy Tarreau
387bd4f69f CLEANUP: global: introduce variable pid_bit to avoid shifts with relative_pid
At a number of places, bitmasks are used for process affinity and to map
listeners to processes. Every time 1UL<<(relative_pid-1) is used. Let's
create a "pid_bit" variable corresponding to this value to clean this up.
2017-11-10 19:08:14 +01:00
Willy Tarreau
28b55c6fed CLEANUP: mux: remove the unused "release()" function
In commit 53a4766 ("MEDIUM: connection: start to introduce a mux layer
between xprt and data") we introduced a release() function which ends
up never being used. Let's get rid of it now.
2017-11-10 16:43:05 +01:00
Willy Tarreau
aa39860aef MINOR: tools: don't use unlikely() in hex2i()
This small inline function causes some pain to the compiler when used
inside other functions due to its use of the unlikely() hint for non-digits.
It causes the letters to be processed far away in the calling function and
makes the code less efficient. Removing these unlikely() hints has increased
the chunk size parsing by around 5%.
2017-11-10 11:19:54 +01:00
Willy Tarreau
b15e3fefc9 BUG/MEDIUM: h1: ensure the chunk size parser can deal with full buffers
The HTTP/1 code always has the reserve left available so the buffer is
never full there. But with HTTP/2 we have to deal with full buffers,
and it happens that the chunk size parser cannot tell the difference
between a full buffer and an empty one since it compares the start and
the stop pointer.

Let's change this to instead deal with the number of bytes left to process.

As a side effect, this code ends up being about 10% faster than the previous
one, even on HTTP/1.
2017-11-10 11:17:08 +01:00
Christopher Faulet
c5a9d5bf23 BUG/MEDIUM: stream-int: Don't loss write's notifs when a stream is woken up
When a write activity is reported on a channel, it is important to keep this
information for the stream because it take part on the analyzers' triggering.
When some data are written, the flag CF_WRITE_PARTIAL is set. It participates to
the task's timeout updates and to the stream's waking. It is also used in
CF_MASK_ANALYSER mask to trigger channels anaylzers. In the past, it was cleared
by process_stream. Because of a bug (fixed in commit 95fad5ba4 ["BUG/MAJOR:
stream-int: don't re-arm recv if send fails"]), It is now cleared before each
send and in stream_int_notify. So it is possible to loss this information when
process_stream is called, preventing analyzers to be called, and possibly
leading to a stalled stream.

Today, this happens in HTTP2 when you call the stat page or when you use the
cache filter. In fact, this happens when the response is sent by an applet. In
HTTP1, everything seems to work as expected.

To fix the problem, we need to make the difference between the write activity
reported to lower layers and the one reported to the stream. So the flag
CF_WRITE_EVENT has been added to notify the stream of the write activity on a
channel. It is set when a send succedded and reset by process_stream. It is also
used in CF_MASK_ANALYSER. finally, it is checked in stream_int_notify to wake up
a stream and in channel_check_timeouts.

This bug is probably present in 1.7 but it seems to have no effect. So for now,
no needs to backport it.
2017-11-09 15:16:05 +01:00
Willy Tarreau
1b4cf9b754 BUG/MINOR: h1: the HTTP/1 make status code parser check for digits
The H1 parser used by the H2 gateway was a bit lax and could validate
non-numbers in the status code. Since it computes the code on the fly
it's problematic, as "30:" is read as status code 310. Let's properly
check that it's a number now. No backport needed.
2017-11-09 11:15:45 +01:00
Olivier Houchard
522eea7110 MINOR: ssl: Handle sending early data to server.
This adds a new keyword on the "server" line, "allow-0rtt", if set, we'll try
to send early data to the server, as long as the client sent early data, as
in case the server rejects the early data, we no longer have them, and can't
resend them, so the only option we have is to send back a 425, and we need
to be sure the client knows how to interpret it correctly.
2017-11-08 14:11:10 +01:00
Emeric Brun
d8b3b65faa BUG/MEDIUM: splice/threads: pipe reuse list was not protected.
The list is now protected using a global spinlock.
2017-11-07 14:47:28 +01:00
Christopher Faulet
2a944ee16b BUILD: threads: Rename SPIN/RWLOCK macros using HA_ prefix
This remove any name conflicts, especially on Solaris.
2017-11-07 11:10:24 +01:00
Olivier Houchard
55dcdf4c39 BUG/MINOR: dns: Don't try to get the server lock if it's already held.
dns_link_resolution() can be called with the server lock already held, so
don't attempt to lock it again in that case.
2017-11-06 18:34:24 +01:00
Willy Tarreau
88ac59be4d MINOR: threads: use faster locks for the spin locks
The spin locks used to rely on W locks, which involve a loop waiting
for readers to leave, and this doesn't happen here. It's more efficient
to use S locks instead, which are also mutually exclusive and do not
have this loop. This saves one test per spinlock and a few tens of
bytes allowing certain functions to be inlined.
2017-11-06 11:20:11 +01:00
Willy Tarreau
8d38805d3d MAJOR: task: make use of the scope-aware ebtree functions
Currently the task scheduler suffers from an O(n) lookup when
skipping tasks that are not for the current thread. The reason
is that eb32_lookup_ge() has no information about the current
thread so it always revisits many tasks for other threads before
finding its own tasks.

This is particularly visible with HTTP/2 since the number of
concurrent streams created at once causes long series of tasks
for the same stream in the scheduler. With only 10 connections
and 100 streams each, by running on two threads, the performance
drops from 640kreq/s to 11.2kreq/s! Lookup metrics show that for
only 200000 task lookups, 430 million skips had to be performed,
which means that on average, each lookup leads to 2150 nodes to
be visited.

This commit backports the principle of scope lookups for ebtrees
from the ebtree_v7 development tree. The idea is that each node
contains a mask indicating the union of the scopes for the nodes
below it, which is fed during insertion, and used during lookups.

Then during lookups, branches that do not contain any leaf matching
the requested scope are simply ignored. This perfectly matches a
thread mask, allowing a thread to only extract the tasks it cares
about from the run queue, and to always find them in O(log(n))
instead of O(n). Thus the scheduler uses tid_bit and
task->thread_mask as the ebtree scope here.

Doing this has recovered most of the performance, as can be seen on
the test below with two threads, 10 connections, 100 streams each,
and 1 million requests total :

                              Before     After    Gain
              test duration : 89.6s      4.73s     x19
    HTTP requests/s (DEBUG) : 11200     211300     x19
     HTTP requests/s (PROD) : 15900     447000     x28
             spin_lock time : 85.2s      0.46s    /185
            time per lookup : 13us       40ns     /325

Even when going to 6 threads (on 3 hyperthreaded CPU cores), the
performance stays around 284000 req/s, showing that the contention
is much lower.

A test showed that there's no benefit in using this for the wait queue
though.
2017-11-06 11:20:11 +01:00
Willy Tarreau
62a124977b MINOR: applets: no need to check for runqueue's emptiness in appctx_res_wakeup()
The __appctx_wakeup() function already does it. It matters with threads
enabled because it simplifies the code in appctx_res_wakeup() to get rid
of this test.
2017-11-05 12:01:11 +01:00
Willy Tarreau
bbd09b9306 BUG/MAJOR: thread/listeners: enable_listener must not call unbind_listener()
unbind_listener() takes the listener lock, which is already held by
enable_listener(). This situation happens when starting with nbproc > 1
with some bind lines limited to a certain process, because in this case
enable_listener() tries to stop unneeded listeners.

This commit introduces __do_unbind_listeners() which must be called with
the lock held, and makes enable_listener() use this one. Given that the
only return code has never been used and that it starts to make the code
more complicated to propagate it before throwing it to the trash, the
function's return type was changed to void.
2017-11-05 11:38:44 +01:00
David Carlier
5222d8eb25 BUG/MINOR: stdarg.h inclusion
Needed for the memvprintf part, the va_list type.
Spotted during OpenBSD build.
2017-11-03 15:04:09 +01:00
Willy Tarreau
4b75fffa2b BUG/MAJOR: buffers: fix get_buffer_nc() for data at end of buffer
This function incorrectly dealt with the case where data doesn't
wrap but lies at the end of the buffer, resulting in Lukas' reported
data corruption with HTTP/2. No backport is needed, it was introduced
for HTTP/2 in 1.8-dev.
2017-11-02 17:16:07 +01:00
Willy Tarreau
7c2a2ad65c BUG/MINOR: thread: fix a typo in the debug code
__spin_unlock() used to call RWLOCK_WRUNLOCK() to unlock in the
debug code. It's harmless as they happen to be identical.
2017-11-02 16:26:02 +01:00
William Lallemand
77c1197bfb MEDIUM: cache: deliver objects from cache
Lookup objects in the cache and deliver them using the http-request
action "cache-use".
2017-10-31 21:17:19 +01:00
William Lallemand
41db46035e MEDIUM: cache: configuration parsing and initialization
Parse a configuration section "cache" and a http-{response,request}
actions.

Example:

    listen frt
        mode http
        http-response cache-store foobar
        http-request cache-use foobar

    cache foobar
        total-max-size 4   # size in megabytes
2017-10-31 21:17:19 +01:00
Willy Tarreau
ffca736401 MINOR: h2: centralize all HTTP/2 protocol elements and constants
These constants from RFC7540 will be centralized into common/h2.h for
use by the future h2 mux and other places.
2017-10-31 18:03:24 +01:00
Willy Tarreau
1be4f3d8af MEDIUM: hpack: implement basic hpack encoding
For now it only supports literals and a bit of static header table
references for the 9 most common header field names (date, server,
content-type, content-length, last-modified, accept-ranges, etag,
cache-control, location).

A previous incarnation of this commit used to strip the forbidden H2
header names (connection, proxy-connection, upgrade, transfer-encoding,
keep-alive) but this is no longer the case as this filtering is irrelevant
to HPACK encoding and is specific to H2, so this will have to be done by
the caller.

It's quite not optimal but works fine enough to prepare some valid and
partially compressed responses during development.
2017-10-31 18:03:24 +01:00
Willy Tarreau
679790baae MINOR: hpack: implement the decoder
The decoder is now fully functional. It makes use of the dynamic header
table. Dynamic header table size updates are currently ignored, as our
initially advertised value is the highest we support. Strictly speaking,
the impact is that a client referencing a header field after such an
update wouldn't observe an error instead of the connection being dropped
if it was implemented.

Decoded header fields are copied into a target buffer in HTTP/1 format
using HTTP/1.1 as the version. The Host header field is automatically
appended if a ":authority" header field is present.

All decoded header fields can be displayed if the file is compiled with
DEBUG_HPACK.
2017-10-31 18:03:24 +01:00
Willy Tarreau
ce04094c4a MINOR: hpack: implement the header tables management
This code deals with header insertion, retrieval and eviction, as well
as with dynamic header table defragmentation. It is functional for use
as a decoder and was heavily tested in this context. There's still some
room for optimization (eg: the defragmentation code currently does it
in place using a memcpy).

Also for now the dynamic header table is allocated using malloc() while
a pool needs to be created instead.

This code was mostly imported from https://github.com/wtarreau/http2-exp
with "hpack_" prepended in front of most names to avoid risks of conflicts.
Some small cleanups and renamings were applied during the import. This
version must be considered more recent.

Some HPACK error codes were placed here (HPACK_ERR_*), not exactly because
they're needed by the decoder but they'll be needed by all callers. Maybe
a different location should be found.
2017-10-31 18:03:24 +01:00
Willy Tarreau
a004ade512 MINOR: hpack: implement the HPACK Huffman table decoder
The code was borrowed from the HPACK experimental implementations
available here :

    https://github.com/wtarreau/http2-exp

It contains the Huffman table as specified in RFC7541 Appendix B, and a
set of reverse tables used to decode a Huffman byte stream, and produced
by contrib/h2/gen-rht. The encoder is not finalized, it doesn't emit the
byte stream but this is not needed for now.
2017-10-31 18:03:24 +01:00
Willy Tarreau
436d333124 MEDIUM: connection: add a destroy callback
This callback will be used to release upper layers when a mux is in
use. Given that the mux can be asynchronously deleted, we need a way
to release the extra information such as the session.

This callback will be called directly by the mux upon releasing
everything and before the connection itself is released, so that
the callee can find its information inside the connection if needed.

The way it currently works is not perfect, and most likely this should
instead become a mux release callback, but for now we have no easy way
to add mux-specific stuff, and since there's one mux per connection,
it works fine this way.
2017-10-31 18:03:24 +01:00
Willy Tarreau
2c52a2b9ee MEDIUM: connection: make mux->detach() release the connection
For H2, only the mux's timeout or other conditions might cause a
release of the mux and the connection, no stream should be allowed
to kill such a shared connection. So a stream will only detach using
cs_destroy() which will call mux->detach() then free the cs.

For now it's only handled by mux_pt. The goal is that the data layer
never has to care about the connection, which will have to be released
depending on the mux's mood.
2017-10-31 18:03:24 +01:00
Willy Tarreau
6978db35e9 MINOR: connection: add cs_close() to close a conn_stream
This basically calls cs_shutw() followed by cs_shutr(). Both of them
are called in the most conservative mode so that any previous call is
still respected. The CS flags are cleared so that it can be reused
(this is important for connection retries when conn and CS are reused
without being reallocated).
2017-10-31 18:03:24 +01:00
Willy Tarreau
ecdb3fe9f4 MINOR: conn_stream: modify cs_shut{r,w} API to pass the desired mode
Now we can specify how we want to shutdown (drain vs reset, and normal
vs silent), and this propagates to the mux then the transport layer.
2017-10-31 18:03:23 +01:00
Willy Tarreau
79dadb5335 MINOR: conn_stream: new shutr/w status flags
In order to support all shutdown modes on the CS, we introduce the
following flags :
  CS_FL_SHRD : shut read, drain extra data
  CS_FL_SHRR : shut read, reset extra data
  CS_FL_SHWN : shut write, normal notification
  CS_FL_SHWS : shut write, silent mode (no notification)

And the following modes for shutr/shutw :

  CS_SHR_DRAIN, CS_SHR_RESET, CS_SHW_NORMAL, CS_SHW_SILENT.

Note: it's possible that we won't need to distinguish the two shutw
above as they're only an action.

For now they are not used.
2017-10-31 18:03:23 +01:00
Olivier Houchard
9aaf778129 MAJOR: connection : Split struct connection into struct connection and struct conn_stream.
All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.
2017-10-31 18:03:23 +01:00
Willy Tarreau
63dd75d934 MINOR: connection: introduce the conn_stream manipulation functions
Most of the functions dealing with conn_streams are here. They act at
the data layer and interact with the mux. For now they are not used yet
but everything builds.
2017-10-31 18:03:23 +01:00
Olivier Houchard
8e6147292e MINOR: mux: add more methods to mux_ops
We'll need to support reading/writing from both sides, with buffers and
pipes, as well as retrieving/updating flags.
2017-10-31 18:03:23 +01:00
Olivier Houchard
e2b40b9eab MINOR: connection: introduce conn_stream
This patch introduces a new struct conn_stream. It's the stream-side of
a multiplexed connection. A pool is created and destroyed on exit. For
now the conn_streams are not used at all.
2017-10-31 18:03:23 +01:00
Willy Tarreau
2e0b2b5f83 MEDIUM: session: use the ALPN token and proxy mode to select the mux
When an incoming connection is made on an HTTP mode frontend, the
session now looks up the mux to use based on the ALPN token and the
proxy mode. This will allow easier mux registration, and we don't
need to hard-code the mux_pt_ops anymore.
2017-10-31 18:03:23 +01:00
Willy Tarreau
2386be64ba MINOR: connection: implement alpn registration of muxes
Selecting a mux based on ALPN and the proxy mode will quickly become a
pain. This commit provides new functions to register/lookup a mux based
on the ALPN string and the proxy mode to make this easier. Given that
we're not supposed to support a wide range of muxes, the lookup should
not have any measurable performance impact.
2017-10-31 18:03:23 +01:00
Willy Tarreau
53a4766e40 MEDIUM: connection: start to introduce a mux layer between xprt and data
For HTTP/2 and QUIC, we'll need to deal with multiplexed streams inside
a connection. After quite a long brainstorming, it appears that the
connection interface to the existing streams is appropriate just like
the connection interface to the lower layers. In fact we need to have
the mux layer in the middle of the connection, between the transport
and the data layer.

A mux can exist on two directions/sides. On the inbound direction, it
instanciates new streams from incoming connections, while on the outbound
direction it muxes streams into outgoing connections. The difference is
visible on the mux->init() call : in one case, an upper context is already
known (outgoing connection), and in the other case, the upper context is
not yet known (incoming connection) and will have to be allocated by the
mux. The session doesn't have to create the new streams anymore, as this
is performed by the mux itself.

This patch introduces this and creates a pass-through mux called
"mux_pt" which is used for all new connections and which only
calls the data layer's recv,send,wake() calls. One incoming stream
is immediately created when init() is called on the inbound direction.
There should not be any visible impact.

Note that the connection's mux is purposely not set until the session
is completed so that we don't accidently run with the wrong mux. This
must not cause any issue as the xprt_done_cb function is always called
prior to using mux's recv/send functions.
2017-10-31 18:03:23 +01:00
Willy Tarreau
b29dc95a97 MINOR: threads: add a portable barrier for threads and non-threads
HA_BARRIER() is just a simple memory barrier to prevent the compiler
from reordering our code.
2017-10-31 18:01:18 +01:00
Willy Tarreau
2510f702f9 MINOR: h1: add a function to measure the trailers length
This is needed in the H2->H1 gateway so that we know how long the trailers
block is in chunked encoding. It returns the number of bytes, or 0 if some
are missing, or -1 in case of parse error.
2017-10-31 17:18:10 +01:00
Willy Tarreau
f65610a83d CLEANUP: threads: rename process_mask to thread_mask
It was a leftover from the last cleaning session; this mask applies
to threads and calling it process_mask is a bit confusing. It's the
same in fd, task and applets.
2017-10-31 16:06:06 +01:00
Olivier Houchard
d16bfe6c01 BUG/MINOR: dns: Fix SRV records with the new thread code.
srv_set_fqdn() may be called with the DNS lock already held, but tries to
lock it anyway. So, add a new parameter to let it know if it was already
locked or not;
2017-10-31 15:47:55 +01:00
Willy Tarreau
a5e0590b80 BUILD: stick-tables: silence an uninitialized variable warning
Commit 819fc6f ("MEDIUM: threads/stick-tables: handle multithreads on
stick tables") introduced a valid warning about an uninitialized return
value in stksess_kill_if_expired(). It just happens that this result is
never used, so let's turn the function back to void as previously.
2017-10-31 15:45:42 +01:00
Emeric Brun
6e0128630b BUG/MAJOR: threads/freq_ctr: fix lock on freq counters.
The wrong bit was set to keep the lock on freq counter update. And the read
functions were re-worked to use volatile.

Moreover, when a freq counter is updated, it is now rotated only if the current
counter is in the past (now.tv_sec > ctr->curr_sec). It is important with
threads because the current time (now) is thread-local. So, rounded to the
second, the time may vary by more or less 1 second. So a freq counter rotated by
one thread may be see 1 second in the future. In this case, it is updated but
not rotated.
2017-10-31 13:58:33 +01:00
Christopher Faulet
cd7879adc2 BUG/MEDIUM: threads: Run the poll loop on the main thread too
There was a flaw in the way the threads was created. the main one was just used
to create all the others and just wait to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.

This also fixes a bug about the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources was initialized. per-thread init/deinit callbacks were never called
for the main process. So, with nthread set to 1, some buffers remained
uninitialized.
2017-10-31 13:58:33 +01:00
Emeric Brun
9f0b458525 MEDIUM: threads/server: Use the server lock to protect health check and cli concurrency 2017-10-31 13:58:33 +01:00
Christopher Faulet
c2a89a6aed MINOR: threads/mailers: Add a lock to protect queues of email alerts 2017-10-31 13:58:33 +01:00
Christopher Faulet
cfda847643 MINOR: threads/checks: Add a lock to protect the pid list used by external checks 2017-10-31 13:58:33 +01:00
Christopher Faulet
6251902e67 MINOR: threads: Add thread-map config parameter in the global section
By default, no affinity is set for threads. To bind threads on CPU, you must
define a "thread-map" in the global section. The format is the same than the
"cpu-map" parameter, with a small difference. The process number must be
defined, with the same format than cpu-map ("all", "even", "odd" or a number
between 1 and 31/63).

A thread will be bound on the intersection of its mapping and the one of the
process on which it is attached. If the intersection is null, no specific bind
will be set for the thread.
2017-10-31 13:58:33 +01:00
Christopher Faulet
b2812a6240 MEDIUM: thread/dns: Make DNS thread-safe 2017-10-31 13:58:33 +01:00
Christopher Faulet
24289f2e07 MEDIUM: thread/spoe: Make the SPOE thread-safe
Because there is not migration mechanism yet, all runtime information about an
SPOE agent are thread-local and async exchanges with agents are disabled when we
have serveral threads. Howerver, pipelining is still available. So for now, the
thread part of the SPOE is pretty simple.
2017-10-31 13:58:33 +01:00
Thierry FOURNIER
738a6d76f6 MEDIUM: threads/tasks: Add lock around notifications
This patch add lock around some notification calls
2017-10-31 13:58:32 +01:00
Thierry FOURNIER
952939d294 MEDIUM: threads/xref: Convert xref function to a thread safe model
Ensure that the unlink is done safely between thread and that
the peer struct will not destroy between the usage of the peer.
2017-10-31 13:58:32 +01:00
Thierry FOURNIER
94a6bfce9b MEDIUM: threads/lua: Cannot acces to the socket if we try to access from another thread.
We have two y for nsuring that the data is not concurently manipulated:
 - locks
 - running task on the same thread.
locks are expensives, it is better to avoid it.

This patch cecks that the Lua task run on the same thread that
the stream associated to the coprocess.

TODO: in a next version, the error should be replaced by a yield
and thread migration request.
2017-10-31 13:58:32 +01:00
Thierry FOURNIER
61ba0e2b6d MEDIUM: threads/lua: Add locks around the Lua execution parts.
Note that the Lua processing is not really thread safe. It provides
heavy system which consists to add our own lock function in the Lua
code and recompile the library. This system will probably not accepted
by maintainers of various distribs.

Our main excution point of the Lua is the function lua_resume(). A
quick looking on the Lua sources displays a lua_lock() a the start
of function and a lua_unlock() at the end of the function. So I
conclude that the Lua thread safe mode just perform a mutex around
all execution. So I prefer to do this in the HAProxy code, it will be
easier for distro maintainers.

Note that the HAProxy lua functions rounded by the macro SET_SAFE_LJMP
and RESET_SAFE_LJMP manipulates the Lua stack, so it will be careful
to set mutex around these functions.
2017-10-31 13:58:32 +01:00
Christopher Faulet
8ca3b4bc46 MEDIUM: threads/compression: Make HTTP compression thread-safe 2017-10-31 13:58:32 +01:00
Christopher Faulet
71a6a8efaa MEDIUM: threads/filters: Add init/deinit callback per thread
Now, it is possible to define init_per_thread and deinit_per_thread callbacks to
deal with ressources allocation for each thread.

This is the filter responsibility to deal with concurrency. This is also the
filter responsibility to know if HAProxy is started with some threads. A good
way to do so is to check "global.nbthread" value. If it is greater than 1, then
_per_thread callbacks will be called.
2017-10-31 13:58:32 +01:00
Christopher Faulet
e95f2c3ef5 MEDIUM: thread/vars: Make vars thread-safe
A RW lock has been added to the vars structure to protect each list of
variables. And a global RW lock is used to protect registered names.

When a varibable is fetched, we duplicate sample data because the variable could
be modified by another thread.
2017-10-31 13:58:32 +01:00
Christopher Faulet
94b712337d MEDIUM: threads/freq_ctr: Make the frequency counters thread-safe
When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by reseting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
2017-10-31 13:58:32 +01:00
Emeric Brun
b5997f740b MAJOR: threads/map: Make acls/maps thread safe
locks have been added in pat_ref and pattern_expr structures to protect all
accesses to an instance of on of them. Moreover, a global lock has been added to
protect the LRU cache used for pattern matching.

Patterns are now duplicated after a successfull matching, to avoid modification
by other threads when the result is used.

Finally, the function reloading a pattern list has been modified to be
thread-safe.
2017-10-31 13:58:32 +01:00
Emeric Brun
821bb9beaa MAJOR: threads/ssl: Make SSL part thread-safe
First, OpenSSL is now initialized to be thread-safe. This is done by setting 2
callbacks. The first one is ssl_locking_function. It handles the locks and
unlocks. The second one is ssl_id_function. It returns the current thread
id. During the init step, we create as much as R/W locks as needed, ie the
number returned by CRYPTO_num_locks function.

Next, The reusable SSL session in the server context is now thread-local.

Shctx is now also initialized if HAProxy is started with several threads.

And finally, a global lock has been added to protect the LRU cache used to store
generated certificates. The function ssl_sock_get_generated_cert is now
deprecated because the retrieved certificate can be removed by another threads
in same time. Instead, a new function has been added,
ssl_sock_assign_generated_cert. It must be used to search a certificate in the
cache and set it immediatly if found.
2017-10-31 13:58:32 +01:00
Emeric Brun
6b35e9bfbf MEDIUM: threads/stream: Make streams list thread safe
Adds a global lock to protect the full streams list used to dump
sessions on stats socket.
2017-10-31 13:58:32 +01:00
Emeric Brun
a1dd243adb MAJOR: threads/buffer: Make buffer wait queue thread safe
Adds a global lock to protect the buffer wait queue.
2017-10-31 13:58:31 +01:00
Emeric Brun
80527f5bb6 MAJOR: threads/peers: Make peers thread safe
A lock is used to protect accesses to a peer structure.

A the lock is taken in the applet handler when the peer is identified
and released living the applet handler.

In the scheduling task for peers section, the lock is taken for every
listed peer and released at the end of the process task function.

The peer 'force shutdown' function was also re-worked.
2017-10-31 13:58:31 +01:00
Emeric Brun
1138fd0c57 MAJOR: threads/applet: Handle multithreading for applets
A global lock has been added to protect accesses to the list of active
applets. A process mask has also been added on each applet. Like for FDs and
tasks, it is used to know which threads are allowed to process an
applet. Because applets are, most of time, linked to a session, it should be
sticky on the same thread. But in all cases, it is the responsibility of the
applet handler to lock what have to be protected in the applet context.
2017-10-31 13:58:31 +01:00
Emeric Brun
272e252e61 MINOR: threads/regex: Change Regex trash buffer into a thread local variable 2017-10-31 13:58:31 +01:00
Emeric Brun
8c1aaa201a MEDIUM: threads/http: Make http_capture_bad_message thread-safe
This is done by passing the right stream's proxy (the frontend or the backend,
depending on the context) to lock the error snapshot used to store the error
info.
2017-10-31 13:58:31 +01:00
Emeric Brun
819fc6f563 MEDIUM: threads/stick-tables: handle multithreads on stick tables
The stick table API was slightly reworked:

A global spin lock on stick table was added to perform lookup and
insert in a thread safe way. The handling of refcount on entries
is now handled directly by stick tables functions under protection
of this lock and was removed from the code of callers.

The "stktable_store" function is no more externalized and users should
now use "stktable_set_entry" in any case of insertion. This last one performs
a lookup followed by a store if not found. So the code using "stktable_store"
was re-worked.

Lookup, and set_entry functions automatically increase the refcount
of the returned/stored entry.

The function "sticktable_touch" was renamed "sticktable_touch_local"
and is now able to decrease the refcount if last arg is set to true. It
is allowing to release the entry without taking the lock twice.

A new function "sticktable_touch_remote" is now used to insert
entries coming from remote peers at the right place in the update tree.
The code of peer update was re-worked to use this new function.
This function is also able to decrease the refcount if wanted.

The function "stksess_kill" also handle a parameter to decrease
the refcount on the entry.

A read/write lock is added on each entry to protect the data content
updates of the entry.
2017-10-31 13:58:31 +01:00
Christopher Faulet
5b51755aef MEDIUM: threads/lb: Make LB algorithms (lb_*.c) thread-safe
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables releated to lb.

The only significant change is about lb_map. Because the servers status are
updated in the sync-point, we can call recalc_server_map function synchronously
in map_set_server_status_up/down function.
2017-10-31 13:58:31 +01:00
Christopher Faulet
5d42e099c5 MINOR: threads/server: Add a lock to deal with insert in updates_servers list
This list is used to save changes on the servers state. So when serveral threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has be moved in the sync-point. So this is useless
to lock it at this step because the sync-point is a protected area by iteself.
2017-10-31 13:58:31 +01:00
Christopher Faulet
29f77e846b MEDIUM: threads/server: Add a lock per server and atomically update server vars
The server's lock is use, among other things, to lock acces to the active
connection list of a server.
2017-10-31 13:58:31 +01:00
Christopher Faulet
40a007cf2a MEDIUM: threads/server: Make connection list (priv/idle/safe) thread-safe
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
2017-10-31 13:58:30 +01:00
Christopher Faulet
ff8abcd31d MEDIUM: threads/proxy: Add a lock per proxy and atomically update proxy vars
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
2017-10-31 13:58:30 +01:00
Christopher Faulet
8d8aa0d681 MEDIUM: threads/listeners: Make listeners thread-safe
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
2017-10-31 13:58:30 +01:00
Christopher Faulet
b79a94c9f3 MEDIUM: threads/signal: Add a lock to make signals thread-safe
A global lock has been added to protect the signal processing. So when a signal
it triggered, only one thread will catch it.
2017-10-31 13:58:30 +01:00
Emeric Brun
c60def8368 MAJOR: threads/task: handle multithread on task scheduler
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.

For many tasks, all threads are granted. And this must be your first intension
when you create a new task, else you have a good reason to make a task sticky on
some threads. This is then the responsibility to the process callback to lock
what have to be locked in the task context.

Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
2017-10-31 13:58:30 +01:00
Christopher Faulet
36716a7fec MEDIUM: threads/fd: Initialize the process mask during the call to fd_insert
Listeners will allow any threads to process the corresponding fd. But for other
FDs, we limit the processing to the current thread.
2017-10-31 13:58:30 +01:00
Christopher Faulet
a7c5d43085 MINOR: threads/fd: Add a mask of threads allowed to process on each fd in fdtab array 2017-10-31 13:58:30 +01:00
Christopher Faulet
d4604adeaa MAJOR: threads/fd: Make fd stuffs thread-safe
Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and poll information. In addition, a lock for each entry in the fdtab array has
been added to protect all accesses to a specific FD or its information.

For pollers, according to the poller, the way to manage the concurrency is
different. There is a poller loop on each thread. So the set of monitored FDs
may need to be protected. epoll and kqueue are thread-safe per-se, so there few
things to do to protect these pollers. This is not possible with select and
poll, so there is no sharing between the threads. The poller on each thread is
independant from others.

Finally, per-thread init/deinit functions are used for each pollers and for FD
part for manage thread-local ressources.

Now, you must be carefull when a FD is created during the HAProxy startup. All
update on the FD state must be made in the threads context and never before
their creation. This is mandatory because fd_updt array is thread-local and
initialized only for threads. Because there is no pollers for the main one, this
array remains uninitialized in this context. For this reason, listeners are now
enabled in run_thread_poll_loop function, just like the worker pipe.
2017-10-31 13:58:30 +01:00
Christopher Faulet
b349e48ede MEDIUM: threads/pool: Make pool thread-safe by locking all access to a pool
A lock has been added for each memory pool. It is used to protect the pool
during allocations and releases. It is also used when pool info are dumped.
2017-10-31 13:58:30 +01:00
Christopher Faulet
f8188c69fa MEDIUM: threads/logs: Make logs thread-safe
log buffers and static variables used in log functions are now thread-local. So
there is no need to lock anything to log messages. Moreover, per-thread
init/deinit functions are now used to initialize these buffers.
2017-10-31 13:58:30 +01:00
Christopher Faulet
9a65571781 MEDIUM: threads/time: Many global variables from time.h are now thread-local 2017-10-31 13:58:30 +01:00
Christopher Faulet
6adad11283 MEDIUM: threads/chunks: Transform trash chunks in thread-local variables
So, per-thread init/deinit functions are registered to allocate/release them.
2017-10-31 13:58:30 +01:00
Christopher Faulet
339fff8a18 MEDIUM: threads: Adds a set of functions to handle sync-point
A sync-point is a protected area where you have the warranty that no concurrency
access is possible. It is implementated as a thread barrier to enter in the
sync-point and another one to exit from it. Inside the sync-point, all threads
that must do some syncrhonous processing will be called one after the other
while all other threads will wait. All threads will then exit from the
sync-point at the same time.

A sync-point will be evaluated only when necessary because it is a costly
operation. To limit the waiting time of each threads, we must have a mechanism
to wakeup all threads. This is done with a pipe shared by all threads. By
writting in this pipe, we will interrupt all threads blocked on a poller. The
pipe is then flushed before exiting from the sync-point.
2017-10-31 13:58:29 +01:00
Christopher Faulet
be0faa2e47 MINOR: threads: Add nbthread parameter
It is only parsed and initialized for now. It will be used later. This parameter
is only available when support for threads was built in.
2017-10-31 13:58:29 +01:00
Christopher Faulet
415f611ff4 MINOR: threads: Add mechanism to register per-thread init/deinit functions
hap_register_per_thread_init and hap_register_per_thread_deinit functions has
been added to register functions to do, for each thread, respectively, some
initialization and deinitialization. These functions are added in the global
lists per_thread_init_list and per_thread_deinit_list.

These functions are called only when HAProxy is started with more than 1 thread
(global.nbthread > 1).
2017-10-31 13:58:29 +01:00
Christopher Faulet
1a2b56ea8e MEDIUM: threads: Add hathreads header file
This file contains all functions and macros used to deal with concurrency in
HAProxy. It contains all high-level function to do atomic operation
(HA_ATOMIC_*). Note, for now, we rely on "__atomic" GCC builtins to do atomic
operation. So HAProxy can be compiled with the thread support iff these builtins
are available.

It also contains wrappers around plocks to use spin or read/write locks. These
wrappers are used to abstract the internal representation of the locking system
and to add information to help debugging, when compiled with suitable
options.

To add extra info on locks, you need to add DEBUG=-DDEBUG_THREAD or
DEBUG=-DDEBUG_FULL compilation option. In addition to timing info on locks, we
keep info on where a lock was acquired the last time (function name, file and
line). There are also the thread id and a flag to know if it is still locked or
not. This will be useful to debug deadlocks.
2017-10-31 13:58:23 +01:00
Emeric Brun
7122ab31b1 MINOR: threads: Add atomic-ops and plock includes in import dir
atomic-ops header contains some low-level functions to do atomic
operations. These operations are used by the progressive locks (plock).
2017-10-31 11:36:13 +01:00
Christopher Faulet
e9bd686b68 MINOR: threads: Add THREAD_LOCAL macro
When compiled with threads support, this marco is set to __thread. Else it is
empty.
2017-10-31 11:36:13 +01:00
Christopher Faulet
93a518f02a MINOR: standard: Add memvprintf function
Now memprintf relies on memvprintf. This new function does exactly what
memprintf did before, but it must be called with a va_list instead of a variable
number of arguments. So there is no change for every functions using
memprintf. But it is now also possible to have same functionnality from any
function with variadic arguments.
2017-10-31 11:36:12 +01:00
Christopher Faulet
0108bb3e40 MEDIUM: mailers: Init alerts during conf parsing and refactor their processing
Email alerts relies on checks to send emails. The link between a mailers section
and a proxy was resolved during the configuration parsing, But initialization was
done when the first alert is triggered. This implied memory allocations and
tasks creations. With this patch, everything is now initialized during the
configuration parsing. So when an alert is triggered, only the memory required
by this alert is dynamically allocated.

Moreover, alerts processing had a flaw. The task handler used to process alerts
to be sent to the same mailer, process_email_alert, was designed to give back
the control to the scheduler when an alert was sent. So there was a delay
between the sending of 2 consecutives alerts (the min of
"proxy->timeout.connect" and "mailer->timeout.mail"). To fix this problem, now,
we try to process as much queued alerts as possible when the task is woken up.
2017-10-31 11:36:12 +01:00
Christopher Faulet
67957bd59e MAJOR: dns: Refactor the DNS code
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a uniq patch. Here are all changes made on the DNS.

First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connection are kept alive. So there is no needs
anymore to reopen them after HAProxy fork. Connections used to send DNS queries
will be opened on demand.

Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. wait and curr list of requesters, for a DNS
resolution, have been replaced by a uniq list. And Finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.

srv_set_fqdn function has been simplified. Now, there is only 1 way to set the
server's FQDN, independently it is done by the CLI or when a SRV record is
resolved.

The static DNS resolutions pool has been replaced by a dynamoc pool. The part
has been modified by Baptiste Assmann.

The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected. Especially
hold.valid. The default frequency to wake up a resolvers is now configurable
using "timeout resolve" parameter.

Now, as documented, as long as invalid repsonses are received, we really wait
all name servers responses before retrying.

As far as possible, resources allocated during DNS configuration parsing are
releases when HAProxy is shutdown.

Beside all these changes, the code has been cleaned to ease code review and the
doc has been updated.
2017-10-31 11:36:12 +01:00
Christopher Faulet
344c4ab6a9 MEDIUM: spoe/rules: Process "send-spoe-group" action
The messages processing is done using existing functions. So here, the main task
is to find the SPOE engine to use. To do so, we loop on all filter instances
attached to the stream. For each, we check if it is a SPOE filter and, if yes,
if its name is the one used to declare the "send-spoe-group" action.

We also take care to return an error if the action processing is interrupted by
HAProxy (because of a timeout or an error at the HAProxy level). This is done by
checking if the flag ACT_FLAG_FINAL is set.

The function spoe_send_group is the action_ptr callback ot
2017-10-31 11:36:12 +01:00
Christopher Faulet
c718b82dfe MINOR: spoe: Add a type to qualify the message list during encoding
Because we can have messages chained by event or by group, we need to have a way
to know which kind of list we manipulate during the encoding. So 2 types of list
has been added, SPOE_MSGS_BY_EVENT and SPOE_MSGS_BY_GROUP. And the right type is
passed when spoe_encode_messages is called.
2017-10-31 11:36:12 +01:00
Christopher Faulet
76c09ef8de MEDIUM: spoe/rules: Add "send-spoe-group" action for tcp/http rules
This action is used to trigger sending of a group of SPOE messages. To do so,
the SPOE engine used to send messages must be defined, as well as the SPOE group
to send. Of course, the SPOE engine must refer to an existing SPOE filter. If
not engine name is provided on the SPOE filter line, the SPOE agent name must be
used. For example:

   http-request send-spoe-group my-engine some-group

This action is available for "tcp-request content", "tcp-response content",
"http-request" and "http-response" rulesets. It cannot be used for tcp
connection/session rulesets because actions for these rulesets cannot yield.

For now, the action keyword is parsed and checked. But it does nothing. Its
processing will be added in another patch.
2017-10-31 11:36:12 +01:00
Christopher Faulet
11610f3b5a MEDIUM: spoe: Parse new "spoe-group" section in SPOE config file
For now, this section is only parsed. It should have the following format:

    spoe-group <grp-name>
      messages <msg-name> ...

And then SPOE groups must be referenced in spoe-agent section:

    spoe-agnt <name>
        ...
	groups <grp-name> ...

The purpose of these groups is to trigger messages sending from TCP or HTTP
rules, directly from HAProxy configuration, and not on specific event. This part
will be added in another patch.

It is important to note that a message belongs at most to a group.
2017-10-31 11:36:12 +01:00
Christopher Faulet
7ee8667c99 MINOR: spoe: Check uniqness of SPOE engine names during config parsing
The engine name is now kept in "spoe_config" struture. Because a SPOE filter can
be declared without engine name, we use the SPOE agent name by default. Then,
its uniqness is checked against all others SPOE engines configured for the same
proxy.

  * TODO: Add documentation
2017-10-31 11:36:12 +01:00
Christopher Faulet
57583e474e MEDIUM: spoe: Add support of ACLS to enable or disable sending of SPOE messages
Now, it is possible to conditionnaly send a SPOE message by adding an ACL-based
condition on the "event" line, in a "spoe-message" section. Here is the example
coming for the SPOE documentation:

    spoe-message get-ip-reputation
        args ip=src
        event on-client-session if ! { src -f /etc/haproxy/whitelist.lst }

To avoid mixin with proxy's ACLs, each SPOE message has its private ACL list. It
possible to declare named ACLs in "spoe-message" section, using the same syntax
than for proxies. So we can rewrite the previous example to use a named ACL:

    spoe-message get-ip-reputation
        args ip=src
	acl ip-whitelisted src -f /etc/haproxy/whitelist.lst
        event on-client-session if ! ip-whitelisted

ACL-based conditions are executed in the context of the stream that handle the
client and the server connections.
2017-10-31 11:36:12 +01:00
Christopher Faulet
1b421eab87 MINOR: acl: Pass the ACLs as an explicit parameter of build_acl_cond
So it is possible to use anothers ACLs to build ACL conditions than those of
proxies.
2017-10-31 11:36:12 +01:00
Christopher Faulet
78880fb196 MINOR: action: Add function to check rules using an action ACT_ACTION_TRK_*
The function "check_trk_action" has been added to find and check the target
table for rules using an action ACT_ACTION_TRK_*.
2017-10-31 11:36:12 +01:00
Christopher Faulet
6d950b92cd MINOR: action: Add a function pointer in act_rule struct to check its validity
It is possible to define the field "act_rule.check_ptr" if you want to check the
validity of a tcp/http rule.
2017-10-31 11:36:12 +01:00
Christopher Faulet
4fce0d8447 MINOR: action: Use trk_idx instead of tcp/http_trk_idx
So tcp_trk_idx and http_trk_idx have been removed.
2017-10-31 11:36:12 +01:00
Christopher Faulet
7421b14c22 MINOR: action: Add trk_idx inline function
It returns tracking index corresponding to an action ACT_ACTION_TRK_SC*. It will
replace http_trk_idx and tcp_trk_idx.
2017-10-31 11:36:12 +01:00
Willy Tarreau
d22e83abd9 MINOR: h1: store the status code in the H1 message
It was painful not to have the status code available, especially when
it was computed. Let's store it and ensure we don't claim content-length
anymore on 1xx, only 0 body bytes.
2017-10-31 08:43:29 +01:00
William Lallemand
a3c77cfdd7 MINOR: shctx: rename lock functions
Rename lock functions to shctx_lock() and shctx_unlock() to be coherent
with the new API.
2017-10-31 03:49:44 +01:00
William Lallemand
4f45bb9c46 MEDIUM: shctx: separate ssl and shctx
This patch reorganize the shctx API in a generic storage API, separating
the shared SSL session handling from its core.

The shctx API only handles the generic data part, it does not know what
kind of data you use with it.

A shared_context is a storage structure allocated in a shared memory,
allowing its usage in a multithread or a multiprocess context.

The structure use 2 linked list, one containing the available blocks,
and another for the hot locked blocks. At initialization the available
list is filled with <maxblocks> blocks of size <blocksize>. An <extra>
space is initialized outside the list in case you need some specific
storage.

+-----------------------+--------+--------+--------+--------+----
| struct shared_context | extra  | block1 | block2 | block3 | ...
+-----------------------+--------+--------+--------+--------+----
                                 <--------  maxblocks  --------->
                                            * blocksize

The API allows to store content on several linked blocks. For example,
if you allocated blocks of 16 bytes, and you want to store an object of
60 bytes, the object will be allocated in a row of 4 blocks.

The API was made for LRU usage, each time you get an object, it pushes
the object at the end of the list. When it needs more space, it discards

The functions name have been renamed in a more logical way, the part
regarding shctx have been prefixed by shctx_ and the functions for the
shared ssl session cache have been prefixed by sh_ssl_sess_.
2017-10-31 03:49:40 +01:00
William Lallemand
ed0b5ad1aa REORG: shctx: move ssl functions to ssl_sock.c
Move the ssl callback functions of the ssl shared session cache to
ssl_sock.c. The shctx functions still needs to be separated of the ssl
tree and data.
2017-10-31 03:48:39 +01:00
William Lallemand
3f85c9aec8 MEDIUM: shctx: allow the use of multiple shctx
Add an shctx argument which permits to create new independent shctx
area.
2017-10-31 03:44:11 +01:00
William Lallemand
24a7a75be6 REORG: shctx: move lock functions and struct
Move locks functions to proto/shctx.h, and structures to types/shctx.h
in order to simplify the split ssl/shctx.
2017-10-31 03:44:11 +01:00
William Lallemand
83215a44b8 MEDIUM: lists: list_for_each_entry{_safe}_from functions
Add list_for_each_entry_from and list_for_each_entry_safe_from which
allows to iterate in a list starting from a specific item.
2017-10-31 03:44:11 +01:00
Emmanuel Hocdet
01da571e21 MINOR: merge ssl_sock_get calls for log and ppv2
Merge ssl_sock_get_version and ssl_sock_get_proto_version.
Change ssl_sock_get_cipher to be used in ppv2.
2017-10-27 19:32:36 +02:00
Emmanuel Hocdet
58118b43b1 MINOR: update proxy-protocol-v2 #define
Report #define from doc/proxy-protocol.txt.
2017-10-27 19:32:36 +02:00
Olivier Houchard
9679ac997a MINOR: ssl: Don't abuse ssl_options.
A bind_conf does contain a ssl_bind_conf, which already has a flag to know
if early data are activated, so use that, instead of adding a new flag in
the ssl_options field.
2017-10-27 19:26:52 +02:00
Olivier Houchard
c2aae74f01 MEDIUM: ssl: Handle early data with OpenSSL 1.1.1
When compiled with Openssl >= 1.1.1, before attempting to do the handshake,
try to read any early data. If any early data is present, then we'll create
the session, read the data, and handle the request before we're doing the
handshake.

For this, we add a new connection flag, CO_FL_EARLY_SSL_HS, which is not
part of the CO_FL_HANDSHAKE set, allowing to proceed with a session even
before an SSL handshake is completed.

As early data do have security implication, we let the origin server know
the request comes from early data by adding the "Early-Data" header, as
specified in this draft from the HTTP working group :

    https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-replay
2017-10-27 10:54:05 +02:00
Olivier Houchard
51a76d84e4 MINOR: http: Mark the 425 code as "Too Early".
This adds a new status code for use with the "http-request deny" ruleset.
The use case for this code is currently handled by this draft dedicated
to 0-RTT processing :

   https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-replay
2017-10-27 10:53:32 +02:00
Thierry FOURNIER
31904278dc MINOR: hlua: Add regex class
This patch simply brings HAProxy internal regex system to the Lua API.
Lua doesn't embed regexes, now it inherits from the regexes compiled
with haproxy.
2017-10-27 10:30:44 +02:00
William Lallemand
48b4bb4b09 MEDIUM: cfgparse: post parsing registration
Allow to register a function which will be called after the
configuration file parsing, at the end of the check_config_validity().

It's useful fo checking dependencies between sections or for resolving
keywords, pointers or values.
2017-10-27 10:15:56 +02:00
William Lallemand
d2ff56d2a3 MEDIUM: cfgparse: post section callback
This commit implements a post section callback. This callback will be
used at the end of a section parsing.

Every call to cfg_register_section must be modified to use the new
prototype:

    int cfg_register_section(char *section_name,
                             int (*section_parser)(const char *, int, char **, int),
                             int (*post_section_parser)());
2017-10-27 10:14:51 +02:00
Willy Tarreau
145746c2d5 MINOR: buffer: add the buffer input manipulation functions
We used to have bo_{get,put}_{chr,blk,str} to retrieve/send data to
the output area of a buffer, but not the equivalent ones for the input
area. This will be needed to copy uploaded data frames in HTTP/2.
2017-10-27 10:00:17 +02:00
Willy Tarreau
7b271b214f MEDIUM: connection: make use of CO_FL_WILL_UPDATE in conn_sock_shutw()
This one may be called by upper layers (eg: si_shutw()) or lower layers
(si_shutw() as well during stream_int_notify()) so we want it to take
care of updating the connection's flags if it's not going to be done
by the caller.
2017-10-25 15:52:41 +02:00
Willy Tarreau
916e12dcfb MINOR: connection: add flag CO_FL_WILL_UPDATE to indicate when updates are granted
In transport-layer functions (snd_buf/rcv_buf), it's very problematic
never to know if polling changes made to the connection will be propagated
or not. This has led to some conn_cond_update_polling() calls being placed
at a few places to cover both the cases where the function is called from
the upper layer and when it's called from the lower layer. With the arrival
of the MUX, this becomes even more complicated, as the upper layer will not
have to manipulate anything from the connection layer directly and will not
have to push such updates directly either. But the snd_buf functions will
need to see their updates committed when called from upper layers.

The solution here is to introduce a connection flag set by the connection
handler (and possibly any other similar place) indicating that the caller
is committed to applying such changes on return. This way, the called
functions will be able to apply such changes by themselves before leaving
when the flag is not set, and the upper layer will not have to care about
that anymore.
2017-10-25 15:52:41 +02:00
Willy Tarreau
bc97cc4fd1 MINOR: connection: move the cleanup of flag CO_FL_WAIT_ROOM
This flag is only used when reading using splicing for now, and is only
set when a pipe full condition is met, so we can simplify its reset
condition in conn_refresh_polling_flags so that it's cleared at the
same time as the other ones, only when the control layer is ready.

This flag could be used more, to mark that a buffer full condition was
met with any receive method in order to simplify polling management.
This should probably be revisited after 1.8.
2017-10-25 15:52:41 +02:00
Dragan Dosen
7389dd086c IMPORT: sha1: import SHA1 functions
This is based on the git SHA1 implementation and optimized to do word
accesses rather than byte accesses, and to avoid unnecessary copies into
the context array.
2017-10-25 04:45:48 +02:00
Emmanuel Hocdet
019f9b10ef MINOR: ssl: build with recent BoringSSL library
BoringSSL switch OPENSSL_VERSION_NUMBER to 1.1.0 for compatibility.
Fix BoringSSL call and openssl-compat.h/#define occordingly.
This will not break openssl/libressl compat.
2017-10-24 19:57:16 +02:00
Willy Tarreau
1296382d0b CONTRIB: trace: add the possibility to place trace calls in the code
Now any call to trace() in the code will automatically appear interleaved
with the call sequence and timestamped in the trace file. They appear with
a '#' on the 3rd argument (caller's pointer) in order to make them easy to
spot. If the trace functionality is not used, a dmumy weak function is used
instead so that it doesn't require to recompile every time traces are
enabled/disabled.

The trace decoder knows how to deal with these messages, detects them and
indents them similarly to the currently traced function. This can be used
to print function arguments for example.

Note that we systematically flush the log when calling trace() to ensure we
never miss important events, so this may impact performance.

The trace() function uses the same format as printf() so it should be easy
to setup during debugging sessions.
2017-10-24 19:54:25 +02:00
Willy Tarreau
cbc6524a19 MINOR: connection: remove conn_force_close()
Now only conn_full_close() will be used. It will become more obvious
when the tracking is in place or not and will make it easier to
convert remaining call places to conn_streams.
2017-10-22 09:54:19 +02:00
Willy Tarreau
3b737c9894 MINOR: stream-int: use conn_full_close() instead of conn_force_close()
We simply disable tracking before calling it.
2017-10-22 09:54:18 +02:00
Willy Tarreau
dc42acddb6 MINOR: connection: add conn_stop_tracking() to disable tracking
This will be used before conn_full_close() instead of using
conn_force_close(), resulting in a clearer exit path in various
situations.
2017-10-22 09:54:16 +02:00
Willy Tarreau
6a0a80adaf MINOR: connection: ensure conn_ctrl_close() also resets the fd
The connection's fd was reset to DEAD_FD_MAGIC on conn_force_close()
but not on conn_full_close(), which is a bit strange. Let's do it on
both.
2017-10-22 09:54:16 +02:00
Willy Tarreau
f9ce57e86c MEDIUM: connection: make conn_sock_shutw() aware of lingering
Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.
2017-10-22 09:54:16 +02:00
Olivier Houchard
1a0545f3d7 REORG: connection: rename CO_FL_DATA_* -> CO_FL_XPRT_*
These flags are not exactly for the data layer, they instead indicate
what is expected from the transport layer. Since we're going to split
the connection between the transport and the data layers to insert a
mux layer, it's important to have a clear idea of what each layer does.

All function conn_data_* used to manipulate these flags were renamed to
conn_xprt_*.
2017-10-22 09:54:15 +02:00
Willy Tarreau
794f9af894 MEDIUM: h1: reimplement the http/1 response parser for the gateway
The HTTP/2->HTTP/1 gateway will need to process HTTP/1 responses. We
cannot sanely rely on the HTTP/1 txn to parse a response because :

  1) responses generated by haproxy such as error messages, redirects,
     stats or Lua are neither parsed nor indexed ; this could be
     addressed over the long term but will take time.

  2) the http txn is useless to parse the body : the states present there
     are only meaningful to received bytes (ie next bytes to parse) and
     not at all to sent bytes. Thus chunks cannot be followed at all.
     Even when implementing this later, it's unsure whether it will be
     possible when dealing with compression.

So using the HTTP txn is now out of the equation and the only remaining
solution is to call an HTTP/1 message parser. We already have one, it was
slightly modified to avoid keeping states by benefitting from the fact
that the response was produced by haproxy and this is entirely available.
It assumes the following rules are true, or that incuring an extra cost
to work around them is acceptable :
  - the response buffer is read-write and supports modifications in place

  - headers sent through / by haproxy are not folded. Folding is still
    implemented by replacing CR/LF/tabs/spaces with spaces if encountered

  - HTTP/0.9 responses are never sent by haproxy and have never been
    supported at all

  - haproxy will not send partial responses, the whole headers block will
    be sent at once ; this means that we don't need to keep expensive
    states and can afford to restart the parsing from the beginning when
    facing a partial response ;

  - response is contiguous (does not wrap). This was already the case
    with the original parser and ensures we can safely dereference all
    fields with (ptr,len)

The parser replaces all of the http_msg fields that were necessary with
local variables. The parser is not called on an http_msg but on a string
with a start and an end. The HTTP/1 states were reused for ease of use,
though the request-specific ones have not been implemented for now. The
error position and error state are supported and optional ; these ones
may be used later for bug hunting.

The parser issues the list of all the headers into a caller-allocated
array of struct ist.

The content-length/transfer-encoding header are checked and the relevant
info fed the h1 message state (flags + body_len).
2017-10-22 09:54:15 +02:00
Willy Tarreau
306924ecb8 MINOR: http: add very simple header management based on double strings
This will be used initially by the hpack table and hopefully later by a
new native http processor. These headers are made of name and value, both
an immediate string (ie: pointer and length).
2017-10-22 09:54:14 +02:00
Willy Tarreau
4093a4dc01 MINOR: h1: add struct h1m for basic HTTP/1 messages
This one is much simpler than http_msg and will be used in the HTTP
parsers involved in the H2 to H1 gateway.
2017-10-22 09:54:14 +02:00
Willy Tarreau
b28925675d MEDIUM: http: make the chunk crlf parser only depend on the buffer
The chunk crlf parser used to depend on the channel and on the HTTP
message, eventhough it's not really needed. Let's remove this dependency
so that it can be used within the H2 to H1 gateway.

As part of this small API change, it was renamed to h1_skip_chunk_crlf()
to mention that it doesn't depend on http_msg anymore.
2017-10-22 09:54:14 +02:00
Willy Tarreau
e56cdd3629 MEDIUM: http: make the chunk size parser only depend on the buffer
The chunk parser used to depend on the channel and on the HTTP message
but it's not really needed as they're only used to retrieve the buffer
as well as to return the number of bytes parsed and the chunk size.

Here instead we pass the (few) relevant information in arguments so that
the function may be reused without a channel nor an HTTP message (ie
from the H2 to H1 gateway).

As part of this API change, it was renamed to h1_parse_chunk_size() to
mention that it doesn't depend on http_msg anymore.
2017-10-22 09:54:14 +02:00
Willy Tarreau
8740c8b1b2 REORG: http: move the HTTP/1 header block parser to h1.c
Since it still depends on http_msg, it was not renamed yet.
2017-10-22 09:54:13 +02:00
Willy Tarreau
db4893d6a4 REORG: http: move the HTTP/1 chunk parser to h1.{c,h}
Functions http_parse_chunk_size(), http_skip_chunk_crlf() and
http_forward_trailers() were moved to h1.h and h1.c respectively so
that they can be called from outside. The parts that were inline
remained inline as it's critical for performance (+41% perf
difference reported in an earlier test). For now the "http_" prefix
remains in their name since they still depend on the http_msg type.
2017-10-22 09:54:13 +02:00