At a number of places, bitmasks are used for process affinity and to map
listeners to processes. Each time, 1UL<<(relative_pid-1) is used. Let's
create a "pid_bit" variable corresponding to this value to clean this up.
The first pid in the pidfile is now the parent, which is more convenient for
supervising the processes.
You can now reload haproxy in master-worker mode with a convenient command
like: kill -USR2 $(head -1 /tmp/haproxy.pid)
This patch introduces a new struct conn_stream. It's the stream-side of
a multiplexed connection. A pool is created and destroyed on exit. For
now the conn_streams are not used at all.
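A rough sketch of what such a structure and its pool could look like; only the
struct name and the pool's lifecycle come from the commit, the field list and
pool variable name are assumptions:

    struct conn_stream {
        struct connection *conn;   /* the underlying multiplexed connection */
        void *data;                /* opaque pointer for the upper layer */
    };

    struct pool_head *pool_conn_stream;

    /* at startup: create the pool (create_pool is haproxy's pool API) */
    pool_conn_stream = create_pool("conn_stream",
                                   sizeof(struct conn_stream), MEM_F_SHARED);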
There was a flaw in the way the threads were created. The main one was only
used to create all the others and then just waited for them to exit. Now, it
is used to run a poll loop too, so we only create nbthread-1 threads.
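A simplified sketch of the new startup sequence, assuming pthreads and the
run_thread_poll_loop function mentioned later in this series; the thread and
tid arrays are illustrative:

    pthread_t thread[MAX_THREADS];
    int i;

    /* create only nbthread-1 extra threads ... */
    for (i = 1; i < global.nbthread; i++)
        pthread_create(&thread[i], NULL, &run_thread_poll_loop, &tid[i]);

    /* ... because the main thread now runs a poll loop itself */
    run_thread_poll_loop(&tid[0]);

    for (i = 1; i < global.nbthread; i++)
        pthread_join(thread[i], NULL);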
This also fixes a bug in the compression filter when there is only 1 thread
(nbthread == 1 or no thread support). The bug was in the way thread-local
resources were initialized: per-thread init/deinit callbacks were never called
for the main process. So, with nbthread set to 1, some buffers remained
uninitialized.
By default, no affinity is set for threads. To bind threads to CPUs, you must
define a "thread-map" in the global section. The format is the same as for the
"cpu-map" parameter, with a small difference: the process number must be
defined, with the same format as for cpu-map ("all", "even", "odd" or a number
between 1 and 31/63).
A thread will be bound on the intersection of its mapping and the one of the
process it is attached to. If the intersection is empty, no specific binding
will be set for the thread. A hypothetical example follows.
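An illustrative configuration snippet following the syntax described above;
the exact values, and the range syntax borrowed from cpu-map, are assumptions:

    global
        nbproc   1
        nbthread 4
        # bind threads 1-4 of process 1 to CPUs 0-3
        thread-map 1 1-4 0-3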
A lock for LB parameters has been added inside the proxy structure and atomic
operations are now used to update server variables related to LB.
The only significant change is about lb_map. Because the servers' status is
updated in the sync-point, we can call the recalc_server_map function
synchronously from the map_set_server_status_up/down functions.
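A sketch of the resulting pattern, assuming the 1.8-era locking and atomic
macros (HA_SPIN_LOCK, HA_ATOMIC_SUB); the fields being updated are only
illustrative:

    /* LB parameter updates are serialized by the new proxy-level lock */
    HA_SPIN_LOCK(LBPRM_LOCK, &px->lbprm.lock);
    px->lbprm.tot_weight -= srv->cur_eweight;   /* illustrative update */
    HA_SPIN_UNLOCK(LBPRM_LOCK, &px->lbprm.lock);

    /* simple per-server counters only need atomics */
    HA_ATOMIC_SUB(&srv->served, 1);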
This list is used to save changes on the servers' state. So when several
threads are used, it must be locked. The changes are then applied in the
sync-point. To do so, servers_update_status has been moved into the
sync-point. It is therefore useless to lock it at this step, because the
sync-point is itself a protected area.
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all of the proxy's counters are now updated using atomic
operations.
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intention
when you create a new task; otherwise you must have a good reason to make a
task sticky on some threads. It is then the responsibility of the process
callback to lock whatever has to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
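A sketch of task creation under this model; task_new taking a thread mask
matches the description above, but its exact signature, and MAX_THREADS_MASK,
are assumptions:

    /* a task any thread may run (the common case) */
    struct task *t = task_new(MAX_THREADS_MASK);

    /* a task sticky on the thread creating the session */
    struct task *t2 = task_new(1UL << tid);
    t2->process = process_stream;   /* the callback locks what it needs */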
Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then, 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and the poll information. In addition, a lock for each entry of the fdtab
array has been added to protect all accesses to a specific FD and its
information.
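A sketch of the thread-local update list; THREAD_LOCAL and the variable names
match the description above, while their exact types are an assumption:

    THREAD_LOCAL unsigned int *fd_updt  = NULL;  /* pending FD updates */
    THREAD_LOCAL int           fd_nbupdt = 0;    /* number of updates */

    /* queueing an update only touches thread-local state: */
    fd_updt[fd_nbupdt++] = fd;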
For pollers, the way concurrency is managed depends on the poller. There is a
poller loop on each thread, so the set of monitored FDs may need to be
protected. epoll and kqueue are thread-safe per se, so there is little to do
to protect these pollers. This is not possible with select and poll, so there
is no sharing between threads: the poller on each thread is independent of the
others.
Finally, per-thread init/deinit functions are used for each poller and for the
FD part, to manage thread-local resources.
Now, you must be careful when an FD is created during HAProxy startup. All
updates on the FD state must be made in the threads' context and never before
their creation. This is mandatory because the fd_updt array is thread-local
and initialized only for threads; because there is no poller for the main
process, this array remains uninitialized in this context. For this reason,
listeners are now enabled in the run_thread_poll_loop function, just like the
worker pipe.
The function sync_poll_loop is called at the end of each loop inside the
run_poll_loop function. It is a protected area where all threads have a chance
to execute tricky tasks with the guarantee that no concurrent access is
possible. Of course, it comes with a cost, because all threads must be
synchronized. So such changes must remain uncommon.
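A sketch of where the sync point sits in the polling loop; only run_poll_loop
and sync_poll_loop are named above, the rest is illustrative:

    static void run_poll_loop()
    {
        for (;;) {
            /* process runnable tasks, poll for I/O events, ... */

            /* rendez-vous point: when a sync was requested, all threads
             * meet here and tricky updates run without concurrency */
            sync_poll_loop();
        }
    }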
[WARNING] For now, HAProxy is not thread-safe, so from this commit, it will be
broken for a while, when compiled with threads.
When the nbthread parameter is greater than 1, HAProxy will create the
corresponding number of threads. If nbthread is set to 1, nothing special is
done. So if there are concurrency issues (and be sure there will be,
unfortunately), an obvious workaround is to disable multithreading...
Each created thread will run a polling loop. So, in a certain way, it is
pretty similar to the nbproc mode ("outside" of the bugs and the lock
contention). Nevertheless, there are init and deinit steps for each thread to
deal with per-thread allocation.
Each thread has a tid (thread-id), numbered from 0 to (nbthread-1). It is used
in many places to do bitwise operations or to improve debugging information.
hap_register_per_thread_init and hap_register_per_thread_deinit functions have
been added to register functions doing, for each thread, respectively, some
initialization and deinitialization. These functions are added to the global
lists per_thread_init_list and per_thread_deinit_list.
These functions are called only when HAProxy is started with more than 1
thread (global.nbthread > 1).
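A sketch of how a subsystem could use these hooks; the registration functions
are named above, while the callbacks and their exact prototypes are
assumptions:

    static int init_my_per_thread_bufs()   /* runs once in each thread */
    {
        /* allocate thread-local resources here */
        return 1;  /* assumed convention: non-zero on success */
    }

    static void deinit_my_per_thread_bufs()
    {
        /* release thread-local resources here */
    }

    hap_register_per_thread_init(init_my_per_thread_bufs);
    hap_register_per_thread_deinit(deinit_my_per_thread_bufs);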
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the integration of the threads support. But
quickly, I started to refactor some parts. And after several iterations, it
was impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes into a single patch. Here are all the changes made to the DNS.
First, the DNS initialization has been refactored. The DNS configuration
parsing remains untouched, in cfgparse.c. But all checks have been moved into
a post-check callback. In the function dns_finalize_config, for each resolvers
section, the nameservers configuration is tested and the task used to manage
DNS resolutions is created. The links between the backend's servers and the
resolvers are also created at this step. Here, no connections are kept alive,
so there is no need anymore to reopen them after the HAProxy fork. Connections
used to send DNS queries will be opened on demand.
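A sketch of the registration, assuming haproxy's existing hap_register_post_check
hook; dns_finalize_config is named above, its body here is only a summary:

    static int dns_finalize_config(void)
    {
        /* for each resolvers section: check the nameservers, create the
         * resolution task, link the backend's servers to the resolvers */
        return 0;   /* assumed convention: 0 on success */
    }

    hap_register_post_check(dns_finalize_config);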
Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced in the
dns_requester structure, and the resolution pointers in the server and
dns_srvrq structures have been removed. The wait and curr lists of requesters
for a DNS resolution have been replaced by a single list. And finally, the way
a requester is removed from a DNS resolution has been simplified. Now
everything is done in dns_unlink_resolution.
The srv_set_fqdn function has been simplified. Now, there is only 1 way to set
the server's FQDN, regardless of whether it is done via the CLI or when an SRV
record is resolved.
The static DNS resolutions pool has been replaced by a dynamic pool. This part
was modified by Baptiste Assmann.
The way DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected, especially
hold.valid. The default frequency at which a resolvers section wakes up is now
configurable using the "timeout resolve" parameter, as illustrated below.
Now, as documented, as long as invalid responses are received, we really wait
for all name servers' responses before retrying.
As far as possible, resources allocated during DNS configuration parsing are
released when HAProxy is shut down.
Besides all these changes, the code has been cleaned to ease code review and
the doc has been updated.
In order to prepare multi-thread development, the code was reworked to
propagate changes asynchronously.
Servers with pending status changes are registered in a list, and this one is
processed and emptied only once per 'run poll' loop.
Operational status changes are performed before administrative
status changes.
In the case of multiple operational status changes or admin status changes
within the same 'run poll' loop iteration, those changes are merged so as to
reach only the targeted status.
The server state and weight were reworked to handle "pending" values updated
by checks/CLI/LUA/agent. These values are committed, i.e. propagated, to the
LB stack. In further developments related to multi-threading, the commit will
be handled inside a sync point.
Pending values are named using the prefix 'next_'.
Current values used by the LB stack are named 'cur_'.
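A sketch of the resulting field split inside struct server; the prefixes come
from the commit, while the exact field names and types are assumptions:

    struct server {
        /* ... */
        enum srv_state next_state;  /* pending value from checks/CLI/LUA/agent */
        enum srv_state cur_state;   /* committed value, used by the LB stack */
        int next_eweight;           /* pending effective weight */
        int cur_eweight;            /* committed effective weight */
        /* ... */
    };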
This string is used in sample fetches so it is safe to use a preallocated trash
chunk instead of a buffer dynamically allocated during HAProxy startup.
First, this variable does not need to be publicly exposed, because it is only
used by stick_table functions. So we declare it as a global static in the
stick_table.c file. Then, it is useless to use a pointer: using a plain struct
variable avoids any dynamic allocation.
swap_buffer is a global variable only used by buffer_slow_realign. So it has
been moved from global.h to buffer.c, and it is allocated by the init_buffer
function. A deinit_buffer function has been added to release it; it is also
used to destroy the buffers' pool.
During configuration parsing, log buffers are reallocated when
global.max_syslog_len is updated. This can happen several times. So, instead
of doing it several times, we do it only once after the configuration parsing.
Now, we use init_log_buffers and deinit_log_buffers to, respectively,
initialize and deinitialize the log buffers used for syslog messages.
These functions have been introduced to be used by threads, to deal with
thread-local log buffers.
Trash buffers are reallocated when the "tune.bufsize" parameter is changed.
Here, we just move the realloc after the configuration parsing. Given that the
config parser doesn't rely on the trash size, this should be harmless.
Now, we use init_trash_buffers and deinit_trash_buffers to, respectively,
initialize and deinitialize trash buffers (trash, trash_buf1 and trash_buf2).
These functions have been introduced to be used by threads, to deal with
thread-local trash buffers.
Use a cpuset_t instead of assuming the cpu mask is an unsigned long.
This should fix setting the CPU affinity on FreeBSD >= 11.
This patch should be backported to stable releases.
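A sketch of the FreeBSD side of the change; cpuset_setaffinity and the CPU_*
macros are the standard FreeBSD API, while the surrounding code is
illustrative:

    #include <sys/param.h>
    #include <sys/cpuset.h>

    cpuset_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);   /* instead of building an unsigned long mask */
    cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                       sizeof(cpuset), &cpuset);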
As mentioned in commit cf4e496c9 ("BUG/MEDIUM: build without openssl broken"),
commit 872f9c213 ("MEDIUM: ssl: add basic support for OpenSSL crypto engine")
broke the build without openssl support. But the former only fixed the case
where openssl is not enabled, not the one where it's not installed on the
system:
In file included from src/haproxy.c:112:
include/proto/ssl_sock.h:24:25: openssl/ssl.h: No such file or directory
In file included from src/haproxy.c:112:
include/proto/ssl_sock.h:45: error: syntax error before "SSL_CTX"
include/proto/ssl_sock.h:75: error: syntax error before '*' token
include/proto/ssl_sock.h:75: warning: type defaults to `int' in declaration of `ssl_sock_create_cert'
include/proto/ssl_sock.h:75: warning: data definition has no type or storage class
include/proto/ssl_sock.h:76: error: syntax error before '*' token
include/proto/ssl_sock.h:76: warning: type defaults to `int' in declaration of `ssl_sock_get_generated_cert'
include/proto/ssl_sock.h:76: warning: data definition has no type or storage class
include/proto/ssl_sock.h:77: error: syntax error before '*' token
Now we also surround the include with #ifdef USE_OPENSSL to fix this. No
backport is needed since openssl async engines were not backported.
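The shape of the fix in src/haproxy.c, as described above (sketch):

    #ifdef USE_OPENSSL
    #include <proto/ssl_sock.h>
    #endif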
When several stick-tables were configured with several peers sections, only a
part of them could be synchronized: the ones attached to the last parsed
'peers' section. This was because, at least, the peer I/O handler referred to
the wrong peers section list, in fact always the same one: the last one
parsed. The fact that the global peers section list was named "struct peers
*peers" led to this issue. This variable name is dangerous ;).
So this patch renames the global 'peers' variable to 'cfg_peers' to ensure
that no such wrong references are still in use, then all the functions which
used the old 'peers' variable have been modified to refer to the correct peer
list.
Must be backported to 1.6 and 1.7.
When starting the master worker with -sf or -st, the PIDs will be reused
on the next reload, which is a problem if new processes on the system
took those PIDs.
This patch ensures that we don't register old PIDs in the reload system
when launching the master worker.
Don't copy the -x argument anymore in copy_argv() since it's already allocated
in mworker_reload().
Make copy_argv() more consistent when used with multiple arguments to strip.
This prevents multiple -x occurrences on reload, which is not supported.
This patch fixes a segfault in the command line parser.
When haproxy is launched with -x with no argument and -x is the last option in
argv, it segfaults.
Use usage() instead of exit() on error.
Commit cb11fd2 ("MEDIUM: mworker: wait mode on reload failure") introduced a
regression: when HAProxy is used in daemon mode, it exits 1 after forking its
children.
HAProxy should exit(0); the exit(EXIT_FAILURE) was expected to be used when
the master fails in master-worker mode.
Thanks to Emmanuel Hocdet for reporting this bug. No backport needed.
The commit 872f9c213 ("MEDIUM: ssl: add basic support for OpenSSL crypto
engine") broke the build without openssl support.
The ssl_free_dh() function is not defined when USE_OPENSSL is not
defined and leads to a compilation failure.
This patch ensures that the children will exit when the master quits, even if
the master didn't send any signal.
The master and the workers are connected through a pipe; when the pipe closes,
the children leave.
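A sketch of the mechanism; the pipe between master and workers is described
above, while the variable name and the exact exit path are assumptions:

    /* in each worker: watch the read end of the pipe inherited from the
     * master; EOF means the master is gone */
    char c;
    if (read(mworker_pipe[0], &c, 1) <= 0)
        exit(EXIT_SUCCESS);   /* master died or closed the pipe: leave */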
This option exits every worker when one of the current workers dies. It allows
you to monitor the master process in order to relaunch everything on a
failure.
For example, it can be used with systemd and Restart=on-failure in a unit
file.
In master-worker mode, you can't specify the stats socket where you get your
listeners' FDs on a reload, because the command line of the re-exec is
launched by the master.
To solve the problem, when -x is found on the command line, its parameter is
rewritten on a re-exec with the first stats socket having the capability to
send sockets. It tries to reuse the original parameter if it has this
capability.