Currently a mux may be forced on a bind or server line by specifying the
"proto" keyword. The problem is that the mux may depend on the proxy's
mode, which is not known when parsing this keyword, so a wrong mux could
be picked.
Let's simply update the mux entry while checking its validity. We do have
the name and the side, we only need to see if a better mux fits based on
the proxy's mode. It also requires to remove the side check while parsing
the "proto" keyword since a wrong mux could be picked.
This way it becomes possible to declare multiple muxes with the same
protocol names and different sides or modes.
The CLOEXEC flag was set using a F_SETFL which can't work.
To set the CLOEXEC flag F_SETFD should be used, the problem is that it
needs a new call to fcntl() and it's on the path of every accept.
This flag was only needed in the case of the master, so the patch was
reverted and the flag set only in this case.
The bug was introduced by 0b3e849 ("MEDIUM: listeners: set O_CLOEXEC on
the accepted FDs").
No backport needed.
This patch replaces a number of __decl_hathread() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock() which
automatically registers the lock for initialization in during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
This protocol is based on the uxst one, but it uses socketpair and FD
passing insteads of a connect()/accept().
The "sockpair@" prefix has been implemented for both bind and server
keywords.
When HAProxy wants to connect through a sockpair@, it creates 2 new
sockets using the socketpair() syscall and pass one of the socket
through the FD specified on the server line.
On the bind side, haproxy will receive the FD, and will use it like it
was the FD of an accept() syscall.
This protocol was designed for internal communication within HAProxy
between the master and the workers, but it's possible to use it
externaly with a wrapper and pass the FD through environment variabls.
When a freshly created session is rejected, for any reason, during the accept in
the function "session_accept_fd", the variable "actconn" is decreased twice. The
first time when the rejected session is released, then in the function
"listener_accpect", because of the failure. So it is possible to have an
negative value for actconn. Note that, in this case, we will also have a negatve
value for the current number of connections on the listener rejecting the
session (actconn and l->nbconn are in/decreased in same time).
It is easy to reproduce the bug with this small configuration:
global
stats socket /tmp/haproxy
listen test
bind *:12345
tcp-request connection reject if TRUE
A "show info" on the stat socket, after a connection attempt, will show a very
high value (the unsigned representation of -1).
To fix the bug, if the function "session_accept_fd" returns an error, it
decrements the right counters and "listener_accpect" leaves them untouched.
This patch must be backported in 1.8.
When a listener is temporarily disabled, we start by locking it and then we call
.pause callback of the underlying protocol (tcp/unix). For TCP listeners, this
is not a problem. But listeners bound on an unix socket are in fact closed
instead. So .pause callback relies on unbind_listener function to do its job.
Unfortunatly, unbind_listener hold the listener's lock and then call an internal
function to unbind it. So, there is a deadlock here. This happens during a
reload. To fix the problemn, the function do_unbind_listener, which is lockless,
is now exported and is called when a listener bound on an unix socket is
temporarily disabled.
This patch must be backported in 1.8.
The listeners and connectors may complain that process-wide or
system-wide FD limits have been reached and will in this case report
maxfd as the limit. This is wrong in fact since there's no reason for
the whole FD space to be contiguous when the total # of FD is reached.
A better approach would consist in reporting the accurate number of
opened FDs, but this is pointless as what matters here is to give a
hint about what might be wrong. So let's simply report the configured
maxsock, which will generally explain why the process' limits were
reached, which is the most common reason. This removes another
dependency on maxfd.
This one allows not to inflate some structures when threads are
disabled. Now struct global is 1.4 kB instead of 33 kB.
Should be backported to 1.8 for ease of backporting of upcoming
patches.
It is now possible on a "bind" line (or a "stats socket" line) to specify the
thread set allowed to process listener's connections. For instance:
# HTTPS connections will be processed by all threads but the first and HTTP
# connection will be processed on the first thread.
bind *:80 process 1/1
bind *:443 ssl crt mycert.pem process 1/2-
The prefix "auto:" can be added before the process set to let HAProxy
automatically bind a process to a CPU by incrementing process and CPU sets. To
be valid, both sets must have the same size. No matter the declaration order of
the CPU sets, it will be bound from the lower to the higher bound.
Examples:
# all these lines bind the process 1 to the cpu 0, the process 2 to cpu 1
# and so on.
cpu-map auto:1-4 0-3
cpu-map auto:1-4 0-1 2-3
cpu-map auto:1-4 3 2 1 0
# bind each process to exaclty one CPU using all/odd/even keyword
cpu-map auto:all 0-63
cpu-map auto:even 0-31
cpu-map auto:odd 32-63
# invalid cpu-map because process and CPU sets have different sizes.
cpu-map auto:1-4 0 # invalid
cpu-map auto:1 0-3 # invalid
The documentation specifies that you can have several "process" options to
define several ranges on "bind" lines (or "stats socket" lines). It is uncommon,
but it should be possible. So the bind_proc bitmask in bind_conf structure must
not be overwritten at each new "process" option parsed.
This bug also exists in 1.7, 1.6 and 1.5. So it may be backported. But no one
seems to have noticed it, so it was probably never hitted.
At the end of the master initialisation, a call to protocol_unbind_all()
was made, in order to close all the FDs.
Unfortunately, this function closes the inherited FDs (fd@), upon reload
the master wasn't able to reload a configuration with those FDs.
The create_listeners() function now store a flag to specify if the fd
was inherited or not.
Replace the protocol_unbind_all() by mworker_cleanlisteners() +
deinit_pollers()
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
At a number of places, bitmasks are used for process affinity and to map
listeners to processes. Every time 1UL<<(relative_pid-1) is used. Let's
create a "pid_bit" variable corresponding to this value to clean this up.
unbind_listener() takes the listener lock, which is already held by
enable_listener(). This situation happens when starting with nbproc > 1
with some bind lines limited to a certain process, because in this case
enable_listener() tries to stop unneeded listeners.
This commit introduces __do_unbind_listeners() which must be called with
the lock held, and makes enable_listener() use this one. Given that the
only return code has never been used and that it starts to make the code
more complicated to propagate it before throwing it to the trash, the
function's return type was changed to void.
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
There are several places where we see feconn++, feconn--, totalconn++ and
an increment on the frontend's number of connections and connection rate.
This is done exactly once per session in each direction, so better take
care of this counter in the session and simplify the callers. At least it
ensures a better symmetry. It also ensures consistency as till now the
lua/spoe/peers frontend didn't have these counters properly set, which can
be useful at least for troubleshooting.
Instead of duplicating some sensitive listener-specific code in the
session and in the stream code, let's call listener_release() when
releasing a connection attached to a listener.
Each user of a session increments/decrements the jobs variable at its
own place, resulting in a real mess and inconsistencies between them.
Let's have session_new() increment jobs and session_free() decrement
it.
Some places call delete_listener() then decrement the number of
listeners and jobs. At least one other place calls delete_listener()
without doing so, but since it's in deinit(), it's harmless and cannot
risk to cause zombie processes to survive. Given that the number of
listeners and jobs is incremented when creating the listeners, it's
much more logical to symmetrically decrement them when deleting such
listeners.
This function is used to create a series of listeners for a specific
address and a port range. It automatically calls the matching protocol
handlers to add them to the relevant lists. This way cfgparse doesn't
need to manipulate listeners anymore. As an added bonus, the memory
allocation is checked.
This commit remove the -Ds systemd mode in HAProxy in order to replace
it by a more generic master worker system. It aims to replace entirely
the systemd wrapper in the near future.
The master worker mode implements a new way of managing HAProxy
processes. The master is in charge of parsing the configuration
file and is responsible for spawning child processes.
The master worker mode can be invoked by using the -W flag. It can be
used either in background mode (-D) or foreground mode. When used in
background mode, the master will fork to daemonize.
In master worker background mode, chroot, setuid and setgid are done in
each child rather than in the master process, because the master process
will still need access to filesystem to reload the configuration.
When running with multiple process, if some proxies are just assigned
to some processes, the other processes will just close the file descriptors
for the listening sockets. However, we may still have to provide those
sockets when reloading, so instead we just try hard to pretend those proxies
are dead, while keeping the sockets opened.
A new global option, no-reused-socket", has been added, to restore the old
behavior of closing the sockets not bound to this process.
Add the "-x" flag, that takes a path to a unix socket as an argument. If
used, haproxy will connect to the socket, and asks to get all the
listening sockets from the old process. Any failure is fatal.
This is needed to get seamless reloads on linux.
When the "process" setting of a bind line limits the processes a
listening socket is enabled on, a "disable frontend" operation followed
by an "enable frontend" triggers a bug because all declared listeners
are attempted to be bound again regardless of their assigned processes.
This can at minima create new sockets not receiving traffic, and at worst
prevent from re-enabling a frontend if it's bound to a privileged port.
This bug was introduced by commit 1c4b814 ("MEDIUM: listener: support
rebinding during resume()") merged in 1.6-dev1, trying to perform the
bind() before checking the process list instead of after.
Just move the process check before the bind() operation to fix this.
This fix must be backported to 1.7 and 1.6.
Thanks to Pavlos for reporting this one.
Historically, all listeners have a pointer to the frontend. But since
the introduction of SSL, we now have an intermediary layer called
bind_conf corresponding to a "bind" line. It makes no sense to have
the frontend on each listener given that it's the same for all
listeners belonging to a same bind_conf. Also certain parts like
SSL can only operate on bind_conf and need the frontend.
This patch fixes this by moving the frontend pointer from the listener
to the bind_conf. The extra indirection is quite cheap given and the
places were this is used are very scarce.
When NetScaler application switch is used as L3+ switch, informations
regarding the original IP and TCP headers are lost as a new TCP
connection is created between the NetScaler and the backend server.
NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.
Specifications and documentations from NetScaler:
https://support.citrix.com/article/CTX205670https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/
When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:
- CIP magic number : 4 bytes
Both sender and receiver have to agree on a magic number so that
they both handle the incoming data as a NetScaler Client IP insertion
packet.
- Header length : 4 bytes
Defines the length on the remaining data.
- IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
Contains the header of the last IP packet sent by the client during TCP
handshake.
- TCP header : >= 20 bytes
Contains the header of the last TCP packet sent by the client during TCP
handshake.
When a listener is not bound to a process its frontend belongs to, it
is only paused and not stopped. This creates confusion from the outside
as "netstat -ltnp" for example will report only the parent process as
the listener instead of the effective one. "ss -lnp" will report that
all processes are listening to all sockets.
This is confusing enough to suggest a fix. Now we simply stop the unused
listeners. Example with this simple config :
global
nbproc 4
frontend haproxy_test
bind-process 1-40
bind :12345 process 1
bind :12345 process 2
bind :12345 process 3
bind :12345 process 4
Before the patch :
$ netstat -ltnp
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30457/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30457/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30457/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30457/./haproxy
After the patch :
$ netstat -ltnp
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30504/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30503/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30502/./haproxy
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 30501/./haproxy
This patch may be backported to 1.6 and 1.5, but it relies on commit
7a798e5 ("CLEANUP: fix inconsistency between fd->iocb, proto->accept
and accept()") since it will expose an API inconsistency by including
listener.h in the .c.
There's quite some inconsistency in the internal API. listener_accept()
which is the main accept() function returns void but is declared as int
in the include file. It's assigned to proto->accept() for all stream
protocols where an int is expected but the result is never checked (nor
is it documented by the way). This proto->accept() is in turn assigned
to fd->iocb() which is supposed to return an int composed of FD_WAIT_*
flags, but which is never checked either.
So let's fix all this mess :
- nobody checks accept()'s return
- nobody checks iocb()'s return
- nobody sets a return value
=> let's mark all these functions void and keep the current ones intact.
Additionally we now include listener.h from listener.c to ensure we won't
silently hide this incoherency in the future.
Note that this patch could/should be backported to 1.6 and even 1.5 to
simplify debugging sessions.
The union name "data" is a little bit heavy while we read the source
code because we can read "data.data.sint". The rename from "data" to "u"
makes the read easiest like "data.u.sint".
This patch remove the struct information stored both in the struct
sample_data and in the striuct sample. Now, only thestruct sample_data
contains data, and the struct sample use the struct sample_data for storing
his own data.
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.
This makes easy the usage of integer and clarify signed and unsigned use.
With the previous version, signed and unsigned are used ones in place of
others, and sometimes the converter loose the sign. For example, divisions
are processed with "unsigned", if one entry is negative, the result is
wrong.
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
This patch removes the structs "session", "stream" and "proxy" from
the sample-fetches and converters function prototypes.
This permits to remove some weight in the prototype call.
Pavlos Parissis reported that a sequence of disable/enable on a frontend
performed on the CLI can result in an error if the frontend has several
"bind" lines each bound to different processes. This is because the
resume_listener() function returns a failure for frontends not part of
the current process instead of returning a success to pretend there was
no failure.
This fix should be backported to 1.5.
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.
So as of now, sample fetch functions will be called with this :
- sess = NULL, strm = NULL : never
- sess = valid, strm = NULL : tcp-req connection
- sess = valid, strm = valid, strm->txn = NULL : tcp-req content
- sess = valid, strm = valid, strm->txn = valid : http-req / http-res
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.